X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-file-format.texi;h=9cee3a4c822ba2e9044a9028eb6daa5d39eec914;hb=e0d0265ba2c4c74d3f7c57a33a18014bd82c8d27;hp=b9c6e6f8e41a0ae5995414b4441c12d3c92b2382;hpb=35b33ef0aa2090c0cbb83d31c71175f5b7c95c95;p=pspp-builds.git

diff --git a/doc/data-file-format.texi b/doc/data-file-format.texi
index b9c6e6f8..9cee3a4c 100644
--- a/doc/data-file-format.texi
+++ b/doc/data-file-format.texi
@@ -1,4 +1,4 @@
-@node Data File Format, q2c Input Format, Portable File Format, Top
+@node Data File Format
 @appendix Data File Format
 
 PSPP necessarily uses the same format for system files as do the
@@ -9,7 +9,8 @@ There are three data types used in system files: 32-bit integers, 64-bit
 floating points, and 1-byte characters. In this document these will
 simply be referred to as @code{int32}, @code{flt64}, and @code{char}, the
 names that are used in the PSPP source code. Every field of type
-@code{int32} or @code{flt64} is aligned on a 32-bit boundary.
+@code{int32} or @code{flt64} is aligned on a 32-bit boundary relative to
+the start of the record.
 
 The endianness of data in PSPP system files is not specified. System
 files output on a computer of a particular endianness will have the
@@ -45,12 +46,13 @@ described below:
 * Machine flt64 Info Record::
 * Auxiliary Variable Parameter Record::
 * Long Variable Names Record::
+* Very Long String Length Record::
 * Miscellaneous Informational Records::
 * Dictionary Termination Record::
 * Data Record::
 @end menu
 
-@node File Header Record, Variable Record, Data File Format, Data File Format
+@node File Header Record
 @section File Header Record
 
 The file header is always the first record in the file.
@@ -61,7 +63,7 @@ struct sysfile_header
     char rec_type[4];
     char prod_name[60];
     int32 layout_code;
-    int32 case_size;
+    int32 nominal_case_size;
     int32 compressed;
     int32 weight_index;
     int32 ncases;
@@ -90,13 +92,13 @@ with spaces.
 Always set to 2.
 PSPP reads this value to determine the file's endianness.
 
-@item int32 case_size;
+@item int32 nominal_case_size;
 Number of data elements per case. This is the number of variables,
 except that long string variables add extra data elements (one for every
-8 characters after the first 8).
-When reading system files, PSPP will use this value unless it is set
-to -1, in which case it will determine the number of data elements by
-context. When writing system files PSPP always uses this value.
+8 characters after the first 8). However, string variables do not
+contribute to this value beyond the first 255 bytes. Further, system
+files written by some systems set this value to -1. In general, it is
+unsafe for systems reading system files to rely upon this value.
 
 @item int32 compressed;
 Set to 1 if the data in the file is compressed, 0 otherwise.
@@ -133,22 +135,20 @@ format and using 24-hour time. If the time is not available then
 this field is arbitrarily set to @samp{00:00:00}.
 
 @item char file_label[64];
-Set the the file label declared by the user, if any. Padded on the
-right with spaces.
+Set to the file label declared by the user, if any (@pxref{FILE LABEL}).
+Padded on the right with spaces.
 
 @item char padding[3];
 Ignored padding bytes to make the structure a multiple of 32 bits in
 length. Set to zeros.
 @end table
 
-@node Variable Record, Value Label Record, File Header Record, Data File Format
+@node Variable Record
 @section Variable Record
 
 Immediately following the header must come the variable records. There
 must be one variable record for every variable and every 8 characters in
-a long string beyond the first 8; i.e., there must be exactly as many
-variable records as the value specified for @code{case_size} in the file
-header record.
+a long string beyond the first 8.
 
 @example
 struct sysfile_variable
@@ -168,7 +168,7 @@ struct sysfile_variable
 
     /* The following field is present only if n_missing_values is
        not 0. */
-    flt64 missing_values[/* variable length*/];
+    flt64 missing_values[/* variable length */];
   @};
 @end example
 
@@ -201,7 +201,7 @@ Write format for this variable. See below.
 
 @item char name[8];
 Variable name. The variable name must begin with a capital letter or
-the at-sign (@samp{@@}). Subsequent characters may also be octothorpes
+the at-sign (@samp{@@}). Subsequent characters may also be digits, octothorpes
 (@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full stops
 (@samp{.}). The variable name is padded on the right with spaces.
 
@@ -318,7 +318,7 @@ Not used.
 @code{SDATE}
 @end table
 
-@node Value Label Record, Value Label Variable Record, Variable Record, Data File Format
+@node Value Label Record
 @section Value Label Record
 
 Value label records must follow the variable records and must precede
@@ -344,7 +344,7 @@ first @code{char} is a count of the number of characters in the value
 label. The remainder of the field is the label itself. The field is
 padded on the right to a multiple of 64 bits in length.
 
-@node Value Label Variable Record, Document Record, Value Label Record, Data File Format
+@node Value Label Variable Record
 @section Value Label Variable Record
 
 Every value label variable record must be immediately preceded by a
@@ -367,12 +369,14 @@ Record type. Always set to 4.
 Number of variables that the associated value labels from the value
 label record are to be applied.
 
-@item int32 vars[/* variable length];
+@item int32 vars[/* variable length */];
 A list of variables to which to apply the value labels. There are
-@code{count} elements.
+@code{count} elements. Each element identifies a variable record, where
+the first variable record is numbered 1 and long string variables are
+considered to occupy multiple indexes.
 @end table
 
-@node Document Record, Machine int32 Info Record, Value Label Variable Record, Data File Format
+@node Document Record
 @section Document Record
 
 There must be no more than one document record per system file.
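The revised description of the vars array says that its indexes are 1-based and that a long string variable occupies several of them. A minimal C sketch of mapping such an index back to a variable follows. It assumes that a variable record continuing a long string has its type field set to -1, which is how the system file format marks continuation records. The names struct var_record and vars_index_to_variable are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical in-memory copy of the variable records, in file order.
   Only the type field matters here: 0 is numeric, 1..255 is a string
   width, and -1 (assumed) marks a record that continues a long string. */
struct var_record
{
    int type;
};

/* Converts a 1-based index taken from a value label variable record's
   vars[] array into the 0-based position of the variable it names,
   counting only records that begin a variable.  Returns -1 if the
   index is out of range or refers to a continuation record. */
static int
vars_index_to_variable (const struct var_record *recs, size_t n_recs,
                        int index)
{
    if (index < 1 || (size_t) index > n_recs)
        return -1;
    if (recs[index - 1].type == -1)
        return -1;              /* continuation records never carry labels */

    int variable = -1;
    for (int i = 0; i < index; i++)
        if (recs[i].type != -1)
            variable++;         /* another record that starts a variable */
    return variable;
}
```

As an example, consider a numeric variable, then a 20-byte string (one record plus two continuation records), then another numeric variable. Index 5 names the third variable (position 2), while indexes 3 and 4 point at continuation records and are rejected.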
@@ -400,7 +402,7 @@ Document lines. The number of elements is defined by @code{n_lines}.
 Lines shorter than 80 characters are padded on the right with spaces.
 @end table
 
-@node Machine int32 Info Record, Machine flt64 Info Record, Document Record, Data File Format
+@node Machine int32 Info Record
 @section Machine @code{int32} Info Record
 
 There must be no more than one machine @code{int32} info record per
@@ -470,9 +472,10 @@ Machine endianness. 1 indicates big-endian, 2 indicates little-endian.
 @item int32 character_code;
 Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3 indicates
 8-bit ASCII, 4 indicates DEC Kanji.
+Windows code page numbers are also valid.
 @end table
 
-@node Machine flt64 Info Record, Auxiliary Variable Parameter Record, Machine int32 Info Record, Data File Format
+@node Machine flt64 Info Record
 @section Machine @code{flt64} Info Record
 
 There must be no more than one machine @code{flt64} info record per
@@ -503,7 +506,7 @@ Record type. Always set to 7.
 Record subtype. Always set to 4.
 
 @item int32 size;
-Size of each piece of data in the data part, in bytes. Always set to 4.
+Size of each piece of data in the data part, in bytes. Always set to 8.
 
 @item int32 count;
 Number of pieces of data in the data part. Always set to 3.
@@ -518,7 +521,7 @@ The value used for HIGHEST in missing values.
 The value used for LOWEST in missing values.
 @end table
 
-@node Auxiliary Variable Parameter Record, Long Variable Names Record, Machine flt64 Info Record, Data File Format
+@node Auxiliary Variable Parameter Record
 @section Auxiliary Variable Parameter Record
 
 There must be no more than one auxiliary variable parameter record per
@@ -554,7 +557,9 @@ The total number of records in @code{aux_params}, multiplied by 3.
 
 @item struct aux_params aux_params[];
 An array of @code{struct aux_params}. The order of the elements corresponds
-to the order of the variables in the Variable Records. The @code{struct aux_params} type is defined as follows:
+to the order of the variables in the Variable Records. No element
+corresponds to variable records that continue long string variables.
+The @code{struct aux_params} type is defined as follows:
 
 @example
 struct aux_params
@@ -569,14 +574,17 @@ struct aux_params
 @item int32 measure
 The measurement type of the variable:
 @table @asis
-@item 0
-Nominal Scale
 @item 1
-Ordinal Scale
+Nominal Scale
 @item 2
+Ordinal Scale
+@item 3
 Continuous Scale
 @end table
 
+Occasionally a value of 0 is seen here. PSPP interprets this to mean
+a nominal scale.
+
 @item int32 width
 The width of the display column for the variable in characters.
 
@@ -600,7 +608,7 @@ Centre aligned
 
 
 
-@node Long Variable Names Record, Miscellaneous Informational Records, Auxiliary Variable Parameter Record, Data File Format
+@node Long Variable Names Record
 @section Long Variable Names Record
 
 There must be no more than one long variable names record per
@@ -634,11 +642,12 @@ The size of each element in the @code{var_name_pairs} member. Always set to 1.
 @item int32 count;
 The total number of bytes in @code{var_name_pairs}.
 
-@item char var_name_pairs[/* variable length];
+@item char var_name_pairs[/* variable length */];
 A list of @var{key}--@var{value} tuples, where @var{key} is the name
 of a variable, and @var{value} is its long variable name.
 The @var{key} field is at most 8 bytes long and must match the
-name of a variable which appears in the variable record @xref{Variable Record}.
+name of a variable which appears in the variable record (@pxref{Variable
+Record}).
 The @var{value} field is at most 64 bytes long.
 The @var{key} and @var{value} fields are separated by a @samp{=} byte.
 Each tuple is separated by a byte whose value is 09. There is no
@@ -646,15 +655,67 @@ trailing separator following the last tuple.
 The total length is @code{count} bytes.
 @end table
 
 
+@node Very Long String Length Record
+@comment node-name, next, previous, up
+@section Very Long String Length Record
+
+
+There must be no more than one very long string length record per
+system file. This record must follow the variable records and precede the
+dictionary termination record.
+
+@example
+struct sysfile_very_long_string_lengths
+  @{
+    /* Header. */
+    int32 rec_type;
+    int32 subtype;
+    int32 size;
+    int32 count;
+
+    /* Data. */
+    char string_lengths[/* variable length */];
+  @};
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 14.
+
+@item int32 size;
+The size of each element in the @code{string_lengths} member. Always set to 1.
+
+@item int32 count;
+The total number of bytes in @code{string_lengths}.
+
+@item char string_lengths[/* variable length */];
+A list of @var{key}--@var{value} tuples, where @var{key} is the name
+of a variable, and @var{value} is its length.
+The @var{key} field is at most 8 bytes long and must match the
+name of a variable which appears in the variable record (@pxref{Variable
+Record}).
+The @var{value} field is exactly 5 bytes long. It is a zero-padded,
+ASCII-encoded string that is the length of the variable.
+The @var{key} and @var{value} fields are separated by a @samp{=} byte.
+Tuples are delimited by a two-byte sequence @{00, 09@}.
+After the last tuple, there may be a single byte 00, or @{00, 09@}.
+The total length is @code{count} bytes.
+@end table
+
+
-@node Miscellaneous Informational Records, Dictionary Termination Record, Long Variable Names Record, Data File Format
+@node Miscellaneous Informational Records
 @section Miscellaneous Informational Records
 
 Miscellaneous informational records must follow the variable records and
 precede the dictionary termination record.
 
-Miscellaneous informational records are ignored by PSPP when reading
-system files. They are not written by PSPP when writing system files.
+Some specific types of miscellaneous informational records are
+documented here, but others are known to exist. PSPP ignores unknown
+miscellaneous informational records when reading system files.
 
 @example
 struct sysfile_misc_info
@@ -691,7 +752,7 @@ Arbitrary data.
 There must be @code{size} times @code{count} bytes of data.
 @end table
 
-@node Dictionary Termination Record, Data Record, Miscellaneous Informational Records, Data File Format
+@node Dictionary Termination Record
 @section Dictionary Termination Record
 
 The dictionary termination record must follow all other records, except
@@ -714,7 +775,7 @@ Record type. Always set to 999.
 Ignored padding. Should be set to 0.
 @end table
 
-@node Data Record, , Dictionary Termination Record, Data File Format
+@node Data Record
 @section Data Record
 
 Data records must follow all other records in the data file. There must
@@ -724,9 +785,8 @@ The format of data records varies depending on whether the data is
 compressed. Regardless, the data is arranged in a series of 8-byte
 elements.
 
-When data is not compressed, Every case is composed of @code{case_size}
-of these 8-byte elements, where @code{case_size} comes from the file
-header record (@pxref{File Header Record}). Each element corresponds to
+When data is not compressed,
+each element corresponds to
 the variable declared in the respective variable record (@pxref{Variable
 Record}). Numeric values are given in @code{flt64} format; string
 values are literal characters string, padded on the right when