X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-file-format.texi;h=006395078d9cb56e9cc618c72fba9edd83f1cad9;hb=5715e8846c6e3a8dd539b2dc9c9c65640f84af76;hp=c6b0c9653aa7e2aa27c95beb074eae1bfe942fa7;hpb=c3ff2edaca074e102e3b8fddda610d68e051229c;p=pspp-builds.git diff --git a/doc/data-file-format.texi b/doc/data-file-format.texi index c6b0c965..00639507 100644 --- a/doc/data-file-format.texi +++ b/doc/data-file-format.texi @@ -9,7 +9,8 @@ There are three data types used in system files: 32-bit integers, 64-bit floating points, and 1-byte characters. In this document these will simply be referred to as @code{int32}, @code{flt64}, and @code{char}, the names that are used in the PSPP source code. Every field of type -@code{int32} or @code{flt64} is aligned on a 32-bit boundary. +@code{int32} or @code{flt64} is aligned on a 32-bit boundary relative to +the start of the record. The endianness of data in PSPP system files is not specified. System files output on a computer of a particular endianness will have the @@ -45,6 +46,7 @@ described below: * Machine flt64 Info Record:: * Auxiliary Variable Parameter Record:: * Long Variable Names Record:: +* Very Long String Length Record:: * Miscellaneous Informational Records:: * Dictionary Termination Record:: * Data Record:: @@ -61,7 +63,7 @@ struct sysfile_header char rec_type[4]; char prod_name[60]; int32 layout_code; - int32 case_size; + int32 nominal_case_size; int32 compressed; int32 weight_index; int32 ncases; @@ -90,13 +92,13 @@ with spaces. Always set to 2. PSPP reads this value to determine the file's endianness. -@item int32 case_size; +@item int32 nominal_case_size; Number of data elements per case. This is the number of variables, except that long string variables add extra data elements (one for every -8 characters after the first 8). -When reading system files, PSPP will use this value unless it is set -to -1, in which case it will determine the number of data elements by -context. When writing system files PSPP always uses this value. +8 characters after the first 8). However, string variables do not +contribute to this value beyond the first 255 bytes. Further, system +files written by some systems set this value to -1. In general, it is +unsafe for systems reading system files to rely upon this value. @item int32 compressed; Set to 1 if the data in the file is compressed, 0 otherwise. @@ -133,8 +135,8 @@ format and using 24-hour time. If the time is not available then this field is arbitrarily set to @samp{00:00:00}. @item char file_label[64]; -Set the the file label declared by the user, if any. Padded on the -right with spaces. +Set the file label declared by the user, if any (@pxref{FILE LABEL}). +Padded on the right with spaces. @item char padding[3]; Ignored padding bytes to make the structure a multiple of 32 bits in @@ -146,9 +148,7 @@ length. Set to zeros. Immediately following the header must come the variable records. There must be one variable record for every variable and every 8 characters in -a long string beyond the first 8; i.e., there must be exactly as many -variable records as the value specified for @code{case_size} in the file -header record. +a long string beyond the first 8. @example struct sysfile_variable @@ -168,7 +168,7 @@ struct sysfile_variable /* The following field is present only if n_missing_values is not 0. */ - flt64 missing_values[/* variable length*/]; + flt64 missing_values[/* variable length */]; @}; @end example @@ -201,7 +201,7 @@ Write format for this variable. See below. @item char name[8]; Variable name. The variable name must begin with a capital letter or -the at-sign (@samp{@@}). Subsequent characters may also be octothorpes +the at-sign (@samp{@@}). Subsequent characters may also be digits, octothorpes (@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full stops (@samp{.}). The variable name is padded on the right with spaces. @@ -367,9 +367,11 @@ Record type. Always set to 4. Number of variables that the associated value labels from the value label record are to be applied. -@item int32 vars[/* variable length]; +@item int32 vars[/* variable length */]; A list of variables to which to apply the value labels. There are -@code{count} elements. +@code{count} elements. Each element identifies a variable record, where +the first element is numbered 1 and long string variables are considered +to occupy multiple indexes. @end table @node Document Record, Machine int32 Info Record, Value Label Variable Record, Data File Format @@ -470,6 +472,7 @@ Machine endianness. 1 indicates big-endian, 2 indicates little-endian. @item int32 character_code; Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3 indicates 8-bit ASCII, 4 indicates DEC Kanji. +Windows code page numbers are also valid. @end table @node Machine flt64 Info Record, Auxiliary Variable Parameter Record, Machine int32 Info Record, Data File Format @@ -503,7 +506,7 @@ Record type. Always set to 7. Record subtype. Always set to 4. @item int32 size; -Size of each piece of data in the data part, in bytes. Always set to 4. +Size of each piece of data in the data part, in bytes. Always set to 8. @item int32 count; Number of pieces of data in the data part. Always set to 3. @@ -554,7 +557,9 @@ The total number of records in @code{aux_params}, multiplied by 3. @item struct aux_params aux_params[]; An array of @code{struct aux_params}. The order of the elements corresponds -to the order of the variables in the Variable Records. The @code{struct aux_params} type is defined as follows: +to the order of the variables in the Variable Records. No element +corresponds to variable records that continue long string variables. +The @code{struct aux_params} type is defined as follows: @example struct aux_params @@ -577,6 +582,9 @@ Ordinal Scale Continuous Scale @end table +Occasionally a value of 0 is seen here. PSPP interprets this to mean +a nominal scale. + @item int32 width The width of the display column for the variable in characters. @@ -600,7 +608,7 @@ Centre aligned -@node Long Variable Names Record, Miscellaneous Informational Records, Auxiliary Variable Parameter Record, Data File Format +@node Long Variable Names Record, Very Long String Length Record, Auxiliary Variable Parameter Record, Data File Format @section Long Variable Names Record There must be no more than one long variable names record per @@ -634,11 +642,12 @@ The size of each element in the @code{var_name_pairs} member. Always set to 1. @item int32 count; The total number of bytes in @code{var_name_pairs}. -@item char var_name_pairs[/* variable length]; +@item char var_name_pairs[/* variable length */]; A list of @var{key}--@var{value} tuples, where @var{key} is the name of a variable, and @var{value} is its long variable name. The @var{key} field is at most 8 bytes long and must match the -name of a variable which appears in the variable record @xref{Variable Record}. +name of a variable which appears in the variable record (@pxref{Variable +Record}). The @var{value} field is at most 64 bytes long. The @var{key} and @var{value} fields are separated by a @samp{=} byte. Each tuple is separated by a byte whose value is 09. There is no @@ -646,15 +655,67 @@ trailing separator following the last tuple. The total length is @code{count} bytes. @end table +@node Very Long String Length Record, Miscellaneous Informational Records, Long Variable Names Record, Data File Format +@comment node-name, next, previous, up +@section Very Long String Length Record + + +There must be no more than one very long string length record per +system file. This record must follow the variable records and precede the +dictionary termination record. + +@example +struct sysfile_very_long_string_lengths + @{ + /* Header. */ + int32 rec_type; + int32 subtype; + int32 size; + int32 count; + + /* Data. */ + char string_lengths[/* variable length */]; + @}; +@end example + +@table @code +@item int32 rec_type; +Record type. Always set to 7. + +@item int32 subtype; +Record subtype. Always set to 14. + +@item int32 size; +The size of each element in the @code{string_lengths} member. Always set to 1. + +@item int32 count; +The total number of bytes in @code{string_lengths}. + +@item char string_lengths[/* variable length */]; +A list of @var{key}--@var{value} tuples, where @var{key} is the name +of a variable, and @var{value} is its length. +The @var{key} field is at most 8 bytes long and must match the +name of a variable which appears in the variable record (@pxref{Variable +Record}). +The @var{value} field is exactly 5 bytes long. It is a zero-padded, +ASCII-encoded string that is the length of the variable. +The @var{key} and @var{value} fields are separated by a @samp{=} byte. +Tuples are delimited by a two-byte sequence @{00, 09@}. +After the last tuple, there may be a single byte 00, or @{00, 09@}. +The total length is @code{count} bytes. +@end table + + -@node Miscellaneous Informational Records, Dictionary Termination Record, Long Variable Names Record, Data File Format +@node Miscellaneous Informational Records, Dictionary Termination Record, Very Long String Length Record, Data File Format @section Miscellaneous Informational Records Miscellaneous informational records must follow the variable records and precede the dictionary termination record. -Miscellaneous informational records are ignored by PSPP when reading -system files. They are not written by PSPP when writing system files. +Some specific types of miscellaneous informational records are +documented here, but others are known to exist. PSPP ignores unknown +miscellaneous informational records when reading system files. @example struct sysfile_misc_info @@ -724,9 +785,8 @@ The format of data records varies depending on whether the data is compressed. Regardless, the data is arranged in a series of 8-byte elements. -When data is not compressed, Every case is composed of @code{case_size} -of these 8-byte elements, where @code{case_size} comes from the file -header record (@pxref{File Header Record}). Each element corresponds to +When data is not compressed, +each element corresponds to the variable declared in the respective variable record (@pxref{Variable Record}). Numeric values are given in @code{flt64} format; string values are literal characters string, padded on the right when