X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fsystem-file-format.texi;h=a5a89f4ded01672a57c53791d4f81d51ca3bd24f;hb=7b43b700543f6914d8eaf3d41b42f95d1db1c487;hp=fb131f5274f0d61e90f9eefdf0c994c413b7e468;hpb=8e27b1a0dba7f33b7acb0d8894efe2045b0bb98f;p=pspp
diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi
index fb131f5274..a5a89f4ded 100644
--- a/doc/dev/system-file-format.texi
+++ b/doc/dev/system-file-format.texi
@@ -1,3 +1,13 @@
+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2019 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
+
 @node System File Format
 @appendix System File Format
 
@@ -6,7 +16,7 @@ that describes how they may be interpreted.
 This chapter describes the format of a system file.
 
 System files use four data types: 8-bit characters, 32-bit integers,
-64-bit integers,
+64-bit integers, and
 64-bit floating points, called here @code{char}, @code{int32},
 @code{int64}, and
 @code{flt64}, respectively. Data is not necessarily aligned on a word
@@ -222,6 +232,12 @@ pspp 0.1.4 - sparc-sun-solaris2.5.2}.
 The string is truncated if it would be longer than 60 characters;
 otherwise it is padded on the right with spaces.
 
+The product name field allows readers to behave differently based on
+quirks in the way that particular software writes system files.
+@xref{Value Labels Records}, for details of the quirk that the PSPP
+system file reader tolerates in files written by ReadStat, which has
+@code{https://github.com/WizardMac/ReadStat} in @code{prod_name}.
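The product-name check described in the added paragraph could look like the minimal C sketch below. It assumes only that ReadStat's URL appears somewhere in the 60-byte, space-padded prod_name field. The helper written_by_readstat is hypothetical and is not PSPP's reader code. Because prod_name is not NUL-terminated, the sketch copies it into a terminated buffer before searching.

```c
#include <stdbool.h>
#include <string.h>

/* Length of the prod_name field in the file header record. */
#define PROD_NAME_LEN 60

/* Hypothetical helper (not PSPP's API): returns true if the
   space-padded, non-NUL-terminated prod_name field contains
   ReadStat's project URL anywhere within its 60 bytes. */
static bool
written_by_readstat (const char prod_name[PROD_NAME_LEN])
{
  char buf[PROD_NAME_LEN + 1];

  /* Copy into a NUL-terminated buffer so that strstr() cannot read
     past the end of the fixed-width field. */
  memcpy (buf, prod_name, PROD_NAME_LEN);
  buf[PROD_NAME_LEN] = '\0';

  return strstr (buf, "https://github.com/WizardMac/ReadStat") != NULL;
}
```

A reader would evaluate this once, right after parsing the file header record. It would then keep the result as a flag to consult later when deciding whether to drop the duplicate string value labels described in the value-label hunk.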
+
 @anchor{layout_code}
 @item int32 layout_code;
 Normally set to 2, although a few system files have been spotted in
@@ -232,8 +248,8 @@ file's integer endianness (@pxref{System File Format}).
 Number of data elements per case. This is the number of variables,
 except that long string variables add extra data elements (one for
 every 8 characters after the first 8). However, string variables do not
-contribute to this value beyond the first 255 bytes. Further, system
-files written by some systems set this value to -1. In general, it is
+contribute to this value beyond the first 255 bytes. Further, some
+software always writes -1 or 0 in this field. In general, it is
 unsafe for systems reading system files to rely upon this value.
 
 @item int32 compression;
@@ -310,10 +326,10 @@ so readers should take care to parse dummy variable records in the
 same way as other variable records.
 
 @anchor{Dictionary Index}
-The @dfn{dictionary index} of a variable is its offset in the set of
+The @dfn{dictionary index} of a variable is a 1-based offset in the set of
 variable records, including dummy variable records for long string
-variables. The first variable record has a dictionary index of 0, the
-second has a dictionary index of 1, and so on.
+variables. The first variable record has a dictionary index of 1, the
+second has a dictionary index of 2, and so on.
 
 The system file format does not directly support string variables
 wider than 255 bytes. Such very long string variables are represented
@@ -508,6 +524,10 @@ Format types are defined as follows:
 @tab @code{EDATE}
 @item 39
 @tab @code{SDATE}
+@item 40
+@tab @code{MTIME}
+@item 41
+@tab @code{YMDHMS}
 @end multitable
 @end quotation
 
@@ -524,6 +544,14 @@ numeric and short string variables only. Long string variables may
 have value labels, but their value labels are recorded using a
 different record type (@pxref{Long String Value Labels Record}).
 
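The dictionary-index hunk makes the index 1-based. A reader that keeps its variable records in a 0-based C array must therefore subtract one before indexing, for example when resolving the vars[] list of a value label variables record. The minimal sketch below rests on two assumptions. First, the reader has saved each variable record's type field in an array. Second, as in the variable record format, a type of -1 marks a dummy continuation record of a long string. The helper name dict_index_to_record is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not PSPP's API): converts a 1-based dictionary
   index into a 0-based position in an array of variable records that
   includes dummy records.  type_codes[i] is the type field of the
   i-th record: 0 for numeric, 1..255 for a string width, and -1 for a
   dummy continuation record.  Stores the position in *record and
   returns true on success; returns false for an invalid index. */
static bool
dict_index_to_record (int32_t dict_index, const int32_t *type_codes,
                      size_t n_records, size_t *record)
{
  /* Valid indexes run from 1 through n_records; 0 is never valid. */
  if (dict_index < 1 || (size_t) dict_index > n_records)
    return false;

  size_t i = (size_t) dict_index - 1;

  /* A dummy continuation record does not begin a variable, so this
     sketch refuses to resolve an index that points at one. */
  if (type_codes[i] == -1)
    return false;

  *record = i;
  return true;
}
```

The sketch rejects 0 and out-of-range values rather than clamping them. That choice lets the caller report a corrupt record instead of silently attaching labels to the wrong variable.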
+ReadStat (@pxref{File Header Record}) writes value labels that label a
+single value more than once. Specifically, it emits labels for values
+that are longer than a string variable's width but identical once
+truncated to that width, e.g.@: labels for values @code{ABC123} and
+@code{ABC456} for a string variable of width 3, which both truncate
+to @code{ABC}. For files written by ReadStat, PSPP ignores such
+labels.
+
 The value label record has the following format:
 
 @example
@@ -582,7 +610,7 @@ Number of variables that the associated value labels from the value
 label record are to be applied.
 
 @item int32 vars[];
-A list of dictionary indexes of variables to which to apply the value
+A list of 1-based dictionary indexes of variables to which to apply the value
 labels (@pxref{Dictionary Index}). There are @code{var_count}
 elements.
 
@@ -606,9 +634,7 @@ Record type. Always set to 6.
 
 @item int32 n_lines;
 Number of lines of documents present. This should be greater than
-zero, but the system file writer that identifies itself as
-@url{https://github.com/WizardMac/ReadStat} writes document records
-with zero @code{n_lines}.
+zero, but ReadStat writes document records with zero @code{n_lines}.
 
 @item char lines[][80];
 Document lines. The number of elements is defined by @code{n_lines}.
@@ -1560,9 +1586,10 @@ value @var{code} - @var{bias}, where
 variable @code{bias} from the file header. For example,
 code 105 with bias 100.0 (the normal value) indicates a numeric variable of
 value 5.
-One file has been seen written by SPSS 14 that contained such a code
-in a @emph{string} field with the value 0 (after the bias is
-subtracted) as a way of encoding null bytes.
+
+A code of 0 after subtracting the bias (code 100 with the normal bias)
+in a string field encodes null bytes. This is unusual, since string
+fields normally encode text, but it occurs in real system files.
 
 @item 252
 End of file. This code may or may not appear at the end of the data
@@ -1624,7 +1651,7 @@ The number of bytes in the ZLIB data trailer.
 This and the previous field sum to the size of the system file in bytes.
 @end table
 
-The data header is followed by @code{(ztrailer_ofs - 24) / 24} ZLIB
+The data header is followed by @code{(ztrailer_len - 24) / 24} ZLIB
 compressed data blocks. Each ZLIB compressed data block begins with a
 ZLIB header as specified in RFC@tie{}1950, e.g.@: hex bytes
 @code{78 01} (the only header yet observed in practice). Each block
@@ -1661,7 +1688,7 @@ been observed so far.
 
 @item int32 n_blocks;
 The number of ZLIB compressed data blocks, always exactly
-@code{(ztrailer_ofs - 24) / 24}.
+@code{(ztrailer_len - 24) / 24}.
 @end table
 
 The fixed header is followed by @code{n_blocks} 24-byte ZLIB data