+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2019 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
+
@node System File Format
@appendix System File Format
the format of a system file.
System files use four data types: 8-bit characters, 32-bit integers,
-64-bit integers,
+64-bit integers,
and 64-bit floating points, called here @code{char}, @code{int32},
@code{int64}, and
@code{flt64}, respectively. Data is not necessarily aligned on a word
would be longer than 60 characters; otherwise it is padded on the right
with spaces.
+The product name field allows readers to behave differently based on
+quirks in the way that particular software writes system files.
+@xref{Value Labels Records}, for details of the quirk that the PSPP
+system file reader tolerates in files written by ReadStat, which has
+@code{https://github.com/WizardMac/ReadStat} in @code{prod_name}.
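
As a minimal sketch of how a reader might key its behavior on this
field, the following C function checks @code{prod_name} for ReadStat's
URL. The name @code{written_by_readstat} is hypothetical, and searching
for the URL anywhere in the field, rather than at a fixed position, is
an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PSPP's API.  Returns true if the
   60-byte prod_name field from the file header contains ReadStat's URL.
   prod_name is space-padded rather than null-terminated, so it is
   copied into a terminated buffer before searching. */
bool
written_by_readstat (const char prod_name[60])
{
  char buf[61];

  memcpy (buf, prod_name, 60);
  buf[60] = '\0';
  return strstr (buf, "https://github.com/WizardMac/ReadStat") != NULL;
}
```

A reader would call this once after reading the file header and keep
the result as a flag for later records.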
+
@anchor{layout_code}
@item int32 layout_code;
Normally set to 2, although a few system files have been spotted in
same way as other variable records.
@anchor{Dictionary Index}
-The @dfn{dictionary index} of a variable is its offset in the set of
+The @dfn{dictionary index} of a variable is a 1-based offset in the set of
variable records, including dummy variable records for long string
-variables. The first variable record has a dictionary index of 0, the
-second has a dictionary index of 1, and so on.
+variables. The first variable record has a dictionary index of 1, the
+second has a dictionary index of 2, and so on.
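
The following C sketch assigns 1-based dictionary indexes. It assumes
the usual layout in which a numeric variable takes one variable record
and a string variable of width @var{w} (at most 255 bytes) takes
@code{(@var{w} + 7) / 8} records, the first one real and the rest
dummies; very long strings are out of scope. The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper.  widths[i] is 0 for a numeric variable or the
   width in bytes (1 to 255) of a string variable.  Stores in
   dict_index[i] the 1-based dictionary index of variable i's own
   variable record and returns the total number of variable records,
   counting the dummy records that follow strings wider than 8 bytes. */
int
assign_dictionary_indexes (const int *widths, size_t n, int *dict_index)
{
  int next = 1;                 /* Dictionary indexes start at 1. */

  for (size_t i = 0; i < n; i++)
    {
      assert (widths[i] >= 0 && widths[i] <= 255);
      dict_index[i] = next;
      /* Numeric: one record.  String: one record per 8 bytes of width,
         rounded up. */
      next += widths[i] == 0 ? 1 : (widths[i] + 7) / 8;
    }
  return next - 1;
}
```

For example, a numeric variable, a 20-byte string, and another numeric
variable receive dictionary indexes 1, 2, and 5, because the string
occupies records 2 through 4.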
The system file format does not directly support string variables
wider than 255 bytes. Such very long string variables are represented
@tab @code{EDATE}
@item 39
@tab @code{SDATE}
+@item 40
+@tab @code{MTIME}
+@item 41
+@tab @code{YMDHMS}
@end multitable
@end quotation
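
The format types in this table appear in a variable's print and write
formats, each packed into an @code{int32}: the least-significant byte
holds the number of decimal places, the next byte the field width, the
next the format type, and the most-significant byte is unused. A
minimal C sketch of unpacking such a value, with hypothetical names and
covering only type codes 39 through 41 from the table above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parts of a print or write format (hypothetical struct). */
struct packed_format
{
  int type;      /* Format type code, as in the table. */
  int width;     /* Field width. */
  int decimals;  /* Number of decimal places. */
};

/* Unpacks a print or write format stored as an int32. */
struct packed_format
unpack_format (int32_t packed)
{
  uint32_t u = (uint32_t) packed;
  struct packed_format f;

  f.type = (u >> 16) & 0xff;
  f.width = (u >> 8) & 0xff;
  f.decimals = u & 0xff;
  return f;
}

/* Returns the name of format types 39 to 41, or NULL for any other
   code; this sketch deliberately omits the rest of the table. */
const char *
format_name_39_to_41 (int type)
{
  switch (type)
    {
    case 39: return "SDATE";
    case 40: return "MTIME";
    case 41: return "YMDHMS";
    default: return NULL;
    }
}
```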
have value labels, but their value labels are recorded using a
different record type (@pxref{Long String Value Labels Record}).
+ReadStat (@pxref{File Header Record}) writes value labels that label a
+single value more than once. Specifically, it emits value labels whose
+values are longer than the string variable's width but identical when
+truncated to that width, e.g.@: labels for values @code{ABC123} and
+@code{ABC456} for a string variable with width 3. For files written by
+ReadStat, PSPP ignores such duplicate labels.
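
A reader that wants to detect this quirk can compare label values only
over the variable's actual width. The C sketch below marks which labels
to drop; the function name is hypothetical, and keeping the first label
for a value while dropping later ones is an assumed policy, not
necessarily what PSPP does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  values[i] points to a label value of at least
   WIDTH bytes.  Sets keep[i] to false for each value that, truncated to
   WIDTH bytes, matches an earlier kept value.  Returns the number of
   labels kept. */
size_t
drop_truncated_duplicates (const char *const *values, size_t n,
                           size_t width, bool *keep)
{
  size_t kept = 0;

  for (size_t i = 0; i < n; i++)
    {
      keep[i] = true;
      for (size_t j = 0; j < i; j++)
        if (keep[j] && memcmp (values[i], values[j], width) == 0)
          {
            keep[i] = false;   /* Same value once cut to WIDTH bytes. */
            break;
          }
      if (keep[i])
        kept++;
    }
  return kept;
}
```

The nested loop is quadratic, which is acceptable for the small number
of labels a typical variable carries.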
+
The value label record has the following format:
@example
label record are to be applied.
@item int32 vars[];
-A list of dictionary indexes of variables to which to apply the value
+A list of 1-based dictionary indexes of variables to which to apply the value
labels (@pxref{Dictionary Index}). There are @code{var_count}
elements.
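
Because these indexes are 1-based, a reader that stores variable
records in a 0-based array must subtract 1 and should reject indexes
outside the range of records. A minimal C sketch, with hypothetical
names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper.  Converts the 1-based dictionary indexes in
   vars[] into 0-based positions in an array of n_records variable
   records (dummy records included).  Returns false if any index is out
   of range. */
bool
vars_to_positions (const int32_t *vars, int32_t var_count,
                   int32_t n_records, size_t *pos)
{
  assert (var_count >= 0);
  for (int32_t i = 0; i < var_count; i++)
    {
      if (vars[i] < 1 || vars[i] > n_records)
        return false;
      pos[i] = (size_t) vars[i] - 1;   /* 1-based to 0-based. */
    }
  return true;
}
```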
@item int32 n_lines;
Number of lines of documents present. This should be greater than
-zero, but the system file writer that identifies itself as
-@url{https://github.com/WizardMac/ReadStat} writes document records
-with zero @code{n_lines}.
+zero, but ReadStat writes document records with zero @code{n_lines}.
@item char lines[][80];
Document lines. The number of elements is defined by @code{n_lines}.
variable @code{bias} from the file header. For example,
code 105 with bias 100.0 (the normal value) indicates a numeric variable
of value 5.
-One file has been seen written by SPSS 14 that contained such a code
-in a @emph{string} field with the value 0 (after the bias is
-subtracted) as a way of encoding null bytes.
+
+A code of 0 (after subtracting the bias) in a string field encodes
+8 null bytes. This is unusual, since string fields normally hold text,
+but such codes do occur in real system files.
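
The rule for codes 1 through 251 can be sketched in C as follows. This
assumes that each code stands for one 8-byte data element; the function
name is hypothetical, and codes outside 1 to 251 are deliberately left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch for codes 1 to 251 only.  For a numeric field,
   stores code - bias in *number.  For a string field, a code equal to
   the bias stands for 8 null bytes, which are written to out[]; any
   other value in a string field is reported as unexpected by returning
   false. */
bool
decode_small_code (int code, double bias, bool is_string,
                   double *number, char out[8])
{
  assert (code >= 1 && code <= 251);
  double value = code - bias;

  if (!is_string)
    {
      *number = value;
      return true;
    }
  if (value == 0.0)
    {
      memset (out, 0, 8);
      return true;
    }
  return false;
}
```

With the normal bias of 100.0, code 105 decodes to the number 5 in a
numeric field, and code 100 decodes to 8 null bytes in a string field.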
@item 252
End of file. This code may or may not appear at the end of the data
field sum to the size of the system file in bytes.
@end table
-The data header is followed by @code{(ztrailer_ofs - 24) / 24} ZLIB
+The data header is followed by @code{(ztrailer_len - 24) / 24} ZLIB
compressed data blocks. Each ZLIB compressed data block begins with a
ZLIB header as specified in RFC@tie{}1950, e.g.@: hex bytes @code{78
01} (the only header yet observed in practice). Each block
@item int32 n_blocks;
The number of ZLIB compressed data blocks, always exactly
-@code{(ztrailer_ofs - 24) / 24}.
+@code{(ztrailer_len - 24) / 24}.
@end table
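
As a consistency check, a reader can derive the block count from
@code{ztrailer_len} and compare it with @code{n_blocks}, and can
sanity-check each block's two-byte ZLIB header with the rule from
RFC@tie{}1950: compression method 8, a window-size field of at most 7,
and the 16-bit header value a multiple of 31. A minimal C sketch with
hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the number of ZLIB data blocks implied by ztrailer_len, or -1
   if the length cannot be a valid trailer (a 24-byte fixed header plus
   24 bytes per block). */
int64_t
zlib_block_count (int64_t ztrailer_len)
{
  if (ztrailer_len < 24 || (ztrailer_len - 24) % 24 != 0)
    return -1;
  return (ztrailer_len - 24) / 24;
}

/* RFC 1950 header check on the first two bytes of a block: method 8
   (deflate) in the low nibble of CMF, window size field at most 7 in
   the high nibble, and CMF * 256 + FLG divisible by 31. */
bool
plausible_zlib_header (uint8_t cmf, uint8_t flg)
{
  return (cmf & 0x0f) == 8
         && (cmf >> 4) <= 7
         && (cmf * 256 + flg) % 31 == 0;
}
```

For example, the header @code{78 01} passes, since @code{0x7801} is
30721, which is @math{31 @times{} 991}.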
The fixed header is followed by @code{n_blocks} 24-byte ZLIB data