+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2019 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
+
@node System File Format
@appendix System File Format
the format of a system file.
System files use four data types: 8-bit characters, 32-bit integers,
-64-bit integers,
+64-bit integers,
and 64-bit floating points, called here @code{char}, @code{int32},
@code{int64}, and
@code{flt64}, respectively. Data is not necessarily aligned on a word
Extension (type 7) records, in ascending numerical order of their
subtypes.
+System files written by SPSS include at most one of each kind of
+extension record. This is generally true of system files written by
+other software as well, with known exceptions noted below in the
+individual sections about each type of record.
+
@item
Dictionary termination record.
would be longer than 60 characters; otherwise it is padded on the right
with spaces.
+The product name field allows readers to behave differently based on
+quirks in the way that particular software writes system files.
+@xref{Value Labels Records}, for details of the quirk that the PSPP
+system file reader tolerates in files written by ReadStat, which has
+@code{https://github.com/WizardMac/ReadStat} in @code{prod_name}.
+
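As a minimal sketch of such a check, the C fragment below tests whether the 60-byte @code{prod_name} field contains the ReadStat URL. Because the field is space-padded rather than NUL-terminated, it avoids @code{strstr}. The function names are hypothetical and not taken from PSPP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PROD_NAME_LEN 60

/* Returns true if the PROD_NAME_LEN-byte FIELD contains NEEDLE.
   FIELD is space-padded and not NUL-terminated, so strstr() cannot be
   used; memmem() would work but is not part of ISO C. */
static bool
field_contains (const char *field, const char *needle)
{
  size_t n = strlen (needle);
  if (n > PROD_NAME_LEN)
    return false;
  for (size_t i = 0; i + n <= PROD_NAME_LEN; i++)
    if (memcmp (field + i, needle, n) == 0)
      return true;
  return false;
}

/* Hypothetical check for system files written by ReadStat. */
static bool
written_by_readstat (const char *prod_name)
{
  return field_contains (prod_name, "https://github.com/WizardMac/ReadStat");
}
```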
@anchor{layout_code}
@item int32 layout_code;
Normally set to 2, although a few system files have been spotted in
Number of data elements per case. This is the number of variables,
except that long string variables add extra data elements (one for every
8 characters after the first 8). However, string variables do not
-contribute to this value beyond the first 255 bytes. Further, system
-files written by some systems set this value to -1. In general, it is
+contribute to this value beyond the first 255 bytes. Further, some
+software always writes -1 or 0 in this field. In general, it is
unsafe for systems reading system files to rely upon this value.
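The counting rule above can be sketched in C as follows, assuming each variable is described only by its width (0 for numeric, otherwise the string width in bytes). The helper names are hypothetical; since the header value may be -1 or 0, a reader would treat a mismatch as a hint at most.

```c
#include <stddef.h>

/* Data elements contributed by one variable: WIDTH is 0 for a numeric
   variable, otherwise the string width in bytes.  Bytes beyond the
   first 255 do not count. */
static int
elements_for_width (int width)
{
  if (width == 0)
    return 1;                   /* a numeric variable is one element */
  if (width > 255)
    width = 255;
  return (width + 7) / 8;       /* first 8 bytes, plus one per further 8 */
}

/* Hypothetical: the value that nominal_case_size would ideally hold. */
static int
expected_case_size (const int *widths, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += elements_for_width (widths[i]);
  return total;
}
```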
@item int32 compression;
same way as other variable records.
@anchor{Dictionary Index}
-The @dfn{dictionary index} of a variable is its offset in the set of
+The @dfn{dictionary index} of a variable is a 1-based offset in the set of
variable records, including dummy variable records for long string
-variables. The first variable record has a dictionary index of 0, the
-second has a dictionary index of 1, and so on.
+variables. The first variable record has a dictionary index of 1, the
+second has a dictionary index of 2, and so on.
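Given the widths of the real variables, the dictionary index of each one follows from counting the records before it. A minimal sketch, assuming widths of at most 255 bytes (0 for numeric) and that a string variable is followed by one dummy record per 8 bytes beyond its first 8; @code{dictionary_index} is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: 1-based dictionary index of the variable at
   0-based position K among real variables.  WIDTHS[i] is 0 for a
   numeric variable, otherwise the string width (at most 255).  A string
   of width W uses one variable record plus (W - 1) / 8 dummy records,
   that is, (W + 7) / 8 records in total. */
static int
dictionary_index (const int *widths, size_t k)
{
  int index = 1;                /* the first record has index 1 */
  for (size_t i = 0; i < k; i++)
    index += widths[i] == 0 ? 1 : (widths[i] + 7) / 8;
  return index;
}
```

For widths 0, 20, 0, and 9, this yields indexes 1, 2, 5, and 6: the 20-byte string occupies records 2 through 4.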
The system file format does not directly support string variables
wider than 255 bytes. Such very long string variables are represented
@tab @code{EDATE}
@item 39
@tab @code{SDATE}
+@item 40
+@tab @code{MTIME}
+@item 41
+@tab @code{YMDHMS}
@end multitable
@end quotation
have value labels, but their value labels are recorded using a
different record type (@pxref{Long String Value Labels Record}).
+ReadStat (@pxref{File Header Record}) writes value labels that label a
+single value more than once. Specifically, for a string variable it
+emits labels whose values are longer than the variable's width but
+identical when truncated to that width, e.g.@: labels for values
+@code{ABC123} and @code{ABC456} for a string variable with width 3.
+For files written by this software, PSPP ignores such labels.
+
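One way a reader could tolerate this, sketched below in C, is to compare string label values only in their first width bytes and drop any label whose truncated value repeats an earlier one. This assumes the first label for each truncated value is the one to keep, which may not match PSPP exactly; @code{dedup_truncated_labels} and @code{struct label} are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One label from a value label record; VALUE is the raw 8-byte field. */
struct label
  {
    char value[8];
    const char *text;
  };

/* Compacts LABELS in place for a string variable of WIDTH bytes (1 to 8),
   comparing values only in their first WIDTH bytes.  Keeps the first
   label for each truncated value and drops later ones.  Returns the
   number of labels kept. */
static size_t
dedup_truncated_labels (struct label *labels, size_t n, size_t width)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    {
      bool dup = false;
      for (size_t j = 0; j < kept; j++)
        if (memcmp (labels[j].value, labels[i].value, width) == 0)
          {
            dup = true;
            break;
          }
      if (!dup)
        labels[kept++] = labels[i];
    }
  return kept;
}
```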
The value label record has the following format:
@example
int32 rec_type;
int32 label_count;
-/* @r{Repeated @code{label_cnt} times}. */
+/* @r{Repeated @code{label_count} times}. */
char value[8];
char label_len;
char label[];
label record are to be applied.
@item int32 vars[];
-A list of dictionary indexes of variables to which to apply the value
+A list of 1-based dictionary indexes of variables to which to apply the value
labels (@pxref{Dictionary Index}). There are @code{var_count}
elements.
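A minimal C sketch of reading the value label record follows. It assumes the record's @code{rec_type} is 3, that @code{label_len} and @code{label} together are padded to a multiple of 8 bytes, and that the file's byte order matches the host's; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One parsed label.  TEXT points into the input buffer and is not
   NUL-terminated; LEN is its length in bytes. */
struct value_label
  {
    uint8_t value[8];
    const char *text;
    size_t len;
  };

/* Reads an int32 in host byte order (an assumption: a real reader must
   also handle files written with the other byte order). */
static int32_t
get_int32 (const uint8_t *p)
{
  int32_t x;
  memcpy (&x, p, sizeof x);
  return x;
}

/* Parses a value label record (assumed rec_type 3) held in the N bytes
   at BUF.  Stores at most MAX labels in OUT and their count in *N_OUT.
   Returns the number of bytes consumed, or 0 if the record is malformed,
   truncated, or has more than MAX labels. */
static size_t
parse_value_labels (const uint8_t *buf, size_t n,
                    struct value_label *out, size_t max, size_t *n_out)
{
  if (n < 8 || get_int32 (buf) != 3)
    return 0;
  int32_t label_count = get_int32 (buf + 4);
  if (label_count < 0 || (size_t) label_count > max)
    return 0;

  size_t pos = 8;
  for (int32_t i = 0; i < label_count; i++)
    {
      if (n - pos < 9)                  /* value[8] plus label_len */
        return 0;
      size_t label_len = buf[pos + 8];
      /* label_len and label[] together are padded to a multiple of 8. */
      size_t padded = (label_len + 1 + 7) / 8 * 8;
      if (n - pos - 8 < padded)
        return 0;
      memcpy (out[i].value, buf + pos, 8);
      out[i].text = (const char *) (buf + pos + 9);
      out[i].len = label_len;
      pos += 8 + padded;
    }
  *n_out = (size_t) label_count;
  return pos;
}
```

Under these assumptions a 3-byte label such as @code{Yes} occupies 16 bytes: the 8-byte value, one length byte, three label bytes, and four padding bytes.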
Record type. Always set to 6.
@item int32 n_lines;
-Number of lines of documents present.
+Number of lines of documents present. This should be greater than
+zero, but ReadStat writes system files with zero @code{n_lines}.
@item char lines[][80];
Document lines. The number of elements is defined by @code{n_lines}.
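Since @code{n_lines} may be zero, a reader could validate it with a check along these lines before reading the document lines. This is a sketch; the helper name and the @code{remaining} parameter (bytes left in the file) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DOC_LINE_LEN 80

/* Hypothetical helper: validates n_lines from a document record and
   stores the size in bytes of the lines[][80] array in *SIZE.  Zero is
   accepted because ReadStat writes it; a negative count, or one whose
   lines would not fit in REMAINING bytes, is rejected. */
static bool
document_lines_size (int32_t n_lines, uint64_t remaining, uint64_t *size)
{
  if (n_lines < 0)
    return false;
  uint64_t bytes = (uint64_t) n_lines * DOC_LINE_LEN;
  if (bytes > remaining)
    return false;
  *size = bytes;
  return true;
}
```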
@item char attributes[];
The attributes, in a text-based format.
-In record type 17, this field contains a single attribute set. An
+In record subtype 17, this field contains a single attribute set. An
attribute set is a sequence of one or more attributes concatenated
together. Each attribute consists of a name, which has the same
syntax as a variable name, followed by, inside parentheses, a sequence
an attribute with a single value and an attribute array with one
element.
-In record type 18, this field contains a sequence of one or more
+In record subtype 18, this field contains a sequence of one or more
variable attribute sets. If more than one variable attribute set is
present, each one after the first is delimited from the previous by
@code{/}. Each variable attribute set consists of a long
variable name,
followed by @code{:}, followed by an attribute set with the same
-syntax as on record type 17.
+syntax as on record subtype 17.
+
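The following C sketch parses one attribute set into (name, value) pairs. It assumes that each value is written as a single quote, the value text, a single quote, and a line feed, and that quotes inside a value are not escaped, so a value ends at the first quote followed by a line feed. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One (attribute name, value) pair; both point into the input text. */
struct attr_value
  {
    const char *name;
    size_t name_len;
    const char *value;
    size_t value_len;
  };

/* Parses the attribute set in S[0..N) into at most MAX pairs in OUT,
   one pair per value (an attribute array yields several pairs with the
   same name).  Returns the number of pairs, or -1 on a syntax error. */
static int
parse_attribute_set (const char *s, size_t n, struct attr_value *out, int max)
{
  size_t pos = 0;
  int count = 0;

  while (pos < n)
    {
      /* The attribute name runs up to '('. */
      const char *lparen = memchr (s + pos, '(', n - pos);
      if (lparen == NULL || lparen == s + pos)
        return -1;
      const char *name = s + pos;
      size_t name_len = (size_t) (lparen - name);
      pos += name_len + 1;

      /* One or more quoted values, then ')'. */
      do
        {
          if (pos >= n || s[pos] != '\'')
            return -1;
          size_t start = ++pos;
          while (pos + 1 < n && !(s[pos] == '\'' && s[pos + 1] == '\n'))
            pos++;
          if (pos + 1 >= n || count >= max)
            return -1;
          out[count++] = (struct attr_value) { name, name_len,
                                               s + start, pos - start };
          pos += 2;                     /* skip the quote and line feed */
        }
      while (pos < n && s[pos] != ')');
      if (pos >= n)
        return -1;
      pos++;                            /* skip ')' */
    }
  return count;
}
```

For subtype 18, a reader would need a variant of this parser that reads each long variable name up to @code{:} and stops at @code{/} after a closing parenthesis; splitting the whole field on @code{/} first would go wrong if an attribute value contains @code{/}.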
+System files written by @code{Stata 14.1/-savespss- 1.77 by
+S.Radyakin} may include multiple records with subtype 18, one per
+variable that has variable attributes.
The total length is @code{count} bytes.
@end table
variable @code{bias} from the file header. For example,
code 105 with bias 100.0 (the normal value) indicates a numeric variable
of value 5.
-One file has been seen written by SPSS 14 that contained such a code
-in a @emph{string} field with the value 0 (after the bias is
-subtracted) as a way of encoding null bytes.
+
+A code of 0 (after subtracting the bias) in a string field encodes
+null bytes. This is unusual, since string fields normally hold text,
+but it has been observed in real system files.
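The string-field case can be sketched as a C helper that turns one code into the 8 bytes it stands for. It assumes that each code represents one 8-byte data element, and it assumes the usual meanings of codes 253 (8 uncompressed bytes follow in the data) and 254 (8 spaces); the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decodes one compression CODE that belongs to a
   string field into the 8 bytes it stands for, storing them in OUT.
   RAW points to the 8 bytes that follow in the data stream and is used
   only for code 253.  Returns 0 on success, -1 for codes not handled. */
static int
decode_string_code (uint8_t code, double bias, const uint8_t raw[8],
                    uint8_t out[8])
{
  if (code == 253)                      /* uncompressed bytes follow */
    memcpy (out, raw, 8);
  else if (code == 254)                 /* eight spaces */
    memset (out, ' ', 8);
  else if (code >= 1 && code <= 251 && code - bias == 0.0)
    memset (out, 0, 8);                 /* numeric 0 in a string: null bytes */
  else
    return -1;                          /* 0, 252, 255, or another number */
  return 0;
}
```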
@item 252
End of file. This code may or may not appear at the end of the data
field sum to the size of the system file in bytes.
@end table
-The data header is followed by @code{(ztrailer_ofs - 24) / 24} ZLIB
+The data header is followed by @code{(ztrailer_len - 24) / 24} ZLIB
compressed data blocks. Each ZLIB compressed data block begins with a
ZLIB header as specified in RFC@tie{}1950, e.g.@: hex bytes @code{78
01} (the only header yet observed in practice). Each block
@item int32 n_blocks;
The number of ZLIB compressed data blocks, always exactly
-@code{(ztrailer_ofs - 24) / 24}.
+@code{(ztrailer_len - 24) / 24}.
@end table
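Both relationships can be checked cheaply. The C sketch below derives the block count from @code{ztrailer_len} and tests a 2-byte ZLIB header against RFC@tie{}1950 (compression method 8, window size at most 32 KiB, header value a multiple of 31). Rejecting a @code{ztrailer_len} that is not a multiple of 24 is a stricter policy than the text states; the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of ZLIB data blocks implied by ZTRAILER_LEN, which covers a
   24-byte fixed header plus one 24-byte entry per block.  Returns -1 if
   ZTRAILER_LEN does not have that form. */
static int64_t
zlib_block_count (int64_t ztrailer_len)
{
  if (ztrailer_len < 24 || ztrailer_len % 24 != 0)
    return -1;
  return (ztrailer_len - 24) / 24;
}

/* RFC 1950 header check on the CMF and FLG bytes: method 8 (deflate),
   CINFO at most 7 (32 KiB window), and CMF * 256 + FLG divisible by 31. */
static bool
is_zlib_header (uint8_t cmf, uint8_t flg)
{
  return (cmf & 0x0f) == 8 && (cmf >> 4) <= 7
         && ((cmf << 8) | flg) % 31 == 0;
}
```

For the observed header @code{78 01}, the 16-bit value is 0x7801 = 30721 = 31 * 991, so the check passes.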
The fixed header is followed by @code{n_blocks} 24-byte ZLIB data