@c
@node System File Format
-@appendix System File Format
+@chapter System File Format
-A system file encapsulates a set of cases and dictionary information
-that describes how they may be interpreted. This chapter describes
-the format of a system file.
+An SPSS system file holds a set of cases and dictionary information
+that describes how they may be interpreted. The system file format
+dates back more than 40 years and has evolved considerably over that
+time to support new features, but in ways that facilitate interchange
+between even the oldest and newest versions of the software. This
+chapter describes the system file format.
System files use four data types: 8-bit characters, 32-bit integers,
64-bit integers,
* Multiple Response Sets Records::
* Extra Product Info Record::
* Variable Display Parameter Record::
+* Variable Sets Record::
* Long Variable Names Record::
* Very Long String Record::
* Character Encoding Record::
@table @code
@item int32 measure;
-The measurement type of the variable:
+The measurement level of the variable:
@table @asis
+@item 0
+Unknown
@item 1
-Nominal Scale
+Nominal
@item 2
-Ordinal Scale
+Ordinal
@item 3
-Continuous Scale
+Scale
@end table
-SPSS sometimes writes a @code{measure} of 0. PSPP interprets this as
-nominal scale.
+An ``unknown'' @code{measure} of 0 means that the variable was created
+in some way that doesn't make the measurement level clear, e.g.@: with
+a @code{COMPUTE} transformation. PSPP sets the measurement level the
+first time it reads the data, using the rules documented in
+@ref{Measurement Level,,,pspp, PSPP Users Guide}, so this should
+rarely appear.
@item int32 width;
The width of the display column for the variable in characters.
@end table
@end table
+@node Variable Sets Record
+@section Variable Sets Record
+
+The SPSS GUI lets users arrange variables into sets.
+Users may enable and disable sets individually, and the data editor
+and analysis dialog boxes show only enabled sets. Command syntax does
+not use variable sets.
+
+The variable sets record, if present, has the following format:
+
+@example
+/* @r{Header.} */
+int32 rec_type;
+int32 subtype;
+int32 size;
+int32 count;
+
+/* @r{Exactly @code{count} bytes of text.} */
+char text[];
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 5.
+
+@item int32 size;
+The size of each element in the @code{text} member. Always set to 1.
+
+@item int32 count;
+The total number of bytes in @code{text}.
+
+@item char text[];
+The variable sets, in a text-based format.
+
+Each variable set occupies one line of text, each of which ends with a
+line feed (byte 0x0a), optionally preceded by a carriage return (byte
+0x0d).
+
+Each line begins with the name of the variable set, followed by an
+equals sign (@samp{=}) and a space (byte 0x20), followed by the long
+variable names of the members of the set, separated by spaces. A
+variable set may be empty, in which case the equals sign and the space
+following it are still present.
+@end table
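+
+The C functions below are a minimal sketch of one way to read
+@code{text}. They are not PSPP's implementation. The names
+@code{parse_variable_sets} and @code{struct var_set}, along with the
+size limits, are hypothetical. The sketch assumes that set names
+contain no @samp{=}. It tolerates runs of spaces between member names
+and rejects a final line that lacks its line feed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limits chosen for this sketch only. */
#define MAX_SETS 8
#define MAX_NAME 65     /* up to 64 bytes plus a null terminator */
#define MAX_MEMBERS 32

struct var_set {
    char name[MAX_NAME];
    int n_members;
    char members[MAX_MEMBERS][MAX_NAME];
};

/* Copies LEN bytes of SRC into DST as a null-terminated string. */
static int copy_token(char *dst, const char *src, size_t len)
{
    if (len == 0 || len >= MAX_NAME)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Parses one line ("NAME= VAR1 VAR2 ...", terminator already removed). */
static int parse_line(const char *line, size_t len, struct var_set *set)
{
    const char *eq = memchr(line, '=', len);
    const char *end = line + len;

    /* Require the equals sign to be followed by a space. */
    if (eq == NULL || eq + 1 >= end || eq[1] != ' ')
        return -1;
    if (copy_token(set->name, line, (size_t)(eq - line)) < 0)
        return -1;

    set->n_members = 0;
    for (const char *p = eq + 2; p < end; ) {
        if (*p == ' ') {        /* tolerate extra spaces between names */
            p++;
            continue;
        }
        const char *start = p;
        while (p < end && *p != ' ')
            p++;
        if (set->n_members >= MAX_MEMBERS
            || copy_token(set->members[set->n_members], start,
                          (size_t)(p - start)) < 0)
            return -1;
        set->n_members++;
    }
    return 0;
}

/* Parses COUNT bytes of TEXT into SETS.  Returns the number of sets,
   or -1 on malformed input or if there are more than MAX_SETS_IN. */
int parse_variable_sets(const char *text, size_t count,
                        struct var_set *sets, int max_sets)
{
    int n = 0;
    size_t pos = 0;

    while (pos < count) {
        const char *nl = memchr(text + pos, '\n', count - pos);
        if (nl == NULL)
            return -1;          /* every line must end with a line feed */

        size_t len = (size_t)(nl - (text + pos));
        if (len > 0 && text[pos + len - 1] == '\r')
            len--;              /* drop the optional carriage return */

        if (n >= max_sets || parse_line(text + pos, len, &sets[n]) < 0)
            return -1;
        n++;
        pos += (size_t)(nl - (text + pos)) + 1;
    }
    return n;
}
```

+Returning -1 for any malformed line keeps the sketch short. A real
+reader might instead warn and skip just that set.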
+
@node Long Variable Names Record
@section Long Variable Names Record
int32 var_name_len;
char var_name[];
char n_missing_values;
-long_string_missing_value values[];
+int32 value_len;
+char values[value_len * n_missing_values];
@end example
@table @code
The number of missing values, either 1, 2, or 3. (This is, unusually,
a single byte instead of a 32-bit number.)
-@item long_string_missing_value values[];
-The missing values themselves. This array contains exactly
-@code{n_missing_values} elements, each of which has the following
-substructure:
-
-@example
-int32 value_len;
-char value[];
-@end example
-
-@table @code
@item int32 value_len;
-The length of the missing value string, in bytes. This value should
+The length of each missing value string, in bytes. This value should
be 8, because long string variables are at least 8 bytes wide (by
definition), only the first 8 bytes of a long string variable's
missing values are allowed to be non-spaces, and any spaces within the
first 8 bytes are included in the missing value here.
-@item char value[];
-The missing value string, exactly @code{value_len} bytes, without
-any padding or null terminator.
-@end table
+@item char values[value_len * n_missing_values];
+The missing values themselves, without any padding or null
+terminators.
@end table
+An earlier version of this document stated that @code{value_len} was
+repeated before each of the missing values, so that there was an extra
+@code{int32} value of 8 before each missing value after the first.
+Old versions of PSPP wrote data files in this format. Readers can
+tolerate this mistake, if they wish, by noticing and skipping the
+extra @code{int32} values, which wouldn't ordinarily occur in strings.
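+
+The C function below is a minimal sketch of such a reader. It is not
+PSPP's code. It parses one variable's entry, starting just after the
+record header. It accepts both the documented layout and the old PSPP
+layout by skipping a repeated @code{value_len} before the second and
+third values. The function name @code{parse_lsmv_entry} and
+@code{struct lsmv_entry} are hypothetical. The sketch also makes a few
+assumptions of its own: integers are little-endian, variable names are
+at most 64 bytes long, and any @code{value_len} above 8 is rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_VAR_NAME 64         /* assumed limit for this sketch */

struct lsmv_entry {
    char name[MAX_VAR_NAME + 1];
    int n_values;               /* 1, 2, or 3 */
    size_t value_len;           /* normally 8 */
    char values[3][9];          /* null-terminated copies */
};

/* Decodes a little-endian int32 (byte order is an assumption here). */
static int32_t get_i32(const unsigned char *p)
{
    return (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8
                     | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/* Parses the entry at BUF[*POS] (BUF holds SIZE bytes).  On success,
   stores it in *E, advances *POS past it, and returns 0; otherwise
   returns -1 and leaves *POS unchanged. */
int parse_lsmv_entry(const unsigned char *buf, size_t size, size_t *pos,
                     struct lsmv_entry *e)
{
    size_t p = *pos;

    if (p > size || size - p < 4)
        return -1;
    int32_t name_len = get_i32(buf + p);
    p += 4;
    /* Name, then the one-byte n_missing_values, then int32 value_len. */
    if (name_len < 1 || name_len > MAX_VAR_NAME
        || size - p < (size_t)name_len + 1 + 4)
        return -1;
    memcpy(e->name, buf + p, (size_t)name_len);
    e->name[name_len] = '\0';
    p += (size_t)name_len;

    e->n_values = buf[p++];
    if (e->n_values < 1 || e->n_values > 3)
        return -1;

    int32_t value_len = get_i32(buf + p);
    p += 4;
    if (value_len < 1 || value_len > 8)
        return -1;
    e->value_len = (size_t)value_len;

    for (int i = 0; i < e->n_values; i++) {
        /* Old PSPP wrote value_len again before each value after the
           first; these bytes would not ordinarily begin a string. */
        if (i > 0 && size - p >= 4 && get_i32(buf + p) == value_len)
            p += 4;
        if (size - p < (size_t)value_len)
            return -1;
        memcpy(e->values[i], buf + p, (size_t)value_len);
        e->values[i][value_len] = '\0';
        p += (size_t)value_len;
    }
    *pos = p;
    return 0;
}
```

+Because the entry length differs between the two layouts, a reader
+that uses this approach must advance by the bytes actually consumed
+rather than by a computed fixed size.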
+
@node Data File and Variable Attributes Records
@section Data File and Variable Attributes Records
the following believed meanings:
@table @asis
-@item 5
-A named variable set for use in the GUI (according to Aapi
-H@"am@"al@"ainen).
-
@item 6
Date info, probably related to USE (according to Aapi H@"am@"al@"ainen).