that describes how they may be interpreted. This chapter describes
the format of a system file.
-System files use three data types: 8-bit characters, 32-bit integers,
-and 64-bit floating points, called here @code{char}, @code{int32}, and
+System files use four data types: 8-bit characters, 32-bit integers,
+64-bit integers,
+and 64-bit floating points, called here @code{char}, @code{int32},
+@code{int64}, and
@code{flt64}, respectively. Data is not necessarily aligned on a word
or double-word boundary: the long variable name record (@pxref{Long
Variable Names Record}) and very long string records (@pxref{Very Long
Document record, if present.
@item
-Any of the following records, if present, in any order:
-
-@itemize @minus
-@item
-Machine integer info record.
-
-@item
-Machine floating-point info record.
-
-@item
-Variable display parameter record.
-
-@item
-Long variable names record.
-
-@item
-Miscellaneous informational records.
-@end itemize
+Any records not explicitly included in this list, in any order.
@item
Dictionary termination record.
* Long Variable Names Record::
* Very Long String Record::
* Character Encoding Record::
+* Long String Value Labels Record::
* Data File and Variable Attributes Records::
+* Extended Number of Cases Record::
* Miscellaneous Informational Records::
* Dictionary Termination Record::
* Data Record::
that will be output to a system file at the time that the header is
written. The way that this is dealt with is by writing the entire
system file, including the header, then seeking back to the beginning of
-the file and writing just the @code{ncases} field. For `files' in which
+the file and writing just the @code{ncases} field. For files in which
this is not valid, the seek operation fails. In this case,
@code{ncases} remains -1.
The first @code{label_len} characters are the variable's variable label.
@item flt64 missing_values[];
-This field is present only if @code{n_missing_values} is not 0. It has
-the same number of elements as the absolute value of
-@code{n_missing_values}. For discrete missing values, each element
-represents one missing value. When a range is present, the first
-element denotes the minimum value in the range, and the second element
-denotes the maximum value in the range. When a range plus a value are
-present, the third element denotes the additional discrete missing
-value. HIGHEST and LOWEST are indicated as described in the chapter
-introduction.
+This field is present only if @code{n_missing_values} is nonzero. It
+has the same number of 8-byte elements as the absolute value of
+@code{n_missing_values}. Each element is interpreted as a number for
+numeric variables (with HIGHEST and LOWEST indicated as described in
+the chapter introduction). For string variables of width less than 8
+bytes, elements are right-padded with spaces; for string variables
+wider than 8 bytes, only the first 8 bytes of each missing value are
+specified, with the remainder implicitly all spaces.
+
+For discrete missing values, each element represents one missing
+value. When a range is present, the first element denotes the minimum
+value in the range, and the second element denotes the maximum value
+in the range. When a range plus a value are present, the third
+element denotes the additional discrete missing value.
@end table
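The string rules above can be illustrated with a minimal Python sketch.
The function names @code{encode_string_missing_value} and
@code{decode_string_missing_value} are hypothetical, chosen only for
this example, and values are assumed to be already available as raw
bytes in the file's character encoding.

```python
def encode_string_missing_value(value):
    """Return the 8-byte element that stores a string missing value."""
    head, tail = value[:8], value[8:]
    # Only the first 8 bytes are stored; the rest is implicitly spaces,
    # so anything else past byte 8 cannot be represented.
    if tail.strip(b" "):
        raise ValueError("bytes past the first 8 must all be spaces")
    # Narrow values are right-padded with spaces to 8 bytes.
    return head.ljust(8, b" ")


def decode_string_missing_value(element, width):
    """Expand an 8-byte element to the variable's full width in bytes."""
    if len(element) != 8:
        raise ValueError("each missing value element is exactly 8 bytes")
    # Narrow variables drop the padding; wide ones gain trailing spaces.
    return element[:width].ljust(width, b" ")
```

For instance, under these assumptions a width-2 variable's missing
value @code{ab} is stored as @code{ab} followed by six spaces, and
decoding that element with width 2 yields @code{ab} again.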
The @code{print} and @code{write} members of sysfile_variable are output
@node Value Labels Records
@section Value Labels Records
+The value label records documented in this section are used for
+numeric and short string variables only. Long string variables may
+have value labels, but their value labels are recorded using a
+different record type (@pxref{Long String Value Labels Record}).
+
The value label record has the following format:
@example
labels (@pxref{Dictionary Index}). There are @code{var_count}
elements.
-String variables wider than 8 bytes may not have value labels.
+String variables wider than 8 bytes may not be specified in this list.
@end table
@node Document Record
This record is not present in files generated by older software.
See also @ref{character-code}.
+@node Long String Value Labels Record
+@section Long String Value Labels Record
+
+This record, if present, specifies value labels for long string
+variables.
+
+@example
+/* @r{Header.} */
+int32 rec_type;
+int32 subtype;
+int32 size;
+int32 count;
+
+/* @r{Repeated until exactly @code{count} bytes have been read.} */
+int32 var_name_len;
+char var_name[];
+int32 var_width;
+int32 n_labels;
+long_string_label labels[];
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 21.
+
+@item int32 size;
+Size of each element, in bytes. Always set to 1.
+
+@item int32 count;
+The number of bytes following the header until the next header.
+
+@item int32 var_name_len;
+@itemx char var_name[];
+The number of bytes in the name of the variable that has long string
+value labels, plus the variable name itself, which consists of exactly
+@code{var_name_len} bytes. The variable name is not padded to any
+particular boundary, nor is it null-terminated.
+
+@item int32 var_width;
+The width of the variable, in bytes, which will be between 9 and
+32767.
+
+@item int32 n_labels;
+@itemx long_string_label labels[];
+The long string labels themselves. The @code{labels} array contains
+exactly @code{n_labels} elements, each of which has the following
+substructure:
+
+@example
+int32 value_len;
+char value[];
+int32 label_len;
+char label[];
+@end example
+
+@table @code
+@item int32 value_len;
+@itemx char value[];
+The string value being labeled. @code{value_len} is the number of
+bytes in @code{value}; it is equal to @code{var_width}. The
+@code{value} array is not padded or null-terminated.
+
+@item int32 label_len;
+@itemx char label[];
+The label for the string value. @code{label_len}, which must be
+between 0 and 120, is the number of bytes in @code{label}. The
+@code{label} array is not padded or null-terminated.
+@end table
+@end table
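The layout described in this section can be walked with a short Python
sketch such as the following. It is not taken from any existing reader:
the function name @code{parse_long_string_value_labels} is hypothetical,
the input is assumed to be the @code{count} bytes that follow the
header, and little-endian integers are assumed by default (a real
reader would use the byte order detected for the file).

```python
import struct


def parse_long_string_value_labels(payload, endian="<"):
    """Parse the bytes that follow the record header.

    Returns a list of (var_name, var_width, labels) tuples, where
    labels is a list of (value, label) pairs of raw bytes.
    """
    def read_int32(pos):
        (value,) = struct.unpack_from(endian + "i", payload, pos)
        return value, pos + 4

    def read_bytes(pos, length):
        if length < 0 or pos + length > len(payload):
            raise ValueError("truncated or corrupt record")
        return payload[pos:pos + length], pos + length

    result = []
    pos = 0
    # The per-variable structure repeats until the payload is used up.
    while pos < len(payload):
        name_len, pos = read_int32(pos)
        name, pos = read_bytes(pos, name_len)
        var_width, pos = read_int32(pos)
        n_labels, pos = read_int32(pos)
        labels = []
        for _ in range(n_labels):
            # Neither field is padded or null-terminated.
            value_len, pos = read_int32(pos)
            value, pos = read_bytes(pos, value_len)
            label_len, pos = read_int32(pos)
            label, pos = read_bytes(pos, label_len)
            labels.append((value, label))
        result.append((name, var_width, labels))
    return result
```

The sketch leaves names, values, and labels as raw bytes, since
decoding them depends on the file's character encoding. It also does
not check that @code{value_len} equals @code{var_width} or that
@code{label_len} is at most 120; a stricter reader might reject
records that violate those constraints.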
@node Data File and Variable Attributes Records
@section Data File and Variable Attributes Records
00000030 0a 29 |.) |
@end example
+@node Extended Number of Cases Record
+@section Extended Number of Cases Record
+
+The file header record expresses the number of cases in the system
+file as an int32 (@pxref{File Header Record}). This record allows the
+number of cases in the system file to be expressed as a 64-bit number.
+
+@example
+int32 rec_type;
+int32 subtype;
+int32 size;
+int32 count;
+int64 unknown;
+int64 ncases64;
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 16.
+
+@item int32 size;
+Size of each element. Always set to 8.
+
+@item int32 count;
+Number of pieces of data in the data part. Always set to 2.
+
+@item int64 unknown;
+Meaning unknown. Always set to 1.
+
+@item int64 ncases64;
+Number of cases in the file as a 64-bit integer. Presumably this
+could be -1 to indicate that the number of cases is unknown, for the
+same reason as @code{ncases} in the file header record, but this has
+not been observed in the wild.
+@end table
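
As a sketch of how a reader might decode this record, the Python
fragment below unpacks the six fields and checks the constant ones.
The function name @code{parse_extended_ncases} is hypothetical,
little-endian byte order is an assumption of the example, and treating
-1 as ``unknown'' follows the presumption stated above rather than
observed behavior.

```python
import struct


def parse_extended_ncases(record, endian="<"):
    """Return ncases64 from a 32-byte record, or None if it is -1."""
    if len(record) != 32:
        raise ValueError("expected 4 int32 fields plus 2 int64 fields")
    # Four int32 header fields followed by two int64 data elements.
    rec_type, subtype, size, count, unknown, ncases64 = struct.unpack(
        endian + "4i2q", record)
    if (rec_type, subtype) != (7, 16):
        raise ValueError("not an extended number of cases record")
    if (size, count) != (8, 2):
        raise ValueError("unexpected element size or count")
    # Assumption: -1 means "unknown", as for ncases in the file header.
    return None if ncases64 == -1 else ncases64
```

The @code{unknown} field is read but otherwise ignored here, because
its meaning is not documented.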
+
@node Miscellaneous Informational Records
@section Miscellaneous Informational Records
variable @code{bias} from the file header. For example,
code 105 with bias 100.0 (the normal value) indicates a numeric variable
of value 5.
+At least one file written by SPSS 14 has been seen that uses such a
+code in a @emph{string} field, with the value 0 (after the bias is
+subtracted), as a way of encoding null bytes.
@item 252
End of file. This code may or may not appear at the end of the data