has actually been observed in system files, and it is likely that
other formats are obsolete or were never used.
-The PSPP system-missing value is represented by the largest possible
-negative number in the floating point format (@code{-DBL_MAX}). Two
-other values are important for use as missing values: @code{HIGHEST},
-represented by the largest possible positive number (@code{DBL_MAX}),
-and @code{LOWEST}, represented by the second-largest negative number
-(in IEEE 754 format, @code{0xffeffffffffffffe}).
+System files use a few floating point values for special purposes:
+
+@table @asis
+@item SYSMIS
+The system-missing value is represented by the largest possible
+negative number in the floating point format (@code{-DBL_MAX}).
+
+@item HIGHEST
+HIGHEST is used as the high end of a missing value range with an
+unbounded maximum. It is represented by the largest possible positive
+number (@code{DBL_MAX}).
+
+@item LOWEST
+LOWEST is used as the low end of a missing value range with an
+unbounded minimum. It was originally represented by the
+second-largest negative number (in IEEE 754 format,
+@code{0xffeffffffffffffe}). System files written by SPSS 21 and later
+instead use the largest negative number (@code{-DBL_MAX}), the same
+value as SYSMIS. This does not lead to ambiguity because LOWEST
+appears in system files only in missing value ranges, which never
+contain SYSMIS.
+@end table
System files are divided into records, each of which begins with a
4-byte record type, usually regarded as an @code{int32}.
Document record, if present.
@item
-Any records not explicitly included in this list, in any order.
+Extension (type 7) records, in ascending numerical order of their
+subtypes.
@item
Dictionary termination record.
* Machine Integer Info Record::
* Machine Floating-Point Info Record::
* Multiple Response Sets Records::
+* Extra Product Info Record::
* Variable Display Parameter Record::
* Long Variable Names Record::
* Very Long String Record::
@table @code
@item char rec_type[4];
-Record type code, set to @samp{$FL2}.
+Record type code, set to @samp{$FL2}, that is, either @code{24 46 4c
+32} if the file uses an ASCII-based character encoding, or @code{5b c6
+d3 f2} if the file uses an EBCDIC-based character encoding.
@item char prod_name[60];
Product identification string. This always begins with the characters
File label declared by the user, if any (@pxref{FILE LABEL,,,pspp,
PSPP Users Guide}). Padded on the right with spaces.
+A product that identifies itself as @code{VOXCO INTERVIEWER 4.3} uses
+CR-only line ends in this field, rather than the more usual LF-only or
+CR LF line ends.
+
@item char padding[3];
Ignored padding bytes to make the structure a multiple of 32 bits in
length. Set to zeros.
@end multitable
@end quotation
+A few system files have been observed in the wild with invalid
+@code{write} fields, in particular with value 0. Readers should
+probably treat invalid @code{print} or @code{write} fields as some
+default format.
+
@node Value Labels Records
@section Value Labels Records
Machine endianness. 1 indicates big-endian, 2 indicates little-endian.
@item int32 character_code;
-@anchor{character-code}
-Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3
-indicates 8-bit ASCII, 4 indicates DEC Kanji.
-Windows code page numbers are also valid.
-
-Experience has shown that in many files, this field is ignored or incorrect.
-For a more reliable indication of the file's character encoding
-see @ref{Character Encoding Record}.
+@anchor{character-code} Character code. The following values have
+been actually observed in system files:
+
+@table @asis
+@item 1
+EBCDIC.
+
+@item 2
+7-bit ASCII.
+
+@item 1250
+The @code{windows-1250} code page for Central European and Eastern
+European languages.
+
+@item 1252
+The @code{windows-1252} code page for Western European languages.
+
+@item 28591
+ISO 8859-1.
+
+@item 65001
+UTF-8.
+@end table
+
+The following additional values are known to be defined:
+
+@table @asis
+@item 3
+8-bit ``ASCII''.
+
+@item 4
+DEC Kanji.
+@end table
+
+Other Windows code page numbers are known to be generally valid.
+
+Old versions of SPSS for Unix and Windows always wrote value 2 in this
+field, regardless of the encoding in use. Newer versions also write
+the character encoding as a string (see @ref{Character Encoding
+Record}).
@end table
@node Machine Floating-Point Info Record
@itemize @bullet
@item
-The set's name (an identifier that begins with @samp{$}).
+The set's name (an identifier that begins with @samp{$}), in mixed
+upper and lower case.
@item
An equals sign (@samp{=}).
A space.
@item
-The names of the variables in the set, each separated from the
-previous by a single space.
+The short names of the variables in the set, converted to lowercase,
+each separated from the previous by a single space.
@item
A line feed (byte 0x0a).
$e=E 11 6 choice 0 n o p
@end example
+@node Extra Product Info Record
+@section Extra Product Info Record
+
+This optional record appears to contain a text string that describes
+the program that wrote the file and the source of the data. (This is
+redundant with the file label and product info found in the file
+header record.)
+
+@example
+/* @r{Header.} */
+int32 rec_type;
+int32 subtype;
+int32 size;
+int32 count;
+
+/* @r{Exactly @code{count} bytes of data.} */
+char info[];
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 10.
+
+@item int32 size;
+The size of each element in the @code{info} member. Always set to 1.
+
+@item int32 count;
+The total number of bytes in @code{info}.
+
+@item char info[];
+A text string. A product that identifies itself as @code{VOXCO
+INTERVIEWER 4.3} uses CR-only line ends in this field, rather than the
+more usual LF-only or CR LF line ends.
+@end table
+
@node Variable Display Parameter Record
@section Variable Display Parameter Record
Continuous Scale
@end table
-SPSS 14 sometimes writes a @code{measure} of 0. PSPP interprets this
-as nominal scale.
+SPSS sometimes writes a @code{measure} of 0. PSPP interprets this as
+nominal scale.
@item int32 width;
The width of the display column for the variable in characters.
The total number of bytes in @code{encoding}.
@item char encoding[];
-The name of the character encoding. Normally this will be an official IANA characterset name or alias.
+The name of the character encoding. Normally this will be an official
+IANA character set name or alias.
See @url{http://www.iana.org/assignments/character-sets}.
+Character set names are not case-sensitive, but SPSS appears to write
+them in all-uppercase.
@end table
-This record is not present in files generated by older software.
-See also @ref{character-code}.
+This record is not present in files generated by older software. See
+also the @code{character_code} field in the machine integer info
+record (@pxref{character-code}).
+
+When the character encoding record and the machine integer info record
+are both present, all system files observed in practice indicate the
+same character encoding, e.g.@: 1252 as @code{character_code} and
+@code{windows-1252} as @code{encoding}, 65001 and @code{UTF-8}, etc.
+
+If, for testing purposes, a file is crafted with different
+@code{character_code} and @code{encoding}, it seems that
+@code{character_code} controls the encoding for all strings in the
+system file before the dictionary termination record, including
+strings in data (e.g.@: string missing values), and @code{encoding}
+controls the encoding for strings following the dictionary termination
+record.
@node Long String Value Labels Record
@section Long String Value Labels Record
In record type 18, this field contains a sequence of one or more
variable attribute sets. If more than one variable attribute set is
present, each one after the first is delimited from the previous by
-@code{/}. Each variable attribute set consists of a (potentially
-long) variable name,
+@code{/}. Each variable attribute set consists of a long
+variable name,
followed by @code{:}, followed by an attribute set with the same
syntax as on record type 17.
00000030 0a 29 |.) |
@end example
+@menu
+* Variable Roles::
+@end menu
+
+@node Variable Roles
+@subsection Variable Roles
+
+A variable's role is represented as an attribute named @code{$@@Role}.
+This attribute has a single element whose values and their meanings
+are:
+
+@table @code
+@item 0
+Input. This, the default, is the most common role.
+@item 1
+Output.
+@item 2
+Both.
+@item 3
+None.
+@item 4
+Partition.
+@item 5
+Split.
+@end table
+
@node Extended Number of Cases Record
@section Extended Number of Cases Record
@item int32 subtype;
Record subtype. May take any value. According to Aapi
H@"am@"al@"ainen, value 5 indicates a set of grouped variables and 6
-indicates date info (probably related to USE).
+indicates date info (probably related to USE). Subtype 24 appears to
+contain XML that describes how data in the file should be displayed
+on-screen.
@item int32 size;
Size of each piece of data in the data part. Should have the value 1,