-@node Data File Format, q2c Input Format, Portable File Format, Top
+@node Data File Format
@appendix Data File Format
PSPP necessarily uses the same format for system files as do the
floating-point numbers, and 1-byte characters. In this document these will
simply be referred to as @code{int32}, @code{flt64}, and @code{char},
the names that are used in the PSPP source code. Every field of type
-@code{int32} or @code{flt64} is aligned on a 32-bit boundary.
+@code{int32} or @code{flt64} is aligned on a 32-bit boundary relative to
+the start of the record.
The endianness of data in PSPP system files is not specified. System
files output on a computer of a particular endianness will have the
* Document Record::
* Machine int32 Info Record::
* Machine flt64 Info Record::
-* Auxilliary Variable Parameter Record::
+* Auxiliary Variable Parameter Record::
* Long Variable Names Record::
+* Very Long String Length Record::
* Miscellaneous Informational Records::
* Dictionary Termination Record::
* Data Record::
@end menu
-@node File Header Record, Variable Record, Data File Format, Data File Format
+@node File Header Record
@section File Header Record
The file header is always the first record in the file.
char rec_type[4];
char prod_name[60];
int32 layout_code;
- int32 case_size;
+ int32 nominal_case_size;
int32 compressed;
int32 weight_index;
int32 ncases;
Always set to 2. PSPP reads this value to determine the
file's endianness.
-@item int32 case_size;
+@item int32 nominal_case_size;
Number of data elements per case. This is the number of variables,
except that long string variables add extra data elements (one for every
-8 characters after the first 8).
+8 characters after the first 8). However, string variables do not
+contribute to this value beyond the first 255 bytes. Further, system
+files written by some systems set this value to -1. In general, it is
+unsafe for systems reading system files to rely upon this value.
@item int32 compressed;
Set to 1 if the data in the file is compressed, 0 otherwise.
field is arbitrarily set to @samp{00:00:00}.
@item char file_label[64];
-Set the the file label declared by the user, if any. Padded on the
-right with spaces.
+Set to the file label declared by the user, if any (@pxref{FILE LABEL}).
+Padded on the right with spaces.
@item char padding[3];
Ignored padding bytes to make the structure a multiple of 32 bits in
length. Set to zeros.
@end table
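@code{rec_type} and @code{prod_name} together take up the first 64 bytes of the file, so @code{layout_code} begins at byte offset 64. A reader can interpret those four bytes in both byte orders and see which one gives 2. The following is a minimal C sketch of that check. @code{detect_byte_swap} and @code{swap32} are hypothetical names for this example, not PSPP functions.

```c
#include <stdint.h>
#include <string.h>

/* Reverses the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: RAW holds the four bytes of layout_code exactly as
   they appear in the file.  Returns 0 if the file uses the host's byte
   order, 1 if it uses the opposite byte order, or -1 if neither
   interpretation yields 2. */
int detect_byte_swap(const unsigned char raw[4])
{
  uint32_t value;
  memcpy(&value, raw, sizeof value);   /* host-order interpretation */
  if (value == 2)
    return 0;
  if (swap32(value) == 2)
    return 1;
  return -1;
}
```

If the function returns 1, the reader must byte-swap every @code{int32} and @code{flt64} it reads from the file.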
-@node Variable Record, Value Label Record, File Header Record, Data File Format
+@node Variable Record
@section Variable Record
Immediately following the header must come the variable records. There
must be one variable record for every variable and every 8 characters in
-a long string beyond the first 8; i.e., there must be exactly as many
-variable records as the value specified for @code{case_size} in the file
-header record.
+a long string beyond the first 8.
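By this rule, a string variable of width @var{w} needs one record for its first 8 characters and one more for each further 8 characters or part thereof. That works out to @var{w}/8 rounded up. Below is a minimal C sketch. It assumes, as a convention of this example only, that width 0 marks a numeric variable and that no string is wider than 255 bytes. @code{records_for_width} and @code{count_variable_records} are hypothetical helpers.

```c
#include <stddef.h>

/* Number of variable records one variable needs.  Width 0 means numeric
   (an assumption of this sketch); widths 1..255 are string widths. */
int records_for_width(int width)
{
  if (width == 0)
    return 1;                /* numeric: a single 8-byte element */
  return (width + 7) / 8;    /* string: one record per 8 characters */
}

/* Total number of variable records for N variables with the given
   widths, i.e. the number of 8-byte data elements per case. */
int count_variable_records(const int *widths, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += records_for_width(widths[i]);
  return total;
}
```

For example, a numeric variable, an 8-byte string, and a 20-byte string need 1, 1, and 3 records, for 5 in total. For files without strings wider than 255 bytes, that total is the quantity @code{nominal_case_size} describes. Because some writers store -1 in that field, it is safer to compute the total this way.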
@example
struct sysfile_variable
/* The following field is present only
if n_missing_values is not 0. */
- flt64 missing_values[/* variable length*/];
+ flt64 missing_values[/* variable length */];
@};
@end example
@item char name[8];
Variable name. The variable name must begin with a capital letter or
-the at-sign (@samp{@@}). Subsequent characters may also be octothorpes
+the at-sign (@samp{@@}). Subsequent characters may also be digits, octothorpes
(@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full
stops (@samp{.}). The variable name is padded on the right with spaces.
@code{SDATE}
@end table
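The naming rule for @code{name} can be checked with a short function. This C sketch reads ``may also be'' as adding to the characters allowed in the first position, so capital letters and @samp{@@} remain valid later in the name. That reading is an assumption. @code{is_valid_short_name} is a hypothetical helper that takes the raw 8-byte, space-padded field.

```c
#include <stdbool.h>

static bool is_upper(char c) { return c >= 'A' && c <= 'Z'; }

/* Checks an 8-byte, right-space-padded name field against the rules
   above.  Embedded spaces are rejected because only trailing padding
   is stripped. */
bool is_valid_short_name(const char name[8])
{
  int len = 8;
  while (len > 0 && name[len - 1] == ' ')   /* strip right padding */
    len--;
  if (len == 0)
    return false;
  if (!is_upper(name[0]) && name[0] != '@')
    return false;
  for (int i = 1; i < len; i++)
    {
      char c = name[i];
      bool ok = is_upper(c) || c == '@' || (c >= '0' && c <= '9')
                || c == '#' || c == '$' || c == '_' || c == '.';
      if (!ok)
        return false;
    }
  return true;
}
```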
-@node Value Label Record, Value Label Variable Record, Variable Record, Data File Format
+@node Value Label Record
@section Value Label Record
Value label records must follow the variable records and must precede
label. The remainder of the field is the label itself. The field is
padded on the right to a multiple of 64 bits in length.
-@node Value Label Variable Record, Document Record, Value Label Record, Data File Format
+@node Value Label Variable Record
@section Value Label Variable Record
Every value label variable record must be immediately preceded by a
Number of variables to which the associated value labels from the value
label record are to be applied.
-@item int32 vars[/* variable length];
+@item int32 vars[/* variable length */];
A list of variables to which to apply the value labels. There are
-@code{count} elements.
+@code{count} elements. Each element is the index of a variable record,
+counting from 1 for the first variable record. A long string variable
+occupies one index for each variable record it uses.
@end table
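Long strings occupy several indexes, so the index of a variable in @code{vars} is usually not its position in the dictionary. The C sketch below computes the index of each variable's first variable record. It assumes, for this example only, that width 0 denotes a numeric variable and that string widths are between 1 and 255. @code{assign_record_indexes} is a hypothetical name.

```c
#include <stddef.h>

/* Stores in FIRST_INDEX[i] the 1-based variable record index of
   variable i, given each variable's width (0 = numeric, 1..255 = string
   width, an assumption of this sketch). */
void assign_record_indexes(const int *widths, size_t n, int *first_index)
{
  int next = 1;                           /* first variable record is 1 */
  for (size_t i = 0; i < n; i++)
    {
      first_index[i] = next;
      /* A long string consumes one index per variable record it uses. */
      next += widths[i] == 0 ? 1 : (widths[i] + 7) / 8;
    }
}
```

For widths 0, 20, 0, and 8, the variables receive indexes 1, 2, 5, and 6. Indexes 3 and 4 belong to the continuation records of the 20-byte string.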
-@node Document Record, Machine int32 Info Record, Value Label Variable Record, Data File Format
+@node Document Record
@section Document Record
There must be no more than one document record per system file.
Lines shorter than 80 characters are padded on the right with spaces.
@end table
-@node Machine int32 Info Record, Machine flt64 Info Record, Document Record, Data File Format
+@node Machine int32 Info Record
@section Machine @code{int32} Info Record
There must be no more than one machine @code{int32} info record per
@item int32 character_code;
Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3
indicates 8-bit ASCII, 4 indicates DEC Kanji.
+Windows code page numbers are also valid.
@end table
-@node Machine flt64 Info Record, Auxilliary Variable Parameter Record, Machine int32 Info Record, Data File Format
+@node Machine flt64 Info Record
@section Machine @code{flt64} Info Record
There must be no more than one machine @code{flt64} info record per
Record subtype. Always set to 4.
@item int32 size;
-Size of each piece of data in the data part, in bytes. Always set to 4.
+Size of each piece of data in the data part, in bytes. Always set to 8.
@item int32 count;
Number of pieces of data in the data part. Always set to 3.
The value used for LOWEST in missing values.
@end table
-@node Auxilliary Variable Parameter Record, Long Variable Names Record, Machine flt64 Info Record, Data File Format
-@section Auxilliary Variable Parameter Record
+@node Auxiliary Variable Parameter Record
+@section Auxiliary Variable Parameter Record
-There must be no more than one auxilliary variable parameter record per
+There must be no more than one auxiliary variable parameter record per
system file. This record must follow the variable
records and precede the dictionary termination record.
The size of an @code{int32}. Always set to 4.
@item int32 count;
-The total number of bytes in @code{aux_params} divided by 3.
+The number of elements in @code{aux_params}, multiplied by 3.
@item struct aux_params aux_params[];
An array of @code{struct aux_params}. The order of the elements corresponds
-to the order of the variables in the Variable Records. The @code{struct aux_params} type is defined as follows:
+to the order of the variables in the Variable Records. No element
+corresponds to variable records that continue long string variables.
+The @code{struct aux_params} type is defined as follows:
@example
struct aux_params
@item int32 measure
The measurement type of the variable:
@table @asis
-@item 0
-Nominal Scale
@item 1
-Ordinal Scale
+Nominal Scale
@item 2
+Ordinal Scale
+@item 3
Continuous Scale
@end table
+Occasionally a value of 0 is seen here. PSPP interprets this to mean
+a nominal scale.
+
@item int32 width
The width of the display column for the variable in characters.
-@node Long Variable Names Record, Miscellaneous Informational Records, Auxilliary Variable Parameter Record, Data File Format
+@node Long Variable Names Record
@section Long Variable Names Record
There must be no more than one long variable names record per
@item int32 count;
The total number of bytes in @code{var_name_pairs}.
-@item char var_name_pairs[/* variable length];
+@item char var_name_pairs[/* variable length */];
A list of @var{key}--@var{value} tuples, where @var{key} is the name
of a variable, and @var{value} is its long variable name.
The @var{key} field is at most 8 bytes long and must match the
-name of a variable which appears in the variable record @xref{Variable Record}.
+name of a variable which appears in the variable record (@pxref{Variable
+Record}).
The @var{value} field is at most 64 bytes long.
The @var{key} and @var{value} fields are separated by a @samp{=} byte.
Each tuple is separated by a byte whose value is 09. There is no
The total length is @code{count} bytes.
@end table
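The tuple layout of @code{var_name_pairs} can be split with @code{memchr}. The C sketch below treats the 09 byte as a separator and tolerates a trailing one. The text does not fully specify what follows the last tuple, so that tolerance is an assumption. @code{parse_long_names} and @code{struct name_pair} are hypothetical names. The buffer is not assumed to be NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEY 8      /* key is at most 8 bytes */
#define MAX_VALUE 64   /* value is at most 64 bytes */

struct name_pair
{
  char key[MAX_KEY + 1];
  char value[MAX_VALUE + 1];
};

/* Splits COUNT bytes of var_name_pairs into at most MAX_PAIRS entries.
   Returns the number of pairs found, or -1 if a tuple is malformed. */
int parse_long_names(const char *data, size_t count,
                     struct name_pair *pairs, int max_pairs)
{
  int n = 0;
  size_t pos = 0;
  while (pos < count)
    {
      /* A tuple ends at a 09 byte or at the end of the data. */
      const char *end = memchr(data + pos, '\t', count - pos);
      size_t tuple_len = end ? (size_t) (end - (data + pos)) : count - pos;
      const char *eq = memchr(data + pos, '=', tuple_len);
      if (eq == NULL || n >= max_pairs)
        return -1;
      size_t key_len = (size_t) (eq - (data + pos));
      size_t value_len = tuple_len - key_len - 1;
      if (key_len == 0 || key_len > MAX_KEY || value_len > MAX_VALUE)
        return -1;
      memcpy(pairs[n].key, data + pos, key_len);
      pairs[n].key[key_len] = '\0';
      memcpy(pairs[n].value, eq + 1, value_len);
      pairs[n].value[value_len] = '\0';
      n++;
      pos += tuple_len + 1;   /* skip the tuple and its separator */
    }
  return n;
}
```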
+@node Very Long String Length Record
+@section Very Long String Length Record
+
+
+There must be no more than one very long string length record per
+system file. This record must follow the variable records and precede the
+dictionary termination record.
+
+@example
+struct sysfile_very_long_string_lengths
+ @{
+ /* Header. */
+ int32 rec_type;
+ int32 subtype;
+ int32 size;
+ int32 count;
+
+ /* Data. */
+ char string_lengths[/* variable length */];
+ @};
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 14.
+
+@item int32 size;
+The size of each element in the @code{string_lengths} member. Always set to 1.
+
+@item int32 count;
+The total number of bytes in @code{string_lengths}.
+
+@item char string_lengths[/* variable length */];
+A list of @var{key}--@var{value} tuples, where @var{key} is the name
+of a variable, and @var{value} is its length.
+The @var{key} field is at most 8 bytes long and must match the
+name of a variable which appears in the variable record (@pxref{Variable
+Record}).
+The @var{value} field is exactly 5 bytes long. It is the variable's
+length in bytes, written as a zero-padded ASCII decimal number (for
+example, @samp{00300} for a 300-byte string).
+The @var{key} and @var{value} fields are separated by a @samp{=} byte.
+Tuples are delimited by a two-byte sequence @{00, 09@}.
+After the last tuple, there may be a single byte 00, or @{00, 09@}.
+The total length is @code{count} bytes.
+@end table
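The C sketch below parses @code{string_lengths}. For simplicity it skips any 00 or 09 bytes found between tuples. That covers the @{00, 09@} delimiter and both permitted endings, but it is more lenient than the format requires. @code{parse_string_lengths} and @code{struct long_string} are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

struct long_string
{
  char key[9];   /* key is at most 8 bytes, plus a terminator */
  int length;    /* decoded from the 5-digit value field */
};

/* Parses COUNT bytes of string_lengths into at most MAX_OUT entries.
   Returns the number of tuples found, or -1 on malformed input. */
int parse_string_lengths(const char *data, size_t count,
                         struct long_string *out, int max_out)
{
  int n = 0;
  size_t pos = 0;
  while (pos < count)
    {
      /* Skip delimiter bytes (00 and 09) between or after tuples. */
      if (data[pos] == '\0' || data[pos] == '\t')
        {
          pos++;
          continue;
        }
      const char *eq = memchr(data + pos, '=', count - pos);
      if (eq == NULL || n >= max_out)
        return -1;
      size_t key_len = (size_t) (eq - (data + pos));
      size_t value_pos = pos + key_len + 1;
      if (key_len == 0 || key_len > 8 || value_pos + 5 > count)
        return -1;
      int length = 0;
      for (size_t i = value_pos; i < value_pos + 5; i++)
        {
          if (data[i] < '0' || data[i] > '9')
            return -1;   /* value must be 5 ASCII digits */
          length = length * 10 + (data[i] - '0');
        }
      memcpy(out[n].key, data + pos, key_len);
      out[n].key[key_len] = '\0';
      out[n].length = length;
      n++;
      pos = value_pos + 5;
    }
  return n;
}
```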
+
+
-@node Miscellaneous Informational Records, Dictionary Termination Record, Long Variable Names Record, Data File Format
+@node Miscellaneous Informational Records
@section Miscellaneous Informational Records
Miscellaneous informational records must follow the variable records and
precede the dictionary termination record.
-Miscellaneous informational records are ignored by PSPP when reading
-system files. They are not written by PSPP when writing system files.
+Some specific types of miscellaneous informational records are
+documented here, but others are known to exist. PSPP ignores unknown
+miscellaneous informational records when reading system files.
@example
struct sysfile_misc_info
data.
@end table
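An unrecognized record can be skipped without interpreting its data part. This C sketch assumes the @code{rec_type}, @code{subtype}, @code{size}, @code{count} header used by the type 7 records above, with a data part of @code{size} times @code{count} bytes. It also assumes the header has already been read and converted to host byte order. @code{skip_misc_record} is a hypothetical helper. It reads and discards bytes instead of seeking, so it also works on non-seekable streams.

```c
#include <stdint.h>
#include <stdio.h>

/* Discards the data part of a record whose header has already been read.
   Returns 0 on success, -1 on an invalid header or a short read. */
int skip_misc_record(FILE *fp, int32_t size, int32_t count)
{
  if (size < 0 || count < 0)
    return -1;
  /* 64-bit arithmetic cannot overflow for two non-negative int32s. */
  uint64_t remaining = (uint64_t) size * (uint64_t) count;
  char buf[4096];
  while (remaining > 0)
    {
      size_t chunk = remaining < sizeof buf ? (size_t) remaining : sizeof buf;
      if (fread(buf, 1, chunk, fp) != chunk)
        return -1;   /* truncated file or read error */
      remaining -= chunk;
    }
  return 0;
}
```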
-@node Dictionary Termination Record, Data Record, Miscellaneous Informational Records, Data File Format
+@node Dictionary Termination Record
@section Dictionary Termination Record
The dictionary termination record must follow all other records, except
Ignored padding. Should be set to 0.
@end table
-@node Data Record, , Dictionary Termination Record, Data File Format
+@node Data Record
@section Data Record
Data records must follow all other records in the data file. There must
compressed. Regardless, the data is arranged in a series of 8-byte
elements.
-When data is not compressed, Every case is composed of @code{case_size}
-of these 8-byte elements, where @code{case_size} comes from the file
-header record (@pxref{File Header Record}). Each element corresponds to
+When data is not compressed, each element corresponds to
the variable declared in the respective variable record (@pxref{Variable
Record}). Numeric values are given in @code{flt64} format; string
values are literal character strings, padded on the right when