X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fsystem-file-format.texi;h=b8670681d9e3789a9803d1e31eae56d54ac32cc7;hb=8c7010202f34d6648359603f6490ce2fe4084b6e;hp=116fa6f787445fb5931e9ae2dafe87eb97782988;hpb=5dbf5abcbed01f04422d4dead1c0ae0bb7efde4f;p=pspp
diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi
index 116fa6f787..b8670681d9 100644
--- a/doc/dev/system-file-format.texi
+++ b/doc/dev/system-file-format.texi
@@ -1,12 +1,25 @@
+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2019 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
+
 @node System File Format
-@appendix System File Format
+@chapter System File Format
 
-A system file encapsulates a set of cases and dictionary information
-that describes how they may be interpreted. This chapter describes
-the format of a system file.
+An SPSS system file holds a set of cases and dictionary information
+that describes how they may be interpreted. The system file format
+dates back 40+ years and has evolved greatly over that time to support
+new features, but in a way to facilitate interchange between even the
+oldest and newest versions of software. This chapter describes the
+system file format.
 
 System files use four data types: 8-bit characters, 32-bit integers,
-64-bit integers,
+64-bit integers,
 and 64-bit floating points, called here @code{char}, @code{int32},
 @code{int64}, and @code{flt64}, respectively.
 Data is not necessarily aligned on a word
@@ -316,10 +329,10 @@ so readers should take care to parse dummy variable records in the
 same way as other variable records.
 
 @anchor{Dictionary Index}
-The @dfn{dictionary index} of a variable is its offset in the set of
+The @dfn{dictionary index} of a variable is a 1-based offset in the set of
 variable records, including dummy variable records for long string
-variables. The first variable record has a dictionary index of 0, the
-second has a dictionary index of 1, and so on.
+variables. The first variable record has a dictionary index of 1, the
+second has a dictionary index of 2, and so on.
 
 The system file format does not directly support string variables
 wider than 255 bytes. Such very long string variables are represented
@@ -514,6 +527,10 @@ Format types are defined as follows:
 @tab @code{EDATE}
 @item 39
 @tab @code{SDATE}
+@item 40
+@tab @code{MTIME}
+@item 41
+@tab @code{YMDHMS}
 @end multitable
 @end quotation
 
@@ -544,7 +561,7 @@ The value label record has the following format:
 int32 rec_type;
 int32 label_count;
 
-/* @r{Repeated @code{label_cnt} times}. */
+/* @r{Repeated @code{n_label} times}. */
 char value[8];
 char label_len;
 char label[];
@@ -596,7 +613,7 @@ Number of variables that the associated value labels from the value
 label record are to be applied.
 
 @item int32 vars[];
-A list of dictionary indexes of variables to which to apply the value
+A list of 1-based dictionary indexes of variables to which to apply the value
 labels (@pxref{Dictionary Index}). There are @code{var_count}
 elements.
 
@@ -987,18 +1004,24 @@ members are as follows:
 
 @table @code
 @item int32 measure;
-The measurement type of the variable:
+The measurement level of the variable:
 @table @asis
+@item 0
+Unknown
 @item 1
-Nominal Scale
+Nominal
 @item 2
-Ordinal Scale
+Ordinal
 @item 3
-Continuous Scale
+Scale
 @end table
 
-SPSS sometimes writes a @code{measure} of 0. PSPP interprets this as
-nominal scale.
+An ``unknown'' @code{measure} of 0 means that the variable was created
+in some way that doesn't make the measurement level clear, e.g.@: with
+a @code{COMPUTE} transformation. PSPP sets the measurement level the
+first time it reads the data using the rules documented in
+@ref{Measurement Level,,,pspp, PSPP Users Guide}, so this should
+rarely appear.
 
 @item int32 width;
 The width of the display column for the variable in characters.
@@ -1504,7 +1527,8 @@ the following believed meanings:
 
 @table @asis
 @item 5
-A set of grouped variables (according to Aapi H@"am@"al@"ainen).
+A named variable set for use in the GUI (according to Aapi
+H@"am@"al@"ainen).
 
 @item 6
 Date info, probably related to USE (according to Aapi H@"am@"al@"ainen).
@@ -1637,7 +1661,7 @@ The number of bytes in the ZLIB data trailer. This and the previous
 field sum to the size of the system file in bytes.
 @end table
 
-The data header is followed by @code{(ztrailer_ofs - 24) / 24} ZLIB
+The data header is followed by @code{(ztrailer_len - 24) / 24} ZLIB
 compressed data blocks. Each ZLIB compressed data block begins with a
 ZLIB header as specified in RFC@tie{}1950, e.g.@: hex bytes
 @code{78 01} (the only header yet observed in practice). Each block
@@ -1674,7 +1698,7 @@ been observed so far.
 
 @item int32 n_blocks;
 The number of ZLIB compressed data blocks, always exactly
-@code{(ztrailer_ofs - 24) / 24}.
+@code{(ztrailer_len - 24) / 24}.
 @end table
 
 The fixed header is followed by @code{n_blocks} 24-byte ZLIB data