X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fsystem-file-format.texi;h=9bb636144030ea83c595a2ebd48a099192a61003;hb=d7ab350db46e9e410119723b14a6f2783279bd2a;hp=7b0eff9f79823f0bb623e10d2f1da8c22922eff7;hpb=8080fdb87f96b96ae07ef902e7912a23beea49c7;p=pspp

diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi
index 7b0eff9f79..9bb6361440 100644
--- a/doc/dev/system-file-format.texi
+++ b/doc/dev/system-file-format.texi
@@ -33,12 +33,28 @@ floating-point numbers, and translates as needed. However, only IEEE
 has actually been observed in system files, and it is likely that
 other formats are obsolete or were never used.
 
-The PSPP system-missing value is represented by the largest possible
-negative number in the floating point format (@code{-DBL_MAX}). Two
-other values are important for use as missing values: @code{HIGHEST},
-represented by the largest possible positive number (@code{DBL_MAX}),
-and @code{LOWEST}, represented by the second-largest negative number
-(in IEEE 754 format, @code{0xffeffffffffffffe}).
+System files use a few floating point values for special purposes:
+
+@table @asis
+@item SYSMIS
+The system-missing value is represented by the largest possible
+negative number in the floating point format (@code{-DBL_MAX}).
+
+@item HIGHEST
+HIGHEST is used as the high end of a missing value range with an
+unbounded maximum. It is represented by the largest possible positive
+number (@code{DBL_MAX}).
+
+@item LOWEST
+LOWEST is used as the low end of a missing value range with an
+unbounded minimum. It was originally represented by the
+second-largest negative number (in IEEE 754 format,
+@code{0xffeffffffffffffe}). System files written by SPSS 21 and later
+instead use the largest negative number (@code{-DBL_MAX}), the same
+value as SYSMIS. This does not lead to ambiguity because LOWEST
+appears in system files only in missing value ranges, which never
+contain SYSMIS.
+@end table
 
 System files are divided into records, each of which begins with a
 4-byte record type, usually regarded as an @code{int32}.
@@ -115,7 +131,9 @@ char padding[3];
 
 @table @code
 @item char rec_type[4];
-Record type code, set to @samp{$FL2}.
+Record type code, set to @samp{$FL2}, that is, either @code{24 46 4c
+32} if the file uses an ASCII-based character encoding, or @code{5b c6
+d3 f2} if the file uses an EBCDIC-based character encoding.
 
 @item char prod_name[60];
 Product identification string. This always begins with the characters
@@ -559,6 +577,9 @@ Machine endianness. 1 indicates big-endian, 2 indicates little-endian.
 been actually observed in system files:
 
 @table @asis
+@item 1
+EBCDIC.
+
 @item 2
 7-bit ASCII.
 
@@ -579,9 +600,6 @@ UTF-8.
 The following additional values are known to be defined:
 
 @table @asis
-@item 1
-EBCDIC.
-
 @item 3
 8-bit ``ASCII''.
 
@@ -591,9 +609,10 @@ DEC Kanji.
 
 Other Windows code page numbers are known to be generally valid.
 
-Old versions of SPSS always wrote value 2 in this field, regardless of
-the encoding in use. Newer versions also write the character encoding
-as a string (see @ref{Character Encoding Record}).
+Old versions of SPSS for Unix and Windows always wrote value 2 in this
+field, regardless of the encoding in use. Newer versions also write
+the character encoding as a string (see @ref{Character Encoding
+Record}).
 @end table
 
 @node Machine Floating-Point Info Record
@@ -812,8 +831,8 @@ Ordinal Scale
 Continuous Scale
 @end table
 
-SPSS 14 sometimes writes a @code{measure} of 0 for string variables.
-PSPP interprets this as nominal scale.
+SPSS sometimes writes a @code{measure} of 0. PSPP interprets this as
+nominal scale.
 
 @item int32 width;
 The width of the display column for the variable in characters.
@@ -1170,6 +1189,32 @@ will contain a variable attribute record with the following contents:
 00000030  0a 29                                             |.)              |
 @end example
 
+@menu
+* Variable Roles::
+@end menu
+
+@node Variable Roles
+@subsection Variable Roles
+
+A variable's role is represented as an attribute named @code{$@@Role}.
+This attribute has a single element whose values and their meanings
+are:
+
+@table @code
+@item 0
+Input. This, the default, is the most common role.
+@item 1
+Output.
+@item 2
+Both.
+@item 3
+None.
+@item 4
+Partition.
+@item 5
+Split.
+@end table
+
 @node Extended Number of Cases Record
 @section Extended Number of Cases Record
 
@@ -1234,7 +1279,9 @@ Record type. Always set to 7.
 @item int32 subtype;
 Record subtype. May take any value. According to Aapi
 H@"am@"al@"ainen, value 5 indicates a set of grouped variables and 6
-indicates date info (probably related to USE).
+indicates date info (probably related to USE). Subtype 24 appears to
+contain XML that describes how data in the file should be displayed
+on-screen.
 
 @item int32 size;
 Size of each piece of data in the data part. Should have the value 1,
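The first hunk of the patch above means that a reader cannot identify SYSMIS, HIGHEST, and LOWEST from the bits alone. Since SPSS 21, LOWEST shares the bit pattern of SYSMIS, so the position of the value also matters. The following is a minimal C sketch of that classification. It assumes IEEE 754 binary64 doubles that have already been converted to host byte order. The names enum range_position and special_value_name() are hypothetical illustrations, not part of PSPP.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: where in the system file a value was read from. */
enum range_position { NOT_IN_RANGE, RANGE_LOW_END, RANGE_HIGH_END };

/* Returns the raw bit pattern of D (assumes 64-bit IEEE 754 doubles). */
static uint64_t
double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Returns "SYSMIS", "HIGHEST", or "LOWEST" if D is one of the special
   values at position POS, or NULL if D is an ordinary number. */
const char *
special_value_name (double d, enum range_position pos)
{
  /* LOWEST: -DBL_MAX in files from SPSS 21 and later, or the original
     encoding 0xffeffffffffffffe (the second-largest negative number). */
  if (pos == RANGE_LOW_END
      && (d == -DBL_MAX || double_bits (d) == UINT64_C (0xffeffffffffffffe)))
    return "LOWEST";

  /* HIGHEST: DBL_MAX as the high end of a range with no maximum. */
  if (pos == RANGE_HIGH_END && d == DBL_MAX)
    return "HIGHEST";

  /* Outside the low end of a range, -DBL_MAX is the system-missing value. */
  if (d == -DBL_MAX)
    return "SYSMIS";

  return NULL;
}
```

Passing the position explicitly mirrors the patch's argument for why the shared encoding is unambiguous. LOWEST appears only as the low end of a missing value range, and such ranges never contain SYSMIS.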
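The second hunk gives the exact bytes of the $FL2 record type for ASCII-based and EBCDIC-based files. This means the first four bytes of a file are enough to tell the two encoding families apart. The C sketch below makes that check. It assumes the caller has already read the 4-byte rec_type field. The names detect_header_charset() and enum header_charset are hypothetical.

```c
#include <string.h>

/* Hypothetical: the character-encoding family implied by rec_type. */
enum header_charset { HEADER_UNKNOWN, HEADER_ASCII, HEADER_EBCDIC };

/* Classifies the 4-byte rec_type field at the start of a system file. */
enum header_charset
detect_header_charset (const unsigned char rec_type[4])
{
  /* "$FL2" in an ASCII-based encoding. */
  static const unsigned char ascii_fl2[4] = { 0x24, 0x46, 0x4c, 0x32 };
  /* "$FL2" in an EBCDIC-based encoding. */
  static const unsigned char ebcdic_fl2[4] = { 0x5b, 0xc6, 0xd3, 0xf2 };

  if (memcmp (rec_type, ascii_fl2, sizeof ascii_fl2) == 0)
    return HEADER_ASCII;
  if (memcmp (rec_type, ebcdic_fl2, sizeof ebcdic_fl2) == 0)
    return HEADER_EBCDIC;
  return HEADER_UNKNOWN;
}
```

A reader that gets HEADER_UNKNOWN would most likely reject the input. The patch names only these two byte sequences.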
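The new Variable Roles subsection in the patch maps the single element of a variable's $@Role attribute to a role. Note that the attribute is written @code{$@@Role} in the Texinfo source because @@ escapes @. A lookup table covers the six documented codes. This C sketch assumes the element has already been extracted from the variable attribute record as a plain decimal string, with any surrounding quotes and line terminator removed. The helper role_name() is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns the name of the role encoded by VALUE, the text of the single
   element of a variable's "$@Role" attribute, or NULL if VALUE is not
   one of the documented codes 0 through 5. */
const char *
role_name (const char *value)
{
  static const char *const names[] = {
    "Input", "Output", "Both", "None", "Partition", "Split"
  };
  char *end;
  long code = strtol (value, &end, 10);

  /* Reject empty strings, trailing junk, and out-of-range codes. */
  if (end == value || *end != '\0'
      || code < 0 || code >= (long) (sizeof names / sizeof *names))
    return NULL;
  return names[code];
}
```

The patch describes Input as the default role. A caller would therefore presumably fall back to "Input" when a variable has no $@Role attribute.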