X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fportable-file-format.texi;h=d58a987fb4ff7a18472a1b0af51f25535ac4ca9a;hb=577c6ac9b93c494efdabc324365ec70a43f6d742;hp=5480e3ec216c49a30ea5f4f515bf52fcdcb18bef;hpb=c15c93a4a2421502b415cdb2c41fa10e2e533a5a;p=pspp-builds.git

diff --git a/doc/portable-file-format.texi b/doc/portable-file-format.texi
index 5480e3ec..d58a987f 100644
--- a/doc/portable-file-format.texi
+++ b/doc/portable-file-format.texi
@@ -1,4 +1,4 @@
-@node Portable File Format, Data File Format, Configuration, Top
+@node Portable File Format
 @appendix Portable File Format
 
 These days, most computers use the same internal data formats for
@@ -22,17 +22,24 @@ may be incorrect in the general case.
 * Case Weight Variable Record::
 * Variable Records::
 * Value Label Records::
+* Portable File Document Record::
 * Portable File Data::
 @end menu
 
-@node Portable File Characters, Portable File Structure, Portable File Format, Portable File Format
+@node Portable File Characters
 @section Portable File Characters
 
-Portable files are arranged as a series of lines of exactly 80
+Portable files are arranged as a series of lines of 80
 characters each. Each line is terminated by a carriage-return,
-line-feed sequence ``new-lines''). New-lines are only used to avoid
+line-feed sequence (``new-lines''). New-lines are only used to avoid
 line length limits imposed by some OSes; they are not meaningful.
 
+Most lines in portable files are exactly 80 characters long. The only
+exception is a line that ends in one or more spaces, in which the
+spaces may optionally be omitted. Thus, a portable file reader must
+act as though a line shorter than 80 characters is padded to that
+length with spaces.
+
 The file must be terminated with a @samp{Z} character. In addition, if
 the final line in the file does not have exactly 80 characters, then it
 is padded on the right with @samp{Z} characters.
 (The file contents may
@@ -45,7 +52,7 @@ and the trailing @samp{Z}s will be ignored, as if they did not exist,
 because they are not an important part of understanding the file
 contents.
 
-@node Portable File Structure, Portable File Header, Portable File Characters, Portable File Format
+@node Portable File Structure
 @section Portable File Structure
 
 Every portable file consists of the following records, in sequence:
@@ -80,6 +87,9 @@ missing value record and a variable label record.
 @item
 Value labels (optional).
 
+@item
+Documents (optional).
+
 @item
 Data.
 @end itemize
@@ -127,7 +137,7 @@ may not contain a fraction.
 String fields take the form of an integer field having value @var{n},
 followed by exactly @var{n} characters, which are the string content.
 
-@node Portable File Header, Version and Date Info Record, Portable File Structure, Portable File Format
+@node Portable File Header
 @section Portable File Header
 
 Every portable file begins with a 464-byte header, consisting of a
@@ -304,7 +314,7 @@ The 8-byte tag string consists of the exact characters
 @code{SPSSPORT} in the portable file's character set, which can be used
 to verify that the file is indeed a portable file.
 
-@node Version and Date Info Record, Identification Records, Portable File Header, Portable File Format
+@node Version and Date Info Record
 @section Version and Date Info Record
 
 This record does not have a tag code. It has the following structure:
@@ -323,7 +333,7 @@ A 6-character string field giving the file creation time in the format
 HHMMSS.
 @end itemize
 
-@node Identification Records, Variable Count Record, Version and Date Info Record, Portable File Format
+@node Identification Records
 @section Identification Records
 
 The product identification record has tag code @samp{1}. It consists of
@@ -338,7 +348,7 @@ The subproduct identification record has tag code @samp{3}. It is optional.
 If present, it consists of a single string field giving additional
 information on the product that wrote the portable file.
 
-@node Variable Count Record, Case Weight Variable Record, Identification Records, Portable File Format
+@node Variable Count Record
 @section Variable Count Record
 
 The variable count record has tag code @samp{4}. It consists of two
@@ -346,7 +356,7 @@ integer fields. The first contains the number of variables in the file
 dictionary. The purpose of the second is unknown; it contains the value
 161 in all portable files examined so far.
 
-@node Case Weight Variable Record, Variable Records, Variable Count Record, Portable File Format
+@node Case Weight Variable Record
 @section Case Weight Variable Record
 
 The case weight variable record is optional. If it is present, it
@@ -354,7 +364,7 @@ indicates the variable used for weighting cases; if it is absent, cases
 are unweighted. It has tag code @samp{6}. It consists of a single string
 field that names the weighting variable.
 
-@node Variable Records, Value Label Records, Case Weight Variable Record, Portable File Format
+@node Variable Records
 @section Variable Records
 
 Each variable record represents a single variable. Variable records
@@ -369,6 +379,11 @@ and 255 for a string variable.
 @item
 Name (string). 1--8 characters long. Must be in all capitals.
 
+A few portable files that contain duplicate variable names have been
+spotted in the wild. PSPP handles these by renaming the duplicates
+with numeric extensions: @code{@var{var}_1}, @code{@var{var}_2}, and
+so on.
+
 @item
 Print format. This is a set of three integer fields:
 
@@ -384,6 +399,11 @@ Format width. 1--40.
 Number of decimal places. 1--40.
 @end itemize
 
+A few portable files with invalid format types or formats that are not
+of the appropriate width for their variables have been spotted in the
+wild. PSPP assigns a default F or A format to a variable with an
+invalid format.
+
 @item
 Write format.
 Same structure as the print format described above.
 @end itemize
@@ -407,7 +427,7 @@ In addition, each variable record can optionally be followed by a
 variable label record, which has tag code @samp{C}. A variable label
 record has one field, the variable label itself (string).
 
-@node Value Label Records, Portable File Data, Variable Records, Portable File Format
+@node Value Label Records
 @section Value Label Records
 
 Value label records have tag code @samp{D}. They have the following
@@ -420,7 +440,8 @@ Variable count (integer).
 @item
 List of variables (strings). The variable count specifies the number in
 the list. Variables are specified by their names. All variables must
-be of the same type (numeric or string).
+be of the same type (numeric or string), but string variables do not
+necessarily have the same width.
 
 @item
 Label count (integer).
@@ -431,7 +452,21 @@ tuples. Each tuple consists of a value, which is numeric or string as
 appropriate to the variables, followed by a label (string).
 @end itemize
 
-@node Portable File Data, , Value Label Records, Portable File Format
+A few portable files that specify duplicate value labels, that is, two
+different labels for a single value of a single variable, have been
+spotted in the wild. PSPP uses the last value label specified in
+these cases.
+
+@node Portable File Document Record
+@section Document Record
+
+One document record may optionally follow the value label record. The
+document record consists of tag code @samp{E}, followed by the number
+of document lines as an integer, followed by that number of strings,
+each of which represents one document line. Document lines must be 80
+bytes long or shorter.
+
+@node Portable File Data
 @section Portable File Data
 
 The data record has tag code @samp{F}. There is only one tag for all