X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fsystem-file-format.texi;h=c1d1e42129a02c5e7fb7dfbf9f6daa528c2550dc;hb=3bbb4370239deb29ebbf813d258aef6249e2a431;hp=b1be385334bf5ef3f8aa1adfcfdecb10d9adaf3d;hpb=f463d854c12bda4f9f12f798ba12d3ea88c3a9ed;p=pspp-builds.git

diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi
index b1be3853..c1d1e421 100644
--- a/doc/dev/system-file-format.texi
+++ b/doc/dev/system-file-format.texi
@@ -96,6 +96,9 @@ Each type of record is described separately below.
 * Variable Display Parameter Record::
 * Long Variable Names Record::
 * Very Long String Record::
+* Character Encoding Record::
+* Long String Value Labels Record::
+* Data File and Variable Attributes Records::
 * Miscellaneous Informational Records::
 * Dictionary Termination Record::
 * Data Record::
@@ -286,15 +289,20 @@ length @code{label_len}, rounded up to the nearest multiple of 32 bits.
 The first @code{label_len} characters are the variable's variable label.
 
 @item flt64 missing_values[];
-This field is present only if @code{n_missing_values} is not 0. It has
-the same number of elements as the absolute value of
-@code{n_missing_values}. For discrete missing values, each element
-represents one missing value. When a range is present, the first
-element denotes the minimum value in the range, and the second element
-denotes the maximum value in the range. When a range plus a value are
-present, the third element denotes the additional discrete missing
-value. HIGHEST and LOWEST are indicated as described in the chapter
-introduction.
+This field is present only if @code{n_missing_values} is nonzero. It
+has the same number of 8-byte elements as the absolute value of
+@code{n_missing_values}. Each element is interpreted as a number for
+numeric variables (with HIGHEST and LOWEST indicated as described in
+the chapter introduction).
For string variables of width less than 8
+bytes, elements are right-padded with spaces; for string variables
+wider than 8 bytes, only the first 8 bytes of each missing value are
+specified, with the remainder implicitly all spaces.
+
+For discrete missing values, each element represents one missing
+value. When a range is present, the first element denotes the minimum
+value in the range, and the second element denotes the maximum value
+in the range. When a range plus a value are present, the third
+element denotes the additional discrete missing value.
 @end table
 
 The @code{print} and @code{write} members of sysfile_variable are output
@@ -396,6 +404,11 @@ Format types are defined as follows:
 @node Value Labels Records
 @section Value Labels Records
 
+The value label records documented in this section are used for
+numeric and short string variables only. Long string variables may
+have value labels, but their value labels are recorded using a
+different record type (@pxref{Long String Value Labels Record}).
+
 The value label record has the following format:
 
 @example
@@ -456,7 +469,7 @@ A list of dictionary indexes of variables to which to apply the value
 labels (@pxref{Dictionary Index}). There are @code{var_count}
 elements.
 
-String variables wider than 8 bytes may not have value labels.
+String variables wider than 8 bytes may not be specified in this list.
 @end table
 
 @node Document Record
@@ -545,9 +558,14 @@ Compression code. Always set to 1.
 Machine endianness. 1 indicates big-endian, 2 indicates little-endian.
 
 @item int32 character_code;
+@anchor{character-code}
 Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3 indicates
 8-bit ASCII, 4 indicates DEC Kanji.
 Windows code page numbers are also valid.
+
+Experience has shown that in many files, this field is ignored or incorrect.
+For a more reliable indication of the file's character encoding
+see @ref{Character Encoding Record}.
 @end table
 
 @node Machine Floating-Point Info Record
@@ -791,6 +809,197 @@ After the last tuple, there may be a single byte 00, or @{00, 09@}.
 The total length is @code{count} bytes.
 @end table
 
+@node Character Encoding Record
+@section Character Encoding Record
+
+This record, if present, indicates the character encoding for string data,
+long variable names, variable labels, value labels and other strings in the
+file.
+
+@example
+/* @r{Header.} */
+int32 rec_type;
+int32 subtype;
+int32 size;
+int32 count;
+
+/* @r{Exactly @code{count} bytes of data.} */
+char encoding[];
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 20.
+
+@item int32 size;
+The size of each element in the @code{encoding} member. Always set to 1.
+
+@item int32 count;
+The total number of bytes in @code{encoding}.
+
+@item char encoding[];
+The name of the character encoding. Normally this will be an official IANA character set name or alias.
+See @url{http://www.iana.org/assignments/character-sets}.
+@end table
+
+This record is not present in files generated by older software.
+See also @ref{character-code}.
+
+@node Long String Value Labels Record
+@section Long String Value Labels Record
+
+This record, if present, specifies value labels for long string
+variables.
+
+@example
+/* @r{Header.} */
+int32 rec_type;
+int32 subtype;
+int32 size;
+int32 count;
+
+/* @r{Repeated up to exactly @code{count} bytes.} */
+int32 var_name_len;
+char var_name[];
+int32 var_width;
+int32 n_labels;
+long_string_label labels[];
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype. Always set to 21.
+
+@item int32 size;
+Always set to 1.
+
+@item int32 count;
+The number of bytes following the header until the next header.
+
+@item int32 var_name_len;
+@itemx char var_name[];
+The number of bytes in the name of the variable that has long string
+value labels, plus the variable name itself, which consists of exactly
+@code{var_name_len} bytes. The variable name is not padded to any
+particular boundary, nor is it null-terminated.
+
+@item int32 var_width;
+The width of the variable, in bytes, which will be between 9 and
+32767.
+
+@item int32 n_labels;
+@itemx long_string_label labels[];
+The long string labels themselves. The @code{labels} array contains
+exactly @code{n_labels} elements, each of which has the following
+substructure:
+
+@example
+int32 value_len;
+char value[];
+int32 label_len;
+char label[];
+@end example
+
+@table @code
+@item int32 value_len;
+@itemx char value[];
+The string value being labeled. @code{value_len} is the number of
+bytes in @code{value}; it is equal to @code{var_width}. The
+@code{value} array is not padded or null-terminated.
+
+@item int32 label_len;
+@itemx char label[];
+The label for the string value. @code{label_len}, which must be
+between 0 and 120, is the number of bytes in @code{label}. The
+@code{label} array is not padded or null-terminated.
+@end table
+@end table
+
+@node Data File and Variable Attributes Records
+@section Data File and Variable Attributes Records
+
+The data file and variable attributes records represent custom
+attributes for the system file or for individual variables in the
+system file, as defined on the DATAFILE ATTRIBUTE (@pxref{DATAFILE
+ATTRIBUTE,,,pspp, PSPP Users Guide}) and VARIABLE ATTRIBUTE commands
+(@pxref{VARIABLE ATTRIBUTE,,,pspp, PSPP Users Guide}), respectively.
+
+@example
+/* @r{Header.} */
+int32 rec_type;
+int32 subtype;
+int32 size;
+int32 count;
+
+/* @r{Exactly @code{count} bytes of data.} */
+char attributes[];
+@end example
+
+@table @code
+@item int32 rec_type;
+Record type. Always set to 7.
+
+@item int32 subtype;
+Record subtype.
Always set to 17 for a data file attribute record or
+to 18 for a variable attributes record.
+
+@item int32 size;
+The size of each element in the @code{attributes} member. Always set to 1.
+
+@item int32 count;
+The total number of bytes in @code{attributes}.
+
+@item char attributes[];
+The attributes, in a text-based format.
+
+In record subtype 17, this field contains a single attribute set. An
+attribute set is a sequence of one or more attributes concatenated
+together. Each attribute consists of a name, which has the same
+syntax as a variable name, followed by, inside parentheses, a sequence
+of one or more values. Each value consists of a string enclosed in
+single quotes (@code{'}) followed by a line feed (byte 0x0a). A value
+may contain single quote characters, which are not themselves escaped
+or quoted or required to be present in pairs. There is no apparent
+way to embed a line feed in a value. There is no distinction between
+an attribute with a single value and an attribute array with one
+element.
+
+In record subtype 18, this field contains a sequence of one or more
+variable attribute sets. If more than one variable attribute set is
+present, each one after the first is delimited from the previous by
+@code{/}. Each variable attribute set consists of a variable name,
+followed by @code{:}, followed by an attribute set with the same
+syntax as in record subtype 17.
+
+The total length is @code{count} bytes.
+@end table
+
+@subheading Example
+
+A system file produced with the following VARIABLE ATTRIBUTE commands
+in effect:
+
+@example
+VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=fred[1]('23') fred[2]('34').
+VARIABLE ATTRIBUTE VARIABLES=dummy ATTRIBUTE=bert('123').
+@end example
+
+@noindent
+will contain a variable attribute record with the following contents:
+
+@example
+00000000  07 00 00 00 12 00 00 00  01 00 00 00 22 00 00 00  |............"...|
+00000010  64 75 6d 6d 79 3a 66 72  65 64 28 27 32 33 27 0a  |dummy:fred('23'.|
+00000020  27 33 34 27 0a 29 62 65  72 74 28 27 31 32 33 27  |'34'.)bert('123'|
+00000030  0a 29                                             |.)              |
+@end example
+
 @node Miscellaneous Informational Records
 @section Miscellaneous Informational Records
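
The attribute text format added by this patch can be checked against the hex dump in its example. The following is a minimal Python sketch, not part of the patch or of PSPP itself. The function names (`parse_attribute_set`, `parse_variable_attribute_sets`, `parse_attributes_record`) are hypothetical. Little-endian headers and ASCII text are assumptions taken from the example bytes; a real reader would take byte order and encoding from the file's other records. Malformed or truncated input is not handled gracefully and simply raises `ValueError` or `IndexError`.

```python
import struct


def parse_attribute_set(text, pos=0):
    """Parse one attribute set starting at text[pos].

    Returns (attrs, new_pos), where attrs maps attribute name -> list of values.
    Parsing stops at end of text or at "/", which delimits variable attribute sets.
    """
    attrs = {}
    while pos < len(text) and text[pos] != "/":
        open_paren = text.index("(", pos)
        name = text[pos:open_paren]
        pos = open_paren + 1
        values = []
        while text[pos] != ")":
            if text[pos] != "'":
                raise ValueError(f"expected quote at offset {pos}")
            # A value may contain single quotes but never a line feed, so the
            # first quote followed by a line feed terminates the value.
            end = text.index("'\n", pos + 1)
            values.append(text[pos + 1:end])
            pos = end + 2
        pos += 1  # skip ")"
        attrs[name] = values
    return attrs, pos


def parse_variable_attribute_sets(text):
    """Parse subtype 18 text: "var:attrset" entries separated by "/"."""
    result = {}
    pos = 0
    while pos < len(text):
        colon = text.index(":", pos)
        var_name = text[pos:colon]
        result[var_name], pos = parse_attribute_set(text, colon + 1)
        if pos < len(text) and text[pos] == "/":
            pos += 1
    return result


def parse_attributes_record(data, byte_order="<", encoding="ascii"):
    """Parse a type 7, subtype 17 or 18 record; returns (subtype, attributes).

    byte_order and encoding are assumptions supplied by the caller.
    """
    rec_type, subtype, size, count = struct.unpack_from(byte_order + "4i", data, 0)
    if rec_type != 7 or subtype not in (17, 18) or size != 1:
        raise ValueError("not a data file or variable attributes record")
    text = data[16:16 + count].decode(encoding)
    if subtype == 17:
        return subtype, parse_attribute_set(text)[0]
    return subtype, parse_variable_attribute_sets(text)


# The bytes from the hex dump in the patch's example.
SAMPLE = bytes.fromhex(
    "07000000120000000100000022000000"
    "64756d6d793a6672656428273233270a"
    "273334270a2962657274282731323327"
    "0a29"
)

print(parse_attributes_record(SAMPLE))
# (18, {'dummy': {'fred': ['23', '34'], 'bert': ['123']}})
```

The result has `fred` as a two-element list and `bert` as a one-element list. This matches the statement that the format does not distinguish a single-valued attribute from a one-element attribute array.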