X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=spv-file-format.texi;h=1aef76c64750d0bef77720dadf166235531993dc;hb=7802d3315464f29b075f69253162405e4d52fd27;hp=c58723379517a7ddfd275184a96020313f147703;hpb=86009a9088ecdfddfacd7974f30b88cb89937d55;p=pspp

diff --git a/spv-file-format.texi b/spv-file-format.texi
index c587233795..1aef76c647 100644
--- a/spv-file-format.texi
+++ b/spv-file-format.texi
@@ -55,10 +55,12 @@ Same format used for tables, with a different name.
 The structure of a chart plus its data. Charts do not have a
 ``light'' format.
 
-@item @var{prefix}_model.xml
-@itemx @var{prefix}_pmml.xml
-@itemx @var{prefix}_stats.xml
+@item @var{prefix}_model.scf
+@itemx @var{prefix}_pmml.scf
 Not yet investigated. The corpus contains only one example of each.
+
+@item @var{prefix}_stats.xml
+Not yet investigated. The corpus contains few examples.
 @end table
 
 The @file{@var{prefix}} in the names of the detail members is
@@ -400,8 +402,24 @@ styles := 00 font*8
     (i0 | i-1) (00 | 01) 00 (00 | 01)
     int
     byte[decimal] byte[grouping]
-    int[x5] string*[x5] /* @r{custom currency} */
-    int[x6] byte*[x6]
+    int[n-ccs] string*[n-ccs] /* @r{custom currency} */
+    styles2
+
+x2 := 00 00 00 01 00 00 00 00 00 00 00 00 00 02 00 00 00 00 /* @r{18 bytes} */
+
+styles2 := i0 /* @r{version 1} */
+styles2 := count(count(x5) count(x6)) /* @r{version 3} */
+x5 := byte*33 int[n] int*n
+x6 := 01 00 (03 | 04) 00 00 00
+    string[command] string[subcommand]
+    string[language] string[charset] string[locale]
+    (00 | 01) 00 (00 | 01) (00 | 01)
+    int
+    byte[decimal] byte[grouping]
+    byte*8 01
+    (string[dataset] string[datafile] i0 int i0)?
+    int[n-ccs] string*[n-ccs]
+    2e (00 | 01) (i2000000 i0)?
 @end example
 
 In every example in the corpus, @code{x1} is 240. The meaning of the
@@ -411,8 +429,12 @@ In every example in the corpus, @code{x2} is 18 and the bytes that
 follow it are @code{00 00 00 01 00 00 00 00 00 00 00 00 00 02 00 00 00
 00}. The meaning of these bytes is unknown.
 
-Observed values of @code{x3} vary from 16 to 150. The bytes that
-follow it vary somewhat.
+In every example in the corpus for version 1, @code{x3} is 16 and the
+bytes that follow it are @code{00 00 00 01 00 00 00 01 00 00 00 00 01
+01 01 01}. In version 3, observed @code{x3} varies from 117 to 150,
+and its bytes include a 1-byte count at offset 0x34. When the count
+is nonzero, a text string of that length at offset 0x35 is the name of
+a ``TableLook'', e.g. ``Default'' or ``Academic''.
 
 Observed values of @code{x4} vary from 0 to 17. Out of 7060 examples
 in the corpus, it is nonzero only 36 times.
@@ -429,9 +451,9 @@ are @samp{.} and @samp{,}.
 @samp{,}, @samp{.}, @samp{'}, @samp{ }, and zero (presumably
 indicating that digits should not be grouped).
 
-@code{x5} is observed as either 0 or 5. When it is 5, the following
-strings are CCA through CCE format strings. Most commonly these are
-all @code{-,,,} but other strings occur.
+@code{n-ccs} is observed as either 0 or 5. When it is 5, the
+following strings are CCA through CCE format strings. Most commonly
+these are all @code{-,,,} but other strings occur.
 
 @example
 font := byte[index] 31 string[typeface]
@@ -554,3 +576,75 @@ are terminal categories that directly represent data values for a
 variable (e.g. in a frequency table or crosstabulation, a group of
 values in a variable being tabulated) and i0 otherwise, but this might
 be naive.
+
+@example
+data := int[layers] int[rows] int[columns] int*[n-dimensions]
+        int[n-data] datum*[n-data]
+@end example
+
+The values of @code{layers}, @code{rows}, and @code{columns} each
+specify the number of dimensions represented in layers or rows or
+columns, respectively, and their values sum to the number of
+dimensions.
+
+The @code{n-dimensions} integers are a permutation of the 0-based
+dimension numbers. The first @code{layers} of them specify each of
+the dimensions represented by layers, the next @code{rows} of them
+specify the dimensions represented by rows, and the final
+@code{columns} of them specify the dimensions represented by columns.
+When there is more than one dimension of a given kind, the inner
+dimensions are given first.
+
+@example
+datum := int64[index] 00? value /* @r{version 1} */
+datum := int64[index] value /* @r{version 3} */
+@end example
+
+The format of a datum varies slightly from version 1 to version 3: in
+version 1 it allows for an extra optional 00 byte.
+
+A datum consists of an index and a value. Suppose there are @math{d}
+dimensions and dimension @math{i} for @math{0 \le i < d} has
+@math{n_i} categories. Consider the datum at coordinates @math{x_i}
+for @math{0 \le i < d}; note that @math{0 \le x_i < n_i}. Then the
+index is calculated by the following algorithm:
+
+@display
+let index = 0
+for each @math{i} from 0 to @math{d - 1}:
+    index = @math{n_i \times} index + @math{x_i}
+@end display
+
+For example, suppose there are 3 dimensions with 3, 4, and 5
+categories, respectively. The datum at coordinates (1, 2, 3) has
+index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
+
+@example
+value := 00? 00? 00? 00? raw-value
+raw-value := 01 opt-value int32[format] double
+           | 02 opt-value int32[format] double string[varname] string[vallab]
+             (01 | 02 | 03)
+           | 03 string[local] opt-value string[id] string[c] (00 | 01)
+           | 04 opt-value int32[format] string[vallab] string[varname]
+             (01 | 02 | 03) string[vallab]
+           | 05 opt-value string[varname] string[varlabel] (01 | 02 | 03)
+           | opt-value string[format] int32[n-substs] substitution*[n-substs]
+substitution := i0 value
+              | int32[x] value*[x + 1] /* @r{x > 0} */
+opt-value := 31 i0 (i0 | i1 string) opt-value-i0-v1 /* @r{version 1} */
+           | 31 i0 (i0 | i1 string) opt-value-i0-v3 /* @r{version 3} */
+           | 31 i1 int32[footnote-number] nested-string
+           | 31 i2 (00 | 01 | 02) 00 (i1 | i2 | i3) nested-string
+           | 31 i3 00 00 01 00 i2 nested-string
+           | 58
+opt-value-i0-v1 := 00 (i1 | i2) 00 00 int32 00 00
+opt-value-i0-v3 := count(counted-string
+                         (58 | 31 style)
+                         (58
+                          | 31 i0 i0 i0 i0 01 00 (01 | 02 | 08)
+                            00 08 00 0a 00))
+
+style := 01? 00? 00? 00? 01 string[fgcolor] string[bgcolor] string[font] byte
+nested-string := 00 00 count(counted-string (58 | 31 style) 58)
+counted-string := count((i0 (58 | 31 string))?)
+@end example
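
The datum index rule and the layers/rows/columns permutation added in the last hunk can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from PSPP: the function names `datum_index`, `datum_coords`, and `split_axes` are hypothetical, and the inverse mapping in `datum_coords` is derived from the index formula rather than stated in the patch.

```python
from typing import List, Sequence, Tuple


def datum_index(sizes: Sequence[int], coords: Sequence[int]) -> int:
    """Fold coordinates x_i into one index using index = n_i * index + x_i."""
    if len(sizes) != len(coords):
        raise ValueError("sizes and coords must have the same length")
    index = 0
    for n, x in zip(sizes, coords):
        if not 0 <= x < n:
            raise ValueError(f"coordinate {x} out of range for {n} categories")
        index = n * index + x
    return index


def datum_coords(sizes: Sequence[int], index: int) -> List[int]:
    """Invert datum_index (an assumption derived from the formula above).

    The last dimension varies fastest, so it is peeled off first.
    """
    coords = []
    for n in reversed(sizes):
        index, x = divmod(index, n)
        coords.append(x)
    if index != 0:
        raise ValueError("index too large for the given dimension sizes")
    coords.reverse()
    return coords


def split_axes(
    layers: int, rows: int, columns: int, permutation: Sequence[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Split the dimension permutation into layer, row, and column groups."""
    n = len(permutation)
    if layers + rows + columns != n:
        raise ValueError("layers + rows + columns must equal n-dimensions")
    if sorted(permutation) != list(range(n)):
        raise ValueError("expected a permutation of 0-based dimension numbers")
    perm = list(permutation)
    return perm[:layers], perm[layers:layers + rows], perm[layers + rows:]
```

With the patch's own example, `datum_index([3, 4, 5], [1, 2, 3])` gives 33, and `datum_coords([3, 4, 5], 33)` recovers `[1, 2, 3]`. Per the patch text, each group returned by `split_axes` lists inner dimensions first.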