X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=spv-file-format.texi;h=f9d698706e50acf0ba48e577f2f1647a2737a118;hb=8af10bb39253b97589c5f4b455b708c8fb9e233b;hp=c58723379517a7ddfd275184a96020313f147703;hpb=86009a9088ecdfddfacd7974f30b88cb89937d55;p=pspp

diff --git a/spv-file-format.texi b/spv-file-format.texi
index c587233795..f9d698706e 100644
--- a/spv-file-format.texi
+++ b/spv-file-format.texi
@@ -411,8 +411,12 @@ In every example in the corpus, @code{x2} is 18 and the bytes that
 follow it are @code{00 00 00 01 00 00 00 00 00 00 00 00 00 02 00 00 00
 00}. The meaning of these bytes is unknown.
 
-Observed values of @code{x3} vary from 16 to 150. The bytes that
-follow it vary somewhat.
+In every example in the corpus for version 1, @code{x3} is 16 and the
+bytes that follow it are @code{00 00 00 01 00 00 00 01 00 00 00 00 01
+01 01 01}. In version 3, observed @code{x3} varies from 117 to 150 and
+the bytes that follow it vary somewhat and often include a readable
+text string, e.g. ``Default'' or ``Academic'', which appears to be the
+name of a ``TableLook''.
 
 Observed values of @code{x4} vary from 0 to 17. Out of 7060 examples
 in the corpus, it is nonzero only 36 times.
@@ -554,3 +558,77 @@ are terminal categories that directly represent data values for a
 variable (e.g. in a frequency table or crosstabulation, a group of
 values in a variable being tabulated) and i0 otherwise, but this
 might be naive.
+
+@example
+data := int[layers] int[rows] int[columns] int*[n-dimensions]
+        int[n-data] datum*[n-data]
+@end example
+
+The values of @code{layers}, @code{rows}, and @code{columns} each
+specifies the number of dimensions represented in layers or rows or
+columns, respectively, and their values sum to the number of
+dimensions.
+
+The @code{n-dimensions} integers are a permutation of the 0-based
+dimension numbers. The first @code{layers} of them specify each of
+the dimensions represented by layers, the next @code{rows} of them
+specify the dimensions represented by rows, and the final
+@code{columns} of them specify the dimensions represented by columns.
+When there is more than one dimension of a given kind, the inner
+dimensions are given first.
+
+@example
+datum := int64[index] 00? value      /* @r{version 1} */
+datum := int64[index] value          /* @r{version 3} */
+@end example
+
+The format of a datum varies slightly from version 1 to version 3: in
+version 1 it allows for an extra optional 00 byte.
+
+A datum consists of an index and a value. Suppose there are @math{d}
+dimensions and dimension @math{i} for @math{0 \le i < d} has
+@math{n_i} categories. Consider the datum at coordinates @math{x_i}
+for @math{0 \le i < d}; note that @math{0 \le x_i < n_i}. Then the
+index is calculated by the following algorithm:
+
+@display
+let index = 0
+for each @math{i} from 0 to @math{d - 1}:
+    index = @math{n_i \times} index + @math{x_i}
+@end display
+
+For example, suppose there are 3 dimensions with 3, 4, and 5
+categories, respectively. The datum at coordinates (1, 2, 3) has
+index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
+
+@example
+value := 00? 00? 00? 00? raw-value
+raw-value := 01 opt-value int32[format] double
+           | 02 opt-value int32[format] double string[varname] string[vallab]
+             (01 | 02 | 03)
+           | 03 string[local] opt-value string[id] string[c] (00 | 01)
+           | 04 opt-value int32[format] string[vallab] string[varname]
+             (01 | 02 | 03) string[vallab]
+           | 05 opt-value string[varname] string[varlabel] (01 | 02 | 03)
+           | opt-value string[format] int32[n-substs] substitution*[n-substs]
+substitution := i0 value
+              | int32[x] value*[x + 1]      /* @r{x > 0} */
+opt-value := 31 i0 (i0 | i1 string) opt-value-i0-v1 /* @r{version 1} */
+           | 31 i0 (i0 | i1 string) opt-value-i0-v3 /* @r{version 3} */
+           | 31 i1 int32[footnote-number] nested-string
+           | 31 i2 (00 | 02) 00 (i1 | i2 | i3) nested-string
+           | 31 i3 00 00 01 00 i2 nested-string
+           | 58
+opt-value-i0-v1 := 00 (i1 | i2) 00 00 int32 00 00
+opt-value-i0-v3 := count(counted-string
+                         (58
+                          | 31 01? 00? 00? 00? 01
+                            string[fgcolor] string[bgcolor] string[typeface]
+                            byte)
+                         (58
+                          | 31 i0 i0 i0 i0 01 00 (01 | 02 | 08)
+                            00 08 00 0a 00))
+
+nested-string := 00 00 count(counted-string 58 58)
+counted-string := count((i0 (58 | 31 string))?)
+@end example
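
Reviewer note on the second hunk: the index calculation it adds can be checked with a small sketch. The Python below is not part of the patch or of PSPP; the function names `datum_index` and `datum_coords` are hypothetical, and the inverse direction is an assumption derived from the forward loop, since the patch text only describes how an index is computed from coordinates.

```python
def datum_index(coords, n_categories):
    """Fold per-dimension coordinates into one index.

    Mirrors the loop in the patch: index = n_i * index + x_i for i = 0..d-1.
    """
    if len(coords) != len(n_categories):
        raise ValueError("coords and n_categories must have the same length")
    index = 0
    for x, n in zip(coords, n_categories):
        if not 0 <= x < n:
            raise ValueError(f"coordinate {x} out of range for {n} categories")
        index = n * index + x
    return index


def datum_coords(index, n_categories):
    """Recover coordinates from an index (assumed inverse of datum_index).

    Peels off the last dimension first, because it was folded in last.
    """
    coords = []
    for n in reversed(n_categories):
        index, x = divmod(index, n)
        coords.append(x)
    if index != 0:
        raise ValueError("index is too large for the given dimensions")
    return tuple(reversed(coords))


if __name__ == "__main__":
    # Worked example from the patch: 3, 4, and 5 categories, coordinates (1, 2, 3).
    print(datum_index((1, 2, 3), (3, 4, 5)))
    print(datum_coords(33, (3, 4, 5)))
```

With the dimensions from the patch's own example, `datum_index((1, 2, 3), (3, 4, 5))` reproduces the stated value 33, and `datum_coords` maps 33 back to (1, 2, 3).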