X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=spv-file-format.texi;h=a5f45e5b4979f7a86307413c2ad4a8c304658a5c;hb=463238cd3f894fc6fb5cebbcc7bb2b9584c640a8;hp=5ced97ce610131e2d93d40d20675836b2efd48c5;hpb=368df5e381894c9fa6bb2b51596a1a58b0d869a8;p=pspp diff --git a/spv-file-format.texi b/spv-file-format.texi index 5ced97ce61..a5f45e5b49 100644 --- a/spv-file-format.texi +++ b/spv-file-format.texi @@ -517,28 +517,82 @@ for the first dimension, 1 for the second, and so on. The latter is the case 98% of the time in the corpus. @example -category := value[name] - (00 | 01)[merge] 00 (00 | 01)[unindexed] (i0 | i2) - int[index] int[n-subcategories] category*[n-subcategories] +category := value[name] (terminal | group) +terminal-category := 00 00 00 i2 int[index] i0 @end example +@code{name} is the name of the category (or group). + @code{category} can represent a terminal category. In that case, -@code{name} is the name of the category, @code{merge} is 00, -@code{unindexed} is 00, @code{index} is a nonnegative integer less -than @code{n-categories} in the @code{dimension} in which the -@code{category} is nested (directly or indirectly), and -@code{n-subcategories} is 0. - -Alternatively, @code{category} can represent a group of nested -categories. In that case, @code{name} is the name of the group, -@code{unindexed} is 01, and @code{index} is -1. Ordinarily a group -has some nested content, so that @code{n-subcategories} is positive, -but a few instances of groups with @code{n-subcategories} 0 has been -observed. If @code{merge} is 00, the most common value, then the -group is really a distinct group that should be represented as such in -the visual representation and user interface. 
If @code{merge} is 01, -however, the categories in this group should be shown and treated as -if they were direct children of the group's parent group (or if it has -no parent group, then direct children of the dimension), and this -group's name is irrelevant and should not be displayed. (Merged -groups can be nested!) +@code{index} is a nonnegative integer less than @code{n-categories} in +the @code{dimension} in which the @code{category} is nested (directly +or indirectly). + +Alternatively, @code{category} can represent a @code{group} of nested +categories: + +@example +group := (00 | 01)[merge] 00 01 (i0 | i2)[data] + i-1 int[n-subcategories] category*[n-subcategories] +@end example + +Ordinarily a group has some nested content, so that +@code{n-subcategories} is positive, but a few instances of groups with +@code{n-subcategories} 0 have been observed. + +If @code{merge} is 00, the most common value, then the group is really +a distinct group that should be represented as such in the visual +representation and user interface. If @code{merge} is 01, however, +the categories in this group should be shown and treated as if they +were direct children of the group's parent group (or if it has no +parent group, then direct children of the dimension), and this group's +name is irrelevant and should not be displayed. (Merged groups can be +nested!) + +@code{data} appears to be i2 when all of the categories within a group +are terminal categories that directly represent data values for a +variable (e.g. in a frequency table or crosstabulation, a group of +values in a variable being tabulated) and i0 otherwise, but this might +be naive. + +@example +data := int[layers] int[rows] int[columns] int*[n-dimensions] + int[n-data] datum*[n-data] +@end example + +The values of @code{layers}, @code{rows}, and @code{columns} each +specify the number of dimensions represented in layers or rows or +columns, respectively, and their values sum to the number of +dimensions. 
+ +The @code{n-dimensions} integers are a permutation of the 0-based +dimension numbers. The first @code{layers} of them specify each of +the dimensions represented by layers, the next @code{rows} of them +specify the dimensions represented by rows, and the final +@code{columns} of them specify the dimensions represented by columns. +When there is more than one dimension of a given kind, the inner +dimensions are given first. + +@example +datum := int64[index] 00? value @r{# Version 1.} +datum := int64[index] value @r{# Version 3.} +@end example + +A datum consists of an index and a value. Suppose there are @math{d} +dimensions and dimension @math{i} for @math{0 \le i < d} has +@math{n_i} categories. Consider the datum at coordinates @math{x_i} +for @math{0 \le i < d}; note that @math{0 \le x_i < n_i}. Then the +index is calculated by the following algorithm: + +@display +let index = 0 +for each @math{i} from 0 to @math{d - 1}: + index = @math{n_i \times} index + @math{x_i} +@end display + +For example, suppose there are 3 dimensions with 3, 4, and 5 +categories, respectively. The datum at coordinates (1, 2, 3) has +index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}. + +The format of a datum varies slightly from version 1 to version 3, in +that version 1 has an extra optional 00 byte.
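The index algorithm and the layers/rows/columns split described in the added text can be sketched in Python as follows. This is only an illustrative sketch, not code from PSPP: the function names `datum_index`, `datum_coords`, and `split_axes` are hypothetical, and `datum_coords` (the inverse mapping) is not described in the text but follows from the same recurrence.

```python
from typing import List, Sequence, Tuple


def datum_index(coords: Sequence[int], n_categories: Sequence[int]) -> int:
    """Fold coordinates x_i into one index via index = n_i * index + x_i."""
    if len(coords) != len(n_categories):
        raise ValueError("one coordinate is needed per dimension")
    index = 0
    for x, n in zip(coords, n_categories):
        if not 0 <= x < n:
            raise ValueError(f"coordinate {x} is outside 0..{n - 1}")
        index = n * index + x
    return index


def datum_coords(index: int, n_categories: Sequence[int]) -> List[int]:
    """Invert datum_index (hypothetical helper, not part of the format text)."""
    coords = []
    # The last dimension was folded in last, so it is peeled off first.
    for n in reversed(n_categories):
        index, x = divmod(index, n)
        coords.append(x)
    if index != 0:
        raise ValueError("index is too large for these dimensions")
    coords.reverse()
    return coords


def split_axes(
    layers: int, rows: int, columns: int, dimensions: Sequence[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Split the dimension permutation into layer, row, and column dimensions."""
    dims = list(dimensions)
    if layers + rows + columns != len(dims):
        raise ValueError("layers + rows + columns must equal n-dimensions")
    if sorted(dims) != list(range(len(dims))):
        raise ValueError("dimensions must be a permutation of 0..n-1")
    # Within each slice, the inner dimensions come first, as the text states.
    return (
        dims[:layers],
        dims[layers:layers + rows],
        dims[layers + rows:],
    )


if __name__ == "__main__":
    # Worked example from the text: 3, 4, and 5 categories, coordinates (1, 2, 3).
    print(datum_index((1, 2, 3), (3, 4, 5)))  # 33
    print(datum_coords(33, (3, 4, 5)))        # [1, 2, 3]
    print(split_axes(1, 1, 1, [2, 0, 1]))     # ([2], [0], [1])
```

With the example from the text, `datum_index((1, 2, 3), (3, 4, 5))` evaluates 5 × (4 × (3 × 0 + 1) + 2) + 3 and returns 33, so the last dimension varies fastest.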