From 5089e6cce252354ae178ae74d5d9035f025322b3 Mon Sep 17 00:00:00 2001 From: Ben Pfaff Date: Thu, 7 Jan 2021 16:38:53 -0800 Subject: [PATCH] spv-file-format: Revise .spv light member documentation. --- doc/dev/spv-file-format.texi | 110 +++++++++++++++++------------------ src/output/spv/spv-writer.c | 49 ++++++++++------ 2 files changed, 87 insertions(+), 72 deletions(-) diff --git a/doc/dev/spv-file-format.texi b/doc/dev/spv-file-format.texi index 0aa53083b9..ebeac668cb 100644 --- a/doc/dev/spv-file-format.texi +++ b/doc/dev/spv-file-format.texi @@ -946,7 +946,7 @@ A ``light'' detail member @file{.bin} consists of a number of sections concatenated together, terminated by an optional byte 01: @example -LightMember => +Table => Header Titles Footnotes Areas Borders PrintSettings TableSettings Formats Dimensions Axes Cells @@ -998,17 +998,12 @@ and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for version-specific formatting (as described previously). If @code{rotate-inner-column-labels} is 1, then column labels closest -to the data are rotated to be vertical; otherwise, they are shown -in the normal way. +to the data are rotated 90° counterclockwise; otherwise, they are +shown in the normal way. If @code{rotate-outer-row-labels} is 1, then row labels farthest from -the data are rotated to be vertical; otherwise, they are shown in the -normal way. - -@code{table-id} is a binary version of the @code{tableId} attribute in -the structure member that refers to the detail member. For example, -if @code{tableId} is @code{-4122591256483201023}, then @code{table-id} -would be 0xc6c99d183b300001. +the data are rotated 90° counterclockwise; otherwise, they are shown +in the normal way. @code{min-col-width} is the minimum width that a column will be assigned automatically. @code{max-col-width} is the maximum width @@ -1017,6 +1012,11 @@ that a column will be assigned to accommodate a long column label. the width of row labels. 
All of these measurements are in 1/96 inch units (called a ``device independent pixel'' unit in Windows). +@code{table-id} is a binary version of the @code{tableId} attribute in +the structure member that refers to the detail member. For example, +if @code{tableId} is @code{-4122591256483201023}, then @code{table-id} +would be 0xc6c99d183b300001. + The meaning of the other variable parts of the header is not known. A writer may safely use version 3, true for @code{x0}, false for @code{x1}, true for @code{x2}, and 0x15 for @code{x3}. @@ -1036,7 +1036,7 @@ Titles => The Titles follow the Header and specify the table's title, caption, and corner text. -The @code{user-title} is shown above the title and reflects any user +The @code{user-title} reflects any user editing of the title text or style. The @code{title} is the title originally generated by the procedure. Both of these are appropriate for presentation and localized to the user's language. For example, @@ -1049,9 +1049,9 @@ name the variable and @code{c} is simply ``Frequencies''. The @code{corner-text}, if present, is shown in the upper-left corner of the table, above the row headings and to the left of the column -headings. It is usually absent. Corner text prevents row dimension -labels from being displayed above the dimension's group and category -labels (see @code{show-row-labels-in-corner}). +headings. It is usually absent. When row dimension labels are +displayed in the corner (see @code{show-row-labels-in-corner}), corner +text is hidden. The @code{caption}, if present, is shown below the table. @code{caption} reflects user editing of the caption. @@ -1092,7 +1092,7 @@ Each Area represents the style for a different area of the table, in the following order: title, caption, footer, corner, column labels, row labels, data, and layers. -@code{index} is the 1-based index of the Area, i.e. 
1 for the first +@code{index} is the 1-based index of the Area, i.e.@: 1 for the first Area, through 8 for the final Area. @code{typeface} is the string name of the font used in the area. In @@ -1100,7 +1100,7 @@ the corpus, this is @code{SansSerif} in over 99% of instances and @code{Times New Roman} in the rest. @code{size} is the size of the font, in px (@pxref{SPV Light Detail -Member Format}) The most common size in the corpus is 12 px. Even +Member Format}). The most common size in the corpus is 12 px. Even though @code{size} has a floating-point type, in the corpus its values are always integers. @@ -1217,8 +1217,9 @@ PrintSettings => The PrintSettings reflect settings for printing. The fixed value of @code{endian} can be used to validate the endianness. -@code{all-layers} is 1 to print all layers, 0 to print only the -visible layers. +@code{all-layers} is 1 to print all layers, 0 to print only the layer +designated by @code{current-layer} in TableSettings (@pxref{SPV Light +Member Table Settings}). @code{paginate-layers} is 1 to print each layer at the start of a new page, 0 otherwise. (This setting is honored only @code{all-layers} is @@ -1257,7 +1258,7 @@ TableSettings => ) bestring[notes] bestring[table-look] - 00...)) + )...) Breakpoints => be32[n-breaks] be32*[n-breaks] @@ -1303,7 +1304,7 @@ or columns to pixel or point offsets. @code{notes} is a text string that contains user-specified notes. It is displayed when the user hovers the cursor over the table, like text -in the @code{title} attribute in HTML. It is not printed. It is +in the @code{title} attribute in HTML@. It is not printed. It is usually empty. @code{table-look} is the name of a SPSS ``TableLook'' table style, @@ -1322,7 +1323,7 @@ Formats => int32[n-widths] int32*[n-widths] string[locale] int32[current-layer] - bool bool bool + bool[x7] bool[x8] bool[x9] Y0 CustomCurrency count( @@ -1358,6 +1359,8 @@ following strings are CCA through CCE format strings. 
@xref{Custom Currency Formats,,, pspp, PSPP}. Most commonly these are all @code{-,,,} but other strings occur. +A writer may safely use false for @code{x7}, @code{x8}, and @code{x9}. + @subsubheading X0 X0 only appears, optionally, in version 1 members. @@ -1377,19 +1380,11 @@ output, in English. It is not necessarily the literal syntax name of the procedure: for example, NPAR TESTS becomes ``Nonparametric Tests.'' @code{command-local} is the procedure's name, translated into the output language; it is often empty and, when it is not, -sometimes the same as @code{command}. - -@code{dataset} is the name of the dataset analyzed to produce the -output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the -file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter -is sometimes the empty string. +sometimes the same as @code{command}. @code{missing} is the character used to indicate that a cell contains a missing value. It is always observed as @samp{.}. -X0 repeats @code{decimal}, @code{grouping}, CustomCurrency, and -@code{missing} already included in Formats. - A writer may safely use false for @code{x17}. @subsubheading X1 @@ -1398,7 +1393,7 @@ X1 only appears in version 3 members. @example X1 => - bool + bool[x14] byte[show-title] bool[x16] byte[lang] @@ -1431,8 +1426,8 @@ which probably means to use a global default. @code{show-caption} is true to show the caption, false to hide it. -A writer may safely use false for @code{x14}, false -for @code{x16}, -1 for @code{x18} and @code{x19}, and false for +A writer may safely use false for @code{x14}, false for @code{x16}, 0 +for @code{lang}, -1 for @code{x18} and @code{x19}, and false for @code{x20}. @subsubheading X2 @@ -1476,22 +1471,22 @@ X3 => (int32[x22] i0)? @end example -@code{date} is a date, as seconds since the epoch, i.e.@: since -January 1, 1970. 
Pivot tables within an SPV file often have dates a -few minutes apart, so this is probably a creation date for the table -rather than for the file. - -X3 repeats @code{decimal}, @code{grouping}, CustomCurrency, and -@code{missing} already included in Formats. @code{command}, -@code{command-local}, @code{language}, @code{charset}, and -@code{locale} have the same meaning as in X0. - @code{small} is a small real number. In the corpus, it overwhelmingly takes the value 0.0001, with zero occasionally seen. Nonzero numbers with format 40 (@pxref{SPV Light Member Value}) whose magnitudes are smaller than displayed in scientific notation. (Thus, a @code{small} of zero prevents scientific notation from being chosen.) +@code{dataset} is the name of the dataset analyzed to produce the +output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the +file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter +is sometimes the empty string. + +@code{date} is a date, as seconds since the epoch, i.e.@: since +January 1, 1970. Pivot tables within an SPV file often have dates a +few minutes apart, so this is probably a creation date for the table +rather than for the file. + Sometimes @code{dataset}, @code{datafile}, and @code{date} are present and other times they are absent. The reader can distinguish by assuming that they are present and then checking whether the @@ -1550,7 +1545,8 @@ When @code{hide-all-labels} is 01, @code{show-dim-label} is ignored. @code{dim-index} is usually the 0-based index of the dimension, e.g.@: 0 for the first dimension, 1 for the second, and so on. Sometimes it -is -1. There is no visible difference. +is -1. There is no visible difference. A writer may safely use the +0-based index. @node SPV Light Member Categories @subsection Categories @@ -1577,12 +1573,12 @@ Leaf. If the user does sorts or rearrange the categories, then the order of categories in the file reflects that change and @code{leaf-index} reflects the original order. 
-Occasionally a dimension has no leaf categories at all. A table that +A dimension can have no leaf categories at all. A table that contains such a dimension necessarily has no data at all. A Group is a group of nested categories. Usually a Group contains at -least one Category, so that @code{n-subcategories} is positive, but a -few Groups with @code{n-subcategories} 0 has been observed. +least one Category, so that @code{n-subcategories} is positive, but +Groups with zero subcategories have been observed. If a Group's @code{merge} is 00, the most common value, then the group is really a distinct group that should be represented as such in the @@ -1593,8 +1589,7 @@ parent group, then direct children of the dimension), and this group's name is irrelevant and should not be displayed. (Merged groups can be nested!) -(For writing an SPV file, there is no need to use the @code{merge} -feature unless it is convenient.) +Writers need not use merged groups. A Group's @code{x23} appears to be i2 when all of the categories within a group are leaf categories that directly represent data values @@ -1640,7 +1635,7 @@ Cell => int64[index] v1(00?) Value A Cell consists of an @code{index} and a Value. Suppose there are @math{d} dimensions, numbered 1 through @math{d} in the order given in -the Dimensions previously, and that dimension @math{i}, has @math{n_i} +the Dimensions previously, and that dimension @math{i} has @math{n_i} categories. Consider the cell at coordinates @math{x_i}, @math{1 \le i \le d}, and note that @math{0 \le x_i < n_i}. 
Then the index is calculated by the following algorithm: @@ -1672,6 +1667,7 @@ RawValue => @math{|} 04 ValueMod int32[format] string[value-label] string[var-name] byte[show] string[s] @math{|} 05 ValueMod string[var-name] string[var-label] byte[show] + @math{|} 06 string[local] ValueMod string[id] string[c] @math{|} ValueMod string[template] int32[n-args] Argument*[n-args] Argument => i0 Value @@ -1696,8 +1692,8 @@ Most commonly, @code{format} has width 40 (the maximum). An @code{x} with the maximum negative double value @code{-DBL_MAX} represents the system-missing value SYSMIS. (HIGHEST and LOWEST have -not been observed.) @xref{System File Format}, for more about these -special values. +not been observed.) See @ref{System File Format}, for more about +these special values. @item 02 Similar to @code{01}, with the additional information that @code{x} is @@ -1747,14 +1743,18 @@ AHEX format. @code{show} has the same meaning as in the encoding for 02. @item 05 -Variable @code{var-name}, which is rarely observed as empty in the -corpus, with variable label @code{var-label}, which is often empty. +Variable @code{var-name} with variable label @code{var-label}. In the +corpus, @code{var-name} is rarely empty and @code{var-label} is often +empty. @code{show} determines whether to show the variable name or the variable label. A value of 1 means to show the name, 2 to show the label, 3 to show both, and 0 means to use the default specified in @code{show-variables} (@pxref{SPV Light Member Formats}). +@item 06 +Similar to type 03, with @code{fixed} assumed to be true. + @item otherwise When the first byte of a RawValue is not one of the above, the RawValue starts with a ValueMod, whose syntax is described in the next @@ -1870,7 +1870,7 @@ a Value. Each of the @code{n-refs} integers is a reference to a Footnote (@pxref{SPV Light Member Footnotes}) by 0-based index. Footnote markers are shown appended to the main text of the Value, as -superscripts. 
+superscripts or subscripts. The @code{subscripts}, if present, are strings to append to the main text of the Value, as subscripts. Each subscript text is a brief diff --git a/src/output/spv/spv-writer.c b/src/output/spv/spv-writer.c index 092558d832..c27368aa63 100644 --- a/src/output/spv/spv-writer.c +++ b/src/output/spv/spv-writer.c @@ -691,7 +691,7 @@ put_category (struct buf *buf, const struct pivot_category *c) else { put_bytes (buf, "\0\0\1", 3); - put_u32 (buf, 0); + put_u32 (buf, 0); /* x23 */ put_u32 (buf, -1); put_u32 (buf, c->n_subs); for (size_t i = 0; i < c->n_subs; i++) @@ -704,7 +704,7 @@ put_y0 (struct buf *buf, const struct pivot_table *table) { put_u32 (buf, table->settings.epoch); put_byte (buf, table->settings.decimal); - put_byte (buf, table->grouping); + put_byte (buf, ','); } static void @@ -724,17 +724,17 @@ put_custom_currency (struct buf *buf, const struct pivot_table *table) static void put_x1 (struct buf *buf, const struct pivot_table *table) { - put_byte (buf, 0); + put_byte (buf, 0); /* x14 */ put_byte (buf, table->show_title ? 
1 : 10); - put_byte (buf, 0); - put_byte (buf, 0); + put_byte (buf, 0); /* x16 */ + put_byte (buf, 0); /* lang */ put_show_values (buf, table->show_variables); put_show_values (buf, table->show_values); - put_u32 (buf, -1); - put_u32 (buf, -1); + put_u32 (buf, -1); /* x18 */ + put_u32 (buf, -1); /* x19 */ for (int i = 0; i < 17; i++) put_byte (buf, 0); - put_bool (buf, false); + put_bool (buf, false); /* x20 */ put_byte (buf, table->show_caption); } @@ -748,9 +748,8 @@ put_x2 (struct buf *buf) } static void -put_x3 (struct buf *buf, const struct pivot_table *table) +put_y1 (struct buf *buf, const struct pivot_table *table) { - put_bytes (buf, "\1\0\4\0\0\0", 6); put_string (buf, table->command_c); put_string (buf, table->command_local); put_string (buf, table->language); @@ -758,6 +757,26 @@ put_x3 (struct buf *buf, const struct pivot_table *table) put_string (buf, table->locale); put_bytes (buf, "\0\0\1\1", 4); put_y0 (buf, table); +} + +static void +put_y2 (struct buf *buf, const struct pivot_table *table) +{ + put_custom_currency (buf, table); + put_byte (buf, '.'); + put_bool (buf, 0); +} + +static void +put_x3 (struct buf *buf, const struct pivot_table *table) +{ + put_byte (buf, 1); + put_byte (buf, 0); + put_byte (buf, 4); /* x21 */ + put_byte (buf, 0); + put_byte (buf, 0); + put_byte (buf, 0); + put_y1 (buf, table); put_double (buf, table->small); put_byte (buf, 1); put_string (buf, table->dataset); @@ -765,11 +784,7 @@ put_x3 (struct buf *buf, const struct pivot_table *table) put_u32 (buf, 0); put_u32 (buf, table->date); put_u32 (buf, 0); - - /* Y2. 
*/ - put_custom_currency (buf, table); - put_byte (buf, '.'); - put_bool (buf, 0); + put_y2 (buf, table); } static void @@ -943,9 +958,9 @@ put_light_table (struct buf *buf, uint64_t table_id, { const struct pivot_dimension *d = table->dimensions[i]; put_value (buf, d->root->name); - put_byte (buf, 0); + put_byte (buf, 0); /* x1 */ put_byte (buf, x2[i]); - put_u32 (buf, 2); + put_u32 (buf, 2); /* x3 */ put_bool (buf, !d->root->show_label); put_bool (buf, d->hide_all_labels); put_bool (buf, 1); -- 2.30.2
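
The documentation hunk in this patch describes @code{table-id} as a binary version of the @code{tableId} attribute, with the example that -4122591256483201023 corresponds to 0xc6c99d183b300001. A minimal C sketch of that relationship follows. It assumes the attribute has already been parsed into an int64_t and that the binary form is simply its 64-bit two's-complement bit pattern. The names table_id_from_attribute and check_documented_example are hypothetical and do not appear in spv-writer.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of PSPP: reinterpret the signed decimal
   tableId attribute as the unsigned 64-bit table-id value.  Conversion
   from a signed to an unsigned integer type is defined by the C standard
   as reduction modulo 2**64, which yields the two's-complement pattern. */
uint64_t
table_id_from_attribute (int64_t table_id_attr)
{
  return (uint64_t) table_id_attr;
}

/* Checks the example given in the documentation text of this patch. */
void
check_documented_example (void)
{
  int64_t attr = -INT64_C (4122591256483201023);
  assert (table_id_from_attribute (attr) == UINT64_C (0xc6c99d183b300001));
}
```

The cast is the whole conversion. Byte order on disk is a separate concern that this sketch does not cover.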