X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fspv-file-format.texi;h=b27037cce7b069580d667ca1147fa586d349aa9a;hb=f8659933d48c5682010d1e1f04ae7acb5cbcd611;hp=79bf4ab8d39bd34a15f102e144599aa188410540;hpb=bbe5d82770003c1309beee4e5c8231ed44885c51;p=pspp diff --git a/doc/dev/spv-file-format.texi b/doc/dev/spv-file-format.texi index 79bf4ab8d3..b27037cce7 100644 --- a/doc/dev/spv-file-format.texi +++ b/doc/dev/spv-file-format.texi @@ -14,7 +14,7 @@ SPSS Viewer or @file{.spv} files, here called SPV files, are written by SPSS 16 and later to represent the contents of its output editor. This chapter documents the format, based on examination of a corpus of -about 3,000 files from a variety of sources. This description is +about 8,000 files from a variety of sources. This description is detailed enough to both read and write SPV files. SPSS 15 and earlier versions instead use @file{.spo} files, which have @@ -31,7 +31,10 @@ manifest contains the string @samp{allowPivoting=true}, without a new-line. PSPP uses this string to identify an SPV file; it is invariant across the corpus.@footnote{SPV files always begin with the 7-byte sequence 50 4b 03 04 14 00 08, but this is not a useful magic -number because most Zip archives start the same way.} +number because most Zip archives start the same way.}@footnote{SPSS +writes @file{META-INF/MANIFEST.MF} to every SPV file, but it does not +read it or even require it to exist, so using different contents, +e.g.@: as @samp{allowingPivot=false} has no effect.} The rest of the members in an SPV file's Zip archive fall into two categories: @dfn{structure} and @dfn{detail} members. Structure @@ -69,6 +72,13 @@ Same format used for tables, with a different name. The structure of a chart plus its data. Charts do not have a ``light'' format. 
+@item @file{@var{prefix}_Imagegeneric.png} +@itemx @file{@var{prefix}_PastedObjectgeneric.png} +@itemx @file{@var{prefix}_imageData.bin} +A PNG image referenced by an @code{object} element (in the first two +cases) or an @code{image} element (in the final case). @xref{SPV +Structure object and image Elements}. + @item @file{@var{prefix}_pmml.scf} @itemx @file{@var{prefix}_stats.scf} @item @file{@var{prefix}_model.xml} @@ -134,7 +144,7 @@ container :page-break-before=(always)? :text-align=(left | center)? :width=dimension -=> label (table | container_text | graph | model | object | image) +=> label (table | container_text | graph | model | object | image | tree) @end example Each attribute specification begins with @samp{:} followed by the @@ -155,11 +165,23 @@ Either @code{true} or @code{false}. @item dimension A floating-point number followed by a unit, e.g.@: @code{10pt}. Units in the corpus include @code{in} (inch), @code{pt} (points, 72/inch), -@code{px} (``device-independent pixels'', 96/inch), and @code{cm}. -The corpus also contains localized names for units: @code{인치} for -inch, @code{пт} for points, and @code{см} for centimeters. If the -unit is omitted then points should be assumed. The number and unit -may be separated by white space. +@code{px} (``device-independent pixels'', 96/inch), and @code{cm}. If +the unit is omitted then points should be assumed. The number and +unit may be separated by white space. + +The corpus also includes localized names for units. A reader must +understand these to properly interpret the dimension: + +@table @asis +@item inch +@code{인치}, @code{pol.}, @code{cala}, @code{cali} + +@item point +@code{пт} + +@item centimeter +@code{см} +@end table @item real A floating-point number. 
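As an illustration of the @code{dimension} rules in this hunk, the following Python sketch converts a dimension string to points. It is not PSPP's implementation: the function name, the regular expression, and the assumption that the number is written with a period as the decimal point are choices made for this example. Only the units, the localized unit names, and the default of points come from the text above.

```python
import re

# Scale factors from each unit to points (72 points per inch).
# The localized spellings are the ones listed in the text above.
_UNIT_TO_POINTS = {
    "": 1.0,                 # unit omitted: assume points
    "pt": 1.0, "пт": 1.0,
    "in": 72.0, "인치": 72.0, "pol.": 72.0, "cala": 72.0, "cali": 72.0,
    "px": 72.0 / 96.0,       # "device-independent pixels", 96 per inch
    "cm": 72.0 / 2.54, "см": 72.0 / 2.54,
}

# A number, optional white space, then an optional unit.
# Assumption: a period is the decimal separator.
_DIMENSION_RE = re.compile(r"\s*([-+]?(?:\d+\.?\d*|\.\d+))\s*(\S*)\s*$")


def dimension_to_points(text):
    """Convert a dimension such as '10pt' or '1 인치' to points."""
    match = _DIMENSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a dimension: {text!r}")
    number, unit = match.groups()
    try:
        scale = _UNIT_TO_POINTS[unit]
    except KeyError:
        raise ValueError(f"unknown unit in dimension: {text!r}") from None
    return float(number) * scale
```

A reader that does not know a localized unit cannot interpret the dimension, so this sketch raises an error rather than guessing.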
@@ -279,7 +301,9 @@ information, and the CSS from the embedded HTML: * SPV Structure table Element:: * SPV Structure graph Element:: * SPV Structure model Element:: -* SPV Structure dataPath and path Elements:: +* SPV Structure object and image Elements:: +* SPV Structure tree Element:: +* SPV Structure Path Elements:: * SPV Structure pageSetup Element:: * SPV Structure @code{text} Element (Inside @code{pageParagraph}):: @end menu @@ -306,7 +330,7 @@ heading @end example The root of a structure member is a @code{heading}, which represents a -section of output beginning with a title (the @code{label}) and +section of output beginning with a @code{label} and ordinarily followed by content containers or further nested (sub)-sections of output. Unlike heading elements in HTML and other common document formats, which precede the content that they head, @@ -385,17 +409,21 @@ label => TEXT @end example Every @code{heading} and @code{container} holds a @code{label} as its -first child. The root @code{heading} in a structure member always -contains the string ``Output'' (localized). Otherwise, the text in -@code{label} describes what it labels, often by naming the statistical -procedure that was executed, e.g.@: ``Frequencies'' or ``T-Test''. -Labels are often very generic, especially within a @code{container}, -e.g.@: ``Title'' or ``Warnings'' or ``Notes''. Label text is -localized according to the output language, e.g.@: in Italian a -frequency table procedure is labeled ``Frequenze''. - -The corpus contains a few examples of empty labels, ones that contain -no text. +first child. The label text is what appears in the outline pane of +the GUI's viewer window. PSPP also puts it into the outline of PDF +output. The label text doesn't appear in the output itself. + +The text in @code{label} describes what it labels, often by naming the +statistical procedure that was executed, e.g.@: ``Frequencies'' or +``T-Test''. 
The root @code{heading} in a structure member is normally +``Output''. Labels are often very generic, especially within a +@code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''. +Label text is localized according to the output language, e.g.@: in +Italian a frequency table procedure is labeled ``Frequenze''. + +The user can edit labels to be anything they want. The corpus +contains a few examples of empty labels, ones that contain no text, +probably as a result of user editing. @node SPV Structure container Element @subsection The @code{container} Element @@ -406,7 +434,7 @@ container :page-break-before=(always)? :text-align=(left | center)? :width=dimension -=> label (table | container_text | graph | model | object | image) +=> label (table | container_text | graph | model | object | image | tree) @end example A @code{container} serves to contain and label a @code{table}, @@ -497,9 +525,30 @@ inclusive. The CSS in the corpus is simple. To understand it, a parser only needs to be able to skip white space, @code{<!--}, and @code{-->}, and -parse style only for @code{p} elements. Only @code{font-weight}, -@code{font-style}, @code{font-decoration}, @code{font-family}, and -@code{font-size} matter. +parse style only for @code{p} elements. Only the following properties +matter: + +@table @code +@item color +In the form @code{@var{rr}@var{gg}@var{bb}}, e.g.@: @code{000000}, with +no leading @samp{#}. + +@item font-weight +Either @code{bold} or @code{normal}. + +@item font-style +Either @code{italic} or @code{normal}. + +@item text-decoration +Either @code{underline} or @code{normal}. + +@item font-family +A font name, commonly @code{Monospaced} or @code{SansSerif}. + +@item font-size +Values claim to be in points, e.g.@: @code{14pt}, but the values are +actually in ``device-independent pixels'' (px), at 96/inch. +@end table This element has the following attributes. @@ -527,7 +576,7 @@ table :type[table_type]=(table | note | warning) => tableProperties? 
tableStructure -tableStructure => path? dataPath +tableStructure => path? dataPath csvPath? @end example This element has the following attributes. @@ -575,7 +624,9 @@ graph :editor? :refMapId? :refMapURI? -=> dataPath? path + :csvFileIds? + :csvFileNames? +=> dataPath? path csvPath? @end example This element represents a graph. The @code{dataPath} and @code{path} @@ -583,12 +634,21 @@ elements name the Zip members that give the details of the graph. Normally, both elements are present; there is only one counterexample in the corpus. +@code{csvPath} only appears in one SPV file in the corpus, for two +graphs. In these two cases, @code{dataPath}, @code{path}, and +@code{csvPath} all appear. These @code{csvPath} elements name Zip members with +names of the form @file{@var{number}_csv.bin}, where @var{number} is a +many-digit number and the same as the @code{csvFileIds}. The named +Zip members are CSV text files (despite the @file{.bin} extension). +The CSV files are encoded in UTF-8 and begin with a U+FEFF byte-order +marker. + @node SPV Structure model Element @subsection The @code{model} Element @example model - :PMMLContainerId + :PMMLContainerId? :PMMLId :StatXMLContainerId :VDPId @@ -596,7 +656,7 @@ model :commandName :creator-version :mainViewName -=> ViZml? path | pmmlContainerPath statsContainerPath +=> ViZml? dataPath? path | pmmlContainerPath statsContainerPath pmmlContainerPath => TEXT @@ -616,13 +676,51 @@ strings, and @code{path} names a Zip member that contains XML. Alternatively, @code{pmmlContainerPath} and @code{statsContainerPath} name Zip members with @file{.scf} extension. -@node SPV Structure dataPath and path Elements -@subsection The @code{dataPath} and @code{path} Elements +@node SPV Structure object and image Elements +@subsection The @code{object} and @code{image} Elements + +@example +object :type[object_type]=(unknown)? :uri => EMPTY + +image :VDPId :commandName => dataPath +@end example + +These two elements represent an image in PNG format. 
They are +equivalent and the corpus contains examples of both. The only +difference is the syntax: for @code{object}, the @code{uri} attribute +names the Zip member that contains a PNG file; for @code{image}, the +text of the inner @code{dataPath} element names the Zip member. + +PSPP writes @code{object} in output but there is no strong reason to +choose this form. + +The corpus only contains PNG image files. + +@node SPV Structure tree Element +@subsection The @code{tree} Element + +@example +tree + :commandName + :creator-version + :name + :type +=> dataPath path +@end example + +This element represents a tree. The @code{dataPath} and @code{path} +elements name the Zip members that give the details of the tree. +The details are unexplored. + +@node SPV Structure Path Elements +@subsection Path Elements @example dataPath => TEXT path => TEXT + +csvPath => TEXT @end example These elements contain the names of the Zip members that hold details @@ -801,17 +899,17 @@ A byte with value 0 or 1. @item int16 @itemx be16 -A 16-bit integer in little-endian or big-endian byte order, +A 16-bit unsigned integer in little-endian or big-endian byte order, respectively. @item int32 @itemx be32 -A 32-bit integer in little-endian or big-endian byte order, +A 32-bit unsigned integer in little-endian or big-endian byte order, respectively. @item int64 @itemx be64 -A 64-bit integer in little-endian or big-endian byte order, +A 64-bit unsigned integer in little-endian or big-endian byte order, respectively. @item double @@ -822,9 +920,9 @@ A 32-bit IEEE floating-point number. @item string @itemx bestring -A 32-bit integer, in little-endian or big-endian byte order, -respectively, followed by the specified number of bytes of character -data. (The encoding is indicated by the Formats nonterminal.) +A 32-bit unsigned integer, in little-endian or big-endian byte order, +respectively, followed by the specified number of bytes of UTF-8 +encoded character data. @item @var{x}? 
@var{x} is optional, e.g.@: 00? is an optional zero byte. @@ -848,8 +946,9 @@ in the presence of @math{|}, e.g.@: in 00 (01 @math{|} 02 @math{|} 03) @item count(@var{x}) @itemx becount(@var{x}) -A 32-bit integer, in little-endian or big-endian byte order, respectively, -that indicates the number of bytes in @var{x}, followed by @var{x} itself. +A 32-bit unsigned integer, in little-endian or big-endian byte order, +respectively, that indicates the number of bytes in @var{x}, followed +by @var{x} itself. @item v1(@var{x}) In a version 1 @file{.bin} member, @var{x}; in version 3, nothing. @@ -875,7 +974,7 @@ A ``light'' detail member @file{.bin} consists of a number of sections concatenated together, terminated by an optional byte 01: @example -LightMember => +Table => Header Titles Footnotes Areas Borders PrintSettings TableSettings Formats Dimensions Axes Cells @@ -927,17 +1026,12 @@ and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for version-specific formatting (as described previously). If @code{rotate-inner-column-labels} is 1, then column labels closest -to the data are rotated to be vertical; otherwise, they are shown -in the normal way. +to the data are rotated 90° counterclockwise; otherwise, they are +shown in the normal way. If @code{rotate-outer-row-labels} is 1, then row labels farthest from -the data are rotated to be vertical; otherwise, they are shown in the -normal way. - -@code{table-id} is a binary version of the @code{tableId} attribute in -the structure member that refers to the detail member. For example, -if @code{tableId} is @code{-4122591256483201023}, then @code{table-id} -would be 0xc6c99d183b300001. +the data are rotated 90° counterclockwise; otherwise, they are shown +in the normal way. @code{min-col-width} is the minimum width that a column will be assigned automatically. @code{max-col-width} is the maximum width @@ -946,6 +1040,11 @@ that a column will be assigned to accommodate a long column label. 
the width of row labels. All of these measurements are in 1/96 inch units (called a ``device independent pixel'' unit in Windows). +@code{table-id} is a binary version of the @code{tableId} attribute in +the structure member that refers to the detail member. For example, +if @code{tableId} is @code{-4122591256483201023}, then @code{table-id} +would be 0xc6c99d183b300001. + The meaning of the other variable parts of the header is not known. A writer may safely use version 3, true for @code{x0}, false for @code{x1}, true for @code{x2}, and 0x15 for @code{x3}. @@ -965,7 +1064,7 @@ Titles => The Titles follow the Header and specify the table's title, caption, and corner text. -The @code{user-title} is shown above the title and reflects any user +The @code{user-title} reflects any user editing of the title text or style. The @code{title} is the title originally generated by the procedure. Both of these are appropriate for presentation and localized to the user's language. For example, @@ -978,9 +1077,9 @@ name the variable and @code{c} is simply ``Frequencies''. The @code{corner-text}, if present, is shown in the upper-left corner of the table, above the row headings and to the left of the column -headings. It is usually absent. Corner text prevents row dimension -labels from being displayed above the dimension's group and category -labels (see @code{show-row-labels-in-corner}). +headings. It is usually absent. When row dimension labels are +displayed in the corner (see @code{show-row-labels-in-corner}), corner +text is hidden. The @code{caption}, if present, is shown below the table. @code{caption} reflects user editing of the caption. @@ -990,12 +1089,19 @@ The @code{caption}, if present, is shown below the table. 
@example Footnotes => int32[n-footnotes] Footnote*[n-footnotes] -Footnote => Value[text] (58 @math{|} 31 Value[marker]) byte*4 +Footnote => Value[text] (58 @math{|} 31 Value[marker]) int32[show] @end example Each footnote has @code{text} and an optional custom @code{marker} (such as @samp{*}). +The syntax for Value would allow footnotes (and their markers) to +reference other footnotes, but in practice this doesn't work. + +@code{show} is a 32-bit signed integer. It is positive to show the +footnote or negative to hide it. Its magnitude is often 1, and in +other cases tends to be the number of references to the footnote. + @node SPV Light Member Areas @subsection Areas @@ -1014,7 +1120,7 @@ Each Area represents the style for a different area of the table, in the following order: title, caption, footer, corner, column labels, row labels, data, and layers. -@code{index} is the 1-based index of the Area, i.e. 1 for the first +@code{index} is the 1-based index of the Area, i.e.@: 1 for the first Area, through 8 for the final Area. @code{typeface} is the string name of the font used in the area. In @@ -1022,7 +1128,7 @@ the corpus, this is @code{SansSerif} in over 99% of instances and @code{Times New Roman} in the rest. @code{size} is the size of the font, in px (@pxref{SPV Light Detail -Member Format}) The most common size in the corpus is 12 px. Even +Member Format}). The most common size in the corpus is 12 px. Even though @code{size} has a floating-point type, in the corpus its values are always integers. @@ -1139,8 +1245,9 @@ PrintSettings => The PrintSettings reflect settings for printing. The fixed value of @code{endian} can be used to validate the endianness. -@code{all-layers} is 1 to print all layers, 0 to print only the -visible layers. +@code{all-layers} is 1 to print all layers, 0 to print only the layer +designated by @code{current-layer} in TableSettings (@pxref{SPV Light +Member Table Settings}). 
@code{paginate-layers} is 1 to print each layer at the start of a new page, 0 otherwise. (This setting is honored only if @code{all-layers} is @@ -1179,7 +1286,7 @@ TableSettings => ) bestring[notes] bestring[table-look] - 00...)) + )...) Breakpoints => be32[n-breaks] be32*[n-breaks] @@ -1224,8 +1331,9 @@ user-specified Keeps. They seem to indicate a conversion from rows or columns to pixel or point offsets. @code{notes} is a text string that contains user-specified notes. It -is displayed when the user hovers the cursor over the table, like -``alt text'' on a webpage. It is not printed. It is usually empty. +is displayed when the user hovers the cursor over the table, like text +in the @code{title} attribute in HTML@. It is not printed. It is +usually empty. @code{table-look} is the name of an SPSS ``TableLook'' table style, such as ``Default'' or ``Academic''; it is often empty. @@ -1243,7 +1351,7 @@ Formats => int32[n-widths] int32*[n-widths] string[locale] int32[current-layer] - bool bool bool + bool[x7] bool[x8] bool[x9] Y0 CustomCurrency count( @@ -1257,9 +1365,8 @@ If @code{n-widths} is nonzero, then the accompanying integers are column widths as manually adjusted by the user. @code{locale} is a locale including an encoding, such as -@code{en_US.windows-1252} or @code{it_IT.windows-1252}. The rest of -the character strings in the member use this encoding. The encoding -string is itself encoded in US-ASCII. +@code{en_US.windows-1252} or @code{it_IT.windows-1252}. The encoding +string (like other strings in the member) is encoded in UTF-8. @code{epoch} is the year that starts the epoch. A 2-digit year is interpreted as belonging to the 100 years beginning at the epoch. The @@ -1280,6 +1387,8 @@ following strings are CCA through CCE format strings. @xref{Custom Currency Formats,,, pspp, PSPP}. Most commonly these are all @code{-,,,} but other strings occur. +A writer may safely use false for @code{x7}, @code{x8}, and @code{x9}. 
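Strings in a light detail member, such as the little-endian @code{locale} or the big-endian @code{notes} and @code{table-look}, are a 32-bit unsigned length followed by that many bytes of UTF-8 text. A minimal Python sketch of a reader for the @code{string} and @code{bestring} types follows. The function name and its offset-based interface are assumptions made for illustration and are not taken from PSPP.

```python
import struct


def read_string(buf, offset=0, big_endian=False):
    """Read one length-prefixed UTF-8 string.

    Returns (text, next_offset).  Use big_endian=True for `bestring`.
    """
    (length,) = struct.unpack_from(">I" if big_endian else "<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("string length runs past end of data")
    return buf[start:end].decode("utf-8"), end


# Example: a `string` holding a locale, followed by an empty `bestring`.
raw = b"en_US.windows-1252"
data = struct.pack("<I", len(raw)) + raw + struct.pack(">I", 0)
locale, pos = read_string(data)
notes, pos = read_string(data, pos, big_endian=True)
```

Returning the next offset alongside the text lets a caller walk consecutive fields without tracking lengths separately.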
+ @subsubheading X0 X0 only appears, optionally, in version 1 members. @@ -1291,7 +1400,7 @@ Y1 => string[language] string[charset] string[locale] bool bool bool bool Y0 -Y2 => CustomCurrency byte[missing] bool[x16] +Y2 => CustomCurrency byte[missing] bool[x17] @end example @code{command} describes the statistical procedure that generated the @@ -1299,20 +1408,12 @@ output, in English. It is not necessarily the literal syntax name of the procedure: for example, NPAR TESTS becomes ``Nonparametric Tests.'' @code{command-local} is the procedure's name, translated into the output language; it is often empty and, when it is not, -sometimes the same as @code{command}. - -@code{dataset} is the name of the dataset analyzed to produce the -output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the -file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter -is sometimes the empty string. +sometimes the same as @code{command}. @code{missing} is the character used to indicate that a cell contains a missing value. It is always observed as @samp{.}. -X0 repeats @code{decimal}, @code{grouping}, CustomCurrency, and -@code{missing} already included in Formats. - -A writer may safely use false for @code{x16}. +A writer may safely use false for @code{x17}. @subsubheading X1 @@ -1320,14 +1421,16 @@ X1 only appears in version 3 members. @example X1 => - 00 byte[x14] bool[x15] + bool[x14] + byte[show-title] + bool[x16] byte[lang] byte[show-variables] byte[show-values] - int32[x17] int32[x18] + int32[x18] int32[x19] 00*17 - bool[x19] - 01 + bool[x20] + bool[show-caption] @end example @code{lang} may indicate the language in use. Some values seem to be @@ -1347,8 +1450,13 @@ means to display the value, 2 to display the value label when available, 3 to display both. Again, the most common value is 0, which probably means to use a global default. 
-A writer may safely use 1 for @code{x14}, false for @code{x15}, -1 for -@code{x17} and @code{x18}, and false for @code{x19}. +@code{show-title} is 1 to show the title, 10 to hide it. + +@code{show-caption} is true to show the caption, false to hide it. + +A writer may safely use false for @code{x14}, false for @code{x16}, 0 +for @code{lang}, -1 for @code{x18} and @code{x19}, and false for +@code{x20}. @subsubheading X2 @@ -1383,35 +1491,40 @@ X3 only appears in version 3 members. @example X3 => - 01 00 byte[x20] 00 00 00 + 01 00 byte[x21] 00 00 00 Y1 double[small] 01 (string[dataset] string[datafile] i0 int32[date] i0)? Y2 - (int32 i0)? + (int32[x22] i0)? @end example +@code{small} is a small real number. In the corpus, it overwhelmingly +takes the value 0.0001, with zero occasionally seen. Nonzero numbers +with format 40 (@pxref{SPV Light Member Value}) whose magnitudes are +smaller than @code{small} are displayed in scientific notation. (Thus, a @code{small} +of zero prevents scientific notation from being chosen.) + +@code{dataset} is the name of the dataset analyzed to produce the +output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the +file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter +is sometimes the empty string. + @code{date} is a date, as seconds since the epoch, i.e.@: since January 1, 1970. Pivot tables within an SPV file often have dates a few minutes apart, so this is probably a creation date for the table rather than for the file. -X3 repeats @code{decimal}, @code{grouping}, CustomCurrency, and -@code{missing} already included in Formats. @code{command}, -@code{command-local}, @code{language}, @code{charset}, and -@code{locale} have the same meaning as in X0. - -@code{small} is a small real number, e.g.@: .001. Numbers smaller -than this in absolute value are displayed in scientific notation. - Sometimes @code{dataset}, @code{datafile}, and @code{date} are present and other times they are absent. 
The reader can distinguish by assuming that they are present and then checking whether the presumptive @code{dataset} contains a null byte (a valid string never will). -A writer may safely use 4 for @code{x20} and omit the optional bytes -at the end. +@code{x22} is usually 0 or 2000000. + +A writer may safely use 4 for @code{x21} and omit @code{x22} and the +other optional bytes at the end. @node SPV Light Member Dimensions @subsection Dimensions @@ -1460,7 +1573,8 @@ When @code{hide-all-labels} is 01, @code{show-dim-label} is ignored. @code{dim-index} is usually the 0-based index of the dimension, e.g.@: 0 for the first dimension, 1 for the second, and so on. Sometimes it -is -1. There is no visible difference. +is -1. There is no visible difference. A writer may safely use the +0-based index. @node SPV Light Member Categories @subsection Categories @@ -1472,7 +1586,7 @@ are really categories; the others just serve as grouping constructs. Category => Value[name] (Leaf @math{|} Group) Leaf => 00 00 00 i2 int32[leaf-index] i0 Group => - bool[merge] 00 01 int32[x22] + bool[merge] 00 01 int32[x23] i-1 int32[n-subcategories] Category*[n-subcategories] @end example @@ -1487,12 +1601,12 @@ Leaf. If the user sorts or rearranges the categories, then the order of categories in the file reflects that change and @code{leaf-index} reflects the original order. -Occasionally a dimension has no leaf categories at all. A table that +A dimension can have no leaf categories at all. A table that contains such a dimension necessarily has no data at all. A Group is a group of nested categories. Usually a Group contains at -least one Category, so that @code{n-subcategories} is positive, but a -few Groups with @code{n-subcategories} 0 has been observed. +least one Category, so that @code{n-subcategories} is positive, but +Groups with zero subcategories have been observed. 
If a Group's @code{merge} is 00, the most common value, then the group is really a distinct group that should be represented as such in the @@ -1503,10 +1617,9 @@ parent group, then direct children of the dimension), and this group's name is irrelevant and should not be displayed. (Merged groups can be nested!) -(For writing an SPV file, there is no need to use the @code{merge} -feature unless it is convenient.) +Writers need not use merged groups. -A Group's @code{x22} appears to be i2 when all of the categories +A Group's @code{x23} appears to be i2 when all of the categories within a group are leaf categories that directly represent data values for a variable (e.g.@: in a frequency table or crosstabulation, a group of values in a variable being tabulated) and i0 otherwise. A writer @@ -1550,7 +1663,7 @@ Cell => int64[index] v1(00?) Value A Cell consists of an @code{index} and a Value. Suppose there are @math{d} dimensions, numbered 1 through @math{d} in the order given in -the Dimensions previously, and that dimension @math{i}, has @math{n_i} +the Dimensions previously, and that dimension @math{i} has @math{n_i} categories. Consider the cell at coordinates @math{x_i}, @math{1 \le i \le d}, and note that @math{0 \le x_i < n_i}. Then the index is calculated by the following algorithm: @@ -1582,6 +1695,7 @@ RawValue => @math{|} 04 ValueMod int32[format] string[value-label] string[var-name] byte[show] string[s] @math{|} 05 ValueMod string[var-name] string[var-label] byte[show] + @math{|} 06 string[local] ValueMod string[id] string[c] @math{|} ValueMod string[template] int32[n-args] Argument*[n-args] Argument => i0 Value @@ -1594,15 +1708,20 @@ first nonzero byte in the encoding. @table @asis @item 01 The numeric value @code{x}, intended to be presented to the user -formatted according to @code{format}, which is in the format described -for system files, except that format 40 is a synonym for F format -instead of MTIME. @xref{System File Output Formats}, for details. 
+formatted according to @code{format}, which is about the same as the +format described for system files (@pxref{System File Output +Formats}). The exception is that format 40 is not MTIME but instead +approximately a synonym for F format with a different rule for whether +a value is shown in scientific notation: a value in format 40 is shown +in scientific notation if and only if it is nonzero and its magnitude +is less than @code{small} (@pxref{SPV Light Member Formats}). + Most commonly, @code{format} has width 40 (the maximum). An @code{x} with the maximum negative double value @code{-DBL_MAX} represents the system-missing value SYSMIS. (HIGHEST and LOWEST have -not been observed.) @xref{System File Format}, for more about these -special values. +not been observed.) See @ref{System File Format}, for more about +these special values. @item 02 Similar to @code{01}, with the additional information that @code{x} is @@ -1641,8 +1760,9 @@ case, @code{id} is always the empty string; in the latter case, The string value @code{s}, intended to be presented to the user formatted according to @code{format}. The format for a string is not too interesting, and the corpus contains many clearly invalid formats -like A16.39 or A255.127 or A134.1, so readers should probably ignore -the format entirely. +like A16.39 or A255.127 or A134.1, so readers should probably entirely +disregard the format. PSPP only checks @code{format} to distinguish +AHEX format. @code{s} is a value of variable @code{var-name} and has value label @code{value-label}. @code{var-name} is never empty but @@ -1651,14 +1771,18 @@ the format entirely. @code{show} has the same meaning as in the encoding for 02. @item 05 -Variable @code{var-name}, which is rarely observed as empty in the -corpus, with variable label @code{var-label}, which is often empty. +Variable @code{var-name} with variable label @code{var-label}. In the +corpus, @code{var-name} is rarely empty and @code{var-label} is often +empty. 
@code{show} determines whether to show the variable name or the variable label. A value of 1 means to show the name, 2 to show the label, 3 to show both, and 0 means to use the default specified in @code{show-variables} (@pxref{SPV Light Member Formats}). +@item 06 +Similar to type 03, with @code{fixed} assumed to be true. + @item otherwise When the first byte of a RawValue is not one of the above, the RawValue starts with a ValueMod, whose syntax is described in the next @@ -1747,11 +1871,11 @@ ValueMod => 58 @math{|} 31 int32[n-refs] int16*[n-refs] - (i0 | i1 string[subscript]) + int32[n-subscripts] string*[n-subscripts] v1(00 (i1 | i2) 00? 00? int32 00? 00?) v3(count(TemplateString StylePair)) -TemplateString => count((count((i0 58)?) (58 @math{|} 31 string[id]))?) +TemplateString => count((count((i0 (58 @math{|} 31 55))?) (58 @math{|} 31 string[id]))?) StylePair => (31 FontStyle | 58) @@ -1774,12 +1898,13 @@ a Value. Each of the @code{n-refs} integers is a reference to a Footnote (@pxref{SPV Light Member Footnotes}) by 0-based index. Footnote markers are shown appended to the main text of the Value, as -superscripts. +superscripts or subscripts. -The @code{subscript}, if present, is a string to append to the main -text of the Value, as a subscript. The subscript text is a brief -indicator, e.g.@: @samp{a} or @samp{a,b}, with its meaning indicated -by the table caption. +The @code{subscripts}, if present, are strings to append to the main +text of the Value, as subscripts. Each subscript text is a brief +indicator, e.g.@: @samp{a} or @samp{b}, with its meaning indicated by +the table caption. When multiple subscripts are present, they are +displayed separated by commas. The @code{id} inside the TemplateString, if present, is a template string for substitutions using the syntax explained previously. It @@ -1971,8 +2096,10 @@ An XML Schema for VizML is available, distributed with SPSS binaries, under a nonfree license. 
It contains documentation that is occasionally helpful. -See @file{src/output/spv/detail-xml.grammar} in the PSPP source tree -for the full grammar that it uses for parsing. +This section describes the detail XML format using the same notation +already used for the structure XML format (@pxref{SPV Structure Member +Format}). See @file{src/output/spv/detail-xml.grammar} in the PSPP +source tree for the full grammar that it uses for parsing. The important elements of the detail XML format are: @@ -3575,6 +3702,7 @@ XML, which has the following @code{tableProperties} element: @example tableProperties + :name? => generalProperties footnoteProperties cellFormatProperties borderProperties printingProperties generalProperties @@ -3605,6 +3733,7 @@ style :font-size? :font-style=(regular | italic)? :font-weight=(regular | bold)? + :font-underline=(none | underline)? :labelLocationVertical=(positive | negative | center)? :margin-bottom=dimension? :margin-left=dimension? @@ -3632,3 +3761,6 @@ printingProperties :printEachLayerOnSeparatePage=bool? => EMPTY @end example + +The @code{name} attribute appears only in standalone @file{.stt} files +(@pxref{SPSS TableLook STT Format}).