@node SPV Structure Member Format
@section Structure Member Format
+A structure member lays out the high-level structure for a group of
+output items such as headings, tables, and charts. Structure members
+do not include the details of tables and charts but instead refer to
+them by their member names.
+
Structure members' XML files claim conformance with a collection of
XML Schemas. These schemas are distributed, under a nonfree license,
with SPSS binaries. Fortunately, the schemas are not necessary to
-understand the structure members. To a degree, the schemas can even
+understand the structure members. The schemas can even
be deceptive because they document elements and attributes that are
not in the corpus and do not document elements and attributes that are
-commonly found there.
+commonly found in the corpus.
Structure members use a different XML namespace for each schema, but
these namespaces are not entirely consistent. In some SPV files, for
@center @image{dev/spv-structure, 5in}
@end iftex
+The following example shows the contents of a typical structure member
+for a @cmd{DESCRIPTIVES} procedure. A real structure member would not
+have indentation. This example also omits most attributes, all XML
+namespace information, and the CSS from the embedded HTML:
+
+@example
+<?xml version="1.0" encoding="utf-8"?>
+<heading>
+ <label>Output</label>
+ <heading commandName="Descriptives">
+ <label>Descriptives</label>
+ <container>
+ <label>Title</label>
+ <text commandName="Descriptives" type="title">
+ <html lang="en">
+<![CDATA[<head><style type="text/css">...</style></head><BR>Descriptives]]>
+ </html>
+ </text>
+ </container>
+ <container visibility="hidden">
+ <label>Notes</label>
+ <table commandName="Descriptives" subType="Notes" type="note">
+ <tableStructure>
+ <dataPath>00000000001_lightNotesData.bin</dataPath>
+ </tableStructure>
+ </table>
+ </container>
+ <container>
+ <label>Descriptive Statistics</label>
+ <table commandName="Descriptives" subType="Descriptive Statistics"
+ type="table">
+ <tableStructure>
+ <dataPath>00000000002_lightTableData.bin</dataPath>
+ </tableStructure>
+ </table>
+ </container>
+ </heading>
+</heading>
+@end example
+
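As a rough illustration of how a reader might follow these references,
the following Python sketch walks a cut-down copy of the example above
and collects the detail member named by each table. The function name
@code{list_tables} and the default of @code{visible} for a missing
@code{visibility} attribute are assumptions made for this sketch. Like
the example, it ignores XML namespaces, which a real structure member
would require handling.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the example structure member (no namespaces).
SAMPLE = """<heading>
  <label>Output</label>
  <heading commandName="Descriptives">
    <label>Descriptives</label>
    <container visibility="hidden">
      <label>Notes</label>
      <table commandName="Descriptives" subType="Notes" type="note">
        <tableStructure>
          <dataPath>00000000001_lightNotesData.bin</dataPath>
        </tableStructure>
      </table>
    </container>
    <container>
      <label>Descriptive Statistics</label>
      <table commandName="Descriptives" type="table">
        <tableStructure>
          <dataPath>00000000002_lightTableData.bin</dataPath>
        </tableStructure>
      </table>
    </container>
  </heading>
</heading>"""


def list_tables(xml_text):
    """Return (label, visibility, dataPath) for each container with a table."""
    root = ET.fromstring(xml_text)
    result = []
    for container in root.iter("container"):
        path = container.findtext("table/tableStructure/dataPath")
        if path is None:
            continue  # a container holding something other than a table
        label = container.findtext("label")
        # Assumption: a container without the attribute is visible.
        visibility = container.get("visibility", "visible")
        result.append((label, visibility, path))
    return result


for entry in list_tables(SAMPLE):
    print(entry)
```
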
@menu
* SPV Structure heading Element::
* SPV Structure label Element::
Contents: CDATA
The CDATA contains an HTML document. In some cases, the document
-starts with @code{<html>} and ends with @code{</html}; in others the
+starts with @code{<html>} and ends with @code{</html>}; in others the
@code{html} element is implied. Generally the HTML includes a
@code{head} element with a CSS stylesheet. The HTML body often begins
with @code{<BR>}. The actual content ranges from trivial to simple:
@node SPV Structure @code{text} Element (Inside @code{pageParagraph})
@subsection The @code{text} Element (Inside @code{pageParagraph})
-Parent: @code{pageParagraph}
+Parent: @code{pageParagraph} @*
Contents: CDATA?
This @code{text} element is nested inside a @code{pageParagraph}. There
text: in the corpus, either an @code{html} or @code{p} element. It is
@emph{almost}-XHTML because the @code{html} element designates the
default namespace as
-@code{http://xml.spss.com/spss/viewer/viewer-tree} instead of an XHTML
+@indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} instead of an XHTML
namespace, and because the CDATA can contain substitution variables:
@code{&[Page]} for the page number and @code{&[PageTitle]} for the
page title.
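
A reader that renders this text has to expand the substitution
variables itself. The sketch below shows one possible way to do so in
Python. The function name @code{expand_page_variables} and its
parameters are hypothetical, and only the two variables named above
are handled.

```python
import re


def expand_page_variables(text, page, title):
    """Replace &[Page] and &[PageTitle] in a single pass over the text."""
    values = {"Page": str(page), "PageTitle": title}
    return re.sub(r"&\[(PageTitle|Page)\]", lambda m: values[m.group(1)], text)


print(expand_page_variables("&[PageTitle], page &[Page]", 3, "Results"))
# prints: Results, page 3
```
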
Header @result{}
01 00
(i1 @math{|} i3)[@t{version}]
- 01 bool*4 int
+ bool
+ bool[@t{show-numeric-markers}]
+ bool[@t{rotate-inner-column-labels}]
+ bool[@t{rotate-outer-row-labels}]
+ bool
+ int
int[@t{min-column-width}] int[@t{max-column-width}]
int[@t{min-row-width}] int[@t{max-row-width}]
int64[@t{table-id}]
and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for
version-specific formatting (as described previously).
+If @code{show-numeric-markers} is 1, footnote markers are shown as
+numbers, starting from 1; otherwise, they are shown as letters,
+starting from @samp{a}.
+
+If @code{rotate-inner-column-labels} is 1, then column labels closest
+to the data are rotated to be vertical; otherwise, they are shown
+in the normal way.
+
+If @code{rotate-outer-row-labels} is 1, then row labels farthest from
+the data are rotated to be vertical; otherwise, they are shown in the
+normal way.
+
@code{table-id} is a binary version of the @code{tableId} attribute in
the structure member that refers to the detail member. For example,
if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
bool[@t{footnote-marker-position}]
v3(
byte
- be32[@t{n}] byte*[@t{n}]
+ count(
+ Breakpoints[@t{row-breaks}] Breakpoints[@t{column-breaks}]
+ Keeps[@t{row-keeps}] Keeps[@t{column-keeps}]
+ PointKeeps[@t{row-point-keeps}] PointKeeps[@t{column-point-keeps}]
+ )
bestring[@t{notes}]
bestring[@t{table-look}]
00...
)
+
+Breakpoints @result{} be32[@t{n-breaks}] be32*[@t{n-breaks}]
+
+Keeps @result{} be32[@t{n-keeps}] Keep*[@t{n-keeps}]
+Keep @result{} be32[@t{offset}] be32[@t{n}]
+
+PointKeeps @result{} be32[@t{n-point-keeps}] PointKeep*[@t{n-point-keeps}]
+PointKeep @result{} be32[@t{offset}] be32 be32
+
@end format
@end cartouche
When @code{footnote-marker-position} is 1, footnote markers are shown
as superscripts, otherwise as subscripts.
+The Breakpoints are rows or columns after which there is a page break;
+for example, a row break of 1 requests a page break after the second
+row. Usually no breakpoints are specified, indicating that page
+breaks should be selected automatically.
+
+The Keeps are ranges of rows or columns to be kept together without a
+page break; for example, a row Keep with @code{offset} 1 and @code{n}
+10 requests that the 10 rows starting with the second row be kept
+together. Usually no Keeps are specified.
+
+The PointKeeps seem to be generated automatically based on
+user-specified Keeps. They seem to indicate a conversion from rows
+or columns to pixel or point offsets.
+
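The Breakpoints and Keeps productions can be read with a few lines of
Python. This is only a sketch: it assumes that @code{be32} denotes an
unsigned 32-bit big-endian integer and that both fields of a Keep are
@code{be32}. The function names are hypothetical.

```python
import struct


def parse_breakpoints(buf, pos=0):
    """Parse Breakpoints: be32[n-breaks] followed by n-breaks be32 values."""
    (n,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    breaks = list(struct.unpack_from(">%dI" % n, buf, pos))
    return breaks, pos + 4 * n


def parse_keeps(buf, pos=0):
    """Parse Keeps: be32[n-keeps] followed by n-keeps (offset, n) pairs."""
    (n,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    keeps = []
    for _ in range(n):
        offset, count = struct.unpack_from(">II", buf, pos)
        keeps.append((offset, count))
        pos += 8
    return keeps, pos


# One row break after the second row, then one Keep covering the
# 10 rows that start with the second row.
data = struct.pack(">II", 1, 1) + struct.pack(">III", 1, 1, 10)
breaks, pos = parse_breakpoints(data)
keeps, pos = parse_keeps(data, pos)
print(breaks, keeps)  # prints: [1] [(1, 10)]
```

Each function returns the position just past what it consumed, so the
calls can be chained in the order given by the grammar.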
@code{notes} is a text string that contains user-specified notes. It
is displayed when the user hovers the cursor over the table, like
``alt text'' on a webpage. It is not printed. It is usually empty.
@cartouche
@format
Formats @result{}
- int[@t{nwidths}] int*[@t{nwidths}]
+ int[@t{n-widths}] int*[@t{n-widths}]
string[@t{encoding}]
- int (00 @math{|} 01) 00 (00 @math{|} 01)
+ int[@t{current-layer}]
+ bool[@t{digit-grouping}] bool[@t{leading-zero}] bool
int[@t{epoch}]
byte[@t{decimal}] byte[@t{grouping}]
CustomCurrency
- v1(i0)
- v3(count(count(X5) count(X6)))
-
-CustomCurrency @result{} int[@t{n-ccs}] string*[@t{n-ccs}]
+ count(
+ v1(X0?)
+ v3(count(X1 count(X2)) count(X3)))
-X5 @result{} byte*33 int[@t{n}] int*[@t{n}]
-X6 @result{}
+X0 @result{}
+ byte*14
+ string[@t{command}] string[@t{command-local}]
+ string[@t{language}] string[@t{charset}] string[@t{locale}]
+ bool 00 bool bool
+ int[@t{epoch}]
+ byte[@t{decimal}] byte[@t{grouping}]
+ CustomCurrency
+ byte[@t{missing}] bool
+
+X1 @result{}
+ byte*2
+ byte[@t{lang}]
+ byte[@t{variable-mode}]
+ byte[@t{value-mode}]
+ int*2
+ 00*17
+ bool
+ 01
+X2 @result{}
+ int[@t{n-heights}] int*[@t{n-heights}]
+ int[@t{n-style-map}] StyleMap*[@t{n-style-map}]
+ int[@t{n-styles}] StylePair*[@t{n-styles}]
+ count((i0 i0)?)
+StyleMap @result{} int64[@t{cell-index}] int16[@t{style-index}]
+X3 @result{}
01 00 (03 @math{|} 04) 00 00 00
- string[@t{command}] string[@t{subcommand}]
+ string[@t{command}] string[@t{command-local}]
string[@t{language}] string[@t{charset}] string[@t{locale}]
- (00 @math{|} 01) 00 bool bool
+ bool 00 bool bool
int[@t{epoch}]
byte[@t{decimal}] byte[@t{grouping}]
double[@t{small}] 01
(string[@t{dataset}] string[@t{datafile}] i0 int[@t{date}] i0)?
CustomCurrency
byte[@t{missing}] bool (i2000000 i0)?
+
+CustomCurrency @result{} int[@t{n-ccs}] string*[@t{n-ccs}]
@end format
@end cartouche
-If @code{nwidths} is nonzero, then the accompanying integers are
+If @code{n-widths} is nonzero, then the accompanying integers are
column widths as manually adjusted by the user. (Row heights are
computed automatically based on the widths.)
@samp{'} (apostrophe), @samp{ } (space), and zero (presumably
indicating that digits should not be grouped).
+@code{command} describes the statistical procedure that generated the
+output, in English. It is not necessarily the literal syntax name of
+the procedure: for example, NPAR TESTS becomes ``Nonparametric
+Tests.'' @code{command-local} is the procedure's name, translated
+into the output language; it is often empty and, when it is not,
+sometimes the same as @code{command}.
+
@code{dataset} is the name of the dataset analyzed to produce the
output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter
Currency Formats,,, pspp, PSPP}. Most commonly these are all
@code{-,,,} but other strings occur.
+@code{missing} is the character used to indicate that a cell contains
+a missing value. It is always observed as @samp{.}.
+
@node SPV Light Member Dimensions
@subsection Dimensions
@cartouche
@format
Dimensions @result{} int[@t{n-dims}] Dimension*[@t{n-dims}]
-Dimension @result{} Value[@t{name}] DimUnknown int[@t{n-categories}] Category*[@t{n-categories}]
-DimUnknown @result{}
+Dimension @result{} Value[@t{name}] DimProperties int[@t{n-categories}] Category*[@t{n-categories}]
+DimProperties @result{}
byte[@t{d1}]
(00 @math{|} 01 @math{|} 02)[@t{d2}]
(i0 @math{|} i2)[@t{d3}]
- (00 @math{|} 01)[@t{d4}]
- (00 @math{|} 01)[@t{d5}]
- 01
- int[@t{d6}]
+ bool[@t{show-dim-label}]
+ bool[@t{hide-all-labels}]
+ 01 int[@t{dim-index}]
@end format
@end cartouche
@code{name} is the name of the dimension, e.g.@: @code{Variables},
@code{Statistics}, or a variable name.
+The meanings of @code{d1}, @code{d2}, and @code{d3} are unknown.
@code{d1} is usually 0 but many other values have been observed.
-@code{d3} is 2 over 99% of the time.
+If @code{show-dim-label} is 01, the pivot table displays a label for
+the dimension itself. Because usually the group and category labels
+are enough explanation, it is usually 00.
-@code{d5} is 0 over 99% of the time.
+If @code{hide-all-labels} is 01, the pivot table omits all labels for
+the dimension, including group and category labels. It is usually 00.
+When @code{hide-all-labels} is 01, @code{show-dim-label} is ignored.
-@code{d6} is either -1 or the 0-based index of the dimension, e.g.@: 0
-for the first dimension, 1 for the second, and so on. The latter is
-the case 98% of the time in the corpus.
+@code{dim-index} is usually the 0-based index of the dimension, e.g.@:
+0 for the first dimension, 1 for the second, and so on. Sometimes it
+is -1. There is no visible difference.
@node SPV Light Member Categories
@subsection Categories
@cartouche
@format
Category @result{} Value[@t{name}] (Leaf @math{|} Group)
-Leaf @result{} 00 00 00 i2 int[@t{index}] i0
+Leaf @result{} 00 00 00 i2 int[@t{cat-index}] i0
Group @result{}
- (00 @math{|} 01)[@t{merge}] 00 01 (i0 @math{|} i2)[@t{data}]
+ bool[@t{merge}] 00 01 (i0 @math{|} i2)[@t{data}]
i-1 int[@t{n-subcategories}] Category*[@t{n-subcategories}]
@end format
@end cartouche
@code{name} is the name of the category (or group).
-A Leaf represents a leaf category. The Leaf's @code{index} is a
+A Leaf represents a leaf category. The Leaf's @code{cat-index} is a
nonnegative integer less than @code{n-categories} in the Dimension in
-which the Category is nested (directly or indirectly).
+which the Category is nested (directly or indirectly). The
+@code{cat-index} values reflect the original order in which the
+categories were sorted; if the user sorted or rearranged the
+categories, then the order of categories in the file reflects that
+without changing the @code{cat-index} values.
-A Group represents a Group of nested categories. Usually a Group
-contains at least one Category, so that @code{n-subcategories} is
-positive, but a few Groups with @code{n-subcategories} 0 has been
-observed.
+A Group is a group of nested categories. Usually a Group contains at
+least one Category, so that @code{n-subcategories} is positive, but a
+few Groups with @code{n-subcategories} 0 have been observed.
If a Group's @code{merge} is 00, the most common value, then the group
is really a distinct group that should be represented as such in the
Data @result{}
int[@t{n-layers}] int[@t{n-rows}] int[@t{n-columns}] int*[@t{n-dimensions}]
int[@t{n-data}] Datum*[@t{n-data}]
-Datum @result{} int64[@t{index}] v3(00?) Value
+Datum @result{} int64[@t{index}] v1(00?) Value
@end format
@end cartouche
-The values of @code{layers}, @code{rows}, and @code{columns} each
-specifies the number of dimensions displayed in layers, rows, and
+The values of @code{n-layers}, @code{n-rows}, and @code{n-columns}
+specify the number of dimensions displayed in layers, rows, and
columns, respectively. Any of them may be zero. Their values sum to
@code{n-dimensions} from Dimensions (@pxref{SPV Light Member
Dimensions}).
The @code{n-dimensions} integers are a permutation of the 0-based
-dimension numbers. The first @code{layers} integers specify each of
-the dimensions represented by layers, the next @code{rows} integers
+dimension numbers. The first @code{n-layers} integers specify each of
+the dimensions represented by layers, the next @code{n-rows} integers
specify the dimensions represented by rows, and the final
-@code{columns} integers specify the dimensions represented by columns.
-When there is more than one dimension of a given kind, the inner
-dimensions are given first.
+@code{n-columns} integers specify the dimensions represented by
+columns. When there is more than one dimension of a given kind, the
+inner dimensions are given first.
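
The split described above can be sketched in Python as follows. The
function name @code{split_axes} and its argument names are
hypothetical. The checks simply restate the constraints given in the
text.

```python
def split_axes(n_layers, n_rows, n_columns, dims):
    """Split the permutation of dimension numbers into per-axis lists."""
    if n_layers + n_rows + n_columns != len(dims):
        raise ValueError("axis counts must sum to n-dimensions")
    if sorted(dims) != list(range(len(dims))):
        raise ValueError("dims must be a permutation of 0..n-1")
    layers = dims[:n_layers]
    rows = dims[n_layers:n_layers + n_rows]
    columns = dims[n_layers + n_rows:]
    return layers, rows, columns


# Four dimensions: one in layers, one in rows, two in columns.
print(split_axes(1, 1, 2, [3, 0, 2, 1]))  # prints: ([3], [0], [2, 1])
```
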
The format of a Datum varies slightly from version 1 to version 3: in
version 1 it allows for an extra optional 00 byte.
For example, suppose there are 3 dimensions with 3, 4, and 5
categories, respectively. The datum at coordinates (1, 2, 3) has
index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
+Within a given dimension, the index is the @code{cat-index} in a Leaf.
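
The arithmetic in this example generalizes to any number of
dimensions. The following Python sketch reproduces it. The function
name @code{datum_index} is hypothetical, and the general rule is
inferred from the worked example rather than taken from a
specification.

```python
def datum_index(sizes, coords):
    """Combine per-dimension category indexes into a single Datum index.

    sizes[i] is the number of categories in dimension i, and coords[i]
    is the cat-index selected within that dimension.
    """
    index = 0
    for size, coord in zip(sizes, coords):
        if not 0 <= coord < size:
            raise ValueError("coordinate out of range")
        index = index * size + coord
    return index


print(datum_index([3, 4, 5], [1, 2, 3]))  # prints: 33
```
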
@node SPV Light Member Value
@subsection Value
01 ValueMod int[@t{format}] double[@t{x}]
@math{|} 02 ValueMod int[@t{format}] double[@t{x}]
string[@t{varname}] string[@t{vallab}] (01 @math{|} 02 @math{|} 03)
- @math{|} 03 string[@t{local}] ValueMod string[@t{id}] string[@t{c}] (00 @math{|} 01)[@t{type}]
+ @math{|} 03 string[@t{local}] ValueMod string[@t{id}] string[@t{c}] bool[@t{type}]
@math{|} 04 ValueMod int[@t{format}] string[@t{vallab}] string[@t{varname}]
(01 @math{|} 02 @math{|} 03) string[@t{s}]
@math{|} 05 ValueMod string[@t{varname}] string[@t{varlabel}] (01 @math{|} 02 @math{|} 03)
ValueMod @result{}
31 i0 (i0 @math{|} i1 string[@t{subscript}])
v1(00 (i1 @math{|} i2) 00 00 int 00 00)
- v3(count(FormatString Style ValueModUnknown))
+ v3(count(FormatString StylePair))
@math{|} 31 int[@t{n-refs}] int16*[@t{n-refs}] Format
@math{|} 58
-Style @result{} 58 @math{|} 31 01? 00? 00? 00? 01 string[@t{fgcolor}] string[@t{bgcolor}] string[@t{typeface}] byte[@t{size}]
+
Format @result{} 00 00 count(FormatString Style 58)
-FormatString @result{} count((i0 (58 @math{|} 31 string))?)
-ValueModUnknown @result{} 58 @math{|} 31 i0 i0 i0 i0 01 00 (01 @math{|} 02 @math{|} 08) 00 08 00 0a 00)
+FormatString @result{} count((count((i0 58)?) (58 @math{|} 31 string))?)
+
+StylePair @result{}
+ (31 Style @math{|} 58)
+ (31 Style2 @math{|} 58)
+
+Style @result{}
+ bool[@t{bold}] bool[@t{italic}] bool[@t{underline}] bool[@t{show}]
+ string[@t{fgcolor}] string[@t{bgcolor}]
+ string[@t{typeface}] byte[@t{size}]
+
+Style2 @result{}
+ int[@t{halign}] int[@t{valign}] double[@t{offset}]
+ int16[@t{left-margin}] int16[@t{right-margin}]
+ int16[@t{top-margin}] int16[@t{bottom-margin}]
@end format
@end cartouche
version of the localized format string in the Value in which the
Format is nested.
-The Style, if present, changes the style for this individual Value.
-The @code{size} is a font size in units of 1/96 inch.
+Style and Style2, if present, change the style for this individual
+Value. @code{bold}, @code{italic}, and @code{underline} enable the
+corresponding font styles. @code{fgcolor} and @code{bgcolor} are
+color strings, such as @code{#ffffff}. The @code{size} is a font size
+in units of 1/96 inch.
+
+@code{halign} specifies horizontal alignment: 0 for center, 2 for
+left, 4 for right, 6 for decimal, and 0xffffffad for mixed. For
+decimal alignment, @code{offset} is the decimal point's offset from
+the right side of the cell, in units of 1/72 inch.
+
+@code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
+for bottom.
+
+@code{left-margin}, @code{right-margin}, @code{top-margin}, and
+@code{bottom-margin} are in units of 1/72 inch.
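
The alignment codes and margin units above could be decoded along the
following lines. This Python sketch uses hypothetical names
(@code{describe_alignment}, @code{margin_to_inches}) and reports any
code not listed above as @code{unknown}, which is an assumption of the
sketch rather than documented behavior.

```python
HALIGN = {0: "center", 2: "left", 4: "right", 6: "decimal", 0xFFFFFFAD: "mixed"}
VALIGN = {0: "center", 1: "top", 3: "bottom"}


def describe_alignment(halign, valign):
    """Map the halign and valign codes of a Style2 to descriptive names."""
    # Mask to 32 bits so that a signed reading of 0xffffffad (-83) matches too.
    h = HALIGN.get(halign & 0xFFFFFFFF, "unknown")
    v = VALIGN.get(valign & 0xFFFFFFFF, "unknown")
    return h, v


def margin_to_inches(units):
    """Convert a margin given in units of 1/72 inch to inches."""
    return units / 72


print(describe_alignment(-83, 3))  # prints: ('mixed', 'bottom')
print(margin_to_inches(36))        # prints: 0.5
```
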
@node SPV Legacy Detail Member Binary Format
@section Legacy Detail Member Binary Format
* SPV Detail coordinates Element::
* SPV Detail faceting Element::
* SPV Detail facetLayout Element::
+* SPV Detail style Element::
@end menu
@node SPV Detail visualization Element
@defvr {Required} style
The @code{id} of a @code{style} element (@pxref{SPV Detail style
-element}). This is the base style for the entire pivot table. In
+Element}). This is the base style for the entire pivot table. In
every example in the corpus, the value is @code{visualizationStyle}
and the corresponding @code{style} element has no attributes other
than @code{id}.
@defvr {Required} cellStyle
@defvrx {Required} style
Each of these is the @code{id} of a @code{style} element (@pxref{SPV
-Detail style element}). The former is the default style for
+Detail style Element}). The former is the default style for
individual cells, the latter for the entire table.
@end defvr
@defvrx {Optional} useGrouping
The syntax and meaning of these attributes are the same as on the
@code{format} element for a numeric format. @xref{SPV Detail format
-element}.
+Element}.
@end defvr
@node SPV Detail stringFormat Element
e.g.@: @code{0} or @code{13;14;15;16}.
@end defvr
-@subsubheading The @code{intersectWhere}
+@subsubheading The @code{intersectWhere} Element
Parent: @code{intersect} @*
Contents: empty
the corpus they always take the values @code{dimension2categories} and
@code{dimension0categories}, respectively.
@end defvr
+
+@node SPV Detail style Element
+@subsection The @code{style} Element
+
+TBD.