1 @node SPSS Viewer File Format
2 @appendix SPSS Viewer File Format
4 SPSS Viewer or @file{.spv} files, here called SPV files, are written
5 by SPSS 16 and later to represent the contents of its output editor.
6 This chapter documents the format, based on examination of a corpus of
7 about 500 files from a variety of sources. This description is
8 detailed enough to read SPV files, but probably not enough to write
11 SPSS 15 and earlier versions use a completely different output format
12 based on the Microsoft Compound Document Format. This format is not
15 An SPV file is a Zip archive that can be read with @command{zipinfo}
16 and @command{unzip} and similar programs. The final member in the Zip
17 archive is a file named @file{META-INF/MANIFEST.MF}. This structure
18 makes SPV files resemble Java ``JAR'' files (and ODF files), but
19 whereas a JAR manifest contains a sequence of colon-delimited
20 key/value pairs, an SPV manifest contains the string
21 @samp{allowPivoting=true}, without a new-line. (This string may be
22 the best way to identify an SPV file; it is invariant across the
25 The rest of the members in an SPV file's Zip archive fall into two
26 categories: @dfn{structure} and @dfn{detail} members. Structure
27 member names begin with @file{outputViewer@var{nnnnnnnnnn}}, where
28 each @var{n} is a decimal digit, and end with @file{.xml}, and often
29 include the string @file{_heading} in between. Each of these members
30 represents some kind of output item (a table, a heading, a block of
31 text, etc.) or a group of them. The member whose output goes at the
32 beginning of the document is numbered 0, the next member in the output
33 is numbered 1, and so on.
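As a minimal Python sketch of the points above, a reader might recognize an SPV file by its manifest and list its structure members in output order. The helper names (@code{open_spv}, @code{looks_like_spv}, @code{structure_members}) are hypothetical. The member-name pattern is an assumption drawn from the prose (ten digits, then anything such as @code{_heading}, then @code{.xml}).

```python
import io
import re
import zipfile

# Assumed pattern: "outputViewer", ten decimal digits, anything (often
# "_heading"), then ".xml".
STRUCTURE_NAME = re.compile(r"outputViewer(\d{10})[^/]*\.xml")


def open_spv(data: bytes) -> zipfile.ZipFile:
    """Open an in-memory SPV file as a Zip archive."""
    return zipfile.ZipFile(io.BytesIO(data))


def looks_like_spv(zf: zipfile.ZipFile) -> bool:
    """Heuristic check: the manifest holds exactly 'allowPivoting=true'."""
    try:
        manifest = zf.read("META-INF/MANIFEST.MF")
    except KeyError:  # no manifest member at all
        return False
    return manifest == b"allowPivoting=true"


def structure_members(zf: zipfile.ZipFile) -> list:
    """Return structure member names in output order (member 0 first)."""
    numbered = []
    for name in zf.namelist():
        m = STRUCTURE_NAME.fullmatch(name)
        if m:
            numbered.append((int(m.group(1)), name))
    return [name for _, name in sorted(numbered)]
```

Sorting on the parsed number, not on the raw name, keeps the order correct even if some names carry a suffix such as @code{_heading} and others do not.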
35 Structure members contain XML. This XML is sometimes self-contained,
36 but it often references detail members in the Zip archive, which are
40 @item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
41 @itemx @file{@var{prefix}_lightTableData.bin}
42 The structure of a table plus its data. Older SPV files pair a
43 @file{@var{prefix}_table.xml} file that describes the table's
44 structure with a binary @file{@var{prefix}_tableData.bin} file that
45 gives its data. Newer SPV files (the majority of those in the corpus)
46 instead include a single @file{@var{prefix}_lightTableData.bin} file
47 that incorporates both into a single binary format.
49 @item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
50 @itemx @file{@var{prefix}_lightWarningData.bin}
51 Same format as used for tables, with a different name.
53 @item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
54 @itemx @file{@var{prefix}_lightNotesData.bin}
55 Same format as used for tables, with a different name.
57 @item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
58 The structure of a chart plus its data. Charts do not have a
61 @item @file{@var{prefix}_pmml.scf}
62 @itemx @file{@var{prefix}_stats.scf}
63 @itemx @file{@var{prefix}_model.xml}
64 Not yet investigated. The corpus contains few examples.
67 The @file{@var{prefix}} in the names of the detail members is
68 typically an 11-digit decimal number that increases for each item,
69 tending to skip values. Older SPV files use different naming
70 conventions. Structure members refer to detail members by name, and so
71 their exact names do not matter to readers as long as they are unique.
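Because structure members name their detail members explicitly, a reader does not strictly need to parse these names. Still, when surveying an archive it can help to classify them. The following Python sketch does so using only the suffixes listed above; the @code{DetailMember} type and the layout labels (@code{light}, @code{legacy-binary}, @code{legacy-xml}) are hypothetical names invented for illustration, and older files with other naming conventions are simply not recognized.

```python
from typing import NamedTuple, Optional


class DetailMember(NamedTuple):
    prefix: str
    kind: str    # "table", "warning", "notes", or "chart"
    layout: str  # "light", "legacy-binary", or "legacy-xml"


# Suffixes from the list of detail members; the layout labels are this
# sketch's own names, not terms used by SPSS.
SUFFIXES = [
    ("_lightTableData.bin", "table", "light"),
    ("_tableData.bin", "table", "legacy-binary"),
    ("_table.xml", "table", "legacy-xml"),
    ("_lightWarningData.bin", "warning", "light"),
    ("_warningData.bin", "warning", "legacy-binary"),
    ("_warning.xml", "warning", "legacy-xml"),
    ("_lightNotesData.bin", "notes", "light"),
    ("_notesData.bin", "notes", "legacy-binary"),
    ("_notes.xml", "notes", "legacy-xml"),
    ("_chartData.bin", "chart", "legacy-binary"),
    ("_chart.xml", "chart", "legacy-xml"),
]


def classify_detail(name: str) -> Optional[DetailMember]:
    """Split a detail member name into prefix, kind, and layout, if recognized."""
    for suffix, kind, layout in SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return DetailMember(name[: -len(suffix)], kind, layout)
    return None  # e.g. *_model.xml or *.scf members, not covered here
```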
74 * SPV Structure Member Format::
75 * SPV Light Detail Member Format::
76 * SPV Legacy Detail Member Binary Format::
77 * SPV Legacy Detail Member XML Format::
80 @node SPV Structure Member Format
81 @section Structure Member Format
83 A structure member lays out the high-level structure for a group of
84 output items such as headings, tables, and charts. Structure members
85 do not include the details of tables and charts but instead refer to
86 them by their member names.
88 Structure members' XML files claim conformance with a collection of
89 XML Schemas. These schemas are distributed, under a nonfree license,
90 with SPSS binaries. Fortunately, the schemas are not necessary to
91 understand the structure members. The schemas can even
92 be deceptive because they document elements and attributes that are
93 not in the corpus and do not document elements and attributes that are
94 commonly found in the corpus.
96 Structure members use a different XML namespace for each schema, but
97 these namespaces are not entirely consistent. In some SPV files, for
98 example, the @code{viewer-tree} schema is associated with namespace
99 @indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
100 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
101 additional @file{viewer/}). Under either name, the schema URIs are
102 not resolvable to obtain the schemas themselves.
104 One may ignore all of the above in interpreting a structure member.
105 The actual XML has a simple and straightforward form that does not
106 require a reader to take schemas or namespaces into account. A
107 structure member's root is a @code{heading} element, which contains
108 @code{heading} or @code{container} elements (or a mix), forming a
109 tree. In turn, @code{container} holds a @code{label} and one more
110 child, usually @code{text} or @code{table}.
112 The following diagram shows the hierarchy within an SPV structure
113 member more precisely. Names represent elements and <text> and
114 <cdata> represent plain text and CDATA, respectively. Edges point
115 from parent to child. Unlabeled edges indicate that the child appears
116 exactly once; edges labeled with *, zero or more times; edges labeled
117 with ?, zero or one times. Where possible, child elements are shown
118 in the order they actually appear within a parent element.
124 | +--> pageHeader +--> pageParagraph --> text --> <cdata>
125 | +--> pageFooter +--> pageParagraph --> text --> <cdata>
126 +-----> label --?--> <text>
129 +-----> label --?--> <text>
130 +--?--> text ---> html --> <cdata>
132 | +--?--> tableProperties
133 | | +--> generalProperties
134 | | +--> footnoteProperties
135 | | +--> cellFormatProperties
136 | | | +--> caption -------> style
137 | | | +--> footnotes -----> style
138 | | | +--> rowLabels -----> style
139 | | | +--> columnLabels --> style
140 | | | +--> data ----------> style
141 | | | +--> layers --------> style
142 | | | +--> title ---------> style
143 | | | +--> cornerLabels --> style
144 | | +--> borderProperties
145 | | | +--> topInnerFrame
146 | | | +--> rightInnerFrame
147 | | | +--> horizontalDimensionBorderColumns
148 | | | +--> horizontalDimensionBorderRows
149 | | | +--> horizontalCategoryBorderColumns
150 | | | +--> leftInnerFrame
151 | | | +--> verticalDimensionBorderRows
152 | | | +--> titleLayerSeparator
153 | | | +--> verticalCategoryBorderRows
154 | | | +--> topOuterFrame
155 | | | +--> bottomInnerFrame
156 | | | +--> leftOuterFrame
157 | | | +--> dataAreaTop
158 | | | +--> verticalDimensionBorderColumns
159 | | | +--> dataAreaLeft
160 | | | +--> horizontalCategoryBorderRows
161 | | | +--> bottomOuterFrame
162 | | | +--> rightOuterFrame
163 | | | +--> verticalCategoryBorderColumns
164 | | +--> printingProperties
165 | +-----> tableStructure
166 | +--?--> path ------> <text>
167 | +-----> dataPath --> <text>
169 | +--?--> dataPath --> <text>
170 | +-----> path ------> <text>
172 +--?--> ViZml --> <text>
173 +--?--> path ---> <text>
174 +--?--> pmmlContainerPath ---> <text>
175 +--?--> statsContainerPath --> <text>
179 The elements found in structure members are documented below. For
180 each element, we note the possible parent elements and the element's
181 contents. The contents are specified as pseudo-regular expressions
182 with the following conventions:
195 Grouping multiple elements.
200 @item @var{a} @math{|} @var{b}
201 A choice between @var{a} and @var{b}.
204 Zero or more @var{x}.
207 The following example shows the contents of a typical structure member
208 for a @cmd{DESCRIPTIVES} procedure. A real structure member is not
209 indented. This example also omits most attributes, all XML namespace
210 information, and the CSS from the embedded HTML:
213 <?xml version="1.0" encoding="utf-8"?>
215 <label>Output</label>
216 <heading commandName="Descriptives">
217 <label>Descriptives</label>
220 <text commandName="Descriptives" type="title">
222 <![CDATA[<head><style type="text/css">...</style></head><BR>Descriptives]]>
226 <container visibility="hidden">
228 <table commandName="Descriptives" subType="Notes" type="note">
230 <dataPath>00000000001_lightNotesData.bin</dataPath>
235 <label>Descriptive Statistics</label>
236 <table commandName="Descriptives" subType="Descriptive Statistics"
239 <dataPath>00000000002_lightTableData.bin</dataPath>
248 * SPV Structure heading Element::
249 * SPV Structure label Element::
250 * SPV Structure container Element::
251 * SPV Structure text Element (Inside @code{container})::
252 * SPV Structure html Element::
253 * SPV Structure table Element::
254 * SPV Structure tableStructure Element::
255 * SPV Structure graph Element::
256 * SPV Structure model Element::
257 * SPV Structure dataPath and path Elements::
258 * SPV Structure pageSetup Element::
259 * SPV Structure pageHeader and pageFooter Elements::
260 * SPV Structure pageParagraph Element::
261 * SPV Structure @code{text} Element (Inside @code{pageParagraph})::
264 @node SPV Structure heading Element
265 @subsection The @code{heading} Element
267 Parent: Document root or @code{heading} @*
268 Contents: @code{pageSetup}? @code{label} (@code{container} @math{|} @code{heading})*
270 The root of a structure member is a @code{heading}, which represents a
271 section of output beginning with a title (the @code{label}) and
272 ordinarily followed by content containers or further nested
273 (sub)-sections of output. Unlike heading elements in HTML and other
274 common document formats, which precede the content that they head,
275 @code{heading} contains the elements that appear below the heading.
277 The document root heading, only, may also contain a @code{pageSetup}
280 The following attributes have been observed on both document root and
281 nested @code{heading} elements.
283 @defvr {Optional} @code{creator-version}
284 The version of the software that created this SPV file. A string of
285 the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
286 e.g.@: @code{21000001} is version 21.0.0.1. Trailing pairs of zeros
287 are sometimes omitted, so that @code{21}, @code{210000}, and
288 @code{21000000} are all version 21.0.0.0 (and the corpus contains all
289 three of those forms).
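A minimal Python sketch of this decoding: pad the string on the right with @samp{0} to eight digits, then split it into pairs. The function name is hypothetical, and rejecting odd-length or longer strings is an assumption, since the corpus only shows the forms above.

```python
def parse_creator_version(text: str) -> tuple:
    """Turn 'xxyyzzww' (possibly missing trailing '00' pairs) into 4 ints."""
    # Assumption: valid values have 2, 4, 6, or 8 decimal digits.
    if not text.isdigit() or len(text) % 2 != 0 or len(text) > 8:
        raise ValueError("unexpected creator-version: " + repr(text))
    padded = text.ljust(8, "0")  # restore omitted trailing zero pairs
    return tuple(int(padded[i:i + 2]) for i in range(0, 8, 2))
```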
293 The following attributes have been observed on document root
294 @code{heading} elements only:
296 @defvr {Optional} @code{creator}
297 The directory in the file system of the software that created this SPV
301 @defvr {Optional} @code{creation-date-time}
302 The date and time at which the SPV file was written, in a
303 locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
304 PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
305 December 5, 2014 5:00:19 o'clock PM EST}.
308 @defvr {Optional} @code{lockReader}
309 Whether a reader should be allowed to edit the output. The possible
310 values are @code{true} and @code{false}, but the corpus only contains
314 @defvr {Optional} @code{schemaLocation}
315 This is actually an XML Namespace attribute. A reader may ignore it.
319 The following attributes have been observed only on nested
320 @code{heading} elements:
322 @defvr {Required} @code{commandName}
323 The locale-invariant name of the command that produced the output,
324 e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
327 @defvr {Optional} @code{visibility}
328 To what degree the output represented by the element is visible. The
329 only observed value is @code{collapsed}.
332 @defvr {Optional} @code{locale}
333 The locale used for output, in Windows format, which is similar to the
334 format used in Unix with the underscore replaced by a hyphen, e.g.@:
335 @code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
338 @defvr {Optional} @code{olang}
339 The output language, e.g.@: @code{en}, @code{it}, @code{es},
340 @code{de}, @code{pt-BR}.
343 @node SPV Structure label Element
344 @subsection The @code{label} Element
346 Parent: @code{heading} or @code{container} @*
349 Every @code{heading} and @code{container} holds a @code{label} as its
350 first child. The @code{label} of the root @code{heading} in a structure
351 member always contains the string ``Output''. Otherwise, the text in @code{label}
352 describes what it labels, often by naming the statistical procedure
353 that was executed, e.g.@: ``Frequencies'' or ``T-Test''. Labels are
354 often very generic, especially within a @code{container}, e.g.@:
355 ``Title'' or ``Warnings'' or ``Notes''. Label text is localized
356 according to the output language, e.g.@: in Italian a frequency table
357 procedure is labeled ``Frequenze''.
359 The corpus contains a few examples of empty labels, ones that contain
362 This element has no attributes.
364 @node SPV Structure container Element
365 @subsection The @code{container} Element
367 Parent: @code{heading} @*
368 Contents: @code{label} (@code{table} @math{|} @code{text} @math{|} @code{graph} @math{|} @code{model})
370 A @code{container} serves to label a @code{table} or a @code{text}
373 This element has the following attributes.
375 @defvr {Required} @code{visibility}
376 Either @code{visible} or @code{hidden}, this indicates whether the
377 container's content is displayed.
380 @defvr {Optional} @code{text-align}
381 Presumably indicates the alignment of text within the container. The
382 only observed value is @code{left}. Observed with nested @code{table}
383 and @code{text} elements.
386 @defvr {Optional} @code{width}
387 The width of the container in the form @code{@var{n}px}, e.g.@:
391 @node SPV Structure text Element (Inside @code{container})
392 @subsection The @code{text} Element (Inside @code{container})
394 Parent: @code{container} @*
395 Contents: @code{html}
397 This @code{text} element is nested inside a @code{container}. There
398 is a different @code{text} element that is nested inside a
399 @code{pageParagraph}.
401 This element has the following attributes.
403 @defvr {Required} @code{type}
404 One of @code{title}, @code{log}, or @code{text}.
407 @defvr {Optional} @code{commandName}
408 As on the @code{heading} element. For output not specific to a
409 command, this is simply @code{log}. The corpus contains one example
410 in which @code{commandName} is present but set to the empty string.
413 @defvr {Optional} @code{creator-version}
414 As on the @code{heading} element.
417 @node SPV Structure html Element
418 @subsection The @code{html} Element
420 Parent: @code{text} @*
423 The CDATA contains an HTML document. In some cases, the document
424 starts with @code{<html>} and ends with @code{</html>}; in others the
425 @code{html} element is implied. Generally the HTML includes a
426 @code{head} element with a CSS stylesheet. The HTML body often begins
427 with @code{<BR>}. The actual content ranges from trivial to simple:
428 just discarding the CSS and tags yields readable results.
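The ``discard the CSS and tags'' approach can be sketched with Python's standard @code{html.parser} module. The helper names are hypothetical; turning @code{<BR>} into a newline and skipping everything inside @code{head} and @code{style} are choices made for this sketch.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text, skipping <head> and <style> content (names are lowercased)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <head>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("head", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(cdata: str) -> str:
    """Reduce the CDATA of an html element to plain, readable text."""
    parser = _TextExtractor()
    parser.feed(cdata)
    parser.close()
    return "".join(parser.parts).strip()
```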
430 This element has the following attributes.
432 @defvr {Required} @code{lang}
433 This always contains @code{en} in the corpus.
436 @node SPV Structure table Element
437 @subsection The @code{table} Element
439 Parent: @code{container} @*
440 Contents: @code{tableStructure}
442 This element has the following attributes.
444 @defvr {Required} @code{commandName}
445 As on the @code{heading} element.
448 @defvr {Required} @code{type}
449 One of @code{table}, @code{note}, or @code{warning}.
452 @defvr {Required} @code{subType}
453 The locale-invariant name for the particular kind of output that this
454 table represents in the procedure. This can be the same as
455 @code{commandName}, e.g.@: @code{Frequencies}, or different, e.g.@:
456 @code{Case Processing Summary}. Generic subtypes @code{Notes} and
457 @code{Warnings} are often used.
460 @defvr {Required} @code{tableId}
461 A number that uniquely identifies the table within the SPV file,
462 typically a large negative number such as @code{-4147135649387905023}.
465 @defvr {Optional} @code{creator-version}
466 As on the @code{heading} element. In the corpus, this is only present
467 for version 21 and up and always includes all 8 digits.
470 @node SPV Structure tableStructure Element
471 @subsection The @code{tableStructure} Element
473 Parent: @code{table} @*
474 Contents: @code{dataPath}
476 This element has no attributes.
478 @node SPV Structure graph Element
479 @subsection The @code{graph} Element
481 Parent: @code{container} @*
482 Contents: @code{dataPath}? @code{path}
484 This element represents a graph. The @code{dataPath} and @code{path}
485 elements name the Zip members that give the details of the graph.
486 Normally, both elements are present; there is only one counterexample
489 @node SPV Structure model Element
490 @subsection The @code{model} Element
492 Parent: @code{container} @*
493 Contents: (@code{ViZml}? @code{path}) @math{|} (@code{pmmlContainerPath} @code{statsContainerPath})
495 This element represents a model. The @code{ViZml} and @code{path}
496 elements give the details of the model.
497 Normally, both elements are present; there is only one counterexample
500 The details are unexplored. The @code{ViZml} element contains base-64
501 encoded text that decodes to a binary format with some embedded text
502 strings, and @code{path} names a Zip member that contains XML.
503 Alternatively, @code{pmmlContainerPath} and @code{statsContainerPath}
504 name Zip members with @file{.scf} extension.
506 @node SPV Structure dataPath and path Elements
507 @subsection The @code{dataPath} and @code{path} Elements
509 Parent: @code{tableStructure} or @code{graph} or @code{model} @*
512 These elements contain the names of the Zip members that hold details
513 for a container. For tables:
517 When a ``light'' format is used, only @code{dataPath} is present, and
518 it names a @file{.bin} member of the Zip file that has @code{light} in
519 its name, e.g.@: @code{0000000001437_lightTableData.bin} (@pxref{SPV
520 Light Detail Member Format}).
523 When the legacy format is used, both are present. In this case,
524 @code{dataPath} names a Zip member with a legacy binary format that
525 contains relevant data (@pxref{SPV Legacy Detail Member Binary
526 Format}), and @code{path} names a Zip member that uses an XML format
527 (@pxref{SPV Legacy Detail Member XML Format}).
530 Graphs normally follow the legacy approach described above. The
531 corpus contains one example of a graph with @code{path} but not
532 @code{dataPath}. The reason is unexplored.
534 Models use @code{path} but not @code{dataPath}. @xref{SPV Structure
535 model Element}, for more information.
537 These elements have no attributes.
539 @node SPV Structure pageSetup Element
540 @subsection The @code{pageSetup} Element
542 Parent: @code{heading} @*
543 Contents: @code{pageHeader} @code{pageFooter}
545 This element has the following attributes.
547 @defvr {Required} @code{initial-page-number}
551 @defvr {Optional} @code{chart-size}
552 Always @code{as-is} or a localization (!) of it (e.g.@: @code{dimensione
553 attuale}, @code{Wie vorgegeben}).
556 @defvr {Optional} @code{margin-left}
557 @defvrx {Optional} @code{margin-right}
558 @defvrx {Optional} @code{margin-top}
559 @defvrx {Optional} @code{margin-bottom}
560 Margin sizes in the form @code{@var{size}in}, e.g.@: @code{0.25in}.
563 @defvr {Optional} @code{paper-height}
564 @defvrx {Optional} @code{paper-width}
565 Paper sizes in the form @code{@var{size}in}, e.g.@: @code{8.5in} by
566 @code{11in} for letter paper or @code{8.267in} by @code{11.692in} for
570 @defvr {Optional} @code{reference-orientation}
574 @defvr {Optional} @code{space-after}
578 @node SPV Structure pageHeader and pageFooter Elements
579 @subsection The @code{pageHeader} and @code{pageFooter} Elements
581 Parent: @code{pageSetup} @*
582 Contents: @code{pageParagraph}*
584 This element has no attributes.
586 @node SPV Structure pageParagraph Element
587 @subsection The @code{pageParagraph} Element
589 Parent: @code{pageHeader} or @code{pageFooter} @*
590 Contents: @code{text}
592 Text to go at the top or bottom of a page, respectively.
594 This element has no attributes.
596 @node SPV Structure @code{text} Element (Inside @code{pageParagraph})
597 @subsection The @code{text} Element (Inside @code{pageParagraph})
599 Parent: @code{pageParagraph} @*
602 This @code{text} element is nested inside a @code{pageParagraph}. There
603 is a different @code{text} element that is nested inside a
606 The element is either empty, or contains CDATA that holds almost-XHTML
607 text: in the corpus, either an @code{html} or @code{p} element. It is
608 @emph{almost}-XHTML because the @code{html} element designates the
610 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} instead of an XHTML
611 namespace, and because the CDATA can contain substitution variables:
612 @code{&[Page]} for the page number and @code{&[PageTitle]} for the
615 Typical contents (indented for clarity):
618 <html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
621 <p style="text-align:right; margin-top: 0">Page &[Page]</p>
626 This element has the following attributes.
628 @defvr {Required} @code{type}
632 @node SPV Light Detail Member Format
633 @section Light Detail Member Format
635 This section describes the format of ``light'' detail @file{.bin}
636 members. These members have a binary format which we describe here in
637 terms of a context-free grammar using the following conventions:
640 @item NonTerminal @result{} @dots{}
641 Nonterminals have CamelCaps names, and @result{} indicates a
642 production. The right-hand side of a production is often broken
643 across multiple lines. Break points are chosen for aesthetics only
644 and have no semantic significance.
646 @item 00, 01, @dots{}, ff.
647 A byte with a fixed value, written as a pair of hexadecimal digits.
649 @item i0, i1, @dots{}, i9, i10, i11, @dots{}
650 @itemx b0, b1, @dots{}, b9, b10, b11, @dots{}
651 A 32-bit integer in little-endian or big-endian byte order,
652 respectively, with a fixed value, written in decimal, prefixed by
659 A byte with value 0 or 1.
663 A 16-bit integer in little-endian or big-endian byte order,
668 A 32-bit integer in little-endian or big-endian byte order,
673 A 64-bit integer in little-endian or big-endian byte order,
677 A 64-bit IEEE floating-point number.
680 A 32-bit IEEE floating-point number.
684 A 32-bit integer, in little-endian or big-endian byte order,
685 respectively, followed by the specified number of bytes of character
686 data. (The encoding is indicated by the Formats nonterminal.)
689 @var{x} is optional, e.g.@: 00? is an optional zero byte.
691 @item @var{x}*@var{n}
692 @var{x} is repeated @var{n} times, e.g.@: byte*10 for ten arbitrary bytes.
694 @item @var{x}[@var{name}]
695 Gives @var{x} the specified @var{name}. Names are used in textual
696 explanations. They are also used, in brackets, to indicate counts,
697 e.g.@: int[@t{n}] byte*[@t{n}] for a 32-bit integer followed by the
698 specified number of arbitrary bytes.
700 @item @var{a} @math{|} @var{b}
701 Either @var{a} or @var{b}.
704 Parentheses are used for grouping to make precedence clear, especially
705 in the presence of @math{|}, e.g.@: in 00 (01 @math{|} 02 @math{|} 03)
709 A 32-bit integer that indicates the number of bytes in @var{x},
710 followed by @var{x} itself.
713 In a version 1 @file{.bin} member, @var{x}; in version 3, nothing.
714 (The @file{.bin} header indicates the version.)
717 In a version 3 @file{.bin} member, @var{x}; in version 1, nothing.
720 Little-endian byte order is far more common in this format, but a few
721 pieces of the format use big-endian byte order.
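The primitive nonterminals above map naturally onto Python's standard @code{struct} module. The following @code{Reader} class is a hypothetical helper, not PSPP code. It assumes that the 16-, 32-, and 64-bit integers are signed, that floating-point values are little-endian, that the length prefix of a counted @var{x} is a little-endian 32-bit integer, and that strings are UTF-8 (the real encoding comes from the Formats section).

```python
import struct


class Reader:
    """Cursor over a light detail member (hypothetical helper, not PSPP code)."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos

    def _unpack(self, fmt: str):
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def read_byte(self) -> int:
        return self._unpack("B")

    def read_bool(self) -> bool:
        value = self.read_byte()
        if value not in (0, 1):
            raise ValueError("bool at offset %d is %d" % (self.pos - 1, value))
        return value == 1

    # "<" is little-endian, ">" big-endian; signedness is assumed.
    def read_int16(self) -> int: return self._unpack("<h")
    def read_be16(self) -> int: return self._unpack(">h")
    def read_int32(self) -> int: return self._unpack("<i")
    def read_be32(self) -> int: return self._unpack(">i")
    def read_int64(self) -> int: return self._unpack("<q")
    def read_be64(self) -> int: return self._unpack(">q")
    def read_double(self) -> float: return self._unpack("<d")
    def read_float(self) -> float: return self._unpack("<f")

    def _bytes(self, n: int) -> bytes:
        if n < 0 or self.pos + n > len(self.data):
            raise ValueError("bad length %d at offset %d" % (n, self.pos))
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_string(self, encoding: str = "utf-8") -> str:
        return self._bytes(self.read_int32()).decode(encoding)

    def read_bestring(self, encoding: str = "utf-8") -> str:
        return self._bytes(self.read_be32()).decode(encoding)

    def read_counted(self) -> "Reader":
        """counted(x): a sub-reader over x's bytes; this reader skips past them."""
        return Reader(self._bytes(self.read_int32()))
```

Returning a sub-reader from @code{read_counted} lets a caller skip a counted section it does not understand while still checking that a section it does parse consumes exactly its declared length.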
723 A ``light'' detail member @file{.bin} consists of a number of sections
724 concatenated together, terminated by a byte 01:
728 LightMember @result{}
731 Fonts Borders PrintSettings TableSettings Formats
737 The following sections go into more detail.
740 * SPV Light Member Header::
741 * SPV Light Member Title::
742 * SPV Light Member Caption::
743 * SPV Light Member Footnotes::
744 * SPV Light Member Fonts::
745 * SPV Light Member Borders::
746 * SPV Light Member Print Settings::
747 * SPV Light Member Table Settings::
748 * SPV Light Member Formats::
749 * SPV Light Member Dimensions::
750 * SPV Light Member Categories::
751 * SPV Light Member Data::
752 * SPV Light Member Value::
753 * SPV Light Member ValueMod::
756 @node SPV Light Member Header
759 An SPV light member begins with a 39-byte header:
765 (i1 @math{|} i3)[@t{version}]
767 bool[@t{show-numeric-markers}]
768 bool[@t{rotate-inner-column-labels}]
769 bool[@t{rotate-outer-row-labels}]
772 int[@t{min-column-width}] int[@t{max-column-width}]
773 int[@t{min-row-width}] int[@t{max-row-width}]
778 @code{version} is a version number that affects the interpretation of
779 some of the other data in the member. We will refer to ``version 1''
780 and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for
781 version-specific formatting (as described previously).
783 If @code{show-numeric-markers} is 1, footnote markers are shown as
784 numbers, starting from 1; otherwise, they are shown as letters,
785 starting from @samp{a}.
787 If @code{rotate-inner-column-labels} is 1, then column labels closest
788 to the data are rotated to be vertical; otherwise, they are shown
791 If @code{rotate-outer-row-labels} is 1, then row labels farthest from
792 the data are rotated to be vertical; otherwise, they are shown in the
795 @code{table-id} is a binary version of the @code{tableId} attribute in
796 the structure member that refers to the detail member. For example,
797 if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
798 would be 0xc6c99d183b300001.
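In other words, @code{table-id} is the two's-complement 64-bit reinterpretation of the signed decimal @code{tableId}. A minimal Python sketch with hypothetical function names follows. It compares integer values only, because the header's byte order for this field is not stated in this section.

```python
import struct

MASK64 = (1 << 64) - 1


def table_id_to_u64(table_id: str) -> int:
    """Reinterpret the signed decimal tableId attribute as an unsigned 64-bit value."""
    return int(table_id) & MASK64


def u64_to_table_id(value: int) -> str:
    """Inverse: reinterpret a 64-bit value as a signed decimal string."""
    (signed,) = struct.unpack("<q", struct.pack("<Q", value))
    return str(signed)
```

For the example above, @code{table_id_to_u64("-4122591256483201023")} equals @code{0xc6c99d183b300001}.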
800 @code{min-column-width} is the minimum width that a column will be
801 assigned automatically. @code{max-column-width} is the maximum width
802 that a column will be assigned to accommodate a long column label.
803 @code{min-row-width} and @code{max-row-width} are a similar range for
804 the width of row labels. All of these measurements are in 1/96 inch
807 The meaning of the other variable parts of the header is not known.
809 @node SPV Light Member Title
815 Value[@t{title1}] 01?
817 Value[@t{title2}] 01?
821 The Title, which follows the Header, specifies the pivot table's title
822 twice, as @code{title1} and @code{title2}. In the corpus, they are
825 Whereas the Values in @code{title1} and @code{title2} are
826 appropriate for presentation, and localized to the user's language,
827 @code{c} is in English, sometimes less specific, and sometimes less
828 well formatted. For example, for a frequency table, @code{title1} and
829 @code{title2} name the variable and @code{c} is simply ``Frequencies''.
831 @node SPV Light Member Caption
836 Caption @result{} Caption1 Caption2
837 Caption1 @result{} 31 Value @math{|} 58
838 Caption2 @result{} 31 Value @math{|} 58
842 The Caption, if present, is shown below the table. Caption2 is
843 normally present. Caption1 is only rarely nonempty; it might reflect
844 user editing of the caption.
846 @node SPV Light Member Footnotes
847 @subsection Footnotes
851 Footnotes @result{} int[@t{n}] Footnote*[@t{n}]
852 Footnote @result{} Value[@t{text}] (58 @math{|} 31 Value[@t{marker}]) byte*4
856 Each footnote has @code{text} and an optional custom @code{marker}
859 @node SPV Light Member Fonts
864 Fonts @result{} 00 Font*8
867 string[@t{typeface}] float[@t{size}] int[@t{style}] bool[@t{underline}]
868 int[@t{halign}] int[@t{valign}]
869 string[@t{fgcolor}] string[@t{bgcolor}]
870 byte[@t{alternate}] string[@t{altfg}] string[@t{altbg}]
871 v3(int[@t{left-margin}] int[@t{right-margin}] int[@t{top-margin}] int[@t{bottom-margin}])
875 Each Font represents the font style for a different element, in the
876 following order: title, caption, footer, corner, column
877 labels, row labels, data, and layers.
879 @code{index} is the 1-based index of the Font, i.e.@: 1 for the first
880 Font, through 8 for the final Font.
882 @code{typeface} is the string name of the font. In the corpus, this
883 is @code{SansSerif} in over 99% of instances and @code{Times New
886 @code{size} is the size of the font, in points. The most common size
887 in the corpus is 12 points.
889 @code{style} is a bit mask. Bit 0 (with value 1) is set for bold, bit
890 1 (with value 2) is set for italic.
892 @code{underline} is 1 if the font is underlined, 0 otherwise.
894 @code{halign} specifies horizontal alignment: 0 for center, 2 for
895 left, 4 for right, 61453 for decimal, 64173 for mixed. Mixed
896 alignment varies according to type: string data is left-justified,
897 numbers and most other formats are right-justified.
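A small Python sketch of decoding @code{style} and @code{halign}, including how ``mixed'' alignment resolves. The function and table names are hypothetical. Because the text says only that @emph{most} non-string formats are right-justified, treating every non-string value as right-justified is a simplification.

```python
# Code values from the text; the names are this sketch's own.
HALIGN = {0: "center", 2: "left", 4: "right", 61453: "decimal", 64173: "mixed"}


def style_flags(style: int) -> set:
    """Decode the style bit mask: bit 0 is bold, bit 1 is italic."""
    flags = set()
    if style & 1:
        flags.add("bold")
    if style & 2:
        flags.add("italic")
    return flags


def effective_halign(halign: int, is_string: bool) -> str:
    """Resolve 'mixed' alignment: strings go left, other values go right."""
    name = HALIGN.get(halign, "unknown")
    if name == "mixed":
        # Simplification: all non-string values are treated as right-justified.
        return "left" if is_string else "right"
    return name
```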
899 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
902 @code{fgcolor} and @code{bgcolor} are the foreground color and
903 background color, respectively. In the corpus, these are always
904 @code{#000000} and @code{#ffffff}, respectively.
906 @code{alternate} is 01 if rows should alternate colors, 00 if all rows
907 should be the same color. When @code{alternate} is 01, @code{altfg}
908 and @code{altbg} specify the colors for the alternate rows.
910 @code{left-margin}, @code{right-margin}, @code{top-margin}, and
911 @code{bottom-margin} are measured in multiples of 1/96 inch.
913 @node SPV Light Member Borders
920 be32[@t{n-borders}] Border*[@t{n-borders}]
921 bool[@t{show-grid-lines}]
925 be32[@t{border-type}]
926 be32[@t{stroke-type}]
931 The Borders reflect how borders between regions are drawn.
933 The fixed value of @code{endian} can be used to validate the
936 @code{show-grid-lines} is 1 to draw grid lines, otherwise 0.
938 Each Border describes one kind of border. @code{n-borders} seems to
939 always be 19. Each @code{border-type} appears once (although in an
940 unpredictable order) and corresponds to the following borders:
946 Left, top, right, and bottom outer frame.
948 Left, top, right, and bottom inner frame.
950 Left and top of data area.
952 Horizontal and vertical dimension rows.
954 Horizontal and vertical dimension columns.
956 Horizontal and vertical category rows.
958 Horizontal and vertical category columns.
961 @code{stroke-type} describes how a border is drawn, as one of:
978 @code{color} is an RGB color. Bits 24--31 are alpha, bits 16--23 are
979 red, 8--15 are green, 0--7 are blue. An alpha of 255 indicates an
980 opaque color, so opaque black is 0xff000000.
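Extracting the channels is a matter of shifting and masking. In this Python sketch the function names are hypothetical, and the CSS form simply drops the alpha channel.

```python
def split_argb(color: int) -> tuple:
    """Split a 32-bit border color into (alpha, red, green, blue)."""
    return ((color >> 24) & 0xFF, (color >> 16) & 0xFF,
            (color >> 8) & 0xFF, color & 0xFF)


def to_css_hex(color: int) -> str:
    """Format the color as '#rrggbb', dropping alpha."""
    _, r, g, b = split_argb(color)
    return "#%02x%02x%02x" % (r, g, b)
```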
982 @node SPV Light Member Print Settings
983 @subsection Print Settings
987 PrintSettings @result{}
990 bool[@t{paginate-layers}]
993 bool[@t{top-continuation}]
994 bool[@t{bottom-continuation}]
995 be32[@t{n-orphan-lines}]
996 bestring[@t{continuation-string}]
1000 The PrintSettings reflect settings for printing. The fixed value of
1001 @code{endian} can be used to validate the endianness.
1003 @code{all-layers} is 1 to print all layers, 0 to print only the
1006 @code{paginate-layers} is 1 to print each layer at the start of a new
1007 page, 0 otherwise. (This setting is honored only if @code{all-layers} is
1008 1, since otherwise only one layer is printed.)
1010 @code{fit-width} and @code{fit-length} control whether the table is
1011 shrunk to fit within a page's width or length, respectively.
1013 @code{n-orphan-lines} is the minimum number of rows or columns to put
1014 in one part of a table that is broken across pages.
1016 If @code{top-continuation} is 1, then @code{continuation-string} is
1017 printed at the top of a page when a table is broken across pages for
1018 printing; similarly for @code{bottom-continuation} and the bottom of a
1019 page. Usually, @code{continuation-string} is empty.
1021 @node SPV Light Member Table Settings
1022 @subsection Table Settings
1026 TableSettings @result{}
1029 be32[@t{current-layer}]
1030 bool[@t{omit-empty}]
1031 bool[@t{show-row-labels-in-corner}]
1032 bool[@t{show-alphabetic-markers}]
1033 bool[@t{footnote-marker-position}]
1037 Breakpoints[@t{row-breaks}] Breakpoints[@t{column-breaks}]
1038 Keeps[@t{row-keeps}] Keeps[@t{column-keeps}]
1039 PointKeeps[@t{row-point-keeps}] PointKeeps[@t{column-point-keeps}]
1042 bestring[@t{table-look}]
1046 Breakpoints @result{} be32[@t{n-breaks}] be32*[@t{n-breaks}]
1048 Keeps @result{} be32[@t{n-keeps}] Keep*[@t{n-keeps}]
1049 Keep @result{} be32[@t{offset}] be32[@t{n}]
1051 PointKeeps @result{} be32[@t{n-point-keeps}] PointKeep*[@t{n-point-keeps}]
1052 PointKeep @result{} be32[@t{offset}] be32 be32
1057 The TableSettings reflect display settings. The fixed value of
1058 @code{endian} can be used to validate the endianness.
1060 @code{current-layer} is the displayed layer.
1062 If @code{omit-empty} is 1, empty rows or columns (ones with nothing in
1063 any cell) are hidden; otherwise, they are shown.
1065 If @code{show-row-labels-in-corner} is 1, then row labels are shown in
1066 the upper left corner; otherwise, they are shown nested.
1068 If @code{show-alphabetic-markers} is 1, markers are shown as letters
1069 (e.g. @samp{a}, @samp{b}, @samp{c}, @dots{}); otherwise, they are
1070 shown as numbers starting from 1.
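A sketch of how a reader might render footnote markers according to @code{show-alphabetic-markers}. The 0-based footnote number and the function name @code{footnote_marker} are assumptions for illustration. The sketch handles only the letters @samp{a} through @samp{z}, because this section does not say how markers continue past @samp{z}.

```python
def footnote_marker(index, alphabetic):
    """Return marker text for the footnote with 0-based number index."""
    if index < 0:
        raise ValueError("footnote index must be nonnegative")
    if not alphabetic:
        return str(index + 1)  # numeric markers start from 1
    if index >= 26:
        # Behavior past "z" is not described, so refuse rather than guess.
        raise ValueError("alphabetic markers past 'z' are not handled")
    return chr(ord("a") + index)
```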
1072 When @code{footnote-marker-position} is 1, footnote markers are shown
1073 as superscripts, otherwise as subscripts.
1075 The Breakpoints are rows or columns after which there is a page break;
1076 for example, a row break of 1 requests a page break after the second
1077 row. Usually no breakpoints are specified, indicating that page
1078 breaks should be selected automatically.
1080 The Keeps are ranges of rows or columns to be kept together without a
1081 page break; for example, a row Keep with @code{offset} 1 and @code{n}
1082 10 requests that the 10 rows starting with the second row be kept
1083 together. Usually no Keeps are specified.
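The Breakpoints and Keeps grammar translates directly into a parser. This Python sketch assumes @var{buf} is a @code{bytes} object holding the TableSettings and @var{pos} points at the start of a Breakpoints or Keeps; the function names are hypothetical.

```python
import struct


def parse_breakpoints(buf, pos):
    """Parse Breakpoints: a be32 count, then that many be32 row/column numbers.

    Returns (breaks, new_pos)."""
    (n_breaks,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    breaks = list(struct.unpack_from(">%dI" % n_breaks, buf, pos))
    return breaks, pos + 4 * n_breaks


def parse_keeps(buf, pos):
    """Parse Keeps: a be32 count, then (offset, n) pairs of be32 values.

    Returns (keeps, new_pos)."""
    (n_keeps,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    keeps = []
    for _ in range(n_keeps):
        offset, n = struct.unpack_from(">II", buf, pos)
        keeps.append((offset, n))
        pos += 8
    return keeps, pos
```

For the example in the text, a row Keep with @code{offset} 1 and @code{n} 10 decodes as @code{(1, 10)}, covering 0-based rows 1 through 10, that is, the second through eleventh rows.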
1085 The PointKeeps seem to be generated automatically based on
1086 user-specified Keeps. They seem to indicate a conversion from rows
1087 or columns to pixel or point offsets.
1089 @code{notes} is a text string that contains user-specified notes. It
1090 is displayed when the user hovers the cursor over the table, like
1091 ``alt text'' on a webpage. It is not printed. It is usually empty.
1093 @code{table-look} is the name of an SPSS ``TableLook'' table style,
1094 such as ``Default'' or ``Academic''; it is often empty.
1096 TableSettings ends with an arbitrary number of null bytes.
1098 @node SPV Light Member Formats
1104 int[@t{n-widths}] int*[@t{n-widths}]
1105 string[@t{encoding}]
1106 int[@t{current-layer}]
1107 bool[@t{digit-grouping}] bool[@t{leading-zero}] bool
1109 byte[@t{decimal}] byte[@t{grouping}]
1113 v3(count(X1 count(X2)) count(X3))
1117 string[@t{command}] string[@t{command-local}]
1118 string[@t{language}] string[@t{charset}] string[@t{locale}]
1121 byte[@t{decimal}] byte[@t{grouping}]
1123 byte[@t{missing}] bool
1128 byte[@t{variable-mode}]
1129 byte[@t{value-mode}]
1135 int[@t{n-heights}] int*[@t{n-heights}]
1136 int[@t{n-style-map}] StyleMap*[@t{n-style-map}]
1137 int[@t{n-styles}] StylePair*[@t{n-styles}]
1139 StyleMap @result{} int64[@t{cell-index}] int16[@t{style-index}]
1141 01 00 (03 @math{|} 04) 00 00 00
1142 string[@t{command}] string[@t{command-local}]
1143 string[@t{language}] string[@t{charset}] string[@t{locale}]
1146 byte[@t{decimal}] byte[@t{grouping}]
1147 double[@t{small}] 01
1148 (string[@t{dataset}] string[@t{datafile}] i0 int[@t{date}] i0)?
1150 byte[@t{missing}] bool (i2000000 i0)?
1152 CustomCurrency @result{} int[@t{n-ccs}] string*[@t{n-ccs}]
1156 If @code{n-widths} is nonzero, then the accompanying integers are
1157 column widths as manually adjusted by the user. (Row heights are
1158 computed automatically based on the widths.)
1160 @code{encoding} is a character encoding, usually a Windows code page
1161 such as @code{en_US.windows-1252} or @code{it_IT.windows-1252}. The
1162 rest of the character strings in the member use this encoding. The
1163 encoding string is itself encoded in US-ASCII.
1165 @code{epoch} is the year that starts the epoch. A 2-digit year is
1166 interpreted as belonging to the 100 years beginning at the epoch. The
1167 default epoch year is 69 years prior to the current year; thus, in
1168 2017 this field by default contains 1948. In the corpus, @code{epoch}
1169 ranges from 1943 to 1948, and some files contain -1.
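The epoch rule can be expressed as a short function. This is a sketch assuming @code{epoch} holds a positive year; because the meaning of -1 is not described, the hypothetical @code{expand_two_digit_year} simply rejects it.

```python
def expand_two_digit_year(yy, epoch):
    """Place a 2-digit year in the 100-year window that starts at epoch."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year")
    if epoch < 0:
        # The meaning of -1 is not documented; treat it as an error here.
        raise ValueError("no explicit epoch")
    year = epoch - epoch % 100 + yy  # same century as the epoch
    if year < epoch:
        year += 100  # before the window starts, so use the next century
    return year


def default_epoch(current_year):
    """The default epoch is 69 years before the current year."""
    return current_year - 69
```

With the 2017 default epoch of 1948, the 2-digit years 48 through 99 map to 1948--1999 and 00 through 47 map to 2000--2047.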
1171 @code{decimal} is the decimal point character. The observed values
1172 are @samp{.} and @samp{,}.
1174 @code{grouping} is the grouping character. Usually, it is @samp{,} if
1175 @code{decimal} is @samp{.}, and vice versa. Other observed values are
1176 @samp{'} (apostrophe), @samp{ } (space), and zero (presumably
1177 indicating that digits should not be grouped).
1179 @code{command} describes the statistical procedure that generated the
1180 output, in English. It is not necessarily the literal syntax name of
1181 the procedure: for example, NPAR TESTS becomes ``Nonparametric
1182 Tests.'' @code{command-local} is the procedure's name, translated
1183 into the output language; it is often empty and, when it is not,
1184 sometimes the same as @code{command}.
1186 @code{dataset} is the name of the dataset analyzed to produce the
1187 output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
1188 file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter
1189 is sometimes the empty string.
1191 @code{date} is a date, as seconds since the Unix epoch, i.e.@: since
1192 January 1, 1970. Pivot tables within an SPV file often have dates a
1193 few minutes apart, so this is probably a creation date for the tables
1194 rather than for the file.
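Converting @code{date} to a calendar date takes one call in Python. The sketch below assumes the count is measured from midnight UTC, as Unix timestamps are; this section does not say which time zone SPSS uses. The function name is hypothetical.

```python
from datetime import datetime, timezone


def pivot_table_date(seconds):
    """Convert a Formats date (seconds since 1970-01-01) to a datetime.

    Assumes the count is relative to midnight UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```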
1196 Sometimes @code{dataset}, @code{datafile}, and @code{date} are present
1197 and other times they are absent. A reader can tell the two cases apart by
1198 assuming that they are present and then checking whether the
1199 presumptive @code{dataset} contains a null byte (a valid string never
1202 @code{n-ccs} is observed as either 0 or 5. When it is 5, the
1203 following strings are CCA through CCE format strings. @xref{Custom
1204 Currency Formats,,, pspp, PSPP}. Most commonly these are all
1205 @code{-,,,} but other strings occur.
1207 @code{missing} is the character used to indicate that a cell contains
1208 a missing value. It is always observed as @samp{.}.
1210 @node SPV Light Member Dimensions
1211 @subsection Dimensions
1213 A pivot table presents multidimensional data. A Dimension identifies
1214 the categories associated with each dimension.
1218 Dimensions @result{} int[@t{n-dims}] Dimension*[@t{n-dims}]
1219 Dimension @result{} Value[@t{name}] DimProperties int[@t{n-categories}] Category*[@t{n-categories}]
1220 DimProperties @result{}
1222 (00 @math{|} 01 @math{|} 02)[@t{d2}]
1223 (i0 @math{|} i2)[@t{d3}]
1224 bool[@t{show-dim-label}]
1225 bool[@t{hide-all-labels}]
1226 01 int[@t{dim-index}]
1230 @code{name} is the name of the dimension, e.g.@: @code{Variables},
1231 @code{Statistics}, or a variable name.
1233 The meanings of @code{d1}, @code{d2}, and @code{d3} are unknown.
1234 @code{d1} is usually 0 but many other values have been observed.
1236 If @code{show-dim-label} is 01, the pivot table displays a label for
1237 the dimension itself. It is usually 00, because the group and category
1238 labels typically explain the dimension well enough.
1240 If @code{hide-all-labels} is 01, the pivot table omits all labels for
1241 the dimension, including group and category labels. It is usually 00.
1242 When @code{hide-all-labels} is 01, @code{show-dim-label} is ignored.
1244 @code{dim-index} is usually the 0-based index of the dimension, e.g.@:
1245 0 for the first dimension, 1 for the second, and so on. Sometimes it
1246 is -1. There is no visible difference.
1248 @node SPV Light Member Categories
1249 @subsection Categories
1251 Categories are arranged in a tree. Only the leaf nodes in the tree
1252 are really categories; the others just serve as grouping constructs.
1256 Category @result{} Value[@t{name}] (Leaf @math{|} Group)
1257 Leaf @result{} 00 00 00 i2 int[@t{cat-index}] i0
1259 bool[@t{merge}] 00 01 (i0 @math{|} i2)[@t{data}]
1260 i-1 int[@t{n-subcategories}] Category*[@t{n-subcategories}]
1264 @code{name} is the name of the category (or group).
1266 A Leaf represents a leaf category. The Leaf's @code{cat-index} is a
1267 nonnegative integer less than @code{n-categories} in the Dimension in
1268 which the Category is nested (directly or indirectly). These
1269 categories represent the original order in which the categories were
1270 sorted; if the user sorted or rearranged the categories, then the
1271 order of categories in the file reflects that without changing the
1272 @code{cat-index} values.
1274 A Group is a group of nested categories. Usually a Group contains at
1275 least one Category, so that @code{n-subcategories} is positive, but a
1276 few Groups with @code{n-subcategories} 0 have been observed.
1278 If a Group's @code{merge} is 00, the most common value, then the group
1279 is really a distinct group that should be represented as such in the
1280 visual representation and user interface. If @code{merge} is 01, the
1281 categories in this group should be shown and treated as if they were
1282 direct children of the group's containing group (or if it has no
1283 parent group, then direct children of the dimension), and this group's
1284 name is irrelevant and should not be displayed. (Merged groups can be
1287 A Group's @code{data} appears to be i2 when all of the categories
1288 within a group are leaf categories that directly represent data values
1289 for a variable (e.g.@: in a frequency table or crosstabulation, a group
1290 of values in a variable being tabulated) and i0 otherwise.
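The relationship between the category tree, @code{cat-index}, and @code{merge} can be illustrated with a small in-memory model. The @code{Category} class and both functions are hypothetical: a node with a @code{cat_index} is a Leaf and any other node is a Group, while @code{visible_children} splices the children of a merged Group into its parent as described above.

```python
class Category:
    """Hypothetical in-memory Category: a Leaf if cat_index is set,
    otherwise a Group with children."""

    def __init__(self, name, cat_index=None, merge=False, children=()):
        self.name = name
        self.cat_index = cat_index
        self.merge = merge
        self.children = list(children)


def leaves_in_display_order(cat):
    """Collect the Leaf categories in file (display) order."""
    if cat.cat_index is not None:
        return [cat]
    result = []
    for child in cat.children:
        result.extend(leaves_in_display_order(child))
    return result


def visible_children(group):
    """Children to show under group, with merged subgroups spliced in."""
    result = []
    for child in group.children:
        if child.cat_index is None and child.merge:
            result.extend(visible_children(child))
        else:
            result.append(child)
    return result
```

The @code{cat_index} values of the leaves need not be in display order, which is how a user's re-sorting is preserved without renumbering.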
1292 @node SPV Light Member Data
1295 The final part of an SPV light member contains the actual data.
1300 int[@t{n-layers}] int[@t{n-rows}] int[@t{n-columns}] int*[@t{n-dimensions}]
1301 int[@t{n-data}] Datum*[@t{n-data}]
1302 Datum @result{} int64[@t{index}] v1(00?) Value
1306 The values of @code{n-layers}, @code{n-rows}, and @code{n-columns}
1307 each specify the number of dimensions displayed in layers, rows, and
1308 columns, respectively. Any of them may be zero. Their values sum to
1309 @code{n-dimensions} from Dimensions (@pxref{SPV Light Member
1312 The @code{n-dimensions} integers are a permutation of the 0-based
1313 dimension numbers. The first @code{n-layers} integers specify each of
1314 the dimensions represented by layers, the next @code{n-rows} integers
1315 specify the dimensions represented by rows, and the final
1316 @code{n-columns} integers specify the dimensions represented by
1317 columns. When there is more than one dimension of a given kind, the
1318 inner dimensions are given first.
1320 The format of a Datum varies slightly from version 1 to version 3: in
1321 version 1 it allows for an extra optional 00 byte.
1323 A Datum consists of an @code{index} and a Value. Suppose there are
1324 @math{d} dimensions and dimension @math{i}, @math{0 \le i < d}, has
1325 @math{n_i} categories. Consider the datum at coordinates @math{x_i},
1326 @math{0 \le i < d}, and note that @math{0 \le x_i < n_i}. Then the
1327 index is calculated by the following algorithm:
1331 for each @math{i} from 0 to @math{d - 1}:
1332 @i{index} = (@math{n_i \times} @i{index}) @math{+} @math{x_i}
1335 For example, suppose there are 3 dimensions with 3, 4, and 5
1336 categories, respectively. The datum at coordinates (1, 2, 3) has
1337 index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
1338 Within a given dimension, the index is the @code{cat-index} in a Leaf.
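The index algorithm above, together with its inverse, can be written in a few lines of Python. This is a sketch with hypothetical function names; each coordinate is a Leaf's @code{cat-index} within the corresponding dimension, and the lists are assumed to be ordered by dimension number 0 through @math{d - 1}.

```python
def datum_index(coords, n_categories):
    """Compute a Datum index from per-dimension coordinates."""
    if len(coords) != len(n_categories):
        raise ValueError("need one coordinate per dimension")
    index = 0
    for x, n in zip(coords, n_categories):
        if not 0 <= x < n:
            raise ValueError("coordinate out of range")
        index = n * index + x
    return index


def datum_coords(index, n_categories):
    """Invert datum_index: recover the coordinates from an index."""
    coords = []
    for n in reversed(n_categories):
        index, x = divmod(index, n)
        coords.append(x)
    coords.reverse()
    return coords
```

For the example in the text, @code{datum_index([1, 2, 3], [3, 4, 5])} returns 33, and @code{datum_coords(33, [3, 4, 5])} recovers @code{[1, 2, 3]}.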
1340 @node SPV Light Member Value
1343 Value is used throughout the SPV light member format. It boils down
1344 to a number or a string.
1348 Value @result{} 00? 00? 00? 00? RawValue
1350 01 ValueMod int[@t{format}] double[@t{x}]
1351 @math{|} 02 ValueMod int[@t{format}] double[@t{x}]
1352 string[@t{varname}] string[@t{vallab}] (01 @math{|} 02 @math{|} 03)
1353 @math{|} 03 string[@t{local}] ValueMod string[@t{id}] string[@t{c}] bool[@t{type}]
1354 @math{|} 04 ValueMod int[@t{format}] string[@t{vallab}] string[@t{varname}]
1355 (01 @math{|} 02 @math{|} 03) string[@t{s}]
1356 @math{|} 05 ValueMod string[@t{varname}] string[@t{varlabel}] (01 @math{|} 02 @math{|} 03)
1357 @math{|} ValueMod string[@t{format}] int[@t{n-args}] Argument*[@t{n-args}]
1360 @math{|} int[@t{x}] i0 Value*[@t{x}@math{+}1] /* @t{x} @math{>} 0 */
1364 There are several possible encodings, which one can distinguish by the
1365 first nonzero byte in the encoding.
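A reader can apply the ``first nonzero byte'' rule by skipping the optional leading 00 bytes from the Value grammar. This sketch, with hypothetical names, identifies only the tags 01 through 05 shown in the grammar. It lumps every other first byte together, since the remaining alternatives begin with a ValueMod or with @code{int[@t{x}]} and need a closer look.

```python
RAW_VALUE_KINDS = (
    None,                    # 00 is never a tag
    "number",                # 01: number with a format
    "number with variable",  # 02: number, variable name, value label
    "text",                  # 03: localized and English text
    "string with variable",  # 04: string, value label, variable name
    "variable",              # 05: variable name and label
)


def classify_value(buf, pos):
    """Skip up to four optional 00 bytes; return (tag, kind, tag_pos)."""
    for _ in range(4):
        if buf[pos] != 0:
            break
        pos += 1
    tag = buf[pos]
    if 1 <= tag <= 5:
        kind = RAW_VALUE_KINDS[tag]
    else:
        # One of the remaining alternatives (ValueMod or int[x] forms).
        kind = "other"
    return tag, kind, pos
```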
1369 The numeric value @code{x}, intended to be presented to the user
1370 formatted according to @code{format}, which is in the format described
1371 for system files. @xref{System File Output Formats}, for details.
1372 Most commonly, @code{format} has width 40 (the maximum).
1374 An @code{x} with the maximum negative double value @code{-DBL_MAX}
1375 represents the system-missing value SYSMIS. (HIGHEST and LOWEST have
1376 not been observed.) @xref{System File Format}, for more about these
1380 Similar to @code{01}, with the additional information that @code{x} is
1381 a value of variable @code{varname} and has value label @code{vallab}.
1382 Both @code{varname} and @code{vallab} can be the empty string, the
1383 latter very commonly.
1385 The meaning of the final byte is unknown. Possibly it is connected to
1386 whether the value or the label should be displayed.
1389 A text string, in two forms: @code{c} is in English, and sometimes
1390 abbreviated or obscure, and @code{local} is localized to the user's
1391 locale. In an English-language locale, the two strings are often the
1392 same, and in the cases where they differ, @code{local} is more
1393 appropriate for a user interface, e.g.@: @code{c} of ``Not a PxP table
1394 for MCN...'' versus @code{local} of ``Computed only for a PxP table,
1395 where P must be greater than 1.''
1397 @code{c} and @code{local} are always either both empty or both
1400 @code{id} is a brief identifying string whose form seems to resemble a
1401 programming language identifier, e.g.@: @code{cumulative_percent} or
1402 @code{factor_14}. It is not unique.
1404 @code{type} is 00 for text taken from user input, such as syntax
1405 fragments, expressions, file names, and data set names, and 01 for fixed
1406 text strings such as names of procedures or statistics. In the former
1407 case, @code{id} is always the empty string; in the latter case,
1408 @code{id} is still sometimes empty.
1411 The string value @code{s}, intended to be presented to the user
1412 formatted according to @code{format}. The format for a string is not
1413 too interesting, and the corpus contains many clearly invalid formats
1414 like A16.39 or A255.127 or A134.1, so readers should probably ignore
1415 the format entirely.
1417 @code{s} is a value of variable @code{varname} and has value label
1418 @code{vallab}. @code{varname} is never empty but @code{vallab} is
1421 The meaning of the final byte is unknown.
1424 Variable @code{varname}, which is rarely observed as empty in the
1425 corpus, with variable label @code{varlabel}, which is often empty.
1427 The meaning of the final byte is unknown.
1430 (These bytes begin a ValueMod.) A format string, analogous to
1431 @code{printf}, followed by one or more Arguments, each of which has
1432 one or more values. The format string uses the following syntax:
1439 Each of these expands to the character following @samp{\\}, to escape
1440 characters that have special meaning in format strings. These are
1441 effective inside and outside the @code{[@dots{}]} syntax forms
1445 Expands to a new-line, inside or outside the @code{[@dots{}]} forms
1449 Expands to a formatted version of argument @var{i}, which must have
1450 only a single value. For example, @code{^1} expands to the first
1451 argument's @code{value}.
1453 @item [:@var{a}:]@var{i}
1454 Expands @var{a} for each of the values in @var{i}. @var{a}
1455 should contain one or more @code{^@var{j}} conversions, which are
1456 drawn from the values for argument @var{i} in order. Some examples
1461 All of the values for the first argument, concatenated.
1464 Expands to the values for the first argument, each followed by
1468 Expands to @code{@var{x} = @var{y}} where @var{x} is the second
1469 argument's first value and @var{y} is its second value. (This would
1470 be used only if the argument has two values. If there were more
1471 values, the second and third values would be directly concatenated,
1472 which would look funny.)
1475 @item [@var{a}:@var{b}:]@var{i}
1476 This extends the previous form so that the first values are expanded
1477 using @var{a} and later values are expanded using @var{b}. For an
1478 unknown reason, within @var{a} the @code{^@var{j}} conversions are
1479 instead written as @code{%@var{j}}. Some examples from the corpus:
1483 Expands to all of the values for the first argument, separated by
1486 @item [%1 = %2:, ^1 = ^2:]1
1487 Given appropriate values for the first argument, expands to @code{X =
1491 Given appropriate values, expands to @code{1, 2, 3}.
1495 The format string is localized to the user's locale.
1498 @node SPV Light Member ValueMod
1499 @subsection ValueMod
1501 A ValueMod can specify special modifications to a Value.
1506 31 i0 (i0 @math{|} i1 string[@t{subscript}])
1507 v1(00 (i1 @math{|} i2) 00 00 int 00 00)
1508 v3(count(FormatString StylePair))
1509 @math{|} 31 int[@t{n-refs}] int16*[@t{n-refs}] Format
1512 Format @result{} 00 00 count(FormatString Style 58)
1513 FormatString @result{} count((count((i0 58)?) (58 @math{|} 31 string))?)
1520 bool[@t{bold}] bool[@t{italic}] bool[@t{underline}] bool[@t{show}]
1521 string[@t{fgcolor}] string[@t{bgcolor}]
1522 string[@t{typeface}] byte[@t{size}]
1525 int[@t{halign}] int[@t{valign}] double[@t{offset}]
1526 int16[@t{left-margin}] int16[@t{right-margin}]
1527 int16[@t{top-margin}] int16[@t{bottom-margin}]
1531 A ValueMod that begins with ``31 i0'' specifies a string to append to
1532 the main text of the Value, as a subscript. The subscript text is a
1533 brief indicator, e.g.@: @samp{a} or @samp{a,b}, with its meaning
1534 indicated by the table caption. In this usage, subscripts are similar
1535 to footnotes. One apparent difference is that a Value can only
1536 reference one footnote but a subscript can list more than one letter.
1538 A ValueMod that begins with 31 followed by a nonzero ``int'' specifies
1539 a footnote or footnotes that the Value references. Footnote markers
1540 are shown appended to the main text of the Value, as superscripts.
1542 The Format, if present, is a format string for substitutions using the
1543 syntax explained previously. It appears to be an English-language
1544 version of the localized format string in the Value in which the
1547 Style and Style2, if present, change the style for this individual
1548 Value. @code{bold}, @code{italic}, and @code{underline} control the
1549 particular style. @code{fgcolor} and @code{bgcolor} are strings, such
1550 as @code{#ffffff}. The @code{size} is a font size in units of 1/96
1553 @code{halign} is 0 for center, 2 for left, 4 for right, 6 for decimal,
1554 0xffffffad for mixed. For decimal alignment, @code{offset} is the
1555 decimal point's offset from the right side of the cell, in units of
1558 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
1561 @code{left-margin}, @code{right-margin}, @code{top-margin}, and
1562 @code{bottom-margin} are in units of 1/72 inch.
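The following sketch decodes a few of the Style fields described above. The function names are hypothetical. @code{halign_name} masks its argument to 32 bits so that it works whether the reader decoded @code{halign} as signed (giving -83 for 0xffffffad) or unsigned, and @code{parse_rgb} assumes colors always have the @code{#rrggbb} form seen in the example.

```python
HALIGN_NAMES = ((0, "center"), (2, "left"), (4, "right"), (6, "decimal"),
                (0xFFFFFFAD, "mixed"))


def halign_name(value):
    """Name a Style halign value."""
    value &= 0xFFFFFFFF  # normalize a signed 32-bit read to unsigned
    for code, name in HALIGN_NAMES:
        if value == code:
            return name
    raise ValueError("unknown halign %d" % value)


def font_size_in_points(size):
    """Convert a size in 1/96 inch units to points (1/72 inch)."""
    return size * 72 / 96


def parse_rgb(text):
    """Split a color string such as "#ff8000" into (red, green, blue)."""
    if len(text) != 7 or text[0] != "#":
        raise ValueError("expected #rrggbb")
    return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))
```

Margins need no such conversion when measured in points, because 1/72 inch is exactly one point.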
1564 @node SPV Legacy Detail Member Binary Format
1565 @section Legacy Detail Member Binary Format
1567 Whereas the light binary format represents everything about a given
1568 pivot table, the legacy binary format conceptually consists of a
1569 number of named sources, each of which consists of a number of named
1570 variables, each of which is a 1-dimensional array of numbers or
1571 strings or a mix. Thus, the legacy binary member format is quite
1574 This section uses the same context-free grammar notation as in the
1575 previous section, with the following additions:
1579 In a version 0xaf legacy member, @var{x}; in other versions, nothing.
1580 (The legacy member header indicates the version; see below.)
1583 In a version 0xb0 legacy member, @var{x}; in other versions, nothing.
1586 A legacy detail member @file{.bin} has the following overall format:
1590 LegacyBinary @result{}
1591 00 byte[@t{version}] int16[@t{n-sources}] int[@t{member-size}]
1592 Metadata*[@t{n-sources}] Data*[@t{n-sources}]
1596 @code{version} is a version number that affects the interpretation of
1597 some of the other data in the member. Versions 0xaf and 0xb0 are
1598 known. We will refer to ``version 0xaf'' and ``version 0xb0'' members
1601 A legacy member consists of @code{n-sources} data sources, each of
1602 which has Metadata and Data.
1604 @code{member-size} is the size of the legacy binary member, in bytes.
1606 The following sections go into more detail.
1609 * SPV Legacy Member Metadata::
1610 * SPV Legacy Member Data::
1613 @node SPV Legacy Member Metadata
1614 @subsection Metadata
1619 int[@t{n-data}] int[@t{n-variables}] int[@t{offset}]
1620 vAF(byte*32[@t{source-name}])
1621 vB0(byte*64[@t{source-name}] int[@t{x}])
1625 A data source has @code{n-variables} variables, each with
1626 @code{n-data} data values.
1628 @code{source-name} is a 32- or 64-byte string padded on the right with
1629 zero bytes. The names that appear in the corpus are very generic:
1630 usually @code{tableData} for pivot table data or @code{source0} for
1633 A given Metadata's @code{offset} is the offset, in bytes, from the
1634 beginning of the member to the start of the corresponding Data. This
1635 allows programs to skip to the beginning of the data for a particular
1636 source; it is also important to determine whether a source includes
1637 any string data (@pxref{SPV Legacy Member Data}).
1639 The meaning of @code{x} in version 0xb0 is unknown.
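The header and Metadata grammar can be turned into a small parser. This Python sketch assumes that @code{int} and @code{int16} are little-endian (the grammar distinguishes them from the big-endian @code{be32}); the function name and the tuple layout it returns are hypothetical.

```python
import struct


def parse_legacy_metadata(buf):
    """Parse a legacy member header and its Metadata entries.

    Returns (version, member_size, sources), where each source is
    (source_name, n_data, n_variables, offset)."""
    zero, version, n_sources, member_size = struct.unpack_from("<BBHI", buf, 0)
    if zero != 0 or version not in (0xAF, 0xB0):
        raise ValueError("not a recognized legacy member")
    pos = 8
    name_len = 32 if version == 0xAF else 64
    sources = []
    for _ in range(n_sources):
        n_data, n_variables, offset = struct.unpack_from("<iii", buf, pos)
        pos += 12
        # source-name is padded on the right with zero bytes.
        name = buf[pos:pos + name_len].rstrip(b"\0").decode("ascii", "replace")
        pos += name_len
        if version == 0xB0:
            pos += 4  # skip x, whose meaning is unknown
        sources.append((name, n_data, n_variables, offset))
    return version, member_size, sources
```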
1641 @node SPV Legacy Member Data
1646 Data @result{} NumericData*[@t{n-variables}] StringData?
1647 NumericData @result{} byte*288[@t{variable-name}] double*[@t{n-data}]
1651 Data follow the Metadata in the legacy binary format, with sources in
1652 the same order. Each NumericData begins with a @code{variable-name}
1653 that generally indicates its role in the pivot table, e.g.@: ``cell'',
1654 ``cellFormat'', ``dimension0categories'', ``dimension0group0'',
1655 followed by the numeric data, one double per datum. A double with the
1656 maximum negative double @code{-DBL_MAX} represents the system-missing
1661 StringData @result{} i1 string[@t{source-name}] Pairs Labels
1663 Pairs @result{} int[@t{n-string-vars}] PairVar*[@t{n-string-vars}]
1664 PairVar @result{} string[@t{pair-var-name}] int[@t{n-pairs}] Pair*[@t{n-pairs}]
1665 Pair @result{} int[@t{i}] int[@t{j}]
1667 Labels @result{} int[@t{n-labels}] Label*[@t{n-labels}]
1668 Label @result{} int[@t{frequency}] string[@t{s}]
1672 A source may include a mix of numeric and string data values. When a
1673 source includes any string data, the data values that are strings are
1674 set to SYSMIS in the NumericData, and StringData follows the
1675 NumericData. A source that contains no string data omits the
1676 StringData. To reliably determine whether a source includes
1677 StringData, the reader should check whether the offset following the
1678 NumericData is the offset of the next source, as indicated by its
1679 Metadata (or the end of the member, in the case of the last source).
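The check for StringData described above can be sketched as follows. The function names are hypothetical. The sketch assumes little-endian doubles, and that each 288-byte @code{variable-name} is padded with zero bytes like @code{source-name}.

```python
import struct
import sys

SYSMIS = -sys.float_info.max  # -DBL_MAX marks system-missing values


def read_numeric_data(buf, offset, n_variables, n_data):
    """Read one source's NumericData starting at its Metadata offset.

    Returns (variables, end): a list of (name, values) and the offset
    just past the last NumericData."""
    pos = offset
    variables = []
    for _ in range(n_variables):
        raw_name = buf[pos:pos + 288]
        name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
        pos += 288
        values = list(struct.unpack_from("<%dd" % n_data, buf, pos))
        pos += 8 * n_data
        variables.append((name, values))
    return variables, pos


def has_string_data(numeric_end, next_offset):
    """StringData is present if bytes remain before the next source
    (or, for the last source, before the end of the member)."""
    return numeric_end < next_offset
```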
1681 StringData repeats the name of the source (from Metadata).
1683 The string data overlays the numeric data. @code{n-string-vars} is
1684 the number of variables in the source that include string data. More
1685 precisely, it is the 1-based index of the last variable in the source
1686 that includes any string data; thus, it would be 4 if there are 5
1687 variables and only the fourth one includes string data.
1689 Each PairVar consists of a sequence of 0 or more Pair nonterminals,
1690 each of which maps a 0-based data index @code{i} within the variable
1691 to a 0-based label index @code{j}; e.g.@: pair @code{i} = 2, @code{j} = 3,
1692 means that the third data value (with value SYSMIS) is to be replaced
1693 by the string of the fourth Label.
1695 The labels themselves follow the pairs. The valuable part of each
1696 label is the string @code{s}. Each label also includes a
1697 @code{frequency} that reports the number of pairs that reference it
1698 (although this is not useful).
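Once the Pairs and Labels have been read, applying them is straightforward. This sketch uses hypothetical names and assumes the PairVars appear in the same order as the source's variables (it ignores @code{pair-var-name}). It raises an error if a pair would overwrite a value that is not SYSMIS, which the text above implies should not happen.

```python
import sys

SYSMIS = -sys.float_info.max  # -DBL_MAX marks system-missing values


def overlay_strings(numeric_vars, pair_vars, labels):
    """Replace SYSMIS placeholders with label strings.

    numeric_vars: per-variable value lists, in source order.
    pair_vars: per-variable lists of (i, j) pairs, for the first
        n-string-vars variables.
    labels: the label strings, in file order."""
    result = [list(values) for values in numeric_vars]
    for var_index, pairs in enumerate(pair_vars):
        for i, j in pairs:
            if result[var_index][i] != SYSMIS:
                raise ValueError("string overlays a non-SYSMIS value")
            result[var_index][i] = labels[j]
    return result
```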
1700 @node SPV Legacy Detail Member XML Format
1701 @section Legacy Detail Member XML Format
1703 This format is still under investigation.
1705 The design of the detail XML format is not what one would end up with
1706 for describing pivot tables. This is because it is a special case
1707 of a much more general format (``visualization XML'' or ``VizML'')
1708 that can describe a wide range of visualizations. Most of this
1709 generality is overkill for tables, and so we end up with a funny
1710 subset of a general-purpose format.
1712 The important elements of the detail XML format are:
1716 Variables. Variables in detail XML roughly correspond to the
1717 dimensions in a light detail member. There is one variable for each
1718 dimension, plus one variable for each level of labeling along an axis.
1720 The bulk of variables are defined with @code{sourceVariable} elements.
1721 The data for these variables comes from the associated
1722 @code{tableData.bin} member. Some variables are defined, with
1723 @code{derivedVariable} elements, as a constant or in terms of a
1724 mapping function from a source variable.
1727 Assignment of variables to axes. A variable can appear as columns, or
1728 rows, or layers. The @code{faceting} element and its sub-elements
1729 describe this assignment.
1732 All elements have an optional @code{id} attribute. In practice many
1733 elements are assigned @code{id} attributes that are never referenced.
1736 * SPV Detail visualization Element::
1737 * SPV Detail userSource Element::
1738 * SPV Detail sourceVariable Element::
1739 * SPV Detail derivedVariable Element::
1740 * SPV Detail extension Element::
1741 * SPV Detail graph Element::
1742 * SPV Detail location Element::
1743 * SPV Detail coordinates Element::
1744 * SPV Detail faceting Element::
1745 * SPV Detail facetLayout Element::
1746 * SPV Detail style Element::
1749 @node SPV Detail visualization Element
1750 @subsection The @code{visualization} Element
1753 Parent: Document root
1757 (sourceVariable @math{|} derivedVariable)@math{+}
1765 This element has the following attributes.
1767 @defvr {Required} creator
1768 The version of the software that created this SPV file, as a string of
1769 the form @code{xxyyzz}, which represents software version xx.yy.zz,
1770 e.g.@: @code{160001} is version 16.0.1. The corpus includes major
1771 versions 16 through 19.
1774 @defvr {Required} date
1775 The date on which the file was created, as a string of the form
1779 @defvr {Required} lang
1780 The locale used for output, in Windows format, which is similar to the
1781 format used in Unix with the underscore replaced by a hyphen, e.g.@:
1782 @code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
1785 @defvr {Required} name
1786 The title of the pivot table, localized to the output language.
1789 @defvr {Required} style
1790 The @code{id} of a @code{style} element (@pxref{SPV Detail style
1791 Element}). This is the base style for the entire pivot table. In
1792 every example in the corpus, the value is @code{visualizationStyle}
1793 and the corresponding @code{style} element has no attributes other
1797 @defvr {Required} type
1798 A floating-point number. The meaning is unknown.
1801 @defvr {Required} version
1802 The visualization schema version number. In the corpus, the value is
1803 one of 2.4, 2.5, 2.7, and 2.8.
1806 @node SPV Detail userSource Element
1807 @subsection The @code{userSource} Element
1809 Parent: @code{visualization} @*
1812 This element has the following attributes.
1814 @defvr {Optional} missing
1815 Always @code{listwise}.
1818 @node SPV Detail sourceVariable Element
1819 @subsection The @code{sourceVariable} Element
1821 Parent: @code{visualization} @*
1822 Contents: @code{extension}* (@code{format} @math{|} @code{stringFormat})?
1824 This element defines a variable whose values can be used elsewhere in
1825 the visualization. It ties this element's @code{id} to a variable
1826 from the @file{tableData.bin} member that corresponds to this
1829 This element has the following attributes.
1831 @defvr {Required} categorical
1832 Always set to @code{true}.
1835 @defvr {Required} source
1836 Always set to @code{tableData}, the @code{source-name} in the
1837 corresponding @file{tableData.bin} member (@pxref{SPV Legacy Member
1841 @defvr {Required} sourceName
1842 The name of a variable within the source, the @code{variable-name} in
1843 the corresponding @file{tableData.bin} member (@pxref{SPV Legacy
1847 @defvr {Optional} dependsOn
1848 The @code{variable-name} of a variable linked to this one, so that a
1849 viewer can work with them together. For a group variable, this is the
1850 name of the corresponding categorical variable.
1853 @defvr {Optional} label
1854 The variable label, if any.
1857 @defvr {Optional} labelVariable
1858 The @code{variable-name} of a variable whose string values correspond
1859 one-to-one with the values of this variable and are suitable for use
1863 @node SPV Detail derivedVariable Element
1864 @subsection The @code{derivedVariable} Element
1866 Parent: @code{visualization} @*
1867 Contents: @code{extension}* (@code{format} @math{|} @code{stringFormat} @code{valueMapEntry}*)
1869 Like @code{sourceVariable}, this element defines a variable whose
1870 values can be used elsewhere in the visualization. Instead of being
1871 read from a data source, the variable's data are defined by a
1872 mathematical expression.
1874 This element has the following attributes.
1876 @defvr {Required} categorical
1877 Always set to @code{true}.
1880 @defvr {Required} value
1881 An expression that defines the variable's value. In theory this could
1882 be an arbitrary expression in terms of constants, functions, and other
1883 variables, e.g.@: @math{(@var{var1} + @var{var2}) / 2}. In practice,
1884 the corpus contains only the following forms of expressions:
1887 @item constant(@var{number})
1888 @itemx constant(@var{variable})
1889 A constant. The meaning when a variable is named is unknown.
1890 Sometimes the ``variable name'' has spaces in it.
1892 @item map(@var{variable})
1893 Transforms the values in the named @var{variable} using the
1894 @code{valueMapEntry}s contained within the element.
1898 @defvr {Optional} dependsOn
1899 The @code{variable-name} of a variable linked to this one, so that a
1900 viewer can work with them together. For a group variable, this is the
1901 name of the corresponding categorical variable.
1905 * SPV Detail valueMapEntry Element::
1908 @node SPV Detail valueMapEntry Element
1909 @subsubsection The @code{valueMapEntry} Element
1911 Parent: @code{derivedVariable} @*
1914 A @code{valueMapEntry} element defines a mapping from one or more
1915 values of a source expression to a target value. (In the corpus, the
1916 source expression is always just the name of a variable.) Each target
1917 value requires a separate @code{valueMapEntry}. If multiple source
1918 values map to the same target value, they may share one entry or use separate entries.
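A @code{map(@var{variable})} expression can be evaluated by building a lookup table from the @code{from} and @code{to} attributes of each @code{valueMapEntry}, both described below. This sketch is hypothetical. It assumes source values are numbers, keeps @code{to} values as strings, and passes through any value with no entry, a case this section does not cover.

```python
def build_value_map(entries):
    """Build a lookup table from (from, to) attribute pairs.

    A from attribute may list several source values separated by
    semicolons, e.g. "13;14;15;16"."""
    pairs = []
    for from_attr, to_attr in entries:
        for source in from_attr.split(";"):
            pairs.append((float(source), to_attr))
    return dict(pairs)


def apply_value_map(values, value_map):
    """Map each value; values without an entry pass through unchanged."""
    return [value_map.get(value, value) for value in values]
```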
1920 @code{valueMapEntry} has the following attributes.
1922 @defvr {Required} from
1923 A source value, or multiple source values separated by semicolons,
1924 e.g.@: @code{0} or @code{13;14;15;16}.
1927 @defvr {Required} to
1931 @node SPV Detail extension Element
1932 @subsection The @code{extension} Element
1934 This is a general-purpose ``extension'' element. Readers that don't
1935 understand a given extension should be able to safely ignore it. The
1936 attributes on this element, and their meanings, vary based on the
1937 context. Each known usage is described separately below. The current
1938 extensions use attributes exclusively, without any nested elements.
1940 @subsubheading @code{visualization} Parent Element
1942 With @code{visualization} as its parent element, @code{extension} has
1943 the following attributes.
1945 @defvr {Optional} numRows
1946 An integer that presumably defines the number of rows in the displayed
1950 @defvr {Optional} showGridline
1951 Always set to @code{false} in the corpus.
1954 @defvr {Optional} minWidthSet
1955 @defvrx {Optional} maxWidthSet
1956 Always set to @code{true} in the corpus.
1959 @subsubheading @code{container} Parent Element
1961 With @code{container} as its parent element, @code{extension} has the
1962 following attributes.
1964 @defvr {Required} combinedFootnotes
1965 Always set to @code{true} in the corpus.
1968 @subsubheading @code{sourceVariable} and @code{derivedVariable} Parent Element
1970 With @code{sourceVariable} or @code{derivedVariable} as its parent
1971 element, @code{extension} has the following attributes. A given
1972 parent element often contains several @code{extension} elements that
1973 specify the meaning of the source data's variables or sources, e.g.@:
1976 <extension from="0" helpId="corrected_model"/>
1977 <extension from="3" helpId="error"/>
1978 <extension from="4" helpId="total_9"/>
1979 <extension from="5" helpId="corrected_total"/>
1982 @defvr {Required} from
1983 An integer or a name like ``dimension0''.
1986 @defvr {Required} helpId
1990 @node SPV Detail graph Element
1991 @subsection The @code{graph} Element
1993 Parent: @code{visualization} @*
1994 Contents: @code{location}@math{+} @code{coordinates} @code{faceting} @code{facetLayout} @code{interval}
1996 @code{graph} has the following attributes.
1998 @defvr {Required} cellStyle
1999 @defvrx {Required} style
2000 Each of these is the @code{id} of a @code{style} element (@pxref{SPV
2001 Detail style Element}). The former is the default style for
2002 individual cells, the latter for the entire table.
2005 @node SPV Detail location Element
2006 @subsection The @code{location} Element
2008 Parent: @code{graph} @*
2011 Each instance of this element specifies where some part of the table
2012 frame is located. All the examples in the corpus have four instances
2013 of this element, one for each of the parts @code{height},
2014 @code{width}, @code{left}, and @code{top}. Some examples in the
2015 corpus add a fifth for part @code{bottom}, even though it is not clear
2016 how all of @code{top}, @code{bottom}, and @code{height} can be honored
2017 at the same time. In any case, @code{location} seems to have little
2018 importance in representing tables; a reader can safely ignore it.
2020 @defvr {Required} part
2021 One of @code{height}, @code{width}, @code{top}, @code{bottom}, or
2022 @code{left}. Presumably @code{right} is acceptable as well, but the
2023 corpus contains no examples.
2026 @defvr {Required} method
2027 How the location is determined:
2031 Based on the natural size of the table. Observed only for
2032 parts @code{height} and @code{width}.
2035 Based on the location specified in @code{target}. Observed only for
2036 parts @code{top} and @code{bottom}.
2039 Using the value in @code{value}. Observed only for parts @code{top},
2040 @code{bottom}, and @code{left}.
2043 Same as the specified @code{target}. Observed only for part
2048 @defvr {Optional} min
2049 Minimum size. Only observed with value @code{100pt}. Only observed
2050 for part @code{width}.
2053 @defvr {Dependent} target
2054 Required when @code{method} is @code{attach} or @code{same}, not
2055 observed otherwise. This is the ID of an element to attach to.
2056 Observed with the ID of @code{title}, @code{footnote}, @code{graph},
2060 @defvr {Dependent} value
2061 Required when @code{method} is @code{fixed}, not observed otherwise.
2062 Observed values are @code{0%}, @code{0px}, @code{1px}, and @code{3px}
2063 on parts @code{top} and @code{left}, and @code{100%} on part
2067 @node SPV Detail coordinates Element
2068 @subsection The @code{coordinates} Element
2070 Parent: @code{graph} @*
2073 This element is always present and always empty, with no attributes
2076 @node SPV Detail faceting Element
2077 @subsection The @code{faceting} Element
2079 Parent: @code{graph} @*
2080 Contents: @code{cross} @code{layer}*
2082 The @code{faceting} element describes the row, column, and layer
2083 structure of the table. Its @code{cross} child determines the row and
2084 column structure, and each @code{layer} child (if any) represents a
2087 @code{faceting} has no attributes (other than @code{id}).
2089 @subsubheading The @code{cross} Element
2091 Parent: @code{faceting} @*
2092 Contents: @code{nest} @code{nest}
2094 The @code{cross} element describes the row and column structure of the
2095 table. It has exactly two @code{nest} children, the first of which
2096 describes the table's rows and the second the table's columns.
2098 @code{cross} has no attributes (other than @code{id}).
2100 @subsubheading The @code{nest} Element
2102 Parent: @code{cross} @*
2103 Contents: @code{variableReference}@math{+}
2105 A given @code{nest} usually consists of one or more dimensions, each
2106 of which is represented by @code{variableReference} child elements.
2107 Minimally, a dimension has two @code{variableReference} children, one
2108 for the categories, one for the data, e.g.:
2112 <variableReference ref="dimension0categories"/>
2113 <variableReference ref="dimension0"/>
2118 Groups of categories introduce additional variable references, e.g.@:
2122 <variableReference ref="dimension0categories"/>
2123 <variableReference ref="dimension0group0"/>
2124 <variableReference ref="dimension0"/>
2129 Grouping can be hierarchical, e.g.@:
2133 <variableReference ref="dimension0categories"/>
2134 <variableReference ref="dimension0group1"/>
2135 <variableReference ref="dimension0group0"/>
2136 <variableReference ref="dimension0"/>
2141 XXX what are group maps?
2144 <nest id="nest_1973">
2145 <variableReference ref="dimension1categories"/>
2146 <variableReference ref="dimension1group1map"/>
2147 <variableReference ref="dimension1group0map"/>
2148 <variableReference ref="dimension1"/>
2151 <variableReference ref="dimension0categories"/>
2152 <variableReference ref="dimension0group0map"/>
2153 <variableReference ref="dimension0"/>
2158 A @code{nest} can contain multiple dimensions:
2162 <variableReference ref="dimension1categories"/>
2163 <variableReference ref="dimension1group0"/>
2164 <variableReference ref="dimension1"/>
2165 <variableReference ref="dimension0categories"/>
2166 <variableReference ref="dimension0"/>
2170 One @code{nest} within a given @code{cross} may have no dimensions, in
2171 which case it still has one @code{variableReference} child, which
2172 references a @code{derivedVariable} whose @code{value} attribute is
2173 @code{constant(0)}. In the corpus, such a @code{derivedVariable} has
2174 @code{row} or @code{column}, respectively, as its @code{id}.
2176 @code{nest} has no attributes (other than @code{id}).
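The following Python sketch groups a @code{nest}'s @code{variableReference} children into dimensions. It relies on an assumption, not a rule of the format: that each @code{ref} follows the corpus naming pattern, beginning with @code{dimension} followed by the dimension's number. A more robust reader would resolve each @code{ref} to its @code{sourceVariable} or @code{derivedVariable} instead. All names in the sketch are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Assumed naming pattern from the corpus: dimension<N>, dimension<N>categories,
# dimension<N>group<M>, dimension<N>group<M>map, ...
DIMENSION_RE = re.compile(r"^(dimension\d+)")


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def nest_dimensions(nest):
    """Group a nest's variableReference children into dimensions.

    Returns one list of ref values per dimension, in document order.
    References that do not follow the assumed pattern, such as the
    placeholder used by a nest with no dimensions, are skipped.
    """
    dimensions = []
    current = None
    for child in nest:
        if local_name(child.tag) != "variableReference":
            continue
        match = DIMENSION_RE.match(child.get("ref"))
        if match is None:
            continue
        if match.group(1) != current:
            current = match.group(1)
            dimensions.append([])
        dimensions[-1].append(child.get("ref"))
    return dimensions


NEST = """<nest>
  <variableReference ref="dimension1categories"/>
  <variableReference ref="dimension1group0"/>
  <variableReference ref="dimension1"/>
  <variableReference ref="dimension0categories"/>
  <variableReference ref="dimension0"/>
</nest>"""

print(len(nest_dimensions(ET.fromstring(NEST))))  # 2
```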
2178 @subsubheading The @code{variableReference} Element
2180 Parent: @code{nest} @*
2183 @code{variableReference} has one attribute.
2185 @defvr {Required} ref
2186 The @code{id} of a @code{sourceVariable} or @code{derivedVariable}
2190 @subsubheading The @code{layer} Element
2192 Parent: @code{faceting} @*
2195 Each layer is represented by a pair of @code{layer} elements. The
2196 first of this pair is for a category variable, the second for the data
2200 <layer value="0" variable="dimension0categories" visible="true"/>
2201 <layer value="dimension0" variable="dimension0" visible="false"/>
2205 @code{layer} has the following attributes.
2207 @defvr {Required} variable
2208 The @code{id} of a @code{sourceVariable} or @code{derivedVariable}
2212 @defvr {Required} value
2213 The value to select. For a category variable, this is always
2214 @code{0}; for a data variable, it is the same as the @code{variable}
2218 @defvr {Optional} visible
2219 Whether the layer is visible. Generally, category layers are visible
2220 and data layers are not, but sometimes this attribute is omitted.
2223 @defvr {Optional} method
2224 When present, this is always @code{nest}.
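A reader might pair up the @code{layer} elements as in the following Python sketch. It assumes, as in the corpus, that category and data layers alternate. It also checks that each data layer's @code{value} repeats its @code{variable}. The function names are hypothetical, and the @code{cross} contents are omitted from the sample:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def layer_selections(faceting):
    """Pair the layer children of a faceting element.

    Assumes (category, data) pairs, as in the corpus.  Returns one
    (category variable, selected value, visible) tuple per layer, where
    visible is None if the category layer omits the attribute.
    """
    layers = [e for e in faceting if local_name(e.tag) == "layer"]
    if len(layers) % 2 != 0:
        raise ValueError("layer elements should come in pairs")
    selections = []
    for category, data in zip(layers[0::2], layers[1::2]):
        # A data layer's value repeats its variable id.
        if data.get("value") != data.get("variable"):
            raise ValueError("unexpected data layer " + repr(data.attrib))
        visible = category.get("visible")
        selections.append((category.get("variable"), category.get("value"),
                           None if visible is None else visible == "true"))
    return selections


FACETING = """<faceting>
  <cross/>
  <layer value="0" variable="dimension0categories" visible="true"/>
  <layer value="dimension0" variable="dimension0" visible="false"/>
</faceting>"""

print(layer_selections(ET.fromstring(FACETING)))
```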
2227 @node SPV Detail facetLayout Element
2228 @subsection The @code{facetLayout} Element
2230 Parent: @code{graph} @*
2231 Contents: @code{tableLayout} @code{facetLevel}@math{+} @code{setCellProperties}*
2233 @subsubheading The @code{tableLayout} Element
2235 Parent: @code{facetLayout} @*
2238 @defvr {Required} verticalTitlesInCorner
2239 Always set to @code{true}.
2242 @defvr {Optional} style
2243 The @code{id} of a @code{style} element.
2246 @defvr {Optional} fitCells
2247 Always set to @code{ticks}.
2250 @subsubheading The @code{facetLevel} Element
2252 Parent: @code{facetLayout} @*
2253 Contents: @code{axis}
2255 Each @code{facetLevel} describes a @code{variableReference} or
2256 @code{layer}, and a table has one @code{facetLevel} element for
2257 each such element. For example, an SPV detail member that contains
2258 four @code{variableReference} elements and two @code{layer} elements
2259 will contain six @code{facetLevel} elements.
2261 In the corpus, @code{facetLevel} elements and the elements that they
2262 describe are always in the same order. The correspondence may also be
2263 observed in two other ways. First, one may use the @code{level}
2264 attribute, described below. Second, in the corpus, a
2265 @code{facetLevel} always has an @code{id} that is the same as the
2266 @code{id} of the element it describes with @code{_facetLevel}
2267 appended. One should not formally rely on this, of course, but it is
2268 usefully indicative.
2270 @defvr {Required} level
2271 A 1-based index into the @code{variableReference} and @code{layer}
2272 elements, e.g.@: a @code{facetLevel} with a @code{level} of 1
2273 describes the first @code{variableReference} in the SPV detail member,
2274 and in a member with four @code{variableReference} elements, a
2275 @code{facetLevel} with a @code{level} of 5 describes the first
2276 @code{layer} in the member.
2279 @defvr {Required} gap
2280 Always observed as @code{0pt}.
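Resolving @code{level} only requires listing the described elements in order. The following Python sketch uses hypothetical names and the standard @code{xml.etree.ElementTree} module. It counts every @code{variableReference} in the detail member in document order, followed by every @code{layer}, which matches the corpus but is not guaranteed by any specification. The sample member is hypothetical and heavily trimmed:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def facet_level_targets(root):
    """Map each facetLevel's 1-based level to the element it describes.

    The described elements are all variableReference elements in document
    order, followed by all layer elements, as observed in the corpus.
    """
    elements = list(root.iter())
    described = ([e for e in elements if local_name(e.tag) == "variableReference"]
                 + [e for e in elements if local_name(e.tag) == "layer"])
    targets = dict()
    for e in elements:
        if local_name(e.tag) == "facetLevel":
            level = int(e.get("level"))
            targets[level] = described[level - 1]
    return targets


# Hypothetical, trimmed detail member: four variableReference elements and
# two layer elements (a real member has one facetLevel for each of them).
DETAIL = """<visualization>
  <faceting>
    <cross>
      <nest>
        <variableReference ref="dimension0categories"/>
        <variableReference ref="dimension0"/>
      </nest>
      <nest>
        <variableReference ref="dimension1categories"/>
        <variableReference ref="dimension1"/>
      </nest>
    </cross>
    <layer value="0" variable="dimension2categories" visible="true"/>
    <layer value="dimension2" variable="dimension2" visible="false"/>
  </faceting>
  <facetLayout>
    <facetLevel level="1" gap="0pt"/>
    <facetLevel level="5" gap="0pt"/>
  </facetLayout>
</visualization>"""

targets = facet_level_targets(ET.fromstring(DETAIL))
print(targets[5].get("variable"))  # dimension2categories
```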
2283 @subsubheading The @code{axis} Element
2285 Parent: @code{facetLevel} @*
2286 Contents: @code{label}? @code{majorTicks}
2288 @defvr {Attribute} style
2289 The @code{id} of a @code{style} element.
2292 @subsubheading The @code{label} Element
2294 Parent: @code{axis} or @code{labelFrame} @*
2295 Contents: @code{text}@math{+} @math{|} @code{descriptionGroup}
2297 This element represents a label on some aspect of the table. For example,
2298 the table's title is a @code{label}.
2300 The contents of the label can be one or more @code{text} elements or a
2301 @code{descriptionGroup}.
2303 @defvr {Attribute} style
2304 @defvrx {Optional} textFrameStyle
2305 Each of these is the @code{id} of a @code{style} element.
2306 @code{style} is the style of the label text, @code{textFrameStyle} the
2307 style for the frame around the label.
2310 @defvr {Optional} purpose
2311 The kind of entity being labeled, one of @code{title},
2312 @code{subTitle}, @code{layer}, or @code{footnote}.
2315 @subsubheading The @code{descriptionGroup} Element
2317 Parent: @code{label} @*
2318 Contents: (@code{description} @math{|} @code{text})@math{+}
2320 A @code{descriptionGroup} concatenates one or more elements to form a
2321 label. Each element can be a @code{text} element, which contains
2322 literal text, or a @code{description} element that substitutes a value
2325 @defvr {Attribute} target
2326 The @code{id} of an element being described. In the corpus, this is
2327 always @code{faceting}.
2330 @defvr {Attribute} separator
2331 A string to separate the description of multiple groups, if the
2332 @code{target} has more than one. In the corpus, this is always a
2336 Typical contents for a @code{descriptionGroup} are a value by itself:
2338 <description name="value"/>
2340 @noindent or a variable and its value, separated by a colon:
2342 <description name="variable"/><text>:</text><description name="value"/>
2345 @subsubheading The @code{description} Element
2347 Parent: @code{descriptionGroup} @*
2350 A @code{description} is like a macro that expands to some property of
2351 the target of its parent @code{descriptionGroup}.
2353 @defvr {Attribute} name
2354 The name of the property. Only @code{variable} and @code{value}
2355 appear in the corpus.
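The following Python sketch illustrates this macro-like behavior by concatenating a @code{descriptionGroup}'s children for a single target. How a reader obtains the variable name and value strings is outside the sketch, so they are passed in as the hypothetical @code{properties} argument. The @code{separator} used for multiple groups is not handled:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def expand_description_group(group, properties):
    """Build label text from a descriptionGroup for one target.

    text children contribute their literal text; description children are
    replaced by properties[name], where name is "variable" or "value".
    """
    pieces = []
    for child in group:
        kind = local_name(child.tag)
        if kind == "text":
            pieces.append(child.text or "")
        elif kind == "description":
            pieces.append(properties[child.get("name")])
    return "".join(pieces)


GROUP = ('<descriptionGroup target="faceting">'
         '<description name="variable"/><text>:</text>'
         '<description name="value"/></descriptionGroup>')

# Hypothetical property strings for one layer.
label = expand_description_group(ET.fromstring(GROUP),
                                 dict(variable="Sex", value="Female"))
print(label)  # Sex:Female
```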
2358 @subsubheading The @code{majorTicks} Element
2360 Parent: @code{axis} @*
2361 Contents: @code{gridline}?
2363 @defvr {Attribute} labelAngle
2364 @defvrx {Attribute} length
2365 Both are always set to @code{0}.
2368 @defvr {Attribute} style
2369 @defvrx {Attribute} tickFrameStyle
2370 Each of these is the @code{id} of a @code{style} element.
2371 @code{style} is the style of the tick labels, @code{tickFrameStyle}
2372 the style for the frames around the labels.
2375 @subsubheading The @code{gridline} Element
2377 Parent: @code{majorTicks} @*
2380 Represents ``gridlines,'' which for a table represents the lines
2381 between the rows or columns of a table (XXX?).
2383 @defvr {Attribute} style
2384 The style for the gridline.
2387 @defvr {Attribute} zOrder
2388 Observed as a number between 28 and 31. Does not seem to be
2392 @subsubheading The @code{setCellProperties} Element
2394 Parent: @code{facetLayout} @*
2395 Contents: @code{setMetaData} @code{setStyle}* @code{setFormat}@math{+} @code{union}?
2397 This element sets style properties of cells designated by the
2398 @code{target} attribute of its child elements, as further restricted
2399 by the optional @code{union} element if present. The @code{target}
2400 values often used, e.g.@: @code{graph} or @code{labeling}, actually
2401 affect every cell, so the @code{union} element is a useful
2404 @defvr {Optional} applyToConverse
2405 If present, always @code{true}. This appears to invert the meaning of
2406 the @code{target} of sub-elements: the selected cells are the ones
2407 @emph{not} designated by @code{target}. This is confusing, given the
2408 additional restrictions of @code{union}, but in the corpus
2409 @code{applyToConverse} is never present along with @code{union}.
2412 @subsubheading The @code{setMetaData} Element
2414 Parent: @code{setCellProperties} @*
2417 This element is not known to have any visible effect.
2419 @defvr {Required} target
2420 The @code{id} of an element whose metadata is to be set. In the
2421 corpus, this is always @code{graph}, the @code{id} used for the
2422 @code{graph} element.
2425 @defvr {Required} key
2426 @defvrx {Required} value
2427 A key-value pair to set for the target.
2429 In the corpus, @code{key} is @code{cellPropId} or, rarely,
2430 @code{diagProps}, and @code{value} is always the @code{id} of the
2431 parent @code{setCellProperties}.
2434 @subsubheading The @code{setStyle} Element
2436 Parent: @code{setCellProperties} @*
2439 This element associates a style with the target.
2441 @defvr {Required} target
2442 The @code{id} of an element whose style is to be set. In the corpus,
2443 this is always the @code{id} of an @code{interval}, @code{labeling},
2444 or, rarely, @code{graph} element.
2447 @defvr {Required} style
2448 The @code{id} of a @code{style} element that identifies the style to
2452 @subsubheading The @code{setFormat} Element
2455 Parent: @code{setCellProperties}
2458 @math{|} @code{numberFormat}
2459 @math{|} @code{stringFormat}@math{+}
2460 @math{|} @code{dateTimeFormat}
2463 This element sets the format of the target, ``format'' in this case
2464 meaning the SPSS print format for a variable.
2466 The details of this element vary depending on the schema version, as
2467 declared in the root @code{visualization} element's @code{version}
2468 attribute (@pxref{SPV Detail visualization Element}). In version 2.5
2469 and earlier, @code{setFormat} contains one of a number of child
2470 elements that correspond to the different varieties of print formats.
2471 In version 2.7 and later, @code{setFormat} instead always contains a
2472 @code{format} element.
2474 XXX reinvestigate the above claim about versions: it appears to be
2477 The @code{setFormat} element itself has the following attributes.
2479 @defvr {Required} target
2480 The @code{id} of an element whose format is to be set. In the corpus,
2481 this is always the @code{id} of a @code{majorTicks} or
2482 @code{labeling} element.
2485 @defvr {Optional} reset
2486 If this is @code{true}, this format overrides the target's previous
2487 format. If it is @code{false}, this format adds to the previous format. In
2488 the corpus this is always @code{true}. The default behavior is
2493 * SPV Detail format Element::
2494 * SPV Detail numberFormat Element::
2495 * SPV Detail stringFormat Element::
2496 * SPV Detail dateTimeFormat Element::
2497 * SPV Detail affix Element::
2498 * SPV Detail relabel Element::
2499 * SPV Detail union Element::
2502 @node SPV Detail format Element
2503 @subsubsection The @code{format} Element
2505 Parent: @code{sourceVariable}, @code{derivedVariable}, @code{formatMapping}, @code{labeling}, @code{setFormat} @*
2506 Contents: (@code{affix}@math{+} @math{|} @code{relabel}@math{+})?
2508 This element appears only in schema version 2.7 (@pxref{SPV Detail
2509 visualization Element}).
2511 This element determines a format, equivalent to an SPSS print format.
2513 @subsubheading Attributes for All Formats
2515 These attributes apply to all kinds of formats. The most important of
2516 these attributes determines the high-level kind of formatting in use:
2518 @defvr {Optional} baseFormat
2519 Either @code{dateTime} or @code{elapsedTime}. When this attribute is
2520 omitted, this element is a numeric or string format.
2524 Whether, in the corpus, other attributes are always present (``yes''),
2525 never present (``no''), or sometimes present (``opt'') depends on
2528 @multitable {maximumFractionDigits} {@code{dateTime}} {@code{elapsedTime}} {number} {string}
2529 @headitem Attribute @tab @code{dateTime} @tab @code{elapsedTime} @tab number @tab string
2530 @item errorCharacter @tab yes @tab yes @tab yes @tab opt
2532 @item separatorChars @tab yes @tab no @tab no @tab no
2534 @item mdyOrder @tab yes @tab no @tab no @tab no
2536 @item showYear @tab yes @tab no @tab no @tab no
2537 @item yearAbbreviation @tab yes @tab no @tab no @tab no
2539 @item showMonth @tab yes @tab no @tab no @tab no
2540 @item monthFormat @tab yes @tab no @tab no @tab no
2542 @item showDay @tab yes @tab opt @tab no @tab no
2543 @item dayPadding @tab yes @tab opt @tab no @tab no
2544 @item dayOfMonthPadding @tab yes @tab no @tab no @tab no
2545 @item dayType @tab yes @tab no @tab no @tab no
2547 @item showHour @tab yes @tab opt @tab no @tab no
2548 @item hourFormat @tab yes @tab opt @tab no @tab no
2549 @item hourPadding @tab yes @tab yes @tab no @tab no
2551 @item showMinute @tab yes @tab yes @tab no @tab no
2552 @item minutePadding @tab yes @tab yes @tab no @tab no
2554 @item showSecond @tab yes @tab yes @tab no @tab no
2555 @item secondPadding @tab no @tab yes @tab no @tab no
2557 @item showMillis @tab no @tab yes @tab no @tab no
2559 @item minimumIntegerDigits @tab no @tab no @tab yes @tab no
2560 @item maximumFractionDigits @tab no @tab yes @tab yes @tab no
2561 @item minimumFractionDigits @tab no @tab yes @tab yes @tab no
2562 @item useGrouping @tab no @tab opt @tab yes @tab no
2563 @item scientific @tab no @tab no @tab yes @tab no
2564 @item small @tab no @tab no @tab opt @tab no
2565 @item suffix @tab no @tab no @tab opt @tab no
2567 @item tryStringsAsNumbers @tab no @tab no @tab no @tab yes
2571 @defvr {Attribute} errorCharacter
2572 A character that replaces the formatted value when it cannot otherwise
2573 be represented in the given format. Always @samp{*}.
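Using @code{baseFormat} and the table above, a reader could classify a @code{format} element with a function like the following Python sketch. Telling numeric and string formats apart by the presence of @code{tryStringsAsNumbers} is a heuristic drawn from the corpus observations in the table, not a documented rule. The function name is hypothetical:

```python
def format_kind(attributes):
    """Guess the kind of a format element from its attribute dict.

    baseFormat identifies dateTime and elapsedTime formats.  Without it,
    the element is numeric or string; this heuristic relies on the corpus
    observation that only string formats carry tryStringsAsNumbers.
    """
    base = attributes.get("baseFormat")
    if base in ("dateTime", "elapsedTime"):
        return base
    if base is not None:
        raise ValueError("unexpected baseFormat: " + base)
    return "string" if "tryStringsAsNumbers" in attributes else "number"


print(format_kind(dict(errorCharacter="*", minimumIntegerDigits="0")))  # number
```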
2576 @subsubheading Date and Time Attributes
2578 These attributes are used with @code{dateTime} and @code{elapsedTime}
2581 @defvr {Attribute} separatorChars
2582 Exactly four characters. In order, these are the decimal point,
2583 grouping, date separator, and time separator characters. Always @samp{.,-:}.
2586 @defvr {Attribute} mdyOrder
2587 Within a date, the order of the days, months, and years.
2588 @code{dayMonthYear} is the only observed value, but one would expect
2589 @code{monthDayYear} and @code{yearMonthDay} to be reasonable as
2593 @defvr {Attribute} showYear
2594 @defvrx {Attribute} yearAbbreviation
2595 Whether to include the year and, if so, whether the year should be
2596 shown abbreviated, that is, with only 2 digits. Each is @code{true}
2597 or @code{false}; only values of @code{true} and @code{false},
2598 respectively, have been observed.
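The following Python sketch shows how @code{separatorChars}, @code{mdyOrder}, @code{showYear}, and @code{yearAbbreviation} might combine when formatting a date. It is only illustrative. It always writes the month as a two-digit number and pads the day to two digits. Both are assumptions, because the observed @code{monthFormat} and the padding attributes are not modeled. The function names are hypothetical:

```python
from datetime import date

# Field order for each mdyOrder value.  Only dayMonthYear is observed in
# the corpus; the other two are the plausible alternatives named above.
FIELD_ORDERS = dict(
    dayMonthYear=("day", "month", "year"),
    monthDayYear=("month", "day", "year"),
    yearMonthDay=("year", "month", "day"),
)


def split_separator_chars(chars):
    """Split separatorChars into (decimal, grouping, date, time)."""
    if len(chars) != 4:
        raise ValueError("separatorChars must be exactly four characters")
    return tuple(chars)


def format_date(d, separator_chars=".,-:", mdy_order="dayMonthYear",
                show_year=True, year_abbreviation=False):
    """Format a date with numeric, zero-padded day and month (assumed)."""
    date_sep = split_separator_chars(separator_chars)[2]
    if year_abbreviation:
        year = "%02d" % (d.year % 100)  # only the last two digits
    else:
        year = "%d" % d.year
    fields = dict(day="%02d" % d.day, month="%02d" % d.month, year=year)
    order = [f for f in FIELD_ORDERS[mdy_order] if f != "year" or show_year]
    return date_sep.join(fields[f] for f in order)


print(format_date(date(2024, 9, 5)))                          # 05-09-2024
print(format_date(date(2024, 9, 5), year_abbreviation=True))  # 05-09-24
```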
2601 @defvr {Attribute} showMonth
2602 @defvrx {Attribute} monthFormat
2603 Whether to include the month (@code{true} or @code{false}) and, if so,
2604 how to format it. @code{monthFormat} is one of the following:
2608 The full name of the month, e.g.@: in an English locale,
2612 The abbreviated name of the month, e.g.@: in an English locale,
2616 The number representing the month, e.g.@: 9 for September.
2619 A two-digit number representing the month, e.g.@: 09 for September.
2622 Only values of @code{true} and @code{short}, respectively, have been
2626 @defvr {Attribute} dayPadding
2627 @defvrx {Attribute} dayOfMonthPadding
2628 @defvrx {Attribute} hourPadding
2629 @defvrx {Attribute} minutePadding
2630 @defvrx {Attribute} secondPadding
2631 These attributes presumably control whether each field in the output
2632 is padded with spaces to its maximum width, but the details are not
2633 understood. The only observed value for any of these attributes is
2637 @defvr {Attribute} showDay
2638 @defvrx {Attribute} showHour
2639 @defvrx {Attribute} showMinute
2640 @defvrx {Attribute} showSecond
2641 @defvrx {Attribute} showMillis
2642 These attributes presumably control whether each field is displayed
2643 in the output, but the details are not understood. The only
2644 observed value for any of these attributes is @code{true}.
2647 @defvr {Attribute} dayType
2648 This attribute is always @code{month} in the corpus, specifying that
2649 the day of the month is to be displayed; a value of @code{year} is
2650 supposed to indicate that the day of the year, where 1 is January 1,
2651 is to be displayed instead.
2654 @defvr {Attribute} hourFormat
2655 @code{hourFormat}, if present, is one of:
2659 The time is displayed with an @code{am} or @code{pm} suffix, e.g.@:
2663 The time is displayed in a 24-hour format, e.g.@: @code{22:15}.
2665 This is the only value observed in the corpus.
2668 The time is displayed in a 12-hour format, without distinguishing
2669 morning or evening, e.g.@: @code{10:15}.
2672 @code{hourFormat} is sometimes present for @code{elapsedTime} formats,
2673 which is confusing since a time duration does not have a concept of AM
2674 or PM. This might indicate a bug in the code that generated the XML
2675 in the corpus, or it might indicate that @code{elapsedTime} is
2676 sometimes used to format a time of day.
2679 @subsubheading Numeric Attributes
2681 These attributes are used for formats when @code{baseFormat} is
2682 @code{number}. Attributes @code{maximumFractionDigits},
2683 @code{minimumFractionDigits}, and @code{useGrouping} are also used
2684 when @code{baseFormat} is @code{elapsedTime}.
2686 @defvr {Attribute} minimumIntegerDigits
2687 Minimum number of digits to display before the decimal point. Always
2688 observed as @code{0}.
2691 @defvr {Attribute} maximumFractionDigits
2692 @defvrx {Attribute} minimumFractionDigits
2693 Maximum or minimum, respectively, number of digits to display after
2694 the decimal point. The observed values of each attribute range from 0
2698 @defvr {Attribute} useGrouping
2699 Whether to use the grouping character to group digits in large
2700 numbers. It would make sense for the grouping character to come from
2701 the @code{separatorChars} attribute, but that attribute is only
2702 present when @code{baseFormat} is @code{dateTime} or
2703 @code{elapsedTime}, in the corpus at least. Perhaps that is because
2704 this attribute has only been observed as @code{false}.
2707 @defvr {Attribute} scientific
2708 This attribute controls when and whether the number is formatted in
2709 scientific notation. It takes the following values:
2713 Use scientific notation only when the number's magnitude is smaller
2714 than the value of the @code{small} attribute.
2717 Use scientific notation when the number will not otherwise fit in the
2721 Always use scientific notation. Not observed in the corpus.
2724 Never use scientific notation. A number that won't otherwise fit will
2725 be replaced by an error indication (see the @code{errorCharacter}
2726 attribute). Not observed in the corpus.
2730 @defvr {Optional} small
2731 Only present when the @code{scientific} attribute is
2732 @code{onlyForSmall}, this is a numeric magnitude below which the
2733 number will be formatted in scientific notation. The values @code{0}
2734 and @code{0.0001} have been observed. The value @code{0} seems like a
2735 pathological choice, since no real number has a magnitude less than 0;
2736 perhaps in practice such a choice is equivalent to setting
2737 @code{scientific} to @code{false}.
2740 @defvr {Optional} prefix
2741 @defvrx {Optional} suffix
2742 Specifies a prefix or a suffix to apply to the formatted number. Only
2743 @code{suffix} has been observed, with value @samp{%}.
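The following Python sketch approximates the numeric attributes above for the @code{onlyForSmall} case of @code{scientific}. Its behavior is an assumption modeled on the descriptions in this section, not on SPSS itself. It always shows at least one integer digit (ignoring @code{minimumIntegerDigits}), never applies grouping, uses @samp{.} as the decimal point, and does not treat zero as a small number. The function name and parameters are hypothetical:

```python
def format_number(x, maximum_fraction_digits=2, minimum_fraction_digits=0,
                  small=None, suffix=""):
    """Approximate a numeric format element (hypothetical sketch).

    small corresponds to scientific="onlyForSmall": nonzero numbers whose
    magnitude is below small use scientific notation.  Otherwise x is
    rounded to maximum_fraction_digits and trailing zeros are dropped down
    to minimum_fraction_digits.  No grouping is applied, "." is the
    decimal point, and at least one integer digit is always shown.
    """
    if small is not None and x != 0 and abs(x) < small:
        return "%.*E" % (maximum_fraction_digits, x) + suffix
    text = "%.*f" % (maximum_fraction_digits, x)
    if "." in text:
        integer, fraction = text.split(".")
        fraction = fraction.rstrip("0").ljust(minimum_fraction_digits, "0")
        text = integer + "." + fraction if fraction else integer
    return text + suffix


print(format_number(0.5))                     # 0.5
print(format_number(0.00003, small=0.0001))   # 3.00E-05
print(format_number(12.5, 1, 1, suffix="%"))  # 12.5%
```

With @code{small} set to 0, the sketch never switches to scientific notation. This is consistent with the guess above that such a setting behaves like setting @code{scientific} to @code{false}.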
2746 @subsubheading String Attributes
2748 These attributes are used for formats when @code{baseFormat} is
2751 @defvr {Attribute} tryStringsAsNumbers
2752 When this is @code{true}, it is supposed to indicate that string
2753 values should be parsed as numbers and then displayed according to
2754 numeric formatting rules. However, in the corpus it is always
2758 @node SPV Detail numberFormat Element
2759 @subsubsection The @code{numberFormat} Element
2761 Parent: @code{setFormat} @*
2762 Contents: @code{affix}@math{+}
2764 This element appears only in schema version 2.5 and earlier
2765 (@pxref{SPV Detail visualization Element}). Possibly this element
2766 could also contain @code{relabel} elements in a more diverse corpus.
2768 This element has the following attributes.
2770 @defvr {Attribute} maximumFractionDigits
2771 @defvrx {Attribute} minimumFractionDigits
2772 @defvrx {Attribute} minimumIntegerDigits
2773 @defvrx {Optional} scientific
2774 @defvrx {Optional} small
2775 @defvrx {Optional} suffix
2776 @defvrx {Optional} useGrouping
2777 The syntax and meaning of these attributes are the same as on the
2778 @code{format} element for a numeric format. @xref{SPV Detail format
2782 @node SPV Detail stringFormat Element
2783 @subsubsection The @code{stringFormat} Element
2785 Parent: @code{setFormat} @*
2786 Contents: (@code{affix}@math{+} @math{|} @code{relabel}@math{+})?
2788 This element appears only in schema version 2.5 and earlier
2789 (@pxref{SPV Detail visualization Element}).
2791 This element has no attributes.
2793 @node SPV Detail dateTimeFormat Element
2794 @subsubsection The @code{dateTimeFormat} Element
2796 Parent: @code{setFormat} @*
2799 This element appears only in schema version 2.5 and earlier
2800 (@pxref{SPV Detail visualization Element}). Possibly this element
2801 could also contain @code{affix} and @code{relabel} elements in a more
2804 The following attribute is required.
2806 @defvr {Attribute} baseFormat
2807 Either @code{dateTime} or @code{time}.
2810 When @code{baseFormat} is @code{dateTime}, the following attributes
2813 @defvr {Attribute} dayOfMonthPadding
2814 @defvrx {Attribute} dayPadding
2815 @defvrx {Attribute} dayType
2816 @defvrx {Attribute} hourFormat
2817 @defvrx {Attribute} hourPadding
2818 @defvrx {Attribute} mdyOrder
2819 @defvrx {Attribute} minutePadding
2820 @defvrx {Attribute} monthFormat
2821 @defvrx {Attribute} separatorChars
2822 @defvrx {Attribute} showDay
2823 @defvrx {Attribute} showHour
2824 @defvrx {Attribute} showMinute
2825 @defvrx {Attribute} showMonth
2826 @defvrx {Attribute} showSecond
2827 @defvrx {Attribute} showYear
2828 @defvrx {Attribute} yearAbbreviation
2829 The syntax and meaning of these attributes are the same as on the
2830 @code{format} element when that element's @code{baseFormat} is
2831 @code{dateTime}. @xref{SPV Detail format Element}.
2834 When @code{baseFormat} is @code{time}, the following attributes are
2837 @defvr {Attribute} hourFormat
2838 @defvrx {Attribute} hourPadding
2839 @defvrx {Attribute} minutePadding
2840 @defvrx {Attribute} monthFormat
2841 @defvrx {Attribute} separatorChars
2842 @defvrx {Attribute} showDay
2843 @defvrx {Attribute} showHour
2844 @defvrx {Attribute} showMinute
2845 @defvrx {Attribute} showMonth
2846 @defvrx {Attribute} showSecond
2847 @defvrx {Attribute} showYear
2848 @defvrx {Attribute} yearAbbreviation
2849 The syntax and meaning of these attributes are the same as on the
2850 @code{format} element when that element's @code{baseFormat} is
2851 @code{elapsedTime}. @xref{SPV Detail format Element}.
2854 @node SPV Detail affix Element
2855 @subsubsection The @code{affix} Element
2857 Parent: @code{format} or @code{numberFormat} or @code{stringFormat} @*
2860 Possibly this element could have @code{dateTimeFormat} as a parent in
2861 a more diverse corpus.
2863 This defines a suffix (or, theoretically, a prefix) for a formatted
2864 value. It is used to insert a reference to a footnote. It has the
2865 following attributes:
2867 @defvr {Attribute} definesReference
2868 This specifies the footnote number as a natural number: 1 for the
2869 first footnote, 2 for the second, and so on.
2872 @defvr {Attribute} position
2873 Position for the footnote label. Always @code{superscript}.
2876 @defvr {Attribute} suffix
2877 Whether the affix is a suffix (@code{true}) or a prefix
2878 (@code{false}). Always @code{true}.
2881 @defvr {Attribute} value
2882 The text of the suffix or prefix. Typically a letter, e.g.@: @code{a}
2883 for footnote 1, @code{b} for footnote 2, @enddots{} The corpus
2884 contains other values: @code{*}, @code{**}, and a few that begin with
2885 at least one comma: @code{,b}, @code{,c}, @code{,,b}, and @code{,,c}.
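A reader could gather a value's @code{affix} elements with a function like the following Python sketch. It takes each element's attributes as a dictionary, for example @code{Element.attrib} from @code{xml.etree.ElementTree}. It orders markers by @code{definesReference} and returns them separately, leaving superscript rendering to the caller. The function name, the ordering, and treating a missing @code{suffix} attribute as @code{true} are all assumptions:

```python
def collect_affixes(text, affixes):
    """Return (text, prefixes, suffixes) for a value and its affix elements.

    affixes holds each affix element's attributes as a dict, for example
    Element.attrib from xml.etree.ElementTree.  Markers are ordered by
    definesReference; a missing suffix attribute is treated as "true".
    """
    ordered = sorted(affixes, key=lambda a: int(a["definesReference"]))
    prefixes = [a["value"] for a in ordered if a.get("suffix") == "false"]
    suffixes = [a["value"] for a in ordered if a.get("suffix") != "false"]
    return text, prefixes, suffixes


AFFIXES = [
    dict(definesReference="2", position="superscript", suffix="true", value="b"),
    dict(definesReference="1", position="superscript", suffix="true", value="a"),
]
print(collect_affixes("12.5", AFFIXES))  # ('12.5', [], ['a', 'b'])
```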
2888 @node SPV Detail relabel Element
2889 @subsubsection The @code{relabel} Element
2891 Parent: @code{format} or @code{stringFormat} @*
2894 Possibly this element could have @code{numberFormat} or
2895 @code{dateTimeFormat} as a parent in a more diverse corpus.
2897 This specifies how to display a given value. It is used to implement
2898 value labels and to display the system-missing value in a
2899 human-readable way. It has the following attributes:
2901 @defvr {Attribute} from
2902 The value to map. In the corpus this is an integer or the
2903 system-missing value @code{-1.797693134862316E300}.
2906 @defvr {Attribute} to
2907 The string to display in place of the value of @code{from}. In the
2908 corpus this is a wide variety of value labels; the system-missing
2909 value is mapped to @samp{.}.
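The following Python sketch shows one way to apply @code{relabel} elements. It converts each @code{from} value with @code{float} and looks up cell values by exact equality. That is an assumption. In particular, it is not established here whether a cell's system-missing value compares equal to the string written in the corpus, so a real reader might need to special-case it. The names are hypothetical:

```python
# The system-missing value as it is written in relabel elements in the corpus.
SYSMIS_TEXT = "-1.797693134862316E300"


def relabel_map(relabels):
    """Build a number -> display string mapping from (from, to) pairs."""
    return dict((float(source), label) for source, label in relabels)


def display_value(value, labels, fallback=str):
    """Return the relabeled string for value, or fallback(value)."""
    return labels.get(value, fallback(value))


labels = relabel_map([("1", "Male"), ("2", "Female"), (SYSMIS_TEXT, ".")])
print(display_value(2, labels))  # Female
```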
2912 @node SPV Detail union Element
2913 @subsubsection The @code{union} Element
2915 Parent: @code{setCellProperties} @*
2916 Contents: @code{intersect}@math{+}
2918 This element represents a set of cells, computed as the union of the
2919 sets represented by each of its children.
2921 @subsubheading The @code{intersect} Element
2923 Parent: @code{union} @*
2924 Contents: @code{where}@math{+} @math{|} @code{intersectWhere}?
2926 This element represents a set of cells, computed as the intersection
2927 of the sets represented by each of its children.
2929 Of the two possible children, in the corpus @code{where} is far more
2930 common, appearing thousands of times, whereas @code{intersectWhere}
2931 only appears 4 times.
2933 Most @code{intersect} elements have two or more children.
2935 @subsubheading The @code{where} Element
2937 Parent: @code{intersect} @*
2940 This element represents the set of cells in which the value of a
2941 specified variable falls within a specified set.
2943 @defvr {Attribute} variable
2944 The @code{id} of a variable, e.g.@: @code{dimension0categories} or
2945 @code{dimension0group0map}.
2948 @defvr {Attribute} include
2949 A value, or multiple values separated by semicolons,
2950 e.g.@: @code{0} or @code{13;14;15;16}.
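The following Python sketch combines @code{union}, @code{intersect}, and @code{where} to test whether a single cell is selected. It represents a cell as a hypothetical dictionary from variable @code{id}s to the cell's value strings. It refuses @code{intersectWhere} because that element's meaning is unknown. The function names and the sample @code{union} are hypothetical:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def where_matches(where, cell):
    # include holds one value or several separated by semicolons.
    return cell.get(where.get("variable")) in where.get("include").split(";")


def cell_selected(union, cell):
    """True if the cell is in the set a union element describes.

    cell maps variable ids to the cell's value (as a string) for that
    variable.  A union matches if any intersect child matches, and an
    intersect matches if all of its where children match.
    """
    for intersect in union:
        if local_name(intersect.tag) != "intersect":
            continue
        clauses = list(intersect)
        if any(local_name(c.tag) == "intersectWhere" for c in clauses):
            raise NotImplementedError("intersectWhere is not understood")
        if all(where_matches(c, cell) for c in clauses
               if local_name(c.tag) == "where"):
            return True
    return False


UNION = ET.fromstring("""<union>
  <intersect>
    <where variable="dimension0categories" include="0;1"/>
    <where variable="dimension1categories" include="2"/>
  </intersect>
  <intersect>
    <where variable="dimension0categories" include="13;14;15;16"/>
  </intersect>
</union>""")

print(cell_selected(UNION, dict(dimension0categories="14")))  # True
```

The sketch does not handle @code{applyToConverse}. Since the corpus never combines that attribute with @code{union}, a reader could presumably handle it separately by inverting the selection made by the sub-elements' @code{target}.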
2953 @subsubheading The @code{intersectWhere} Element
2955 Parent: @code{intersect} @*
2958 The meaning of this element is unknown.
2960 @defvr {Attribute} variable
2961 @defvrx {Attribute} variable2
2962 The meaning of these attributes is unknown. In the four examples in
2963 the corpus they always take the values @code{dimension2categories} and
2964 @code{dimension0categories}, respectively.
2967 @node SPV Detail style Element
2968 @subsection The @code{style} Element