1 @node SPSS Viewer File Format
2 @chapter SPSS Viewer File Format
4 SPSS Viewer or @file{.spv} files, here called SPV files, are written
5 by SPSS 16 and later to represent the contents of its output editor.
6 This chapter documents the format, based on examination of a corpus of
7 about 500 files from a variety of sources. This description is
8 detailed enough to read SPV files, but probably not enough to write
11 SPSS 15 and earlier versions use a completely different output format
12 based on the Microsoft Compound Document Format. This format is not
15 An SPV file is a Zip archive that can be read with @command{zipinfo}
16 and @command{unzip} and similar programs. The final member in the Zip
17 archive is a file named @file{META-INF/MANIFEST.MF}. This structure
18 makes SPV files resemble Java ``JAR'' files (and ODF files), but
19 whereas a JAR manifest contains a sequence of colon-delimited
20 key/value pairs, an SPV manifest contains the string
21 @samp{allowPivoting=true}, without a new-line. (This string may be
22 the best way to identify an SPV file; it is invariant across the
25 The rest of the members in an SPV file's Zip archive fall into two
26 categories: @dfn{structure} and @dfn{detail} members. Structure
27 member names begin with @file{outputViewer@var{nnnnnnnnnn}}, where
28 each @var{n} is a decimal digit, and end with @file{.xml}, and often
29 include the string @file{_heading} in between. Each of these members
30 represents some kind of output item (a table, a heading, a block of
31 text, etc.) or a group of them. The member whose output goes at the
32 beginning of the document is numbered 0, the next member in the output
33 is numbered 1, and so on.
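
As a rough illustration of the layout described above, the following Python sketch checks the manifest string and lists structure members in output order. The function names, the tolerance for surrounding whitespace in the manifest, and the demo archive are assumptions made for this example; they are not part of the format.

```python
import io
import re
import zipfile

MANIFEST = "META-INF/MANIFEST.MF"
# Structure members: "outputViewer", decimal digits, optional text, ".xml".
STRUCTURE_RE = re.compile(r"^outputViewer(\d+).*\.xml$")


def looks_like_spv(zf):
    """Return True if the archive's manifest holds the SPV marker string."""
    try:
        data = zf.read(MANIFEST)
    except KeyError:          # no manifest member at all
        return False
    # Assumption: tolerate stray whitespace around the marker.
    return data.strip() == b"allowPivoting=true"


def structure_members(zf):
    """Return structure member names ordered by their decimal number."""
    found = []
    for name in zf.namelist():
        m = STRUCTURE_RE.match(name)
        if m:
            found.append((int(m.group(1)), name))
    return [name for _, name in sorted(found)]


def build_demo_archive():
    """Build a tiny in-memory archive shaped like an SPV file (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("outputViewer0000000001_heading.xml", "<heading/>")
        zf.writestr("outputViewer0000000000.xml", "<heading/>")
        zf.writestr("0000000001437_lightTableData.bin", b"")
        zf.writestr(MANIFEST, "allowPivoting=true")
    return zipfile.ZipFile(buf)
```

Sorting by the numeric value rather than by name keeps the members in document order even if the number of digits were to vary.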
35 Structure members contain XML. This XML is sometimes self-contained,
36 but it often references detail members in the Zip archive, which are
40 @item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
41 @itemx @file{@var{prefix}_lightTableData.bin}
42 The structure of a table plus its data. Older SPV files pair a
43 @file{@var{prefix}_table.xml} file that describes the table's
44 structure with a binary @file{@var{prefix}_tableData.bin} file that
45 gives its data. Newer SPV files (the majority of those in the corpus)
46 instead include a single @file{@var{prefix}_lightTableData.bin} file
47 that incorporates both into a single binary format.
49 @item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
50 @itemx @file{@var{prefix}_lightWarningData.bin}
51 Same format used for tables, with a different name.
53 @item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
54 @itemx @file{@var{prefix}_lightNotesData.bin}
55 Same format used for tables, with a different name.
57 @item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
58 The structure of a chart plus its data. Charts do not have a
61 @item @file{@var{prefix}_pmml.scf}
62 @itemx @file{@var{prefix}_stats.scf}
63 @item @file{@var{prefix}_model.xml}
64 Not yet investigated. The corpus contains few examples.
67 The @file{@var{prefix}} in the names of the detail members is
68 typically an 11-digit decimal number that increases for each item,
69 tending to skip values. Older SPV files use different naming
conventions. Structure members refer to detail members by name, and so
71 their exact names do not matter to readers as long as they are unique.
74 * SPV Structure Member Format::
75 * SPV Light Detail Member Format::
76 * SPV Legacy Detail Member Binary Format::
77 * SPV Legacy Detail Member XML Format::
80 @node SPV Structure Member Format
81 @section Structure Member Format
83 Structure members' XML files claim conformance with a collection of
84 XML Schemas. These schemas are distributed, under a nonfree license,
85 with SPSS binaries. Fortunately, the schemas are not necessary to
86 understand the structure members. To a degree, the schemas can even
87 be deceptive because they document elements and attributes that are
88 not in the corpus and do not document elements and attributes that are
91 Structure members use a different XML namespace for each schema, but
92 these namespaces are not entirely consistent. In some SPV files, for
93 example, the @code{viewer-tree} schema is associated with namespace
94 @indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
95 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
96 additional @file{viewer/}). Under either name, the schema URIs are
97 not resolvable to obtain the schemas themselves.
99 One may ignore all of the above in interpreting a structure member.
100 The actual XML has a simple and straightforward form that does not
101 require a reader to take schemas or namespaces into account.
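
One way to ignore namespaces in practice is to strip them from every tag after parsing. The sketch below uses Python's @code{xml.etree.ElementTree}; the helper names and the two sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_structure(xml_text):
    """Parse a structure member and strip the namespace from every tag."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = local_name(elem.tag)
    return root


# The same tiny document under both observed namespace spellings.
OLD_NS = ('<heading xmlns="http://xml.spss.com/spss/viewer-tree">'
          '<label>Output</label></heading>')
NEW_NS = ('<heading xmlns="http://xml.spss.com/spss/viewer/viewer-tree">'
          '<label>Output</label></heading>')
```

After this step both variants can be searched with plain element names, for example @code{parse_structure(OLD_NS).find("label")}.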
103 The elements found in structure members are documented below. For
104 each element, we note the possible parent elements and the element's
105 contents. The contents are specified as pseudo-regular expressions
106 with the following conventions:
119 Grouping multiple elements.
124 @item @var{a} @math{|} @var{b}
125 A choice between @var{a} and @var{b}.
128 Zero or more @var{x}.
132 For a diagram illustrating the hierarchy of elements within an SPV
133 structure member, please refer to a PDF version of the manual.
137 The following diagram shows the hierarchy of elements within an SPV
138 structure member. Edges point from parent to child elements.
139 Unlabeled edges indicate that the child appears exactly once; edges
140 labeled with *, zero or more times; edges labeled with ?, zero or one
142 @center @image{dev/spv-structure, 5in}
146 * SPV Structure heading Element::
147 * SPV Structure label Element::
148 * SPV Structure container Element::
149 * SPV Structure text Element (Inside @code{container})::
150 * SPV Structure html Element::
151 * SPV Structure table Element::
152 * SPV Structure tableStructure Element::
153 * SPV Structure dataPath Element::
154 * SPV Structure pageSetup Element::
155 * SPV Structure pageHeader and pageFooter Elements::
156 * SPV Structure pageParagraph Element::
157 * SPV Structure @code{text} Element (Inside @code{pageParagraph})::
160 @node SPV Structure heading Element
161 @subsection The @code{heading} Element
163 Parent: Document root or @code{heading} @*
164 Contents: @code{pageSetup}? @code{label} (@code{container} @math{|} @code{heading})*
166 The root of a structure member is a @code{heading}, which represents a
167 section of output beginning with a title (the @code{label}) and
168 ordinarily followed by content containers or further nested
(sub)sections of output.
171 The document root heading, only, may also contain a @code{pageSetup}
174 The following attributes have been observed on both document root and
175 nested @code{heading} elements.
177 @defvr {Optional} creator-version
178 The version of the software that created this SPV file. A string of
179 the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
180 e.g.@: @code{21000001} is version 21.0.0.1. Trailing pairs of zeros
181 are sometimes omitted, so that @code{21}, @code{210000}, and
182 @code{21000000} are all version 21.0.0.0 (and the corpus contains all
183 three of those forms).
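
The padding rule can be sketched in a few lines of Python. The function name is hypothetical, and rejecting strings that are not 2, 4, 6, or 8 digits long is an assumption of this example rather than something the corpus establishes.

```python
def parse_creator_version(s):
    """Split an 'xxyyzzww' creator-version string into four integers."""
    if not s.isdigit() or len(s) % 2 or not 2 <= len(s) <= 8:
        raise ValueError(f"unexpected creator-version: {s!r}")
    padded = s.ljust(8, "0")   # restore omitted trailing pairs of zeros
    return tuple(int(padded[i:i + 2]) for i in range(0, 8, 2))
```

With this, @code{21}, @code{210000}, and @code{21000000} all map to the same tuple, so versions can be compared directly.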
187 The following attributes have been observed on document root
188 @code{heading} elements only:
190 @defvr {Optional} @code{creator}
191 The directory in the file system of the software that created this SPV
195 @defvr {Optional} @code{creation-date-time}
196 The date and time at which the SPV file was written, in a
197 locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
198 PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
199 December 5, 2014 5:00:19 o'clock PM EST}.
202 @defvr {Optional} @code{lockReader}
203 Whether a reader should be allowed to edit the output. The possible
204 values are @code{true} and @code{false}, but the corpus only contains
208 @defvr {Optional} @code{schemaLocation}
209 This is actually an XML Namespace attribute. A reader may ignore it.
213 The following attributes have been observed only on nested
214 @code{heading} elements:
216 @defvr {Required} @code{commandName}
217 The locale-invariant name of the command that produced the output,
218 e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
221 @defvr {Optional} @code{visibility}
222 To what degree the output represented by the element is visible. The
223 only observed value is @code{collapsed}.
226 @defvr {Optional} @code{locale}
227 The locale used for output, in Windows format, which is similar to the
228 format used in Unix with the underscore replaced by a hyphen, e.g.@:
@code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
232 @defvr {Optional} @code{olang}
233 The output language, e.g.@: @code{en}, @code{it}, @code{es},
234 @code{de}, @code{pt-BR}.
237 @node SPV Structure label Element
238 @subsection The @code{label} Element
240 Parent: @code{heading} or @code{container} @*
243 Every @code{heading} and @code{container} holds a @code{label} as its
244 first child. The root @code{heading} in a structure member always
245 contains the string ``Output''. Otherwise, the text in @code{label}
246 describes what it labels, often by naming the statistical procedure
247 that was executed, e.g.@: ``Frequencies'' or ``T-Test''. Labels are
248 often very generic, especially within a @code{container}, e.g.@:
249 ``Title'' or ``Warnings'' or ``Notes''. Label text is localized
250 according to the output language, e.g.@: in Italian a frequency table
251 procedure is labeled ``Frequenze''.
253 The corpus contains one example of an empty label, one that contains
256 This element has no attributes.
258 @node SPV Structure container Element
259 @subsection The @code{container} Element
261 Parent: @code{heading} @*
262 Contents: @code{label} (@code{table} @math{|} @code{text})?
264 A @code{container} serves to label a @code{table} or a @code{text}
267 This element has the following attributes.
269 @defvr {Required} @code{visibility}
270 Either @code{visible} or @code{hidden}, this indicates whether the
271 container's content is displayed.
274 @defvr {Optional} @code{text-align}
275 Presumably indicates the alignment of text within the container. The
276 only observed value is @code{left}. Observed with nested @code{table}
277 and @code{text} elements.
280 @defvr {Optional} @code{width}
281 The width of the container in the form @code{@var{n}px}, e.g.@:
285 @node SPV Structure text Element (Inside @code{container})
286 @subsection The @code{text} Element (Inside @code{container})
288 Parent: @code{container} @*
289 Contents: @code{html}
291 This @code{text} element is nested inside a @code{container}. There
292 is a different @code{text} element that is nested inside a
293 @code{pageParagraph}.
295 This element has the following attributes.
297 @defvr {Required} @code{type}
298 One of @code{title}, @code{log}, or @code{text}.
301 @defvr {Optional} @code{commandName}
302 As on the @code{heading} element. For output not specific to a
303 command, this is simply @code{log}. The corpus contains one example
in which @code{commandName} is present but set to the empty string.
307 @defvr {Optional} @code{creator-version}
308 As on the @code{heading} element.
311 @node SPV Structure html Element
312 @subsection The @code{html} Element
314 Parent: @code{text} @*
317 The CDATA contains an HTML document. In some cases, the document
starts with @code{<html>} and ends with @code{</html>}; in others the
319 @code{html} element is implied. Generally the HTML includes a
320 @code{head} element with a CSS stylesheet. The HTML body often begins
321 with @code{<BR>}. The actual content ranges from trivial to simple:
322 just discarding the CSS and tags yields readable results.
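
A minimal sketch of that approach, using Python's standard @code{html.parser} module, follows. The class and function names are made up for this example, and treating @code{<BR>} and @code{<p>} as line breaks is an assumption about what reads well, not a rule of the format.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._in_style = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <BR> is seen as "br".
        if tag == "style":
            self._in_style += 1
        elif tag in ("br", "p"):
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "style" and self._in_style:
            self._in_style -= 1

    def handle_data(self, data):
        if not self._in_style:
            self.parts.append(data)


def html_to_text(html):
    """Return the readable text of an SPV html fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

The same function handles fragments with and without an explicit @code{html} wrapper, since the parser does not require a complete document.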
324 This element has the following attributes.
326 @defvr {Required} @code{lang}
327 This always contains @code{en} in the corpus.
330 @node SPV Structure table Element
331 @subsection The @code{table} Element
333 Parent: @code{container} @*
334 Contents: @code{tableStructure}
336 This element has the following attributes.
338 @defvr {Required} @code{commandName}
339 As on the @code{heading} element.
342 @defvr {Required} @code{type}
343 One of @code{table}, @code{note}, or @code{warning}.
346 @defvr {Required} @code{subType}
347 The locale-invariant name for the particular kind of output that this
348 table represents in the procedure. This can be the same as
349 @code{commandName} e.g.@: @code{Frequencies}, or different, e.g.@:
350 @code{Case Processing Summary}. Generic subtypes @code{Notes} and
351 @code{Warnings} are often used.
354 @defvr {Required} @code{tableId}
355 A number that uniquely identifies the table within the SPV file,
356 typically a large negative number such as @code{-4147135649387905023}.
359 @defvr {Optional} @code{creator-version}
360 As on the @code{heading} element. In the corpus, this is only present
361 for version 21 and up and always includes all 8 digits.
364 @node SPV Structure tableStructure Element
365 @subsection The @code{tableStructure} Element
367 Parent: @code{table} @*
368 Contents: @code{dataPath}
370 This element has no attributes.
372 @node SPV Structure dataPath Element
373 @subsection The @code{dataPath} Element
375 Parent: @code{tableStructure} @*
378 Contains the name of the Zip member that holds the table details,
379 e.g.@: @code{0000000001437_lightTableData.bin}.
381 This element has no attributes.
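
Putting the @code{table}, @code{tableStructure}, and @code{dataPath} elements together, a reader can collect the detail member for each table roughly as follows. This is only a sketch: the function names and the sample XML are invented, and it assumes that attributes such as @code{subType} carry no namespace prefix.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def table_data_paths(xml_text):
    """Return (subType, dataPath text) for each table in a structure member."""
    root = ET.fromstring(xml_text)
    result = []
    for elem in root.iter():
        if _local(elem.tag) != "table":
            continue
        for child in elem.iter():
            if _local(child.tag) == "dataPath":
                result.append((elem.get("subType"),
                               (child.text or "").strip()))
    return result


SAMPLE = (
    '<heading xmlns="http://xml.spss.com/spss/viewer/viewer-tree">'
    '<label>Output</label>'
    '<heading commandName="Frequencies"><label>Frequencies</label>'
    '<container visibility="visible"><label>Statistics</label>'
    '<table commandName="Frequencies" type="table" subType="Statistics" '
    'tableId="-4147135649387905023"><tableStructure>'
    '<dataPath>0000000001437_lightTableData.bin</dataPath>'
    '</tableStructure></table></container></heading></heading>'
)
```

Each returned path names a Zip member that can then be read from the same archive.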
383 @node SPV Structure pageSetup Element
384 @subsection The @code{pageSetup} Element
386 Parent: @code{heading} @*
387 Contents: @code{pageHeader} @code{pageFooter}
389 This element has the following attributes.
391 @defvr {Required} @code{initial-page-number}
395 @defvr {Optional} @code{chart-size}
396 Always @code{as-is} or a localization (!) of it (e.g.@: @code{dimensione
397 attuale}, @code{Wie vorgegeben}).
400 @defvr {Optional} @code{margin-left}
401 @defvrx {Optional} @code{margin-right}
402 @defvrx {Optional} @code{margin-top}
403 @defvrx {Optional} @code{margin-bottom}
404 Margin sizes in the form @code{@var{size}in}, e.g.@: @code{0.25in}.
407 @defvr {Optional} @code{paper-height}
408 @defvrx {Optional} @code{paper-width}
409 Paper sizes in the form @code{@var{size}in}, e.g.@: @code{8.5in} by
410 @code{11in} for letter paper or @code{8.267in} by @code{11.692in} for
414 @defvr {Optional} @code{reference-orientation}
418 @defvr {Optional} @code{space-after}
422 @node SPV Structure pageHeader and pageFooter Elements
423 @subsection The @code{pageHeader} and @code{pageFooter} Elements
425 Parent: @code{pageSetup} @*
426 Contents: @code{pageParagraph}*
428 This element has no attributes.
430 @node SPV Structure pageParagraph Element
431 @subsection The @code{pageParagraph} Element
433 Parent: @code{pageHeader} or @code{pageFooter} @*
434 Contents: @code{text}
436 Text to go at the top or bottom of a page, respectively.
438 This element has no attributes.
440 @node SPV Structure @code{text} Element (Inside @code{pageParagraph})
441 @subsection The @code{text} Element (Inside @code{pageParagraph})
443 Parent: @code{pageParagraph}
446 This @code{text} element is nested inside a @code{pageParagraph}. There
447 is a different @code{text} element that is nested inside a
450 The element is either empty, or contains CDATA that holds almost-XHTML
451 text: in the corpus, either an @code{html} or @code{p} element. It is
452 @emph{almost}-XHTML because the @code{html} element designates the
454 @code{http://xml.spss.com/spss/viewer/viewer-tree} instead of an XHTML
455 namespace, and because the CDATA can contain substitution variables:
456 @code{&[Page]} for the page number and @code{&[PageTitle]} for the
459 Typical contents (indented for clarity):
462 <html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
465 <p style="text-align:right; margin-top: 0">Page &[Page]</p>
470 This element has the following attributes.
472 @defvr {Required} @code{type}
476 @node SPV Light Detail Member Format
477 @section Light Detail Member Format
479 This section describes the format of ``light'' detail @file{.bin}
480 members. These members have a binary format which we describe here in
481 terms of a context-free grammar using the following conventions:
484 @item NonTerminal @result{} @dots{}
485 Nonterminals have CamelCaps names, and @result{} indicates a
486 production. The right-hand side of a production is often broken
487 across multiple lines. Break points are chosen for aesthetics only
488 and have no semantic significance.
490 @item 00, 01, @dots{}, ff.
A byte with a fixed value, written as a pair of hexadecimal digits.
493 @item i0, i1, @dots{}, i9, i10, i11, @dots{}
494 @itemx b0, b1, @dots{}, b9, b10, b11, @dots{}
495 A 32-bit integer in little-endian or big-endian byte order,
496 respectively, with a fixed value, written in decimal, prefixed by
503 A byte with value 0 or 1.
507 A 16-bit integer in little-endian or big-endian byte order,
512 A 32-bit integer in little-endian or big-endian byte order,
517 A 64-bit integer in little-endian or big-endian byte order,
521 A 64-bit IEEE floating-point number.
524 A 32-bit IEEE floating-point number.
528 A 32-bit integer, in little-endian or big-endian byte order,
529 respectively, followed by the specified number of bytes of character
530 data. (The encoding is indicated by the Formats nonterminal.)
533 @var{x} is optional, e.g.@: 00? is an optional zero byte.
535 @item @var{x}*@var{n}
@var{x} is repeated @var{n} times, e.g.@: byte*10 for ten arbitrary bytes.
538 @item @var{x}[@var{name}]
539 Gives @var{x} the specified @var{name}. Names are used in textual
540 explanations. They are also used, also bracketed, to indicate counts,
541 e.g.@: int[@t{n}] byte*[@t{n}] for a 32-bit integer followed by the
542 specified number of arbitrary bytes.
544 @item @var{a} @math{|} @var{b}
545 Either @var{a} or @var{b}.
548 Parentheses are used for grouping to make precedence clear, especially
549 in the presence of @math{|}, e.g.@: in 00 (01 @math{|} 02 @math{|} 03)
553 A 32-bit integer that indicates the number of bytes in @var{x},
554 followed by @var{x} itself.
557 In a version 1 @file{.bin} member, @var{x}; in version 3, nothing.
558 (The @file{.bin} header indicates the version.)
561 In a version 3 @file{.bin} member, @var{x}; in version 1, nothing.
564 Little-endian byte order is far more common in this format, but a few
565 pieces of the format use big-endian byte order.
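
The primitive items in the conventions above map naturally onto a small cursor class. The Python sketch below is one possible shape, not a reference implementation: the class and method names are invented, integers are read as signed, strings are returned as undecoded bytes, and the byte count of a counted region is assumed to be little-endian.

```python
import struct


class Reader:
    """Cursor over the bytes of a light detail member."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def take(self, n):
        """Consume and return exactly n bytes."""
        if n < 0 or self.pos + n > len(self.data):
            raise EOFError("unexpected end of member")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_int(self):        # 32-bit little-endian (read as signed)
        return struct.unpack("<i", self.take(4))[0]

    def read_be32(self):       # 32-bit big-endian (read as signed)
        return struct.unpack(">i", self.take(4))[0]

    def read_bool(self):       # a byte with value 0 or 1
        value = self.take(1)[0]
        if value > 1:
            raise ValueError(f"bad bool byte {value:#04x}")
        return value == 1

    def read_string(self):     # little-endian length, then that many bytes
        return self.take(self.read_int())

    def read_bestring(self):   # big-endian length, then that many bytes
        return self.take(self.read_be32())

    def read_count(self):
        """Return a sub-reader limited to a counted region.

        Assumption: the byte count is little-endian.
        """
        return Reader(self.take(self.read_int()))
```

Returning a sub-reader for counted regions makes it easy to skip parts whose contents are not understood, because the outer cursor has already moved past them.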
567 A ``light'' detail member @file{.bin} consists of a number of sections
568 concatenated together, terminated by a byte 01:
572 LightMember @result{}
575 Fonts Borders PrintSettings TableSettings Formats
581 The following sections go into more detail.
584 * SPV Light Member Header::
585 * SPV Light Member Title::
586 * SPV Light Member Caption::
587 * SPV Light Member Footnotes::
588 * SPV Light Member Fonts::
589 * SPV Light Member Borders::
590 * SPV Light Member Print Settings::
591 * SPV Light Member Table Settings::
592 * SPV Light Member Formats::
593 * SPV Light Member Dimensions::
594 * SPV Light Member Categories::
595 * SPV Light Member Data::
596 * SPV Light Member Value::
597 * SPV Light Member ValueMod::
600 @node SPV Light Member Header
603 An SPV light member begins with a 39-byte header:
609 (i1 @math{|} i3)[@t{version}]
611 bool[@t{show-numeric-markers}]
612 bool[@t{rotate-inner-column-labels}]
613 bool[@t{rotate-outer-row-labels}]
616 int[@t{min-column-width}] int[@t{max-column-width}]
617 int[@t{min-row-width}] int[@t{max-row-width}]
622 @code{version} is a version number that affects the interpretation of
623 some of the other data in the member. We will refer to ``version 1''
624 and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for
625 version-specific formatting (as described previously).
627 If @code{show-numeric-markers} is 1, footnote markers are shown as
628 numbers, starting from 1; otherwise, they are shown as letters,
629 starting from @samp{a}.
631 If @code{rotate-inner-column-labels} is 1, then column labels closest
632 to the data are rotated to be vertical; otherwise, they are shown
635 If @code{rotate-outer-row-labels} is 1, then row labels farthest from
636 the data are rotated to be vertical; otherwise, they are shown in the
639 @code{table-id} is a binary version of the @code{tableId} attribute in
640 the structure member that refers to the detail member. For example,
641 if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
642 would be 0xc6c99d183b300001.
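
The correspondence is ordinary 64-bit two's complement, which the following Python sketch reproduces for the example above. The function names are hypothetical, and the sketch says nothing about the byte order in which @code{table-id} is stored.

```python
import struct


def table_id_to_u64(table_id):
    """Convert a tableId attribute (decimal text or int) to unsigned 64-bit."""
    return int(table_id) & 0xFFFFFFFFFFFFFFFF


def u64_to_table_id(value):
    """Reinterpret an unsigned 64-bit table-id as the signed tableId."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]
```

For instance, @code{hex(table_id_to_u64("-4122591256483201023"))} yields @code{0xc6c99d183b300001}.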
644 @code{min-column-width} is the minimum width that a column will be
645 assigned automatically. @code{max-column-width} is the maximum width
646 that a column will be assigned to accommodate a long column label.
647 @code{min-row-width} and @code{max-row-width} are a similar range for
648 the width of row labels. All of these measurements are in 1/96 inch
651 The meaning of the other variable parts of the header is not known.
653 @node SPV Light Member Title
659 Value[@t{title1}] 01?
661 Value[@t{title2}] 01?
665 The Title, which follows the Header, specifies the pivot table's title
666 twice, as @code{title1} and @code{title2}. In the corpus, they are
669 Whereas the Value in @code{title1} and in @code{title2} are
670 appropriate for presentation, and localized to the user's language,
671 @code{c} is in English, sometimes less specific, and sometimes less
672 well formatted. For example, for a frequency table, @code{title1} and
673 @code{title2} name the variable and @code{c} is simply ``Frequencies''.
675 @node SPV Light Member Caption
680 Caption @result{} Caption1 Caption2
681 Caption1 @result{} 31 Value @math{|} 58
682 Caption2 @result{} 31 Value @math{|} 58
686 The Caption, if present, is shown below the table. Caption2 is
687 normally present. Caption1 is only rarely nonempty; it might reflect
688 user editing of the caption.
690 @node SPV Light Member Footnotes
691 @subsection Footnotes
695 Footnotes @result{} int[@t{n}] Footnote*[@t{n}]
696 Footnote @result{} Value[@t{text}] (58 @math{|} 31 Value[@t{marker}]) byte*4
Each footnote has @code{text} and an optional custom @code{marker}
703 @node SPV Light Member Fonts
708 Fonts @result{} 00 Font*8
711 string[@t{typeface}] float[@t{size}] int[@t{style}] bool[@t{underline}]
712 int[@t{halign}] int[@t{valign}]
713 string[@t{fgcolor}] string[@t{bgcolor}]
714 byte[@t{alternate}] string[@t{altfg}] string[@t{altbg}]
715 v3(int[@t{left-margin}] int[@t{right-margin}] int[@t{top-margin}] int[@t{bottom-margin}])
719 Each Font represents the font style for a different element, in the
720 following order: title, caption, footer, corner, column
721 labels, row labels, data, and layers.
@code{index} is the 1-based index of the Font, i.e.@: 1 for the first
724 Font, through 8 for the final Font.
726 @code{typeface} is the string name of the font. In the corpus, this
727 is @code{SansSerif} in over 99% of instances and @code{Times New
730 @code{size} is the size of the font, in points. The most common size
731 in the corpus is 12 points.
733 @code{style} is a bit mask. Bit 0 (with value 1) is set for bold, bit
734 1 (with value 2) is set for italic.
736 @code{underline} is 1 if the font is underlined, 0 otherwise.
738 @code{halign} specifies horizontal alignment: 0 for center, 2 for
739 left, 4 for right, 61453 for decimal, 64173 for mixed. Mixed
740 alignment varies according to type: string data is left-justified,
741 numbers and most other formats are right-justified.
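
The @code{style} bit mask and the @code{halign} codes can be decoded as in the Python sketch below. The names are invented for this example, and resolving mixed alignment to @code{right} for every non-string value is a simplification of ``most other formats''.

```python
# halign codes as listed above.
HALIGN = {0: "center", 2: "left", 4: "right",
          61453: "decimal", 64173: "mixed"}


def decode_style(style):
    """Decode the style bit mask: bit 0 is bold, bit 1 is italic."""
    return {"bold": bool(style & 1), "italic": bool(style & 2)}


def effective_halign(halign, is_string):
    """Resolve an halign code, applying the mixed-alignment rule."""
    name = HALIGN.get(halign, "unknown")
    if name == "mixed":
        # Simplification: treat every non-string value as right-justified.
        return "left" if is_string else "right"
    return name
```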
743 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
746 @code{fgcolor} and @code{bgcolor} are the foreground color and
747 background color, respectively. In the corpus, these are always
748 @code{#000000} and @code{#ffffff}, respectively.
750 @code{alternate} is 01 if rows should alternate colors, 00 if all rows
751 should be the same color. When @code{alternate} is 01, @code{altfg}
752 and @code{altbg} specify the colors for the alternate rows.
754 @code{left-margin}, @code{right-margin}, @code{top-margin}, and
755 @code{bottom-margin} are measured in multiples of 1/96 inch.
757 @node SPV Light Member Borders
764 be32[@t{n-borders}] Border*[@t{n-borders}]
765 bool[@t{show-grid-lines}]
769 be32[@t{border-type}]
770 be32[@t{stroke-type}]
775 The Borders reflect how borders between regions are drawn.
777 The fixed value of @code{endian} can be used to validate the
780 @code{show-grid-lines} is 1 to draw grid lines, otherwise 0.
782 Each Border describes one kind of border. @code{n-borders} seems to
783 always be 19. Each @code{border-type} appears once (although in an
unpredictable order) and corresponds to the following borders:
790 Left, top, right, and bottom outer frame.
792 Left, top, right, and bottom inner frame.
794 Left and top of data area.
796 Horizontal and vertical dimension rows.
798 Horizontal and vertical dimension columns.
800 Horizontal and vertical category rows.
802 Horizontal and vertical category columns.
805 @code{stroke-type} describes how a border is drawn, as one of:
822 @code{color} is an RGB color. Bits 24--31 are alpha, bits 16--23 are
823 red, 8--15 are green, 0--7 are blue. An alpha of 255 indicates an
824 opaque color, therefore opaque black is 0xff000000.
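
For illustration, the bit layout can be unpacked with shifts and masks. The Python function names here are hypothetical, and the CSS-style output is just one convenient way to present the result.

```python
def decode_color(color):
    """Split a 32-bit color into (alpha, red, green, blue) components."""
    alpha = (color >> 24) & 0xFF
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    return alpha, red, green, blue


def css_color(color):
    """Return ('#rrggbb', opacity) with opacity scaled to the range 0..1."""
    alpha, red, green, blue = decode_color(color)
    return f"#{red:02x}{green:02x}{blue:02x}", alpha / 255
```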
826 @node SPV Light Member Print Settings
827 @subsection Print Settings
831 PrintSettings @result{}
834 bool[@t{paginate-layers}]
837 bool[@t{top-continuation}]
838 bool[@t{bottom-continuation}]
839 be32[@t{n-orphan-lines}]
840 bestring[@t{continuation-string}]
844 The PrintSettings reflect settings for printing. The fixed value of
845 @code{endian} can be used to validate the endianness.
847 @code{all-layers} is 1 to print all layers, 0 to print only the
850 @code{paginate-layers} is 1 to print each layer at the start of a new
page, 0 otherwise. (This setting is honored only if @code{all-layers} is
852 1, since otherwise only one layer is printed.)
854 @code{fit-width} and @code{fit-length} control whether the table is
855 shrunk to fit within a page's width or length, respectively.
857 @code{n-orphan-lines} is the minimum number of rows or columns to put
858 in one part of a table that is broken across pages.
860 If @code{top-continuation} is 1, then @code{continuation-string} is
861 printed at the top of a page when a table is broken across pages for
862 printing; similarly for @code{bottom-continuation} and the bottom of a
863 page. Usually, @code{continuation-string} is empty.
865 @node SPV Light Member Table Settings
866 @subsection Table Settings
870 TableSettings @result{}
873 be32[@t{current-layer}]
875 bool[@t{show-row-labels-in-corner}]
876 bool[@t{show-alphabetic-markers}]
877 bool[@t{footnote-marker-position}]
881 Breakpoints[@t{row-breaks}] Breakpoints[@t{column-breaks}]
882 Keeps[@t{row-keeps}] Keeps[@t{column-keeps}]
PointKeeps[@t{row-point-keeps}] PointKeeps[@t{column-point-keeps}]
886 bestring[@t{table-look}]
890 Breakpoints @result{} be32[@t{n-breaks}] be32*[@t{n-breaks}]
Keeps @result{} be32[@t{n-keeps}] Keep*[@t{n-keeps}]
Keep @result{} be32[@t{offset}] be32[@t{n}]
PointKeeps @result{} be32[@t{n-point-keeps}] PointKeep*[@t{n-point-keeps}]
896 PointKeep @result{} be32[@t{offset}] be32 be32
901 The TableSettings reflect display settings. The fixed value of
902 @code{endian} can be used to validate the endianness.
904 @code{current-layer} is the displayed layer.
906 If @code{omit-empty} is 1, empty rows or columns (ones with nothing in
907 any cell) are hidden; otherwise, they are shown.
909 If @code{show-row-labels-in-corner} is 1, then row labels are shown in
910 the upper left corner; otherwise, they are shown nested.
912 If @code{show-alphabetic-markers} is 1, markers are shown as letters
(e.g.@: @samp{a}, @samp{b}, @samp{c}, @dots{}); otherwise, they are
914 shown as numbers starting from 1.
916 When @code{footnote-marker-position} is 1, footnote markers are shown
917 as superscripts, otherwise as subscripts.
919 The Breakpoints are rows or columns after which there is a page break;
920 for example, a row break of 1 requests a page break after the second
921 row. Usually no breakpoints are specified, indicating that page
922 breaks should be selected automatically.
924 The Keeps are ranges of rows or columns to be kept together without a
925 page break; for example, a row Keep with @code{offset} 1 and @code{n}
926 10 requests that the 10 rows starting with the second row be kept
927 together. Usually no Keeps are specified.
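
The Breakpoints and Keeps productions can be read with a few @code{struct} calls, as in the Python sketch below. It assumes that every field, including @code{n} in a Keep, is a 32-bit big-endian integer read as unsigned, and that offsets are 0-based as the examples above suggest; the function names are invented.

```python
import struct


def parse_breakpoints(data, pos=0):
    """Parse Breakpoints at pos; return (list of breaks, new position)."""
    (n,) = struct.unpack_from(">I", data, pos)
    pos += 4
    breaks = []
    for _ in range(n):
        (value,) = struct.unpack_from(">I", data, pos)
        breaks.append(value)
        pos += 4
    return breaks, pos


def parse_keeps(data, pos=0):
    """Parse Keeps at pos; return (list of (offset, n) pairs, new position)."""
    (count,) = struct.unpack_from(">I", data, pos)
    pos += 4
    keeps = []
    for _ in range(count):
        offset, n = struct.unpack_from(">II", data, pos)
        keeps.append((offset, n))
        pos += 8
    return keeps, pos


def kept_rows(offset, n):
    """Return the 0-based indexes that a Keep asks to stay together."""
    return range(offset, offset + n)
```

A Keep with @code{offset} 1 and @code{n} 10 then covers 0-based indexes 1 through 10, that is, the second through eleventh rows.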
929 The PointKeeps seem to be generated automatically based on
user-specified Keeps. They seem to indicate a conversion from rows
931 or columns to pixel or point offsets.
933 @code{notes} is a text string that contains user-specified notes. It
934 is displayed when the user hovers the cursor over the table, like
935 ``alt text'' on a webpage. It is not printed. It is usually empty.
@code{table-look} is the name of an SPSS ``TableLook'' table style,
938 such as ``Default'' or ``Academic''; it is often empty.
940 TableSettings ends with an arbitrary number of null bytes.
942 @node SPV Light Member Formats
948 int[@t{n-widths}] int*[@t{n-widths}]
950 int[@t{current-layer}]
951 bool[@t{digit-grouping}] bool[@t{leading-zero}] bool
953 byte[@t{decimal}] byte[@t{grouping}]
957 v3(count(X1 count(X2)) count(X3))
961 string[@t{command}] string[@t{command-local}]
962 string[@t{language}] string[@t{charset}] string[@t{locale}]
965 byte[@t{decimal}] byte[@t{grouping}]
967 byte[@t{missing}] bool
972 byte[@t{variable-mode}]
979 int[@t{n-heights}] int*[@t{n-heights}]
int[@t{n-style-map}] StyleMap*[@t{n-style-map}]
981 int[@t{n-styles}] StylePair*[@t{n-styles}]
983 StyleMap @result{} int64[@t{cell-index}] int16[@t{style-index}]
985 01 00 (03 @math{|} 04) 00 00 00
986 string[@t{command}] string[@t{command-local}]
987 string[@t{language}] string[@t{charset}] string[@t{locale}]
990 byte[@t{decimal}] byte[@t{grouping}]
992 (string[@t{dataset}] string[@t{datafile}] i0 int[@t{date}] i0)?
994 byte[@t{missing}] bool (i2000000 i0)?
996 CustomCurrency @result{} int[@t{n-ccs}] string*[@t{n-ccs}]
1000 If @code{n-widths} is nonzero, then the accompanying integers are
1001 column widths as manually adjusted by the user. (Row heights are
1002 computed automatically based on the widths.)
1004 @code{encoding} is a character encoding, usually a Windows code page
1005 such as @code{en_US.windows-1252} or @code{it_IT.windows-1252}. The
1006 rest of the character strings in the member use this encoding. The
1007 encoding string is itself encoded in US-ASCII.
1009 @code{epoch} is the year that starts the epoch. A 2-digit year is
1010 interpreted as belonging to the 100 years beginning at the epoch. The
1011 default epoch year is 69 years prior to the current year; thus, in
1012 2017 this field by default contains 1948. In the corpus, @code{epoch}
ranges from 1943 to 1948, except for some members that contain -1.
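
The epoch rule amounts to a small piece of arithmetic, sketched here in Python. The function names are hypothetical, and the sketch does not attempt to interpret an @code{epoch} of -1, whose meaning is not known.

```python
def default_epoch(current_year):
    """Default epoch: 69 years before the current year."""
    return current_year - 69


def expand_two_digit_year(yy, epoch):
    """Map a 2-digit year into the 100 years beginning at epoch."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year")
    year = epoch - epoch % 100 + yy   # same century as the epoch
    if year < epoch:                  # earlier than the epoch: next century
        year += 100
    return year
```

With an epoch of 1948, the 2-digit year 48 becomes 1948 and 47 becomes 2047.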
1015 @code{decimal} is the decimal point character. The observed values
1016 are @samp{.} and @samp{,}.
1018 @code{grouping} is the grouping character. Usually, it is @samp{,} if
1019 @code{decimal} is @samp{.}, and vice versa. Other observed values are
1020 @samp{'} (apostrophe), @samp{ } (space), and zero (presumably
1021 indicating that digits should not be grouped).
1023 @code{command} describes the statistical procedure that generated the
1024 output, in English. It is not necessarily the literal syntax name of
1025 the procedure: for example, NPAR TESTS becomes ``Nonparametric
1026 Tests.'' @code{command-local} is the procedure's name, translated
1027 into the output language; it is often empty and, when it is not,
1028 sometimes the same as @code{command}.
1030 @code{dataset} is the name of the dataset analyzed to produce the
1031 output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
1032 file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter
1033 is sometimes the empty string.
1035 @code{date} is a date, as seconds since the epoch, i.e.@: since
January 1, 1970. Pivot tables within an SPV file often have dates a
1037 few minutes apart, so this is probably a creation date for the tables
1038 rather than for the file.
1040 Sometimes @code{dataset}, @code{datafile}, and @code{date} are present
1041 and other times they are absent. The reader can distinguish by
1042 assuming that they are present and then checking whether the
1043 presumptive @code{dataset} contains a null byte (a valid string never
1046 @code{n-ccs} is observed as either 0 or 5. When it is 5, the
1047 following strings are CCA through CCE format strings. @xref{Custom
1048 Currency Formats,,, pspp, PSPP}. Most commonly these are all
1049 @code{-,,,} but other strings occur.
1051 @code{missing} is the character used to indicate that a cell contains
1052 a missing value. It is always observed as @samp{.}.
1054 @node SPV Light Member Dimensions
1055 @subsection Dimensions
1057 A pivot table presents multidimensional data. A Dimension identifies
1058 the categories associated with each dimension.
1062 Dimensions @result{} int[@t{n-dims}] Dimension*[@t{n-dims}]
1063 Dimension @result{} Value[@t{name}] DimProperties int[@t{n-categories}] Category*[@t{n-categories}]
1064 DimProperties @result{}
1066 (00 @math{|} 01 @math{|} 02)[@t{d2}]
1067 (i0 @math{|} i2)[@t{d3}]
1068 bool[@t{show-dim-label}]
1069 bool[@t{hide-all-labels}]
1070 01 int[@t{dim-index}]
@code{name} is the name of the dimension, e.g.@: @code{Variables},
1075 @code{Statistics}, or a variable name.
1077 The meanings of @code{d1}, @code{d2}, and @code{d3} are unknown.
1078 @code{d1} is usually 0 but many other values have been observed.
1080 If @code{show-dim-label} is 01, the pivot table displays a label for
1081 the dimension itself. Because usually the group and category labels
1082 are enough explanation, it is usually 00.
1084 If @code{hide-all-labels} is 01, the pivot table omits all labels for
1085 the dimension, including group and category labels. It is usually 00.
1086 When @code{hide-all-labels} is 01, @code{show-dim-label} is ignored.
1088 @code{dim-index} is usually the 0-based index of the dimension, e.g.@:
1089 0 for the first dimension, 1 for the second, and so on. Sometimes it
1090 is -1. There is no visible difference.
1092 @node SPV Light Member Categories
1093 @subsection Categories
1095 Categories are arranged in a tree. Only the leaf nodes in the tree
1096 are really categories; the others just serve as grouping constructs.
1100 Category @result{} Value[@t{name}] (Leaf @math{|} Group)
1101 Leaf @result{} 00 00 00 i2 int[@t{cat-index}] i0
1103 bool[@t{merge}] 00 01 (i0 @math{|} i2)[@t{data}]
1104 i-1 int[@t{n-subcategories}] Category*[@t{n-subcategories}]
1108 @code{name} is the name of the category (or group).
1110 A Leaf represents a leaf category. The Leaf's @code{cat-index} is a
1111 nonnegative integer less than @code{n-categories} in the Dimension in
which the Category is nested (directly or indirectly). The
@code{cat-index} values reflect the original order in which the
categories were sorted; if the user sorted or rearranged the
categories, then the order of categories in the file reflects that
without changing the @code{cat-index} values.
1118 A Group is a group of nested categories. Usually a Group contains at
1119 least one Category, so that @code{n-subcategories} is positive, but a
few Groups with @code{n-subcategories} 0 have been observed.
1122 If a Group's @code{merge} is 00, the most common value, then the group
1123 is really a distinct group that should be represented as such in the
1124 visual representation and user interface. If @code{merge} is 01, the
1125 categories in this group should be shown and treated as if they were
1126 direct children of the group's containing group (or if it has no
1127 parent group, then direct children of the dimension), and this group's
1128 name is irrelevant and should not be displayed. (Merged groups can be
1131 A Group's @code{data} appears to be i2 when all of the categories
1132 within a group are leaf categories that directly represent data values
for a variable (e.g.@: in a frequency table or crosstabulation, a group
1134 of values in a variable being tabulated) and i0 otherwise.
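
The treatment of merged groups described above can be sketched in
Python as follows. The @code{Leaf} and @code{Group} tuples and the
function @code{hoist_merged} are hypothetical in-memory
representations, not part of the file format, and the sketch assumes
that a merged group nested inside another merged group is hoisted
repeatedly.

```python
from collections import namedtuple

# Hypothetical in-memory forms of the Leaf and Group nonterminals.
Leaf = namedtuple("Leaf", "name cat_index")
Group = namedtuple("Group", "name merge children")

def hoist_merged(categories):
    """Return the categories with the children of merged groups
    spliced into the place of each merged group, in file order."""
    out = []
    for cat in categories:
        if isinstance(cat, Group):
            children = hoist_merged(cat.children)
            if cat.merge:
                # Merged group: its name is dropped and its children
                # become children of the containing group or dimension.
                out.extend(children)
            else:
                out.append(Group(cat.name, False, children))
        else:
            out.append(cat)
    return out
```

The @code{cat-index} values pass through unchanged, so they can still
be used to locate data after hoisting.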
1136 @node SPV Light Member Data
1139 The final part of an SPV light member contains the actual data.
int[@t{n-layers}] int[@t{n-rows}] int[@t{n-columns}] int*[@t{n-dimensions}]
1145 int[@t{n-data}] Datum*[@t{n-data}]
1146 Datum @result{} int64[@t{index}] v1(00?) Value
1150 The values of @code{n-layers}, @code{n-rows}, and @code{n-columns}
specify the number of dimensions displayed in layers, rows, and
1152 columns, respectively. Any of them may be zero. Their values sum to
1153 @code{n-dimensions} from Dimensions (@pxref{SPV Light Member
1156 The @code{n-dimensions} integers are a permutation of the 0-based
1157 dimension numbers. The first @code{n-layers} integers specify each of
1158 the dimensions represented by layers, the next @code{n-rows} integers
1159 specify the dimensions represented by rows, and the final
1160 @code{n-columns} integers specify the dimensions represented by
1161 columns. When there is more than one dimension of a given kind, the
1162 inner dimensions are given first.
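
A minimal Python sketch of this split, assuming the counts and the
permutation have already been read as integers (the function name
@code{split_axes} is hypothetical):

```python
def split_axes(n_layers, n_rows, n_columns, dim_indexes):
    """Split the permutation of dimension numbers into the layer,
    row, and column axes, in that order."""
    if n_layers + n_rows + n_columns != len(dim_indexes):
        raise ValueError("axis counts do not sum to n-dimensions")
    if sorted(dim_indexes) != list(range(len(dim_indexes))):
        raise ValueError("not a permutation of the dimension numbers")
    layers = dim_indexes[:n_layers]
    rows = dim_indexes[n_layers:n_layers + n_rows]
    columns = dim_indexes[n_layers + n_rows:]
    return layers, rows, columns
```

Within each returned list the inner dimensions come first, matching
the order in the file.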
1164 The format of a Datum varies slightly from version 1 to version 3: in
1165 version 1 it allows for an extra optional 00 byte.
1167 A Datum consists of an @code{index} and a Value. Suppose there are
1168 @math{d} dimensions and dimension @math{i}, @math{0 \le i < d}, has
1169 @math{n_i} categories. Consider the datum at coordinates @math{x_i},
1170 @math{0 \le i < d}, and note that @math{0 \le x_i < n_i}. Then the
1171 index is calculated by the following algorithm:
let @i{index} = 0
for each @math{i} from 0 to @math{d - 1}:
    @i{index} = (@math{n_i \times} @i{index}) @math{+} @math{x_i}
1179 For example, suppose there are 3 dimensions with 3, 4, and 5
1180 categories, respectively. The datum at coordinates (1, 2, 3) has
1181 index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
1182 Within a given dimension, the index is the @code{cat-index} in a Leaf.
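
The index calculation, and its inverse, can be sketched in Python as
follows. The function names are hypothetical; @code{n_categories}
holds the @math{n_i} values and @code{coords} the @math{x_i} values.

```python
def datum_index(n_categories, coords):
    """Compute a Datum index from per-dimension coordinates."""
    if len(n_categories) != len(coords):
        raise ValueError("one coordinate is needed per dimension")
    index = 0
    for n, x in zip(n_categories, coords):
        if not 0 <= x < n:
            raise ValueError("coordinate out of range")
        index = n * index + x
    return index

def datum_coords(n_categories, index):
    """Recover the coordinates from a Datum index."""
    coords = []
    # The last dimension varies fastest, so peel it off first.
    for n in reversed(n_categories):
        index, x = divmod(index, n)
        coords.append(x)
    coords.reverse()
    return coords
```

A reader would typically use the second function, since the file
stores the index and the reader needs to place the Value in a cell.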
1184 @node SPV Light Member Value
1187 Value is used throughout the SPV light member format. It boils down
1188 to a number or a string.
1192 Value @result{} 00? 00? 00? 00? RawValue
1194 01 ValueMod int[@t{format}] double[@t{x}]
1195 @math{|} 02 ValueMod int[@t{format}] double[@t{x}]
1196 string[@t{varname}] string[@t{vallab}] (01 @math{|} 02 @math{|} 03)
1197 @math{|} 03 string[@t{local}] ValueMod string[@t{id}] string[@t{c}] bool[@t{type}]
1198 @math{|} 04 ValueMod int[@t{format}] string[@t{vallab}] string[@t{varname}]
1199 (01 @math{|} 02 @math{|} 03) string[@t{s}]
1200 @math{|} 05 ValueMod string[@t{varname}] string[@t{varlabel}] (01 @math{|} 02 @math{|} 03)
1201 @math{|} ValueMod string[@t{format}] int[@t{n-args}] Argument*[@t{n-args}]
1204 @math{|} int[@t{x}] i0 Value*[@t{x}@math{+}1] /* @t{x} @math{>} 0 */
1208 There are several possible encodings, which one can distinguish by the
1209 first nonzero byte in the encoding.
1213 The numeric value @code{x}, intended to be presented to the user
1214 formatted according to @code{format}, which is in the format described
1215 for system files. @xref{System File Output Formats}, for details.
1216 Most commonly, @code{format} has width 40 (the maximum).
1218 An @code{x} with the maximum negative double value @code{-DBL_MAX}
1219 represents the system-missing value SYSMIS. (HIGHEST and LOWEST have
1220 not been observed.) @xref{System File Format}, for more about these
1224 Similar to @code{01}, with the additional information that @code{x} is
1225 a value of variable @code{varname} and has value label @code{vallab}.
1226 Both @code{varname} and @code{vallab} can be the empty string, the
1227 latter very commonly.
1229 The meaning of the final byte is unknown. Possibly it is connected to
1230 whether the value or the label should be displayed.
1233 A text string, in two forms: @code{c} is in English, and sometimes
1234 abbreviated or obscure, and @code{local} is localized to the user's
1235 locale. In an English-language locale, the two strings are often the
1236 same, and in the cases where they differ, @code{local} is more
1237 appropriate for a user interface, e.g.@: @code{c} of ``Not a PxP table
1238 for MCN...'' versus @code{local} of ``Computed only for a PxP table,
1239 where P must be greater than 1.''
1241 @code{c} and @code{local} are always either both empty or both
1244 @code{id} is a brief identifying string whose form seems to resemble a
1245 programming language identifier, e.g.@: @code{cumulative_percent} or
1246 @code{factor_14}. It is not unique.
@code{type} is 00 for text taken from user input, such as syntax
fragments, expressions, file names, and data set names, and 01 for fixed
1250 text strings such as names of procedures or statistics. In the former
1251 case, @code{id} is always the empty string; in the latter case,
1252 @code{id} is still sometimes empty.
1255 The string value @code{s}, intended to be presented to the user
1256 formatted according to @code{format}. The format for a string is not
1257 too interesting, and the corpus contains many clearly invalid formats
1258 like A16.39 or A255.127 or A134.1, so readers should probably ignore
1259 the format entirely.
1261 @code{s} is a value of variable @code{varname} and has value label
1262 @code{vallab}. @code{varname} is never empty but @code{vallab} is
1265 The meaning of the final byte is unknown.
1268 Variable @code{varname}, which is rarely observed as empty in the
1269 corpus, with variable label @code{varlabel}, which is often empty.
1271 The meaning of the final byte is unknown.
1274 (These bytes begin a ValueMod.) A format string, analogous to
1275 @code{printf}, followed by one or more Arguments, each of which has
1276 one or more values. The format string uses the following syntax:
1283 Each of these expands to the character following @samp{\\}, to escape
1284 characters that have special meaning in format strings. These are
1285 effective inside and outside the @code{[@dots{}]} syntax forms
1289 Expands to a new-line, inside or outside the @code{[@dots{}]} forms
1293 Expands to a formatted version of argument @var{i}, which must have
1294 only a single value. For example, @code{^1} expands to the first
1295 argument's @code{value}.
1297 @item [:@var{a}:]@var{i}
1298 Expands @var{a} for each of the values in @var{i}. @var{a}
1299 should contain one or more @code{^@var{j}} conversions, which are
1300 drawn from the values for argument @var{i} in order. Some examples
1305 All of the values for the first argument, concatenated.
1308 Expands to the values for the first argument, each followed by
1312 Expands to @code{@var{x} = @var{y}} where @var{x} is the second
1313 argument's first value and @var{y} is its second value. (This would
1314 be used only if the argument has two values. If there were more
1315 values, the second and third values would be directly concatenated,
1316 which would look funny.)
1319 @item [@var{a}:@var{b}:]@var{i}
1320 This extends the previous form so that the first values are expanded
1321 using @var{a} and later values are expanded using @var{b}. For an
1322 unknown reason, within @var{a} the @code{^@var{j}} conversions are
1323 instead written as @code{%@var{j}}. Some examples from the corpus:
1327 Expands to all of the values for the first argument, separated by
1330 @item [%1 = %2:, ^1 = ^2:]1
1331 Given appropriate values for the first argument, expands to @code{X =
1335 Given appropriate values, expands to @code{1, 2, 3}.
1339 The format string is localized to the user's locale.
1342 @node SPV Light Member ValueMod
1343 @subsection ValueMod
1345 A ValueMod can specify special modifications to a Value.
1350 31 i0 (i0 @math{|} i1 string[@t{subscript}])
1351 v1(00 (i1 @math{|} i2) 00 00 int 00 00)
1352 v3(count(FormatString StylePair))
1353 @math{|} 31 int[@t{n-refs}] int16*[@t{n-refs}] Format
1356 Format @result{} 00 00 count(FormatString Style 58)
1357 FormatString @result{} count((count((i0 58)?) (58 @math{|} 31 string))?)
1364 bool[@t{bold}] bool[@t{italic}] bool[@t{underline}] bool[@t{show}]
1365 string[@t{fgcolor}] string[@t{bgcolor}]
1366 string[@t{typeface}] byte[@t{size}]
1369 int[@t{halign}] int[@t{valign}] double[@t{offset}]
1370 int16[@t{left-margin}] int16[@t{right-margin}]
1371 int16[@t{top-margin}] int16[@t{bottom-margin}]
1375 A ValueMod that begins with ``31 i0'' specifies a string to append to
1376 the main text of the Value, as a subscript. The subscript text is a
1377 brief indicator, e.g.@: @samp{a} or @samp{a,b}, with its meaning
1378 indicated by the table caption. In this usage, subscripts are similar
1379 to footnotes. One apparent difference is that a Value can only
1380 reference one footnote but a subscript can list more than one letter.
1382 A ValueMod that begins with 31 followed by a nonzero ``int'' specifies
1383 a footnote or footnotes that the Value references. Footnote markers
1384 are shown appended to the main text of the Value, as superscripts.
1386 The Format, if present, is a format string for substitutions using the
1387 syntax explained previously. It appears to be an English-language
1388 version of the localized format string in the Value in which the
1391 Style and Style2, if present, change the style for this individual
1392 Value. @code{bold}, @code{italic}, and @code{underline} control the
1393 particular style. @code{fgcolor} and @code{bgcolor} are strings, such
1394 as @code{#ffffff}. The @code{size} is a font size in units of 1/96
1397 @code{halign} is 0 for center, 2 for left, 4 for right, 6 for decimal,
and 0xffffffad for mixed. For decimal alignment, @code{offset} is the
1399 decimal point's offset from the right side of the cell, in units of
1402 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
1405 @code{left-margin}, @code{right-margin}, @code{top-margin}, and
1406 @code{bottom-margin} are in units of 1/72 inch.
1408 @node SPV Legacy Detail Member Binary Format
1409 @section Legacy Detail Member Binary Format
1411 Whereas the light binary format represents everything about a given
1412 pivot table, the legacy binary format conceptually consists of a
1413 number of named sources, each of which consists of a number of named
1414 variables, each of which is a 1-dimensional array of numbers or
1415 strings or a mix. Thus, the legacy binary member format is quite
1418 This section uses the same context-free grammar notation as in the
1419 previous section, with the following additions:
1423 In a version 0xaf legacy member, @var{x}; in other versions, nothing.
1424 (The legacy member header indicates the version; see below.)
1427 In a version 0xb0 legacy member, @var{x}; in other versions, nothing.
1430 A legacy detail member @file{.bin} has the following overall format:
1434 LegacyBinary @result{}
1435 00 byte[@t{version}] int16[@t{n-sources}] int[@t{member-size}]
1436 Metadata*[@t{n-sources}] Data*[@t{n-sources}]
1440 @code{version} is a version number that affects the interpretation of
1441 some of the other data in the member. Versions 0xaf and 0xb0 are
1442 known. We will refer to ``version 0xaf'' and ``version 0xb0'' members
1445 A legacy member consists of @code{n-sources} data sources, each of
1446 which has Metadata and Data.
1448 @code{member-size} is the size of the legacy binary member, in bytes.
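
As a sketch of how a reader might parse the fixed 8-byte header, the
following Python function unpacks it with the standard @code{struct}
module. It assumes little-endian byte order, as in the light binary
format; the function name @code{parse_legacy_header} is hypothetical.

```python
import struct

def parse_legacy_header(member):
    """Return (version, n_sources, member_size) from the first
    8 bytes of a legacy binary member."""
    if len(member) < 8:
        raise ValueError("legacy member too short for its header")
    # 00 byte[version] int16[n-sources] int[member-size],
    # assumed little-endian.
    zero, version, n_sources, member_size = struct.unpack_from(
        "<BBHI", member, 0)
    if zero != 0:
        raise ValueError("expected a leading 00 byte")
    if version not in (0xAF, 0xB0):
        raise ValueError("unknown legacy member version")
    return version, n_sources, member_size
```

A stricter reader could also compare @code{member_size} against the
actual length of the member.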
1450 The following sections go into more detail.
1453 * SPV Legacy Member Metadata::
1454 * SPV Legacy Member Data::
1457 @node SPV Legacy Member Metadata
1458 @subsection Metadata
1463 int[@t{n-data}] int[@t{n-variables}] int[@t{offset}]
1464 vAF(byte*32[@t{source-name}])
1465 vB0(byte*64[@t{source-name}] int[@t{x}])
1469 A data source has @code{n-variables} variables, each with
1470 @code{n-data} data values.
1472 @code{source-name} is a 32- or 64-byte string padded on the right with
1473 zero bytes. The names that appear in the corpus are very generic:
1474 usually @code{tableData} for pivot table data or @code{source0} for
1477 A given Metadata's @code{offset} is the offset, in bytes, from the
1478 beginning of the member to the start of the corresponding Data. This
1479 allows programs to skip to the beginning of the data for a particular
1480 source; it is also important to determine whether a source includes
1481 any string data (@pxref{SPV Legacy Member Data}).
1483 The meaning of @code{x} in version 0xb0 is unknown.
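
A Metadata record could be read along the following lines. This is
only a sketch: it assumes little-endian integers, decodes
@code{source-name} as ASCII (the actual encoding is not documented
here), and uses the hypothetical name @code{parse_metadata}.

```python
import struct

def parse_metadata(member, pos, version):
    """Parse one Metadata record at byte offset `pos`.

    Returns (record, new_pos), where record is a dict."""
    if version not in (0xAF, 0xB0):
        raise ValueError("unknown legacy member version")
    name_len = 32 if version == 0xAF else 64
    # Version 0xb0 appends an int of unknown meaning after the name.
    size = 12 + name_len + (4 if version == 0xB0 else 0)
    if pos + size > len(member):
        raise ValueError("truncated Metadata")
    n_data, n_variables, offset = struct.unpack_from("<iii", member, pos)
    raw_name = member[pos + 12:pos + 12 + name_len]
    # The name is padded on the right with zero bytes.
    source_name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
    record = dict(n_data=n_data, n_variables=n_variables,
                  offset=offset, source_name=source_name)
    return record, pos + size
```

Calling it @code{n-sources} times, starting at offset 8 and feeding
each returned position into the next call, walks all of the Metadata.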
1485 @node SPV Legacy Member Data
1490 Data @result{} NumericData*[@t{n-variables}] StringData?
1491 NumericData @result{} byte*288[@t{variable-name}] double*[@t{n-data}]
1495 Data follow the Metadata in the legacy binary format, with sources in
the same order. Each NumericData begins with a @code{variable-name}
1497 that generally indicates its role in the pivot table, e.g.@: ``cell'',
1498 ``cellFormat'', ``dimension0categories'', ``dimension0group0'',
1499 followed by the numeric data, one double per datum. A double with the
1500 maximum negative double @code{-DBL_MAX} represents the system-missing
1505 StringData @result{} i1 string[@t{source-name}] Pairs Labels
Pairs @result{} int[@t{n-string-vars}] PairVar*[@t{n-string-vars}]
1508 PairVar @result{} string[@t{pair-var-name}] int[@t{n-pairs}] Pair*[@t{n-pairs}]
1509 Pair @result{} int[@t{i}] int[@t{j}]
1511 Labels @result{} int[@t{n-labels}] Label*[@t{n-labels}]
Label @result{} int[@t{frequency}] string[@t{s}]
1516 A source may include a mix of numeric and string data values. When a
1517 source includes any string data, the data values that are strings are
1518 set to SYSMIS in the NumericData, and StringData follows the
1519 NumericData. A source that contains no string data omits the
1520 StringData. To reliably determine whether a source includes
1521 StringData, the reader should check whether the offset following the
1522 NumericData is the offset of the next source, as indicated by its
1523 Metadata (or the end of the member, in the case of the last source).
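
The check can be sketched in Python as below. The sizes come from
the NumericData grammar (a 288-byte @code{variable-name} followed by
@code{n-data} 8-byte doubles); the function names are hypothetical.

```python
VARIABLE_NAME_SIZE = 288   # byte*288[variable-name]
DOUBLE_SIZE = 8            # each datum is one double

def numeric_data_end(data_offset, n_variables, n_data):
    """Offset just past the NumericData of a source whose Data
    starts at `data_offset`."""
    per_variable = VARIABLE_NAME_SIZE + DOUBLE_SIZE * n_data
    return data_offset + n_variables * per_variable

def has_string_data(data_offset, n_variables, n_data, next_offset):
    """True if StringData follows the NumericData.

    `next_offset` is the offset of the next source's Data, or the
    member size for the last source."""
    end = numeric_data_end(data_offset, n_variables, n_data)
    if end > next_offset:
        raise ValueError("NumericData overruns the next source")
    return end < next_offset
```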
1525 StringData repeats the name of the source (from Metadata).
1527 The string data overlays the numeric data. @code{n-string-vars} is
1528 the number of variables in the source that include string data. More
1529 precisely, it is the 1-based index of the last variable in the source
1530 that includes any string data; thus, it would be 4 if there are 5
1531 variables and only the fourth one includes string data.
Each PairVar consists of a sequence of 0 or more Pair nonterminals,
each of which maps from a 0-based index @code{i} within the variable
to a 0-based label index @code{j}, e.g.@: pair @code{i} = 2,
@code{j} = 3, means that the variable's third data value (with value
SYSMIS) is to be replaced by the string of the fourth Label.
1539 The labels themselves follow the pairs. The valuable part of each
1540 label is the string @code{s}. Each label also includes a
1541 @code{frequency} that reports the number of pairs that reference it
1542 (although this is not useful).
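
The overlay can be sketched in Python as follows. The sketch assumes
that the @math{k}th PairVar applies to the @math{k}th variable in the
source, which is suggested by the description of
@code{n-string-vars} but not confirmed here; a reader might instead
match on @code{pair-var-name}. The function name
@code{overlay_strings} is hypothetical.

```python
import sys

SYSMIS = -sys.float_info.max   # -DBL_MAX marks system-missing

def overlay_strings(variables, pair_vars, labels):
    """Replace numeric placeholders with label strings.

    variables: list of lists of doubles, one list per variable.
    pair_vars: list of lists of (i, j) pairs, one list per PairVar,
               assumed to line up with `variables` by position.
    labels:    list of label strings."""
    result = [list(values) for values in variables]
    for var_index, pairs in enumerate(pair_vars):
        for i, j in pairs:
            # Data value i of this variable becomes label j.
            result[var_index][i] = labels[j]
    return result
```

The input lists are left untouched, so the numeric form remains
available to the caller.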
1544 @node SPV Legacy Detail Member XML Format
1545 @section Legacy Detail Member XML Format
1547 This format is still under investigation.
The design of the detail XML format is not what one would arrive at
if designing a format just for pivot tables. This is because it is a special case
1551 of a much more general format (``visualization XML'' or ``VizML'')
1552 that can describe a wide range of visualizations. Most of this
1553 generality is overkill for tables, and so we end up with a funny
1554 subset of a general-purpose format.
1556 The important elements of the detail XML format are:
1560 Variables. Variables in detail XML roughly correspond to the
1561 dimensions in a light detail member. There is one variable for each
1562 dimension, plus one variable for each level of labeling along an axis.
1564 The bulk of variables are defined with @code{sourceVariable} elements.
1565 The data for these variables comes from the associated
1566 @code{tableData.bin} member. Some variables are defined, with
1567 @code{derivedVariable} elements, as a constant or in terms of a
1568 mapping function from a source variable.
1571 Assignment of variables to axes. A variable can appear as columns, or
1572 rows, or layers. The @code{faceting} element and its sub-elements
1573 describe this assignment.
1576 All elements have an optional @code{id} attribute. In practice many
1577 elements are assigned @code{id} attributes that are never referenced.
1580 * SPV Detail visualization Element::
1581 * SPV Detail userSource Element::
1582 * SPV Detail sourceVariable Element::
1583 * SPV Detail derivedVariable Element::
1584 * SPV Detail extension Element::
1585 * SPV Detail graph Element::
1586 * SPV Detail location Element::
1587 * SPV Detail coordinates Element::
1588 * SPV Detail faceting Element::
1589 * SPV Detail facetLayout Element::
1592 @node SPV Detail visualization Element
1593 @subsection The @code{visualization} Element
1596 Parent: Document root
1600 (sourceVariable @math{|} derivedVariable)@math{+}
1608 This element has the following attributes.
1610 @defvr {Required} creator
1611 The version of the software that created this SPV file, as a string of
1612 the form @code{xxyyzz}, which represents software version xx.yy.zz,
1613 e.g.@: @code{160001} is version 16.0.1. The corpus includes major
1614 versions 16 through 19.
1617 @defvr {Required} date
The date on which the file was created, as a string of the form
1622 @defvr {Required} lang
1623 The locale used for output, in Windows format, which is similar to the
1624 format used in Unix with the underscore replaced by a hyphen, e.g.@:
@code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
1628 @defvr {Required} name
1629 The title of the pivot table, localized to the output language.
1632 @defvr {Required} style
1633 The @code{id} of a @code{style} element (@pxref{SPV Detail style
1634 element}). This is the base style for the entire pivot table. In
1635 every example in the corpus, the value is @code{visualizationStyle}
1636 and the corresponding @code{style} element has no attributes other
1640 @defvr {Required} type
1641 A floating-point number. The meaning is unknown.
1644 @defvr {Required} version
1645 The visualization schema version number. In the corpus, the value is
1646 one of 2.4, 2.5, 2.7, and 2.8.
1649 @node SPV Detail userSource Element
1650 @subsection The @code{userSource} Element
1652 Parent: @code{visualization} @*
1655 This element has the following attributes.
1657 @defvr {Optional} missing
1658 Always @code{listwise}.
1661 @node SPV Detail sourceVariable Element
1662 @subsection The @code{sourceVariable} Element
1664 Parent: @code{visualization} @*
1665 Contents: @code{extension}* (@code{format} @math{|} @code{stringFormat})?
1667 This element defines a variable whose values can be used elsewhere in
1668 the visualization. It ties this element's @code{id} to a variable
1669 from the @file{tableData.bin} member that corresponds to this
1672 This element has the following attributes.
1674 @defvr {Required} categorical
1675 Always set to @code{true}.
1678 @defvr {Required} source
1679 Always set to @code{tableData}, the @code{source-name} in the
1680 corresponding @file{tableData.bin} member (@pxref{SPV Legacy Member
1684 @defvr {Required} sourceName
1685 The name of a variable within the source, the @code{variable-name} in
1686 the corresponding @file{tableData.bin} member (@pxref{SPV Legacy
1690 @defvr {Optional} dependsOn
1691 The @code{variable-name} of a variable linked to this one, so that a
1692 viewer can work with them together. For a group variable, this is the
1693 name of the corresponding categorical variable.
1696 @defvr {Optional} label
The variable label, if any.
1700 @defvr {Optional} labelVariable
1701 The @code{variable-name} of a variable whose string values correspond
1702 one-to-one with the values of this variable and are suitable for use
1706 @node SPV Detail derivedVariable Element
1707 @subsection The @code{derivedVariable} Element
1709 Parent: @code{visualization} @*
1710 Contents: @code{extension}* (@code{format} @math{|} @code{stringFormat} @code{valueMapEntry}*)
1712 Like @code{sourceVariable}, this element defines a variable whose
1713 values can be used elsewhere in the visualization. Instead of being
1714 read from a data source, the variable's data are defined by a
1715 mathematical expression.
1717 This element has the following attributes.
1719 @defvr {Required} categorical
1720 Always set to @code{true}.
1723 @defvr {Required} value
1724 An expression that defines the variable's value. In theory this could
1725 be an arbitrary expression in terms of constants, functions, and other
1726 variables, e.g.@: @math{(@var{var1} + @var{var2}) / 2}. In practice,
1727 the corpus contains only the following forms of expressions:
1730 @item constant(@var{number})
1731 @itemx constant(@var{variable})
1732 A constant. The meaning when a variable is named is unknown.
1733 Sometimes the ``variable name'' has spaces in it.
1735 @item map(@var{variable})
1736 Transforms the values in the named @var{variable} using the
1737 @code{valueMapEntry}s contained within the element.
1741 @defvr {Optional} dependsOn
1742 The @code{variable-name} of a variable linked to this one, so that a
1743 viewer can work with them together. For a group variable, this is the
1744 name of the corresponding categorical variable.
1748 * SPV Detail valueMapEntry Element::
1751 @node SPV Detail valueMapEntry Element
1752 @subsubsection The @code{valueMapEntry} Element
1754 Parent: @code{derivedVariable} @*
1757 A @code{valueMapEntry} element defines a mapping from one or more
1758 values of a source expression to a target value. (In the corpus, the
1759 source expression is always just the name of a variable.) Each target
1760 value requires a separate @code{valueMapEntry}. If multiple source
1761 values map to the same target value, they can be combined or separate.
1763 @code{valueMapEntry} has the following attributes.
1765 @defvr {Required} from
1766 A source value, or multiple source values separated by semicolons,
1767 e.g.@: @code{0} or @code{13;14;15;16}.
1770 @defvr {Required} to
1774 @node SPV Detail extension Element
1775 @subsection The @code{extension} Element
1777 This is a general-purpose ``extension'' element. Readers that don't
1778 understand a given extension should be able to safely ignore it. The
1779 attributes on this element, and their meanings, vary based on the
1780 context. Each known usage is described separately below. The current
1781 extensions use attributes exclusively, without any nested elements.
1783 @subsubheading @code{visualization} Parent Element
1785 With @code{visualization} as its parent element, @code{extension} has
1786 the following attributes.
1788 @defvr {Optional} numRows
1789 An integer that presumably defines the number of rows in the displayed
1793 @defvr {Optional} showGridline
1794 Always set to @code{false} in the corpus.
1797 @defvr {Optional} minWidthSet
1798 @defvrx {Optional} maxWidthSet
1799 Always set to @code{true} in the corpus.
1802 @subsubheading @code{container} Parent Element
1804 With @code{container} as its parent element, @code{extension} has the
1805 following attributes.
1807 @defvr {Required} combinedFootnotes
1808 Always set to @code{true} in the corpus.
1811 @subsubheading @code{sourceVariable} and @code{derivedVariable} Parent Element
1813 With @code{sourceVariable} or @code{derivedVariable} as its parent
1814 element, @code{extension} has the following attributes. A given
1815 parent element often contains several @code{extension} elements that
1816 specify the meaning of the source data's variables or sources, e.g.@:
1819 <extension from="0" helpId="corrected_model"/>
1820 <extension from="3" helpId="error"/>
1821 <extension from="4" helpId="total_9"/>
1822 <extension from="5" helpId="corrected_total"/>
1825 @defvr {Required} from
1826 An integer or a name like ``dimension0''.
1829 @defvr {Required} helpId
1833 @node SPV Detail graph Element
1834 @subsection The @code{graph} Element
1836 Parent: @code{visualization} @*
1837 Contents: @code{location}@math{+} @code{coordinates} @code{faceting} @code{facetLayout} @code{interval}
1839 @code{graph} has the following attributes.
1841 @defvr {Required} cellStyle
1842 @defvrx {Required} style
1843 Each of these is the @code{id} of a @code{style} element (@pxref{SPV
1844 Detail style element}). The former is the default style for
1845 individual cells, the latter for the entire table.
1848 @node SPV Detail location Element
1849 @subsection The @code{location} Element
1851 Parent: @code{graph} @*
1854 Each instance of this element specifies where some part of the table
1855 frame is located. All the examples in the corpus have four instances
1856 of this element, one for each of the parts @code{height},
1857 @code{width}, @code{left}, and @code{top}. Some examples in the
1858 corpus add a fifth for part @code{bottom}, even though it is not clear
how all of @code{top}, @code{bottom}, and @code{height} can be honored
1860 at the same time. In any case, @code{location} seems to have little
1861 importance in representing tables; a reader can safely ignore it.
1863 @defvr {Required} part
1864 One of @code{height}, @code{width}, @code{top}, @code{bottom}, or
1865 @code{left}. Presumably @code{right} is acceptable as well but the
1866 corpus contains no examples.
1869 @defvr {Required} method
1870 How the location is determined:
1874 Based on the natural size of the table. Observed only for
1875 parts @code{height} and @code{width}.
1878 Based on the location specified in @code{target}. Observed only for
1879 parts @code{top} and @code{bottom}.
1882 Using the value in @code{value}. Observed only for parts @code{top},
1883 @code{bottom}, and @code{left}.
1886 Same as the specified @code{target}. Observed only for part
1891 @defvr {Optional} min
1892 Minimum size. Only observed with value @code{100pt}. Only observed
1893 for part @code{width}.
1896 @defvr {Dependent} target
1897 Required when @code{method} is @code{attach} or @code{same}, not
1898 observed otherwise. This is the ID of an element to attach to.
1899 Observed with the ID of @code{title}, @code{footnote}, @code{graph},
1903 @defvr {Dependent} value
1904 Required when @code{method} is @code{fixed}, not observed otherwise.
1905 Observed values are @code{0%}, @code{0px}, @code{1px}, and @code{3px}
1906 on parts @code{top} and @code{left}, and @code{100%} on part
1910 @node SPV Detail coordinates Element
1911 @subsection The @code{coordinates} Element
1913 Parent: @code{graph} @*
1916 This element is always present and always empty, with no attributes
1919 @node SPV Detail faceting Element
1920 @subsection The @code{faceting} Element
1922 Parent: @code{graph} @*
1923 Contents: @code{cross} @code{layer}*
1925 The @code{faceting} element describes the row, column, and layer
1926 structure of the table. Its @code{cross} child determines the row and
1927 column structure, and each @code{layer} child (if any) represents a
1930 @code{faceting} has no attributes (other than @code{id}).
1932 @subsubheading The @code{cross} Element
1934 Parent: @code{faceting} @*
1935 Contents: @code{nest} @code{nest}
1937 The @code{cross} element describes the row and column structure of the
1938 table. It has exactly two @code{nest} children, the first of which
1939 describes the table's rows and the second the table's columns.
1941 @code{cross} has no attributes (other than @code{id}).
1943 @subsubheading The @code{nest} Element
1945 Parent: @code{cross} @*
1946 Contents: @code{variableReference}@math{+}
1948 A given @code{nest} usually consists of one or more dimensions, each
1949 of which is represented by @code{variableReference} child elements.
1950 Minimally, a dimension has two @code{variableReference} children, one
for the categories, one for the data, e.g.@:
1955 <variableReference ref="dimension0categories"/>
1956 <variableReference ref="dimension0"/>
1961 Groups of categories introduce additional variable references, e.g.@:
1965 <variableReference ref="dimension0categories"/>
1966 <variableReference ref="dimension0group0"/>
1967 <variableReference ref="dimension0"/>
1972 Grouping can be hierarchical, e.g.@:
1976 <variableReference ref="dimension0categories"/>
1977 <variableReference ref="dimension0group1"/>
1978 <variableReference ref="dimension0group0"/>
1979 <variableReference ref="dimension0"/>
1984 XXX what are group maps?
1987 <nest id="nest_1973">
1988 <variableReference ref="dimension1categories"/>
1989 <variableReference ref="dimension1group1map"/>
1990 <variableReference ref="dimension1group0map"/>
1991 <variableReference ref="dimension1"/>
1994 <variableReference ref="dimension0categories"/>
1995 <variableReference ref="dimension0group0map"/>
1996 <variableReference ref="dimension0"/>
2001 A @code{nest} can contain multiple dimensions:
2005 <variableReference ref="dimension1categories"/>
2006 <variableReference ref="dimension1group0"/>
2007 <variableReference ref="dimension1"/>
2008 <variableReference ref="dimension0categories"/>
2009 <variableReference ref="dimension0"/>
2013 One @code{nest} within a given @code{cross} may have no dimensions, in
2014 which case it still has one @code{variableReference} child, which
2015 references a @code{derivedVariable} whose @code{value} attribute is
2016 @code{constant(0)}. In the corpus, such a @code{derivedVariable} has
2017 @code{row} or @code{column}, respectively, as its @code{id}.
2019 @code{nest} has no attributes (other than @code{id}).
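The grouping of @code{variableReference} children into dimensions can be sketched in Python as follows. This is only an illustration: it assumes that @code{ref} values follow the naming pattern seen in the examples above (@code{dimension@var{n}categories}, @code{dimension@var{n}group@var{m}}, optionally with a @code{map} suffix, and @code{dimension@var{n}}), which the format does not guarantee. The helper name @code{nest_dimensions} is hypothetical, and XML namespaces are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Assumed naming pattern for refs, taken from the examples in the text.
REF_RE = re.compile(r"dimension(\d+)(categories|group\d+(?:map)?)?$")

def nest_dimensions(nest):
    """Return [(dimension index, [ref, ...]), ...] in document order."""
    dims = []
    for child in nest.findall("variableReference"):
        ref = child.get("ref")
        m = REF_RE.match(ref)
        if m is None:
            # For example, a constant(0) derivedVariable such as "row".
            continue
        index = int(m.group(1))
        # Start a new dimension whenever the index changes.
        if not dims or dims[-1][0] != index:
            dims.append((index, []))
        dims[-1][1].append(ref)
    return dims

SAMPLE = """
<nest>
  <variableReference ref="dimension1categories"/>
  <variableReference ref="dimension1group0"/>
  <variableReference ref="dimension1"/>
  <variableReference ref="dimension0categories"/>
  <variableReference ref="dimension0"/>
</nest>
"""

for index, refs in nest_dimensions(ET.fromstring(SAMPLE)):
    print(index, refs)
```

A @code{nest} with no dimensions yields an empty list under this sketch, because its single reference does not match the assumed pattern.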
2021 @subsubheading The @code{variableReference} Element
2023 Parent: @code{nest} @*
2026 @code{variableReference} has one attribute.
2028 @defvr {Required} ref
2029 The @code{id} of a @code{sourceVariable} or @code{derivedVariable}
2033 @subsubheading The @code{layer} Element
2035 Parent: @code{faceting} @*
2038 Each layer is represented by a pair of @code{layer} elements. The
2039 first of this pair is for a category variable, the second for the data
2043 <layer value="0" variable="dimension0categories" visible="true"/>
2044 <layer value="dimension0" variable="dimension0" visible="false"/>
2048 @code{layer} has the following attributes.
2050 @defvr {Required} variable
2051 The @code{id} of a @code{sourceVariable} or @code{derivedVariable}
2055 @defvr {Required} value
2056 The value to select. For a category variable, this is always
2057 @code{0}; for a data variable, it is the same as the @code{variable}
2061 @defvr {Optional} visible
2062 Whether the layer is visible. Generally, category layers are visible
2063 and data layers are not, but sometimes this attribute is omitted.
2066 @defvr {Optional} method
2067 When present, this is always @code{nest}.
2070 @node SPV Detail facetLayout Element
2071 @subsection The @code{facetLayout} Element
2073 Parent: @code{graph} @*
2074 Contents: @code{tableLayout} @code{facetLevel}@math{+} @code{setCellProperties}*
2076 @subsubheading The @code{tableLayout} Element
2078 Parent: @code{facetLayout} @*
2081 @defvr {Required} verticalTitlesInCorner
2082 Always set to @code{true}.
2085 @defvr {Optional} style
2086 The @code{id} of a @code{style} element.
2089 @defvr {Optional} fitCells
2090 Always set to @code{ticks}.
2093 @subsubheading The @code{facetLevel} Element
2095 Parent: @code{facetLayout} @*
2096 Contents: @code{axis}
2098 Each @code{facetLevel} describes a @code{variableReference} or
2099 @code{layer}, and a table has one @code{facetLevel} element for
2100 each such element. For example, an SPV detail member that contains
2101 four @code{variableReference} elements and two @code{layer} elements
2102 will contain six @code{facetLevel} elements.
2104 In the corpus, @code{facetLevel} elements and the elements that they
2105 describe are always in the same order. The correspondence may also be
2106 observed in two other ways. First, one may use the @code{level}
2107 attribute, described below. Second, in the corpus, a
2108 @code{facetLevel} always has an @code{id} that is the same as the
2109 @code{id} of the element it describes with @code{_facetLevel}
2110 appended. One should not formally rely on this, of course, but it is
2111 usefully indicative.
2113 @defvr {Required} level
2114 A 1-based index into the @code{variableReference} and @code{layer}
2115 elements, e.g.@: a @code{facetLevel} with a @code{level} of 1
2116 describes the first @code{variableReference} in the SPV detail member,
2117 and in a member with four @code{variableReference} elements, a
2118 @code{facetLevel} with a @code{level} of 5 describes the first
2119 @code{layer} in the member.
2122 @defvr {Required} gap
2123 Always observed as @code{0pt}.
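As a rough illustration of the @code{level} attribute, the following Python sketch resolves each @code{facetLevel} to the element it describes by counting @code{variableReference} elements first and @code{layer} elements second. The function name @code{facet_level_targets} and the sample document are hypothetical; the sample omits XML namespaces and the @code{axis} children that real @code{facetLevel} elements contain.

```python
import xml.etree.ElementTree as ET

def facet_level_targets(graph):
    """Map each facetLevel's 1-based level to the name it describes."""
    faceting = graph.find("faceting")
    # variableReference elements come first, then layer elements,
    # each group in document order.
    names = [v.get("ref") for v in faceting.iter("variableReference")]
    names += [l.get("variable") for l in faceting.iter("layer")]
    targets = {}
    for facet_level in graph.find("facetLayout").findall("facetLevel"):
        level = int(facet_level.get("level"))
        targets[level] = names[level - 1]
    return targets

SAMPLE = """
<graph>
  <faceting>
    <cross>
      <nest>
        <variableReference ref="dimension0categories"/>
        <variableReference ref="dimension0"/>
      </nest>
      <nest>
        <variableReference ref="dimension1categories"/>
        <variableReference ref="dimension1"/>
      </nest>
    </cross>
    <layer value="0" variable="dimension2categories" visible="true"/>
    <layer value="dimension2" variable="dimension2" visible="false"/>
  </faceting>
  <facetLayout>
    <facetLevel level="1" gap="0pt"/>
    <facetLevel level="2" gap="0pt"/>
    <facetLevel level="3" gap="0pt"/>
    <facetLevel level="4" gap="0pt"/>
    <facetLevel level="5" gap="0pt"/>
    <facetLevel level="6" gap="0pt"/>
  </facetLayout>
</graph>
"""

targets = facet_level_targets(ET.fromstring(SAMPLE))
print(targets[1], targets[5])
```

With four @code{variableReference} elements and two @code{layer} elements, level 5 resolves to the first @code{layer}, matching the example given for @code{level}.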
2126 @subsubheading The @code{axis} Element
2128 Parent: @code{facetLevel} @*
2129 Contents: @code{label}? @code{majorTicks}
2131 @defvr {Attribute} style
2132 The @code{id} of a @code{style} element.
2135 @subsubheading The @code{label} Element
2137 Parent: @code{axis} or @code{labelFrame} @*
2138 Contents: @code{text}@math{+} @math{|} @code{descriptionGroup}
2140 This element represents a label on some aspect of the table. For example,
2141 the table's title is a @code{label}.
2143 The contents of the label can be one or more @code{text} elements or a
2144 @code{descriptionGroup}.
2146 @defvr {Attribute} style
2147 @defvrx {Optional} textFrameStyle
2148 Each of these is the @code{id} of a @code{style} element.
2149 @code{style} is the style of the label text, @code{textFrameStyle} the
2150 style for the frame around the label.
2153 @defvr {Optional} purpose
2154 The kind of entity being labeled, one of @code{title},
2155 @code{subTitle}, @code{layer}, or @code{footnote}.
2158 @subsubheading The @code{descriptionGroup} Element
2160 Parent: @code{label} @*
2161 Contents: (@code{description} @math{|} @code{text})@math{+}
2163 A @code{descriptionGroup} concatenates one or more elements to form a
2164 label. Each element can be a @code{text} element, which contains
2165 literal text, or a @code{description} element that substitutes a value
2168 @defvr {Attribute} target
2169 The @code{id} of an element being described. In the corpus, this is
2170 always @code{faceting}.
2173 @defvr {Attribute} separator
2174 A string to separate the description of multiple groups, if the
2175 @code{target} has more than one. In the corpus, this is always a
2179 Typical contents for a @code{descriptionGroup} are a value by itself:
2181 <description name="value"/>
2183 @noindent or a variable and its value, separated by a colon:
2185 <description name="variable"/><text>:</text><description name="value"/>
2188 @subsubheading The @code{description} Element
2190 Parent: @code{descriptionGroup} @*
2193 A @code{description} is like a macro that expands to some property of
2194 the target of its parent @code{descriptionGroup}.
2196 @defvr {Attribute} name
2197 The name of the property. Only @code{variable} and @code{value}
2198 appear in the corpus.
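The macro-like expansion can be sketched in Python as below. This is an assumption-laden illustration: the @code{properties} dictionary stands in for whatever mechanism supplies the target's @code{variable} and @code{value} strings, the sample strings are invented, and the @code{separator} attribute and XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET

def expand_description_group(group, properties):
    """Concatenate the text and description children of a descriptionGroup.

    `properties` is a hypothetical dict giving the target's "variable"
    and "value" strings; obtaining them is outside this sketch.
    """
    parts = []
    for child in group:
        if child.tag == "text":
            # Literal text.
            parts.append(child.text or "")
        elif child.tag == "description":
            # Substitute the named property of the target.
            parts.append(properties[child.get("name")])
    return "".join(parts)

SAMPLE = (
    '<descriptionGroup target="faceting">'
    '<description name="variable"/><text>:</text>'
    '<description name="value"/>'
    '</descriptionGroup>'
)

group = ET.fromstring(SAMPLE)
# Invented property strings, for illustration only.
print(expand_description_group(group, {"variable": "Gender", "value": "Female"}))
```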
2201 @subsubheading The @code{majorTicks} Element
2203 Parent: @code{axis} @*
2204 Contents: @code{gridline}?
2206 @defvr {Attribute} labelAngle
2207 @defvrx {Attribute} length
2208 Both are always @code{0}.
2211 @defvr {Attribute} style
2212 @defvrx {Attribute} tickFrameStyle
2213 Each of these is the @code{id} of a @code{style} element.
2214 @code{style} is the style of the tick labels, @code{tickFrameStyle}
2215 the style for the frames around the labels.
2218 @subsubheading The @code{gridline} Element
2220 Parent: @code{majorTicks} @*
2223 Represents ``gridlines,'' which for a table are the lines between its
2224 rows or columns (XXX?).
2226 @defvr {Attribute} style
2227 The style for the gridline.
2230 @defvr {Attribute} zOrder
2231 Observed as a number between 28 and 31. Does not seem to be
2235 @subsubheading The @code{setCellProperties} Element
2237 Parent: @code{facetLayout} @*
2238 Contents: @code{setMetaData} @code{setStyle}* @code{setFormat}@math{+} @code{union}?
2240 This element sets style properties of cells designated by the
2241 @code{target} attribute of its child elements, as further restricted
2242 by the optional @code{union} element if present. The @code{target}
2243 values often used, e.g.@: @code{graph} or @code{labeling}, actually
2244 affect every cell, so the @code{union} element is a useful
2247 @defvr {Optional} applyToConverse
2248 If present, always @code{true}. This appears to invert the meaning of
2249 the @code{target} of sub-elements: the selected cells are the ones
2250 @emph{not} designated by @code{target}. This is confusing, given the
2251 additional restrictions of @code{union}, but in the corpus
2252 @code{applyToConverse} is never present along with @code{union}.
2255 @subsubheading The @code{setMetaData} Element
2257 Parent: @code{setCellProperties} @*
2260 This element is not known to have any visible effect.
2262 @defvr {Required} target
2263 The @code{id} of an element whose metadata is to be set. In the
2264 corpus, this is always @code{graph}, the @code{id} used for the
2265 @code{graph} element.
2268 @defvr {Required} key
2269 @defvrx {Required} value
2270 A key-value pair to set for the target.
2272 In the corpus, @code{key} is @code{cellPropId} or, rarely,
2273 @code{diagProps}, and @code{value} is always the @code{id} of the
2274 parent @code{setCellProperties}.
2277 @subsubheading The @code{setStyle} Element
2279 Parent: @code{setCellProperties} @*
2282 This element associates a style with the target.
2284 @defvr {Required} target
2285 The @code{id} of an element whose style is to be set. In the corpus,
2286 this is always the @code{id} of an @code{interval}, @code{labeling},
2287 or, rarely, @code{graph} element.
2290 @defvr {Required} style
2291 The @code{id} of a @code{style} element that identifies the style to
2295 @subsubheading The @code{setFormat} Element
2298 Parent: @code{setCellProperties}
2301 @math{|} @code{numberFormat}
2302 @math{|} @code{stringFormat}@math{+}
2303 @math{|} @code{dateTimeFormat}
2306 This element sets the format of the target, ``format'' in this case
2307 meaning the SPSS print format for a variable.
2309 The details of this element vary depending on the schema version, as
2310 declared in the root @code{visualization} element's @code{version}
2311 attribute (@pxref{SPV Detail visualization Element}). In version 2.5
2312 and earlier, @code{setFormat} contains one of a number of child
2313 elements that correspond to the different varieties of print formats.
2314 In version 2.7 and later, @code{setFormat} instead always contains a
2315 @code{format} element.
2317 XXX reinvestigate the above claim about versions: it appears to be
2320 The @code{setFormat} element itself has the following attributes.
2322 @defvr {Required} target
2323 The @code{id} of an element whose format is to be set. In the corpus,
2324 this is always the @code{id} of a @code{majorTicks} or
2325 @code{labeling} element.
2328 @defvr {Optional} reset
2329 If this is @code{true}, this format overrides the target's previous
2330 format. If it is @code{false}, it adds to the previous format. In
2331 the corpus this is always @code{true}. The default behavior is
2336 * SPV Detail format Element::
2337 * SPV Detail numberFormat Element::
2338 * SPV Detail stringFormat Element::
2339 * SPV Detail dateTimeFormat Element::
2340 * SPV Detail affix Element::
2341 * SPV Detail relabel Element::
2342 * SPV Detail union Element::
2345 @node SPV Detail format Element
2346 @subsubsection The @code{format} Element
2348 Parent: @code{sourceVariable}, @code{derivedVariable}, @code{formatMapping}, @code{labeling}, @code{setFormat} @*
2349 Contents: (@code{affix}@math{+} @math{|} @code{relabel}@math{+})?
2351 This element appears only in schema version 2.7 (@pxref{SPV Detail
2352 visualization Element}).
2354 This element determines a format, equivalent to an SPSS print format.
2356 @subsubheading Attributes for All Formats
2358 These attributes apply to all kinds of formats. The most important of
2359 these attributes determines the high-level kind of formatting in use:
2361 @defvr {Optional} baseFormat
2362 Either @code{dateTime} or @code{elapsedTime}. When this attribute is
2363 omitted, this element is a numeric or string format.
2367 Whether, in the corpus, other attributes are always present (``yes''),
2368 never present (``no''), or sometimes present (``opt'') depends on
2371 @multitable {maximumFractionDigits} {@code{dateTime}} {@code{elapsedTime}} {number} {string}
2372 @headitem Attribute @tab @code{dateTime} @tab @code{elapsedTime} @tab number @tab string
2373 @item errorCharacter @tab yes @tab yes @tab yes @tab opt
2375 @item separatorChars @tab yes @tab no @tab no @tab no
2377 @item mdyOrder @tab yes @tab no @tab no @tab no
2379 @item showYear @tab yes @tab no @tab no @tab no
2380 @item yearAbbreviation @tab yes @tab no @tab no @tab no
2382 @item showMonth @tab yes @tab no @tab no @tab no
2383 @item monthFormat @tab yes @tab no @tab no @tab no
2385 @item showDay @tab yes @tab opt @tab no @tab no
2386 @item dayPadding @tab yes @tab opt @tab no @tab no
2387 @item dayOfMonthPadding @tab yes @tab no @tab no @tab no
2388 @item dayType @tab yes @tab no @tab no @tab no
2390 @item showHour @tab yes @tab opt @tab no @tab no
2391 @item hourFormat @tab yes @tab opt @tab no @tab no
2392 @item hourPadding @tab yes @tab yes @tab no @tab no
2394 @item showMinute @tab yes @tab yes @tab no @tab no
2395 @item minutePadding @tab yes @tab yes @tab no @tab no
2397 @item showSecond @tab yes @tab yes @tab no @tab no
2398 @item secondPadding @tab no @tab yes @tab no @tab no
2400 @item showMillis @tab no @tab yes @tab no @tab no
2402 @item minimumIntegerDigits @tab no @tab no @tab yes @tab no
2403 @item maximumFractionDigits @tab no @tab yes @tab yes @tab no
2404 @item minimumFractionDigits @tab no @tab yes @tab yes @tab no
2405 @item useGrouping @tab no @tab opt @tab yes @tab no
2406 @item scientific @tab no @tab no @tab yes @tab no
2407 @item small @tab no @tab no @tab opt @tab no
2408 @item suffix @tab no @tab no @tab opt @tab no
2410 @item tryStringsAsNumbers @tab no @tab no @tab no @tab yes
2414 @defvr {Attribute} errorCharacter
2415 A character that replaces the formatted value when it cannot otherwise
2416 be represented in the given format. Always @samp{*}.
2419 @subsubheading Date and Time Attributes
2421 These attributes are used with @code{dateTime} and @code{elapsedTime}
2424 @defvr {Attribute} separatorChars
2425 Exactly four characters. In order, these are used for: decimal point,
2426 grouping, date separator, time separator. Always @samp{.,-:}.
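A reader might unpack @code{separatorChars} along these lines. This Python sketch is only illustrative; the @code{Separators} type and its field names are hypothetical and not part of the format.

```python
from typing import NamedTuple

class Separators(NamedTuple):
    """Hypothetical container for the four separatorChars characters."""
    decimal_point: str
    grouping: str
    date: str
    time: str

def parse_separator_chars(value):
    # The attribute is described as exactly four characters, in this order:
    # decimal point, grouping, date separator, time separator.
    if len(value) != 4:
        raise ValueError("separatorChars must be exactly four characters")
    return Separators(*value)

seps = parse_separator_chars(".,-:")
print(seps.date, seps.time)
```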
2429 @defvr {Attribute} mdyOrder
2430 Within a date, the order of the days, months, and years.
2431 @code{dayMonthYear} is the only observed value, but one would expect
2432 @code{monthDayYear} and @code{yearMonthDay} to be reasonable as
2436 @defvr {Attribute} showYear
2437 @defvrx {Attribute} yearAbbreviation
2438 Whether to include the year and, if so, whether the year should be
2439 shown abbreviated, that is, with only 2 digits. Each is @code{true}
2440 or @code{false}; only values of @code{true} and @code{false},
2441 respectively, have been observed.
2444 @defvr {Attribute} showMonth
2445 @defvrx {Attribute} monthFormat
2446 Whether to include the month (@code{true} or @code{false}) and, if so,
2447 how to format it. @code{monthFormat} is one of the following:
2451 The full name of the month, e.g.@: in an English locale,
2455 The abbreviated name of the month, e.g.@: in an English locale,
2459 The number representing the month, e.g.@: 9 for September.
2462 A two-digit number representing the month, e.g.@: 09 for September.
2465 Only values of @code{true} and @code{short}, respectively, have been
2469 @defvr {Attribute} dayPadding
2470 @defvrx {Attribute} dayOfMonthPadding
2471 @defvrx {Attribute} hourPadding
2472 @defvrx {Attribute} minutePadding
2473 @defvrx {Attribute} secondPadding
2474 These attributes presumably control whether each field in the output
2475 is padded with spaces to its maximum width, but the details are not
2476 understood. The only observed value for any of these attributes is
2480 @defvr {Attribute} showDay
2481 @defvrx {Attribute} showHour
2482 @defvrx {Attribute} showMinute
2483 @defvrx {Attribute} showSecond
2484 @defvrx {Attribute} showMillis
2485 These attributes presumably control whether each field is displayed
2486 in the output, but the details are not understood. The only
2487 observed value for any of these attributes is @code{true}.
2490 @defvr {Attribute} dayType
2491 This attribute is always @code{month} in the corpus, specifying that
2492 the day of the month is to be displayed; a value of @code{year} is
2493 supposed to indicate that the day of the year, where 1 is January 1,
2494 is to be displayed instead.
2497 @defvr {Attribute} hourFormat
2498 @code{hourFormat}, if present, is one of:
2502 The time is displayed with an @code{am} or @code{pm} suffix, e.g.@:
2506 The time is displayed in a 24-hour format, e.g.@: @code{22:15}.
2508 This is the only value observed in the corpus.
2511 The time is displayed in a 12-hour format, without distinguishing
2512 morning or evening, e.g.@: @code{10:15}.
2515 @code{hourFormat} is sometimes present for @code{elapsedTime} formats,
2516 which is confusing since a time duration does not have a concept of AM
2517 or PM. This might indicate a bug in the code that generated the XML
2518 in the corpus, or it might indicate that @code{elapsedTime} is
2519 sometimes used to format a time of day.
2522 @subsubheading Numeric Attributes
2524 These attributes are used for numeric formats, for which
2525 @code{baseFormat} is omitted. Attributes @code{maximumFractionDigits},
2526 @code{minimumFractionDigits}, and @code{useGrouping} are also used
2527 when @code{baseFormat} is @code{elapsedTime}.
2529 @defvr {Attribute} minimumIntegerDigits
2530 Minimum number of digits to display before the decimal point. Always
2531 observed as @code{0}.
2534 @defvr {Attribute} maximumFractionDigits
2535 @defvrx {Attribute} minimumFractionDigits
2536 Maximum or minimum, respectively, number of digits to display after
2537 the decimal point. The observed values of each attribute range from 0
2541 @defvr {Attribute} useGrouping
2542 Whether to use the grouping character to group digits in large
2543 numbers. It would make sense for the grouping character to come from
2544 the @code{separatorChars} attribute, but that attribute is only
2545 present when @code{baseFormat} is @code{dateTime} or
2546 @code{elapsedTime}, in the corpus at least. Perhaps that is because
2547 this attribute has only been observed as @code{false}.
2550 @defvr {Attribute} scientific
2551 This attribute controls when and whether the number is formatted in
2552 scientific notation. It takes the following values:
2556 Use scientific notation only when the number's magnitude is smaller
2557 than the value of the @code{small} attribute.
2560 Use scientific notation when the number will not otherwise fit in the
2564 Always use scientific notation. Not observed in the corpus.
2567 Never use scientific notation. A number that won't otherwise fit will
2568 be replaced by an error indication (see the @code{errorCharacter}
2569 attribute). Not observed in the corpus.
2573 @defvr {Optional} small
2574 Only present when the @code{scientific} attribute is
2575 @code{onlyForSmall}, this is a numeric magnitude below which the
2576 number will be formatted in scientific notation. The values @code{0}
2577 and @code{0.0001} have been observed. The value @code{0} seems like a
2578 pathological choice, since no real number has a magnitude less than 0;
2579 perhaps in practice such a choice is equivalent to setting
2580 @code{scientific} to @code{false}.
2583 @defvr {Optional} prefix
2584 @defvrx {Optional} suffix
2585 Specifies a prefix or a suffix to apply to the formatted number. Only
2586 @code{suffix} has been observed, with value @samp{%}.
2589 @subsubheading String Attributes
2591 These attributes are used for formats when @code{baseFormat} is
2594 @defvr {Attribute} tryStringsAsNumbers
2595 When this is @code{true}, it is supposed to indicate that string
2596 values should be parsed as numbers and then displayed according to
2597 numeric formatting rules. However, in the corpus it is always
2601 @node SPV Detail numberFormat Element
2602 @subsubsection The @code{numberFormat} Element
2604 Parent: @code{setFormat} @*
2605 Contents: @code{affix}@math{+}
2607 This element appears only in schema version 2.5 and earlier
2608 (@pxref{SPV Detail visualization Element}). Possibly this element
2609 could also contain @code{relabel} elements in a more diverse corpus.
2611 This element has the following attributes.
2613 @defvr {Attribute} maximumFractionDigits
2614 @defvrx {Attribute} minimumFractionDigits
2615 @defvrx {Attribute} minimumIntegerDigits
2616 @defvrx {Optional} scientific
2617 @defvrx {Optional} small
2618 @defvrx {Optional} suffix
2619 @defvrx {Optional} useGrouping
2620 The syntax and meaning of these attributes is the same as on the
2621 @code{format} element for a numeric format. @pxref{SPV Detail format
2625 @node SPV Detail stringFormat Element
2626 @subsubsection The @code{stringFormat} Element
2628 Parent: @code{setFormat} @*
2629 Contents: (@code{affix}@math{+} @math{|} @code{relabel}@math{+})?
2631 This element appears only in schema version 2.5 and earlier
2632 (@pxref{SPV Detail visualization Element}).
2634 This element has no attributes.
2636 @node SPV Detail dateTimeFormat Element
2637 @subsubsection The @code{dateTimeFormat} Element
2639 Parent: @code{setFormat} @*
2642 This element appears only in schema version 2.5 and earlier
2643 (@pxref{SPV Detail visualization Element}). Possibly this element
2644 could also contain @code{affix} and @code{relabel} elements in a more
2647 The following attribute is required.
2649 @defvr {Attribute} baseFormat
2650 Either @code{dateTime} or @code{time}.
2653 When @code{baseFormat} is @code{dateTime}, the following attributes
2656 @defvr {Attribute} dayOfMonthPadding
2657 @defvrx {Attribute} dayPadding
2658 @defvrx {Attribute} dayType
2659 @defvrx {Attribute} hourFormat
2660 @defvrx {Attribute} hourPadding
2661 @defvrx {Attribute} mdyOrder
2662 @defvrx {Attribute} minutePadding
2663 @defvrx {Attribute} monthFormat
2664 @defvrx {Attribute} separatorChars
2665 @defvrx {Attribute} showDay
2666 @defvrx {Attribute} showHour
2667 @defvrx {Attribute} showMinute
2668 @defvrx {Attribute} showMonth
2669 @defvrx {Attribute} showSecond
2670 @defvrx {Attribute} showYear
2671 @defvrx {Attribute} yearAbbreviation
2672 The syntax and meaning of these attributes is the same as on the
2673 @code{format} element when that element's @code{baseFormat} is
2674 @code{dateTime}. @xref{SPV Detail format Element}.
2677 When @code{baseFormat} is @code{time}, the following attributes are
2680 @defvr {Attribute} hourFormat
2681 @defvrx {Attribute} hourPadding
2682 @defvrx {Attribute} minutePadding
2683 @defvrx {Attribute} monthFormat
2684 @defvrx {Attribute} separatorChars
2685 @defvrx {Attribute} showDay
2686 @defvrx {Attribute} showHour
2687 @defvrx {Attribute} showMinute
2688 @defvrx {Attribute} showMonth
2689 @defvrx {Attribute} showSecond
2690 @defvrx {Attribute} showYear
2691 @defvrx {Attribute} yearAbbreviation
2692 The syntax and meaning of these attributes is the same as on the
2693 @code{format} element when that element's @code{baseFormat} is
2694 @code{elapsedTime}. @xref{SPV Detail format Element}.
2697 @node SPV Detail affix Element
2698 @subsubsection The @code{affix} Element
2700 Parent: @code{format} or @code{numberFormat} or @code{stringFormat} @*
2703 Possibly this element could have @code{dateTimeFormat} as a parent in
2704 a more diverse corpus.
2706 This defines a suffix (or, theoretically, a prefix) for a formatted
2707 value. It is used to insert a reference to a footnote. It has the
2708 following attributes:
2710 @defvr {Attribute} definesReference
2711 This specifies the footnote number as a natural number: 1 for the
2712 first footnote, 2 for the second, and so on.
2715 @defvr {Attribute} position
2716 Position for the footnote label. Always @code{superscript}.
2719 @defvr {Attribute} suffix
2720 Whether the affix is a suffix (@code{true}) or a prefix
2721 (@code{false}). Always @code{true}.
2724 @defvr {Attribute} value
2725 The text of the suffix or prefix. Typically a letter, e.g.@: @code{a}
2726 for footnote 1, @code{b} for footnote 2, @enddots{} The corpus
2727 contains other values: @code{*}, @code{**}, and a few that begin with
2728 at least one comma: @code{,b}, @code{,c}, @code{,,b}, and @code{,,c}.
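One plausible way to apply @code{affix} elements to an already formatted value is sketched below in Python. The function name @code{apply_affixes} is hypothetical. The sketch ignores the @code{position} attribute, so the marker is appended as plain text instead of as a superscript, and it does not interpret the comma-prefixed values seen in the corpus. XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET

def apply_affixes(formatted, format_elem):
    """Return `formatted` with the affix children of `format_elem` applied."""
    prefix = ""
    suffix = ""
    for affix in format_elem.findall("affix"):
        # suffix="true" marks a suffix; "false" would mark a prefix.
        if affix.get("suffix") == "true":
            suffix += affix.get("value")
        else:
            prefix += affix.get("value")
    return prefix + formatted + suffix

SAMPLE = (
    '<format>'
    '<affix definesReference="1" position="superscript" '
    'suffix="true" value="a"/>'
    '</format>'
)

print(apply_affixes("12.5", ET.fromstring(SAMPLE)))
```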
2731 @node SPV Detail relabel Element
2732 @subsubsection The @code{relabel} Element
2734 Parent: @code{format} or @code{stringFormat} @*
2737 Possibly this element could have @code{numberFormat} or
2738 @code{dateTimeFormat} as a parent in a more diverse corpus.
2740 This specifies how to display a given value. It is used to implement
2741 value labels and to display the system-missing value in a
2742 human-readable way. It has the following attributes:
2744 @defvr {Attribute} from
2745 The value to map. In the corpus this is an integer or the
2746 system-missing value @code{-1.797693134862316E300}.
2749 @defvr {Attribute} to
2750 The string to display in place of the value of @code{from}. In the
2751 corpus this is a wide variety of value labels; the system-missing
2752 value is mapped to @samp{.}.
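The mapping that @code{relabel} elements describe can be sketched in Python as a dictionary lookup. This is an illustration under assumptions: @code{from} is parsed as a floating-point number, the label strings in the sample are invented, and the helper names @code{relabel_map} and @code{display} are hypothetical.

```python
import xml.etree.ElementTree as ET

def relabel_map(format_elem):
    """Return a dict from each relabel's numeric `from` to its `to` string."""
    return {
        float(r.get("from")): r.get("to")
        for r in format_elem.findall("relabel")
    }

def display(value, mapping):
    """Show the relabeled string if one exists, else the plain value."""
    return mapping[value] if value in mapping else str(value)

SAMPLE = """
<stringFormat>
  <relabel from="1" to="Male"/>
  <relabel from="2" to="Female"/>
  <relabel from="-1.797693134862316E300" to="."/>
</stringFormat>
"""

mapping = relabel_map(ET.fromstring(SAMPLE))
print(display(1.0, mapping))
print(display(float("-1.797693134862316E300"), mapping))
```

Values without a matching @code{relabel} fall through to ordinary formatting, which this sketch approximates with @code{str}.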
2755 @node SPV Detail union Element
2756 @subsubsection The @code{union} Element
2758 Parent: @code{setCellProperties} @*
2759 Contents: @code{intersect}@math{+}
2761 This element represents a set of cells, computed as the union of the
2762 sets represented by each of its children.
2764 @subsubheading The @code{intersect} Element
2766 Parent: @code{union} @*
2767 Contents: @code{where}@math{+} @math{|} @code{intersectWhere}?
2769 This element represents a set of cells, computed as the intersection
2770 of the sets represented by each of its children.
2772 Of the two possible children, in the corpus @code{where} is far more
2773 common, appearing thousands of times, whereas @code{intersectWhere}
2774 only appears 4 times.
2776 Most @code{intersect} elements have two or more children.
2778 @subsubheading The @code{where} Element
2780 Parent: @code{intersect} @*
2783 This element represents the set of cells in which the value of a
2784 specified variable falls within a specified set.
2786 @defvr {Attribute} variable
2787 The @code{id} of a variable, e.g.@: @code{dimension0categories} or
2788 @code{dimension0group0map}.
2791 @defvr {Attribute} include
2792 A value, or multiple values separated by semicolons,
2793 e.g.@: @code{0} or @code{13;14;15;16}.
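The set logic of @code{union}, @code{intersect}, and @code{where} can be sketched in Python as follows. Several things here are assumptions: a cell is modeled as a hypothetical dictionary from variable @code{id} to a value string, values are compared as strings, XML namespaces are ignored, and @code{intersectWhere} is skipped because its meaning is unknown.

```python
import xml.etree.ElementTree as ET

def parse_include(include):
    """Split a semicolon-separated include attribute into a set of strings."""
    return set(include.split(";"))

def cell_selected(union, cell):
    """Return True if `cell` is in the set of cells described by `union`.

    `cell` is a hypothetical dict mapping variable ids to value strings.
    """
    for intersect in union.findall("intersect"):
        wheres = intersect.findall("where")
        # A cell is in an intersect if it satisfies every where child.
        if wheres and all(
            cell.get(w.get("variable")) in parse_include(w.get("include"))
            for w in wheres
        ):
            # The union selects cells that are in any intersect.
            return True
    return False

SAMPLE = """
<union>
  <intersect>
    <where variable="dimension0categories" include="0"/>
    <where variable="dimension1categories" include="13;14;15;16"/>
  </intersect>
  <intersect>
    <where variable="dimension0categories" include="2"/>
  </intersect>
</union>
"""

union = ET.fromstring(SAMPLE)
cell = {"dimension0categories": "0", "dimension1categories": "14"}
print(cell_selected(union, cell))
```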
2796 @subsubheading The @code{intersectWhere} Element
2798 Parent: @code{intersect} @*
2801 The meaning of this element is unknown.
2803 @defvr {Attribute} variable
2804 @defvrx {Attribute} variable2
2805 The meaning of these attributes is unknown. In the four examples in
2806 the corpus they always take the values @code{dimension2categories} and
2807 @code{dimension0categories}, respectively.