1 @c PSPP - a program for statistical analysis.
2 @c Copyright (C) 2019 Free Software Foundation, Inc.
3 @c Permission is granted to copy, distribute and/or modify this document
4 @c under the terms of the GNU Free Documentation License, Version 1.3
5 @c or any later version published by the Free Software Foundation;
6 @c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
7 @c A copy of the license is included in the section entitled "GNU
8 @c Free Documentation License".
11 @node SPSS Viewer File Format
12 @appendix SPSS Viewer File Format
14 SPSS Viewer or @file{.spv} files, here called SPV files, are written
15 by SPSS 16 and later to represent the contents of its output editor.
16 This chapter documents the format, based on examination of a corpus of
17 about 8,000 files from a variety of sources. This description is
18 detailed enough to both read and write SPV files.
20 SPSS 15 and earlier versions instead use @file{.spo} files, which have
21 a completely different output format based on the Microsoft Compound
22 Document Format. This format is not documented here.
24 An SPV file is a Zip archive that can be read with @command{zipinfo}
25 and @command{unzip} and similar programs. The final member in the Zip
26 archive is the @dfn{manifest}, a file named
27 @file{META-INF/MANIFEST.MF}. This structure makes SPV files resemble
28 Java ``JAR'' files (and ODF files), but whereas a JAR manifest
29 contains a sequence of colon-delimited key/value pairs, an SPV
30 manifest contains the string @samp{allowPivoting=true}, without a
31 new-line. PSPP uses this string to identify an SPV file; it is
32 invariant across the corpus.@footnote{SPV files always begin with the
33 7-byte sequence 50 4b 03 04 14 00 08, but this is not a useful magic
34 number because most Zip archives start the same way.}@footnote{SPSS
35 writes @file{META-INF/MANIFEST.MF} to every SPV file, but it does not
36 read it or even require it to exist, so using different contents,
37 e.g.@: @samp{allowingPivot=false}, has no effect.}
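The manifest test described above can be sketched in Python as follows. This is an illustration, not PSPP's own code. The name @code{looks_like_spv} is hypothetical, and the standard @code{zipfile} module it relies on may reject some of the corrupted archives that SPSS tolerates.

@example
import zipfile

# The exact manifest contents found throughout the corpus (no new-line).
SPV_MANIFEST = b"allowPivoting=true"


def looks_like_spv(source):
    """Return True if SOURCE (a path or binary file object) is a Zip
    archive whose META-INF/MANIFEST.MF member holds exactly
    allowPivoting=true.  Hypothetical helper, not part of PSPP."""
    try:
        with zipfile.ZipFile(source) as archive:
            return archive.read("META-INF/MANIFEST.MF") == SPV_MANIFEST
    except (zipfile.BadZipFile, KeyError):
        # Not a Zip archive, or the archive has no manifest member.
        return False
@end example

Because SPSS itself never reads the manifest, a file that fails this test may still open in SPSS. The check only mirrors the way PSPP recognizes SPV files.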
39 The rest of the members in an SPV file's Zip archive fall into two
40 categories: @dfn{structure} and @dfn{detail} members. Structure
41 member names begin with @file{outputViewer@var{nnnnnnnnnn}}, where
42 each @var{n} is a decimal digit, and end with @file{.xml}, and often
43 include the string @file{_heading} in between. Each of these members
44 represents some kind of output item (a table, a heading, a block of
45 text, etc.) or a group of them. The member whose output goes at the
46 beginning of the document is numbered 0, the next member in the output
47 is numbered 1, and so on.
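A reader that needs the structure members in document order can select them by name and sort them by number. The sketch below is hypothetical. It assumes that @file{_heading} is the only text that appears between the ten digits and @file{.xml}.

@example
import re

# "outputViewer", decimal digits, an optional "_heading", then ".xml".
STRUCTURE_NAME = re.compile(r"outputViewer([0-9]+)(_heading)?\.xml")


def structure_members_in_order(names):
    """Return the structure member names among NAMES, sorted by their
    number, i.e. in the order their output appears in the document."""
    numbered = []
    for name in names:
        match = STRUCTURE_NAME.fullmatch(name)
        # Require exactly ten digits, as described above.
        if match and len(match.group(1)) == 10:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
@end example

For example, calling @code{structure_members_in_order(archive.namelist())} on an open @code{zipfile.ZipFile} skips the manifest and the detail members.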
49 Structure members contain XML. This XML is sometimes self-contained,
50 but it often references detail members in the Zip archive, which are
54 @item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
55 @itemx @file{@var{prefix}_lightTableData.bin}
56 The structure of a table plus its data. Older SPV files pair a
57 @file{@var{prefix}_table.xml} file that describes the table's
58 structure with a binary @file{@var{prefix}_tableData.bin} file that
59 gives its data. Newer SPV files (the majority of those in the corpus)
60 instead include a single @file{@var{prefix}_lightTableData.bin} file
61 that incorporates both into a single binary format.
63 @item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
64 @itemx @file{@var{prefix}_lightWarningData.bin}
65 The same format as tables, with a different name.
67 @item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
68 @itemx @file{@var{prefix}_lightNotesData.bin}
69 The same format as tables, with a different name.
71 @item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
72 The structure of a chart plus its data. Charts do not have a
75 @item @file{@var{prefix}_Imagegeneric.png}
76 @itemx @file{@var{prefix}_PastedObjectgeneric.png}
77 @itemx @file{@var{prefix}_imageData.bin}
78 A PNG image referenced by an @code{object} element (in the first two
79 cases) or an @code{image} element (in the final case). @xref{SPV
80 Structure object and image Elements}.
82 @item @file{@var{prefix}_pmml.scf}
83 @itemx @file{@var{prefix}_stats.scf}
84 @itemx @file{@var{prefix}_model.xml}
85 Not yet investigated. The corpus contains few examples.
88 The @file{@var{prefix}} in the names of the detail members is
89 typically an 11-digit decimal number that increases for each item,
90 tending to skip values. Older SPV files use different naming
91 conventions. Structure members refer to detail members by name, and so
92 their exact names do not matter to readers as long as they are unique.
94 SPSS tolerates corrupted Zip archives that Zip reader libraries tend
95 to reject. These can be fixed up with @command{zip -FF}.
98 * SPV Structure Member Format::
99 * SPV Light Detail Member Format::
100 * SPV Legacy Detail Member Binary Format::
101 * SPV Legacy Detail Member XML Format::
104 @node SPV Structure Member Format
105 @section Structure Member Format
107 A structure member lays out the high-level structure for a group of
108 output items such as headings, tables, and charts. Structure members
109 do not include the details of tables and charts but instead refer to
110 them by their member names.
112 Structure members' XML files claim conformance with a collection of
113 XML Schemas. These schemas are distributed, under a nonfree license,
114 with SPSS binaries. Fortunately, the schemas are not necessary to
115 understand the structure members. The schemas can even
116 be deceptive because they document elements and attributes that are
117 not in the corpus and do not document elements and attributes that are
118 commonly found in the corpus.
120 Structure members use a different XML namespace for each schema, but
121 these namespaces are not entirely consistent. In some SPV files, for
122 example, the @code{viewer-tree} schema is associated with namespace
123 @indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
124 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
125 additional @file{viewer/}). Under either name, the schema URIs are
126 not resolvable to obtain the schemas themselves.
128 One may ignore all of the above in interpreting a structure member.
129 The actual XML has a simple and straightforward form that does not
130 require a reader to take schemas or namespaces into account. A
131 structure member's root is a @code{heading} element, which contains
132 @code{heading} or @code{container} elements (or a mix), forming a
133 tree. In turn, @code{container} holds a @code{label} and one more
134 child, usually @code{text} or @code{table}.
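The tree shape just described can be walked with Python's standard @code{xml.dom.minidom} module. In this sketch the helper names are hypothetical. It compares @code{localName} so that the namespace variations described above do not matter, and it descends only into @code{heading} elements.

@example
from xml.dom import minidom


def child_elements(node):
    """Element children of NODE, skipping white space and comments."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def text_of(node):
    """Concatenate the text and CDATA directly inside NODE."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType in (c.TEXT_NODE, c.CDATA_SECTION_NODE))


def outline(xml_text):
    """Return (depth, element name, label) for every heading and
    container in a structure member, in document order."""
    rows = []

    def walk(node, depth):
        kids = child_elements(node)
        # The label is expected to be the first child element.
        has_label = bool(kids) and kids[0].localName == "label"
        rows.append((depth, node.localName,
                     text_of(kids[0]) if has_label else ""))
        if node.localName == "heading":
            for kid in kids:
                if kid.localName in ("heading", "container"):
                    walk(kid, depth + 1)

    walk(minidom.parseString(xml_text).documentElement, 0)
    return rows
@end example

The labels collected this way are the same text that the outline pane shows.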
136 The following sections document the elements found in structure
137 members in a context-free grammar-like fashion. Consider the
138 following example, which specifies the attributes and content for the
139 @code{container} element:
143 :visibility=(visible | hidden)
144 :page-break-before=(always)?
145 :text-align=(left | center)?
147 => label (table | container_text | graph | model | object | image | tree)
150 Each attribute specification begins with @samp{:} followed by the
151 attribute's name. If the attribute's value has an easily specified
152 form, then @samp{=} and its description follow the name. Finally, if
153 the attribute is optional, the specification ends with @samp{?}. The
154 following value specifications are defined:
157 @item (@var{a} | @var{b} | @dots{})
158 One of the listed literal strings. If only one string is listed, it
159 is the only acceptable value. If @code{OTHER} is listed, then any
160 string not explicitly listed is also accepted.
163 Either @code{true} or @code{false}.
166 A floating-point number followed by a unit, e.g.@: @code{10pt}. Units
167 in the corpus include @code{in} (inch), @code{pt} (points, 72/inch),
168 @code{px} (``device-independent pixels'', 96/inch), and @code{cm}. If
169 the unit is omitted then points should be assumed. The number and
170 unit may be separated by white space.
172 The corpus also includes localized names for units. A reader must
173 understand these to properly interpret the dimension:
177 @code{인치}, @code{pol.}, @code{cala}, @code{cali}
187 A floating-point number.
193 A color in one of the forms @code{#@var{rr}@var{gg}@var{bb}} or
194 @code{@var{rr}@var{gg}@var{bb}}, or the string @code{transparent}, or
195 one of the standard Web color names.
198 @item ref @var{element}
199 @itemx ref(@var{elem1} | @var{elem2} | @dots{})
200 The name from the @code{id} attribute in some element. If one or more
201 elements are named, the name must refer to one of those elements;
202 otherwise any element is acceptable.
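To illustrate the dimension form, the following hypothetical helper converts a dimension to points using the four English unit names listed above. It deliberately does not handle the localized unit names. A complete reader would have to map those onto the same units.

@example
import re

# Points per unit for the English unit names in the corpus.
POINTS_PER_UNIT = dict([("in", 72.0), ("pt", 1.0),
                        ("px", 72 / 96), ("cm", 72 / 2.54)])

# A number, optional white space, then an optional unit name.
DIMENSION = re.compile(r"\s*([-+]?[0-9]*\.?[0-9]+)\s*([a-z]*)\s*")


def dimension_to_points(text):
    """Convert a dimension such as '10pt' or '2.54 cm' to points.
    A missing unit means points.  Raises ValueError otherwise."""
    match = DIMENSION.fullmatch(text)
    if not match:
        raise ValueError("not a dimension: " + repr(text))
    number, unit = match.groups()
    unit = unit or "pt"
    if unit not in POINTS_PER_UNIT:
        raise ValueError("unknown unit: " + repr(unit))
    return float(number) * POINTS_PER_UNIT[unit]
@end example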
205 All elements have an optional @code{id} attribute. If present, its
206 value must be unique. In practice many elements are assigned
207 @code{id} attributes that are never referenced.
209 The content specification for an element supports the following
216 @item @var{a} @var{b}
217 @var{a} followed by @var{b}.
219 @item @var{a} | @var{b} | @var{c}
220 One of @var{a} or @var{b} or @var{c}.
223 Zero or one instances of @var{a}.
226 Zero or more instances of @var{a}.
229 One or more instances of @var{a}.
231 @item (@var{subexpression})
232 Grouping for a subexpression.
241 Element and attribute names are sometimes suffixed by another name in
242 square brackets to distinguish different uses of the same name. For
243 example, structure XML has two @code{text} elements, one inside
244 @code{container}, the other inside @code{pageParagraph}. The former
245 is defined as @code{text[container_text]} and referenced as
246 @code{container_text}, the latter defined as
247 @code{text[pageParagraph_text]} and referenced as
248 @code{pageParagraph_text}.
250 This language is used in the PSPP source code for parsing structure
251 and detail XML members. Refer to
252 @file{src/output/spv/structure-xml.grammar} and
253 @file{src/output/spv/detail-xml.grammar} for the full grammars.
255 The following example shows the contents of a typical structure member
256 for a @cmd{DESCRIPTIVES} procedure. A real structure member is not
257 indented. This example also omits most attributes, all XML namespace
258 information, and the CSS from the embedded HTML:
261 <?xml version="1.0" encoding="utf-8"?>
263 <label>Output</label>
264 <heading commandName="Descriptives">
265 <label>Descriptives</label>
268 <text commandName="Descriptives" type="title">
270 <![CDATA[<head><style type="text/css">...</style></head><BR>Descriptives]]>
274 <container visibility="hidden">
276 <table commandName="Descriptives" subType="Notes" type="note">
278 <dataPath>00000000001_lightNotesData.bin</dataPath>
283 <label>Descriptive Statistics</label>
284 <table commandName="Descriptives" subType="Descriptive Statistics"
287 <dataPath>00000000002_lightTableData.bin</dataPath>
296 * SPV Structure heading Element::
297 * SPV Structure label Element::
298 * SPV Structure container Element::
299 * SPV Structure text Element (Inside @code{container})::
300 * SPV Structure html Element::
301 * SPV Structure table Element::
302 * SPV Structure graph Element::
303 * SPV Structure model Element::
304 * SPV Structure object and image Elements::
305 * SPV Structure tree Element::
306 * SPV Structure Path Elements::
307 * SPV Structure pageSetup Element::
308 * SPV Structure @code{text} Element (Inside @code{pageParagraph})::
311 @node SPV Structure heading Element
312 @subsection The @code{heading} Element
315 heading[root_heading]
321 => label pageSetup? (container | heading)*
326 :visibility[heading_visibility]=(collapsed)?
329 => label (container | heading)*
332 The root of a structure member is a @code{heading}, which represents a
333 section of output beginning with a @code{label} and
334 ordinarily followed by content containers or further nested
335 (sub)-sections of output. Unlike heading elements in HTML and other
336 common document formats, which precede the content that they head,
337 @code{heading} contains the elements that appear below the heading.
339 The document root heading, only, may contain a @code{pageSetup}
342 The following attributes have been observed on both document root and
343 nested @code{heading} elements.
345 @defvr {Attribute} @code{creator-version}
346 The version of the software that created this SPV file. A string of
347 the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
348 e.g.@: @code{21000001} is version 21.0.0.1. Trailing pairs of zeros
349 are sometimes omitted, so that @code{21}, @code{210000}, and
350 @code{21000000} are all version 21.0.0.0 (and the corpus contains all
351 three of those forms).
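Here is a minimal sketch of decoding @code{creator-version} into its four components. It assumes that the value never has more than 8 decimal digits and that only whole trailing pairs of zeros are omitted. The function name is hypothetical.

@example
def parse_creator_version(text):
    """Decode a creator-version string such as '21000001' into a tuple
    of four integers, e.g. (21, 0, 0, 1)."""
    # Restore any omitted trailing pairs of zeros.
    digits = text.ljust(8, "0")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError("unexpected creator-version: " + repr(text))
    return tuple(int(digits[i:i + 2]) for i in range(0, 8, 2))
@end example

With this approach, all three forms found in the corpus, @code{21}, @code{210000}, and @code{21000000}, decode to the same tuple.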
355 The following attributes have been observed on document root
356 @code{heading} elements only:
358 @defvr {Attribute} @code{creator}
359 The directory in the file system of the software that created this SPV
363 @defvr {Attribute} @code{creation-date-time}
364 The date and time at which the SPV file was written, in a
365 locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
366 PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
367 December 5, 2014 5:00:19 o'clock PM EST}.
370 @defvr {Attribute} @code{lockReader}
371 Whether a reader should be allowed to edit the output. The possible
372 values are @code{true} and @code{false}. The value @code{false} is by
376 @defvr {Attribute} @code{schemaLocation}
377 This is actually an XML Namespace attribute. A reader may ignore it.
381 The following attributes have been observed only on nested
382 @code{heading} elements:
384 @defvr {Attribute} @code{commandName}
385 A locale-invariant identifier for the command that produced the
386 output, e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
389 @defvr {Attribute} @code{visibility}
390 To what degree the output represented by the element is visible.
393 @defvr {Attribute} @code{locale}
394 The locale used for output, in Windows format, which is similar to the
395 format used in Unix with the underscore replaced by a hyphen, e.g.@:
396 @code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
399 @defvr {Attribute} @code{olang}
400 The output language, e.g.@: @code{en}, @code{it}, @code{es},
401 @code{de}, @code{pt-BR}.
404 @node SPV Structure label Element
405 @subsection The @code{label} Element
411 Every @code{heading} and @code{container} holds a @code{label} as its
412 first child. The label text is what appears in the outline pane of
413 the GUI's viewer window. PSPP also puts it into the outline of PDF
414 output. The label text doesn't appear in the output itself.
416 The text in @code{label} describes what it labels, often by naming the
417 statistical procedure that was executed, e.g.@: ``Frequencies'' or
418 ``T-Test''. The root @code{heading} in a structure member is normally
419 ``Output''. Labels are often very generic, especially within a
420 @code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''.
421 Label text is localized according to the output language, e.g.@: in
422 Italian a frequency table procedure is labeled ``Frequenze''.
424 The user can edit labels to be anything they want. The corpus
425 contains a few examples of empty labels, ones that contain no text,
426 probably as a result of user editing.
428 @node SPV Structure container Element
429 @subsection The @code{container} Element
433 :visibility=(visible | hidden)
434 :page-break-before=(always)?
435 :text-align=(left | center)?
437 => label (table | container_text | graph | model | object | image | tree)
440 A @code{container} serves to contain and label a @code{table},
441 @code{text}, or other kind of item.
443 This element has the following attributes.
445 @defvr {Attribute} @code{visibility}
446 Whether the container's content is displayed. ``Notes'' tables are
447 often hidden; other data is usually
450 @defvr {Attribute} @code{text-align}
451 Alignment of text within the container. Observed with nested
452 @code{table} and @code{text} elements.
455 @defvr {Attribute} @code{width}
456 The width of the container, e.g.@: @code{1097px}.
459 @node SPV Structure text Element (Inside @code{container})
460 @subsection The @code{text} Element (Inside @code{container})
464 :type[text_type]=(title | log | text | page-title)
470 This @code{text} element is nested inside a @code{container}. There
471 is a different @code{text} element that is nested inside a
472 @code{pageParagraph}.
474 This element has the following attributes.
476 @defvr {Attribute} @code{type}
477 The semantics of the text.
480 @defvr {Attribute} @code{commandName}
481 As on the @code{heading} element. For output not specific to a
482 command, this is simply @code{log}. The corpus contains one example
483 where @code{commandName} is present but set to the empty string.
486 @defvr {Attribute} @code{creator-version}
487 As on the @code{heading} element.
490 @node SPV Structure html Element
491 @subsection The @code{html} Element
494 html :lang=(en) => TEXT
497 The element contains an HTML document as text (or, in practice, as
498 CDATA). In some cases, the document starts with @code{<html>} and
499 ends with @code{</html>}; in others the @code{html} element is
500 implied. Generally the HTML includes a @code{head} element with a CSS
501 stylesheet. The HTML body often begins with @code{<BR>}.
503 The HTML document uses only the following elements:
507 Sometimes, the document is enclosed with
508 @code{<html>}@dots{}@code{</html>}.
511 The HTML body often begins with @code{<BR>} and may contain it as well.
519 The attributes @code{face}, @code{color}, and @code{size} are
520 observed. The value of @code{color} takes one of the forms
521 @code{#@var{rr}@var{gg}@var{bb}} or @code{rgb (@var{r}, @var{g},
522 @var{b})}. The value of @code{size} is a number between 1 and 7,
526 The CSS in the corpus is simple. To understand it, a parser only
527 needs to be able to skip white space, @code{<!--}, and @code{-->}, and
528 parse style only for @code{p} elements. Only the following properties
533 In the form @code{@var{rr}@var{gg}@var{bb}}, e.g.@: @code{000000}, with
537 Either @code{bold} or @code{normal}.
540 Either @code{italic} or @code{normal}.
542 @item text-decoration
543 Either @code{underline} or @code{normal}.
546 A font name, commonly @code{Monospaced} or @code{SansSerif}.
549 Values claim to be in points, e.g.@: @code{14pt}, but the values are
550 actually in ``device-independent pixels'' (px), at 96/inch.
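For example, a reader might parse the declarations for a @code{p} element and correct the font size as follows. This hypothetical sketch assumes the style has already been extracted from the stylesheet, after skipping @code{<!--} and @code{-->}, as a @samp{name: value; @dots{}} list. It replaces @code{font-size} with the size in true points, i.e.@: the nominal value times 72/96.

@example
def parse_p_style(style):
    """Parse declarations such as 'font-size: 14pt; font-weight: bold'
    into a dict, replacing font-size with its size in true points."""
    props = dict()
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip()
    if "font-size" in props:
        size = props["font-size"].lower()
        if size.endswith("pt"):
            size = size[:-2]          # the 'pt' suffix is misleading
        # The number is really in px (96/inch); convert to points (72/inch).
        props["font-size"] = float(size) * 72 / 96
    return props
@end example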
553 This element has the following attributes.
555 @defvr {Attribute} @code{lang}
556 This always contains @code{en} in the corpus.
559 @node SPV Structure table Element
560 @subsection The @code{table} Element
569 :displayFiltering=bool?
571 :orphanTolerance=int?
576 :type[table_type]=(table | note | warning)
577 => tableProperties? tableStructure
579 tableStructure => path? dataPath csvPath?
582 This element has the following attributes.
584 @defvr {Attribute} @code{commandName}
585 As on the @code{heading} element.
588 @defvr {Attribute} @code{type}
589 One of @code{table}, @code{note}, or @code{warning}.
592 @defvr {Attribute} @code{subType}
593 The locale-invariant command ID for the particular kind of output that
594 this table represents in the procedure. This can be the same as
595 @code{commandName}, e.g.@: @code{Frequencies}, or different, e.g.@:
596 @code{Case Processing Summary}. Generic subtypes @code{Notes} and
597 @code{Warnings} are often used.
600 @defvr {Attribute} @code{tableId}
601 A number that uniquely identifies the table within the SPV file,
602 typically a large negative number such as @code{-4147135649387905023}.
605 @defvr {Attribute} @code{creator-version}
606 As on the @code{heading} element. In the corpus, this is only present
607 for version 21 and up and always includes all 8 digits.
610 @xref{SPV Detail Legacy Properties}, for details on the
611 @code{tableProperties} element.
613 @node SPV Structure graph Element
614 @subsection The @code{graph} Element
629 => dataPath? path csvPath?
632 This element represents a graph. The @code{dataPath} and @code{path}
633 elements name the Zip members that give the details of the graph.
634 Normally, both elements are present; there is only one counterexample
637 @code{csvPath} only appears in one SPV file in the corpus, for two
638 graphs. In these two cases, @code{dataPath}, @code{path}, and
639 @code{csvPath} all appear. These @code{csvPath} elements name Zip members with
640 names of the form @file{@var{number}_csv.bin}, where @var{number} is a
641 many-digit number and the same as the @code{csvFileIds}. The named
642 Zip members are CSV text files (despite the @file{.bin} extension).
643 The CSV files are encoded in UTF-8 and begin with a U+FEFF byte-order
646 @node SPV Structure model Element
647 @subsection The @code{model} Element
659 => ViZml? dataPath? path | pmmlContainerPath statsContainerPath
661 pmmlContainerPath => TEXT
663 statsContainerPath => TEXT
665 ViZml :viewName? => TEXT
668 This element represents a model. The @code{dataPath} and @code{path}
669 elements name the Zip members that give the details of the model.
670 Normally, both elements are present; there is only one counterexample
673 The details are unexplored. The @code{ViZml} element contains base-64
674 encoded text that decodes to a binary format with some embedded text
675 strings, and @code{path} names a Zip member that contains XML.
676 Alternatively, @code{pmmlContainerPath} and @code{statsContainerPath}
677 name Zip members with @file{.scf} extension.
679 @node SPV Structure object and image Elements
680 @subsection The @code{object} and @code{image} Elements
683 object :type[object_type]=(unknown)? :uri => EMPTY
685 image :VDPId :commandName => dataPath
688 These two elements represent an image in PNG format. They are
689 equivalent and the corpus contains examples of both. The only
690 difference is the syntax: for @code{object}, the @code{uri} attribute
691 names the Zip member that contains a PNG file; for @code{image}, the
692 text of the inner @code{dataPath} element names the Zip member.
694 PSPP writes @code{object} in output but there is no strong reason to
697 The corpus only contains PNG image files.
699 @node SPV Structure tree Element
700 @subsection The @code{tree} Element
711 This element represents a tree. The @code{dataPath} and @code{path}
712 elements name the Zip members that give the details of the tree.
713 The details are unexplored.
715 @node SPV Structure Path Elements
716 @subsection Path Elements
726 These elements contain the names of the Zip members that hold details
727 for a container. For tables:
731 When a ``light'' format is used, only @code{dataPath} is present, and
732 it names a @file{.bin} member of the Zip file that has @code{light} in
733 its name, e.g.@: @code{0000000001437_lightTableData.bin} (@pxref{SPV
734 Light Detail Member Format}).
737 When the legacy format is used, both are present. In this case,
738 @code{dataPath} names a Zip member with a legacy binary format that
739 contains relevant data (@pxref{SPV Legacy Detail Member Binary
740 Format}), and @code{path} names a Zip member that uses an XML format
741 (@pxref{SPV Legacy Detail Member XML Format}).
744 Graphs normally follow the legacy approach described above. The
745 corpus contains one example of a graph with @code{path} but not
746 @code{dataPath}. The reason is unexplored.
748 Models use @code{path} but not @code{dataPath}. @xref{SPV Structure
749 model Element}, for more information.
751 These elements have no attributes.
753 @node SPV Structure pageSetup Element
754 @subsection The @code{pageSetup} Element
758 :initial-page-number=int?
759 :chart-size=(as-is | full-height | half-height | quarter-height | OTHER)?
760 :margin-left=dimension?
761 :margin-right=dimension?
762 :margin-top=dimension?
763 :margin-bottom=dimension?
764 :paper-height=dimension?
765 :paper-width=dimension?
766 :reference-orientation?
767 :space-after=dimension?
768 => pageHeader pageFooter
770 pageHeader => pageParagraph?
772 pageFooter => pageParagraph?
774 pageParagraph => pageParagraph_text
777 The @code{pageSetup} element has the following attributes.
779 @defvr {Attribute} @code{initial-page-number}
780 The page number to put on the first page of printed output. Usually
784 @defvr {Attribute} @code{chart-size}
785 One of the listed, self-explanatory chart sizes,
786 @code{quarter-height}, or a localization (!) of one of these (e.g.@:
787 @code{dimensione attuale}, @code{Wie vorgegeben}).
790 @defvr {Attribute} @code{margin-left}
791 @defvrx {Attribute} @code{margin-right}
792 @defvrx {Attribute} @code{margin-top}
793 @defvrx {Attribute} @code{margin-bottom}
794 Margin sizes, e.g.@: @code{0.25in}.
797 @defvr {Attribute} @code{paper-height}
798 @defvrx {Attribute} @code{paper-width}
802 @defvr {Attribute} @code{reference-orientation}
803 Indicates the orientation of the output page. Either @code{0deg}
804 (portrait) or @code{90deg} (landscape),
807 @defvr {Attribute} @code{space-after}
808 The amount of space between printed objects, typically @code{12pt}.
811 @node SPV Structure @code{text} Element (Inside @code{pageParagraph})
812 @subsection The @code{text} Element (Inside @code{pageParagraph})
815 text[pageParagraph_text] :type=(title | text) => TEXT
818 This @code{text} element is nested inside a @code{pageParagraph}. There
819 is a different @code{text} element that is nested inside a
822 The element is either empty, or contains CDATA that holds almost-XHTML
823 text: in the corpus, either an @code{html} or @code{p} element. It is
824 @emph{almost}-XHTML because the @code{html} element designates the
826 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} instead of
827 an XHTML namespace, and because the CDATA can contain substitution
828 variables. The following variables are supported:
833 The current date or time in the preferred format for the locale.
839 First-, second-, third-, or fourth-level heading.
845 Name of the output file.
851 @code{&[Page]} for the page number and @code{&[PageTitle]} for the
854 Typical contents (indented for clarity):
857 <html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
860 <p style="text-align:right; margin-top: 0">Page &[Page]</p>
865 This element has the following attributes.
867 @defvr {Attribute} @code{type}
871 @node SPV Light Detail Member Format
872 @section Light Detail Member Format
874 This section describes the format of ``light'' detail @file{.bin}
875 members. These members have a binary format which we describe here in
876 terms of a context-free grammar using the following conventions:
879 @item NonTerminal @result{} @dots{}
880 Nonterminals have CamelCaps names, and @result{} indicates a
881 production. The right-hand side of a production is often broken
882 across multiple lines. Break points are chosen for aesthetics only
883 and have no semantic significance.
885 @item 00, 01, @dots{}, ff.
886 A byte with a fixed value, written as a pair of hexadecimal digits.
888 @item i0, i1, @dots{}, i9, i10, i11, @dots{}
889 @itemx ib0, ib1, @dots{}, ib9, ib10, ib11, @dots{}
890 A 32-bit integer in little-endian or big-endian byte order,
891 respectively, with a fixed value, written in decimal. Prefixed by
892 @samp{i} for little-endian or @samp{ib} for big-endian.
898 A byte with value 0 or 1.
902 A 16-bit unsigned integer in little-endian or big-endian byte order,
907 A 32-bit unsigned integer in little-endian or big-endian byte order,
912 A 64-bit unsigned integer in little-endian or big-endian byte order,
916 A 64-bit IEEE floating-point number.
919 A 32-bit IEEE floating-point number.
923 A 32-bit unsigned integer, in little-endian or big-endian byte order,
924 respectively, followed by the specified number of bytes of character
925 data. (The encoding is indicated by the Formats nonterminal.)
928 @var{x} is optional, e.g.@: 00? is an optional zero byte.
930 @item @var{x}*@var{n}
931 @var{x} is repeated @var{n} times, e.g.@: byte*10 for ten arbitrary bytes.
933 @item @var{x}[@var{name}]
934 Gives @var{x} the specified @var{name}. Names are used in textual
935 explanations. They are also used, in brackets, to indicate counts,
936 e.g.@: @code{int32[n] byte*[n]} for a 32-bit integer followed by the
937 specified number of arbitrary bytes.
939 @item @var{a} @math{|} @var{b}
940 Either @var{a} or @var{b}.
943 Parentheses are used for grouping to make precedence clear, especially
944 in the presence of @math{|}, e.g.@: in 00 (01 @math{|} 02 @math{|} 03)
948 @itemx becount(@var{x})
949 A 32-bit unsigned integer, in little-endian or big-endian byte order,
950 respectively, that indicates the number of bytes in @var{x}, followed
954 In a version 1 @file{.bin} member, @var{x}; in version 3, nothing.
955 (The @file{.bin} header indicates the version.)
958 In a version 3 @file{.bin} member, @var{x}; in version 1, nothing.
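The primitive types above map directly onto Python's @code{struct} module. The following sketch shows a cursor over a member's bytes. The class and method names are hypothetical. The descriptions above do not state a byte order for the floating-point types, so the sketch assumes little-endian for them. Strings are returned as raw bytes because their encoding is given by the Formats section.

@example
import struct


class LightReader:
    """A cursor over the bytes of a light detail member (sketch only)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def _unpack(self, fmt):
        size = struct.calcsize(fmt)
        if self.pos + size > len(self.data):
            raise EOFError("light member is truncated")
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += size
        return value

    def read_byte(self):
        return self._unpack("B")

    def read_bool(self):
        value = self.read_byte()
        if value not in (0, 1):
            raise ValueError("bool must be 0 or 1, not " + str(value))
        return value == 1

    # Unsigned integers, little-endian and big-endian.
    def read_int16(self): return self._unpack("<H")
    def read_be16(self): return self._unpack(">H")
    def read_int32(self): return self._unpack("<I")
    def read_be32(self): return self._unpack(">I")
    def read_int64(self): return self._unpack("<Q")
    def read_be64(self): return self._unpack(">Q")

    # IEEE floating point; little-endian is an assumption here.
    def read_double(self): return self._unpack("<d")
    def read_float(self): return self._unpack("<f")

    def read_string(self, big_endian=False):
        """A 32-bit byte count followed by that many bytes of text."""
        n = self.read_be32() if big_endian else self.read_int32()
        if self.pos + n > len(self.data):
            raise EOFError("string runs past end of member")
        raw = self.data[self.pos:self.pos + n]
        self.pos += n
        return raw

    def read_counted(self, big_endian=False):
        """count(x) or becount(x): a reader limited to the counted bytes."""
        return LightReader(self.read_string(big_endian))
@end example

Returning a separate reader for counted data makes it easy to verify that a counted section was consumed exactly.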
961 PSPP uses this grammar to parse light detail members. See
962 @file{src/output/spv/light-binary.grammar} in the PSPP source tree for
965 Little-endian byte order is far more common in this format, but a few
966 pieces of the format use big-endian byte order.
968 Light detail members express linear units in two ways: points (pt), at
969 72/inch, and ``device-independent pixels'' (px), at 96/inch. To
970 convert from pt to px, multiply by 1.33 and round up. To convert
971 from px to pt, divide by 1.33 and round down.
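The conversion rule translates directly into code. This sketch uses the factor 1.33 exactly as stated, rather than 96/72. The function names are hypothetical.

@example
import math


def pt_to_px(pt):
    """Points (72/inch) to device-independent pixels (96/inch):
    multiply by 1.33 and round up."""
    return math.ceil(pt * 1.33)


def px_to_pt(px):
    """Device-independent pixels to points: divide by 1.33, round down."""
    return math.floor(px / 1.33)
@end example

In exact arithmetic, rounding up in one direction and down in the other means that a whole number of points converted to px and back is unchanged. For example, 12pt becomes 16px, and 16px becomes 12pt again.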
973 A ``light'' detail member @file{.bin} consists of a number of sections
974 concatenated together, terminated by an optional byte 01:
978 Header Titles Footnotes
979 Areas Borders PrintSettings TableSettings Formats
980 Dimensions Axes Cells
984 The following sections go into more detail.
987 * SPV Light Member Header::
988 * SPV Light Member Titles::
989 * SPV Light Member Footnotes::
990 * SPV Light Member Areas::
991 * SPV Light Member Borders::
992 * SPV Light Member Print Settings::
993 * SPV Light Member Table Settings::
994 * SPV Light Member Formats::
995 * SPV Light Member Dimensions::
996 * SPV Light Member Categories::
997 * SPV Light Member Axes::
998 * SPV Light Member Cells::
999 * SPV Light Member Value::
1000 * SPV Light Member ValueMod::
1003 @node SPV Light Member Header
1006 An SPV light member begins with a 39-byte header:
1011 (i1 @math{|} i3)[version]
1014 bool[rotate-inner-column-labels]
1015 bool[rotate-outer-row-labels]
1018 int32[min-col-width] int32[max-col-width]
1019 int32[min-row-width] int32[max-row-width]
1023 @code{version} is a version number that affects the interpretation of
1024 some of the other data in the member. We will refer to ``version 1''
1025 and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for
1026 version-specific formatting (as described previously).
1028 If @code{rotate-inner-column-labels} is 1, then column labels closest
1029 to the data are rotated 90° counterclockwise; otherwise, they are
1030 shown in the normal way.
1032 If @code{rotate-outer-row-labels} is 1, then row labels farthest from
1033 the data are rotated 90° counterclockwise; otherwise, they are shown
1036 @code{min-col-width} is the minimum width that a column will be
1037 assigned automatically. @code{max-col-width} is the maximum width
1038 that a column will be assigned to accommodate a long column label.
1039 @code{min-row-width} and @code{max-row-width} are a similar range for
1040 the width of row labels. All of these measurements are in 1/96 inch
1041 units (called a ``device-independent pixel'' unit in Windows).
1043 @code{table-id} is a binary version of the @code{tableId} attribute in
1044 the structure member that refers to the detail member. For example,
1045 if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
1046 would be 0xc6c99d183b300001.
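In other words, @code{table-id} is the 64-bit two's-complement bit pattern of @code{tableId}. The sketch below computes it. The helper names are hypothetical. The byte encoding assumes the field is stored little-endian, like most of this format.

@example
import struct


def table_id_to_binary(table_id):
    """Map the decimal tableId attribute (a signed 64-bit value) to the
    unsigned table-id value, i.e. its two's-complement bit pattern."""
    value = int(table_id)
    if not -2**63 <= value < 2**63:
        raise ValueError("tableId out of 64-bit range")
    return value & 0xFFFFFFFFFFFFFFFF


def table_id_bytes(table_id):
    """The same value as 8 bytes, assuming little-endian order."""
    return struct.pack("<q", int(table_id))
@end example

With the example above, @code{hex(table_id_to_binary("-4122591256483201023"))} yields @code{0xc6c99d183b300001}.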
1048 The meaning of the other variable parts of the header is not known. A
1049 writer may safely use version 3, true for @code{x0}, false for
1050 @code{x1}, true for @code{x2}, and 0x15 for @code{x3}.
1052 @node SPV Light Member Titles
1058 Value[subtype] 01? 31
1059 Value[user-title] 01?
1060 (31 Value[corner-text] @math{|} 58)
1061 (31 Value[caption] @math{|} 58)
1064 The Titles follow the Header and specify the table's title, caption,
1067 The @code{user-title} reflects any user
1068 editing of the title text or style. The @code{title} is the title
1069 originally generated by the procedure. Both of these are appropriate
1070 for presentation and localized to the user's language. For example,
1071 for a frequency table, @code{title} and @code{user-title} normally
1072 name the variable and @code{c} is simply ``Frequencies''.
1074 @code{subtype} is the same as the @code{subType} attribute in the
1075 @code{table} structure XML element that referred to this member.
1076 @xref{SPV Structure table Element}, for details.
1078 The @code{corner-text}, if present, is shown in the upper-left corner
1079 of the table, above the row headings and to the left of the column
1080 headings. It is usually absent. When row dimension labels are
1081 displayed in the corner (see @code{show-row-labels-in-corner}), corner
1084 The @code{caption}, if present, is shown below the table.
1085 @code{caption} reflects user editing of the caption.
1087 @node SPV Light Member Footnotes
1088 @subsection Footnotes
1091 Footnotes => int32[n-footnotes] Footnote*[n-footnotes]
1092 Footnote => Value[text] (58 @math{|} 31 Value[marker]) int32[show]
1095 Each footnote has @code{text} and an optional custom @code{marker}
1098 The syntax for Value would allow footnotes (and their markers) to
1099 reference other footnotes, but in practice this doesn't work.
1101 @code{show} is a 32-bit signed integer. It is positive to show the
1102 footnote or negative to hide it. Its magnitude is often 1, and in
1103 other cases tends to be the number of references to the footnote.
1104 It is safe to write 1 to show a footnote and -1 to hide it.
1106 @node SPV Light Member Areas
1113 string[typeface] float[size] int32[style] bool[underline]
1114 int32[halign] int32[valign]
1115 string[fg-color] string[bg-color]
1116 bool[alternate] string[alt-fg-color] string[alt-bg-color]
1117 v3(int32[left-margin] int32[right-margin] int32[top-margin] int32[bottom-margin])
1120 Each Area represents the style for a different area of the table, in
1121 the following order: title, caption, footer, corner, column labels,
1122 row labels, data, and layers.
1124 @code{index} is the 1-based index of the Area, i.e.@: 1 for the first
1125 Area, through 8 for the final Area.
1127 @code{typeface} is the string name of the font used in the area. In
1128 the corpus, this is @code{SansSerif} in over 99% of instances and
1129 @code{Times New Roman} in the rest.
1131 @code{size} is the size of the font, in px (@pxref{SPV Light Detail
1132 Member Format}). The most common size in the corpus is 12 px. Even
1133 though @code{size} has a floating-point type, in the corpus its values
1134 are always integers.
1136 @code{style} is a bit mask. Bit 0 (with value 1) is set for bold, bit
1137 1 (with value 2) is set for italic.
1139 @code{underline} is 1 if the font is underlined, 0 otherwise.
1141 @code{halign} specifies horizontal alignment: 0 for center, 2 for
1142 left, 4 for right, 61453 for decimal, 64173 for mixed. Mixed
1143 alignment varies according to type: string data is left-justified,
1144 numbers and most other formats are right-justified.
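
A reader might interpret @code{style} and @code{halign} as in the following Python sketch. The dictionary keys, the function name, and the @code{"unknown"} fallback for unlisted codes are choices of this example. They are not part of the format.

```python
# Horizontal alignment codes used in an Area, from the list above.
AREA_HALIGN = {0: "center", 2: "left", 4: "right",
               61453: "decimal", 64173: "mixed"}


def decode_area_font(style: int, halign: int) -> dict:
    """Interpret an Area's style bit mask and halign code."""
    return {
        "bold": bool(style & 1),     # bit 0 (value 1)
        "italic": bool(style & 2),   # bit 1 (value 2)
        "halign": AREA_HALIGN.get(halign, "unknown"),
    }
```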
1146 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
1149 @code{fg-color} and @code{bg-color} are the foreground color and
1150 background color, respectively. In the corpus, these are always
1151 @code{#000000} and @code{#ffffff}, respectively.
1153 @code{alternate} is 1 if rows should alternate colors, 0 if all rows
1154 should be the same color. When @code{alternate} is 1,
1155 @code{alt-fg-color} and @code{alt-bg-color} specify the colors for the
1156 alternate rows; otherwise they are empty strings.
1158 @code{left-margin}, @code{right-margin}, @code{top-margin}, and
1159 @code{bottom-margin} are measured in px.
1161 @node SPV Light Member Borders
1168 be32[n-borders] Border*[n-borders]
1169 bool[show-grid-lines]
1178 The Borders reflect how borders between regions are drawn.
1180 The fixed value of @code{endian} can be used to validate the
1183 @code{show-grid-lines} is 1 to draw grid lines, otherwise 0.
1185 Each Border describes one kind of border. @code{n-borders} seems to
1186 always be 19. Each @code{border-type} appears once (although in an
1187 unpredictable order) and corresponds to one of the following borders:
1193 Left, top, right, and bottom outer frame.
1195 Left, top, right, and bottom inner frame.
1197 Left and top of data area.
1199 Horizontal and vertical dimension rows.
1201 Horizontal and vertical dimension columns.
1203 Horizontal and vertical category rows.
1205 Horizontal and vertical category columns.
1208 @code{stroke-type} describes how a border is drawn, as one of:
1225 @code{color} is an RGB color. Bits 24--31 are alpha, bits 16--23 are
1226 red, 8--15 are green, 0--7 are blue. An alpha of 255 indicates an
1227 opaque color, therefore opaque black is 0xff000000.
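
Decoding a Border @code{color} is a matter of shifting and masking. In this Python sketch, @code{color} is the integer already read from the Border, and the function names are hypothetical. @code{to_css_hex} drops alpha and produces the @code{#rrggbb} notation that other parts of the light format use. That conversion is only a convenience of this example.

```python
def decode_border_color(color: int) -> tuple:
    """Split a Border color into (alpha, red, green, blue)."""
    alpha = (color >> 24) & 0xFF   # bits 24-31
    red = (color >> 16) & 0xFF     # bits 16-23
    green = (color >> 8) & 0xFF    # bits 8-15
    blue = color & 0xFF            # bits 0-7
    return alpha, red, green, blue


def to_css_hex(color: int) -> str:
    """Format the RGB part of a Border color as #rrggbb, ignoring alpha."""
    _, r, g, b = decode_border_color(color)
    return f"#{r:02x}{g:02x}{b:02x}"
```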
1229 @node SPV Light Member Print Settings
1230 @subsection Print Settings
1237 bool[paginate-layers]
1240 bool[top-continuation]
1241 bool[bottom-continuation]
1242 be32[n-orphan-lines]
1243 bestring[continuation-string])
1246 The PrintSettings reflect settings for printing. The fixed value of
1247 @code{endian} can be used to validate the endianness.
1249 @code{all-layers} is 1 to print all layers, 0 to print only the layer
1250 designated by @code{current-layer} in TableSettings (@pxref{SPV Light
1251 Member Table Settings}).
1253 @code{paginate-layers} is 1 to print each layer at the start of a new
1254 page, 0 otherwise. (This setting is honored only if @code{all-layers} is
1255 1, since otherwise only one layer is printed.)
1257 @code{fit-width} and @code{fit-length} control whether the table is
1258 shrunk to fit within a page's width or length, respectively.
1260 @code{n-orphan-lines} is the minimum number of rows or columns to put
1261 in one part of a table that is broken across pages.
1263 If @code{top-continuation} is 1, then @code{continuation-string} is
1264 printed at the top of a page when a table is broken across pages for
1265 printing; similarly for @code{bottom-continuation} and the bottom of a
1266 page. Usually, @code{continuation-string} is empty.
1268 @node SPV Light Member Table Settings
1269 @subsection Table Settings
1279 bool[show-row-labels-in-corner]
1280 bool[show-alphabetic-markers]
1281 bool[footnote-marker-superscripts]
1284 Breakpoints[row-breaks] Breakpoints[column-breaks]
1285 Keeps[row-keeps] Keeps[column-keeps]
1286 PointKeeps[row-point-keeps] PointKeeps[column-point-keeps]
1289 bestring[table-look]
1292 Breakpoints => be32[n-breaks] be32*[n-breaks]
1294 Keeps => be32[n-keeps] Keep*[n-keeps]
1295 Keep => be32[offset] be32[n]
1297 PointKeeps => be32[n-point-keeps] PointKeep*[n-point-keeps]
1298 PointKeep => be32[offset] be32 be32
1301 The TableSettings reflect display settings. The fixed value of
1302 @code{endian} can be used to validate the endianness.
1304 @code{current-layer} is the displayed layer. The interpretation when
1305 there is more than one layer dimension is not yet known.
1307 If @code{omit-empty} is 1, empty rows or columns (ones with nothing in
1308 any cell) are hidden; otherwise, they are shown.
1310 If @code{show-row-labels-in-corner} is 1, then row labels are shown in
1311 the upper left corner; otherwise, they are shown nested.
1313 If @code{show-alphabetic-markers} is 1, markers are shown as letters
1314 (e.g.@: @samp{a}, @samp{b}, @samp{c}, @dots{}); otherwise, they are
1315 shown as numbers starting from 1.
1317 When @code{footnote-marker-superscripts} is 1, footnote markers are shown
1318 as superscripts, otherwise as subscripts.
1320 The Breakpoints are rows or columns after which there is a page break;
1321 for example, a row break of 1 requests a page break after the second
1322 row. Usually no breakpoints are specified, indicating that page
1323 breaks should be selected automatically.
1325 The Keeps are ranges of rows or columns to be kept together without a
1326 page break; for example, a row Keep with @code{offset} 1 and @code{n}
1327 10 requests that the 10 rows starting with the second row be kept
1328 together. Usually no Keeps are specified.
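
Breakpoints and Keeps use big-endian @code{be32} integers, so a reader needs a helper separate from the one for @code{int32} fields. The Python sketch below reads a Breakpoints list followed by a Keeps list. The function names are hypothetical, and reading the values as unsigned is an assumption. In the examples above, row and column numbers appear to be 0-based.

```python
import struct


def read_be32(buf: bytes, pos: int):
    """Read one big-endian 32-bit integer (as unsigned); return (value, new_pos)."""
    (value,) = struct.unpack_from(">I", buf, pos)
    return value, pos + 4


def read_breakpoints(buf: bytes, pos: int):
    """Breakpoints => be32[n-breaks] be32*[n-breaks]"""
    n, pos = read_be32(buf, pos)
    breaks = []
    for _ in range(n):
        b, pos = read_be32(buf, pos)
        breaks.append(b)
    return breaks, pos


def read_keeps(buf: bytes, pos: int):
    """Keeps => be32[n-keeps] Keep*[n-keeps], Keep => be32[offset] be32[n]"""
    n, pos = read_be32(buf, pos)
    keeps = []
    for _ in range(n):
        offset, pos = read_be32(buf, pos)
        count, pos = read_be32(buf, pos)
        keeps.append((offset, count))
    return keeps, pos
```

For example, the bytes for a single row break of 1 followed by one Keep with @code{offset} 1 and @code{n} 10 parse to @code{[1]} and @code{[(1, 10)]}.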
1330 The PointKeeps seem to be generated automatically based on
1331 user-specified Keeps. They seem to indicate a conversion from rows
1332 or columns to pixel or point offsets.
1334 @code{notes} is a text string that contains user-specified notes. It
1335 is displayed when the user hovers the cursor over the table, like text
1336 in the @code{title} attribute in HTML@. It is not printed. It is
1339 @code{table-look} is the name of an SPSS ``TableLook'' table style,
1340 such as ``Default'' or ``Academic''; it is often empty.
1342 TableSettings ends with an arbitrary number of null bytes. A writer
1343 may safely write 82 null bytes.
1345 A writer may safely use 4 for @code{x5} and 0 for @code{x6}.
1347 @node SPV Light Member Formats
1352 int32[n-widths] int32*[n-widths]
1354 int32[current-layer]
1355 bool[x7] bool[x8] bool[x9]
1360 v3(count(X1 count(X2)) count(X3)))
1361 Y0 => int32[epoch] byte[decimal] byte[grouping]
1362 CustomCurrency => int32[n-ccs] string*[n-ccs]
1365 If @code{n-widths} is nonzero, then the accompanying integers are
1366 column widths as manually adjusted by the user.
1368 @code{locale} is a locale including an encoding, such as
1369 @code{en_US.windows-1252} or @code{it_IT.windows-1252}.
1370 (@code{locale} is often duplicated in Y1, described below.)
1372 @code{epoch} is the year that starts the epoch. A 2-digit year is
1373 interpreted as belonging to the 100 years beginning at the epoch. The
1374 default epoch year is 69 years prior to the current year; thus, in
1375 2017 this field by default contains 1948. In the corpus, @code{epoch}
1376 ranges from 1943 to 1948, plus some contain -1.
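
The 100-year window rule can be sketched in Python as follows. The function name is hypothetical. The meaning of an @code{epoch} of -1 is not documented, so this sketch rejects it instead of guessing.

```python
def expand_two_digit_year(yy: int, epoch: int) -> int:
    """Map a 2-digit year (0-99) into the 100 years beginning at epoch."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year")
    if epoch < 0:
        # -1 occurs in the corpus, but its meaning is not documented.
        raise ValueError("epoch not specified")
    # Python's % always yields 0..99 here, even for negative operands.
    return epoch + (yy - epoch) % 100
```

With the default epoch of 1948, 48 expands to 1948, 99 to 1999, 0 to 2000, and 47 to 2047.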
1378 @code{decimal} is the decimal point character. The observed values
1379 are @samp{.} and @samp{,}.
1381 @code{grouping} is the grouping character. Usually, it is @samp{,} if
1382 @code{decimal} is @samp{.}, and vice versa. Other observed values are
1383 @samp{'} (apostrophe), @samp{ } (space), and zero (presumably
1384 indicating that digits should not be grouped).
1386 @code{n-ccs} is observed as either 0 or 5. When it is 5, the
1387 following strings are CCA through CCE format strings. @xref{Custom
1388 Currency Formats,,, pspp, PSPP}. Most commonly these are all
1389 @code{-,,,} but other strings occur.
1391 A writer may safely use false for @code{x7}, @code{x8}, and @code{x9}.
1395 X0 only appears, optionally, in version 1 members.
1400 string[command] string[command-local]
1401 string[language] string[charset] string[locale]
1404 Y2 => CustomCurrency byte[missing] bool[x17]
1407 @code{command} describes the statistical procedure that generated the
1408 output, in English. It is not necessarily the literal syntax name of
1409 the procedure: for example, NPAR TESTS becomes ``Nonparametric
1410 Tests.'' @code{command-local} is the procedure's name, translated
1411 into the output language; it is often empty and, when it is not,
1412 sometimes the same as @code{command}.
1414 @code{missing} is the character used to indicate that a cell contains
1415 a missing value. It is always observed as @samp{.}.
1417 A writer may safely use false for @code{x17}.
1421 X1 only appears in version 3 members.
1429 byte[show-variables]
1431 int32[x18] int32[x19]
1437 @code{lang} may indicate the language in use. Some values seem to be
1438 0: @t{en}, 1: @t{de}, 2: @t{es}, 3: @t{it}, 5: @t{ko}, 6: @t{pl}, 8:
1439 @t{zh-tw}, 10: @t{pt_BR}, 11: @t{fr}.
1441 @code{show-variables} determines how variables are displayed by
1442 default. A value of 1 means to display variable names, 2 to display
1443 variable labels when available, 3 to display both (name followed by
1444 label, separated by a space). The most common value is 0, which
1445 probably means to use a global default.
1447 @code{show-values} is a similar setting for values. A value of 1
1448 means to display the value, 2 to display the value label when
1449 available, 3 to display both. Again, the most common value is 0,
1450 which probably means to use a global default.
1452 @code{show-title} is 1 to show the title, 10 to hide it.
1454 @code{show-caption} is true to show the caption, false to hide it.
1456 A writer may safely use false for @code{x14}, false for @code{x16}, 0
1457 for @code{lang}, -1 for @code{x18} and @code{x19}, and false for
1462 X2 only appears in version 3 members.
1466 int32[n-row-heights] int32*[n-row-heights]
1467 int32[n-style-map] StyleMap*[n-style-map]
1468 int32[n-styles] StylePair*[n-styles]
1470 StyleMap => int64[cell-index] int16[style-index]
1473 If present, @code{n-row-heights} and the accompanying integers are row
1474 heights as manually adjusted by the user.
1476 The rest of X2 specifies styles for data cells. At first glance this
1477 is odd, because each data cell can have its own style embedded as part
1478 of the data, but in practice X2 specifies a style for a cell only if
1479 that cell is empty (and thus does not appear in the data at all).
1480 Each StyleMap specifies the index of a blank cell, calculated the same
1481 way as in the Cells (@pxref{SPV Light Member Cells}), along with a
1482 0-based index into the accompanying StylePair array.
1484 A writer may safely omit the optional @code{i0 i0} inside the
1485 @code{count(@dots{})}.
1489 X3 only appears in version 3 members.
1493 01 00 byte[x21] 00 00 00
1496 (string[dataset] string[datafile] i0 int32[date] i0)?
1501 @code{small} is a small real number. In the corpus, it overwhelmingly
1502 takes the value 0.0001, with zero occasionally seen. Nonzero numbers
1503 with format 40 (@pxref{SPV Light Member Value}) whose magnitudes are
1504 smaller than @code{small} are displayed in scientific notation. (Thus, a @code{small}
1505 of zero prevents scientific notation from being chosen.)
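
The interaction between @code{small} and format 40 can be sketched in Python. Two assumptions apply here:
@itemize @bullet
@item
The Value's @code{format} uses the system file packing: type in bits 16--23, width in bits 8--15, and decimal places in bits 0--7.
@item
Only format 40 is modeled. Other formats choose their notation by their own rules.
@end itemize
The function names are hypothetical.

```python
def unpack_format(fmt: int) -> tuple:
    """Split a packed format into (type, width, decimals), assuming the
    system file layout."""
    return (fmt >> 16) & 0xFF, (fmt >> 8) & 0xFF, fmt & 0xFF


def format40_uses_scientific(x: float, small: float) -> bool:
    """Format 40 rule: scientific notation iff x is nonzero and |x| < small.
    A small of zero never selects scientific notation, since |x| < 0 is
    impossible."""
    return x != 0 and abs(x) < small
```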
1507 @code{dataset} is the name of the dataset analyzed to produce the
1508 output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
1509 file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter
1510 is sometimes the empty string.
1512 @code{date} is a date, as seconds since the epoch, i.e.@: since
1513 January 1, 1970. Pivot tables within an SPV file often have dates a
1514 few minutes apart, so this is probably a creation date for the table
1515 rather than for the file.
1517 Sometimes @code{dataset}, @code{datafile}, and @code{date} are present
1518 and other times they are absent. The reader can distinguish by
1519 assuming that they are present and then checking whether the
1520 presumptive @code{dataset} contains a null byte (a valid string never
1523 @code{x22} is usually 0 or 2000000.
1525 A writer may safely use 4 for @code{x21} and omit @code{x22} and the
1526 other optional bytes at the end.
1528 @subsubheading Encoding
1530 Formats contains several indications of character encoding:
1534 @code{locale} in Formats itself.
1537 @code{locale} in Y1 (in version 1, Y1 is optionally nested inside X0;
1538 in version 3, Y1 is nested inside X3).
1541 @code{charset} in version 3, in Y1.
1544 @code{lang} in X1, in version 3.
1547 @code{charset}, if present, is a good indication of character
1548 encoding, and in its absence the encoding suffix on @code{locale} in
1551 @code{locale} in Y1 can be disregarded: it is normally the same as
1552 @code{locale} in Formats, and it is only present if @code{charset} is
1555 @code{lang} is not helpful and should be ignored for character
1558 However, the corpus contains many examples of light members whose
1559 strings are encoded in UTF-8 despite declaring some other character
1560 set. Furthermore, the corpus contains several examples of light
1561 members in which some strings are encoded in UTF-8 (and contain
1562 multibyte characters) and other strings are encoded in another
1563 character set (and contain non-ASCII characters). PSPP treats any
1564 valid UTF-8 string as UTF-8 and only falls back to the declared
1565 encoding for strings that are not valid UTF-8.
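
The fallback described above can be sketched in Python. This follows the stated strategy but is not PSPP's own code. The caller chooses the declared encoding: @code{charset} if present, otherwise the suffix of @code{locale}. The function names are hypothetical. Python accepts names such as @code{windows-1252}, but other names that SPSS writes might need mapping to a Python codec.

```python
def encoding_from_locale(locale_name: str):
    """Return the encoding suffix of a locale such as en_US.windows-1252,
    or None if the locale has no suffix."""
    _, dot, suffix = locale_name.partition(".")
    return suffix if dot and suffix else None


def decode_light_string(raw: bytes, declared_encoding: str) -> str:
    """Treat valid UTF-8 as UTF-8; otherwise fall back to the declared
    encoding, replacing undecodable bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(declared_encoding, errors="replace")
```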
1567 The @command{pspp-output} program's @command{strings} command can help
1568 analyze the encoding in an SPV light member. Use @code{pspp-output
1569 --help-dev} to see its usage.
1571 @node SPV Light Member Dimensions
1572 @subsection Dimensions
1574 A pivot table presents multidimensional data. A Dimension identifies
1575 the categories associated with each dimension.
1578 Dimensions => int32[n-dims] Dimension*[n-dims]
1580 Value[name] DimProperties
1581 int32[n-categories] Category*[n-categories]
1586 bool[hide-dim-label]
1587 bool[hide-all-labels]
1591 @code{name} is the name of the dimension, e.g.@: @code{Variables},
1592 @code{Statistics}, or a variable name.
1594 The meanings of @code{x1} and @code{x3} are unknown. @code{x1} is
1595 usually 0 but many other values have been observed. A writer may
1596 safely use 0 for @code{x1} and 2 for @code{x3}.
1598 @code{x2} is 0, 1, or 2. For a pivot table with @var{L} layer
1599 dimensions, @var{R} row dimensions, and @var{C} column dimensions,
1600 @code{x2} is 2 for the first @var{L} dimensions, 0 for the next
1601 @var{R} dimensions, and 1 for the remaining @var{C} dimensions. This
1602 does not mean that the layer dimensions must be presented first,
1603 followed by the row dimensions, followed by the column dimensions---on
1604 the contrary, they are frequently in a different order---but @code{x2}
1605 must follow this pattern to prevent the pivot table from being
1608 If @code{hide-dim-label} is 00, the pivot table displays a label for
1609 the dimension itself. Because usually the group and category labels
1610 are enough explanation, it is usually 01.
1612 If @code{hide-all-labels} is 01, the pivot table omits all labels for
1613 the dimension, including group and category labels. It is usually 00.
1614 When @code{hide-all-labels} is 01, @code{hide-dim-label} is ignored.
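
A writer must reproduce the @code{x2} pattern described earlier in this section, whatever order it uses for presenting the dimensions. A minimal Python sketch, with a hypothetical function name, that yields @code{x2} for each dimension in file order:

```python
def dimension_x2_values(n_layers: int, n_rows: int, n_columns: int) -> list:
    """x2 for each dimension in file order: 2 for the first n_layers
    dimensions, 0 for the next n_rows, and 1 for the remaining n_columns."""
    return [2] * n_layers + [0] * n_rows + [1] * n_columns
```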
1616 @code{dim-index} is usually the 0-based index of the dimension, e.g.@:
1617 0 for the first dimension, 1 for the second, and so on. Sometimes it
1618 is -1. There is no visible difference. A writer may safely use the
1621 @node SPV Light Member Categories
1622 @subsection Categories
1624 Categories are arranged in a tree. Only the leaf nodes in the tree
1625 are really categories; the others just serve as grouping constructs.
1628 Category => Value[name] (Leaf @math{|} Group)
1629 Leaf => 00 00 00 i2 int32[leaf-index] i0
1631 bool[merge] 00 01 int32[x23]
1632 i-1 int32[n-subcategories] Category*[n-subcategories]
1635 @code{name} is the name of the category (or group).
1637 A Leaf represents a leaf category. The Leaf's @code{leaf-index} is a
1638 nonnegative integer unique within the Dimension and less than
1639 @code{n-categories} in the Dimension. If the user does not sort or
1640 rearrange the categories, then @code{leaf-index} starts at 0 for the
1641 first Leaf in the dimension and increments by 1 with each successive
1642 Leaf. If the user sorts or rearranges the categories, then the
1643 order of categories in the file reflects that change and
1644 @code{leaf-index} reflects the original order.
1646 A dimension can have no leaf categories at all. A table that
1647 contains such a dimension necessarily has no data at all.
1649 A Group is a group of nested categories. Usually a Group contains at
1650 least one Category, so that @code{n-subcategories} is positive, but
1651 Groups with zero subcategories have been observed.
1653 If a Group's @code{merge} is 00, the most common value, then the group
1654 is really a distinct group that should be represented as such in the
1655 visual representation and user interface. If @code{merge} is 01, the
1656 categories in this group should be shown and treated as if they were
1657 direct children of the group's containing group (or if it has no
1658 parent group, then direct children of the dimension), and this group's
1659 name is irrelevant and should not be displayed. (Merged groups can be
1662 Writers need not use merged groups.
1664 A Group's @code{x23} appears to be i2 when all of the categories
1665 within a group are leaf categories that directly represent data values
1666 for a variable (e.g.@: in a frequency table or crosstabulation, a group
1667 of values in a variable being tabulated) and i0 otherwise. A writer
1668 may safely write a constant 0 in this field.
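
The Python sketch below assumes that the categories have already been parsed into a hypothetical in-memory tree of @code{Leaf} and @code{Group} objects. It shows two operations:
@itemize @bullet
@item
@code{leaf_indexes} lists the @code{leaf-index} values in file order, which is also display order.
@item
@code{visible_children} hoists the contents of merged groups into the containing level.
@end itemize

```python
from typing import List, NamedTuple, Union


class Leaf(NamedTuple):
    name: str
    leaf_index: int


class Group(NamedTuple):
    name: str
    merge: bool
    subcategories: list  # of Leaf or Group


Category = Union[Leaf, Group]


def leaf_indexes(categories: List[Category]) -> List[int]:
    """Return leaf-index values depth first, in file (display) order."""
    out = []
    for cat in categories:
        if isinstance(cat, Leaf):
            out.append(cat.leaf_index)
        else:
            out.extend(leaf_indexes(cat.subcategories))
    return out


def visible_children(categories: List[Category]) -> List[Category]:
    """Replace each merged group by its own visible children, so they
    appear as direct children of the containing level. Unmerged groups
    are kept as-is; apply this again to their subcategories when
    displaying them."""
    out = []
    for cat in categories:
        if isinstance(cat, Group) and cat.merge:
            out.extend(visible_children(cat.subcategories))
        else:
            out.append(cat)
    return out
```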
1670 @node SPV Light Member Axes
1673 After the dimensions comes the assignment of each dimension to one of the
1674 axes: layers, rows, and columns.
1678 int32[n-layers] int32[n-rows] int32[n-columns]
1679 int32*[n-layers] int32*[n-rows] int32*[n-columns]
1682 The values of @code{n-layers}, @code{n-rows}, and @code{n-columns}
1683 each specify the number of dimensions displayed in layers, rows, and
1684 columns, respectively. Any of them may be zero. Their values sum to
1685 @code{n-dims} from Dimensions (@pxref{SPV Light Member
1688 The following @code{n-dims} integers, in three groups, are a
1689 permutation of the 0-based dimension numbers. The first
1690 @code{n-layers} integers specify each of the dimensions represented by
1691 layers, the next @code{n-rows} integers specify the dimensions
1692 represented by rows, and the final @code{n-columns} integers specify
1693 the dimensions represented by columns. When there is more than one
1694 dimension of a given kind, the inner dimensions are given first.
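
A reader can parse and validate the Axes as in this Python sketch. The function name is hypothetical, and the counts and dimension numbers are assumed to be little-endian @code{int32} values, as elsewhere in the light member.

```python
import struct


def read_axes(buf: bytes, pos: int, n_dims: int):
    """Parse Axes; return (layers, rows, columns, new_pos). Each list holds
    0-based dimension numbers, innermost dimension first."""
    n_layers, n_rows, n_columns = struct.unpack_from("<3i", buf, pos)
    pos += 12
    if min(n_layers, n_rows, n_columns) < 0 or \
            n_layers + n_rows + n_columns != n_dims:
        raise ValueError("axis counts do not sum to n-dims")
    dims = list(struct.unpack_from(f"<{n_dims}i", buf, pos))
    pos += 4 * n_dims
    if sorted(dims) != list(range(n_dims)):
        raise ValueError("axis dimensions are not a permutation")
    layers = dims[:n_layers]
    rows = dims[n_layers:n_layers + n_rows]
    columns = dims[n_layers + n_rows:]
    return layers, rows, columns, pos
```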
1696 @node SPV Light Member Cells
1699 The final part of an SPV light member contains the actual data.
1702 Cells => int32[n-cells] Cell*[n-cells]
1703 Cell => int64[index] v1(00?) Value
1706 A Cell consists of an @code{index} and a Value. Suppose there are
1707 @math{d} dimensions, numbered 1 through @math{d} in the order given in
1708 the Dimensions previously, and that dimension @math{i} has @math{n_i}
1709 categories. Consider the cell at coordinates @math{x_i}, @math{1 \le
1710 i \le d}, and note that @math{0 \le x_i < n_i}. Then the index is
1711 calculated by the following algorithm:
1715 for each @math{i} from 1 to @math{d}:
1716 @i{index} = (@math{n_i \times} @i{index}) @math{+} @math{x_i}
1719 For example, suppose there are 3 dimensions with 3, 4, and 5
1720 categories, respectively. The cell at coordinates (1, 2, 3) has
1721 index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
1722 Within a given dimension, the coordinate @math{x_i} is the @code{leaf-index} of a Leaf.
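
The index calculation, together with its inverse, can be sketched in Python. The inverse is useful, for example, for the blank-cell indexes in X2, which use the same calculation. The function names are hypothetical. A dimension with no leaf categories has no cells, so the inverse is not meaningful in that case.

```python
def cell_index(n: list, x: list) -> int:
    """Compute a Cell index from per-dimension category counts n and
    coordinates x, where each x[i] is a leaf-index with 0 <= x[i] < n[i]."""
    if len(n) != len(x):
        raise ValueError("need one coordinate per dimension")
    index = 0
    for n_i, x_i in zip(n, x):
        if not 0 <= x_i < n_i:
            raise ValueError("coordinate out of range")
        index = n_i * index + x_i
    return index


def cell_coordinates(n: list, index: int) -> list:
    """Invert cell_index by peeling off the last dimension first."""
    x = []
    for n_i in reversed(n):
        index, x_i = divmod(index, n_i)
        x.append(x_i)
    return x[::-1]
```

For the example above, @code{cell_index([3, 4, 5], [1, 2, 3])} is 33, and @code{cell_coordinates([3, 4, 5], 33)} recovers @code{[1, 2, 3]}.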
1724 @node SPV Light Member Value
1727 Value is used throughout the SPV light member format. It boils down
1728 to a number or a string.
1731 Value => 00? 00? 00? 00? RawValue
1733 01 ValueMod int32[format] double[x]
1734 @math{|} 02 ValueMod int32[format] double[x]
1735 string[var-name] string[value-label] byte[show]
1736 @math{|} 03 string[local] ValueMod string[id] string[c] bool[fixed]
1737 @math{|} 04 ValueMod int32[format] string[value-label] string[var-name]
1738 byte[show] string[s]
1739 @math{|} 05 ValueMod string[var-name] string[var-label] byte[show]
1740 @math{|} 06 string[local] ValueMod string[id] string[c]
1741 @math{|} ValueMod string[template] int32[n-args] Argument*[n-args]
1744 @math{|} int32[x] i0 Value*[x] /* x > 0 */
1747 There are several possible encodings, which one can distinguish by the
1748 first nonzero byte in the encoding.
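
The following Python sketch shows that dispatch. It skips up to four optional 00 bytes and then classifies the RawValue by its first byte. The function name and return convention are hypothetical. Bytes 31 and 58 are the two bytes that begin a ValueMod, which signals the template form.

```python
def classify_raw_value(buf: bytes, pos: int):
    """Return (kind, pos). kind is the type byte 1-6, or "template" for a
    RawValue that starts with a ValueMod. pos is just past the type byte,
    or at the ValueMod's first byte for a template."""
    for _ in range(4):                 # Value => 00? 00? 00? 00? RawValue
        if buf[pos] != 0x00:
            break
        pos += 1
    first = buf[pos]
    if 0x01 <= first <= 0x06:
        return first, pos + 1
    if first in (0x31, 0x58):
        return "template", pos
    raise ValueError(f"unexpected byte {first:#04x} at offset {pos}")
```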
1752 The numeric value @code{x}, intended to be presented to the user
1753 formatted according to @code{format}, which is about the same as the
1754 format described for system files (@pxref{System File Output
1755 Formats}). The exception is that format 40 is not MTIME but instead
1756 approximately a synonym for F format with a different rule for whether
1757 a value is shown in scientific notation: a value in format 40 is shown
1758 in scientific notation if and only if it is nonzero and its magnitude
1759 is less than @code{small} (@pxref{SPV Light Member Formats}).
1761 Most commonly, @code{format} has width 40 (the maximum).
1763 An @code{x} with the maximum negative double value @code{-DBL_MAX}
1764 represents the system-missing value SYSMIS. (HIGHEST and LOWEST have
1765 not been observed.) See @ref{System File Format}, for more about
1766 these special values.
1769 Similar to @code{01}, with the additional information that @code{x} is
1770 a value of variable @code{var-name} and has value label
1771 @code{value-label}. Both @code{var-name} and @code{value-label} can
1772 be the empty string, the latter very commonly.
1774 @code{show} determines whether to show the numeric value or the value
1775 label. A value of 1 means to show the value, 2 to show the label, 3
1776 to show both, and 0 means to use the default specified in
1777 @code{show-values} (@pxref{SPV Light Member Formats}).
1780 A text string, in two forms: @code{c} is in English, and sometimes
1781 abbreviated or obscure, and @code{local} is localized to the user's
1782 locale. In an English-language locale, the two strings are often the
1783 same, and in the cases where they differ, @code{local} is more
1784 appropriate for a user interface, e.g.@: @code{c} of ``Not a PxP table
1785 for MCN...'' versus @code{local} of ``Computed only for a PxP table,
1786 where P must be greater than 1.''
1788 @code{c} and @code{local} are always either both empty or both
1791 @code{id} is a brief identifying string whose form seems to resemble a
1792 programming language identifier, e.g.@: @code{cumulative_percent} or
1793 @code{factor_14}. It is not unique.
1795 @code{fixed} is 00 for text taken from user input, such as syntax
1796 fragments, expressions, file names, or data set names, and 01 for fixed
1797 text strings such as names of procedures or statistics. In the former
1798 case, @code{id} is always the empty string; in the latter case,
1799 @code{id} is still sometimes empty.
1802 The string value @code{s}, intended to be presented to the user
1803 formatted according to @code{format}. The format for a string is not
1804 too interesting, and the corpus contains many clearly invalid formats
1805 like A16.39 or A255.127 or A134.1, so readers should probably entirely
1806 disregard the format. PSPP only checks @code{format} to distinguish
1809 @code{s} is a value of variable @code{var-name} and has value label
1810 @code{value-label}. @code{var-name} is never empty but
1811 @code{value-label} is commonly empty.
1813 @code{show} has the same meaning as in the encoding for 02.
1816 Variable @code{var-name} with variable label @code{var-label}. In the
1817 corpus, @code{var-name} is rarely empty and @code{var-label} is often
1820 @code{show} determines whether to show the variable name or the
1821 variable label. A value of 1 means to show the name, 2 to show the
1822 label, 3 to show both, and 0 means to use the default specified in
1823 @code{show-variables} (@pxref{SPV Light Member Formats}).
1826 Similar to type 03, with @code{fixed} assumed to be true.
1829 When the first byte of a RawValue is not one of the above, the
1830 RawValue starts with a ValueMod, whose syntax is described in the next
1831 section. (A ValueMod always begins with byte 31 or 58.)
1833 This case is a template string, analogous to @code{printf}, followed
1834 by one or more Arguments, each of which has one or more values. The
1835 template string is copied directly into the output except for the
1836 following special syntax:
1843 Each of these expands to the character following @samp{\\}, to escape
1844 characters that have special meaning in template strings. These are
1845 effective inside and outside the @code{[@dots{}]} syntax forms
1849 Expands to a new-line, inside or outside the @code{[@dots{}]} forms
1853 Expands to a formatted version of argument @var{i}, which must have
1854 only a single value. For example, @code{^1} expands to the first
1855 argument's @code{value}.
1857 @item [:@var{a}:]@var{i}
1858 Expands @var{a} for each of the values in @var{i}. @var{a}
1859 should contain one or more @code{^@var{j}} conversions, which are
1860 drawn from the values for argument @var{i} in order. Some examples
1865 All of the values for the first argument, concatenated.
1868 Expands to the values for the first argument, each followed by
1872 Expands to @code{@var{x} = @var{y}} where @var{x} is the second
1873 argument's first value and @var{y} is its second value. (This would
1874 be used only if the argument has two values. If there were more
1875 values, the second and third values would be directly concatenated,
1876 which would look funny.)
1879 @item [@var{a}:@var{b}:]@var{i}
1880 This extends the previous form so that the first values are expanded
1881 using @var{a} and later values are expanded using @var{b}. For an
1882 unknown reason, within @var{a} the @code{^@var{j}} conversions are
1883 instead written as @code{%@var{j}}. Some examples from the corpus:
1887 Expands to all of the values for the first argument, separated by
1890 @item [%1 = %2:, ^1 = ^2:]1
1891 Given appropriate values for the first argument, expands to @code{X =
1895 Given appropriate values, expands to @code{1, 2, 3}.
1899 The template string is localized to the user's locale.
1902 A writer may safely omit all of the optional 00 bytes at the beginning
1903 of a Value, except that it should write a single 00 byte before a
1906 @node SPV Light Member ValueMod
1907 @subsection ValueMod
1909 A ValueMod can specify special modifications to a Value.
1915 int32[n-refs] int16*[n-refs]
1916 int32[n-subscripts] string*[n-subscripts]
1917 v1(00 (i1 | i2) 00? 00? int32 00? 00?)
1918 v3(count(TemplateString StylePair))
1920 TemplateString => count((count((i0 (58 @math{|} 31 55))?) (58 @math{|} 31 string[id]))?)
1927 bool[bold] bool[italic] bool[underline] bool[show]
1928 string[fg-color] string[bg-color]
1929 string[typeface] byte[size]
1932 int32[halign] int32[valign] double[decimal-offset]
1933 int16[left-margin] int16[right-margin]
1934 int16[top-margin] int16[bottom-margin]
1937 A ValueMod that begins with ``31'' specifies special modifications to
1940 Each of the @code{n-refs} integers is a reference to a Footnote
1941 (@pxref{SPV Light Member Footnotes}) by 0-based index. Footnote
1942 markers are shown appended to the main text of the Value, as
1943 superscripts or subscripts.
1945 The @code{subscripts}, if present, are strings to append to the main
1946 text of the Value, as subscripts. Each subscript text is a brief
1947 indicator, e.g.@: @samp{a} or @samp{b}, with its meaning indicated by
1948 the table caption. When multiple subscripts are present, they are
1949 displayed separated by commas.
1951 The @code{id} inside the TemplateString, if present, is a template
1952 string for substitutions using the syntax explained previously. It
1953 appears to be an English-language version of the localized template
1954 string in the Value in which the Template is nested. A writer may
1955 safely omit the optional fixed data in TemplateString.
1957 FontStyle and CellStyle, if present, change the style for this
1958 individual Value. In FontStyle, @code{bold}, @code{italic}, and
1959 @code{underline} control the particular style. @code{show} is
1960 ordinarily 1; if it is 0, then the cell data is not shown.
1961 @code{fg-color} and @code{bg-color} are strings in the format
1962 @code{#rrggbb}, e.g.@: @code{#ff0000} for red or @code{#ffffff} for
1963 white. The empty string is occasionally observed also. The
1964 @code{size} is a font size in units of 1/128 inch.
In CellStyle, @code{halign} is 0 for center, 2 for left, 4 for right,
6 for decimal, and 0xffffffad for mixed. For decimal alignment,
@code{decimal-offset} is the decimal point's offset from the right
side of the cell, in pt (@pxref{SPV Light Detail Member Format}).
@code{valign} specifies vertical alignment: 0 for center, 1 for top,
and 3 for bottom. @code{left-margin}, @code{right-margin},
@code{top-margin}, and @code{bottom-margin} are in pt.
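The numeric codes above map naturally onto small lookup tables. The
following Python sketch decodes them; the function and table names are
hypothetical, and converting @code{size} to points simply assumes 72
points per inch. Because @code{halign} is stored in an @code{int32}, a
reader may see the ``mixed'' code as either 0xffffffad or -83, so the
sketch masks it to 32 bits before looking it up.

```python
# Sketch: decode CellStyle/FontStyle fields from a light-member ValueMod.
# Function and table names are hypothetical; the codes come from the text.

HALIGN = dict([(0, "center"), (2, "left"), (4, "right"),
               (6, "decimal"), (0xFFFFFFAD, "mixed")])
VALIGN = dict([(0, "center"), (1, "top"), (3, "bottom")])


def decode_halign(code):
    """Name a halign code; mask to 32 bits so -83 and 0xffffffad agree."""
    return HALIGN.get(code & 0xFFFFFFFF, "unknown")


def decode_valign(code):
    """Name a valign code, or "unknown" for codes not seen in the corpus."""
    return VALIGN.get(code, "unknown")


def parse_color(text):
    """Parse "#rrggbb" into an (r, g, b) tuple; the empty string gives None."""
    if text == "":
        return None
    if len(text) != 7 or not text.startswith("#"):
        raise ValueError("unexpected color string %r" % text)
    return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))


def font_size_points(size):
    """Convert a FontStyle size in 1/128 inch to points (72 per inch)."""
    return size * 72 / 128
```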
1974 @node SPV Legacy Detail Member Binary Format
1975 @section Legacy Detail Member Binary Format
1977 Whereas the light binary format represents everything about a given
1978 pivot table, the legacy binary format conceptually consists of a
1979 number of named sources, each of which consists of a number of named
1980 variables, each of which is a 1-dimensional array of numbers or
1981 strings or a mix. Thus, the legacy binary member format is quite
1984 This section uses the same context-free grammar notation as in the
1985 previous section, with the following additions:
1989 In a version 0xaf legacy member, @var{x}; in other versions, nothing.
1990 (The legacy member header indicates the version; see below.)
1993 In a version 0xb0 legacy member, @var{x}; in other versions, nothing.
1996 A legacy detail member @file{.bin} has the following overall format:
2000 00 byte[version] int16[n-sources] int32[member-size]
2001 Metadata*[n-sources]
2006 @code{version} is a version number that affects the interpretation of
2007 some of the other data in the member. Versions 0xaf and 0xb0 are
2008 known. We will refer to ``version 0xaf'' and ``version 0xb0'' members
2011 A legacy member consists of @code{n-sources} data sources, each of
2012 which has Metadata and Data.
2014 @code{member-size} is the size of the legacy binary member, in bytes.
The Data and Strings above are commented out because of an oddity in
the Metadata that sometimes makes the Data start at an unexpected
place. The following section goes into detail.
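As a minimal sketch, the fixed 8-byte header at the start of the
member can be decoded with Python's @code{struct} module. The sketch
assumes little-endian integers; the function name is hypothetical.
The Metadata records follow the header immediately.

```python
import struct

HEADER_FORMAT = "<BBhi"  # 00 byte, version byte, int16 n-sources, int32 member-size
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes; "<" means no padding


def parse_legacy_header(member):
    """Return (version, n_sources, member_size) for a legacy .bin member.

    ASSUMPTION: integers are little-endian.
    """
    zero, version, n_sources, member_size = struct.unpack_from(HEADER_FORMAT, member, 0)
    if zero != 0:
        raise ValueError("legacy member does not start with a 00 byte")
    if version not in (0xAF, 0xB0):
        raise ValueError("unknown legacy member version 0x%02x" % version)
    if member_size != len(member):
        raise ValueError("member-size %d, but member is %d bytes"
                         % (member_size, len(member)))
    return version, n_sources, member_size
```

The @code{member-size} check is only a sanity test; a lenient reader
might warn instead of failing.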
2021 * SPV Legacy Member Metadata::
2022 * SPV Legacy Member Numeric Data::
2023 * SPV Legacy Member String Data::
2026 @node SPV Legacy Member Metadata
2027 @subsection Metadata
2031 int32[n-values] int32[n-variables] int32[data-offset]
2032 vAF(byte*28[source-name])
2033 vB0(byte*64[source-name] int32[x])
2036 A data source has @code{n-variables} variables, each with
2037 @code{n-values} data values.
2039 @code{source-name} is a 28- or 64-byte string padded on the right with
2040 0-bytes. The names that appear in the corpus are very generic:
2041 usually @code{tableData} for pivot table data or @code{source0} for
2044 A given Metadata's @code{data-offset} is the offset, in bytes, from
2045 the beginning of the member to the start of the corresponding Data.
2046 This allows programs to skip to the beginning of the data for a
2047 particular source. In every case in the corpus, the Data follow the
2048 Metadata in the same order, but it is important to use
2049 @code{data-offset} instead of reading sequentially through the file
2050 because of the exception described below.
2052 One SPV file in the corpus has legacy binary members with version 0xb0
2053 but a 28-byte @code{source-name} field (and only a single source). In
2054 practice, this means that the 64-byte @code{source-name} used in
2055 version 0xb0 has a lot of 0-bytes in the middle followed by the
2056 @code{variable-name} of the following Data. As long as a reader
2057 treats the first 0-byte in the @code{source-name} as terminating the
2058 string, it can properly interpret these members.
2060 The meaning of @code{x} in version 0xb0 is unknown.
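The Metadata records can be read sequentially after the 8-byte
header. The following Python sketch (little-endian integers assumed;
names hypothetical) treats the first 0-byte as the end of
@code{source-name}, which lets it cope with the 0xb0 member that uses a
28-byte name, and it returns each @code{data-offset} rather than
computing where the Data begin.

```python
import struct
from collections import namedtuple

Metadata = namedtuple("Metadata", "n_values n_variables data_offset source_name")


def parse_metadata(member, version, n_sources, pos=8):
    """Parse n_sources Metadata records starting at pos (just after the header).

    ASSUMPTION: integers are little-endian.
    """
    name_size = 28 if version == 0xAF else 64
    sources = []
    for _ in range(n_sources):
        n_values, n_variables, data_offset = struct.unpack_from("<iii", member, pos)
        pos += 12
        raw_name = member[pos:pos + name_size]
        pos += name_size
        if version == 0xB0:
            pos += 4  # skip int32 x, whose meaning is unknown
        # Stop at the first 0-byte; anything after it is padding or, in the
        # odd 0xb0 file, the start of the following Data.
        name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
        sources.append(Metadata(n_values, n_variables, data_offset, name))
    return sources
```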
2062 @node SPV Legacy Member Numeric Data
2063 @subsection Numeric Data
2066 Data => Variable*[n-variables]
2067 Variable => byte*288[variable-name] double*[n-values]
2070 Data follow the Metadata in the legacy binary format, with sources in
2071 the same order (but readers should use the @code{data-offset} in
2072 Metadata records, rather than reading sequentially). Each Variable
2073 begins with a @code{variable-name} that generally indicates its role
2074 in the pivot table, e.g.@: ``cell'', ``cellFormat'',
2075 ``dimension0categories'', ``dimension0group0'', followed by the
2076 numeric data, one double per datum. A double with the maximum
2077 negative double @code{-DBL_MAX} represents the system-missing value
2080 @node SPV Legacy Member String Data
2081 @subsection String Data
2084 Strings => SourceMaps[maps] Labels
2086 SourceMaps => int32[n-maps] SourceMap*[n-maps]
2088 SourceMap => string[source-name] int32[n-variables] VariableMap*[n-variables]
2089 VariableMap => string[variable-name] int32[n-data] DatumMap*[n-data]
2090 DatumMap => int32[value-idx] int32[label-idx]
2092 Labels => int32[n-labels] Label*[n-labels]
2093 Label => int32[frequency] string[label]
2096 Each variable may include a mix of numeric and string data values. If
2097 a legacy binary member contains any string data, Strings is present;
2098 otherwise, it ends just after the last Data element.
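Because Strings only replaces values that already exist in the numeric
Data, a reader generally loads each source's Variables first, starting
at the source's @code{data-offset}. The following Python sketch
assumes little-endian IEEE 754 doubles and a @code{variable-name}
padded with 0-bytes like @code{source-name}; it reports @code{-DBL_MAX}
(Python's @code{-sys.float_info.max}) as SYSMIS by returning
@code{None}. The function name is hypothetical.

```python
import struct
import sys

SYSMIS = -sys.float_info.max  # the C constant -DBL_MAX
NAME_SIZE = 288  # byte*288[variable-name]


def read_source_data(member, data_offset, n_variables, n_values):
    """Return [(variable_name, values)] for one source's Data.

    ASSUMPTIONS: doubles are little-endian, and names are padded with 0-bytes.
    SYSMIS values become None; NaN placeholders are kept as float("nan").
    """
    pos = data_offset
    variables = []
    for _ in range(n_variables):
        raw_name = member[pos:pos + NAME_SIZE]
        name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
        pos += NAME_SIZE
        values = struct.unpack_from("<%dd" % n_values, member, pos)
        pos += 8 * n_values
        variables.append((name, [None if v == SYSMIS else v for v in values]))
    return variables
```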
2100 The string data overlays the numeric data. When a variable includes
2101 any string data, its Variable represents the string values with a
2102 SYSMIS or NaN placeholder. (Not all such values need be
2105 Each SourceMap provides a mapping between SYSMIS or NaN values in source
2106 @code{source-name} and the string data that they represent.
2107 @code{n-variables} is the number of variables in the source that
2108 include string data. More precisely, it is the 1-based index of the
2109 last variable in the source that includes any string data; thus, it
2110 would be 4 if there are 5 variables and only the fourth one includes
2113 A VariableMap repeats its variable's name, but variables are always
2114 present in the same order as the source, starting from the first
2115 variable, without skipping any even if they have no string values.
2116 Each VariableMap contains DatumMap nonterminals, each of which maps
2117 from a 0-based index within its variable's data to a 0-based label
index. For example, the pair @code{value-idx} = 2, @code{label-idx} = 3
means that the third data value (which must be SYSMIS or NaN) is to be
replaced by the string of the fourth Label.
2122 The labels themselves follow the pairs. The valuable part of each
2123 label is the string @code{label}. Each label also includes a
2124 @code{frequency} that reports the number of DatumMaps that reference
2125 it (although this is not useful).
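Putting the pieces together, the following Python sketch parses
Strings and overlays the labels onto numeric Data that has already been
read, held as a list of @code{(name, values)} pairs per source. It
assumes little-endian integers and that each @code{string} is an
@code{int32} byte count followed by that many bytes of UTF-8; both are
assumptions of this sketch, as are the function names. @code{pos} must
be the offset just past the last Data element.

```python
import struct


def _int32(buf, pos):
    """Read a little-endian int32 (an assumption); return (value, new_pos)."""
    return struct.unpack_from("<i", buf, pos)[0], pos + 4


def _string(buf, pos):
    """ASSUMPTION: a string is an int32 byte count followed by UTF-8 bytes."""
    n, pos = _int32(buf, pos)
    return buf[pos:pos + n].decode("utf-8", "replace"), pos + n


def read_strings(buf, pos):
    """Parse Strings at pos; return (source_maps, labels).

    source_maps is [(source_name, [(variable_name, [(value_idx, label_idx)])])].
    """
    n_maps, pos = _int32(buf, pos)
    source_maps = []
    for _ in range(n_maps):
        source_name, pos = _string(buf, pos)
        n_variables, pos = _int32(buf, pos)
        variable_maps = []
        for _ in range(n_variables):
            variable_name, pos = _string(buf, pos)
            n_data, pos = _int32(buf, pos)
            pairs = [struct.unpack_from("<ii", buf, pos + 8 * i) for i in range(n_data)]
            pos += 8 * n_data
            variable_maps.append((variable_name, pairs))
        source_maps.append((source_name, variable_maps))
    n_labels, pos = _int32(buf, pos)
    labels = []
    for _ in range(n_labels):
        _frequency, pos = _int32(buf, pos)  # DatumMaps referencing it; unused
        label, pos = _string(buf, pos)
        labels.append(label)
    return source_maps, labels


def apply_strings(sources, source_maps, labels):
    """Replace SYSMIS/NaN placeholders with labels, in place.

    sources maps a source name to its [(variable_name, values)] list.
    """
    for source_name, variable_maps in source_maps:
        # VariableMaps start at the source's first variable and skip none,
        # so pairing them positionally with the source's variables is safe.
        for (_, values), (_, pairs) in zip(sources[source_name], variable_maps):
            for value_idx, label_idx in pairs:
                values[value_idx] = labels[label_idx]
```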
2127 @node SPV Legacy Detail Member XML Format
2128 @section Legacy Detail Member XML Format
2130 The design of the detail XML format is not what one would end up with
2131 for describing pivot tables. This is because it is a special case
2132 of a much more general format (``visualization XML'' or ``VizML'')
2133 that can describe a wide range of visualizations. Most of this
2134 generality is overkill for tables, and so we end up with a funny
2135 subset of a general-purpose format.
2137 An XML Schema for VizML is available, distributed with SPSS binaries,
2138 under a nonfree license. It contains documentation that is
2139 occasionally helpful.
2141 This section describes the detail XML format using the same notation
2142 already used for the structure XML format (@pxref{SPV Structure Member
Format}). See @file{src/output/spv/detail-xml.grammar} in the PSPP
source tree for the full grammar that PSPP uses to parse it.
2146 The important elements of the detail XML format are:
2150 Variables. @xref{SPV Detail Variable Elements}.
2153 Assignment of variables to axes. A variable can appear as columns, or
2154 rows, or layers. The @code{faceting} element and its sub-elements
2155 describe this assignment.
2158 Styles and other annotations.
2161 This description is not detailed enough to write legacy tables.
2162 Instead, write tables in the light binary format.
2165 * SPV Detail visualization Element::
2166 * SPV Detail Variable Elements::
2167 * SPV Detail extension Element::
2168 * SPV Detail graph Element::
2169 * SPV Detail location Element::
2170 * SPV Detail faceting Element::
2171 * SPV Detail facetLayout Element::
2172 * SPV Detail label Element::
2173 * SPV Detail setCellProperties Element::
2174 * SPV Detail setFormat Element::
2175 * SPV Detail interval Element::
2176 * SPV Detail style Element::
2177 * SPV Detail labelFrame Element::
2178 * SPV Detail Legacy Properties::
2181 @node SPV Detail visualization Element
2182 @subsection The @code{visualization} Element
2190 :style[style_ref]=ref style
2194 => visualization_extension?
2196 (sourceVariable | derivedVariable)+
2205 extension[visualization_extension]
2208 :minWidthSet=(true)?
2209 :maxWidthSet=(true)?
2212 userSource :missing=(listwise | pairwise)? => EMPTY
2214 categoricalDomain => variableReference simpleSort
2216 simpleSort :method[sort_method]=(custom) => categoryOrder
2218 container :style=ref style => container_extension? location+ labelFrame*
2220 extension[container_extension] :combinedFootnotes=(true) => EMPTY
The @code{visualization} element is the root of a detail XML member. It
2229 has the following attributes:
2231 @defvr {Attribute} creator
2232 The version of the software that created this SPV file, as a string of
2233 the form @code{xxyyzz}, which represents software version xx.yy.zz,
2234 e.g.@: @code{160001} is version 16.0.1. The corpus includes major
2235 versions 16 through 19.
2238 @defvr {Attribute} date
The date on which the file was created, as a string of the form
2243 @defvr {Attribute} lang
2244 The locale used for output, in Windows format, which is similar to the
2245 format used in Unix with the underscore replaced by a hyphen, e.g.@:
2246 @code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cryl-RS}.
2249 @defvr {Attribute} name
2250 The title of the pivot table, localized to the output language.
2253 @defvr {Attribute} style
2254 The base style for the pivot table. In every example in the corpus,
2255 the @code{style} element has no attributes other than @code{id}.
2258 @defvr {Attribute} type
2259 A floating-point number. The meaning is unknown.
2262 @defvr {Attribute} version
2263 The visualization schema version number. In the corpus, the value is
2264 one of 2.4, 2.5, 2.7, and 2.8.
2267 The @code{userSource} element has no visible effect.
2269 The @code{extension} element as a child of @code{visualization} has
2270 the following attributes.
2272 @defvr {Attribute} numRows
2273 An integer that presumably defines the number of rows in the displayed
2277 @defvr {Attribute} showGridline
2278 Always set to @code{false} in the corpus.
2281 @defvr {Attribute} minWidthSet
2282 @defvrx {Attribute} maxWidthSet
2283 Always set to @code{true} in the corpus.
2286 The @code{extension} element as a child of @code{container} has the
2289 @defvr {Attribute} combinedFootnotes
2293 The @code{categoricalDomain} and @code{simpleSort} elements have no
2296 The @code{layerController} element has no visible effect.
2298 @node SPV Detail Variable Elements
2299 @subsection Variable Elements
2301 A ``variable'' in detail XML is a 1-dimensional array of data. Each
2302 element of the array may, independently, have string or numeric
2303 content. All of the variables in a given detail XML member either
2304 have the same number of elements or have zero elements.
2306 Two different elements define variables and their content:
2309 @item sourceVariable
2310 These variables' data comes from the associated @code{tableData.bin}
2313 @item derivedVariable
2314 These variables are defined in terms of a mapping function from a
2315 source variable, or they are empty.
2318 A variable named @code{cell} always exists. This variable holds the
2319 data displayed in the table.
2321 Variables in detail XML roughly correspond to the dimensions in a
2322 light detail member. Each dimension has the following variables with
2323 stylized names, where @var{n} is a number for the dimension starting
2327 @item dimension@var{n}categories
2328 The dimension's leaf categories (@pxref{SPV Light Member Categories}).
2330 @item dimension@var{n}group0
2331 Present only if the dimension's categories are grouped, this variable
2332 holds the group labels for the categories. Grouping is inferred
2333 through adjacent identical labels. Categories that are not part of a
2334 group have empty-string data in this variable.
2336 @item dimension@var{n}group1
2337 Present only if the first-level groups are further grouped, this
2338 variable holds the labels for the second-level groups. There can be
2339 additional variables with further levels of grouping.
2341 @item dimension@var{n}
2345 Determining the data for a (non-empty) variable is a multi-step
2350 Draw initial data from its source, for a @code{sourceVariable}, or
2351 from another named variable, for a @code{derivedVariable}.
2354 Apply mappings from @code{valueMapEntry} elements within the
2355 @code{derivedVariable} element, if any.
2358 Apply mappings from @code{relabel} elements within a @code{format} or
2359 @code{stringFormat} element in the @code{sourceVariable} or
2360 @code{derivedVariable} element, if any.
2363 If the variable is a @code{sourceVariable} with a @code{labelVariable}
2364 attribute, and there were no mappings to apply in previous steps, then
2365 replace each element of the variable by the corresponding value in the
2369 A single variable's data can be modified in two of the steps, if both
2370 @code{valueMapEntry} and @code{relabel} are used. The following
2371 example from the corpus maps several integers to 2, then maps 2 in
2372 turn to the string ``Input'':
2375 <derivedVariable categorical="true" dependsOn="dimension0categories"
2376 id="dimension0group0map" value="map(dimension0group0)">
2378 <relabel from="2" to="Input"/>
2379 <relabel from="10" to="Missing Value Handling"/>
2380 <relabel from="14" to="Resources"/>
2381 <relabel from="0" to=""/>
2382 <relabel from="1" to=""/>
2383 <relabel from="13" to=""/>
2385 <valueMapEntry from="2;3;5;6;7;8;9" to="2"/>
2386 <valueMapEntry from="10;11" to="10"/>
2387 <valueMapEntry from="14;15" to="14"/>
2388 <valueMapEntry from="0" to="0"/>
2389 <valueMapEntry from="1" to="1"/>
2390 <valueMapEntry from="13" to="13"/>
2395 * SPV Detail sourceVariable Element::
2396 * SPV Detail derivedVariable Element::
2397 * SPV Detail valueMapEntry Element::
2400 @node SPV Detail sourceVariable Element
2401 @subsubsection The @code{sourceVariable} Element
2408 :domain=ref categoricalDomain?
2410 :dependsOn=ref sourceVariable?
2412 :labelVariable=ref sourceVariable?
2413 => variable_extension* (format | stringFormat)?
2416 This element defines a variable whose data comes from the
2417 @file{tableData.bin} member that corresponds to this @file{.xml}.
2419 This element has the following attributes.
2421 @defvr {Attribute} id
2422 An @code{id} is always present because this element exists to be
2423 referenced from other elements.
2426 @defvr {Attribute} categorical
2427 Always set to @code{true}.
2430 @defvr {Attribute} source
2431 Always set to @code{tableData}, the @code{source-name} in the
2432 corresponding @file{tableData.bin} member (@pxref{SPV Legacy Member
2436 @defvr {Attribute} sourceName
2437 The name of a variable within the source, corresponding to the
2438 @code{variable-name} in the @file{tableData.bin} member (@pxref{SPV
2439 Legacy Member Numeric Data}).
2442 @defvr {Attribute} label
2443 The variable label, if any.
2446 @defvr {Attribute} labelVariable
2447 The @code{variable-name} of a variable whose string values correspond
2448 one-to-one with the values of this variable and are suitable for use
2452 @defvr {Attribute} dependsOn
2453 This attribute doesn't affect the display of a table.
2456 @node SPV Detail derivedVariable Element
2457 @subsubsection The @code{derivedVariable} Element
2464 :dependsOn=ref sourceVariable?
2465 => variable_extension* (format | stringFormat)? valueMapEntry*
2468 Like @code{sourceVariable}, this element defines a variable whose
2469 values can be used elsewhere in the visualization. Instead of being
2470 read from a data source, the variable's data are defined by a
2471 mathematical expression.
2473 This element has the following attributes.
2475 @defvr {Attribute} id
2476 An @code{id} is always present because this element exists to be
2477 referenced from other elements.
2480 @defvr {Attribute} categorical
2481 Always set to @code{true}.
2484 @defvr {Attribute} value
2485 An expression that defines the variable's value. In theory this could
2486 be an arbitrary expression in terms of constants, functions, and other
2487 variables, e.g.@: @math{(@var{var1} + @var{var2}) / 2}. In practice,
2488 the corpus contains only the following forms of expressions:
2492 @itemx constant(@var{variable})
2493 All zeros. The reason why a variable is sometimes named is unknown.
2494 Sometimes the ``variable name'' has spaces in it.
2496 @item map(@var{variable})
2497 Transforms the values in the named @var{variable} using the
2498 @code{valueMapEntry}s contained within the element.
2502 @defvr {Attribute} dependsOn
2503 This attribute doesn't affect the display of a table.
2506 @node SPV Detail valueMapEntry Element
2507 @subsubsection The @code{valueMapEntry} Element
2510 valueMapEntry :from :to => EMPTY
2513 A @code{valueMapEntry} element defines a mapping from one or more
2514 values of a source expression to a target value. (In the corpus, the
2515 source expression is always just the name of a variable.) Each target
value requires a separate @code{valueMapEntry}. If multiple source
values map to the same target value, they may be listed together in one
@code{valueMapEntry} or split across several.
2519 In the corpus, all of the source and target values are integers.
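The following Python sketch applies the @code{valueMapEntry} step and
then the @code{relabel} step to a variable's values, using the corpus
example in which several integers map to 2 and 2 is relabeled
``Input'' (@pxref{SPV Detail Variable Elements}). Comparing values as
floating-point numbers, and passing unmapped values through unchanged,
are assumptions of this sketch; the function names are hypothetical.

```python
def parse_value_map(entries):
    """Build a lookup from valueMapEntry (from, to) attribute pairs.

    A from attribute may hold several source values separated by semicolons.
    """
    mapping = dict()
    for from_attr, to_attr in entries:
        for source in from_attr.split(";"):
            mapping[float(source)] = float(to_attr)
    return mapping


def parse_relabels(entries):
    """Build a lookup from relabel (from, to) attribute pairs."""
    return dict((float(from_attr), to_attr) for from_attr, to_attr in entries)


def derive(values, value_map, relabels):
    """Apply valueMapEntry mappings, then relabel mappings.

    ASSUMPTION: a value with no mapping passes through unchanged.
    """
    mapped = [value_map.get(v, v) for v in values]
    return [relabels.get(v, v) for v in mapped]
```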
2521 @code{valueMapEntry} has the following attributes.
2523 @defvr {Attribute} from
2524 A source value, or multiple source values separated by semicolons,
2525 e.g.@: @code{0} or @code{13;14;15;16}.
2528 @defvr {Attribute} to
2529 The target value, e.g.@: @code{0}.
2532 @node SPV Detail extension Element
2533 @subsection The @code{extension} Element
2535 This is a general-purpose ``extension'' element. Readers that don't
2536 understand a given extension should be able to safely ignore it. The
2537 attributes on this element, and their meanings, vary based on the
2538 context. Each known usage is described separately below. The current
2539 extensions use attributes exclusively, without any nested elements.
2541 @subsubheading @code{container} Parent Element
2544 extension[container_extension] :combinedFootnotes=(true) => EMPTY
2547 With @code{container} as its parent element, @code{extension} has the
2548 following attributes.
2550 @defvr {Attribute} combinedFootnotes
2551 Always set to @code{true} in the corpus.
2554 @subsubheading @code{sourceVariable} and @code{derivedVariable} Parent Element
2557 extension[variable_extension] :from :helpId => EMPTY
2560 With @code{sourceVariable} or @code{derivedVariable} as its parent
2561 element, @code{extension} has the following attributes. A given
2562 parent element often contains several @code{extension} elements that
2563 specify the meaning of the source data's variables or sources, e.g.@:
2566 <extension from="0" helpId="corrected_model"/>
2567 <extension from="3" helpId="error"/>
2568 <extension from="4" helpId="total_9"/>
2569 <extension from="5" helpId="corrected_total"/>
2572 More commonly they are less helpful, e.g.@:
2575 <extension from="0" helpId="notes"/>
2576 <extension from="1" helpId="notes"/>
2577 <extension from="2" helpId="notes"/>
2578 <extension from="5" helpId="notes"/>
2579 <extension from="6" helpId="notes"/>
2580 <extension from="7" helpId="notes"/>
2581 <extension from="8" helpId="notes"/>
2582 <extension from="12" helpId="notes"/>
2583 <extension from="13" helpId="no_help"/>
2584 <extension from="14" helpId="notes"/>
2587 @defvr {Attribute} from
2588 An integer or a name like ``dimension0''.
2591 @defvr {Attribute} helpId
2595 @node SPV Detail graph Element
2596 @subsection The @code{graph} Element
2600 :cellStyle=ref style
2602 => location+ coordinates faceting facetLayout interval
2604 coordinates => EMPTY
2607 @code{graph} has the following attributes.
2609 @defvr {Attribute} cellStyle
2610 @defvrx {Attribute} style
2611 Each of these is the @code{id} of a @code{style} element (@pxref{SPV
2612 Detail style Element}). The former is the default style for
2613 individual cells, the latter for the entire table.
2616 @node SPV Detail location Element
2617 @subsection The @code{location} Element
2621 :part=(height | width | top | bottom | left | right)
2622 :method=(sizeToContent | attach | fixed | same)
2625 :target=ref (labelFrame | graph | container)?
2630 Each instance of this element specifies where some part of the table
2631 frame is located. All the examples in the corpus have four instances
2632 of this element, one for each of the parts @code{height},
2633 @code{width}, @code{left}, and @code{top}. Some examples in the
2634 corpus add a fifth for part @code{bottom}, even though it is not clear
2635 how all of @code{top}, @code{bottom}, and @code{height} can be honored
2636 at the same time. In any case, @code{location} seems to have little
2637 importance in representing tables; a reader can safely ignore it.
2639 @defvr {Attribute} part
2640 The part of the table being located.
2643 @defvr {Attribute} method
2644 How the location is determined:
2648 Based on the natural size of the table. Observed only for
2649 parts @code{height} and @code{width}.
2652 Based on the location specified in @code{target}. Observed only for
2653 parts @code{top} and @code{bottom}.
2656 Using the value in @code{value}. Observed only for parts @code{top},
2657 @code{bottom}, and @code{left}.
2660 Same as the specified @code{target}. Observed only for part
2665 @defvr {Attribute} min
2666 Minimum size. Only observed with value @code{100pt}. Only observed
2667 for part @code{width}.
2670 @defvr {Dependent} target
2671 Required when @code{method} is @code{attach} or @code{same}, not
2672 observed otherwise. This identifies an element to attach to.
2673 Observed with the ID of @code{title}, @code{footnote}, @code{graph},
2677 @defvr {Dependent} value
2678 Required when @code{method} is @code{fixed}, not observed otherwise.
2679 Observed values are @code{0%}, @code{0px}, @code{1px}, and @code{3px}
2680 on parts @code{top} and @code{left}, and @code{100%} on part
2684 @node SPV Detail faceting Element
2685 @subsection The @code{faceting} Element
2688 faceting => layer[layers1]* cross layer[layers2]*
2690 cross => (unity | nest) (unity | nest)
2694 nest => variableReference[vars]+
2696 variableReference :ref=ref (sourceVariable | derivedVariable) => EMPTY
2699 :variable=ref (sourceVariable | derivedVariable)
2702 :method[layer_method]=(nest)?
2707 The @code{faceting} element describes the row, column, and layer
2708 structure of the table. Its @code{cross} child determines the row and
2709 column structure, and each @code{layer} child (if any) represents a
2710 layer. Layers may appear before or after @code{cross}.
2712 The @code{cross} element describes the row and column structure of the
2713 table. It has exactly two children, the first of which describes the
2714 table's columns and the second the table's rows. Each child is a
2715 @code{nest} element if the table has any dimensions along the axis in
2716 question, otherwise a @code{unity} element.
A @code{nest} element contains one or more dimensions listed from
innermost to outermost, each represented by @code{variableReference}
child elements. Each variable in a dimension is listed in order.
2721 @xref{SPV Detail Variable Elements}, for information on the variables
2722 that comprise a dimension.
2724 A @code{nest} can contain a single dimension, e.g.:
2728 <variableReference ref="dimension0categories"/>
2729 <variableReference ref="dimension0group0"/>
2730 <variableReference ref="dimension0"/>
2735 A @code{nest} can contain multiple dimensions, e.g.:
2739 <variableReference ref="dimension1categories"/>
2740 <variableReference ref="dimension1group0"/>
2741 <variableReference ref="dimension1"/>
2742 <variableReference ref="dimension0categories"/>
2743 <variableReference ref="dimension0"/>
2747 A @code{nest} may have no dimensions, in which case it still has one
2748 @code{variableReference} child, which references a
2749 @code{derivedVariable} whose @code{value} attribute is
@code{constant(0)}. In the corpus, such a @code{derivedVariable} has
@code{row} or @code{column} as its @code{id}, for the row or column
@code{nest} respectively. This is
2754 A @code{variableReference} element refers to a variable through its
2755 @code{ref} attribute.
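Since the variables of a dimension appear consecutively in a
@code{nest}, a reader can recover the dimensions by grouping the
references. The following Python sketch does so by parsing the
stylized @code{dimension@var{n}} prefix of each @code{ref}. This
relies on a naming convention observed in the corpus, not on anything
the format guarantees, and the function name is hypothetical.

```python
import re

_DIMENSION_PREFIX = re.compile(r"dimension(\d+)")


def group_nest(refs):
    """Split a nest's variableReference ref values into dimensions.

    HEURISTIC: relies on the corpus's dimension<n>... variable names;
    consecutive references with the same <n> form one dimension.
    Returns [(n, [refs])] in nest order, i.e. innermost dimension first.
    """
    dims = []
    for ref in refs:
        m = _DIMENSION_PREFIX.match(ref)
        if m is None:
            continue  # e.g. a constant(0) placeholder such as "row"
        n = int(m.group(1))
        if dims and dims[-1][0] == n:
            dims[-1][1].append(ref)
        else:
            dims.append((n, [ref]))
    return dims
```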
2757 Each @code{layer} element represents a dimension, e.g.:
2760 <layer value="0" variable="dimension0categories" visible="true"/>
2761 <layer value="dimension0" variable="dimension0" visible="false"/>
2765 @code{layer} has the following attributes.
2767 @defvr {Attribute} variable
2768 Refers to a @code{sourceVariable} or @code{derivedVariable} element.
2771 @defvr {Attribute} value
2772 The value to select. For a category variable, this is always
2773 @code{0}; for a data variable, it is the same as the @code{variable}
2777 @defvr {Attribute} visible
2778 Whether the layer is visible. Generally, category layers are visible
2779 and data layers are not, but sometimes this attribute is omitted.
2782 @defvr {Attribute} method
2783 When present, this is always @code{nest}.
2786 @node SPV Detail facetLayout Element
2787 @subsection The @code{facetLayout} Element
2790 facetLayout => tableLayout setCellProperties[scp1]*
2791 facetLevel+ setCellProperties[scp2]*
2794 :verticalTitlesInCorner=bool
2796 :fitCells=(ticks both)?
2800 The @code{facetLayout} element and its descendants control styling for
Its @code{tableLayout} child has the following attributes:
2805 @defvr {Attribute} verticalTitlesInCorner
2806 If true, in the absence of corner text, row headings will be displayed
2810 @defvr {Attribute} style
2811 Refers to a @code{style} element.
2814 @defvr {Attribute} fitCells
2818 @subsubheading The @code{facetLevel} Element
2821 facetLevel :level=int :gap=dimension? => axis
2823 axis :style=ref style => label? majorTicks
2829 :tickFrameStyle=ref style
2830 :labelFrequency=int?
2840 Each @code{facetLevel} describes a @code{variableReference} or
2841 @code{layer}, and a table has one @code{facetLevel} element for
2842 each such element. For example, an SPV detail member that contains
2843 four @code{variableReference} elements and two @code{layer} elements
2844 will contain six @code{facetLevel} elements.
2846 In the corpus, @code{facetLevel} elements and the elements that they
2847 describe are always in the same order. The correspondence may also be
2848 observed in two other ways. First, one may use the @code{level}
2849 attribute, described below. Second, in the corpus, a
2850 @code{facetLevel} always has an @code{id} that is the same as the
2851 @code{id} of the element it describes with @code{_facetLevel}
2852 appended. One should not formally rely on this, of course, but it is
2853 usefully indicative.
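Relying on the @code{level} attribute described below, which counts
all @code{variableReference} elements before any @code{layer}
elements, a reader can resolve a @code{facetLevel} as in the following
Python sketch. The function names are hypothetical, and the
@code{_facetLevel} comparison is only the corpus-derived hint mentioned
above, not a rule.

```python
def facet_level_target(level, variable_refs, layers):
    """Return the element a facetLevel with this 1-based level describes.

    variable_refs and layers hold the faceting element's variableReference
    and layer elements in document order (any representation will do).
    """
    targets = list(variable_refs) + list(layers)
    if not 1 <= level <= len(targets):
        raise ValueError("facetLevel level %d is out of range" % level)
    return targets[level - 1]


def id_hint_agrees(facet_level_id, target_id):
    """Corpus-only cross-check: ids usually differ by a "_facetLevel" suffix."""
    return facet_level_id == target_id + "_facetLevel"
```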
2855 @defvr {Attribute} level
2856 A 1-based index into the @code{variableReference} and @code{layer}
elements, e.g.@: a @code{facetLevel} with a @code{level} of 1
describes the first @code{variableReference} in the SPV detail member,
and in a member with four @code{variableReference} elements, a
@code{facetLevel} with a @code{level} of 5 describes the first
2861 @code{layer} in the member.
2864 @defvr {Attribute} gap
2865 Always observed as @code{0pt}.
Each @code{facetLevel} contains an @code{axis}, which in turn may
contain a @code{label} for the @code{facetLevel} (@pxref{SPV Detail
label Element}) and always contains a @code{majorTicks} element.
2872 @defvr {Attribute} labelAngle
2873 Normally 0. The value -90 causes inner column or outer row labels to
2874 be rotated vertically.
2877 @defvr {Attribute} style
2878 @defvrx {Attribute} tickFrameStyle
2879 Each refers to a @code{style} element. @code{style} is the style of
2880 the tick labels, @code{tickFrameStyle} the style for the frames around
2884 @node SPV Detail label Element
2885 @subsection The @code{label} Element
2890 :textFrameStyle=ref style?
2891 :purpose=(title | subTitle | subSubTitle | layer | footnote)?
2892 => text+ | descriptionGroup
2895 :target=ref faceting
2897 => (description | text)+
2899 description :name=(variable | value) => EMPTY
2903 :definesReference=int?
2904 :position=(subscript | superscript)?
2909 This element represents a label on some aspect of the table.
2911 @defvr {Attribute} style
2912 @defvrx {Attribute} textFrameStyle
2913 Each of these refers to a @code{style} element. @code{style} is the
2914 style of the label text, @code{textFrameStyle} the style for the frame
2918 @defvr {Attribute} purpose
2919 The kind of entity being labeled.
2922 A @code{descriptionGroup} concatenates one or more elements to form a
2923 label. Each element can be a @code{text} element, which contains
2924 literal text, or a @code{description} element that substitutes a value
2927 @defvr {Attribute} target
2928 The @code{id} of an element being described. In the corpus, this is
2929 always @code{faceting}.
2932 @defvr {Attribute} separator
A string to separate the descriptions of multiple groups, if the
2934 @code{target} has more than one. In the corpus, this is always a
2938 Typical contents for a @code{descriptionGroup} are a value by itself:
2940 <description name="value"/>
2942 @noindent or a variable and its value, separated by a colon:
2944 <description name="variable"/><text>:</text><description name="value"/>
2947 A @code{description} is like a macro that expands to some property of
2948 the target of its parent @code{descriptionGroup}. The @code{name}
2949 attribute specifies the property.
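A @code{descriptionGroup} can therefore be expanded like a small
template. In the following Python sketch, each child is given as a
@code{(kind, content)} pair, and each group described by the
@code{target} is supplied directly as a pair of variable and value
labels. How a reader obtains those labels from the @code{faceting}
element is not shown; that input, and the function name, are
hypothetical.

```python
def expand_description_group(children, groups, separator):
    """Expand a descriptionGroup into label text.

    children: ("text", literal) or ("description", name) pairs in document
    order, where name is "variable" or "value".
    groups: one (variable_label, value_label) pair per group described by the
    target (hypothetical input; obtaining it from faceting is not shown).
    """
    descriptions = []
    for variable_label, value_label in groups:
        pieces = []
        for kind, content in children:
            if kind == "text":
                pieces.append(content)
            elif content == "variable":
                pieces.append(variable_label)
            elif content == "value":
                pieces.append(value_label)
            else:
                raise ValueError("unknown description name %r" % content)
        descriptions.append("".join(pieces))
    return separator.join(descriptions)
```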
2951 @node SPV Detail setCellProperties Element
2952 @subsection The @code{setCellProperties} Element
2956 :applyToConverse=bool?
2957 => (setStyle | setFrameStyle | setFormat | setMetaData)* union[union_]?
2960 The @code{setCellProperties} element sets style properties of cells or
2961 row or column labels.
2963 Interpreting @code{setCellProperties} requires answering two
2964 questions: which cells or labels to style, and what styles to use.
2966 @subsubheading Which Cells?
2971 intersect => where+ | intersectWhere | alternating | EMPTY
2974 :variable=ref (sourceVariable | derivedVariable)
2979 :variable=ref (sourceVariable | derivedVariable)
2980 :variable2=ref (sourceVariable | derivedVariable)
2983 alternating => EMPTY
2986 When @code{union} is present with @code{intersect} children, each of
2987 those children specifies a group of cells that should be styled, and
2988 the total group is all those cells taken together. When @code{union}
2989 is absent, every cell is styled. One attribute on
2990 @code{setCellProperties} affects the choice of cells:
2992 @defvr {Attribute} applyToConverse
2993 If true, this inverts the meaning of the cell selection: the selected
2994 cells are the ones @emph{not} designated. This is confusing, given
2995 the additional restrictions of @code{union}, but in the corpus
2996 @code{applyToConverse} is never present along with @code{union}.
2999 An @code{intersect} specifies restrictions on the cells to be matched.
3000 Each @code{where} child specifies which values of a given variable to
include. The attributes of @code{where} are:
3003 @defvr {Attribute} variable
3004 Refers to a variable, e.g.@: @code{dimension0categories}. Only
3005 ``categories'' variables make sense here, but other variables, e.g.@:
3006 @code{dimension0group0map}, are sometimes seen. The reader may ignore
3010 @defvr {Attribute} include
3011 A value, or multiple values separated by semicolons,
3012 e.g.@: @code{0} or @code{13;14;15;16}.
3015 PSPP ignores @code{setCellProperties} when @code{intersectWhere} is
3018 @subsubheading What Styles?
3022 :target=ref (labeling | graph | interval | majorTicks)
3026 setMetaData :target=ref graph :key :value => EMPTY
3029 :target=ref (majorTicks | labeling)
3031 => format | numberFormat | stringFormat+ | dateTimeFormat | elapsedTimeFormat
3035 :target=ref majorTicks
3039 The @code{set*} children of @code{setCellProperties} determine the
3042 When @code{setCellProperties} contains a @code{setFormat} whose
3043 @code{target} references a @code{labeling} element, or if it contains
3044 a @code{setStyle} that references a @code{labeling} or @code{interval}
3045 element, the @code{setCellProperties} sets the style for table cells.
3046 The format from the @code{setFormat}, if present, replaces the cells'
3047 format. The style from the @code{setStyle} that references
3048 @code{labeling}, if present, replaces the cells' font and cell
3049 styles, except that the background color is taken instead from the
3050 @code{interval}'s style, if present.
3052 When @code{setCellProperties} contains a @code{setFormat} whose
3053 @code{target} references a @code{majorTicks} element, or if it
3054 contains a @code{setStyle} whose @code{target} references a
3055 @code{majorTicks}, or if it contains a @code{setFrameStyle} element,
3056 the @code{setCellProperties} sets the style for row or column labels.
3057 In this case, the @code{setCellProperties} always contains a single
3058 @code{where} element whose @code{variable} designates the variable
3059 whose labels are to be styled. The format from the @code{setFormat},
3060 if present, replaces the labels' format. The style from the
3061 @code{setStyle} that references @code{majorTicks}, if present,
3062 replaces the labels' font and cell styles, except that the background
3063 color is taken instead from the @code{setFrameStyle}'s style, if
3066 When @code{setCellProperties} contains a @code{setStyle} whose
3067 @code{target} references a @code{graph} element, and one that
3068 references a @code{labeling} element, and the @code{union} element
3069 contains @code{alternating}, the @code{setCellProperties} sets the
3070 alternate foreground and background colors for the data area. The
3071 foreground color is taken from the style referenced by the
3072 @code{setStyle} that targets the @code{graph}, the background color
3073 from the @code{setStyle} for @code{labeling}.
3075 A reader may ignore a @code{setCellProperties} that only contains
3076 @code{setMetaData}, as well as @code{setMetaData} within other
3077 @code{setCellProperties}.
3079 A reader may ignore a @code{setCellProperties} whose only @code{set*}
3080 child is a @code{setStyle} that targets the @code{graph} element.
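The rules above amount to a small decision procedure. The following Python sketch is one reading of them, not PSPP's actual code. It assumes that each @code{set*} child has already been reduced to a pair holding its element name and the kind of element its @code{target} refers to. The function name @code{classify} is hypothetical. The alternating-color case also contains a @code{setStyle} that targets @code{labeling}, so the sketch assumes that case must be tested before the table-cell case.

```python
def classify(children, has_alternating):
    """Return what a setCellProperties element styles, per the rules above.

    children: (tag, target_kind) pairs, e.g. ("setStyle", "labeling"),
    one per set* child.  has_alternating: whether union has alternating.
    """
    # setMetaData may always be ignored, so drop it first.
    relevant = [(tag, kind) for tag, kind in children if tag != "setMetaData"]
    formats = set(kind for tag, kind in relevant if tag == "setFormat")
    styles = set(kind for tag, kind in relevant if tag == "setStyle")
    has_frame = any(tag == "setFrameStyle" for tag, kind in relevant)

    if not relevant or relevant == [("setStyle", "graph")]:
        return "ignore"
    # Assumed precedence: this case also has a setStyle targeting labeling.
    if has_alternating and "graph" in styles and "labeling" in styles:
        return "alternating colors"
    if "labeling" in formats or "labeling" in styles or "interval" in styles:
        return "table cells"
    if "majorTicks" in formats or "majorTicks" in styles or has_frame:
        return "row or column labels"
    return "unknown"
```

Under this reading, a @code{setCellProperties} that holds only @code{setMetaData}, or only a @code{setStyle} that targets @code{graph}, comes back as @code{ignore}.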
3082 @subsubheading The @code{setStyle} Element
3086 :target=ref (labeling | graph | interval | majorTicks)
3091 This element associates a style with the target.
3093 @defvr {Attribute} target
3094 The @code{id} of an element whose style is to be set.
3097 @defvr {Attribute} style
3098 The @code{id} of a @code{style} element that identifies the style to
3102 @node SPV Detail setFormat Element
3103 @subsection The @code{setFormat} Element
3107 :target=ref (majorTicks | labeling)
3109 => format | numberFormat | stringFormat+ | dateTimeFormat | elapsedTimeFormat
3112 This element sets the format of the target, ``format'' in this case
3113 meaning the SPSS print format for a variable.
3115 The details of this element vary depending on the schema version, as
3116 declared in the root @code{visualization} element's @code{version}
3117 attribute (@pxref{SPV Detail visualization Element}). A reader can
3118 interpret the content without knowing the schema version.
3120 The @code{setFormat} element itself has the following attributes.
3122 @defvr {Attribute} target
3123 Refers to an element whose format is to be set.
3126 @defvr {Attribute} reset
3127 If this is @code{true}, this format replaces the target's previous
3128 format. If it is @code{false}, it modifies the previous format.
3132 * SPV Detail numberFormat Element::
3133 * SPV Detail stringFormat Element::
3134 * SPV Detail dateTimeFormat Element::
3135 * SPV Detail elapsedTimeFormat Element::
3136 * SPV Detail format Element::
3137 * SPV Detail affix Element::
3140 @node SPV Detail numberFormat Element
3141 @subsubsection The @code{numberFormat} Element
3145 :minimumIntegerDigits=int?
3146 :maximumFractionDigits=int?
3147 :minimumFractionDigits=int?
3149 :scientific=(onlyForSmall | whenNeeded | true | false)?
3156 Specifies a format for displaying a number. The available options are
3157 a superset of those available from PSPP print formats. PSPP chooses a
3158 print format type for a @code{numberFormat} as follows:
3162 If @code{scientific} is @code{true}, uses @code{E} format.
3165 If @code{prefix} is @code{$}, uses @code{DOLLAR} format.
3168 If @code{suffix} is @code{%}, uses @code{PCT} format.
3171 If @code{useGrouping} is @code{true}, uses @code{COMMA} format.
3174 Otherwise, uses @code{F} format.
3177 For translating to a print format, PSPP uses
3178 @code{maximumFractionDigits} as the number of decimals, unless that
3179 attribute is missing or out of the range [0,15], in which case it uses
3182 @defvr {Attribute} minimumIntegerDigits
3183 Minimum number of digits to display before the decimal point. Always
3184 observed as @code{0}.
3187 @defvr {Attribute} maximumFractionDigits
3188 @defvrx {Attribute} minimumFractionDigits
3189 Maximum or minimum, respectively, number of digits to display after
3190 the decimal point. The observed values of each attribute range from 0
3194 @defvr {Attribute} useGrouping
3195 Whether to use the grouping character to group digits in large
3199 @defvr {Attribute} scientific
3200 This attribute controls when and whether the number is formatted in
3201 scientific notation. It takes the following values:
3205 Use scientific notation only when the number's magnitude is smaller
3206 than the value of the @code{small} attribute.
3209 Use scientific notation when the number will not otherwise fit in the
3213 Always use scientific notation. Not observed in the corpus.
3216 Never use scientific notation. A number that won't otherwise fit will
3217 be replaced by an error indication (see the @code{errorCharacter}
3218 attribute). Not observed in the corpus.
3222 @defvr {Attribute} small
3223 Only present when the @code{scientific} attribute is
3224 @code{onlyForSmall}, this is a numeric magnitude below which the
3225 number will be formatted in scientific notation. The values @code{0}
3226 and @code{0.0001} have been observed. The value @code{0} seems like a
3227 pathological choice, since no real number has a magnitude less than 0;
3228 perhaps in practice such a choice is equivalent to setting
3229 @code{scientific} to @code{false}.
3232 @defvr {Attribute} prefix
3233 @defvrx {Attribute} suffix
3234 Specifies a prefix or a suffix to apply to the formatted number. Only
3235 @code{suffix} has been observed, with value @samp{%}.
3238 @node SPV Detail stringFormat Element
3239 @subsubsection The @code{stringFormat} Element
3242 stringFormat => relabel* affix*
3244 relabel :from=real :to => EMPTY
3247 The @code{stringFormat} element specifies how to display a string. By
3248 default, a string is displayed verbatim, but @code{relabel} can change
3251 The @code{relabel} element appears as a child of @code{stringFormat}
3252 (and of @code{format}, when it is used to format strings). It
3253 specifies how to display a given value. It is used to implement value
3254 labels and to display the system-missing value in a human-readable
3255 way. It has the following attributes:
3257 @defvr {Attribute} from
3258 The value to map. In the corpus this is an integer or the
3259 system-missing value @code{-1.797693134862316E300}.
3262 @defvr {Attribute} to
3263 The string to display in place of the value of @code{from}. In the
3264 corpus this is a wide variety of value labels; the system-missing
3265 value is mapped to @samp{.}.
3268 @node SPV Detail dateTimeFormat Element
3269 @subsubsection The @code{dateTimeFormat} Element
3273 :baseFormat[dt_base_format]=(date | time | dateTime)
3275 :mdyOrder=(dayMonthYear | monthDayYear | yearMonthDay)?
3277 :yearAbbreviation=bool?
3282 :monthFormat=(long | short | number | paddedNumber)?
3286 :showDayOfWeek=bool?
3287 :dayOfWeekAbbreviation=bool?
3289 :dayOfMonthPadding=bool?
3291 :minutePadding=bool?
3292 :secondPadding=bool?
3298 :dayType=(month | year)?
3299 :hourFormat=(AMPM | AS_24 | AS_12)?
3303 This element appears only in schema version 2.5 and earlier
3304 (@pxref{SPV Detail visualization Element}).
3306 Data to be formatted in date formats is stored as strings in legacy
3307 data, in the format @code{yyyy-mm-ddTHH:MM:SS.SSS} and must be parsed
3308 and reformatted by the reader.
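In Python, for example, the standard @code{datetime.strptime} function can parse this representation directly. The sketch below assumes that the string always has exactly this shape, including the three fractional digits. @code{parse_legacy_datetime} is a hypothetical helper name.

```python
from datetime import datetime


def parse_legacy_datetime(text):
    """Parse a legacy date string such as 2019-09-05T14:30:00.250."""
    # %f accepts the 3-digit millisecond field and scales it to microseconds.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")
```

The resulting @code{datetime} can then be reformatted as the element's other attributes direct.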
3310 The following attribute is required.
3312 @defvr {Attribute} baseFormat
3313 Specifies whether a date and time are both to be displayed, or just
3317 Many of the attributes' meanings are obvious. The following seem to
3318 be worth documenting.
3320 @defvr {Attribute} separatorChars
3321 Exactly four characters. In order, these are used for: decimal point,
3322 grouping, date separator, time separator. Always @samp{.,-:}.
3325 @defvr {Attribute} mdyOrder
3326 Within a date, the order of the days, months, and years.
3327 @code{dayMonthYear} is the only observed value, but one would expect
3328 @code{monthDayYear} and @code{yearMonthDay} to be reasonable as
3332 @defvr {Attribute} showYear
3333 @defvrx {Attribute} yearAbbreviation
3334 Whether to include the year and, if so, whether the year should be
3335 shown abbreviated, that is, with only 2 digits. Each is @code{true}
3336 or @code{false}; only values of @code{true} and @code{false},
3337 respectively, have been observed.
3340 @defvr {Attribute} showMonth
3341 @defvrx {Attribute} monthFormat
3342 Whether to include the month (@code{true} or @code{false}) and, if so,
3343 how to format it. @code{monthFormat} is one of the following:
3347 The full name of the month, e.g.@: in an English locale,
3351 The abbreviated name of the month, e.g.@: in an English locale,
3355 The number representing the month, e.g.@: 9 for September.
3358 A two-digit number representing the month, e.g.@: 09 for September.
3361 Only values of @code{true} and @code{short}, respectively, have been
3365 @defvr {Attribute} dayType
3366 This attribute is always @code{month} in the corpus, specifying that
3367 the day of the month is to be displayed; a value of @code{year} is
3368 supposed to indicate that the day of the year, where 1 is January 1,
3369 is to be displayed instead.
3372 @defvr {Attribute} hourFormat
3373 @code{hourFormat}, if present, is one of:
3377 The time is displayed with an @code{am} or @code{pm} suffix, e.g.@:
3381 The time is displayed in a 24-hour format, e.g.@: @code{22:15}.
3383 This is the only value observed in the corpus.
3386 The time is displayed in a 12-hour format, without distinguishing
3387 morning or evening, e.g.@: @code{10:15}.
3390 @code{hourFormat} is sometimes present for @code{elapsedTime} formats,
3391 which is confusing since a time duration does not have a concept of AM
3392 or PM. This might indicate a bug in the code that generated the XML
3393 in the corpus, or it might indicate that @code{elapsedTime} is
3394 sometimes used to format a time of day.
3397 For a @code{baseFormat} of @code{date}, PSPP chooses a print format
3398 type based on the following rules:
3402 If @code{showQuarter} is true: @code{QYR}.
3405 Otherwise, if @code{showWeek} is true: @code{WKYR}.
3408 Otherwise, if @code{mdyOrder} is @code{dayMonthYear}:
3412 If @code{monthFormat} is @code{number} or @code{paddedNumber}: @code{EDATE}.
3415 Otherwise: @code{DATE}.
3419 Otherwise, if @code{mdyOrder} is @code{yearMonthDay}: @code{SDATE}.
3422 Otherwise, @code{ADATE}.
3425 For a @code{baseFormat} of @code{dateTime}, PSPP uses @code{YMDHMS} if
3426 @code{mdyOrder} is @code{yearMonthDay} and @code{DATETIME} otherwise.
3427 For a @code{baseFormat} of @code{time}, PSPP uses @code{DTIME} if
3428 @code{showDay} is true, otherwise @code{TIME} if @code{showHour} is
3429 true, otherwise @code{MTIME}.
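The print format type choice described above can be sketched as follows. This illustrates the stated rules and is not PSPP's implementation. It assumes that the element's attributes are available as a Python dictionary of strings, and that boolean attributes use the literal values @code{true} and @code{false}. The function name is hypothetical.

```python
def date_time_format_type(base_format, attrs):
    """Return a print format type name for a dateTimeFormat element.

    base_format is "date", "time", or "dateTime"; attrs maps the element's
    other attribute names to their string values as found in the XML.
    """
    def is_true(name):
        return attrs.get(name) == "true"

    order = attrs.get("mdyOrder")
    if base_format == "date":
        if is_true("showQuarter"):
            return "QYR"
        if is_true("showWeek"):
            return "WKYR"
        if order == "dayMonthYear":
            if attrs.get("monthFormat") in ("number", "paddedNumber"):
                return "EDATE"
            return "DATE"
        if order == "yearMonthDay":
            return "SDATE"
        return "ADATE"
    if base_format == "dateTime":
        return "YMDHMS" if order == "yearMonthDay" else "DATETIME"
    # Remaining case: base_format == "time".
    if is_true("showDay"):
        return "DTIME"
    if is_true("showHour"):
        return "TIME"
    return "MTIME"
```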
3431 For a @code{baseFormat} of @code{date}, the chosen width is the
3432 minimum for the format type, adding 2 if @code{yearAbbreviation} is
3433 false or omitted. For other base formats, the chosen width is the
3434 minimum for its type, plus 3 if @code{showSecond} is true, plus 4 more
3435 if @code{showMillis} is also true. Decimals are 0 by default, or 3
3436 if @code{showMillis} is true.
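The width and decimal rules can be sketched in the same way. The minimum width of each format type is not listed here, so this hypothetical helper takes that minimum as a parameter. It also assumes, as the wording above suggests, that the decimals rule applies only to the non-date base formats. Under that assumption, @code{date} always yields 0 decimals.

```python
def date_time_width(base_format, min_width, attrs):
    """Return (width, decimals) for a dateTimeFormat element.

    min_width is the minimum width of the already-chosen print format type;
    it must come from PSPP's own format tables, not reproduced here.
    """
    def is_true(name):
        return attrs.get(name) == "true"

    if base_format == "date":
        # Presumably room for a 4-digit rather than a 2-digit year.
        width = min_width if is_true("yearAbbreviation") else min_width + 2
        return width, 0

    width = min_width
    if is_true("showSecond"):
        width += 3                      # presumably ":SS"
        if is_true("showMillis"):
            width += 4                  # presumably ".SSS"
    decimals = 3 if is_true("showMillis") else 0
    return width, decimals
```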
3438 @node SPV Detail elapsedTimeFormat Element
3439 @subsubsection The @code{elapsedTimeFormat} Element
3443 :baseFormat[dt_base_format]=(date | time | dateTime)
3446 :minutePadding=bool?
3447 :secondPadding=bool?
3457 This element specifies the way to display a time duration.
3459 Data to be formatted in elapsed time formats is stored as strings in
3460 legacy data, in the format @code{H:MM:SS.SSS}, with additional hour
3461 digits as needed for long durations, and must be parsed and
3462 reformatted by the reader.
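A reader might convert such a string to a number of seconds as in the following sketch. It assumes that the string contains exactly two colons and a seconds field that Python's @code{float} accepts. The helper name is hypothetical.

```python
def parse_elapsed_seconds(text):
    """Convert a legacy elapsed-time string such as 1:02:03.500 to seconds."""
    # The hour field may have more than one digit for long durations.
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```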
3464 The following attribute is required.
3466 @defvr {Attribute} baseFormat
3467 Specifies whether a day and a time are both to be displayed, or just
3471 The remaining attributes specify exactly how to display the elapsed
3474 For @code{baseFormat} of @code{date}, PSPP converts this element to
3475 print format type @code{DTIME}; otherwise, if @code{showHour} is true,
3476 to @code{TIME}; otherwise, to @code{MTIME}. The chosen width is the
3477 minimum for the chosen type, adding 3 if @code{showSecond} is true,
3478 adding 4 more if @code{showMillis} is also true. Decimals are 0 by
3479 default, or 3 if @code{showMillis} is true.
3481 @node SPV Detail format Element
3482 @subsubsection The @code{format} Element
3486 :baseFormat[f_base_format]=(date | time | dateTime | elapsedTime)?
3489 :mdyOrder=(dayMonthYear | monthDayYear | yearMonthDay)?
3494 :yearAbbreviation=bool?
3496 :monthFormat=(long | short | number | paddedNumber)?
3498 :dayOfMonthPadding=bool?
3502 :showDayOfWeek=bool?
3503 :dayOfWeekAbbreviation=bool?
3505 :minutePadding=bool?
3506 :secondPadding=bool?
3512 :dayType=(month | year)?
3513 :hourFormat=(AMPM | AS_24 | AS_12)?
3514 :minimumIntegerDigits=int?
3515 :maximumFractionDigits=int?
3516 :minimumFractionDigits=int?
3518 :scientific=(onlyForSmall | whenNeeded | true | false)?
3522 :tryStringsAsNumbers=bool?
3523 :negativesOutside=bool?
3527 This element is the union of all of the more-specific format elements.
3528 It is interpreted in the same way as one of those format elements,
3529 using @code{baseFormat} to determine which kind of format to use.
3531 There are a few attributes not present in the more specific formats:
3533 @defvr {Attribute} tryStringsAsNumbers
3534 When this is @code{true}, it is supposed to indicate that string
3535 values should be parsed as numbers and then displayed according to
3536 numeric formatting rules. However, in the corpus it is always
3540 @defvr {Attribute} negativesOutside
3541 If true, the negative sign should be shown before the prefix; if
3542 false, it should be shown after.
3545 @node SPV Detail affix Element
3546 @subsubsection The @code{affix} Element
3550 :definesReference=int
3551 :position=(subscript | superscript)
3557 This defines a suffix (or, theoretically, a prefix) for a formatted
3558 value. It is used to insert a reference to a footnote. It has the
3559 following attributes:
3561 @defvr {Attribute} definesReference
3562 This specifies the footnote number as a natural number: 1 for the
3563 first footnote, 2 for the second, and so on.
3566 @defvr {Attribute} position
3567 Position for the footnote label. Always @code{superscript}.
3570 @defvr {Attribute} suffix
3571 Whether the affix is a suffix (@code{true}) or a prefix
3572 (@code{false}). Always @code{true}.
3575 @defvr {Attribute} value
3576 The text of the suffix or prefix. Typically a letter, e.g.@: @code{a}
3577 for footnote 1, @code{b} for footnote 2, @enddots{} The corpus
3578 contains other values: @code{*}, @code{**}, and a few that begin with
3579 at least one comma: @code{,b}, @code{,c}, @code{,,b}, and @code{,,c}.
3582 @node SPV Detail interval Element
3583 @subsection The @code{interval} Element
3586 interval :style=ref style => labeling footnotes?
3590 :variable=ref (sourceVariable | derivedVariable)
3591 => (formatting | format | footnotes)*
3593 formatting :variable=ref (sourceVariable | derivedVariable) => formatMapping*
3595 formatMapping :from=int => format?
3599 :variable=ref (sourceVariable | derivedVariable)
3602 footnoteMapping :definesReference=int :from=int :to => EMPTY
3605 The @code{interval} element and its descendants determine the basic
3606 formatting and labeling for the table's cells. These basic styles are
3607 overridden by more specific styles set using @code{setCellProperties}
3608 (@pxref{SPV Detail setCellProperties Element}).
3610 The @code{style} attribute of @code{interval} itself may be ignored.
3612 The @code{labeling} element may have a single @code{formatting} child.
3613 If present, its @code{variable} attribute refers to a variable whose
3614 values are format specifiers as numbers, e.g.@: value 0x050802 for F8.2.
3615 However, the numbers are not actually interpreted that way. Instead,
3616 each number present in the variable's data is mapped by a
3617 @code{formatMapping} child of @code{formatting} to a @code{format}
3618 that specifies how to display it.
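The lookup amounts to a table keyed by the raw values, as in this Python sketch. The pair representation of the @code{formatMapping} children, the @code{default} fallback for values without a mapping, and the function name are all assumptions made for illustration.

```python
def cell_formats(format_codes, mappings, default=None):
    """Map each value of the formatting variable to a format.

    format_codes: the formatting variable's data, one value per cell.
    mappings: (from, format) pairs taken from the formatMapping children.
    """
    by_code = dict(mappings)
    # The codes serve only as lookup keys; they are not decoded as
    # type, width, and decimals even though they resemble such values.
    return [by_code.get(int(code), default) for code in format_codes]
```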
3620 The @code{labeling} element may also have a @code{footnotes} child
3621 element. The @code{variable} attribute of this element refers to a
3622 variable whose values are comma-delimited strings that list the
3623 1-based indexes of footnote references. (Cells without any footnote
3624 references are numeric 0 instead of strings.)
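Decoding one cell's value of this variable might look like the following sketch. It assumes that the reader has already converted the data to Python strings and numbers. The helper name is hypothetical.

```python
def footnote_indexes(value):
    """Return the 1-based footnote indexes referenced by one cell."""
    if isinstance(value, str):
        # e.g. "1,3" references the first and third footnotes.
        return [int(part) for part in value.split(",") if part]
    # A numeric value (0 in the corpus) means no footnote references.
    return []
```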
3626 Each @code{footnoteMapping} child of the @code{footnotes} element
3627 defines the footnote marker to be its @code{to} attribute text for the
3628 footnote whose 1-based index is given in its @code{definesReference}
3631 @node SPV Detail style Element
3632 @subsection The @code{style} Element
3639 :border-bottom=(solid | thick | thin | double | none)?
3640 :border-top=(solid | thick | thin | double | none)?
3641 :border-left=(solid | thick | thin | double | none)?
3642 :border-right=(solid | thick | thin | double | none)?
3643 :border-bottom-color?
3646 :border-right-color?
3649 :font-weight=(regular | bold)?
3650 :font-style=(regular | italic)?
3651 :font-underline=(none | underline)?
3652 :margin-bottom=dimension?
3653 :margin-left=dimension?
3654 :margin-right=dimension?
3655 :margin-top=dimension?
3656 :textAlignment=(left | right | center | decimal | mixed)?
3657 :labelLocationHorizontal=(positive | negative | center)?
3658 :labelLocationVertical=(positive | negative | center)?
3659 :decimal-offset=dimension?
3666 A @code{style} element has an effect only when it is referenced by
3667 another element to set some aspect of the table's style. Most of the
3668 attributes are self-explanatory. The rest are described below.
3670 @defvr {Attribute} {color}
3671 In some cases, the text color; in others, the background color.
3674 @defvr {Attribute} {color2}
3678 @defvr {Attribute} {labelAngle}
3679 Normally 0. The value -90 causes inner column or outer row labels to
3680 be rotated vertically.
3683 @defvr {Attribute} {labelLocationHorizontal}
3687 @defvr {Attribute} {labelLocationVertical}
3688 The value @code{positive} corresponds to vertically aligning text to
3689 the top of a cell, @code{negative} to the bottom, @code{center} to the
3693 @node SPV Detail labelFrame Element
3694 @subsection The @code{labelFrame} Element
3697 labelFrame :style=ref style => location+ label? paragraph?
3699 paragraph :hangingIndent=dimension? => EMPTY
3702 A @code{labelFrame} element specifies content and style for some
3703 aspect of a table. Only @code{labelFrame} elements that have a
3704 @code{label} child are important. The @code{purpose} attribute in the
3705 @code{label} determines what the @code{labelFrame} affects:
3709 The table's title and its style.
3712 The table's caption and its style.
3715 The table's footnotes and the style for the footer area.
3718 The style for the layer area.
3724 The @code{style} attribute references the style to use for the area.
3726 The @code{label}, if present, specifies the text to put into the title
3727 or caption or footnotes. For footnotes, the label has two @code{text}
3728 children for every footnote, each of which has a @code{usesReference}
3729 attribute identifying the 1-based index of a footnote. The first,
3730 third, fifth, @dots{} @code{text} child specifies the content for a
3731 footnote; the second, fourth, sixth, @dots{} child specifies the
3732 marker. Content tends to end in a new-line, which the reader may wish
3733 to trim; similarly, markers tend to end in @samp{.}.
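Pairing the @code{text} children and trimming them as suggested can be sketched as follows. The sketch assumes that the children have been extracted in document order as (@code{usesReference}, text) pairs. It drops a final unpaired child rather than guessing its role. The function name is hypothetical.

```python
def pair_footnotes(texts):
    """Return (index, marker, content) triples from footnote text children."""
    footnotes = []
    # Odd-numbered children (1st, 3rd, ...) are content; the next is the marker.
    for i in range(0, len(texts) - 1, 2):
        index, content = texts[i]
        marker = texts[i + 1][1]
        if marker.endswith("."):
            marker = marker[:-1]        # drop one trailing period
        footnotes.append((index, marker, content.rstrip("\n")))
    return footnotes
```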
3735 The @code{paragraph}, if present, may be ignored, since it is always
3738 @node SPV Detail Legacy Properties
3739 @subsection Legacy Properties
3741 The detail XML format has features for styling most of the aspects of
3742 a table. It also inherits defaults for many aspects from structure
3743 XML, which has the following @code{tableProperties} element:
3748 => generalProperties footnoteProperties cellFormatProperties borderProperties printingProperties
3751 :hideEmptyRows=bool?
3752 :maximumColumnWidth=dimension?
3753 :maximumRowWidth=dimension?
3754 :minimumColumnWidth=dimension?
3755 :minimumRowWidth=dimension?
3756 :rowDimensionLabels=(inCorner | nested)?
3760 :markerPosition=(superscript | subscript)?
3761 :numberFormat=(alphabetic | numeric)?
3764 cellFormatProperties => cell_style+
3767 :alternatingColor=color?
3768 :alternatingTextColor=color?
3776 :font-style=(regular | italic)?
3777 :font-weight=(regular | bold)?
3778 :font-underline=(none | underline)?
3779 :labelLocationVertical=(positive | negative | center)?
3780 :margin-bottom=dimension?
3781 :margin-left=dimension?
3782 :margin-right=dimension?
3783 :margin-top=dimension?
3784 :textAlignment=(left | right | center | decimal | mixed)?
3785 :decimal-offset=dimension?
3788 borderProperties => border_style+
3791 :borderStyleType=(none | solid | dashed | thick | thin | double)?
3796 :printAllLayers=bool?
3797 :rescaleLongTableToFitPage=bool?
3798 :rescaleWideTableToFitPage=bool?
3799 :windowOrphanLines=int?
3801 :continuationTextAtBottom=bool?
3802 :continuationTextAtTop=bool?
3803 :printEachLayerOnSeparatePage=bool?
3807 The @code{name} attribute appears only in standalone @file{.stt} files
3808 (@pxref{SPSS TableLook STT Format}).