1 @c PSPP - a program for statistical analysis.
2 @c Copyright (C) 2019 Free Software Foundation, Inc.
3 @c Permission is granted to copy, distribute and/or modify this document
4 @c under the terms of the GNU Free Documentation License, Version 1.3
5 @c or any later version published by the Free Software Foundation;
6 @c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
7 @c A copy of the license is included in the section entitled "GNU
8 @c Free Documentation License".
11 @node SPSS Viewer File Format
12 @appendix SPSS Viewer File Format
14 SPSS Viewer or @file{.spv} files, here called SPV files, are written
15 by SPSS 16 and later to represent the contents of its output editor.
16 This appendix documents the format, based on examination of a corpus of
17 about 8,000 files from a variety of sources. This description is
18 detailed enough to both read and write SPV files.
20 SPSS 15 and earlier versions instead use @file{.spo} files, which have
21 a completely different output format based on the Microsoft Compound
22 Document Format. This format is not documented here.
24 An SPV file is a Zip archive that can be read with @command{zipinfo}
25 and @command{unzip} and similar programs. The final member in the Zip
26 archive is the @dfn{manifest}, a file named
27 @file{META-INF/MANIFEST.MF}. This structure makes SPV files resemble
28 Java ``JAR'' files (and ODF files), but whereas a JAR manifest
29 contains a sequence of colon-delimited key/value pairs, an SPV
30 manifest contains the string @samp{allowPivoting=true}, without a
31 new-line. PSPP uses this string to identify an SPV file; it is
32 invariant across the corpus.@footnote{SPV files always begin with the
33 7-byte sequence 50 4b 03 04 14 00 08, but this is not a useful magic
34 number because most Zip archives start the same way.}@footnote{SPSS
35 writes @file{META-INF/MANIFEST.MF} to every SPV file, but it does not
36 read it or even require it to exist, so using different contents,
37 e.g.@: @samp{allowingPivot=false}, has no effect.}
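
The identification rule above can be sketched in Python with the standard
@code{zipfile} module.  This is only an illustration, not PSPP's own code:
the function name @code{looks_like_spv} is hypothetical, and accepting any
manifest that merely @emph{starts} with the string is an assumption made
here for leniency.

```python
import io
import zipfile

MANIFEST_NAME = "META-INF/MANIFEST.MF"
MANIFEST_MAGIC = b"allowPivoting=true"


def looks_like_spv(file):
    """Return True if `file` (a path or binary file object) looks like SPV.

    Hypothetical helper: it only checks that the archive has a manifest
    member whose contents begin with the expected string.
    """
    try:
        with zipfile.ZipFile(file) as archive:
            return archive.read(MANIFEST_NAME).startswith(MANIFEST_MAGIC)
    except (zipfile.BadZipFile, KeyError):
        # Not a Zip archive at all, or no manifest member in it.
        return False


# Usage: build a tiny archive in memory and check it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(MANIFEST_NAME, "allowPivoting=true")
print(looks_like_spv(buffer))
```

Because SPSS itself does not require the manifest, a reader that wants to
open every file SPSS can open might treat a failed check as a warning
rather than an error.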
39 The rest of the members in an SPV file's Zip archive fall into two
40 categories: @dfn{structure} and @dfn{detail} members. Structure
41 member names begin with @file{outputViewer@var{nnnnnnnnnn}}, where
42 each @var{n} is a decimal digit, and end with @file{.xml}, and often
43 include the string @file{_heading} in between. Each of these members
44 represents some kind of output item (a table, a heading, a block of
45 text, etc.) or a group of them. The member whose output goes at the
46 beginning of the document is numbered 0, the next member in the output
47 is numbered 1, and so on.
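
As a rough sketch of the naming rule just described, the following Python
picks the structure members out of a list of Zip member names and puts
them in document order.  The function name @code{structure_members} is
hypothetical, and the sample names in the usage example are made up to
match the patterns described in this appendix.

```python
import re

# "outputViewer", ten decimal digits, anything (often "_heading"), ".xml".
STRUCTURE_NAME = re.compile(r"outputViewer(\d{10}).*\.xml")


def structure_members(names):
    """Return the structure member names in `names`, in document order."""
    numbered = []
    for name in names:
        match = STRUCTURE_NAME.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


# Usage with made-up member names.
members = [
    "outputViewer0000000001_heading.xml",
    "00000000001_lightTableData.bin",
    "outputViewer0000000000.xml",
    "META-INF/MANIFEST.MF",
]
print(structure_members(members))
```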
49 Structure members contain XML. This XML is sometimes self-contained,
50 but it often references detail members in the Zip archive, which are
54 @item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
55 @itemx @file{@var{prefix}_lightTableData.bin}
56 The structure of a table plus its data. Older SPV files pair a
57 @file{@var{prefix}_table.xml} file that describes the table's
58 structure with a binary @file{@var{prefix}_tableData.bin} file that
59 gives its data. Newer SPV files (the majority of those in the corpus)
60 instead include a single @file{@var{prefix}_lightTableData.bin} file
61 that incorporates both into a single binary format.
63 @item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
64 @itemx @file{@var{prefix}_lightWarningData.bin}
65 Same format used for tables, with a different name.
67 @item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
68 @itemx @file{@var{prefix}_lightNotesData.bin}
69 Same format used for tables, with a different name.
71 @item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
72 The structure of a chart plus its data. Charts do not have a
75 @item @file{@var{prefix}_Imagegeneric.png}
76 @itemx @file{@var{prefix}_PastedObjectgeneric.png}
77 @itemx @file{@var{prefix}_imageData.bin}
78 A PNG image referenced by an @code{object} element (in the first two
79 cases) or an @code{image} element (in the final case). @xref{SPV
80 Structure object and image Elements}.
82 @item @file{@var{prefix}_pmml.scf}
83 @itemx @file{@var{prefix}_stats.scf}
84 @item @file{@var{prefix}_model.xml}
85 Not yet investigated. The corpus contains few examples.
88 The @file{@var{prefix}} in the names of the detail members is
89 typically an 11-digit decimal number that increases for each item,
90 tending to skip values. Older SPV files use different naming
91 conventions. Structure members refer to detail members by name, and so
92 their exact names do not matter to readers as long as they are unique.
94 SPSS tolerates corrupted Zip archives that Zip reader libraries tend
95 to reject. These can be fixed up with @command{zip -FF}.
98 * SPV Structure Member Format::
99 * SPV Light Detail Member Format::
100 * SPV Legacy Detail Member Binary Format::
101 * SPV Legacy Detail Member XML Format::
104 @node SPV Structure Member Format
105 @section Structure Member Format
107 A structure member lays out the high-level structure for a group of
108 output items such as headings, tables, and charts. Structure members
109 do not include the details of tables and charts but instead refer to
110 them by their member names.
112 Structure members' XML files claim conformance with a collection of
113 XML Schemas. These schemas are distributed, under a nonfree license,
114 with SPSS binaries. Fortunately, the schemas are not necessary to
115 understand the structure members. The schemas can even
116 be deceptive because they document elements and attributes that are
117 not in the corpus and do not document elements and attributes that are
118 commonly found in the corpus.
120 Structure members use a different XML namespace for each schema, but
121 these namespaces are not entirely consistent. In some SPV files, for
122 example, the @code{viewer-tree} schema is associated with namespace
123 @indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
124 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
125 additional @file{viewer/}). Under either name, the schema URIs are
126 not resolvable to obtain the schemas themselves.
128 One may ignore all of the above in interpreting a structure member.
129 The actual XML has a simple and straightforward form that does not
130 require a reader to take schemas or namespaces into account. A
131 structure member's root is a @code{heading} element, which contains
132 @code{heading} or @code{container} elements (or a mix), forming a
133 tree. In turn, @code{container} holds a @code{label} and one more
134 child, usually @code{text} or @code{table}.
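
To illustrate how little machinery this needs, here is a minimal Python
sketch that walks the @code{heading}/@code{container} tree with the
standard @code{xml.etree.ElementTree} module and ignores namespaces
entirely.  The sample document is abbreviated and hand-written for this
example, and the helper names @code{local_name} and @code{outline} are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written structure member (most attributes omitted).
SAMPLE = """<heading xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
  <label>Output</label>
  <heading commandName="Descriptives">
    <label>Descriptives</label>
    <container visibility="visible">
      <label>Descriptive Statistics</label>
      <table commandName="Descriptives" type="table">
        <tableStructure>
          <dataPath>00000000002_lightTableData.bin</dataPath>
        </tableStructure>
      </table>
    </container>
  </heading>
</heading>
"""


def local_name(element):
    """Drop the "{namespace}" prefix that ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Yield (depth, kind, label) for each heading and container."""
    label = ""
    for child in element:
        if local_name(child) == "label":
            label = child.text or ""
            break
    yield depth, local_name(element), label
    for child in element:
        if local_name(child) in ("heading", "container"):
            yield from outline(child, depth + 1)


for depth, kind, label in outline(ET.fromstring(SAMPLE)):
    print("  " * depth + kind + ": " + label)
```

Comparing local names only means the same code handles both namespace
spellings mentioned above.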
136 The following sections document the elements found in structure
137 members in a context-free grammar-like fashion. Consider the
138 following example, which specifies the attributes and content for the
139 @code{container} element:
143 :visibility=(visible | hidden)
144 :page-break-before=(always)?
145 :text-align=(left | center)?
147 => label (table | container_text | graph | model | object | image | tree)
150 Each attribute specification begins with @samp{:} followed by the
151 attribute's name. If the attribute's value has an easily specified
152 form, then @samp{=} and its description follow the name. Finally, if
153 the attribute is optional, the specification ends with @samp{?}. The
154 following value specifications are defined:
157 @item (@var{a} | @var{b} | @dots{})
158 One of the listed literal strings. If only one string is listed, it
159 is the only acceptable value. If @code{OTHER} is listed, then any
160 string not explicitly listed is also accepted.
163 Either @code{true} or @code{false}.
166 A floating-point number followed by a unit, e.g.@: @code{10pt}. Units
167 in the corpus include @code{in} (inch), @code{pt} (points, 72/inch),
168 @code{px} (``device-independent pixels'', 96/inch), and @code{cm}. If
169 the unit is omitted then points should be assumed. The number and
170 unit may be separated by white space.
172 The corpus also includes localized names for units. A reader must
173 understand these to properly interpret the dimension:
177 @code{인치}, @code{pol.}, @code{cala}, @code{cali}
187 A floating-point number.
193 A color in one of the forms @code{#@var{rr}@var{gg}@var{bb}} or
194 @code{@var{rr}@var{gg}@var{bb}}, or the string @code{transparent}, or
195 one of the standard Web color names.
198 @item ref @var{element}
199 @itemx ref(@var{elem1} | @var{elem2} | @dots{})
200 The name from the @code{id} attribute in some element. If one or more
201 elements are named, the name must refer to one of those elements;
202 otherwise, any element is acceptable.
205 All elements have an optional @code{id} attribute. If present, its
206 value must be unique. In practice many elements are assigned
207 @code{id} attributes that are never referenced.
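
As an illustration of the @code{dimension} value specification described
above, the following Python sketch converts a dimension string to points.
It is an assumption-laden example, not PSPP's parser: the function name
@code{dimension_to_points} is hypothetical, and the localized unit names
are deliberately left out, so a real reader would need to extend the unit
table.

```python
import re

# Points per unit for the non-localized unit names.  Localized unit names
# are not handled in this sketch.
POINTS_PER_UNIT = {
    "in": 72.0,
    "pt": 1.0,
    "px": 72.0 / 96.0,
    "cm": 72.0 / 2.54,
}

# A number, optional white space, then an optional unit name.
DIMENSION = re.compile(r"\s*([-+]?(?:\d+\.?\d*|\.\d+))\s*([^\s\d]*)\s*")


def dimension_to_points(text):
    """Convert a dimension such as "10pt" or "0.25in" to points."""
    match = DIMENSION.fullmatch(text)
    if match is None:
        raise ValueError("not a dimension: %r" % text)
    number, unit = match.groups()
    if unit == "":
        unit = "pt"  # points are assumed when the unit is omitted
    if unit not in POINTS_PER_UNIT:
        raise ValueError("unknown unit: %r" % unit)
    return float(number) * POINTS_PER_UNIT[unit]


print(dimension_to_points("0.25in"))
print(dimension_to_points("1097px"))
```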
209 The content specification for an element supports the following
216 @item @var{a} @var{b}
217 @var{a} followed by @var{b}.
219 @item @var{a} | @var{b} | @var{c}
220 One of @var{a} or @var{b} or @var{c}.
223 Zero or one instances of @var{a}.
226 Zero or more instances of @var{a}.
229 One or more instances of @var{a}.
231 @item (@var{subexpression})
232 Grouping for a subexpression.
241 Element and attribute names are sometimes suffixed by another name in
242 square brackets to distinguish different uses of the same name. For
243 example, structure XML has two @code{text} elements, one inside
244 @code{container}, the other inside @code{pageParagraph}. The former
245 is defined as @code{text[container_text]} and referenced as
246 @code{container_text}, the latter defined as
247 @code{text[pageParagraph_text]} and referenced as
248 @code{pageParagraph_text}.
250 This language is used in the PSPP source code for parsing structure
251 and detail XML members. Refer to
252 @file{src/output/spv/structure-xml.grammar} and
253 @file{src/output/spv/detail-xml.grammar} for the full grammars.
255 The following example shows the contents of a typical structure member
256 for a @cmd{DESCRIPTIVES} procedure. A real structure member is not
257 indented. This example also omits most attributes, all XML namespace
258 information, and the CSS from the embedded HTML:
261 <?xml version="1.0" encoding="utf-8"?>
263 <label>Output</label>
264 <heading commandName="Descriptives">
265 <label>Descriptives</label>
268 <text commandName="Descriptives" type="title">
270 <![CDATA[<head><style type="text/css">...</style></head><BR>Descriptives]]>
274 <container visibility="hidden">
276 <table commandName="Descriptives" subType="Notes" type="note">
278 <dataPath>00000000001_lightNotesData.bin</dataPath>
283 <label>Descriptive Statistics</label>
284 <table commandName="Descriptives" subType="Descriptive Statistics"
287 <dataPath>00000000002_lightTableData.bin</dataPath>
296 * SPV Structure heading Element::
297 * SPV Structure label Element::
298 * SPV Structure container Element::
299 * SPV Structure text Element (Inside @code{container})::
300 * SPV Structure html Element::
301 * SPV Structure table Element::
302 * SPV Structure graph Element::
303 * SPV Structure model Element::
304 * SPV Structure object and image Elements::
305 * SPV Structure tree Element::
306 * SPV Structure Path Elements::
307 * SPV Structure pageSetup Element::
308 * SPV Structure @code{text} Element (Inside @code{pageParagraph})::
311 @node SPV Structure heading Element
312 @subsection The @code{heading} Element
315 heading[root_heading]
321 => label pageSetup? (container | heading)*
326 :visibility[heading_visibility]=(collapsed)?
329 => label (container | heading)*
332 A @code{heading} represents a tree of content that appears in an
333 output viewer window. It contains a @code{label} text string that is
334 shown in the outline view ordinarily followed by content containers or
335 further nested (sub)-sections of output. Unlike heading elements in
336 HTML and other common document formats, which precede the content that
337 they head, @code{heading} contains the elements that appear below the
340 The root of a structure member is a special @code{heading}. The
341 direct children of the root @code{heading} elements in all structure
342 members in an SPV file are siblings. That is, the root @code{heading}
343 elements in all of the structure members conceptually represent the same node.
344 The root heading's @code{label} is ignored (@pxref{SPV Structure
345 label Element}). The root heading in the first structure member in
346 the Zip file (typically named @file{outputViewer0000000000.xml}) may
347 contain a @code{pageSetup} element.
349 The following attributes have been observed on both document root and
350 nested @code{heading} elements.
352 @defvr {Attribute} @code{creator-version}
353 The version of the software that created this SPV file. A string of
354 the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
355 e.g.@: @code{21000001} is version 21.0.0.1. Trailing pairs of zeros
356 are sometimes omitted, so that @code{21}, @code{210000}, and
357 @code{21000000} are all version 21.0.0.0 (and the corpus contains all
358 three of those forms).
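
The padding rule can be sketched in a few lines of Python.  The function
name @code{parse_creator_version} is hypothetical, and rejecting strings
with an odd number of digits is an assumption of this sketch rather than
something observed in the corpus.

```python
def parse_creator_version(text):
    """Split a creator-version string such as "21000001" into 4 integers."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError("unexpected creator-version: %r" % text)
    if len(text) % 2 != 0 or len(text) > 8:
        raise ValueError("unexpected creator-version: %r" % text)
    padded = text.ljust(8, "0")  # restore omitted trailing pairs of zeros
    return tuple(int(padded[i:i + 2]) for i in range(0, 8, 2))


for sample in ("21000001", "21", "210000", "21000000"):
    print(sample, parse_creator_version(sample))
```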
362 The following attributes have been observed on document root
363 @code{heading} elements only:
365 @defvr {Attribute} @code{creator}
366 The directory in the file system of the software that created this SPV
370 @defvr {Attribute} @code{creation-date-time}
371 The date and time at which the SPV file was written, in a
372 locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
373 PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
374 December 5, 2014 5:00:19 o'clock PM EST}.
377 @defvr {Attribute} @code{lockReader}
378 Whether a reader should be allowed to edit the output. The possible
379 values are @code{true} and @code{false}. The value @code{false} is by
383 @defvr {Attribute} @code{schemaLocation}
384 This is actually an XML Namespace attribute. A reader may ignore it.
388 The following attributes have been observed only on nested
389 @code{heading} elements:
391 @defvr {Attribute} @code{commandName}
392 A locale-invariant identifier for the command that produced the
393 output, e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
396 @defvr {Attribute} @code{visibility}
397 If this attribute is absent, the heading's content is expanded in the
398 outline view. If it is set to @code{collapsed}, it is collapsed.
399 (This attribute is never present in a root @code{heading} because the
400 root node is always expanded when a file is loaded, even though the UI
401 can be used to collapse it interactively.)
404 @defvr {Attribute} @code{locale}
405 The locale used for output, in Windows format, which is similar to the
406 format used in Unix with the underscore replaced by a hyphen, e.g.@:
407 @code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
410 @defvr {Attribute} @code{olang}
411 The output language, e.g.@: @code{en}, @code{it}, @code{es},
412 @code{de}, @code{pt-BR}.
415 @node SPV Structure label Element
416 @subsection The @code{label} Element
422 Every @code{heading} and @code{container} holds a @code{label} as its
423 first child. The label text is what appears in the outline pane of
424 the GUI's viewer window. PSPP also puts it into the outline of PDF
425 output. The label text doesn't appear in the output itself.
427 The text in @code{label} describes what it labels, often by naming the
428 statistical procedure that was executed, e.g.@: ``Frequencies'' or
429 ``T-Test''. Labels are often very generic, especially within a
430 @code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''.
431 Label text is localized according to the output language, e.g.@: in
432 Italian a frequency table procedure is labeled ``Frequenze''.
434 The user can edit labels to be anything they want. The corpus
435 contains a few examples of empty labels, ones that contain no text,
436 probably as a result of user editing.
438 The root @code{heading} in an SPV file has a @code{label}, like every
439 @code{heading}. It normally contains ``Output'' but its content is
440 disregarded anyway. The user cannot edit it.
442 @node SPV Structure container Element
443 @subsection The @code{container} Element
447 :visibility=(visible | hidden)
448 :page-break-before=(always)?
449 :text-align=(left | center)?
451 => label (table | container_text | graph | model | object | image | tree)
454 A @code{container} serves to contain and label a @code{table},
455 @code{text}, or other kind of item.
457 This element has the following attributes.
459 @defvr {Attribute} @code{visibility}
460 Whether the container's content is displayed. ``Notes'' tables are
461 often hidden; other data is usually visible.
464 @defvr {Attribute} @code{text-align}
465 Alignment of text within the container. Observed with nested
466 @code{table} and @code{text} elements.
469 @defvr {Attribute} @code{width}
470 The width of the container, e.g.@: @code{1097px}.
473 @node SPV Structure text Element (Inside @code{container})
474 @subsection The @code{text} Element (Inside @code{container})
478 :type[text_type]=(title | log | text | page-title)
484 This @code{text} element is nested inside a @code{container}. There
485 is a different @code{text} element that is nested inside a
486 @code{pageParagraph}.
488 This element has the following attributes.
490 @defvr {Attribute} @code{type}
491 The semantics of the text.
494 @defvr {Attribute} @code{commandName}
495 As on the @code{heading} element. For output not specific to a
496 command, this is simply @code{log}. The corpus contains one example
497 in which @code{commandName} is present but set to the empty string.
500 @defvr {Attribute} @code{creator-version}
501 As on the @code{heading} element.
504 @node SPV Structure html Element
505 @subsection The @code{html} Element
508 html :lang=(en) => TEXT
511 The element contains an HTML document as text (or, in practice, as
512 CDATA). In some cases, the document starts with @code{<html>} and
513 ends with @code{</html>}; in others the @code{html} element is
514 implied. Generally the HTML includes a @code{head} element with a CSS
515 stylesheet. The HTML body often begins with @code{<BR>}.
517 The HTML document uses only the following elements:
521 Sometimes, the document is enclosed with
522 @code{<html>}@dots{}@code{</html>}.
525 The HTML body often begins with @code{<BR>} and may contain it as well.
533 The attributes @code{face}, @code{color}, and @code{size} are
534 observed. The value of @code{color} takes one of the forms
535 @code{#@var{rr}@var{gg}@var{bb}} or @code{rgb (@var{r}, @var{g},
536 @var{b})}. The value of @code{size} is a number between 1 and 7,
540 The CSS in the corpus is simple. To understand it, a parser only
541 needs to be able to skip white space, @code{<!--}, and @code{-->}, and
542 parse style only for @code{p} elements. Only the following properties
547 In the form @code{@var{rr}@var{gg}@var{bb}}, e.g.@: @code{000000}, with
551 Either @code{bold} or @code{normal}.
554 Either @code{italic} or @code{normal}.
556 @item text-decoration
557 Either @code{underline} or @code{normal}.
560 A font name, commonly @code{Monospaced} or @code{SansSerif}.
563 Values claim to be in points, e.g.@: @code{14pt}, but the values are
564 actually in ``device-independent pixels'' (px), at 96/inch.
567 This element has the following attributes.
569 @defvr {Attribute} @code{lang}
570 This always contains @code{en} in the corpus.
573 @node SPV Structure table Element
574 @subsection The @code{table} Element
583 :displayFiltering=bool?
585 :orphanTolerance=int?
590 :type[table_type]=(table | note | warning)
591 => tableProperties? tableStructure
593 tableStructure => path? dataPath csvPath?
596 This element has the following attributes.
598 @defvr {Attribute} @code{commandName}
599 As on the @code{heading} element.
602 @defvr {Attribute} @code{type}
603 One of @code{table}, @code{note}, or @code{warning}.
606 @defvr {Attribute} @code{subType}
607 The locale-invariant command ID for the particular kind of output that
608 this table represents in the procedure. This can be the same as
609 @code{commandName} e.g.@: @code{Frequencies}, or different, e.g.@:
610 @code{Case Processing Summary}. Generic subtypes @code{Notes} and
611 @code{Warnings} are often used.
614 @defvr {Attribute} @code{tableId}
615 A number that uniquely identifies the table within the SPV file,
616 typically a large negative number such as @code{-4147135649387905023}.
619 @defvr {Attribute} @code{creator-version}
620 As on the @code{heading} element. In the corpus, this is only present
621 for version 21 and up and always includes all 8 digits.
624 @xref{SPV Detail Legacy Properties}, for details on the
625 @code{tableProperties} element.
627 @node SPV Structure graph Element
628 @subsection The @code{graph} Element
643 => dataPath? path csvPath?
646 This element represents a graph. The @code{dataPath} and @code{path}
647 elements name the Zip members that give the details of the graph.
648 Normally, both elements are present; there is only one counterexample
651 @code{csvPath} only appears in one SPV file in the corpus, for two
652 graphs. In these two cases, @code{dataPath}, @code{path}, and
653 @code{csvPath} all appear. These @code{csvPath} elements name Zip members with
654 names of the form @file{@var{number}_csv.bin}, where @var{number} is a
655 many-digit number and the same as the @code{csvFileIds}. The named
656 Zip members are CSV text files (despite the @file{.bin} extension).
657 The CSV files are encoded in UTF-8 and begin with a U+FEFF byte-order
660 @node SPV Structure model Element
661 @subsection The @code{model} Element
673 => ViZml? dataPath? path | pmmlContainerPath statsContainerPath
675 pmmlContainerPath => TEXT
677 statsContainerPath => TEXT
679 ViZml :viewName? => TEXT
682 This element represents a model. The @code{dataPath} and @code{path}
683 elements name the Zip members that give the details of the model.
684 Normally, both elements are present; there is only one counterexample
687 The details are unexplored. The @code{ViZml} element contains base-64
688 encoded text that decodes to a binary format with some embedded text
689 strings, and @code{path} names a Zip member that contains XML.
690 Alternatively, @code{pmmlContainerPath} and @code{statsContainerPath}
691 name Zip members with @file{.scf} extension.
693 @node SPV Structure object and image Elements
694 @subsection The @code{object} and @code{image} Elements
697 object :type[object_type]=(unknown)? :uri => EMPTY
699 image :VDPId :commandName => dataPath
702 These two elements represent an image in PNG format. They are
703 equivalent and the corpus contains examples of both. The only
704 difference is the syntax: for @code{object}, the @code{uri} attribute
705 names the Zip member that contains a PNG file; for @code{image}, the
706 text of the inner @code{dataPath} element names the Zip member.
708 PSPP writes @code{object} in output but there is no strong reason to
711 The corpus only contains PNG image files.
713 @node SPV Structure tree Element
714 @subsection The @code{tree} Element
725 This element represents a tree. The @code{dataPath} and @code{path}
726 elements name the Zip members that give the details of the tree.
727 The details are unexplored.
729 @node SPV Structure Path Elements
730 @subsection Path Elements
740 These elements contain the names of the Zip members that hold details
741 for a container. For tables:
745 When a ``light'' format is used, only @code{dataPath} is present, and
746 it names a @file{.bin} member of the Zip file that has @code{light} in
747 its name, e.g.@: @code{0000000001437_lightTableData.bin} (@pxref{SPV
748 Light Detail Member Format}).
751 When the legacy format is used, both are present. In this case,
752 @code{dataPath} names a Zip member with a legacy binary format that
753 contains relevant data (@pxref{SPV Legacy Detail Member Binary
754 Format}), and @code{path} names a Zip member that uses an XML format
755 (@pxref{SPV Legacy Detail Member XML Format}).
758 Graphs normally follow the legacy approach described above. The
759 corpus contains one example of a graph with @code{path} but not
760 @code{dataPath}. The reason is unexplored.
762 Models use @code{path} but not @code{dataPath}. @xref{SPV Structure
763 model Element}, for more information.
765 These elements have no attributes.
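
For tables, the distinction drawn above can be sketched as a small Python
helper.  The name @code{table_detail_format} and its return strings are
hypothetical, and the @code{"unknown"} fallback is an assumption for
inputs that fit neither description.

```python
def table_detail_format(data_path, path=None):
    """Guess a table's detail format from the text of its path elements.

    `data_path` is the text of the dataPath element; `path` is the text
    of the path element, or None if that element is absent.
    """
    if path is None and "light" in data_path:
        return "light"
    if path is not None:
        return "legacy"
    return "unknown"


print(table_detail_format("0000000001437_lightTableData.bin"))
print(table_detail_format("00000000001_tableData.bin",
                          "00000000001_table.xml"))
```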
767 @node SPV Structure pageSetup Element
768 @subsection The @code{pageSetup} Element
772 :initial-page-number=int?
773 :chart-size=(as-is | full-height | half-height | quarter-height | OTHER)?
774 :margin-left=dimension?
775 :margin-right=dimension?
776 :margin-top=dimension?
777 :margin-bottom=dimension?
778 :paper-height=dimension?
779 :paper-width=dimension?
780 :reference-orientation?
781 :space-after=dimension?
782 => pageHeader pageFooter
784 pageHeader => pageParagraph?
786 pageFooter => pageParagraph?
788 pageParagraph => pageParagraph_text
791 The @code{pageSetup} element has the following attributes.
793 @defvr {Attribute} @code{initial-page-number}
794 The page number to put on the first page of printed output. Usually
798 @defvr {Attribute} @code{chart-size}
799 One of the listed, self-explanatory chart sizes, such as
800 @code{quarter-height}, or a localization (!) of one of these (e.g.@:
801 @code{dimensione attuale}, @code{Wie vorgegeben}).
804 @defvr {Attribute} @code{margin-left}
805 @defvrx {Attribute} @code{margin-right}
806 @defvrx {Attribute} @code{margin-top}
807 @defvrx {Attribute} @code{margin-bottom}
808 Margin sizes, e.g.@: @code{0.25in}.
811 @defvr {Attribute} @code{paper-height}
812 @defvrx {Attribute} @code{paper-width}
816 @defvr {Attribute} @code{reference-orientation}
817 Indicates the orientation of the output page. Either @code{0deg}
818 (portrait) or @code{90deg} (landscape).
821 @defvr {Attribute} @code{space-after}
822 The amount of space between printed objects, typically @code{12pt}.
825 @node SPV Structure @code{text} Element (Inside @code{pageParagraph})
826 @subsection The @code{text} Element (Inside @code{pageParagraph})
829 text[pageParagraph_text] :type=(title | text) => TEXT
832 This @code{text} element is nested inside a @code{pageParagraph}. There
833 is a different @code{text} element that is nested inside a
836 The element is either empty, or contains CDATA that holds almost-XHTML
837 text: in the corpus, either an @code{html} or @code{p} element. It is
838 @emph{almost}-XHTML because the @code{html} element designates the
840 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} instead of
841 an XHTML namespace, and because the CDATA can contain substitution
842 variables. The following variables are supported:
847 The current date or time in the preferred format for the locale.
853 First-, second-, third-, or fourth-level heading.
859 Name of the output file.
865 @code{&[Page]} for the page number and @code{&[PageTitle]} for the
868 Typical contents (indented for clarity):
871 <html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
874 <p style="text-align:right; margin-top: 0">Page &[Page]</p>
879 This element has the following attributes.
881 @defvr {Attribute} @code{type}
885 @node SPV Light Detail Member Format
886 @section Light Detail Member Format
888 This section describes the format of ``light'' detail @file{.bin}
889 members. These members have a binary format which we describe here in
890 terms of a context-free grammar using the following conventions:
893 @item NonTerminal @result{} @dots{}
894 Nonterminals have CamelCaps names, and @result{} indicates a
895 production. The right-hand side of a production is often broken
896 across multiple lines. Break points are chosen for aesthetics only
897 and have no semantic significance.
899 @item 00, 01, @dots{}, ff.
900 A byte with a fixed value, written as a pair of hexadecimal digits.
902 @item i0, i1, @dots{}, i9, i10, i11, @dots{}
903 @itemx ib0, ib1, @dots{}, ib9, ib10, ib11, @dots{}
904 A 32-bit integer in little-endian or big-endian byte order,
905 respectively, with a fixed value, written in decimal. Prefixed by
906 @samp{i} for little-endian or @samp{ib} for big-endian.
912 A byte with value 0 or 1.
916 A 16-bit unsigned integer in little-endian or big-endian byte order,
921 A 32-bit unsigned integer in little-endian or big-endian byte order,
926 A 64-bit unsigned integer in little-endian or big-endian byte order,
930 A 64-bit IEEE floating-point number.
933 A 32-bit IEEE floating-point number.
937 A 32-bit unsigned integer, in little-endian or big-endian byte order,
938 respectively, followed by the specified number of bytes of character
939 data. (The encoding is indicated by the Formats nonterminal.)
942 @var{x} is optional, e.g.@: 00? is an optional zero byte.
944 @item @var{x}*@var{n}
945 @var{x} is repeated @var{n} times, e.g.@: byte*10 for ten arbitrary bytes.
947 @item @var{x}[@var{name}]
948 Gives @var{x} the specified @var{name}. Names are used in textual
949 explanations. They are also used, also bracketed, to indicate counts,
950 e.g.@: @code{int32[n] byte*[n]} for a 32-bit integer followed by the
951 specified number of arbitrary bytes.
953 @item @var{a} @math{|} @var{b}
954 Either @var{a} or @var{b}.
957 Parentheses are used for grouping to make precedence clear, especially
958 in the presence of @math{|}, e.g.@: in 00 (01 @math{|} 02 @math{|} 03)
962 @itemx becount(@var{x})
963 A 32-bit unsigned integer, in little-endian or big-endian byte order,
964 respectively, that indicates the number of bytes in @var{x}, followed
968 In a version 1 @file{.bin} member, @var{x}; in version 3, nothing.
969 (The @file{.bin} header indicates the version.)
972 In a version 3 @file{.bin} member, @var{x}; in version 1, nothing.
975 PSPP uses this grammar to parse light detail members. See
976 @file{src/output/spv/light-binary.grammar} in the PSPP source tree for
979 Little-endian byte order is far more common in this format, but a few
980 pieces of the format use big-endian byte order.
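
To make a few of these conventions concrete, here is a minimal Python
sketch of a sequential reader built on the standard @code{struct} module.
It is not the parser PSPP generates from its grammar file; the class name
@code{Reader} and its method names are hypothetical, and only a handful of
the primitive types are covered.

```python
import struct


class Reader:
    """Sequential reader for a few of the primitive types described above."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def _unpack(self, fmt):
        # struct.error is raised if fewer bytes remain than `fmt` needs.
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def byte(self):
        return self._unpack("<B")

    def u32_le(self):
        return self._unpack("<I")  # 32-bit unsigned, little-endian

    def u32_be(self):
        return self._unpack(">I")  # 32-bit unsigned, big-endian

    def string_le(self):
        """A little-endian 32-bit length followed by that many bytes.

        The bytes are returned undecoded, because the character encoding
        is only known once the Formats section has been read.
        """
        length = self.u32_le()
        raw = self.data[self.pos:self.pos + length]
        if len(raw) != length:
            raise ValueError("string runs past end of data")
        self.pos += length
        return raw


# Usage on hand-made data: a little-endian 3, "hello", a big-endian 7.
data = (struct.pack("<I", 3) + struct.pack("<I", 5) + b"hello"
        + struct.pack(">I", 7))
reader = Reader(data)
print(reader.u32_le(), reader.string_le(), reader.u32_be())
```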
982 Light detail members express linear units in two ways: points (pt), at
983 72/inch, and ``device-independent pixels'' (px), at 96/inch. To
984 convert from pt to px, multiply by 1.33 and round up. To convert
985 from px to pt, divide by 1.33 and round down.
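
The two conversions can be written directly from the rule above.  The
function names @code{pt_to_px} and @code{px_to_pt} are hypothetical; the
factor 1.33 and the rounding directions are taken as stated.

```python
import math


def pt_to_px(pt):
    """Points (72/inch) to device-independent pixels (96/inch)."""
    return math.ceil(pt * 1.33)


def px_to_pt(px):
    """Device-independent pixels (96/inch) to points (72/inch)."""
    return math.floor(px / 1.33)


print(pt_to_px(72), px_to_pt(96))
```

Rounding up in one direction and down in the other lets common values
survive a round trip: 72 pt becomes 96 px, and 96 px becomes 72 pt again.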
987 A ``light'' detail member @file{.bin} consists of a number of sections
988 concatenated together, terminated by an optional byte 01:
992 Header Titles Footnotes
993 Areas Borders PrintSettings TableSettings Formats
994 Dimensions Axes Cells
998 The following sections go into more detail.
1001 * SPV Light Member Header::
1002 * SPV Light Member Titles::
1003 * SPV Light Member Footnotes::
1004 * SPV Light Member Areas::
1005 * SPV Light Member Borders::
1006 * SPV Light Member Print Settings::
1007 * SPV Light Member Table Settings::
1008 * SPV Light Member Formats::
1009 * SPV Light Member Dimensions::
1010 * SPV Light Member Categories::
1011 * SPV Light Member Axes::
1012 * SPV Light Member Cells::
1013 * SPV Light Member Value::
1014 * SPV Light Member ValueMod::
1017 @node SPV Light Member Header
1020 An SPV light member begins with a 39-byte header:
1025 (i1 @math{|} i3)[version]
1028 bool[rotate-inner-column-labels]
1029 bool[rotate-outer-row-labels]
1032 int32[min-col-width] int32[max-col-width]
1033 int32[min-row-width] int32[max-row-width]
1037 @code{version} is a version number that affects the interpretation of
1038 some of the other data in the member. We will refer to ``version 1''
1039 and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for
1040 version-specific formatting (as described previously).
1042 If @code{rotate-inner-column-labels} is 1, then column labels closest
1043 to the data are rotated 90° counterclockwise; otherwise, they are
1044 shown in the normal way.
1046 If @code{rotate-outer-row-labels} is 1, then row labels farthest from
1047 the data are rotated 90° counterclockwise; otherwise, they are shown
1050 @code{min-col-width} is the minimum width that a column will be
1051 assigned automatically. @code{max-col-width} is the maximum width
1052 that a column will be assigned to accommodate a long column label.
1053 @code{min-row-width} and @code{max-row-width} are a similar range for
1054 the width of row labels. All of these measurements are in 1/96 inch
1055 units (called a ``device independent pixel'' unit in Windows).
1057 @code{table-id} is a binary version of the @code{tableId} attribute in
1058 the structure member that refers to the detail member. For example,
1059 if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
1060 would be 0xc6c99d183b300001.
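The relationship between the decimal @code{tableId} and the binary @code{table-id} is ordinary two's-complement reinterpretation. A minimal sketch, assuming that @code{table-id} is stored as a little-endian 64-bit integer like most of this format (the helper names are hypothetical):

```python
import struct


def table_id_to_hex(table_id):
    """Show a signed 64-bit tableId as an unsigned hexadecimal number."""
    return "0x%016x" % (table_id & 0xFFFFFFFFFFFFFFFF)


def table_id_to_bytes(table_id):
    """Encode a signed tableId as 8 bytes (little-endian is an assumption)."""
    return struct.pack("<q", table_id)
```

Applied to the example above, @code{table_id_to_hex(-4122591256483201023)} yields @code{0xc6c99d183b300001}.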
1062 The meaning of the other variable parts of the header is not known. A
1063 writer may safely use version 3, true for @code{x0}, false for
1064 @code{x1}, true for @code{x2}, and 0x15 for @code{x3}.
1066 @node SPV Light Member Titles
1072 Value[subtype] 01? 31
1073 Value[user-title] 01?
1074 (31 Value[corner-text] @math{|} 58)
1075 (31 Value[caption] @math{|} 58)
The Titles follow the Header and specify the table's title, caption,
and corner text.
The @code{user-title} reflects any user
editing of the title text or style. The @code{title} is the title
originally generated by the procedure. Both of these are appropriate
for presentation and localized to the user's language. For example,
for a frequency table, @code{title} and @code{user-title} normally
name the variable and @code{subtype} is simply ``Frequencies''.
1088 @code{subtype} is the same as the @code{subType} attribute in the
1089 @code{table} structure XML element that referred to this member.
1090 @xref{SPV Structure table Element}, for details.
1092 The @code{corner-text}, if present, is shown in the upper-left corner
1093 of the table, above the row headings and to the left of the column
headings. It is usually absent. When row dimension labels are
displayed in the corner (see @code{show-row-labels-in-corner}), corner
text is hidden.
1098 The @code{caption}, if present, is shown below the table.
1099 @code{caption} reflects user editing of the caption.
1101 @node SPV Light Member Footnotes
1102 @subsection Footnotes
1105 Footnotes => int32[n-footnotes] Footnote*[n-footnotes]
1106 Footnote => Value[text] (58 @math{|} 31 Value[marker]) int32[show]
Each footnote has @code{text} and an optional custom @code{marker}.
1112 The syntax for Value would allow footnotes (and their markers) to
1113 reference other footnotes, but in practice this doesn't work.
1115 @code{show} is a 32-bit signed integer. It is positive to show the
1116 footnote or negative to hide it. Its magnitude is often 1, and in
1117 other cases tends to be the number of references to the footnote.
1118 It is safe to write 1 to show a footnote and -1 to hide it.
1120 @node SPV Light Member Areas
1127 string[typeface] float[size] int32[style] bool[underline]
1128 int32[halign] int32[valign]
1129 string[fg-color] string[bg-color]
1130 bool[alternate] string[alt-fg-color] string[alt-bg-color]
1131 v3(int32[left-margin] int32[right-margin] int32[top-margin] int32[bottom-margin])
1134 Each Area represents the style for a different area of the table, in
1135 the following order: title, caption, footer, corner, column labels,
1136 row labels, data, and layers.
1138 @code{index} is the 1-based index of the Area, i.e.@: 1 for the first
1139 Area, through 8 for the final Area.
1141 @code{typeface} is the string name of the font used in the area. In
1142 the corpus, this is @code{SansSerif} in over 99% of instances and
1143 @code{Times New Roman} in the rest.
1145 @code{size} is the size of the font, in px (@pxref{SPV Light Detail
1146 Member Format}). The most common size in the corpus is 12 px. Even
1147 though @code{size} has a floating-point type, in the corpus its values
1148 are always integers.
1150 @code{style} is a bit mask. Bit 0 (with value 1) is set for bold, bit
1151 1 (with value 2) is set for italic.
1153 @code{underline} is 1 if the font is underlined, 0 otherwise.
1155 @code{halign} specifies horizontal alignment: 0 for center, 2 for
1156 left, 4 for right, 61453 for decimal, 64173 for mixed. Mixed
1157 alignment varies according to type: string data is left-justified,
1158 numbers and most other formats are right-justified.
@code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
for bottom.
1163 @code{fg-color} and @code{bg-color} are the foreground color and
1164 background color, respectively. In the corpus, these are always
1165 @code{#000000} and @code{#ffffff}, respectively.
1167 @code{alternate} is 1 if rows should alternate colors, 0 if all rows
1168 should be the same color. When @code{alternate} is 1,
1169 @code{alt-fg-color} and @code{alt-bg-color} specify the colors for the
1170 alternate rows; otherwise they are empty strings.
1172 @code{left-margin}, @code{right-margin}, @code{top-margin}, and
1173 @code{bottom-margin} are measured in px.
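The @code{style}, @code{halign}, and @code{valign} codes described above can be decoded along these lines. This is a sketch with hypothetical names; it assumes that a @code{valign} of 3 means bottom, as it does in CellStyle (@pxref{SPV Light Member ValueMod}), and it reports any other code as unknown rather than guessing.

```python
HALIGN_NAMES = dict([(0, "center"), (2, "left"), (4, "right"),
                     (61453, "decimal"), (64173, "mixed")])
# Assumption: 3 means bottom here, as in CellStyle.
VALIGN_NAMES = dict([(0, "center"), (1, "top"), (3, "bottom")])


def describe_area_style(style, halign, valign):
    """Return (bold, italic, halign_name, valign_name) for an Area."""
    bold = bool(style & 1)    # bit 0
    italic = bool(style & 2)  # bit 1
    return (bold, italic,
            HALIGN_NAMES.get(halign, "unknown"),
            VALIGN_NAMES.get(valign, "unknown"))
```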
1175 @node SPV Light Member Borders
1182 be32[n-borders] Border*[n-borders]
1183 bool[show-grid-lines]
1192 The Borders reflect how borders between regions are drawn.
The fixed value of @code{endian} can be used to validate the
endianness.
1197 @code{show-grid-lines} is 1 to draw grid lines, otherwise 0.
Each Border describes one kind of border. @code{n-borders} seems to
always be 19. Each @code{border-type} appears once (although in an
unpredictable order) and corresponds to one of the following borders:
1207 Left, top, right, and bottom outer frame.
1209 Left, top, right, and bottom inner frame.
1211 Left and top of data area.
1213 Horizontal and vertical dimension rows.
1215 Horizontal and vertical dimension columns.
1217 Horizontal and vertical category rows.
1219 Horizontal and vertical category columns.
1222 @code{stroke-type} describes how a border is drawn, as one of:
1239 @code{color} is an RGB color. Bits 24--31 are alpha, bits 16--23 are
1240 red, 8--15 are green, 0--7 are blue. An alpha of 255 indicates an
1241 opaque color, therefore opaque black is 0xff000000.
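The bit layout of @code{color} can be unpacked as in the following sketch (the function name is hypothetical):

```python
def split_color(color):
    """Split a 32-bit color into (alpha, red, green, blue) components."""
    alpha = (color >> 24) & 0xFF
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    return alpha, red, green, blue
```

For opaque black, @code{split_color(0xff000000)} gives an alpha of 255 and zero for each of red, green, and blue.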
1243 @node SPV Light Member Print Settings
1244 @subsection Print Settings
1251 bool[paginate-layers]
1254 bool[top-continuation]
1255 bool[bottom-continuation]
1256 be32[n-orphan-lines]
1257 bestring[continuation-string])
1260 The PrintSettings reflect settings for printing. The fixed value of
1261 @code{endian} can be used to validate the endianness.
1263 @code{all-layers} is 1 to print all layers, 0 to print only the layer
1264 designated by @code{current-layer} in TableSettings (@pxref{SPV Light
1265 Member Table Settings}).
@code{paginate-layers} is 1 to print each layer at the start of a new
page, 0 otherwise. (This setting is honored only if @code{all-layers} is
1, since otherwise only one layer is printed.)
1271 @code{fit-width} and @code{fit-length} control whether the table is
1272 shrunk to fit within a page's width or length, respectively.
1274 @code{n-orphan-lines} is the minimum number of rows or columns to put
1275 in one part of a table that is broken across pages.
1277 If @code{top-continuation} is 1, then @code{continuation-string} is
1278 printed at the top of a page when a table is broken across pages for
1279 printing; similarly for @code{bottom-continuation} and the bottom of a
1280 page. Usually, @code{continuation-string} is empty.
1282 @node SPV Light Member Table Settings
1283 @subsection Table Settings
1293 bool[show-row-labels-in-corner]
1294 bool[show-alphabetic-markers]
1295 bool[footnote-marker-superscripts]
1298 Breakpoints[row-breaks] Breakpoints[column-breaks]
1299 Keeps[row-keeps] Keeps[column-keeps]
1300 PointKeeps[row-point-keeps] PointKeeps[column-point-keeps]
1303 bestring[table-look]
1306 Breakpoints => be32[n-breaks] be32*[n-breaks]
1308 Keeps => be32[n-keeps] Keep*[n-keeps]
1309 Keep => be32[offset] be32[n]
1311 PointKeeps => be32[n-point-keeps] PointKeep*[n-point-keeps]
1312 PointKeep => be32[offset] be32 be32
1315 The TableSettings reflect display settings. The fixed value of
1316 @code{endian} can be used to validate the endianness.
@code{current-layer} is the displayed layer. Suppose there are
@math{d} layers, numbered 1 through @math{d} in the order given in the
Dimensions (@pxref{SPV Light Member Dimensions}), and that the
displayed value of dimension @math{i} is @math{x_i}, @math{0 \le x_i <
n_i}, where @math{n_i} is the number of categories in dimension
@math{i}. Then @code{current-layer} is calculated by the following
algorithm:
1327 let @code{current-layer} = 0
1328 for each @math{i} from @math{d} downto 1:
1329 @code{current-layer} = (@math{n_i \times} @code{current-layer}) @math{+} @math{x_i}
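The algorithm above, together with its inverse, might be written as follows. This is a sketch with hypothetical names; the lists @code{n} and @code{x} hold the category counts and displayed categories of the layer dimensions in the order used above, with list position 0 standing for dimension 1.

```python
def current_layer(n, x):
    """Combine displayed categories x into a single current-layer value."""
    layer = 0
    # Iterate from dimension d down to dimension 1.
    for n_i, x_i in zip(reversed(n), reversed(x)):
        layer = n_i * layer + x_i
    return layer


def layer_coordinates(n, layer):
    """Recover the displayed category of each layer dimension."""
    x = []
    for n_i in n:
        x.append(layer % n_i)
        layer //= n_i
    return x
```

For two layer dimensions with 3 and 4 categories that display categories 1 and 2, the result is @math{3 \times (4 \times 0 + 2) + 1 = 7}.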
1332 If @code{omit-empty} is 1, empty rows or columns (ones with nothing in
1333 any cell) are hidden; otherwise, they are shown.
1335 If @code{show-row-labels-in-corner} is 1, then row labels are shown in
1336 the upper left corner; otherwise, they are shown nested.
1338 If @code{show-alphabetic-markers} is 1, markers are shown as letters
1339 (e.g.@: @samp{a}, @samp{b}, @samp{c}, @dots{}); otherwise, they are
1340 shown as numbers starting from 1.
1342 When @code{footnote-marker-superscripts} is 1, footnote markers are shown
1343 as superscripts, otherwise as subscripts.
1345 The Breakpoints are rows or columns after which there is a page break;
1346 for example, a row break of 1 requests a page break after the second
1347 row. Usually no breakpoints are specified, indicating that page
1348 breaks should be selected automatically.
1350 The Keeps are ranges of rows or columns to be kept together without a
1351 page break; for example, a row Keep with @code{offset} 1 and @code{n}
1352 10 requests that the 10 rows starting with the second row be kept
1353 together. Usually no Keeps are specified.
The PointKeeps seem to be generated automatically based on
user-specified Keeps. They seem to indicate a conversion from rows
or columns to pixel or point offsets.
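Breakpoints and Keeps are simple counted arrays of big-endian integers. The sketch below reads them from a byte string; the function names are hypothetical, and it assumes that @code{be32} is a signed 32-bit big-endian integer.

```python
import struct


def read_be32(data, pos):
    """Read one big-endian 32-bit integer; return (value, next_pos)."""
    return struct.unpack_from(">i", data, pos)[0], pos + 4


def read_breakpoints(data, pos):
    """Read Breakpoints: a count followed by that many row or column numbers."""
    n, pos = read_be32(data, pos)
    breaks = []
    for _ in range(n):
        value, pos = read_be32(data, pos)
        breaks.append(value)
    return breaks, pos


def read_keeps(data, pos):
    """Read Keeps: a count followed by that many (offset, n) pairs."""
    n, pos = read_be32(data, pos)
    keeps = []
    for _ in range(n):
        offset, pos = read_be32(data, pos)
        count, pos = read_be32(data, pos)
        keeps.append((offset, count))
    return keeps, pos
```

Each function returns the position just past what it read, so the six arrays in TableSettings can be read one after another.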
1359 @code{notes} is a text string that contains user-specified notes. It
1360 is displayed when the user hovers the cursor over the table, like text
1361 in the @code{title} attribute in HTML@. It is not printed. It is
@code{table-look} is the name of an SPSS ``TableLook'' table style,
1365 such as ``Default'' or ``Academic''; it is often empty.
1367 TableSettings ends with an arbitrary number of null bytes. A writer
1368 may safely write 82 null bytes.
1370 A writer may safely use 4 for @code{x5} and 0 for @code{x6}.
1372 @node SPV Light Member Formats
1377 int32[n-widths] int32*[n-widths]
1379 int32[current-layer]
1380 bool[x7] bool[x8] bool[x9]
1385 v3(count(X1 count(X2)) count(X3)))
1386 Y0 => int32[epoch] byte[decimal] byte[grouping]
1387 CustomCurrency => int32[n-ccs] string*[n-ccs]
1390 If @code{n-widths} is nonzero, then the accompanying integers are
1391 column widths as manually adjusted by the user.
1393 @code{locale} is a locale including an encoding, such as
1394 @code{en_US.windows-1252} or @code{it_IT.windows-1252}.
1395 (@code{locale} is often duplicated in Y1, described below).
1397 @code{epoch} is the year that starts the epoch. A 2-digit year is
1398 interpreted as belonging to the 100 years beginning at the epoch. The
1399 default epoch year is 69 years prior to the current year; thus, in
2017 this field by default contains 1948. In the corpus, @code{epoch}
ranges from 1943 to 1948, although some members contain -1.
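The epoch rule can be illustrated with a short sketch. The function names are hypothetical, and the sketch does not try to interpret an @code{epoch} of -1, whose meaning is not described here.

```python
def default_epoch(current_year):
    """Default epoch: 69 years before the current year."""
    return current_year - 69


def expand_two_digit_year(yy, epoch):
    """Map a 2-digit year into the 100 years beginning at epoch."""
    return epoch + (yy - epoch) % 100
```

With an epoch of 1948, the 2-digit year 48 is read as 1948, 99 as 1999, and 47 as 2047.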
1403 @code{decimal} is the decimal point character. The observed values
1404 are @samp{.} and @samp{,}.
1406 @code{grouping} is the grouping character. Usually, it is @samp{,} if
1407 @code{decimal} is @samp{.}, and vice versa. Other observed values are
1408 @samp{'} (apostrophe), @samp{ } (space), and zero (presumably
1409 indicating that digits should not be grouped).
1411 @code{n-ccs} is observed as either 0 or 5. When it is 5, the
1412 following strings are CCA through CCE format strings. @xref{Custom
1413 Currency Formats,,, pspp, PSPP}. Most commonly these are all
1414 @code{-,,,} but other strings occur.
1416 A writer may safely use false for @code{x7}, @code{x8}, and @code{x9}.
1420 X0 only appears, optionally, in version 1 members.
1425 string[command] string[command-local]
1426 string[language] string[charset] string[locale]
1429 Y2 => CustomCurrency byte[missing] bool[x17]
1432 @code{command} describes the statistical procedure that generated the
1433 output, in English. It is not necessarily the literal syntax name of
1434 the procedure: for example, NPAR TESTS becomes ``Nonparametric
1435 Tests.'' @code{command-local} is the procedure's name, translated
1436 into the output language; it is often empty and, when it is not,
1437 sometimes the same as @code{command}.
1439 @code{missing} is the character used to indicate that a cell contains
1440 a missing value. It is always observed as @samp{.}.
1442 A writer may safely use false for @code{x17}.
1446 X1 only appears in version 3 members.
1454 byte[show-variables]
1456 int32[x18] int32[x19]
1462 @code{lang} may indicate the language in use. Some values seem to be
1463 0: @t{en}, 1: @t{de}, 2: @t{es}, 3: @t{it}, 5: @t{ko}, 6: @t{pl}, 8:
1464 @t{zh-tw}, 10: @t{pt_BR}, 11: @t{fr}.
1466 @code{show-variables} determines how variables are displayed by
1467 default. A value of 1 means to display variable names, 2 to display
1468 variable labels when available, 3 to display both (name followed by
1469 label, separated by a space). The most common value is 0, which
1470 probably means to use a global default.
1472 @code{show-values} is a similar setting for values. A value of 1
1473 means to display the value, 2 to display the value label when
1474 available, 3 to display both. Again, the most common value is 0,
1475 which probably means to use a global default.
@code{show-title} is 1 to show the title, 10 to hide it.
1479 @code{show-caption} is true to show the caption, false to hide it.
1481 A writer may safely use false for @code{x14}, false for @code{x16}, 0
1482 for @code{lang}, -1 for @code{x18} and @code{x19}, and false for
1487 X2 only appears in version 3 members.
1491 int32[n-row-heights] int32*[n-row-heights]
1492 int32[n-style-map] StyleMap*[n-style-map]
1493 int32[n-styles] StylePair*[n-styles]
1495 StyleMap => int64[cell-index] int16[style-index]
1498 If present, @code{n-row-heights} and the accompanying integers are row
1499 heights as manually adjusted by the user.
1501 The rest of X2 specifies styles for data cells. At first glance this
1502 is odd, because each data cell can have its own style embedded as part
1503 of the data, but in practice X2 specifies a style for a cell only if
1504 that cell is empty (and thus does not appear in the data at all).
Each StyleMap specifies the index of a blank cell, calculated the same
way as in the Cells (@pxref{SPV Light Member Cells}), along with a
0-based index into the accompanying StylePair array.
1509 A writer may safely omit the optional @code{i0 i0} inside the
1510 @code{count(@dots{})}.
1514 X3 only appears in version 3 members.
1518 01 00 byte[x21] 00 00 00
1521 (string[dataset] string[datafile] i0 int32[date] i0)?
@code{small} is a small real number. In the corpus, it overwhelmingly
takes the value 0.0001, with zero occasionally seen. Nonzero numbers
with format 40 (@pxref{SPV Light Member Value}) whose magnitudes are
smaller than @code{small} are displayed in scientific notation. (Thus,
a @code{small} of zero prevents scientific notation from being chosen.)
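The test that @code{small} controls for format 40 values can be written as a one-line predicate (a sketch; the function name is hypothetical):

```python
def use_scientific(x, small):
    """True if a format 40 value x should be shown in scientific notation."""
    return x != 0 and abs(x) < small
```

Because no magnitude is less than zero, a @code{small} of zero makes this predicate false for every value.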
1532 @code{dataset} is the name of the dataset analyzed to produce the
1533 output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
1534 file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter
1535 is sometimes the empty string.
1537 @code{date} is a date, as seconds since the epoch, i.e.@: since
1538 January 1, 1970. Pivot tables within an SPV file often have dates a
1539 few minutes apart, so this is probably a creation date for the table
1540 rather than for the file.
Sometimes @code{dataset}, @code{datafile}, and @code{date} are present
and other times they are absent. The reader can distinguish by
assuming that they are present and then checking whether the
presumptive @code{dataset} contains a null byte (a valid string never
does).
1548 @code{x22} is usually 0 or 2000000.
1550 A writer may safely use 4 for @code{x21} and omit @code{x22} and the
1551 other optional bytes at the end.
1553 @subsubheading Encoding
1555 Formats contains several indications of character encoding:
1559 @code{locale} in Formats itself.
1562 @code{locale} in Y1 (in version 1, Y1 is optionally nested inside X0;
1563 in version 3, Y1 is nested inside X3).
1566 @code{charset} in version 3, in Y1.
1569 @code{lang} in X1, in version 3.
@code{charset}, if present, is a good indication of character
encoding, and in its absence the encoding suffix on @code{locale} in
Formats can be used instead.
@code{locale} in Y1 can be disregarded: it is normally the same as
@code{locale} in Formats, and it is only present if @code{charset} is
also present.
@code{lang} is not helpful and should be ignored for character
encoding purposes.
1583 However, the corpus contains many examples of light members whose
1584 strings are encoded in UTF-8 despite declaring some other character
1585 set. Furthermore, the corpus contains several examples of light
1586 members in which some strings are encoded in UTF-8 (and contain
1587 multibyte characters) and other strings are encoded in another
1588 character set (and contain non-ASCII characters). PSPP treats any
1589 valid UTF-8 string as UTF-8 and only falls back to the declared
1590 encoding for strings that are not valid UTF-8.
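The fallback strategy described above might look like the following sketch. The function names are hypothetical; this illustrates the approach and is not PSPP's actual code. It assumes that the declared encoding name is one that Python's codec registry recognizes.

```python
def encoding_from_locale(locale):
    """Return the encoding suffix of a locale such as en_US.windows-1252."""
    return locale.partition(".")[2]


def decode_light_string(raw, declared_encoding):
    """Decode raw bytes as UTF-8 if valid, else with the declared encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(declared_encoding, errors="replace")
```

Deciding string by string, rather than once per member, is what lets this cope with members that mix UTF-8 strings with strings in another character set.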
1592 The @command{pspp-output} program's @command{strings} command can help
1593 analyze the encoding in an SPV light member. Use @code{pspp-output
1594 --help-dev} to see its usage.
1596 @node SPV Light Member Dimensions
1597 @subsection Dimensions
1599 A pivot table presents multidimensional data. A Dimension identifies
1600 the categories associated with each dimension.
1603 Dimensions => int32[n-dims] Dimension*[n-dims]
1605 Value[name] DimProperties
1606 int32[n-categories] Category*[n-categories]
1611 bool[hide-dim-label]
1612 bool[hide-all-labels]
1616 @code{name} is the name of the dimension, e.g.@: @code{Variables},
1617 @code{Statistics}, or a variable name.
1619 The meanings of @code{x1} and @code{x3} are unknown. @code{x1} is
1620 usually 0 but many other values have been observed. A writer may
1621 safely use 0 for @code{x1} and 2 for @code{x3}.
1623 @code{x2} is 0, 1, or 2. For a pivot table with @var{L} layer
1624 dimensions, @var{R} row dimensions, and @var{C} column dimensions,
1625 @code{x2} is 2 for the first @var{L} dimensions, 0 for the next
1626 @var{R} dimensions, and 1 for the remaining @var{C} dimensions. This
1627 does not mean that the layer dimensions must be presented first,
1628 followed by the row dimensions, followed by the column dimensions---on
1629 the contrary, they are frequently in a different order---but @code{x2}
1630 must follow this pattern to prevent the pivot table from being
1633 If @code{hide-dim-label} is 00, the pivot table displays a label for
1634 the dimension itself. Because usually the group and category labels
1635 are enough explanation, it is usually 01.
If @code{hide-all-labels} is 01, the pivot table omits all labels for
the dimension, including group and category labels. It is usually 00.
When @code{hide-all-labels} is 01, @code{hide-dim-label} is ignored.
@code{dim-index} is usually the 0-based index of the dimension, e.g.@:
0 for the first dimension, 1 for the second, and so on. Sometimes it
is -1. There is no visible difference. A writer may safely use the
0-based index.
1646 @node SPV Light Member Categories
1647 @subsection Categories
1649 Categories are arranged in a tree. Only the leaf nodes in the tree
1650 are really categories; the others just serve as grouping constructs.
1653 Category => Value[name] (Leaf @math{|} Group)
1654 Leaf => 00 00 00 i2 int32[leaf-index] i0
1656 bool[merge] 00 01 int32[x23]
1657 i-1 int32[n-subcategories] Category*[n-subcategories]
1660 @code{name} is the name of the category (or group).
1662 A Leaf represents a leaf category. The Leaf's @code{leaf-index} is a
1663 nonnegative integer unique within the Dimension and less than
1664 @code{n-categories} in the Dimension. If the user does not sort or
1665 rearrange the categories, then @code{leaf-index} starts at 0 for the
1666 first Leaf in the dimension and increments by 1 with each successive
Leaf. If the user does sort or rearrange the categories, then the
1668 order of categories in the file reflects that change and
1669 @code{leaf-index} reflects the original order.
1671 A dimension can have no leaf categories at all. A table that
1672 contains such a dimension necessarily has no data at all.
1674 A Group is a group of nested categories. Usually a Group contains at
1675 least one Category, so that @code{n-subcategories} is positive, but
1676 Groups with zero subcategories have been observed.
1678 If a Group's @code{merge} is 00, the most common value, then the group
1679 is really a distinct group that should be represented as such in the
1680 visual representation and user interface. If @code{merge} is 01, the
1681 categories in this group should be shown and treated as if they were
1682 direct children of the group's containing group (or if it has no
1683 parent group, then direct children of the dimension), and this group's
1684 name is irrelevant and should not be displayed. (Merged groups can be
1687 Writers need not use merged groups.
1689 A Group's @code{x23} appears to be i2 when all of the categories
1690 within a group are leaf categories that directly represent data values
1691 for a variable (e.g.@: in a frequency table or crosstabulation, a group
1692 of values in a variable being tabulated) and i0 otherwise. A writer
1693 may safely write a constant 0 in this field.
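The effect of @code{merge} on presentation can be sketched with a small in-memory tree. The class and function names are hypothetical, and the tree is assumed to have been parsed already; a node with no child list stands for a Leaf.

```python
class Category:
    """A node in a category tree: a leaf if children is None, else a group."""

    def __init__(self, name, leaf_index=None, children=None, merge=False):
        self.name = name
        self.leaf_index = leaf_index
        self.children = children
        self.merge = merge


def displayed_children(categories):
    """Return categories with merged groups replaced by their own children."""
    shown = []
    for cat in categories:
        if cat.children is not None and cat.merge:
            shown.extend(displayed_children(cat.children))
        else:
            shown.append(cat)
    return shown


def leaf_indexes(categories):
    """Return every leaf-index in file order, descending into all groups."""
    found = []
    for cat in categories:
        if cat.children is None:
            found.append(cat.leaf_index)
        else:
            found.extend(leaf_indexes(cat.children))
    return found
```

Here @code{displayed_children} never returns a merged group's name, only its descendants, while @code{leaf_indexes} shows how file order can differ from the original order recorded in @code{leaf-index}.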
1695 @node SPV Light Member Axes
After the dimensions comes the assignment of each dimension to one of
the axes: layers, rows, and columns.
1703 int32[n-layers] int32[n-rows] int32[n-columns]
1704 int32*[n-layers] int32*[n-rows] int32*[n-columns]
The values of @code{n-layers}, @code{n-rows}, and @code{n-columns}
specify the number of dimensions displayed in layers, rows, and
columns, respectively. Any of them may be zero. Their values sum to
@code{n-dims} from Dimensions (@pxref{SPV Light Member
Dimensions}).
The following @code{n-dims} integers, in three groups, are a
1714 permutation of the 0-based dimension numbers. The first
1715 @code{n-layers} integers specify each of the dimensions represented by
1716 layers, the next @code{n-rows} integers specify the dimensions
1717 represented by rows, and the final @code{n-columns} integers specify
1718 the dimensions represented by columns. When there is more than one
1719 dimension of a given kind, the inner dimensions are given first. (For
1720 the layer axis, this means that the first dimension is at the bottom
1721 of the list and the last dimension is at the top when the current
1722 layer is displayed.)
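Splitting the permutation into its three groups is straightforward. A minimal sketch, with a hypothetical function name and a sanity check that the integers really form a permutation of the dimension numbers:

```python
def split_axes(n_layers, n_rows, n_columns, dims):
    """Split the dimension numbers into (layers, rows, columns) lists."""
    if len(dims) != n_layers + n_rows + n_columns:
        raise ValueError("axis counts do not sum to the number of dimensions")
    if sorted(dims) != list(range(len(dims))):
        raise ValueError("not a permutation of 0-based dimension numbers")
    layers = dims[:n_layers]
    rows = dims[n_layers:n_layers + n_rows]
    columns = dims[n_layers + n_rows:]
    return layers, rows, columns
```

Within each returned list, the inner dimensions come first, following the order in the file.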
1724 @node SPV Light Member Cells
1727 The final part of an SPV light member contains the actual data.
1730 Cells => int32[n-cells] Cell*[n-cells]
1731 Cell => int64[index] v1(00?) Value
1734 A Cell consists of an @code{index} and a Value. Suppose there are
1735 @math{d} dimensions, numbered 1 through @math{d} in the order given in
1736 the Dimensions previously, and that dimension @math{i} has @math{n_i}
1737 categories. Consider the cell at coordinates @math{x_i}, @math{1 \le
1738 i \le d}, and note that @math{0 \le x_i < n_i}. Then the index is
1739 calculated by the following algorithm:
1743 for each @math{i} from 1 to @math{d}:
1744 @i{index} = (@math{n_i \times} @i{index}) @math{+} @math{x_i}
1747 For example, suppose there are 3 dimensions with 3, 4, and 5
1748 categories, respectively. The cell at coordinates (1, 2, 3) has
1749 index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
1750 Within a given dimension, the index is the @code{leaf-index} in a Leaf.
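The index calculation and its inverse can be sketched as follows. The function names are hypothetical; @code{n} and @code{x} list the category counts and coordinates in dimension order.

```python
def cell_index(n, x):
    """Compute a Cell index from per-dimension coordinates x."""
    index = 0
    for n_i, x_i in zip(n, x):
        index = n_i * index + x_i
    return index


def cell_coordinates(n, index):
    """Recover per-dimension coordinates from a Cell index."""
    x = []
    for n_i in reversed(n):
        x.append(index % n_i)
        index //= n_i
    x.reverse()
    return x
```

For the example above, @code{cell_index([3, 4, 5], [1, 2, 3])} is 33, and @code{cell_coordinates([3, 4, 5], 33)} recovers the coordinates (1, 2, 3).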
1752 @node SPV Light Member Value
1755 Value is used throughout the SPV light member format. It boils down
1756 to a number or a string.
1759 Value => 00? 00? 00? 00? RawValue
1761 01 ValueMod int32[format] double[x]
1762 @math{|} 02 ValueMod int32[format] double[x]
1763 string[var-name] string[value-label] byte[show]
1764 @math{|} 03 string[local] ValueMod string[id] string[c] bool[fixed]
1765 @math{|} 04 ValueMod int32[format] string[value-label] string[var-name]
1766 byte[show] string[s]
1767 @math{|} 05 ValueMod string[var-name] string[var-label] byte[show]
1768 @math{|} 06 string[local] ValueMod string[id] string[c]
1769 @math{|} ValueMod string[template] int32[n-args] Argument*[n-args]
1772 @math{|} int32[x] i0 Value*[x] /* x > 0 */
1775 There are several possible encodings, which one can distinguish by the
1776 first nonzero byte in the encoding.
1780 The numeric value @code{x}, intended to be presented to the user
1781 formatted according to @code{format}, which is about the same as the
1782 format described for system files (@pxref{System File Output
1783 Formats}). The exception is that format 40 is not MTIME but instead
1784 approximately a synonym for F format with a different rule for whether
1785 a value is shown in scientific notation: a value in format 40 is shown
1786 in scientific notation if and only if it is nonzero and its magnitude
1787 is less than @code{small} (@pxref{SPV Light Member Formats}).
1789 Most commonly, @code{format} has width 40 (the maximum).
1791 An @code{x} with the maximum negative double value @code{-DBL_MAX}
1792 represents the system-missing value SYSMIS. (HIGHEST and LOWEST have
1793 not been observed.) See @ref{System File Format}, for more about
1794 these special values.
1797 Similar to @code{01}, with the additional information that @code{x} is
1798 a value of variable @code{var-name} and has value label
1799 @code{value-label}. Both @code{var-name} and @code{value-label} can
1800 be the empty string, the latter very commonly.
1802 @code{show} determines whether to show the numeric value or the value
1803 label. A value of 1 means to show the value, 2 to show the label, 3
1804 to show both, and 0 means to use the default specified in
1805 @code{show-values} (@pxref{SPV Light Member Formats}).
1808 A text string, in two forms: @code{c} is in English, and sometimes
1809 abbreviated or obscure, and @code{local} is localized to the user's
1810 locale. In an English-language locale, the two strings are often the
1811 same, and in the cases where they differ, @code{local} is more
1812 appropriate for a user interface, e.g.@: @code{c} of ``Not a PxP table
1813 for MCN...'' versus @code{local} of ``Computed only for a PxP table,
1814 where P must be greater than 1.''
@code{c} and @code{local} are always either both empty or both
nonempty.
1819 @code{id} is a brief identifying string whose form seems to resemble a
1820 programming language identifier, e.g.@: @code{cumulative_percent} or
1821 @code{factor_14}. It is not unique.
@code{fixed} is 00 for text taken from user input, such as syntax
fragments, expressions, file names, and data set names, and 01 for fixed
1825 text strings such as names of procedures or statistics. In the former
1826 case, @code{id} is always the empty string; in the latter case,
1827 @code{id} is still sometimes empty.
1830 The string value @code{s}, intended to be presented to the user
1831 formatted according to @code{format}. The format for a string is not
1832 too interesting, and the corpus contains many clearly invalid formats
1833 like A16.39 or A255.127 or A134.1, so readers should probably entirely
1834 disregard the format. PSPP only checks @code{format} to distinguish
1837 @code{s} is a value of variable @code{var-name} and has value label
1838 @code{value-label}. @code{var-name} is never empty but
1839 @code{value-label} is commonly empty.
1841 @code{show} has the same meaning as in the encoding for 02.
Variable @code{var-name} with variable label @code{var-label}. In the
corpus, @code{var-name} is rarely empty and @code{var-label} is often
empty.
1848 @code{show} determines whether to show the variable name or the
1849 variable label. A value of 1 means to show the name, 2 to show the
1850 label, 3 to show both, and 0 means to use the default specified in
1851 @code{show-variables} (@pxref{SPV Light Member Formats}).
1854 Similar to type 03, with @code{fixed} assumed to be true.
1857 When the first byte of a RawValue is not one of the above, the
1858 RawValue starts with a ValueMod, whose syntax is described in the next
1859 section. (A ValueMod always begins with byte 31 or 58.)
1861 This case is a template string, analogous to @code{printf}, followed
1862 by one or more Arguments, each of which has one or more values. The
1863 template string is copied directly into the output except for the
1864 following special syntax,
Each of these expands to the character following @samp{\\}, to escape
characters that have special meaning in template strings. These are
effective inside and outside the @code{[@dots{}]} syntax forms
described below.
Expands to a new-line, inside or outside the @code{[@dots{}]} forms
described below.
1881 Expands to a formatted version of argument @var{i}, which must have
1882 only a single value. For example, @code{^1} expands to the first
1883 argument's @code{value}.
1885 @item [:@var{a}:]@var{i}
1886 Expands @var{a} for each of the values in @var{i}. @var{a}
1887 should contain one or more @code{^@var{j}} conversions, which are
drawn from the values for argument @var{i} in order. Some examples
from the corpus:
1893 All of the values for the first argument, concatenated.
1896 Expands to the values for the first argument, each followed by
1900 Expands to @code{@var{x} = @var{y}} where @var{x} is the second
1901 argument's first value and @var{y} is its second value. (This would
1902 be used only if the argument has two values. If there were more
1903 values, the second and third values would be directly concatenated,
1904 which would look funny.)
1907 @item [@var{a}:@var{b}:]@var{i}
1908 This extends the previous form so that the first values are expanded
1909 using @var{a} and later values are expanded using @var{b}. For an
1910 unknown reason, within @var{a} the @code{^@var{j}} conversions are
1911 instead written as @code{%@var{j}}. Some examples from the corpus:
1915 Expands to all of the values for the first argument, separated by
1918 @item [%1 = %2:, ^1 = ^2:]1
1919 Given appropriate values for the first argument, expands to @code{X =
1923 Given appropriate values, expands to @code{1, 2, 3}.
1927 The template string is localized to the user's locale.
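To make the template syntax concrete, here is a sketch of an expander. It is a reading of the rules above, not PSPP's implementation. It assumes that each argument has already been formatted into a list of strings, and every name in it is hypothetical. The first part of a two-part bracket form uses @code{%} conversions and is applied once to the leading values; the second part uses @code{^} conversions and is repeated for the rest.

```python
def _read_int(text, pos):
    """Read decimal digits at pos; return (value, next_pos), value 0 if none."""
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    return (int(text[pos:end]) if end > pos else 0), end


def _read_inner(text, pos):
    """Return (inner, next_pos): the text up to the next unescaped colon."""
    start = pos
    while pos < len(text):
        if text[pos] == "\\" and pos + 1 < len(text):
            pos += 2
        elif text[pos] == ":":
            return text[start:pos], pos + 1
        else:
            pos += 1
    return text[start:pos], pos


def _expand(text, marker, values):
    """Expand escapes and marker conversions; return (output, n_values_used)."""
    out, used, pos = [], 0, 0
    while pos < len(text):
        ch = text[pos]
        if ch == "\\" and pos + 1 < len(text):
            out.append("\n" if text[pos + 1] == "n" else text[pos + 1])
            pos += 2
        elif ch == marker:
            index, pos = _read_int(text, pos + 1)
            if 1 <= index <= len(values):
                out.append(values[index - 1])
                used = max(used, index)
        else:
            out.append(ch)
            pos += 1
    return "".join(out), used


def expand_template(template, args):
    """Expand template; args is a list of lists of preformatted strings."""
    out, pos = [], 0
    while pos < len(template):
        ch = template[pos]
        if ch == "\\" and pos + 1 < len(template):
            out.append("\n" if template[pos + 1] == "n" else template[pos + 1])
            pos += 2
        elif ch == "^":
            index, pos = _read_int(template, pos + 1)
            if 1 <= index <= len(args) and args[index - 1]:
                out.append(args[index - 1][0])  # single-valued argument
        elif ch == "[":
            first, pos = _read_inner(template, pos + 1)
            rest, pos = _read_inner(template, pos)
            if pos < len(template) and template[pos] == "]":
                pos += 1
            index, pos = _read_int(template, pos)
            if not 1 <= index <= len(args):
                continue
            values = args[index - 1]
            done = 0
            while done < len(values):
                if done == 0 and first:
                    text, used = _expand(first, "%", values)
                else:
                    text, used = _expand(rest, "^", values[done:])
                out.append(text)
                if used == 0:
                    break  # nothing consumed: avoid looping forever
                done += used
        else:
            out.append(ch)
            pos += 1
    return "".join(out)
```

The sketch silently skips conversions that refer to missing arguments or values; a stricter reader might report them instead.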
1930 A writer may safely omit all of the optional 00 bytes at the beginning
1931 of a Value, except that it should write a single 00 byte before a
1934 @node SPV Light Member ValueMod
1935 @subsection ValueMod
1937 A ValueMod can specify special modifications to a Value.
1943 int32[n-refs] int16*[n-refs]
1944 int32[n-subscripts] string*[n-subscripts]
1945 v1(00 (i1 | i2) 00? 00? int32 00? 00?)
1946 v3(count(TemplateString StylePair))
1948 TemplateString => count((count((i0 (58 @math{|} 31 55))?) (58 @math{|} 31 string[id]))?)
1955 bool[bold] bool[italic] bool[underline] bool[show]
1956 string[fg-color] string[bg-color]
1957 string[typeface] byte[size]
1960 int32[halign] int32[valign] double[decimal-offset]
1961 int16[left-margin] int16[right-margin]
1962 int16[top-margin] int16[bottom-margin]
A ValueMod that begins with ``31'' specifies special modifications to
a Value.
1968 Each of the @code{n-refs} integers is a reference to a Footnote
1969 (@pxref{SPV Light Member Footnotes}) by 0-based index. Footnote
1970 markers are shown appended to the main text of the Value, as
1971 superscripts or subscripts.
1973 The @code{subscripts}, if present, are strings to append to the main
1974 text of the Value, as subscripts. Each subscript text is a brief
1975 indicator, e.g.@: @samp{a} or @samp{b}, with its meaning indicated by
1976 the table caption. When multiple subscripts are present, they are
1977 displayed separated by commas.
1979 The @code{id} inside the TemplateString, if present, is a template
1980 string for substitutions using the syntax explained previously. It
1981 appears to be an English-language version of the localized template
string in the Value in which the TemplateString is nested. A writer may
1983 safely omit the optional fixed data in TemplateString.
1985 FontStyle and CellStyle, if present, change the style for this
1986 individual Value. In FontStyle, @code{bold}, @code{italic}, and
1987 @code{underline} control the particular style. @code{show} is
1988 ordinarily 1; if it is 0, then the cell data is not shown.
1989 @code{fg-color} and @code{bg-color} are strings in the format
1990 @code{#rrggbb}, e.g.@: @code{#ff0000} for red or @code{#ffffff} for
1991 white. The empty string is occasionally observed also. The
1992 @code{size} is a font size in units of 1/128 inch.
1994 In CellStyle, @code{halign} is 0 for center, 2 for left, 4 for right,
1995 6 for decimal, 0xffffffad for mixed. For decimal alignment,
1996 @code{decimal-offset} is the decimal point's offset from the right
1997 side of the cell, in pt (@pxref{SPV Light Detail Member Format}).
1998 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
1999 for bottom. @code{left-margin}, @code{right-margin},
2000 @code{top-margin}, and @code{bottom-margin} are in pt.
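
As an illustration only, the following sketch decodes the fixed-size fields of a CellStyle and a FontStyle color string. It assumes little-endian encoding, as in the notation used for this format, and treats the margins as signed, which is a guess. The names @code{decode_cell_style} and @code{parse_color} are hypothetical, and the bytes that introduce the CellStyle are outside the scope of the sketch.

```python
import struct

# Alignment codes as listed above.
HALIGN = dict([(0, "center"), (2, "left"), (4, "right"),
               (6, "decimal"), (0xFFFFFFAD, "mixed")])
VALIGN = dict([(0, "center"), (1, "top"), (3, "bottom")])


def decode_cell_style(buf):
    """Decode halign, valign, decimal-offset, and the four margins.

    `buf` must start at the halign field; 24 bytes are consumed.
    Unknown alignment codes are passed through as integers.
    """
    fields = struct.unpack("<IIdhhhh", buf[:24])
    halign, valign, decimal_offset, left, right, top, bottom = fields
    return dict(halign=HALIGN.get(halign, halign),
                valign=VALIGN.get(valign, valign),
                decimal_offset=decimal_offset,
                margins=(left, right, top, bottom))


def parse_color(text):
    """Turn '#rrggbb' into an (r, g, b) tuple; the empty string gives None."""
    if not text:
        return None
    return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))


sample = struct.pack("<IIdhhhh", 6, 1, 12.5, 8, 11, 1, 1)
print(decode_cell_style(sample))
print(parse_color("#ff0000"))
```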
2002 @node SPV Legacy Detail Member Binary Format
2003 @section Legacy Detail Member Binary Format
2005 Whereas the light binary format represents everything about a given
2006 pivot table, the legacy binary format conceptually consists of a
2007 number of named sources, each of which consists of a number of named
2008 variables, each of which is a 1-dimensional array of numbers or
2009 strings or a mix. Thus, the legacy binary member format is quite
2012 This section uses the same context-free grammar notation as in the
2013 previous section, with the following additions:
2017 In a version 0xaf legacy member, @var{x}; in other versions, nothing.
2018 (The legacy member header indicates the version; see below.)
2021 In a version 0xb0 legacy member, @var{x}; in other versions, nothing.
2024 A legacy detail member @file{.bin} has the following overall format:
2028 00 byte[version] int16[n-sources] int32[member-size]
2029 Metadata*[n-sources]
2034 @code{version} is a version number that affects the interpretation of
2035 some of the other data in the member. Versions 0xaf and 0xb0 are
2036 known. We will refer to ``version 0xaf'' and ``version 0xb0'' members
2039 A legacy member consists of @code{n-sources} data sources, each of
2040 which has Metadata and Data.
2042 @code{member-size} is the size of the legacy binary member, in bytes.
2044 The Data and Strings above are commented out because the Metadata has
2045 some oddities that mean that the Data sometimes seems to start at
2046 an unexpected place. The following section goes into detail.
2049 * SPV Legacy Member Metadata::
2050 * SPV Legacy Member Numeric Data::
2051 * SPV Legacy Member String Data::
2054 @node SPV Legacy Member Metadata
2055 @subsection Metadata
2059 int32[n-values] int32[n-variables] int32[data-offset]
2060 vAF(byte*28[source-name])
2061 vB0(byte*64[source-name] int32[x])
2064 A data source has @code{n-variables} variables, each with
2065 @code{n-values} data values.
2067 @code{source-name} is a 28- or 64-byte string padded on the right with
2068 0-bytes. The names that appear in the corpus are very generic:
2069 usually @code{tableData} for pivot table data or @code{source0} for
2072 A given Metadata's @code{data-offset} is the offset, in bytes, from
2073 the beginning of the member to the start of the corresponding Data.
2074 This allows programs to skip to the beginning of the data for a
2075 particular source. In every case in the corpus, the Data follow the
2076 Metadata in the same order, but it is important to use
2077 @code{data-offset} instead of reading sequentially through the file
2078 because of the exception described below.
2080 One SPV file in the corpus has legacy binary members with version 0xb0
2081 but a 28-byte @code{source-name} field (and only a single source). In
2082 practice, this means that the 64-byte @code{source-name} used in
2083 version 0xb0 has a lot of 0-bytes in the middle followed by the
2084 @code{variable-name} of the following Data. As long as a reader
2085 treats the first 0-byte in the @code{source-name} as terminating the
2086 string, it can properly interpret these members.
2088 The meaning of @code{x} in version 0xb0 is unknown.
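
The header and Metadata layout can be read along the following lines. This is a sketch rather than PSPP's reader: it assumes little-endian integers, as in the grammar notation, and assumes that source names can be decoded as UTF-8. The function names are hypothetical. The name is cut at its first 0-byte so that the odd version 0xb0 members described above are still read correctly.

```python
import struct


def parse_metadata(buf, offset, version):
    """Parse one Metadata record; return (record, offset after it)."""
    n_values, n_variables, data_offset = struct.unpack_from("<III", buf, offset)
    offset += 12
    name_len = 28 if version == 0xAF else 64
    raw_name = buf[offset:offset + name_len]
    offset += name_len
    x = None
    if version == 0xB0:
        (x,) = struct.unpack_from("<I", buf, offset)
        offset += 4
    # Treat the first 0-byte as the end of the name.
    name = raw_name.split(b"\0", 1)[0].decode("utf-8", "replace")
    record = dict(name=name, n_values=n_values, n_variables=n_variables,
                  data_offset=data_offset, x=x)
    return record, offset


def parse_legacy_member(buf):
    """Parse the member header and all Metadata records."""
    zero, version, n_sources, member_size = struct.unpack_from("<BBHI", buf, 0)
    if zero != 0 or version not in (0xAF, 0xB0):
        raise ValueError("not a known legacy binary member")
    offset = 8
    sources = []
    for _ in range(n_sources):
        record, offset = parse_metadata(buf, offset, version)
        sources.append(record)
    return version, sources
```

A reader would then seek to each record's @code{data_offset} to find the corresponding Data, rather than continuing from the offset at which the Metadata ended.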
2090 @node SPV Legacy Member Numeric Data
2091 @subsection Numeric Data
2094 Data => Variable*[n-variables]
2095 Variable => byte*288[variable-name] double*[n-values]
2098 Data follow the Metadata in the legacy binary format, with sources in
2099 the same order (but readers should use the @code{data-offset} in
2100 Metadata records, rather than reading sequentially). Each Variable
2101 begins with a @code{variable-name} that generally indicates its role
2102 in the pivot table, e.g.@: ``cell'', ``cellFormat'',
2103 ``dimension0categories'', ``dimension0group0'', followed by the
numeric data, one double per datum. A double equal to
@code{-DBL_MAX}, the most negative finite double, represents the system-missing value
2108 @node SPV Legacy Member String Data
2109 @subsection String Data
2112 Strings => SourceMaps[maps] Labels
2114 SourceMaps => int32[n-maps] SourceMap*[n-maps]
2116 SourceMap => string[source-name] int32[n-variables] VariableMap*[n-variables]
2117 VariableMap => string[variable-name] int32[n-data] DatumMap*[n-data]
2118 DatumMap => int32[value-idx] int32[label-idx]
2120 Labels => int32[n-labels] Label*[n-labels]
2121 Label => int32[frequency] string[label]
2124 Each variable may include a mix of numeric and string data values. If
2125 a legacy binary member contains any string data, Strings is present;
2126 otherwise, it ends just after the last Data element.
2128 The string data overlays the numeric data. When a variable includes
2129 any string data, its Variable represents the string values with a
2130 SYSMIS or NaN placeholder. (Not all such values need be
2133 Each SourceMap provides a mapping between SYSMIS or NaN values in source
2134 @code{source-name} and the string data that they represent.
2135 @code{n-variables} is the number of variables in the source that
2136 include string data. More precisely, it is the 1-based index of the
2137 last variable in the source that includes any string data; thus, it
2138 would be 4 if there are 5 variables and only the fourth one includes
2141 A VariableMap repeats its variable's name, but variables are always
2142 present in the same order as the source, starting from the first
2143 variable, without skipping any even if they have no string values.
2144 Each VariableMap contains DatumMap nonterminals, each of which maps
2145 from a 0-based index within its variable's data to a 0-based label
2146 index, e.g.@: pair @code{value-idx} = 2, @code{label-idx} = 3, means
2147 that the third data value (which must be SYSMIS or NaN) is to be
2148 replaced by the string of the fourth Label.
2150 The labels themselves follow the pairs. The valuable part of each
2151 label is the string @code{label}. Each label also includes a
2152 @code{frequency} that reports the number of DatumMaps that reference
2153 it (although this is not useful).
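
The following sketch parses Strings and overlays the labels onto a variable's numeric data. It assumes that @code{string} is an @code{int32} byte count followed by that many bytes, as in the grammar notation, that integers are little-endian, and that text is UTF-8; the last is only a guess. The class and function names are hypothetical.

```python
import struct
import sys

SYSMIS = -sys.float_info.max  # -DBL_MAX, the system-missing placeholder


class Reader:
    """Sequential reader for int32 and string fields."""

    def __init__(self, buf, pos=0):
        self.buf = buf
        self.pos = pos

    def int32(self):
        (value,) = struct.unpack_from("<I", self.buf, self.pos)
        self.pos += 4
        return value

    def string(self):
        length = self.int32()
        raw = self.buf[self.pos:self.pos + length]
        self.pos += length
        return raw.decode("utf-8", "replace")


def parse_strings(reader):
    """Return (source_maps, labels) following the Strings grammar."""
    source_maps = []
    for _ in range(reader.int32()):              # n-maps
        source_name = reader.string()
        variables = []
        for _ in range(reader.int32()):          # n-variables
            variable_name = reader.string()
            # Each DatumMap is a (value-idx, label-idx) pair.
            pairs = [(reader.int32(), reader.int32())
                     for _ in range(reader.int32())]
            variables.append((variable_name, pairs))
        source_maps.append((source_name, variables))
    labels = []
    for _ in range(reader.int32()):              # n-labels
        reader.int32()                           # frequency, not needed
        labels.append(reader.string())
    return source_maps, labels


def overlay(values, pairs, labels):
    """Replace placeholder values by the labels that the pairs select."""
    out = list(values)
    for value_idx, label_idx in pairs:
        out[value_idx] = labels[label_idx]
    return out
```

For example, a pair with @code{value-idx} 2 and @code{label-idx} 3 makes @code{overlay} replace the third element of @code{values} by the fourth label.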
2155 @node SPV Legacy Detail Member XML Format
2156 @section Legacy Detail Member XML Format
2158 The design of the detail XML format is not what one would end up with
2159 for describing pivot tables. This is because it is a special case
2160 of a much more general format (``visualization XML'' or ``VizML'')
2161 that can describe a wide range of visualizations. Most of this
2162 generality is overkill for tables, and so we end up with a funny
2163 subset of a general-purpose format.
2165 An XML Schema for VizML is available, distributed with SPSS binaries,
2166 under a nonfree license. It contains documentation that is
2167 occasionally helpful.
2169 This section describes the detail XML format using the same notation
2170 already used for the structure XML format (@pxref{SPV Structure Member
2171 Format}). See @file{src/output/spv/detail-xml.grammar} in the PSPP
2172 source tree for the full grammar that it uses for parsing.
2174 The important elements of the detail XML format are:
2178 Variables. @xref{SPV Detail Variable Elements}.
2181 Assignment of variables to axes. A variable can appear as columns, or
2182 rows, or layers. The @code{faceting} element and its sub-elements
2183 describe this assignment.
2186 Styles and other annotations.
2189 This description is not detailed enough to write legacy tables.
2190 Instead, write tables in the light binary format.
2193 * SPV Detail visualization Element::
2194 * SPV Detail Variable Elements::
2195 * SPV Detail extension Element::
2196 * SPV Detail graph Element::
2197 * SPV Detail location Element::
2198 * SPV Detail faceting Element::
2199 * SPV Detail facetLayout Element::
2200 * SPV Detail label Element::
2201 * SPV Detail setCellProperties Element::
2202 * SPV Detail setFormat Element::
2203 * SPV Detail interval Element::
2204 * SPV Detail style Element::
2205 * SPV Detail labelFrame Element::
2206 * SPV Detail Legacy Properties::
2209 @node SPV Detail visualization Element
2210 @subsection The @code{visualization} Element
2218 :style[style_ref]=ref style
2222 => visualization_extension?
2224 (sourceVariable | derivedVariable)+
2233 extension[visualization_extension]
2236 :minWidthSet=(true)?
2237 :maxWidthSet=(true)?
2240 userSource :missing=(listwise | pairwise)? => EMPTY
2242 categoricalDomain => variableReference simpleSort
2244 simpleSort :method[sort_method]=(custom) => categoryOrder
2246 container :style=ref style => container_extension? location+ labelFrame*
2248 extension[container_extension] :combinedFootnotes=(true) => EMPTY
The @code{visualization} element is the root of a detail XML member. It
2257 has the following attributes:
2259 @defvr {Attribute} creator
2260 The version of the software that created this SPV file, as a string of
2261 the form @code{xxyyzz}, which represents software version xx.yy.zz,
2262 e.g.@: @code{160001} is version 16.0.1. The corpus includes major
2263 versions 16 through 19.
2266 @defvr {Attribute} date
The date on which the file was created, as a string of the form
2271 @defvr {Attribute} lang
2272 The locale used for output, in Windows format, which is similar to the
2273 format used in Unix with the underscore replaced by a hyphen, e.g.@:
@code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
2277 @defvr {Attribute} name
2278 The title of the pivot table, localized to the output language.
2281 @defvr {Attribute} style
2282 The base style for the pivot table. In every example in the corpus,
2283 the @code{style} element has no attributes other than @code{id}.
2286 @defvr {Attribute} type
2287 A floating-point number. The meaning is unknown.
2290 @defvr {Attribute} version
2291 The visualization schema version number. In the corpus, the value is
2292 one of 2.4, 2.5, 2.7, and 2.8.
2295 The @code{userSource} element has no visible effect.
2297 The @code{extension} element as a child of @code{visualization} has
2298 the following attributes.
2300 @defvr {Attribute} numRows
2301 An integer that presumably defines the number of rows in the displayed
2305 @defvr {Attribute} showGridline
2306 Always set to @code{false} in the corpus.
2309 @defvr {Attribute} minWidthSet
2310 @defvrx {Attribute} maxWidthSet
2311 Always set to @code{true} in the corpus.
2314 The @code{extension} element as a child of @code{container} has the
2317 @defvr {Attribute} combinedFootnotes
2321 The @code{categoricalDomain} and @code{simpleSort} elements have no
2324 The @code{layerController} element has no visible effect.
2326 @node SPV Detail Variable Elements
2327 @subsection Variable Elements
2329 A ``variable'' in detail XML is a 1-dimensional array of data. Each
2330 element of the array may, independently, have string or numeric
2331 content. All of the variables in a given detail XML member either
2332 have the same number of elements or have zero elements.
2334 Two different elements define variables and their content:
2337 @item sourceVariable
2338 These variables' data comes from the associated @code{tableData.bin}
2341 @item derivedVariable
2342 These variables are defined in terms of a mapping function from a
2343 source variable, or they are empty.
2346 A variable named @code{cell} always exists. This variable holds the
2347 data displayed in the table.
2349 Variables in detail XML roughly correspond to the dimensions in a
2350 light detail member. Each dimension has the following variables with
2351 stylized names, where @var{n} is a number for the dimension starting
2355 @item dimension@var{n}categories
2356 The dimension's leaf categories (@pxref{SPV Light Member Categories}).
2358 @item dimension@var{n}group0
2359 Present only if the dimension's categories are grouped, this variable
2360 holds the group labels for the categories. Grouping is inferred
2361 through adjacent identical labels. Categories that are not part of a
2362 group have empty-string data in this variable.
2364 @item dimension@var{n}group1
2365 Present only if the first-level groups are further grouped, this
2366 variable holds the labels for the second-level groups. There can be
2367 additional variables with further levels of grouping.
2369 @item dimension@var{n}
2373 Determining the data for a (non-empty) variable is a multi-step
2378 Draw initial data from its source, for a @code{sourceVariable}, or
2379 from another named variable, for a @code{derivedVariable}.
2382 Apply mappings from @code{valueMapEntry} elements within the
2383 @code{derivedVariable} element, if any.
2386 Apply mappings from @code{relabel} elements within a @code{format} or
2387 @code{stringFormat} element in the @code{sourceVariable} or
2388 @code{derivedVariable} element, if any.
2391 If the variable is a @code{sourceVariable} with a @code{labelVariable}
2392 attribute, and there were no mappings to apply in previous steps, then
2393 replace each element of the variable by the corresponding value in the
2397 A single variable's data can be modified in two of the steps, if both
2398 @code{valueMapEntry} and @code{relabel} are used. The following
2399 example from the corpus maps several integers to 2, then maps 2 in
2400 turn to the string ``Input'':
2403 <derivedVariable categorical="true" dependsOn="dimension0categories"
2404 id="dimension0group0map" value="map(dimension0group0)">
2406 <relabel from="2" to="Input"/>
2407 <relabel from="10" to="Missing Value Handling"/>
2408 <relabel from="14" to="Resources"/>
2409 <relabel from="0" to=""/>
2410 <relabel from="1" to=""/>
2411 <relabel from="13" to=""/>
2413 <valueMapEntry from="2;3;5;6;7;8;9" to="2"/>
2414 <valueMapEntry from="10;11" to="10"/>
2415 <valueMapEntry from="14;15" to="14"/>
2416 <valueMapEntry from="0" to="0"/>
2417 <valueMapEntry from="1" to="1"/>
2418 <valueMapEntry from="13" to="13"/>
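
The mapping steps can be sketched in a few lines. The sample below is an abridged version of the corpus example, with the @code{relabel} elements wrapped in a @code{stringFormat}. It omits any XML namespace that a real member may declare, the function name @code{resolve} is hypothetical, and the @code{labelVariable} step is left out.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<derivedVariable categorical="true" dependsOn="dimension0categories"
                 id="dimension0group0map" value="map(dimension0group0)">
  <stringFormat>
    <relabel from="2" to="Input"/>
    <relabel from="10" to="Missing Value Handling"/>
    <relabel from="0" to=""/>
  </stringFormat>
  <valueMapEntry from="2;3;5;6;7;8;9" to="2"/>
  <valueMapEntry from="10;11" to="10"/>
  <valueMapEntry from="0" to="0"/>
</derivedVariable>
"""


def resolve(element, data):
    """Apply valueMapEntry mappings, then relabel mappings, to `data`."""
    value_map = dict()
    for entry in element.iter("valueMapEntry"):
        for source in entry.get("from").split(";"):
            value_map[float(source)] = float(entry.get("to"))
    relabels = dict()
    for child in element:
        if child.tag in ("format", "stringFormat"):
            for relabel in child.iter("relabel"):
                relabels[float(relabel.get("from"))] = relabel.get("to")
    mapped = [value_map.get(value, value) for value in data]
    # Values without a relabel entry are left as they are.
    return [relabels.get(value, value) for value in mapped]


variable = ET.fromstring(SAMPLE)
print(resolve(variable, [3.0, 11.0, 0.0, 42.0]))
```

Here 3 is first mapped to 2 and then relabeled as ``Input'', while 42 has no mapping and passes through unchanged.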
2423 * SPV Detail sourceVariable Element::
2424 * SPV Detail derivedVariable Element::
2425 * SPV Detail valueMapEntry Element::
2428 @node SPV Detail sourceVariable Element
2429 @subsubsection The @code{sourceVariable} Element
2436 :domain=ref categoricalDomain?
2438 :dependsOn=ref sourceVariable?
2440 :labelVariable=ref sourceVariable?
2441 => variable_extension* (format | stringFormat)?
2444 This element defines a variable whose data comes from the
2445 @file{tableData.bin} member that corresponds to this @file{.xml}.
2447 This element has the following attributes.
2449 @defvr {Attribute} id
2450 An @code{id} is always present because this element exists to be
2451 referenced from other elements.
2454 @defvr {Attribute} categorical
2455 Always set to @code{true}.
2458 @defvr {Attribute} source
2459 Always set to @code{tableData}, the @code{source-name} in the
2460 corresponding @file{tableData.bin} member (@pxref{SPV Legacy Member
2464 @defvr {Attribute} sourceName
2465 The name of a variable within the source, corresponding to the
2466 @code{variable-name} in the @file{tableData.bin} member (@pxref{SPV
2467 Legacy Member Numeric Data}).
2470 @defvr {Attribute} label
2471 The variable label, if any.
2474 @defvr {Attribute} labelVariable
2475 The @code{variable-name} of a variable whose string values correspond
2476 one-to-one with the values of this variable and are suitable for use
2480 @defvr {Attribute} dependsOn
2481 This attribute doesn't affect the display of a table.
2484 @node SPV Detail derivedVariable Element
2485 @subsubsection The @code{derivedVariable} Element
2492 :dependsOn=ref sourceVariable?
2493 => variable_extension* (format | stringFormat)? valueMapEntry*
2496 Like @code{sourceVariable}, this element defines a variable whose
2497 values can be used elsewhere in the visualization. Instead of being
2498 read from a data source, the variable's data are defined by a
2499 mathematical expression.
2501 This element has the following attributes.
2503 @defvr {Attribute} id
2504 An @code{id} is always present because this element exists to be
2505 referenced from other elements.
2508 @defvr {Attribute} categorical
2509 Always set to @code{true}.
2512 @defvr {Attribute} value
2513 An expression that defines the variable's value. In theory this could
2514 be an arbitrary expression in terms of constants, functions, and other
2515 variables, e.g.@: @math{(@var{var1} + @var{var2}) / 2}. In practice,
2516 the corpus contains only the following forms of expressions:
2520 @itemx constant(@var{variable})
2521 All zeros. The reason why a variable is sometimes named is unknown.
2522 Sometimes the ``variable name'' has spaces in it.
2524 @item map(@var{variable})
2525 Transforms the values in the named @var{variable} using the
2526 @code{valueMapEntry}s contained within the element.
2530 @defvr {Attribute} dependsOn
2531 This attribute doesn't affect the display of a table.
2534 @node SPV Detail valueMapEntry Element
2535 @subsubsection The @code{valueMapEntry} Element
2538 valueMapEntry :from :to => EMPTY
2541 A @code{valueMapEntry} element defines a mapping from one or more
2542 values of a source expression to a target value. (In the corpus, the
2543 source expression is always just the name of a variable.) Each target
2544 value requires a separate @code{valueMapEntry}. If multiple source
2545 values map to the same target value, they can be combined or separate.
2547 In the corpus, all of the source and target values are integers.
2549 @code{valueMapEntry} has the following attributes.
2551 @defvr {Attribute} from
2552 A source value, or multiple source values separated by semicolons,
2553 e.g.@: @code{0} or @code{13;14;15;16}.
2556 @defvr {Attribute} to
2557 The target value, e.g.@: @code{0}.
2560 @node SPV Detail extension Element
2561 @subsection The @code{extension} Element
2563 This is a general-purpose ``extension'' element. Readers that don't
2564 understand a given extension should be able to safely ignore it. The
2565 attributes on this element, and their meanings, vary based on the
2566 context. Each known usage is described separately below. The current
2567 extensions use attributes exclusively, without any nested elements.
2569 @subsubheading @code{container} Parent Element
2572 extension[container_extension] :combinedFootnotes=(true) => EMPTY
2575 With @code{container} as its parent element, @code{extension} has the
2576 following attributes.
2578 @defvr {Attribute} combinedFootnotes
2579 Always set to @code{true} in the corpus.
2582 @subsubheading @code{sourceVariable} and @code{derivedVariable} Parent Element
2585 extension[variable_extension] :from :helpId => EMPTY
2588 With @code{sourceVariable} or @code{derivedVariable} as its parent
2589 element, @code{extension} has the following attributes. A given
2590 parent element often contains several @code{extension} elements that
2591 specify the meaning of the source data's variables or sources, e.g.@:
2594 <extension from="0" helpId="corrected_model"/>
2595 <extension from="3" helpId="error"/>
2596 <extension from="4" helpId="total_9"/>
2597 <extension from="5" helpId="corrected_total"/>
2600 More commonly they are less helpful, e.g.@:
2603 <extension from="0" helpId="notes"/>
2604 <extension from="1" helpId="notes"/>
2605 <extension from="2" helpId="notes"/>
2606 <extension from="5" helpId="notes"/>
2607 <extension from="6" helpId="notes"/>
2608 <extension from="7" helpId="notes"/>
2609 <extension from="8" helpId="notes"/>
2610 <extension from="12" helpId="notes"/>
2611 <extension from="13" helpId="no_help"/>
2612 <extension from="14" helpId="notes"/>
2615 @defvr {Attribute} from
2616 An integer or a name like ``dimension0''.
2619 @defvr {Attribute} helpId
2623 @node SPV Detail graph Element
2624 @subsection The @code{graph} Element
2628 :cellStyle=ref style
2630 => location+ coordinates faceting facetLayout interval
2632 coordinates => EMPTY
2635 @code{graph} has the following attributes.
2637 @defvr {Attribute} cellStyle
2638 @defvrx {Attribute} style
2639 Each of these is the @code{id} of a @code{style} element (@pxref{SPV
2640 Detail style Element}). The former is the default style for
2641 individual cells, the latter for the entire table.
2644 @node SPV Detail location Element
2645 @subsection The @code{location} Element
2649 :part=(height | width | top | bottom | left | right)
2650 :method=(sizeToContent | attach | fixed | same)
2653 :target=ref (labelFrame | graph | container)?
2658 Each instance of this element specifies where some part of the table
2659 frame is located. All the examples in the corpus have four instances
2660 of this element, one for each of the parts @code{height},
2661 @code{width}, @code{left}, and @code{top}. Some examples in the
2662 corpus add a fifth for part @code{bottom}, even though it is not clear
2663 how all of @code{top}, @code{bottom}, and @code{height} can be honored
2664 at the same time. In any case, @code{location} seems to have little
2665 importance in representing tables; a reader can safely ignore it.
2667 @defvr {Attribute} part
2668 The part of the table being located.
2671 @defvr {Attribute} method
2672 How the location is determined:
2676 Based on the natural size of the table. Observed only for
2677 parts @code{height} and @code{width}.
2680 Based on the location specified in @code{target}. Observed only for
2681 parts @code{top} and @code{bottom}.
2684 Using the value in @code{value}. Observed only for parts @code{top},
2685 @code{bottom}, and @code{left}.
2688 Same as the specified @code{target}. Observed only for part
2693 @defvr {Attribute} min
2694 Minimum size. Only observed with value @code{100pt}. Only observed
2695 for part @code{width}.
2698 @defvr {Dependent} target
2699 Required when @code{method} is @code{attach} or @code{same}, not
2700 observed otherwise. This identifies an element to attach to.
2701 Observed with the ID of @code{title}, @code{footnote}, @code{graph},
2705 @defvr {Dependent} value
2706 Required when @code{method} is @code{fixed}, not observed otherwise.
2707 Observed values are @code{0%}, @code{0px}, @code{1px}, and @code{3px}
2708 on parts @code{top} and @code{left}, and @code{100%} on part
2712 @node SPV Detail faceting Element
2713 @subsection The @code{faceting} Element
2716 faceting => layer[layers1]* cross layer[layers2]*
2718 cross => (unity | nest) (unity | nest)
2722 nest => variableReference[vars]+
2724 variableReference :ref=ref (sourceVariable | derivedVariable) => EMPTY
2727 :variable=ref (sourceVariable | derivedVariable)
2730 :method[layer_method]=(nest)?
2735 The @code{faceting} element describes the row, column, and layer
2736 structure of the table. Its @code{cross} child determines the row and
2737 column structure, and each @code{layer} child (if any) represents a
2738 layer. Layers may appear before or after @code{cross}.
2740 The @code{cross} element describes the row and column structure of the
2741 table. It has exactly two children, the first of which describes the
2742 table's columns and the second the table's rows. Each child is a
2743 @code{nest} element if the table has any dimensions along the axis in
2744 question, otherwise a @code{unity} element.
A @code{nest} element consists of one or more dimensions listed from
2747 innermost to outermost, each represented by @code{variableReference}
2748 child elements. Each variable in a dimension is listed in order.
2749 @xref{SPV Detail Variable Elements}, for information on the variables
2750 that comprise a dimension.
2752 A @code{nest} can contain a single dimension, e.g.:
2756 <variableReference ref="dimension0categories"/>
2757 <variableReference ref="dimension0group0"/>
2758 <variableReference ref="dimension0"/>
2763 A @code{nest} can contain multiple dimensions, e.g.:
2767 <variableReference ref="dimension1categories"/>
2768 <variableReference ref="dimension1group0"/>
2769 <variableReference ref="dimension1"/>
2770 <variableReference ref="dimension0categories"/>
2771 <variableReference ref="dimension0"/>
2775 A @code{nest} may have no dimensions, in which case it still has one
2776 @code{variableReference} child, which references a
2777 @code{derivedVariable} whose @code{value} attribute is
2778 @code{constant(0)}. In the corpus, such a @code{derivedVariable} has
2779 @code{row} or @code{column}, respectively, as its @code{id}. This is
2780 equivalent to using a @code{unity} element in place of @code{nest}.
2782 A @code{variableReference} element refers to a variable through its
2783 @code{ref} attribute.
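
As a rough sketch, a reader might recover the column and row dimensions from @code{cross} as follows. Grouping the references into dimensions by their @code{dimension@var{n}} name prefix is an assumption that relies on the stylized names, not something the format guarantees. The function names are hypothetical, the sample omits any XML namespace, and the @code{constant(0)} case described above is not handled specially.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """\
<faceting>
  <cross>
    <nest>
      <variableReference ref="dimension1categories"/>
      <variableReference ref="dimension1group0"/>
      <variableReference ref="dimension1"/>
      <variableReference ref="dimension0categories"/>
      <variableReference ref="dimension0"/>
    </nest>
    <unity/>
  </cross>
</faceting>
"""


def axis_dimensions(child):
    """List (dimension number, variable names), innermost dimension first."""
    if child.tag != "nest":
        return []                     # unity: no dimensions on this axis
    dimensions = []
    for reference in child.iter("variableReference"):
        name = reference.get("ref")
        match = re.match(r"dimension(\d+)", name)
        number = int(match.group(1)) if match else None
        if dimensions and number is not None and dimensions[-1][0] == number:
            dimensions[-1][1].append(name)
        else:
            dimensions.append((number, [name]))
    return dimensions


def axes(faceting):
    """Return (columns, rows): the first child of cross is the columns."""
    columns, rows = list(faceting.find("cross"))
    return axis_dimensions(columns), axis_dimensions(rows)


columns, rows = axes(ET.fromstring(SAMPLE))
print(columns)
print(rows)
```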
2785 Each @code{layer} element represents a dimension, e.g.:
2788 <layer value="0" variable="dimension0categories" visible="true"/>
2789 <layer value="dimension0" variable="dimension0" visible="false"/>
2793 @code{layer} has the following attributes.
2795 @defvr {Attribute} variable
2796 Refers to a @code{sourceVariable} or @code{derivedVariable} element.
2799 @defvr {Attribute} value
2800 The value to select. For a category variable, this is always
2801 @code{0}; for a data variable, it is the same as the @code{variable}
2805 @defvr {Attribute} visible
2806 Whether the layer is visible. Generally, category layers are visible
2807 and data layers are not, but sometimes this attribute is omitted.
2810 @defvr {Attribute} method
2811 When present, this is always @code{nest}.
2814 @node SPV Detail facetLayout Element
2815 @subsection The @code{facetLayout} Element
2818 facetLayout => tableLayout setCellProperties[scp1]*
2819 facetLevel+ setCellProperties[scp2]*
2822 :verticalTitlesInCorner=bool
2824 :fitCells=(ticks both)?
2828 The @code{facetLayout} element and its descendants control styling for
Its @code{tableLayout} child has the following attributes.
2833 @defvr {Attribute} verticalTitlesInCorner
2834 If true, in the absence of corner text, row headings will be displayed
2838 @defvr {Attribute} style
2839 Refers to a @code{style} element.
2842 @defvr {Attribute} fitCells
2846 @subsubheading The @code{facetLevel} Element
2849 facetLevel :level=int :gap=dimension? => axis
2851 axis :style=ref style => label? majorTicks
2857 :tickFrameStyle=ref style
2858 :labelFrequency=int?
2868 Each @code{facetLevel} describes a @code{variableReference} or
2869 @code{layer}, and a table has one @code{facetLevel} element for
2870 each such element. For example, an SPV detail member that contains
2871 four @code{variableReference} elements and two @code{layer} elements
2872 will contain six @code{facetLevel} elements.
2874 In the corpus, @code{facetLevel} elements and the elements that they
2875 describe are always in the same order. The correspondence may also be
2876 observed in two other ways. First, one may use the @code{level}
2877 attribute, described below. Second, in the corpus, a
2878 @code{facetLevel} always has an @code{id} that is the same as the
2879 @code{id} of the element it describes with @code{_facetLevel}
2880 appended. One should not formally rely on this, of course, but it is
2881 usefully indicative.
2883 @defvr {Attribute} level
A 1-based index into the @code{variableReference} and @code{layer}
elements, e.g.@: a @code{facetLevel} with a @code{level} of 1
describes the first @code{variableReference} in the SPV detail member,
and in a member with four @code{variableReference} elements, a
@code{facetLevel} with a @code{level} of 5 describes the first
@code{layer} in the member.
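
The indexing can be restated as a short sketch. It assumes, following the example above, that all @code{variableReference} elements are counted before the @code{layer} elements. The function name is hypothetical.

```python
def element_for_level(level, variable_references, layers):
    """Return the element that a 1-based `level` describes."""
    elements = list(variable_references) + list(layers)
    if not 1 <= level <= len(elements):
        raise IndexError("level out of range")
    return elements[level - 1]


references = ["ref1", "ref2", "ref3", "ref4"]
layers = ["layer1", "layer2"]
print(element_for_level(1, references, layers))
print(element_for_level(5, references, layers))
```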
2892 @defvr {Attribute} gap
2893 Always observed as @code{0pt}.
2896 Each @code{facetLevel} contains an @code{axis}, which in turn may
2897 contain a @code{label} for the @code{facetLevel} (@pxref{SPV Detail
2898 label Element}) and does contain a @code{majorTicks} element.
2900 @defvr {Attribute} labelAngle
2901 Normally 0. The value -90 causes inner column or outer row labels to
2902 be rotated vertically.
2905 @defvr {Attribute} style
2906 @defvrx {Attribute} tickFrameStyle
2907 Each refers to a @code{style} element. @code{style} is the style of
2908 the tick labels, @code{tickFrameStyle} the style for the frames around
2912 @node SPV Detail label Element
2913 @subsection The @code{label} Element
2918 :textFrameStyle=ref style?
2919 :purpose=(title | subTitle | subSubTitle | layer | footnote)?
2920 => text+ | descriptionGroup
2923 :target=ref faceting
2925 => (description | text)+
2927 description :name=(variable | value) => EMPTY
2931 :definesReference=int?
2932 :position=(subscript | superscript)?
2937 This element represents a label on some aspect of the table.
2939 @defvr {Attribute} style
2940 @defvrx {Attribute} textFrameStyle
2941 Each of these refers to a @code{style} element. @code{style} is the
2942 style of the label text, @code{textFrameStyle} the style for the frame
2946 @defvr {Attribute} purpose
2947 The kind of entity being labeled.
2950 A @code{descriptionGroup} concatenates one or more elements to form a
2951 label. Each element can be a @code{text} element, which contains
2952 literal text, or a @code{description} element that substitutes a value
2955 @defvr {Attribute} target
2956 The @code{id} of an element being described. In the corpus, this is
2957 always @code{faceting}.
2960 @defvr {Attribute} separator
2961 A string to separate the description of multiple groups, if the
2962 @code{target} has more than one. In the corpus, this is always a
2966 Typical contents for a @code{descriptionGroup} are a value by itself:
2968 <description name="value"/>
2970 @noindent or a variable and its value, separated by a colon:
2972 <description name="variable"/><text>:</text><description name="value"/>
2975 A @code{description} is like a macro that expands to some property of
2976 the target of its parent @code{descriptionGroup}. The @code{name}
2977 attribute specifies the property.
2979 @node SPV Detail setCellProperties Element
2980 @subsection The @code{setCellProperties} Element
2984 :applyToConverse=bool?
2985 => (setStyle | setFrameStyle | setFormat | setMetaData)* union[union_]?
2988 The @code{setCellProperties} element sets style properties of cells or
2989 row or column labels.
2991 Interpreting @code{setCellProperties} requires answering two
2992 questions: which cells or labels to style, and what styles to use.
2994 @subsubheading Which Cells?
2999 intersect => where+ | intersectWhere | alternating | EMPTY
3002 :variable=ref (sourceVariable | derivedVariable)
3007 :variable=ref (sourceVariable | derivedVariable)
3008 :variable2=ref (sourceVariable | derivedVariable)
3011 alternating => EMPTY
3014 When @code{union} is present with @code{intersect} children, each of
3015 those children specifies a group of cells that should be styled, and
3016 the total group is all those cells taken together. When @code{union}
3017 is absent, every cell is styled. One attribute on
3018 @code{setCellProperties} affects the choice of cells:
3020 @defvr {Attribute} applyToConverse
3021 If true, this inverts the meaning of the cell selection: the selected
3022 cells are the ones @emph{not} designated. This is confusing, given
3023 the additional restrictions of @code{union}, but in the corpus
3024 @code{applyToConverse} is never present along with @code{union}.
3027 An @code{intersect} specifies restrictions on the cells to be matched.
3028 Each @code{where} child specifies which values of a given variable to
3029 include.  The attributes of @code{where} are:
3031 @defvr {Attribute} variable
3032 Refers to a variable, e.g.@: @code{dimension0categories}. Only
3033 ``categories'' variables make sense here, but other variables, e.g.@:
3034 @code{dimension0group0map}, are sometimes seen. The reader may ignore
3038 @defvr {Attribute} include
3039 A value, or multiple values separated by semicolons,
3040 e.g.@: @code{0} or @code{13;14;15;16}.
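
Taken together, the rules above suggest the following sketch of cell
selection in Python.  It is not PSPP's code.  The function names, the
representation of each @code{intersect} as a list of
(@code{variable}, @code{include}) pairs, and the @code{cell} mapping
from variable name to value are all assumptions made for illustration;
values are compared as strings, and @code{applyToConverse},
@code{intersectWhere}, and @code{alternating} are left out.

```python
def parse_include(include):
    """Split an include attribute such as "13;14;15;16" into its values."""
    return [value for value in include.split(";") if value]


def intersect_matches(wheres, cell):
    """True if the cell satisfies every where restriction of one intersect."""
    return all(cell.get(variable) in parse_include(include)
               for variable, include in wheres)


def union_matches(intersects, cell):
    """True if the cell is in any intersect group of the union.

    `intersects` is None when the union element is absent, in which
    case every cell is styled.
    """
    if intersects is None:
        return True
    return any(intersect_matches(wheres, cell) for wheres in intersects)


union = [
    [("dimension0categories", "13;14;15;16")],
    [("dimension0categories", "0"), ("dimension1categories", "2")],
]
cell = dict(dimension0categories="14", dimension1categories="5")
print(union_matches(union, cell))
```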
3043 PSPP ignores @code{setCellProperties} when @code{intersectWhere} is
3046 @subsubheading What Styles?
3050 :target=ref (labeling | graph | interval | majorTicks)
3054 setMetaData :target=ref graph :key :value => EMPTY
3057 :target=ref (majorTicks | labeling)
3059 => format | numberFormat | stringFormat+ | dateTimeFormat | elapsedTimeFormat
3063 :target=ref majorTicks
3067 The @code{set*} children of @code{setCellProperties} determine the
3070 When @code{setCellProperties} contains a @code{setFormat} whose
3071 @code{target} references a @code{labeling} element, or if it contains
3072 a @code{setStyle} that references a @code{labeling} or @code{interval}
3073 element, the @code{setCellProperties} sets the style for table cells.
3074 The format from the @code{setFormat}, if present, replaces the cells'
3075 format. The style from the @code{setStyle} that references
3076 @code{labeling}, if present, replaces the label's font and cell
3077 styles, except that the background color is taken instead from the
3078 @code{interval}'s style, if present.
3080 When @code{setCellProperties} contains a @code{setFormat} whose
3081 @code{target} references a @code{majorTicks} element, or if it
3082 contains a @code{setStyle} whose @code{target} references a
3083 @code{majorTicks}, or if it contains a @code{setFrameStyle} element,
3084 the @code{setCellProperties} sets the style for row or column labels.
3085 In this case, the @code{setCellProperties} always contains a single
3086 @code{where} element whose @code{variable} designates the variable
3087 whose labels are to be styled. The format from the @code{setFormat},
3088 if present, replaces the labels' format. The style from the
3089 @code{setStyle} that references @code{majorTicks}, if present,
3090 replaces the labels' font and cell styles, except that the background
3091 color is taken instead from the @code{setFrameStyle}'s style, if
3094 When @code{setCellProperties} contains a @code{setStyle} whose
3095 @code{target} references a @code{graph} element, and one that
3096 references a @code{labeling} element, and the @code{union} element
3097 contains @code{alternating}, the @code{setCellProperties} sets the
3098 alternate foreground and background colors for the data area. The
3099 foreground color is taken from the style referenced by the
3100 @code{setStyle} that targets the @code{graph}, the background color
3101 from the @code{setStyle} for @code{labeling}.
3103 A reader may ignore a @code{setCellProperties} that only contains
3104 @code{setMetaData}, as well as @code{setMetaData} within other
3105 @code{setCellProperties}.
3107 A reader may ignore a @code{setCellProperties} whose only @code{set*}
3108 child is a @code{setStyle} that targets the @code{graph} element.
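
A reader might organize the cases above roughly as in the following
Python sketch.  It is an illustration under stated assumptions rather
than PSPP's implementation: the function name @code{classify} is
hypothetical, each child is given as a pair of its element name and
the kind of element that its @code{target} references, and the
alternating-color case is tested first because it would otherwise also
match the rule for table cells.

```python
def classify(children, has_alternating=False):
    """Classify a setCellProperties by what its set* children style.

    `children` is a list of (element name, kind of referenced target)
    pairs; `has_alternating` says whether union contains alternating.
    """
    # setMetaData may be ignored wherever it appears.
    children = [pair for pair in children if pair[0] != "setMetaData"]
    styles = set(kind for name, kind in children if name == "setStyle")
    formats = set(kind for name, kind in children if name == "setFormat")
    has_frame = any(name == "setFrameStyle" for name, kind in children)

    if has_alternating and "graph" in styles and "labeling" in styles:
        return "alternating colors"
    if "labeling" in formats or styles & set(["labeling", "interval"]):
        return "cells"
    if "majorTicks" in formats or "majorTicks" in styles or has_frame:
        return "labels"
    # Nothing but setMetaData, or only a setStyle that targets graph.
    return "ignore"


print(classify([("setStyle", "majorTicks"), ("setFrameStyle", "majorTicks")]))
```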
3110 @subsubheading The @code{setStyle} Element
3114 :target=ref (labeling | graph | interval | majorTicks)
3119 This element associates a style with the target.
3121 @defvr {Attribute} target
3122 The @code{id} of an element whose style is to be set.
3125 @defvr {Attribute} style
3126 The @code{id} of a @code{style} element that identifies the style to
3130 @node SPV Detail setFormat Element
3131 @subsection The @code{setFormat} Element
3135 :target=ref (majorTicks | labeling)
3137 => format | numberFormat | stringFormat+ | dateTimeFormat | elapsedTimeFormat
3140 This element sets the format of the target, ``format'' in this case
3141 meaning the SPSS print format for a variable.
3143 The details of this element vary depending on the schema version, as
3144 declared in the root @code{visualization} element's @code{version}
3145 attribute (@pxref{SPV Detail visualization Element}). A reader can
3146 interpret the content without knowing the schema version.
3148 The @code{setFormat} element itself has the following attributes.
3150 @defvr {Attribute} target
3151 Refers to an element whose style is to be set.
3154 @defvr {Attribute} reset
3155 If this is @code{true}, this format replaces the target's previous
3156 format.  If it is @code{false}, this format modifies the previous format.
3160 * SPV Detail numberFormat Element::
3161 * SPV Detail stringFormat Element::
3162 * SPV Detail dateTimeFormat Element::
3163 * SPV Detail elapsedTimeFormat Element::
3164 * SPV Detail format Element::
3165 * SPV Detail affix Element::
3168 @node SPV Detail numberFormat Element
3169 @subsubsection The @code{numberFormat} Element
3173 :minimumIntegerDigits=int?
3174 :maximumFractionDigits=int?
3175 :minimumFractionDigits=int?
3177 :scientific=(onlyForSmall | whenNeeded | true | false)?
3184 Specifies a format for displaying a number. The available options are
3185 a superset of those available from PSPP print formats. PSPP chooses a
3186 print format type for a @code{numberFormat} as follows:
3190 If @code{scientific} is @code{true}, uses @code{E} format.
3193 If @code{prefix} is @code{$}, uses @code{DOLLAR} format.
3196 If @code{suffix} is @code{%}, uses @code{PCT} format.
3199 If @code{useGrouping} is @code{true}, uses @code{COMMA} format.
3202 Otherwise, uses @code{F} format.
3205 For translating to a print format, PSPP uses
3206 @code{maximumFractionDigits} as the number of decimals, unless that
3207 attribute is missing or out of the range [0,15], in which case it uses
3210 @defvr {Attribute} minimumIntegerDigits
3211 Minimum number of digits to display before the decimal point. Always
3212 observed as @code{0}.
3215 @defvr {Attribute} maximumFractionDigits
3216 @defvrx {Attribute} minimumFractionDigits
3217 Maximum or minimum, respectively, number of digits to display after
3218 the decimal point. The observed values of each attribute range from 0
3222 @defvr {Attribute} useGrouping
3223 Whether to use the grouping character to group digits in large
3227 @defvr {Attribute} scientific
3228 This attribute controls when and whether the number is formatted in
3229 scientific notation. It takes the following values:
3233 Use scientific notation only when the number's magnitude is smaller
3234 than the value of the @code{small} attribute.
3237 Use scientific notation when the number will not otherwise fit in the
3241 Always use scientific notation. Not observed in the corpus.
3244 Never use scientific notation. A number that won't otherwise fit will
3245 be replaced by an error indication (see the @code{errorCharacter}
3246 attribute). Not observed in the corpus.
3250 @defvr {Attribute} small
3251 Only present when the @code{scientific} attribute is
3252 @code{onlyForSmall}, this is a numeric magnitude below which the
3253 number will be formatted in scientific notation. The values @code{0}
3254 and @code{0.0001} have been observed. The value @code{0} seems like a
3255 pathological choice, since no real number has a magnitude less than 0;
3256 perhaps in practice such a choice is equivalent to setting
3257 @code{scientific} to @code{false}.
3260 @defvr {Attribute} prefix
3261 @defvrx {Attribute} suffix
3262 Specifies a prefix or a suffix to apply to the formatted number. Only
3263 @code{suffix} has been observed, with value @samp{%}.
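
The print format selection described earlier in this section can be
sketched in Python as follows.  This is a hedged illustration, not
PSPP's code: the function names are hypothetical, attributes are
assumed to arrive as a mapping of strings as read from the XML, and
the fallback number of decimals is left to the caller instead of being
chosen here.

```python
def number_format_type(attrs):
    """Pick a print format type name from a numberFormat's attributes."""
    if attrs.get("scientific") == "true":
        return "E"
    if attrs.get("prefix") == "$":
        return "DOLLAR"
    if attrs.get("suffix") == "%":
        return "PCT"
    if attrs.get("useGrouping") == "true":
        return "COMMA"
    return "F"


def number_format_decimals(attrs, fallback):
    """Use maximumFractionDigits if it is present and within [0, 15]."""
    try:
        digits = int(attrs["maximumFractionDigits"])
    except (KeyError, ValueError):
        return fallback
    return digits if 0 <= digits <= 15 else fallback


attrs = dict(suffix="%", useGrouping="true", maximumFractionDigits="1")
print(number_format_type(attrs), number_format_decimals(attrs, fallback=0))
```

The tests are ordered as in the list, so a format with both a
@samp{%} suffix and grouping is treated as @code{PCT}.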
3266 @node SPV Detail stringFormat Element
3267 @subsubsection The @code{stringFormat} Element
3270 stringFormat => relabel* affix*
3272 relabel :from=real :to => EMPTY
3275 The @code{stringFormat} element specifies how to display a string. By
3276 default, a string is displayed verbatim, but @code{relabel} can change
3279 The @code{relabel} element appears as a child of @code{stringFormat}
3280 (and of @code{format}, when it is used to format strings). It
3281 specifies how to display a given value. It is used to implement value
3282 labels and to display the system-missing value in a human-readable
3283 way. It has the following attributes:
3285 @defvr {Attribute} from
3286 The value to map. In the corpus this is an integer or the
3287 system-missing value @code{-1.797693134862316E300}.
3290 @defvr {Attribute} to
3291 The string to display in place of the value of @code{from}. In the
3292 corpus this is a wide variety of value labels; the system-missing
3293 value is mapped to @samp{.}.
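
A reader could apply @code{relabel} elements along the lines of the
following Python sketch.  The function names @code{relabel_map} and
@code{display} are hypothetical, the XML is assumed to carry no
namespace prefix, and falling back to the value's own text when no
@code{relabel} matches is an assumption based on the default described
above.

```python
import xml.etree.ElementTree as ET

SYSMIS = -1.797693134862316E300  # system-missing value, as given above


def relabel_map(string_format):
    """Collect the relabel children of a stringFormat or format element."""
    mapping = dict()
    for relabel in string_format.findall("relabel"):
        mapping[float(relabel.get("from"))] = relabel.get("to")
    return mapping


def display(value, mapping):
    """Return the relabeled text for a value, or the value verbatim."""
    try:
        return mapping[float(value)]
    except (KeyError, ValueError, TypeError):
        return str(value)


string_format = ET.fromstring(
    "<stringFormat>"
    '<relabel from="1" to="Male"/>'
    '<relabel from="2" to="Female"/>'
    '<relabel from="-1.797693134862316E300" to="."/>'
    "</stringFormat>"
)
labels = relabel_map(string_format)
print(display(2, labels), display(SYSMIS, labels))
```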
3296 @node SPV Detail dateTimeFormat Element
3297 @subsubsection The @code{dateTimeFormat} Element
3301 :baseFormat[dt_base_format]=(date | time | dateTime)
3303 :mdyOrder=(dayMonthYear | monthDayYear | yearMonthDay)?
3305 :yearAbbreviation=bool?
3310 :monthFormat=(long | short | number | paddedNumber)?
3314 :showDayOfWeek=bool?
3315 :dayOfWeekAbbreviation=bool?
3317 :dayOfMonthPadding=bool?
3319 :minutePadding=bool?
3320 :secondPadding=bool?
3326 :dayType=(month | year)?
3327 :hourFormat=(AMPM | AS_24 | AS_12)?
3331 This element appears only in schema version 2.5 and earlier
3332 (@pxref{SPV Detail visualization Element}).
3334 Data to be formatted in date formats is stored as strings in legacy
3335 data, in the format @code{yyyy-mm-ddTHH:MM:SS.SSS}, and must be parsed
3336 and reformatted by the reader.
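
For example, a reader written in Python might parse such a string with
the standard @code{datetime} module, as in this sketch.  The function
name @code{parse_legacy_datetime} is hypothetical, and the sketch
assumes that the fractional seconds are always present.

```python
from datetime import datetime


def parse_legacy_datetime(text):
    """Parse a legacy date string of the form yyyy-mm-ddTHH:MM:SS.SSS."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


stamp = parse_legacy_datetime("2019-09-05T22:15:30.250")
# The parsed value can then be reformatted as the print format requires.
print(stamp.strftime("%Y-%m-%d %H:%M"))
```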
3338 The following attribute is required.
3340 @defvr {Attribute} baseFormat
3341 Specifies whether a date and time are both to be displayed, or just
3345 Many of the attributes' meanings are obvious. The following seem to
3346 be worth documenting.
3348 @defvr {Attribute} separatorChars
3349 Exactly four characters. In order, these are used for: decimal point,
3350 grouping, date separator, time separator. Always @samp{.,-:}.
3353 @defvr {Attribute} mdyOrder
3354 Within a date, the order of the days, months, and years.
3355 @code{dayMonthYear} is the only observed value, but one would expect
3356 @code{monthDayYear} and @code{yearMonthDay} to be reasonable as
3360 @defvr {Attribute} showYear
3361 @defvrx {Attribute} yearAbbreviation
3362 Whether to include the year and, if so, whether the year should be
3363 shown abbreviated, that is, with only 2 digits. Each is @code{true}
3364 or @code{false}; only values of @code{true} and @code{false},
3365 respectively, have been observed.
3368 @defvr {Attribute} showMonth
3369 @defvrx {Attribute} monthFormat
3370 Whether to include the month (@code{true} or @code{false}) and, if so,
3371 how to format it. @code{monthFormat} is one of the following:
3375 The full name of the month, e.g.@: in an English locale,
3379 The abbreviated name of the month, e.g.@: in an English locale,
3383 The number representing the month, e.g.@: 9 for September.
3386 A two-digit number representing the month, e.g.@: 09 for September.
3389 Only values of @code{true} and @code{short}, respectively, have been
3393 @defvr {Attribute} dayType
3394 This attribute is always @code{month} in the corpus, specifying that
3395 the day of the month is to be displayed; a value of @code{year} is
3396 supposed to indicate that the day of the year, where 1 is January 1,
3397 is to be displayed instead.
3400 @defvr {Attribute} hourFormat
3401 @code{hourFormat}, if present, is one of:
3405 The time is displayed with an @code{am} or @code{pm} suffix, e.g.@:
3409 The time is displayed in a 24-hour format, e.g.@: @code{22:15}.
3411 This is the only value observed in the corpus.
3414 The time is displayed in a 12-hour format, without distinguishing
3415 morning or evening, e.g.@: @code{10:15}.
3418 @code{hourFormat} is sometimes present for @code{elapsedTime} formats,
3419 which is confusing since a time duration does not have a concept of AM
3420 or PM. This might indicate a bug in the code that generated the XML
3421 in the corpus, or it might indicate that @code{elapsedTime} is
3422 sometimes used to format a time of day.
3425 For a @code{baseFormat} of @code{date}, PSPP chooses a print format
3426 type based on the following rules:
3430 If @code{showQuarter} is true: @code{QYR}.
3433 Otherwise, if @code{showWeek} is true: @code{WKYR}.
3436 Otherwise, if @code{mdyOrder} is @code{dayMonthYear}:
3440 If @code{monthFormat} is @code{number} or @code{paddedNumber}: @code{EDATE}.
3443 Otherwise: @code{DATE}.
3447 Otherwise, if @code{mdyOrder} is @code{yearMonthDay}: @code{SDATE}.
3450 Otherwise, @code{ADATE}.
3453 For a @code{baseFormat} of @code{dateTime}, PSPP uses @code{YMDHMS} if
3454 @code{mdyOrder} is @code{yearMonthDay} and @code{DATETIME} otherwise.
3455 For a @code{baseFormat} of @code{time}, PSPP uses @code{DTIME} if
3456 @code{showDay} is true, otherwise @code{TIME} if @code{showHour} is
3457 true, otherwise @code{MTIME}.
3459 For a @code{baseFormat} of @code{date}, the chosen width is the
3460 minimum for the format type, adding 2 if @code{yearAbbreviation} is
3461 false or omitted. For other base formats, the chosen width is the
3462 minimum for its type, plus 3 if @code{showSecond} is true, plus 4 more
3463 if @code{showMillis} is also true. Decimals are 0 by default, or 3
3464 if @code{showMillis} is true.
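
The type and width rules above can be sketched in Python as follows.
This is an illustration rather than PSPP's implementation: the
function names are hypothetical, attributes are assumed to be a
mapping of strings as read from the XML, and the minimum width for the
chosen type is passed in by the caller instead of being looked up.

```python
def is_true(attrs, name):
    """True if the named boolean attribute is present and true."""
    return attrs.get(name) == "true"


def date_time_type(base_format, attrs):
    """Choose a print format type name for a dateTimeFormat."""
    if base_format == "date":
        if is_true(attrs, "showQuarter"):
            return "QYR"
        if is_true(attrs, "showWeek"):
            return "WKYR"
        order = attrs.get("mdyOrder")
        if order == "dayMonthYear":
            numeric = attrs.get("monthFormat") in ("number", "paddedNumber")
            return "EDATE" if numeric else "DATE"
        return "SDATE" if order == "yearMonthDay" else "ADATE"
    if base_format == "dateTime":
        ymd = attrs.get("mdyOrder") == "yearMonthDay"
        return "YMDHMS" if ymd else "DATETIME"
    # Remaining case: a baseFormat of time.
    if is_true(attrs, "showDay"):
        return "DTIME"
    return "TIME" if is_true(attrs, "showHour") else "MTIME"


def date_time_width(base_format, attrs, min_width):
    """Return (width, decimals), given the type's minimum width."""
    decimals = 3 if is_true(attrs, "showMillis") else 0
    if base_format == "date":
        extra = 0 if is_true(attrs, "yearAbbreviation") else 2
        return min_width + extra, decimals
    width = min_width
    if is_true(attrs, "showSecond"):
        width += 3
        if is_true(attrs, "showMillis"):
            width += 4
    return width, decimals


attrs = dict(mdyOrder="dayMonthYear", monthFormat="short")
print(date_time_type("date", attrs))
```

The widths used when calling @code{date_time_width} are whatever the
caller supplies; the sketch does not claim particular minimum widths
for any print format.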
3466 @node SPV Detail elapsedTimeFormat Element
3467 @subsubsection The @code{elapsedTimeFormat} Element
3471 :baseFormat[dt_base_format]=(date | time | dateTime)
3474 :minutePadding=bool?
3475 :secondPadding=bool?
3485 This element specifies the way to display a time duration.
3487 Data to be formatted in elapsed time formats is stored as strings in
3488 legacy data, in the format @code{H:MM:SS.SSS}, with additional hour
3489 digits as needed for long durations, and must be parsed and
3490 reformatted by the reader.
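
As an illustration, a Python reader might parse such a string into a
@code{timedelta}, as sketched below.  The function name
@code{parse_elapsed} is hypothetical, and the sketch assumes a
nonnegative duration with exactly three colon-separated fields.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Parse an elapsed time of the form H:MM:SS.SSS into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


duration = parse_elapsed("123:45:06.500")
print(duration.total_seconds())
```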
3492 The following attribute is required.
3494 @defvr {Attribute} baseFormat
3495 Specifies whether a day and a time are both to be displayed, or just
3499 The remaining attributes specify exactly how to display the elapsed
3502 For @code{baseFormat} of @code{time}, PSPP converts this element to
3503 print format type @code{DTIME}; otherwise, if @code{showHour} is true,
3504 to @code{TIME}; otherwise, to @code{MTIME}. The chosen width is the
3505 minimum for the chosen type, adding 3 if @code{showSecond} is true,
3506 adding 4 more if @code{showMillis} is also true. Decimals are 0 by
3507 default, or 3 if @code{showMillis} is true.
3509 @node SPV Detail format Element
3510 @subsubsection The @code{format} Element
3514 :baseFormat[f_base_format]=(date | time | dateTime | elapsedTime)?
3517 :mdyOrder=(dayMonthYear | monthDayYear | yearMonthDay)?
3522 :yearAbbreviation=bool?
3524 :monthFormat=(long | short | number | paddedNumber)?
3526 :dayOfMonthPadding=bool?
3530 :showDayOfWeek=bool?
3531 :dayOfWeekAbbreviation=bool?
3533 :minutePadding=bool?
3534 :secondPadding=bool?
3540 :dayType=(month | year)?
3541 :hourFormat=(AMPM | AS_24 | AS_12)?
3542 :minimumIntegerDigits=int?
3543 :maximumFractionDigits=int?
3544 :minimumFractionDigits=int?
3546 :scientific=(onlyForSmall | whenNeeded | true | false)?
3550 :tryStringsAsNumbers=bool?
3551 :negativesOutside=bool?
3555 This element is the union of all of the more-specific format elements.
3556 It is interpreted in the same way as one of those format elements,
3557 using @code{baseFormat} to determine which kind of format to use.
3559 There are a few attributes not present in the more specific formats:
3561 @defvr {Attribute} tryStringsAsNumbers
3562 When this is @code{true}, it is supposed to indicate that string
3563 values should be parsed as numbers and then displayed according to
3564 numeric formatting rules. However, in the corpus it is always
3568 @defvr {Attribute} negativesOutside
3569 If true, the negative sign should be shown before the prefix; if
3570 false, it should be shown after.
3573 @node SPV Detail affix Element
3574 @subsubsection The @code{affix} Element
3578 :definesReference=int
3579 :position=(subscript | superscript)
3585 This defines a suffix (or, theoretically, a prefix) for a formatted
3586 value. It is used to insert a reference to a footnote. It has the
3587 following attributes:
3589 @defvr {Attribute} definesReference
3590 This specifies the footnote number as a natural number: 1 for the
3591 first footnote, 2 for the second, and so on.
3594 @defvr {Attribute} position
3595 Position for the footnote label. Always @code{superscript}.
3598 @defvr {Attribute} suffix
3599 Whether the affix is a suffix (@code{true}) or a prefix
3600 (@code{false}). Always @code{true}.
3603 @defvr {Attribute} value
3604 The text of the suffix or prefix. Typically a letter, e.g.@: @code{a}
3605 for footnote 1, @code{b} for footnote 2, @enddots{} The corpus
3606 contains other values: @code{*}, @code{**}, and a few that begin with
3607 at least one comma: @code{,b}, @code{,c}, @code{,,b}, and @code{,,c}.
3610 @node SPV Detail interval Element
3611 @subsection The @code{interval} Element
3614 interval :style=ref style => labeling footnotes?
3618 :variable=ref (sourceVariable | derivedVariable)
3619 => (formatting | format | footnotes)*
3621 formatting :variable=ref (sourceVariable | derivedVariable) => formatMapping*
3623 formatMapping :from=int => format?
3627 :variable=ref (sourceVariable | derivedVariable)
3630 footnoteMapping :definesReference=int :from=int :to => EMPTY
3633 The @code{interval} element and its descendants determine the basic
3634 formatting and labeling for the table's cells. These basic styles are
3635 overridden by more specific styles set using @code{setCellProperties}
3636 (@pxref{SPV Detail setCellProperties Element}).
3638 The @code{style} attribute of @code{interval} itself may be ignored.
3640 The @code{labeling} element may have a single @code{formatting} child.
3641 If present, its @code{variable} attribute refers to a variable whose
3642 values are format specifiers as numbers, e.g.@: value 0x050802 for F8.2.
3643 However, the numbers are not actually interpreted that way. Instead,
3644 each number actually present in the variable's data is mapped by a
3645 @code{formatMapping} child of @code{formatting} to a @code{format}
3646 that specifies how to display it.
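
In other words, the numbers act only as lookup keys.  A minimal Python
sketch of that lookup follows; the function name
@code{format_mappings} is hypothetical, the XML is assumed to carry no
namespace prefix, and @code{from} is assumed to be written in decimal.

```python
import xml.etree.ElementTree as ET


def format_mappings(formatting):
    """Map each formatMapping's `from` number to its format child, if any."""
    table = dict()
    for mapping in formatting.findall("formatMapping"):
        # find() returns None when the formatMapping has no format child.
        table[int(mapping.get("from"))] = mapping.find("format")
    return table


formatting = ET.fromstring(
    '<formatting variable="formats">'
    '<formatMapping from="329730"><format maximumFractionDigits="2"/></formatMapping>'
    '<formatMapping from="7"/>'
    "</formatting>"
)
table = format_mappings(formatting)
print(table[0x050802].get("maximumFractionDigits"))
```

The variable name @code{formats} and the attribute values in the
sample XML are invented for the example.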
3648 The @code{labeling} element may also have a @code{footnotes} child
3649 element. The @code{variable} attribute of this element refers to a
3650 variable whose values are comma-delimited strings that list the
3651 1-based indexes of footnote references. (Cells without any footnote
3652 references are numeric 0 instead of strings.)
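
A reader might decode one value of that variable as in the following
Python sketch.  The function name @code{footnote_indexes} is
hypothetical, and any non-string value is treated as ``no
references'', which generalizes the numeric 0 described above.

```python
def footnote_indexes(value):
    """Return the 1-based footnote indexes referenced by one cell."""
    if not isinstance(value, str):
        return []  # numeric 0: the cell has no footnote references
    return [int(part) for part in value.split(",") if part.strip()]


print(footnote_indexes("1,3"))
```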
3654 Each @code{footnoteMapping} child of the @code{footnotes} element
3655 supplies, in its @code{to} attribute, the marker text for the
3656 footnote whose 1-based index is given in its @code{definesReference}
3659 @node SPV Detail style Element
3660 @subsection The @code{style} Element
3667 :border-bottom=(solid | thick | thin | double | none)?
3668 :border-top=(solid | thick | thin | double | none)?
3669 :border-left=(solid | thick | thin | double | none)?
3670 :border-right=(solid | thick | thin | double | none)?
3671 :border-bottom-color?
3674 :border-right-color?
3677 :font-weight=(regular | bold)?
3678 :font-style=(regular | italic)?
3679 :font-underline=(none | underline)?
3680 :margin-bottom=dimension?
3681 :margin-left=dimension?
3682 :margin-right=dimension?
3683 :margin-top=dimension?
3684 :textAlignment=(left | right | center | decimal | mixed)?
3685 :labelLocationHorizontal=(positive | negative | center)?
3686 :labelLocationVertical=(positive | negative | center)?
3687 :decimal-offset=dimension?
3694 A @code{style} element has an effect only when it is referenced by
3695 another element to set some aspect of the table's style. Most of the
3696 attributes are self-explanatory. The rest are described below.
3698 @defvr {Attribute} {color}
3699 In some cases, the text color; in others, the background color.
3702 @defvr {Attribute} {color2}
3706 @defvr {Attribute} {labelAngle}
3707 Normally 0. The value -90 causes inner column or outer row labels to
3708 be rotated vertically.
3711 @defvr {Attribute} {labelLocationHorizontal}
3715 @defvr {Attribute} {labelLocationVertical}
3716 The value @code{positive} corresponds to vertically aligning text to
3717 the top of a cell, @code{negative} to the bottom, @code{center} to the
3721 @node SPV Detail labelFrame Element
3722 @subsection The @code{labelFrame} Element
3725 labelFrame :style=ref style => location+ label? paragraph?
3727 paragraph :hangingIndent=dimension? => EMPTY
3730 A @code{labelFrame} element specifies content and style for some
3731 aspect of a table. Only @code{labelFrame} elements that have a
3732 @code{label} child are important. The @code{purpose} attribute in the
3733 @code{label} determines what the @code{labelFrame} affects:
3737 The table's title and its style.
3740 The table's caption and its style.
3743 The table's footnotes and the style for the footer area.
3746 The style for the layer area.
3752 The @code{style} attribute references the style to use for the area.
3754 The @code{label}, if present, specifies the text to put into the title
3755 or caption or footnotes. For footnotes, the label has two @code{text}
3756 children for every footnote, each of which has a @code{usesReference}
3757 attribute identifying the 1-based index of a footnote. The first,
3758 third, fifth, @dots{} @code{text} child specifies the content for a
3759 footnote; the second, fourth, sixth, @dots{} child specifies the
3760 marker. Content tends to end in a new-line, which the reader may wish
3761 to trim; similarly, markers tend to end in @samp{.}.
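
The pairing of content and marker children might be handled as in this
Python sketch.  It is not PSPP's code: the function name
@code{footnotes_from_label} is hypothetical, the XML is assumed to
carry no namespace prefix, and trimming one trailing new-line and one
trailing @samp{.} is a choice made for the example.

```python
import xml.etree.ElementTree as ET


def trim_suffix(text, suffix):
    """Remove one trailing occurrence of suffix, if present."""
    return text[:-len(suffix)] if text.endswith(suffix) else text


def footnotes_from_label(label):
    """Return (index, marker, content) for each pair of text children."""
    texts = label.findall("text")
    notes = []
    # Odd-numbered children hold content, even-numbered ones the marker.
    for content, marker in zip(texts[0::2], texts[1::2]):
        index = int(content.get("usesReference"))
        body = trim_suffix(content.text or "", "\n")
        mark = trim_suffix(marker.text or "", ".")
        notes.append((index, mark, body))
    return notes


label = ET.fromstring(
    "<label>"
    '<text usesReference="1">First note.\n</text><text usesReference="1">a.</text>'
    '<text usesReference="2">Second note.\n</text><text usesReference="2">b.</text>'
    "</label>"
)
notes = footnotes_from_label(label)
print(notes)
```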
3763 The @code{paragraph}, if present, may be ignored, since it is always
3766 @node SPV Detail Legacy Properties
3767 @subsection Legacy Properties
3769 The detail XML format has features for styling most of the aspects of
3770 a table. It also inherits defaults for many aspects from structure
3771 XML, which has the following @code{tableProperties} element:
3776 => generalProperties footnoteProperties cellFormatProperties borderProperties printingProperties
3779 :hideEmptyRows=bool?
3780 :maximumColumnWidth=dimension?
3781 :maximumRowWidth=dimension?
3782 :minimumColumnWidth=dimension?
3783 :minimumRowWidth=dimension?
3784 :rowDimensionLabels=(inCorner | nested)?
3788 :markerPosition=(superscript | subscript)?
3789 :numberFormat=(alphabetic | numeric)?
3792 cellFormatProperties => cell_style+
3795 :alternatingColor=color?
3796 :alternatingTextColor=color?
3804 :font-style=(regular | italic)?
3805 :font-weight=(regular | bold)?
3806 :font-underline=(none | underline)?
3807 :labelLocationVertical=(positive | negative | center)?
3808 :margin-bottom=dimension?
3809 :margin-left=dimension?
3810 :margin-right=dimension?
3811 :margin-top=dimension?
3812 :textAlignment=(left | right | center | decimal | mixed)?
3813 :decimal-offset=dimension?
3816 borderProperties => border_style+
3819 :borderStyleType=(none | solid | dashed | thick | thin | double)?
3824 :printAllLayers=bool?
3825 :rescaleLongTableToFitPage=bool?
3826 :rescaleWideTableToFitPage=bool?
3827 :windowOrphanLines=int?
3829 :continuationTextAtBottom=bool?
3830 :continuationTextAtTop=bool?
3831 :printEachLayerOnSeparatePage=bool?
3835 The @code{name} attribute appears only in standalone @file{.stt} files
3836 (@pxref{SPSS TableLook STT Format}).