@node SPSS Viewer Format
@section SPSS Viewer Format

SPSS Viewer or @file{.spv} files, here called SPV files, are written
by SPSS 16 and later to represent the contents of its output editor.
This section documents the format. This description is detailed
enough to read SPV files, but it is probably not sufficient to
write them.

As an aside, SPSS 15 and earlier versions use a completely different
output format based on the Microsoft Compound Document Format. That
format is not documented here.

An SPV file is a Zip archive that can be read with @command{zipinfo},
@command{unzip}, and similar programs. The final member in the Zip
archive is a file named @file{META-INF/MANIFEST.MF}. This structure
makes SPV files resemble Java ``JAR'' files, but whereas a JAR
manifest contains a sequence of colon-delimited key/value pairs, an
SPV manifest contains the string @samp{allowPivoting=true}, without a

The rest of the members in an SPV file's Zip archive fall into two
categories: structure and details. ``Structure'' member names begin
with @file{outputViewer@var{nnnnnnnnnn}}, where each @var{n} is a
decimal digit, end with @file{.xml}, and often include the string
@file{_heading} in between. Each of these members represents some
kind of output item (a table, a heading, a block of text, etc.) or a
group of them. The member whose output goes at the beginning of the
document is numbered 0, the next member in the output is numbered 1,

Structure members contain XML. This XML is sometimes self-contained,
but it often references other members in the Zip archive named as
follows:

@table @asis
@item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
@itemx @file{@var{prefix}_lightTableData.bin}
The structure of a table plus its data. Older SPV files pair a
@file{@var{prefix}_table.xml} file that describes the table's
structure with a binary @file{@var{prefix}_tableData.bin} file that
gives its data. Newer SPV files (the majority of those in the corpus)
instead include a single @file{@var{prefix}_lightTableData.bin} file
that incorporates both into a single binary format.

@item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
@itemx @file{@var{prefix}_lightWarningData.bin}
The same format used for tables, under a different name.

@item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
@itemx @file{@var{prefix}_lightNotesData.bin}
The same format used for tables, under a different name.

@item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
The structure of a chart plus its data. Charts do not have a

@item @file{@var{prefix}_model.xml}
@itemx @file{@var{prefix}_pmml.xml}
@itemx @file{@var{prefix}_stats.xml}
Not yet investigated. The corpus contains only one example of each.
@end table

The @file{@var{prefix}} in the names of the detail members is
typically an 11-digit decimal number that increases for each item,
tending to skip values. Older SPV files use different naming
conventions. Structure members refer to detail members by name, and so
their exact names do not appear to matter as long as they are unique.
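
The archive layout described in this section can be explored with a
short Python sketch. It is illustrative only: the function name
@code{classify_members} and the regular expression are assumptions
made for this example, and the manifest check only tests that the
string @samp{allowPivoting=true} is present.

```python
import io
import re
import zipfile

# Assumption: a structure member starts with "outputViewer" plus decimal
# digits and ends with ".xml"; anything else (except the manifest) is
# treated as a detail member.
STRUCTURE_RE = re.compile(r"^outputViewer(\d+).*\.xml$")
MANIFEST = "META-INF/MANIFEST.MF"


def classify_members(spv):
    """Return (structure_names, detail_names) for an SPV archive.

    `spv` is a file name or a binary file-like object.  Structure
    members are sorted by their embedded number, i.e. document order.
    """
    with zipfile.ZipFile(spv) as zf:
        names = zf.namelist()
        if MANIFEST not in names:
            raise ValueError("not an SPV file: manifest missing")
        if b"allowPivoting=true" not in zf.read(MANIFEST):
            raise ValueError("not an SPV file: unexpected manifest")

    structure, details = [], []
    for name in names:
        if name == MANIFEST:
            continue
        match = STRUCTURE_RE.match(name)
        if match:
            structure.append((int(match.group(1)), name))
        else:
            details.append(name)
    structure.sort()
    return [name for _, name in structure], details
```

Sorting by the embedded number, rather than trusting the order of
members in the archive, follows the numbering rule given above for
structure members.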
@node SPV Structure Member Format
@subsection Structure Member Format

The XML files in structure members claim conformance with a collection
of XML Schemas. These schemas are distributed, under a nonfree license,
with SPSS binaries. Fortunately, the schemas are not necessary to
understand the structure members. To a degree, the schemas can even
be deceptive because they document elements and attributes that are
not in the corpus and lack documentation of elements and attributes
that are commonly found in the corpus.

Structure members use a different XML namespace for each schema, but
these namespaces are not entirely consistent: in some SPV files, for
example, the @code{viewer-tree} schema is associated with namespace
@indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
@indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
additional @file{viewer/} directory). In any case, the schema URIs are
not resolvable to obtain the schemas themselves.

One may ignore all of the above in interpreting a structure member.
The actual XML has a simple and straightforward form that does not
require a reader to take schemas or namespaces into account.
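
One way to ignore namespaces in practice is to discard them right
after parsing. The following Python sketch does so with the standard
@code{xml.etree.ElementTree} module. The helper name
@code{strip_namespaces} is hypothetical, and this is only one possible
approach, not something the format prescribes.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove the "{uri}" prefix ElementTree adds to namespaced names.

    Works in place on element tags and attribute names, and returns
    `root` for convenience.
    """
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        for key in list(elem.attrib):
            if key.startswith("{"):
                elem.attrib[key.split("}", 1)[1]] = elem.attrib.pop(key)
    return root
```

After this step, both namespace variants mentioned above yield the
same tree, so later code can look up plain names such as
@code{heading} and @code{label}.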

Parent: Document root or @code{heading} @*
Contents: [@code{pageSetup}] @code{label} [@code{container} | @code{heading}]*

The root of a structure member is a @code{heading}, which represents a
section of output beginning with a title (the @code{label}) and
ordinarily followed by content containers or further nested
(sub)sections of output.

The document root heading may also contain a @code{pageSetup} element.

The following attributes have been observed on both document root and
nested @code{heading} elements:

@table @asis
@item Optional attribute: @code{creator-version}
The version of the software that created this SPV file. A string of
the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
e.g.@: @code{21000001} is version 21.0.0.1. Trailing pairs of zeros
are sometimes omitted, so that @code{21}, @code{210000}, and
@code{21000000} are all version 21.0.0.0 (and the corpus contains all
three of those forms).
@end table
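
The @code{creator-version} encoding can be decoded as in the
following Python sketch. The function name
@code{parse_creator_version} is hypothetical, and rejecting strings of
odd length or of more than 8 digits is an assumption made for this
example, not an observation about the corpus.

```python
def parse_creator_version(text):
    """Return (xx, yy, zz, ww) for a creator-version string.

    Omitted trailing pairs of zeros are restored first, so "21",
    "210000", and "21000000" all decode to (21, 0, 0, 0).
    """
    valid = text.isascii() and text.isdigit()
    if not valid or len(text) % 2 or len(text) > 8:
        raise ValueError(f"unexpected creator-version: {text!r}")
    padded = text.ljust(8, "0")
    return tuple(int(padded[i:i + 2]) for i in range(0, 8, 2))
```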

The following attributes have been observed on document root
@code{heading} elements only:

@table @asis
@item Optional attribute: @code{creator}
The directory of the software that created this SPV file,
e.g.@: @file{C:\PROGRA~1\IBM\SPSS\STATIS~1\22} or
@file{/Applications/IBM/SPSS/Statistics/22/SPSSStatistics.app/Contents/Resources/Java/../../bin}.

@item Optional attribute: @code{creation-date-time}
The date and time at which the SPV file was written, in a
locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
December 5, 2014 5:00:19 o'clock PM EST}.

@item Optional attribute: @code{lockReader}
Whether a reader should be allowed to edit the output. The possible
values are @code{true} and @code{false}, but the corpus only contains

@item Optional attribute: @code{schemaLocation}
This is actually an XML Namespace attribute. A reader may ignore it.
@end table

The following attributes have been observed only on nested
@code{heading} elements:

@table @asis
@item Required attribute: @code{commandName}
The locale-invariant name of the command that produced the output,
e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.

@item Optional attribute: @code{visibility}
To what degree the output represented by the element is visible. The
only observed value is @code{collapsed}.

@item Optional attribute: @code{locale}
The locale used for output, in Windows format, which is similar to the
format used in Unix with the underscore replaced by a hyphen, e.g.@:
@code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.

@item Optional attribute: @code{olang}
The output language, e.g.@: @code{en}, @code{it}, @code{es},
@code{de}, @code{pt-BR}.
@end table

Parent: @code{heading} or @code{container} @*

Every @code{heading} and @code{container} holds a @code{label} as its
first child. The @code{label} of the root @code{heading} in a
structure member always contains the string ``Output''. Otherwise,
the text in @code{label} describes what it labels, often by naming the
statistical procedure that was executed, e.g.@: ``Frequencies'' or
``T-Test''. Labels are often very generic, especially within a
@code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''. Label
text is localized according to the output language, e.g.@: in Italian
a frequency table procedure is labeled ``Frequenze''.

The corpus contains one example of an empty label, one that contains

Parent: @code{heading} @*
Contents: @code{label} [@code{table} | @code{text}]

A @code{container} serves to label a @code{table} or a @code{text}

@table @asis
@item Required attribute: @code{visibility}
Either @code{visible} or @code{hidden}, this indicates whether the
container's content is displayed.

@item Optional attribute: @code{text-align}
Presumably indicates the alignment of text within the container. The
only observed value is @code{left}. Observed with nested @code{table}
and @code{text} elements.

@item Optional attribute: @code{width}
The width of the container in the form @code{@var{n}px}, e.g.@:
@end table

Parent: @code{container} @*
Contents: @code{html}

This @code{text} element is nested inside a @code{container}. There
is a different @code{text} element that is nested inside a
@code{pageParagraph}.

@table @asis
@item Required attribute: @code{type}
One of @code{title}, @code{log}, or @code{text}.

@item Optional attribute: @code{commandName}
As on the @code{heading} element. For output not specific to a
command, this is simply @code{log}. The corpus contains one example
in which @code{commandName} is present but set to the empty string.

@item Optional attribute: @code{creator-version}
As on the @code{heading} element.
@end table

Parent: @code{text} @*

The cdata contains an HTML document. In some cases, the document
starts with @code{<html>} and ends with @code{</html>}; in others the
@code{html} element is implied. Generally the HTML includes a
@code{head} element with a CSS stylesheet. The HTML body often begins
with @code{<BR>}. The actual content ranges from trivial to simple:
just discarding the CSS and tags yields readable results.

@table @asis
@item Required attribute: @code{lang}
This always contains @code{en} in the corpus.
@end table
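
Discarding the CSS and tags, as suggested above for the HTML cdata,
can be sketched in Python with the standard @code{html.parser} module.
The names @code{TextExtractor} and @code{html_to_text} are
hypothetical, and skipping the whole @code{head} element and turning
@code{<BR>} into a newline are choices made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data, skipping <head>, <style>, and <script>."""

    SKIP = {"head", "style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case, so <BR> is "br".
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(document):
    """Return the readable text of an HTML fragment or document."""
    parser = TextExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.parts).strip()
```

Because @code{HTMLParser} is tolerant of missing tags, the same
function handles documents with an explicit @code{html} element and
those where it is implied.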

Parent: @code{container} @*
Contents: @code{tableStructure}

@table @asis
@item Required attribute: @code{commandName}
As on the @code{heading} element.

@item Required attribute: @code{type}
One of @code{table}, @code{note}, or @code{warning}.

@item Required attribute: @code{subType}
The locale-invariant name for the particular kind of output that this
table represents in the procedure. This can be the same as
@code{commandName}, e.g.@: @code{Frequencies}, or different, e.g.@:
@code{Case Processing Summary}. Generic subtypes @code{Notes} and
@code{Warnings} are often used.

@item Required attribute: @code{tableId}
A number that uniquely identifies the table within the SPV file,
typically a large negative number such as @code{-4147135649387905023}.

@item Optional attribute: @code{creator-version}
As on the @code{heading} element. In the corpus, this is only present
for version 21 and up and always includes all 8 digits.
@end table

Contents: @code{dataPath}

Parent: @code{tableStructure}

Contains the name of the Zip member that holds the table details,
e.g.@: @code{0000000001437_lightTableData.bin}.
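
The nesting described so far (@code{heading}, @code{container},
@code{table}, @code{tableStructure}, @code{dataPath}) can be walked
with a short Python sketch that maps each table to its detail member.
The helper names are hypothetical, and the code assumes only the
element relationships documented above.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop any "{namespace}" prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child called `name`, or None."""
    if elem is None:
        return None
    for candidate in elem:
        if local(candidate.tag) == name:
            return candidate
    return None


def label_text(elem):
    """Return the text of an element's label child, or ""."""
    label = child(elem, "label")
    return (label.text or "") if label is not None else ""


def table_members(heading, path=()):
    """Yield (labels, detail member name) for each table under `heading`.

    `labels` is the tuple of label texts from the root heading down to
    the container that holds the table.
    """
    path = path + (label_text(heading),)
    for item in heading:
        kind = local(item.tag)
        if kind == "heading":
            yield from table_members(item, path)
        elif kind == "container":
            structure = child(child(item, "table"), "tableStructure")
            data_path = child(structure, "dataPath")
            if data_path is not None and data_path.text:
                yield path + (label_text(item),), data_path.text.strip()
```

Each yielded member name can then be looked up in the Zip archive to
read the table details.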

Parent: @code{heading} @*
Contents: @code{pageHeader} @code{pageFooter}

@table @asis
@item Required attribute: @code{initial-page-number}

@item Optional attribute: @code{chart-size}
Always @code{as-is} or a localization (!) of it (e.g.@: @code{dimensione
attuale}, @code{Wie vorgegeben}).

@item Optional attribute: @code{margin-left}
@itemx Optional attribute: @code{margin-right}
@itemx Optional attribute: @code{margin-top}
@itemx Optional attribute: @code{margin-bottom}
Margin sizes in the form @code{@var{size}in}, e.g.@: @code{0.25in}.

@item Optional attribute: @code{paper-height}
@itemx Optional attribute: @code{paper-width}
Paper sizes in the form @code{@var{size}in}, e.g.@: @code{8.5in} by
@code{11in} for letter paper or @code{8.267in} by @code{11.692in} for

@item Optional attribute: @code{reference-orientation}

@item Optional attribute: @code{space-after}
@end table

Parent: @code{pageSetup} @*
Contents: @code{pageParagraph}*

Parent: @code{pageHeader} or @code{pageFooter} @*
Contents: @code{text}

Text to go at the top or bottom of a page, respectively.

Parent: @code{pageParagraph} @*

This @code{text} element is nested inside a @code{pageParagraph}. There
is a different @code{text} element that is nested inside a
@code{container}.

The element is either empty, or contains cdata that holds almost-XHTML
text: in the corpus, either an @code{html} or @code{p} element. It is
@emph{almost}-XHTML because the @code{html} element designates the
default namespace as
@code{http://xml.spss.com/spss/viewer/viewer-tree} instead of an XHTML
namespace.

The cdata can contain substitution variables: @code{&[Page]} for the
page number and @code{&[PageTitle]} for the page title.

Typical contents (indented for clarity):

@example
<html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
    <p style="text-align:right; margin-top: 0">Page &[Page]</p>
@end example
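
The substitution variables in page header and footer text can be
expanded with a few lines of Python. This is a sketch: the function
name @code{expand_page_variables} is hypothetical, and leaving unknown
variables untouched is an assumption, since only @code{&[Page]} and
@code{&[PageTitle]} are documented here.

```python
import re

# Matches substitution variables of the form &[Name].
VARIABLE_RE = re.compile(r"&\[(\w+)\]")


def expand_page_variables(text, page, page_title):
    """Replace &[Page] and &[PageTitle] in header or footer text."""
    values = {"Page": str(page), "PageTitle": page_title}
    # Variables not listed above are kept verbatim rather than guessed at.
    return VARIABLE_RE.sub(
        lambda m: values.get(m.group(1), m.group(0)), text)
```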
@item Required attribute: @code{type}