SPSS Viewer or @file{.spv} files, here called SPV files, are written
by SPSS 16 and later to represent the contents of its output editor.
This chapter documents the format, based on examination of a corpus of
-about 3,000 files from a variety of sources. This description is
+about 8,000 files from a variety of sources. This description is
detailed enough to both read and write SPV files.
SPSS 15 and earlier versions instead use @file{.spo} files, which have
new-line. PSPP uses this string to identify an SPV file; it is
invariant across the corpus.@footnote{SPV files always begin with the
7-byte sequence 50 4b 03 04 14 00 08, but this is not a useful magic
-number because most Zip archives start the same way.}
+number because most Zip archives start the same way.}@footnote{SPSS
+writes @file{META-INF/MANIFEST.MF} to every SPV file, but it does not
+read it or even require it to exist, so using different contents,
+e.g.@: @samp{allowingPivot=false}, has no effect.}
The rest of the members in an SPV file's Zip archive fall into two
categories: @dfn{structure} and @dfn{detail} members. Structure
:page-break-before=(always)?
:text-align=(left | center)?
:width=dimension
-=> label (table | container_text | graph | model | object | image)
+=> label (table | container_text | graph | model | object | image | tree)
@end example
Each attribute specification begins with @samp{:} followed by the
@item dimension
A floating-point number followed by a unit, e.g.@: @code{10pt}. Units
in the corpus include @code{in} (inch), @code{pt} (points, 72/inch),
-@code{px} (``device-independent pixels'', 96/inch), and @code{cm}.
-The corpus also contains localized names for units: @code{인치} for
-inch, @code{пт} for points, and @code{см} for centimeters. If the
-unit is omitted then points should be assumed. The number and unit
-may be separated by white space.
+@code{px} (``device-independent pixels'', 96/inch), and @code{cm}. If
+the unit is omitted then points should be assumed. The number and
+unit may be separated by white space.
+
+The corpus also includes localized names for units. A reader must
+understand these to properly interpret the dimension:
+
+@table @asis
+@item inch
+@code{인치}, @code{pol.}, @code{cala}, @code{cali}
+
+@item point
+@code{пт}
+
+@item centimeter
+@code{см}
+@end table
@item real
A floating-point number.
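The @code{dimension} syntax described above can be sketched in Python as follows. This is only an illustration, not PSPP's implementation: the names @code{parse_dimension}, @code{POINTS_PER_UNIT}, and @code{LOCALIZED_UNITS} are hypothetical, the result is normalized to points as an arbitrary choice, and the sketch assumes that numbers use @samp{.} as the decimal separator.

```python
import re

# Points per canonical unit: pt is 72/inch, px is 96/inch, 1 inch is 2.54 cm.
POINTS_PER_UNIT = {
    "in": 72.0,
    "pt": 1.0,
    "px": 72.0 / 96.0,
    "cm": 72.0 / 2.54,
}

# Localized unit names listed above, mapped to canonical units.
LOCALIZED_UNITS = {
    "인치": "in", "pol.": "in", "cala": "in", "cali": "in",
    "пт": "pt",
    "см": "cm",
}

# A number, optional white space, then an optional unit.
_DIMENSION_RE = re.compile(r"\s*([-+]?(?:\d+(?:\.\d*)?|\.\d+))\s*(.*?)\s*")


def parse_dimension(text):
    """Return the dimension in points, or raise ValueError."""
    m = _DIMENSION_RE.fullmatch(text)
    if m is None:
        raise ValueError("not a dimension: %r" % text)
    number, unit = m.groups()
    unit = LOCALIZED_UNITS.get(unit, unit)
    if unit == "":
        unit = "pt"  # an omitted unit means points
    if unit not in POINTS_PER_UNIT:
        raise ValueError("unknown unit in dimension: %r" % text)
    return float(number) * POINTS_PER_UNIT[unit]
```

For instance, @code{parse_dimension("1 in")} and @code{parse_dimension("96px")} both yield 72 points. A reader that meets further localized unit names would only need to extend the lookup table.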
* SPV Structure table Element::
* SPV Structure graph Element::
* SPV Structure model Element::
-* SPV Structure dataPath and path Elements::
+* SPV Structure tree Element::
+* SPV Structure Path Elements::
* SPV Structure pageSetup Element::
* SPV Structure @code{text} Element (Inside @code{pageParagraph})::
@end menu
@end example
The root of a structure member is a @code{heading}, which represents a
-section of output beginning with a title (the @code{label}) and
+section of output beginning with a @code{label} and
ordinarily followed by content containers or further nested
(sub)-sections of output. Unlike heading elements in HTML and other
common document formats, which precede the content that they head,
@end example
Every @code{heading} and @code{container} holds a @code{label} as its
-first child. The root @code{heading} in a structure member always
-contains the string ``Output'' (localized). Otherwise, the text in
-@code{label} describes what it labels, often by naming the statistical
-procedure that was executed, e.g.@: ``Frequencies'' or ``T-Test''.
-Labels are often very generic, especially within a @code{container},
-e.g.@: ``Title'' or ``Warnings'' or ``Notes''. Label text is
-localized according to the output language, e.g.@: in Italian a
-frequency table procedure is labeled ``Frequenze''.
-
-The corpus contains a few examples of empty labels, ones that contain
-no text.
+first child. The label text is what appears in the outline pane of
+the GUI's viewer window. PSPP also puts it into the outline of PDF
+output. The label text doesn't appear in the output itself.
+
+The text in @code{label} describes what it labels, often by naming the
+statistical procedure that was executed, e.g.@: ``Frequencies'' or
+``T-Test''. The label of the root @code{heading} in a structure member
+is normally ``Output''. Labels are often very generic, especially within a
+@code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''.
+Label text is localized according to the output language, e.g.@: in
+Italian a frequency table procedure is labeled ``Frequenze''.
+
+The user can edit labels to be anything they want. The corpus
+contains a few examples of empty labels, ones that contain no text,
+probably as a result of user editing.
@node SPV Structure container Element
@subsection The @code{container} Element
:page-break-before=(always)?
:text-align=(left | center)?
:width=dimension
-=> label (table | container_text | graph | model | object | image)
+=> label (table | container_text | graph | model | object | image | tree)
@end example
A @code{container} serves to contain and label a @code{table},
:type[table_type]=(table | note | warning)
=> tableProperties? tableStructure
-tableStructure => path? dataPath
+tableStructure => path? dataPath csvPath?
@end example
This element has the following attributes.
:editor?
:refMapId?
:refMapURI?
-=> dataPath? path
+ :csvFileIds?
+ :csvFileNames?
+=> dataPath? path csvPath?
@end example
This element represents a graph. The @code{dataPath} and @code{path}
Normally, both elements are present; there is only one counterexample
in the corpus.
+@code{csvPath} only appears in one SPV file in the corpus, for two
+graphs. In these two cases, @code{dataPath}, @code{path}, and
+@code{csvPath} all appear. These @code{csvPath} elements name Zip
+members with names of the form @file{@var{number}_csv.bin}, where
+@var{number} is a many-digit number that matches @code{csvFileIds}.
+The named Zip members are CSV text files (despite the @file{.bin}
+extension). The CSV files are encoded in UTF-8 and begin with a
+U+FEFF byte-order mark.
+
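As a hedged illustration of reading such a member, the sketch below uses Python's standard @code{zipfile} and @code{csv} modules. The function name @code{read_csv_member} is hypothetical, and the default comma-separated dialect is an assumption: the text above does not describe the CSV dialect.

```python
import csv
import io
import zipfile


def read_csv_member(spv, member_name):
    """Return the rows of a CSV Zip member as lists of strings.

    `spv` is an open zipfile.ZipFile for the SPV file.
    """
    raw = spv.read(member_name)
    # "utf-8-sig" decodes UTF-8 and drops a leading U+FEFF if present.
    text = raw.decode("utf-8-sig")
    # Assumption: Python's default (comma-separated) CSV dialect.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Decoding with plain @code{utf-8} would instead leave U+FEFF attached to the first field of the first row.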
@node SPV Structure model Element
@subsection The @code{model} Element
@example
model
- :PMMLContainerId
+ :PMMLContainerId?
:PMMLId
:StatXMLContainerId
:VDPId
:commandName
:creator-version
:mainViewName
-=> ViZml? path | pmmlContainerPath statsContainerPath
+=> ViZml? dataPath? path | pmmlContainerPath statsContainerPath
pmmlContainerPath => TEXT
Alternatively, @code{pmmlContainerPath} and @code{statsContainerPath}
name Zip members with @file{.scf} extension.
-@node SPV Structure dataPath and path Elements
-@subsection The @code{dataPath} and @code{path} Elements
+@node SPV Structure tree Element
+@subsection The @code{tree} Element
+
+@example
+tree
+ :commandName
+ :creator-version
+ :name
+ :type
+=> dataPath path
+@end example
+
+This element represents a tree. The @code{dataPath} and @code{path}
+elements name the Zip members that give the details of the tree.
+The details are unexplored.
+
+@node SPV Structure Path Elements
+@subsection Path Elements
@example
dataPath => TEXT
path => TEXT
+
+csvPath => TEXT
@end example
These elements contain the names of the Zip members that hold details
@item int16
@itemx be16
-A 16-bit integer in little-endian or big-endian byte order,
+A 16-bit unsigned integer in little-endian or big-endian byte order,
respectively.
@item int32
@itemx be32
-A 32-bit integer in little-endian or big-endian byte order,
+A 32-bit unsigned integer in little-endian or big-endian byte order,
respectively.
@item int64
@itemx be64
-A 64-bit integer in little-endian or big-endian byte order,
+A 64-bit unsigned integer in little-endian or big-endian byte order,
respectively.
@item double
@item string
@itemx bestring
-A 32-bit integer, in little-endian or big-endian byte order,
+A 32-bit unsigned integer, in little-endian or big-endian byte order,
respectively, followed by the specified number of bytes of character
data. (The encoding is indicated by the Formats nonterminal.)
@item count(@var{x})
@itemx becount(@var{x})
-A 32-bit integer, in little-endian or big-endian byte order, respectively,
-that indicates the number of bytes in @var{x}, followed by @var{x} itself.
+A 32-bit unsigned integer, in little-endian or big-endian byte order,
+respectively, that indicates the number of bytes in @var{x}, followed
+by @var{x} itself.
@item v1(@var{x})
In a version 1 @file{.bin} member, @var{x}; in version 3, nothing.
@example
Footnotes => int32[n-footnotes] Footnote*[n-footnotes]
-Footnote => Value[text] (58 @math{|} 31 Value[marker]) byte*4
+Footnote => Value[text] (58 @math{|} 31 Value[marker]) int32[show]
@end example
Each footnote has @code{text} and an optional custom @code{marker}
(such as @samp{*}).
+@code{show} is a 32-bit signed integer. It is positive to show the
+footnote or negative to hide it. Its magnitude is often 1, and in
+other cases tends to be the number of references to the footnote.
+
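A minimal sketch of interpreting @code{show}, using Python's @code{struct} module. The name @code{decode_show} is hypothetical. The text above does not say what a value of zero means, so this sketch arbitrarily treats zero as hidden.

```python
import struct


def decode_show(data, offset=0):
    """Decode a footnote's `show` field from `data` at `offset`.

    Returns (visible, magnitude).  The field is read as a signed
    little-endian 32-bit integer.
    """
    (show,) = struct.unpack_from("<i", data, offset)
    # Positive shows the footnote, negative hides it.  Zero is not
    # described by the format notes; treating it as hidden is an assumption.
    return show > 0, abs(show)
```

Here @code{decode_show(b"\xff\xff\xff\xff")} yields a hidden footnote with magnitude 1, since those bytes encode -1.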
@node SPV Light Member Areas
@subsection Areas
or columns to pixel or point offsets.
@code{notes} is a text string that contains user-specified notes. It
-is displayed when the user hovers the cursor over the table, like
-``alt text'' on a webpage. It is not printed. It is usually empty.
+is displayed when the user hovers the cursor over the table, like text
+in the @code{title} attribute in HTML. It is not printed. It is
+usually empty.
@code{table-look} is the name of a SPSS ``TableLook'' table style,
such as ``Default'' or ``Academic''; it is often empty.
string[language] string[charset] string[locale]
bool bool bool bool
Y0
-Y2 => CustomCurrency byte[missing] bool[x16]
+Y2 => CustomCurrency byte[missing] bool[x17]
@end example
@code{command} describes the statistical procedure that generated the
X0 repeats @code{decimal}, @code{grouping}, CustomCurrency, and
@code{missing} already included in Formats.
-A writer may safely use false for @code{x16}.
+A writer may safely use false for @code{x17}.
@subsubheading X1
@example
X1 =>
- 00 byte[x14] bool[x15]
+ bool[x14] byte[x15] bool[x16]
byte[lang]
byte[show-variables]
byte[show-values]
- int32[x17] int32[x18]
+ int32[x18] int32[x19]
00*17
- bool[x19]
- 01
+ bool[x20]
+ bool[show-caption]
@end example
@code{lang} may indicate the language in use. Some values seem to be
available, 3 to display both. Again, the most common value is 0,
which probably means to use a global default.
-A writer may safely use 1 for @code{x14}, false for @code{x15}, -1 for
-@code{x17} and @code{x18}, and false for @code{x19}.
+@code{show-caption} is true to show the caption, false to hide it.
+
+A writer may safely use false for @code{x14}, 1 for @code{x15}, false
+for @code{x16}, -1 for @code{x18} and @code{x19}, and false for
+@code{x20}.
@subsubheading X2
@example
X3 =>
- 01 00 byte[x20] 00 00 00
+ 01 00 byte[x21] 00 00 00
Y1
double[small] 01
(string[dataset] string[datafile] i0 int32[date] i0)?
Y2
- (int32 i0)?
+ (int32[x22] i0)?
@end example
@code{date} is a date, as seconds since the epoch, i.e.@: since
presumptive @code{dataset} contains a null byte (a valid string never
will).
-A writer may safely use 4 for @code{x20} and omit the optional bytes
-at the end.
+@code{x22} is usually 0 or 2000000.
+
+A writer may safely use 4 for @code{x21} and omit @code{x22} and the
+other optional bytes at the end.
@node SPV Light Member Dimensions
@subsection Dimensions
Category => Value[name] (Leaf @math{|} Group)
Leaf => 00 00 00 i2 int32[leaf-index] i0
Group =>
- bool[merge] 00 01 int32[x22]
+ bool[merge] 00 01 int32[x23]
i-1 int32[n-subcategories] Category*[n-subcategories]
@end example
(For writing an SPV file, there is no need to use the @code{merge}
feature unless it is convenient.)
-A Group's @code{x22} appears to be i2 when all of the categories
+A Group's @code{x23} appears to be i2 when all of the categories
within a group are leaf categories that directly represent data values
for a variable (e.g.@: in a frequency table or crosstabulation, a group
of values in a variable being tabulated) and i0 otherwise. A writer
58
@math{|} 31
int32[n-refs] int16*[n-refs]
- (i0 | i1 string[subscript])
+ int32[n-subscripts] string*[n-subscripts]
v1(00 (i1 | i2) 00? 00? int32 00? 00?)
v3(count(TemplateString StylePair))
-TemplateString => count((count((i0 58)?) (58 @math{|} 31 string[id]))?)
+TemplateString => count((count((i0 (58 @math{|} 31 55))?) (58 @math{|} 31 string[id]))?)
StylePair =>
(31 FontStyle | 58)
markers are shown appended to the main text of the Value, as
superscripts.
-The @code{subscript}, if present, is a string to append to the main
-text of the Value, as a subscript. The subscript text is a brief
-indicator, e.g.@: @samp{a} or @samp{a,b}, with its meaning indicated
-by the table caption.
+The @code{subscripts}, if present, are strings to append to the main
+text of the Value, as subscripts. Each subscript text is a brief
+indicator, e.g.@: @samp{a} or @samp{b}, with its meaning indicated by
+the table caption. When multiple subscripts are present, they are
+displayed separated by commas.
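The @code{int32[n-subscripts] string*[n-subscripts]} production can be read along the following lines. This is a sketch under stated assumptions rather than PSPP's code: the function names are hypothetical, and the character encoding is taken as a parameter (defaulting to UTF-8 purely for illustration) because the real encoding comes from the Formats nonterminal.

```python
import struct


def read_string(data, offset, encoding):
    """Read a `string`: a little-endian 32-bit length, then that many bytes."""
    (length,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    end = start + length
    if end > len(data):
        raise ValueError("string extends past end of data")
    return data[start:end].decode(encoding), end


def read_subscripts(data, offset=0, encoding="utf-8"):
    """Read int32[n-subscripts] string*[n-subscripts].

    Returns (list of subscripts, offset just past the last one).
    """
    (n,) = struct.unpack_from("<I", data, offset)
    offset += 4
    subscripts = []
    for _ in range(n):
        text, offset = read_string(data, offset, encoding)
        subscripts.append(text)
    return subscripts, offset


def format_subscripts(subscripts):
    """Join subscripts for display, separated by commas."""
    return ",".join(subscripts)
```

A Value with no subscripts is simply encoded with a count of zero, which this reader returns as an empty list.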
The @code{id} inside the TemplateString, if present, is a template
string for substitutions using the syntax explained previously. It
@example
tableProperties
+ :name?
=> generalProperties footnoteProperties cellFormatProperties borderProperties printingProperties
generalProperties
:font-size?
:font-style=(regular | italic)?
:font-weight=(regular | bold)?
+ :font-underline=(none | underline)?
:labelLocationVertical=(positive | negative | center)?
:margin-bottom=dimension?
:margin-left=dimension?
:printEachLayerOnSeparatePage=bool?
=> EMPTY
@end example
+
+The @code{name} attribute appears only in standalone @file{.stt} files
+(@pxref{SPSS TableLook STT Format}).