@node SPSS Viewer Format
@section SPSS Viewer Format

SPSS Viewer or @file{.spv} files, here called SPV files, are written
by SPSS 16 and later to represent the contents of its output editor.
This section documents the format.  This description is detailed
enough to read SPV files, but it is probably not sufficient to
write them.

As an aside, SPSS 15 and earlier versions use a completely different
output format based on the Microsoft Compound Document Format.  This
format is not documented.

An SPV file is a Zip archive that can be read with @command{zipinfo}
and @command{unzip} and similar programs.  The final member in the Zip
archive is a file named @file{META-INF/MANIFEST.MF}.  This structure
makes SPV files resemble Java ``JAR'' files, but whereas a JAR
manifest contains a sequence of colon-delimited key/value pairs, an
SPV manifest contains the string @samp{allowPivoting=true}, without a
new-line.
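
The manifest check described above can be sketched in Python with the
standard @code{zipfile} module.  This is a minimal illustration, not
code from any existing reader; the function name
@code{looks_like_spv} and the two constants are hypothetical.

```python
import zipfile

# Values taken from the description above.
MANIFEST_NAME = "META-INF/MANIFEST.MF"
EXPECTED_MANIFEST = b"allowPivoting=true"  # note: no trailing new-line


def looks_like_spv(source):
    """Return True if `source` (a path or binary file object) resembles
    an SPV file: a Zip archive whose final member is the manifest, with
    exactly the expected contents."""
    try:
        with zipfile.ZipFile(source) as archive:
            names = archive.namelist()  # members in archive order
            if not names or names[-1] != MANIFEST_NAME:
                return False
            return archive.read(MANIFEST_NAME) == EXPECTED_MANIFEST
    except zipfile.BadZipFile:
        return False
```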

The rest of the members in an SPV file's Zip archive fall into two
categories: structure and details.  ``Structure'' member names begin
with @file{outputViewer@var{nnnnnnnnnn}}, where each @var{n} is a
decimal digit, and end with @file{.xml}, and often include the string
@file{_heading} in between.  Each of these members represents some
kind of output item (a table, a heading, a block of text, etc.) or a
group of them.  The member whose output goes at the beginning of the
document is numbered 0, the next member in the output is numbered 1,
and so on.
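
As a sketch of the naming rule above, the following Python picks the
structure members out of a list of member names and puts them in
document order.  The regular expression is an assumption derived from
the prose: it accepts anything between the ten digits and
@file{.xml}, since @file{_heading} is only described as common.  The
names @code{STRUCTURE_RE} and @code{structure_members} are
hypothetical.

```python
import re

# "outputViewer", ten decimal digits, optional extra text such as
# "_heading", then ".xml".  No "/" is allowed, so members in
# subdirectories (e.g. the manifest) never match.
STRUCTURE_RE = re.compile(r"outputViewer(\d{10})[^/]*\.xml")


def structure_members(names):
    """Return the structure member names in `names`, ordered by their
    ten-digit number (0 first)."""
    numbered = []
    for name in names:
        match = STRUCTURE_RE.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
```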

Structure members contain XML.  This XML is sometimes self-contained,
but it often references other members in the Zip archive named as
follows:

@table @asis
@item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
@itemx @file{@var{prefix}_lightTableData.bin}
The structure of a table plus its data.  Older SPV files pair a
@file{@var{prefix}_table.xml} file that describes the table's
structure with a binary @file{@var{prefix}_tableData.bin} file that
gives its data.  Newer SPV files (the majority of those in the corpus)
instead include a single @file{@var{prefix}_lightTableData.bin} file
that incorporates both into a single binary format.

@item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
@itemx @file{@var{prefix}_lightWarningData.bin}
The same format used for tables, under a different name.

@item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
@itemx @file{@var{prefix}_lightNotesData.bin}
The same format used for tables, under a different name.

@item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
The structure of a chart plus its data.  Charts do not have a
``light'' format.

@item @file{@var{prefix}_model.xml}
@itemx @file{@var{prefix}_pmml.xml}
@itemx @file{@var{prefix}_stats.xml}
Not yet investigated.  The corpus contains only one example of each.
@end table

The @file{@var{prefix}} in the names of the detail members is
typically an 11-digit decimal number that increases for each item,
tending to skip values.  Older SPV files use different naming
conventions.  Structure members refer to detail members by name, and
so the exact names of detail members do not appear to matter as long
as they are unique.
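
The suffixes listed above can be turned into a small lookup that
splits a detail member name into its prefix and kind.  This is only a
sketch: the labels (@code{"structure"}, @code{"data"},
@code{"light"}, @code{"unknown"}) and the names
@code{DETAIL_SUFFIXES} and @code{classify_detail} are this example's
own shorthand.  Because older files use different naming conventions,
a real reader should follow the names given in the structure members
instead of relying on a lookup like this.

```python
# Suffix -> (item kind, storage kind), following the table above.
DETAIL_SUFFIXES = {
    "_table.xml": ("table", "structure"),
    "_tableData.bin": ("table", "data"),
    "_lightTableData.bin": ("table", "light"),
    "_warning.xml": ("warning", "structure"),
    "_warningData.bin": ("warning", "data"),
    "_lightWarningData.bin": ("warning", "light"),
    "_notes.xml": ("notes", "structure"),
    "_notesData.bin": ("notes", "data"),
    "_lightNotesData.bin": ("notes", "light"),
    "_chart.xml": ("chart", "structure"),
    "_chartData.bin": ("chart", "data"),
    "_model.xml": ("model", "unknown"),
    "_pmml.xml": ("pmml", "unknown"),
    "_stats.xml": ("stats", "unknown"),
}


def classify_detail(name):
    """Return (prefix, item kind, storage kind) for a detail member
    name, or None if the name has none of the known suffixes."""
    for suffix, (item, storage) in DETAIL_SUFFIXES.items():
        if name.endswith(suffix):
            return name[: -len(suffix)], item, storage
    return None
```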

@node SPV Structure Member Format
@subsection Structure Member Format

Structure member XML files claim conformance with a collection of XML
Schemas.  These schemas are distributed, under a nonfree license, with
SPSS binaries.  Fortunately, the schemas are not necessary to
understand the structure members.  To a degree, the schemas can even
be deceptive because they document elements and attributes that are
not in the corpus and lack documentation of elements and attributes
that are commonly found in the corpus.

Structure members use a different XML namespace for each schema, but
these namespaces are not entirely consistent: in some SPV files, for
example, the @code{viewer-tree} schema is associated with namespace
@indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
@indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
additional @file{viewer/} directory).  In any case, the schema URIs
cannot be resolved to obtain the schemas themselves.

One may ignore all of the above in interpreting a structure member.
The actual XML has a simple and straightforward form that does not
require a reader to take schemas or namespaces into account.
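
One way to ignore namespaces in practice, sketched here with Python's
standard @code{xml.etree.ElementTree}, is to strip the
@code{@{namespace@}} qualifier that the parser attaches to element and
attribute names.  The helper names @code{local_name} and
@code{parse_ignoring_namespaces} are hypothetical, and this is only one
of several reasonable approaches.

```python
import xml.etree.ElementTree as ET


def local_name(qualified):
    """Drop ElementTree's "{namespace}" qualifier, if there is one."""
    return qualified.rsplit("}", 1)[-1]


def parse_ignoring_namespaces(xml_bytes):
    """Parse XML and rewrite every element and attribute name to its
    local part, so both namespace variants look the same to a reader."""
    root = ET.fromstring(xml_bytes)
    for element in root.iter():
        if isinstance(element.tag, str):
            element.tag = local_name(element.tag)
        renamed = {local_name(k): v for k, v in element.attrib.items()}
        element.attrib.clear()
        element.attrib.update(renamed)
    return root
```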

@table @code
@item heading
Parent: Document root or @code{heading} @*
Contents: @code{label} [@code{container} | @code{heading}]*

The root of a structure member is a @code{heading}, which represents a
section of output beginning with a title (the @code{label}) and
ordinarily followed by a container for content and possibly further
nested (sub)sections of output.

The following attributes have been observed on both document root and
nested @code{heading} elements:

@table @asis
@item Optional attribute: @code{creator-version}
The version of the software that created this SPV file.  A string of
the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
e.g.@: @code{21000001} is version 21.0.0.1.  Trailing pairs of zeros
are sometimes omitted, so that @code{21}, @code{210000}, and
@code{21000000} are all version 21.0.0.0 (and the corpus contains all
three of those forms).
@end table

The following attributes have been observed on document root
@code{heading} elements only:

@table @asis
@item Optional attribute: @code{creator}
The directory of the software that created this SPV file,
e.g.@: @file{C:\PROGRA~1\IBM\SPSS\STATIS~1\22} or
@file{/Applications/IBM/SPSS/Statistics/22/SPSSStatistics.app/Contents/Resources/Java/../../bin}.

@item Optional attribute: @code{creation-date-time}
The date and time at which the SPV file was written, in a
locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
December 5, 2014 5:00:19 o'clock PM EST}.

@item Optional attribute: @code{lockReader}
Whether a reader should be allowed to edit the output.  The possible
values are @code{true} and @code{false}, but the corpus only contains
@code{false}.

@item Optional attribute: @code{schemaLocation}
This is actually an XML Namespace attribute.  A reader may ignore it.
@end table

The following attributes have been observed only on nested
@code{heading} elements:

@table @asis
@item Required attribute: @code{commandName}
The locale-invariant name of the command that produced the output,
e.g.@: @code{Frequencies} or @code{T-Test}.  For output not specific
to a command, this is simply @code{log}.

@item Optional attribute: @code{visibility}
To what degree the output represented by the element is visible.  The
possible values are @code{visible}, @code{hidden}, and
@code{collapsed}.

@item Optional attribute: @code{locale}
The locale used for output, in Windows format, which is similar to the
format used in Unix with the underscore replaced by a hyphen, e.g.@:
@code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.

@item Optional attribute: @code{olang}
The output language, e.g.@: @code{en}, @code{it}, @code{es},
@code{de}, @code{pt-BR}.
@end table

@item label
Parent: @code{heading} or @code{container} @*
Contents: text

Every @code{heading} and @code{container} holds a @code{label} as its
first child.  The @code{label} of the root @code{heading} in a
structure member always contains the string ``Output''.  Otherwise,
the text in @code{label} describes what it labels, often by naming the
statistical procedure that was executed, e.g.@: ``Frequencies'' or
``T-Test''.  Labels are often very generic, especially within a
@code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''.
Label text is localized according to the output language, e.g.@: in
Italian a frequency table procedure is labeled ``Frequenze''.

The corpus contains one example of an empty label, one that contains
no text.

@item container
Parent: @code{heading} @*
Contents: @code{label} [@code{table} | @code{text}]

A @code{container} serves to label a @code{table} or a @code{text}
item.

@item text
Parent: @code{container} @*
Contents: @code{html}

@item html
Parent: @code{text} @*
Contents: cdata

@item table
Parent: @code{container} @*
Contents: @code{tableStructure}

@item tableStructure
Parent: @code{table} @*
Contents: @code{dataPath}

@item dataPath
Parent: @code{tableStructure} @*
Contents: text
@end table
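
The element grammar above is small enough to walk with a few lines of
Python.  The sketch below decodes @code{creator-version} and yields one
record per @code{heading} or @code{container}, including the
@code{dataPath} text when a container holds a table.  All function
names are hypothetical.  The decoder assumes the value has at most
eight digits and that omitted zero pairs are always trailing, as in the
forms described above.  The sketch reports the @code{dataPath} text
verbatim and makes no assumption about how it maps to a Zip member.

```python
import xml.etree.ElementTree as ET


def decode_creator_version(value):
    """Turn e.g. "21000001" into (21, 0, 0, 1), restoring any omitted
    trailing pairs of zeros first."""
    padded = value.ljust(8, "0")
    return tuple(int(padded[i:i + 2]) for i in range(0, 8, 2))


def local_name(tag):
    """Drop ElementTree's "{namespace}" qualifier, if there is one."""
    return tag.rsplit("}", 1)[-1]


def find_child(element, name):
    """Return the first direct child with local name `name`, or None."""
    if element is None:
        return None
    for sub in element:
        if local_name(sub.tag) == name:
            return sub
    return None


def walk(element, depth=0):
    """Yield (depth, element name, label text, dataPath text or None)
    for `element` and every nested heading or container."""
    name = local_name(element.tag)
    label = find_child(element, "label")
    label_text = (label.text or "") if label is not None else ""
    data_path = None
    if name == "container":
        # container -> table -> tableStructure -> dataPath
        structure = find_child(find_child(element, "table"), "tableStructure")
        path = find_child(structure, "dataPath")
        data_path = path.text if path is not None else None
    yield depth, name, label_text, data_path
    for sub in element:
        if local_name(sub.tag) in ("heading", "container"):
            yield from walk(sub, depth + 1)
```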