X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=spv-file-format.texi;h=eac26048d9bab842283f70096fc31f645186a2c4;hb=c5de9e3e53800a63a035b511dad9c577925867a0;hp=5710603fb1f7130b678a1d1c6540a4ffead02e3f;hpb=056275a2fdb76c15ed7de77a13238cf60b14937c;p=pspp

diff --git a/spv-file-format.texi b/spv-file-format.texi
index 5710603fb1..eac26048d9 100644
--- a/spv-file-format.texi
+++ b/spv-file-format.texi
@@ -1,5 +1,5 @@
-@section SPSS Viewer Format
 @node SPSS Viewer Format
+@section SPSS Viewer Format
 
 SPSS Viewer or @file{.spv} files, here called SPV files, are written by
 SPSS 16 and later to represent the contents of its output editor.
@@ -7,6 +7,10 @@
 This section documents the format. This description is detailed enough to
 read SPV files, but it is probably not sufficient to write them.
 
+As an aside, SPSS 15 and earlier versions use a completely different
+output format based on the Microsoft Compound Document Format. This
+format is not documented.
+
 An SPV file is a Zip archive that can be read with @command{zipinfo}
 and @command{unzip} and similar programs. The final member in the Zip
 archive is a file named @file{META-INF/MANIFEST.MF}. This structure
@@ -14,3 +18,216 @@ makes SPV files resemble Java ``JAR'' files, but whereas a JAR
 manifest contains a sequence of colon-delimited key/value pairs, an SPV
 manifest contains the string @samp{allowPivoting=true}, without a
 new-line.
+
+The rest of the members in an SPV file's Zip archive fall into two
+categories: structure and details. ``Structure'' member names begin
+with @file{outputViewer@var{nnnnnnnnnn}}, where each @var{n} is a
+decimal digit, and end with @file{.xml}, and often include the string
+@file{_heading} in between. Each of these members represents some
+kind of output item (a table, a heading, a block of text, etc.) or a
+group of them. The member whose output goes at the beginning of the
+document is numbered 0, the next member in the output is numbered 1,
+and so on.
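The member layout described above can be sketched in Python with the standard zipfile module. This is only an illustration under stated assumptions, not part of PSPP: the names classify_members and read_spv_members are hypothetical, and the pattern accepts any run of digits after outputViewer rather than exactly ten, as a tolerance chosen for this sketch.

```python
import re
import zipfile

# Assumed pattern, taken from the description above: "outputViewer", then
# decimal digits, then optional text such as "_heading", then ".xml".
STRUCTURE_RE = re.compile(r"^outputViewer(\d+).*\.xml$")


def classify_members(names):
    """Split Zip member names into structure members (in output order)
    and detail members (everything else except the manifest)."""
    structure, details = [], []
    for name in names:
        if name == "META-INF/MANIFEST.MF":
            continue  # the manifest belongs to neither category
        match = STRUCTURE_RE.match(name)
        if match:
            # Keep the numeric index so members can be sorted by it.
            structure.append((int(match.group(1)), name))
        else:
            details.append(name)
    structure.sort()
    return [name for _, name in structure], details


def read_spv_members(source):
    """Open an SPV file (a path or a file-like object) and classify its members."""
    with zipfile.ZipFile(source) as spv:
        return classify_members(spv.namelist())
```

Sorting by the numeric index rather than by archive order follows the statement that member 0 comes first in the document; whether archive order always agrees with it is not something this sketch relies on.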
+
+Structure members contain XML. This XML is sometimes self-contained,
+but it often references other members in the Zip archive named as
+follows:
+
+@table @asis
+@item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
+@itemx @file{@var{prefix}_lightTableData.bin}
+The structure of a table plus its data. Older SPV files pair a
+@file{@var{prefix}_table.xml} file that describes the table's
+structure with a binary @file{@var{prefix}_tableData.bin} file that
+gives its data. Newer SPV files (the majority of those in the corpus)
+instead include a single @file{@var{prefix}_lightTableData.bin} file
+that incorporates both into a single binary format.
+
+@item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
+@itemx @file{@var{prefix}_lightWarningData.bin}
+Same format used for tables, with a different name.
+
+@item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
+@itemx @file{@var{prefix}_lightNotesData.bin}
+Same format used for tables, with a different name.
+
+@item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
+The structure of a chart plus its data. Charts do not have a
+``light'' format.
+
+@item @var{prefix}_model.xml
+@itemx @var{prefix}_pmml.xml
+@itemx @var{prefix}_stats.xml
+Not yet investigated. The corpus contains only one example of each.
+@end table
+
+The @file{@var{prefix}} in the names of the detail members is
+typically an 11-digit decimal number that increases for each item,
+tending to skip values. Older SPV files use different naming
+conventions. Structure members refer to detail members by name, and so
+their exact names do not appear to matter as long as they are unique.
+
+@node SPV Structure Member Format
+@subsection Structure Member Format
+
+Structure member XML files claim conformance with a collection of XML
+Schemas. These schemas are distributed, under a nonfree license, with
+SPSS binaries. Fortunately, the schemas are not necessary to
+understand the structure members. To a degree, the schemas can even
+be deceptive because they document elements and attributes that are
+not in the corpus and lack documentation of elements and attributes
+that are commonly found in the corpus.
+
+Structure members use a different XML namespace for each schema, but
+these namespaces are not entirely consistent: in some SPV files, for
+example, the @code{viewer-tree} schema is associated with namespace
+@indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
+@indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
+additional @file{viewer/} directory). In any case, the schema URIs are
+not resolvable to obtain the schemas themselves.
+
+One may ignore all of the above in interpreting a structure member.
+The actual XML has a simple and straightforward form that does not
+require a reader to take schemas or namespaces into account.
+
+@table @code
+@item heading
+Parent: Document root or @code{heading} @*
+Contents: @code{label} [@code{container} | @code{heading}]*
+
+The root of a structure member is a @code{heading}, which represents a
+section of output beginning with a title (the @code{label}) and
+ordinarily followed by a container for content and possibly further
+nested (sub)-sections of output.
+
+The following attributes have been observed on both document root and
+nested @code{heading} elements:
+
+@table @asis
+@item Optional attribute: @code{creator-version}
+The version of the software that created this SPV file. A string of
+the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
+e.g.@: @code{21000001} is version 21.0.0.1. Trailing pairs of zeros
+are sometimes omitted, so that @code{21}, @code{210000}, and
+@code{21000000} are all version 21.0.0.0 (and the corpus contains all
+three of those forms).
+@end table
+
+The following attributes have been observed on document root
+@code{heading} elements only:
+
+@table @asis
+@item Optional attribute: @code{creator}
+The directory of the software that created this SPV file,
+e.g.@: @file{C:\PROGRA~1\IBM\SPSS\STATIS~1\22} or
+@file{/Applications/IBM/SPSS/Statistics/22/SPSSStatistics.app/Contents/Resources/Java/../../bin}.
+
+@item Optional attribute: @code{creation-date-time}
+The date and time at which the SPV file was written, in a
+locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
+PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
+December 5, 2014 5:00:19 o'clock PM EST}.
+
+@item Optional attribute: @code{lockReader}
+Whether a reader should be allowed to edit the output. The possible
+values are @code{true} and @code{false}, but the corpus only contains
+@code{false}.
+
+@item Optional attribute: @code{schemaLocation}
+This is actually an XML Namespace attribute. A reader may ignore it.
+@end table
+
+The following attributes have been observed only on nested
+@code{heading} elements:
+
+@table @asis
+@item Required attribute: @code{commandName}
+The locale-invariant name of the command that produced the output,
+e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
+
+@item Optional attribute: @code{visibility}
+To what degree the output represented by the element is visible. The
+only observed value is @code{collapsed}.
+
+@item Optional attribute: @code{locale}
+The locale used for output, in Windows format, which is similar to the
+format used in Unix with the underscore replaced by a hyphen, e.g.@:
+@code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
+
+@item Optional attribute: @code{olang}
+The output language, e.g.@: @code{en}, @code{it}, @code{es},
+@code{de}, @code{pt-BR}.
+@end table
+
+@item label
+Parent: @code{heading} or @code{container} @*
+Contents: text
+
+Every @code{heading} and @code{container} holds a @code{label} as its
+first child. The @code{label} of the root @code{heading} in a structure
+member always contains the string ``Output''. Otherwise, the text in
+@code{label} describes what it labels, often by naming the statistical
+procedure that was executed, e.g.@: ``Frequencies'' or ``T-Test''.
+Labels are often very generic, especially within a @code{container},
+e.g.@: ``Title'' or ``Warnings'' or ``Notes''. Label text is localized
+according to the output language, e.g.@: in Italian a frequency table
+procedure is labeled ``Frequenze''.
+
+The corpus contains one example of an empty label, one that contains
+no text.
+
+@item container
+Parent: @code{heading} @*
+Contents: @code{label} [@code{table} | @code{text}]
+
+A @code{container} serves to label a @code{table} or a @code{text}
+item.
+
+@table @asis
+@item Required attribute: @code{visibility}
+Either @code{visible} or @code{hidden}, this indicates whether the
+container's content is displayed.
+
+@item Optional attribute: @code{text-align}
+Presumably indicates the alignment of text within the container. The
+only observed value is @code{left}. Observed with nested @code{table}
+and @code{text} elements.
+
+@item Optional attribute: @code{width}
+The width of the container in the form @code{@var{n}px}, e.g.@:
+@code{1097px}.
+@end table
+
+@item text
+Parent: @code{container} @*
+Contents: @code{html}
+
+@table @asis
+@item Required attribute: @code{type}
+One of @code{title}, @code{log}, or @code{text}.
+
+@item Optional attribute: @code{commandName}
+As on the @code{heading} element. For output not specific to a
+command, this is simply @code{log}. The corpus contains one example
+in which @code{commandName} is present but set to the empty string.
+
+@item Optional attribute: @code{creator-version}
+As on the @code{heading} element.
+@end table
+
+@item html
+Parent: @code{text} @*
+Contents: cdata
+
+@item table
+Parent: @code{container} @*
+Contents: @code{tableStructure}
+
+@item tableStructure
+Parent: @code{table} @*
+Contents: @code{dataPath}
+
+@item dataPath
+Parent: @code{tableStructure} @*
+Contents: text
+@end table
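
As a rough illustration of how a reader might apply the element descriptions above, the following Python sketch decodes creator-version and walks a structure member while ignoring namespaces. It uses only the standard xml.etree.ElementTree module. The function names (parse_creator_version, local_name, outline, data_paths) are hypothetical, and treating dataPath text as a detail member name is an assumption based on the statement that structure members refer to detail members by name.

```python
import xml.etree.ElementTree as ET


def parse_creator_version(value):
    """Decode an "xxyyzzww" string into (xx, yy, zz, ww).

    Omitted trailing pairs are treated as zeros, so "21", "210000" and
    "21000000" all decode to the same version.
    """
    padded = value.ljust(8, "0")
    return tuple(int(padded[i:i + 2]) for i in range(0, 8, 2))


def local_name(tag):
    """Strip any "{namespace}" prefix, since namespaces may be ignored."""
    return tag.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Yield (depth, element name, label text) for each heading and container."""
    label = next((c for c in element if local_name(c.tag) == "label"), None)
    text = (label.text or "") if label is not None else ""
    yield depth, local_name(element.tag), text
    for child in element:
        if local_name(child.tag) in ("heading", "container"):
            yield from outline(child, depth + 1)


def data_paths(root):
    """Collect the text of every dataPath element (assumed to name detail members)."""
    return [e.text for e in root.iter() if local_name(e.tag) == "dataPath"]
```

Comparing on local names only is a deliberate choice here: it lets the same code handle both observed viewer-tree namespace URIs without listing either one.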