X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=spv-file-format.texi;h=1497c49b6dc3beb7cc2a7a76e183275c8081d419;hb=7674958d6669183799289f701e1148b6903b801a;hp=663097c4beff24720ce315b5bc21d29e73085d00;hpb=11e916f9abbb1b09a6b96879a6cd64e7df07e0b7;p=pspp
diff --git a/spv-file-format.texi b/spv-file-format.texi
index 663097c4be..1497c49b6d 100644
--- a/spv-file-format.texi
+++ b/spv-file-format.texi
@@ -1,5 +1,5 @@
-@section SPSS Viewer Format
 @node SPSS Viewer Format
+@section SPSS Viewer Format
 
 SPSS Viewer or @file{.spv} files, here called SPV files, are written
 by SPSS 16 and later to represent the contents of its output editor.
@@ -22,45 +22,104 @@ new-line.
 The rest of the members in an SPV file's Zip archive fall into two
 categories: structure and details. ``Structure'' member names begin
 with @file{outputViewer@var{nnnnnnnnnn}}, where each @var{n} is a
-decimal digit, and end with @file{.xml}. Each of these members
-represents some kind of output item (a table, a heading, a block of
-text, etc.) or a group of them. The member whose output goes at the
-beginning of the document is numbered 0, the next member in the output
-is numbered 1, and so on.
+decimal digit, and end with @file{.xml}, and often include the string
+@file{_heading} in between. Each of these members represents some
+kind of output item (a table, a heading, a block of text, etc.) or a
+group of them. The member whose output goes at the beginning of the
+document is numbered 0, the next member in the output is numbered 1,
+and so on.
 
 Structure members contain XML. This XML is sometimes self-contained,
 but it often references other members in the Zip archive named as
 follows:
 
 @table @asis
-@item @file{*_table.xml} and @file{*_tableData.bin}
-@itemx @file{*_lightTableData.bin}
+@item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
+@itemx @file{@var{prefix}_lightTableData.bin}
 The structure of a table plus its data. Older SPV files pair a
-@file{*_table.xml} file that describes the table's structure with a
-binary @file{*_tableData.bin} file that gives its data. Newer SPV
-files (the majority of those in the corpus) instead include a single
-@file{*_lightTableData.bin} file that incorporates both into a single
-binary format.
-
-@item @file{*_warning.xml} and @file{*_warningData.bin}
-@itemx @file{*_lightWarningData.bin}
+@file{@var{prefix}_table.xml} file that describes the table's
+structure with a binary @file{@var{prefix}_tableData.bin} file that
+gives its data. Newer SPV files (the majority of those in the corpus)
+instead include a single @file{@var{prefix}_lightTableData.bin} file
+that incorporates both into a single binary format.
+
+@item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
+@itemx @file{@var{prefix}_lightWarningData.bin}
 Same format used for tables, with a different name.
 
-@item @file{*_notes.xml} and @file{*_notesData.bin}
-@itemx @file{*_lightNotesData.bin}
+@item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
+@itemx @file{@var{prefix}_lightNotesData.bin}
 Same format used for tables, with a different name.
 
-@item @file{*_chartData.bin} and @file{*_chart.xml}
+@item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
 The structure of a chart plus its data. Charts do not have a
 ``light'' format.
 
-@item *_model.xml
-@itemx *_pmml.xml
-@itemx *_stats.xml
+@item @var{prefix}_model.xml
+@itemx @var{prefix}_pmml.xml
+@itemx @var{prefix}_stats.xml
 Not yet investigated. The corpus contains only one example of each.
 @end table
 
-The @file{*} in the names of these members is typically an 11-digit
-decimal number that increases for each item, tending to skip values.
-Older files use different naming convention, and the exact names do
-not appear to matter as long as they are unique.
+The @file{@var{prefix}} in the names of the detail members is
+typically an 11-digit decimal number that increases for each item,
+tending to skip values. Older SPV files use different naming
+conventions. Structure members refer to detail members by name, and so
+their exact names do not appear to matter as long as they are unique.
+
+@node SPV Structure Member Format
+@subsection Structure Member Format
+
+Structure member XML files claim conformance with a collection of XML
+Schemas. These schemas are distributed, under a nonfree license, with
+SPSS binaries. Fortunately, the schemas are not necessary to
+understand the structure members. To a degree, the schemas can even
+be deceptive because they document elements and attributes that are
+not in the corpus and lack documentation of elements and attributes
+that are commonly found in the corpus.
+
+Structure members use a different XML namespace for each schema, but
+these namespaces are not entirely consistent: in some SPV files, for
+example, the @code{viewer-tree} schema is associated with namespace
+@indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
+@indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
+additional @file{viewer/} directory). In any case, the schema URIs are
+not resolvable to obtain the schemas themselves.
+
+One may ignore all of the above in interpreting a structure member.
+The actual XML has a simple and straightforward form that does not
+require a reader to take schemas or namespaces into account.
+
+@table @code
+@item heading
+Parent: Document root or @code{heading} @*
+Contents: @code{label} [@code{container} | @code{heading}]*
+
+The root of a structure member is a @code{heading}, which represents a
+section of output beginning with a title (the @code{label}) and
+ordinarily followed by a container for content and possibly further
+nested (sub)-sections of output.
+
+@item label
+Parent: @code{heading} or @code{container} @*
+Contents: text
+
+Every @code{heading} and @code{container} holds a @code{label} as its
+first child. The root @code{heading} in a structure member always
+contains the string ``Output''. Otherwise, the text in @code{label}
+describes what it labels, often by naming the statistical procedure
+that was executed, e.g.@: ``Frequencies'' or ``T-Test''. Labels are
+often very generic, especially within a @code{container}, e.g.@:
+``Title'' or ``Warnings'' or ``Notes''. Label text is localized
+according to the output language, e.g.@: in Italian, a frequency table
+procedure is labeled ``Frequenze''.
+
+The corpus contains one example of an empty label, one that contains
+no text.
+
+@item container
+Parent: @code{heading} @*
+Contents: @code{label} [@code{table} | @code{text}]
+
+A @code{container} is the immediate parent of a
+@end table
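A minimal Python sketch of the member-naming rules described in this patch follows. It is illustrative only and is not PSPP's implementation: classify_member(), detail_prefix(), and structure_members_in_order() are hypothetical names, the pattern assumes that _heading is the only infix a structure member name can carry, and detail members are recognized purely by the suffixes listed in the table, since their prefixes only need to be unique.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical helpers illustrating the naming rules; not PSPP code.
# Structure members: "outputViewer", exactly ten decimal digits, an optional
# "_heading" (assumed to be the only possible infix), then ".xml".
STRUCTURE_RE = re.compile(r"^outputViewer([0-9]{10})(?:_heading)?\.xml$")

# Detail-member suffixes from the table; the prefix before them is arbitrary,
# so only the suffix is checked.
DETAIL_SUFFIXES = (
    "_table.xml", "_tableData.bin", "_lightTableData.bin",
    "_warning.xml", "_warningData.bin", "_lightWarningData.bin",
    "_notes.xml", "_notesData.bin", "_lightNotesData.bin",
    "_chart.xml", "_chartData.bin",
    "_model.xml", "_pmml.xml", "_stats.xml",
)


def classify_member(name: str) -> str:
    """Return "structure", "detail", or "other" for a Zip member name."""
    if STRUCTURE_RE.match(name):
        return "structure"
    if name.endswith(DETAIL_SUFFIXES):  # str.endswith accepts a tuple
        return "detail"
    return "other"


def detail_prefix(name: str) -> Optional[str]:
    """Return the part of a detail member name before its suffix, or None."""
    for suffix in DETAIL_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return None


def structure_members_in_order(names: Iterable[str]) -> List[str]:
    """Keep only structure members, sorted by number (document order)."""
    numbered = []
    for name in names:
        m = STRUCTURE_RE.match(name)
        if m:
            numbered.append((int(m.group(1)), name))
    return [name for _, name in sorted(numbered)]
```

For a real file, passing zipfile.ZipFile(path).namelist() to structure_members_in_order() would yield the structure members in output order. Sorting on the parsed number states the ordering rule explicitly instead of relying on the digits being fixed-width.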
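Because a structure member reduces to nested heading and container elements that each begin with a label, a document outline can be recovered without the schemas. The Python sketch below assumes exactly the element grammar in the table above. outline() and SAMPLE are hypothetical, and SAMPLE is a hand-written, simplified document: real structure members carry attributes and table or text contents that are omitted here. The sketch compares local tag names only, discarding whatever namespace URI a file happens to use, and reports an empty label as an empty string.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hand-written, simplified example; not taken from a real SPV file.
SAMPLE = """\
<heading xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
  <label>Output</label>
  <heading>
    <label>Frequencies</label>
    <container><label>Title</label><text/></container>
    <container><label>Notes</label><table/></container>
  </heading>
</heading>
"""

Outline = List[Tuple[int, str, str]]


def local_name(tag: str) -> str:
    """Strip a "{namespace-uri}" prefix so every namespace variant matches."""
    return tag.rsplit("}", 1)[-1]


def outline(elem: ET.Element, depth: int = 0,
            out: Optional[Outline] = None) -> Outline:
    """Collect (depth, kind, label text) for each heading and container."""
    if out is None:
        out = []
    kind = local_name(elem.tag)
    if kind not in ("heading", "container"):
        return out
    children = list(elem)
    # The label, when present, is the first child; an empty label has no text.
    label = ""
    if children and local_name(children[0].tag) == "label":
        label = children[0].text or ""
    out.append((depth, kind, label))
    # Per the grammar, only a heading has heading or container children; a
    # container's other child is a table or text element, not descended into.
    for child in children:
        if local_name(child.tag) in ("heading", "container"):
            outline(child, depth + 1, out)
    return out


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for depth, kind, label in outline(root):
        print("  " * depth + f"{kind}: {label}")
```

Run as a script, it prints the outline indented by nesting depth. Recursing only into heading and container elements keeps the walk independent of the table and text element formats.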