From 11e916f9abbb1b09a6b96879a6cd64e7df07e0b7 Mon Sep 17 00:00:00 2001
From: Ben Pfaff
Date: Tue, 28 Jul 2015 23:30:22 -0700
Subject: [PATCH] spv-file-format: Work.

---
 spv-file-format.texi | 50 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/spv-file-format.texi b/spv-file-format.texi
index 5710603fb1..663097c4be 100644
--- a/spv-file-format.texi
+++ b/spv-file-format.texi
@@ -7,6 +7,10 @@ This section documents the format. This description is detailed
 enough to read SPV files, but it is probably not sufficient to write
 them.
 
+As an aside, SPSS 15 and earlier versions use a completely different
+output format based on the Microsoft Compound Document Format. This
+format is not documented.
+
 An SPV file is a Zip archive that can be read with @command{zipinfo}
 and @command{unzip} and similar programs. The final member in the Zip
 archive is a file named @file{META-INF/MANIFEST.MF}. This structure
@@ -14,3 +18,49 @@ makes SPV files resemble Java ``JAR'' files, but whereas a JAR
 manifest contains a sequence of colon-delimited key/value pairs, an
 SPV manifest contains the string @samp{allowPivoting=true}, without a
 new-line.
+
+The rest of the members in an SPV file's Zip archive fall into two
+categories: structure and details. ``Structure'' member names begin
+with @file{outputViewer@var{nnnnnnnnnn}}, where each @var{n} is a
+decimal digit, and end with @file{.xml}. Each of these members
+represents some kind of output item (a table, a heading, a block of
+text, etc.) or a group of them. The member whose output goes at the
+beginning of the document is numbered 0, the next member in the output
+is numbered 1, and so on.
+
+Structure members contain XML. This XML is sometimes self-contained,
+but it often references other members in the Zip archive named as
+follows:
+
+@table @asis
+@item @file{*_table.xml} and @file{*_tableData.bin}
+@itemx @file{*_lightTableData.bin}
+The structure of a table plus its data.
Older SPV files pair a
+@file{*_table.xml} file that describes the table's structure with a
+binary @file{*_tableData.bin} file that gives its data. Newer SPV
+files (the majority of those in the corpus) instead include a single
+@file{*_lightTableData.bin} file that incorporates both into a single
+binary format.
+
+@item @file{*_warning.xml} and @file{*_warningData.bin}
+@itemx @file{*_lightWarningData.bin}
+Same format used for tables, with a different name.
+
+@item @file{*_notes.xml} and @file{*_notesData.bin}
+@itemx @file{*_lightNotesData.bin}
+Same format used for tables, with a different name.
+
+@item @file{*_chartData.bin} and @file{*_chart.xml}
+The structure of a chart plus its data. Charts do not have a
+``light'' format.
+
+@item *_model.xml
+@itemx *_pmml.xml
+@itemx *_stats.xml
+Not yet investigated. The corpus contains only one example of each.
+@end table
+
+The @file{*} in the names of these members is typically an 11-digit
+decimal number that increases for each item, tending to skip values.
+Older files use a different naming convention, and the exact names do
+not appear to matter as long as they are unique.
-- 
2.30.2