X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=spv-file-format.texi;h=0b9ad8cf6e7c076f5db984f908738581043668e9;hb=0921b1b712bd855defb41171764f27c4ce6fb4b1;hp=15f195623d83f3b50ba41be88a72eb0ffb6c07e5;hpb=86b334a2694bd7e55392531ddb6dc0f2eda6a063;p=pspp diff --git a/spv-file-format.texi b/spv-file-format.texi index 15f195623d..0b9ad8cf6e 100644 --- a/spv-file-format.texi +++ b/spv-file-format.texi @@ -9,7 +9,7 @@ write them. An an aside, SPSS 15 and earlier versions use a completely different output format based on the Microsoft Compound Document Format. This -format is not documented. +format is not documented here. An SPV file is a Zip archive that can be read with @command{zipinfo} and @command{unzip} and similar programs. The final member in the Zip @@ -55,10 +55,12 @@ Same format used for tables, with a different name. The structure of a chart plus its data. Charts do not have a ``light'' format. -@item @var{prefix}_model.xml -@itemx @var{prefix}_pmml.xml -@itemx @var{prefix}_stats.xml +@item @var{prefix}_model.scf +@itemx @var{prefix}_pmml.scf Not yet investigated. The corpus contains only one example of each. + +@item @var{prefix}_stats.xml +Not yet investigated. The corpus contains few examples. @end table The @file{@var{prefix}} in the names of the detail members is @@ -81,9 +83,9 @@ that are commonly found in the corpus. Structure members use a different XML namespace for each schema, but these namespaces are not entirely consistent: in some SPV files, for example, the @code{viewer-tree} schema is associated with namespace -@indicateurl{http://xml.spss.com/spss/viewer-tree} and in other with +@indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the -additional @file{viewer/} directory. In any case, the schema URIs are +additional @file{viewer/}). In any case, the schema URIs are not resolvable to obtain the schemas themselves. 
One may ignore all of the above in interpreting a structure member. @@ -93,12 +95,14 @@ require a reader to take schemas or namespaces into account. @table @code @item heading Parent: Document root or @code{heading} @* -Contents: @code{label} [@code{container} | @code{heading}]* +Contents: [@code{pageSetup}] @code{label} [@code{container} | @code{heading}]* The root of a structure member is a @code{heading}, which represents a section of output beginning with a title (the @code{label}) and -ordinarily followed by a container for content and possibly further -nested (sub)-sections of output. +ordinarily followed by content containers or further nested +(sub)-sections of output. + +The document root heading may also contain a @code{pageSetup} element. The following attributes have been observed on both document root and nested @code{heading} elements: @@ -170,7 +174,7 @@ describes what it labels, often by naming the statistical procedure that was executed, e.g.@: ``Frequencies'' or ``T-Test''. Labels are often very generic, especially within a @code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''. Label text is localized -according to the output language, e.g. in Italian a frequency table +according to the output language, e.g.@: in Italian a frequency table procedure is labeled ``Frequenze''. The corpus contains one example of an empty label, one that contains @@ -202,6 +206,10 @@ The width of the container in the form @code{@var{n}px}, e.g.@: Parent: @code{container} @* Contents: @code{html} +This @code{text} element is nested inside a @code{container}. There +is a different @code{text} element that is nested inside a +@code{pageParagraph}. + @table @asis @item Required attribute: @code{type} One of @code{title}, @code{log}, or @code{text}. @@ -268,4 +276,644 @@ Contents: text Contains the name of the Zip member that holds the table details, e.g.@: @code{0000000001437_lightTableData.bin}. 
+ +@item pageSetup +Parent: @code{heading} @* +Contents: @code{pageHeader} @code{pageFooter} + +@table @asis +@item Required attribute: @code{initial-page-number} +Always @code{1}. + +@item Optional attribute: @code{chart-size} +Always @code{as-is} or a localization (!) of it (e.g.@: @code{dimensione +attuale}, @code{Wie vorgegeben}). + +@item Optional attribute: @code{margin-left} +@itemx Optional attribute: @code{margin-right} +@itemx Optional attribute: @code{margin-top} +@itemx Optional attribute: @code{margin-bottom} +Margin sizes in the form @code{@var{size}in}, e.g.@: @code{0.25in}. + +@item Optional attribute: @code{paper-height} +@itemx Optional attribute: @code{paper-width} +Paper sizes in the form @code{@var{size}in}, e.g.@: @code{8.5in} by +@code{11in} for letter paper or @code{8.267in} by @code{11.692in} for +A4 paper. + +@item Optional attribute: @code{reference-orientation} +Always @code{0deg}. + +@item Optional attribute: @code{space-after} +Always @code{12pt}. +@end table + +@item pageHeader +@itemx pageFooter +Parent: @code{pageSetup} @* +Contents: @code{pageParagraph}* + +No attributes. + +@item pageParagraph +Parent: @code{pageHeader} or @code{pageFooter} @* +Contents: @code{text} + +Text to go at the top or bottom of a page, respectively. + +@item text +Parent: @code{pageParagraph} @* +Contents: [cdata] + +This @code{text} element is nested inside a @code{pageParagraph}. There +is a different @code{text} element that is nested inside a +@code{container}. + +The element is either empty, or contains cdata that holds almost-XHTML +text: in the corpus, either an @code{html} or @code{p} element. It is +@emph{almost}-XHTML because the @code{html} element designates the +default namespace as +@code{http://xml.spss.com/spss/viewer/viewer-tree} instead of an XHTML +namespace. + +The cdata can contain substitution variables: @code{&[Page]} for the +page number and @code{&[PageTitle]} for the page title. 
+ +Typical contents (indented for clarity): + +@example + + + +

Page &[Page]

+ + +@end example + +@table @asis +@item Required attribute: @code{type} +Always @code{text}. +@end table @end table + +@node SPV Light Detail Member Format +@subsection Light Detail Member Format + +A ``light'' detail member @file{.bin} consists of a number of sections +concatenated together, terminated by a byte 01: + +@example +light-member := header title styles dimensions data 01 +@end example + +The first section is a 0x27-byte header: + +@example +header := 01 00 version 01 (00 | 01) byte*21 00 00 table-id byte*4 +version := i1 | i3 +table-id := int +@end example + +@code{header} includes @code{version}, a version number that affects +the interpretation of some of the other data in the member. We will +refer to ``version 1'' and ``version 3'' members later on. +@code{table-id} is a binary version of the @code{tableId} attribute in +the structure member that refers to the detail member. For example, +if @code{tableId} is @code{-4154297861994971133}, then @code{table-id} +would be 0xdca00003. The meaning of the other variable parts of the +header is not known. + +@example +title := value 01? /* @r{localized title} */ + value 01? 31 /* @r{subtype} */ + value 01? 00? 
58 /* @r{locale-invariant title} */ + (31 value | 58) /* @r{caption} */ + int[n] footnote*[n] /* @r{footnotes} */ +footnote := value (31 value | 58) byte*4 +@end example + +@example +styles := 00 font*8 + int[x1] byte*[x1] + int[x2] byte*[x2] + int[x3] byte*[x3] + int[x4] int*[x4] + string[encoding] + (i0 | i-1) (00 | 01) 00 (00 | 01) + int + byte[decimal] byte[grouping] + int[n-ccs] string*[n-ccs] /* @r{custom currency} */ + styles2 + +x2 := 00 00 00 01 00 00 00 00 00 00 00 00 00 02 00 00 00 00 /* @r{18 bytes} */ + +styles2 := i0 /* @r{version 1} */ +styles2 := count(count(x5) count(x6)) /* @r{version 3} */ +x5 := byte*33 int[n] int*n +x6 := 01 00 (03 | 04) 00 00 00 + string[command] string[subcommand] + string[language] string[charset] string[locale] + (00 | 01) 00 (00 | 01) (00 | 01) + int + byte[decimal] byte[grouping] + byte*8 01 + (string[dataset] string[datafile] i0 int i0)? + int[n-ccs] string*[n-ccs] + 2e (00 | 01) (i2000000 i0)? +@end example + +In every example in the corpus, @code{x1} is 240. The meaning of the +bytes that follow it is unknown. + +In every example in the corpus, @code{x2} is 18 and the bytes that +follow it are @code{00 00 00 01 00 00 00 00 00 00 00 00 00 02 00 00 00 +00}. The meaning of these bytes is unknown. + +In every example in the corpus for version 1, @code{x3} is 16 and the +bytes that follow it are @code{00 00 00 01 00 00 00 01 00 00 00 00 01 +01 01 01}. In version 3, observed @code{x3} varies from 117 to 150, +and its bytes include a 1-byte count at offset 0x34. When the count +is nonzero, a text string of that length at offset 0x35 is the name of +a ``TableLook'', e.g. ``Default'' or ``Academic''. + +Observed values of @code{x4} vary from 0 to 17. Out of 7060 examples +in the corpus, it is nonzero only 36 times. + +@code{encoding} is a character encoding, usually a Windows code page +such as @code{en_US.windows-1252} or @code{it_IT.windows-1252}. The +encoding string is itself encoded in US-ASCII. 
The rest of the +character strings in the file use this encoding. + +@code{decimal} is the decimal point character. The observed values +are @samp{.} and @samp{,}. + +@code{grouping} is the grouping character. The observed values are +@samp{,}, @samp{.}, @samp{'}, @samp{ }, and zero (presumably +indicating that digits should not be grouped). + +@code{n-ccs} is observed as either 0 or 5. When it is 5, the +following strings are CCA through CCE format strings. Most commonly +these are all @code{-,,,} but other strings occur. + +@example +font := byte[index] 31 string[typeface] + 00 00 + (10 | 20 | 40 | 50 | 70 | 80)[f1] + 41 + (i0 | i1 | i2)[f2] + 00 + (i0 | i2 | i64173)[f3] + (i0 | i1 | i2 | i3)[f4] + string[fgcolor] string[bgcolor] + i0 i0 00 + (v3: int[f5] int[f6] int[f7] int[f8]) +@end example + +Each @code{font}, in order, represents the font style for a different +element: title, caption, footnote, row labels, column labels, corner +labels, data, and layers. + +@code{index} is the 1-based index of the @code{font}, i.e. 1 for the +first @code{font}, through 8 for the final @code{font}. + +@code{typeface} is the string name of the font. In the corpus, this +is @code{SansSerif} in over 99% of instances and @code{Times New +Roman} in the rest. + +@code{fgcolor} and @code{bgcolor} are the foreground color and +background color, respectively. In the corpus, these are always +@code{#000000} and @code{#ffffff}, respectively. + +The meaning of the remaining data is unknown. It seems likely to +include font sizes, horizontal and vertical alignment, attributes such +as bold or italic, and margins. @code{f1} is @code{40} most of the +time. @code{f2} is @code{i1} most of the time for the title and +@code{i0} most of the time for other fonts. + +The table below lists the values observed in the corpus. When a cell +contains a single value, then 99+% of the corpus contains that value. 
+When a cell contains a pair of values, then the first value is seen in +about two-thirds of the corpus and the second value in about the +remaining one-third. In fonts that include multiple pairs, values are +correlated, that is, for font 3, f5 = 24, f6 = 24, f7 = 2 appears +about two-thirds of the time, as does the combination of f4 = 0, f6 = +10 for font 7. + +@example +font f1 f2 f3 f4 f5 f6 f7 f8 + + 1 40 1 0 0 8 10/11 1 8 + 2 40 0 2 1 8 10/11 1 1 + 3 40 0 2 1 24/11 24/ 8 2/3 4 + 4 40 0 2 3 8 10/11 1 1 + 5 40 0 0 1 8 10/11 1 4 + 6 40 0 2 1 8 10/11 1 4 + 7 40 0 64173 0/1 8 10/11 1 1 + 8 40 0 2 3 8 10/11 1 4 +@end example + +@example +dimensions := int[n-dims] dimension*[n-dims] +dimension := value[name] + byte[d1] + (00 | 01 | 02)[d2] + (i0 | i2)[d3] + (00 | 01)[d4] + (00 | 01)[d5] + 01 + int[d6] + int[n-categories] category*[n-categories] +@end example + +@code{name} is the name of the dimension, e.g.@: @code{Variables}, +@code{Statistics}, or a variable name. + +@code{d1} is usually 0 but many other values have been observed. + +@code{d3} is 2 over 99% of the time. + +@code{d5} is 0 over 99% of the time. + +@code{d6} is either -1 or the 0-based index of the dimension, e.g.@: 0 +for the first dimension, 1 for the second, and so on. The latter is +the case 98% of the time in the corpus. + +@example +category := value[name] (terminal | group) +terminal := 00 00 00 i2 int[index] i0 +@end example + +@code{name} is the name of the category (or group). + +@code{category} can represent a terminal category. In that case, +@code{index} is a nonnegative integer less than @code{n-categories} in +the @code{dimension} in which the @code{category} is nested (directly +or indirectly). 
+ +Alternatively, @code{category} can represent a @code{group} of nested +categories: + +@example +group := (00 | 01)[merge] 00 01 (i0 | i2)[data] + i-1 int[n-subcategories] category*[n-subcategories] +@end example + +Ordinarily a group has some nested content, so that +@code{n-subcategories} is positive, but a few instances of groups with +@code{n-subcategories} 0 have been observed. + +If @code{merge} is 00, the most common value, then the group is really +a distinct group that should be represented as such in the visual +representation and user interface. If @code{merge} is 01, however, +the categories in this group should be shown and treated as if they +were direct children of the group's parent group (or if it has no +parent group, then direct children of the dimension), and this group's +name is irrelevant and should not be displayed. (Merged groups can be +nested!) + +@code{data} appears to be i2 when all of the categories within a group +are terminal categories that directly represent data values for a +variable (e.g.@: in a frequency table or crosstabulation, a group of +values in a variable being tabulated) and i0 otherwise, but this might +be naive. + +@example +data := int[layers] int[rows] int[columns] int*[n-dimensions] + int[n-data] datum*[n-data] +@end example + +The values of @code{layers}, @code{rows}, and @code{columns} each +specify the number of dimensions represented in layers, rows, or +columns, respectively, and they sum to the number of +dimensions. + +The @code{n-dimensions} integers are a permutation of the 0-based +dimension numbers. The first @code{layers} of them specify each of +the dimensions represented by layers, the next @code{rows} of them +specify the dimensions represented by rows, and the final +@code{columns} of them specify the dimensions represented by columns. +When there is more than one dimension of a given kind, the inner +dimensions are given first. + +@example +datum := int64[index] 00? 
value /* @r{version 1} */ +datum := int64[index] value /* @r{version 3} */ +@end example + +The format of a datum varies slightly from version 1 to version 3: in +version 1 it allows for an extra optional 00 byte. + +A datum consists of an index and a value. Suppose there are @math{d} +dimensions and dimension @math{i} for @math{0 \le i < d} has +@math{n_i} categories. Consider the datum at coordinates @math{x_i} +for @math{0 \le i < d}; note that @math{0 \le x_i < n_i}. Then the +index is calculated by the following algorithm: + +@display +let index = 0 +for each @math{i} from 0 to @math{d - 1}: + index = @math{n_i \times} index + @math{x_i} +@end display + +For example, suppose there are 3 dimensions with 3, 4, and 5 +categories, respectively. The datum at coordinates (1, 2, 3) has +index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}. + +@example +value := 00? 00? 00? 00? raw-value +raw-value := + 01 value-mod int[format] double[x] + | 02 value-mod int[format] double[x] + string[varname] string[vallab] (01 | 02 | 03) + | 03 string[local] value-mod string[id] string[c] (00 | 01)[type] + | 04 value-mod int[format] string[vallab] string[varname] + (01 | 02 | 03) string[s] + | 05 value-mod string[varname] string[varlabel] (01 | 02 | 03) + | value-mod string[format] int[n-args] arg*[n-args] +arg := + i0 value + | int[x] i0 value*[x + 1] /* @r{x > 0} */ +@end example + +A @code{value} boils down to a number or a string. There are several +possibilities, which one can distinguish by the first nonzero byte in +the encoding: + +@table @code +@item 01 +The numeric value @code{x}, presented to the user formatted according +to @code{format}, which is in the format described for system files. +@xref{System File Output Formats}, for details. Most commonly +@code{format} has width 40 (the maximum). + +An @code{x} with the maximum negative double @code{-DBL_MAX} +represents the system-missing value SYSMIS. (HIGHEST and LOWEST have +not been observed.) 
@xref{System File Format}, for more about these +special values. + +@item 02 +Similar to @code{01}, with the additional information that @code{x} is +a value of variable @code{varname} and has value label @code{vallab}. +Both @code{varname} and @code{vallab} can be the empty string, the +latter very commonly. + +The meaning of the final byte is unknown. Possibly it is connected to +whether the value or the label should be displayed. + +@item 03 +A text string, in two forms: @code{c} is in English, and sometimes +abbreviated or obscure, and @code{local} is localized to the user's +locale. In an English-language locale, the two strings are often the +same, and in the cases where they differ, @code{local} is more +appropriate for a user interface, e.g.@: @code{c} of ``Not a PxP table +for MCN...'' versus @code{local} of ``Computed only for a PxP table, +where P must be greater than 1.'' + +@code{c} and @code{local} are always either both empty or both +nonempty. + +@code{id} is a brief identifying string whose form seems to resemble a +programming language identifier, e.g.@: @code{cumulative_percent} or +@code{factor_14}. It is not unique. + +@code{type} is 00 for text taken from user input, such as syntax +fragments, expressions, file names, and data set names, and 01 for fixed +text strings such as names of procedures or statistics. In the former +case, @code{id} is always the empty string; in the latter case, +@code{id} is still sometimes empty. + +@item 04 +The string value @code{s}, presented to the user formatted according +to @code{format}. The format for a string is not too interesting, and +clearly invalid formats like A16.39 or A255.127 or A134.1 abound in +the corpus, so readers should probably ignore the format entirely. + +@code{s} is a value of variable @code{varname} and has value label +@code{vallab}. @code{varname} is never empty but @code{vallab} is +commonly empty. + +The meaning of the final byte is unknown. 
+ +@item 05 +Variable @code{varname}, which is rarely observed as empty in the +corpus, with variable label @code{varlabel}, which is often empty. + +The meaning of the final byte is unknown. + +@item 31 +@itemx 58 +(These bytes begin a @code{value-mod}.) A format string, analogous to +@code{printf}, followed by one or more arguments, each of which has +one or more values. The format string uses the following syntax: + +@table @code +@item \% +@item \: +@item \[ +@item \] +Each of these expands to the character following @samp{\\}. This is +useful to escape characters that have special meaning in format +strings. These are effective inside and outside the @code{[@dots{}]} +syntax forms described below. + +@item \n +Expands to a new-line, inside or outside the @code{[@dots{}]} forms +described below. + +@item ^@var{i} +Expands to a formatted version of argument @var{i}, which must have +only a single value. For example, @code{^1} would expand to the first +argument's @code{value}. + +@item [:@var{a}:]@var{i} +Expands @var{a} for each of the @code{value}s in @var{i}. @var{a} +should contain one or more @code{^@var{j}} conversions, which are +drawn from the values for argument @var{i} in order. Some examples +from the corpus: + +@table @code +@item [:^1:]1 +All of the values for the first argument, concatenated. + +@item [:^1\n:]1 +Expands to the values for the first argument, each followed by +a new-line. + +@item [:^1 = ^2:]2 +Expands to @code{@var{x} = @var{y}} where @var{x} is the second +argument's first value and @var{y} is its second value. (This would +be used only if the argument has two values. With additional values, +the second and third values would be directly concatenated, which +would look funny.) +@end table + +@item [@var{a}:@var{b}:]@var{i} +This extends the previous form so that the first values are expanded +using @var{a} and later values are expanded using @var{b}. 
For an +unknown reason, within @var{a} the @code{^@var{j}} conversions are +instead written as @code{%@var{j}}. Some examples from the corpus: + +@table @code +@item [%1:*^1:]1 +Expands to all of the values for the first argument, separated by +@samp{*}. + +@item [%1 = %2:, ^1 = ^2:]1 +Given appropriate values for the first argument, expands to @code{X = +1, Y = 2, Z = 3}. + +@item [%1:, ^1:]1 +Given appropriate values, expands to @code{1, 2, 3}. +@end table +@end table + +The format string is localized to the user's locale. +@end table + +@example +value-mod := + 31 i0 (i0 | i1 string[subscript]) value-mod-i0-v1 /* @r{version 1} */ + | 31 i0 (i0 | i1 string[subscript]) value-mod-i0-v3 /* @r{version 3} */ + | 31 i1 int[footnote-number] format + | 31 i2 (00 | 01 | 02) 00 (i1 | i2 | i3) format + | 31 i3 00 00 01 00 i2 format + | 58 +value-mod-i0-v1 := 00 (i1 | i2) 00 00 int 00 00 +value-mod-i0-v3 := count(format-string + (58 | 31 style) + (58 + | 31 i0 i0 i0 i0 01 00 (01 | 02 | 08) + 00 08 00 0a 00)) + +style := 01? 00? 00? 00? 01 string[fgcolor] string[bgcolor] string[font] byte +format := 00 00 count(format-string (58 | 31 style) 58) +format-string := count((i0 (58 | 31 string))?) +@end example + +A @code{value-mod} can specify special modifications to a @code{value}: + +@itemize @bullet +@item +The @code{footnote-number}, if present, specifies a footnote that the +@code{value} references. The footnote's marker is shown appended to +the main text of the @code{value}, as a superscript. + +@item +The @code{subscript}, if present, specifies a string to append to the +main text of the @code{value}, as a subscript. The subscript text is +normally a brief indicator, e.g.@: @samp{a} or @samp{a,b}, with its +meaning indicated by the table caption. In this usage, subscripts are +similar to footnotes; one apparent difference is that a @code{value} +can only reference one footnote but a subscript can list more than one +letter. 
+ +@item +The @code{format}, if present, is a format string for substitutions +using the syntax explained previously. It appears to be an +English-language version of the localized format string in the +@code{value} in which the @code{format} is nested. + +@item +The @code{style}, if present, changes the style for this individual +@code{value}. +@end itemize + +@node SPV Legacy Detail Member Binary Format +@subsection SPV Legacy Detail Member Binary Format + +Whereas the light binary format represents everything about a given +pivot table, the legacy binary format conceptually consists of a +number of named sources, each of which consists of a number of named +series, each of which is a 1-dimensional array of numbers or strings +or a mix. Thus, the legacy binary file format is quite simple. + +@example +legacy-binary := 00 byte[version] int16[n-sources] int[file-size] + metadata*[n-sources] data*[n-sources] +@end example + +@code{version} is a version number that affects the interpretation of +some of the other data in the member. Versions 0xaf and 0xb0 are +known. We will refer to ``version 0xaf'' and ``version 0xb0'' members +later on. + +A legacy member consists of @code{n-sources} data sources, each of +which has @code{metadata} and @code{data}. + +@code{file-size} is the size of the file, in bytes. + +@example +/* @r{version 0xaf} */ +metadata := int[per-series] int[n-series] int[ofs] byte*32[source-name] + +/* @r{version 0xb0} */ +metadata := int[per-series] int[n-series] int[ofs] byte*64[source-name] int[x] +@end example + +A data source consists of @code{n-series} series of data, with +@code{per-series} data values per series. + +@code{source-name} is a 32- or 64-byte string padded on the right with +zero bytes. The names that appear in the corpus are very generic, +usually @code{tableData} or @code{source0}. + +The @code{ofs} is the offset, in bytes, from the beginning of the file +to the start of this data source's @code{data}. 
This allows programs +to skip to the beginning of the data for a particular source; it is +also needed to determine whether a source includes any string data +(see below). + +The meaning of @code{x} in version 0xb0 is unknown. + +@example +data := numeric-data string-data? +numeric-data := numeric-series*[n-series] +numeric-series := byte*288[series-name] double*[per-series] +@end example + +Data follow the metadata in the legacy binary format, with sources in +the same order. Each series begins with a @code{series-name}, which +generally indicates its role in the pivot table, e.g.@: ``cell'', +``cellFormat'', ``dimension0categories'', ``dimension0group0''. The +name is followed by the data, one double per element in the series. A +double with the maximum negative double @code{-DBL_MAX} represents the +system-missing value SYSMIS. + +@example +string-data := i1 string[source-name] pairs labels + +pairs := int[n-string-series] pair-series*[n-string-series] +pair-series := string[pair-series-name] int[n-pairs] pair*[n-pairs] +pair := int[i] int[j] + +labels := int[n-labels] label*[n-labels] +label := int[frequency] string[s] +@end example + +A source may include a mix of numeric and string data values. When a +source includes any string data, the data values that are strings are +set to SYSMIS in the @code{numeric-series}, and @code{string-data} +follows the @code{numeric-data}. To reliably determine whether a +source includes @code{string-data}, the reader should check whether +the offset following the @code{numeric-data} is the offset of the next +source, as indicated by its @code{metadata} (or end of file, in the +case of the last source in a file). + +@code{string-data} repeats the name of the source. + +The string data overlays the numeric data. @code{n-string-series} is +the number of series within the source that include string data. 
More +precisely, it is the 1-based index of the last series in the source +that includes any string data; thus, it would be 4 if there are 5 +series and only the fourth one includes string data. + +Each @code{pair-series} consists of a sequence of 0 or more pairs, each +of which maps from a 0-based index within the series @code{i} to a +0-based label index @code{j}. The pair @code{i} = 2, @code{j} = 3, +for example, would mean that the third data value (with value SYSMIS) +is to be replaced by the string of the fourth label. + +The labels themselves follow the pairs. The valuable part of each +label is the string @code{s}. Each label also includes a +@code{frequency} that reports the number of pairs that reference it +(although this is not useful).
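The overlay of string data on numeric data described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any existing reader: the function name @code{overlay_strings} and the list-based argument shapes are hypothetical, and the sketch assumes that the @var{k}th @code{pair-series} applies to the @var{k}th numeric series in the source, which is an inference from the description of @code{n-string-series} rather than something the text states outright. Parsing the member itself is left out.

```python
import sys

# SYSMIS is stored as the most negative finite double (-DBL_MAX).
SYSMIS = -sys.float_info.max


def overlay_strings(numeric_series, pair_series, labels):
    """Return copies of the series with string values substituted.

    numeric_series: list of lists of floats, one inner list per series.
    pair_series: list of lists of (i, j) pairs; entry k is assumed to
        apply to series k (hypothetical layout, see the lead-in).
    labels: list of label strings (the `s` field of each label).
    """
    result = [list(series) for series in numeric_series]
    for k, pairs in enumerate(pair_series):
        for i, j in pairs:
            # i is a 0-based index within series k; j is a 0-based label index.
            result[k][i] = labels[j]
    return result


# Made-up data: two series, each with one value that is really a string.
numeric = [[1.0, 2.0, SYSMIS], [SYSMIS, 4.0, 5.0]]
pairs = [[(2, 3)], [(0, 0)]]
labels = ["Male", "Female", "Total", "Missing"]
merged = overlay_strings(numeric, pairs, labels)
```

With the pair @code{i} = 2, @code{j} = 3 in the first series, the third data value is replaced by the fourth label, matching the example in the text. The input lists are copied rather than modified, so the original SYSMIS placeholders remain available to the caller.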