docs

diff --git a/doc/dev/spv-file-format.texi b/doc/dev/spv-file-format.texi

index 16b21dcf12d5db52cc07dca2d987cfc9192ba1a3..d5c93c4348aea8228151fef7a5ea2016f2965e84 100644
--- a/doc/dev/spv-file-format.texi
+++ b/doc/dev/spv-file-format.texi
@@ -38,13 +38,13 @@ e.g.@: as @samp{allowingPivot=false} has no effect.}
  
  The rest of the members in an SPV file's Zip archive fall into two
  categories: @dfn{structure} and @dfn{detail} members.  Structure
-member names begin with @file{outputViewer@var{nnnnnnnnnn}}, where
-each @var{n} is a decimal digit, and end with @file{.xml}, and often
-include the string @file{_heading} in between.  Each of these members
-represents some kind of output item (a table, a heading, a block of
-text, etc.) or a group of them.  The member whose output goes at the
-beginning of the document is numbered 0, the next member in the output
-is numbered 1, and so on.
+member names take the form @file{outputViewer@var{number}.xml} or
+@file{outputViewer@var{number}_heading.xml}, where @var{number} is a
+10-digit decimal number.  Each of these members represents some kind
+of output item (a table, a heading, a block of text, etc.) or a group
+of them.  The member whose output goes at the beginning of the
+document is numbered 0, the next member in the output is numbered 1,
+and so on.
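
The naming rule above can be sketched in Python as follows.  This is a minimal illustration, not code from PSPP: the function names `parse_structure_member` and `structure_members_in_order` are hypothetical, and the regular expression assumes exactly the two name forms and the 10-digit number stated in the added text.

```python
import re

# Assumed from the text above: "outputViewer", a 10-digit decimal number,
# an optional "_heading", and the ".xml" suffix.
_STRUCTURE_RE = re.compile(r"outputViewer(\d{10})(_heading)?\.xml")


def parse_structure_member(name):
    """Return (number, has_heading_suffix), or None if NAME is not a
    structure member name of the form described above."""
    m = _STRUCTURE_RE.fullmatch(name)
    if m is None:
        return None
    return int(m.group(1)), m.group(2) is not None


def structure_members_in_order(names):
    """Return the structure member names among NAMES, sorted by number,
    i.e. in the order their output appears in the document."""
    members = [n for n in names if parse_structure_member(n) is not None]
    return sorted(members, key=lambda n: parse_structure_member(n)[0])
```

Any other Zip member name, such as a detail member, simply yields `None` and is skipped by the ordering helper.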
  
  Structure members contain XML.  This XML is sometimes self-contained,
  but it often references detail members in the Zip archive, which are
@@ -72,6 +72,13 @@ Same format used for tables, with a different name.
  The structure of a chart plus its data.  Charts do not have a
  ``light'' format.
  
+@item @file{@var{prefix}_Imagegeneric.png}
+@itemx @file{@var{prefix}_PastedObjectgeneric.png}
+@itemx @file{@var{prefix}_imageData.bin}
+A PNG image referenced by an @code{object} element (in the first two
+cases) or an @code{image} element (in the final case).  @xref{SPV
+Structure object and image Elements}.
+
  @item @file{@var{prefix}_pmml.scf}
  @itemx @file{@var{prefix}_stats.scf}
  @item @file{@var{prefix}_model.xml}
@@ -81,8 +88,9 @@ Not yet investigated.  The corpus contains few examples.
  The @file{@var{prefix}} in the names of the detail members is
  typically an 11-digit decimal number that increases for each item,
  tending to skip values.  Older SPV files use different naming
-conventions.  Structure member refer to detail members by name, and so
-their exact names do not matter to readers as long as they are unique.
+conventions for detail members.  Structure members refer to detail
+members by name, and so their exact names do not matter to readers as
+long as they are unique.
  
  SPSS tolerates corrupted Zip archives that Zip reader libraries tend
  to reject.  These can be fixed up with @command{zip -FF}.
@@ -294,6 +302,7 @@ information, and the CSS from the embedded HTML:
  * SPV Structure table Element::
  * SPV Structure graph Element::
  * SPV Structure model Element::
+* SPV Structure object and image Elements::
  * SPV Structure tree Element::
  * SPV Structure Path Elements::
  * SPV Structure pageSetup Element::
@@ -321,15 +330,29 @@ heading
  => label (container | heading)*
  @end example
  
-The root of a structure member is a @code{heading}, which represents a
-section of output beginning with a @code{label} and
-ordinarily followed by content containers or further nested
-(sub)-sections of output.  Unlike heading elements in HTML and other
-common document formats, which precede the content that they head,
-@code{heading} contains the elements that appear below the heading.
-
-The document root heading, only, may contain a @code{pageSetup}
-element.
+A @code{heading} represents a tree of content that appears in an
+output viewer window.  It contains a @code{label} text string that is
+shown in the outline view, ordinarily followed by content containers or
+further nested (sub)-sections of output.  Unlike heading elements in
+HTML and other common document formats, which precede the content that
+they head, @code{heading} contains the elements that appear below the
+heading.
+
+The root of a structure member is a special @code{heading}.  The
+direct children of the root @code{heading} elements in all structure
+members in an SPV file are siblings.  That is, the root @code{heading}
+elements in all of the structure members conceptually represent the
+same node.  The root heading's @code{label} is ignored (@pxref{SPV
+Structure label Element}).  The root heading in the first structure
+member in the Zip file may contain a @code{pageSetup} element.
+
+The schema implies that any @code{heading} may contain a sequence of
+any number of @code{heading} and @code{container} elements.  This does
+not work for the root @code{heading} in practice, which must actually
+contain exactly one @code{container} or @code{heading} child element.
+Furthermore, if the root heading's child is a @code{heading}, then the
+structure member's name must end in @file{_heading.xml}; if it is a
+@code{container} child, then it must not.
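
The constraint on the root heading's single child can be expressed as a small check.  The sketch below is an illustration under stated assumptions: the function name `root_child_consistent` is hypothetical, and the caller is assumed to have already extracted the local tag name of the root heading's one non-`label`, non-`pageSetup` child.

```python
def root_child_consistent(member_name, child_tag):
    """Check the rule stated above: a root heading whose child is a
    `heading` must live in a member whose name ends in "_heading.xml",
    and one whose child is a `container` must not.

    CHILD_TAG is the local element name of the root heading's single
    content child (an assumption about how the caller parsed the XML).
    """
    if child_tag not in ("heading", "container"):
        return False  # the text allows only these two kinds of child
    return member_name.endswith("_heading.xml") == (child_tag == "heading")
```

A writer could run this check before adding each structure member to the Zip archive.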
  
  The following attributes have been observed on both document root and
  nested @code{heading} elements.
@@ -379,7 +402,11 @@ output, e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
  @end defvr
  
  @defvr {Attribute} @code{visibility}
-To what degree the output represented by the element is visible.
+If this attribute is absent, the heading's content is expanded in the
+outline view.  If it is set to @code{collapsed}, it is collapsed.
+(This attribute is never present in a root @code{heading} because the
+root node is always expanded when a file is loaded, even though the UI
+can be used to collapse it interactively.)
  @end defvr
  
  @defvr {Attribute} @code{locale}
@@ -407,8 +434,7 @@ output.  The label text doesn't appear in the output itself.
  
  The text in @code{label} describes what it labels, often by naming the
  statistical procedure that was executed, e.g.@: ``Frequencies'' or
-``T-Test''.  The root @code{heading} in a structure member is normally
-``Output''.  Labels are often very generic, especially within a
+``T-Test''.  Labels are often very generic, especially within a
  @code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''.
  Label text is localized according to the output language, e.g.@: in
  Italian a frequency table procedure is labeled ``Frequenze''.
@@ -417,6 +443,10 @@ The user can edit labels to be anything they want.  The corpus
  contains a few examples of empty labels, ones that contain no text,
  probably as a result of user editing.
  
+The root @code{heading} in an SPV file has a @code{label}, like every
+@code{heading}.  It normally contains ``Output'' but its content is
+disregarded anyway.  The user cannot edit it.
+
  @node SPV Structure container Element
  @subsection The @code{container} Element
  
@@ -436,7 +466,7 @@ This element has the following attributes.
  
  @defvr {Attribute} @code{visibility}
  Whether the container's content is displayed.  ``Notes'' tables are
-often hidden; other data is usually
+often hidden; other data is usually visible.
  @end defvr
  
  @defvr {Attribute} @code{text-align}
@@ -448,6 +478,14 @@ Alignment of text within the container.  Observed with nested
  The width of the container, e.g.@: @code{1097px}.
  @end defvr
  
+All of the elements that nest inside @code{container} (except the
+@code{label}) have the following optional attribute.
+
+@defvr {Attribute} @code{commandName}
+As on the @code{heading} element.  The corpus contains one example
+in which @code{commandName} is present but set to the empty string.
+@end defvr
+
  @node SPV Structure text Element (Inside @code{container})
  @subsection The @code{text} Element (Inside @code{container})
  
@@ -465,14 +503,13 @@ is a different @code{text} element that is nested inside a
  
  This element has the following attributes.
  
-@defvr {Attribute} @code{type}
-The semantics of the text.
+@defvr {Attribute} @code{commandName}
+@xref{SPV Structure container Element}.  For output not specific to a
+command, this is simply @code{log}.
  @end defvr
  
-@defvr {Attribute} @code{commandName}
-As on the @code{heading} element.  For output not specific to a
-command, this is simply @code{log}.  The corpus contains one example
-of where @code{commandName} is present but set to the empty string.
+@defvr {Attribute} @code{type}
+The semantics of the text.
  @end defvr
  
  @defvr {Attribute} @code{creator-version}
@@ -574,7 +611,7 @@ tableStructure => path? dataPath csvPath?
  This element has the following attributes.
  
  @defvr {Attribute} @code{commandName}
-As on the @code{heading} element.
+@xref{SPV Structure container Element}.
  @end defvr
  
  @defvr {Attribute} @code{type}
@@ -668,6 +705,33 @@ strings, and @code{path} names an Zip member that contains XML.
  Alternatively, @code{pmmlContainerPath} and @code{statsContainerPath}
  name Zip members with @file{.scf} extension.
  
+@node SPV Structure object and image Elements
+@subsection The @code{object} and @code{image} Elements
+
+@example
+object
+   :commandName?
+   :type[object_type]=(unknown)?
+   :uri
+=> EMPTY
+
+image
+   :commandName?
+   :VDPId
+=> dataPath
+@end example
+
+These two elements represent an image in PNG format.  They are
+equivalent and the corpus contains examples of both.  The only
+difference is the syntax: for @code{object}, the @code{uri} attribute
+names the Zip member that contains a PNG file; for @code{image}, the
+text of the inner @code{dataPath} element names the Zip member.
+
+PSPP writes @code{object} in output but there is no strong reason to
+choose this form.
+
+The corpus only contains PNG image files.
+
  @node SPV Structure tree Element
  @subsection The @code{tree} Element
  
@@ -946,7 +1010,7 @@ A ``light'' detail member @file{.bin} consists of a number of sections
  concatenated together, terminated by an optional byte 01:
  
  @example
-LightMember =>
+Table =>
      Header Titles Footnotes
      Areas Borders PrintSettings TableSettings Formats
      Dimensions Axes Cells
@@ -998,17 +1062,12 @@ and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for
  version-specific formatting (as described previously).
  
  If @code{rotate-inner-column-labels} is 1, then column labels closest
-to the data are rotated to be vertical; otherwise, they are shown
-in the normal way.
+to the data are rotated 90° counterclockwise; otherwise, they are
+shown in the normal way.
  
  If @code{rotate-outer-row-labels} is 1, then row labels farthest from
-the data are rotated to be vertical; otherwise, they are shown in the
-normal way.
-
-@code{table-id} is a binary version of the @code{tableId} attribute in
-the structure member that refers to the detail member.  For example,
-if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
-would be 0xc6c99d183b300001.
+the data are rotated 90° counterclockwise; otherwise, they are shown
+in the normal way.
  
  @code{min-col-width} is the minimum width that a column will be
  assigned automatically.  @code{max-col-width} is the maximum width
@@ -1017,6 +1076,11 @@ that a column will be assigned to accommodate a long column label.
  the width of row labels.  All of these measurements are in 1/96 inch
  units (called a ``device independent pixel'' unit in Windows).
  
+@code{table-id} is a binary version of the @code{tableId} attribute in
+the structure member that refers to the detail member.  For example,
+if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
+would be 0xc6c99d183b300001.
+
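
The relationship between the signed `tableId` attribute and the 64-bit `table-id` field can be checked with a short Python sketch.  The helper names below are hypothetical, and the sketch covers only the two's-complement reinterpretation in the example above; it says nothing about the byte order used in the file.

```python
import struct


def table_id_to_binary(table_id):
    """Reinterpret the signed decimal `tableId` from the structure member
    as the unsigned 64-bit value stored as `table-id`."""
    return table_id & 0xFFFFFFFFFFFFFFFF


def binary_to_table_id(value):
    """Inverse: reinterpret an unsigned 64-bit `table-id` as the signed
    integer written in the `tableId` attribute."""
    # Pack as unsigned and unpack as signed; the byte order only has to
    # be the same on both sides of this round trip.
    return struct.unpack("<q", struct.pack("<Q", value))[0]
```

With the values from the text, `table_id_to_binary(-4122591256483201023)` gives `0xc6c99d183b300001`, and `binary_to_table_id` maps it back.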
  The meaning of the other variable parts of the header is not known.  A
  writer may safely use version 3, true for @code{x0}, false for
  @code{x1}, true for @code{x2}, and 0x15 for @code{x3}.
@@ -1036,7 +1100,7 @@ Titles =>
  The Titles follow the Header and specify the table's title, caption,
  and corner text.
  
-The @code{user-title} is shown above the title and reflects any user
+The @code{user-title} reflects any user
  editing of the title text or style.  The @code{title} is the title
  originally generated by the procedure.  Both of these are appropriate
  for presentation and localized to the user's language.  For example,
@@ -1049,9 +1113,9 @@ name the variable and @code{c} is simply ``Frequencies''.
  
  The @code{corner-text}, if present, is shown in the upper-left corner
  of the table, above the row headings and to the left of the column
-headings.  It is usually absent.  Corner text prevents row dimension
-labels from being displayed above the dimension's group and category
-labels (see @code{show-row-labels-in-corner}).
+headings.  It is usually absent.  When row dimension labels are
+displayed in the corner (see @code{show-row-labels-in-corner}), corner
+text is hidden.
  
  The @code{caption}, if present, is shown below the table.
  @code{caption} reflects user editing of the caption.
@@ -1073,6 +1137,7 @@ reference other footnotes, but in practice this doesn't work.
  @code{show} is a 32-bit signed integer.  It is positive to show the
  footnote or negative to hide it.  Its magnitude is often 1, and in
  other cases tends to be the number of references to the footnote.
+It is safe to write 1 to show a footnote and -1 to hide it.
  
  @node SPV Light Member Areas
  @subsection Areas
@@ -1092,7 +1157,7 @@ Each Area represents the style for a different area of the table, in
  the following order: title, caption, footer, corner, column labels,
  row labels, data, and layers.
  
-@code{index} is the 1-based index of the Area, i.e. 1 for the first
+@code{index} is the 1-based index of the Area, i.e.@: 1 for the first
  Area, through 8 for the final Area.
  
  @code{typeface} is the string name of the font used in the area.  In
@@ -1100,7 +1165,7 @@ the corpus, this is @code{SansSerif} in over 99% of instances and
  @code{Times New Roman} in the rest.
  
  @code{size} is the size of the font, in px (@pxref{SPV Light Detail
-Member Format}) The most common size in the corpus is 12 px.  Even
+Member Format}).  The most common size in the corpus is 12 px.  Even
  though @code{size} has a floating-point type, in the corpus its values
  are always integers.
  
@@ -1217,8 +1282,9 @@ PrintSettings =>
  The PrintSettings reflect settings for printing.  The fixed value of
  @code{endian} can be used to validate the endianness.
  
-@code{all-layers} is 1 to print all layers, 0 to print only the
-visible layers.
+@code{all-layers} is 1 to print all layers, 0 to print only the layer
+designated by @code{current-layer} in TableSettings (@pxref{SPV Light
+Member Table Settings}).
  
  @code{paginate-layers} is 1 to print each layer at the start of a new
  page, 0 otherwise.  (This setting is honored only @code{all-layers} is
@@ -1257,7 +1323,7 @@ TableSettings =>
          )
          bestring[notes]
          bestring[table-look]
-        00...))
+        )...)
  
  Breakpoints => be32[n-breaks] be32*[n-breaks]
  
@@ -1271,8 +1337,19 @@ PointKeep => be32[offset] be32 be32
  The TableSettings reflect display settings.  The fixed value of
  @code{endian} can be used to validate the endianness.
  
-@code{current-layer} is the displayed layer.  The interpretation when
-there is more than one layer dimension is not yet known.
+@code{current-layer} is the displayed layer.  Suppose there are
+@math{d} layer dimensions, numbered 1 through @math{d} in the order
+given in the Dimensions (@pxref{SPV Light Member Dimensions}), and
+that the displayed value of dimension @math{i} is @math{x_i}, @math{0
+\le x_i < n_i}, where @math{n_i} is the number of categories in
+dimension @math{i}.  Then @code{current-layer} is calculated by the
+following algorithm:
+
+@display
+let @code{current-layer} = 0
+for each @math{i} from @math{d} downto 1:
+    @code{current-layer} = (@math{n_i \times} @code{current-layer}) @math{+} @math{x_i}
+@end display
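
The algorithm above is a mixed-radix encoding in which dimension 1 is the least significant digit.  A minimal Python sketch follows; the function names are hypothetical, the lists `x` and `n` hold the displayed category indexes and category counts for dimensions 1 through d in that order, and the decoding direction is an inference from the encoding rather than something the text states.

```python
def encode_current_layer(x, n):
    """Compute `current-layer` from the displayed category index x[i]
    and category count n[i] of each layer dimension, following the
    loop above (dimension d first, dimension 1 last)."""
    current = 0
    for xi, ni in zip(reversed(x), reversed(n)):
        if not 0 <= xi < ni:
            raise ValueError("displayed index out of range")
        current = ni * current + xi
    return current


def decode_current_layer(current, n):
    """Inverse of the encoding: recover the per-dimension indexes.
    Dimension 1 is the least significant digit, so peel it off first."""
    x = []
    for ni in n:
        x.append(current % ni)
        current //= ni
    return x
```

For example, with two layer dimensions of 2 and 3 categories showing indexes 1 and 2, the loop gives 3 × 0 + 2 = 2 and then 2 × 2 + 1 = 5.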
  
  If @code{omit-empty} is 1, empty rows or columns (ones with nothing in
  any cell) are hidden; otherwise, they are shown.
@@ -1303,7 +1380,7 @@ or columns to pixel or point offsets.
  
  @code{notes} is a text string that contains user-specified notes.  It
  is displayed when the user hovers the cursor over the table, like text
-in the @code{title} attribute in HTML.  It is not printed.  It is
+in the @code{title} attribute in HTML@.  It is not printed.  It is
  usually empty.
  
  @code{table-look} is the name of a SPSS ``TableLook'' table style,
@@ -1322,7 +1399,7 @@ Formats =>
      int32[n-widths] int32*[n-widths]
      string[locale]
      int32[current-layer]
-    bool bool bool
+    bool[x7] bool[x8] bool[x9]
      Y0
      CustomCurrency
      count(
@@ -1336,9 +1413,8 @@ If @code{n-widths} is nonzero, then the accompanying integers are
  column widths as manually adjusted by the user.
  
  @code{locale} is a locale including an encoding, such as
-@code{en_US.windows-1252} or @code{it_IT.windows-1252}.  The rest of
-the character strings in the member use this encoding.  The encoding
-string is itself encoded in US-ASCII.
+@code{en_US.windows-1252} or @code{it_IT.windows-1252}.
+(@code{locale} is often duplicated in Y1, described below).
  
  @code{epoch} is the year that starts the epoch.  A 2-digit year is
  interpreted as belonging to the 100 years beginning at the epoch.  The
@@ -1359,6 +1435,8 @@ following strings are CCA through CCE format strings.  @xref{Custom
  Currency Formats,,, pspp, PSPP}.  Most commonly these are all
  @code{-,,,} but other strings occur.
  
+A writer may safely use false for @code{x7}, @code{x8}, and @code{x9}.
+
  @subsubheading X0
  
  X0 only appears, optionally, in version 1 members.
@@ -1380,17 +1458,9 @@ Tests.''  @code{command-local} is the procedure's name, translated
  into the output language; it is often empty and, when it is not,
  sometimes the same as @code{command}.
  
-@code{dataset} is the name of the dataset analyzed to produce the
-output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
-file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}.  The latter
-is sometimes the empty string.
-
  @code{missing} is the character used to indicate that a cell contains
  a missing value.  It is always observed as @samp{.}.
  
-X0 repeats @code{decimal}, @code{grouping}, CustomCurrency, and
-@code{missing} already included in Formats.
-
  A writer may safely use false for @code{x17}.
  
  @subsubheading X1
@@ -1399,7 +1469,7 @@ X1 only appears in version 3 members.
  
  @example
  X1 =>
-    bool
+    bool[x14]
      byte[show-title]
      bool[x16]
      byte[lang]
@@ -1413,9 +1483,7 @@ X1 =>
  
  @code{lang} may indicate the language in use.  Some values seem to be
  0: @t{en}, 1: @t{de}, 2: @t{es}, 3: @t{it}, 5: @t{ko}, 6: @t{pl}, 8:
-@t{zh-tw}, 10: @t{pt_BR}, 11: @t{fr}.  The @code{locale} in Formats
-and the @code{language}, @code{charset}, and @code{locale} in X0 are
-more likely to be useful in practice.
+@t{zh-tw}, 10: @t{pt_BR}, 11: @t{fr}.
  
  @code{show-variables} determines how variables are displayed by
  default.  A value of 1 means to display variable names, 2 to display
@@ -1432,8 +1500,8 @@ which probably means to use a global default.
  
  @code{show-caption} is true to show the caption, false to hide it.
  
-A writer may safely use false for @code{x14}, false
-for @code{x16}, -1 for @code{x18} and @code{x19}, and false for
+A writer may safely use false for @code{x14}, false for @code{x16}, 0
+for @code{lang}, -1 for @code{x18} and @code{x19}, and false for
  @code{x20}.
  
  @subsubheading X2
@@ -1477,22 +1545,22 @@ X3 =>
      (int32[x22] i0)?
  @end example
  
-@code{date} is a date, as seconds since the epoch, i.e.@: since
-January 1, 1970.  Pivot tables within an SPV file often have dates a
-few minutes apart, so this is probably a creation date for the table
-rather than for the file.
-
-X3 repeats @code{decimal}, @code{grouping}, CustomCurrency, and
-@code{missing} already included in Formats.  @code{command},
-@code{command-local}, @code{language}, @code{charset}, and
-@code{locale} have the same meaning as in X0.
-
  @code{small} is a small real number.  In the corpus, it overwhelmingly
  takes the value 0.0001, with zero occasionally seen.  Nonzero numbers
  with format 40 (@pxref{SPV Light Member Value}) whose magnitudes are
  smaller than displayed in scientific notation.  (Thus, a @code{small}
  of zero prevents scientific notation from being chosen.)
  
+@code{dataset} is the name of the dataset analyzed to produce the
+output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
+file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}.  The latter
+is sometimes the empty string.
+
+@code{date} is a date, as seconds since the epoch, i.e.@: since
+January 1, 1970.  Pivot tables within an SPV file often have dates a
+few minutes apart, so this is probably a creation date for the table
+rather than for the file.
+
  Sometimes @code{dataset}, @code{datafile}, and @code{date} are present
  and other times they are absent.  The reader can distinguish by
  assuming that they are present and then checking whether the
@@ -1504,6 +1572,49 @@ will).
  A writer may safely use 4 for @code{x21} and omit @code{x22} and the
  other optional bytes at the end.
  
+@subsubheading Encoding
+
+Formats contains several indications of character encoding:
+
+@itemize @bullet
+@item
+@code{locale} in Formats itself.
+
+@item
+@code{locale} in Y1 (in version 1, Y1 is optionally nested inside X0;
+in version 3, Y1 is nested inside X3).
+
+@item
+@code{charset} in version 3, in Y1.
+
+@item
+@code{lang} in X1, in version 3.
+@end itemize
+
+@code{charset}, if present, is a good indication of character
+encoding, and in its absence the encoding suffix on @code{locale} in
+Formats will work.
+
+@code{locale} in Y1 can be disregarded: it is normally the same as
+@code{locale} in Formats, and it is only present if @code{charset} is
+also.
+
+@code{lang} is not helpful and should be ignored for character
+encoding purposes.
+
+However, the corpus contains many examples of light members whose
+strings are encoded in UTF-8 despite declaring some other character
+set.  Furthermore, the corpus contains several examples of light
+members in which some strings are encoded in UTF-8 (and contain
+multibyte characters) and other strings are encoded in another
+character set (and contain non-ASCII characters).  PSPP treats any
+valid UTF-8 string as UTF-8 and only falls back to the declared
+encoding for strings that are not valid UTF-8.
+
+The @command{pspp-output} program's @command{strings} command can help
+analyze the encoding in an SPV light member.  Use @code{pspp-output
+--help-dev} to see its usage.
+
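
The fallback rule described in the Encoding text (treat any valid UTF-8 string as UTF-8, otherwise use the declared encoding) can be sketched in Python.  This is an illustration of the stated rule, not PSPP's implementation: the function names are hypothetical, and `errors="replace"` is an assumption made so that the sketch never raises on malformed input.

```python
def encoding_from_locale(locale):
    """Return the encoding suffix of a locale string such as
    "en_US.windows-1252", or None if it has no suffix."""
    _, dot, encoding = locale.partition(".")
    return encoding if dot else None


def decode_light_string(raw, declared_encoding):
    """Decode RAW bytes from a light member: try UTF-8 first, then fall
    back to the declared encoding for bytes that are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: substitute undecodable bytes instead of failing.
        return raw.decode(declared_encoding, errors="replace")
```

Because the decision is made per string, a member that mixes UTF-8 strings with strings in the declared character set decodes correctly either way.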
  @node SPV Light Member Dimensions
  @subsection Dimensions
  
@@ -1551,7 +1662,8 @@ When @code{hide-all-labels} is 01, @code{show-dim-label} is ignored.
  
  @code{dim-index} is usually the 0-based index of the dimension, e.g.@:
  0 for the first dimension, 1 for the second, and so on.  Sometimes it
-is -1.  There is no visible difference.
+is -1.  There is no visible difference.  A writer may safely use the
+0-based index.
  
  @node SPV Light Member Categories
  @subsection Categories
  
  @node SPV Light Member Categories
  @subsection Categories
@@ -1578,12 +1690,12 @@ Leaf.  If the user does sorts or rearrange the categories, then the
  order of categories in the file reflects that change and
  @code{leaf-index} reflects the original order.
  
-Occasionally a dimension has no leaf categories at all.  A table that
+A dimension can have no leaf categories at all.  A table that
  contains such a dimension necessarily has no data at all.
  
  A Group is a group of nested categories.  Usually a Group contains at
-least one Category, so that @code{n-subcategories} is positive, but a
-few Groups with @code{n-subcategories} 0 has been observed.
+least one Category, so that @code{n-subcategories} is positive, but
+Groups with zero subcategories have been observed.
  
  If a Group's @code{merge} is 00, the most common value, then the group
  is really a distinct group that should be represented as such in the
@@ -1594,8 +1706,7 @@ parent group, then direct children of the dimension), and this group's
  name is irrelevant and should not be displayed.  (Merged groups can be
  nested!)
  
-(For writing an SPV file, there is no need to use the @code{merge}
-feature unless it is convenient.)
+Writers need not use merged groups.
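
A reader that wants to present merged groups as described above could flatten them as in the sketch below. The `Leaf` and `Group` classes and the `flatten_merged` function are hypothetical names for illustration only; they model just the fields needed here and are not PSPP data structures.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    name: str


@dataclass
class Group:
    name: str
    merge: bool  # True stands for merge byte 01, False for 00
    children: List[Union["Group", Leaf]] = field(default_factory=list)


def flatten_merged(children):
    """Return children with every merged group replaced by its contents."""
    out = []
    for child in children:
        if isinstance(child, Group):
            kids = flatten_merged(child.children)  # handles nested merges
            if child.merge:
                out.extend(kids)  # the merged group's own name is dropped
            else:
                out.append(Group(child.name, False, kids))
        else:
            out.append(child)
    return out
```

Applying `flatten_merged` to the top-level categories of a dimension makes the children of a merged top-level group appear as direct children of the dimension.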
  
  A Group's @code{x23} appears to be i2 when all of the categories
  within a group are leaf categories that directly represent data values
@@ -1627,7 +1738,10 @@ permutation of the 0-based dimension numbers.  The first
  layers, the next @code{n-rows} integers specify the dimensions
  represented by rows, and the final @code{n-columns} integers specify
  the dimensions represented by columns.  When there is more than one
-dimension of a given kind, the inner dimensions are given first.
+dimension of a given kind, the inner dimensions are given first.  (For
+the layer axis, this means that the first dimension is at the bottom
+of the list and the last dimension is at the top when the current
+layer is displayed.)
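
Splitting the permutation into the three axes might look like the sketch below. Python is used for illustration only; `split_axes` and its parameter names are hypothetical, and the validity checks are an assumption about what a cautious reader would want rather than documented PSPP behavior.

```python
def split_axes(dims, n_layers, n_rows, n_columns):
    """Split a permutation of 0-based dimension numbers into axes.

    Returns (layers, rows, columns).  Within each axis the order from
    the file is kept, so inner dimensions come first.
    """
    if len(dims) != n_layers + n_rows + n_columns:
        raise ValueError("axis counts do not match the number of dimensions")
    if sorted(dims) != list(range(len(dims))):
        raise ValueError("not a permutation of the dimension numbers")
    layers = dims[:n_layers]
    rows = dims[n_layers:n_layers + n_rows]
    columns = dims[n_layers + n_rows:]
    return layers, rows, columns
```

A consumer that prefers outermost-first order can reverse each returned list.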
  
  @node SPV Light Member Cells
  @subsection Cells
@@ -1641,7 +1755,7 @@ Cell => int64[index] v1(00?) Value
  
  A Cell consists of an @code{index} and a Value.  Suppose there are
  @math{d} dimensions, numbered 1 through @math{d} in the order given in
-the Dimensions previously, and that dimension @math{i}, has @math{n_i}
+the Dimensions previously, and that dimension @math{i} has @math{n_i}
  categories.  Consider the cell at coordinates @math{x_i}, @math{1 \le
  i \le d}, and note that @math{0 \le x_i < n_i}.  Then the index is
  calculated by the following algorithm:
@@ -1673,6 +1787,7 @@ RawValue =>
    @math{|} 04 ValueMod int32[format] string[value-label] string[var-name]
      byte[show] string[s]
    @math{|} 05 ValueMod string[var-name] string[var-label] byte[show]
+  @math{|} 06 string[local] ValueMod string[id] string[c]
    @math{|} ValueMod string[template] int32[n-args] Argument*[n-args]
  Argument =>
      i0 Value
@@ -1697,8 +1812,8 @@ Most commonly, @code{format} has width 40 (the maximum).
  
  An @code{x} with the maximum negative double value @code{-DBL_MAX}
  represents the system-missing value SYSMIS.  (HIGHEST and LOWEST have
-not been observed.)  @xref{System File Format}, for more about these
-special values.
+not been observed.)  See @ref{System File Format}, for more about
+these special values.
  
  @item 02
  Similar to @code{01}, with the additional information that @code{x} is
@@ -1737,8 +1852,9 @@ case, @code{id} is always the empty string; in the latter case,
  The string value @code{s}, intended to be presented to the user
  formatted according to @code{format}.  The format for a string is not
  too interesting, and the corpus contains many clearly invalid formats
-like A16.39 or A255.127 or A134.1, so readers should probably ignore
-the format entirely.
+like A16.39 or A255.127 or A134.1, so readers should probably disregard
+the format entirely.  PSPP only checks @code{format} to distinguish
+AHEX format.
  
  @code{s} is a value of variable @code{var-name} and has value label
  @code{value-label}.  @code{var-name} is never empty but
@@ -1747,14 +1863,18 @@ the format entirely.
  @code{show} has the same meaning as in the encoding for 02.
  
  @item 05
-Variable @code{var-name}, which is rarely observed as empty in the
-corpus, with variable label @code{var-label}, which is often empty.
+Variable @code{var-name} with variable label @code{var-label}.  In the
+corpus, @code{var-name} is rarely empty and @code{var-label} is often
+empty.
  
  @code{show} determines whether to show the variable name or the
  variable label.  A value of 1 means to show the name, 2 to show the
  label, 3 to show both, and 0 means to use the default specified in
  @code{show-variables} (@pxref{SPV Light Member Formats}).
  
+@item 06
+Similar to type 03, with @code{fixed} assumed to be true.
+
  @item otherwise
  When the first byte of a RawValue is not one of the above, the
  RawValue starts with a ValueMod, whose syntax is described in the next
@@ -1870,7 +1990,7 @@ a Value.
  Each of the @code{n-refs} integers is a reference to a Footnote
  (@pxref{SPV Light Member Footnotes}) by 0-based index.  Footnote
  markers are shown appended to the main text of the Value, as
-superscripts.
+superscripts or subscripts.
  
  The @code{subscripts}, if present, are strings to append to the main
  text of the Value, as subscripts.  Each subscript text is a brief