work on rust version of pspp-dump-sav

[pspp] / doc / dev / spv-file-format.texi
diff --git a/doc/dev/spv-file-format.texi b/doc/dev/spv-file-format.texi

index 54decbcf52bfa7034c6b34bc4465115c7aa3451b..523eb556e9e61538120f9cd33d7fb4974e6b7642 100644
--- a/doc/dev/spv-file-format.texi
+++ b/doc/dev/spv-file-format.texi
@@ -9,7 +9,7 @@
  @c
  
  @node SPSS Viewer File Format
-@appendix SPSS Viewer File Format
+@chapter SPSS Viewer File Format
  
  SPSS Viewer or @file{.spv} files, here called SPV files, are written
  by SPSS 16 and later to represent the contents of its output editor.
@@ -38,13 +38,13 @@ e.g.@: as @samp{allowingPivot=false} has no effect.}
  
  The rest of the members in an SPV file's Zip archive fall into two
  categories: @dfn{structure} and @dfn{detail} members.  Structure
-member names begin with @file{outputViewer@var{nnnnnnnnnn}}, where
-each @var{n} is a decimal digit, and end with @file{.xml}, and often
-include the string @file{_heading} in between.  Each of these members
-represents some kind of output item (a table, a heading, a block of
-text, etc.) or a group of them.  The member whose output goes at the
-beginning of the document is numbered 0, the next member in the output
-is numbered 1, and so on.
+member names take the form @file{outputViewer@var{number}.xml} or
+@file{outputViewer@var{number}_heading.xml}, where @var{number} is a
+10-digit decimal number.  Each of these members represents some kind
+of output item (a table, a heading, a block of text, etc.) or a group
+of them.  The member whose output goes at the beginning of the
+document is numbered 0, the next member in the output is numbered 1,
+and so on.
  
  Structure members contain XML.  This XML is sometimes self-contained,
  but it often references detail members in the Zip archive, which are
@@ -88,8 +88,9 @@ Not yet investigated.  The corpus contains few examples.
  The @file{@var{prefix}} in the names of the detail members is
  typically an 11-digit decimal number that increases for each item,
  tending to skip values.  Older SPV files use different naming
-conventions.  Structure member refer to detail members by name, and so
-their exact names do not matter to readers as long as they are unique.
+conventions for detail members.  Structure members refer to detail
+members by name, and so their exact names do not matter to readers as
+long as they are unique.
  
  SPSS tolerates corrupted Zip archives that Zip reader libraries tend
  to reject.  These can be fixed up with @command{zip -FF}.
@@ -329,15 +330,29 @@ heading
  => label (container | heading)*
  @end example
  
-The root of a structure member is a @code{heading}, which represents a
-section of output beginning with a @code{label} and
-ordinarily followed by content containers or further nested
-(sub)-sections of output.  Unlike heading elements in HTML and other
-common document formats, which precede the content that they head,
-@code{heading} contains the elements that appear below the heading.
-
-The document root heading, only, may contain a @code{pageSetup}
-element.
+A @code{heading} represents a tree of content that appears in an
+output viewer window.  It contains a @code{label} text string that is
+shown in the outline view, ordinarily followed by content containers or
+further nested (sub)-sections of output.  Unlike heading elements in
+HTML and other common document formats, which precede the content that
+they head, @code{heading} contains the elements that appear below the
+heading.
+
+The root of a structure member is a special @code{heading}.  The
+direct children of the root @code{heading} elements in all structure
+members in an SPV file are siblings.  That is, the root @code{heading}
+elements in all of the structure members conceptually represent the
+same node.  The root heading's @code{label} is ignored (@pxref{SPV
+Structure label Element}).  The root heading in the first structure
+member in the Zip file may contain a @code{pageSetup} element.
+
+The schema implies that any @code{heading} may contain a sequence of
+any number of @code{heading} and @code{container} elements.  This does
+not work for the root @code{heading} in practice, which must actually
+contain exactly one @code{container} or @code{heading} child element.
+Furthermore, if the root heading's child is a @code{heading}, then the
+structure member's name must end in @file{_heading.xml}; if it is a
+@code{container} child, then it must not.
  
  The following attributes have been observed on both document root and
  nested @code{heading} elements.
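
The naming rule for root headings added in the hunk above can be sketched as a small check. This is only an illustration in Python, not code from PSPP: the function name `check_root_children` and the convention of passing the root heading's children as a list of tag names (with `label` and `pageSetup` already filtered out) are assumptions made for the example.

```python
import re

# Structure member names as described in the revised text:
# outputViewer<10 decimal digits>.xml or outputViewer<10 digits>_heading.xml.
STRUCTURE_NAME = re.compile(r"outputViewer([0-9]{10})(_heading)?\.xml")


def check_root_children(member_name, child_tags):
    """Return a list of problems found; an empty list means none.

    child_tags holds the tag names of the root heading's children,
    excluding label and pageSetup (hypothetical calling convention).
    """
    match = STRUCTURE_NAME.fullmatch(member_name)
    if match is None:
        return [f"{member_name}: not a structure member name"]

    # The root heading must contain exactly one container or heading.
    if len(child_tags) != 1 or child_tags[0] not in ("heading", "container"):
        return [f"{member_name}: root heading needs exactly one "
                "container or heading child"]

    # A heading child requires the _heading.xml suffix; a container
    # child forbids it.
    has_suffix = match.group(2) is not None
    if (child_tags[0] == "heading") != has_suffix:
        return [f"{member_name}: child <{child_tags[0]}> does not match "
                "the member name suffix"]
    return []
```

A reader could call this once per structure member and decide for itself whether a mismatch is fatal or only worth a warning.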
@@ -387,7 +402,11 @@ output, e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
  @end defvr
  
  @defvr {Attribute} @code{visibility}
-To what degree the output represented by the element is visible.
+If this attribute is absent, the heading's content is expanded in the
+outline view.  If it is set to @code{collapsed}, it is collapsed.
+(This attribute is never present in a root @code{heading} because the
+root node is always expanded when a file is loaded, even though the UI
+can be used to collapse it interactively.)
  @end defvr
  
  @defvr {Attribute} @code{locale}
@@ -415,8 +434,7 @@ output.  The label text doesn't appear in the output itself.
  
  The text in @code{label} describes what it labels, often by naming the
  statistical procedure that was executed, e.g.@: ``Frequencies'' or
-``T-Test''.  The root @code{heading} in a structure member is normally
-``Output''.  Labels are often very generic, especially within a
+``T-Test''.  Labels are often very generic, especially within a
  @code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''.
  Label text is localized according to the output language, e.g.@: in
  Italian a frequency table procedure is labeled ``Frequenze''.
@@ -425,6 +443,10 @@ The user can edit labels to be anything they want.  The corpus
  contains a few examples of empty labels, ones that contain no text,
  probably as a result of user editing.
  
+The root @code{heading} in an SPV file has a @code{label}, like every
+@code{heading}.  It normally contains ``Output'' but its content is
+disregarded anyway.  The user cannot edit it.
+
  @node SPV Structure container Element
  @subsection The @code{container} Element
  
@@ -444,7 +466,7 @@ This element has the following attributes.
  
  @defvr {Attribute} @code{visibility}
  Whether the container's content is displayed.  ``Notes'' tables are
-often hidden; other data is usually
+often hidden; other data is usually visible.
  @end defvr
  
  @defvr {Attribute} @code{text-align}
@@ -456,6 +478,14 @@ Alignment of text within the container.  Observed with nested
  The width of the container, e.g.@: @code{1097px}.
  @end defvr
  
+All of the elements that nest inside @code{container} (except the
+@code{label}) have the following optional attribute.
+
+@defvr {Attribute} @code{commandName}
+As on the @code{heading} element.  The corpus contains one example
+in which @code{commandName} is present but set to the empty string.
+@end defvr
+
  @node SPV Structure text Element (Inside @code{container})
  @subsection The @code{text} Element (Inside @code{container})
  
@@ -473,14 +503,13 @@ is a different @code{text} element that is nested inside a
  
  This element has the following attributes.
  
-@defvr {Attribute} @code{type}
-The semantics of the text.
+@defvr {Attribute} @code{commandName}
+@xref{SPV Structure container Element}.  For output not specific to a
+command, this is simply @code{log}.
  @end defvr
  
-@defvr {Attribute} @code{commandName}
-As on the @code{heading} element.  For output not specific to a
-command, this is simply @code{log}.  The corpus contains one example
-of where @code{commandName} is present but set to the empty string.
+@defvr {Attribute} @code{type}
+The semantics of the text.
  @end defvr
  
  @defvr {Attribute} @code{creator-version}
@@ -582,7 +611,7 @@ tableStructure => path? dataPath csvPath?
  This element has the following attributes.
  
  @defvr {Attribute} @code{commandName}
-As on the @code{heading} element.
+@xref{SPV Structure container Element}.
  @end defvr
  
  @defvr {Attribute} @code{type}
@@ -680,9 +709,16 @@ name Zip members with @file{.scf} extension.
  @subsection The @code{object} and @code{image} Elements
  
  @example
-object :type[object_type]=(unknown)? :uri => EMPTY
+object
+   :commandName?
+   :type[object_type]=(unknown)?
+   :uri
+=> EMPTY
  
-image :VDPId :commandName => dataPath
+image
+   :commandName?
+   :VDPId
+=> dataPath
  @end example
  
  These two elements represent an image in PNG format.  They are
@@ -1015,8 +1051,8 @@ Header =>
      bool[rotate-outer-row-labels]
      bool[x2]
      int32[x3]
-    int32[min-col-width] int32[max-col-width]
-    int32[min-row-width] int32[max-row-width]
+    int32[min-col-heading-width] int32[max-col-heading-width]
+    int32[min-row-heading-width] int32[max-row-heading-width]
      int64[table-id]
  @end example
  
@@ -1033,12 +1069,37 @@ If @code{rotate-outer-row-labels} is 1, then row labels farthest from
  the data are rotated 90° counterclockwise; otherwise, they are shown
  in the normal way.
  
-@code{min-col-width} is the minimum width that a column will be
-assigned automatically.  @code{max-col-width} is the maximum width
-that a column will be assigned to accommodate a long column label.
-@code{min-row-width} and @code{max-row-width} are a similar range for
-the width of row labels.  All of these measurements are in 1/96 inch
-units (called a ``device independent pixel'' unit in Windows).
+@code{min-col-heading-width}, @code{max-col-heading-width}, @code{min-row-heading-width}, and
+@code{max-row-heading-width} are measurements in 1/96 inch units (called
+``device independent pixel'' units in Windows) whose values influence
+column widths.  For the purpose of interpreting these values, a table
+is divided into the three regions shown below:
+
+@example
++------------------+-------------------------------------------------+
+|                  |                  column headings                |
+|                  +-------------------------------------------------+
+|      corner      |                                                 |
+|       and        |                                                 |
+|   row headings   |                      data                       |
+|                  |                                                 |
+|                  |                                                 |
++------------------+-------------------------------------------------+
+@end example
+
+@code{min-col-heading-width} and @code{max-col-heading-width} apply to the columns in
+the column headings region.  @code{min-col-heading-width} is the minimum width
+that any of these columns will be given automatically.  In addition,
+@code{max-col-heading-width} is the maximum width that a column will be
+assigned to accommodate a long label in the column headings cells.
+These columns will still be made wider to accommodate wide data values
+in the data region.
+
+@code{min-row-heading-width} is the minimum width that a column in the corner
+and row headings region will be given automatically.
+@code{max-row-heading-width} is the maximum width that a column in this region
+will be assigned to accommodate a long label.  This region doesn't
+include data, so data values don't affect column widths.
  
  @code{table-id} is a binary version of the @code{tableId} attribute in
  the structure member that refers to the detail member.  For example,
@@ -1301,8 +1362,19 @@ PointKeep => be32[offset] be32 be32
  The TableSettings reflect display settings.  The fixed value of
  @code{endian} can be used to validate the endianness.
  
-@code{current-layer} is the displayed layer.  The interpretation when
-there is more than one layer dimension is not yet known.
+@code{current-layer} is the displayed layer.  Suppose there are
+@math{d} layer dimensions, numbered 1 through @math{d} in the order
+given in the Dimensions (@pxref{SPV Light Member Dimensions}), and that
+the displayed value of dimension @math{i} is @math{x_i}, @math{0 \le
+x_i < n_i}, where @math{n_i} is the number of categories in dimension
+@math{i}.  Then @code{current-layer} is calculated by the following
+algorithm:
+
+@display
+let @code{current-layer} = 0
+for each @math{i} from @math{d} downto 1:
+    @code{current-layer} = (@math{n_i \times} @code{current-layer}) @math{+} @math{x_i}
+@end display
  
  If @code{omit-empty} is 1, empty rows or columns (ones with nothing in
  any cell) are hidden; otherwise, they are shown.
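
The `current-layer` algorithm in the hunk above is a mixed-radix encoding in which the first layer dimension is the least significant digit. A minimal Python sketch follows; the function names and the list-based arguments are assumptions for illustration, and the decoder is simply the arithmetic inverse of the documented encoder rather than something the source describes.

```python
def encode_current_layer(sizes, indexes):
    """Compute current-layer from per-dimension category indexes.

    sizes[i] is n_i and indexes[i] is x_i, both listed in the order
    the layer dimensions are given (dimension 1 first).
    """
    if len(sizes) != len(indexes):
        raise ValueError("sizes and indexes must have the same length")
    current_layer = 0
    # "for each i from d downto 1": walk the dimensions last to first.
    for n, x in zip(reversed(sizes), reversed(indexes)):
        if not 0 <= x < n:
            raise ValueError(f"index {x} out of range for {n} categories")
        current_layer = n * current_layer + x
    return current_layer


def decode_current_layer(sizes, current_layer):
    """Invert encode_current_layer (derived here, not from the source)."""
    indexes = []
    for n in sizes:
        indexes.append(current_layer % n)
        current_layer //= n
    return indexes
```

For example, with two layer dimensions of 3 and 4 categories showing categories 2 and 1, the loop computes 4 × 0 + 1 = 1 and then 3 × 1 + 2 = 5, so `current-layer` is 5.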
@@ -1399,7 +1471,7 @@ X0 => byte*14 Y1 Y2
  Y1 =>
      string[command] string[command-local]
      string[language] string[charset] string[locale]
-    bool bool bool bool
+    bool[x10] bool[include-leading-zero] bool[x12] bool[x13]
      Y0
  Y2 => CustomCurrency byte[missing] bool[x17]
  @end example
@@ -1411,10 +1483,15 @@ Tests.''  @code{command-local} is the procedure's name, translated
  into the output language; it is often empty and, when it is not,
  sometimes the same as @code{command}.
  
+@code{include-leading-zero} is the @code{LEADZERO} setting for the
+table, where false is @code{OFF} (the default) and true is @code{ON}.
+@xref{SET LEADZERO,,, pspp, PSPP}.
+
  @code{missing} is the character used to indicate that a cell contains
  a missing value.  It is always observed as @samp{.}.
  
-A writer may safely use false for @code{x17}.
+A writer may safely use false for @code{x10} and @code{x17} and true
+for @code{x12} and @code{x13}.
  
  @subsubheading X1
  
@@ -1495,7 +1572,7 @@ X3 =>
      double[small] 01
      (string[dataset] string[datafile] i0 int32[date] i0)?
      Y2
-    (int32[x22] i0)?
+    (int32[x22] i0 01?)?
  @end example
  
  @code{small} is a small real number.  In the corpus, it overwhelmingly
@@ -1691,7 +1768,10 @@ permutation of the 0-based dimension numbers.  The first
  layers, the next @code{n-rows} integers specify the dimensions
  represented by rows, and the final @code{n-columns} integers specify
  the dimensions represented by columns.  When there is more than one
-dimension of a given kind, the inner dimensions are given first.
+dimension of a given kind, the inner dimensions are given first.  (For
+the layer axis, this means that the first dimension is at the bottom
+of the list and the last dimension is at the top when the current
+layer is displayed.)
  
  @node SPV Light Member Cells
  @subsection Cells