X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=spv-file-format.texi;h=dff235859b6f9b711dd3362abaf26c3e9a8d3a63;hb=71ce2a2b4f682c94b744c906769f5fc2dd25f3f0;hp=82afa6de7239ee5e90efc9689aed340bf8d1f013;hpb=ebc66e3005143a6feb347e99e2d6e7e900623bf0;p=pspp diff --git a/spv-file-format.texi b/spv-file-format.texi index 82afa6de72..dff235859b 100644 --- a/spv-file-format.texi +++ b/spv-file-format.texi @@ -194,7 +194,7 @@ file. @defvr {Optional} @code{creation-date-time} The date and time at which the SPV file was written, in a -locale-specific format, e.g. @code{Friday, May 16, 2014 6:47:37 PM +locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday, December 5, 2014 5:00:19 o'clock PM EST}. @end defvr @@ -488,25 +488,46 @@ across multiple lines. Break points are chosen for aesthetics only and have no semantic significance. @item 00, 01, @dots{}, ff. -Bytes with fixed values are written in hexadecimal: +A byte with a fixed value, written as a pair of hexadecimal digits. @item i0, i1, @dots{}, i9, i10, i11, @dots{} -32-bit integers with fixed values are written in decimal, prefixed by +@itemx b0, b1, @dots{}, b9, b10, b11, @dots{} +A 32-bit integer in little-endian or big-endian byte order, +respectively, with a fixed value, written in decimal, prefixed by @samp{i}. @item byte -An arbitrary byte. +A byte. + +@item bool +A byte with value 0 or 1. + +@item int16 +@itemx be16 +A 16-bit integer in little-endian or big-endian byte order, +respectively. @item int -An arbitrary 32-bit integer. +@itemx be32 +A 32-bit integer in little-endian or big-endian byte order, +respectively. + +@item int64 +@itemx be64 +A 64-bit integer in little-endian or big-endian byte order, +respectively. @item double -An arbitrary 64-bit IEEE floating-point number. +A 64-bit IEEE floating-point number. + +@item float +A 32-bit IEEE floating-point number. 
@item string -A 32-bit integer followed by the specified number of bytes of -character data. (The encoding is indicated by the Formats -nonterminal.) +@itemx bestring +A 32-bit integer, in little-endian or big-endian byte order, +respectively, followed by the specified number of bytes of character +data. (The encoding is indicated by the Formats nonterminal.) @item @var{x}? @var{x} is optional, e.g.@: 00? is an optional zero byte. @@ -540,15 +561,20 @@ In a version 1 @file{.bin} member, @var{x}; in version 3, nothing. In a version 3 @file{.bin} member, @var{x}; in version 1, nothing. @end table -All integer and floating-point values in this format use little-endian -byte order. +Little-endian byte order is far more common in this format, but a few +pieces of the format use big-endian byte order. A ``light'' detail member @file{.bin} consists of a number of sections concatenated together, terminated by a byte 01: @cartouche @format -LightMember @result{} Header Title Caption Footnotes Fonts Formats Dimensions Data 01 +LightMember @result{} + Header Title + Caption Footnotes + Fonts Formats Borders PrintSettings TableSettings + Dimensions Data + 01 @end format @end cartouche @@ -560,6 +586,9 @@ The following sections go into more detail. * PSV Light Member Caption:: * SPV Light Member Footnotes:: * SPV Light Member Fonts:: +* SPV Light Member Borders:: +* SPV Light Member Print Settings:: +* SPV Light Member Table Settings:: * SPV Light Member Formats:: * SPV Light Member Dimensions:: * SPV Light Member Categories:: @@ -571,15 +600,17 @@ The following sections go into more detail. 
@node SPV Light Member Header @subsection Header -An SPV file begins with an 39-byte header: +An SPV light member begins with a 39-byte header: @cartouche @format Header @result{} 01 00 (i1 @math{|} i3)[@t{version}] - 01 (00 @math{|} 01) byte*21 00 00 - int[@t{table-id}] byte*4 + 01 bool*4 int + int[@t{min-column-width}] int[@t{max-column-width}] + int[@t{min-row-height}] int[@t{max-row-height}] + int64[@t{table-id}] @end format @end cartouche @@ -590,8 +621,8 @@ version-specific formatting (as described previously). @code{table-id} is a binary version of the @code{tableId} attribute in the structure member that refers to the detail member. For example, -if @code{tableId} is @code{-4154297861994971133}, then @code{table-id} -would be 0xdca00003. +if @code{tableId} is @code{-4122591256483201023}, then @code{table-id} +would be 0xc6c99d183b300001. The meaning of the other variable parts of the header is not known. @@ -603,7 +634,7 @@ The meaning of the other variable parts of the header is not known. Title @result{} Value[@t{title1}] 01? Value[@t{c}] 01? 31 - Value[@t{title2}] 01? 00? 58 + Value[@t{title2}] 01? @end format @end cartouche @@ -622,11 +653,15 @@ well formatted. For example, for a frequency table, @code{title1} and @cartouche @format -Caption @result{} 58 @math{|} 31 Value[@t{caption}] +Caption @result{} Caption1 Caption2 +Caption1 @result{} 31 Value @math{|} 58 +Caption2 @result{} 31 Value @math{|} 58 @end format @end cartouche -The @code{caption}, if presented, is shown below the table. +The Caption, if present, is shown below the table. Caption2 is +normally present. Caption1 is only rarely nonempty; it might reflect +user editing of the caption. 
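As an illustration of the Header grammar above, the following Python sketch decodes the fixed 39-byte header with the standard @code{struct} module. The function name, the dictionary keys, and the sample width and height values are hypothetical; only the field order and sizes come from the grammar, and all fields are assumed to be little-endian.

```python
import struct

# Assumed layout, taken from the Header grammar: 01 00, version, 01,
# bool*4, int, four size limits, int64 table-id; little-endian throughout.
HEADER = struct.Struct("<2sIB4BI4Iq")  # 2+4+1+4+4+16+8 = 39 bytes


def parse_light_header(data):
    """Decode the fixed header; the returned key names are illustrative."""
    (magic, version, one, f0, f1, f2, f3, unknown,
     min_col, max_col, min_row, max_row, table_id) = HEADER.unpack_from(data)
    if magic != b"\x01\x00" or one != 1:
        raise ValueError("fixed header bytes do not match")
    if version not in (1, 3):
        raise ValueError("unexpected version %d" % version)
    if any(flag not in (0, 1) for flag in (f0, f1, f2, f3)):
        raise ValueError("bool field holds a value other than 0 or 1")
    return {
        "version": version,
        "flags": (f0, f1, f2, f3),   # meaning not known
        "unknown": unknown,          # meaning not known
        "min_column_width": min_col,
        "max_column_width": max_col,
        "min_row_height": min_row,
        "max_row_height": max_row,
        "table_id": table_id,        # signed, comparable to tableId
    }


# Build a synthetic header (made-up sizes) using the tableId from the text.
sample = HEADER.pack(b"\x01\x00", 3, 1, 0, 0, 0, 0, 0,
                     36, 72, 36, 120, -4122591256483201023)
header = parse_light_header(sample)
table_id_hex = "%x" % (header["table_id"] & 0xFFFFFFFFFFFFFFFF)
```

Reading @code{table-id} as a signed 64-bit integer lets it be compared directly with the decimal @code{tableId} attribute; masking to 64 bits gives the hexadecimal form quoted in the text.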
@node SPV Light Member Footnotes @subsection Footnotes @@ -648,19 +683,18 @@ Each footnote has @code{text} and an optional customer @code{marker} @format Fonts @result{} 00 Font*8 Font @result{} - byte[@t{index}] 31 string[@t{typeface}] 00 00 - (10 @math{|} 20 @math{|} 40 @math{|} 50 @math{|} 70 @math{|} 80)[@t{f1}] 41 - (i0 @math{|} i1 @math{|} i2)[@t{f2}] 00 - (i0 @math{|} i2 @math{|} i64173)[@t{f3}] - (i0 @math{|} i1 @math{|} i2 @math{|} i3)[@t{f4}] - string[@t{fgcolor}] string[@t{bgcolor}] i0 i0 00 - v3(int[@t{f5}] int[@t{f6}] int[@t{f7}] int[@t{f8}])) + byte[@t{index}] 31 + string[@t{typeface}] float[@t{size}] int[@t{style}] bool[@t{underline}] + int[@t{halign}] int[@t{valign}] + string[@t{fgcolor}] string[@t{bgcolor}] + byte[@t{alternate}] string[@t{altfg}] string[@t{altbg}] + v3(int[@t{left-margin}] int[@t{right-margin}] int[@t{top-margin}] int[@t{bottom-margin}]) @end format @end cartouche Each Font represents the font style for a different element, in the -following order: title, caption, footnote, row labels, column labels, -corner labels, data, and layers. +following order: title, caption, footer, corner, column +labels, row labels, data, and layers. @code{index} is the 1-based index of the Font, i.e. 1 for the first Font, through 8 for the final Font. @@ -669,34 +703,182 @@ Font, through 8 for the final Font. is @code{SansSerif} in over 99% of instances and @code{Times New Roman} in the rest. +@code{size} is the size of the font, in points. The most common size +in the corpus is 12 points. + +@code{style} is a bit mask. Bit 0 (with value 1) is set for bold, bit +1 (with value 2) is set for italic. + +@code{underline} is 1 if the font is underlined, 0 otherwise. + +@code{halign} specifies horizontal alignment: 0 for center, 2 for +left, 4 for right, 61453 for decimal, 64173 for mixed. Mixed +alignment varies according to type: string data is left-justified, +numbers and most other formats are right-justified. 
+ +@code{valign} specifies vertical alignment: 0 for center, 1 for top, 3 +for bottom. + @code{fgcolor} and @code{bgcolor} are the foreground color and background color, respectively. In the corpus, these are always @code{#000000} and @code{#ffffff}, respectively. -The meaning of the remaining data is unknown. It seems likely to -include font sizes, horizontal and vertical alignment, attributes such -as bold or italic, and margins. - -The table below lists the values observed in the corpus. When a cell -contains a single value, then 99@math{+}% of the corpus contains that value. -When a cell contains a pair of values, then the first value is seen in -about two-thirds of the corpus and the second value in about the -remaining one-third. In fonts that include multiple pairs, values are -correlated, that is, for font 3, f5 = 24, f6 = 24, f7 = 2 appears -about two-thirds of the time, as does the combination of f4 = 0, f6 = -10 for font 7. - -@multitable {font} {40} {f2} {64173} {0/1} {24/11} {10/11} {2/3} {f8} -@headitem font @tab f1 @tab f2 @tab f3 @tab f4 @tab f5 @tab f6 @tab f7 @tab f8 -@item 1 @tab 40 @tab 1 @tab 0 @tab 0 @tab 8 @tab 10/11 @tab 1 @tab 8 -@item 2 @tab 40 @tab 0 @tab 2 @tab 1 @tab 8 @tab 10/11 @tab 1 @tab 1 -@item 3 @tab 40 @tab 0 @tab 2 @tab 1 @tab 24/11 @tab 24/ 8 @tab 2/3 @tab 4 -@item 4 @tab 40 @tab 0 @tab 2 @tab 3 @tab 8 @tab 10/11 @tab 1 @tab 1 -@item 5 @tab 40 @tab 0 @tab 0 @tab 1 @tab 8 @tab 10/11 @tab 1 @tab 4 -@item 6 @tab 40 @tab 0 @tab 2 @tab 1 @tab 8 @tab 10/11 @tab 1 @tab 4 -@item 7 @tab 40 @tab 0 @tab 64173 @tab 0/1 @tab 8 @tab 10/11 @tab 1 @tab 1 -@item 8 @tab 40 @tab 0 @tab 2 @tab 3 @tab 8 @tab 10/11 @tab 1 @tab 4 -@end multitable +@code{alternate} is 01 if rows should alternate colors, 00 if all rows +should be the same color. When @code{alternate} is 01, @code{altfg} +and @code{altbg} specify the colors for the alternate rows. 
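The @code{style}, @code{halign}, and @code{valign} codes described above can be turned into readable values with a small lookup. This is only a sketch: the function name and the returned keys are hypothetical, and any code not listed in the text is reported as @code{unknown} rather than guessed.

```python
# Codes as listed in the text for the Font nonterminal.
HALIGN = {0: "center", 2: "left", 4: "right", 61453: "decimal", 64173: "mixed"}
VALIGN = {0: "center", 1: "top", 3: "bottom"}


def describe_font_fields(style, halign, valign):
    """Interpret the style bit mask and the two alignment codes."""
    return {
        "bold": bool(style & 1),      # bit 0
        "italic": bool(style & 2),    # bit 1
        "halign": HALIGN.get(halign, "unknown"),
        "valign": VALIGN.get(valign, "unknown"),
    }


example = describe_font_fields(style=3, halign=64173, valign=1)
```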
+ +@node SPV Light Member Borders +@subsection Borders + +@cartouche +@format +Borders @result{} + b1[@t{endian}] + be32[@t{n-borders}] Border*[@t{n-borders}] + bool[@t{show-grid-lines}] + 00 00 00 + +Border @result{} + be32[@t{border-type}] + be32[@t{stroke-type}] + be32[@t{color}] +@end format +@end cartouche + +The Borders reflect how borders between regions are drawn. + +The fixed value of @code{endian} can be used to validate the +endianness. + +@code{show-grid-lines} is 1 to draw grid lines, otherwise 0. + +Each Border describes one kind of border. @code{n-borders} seems to +always be 19. Each @code{border-type} appears once in order, and they +correspond to the following borders: + +@table @asis +@item 0 +Title. +@item 1@dots{}4 +Left, top, right, and bottom outer frame. +@item 5@dots{}8 +Left, top, right, and bottom inner frame. +@item 9, 10 +Left and top of data area. +@item 11, 12 +Horizontal and vertical dimension rows. +@item 13, 14 +Horizontal and vertical dimension columns. +@item 15, 16 +Horizontal and vertical category rows. +@item 17, 18 +Horizontal and vertical category columns. +@end table + +@code{stroke-type} describes how a border is drawn, as one of: + +@table @asis +@item 0 +No line. +@item 1 +Solid line. +@item 2 +Dashed line. +@item 3 +Thick line. +@item 4 +Thin line. +@item 5 +Double line. +@end table + +@code{color} is an RGB color. Bits 24--31 are alpha, bits 16--23 are +red, 8--15 are green, 0--7 are blue. An alpha of 255 indicates an +opaque color, therefore opaque black is 0xff000000. + +@node SPV Light Member Print Settings +@subsection Print Settings + +@cartouche +@format +PrintSettings @result{} + b1[@t{endian}] + bool[@t{layers}] + bool[@t{paginate-layers}] + bool[@t{fit-width}] + bool[@t{fit-length}] + bool[@t{top-continuation}] + bool[@t{bottom-continuation}] + be32[@t{n-orphan-lines}] + bestring[@t{continuation-string}] +@end format +@end cartouche + +The PrintSettings reflect settings for printing. 
The fixed value of +@code{endian} can be used to validate the endianness. + +@code{layers} is 1 to print all layers, 0 to print only the visible +layers. + +@code{paginate-layers} is 1 to print each layer at the start of a new +page, 0 otherwise. + +@code{fit-width} and @code{fit-length} control whether the table is +shrunk to fit within a page's width or length, respectively. + +@code{n-orphan-lines} is the minimum number of rows or columns to put +in one part of a table that is broken across pages. + +If @code{top-continuation} is 1, then @code{continuation-string} is +printed at the top of a page when a table is broken across pages for +printing; similarly for @code{bottom-continuation} and the bottom of a +page. Usually, @code{continuation-string} is empty. + +@node SPV Light Member Table Settings +@subsection Table Settings + +@cartouche +@format +TableSettings @result{} + be32[@t{endian}] + be32 + be32[@t{current-layer}] + bool[@t{omit-empty}] + bool[@t{show-row-labels-in-corner}] + bool[@t{show-alphabetic-markers}] + bool[@t{footnote-marker-position}] + v3( + byte + be32[@t{n}] byte*[@t{n}] + bestring + bestring[@t{table-look}] + 00... + ) +@end format +@end cartouche + +The TableSettings reflect display settings. The fixed value of +@code{endian} can be used to validate the endianness. + +@code{current-layer} is the displayed layer. + +If @code{omit-empty} is 1, empty rows or columns (ones with nothing in +any cell) are hidden; otherwise, they are shown. + +If @code{show-row-labels-in-corner} is 1, then row labels are shown in +the upper left corner; otherwise, they are shown nested. + +If @code{show-alphabetic-markers} is 1, markers are shown as letters +(e.g.@: @samp{a}, @samp{b}, @samp{c}, @dots{}); otherwise, they are +shown as numbers starting from 1. + +When @code{footnote-marker-position} is 1, footnote markers are shown +as superscripts, otherwise as subscripts. 
+ +@code{table-look} is the name of an SPSS ``TableLook'' table style, +such as ``Default'' or ``Academic''; it is often empty. + +TableSettings ends with an arbitrary number of null bytes. @node SPV Light Member Formats @subsection Formats @@ -704,9 +886,6 @@ about two-thirds of the time, as does the combination of f4 = 0, f6 = @cartouche @format Formats @result{} - int[@t{n1}] byte*[@t{n1}] - int[@t{n2}] byte*[@t{n2}] - int[@t{n3}] byte*[@t{n3}] int[@t{n4}] int*[@t{n4}] string[@t{encoding}] (i0 @math{|} i-1) (00 @math{|} 01) 00 (00 @math{|} 01) @@ -725,26 +904,12 @@ X6 @result{} int byte[@t{decimal}] byte[@t{grouping}] byte*8 01 - (string[@t{dataset}] string[@t{datafile}] i0 int i0)? + (string[@t{dataset}] string[@t{data file}] i0 int i0)? int[@t{n-ccs}] string*[@t{n-ccs}] 2e (00 @math{|} 01) (i2000000 i0)? @end format @end cartouche -In every example in the corpus, @code{n1} is 240. The meaning of the -bytes that follow it is unknown. - -In every example in the corpus, @code{n2} is 18 and the bytes that -follow it are @code{00 00 00 01 00 00 00 00 00 00 00 00 00 02 00 00 00 -00}. The meaning of these bytes is unknown. - -In every example in the corpus for version 1, @code{n3} is 16 and the -bytes that follow it are @code{00 00 00 01 00 00 00 01 00 00 00 00 01 -01 01 01}. In version 3, observed @code{n3} varies from 117 to 150, -and its bytes include a 1-byte count at offset 0x34. When the count -is nonzero, a text string of that length at offset 0x35 is the name of -a ``TableLook'', e.g. ``Default'' or ``Academic''. - Observed values of @code{n4} vary from 0 to 17. Out of 7,060 examples in the corpus, it is nonzero only 36 times. 
@@ -1057,9 +1222,7 @@ ValueMod @result{} 31 i0 (i0 @math{|} i1 string[@t{subscript}]) v1(00 (i1 @math{|} i2) 00 00 int 00 00) v3(count(FormatString Style ValueModUnknown)) - @math{|} 31 i1 int[@t{footnote-number}] Format - @math{|} 31 i2 (00 @math{|} 01 @math{|} 02) 00 (i1 @math{|} i2 @math{|} i3) Format - @math{|} 31 i3 00 00 01 00 i2 Format + @math{|} 31 int[@t{n-refs}] int16*[@t{n-refs}] Format @math{|} 58 Style @result{} 58 @math{|} 31 01? 00? 00? 00? 01 string[@t{fgcolor}] string[@t{bgcolor}] string[@t{typeface}] byte Format @result{} 00 00 count(FormatString Style 58) @@ -1068,16 +1231,16 @@ ValueModUnknown @result{} 58 @math{|} 31 i0 i0 i0 i0 01 00 (01 @math{|} 02 @math @end format @end cartouche -The @code{footnote-number}, if present, specifies a footnote that the -Value references. The footnote's marker is shown appended to the main -text of the Value, as a superscript. +A ValueMod that begins with ``31 i0'' specifies a string to append to +the main text of the Value, as a subscript. The subscript text is a +brief indicator, e.g.@: @samp{a} or @samp{a,b}, with its meaning +indicated by the table caption. In this usage, subscripts are similar +to footnotes. One apparent difference is that a Value can only +reference one footnote but a subscript can list more than one letter. -The @code{subscript}, if present, specifies a string to append to the -main text of the Value, as a subscript. The subscript text is a brief -indicator, e.g.@: @samp{a} or @samp{a,b}, with its meaning indicated -by the table caption. In this usage, subscripts are similar to -footnotes; one apparent difference is that a Value can only reference -one footnote but a subscript can list more than one letter. +A ValueMod that begins with 31 followed by a nonzero ``int'' specifies +a footnote or footnotes that the Value references. Footnote markers +are shown appended to the main text of the Value, as superscripts. 
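The footnote-reference form of ValueMod (``31 int[n-refs] int16*[n-refs]'') could be read along the following lines. This is a hedged sketch: the function name is hypothetical, the @code{int16} values are assumed to be unsigned, and the Format that follows the references is deliberately left unparsed.

```python
import struct


def parse_footnote_refs(data, offset=0):
    """Return (refs, next_offset) for a ValueMod that lists footnotes.

    Assumes little-endian fields and treats each int16 as unsigned.
    """
    if data[offset] != 0x31:
        raise ValueError("ValueMod does not start with 31")
    (n_refs,) = struct.unpack_from("<I", data, offset + 1)
    if n_refs == 0:
        # "31 i0" is the subscript form, which has a different layout.
        raise ValueError("subscript form, not a footnote reference list")
    refs = struct.unpack_from("<%dH" % n_refs, data, offset + 5)
    return list(refs), offset + 5 + 2 * n_refs


# Synthetic bytes: 31, n-refs = 2, then references 0 and 5.
blob = b"\x31" + struct.pack("<I2H", 2, 0, 5) + b"\x00\x00"
refs, next_offset = parse_footnote_refs(blob)
```

The returned offset points at the first byte of the Format that follows, so a caller can continue parsing from there.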
The Format, if present, is a format string for substitutions using the syntax explained previously. It appears to be an English-language @@ -1804,6 +1967,115 @@ and in a member with four @code{variableReference} elements, a Always observed as @code{0pt}. @end defvr +@subsubheading The @code{axis} Element + +Parent: @code{facetLevel} @* +Contents: @code{label}? @code{majorTicks} + +@defvr {Attribute} style +The @code{id} of a @code{style} element. +@end defvr + +@subsubheading The @code{label} Element + +Parent: @code{axis} or @code{labelFrame} @* +Contents: @code{text}@math{+} @math{|} @code{descriptionGroup} + +This element represents a label on some aspect of the table. For example, +the table's title is a @code{label}. + +The contents of the label can be one or more @code{text} elements or a +@code{descriptionGroup}. + +@defvr {Attribute} style +@defvrx {Optional} textFrameStyle +Each of these is the @code{id} of a @code{style} element. +@code{style} is the style of the label text, @code{textFrameStyle} the +style for the frame around the label. +@end defvr + +@defvr {Optional} purpose +The kind of entity being labeled, one of @code{title}, +@code{subTitle}, @code{layer}, or @code{footnote}. +@end defvr + +@subsubheading The @code{descriptionGroup} Element + +Parent: @code{label} @* +Contents: (@code{description} @math{|} @code{text})@math{+} + +A @code{descriptionGroup} concatenates one or more elements to form a +label. Each element can be a @code{text} element, which contains +literal text, or a @code{description} element that substitutes a value +or a variable name. + +@defvr {Attribute} target +The @code{id} of an element being described. In the corpus, this is +always @code{faceting}. +@end defvr + +@defvr {Attribute} separator +A string to separate the description of multiple groups, if the +@code{target} has more than one. In the corpus, this is always a +new-line. 
+@end defvr + +Typical contents for a @code{descriptionGroup} are a value by itself: +@example +<description name="value"/> +@end example +@noindent or a variable and its value, separated by a colon: +@example +<description name="variable"/><text>:</text><description name="value"/> +@end example + +@subsubheading The @code{description} Element + +Parent: @code{descriptionGroup} @* +Contents: empty + +A @code{description} is like a macro that expands to some property of +the target of its parent @code{descriptionGroup}. + +@defvr {Attribute} name +The name of the property. Only @code{variable} and @code{value} +appear in the corpus. +@end defvr + +@subsubheading The @code{majorTicks} Element + +Parent: @code{axis} @* +Contents: @code{gridline}? + +@defvr {Attribute} labelAngle +@defvrx {Attribute} length +Both are always defined as @code{0}. +@end defvr + +@defvr {Attribute} style +@defvrx {Attribute} tickFrameStyle +Each of these is the @code{id} of a @code{style} element. +@code{style} is the style of the tick labels, @code{tickFrameStyle} +the style for the frames around the labels. +@end defvr + +@subsubheading The @code{gridline} Element + +Parent: @code{majorTicks} @* +Contents: empty + +Represents ``gridlines,'' which for a table are the lines +between its rows or columns (XXX?). + +@defvr {Attribute} style +The style for the gridline. +@end defvr + +@defvr {Attribute} zOrder +Observed as a number between 28 and 31. Does not seem to be +important. +@end defvr + @subsubheading The @code{setCellProperties} Element Parent: @code{facetLayout} @* @@ -1911,6 +2183,7 @@ unknown. * SPV Detail dateTimeFormat Element:: * SPV Detail affix Element:: * SPV Detail relabel Element:: +* SPV Detail union Element:: @end menu @node SPV Detail format Element