work rust
author Ben Pfaff <blp@cs.stanford.edu>
Wed, 19 Nov 2025 17:35:47 +0000 (09:35 -0800)
committer Ben Pfaff <blp@cs.stanford.edu>
Wed, 19 Nov 2025 17:35:47 +0000 (09:35 -0800)
18 files changed:
rust/doc/src/SUMMARY.md
rust/doc/src/invoking/pspp-convert.md
rust/doc/src/invoking/pspp-show-spv.md
rust/doc/src/spv/structure.md
rust/pspp/src/cli/convert.rs
rust/pspp/src/cli/show_spv.rs
rust/pspp/src/output.rs
rust/pspp/src/output/drivers/cairo/fsm.rs
rust/pspp/src/output/drivers/cairo/pager.rs
rust/pspp/src/output/drivers/spv.rs
rust/pspp/src/output/page.rs
rust/pspp/src/output/pivot.rs
rust/pspp/src/output/pivot/look_xml.rs
rust/pspp/src/output/spv.rs
rust/pspp/src/output/spv/css.rs
rust/pspp/src/output/spv/html.rs
rust/pspp/src/output/spv/legacy_xml.rs
rust/pspp/src/sys/cooked.rs

index 1810af5d4e63a4753e782d7ce1ddd4224a6fd259..2d8eab1e45771292b12e22eeaecfbd1a4997adba 100644 (file)
@@ -4,7 +4,7 @@
 [License](license.md)
 
 - [Running PSPP](invoking/index.md)
-  - [Converting Data](invoking/pspp-convert.md)
+  - [Converting File Formats](invoking/pspp-convert.md)
   - [Inspecting System Files](invoking/pspp-show.md)
   - [Inspecting Portable Files](invoking/pspp-show-por.md)
   - [Inspecting SPSS/PC+ Files](invoking/pspp-show-pc.md)
index c9dfaf63e57dcee7f42c5a305563bb8bd51d666a..96fd1d288eff8dface5fd09ec924f7e52ba78275 100644 (file)
@@ -40,7 +40,7 @@ for unrecognized extensions.
 ## Converting `.spv` Viewer Files
 
 `pspp convert` can convert SPSS viewer files (`.spv` files) into
-multiple different formats.
+any of the formats supported for PSPP output.
 
 ## Options
 
@@ -52,10 +52,10 @@ multiple different formats.
 
 * `-e <ENCODING>`  
   `--encoding=<ENCODING>`  
-  Sets the character encoding used to read text strings in the input
-  file.  This is not needed for new enough SPSS data files, but older
-  data files do not identify their encoding, and PSPP cannot always
-  guess correctly.
+  For reading SPSS system files only, sets the character encoding used
+  to read text strings.  This is not needed for new enough SPSS system
+  files, but older files do not identify their encoding, and PSPP
+  cannot always guess correctly.
 
   `<ENCODING>` must be one of the labels for encodings in the
   [Encoding Standard].  PSPP does not support UTF-16 or EBCDIC
@@ -73,9 +73,10 @@ multiple different formats.
 
 * `-p <PASSWORD>`  
   `--password=<PASSWORD>`  
-  Specifies the password for reading an encrypted SPSS system file.
+  Specifies the password for reading an encrypted SPSS system or
+  viewer file.
 
-  `pspp convert` reads, but does not write, encrypted system files.
+  PSPP reads, but does not write, encrypted files.
 
   > ⚠️ The password (and other command-line options) may be visible to
   other users on multiuser systems.
index fafbbfa24d42ca215787c7d8bd27d44127d81985..e080f10e51820b5ed8a3d108eecf8a489a5b99bb 100644 (file)
@@ -37,6 +37,19 @@ The following `<MODE>`s are accepted:
   This is useful for converting a TableLook `.tlo` file from SPSS 15
   or earlier into the newer `.stt` format.
 
+## Options
+
+These options apply to any `<MODE>` that reads an SPV file:
+
+* `-p <PASSWORD>`  
+  `--password=<PASSWORD>`  
+  Specifies the password for reading an encrypted SPV file.
+
+  PSPP reads, but does not write, encrypted SPV files.
+
+  > ⚠️ The password (and other command-line options) may be visible to
+  other users on multiuser systems.
+
 ## Input Selection Options
 
 Commands that read an SPV file operate, by default, on all of the
@@ -120,8 +133,8 @@ for use by PSPP developers:
 
 *  `--members=MEMBER...`  
   Include only the objects that include a listed Zip file `MEMBER`.
-  More than one name may be included, comma-separated.  The members
-  in an SPV file may be listed with the `dir` command by adding the
-  `--show-members` option or with the `zipinfo` program included with
-  many operating systems.  Error messages that `pspp-output` prints
-  when it reads SPV files also often include member names.
+  More than one name may be included, comma-separated.  The members in
+  an SPV file may be listed with the `dir` command by adding the
+  `--member-names` option or with `zipinfo` or another program to view
+  Zip files.  Error messages that `pspp-output` prints when it reads
+  SPV files also often include member names.
index af5caf4ba23df5149d054c2cb4459c820278f5ca..cdf7b51d26569d446c4591fc80d06739210a159c 100644 (file)
@@ -351,9 +351,11 @@ text[container_text]
   :commandName?
   :creator-version?
 => html
+
+html :lang=(en) => TEXT
 ```
 This `text` element is nested inside a `container`.  There is a
-different `text` element that is nested inside a `pageParagraph`.
+[different `text` element that is nested inside a `pageParagraph`](#the-text-element-inside-pageparagraph).
 
 This element has the following attributes.
 
@@ -367,64 +369,55 @@ This element has the following attributes.
 * `creator-version`  
   As on the `heading` element.
 
-## The `html` Element
-
-```
-html :lang=(en) => TEXT
-```
-
-The element contains an HTML document as text (or, in practice, as
-CDATA). In some cases, the document starts with `<html>` and ends with
-`</html>`; in others the `html` element is implied.  Generally the HTML
-includes a `head` element with a CSS stylesheet.  The HTML body often
-begins with `<BR>`.
-
-The HTML document uses only the following elements:
-
-* `html`  
-  Sometimes, the document is enclosed with `<html>`...`</html>`.
-
-* `br`  
-  The HTML body often begins with `<BR>` and may contain it as well.
-
-* `b`  
-  `i`  
-  `u`  
-  Styling.
+### The `html` element
 
-* `font`  
-  The attributes `face`, `color`, and `size` are observed.  The value
-  of `color` takes one of the forms `#RRGGBB` or `rgb (R, G, B)`.
-  The value of `size` is a number between 1 and 7, inclusive.
+The `html` element inside `text` contains an HTML document as text
+(or, in practice, as CDATA).  In some cases, the document starts with
+`<html>` and ends with `</html>`, and in others the `html` element is
+implied.  Generally the HTML includes a `head` element with a CSS
+stylesheet.  The HTML body often begins with `<BR>`.  See [Embedded
+HTML](#embedded-html) for details.
 
-The CSS in the corpus is simple.  To understand it, a parser only
-needs to be able to skip white space, `<!--`, and `-->`, and parse style
-only for `p` elements.  Only the following properties matter:
-
-* `color`  
-  In the form `RRGGBB`, e.g.  `000000`, with no leading `#`.
-
-* `font-weight`  
-  Either `bold` or `normal`.
-
-* `font-style`  
-  Either `italic` or `normal`.
-
-* `text-decoration`  
-  Either `underline` or `normal`.
-
-* `font-family`  
-  A font name, commonly `Monospaced` or `SansSerif`.
-
-* `font-size`  
-  Values claim to be in points, e.g. `14pt`, but the values are
-  actually in "device-independent pixels" (px), at 96/inch.
-
-This element has the following attributes.
+The `html` element has the following attributes:
 
 * `lang`  
   This always contains `en` in the corpus.
 
+> A few examples of typical text in the corpus:
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en">&lt;head>&lt;style type="text/css">p{color:0;font-family:Monospaced;font-size:14pt;font-style:normal;font-weight:normal;text-decoration:none}&lt;/style>&lt;/head>&lt;BR>REGRESSION
+>   /MISSING LISTWISE
+>   /STATISTICS COEFF OUTS R ANOVA
+>   /CRITERIA=PIN(.05) POUT(.10)
+>   /NOORIGIN
+>   /DEPENDENT Pvalues
+>   /METHOD=ENTER MMN.</html>
+> ```
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en">&lt;head>&lt;style type="text/css">p{color:0;font-family:Monospaced;font-size:13pt;font-style:normal;font-weight:normal;text-decoration:none}&lt;/style>&lt;/head>&lt;BR>CROSSTABS&lt;BR>&amp;nbsp;&amp;nbsp;/TABLES=facrec&amp;nbsp;BY&amp;nbsp;nq1e&lt;BR>&amp;nbsp;&amp;nbsp;/FORMAT=AVALUE&amp;nbsp;TABLES&lt;BR>&amp;nbsp;&amp;nbsp;/CELLS=COUNT&amp;nbsp;ROW&lt;BR>&amp;nbsp;&amp;nbsp;/COUNT&amp;nbsp;ROUND&amp;nbsp;CELL.</html>
+> ```
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en">&lt;html>
+>   &lt;head>
+>     &lt;style type="text/css">
+>       &lt;!--
+>         p { font-style: normal; text-decoration: none; font-weight: bold; color: 000000; font-size: 14pt; font-family: Trebuchet MS }
+>       -->
+>     &lt;/style>
+>
+>   &lt;/head>
+>   &lt;body>
+>     &lt;b>&lt;font size="5" face="Times New Roman">                                                                     &lt;u>H&lt;/u>&lt;/font>&lt;u>&lt;font size="5" color="#000000" face="Times New Roman">ousehold
+>     Income (In Thousands)&lt;/font>&lt;/u>&lt;font size="5" color="#000000" face="Times New Roman">
+>     &lt;/font>&lt;/b>
+>   &lt;/body>
+> &lt;/html>
+> </html>
+> ```
+
 ## The `table` Element
 
 ```
@@ -475,10 +468,12 @@ This element has the following attributes.
 
 This element contains the following:
 
-* `tableProperties`: See [Legacy
-  Properties](legacy-detail-xml.md#legacy-properties), for details.
+* `tableProperties`  
+  See [Legacy Properties](legacy-detail-xml.md#legacy-properties), for
+  details.
 
-* `tableStructure`, which in turn contains:
+* `tableStructure`  
+  This element in turn contains:
 
   - Both `path` and `dataPath` for legacy members.
 
@@ -676,21 +671,28 @@ The `pageSetup` element has the following attributes.
 * `space-after`  
   The amount of space between printed objects, typically `12pt`.
 
-## The `text` Element (Inside `pageParagraph`)
+### The `text` Element (Inside `pageParagraph`)
 
 ```
 text[pageParagraph_text] :type=(title | text) => TEXT
 ```
 
 This `text` element is nested inside a `pageParagraph`.  There is a
-different `text` element that is nested inside a `container`.
+[different `text` element that is nested inside a
+`container`](#the-text-element-inside-container).
+
+This element has the following attributes:
 
-The element is either empty, or contains CDATA that holds almost-XHTML
-text: in the corpus, either an `html` or `p` element.  It is
-_almost_-XHTML because the `html` element designates the default
-namespace as `http://xml.spss.com/spss/viewer/viewer-tree` instead of
-an XHTML namespace, and because the CDATA can contain substitution
-variables.  The following variables are supported:
+* `type`  
+  Always `text`.
+
+The element is either empty, or contains CDATA that holds XHTML text
+with a root element of either `html` or `p`.  Text in the XHTML can
+contain substitution variables. The following variables are
+supported:[^1]
+
+[^1]: The `&` characters are escaped as `&amp;`, that is, these are
+    not XML entities, since XML entity names can't begin with `[`.
 
 * `&[Date]`  
   `&[Time]`  
@@ -700,30 +702,274 @@ variables.  The following variables are supported:
   `&[Head2]`  
   `&[Head3]`  
   `&[Head4]`  
-  First-, second-, third-, or fourth-level heading.
+  First-, second-, third-, or fourth-level heading, respectively.
 
 * `&[PageTitle]`  
+  `&[Заголовок страницы]`  
+  `&[頁面標題]`  
   The page title.
 
 * `&[Filename]`  
   Name of the output file.
 
 * `&[Page]`  
+  `&[Страница]`  
+  `&[頁]`  
   The page number.
 
-Typical contents (indented for clarity):
+See [Embedded HTML](#embedded-html) for more information.
+
+> The 23,000 SPV files in the corpus have only 17 unique instances of
+`text` inside `pageParagraph`.  Most of them look similar to this for
+page headers:
+>
+> ```
+> &lt;html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
+>   &lt;head>
+>
+>   &lt;/head>
+>   &lt;body>
+>     &lt;p style="text-align:center; margin-top: 0">
+>       &amp;[PageTitle]
+>     &lt;/p>
+>   &lt;/body>
+> &lt;/html>
+> ```
+>
+> and footers:
+>
+> ```
+> &lt;html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
+>   &lt;head>
+>
+>   &lt;/head>
+>   &lt;body>
+>     &lt;p style="text-align:right; margin-top: 0">
+>       Page &amp;[Page]
+>     &lt;/p>
+>   &lt;/body>
+> &lt;/html>
+> ```
+>
+> Sometimes CSS is present (the original was indented much deeper), with
+> header:
+>
+> ```
+> &lt;html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+>   &lt;head>
+>           &lt;style type="text/css">
+>                   p { font-family: sans-serif;
+>                        font-size: 10pt; text-align: center;
+>                        font-weight: normal;
+>                        color: #000000;
+>                        }
+>           &lt;/style>
+>   &lt;/head>
+>   &lt;body>
+>           &lt;p>&amp;amp;[PageTitle]&lt;/p>
+>   &lt;/body>
+> &lt;/html>
+> ```
+>
+> and footer:
+>
+> ```
+> &lt;html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+>   &lt;head>
+>           &lt;style type="text/css">
+>                   p { font-family: sans-serif;
+>                        font-size: 10pt; text-align: right;
+>                        font-weight: normal;
+>                        color: #000000;
+>                        }
+>           &lt;/style>
+>   &lt;/head>
+>   &lt;body>
+>           &lt;p>Page &amp;amp;[Page]&lt;/p>
+>   &lt;/body>
+> &lt;/html>
+> ```
+>
+> No files in the corpus show any more sophisticated use of features
+> than these examples.
+
+## Embedded HTML
+
+Structure XML contains embedded HTML in two contexts:
+
+- The [`text` element inside `container`](#the-text-element-inside-container).
+
+- The [`text` element inside
+  `pageParagraph`](#the-text-element-inside-pageparagraph).
+
+The use of HTML in both cases is similar.  These HTML documents use
+only the following elements:
+
+* `html`  
+  Sometimes, the document is enclosed with `<html>`...`</html>`.
+
+* `head`  
+  The document often contains a `head` element.  It can be
+  empty or it can contain a `style` element, in turn enclosing CSS
+  within `<!--` and `-->`.  See [embedded CSS](#embedded-css), below,
+  for details.
+
+* `body`  
+  The document often contains a `body` element that contains the
+  content.
+
+* `p`  
+  The document often contains a `p` element that contains the content.
+  [Inside `pageParagraph`](#the-text-element-inside-pageparagraph)
+  only, the document can contain multiple paragraphs.
+
+  The following attributes are observed:
+
+  - `align`  
+    With value `left`, `center`, or `right`.
+
+  - `style`  
+    With value `text-align:<align>; margin-top: 0`, where `<align>` is
+    one of `left`, `center`, or `right`, or simply `margin-top: 0`.
+
+* `br`  
+  The HTML body often begins with a "break" tag and may contain them
+  as well.
+
+  Embedded HTML writes most tag names in lowercase but this one is
+  usually in uppercase, as `<BR>`.
+
+* `b`  
+  `i`  
+  `u`  
+  `strike`  
+  Styling.
+
+* `font`  
+  The following attributes are observed:
+
+  - `face`  
+    A typeface, most often `Monospaced` or `SansSerif`.
+
+  - `color`  
+    One of the forms `#RRGGBB` or `rgb (R, G, B)`.
+
+  - `size`  
+    A number between 1 and 7 with the following meanings:
+
+    | `size` |    Size |
+    |-------:|--------:|
+    |  1[^2] |    6 pt |
+    |      2 |  7.5 pt |
+    |      3 |    9 pt |
+    |      4 | 10.5 pt |
+    |      5 | 13.5 pt |
+    |      6 |   18 pt |
+    |      7 |   27 pt |
+
+    [^2]: This `size` doesn't appear in the corpus.  The size listed
+    is an extrapolation based on what browsers usually do.
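
The `size` table above can be written as a lookup. This is a minimal sketch, not code from this commit: the function name `font_size_to_points` is hypothetical, and the value for `size` 1 carries the same caveat as the footnote (it is an extrapolation).

```rust
/// Maps the `size` attribute of a `font` element to a size in points,
/// following the table in the documentation.  Returns `None` for values
/// outside the range 1 through 7.
///
/// `font_size_to_points` is a hypothetical name used for illustration.
fn font_size_to_points(size: u8) -> Option<f64> {
    match size {
        1 => Some(6.0), // Extrapolated; this size is not seen in the corpus.
        2 => Some(7.5),
        3 => Some(9.0),
        4 => Some(10.5),
        5 => Some(13.5),
        6 => Some(18.0),
        7 => Some(27.0),
        _ => None,
    }
}
```

A caller would typically fall back to the paragraph's default size when this returns `None`.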
+
+> It appears that pasting HTML into the SPSS viewer can cause more
+> general HTML to be included.  The following elements, each observed
+> in only a few files in the corpus, appear to be added by pasting
+> HTML from another application:
+>
+> * `strong`  
+>   `em`  
+>   Styling.
+>
+> * `span`  
+>   The `style` attribute is used a bit.
+>
+> * `li`  
+>   `ul`  
+>   Seen in only one file in the corpus.
+>
+> * `a`  
+>   Seen in only two files in the corpus.  SPSS doesn't allow the link
+>   to be seen or visited.
+>
+> * `table`  
+>   `td`  
+>   `tr`  
+>   Seen in only one file in the corpus.  SPSS doesn't render the
+>   table properly.
+>
+> * `img`  
+>   Seen in only one file in the corpus.  In this file, the `src`
+>   attribute was an invalid `jar:` URL.
+
+Text in embedded HTML often uses non-breaking spaces (U+00A0
+NON-BREAKING SPACE), often written as `&#160;` or `&nbsp;`.  In
+embedded HTML, newlines must be treated as line breaks.
+
+### Embedded CSS
+
+The CSS in the corpus is simple.  To understand it, a parser only
+needs to be able to skip white space, `<!--`, and `-->`, and parse style
+only for `p` elements.  Only the following properties matter:
+
+* `color`  
+  In the form `RRGGBB`, e.g.  `000000`, with no leading `#`.
+
+* `font-weight`  
+  Either `bold` or `normal`.
+
+* `font-style`  
+  Either `italic` or `normal`.
+
+* `text-decoration`  
+  Either `underline` or `none`.
+
+* `font-family`  
+  A font name, commonly `Monospaced` or `SansSerif`.
+
+* `font-size`  
+  Values claim to be in points, e.g. `14pt`, but the values are
+  actually in "device-independent pixels" (px), at 96/inch.
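
The parsing rules above are simple enough to sketch in a few lines. The following is an illustration only, not the parser added in `rust/pspp/src/output/spv/css.rs`: the names `parse_p_style` and `css_font_size_to_points` are hypothetical, and the sketch assumes the stylesheet holds a single `p` rule, as in the corpus examples.

```rust
use std::collections::HashMap;

/// Extracts the declarations of a single `p { ... }` rule, ignoring the
/// `<!--` and `-->` delimiters that often wrap the stylesheet.  Returns an
/// empty map if the stylesheet does not start with a rule for `p`.
///
/// `parse_p_style` is a hypothetical name used for illustration.
fn parse_p_style(css: &str) -> HashMap<String, String> {
    let css = css.replace("<!--", " ").replace("-->", " ");
    let mut props = HashMap::new();

    // The selector is whatever precedes the first `{`.
    let Some(open) = css.find('{') else {
        return props;
    };
    if css[..open].trim() != "p" {
        return props;
    }

    // The declarations run up to the first `}` (or to the end if missing).
    let body = &css[open + 1..];
    let body = match body.find('}') {
        Some(close) => &body[..close],
        None => body,
    };

    for declaration in body.split(';') {
        if let Some((name, value)) = declaration.split_once(':') {
            let name = name.trim().to_ascii_lowercase();
            let value = value.trim();
            if !name.is_empty() && !value.is_empty() {
                props.insert(name, value.to_string());
            }
        }
    }
    props
}

/// Converts a `font-size` value such as `14pt` to true points, treating the
/// number as device-independent pixels at 96 per inch (72 points per inch).
fn css_font_size_to_points(value: &str) -> Option<f64> {
    let px: f64 = value.trim().strip_suffix("pt")?.trim().parse().ok()?;
    Some(px * 72.0 / 96.0)
}
```

Keeping the values as strings leaves interpretation of each property (for example, the `color` forms) to the caller.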
+
+### Examples
+
+Text that looks like "plain **bold** *italic* ~~strikeout~~", for use
+[inside `pageParagraph`]:
 
 ```
-<html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
-    <head></head>
-    <body>
-        <p style="text-align:right; margin-top: 0">Page &[Page]</p>
-    </body>
-</html>
+&lt;html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+  &lt;head>
+
+  &lt;/head>
+  &lt;body>
+    &lt;p>
+      plain&amp;#160;&lt;font color="#000000" size="3" face="Monospaced">&lt;b>bold&lt;/b>&lt;/font>&amp;#160;&lt;font color="#000000" size="3" face="Monospaced">&lt;i>italic&lt;/i>&amp;#160;&lt;strike>strikeout&lt;/strike>&lt;/font>
+    &lt;/p>
+  &lt;/body>
+&lt;/html>
 ```
 
-This element has the following attributes.
+Another example, also for use [inside `pageParagraph`], of three
+paragraphs, the first left justified, the second center justified with
+a large font, and the third right justified:
 
-* `type`  
-  Always `text`.
+```
+&lt;html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+  &lt;head>
+
+  &lt;/head>
+  &lt;body>
+    &lt;p>
+      left
+    &lt;/p>
+    &lt;p align="center">
+      &lt;font color="#000000" size="5" face="Monospaced">center&amp;#160;large&lt;/font>
+    &lt;/p>
+    &lt;p align="right">
+      &lt;font color="#000000" size="3" face="Monospaced">&lt;b>&lt;i>right&lt;/i>&lt;/b>&lt;/font>
+    &lt;/p>
+  &lt;/body>
+&lt;/html>
+```
+
+[inside `pageParagraph`]: #the-text-element-inside-pageparagraph
+[inside `container`]: #the-text-element-inside-container
 
index af206c0639c40af802a1a58bbe67d44a32477d55..c2fdb6eff7a2f9fc950369fc6e3d825cd0ab4d38 100644 (file)
@@ -57,7 +57,11 @@ pub struct Convert {
     #[arg(short = 'e', long, value_parser = parse_encoding)]
     encoding: Option<&'static Encoding>,
 
-    /// Password for decryption, with or without what SPSS calls "password encryption".
+    /// Password for decryption.
+    ///
+    /// In addition to file encryption, SPSS supports a feature called "password
+    /// encryption".  The password can be specified with or without
+    /// "password encryption".
     ///
     /// Specify only for an encrypted system file.
     #[clap(short, long)]
index fa399f9067a5d37a5c605b7ee686dda2c0c03525..aaa4cf6b9f3bf639c429e54e62a5e10c42d40423 100644 (file)
@@ -16,7 +16,7 @@
 
 use anyhow::Result;
 use clap::{Args, ValueEnum};
-use pspp::output::{Criteria, Item};
+use pspp::output::{Criteria, Item, spv};
 use std::{fmt::Display, path::PathBuf};
 
 /// Show information about SPSS viewer files (SPV files).
@@ -33,6 +33,16 @@ pub struct ShowSpv {
     #[arg(required = true)]
     input: PathBuf,
 
+    /// Password for decryption.
+    ///
+    /// In addition to file encryption, SPSS supports a feature called "password
+    /// encryption".  The password can be specified with or without
+    /// "password encryption".
+    ///
+    /// Specify only for an encrypted SPV file.
+    #[clap(short, long)]
+    password: Option<String>,
+
     /// Input selection options.
     #[command(flatten)]
     criteria: Criteria,
@@ -80,7 +90,10 @@ impl ShowSpv {
     pub fn run(self) -> Result<()> {
         match self.mode {
             Mode::Directory => {
-                let item = Item::from_spv_file(&self.input)?.0;
+                let item = spv::ReadOptions::new()
+                    .with_password(self.password)
+                    .open_file(&self.input)?
+                    .into_item();
                 let item = self.criteria.apply(item);
                 for child in item.details.children() {
                     print_item_directory(&child, 0, self.show_member_names);
@@ -88,7 +101,10 @@ impl ShowSpv {
                 Ok(())
             }
             Mode::View => {
-                let item = Item::from_spv_file(&self.input)?.0;
+                let item = spv::ReadOptions::new()
+                    .with_password(self.password)
+                    .open_file(&self.input)?
+                    .into_item();
                 let item = self.criteria.apply(item);
                 for child in item.details.children() {
                     println!("{child}");
index f82ab49db7bd7b3a0e7c3b63f79ea2ba3f6217b7..f07da23a9481615a76586841767f4d5d3ed93733 100644 (file)
@@ -46,7 +46,7 @@ pub mod drivers;
 pub mod page;
 pub mod pivot;
 pub mod render;
-mod spv;
+pub mod spv;
 pub mod table;
 
 /// A single output item.
index 8f87532850906dd8530284ed2e3af92ed808c59b..2a55bb57c5c1225de3cdbe9a7e6a2c02138f9560 100644 (file)
@@ -309,29 +309,24 @@ fn avoid_decimal_split(mut s: String) -> String {
     s
 }
 
-struct CairoDevice<'a> {
-    style: &'a CairoFsmStyle,
-    params: &'a Params,
-    context: &'a Context,
-}
-
-impl CairoDevice<'_> {
-    fn layout_cell(&self, cell: &DrawCell, mut bb: Rect2, clip: &Rect2) -> Coord2 {
+impl<'a> DrawCell<'a> {
+    pub(crate) fn layout(&self, bb: &Rect2, layout: &mut Layout, default_font: &FontDescription) {
         // XXX rotation
-        //let h = if cell.rotate { Axis2::Y } else { Axis2::X };
 
-        let layout = self.style.new_layout(self.context);
+        let mut bb = bb.clone();
+        layout.set_attributes(None);
 
-        let cell_font = if !cell.font_style.font.is_empty() {
-            Some(parse_font_style(&cell.font_style))
+        let parsed_font;
+        let font = if !self.font_style.font.is_empty() {
+            parsed_font = parse_font_style(&self.font_style);
+            &parsed_font
         } else {
-            None
+            default_font
         };
-        let font = cell_font.as_ref().unwrap_or(&self.style.font);
         layout.set_font_description(Some(font));
 
-        let (body, suffixes) = cell.display().split_suffixes();
-        let horz_align = cell.horz_align(&body);
+        let (body, suffixes) = self.display().split_suffixes();
+        let horz_align = self.horz_align(&body);
 
         let mut attrs = None;
         let mut body = if let Some(markup) = body.markup() {
@@ -347,7 +342,7 @@ impl CairoDevice<'_> {
         };
 
         match horz_align {
-            HorzAlign::Decimal { offset, decimal } if !cell.rotate => {
+            HorzAlign::Decimal { offset, decimal } if !self.rotate => {
                 let decimal_position = if let Some(position) = body.rfind(char::from(decimal)) {
                     layout.set_text(&body[position..]);
                     layout.set_width(-1);
@@ -360,7 +355,7 @@ impl CairoDevice<'_> {
             _ => (),
         }
 
-        if cell.font_style.underline {
+        if self.font_style.underline {
             attrs
                 .get_or_insert_default()
                 .insert(AttrInt::new_underline(Underline::Single));
@@ -395,11 +390,11 @@ impl CairoDevice<'_> {
                 let footnote_width = layout.size().0.max(0) as usize;
 
                 // Bound the adjustment by the width of the right margin.
-                let right_margin = px_to_xr(cell.cell_style.margins[Axis2::X][1].max(0) as usize);
+                let right_margin = px_to_xr(self.cell_style.margins[Axis2::X][1].max(0) as usize);
                 let footnote_adjustment = min(footnote_width, right_margin);
 
                 // Adjust the bounding box.
-                if cell.rotate {
+                if self.rotate {
                     bb[Axis2::X].end = bb[Axis2::X].end.saturating_sub(footnote_adjustment);
                 } else {
                     bb[Axis2::X].end = bb[Axis2::X].end.saturating_add(footnote_adjustment);
@@ -442,33 +437,57 @@ impl CairoDevice<'_> {
         } else {
             layout.set_width(bb[Axis2::X].len() as i32);
         }
+    }
 
-        let size = layout.size();
-
-        if !clip.is_empty() {
-            self.context.save().unwrap();
-            if !cell.rotate {
-                xr_clip(self.context, clip);
-            }
-            if cell.rotate {
-                let extra = bb[Axis2::X].len().saturating_sub(size.1.max(0) as usize);
-                let halign_offset = extra / 2;
-                self.context.translate(
-                    xr_to_pt(bb[Axis2::X].start + halign_offset),
-                    xr_to_pt(bb[Axis2::Y].end),
-                );
-                self.context.rotate(-PI / 2.0);
-            } else {
-                self.context
-                    .translate(xr_to_pt(bb[Axis2::X].start), xr_to_pt(bb[Axis2::Y].start));
-            }
-            show_layout(self.context, &layout);
-            self.context.restore().unwrap();
+    pub(crate) fn draw(
+        &self,
+        bb: &Rect2,
+        layout: &Layout,
+        clip: Option<&Rect2>,
+        context: &Context,
+    ) {
+        context.save().unwrap();
+        if !self.rotate
+            && let Some(clip) = clip
+        {
+            xr_clip(context, clip);
+        }
+        if self.rotate {
+            let extra = bb[Axis2::X]
+                .len()
+                .saturating_sub(layout.size().1.max(0) as usize);
+            let halign_offset = extra / 2;
+            context.translate(
+                xr_to_pt(bb[Axis2::X].start + halign_offset),
+                xr_to_pt(bb[Axis2::Y].end),
+            );
+            context.rotate(-PI / 2.0);
+        } else {
+            context.translate(xr_to_pt(bb[Axis2::X].start), xr_to_pt(bb[Axis2::Y].start));
         }
+        show_layout(context, &layout);
+        context.restore().unwrap();
+    }
+}
 
-        layout.set_attributes(None);
+struct CairoDevice<'a> {
+    style: &'a CairoFsmStyle,
+    params: &'a Params,
+    context: &'a Context,
+}
+
+impl CairoDevice<'_> {
+    fn measure_cell(&self, cell: &DrawCell, bb: Rect2) -> Coord2 {
+        let mut layout = self.style.new_layout(self.context);
+        cell.layout(&bb, &mut layout, &self.style.font);
+        let (width, height) = layout.size();
+        Coord2::new(width.max(0) as usize, height.max(0) as usize)
+    }
 
-        Coord2::new(size.0.max(0) as usize, size.1.max(0) as usize)
+    fn cell_draw(&self, cell: &DrawCell, bb: Rect2, clip: &Rect2) {
+        let mut layout = self.style.new_layout(self.context);
+        cell.layout(&bb, &mut layout, &self.style.font);
+        cell.draw(&bb, &layout, Some(clip), &self.context);
     }
 
     fn do_draw_line(
@@ -515,19 +534,14 @@ impl Device for CairoDevice<'_> {
             }
         }
 
-        /// An empty clipping rectangle.
-        fn clip() -> Rect2 {
-            Rect2::default()
-        }
-
         enum_map![
             Extreme::Min => {
                 let bb = Rect2::new(0..1, 0..usize::MAX);
-                add_margins(cell, self.layout_cell(cell, bb, &clip()).x())
+                add_margins(cell, self.measure_cell(cell, bb).x())
             }
             Extreme::Max => {
                 let bb = Rect2::new(0..usize::MAX, 0..usize::MAX);
-                add_margins(cell, self.layout_cell(cell, bb, &clip()).x())
+                add_margins(cell, self.measure_cell(cell, bb).x())
             },
         ]
     }
@@ -538,7 +552,7 @@ impl Device for CairoDevice<'_> {
             0..width.saturating_sub(px_to_xr(margins[Axis2::X].len())),
             0..usize::MAX,
         );
-        self.layout_cell(cell, bb, &Rect2::default()).y() + margin(cell, Axis2::Y)
+        self.measure_cell(cell, bb).y() + margin(cell, Axis2::Y)
     }
 
     fn adjust_break(&self, _cell: &Content, _size: Coord2) -> usize {
@@ -741,7 +755,7 @@ impl Device for CairoDevice<'_> {
                 .saturating_sub(draw_cell.cell_style.margins[axis][0].max(0) as usize);
         }
         if bb[Axis2::X].start < bb[Axis2::X].end && bb[Axis2::Y].start < bb[Axis2::Y].end {
-            self.layout_cell(draw_cell, bb, clip);
+            self.cell_draw(draw_cell, bb, clip);
         }
         self.context.restore().unwrap();
     }
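
The new `DrawCell::layout` chooses between a freshly parsed font and the caller's default by declaring `parsed_font` first and initializing it only in the branch that needs it, which replaces the earlier `Option` plus `unwrap_or`. A minimal standalone sketch of that borrow pattern follows; `parse_font` and `choose_font` are hypothetical names, and plain strings stand in for `FontDescription`.

```rust
/// Hypothetical stand-in for `parse_font_style`: builds an owned value.
fn parse_font(name: &str) -> String {
    format!("{name} 10")
}

/// Returns the cell's own font if it names one, otherwise the default.
fn choose_font(cell_font: &str, default_font: &str) -> String {
    // Declared here so that it outlives the borrow taken below, but only
    // initialized in the branch that actually parses a font.
    let parsed_font: String;
    let font: &str = if !cell_font.is_empty() {
        parsed_font = parse_font(cell_font);
        &parsed_font
    } else {
        default_font
    };
    font.to_string()
}
```

The compiler accepts this because `parsed_font` is only borrowed on the path where it has been assigned.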
index bb22b455a23a22c3d866c80dbefdb364538a5e9f..c26cff9c53a579a6a9bb7906f2c89a38a1a9df53 100644 (file)
@@ -24,10 +24,11 @@ use crate::output::{
     Item, ItemCursor,
     drivers::cairo::{
         fsm::{CairoFsm, CairoFsmStyle},
-        horz_align_to_pango, xr_to_pt,
+        xr_to_pt,
     },
     page::Heading,
-    pivot::Axis2,
+    pivot::{Axis2, CellStyle, FontStyle, Rect2, ValueOptions},
+    table::DrawCell,
 };
 
 #[derive(Clone, Debug)]
@@ -207,7 +208,7 @@ fn measure_headings(page_style: &CairoPageStyle, fsm_style: &CairoFsmStyle) -> [
 
 fn render_heading(
     context: &Context,
-    font: &FontDescription,
+    default_font: &FontDescription,
     heading: &Heading,
     _page_number: i32,
     width: usize,
@@ -216,23 +217,27 @@ fn render_heading(
 ) -> usize {
     let pangocairo_context = pangocairo::functions::create_context(context);
     pangocairo::functions::context_set_resolution(&pangocairo_context, font_resolution);
-    let layout = Layout::new(&pangocairo_context);
-    layout.set_font_description(Some(font));
 
     let mut y = 0;
+    let default_cell_style = CellStyle::default();
+    let default_font_style = FontStyle::default();
+    let value_options = ValueOptions::default();
     for paragraph in &heading.0 {
         // XXX substitute heading variables
-        layout.set_markup(&paragraph.text);
-
-        layout.set_alignment(horz_align_to_pango(paragraph.align));
-        layout.set_width(width as i32);
-
-        context.save().unwrap();
-        context.translate(0.0, xr_to_pt(y + base_y));
-        pangocairo::functions::show_layout(context, &layout);
-        context.restore().unwrap();
-
-        y += layout.height() as usize;
+        let cell = DrawCell {
+            rotate: false,
+            inner: &paragraph.inner,
+            cell_style: paragraph.cell_style().unwrap_or(&default_cell_style),
+            font_style: paragraph.font_style().unwrap_or(&default_font_style),
+            subscripts: paragraph.subscripts(),
+            footnotes: paragraph.footnotes(),
+            value_options: &value_options,
+        };
+        let mut layout = Layout::new(&pangocairo_context);
+        let bb = Rect2::new(0..width, y + base_y..usize::MAX);
+        cell.layout(&bb, &mut layout, &default_font);
+        cell.draw(&bb, &layout, None, context);
+        y += layout.size().1 as usize;
     }
     y
 }
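
The `XXX substitute heading variables` comment in `render_heading` marks substitution as not yet done. One way it could look is sketched below, covering only the `&[Page]`, `&[PageTitle]`, and `&[Filename]` variables and their localized spellings listed in `structure.md`. The `HeadingVars` type and `substitute_heading_vars` function are hypothetical and not part of this commit; the sketch also assumes the text has already been XML-unescaped, so variables appear as `&[Name]`.

```rust
/// Hypothetical container for the values to substitute.
struct HeadingVars<'a> {
    page: u32,
    page_title: &'a str,
    filename: &'a str,
}

/// Replaces `&[Name]` variables in `text`.  Variables that this sketch does
/// not handle (such as `&[Date]`) are copied through unchanged.
fn substitute_heading_vars(text: &str, vars: &HeadingVars) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("&[") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find(']') {
            Some(end) => {
                let name = &after[..end];
                match name {
                    "Page" | "Страница" | "頁" => out.push_str(&vars.page.to_string()),
                    "PageTitle" | "Заголовок страницы" | "頁面標題" => {
                        out.push_str(vars.page_title)
                    }
                    "Filename" => out.push_str(vars.filename),
                    _ => {
                        // Unknown variable: keep it verbatim.
                        out.push_str("&[");
                        out.push_str(name);
                        out.push(']');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated `&[`: copy the remainder as is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Leaving unknown variables in place keeps the output inspectable until the remaining ones are supported.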
index 82e424b0867204a3baabfa17a4bc2067ae43538c..93f2c8d1adf6de366c9ea3b1557af57ce8fbcaf1 100644 (file)
@@ -670,7 +670,10 @@ where
                     for paragraph in &heading.0 {
                         w.create_element("vtx:text")
                             .with_attribute(("text", "title"))
-                            .write_text_content(BytesText::new(&paragraph.text))?;
+                            .write_text_content(
+                                // XXX Need to instead generate HTML and then output it as a string.
+                                BytesText::new(&paragraph.display(()).to_string()),
+                            )?;
                     }
                     Ok(())
                 })?;
index f3eba60394d1cc71c02450158203b1bd9f7b6077..1330aa576755efd51fd38119d86656dec27c4ecd 100644 (file)
@@ -20,9 +20,9 @@ use enum_map::{EnumMap, enum_map};
 use paper_sizes::{Catalog, Length, PaperSize, Unit};
 use serde::{Deserialize, Deserializer, Serialize, de::Error};
 
-use crate::output::{pivot::FontStyle, spv::html::parse_paragraphs};
+use crate::output::{pivot::Value, spv::html::parse_paragraphs};
 
-use super::pivot::{Axis2, HorzAlign};
+use super::pivot::Axis2;
 
 #[derive(Copy, Clone, Debug, Default, PartialEq, Eq, Deserialize, Serialize)]
 #[serde(rename_all = "snake_case")]
@@ -50,25 +50,8 @@ pub enum ChartSize {
     QuarterHeight,
 }
 
-#[derive(Clone, Debug, PartialEq)]
-pub struct Paragraph {
-    pub text: String,
-    pub align: HorzAlign,
-    pub font_style: FontStyle,
-}
-
-impl Default for Paragraph {
-    fn default() -> Self {
-        Self {
-            text: Default::default(),
-            align: HorzAlign::Left,
-            font_style: FontStyle::default().with_size(10),
-        }
-    }
-}
-
 #[derive(Clone, Debug, Default, PartialEq)]
-pub struct Heading(pub Vec<Paragraph>);
+pub struct Heading(pub Vec<Value>);
 
 impl<'de> Deserialize<'de> for Heading {
     fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
index 8f6a91e3734b684c25990e8c57163dfc7e0650f1..5d7725676bea2b899575cf87b919e7b2ac298815 100644 (file)
@@ -155,39 +155,6 @@ impl Serialize for Area {
     }
 }
 
-impl Area {
-    pub fn default_cell_style(self) -> CellStyle {
-        use HorzAlign::*;
-        use VertAlign::*;
-        let (horz_align, vert_align, hmargins, vmargins) = match self {
-            Area::Title => (Some(Center), Middle, [8, 11], [1, 8]),
-            Area::Caption => (Some(Left), Top, [8, 11], [1, 1]),
-            Area::Footer => (Some(Left), Top, [11, 8], [2, 3]),
-            Area::Corner => (Some(Left), Bottom, [8, 11], [1, 1]),
-            Area::Labels(Axis2::X) => (Some(Center), Top, [8, 11], [1, 3]),
-            Area::Labels(Axis2::Y) => (Some(Left), Top, [8, 11], [1, 3]),
-            Area::Data(_) => (None, Top, [8, 11], [1, 1]),
-            Area::Layers => (Some(Left), Bottom, [8, 11], [1, 3]),
-        };
-        CellStyle {
-            horz_align,
-            vert_align,
-            margins: enum_map! { Axis2::X => hmargins, Axis2::Y => vmargins },
-        }
-    }
-
-    pub fn default_font_style(self) -> FontStyle {
-        FontStyle::default().with_bold(self == Area::Title)
-    }
-
-    pub fn default_area_style(self) -> AreaStyle {
-        AreaStyle {
-            cell_style: self.default_cell_style(),
-            font_style: self.default_font_style(),
-        }
-    }
-}
-
 /// Distinguishes [Area::Data] for even-numbered and odd-numbered rows.
 #[derive(Copy, Clone, Debug, Default, Enum, PartialEq, Eq)]
 pub enum RowParity {
@@ -1037,7 +1004,7 @@ impl Default for Look {
             }),
             footnote_marker_type: FootnoteMarkerType::default(),
             footnote_marker_position: FootnoteMarkerPosition::default(),
-            areas: EnumMap::from_fn(Area::default_area_style),
+            areas: EnumMap::from_fn(AreaStyle::default_for_area),
             borders: Border::default_borders(),
             print_all_layers: false,
             paginate_layers: false,
@@ -1190,6 +1157,15 @@ pub struct AreaStyle {
     pub font_style: FontStyle,
 }
 
+impl AreaStyle {
+    pub fn default_for_area(area: Area) -> Self {
+        Self {
+            cell_style: CellStyle::default_for_area(area),
+            font_style: FontStyle::default_for_area(area),
+        }
+    }
+}
+
 #[derive(Clone, Debug, Serialize, PartialEq)]
 pub struct CellStyle {
     /// `None` means "mixed" alignment: align strings to the left, numbers to
@@ -1206,6 +1182,43 @@ pub struct CellStyle {
     pub margins: EnumMap<Axis2, [i32; 2]>,
 }
 
+impl Default for CellStyle {
+    fn default() -> Self {
+        Self::default_for_area(Area::default())
+    }
+}
+
+impl CellStyle {
+    pub fn default_for_area(area: Area) -> Self {
+        use HorzAlign::*;
+        use VertAlign::*;
+        let (horz_align, vert_align, hmargins, vmargins) = match area {
+            Area::Title => (Some(Center), Middle, [8, 11], [1, 8]),
+            Area::Caption => (Some(Left), Top, [8, 11], [1, 1]),
+            Area::Footer => (Some(Left), Top, [11, 8], [2, 3]),
+            Area::Corner => (Some(Left), Bottom, [8, 11], [1, 1]),
+            Area::Labels(Axis2::X) => (Some(Center), Top, [8, 11], [1, 3]),
+            Area::Labels(Axis2::Y) => (Some(Left), Top, [8, 11], [1, 3]),
+            Area::Data(_) => (None, Top, [8, 11], [1, 1]),
+            Area::Layers => (Some(Left), Bottom, [8, 11], [1, 3]),
+        };
+        Self {
+            horz_align,
+            vert_align,
+            margins: enum_map! { Axis2::X => hmargins, Axis2::Y => vmargins },
+        }
+    }
+    pub fn with_horz_align(self, horz_align: Option<HorzAlign>) -> Self {
+        Self { horz_align, ..self }
+    }
+    pub fn with_vert_align(self, vert_align: VertAlign) -> Self {
+        Self { vert_align, ..self }
+    }
+    pub fn with_margins(self, margins: EnumMap<Axis2, [i32; 2]>) -> Self {
+        Self { margins, ..self }
+    }
+}
+
 #[derive(Copy, Clone, Debug, PartialEq, Deserialize, Serialize)]
 #[serde(rename_all = "snake_case")]
 pub enum HorzAlign {
@@ -1237,6 +1250,26 @@ impl HorzAlign {
     }
 }
 
+/// Unknown horizontal alignment.
+#[derive(Copy, Clone, Debug, PartialEq, Eq)]
+pub struct UnknownHorzAlign;
+
+impl FromStr for HorzAlign {
+    type Err = UnknownHorzAlign;
+
+    fn from_str(s: &str) -> Result<Self, Self::Err> {
+        if s.eq_ignore_ascii_case("left") {
+            Ok(Self::Left)
+        } else if s.eq_ignore_ascii_case("center") {
+            Ok(Self::Center)
+        } else if s.eq_ignore_ascii_case("right") {
+            Ok(Self::Right)
+        } else {
+            Err(UnknownHorzAlign)
+        }
+    }
+}
+
 #[derive(Copy, Clone, Debug, PartialEq, Eq, Serialize)]
 #[serde(rename_all = "snake_case")]
 pub enum VertAlign {
@@ -1278,6 +1311,9 @@ impl Default for FontStyle {
 }
 
 impl FontStyle {
+    pub fn default_for_area(area: Area) -> Self {
+        Self::default().with_bold(area == Area::Title)
+    }
     pub fn with_size(self, size: i32) -> Self {
         Self { size, ..self }
     }
@@ -2930,7 +2966,20 @@ impl Value {
 
 impl Debug for Value {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        let name = match &self.inner {
+            ValueInner::Number(_) => "Number",
+            ValueInner::String(_) => "String",
+            ValueInner::Variable(_) => "Variable",
+            ValueInner::Text(_) => "Text",
+            ValueInner::Markup(_) => "Markup",
+            ValueInner::Template(_) => "Template",
+            ValueInner::Empty => "Empty",
+        };
+        f.write_str(name)?;
         write!(f, "{:?}", self.display(()).to_string())?;
+        if let Some(markup) = self.inner.markup() {
+            write!(f, " (markup: {markup:?})")?;
+        }
         if let Some(styling) = &self.styling {
             write!(f, " ({styling:?})")?;
         }
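Note on the `FromStr` implementation for `HorzAlign` added above: it matches the alignment name case-insensitively and reports anything else as `UnknownHorzAlign`. The following is a minimal standalone sketch of that behavior, not the code in this commit. It assumes a simplified three-variant `HorzAlign` (the real enum in `pivot.rs` may have more variants), and `paragraph_align` is a hypothetical helper that illustrates the fall-back-to-left logic used when an `align` attribute is missing or unrecognized.

```rust
use std::str::FromStr;

/// Simplified stand-in for the `HorzAlign` enum (assumption: only the
/// three variants that the parser recognizes are modeled here).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum HorzAlign {
    Left,
    Center,
    Right,
}

/// Unknown horizontal alignment.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct UnknownHorzAlign;

impl FromStr for HorzAlign {
    type Err = UnknownHorzAlign;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Table of accepted names; comparison ignores ASCII case.
        const NAMES: [(&str, HorzAlign); 3] = [
            ("left", HorzAlign::Left),
            ("center", HorzAlign::Center),
            ("right", HorzAlign::Right),
        ];
        NAMES
            .iter()
            .find(|(name, _)| s.eq_ignore_ascii_case(name))
            .map(|(_, align)| *align)
            .ok_or(UnknownHorzAlign)
    }
}

/// Hypothetical helper: resolves an optional `align` attribute value,
/// falling back to left alignment when it is absent or unrecognized.
pub fn paragraph_align(attr: Option<&str>) -> HorzAlign {
    attr.and_then(|s| s.parse().ok()).unwrap_or(HorzAlign::Left)
}
```

With this sketch, `"CENTER".parse::<HorzAlign>()` succeeds, while `"justify"` yields `Err(UnknownHorzAlign)` and `paragraph_align` falls back to `HorzAlign::Left`.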
index 96086d8e85c528633660b9becca1cd81bc8a96e6..fcefe65c355e5e12e05b3918237d881037271f5c 100644 (file)
@@ -367,6 +367,12 @@ impl Length {
     }
 }
 
+impl From<Length> for paper_sizes::Length {
+    fn from(value: Length) -> Self {
+        Self::new(value.0, paper_sizes::Unit::Inch)
+    }
+}
+
 impl Debug for Length {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         write!(f, "{:.2}in", self.0)
index 9512da34518f314235e9f0e53d7b8d9a3c6e88ca..41715c0458a015c2c3fd87c2f2a2a032d13d0ef9 100644 (file)
@@ -20,21 +20,26 @@ use std::{
     path::Path,
 };
 
-use anyhow::Context;
+use anyhow::{Context, anyhow};
 use binrw::{BinRead, error::ContextExt};
 use cairo::ImageSurface;
 use displaydoc::Display;
+use paper_sizes::PaperSize;
 use serde::Deserialize;
 use zip::{ZipArchive, result::ZipError};
 
-use crate::output::{
-    Details, Item, SpvInfo, SpvMembers, Text,
-    page::PageSetup,
-    pivot::{Look, TableProperties, Value},
-    spv::{
-        legacy_bin::LegacyBin,
-        legacy_xml::Visualization,
-        light::{LightError, LightTable},
+use crate::{
+    crypto::EncryptedFile,
+    output::{
+        Details, Item, SpvInfo, SpvMembers, Text,
+        page::{self},
+        pivot::{Axis2, Length, Look, TableProperties, Value},
+        spv::{
+            html::parse_paragraphs,
+            legacy_bin::LegacyBin,
+            legacy_xml::Visualization,
+            light::{LightError, LightTable},
+        },
     },
 };
 
@@ -44,38 +49,80 @@ mod legacy_bin;
 mod legacy_xml;
 mod light;
 
-#[derive(Debug, Display, thiserror::Error)]
-pub enum Error {
-    /// Not an SPV file.
-    NotSpv,
-
-    /// {0}
-    ZipError(#[from] ZipError),
+/// Options for reading an SPV file.
+#[derive(Clone, Debug, Default)]
+pub struct ReadOptions {
+    /// Password to use to unlock an encrypted SPV file.
+    ///
+    /// For an encrypted SPV file, this must be set to the (encoded or
+    /// unencoded) password.
+    ///
+    /// For a plaintext SPV file, this must be None.
+    pub password: Option<String>,
+}
 
-    /// {0}
-    IoError(#[from] std::io::Error),
+impl ReadOptions {
+    /// Construct a new [ReadOptions] without a password.
+    pub fn new() -> Self {
+        Self::default()
+    }
 
-    /// {0}
-    DeError(#[from] quick_xml::DeError),
+    /// Causes the file to be read by decrypting it with the given `password` or
+    /// without decrypting if `password` is None.
+    pub fn with_password(self, password: Option<String>) -> Self {
+        Self { password }
+    }
 
-    /// {0}
-    BinrwError(#[from] binrw::Error),
+    /// Opens the file at `path`.
+    pub fn open_file<P>(mut self, path: P) -> Result<SpvFile, anyhow::Error>
+    where
+        P: AsRef<Path>,
+    {
+        let file = File::open(path)?;
+        if let Some(password) = self.password.take() {
+            self.open_reader_encrypted(file, password)
+        } else {
+            Self::open_reader_inner(file)
+        }
+    }
 
-    /// {0}
-    LightError(#[from] LightError),
+    /// Opens the encrypted file read from `reader`, decrypting it with `password`.
+    fn open_reader_encrypted<R>(self, reader: R, password: String) -> Result<SpvFile, anyhow::Error>
+    where
+        R: Read + Seek + 'static,
+    {
+        Self::open_reader_inner(
+            EncryptedFile::new(reader)?
+                .unlock(password.as_bytes())
+                .map_err(|_| anyhow!("Incorrect password."))?,
+        )
+    }
 
-    /// {0}
-    CairoError(#[from] cairo::IoError),
-}
+    /// Opens the file read from `reader`.
+    pub fn open_reader<R>(mut self, reader: R) -> Result<SpvFile, anyhow::Error>
+    where
+        R: Read + Seek + 'static,
+    {
+        if let Some(password) = self.password.take() {
+            self.open_reader_encrypted(reader, password)
+        } else {
+            Self::open_reader_inner(reader)
+        }
+    }
 
-impl Item {
-    pub fn from_spv_file(path: impl AsRef<Path>) -> Result<(Self, Option<PageSetup>), Error> {
-        Self::from_spv_reader(File::open(path.as_ref())?)
+    fn open_reader_inner<R>(reader: R) -> Result<SpvFile, anyhow::Error>
+    where
+        R: Read + Seek + 'static,
+    {
+        // Open archive.
+        let mut archive = ZipArchive::new(reader).map_err(|error| match error {
+            ZipError::InvalidArchive(_) => Error::NotSpv,
+            other => other.into(),
+        })?;
+        Ok(Self::from_spv_zip_archive(&mut archive)?)
     }
 
-    pub fn from_spv_zip_archive<R>(
-        archive: &mut ZipArchive<R>,
-    ) -> Result<(Self, Option<PageSetup>), Error>
+    fn from_spv_zip_archive<R>(archive: &mut ZipArchive<R>) -> Result<SpvFile, Error>
     where
         R: Read + Seek,
     {
@@ -101,22 +148,55 @@ impl Item {
             }
         }
 
-        Ok((items.into_iter().collect(), page_setup))
+        Ok(SpvFile {
+            item: items.into_iter().collect(),
+            page_setup,
+        })
     }
+}
 
-    pub fn from_spv_reader<R>(reader: R) -> Result<(Self, Option<PageSetup>), Error>
-    where
-        R: Read + Seek,
-    {
-        // Open archive.
-        let mut archive = ZipArchive::new(reader).map_err(|error| match error {
-            ZipError::InvalidArchive(_) => Error::NotSpv,
-            other => other.into(),
-        })?;
-        Self::from_spv_zip_archive(&mut archive)
+pub struct SpvFile {
+    /// SPV file contents.
+    pub item: Item,
+
+    /// The page setup in the SPV file, if any.
+    pub page_setup: Option<page::PageSetup>,
+}
+
+impl SpvFile {
+    pub fn into_parts(self) -> (Item, Option<page::PageSetup>) {
+        (self.item, self.page_setup)
+    }
+
+    pub fn into_item(self) -> Item {
+        self.item
     }
 }
 
+#[derive(Debug, Display, thiserror::Error)]
+pub enum Error {
+    /// Not an SPV file.
+    NotSpv,
+
+    /// {0}
+    ZipError(#[from] ZipError),
+
+    /// {0}
+    IoError(#[from] std::io::Error),
+
+    /// {0}
+    DeError(#[from] quick_xml::DeError),
+
+    /// {0}
+    BinrwError(#[from] binrw::Error),
+
+    /// {0}
+    LightError(#[from] LightError),
+
+    /// {0}
+    CairoError(#[from] cairo::IoError),
+}
+
 fn new_error_item(message: impl Into<Value>) -> Item {
     Text::new_log(message).into_item().with_label("Error")
 }
@@ -125,7 +205,7 @@ fn read_heading<R>(
     archive: &mut ZipArchive<R>,
     file_number: usize,
     structure_member: &str,
-) -> Result<(Vec<Item>, Option<PageSetup>), Error>
+) -> Result<(Vec<Item>, Option<page::PageSetup>), Error>
 where
     R: Read + Seek,
 {
@@ -139,7 +219,11 @@ where
         Err(error) => panic!("{error:?}"),
     };
     let page_setup = heading.page_setup.take();
-    Ok((heading.decode(archive, structure_member)?, page_setup))
+    dbg!(page_setup);
+    Ok((
+        heading.decode(archive, structure_member)?,
+        None, /*XXX*/
+    ))
 }
 
 #[derive(Deserialize, Debug)]
@@ -227,6 +311,179 @@ impl Heading {
     }
 }
 
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageSetup {
+    #[serde(rename = "@initial-page-number")]
+    pub initial_page_number: Option<i32>,
+    #[serde(rename = "@chart-size")]
+    pub chart_size: Option<ChartSize>,
+    #[serde(rename = "@margin-left")]
+    pub margin_left: Option<Length>,
+    #[serde(rename = "@margin-right")]
+    pub margin_right: Option<Length>,
+    #[serde(rename = "@margin-top")]
+    pub margin_top: Option<Length>,
+    #[serde(rename = "@margin-bottom")]
+    pub margin_bottom: Option<Length>,
+    #[serde(rename = "@paper-height")]
+    pub paper_height: Option<Length>,
+    #[serde(rename = "@paper-width")]
+    pub paper_width: Option<Length>,
+    #[serde(rename = "@reference-orientation")]
+    pub reference_orientation: Option<ReferenceOrientation>,
+    #[serde(rename = "@space-after")]
+    pub space_after: Option<Length>,
+    pub page_header: PageHeader,
+    pub page_footer: PageFooter,
+}
+
+impl PageSetup {
+    fn decode(&self) -> page::PageSetup {
+        let mut setup = page::PageSetup::default();
+        if let Some(initial_page_number) = self.initial_page_number {
+            setup.initial_page_number = initial_page_number;
+        }
+        if let Some(chart_size) = self.chart_size {
+            setup.chart_size = chart_size.into();
+        }
+        if let Some(margin_left) = self.margin_left {
+            setup.margins.0[Axis2::X][0] = margin_left.into();
+        }
+        if let Some(margin_right) = self.margin_right {
+            setup.margins.0[Axis2::X][1] = margin_right.into();
+        }
+        if let Some(margin_top) = self.margin_top {
+            setup.margins.0[Axis2::Y][0] = margin_top.into();
+        }
+        if let Some(margin_bottom) = self.margin_bottom {
+            setup.margins.0[Axis2::Y][1] = margin_bottom.into();
+        }
+        match (self.paper_width, self.paper_height) {
+            (Some(width), Some(height)) => {
+                setup.paper = PaperSize::new(width.0, height.0, paper_sizes::Unit::Inch)
+            }
+            (Some(length), None) | (None, Some(length)) => {
+                setup.paper = PaperSize::new(length.0, length.0, paper_sizes::Unit::Inch)
+            }
+            (None, None) => (),
+        }
+        if let Some(reference_orientation) = self.reference_orientation {
+            setup.orientation = reference_orientation.into();
+        }
+        if let Some(space_after) = self.space_after {
+            setup.object_spacing = space_after.into();
+        }
+        if let Some(PageParagraph { text }) = &self.page_header.page_paragraph {
+            setup.header = page::Heading(text.decode());
+        }
+        if let Some(PageParagraph { text }) = &self.page_footer.page_paragraph {
+            setup.footer = page::Heading(text.decode());
+        }
+        setup
+    }
+}
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageHeader {
+    page_paragraph: Option<PageParagraph>,
+}
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageFooter {
+    page_paragraph: Option<PageParagraph>,
+}
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageParagraph {
+    text: PageParagraphText,
+}
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageParagraphText {
+    #[serde(default, rename = "$text")]
+    text: String,
+}
+
+impl PageParagraphText {
+    fn decode(&self) -> Vec<Value> {
+        parse_paragraphs(&self.text)
+    }
+}
+
+#[derive(Copy, Clone, Debug, Default, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ReferenceOrientation {
+    #[serde(alias = "0")]
+    #[serde(alias = "0deg")]
+    #[serde(alias = "inherit")]
+    #[default]
+    Portrait,
+
+    #[serde(alias = "90")]
+    #[serde(alias = "90deg")]
+    #[serde(alias = "-270")]
+    #[serde(alias = "-270deg")]
+    Landscape,
+
+    #[serde(alias = "180")]
+    #[serde(alias = "180deg")]
+    #[serde(alias = "-180")]
+    #[serde(alias = "-180deg")]
+    ReversePortrait,
+
+    #[serde(alias = "270")]
+    #[serde(alias = "270deg")]
+    #[serde(alias = "-90")]
+    #[serde(alias = "-90deg")]
+    Seascape,
+}
+
+impl From<ReferenceOrientation> for page::Orientation {
+    fn from(value: ReferenceOrientation) -> Self {
+        match value {
+            ReferenceOrientation::Portrait | ReferenceOrientation::ReversePortrait => {
+                page::Orientation::Portrait
+            }
+            ReferenceOrientation::Landscape | ReferenceOrientation::Seascape => {
+                page::Orientation::Landscape
+            }
+        }
+    }
+}
+
+/// Chart size.
+#[derive(Copy, Clone, Debug, Default, Deserialize)]
+pub enum ChartSize {
+    #[default]
+    #[serde(rename = "as-is")]
+    AsIs,
+
+    #[serde(rename = "full-height")]
+    FullHeight,
+
+    #[serde(rename = "half-height")]
+    HalfHeight,
+
+    #[serde(rename = "quarter-height")]
+    QuarterHeight,
+}
+
+impl From<ChartSize> for page::ChartSize {
+    fn from(value: ChartSize) -> Self {
+        match value {
+            ChartSize::AsIs => page::ChartSize::AsIs,
+            ChartSize::FullHeight => page::ChartSize::FullHeight,
+            ChartSize::HalfHeight => page::ChartSize::HalfHeight,
+            ChartSize::QuarterHeight => page::ChartSize::QuarterHeight,
+        }
+    }
+}
+
 #[derive(Deserialize, Debug)]
 #[serde(rename_all = "camelCase")]
 enum HeadingContent {
@@ -509,9 +766,10 @@ struct TableStructure {
 #[cfg(test)]
 #[test]
 fn test_spv() {
-    let item = Item::from_spv_file(Path::new("/home/blp/pspp/rust/tests/utilities/regress.spv"))
+    let item = ReadOptions::new()
+        .open_file("/home/blp/pspp/rust/tests/utilities/regress.spv")
         .unwrap()
-        .0;
+        .into_item();
     println!("{item}");
     todo!()
 }
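Note on `ReferenceOrientation` above: the serde aliases enumerate each accepted rotation string (`"90"`, `"90deg"`, `"-270"`, and so on) by hand. As a cross-check on that table, the sketch below derives the same mapping arithmetically by normalizing the angle modulo 360. This is a different technique from the one in the commit, and it is more permissive (it also accepts values such as `"450"`). `parse_reference_orientation` is a hypothetical name, and the enum is a standalone copy for illustration only.

```rust
/// Standalone copy of the orientation enum, for illustration only.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum ReferenceOrientation {
    Portrait,
    Landscape,
    ReversePortrait,
    Seascape,
}

/// Hypothetical helper: maps a rotation string such as "90deg" or "-270"
/// to an orientation by reducing the angle to the range 0..360.
pub fn parse_reference_orientation(s: &str) -> Option<ReferenceOrientation> {
    let s = s.trim();
    if s == "inherit" {
        // "inherit" is treated like an unrotated page.
        return Some(ReferenceOrientation::Portrait);
    }
    // Accept an optional "deg" suffix, then parse a signed integer.
    let degrees: i32 = s.strip_suffix("deg").unwrap_or(s).parse().ok()?;
    match degrees.rem_euclid(360) {
        0 => Some(ReferenceOrientation::Portrait),
        90 => Some(ReferenceOrientation::Landscape),
        180 => Some(ReferenceOrientation::ReversePortrait),
        270 => Some(ReferenceOrientation::Seascape),
        _ => None,
    }
}
```

Under this sketch, `"-180"` and `"180deg"` both map to `ReversePortrait`, and any angle that is not a multiple of 90 degrees is rejected.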
index 4248f28ae1bf14d4d82022c06391a64644394d8b..ac01a6f2e7b6ad3bd0ab687904f74021c8d71ebb 100644 (file)
@@ -6,7 +6,7 @@ use std::{
 
 use itertools::Itertools;
 
-use crate::output::pivot::FontStyle;
+use crate::output::pivot::{FontStyle, HorzAlign};
 
 #[derive(Clone, Debug, PartialEq, Eq)]
 enum Token<'a> {
@@ -95,6 +95,23 @@ impl<'a> Iterator for Lexer<'a> {
     }
 }
 
+impl HorzAlign {
+    pub fn from_css(s: &str) -> Option<Self> {
+        let mut lexer = Lexer(s);
+        while let Some(token) = lexer.next() {
+            if let Token::Id(key) = token
+                && let Some(Token::Colon) = lexer.next()
+                && let Some(Token::Id(value)) = lexer.next()
+                && key.as_ref() == "text-align"
+                && let Ok(align) = value.parse()
+            {
+                return Some(align);
+            }
+        }
+        None
+    }
+}
+
 impl FontStyle {
     pub fn parse_css(&mut self, s: &str) {
         let mut lexer = Lexer(s);
@@ -203,13 +220,31 @@ impl<'a> Display for CssString<'a> {
 
 #[cfg(test)]
 mod tests {
-    use std::borrow::Cow;
+    use std::{borrow::Cow, str::FromStr};
 
     use crate::output::{
-        pivot::{Color, FontStyle},
+        pivot::{Color, FontStyle, HorzAlign, UnknownHorzAlign},
         spv::css::{Lexer, Token},
     };
 
+    #[test]
+    fn css_horz_align() {
+        assert_eq!(
+            HorzAlign::from_css("text-align: left"),
+            Some(HorzAlign::Left)
+        );
+        assert_eq!(
+            HorzAlign::from_css("margin-top: 0; text-align:center"),
+            Some(HorzAlign::Center)
+        );
+        assert_eq!(
+            HorzAlign::from_css("text-align: Right; margin-top:0"),
+            Some(HorzAlign::Right)
+        );
+        assert_eq!(HorzAlign::from_css("text-align: other"), None);
+        assert_eq!(HorzAlign::from_css("margin-top: 0"), None);
+    }
+
     #[test]
     fn css_strings() {
         #[track_caller]
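Note on `HorzAlign::from_css` above: it scans an inline `style` string for a `text-align` declaration and returns the first value that parses as an alignment. The sketch below shows the same idea without the crate's `Lexer`. It splits on `;` and `:` instead of tokenizing, so it is a simplified stand-in rather than the commit's implementation. `horz_align_from_css` is a hypothetical free function, and the three-variant enum is an assumption.

```rust
/// Simplified stand-in for the `HorzAlign` enum (assumption: three variants).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum HorzAlign {
    Left,
    Center,
    Right,
}

/// Hypothetical helper: returns the alignment named by the first parsable
/// `text-align` declaration in `style`, or `None` if there is none.
pub fn horz_align_from_css(style: &str) -> Option<HorzAlign> {
    style.split(';').find_map(|declaration| {
        // Each declaration looks like `key: value`.
        let (key, value) = declaration.split_once(':')?;
        if key.trim() != "text-align" {
            return None;
        }
        let value = value.trim();
        // Values are matched case-insensitively, as in `FromStr`.
        if value.eq_ignore_ascii_case("left") {
            Some(HorzAlign::Left)
        } else if value.eq_ignore_ascii_case("center") {
            Some(HorzAlign::Center)
        } else if value.eq_ignore_ascii_case("right") {
            Some(HorzAlign::Right)
        } else {
            None
        }
    })
}
```

The inputs exercised by the `css_horz_align` test above give the same results with this sketch: an unrecognized value such as `other`, or a style with no `text-align` declaration, yields `None`.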
index baca5a5da73e60a039377e5e3cb660ca75191a74..9d63bcf33a057afd2dee340a4c59027673294477 100644 (file)
@@ -6,10 +6,7 @@ use std::{
 
 use html_parser::{Dom, Element, Node};
 
-use crate::output::{
-    page::Paragraph,
-    pivot::{Color, FontStyle, HorzAlign, Value},
-};
+use crate::output::pivot::{CellStyle, Color, FontStyle, HorzAlign, Value};
 
 fn find_element<'a>(elements: &'a [Node], name: &str) -> Option<&'a Element> {
     for element in elements {
@@ -194,9 +191,6 @@ fn extract_html_text2(node: &Node, base_font_size: i32, output: &mut impl HtmlOu
                         push_whitespace(' ', s);
                     }
                     _ if c.is_whitespace() => push_whitespace(c, s),
-                    '<' => s.push_str("&lt;"),
-                    '>' => s.push_str("&gt;"),
-                    '&' => s.push_str("&amp;"),
                     _ => s.push(c),
                 }
             }
@@ -230,6 +224,18 @@ fn extract_html_text2(node: &Node, base_font_size: i32, output: &mut impl HtmlOu
                     write!(s, "<{tag}>").unwrap();
                     Some(tag)
                 }
+                "strong" => {
+                    write!(s, "<b>").unwrap();
+                    Some("b")
+                }
+                "em" => {
+                    write!(s, "<i>").unwrap();
+                    Some("i")
+                }
+                "strike" => {
+                    write!(s, r#"<span strikethrough="true">"#).unwrap();
+                    Some("span")
+                }
                 "font" => {
                     s.push_str("<span");
                     if let Some(Some(face)) = element.attributes.get("face") {
@@ -243,12 +249,10 @@ fn extract_html_text2(node: &Node, base_font_size: i32, output: &mut impl HtmlOu
                     if let Some(Some(html_size)) = element.attributes.get("size")
                         && let Ok(html_size) = usize::from_str(&html_size)
                         && let Some(index) = html_size.checked_sub(1)
-                        && let Some(scale) = [0.444, 0.556, 0.667, 0.778, 1.0, 1.33, 2.0]
-                            .get(index)
-                            .copied()
+                        && let Some(points) =
+                            [6.0, 7.5, 9.0, 10.5, 13.5, 18.0, 27.0].get(index).copied()
                     {
-                        let size = base_font_size as f64 * scale * 1024.0;
-                        push_attribute("size", format_args!("{size:.0}"), s);
+                        push_attribute("size", format_args!("{points:.1}pt"), s);
                     }
                     s.push('>');
                     Some("span")
@@ -280,26 +284,32 @@ fn parse2(
 ) -> Result<(), html_parser::Error> {
     let dom = Dom::parse(&format!("<!doctype html>{input}"))?;
     for node in &dom.children {
-        match node.element() {
-            Some(head) if head.name.eq_ignore_ascii_case("head") => {
+        match node {
+            Node::Element(head) if head.name.eq_ignore_ascii_case("head") => {
                 if let Some(style) = find_element(&head.children, "style") {
                     let mut text = String::new();
                     get_element_text(style, &mut text);
                     font_style.parse_css(&text)
                 }
             }
-            Some(p) if p.name.eq_ignore_ascii_case("p") => {
-                let align = match p.attributes.get("align") {
-                    Some(Some(align)) if align.eq_ignore_ascii_case("left") => HorzAlign::Left,
-                    Some(Some(align)) if align.eq_ignore_ascii_case("right") => HorzAlign::Right,
-                    Some(Some(align)) if align.eq_ignore_ascii_case("center") => HorzAlign::Center,
-                    _ => HorzAlign::Left,
+            Node::Element(p) if p.name.eq_ignore_ascii_case("p") => {
+                let align = if let Some(Some(s)) = p.attributes.get("align")
+                    && let Ok(align) = HorzAlign::from_str(s)
+                {
+                    align
+                } else if let Some(Some(s)) = p.attributes.get("style")
+                    && let Some(align) = HorzAlign::from_css(s)
+                {
+                    align
+                } else {
+                    HorzAlign::Left
                 };
                 output.start_paragraph(align);
                 extract_html_text2(node, font_style.size, output);
                 output.end_paragraph();
             }
-            _ => extract_html_text2(node, font_style.size, output),
+            Node::Element(_) | Node::Text(_) => extract_html_text2(node, font_style.size, output),
+            Node::Comment(_) => (),
         }
     }
     Ok(())
@@ -326,50 +336,59 @@ pub fn parse_value(input: &str) -> Value {
     .with_font_style(font_style)
 }
 
-pub fn parse_paragraphs(input: &str) -> Vec<Paragraph> {
+pub fn parse_paragraphs(input: &str) -> Vec<Value> {
     let mut font_style = FontStyle::default().with_size(10);
 
-    #[derive(Default)]
     struct Paragraphs {
-        current: Paragraph,
-        finished: Vec<Paragraph>,
+        markup: String,
+        horz_align: HorzAlign,
+        finished: Vec<Value>,
+    }
+
+    impl Default for Paragraphs {
+        fn default() -> Self {
+            Self {
+                markup: String::new(),
+                horz_align: HorzAlign::Left,
+                finished: Vec::new(),
+            }
+        }
     }
 
     impl HtmlOutput for Paragraphs {
         fn start_paragraph(&mut self, align: HorzAlign) {
-            if !self.current.text.is_empty() {
+            if !self.markup.is_empty() {
                 self.end_paragraph();
             }
-            self.current.align = align;
+            self.horz_align = align;
         }
 
         fn end_paragraph(&mut self) {
-            self.finished.push(take(&mut self.current));
+            let value = Value::new_markup(take(&mut self.markup))
+                .with_cell_style(CellStyle::default().with_horz_align(Some(self.horz_align)));
+            self.finished.push(value);
         }
 
         fn text(&mut self) -> &mut String {
-            &mut self.current.text
+            &mut self.markup
         }
     }
 
     let mut output = Paragraphs::default();
     if parse2(input, &mut output, &mut font_style).is_ok() {
-        if !output.current.text.is_empty() {
+        if !output.markup.is_empty() {
             output.end_paragraph();
         }
         output.finished
     } else if !input.is_empty() {
-        vec![Paragraph {
-            text: input.into(),
-            ..Paragraph::default()
-        }]
+        vec![Value::new_user_text(input)]
     } else {
         Vec::new()
     }
 }
 
 pub fn parse(input: &str) -> Value {
-    let mut font_style = FontStyle::default().with_size(10);
+    let mut font_style = FontStyle::default();
     let value = match Dom::parse(&format!("<!doctype html>{input}")) {
         Ok(dom) => {
             let mut s = String::new();
@@ -395,11 +414,30 @@ pub fn parse(input: &str) -> Value {
 
 #[cfg(test)]
 mod tests {
+    use quick_xml::events::Event;
+
     use crate::output::{
         pivot::{FontStyle, Value},
         spv::html::{parse, parse_paragraphs, parse_value},
     };
 
+    #[test]
+    fn test_parse() {
+        let text = r##"<xml>&lt;html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+  &lt;head>
+
+  &lt;/head>
+  &lt;body>
+    &lt;p>
+      plain&amp;#160;&lt;font color="#000000" size="3" face="Monospaced">&lt;b>bold&lt;/b>&lt;/font>&amp;#160;&lt;font color="#000000" size="3" face="Monospaced">&lt;i>italic&lt;/i>&amp;#160;&lt;strike>strikeout&lt;/strike>&lt;/font>
+    &lt;/p>
+  &lt;/body>
+&lt;/html>
+</xml>"##;
+        let content = quick_xml::de::from_str::<String>(text).unwrap();
+        dbg!(parse_paragraphs(&content));
+    }
+
     #[test]
     fn css() {
         assert_eq!(
@@ -430,12 +468,32 @@ mod tests {
 
     #[test]
     fn paragraphs() {
+        let paragraphs = parse_paragraphs(
+            r#"<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+                                                        <head>
+                                                                <style type="text/css">
+                                                                        p { font-family: sans-serif;
+                                                                             font-size: 10pt; text-align: center;
+                                                                             font-weight: normal;
+                                                                             color: #000000;
+                                                                             }
+                                                                </style>
+                                                        </head>
+                                                        <body>
+                                                                <p>&amp;[PageTitle]</p>
+                                                        </body>
+                                                </html>]"#,
+        );
+        dbg!(&paragraphs);
+        for value in &paragraphs {
+            println!("{}", value.display(()));
+        }
+        todo!();
         let paragraphs = parse_paragraphs(
             r#"<p align="left"><b>bold</b><br><i>italic</i><BR><b><i>bold italic</i></b><br><font color="red" face="Serif">red serif</font><br><font size="7">big</font><br></p>not in a paragraph<p align="right">right justified</p><p align="center">centered</p>trailing"#,
         );
         dbg!(&paragraphs);
         assert_eq!(paragraphs.len(), 5);
-        todo!()
         /*
         assert_eq!(
             paragraph,
index f44da6dfe681742899cf0d74a0967b260b3d4b2e..1e218efcc3122645e19f29b9f595a7a466bd61c1 100644 (file)
@@ -393,7 +393,7 @@ impl Visualization {
                 && let Some(label) = &axis.label
             {
                 let out = &mut look.areas[Area::Labels(a)];
-                *out = Area::Labels(a).default_area_style();
+                *out = AreaStyle::default_for_area(Area::Labels(a));
                 let style = label.style.get(&styles);
                 Style::decode_area(
                     style,
@@ -784,7 +784,7 @@ impl Visualization {
                             frame: None,
                             format: None,
                         } if alternating => {
-                            let mut style = Area::Data(RowParity::Odd).default_area_style();
+                            let mut style = AreaStyle::default_for_area(Area::Data(RowParity::Odd));
                             Style::decode_area(self.labeling, self.graph, &mut style);
                             let font_style = &mut look.areas[Area::Data(RowParity::Odd)].font_style;
                             font_style.fg = style.font_style.fg;
index d3e43b6b157e35d67332821d0f36b8d744ba7a1d..6b09440ac8a9f8e9852abb4b5ce41e22f248e10d 100644 (file)
@@ -515,7 +515,7 @@ impl<F> ReadOptions<F> {
     }
 
     /// Causes the file to be read by decrypting it with the given `password` or
-    /// without decrypting if `encoding` is None.
+    /// without decrypting if `password` is None.
     pub fn with_password(self, password: Option<String>) -> Self {
         Self { password, ..self }
     }