[License](license.md)
- [Running PSPP](invoking/index.md)
- - [Converting Data](invoking/pspp-convert.md)
+ - [Converting File Formats](invoking/pspp-convert.md)
- [Inspecting System Files](invoking/pspp-show.md)
- [Inspecting Portable Files](invoking/pspp-show-por.md)
- [Inspecting SPSS/PC+ Files](invoking/pspp-show-pc.md)
## Converting `.spv` Viewer Files
`pspp convert` can convert SPSS viewer files (`.spv` files) into
-multiple different formats.
+any of the formats supported for PSPP output.
## Options
* `-e <ENCODING>`
`--encoding=<ENCODING>`
- Sets the character encoding used to read text strings in the input
- file. This is not needed for new enough SPSS data files, but older
- data files do not identify their encoding, and PSPP cannot always
- guess correctly.
+ For reading SPSS system files only, sets the character encoding used
+ to read text strings. This is not needed for new enough SPSS system
+ files, but older files do not identify their encoding, and PSPP
+ cannot always guess correctly.
`<ENCODING>` must be one of the labels for encodings in the
[Encoding Standard]. PSPP does not support UTF-16 or EBCDIC
* `-p <PASSWORD>`
`--password=<PASSWORD>`
- Specifies the password for reading an encrypted SPSS system file.
+ Specifies the password for reading an encrypted SPSS system or
+ viewer file.
- `pspp convert` reads, but does not write, encrypted system files.
+ PSPP reads, but does not write, encrypted files.
> ⚠️ The password (and other command-line options) may be visible to
other users on multiuser systems.
This is useful for converting a TableLook `.tlo` file from SPSS 15
or earlier into the newer `.stt` format.
+## Options
+
+These options apply to any `<MODE>` that reads an SPV file:
+
+* `-p <PASSWORD>`
+ `--password=<PASSWORD>`
+ Specifies the password for reading an encrypted SPV file.
+
+ PSPP reads, but does not write, encrypted SPV files.
+
+ > ⚠️ The password (and other command-line options) may be visible to
+ other users on multiuser systems.
+
## Input Selection Options
Commands that read an SPV file operate, by default, on all of the
* `--members=MEMBER...`
Include only the objects that include a listed Zip file `MEMBER`.
- More than one name may be included, comma-separated. The members
- in an SPV file may be listed with the `dir` command by adding the
- `--show-members` option or with the `zipinfo` program included with
- many operating systems. Error messages that `pspp-output` prints
- when it reads SPV files also often include member names.
+ More than one name may be included, comma-separated. The members in
+ an SPV file may be listed with the `dir` command by adding the
+ `--member-names` option or with `zipinfo` or another program to view
+ Zip files. Error messages that `pspp-output` prints when it reads
+ SPV files also often include member names.
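The `--members` filter described above can be sketched in Rust as follows. This is a minimal illustration, not PSPP's implementation. The function name `matches_members` is hypothetical. The sketch assumes that each object's Zip member names are already available as a slice, and that names are compared exactly.

```rust
/// Returns true if an object whose Zip members are `object_members` should be
/// kept, given the comma-separated value of `--members`.
///
/// Hypothetical helper: exact, case-sensitive matching is an assumption.
fn matches_members(object_members: &[&str], members_option: &str) -> bool {
    members_option
        .split(',')
        // Ignore empty entries such as those produced by a trailing comma.
        .filter(|wanted| !wanted.is_empty())
        // Keep the object if any listed name is one of its members.
        .any(|wanted| object_members.iter().any(|member| *member == wanted))
}
```

For example, an object stored in members `a.xml` and `a_table.bin` (placeholder names) is kept by `--members=b.xml,a_table.bin`.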
:commandName?
:creator-version?
=> html
+
+html :lang=(en) => TEXT
```
This `text` element is nested inside a `container`. There is a
-different `text` element that is nested inside a `pageParagraph`.
+[different `text` element that is nested inside a `pageParagraph`](#the-text-element-inside-pageparagraph).
This element has the following attributes.
* `creator-version`
As on the `heading` element.
-## The `html` Element
-
-```
-html :lang=(en) => TEXT
-```
-
-The element contains an HTML document as text (or, in practice, as
-CDATA). In some cases, the document starts with `<html>` and ends with
-`</html>`; in others the `html` element is implied. Generally the HTML
-includes a `head` element with a CSS stylesheet. The HTML body often
-begins with `<BR>`.
-
-The HTML document uses only the following elements:
-
-* `html`
- Sometimes, the document is enclosed with `<html>`...`</html>`.
-
-* `br`
- The HTML body often begins with `<BR>` and may contain it as well.
-
-* `b`
- `i`
- `u`
- Styling.
+### The `html` Element
-* `font`
- The attributes `face`, `color`, and `size` are observed. The value
- of `color` takes one of the forms `#RRGGBB` or `rgb (R, G, B)`.
- The value of `size` is a number between 1 and 7, inclusive.
+The `html` element inside `text` contains an HTML document as text
+(or, in practice, as CDATA). In some cases, the document starts with
+`<html>` and ends with `</html>`, and in others the `html` element is
+implied. Generally the HTML includes a `head` element with a CSS
+stylesheet. The HTML body often begins with `<BR>`. See [Embedded
+HTML](#embedded-html) for details.
-The CSS in the corpus is simple. To understand it, a parser only
-needs to be able to skip white space, `<!--`, and `-->`, and parse style
-only for `p` elements. Only the following properties matter:
-
-* `color`
- In the form `RRGGBB`, e.g. `000000`, with no leading `#`.
-
-* `font-weight`
- Either `bold` or `normal`.
-
-* `font-style`
- Either `italic` or `normal`.
-
-* `text-decoration`
- Either `underline` or `normal`.
-
-* `font-family`
- A font name, commonly `Monospaced` or `SansSerif`.
-
-* `font-size`
- Values claim to be in points, e.g. `14pt`, but the values are
- actually in "device-independent pixels" (px), at 96/inch.
-
-This element has the following attributes.
+The `html` element has the following attributes:
* `lang`
This always contains `en` in the corpus.
+> A few examples of typical text in the corpus:
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head><style type="text/css">p{color:0;font-family:Monospaced;font-size:14pt;font-style:normal;font-weight:normal;text-decoration:none}</style></head><BR>REGRESSION
+> /MISSING LISTWISE
+> /STATISTICS COEFF OUTS R ANOVA
+> /CRITERIA=PIN(.05) POUT(.10)
+> /NOORIGIN
+> /DEPENDENT Pvalues
+> /METHOD=ENTER MMN.</html>
+> ```
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head><style type="text/css">p{color:0;font-family:Monospaced;font-size:13pt;font-style:normal;font-weight:normal;text-decoration:none}</style></head><BR>CROSSTABS<BR>&nbsp;&nbsp;/TABLES=facrec&nbsp;BY&nbsp;nq1e<BR>&nbsp;&nbsp;/FORMAT=AVALUE&nbsp;TABLES<BR>&nbsp;&nbsp;/CELLS=COUNT&nbsp;ROW<BR>&nbsp;&nbsp;/COUNT&nbsp;ROUND&nbsp;CELL.</html>
+> ```
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en"><html>
+> <head>
+> <style type="text/css">
+> <!--
+> p { font-style: normal; text-decoration: none; font-weight: bold; color: 000000; font-size: 14pt; font-family: Trebuchet MS }
+> -->
+> </style>
+>
+> </head>
+> <body>
+> <b><font size="5" face="Times New Roman"> <u>H</u></font><u><font size="5" color="#000000" face="Times New Roman">ousehold
+> Income (In Thousands)</font></u><font size="5" color="#000000" face="Times New Roman">
+> </font></b>
+> </body>
+> </html>
+> </html>
+> ```
+
## The `table` Element
```
This element contains the following:
-* `tableProperties`: See [Legacy
- Properties](legacy-detail-xml.md#legacy-properties), for details.
+* `tableProperties`
+ See [Legacy Properties](legacy-detail-xml.md#legacy-properties), for
+ details.
-* `tableStructure`, which in turn contains:
+* `tableStructure`
+ This element in turn contains:
- Both `path` and `dataPath` for legacy members.
* `space-after`
The amount of space between printed objects, typically `12pt`.
-## The `text` Element (Inside `pageParagraph`)
+### The `text` Element (Inside `pageParagraph`)
```
text[pageParagraph_text] :type=(title | text) => TEXT
```
This `text` element is nested inside a `pageParagraph`. There is a
-different `text` element that is nested inside a `container`.
+[different `text` element that is nested inside a
+`container`](#the-text-element-inside-container).
+
+This element has the following attributes:
-The element is either empty, or contains CDATA that holds almost-XHTML
-text: in the corpus, either an `html` or `p` element. It is
-_almost_-XHTML because the `html` element designates the default
-namespace as `http://xml.spss.com/spss/viewer/viewer-tree` instead of
-an XHTML namespace, and because the CDATA can contain substitution
-variables. The following variables are supported:
+* `type`
+ Always `text`.
+
+The element is either empty, or contains CDATA that holds XHTML text
+with a root element of either `html` or `p`. Text in the XHTML can
+contain substitution variables. The following variables are
+supported:[^1]
+
+[^1]: In the XML, each `&` is escaped as `&amp;`. That is, these are
+ not XML entities, since XML entity names can't begin with `[`.
* `&[Date]`
`&[Time]`
`&[Head2]`
`&[Head3]`
`&[Head4]`
- First-, second-, third-, or fourth-level heading.
+ First-, second-, third-, or fourth-level heading, respectively.
* `&[PageTitle]`
+ `&[Заголовок страницы]`
+ `&[頁面標題]`
The page title.
* `&[Filename]`
Name of the output file.
* `&[Page]`
+ `&[Страница]`
+ `&[頁]`
The page number.
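The following is a minimal sketch of substituting these variables into header or footer text that has already been XML-unescaped, so that `&amp;[Page]` has become `&[Page]`. The `PageVariables` struct and the `lookup` and `substitute` functions are hypothetical. `&[Head1]` is assumed by analogy with `&[Head2]` through `&[Head4]`. Unknown variables are left untouched, which is a reasonable choice rather than documented SPSS behavior.

```rust
/// Values available for substitution (hypothetical representation).
struct PageVariables<'a> {
    date: &'a str,
    time: &'a str,
    /// First- through fourth-level headings.
    headings: [&'a str; 4],
    page_title: &'a str,
    filename: &'a str,
    page: u32,
}

/// Maps a variable name, including the localized spellings listed above,
/// to its value.
fn lookup(name: &str, vars: &PageVariables) -> Option<String> {
    Some(match name {
        "Date" => vars.date.to_string(),
        "Time" => vars.time.to_string(),
        "Head1" => vars.headings[0].to_string(), // assumed by analogy
        "Head2" => vars.headings[1].to_string(),
        "Head3" => vars.headings[2].to_string(),
        "Head4" => vars.headings[3].to_string(),
        "PageTitle" | "Заголовок страницы" | "頁面標題" => vars.page_title.to_string(),
        "Filename" => vars.filename.to_string(),
        "Page" | "Страница" | "頁" => vars.page.to_string(),
        _ => return None,
    })
}

/// Replaces each `&[Name]` in `text` with its value. Unknown or unterminated
/// variables are copied through unchanged.
fn substitute(text: &str, vars: &PageVariables) -> String {
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find("&[") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find(']') {
            Some(end) => {
                match lookup(&after[..end], vars) {
                    Some(value) => out.push_str(&value),
                    // Copy `&[Name]` verbatim, including the closing `]`.
                    None => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```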
-Typical contents (indented for clarity):
+See [Embedded HTML](#embedded-html) for more information.
+
+> The 23,000 SPV files in the corpus have only 17 unique instances of
+`text` inside `pageParagraph`. Most of them look similar to this for
+page headers:
+>
+> ```
+> <html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
+> <head>
+>
+> </head>
+> <body>
+> <p style="text-align:center; margin-top: 0">
+> &[PageTitle]
+> </p>
+> </body>
+> </html>
+> ```
+>
+> and footers:
+>
+> ```
+> <html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
+> <head>
+>
+> </head>
+> <body>
+> <p style="text-align:right; margin-top: 0">
+> Page &[Page]
+> </p>
+> </body>
+> </html>
+> ```
+>
+> Sometimes CSS is present (the original was indented much deeper), with
+> header:
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+> <head>
+> <style type="text/css">
+> p { font-family: sans-serif;
+> font-size: 10pt; text-align: center;
+> font-weight: normal;
+> color: #000000;
+> }
+> </style>
+> </head>
+> <body>
+> <p>&amp;[PageTitle]</p>
+> </body>
+> </html>
+> ```
+>
+> and footer:
+>
+> ```
+> <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+> <head>
+> <style type="text/css">
+> p { font-family: sans-serif;
+> font-size: 10pt; text-align: right;
+> font-weight: normal;
+> color: #000000;
+> }
+> </style>
+> </head>
+> <body>
+> <p>Page &amp;[Page]</p>
+> </body>
+> </html>
+> ```
+>
+> No files in the corpus show any more sophisticated use of features
+> than these examples.
+
+## Embedded HTML
+
+Structure XML contains embedded HTML in two contexts:
+
+- The [`text` element inside `container`](#the-text-element-inside-container).
+
+- The [`text` element inside
+ `pageParagraph`](#the-text-element-inside-pageparagraph).
+
+The use of HTML in both cases is similar. These HTML documents use
+only the following elements:
+
+* `html`
+ Sometimes, the document is enclosed with `<html>`...`</html>`.
+
+* `head`
+ The document often contains a `head` element. It can be
+ empty or it can contain a `style` element, in turn enclosing CSS
+ within `<!--` and `-->`. See [embedded CSS](#embedded-css), below,
+ for details.
+
+* `body`
+ The document often contains a `body` element that contains the
+ content.
+
+* `p`
+ The document often contains a `p` element that contains the content.
+ [Inside `pageParagraph`](#the-text-element-inside-pageparagraph)
+ only, the document can contain multiple paragraphs.
+
+ The following attributes are observed:
+
+ - `align`
+ With value `left`, `center`, or `right`.
+
+ - `style`
+ With value `text-align:<align>; margin-top: 0`, where `<align>` is
+ one of `left`, `center`, or `right`, or simply `margin-top: 0`.
+
+* `br`
+ The HTML body often begins with a "break" tag and may contain more
+ of them elsewhere.
+
+ Embedded HTML writes most tag names in lowercase, but this one is
+ usually uppercase, as `<BR>`.
+
+* `b`
+ `i`
+ `u`
+ `strike`
+ Styling.
+
+* `font`
+ The following attributes are observed:
+
+ - `face`
+ A typeface, most often `Monospaced` or `SansSerif`.
+
+ - `color`
+ One of the forms `#RRGGBB` or `rgb (R, G, B)`.
+
+ - `size`
+ A number between 1 and 7 with the following meanings:
+
+ | `size` | Size |
+ |-------:|--------:|
+ | 1[^2] | 6 pt |
+ | 2 | 7.5 pt |
+ | 3 | 9 pt |
+ | 4 | 10.5 pt |
+ | 5 | 13.5 pt |
+ | 6 | 18 pt |
+ | 7 | 27 pt |
+
+ [^2]: This `size` doesn't appear in the corpus. The size listed
+ is an extrapolation based on what browsers usually do.
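The `size` table and the two `color` forms translate directly into code. The function names in this sketch are hypothetical. Size 1 uses the extrapolated value from the footnote, and sizes outside 1 through 7 are rejected.

```rust
/// Converts a `<font size=...>` value (1 through 7) to points, following the
/// table above.
fn font_size_to_points(size: u8) -> Option<f64> {
    const POINTS: [f64; 7] = [6.0, 7.5, 9.0, 10.5, 13.5, 18.0, 27.0];
    // `checked_sub` rejects 0; `get` rejects anything above 7.
    POINTS.get(usize::from(size).checked_sub(1)?).copied()
}

/// Parses a `<font color=...>` value in either observed form, `#RRGGBB` or
/// `rgb (R, G, B)`, into an (r, g, b) triple.
fn parse_font_color(s: &str) -> Option<(u8, u8, u8)> {
    let s = s.trim();
    if let Some(hex) = s.strip_prefix('#') {
        if hex.len() != 6 || !hex.is_ascii() {
            return None;
        }
        let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
        return Some((channel(0)?, channel(2)?, channel(4)?));
    }
    // `rgb (R, G, B)`, tolerating optional white space before `(`.
    let inner = s
        .strip_prefix("rgb")?
        .trim_start()
        .strip_prefix('(')?
        .strip_suffix(')')?;
    let mut parts = inner.split(',').map(|part| part.trim().parse::<u8>().ok());
    let (r, g, b) = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // More than three components.
    }
    Some((r, g, b))
}
```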
+
+> It appears that pasting HTML into the SPSS viewer can cause more
+> general HTML to be included. Each of the following elements is
+> observed in only a few files in the corpus and appears to have been
+> added by pasting HTML from another application:
+>
+> * `strong`
+> `em`
+> Styling.
+>
+> * `span`
+> Occasionally carries a `style` attribute.
+>
+> * `li`
+> `ul`
+> Seen in only one file in the corpus.
+>
+> * `a`
+> Seen in only two files in the corpus. SPSS doesn't allow the link
+> to be seen or visited.
+>
+> * `table`
+> `td`
+> `tr`
+> Seen in only one file in the corpus. SPSS doesn't render the
+> table properly.
+>
+> * `img`
+> Seen in only one file in the corpus. In this file, the `src`
+> attribute was an invalid `jar:` URL.
+
+Text in embedded HTML often uses non-breaking spaces (U+00A0
+NO-BREAK SPACE), written as `&nbsp;` or `&#160;`. In embedded HTML,
+newlines must be treated as line breaks, unlike in ordinary HTML,
+where they are just white space.
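The following sketch applies these rules to the text between tags. It decodes both observed spellings of the non-breaking space and then splits the text on newlines. `&nbsp;` is not one of XML's predefined entities, so a strict XML parser will not decode it, which is why the sketch replaces it textually. Converting U+00A0 to an ordinary space is an assumption that only suits plain-text output; a renderer would keep U+00A0 to prevent line breaks. The function name is hypothetical.

```rust
/// Splits embedded-HTML text content into plain-text lines. Each newline is a
/// line break, and non-breaking spaces become ordinary spaces.
fn embedded_text_lines(raw: &str) -> Vec<String> {
    raw.replace("&nbsp;", "\u{a0}")
        .replace("&#160;", "\u{a0}")
        // `lines` treats both "\n" and "\r\n" as line endings.
        .lines()
        .map(|line| line.replace('\u{a0}', " "))
        .collect()
}
```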
+
+### Embedded CSS
+
+The CSS in the corpus is simple. To understand it, a parser only
+needs to be able to skip white space, `<!--`, and `-->`, and parse style
+only for `p` elements. Only the following properties matter:
+
+* `color`
+ Usually in the form `RRGGBB`, e.g. `000000`, with no leading `#`,
+ although the examples above also include `#000000` and `0`.
+
+* `font-weight`
+ Either `bold` or `normal`.
+
+* `font-style`
+ Either `italic` or `normal`.
+
+* `text-decoration`
+ Either `underline` or `none`.
+
+* `font-family`
+ A font name, commonly `Monospaced` or `SansSerif`.
+
+* `font-size`
+ Values claim to be in points, e.g. `14pt`, but the values are
+ actually in "device-independent pixels" (px), at 96/inch.
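The following sketch shows one way to follow the rules above. It removes `<!--` and `-->`, looks for a rule whose selector is exactly `p`, and reads only the listed properties. It converts `font-size` from the nominal `pt` value, which is really in pixels at 96 per inch, to true points by multiplying by 72/96, so `14pt` becomes 10.5 points. The type and function names are hypothetical. Accepting an optional leading `#` on `color` reflects the `#000000` seen in the `pageParagraph` examples. Grouped selectors such as `p, td` are not handled.

```rust
/// Style for `p` elements extracted from embedded CSS (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct ParagraphStyle {
    color: Option<(u8, u8, u8)>,
    bold: Option<bool>,
    italic: Option<bool>,
    underline: Option<bool>,
    font_family: Option<String>,
    /// Font size in true points.
    font_size_pt: Option<f64>,
}

/// Parses `RRGGBB`, also tolerating a leading `#`.
fn parse_css_color(value: &str) -> Option<(u8, u8, u8)> {
    let hex = value.strip_prefix('#').unwrap_or(value);
    if hex.len() != 6 || !hex.is_ascii() {
        return None;
    }
    let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    Some((channel(0)?, channel(2)?, channel(4)?))
}

/// Converts e.g. `14pt` (really 14 px at 96/inch) to points.
fn parse_css_font_size(value: &str) -> Option<f64> {
    let px: f64 = value.strip_suffix("pt")?.trim().parse().ok()?;
    Some(px * 72.0 / 96.0)
}

fn parse_p_style(css: &str) -> ParagraphStyle {
    // Treat the comment markers as white space.
    let css = css.replace("<!--", " ").replace("-->", " ");
    let mut style = ParagraphStyle::default();
    let mut rest = css.as_str();
    while let Some(open) = rest.find('{') {
        let Some(close) = rest[open..].find('}').map(|i| open + i) else {
            break;
        };
        let selector = rest[..open].trim();
        let body = &rest[open + 1..close];
        if selector == "p" {
            for declaration in body.split(';') {
                let Some((name, value)) = declaration.split_once(':') else {
                    continue;
                };
                let value = value.trim();
                match name.trim() {
                    "color" => style.color = parse_css_color(value),
                    "font-weight" => style.bold = Some(value == "bold"),
                    "font-style" => style.italic = Some(value == "italic"),
                    "text-decoration" => style.underline = Some(value == "underline"),
                    "font-family" => style.font_family = Some(value.to_string()),
                    "font-size" => style.font_size_pt = parse_css_font_size(value),
                    _ => {} // Other properties don't matter.
                }
            }
        }
        rest = &rest[close + 1..];
    }
    style
}
```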
+
+### Examples
+
+Text that looks like "plain **bold** *italic* ~~strikeout~~", for use
+[inside `pageParagraph`]:
```
-<html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
- <head></head>
- <body>
- <p style="text-align:right; margin-top: 0">Page &[Page]</p>
- </body>
-</html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+ <head>
+
+ </head>
+ <body>
+ <p>
+ plain&#160;<font color="#000000" size="3" face="Monospaced"><b>bold</b></font>&#160;<font color="#000000" size="3" face="Monospaced"><i>italic</i>&#160;<strike>strikeout</strike></font>
+ </p>
+ </body>
+</html>
```
-This element has the following attributes.
+Another example, also for use [inside `pageParagraph`], of three
+paragraphs, the first left justified, the second center justified with
+a large font, and the third right justified:
-* `type`
- Always `text`.
+```
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+ <head>
+
+ </head>
+ <body>
+ <p>
+ left
+ </p>
+ <p align="center">
+ <font color="#000000" size="5" face="Monospaced">center&#160;large</font>
+ </p>
+ <p align="right">
+ <font color="#000000" size="3" face="Monospaced"><b><i>right</i></b></font>
+ </p>
+ </body>
+</html>
+```
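The following sketch shows how a reader might determine each paragraph's alignment. It checks the `align` attribute first, then a `text-align` declaration in the `style` attribute, and otherwise defaults to left, as in the first paragraph above. The `Align` enum and the function names are hypothetical. PSPP's `HorzAlign` in this change parses the same keywords case-insensitively. Giving `align` precedence over `style` is an assumption, since the examples above never use both on one paragraph.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Align {
    Left,
    Center,
    Right,
}

/// Parses `left`, `center`, or `right`, ignoring case and surrounding space.
fn parse_align(s: &str) -> Option<Align> {
    let s = s.trim();
    if s.eq_ignore_ascii_case("left") {
        Some(Align::Left)
    } else if s.eq_ignore_ascii_case("center") {
        Some(Align::Center)
    } else if s.eq_ignore_ascii_case("right") {
        Some(Align::Right)
    } else {
        None
    }
}

/// Alignment of a `<p>` element from its `align` and `style` attributes.
fn paragraph_align(align_attr: Option<&str>, style_attr: Option<&str>) -> Align {
    if let Some(align) = align_attr.and_then(parse_align) {
        return align;
    }
    if let Some(style) = style_attr {
        // e.g. "text-align:center; margin-top: 0"
        for declaration in style.split(';') {
            if let Some((name, value)) = declaration.split_once(':') {
                if name.trim().eq_ignore_ascii_case("text-align") {
                    if let Some(align) = parse_align(value) {
                        return align;
                    }
                }
            }
        }
    }
    Align::Left
}
```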
+
+[inside `pageParagraph`]: #the-text-element-inside-pageparagraph
+[inside `container`]: #the-text-element-inside-container
#[arg(short = 'e', long, value_parser = parse_encoding)]
encoding: Option<&'static Encoding>,
- /// Password for decryption, with or without what SPSS calls "password encryption".
+ /// Password for decryption.
+ ///
+ /// In addition to file encryption, SPSS supports a feature called "password
+ /// encryption". The password may be given either as plain text or in its
+ /// "password encryption" (encoded) form.
///
/// Specify only for an encrypted system file.
#[clap(short, long)]
use anyhow::Result;
use clap::{Args, ValueEnum};
-use pspp::output::{Criteria, Item};
+use pspp::output::{Criteria, Item, spv};
use std::{fmt::Display, path::PathBuf};
/// Show information about SPSS viewer files (SPV files).
#[arg(required = true)]
input: PathBuf,
+ /// Password for decryption.
+ ///
+ /// In addition to file encryption, SPSS supports a feature called "password
+ /// encryption". The password may be given either as plain text or in its
+ /// "password encryption" (encoded) form.
+ ///
+ /// Specify only for an encrypted SPV file.
+ #[clap(short, long)]
+ password: Option<String>,
+
/// Input selection options.
#[command(flatten)]
criteria: Criteria,
pub fn run(self) -> Result<()> {
match self.mode {
Mode::Directory => {
- let item = Item::from_spv_file(&self.input)?.0;
+ let item = spv::ReadOptions::new()
+ .with_password(self.password)
+ .open_file(&self.input)?
+ .into_item();
let item = self.criteria.apply(item);
for child in item.details.children() {
print_item_directory(&child, 0, self.show_member_names);
Ok(())
}
Mode::View => {
- let item = Item::from_spv_file(&self.input)?.0;
+ let item = spv::ReadOptions::new()
+ .with_password(self.password)
+ .open_file(&self.input)?
+ .into_item();
let item = self.criteria.apply(item);
for child in item.details.children() {
println!("{child}");
pub mod page;
pub mod pivot;
pub mod render;
-mod spv;
+pub mod spv;
pub mod table;
/// A single output item.
s
}
-struct CairoDevice<'a> {
- style: &'a CairoFsmStyle,
- params: &'a Params,
- context: &'a Context,
-}
-
-impl CairoDevice<'_> {
- fn layout_cell(&self, cell: &DrawCell, mut bb: Rect2, clip: &Rect2) -> Coord2 {
+impl<'a> DrawCell<'a> {
+ pub(crate) fn layout(&self, bb: &Rect2, layout: &mut Layout, default_font: &FontDescription) {
// XXX rotation
- //let h = if cell.rotate { Axis2::Y } else { Axis2::X };
- let layout = self.style.new_layout(self.context);
+ let mut bb = bb.clone();
+ layout.set_attributes(None);
- let cell_font = if !cell.font_style.font.is_empty() {
- Some(parse_font_style(&cell.font_style))
+ let parsed_font;
+ let font = if !self.font_style.font.is_empty() {
+ parsed_font = parse_font_style(&self.font_style);
+ &parsed_font
} else {
- None
+ default_font
};
- let font = cell_font.as_ref().unwrap_or(&self.style.font);
layout.set_font_description(Some(font));
- let (body, suffixes) = cell.display().split_suffixes();
- let horz_align = cell.horz_align(&body);
+ let (body, suffixes) = self.display().split_suffixes();
+ let horz_align = self.horz_align(&body);
let mut attrs = None;
let mut body = if let Some(markup) = body.markup() {
};
match horz_align {
- HorzAlign::Decimal { offset, decimal } if !cell.rotate => {
+ HorzAlign::Decimal { offset, decimal } if !self.rotate => {
let decimal_position = if let Some(position) = body.rfind(char::from(decimal)) {
layout.set_text(&body[position..]);
layout.set_width(-1);
_ => (),
}
- if cell.font_style.underline {
+ if self.font_style.underline {
attrs
.get_or_insert_default()
.insert(AttrInt::new_underline(Underline::Single));
let footnote_width = layout.size().0.max(0) as usize;
// Bound the adjustment by the width of the right margin.
- let right_margin = px_to_xr(cell.cell_style.margins[Axis2::X][1].max(0) as usize);
+ let right_margin = px_to_xr(self.cell_style.margins[Axis2::X][1].max(0) as usize);
let footnote_adjustment = min(footnote_width, right_margin);
// Adjust the bounding box.
- if cell.rotate {
+ if self.rotate {
bb[Axis2::X].end = bb[Axis2::X].end.saturating_sub(footnote_adjustment);
} else {
bb[Axis2::X].end = bb[Axis2::X].end.saturating_add(footnote_adjustment);
} else {
layout.set_width(bb[Axis2::X].len() as i32);
}
+ }
- let size = layout.size();
-
- if !clip.is_empty() {
- self.context.save().unwrap();
- if !cell.rotate {
- xr_clip(self.context, clip);
- }
- if cell.rotate {
- let extra = bb[Axis2::X].len().saturating_sub(size.1.max(0) as usize);
- let halign_offset = extra / 2;
- self.context.translate(
- xr_to_pt(bb[Axis2::X].start + halign_offset),
- xr_to_pt(bb[Axis2::Y].end),
- );
- self.context.rotate(-PI / 2.0);
- } else {
- self.context
- .translate(xr_to_pt(bb[Axis2::X].start), xr_to_pt(bb[Axis2::Y].start));
- }
- show_layout(self.context, &layout);
- self.context.restore().unwrap();
+ pub(crate) fn draw(
+ &self,
+ bb: &Rect2,
+ layout: &Layout,
+ clip: Option<&Rect2>,
+ context: &Context,
+ ) {
+ context.save().unwrap();
+ if !self.rotate
+ && let Some(clip) = clip
+ {
+ xr_clip(context, clip);
+ }
+ if self.rotate {
+ let extra = bb[Axis2::X]
+ .len()
+ .saturating_sub(layout.size().1.max(0) as usize);
+ let halign_offset = extra / 2;
+ context.translate(
+ xr_to_pt(bb[Axis2::X].start + halign_offset),
+ xr_to_pt(bb[Axis2::Y].end),
+ );
+ context.rotate(-PI / 2.0);
+ } else {
+ context.translate(xr_to_pt(bb[Axis2::X].start), xr_to_pt(bb[Axis2::Y].start));
}
+ show_layout(context, layout);
+ context.restore().unwrap();
+ }
+}
- layout.set_attributes(None);
+struct CairoDevice<'a> {
+ style: &'a CairoFsmStyle,
+ params: &'a Params,
+ context: &'a Context,
+}
+
+impl CairoDevice<'_> {
+ fn measure_cell(&self, cell: &DrawCell, bb: Rect2) -> Coord2 {
+ let mut layout = self.style.new_layout(self.context);
+ cell.layout(&bb, &mut layout, &self.style.font);
+ let (width, height) = layout.size();
+ Coord2::new(width.max(0) as usize, height.max(0) as usize)
+ }
- Coord2::new(size.0.max(0) as usize, size.1.max(0) as usize)
+ fn cell_draw(&self, cell: &DrawCell, bb: Rect2, clip: &Rect2) {
+ let mut layout = self.style.new_layout(self.context);
+ cell.layout(&bb, &mut layout, &self.style.font);
+ cell.draw(&bb, &layout, Some(clip), self.context);
}
fn do_draw_line(
}
}
- /// An empty clipping rectangle.
- fn clip() -> Rect2 {
- Rect2::default()
- }
-
enum_map![
Extreme::Min => {
let bb = Rect2::new(0..1, 0..usize::MAX);
- add_margins(cell, self.layout_cell(cell, bb, &clip()).x())
+ add_margins(cell, self.measure_cell(cell, bb).x())
}
Extreme::Max => {
let bb = Rect2::new(0..usize::MAX, 0..usize::MAX);
- add_margins(cell, self.layout_cell(cell, bb, &clip()).x())
+ add_margins(cell, self.measure_cell(cell, bb).x())
},
]
}
0..width.saturating_sub(px_to_xr(margins[Axis2::X].len())),
0..usize::MAX,
);
- self.layout_cell(cell, bb, &Rect2::default()).y() + margin(cell, Axis2::Y)
+ self.measure_cell(cell, bb).y() + margin(cell, Axis2::Y)
}
fn adjust_break(&self, _cell: &Content, _size: Coord2) -> usize {
.saturating_sub(draw_cell.cell_style.margins[axis][0].max(0) as usize);
}
if bb[Axis2::X].start < bb[Axis2::X].end && bb[Axis2::Y].start < bb[Axis2::Y].end {
- self.layout_cell(draw_cell, bb, clip);
+ self.cell_draw(draw_cell, bb, clip);
}
self.context.restore().unwrap();
}
Item, ItemCursor,
drivers::cairo::{
fsm::{CairoFsm, CairoFsmStyle},
- horz_align_to_pango, xr_to_pt,
+ xr_to_pt,
},
page::Heading,
- pivot::Axis2,
+ pivot::{Axis2, CellStyle, FontStyle, Rect2, ValueOptions},
+ table::DrawCell,
};
#[derive(Clone, Debug)]
fn render_heading(
context: &Context,
- font: &FontDescription,
+ default_font: &FontDescription,
heading: &Heading,
_page_number: i32,
width: usize,
) -> usize {
let pangocairo_context = pangocairo::functions::create_context(context);
pangocairo::functions::context_set_resolution(&pangocairo_context, font_resolution);
- let layout = Layout::new(&pangocairo_context);
- layout.set_font_description(Some(font));
let mut y = 0;
+ let default_cell_style = CellStyle::default();
+ let default_font_style = FontStyle::default();
+ let value_options = ValueOptions::default();
for paragraph in &heading.0 {
// XXX substitute heading variables
- layout.set_markup(&paragraph.text);
-
- layout.set_alignment(horz_align_to_pango(paragraph.align));
- layout.set_width(width as i32);
-
- context.save().unwrap();
- context.translate(0.0, xr_to_pt(y + base_y));
- pangocairo::functions::show_layout(context, &layout);
- context.restore().unwrap();
-
- y += layout.height() as usize;
+ let cell = DrawCell {
+ rotate: false,
+ inner: &paragraph.inner,
+ cell_style: paragraph.cell_style().unwrap_or(&default_cell_style),
+ font_style: paragraph.font_style().unwrap_or(&default_font_style),
+ subscripts: paragraph.subscripts(),
+ footnotes: paragraph.footnotes(),
+ value_options: &value_options,
+ };
+ let mut layout = Layout::new(&pangocairo_context);
+ let bb = Rect2::new(0..width, y + base_y..usize::MAX);
+ cell.layout(&bb, &mut layout, default_font);
+ cell.draw(&bb, &layout, None, context);
+ y += layout.size().1 as usize;
}
y
}
for paragraph in &heading.0 {
w.create_element("vtx:text")
.with_attribute(("text", "title"))
- .write_text_content(BytesText::new(&paragraph.text))?;
+ .write_text_content(
+ // XXX Need to instead generate HTML and then output it as a string.
+ BytesText::new(&paragraph.display(()).to_string()),
+ )?;
}
Ok(())
})?;
use paper_sizes::{Catalog, Length, PaperSize, Unit};
use serde::{Deserialize, Deserializer, Serialize, de::Error};
-use crate::output::{pivot::FontStyle, spv::html::parse_paragraphs};
+use crate::output::{pivot::Value, spv::html::parse_paragraphs};
-use super::pivot::{Axis2, HorzAlign};
+use super::pivot::Axis2;
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, Deserialize, Serialize)]
#[serde(rename_all = "snake_case")]
QuarterHeight,
}
-#[derive(Clone, Debug, PartialEq)]
-pub struct Paragraph {
- pub text: String,
- pub align: HorzAlign,
- pub font_style: FontStyle,
-}
-
-impl Default for Paragraph {
- fn default() -> Self {
- Self {
- text: Default::default(),
- align: HorzAlign::Left,
- font_style: FontStyle::default().with_size(10),
- }
- }
-}
-
#[derive(Clone, Debug, Default, PartialEq)]
-pub struct Heading(pub Vec<Paragraph>);
+pub struct Heading(pub Vec<Value>);
impl<'de> Deserialize<'de> for Heading {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
}
}
-impl Area {
- pub fn default_cell_style(self) -> CellStyle {
- use HorzAlign::*;
- use VertAlign::*;
- let (horz_align, vert_align, hmargins, vmargins) = match self {
- Area::Title => (Some(Center), Middle, [8, 11], [1, 8]),
- Area::Caption => (Some(Left), Top, [8, 11], [1, 1]),
- Area::Footer => (Some(Left), Top, [11, 8], [2, 3]),
- Area::Corner => (Some(Left), Bottom, [8, 11], [1, 1]),
- Area::Labels(Axis2::X) => (Some(Center), Top, [8, 11], [1, 3]),
- Area::Labels(Axis2::Y) => (Some(Left), Top, [8, 11], [1, 3]),
- Area::Data(_) => (None, Top, [8, 11], [1, 1]),
- Area::Layers => (Some(Left), Bottom, [8, 11], [1, 3]),
- };
- CellStyle {
- horz_align,
- vert_align,
- margins: enum_map! { Axis2::X => hmargins, Axis2::Y => vmargins },
- }
- }
-
- pub fn default_font_style(self) -> FontStyle {
- FontStyle::default().with_bold(self == Area::Title)
- }
-
- pub fn default_area_style(self) -> AreaStyle {
- AreaStyle {
- cell_style: self.default_cell_style(),
- font_style: self.default_font_style(),
- }
- }
-}
-
/// Distinguishes [Area::Data] for even-numbered and odd-numbered rows.
#[derive(Copy, Clone, Debug, Default, Enum, PartialEq, Eq)]
pub enum RowParity {
}),
footnote_marker_type: FootnoteMarkerType::default(),
footnote_marker_position: FootnoteMarkerPosition::default(),
- areas: EnumMap::from_fn(Area::default_area_style),
+ areas: EnumMap::from_fn(AreaStyle::default_for_area),
borders: Border::default_borders(),
print_all_layers: false,
paginate_layers: false,
pub font_style: FontStyle,
}
+impl AreaStyle {
+ pub fn default_for_area(area: Area) -> Self {
+ Self {
+ cell_style: CellStyle::default_for_area(area),
+ font_style: FontStyle::default_for_area(area),
+ }
+ }
+}
+
#[derive(Clone, Debug, Serialize, PartialEq)]
pub struct CellStyle {
/// `None` means "mixed" alignment: align strings to the left, numbers to
pub margins: EnumMap<Axis2, [i32; 2]>,
}
+impl Default for CellStyle {
+ fn default() -> Self {
+ Self::default_for_area(Area::default())
+ }
+}
+
+impl CellStyle {
+ pub fn default_for_area(area: Area) -> Self {
+ use HorzAlign::*;
+ use VertAlign::*;
+ let (horz_align, vert_align, hmargins, vmargins) = match area {
+ Area::Title => (Some(Center), Middle, [8, 11], [1, 8]),
+ Area::Caption => (Some(Left), Top, [8, 11], [1, 1]),
+ Area::Footer => (Some(Left), Top, [11, 8], [2, 3]),
+ Area::Corner => (Some(Left), Bottom, [8, 11], [1, 1]),
+ Area::Labels(Axis2::X) => (Some(Center), Top, [8, 11], [1, 3]),
+ Area::Labels(Axis2::Y) => (Some(Left), Top, [8, 11], [1, 3]),
+ Area::Data(_) => (None, Top, [8, 11], [1, 1]),
+ Area::Layers => (Some(Left), Bottom, [8, 11], [1, 3]),
+ };
+ Self {
+ horz_align,
+ vert_align,
+ margins: enum_map! { Axis2::X => hmargins, Axis2::Y => vmargins },
+ }
+ }
+ pub fn with_horz_align(self, horz_align: Option<HorzAlign>) -> Self {
+ Self { horz_align, ..self }
+ }
+ pub fn with_vert_align(self, vert_align: VertAlign) -> Self {
+ Self { vert_align, ..self }
+ }
+ pub fn with_margins(self, margins: EnumMap<Axis2, [i32; 2]>) -> Self {
+ Self { margins, ..self }
+ }
+}
+
#[derive(Copy, Clone, Debug, PartialEq, Deserialize, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum HorzAlign {
}
}
+/// Unknown horizontal alignment.
+#[derive(Copy, Clone, Debug, PartialEq, Eq)]
+pub struct UnknownHorzAlign;
+
+impl FromStr for HorzAlign {
+ type Err = UnknownHorzAlign;
+
+ fn from_str(s: &str) -> Result<Self, Self::Err> {
+ if s.eq_ignore_ascii_case("left") {
+ Ok(Self::Left)
+ } else if s.eq_ignore_ascii_case("center") {
+ Ok(Self::Center)
+ } else if s.eq_ignore_ascii_case("right") {
+ Ok(Self::Right)
+ } else {
+ Err(UnknownHorzAlign)
+ }
+ }
+}
+
#[derive(Copy, Clone, Debug, PartialEq, Eq, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum VertAlign {
}
impl FontStyle {
+ pub fn default_for_area(area: Area) -> Self {
+ Self::default().with_bold(area == Area::Title)
+ }
pub fn with_size(self, size: i32) -> Self {
Self { size, ..self }
}
impl Debug for Value {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ let name = match &self.inner {
+ ValueInner::Number(_) => "Number",
+ ValueInner::String(_) => "String",
+ ValueInner::Variable(_) => "Variable",
+ ValueInner::Text(_) => "Text",
+ ValueInner::Markup(_) => "Markup",
+ ValueInner::Template(_) => "Template",
+ ValueInner::Empty => "Empty",
+ };
+ f.write_str(name)?;
write!(f, "{:?}", self.display(()).to_string())?;
+ if let Some(markup) = self.inner.markup() {
+ write!(f, " (markup: {markup:?})")?;
+ }
if let Some(styling) = &self.styling {
write!(f, " ({styling:?})")?;
}
}
}
+impl From<Length> for paper_sizes::Length {
+ fn from(value: Length) -> Self {
+ Self::new(value.0, paper_sizes::Unit::Inch)
+ }
+}
+
impl Debug for Length {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{:.2}in", self.0)
path::Path,
};
-use anyhow::Context;
+use anyhow::{Context, anyhow};
use binrw::{BinRead, error::ContextExt};
use cairo::ImageSurface;
use displaydoc::Display;
+use paper_sizes::PaperSize;
use serde::Deserialize;
use zip::{ZipArchive, result::ZipError};
-use crate::output::{
- Details, Item, SpvInfo, SpvMembers, Text,
- page::PageSetup,
- pivot::{Look, TableProperties, Value},
- spv::{
- legacy_bin::LegacyBin,
- legacy_xml::Visualization,
- light::{LightError, LightTable},
+use crate::{
+ crypto::EncryptedFile,
+ output::{
+ Details, Item, SpvInfo, SpvMembers, Text,
+ page::{self},
+ pivot::{Axis2, Length, Look, TableProperties, Value},
+ spv::{
+ html::parse_paragraphs,
+ legacy_bin::LegacyBin,
+ legacy_xml::Visualization,
+ light::{LightError, LightTable},
+ },
},
};
mod legacy_xml;
mod light;
-#[derive(Debug, Display, thiserror::Error)]
-pub enum Error {
- /// Not an SPV file.
- NotSpv,
-
- /// {0}
- ZipError(#[from] ZipError),
+/// Options for reading an SPV file.
+#[derive(Clone, Debug, Default)]
+pub struct ReadOptions {
+ /// Password to use to unlock an encrypted SPV file.
+ ///
+ /// For an encrypted SPV file, this must be set to the (encoded or
+ /// unencoded) password.
+ ///
+ /// For a plaintext SPV file, this must be None.
+ pub password: Option<String>,
+}
- /// {0}
- IoError(#[from] std::io::Error),
+impl ReadOptions {
+ /// Construct a new [ReadOptions] without a password.
+ pub fn new() -> Self {
+ Self::default()
+ }
- /// {0}
- DeError(#[from] quick_xml::DeError),
+ /// Causes the file to be read by decrypting it with the given `password` or
+ /// without decrypting if `password` is None.
+ pub fn with_password(self, password: Option<String>) -> Self {
+ Self { password }
+ }
- /// {0}
- BinrwError(#[from] binrw::Error),
+ /// Opens the file at `path`.
+ pub fn open_file<P>(mut self, path: P) -> Result<SpvFile, anyhow::Error>
+ where
+ P: AsRef<Path>,
+ {
+ let file = File::open(path)?;
+ if let Some(password) = self.password.take() {
+ self.open_reader_encrypted(file, password)
+ } else {
+ Self::open_reader_inner(file)
+ }
+ }
- /// {0}
- LightError(#[from] LightError),
+    /// Opens the encrypted file read from `reader`, unlocking it with
+    /// `password`.
+    fn open_reader_encrypted<R>(
+        self,
+        reader: R,
+        password: String,
+    ) -> Result<SpvFile, anyhow::Error>
+ where
+ R: Read + Seek + 'static,
+ {
+ Self::open_reader_inner(
+ EncryptedFile::new(reader)?
+ .unlock(password.as_bytes())
+ .map_err(|_| anyhow!("Incorrect password."))?,
+ )
+ }
- /// {0}
- CairoError(#[from] cairo::IoError),
-}
+ /// Opens the file read from `reader`.
+ pub fn open_reader<R>(mut self, reader: R) -> Result<SpvFile, anyhow::Error>
+ where
+ R: Read + Seek + 'static,
+ {
+ if let Some(password) = self.password.take() {
+ self.open_reader_encrypted(reader, password)
+ } else {
+ Self::open_reader_inner(reader)
+ }
+ }
-impl Item {
- pub fn from_spv_file(path: impl AsRef<Path>) -> Result<(Self, Option<PageSetup>), Error> {
- Self::from_spv_reader(File::open(path.as_ref())?)
+ fn open_reader_inner<R>(reader: R) -> Result<SpvFile, anyhow::Error>
+ where
+ R: Read + Seek + 'static,
+ {
+ // Open archive.
+ let mut archive = ZipArchive::new(reader).map_err(|error| match error {
+ ZipError::InvalidArchive(_) => Error::NotSpv,
+ other => other.into(),
+ })?;
+ Ok(Self::from_spv_zip_archive(&mut archive)?)
}
- pub fn from_spv_zip_archive<R>(
- archive: &mut ZipArchive<R>,
- ) -> Result<(Self, Option<PageSetup>), Error>
+ fn from_spv_zip_archive<R>(archive: &mut ZipArchive<R>) -> Result<SpvFile, Error>
where
R: Read + Seek,
{
}
}
- Ok((items.into_iter().collect(), page_setup))
+ Ok(SpvFile {
+ item: items.into_iter().collect(),
+ page_setup,
+ })
}
+}
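The `ReadOptions` builder chooses between the decrypting and the plaintext path by taking the password out of the options. The following is a minimal std-only sketch of that dispatch, not the real API: `Opened` and `open_encrypted` are hypothetical stand-ins for `SpvFile` and `EncryptedFile::unlock`, and "an empty password is wrong" is an invented rule so the error path can be exercised.

```rust
/// Hypothetical stand-in for `SpvFile`: records which path was taken.
#[derive(Debug, PartialEq)]
enum Opened {
    Plaintext,
    Decrypted { password: String },
}

#[derive(Clone, Debug, Default)]
struct ReadOptions {
    password: Option<String>,
}

impl ReadOptions {
    fn new() -> Self {
        Self::default()
    }

    fn with_password(self, password: Option<String>) -> Self {
        Self { password }
    }

    /// Takes the password out of `self` (leaving `None`) and picks the
    /// decrypting or the plaintext path depending on whether one was set.
    fn open(mut self) -> Result<Opened, String> {
        match self.password.take() {
            Some(password) => Self::open_encrypted(password),
            None => Ok(Opened::Plaintext),
        }
    }

    /// Hypothetical unlock step: an empty password is treated as wrong.
    fn open_encrypted(password: String) -> Result<Opened, String> {
        if password.is_empty() {
            Err("Incorrect password.".into())
        } else {
            Ok(Opened::Decrypted { password })
        }
    }
}
```

Taking the password with `Option::take` lets the encrypted path own the `String` without cloning it.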
- pub fn from_spv_reader<R>(reader: R) -> Result<(Self, Option<PageSetup>), Error>
- where
- R: Read + Seek,
- {
- // Open archive.
- let mut archive = ZipArchive::new(reader).map_err(|error| match error {
- ZipError::InvalidArchive(_) => Error::NotSpv,
- other => other.into(),
- })?;
- Self::from_spv_zip_archive(&mut archive)
+/// The contents of an SPV file.
+pub struct SpvFile {
+ /// SPV file contents.
+ pub item: Item,
+
+ /// The page setup in the SPV file, if any.
+ pub page_setup: Option<page::PageSetup>,
+}
+
+impl SpvFile {
+    /// Splits the file into its item and its page setup, if any.
+    pub fn into_parts(self) -> (Item, Option<page::PageSetup>) {
+ (self.item, self.page_setup)
+ }
+
+    /// Returns the file's item, discarding its page setup.
+    pub fn into_item(self) -> Item {
+ self.item
}
}
+#[derive(Debug, Display, thiserror::Error)]
+pub enum Error {
+ /// Not an SPV file.
+ NotSpv,
+
+ /// {0}
+ ZipError(#[from] ZipError),
+
+ /// {0}
+ IoError(#[from] std::io::Error),
+
+ /// {0}
+ DeError(#[from] quick_xml::DeError),
+
+ /// {0}
+ BinrwError(#[from] binrw::Error),
+
+ /// {0}
+ LightError(#[from] LightError),
+
+ /// {0}
+ CairoError(#[from] cairo::IoError),
+}
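When opening the archive, `open_reader_inner` reclassifies an invalid ZIP archive as `Error::NotSpv` and lets every other ZIP error go through the `#[from]` conversion. This std-only sketch shows the same pattern; `ArchiveError` is a hypothetical stand-in for `zip::result::ZipError`, and the `Error` here keeps only two variants of the real enum.

```rust
use std::fmt;

/// Hypothetical stand-in for `zip::result::ZipError`.
#[derive(Debug)]
enum ArchiveError {
    InvalidArchive(&'static str),
    Io(std::io::Error),
}

/// Simplified version of the SPV `Error` enum.
#[derive(Debug)]
enum Error {
    NotSpv,
    Zip(ArchiveError),
}

/// Plays the role of `#[from]` on the `ZipError` variant.
impl From<ArchiveError> for Error {
    fn from(error: ArchiveError) -> Self {
        Error::Zip(error)
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotSpv => write!(f, "Not an SPV file."),
            Error::Zip(error) => write!(f, "{error:?}"),
        }
    }
}

/// Maps "not a ZIP archive" to "not an SPV file" and converts every other
/// archive error with `From`.
fn classify(result: Result<(), ArchiveError>) -> Result<(), Error> {
    result.map_err(|error| match error {
        ArchiveError::InvalidArchive(_) => Error::NotSpv,
        other => other.into(),
    })
}
```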
+
fn new_error_item(message: impl Into<Value>) -> Item {
Text::new_log(message).into_item().with_label("Error")
}
archive: &mut ZipArchive<R>,
file_number: usize,
structure_member: &str,
-) -> Result<(Vec<Item>, Option<PageSetup>), Error>
+) -> Result<(Vec<Item>, Option<page::PageSetup>), Error>
where
R: Read + Seek,
{
Err(error) => panic!("{error:?}"),
};
let page_setup = heading.page_setup.take();
- Ok((heading.decode(archive, structure_member)?, page_setup))
+    Ok((
+        heading.decode(archive, structure_member)?,
+        page_setup.map(|page_setup| page_setup.decode()),
+    ))
}
#[derive(Deserialize, Debug)]
}
}
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageSetup {
+ #[serde(rename = "@initial-page-number")]
+ pub initial_page_number: Option<i32>,
+ #[serde(rename = "@chart-size")]
+ pub chart_size: Option<ChartSize>,
+ #[serde(rename = "@margin-left")]
+ pub margin_left: Option<Length>,
+ #[serde(rename = "@margin-right")]
+ pub margin_right: Option<Length>,
+ #[serde(rename = "@margin-top")]
+ pub margin_top: Option<Length>,
+ #[serde(rename = "@margin-bottom")]
+ pub margin_bottom: Option<Length>,
+ #[serde(rename = "@paper-height")]
+ pub paper_height: Option<Length>,
+ #[serde(rename = "@paper-width")]
+ pub paper_width: Option<Length>,
+ #[serde(rename = "@reference-orientation")]
+ pub reference_orientation: Option<ReferenceOrientation>,
+ #[serde(rename = "@space-after")]
+ pub space_after: Option<Length>,
+ pub page_header: PageHeader,
+ pub page_footer: PageFooter,
+}
+
+impl PageSetup {
+ fn decode(&self) -> page::PageSetup {
+ let mut setup = page::PageSetup::default();
+ if let Some(initial_page_number) = self.initial_page_number {
+ setup.initial_page_number = initial_page_number;
+ }
+ if let Some(chart_size) = self.chart_size {
+ setup.chart_size = chart_size.into();
+ }
+ if let Some(margin_left) = self.margin_left {
+ setup.margins.0[Axis2::X][0] = margin_left.into();
+ }
+ if let Some(margin_right) = self.margin_right {
+ setup.margins.0[Axis2::X][1] = margin_right.into();
+ }
+ if let Some(margin_top) = self.margin_top {
+ setup.margins.0[Axis2::Y][0] = margin_top.into();
+ }
+ if let Some(margin_bottom) = self.margin_bottom {
+ setup.margins.0[Axis2::Y][1] = margin_bottom.into();
+ }
+ match (self.paper_width, self.paper_height) {
+ (Some(width), Some(height)) => {
+ setup.paper = PaperSize::new(width.0, height.0, paper_sizes::Unit::Inch)
+ }
+ (Some(length), None) | (None, Some(length)) => {
+ setup.paper = PaperSize::new(length.0, length.0, paper_sizes::Unit::Inch)
+ }
+ (None, None) => (),
+ }
+ if let Some(reference_orientation) = self.reference_orientation {
+ setup.orientation = reference_orientation.into();
+ }
+ if let Some(space_after) = self.space_after {
+ setup.object_spacing = space_after.into();
+ }
+ if let Some(PageParagraph { text }) = &self.page_header.page_paragraph {
+ setup.header = page::Heading(text.decode());
+ }
+ if let Some(PageParagraph { text }) = &self.page_footer.page_paragraph {
+ setup.footer = page::Heading(text.decode());
+ }
+ setup
+ }
+}
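`PageSetup::decode` starts from the default page setup and overrides only the attributes that were present in the XML. If just one of `paper-width` and `paper-height` is given, it is used for both dimensions. The sketch below reproduces that overlay with std types only; `Setup`, `Attrs`, and the default values in `Setup::default` are hypothetical and are not taken from `page::PageSetup`.

```rust
/// Hypothetical, simplified stand-in for `page::PageSetup`.
#[derive(Debug, PartialEq)]
struct Setup {
    initial_page_number: i32,
    /// `margins[axis][side]`: axis 0 is X and 1 is Y; side 0 is left/top and
    /// side 1 is right/bottom, like `margins.0[Axis2::X][0]` in the diff.
    margins: [[f64; 2]; 2],
    /// Paper (width, height) in inches.
    paper: (f64, f64),
}

impl Default for Setup {
    // Invented defaults, for illustration only.
    fn default() -> Self {
        Setup {
            initial_page_number: 1,
            margins: [[0.5; 2]; 2],
            paper: (8.5, 11.0),
        }
    }
}

/// Attributes as they might come out of deserialization: all optional.
#[derive(Default)]
struct Attrs {
    initial_page_number: Option<i32>,
    margin_left: Option<f64>,
    margin_top: Option<f64>,
    paper_width: Option<f64>,
    paper_height: Option<f64>,
}

impl Attrs {
    /// Starts from the defaults and overrides only the attributes present.
    fn decode(&self) -> Setup {
        let mut setup = Setup::default();
        if let Some(n) = self.initial_page_number {
            setup.initial_page_number = n;
        }
        if let Some(left) = self.margin_left {
            setup.margins[0][0] = left;
        }
        if let Some(top) = self.margin_top {
            setup.margins[1][0] = top;
        }
        // A single given dimension is used for both width and height.
        match (self.paper_width, self.paper_height) {
            (Some(w), Some(h)) => setup.paper = (w, h),
            (Some(len), None) | (None, Some(len)) => setup.paper = (len, len),
            (None, None) => (),
        }
        setup
    }
}
```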
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageHeader {
+ page_paragraph: Option<PageParagraph>,
+}
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageFooter {
+ page_paragraph: Option<PageParagraph>,
+}
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageParagraph {
+ text: PageParagraphText,
+}
+
+#[derive(Debug, Deserialize)]
+#[serde(rename_all = "camelCase")]
+struct PageParagraphText {
+ #[serde(default, rename = "$text")]
+ text: String,
+}
+
+impl PageParagraphText {
+ fn decode(&self) -> Vec<Value> {
+ parse_paragraphs(&self.text)
+ }
+}
+
+#[derive(Copy, Clone, Debug, Default, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ReferenceOrientation {
+ #[serde(alias = "0")]
+ #[serde(alias = "0deg")]
+ #[serde(alias = "inherit")]
+ #[default]
+ Portrait,
+
+ #[serde(alias = "90")]
+ #[serde(alias = "90deg")]
+ #[serde(alias = "-270")]
+ #[serde(alias = "-270deg")]
+ Landscape,
+
+ #[serde(alias = "180")]
+ #[serde(alias = "180deg")]
+    #[serde(alias = "-180")]
+ #[serde(alias = "-180deg")]
+ ReversePortrait,
+
+ #[serde(alias = "270")]
+ #[serde(alias = "270deg")]
+ #[serde(alias = "-90")]
+ #[serde(alias = "-90deg")]
+ Seascape,
+}
+
+impl From<ReferenceOrientation> for page::Orientation {
+ fn from(value: ReferenceOrientation) -> Self {
+ match value {
+ ReferenceOrientation::Portrait | ReferenceOrientation::ReversePortrait => {
+ page::Orientation::Portrait
+ }
+ ReferenceOrientation::Landscape | ReferenceOrientation::Seascape => {
+ page::Orientation::Landscape
+ }
+ }
+ }
+}
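`ReferenceOrientation` accepts several spellings for each of the four orientations (angles with or without `deg`, negative angles, and `inherit`), and only two orientations survive the conversion to `page::Orientation`. The sketch below spells out that mapping by hand instead of through serde. It assumes that the variant names are meant to be matched in snake_case and that reverse portrait is written `180`, `180deg`, `-180`, or `-180deg`; `parse` and the local `Orientation` are hypothetical names.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum ReferenceOrientation {
    Portrait,
    Landscape,
    ReversePortrait,
    Seascape,
}

/// Hypothetical stand-in for `page::Orientation`.
#[derive(Copy, Clone, Debug, PartialEq)]
enum Orientation {
    Portrait,
    Landscape,
}

impl ReferenceOrientation {
    /// Accepts the snake_case variant names plus the angle aliases.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "portrait" | "inherit" | "0" | "0deg" => Some(Self::Portrait),
            "landscape" | "90" | "90deg" | "-270" | "-270deg" => Some(Self::Landscape),
            "reverse_portrait" | "180" | "180deg" | "-180" | "-180deg" => {
                Some(Self::ReversePortrait)
            }
            "seascape" | "270" | "270deg" | "-90" | "-90deg" => Some(Self::Seascape),
            _ => None,
        }
    }
}

/// Rotating by 180 degrees keeps the page's shape, so reversed variants
/// collapse onto their unreversed counterparts.
impl From<ReferenceOrientation> for Orientation {
    fn from(value: ReferenceOrientation) -> Self {
        match value {
            ReferenceOrientation::Portrait | ReferenceOrientation::ReversePortrait => {
                Orientation::Portrait
            }
            ReferenceOrientation::Landscape | ReferenceOrientation::Seascape => {
                Orientation::Landscape
            }
        }
    }
}
```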
+
+/// Chart size.
+#[derive(Copy, Clone, Debug, Default, Deserialize)]
+pub enum ChartSize {
+ #[default]
+ #[serde(rename = "as-is")]
+ AsIs,
+
+ #[serde(rename = "full-height")]
+ FullHeight,
+
+ #[serde(rename = "half-height")]
+ HalfHeight,
+
+ #[serde(rename = "quarter-height")]
+ QuarterHeight,
+}
+
+impl From<ChartSize> for page::ChartSize {
+ fn from(value: ChartSize) -> Self {
+ match value {
+ ChartSize::AsIs => page::ChartSize::AsIs,
+ ChartSize::FullHeight => page::ChartSize::FullHeight,
+ ChartSize::HalfHeight => page::ChartSize::HalfHeight,
+ ChartSize::QuarterHeight => page::ChartSize::QuarterHeight,
+ }
+ }
+}
+
#[derive(Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
enum HeadingContent {
#[cfg(test)]
#[test]
fn test_spv() {
- let item = Item::from_spv_file(Path::new("/home/blp/pspp/rust/tests/utilities/regress.spv"))
+ let item = ReadOptions::new()
+ .open_file("/home/blp/pspp/rust/tests/utilities/regress.spv")
.unwrap()
- .0;
+ .into_item();
println!("{item}");
todo!()
}
use itertools::Itertools;
-use crate::output::pivot::FontStyle;
+use crate::output::pivot::{FontStyle, HorzAlign};
#[derive(Clone, Debug, PartialEq, Eq)]
enum Token<'a> {
}
}
+impl HorzAlign {
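+    /// Returns the alignment given by the first valid `text-align` property
+    /// in the CSS declarations in `s`, or `None` if there is none.
+    pub fn from_css(s: &str) -> Option<Self> {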
+ let mut lexer = Lexer(s);
+ while let Some(token) = lexer.next() {
+ if let Token::Id(key) = token
+ && let Some(Token::Colon) = lexer.next()
+ && let Some(Token::Id(value)) = lexer.next()
+ && key.as_ref() == "text-align"
+ && let Ok(align) = value.parse()
+ {
+ return Some(align);
+ }
+ }
+ None
+ }
+}
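`HorzAlign::from_css` scans a CSS declaration list for the first `text-align` property whose value parses as an alignment, skipping everything else. The following std-only sketch does the same thing with `str::split` and `split_once` in place of the CSS `Lexer`. That makes it a simplification: it does not handle comments, quoted strings, or `!important`. `horz_align_from_css` is a hypothetical name, and the case-insensitive value match follows what the `css_horz_align` test expects (`Right` is accepted).

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum HorzAlign {
    Left,
    Center,
    Right,
}

/// Returns the first valid `text-align` value in `s`, if any.
fn horz_align_from_css(s: &str) -> Option<HorzAlign> {
    for declaration in s.split(';') {
        // Skip anything that is not a `key: value` declaration.
        let Some((key, value)) = declaration.split_once(':') else {
            continue;
        };
        if key.trim() != "text-align" {
            continue;
        }
        // An unrecognized value does not stop the scan.
        let align = match value.trim().to_ascii_lowercase().as_str() {
            "left" => HorzAlign::Left,
            "center" => HorzAlign::Center,
            "right" => HorzAlign::Right,
            _ => continue,
        };
        return Some(align);
    }
    None
}
```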
+
impl FontStyle {
pub fn parse_css(&mut self, s: &str) {
let mut lexer = Lexer(s);
#[cfg(test)]
mod tests {
- use std::borrow::Cow;
+ use std::{borrow::Cow, str::FromStr};
use crate::output::{
- pivot::{Color, FontStyle},
+ pivot::{Color, FontStyle, HorzAlign, UnknownHorzAlign},
spv::css::{Lexer, Token},
};
+ #[test]
+ fn css_horz_align() {
+ assert_eq!(
+ HorzAlign::from_css("text-align: left"),
+ Some(HorzAlign::Left)
+ );
+ assert_eq!(
+ HorzAlign::from_css("margin-top: 0; text-align:center"),
+ Some(HorzAlign::Center)
+ );
+ assert_eq!(
+ HorzAlign::from_css("text-align: Right; margin-top:0"),
+ Some(HorzAlign::Right)
+ );
+ assert_eq!(HorzAlign::from_css("text-align: other"), None);
+ assert_eq!(HorzAlign::from_css("margin-top: 0"), None);
+ }
+
#[test]
fn css_strings() {
#[track_caller]
use html_parser::{Dom, Element, Node};
-use crate::output::{
- page::Paragraph,
- pivot::{Color, FontStyle, HorzAlign, Value},
-};
+use crate::output::pivot::{CellStyle, Color, FontStyle, HorzAlign, Value};
fn find_element<'a>(elements: &'a [Node], name: &str) -> Option<&'a Element> {
for element in elements {
push_whitespace(' ', s);
}
_ if c.is_whitespace() => push_whitespace(c, s),
-                '<' => s.push_str("&lt;"),
-                '>' => s.push_str("&gt;"),
-                '&' => s.push_str("&amp;"),
_ => s.push(c),
}
}
write!(s, "<{tag}>").unwrap();
Some(tag)
}
+ "strong" => {
+ write!(s, "<b>").unwrap();
+ Some("b")
+ }
+ "em" => {
+ write!(s, "<i>").unwrap();
+ Some("i")
+ }
+ "strike" => {
+ write!(s, r#"<span strikethrough="true">"#).unwrap();
+ Some("span")
+ }
"font" => {
s.push_str("<span");
if let Some(Some(face)) = element.attributes.get("face") {
if let Some(Some(html_size)) = element.attributes.get("size")
&& let Ok(html_size) = usize::from_str(&html_size)
&& let Some(index) = html_size.checked_sub(1)
- && let Some(scale) = [0.444, 0.556, 0.667, 0.778, 1.0, 1.33, 2.0]
- .get(index)
- .copied()
+ && let Some(points) =
+ [6.0, 7.5, 9.0, 10.5, 13.5, 18.0, 27.0].get(index).copied()
{
- let size = base_font_size as f64 * scale * 1024.0;
- push_attribute("size", format_args!("{size:.0}"), s);
+ push_attribute("size", format_args!("{points:.1}pt"), s);
}
s.push('>');
Some("span")
) -> Result<(), html_parser::Error> {
let dom = Dom::parse(&format!("<!doctype html>{input}"))?;
for node in &dom.children {
- match node.element() {
- Some(head) if head.name.eq_ignore_ascii_case("head") => {
+ match node {
+ Node::Element(head) if head.name.eq_ignore_ascii_case("head") => {
if let Some(style) = find_element(&head.children, "style") {
let mut text = String::new();
get_element_text(style, &mut text);
font_style.parse_css(&text)
}
}
- Some(p) if p.name.eq_ignore_ascii_case("p") => {
- let align = match p.attributes.get("align") {
- Some(Some(align)) if align.eq_ignore_ascii_case("left") => HorzAlign::Left,
- Some(Some(align)) if align.eq_ignore_ascii_case("right") => HorzAlign::Right,
- Some(Some(align)) if align.eq_ignore_ascii_case("center") => HorzAlign::Center,
- _ => HorzAlign::Left,
+ Node::Element(p) if p.name.eq_ignore_ascii_case("p") => {
+ let align = if let Some(Some(s)) = p.attributes.get("align")
+ && let Ok(align) = HorzAlign::from_str(s)
+ {
+ align
+ } else if let Some(Some(s)) = p.attributes.get("style")
+ && let Some(align) = HorzAlign::from_css(s)
+ {
+ align
+ } else {
+ HorzAlign::Left
};
output.start_paragraph(align);
extract_html_text2(node, font_style.size, output);
output.end_paragraph();
}
- _ => extract_html_text2(node, font_style.size, output),
+ Node::Element(_) | Node::Text(_) => extract_html_text2(node, font_style.size, output),
+ Node::Comment(_) => (),
}
}
Ok(())
.with_font_style(font_style)
}
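The `<font>` handling converts the HTML `size` attribute (1 through 7) into an absolute point size through a lookup table and writes it as a `size` attribute such as `9.0pt`. The sketch below isolates that conversion. `html_font_size` and `FONT_POINTS` are hypothetical names, and the table values are copied from the diff.

```rust
/// Point sizes for `<font size="1">` through `<font size="7">`.
const FONT_POINTS: [f64; 7] = [6.0, 7.5, 9.0, 10.5, 13.5, 18.0, 27.0];

/// Converts an HTML `size` attribute to a point-size string, or returns
/// `None` if it is not an integer in 1..=7.
fn html_font_size(attr: &str) -> Option<String> {
    // "0" fails `checked_sub`, and values above 7 fail `get`.
    let index = attr.parse::<usize>().ok()?.checked_sub(1)?;
    let points = FONT_POINTS.get(index).copied()?;
    Some(format!("{points:.1}pt"))
}
```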
-pub fn parse_paragraphs(input: &str) -> Vec<Paragraph> {
+pub fn parse_paragraphs(input: &str) -> Vec<Value> {
let mut font_style = FontStyle::default().with_size(10);
- #[derive(Default)]
struct Paragraphs {
- current: Paragraph,
- finished: Vec<Paragraph>,
+ markup: String,
+ horz_align: HorzAlign,
+ finished: Vec<Value>,
+ }
+
+ impl Default for Paragraphs {
+ fn default() -> Self {
+ Self {
+ markup: String::new(),
+ horz_align: HorzAlign::Left,
+ finished: Vec::new(),
+ }
+ }
}
impl HtmlOutput for Paragraphs {
fn start_paragraph(&mut self, align: HorzAlign) {
- if !self.current.text.is_empty() {
+ if !self.markup.is_empty() {
self.end_paragraph();
}
- self.current.align = align;
+ self.horz_align = align;
}
fn end_paragraph(&mut self) {
- self.finished.push(take(&mut self.current));
+ let value = Value::new_markup(take(&mut self.markup))
+ .with_cell_style(CellStyle::default().with_horz_align(Some(self.horz_align)));
+ self.finished.push(value);
}
fn text(&mut self) -> &mut String {
- &mut self.current.text
+ &mut self.markup
}
}
let mut output = Paragraphs::default();
if parse2(input, &mut output, &mut font_style).is_ok() {
- if !output.current.text.is_empty() {
+ if !output.markup.is_empty() {
output.end_paragraph();
}
output.finished
} else if !input.is_empty() {
- vec![Paragraph {
- text: input.into(),
- ..Paragraph::default()
- }]
+ vec![Value::new_user_text(input)]
} else {
Vec::new()
}
}
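`parse_paragraphs` collects markup through the `Paragraphs` accumulator. `start_paragraph` flushes any pending text (for example, text outside a `<p>`) before it records the new alignment, and `end_paragraph` moves the collected markup into the finished list. The sketch below keeps that state machine but stores `(markup, alignment)` pairs instead of `Value`s. The tuple representation and `Paragraphs::new` are assumptions made for illustration.

```rust
use std::mem::take;

#[derive(Copy, Clone, Debug, PartialEq)]
enum HorzAlign {
    Left,
    Center,
    Right,
}

/// Simplified stand-in for the `Paragraphs` collector.
struct Paragraphs {
    markup: String,
    horz_align: HorzAlign,
    finished: Vec<(String, HorzAlign)>,
}

impl Paragraphs {
    fn new() -> Self {
        Self { markup: String::new(), horz_align: HorzAlign::Left, finished: Vec::new() }
    }

    /// Flushes pending text, then remembers the new paragraph's alignment.
    fn start_paragraph(&mut self, align: HorzAlign) {
        if !self.markup.is_empty() {
            self.end_paragraph();
        }
        self.horz_align = align;
    }

    /// Moves the collected markup into a finished paragraph, leaving
    /// `markup` empty. `horz_align` is deliberately left unchanged.
    fn end_paragraph(&mut self) {
        self.finished.push((take(&mut self.markup), self.horz_align));
    }

    fn text(&mut self) -> &mut String {
        &mut self.markup
    }
}
```

Because `end_paragraph` does not reset `horz_align`, text that follows a paragraph outside any `<p>` inherits that paragraph's alignment, both here and in the diff's version.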
pub fn parse(input: &str) -> Value {
- let mut font_style = FontStyle::default().with_size(10);
+ let mut font_style = FontStyle::default();
let value = match Dom::parse(&format!("<!doctype html>{input}")) {
Ok(dom) => {
let mut s = String::new();
#[cfg(test)]
mod tests {
+ use quick_xml::events::Event;
+
use crate::output::{
pivot::{FontStyle, Value},
spv::html::{parse, parse_paragraphs, parse_value},
};
+ #[test]
+ fn test_parse() {
+ let text = r##"<xml><html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+ <head>
+
+ </head>
+ <body>
+ <p>
+ plain&#160;<font color="#000000" size="3" face="Monospaced"><b>bold</b></font>&#160;<font color="#000000" size="3" face="Monospaced"><i>italic</i>&#160;<strike>strikeout</strike></font>
+ </p>
+ </body>
+</html>
+</xml>"##;
+ let content = quick_xml::de::from_str::<String>(text).unwrap();
+ dbg!(parse_paragraphs(&content));
+ }
+
#[test]
fn css() {
assert_eq!(
#[test]
fn paragraphs() {
+ let paragraphs = parse_paragraphs(
+ r#"<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
+ <head>
+ <style type="text/css">
+ p { font-family: sans-serif;
+ font-size: 10pt; text-align: center;
+ font-weight: normal;
+ color: #000000;
+ }
+ </style>
+ </head>
+ <body>
+ <p>&[PageTitle]</p>
+ </body>
+    </html>"#,
+ );
+    dbg!(&paragraphs);
+    for value in &paragraphs {
+ println!("{}", value.display(()));
+ }
+ todo!();
let paragraphs = parse_paragraphs(
r#"<p align="left"><b>bold</b><br><i>italic</i><BR><b><i>bold italic</i></b><br><font color="red" face="Serif">red serif</font><br><font size="7">big</font><br></p>not in a paragraph<p align="right">right justified</p><p align="center">centered</p>trailing"#,
);
    dbg!(&paragraphs);
assert_eq!(paragraphs.len(), 5);
- todo!()
/*
assert_eq!(
paragraph,
&& let Some(label) = &axis.label
{
let out = &mut look.areas[Area::Labels(a)];
- *out = Area::Labels(a).default_area_style();
+ *out = AreaStyle::default_for_area(Area::Labels(a));
let style = label.style.get(&styles);
Style::decode_area(
style,
frame: None,
format: None,
} if alternating => {
- let mut style = Area::Data(RowParity::Odd).default_area_style();
+ let mut style = AreaStyle::default_for_area(Area::Data(RowParity::Odd));
Style::decode_area(self.labeling, self.graph, &mut style);
let font_style = &mut look.areas[Area::Data(RowParity::Odd)].font_style;
font_style.fg = style.font_style.fg;
}
/// Causes the file to be read by decrypting it with the given `password` or
- /// without decrypting if `encoding` is None.
+ /// without decrypting if `password` is None.
pub fn with_password(self, password: Option<String>) -> Self {
Self { password, ..self }
}