rust: Add support for reading SPSS/PC+ system files.
author Ben Pfaff <blp@cs.stanford.edu>
Wed, 17 Sep 2025 15:45:57 +0000 (08:45 -0700)
committer Ben Pfaff <blp@cs.stanford.edu>
Wed, 24 Sep 2025 15:07:11 +0000 (08:07 -0700)
Signed-off-by: Ben Pfaff <blp@cs.stanford.edu>
22 files changed:
rust/doc/src/SUMMARY.md
rust/doc/src/invoking/pspp-convert.md
rust/doc/src/invoking/pspp-show-pc.md [new file with mode: 0644]
rust/doc/src/invoking/pspp-show-por.md
rust/doc/src/invoking/pspp-show.md
rust/doc/src/pc+.md
rust/pspp/src/convert.rs
rust/pspp/src/file.rs
rust/pspp/src/lib.rs
rust/pspp/src/main.rs
rust/pspp/src/output/pivot.rs
rust/pspp/src/pc.rs [new file with mode: 0644]
rust/pspp/src/pc/testdata/README.md [new file with mode: 0644]
rust/pspp/src/pc/testdata/test1.expected [new file with mode: 0644]
rust/pspp/src/pc/testdata/test1.sys [new file with mode: 0644]
rust/pspp/src/pc/testdata/test2.expected [new file with mode: 0644]
rust/pspp/src/pc/testdata/test2.sys [new file with mode: 0644]
rust/pspp/src/pc/tests.rs [new file with mode: 0644]
rust/pspp/src/show_pc.rs [new file with mode: 0644]
rust/pspp/src/sys/cooked.rs
rust/pspp/src/sys/raw.rs
rust/pspp/src/sys/raw/records.rs

index 1d1c14d2e6918af300a0fc20997e3437edbfbca3..4c10ae8220f06dd3cbac268924623e13693d47a7 100644 (file)
@@ -5,8 +5,9 @@
 
 - [Running PSPP](invoking/index.md)
   - [Converting Data](invoking/pspp-convert.md)
-  - [Inspecting `.sav` Data](invoking/pspp-show.md)
-  - [Inspecting `.por` Data](invoking/pspp-show-por.md)
+  - [Inspecting System Files](invoking/pspp-show.md)
+  - [Inspecting Portable Files](invoking/pspp-show-por.md)
+  - [Inspecting SPSS/PC+ Files](invoking/pspp-show-pc.md)
   - [Decrypting Files](invoking/pspp-decrypt.md)
 
 # Language Overview
index e6c2abd95f590432696f7fcc27ebd07bb183d49f..d1248cd67f9a2fbd79c29970ebc7466d46b96e6c 100644 (file)
@@ -7,9 +7,9 @@ another.  The basic syntax is:
 pspp convert <INPUT> [OUTPUT]
 ```
 
-which reads an SPSS system file or portable file from `<INPUT>` and
-writes a copy of it to `[OUTPUT]`.  If `[OUTPUT]` is omitted, output
-is written to the terminal.
+which reads an SPSS system file, portable file, or SPSS/PC+ system
+file from `<INPUT>` and writes a copy of it to `[OUTPUT]`.  If
+`[OUTPUT]` is omitted, output is written to the terminal.
 
 If `[OUTPUT]` is specified, then `pspp convert` tries to guess the
 output format based on its extension:
diff --git a/rust/doc/src/invoking/pspp-show-pc.md b/rust/doc/src/invoking/pspp-show-pc.md
new file mode 100644 (file)
index 0000000..6e85c11
--- /dev/null
@@ -0,0 +1,78 @@
+# Inspecting SPSS/PC+ Files
+
+The `pspp show-pc` command reads an SPSS/PC+ system file, which
+usually has a `.sys` extension, and produces a report.
+
+> SPSS/PC+ has been obsolete since the 1990s, and its file format is
+> also obsolete and rarely encountered.  Use [`pspp
+> show`](pspp-show.md) to inspect modern SPSS system files.
+
+The basic syntax is:
+
+```
+pspp show-pc <MODE> <INPUT> [OUTPUT]
+```
+
+where `<MODE>` is a mode of operation (see below), `<INPUT>` is the
+SPSS/PC+ file to read, and `[OUTPUT]` is the output file name.  If
+`[OUTPUT]` is omitted, output is written to the terminal.
+
+The following `<MODE>`s are available:
+
+* `dictionary`: Outputs the file dictionary in detail, including
+  variables, value labels, and so on.  With `--data`, also outputs
+  cases from the system file.
+
+  This can be useful as an alternative to PSPP syntax commands such as
+  [`DISPLAY DICTIONARY`](../commands/display.md).
+
+  [`pspp convert`](pspp-convert.md) is a better way to convert an
+  SPSS/PC+ file to another format.
+
+* `metadata`: Outputs metadata not included in the dictionary:
+
+  - The creation date and time declared inside the file (not in the
+    file system).
+
+  - The name of the product family and product that wrote the file, if
+    present.
+
+  - The file name embedded inside the file, if one is present.
+
+  - Whether the file is bytecode-compressed.
+
+  - The number of cases in the file.
+
+## Options
+
+The following options affect how `pspp show-pc` reads `<INPUT>`:
+
+* `--data [<MAX_CASES>]`  
+  For mode `dictionary`, this instructs `pspp
+  show-pc` to read cases from the file.  If `<MAX_CASES>` is given,
+  then that sets a limit on the number of cases to read.  Without this
+  option, PSPP will not read any cases.
+
+The following options affect how `pspp show-pc` writes its output:
+
+* `-f <FORMAT>`  
+  `--format <FORMAT>`  
+  Specifies the format to use for output.  `<FORMAT>` may be one of
+  the following:
+
+  - `json`: JSON using indentation and spaces for easy human
+    consumption.
+  - `ndjson`: [Newline-delimited JSON].
+  - `output`: Pivot tables with the PSPP output engine.  Use `-o` for
+    additional configuration.
+  - `discard`: Do not produce any output.
+
+  When these options are not used, the default output format is chosen
+  based on the `[OUTPUT]` extension.  If `[OUTPUT]` is not specified,
+  then output defaults to JSON.
+
+  [Newline-delimited JSON]: https://github.com/ndjson/ndjson-spec
+
+* `-o <OUTPUT_OPTIONS>`  
+  Adds `<OUTPUT_OPTIONS>` to the output engine configuration.
+
index 4bd208e8d2e7cb3de0a95c88323d7f781c065cbb..bfe0f669a9a6f7f236472e367689f08ad715b5f3 100644 (file)
@@ -1,4 +1,4 @@
-# Inspecting `.por` files with `pspp show-por`
+# Inspecting Portable (`.por`) Files with `pspp show-por`
 
 The `pspp show-por` command reads an SPSS "portable file",
 which usually has a `.por` extension, and produces a report.  The
index 2b4940193289380547c9f9b9ae38066121e538bb..25065438f7265ea267b7176029c90337202bdc08 100644 (file)
@@ -1,4 +1,4 @@
-# Inspecting `.sav` files with `pspp show`
+# Inspecting System (`.sav`) Files with `pspp show`
 
 The `pspp show` command reads an SPSS "system file" or data file,
 which usually has a `.sav` extension, and produces a report.  The
index 558d4136037042f7df26773765d9092b9f0c9869..5fb318c893140086d4612f771c4c67026a6ed924 100644 (file)
@@ -58,9 +58,10 @@ char                filename[128];
 
 * `char filename[128];`  
   In most files in the corpus, this field is entirely filled with
-  spaces.  In one file, it contains a file name, followed by a null
-  bytes, followed by spaces to fill the remainder of the field.  The
-  meaning is unknown.
+  spaces or null bytes.  In others, it contains a filename, which
+  generally contains doubled backslashes,
+  e.g. `c:\\doli\\altm\\f_sum94.sys`.  The unusual extension `(_)` is
+  also common, e.g. `DER56.(_)`.
 
 The following sections describe the contents of each record,
 identified by the index into the `records` array.
@@ -75,7 +76,8 @@ All files in the corpus have this record at offset 0x100 with length
 
 ```
 uint16              one0;
-char                product[62];
+char                family[2];
+char                product[60];
 flt64               sysmis;
 uint32              zero0;
 uint32              zero1;
@@ -84,12 +86,12 @@ uint16              compressed;
 uint16              nominal_case_size;
 uint16              n_cases0;
 uint16              weight_index;
-uint16              zero2;
+uint16              unknown;
 uint16              n_cases1;
-uint16              zero3;
+uint16              zero2;
 char                creation_date[8];
 char                creation_time[8];
-char                label[64];
+char                file_label[64];
 ```
 
 * `uint16 one0;`  
@@ -99,23 +101,26 @@ char                label[64];
 * `uint32 zero0;`  
   `uint32 zero1;`  
   `uint16 zero2;`  
-  `uint16 zero3;`  
   Always set to 0.
 
-  It seems likely that one of these variables is set to 1 if
-  weighting is enabled, but none of the files in the corpus is
-  weighted.
+* `uint16 unknown;`  
+  Unknown meaning.  Usually set to 0.
+
+* `char family[2];`  
+  Identifies the product family that created the file.  This is either
+  `PC` for SPSS/PC+ and related software, or `DE` for SPSS Data Entry
+  and related software.
 
-* `char product[62];`  
+* `char product[60];`  
   Name of the program that created the file.  Only the following
   unique values have been observed, in each case padded on the right
   with spaces:
 
   ```
-  DESPSS/PC+ System File Written by Data Entry II
-  PCSPSS SYSTEM FILE.  IBM PC DOS, SPSS/PC+
-  PCSPSS SYSTEM FILE.  IBM PC DOS, SPSS/PC+ V3.0
-  PCSPSS SYSTEM FILE.  IBM PC DOS, SPSS for Windows
+  SPSS/PC+ System File Written by Data Entry II
+  SPSS SYSTEM FILE.  IBM PC DOS, SPSS/PC+
+  SPSS SYSTEM FILE.  IBM PC DOS, SPSS/PC+ V3.0
+  SPSS SYSTEM FILE.  IBM PC DOS, SPSS for Windows
   ```
 
   Thus, it is reasonable to use the presence of the string `SPSS` at
@@ -128,6 +133,8 @@ char                label[64];
   Set to 0 if the data in the file is not compressed, 1 if the data
   is compressed with simple bytecode compression.
 
+  > The corpus contains a mix of compressed and uncompressed files.
+
 * `uint16 nominal_case_size;`  
   Number of data elements per case.  This is the number of variables,
   except that long string variables add extra data elements (one for
@@ -137,8 +144,10 @@ char                label[64];
 * `uint16 n_cases0;`  
   `uint16 n_cases1;`  
   The number of cases in the data record.  Both values are the same.
-  Some files in the corpus contain data for the number of cases noted
-  here, followed by garbage that somewhat resembles data.
+
+  > Readers must use these case counts because some files in the corpus
+  contain garbage that somewhat resembles data after the specified
+  number of cases.
 
 * `uint16 weight_index;`  
   0, if the file is unweighted, otherwise a 1-based index into the
@@ -147,15 +156,17 @@ char                label[64];
 
 * `char creation_date[8];`  
   The date that the file was created, in `mm/dd/yy` format.
-  Single-digit days and months are not prefixed by zeros.  The string
+
+  > Single-digit days and months are not prefixed by zeros.  The string
   is padded with spaces on right or left or both, e.g.  `_2/4/93_`,
   `10/5/87_`, and `_1/11/88` (with `_` standing in for a space) are
   all actual examples from the corpus.
 
 * `char creation_time[8];`  
   The time that the file was created, in `HH:MM:SS` format.
-  Single-digit hours are padded on a left with a space.  Minutes and
-  seconds are always written as two digits.
+
+  > Single-digit hours are padded on the left with a space.  Minutes
+  and seconds are always written as two digits.
 
 * `char file_label[64];`  
   [File label](commands/file-label.md) declared by the user, if any.
@@ -194,14 +205,21 @@ these additional instances for long strings.
 
 * `uint32 value_label_start;`  
   `uint32 value_label_end;`  
-  For a variable with value labels, these specify offsets into the
-  label record of the start and end of this variable's value
-  labels, respectively.  See the [labels
-  record](#record-2-labels-record), for more information.
-
-  For a variable without any value labels, these are both zero.
-
-  A long string variable may not have value labels.
+  These specify offsets into the label record of the start and end of
+  value labels for this variable.  They are zero if there are no value
+  labels.  See the [labels record](#record-2-labels-record) for more
+  information.  A long string variable may not have value labels.
+
+  Sometimes, instead of value labels, the data is some form of data
+  validation rules for SPSS Data Entry.  There is no known way to
+  distinguish the two, except that data validation rules often cannot
+  be interpreted as valid value labels because the label length field
+  makes them not fit exactly in the allocated space.
+
+  > It appears that SPSS products cannot properly read these either.
+  > All the files in the corpus with these problems are closely
+  > related, so it's also possible that they are corrupted in some
+  > way.
 
 * `uint32 var_label_ofs;`  
   For a variable with a variable label, this specifies an offset into
@@ -224,13 +242,15 @@ these additional instances for long strings.
   variable's user-missing value.  For string variables, `missing.s`
   is a string missing value.  A variable without a user-missing value
   is indicated with `missing.f` set to the system-missing value, even
-  for string variables (!).  A Long string variable may not have a
+  for string variables (!).  A long string variable may not have a
   missing value.
 
 In addition to the user-defined variables, every SPSS/PC+ system file
 contains, as its first three variables, the following system-defined
 variables, in the following order.  The system-defined variables have
-no variable label, value labels, or missing values.
+no variable label, value labels, or missing values.  PSPP renames
+these variables to start with `@` when it reads an SPSS/PC+ system
+file.
 
 * `$CASENUM`  
   A numeric variable with format `F8.0`.  Most of the time this is a
@@ -247,8 +267,7 @@ no variable label, value labels, or missing values.
 
 * `$WEIGHT`  
   A numeric variable with format `F8.2`.  This represents the case's
-  weight; SPSS/PC+ files do not have a user-defined weighting
-  variable.  If weighting has not been enabled, every case has value
+  weight.  If weighting has not been enabled, every case has value
   1.0.
 
 ## Record 2: Labels Record
@@ -263,7 +282,7 @@ fields in a variable record are all offsets relative to the beginning of
 the labels record, with an additional 7-byte offset.  That is, if the
 labels record starts at byte offset `labels_ofs` and a variable has a
 given `var_label_ofs`, then the variable label begins at byte offset
-`labels_ofs` + `var_label_ofs` + 7 in the file.
+`labels_ofs` + `var_label_ofs + 7` in the file.
 
 A variable label, starting at the offset indicated by
 `var_label_ofs`, consists of a one-byte length followed by the specified
@@ -274,7 +293,7 @@ uint8               length;
 char                s[length];
 ```
 
-   A set of value labels, extending from `value_label_start` to
+A set of value labels, extending from `value_label_start` to
 `value_label_end` (exclusive), consists of a numeric or string value
 followed by a string in the format just described.  String values are
 padded on the right with spaces to fill the 8-byte field, like this:
@@ -288,10 +307,10 @@ uint8               length;
 char                s[length];
 ```
 
-   The labels record begins with a pair of `uint32` values.  The first of
-these is always 3.  The second is between 8 and 16 less than the number
-of bytes in the record.  Neither value is important for interpreting the
-file.
+The labels record begins with a pair of `uint32` values.  The first of
+these is always 3.  The second is between 8 and 16 less than the
+number of bytes in the record.  Neither value is important for
+interpreting the file.
 
 ## Record 3: Data Record
 
@@ -321,15 +340,16 @@ The format of the data record varies depending on the value of
     following the command bytes, and so on.
 
   - 2 through 255  
-    A number with value CODE - 100, where CODE is the value of the
+    A number with value `CODE - 100`, where `CODE` is the value of the
     compression code.  For example, code 105 indicates a numeric
     variable of value 5.
 
-  The end of the 8-byte group of bytecodes is followed by any 8-byte
-  blocks of non-compressible values indicated by code 1.  After that
-  follows another 8-byte group of bytecodes, then those bytecodes'
-  non-compressible values.  The pattern repeats up to the number of
-  cases specified by the main header record have been seen.
+  The end of the 8-byte group of command codes is followed by any
+  8-byte blocks of non-compressible values indicated by code 1.  After
+  that follows another 8-byte group of command codes, then those
+  command codes' non-compressible values.  The pattern repeats until
+  the number of cases specified by the main header record has been
+  seen.
 
   The corpus does not contain any files with command codes 2 through
   95, so it is possible that some of these codes are used for special
index 34aed906bb66b67c837d60f12f60ad0451dc6895..b38ccb3b100b229fb4054e516949842da1abd1bf 100644 (file)
@@ -30,6 +30,7 @@ use pspp::{
     data::{ByteString, Case, Datum, WithEncoding},
     file::FileType,
     format::{DisplayPlain, Type},
+    pc::PcFile,
     por::PortableFile,
     sys::{ReadOptions, WriteOptions, raw::records::Compression},
     util::ToSmallString,
@@ -324,7 +325,22 @@ impl Convert {
                     as Box<dyn Iterator<Item = Result<Case<Vec<Datum<ByteString>>>, AnyError>>>;
                 (dictionary, cases)
             }
-            _ => bail!("{}: not a system or portable file", self.input.display()),
+            Some(FileType::Pc) => {
+                fn warn_pc(warning: pspp::pc::Warning) {
+                    eprintln!("warning: {warning}");
+                }
+
+                let pc_file = PcFile::open_file(&self.input, warn_pc)?;
+                let (dictionary, _, cases) = pc_file.into_parts();
+                let cases = cases.map(|result| result.map_err(AnyError::from));
+                let cases = Box::new(cases)
+                    as Box<dyn Iterator<Item = Result<Case<Vec<Datum<ByteString>>>, AnyError>>>;
+                (dictionary, cases)
+            }
+            _ => bail!(
+                "{}: not a system, portable, or SPSS/PC+ file",
+                self.input.display()
+            ),
         };
 
         // Take only the first `self.max_cases` cases.
index 6e34b4a2d9f0253d2e16488b34b8d19da4458699..7c9cf4fe5270cffa8e2a4d98e70e749528d770c1 100644 (file)
@@ -40,7 +40,7 @@ pub enum FileType {
     Portable,
 
     /// An SPSS PC+ data file.
-    PcPlus,
+    Pc,
 
     /// An [SPSS Viewer file](crate::output::spv).
     Viewer {
@@ -148,7 +148,7 @@ impl FileType {
         }
 
         if buf.get(0x104..0x108) == Some(b"SPSS") {
-            return Ok(Some(Self::PcPlus));
+            return Ok(Some(Self::Pc));
         }
 
         let mut string = String::new();
index 00114eba12468efd7f246932325100dbadb93584..baff025160615776467d8a068e87723c6026d9a4 100644 (file)
@@ -117,6 +117,7 @@ pub mod locale_charset;
 pub mod macros;
 pub mod message;
 pub mod output;
+pub mod pc;
 pub mod por;
 pub mod prompt;
 pub mod settings;
index 5cdd92ee71534935484964a010a44ce41dce51d4..4fa6f80b557396ed11fcd2f1c08096f4358ebc20 100644 (file)
@@ -19,11 +19,12 @@ use clap::{Parser, Subcommand};
 use encoding_rs::Encoding;
 use thiserror::Error as ThisError;
 
-use crate::{convert::Convert, decrypt::Decrypt, show::Show, show_por::ShowPor};
+use crate::{convert::Convert, decrypt::Decrypt, show::Show, show_pc::ShowPc, show_por::ShowPor};
 
 mod convert;
 mod decrypt;
 mod show;
+mod show_pc;
 mod show_por;
 
 /// PSPP, a program for statistical analysis of sampled data.
@@ -40,6 +41,7 @@ enum Command {
     Decrypt(Decrypt),
     Show(Show),
     ShowPor(ShowPor),
+    ShowPc(ShowPc),
 }
 
 impl Command {
@@ -49,6 +51,7 @@ impl Command {
             Command::Decrypt(decrypt) => decrypt.run(),
             Command::Show(show) => show.run(),
             Command::ShowPor(show_por) => show_por.run(),
+            Command::ShowPc(show_pc) => show_pc.run(),
         }
     }
 }
index 5ceab55e8e0446c054a19755e54deb894fda91fd..13392f8ea6bdd8331fe4e0fb9519cc121e9c6c3a 100644 (file)
@@ -2704,12 +2704,13 @@ pub struct MetadataEntry {
     pub value: MetadataValue,
 }
 
-pub enum MetadataValue {
-    Leaf(Value),
-    Group(Vec<MetadataEntry>),
-}
-
 impl MetadataEntry {
+    pub fn new(name: impl Into<Value>, value: MetadataValue) -> Self {
+        Self {
+            name: name.into(),
+            value,
+        }
+    }
     pub fn into_pivot_table(self) -> PivotTable {
         let mut data = Vec::new();
         let group = match self.visit(&mut data) {
@@ -2736,6 +2737,17 @@ impl MetadataEntry {
     }
 }
 
+pub enum MetadataValue {
+    Leaf(Value),
+    Group(Vec<MetadataEntry>),
+}
+
+impl MetadataValue {
+    pub fn new_leaf(value: impl Into<Value>) -> Self {
+        Self::Leaf(value.into())
+    }
+}
+
 impl Serialize for MetadataValue {
     fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
     where
diff --git a/rust/pspp/src/pc.rs b/rust/pspp/src/pc.rs
new file mode 100644 (file)
index 0000000..ce48449
--- /dev/null
@@ -0,0 +1,867 @@
+// PSPP - a program for statistical analysis.
+// Copyright (C) 2025 Free Software Foundation, Inc.
+//
+// This program is free software: you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free Software
+// Foundation, either version 3 of the License, or (at your option) any later
+// version.
+//
+// This program is distributed in the hope that it will be useful, but WITHOUT
+// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+// FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
+// details.
+//
+// You should have received a copy of the GNU General Public License along with
+// this program.  If not, see <http://www.gnu.org/licenses/>.
+
+//! Reading SPSS/PC+ data files.
+//!
+//! This module enables reading [SPSS/PC+ data files], the data format for the
+//! SPSS/PC+ product first released in 1984.  It is obsolete.
+//!
+//! Use [PcFile] to read an SPSS/PC+ file.  Writing SPSS/PC+ files is not
+//! supported.
+//!
+//! [SPSS/PC+ data files]: https://pspp.benpfaff.org/manual/pc+.html
+#![cfg_attr(not(test), warn(missing_docs))]
+
+use std::{
+    collections::VecDeque,
+    fmt::{Display, Formatter},
+    fs::File,
+    io::{BufReader, Error as IoError, Read, Seek, SeekFrom, Take},
+    path::Path,
+};
+
+use binrw::{BinRead, Endian, Error as BinError};
+use chrono::{NaiveDate, NaiveDateTime, NaiveTime};
+use displaydoc::Display;
+use encoding_rs::WINDOWS_1252;
+use serde::Serialize;
+use thiserror::Error as ThisError;
+
+use crate::{
+    data::{ByteString, Case, Datum, RawString, WithEncoding},
+    dictionary::Dictionary,
+    format::{Error as FormatError, Format, UncheckedFormat},
+    identifier::{Error as IdError, Identifier},
+    output::pivot::{MetadataEntry, MetadataValue, PivotTable, Value},
+    sys::raw::{self, CaseDetails, CaseVar, CompressionAction, records::RawFormat},
+    variable::{MissingValues, MissingValuesError, VarWidth, Variable},
+};
+
+#[cfg(test)]
+mod tests;
+
+/// An [SPSS/PC+ data file].
+///
+/// [SPSS/PC+ data file]: https://pspp.benpfaff.org/manual/pc+.html
+#[derive(Debug)]
+pub struct PcFile<R> {
+    /// The data file's dictionary.
+    pub dictionary: Dictionary,
+
+    /// SPSS/PC+ file metadata that is not part of the dictionary.
+    pub metadata: Metadata,
+
+    /// Data in the SPSS/PC+ file.
+    pub cases: Cases<R>,
+}
+
+impl<R> PcFile<R> {
+    /// Returns the individual parts of the [PcFile].
+    pub fn into_parts(self) -> (Dictionary, Metadata, Cases<R>) {
+        (self.dictionary, self.metadata, self.cases)
+    }
+}
+
+/// SPSS/PC+ product family.
+#[derive(Clone, Debug, PartialEq, Eq, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum Family {
+    /// Data analysis product family.
+    ///
+    /// This includes at least:
+    /// - SPSS/PC+
+    /// - SPSS/PC+ V3.0
+    Pc,
+
+    /// Data entry product family.
+    ///
+    /// This includes at least:
+    /// - SPSS Data Entry
+    /// - SPSS Data Entry II
+    De,
+}
+
+/// SPSS/PC+ file metadata that is not part of [Dictionary].
+#[derive(Clone, Debug, PartialEq, Eq, Serialize)]
+pub struct Metadata {
+    /// Creation date and time.
+    ///
+    /// This comes from the file header, not from the file system.
+    pub creation: NaiveDateTime,
+
+    /// Product family.
+    pub family: Family,
+
+    /// Name of the product that wrote the file.
+    pub product: Option<String>,
+
+    /// Additional metadata that in some files identifies a file name.
+    pub filename: Option<String>,
+
+    /// Whether data in the file is bytecode compressed.
+    pub compressed: bool,
+
+    /// Number of declared cases in the file.
+    pub n_cases: u16,
+}
+
+impl From<&Metadata> for PivotTable {
+    fn from(value: &Metadata) -> Self {
+        fn maybe_string(name: &str, s: &Option<String>) -> MetadataEntry {
+            MetadataEntry {
+                name: Value::new_user_text(name),
+                value: MetadataValue::Leaf(
+                    s.as_ref()
+                        .cloned()
+                        .map(Value::new_user_text)
+                        .unwrap_or_default(),
+                ),
+            }
+        }
+
+        MetadataEntry {
+            name: Value::new_user_text("SPSS/PC+ File Metadata"),
+            value: MetadataValue::Group(vec![
+                MetadataEntry {
+                    name: Value::new_user_text("Created"),
+                    value: MetadataValue::new_leaf(Value::new_date_time(value.creation)),
+                },
+                maybe_string("Product", &value.product),
+                maybe_string("File Name", &value.filename),
+                MetadataEntry::new(
+                    "Compression",
+                    MetadataValue::new_leaf(if value.compressed { "Simple" } else { "None" }),
+                ),
+                MetadataEntry::new(
+                    "Number of Cases",
+                    MetadataValue::new_leaf(Value::new_integer(Some(value.n_cases as f64))),
+                ),
+            ]),
+        }
+        .into_pivot_table()
+    }
+}
+
+/// Reader for cases in an SPSS/PC+ file.
+#[derive(Debug)]
+pub struct Cases<R> {
+    reader: Take<R>,
+    compressed: bool,
+    case_vars: Vec<CaseVar>,
+    codes: VecDeque<u8>,
+    read_cases: u64,
+    sysmis: f64,
+    n_cases: u16,
+    eof: bool,
+}
+
+impl<R> Cases<R> {
+    fn new(reader: Take<R>, dictionary: &Dictionary, metadata: &Metadata, sysmis: f64) -> Self {
+        Self {
+            reader,
+            compressed: metadata.compressed,
+            case_vars: dictionary
+                .variables
+                .iter()
+                .map(|var| var.width.into())
+                .collect(),
+            codes: VecDeque::new(),
+            sysmis,
+            read_cases: 0,
+            n_cases: metadata.n_cases,
+            eof: false,
+        }
+    }
+
+    fn read_case(&mut self) -> Result<Case<Vec<Datum<ByteString>>>, raw::Error<CaseDetails>>
+    where
+        R: Read + Seek,
+    {
+        let result = if !self.compressed {
+            Datum::read_case(
+                &mut self.reader,
+                self.read_cases + 1,
+                &self.case_vars,
+                Endian::Little,
+            )
+        } else {
+            Datum::read_compressed_case(
+                &mut self.reader,
+                self.read_cases + 1,
+                &self.case_vars,
+                &mut self.codes,
+                CompressionAction::from_pc,
+                Endian::Little,
+            )
+        };
+
+        match result {
+            Ok(Some(mut raw_case)) => {
+                for datum in &mut raw_case.0 {
+                    if let Datum::Number(Some(number)) = datum
+                        && *number == self.sysmis
+                    {
+                        *datum = Datum::Number(None);
+                    }
+                }
+                Ok(raw_case.with_encoding(WINDOWS_1252))
+            }
+            Ok(None) => Err(raw::Error::new(
+                None,
+                CaseDetails::WrongNumberOfCases {
+                    expected: self.n_cases as u64,
+                    actual: self.read_cases,
+                },
+            )),
+            Err(error) => Err(error),
+        }
+    }
+}
+
+impl CompressionAction {
+    /// Interprets an SPSS/PC+ system file compression opcode.
+    fn from_pc(code: u8) -> Self {
+        match code {
+            0 => Self::Sysmis,
+            1 => Self::Literal,
+            other => Self::CompressedInt(other as f64 - 100.0),
+        }
+    }
+}
+
+impl<R> Iterator for Cases<R>
+where
+    R: Read + Seek,
+{
+    type Item = Result<Case<Vec<Datum<ByteString>>>, raw::Error<CaseDetails>>;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.eof || self.case_vars.is_empty() || self.read_cases >= self.n_cases as u64 {
+            return None;
+        }
+
+        match self.read_case() {
+            Ok(case) => {
+                self.read_cases += 1;
+                Some(Ok(case))
+            }
+            Err(error) => {
+                self.eof = true;
+                Some(Err(error))
+            }
+        }
+    }
+}
+
+/// An error encountered reading an SPSS/PC+ file.
+#[derive(Debug)]
+pub struct Error {
+    /// Offset where the error occurred.
+    pub offset: Option<u64>,
+
+    /// Details of the error.
+    pub details: ErrorDetails,
+}
+
+impl std::error::Error for Error {}
+
+impl Error {
+    /// Constructs an error from `offset` and `details`.
+    pub fn new(offset: Option<u64>, details: ErrorDetails) -> Self {
+        Self { offset, details }
+    }
+}
+
+impl From<IoError> for Error {
+    fn from(value: IoError) -> Self {
+        Self::new(None, value.into())
+    }
+}
+
+impl Display for Error {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        if let Some(offset) = self.offset {
+            write!(f, "Error at file offset {:#x}: ", offset)?;
+        }
+        write!(f, "{}", &self.details)
+    }
+}
+
+/// An error for reading a [PcFile].
+#[derive(Display, ThisError, Debug)]
+pub enum ErrorDetails {
+    /// Not an SPSS/PC+ data file.
+    NotPc,
+
+    /// I/O error ({0}).
+    Io(#[from] IoError),
+
+    /// {0}
+    BinError(DisplayBinError),
+
+    /// Invalid variable format: {0}.
+    InvalidFormat(FormatError),
+
+    /// File header record declares {nominal_case_size} variable segments but the variable records contain more than that (at least {n_chunks}).
+    TooManyVariables {
+        /// Declared number of variable segments.
+        nominal_case_size: u16,
+        /// Actual number of variable segments.
+        n_chunks: usize,
+    },
+
+    /// Labels record ({record}) extends beyond end of file with length {file_size}.
+    InvalidLabelsRecord {
+        /// Labels record location.
+        record: Record,
+        /// File size.
+        file_size: u64,
+    },
+}
+
+impl From<BinError> for ErrorDetails {
+    fn from(value: BinError) -> Self {
+        Self::BinError(DisplayBinError(value))
+    }
+}
+
+/// Newtype that implements [Display] for [BinError].
+#[derive(Debug)]
+pub struct DisplayBinError(BinError);
+
+impl Display for DisplayBinError {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        if self.0.is_eof() {
+            write!(f, "Unexpected end-of-file reading {}", self.0)
+        } else {
+            write!(f, "Error reading SPSS/PC+ file: {}", self.0.root_cause())
+        }
+    }
+}
+/// A warning while reading a [PcFile].
+#[derive(Display, ThisError, Debug)]
+pub enum Warning {
+    /// Invalid creation date {0}.
+    InvalidCreationDate(String),
+
+    /// Invalid creation time {0}.
+    InvalidCreationTime(String),
+
+    /// Invalid variable name.  {id_error}  Substituting {new_name} instead.
+    InvalidVariableName {
+        /// Identifier error.
+        id_error: IdError,
+        /// New name.
+        new_name: Identifier,
+    },
+
+    /// Renaming variable with duplicate name {duplicate_name} to {new_name}.
+    DuplicateVariableName {
+        /// Duplicate name.
+        duplicate_name: Identifier,
+        /// New name.
+        new_name: Identifier,
+    },
+
+    /// Substituting {new_format} for invalid print format on variable {variable}.  {format_error}
+    InvalidPrintFormat {
+        /// New format.
+        new_format: Format,
+        /// Variable.
+        variable: Identifier,
+        /// Underlying error.
+        format_error: FormatError,
+    },
+
+    /// Substituting {new_format} for invalid write format on variable {variable}.  {format_error}
+    InvalidWriteFormat {
+        /// New format.
+        new_format: Format,
+        /// Variable.
+        variable: Identifier,
+        /// Underlying error.
+        format_error: FormatError,
+    },
+
+    /// Missing value range may not contain system-missing value.
+    MissingValueRangeSysmis,
+
+    /// Ignoring missing value for long string variable {0}.
+    LongStringMissingValue(Identifier),
+
+    /// Invalid missing values for variable {name}: {error}.
+    InvalidMissingValues {
+        /// Variable name.
+        name: Identifier,
+        /// Kind of error with missing values.
+        error: MissingValuesError,
+    },
+
+    /// Invalid identifier {string}.  {error}
+    InvalidIdentifier {
+        /// String that should be an identifier.
+        string: String,
+        /// Kind of error with the string.
+        error: IdError,
+    },
+
+    /// Unknown variable name {0}.
+    UnknownVariableName(Identifier),
+
+    /// Mixed variable types in value labels.
+    MixedVariableTypes,
+
+    /// Cannot weight by string variable {0}.
+    StringWeight(Identifier),
+
+    /// File's specified weight index {0} does not refer to any variable.
+    InvalidWeightIndex(u16),
+
+    /// Variable record for {name} refers to invalid variable label starting at offset {offset} in label record.
+    InvalidVarLabel {
+        /// Variable name.
+        name: Identifier,
+        /// Offset into label record.
+        offset: usize,
+    },
+
+    /// Value labels for {name}, at file offsets {start:#x}..{end:#x}, end with last value label (starting at file offset {offset:#x}) running past end offset.  (This warning appears for some system files written by SPSS Data Entry products.)
+    ValueLabelOverflow {
+        /// Variable name.
+        name: Identifier,
+        /// File starting offset for variable's value labels.
+        start: usize,
+        /// File ending offset for variable's value labels.
+        end: usize,
+        /// File offset for last value label.
+        offset: u64,
+    },
+
+    /// Ignoring value labels for long string variable {0}.
+    LongStringValueLabel(
+        /// Variable name.
+        Identifier,
+    ),
+
+    /// Value label for {name} specifies invalid range {start}..{end} into labels record with length {len}.
+    InvalidValueLabelRange {
+        /// Variable name.
+        name: Identifier,
+        /// Starting offset in labels record.
+        start: usize,
+        /// Ending offset in labels record.
+        end: usize,
+        /// Length of labels record.
+        len: usize,
+    },
+
+    /// File header inconsistently reports {0} cases in one place and {1} in another; assuming {0} cases.
+    InconsistentCaseCount(u16, u16),
+}
+
+/// Directory at the start of the file, locating each record by offset and
+/// length.
+#[derive(Debug, BinRead)]
+#[br(little)]
+struct FileHeader {
+    two: u32,
+    zero: u32,
+    main_record: Record,
+    variables_record: Record,
+    labels_record: Record,
+    data_record: Record,
+    _other_records: [Record; 11],
+    filename: [u8; 128],
+}
+
+/// A record in an SPSS/PC+ system file.
+#[derive(Copy, Clone, Debug, PartialEq, Eq, BinRead)]
+#[br(little)]
+pub struct Record {
+    /// Offset of the record from the start of the file, in bytes.
+    offset: u32,
+    /// Length of the record in bytes.
+    len: u32,
+}
+
+impl Record {
+    fn new(offset: u32, len: u32) -> Self {
+        Self { offset, len }
+    }
+}
+
+impl Display for Record {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        write!(f, "offset {}, length {}", self.offset, self.len)
+    }
+}
+
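+/// Main record: product identification, case counts, weight index, and
+/// creation date and time.
+#[derive(Debug, BinRead)]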
+#[br(little)]
+struct MainHeader {
+    one0: u16,
+    family: [u8; 2],
+    product: [u8; 60],
+    sysmis: f64,
+    zero0: u32,
+    zero1: u32,
+    one1: u16,
+    compressed: u16,
+    nominal_case_size: u16,
+    n_cases0: u16,
+    weight_index: u16,
+    _unknown: u16,
+    n_cases1: u16,
+    zero2: u16,
+    creation_date: [u8; 8],
+    creation_time: [u8; 8],
+    file_label: [u8; 64],
+}
+
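+/// Per-variable record: name, format, missing value, and the offsets of the
+/// variable's labels within the labels record.
+#[derive(BinRead)]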
+#[br(little)]
+struct VariableRecord {
+    value_label_start: u32,
+    value_label_end: u32,
+    var_label_ofs: u32,
+    format: RawFormat,
+    name: [u8; 8],
+    missing: [u8; 8],
+}
+
+impl PcFile<BufReader<File>> {
+    /// Opens the file at `path` as an SPSS/PC+ file, invoking `warn` with any
+    /// warnings diagnosed while reading it.
+    pub fn open_file<P, F>(path: P, warn: F) -> Result<Self, Error>
+    where
+        P: AsRef<Path>,
+        F: FnMut(Warning),
+    {
+        let reader = BufReader::new(File::open(path)?);
+        Self::open(reader, warn)
+    }
+}
+
+impl<R> PcFile<R>
+where
+    R: Read + Seek,
+{
+    /// Opens `reader` as an SPSS/PC+ file, invoking `warn` with any warnings
+    /// diagnosed while reading it.
+    pub fn open<F>(mut reader: R, mut warn: F) -> Result<Self, Error>
+    where
+        F: FnMut(Warning),
+    {
+        fn read_inner<R, F>(
+            mut reader: R,
+            mut warn: F,
+        ) -> Result<(Dictionary, Metadata, Record, f64), ErrorDetails>
+        where
+            R: Read + Seek,
+            F: FnMut(Warning),
+        {
+            let file_header = FileHeader::read(&mut reader)?;
+            if file_header.two != 2
+                || file_header.zero != 0
+                || file_header.main_record != Record::new(0x100, 0xb0)
+            {
+                return Err(ErrorDetails::NotPc);
+            }
+
+            reader.seek(SeekFrom::Start(file_header.main_record.offset as u64))?;
+            let main_header = MainHeader::read(&mut reader)?;
+            if main_header.one0 != 1
+                || main_header.one1 != 1
+                || main_header.zero0 != 0
+                || main_header.zero1 != 0
+                || main_header.zero2 != 0
+            {
+                return Err(ErrorDetails::NotPc);
+            }
+            let family = match &main_header.family {
+                b"DE" => Ok(Family::De),
+                b"PC" => Ok(Family::Pc),
+                _ => Err(ErrorDetails::NotPc),
+            }?;
+            if main_header.n_cases0 != main_header.n_cases1 {
+                warn(Warning::InconsistentCaseCount(
+                    main_header.n_cases0,
+                    main_header.n_cases1,
+                ));
+            }
+            let sysmis = main_header.sysmis;
+
+            let mut dictionary = Dictionary::new(WINDOWS_1252);
+
+            let file_label = WINDOWS_1252.decode(&main_header.file_label);
+            let file_label = file_label.0.trim();
+            if !file_label.is_empty() {
+                dictionary.file_label = Some(file_label.into());
+            }
+
+            let creation_date = WINDOWS_1252.decode(&main_header.creation_date).0;
+            let creation_date = NaiveDate::parse_from_str(creation_date.trim(), "%m/%d/%y")
+                .unwrap_or_else(|_| {
+                    warn(Warning::InvalidCreationDate(creation_date.into_owned()));
+                    Default::default()
+                });
+            let creation_time = WINDOWS_1252.decode(&main_header.creation_time).0;
+            let creation_time = NaiveTime::parse_from_str(creation_time.trim(), "%H:%M:%S")
+                .unwrap_or_else(|_| {
+                    warn(Warning::InvalidCreationTime(creation_time.into_owned()));
+                    Default::default()
+                });
+            let creation = NaiveDateTime::new(creation_date, creation_time);
+
+            let mut n_generated_names = 0;
+            fn generate_name(dictionary: &Dictionary, n_generated_names: &mut usize) -> Identifier {
+                loop {
+                    *n_generated_names = n_generated_names.checked_add(1).unwrap();
+                    let name = Identifier::from_encoding(
+                        format!("VAR{:03}", *n_generated_names),
+                        WINDOWS_1252,
+                    )
+                    .unwrap();
+                    if !dictionary.variables.contains(&name.0) {
+                        return name;
+                    }
+                }
+            }
+
+            let file_size = reader.seek(SeekFrom::End(0))?;
+            if u64::from(file_header.labels_record.offset)
+                + u64::from(file_header.labels_record.len)
+                > file_size
+            {
+                return Err(ErrorDetails::InvalidLabelsRecord {
+                    record: file_header.labels_record,
+                    file_size,
+                });
+            }
+            reader.seek(SeekFrom::Start(file_header.labels_record.offset as u64))?;
+            let mut labels = vec![0; file_header.labels_record.len as usize];
+            reader.read_exact(&mut labels)?;
+
+            reader.seek(SeekFrom::Start(file_header.variables_record.offset as u64))?;
+
+            let mut index = 0;
+            let mut weight_index = None;
+            let mut n_overflows = 0;
+            while index < main_header.nominal_case_size as usize {
+                if main_header.weight_index as usize == index + 1 {
+                    weight_index = Some(dictionary.variables.len());
+                }
+
+                let variable_record = VariableRecord::read(&mut reader)?;
+                let mut name = String::from(WINDOWS_1252.decode(&variable_record.name).0.trim());
+                if name.starts_with('$') {
+                    name.replace_range(..1, "@");
+                }
+                let name = match Identifier::from_encoding(name, WINDOWS_1252)
+                    .and_then(Identifier::must_be_ordinary)
+                {
+                    Ok(name) => {
+                        if !dictionary.variables.contains(&name.0) {
+                            name
+                        } else {
+                            let new_name = generate_name(&dictionary, &mut n_generated_names);
+                            warn(Warning::DuplicateVariableName {
+                                duplicate_name: name.clone(),
+                                new_name: new_name.clone(),
+                            });
+                            new_name
+                        }
+                    }
+                    Err(id_error) => {
+                        let new_name = generate_name(&dictionary, &mut n_generated_names);
+                        warn(Warning::InvalidVariableName {
+                            id_error,
+                            new_name: new_name.clone(),
+                        });
+                        new_name
+                    }
+                };
+
+                let format = UncheckedFormat::try_from(variable_record.format)
+                    .and_then(Format::try_from)
+                    .map_err(ErrorDetails::InvalidFormat)?;
+
+                let width = format.var_width();
+                let mut variable = Variable::new(name, width, WINDOWS_1252);
+
+                // This `unwrap` cannot panic because `format`, from
+                // `RawFormat`, can only represent a width <= 255.
+                let n_chunks = width.n_chunks().unwrap();
+
+                fn parse_datum(
+                    datum: [u8; 8],
+                    width: VarWidth,
+                    sysmis: f64,
+                ) -> Datum<WithEncoding<ByteString>> {
+                    match width {
+                        VarWidth::Numeric => {
+                            let value = f64::from_le_bytes(datum);
+                            Datum::Number((value != sysmis).then_some(value))
+                        }
+                        VarWidth::String(width) => Datum::String(
+                            ByteString::from(&datum[..width as usize]).with_encoding(WINDOWS_1252),
+                        ),
+                    }
+                }
+
+                if sysmis != f64::from_le_bytes(variable_record.missing) {
+                    if !width.is_long_string() {
+                        let missing_value = MissingValues::new(
+                            vec![parse_datum(variable_record.missing, width, sysmis)],
+                            None,
+                        )
+                        .unwrap();
+                        variable
+                            .missing_values_mut()
+                            .replace(missing_value)
+                            .unwrap();
+                    } else {
+                        warn(Warning::LongStringMissingValue(variable.name.clone()))
+                    }
+                }
+
+                if variable_record.var_label_ofs != 0 {
+                    let offset = variable_record.var_label_ofs as usize + 7;
+                    if let Some(len) = labels.get(offset)
+                        && let Some(slice) = labels.get(offset + 1..offset + 1 + *len as usize)
+                    {
+                        variable.label = Some(WINDOWS_1252.decode(slice).0.into_owned());
+                    } else {
+                        warn(Warning::InvalidVarLabel {
+                            name: variable.name.clone(),
+                            offset,
+                        });
+                    }
+                }
+
+                if variable_record.value_label_start != 0 {
+                    if width.is_long_string() {
+                        warn(Warning::LongStringValueLabel(variable.name.clone()));
+                    } else {
+                        let start = variable_record.value_label_start as usize + 7;
+                        let end = variable_record.value_label_end as usize + 7;
+                        if let Some(mut slice) = labels.get(start..end) {
+                            while !slice.is_empty() {
+                                if let Some((value, rest)) = slice.split_at_checked(8)
+                                    && let Some((length, rest)) = rest.split_first()
+                                    && let Some((label, rest)) =
+                                        rest.split_at_checked(*length as usize)
+                                {
+                                    let label = WINDOWS_1252.decode(label).0.into_owned();
+                                    let value =
+                                        parse_datum(value.try_into().unwrap(), width, sysmis)
+                                            .without_encoding();
+                                    variable.value_labels.insert(value, label);
+                                    slice = rest;
+                                } else {
+                                    if n_overflows == 0 {
+                                        warn(Warning::ValueLabelOverflow {
+                                            name: variable.name.clone(),
+                                            start: start
+                                                + file_header.labels_record.offset as usize,
+                                            end: end + file_header.labels_record.offset as usize,
+                                            offset: file_header.labels_record.offset as u64
+                                                + variable_record.value_label_start as u64
+                                                + 7
+                                                + (variable_record.value_label_end as u64
+                                                    - variable_record.value_label_start as u64
+                                                    - slice.len() as u64),
+                                        });
+                                    }
+                                    n_overflows += 1;
+                                    break;
+                                };
+                            }
+                        } else {
+                            warn(Warning::InvalidValueLabelRange {
+                                name: variable.name.clone(),
+                                start,
+                                end,
+                                len: labels.len(),
+                            });
+                        }
+                    }
+                }
+
+                dictionary.add_var(variable).unwrap();
+
+                for _ in 1..n_chunks {
+                    let _variable_record = VariableRecord::read(&mut reader)?;
+                }
+                index += n_chunks;
+                if index > main_header.nominal_case_size as usize {
+                    return Err(ErrorDetails::TooManyVariables {
+                        nominal_case_size: main_header.nominal_case_size,
+                        n_chunks: index,
+                    });
+                }
+            }
+
+            if let Some(weight_index) = weight_index {
+                if dictionary.set_weight(Some(weight_index)).is_err() {
+                    warn(Warning::StringWeight(
+                        dictionary
+                            .variables
+                            .get_index(weight_index)
+                            .unwrap()
+                            .name
+                            .clone(),
+                    ))
+                }
+            } else if main_header.weight_index != 0 {
+                warn(Warning::InvalidWeightIndex(main_header.weight_index))
+            }
+
+            fn decode_optional_string(s: &[u8]) -> Option<String> {
+                let s = WINDOWS_1252.decode(s).0;
+                let s = s.trim_matches(&[' ', '\0']);
+                if s.is_empty() { None } else { Some(s.into()) }
+            }
+            let metadata = Metadata {
+                creation,
+                family,
+                product: decode_optional_string(&main_header.product),
+                filename: decode_optional_string(&file_header.filename),
+                compressed: main_header.compressed != 0,
+                n_cases: main_header.n_cases0,
+            };
+
+            Ok((
+                dictionary,
+                metadata,
+                file_header.data_record,
+                main_header.sysmis,
+            ))
+        }
+
+        let (dictionary, metadata, data_record, sysmis) = read_inner(&mut reader, &mut warn)
+            .map_err(|details| Error {
+                offset: reader.stream_position().ok(),
+                details,
+            })?;
+
+        reader.seek(SeekFrom::Start(data_record.offset as u64))?;
+        let reader = reader.take(data_record.len as u64);
+        let cases = Cases::new(reader, &dictionary, &metadata, sysmis);
+        Ok(PcFile {
+            dictionary,
+            metadata,
+            cases,
+        })
+    }
+}
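The value-label loop in `read_inner` above treats each entry in the labels record as an 8-byte value, a 1-byte label length, and that many label bytes. The following is a minimal standalone sketch of that layout using only the standard library. `RawValueLabel` and `split_value_labels` are hypothetical names for illustration and are not part of this commit; the sketch also skips decoding and the `sysmis` comparison that `parse_datum` performs.

```rust
/// One undecoded value label: the raw 8-byte value and its label bytes.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct RawValueLabel<'a> {
    value: [u8; 8],
    label: &'a [u8],
}

/// Splits `slice` into value labels, each laid out as an 8-byte value, a
/// 1-byte label length, and that many label bytes.
///
/// Returns the labels parsed so far, plus `true` if the last entry ran past
/// the end of `slice` (the case the reader reports as `ValueLabelOverflow`).
fn split_value_labels(mut slice: &[u8]) -> (Vec<RawValueLabel<'_>>, bool) {
    let mut labels = Vec::new();
    while !slice.is_empty() {
        let Some((value, rest)) = slice.split_at_checked(8) else {
            return (labels, true);
        };
        let Some((&length, rest)) = rest.split_first() else {
            return (labels, true);
        };
        let Some((label, rest)) = rest.split_at_checked(length as usize) else {
            return (labels, true);
        };
        labels.push(RawValueLabel {
            // `value` is exactly 8 bytes long, so this conversion cannot fail.
            value: value.try_into().unwrap(),
            label,
        });
        slice = rest;
    }
    (labels, false)
}
```

As in the reader, a truncated final entry stops parsing but keeps the labels that were already read.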
diff --git a/rust/pspp/src/pc/testdata/README.md b/rust/pspp/src/pc/testdata/README.md
new file mode 100644 (file)
index 0000000..5b623ac
--- /dev/null
@@ -0,0 +1,2 @@
+The two `.sys` files in this directory are old SPSS/PC+ system files found on
+the Internet.  They do not contain any personally identifying information.
diff --git a/rust/pspp/src/pc/testdata/test1.expected b/rust/pspp/src/pc/testdata/test1.expected
new file mode 100644 (file)
index 0000000..04cc739
--- /dev/null
@@ -0,0 +1,68 @@
+╭───────────────┬───────────────────────────────────────╮
+│Created        │                   05-OCT-1987 17:18:05│
+│Product        │SPSS SYSTEM FILE.  IBM PC DOS, SPSS/PC+│
+│Compression    │Simple                                 │
+│Number of Cases│                                     30│
+╰───────────────┴───────────────────────────────────────╯
+
+╭─────────┬────────╮
+│Label    │SPSS/PC+│
+│Variables│       7│
+╰─────────┴────────╯
+
+                                                            Variables
+╭──────────────────┬────────┬──────────────────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────┬──────────────╮
+│                  │Position│       Label      │Measurement Level│ Role│Width│Alignment│Print Format│Write Format│Missing Values│
+├──────────────────┼────────┼──────────────────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┼──────────────┤
+│@CASENUM          │       1│                  │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│@DATE             │       2│                  │Nominal          │Input│    8│Left     │A8          │A8          │              │
+│@WEIGHT           │       3│                  │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│Advertiser code   │       4│Advertiser code   │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│Microprocessor    │       5│Microprocessor    │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│Clock speed in MHz│       6│Clock speed in MHz│                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│Price in dollars  │       7│Price in dollars  │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+╰──────────────────┴────────┴──────────────────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────┴──────────────╯
+
+        Value Labels
+╭───────────────────┬─────╮
+│Variable Value     │     │
+├───────────────────┼─────┤
+│Microprocessor 1.00│8088 │
+│               2.00│80286│
+│               3.00│80386│
+╰───────────────────┴─────╯
+
+╭────┬────────┬────────┬───────┬───────────────┬──────────────┬──────────────────┬────────────────╮
+│Case│@CASENUM│  @DATE │@WEIGHT│Advertiser code│Microprocessor│Clock speed in MHz│Price in dollars│
+├────┼────────┼────────┼───────┼───────────────┼──────────────┼──────────────────┼────────────────┤
+│1   │    6.00│10/5/87 │   6.00│          18.00│          6.00│             13.00│          995.00│
+│2   │    7.00│10/5/87 │   6.00│          23.00│          6.00│             13.00│          699.00│
+│3   │    8.00│10/5/87 │   6.00│          23.00│          7.00│             13.00│          995.00│
+│4   │    9.00│10/5/87 │   6.00│          23.00│          7.00│             15.00│         1529.00│
+│5   │   10.00│10/5/87 │   6.00│          23.00│          7.00│             17.00│         1695.00│
+│6   │   11.00│10/5/87 │   6.00│          43.00│          8.00│             21.00│         2995.00│
+│7   │   12.00│10/5/87 │   6.00│          50.00│          7.00│             13.00│         1999.00│
+│8   │   14.00│10/5/87 │   6.00│          50.00│          7.00│             15.00│         2995.00│
+│9   │   16.00│10/5/87 │   6.00│          69.00│          7.00│             15.00│         1795.00│
+│10  │   17.00│10/5/87 │   6.00│          69.00│          6.00│              4.77│          995.00│
+│11  │   18.00│10/5/87 │   6.00│          72.00│          6.00│             13.00│          648.00│
+│12  │   19.00│10/5/87 │   6.00│          72.00│          7.00│             15.00│         1248.00│
+│13  │   20.00│10/5/87 │   6.00│          79.00│          7.00│             17.00│         1199.00│
+│14  │   21.00│10/5/87 │   6.00│          82.00│          7.00│             15.00│         1199.00│
+│15  │   22.00│10/5/87 │   6.00│          82.00│          6.00│             15.00│          699.00│
+│16  │   23.00│10/5/87 │   6.00│          90.00│          6.00│             13.00│          537.00│
+│17  │   24.00│10/5/87 │   6.00│          90.00│          7.00│             15.00│         1585.00│
+│18  │   25.00│10/5/87 │   6.00│          91.00│          7.00│             13.00│         1084.00│
+│19  │   26.00│10/5/87 │   6.00│          91.00│          7.00│             15.00│         1134.00│
+│20  │   27.00│10/5/87 │   6.00│          91.00│          7.00│             17.00│         1288.00│
+│21  │   28.00│10/5/87 │   6.00│          91.00│          6.00│             15.00│          634.00│
+│22  │   29.00│10/5/87 │   6.00│          91.00│          6.00│             13.00│          538.00│
+│23  │   30.00│10/5/87 │   6.00│          99.00│          6.00│             15.00│          785.00│
+│24  │   31.00│10/5/87 │   6.00│          99.00│          7.00│             15.00│         1555.00│
+│25  │   32.00│10/5/87 │   6.00│          99.00│          8.00│             21.00│         3620.00│
+│26  │   33.00│10/5/87 │   6.00│         112.00│          8.00│             25.00│         4999.00│
+│27  │   34.00│10/5/87 │   6.00│         112.00│          7.00│             17.00│         2999.00│
+│28  │   35.00│10/5/87 │   6.00│         112.00│          6.00│             13.00│          499.00│
+│29  │   36.00│10/5/87 │   6.00│         112.00│          8.00│             21.00│         2999.00│
+│30  │   37.00│10/5/87 │   6.00│         112.00│          6.00│             13.00│         1799.00│
+╰────┴────────┴────────┴───────┴───────────────┴──────────────┴──────────────────┴────────────────╯
diff --git a/rust/pspp/src/pc/testdata/test1.sys b/rust/pspp/src/pc/testdata/test1.sys
new file mode 100644 (file)
index 0000000..64cbf6d
Binary files /dev/null and b/rust/pspp/src/pc/testdata/test1.sys differ
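For orientation on what these binary fixtures must contain: `PcFile::open` in `pc.rs` rejects a file with `ErrorDetails::NotPc` unless it begins with the little-endian `u32` values 2 and 0, followed by a main record located at offset `0x100` with length `0xb0`. The sketch below reproduces only that first check with the standard library. `read_u32_le` and `looks_like_pc_file` are hypothetical helpers, not part of this commit, and the real reader goes on to validate the main header as well.

```rust
/// Location of one record: file offset and length, as in `pc.rs`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Record {
    offset: u32,
    len: u32,
}

/// Reads a little-endian `u32` at byte position `at`, if `bytes` is long
/// enough.  (Hypothetical helper, for illustration only.)
fn read_u32_le(bytes: &[u8], at: usize) -> Option<u32> {
    let chunk = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes(chunk.try_into().ok()?))
}

/// Returns true if `bytes` starts with the fields that the reader checks in
/// the file header: 2, 0, and a main record at offset 0x100, length 0xb0.
fn looks_like_pc_file(bytes: &[u8]) -> bool {
    let two = read_u32_le(bytes, 0);
    let zero = read_u32_le(bytes, 4);
    let main_record = match (read_u32_le(bytes, 8), read_u32_le(bytes, 12)) {
        (Some(offset), Some(len)) => Some(Record { offset, len }),
        _ => None,
    };
    two == Some(2)
        && zero == Some(0)
        && main_record == Some(Record { offset: 0x100, len: 0xb0 })
}
```

A check like this only needs the first 16 bytes of the file, which makes it cheap to run before attempting a full parse.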
diff --git a/rust/pspp/src/pc/testdata/test2.expected b/rust/pspp/src/pc/testdata/test2.expected
new file mode 100644 (file)
index 0000000..f6827bc
--- /dev/null
@@ -0,0 +1,106 @@
+╭───────────────┬───────────────────────────────────────╮
+│Created        │                   05-DEC-1989 23:07:29│
+│Product        │SPSS SYSTEM FILE.  IBM PC DOS, SPSS/PC+│
+│Compression    │None                                   │
+│Number of Cases│                                     70│
+╰───────────────┴───────────────────────────────────────╯
+
+╭─────────┬────────╮
+│Label    │SPSS/PC+│
+│Variables│      14│
+╰─────────┴────────╯
+
+                                                 Variables
+╭────────┬────────┬─────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────┬──────────────╮
+│        │Position│Label│Measurement Level│ Role│Width│Alignment│Print Format│Write Format│Missing Values│
+├────────┼────────┼─────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┼──────────────┤
+│@CASENUM│       1│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│@DATE   │       2│     │Nominal          │Input│    8│Left     │A8          │A8          │              │
+│@WEIGHT │       3│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│ID      │       4│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│LIVE    │       5│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│SEX     │       6│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│AGE     │       7│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│FIRSTVIS│       8│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│PROMPT  │       9│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│TIME    │      10│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│PEOPLE  │      11│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│ADVERT  │      12│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│OTHER   │      13│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+│COMMENT │      14│     │                 │Input│    8│Right    │F8.2        │F8.2        │              │
+╰────────┴────────┴─────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────┴──────────────╯
+
+╭────┬────────┬────────┬───────┬───────┬────┬────┬────┬────────┬──────┬────┬──────┬──────┬─────┬───────╮
+│Case│@CASENUM│  @DATE │@WEIGHT│   ID  │LIVE│ SEX│ AGE│FIRSTVIS│PROMPT│TIME│PEOPLE│ADVERT│OTHER│COMMENT│
+├────┼────────┼────────┼───────┼───────┼────┼────┼────┼────────┼──────┼────┼──────┼──────┼─────┼───────┤
+│1   │    1.00│12/5/89 │   1.00│  87.00│1.00│2.00│5.00│    2.00│  1.00│4.00│  1.00│  9.00│ 9.00│   1.00│
+│2   │    2.00│12/5/89 │   1.00│  88.00│1.00│1.00│5.00│    2.00│  1.00│4.00│  9.00│  9.00│ 1.00│   1.00│
+│3   │    3.00│12/5/89 │   1.00│  89.00│1.00│2.00│2.00│    1.00│  1.00│4.00│  9.00│  9.00│ 9.00│   9.00│
+│4   │    4.00│12/5/89 │   1.00│  90.00│4.00│2.00│2.00│    1.00│  1.00│3.00│  9.00│  9.00│ 1.00│   1.00│
+│5   │    5.00│12/5/89 │   1.00│  91.00│4.00│1.00│2.00│    1.00│  1.00│2.00│  9.00│  9.00│ 1.00│   1.00│
+│6   │    6.00│12/5/89 │   1.00│  92.00│4.00│2.00│2.00│    1.00│  1.00│2.00│  9.00│  4.00│ 1.00│   1.00│
+│7   │    7.00│12/5/89 │   1.00│  93.00│4.00│2.00│2.00│    1.00│  9.00│2.00│  9.00│  9.00│ 1.00│   9.00│
+│8   │    8.00│12/5/89 │   1.00│  94.00│1.00│2.00│4.00│    1.00│  1.00│3.00│  9.00│  5.00│ 9.00│   1.00│
+│9   │    9.00│12/5/89 │   1.00│  95.00│1.00│2.00│4.00│    1.00│  1.00│2.00│  9.00│  6.00│ 9.00│   9.00│
+│10  │   10.00│12/5/89 │   1.00│  96.00│2.00│2.00│4.00│    2.00│  1.00│2.00│  9.00│  1.00│ 9.00│   1.00│
+│11  │   11.00│12/5/89 │   1.00│1000.00│4.00│2.00│2.00│    9.00│  1.00│1.00│  1.00│  9.00│ 9.00│   9.00│
+│12  │   12.00│12/5/89 │   1.00│1001.00│4.00│1.00│3.00│    9.00│  1.00│2.00│  9.00│  9.00│ 9.00│   1.00│
+│13  │   13.00│12/5/89 │   1.00│1002.00│4.00│2.00│3.00│    1.00│  1.00│3.00│  1.00│  9.00│ 9.00│   1.00│
+│14  │   14.00│12/5/89 │   1.00│1003.00│2.00│2.00│3.00│    2.00│  1.00│3.00│  9.00│  2.00│ 9.00│   9.00│
+│15  │   15.00│12/5/89 │   1.00│1004.00│1.00│1.00│3.00│    2.00│  1.00│3.00│  9.00│  2.00│ 9.00│   1.00│
+│16  │   16.00│12/5/89 │   1.00│1005.00│1.00│1.00│3.00│    2.00│  1.00│3.00│  9.00│  3.00│ 9.00│   1.00│
+│17  │   17.00│12/5/89 │   1.00│1006.00│4.00│2.00│3.00│    1.00│  1.00│2.00│  1.00│  4.00│ 9.00│   1.00│
+│18  │   18.00│12/5/89 │   1.00│1007.00│4.00│1.00│3.00│    1.00│  1.00│3.00│  9.00│  9.00│ 1.00│   1.00│
+│19  │   19.00│12/5/89 │   1.00│1008.00│1.00│1.00│3.00│    2.00│  1.00│4.00│  9.00│  9.00│ 9.00│   9.00│
+│20  │   20.00│12/5/89 │   1.00│1009.00│1.00│9.00│3.00│    1.00│  9.00│4.00│  9.00│  9.00│ 1.00│   1.00│
+│21  │   21.00│12/5/89 │   1.00│1010.00│4.00│2.00│1.00│    1.00│  1.00│1.00│  9.00│  9.00│ 1.00│   9.00│
+│22  │   22.00│12/5/89 │   1.00│1011.00│4.00│9.00│2.00│    1.00│  1.00│3.00│  1.00│  8.00│ 9.00│   9.00│
+│23  │   23.00│12/5/89 │   1.00│1012.00│4.00│1.00│2.00│    1.00│  1.00│2.00│  1.00│  7.00│ 9.00│   1.00│
+│24  │   24.00│12/5/89 │   1.00│1013.00│4.00│1.00│2.00│    1.00│  9.00│5.00│  9.00│  3.00│ 9.00│   9.00│
+│25  │   25.00│12/5/89 │   1.00│1014.00│4.00│1.00│2.00│    1.00│  1.00│2.00│  5.00│  9.00│ 9.00│   9.00│
+│26  │   26.00│12/5/89 │   1.00│1015.00│4.00│1.00│2.00│    1.00│  1.00│4.00│  9.00│  3.00│ 9.00│   9.00│
+│27  │   27.00│12/5/89 │   1.00│1016.00│4.00│2.00│2.00│    1.00│  1.00│2.00│  5.00│  9.00│ 9.00│   1.00│
+│28  │   28.00│12/5/89 │   1.00│1017.00│1.00│9.00│2.00│    9.00│  1.00│2.00│  2.00│  9.00│ 9.00│   9.00│
+│29  │   29.00│12/5/89 │   1.00│1018.00│1.00│2.00│2.00│    2.00│  1.00│3.00│  9.00│  3.00│ 9.00│   1.00│
+│30  │   30.00│12/5/89 │   1.00│1019.00│1.00│2.00│2.00│    2.00│  1.00│4.00│  9.00│  2.00│ 9.00│   1.00│
+│31  │   31.00│12/5/89 │   1.00│1020.00│4.00│2.00│2.00│    1.00│  1.00│1.00│  4.00│  9.00│ 9.00│   9.00│
+│32  │   32.00│12/5/89 │   1.00│1021.00│4.00│1.00│3.00│    1.00│  1.00│3.00│  4.00│  9.00│ 9.00│   1.00│
+│33  │   33.00│12/5/89 │   1.00│1022.00│4.00│2.00│3.00│    1.00│  1.00│3.00│  3.00│  9.00│ 9.00│   1.00│
+│34  │   34.00│12/5/89 │   1.00│1023.00│4.00│2.00│3.00│    1.00│  9.00│9.00│  5.00│  8.00│ 9.00│   9.00│
+│35  │   35.00│12/5/89 │   1.00│1024.00│4.00│2.00│3.00│    1.00│  9.00│2.00│  4.00│  9.00│ 9.00│   9.00│
+│36  │   36.00│12/5/89 │   1.00│1025.00│3.00│1.00│3.00│    1.00│  1.00│2.00│  9.00│  9.00│ 9.00│   1.00│
+│37  │   37.00│12/5/89 │   1.00│1026.00│2.00│2.00│3.00│    2.00│  1.00│9.00│  9.00│  9.00│ 1.00│   9.00│
+│38  │   38.00│12/5/89 │   1.00│1027.00│1.00│1.00│3.00│    1.00│  1.00│4.00│  1.00│  9.00│ 9.00│   1.00│
+│39  │   39.00│12/5/89 │   1.00│1028.00│3.00│2.00│3.00│    1.00│  1.00│5.00│  4.00│  9.00│ 9.00│   1.00│
+│40  │   40.00│12/5/89 │   1.00│1029.00│1.00│2.00│3.00│    1.00│  1.00│4.00│  1.00│  2.00│ 9.00│   1.00│
+│41  │   41.00│12/5/89 │   1.00│1030.00│1.00│2.00│2.00│    1.00│  1.00│5.00│  1.00│  2.00│ 9.00│   9.00│
+│42  │   42.00│12/5/89 │   1.00│1031.00│4.00│2.00│5.00│    2.00│  1.00│2.00│  5.00│  8.00│ 9.00│   1.00│
+│43  │   43.00│12/5/89 │   1.00│1032.00│1.00│1.00│5.00│    9.00│  1.00│2.00│  9.00│  9.00│ 1.00│   9.00│
+│44  │   44.00│12/5/89 │   1.00│1033.00│4.00│1.00│1.00│    1.00│  1.00│2.00│  4.00│  9.00│ 1.00│   1.00│
+│45  │   45.00│12/5/89 │   1.00│1034.00│4.00│2.00│2.00│    1.00│  1.00│3.00│  3.00│  4.00│ 9.00│   1.00│
+│46  │   46.00│12/5/89 │   1.00│1035.00│4.00│1.00│2.00│    1.00│  1.00│1.00│  9.00│  9.00│ 1.00│   1.00│
+│47  │   47.00│12/5/89 │   1.00│1036.00│4.00│9.00│5.00│    1.00│  1.00│1.00│  1.00│  9.00│ 9.00│   9.00│
+│48  │   48.00│12/5/89 │   1.00│1039.00│2.00│2.00│4.00│    1.00│  1.00│1.00│  4.00│  8.00│ 9.00│   1.00│
+│49  │   49.00│12/5/89 │   1.00│  21.00│1.00│9.00│2.00│    1.00│  1.00│1.00│  5.00│  4.00│ 9.00│   1.00│
+│50  │   50.00│12/5/89 │   1.00│  23.00│3.00│2.00│5.00│    2.00│  1.00│3.00│  2.00│  8.00│ 9.00│   1.00│
+│51  │   51.00│12/5/89 │   1.00│  24.00│2.00│2.00│5.00│    2.00│  1.00│5.00│  9.00│  9.00│ 1.00│   1.00│
+│52  │   52.00│12/5/89 │   1.00│  27.00│1.00│2.00│3.00│    1.00│  1.00│3.00│  5.00│  7.00│ 1.00│   1.00│
+│53  │   53.00│12/5/89 │   1.00│  28.00│4.00│2.00│3.00│    1.00│  1.00│3.00│  1.00│  4.00│ 9.00│   1.00│
+│54  │   54.00│12/5/89 │   1.00│1040.00│4.00│1.00│1.00│    1.00│  9.00│9.00│  9.00│  9.00│ 9.00│   9.00│
+│55  â”‚   55.00│12/5/89 â”‚   1.00│1041.00│4.00│9.00│3.00│    1.00│  1.00│4.00│  9.00│  4.00│ 9.00│   1.00│
+│56  â”‚   56.00│12/5/89 â”‚   1.00│1043.00│3.00│1.00│3.00│    1.00│  1.00│3.00│  9.00│  4.00│ 9.00│   9.00│
+│57  â”‚   57.00│12/5/89 â”‚   1.00│1044.00│1.00│1.00│2.00│    1.00│  1.00│3.00│  9.00│  9.00│ 9.00│   1.00│
+│58  â”‚   58.00│12/5/89 â”‚   1.00│1045.00│2.00│1.00│2.00│    1.00│  1.00│3.00│  9.00│  1.00│ 9.00│   9.00│
+│59  â”‚   59.00│12/5/89 â”‚   1.00│1046.00│3.00│1.00│2.00│    1.00│  1.00│3.00│  9.00│  9.00│ 1.00│   1.00│
+│60  â”‚   60.00│12/5/89 â”‚   1.00│1047.00│3.00│1.00│2.00│    1.00│  1.00│3.00│  3.00│  7.00│ 9.00│   1.00│
+│61  â”‚   61.00│12/5/89 â”‚   1.00│1048.00│3.00│1.00│2.00│    1.00│  1.00│2.00│  9.00│  8.00│ 1.00│   9.00│
+│62  â”‚   62.00│12/5/89 â”‚   1.00│1049.00│3.00│1.00│2.00│    1.00│  1.00│2.00│  9.00│  9.00│ 1.00│   1.00│
+│63  â”‚   63.00│12/5/89 â”‚   1.00│1050.00│1.00│2.00│2.00│    1.00│  1.00│1.00│  9.00│  1.00│ 9.00│   1.00│
+│64  â”‚   64.00│12/5/89 â”‚   1.00│1051.00│1.00│2.00│2.00│    1.00│  1.00│5.00│  1.00│  2.00│ 9.00│   1.00│
+│65  â”‚   65.00│12/5/89 â”‚   1.00│1052.00│2.00│2.00│2.00│    1.00│  1.00│9.00│  9.00│  9.00│ 9.00│   9.00│
+│66  â”‚   66.00│12/5/89 â”‚   1.00│1053.00│2.00│2.00│3.00│    2.00│  3.00│9.00│  1.00│  3.00│ 9.00│   1.00│
+│67  â”‚   67.00│12/5/89 â”‚   1.00│1054.00│1.00│2.00│5.00│    2.00│  1.00│1.00│  1.00│  3.00│ 9.00│   1.00│
+│68  â”‚   68.00│12/5/89 â”‚   1.00│1055.00│1.00│2.00│5.00│    2.00│  1.00│3.00│  1.00│  2.00│ 9.00│   1.00│
+│69  â”‚   69.00│12/5/89 â”‚   1.00│1056.00│2.00│2.00│5.00│    2.00│  4.00│9.00│  1.00│  1.00│ 9.00│   9.00│
+│70  â”‚   70.00│12/5/89 â”‚   1.00│1057.00│2.00│2.00│5.00│    1.00│  4.00│9.00│  9.00│  1.00│ 1.00│   1.00│
+╰────┴────────┴────────┴───────┴───────┴────┴────┴────┴────────┴──────┴────┴──────┴──────┴─────┴───────╯
diff --git a/rust/pspp/src/pc/testdata/test2.sys b/rust/pspp/src/pc/testdata/test2.sys
new file mode 100644 (file)
index 0000000..4a759c1
Binary files /dev/null and b/rust/pspp/src/pc/testdata/test2.sys differ
diff --git a/rust/pspp/src/pc/tests.rs b/rust/pspp/src/pc/tests.rs
new file mode 100644 (file)
index 0000000..25dab4f
--- /dev/null
@@ -0,0 +1,59 @@
+use std::path::Path;
+
+use itertools::Itertools;
+
+use crate::{
+    data::cases_to_output,
+    output::{
+        Details, Item, Text,
+        pivot::{PivotTable, tests::assert_lines_eq},
+    },
+    pc::PcFile,
+};
+
+fn test_pcfile(name: &str) {
+    let base_filename = Path::new("src/pc/testdata").join(name);
+    let input_filename = base_filename.with_extension("sys");
+    let expected_filename = base_filename.with_extension("expected");
+
+    let mut warnings = Vec::new();
+    let output = match PcFile::open_file(input_filename, |warning| warnings.push(warning)) {
+        Ok(pc_file) => {
+            let (dictionary, metadata, cases) = pc_file.into_parts();
+
+            let mut output = Vec::new();
+            output.extend(
+                warnings
+                    .into_iter()
+                    .map(|warning| Item::from(Text::new_log(warning.to_string()))),
+            );
+            output.push(PivotTable::from(&metadata).into());
+            output.extend(dictionary.all_pivot_tables().into_iter().map_into());
+            output.extend(cases_to_output(&dictionary, cases));
+            Item::new(Details::Group(output.into_iter().map_into().collect()))
+        }
+        Err(error) => Item::new(Details::Text(Box::new(Text::new_log(error.to_string())))),
+    };
+
+    let actual = output.to_string();
+    let expected = std::fs::read_to_string(&expected_filename).unwrap();
+    if expected != actual {
+        if std::env::var("PSPP_REFRESH_EXPECTED").is_ok() {
+            std::fs::write(&expected_filename, actual).unwrap();
+            panic!("{}: refreshed output", expected_filename.display());
+        } else {
+            eprintln!("note: rerun with PSPP_REFRESH_EXPECTED=1 to refresh expected output");
+        }
+    }
+    assert_lines_eq(&expected, expected_filename.display(), &actual, "actual");
+}
+
+#[test]
+fn pcfile_test1() {
+    test_pcfile("test1");
+}
+
+#[test]
+fn pcfile_test2() {
+    test_pcfile("test2");
+}
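
The test helper above follows a golden-file pattern: render the output, compare it with a stored `.expected` file, and optionally overwrite that file when a refresh is requested. A minimal standalone sketch of that pattern follows. The names `Golden` and `check_golden` are hypothetical and not part of the commit, and the refresh decision is passed in as a `bool` here rather than read from `PSPP_REFRESH_EXPECTED`, so the helper has no dependency on the process environment.

```rust
use std::{fs, io, path::Path};

/// Outcome of comparing actual output against a golden file.
/// (Hypothetical type for illustration; the commit's test panics instead.)
#[derive(Debug, PartialEq)]
enum Golden {
    /// The stored expectation already matches.
    Matched,
    /// The stored expectation differed and was overwritten.
    Refreshed,
    /// The stored expectation differed and was left alone.
    Mismatch,
}

/// Compares `actual` with the contents of `expected_path`.
///
/// When they differ and `refresh` is true, the golden file is rewritten
/// with `actual`; otherwise the mismatch is only reported.
fn check_golden(expected_path: &Path, actual: &str, refresh: bool) -> io::Result<Golden> {
    let expected = fs::read_to_string(expected_path)?;
    if expected == actual {
        Ok(Golden::Matched)
    } else if refresh {
        fs::write(expected_path, actual)?;
        Ok(Golden::Refreshed)
    } else {
        Ok(Golden::Mismatch)
    }
}
```

A caller would typically treat both `Refreshed` and `Mismatch` as a failing test run, which is what the commit's helper does by panicking after a refresh, so that a refreshed expectation still gets reviewed before it is trusted.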
diff --git a/rust/pspp/src/show_pc.rs b/rust/pspp/src/show_pc.rs
new file mode 100644 (file)
index 0000000..385f877
--- /dev/null
@@ -0,0 +1,300 @@
+// PSPP - a program for statistical analysis.
+// Copyright (C) 2025 Free Software Foundation, Inc.
+//
+// This program is free software: you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free Software
+// Foundation, either version 3 of the License, or (at your option) any later
+// version.
+//
+// This program is distributed in the hope that it will be useful, but WITHOUT
+// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+// FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
+// details.
+//
+// You should have received a copy of the GNU General Public License along with
+// this program.  If not, see <http://www.gnu.org/licenses/>.
+
+use anyhow::{Result, anyhow};
+use clap::{Args, ValueEnum};
+use pspp::{
+    data::cases_to_output,
+    output::{
+        Details, Item, Text,
+        driver::{Config, Driver},
+        pivot::PivotTable,
+    },
+    pc::PcFile,
+};
+use serde::Serialize;
+use std::{
+    cell::RefCell,
+    ffi::OsStr,
+    fmt::{Display, Write as _},
+    fs::File,
+    io::{BufReader, Write, stdout},
+    path::{Path, PathBuf},
+    rc::Rc,
+    sync::Arc,
+};
+
+/// Show information about SPSS/PC+ data files.
+#[derive(Args, Clone, Debug)]
+pub struct ShowPc {
+    /// What to show.
+    #[arg(value_enum)]
+    mode: Mode,
+
+    /// File to show.
+    #[arg(required = true)]
+    input: PathBuf,
+
+    /// Output file name.  If omitted, output is written to stdout.
+    output: Option<PathBuf>,
+
+    /// Maximum number of cases to read.
+    ///
+    /// If specified without an argument, all cases will be read.
+    #[arg(
+        long = "data",
+        num_args = 0..=1,
+        default_missing_value = "18446744073709551615",
+        default_value_t = 0,
+        help_heading = "Input file options"
+    )]
+    max_cases: usize,
+
+    /// Output driver configuration options.
+    #[arg(short = 'o', help_heading = "Output options")]
+    output_options: Vec<String>,
+
+    /// Output format.
+    #[arg(long, short = 'f', help_heading = "Output options")]
+    format: Option<ShowFormat>,
+}
+
+enum Output {
+    Driver {
+        driver: Rc<RefCell<Box<dyn Driver>>>,
+        mode: Mode,
+    },
+    Json {
+        writer: Rc<RefCell<Box<dyn Write>>>,
+        pretty: bool,
+    },
+    Discard,
+}
+
+impl Output {
+    fn show_json<T>(&self, value: &T) -> Result<()>
+    where
+        T: Serialize,
+    {
+        match self {
+            Self::Driver { mode, driver: _ } => {
+                Err(anyhow!("Mode '{mode}' only supports output as JSON."))
+            }
+            Self::Json { writer, pretty } => {
+                let mut writer = writer.borrow_mut();
+                match pretty {
+                    true => serde_json::to_writer_pretty(&mut *writer, value)?,
+                    false => serde_json::to_writer(&mut *writer, value)?,
+                };
+                writeln!(writer)?;
+                Ok(())
+            }
+            Self::Discard => Ok(()),
+        }
+    }
+
+    fn warn(&self, warning: &impl Display) {
+        match self {
+            Output::Driver { driver, .. } => {
+                driver
+                    .borrow_mut()
+                    .write(&Arc::new(Item::from(Text::new_log(warning.to_string()))));
+            }
+            Output::Json { .. } => {
+                #[derive(Serialize)]
+                struct Warning {
+                    warning: String,
+                }
+                let warning = Warning {
+                    warning: warning.to_string(),
+                };
+                let _ = self.show_json(&warning);
+            }
+            Self::Discard => (),
+        }
+    }
+}
+
+impl ShowPc {
+    pub fn run(self) -> Result<()> {
+        let format = if let Some(format) = self.format {
+            format
+        } else if let Some(output_file) = &self.output {
+            match output_file
+                .extension()
+                .unwrap_or(OsStr::new(""))
+                .to_str()
+                .unwrap_or("")
+            {
+                "json" => ShowFormat::Json,
+                "ndjson" => ShowFormat::Ndjson,
+                _ => ShowFormat::Output,
+            }
+        } else {
+            ShowFormat::Json
+        };
+
+        let output = match format {
+            ShowFormat::Output => {
+                let mut config = String::new();
+
+                if let Some(file) = &self.output {
+                    #[derive(Serialize)]
+                    struct File<'a> {
+                        file: &'a Path,
+                    }
+                    let file = File {
+                        file: file.as_path(),
+                    };
+                    let toml_file = toml::to_string_pretty(&file).unwrap();
+                    config.push_str(&toml_file);
+                }
+                for option in &self.output_options {
+                    writeln!(&mut config, "{option}").unwrap();
+                }
+
+                let table: toml::Table = toml::from_str(&config)?;
+                if !table.contains_key("driver") {
+                    let driver = if let Some(file) = &self.output {
+                        <dyn Driver>::driver_type_from_filename(file).ok_or_else(|| {
+                            anyhow!("{}: no default output format for file name", file.display())
+                        })?
+                    } else {
+                        "text"
+                    };
+
+                    #[derive(Serialize)]
+                    struct DriverConfig {
+                        driver: &'static str,
+                    }
+                    config.insert_str(
+                        0,
+                        &toml::to_string_pretty(&DriverConfig { driver }).unwrap(),
+                    );
+                }
+
+                let config: Config = toml::from_str(&config)?;
+                Output::Driver {
+                    mode: self.mode,
+                    driver: Rc::new(RefCell::new(Box::new(<dyn Driver>::new(&config)?))),
+                }
+            }
+            ShowFormat::Json | ShowFormat::Ndjson => Output::Json {
+                pretty: format == ShowFormat::Json,
+                writer: if let Some(output_file) = &self.output {
+                    Rc::new(RefCell::new(Box::new(File::create(output_file)?)))
+                } else {
+                    Rc::new(RefCell::new(Box::new(stdout())))
+                },
+            },
+            ShowFormat::Discard => Output::Discard,
+        };
+
+        let reader = BufReader::new(File::open(&self.input)?);
+        match self.mode {
+            Mode::Dictionary => {
+                let PcFile {
+                    dictionary,
+                    metadata: _,
+                    cases,
+                } = PcFile::open(reader, |warning| output.warn(&warning))?;
+                let cases = cases.take(self.max_cases);
+
+                match &output {
+                    Output::Driver { driver, mode: _ } => {
+                        let mut output = Vec::new();
+                        output.extend(
+                            dictionary
+                                .all_pivot_tables()
+                                .into_iter()
+                                .map(|pivot_table| Item::new(pivot_table)),
+                        );
+                        output.extend(cases_to_output(&dictionary, cases));
+                        driver
+                            .borrow_mut()
+                            .write(&Arc::new(Item::new(Details::Group(
+                                output.into_iter().map(Arc::new).collect(),
+                            ))));
+                    }
+                    Output::Json { .. } => {
+                        output.show_json(&dictionary)?;
+                        for (_index, case) in (0..self.max_cases).zip(cases) {
+                            output.show_json(&case?)?;
+                        }
+                    }
+                    Output::Discard => (),
+                }
+            }
+            Mode::Metadata => {
+                let metadata = PcFile::open(reader, |warning| output.warn(&warning))?.metadata;
+
+                match &output {
+                    Output::Driver { driver, mode: _ } => {
+                        driver
+                            .borrow_mut()
+                            .write(&Arc::new(Item::new(PivotTable::from(&metadata))));
+                    }
+                    Output::Json { .. } => {
+                        output.show_json(&metadata)?;
+                    }
+                    Output::Discard => (),
+                }
+            }
+        }
+        Ok(())
+    }
+}
+
+/// What to show in an SPSS/PC+ system file.
+#[derive(Clone, Copy, Debug, Default, PartialEq, ValueEnum)]
+enum Mode {
+    /// File dictionary, with variables, value labels, ...
+    #[default]
+    #[value(alias = "dict")]
+    Dictionary,
+
+    /// File metadata not included in the dictionary.
+    Metadata,
+}
+
+impl Mode {
+    fn as_str(&self) -> &'static str {
+        match self {
+            Mode::Dictionary => "dictionary",
+            Mode::Metadata => "metadata",
+        }
+    }
+}
+
+impl Display for Mode {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "{}", self.as_str())
+    }
+}
+
+#[derive(Clone, Copy, Debug, Default, PartialEq, Serialize, ValueEnum)]
+#[serde(rename_all = "snake_case")]
+enum ShowFormat {
+    /// Pretty-printed JSON.
+    #[default]
+    Json,
+    /// Newline-delimited JSON.
+    Ndjson,
+    /// Pivot tables.
+    Output,
+    /// No output.
+    Discard,
+}
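
`ShowPc::run` chooses its output format in three steps: an explicit `--format` wins, otherwise the extension of the output file decides, and with no output file it falls back to JSON on stdout. The sketch below isolates just that decision so it can be read and exercised on its own. The reduced `ShowFormat` enum and the `infer_format` function are hypothetical stand-ins written for illustration, not code from the commit.

```rust
use std::path::Path;

/// Reduced stand-in for the commit's `ShowFormat` (assumption: only the
/// variants needed to illustrate the selection logic are included).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ShowFormat {
    Json,
    Ndjson,
    Output,
}

/// Picks an output format.
///
/// 1. An explicitly requested format always wins.
/// 2. Otherwise the output file's extension decides; unknown or missing
///    extensions select the pivot-table driver output.
/// 3. With no output file at all, JSON is used.
fn infer_format(explicit: Option<ShowFormat>, output: Option<&Path>) -> ShowFormat {
    if let Some(format) = explicit {
        return format;
    }
    match output {
        Some(path) => match path.extension().and_then(|ext| ext.to_str()) {
            Some("json") => ShowFormat::Json,
            Some("ndjson") => ShowFormat::Ndjson,
            _ => ShowFormat::Output,
        },
        None => ShowFormat::Json,
    }
}
```

As in the commit, the extension match is case-sensitive, so a name such as `OUT.JSON` would select `ShowFormat::Output` unless a format is given explicitly.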
index 7316d29b9d0f7c724e9e0687ad8f7102e1859121..d3e43b6b157e35d67332821d0f36b8d744ba7a1d 100644 (file)
@@ -36,7 +36,7 @@ use crate::{
     output::pivot::{Axis3, Dimension, Group, PivotTable, Value},
     sys::{
         raw::{
-            self, DecodedRecord, RawCases, RawDatum, RawWidth, Reader, infer_encoding,
+            self, CaseDetails, DecodedRecord, RawCases, RawDatum, RawWidth, Reader, infer_encoding,
             records::{
                 Compression, DocumentRecord, EncodingRecord, Extension, FileAttributesRecord,
                 FileHeader, FloatInfoRecord, IntegerInfoRecord, LongName, LongNamesRecord,
@@ -1728,7 +1728,7 @@ impl Debug for Cases {
 }
 
 impl Iterator for Cases {
-    type Item = Result<Case<Vec<Datum<ByteString>>>, raw::Error>;
+    type Item = Result<Case<Vec<Datum<ByteString>>>, raw::Error<CaseDetails>>;
 
     fn next(&mut self) -> Option<Self::Item> {
         self.inner.next().map(|result| {
index d9f770ba6a93fed7ffbd8cfa00fff251afa6d1ce..00779f054fb3c00ecfef8c98e4c342387085d9a0 100644 (file)
@@ -48,6 +48,7 @@ use crate::{
 };
 
 use binrw::Endian;
+use displaydoc::Display;
 use encoding_rs::{
     BIG5, EUC_JP, EUC_KR, Encoding, GB18030, IBM866, ISO_2022_JP, ISO_8859_2, ISO_8859_3,
     ISO_8859_4, ISO_8859_5, ISO_8859_6, ISO_8859_7, ISO_8859_8, ISO_8859_10, ISO_8859_13,
@@ -79,24 +80,27 @@ pub mod records;
 ///
 /// Any error prevents reading further data from the system file.
 #[derive(Debug)]
-pub struct Error {
+pub struct Error<D> {
     /// Range of file offsets where the error occurred.
     pub offsets: Option<Range<u64>>,
 
     /// Details of the error.
-    pub details: ErrorDetails,
+    pub details: D,
 }
 
-impl std::error::Error for Error {}
+impl<D> std::error::Error for Error<D> where D: Debug + Display {}
 
-impl Error {
+impl<D> Error<D> {
     /// Constructs an error from `offsets` and `details`.
-    pub fn new(offsets: Option<Range<u64>>, details: ErrorDetails) -> Self {
+    pub fn new(offsets: Option<Range<u64>>, details: D) -> Self {
         Self { offsets, details }
     }
 }
 
-impl Display for Error {
+impl<D> Display for Error<D>
+where
+    D: Display,
+{
     fn fmt(&self, f: &mut Formatter<'_>) -> FmtResult {
         if let Some(offsets) = &self.offsets
             && !offsets.is_empty()
@@ -115,7 +119,10 @@ impl Display for Error {
     }
 }
 
-impl From<IoError> for Error {
+impl<D> From<IoError> for Error<D>
+where
+    D: From<IoError>,
+{
     fn from(value: IoError) -> Self {
         Self::new(None, value.into())
     }
@@ -225,33 +232,6 @@ pub enum ErrorDetails {
         count: u32,
     },
 
-    /// Unexpected end of file {case_ofs} bytes into a {case_len}-byte case.
-    #[error(
-        "Unexpected end of file {case_ofs} bytes into case {case_number} with expected length {case_len} bytes."
-    )]
-    EofInCase {
-        /// Offset into case in bytes.
-        case_ofs: u64,
-        /// Expected case length in bytes.
-        case_len: usize,
-        /// 1-based case number in file.
-        case_number: u64,
-    },
-
-    /// Unexpected end of file {case_ofs} bytes and {n_chunks} compression
-    /// chunks into a compressed case.
-    #[error(
-        "Unexpected end of file {case_ofs} bytes and {n_chunks} compression chunks into compressed case {case_number}."
-    )]
-    EofInCompressedCase {
-        /// Offset into case in bytes.
-        case_ofs: u64,
-        /// Number of compression codes consumed.
-        n_chunks: usize,
-        /// 1-based case number in file.
-        case_number: u64,
-    },
-
     /// Error reading a [ZHeader].
     #[error("Error reading ZLIB header: {0}")]
     ZHeader(#[from] ZHeaderError),
@@ -260,15 +240,6 @@ pub enum ErrorDetails {
     #[error("Error reading ZLIB trailer: {0}")]
     ZTrailer(#[from] ZTrailerError),
 
-    /// File metadata says it contains {expected} cases, but {actual} cases were read.
-    #[error("File metadata says it contains {expected} cases, but {actual} cases were read.")]
-    WrongNumberOfCases {
-        /// Expected number of cases.
-        expected: u64,
-        /// Actual number of cases.
-        actual: u64,
-    },
-
     /// Encoding error.
     #[error(transparent)]
     EncodingError(
@@ -747,7 +718,7 @@ impl Record {
         endian: Endian,
         var_types: &VarTypes,
         warn: &mut dyn FnMut(Warning),
-    ) -> Result<Option<Record>, Error>
+    ) -> Result<Option<Record>, Error<ErrorDetails>>
     where
         R: BufRead + Seek,
     {
@@ -829,7 +800,7 @@ impl Record {
 pub fn infer_encoding(
     records: &[Record],
     mut warn: impl FnMut(Warning),
-) -> Result<&'static Encoding, Error> {
+) -> Result<&'static Encoding, Error<ErrorDetails>> {
     let (encoding, character_code) = get_encoding_info(records);
     match get_encoding(encoding, character_code) {
         Ok(encoding) => Ok(encoding),
@@ -889,7 +860,10 @@ impl<'de> Decoder<'de> {
     /// EBCDIC encoding, since this crate only supports ASCII-based encodings.
     ///
     /// `warn` will be used to report warnings while decoding records.
-    pub fn with_inferred_encoding<F>(records: &[Record], mut warn: F) -> Result<Self, Error>
+    pub fn with_inferred_encoding<F>(
+        records: &[Record],
+        mut warn: F,
+    ) -> Result<Self, Error<ErrorDetails>>
     where
         F: FnMut(Warning) + 'de,
     {
@@ -1098,26 +1072,111 @@ impl RawDatum {
     }
 }
 
+/// The meaning of a compression opcode byte.
+///
+/// This abstraction exists because SPSS and SPSS/PC+ system files have similar
+/// compression structures but their opcodes are slightly different.
+pub enum CompressionAction {
+    /// Ignored.
+    NoOp,
+
+    /// A compressed integer.
+    CompressedInt(
+        /// The compressed integer.
+        f64,
+    ),
+
+    /// End of file.
+    Eof,
+
+    /// Literal 8-byte value follows the block of opcodes.
+    Literal,
+
+    /// Represents a group of 8 spaces.
+    Spaces,
+
+    /// Represents the system-missing value.
+    Sysmis,
+}
+
+impl CompressionAction {
+    /// Interprets an SPSS system file compression opcode.
+    fn from_sysfile(code: u8, bias: f64) -> Self {
+        match code {
+            0 => Self::NoOp,
+            252 => Self::Eof,
+            253 => Self::Literal,
+            254 => Self::Spaces,
+            255 => Self::Sysmis,
+            _ => Self::CompressedInt(code as f64 - bias),
+        }
+    }
+}
+
+/// An error reading a case from a system file.
+///
+/// Used for SPSS system files and SPSS/PC+ system files.
+#[derive(ThisError, Display, Debug)]
+pub enum CaseDetails {
+    /// Unexpected end of file {case_ofs} bytes into case {case_number} with expected length {case_len} bytes.
+    EofInCase {
+        /// Offset into case in bytes.
+        case_ofs: u64,
+        /// Expected case length in bytes.
+        case_len: usize,
+        /// 1-based case number in file.
+        case_number: u64,
+    },
+
+    /// Unexpected end of file {case_ofs} bytes and {n_chunks} compression chunks into compressed case {case_number}.
+    EofInCompressedCase {
+        /// Offset into case in bytes.
+        case_ofs: u64,
+        /// Number of compression codes consumed.
+        n_chunks: usize,
+        /// 1-based case number in file.
+        case_number: u64,
+    },
+
+    /// File metadata says it contains {expected} cases, but {actual} cases were read.
+    WrongNumberOfCases {
+        /// Expected number of cases.
+        expected: u64,
+        /// Actual number of cases.
+        actual: u64,
+    },
+
+    /// I/O error ({0})
+    Io(#[from] IoError),
+}
+
 impl Datum<ByteString> {
-    fn read_case<R: Read + Seek>(
+    /// Reads an uncompressed case with variables `case_vars` from `reader`,
+    /// with numbers in the given `endian`.
+    ///
+    /// `case_number` is used in error messages.
+    pub fn read_case<R>(
         reader: &mut R,
         case_number: u64,
         case_vars: &[CaseVar],
         endian: Endian,
-    ) -> Result<Option<RawCase>, Error> {
+    ) -> Result<Option<RawCase>, Error<CaseDetails>>
+    where
+        R: Read + Seek,
+    {
         fn eof<R: Seek>(
             reader: &mut R,
             case_number: u64,
             case_vars: &[CaseVar],
             case_start: u64,
-        ) -> Result<Option<RawCase>, Error> {
+        ) -> Result<Option<RawCase>, Error<CaseDetails>> {
             let offset = reader.stream_position()?;
             if offset == case_start {
                 Ok(None)
             } else {
                 Err(Error::new(
                     Some(case_start..offset),
-                    ErrorDetails::EofInCase {
+                    CaseDetails::EofInCase {
                         case_ofs: offset - case_start,
                         case_len: case_vars.iter().map(CaseVar::bytes).sum(),
                         case_number,
@@ -1156,20 +1215,26 @@ impl Datum<ByteString> {
         Ok(Some(RawCase(values)))
     }
 
-    fn read_compressed_chunk<R: Read>(
+    fn read_compressed_chunk<R, F>(
         reader: &mut R,
         codes: &mut VecDeque<u8>,
+        decode_compression_action: F,
         endian: Endian,
-        bias: f64,
-    ) -> Result<Option<[u8; 8]>, Error> {
+    ) -> Result<Option<[u8; 8]>, Error<CaseDetails>>
+    where
+        F: Fn(u8) -> CompressionAction,
+        R: Read,
+    {
         loop {
-            match codes.pop_front() {
-                Some(0) => (),
-                Some(252) => return Ok(None),
-                Some(253) => return Ok(Some(read_bytes(reader)?)),
-                Some(254) => return Ok(Some([b' '; 8])),
-                Some(255) => return Ok(Some(endian.to_bytes(-f64::MAX))),
-                Some(code) => return Ok(Some(endian.to_bytes(code as f64 - bias))),
+            match codes.pop_front().map(&decode_compression_action) {
+                Some(CompressionAction::NoOp) => (),
+                Some(CompressionAction::Eof) => return Ok(None),
+                Some(CompressionAction::Literal) => return Ok(Some(read_bytes(reader)?)),
+                Some(CompressionAction::Spaces) => return Ok(Some([b' '; 8])),
+                Some(CompressionAction::Sysmis) => return Ok(Some(endian.to_bytes(-f64::MAX))),
+                Some(CompressionAction::CompressedInt(value)) => {
+                    return Ok(Some(endian.to_bytes(value)));
+                }
                 None => {
                     match try_read_bytes::<8, _>(reader)? {
                         Some(new_codes) => codes.extend(new_codes),
@@ -1179,25 +1244,37 @@ impl Datum<ByteString> {
             };
         }
     }
-    fn read_compressed_case<R: Read + Seek>(
+
+    /// Reads a compressed case with variables `case_vars` from `reader`, with
+    /// numbers in the given `endian`.
+    ///
+    /// `codes` is used for compression codes, which are interpreted using
+    /// `decode_compression_action`.
+    ///
+    /// `case_number` is used in error messages.
+    pub fn read_compressed_case<R, F>(
         reader: &mut R,
         case_number: u64,
         case_vars: &[CaseVar],
         codes: &mut VecDeque<u8>,
+        decode_compression_action: F,
         endian: Endian,
-        bias: f64,
-    ) -> Result<Option<RawCase>, Error> {
+    ) -> Result<Option<RawCase>, Error<CaseDetails>>
+    where
+        R: Read + Seek,
+        F: Fn(u8) -> CompressionAction,
+    {
         fn eof<R: Seek>(
             reader: &mut R,
             case_number: u64,
             case_start: u64,
             n_chunks: usize,
-        ) -> Result<Option<RawCase>, Error> {
+        ) -> Result<Option<RawCase>, Error<CaseDetails>> {
             let offset = reader.stream_position()?;
             if n_chunks > 0 {
                 Err(Error::new(
                     Some(case_start..offset),
-                    ErrorDetails::EofInCompressedCase {
+                    CaseDetails::EofInCompressedCase {
                         case_ofs: offset - case_start,
                         n_chunks,
                         case_number,
@@ -1214,7 +1291,12 @@ impl Datum<ByteString> {
         for var in case_vars {
             match var {
                 CaseVar::Numeric => {
-                    let Some(raw) = Self::read_compressed_chunk(reader, codes, endian, bias)?
+                    let Some(raw) = Self::read_compressed_chunk(
+                        reader,
+                        codes,
+                        &decode_compression_action,
+                        endian,
+                    )?
                     else {
                         return eof(reader, case_number, case_start, n_chunks);
                     };
@@ -1227,8 +1309,12 @@ impl Datum<ByteString> {
                         let mut data_bytes = segment.data_bytes;
                         let mut padding_bytes = segment.padding_bytes;
                         while data_bytes > 0 || padding_bytes > 0 {
-                            let Some(raw) =
-                                Self::read_compressed_chunk(reader, codes, endian, bias)?
+                            let Some(raw) = Self::read_compressed_chunk(
+                                reader,
+                                codes,
+                                &decode_compression_action,
+                                endian,
+                            )?
                             else {
                                 return eof(reader, case_number, case_start, n_chunks);
                             };
@@ -1329,7 +1415,10 @@ where
     ///
     /// To read an encrypted system file, wrap `reader` in
     /// [EncryptedReader](crate::crypto::EncryptedReader).
-    pub fn new(mut reader: R, mut warn: impl FnMut(Warning) + 'a) -> Result<Self, Error> {
+    pub fn new(
+        mut reader: R,
+        mut warn: impl FnMut(Warning) + 'a,
+    ) -> Result<Self, Error<ErrorDetails>> {
         let header = FileHeader::read(&mut reader, &mut warn)?;
         Ok(Self {
             reader: Some(reader),
@@ -1451,7 +1540,7 @@ impl<'a, 'b, R> Iterator for Records<'a, 'b, R>
 where
     R: BufRead + Seek + 'static,
 {
-    type Item = Result<Record, Error>;
+    type Item = Result<Record, Error<ErrorDetails>>;
 
     fn next(&mut self) -> Option<Self::Item> {
         self.next_inner().inspect(|retval| {
@@ -1465,9 +1554,17 @@ where
 trait ReadSeek: Read + Seek {}
 impl<T> ReadSeek for T where T: Read + Seek {}
 
+/// Part of a string variable for reading data from a system file.
+///
+/// A string variable in a system file is usually just the string itself
+/// followed by padding out to a multiple of 8 bytes.  Very long strings (longer
+/// than 255 bytes) consist of multiple segments.
 #[derive(Debug)]
-struct StringSegment {
+pub struct StringSegment {
+    /// Number of bytes of string data.
     data_bytes: usize,
+
+    /// Number of bytes to ignore following the string data.
     padding_bytes: usize,
 }
 
@@ -1482,16 +1579,27 @@ fn segment_widths(width: usize) -> impl Iterator<Item = usize> {
         .map(|w| w.next_multiple_of(8))
 }
 
-enum CaseVar {
+/// Format for reading a variable in a system file.
+#[derive(Debug)]
+pub enum CaseVar {
+    /// A numeric variable, represented in the system file as an `f64`.
     Numeric,
+
+    /// A string variable.
     String {
+        /// Total number of bytes (sum of `data_bytes` across the `encoding`).
         width: usize,
+
+        /// How the string variable is represented in the file.
+        ///
+        /// Widths 255 or less have a single [StringSegment]; wider variables
+        /// have multiple.
         encoding: SmallVec<[StringSegment; 1]>,
     },
 }
 
-impl CaseVar {
-    fn new(width: VarWidth) -> Self {
+impl From<VarWidth> for CaseVar {
+    fn from(width: VarWidth) -> Self {
         match width {
             VarWidth::Numeric => Self::Numeric,
             VarWidth::String(width) => {
@@ -1515,7 +1623,9 @@ impl CaseVar {
             }
         }
     }
+}
 
+impl CaseVar {
     fn bytes(&self) -> usize {
         match self {
             CaseVar::Numeric => 8,
@@ -1595,7 +1705,7 @@ impl RawCases {
                 .iter()
                 .flatten()
                 .copied()
-                .map(CaseVar::new)
+                .map_into()
                 .collect::<Vec<_>>(),
             compression: header.compression,
             bias: header.bias,
@@ -1613,7 +1723,7 @@ impl RawCases {
     /// very long string variables (see [RawCases] for details).
     pub fn with_widths(self, widths: impl IntoIterator<Item = VarWidth>) -> Self {
         Self {
-            case_vars: widths.into_iter().map(CaseVar::new).collect::<Vec<_>>(),
+            case_vars: widths.into_iter().map_into().collect::<Vec<_>>(),
             ..self
         }
     }
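The switch from `CaseVar::new` to a `From<VarWidth>` impl is what lets the two call sites above use `.map_into()` (from the `itertools` crate). A rough sketch of the same pattern using only the standard library, where `.map(Into::into)` plays the role of `.map_into()`. The `VarWidth` and `CaseVar` types here are simplified, hypothetical stand-ins (the real `CaseVar::String` also carries the segment encoding), and `case_vars` is an illustrative helper, not a function from the patch.

```rust
/// Hypothetical, simplified stand-in for `VarWidth`.
#[derive(Clone, Copy, Debug)]
enum VarWidth {
    Numeric,
    String(u16),
}

/// Hypothetical, simplified stand-in for `CaseVar` (no segment encoding).
#[derive(Debug, PartialEq)]
enum CaseVar {
    Numeric,
    String { width: usize },
}

impl From<VarWidth> for CaseVar {
    fn from(width: VarWidth) -> Self {
        match width {
            VarWidth::Numeric => Self::Numeric,
            VarWidth::String(width) => Self::String {
                width: width as usize,
            },
        }
    }
}

/// Converts each width through the `From` impl.  `.map(Into::into)` is the
/// standard-library spelling of itertools' `.map_into()`.
fn case_vars(widths: impl IntoIterator<Item = VarWidth>) -> Vec<CaseVar> {
    widths.into_iter().map(Into::into).collect()
}

fn main() {
    let vars = case_vars([VarWidth::Numeric, VarWidth::String(12)]);
    println!("{vars:?}");
}
```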
@@ -1629,7 +1739,7 @@ impl RawCases {
 }
 
 impl Iterator for RawCases {
-    type Item = Result<RawCase, Error>;
+    type Item = Result<RawCase, Error<CaseDetails>>;
 
     fn next(&mut self) -> Option<Self::Item> {
         if self.eof {
@@ -1644,8 +1754,8 @@ impl Iterator for RawCases {
                 self.read_cases + 1,
                 &self.case_vars,
                 &mut self.codes,
+                |code| CompressionAction::from_sysfile(code, self.bias),
                 self.endian,
-                self.bias,
             )
             .transpose()
         } else {
@@ -1665,7 +1775,7 @@ impl Iterator for RawCases {
                 {
                     return Some(Err(Error::new(
                         None,
-                        ErrorDetails::WrongNumberOfCases {
+                        CaseDetails::WrongNumberOfCases {
                             expected: expected_cases,
                             actual: self.read_cases,
                         },
@@ -2152,15 +2262,15 @@ impl EncodingReportString {
 impl EncodingReport {
     /// Constructs an encoding report from `reader`, reading no more than
     /// `max_cases` from it.
-    pub fn new<R>(mut reader: Reader<R>, max_cases: u64) -> Result<Self, Error>
+    pub fn new<R>(mut reader: Reader<R>, max_cases: u64) -> anyhow::Result<Self>
     where
         R: BufRead + Seek + 'static,
     {
         fn inner(
             header: FileHeader<ByteString>,
             records: &[Record],
-            cases: impl Iterator<Item = Result<RawCase, Error>>,
-        ) -> Result<EncodingReport, Error> {
+            cases: impl Iterator<Item = Result<RawCase, Error<CaseDetails>>>,
+        ) -> Result<EncodingReport, Error<CaseDetails>> {
             let (encoding, codepage) = get_encoding_info(records);
             let label = encoding
                 .map(|encoding| (String::from(encoding), get_encoding(Some(encoding), None)));
@@ -2277,7 +2387,11 @@ impl EncodingReport {
 
         let records: Vec<Record> = reader.records().collect::<Result<Vec<_>, _>>()?;
         let header = reader.header().clone();
-        inner(header, &records, reader.cases().take(max_cases as usize))
+        Ok(inner(
+            header,
+            &records,
+            reader.cases().take(max_cases as usize),
+        )?)
     }
 }
 
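The hunks above make `Error` generic over its details type (`Error<ErrorDetails>` for records, `Error<CaseDetails>` for cases) and let `EncodingReport::new` convert the latter with `?`. The definition of `Error` is not part of this excerpt, so the following is only a sketch under stated assumptions: the `offsets` field and its type, the `Display` text, and the `check_cases` and `report` helpers are all hypothetical, and `Box<dyn std::error::Error>` is swapped in for `anyhow::Result` to keep the example free of external crates.

```rust
use std::fmt;
use std::ops::Range;

/// Hypothetical error wrapper: optional file offsets plus details `D`.
#[derive(Debug)]
struct Error<D> {
    offsets: Option<Range<u64>>,
    details: D,
}

impl<D> Error<D> {
    fn new(offsets: Option<Range<u64>>, details: D) -> Self {
        Self { offsets, details }
    }
}

impl<D: fmt::Display> fmt::Display for Error<D> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if let Some(range) = &self.offsets {
            write!(f, "offsets {:#x}..{:#x}: ", range.start, range.end)?;
        }
        write!(f, "{}", self.details)
    }
}

impl<D: fmt::Debug + fmt::Display> std::error::Error for Error<D> {}

/// Case-level details; the variant name follows the diff, the message
/// wording is an assumption.
#[derive(Debug)]
enum CaseDetails {
    WrongNumberOfCases { expected: u64, actual: u64 },
}

impl fmt::Display for CaseDetails {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::WrongNumberOfCases { expected, actual } => {
                write!(f, "expected {expected} cases but read {actual}")
            }
        }
    }
}

/// Hypothetical check that fails with a case-level error.
fn check_cases(expected: u64, actual: u64) -> Result<(), Error<CaseDetails>> {
    if expected == actual {
        Ok(())
    } else {
        let details = CaseDetails::WrongNumberOfCases { expected, actual };
        Err(Error::new(None, details))
    }
}

/// `?` erases the details type, much as `Ok(inner(...)?)` does in the diff.
fn report(expected: u64, actual: u64) -> Result<(), Box<dyn std::error::Error>> {
    check_cases(expected, actual)?;
    Ok(())
}

fn main() {
    if let Err(error) = report(3, 2) {
        eprintln!("error: {error}");
    }
}
```

Keeping the details type in the signature lets callers that care (such as the `RawCases` iterator) match on case-specific variants, while top-level code can still collapse everything into one opaque error type.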
index 592d29b54f03906e4a22432bcc797866ab66f742..1bc79ced89fb2b09f4551e98323271d93d44a57c 100644 (file)
@@ -146,7 +146,7 @@ pub struct RawHeader {
 
 impl FileHeader<ByteString> {
     /// Reads a header record from `r`, reporting any warnings via `warn`.
-    pub fn read<R>(r: &mut R, warn: &mut dyn FnMut(Warning)) -> Result<Self, Error>
+    pub fn read<R>(r: &mut R, warn: &mut dyn FnMut(Warning)) -> Result<Self, Error<ErrorDetails>>
     where
         R: Read + Seek,
     {
@@ -384,7 +384,7 @@ impl RawMissingValues {
         code: i32,
         endian: Endian,
         warn: &mut dyn FnMut(Warning),
-    ) -> Result<Self, Error>
+    ) -> Result<Self, Error<ErrorDetails>>
     where
         R: Read + Seek,
     {
@@ -558,7 +558,11 @@ pub struct RawVariableRecord {
 
 impl VariableRecord<ByteString> {
     /// Reads a variable record from `r`.
-    pub fn read<R>(r: &mut R, endian: Endian, warn: &mut dyn FnMut(Warning)) -> Result<Self, Error>
+    pub fn read<R>(
+        r: &mut R,
+        endian: Endian,
+        warn: &mut dyn FnMut(Warning),
+    ) -> Result<Self, Error<ErrorDetails>>
     where
         R: Read + Seek,
     {
@@ -720,7 +724,7 @@ impl ValueLabelRecord<RawDatum, ByteString> {
         endian: Endian,
         var_types: &VarTypes,
         warn: &mut dyn FnMut(Warning),
-    ) -> Result<Option<Self>, Error> {
+    ) -> Result<Option<Self>, Error<ErrorDetails>> {
         let label_offset = r.stream_position()?;
         let n: u32 = endian.parse(read_bytes(r)?);
         if n > Self::MAX_LABELS {
@@ -885,7 +889,7 @@ impl DocumentRecord<RawDocumentLine> {
     pub const MAX_LINES: usize = i32::MAX as usize / DOC_LINE_LEN;
 
     /// Reads a document record from `r`.
-    pub fn read<R>(r: &mut R, endian: Endian) -> Result<Self, Error>
+    pub fn read<R>(r: &mut R, endian: Endian) -> Result<Self, Error<ErrorDetails>>
     where
         R: Read + Seek,
     {
@@ -2300,7 +2304,7 @@ impl Extension {
         endian: Endian,
         var_types: &VarTypes,
         warn: &mut dyn FnMut(Warning),
-    ) -> Result<Option<Record>, Error> {
+    ) -> Result<Option<Record>, Error<ErrorDetails>> {
         let subtype = endian.parse(read_bytes(r)?);
         let header_offset = r.stream_position()?;
         let size: u32 = endian.parse(read_bytes(r)?);
@@ -2489,7 +2493,7 @@ pub struct RawZHeader {
 
 impl ZHeader {
     /// Reads a ZLIB header from `r` using `endian`.
-    pub fn read<R>(r: &mut R, endian: Endian) -> Result<ZHeader, Error>
+    pub fn read<R>(r: &mut R, endian: Endian) -> Result<ZHeader, Error<ErrorDetails>>
     where
         R: Read + Seek,
     {
@@ -2776,7 +2780,7 @@ impl ZTrailer {
         bias: f64,
         zheader: &RawZHeader,
         warn: &mut dyn FnMut(Warning),
-    ) -> Result<Option<ZTrailer>, Error>
+    ) -> Result<Option<ZTrailer>, Error<ErrorDetails>>
     where
         R: Read + Seek,
     {