Also, update `pspp show` implementation to match.
# Converting data files with `pspp convert`
-`pspp convert <INPUT> [OUTPUT]` reads an SPSS data file from `<INPUT>`
-and writes a copy of it to `[OUTPUT]` (or to the terminal, if
-`[OUTPUT]` is omitted).
+The `pspp convert` command reads data from one file and writes it to
+another. The basic syntax is:
+
+```
+pspp convert <INPUT> [OUTPUT]
+```
+
+which reads an SPSS data file from `<INPUT>` and writes a copy of it
+to `[OUTPUT]`. If `[OUTPUT]` is omitted, output is written to the
+terminal.
If `[OUTPUT]` is specified, then `pspp convert` tries to guess the
output format based on its extension:
`<ENCODING>` must be one of the labels for encodings in the
[Encoding Standard]. PSPP does not support UTF-16 or EBCDIC
- encodings data files.
+ encodings in data files.
`pspp show encodings` can help figure out the correct encoding for a
system file.
# Decrypting SPSS files with `pspp decrypt`
-SPSS supports encryption using a password for data, viewer, and syntax
-files. `pspp decrypt <INPUT> <OUTPUT>` reads an encrypted file
-`<INPUT>` and writes an equivalent plaintext file `<OUTPUT>`.
+The `pspp decrypt` command reads an encrypted SPSS file and writes out
+an equivalent plaintext file. The basic syntax is:
+
+```
+pspp decrypt <INPUT> <OUTPUT>
+```
+
+which reads an encrypted SPSS data, viewer, or syntax file `<INPUT>`,
+decrypts it, and writes the decrypted version to `<OUTPUT>`.
Other commands, such as [`pspp convert`](pspp-convert.md), can also
read encrypted files directly.
# Inspecting data files with `pspp show`
+
+The `pspp show` command reads an SPSS data file and produces a report.
+The basic syntax is:
+
+```
+pspp show <MODE> <INPUT> [OUTPUT]
+```
+
+where `<MODE>` is a mode of operation (see below), `<INPUT>` is the
+SPSS data file to read, and `[OUTPUT]` is the output file name. If
+`[OUTPUT]` is omitted, output is written to the terminal.
+
+The following `<MODE>`s are available:
+
+* `identity`: Outputs a line of text to stdout that identifies the
+ basic kind of system file.
+
+* `dictionary`: Outputs the file dictionary in detail, including
+ variables, value labels, attributes, documents, and so on. With
+ `--data`, also outputs cases from the system file.
+
+ This can be useful as an alternative to PSPP syntax commands such as
+ [`SYSFILE INFO`](../commands/spss-io/sysfile-info.md) or [`DISPLAY
+ DICTIONARY`](../commands/variables/display.md).
+
+ [`pspp convert`](pspp-convert.md) is a better way to convert a
+ system file to another format.
+
+* `encodings`: Analyzes text data in the system file dictionary and
+ (with `--data`) cases and produces a report that can help the user
+ to figure out what character encoding the file uses.
+
+ This is useful for old system files that don't identify their own
+ encodings.
+
+* `raw`: Outputs the raw structure of the system file dictionary and
+ (with `--data`) cases. This mode does not assume a particular
+ character encoding for the system file, which means that some of the
+ dictionary can't be printed in detail, only in summary.
+
+ This is useful for debugging how PSPP reads system files and for
+ investigating cases of system file corruption, especially when the
+ character encoding is unknown or uncertain.
+
+* `decoded`: Outputs the raw structure of the system file dictionary
+ and (with `--data`) cases. Unlike `raw`, this mode decodes the
+ dictionary and data with a particular character encoding, which
+ allows it to fully interpret system file records.
+
+ This is useful for debugging how PSPP reads system files and for
+ investigating cases of system file corruption.
+
+## Options
+
+The following options affect how `pspp show` reads `<INPUT>`:
+
+* `--encoding <ENCODING>`
+ For modes `decoded` and `dictionary`, this reads the input file
+ using the specified `<ENCODING>`, overriding the default.
+
+ `<ENCODING>` must be one of the labels for encodings in the
+ [Encoding Standard]. PSPP does not support UTF-16 or EBCDIC
+ encodings in data files.
+
+ `pspp show encodings` can help figure out the correct encoding for a
+ system file.
+
+ [Encoding Standard]: https://encoding.spec.whatwg.org/#names-and-labels
+
+* `--data [<MAX_CASES>]`
+ For modes `raw`, `decoded`, `dictionary`, and `encodings`, this
+ instructs `pspp show` to read cases from the file. If `<MAX_CASES>`
+ is given, then that sets a limit on the number of cases to read.
+ Without this option, PSPP will not read any cases.
+
+The following options affect how `pspp show` writes its output:
+
+* `-f <FORMAT>`
+ `--format <FORMAT>`
+ Specifies the format to use for output. `<FORMAT>` may be one of
+ the following:
+
+ - `json`: JSON using indentation and spaces for easy human
+ consumption.
+ - `ndjson`: [Newline-delimited JSON].
+ - `output`: Pivot tables with the PSPP output engine. Use `-o` for
+ additional configuration.
+ - `discard`: Do not produce any output.
+
+ When this option is not used, the default output format is chosen
+ based on the `[OUTPUT]` extension. If `[OUTPUT]` is not specified,
+ then output defaults to JSON.
+
+ [Newline-delimited JSON]: https://github.com/ndjson/ndjson-spec
+
+* `-o <OUTPUT_OPTIONS>`
+ Adds `<OUTPUT_OPTIONS>` to the output engine configuration.
+
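The default-format rule documented for `--format` can be sketched in isolation. This is only an illustration, not the code in this change: the `ShowFormat` enum here is a local stand-in, `default_format` is a hypothetical helper name, and the extension-to-format mapping (`.json` to JSON, `.ndjson` to newline-delimited JSON, anything else to the output engine) is an assumption, since the diff elides the actual `match` arms.

```rust
use std::path::Path;

/// Local stand-in for the formats that can be chosen by default.
/// (`discard` is omitted on the assumption that it is only ever
/// selected explicitly with `-f discard`.)
#[derive(Clone, Copy, Debug, PartialEq)]
enum ShowFormat {
    Json,
    Ndjson,
    Output,
}

/// Hypothetical helper: picks a format when `-f`/`--format` is absent.
fn default_format(output: Option<&Path>) -> ShowFormat {
    match output {
        // No `[OUTPUT]` given: the documentation says JSON is the default.
        None => ShowFormat::Json,
        // `[OUTPUT]` given: decide from its extension.
        Some(path) => match path.extension().and_then(|ext| ext.to_str()) {
            // Assumed mapping; the real match arms are not shown in the diff.
            Some("json") => ShowFormat::Json,
            Some("ndjson") => ShowFormat::Ndjson,
            // Assumption: everything else is handed to the output engine.
            _ => ShowFormat::Output,
        },
    }
}
```

Under these assumptions, `default_format(Some(Path::new("report.ndjson")))` yields `ShowFormat::Ndjson`, while a name with any other extension falls through to the output engine, which then needs to recognize that file name itself.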
/// File to show.
#[arg(required = true)]
- input_file: PathBuf,
+ input: PathBuf,
/// Output file name. If omitted, output is written to stdout.
- output_file: Option<PathBuf>,
+ output: Option<PathBuf>,
- /// Output driver configuration options.
- #[arg(short = 'o')]
- output_options: Vec<String>,
+ /// The encoding to use.
+ #[arg(long, value_parser = parse_encoding, help_heading = "Input file options")]
+ encoding: Option<&'static Encoding>,
/// Maximum number of cases to read.
///
long = "data",
num_args = 0..=1,
default_missing_value = "18446744073709551615",
- default_value_t = 0
+ default_value_t = 0,
+ help_heading = "Input file options"
)]
max_cases: u64,
+ /// Output driver configuration options.
+ #[arg(short = 'o', help_heading = "Output options")]
+ output_options: Vec<String>,
+
/// Output format.
- #[arg(long, short = 'f')]
+ #[arg(long, short = 'f', help_heading = "Output options")]
format: Option<ShowFormat>,
-
- /// The encoding to use.
- #[arg(long, value_parser = parse_encoding)]
- encoding: Option<&'static Encoding>,
}
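The combination of `num_args = 0..=1`, `default_missing_value`, and `default_value_t` on `max_cases` gives `--data` three distinct meanings. The following is a minimal sketch of those semantics without clap; `resolve_max_cases` and its `Option<Option<&str>>` parameter are hypothetical names introduced only for illustration, and the real parsing is done by clap.

```rust
use std::num::ParseIntError;

/// Hypothetical model of how `--data [<MAX_CASES>]` resolves to `max_cases`:
///
/// * `None`             : `--data` absent, so read no cases (`default_value_t = 0`).
/// * `Some(None)`       : bare `--data`, so no limit
///                        (`default_missing_value` is `u64::MAX`).
/// * `Some(Some("10"))` : `--data 10`, so read at most 10 cases.
fn resolve_max_cases(data_flag: Option<Option<&str>>) -> Result<u64, ParseIntError> {
    match data_flag {
        None => Ok(0),
        // 18446744073709551615 in the attribute is u64::MAX.
        Some(None) => Ok(u64::MAX),
        Some(Some(text)) => text.parse::<u64>(),
    }
}
```

One consequence of an optional value is that a bare `--data` placed directly before a positional argument may have that argument consumed as `<MAX_CASES>`; writing `--data=<N>` or putting `--data` last avoids the ambiguity.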
enum Output {
pub fn run(self) -> Result<()> {
let format = if let Some(format) = self.format {
format
- } else if let Some(output_file) = &self.output_file {
+ } else if let Some(output_file) = &self.output {
match output_file
.extension()
.unwrap_or(OsStr::new(""))
ShowFormat::Output => {
let mut config = String::new();
- if let Some(file) = &self.output_file {
+ if let Some(file) = &self.output {
#[derive(Serialize)]
struct File<'a> {
file: &'a Path,
let table: toml::Table = toml::from_str(&config)?;
if !table.contains_key("driver") {
- let driver = if let Some(file) = &self.output_file {
+ let driver = if let Some(file) = &self.output {
<dyn Driver>::driver_type_from_filename(file).ok_or_else(|| {
anyhow!("{}: no default output format for file name", file.display())
})?
}
ShowFormat::Json | ShowFormat::Ndjson => Output::Json {
pretty: format == ShowFormat::Json,
- writer: if let Some(output_file) = &self.output_file {
+ writer: if let Some(output_file) = &self.output {
Rc::new(RefCell::new(Box::new(File::create(output_file)?)))
} else {
Rc::new(RefCell::new(Box::new(stdout())))
ShowFormat::Discard => Output::Discard,
};
- let reader = File::open(&self.input_file)?;
+ let reader = File::open(&self.input)?;
let reader = BufReader::new(reader);
let mut reader = Reader::new(reader, Box::new(|warning| output.warn(&warning)))?;
/// What to show in a system file.
#[derive(Clone, Copy, Debug, Default, PartialEq, ValueEnum)]
enum Mode {
- /// The file dictionary, including variables, value labels, attributes, and so on.
+ /// The kind of file.
+ Identity,
+
+ /// File dictionary, with variables, value labels, attributes, ...
#[default]
#[value(alias = "dict")]
Dictionary,
- /// Possible encodings of text in the file dictionary and (with `--data`) cases.
+ /// Possible encodings of text in file dictionary and (with `--data`) cases.
Encodings,
- /// The kind of file.
- Identity,
-
/// Raw file records, without assuming a particular character encoding.
Raw,