work on manual

author Ben Pfaff <blp@cs.stanford.edu>

Wed, 7 May 2025 00:22:33 +0000 (17:22 -0700)

committer Ben Pfaff <blp@cs.stanford.edu>

Wed, 7 May 2025 00:22:33 +0000 (17:22 -0700)
diff --git a/rust/doc/src/SUMMARY.md b/rust/doc/src/SUMMARY.md

index c92ef172c26f0597098cdcf17bbb7f69bbfb7c07..3c2020fd9a26a3bbcd050da88fbbab35193a8076 100644 (file)
--- a/rust/doc/src/SUMMARY.md
+++ b/rust/doc/src/SUMMARY.md
@@ -40,7 +40,7 @@
  
  # Command Syntax
  
-- [Data Input and Output](commands/data-io/index.md)
+- [Working with Text Files](commands/data-io/index.md)
    - [BEGIN DATA](commands/data-io/begin-data.md)
    - [CLOSE FILE HANDLE](commands/data-io/close-file-handle.md)
    - [DATAFILE ATTRIBUTE](commands/data-io/datafile-attribute.md)
@@ -58,6 +58,18 @@
    - [REREAD](commands/data-io/reread.md)
    - [REPEATING DATA](commands/data-io/repeating-data.md)
    - [WRITE](commands/data-io/write.md)
+- [Working with Data Files](commands/spss-io/index.md)
+  - [APPLY DICTIONARY](commands/spss-io/apply-dictionary.md)
+  - [EXPORT](commands/spss-io/export.md)
+  - [GET](commands/spss-io/get.md)
+  - [GET DATA](commands/spss-io/get-data.md)
+  - [IMPORT](commands/spss-io/import.md)
+  - [SAVE](commands/spss-io/save.md)
+  - [SAVE DATA COLLECTION](commands/spss-io/save-data-collection.md)
+  - [SAVE TRANSLATE](commands/spss-io/save-translate.md)
+  - [SYSFILE INFO](commands/spss-io/sysfile-info.md)
+  - [XEXPORT](commands/spss-io/xexport.md)
+  - [XSAVE](commands/spss-io/xsave.md)
  
  # Developer Documentation
  
diff --git a/rust/doc/src/commands/spss-io/apply-dictionary.md b/rust/doc/src/commands/spss-io/apply-dictionary.md

new file mode 100644 (file)

index 0000000..3e4e8df
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/apply-dictionary.md
@@ -0,0 +1,57 @@
+# APPLY DICTIONARY
+
+```
+APPLY DICTIONARY FROM={'FILE_NAME',FILE_HANDLE}.
+```
+
+`APPLY DICTIONARY` applies the variable labels, value labels, and
+missing values taken from a file to corresponding variables in the
+active dataset.  In some cases it also updates the weighting variable.
+
+The `FROM` clause is mandatory.  Use it to specify a system file or
+portable file's name in single quotes, or a [file handle
+name](../../language/files/file-handles.md).  The dictionary in the
+file is read, but it does not replace the active dataset's dictionary.
+The file's data is not read.
+
+Only variables with names that exist in both the active dataset and
+the system file are considered.  Variables with the same name but
+different types (numeric, string) cause an error message.  Otherwise,
+the system file variables' attributes replace those in their matching
+active dataset variables:
+
+- If a system file variable has a variable label, then it replaces
+  the variable label of the active dataset variable.  If the system
+  file variable does not have a variable label, then the active
+  dataset variable's variable label, if any, is retained.
+
+- If the system file variable has custom attributes (see `VARIABLE
+  ATTRIBUTE`), then those attributes replace the active dataset
+  variable's custom attributes.  If the system file variable does not
+  have custom attributes, then the active dataset variable's custom
+  attributes, if any, are retained.
+
+- If the active dataset variable is numeric or short string, then
+  value labels and missing values, if any, are copied to the active
+  dataset variable.  If the system file variable does not have value
+  labels or missing values, then those in the active dataset
+  variable, if any, are not disturbed.
+
+In addition to properties of variables, some properties of the active
+dataset's dictionary as a whole are updated:
+
+- If the system file has custom attributes (see [DATAFILE
+  ATTRIBUTE](../../commands/data-io/datafile-attribute.md)), then
+  those attributes replace the active dataset's custom
+  attributes.
+
+- If the active dataset has a weighting variable (see `WEIGHT`),
+  and the system file does not, or if the weighting variable in the
+  system file does not exist in the active dataset, then the active
+  dataset weighting variable, if any, is retained.  Otherwise, the
+  weighting variable in the system file becomes the active dataset
+  weighting variable.
+
+`APPLY DICTIONARY` takes effect immediately.  It does not read the
+active dataset.  The system file is not modified.
+
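As a usage sketch for the `APPLY DICTIONARY` page added above: the following syntax loads a dataset and then copies labels, missing values, and attributes onto its matching variables from a second file.  Both file names are hypothetical, chosen only for illustration.

```
GET /FILE='responses.sav'.
APPLY DICTIONARY FROM='template.sav'.
```

Per the page's description, only variables whose names appear in both files are affected, and `template.sav` itself is left unmodified.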
diff --git a/rust/doc/src/commands/spss-io/export.md b/rust/doc/src/commands/spss-io/export.md

new file mode 100644 (file)

index 0000000..6c29a27
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/export.md
@@ -0,0 +1,45 @@
+# EXPORT
+
+```
+EXPORT
+        /OUTFILE='FILE_NAME'
+        /UNSELECTED={RETAIN,DELETE}
+        /DIGITS=N
+        /DROP=VAR_LIST
+        /KEEP=VAR_LIST
+        /RENAME=(SRC_NAMES=TARGET_NAMES)...
+        /TYPE={COMM,TAPE}
+        /MAP
+```
+
+   The `EXPORT` procedure writes the active dataset's dictionary and
+data to a specified portable file.
+
+   `UNSELECTED` controls whether cases excluded with `FILTER` are
+written to the file.  Specify `DELETE` on the `UNSELECTED`
+subcommand to leave them out.  The default is `RETAIN`, which
+writes them.
+
+   Portable files express real numbers in base 30.  Integers are
+always expressed to the maximum precision needed to make them exact.
+Non-integers are, by default, expressed to the machine's maximum
+natural precision (approximately 15 decimal digits on many machines).
+If many numbers require this many digits, the portable file may
+significantly increase in size.  As an alternative, the `DIGITS`
+subcommand may be used to specify the number of decimal digits of
+precision to write.  `DIGITS` applies only to non-integers.
+
+   The `OUTFILE` subcommand, which is the only required subcommand,
+specifies the portable file to be written as a file name string or a
+[file handle](../../language/files/file-handles.md).
+
+`DROP`, `KEEP`, and `RENAME` have the same syntax and meaning as for
+the [`SAVE`](save.md) command.
+
+   The `TYPE` subcommand specifies the character set for use in the
+portable file.  Its value is currently not used.
+
+   The `MAP` subcommand is currently ignored.
+
+   `EXPORT` is a procedure.  It causes the active dataset to be read.
+
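As a usage sketch for the `EXPORT` page added above: the following syntax writes a portable file, omits filtered-out cases, and limits non-integers to 6 decimal digits of precision.  The file name and the digit count are hypothetical values chosen for illustration.

```
EXPORT
        /OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=6.
```

Because `DIGITS` applies only to non-integers, integer values in the output remain exact.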
diff --git a/rust/doc/src/commands/spss-io/get-data.md b/rust/doc/src/commands/spss-io/get-data.md

new file mode 100644 (file)

index 0000000..b6601e1
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/get-data.md
@@ -0,0 +1,386 @@
+# GET DATA
+
+```
+GET DATA
+        /TYPE={GNM,ODS,PSQL,TXT}
+        ...additional subcommands depending on TYPE...
+```
+
+   The `GET DATA` command is used to read files and other data sources
+created by other applications.  When this command is executed, the
+current dictionary and active dataset are replaced with variables and
+data read from the specified source.
+
+   The `TYPE` subcommand is mandatory and must be the first subcommand
+specified.  It determines the type of the file or source to read.
+PSPP currently supports the following `TYPE`s:
+
+* `GNM`  
+  Spreadsheet files created by Gnumeric (<http://gnumeric.org>).
+
+* `ODS`  
+  Spreadsheet files in OpenDocument format
+  (<http://opendocumentformat.org>).
+
+* `PSQL`  
+  Relations from PostgreSQL databases (<http://postgresql.org>).
+
+* `TXT`  
+  Textual data files in columnar and delimited formats.
+
+Each supported file type has additional subcommands, explained in
+separate sections below.
+
+## Spreadsheet Files
+
+```
+GET DATA /TYPE={GNM, ODS}
+         /FILE={'FILE_NAME'}
+         /SHEET={NAME 'SHEET_NAME', INDEX N}
+         /CELLRANGE={RANGE 'RANGE', FULL}
+         /READNAMES={ON, OFF}
+         /ASSUMEDSTRWIDTH=N.
+```
+
+`GET DATA` can read Gnumeric spreadsheets (<http://gnumeric.org>), and
+spreadsheets in OpenDocument format
+(<http://libreplanet.org/wiki/Group:OpenDocument/Software>).  Use the
+`TYPE` subcommand to indicate the file's format.  `/TYPE=GNM`
+indicates Gnumeric files, `/TYPE=ODS` indicates OpenDocument.  The
+`FILE` subcommand is mandatory.  Use it to specify the name of the
+file to be read.  All other subcommands are optional.
+
+   The format of each variable is determined by the format of the
+spreadsheet cell containing the first datum for the variable.  If this
+cell is of string (text) format, then the width of the variable is
+determined from the length of the string it contains, unless the
+`ASSUMEDSTRWIDTH` subcommand is given.
+
+   The `SHEET` subcommand specifies the sheet within the spreadsheet
+file to read.  There are two forms of the `SHEET` subcommand.  In the
+first form, `/SHEET=name SHEET_NAME`, the string SHEET_NAME is the name
+of the sheet to read.  In the second form, `/SHEET=index IDX`, IDX is an
+integer which is the index of the sheet to read.  The first sheet has
+the index 1.  If the `SHEET` subcommand is omitted, then the command
+reads the first sheet in the file.
+
+   The `CELLRANGE` subcommand specifies the range of cells within the
+sheet to read.  If the subcommand is given as `/CELLRANGE=FULL`, then
+the entire sheet is read.  To read only part of a sheet, use the form
+`/CELLRANGE=range 'TOP_LEFT_CELL:BOTTOM_RIGHT_CELL'`.  For example,
+the subcommand `/CELLRANGE=range 'C3:P19'` reads columns C-P and rows
+3-19, inclusive.  Without the `CELLRANGE` subcommand, the entire sheet
+is read.
+
+   If `/READNAMES=ON` is specified, then the contents of cells of the
+first row are used as the names of the variables in which to store the
+data from subsequent rows.  This is the default.  If `/READNAMES=OFF` is
+used, then the variables receive automatically assigned names.
+
+   The `ASSUMEDSTRWIDTH` subcommand specifies the maximum width of
+string variables read from the file.  If omitted, the default value is
+determined from the length of the string in the first spreadsheet cell
+for each variable.
+
+## Postgres Database Queries
+
+```
+GET DATA /TYPE=PSQL
+         /CONNECT={CONNECTION INFO}
+         /SQL={QUERY}
+         [/ASSUMEDSTRWIDTH=W]
+         [/UNENCRYPTED]
+         [/BSIZE=N].
+```
+
+   `GET DATA /TYPE=PSQL` imports data from a local or remote Postgres
+database server.  It automatically creates variables based on the table
+column names or the names specified in the SQL query.  PSPP cannot
+support the full precision of some Postgres data types, so data of those
+types will lose some precision when PSPP imports them.  PSPP does not
+support all Postgres data types.  If PSPP cannot support a datum, `GET
+DATA` issues a warning and substitutes the system-missing value.
+
+   The `CONNECT` subcommand must be a string for the parameters of the
+database server from which the data should be fetched.  The format of
+the string is given [in the Postgres
+manual](http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT).
+
+   The `SQL` subcommand must be a valid SQL statement to retrieve data
+from the database.
+
+   The `ASSUMEDSTRWIDTH` subcommand specifies the maximum width of
+string variables read from the database.  If omitted, the default value
+is determined from the length of the string in the first value read for
+each variable.
+
+   The `UNENCRYPTED` subcommand allows data to be retrieved over an
+insecure connection.  If the connection is not encrypted, and the
+`UNENCRYPTED` subcommand is not given, then an error occurs.  Whether or
+not the connection is encrypted depends upon the underlying psql library
+and the capabilities of the database server.
+
+   The `BSIZE` subcommand serves only to optimise the speed of data
+transfer.  It specifies an upper limit on the number of cases to fetch from
+the database at once.  The default value is 4096.  If your SQL statement
+fetches a large number of cases but only a small number of variables,
+then the data transfer may be faster if you increase this value.
+Conversely, if the number of variables is large, or if the machine on
+which PSPP is running has only a small amount of memory, then a smaller
+value is probably better.
+
+### Example
+
+```
+GET DATA /TYPE=PSQL
+     /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
+     /SQL='select * from manufacturer'.
+```
+
+## Textual Data Files
+
+```
+GET DATA /TYPE=TXT
+        /FILE={'FILE_NAME',FILE_HANDLE}
+        [/ENCODING='ENCODING']
+        [/ARRANGEMENT={DELIMITED,FIXED}]
+        [/FIRSTCASE={FIRST_CASE}]
+        [/IMPORTCASES=...]
+        ...additional subcommands depending on ARRANGEMENT...
+```
+
+   When `TYPE=TXT` is specified, `GET DATA` reads data in a delimited
+or fixed columnar format, much like [`DATA
+LIST`](../../commands/data-io/data-list.md).
+
+   The `FILE` subcommand must specify the file to be read as a string
+file name or (for textual data only) a [file
+handle](../../language/files/file-handles.md).
+
+   The `ENCODING` subcommand specifies the character encoding of the
+file to be read.  See `INSERT` for information on supported
+encodings.
+
+   The `ARRANGEMENT` subcommand determines the file's basic format.
+`DELIMITED`, the default setting, specifies that fields in the input data
+are separated by spaces, tabs, or other user-specified delimiters.
+`FIXED` specifies that fields in the input data appear at particular fixed
+column positions within records of a case.
+
+   By default, cases are read from the input file starting from the
+first line.  To skip lines at the beginning of an input file, set
+`FIRSTCASE` to the number of the first line to read: 2 to skip the
+first line, 3 to skip the first two lines, and so on.
+
+   `IMPORTCASES` is ignored, for compatibility.  Use `N OF CASES` to
+limit the number of cases read from a file, or `SAMPLE` to obtain a
+random sample of cases.
+
+   The remaining subcommands apply only to one of the two file
+arrangements, described below.
+
+### Delimited Data
+
+```
+GET DATA /TYPE=TXT
+        /FILE={'FILE_NAME',FILE_HANDLE}
+        [/ARRANGEMENT={DELIMITED,FIXED}]
+        [/FIRSTCASE={FIRST_CASE}]
+        [/IMPORTCASES={ALL,FIRST MAX_CASES,PERCENT PERCENT}]
+
+        /DELIMITERS="DELIMITERS"
+        [/QUALIFIER="QUOTES"]
+        [/DELCASE={LINE,VARIABLES N_VARIABLES}]
+        /VARIABLES=DEL_VAR1 [DEL_VAR2]...
+where each DEL_VAR takes the form:
+        variable format
+```
+
+   The `GET DATA` command with `TYPE=TXT` and `ARRANGEMENT=DELIMITED`
+reads input data from text files in delimited format, where fields are
+separated by a set of user-specified delimiters.  Its capabilities are
+similar to those of [`DATA LIST
+FREE`](../../commands/data-io/data-list.md#data-list-free), with a few
+enhancements.
+
+   The required `FILE` subcommand and optional `FIRSTCASE` and
+`IMPORTCASES` subcommands are described
+[above](#textual-data-files).
+
+   `DELIMITERS`, which is required, specifies the set of characters that
+may separate fields.  Each character in the string specified on
+`DELIMITERS` separates one field from the next.  The end of a line also
+separates fields, regardless of `DELIMITERS`.  Two consecutive
+delimiters in the input yield an empty field, as does a delimiter at the
+end of a line.  A space character as a delimiter is an exception:
+consecutive spaces do not yield an empty field and neither does any
+number of spaces at the end of a line.
+
+   To use a tab as a delimiter, specify `\t` at the beginning of the
+`DELIMITERS` string.  To use a backslash as a delimiter, specify `\\` as
+the first delimiter or, if a tab should also be a delimiter, immediately
+following `\t`.  To read a data file in which each field appears on a
+separate line, specify the empty string for `DELIMITERS`.
+
+   The optional `QUALIFIER` subcommand names one or more characters that
+can be used to quote values within fields in the input.  A field that
+begins with one of the specified quote characters ends at the next
+matching quote.  Intervening delimiters become part of the field,
+instead of terminating it.  The ability to specify more than one quote
+character is a PSPP extension.
+
+   The character specified on `QUALIFIER` can be embedded within a field
+that it quotes by doubling the qualifier.  For example, if `'` is
+specified on `QUALIFIER`, then `'a''b'` specifies a field that contains
+`a'b`.
+
+   The `DELCASE` subcommand controls how data may be broken across
+lines in the data file.  With `LINE`, the default setting, each line
+must contain all the data for exactly one case.  For additional
+flexibility, to allow a single case to be split among lines or
+multiple cases to be contained on a single line, specify `VARIABLES
+n_variables`, where `n_variables` is the number of variables per case.
+
+   The `VARIABLES` subcommand is required and must be the last
+subcommand.  Specify the name of each variable and its [input
+format](../../language/datasets/formats/index.md), in the order they
+should be read from the input file.
+
+#### Example 1
+
+On a Unix-like system, the `/etc/passwd` file has a format similar to
+this:
+
+```
+root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
+blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
+john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
+jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
+```
+
+The following syntax reads a file in the format used by `/etc/passwd`:
+
+```
+GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
+        /VARIABLES=username A20
+                   password A40
+                   uid F10
+                   gid F10
+                   gecos A40
+                   home A40
+                   shell A40.
+```
+
+#### Example 2
+
+Consider the following data on used cars:
+
+```
+model   year    mileage price   type    age
+Civic   2002    29883   15900   Si      2
+Civic   2003    13415   15900   EX      1
+Civic   1992    107000  3800    n/a     12
+Accord  2002    26613   17900   EX      1
+```
+
+The following syntax can be used to read the used car data:
+
+```
+GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
+        /VARIABLES=model A8
+                   year F4
+                   mileage F6
+                   price F5
+                   type A4
+                   age F2.
+```
+
+#### Example 3
+
+Consider the following information on animals in a pet store:
+
+```
+'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
+, (Years), , , (Dollars), ,
+"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
+"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
+"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
+"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
+```
+
+The following syntax can be used to read the pet store data:
+
+```
+GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
+        /FIRSTCASE=3
+        /VARIABLES=name A10
+                   age F3.1
+                   color A5
+                   received EDATE10
+                   price F5.2
+                   height a5
+                   type a10.
+```
+
+### Fixed Columnar Data
+
+```
+GET DATA /TYPE=TXT
+        /FILE={'file_name',FILE_HANDLE}
+        [/ARRANGEMENT={DELIMITED,FIXED}]
+        [/FIRSTCASE={FIRST_CASE}]
+        [/IMPORTCASES={ALL,FIRST MAX_CASES,PERCENT PERCENT}]
+
+        [/FIXCASE=N]
+        /VARIABLES FIXED_VAR [FIXED_VAR]...
+            [/rec# FIXED_VAR [FIXED_VAR]...]...
+where each FIXED_VAR takes the form:
+        VARIABLE START-END FORMAT
+```
+
+   The `GET DATA` command with `TYPE=TXT` and `ARRANGEMENT=FIXED`
+reads input data from text files in fixed format, where each field is
+located in particular fixed column positions within records of a case.
+Its capabilities are similar to those of [`DATA LIST
+FIXED`](../../commands/data-io/data-list.md#data-list-fixed), with a
+few enhancements.
+
+   The required `FILE` subcommand and optional `FIRSTCASE` and
+`IMPORTCASES` subcommands are described [above](#textual-data-files).
+
+   The optional `FIXCASE` subcommand may be used to specify the positive
+integer number of input lines that make up each case.  The default value
+is 1.
+
+   The `VARIABLES` subcommand, which is required, specifies the
+positions at which each variable can be found.  For each variable,
+specify its name, followed by its start and end column separated by `-`
+(e.g. `0-9`), followed by an input format type (e.g. `F`) or a full
+format specification (e.g. `DOLLAR12.2`).  For this command, columns are
+numbered starting from 0 at the left column.  Introduce the variables in
+the second and later lines of a case by a slash followed by the number
+of the line within the case, e.g. `/2` for the second line.
+
+#### Example
+
+Consider the following data on used cars:
+
+```
+model   year    mileage price   type    age
+Civic   2002    29883   15900   Si      2
+Civic   2003    13415   15900   EX      1
+Civic   1992    107000  3800    n/a     12
+Accord  2002    26613   17900   EX      1
+```
+
+The following syntax can be used to read the used car data:
+
+```
+GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
+        /VARIABLES=model 0-7 A
+                   year 8-15 F
+                   mileage 16-23 F
+                   price 24-31 F
+                   type 32-39 A
+                   age 40-47 F.
+```
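The spreadsheet form of `GET DATA` documented above has no worked example, so here is a minimal sketch built only from the subcommands in its syntax diagram.  The file name `inventory.ods` and the choice of sheet and cell range are hypothetical.

```
GET DATA /TYPE=ODS
         /FILE='inventory.ods'
         /SHEET=INDEX 2
         /CELLRANGE=RANGE 'C3:P19'
         /READNAMES=ON.
```

This would read columns C-P and rows 3-19 of the second sheet.  Since `READNAMES=ON` is the default, the last line could be dropped without changing the result.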
diff --git a/rust/doc/src/commands/spss-io/get.md b/rust/doc/src/commands/spss-io/get.md

new file mode 100644 (file)

index 0000000..3777bcc
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/get.md
@@ -0,0 +1,55 @@
+# GET
+
+```
+GET
+        /FILE={'FILE_NAME',FILE_HANDLE}
+        /DROP=VAR_LIST
+        /KEEP=VAR_LIST
+        /RENAME=(SRC_NAMES=TARGET_NAMES)...
+        /ENCODING='ENCODING'
+```
+
+   `GET` clears the current dictionary and active dataset and replaces
+them with the dictionary and data from a specified file.
+
+   The `FILE` subcommand is the only required subcommand.  Specify the
+SPSS system file, SPSS/PC+ system file, or SPSS portable file to be
+read as a string file name or a [file
+handle](../../language/files/file-handles.md).
+
+   By default, all the variables in a file are read.  The `DROP`
+subcommand can be used to specify a list of variables that are not to
+be read.  By contrast, the `KEEP` subcommand can be used to specify
+variables that are to be read, with all other variables not read.
+
+   Normally variables in a file retain the names that they were saved
+under.  Use the `RENAME` subcommand to change these names.  Specify,
+within parentheses, a list of variable names followed by an equals sign
+(`=`) and the names that they should be renamed to.  Multiple
+parenthesized groups of variable names can be included on a single
+`RENAME` subcommand.  Variables' names may be swapped using a `RENAME`
+subcommand of the form `/RENAME=(A B=B A)`.
+
+   Alternate syntax for the `RENAME` subcommand allows the parentheses
+to be omitted.  When this is done, only a single variable may be
+renamed at once.  For instance, `/RENAME=A=B`.  This alternate syntax
+is discouraged.
+
+   `DROP`, `KEEP`, and `RENAME` are executed in left-to-right order.
+Each may be present any number of times.  `GET` never modifies a file on
+disk.  Only the active dataset read from the file is affected by these
+subcommands.
+
+   PSPP automatically detects the encoding of string data in the file,
+when possible.  The character encoding of old SPSS system files cannot
+always be guessed correctly, and SPSS/PC+ system files do not include
+any indication of their encoding.  Specify the `ENCODING` subcommand
+with an IANA character set name as its string argument to override the
+default.  Use `SYSFILE INFO` to analyze the encodings that might be
+valid for a system file.  The `ENCODING` subcommand is a PSPP extension.
+
+   `GET` does not cause the data to be read, only the dictionary.  The
+data is read later, when a procedure is executed.
+
+   Use of `GET` to read a portable file is a PSPP extension.
+
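As a usage sketch for the `GET` page added above: the following syntax reads a system file, drops one variable, and renames two others in a single parenthesized group.  The file name and all variable names are hypothetical.

```
GET
        /FILE='survey.sav'
        /DROP=notes
        /RENAME=(age income=age_years income_usd).
```

Because `DROP`, `KEEP`, and `RENAME` run in left-to-right order, `notes` is already gone by the time `RENAME` is processed, and the file on disk is never changed.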
diff --git a/rust/doc/src/commands/spss-io/import.md b/rust/doc/src/commands/spss-io/import.md

new file mode 100644 (file)

index 0000000..60365f4
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/import.md
@@ -0,0 +1,29 @@
+# IMPORT
+
+```
+IMPORT
+        /FILE='FILE_NAME'
+        /TYPE={COMM,TAPE}
+        /DROP=VAR_LIST
+        /KEEP=VAR_LIST
+        /RENAME=(SRC_NAMES=TARGET_NAMES)...
+```
+
+The `IMPORT` transformation clears the active dataset dictionary and
+data and replaces them with a dictionary and data from a system file or
+portable file.
+
+The `FILE` subcommand, which is the only required subcommand,
+specifies the portable file to be read as a file name string or a
+[file handle](../../language/files/file-handles.md).
+
+The `TYPE` subcommand is currently not used.
+
+`DROP`, `KEEP`, and `RENAME` follow the syntax used by
+[`GET`](get.md).
+
+`IMPORT` does not cause the data to be read, only the dictionary.
+The data is read later, when a procedure is executed.
+
+Use of `IMPORT` to read a system file is a PSPP extension.
+
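As a usage sketch for the `IMPORT` page added above: the following syntax reads a portable file and keeps only two of its variables.  The file name and variable names are hypothetical.

```
IMPORT
        /FILE='survey.por'
        /KEEP=id age.
```

As the page notes, only the dictionary is read at this point; the data is read when a later procedure runs.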
diff --git a/rust/doc/src/commands/spss-io/index.md b/rust/doc/src/commands/spss-io/index.md

new file mode 100644 (file)

index 0000000..a7817a6
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/index.md
@@ -0,0 +1,4 @@
+# Working with SPSS Data Files
+
+These commands read and write data files in SPSS and other proprietary
+or specialized data formats.
diff --git a/rust/doc/src/commands/spss-io/save-data-collection.md b/rust/doc/src/commands/spss-io/save-data-collection.md

new file mode 100644 (file)

index 0000000..02d1b9f
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/save-data-collection.md
@@ -0,0 +1,37 @@
+# SAVE DATA COLLECTION
+
+```
+SAVE DATA COLLECTION
+        /OUTFILE={'FILE_NAME',FILE_HANDLE}
+        /METADATA={'FILE_NAME',FILE_HANDLE}
+        /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
+        /PERMISSIONS={WRITEABLE,READONLY}
+        /DROP=VAR_LIST
+        /KEEP=VAR_LIST
+        /VERSION=VERSION
+        /RENAME=(SRC_NAMES=TARGET_NAMES)...
+        /NAMES
+        /MAP
+```
+
+Like `SAVE`, `SAVE DATA COLLECTION` writes the dictionary and data in
+the active dataset to a system file.  In addition, it writes metadata to
+an additional XML metadata file.
+
+`OUTFILE` is required.  Specify the system file to be written as a
+string file name or a [file
+handle](../../language/files/file-handles.md).
+
+`METADATA` is also required.  Specify the metadata file to be written
+as a string file name or a file handle.  Metadata files customarily use
+a `.mdd` extension.
+
+The current implementation of this command is experimental.  It only
+outputs an approximation of the metadata file format.  Please report
+bugs.
+
+Other subcommands are optional.  They have the same meanings as in
+the `SAVE` command.
+
+`SAVE DATA COLLECTION` causes the data to be read.  It is a procedure.
+
diff --git a/rust/doc/src/commands/spss-io/save-translate.md b/rust/doc/src/commands/spss-io/save-translate.md

new file mode 100644 (file)

index 0000000..350d4fc
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/save-translate.md
@@ -0,0 +1,127 @@
+# SAVE TRANSLATE
+
+```
+SAVE TRANSLATE
+        /OUTFILE={'FILE_NAME',FILE_HANDLE}
+        /TYPE={CSV,TAB}
+        [/REPLACE]
+        [/MISSING={IGNORE,RECODE}]
+
+        [/DROP=VAR_LIST]
+        [/KEEP=VAR_LIST]
+        [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+        [/UNSELECTED={RETAIN,DELETE}]
+        [/MAP]
+
+        ...additional subcommands depending on TYPE...
+```
+
+The `SAVE TRANSLATE` command is used to save data into various
+formats understood by other applications.
+
+The `OUTFILE` and `TYPE` subcommands are mandatory.  `OUTFILE`
+specifies the file to be written, as a string file name or a [file
+handle](../../language/files/file-handles.md).  `TYPE` determines the
+type of file to write.  It must be one of the following:
+
+* `CSV`  
+  Comma-separated value format.
+
+* `TAB`  
+  Tab-delimited format.
+
+By default, `SAVE TRANSLATE` does not overwrite an existing file.
+Use `REPLACE` to force an existing file to be overwritten.
+
+With `MISSING=IGNORE`, the default, `SAVE TRANSLATE` treats
+user-missing values as if they were not missing.  Specify
+`MISSING=RECODE` to output numeric user-missing values like
+system-missing values and string user-missing values as all spaces.
+
+By default, all the variables in the active dataset dictionary are
+saved to the output file, but `DROP` or `KEEP` can select a subset of
+variables to save.  The `RENAME` subcommand can also be used to change
+the names under which variables are saved; because they are used only
+in the output, these names do not have to conform to the usual PSPP
+variable naming rules.  `UNSELECTED` determines whether cases filtered
+out by the `FILTER` command are written to the output file.  These
+subcommands have the same syntax and meaning as on the
+[`SAVE`](save.md) command.
+
+Each supported file type has additional subcommands, explained in
+separate sections below.
+
+`SAVE TRANSLATE` causes the data to be read.  It is a procedure.
+
+## Comma- and Tab-Separated Data Files
+
+```
+SAVE TRANSLATE
+        /OUTFILE={'FILE_NAME',FILE_HANDLE}
+        /TYPE={CSV,TAB}
+        [/REPLACE]
+        [/MISSING={IGNORE,RECODE}]
+
+        [/DROP=VAR_LIST]
+        [/KEEP=VAR_LIST]
+        [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+        [/UNSELECTED={RETAIN,DELETE}]
+
+        [/FIELDNAMES]
+        [/CELLS={VALUES,LABELS}]
+        [/TEXTOPTIONS DELIMITER='DELIMITER']
+        [/TEXTOPTIONS QUALIFIER='QUALIFIER']
+        [/TEXTOPTIONS DECIMAL={DOT,COMMA}]
+        [/TEXTOPTIONS FORMAT={PLAIN,VARIABLE}]
+```
+
+The `SAVE TRANSLATE` command with `TYPE=CSV` or `TYPE=TAB` writes data in a
+comma- or tab-separated value format similar to that described by
+RFC 4180.  Each variable becomes one output column, and each case
+becomes one line of output.  If `FIELDNAMES` is specified, an additional
+line at the top of the output file lists variable names.
+
+The `CELLS` and `TEXTOPTIONS FORMAT` settings determine how values are
+written to the output file:
+
+* `CELLS=VALUES FORMAT=PLAIN` (the default settings)  
+  Writes variables to the output in "plain" formats that ignore the
+  details of variable formats.  Numeric values are written as plain
+  decimal numbers with enough digits to indicate their exact values
+  in machine representation.  Numeric values include `e` followed by
+  an exponent if the exponent value would be less than -4 or greater
+  than 16.  Dates are written in MM/DD/YYYY format and times in
+  HH:MM:SS format.  `WKDAY` and `MONTH` values are written as decimal
+  numbers.
+
+  Numeric values use, by default, the decimal point character set
+  with `SET DECIMAL`.  Use `DECIMAL=DOT` or
+  `DECIMAL=COMMA` to force a particular decimal point character.
+
+* `CELLS=VALUES FORMAT=VARIABLE`  
+  Writes variables using their print formats.  Leading and trailing
+  spaces are removed from numeric values, and trailing spaces are
+  removed from string values.
+
+* `CELLS=LABELS FORMAT=PLAIN`  
+  `CELLS=LABELS FORMAT=VARIABLE`  
+  Writes value labels where they exist, and otherwise writes the
+  values themselves as described above.
+
+   Regardless of `CELLS` and `TEXTOPTIONS FORMAT`, numeric system-missing
+values are output as a single space.
+
+   For `TYPE=TAB`, tab characters delimit values.  For `TYPE=CSV`, the
+`TEXTOPTIONS DELIMITER` and `DECIMAL` settings determine the character
+that separates values within a line.  If `DELIMITER` is specified, then
+the specified string separates values.  If `DELIMITER` is not
+specified, then the default is a comma with `DECIMAL=DOT` or a
+semicolon with `DECIMAL=COMMA`.  If `DECIMAL` is not given either, it
+is inferred from the decimal point character set with `SET
+DECIMAL`.
+
+The `TEXTOPTIONS QUALIFIER` setting specifies a character that is
+output before and after a value that contains the delimiter character or
+the qualifier character.  The default is a double quote (`"`).  A
+qualifier character that appears within a value is doubled.
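+
+As a sketch of how these settings combine, the following command
+(using a hypothetical output file name, and assuming the
+`SAVE TRANSLATE` subcommand syntax documented earlier on this page)
+writes a semicolon-delimited file with a header line of variable names:
+
+```
+SAVE TRANSLATE
+        /OUTFILE='survey.csv'
+        /TYPE=CSV
+        /FIELDNAMES
+        /TEXTOPTIONS DECIMAL=COMMA.
+```
+
+With the default qualifier, a string value such as `say "hi"; ok`,
+which contains both the delimiter and the qualifier, would be written
+as `"say ""hi""; ok"`.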
+
diff --git a/rust/doc/src/commands/spss-io/save.md b/rust/doc/src/commands/spss-io/save.md

new file mode 100644 (file)

index 0000000..4f426d9
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/save.md
@@ -0,0 +1,89 @@
+# SAVE
+
+```
+SAVE
+        /OUTFILE={'FILE_NAME',FILE_HANDLE}
+        /UNSELECTED={RETAIN,DELETE}
+        /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
+        /PERMISSIONS={WRITEABLE,READONLY}
+        /DROP=VAR_LIST
+        /KEEP=VAR_LIST
+        /VERSION=VERSION
+        /RENAME=(SRC_NAMES=TARGET_NAMES)...
+        /NAMES
+        /MAP
+```
+
+The `SAVE` procedure writes the dictionary and data in the active
+dataset to a system file.
+
+`OUTFILE` is the only required subcommand.  Specify the system file
+to be written as a string file name or a [file
+handle](../../language/files/file-handles.md).
+
+By default, cases excluded with `FILTER` are written to the system
+file.  These can be excluded by specifying `DELETE` on the `UNSELECTED`
+subcommand.  Specifying `RETAIN` makes the default explicit.
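+
+For example, assuming a hypothetical numeric variable `valid` and a
+hypothetical output file name, this sketch writes only the cases that
+pass the filter:
+
+```
+FILTER BY valid.
+SAVE /OUTFILE='valid-only.sav' /UNSELECTED=DELETE.
+```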
+
+The `UNCOMPRESSED`, `COMPRESSED`, and `ZCOMPRESSED` subcommands
+determine the system file's compression level:
+
+* `UNCOMPRESSED`  
+  Data is not compressed.  Each numeric value uses 8 bytes of disk
+  space.  Each string value uses one byte per column width, rounded
+  up to a multiple of 8 bytes.
+
+* `COMPRESSED`  
+  Data is compressed in a simple way.  Each integer numeric value
+  between −99 and 151, inclusive, or system missing value uses one
+  byte of disk space.  Each 8-byte segment of a string that consists
+  only of spaces uses 1 byte.  Any other numeric value or 8-byte
+  string segment uses 9 bytes of disk space.
+
+* `ZCOMPRESSED`  
+  Data is compressed with the "deflate" compression algorithm
+  specified in RFC 1951 (the same algorithm used by `gzip`).  Files
+  written with this compression level cannot be read by PSPP 0.8.1 or
+  earlier or by SPSS 20 or earlier.
+
+`COMPRESSED` is the default compression level.  The `SET` command can
+change this default.
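+
+As a rough worked example of the per-value rules above, consider a
+hypothetical case with three numeric values (5, 1000.5, and
+system-missing) and one string variable of width 16 containing `abc`.
+The sizes in bytes would be approximately:
+
+```
+                    UNCOMPRESSED    COMPRESSED
+5                        8               1
+1000.5                   8               9
+system-missing           8               1
+'abc' (width 16)        16           9 + 1
+                    ------------    ----------
+total                   40              21
+```
+
+The string occupies two 8-byte segments: the first holds `abc` plus
+trailing spaces and costs 9 bytes when compressed, and the second,
+which is all spaces, costs 1 byte.  These figures only apply the rules
+stated above; they ignore the dictionary and any other file overhead.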
+
+The `PERMISSIONS` subcommand specifies operating system permissions
+for the new system file.  `WRITEABLE`, the default, creates the file
+with read and write permission.  `READONLY` creates the file for
+read-only access.
+
+By default, all the variables in the active dataset dictionary are
+written to the system file.  The `DROP` subcommand specifies a list
+of variables not to be written.  In contrast, `KEEP` specifies the
+variables to be written; all other variables are omitted.
+
+Normally variables are saved to a system file under the same names
+they have in the active dataset.  Use the `RENAME` subcommand to change
+these names.  Specify, within parentheses, a list of variable names
+followed by an equals sign (`=`) and the names that they should be
+renamed to.  Multiple parenthesized groups of variable names can be
+included on a single `RENAME` subcommand.  Variables' names may be
+swapped using a `RENAME` subcommand of the form `/RENAME=(A B=B A)`.
+
+An alternate syntax for the `RENAME` subcommand allows the parentheses
+to be omitted.  In this form, only a single variable may be renamed
+at once, as in `/RENAME=A=B`.  This alternate syntax is
+discouraged.
+
+`DROP`, `KEEP`, and `RENAME` are performed in left-to-right order.
+They each may be present any number of times.  `SAVE` never modifies
+the active dataset.  `DROP`, `KEEP`, and `RENAME` only affect the
+system file written to disk.
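+
+As an illustration, with hypothetical variable and file names, the
+following sketch swaps the names of `a` and `b` and omits `tmp` from
+the system file, leaving the active dataset untouched:
+
+```
+SAVE /OUTFILE='survey.sav'
+     /RENAME=(a b=b a)
+     /DROP=tmp.
+```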
+
+The `VERSION` subcommand specifies the version of the file format.
+Valid versions are 2 and 3.  The default version is 3.  In version 2
+system files, variable names longer than 8 bytes are truncated.  The
+two versions are otherwise identical.
+
+The `NAMES` and `MAP` subcommands are currently ignored.
+
+`SAVE` causes the data to be read.  It is a procedure.
+
diff --git a/rust/doc/src/commands/spss-io/sysfile-info.md b/rust/doc/src/commands/spss-io/sysfile-info.md

new file mode 100644 (file)

index 0000000..5f837fa
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/sysfile-info.md
@@ -0,0 +1,24 @@
+# SYSFILE INFO
+
+```
+SYSFILE INFO FILE='FILE_NAME' [ENCODING='ENCODING'].
+```
+
+`SYSFILE INFO` reads the dictionary in an SPSS system file, SPSS/PC+
+system file, or SPSS portable file, and displays the information in
+its dictionary.
+
+Specify the file to read, as a file name or file handle, on `FILE`.
+
+PSPP automatically detects the encoding of string data in the file,
+when possible.  The character encoding of old SPSS system files cannot
+always be guessed correctly, and SPSS/PC+ system files do not include
+any indication of their encoding.  Specify the `ENCODING` subcommand
+with an IANA character set name as its string argument to override the
+default, or specify `ENCODING='DETECT'` to analyze and report possibly
+valid encodings for the system file.  The `ENCODING` subcommand is a
+PSPP extension.
+
+`SYSFILE INFO` does not affect the current active dataset.
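+
+For example, with a hypothetical file name, the first command below
+overrides the detected encoding and the second asks PSPP to report
+possibly valid encodings instead:
+
+```
+SYSFILE INFO FILE='old.sav' ENCODING='windows-1252'.
+SYSFILE INFO FILE='old.sav' ENCODING='DETECT'.
+```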
+
diff --git a/rust/doc/src/commands/spss-io/xexport.md b/rust/doc/src/commands/spss-io/xexport.md

new file mode 100644 (file)

index 0000000..d791aa0
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/xexport.md
@@ -0,0 +1,27 @@
+# XEXPORT
+
+```
+XEXPORT
+        /OUTFILE='FILE_NAME'
+        /DIGITS=N
+        /DROP=VAR_LIST
+        /KEEP=VAR_LIST
+        /RENAME=(SRC_NAMES=TARGET_NAMES)...
+        /TYPE={COMM,TAPE}
+        /MAP
+```
+
+The `XEXPORT` transformation writes the active dataset dictionary and
+data to a specified portable file.
+
+This transformation is a PSPP extension.
+
+It is similar to the `EXPORT` procedure, with two differences:
+
+- `XEXPORT` is a transformation, not a procedure.  It is executed when
+  the data is read by a procedure or procedure-like command.
+
+- `XEXPORT` does not support the `UNSELECTED` subcommand.
+
+See [`EXPORT`](export.md) for more information.
+
diff --git a/rust/doc/src/commands/spss-io/xsave.md b/rust/doc/src/commands/spss-io/xsave.md

new file mode 100644 (file)

index 0000000..0462500
--- /dev/null
+++ b/rust/doc/src/commands/spss-io/xsave.md
@@ -0,0 +1,26 @@
+# XSAVE
+
+```
+XSAVE
+        /OUTFILE='FILE_NAME'
+        /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
+        /PERMISSIONS={WRITEABLE,READONLY}
+        /DROP=VAR_LIST
+        /KEEP=VAR_LIST
+        /VERSION=VERSION
+        /RENAME=(SRC_NAMES=TARGET_NAMES)...
+        /NAMES
+        /MAP
+```
+
+The `XSAVE` transformation writes the active dataset's dictionary and
+data to a system file.  It is similar to the `SAVE` procedure, with
+two differences:
+
+- `XSAVE` is a transformation, not a procedure.  It is executed when
+  the data is read by a procedure or procedure-like command.
+
+- `XSAVE` does not support the `UNSELECTED` subcommand.
+
+See [`SAVE`](save.md) for more information.
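+
+As a sketch of the first difference, using hypothetical variable and
+file names, the `XSAVE` below writes nothing by itself; the system
+file is written when `DESCRIPTIVES` reads the data:
+
+```
+XSAVE /OUTFILE='snapshot.sav'.
+DESCRIPTIVES /VARIABLES=x.
+```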
+