[Introduction](introduction.md)
[License](license.md)
-# Language Syntax
+# Language Overview
- [Basics](language/basics/index.md)
- [Tokens](language/basics/tokens.md)
- [Miscellaneous Functions](language/expressions/functions/miscellaneous.md)
- [Statistical Distribution Functions](language/expressions/functions/statistical-distributions.md)
+# Command Syntax
+
+- [Data Input and Output](commands/data-io/index.md)
+ - [BEGIN DATA](commands/data-io/begin-data.md)
+ - [CLOSE FILE HANDLE](commands/data-io/close-file-handle.md)
+ - [DATAFILE ATTRIBUTE](commands/data-io/datafile-attribute.md)
+ - [DATASET commands](commands/data-io/dataset.md)
+ - [DATA LIST](commands/data-io/data-list.md)
+ - [END CASE](commands/data-io/end-case.md)
+ - [END FILE](commands/data-io/end-file.md)
+ - [FILE HANDLE](commands/data-io/file-handle.md)
+ - [INPUT PROGRAM](commands/data-io/input-program.md)
+ - [LIST](commands/data-io/list.md)
+ - [NEW FILE](commands/data-io/new-file.md)
+ - [PRINT](commands/data-io/print.md)
+ - [PRINT EJECT](commands/data-io/print-eject.md)
+ - [PRINT SPACE](commands/data-io/print-space.md)
+ - [REREAD](commands/data-io/reread.md)
+ - [REPEATING DATA](commands/data-io/repeating-data.md)
+ - [WRITE](commands/data-io/write.md)
+
# Developer Documentation
- [System File Format](system-file/index.md)
--- /dev/null
+# BEGIN DATA
+
+```
+BEGIN DATA.
+...
+END DATA.
+```
+
+`BEGIN DATA` and `END DATA` can be used to embed raw ASCII data in a
+PSPP syntax file. [`DATA LIST`](data-list.md) or another input command
+must be used before `BEGIN DATA`. `BEGIN DATA` and `END DATA`
+must be used together. `END DATA` must appear by itself on a single
+line, with no leading white space and exactly one space between the
+words `END` and `DATA`.
--- /dev/null
+# CLOSE FILE HANDLE
+
+```
+CLOSE FILE HANDLE HANDLE_NAME.
+```
+
+`CLOSE FILE HANDLE` disassociates the name of a [file
+handle](../../language/files/file-handles.md) from a given file. The
+only specification is the name of the handle to close. Afterward, the
+handle name may be reused in a new `FILE HANDLE` command.
+
+The file handle named `INLINE`, which represents data entered between `BEGIN
+DATA` and `END DATA`, cannot be closed. Attempts to close it with
+`CLOSE FILE HANDLE` have no effect.
+
+`CLOSE FILE HANDLE` is a PSPP extension.
+
--- /dev/null
+# DATA LIST
+
+Used to read text or binary data, `DATA LIST` is the most fundamental
+data-reading command. Even the more sophisticated input methods use
+`DATA LIST` commands as a building block. Understanding `DATA LIST` is
+important to understanding how to use PSPP to read your data files.
+
+ There are two major variants of `DATA LIST`, which are fixed format
+and free format. In addition, free format has a minor variant, list
+format, which is discussed in terms of its differences from vanilla free
+format.
+
+ Each form of `DATA LIST` is described in detail below.
+
+ See `GET DATA` for a command that offers a few enhancements over
+`DATA LIST` and that may be substituted for `DATA LIST` in many situations.
+
+## DATA LIST FIXED
+
+```
+DATA LIST [FIXED]
+ {TABLE,NOTABLE}
+ [FILE='FILE_NAME' [ENCODING='ENCODING']]
+ [RECORDS=RECORD_COUNT]
+ [END=END_VAR]
+ [SKIP=RECORD_COUNT]
+ /[LINE_NO] VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+```
+
+ `DATA LIST FIXED` is used to read data files that have values at
+fixed positions on each line of single-line or multiline records. The
+keyword `FIXED` is optional.
+
+ The `FILE` subcommand must be used if input is to be taken from an
+external file. It may be used to specify a file name as a string or a
+[file handle](../../language/files/file-handles.md). If the `FILE`
+subcommand is not used, then input is assumed to be specified within
+the command file using [`BEGIN DATA`...`END DATA`](begin-data.md).
+The `ENCODING` subcommand may only be used if the `FILE` subcommand is
+also used. It specifies the character encoding of the file. See
+`INSERT` for information on supported encodings.
+
+ The optional `RECORDS` subcommand, which takes a single integer as an
+argument, is used to specify the number of lines per record. If
+`RECORDS` is not specified, then the number of lines per record is
+calculated from the list of variable specifications later in `DATA
+LIST`.
+
+ The `END` subcommand is only useful in conjunction with `INPUT
+PROGRAM`. See [INPUT PROGRAM](input-program.md) for details.
+
+ The optional `SKIP` subcommand specifies a number of records to skip
+at the beginning of an input file. It can be used to skip over a row
+that contains variable names, for example.
+
+ `DATA LIST` can optionally output a table describing how the data
+file is read. The `TABLE` subcommand enables this output, and `NOTABLE`
+disables it. The default is to output the table.
+
+ The list of variables to be read from the data list must come last.
+Each line in the data record is introduced by a slash (`/`).
+Optionally, a line number may follow the slash. After that, any number
+of variable specifications may be present.
+
+ Each variable specification consists of a list of variable names
+followed by a description of their location on the input line. [Sets
+of variables](../../language/datasets/variable-lists.html) may be
+specified with `TO`, e.g. `VAR1 TO VAR5`. There are two ways to specify
+the location of the variable on the line: columnar style and FORTRAN
+style.
+
+ In columnar style, the starting column and ending column for the
+field are specified after the variable name, separated by a dash
+(`-`). For instance, the third through fifth columns on a line would
+be specified `3-5`. By default, variables are considered to be in
+[`F` format](../../language/datasets/formats/basic.html). (This
+default can be changed; see `SET` for more information.)
+
+ In columnar style, to use a variable format other than the default,
+specify the format type in parentheses after the column numbers. For
+instance, for alphanumeric `A` format, use `(A)`.
+
+ In addition, implied decimal places can be specified in parentheses
+after the column numbers. As an example, suppose that a data file has a
+field in which the characters `1234` should be interpreted as having the
+value 12.34. Then this field has two implied decimal places, and the
+corresponding specification would be `(2)`. If a field that has implied
+decimal places contains a decimal point, then the implied decimal places
+are not applied.
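+
+The rule for implied decimal places can be sketched in Python. This is
+an illustration of the behavior described above, not PSPP's parser. The
+function name `apply_implied_decimals` is hypothetical, and the sketch
+assumes the field holds a plain decimal number.
+
+```python
+def apply_implied_decimals(field, decimals):
+    """Read a numeric field that has `decimals` implied decimal places.
+
+    Hypothetical helper: a field that already contains a decimal point
+    is read as-is, so the implied decimal places are not applied.
+    """
+    text = field.strip()
+    if "." in text:
+        return float(text)
+    return int(text) / 10 ** decimals
+
+
+print(apply_implied_decimals("1234", 2))  # 12.34
+print(apply_implied_decimals("12.5", 2))  # 12.5
+```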
+
+ Changing the variable format and adding implied decimal places can be
+done together; for instance, `(N,5)`.
+
+ When using columnar style, the input and output width of each
+variable is computed from the field width. The field width must be
+evenly divisible by the number of variables specified.
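+
+The even split of one column range among several variables can be
+sketched in Python. The helper `column_layout` is hypothetical. It
+enforces the divisibility rule stated above, and its sample input is
+the `INFO1 TO INFO3 12-17` specification from Example 1.
+
+```python
+def column_layout(names, start, end):
+    """Split columns start..end (1-based, inclusive) evenly among names.
+
+    Returns {name: (first_column, last_column)} and raises ValueError
+    when the field width is not evenly divisible by len(names).
+    """
+    width = end - start + 1
+    if width % len(names) != 0:
+        raise ValueError(f"width {width} not divisible by {len(names)} variables")
+    each = width // len(names)
+    return {
+        name: (start + i * each, start + (i + 1) * each - 1)
+        for i, name in enumerate(names)
+    }
+
+
+print(column_layout(["INFO1", "INFO2", "INFO3"], 12, 17))
+# {'INFO1': (12, 13), 'INFO2': (14, 15), 'INFO3': (16, 17)}
+```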
+
+ FORTRAN style is an altogether different approach to specifying field
+locations. With this approach, a list of variable input format
+specifications, separated by commas, is placed after the variable names
+inside parentheses. Each format specifier advances as many characters
+into the input line as it uses.
+
+ Implied decimal places also exist in FORTRAN style. A format
+specification with `D` decimal places also has `D` implied decimal places.
+
+ In addition to the [standard
+ formats](../../language/datasets/formats/index.html), FORTRAN style
+ defines some extensions:
+
+* `X`
+ Advance the current column on this line by one character position.
+
+* `T<X>`
+ Set the current column on this line to column `<X>`, with column
+ numbers considered to begin with 1 at the left margin.
+
+* `NEWREC<X>`
+ Skip forward `<X>` lines in the current record, resetting the active
+ column to the left margin.
+
+* Repeat count
+ Any format specifier may be preceded by a number. This causes the
+ action of that format specifier to be repeated the specified number
+ of times.
+
+* `(SPEC1, ..., SPECN)`
+ Use `()` to group specifiers together. This is most useful when
+ preceded by a repeat count. Groups may be nested.
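+
+The column bookkeeping for `X`, `T<X>`, repeat counts, and ordinary
+formats can be sketched as a cursor that each specifier moves. The
+function `fortran_fields` and its token syntax (for example `"2F3"` for
+a repeat count of 2 applied to `F3`) are hypothetical. Grouping with
+parentheses and `NEWREC` are not handled in this sketch.
+
+```python
+import re
+
+# Hypothetical token syntax: optional repeat count, then X, T<column>,
+# or a format name with a width (for example F3, A10, F8.2).
+TOKEN = re.compile(r"^(\d*)([A-Z]+)(\d*)(?:\.\d+)?$")
+
+
+def fortran_fields(specs):
+    """Return the (first, last) 1-based columns read by each format."""
+    column = 1
+    fields = []
+    for spec in specs:
+        match = TOKEN.match(spec)
+        if match is None:
+            raise ValueError(f"unsupported specifier {spec!r}")
+        count = int(match.group(1) or 1)
+        name, number = match.group(2), match.group(3)
+        for _ in range(count):
+            if name == "X":
+                column += 1              # advance one character position
+            elif name == "T":
+                column = int(number)     # jump to an absolute column
+            elif name == "NEWREC" or not number:
+                raise ValueError(f"unsupported specifier {spec!r}")
+            else:
+                width = int(number)
+                fields.append((column, column + width - 1))
+                column += width          # a format consumes its width
+    return fields
+
+
+print(fortran_fields(["2F3", "X", "A5"]))  # [(1, 3), (4, 6), (8, 12)]
+```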
+
+ FORTRAN and columnar styles may be freely intermixed. Columnar style
+leaves the active column immediately after the ending column specified.
+Record motion using `NEWREC` in FORTRAN style also applies to later
+FORTRAN and columnar specifiers.
+
+### Example 1
+
+```
+DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
+
+BEGIN DATA.
+John Smith 102311
+Bob Arnold 122015
+Bill Yates  918 6
+END DATA.
+```
+
+Defines the following variables:
+
+ - `NAME`, a 10-character-wide string variable, in columns 1
+ through 10.
+
+ - `INFO1`, a numeric variable, in columns 12 through 13.
+
+ - `INFO2`, a numeric variable, in columns 14 through 15.
+
+ - `INFO3`, a numeric variable, in columns 16 through 17.
+
+The `BEGIN DATA`/`END DATA` commands cause three cases to be
+defined. Because `(1)` gives `INFO1` through `INFO3` one implied
+decimal place each, a field such as `10` is read as 1.0:
+
+|Case |NAME |INFO1 |INFO2 |INFO3|
+|------:|:------------|-------:|-------:|----:|
+| 1 |John Smith | 1.0 | 2.3 | 1.1|
+| 2 |Bob Arnold | 1.2 | 2.0 | 1.5|
+| 3 |Bill Yates | 0.9 | 1.8 | 0.6|
+
+The `TABLE` keyword causes PSPP to print out a table describing the
+four variables defined.
+
+### Example 2
+
+```
+DATA LIST FILE="survey.dat"
+ /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
+ /Q01 TO Q50 7-56
+ /.
+```
+
+Defines the following variables:
+
+ - `ID`, a numeric variable, in columns 1-5 of the first record.
+
+ - `NAME`, a 30-character string variable, in columns 7-36 of the
+ first record.
+
+ - `SURNAME`, a 30-character string variable, in columns 38-67 of
+ the first record.
+
+ - `MINITIAL`, a 1-character string variable, in column 69 of the
+ first record.
+
+ - Fifty variables `Q01`, `Q02`, `Q03`, ..., `Q49`, `Q50`, all
+ numeric, `Q01` in column 7, `Q02` in column 8, ..., `Q49` in
+ column 55, `Q50` in column 56, all in the second record.
+
+Cases are separated by a blank record.
+
+Data is read from file `survey.dat` in the current directory.
+
+## DATA LIST FREE
+
+```
+DATA LIST FREE
+ [({TAB,'C'}, ...)]
+ [{NOTABLE,TABLE}]
+ [FILE='FILE_NAME' [ENCODING='ENCODING']]
+ [SKIP=N_RECORDS]
+ /VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST [(TYPE_SPEC)]
+ VAR_LIST *
+```
+
+ In free format, the input data is, by default, structured as a series
+of fields separated by spaces, tabs, or line breaks. If the current
+`DECIMAL` separator is `DOT` (see `SET`), then commas are also treated
+as field separators. Each field's content may be unquoted, or it may be
+quoted with a pair of apostrophes (`'`) or double quotes (`"`).
+Unquoted white space separates fields but is not part of any field. Any
+mix of spaces, tabs, and line breaks is equivalent to a single space for
+the purpose of separating fields, but consecutive commas will skip a
+field.
+
+ Alternatively, delimiters can be specified explicitly, as a
+parenthesized, comma-separated list of single-character strings
+immediately following `FREE`. The word `TAB` may also be used to
+specify a tab character as a delimiter. When delimiters are specified
+explicitly, only the given characters, plus line breaks, separate
+fields. Furthermore, leading spaces at the beginnings of fields are
+not trimmed, consecutive delimiters define empty fields, and no form
+of quoting is allowed.
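+
+The explicit-delimiter rules can be sketched in Python. `split_explicit`
+is a hypothetical helper, and `TAB` is modeled by including the tab
+character in `delimiters`. The sketch treats every line break as the
+end of a field, so a blank line yields one empty field. PSPP may handle
+such edge cases differently.
+
+```python
+def split_explicit(text, delimiters):
+    """Split free-format input when delimiters are given explicitly.
+
+    `delimiters` is a string of single characters. Line breaks also end
+    a field. Leading spaces are kept, consecutive delimiters yield empty
+    fields, and quote characters have no special meaning.
+    """
+    fields = []
+    for line in text.splitlines():
+        current = ""
+        for ch in line:
+            if ch in delimiters:
+                fields.append(current)
+                current = ""
+            else:
+                current += ch
+        fields.append(current)
+    return fields
+
+
+print(split_explicit("a,,b\n c,'d,e'", ","))
+# ['a', '', 'b', ' c', "'d", "e'"]
+```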
+
+ The `NOTABLE` and `TABLE` subcommands are as in `DATA LIST FIXED`
+above. `NOTABLE` is the default.
+
+ The `FILE`, `SKIP`, and `ENCODING` subcommands are as in `DATA LIST
+FIXED` above.
+
+ The variables to be parsed are given as a single list of variable
+names. This list must be introduced by a single slash (`/`). The set
+of variable names may contain [format
+specifications](../../language/datasets/formats/index.html) in
+parentheses. Format specifications apply to all variables back to the
+previous parenthesized format specification.
+
+ An asterisk on its own has the same effect as `(F8.0)`, assigning
+the variables preceding it input/output format `F8.0`.
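+
+The rule that a parenthesized format applies back to the previous one
+can be sketched as a single pass over the variable list. The token list
+and the helper `assign_formats` are hypothetical. Names with no format
+after them receive `default`, which is an assumption of this sketch.
+
+```python
+def assign_formats(tokens, default="F8.0"):
+    """Map variable names to formats for a DATA LIST FREE variable list.
+
+    tokens mixes names with "(FORMAT)" entries and "*" markers. A format
+    applies to every name since the previous format, and "*" means F8.0.
+    Trailing names get `default` (an assumption, not documented behavior).
+    """
+    formats, pending = {}, []
+    for token in tokens:
+        if token == "*" or token.startswith("("):
+            fmt = "F8.0" if token == "*" else token.strip("()")
+            for name in pending:
+                formats[name] = fmt
+            pending = []
+        else:
+            pending.append(token)
+    for name in pending:
+        formats[name] = default
+    return formats
+
+
+print(assign_formats(["A", "B", "(A5)", "C", "*", "D"]))
+# {'A': 'A5', 'B': 'A5', 'C': 'F8.0', 'D': 'F8.0'}
+```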
+
+ Specified field widths are ignored on input, although all normal
+limits on field width still apply; they are honored on output.
+
+## DATA LIST LIST
+
+```
+DATA LIST LIST
+ [({TAB,'C'}, ...)]
+ [{NOTABLE,TABLE}]
+ [FILE='FILE_NAME' [ENCODING='ENCODING']]
+ [SKIP=RECORD_COUNT]
+ /VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST [(TYPE_SPEC)]
+ VAR_LIST *
+```
+
+ With one exception, `DATA LIST LIST` is syntactically and
+semantically equivalent to `DATA LIST FREE`. The exception is that each
+input line is expected to correspond to exactly one input record. If
+more or fewer fields are found on an input line than expected, an
+appropriate diagnostic is issued.
+
--- /dev/null
+# DATAFILE ATTRIBUTE
+
+```
+DATAFILE ATTRIBUTE
+ ATTRIBUTE=NAME('VALUE') [NAME('VALUE')]...
+ ATTRIBUTE=NAME[INDEX]('VALUE') [NAME[INDEX]('VALUE')]...
+ DELETE=NAME [NAME]...
+ DELETE=NAME[INDEX] [NAME[INDEX]]...
+```
+
+ `DATAFILE ATTRIBUTE` adds, modifies, or removes user-defined
+attributes associated with the active dataset. Custom data file
+attributes are not interpreted by PSPP, but they are saved as part of
+system files and may be used by other software that reads them.
+
+ Use the `ATTRIBUTE` subcommand to add or modify a custom data file
+attribute. Specify the name of the attribute, followed by the desired
+value, in parentheses, as a quoted string. Attribute names that begin
+with `$` are reserved for PSPP's internal use, and attribute names
+that begin with `@` or `$@` are not displayed by most PSPP commands
+that display other attributes. Other attribute names are not treated
+specially.
+
+ Attributes may also be organized into arrays. To assign to an array
+element, add an integer array index enclosed in square brackets (`[` and
+`]`) between the attribute name and value. Array indexes start at 1,
+not 0. An attribute array that has a single element (number 1) is not
+distinguished from a non-array attribute.
+
+ Use the `DELETE` subcommand to delete an attribute. Specify an
+attribute name by itself to delete an entire attribute, including all
+array elements for attribute arrays. Specify an attribute name followed
+by an array index in square brackets to delete a single element of an
+attribute array. In the latter case, all the array elements numbered
+higher than the deleted element are shifted down, filling the vacated
+position.
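+
+The array rules above can be modeled with a dictionary of lists: indexes
+are 1-based, deleting an element shifts later ones down, and a
+one-element array looks the same as a plain attribute. `AttributeSet` is
+a hypothetical class for illustration. It assumes that a plain
+`NAME('VALUE')` replaces the whole attribute, and it refuses to create
+gaps in an array, which PSPP may handle differently.
+
+```python
+class AttributeSet:
+    """Hypothetical model: element i of each list holds array index i + 1."""
+
+    def __init__(self):
+        self.values = {}  # attribute name -> list of string values
+
+    def set(self, name, value, index=None):
+        if index is None:
+            # Assumption: a plain NAME('VALUE') replaces the whole attribute.
+            self.values[name] = [value]
+            return
+        elements = self.values.get(name, [])
+        if not 1 <= index <= len(elements) + 1:
+            # Simplification: this sketch does not create gaps in an array.
+            raise IndexError(f"cannot assign {name}[{index}]")
+        if index == len(elements) + 1:
+            elements.append(value)
+        else:
+            elements[index - 1] = value
+        self.values[name] = elements
+
+    def delete(self, name, index=None):
+        if index is None:
+            self.values.pop(name, None)  # whole attribute, all elements
+            return
+        elements = self.values[name]
+        del elements[index - 1]  # higher-numbered elements shift down
+        if not elements:
+            del self.values[name]
+```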
+
+ To associate custom attributes with particular variables, instead of
+with the entire active dataset, use `VARIABLE ATTRIBUTE` instead.
+
+ `DATAFILE ATTRIBUTE` takes effect immediately. It is not affected by
+conditional and looping structures such as `DO IF` or `LOOP`.
+
--- /dev/null
+# DATASET commands
+
+```
+DATASET NAME NAME [WINDOW={ASIS,FRONT}].
+DATASET ACTIVATE NAME [WINDOW={ASIS,FRONT}].
+DATASET COPY NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}].
+DATASET DECLARE NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}].
+DATASET CLOSE {NAME,*,ALL}.
+DATASET DISPLAY.
+```
+
+ The `DATASET` commands simplify use of multiple datasets within a
+PSPP session. They allow datasets to be created and destroyed. At any
+given time, most PSPP commands work with a single dataset, called the
+active dataset.
+
+ The `DATASET NAME` command gives the active dataset the specified name,
+or if it already had a name, it renames it. If another dataset already
+had the given name, that dataset is deleted.
+
+ The `DATASET ACTIVATE` command selects the named dataset, which must
+already exist, as the active dataset. Before switching the active
+dataset, any pending transformations are executed, as if `EXECUTE` had
+been specified. If the active dataset is unnamed before switching, then
+it is deleted and becomes unavailable after switching.
+
+ The `DATASET COPY` command creates a new dataset with the specified
+name, whose contents are a copy of the active dataset. Any pending
+transformations are executed, as if `EXECUTE` had been specified, before
+making the copy. If a dataset with the given name already exists, it is
+replaced. If the name is the name of the active dataset, then the
+active dataset becomes unnamed.
+
+ The `DATASET DECLARE` command creates a new dataset that is initially
+"empty," that is, it has no dictionary or data. If a dataset with the
+given name already exists, this has no effect. The new dataset can be
+used with commands that support output to a dataset, e.g. `AGGREGATE`.
+
+ The `DATASET CLOSE` command deletes a dataset. If the active dataset
+is specified by name, or if `*` is specified, then the active dataset
+becomes unnamed. If a different dataset is specified by name, then it
+is deleted and becomes unavailable. Specifying `ALL` deletes all datasets
+except for the active dataset, which becomes unnamed.
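+
+The naming rules of `DATASET NAME`, `ACTIVATE`, `COPY`, `DECLARE`, and
+`CLOSE` can be summarized as a small state model. `DatasetSession` is a
+hypothetical Python class, not part of PSPP. Dataset contents are plain
+lists, and pending transformations and `WINDOW` are ignored.
+
+```python
+class DatasetSession:
+    """Hypothetical model of the DATASET naming rules."""
+
+    def __init__(self, contents=None):
+        self.active = list(contents or [])  # the active dataset
+        self.active_name = None             # None: the active dataset is unnamed
+        self.others = {}                    # name -> non-active datasets
+
+    def name(self, new_name):
+        self.others.pop(new_name, None)     # a dataset already using the name is deleted
+        self.active_name = new_name
+
+    def activate(self, name):
+        if name == self.active_name:
+            return
+        target = self.others.pop(name)      # KeyError if it does not exist
+        if self.active_name is not None:
+            self.others[self.active_name] = self.active
+        # An unnamed active dataset is simply discarded.
+        self.active, self.active_name = target, name
+
+    def copy(self, name):
+        if name == self.active_name:
+            self.active_name = None         # the copy takes over the name
+        self.others[name] = list(self.active)
+
+    def declare(self, name):
+        if name != self.active_name:
+            self.others.setdefault(name, [])  # "empty": no dictionary or data
+
+    def close(self, name):
+        if name == "ALL":
+            self.others.clear()
+            self.active_name = None
+        elif name in ("*", self.active_name):
+            self.active_name = None         # the active dataset stays, unnamed
+        else:
+            del self.others[name]
+
+
+session = DatasetSession([1, 2, 3])
+session.name("survey")
+session.copy("backup")
+session.activate("backup")
+print(session.active_name, sorted(session.others))  # backup ['survey']
+```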
+
+ The `DATASET DISPLAY` command lists all the currently defined datasets.
+
+ Many `DATASET` commands accept an optional `WINDOW` subcommand. In the
+PSPPIRE GUI, the value given for this subcommand influences how the
+dataset's window is displayed. Outside the GUI, the `WINDOW` subcommand
+has no effect. The valid values are:
+
+* `ASIS`
+ Do not change how the window is displayed. This is the default for
+ `DATASET NAME` and `DATASET ACTIVATE`.
+
+* `FRONT`
+ Raise the dataset's window to the top. Make it the default dataset
+ for running syntax.
+
+* `MINIMIZED`
+ Display the window "minimized" to an icon. Prefer other datasets
+ for running syntax. This is the default for `DATASET COPY` and
+ `DATASET DECLARE`.
+
+* `HIDDEN`
+ Hide the dataset's window. Prefer other datasets for running
+ syntax.
--- /dev/null
+# END CASE
+
+```
+END CASE.
+```
+
+`END CASE` is used only within `INPUT PROGRAM` to output the current
+case. See [INPUT PROGRAM](input-program.md) for details.
+
--- /dev/null
+# END FILE
+
+```
+END FILE.
+```
+
+`END FILE` is used only within `INPUT PROGRAM` to terminate the
+current input program. See [INPUT PROGRAM](input-program.md).
+
--- /dev/null
+# FILE HANDLE
+
+## Syntax Overview
+
+For text files:
+
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ [/MODE=CHARACTER]
+ [/ENDS={CR,CRLF}]
+ [/TABWIDTH=TAB_WIDTH]
+ [ENCODING='ENCODING']
+```
+
+For binary files in native encoding with fixed-length records:
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ /MODE=IMAGE
+ [/LRECL=REC_LEN]
+ [ENCODING='ENCODING']
+```
+
+For binary files in native encoding with variable-length records:
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ /MODE=BINARY
+ [/LRECL=REC_LEN]
+ [ENCODING='ENCODING']
+```
+
+For binary files encoded in EBCDIC:
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ /MODE=360
+ /RECFORM={FIXED,VARIABLE,SPANNED}
+ [/LRECL=REC_LEN]
+ [ENCODING='ENCODING']
+```
+
+## Details
+
+ Use `FILE HANDLE` to associate a file handle name with a file and its
+attributes, so that later commands can refer to the file by its handle
+name. Names of text files can be specified directly on commands that
+access files, so that `FILE HANDLE` is only needed when a file is not an
+ordinary file containing lines of text. However, `FILE HANDLE` may be
+used even for text files, and it may be easier to specify a file's name
+once and later refer to it by an abstract handle.
+
+Specify the file handle name as the identifier immediately following
+the `FILE HANDLE` command name. The identifier `INLINE` is reserved
+for representing data embedded in the syntax file (see [BEGIN
+DATA](begin-data.md)). The file handle name must not already have been
+used in a previous invocation of `FILE HANDLE`, unless it has been
+closed with [`CLOSE FILE HANDLE`](close-file-handle.md).
+
+The effect and syntax of `FILE HANDLE` depend on the selected `MODE`:
+
+ - In `CHARACTER` mode, the default, the data file is read as a text
+ file. Each text line is read as one record.
+
+ In `CHARACTER` mode only, tabs are expanded to spaces by input
+ programs, except by `DATA LIST FREE` with explicitly specified
+ delimiters. Each tab is 4 characters wide by default, but `TABWIDTH`
+ (a PSPP extension) may be used to specify an alternate width. Use
+ a `TABWIDTH` of 0 to suppress tab expansion.
+
+ A file written in `CHARACTER` mode by default uses the line ends of
+ the system on which PSPP is running, that is, on Windows, the
+ default is CR LF line ends, and on other systems the default is LF
+ only. Specify `ENDS` as `CR` or `CRLF` to override the default. PSPP
+ reads files using either convention on any kind of system,
+ regardless of `ENDS`.
+
+ - In `IMAGE` mode, the data file is treated as a series of fixed-length
+ binary records. `LRECL` should be used to specify the record length
+ in bytes, with a default of 1024. On input, it is an error if an
+ `IMAGE` file's length is not an integer multiple of the record length.
+ On output, each record is padded with spaces or truncated, if
+ necessary, to make it exactly the correct length.
+
+ - In `BINARY` mode, the data file is treated as a series of
+ variable-length binary records. `LRECL` may be specified, but
+ its value is ignored. The data for each record is both preceded
+ and followed by a 32-bit signed integer in little-endian byte
+ order that specifies the length of the record. (This redundancy
+ permits records in these files to be efficiently read in reverse
+ order, although PSPP always reads them in forward order.) The
+ length does not include either integer.
+
+ - Mode `360` reads and writes files in formats first used for tapes
+ in the 1960s on IBM mainframe operating systems and still
+ supported today by the modern successors of those operating
+ systems. For more information, see `OS/400 Tape and Diskette
+ Device Programming`, available on IBM's website.
+
+ Alphanumeric data in mode `360` files are encoded in EBCDIC. PSPP
+ translates EBCDIC to or from the host's native format as necessary
+ on input or output, using an ASCII/EBCDIC translation that is
+ one-to-one, so that a "round trip" from ASCII to EBCDIC back to
+ ASCII, or vice versa, always yields exactly the original data.
+
+ The `RECFORM` subcommand is required in mode `360`. The precise
+ file format depends on its setting:
+
+ * `F`
+ `FIXED`
+ This record format is equivalent to `IMAGE` mode, except for
+ EBCDIC translation.
+
+ IBM documentation calls this `*F` (fixed-length, deblocked)
+ format.
+
+ * `V`
+ `VARIABLE`
+ The file comprises a sequence of zero or more variable-length
+ blocks. Each block begins with a 4-byte "block descriptor
+ word" (BDW). The first two bytes of the BDW are an unsigned
+ integer in big-endian byte order that specifies the length of
+ the block, including the BDW itself. The other two bytes of
+ the BDW are ignored on input and written as zeros on output.
+
+ Following the BDW, the remainder of each block is a sequence
+ of one or more variable-length records, each of which in turn
+ begins with a 4-byte "record descriptor word" (RDW) that has
+ the same format as the BDW. Following the RDW, the remainder
+ of each record is the record data.
+
+ The maximum length of a record in `VARIABLE` mode is 65,527
+ bytes: 65,535 bytes (the maximum value of a 16-bit unsigned
+ integer), minus 4 bytes for the BDW, minus 4 bytes for the
+ RDW.
+
+ In mode `VARIABLE`, `LRECL` specifies a maximum, not a fixed,
+ record length, in bytes. The default is 8,192.
+
+ IBM documentation calls this `*VB` (variable-length, blocked,
+ unspanned) format.
+
+ * `VS`
+ `SPANNED`
+ This format is like `VARIABLE`, except that logical records may
+ be split among multiple physical records (called "segments") or
+ blocks. In `SPANNED` mode, the third byte of each RDW is
+ called the segment control character (SCC). Odd SCC values
+ cause the segment to be appended to a record buffer maintained
+ in memory; even values also append the segment and then flush
+ its contents to the input procedure. Canonically, SCC value 0
+ designates a record not spanned among multiple segments, and
+ values 1 through 3 designate the first segment, the last
+ segment, or an intermediate segment, respectively, within a
+ multi-segment record. The record buffer is also flushed at end
+ of file regardless of the final record's SCC.
+
+ The maximum length of a logical record in `SPANNED` mode is
+ limited only by memory available to PSPP. Segments are
+ limited to 65,527 bytes, as in `VARIABLE` mode.
+
+ This format is similar to what IBM documentation calls `*VS`
+ (variable-length, deblocked, spanned) format.
+
+ In mode `360`, fields of type `A` that extend beyond the end of a
+ record read from disk are padded with spaces in the host's native
+ character set, which are then translated from EBCDIC to the
+ native character set. Thus, when the host's native character set
+ is based on ASCII, these fields are effectively padded with
+ character `X'80'`. This wart is implemented for compatibility.
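+
+The record framing for `IMAGE`, `BINARY`, and the mode `360` `VARIABLE`
+and `SPANNED` formats can be sketched as Python readers over a file's
+bytes. These functions are hypothetical illustrations of the layouts
+described above. They do not perform EBCDIC translation, enforce
+`LRECL`, or handle writing.
+
+```python
+import struct
+
+
+def read_image(data, lrecl=1024):
+    """IMAGE mode: fixed-length records of LRECL bytes each."""
+    if len(data) % lrecl != 0:
+        raise ValueError("file length is not a multiple of the record length")
+    return [data[i:i + lrecl] for i in range(0, len(data), lrecl)]
+
+
+def read_binary(data):
+    """BINARY mode: a 32-bit signed little-endian length precedes and
+    follows each record; the length counts neither integer."""
+    records, pos = [], 0
+    while pos < len(data):
+        (length,) = struct.unpack_from("<i", data, pos)
+        records.append(data[pos + 4:pos + 4 + length])
+        (trailer,) = struct.unpack_from("<i", data, pos + 4 + length)
+        if trailer != length:
+            raise ValueError("leading and trailing lengths differ")
+        pos += length + 8
+    return records
+
+
+def _segments(data):
+    """Yield (SCC, payload) for each RDW-prefixed segment of each
+    BDW-prefixed block. Descriptor lengths are big-endian 16-bit values
+    that include the 4-byte descriptor word itself."""
+    pos = 0
+    while pos < len(data):
+        (block_len,) = struct.unpack_from(">H", data, pos)
+        block_end, pos = pos + block_len, pos + 4
+        while pos < block_end:
+            rec_len, scc = struct.unpack_from(">HB", data, pos)
+            if rec_len < 4:
+                raise ValueError("malformed record descriptor word")
+            yield scc, data[pos + 4:pos + rec_len]
+            pos += rec_len
+
+
+def read_variable(data):
+    """Mode 360, RECFORM=VARIABLE: each segment is one record."""
+    return [payload for _, payload in _segments(data)]
+
+
+def read_spanned(data):
+    """Mode 360, RECFORM=SPANNED: an odd SCC appends the segment to a
+    buffer; an even SCC appends it and then flushes the buffer as one
+    logical record. Anything left over is flushed at end of file."""
+    records, buffer = [], b""
+    for scc, payload in _segments(data):
+        buffer += payload
+        if scc % 2 == 0:
+            records.append(buffer)
+            buffer = b""
+    if buffer:
+        records.append(buffer)
+    return records
+```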
+
+ The `NAME` subcommand specifies the name of the file associated with
+the handle. It is required in all modes but `SCRATCH` mode, in which its
+use is forbidden.
+
+ The `ENCODING` subcommand specifies the encoding of text in the
+file. For reading text files in `CHARACTER` mode, all of the forms
+described for `ENCODING` on the `INSERT` command are supported. For
+reading in other file-based modes, encoding
+autodetection is not supported; if the specified encoding requests
+autodetection then the default encoding is used. This is also true
+when a file handle is used for writing a file in any mode.
+
--- /dev/null
+# Data Input and Output
+
+Data are the focus of the PSPP language. Each datum belongs to a “case”
+(also called an “observation”). Each case represents an individual or
+“experimental unit”. For example, in the results of a survey, the names
+of the respondents, their sex, age, etc. and their responses are all
+data and the data pertaining to a single respondent is a case. This
+chapter examines the PSPP commands for defining variables and reading
+and writing data. There are alternative commands to read data from
+predefined sources such as system files or databases (see `GET DATA`).
+
+> These commands tell PSPP how to read data, but the data will
+not actually be read until a procedure is executed.
--- /dev/null
+# INPUT PROGRAM
+
+```
+INPUT PROGRAM.
+... input commands ...
+END INPUT PROGRAM.
+```
+
+ `INPUT PROGRAM`...`END INPUT PROGRAM` specifies a complex input
+program. By placing data input commands within `INPUT PROGRAM`, PSPP
+programs can take advantage of more complex file structures than
+are available with only `DATA LIST`.
+
+ The first sort of extended input program is to simply put multiple
+`DATA LIST` commands within the `INPUT PROGRAM`. This will cause all of
+the data files to be read in parallel. Input will stop when end of file
+is reached on any of the data files.
+
+ Transformations, such as conditional and looping constructs, can also
+be included within `INPUT PROGRAM`. These can be used to combine input
+from several data files in more complex ways. However, input will still
+stop when end of file is reached on any of the data files.
+
+ To prevent `INPUT PROGRAM` from terminating at the first end of
+file, use the `END` subcommand on `DATA LIST`. This subcommand takes
+a variable name, which should be a numeric [scratch
+variable](../../language/datasets/scratch-variables.md). (It need not
+be a scratch variable but otherwise the results can be surprising.)
+The value of this variable is set to 0 when reading the data file, or
+1 when end of file is encountered.
+
+ Two additional commands are useful in conjunction with `INPUT
+PROGRAM`. `END CASE` is the first. Normally each loop through the
+`INPUT PROGRAM` structure produces one case. `END CASE` controls
+exactly when cases are output. When `END CASE` is used, looping from
+the end of `INPUT PROGRAM` to the beginning does not cause a case to be
+output.
+
+ `END FILE` is the second. When the `END` subcommand is used on `DATA
+LIST`, there is no way for the `INPUT PROGRAM` construct to stop
+looping, so an infinite loop results. `END FILE`, when executed, stops
+the flow of input data and passes out of the `INPUT PROGRAM` structure.
+
+ `INPUT PROGRAM` must contain at least one `DATA LIST` or `END FILE`
+command.
+
+## Example 1: Read two files in parallel to the end of the shorter
+
+The following example reads variable `X` from file `a.txt` and
+variable `Y` from file `b.txt`. If one file is shorter than the other
+then the extra data in the longer file is ignored.
+
+```
+INPUT PROGRAM.
+ DATA LIST NOTABLE FILE='a.txt'/X 1-10.
+ DATA LIST NOTABLE FILE='b.txt'/Y 1-10.
+END INPUT PROGRAM.
+LIST.
+```
+
+## Example 2: Read two files in parallel, supplementing the shorter
+
+The following example also reads variable `X` from `a.txt` and
+variable `Y` from `b.txt`. If one file is shorter than the other then
+it continues reading the longer to its end, setting the other variable
+to system-missing.
+
+```
+INPUT PROGRAM.
+ NUMERIC #A #B.
+
+ DO IF NOT #A.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ END IF.
+ DO IF NOT #B.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10.
+ END IF.
+ DO IF #A AND #B.
+ END FILE.
+ END IF.
+ END CASE.
+END INPUT PROGRAM.
+LIST.
+```
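+
+As a rough analogy only, the difference between Examples 1 and 2
+resembles Python's `zip` and `itertools.zip_longest`. The lists below
+are made-up stand-ins for the contents of `a.txt` and `b.txt`, and
+`None` plays the role of the system-missing value.
+
+```python
+from itertools import zip_longest
+
+x_values = [1, 2, 3]  # stand-in for X read from a.txt
+y_values = [10, 20]   # stand-in for Y read from b.txt
+
+# Example 1: stop as soon as either file reaches end of file.
+print(list(zip(x_values, y_values)))          # [(1, 10), (2, 20)]
+
+# Example 2: read both files to the end; None stands in for system-missing.
+print(list(zip_longest(x_values, y_values)))  # [(1, 10), (2, 20), (3, None)]
+```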
+
+## Example 3: Concatenate two files (version 1)
+
+The following example reads data from file `a.txt`, then from `b.txt`,
+and concatenates them into a single active dataset.
+
+```
+INPUT PROGRAM.
+ NUMERIC #A #B.
+
+ DO IF #A.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10.
+ DO IF #B.
+ END FILE.
+ ELSE.
+ END CASE.
+ END IF.
+ ELSE.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ DO IF NOT #A.
+ END CASE.
+ END IF.
+ END IF.
+END INPUT PROGRAM.
+LIST.
+```
+
+## Example 4: Concatenate two files (version 2)
+
+This is another way to do the same thing as Example 3.
+
+```
+INPUT PROGRAM.
+ NUMERIC #EOF.
+
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ COMPUTE #EOF = 0.
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ END FILE.
+END INPUT PROGRAM.
+LIST.
+```
+
+## Example 5: Generate random variates
+
+The following example creates a dataset that consists of 50 random
+variates between 0 and 10.
+
+```
+INPUT PROGRAM.
+ LOOP #I=1 TO 50.
+ COMPUTE X=UNIFORM(10).
+ END CASE.
+ END LOOP.
+ END FILE.
+END INPUT PROGRAM.
+LIST /FORMAT=NUMBERED.
+```
--- /dev/null
+# LIST
+
+```
+LIST
+ /VARIABLES=VAR_LIST
+ /CASES=FROM START_INDEX TO END_INDEX BY INCR_INDEX
+ /FORMAT={UNNUMBERED,NUMBERED} {WRAP,SINGLE}
+```
+
+ The `LIST` procedure prints the values of specified variables to the
+listing file.
+
+ The `VARIABLES` subcommand specifies the variables whose values are
+to be printed. Keyword `VARIABLES` is optional. If the `VARIABLES`
+subcommand is omitted then all variables in the active dataset are
+printed.
+
+ The `CASES` subcommand can be used to specify a subset of cases to be
+printed. Specify `FROM` and the case number of the first case to print,
+`TO` and the case number of the last case to print, and `BY` and the
+number of cases to advance between printing cases, or any subset of
+those settings. If `CASES` is not specified then all cases are printed.
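+
+The case selection made by `CASES` can be sketched as a Python range
+over 1-based case numbers. `listed_cases` is a hypothetical helper.
+Clamping a `TO` value that lies past the last case is an assumption of
+this sketch.
+
+```python
+def listed_cases(n_cases, first=1, last=None, by=1):
+    """Case numbers (starting from 1) printed by
+    LIST /CASES=FROM first TO last BY by, for a dataset of n_cases cases.
+
+    Assumption: a TO value past the last case is clamped to n_cases.
+    """
+    if last is None or last > n_cases:
+        last = n_cases
+    return list(range(first, last + 1, by))
+
+
+print(listed_cases(10, first=2, last=8, by=3))  # [2, 5, 8]
+```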
+
+ The `FORMAT` subcommand can be used to change the output format.
+`NUMBERED` will print case numbers along with each case; `UNNUMBERED`,
+the default, causes the case numbers to be omitted. The `WRAP` and
+`SINGLE` settings are currently not used.
+
+ Case numbers start from 1. They are counted after all
+transformations have been considered.
+
+ `LIST` is a procedure. It causes the data to be read.
+
--- /dev/null
+# NEW FILE
+
+```
+NEW FILE.
+```
+
+The `NEW FILE` command clears the dictionary and data from the current
+active dataset.
+
--- /dev/null
+# PRINT EJECT
+
+```
+PRINT EJECT
+ [OUTFILE='FILE_NAME']
+ [RECORDS=N_LINES]
+ [{NOTABLE,TABLE}]
+ [/[LINE_NO] ARG...]
+
+ARG takes one of the following forms:
+ 'STRING' [START-END]
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+ VAR_LIST *
+```
+
+`PRINT EJECT` advances to the beginning of a new output page in the
+listing file or output file. It can also output data in the same way as
+`PRINT`.
+
+All `PRINT EJECT` subcommands are optional.
+
+Without `OUTFILE`, `PRINT EJECT` ejects the current page in the
+listing file and then produces any other output that is specified.
+
+With `OUTFILE`, `PRINT EJECT` writes its output to the specified
+file. The first line of output is written with `1` inserted in the
+first column. Commonly, this is the only line of output. If additional
+lines of output are specified, these additional lines are written with a
+space inserted in the first column, as with `PRINT`.
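+
+The carriage-control convention described above can be modeled with
+the Python sketch below (not PSPP syntax). The function
+`carriage_control` is hypothetical. It only shows the first-column
+prefixes that `PRINT` and `PRINT EJECT` add when writing to an
+`OUTFILE`.
+
+```python
+def carriage_control(lines, eject):
+    """Prefix output lines the way PRINT or PRINT EJECT does with OUTFILE.
+
+    PRINT EJECT marks its first line with "1" (start a new page); every
+    other line, including blank ones, gets a single leading space.
+    """
+    return [("1" if eject and i == 0 else " ") + line
+            for i, line in enumerate(lines)]
+
+
+print(carriage_control(["Report", "Page 2 data"], eject=True))
+# ['1Report', ' Page 2 data']
+print(carriage_control([""], eject=False))  # [' ']
+```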
+
+See [PRINT](print.md) for more information on syntax and usage.
+
--- /dev/null
+# PRINT SPACE
+
+```
+PRINT SPACE [OUTFILE='FILE_NAME'] [ENCODING='ENCODING'] [N_LINES].
+```
+
+`PRINT SPACE` prints one or more blank lines to the listing file or an
+output file.
+
+The `OUTFILE` subcommand is optional. It may be used to direct output
+to a file specified by file name as a string or [file
+handle](../../language/files/file-handles.md). If `OUTFILE` is not
+specified then output is directed to the listing file.
+
+The `ENCODING` subcommand may only be used if `OUTFILE` is also used.
+It specifies the character encoding of the file. See the documentation
+for `INSERT` for information on supported encodings.
+
+`N_LINES` is also optional. If present, it is an
+[expression](../../language/expressions/index.md) for the number of
+blank lines to be printed. The expression must evaluate to a
+nonnegative value.
+
--- /dev/null
+# PRINT
+
+```
+PRINT
+ [OUTFILE='FILE_NAME']
+ [RECORDS=N_LINES]
+ [{NOTABLE,TABLE}]
+ [ENCODING='ENCODING']
+ [/[LINE_NO] ARG...]
+
+ARG takes one of the following forms:
+ 'STRING' [START]
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+ VAR_LIST *
+```
+
+The `PRINT` transformation writes variable data to the listing file
+or an output file. `PRINT` is executed when a procedure causes the data
+to be read. To print variable data without invoking a procedure,
+follow `PRINT` with `EXECUTE`.
+
+ All `PRINT` subcommands are optional. If no strings or variables are
+specified, `PRINT` outputs a single blank line.
+
+The `OUTFILE` subcommand specifies the file to receive the output.
+The file may be a file name as a string or a [file
+handle](../../language/files/file-handles.md). If `OUTFILE` is not
+present then output is sent to PSPP's output listing file. When
+`OUTFILE` is present, the output is written to the file in a plain
+text format, with a space inserted at the beginning of each output
+line, even lines that otherwise would be blank.
+
+The `ENCODING` subcommand may only be used if the `OUTFILE`
+subcommand is also used. It specifies the character encoding of the
+file. See the documentation for `INSERT` for information on supported
+encodings.
+
+ The `RECORDS` subcommand specifies the number of lines to be output.
+The number of lines may optionally be surrounded by parentheses.
+
+ `TABLE` will cause the `PRINT` command to output a table to the
+listing file that describes what it will print to the output file.
+`NOTABLE`, the default, suppresses this output table.
+
+ Introduce the strings and variables to be printed with a slash (`/`).
+Optionally, the slash may be followed by a number indicating which
+output line is specified. In the absence of this line number, the next
+line number is specified. Multiple lines may be specified using
+multiple slashes with the intended output for a line following its
+respective slash.
+
+ Literal strings may be printed. Specify the string itself.
+Optionally the string may be followed by a column number, specifying the
+column on the line where the string should start. Otherwise, the string
+is printed at the current position on the line.
+
+ Variables to be printed can be specified in the same ways as
+available for [`DATA LIST FIXED`](data-list.md#data-list-fixed). In addition,
+a variable list may be followed by an asterisk (`*`), which indicates
+that the variables should be printed in their dictionary print formats,
+separated by spaces. A variable list followed by a slash or the end of
+command is interpreted in the same way.
+
+If a FORTRAN type specification moves backward on the current line,
+then text is written starting at that point and the line is truncated
+there. Any text added afterward extends the line again.
+
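+The positioning rules above can be approximated by the Python sketch
+below (not PSPP syntax). The function `place` is hypothetical and
+assumes 1-based columns. It treats any target column before the current
+end of the line as a backward move, which truncates the line there
+before the new text is written.
+
+```python
+def place(line, text, column):
+    """Return line with text written starting at 1-based column."""
+    start = column - 1
+    if len(line) < start:
+        line = line.ljust(start)  # moving forward pads with spaces
+    else:
+        line = line[:start]  # at or before the end: drop everything from column on
+    return line + text
+
+
+line = place("", "ID", 1)       # 'ID'
+line = place(line, "SCORE", 6)  # 'ID   SCORE'
+line = place(line, "X", 4)      # 'ID X' (the backward move dropped SCORE)
+line = place(line, "YZ", 8)     # 'ID X   YZ' (later text extends the line)
+```
+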
--- /dev/null
+# REPEATING DATA
+
+```
+REPEATING DATA
+ /STARTS=START-END
+ /OCCURS=N_OCCURS
+ /FILE='FILE_NAME'
+ /LENGTH=LENGTH
+ /CONTINUED[=CONT_START-CONT_END]
+ /ID=ID_START-ID_END=ID_VAR
+ /{TABLE,NOTABLE}
+ /DATA=VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+```
+
+`REPEATING DATA` parses groups of data repeating in a uniform format,
+possibly with several groups on a single line. Each group of data
+corresponds with one case. `REPEATING DATA` may only be used within
+[`INPUT PROGRAM`](input-program.md). When used with [`DATA
+LIST`](data-list.md), it can be used to parse groups of cases that
+share a subset of variables but differ in their other data.
+
+The `STARTS` subcommand is required. Specify a range of columns,
+using literal numbers or numeric variable names. This range specifies
+the columns on the first line that are used to contain groups of data.
+The ending column is optional. If it is not specified, then the
+record width of the input file is used. For the [inline
+file](begin-data.md), this is 80 columns; for a file with fixed record
+widths it is the record width; for other files it is 1024 characters
+by default.
+
+The `OCCURS` subcommand is required. It must be a number or the name
+of a numeric variable. Its value is the number of groups present in the
+current record.
+
+The `DATA` subcommand is required. It must be the last subcommand
+specified. It is used to specify the data present within each
+repeating group. Column numbers are specified relative to the
+beginning of a group at column 1. Data is specified in the same way
+as with [`DATA LIST FIXED`](data-list.md#data-list-fixed).
+
+All other subcommands are optional.
+
+`FILE` specifies the file to read, either a file name as a string or a
+[file handle](../../language/files/file-handles.md). If `FILE` is not
+present then the default is the last file handle used on the most
+recent `DATA LIST` command.
+
+By default `REPEATING DATA` will output a table describing how it
+will parse the input data. Specifying `NOTABLE` will disable this
+behavior; specifying `TABLE` will explicitly enable it.
+
+The `LENGTH` subcommand specifies the length in characters of each
+group. If it is not present then length is inferred from the `DATA`
+subcommand. `LENGTH` may be a number or a variable name.
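+
+The following Python sketch (not PSPP syntax) shows how `STARTS`,
+`OCCURS`, `LENGTH`, and group-relative `DATA` columns fit together on
+a single input line. The functions `split_groups` and `field` are
+hypothetical. The sketch assumes that groups are stored back to back,
+and it ignores `CONTINUED` and the optional ending column of `STARTS`.
+
+```python
+def split_groups(record, starts, occurs, length):
+    """Return the OCCURS groups of LENGTH characters from STARTS onward."""
+    first = starts - 1  # STARTS is a 1-based column
+    return [record[first + i * length : first + (i + 1) * length]
+            for i in range(occurs)]
+
+
+def field(group, start, end):
+    """Extract DATA columns start-end, counted from column 1 of a group."""
+    return group[start - 1 : end]
+
+
+record = "A17 0512 0934 1103"
+groups = split_groups(record, starts=5, occurs=3, length=5)
+print(groups)                            # ['0512 ', '0934 ', '1103']
+print([field(g, 1, 2) for g in groups])  # ['05', '09', '11']
+```
+
+In this layout, columns 1 through 4 hold data shared by all three
+cases, which a preceding `DATA LIST` would read.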
+
+Normally all the data groups are expected to be present on a single
+line. Use the `CONTINUED` subcommand to indicate that data can be
+continued onto additional lines. If data on continuation lines starts
+at the left margin and continues through the entire field width, no
+column specifications are necessary on `CONTINUED`. Otherwise, specify
+the possible range of columns in the same way as on `STARTS`.
+
+When data groups are continued from line to line, it is easy for
+cases to get out of sync through careless hand editing. The `ID`
+subcommand allows a case identifier to be present on each line of
+repeating data groups. `REPEATING DATA` will check for the same
+identifier on each line and report mismatches. Specify the range of
+columns that the identifier will occupy, followed by an equals sign
+(`=`) and the identifier variable name. The variable must already have
+been declared with `NUMERIC` or another command.
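+
+A minimal Python sketch (not PSPP syntax) of the consistency check
+that `ID` enables follows. The function `check_ids` is hypothetical. It
+assumes 1-based inclusive columns and compares every line of a case
+with the identifier on that case's first line.
+
+```python
+def check_ids(lines, id_start, id_end):
+    """Return (line number, id) for each line whose ID differs from line 1."""
+    expected = lines[0][id_start - 1 : id_end]
+    return [(n, line[id_start - 1 : id_end])
+            for n, line in enumerate(lines, start=1)
+            if line[id_start - 1 : id_end] != expected]
+
+
+case_lines = ["017 0512 0934", "017 1103 1420", "018 0801"]
+print(check_ids(case_lines, 1, 3))  # [(3, '018')]
+```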
+
+`REPEATING DATA` should be the last command given within an [`INPUT
+PROGRAM`](input-program.md). It should not be enclosed within a `LOOP`
+structure. Use `DATA LIST` before, not after, `REPEATING DATA`.
--- /dev/null
+# REREAD
+
+```
+REREAD [FILE=HANDLE] [COLUMN=COLUMN] [ENCODING='ENCODING'].
+```
+
+The `REREAD` transformation allows the previous input line in a data
+file already processed by `DATA LIST` or another input command to be
+re-read for further processing.
+
+The `FILE` subcommand, which is optional, is used to specify the file
+to have its line re-read. The file must be specified as the name of a
+[file handle](../../language/files/file-handles.md). If `FILE` is not
+specified then the file specified on the most recent `DATA LIST`
+command is assumed.
+
+By default, the entire line is re-read. With the `COLUMN` subcommand,
+a prefix of the line can be exempted from re-reading. Specify an
+[expression](../../language/expressions/index.md) evaluating to the
+first column that should be included in the re-read line. Columns are
+numbered from 1 at the left margin.
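+
+The effect of `COLUMN` can be modeled with the Python sketch below
+(not PSPP syntax). The function `reread` is hypothetical and assumes
+that columns are numbered from 1, as stated above.
+
+```python
+def reread(line, column=1):
+    """Return the portion of the previous line that is read again."""
+    return line[column - 1 :]  # skip the first column - 1 characters
+
+
+previous = "A  42 3.5"
+print(repr(reread(previous)))            # 'A  42 3.5'
+print(repr(reread(previous, column=4)))  # '42 3.5'
+```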
+
+The `ENCODING` subcommand may only be used if the `FILE` subcommand
+is also used. It specifies the character encoding of the file. See the
+documentation for `INSERT` for information on supported encodings.
+
+Issuing `REREAD` multiple times will not back up in the data file.
+Instead, it will re-read the same line multiple times.
+
--- /dev/null
+# WRITE
+
+```
+WRITE
+ OUTFILE='FILE_NAME'
+ RECORDS=N_LINES
+ {NOTABLE,TABLE}
+ /[LINE_NO] ARG...
+
+ARG takes one of the following forms:
+ 'STRING' [START-END]
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+ VAR_LIST *
+```
+
+`WRITE` writes text or binary data to an output file. `WRITE` differs
+from [`PRINT`](print.md) in only a few ways:
+
+- `WRITE` uses write formats by default, whereas `PRINT` uses print
+ formats.
+
+- `PRINT` inserts a space between variables unless a format is
+ explicitly specified, but `WRITE` never inserts space between
+ variables in output.
+
+- `PRINT` inserts a space at the beginning of each line that it writes
+ to an output file (and `PRINT EJECT` inserts `1` at the beginning of
+ each line that should begin a new page), but `WRITE` does not.
+
+- `PRINT` outputs the system-missing value according to its specified
+ output format, whereas `WRITE` outputs the system-missing value as a
+ field filled with spaces. Binary formats are an exception.
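+
+The spacing and system-missing differences can be sketched in Python
+(not PSPP syntax). This is a hypothetical model. `SYSMIS`,
+`format_value`, `print_line`, and `write_line` are illustrative names.
+A single right-aligned `F4.1`-style format stands in for both print and
+write formats. The sketch covers only the default case with no explicit
+formats, and the leading space is the one `PRINT` adds when writing to
+an output file.
+
+```python
+SYSMIS = None  # stand-in for the system-missing value
+
+
+def format_value(value, width, decimals, blank_sysmis):
+    """Right-align value in width columns with the given decimals."""
+    if value is SYSMIS:
+        # PRINT is modeled as a right-aligned "."; WRITE leaves blanks.
+        return " " * width if blank_sysmis else ".".rjust(width)
+    return f"{value:{width}.{decimals}f}"
+
+
+def print_line(values, width=4, decimals=1):
+    # Leading space, then one space between variables.
+    return " " + " ".join(format_value(v, width, decimals, False) for v in values)
+
+
+def write_line(values, width=4, decimals=1):
+    # No leading space and no separators between variables.
+    return "".join(format_value(v, width, decimals, True) for v in values)
+
+
+row = [1.5, SYSMIS, 12.0]
+print(repr(print_line(row)))  # '  1.5    . 12.0'
+print(repr(write_line(row)))  # ' 1.5    12.0'
+```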
+
from a file using a program such as `zcat` (e.g. `GET '|zcat
mydata.sav.gz'`), and for many other purposes.
- PSPP also supports declaring named file handles with the `FILE
-HANDLE` command. This command associates an identifier of your choice
-(the file handle's name) with a file. Later, the file handle name can
-be substituted for the name of the file. When PSPP syntax accesses a
-file multiple times, declaring a named file handle simplifies updating
-the syntax later to use a different file. Use of `FILE HANDLE` is also
-required to read data files in binary formats. *Note FILE HANDLE::, for
-more information.
+ PSPP also supports declaring named file handles with the [`FILE
+HANDLE`](../../commands/data-io/file-handle.md) command. This command
+associates an identifier of your choice (the file handle's name) with
+a file. Later, the file handle name can be substituted for the name
+of the file. When PSPP syntax accesses a file multiple times,
+declaring a named file handle simplifies updating the syntax later to
+use a different file. Use of `FILE HANDLE` is also required to read
+data files in binary formats.
In some circumstances, PSPP must distinguish whether a file handle
refers to a system file or a portable file. When this is necessary to
writes a portable file; otherwise, PSPP writes a system file.
`INLINE` is reserved as a file handle name. It refers to the "data
-file" embedded into the syntax file between `BEGIN DATA` and `END
-DATA`. *Note BEGIN DATA::, for more information.
+file" embedded into the syntax file between [`BEGIN DATA` and `END
+DATA`](../../commands/data-io/begin-data.md).
The file to which a file handle refers may be reassigned on a later
-`FILE HANDLE` command if it is first closed using `CLOSE FILE HANDLE`.
-*Note CLOSE FILE HANDLE::, for more information.
+`FILE HANDLE` command if it is first closed using [`CLOSE FILE
+HANDLE`](../../commands/data-io/close-file-handle.md).
+