From cb3377f0b17cfbe54dfddd88bf25455bb5979278 Mon Sep 17 00:00:00 2001 From: Ben Pfaff Date: Tue, 6 May 2025 15:40:59 -0700 Subject: [PATCH] work on manual --- rust/doc/src/SUMMARY.md | 23 +- rust/doc/src/commands/data-io/begin-data.md | 14 + .../src/commands/data-io/close-file-handle.md | 17 ++ rust/doc/src/commands/data-io/data-list.md | 277 ++++++++++++++++++ .../commands/data-io/datafile-attribute.md | 44 +++ rust/doc/src/commands/data-io/dataset.md | 68 +++++ rust/doc/src/commands/data-io/end-case.md | 9 + rust/doc/src/commands/data-io/end-file.md | 9 + rust/doc/src/commands/data-io/file-handle.md | 184 ++++++++++++ rust/doc/src/commands/data-io/index.md | 14 + .../doc/src/commands/data-io/input-program.md | 154 ++++++++++ rust/doc/src/commands/data-io/list.md | 33 +++ rust/doc/src/commands/data-io/new-file.md | 9 + rust/doc/src/commands/data-io/print-eject.md | 33 +++ rust/doc/src/commands/data-io/print-space.md | 22 ++ rust/doc/src/commands/data-io/print.md | 68 +++++ .../src/commands/data-io/repeating-data.md | 79 +++++ rust/doc/src/commands/data-io/reread.md | 30 ++ rust/doc/src/commands/data-io/write.md | 34 +++ rust/doc/src/language/files/file-handles.md | 25 +- 20 files changed, 1133 insertions(+), 13 deletions(-) create mode 100644 rust/doc/src/commands/data-io/begin-data.md create mode 100644 rust/doc/src/commands/data-io/close-file-handle.md create mode 100644 rust/doc/src/commands/data-io/data-list.md create mode 100644 rust/doc/src/commands/data-io/datafile-attribute.md create mode 100644 rust/doc/src/commands/data-io/dataset.md create mode 100644 rust/doc/src/commands/data-io/end-case.md create mode 100644 rust/doc/src/commands/data-io/end-file.md create mode 100644 rust/doc/src/commands/data-io/file-handle.md create mode 100644 rust/doc/src/commands/data-io/index.md create mode 100644 rust/doc/src/commands/data-io/input-program.md create mode 100644 rust/doc/src/commands/data-io/list.md create mode 100644 rust/doc/src/commands/data-io/new-file.md 
create mode 100644 rust/doc/src/commands/data-io/print-eject.md create mode 100644 rust/doc/src/commands/data-io/print-space.md create mode 100644 rust/doc/src/commands/data-io/print.md create mode 100644 rust/doc/src/commands/data-io/repeating-data.md create mode 100644 rust/doc/src/commands/data-io/reread.md create mode 100644 rust/doc/src/commands/data-io/write.md diff --git a/rust/doc/src/SUMMARY.md b/rust/doc/src/SUMMARY.md index eb7e9ad198..c92ef172c2 100644 --- a/rust/doc/src/SUMMARY.md +++ b/rust/doc/src/SUMMARY.md @@ -3,7 +3,7 @@ [Introduction](introduction.md) [License](license.md) -# Language Syntax +# Language Overview - [Basics](language/basics/index.md) - [Tokens](language/basics/tokens.md) @@ -38,6 +38,27 @@ - [Miscellaneous Functions](language/expressions/functions/miscellaneous.md) - [Statistical Distribution Functions](language/expressions/functions/statistical-distributions.md) +# Command Syntax + +- [Data Input and Output](commands/data-io/index.md) + - [BEGIN DATA](commands/data-io/begin-data.md) + - [CLOSE FILE HANDLE](commands/data-io/close-file-handle.md) + - [DATAFILE ATTRIBUTE](commands/data-io/datafile-attribute.md) + - [DATASET commands](commands/data-io/dataset.md) + - [DATA LIST](commands/data-io/data-list.md) + - [END CASE](commands/data-io/end-case.md) + - [END FILE](commands/data-io/end-file.md) + - [FILE HANDLE](commands/data-io/file-handle.md) + - [INPUT PROGRAM](commands/data-io/input-program.md) + - [LIST](commands/data-io/list.md) + - [NEW FILE](commands/data-io/new-file.md) + - [PRINT](commands/data-io/print.md) + - [PRINT EJECT](commands/data-io/print-eject.md) + - [PRINT SPACE](commands/data-io/print-space.md) + - [REREAD](commands/data-io/reread.md) + - [REPEATING DATA](commands/data-io/repeating-data.md) + - [WRITE](commands/data-io/write.md) + # Developer Documentation - [System File Format](system-file/index.md) diff --git a/rust/doc/src/commands/data-io/begin-data.md b/rust/doc/src/commands/data-io/begin-data.md new 
file mode 100644 index 0000000000..2309e32754 --- /dev/null +++ b/rust/doc/src/commands/data-io/begin-data.md @@ -0,0 +1,14 @@ +# BEGIN DATA + +``` +BEGIN DATA. +... +END DATA. +``` + +`BEGIN DATA` and `END DATA` can be used to embed raw ASCII data in a +PSPP syntax file. `DATA LIST` or another input procedure must be used +before `BEGIN DATA` (see [`DATA LIST`](data-list.md)). `BEGIN DATA` and `END DATA` +must be used together. `END DATA` must appear by itself on a single +line, with no leading white space and exactly one space between the +words `END` and `DATA`. diff --git a/rust/doc/src/commands/data-io/close-file-handle.md b/rust/doc/src/commands/data-io/close-file-handle.md new file mode 100644 index 0000000000..e73a569ed2 --- /dev/null +++ b/rust/doc/src/commands/data-io/close-file-handle.md @@ -0,0 +1,17 @@ +# CLOSE FILE HANDLE + +``` +CLOSE FILE HANDLE HANDLE_NAME. +``` + +`CLOSE FILE HANDLE` disassociates the name of a [file +handle](../../language/files/file-handles.md) from a given file. The +only specification is the name of the handle to close. Afterward, the +name may be reused in a new `FILE HANDLE` command. + +The file named INLINE, which represents data entered between `BEGIN +DATA` and `END DATA`, cannot be closed. Attempts to close it with +`CLOSE FILE HANDLE` have no effect. + +`CLOSE FILE HANDLE` is a PSPP extension. + diff --git a/rust/doc/src/commands/data-io/data-list.md b/rust/doc/src/commands/data-io/data-list.md new file mode 100644 index 0000000000..fc6b94797f --- /dev/null +++ b/rust/doc/src/commands/data-io/data-list.md @@ -0,0 +1,277 @@ +# DATA LIST + +Used to read text or binary data, `DATA LIST` is the most fundamental +data-reading command. Even the more sophisticated input methods use +`DATA LIST` commands as a building block. Understanding `DATA LIST` is +important to understanding how to use PSPP to read your data files. + + There are two major variants of `DATA LIST`, which are fixed format +and free format. 
In addition, free format has a minor variant, list +format, which is discussed in terms of its differences from vanilla free +format. + + Each form of `DATA LIST` is described in detail below. + + *Note GET DATA::, for a command that offers a few enhancements over +DATA LIST and that may be substituted for DATA LIST in many situations. + +## DATA LIST FIXED + +``` +DATA LIST [FIXED] + {TABLE,NOTABLE} + [FILE='FILE_NAME' [ENCODING='ENCODING']] + [RECORDS=RECORD_COUNT] + [END=END_VAR] + [SKIP=RECORD_COUNT] + /[line_no] VAR_SPEC... + +where each VAR_SPEC takes one of the forms + VAR_LIST START-END [TYPE_SPEC] + VAR_LIST (FORTRAN_SPEC) +``` + + `DATA LIST FIXED` is used to read data files that have values at +fixed positions on each line of single-line or multiline records. The +keyword `FIXED` is optional. + + The `FILE` subcommand must be used if input is to be taken from an +external file. It may be used to specify a file name as a string or a +[file handle](../../language/files/file-handles.md). If the `FILE` +subcommand is not used, then input is assumed to be specified within +the command file using [`BEGIN DATA`...`END DATA`](begin-data.md). +The `ENCODING` subcommand may only be used if the `FILE` subcommand is +also used. It specifies the character encoding of the file. *Note +INSERT::, for information on supported encodings. + + The optional `RECORDS` subcommand, which takes a single integer as an +argument, is used to specify the number of lines per record. If +`RECORDS` is not specified, then the number of lines per record is +calculated from the list of variable specifications later in `DATA +LIST`. + + The `END` subcommand is only useful in conjunction with `INPUT +PROGRAM`. *Note INPUT PROGRAM::, for details. + + The optional `SKIP` subcommand specifies a number of records to skip +at the beginning of an input file. It can be used to skip over a row +that contains variable names, for example. 
+ + `DATA LIST` can optionally output a table describing how the data +file is read. The `TABLE` subcommand enables this output, and `NOTABLE` +disables it. The default is to output the table. + + The list of variables to be read from the data list must come last. +Each line in the data record is introduced by a slash (`/`). +Optionally, a line number may follow the slash. After that, any number +of variable specifications may follow. + + Each variable specification consists of a list of variable names +followed by a description of their location on the input line. [Sets +of variables](../../language/datasets/variable-lists.html) may be specified with +`TO`, e.g. `VAR1 TO VAR5`. There are two ways to specify the location +of the variable on the line: columnar style and FORTRAN style. + + In columnar style, the starting column and ending column for the +field are specified after the variable name, separated by a dash +(`-`). For instance, the third through fifth columns on a line would +be specified `3-5`. By default, variables are considered to be in +[`F` format](../../language/datasets/formats/basic.html). (This +default can be changed; see *note SET:: for more information.) + + In columnar style, to use a variable format other than the default, +specify the format type in parentheses after the column numbers. For +instance, for alphanumeric `A` format, use `(A)`. + + In addition, implied decimal places can be specified in parentheses +after the column numbers. As an example, suppose that a data file has a +field in which the characters `1234` should be interpreted as having the +value 12.34. Then this field has two implied decimal places, and the +corresponding specification would be `(2)`. If a field that has implied +decimal places contains a decimal point, then the implied decimal places +are not applied. + + Changing the variable format and adding implied decimal places can be +done together; for instance, `(N,5)`. 
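The implied-decimal rule described above can be modeled in a few lines. The following Python sketch is illustrative only and is not PSPP's implementation: the function name `read_fixed_field` is hypothetical, it accepts only plain decimal numbers, and it returns `None` for a blank field as a stand-in for a missing value.

```python
from decimal import Decimal
from typing import Optional


def read_fixed_field(record: str, start: int, end: int,
                     implied_decimals: int = 0) -> Optional[Decimal]:
    """Read one numeric field from columns start..end (1-based, inclusive).

    Hypothetical helper that models the rule in the text; it is not PSPP code.
    """
    text = record[start - 1:end].strip()
    if not text:
        return None  # assumption: a blank field stands in for a missing value
    value = Decimal(text)
    # Implied decimal places apply only when the field has no explicit
    # decimal point.
    if implied_decimals > 0 and "." not in text:
        value = value.scaleb(-implied_decimals)
    return value


print(read_fixed_field("1234", 1, 4, implied_decimals=2))  # prints 12.34
print(read_fixed_field("12.3", 1, 4, implied_decimals=2))  # prints 12.3
```

With a specification such as `1-4 (2)`, the field `1234` is read as 12.34, while a field that already contains a decimal point is taken as written.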
+ + When using columnar style, the input and output width of each +variable is computed from the field width. The field width must be +evenly divisible by the number of variables specified. + + FORTRAN style is an altogether different approach to specifying field +locations. With this approach, a list of variable input format +specifications, separated by commas, is placed after the variable names +inside parentheses. Each format specifier advances as many characters +into the input line as it uses. + + Implied decimal places also exist in FORTRAN style. A format +specification with `D` decimal places also has `D` implied decimal places. + + In addition to the [standard + formats](../../language/datasets/formats/index.html), FORTRAN style + defines some extensions: + +* `X` + Advance the current column on this line by one character position. + +* `Tx` + Set the current column on this line to column `x`, with column + numbers considered to begin with 1 at the left margin. + +* `NEWRECx` + Skip forward `x` lines in the current record, resetting the active + column to the left margin. + +* Repeat count + Any format specifier may be preceded by a number. This causes the + action of that format specifier to be repeated the specified number + of times. + +* `(SPEC1, ..., SPECN)` + Use `()` to group specifiers together. This is most useful when + preceded by a repeat count. Groups may be nested. + + FORTRAN and columnar styles may be freely intermixed. Columnar style +leaves the active column immediately after the ending column specified. +Record motion using `NEWREC` in FORTRAN style also applies to later +FORTRAN and columnar specifiers. + +### Example 1 + +``` +DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1). + +BEGIN DATA. +John Smith 102311 +Bob Arnold 122015 +Bill Yates 918 6 +END DATA. +``` + +Defines the following variables: + + - `NAME`, a 10-character-wide string variable, in columns 1 + through 10. 
+ + - `INFO1`, a numeric variable, in columns 12 through 13. + + - `INFO2`, a numeric variable, in columns 14 through 15. + + - `INFO3`, a numeric variable, in columns 16 through 17. + +The `BEGIN DATA`/`END DATA` commands cause three cases to be +defined: + +|Case |NAME |INFO1 |INFO2 |INFO3| +|------:|:------------|-------:|-------:|----:| +| 1 |John Smith | 10 | 23 | 11| +| 2 |Bob Arnold | 12 | 20 | 15| +| 3 |Bill Yates | 9 | 18 | 6| + +The `TABLE` keyword causes PSPP to print out a table describing the +four variables defined. + +### Example 2 + +``` +DATA LIST FILE="survey.dat" + /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A) + /Q01 TO Q50 7-56 + /. +``` + +Defines the following variables: + + - `ID`, a numeric variable, in columns 1-5 of the first record. + + - `NAME`, a 30-character string variable, in columns 7-36 of the + first record. + + - `SURNAME`, a 30-character string variable, in columns 38-67 of + the first record. + + - `MINITIAL`, a 1-character string variable, in column 69 of the + first record. + + - Fifty variables `Q01`, `Q02`, `Q03`, ..., `Q49`, `Q50`, all + numeric, `Q01` in column 7, `Q02` in column 8, ..., `Q49` in + column 55, `Q50` in column 56, all in the second record. + +Cases are separated by a blank record. + +Data is read from file `survey.dat` in the current directory. + +## DATA LIST FREE + +``` +DATA LIST FREE + [({TAB,'C'}, ...)] + [{NOTABLE,TABLE}] + [FILE='FILE_NAME' [ENCODING='ENCODING']] + [SKIP=N_RECORDS] + /VAR_SPEC... + +where each VAR_SPEC takes one of the forms + VAR_LIST [(TYPE_SPEC)] + VAR_LIST * +``` + + In free format, the input data is, by default, structured as a series +of fields separated by spaces, tabs, or line breaks. If the current +`DECIMAL` separator is `DOT` (*note SET::), then commas are also treated +as field separators. Each field's content may be unquoted, or it may be +quoted with a pair of apostrophes (`'`) or double quotes (`"`). 
+Unquoted white space separates fields but is not part of any field. Any +mix of spaces, tabs, and line breaks is equivalent to a single space for +the purpose of separating fields, but consecutive commas will skip a +field. + + Alternatively, delimiters can be specified explicitly, as a +parenthesized, comma-separated list of single-character strings +immediately following `FREE`. The word `TAB` may also be used to +specify a tab character as a delimiter. When delimiters are specified +explicitly, only the given characters, plus line breaks, separate +fields. Furthermore, leading spaces at the beginnings of fields are +not trimmed, consecutive delimiters define empty fields, and no form +of quoting is allowed. + + The `NOTABLE` and `TABLE` subcommands are as in `DATA LIST FIXED` +above. `NOTABLE` is the default. + + The `FILE`, `SKIP`, and `ENCODING` subcommands are as in `DATA LIST +FIXED` above. + + The variables to be parsed are given as a single list of variable +names. This list must be introduced by a single slash (`/`). The set +of variable names may contain [format +specifications](../../language/datasets/formats/index.html) in +parentheses. Format specifications apply to all variables back to the +previous parenthesized format specification. + + An asterisk on its own has the same effect as `(F8.0)`, assigning +the variables preceding it input/output format `F8.0`. + + Specified field widths are ignored on input, although all normal +limits on field width apply, but they are honored on output. + +## DATA LIST LIST + +``` +DATA LIST LIST + [({TAB,'C'}, ...)] + [{NOTABLE,TABLE}] + [FILE='FILE_NAME' [ENCODING='ENCODING']] + [SKIP=RECORD_COUNT] + /VAR_SPEC... + +where each VAR_SPEC takes one of the forms + VAR_LIST [(TYPE_SPEC)] + VAR_LIST * +``` + + With one exception, `DATA LIST LIST` is syntactically and +semantically equivalent to `DATA LIST FREE`. The exception is that each +input line is expected to correspond to exactly one input record. 
If +more or fewer fields are found on an input line than expected, an +appropriate diagnostic is issued. + diff --git a/rust/doc/src/commands/data-io/datafile-attribute.md b/rust/doc/src/commands/data-io/datafile-attribute.md new file mode 100644 index 0000000000..0e0a6f4f60 --- /dev/null +++ b/rust/doc/src/commands/data-io/datafile-attribute.md @@ -0,0 +1,44 @@ +# DATAFILE ATTRIBUTE + +``` +DATAFILE ATTRIBUTE + ATTRIBUTE=NAME('VALUE') [NAME('VALUE')]... + ATTRIBUTE=NAME[INDEX]('VALUE') [NAME[INDEX]('VALUE')]... + DELETE=NAME [NAME]... + DELETE=NAME[INDEX] [NAME[INDEX]]... +``` + + `DATAFILE ATTRIBUTE` adds, modifies, or removes user-defined +attributes associated with the active dataset. Custom data file +attributes are not interpreted by PSPP, but they are saved as part of +system files and may be used by other software that reads them. + + Use the `ATTRIBUTE` subcommand to add or modify a custom data file +attribute. Specify the name of the attribute, followed by the desired +value, in parentheses, as a quoted string. Attribute names that begin +with `$` are reserved for PSPP's internal use, and attribute names +that begin with `@` or `$@` are not displayed by most PSPP commands +that display other attributes. Other attribute names are not treated +specially. + + Attributes may also be organized into arrays. To assign to an array +element, add an integer array index enclosed in square brackets (`[` and +`]`) between the attribute name and value. Array indexes start at 1, +not 0. An attribute array that has a single element (number 1) is not +distinguished from a non-array attribute. + + Use the `DELETE` subcommand to delete an attribute. Specify an +attribute name by itself to delete an entire attribute, including all +array elements for attribute arrays. Specify an attribute name followed +by an array index in square brackets to delete a single element of an +attribute array. 
In the latter case, all the array elements numbered +higher than the deleted element are shifted down, filling the vacated +position. + + To associate custom attributes with particular variables, instead of +with the entire active dataset, use `VARIABLE ATTRIBUTE` (*note VARIABLE +ATTRIBUTE::) instead. + + `DATAFILE ATTRIBUTE` takes effect immediately. It is not affected by +conditional and looping structures such as `DO IF` or `LOOP`. + diff --git a/rust/doc/src/commands/data-io/dataset.md b/rust/doc/src/commands/data-io/dataset.md new file mode 100644 index 0000000000..03f8da857f --- /dev/null +++ b/rust/doc/src/commands/data-io/dataset.md @@ -0,0 +1,68 @@ +# DATASET commands + +``` +DATASET NAME NAME [WINDOW={ASIS,FRONT}]. +DATASET ACTIVATE NAME [WINDOW={ASIS,FRONT}]. +DATASET COPY NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}]. +DATASET DECLARE NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}]. +DATASET CLOSE {NAME,*,ALL}. +DATASET DISPLAY. +``` + + The `DATASET` commands simplify use of multiple datasets within a +PSPP session. They allow datasets to be created and destroyed. At any +given time, most PSPP commands work with a single dataset, called the +active dataset. + + The `DATASET NAME` command gives the active dataset the specified name, +or if it already had a name, it renames it. If another dataset already +had the given name, that dataset is deleted. + + The `DATASET ACTIVATE` command selects the named dataset, which must +already exist, as the active dataset. Before switching the active +dataset, any pending transformations are executed, as if `EXECUTE` had +been specified. If the active dataset is unnamed before switching, then +it is deleted and becomes unavailable after switching. + + The `DATASET COPY` command creates a new dataset with the specified +name, whose contents are a copy of the active dataset. Any pending +transformations are executed, as if `EXECUTE` had been specified, before +making the copy. 
If a dataset with the given name already exists, it is +replaced. If the name is the name of the active dataset, then the +active dataset becomes unnamed. + + The `DATASET DECLARE` command creates a new dataset that is initially +"empty," that is, it has no dictionary or data. If a dataset with the +given name already exists, this has no effect. The new dataset can be +used with commands that support output to a dataset, e.g. AGGREGATE +(*note AGGREGATE::). + + The `DATASET CLOSE` command deletes a dataset. If the active dataset +is specified by name, or if `*` is specified, then the active dataset +becomes unnamed. If a different dataset is specified by name, then it +is deleted and becomes unavailable. Specifying `ALL` deletes all datasets +except for the active dataset, which becomes unnamed. + + The `DATASET DISPLAY` command lists all the currently defined datasets. + + Many `DATASET` commands accept an optional `WINDOW` subcommand. In the +PSPPIRE GUI, the value given for this subcommand influences how the +dataset's window is displayed. Outside the GUI, the `WINDOW` subcommand +has no effect. The valid values are: + +* `ASIS` + Do not change how the window is displayed. This is the default for + `DATASET NAME` and `DATASET ACTIVATE`. + +* `FRONT` + Raise the dataset's window to the top. Make it the default dataset + for running syntax. + +* `MINIMIZED` + Display the window "minimized" to an icon. Prefer other datasets + for running syntax. This is the default for `DATASET COPY` and + `DATASET DECLARE`. + +* `HIDDEN` + Hide the dataset's window. Prefer other datasets for running + syntax. diff --git a/rust/doc/src/commands/data-io/end-case.md b/rust/doc/src/commands/data-io/end-case.md new file mode 100644 index 0000000000..5c771faeec --- /dev/null +++ b/rust/doc/src/commands/data-io/end-case.md @@ -0,0 +1,9 @@ +# END CASE + +``` +END CASE. +``` + +`END CASE` is used only within `INPUT PROGRAM` to output the current +case. *Note INPUT PROGRAM::, for details. 
+ diff --git a/rust/doc/src/commands/data-io/end-file.md b/rust/doc/src/commands/data-io/end-file.md new file mode 100644 index 0000000000..e2d9b29af4 --- /dev/null +++ b/rust/doc/src/commands/data-io/end-file.md @@ -0,0 +1,9 @@ +# END FILE + +``` +END FILE. +``` + +`END FILE` is used only within `INPUT PROGRAM` to terminate the +current input program. See [`INPUT PROGRAM`](input-program.md). + diff --git a/rust/doc/src/commands/data-io/file-handle.md b/rust/doc/src/commands/data-io/file-handle.md new file mode 100644 index 0000000000..909bfe07e4 --- /dev/null +++ b/rust/doc/src/commands/data-io/file-handle.md @@ -0,0 +1,184 @@ +# FILE HANDLE + +## Syntax Overview + +For text files: + +``` +FILE HANDLE HANDLE_NAME + /NAME='FILE_NAME' + [/MODE=CHARACTER] + [/ENDS={CR,CRLF}] + [/TABWIDTH=TAB_WIDTH] + [ENCODING='ENCODING'] +``` + +For binary files in native encoding with fixed-length records: +``` +FILE HANDLE HANDLE_NAME + /NAME='FILE_NAME' + /MODE=IMAGE + [/LRECL=REC_LEN] + [ENCODING='ENCODING'] +``` + +For binary files in native encoding with variable-length records: +``` +FILE HANDLE HANDLE_NAME + /NAME='FILE_NAME' + /MODE=BINARY + [/LRECL=REC_LEN] + [ENCODING='ENCODING'] +``` + +For binary files encoded in EBCDIC: +``` +FILE HANDLE HANDLE_NAME + /NAME='FILE_NAME' + /MODE=360 + /RECFORM={FIXED,VARIABLE,SPANNED} + [/LRECL=REC_LEN] + [ENCODING='ENCODING'] +``` + +## Details + + Use `FILE HANDLE` to associate a file handle name with a file and its +attributes, so that later commands can refer to the file by its handle +name. Names of text files can be specified directly on commands that +access files, so that `FILE HANDLE` is only needed when a file is not an +ordinary file containing lines of text. However, `FILE HANDLE` may be +used even for text files, and it may be easier to specify a file's name +once and later refer to it by an abstract handle. + +Specify the file handle name as the identifier immediately following +the `FILE HANDLE` command name. 
The identifier `INLINE` is reserved +for representing data embedded in the syntax file (see [BEGIN +DATA](begin-data.md)). The file handle name must not already have been +used in a previous invocation of `FILE HANDLE`, unless it has been +closed with [`CLOSE FILE HANDLE`](close-file-handle.md). + +The effect and syntax of `FILE HANDLE` depend on the selected `MODE`: + + - In `CHARACTER` mode, the default, the data file is read as a text + file. Each text line is read as one record. + + In `CHARACTER` mode only, tabs are expanded to spaces by input + programs, except by `DATA LIST FREE` with explicitly specified + delimiters. Each tab is 4 characters wide by default, but `TABWIDTH` + (a PSPP extension) may be used to specify an alternate width. Use + a `TABWIDTH` of 0 to suppress tab expansion. + + A file written in `CHARACTER` mode by default uses the line ends of + the system on which PSPP is running, that is, on Windows, the + default is CR LF line ends, and on other systems the default is LF + only. Specify `ENDS` as `CR` or `CRLF` to override the default. PSPP + reads files using either convention on any kind of system, + regardless of `ENDS`. + + - In `IMAGE` mode, the data file is treated as a series of fixed-length + binary records. `LRECL` should be used to specify the record length + in bytes, with a default of 1024. On input, it is an error if an + `IMAGE` file's length is not an integer multiple of the record length. + On output, each record is padded with spaces or truncated, if + necessary, to make it exactly the correct length. + + - In `BINARY` mode, the data file is treated as a series of + variable-length binary records. `LRECL` may be specified, but + its value is ignored. The data for each record is both preceded + and followed by a 32-bit signed integer in little-endian byte + order that specifies the length of the record. 
(This redundancy + permits records in these files to be efficiently read in reverse + order, although PSPP always reads them in forward order.) The + length does not include either integer. + + - Mode `360` reads and writes files in formats first used for tapes + in the 1960s on IBM mainframe operating systems and still + supported today by the modern successors of those operating + systems. For more information, see `OS/400 Tape and Diskette + Device Programming`, available on IBM's website. + + Alphanumeric data in mode `360` files are encoded in EBCDIC. PSPP + translates EBCDIC to or from the host's native format as necessary + on input or output, using an ASCII/EBCDIC translation that is + one-to-one, so that a "round trip" from ASCII to EBCDIC back to + ASCII, or vice versa, always yields exactly the original data. + + The `RECFORM` subcommand is required in mode `360`. The precise + file format depends on its setting: + + * `F` + `FIXED` + This record format is equivalent to `IMAGE` mode, except for + EBCDIC translation. + + IBM documentation calls this `*F` (fixed-length, deblocked) + format. + + * `V` + `VARIABLE` + The file comprises a sequence of zero or more variable-length + blocks. Each block begins with a 4-byte "block descriptor + word" (BDW). The first two bytes of the BDW are an unsigned + integer in big-endian byte order that specifies the length of + the block, including the BDW itself. The other two bytes of + the BDW are ignored on input and written as zeros on output. + + Following the BDW, the remainder of each block is a sequence + of one or more variable-length records, each of which in turn + begins with a 4-byte "record descriptor word" (RDW) that has + the same format as the BDW. Following the RDW, the remainder + of each record is the record data. 
+ + The maximum length of a record in `VARIABLE` mode is 65,527 + bytes: 65,535 bytes (the maximum value of a 16-bit unsigned + integer), minus 4 bytes for the BDW, minus 4 bytes for the + RDW. + + In mode `VARIABLE`, `LRECL` specifies a maximum, not a fixed, + record length, in bytes. The default is 8,192. + + IBM documentation calls this `*VB` (variable-length, blocked, + unspanned) format. + + * `VS` + `SPANNED` + This format is like `VARIABLE`, except that logical records may + be split among multiple physical records (called "segments") or + blocks. In `SPANNED` mode, the third byte of each RDW is + called the segment control character (SCC). Odd SCC values + cause the segment to be appended to a record buffer maintained + in memory; even values also append the segment and then flush + its contents to the input procedure. Canonically, SCC value 0 + designates a record not spanned among multiple segments, and + values 1 through 3 designate the first segment, the last + segment, or an intermediate segment, respectively, within a + multi-segment record. The record buffer is also flushed at end + of file regardless of the final record's SCC. + + The maximum length of a logical record in `SPANNED` mode is + limited only by memory available to PSPP. Segments are + limited to 65,527 bytes, as in `VARIABLE` mode. + + This format is similar to what IBM documentation calls `*VS` + (variable-length, deblocked, spanned) format. + + In mode `360`, fields of type `A` that extend beyond the end of a + record read from disk are padded with spaces in the host's native + character set, which are then translated from EBCDIC to the + native character set. Thus, when the host's native character set + is based on ASCII, these fields are effectively padded with + character `X'80'`. This wart is implemented for compatibility. + + The `NAME` subcommand specifies the name of the file associated with +the handle. 
It is required in all modes but `SCRATCH` mode, in which its +use is forbidden. + + The `ENCODING` subcommand specifies the encoding of text in the +file. For reading text files in `CHARACTER` mode, all of the forms +described for `ENCODING` on the `INSERT` command are supported (*note +INSERT::). For reading in other file-based modes, encoding +autodetection is not supported; if the specified encoding requests +autodetection then the default encoding is used. This is also true +when a file handle is used for writing a file in any mode. + diff --git a/rust/doc/src/commands/data-io/index.md b/rust/doc/src/commands/data-io/index.md new file mode 100644 index 0000000000..b331129787 --- /dev/null +++ b/rust/doc/src/commands/data-io/index.md @@ -0,0 +1,14 @@ +# Data Input and Output + +Data are the focus of the PSPP language. Each datum belongs to a “case” +(also called an “observation”). Each case represents an individual or +"experimental unit". For example, in the results of a survey, the names +of the respondents, their sex, age, etc. and their responses are all +data, and the data pertaining to a single respondent is a case. This +chapter examines the PSPP commands for defining variables and reading +and writing data. There are alternative commands to read data from +predefined sources such as system files or databases (*Note GET DATA: +GET.) + +> These commands tell PSPP how to read data, but the data will +not actually be read until a procedure is executed. diff --git a/rust/doc/src/commands/data-io/input-program.md b/rust/doc/src/commands/data-io/input-program.md new file mode 100644 index 0000000000..8a31d14095 --- /dev/null +++ b/rust/doc/src/commands/data-io/input-program.md @@ -0,0 +1,154 @@ +# INPUT PROGRAM + +``` +INPUT PROGRAM. +... input commands ... +END INPUT PROGRAM. +``` + + `INPUT PROGRAM`...`END INPUT PROGRAM` specifies a complex input +program. 
By placing data input commands within `INPUT PROGRAM`, PSPP +programs can take advantage of more complex file structures than +available with only `DATA LIST`. + + The first sort of extended input program is to simply put multiple +`DATA LIST` commands within the `INPUT PROGRAM`. This will cause all of +the data files to be read in parallel. Input will stop when end of file +is reached on any of the data files. + + Transformations, such as conditional and looping constructs, can also +be included within `INPUT PROGRAM`. These can be used to combine input +from several data files in more complex ways. However, input will still +stop when end of file is reached on any of the data files. + + To prevent `INPUT PROGRAM` from terminating at the first end of +file, use the `END` subcommand on `DATA LIST`. This subcommand takes +a variable name, which should be a numeric [scratch +variable](../../language/datasets/scratch-variables.md). (It need not +be a scratch variable but otherwise the results can be surprising.) +The value of this variable is set to 0 when reading the data file, or +1 when end of file is encountered. + + Two additional commands are useful in conjunction with `INPUT +PROGRAM`. `END CASE` is the first. Normally each loop through the +`INPUT PROGRAM` structure produces one case. `END CASE` controls +exactly when cases are output. When `END CASE` is used, looping from +the end of `INPUT PROGRAM` to the beginning does not cause a case to be +output. + + `END FILE` is the second. When the `END` subcommand is used on `DATA +LIST`, there is no way for the `INPUT PROGRAM` construct to stop +looping, so an infinite loop results. `END FILE`, when executed, stops +the flow of input data and passes out of the `INPUT PROGRAM` structure. + + `INPUT PROGRAM` must contain at least one `DATA LIST` or `END FILE` +command. 
+ +## Example 1: Read two files in parallel to the end of the shorter + +The following example reads variable `X` from file `a.txt` and +variable `Y` from file `b.txt`. If one file is shorter than the other +then the extra data in the longer file is ignored. + +``` +INPUT PROGRAM. + DATA LIST NOTABLE FILE='a.txt'/X 1-10. + DATA LIST NOTABLE FILE='b.txt'/Y 1-10. +END INPUT PROGRAM. +LIST. +``` + +## Example 2: Read two files in parallel, supplementing the shorter + +The following example also reads variable `X` from `a.txt` and +variable `Y` from `b.txt`. If one file is shorter than the other then +it continues reading the longer to its end, setting the other variable +to system-missing. + +``` +INPUT PROGRAM. + NUMERIC #A #B. + + DO IF NOT #A. + DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10. + END IF. + DO IF NOT #B. + DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10. + END IF. + DO IF #A AND #B. + END FILE. + END IF. + END CASE. +END INPUT PROGRAM. +LIST. +``` + +## Example 3: Concatenate two files (version 1) + +The following example reads data from file `a.txt`, then from `b.txt`, +and concatenates them into a single active dataset. + +``` +INPUT PROGRAM. + NUMERIC #A #B. + + DO IF #A. + DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10. + DO IF #B. + END FILE. + ELSE. + END CASE. + END IF. + ELSE. + DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10. + DO IF NOT #A. + END CASE. + END IF. + END IF. +END INPUT PROGRAM. +LIST. +``` + +## Example 4: Concatenate two files (version 2) + +This is another way to do the same thing as Example 3. + +``` +INPUT PROGRAM. + NUMERIC #EOF. + + LOOP IF NOT #EOF. + DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10. + DO IF NOT #EOF. + END CASE. + END IF. + END LOOP. + + COMPUTE #EOF = 0. + LOOP IF NOT #EOF. + DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10. + DO IF NOT #EOF. + END CASE. + END IF. + END LOOP. + + END FILE. +END INPUT PROGRAM. +LIST. 
+``` + +## Example 5: Generate random variates + +The following example creates a dataset that consists of 50 random +variates between 0 and 10. + +``` +INPUT PROGRAM. + LOOP #I=1 TO 50. + COMPUTE X=UNIFORM(10). + END CASE. + END LOOP. + END FILE. +END INPUT PROGRAM. +LIST /FORMAT=NUMBERED. +``` diff --git a/rust/doc/src/commands/data-io/list.md b/rust/doc/src/commands/data-io/list.md new file mode 100644 index 0000000000..d1eb321715 --- /dev/null +++ b/rust/doc/src/commands/data-io/list.md @@ -0,0 +1,33 @@ +# LIST + +``` +LIST + /VARIABLES=VAR_LIST + /CASES=FROM START_INDEX TO END_INDEX BY INCR_INDEX + /FORMAT={UNNUMBERED,NUMBERED} {WRAP,SINGLE} +``` + + The `LIST` procedure prints the values of specified variables to the +listing file. + + The `VARIABLES` subcommand specifies the variables whose values are +to be printed. Keyword `VARIABLES` is optional. If the `VARIABLES` +subcommand is omitted then all variables in the active dataset are +printed. + + The `CASES` subcommand can be used to specify a subset of cases to be +printed. Specify `FROM` and the case number of the first case to print, +`TO` and the case number of the last case to print, and `BY` and the +number of cases to advance between printing cases, or any subset of +those settings. If `CASES` is not specified then all cases are printed. + + The `FORMAT` subcommand can be used to change the output format. +`NUMBERED` will print case numbers along with each case; `UNNUMBERED`, +the default, causes the case numbers to be omitted. The `WRAP` and +`SINGLE` settings are currently not used. + + Case numbers start from 1. They are counted after all +transformations have been considered. + + `LIST` is a procedure. It causes the data to be read. + diff --git a/rust/doc/src/commands/data-io/new-file.md b/rust/doc/src/commands/data-io/new-file.md new file mode 100644 index 0000000000..e287a1ee02 --- /dev/null +++ b/rust/doc/src/commands/data-io/new-file.md @@ -0,0 +1,9 @@ +# NEW FILE + +``` +NEW FILE.
+``` + +The `NEW FILE` command clears the dictionary and data from the current +active dataset. + diff --git a/rust/doc/src/commands/data-io/print-eject.md b/rust/doc/src/commands/data-io/print-eject.md new file mode 100644 index 0000000000..04bf38ba37 --- /dev/null +++ b/rust/doc/src/commands/data-io/print-eject.md @@ -0,0 +1,33 @@ +# PRINT EJECT + +``` +PRINT EJECT + OUTFILE='FILE_NAME' + RECORDS=N_LINES + {NOTABLE,TABLE} + /[LINE_NO] ARG... + +ARG takes one of the following forms: + 'STRING' [START-END] + VAR_LIST START-END [TYPE_SPEC] + VAR_LIST (FORTRAN_SPEC) + VAR_LIST * +``` + +`PRINT EJECT` advances to the beginning of a new output page in the +listing file or output file. It can also output data in the same way as +`PRINT`. + +All `PRINT EJECT` subcommands are optional. + +Without `OUTFILE`, `PRINT EJECT` ejects the current page in the +listing file, then it produces other output, if any is specified. + +With `OUTFILE`, `PRINT EJECT` writes its output to the specified +file. The first line of output is written with `1` inserted in the +first column. Commonly, this is the only line of output. If additional +lines of output are specified, these additional lines are written with a +space inserted in the first column, as with `PRINT`. + +See [PRINT](print.md) for more information on syntax and usage. + diff --git a/rust/doc/src/commands/data-io/print-space.md b/rust/doc/src/commands/data-io/print-space.md new file mode 100644 index 0000000000..71aa6107b4 --- /dev/null +++ b/rust/doc/src/commands/data-io/print-space.md @@ -0,0 +1,22 @@ +# PRINT SPACE + +``` +PRINT SPACE [OUTFILE='file_name'] [ENCODING='ENCODING'] [n_lines]. +``` + +`PRINT SPACE` prints one or more blank lines to an output file. + +The `OUTFILE` subcommand is optional. It may be used to direct output +to a file specified by file name as a string or [file +handle](../../language/files/file-handles.md). If `OUTFILE` is not +specified then output is directed to the listing file. 
+ +The `ENCODING` subcommand may only be used if `OUTFILE` is also used. +It specifies the character encoding of the file. *Note INSERT::, for +information on supported encodings. + +`n_lines` is also optional. If present, it is an +[expression](../../language/expressions/index.md) for the number of +blank lines to be printed. The expression must evaluate to a +nonnegative value. + diff --git a/rust/doc/src/commands/data-io/print.md b/rust/doc/src/commands/data-io/print.md new file mode 100644 index 0000000000..d80f350d20 --- /dev/null +++ b/rust/doc/src/commands/data-io/print.md @@ -0,0 +1,68 @@ +# PRINT + +``` +PRINT + [OUTFILE='FILE_NAME'] + [RECORDS=N_LINES] + [{NOTABLE,TABLE}] + [ENCODING='ENCODING'] + [/[LINE_NO] ARG...] + +ARG takes one of the following forms: + 'STRING' [START] + VAR_LIST START-END [TYPE_SPEC] + VAR_LIST (FORTRAN_SPEC) + VAR_LIST * +``` + + The `PRINT` transformation writes variable data to the listing file +or an output file. `PRINT` is executed when a procedure causes the data +to be read. Follow `PRINT` by `EXECUTE` to print variable data without +invoking a procedure (*note EXECUTE::). + + All `PRINT` subcommands are optional. If no strings or variables are +specified, `PRINT` outputs a single blank line. + + The `OUTFILE` subcommand specifies the file to receive the output. +The file may be a file name as a string or a [file +handle](../../language/files/file-handles.md). If `OUTFILE` is not +present then output is sent to PSPP's output listing file. When +`OUTFILE` is present, the output is written to the file in a plain +text format, with a space inserted at beginning of each output line, +even lines that otherwise would be blank. + + The `ENCODING` subcommand may only be used if the `OUTFILE` +subcommand is also used. It specifies the character encoding of the +file. *Note INSERT::, for information on supported encodings. + + The `RECORDS` subcommand specifies the number of lines to be output. 
+The number of lines may optionally be surrounded by parentheses. + + `TABLE` will cause the `PRINT` command to output a table to the +listing file that describes what it will print to the output file. +`NOTABLE`, the default, suppresses this output table. + + Introduce the strings and variables to be printed with a slash (`/`). +Optionally, the slash may be followed by a number indicating which +output line is specified. In the absence of this line number, the next +line number is specified. Multiple lines may be specified using +multiple slashes with the intended output for a line following its +respective slash. + + Literal strings may be printed. Specify the string itself. +Optionally the string may be followed by a column number, specifying the +column on the line where the string should start. Otherwise, the string +is printed at the current position on the line. + + Variables to be printed can be specified in the same ways as +available for [`DATA LIST FIXED`](data-list.md#data-list-fixed). In addition, +a variable list may be followed by an asterisk (`*`), which indicates +that the variables should be printed in their dictionary print formats, +separated by spaces. A variable list followed by a slash or the end of +command is interpreted in the same way. + + If a FORTRAN type specification is used to move backwards on the +current line, then text is written at that point on the line, the line +is truncated to that length, although additional text being added will +again extend the line to that length. 
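+
+As a minimal sketch of the argument forms described above (the
+variable names `ID` and `SCORE` and the inline data are hypothetical,
+chosen only for illustration), the following should print a literal
+string starting at column 1, followed by two variables in their
+dictionary print formats:
+
+```
+DATA LIST LIST NOTABLE /ID (F2.0) SCORE (F4.1).
+BEGIN DATA.
+1 87.5
+2 92.0
+END DATA.
+
+PRINT /'Case' 1 ID SCORE *.
+EXECUTE.
+```
+
+`EXECUTE` appears here only because no procedure follows `PRINT`; any
+procedure that reads the data, such as `LIST`, would trigger the
+output as well.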
+ diff --git a/rust/doc/src/commands/data-io/repeating-data.md b/rust/doc/src/commands/data-io/repeating-data.md new file mode 100644 index 0000000000..94c4b21236 --- /dev/null +++ b/rust/doc/src/commands/data-io/repeating-data.md @@ -0,0 +1,79 @@ +# REPEATING DATA + +``` +REPEATING DATA + /STARTS=START-END + /OCCURS=N_OCCURS + /FILE='FILE_NAME' + /LENGTH=LENGTH + /CONTINUED[=CONT_START-CONT_END] + /ID=ID_START-ID_END=ID_VAR + /{TABLE,NOTABLE} + /DATA=VAR_SPEC... + +where each VAR_SPEC takes one of the forms + VAR_LIST START-END [TYPE_SPEC] + VAR_LIST (FORTRAN_SPEC) +``` + +`REPEATING DATA` parses groups of data repeating in a uniform format, +possibly with several groups on a single line. Each group of data +corresponds with one case. `REPEATING DATA` may only be used within +[`INPUT PROGRAM`](input-program.md). When used with [`DATA +LIST`](data-list.md), it can be used to parse groups of cases that +share a subset of variables but differ in their other data. + +The `STARTS` subcommand is required. Specify a range of columns, +using literal numbers or numeric variable names. This range specifies +the columns on the first line that are used to contain groups of data. +The ending column is optional. If it is not specified, then the +record width of the input file is used. For the [inline +file](begin-data.md), this is 80 columns; for a file with fixed record +widths it is the record width; for other files it is 1024 characters +by default. + +The `OCCURS` subcommand is required. It must be a number or the name +of a numeric variable. Its value is the number of groups present in the +current record. + +The `DATA` subcommand is required. It must be the last subcommand +specified. It is used to specify the data present within each +repeating group. Column numbers are specified relative to the +beginning of a group at column 1. Data is specified in the same way +as with [`DATA LIST FIXED`](data-list.md#data-list-fixed). + +All other subcommands are optional. 
+ +`FILE` specifies the file to read, either a file name as a string or a +[file handle](../../language/files/file-handles.md). If `FILE` is not +present then the default is the last file handle used on the most +recent `DATA LIST` command. + +By default `REPEATING DATA` will output a table describing how it +will parse the input data. Specifying `NOTABLE` will disable this +behavior; specifying `TABLE` will explicitly enable it. + +The `LENGTH` subcommand specifies the length in characters of each +group. If it is not present then the length is inferred from the `DATA` +subcommand. `LENGTH` may be a number or a variable name. + +Normally all the data groups are expected to be present on a single +line. Use the `CONTINUED` subcommand to indicate that data can be +continued onto additional lines. If data on continuation lines starts +at the left margin and continues through the entire field width, no +column specifications are necessary on `CONTINUED`. Otherwise, specify +the possible range of columns in the same way as on `STARTS`. + +When data groups are continued from line to line, it is easy for +cases to get out of sync through careless hand editing. The `ID` +subcommand allows a case identifier to be present on each line of +repeating data groups. `REPEATING DATA` will check for the same +identifier on each line and report mismatches. Specify the range of +columns that the identifier will occupy, followed by an equals sign +(`=`) and the identifier variable name. The variable must already have +been declared with `NUMERIC` or another command. + +`REPEATING DATA` should be the last command given within an [`INPUT +PROGRAM`](input-program.md). It should not be enclosed within a +`LOOP` structure (*note LOOP::). Use `DATA LIST` before, not after, +`REPEATING DATA`.
diff --git a/rust/doc/src/commands/data-io/reread.md b/rust/doc/src/commands/data-io/reread.md new file mode 100644 index 0000000000..6fec5e8cc1 --- /dev/null +++ b/rust/doc/src/commands/data-io/reread.md @@ -0,0 +1,30 @@ +# REREAD + +``` +REREAD [FILE=handle] [COLUMN=column] [ENCODING='ENCODING']. +``` + +The `REREAD` transformation allows the previous input line in a data +file already processed by `DATA LIST` or another input command to be +re-read for further processing. + +The `FILE` subcommand, which is optional, is used to specify the file +to have its line re-read. The file must be specified as the name of a +[file handle](../../language/files/file-handles.md). If `FILE` is not +specified then the file specified on the most recent `DATA LIST` +command is assumed. + +By default, the line re-read is re-read in its entirety. With the +`COLUMN` subcommand, a prefix of the line can be exempted from +re-reading. Specify an +[expression](../../language/expressions/index.md) evaluating to the +first column that should be included in the re-read line. Columns are +numbered from 1 at the left margin. + +The `ENCODING` subcommand may only be used if the `FILE` subcommand +is also used. It specifies the character encoding of the file. *Note +INSERT::, for information on supported encodings. + +Issuing `REREAD` multiple times will not back up in the data file. +Instead, it will re-read the same line multiple times. + diff --git a/rust/doc/src/commands/data-io/write.md b/rust/doc/src/commands/data-io/write.md new file mode 100644 index 0000000000..aa52ee0dfb --- /dev/null +++ b/rust/doc/src/commands/data-io/write.md @@ -0,0 +1,34 @@ +# WRITE + +``` +WRITE + OUTFILE='FILE_NAME' + RECORDS=N_LINES + {NOTABLE,TABLE} + /[LINE_NO] ARG... + +ARG takes one of the following forms: + 'STRING' [START-END] + VAR_LIST START-END [TYPE_SPEC] + VAR_LIST (FORTRAN_SPEC) + VAR_LIST * +``` + +`WRITE` writes text or binary data to an output file. 
`WRITE` differs +from [`PRINT`](print.md) in only a few ways: + +- `WRITE` uses write formats by default, whereas `PRINT` uses print + formats. + +- `PRINT` inserts a space between variables unless a format is + explicitly specified, but `WRITE` never inserts space between + variables in output. + +- `PRINT` inserts a space at the beginning of each line that it writes + to an output file (and `PRINT EJECT` inserts `1` at the beginning of + each line that should begin a new page), but `WRITE` does not. + +- `PRINT` outputs the system-missing value according to its specified + output format, whereas `WRITE` outputs the system-missing value as a + field filled with spaces. Binary formats are an exception. + diff --git a/rust/doc/src/language/files/file-handles.md b/rust/doc/src/language/files/file-handles.md index d987801afc..a82f033add 100644 --- a/rust/doc/src/language/files/file-handles.md +++ b/rust/doc/src/language/files/file-handles.md @@ -11,14 +11,14 @@ read data over the network using a program such as `curl` (e.g. `GET from a file using a program such as `zcat` (e.g. `GET '|zcat mydata.sav.gz'`), and for many other purposes. - PSPP also supports declaring named file handles with the `FILE -HANDLE` command. This command associates an identifier of your choice -(the file handle's name) with a file. Later, the file handle name can -be substituted for the name of the file. When PSPP syntax accesses a -file multiple times, declaring a named file handle simplifies updating -the syntax later to use a different file. Use of `FILE HANDLE` is also -required to read data files in binary formats. *Note FILE HANDLE::, for -more information. + PSPP also supports declaring named file handles with the [`FILE +HANDLE`](../../commands/data-io/file-handle.md) command. This command +associates an identifier of your choice (the file handle's name) with +a file. Later, the file handle name can be substituted for the name +of the file. 
When PSPP syntax accesses a file multiple times, +declaring a named file handle simplifies updating the syntax later to +use a different file. Use of `FILE HANDLE` is also required to read +data files in binary formats. In some circumstances, PSPP must distinguish whether a file handle refers to a system file or a portable file. When this is necessary to @@ -29,10 +29,11 @@ file's name: if it ends in `.por` (with any capitalization), then PSPP writes a portable file; otherwise, PSPP writes a system file. `INLINE` is reserved as a file handle name. It refers to the "data -file" embedded into the syntax file between `BEGIN DATA` and `END -DATA`. *Note BEGIN DATA::, for more information. +file" embedded into the syntax file between [`BEGIN DATA` and `END +DATA`](../../commands/data-io/begin-data.md). The file to which a file handle refers may be reassigned on a later -`FILE HANDLE` command if it is first closed using `CLOSE FILE HANDLE`. -*Note CLOSE FILE HANDLE::, for more information. +`FILE HANDLE` command if it is first closed using [`CLOSE FILE +HANDLE`](../../commands/data-io/close-file-handle.md). + -- 2.30.2