# Command Syntax
-- [Working with Text Files](commands/data-io/index.md)
- - [BEGIN DATA…END DATA](commands/data-io/begin-data.md)
- - [CLOSE FILE HANDLE](commands/data-io/close-file-handle.md)
- - [DATAFILE ATTRIBUTE](commands/data-io/datafile-attribute.md)
- - [DATASET commands](commands/data-io/dataset.md)
- - [DATA LIST](commands/data-io/data-list.md)
- - [END CASE](commands/data-io/end-case.md)
- - [END FILE](commands/data-io/end-file.md)
- - [FILE HANDLE](commands/data-io/file-handle.md)
- - [INPUT PROGRAM…END INPUT PROGRAM](commands/data-io/input-program.md)
- - [LIST](commands/data-io/list.md)
- - [NEW FILE](commands/data-io/new-file.md)
- - [PRINT](commands/data-io/print.md)
- - [PRINT EJECT](commands/data-io/print-eject.md)
- - [PRINT SPACE](commands/data-io/print-space.md)
- - [REREAD](commands/data-io/reread.md)
- - [REPEATING DATA](commands/data-io/repeating-data.md)
- - [WRITE](commands/data-io/write.md)
-- [Working with Data Files](commands/spss-io/index.md)
- - [APPLY DICTIONARY](commands/spss-io/apply-dictionary.md)
- - [EXPORT](commands/spss-io/export.md)
- - [GET](commands/spss-io/get.md)
- - [GET DATA](commands/spss-io/get-data.md)
- - [IMPORT](commands/spss-io/import.md)
- - [SAVE](commands/spss-io/save.md)
- - [SAVE DATA COLLECTION](commands/spss-io/save-data-collection.md)
- - [SAVE TRANSLATE](commands/spss-io/save-translate.md)
- - [SYSFILE INFO](commands/spss-io/sysfile-info.md)
- - [XEXPORT](commands/spss-io/xexport.md)
- - [XSAVE](commands/spss-io/xsave.md)
-- [Combining Data Files](commands/combining/index.md)
- - [ADD FILES](commands/combining/add-files.md)
- - [MATCH FILES](commands/combining/match-files.md)
- - [UPDATE](commands/combining/update.md)
-- [Manipulating Variables](commands/variables/index.md)
- - [ADD VALUE LABELS](commands/variables/add-value-labels.md)
- - [DELETE VARIABLES](commands/variables/delete-variables.md)
- - [DISPLAY](commands/variables/display.md)
- - [FORMATS](commands/variables/formats.md)
- - [LEAVE](commands/variables/leave.md)
- - [MISSING VALUES](commands/variables/missing-values.md)
- - [MRSETS](commands/variables/mrsets.md)
- - [NUMERIC](commands/variables/numeric.md)
- - [PRINT FORMATS](commands/variables/print-formats.md)
- - [RENAME VARIABLES](commands/variables/rename-variables.md)
- - [SORT VARIABLES](commands/variables/sort-variables.md)
- - [STRING](commands/variables/string.md)
- - [VALUE LABELS](commands/variables/value-labels.md)
- - [VARIABLE ALIGNMENT](commands/variables/variable-alignment.md)
- - [VARIABLE ATTRIBUTE](commands/variables/variable-attribute.md)
- - [VARIABLE LABELS](commands/variables/variable-labels.md)
- - [VARIABLE LEVEL](commands/variables/variable-level.md)
- - [VARIABLE ROLE](commands/variables/variable-role.md)
- - [VARIABLE WIDTH](commands/variables/variable-width.md)
- - [VECTOR](commands/variables/vector.md)
- - [WRITE FORMATS](commands/variables/write-formats.md)
-- [Transforming Data](commands/data/index.md)
- - [AGGREGATE](commands/data/aggregate.md)
- - [AUTORECODE](commands/data/autorecode.md)
- - [COMPUTE](commands/data/compute.md)
- - [FLIP](commands/data/flip.md)
- - [IF](commands/data/if.md)
- - [RECODE](commands/data/recode.md)
- - [SORT CASES](commands/data/sort-cases.md)
-- [Selecting Data](commands/selection/index.md)
- - [FILTER](commands/selection/filter.md)
- - [N OF CASES](commands/selection/n.md)
- - [SAMPLE](commands/selection/sample.md)
- - [SELECT IF](commands/selection/select-if.md)
- - [SPLIT FILE](commands/selection/split-file.md)
- - [TEMPORARY](commands/selection/temporary.md)
- - [WEIGHT](commands/selection/weight.md)
-- [Conditionals and Loops](commands/control/index.md)
- - [BREAK](commands/control/break.md)
- - [DEFINE…!ENDDEFINE](commands/control/define.md)
- - [DO IF…END IF](commands/control/do-if.md)
- - [DO REPEAT…END REPEAT](commands/control/do-repeat.md)
- - [LOOP…END LOOP](commands/control/loop.md)
-- [Statistics](commands/statistics/index.md)
- - [DESCRIPTIVES](commands/statistics/descriptives.md)
- - [FREQUENCIES](commands/statistics/frequencies.md)
- - [EXAMINE](commands/statistics/examine.md)
- - [GRAPH](commands/statistics/graph.md)
- - [CORRELATIONS](commands/statistics/correlations.md)
- - [CROSSTABS](commands/statistics/crosstabs.md)
- - [CTABLES](commands/statistics/ctables.md)
- - [FACTOR](commands/statistics/factor.md)
- - [GLM](commands/statistics/glm.md)
- - [LOGISTIC REGRESSION](commands/statistics/logistic-regression.md)
- - [MEANS](commands/statistics/means.md)
- - [NPAR TESTS](commands/statistics/npar-tests.md)
- - [T-TEST](commands/statistics/t-test.md)
- - [ONEWAY](commands/statistics/oneway.md)
- - [QUICK CLUSTER](commands/statistics/quick-cluster.md)
- - [RANK](commands/statistics/rank.md)
- - [REGRESSION](commands/statistics/regression.md)
- - [RELIABILITY](commands/statistics/reliability.md)
- - [ROC](commands/statistics/roc.md)
-- [Matrices](commands/matrix/index.md)
- - [MATRIX DATA](commands/matrix/matrix-data.md)
- - [MCONVERT](commands/matrix/mconvert.md)
- - [MATRIX…END MATRIX](commands/matrix/matrix.md)
-- [Utilities](commands/utilities/index.md)
- - [ADD DOCUMENT](commands/utilities/add-document.md)
- - [CACHE](commands/utilities/cache.md)
- - [CD](commands/utilities/cd.md)
- - [COMMENT](commands/utilities/comment.md)
- - [DOCUMENT](commands/utilities/document.md)
- - [DISPLAY DOCUMENTS](commands/utilities/display-documents.md)
- - [DISPLAY FILE LABEL](commands/utilities/display-file-label.md)
- - [DROP DOCUMENTS](commands/utilities/drop-documents.md)
- - [ECHO](commands/utilities/echo.md)
- - [ERASE](commands/utilities/erase.md)
- - [EXECUTE](commands/utilities/execute.md)
- - [FILE LABEL](commands/utilities/file-label.md)
- - [FINISH](commands/utilities/finish.md)
- - [HOST](commands/utilities/host.md)
- - [INCLUDE](commands/utilities/include.md)
- - [INSERT](commands/utilities/insert.md)
- - [OUTPUT](commands/utilities/output.md)
- - [PERMISSIONS](commands/utilities/permissions.md)
- - [PRESERVE…RESTORE](commands/utilities/preserve.md)
- - [SET](commands/utilities/set.md)
- - [SHOW](commands/utilities/show.md)
- - [SUBTITLE](commands/utilities/subtitle.md)
- - [TITLE](commands/utilities/title.md)
+- [Working with Text Files](commands/data-io.md)
+ - [BEGIN DATA…END DATA](commands/begin-data.md)
+ - [CLOSE FILE HANDLE](commands/close-file-handle.md)
+ - [DATAFILE ATTRIBUTE](commands/datafile-attribute.md)
+ - [DATASET commands](commands/dataset.md)
+ - [DATA LIST](commands/data-list.md)
+ - [END CASE](commands/end-case.md)
+ - [END FILE](commands/end-file.md)
+ - [FILE HANDLE](commands/file-handle.md)
+ - [INPUT PROGRAM…END INPUT PROGRAM](commands/input-program.md)
+ - [LIST](commands/list.md)
+ - [NEW FILE](commands/new-file.md)
+ - [PRINT](commands/print.md)
+ - [PRINT EJECT](commands/print-eject.md)
+ - [PRINT SPACE](commands/print-space.md)
+ - [REREAD](commands/reread.md)
+ - [REPEATING DATA](commands/repeating-data.md)
+ - [WRITE](commands/write.md)
+- [Working with Data Files](commands/spss-io.md)
+ - [APPLY DICTIONARY](commands/apply-dictionary.md)
+ - [EXPORT](commands/export.md)
+ - [GET](commands/get.md)
+ - [GET DATA](commands/get-data.md)
+ - [IMPORT](commands/import.md)
+ - [SAVE](commands/save.md)
+ - [SAVE DATA COLLECTION](commands/save-data-collection.md)
+ - [SAVE TRANSLATE](commands/save-translate.md)
+ - [SYSFILE INFO](commands/sysfile-info.md)
+ - [XEXPORT](commands/xexport.md)
+ - [XSAVE](commands/xsave.md)
+- [Combining Data Files](commands/combining.md)
+ - [ADD FILES](commands/add-files.md)
+ - [MATCH FILES](commands/match-files.md)
+ - [UPDATE](commands/update.md)
+- [Manipulating Variables](commands/variables.md)
+ - [ADD VALUE LABELS](commands/add-value-labels.md)
+ - [DELETE VARIABLES](commands/delete-variables.md)
+ - [DISPLAY](commands/display.md)
+ - [FORMATS](commands/formats.md)
+ - [LEAVE](commands/leave.md)
+ - [MISSING VALUES](commands/missing-values.md)
+ - [MRSETS](commands/mrsets.md)
+ - [NUMERIC](commands/numeric.md)
+ - [PRINT FORMATS](commands/print-formats.md)
+ - [RENAME VARIABLES](commands/rename-variables.md)
+ - [SORT VARIABLES](commands/sort-variables.md)
+ - [STRING](commands/string.md)
+ - [VALUE LABELS](commands/value-labels.md)
+ - [VARIABLE ALIGNMENT](commands/variable-alignment.md)
+ - [VARIABLE ATTRIBUTE](commands/variable-attribute.md)
+ - [VARIABLE LABELS](commands/variable-labels.md)
+ - [VARIABLE LEVEL](commands/variable-level.md)
+ - [VARIABLE ROLE](commands/variable-role.md)
+ - [VARIABLE WIDTH](commands/variable-width.md)
+ - [VECTOR](commands/vector.md)
+ - [WRITE FORMATS](commands/write-formats.md)
+- [Transforming Data](commands/data.md)
+ - [AGGREGATE](commands/aggregate.md)
+ - [AUTORECODE](commands/autorecode.md)
+ - [COMPUTE](commands/compute.md)
+ - [FLIP](commands/flip.md)
+ - [IF](commands/if.md)
+ - [RECODE](commands/recode.md)
+ - [SORT CASES](commands/sort-cases.md)
+- [Selecting Data](commands/selection.md)
+ - [FILTER](commands/filter.md)
+ - [N OF CASES](commands/n.md)
+ - [SAMPLE](commands/sample.md)
+ - [SELECT IF](commands/select-if.md)
+ - [SPLIT FILE](commands/split-file.md)
+ - [TEMPORARY](commands/temporary.md)
+ - [WEIGHT](commands/weight.md)
+- [Conditionals and Loops](commands/control.md)
+ - [BREAK](commands/break.md)
+ - [DEFINE…!ENDDEFINE](commands/define.md)
+ - [DO IF…END IF](commands/do-if.md)
+ - [DO REPEAT…END REPEAT](commands/do-repeat.md)
+ - [LOOP…END LOOP](commands/loop.md)
+- [Statistics](commands/statistics.md)
+ - [DESCRIPTIVES](commands/descriptives.md)
+ - [FREQUENCIES](commands/frequencies.md)
+ - [EXAMINE](commands/examine.md)
+ - [GRAPH](commands/graph.md)
+ - [CORRELATIONS](commands/correlations.md)
+ - [CROSSTABS](commands/crosstabs.md)
+ - [CTABLES](commands/ctables.md)
+ - [FACTOR](commands/factor.md)
+ - [GLM](commands/glm.md)
+ - [LOGISTIC REGRESSION](commands/logistic-regression.md)
+ - [MEANS](commands/means.md)
+ - [NPAR TESTS](commands/npar-tests.md)
+ - [T-TEST](commands/t-test.md)
+ - [ONEWAY](commands/oneway.md)
+ - [QUICK CLUSTER](commands/quick-cluster.md)
+ - [RANK](commands/rank.md)
+ - [REGRESSION](commands/regression.md)
+ - [RELIABILITY](commands/reliability.md)
+ - [ROC](commands/roc.md)
+- [Matrices](commands/matrices.md)
+ - [MATRIX DATA](commands/matrix-data.md)
+ - [MCONVERT](commands/mconvert.md)
+ - [MATRIX…END MATRIX](commands/matrix.md)
+- [Utilities](commands/utilities.md)
+ - [ADD DOCUMENT](commands/add-document.md)
+ - [CACHE](commands/cache.md)
+ - [CD](commands/cd.md)
+ - [COMMENT](commands/comment.md)
+ - [DOCUMENT](commands/document.md)
+ - [DISPLAY DOCUMENTS](commands/display-documents.md)
+ - [DISPLAY FILE LABEL](commands/display-file-label.md)
+ - [DROP DOCUMENTS](commands/drop-documents.md)
+ - [ECHO](commands/echo.md)
+ - [ERASE](commands/erase.md)
+ - [EXECUTE](commands/execute.md)
+ - [FILE LABEL](commands/file-label.md)
+ - [FINISH](commands/finish.md)
+ - [HOST](commands/host.md)
+ - [INCLUDE](commands/include.md)
+ - [INSERT](commands/insert.md)
+ - [OUTPUT](commands/output.md)
+ - [PERMISSIONS](commands/permissions.md)
+ - [PRESERVE…RESTORE](commands/preserve.md)
+ - [SET](commands/set.md)
+ - [SHOW](commands/show.md)
+ - [SUBTITLE](commands/subtitle.md)
+ - [TITLE](commands/title.md)
# Developer Documentation
--- /dev/null
+# ADD DOCUMENT
+
+```
+ADD DOCUMENT
+ 'line one' 'line two' ... 'last line' .
+```
+
+`ADD DOCUMENT` adds one or more lines of descriptive commentary to
+the active dataset. Documents added in this way are saved to system
+files. They can be viewed using `SYSFILE INFO` or `DISPLAY DOCUMENTS`.
+They can be removed from the active dataset with `DROP DOCUMENTS`.
+
+Each line of documentary text must be enclosed in quotation marks, and
+may not be more than 80 bytes long. See also
+[`DOCUMENT`](document.md).
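+
+As a sketch, a session might record provenance notes like this (the
+text of the notes is, of course, arbitrary):
+
+```
+ADD DOCUMENT
+ 'This file was created from the 2019 survey extract.'
+ 'Incomplete interviews were dropped.'.
+DISPLAY DOCUMENTS.
+```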
+
--- /dev/null
+# ADD FILES
+
+```
+ADD FILES
+
+Per input file:
+ /FILE={*,'FILE_NAME'}
+ [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+ [/IN=VAR_NAME]
+ [/SORT]
+
+Once per command:
+ [/BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]...]]
+ [/DROP=VAR_LIST]
+ [/KEEP=VAR_LIST]
+ [/FIRST=VAR_NAME]
+ [/LAST=VAR_NAME]
+ [/MAP]
+```
+
+`ADD FILES` adds cases from multiple input files. The output, which
+replaces the active dataset, consists of all the cases in all of the
+input files.
+
+`ADD FILES` shares the bulk of its syntax with other PSPP commands for
+combining multiple data files (see [Common
+Syntax](combining.md#common-syntax) for details).
+
+When `BY` is not used, the output of `ADD FILES` consists of all the
+cases from the first input file specified, followed by all the cases
+from the second file specified, and so on. When `BY` is used, the
+output is additionally sorted on the `BY` variables.
+
+When `ADD FILES` creates an output case, variables that are not part
+of the input file from which the case was drawn are set to the
+system-missing value for numeric variables or spaces for string
+variables.
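+
+As a minimal sketch, the following interleaves the cases of a second
+file with those of the active dataset, sorted by an identifier (the
+file and variable names are illustrative):
+
+```
+ADD FILES
+ /FILE=* /SORT
+ /FILE='new_cases.sav' /SORT
+ /BY id.
+```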
+
--- /dev/null
+# ADD VALUE LABELS
+
+`ADD VALUE LABELS` has the same syntax and purpose as [`VALUE
+LABELS`](value-labels.md), but it does not clear value labels from the
+variables before adding the ones specified.
+
+```
+ADD VALUE LABELS
+ /VAR_LIST VALUE 'LABEL' [VALUE 'LABEL']...
+```
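+
+For example, the following defines labels for a variable and then
+adds a label for one more value without disturbing the existing ones
+(the variable and values are illustrative):
+
+```
+VALUE LABELS gender 1 'Male' 2 'Female'.
+ADD VALUE LABELS gender 9 'Refused'.
+```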
--- /dev/null
+# AGGREGATE
+
+```
+AGGREGATE
+ [OUTFILE={*,'FILE_NAME',FILE_HANDLE} [MODE={REPLACE,ADDVARIABLES}]]
+ [/MISSING=COLUMNWISE]
+ [/PRESORTED]
+ [/DOCUMENT]
+ [/BREAK=VAR_LIST]
+ /DEST_VAR['LABEL']...=AGR_FUNC(SRC_VARS[, ARGS]...)...
+```
+
+`AGGREGATE` summarizes groups of cases into single cases. It divides
+cases into groups that have the same values for one or more variables
+called "break variables". Several functions are available for
+summarizing case contents.
+
+The `AGGREGATE` syntax consists of subcommands to control its
+behavior, all of which are optional, followed by one or more
+destination variable assignments, each of which uses an aggregation
+function to define how it is calculated.
+
+The `OUTFILE` subcommand, which must be first, names the destination
+for `AGGREGATE` output. It may name a system file by file name or
+[file handle](../language/files/file-handles.md), a
+[dataset](../language/datasets/index.md) by its name, or `*` to
+replace the active dataset. `AGGREGATE` writes its output to this
+file.
+
+With `OUTFILE=*` only, `MODE` may be specified immediately afterward
+with the value `ADDVARIABLES` or `REPLACE`:
+
+- With `REPLACE`, the default, the active dataset is replaced by a
+ new dataset which contains just the break variables and the
+  destination variables.  The new file contains as many cases as there
+ are unique combinations of the break variables.
+
+- With `ADDVARIABLES`, the destination variables are added to those
+ in the existing active dataset. Cases that have the same
+ combination of values in their break variables receive identical
+ values for the destination variables. The number of cases in the
+ active dataset remains unchanged. The data must be sorted on the
+  break variables; that is, `ADDVARIABLES` implies `PRESORTED`.
+
+Without `OUTFILE`, `AGGREGATE` acts as if `OUTFILE=*
+MODE=ADDVARIABLES` were specified.
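+
+As a sketch, the following adds a per-group mean alongside the
+existing cases, sorting on the break variable first since
+`MODE=ADDVARIABLES` requires pre-sorted data (the variable names are
+illustrative):
+
+```
+SORT CASES BY region.
+AGGREGATE OUTFILE=* MODE=ADDVARIABLES
+ /BREAK=region
+ /mean_income=MEAN(income).
+```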
+
+By default, `AGGREGATE` first sorts the data on the break variables.
+If the active dataset is already sorted or grouped by the break
+variables, specify `PRESORTED` to save time. With
+`MODE=ADDVARIABLES`, the data must be pre-sorted.
+
+Specify [`DOCUMENT`](document.md) to copy the documents from the
+active dataset into the aggregate file. Otherwise, the aggregate file
+does not contain any documents, even if the aggregate file replaces
+the active dataset.
+
+Normally, `AGGREGATE` produces a non-missing value whenever there is
+enough non-missing data for the aggregation function in use, that is,
+just one non-missing value or, for the `SD` and `SD.` aggregation
+functions, two non-missing values. Specify `/MISSING=COLUMNWISE` to
+make `AGGREGATE` output a missing value when one or more of the input
+values are missing.
+
+The `BREAK` subcommand is optional but usually present.  On `BREAK`,
+list the variables used to divide the active dataset into groups to be
+summarized.
+
+`AGGREGATE` is particular about the order of subcommands. `OUTFILE`
+must be first, followed by `MISSING`. `PRESORTED` and `DOCUMENT`
+follow `MISSING`, in either order, followed by `BREAK`, then followed
+by aggregation variable specifications.
+
+At least one set of aggregation variables is required. Each set
+comprises a list of aggregation variables, an equals sign (`=`), the
+name of an aggregation function (see the list below), and a list of
+source variables in parentheses. A few aggregation functions do not
+accept source variables, and some aggregation functions expect
+additional arguments after the source variable names.
+
+`AGGREGATE` typically creates aggregation variables with no variable
+label, value labels, or missing values. Their default print and write
+formats depend on the aggregation function used, with details given in
+the table below. A variable label for an aggregation variable may be
+specified just after the variable's name in the aggregation variable
+list.
+
+Each set must have exactly as many source variables as aggregation
+variables. Each aggregation variable receives the results of applying
+the specified aggregation function to the corresponding source variable.
+
+The following aggregation functions may be applied only to numeric
+variables:
+
+* `MEAN(VAR_NAME...)`
+ Arithmetic mean. Limited to numeric values. The default format is
+ `F8.2`.
+
+* `MEDIAN(VAR_NAME...)`
+ The median value. Limited to numeric values. The default format
+ is `F8.2`.
+
+* `SD(VAR_NAME...)`
+ Standard deviation of the mean. Limited to numeric values. The
+ default format is `F8.2`.
+
+* `SUM(VAR_NAME...)`
+ Sum. Limited to numeric values. The default format is `F8.2`.
+
+These aggregation functions may be applied to numeric and string
+variables:
+
+* `CGT(VAR_NAME..., VALUE)`
+ `CLT(VAR_NAME..., VALUE)`
+ `CIN(VAR_NAME..., LOW, HIGH)`
+ `COUT(VAR_NAME..., LOW, HIGH)`
+ Total weight of cases greater than or less than `VALUE` or inside or
+ outside the closed range `[LOW,HIGH]`, respectively. The default
+ format is `F5.3`.
+
+* `FGT(VAR_NAME..., VALUE)`
+ `FLT(VAR_NAME..., VALUE)`
+ `FIN(VAR_NAME..., LOW, HIGH)`
+ `FOUT(VAR_NAME..., LOW, HIGH)`
+ Fraction of values greater than or less than `VALUE` or inside or
+ outside the closed range `[LOW,HIGH]`, respectively. The default
+ format is `F5.3`.
+
+* `FIRST(VAR_NAME...)`
+ `LAST(VAR_NAME...)`
+ First or last non-missing value, respectively, in break group. The
+ aggregation variable receives the complete dictionary information
+ from the source variable. The sort performed by `AGGREGATE` (and
+ by `SORT CASES`) is stable. This means that the first (or last)
+ case with particular values for the break variables before sorting
+ is also the first (or last) case in that break group after sorting.
+
+* `MIN(VAR_NAME...)`
+ `MAX(VAR_NAME...)`
+ Minimum or maximum value, respectively. The aggregation variable
+ receives the complete dictionary information from the source
+ variable.
+
+* `N(VAR_NAME...)`
+ `NMISS(VAR_NAME...)`
+ Total weight of non-missing or missing values, respectively. The
+ default format is `F7.0` if weighting is not enabled, `F8.2` if it
+ is (see [`WEIGHT`](weight.md)).
+
+* `NU(VAR_NAME...)`
+ `NUMISS(VAR_NAME...)`
+ Count of non-missing or missing values, respectively, ignoring case
+ weights. The default format is `F7.0`.
+
+* `PGT(VAR_NAME..., VALUE)`
+ `PLT(VAR_NAME..., VALUE)`
+ `PIN(VAR_NAME..., LOW, HIGH)`
+ `POUT(VAR_NAME..., LOW, HIGH)`
+ Percentage between 0 and 100 of values greater than or less than
+ `VALUE` or inside or outside the closed range `[LOW,HIGH]`,
+ respectively. The default format is `F5.1`.
+
+These aggregation functions do not accept source variables:
+
+* `N`
+ Total weight of cases aggregated to form this group. The default
+ format is `F7.0` if weighting is not enabled, `F8.2` if it is (see
+ [`WEIGHT`](weight.md)).
+
+* `NU`
+ Count of cases aggregated to form this group, ignoring case
+ weights. The default format is `F7.0`.
+
+Aggregation functions compare string values in terms of Unicode
+character codes.
+
+The aggregation functions listed above exclude all user-missing values
+from calculations. To include user-missing values, insert a period
+(`.`) at the end of the function name (e.g. `SUM.`).  (Be aware that
+specifying such a function as the last token on a line causes the
+period to be interpreted as the end of the command.)
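+
+For instance, the following computes one total that excludes
+user-missing values of the source variable and one that includes them
+(the variable names are illustrative):
+
+```
+AGGREGATE OUTFILE=* MODE=REPLACE
+ /BREAK=group
+ /total=SUM(x)
+ /total_all=SUM.(x).
+```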
+
+`AGGREGATE` both ignores and cancels the current [`SPLIT
+FILE`](split-file.md) settings.
+
+## Example
+
+The `personnel.sav` dataset provides the occupations and salaries of
+many individuals.  For many purposes, however, such detailed
+information is less interesting than the aggregated statistics of each
+occupation.  Here, the `AGGREGATE` command is used to
+calculate the mean, the median and the standard deviation of each
+occupation.
+
+```
+GET FILE="personnel.sav".
+AGGREGATE OUTFILE=* MODE=REPLACE
+ /BREAK=occupation
+ /occ_mean_salary=MEAN(salary)
+ /occ_median_salary=MEDIAN(salary)
+ /occ_std_dev_salary=SD(salary).
+LIST.
+```
+
+Since we chose the `MODE=REPLACE` option, cases for the individual
+persons are no longer present. They have each been replaced by a
+single case per aggregated value.
+
+```
+ Data List
+┌──────────────────┬───────────────┬─────────────────┬──────────────────┐
+│ occupation │occ_mean_salary│occ_median_salary│occ_std_dev_salary│
+├──────────────────┼───────────────┼─────────────────┼──────────────────┤
+│Artist │ 37836.18│ 34712.50│ 7631.48│
+│Baker │ 45075.20│ 45075.20│ 4411.21│
+│Barrister │ 39504.00│ 39504.00│ .│
+│Carpenter │ 39349.11│ 36190.04│ 7453.40│
+│Cleaner │ 41142.50│ 39647.49│ 14378.98│
+│Cook │ 40357.79│ 43194.00│ 11064.51│
+│Manager │ 46452.14│ 45657.56│ 6901.69│
+│Mathematician │ 34531.06│ 34763.06│ 5267.68│
+│Painter │ 45063.55│ 45063.55│ 15159.67│
+│Payload Specialist│ 34355.72│ 34355.72│ .│
+│Plumber │ 40413.91│ 40410.00│ 4726.05│
+│Scientist │ 36687.07│ 36803.83│ 10873.54│
+│Scrientist │ 42530.65│ 42530.65│ .│
+│Tailor │ 34586.79│ 34586.79│ 3728.98│
+└──────────────────┴───────────────┴─────────────────┴──────────────────┘
+```
+
+Some values for the standard deviation are system-missing, displayed
+as `.`, because there is only one case with the respective occupation.
+
--- /dev/null
+# APPLY DICTIONARY
+
+```
+APPLY DICTIONARY FROM={'FILE_NAME',FILE_HANDLE}.
+```
+
+`APPLY DICTIONARY` applies the variable labels, value labels, and
+missing values taken from a file to corresponding variables in the
+active dataset. In some cases it also updates the weighting variable.
+
+The `FROM` clause is mandatory. Use it to specify a system file or
+portable file's name in single quotes, or a [file handle
+name](../language/files/file-handles.md). The dictionary in the
+file is read, but it does not replace the active dataset's dictionary.
+The file's data is not read.
+
+Only variables with names that exist in both the active dataset and
+the system file are considered. Variables with the same name but
+different types (numeric, string) cause an error message. Otherwise,
+the system file variables' attributes replace those in their matching
+active dataset variables:
+
+- If a system file variable has a variable label, then it replaces
+ the variable label of the active dataset variable. If the system
+ file variable does not have a variable label, then the active
+ dataset variable's variable label, if any, is retained.
+
+- If the system file variable has [variable
+ attributes](variable-attribute.md), then those
+ attributes replace the active dataset variable's variable
+  attributes.  If the system file variable does not have variable
+  attributes, then the active dataset variable's variable attributes,
+  if any, are retained.
+
+- If the active dataset variable is numeric or short string, then
+ value labels and missing values, if any, are copied to the active
+ dataset variable. If the system file variable does not have value
+ labels or missing values, then those in the active dataset
+ variable, if any, are not disturbed.
+
+In addition to properties of variables, some properties of the active
+file dictionary as a whole are updated:
+
+- If the system file has custom attributes (see [DATAFILE
+  ATTRIBUTE](datafile-attribute.md)), then those attributes replace
+  the active dataset's custom attributes.
+
+- If the active dataset has a [weight variable](weight.md), and the
+ system file does not, or if the weighting variable in the system
+ file does not exist in the active dataset, then the active dataset
+ weighting variable, if any, is retained. Otherwise, the weighting
+ variable in the system file becomes the active dataset weighting
+ variable.
+
+`APPLY DICTIONARY` takes effect immediately. It does not read the
+active dataset. The system file is not modified.
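+
+A minimal sketch (the file names are illustrative): copy the labels
+and missing values defined in one system file onto the matching
+variables of the active dataset:
+
+```
+GET FILE='survey_2024.sav'.
+APPLY DICTIONARY FROM='survey_2023.sav'.
+```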
+
--- /dev/null
+# AUTORECODE
+
+```
+AUTORECODE VARIABLES=SRC_VARS INTO DEST_VARS
+ [ /DESCENDING ]
+ [ /PRINT ]
+ [ /GROUP ]
+ [ /BLANK = {VALID, MISSING} ]
+```
+
+The `AUTORECODE` procedure considers the N values that a variable
+takes on and maps them onto values 1...N on a new numeric variable.
+
+Subcommand `VARIABLES` is the only required subcommand and must come
+first. Specify `VARIABLES`, an equals sign (`=`), a list of source
+variables, `INTO`, and a list of target variables.  There must be the
+same number of source and target variables. The target variables must
+not already exist.
+
+`AUTORECODE` ordinarily assigns each increasing non-missing value of a
+source variable (for a string, this is based on character code
+comparisons) to consecutive values of its target variable. For
+example, the smallest non-missing value of the source variable is
+recoded to value 1, the next smallest to 2, and so on. If the source
+variable has user-missing values, they are recoded to consecutive
+values just above the non-missing values. For example, if a source
+variable has seven distinct non-missing values, then the smallest
+missing value would be recoded to 8, the next smallest to 9, and so
+on.
+
+Use `DESCENDING` to reverse the sort order for non-missing values, so
+that the largest non-missing value is recoded to 1, the second-largest
+to 2, and so on. Even with `DESCENDING`, user-missing values are
+still recoded in ascending order just above the non-missing values.
+
+The system-missing value is always recoded to the system-missing
+value in target variables.
+
+If a source value has a value label, then that value label is retained
+for the new value in the target variable. Otherwise, the source value
+itself becomes each new value's label.
+
+Variable labels are copied from the source to target variables.
+
+`PRINT` is currently ignored.
+
+The `GROUP` subcommand is relevant only if more than one variable is
+to be recoded. It causes a single mapping between source and target
+values to be used, instead of one map per variable. With `GROUP`,
+user-missing values are taken from the first source variable that has
+any user-missing values.
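+
+For instance, the following recodes two related string variables with
+a single shared mapping, so that identical answers receive identical
+codes in both target variables (the variable names are illustrative):
+
+```
+AUTORECODE VARIABLES=fav_color1 fav_color2 INTO color1 color2
+ /GROUP.
+```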
+
+If `/BLANK=MISSING` is given, then string variables which contain
+only whitespace are recoded as SYSMIS. If `/BLANK=VALID` is specified
+then they are allocated a value like any other. `/BLANK` is not
+relevant to numeric values. `/BLANK=VALID` is the default.
+
+`AUTORECODE` is a procedure. It causes the data to be read.
+
+## Example
+
+In the file `personnel.sav`, the variable `occupation` is a string
+variable. Except for data of a purely commentary nature, string
+variables are generally a bad idea. One reason is that data entry
+errors are easily overlooked. This has happened in `personnel.sav`;
+one entry which should read "Scientist" has been mistyped as
+"Scrientist". The syntax below shows how to correct this error in the
+`DO IF` clause[^1], which then uses `AUTORECODE` to create a new numeric
+variable which takes recoded values of occupation. Finally, we remove
+the old variable and rename the new variable to the name of the old
+variable:
+
+[^1]: One must use care when correcting such data input errors rather
+than simply marking them as missing. For example, if an occupation
+has been entered "Barister", did the person mean "Barrister" or
+"Barista"?
+
+```
+get file='personnel.sav'.
+
+* Correct a typing error in the original file.
+do if occupation = "Scrientist".
+ compute occupation = "Scientist".
+end if.
+
+autorecode
+ variables = occupation into occ
+ /blank = missing.
+
+* Delete the old variable.
+delete variables occupation.
+
+* Rename the new variable to the old variable's name.
+rename variables (occ = occupation).
+
+* Inspect the new variable.
+display dictionary /variables=occupation.
+```
+
+
+Notice, in the output below, how the new variable has been
+automatically allocated value labels which correspond to the strings
+of the old variable. This means that in future analyses the
+descriptive strings are reported instead of the numeric values.
+
+```
+ Variables
++----------+--------+--------------+-----+-----+---------+----------+---------+
+| | | Measurement | | | | Print | Write |
+|Name |Position| Level | Role|Width|Alignment| Format | Format |
++----------+--------+--------------+-----+-----+---------+----------+---------+
+|occupation| 6|Unknown |Input| 8|Right |F2.0 |F2.0 |
++----------+--------+--------------+-----+-----+---------+----------+---------+
+
+ Value Labels
++---------------+------------------+
+|Variable Value | Label |
++---------------+------------------+
+|occupation 1 |Artist |
+| 2 |Baker |
+| 3 |Barrister |
+| 4 |Carpenter |
+| 5 |Cleaner |
+| 6 |Cook |
+| 7 |Manager |
+| 8 |Mathematician |
+| 9 |Painter |
+| 10 |Payload Specialist|
+| 11 |Plumber |
+| 12 |Scientist |
+| 13 |Tailor |
++---------------+------------------+
+```
--- /dev/null
+# BEGIN DATA…END DATA
+
+```
+BEGIN DATA.
+...
+END DATA.
+```
+
+`BEGIN DATA` and `END DATA` can be used to embed raw ASCII data in a
+PSPP syntax file. [`DATA LIST`](data-list.md) or another input
+procedure must be used before `BEGIN DATA`. `BEGIN DATA` and `END
+DATA` must be used together. `END DATA` must appear by itself on a
+single line, with no leading white space and exactly one space between
+the words `END` and `DATA`.
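+
+A minimal sketch that reads two numeric variables from inline data:
+
+```
+DATA LIST LIST /x y.
+BEGIN DATA.
+1 2
+3 4
+END DATA.
+LIST.
+```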
--- /dev/null
+# BREAK
+
+```
+BREAK.
+```
+
+`BREAK` terminates execution of the innermost currently executing
+`LOOP` construct.
+
+`BREAK` is allowed only inside [`LOOP`...`END LOOP`](loop.md).
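+
+As a sketch, the loop below would ordinarily run 100 times, but
+`BREAK` exits it after the fifth iteration:
+
+```
+COMPUTE n = 0.
+LOOP #i = 1 TO 100.
+ COMPUTE n = n + 1.
+ DO IF #i = 5.
+  BREAK.
+ END IF.
+END LOOP.
+```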
+
--- /dev/null
+# CACHE
+
+```
+CACHE.
+```
+
+This command is accepted, for compatibility, but it has no effect.
+
--- /dev/null
+# CD
+
+```
+CD 'new directory' .
+```
+
+`CD` changes the current directory. The new directory becomes that
+specified by the command.
+
--- /dev/null
+# CLOSE FILE HANDLE
+
+```
+CLOSE FILE HANDLE HANDLE_NAME.
+```
+
+`CLOSE FILE HANDLE` disassociates the name of a [file
+handle](../language/files/file-handles.md) from a given file.  The
+only specification is the name of the handle to close.  Afterward, the
+name may be reused in a new [`FILE HANDLE`](file-handle.md) command.
+
+The file named INLINE, which represents data entered between `BEGIN
+DATA` and `END DATA`, cannot be closed. Attempts to close it with
+`CLOSE FILE HANDLE` have no effect.
+
+`CLOSE FILE HANDLE` is a PSPP extension.
+
--- /dev/null
+# Combining Data Files
+
+This chapter describes commands that allow data from system files,
+portable files, and open datasets to be combined to form a new active
+dataset. These commands can combine data files in the following ways:
+
+- [`ADD FILES`](add-files.md) interleaves or appends the cases from
+ each input file. It is used with input files that have variables in
+ common, but distinct sets of cases.
+
+- [`MATCH FILES`](match-files.md) adds the data together in cases that
+ match across multiple input files. It is used with input files that
+ have cases in common, but different information about each case.
+
+- [`UPDATE`](update.md) updates a master data file from data in a set
+ of transaction files. Each case in a transaction data file modifies
+ a matching case in the primary data file, or it adds a new case if
+ no matching case can be found.
+
+These commands share the majority of their syntax, described below,
+followed by an individual section for each command that describes its
+specific syntax and semantics.
+
+## Common Syntax
+
+```
+Per input file:
+ /FILE={*,'FILE_NAME'}
+ [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+ [/IN=VAR_NAME]
+ [/SORT]
+
+Once per command:
+  /BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]]...
+ [/DROP=VAR_LIST]
+ [/KEEP=VAR_LIST]
+ [/FIRST=VAR_NAME]
+ [/LAST=VAR_NAME]
+ [/MAP]
+```
+
+Each of these commands reads two or more input files and combines
+them. The command's output becomes the new active dataset. None of
+the commands actually change the input files. Therefore, if you want
+the changes to become permanent, you must explicitly save them using
+an appropriate procedure or transformation.
+
+The syntax of each command begins with a specification of the files to
+be read as input. For each input file, specify `FILE` with a system
+file or portable file's name as a string, a
+[dataset](../language/datasets/index.md) or [file
+handle](../language/files/file-handles.md) name, or an asterisk (`*`)
+to use the active dataset as input. Use of portable files on `FILE`
+is a PSPP extension.
+
+At least two `FILE` subcommands must be specified. If the active
+dataset is used as an input source, then `TEMPORARY` must not be in
+effect.
+
+Each `FILE` subcommand may be followed by any number of `RENAME`
+subcommands that specify a parenthesized group or groups of variable
+names as they appear in the input file, followed by those variables'
+new names, separated by an equals sign (`=`), e.g.
+`/RENAME=(OLD1=NEW1)(OLD2=NEW2)`. To rename a single variable, the
+parentheses may be omitted: `/RENAME=OLD=NEW`. Within a parenthesized
+group, variables are renamed simultaneously, so that `/RENAME=(A B=B
+A)` exchanges the names of variables A and B. Otherwise, renaming
+occurs in left-to-right order.
+
+Each `FILE` subcommand may optionally be followed by a single `IN`
+subcommand, which creates a numeric variable with the specified name
+and format `F1.0`. The `IN` variable takes value 1 in an output case
+if the given input file contributed to that output case, and 0
+otherwise. The `DROP`, `KEEP`, and `RENAME` subcommands have no
+effect on `IN` variables.
+
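+For example, the following sketch (file names hypothetical) flags
+which input file each output case came from:
+
+```
+ADD FILES /FILE='male.sav' /IN=is_male
+          /FILE='female.sav' /IN=is_female.
+```
+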
+If `BY` is used (see below), the `SORT` keyword must be specified
+after a `FILE` if that input file is not already sorted on the `BY`
+variables. When `SORT` is specified, PSPP sorts the input file's data
+on the `BY` variables before using it in the command.  When
+`SORT` is used, `BY` is required. `SORT` is a PSPP extension.
+
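+For example, in this sketch (file names hypothetical), the first
+input file is not yet sorted on the `BY` variable, so `SORT` is given
+for it:
+
+```
+MATCH FILES /FILE='visits.sav' /SORT
+            /FILE='patients.sav'
+            /BY id.
+```
+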
+PSPP merges the dictionaries of all of the input files to form the
+dictionary of the new active dataset, like so:
+
+- The variables in the new active dataset are the union of all the
+ input files' variables, matched based on their name. When a single
+ input file contains a variable with a given name, the output file
+ will contain exactly that variable. When more than one input file
+ contains a variable with a given name, those variables must all
+ have the same type (numeric or string) and, for string variables,
+ the same width. Variables are matched after renaming with the
+ `RENAME` subcommand. Thus, `RENAME` can be used to resolve
+ conflicts.
+
+- The variable label for each output variable is taken from the first
+ specified input file that has a variable label for that variable,
+ and similarly for value labels and missing values.
+
+- The [file label](file-label.md) of the new active dataset is that of
+ the first specified `FILE` that has a file label.
+
+- The [documents](document.md) in the new active dataset are the
+ concatenation of all the input files' documents, in the order in
+ which the `FILE` subcommands are specified.
+
+- If all of the input files are weighted on the same variable, then
+ the new active dataset is weighted on that variable. Otherwise,
+ the new active dataset is not weighted.
+
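+As a sketch of resolving a name conflict during dictionary merging
+(file and variable names hypothetical), `RENAME` can give each input
+file's `score` a distinct name in the output:
+
+```
+MATCH FILES /FILE='y2023.sav' /RENAME=(score=score2023)
+            /FILE='y2024.sav' /RENAME=(score=score2024)
+            /BY id.
+```
+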
+The remaining subcommands apply to the output file as a whole, rather
+than to individual input files. They must be specified at the end of
+the command specification, following all of the `FILE` and related
+subcommands. The most important of these subcommands is `BY`, which
+specifies a set of one or more variables that may be used to find
+corresponding cases in each of the input files. The variables
+specified on `BY` must be present in all of the input files.
+Furthermore, if any of the input files are not sorted on the `BY`
+variables, then `SORT` must be specified for those input files.
+
+The variables listed on `BY` may include `(A)` or `(D)` annotations to
+specify ascending or descending sort order. See [`SORT
+CASES`](sort-cases.md), for more details on this notation. Adding
+`(A)` or `(D)` to the `BY` subcommand specification is a PSPP
+extension.
+
+The `DROP` subcommand can be used to specify a list of variables to
+exclude from the output. By contrast, the `KEEP` subcommand can be
+used to specify variables to include in the output; all variables not
+listed are dropped. `DROP` and `KEEP` are executed in left-to-right
+order and may be repeated any number of times. `DROP` and `KEEP` do
+not affect variables created by the `IN`, `FIRST`, and `LAST`
+subcommands, which are always included in the new active dataset, but
+they can be used to drop `BY` variables.
+
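+For instance (file and variable names hypothetical), `KEEP` and
+`DROP` may be combined and are applied left to right:
+
+```
+ADD FILES /FILE='a.sav' /FILE='b.sav'
+  /KEEP=id score1 TO score5
+  /DROP=score3.
+```
+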
+The `FIRST` and `LAST` subcommands are optional. They may only be
+specified on `MATCH FILES` and `ADD FILES`, and only when `BY` is
+used.  `FIRST` and `LAST` each add a numeric variable to the new
+active dataset, with the name given as the subcommand's argument and
+`F1.0` print and write formats.  The value of the `FIRST` variable is
+1 in the first output case with a given set of values for the `BY`
+variables, and 0 in other cases.  Similarly, the `LAST` variable is 1
+in the last case with a given set of `BY` values, and 0 in other cases.
+
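+For example (file names hypothetical), the following marks the first
+output case in each group of `id` values:
+
+```
+ADD FILES /FILE='a.sav' /FILE='b.sav'
+  /BY id /FIRST=first_id.
+```
+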
+When any of these commands creates an output case, variables that are
+only in files that are not present for the current case are set to the
+system-missing value for numeric variables or spaces for string
+variables.
+
+These commands may combine any number of files, limited only by the
+machine's memory.
+
+++ /dev/null
-# ADD FILES
-
-```
-ADD FILES
-
-Per input file:
- /FILE={*,'FILE_NAME'}
- [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
- [/IN=VAR_NAME]
- [/SORT]
-
-Once per command:
- [/BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]...]]
- [/DROP=VAR_LIST]
- [/KEEP=VAR_LIST]
- [/FIRST=VAR_NAME]
- [/LAST=VAR_NAME]
- [/MAP]
-```
-
-`ADD FILES` adds cases from multiple input files. The output, which
-replaces the active dataset, consists of all of the cases in all of the
-input files.
-
-`ADD FILES` shares the bulk of its syntax with other PSPP commands for
-combining multiple data files (see [Common
-Syntax](index.md#common-syntax) for details).
-
-When `BY` is not used, the output of `ADD FILES` consists of all the
-cases from the first input file specified, followed by all the cases
-from the second file specified, and so on. When `BY` is used, the
-output is additionally sorted on the `BY` variables.
-
-When `ADD FILES` creates an output case, variables that are not part
-of the input file from which the case was drawn are set to the
-system-missing value for numeric variables or spaces for string
-variables.
-
+++ /dev/null
-# Combining Data Files
-
-This chapter describes commands that allow data from system files,
-portable files, and open datasets to be combined to form a new active
-dataset. These commands can combine data files in the following ways:
-
-- [`ADD FILES`](add-files.md) interleaves or appends the cases from
- each input file. It is used with input files that have variables in
- common, but distinct sets of cases.
-
-- [`MATCH FILES`](match-files.md) adds the data together in cases that
- match across multiple input files. It is used with input files that
- have cases in common, but different information about each case.
-
-- [`UPDATE`](update.md) updates a master data file from data in a set
- of transaction files. Each case in a transaction data file modifies
- a matching case in the primary data file, or it adds a new case if
- no matching case can be found.
-
-These commands share the majority of their syntax, described below,
-followed by an individual section for each command that describes its
-specific syntax and semantics.
-
-## Common Syntax
-
-```
-Per input file:
- /FILE={*,'FILE_NAME'}
- [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
- [/IN=VAR_NAME]
- [/SORT]
-
-Once per command:
-  /BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]]...
- [/DROP=VAR_LIST]
- [/KEEP=VAR_LIST]
- [/FIRST=VAR_NAME]
- [/LAST=VAR_NAME]
- [/MAP]
-```
-
-Each of these commands reads two or more input files and combines
-them. The command's output becomes the new active dataset. None of
-the commands actually change the input files. Therefore, if you want
-the changes to become permanent, you must explicitly save them using
-an appropriate procedure or transformation.
-
-The syntax of each command begins with a specification of the files to
-be read as input. For each input file, specify `FILE` with a system
-file or portable file's name as a string, a
-[dataset](../../language/datasets/index.md) or [file
-handle](../../language/files/file-handles.md) name, or an asterisk
-(`*`) to use the active dataset as input. Use of portable files on
-`FILE` is a PSPP extension.
-
-At least two `FILE` subcommands must be specified. If the active
-dataset is used as an input source, then `TEMPORARY` must not be in
-effect.
-
-Each `FILE` subcommand may be followed by any number of `RENAME`
-subcommands that specify a parenthesized group or groups of variable
-names as they appear in the input file, followed by those variables'
-new names, separated by an equals sign (`=`), e.g.
-`/RENAME=(OLD1=NEW1)(OLD2=NEW2)`. To rename a single variable, the
-parentheses may be omitted: `/RENAME=OLD=NEW`. Within a parenthesized
-group, variables are renamed simultaneously, so that `/RENAME=(A B=B
-A)` exchanges the names of variables A and B. Otherwise, renaming
-occurs in left-to-right order.
-
-Each `FILE` subcommand may optionally be followed by a single `IN`
-subcommand, which creates a numeric variable with the specified name
-and format `F1.0`. The `IN` variable takes value 1 in an output case
-if the given input file contributed to that output case, and 0
-otherwise. The `DROP`, `KEEP`, and `RENAME` subcommands have no
-effect on `IN` variables.
-
-If `BY` is used (see below), the `SORT` keyword must be specified
-after a `FILE` if that input file is not already sorted on the `BY`
-variables. When `SORT` is specified, PSPP sorts the input file's data
-on the `BY` variables before using it in the command.  When
-`SORT` is used, `BY` is required. `SORT` is a PSPP extension.
-
-PSPP merges the dictionaries of all of the input files to form the
-dictionary of the new active dataset, like so:
-
-- The variables in the new active dataset are the union of all the
- input files' variables, matched based on their name. When a single
- input file contains a variable with a given name, the output file
- will contain exactly that variable. When more than one input file
- contains a variable with a given name, those variables must all
- have the same type (numeric or string) and, for string variables,
- the same width. Variables are matched after renaming with the
- `RENAME` subcommand. Thus, `RENAME` can be used to resolve
- conflicts.
-
-- The variable label for each output variable is taken from the first
- specified input file that has a variable label for that variable,
- and similarly for value labels and missing values.
-
-- The [file label](../utilities/file-label.md) of the new active
- dataset is that of the first specified `FILE` that has a file label.
-
-- The [documents](../utilities/document.md) in the new active dataset
- are the concatenation of all the input files' documents, in the
- order in which the `FILE` subcommands are specified.
-
-- If all of the input files are weighted on the same variable, then
- the new active dataset is weighted on that variable. Otherwise,
- the new active dataset is not weighted.
-
-The remaining subcommands apply to the output file as a whole, rather
-than to individual input files. They must be specified at the end of
-the command specification, following all of the `FILE` and related
-subcommands. The most important of these subcommands is `BY`, which
-specifies a set of one or more variables that may be used to find
-corresponding cases in each of the input files. The variables
-specified on `BY` must be present in all of the input files.
-Furthermore, if any of the input files are not sorted on the `BY`
-variables, then `SORT` must be specified for those input files.
-
-The variables listed on `BY` may include `(A)` or `(D)` annotations to
-specify ascending or descending sort order. See [`SORT
-CASES`](../data/sort-cases.md), for more details on this notation.
-Adding `(A)` or `(D)` to the `BY` subcommand specification is a PSPP
-extension.
-
-The `DROP` subcommand can be used to specify a list of variables to
-exclude from the output. By contrast, the `KEEP` subcommand can be
-used to specify variables to include in the output; all variables not
-listed are dropped. `DROP` and `KEEP` are executed in left-to-right
-order and may be repeated any number of times. `DROP` and `KEEP` do
-not affect variables created by the `IN`, `FIRST`, and `LAST`
-subcommands, which are always included in the new active dataset, but
-they can be used to drop `BY` variables.
-
-The `FIRST` and `LAST` subcommands are optional. They may only be
-specified on `MATCH FILES` and `ADD FILES`, and only when `BY` is
-used.  `FIRST` and `LAST` each add a numeric variable to the new
-active dataset, with the name given as the subcommand's argument and
-`F1.0` print and write formats.  The value of the `FIRST` variable is
-1 in the first output case with a given set of values for the `BY`
-variables, and 0 in other cases.  Similarly, the `LAST` variable is 1
-in the last case with a given set of `BY` values, and 0 in other cases.
-
-When any of these commands creates an output case, variables that are
-only in files that are not present for the current case are set to the
-system-missing value for numeric variables or spaces for string
-variables.
-
-These commands may combine any number of files, limited only by the
-machine's memory.
-
+++ /dev/null
-# MATCH FILES
-
-```
-MATCH FILES
-
-Per input file:
- /{FILE,TABLE}={*,'FILE_NAME'}
- [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
- [/IN=VAR_NAME]
- [/SORT]
-
-Once per command:
-  /BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]...]
- [/DROP=VAR_LIST]
- [/KEEP=VAR_LIST]
- [/FIRST=VAR_NAME]
- [/LAST=VAR_NAME]
- [/MAP]
-```
-
-`MATCH FILES` merges sets of corresponding cases in multiple input
-files into single cases in the output, combining their data.
-
-`MATCH FILES` shares the bulk of its syntax with other PSPP commands
-for combining multiple data files (see [Common
-Syntax](index.md#common-syntax) for details).
-
-How `MATCH FILES` matches up cases from the input files depends on
-whether `BY` is specified:
-
-- If `BY` is not used, `MATCH FILES` combines the first case from
- each input file to produce the first output case, then the second
- case from each input file for the second output case, and so on.
- If some input files have fewer cases than others, then the shorter
- files do not contribute to cases output after their input has been
- exhausted.
-
-- If `BY` is used, `MATCH FILES` combines cases from each input file
- that have identical values for the `BY` variables.
-
- When `BY` is used, `TABLE` subcommands may be used to introduce
-  "table lookup files".  `TABLE` has the same syntax as `FILE`, and the
-  `RENAME`, `IN`, and `SORT` subcommands may follow a `TABLE` in the
-  same way as `FILE`.  Regardless of the number of `TABLE`s, at least
-  one `FILE` must be specified.  Table lookup files are treated in the
- same way as other input files for most purposes and, in particular,
- table lookup files must be sorted on the `BY` variables or the
- `SORT` subcommand must be specified for that `TABLE`.
-
- Cases in table lookup files are not consumed after they have been
- used once. This means that data in table lookup files can
- correspond to any number of cases in `FILE` input files. Table
- lookup files are analogous to lookup tables in traditional
- relational database systems.
-
- If a table lookup file contains more than one case with a given set
- of `BY` variables, only the first case is used.
-
-When `MATCH FILES` creates an output case, variables that are only in
-files that are not present for the current case are set to the
-system-missing value for numeric variables or spaces for string
-variables.
-
+++ /dev/null
-# UPDATE
-
-```
-UPDATE
-
-Per input file:
- /FILE={*,'FILE_NAME'}
- [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
- [/IN=VAR_NAME]
- [/SORT]
-
-Once per command:
- /BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]]...
- [/DROP=VAR_LIST]
- [/KEEP=VAR_LIST]
- [/MAP]
-```
-
-`UPDATE` updates a "master file" by applying modifications from one
-or more "transaction files".
-
-`UPDATE` shares the bulk of its syntax with other PSPP commands for
-combining multiple data files (see [Common
-Syntax](index.md#common-syntax) for details).
-
-At least two `FILE` subcommands must be specified. The first `FILE`
-subcommand names the master file, and the rest name transaction files.
-Every input file must either be sorted on the variables named on the
-`BY` subcommand, or the `SORT` subcommand must be used just after the
-`FILE` subcommand for that input file.
-
-`UPDATE` uses the variables specified on the `BY` subcommand, which
-is required, to attempt to match each case in a transaction file with a
-case in the master file:
-
-- When a match is found, then the values of the variables present in
- the transaction file replace those variables' values in the new
-  active file.  If there are matching cases in more than one
- transaction file, PSPP applies the replacements from the first
- transaction file, then from the second transaction file, and so on.
- Similarly, if a single transaction file has cases with duplicate
- `BY` values, then those are applied in order to the master file.
-
- When a variable in a transaction file has a missing value or when a
- string variable's value is all blanks, that value is never used to
- update the master file.
-
-- If a case in the master file has no matching case in any transaction
- file, then it is copied unchanged to the output.
-
-- If a case in a transaction file has no matching case in the master
- file, then it causes a new case to be added to the output,
- initialized from the values in the transaction file.
-
--- /dev/null
+# COMMENT
+
+```
+Comment commands:
+ COMMENT comment text ... .
+ *comment text ... .
+
+Comments within a line of syntax:
+ FREQUENCIES /VARIABLES=v0 v1 v2. /* All our categorical variables.
+```
+
+`COMMENT` is ignored. It is used to provide information to the
+author and other readers of the PSPP syntax file.
+
+`COMMENT` can extend over any number of lines. It ends at a dot at
+the end of a line or a blank line. The comment may contain any
+characters.
+
+PSPP also supports comments within a line of syntax, introduced with
+`/*`. These comments end at the first `*/` or at the end of the line,
+whichever comes first. A line that contains just this kind of comment
+is considered blank and ends the current command.
+
--- /dev/null
+# COMPUTE
+
+```
+COMPUTE VARIABLE = EXPRESSION.
+ or
+COMPUTE VECTOR_NAME(INDEX) = EXPRESSION.
+```
+
+`COMPUTE` assigns the value of an expression to a target variable.
+For each case, the expression is evaluated and its value assigned to
+the target variable. Numeric and string variables may be assigned.
+When a string expression's width differs from the target variable's
+width, the string result of the expression is truncated or padded with
+spaces on the right as necessary. The expression and variable types
+must match.
+
+For numeric variables only, the target variable need not already
+exist. Numeric variables created by `COMPUTE` are assigned an `F8.2`
+output format. String variables must be declared before they can be
+used as targets for `COMPUTE`.
+
+The target variable may be specified as an element of a
+[vector](vector.md). In this case, an expression `INDEX` must be
+specified in parentheses following the vector name. The expression
+`INDEX` must evaluate to a numeric value that, after rounding down to
+the nearest integer, is a valid index for the named vector.
+
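+A small sketch (the names are illustrative): the index expression
+selects which element of the vector receives the value:
+
+```
+VECTOR score(3).
+COMPUTE idx = 2.
+COMPUTE score(idx) = 100.
+```
+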
+Using `COMPUTE` to assign to a variable specified on
+[`LEAVE`](leave.md) resets the variable's left state. Therefore,
+`LEAVE` should be specified following `COMPUTE`, not before.
+
+`COMPUTE` is a transformation. It does not cause the active dataset
+to be read.
+
+When `COMPUTE` is specified following [`TEMPORARY`](temporary.md), the
+[`LAG`](../language/expressions/functions/miscellaneous.md)
+function may not be used.
+
+## Example
+
+The dataset `physiology.sav` contains the height and weight of
+persons. For some purposes, neither height nor weight alone is of
+interest. Epidemiologists are often more interested in the "body mass
+index" which can sometimes be used as a predictor for clinical
+conditions. The body mass index is defined as the weight of the
+person in kilograms divided by the square of the person's height in
+metres.[^1]
+
+[^1]: Since BMI is a quantity with a ratio scale and has units, the
+term "index" is a misnomer, but that is what it is called.
+
+```
+get file='physiology.sav'.
+
+* height is in mm so we must divide by 1000 to get metres.
+compute bmi = weight / (height/1000)**2.
+variable label bmi "Body Mass Index".
+
+descriptives /weight height bmi.
+```
+
+This syntax shows how you can use `COMPUTE` to generate a new variable
+called `bmi` and have every case's value calculated from the existing
+values of `weight` and `height`.  It also shows how you can [add a
+label](variable-labels.md) to this new variable, so that a more
+descriptive label appears in subsequent analyses, and this can be seen
+in the output from the `DESCRIPTIVES` command, below.
+
+The expression which follows the `=` sign can be as complicated as
+necessary. See [Expressions](../language/expressions/index.md) for
+a full description of the language accepted.
+
+```
+ Descriptive Statistics
+┌─────────────────────┬──┬───────┬───────┬───────┬───────┐
+│ │ N│ Mean │Std Dev│Minimum│Maximum│
+├─────────────────────┼──┼───────┼───────┼───────┼───────┤
+│Weight in kilograms │40│ 72.12│ 26.70│ ─55.6│ 92.1│
+│Height in millimeters│40│1677.12│ 262.87│ 179│ 1903│
+│Body Mass Index │40│ 67.46│ 274.08│ ─21.62│1756.82│
+│Valid N (listwise) │40│ │ │ │ │
+│Missing N (listwise) │ 0│ │ │ │ │
+└─────────────────────┴──┴───────┴───────┴───────┴───────┘
+```
--- /dev/null
+# Conditionals and Loops
+
+This chapter documents PSPP commands used for conditional execution,
+looping, and flow of control.
+++ /dev/null
-# BREAK
-
-```
-BREAK.
-```
-
-`BREAK` terminates execution of the innermost currently executing
-`LOOP` construct.
-
-`BREAK` is allowed only inside [`LOOP`...`END LOOP`](loop.md).
-
+++ /dev/null
-# DEFINE…!ENDDEFINE
-
-<!-- toc -->
-
-## Overview
-
-```
-DEFINE macro_name([argument[/argument]...])
-...body...
-!ENDDEFINE.
-```
-
-Each argument takes the following form:
-```
-{!arg_name= | !POSITIONAL}
-[!DEFAULT(default)]
-[!NOEXPAND]
-{!TOKENS(count) | !CHAREND('token') | !ENCLOSE('start','end') | !CMDEND}
-```
-
-The following directives may be used within body:
-```
-!OFFEXPAND
-!ONEXPAND
-```
-
-The following functions may be used within the body:
-```
-!BLANKS(count)
-!CONCAT(arg...)
-!EVAL(arg)
-!HEAD(arg)
-!INDEX(haystack, needle)
-!LENGTH(arg)
-!NULL
-!QUOTE(arg)
-!SUBSTR(arg, start[, count])
-!TAIL(arg)
-!UNQUOTE(arg)
-!UPCASE(arg)
-```
-
-The body may also include the following constructs:
-```
-!IF (condition) !THEN true-expansion !ENDIF
-!IF (condition) !THEN true-expansion !ELSE false-expansion !ENDIF
-
-!DO !var = start !TO end [!BY step]
- body
-!DOEND
-!DO !var !IN (expression)
- body
-!DOEND
-
-!LET !var = expression
-```
-
-## Introduction
-
-The DEFINE command creates a "macro", which is a name for a fragment of
-PSPP syntax called the macro's "body". Following the DEFINE command,
-syntax may "call" the macro by name any number of times. Each call
-substitutes, or "expands", the macro's body in place of the call, as if
-the body had been written in its place.
-
-The following syntax defines a macro named `!vars` that expands to
-the variable names `v1 v2 v3`. The macro's name begins with `!`, which
-is optional for macro names. The `()` following the macro name are
-required:
-
-```
-DEFINE !vars()
-v1 v2 v3
-!ENDDEFINE.
-```
-
-Here are two ways that `!vars` might be called given the preceding
-definition:
-
-```
-DESCRIPTIVES !vars.
-FREQUENCIES /VARIABLES=!vars.
-```
-
-With macro expansion, the above calls are equivalent to the
-following:
-
-```
-DESCRIPTIVES v1 v2 v3.
-FREQUENCIES /VARIABLES=v1 v2 v3.
-```
-
-The `!vars` macro expands to a fixed body. Macros may have more
-sophisticated contents:
-
-- Macro "[arguments](#macro-arguments)" that are substituted into the
- body whenever they are named. The values of a macro's arguments are
- specified each time it is called.
-
-- Macro "[functions](#macro-functions)", expanded when the macro is
- called.
-
-- [`!IF` constructs](#macro-conditional-expansion), for conditional expansion.
-
-- Two forms of [`!DO` construct](#macro-loops), for looping over a
- numerical range or a collection of tokens.
-
-- [`!LET` constructs](#macro-variable-assignment), for assigning to
- macro variables.
-
-Many identifiers associated with macros begin with `!`, a character
-not normally allowed in identifiers. These identifiers are reserved
-only for use with macros, which helps keep them from being confused with
-other kinds of identifiers.
-
-The following sections provide more details on macro syntax and
-semantics.
-
-## Macro Bodies
-
-As previously shown, a macro body may contain a fragment of a PSPP
-command (such as a variable name). A macro body may also contain full
-PSPP commands. In the latter case, the macro body should also contain
-the command terminators.
-
-Most PSPP commands may occur within a macro. The `DEFINE` command
-itself is one exception, because the inner `!ENDDEFINE` ends the outer
-macro definition. For compatibility, `BEGIN DATA`...`END DATA.`
-should not be used within a macro.
-
-The body of a macro may call another macro. The following shows one
-way that could work:
-
-```
-DEFINE !commands()
-DESCRIPTIVES !vars.
-FREQUENCIES /VARIABLES=!vars.
-!ENDDEFINE.
-
-* Initially define the 'vars' macro to analyze v1...v3.
-DEFINE !vars() v1 v2 v3 !ENDDEFINE.
-!commands
-
-* Redefine 'vars' macro to analyze different variables.
-DEFINE !vars() v4 v5 !ENDDEFINE.
-!commands
-```
-
-The `!commands` macro would be easier to use if it took the variables
-to analyze as an argument rather than through another macro. The
-following section shows how to do that.
-
-## Macro Arguments
-
-This section explains how to use macro arguments. As an initial
-example, the following syntax defines a macro named `!analyze` that
-takes all the syntax up to the first command terminator as an argument:
-
-```
-DEFINE !analyze(!POSITIONAL !CMDEND)
-DESCRIPTIVES !1.
-FREQUENCIES /VARIABLES=!1.
-!ENDDEFINE.
-```
-
-When `!analyze` is called, it expands to a pair of analysis commands
-with each `!1` in the body replaced by the argument. That is, these
-calls:
-
-```
-!analyze v1 v2 v3.
-!analyze v4 v5.
-```
-
-act like the following:
-
-```
-DESCRIPTIVES v1 v2 v3.
-FREQUENCIES /VARIABLES=v1 v2 v3.
-DESCRIPTIVES v4 v5.
-FREQUENCIES /VARIABLES=v4 v5.
-```
-
-Macros may take any number of arguments, described within the
-parentheses in the DEFINE command. Arguments come in two varieties
-based on how their values are specified when the macro is called:
-
-- A "positional" argument has a required value that follows the
- macro's name. Use the `!POSITIONAL` keyword to declare a
- positional argument.
-
- When a macro is called, the positional argument values appear in
- the same order as their definitions, before any keyword argument
- values.
-
- References to a positional argument in a macro body are numbered:
- `!1` is the first positional argument, `!2` the second, and so on.
- In addition, `!*` expands to all of the positional arguments'
- values, separated by spaces.
-
- The following example uses a positional argument:
-
- ```
- DEFINE !analyze(!POSITIONAL !CMDEND)
- DESCRIPTIVES !1.
- FREQUENCIES /VARIABLES=!1.
- !ENDDEFINE.
-
- !analyze v1 v2 v3.
- !analyze v4 v5.
- ```
-
-- A "keyword" argument has a name. In the macro call, its value is
- specified with the syntax `name=value`. The names allow keyword
- argument values to take any order in the call.
-
- In declaration and calls, a keyword argument's name may not begin
- with `!`, but references to it in the macro body do start with a
- leading `!`.
-
- The following example uses a keyword argument that defaults to ALL
- if the argument is not assigned a value:
-
- ```
- DEFINE !analyze_kw(vars=!DEFAULT(ALL) !CMDEND)
- DESCRIPTIVES !vars.
- FREQUENCIES /VARIABLES=!vars.
- !ENDDEFINE.
-
- !analyze_kw vars=v1 v2 v3. /* Analyze specified variables.
- !analyze_kw. /* Analyze all variables.
- ```
-
-If a macro has both positional and keyword arguments, then the
-positional arguments must come first in the DEFINE command, and their
-values also come first in macro calls. A keyword argument may be
-omitted by leaving its keyword out of the call, and a positional
-argument may be omitted by putting a command terminator where it would
-appear. (The latter case also omits any following positional
-arguments and all keyword arguments, if there are any.) When an
-argument is omitted, a default value is used: either the value
-specified in `!DEFAULT(value)`, or an empty value otherwise.
-
-Each argument declaration specifies the form of its value:
-
-* `!TOKENS(count)`
- Exactly `count` tokens, e.g. `!TOKENS(1)` for a single token. Each
- identifier, number, quoted string, operator, or punctuator is a
- token (see [Tokens](../../language/basics/tokens.md) for details).
-
- The following variant of `!analyze_kw` accepts only a single
- variable name (or `ALL`) as its argument:
-
- ```
- DEFINE !analyze_one_var(!POSITIONAL !TOKENS(1))
- DESCRIPTIVES !1.
- FREQUENCIES /VARIABLES=!1.
- !ENDDEFINE.
-
- !analyze_one_var v1.
- ```
-
-* `!CHAREND('TOKEN')`
- Any number of tokens up to `TOKEN`, which should be an operator or
- punctuator token such as `/` or `+`. The `TOKEN` does not become
- part of the value.
-
-  With the following variant of `!analyze_kw`, the variables must be
-  followed by `/`:
-
-  ```
-  DEFINE !analyze_slash(vars=!CHAREND('/'))
-  DESCRIPTIVES !vars.
-  FREQUENCIES /VARIABLES=!vars.
-  !ENDDEFINE.
-
-  !analyze_slash vars=v1 v2 v3/.
-  ```
-
-* `!ENCLOSE('START','END')`
- Any number of tokens enclosed between `START` and `END`, which
- should each be operator or punctuator tokens. For example, use
- `!ENCLOSE('(',')')` for a value enclosed within parentheses. (Such
- a value could never have right parentheses inside it, even paired
- with left parentheses.) The start and end tokens are not part of
- the value.
-
- With the following variant of `!analyze_kw`, the variables must be
- specified within parentheses:
-
- ```
- DEFINE !analyze_parens(vars=!ENCLOSE('(',')'))
- DESCRIPTIVES !vars.
- FREQUENCIES /VARIABLES=!vars.
- !ENDDEFINE.
-
- !analyze_parens vars=(v1 v2 v3).
- ```
-
-* `!CMDEND`
- Any number of tokens up to the end of the command. This should be
- used only for the last positional parameter, since it consumes all
- of the tokens in the command calling the macro.
-
- The following variant of `!analyze_kw` takes all the variable names
- up to the end of the command as its argument:
-
- ```
- DEFINE !analyze_kw(vars=!CMDEND)
- DESCRIPTIVES !vars.
- FREQUENCIES /VARIABLES=!vars.
- !ENDDEFINE.
-
- !analyze_kw vars=v1 v2 v3.
- ```
-
-By default, when an argument's value contains a macro call, the call
-is expanded each time the argument appears in the macro's body. The
-[`!NOEXPAND` keyword](#controlling-macro-expansion) in an argument
-declaration suppresses this expansion.
-
-## Controlling Macro Expansion
-
-Multiple factors control whether macro calls are expanded in different
-situations. At the highest level, `SET MEXPAND` controls whether
-macro calls are expanded. By default, it is enabled. See [`SET
-MEXPAND`](../utilities/set.md#mexpand) for details.
-
-A macro body may contain macro calls. By default, these are expanded.
-If a macro body contains `!OFFEXPAND` or `!ONEXPAND` directives, then
-`!OFFEXPAND` disables expansion of macro calls until the following
-`!ONEXPAND`.
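-
-For example, the following sketch (the macro names `!inner` and
-`!outer` and variable `v1` are hypothetical) keeps the call to
-`!inner` from being expanded within the body of `!outer`, so that the
-token `!inner` itself becomes part of `!outer`'s expansion:
-
-```
-DEFINE !inner()
-DESCRIPTIVES v1.
-!ENDDEFINE.
-
-DEFINE !outer()
-!OFFEXPAND !inner !ONEXPAND
-!ENDDEFINE.
-```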
-
-A macro argument's value may contain a macro call. These macro calls
-are expanded, unless the argument was declared with the `!NOEXPAND`
-keyword.
-
-The argument to a macro function is a special context that does not
-expand macro calls. For example, if `!vars` is the name of a macro,
-then `!LENGTH(!vars)` expands to 5, as does `!LENGTH(!1)` if
-positional argument 1 has value `!vars`. To expand macros in these
-cases, use the [`!EVAL` macro function](#eval),
-e.g. `!LENGTH(!EVAL(!vars))` or `!LENGTH(!EVAL(!1))`.
-
-These rules apply to macro calls, not to uses within a macro body of
-macro functions, macro arguments, and macro variables created by `!DO`
-or `!LET`, which are always expanded.
-
-`SET MEXPAND` may appear within the body of a macro, but it will not
-affect expansion of the macro that it appears in. Use `!OFFEXPAND`
-and `!ONEXPAND` instead.
-
-## Macro Functions
-
-Macro bodies may manipulate syntax using macro functions. Macro
-functions accept tokens as arguments and expand to sequences of
-characters.
-
-The arguments to macro functions have a restricted form. They may
-only be a single token (such as an identifier or a string), a macro
-argument, or a call to a macro function. Thus, the following are
-valid macro function arguments:
-
-- `x`
-- `5.0`
-- `!1`
-- `"5 + 6"`
-- `!CONCAT(x,y)`
-
-and the following are not (because they are each multiple tokens):
-
-- `x y`
-- `5+6`
-
-Macro functions expand to sequences of characters. When these
-character strings are processed further as character strings,
-e.g. with `!LENGTH`, any character string is valid. When they are
-interpreted as PSPP syntax, e.g. when the expansion becomes part of a
-command, they need to be valid for that purpose. For example,
-`!UNQUOTE("It's")` will yield an error if the expansion `It's` becomes
-part of a PSPP command, because it contains unbalanced single quotes,
-but `!LENGTH(!UNQUOTE("It's"))` expands to 4.
-
-The following macro functions are available.
-
-* `!BLANKS(count)`
- Expands to COUNT unquoted spaces, where COUNT is a nonnegative
- integer. Outside quotes, any positive number of spaces is
- equivalent to a single space; for a quoted string of spaces, use
- `!QUOTE(!BLANKS(COUNT))`.
-
- In the examples below, `_` stands in for a space to make the
- results visible.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!BLANKS(0)`|(empty)|
- |`!BLANKS(1)`|`_`|
- |`!BLANKS(2)`|`__`|
- |`!QUOTE(!BLANKS(5))`|`'_____'`|
-
-* `!CONCAT(arg...)`
- Expands to the concatenation of all of the arguments. Before
- concatenation, each quoted string argument is unquoted, as if
- `!UNQUOTE` were applied. This allows for "token pasting",
- combining two (or more) tokens into a single one:
-
- |Call|Expansion|
- |:-----|:--------|
- |`!CONCAT(x, y)`|`xy`|
- |`!CONCAT('x', 'y')`|`xy`|
- |`!CONCAT(12, 34)`|`1234`|
- |`!CONCAT(!NULL, 123)`|`123`|
-
- `!CONCAT` is often used for constructing a series of similar
- variable names from a prefix followed by a number and perhaps a
- suffix. For example:
-
- |Call|Expansion|
- |:-----|:--------|
- |`!CONCAT(x, 0)`|`x0`|
- |`!CONCAT(x, 0, y)`|`x0y`|
-
- An identifier token must begin with a letter (or `#` or `@`), which
- means that attempting to use a number as the first part of an
- identifier will produce a pair of distinct tokens rather than a
- single one. For example:
-
- |Call|Expansion|
- |:-----|:--------|
- |`!CONCAT(0, x)`|`0 x`|
- |`!CONCAT(0, x, y)`|`0 xy`|
-
-* <a name="eval">`!EVAL(arg)`</a>
- Expands macro calls in ARG. This is especially useful if ARG is
- the name of a macro or a macro argument that expands to one,
- because arguments to macro functions are not expanded by default
- (see [Controlling Macro Expansion](#controlling-macro-expansion)).
-
- The following examples assume that `!vars` is a macro that expands
- to `a b c`:
-
- |Call|Expansion|
- |:-----|:--------|
- |`!vars`|`a b c`|
- |`!QUOTE(!vars)`|`'!vars'`|
- |`!EVAL(!vars)`|`a b c`|
- |`!QUOTE(!EVAL(!vars))`|`'a b c'`|
-
- These examples additionally assume that argument `!1` has value
- `!vars`:
-
- |Call|Expansion|
- |:-----|:--------|
- |`!1`|`a b c`|
- |`!QUOTE(!1)`|`'!vars'`|
- |`!EVAL(!1)`|`a b c`|
- |`!QUOTE(!EVAL(!1))`|`'a b c'`|
-
-* `!HEAD(arg)`
- `!TAIL(arg)`
- `!HEAD` expands to just the first token in an unquoted version of
- ARG, and `!TAIL` to all the tokens after the first.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!HEAD('a b c')`|`a`|
- |`!HEAD('a')`|`a`|
- |`!HEAD(!NULL)`|(empty)|
- |`!HEAD('')`|(empty)|
- |`!TAIL('a b c')`|`b c`|
- |`!TAIL('a')`|(empty)|
- |`!TAIL(!NULL)`|(empty)|
- |`!TAIL('')`|(empty)|
-
-* `!INDEX(haystack, needle)`
- Looks for NEEDLE in HAYSTACK. If it is present, expands to the
- 1-based index of its first occurrence; if not, expands to 0.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!INDEX(banana, an)`|`2`|
- |`!INDEX(banana, nan)`|`3`|
- |`!INDEX(banana, apple)`|`0`|
- |`!INDEX("banana", nan)`|`4`|
- |`!INDEX("banana", "nan")`|`0`|
- |`!INDEX(!UNQUOTE("banana"), !UNQUOTE("nan"))`|`3`|
-
-* `!LENGTH(arg)`
- Expands to a number token representing the number of characters in
- ARG.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!LENGTH(123)`|`3`|
- |`!LENGTH(123.00)`|`6`|
- |`!LENGTH( 123 )`|`3`|
- |`!LENGTH("123")`|`5`|
- |`!LENGTH(xyzzy)`|`5`|
- |`!LENGTH("xyzzy")`|`7`|
- |`!LENGTH("xy""zzy")`|`9`|
- |`!LENGTH(!UNQUOTE("xyzzy"))`|`5`|
- |`!LENGTH(!UNQUOTE("xy""zzy"))`|`6`|
- |`!LENGTH(!1)`|`5` (if `!1` is `a b c`)|
- |`!LENGTH(!1)`|`0` (if `!1` is empty)|
- |`!LENGTH(!NULL)`|`0`|
-
-* `!NULL`
- Expands to an empty character sequence.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!NULL`|(empty)|
- |`!QUOTE(!NULL)`|`''`|
-
-* `!QUOTE(arg)`
- `!UNQUOTE(arg)`
- The `!QUOTE` function expands to its argument surrounded by
- apostrophes, doubling any apostrophes inside the argument to make
- sure that it is valid PSPP syntax for a string. If the argument
- was already a quoted string, `!QUOTE` expands to it unchanged.
-
- Given a quoted string argument, the `!UNQUOTE` function expands to
- the string's contents, with the quotes removed and any doubled
- quote marks reduced to singletons. If the argument was not a
- quoted string, `!UNQUOTE` expands to the argument unchanged.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!QUOTE(123.0)`|`'123.0'`|
- |`!QUOTE( 123 )`|`'123'`|
- |`!QUOTE('a b c')`|`'a b c'`|
- |`!QUOTE("a b c")`|`"a b c"`|
- |`!QUOTE(!1)`|`'a ''b'' c'` (if `!1` is `a 'b' c`)|
- |`!UNQUOTE(123.0)`|`123.0`|
- |`!UNQUOTE( 123 )`|`123`|
- |`!UNQUOTE('a b c')`|`a b c`|
- |`!UNQUOTE("a b c")`|`a b c`|
- |`!UNQUOTE(!1)`|`a 'b' c` (if `!1` is `a 'b' c`)|
- |`!QUOTE(!UNQUOTE(123.0))`|`'123.0'`|
- |`!QUOTE(!UNQUOTE( 123 ))`|`'123'`|
- |`!QUOTE(!UNQUOTE('a b c'))`|`'a b c'`|
- |`!QUOTE(!UNQUOTE("a b c"))`|`'a b c'`|
- |`!QUOTE(!UNQUOTE(!1))`|`'a ''b'' c'` (if `!1` is `a 'b' c`)|
-
-* `!SUBSTR(arg, start[, count])`
- Expands to a substring of ARG starting from 1-based position START.
- If COUNT is given, it limits the number of characters in the
- expansion; if it is omitted, then the expansion extends to the end
- of ARG.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!SUBSTR(banana, 3)`|`nana`|
- |`!SUBSTR(banana, 3, 3)`|`nan`|
- |`!SUBSTR("banana", 1, 3)`|error (`"ba` is not a valid token)|
- |`!SUBSTR(!UNQUOTE("banana"), 3)`|`nana`|
- |`!SUBSTR("banana", 3, 3)`|`ana`|
- |`!SUBSTR(banana, 3, 0)`|(empty)|
- |`!SUBSTR(banana, 3, 10)`|`nana`|
- |`!SUBSTR(banana, 10, 3)`|(empty)|
-
-* `!UPCASE(arg)`
- Expands to an unquoted version of ARG with all letters converted to
- uppercase.
-
- |Call|Expansion|
- |:-----|:--------|
- |`!UPCASE(freckle)`|`FRECKLE`|
- |`!UPCASE('freckle')`|`FRECKLE`|
- |`!UPCASE('a b c')`|`A B C`|
- |`!UPCASE('A B C')`|`A B C`|
-
-## Macro Expressions
-
-Macro expressions are used in conditional expansion and loops, which are
-described in the following sections. A macro expression may use the
-following operators, listed in descending order of operator precedence:
-
-* `()`
- Parentheses override the default operator precedence.
-
-* `!EQ !NE !GT !LT !GE !LE = ~= <> > < >= <=`
- Relational operators compare their operands and yield a Boolean
- result, either `0` for false or `1` for true.
-
- These operators always compare their operands as strings. This can
- be surprising when the strings are numbers because, e.g., `1 < 1.0`
- and `10 < 2` both evaluate to `1` (true).
-
- Comparisons are case sensitive, so that `a = A` evaluates to `0`
- (false).
-
-* `!NOT ~`
- `!AND &`
- `!OR |`
- Logical operators interpret their operands as Boolean values, where
- quoted or unquoted `0` is false and anything else is true, and
- yield a Boolean result, either `0` for false or `1` for true.
-
-Macro expressions do not include any arithmetic operators.
-
-An operand in an expression may be a single token (including a macro
-argument name) or a macro function invocation. Either way, the
-expression evaluator unquotes the operand, so that `1 = '1'` is true.
-
-## Macro Conditional Expansion
-
-The `!IF` construct may be used inside a macro body to allow for
-conditional expansion. It takes the following forms:
-
-```
-!IF (EXPRESSION) !THEN TRUE-EXPANSION !IFEND
-!IF (EXPRESSION) !THEN TRUE-EXPANSION !ELSE FALSE-EXPANSION !IFEND
-```
-
-When `EXPRESSION` evaluates to true, the macro processor expands
-`TRUE-EXPANSION`; otherwise, it expands `FALSE-EXPANSION`, if it is
-present. The macro processor considers quoted or unquoted `0` to be
-false, and anything else to be true.
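-
-As an illustration, the following sketch chooses between two
-procedures based on a keyword argument (the macro name, its argument,
-and variable `v1` are all hypothetical):
-
-```
-DEFINE !do_analysis(type=!TOKENS(1))
-!IF (!type = freq) !THEN
-FREQUENCIES /VARIABLES=v1.
-!ELSE
-DESCRIPTIVES v1.
-!IFEND
-!ENDDEFINE.
-
-!do_analysis type=freq.
-```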
-
-## Macro Loops
-
-The body of a macro may include two forms of loops: loops over numerical
-ranges and loops over tokens. Both forms expand a "loop body" multiple
-times, each time setting a named "loop variable" to a different value.
-The loop body typically expands the loop variable at least once.
-
-The [`MITERATE` setting](../utilities/set.md#miterate) limits the number of
-iterations in a loop. This is a safety measure to ensure that macro
-expansion terminates. PSPP issues a warning when the `MITERATE` limit is
-exceeded.
-
-### Loops Over Ranges
-
-```
-!DO !VAR = START !TO END [!BY STEP]
- BODY
-!DOEND
-```
-
-A loop over a numerical range has the form shown above. `START`,
-`END`, and `STEP` (if included) must be expressions with numeric
-values. The macro processor accepts both integers and real numbers.
-The macro processor expands `BODY` for each numeric value from `START`
-to `END`, inclusive.
-
-The default value for `STEP` is 1. If `STEP` is positive and `START >
-END`, or if `STEP` is negative and `START < END`, then the macro
-processor doesn't expand the body at all. `STEP` may not be zero.
-
-### Loops Over Tokens
-
-```
-!DO !VAR !IN (EXPRESSION)
- BODY
-!DOEND
-```
-
-A loop over tokens takes the form shown above. The macro processor
-evaluates `EXPRESSION` and expands `BODY` once per token in the
-result, substituting the token for `!VAR` each time it appears.
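-
-The sketch below (the macro name is hypothetical) runs
-`DESCRIPTIVES` once for each variable named in the argument:
-
-```
-DEFINE !describe_each(vars=!CMDEND)
-!DO !v !IN (!vars)
-DESCRIPTIVES !v.
-!DOEND
-!ENDDEFINE.
-
-!describe_each vars=v1 v2 v3.
-```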
-
-## Macro Variable Assignment
-
-The `!LET` construct evaluates an expression and assigns the result to a
-macro variable. It may create a new macro variable or change the value
-of one created by a previous `!LET` or `!DO`, but it may not change the
-value of a macro argument. `!LET` has the following form:
-
-```
-!LET !VAR = EXPRESSION
-```
-
-If `EXPRESSION` is more than one token, it must be enclosed in
-parentheses.
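-
-For example, the following sketch (the macro name is hypothetical)
-stores the length of its argument in a macro variable, then uses it
-to build a message:
-
-```
-DEFINE !report(!POSITIONAL !CMDEND)
-!LET !n = !LENGTH(!1)
-ECHO !QUOTE(!CONCAT('Argument is ', !n, ' characters long')).
-!ENDDEFINE.
-
-!report abcde.
-```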
-
-## Macro Settings
-
-Some macro behavior is controlled through the
-[`SET`](../utilities/set.md) command. This section describes these
-settings.
-
-Any `SET` command that changes these settings within a macro body only
-takes effect following the macro. This is because PSPP expands a
-macro's entire body at once, so that `SET` inside the body only
-executes afterwards.
-
-The [`MEXPAND`](../utilities/set.md#mexpand) setting controls whether
-macros will be expanded at all. By default, macro expansion is on.
-To avoid expansion of macros called within a macro body, use
-[`!OFFEXPAND` and `!ONEXPAND`](#controlling-macro-expansion).
-
-When [`MPRINT`](../utilities/set.md#mprint) is turned on, PSPP outputs
-an expansion of each macro called. This feature can be useful for
-debugging macro definitions. For reading the expanded version, keep
-in mind that macro expansion removes comments and standardizes white
-space.
-
-[`MNEST`](../utilities/set.md#mnest) limits the depth of expansion of
-macro calls, that is, the nesting level of macro expansion. The
-default is 50. This is mainly useful to avoid infinite expansion in
-the case of a macro that calls itself.
-
-[`MITERATE`](../utilities/set.md#miterate) limits the number of
-iterations in a `!DO` construct. The default is 1000.
-
-## Additional Notes
-
-### Calling Macros from Macros
-
-If the body of macro A includes a call to macro B, the call can use
-macro arguments (including `!*`) and macro variables as part of
-arguments to B. For `!TOKENS` arguments, the argument or variable name
-counts as one token regardless of the number that it expands into; for
-`!CHAREND` and `!ENCLOSE` arguments, the delimiters come only from the
-call, not the expansions; and `!CMDEND` ends at the calling command, not
-any end of command within an argument or variable.
-
-Macro functions are not supported as part of the arguments in a macro
-call. To get the same effect, use `!LET` to define a macro variable,
-then pass the macro variable to the macro.
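-
-For example, rather than writing `!b !UPCASE(!1).` directly, macro
-`!a` in the sketch below (both macro names are hypothetical) first
-captures the result with `!LET`:
-
-```
-DEFINE !b(!POSITIONAL !TOKENS(1))
-DESCRIPTIVES !1.
-!ENDDEFINE.
-
-DEFINE !a(!POSITIONAL !TOKENS(1))
-!LET !var = !UPCASE(!1)
-!b !var.
-!ENDDEFINE.
-```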
-
-When macro A calls macro B, the order of their `DEFINE` commands
-doesn't matter, as long as macro B has been defined when A is called.
-
-### Command Terminators
-
-Macros and command terminators require care. Macros honor the syntax
-differences between [interactive and batch
-syntax](../../language/basics/syntax-variants.md), which means that
-the interpretation of a macro can vary depending on the syntax mode in
-use. We assume here that interactive mode is in use, in which `.` at
-the end of a line is the primary way to end a command.
-
-The `DEFINE` command needs to end with `.` following the `!ENDDEFINE`.
-The macro body may contain `.` if it is intended to expand to whole
-commands, but using `.` within a macro body that expands to just
-syntax fragments (such as a list of variables) will cause syntax
-errors.
-
-Macro directives such as `!IF` and `!DO` do not end with `.`.
-
-### Expansion Contexts
-
-PSPP does not expand macros within comments, whether introduced within
-a line by `/*` or as a separate [`COMMENT` or
-`*`](../utilities/comment.md) command. (SPSS does expand macros in
-`COMMENT` and `*`.)
-
-Macros do not expand within quoted strings.
-
-Macros are expanded in the [`TITLE`](../utilities/title.md) and
-[`SUBTITLE`](../utilities/subtitle.md) commands as long as their
-arguments are not quoted strings.
-
-### PRESERVE and RESTORE
-
-Some macro bodies might use the [`SET`](../utilities/set.md) command
-to change certain settings. When this is the case, consider using the
-[`PRESERVE` and `RESTORE`](../utilities/preserve.md) commands to save
-and then restore these settings.
-
+++ /dev/null
-# DO IF…END IF
-
-```
-DO IF condition.
- ...
-[ELSE IF condition.
- ...
-]...
-[ELSE.
- ...]
-END IF.
-```
-
-`DO IF` allows one of several sets of transformations to be executed,
-depending on user-specified conditions.
-
-If the specified boolean expression evaluates as true, then the block
-of code following `DO IF` is executed. If it evaluates as missing,
-then none of the code blocks is executed. If it is false, then the
-boolean expression on the first `ELSE IF`, if present, is tested in
-turn, with the same rules applied. If all expressions evaluate to
-false, then the `ELSE` code block is executed, if it is present.
-
-When `DO IF` or `ELSE IF` is specified following
-[`TEMPORARY`](../../commands/selection/temporary.md), the
-[`LAG`](../../language/expressions/functions/miscellaneous.md)
-function may not be used.
-
+++ /dev/null
-# DO REPEAT…END REPEAT
-
-```
-DO REPEAT dummy_name=expansion....
- ...
-END REPEAT [PRINT].
-
-expansion takes one of the following forms:
- var_list
- num_or_range...
- 'string'...
- ALL
-
-num_or_range takes one of the following forms:
- number
- num1 TO num2
-```
-
-`DO REPEAT` repeats a block of code, textually substituting different
-variables, numbers, or strings into the block with each repetition.
-
-Specify a dummy variable name followed by an equals sign (`=`) and
-the list of replacements. Replacements can be a list of existing or new
-variables, numbers, strings, or `ALL` to specify all existing variables.
-When numbers are specified, runs of increasing integers may be indicated
-as `NUM1 TO NUM2`, so that `1 TO 5` is short for `1 2 3 4 5`.
-
-Multiple dummy variables can be specified. Each variable must have
-the same number of replacements.
-
-The code within `DO REPEAT` is repeated as many times as there are
-replacements for each variable. The first time, the first value for
-each dummy variable is substituted; the second time, the second value
-for each dummy variable is substituted; and so on.
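-
-For example, the following minimal sketch creates numeric variables
-`x1` through `x3` and sets them to 1, 2, and 3, respectively:
-
-```
-DO REPEAT x=x1 x2 x3 / n=1 2 3.
-COMPUTE x=n.
-END REPEAT.
-```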
-
-Dummy variable substitutions work like macros. They take place
-anywhere in a line that the dummy variable name occurs. This includes
-command and subcommand names, so command and subcommand names that
-appear in the code block should not be used as dummy variable
-identifiers. Dummy variable substitutions do not occur inside quoted
-strings, comments, unquoted strings (such as the text on the `TITLE`
-or `DOCUMENT` command), or inside `BEGIN DATA`...`END DATA`.
-
-Substitution occurs only on whole words, so that, for example, a dummy
-variable `PRINT` would not be substituted into the word `PRINTOUT`.
-
-New variable names used as replacements are not automatically created
-as variables, but only if used in the code block in a context that
-would create them, e.g. on a `NUMERIC` or `STRING` command or on the
-left side of a `COMPUTE` assignment.
-
-Any command may appear within `DO REPEAT`, including nested `DO
-REPEAT` commands. If `INCLUDE` or `INSERT` appears within `DO
-REPEAT`, the substitutions do not apply to the included file.
-
-If `PRINT` is specified on `END REPEAT`, the commands, after
-substitutions are made, are printed to the listing file, prefixed by
-a plus sign (`+`). This feature is not yet implemented.
-
+++ /dev/null
-This chapter documents PSPP commands used for conditional execution,
-looping, and flow of control.
+++ /dev/null
-# LOOP…END LOOP
-
-```
-LOOP [INDEX_VAR=START TO END [BY INCR]] [IF CONDITION].
- ...
-END LOOP [IF CONDITION].
-```
-
-`LOOP` iterates a group of commands. A number of termination options
-are offered.
-
-Specify `INDEX_VAR` to make that variable count from one value to
-another by a particular increment. `INDEX_VAR` must be a pre-existing
-numeric variable. `START`, `END`, and `INCR` are numeric
-[expressions](../../language/expressions/index.md).
-
-During the first iteration, `INDEX_VAR` is set to the value of
-`START`. During each successive iteration, `INDEX_VAR` is increased
-by the value of `INCR`. If `END > START`, then the loop terminates
-when `INDEX_VAR > END`; otherwise it terminates when `INDEX_VAR <
-END`. If `INCR` is not specified then it defaults to +1 or -1 as
-appropriate.
-
-If `END > START` and `INCR < 0`, or if `END < START` and `INCR > 0`,
-then the loop is never executed. `INDEX_VAR` is nevertheless set to
-the value of `START`.
-
-Modifying `INDEX_VAR` within the loop is allowed, but it has no effect
-on the value of `INDEX_VAR` in the next iteration.
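-
-For example, the sketch below sums the integers 1 to 5 into `total`
-for each case, using a scratch variable `#i` as the index (the
-variable names are hypothetical):
-
-```
-NUMERIC #i total.
-COMPUTE total = 0.
-LOOP #i = 1 TO 5.
-COMPUTE total = total + #i.
-END LOOP.
-```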
-
-Specify a boolean expression for the condition on `LOOP` to cause the
-loop to be executed only if the condition is true. If the condition
-is false or missing before the loop contents are executed the first
-time, the loop contents are not executed at all.
-
-If index and condition clauses are both present on `LOOP`, the index
-variable is always set before the condition is evaluated. Thus, a
-condition that makes use of the index variable will always see the index
-value to be used in the next execution of the body.
-
-Specify a boolean expression for the condition on `END LOOP` to cause
-the loop to terminate if the condition is true after the enclosed code
-block is executed. The condition is evaluated at the end of the loop,
-not at the beginning, so that the body of a loop with only a condition
-on `END LOOP` will always execute at least once.
-
-If the index clause is not present, then the global
-[`MXLOOPS`](../utilities/set.md#mxloops) setting, which defaults to
-40, limits the number of iterations.
-
-[`BREAK`](break.md) also terminates `LOOP` execution.
-
-Loop index variables are by default reset to system-missing from one
-case to another, rather than retaining their values. When loops are
-nested, this is usually undesired behavior, which can be corrected
-with [`LEAVE`](../../commands/variables/leave.md) or by using a
-[scratch variable](../../language/datasets/scratch-variables.md) as
-the loop index.
-
-When `LOOP` or `END LOOP` is specified following
-[`TEMPORARY`](../../commands/selection/temporary.md), the
-[`LAG`](../../language/expressions/functions/miscellaneous.md)
-function may not be used.
-
--- /dev/null
+# CORRELATIONS
+
+```
+CORRELATIONS
+ /VARIABLES = VAR_LIST [ WITH VAR_LIST ]
+ [
+ .
+ .
+ .
+ /VARIABLES = VAR_LIST [ WITH VAR_LIST ]
+ /VARIABLES = VAR_LIST [ WITH VAR_LIST ]
+ ]
+
+ [ /PRINT={TWOTAIL, ONETAIL} {SIG, NOSIG} ]
+ [ /STATISTICS=DESCRIPTIVES XPROD ALL]
+ [ /MISSING={PAIRWISE, LISTWISE} {INCLUDE, EXCLUDE} ]
+```
+
+The `CORRELATIONS` procedure produces tables of the Pearson
+correlation coefficient for a set of variables. The significance of
+the coefficients is also given.
+
+At least one `VARIABLES` subcommand is required. If you specify the
+`WITH` keyword, then a non-square correlation table is produced. The
+variables preceding `WITH` are used as the rows of the table, and the
+variables following `WITH` are used as the columns of the table. If
+the `WITH` keyword is not specified, then `CORRELATIONS` produces a
+square, symmetrical table using all variables.
+
+The `MISSING` subcommand determines the handling of missing
+values. If `INCLUDE` is set, then user-missing values are included
+in the calculations, but system-missing values are not. If `EXCLUDE` is
+set, which is the default, user-missing values are excluded as well as
+system-missing values.
+
+If `LISTWISE` is set, then the entire case is excluded from analysis
+whenever any variable specified in any `/VARIABLES` subcommand contains
+a missing value. If `PAIRWISE` is set, then a case is considered
+missing only if either of the values for the particular coefficient is
+missing. The default is `PAIRWISE`.
+
+The `PRINT` subcommand is used to control how the reported
+significance values are printed. If the `TWOTAIL` option is used, then
+a two-tailed test of significance is printed. If the `ONETAIL` option
+is given, then a one-tailed test is used. The default is `TWOTAIL`.
+
+If the `NOSIG` option is specified, then correlation coefficients
+with significance less than 0.05 are highlighted. If `SIG` is
+specified, then no highlighting is performed. This is the default.
+
+The `STATISTICS` subcommand requests additional statistics to be
+displayed. The keyword `DESCRIPTIVES` requests that the mean, number of
+non-missing cases, and the non-biased estimator of the standard
+deviation are displayed. These statistics are displayed in a separated
+table, for all the variables listed in any `/VARIABLES` subcommand. The
+`XPROD` keyword requests cross-product deviations and covariance
+estimators to be displayed for each pair of variables. The keyword
+`ALL` is the union of `DESCRIPTIVES` and `XPROD`.
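+
+A minimal invocation might look like the following (the variable
+names are hypothetical):
+
+```
+CORRELATIONS
+        /VARIABLES = v1 v2 v3
+        /PRINT = TWOTAIL NOSIG
+        /STATISTICS = DESCRIPTIVES.
+```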
+
--- /dev/null
+# CROSSTABS
+
+```
+CROSSTABS
+ /TABLES=VAR_LIST BY VAR_LIST [BY VAR_LIST]...
+ /MISSING={TABLE,INCLUDE,REPORT}
+ /FORMAT={TABLES,NOTABLES}
+ {AVALUE,DVALUE}
+ /CELLS={COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
+ ASRESIDUAL,ALL,NONE}
+ /COUNT={ASIS,CASE,CELL}
+ {ROUND,TRUNCATE}
+ /STATISTICS={CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
+ KAPPA,ETA,CORR,ALL,NONE}
+ /BARCHART
+
+(Integer mode.)
+ /VARIABLES=VAR_LIST (LOW,HIGH)...
+```
+
+The `CROSSTABS` procedure displays crosstabulation tables requested
+by the user. It can calculate several statistics for each cell in the
+crosstabulation tables. In addition, a number of statistics can be
+calculated for each table itself.
+
+The `TABLES` subcommand is used to specify the tables to be reported.
+Any number of dimensions is permitted, and any number of variables per
+dimension is allowed. The `TABLES` subcommand may be repeated as many
+times as needed. This is the only required subcommand in "general
+mode".
+
+Occasionally, one may want to invoke a special mode called "integer
+mode". Normally, in general mode, PSPP automatically determines what
+values occur in the data. In integer mode, the user specifies the range
+of values that the data assumes. To invoke this mode, specify the
+`VARIABLES` subcommand, giving a range of data values in parentheses for
+each variable to be used on the `TABLES` subcommand. Data values inside
+the range are truncated to the nearest integer, then assigned to that
+value. If values occur outside this range, they are discarded. When it
+is present, the `VARIABLES` subcommand must precede the `TABLES`
+subcommand.
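+
+For example, the following sketch (with hypothetical variables)
+tabulates `x` against `y` in integer mode, considering only the given
+ranges of values:
+
+```
+CROSSTABS
+        /VARIABLES=x (1,5) y (1,2)
+        /TABLES=x BY y.
+```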
+
+In general mode, numeric and string variables may be specified on
+`TABLES`. In integer mode, only numeric variables are allowed.
+
+The `MISSING` subcommand determines the handling of user-missing
+values. When set to `TABLE`, the default, missing values are dropped on
+a table by table basis. When set to `INCLUDE`, user-missing values are
+included in tables and statistics. When set to `REPORT`, which is
+allowed only in integer mode, user-missing values are included in tables
+but marked with a footnote and excluded from statistical calculations.
+
+The `FORMAT` subcommand controls the characteristics of the
+crosstabulation tables to be displayed. It has a number of possible
+settings:
+
+* `TABLES`, the default, causes crosstabulation tables to be output.
+
+* `NOTABLES`, which is equivalent to `CELLS=NONE`, suppresses them.
+
+* `AVALUE`, the default, causes values to be sorted in ascending
+ order. `DVALUE` asserts a descending sort order.
+
+The `CELLS` subcommand controls the contents of each cell in the
+displayed crosstabulation table. The possible settings are:
+
+* `COUNT`
+ Frequency count.
+* `ROW`
+ Row percent.
+* `COLUMN`
+ Column percent.
+* `TOTAL`
+ Table percent.
+* `EXPECTED`
+ Expected value.
+* `RESIDUAL`
+ Residual.
+* `SRESIDUAL`
+ Standardized residual.
+* `ASRESIDUAL`
+ Adjusted standardized residual.
+* `ALL`
+ All of the above.
+* `NONE`
+ Suppress cells entirely.
+
+`/CELLS` without any settings specified requests `COUNT`, `ROW`,
+`COLUMN`, and `TOTAL`. If `CELLS` is not specified at all then only
+`COUNT` is selected.
+
+By default, crosstabulation and statistics use raw case weights,
+without rounding. Use the `/COUNT` subcommand to perform rounding:
+`CASE` rounds the weights of individual cases as they are read,
+`CELL` rounds the weights of cells within each crosstabulation table
+after it has been constructed, and `ASIS` explicitly specifies the
+default non-rounding behavior. When rounding is requested, `ROUND`,
+the default, rounds to the nearest integer and `TRUNCATE` rounds
+toward zero.
+
+The `STATISTICS` subcommand selects statistics for computation:
+
+* `CHISQ`
+ Pearson chi-square, likelihood ratio, Fisher's exact test,
+ continuity correction, linear-by-linear association.
+* `PHI`
+ Phi.
+* `CC`
+ Contingency coefficient.
+* `LAMBDA`
+ Lambda.
+* `UC`
+ Uncertainty coefficient.
+* `BTAU`
+ Tau-b.
+* `CTAU`
+ Tau-c.
+* `RISK`
+ Risk estimate.
+* `GAMMA`
+ Gamma.
+* `D`
+ Somers' D.
+* `KAPPA`
+ Cohen's Kappa.
+* `ETA`
+ Eta.
+* `CORR`
+ Spearman correlation, Pearson's r.
+* `ALL`
+ All of the above.
+* `NONE`
+ No statistics.
+
+Selected statistics are only calculated when appropriate for the
+statistic. Certain statistics require tables of a particular size, and
+some statistics are calculated only in integer mode.
+
+`/STATISTICS` without any settings selects `CHISQ`. If the `STATISTICS`
+subcommand is not given, no statistics are calculated.
+
+The `/BARCHART` subcommand produces a clustered bar chart for the
+first two variables on each table. If a table has more than two
+variables, the counts for the third and subsequent levels are aggregated
+and the chart is produced as if there were only two variables.
+
+> Currently the implementation of `CROSSTABS` has the
+> following limitations:
+>
+> - Significance of some symmetric and directional measures is not
+> calculated.
+> - Asymptotic standard error is not calculated for Goodman and
+> Kruskal's tau or symmetric Somers' d.
+> - Approximate T is not calculated for symmetric uncertainty
+> coefficient.
+>
+> Fixes for any of these deficiencies would be welcomed.
+
+## Example
+
+A researcher wishes to know if, in an industry, a person's sex is
+related to the person's occupation. To investigate this, she has
+determined that `personnel.sav` is a representative, randomly
+selected sample of persons. The researcher's null hypothesis is that a
+person's sex has no relation to a person's occupation. She uses a
+chi-squared test of independence to investigate the hypothesis.
+
+```
+get file="personnel.sav".
+
+crosstabs
+ /tables= occupation by sex
+ /cells = count expected
+ /statistics=chisq.
+```
+
+The syntax above conducts a chi-squared test of independence. The
+line `/tables = occupation by sex` indicates that occupation and sex
+are the variables to be tabulated.
+
+As shown in the output below, `CROSSTABS` generates a contingency
+table containing the observed count and the expected count of each sex
+and each occupation. The expected count is the count which would be
+observed if the null hypothesis were true.
+
+The significance of the Pearson Chi-Square value is very much larger
+than the normally accepted value of 0.05 and so one cannot reject the
+null hypothesis. Thus the researcher must conclude that a person's
+sex has no relation to the person's occupation.
+
+```
+ Summary
+┌────────────────┬───────────────────────────────┐
+│ │ Cases │
+│ ├──────────┬─────────┬──────────┤
+│ │ Valid │ Missing │ Total │
+│ ├──┬───────┼─┬───────┼──┬───────┤
+│ │ N│Percent│N│Percent│ N│Percent│
+├────────────────┼──┼───────┼─┼───────┼──┼───────┤
+│occupation × sex│54│ 96.4%│2│ 3.6%│56│ 100.0%│
+└────────────────┴──┴───────┴─┴───────┴──┴───────┘
+
+ occupation × sex
+┌──────────────────────────────────────┬───────────┬─────┐
+│ │ sex │ │
+│ ├────┬──────┤ │
+│ │Male│Female│Total│
+├──────────────────────────────────────┼────┼──────┼─────┤
+│occupation Artist Count │ 2│ 6│ 8│
+│ Expected│4.89│ 3.11│ .15│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Baker Count │ 1│ 1│ 2│
+│ Expected│1.22│ .78│ .04│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Barrister Count │ 0│ 1│ 1│
+│ Expected│ .61│ .39│ .02│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Carpenter Count │ 3│ 1│ 4│
+│ Expected│2.44│ 1.56│ .07│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Cleaner Count │ 4│ 0│ 4│
+│ Expected│2.44│ 1.56│ .07│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Cook Count │ 3│ 2│ 5│
+│ Expected│3.06│ 1.94│ .09│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Manager Count │ 4│ 4│ 8│
+│ Expected│4.89│ 3.11│ .15│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Mathematician Count │ 3│ 1│ 4│
+│ Expected│2.44│ 1.56│ .07│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Painter Count │ 1│ 1│ 2│
+│ Expected│1.22│ .78│ .04│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Payload Specialist Count │ 1│ 0│ 1│
+│ Expected│ .61│ .39│ .02│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Plumber Count │ 5│ 0│ 5│
+│ Expected│3.06│ 1.94│ .09│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Scientist Count │ 5│ 2│ 7│
+│ Expected│4.28│ 2.72│ .13│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Scrientist Count │ 0│ 1│ 1│
+│ Expected│ .61│ .39│ .02│
+│ ────────────────────────────┼────┼──────┼─────┤
+│ Tailor Count │ 1│ 1│ 2│
+│ Expected│1.22│ .78│ .04│
+├──────────────────────────────────────┼────┼──────┼─────┤
+│Total Count │ 33│ 21│ 54│
+│ Expected│ .61│ .39│ 1.00│
+└──────────────────────────────────────┴────┴──────┴─────┘
+
+ Chi─Square Tests
+┌──────────────────┬─────┬──┬──────────────────────────┐
+│ │Value│df│Asymptotic Sig. (2─tailed)│
+├──────────────────┼─────┼──┼──────────────────────────┤
+│Pearson Chi─Square│15.59│13│ .272│
+│Likelihood Ratio │19.66│13│ .104│
+│N of Valid Cases │ 54│ │ │
+└──────────────────┴─────┴──┴──────────────────────────┘
+```
--- /dev/null
+# CTABLES
+
+`CTABLES` has the following overall syntax. At least one `TABLE`
+subcommand is required:
+
+```
+CTABLES
+ ...global subcommands...
+ [/TABLE axis [BY axis [BY axis]]
+ ...per-table subcommands...]...
+```
+
+where each axis may be empty or take one of the following forms:
+
+```
+variable
+variable [{C | S}]
+axis + axis
+axis > axis
+(axis)
+axis [summary [string] [format]]
+```
+
+The following subcommands precede the first `TABLE` subcommand and
+apply to all of the output tables. All of these subcommands are
+optional:
+
+```
+/FORMAT
+ [MINCOLWIDTH={DEFAULT | width}]
+ [MAXCOLWIDTH={DEFAULT | width}]
+ [UNITS={POINTS | INCHES | CM}]
+ [EMPTY={ZERO | BLANK | string}]
+ [MISSING=string]
+/VLABELS
+ VARIABLES=variables
+ DISPLAY={DEFAULT | NAME | LABEL | BOTH | NONE}
+/SMISSING {VARIABLE | LISTWISE}
+/PCOMPUTE &postcompute=EXPR(expression)
+/PPROPERTIES &postcompute...
+ [LABEL=string]
+ [FORMAT=[summary format]...]
+ [HIDESOURCECATS={NO | YES}
+/WEIGHT VARIABLE=variable
+/HIDESMALLCOUNTS COUNT=count
+```
+
+The following subcommands follow `TABLE` and apply only to the
+previous `TABLE`. All of these subcommands are optional:
+
+```
+/SLABELS
+ [POSITION={COLUMN | ROW | LAYER}]
+ [VISIBLE={YES | NO}]
+/CLABELS {AUTO | {ROWLABELS|COLLABELS}={OPPOSITE|LAYER}}
+/CATEGORIES VARIABLES=variables
+ {[value, value...]
+ | [ORDER={A | D}]
+ [KEY={VALUE | LABEL | summary(variable)}]
+ [MISSING={EXCLUDE | INCLUDE}]}
+ [TOTAL={NO | YES} [LABEL=string] [POSITION={AFTER | BEFORE}]]
+ [EMPTY={INCLUDE | EXCLUDE}]
+/TITLES
+ [TITLE=string...]
+ [CAPTION=string...]
+ [CORNER=string...]
+```
+
+The `CTABLES` (aka "custom tables") command produces
+multi-dimensional tables from categorical and scale data. It offers
+many options for data summarization and formatting.
+
+This section's examples use data from the 2008 (USA) National Survey
+of Drinking and Driving Attitudes and Behaviors, a public domain data
+set from the (USA) National Highway Traffic Safety Administration and
+available at <https://data.transportation.gov>. PSPP includes this
+data set, with a modified dictionary, as `examples/nhtsa.sav`.
+
+<!-- toc -->
+
+## Basics
+
+The only required subcommand is `TABLE`, which specifies the variables
+to include along each axis:
+
+```
+ /TABLE rows [BY columns [BY layers]]
+```
+
+In `TABLE`, each of `rows`, `columns`, and `layers` is either empty or
+an axis expression that specifies one or more variables. At least one
+of them must specify an axis expression.
+
+## Categorical Variables
+
+An axis expression that names a categorical variable divides the data
+into cells according to the values of that variable. When all the
+variables named on `TABLE` are categorical, by default each cell
+displays the number of cases that it contains, so specifying a single
+variable yields a frequency table, much like the output of the
+[`FREQUENCIES`](frequencies.md) command:
+
+```
+ CTABLES /TABLE=ageGroup.
+```
+
+```
+ Custom Tables
+┌───────────────────────┬─────┐
+│ │Count│
+├───────────────────────┼─────┤
+│Age group 15 or younger│ 0│
+│ 16 to 25 │ 1099│
+│ 26 to 35 │ 967│
+│ 36 to 45 │ 1037│
+│ 46 to 55 │ 1175│
+│ 56 to 65 │ 1247│
+│ 66 or older │ 1474│
+└───────────────────────┴─────┘
+```
+
+Specifying a row and a column categorical variable yields a
+crosstabulation, much like the output of the
+[`CROSSTABS`](crosstabs.md) command:
+
+```
+CTABLES /TABLE=ageGroup BY gender.
+```
+
+```
+ Custom Tables
+┌───────────────────────┬────────────┐
+│ │S3a. GENDER:│
+│ ├─────┬──────┤
+│ │ Male│Female│
+│ ├─────┼──────┤
+│ │Count│ Count│
+├───────────────────────┼─────┼──────┤
+│Age group 15 or younger│ 0│ 0│
+│ 16 to 25 │ 594│ 505│
+│ 26 to 35 │ 476│ 491│
+│ 36 to 45 │ 489│ 548│
+│ 46 to 55 │ 526│ 649│
+│ 56 to 65 │ 516│ 731│
+│ 66 or older │ 531│ 943│
+└───────────────────────┴─────┴──────┘
+```
+
+The `>` "nesting" operator nests multiple variables on a single axis,
+e.g.:
+
+```
+CTABLES /TABLE ageGroup > gender BY hasHostedEventWithAlcohol.
+```
+
+```
+ Custom Tables
+┌─────────────────────────────────┬───────────────────────────────────────────┐
+│ │ 86. In the past year, have you hosted a │
+│ │ social event or party where alcohol was │
+│ │ served to adults? │
+│ ├─────────────────────┬─────────────────────┤
+│ │ Yes │ No │
+│ ├─────────────────────┼─────────────────────┤
+│ │ Count │ Count │
+├─────────────────────────────────┼─────────────────────┼─────────────────────┤
+│Age 15 or S3a. Male │ 0│ 0│
+│group younger GENDER: Female│ 0│ 0│
+│ ───────────────────────────┼─────────────────────┼─────────────────────┤
+│ 16 to 25 S3a. Male │ 208│ 386│
+│ GENDER: Female│ 202│ 303│
+│ ───────────────────────────┼─────────────────────┼─────────────────────┤
+│ 26 to 35 S3a. Male │ 225│ 251│
+│ GENDER: Female│ 242│ 249│
+│ ───────────────────────────┼─────────────────────┼─────────────────────┤
+│ 36 to 45 S3a. Male │ 223│ 266│
+│ GENDER: Female│ 240│ 307│
+│ ───────────────────────────┼─────────────────────┼─────────────────────┤
+│ 46 to 55 S3a. Male │ 201│ 325│
+│ GENDER: Female│ 282│ 366│
+│ ───────────────────────────┼─────────────────────┼─────────────────────┤
+│ 56 to 65 S3a. Male │ 196│ 320│
+│ GENDER: Female│ 279│ 452│
+│ ───────────────────────────┼─────────────────────┼─────────────────────┤
+│ 66 or S3a. Male │ 162│ 367│
+│ older GENDER: Female│ 243│ 700│
+└─────────────────────────────────┴─────────────────────┴─────────────────────┘
+```
+
+The `+` "stacking" operator allows a single output table to include
+multiple data analyses. With `+`, `CTABLES` divides the output table
+into multiple "sections", each of which includes an analysis of the full
+data set. For example, the following command separately tabulates age
+group and driving frequency by gender:
+
+```
+CTABLES /TABLE ageGroup + freqOfDriving BY gender.
+```
+
+```
+ Custom Tables
+┌────────────────────────────────────────────────────────────────┬────────────┐
+│ │S3a. GENDER:│
+│ ├─────┬──────┤
+│ │ Male│Female│
+│ ├─────┼──────┤
+│ │Count│ Count│
+├────────────────────────────────────────────────────────────────┼─────┼──────┤
+│Age group 15 or younger │ 0│ 0│
+│ 16 to 25 │ 594│ 505│
+│ 26 to 35 │ 476│ 491│
+│ 36 to 45 │ 489│ 548│
+│ 46 to 55 │ 526│ 649│
+│ 56 to 65 │ 516│ 731│
+│ 66 or older │ 531│ 943│
+├────────────────────────────────────────────────────────────────┼─────┼──────┤
+│ 1. How often do you usually drive a car or Every day │ 2305│ 2362│
+│other motor vehicle? Several days a week│ 440│ 834│
+│ Once a week or less│ 125│ 236│
+│ Only certain times │ 58│ 72│
+│ a year │ │ │
+│ Never │ 192│ 348│
+└────────────────────────────────────────────────────────────────┴─────┴──────┘
+```
+
+When `+` and `>` are used together, `>` binds more tightly. Use
+parentheses to override operator precedence. Thus, the following two
+commands are interpreted differently:
+
+```
+CTABLES /TABLE hasConsideredReduction + hasBeenCriticized > gender.
+CTABLES /TABLE (hasConsideredReduction + hasBeenCriticized) > gender.
+```
+
+```
+ Custom Tables
+┌───────────────────────────────────────────────────────────────────────┬─────┐
+│ │Count│
+├───────────────────────────────────────────────────────────────────────┼─────┤
+│26. During the last 12 months, has there been a Yes │ 513│
+│time when you felt you should cut down on your ─────────────────────┼─────┤
+│drinking? No │ 3710│
+├───────────────────────────────────────────────────────────────────────┼─────┤
+│27. During the last 12 months, has there been a Yes S3a. Male │ 135│
+│time when people criticized your drinking? GENDER: Female│ 49│
+│ ─────────────────────┼─────┤
+│ No S3a. Male │ 1916│
+│ GENDER: Female│ 2126│
+└───────────────────────────────────────────────────────────────────────┴─────┘
+
+ Custom Tables
+┌───────────────────────────────────────────────────────────────────────┬─────┐
+│ │Count│
+├───────────────────────────────────────────────────────────────────────┼─────┤
+│26. During the last 12 months, has there been a Yes S3a. Male │ 333│
+│time when you felt you should cut down on your GENDER: Female│ 180│
+│drinking? ─────────────────────┼─────┤
+│ No S3a. Male │ 1719│
+│ GENDER: Female│ 1991│
+├───────────────────────────────────────────────────────────────────────┼─────┤
+│27. During the last 12 months, has there been a Yes S3a. Male │ 135│
+│time when people criticized your drinking? GENDER: Female│ 49│
+│ ─────────────────────┼─────┤
+│ No S3a. Male │ 1916│
+│ GENDER: Female│ 2126│
+└───────────────────────────────────────────────────────────────────────┴─────┘
+```
+
+
+## Scalar Variables
+
+For a categorical variable, `CTABLES` divides the table into a cell per
+category. For a scalar variable, `CTABLES` instead calculates a summary
+measure, by default the mean, of the values that fall into a cell. For
+example, if the only variable specified is a scalar variable, then the
+output is a single cell that holds the mean of all of the data:
+
+```
+CTABLES /TABLE age.
+```
+
+```
+ Custom Tables
+┌──────────────────────────┬────┐
+│ │Mean│
+├──────────────────────────┼────┤
+│D1. AGE: What is your age?│ 48│
+└──────────────────────────┴────┘
+```
+
+A scalar variable may nest with categorical variables. The following
+example shows the mean age of survey respondents across gender and
+language groups:
+
+```
+CTABLES /TABLE gender > age BY region.
+```
+
+```
+Custom Tables
+┌─────────────────────────────────────┬───────────────────────────────────────┐
+│ │Was this interview conducted in English│
+│ │ or Spanish? │
+│ ├───────────────────┬───────────────────┤
+│ │ English │ Spanish │
+│ ├───────────────────┼───────────────────┤
+│ │ Mean │ Mean │
+├─────────────────────────────────────┼───────────────────┼───────────────────┤
+│D1. AGE: What is S3a. Male │ 46│ 37│
+│your age? GENDER: Female│ 51│ 39│
+└─────────────────────────────────────┴───────────────────┴───────────────────┘
+```
+
+The order of nesting of scalar and categorical variables affects
+table labeling, but it does not affect the data displayed in the table.
+The following example shows how the output changes when the nesting
+order of the scalar and categorical variable are interchanged:
+
+```
+CTABLES /TABLE age > gender BY region.
+```
+
+```
+ Custom Tables
+┌─────────────────────────────────────┬───────────────────────────────────────┐
+│ │Was this interview conducted in English│
+│ │ or Spanish? │
+│ ├───────────────────┬───────────────────┤
+│ │ English │ Spanish │
+│ ├───────────────────┼───────────────────┤
+│ │ Mean │ Mean │
+├─────────────────────────────────────┼───────────────────┼───────────────────┤
+│S3a. Male D1. AGE: What is │ 46│ 37│
+│GENDER: your age? │ │ │
+│ ───────────────────────────┼───────────────────┼───────────────────┤
+│ Female D1. AGE: What is │ 51│ 39│
+│ your age? │ │ │
+└─────────────────────────────────────┴───────────────────┴───────────────────┘
+```
+
+Only a single scalar variable may appear in each section; that is, a
+scalar variable may not nest inside a scalar variable directly or
+indirectly. Scalar variables may only appear on one axis within
+`TABLE`.
+
+## Overriding Measurement Level
+
+By default, `CTABLES` uses a variable's measurement level to decide
+whether to treat it as categorical or scalar. Variables assigned the
+nominal or ordinal measurement level are treated as categorical, and
+variables assigned the scale measurement level are treated as scalar.
+
+When PSPP reads data from a file in an external format, such as a text
+file, variables' measurement levels are often unknown. If `CTABLES`
+runs when a variable has an unknown measurement level, it makes an
+initial pass through the data to [guess measurement
+levels](../language/datasets/variables.md). Use the [`VARIABLE
+LEVEL`](variable-level.md) command to set or change a variable's
+measurement level.
+
+To treat a variable as categorical or scalar only for one use on
+`CTABLES`, add `[C]` or `[S]`, respectively, after the variable name.
+The following example shows the output when variable
+`monthDaysMin1drink` is analyzed as scalar (the default for its
+measurement level) and as categorical:
+
+```
+CTABLES
+ /TABLE monthDaysMin1drink BY gender
+ /TABLE monthDaysMin1drink [C] BY gender.
+```
+
+```
+ Custom Tables
+┌────────────────────────────────────────────────────────────────┬────────────┐
+│ │S3a. GENDER:│
+│ ├────┬───────┤
+│ │Male│ Female│
+│ ├────┼───────┤
+│ │Mean│ Mean │
+├────────────────────────────────────────────────────────────────┼────┼───────┤
+│20. On how many of the thirty days in this typical month did you│ 7│ 5│
+│have one or more alcoholic beverages to drink? │ │ │
+└────────────────────────────────────────────────────────────────┴────┴───────┘
+
+ Custom Tables
+┌────────────────────────────────────────────────────────────────┬────────────┐
+│ │S3a. GENDER:│
+│ ├─────┬──────┤
+│ │ Male│Female│
+│ ├─────┼──────┤
+│ │Count│ Count│
+├────────────────────────────────────────────────────────────────┼─────┼──────┤
+│20. On how many of the thirty days in this typical month None │ 152│ 258│
+│did you have one or more alcoholic beverages to drink? 1 │ 403│ 653│
+│ 2 │ 284│ 324│
+│ 3 │ 169│ 215│
+│ 4 │ 178│ 143│
+│ 5 │ 107│ 106│
+│ 6 │ 67│ 59│
+│ 7 │ 31│ 11│
+│ 8 │ 101│ 74│
+│ 9 │ 6│ 4│
+│ 10 │ 95│ 75│
+│ 11 │ 4│ 0│
+│ 12 │ 58│ 33│
+│ 13 │ 3│ 2│
+│ 14 │ 13│ 3│
+│ 15 │ 79│ 58│
+│ 16 │ 10│ 6│
+│ 17 │ 4│ 2│
+│ 18 │ 5│ 4│
+│ 19 │ 2│ 0│
+│ 20 │ 105│ 47│
+│ 21 │ 2│ 0│
+│ 22 │ 3│ 3│
+│ 23 │ 0│ 3│
+│ 24 │ 3│ 0│
+│ 25 │ 35│ 25│
+│ 26 │ 1│ 1│
+│ 27 │ 3│ 3│
+│ 28 │ 13│ 8│
+│ 29 │ 3│ 3│
+│ Every │ 104│ 43│
+│ day │ │ │
+└────────────────────────────────────────────────────────────────┴─────┴──────┘
+```
+
+## Data Summarization
+
+The `CTABLES` command allows the user to control how the data are
+summarized with "summary specifications": syntax listing one or more
+summary function names, optionally separated by commas, enclosed in
+square brackets after a variable name on the `TABLE` subcommand.
+When all the variables are categorical, summary specifications can be
+given for the innermost nested variables on any one axis. When a
+scalar variable is present, only the scalar variable may have summary
+specifications.
+
+The following example includes a summary specification for column and
+row percentages for categorical variables, and mean and median for a
+scalar variable:
+
+```
+CTABLES
+ /TABLE=age [MEAN, MEDIAN] BY gender
+ /TABLE=ageGroup [COLPCT, ROWPCT] BY gender.
+```
+
+```
+ Custom Tables
+┌──────────────────────────┬───────────────────────┐
+│ │ S3a. GENDER: │
+│ ├───────────┬───────────┤
+│ │ Male │ Female │
+│ ├────┬──────┼────┬──────┤
+│ │Mean│Median│Mean│Median│
+├──────────────────────────┼────┼──────┼────┼──────┤
+│D1. AGE: What is your age?│ 46│ 45│ 50│ 52│
+└──────────────────────────┴────┴──────┴────┴──────┘
+
+ Custom Tables
+┌───────────────────────┬─────────────────────────────┐
+│ │ S3a. GENDER: │
+│ ├──────────────┬──────────────┤
+│ │ Male │ Female │
+│ ├────────┬─────┼────────┬─────┤
+│ │Column %│Row %│Column %│Row %│
+├───────────────────────┼────────┼─────┼────────┼─────┤
+│Age group 15 or younger│ .0%│ .│ .0%│ .│
+│ 16 to 25 │ 19.0%│54.0%│ 13.1%│46.0%│
+│ 26 to 35 │ 15.2%│49.2%│ 12.7%│50.8%│
+│ 36 to 45 │ 15.6%│47.2%│ 14.2%│52.8%│
+│ 46 to 55 │ 16.8%│44.8%│ 16.8%│55.2%│
+│ 56 to 65 │ 16.5%│41.4%│ 18.9%│58.6%│
+│ 66 or older │ 17.0%│36.0%│ 24.4%│64.0%│
+└───────────────────────┴────────┴─────┴────────┴─────┘
+```
+
+A summary specification may override the default label and format by
+appending a string or format specification or both (in that order) to
+the summary function name. For example:
+
+```
+CTABLES /TABLE=ageGroup [COLPCT 'Gender %' PCT5.0,
+ ROWPCT 'Age Group %' PCT5.0]
+ BY gender.
+```
+
+```
+ Custom Tables
+┌───────────────────────┬─────────────────────────────────────────┐
+│ │ S3a. GENDER: │
+│ ├────────────────────┬────────────────────┤
+│ │ Male │ Female │
+│ ├────────┬───────────┼────────┬───────────┤
+│ │Gender %│Age Group %│Gender %│Age Group %│
+├───────────────────────┼────────┼───────────┼────────┼───────────┤
+│Age group 15 or younger│ 0%│ .│ 0%│ .│
+│ 16 to 25 │ 19%│ 54%│ 13%│ 46%│
+│ 26 to 35 │ 15%│ 49%│ 13%│ 51%│
+│ 36 to 45 │ 16%│ 47%│ 14%│ 53%│
+│ 46 to 55 │ 17%│ 45%│ 17%│ 55%│
+│ 56 to 65 │ 16%│ 41%│ 19%│ 59%│
+│ 66 or older │ 17%│ 36%│ 24%│ 64%│
+└───────────────────────┴────────┴───────────┴────────┴───────────┘
+```
+
+In addition to the standard formats, `CTABLES` allows the user to
+specify the following special formats:
+
+|Format|Description|Positive Example|Negative Example|
+|:-----|:----------|-------:|-------:|
+|`NEGPARENw.d`|Encloses negative numbers in parentheses.|42.96|(42.96)|
+|`NEQUALw.d`|Adds a `N=` prefix.|N=42.96|N=-42.96|
+|`PARENw.d`|Encloses all numbers in parentheses.|(42.96)|(-42.96)|
+|`PCTPARENw.d`|Encloses all numbers in parentheses with a `%` suffix.|(42.96%)|(-42.96%)|
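+
+A special format is applied in the same way as a standard one, by
+appending it to a summary function name. A hypothetical illustration
+(output not shown):
+
+```
+CTABLES /TABLE=ageGroup [COLPCT PCTPAREN7.1] BY gender.
+```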
+
+Parentheses provide a shorthand to apply summary specifications to
+multiple variables. For example, both of these commands:
+
+```
+CTABLES /TABLE=ageGroup[COLPCT] + membersOver16[COLPCT] BY gender.
+CTABLES /TABLE=(ageGroup + membersOver16)[COLPCT] BY gender.
+```
+
+produce the same output shown below:
+
+```
+ Custom Tables
+┌─────────────────────────────────────────────────────────────┬───────────────┐
+│ │ S3a. GENDER: │
+│ ├───────┬───────┤
+│ │ Male │ Female│
+│ ├───────┼───────┤
+│ │ Column│ Column│
+│ │ % │ % │
+├─────────────────────────────────────────────────────────────┼───────┼───────┤
+│Age group 15 or │ .0%│ .0%│
+│ younger │ │ │
+│ 16 to 25 │ 19.0%│ 13.1%│
+│ 26 to 35 │ 15.2%│ 12.7%│
+│ 36 to 45 │ 15.6%│ 14.2%│
+│ 46 to 55 │ 16.8%│ 16.8%│
+│ 56 to 65 │ 16.5%│ 18.9%│
+│ 66 or older│ 17.0%│ 24.4%│
+├─────────────────────────────────────────────────────────────┼───────┼───────┤
+│S1. Including yourself, how many members of this None │ .0%│ .0%│
+│household are age 16 or older? 1 │ 21.4%│ 35.0%│
+│ 2 │ 61.9%│ 52.3%│
+│ 3 │ 11.0%│ 8.2%│
+│ 4 │ 4.2%│ 3.2%│
+│ 5 │ 1.1%│ .9%│
+│ 6 or more │ .4%│ .4%│
+└─────────────────────────────────────────────────────────────┴───────┴───────┘
+```
+
+The following sections list the available summary functions. After
+each function's name is given its default label and format. If no
+format is listed, then the default format is the print format for the
+variable being summarized.
+
+### Summary Functions for Individual Cells
+
+This section lists the summary functions that consider only an
+individual cell in `CTABLES`. Only one such summary function, `COUNT`,
+may be applied to both categorical and scale variables:
+
+* `COUNT` ("Count", F40.0)
+ The sum of weights in a cell.
+
+ If `CATEGORIES` for one or more of the variables in a table include
+ missing values (see [Per-Variable Category
+ Options](#per-variable-category-options)), then some or all of the
+ categories for a cell might be missing values. `COUNT` counts data
+ included in a cell regardless of whether its categories are missing.
+
+The following summary functions apply only to scale variables or
+totals and subtotals for categorical variables. Be cautious about
+interpreting the summary value in the latter case, because it is not
+necessarily meaningful; however, the mean of a Likert scale, etc.,
+may have a straightforward interpretation.
+
+* `MAXIMUM` ("Maximum")
+ The largest value.
+
+* `MEAN` ("Mean")
+ The mean.
+
+* `MEDIAN` ("Median")
+ The median value.
+
+* `MINIMUM` ("Minimum")
+ The smallest value.
+
+* `MISSING` ("Missing")
+ Sum of weights of user- and system-missing values.
+
+* `MODE` ("Mode")
+ The highest-frequency value. Ties are broken by taking the
+ smallest mode.
+
+* `PTILE` n ("Percentile n")
+ The Nth percentile, where 0 ≤ N ≤ 100.
+
+* `RANGE` ("Range")
+ The maximum minus the minimum.
+
+* `SEMEAN` ("Std Error of Mean")
+ The standard error of the mean.
+
+* `STDDEV` ("Std Deviation")
+ The standard deviation.
+
+* `SUM` ("Sum")
+ The sum.
+
+* `TOTALN` ("Total N", F40.0)
+ The sum of weights in a cell.
+
+ For scale data, `COUNT` and `TOTALN` are the same.
+
+ For categorical data, `TOTALN` counts missing values in excluded
+ categories, that is, user-missing values not in an explicit category
+ list on `CATEGORIES` (see [Per-Variable Category
+ Options](#per-variable-category-options)), or user-missing values
+ excluded because `MISSING=EXCLUDE` is in effect on `CATEGORIES`, or
+ system-missing values. `COUNT` does not count these.
+
+ See [Missing Values for Summary
+ Variables](#missing-values-for-summary-variables), for details of
+ how `CTABLES` summarizes missing values.
+
+* `VALIDN` ("Valid N", F40.0)
+  The sum of weights of valid values in included categories.
+
+ For categorical variables, `VALIDN` does not count missing values
+ regardless of whether they are in included categories via
+ `CATEGORIES`. `VALIDN` does not count valid values that are in
+ excluded categories. See [Missing Values for Summary
+ Variables](#missing-values-for-summary-variables) for details.
+
+* `VARIANCE` ("Variance")
+ The variance.
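+
+The difference between the three counting functions is easiest to see
+by requesting them side by side. A minimal sketch (output not shown):
+
+```
+CTABLES /TABLE=ageGroup [COUNT, VALIDN, TOTALN].
+```
+
+The three columns coincide unless missing values or excluded
+categories are present.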
+
+### Summary Functions for Groups of Cells
+
+These summary functions summarize over multiple cells within an area of
+the output chosen by the user and specified as part of the function
+name. The following basic AREAs are supported, in decreasing order of
+size:
+
+* `TABLE`
+  A "section". Stacked variables divide sections of the output from
+  each other. Sections may span multiple layers.
+
+* `LAYER`
+ A section within a single layer.
+
+* `SUBTABLE`
+ A "subtable", whose contents are the cells that pair an innermost
+ row variable and an innermost column variable within a single
+ layer.
+
+The following shows how the output for the table expression
+`hasBeenPassengerOfDesignatedDriver > hasBeenPassengerOfDrunkDriver BY
+isLicensedDriver > hasHostedEventWithAlcohol + hasBeenDesignatedDriver
+BY gender`[^1] is divided up into `TABLE`, `LAYER`, and `SUBTABLE`
+areas. Each unique value for Table ID is one section, and similarly
+for Layer ID and Subtable ID. Thus, this output has two `TABLE` areas
+(one for `isLicensedDriver` and one for `hasBeenDesignatedDriver`),
+four `LAYER` areas (for those two variables, per layer), and 12
+`SUBTABLE` areas.
+
+```
+ Custom Tables
+Male
+┌─────────────────────────────────┬─────────────────┬──────┐
+│ │ licensed │desDrv│
+│ ├────────┬────────┼───┬──┤
+│ │ Yes │ No │ │ │
+│ ├────────┼────────┤ │ │
+│ │ hostAlc│ hostAlc│ │ │
+│ ├────┬───┼────┬───┤ │ │
+│ │ Yes│ No│ Yes│ No│Yes│No│
+├─────────────────────────────────┼────┼───┼────┼───┼───┼──┤
+│desPas Yes druPas Yes Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Subtable ID│ 1│ 1│ 2│ 2│ 3│ 3│
+│ ────────────────┼────┼───┼────┼───┼───┼──┤
+│ No Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Subtable ID│ 1│ 1│ 2│ 2│ 3│ 3│
+│ ───────────────────────────┼────┼───┼────┼───┼───┼──┤
+│ No druPas Yes Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Subtable ID│ 4│ 4│ 5│ 5│ 6│ 6│
+│ ────────────────┼────┼───┼────┼───┼───┼──┤
+│ No Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
+│ Subtable ID│ 4│ 4│ 5│ 5│ 6│ 6│
+└─────────────────────────────────┴────┴───┴────┴───┴───┴──┘
+```
+
+`CTABLES` also supports the following AREAs that further divide a
+subtable or a layer within a section:
+
+* `LAYERROW`
+ `LAYERCOL`
+ A row or column, respectively, in one layer of a section.
+
+* `ROW`
+ `COL`
+ A row or column, respectively, in a subtable.
+
+The following summary functions for groups of cells are available for
+each AREA described above, for both categorical and scale variables:
+
+* `areaPCT` or `areaPCT.COUNT` ("Area %", PCT40.1)
+ A percentage of total counts within AREA.
+
+* `areaPCT.VALIDN` ("Area Valid N %", PCT40.1)
+ A percentage of total counts for valid values within AREA.
+
+* `areaPCT.TOTALN` ("Area Total N %", PCT40.1)
+ A percentage of total counts for all values within AREA.
+
+Scale variables and totals and subtotals for categorical variables
+may use the following additional group cell summary function:
+
+* `areaPCT.SUM` ("Area Sum %", PCT40.1)
+ Percentage of the sum of the values within AREA.
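+
+Area percentages combine in a summary specification like any other
+summary functions, e.g. column and whole-table percentages of counts
+together (output not shown):
+
+```
+CTABLES /TABLE=ageGroup [COLPCT.COUNT, TABLEPCT.COUNT] BY gender.
+```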
+
+
+[^1]: This is not necessarily a meaningful table. To make it easier to
+read, short variable labels are used.
+
+### Summary Functions for Adjusted Weights
+
+If the `WEIGHT` subcommand specified an [effective weight
+variable](#effective-weight), then the following summary functions use
+its value instead of the dictionary weight variable. Otherwise, they
+are equivalent to the summary function without the `E`-prefix:
+
+- `ECOUNT` ("Adjusted Count", F40.0)
+
+- `ETOTALN` ("Adjusted Total N", F40.0)
+
+- `EVALIDN` ("Adjusted Valid N", F40.0)
+
+### Unweighted Summary Functions
+
+The following summary functions with a `U`-prefix are equivalent to the
+same ones without the prefix, except that they use unweighted counts:
+
+- `UCOUNT` ("Unweighted Count", F40.0)
+
+- `UareaPCT` or `UareaPCT.COUNT` ("Unweighted Area %", PCT40.1)
+
+- `UareaPCT.VALIDN` ("Unweighted Area Valid N %", PCT40.1)
+
+- `UareaPCT.TOTALN` ("Unweighted Area Total N %", PCT40.1)
+
+- `UMEAN` ("Unweighted Mean")
+
+- `UMEDIAN` ("Unweighted Median")
+
+- `UMISSING` ("Unweighted Missing")
+
+- `UMODE` ("Unweighted Mode")
+
+- `UareaPCT.SUM` ("Unweighted Area Sum %", PCT40.1)
+
+- `UPTILE` n ("Unweighted Percentile n")
+
+- `USEMEAN` ("Unweighted Std Error of Mean")
+
+- `USTDDEV` ("Unweighted Std Deviation")
+
+- `USUM` ("Unweighted Sum")
+
+- `UTOTALN` ("Unweighted Total N", F40.0)
+
+- `UVALIDN` ("Unweighted Valid N", F40.0)
+
+- `UVARIANCE` ("Unweighted Variance", F40.0)
+
+## Statistics Positions and Labels
+
+```
+/SLABELS
+ [POSITION={COLUMN | ROW | LAYER}]
+ [VISIBLE={YES | NO}]
+```
+
+The `SLABELS` subcommand controls the position and visibility of
+summary statistics for the `TABLE` subcommand that it follows.
+
+`POSITION` sets the axis on which summary statistics appear. With
+`POSITION=COLUMN`, which is the default, each summary statistic
+appears in a column. For example:
+
+```
+CTABLES /TABLE=age [MEAN, MEDIAN] BY gender.
+```
+
+```
+ Custom Tables
++──────────────────────────+───────────────────────+
+│ │ S3a. GENDER: │
+│ +───────────+───────────+
+│ │ Male │ Female │
+│ +────+──────+────+──────+
+│ │Mean│Median│Mean│Median│
++──────────────────────────+────+──────+────+──────+
+│D1. AGE: What is your age?│ 46│ 45│ 50│ 52│
++──────────────────────────+────+──────+────+──────+
+```
+
+
+With `POSITION=ROW`, each summary statistic appears in a row, as shown
+below:
+
+```
+CTABLES /TABLE=age [MEAN, MEDIAN] BY gender /SLABELS POSITION=ROW.
+```
+
+```
+ Custom Tables
++─────────────────────────────────+─────────────+
+│ │ S3a. GENDER:│
+│ +─────+───────+
+│ │ Male│ Female│
++─────────────────────────────────+─────+───────+
+│D1. AGE: What is your age? Mean │ 46│ 50│
+│ Median│ 45│ 52│
++─────────────────────────────────+─────+───────+
+```
+
+
+`POSITION=LAYER` is also available to place each summary statistic in a
+separate layer.
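+
+For example, the previous table could be rearranged so that each
+statistic occupies its own layer (output not shown):
+
+```
+CTABLES /TABLE=age [MEAN, MEDIAN] BY gender /SLABELS POSITION=LAYER.
+```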
+
+Labels for summary statistics are shown by default. Use `VISIBLE=NO`
+to suppress them. Because unlabeled data can cause confusion, hiding
+the labels should only be considered when the meaning of the data is
+otherwise evident, as in a simple case like this:
+
+```
+CTABLES /TABLE=ageGroup [TABLEPCT] /SLABELS VISIBLE=NO.
+```
+
+```
+ Custom Tables
++───────────────────────+─────+
+│Age group 15 or younger│ .0%│
+│ 16 to 25 │15.7%│
+│ 26 to 35 │13.8%│
+│ 36 to 45 │14.8%│
+│ 46 to 55 │16.8%│
+│ 56 to 65 │17.8%│
+│ 66 or older │21.1%│
++───────────────────────+─────+
+```
+
+
+## Category Label Positions
+
+```
+/CLABELS {AUTO | {ROWLABELS|COLLABELS}={OPPOSITE|LAYER}}
+```
+
+The `CLABELS` subcommand controls the position of category labels for
+the `TABLE` subcommand that it follows. By default, or if `AUTO` is
+specified, category labels for a given variable nest inside the
+variable's label on the same axis. For example, the command below
+results in age categories nesting within the age group variable on the
+rows axis and gender categories within the gender variable on the
+columns axis:
+
+```
+CTABLES /TABLE ageGroup BY gender.
+```
+
+```
+ Custom Tables
++───────────────────────+────────────+
+│ │S3a. GENDER:│
+│ +─────+──────+
+│ │ Male│Female│
+│ +─────+──────+
+│ │Count│ Count│
++───────────────────────+─────+──────+
+│Age group 15 or younger│ 0│ 0│
+│ 16 to 25 │ 594│ 505│
+│ 26 to 35 │ 476│ 491│
+│ 36 to 45 │ 489│ 548│
+│ 46 to 55 │ 526│ 649│
+│ 56 to 65 │ 516│ 731│
+│ 66 or older │ 531│ 943│
++───────────────────────+─────+──────+
+```
+
+
+`ROWLABELS=OPPOSITE` or `COLLABELS=OPPOSITE` move row or column
+variable category labels, respectively, to the opposite axis. The
+setting affects only the innermost variable or variables, which must
+be categorical, on the given axis. For example:
+
+```
+CTABLES /TABLE ageGroup BY gender /CLABELS ROWLABELS=OPPOSITE.
+CTABLES /TABLE ageGroup BY gender /CLABELS COLLABELS=OPPOSITE.
+```
+
+```
+ Custom Tables
++─────+──────────────────────────────────────────────────────────────────────
+│ │ S3a. GENDER:
+│ +───────────────────────────────────────────+──────────────────────────
+│ │ Male │ Female
+│ +───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
+│ │ 15 or │16 to│26 to│36 to│46 to│56 to│66 or│ 15 or │16 to│26 to│36 to│
+│ │younger│ 25 │ 35 │ 45 │ 55 │ 65 │older│younger│ 25 │ 35 │ 45 │
+│ +───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
+│ │ Count │Count│Count│Count│Count│Count│Count│ Count │Count│Count│Count│
++─────+───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
+│Age │ 0│ 594│ 476│ 489│ 526│ 516│ 531│ 0│ 505│ 491│ 548│
+│group│ │ │ │ │ │ │ │ │ │ │ │
++─────+───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
+
++─────+─────────────────+
+│ │ │
+│ +─────────────────+
+│ │ │
+│ +─────+─────+─────+
+│ │46 to│56 to│66 or│
+│ │ 55 │ 65 │older│
+│ +─────+─────+─────+
+│ │Count│Count│Count│
++─────+─────+─────+─────+
+│Age │ 649│ 731│ 943│
+│group│ │ │ │
++─────+─────+─────+─────+
+
+ Custom Tables
++──────────────────────────────+────────────+
+│ │S3a. GENDER:│
+│ +────────────+
+│ │ Count │
++──────────────────────────────+────────────+
+│Age group 15 or younger Male │ 0│
+│ Female│ 0│
+│ ─────────────────────+────────────+
+│ 16 to 25 Male │ 594│
+│ Female│ 505│
+│ ─────────────────────+────────────+
+│ 26 to 35 Male │ 476│
+│ Female│ 491│
+│ ─────────────────────+────────────+
+│ 36 to 45 Male │ 489│
+│ Female│ 548│
+│ ─────────────────────+────────────+
+│ 46 to 55 Male │ 526│
+│ Female│ 649│
+│ ─────────────────────+────────────+
+│ 56 to 65 Male │ 516│
+│ Female│ 731│
+│ ─────────────────────+────────────+
+│ 66 or older Male │ 531│
+│ Female│ 943│
++──────────────────────────────+────────────+
+```
+
+
+`ROWLABELS=LAYER` or `COLLABELS=LAYER` move the innermost row or column
+variable category labels, respectively, to the layer axis.
+
+Only one axis's labels may be moved, whether to the opposite axis or
+to the layer axis.
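+
+As an illustrative sketch (output not shown), the following variation
+on the earlier examples moves the age-group category labels to the
+layer axis:
+
+```
+CTABLES /TABLE ageGroup BY gender /CLABELS ROWLABELS=LAYER.
+```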
+
+### Effect on Summary Statistics
+
+`CLABELS` primarily affects the appearance of tables, not the data
+displayed in them. However, `CTABLES` can affect the values displayed
+for statistics that summarize areas of a table, since it can change the
+definitions of these areas.
+
+For example, consider the following syntax and output:
+
+```
+CTABLES /TABLE ageGroup BY gender [ROWPCT, COLPCT].
+```
+
+```
+ Custom Tables
++───────────────────────+─────────────────────────────+
+│ │ S3a. GENDER: │
+│ +──────────────+──────────────+
+│ │ Male │ Female │
+│ +─────+────────+─────+────────+
+│ │Row %│Column %│Row %│Column %│
++───────────────────────+─────+────────+─────+────────+
+│Age group 15 or younger│ .│ .0%│ .│ .0%│
+│ 16 to 25 │54.0%│ 19.0%│46.0%│ 13.1%│
+│ 26 to 35 │49.2%│ 15.2%│50.8%│ 12.7%│
+│ 36 to 45 │47.2%│ 15.6%│52.8%│ 14.2%│
+│ 46 to 55 │44.8%│ 16.8%│55.2%│ 16.8%│
+│ 56 to 65 │41.4%│ 16.5%│58.6%│ 18.9%│
+│ 66 or older │36.0%│ 17.0%│64.0%│ 24.4%│
++───────────────────────+─────+────────+─────+────────+
+```
+
+
+Using `COLLABELS=OPPOSITE` changes the definitions of rows and columns,
+so that column percentages display what were previously row percentages
+and the new row percentages become meaningless (because there is only
+one cell per row):
+
+```
+CTABLES
+ /TABLE ageGroup BY gender [ROWPCT, COLPCT]
+ /CLABELS COLLABELS=OPPOSITE.
+```
+
+```
+ Custom Tables
++──────────────────────────────+───────────────+
+│ │ S3a. GENDER: │
+│ +──────+────────+
+│ │ Row %│Column %│
++──────────────────────────────+──────+────────+
+│Age group 15 or younger Male │ .│ .│
+│ Female│ .│ .│
+│ ─────────────────────+──────+────────+
+│ 16 to 25 Male │100.0%│ 54.0%│
+│ Female│100.0%│ 46.0%│
+│ ─────────────────────+──────+────────+
+│ 26 to 35 Male │100.0%│ 49.2%│
+│ Female│100.0%│ 50.8%│
+│ ─────────────────────+──────+────────+
+│ 36 to 45 Male │100.0%│ 47.2%│
+│ Female│100.0%│ 52.8%│
+│ ─────────────────────+──────+────────+
+│ 46 to 55 Male │100.0%│ 44.8%│
+│ Female│100.0%│ 55.2%│
+│ ─────────────────────+──────+────────+
+│ 56 to 65 Male │100.0%│ 41.4%│
+│ Female│100.0%│ 58.6%│
+│ ─────────────────────+──────+────────+
+│ 66 or older Male │100.0%│ 36.0%│
+│ Female│100.0%│ 64.0%│
++──────────────────────────────+──────+────────+
+```
+
+
+### Moving Categories for Stacked Variables
+
+If `CLABELS` moves category labels from an axis with stacked
+variables, the variables that are moved must have the same category
+specifications (see [Per-Variable Category
+Options](#per-variable-category-options)) and the same value labels.
+
+The following shows both moving stacked category variables and
+adapting to the changing definitions of rows and columns:
+
+```
+CTABLES /TABLE (likelihoodOfBeingStoppedByPolice
+ + likelihoodOfHavingAnAccident) [COLPCT].
+CTABLES /TABLE (likelihoodOfBeingStoppedByPolice
+ + likelihoodOfHavingAnAccident) [ROWPCT]
+ /CLABELS ROW=OPPOSITE.
+```
+
+```
+ Custom Tables
++─────────────────────────────────────────────────────────────────────+───────+
+│ │ Column│
+│ │ % │
++─────────────────────────────────────────────────────────────────────+───────+
+│105b. How likely is it that drivers who have had too Almost │ 10.2%│
+│much to drink to drive safely will A. Get stopped by the certain │ │
+│police? Very likely │ 21.8%│
+│ Somewhat │ 40.2%│
+│ likely │ │
+│ Somewhat │ 19.0%│
+│ unlikely │ │
+│ Very │ 8.9%│
+│ unlikely │ │
++─────────────────────────────────────────────────────────────────────+───────+
+│105b. How likely is it that drivers who have had too Almost │ 15.9%│
+│much to drink to drive safely will B. Have an accident? certain │ │
+│ Very likely │ 40.8%│
+│ Somewhat │ 35.0%│
+│ likely │ │
+│ Somewhat │ 6.2%│
+│ unlikely │ │
+│ Very │ 2.0%│
+│ unlikely │ │
++─────────────────────────────────────────────────────────────────────+───────+
+
+ Custom Tables
++─────────────────────────────+────────+───────+─────────+──────────+─────────+
+│ │ Almost │ Very │ Somewhat│ Somewhat │ Very │
+│ │ certain│ likely│ likely │ unlikely │ unlikely│
+│ +────────+───────+─────────+──────────+─────────+
+│ │ Row % │ Row % │ Row % │ Row % │ Row % │
++─────────────────────────────+────────+───────+─────────+──────────+─────────+
+│105b. How likely is it that │ 10.2%│ 21.8%│ 40.2%│ 19.0%│ 8.9%│
+│drivers who have had too much│ │ │ │ │ │
+│to drink to drive safely will│ │ │ │ │ │
+│A. Get stopped by the police?│ │ │ │ │ │
+│105b. How likely is it that │ 15.9%│ 40.8%│ 35.0%│ 6.2%│ 2.0%│
+│drivers who have had too much│ │ │ │ │ │
+│to drink to drive safely will│ │ │ │ │ │
+│B. Have an accident? │ │ │ │ │ │
++─────────────────────────────+────────+───────+─────────+──────────+─────────+
+```
+
+
+## Per-Variable Category Options
+
+```
+/CATEGORIES VARIABLES=variables
+ {[value, value...]
+ | [ORDER={A | D}]
+ [KEY={VALUE | LABEL | summary(variable)}]
+ [MISSING={EXCLUDE | INCLUDE}]}
+ [TOTAL={NO | YES} [LABEL=string] [POSITION={AFTER | BEFORE}]]
+ [EMPTY={INCLUDE | EXCLUDE}]
+```
+
+The `CATEGORIES` subcommand specifies, for one or more categorical
+variables, the categories to include and exclude, the sort order for
+included categories, and treatment of missing values. It also controls
+the totals and subtotals to display. It may be specified any number of
+times, each time for a different set of variables. `CATEGORIES` applies
+to the table produced by the `TABLE` subcommand that it follows.
+
+`CATEGORIES` does not apply to scale variables.
+
+`VARIABLES` is required and must list the variables for the subcommand
+to affect.
+
+The syntax may specify the categories to include and their sort order
+either explicitly or implicitly. The following sections give the
+details of each form of syntax, followed by information on totals and
+subtotals and the `EMPTY` setting.
+
+### Explicit Categories
+
+To explicitly specify the categories to include, list them within
+square brackets in the desired sort order. Use spaces or commas to
+separate values. Categories not covered by the list are excluded from
+analysis.
+
+Each element of the list takes one of the following forms:
+
+* `number`
+ `'string'`
+ A numeric or string category value, for variables that have the
+ corresponding type.
+
+* `'date'`
+ `'time'`
+ A date or time category value, for variables that have a date or
+ time print format.
+
+* `min THRU max`
+ `LO THRU max`
+ `min THRU HI`
+ A range of category values, where `min` and `max` each takes one of
+ the forms above, in increasing order.
+
+* `MISSING`
+ All user-missing values. (To match individual user-missing values,
+ specify their category values.)
+
+* `OTHERNM`
+ Any non-missing value not covered by any other element of the list
+ (regardless of where `OTHERNM` is placed in the list).
+
+* `&postcompute`
+ A [computed category name](#computed-categories).
+
+* `SUBTOTAL`
+ `HSUBTOTAL`
+ A [subtotal](#totals-and-subtotals).
+
+If multiple elements of the list cover a given category, the last one
+in the list takes precedence.
+
+The following example syntax and output show how an explicit category
+list can limit the displayed categories:
+
+```
+CTABLES /TABLE freqOfDriving.
+CTABLES /TABLE freqOfDriving /CATEGORIES VARIABLES=freqOfDriving [1, 2, 3].
+```
+
+```
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│ 1. How often do you usually drive a car or other Every day │ 4667│
+│motor vehicle? Several days a week │ 1274│
+│ Once a week or less │ 361│
+│ Only certain times a│ 130│
+│ year │ │
+│ Never │ 540│
++───────────────────────────────────────────────────────────────────────+─────+
+
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│ 1. How often do you usually drive a car or other Every day │ 4667│
+│motor vehicle? Several days a │ 1274│
+│ week │ │
+│ Once a week or │ 361│
+│ less │ │
++───────────────────────────────────────────────────────────────────────+─────+
+```
+
+
+### Implicit Categories
+
+In the absence of an explicit list of categories, `CATEGORIES` allows
+`KEY`, `ORDER`, and `MISSING` to specify how to select and sort
+categories.
+
+The `KEY` setting specifies the sort key. By default, or with
+`KEY=VALUE`, categories are sorted by value. Categories may also be
+sorted by value label, with `KEY=LABEL`, or by the value of a summary
+function, e.g. `KEY=COUNT`.
+
+By default, or with `ORDER=A`, categories are sorted in ascending
+order. Specify `ORDER=D` to sort in descending order.
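+
+As an illustrative sketch (output not shown), the following sorts the
+categories of the running `freqOfDriving` example in descending order
+of count, rather than ascending order of value:
+
+```
+CTABLES /TABLE freqOfDriving
+    /CATEGORIES VARIABLES=freqOfDriving KEY=COUNT ORDER=D.
+```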
+
+User-missing values are excluded by default, or with
+`MISSING=EXCLUDE`. Specify `MISSING=INCLUDE` to include user-missing
+values. The system-missing value is always excluded.
+
+The following example syntax and output show how `MISSING=INCLUDE`
+causes missing values to be included in a category list.
+
+```
+CTABLES /TABLE freqOfDriving.
+CTABLES /TABLE freqOfDriving
+ /CATEGORIES VARIABLES=freqOfDriving MISSING=INCLUDE.
+```
+
+```
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│ 1. How often do you usually drive a car or other Every day │ 4667│
+│motor vehicle? Several days a week │ 1274│
+│ Once a week or less │ 361│
+│ Only certain times a│ 130│
+│ year │ │
+│ Never │ 540│
++───────────────────────────────────────────────────────────────────────+─────+
+
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│ 1. How often do you usually drive a car or other Every day │ 4667│
+│motor vehicle? Several days a week │ 1274│
+│ Once a week or less │ 361│
+│ Only certain times a│ 130│
+│ year │ │
+│ Never │ 540│
+│ Don't know │ 8│
+│ Refused │ 19│
++───────────────────────────────────────────────────────────────────────+─────+
+```
+
+
+### Totals and Subtotals
+
+`CATEGORIES` also controls display of totals and subtotals. By default,
+or with `TOTAL=NO`, totals are not displayed. Use `TOTAL=YES` to
+display a total. By default, the total is labeled "Total"; use
+`LABEL="label"` to override it.
+
+Subtotals are also not displayed by default. To add one or more
+subtotals, use an explicit category list and insert `SUBTOTAL` or
+`HSUBTOTAL` in the position or positions where the subtotal should
+appear. The subtotal becomes an extra row, column, or layer.
+`HSUBTOTAL` additionally hides the categories that make up the
+subtotal. Either way, the default label is "Subtotal"; use
+`SUBTOTAL="label"` or `HSUBTOTAL="label"` to specify a custom label.
+
+The following example syntax and output show how to use `TOTAL=YES`
+and `SUBTOTAL`:
+
+```
+CTABLES
+ /TABLE freqOfDriving
+ /CATEGORIES VARIABLES=freqOfDriving [OTHERNM, SUBTOTAL='Valid Total',
+ MISSING, SUBTOTAL='Missing Total']
+ TOTAL=YES LABEL='Overall Total'.
+```
+
+```
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│ 1. How often do you usually drive a car or other Every day │ 4667│
+│motor vehicle? Several days a week │ 1274│
+│ Once a week or less │ 361│
+│ Only certain times a│ 130│
+│ year │ │
+│ Never │ 540│
+│ Valid Total │ 6972│
+│ Don't know │ 8│
+│ Refused │ 19│
+│ Missing Total │ 27│
+│ Overall Total │ 6999│
++───────────────────────────────────────────────────────────────────────+─────+
+```
+
+
+By default, or with `POSITION=AFTER`, totals are displayed in the
+output after the last category and subtotals apply to categories that
+precede them. With `POSITION=BEFORE`, totals come before the first
+category and subtotals apply to categories that follow them.
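+
+For example, the following sketch (output not shown) revisits the
+earlier subtotal example with `POSITION=BEFORE`, so that each subtotal
+precedes the categories it summarizes:
+
+```
+CTABLES
+  /TABLE freqOfDriving
+  /CATEGORIES VARIABLES=freqOfDriving [SUBTOTAL='Valid Total', OTHERNM,
+                                       SUBTOTAL='Missing Total', MISSING]
+  TOTAL=YES LABEL='Overall Total' POSITION=BEFORE.
+```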
+
+Only categorical variables may have totals and subtotals. Scale
+variables may be "totaled" indirectly by enabling totals and subtotals
+on a categorical variable within which the scalar variable is
+summarized. For example, the following syntax produces a mean, count,
+and valid count across all data by adding a total on the categorical
+`region` variable, as shown:
+
+```
+CTABLES /TABLE=region > monthDaysMin1drink [MEAN, VALIDN]
+ /CATEGORIES VARIABLES=region TOTAL=YES LABEL='All regions'.
+```
+
+```
+ Custom Tables
++───────────────────────────────────────────────────────────+────+─────+──────+
+│ │ │ │ Valid│
+│ │Mean│Count│ N │
++───────────────────────────────────────────────────────────+────+─────+──────+
+│20. On how many of the thirty days in this Region NE │ 5.6│ 1409│ 945│
+│typical month did you have one or more MW │ 5.0│ 1654│ 1026│
+│alcoholic beverages to drink? S │ 6.0│ 2390│ 1285│
+│ W │ 6.5│ 1546│ 953│
+│ All │ 5.8│ 6999│ 4209│
+│ regions │ │ │ │
++───────────────────────────────────────────────────────────+────+─────+──────+
+```
+
+
+By default, PSPP uses the same summary functions for totals and
+subtotals as other categories. To summarize totals and subtotals
+differently, specify the summary functions for totals and subtotals
+after the ordinary summary functions inside a nested set of `[]`
+following `TOTALS`. For example, the following syntax displays `COUNT`
+for individual categories, and both `COUNT` and `VALIDN` for the total,
+as shown:
+
+```
+CTABLES
+ /TABLE isLicensedDriver [COUNT, TOTALS[COUNT, VALIDN]]
+ /CATEGORIES VARIABLES=isLicensedDriver TOTAL=YES MISSING=INCLUDE.
+```
+
+```
+ Custom Tables
++────────────────────────────────────────────────────────────────+─────+──────+
+│ │ │ Valid│
+│ │Count│ N │
++────────────────────────────────────────────────────────────────+─────+──────+
+│D7a. Are you a licensed driver; that is, do you have a Yes │ 6379│ │
+│valid driver's license? No │ 572│ │
+│ Don't │ 4│ │
+│ know │ │ │
+│ Refused │ 44│ │
+│ Total │ 6999│ 6951│
++────────────────────────────────────────────────────────────────+─────+──────+
+```
+
+
+### Categories Without Values
+
+Some categories might not be included in the data set being analyzed.
+For example, our example data set has no cases in the "15 or younger"
+age group. By default, or with `EMPTY=INCLUDE`, PSPP includes these
+empty categories in output tables. To exclude them, specify
+`EMPTY=EXCLUDE`.
+
+For implicit categories, empty categories potentially include all the
+values with value labels for a given variable; for explicit categories,
+they include all the values listed individually and all values with
+value labels that are covered by ranges or `MISSING` or `OTHERNM`.
+
+The following example syntax and output show the effect of
+`EMPTY=EXCLUDE` for the `membersOver16` variable, in which 0 is labeled
+"None" but no cases exist with that value:
+
+```
+CTABLES /TABLE=membersOver16.
+CTABLES /TABLE=membersOver16 /CATEGORIES VARIABLES=membersOver16 EMPTY=EXCLUDE.
+```
+
+```
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│S1. Including yourself, how many members of this household are None │ 0│
+│age 16 or older? 1 │ 1586│
+│ 2 │ 3031│
+│ 3 │ 505│
+│ 4 │ 194│
+│ 5 │ 55│
+│ 6 or │ 21│
+│ more │ │
++───────────────────────────────────────────────────────────────────────+─────+
+
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│S1. Including yourself, how many members of this household are 1 │ 1586│
+│age 16 or older? 2 │ 3031│
+│ 3 │ 505│
+│ 4 │ 194│
+│ 5 │ 55│
+│ 6 or │ 21│
+│ more │ │
++───────────────────────────────────────────────────────────────────────+─────+
+```
+
+
+## Titles
+
+```
+/TITLES
+ [TITLE=string...]
+ [CAPTION=string...]
+ [CORNER=string...]
+```
+
+The `TITLES` subcommand sets the title, caption, and corner text for
+the table output for the previous `TABLE` subcommand. Any number of
+strings may be specified for each kind of text, with each string
+appearing on a separate line in the output. The title appears above the
+table, the caption below the table, and the corner text appears in the
+table's upper left corner. By default, the title is "Custom Tables" and
+the caption and corner text are empty. With some table output styles,
+the corner text is not displayed.
+
+The strings provided in this subcommand may contain the following
+macro-like keywords that PSPP substitutes at the time that it runs the
+command:
+
+* `)DATE`
+ The current date, e.g. MM/DD/YY. The format is locale-dependent.
+
+* `)TIME`
+ The current time, e.g. HH:MM:SS. The format is locale-dependent.
+
+* `)TABLE`
+ The expression specified on the `TABLE` command. Summary and
+ measurement level specifications are omitted, and variable labels
+ are used in place of variable names.
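+
+As an illustrative sketch (output not shown), the following adds a
+title that incorporates the current date and a caption naming the
+table expression:
+
+```
+CTABLES
+  /TABLE freqOfDriving BY gender
+  /TITLES TITLE='Driving Frequency (run on )DATE'
+          CAPTION='Table: )TABLE'.
+```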
+
+## Table Formatting
+
+```
+/FORMAT
+ [MINCOLWIDTH={DEFAULT | width}]
+ [MAXCOLWIDTH={DEFAULT | width}]
+ [UNITS={POINTS | INCHES | CM}]
+ [EMPTY={ZERO | BLANK | string}]
+ [MISSING=string]
+```
+
+The `FORMAT` subcommand, which must precede the first `TABLE`
+subcommand, controls formatting for all the output tables. `FORMAT` and
+all of its settings are optional.
+
+Use `MINCOLWIDTH` and `MAXCOLWIDTH` to control the minimum or maximum
+width of columns in output tables. By default, with `DEFAULT`, column
+width varies based on content. Otherwise, specify a number for either
+or both of these settings. If both are specified, `MAXCOLWIDTH` must be
+greater than or equal to `MINCOLWIDTH`. The default unit, or with
+`UNITS=POINTS`, is points (1/72 inch); specify `UNITS=INCHES` to use
+inches or `UNITS=CM` to use centimeters. PSPP does not currently honor
+any of these settings.
+
+By default, or with `EMPTY=ZERO`, zero values are displayed in their
+usual format. Use `EMPTY=BLANK` to use an empty cell instead, or
+`EMPTY="string"` to use the specified string.
+
+By default, missing values are displayed as `.`, the same as in other
+tables. Specify `MISSING="string"` to instead use a custom string.
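+
+As an illustrative sketch (output not shown), the following displays
+empty cells as blank and missing values as "N/A"; note that `FORMAT`
+precedes the first `TABLE` subcommand:
+
+```
+CTABLES
+  /FORMAT EMPTY=BLANK MISSING='N/A'
+  /TABLE ageGroup BY gender.
+```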
+
+## Display of Variable Labels
+
+```
+/VLABELS
+ VARIABLES=variables
+ DISPLAY={DEFAULT | NAME | LABEL | BOTH | NONE}
+```
+
+The `VLABELS` subcommand, which must precede the first `TABLE`
+subcommand, controls display of variable labels in all the output
+tables. `VLABELS` is optional. It may appear multiple times to adjust
+settings for different variables.
+
+`VARIABLES` and `DISPLAY` are required. The value of `DISPLAY`
+controls how variable labels are displayed for the variables listed on
+`VARIABLES`. The supported values are:
+
+* `DEFAULT`
+  Use the setting from [`SET TVARS`](set.md#tvars).
+
+* `NAME`
+ Show only a variable name.
+
+* `LABEL`
+ Show only a variable label.
+
+* `BOTH`
+ Show variable name and label.
+
+* `NONE`
+ Show nothing.
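+
+As an illustrative sketch (output not shown), the following displays
+both the name and the label for `freqOfDriving`, but only the name for
+`gender`:
+
+```
+CTABLES
+  /VLABELS VARIABLES=freqOfDriving DISPLAY=BOTH
+  /VLABELS VARIABLES=gender DISPLAY=NAME
+  /TABLE freqOfDriving BY gender.
+```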
+
+## Missing Value Treatment
+
+The `TABLE` subcommand on `CTABLES` specifies two different kinds of
+variables: variables that divide tables into cells (which are always
+categorical) and variables being summarized (which may be categorical or
+scale). PSPP treats missing values differently in each kind of
+variable, as described in the sections below.
+
+### Missing Values for Cell-Defining Variables
+
+For variables that divide tables into cells, per-variable category
+options, as described in [Per-Variable Category
+Options](#per-variable-category-options), determine which data is
+analyzed. If any of the categories for such a variable would exclude
+a case, then that case is not included.
+
+As an example, consider the following entirely artificial dataset, in
+which `x` and `y` are categorical variables with missing value 9, and
+`z` is scale:
+
+```
+ Data List
++─+─+─────────+
+│x│y│ z │
++─+─+─────────+
+│1│1│ 1│
+│1│2│ 10│
+│1│9│ 100│
+│2│1│ 1000│
+│2│2│ 10000│
+│2│9│ 100000│
+│9│1│ 1000000│
+│9│2│ 10000000│
+│9│9│100000000│
++─+─+─────────+
+```
+
+
+Using `x` and `y` to define cells, and summarizing `z`, by default
+PSPP omits all the cases that have `x` or `y` (or both) missing:
+
+```
+CTABLES /TABLE x > y > z [SUM].
+```
+
+```
+ Custom Tables
++─────────+─────+
+│ │ Sum │
++─────────+─────+
+│x 1 y 1 z│ 1│
+│ ────+─────+
+│ 2 z│ 10│
+│ ────────+─────+
+│ 2 y 1 z│ 1000│
+│ ────+─────+
+│ 2 z│10000│
++─────────+─────+
+```
+
+
+If, however, we add `CATEGORIES` specifications to include missing
+values for `y` or for `x` and `y`, the output table includes them, like
+so:
+
+```
+CTABLES /TABLE x > y > z [SUM] /CATEGORIES VARIABLES=y MISSING=INCLUDE.
+CTABLES /TABLE x > y > z [SUM] /CATEGORIES VARIABLES=x y MISSING=INCLUDE.
+```
+
+```
+ Custom Tables
++─────────+──────+
+│ │ Sum │
++─────────+──────+
+│x 1 y 1 z│ 1│
+│ ────+──────+
+│ 2 z│ 10│
+│ ────+──────+
+│ 9 z│ 100│
+│ ────────+──────+
+│ 2 y 1 z│ 1000│
+│ ────+──────+
+│ 2 z│ 10000│
+│ ────+──────+
+│ 9 z│100000│
++─────────+──────+
+
+ Custom Tables
++─────────+─────────+
+│ │ Sum │
++─────────+─────────+
+│x 1 y 1 z│ 1│
+│ ────+─────────+
+│ 2 z│ 10│
+│ ────+─────────+
+│ 9 z│ 100│
+│ ────────+─────────+
+│ 2 y 1 z│ 1000│
+│ ────+─────────+
+│ 2 z│ 10000│
+│ ────+─────────+
+│ 9 z│ 100000│
+│ ────────+─────────+
+│ 9 y 1 z│ 1000000│
+│ ────+─────────+
+│ 2 z│ 10000000│
+│ ────+─────────+
+│ 9 z│100000000│
++─────────+─────────+
+```
+
+
+### Missing Values for Summary Variables
+
+For summary variables, values that are valid and in included categories
+are analyzed, and values that are missing or in excluded categories are
+not analyzed, with the following exceptions:
+
+- The `VALIDN` summary functions (`VALIDN`, `EVALIDN`, `UVALIDN`,
+ `areaPCT.VALIDN`, and `UareaPCT.VALIDN`) only count valid values in
+ included categories (not missing values in included categories).
+
+- The `TOTALN` summary functions (`TOTALN`, `ETOTALN`, `UTOTALN`,
+  `areaPCT.TOTALN`, and `UareaPCT.TOTALN`) count all values (valid
+  and missing) in included categories and missing (but not valid)
+  values in excluded categories.
+
+For categorical variables, system-missing values are never in included
+categories. For scale variables, there is no notion of included and
+excluded categories, so all values are effectively included.
+
+The following table provides another view of the above rules:
+
+||`VALIDN`|other|`TOTALN`|
+|:--|:--|:--|:--|
+|Categorical variables:||||
+| Valid values in included categories|yes|yes|yes|
+| Missing values in included categories|--|yes|yes|
+| Missing values in excluded categories|--|--|yes|
+| Valid values in excluded categories|--|--|--|
+|Scale variables:||||
+| Valid values|yes|yes|yes|
+| User- or system-missing values|--|yes|yes|
+
+
+### Scale Missing Values
+
+```
+/SMISSING {VARIABLE | LISTWISE}
+```
+
+The `SMISSING` subcommand, which must precede the first `TABLE`
+subcommand, controls treatment of missing values for scale variables in
+producing all the output tables. `SMISSING` is optional.
+
+With `SMISSING=VARIABLE`, which is the default, missing values are
+excluded on a variable-by-variable basis. With `SMISSING=LISTWISE`,
+when stacked scale variables are nested together with a categorical
+variable, a missing value for any of the scale variables causes the
+case to be excluded for all of them.
+
+As an example, consider the following dataset, in which `x` is a
+categorical variable and `y` and `z` are scale:
+
+```
+ Data List
++─+─────+─────+
+│x│ y │ z │
++─+─────+─────+
+│1│ .│40.00│
+│1│10.00│50.00│
+│1│20.00│60.00│
+│1│30.00│ .│
++─+─────+─────+
+```
+
+
+With the default missing-value treatment, `y`'s mean is 20, based on the
+values 10, 20, and 30, and `z`'s mean is 50, based on 40, 50, and 60:
+
+```
+CTABLES /TABLE (y + z) > x.
+```
+
+```
+Custom Tables
++─────+─────+
+│ │ Mean│
++─────+─────+
+│y x 1│20.00│
++─────+─────+
+│z x 1│50.00│
++─────+─────+
+```
+
+
+By adding `SMISSING=LISTWISE`, only cases where `y` and `z` are both
+non-missing are considered, so `y`'s mean becomes 15, as the average of
+10 and 20, and `z`'s mean becomes 55, the average of 50 and 60:
+
+```
+CTABLES /SMISSING LISTWISE /TABLE (y + z) > x.
+```
+
+```
+Custom Tables
++─────+─────+
+│ │ Mean│
++─────+─────+
+│y x 1│15.00│
++─────+─────+
+│z x 1│55.00│
++─────+─────+
+```
+
+
+Even with `SMISSING=LISTWISE`, if `y` and `z` are separately nested with
+`x`, instead of using a single `>` operator, missing values revert to
+being considered on a variable-by-variable basis:
+
+```
+CTABLES /SMISSING LISTWISE /TABLE (y > x) + (z > x).
+```
+
+```
+Custom Tables
++─────+─────+
+│ │ Mean│
++─────+─────+
+│y x 1│20.00│
++─────+─────+
+│z x 1│50.00│
++─────+─────+
+```
+
+
+## Computed Categories
+
+```
+/PCOMPUTE &postcompute=EXPR(expression)
+/PPROPERTIES &postcompute...
+ [LABEL=string]
+ [FORMAT=[summary format]...]
+  [HIDESOURCECATS={NO | YES}]
+```
+
+"Computed categories", also called "postcomputes", are categories
+created using arithmetic on categories obtained from the data. The
+`PCOMPUTE` subcommand creates a postcompute, which may then be used on
+`CATEGORIES` within an [explicit category
+list](#explicit-categories). Optionally, `PPROPERTIES` refines how
+a postcompute is displayed. The following sections provide the
+details.
+
+### PCOMPUTE
+
+```
+/PCOMPUTE &postcompute=EXPR(expression)
+```
+
+The `PCOMPUTE` subcommand, which must precede the first `TABLE`
+subcommand, defines computed categories. It is optional and may be used
+any number of times to define multiple postcomputes.
+
+Each `PCOMPUTE` defines one postcompute. Its syntax consists of a
+name to identify the postcompute as a PSPP identifier prefixed by `&`,
+followed by `=` and a postcompute expression enclosed in `EXPR(...)`. A
+postcompute expression consists of:
+
+* `[category]`
+ This form evaluates to the summary statistic for category, e.g.
+ `[1]` evaluates to the value of the summary statistic associated
+ with category 1. The category may be a number, a quoted string, or
+ a quoted time or date value. All of the categories for a given
+ postcompute must have the same form. The category must appear in
+  all the `CATEGORIES` lists in which the postcompute is used.
+
+* `[min THRU max]`
+`[LO THRU max]`
+`[min THRU HI]`
+`MISSING`
+`OTHERNM`
+ These forms evaluate to the summary statistics for a category
+ specified with the same syntax, as described in a previous section
+ (see [Explicit Category List](#explicit-categories)). The
+  category must appear in all the `CATEGORIES` lists in which the
+ postcompute is used.
+
+* `SUBTOTAL`
+ The summary statistic for the subtotal category. This form is
+ allowed only if the `CATEGORIES` lists that include this
+ postcompute have exactly one subtotal.
+
+* `SUBTOTAL[index]`
+ The summary statistic for subtotal category index, where 1 is the
+ first subtotal, 2 is the second, and so on. This form may be used
+ for `CATEGORIES` lists with any number of subtotals.
+
+* `TOTAL`
+  The summary statistic for the total. The `CATEGORIES` lists that
+  include this postcompute must have a total enabled.
+
+* `a + b`
+ `a - b`
+ `a * b`
+ `a / b`
+ `a ** b`
+ These forms perform arithmetic on the values of postcompute
+ expressions a and b. The usual operator precedence rules apply.
+
+* `number`
+ Numeric constants may be used in postcompute expressions.
+
+* `(a)`
+ Parentheses override operator precedence.
+
+A postcompute is not associated with any particular variable.
+Instead, it may be referenced within `CATEGORIES` for any suitable
+variable (e.g. only a string variable is suitable for a postcompute
+expression that refers to a string category, only a variable with
+subtotals for an expression that refers to subtotals, ...).
+
+Normally a named postcompute is defined only once, but if a later
+`PCOMPUTE` redefines a postcompute with the same name as an earlier one,
+the later one takes precedence.
+
+The following syntax and output shows how `PCOMPUTE` can compute a
+total over subtotals, summing the "Frequent Drivers" and "Infrequent
+Drivers" subtotals to form an "All Drivers" postcompute. It also
+shows how to calculate and display a percentage, in this case the
+percentage of valid responses that report never driving. It uses
+[`PPROPERTIES`](#pproperties) to display the latter in `PCT` format.
+
+```
+CTABLES
+ /PCOMPUTE &all_drivers=EXPR([1 THRU 2] + [3 THRU 4])
+ /PPROPERTIES &all_drivers LABEL='All Drivers'
+ /PCOMPUTE &pct_never=EXPR([5] / ([1 THRU 2] + [3 THRU 4] + [5]) * 100)
+ /PPROPERTIES &pct_never LABEL='% Not Drivers' FORMAT=COUNT PCT40.1
+ /TABLE=freqOfDriving BY gender
+ /CATEGORIES VARIABLES=freqOfDriving
+ [1 THRU 2, SUBTOTAL='Frequent Drivers',
+ 3 THRU 4, SUBTOTAL='Infrequent Drivers',
+ &all_drivers, 5, &pct_never,
+ MISSING, SUBTOTAL='Not Drivers or Missing'].
+```
+
+```
+ Custom Tables
++────────────────────────────────────────────────────────────────+────────────+
+│ │S3a. GENDER:│
+│ +─────+──────+
+│ │ Male│Female│
+│ +─────+──────+
+│ │Count│ Count│
++────────────────────────────────────────────────────────────────+─────+──────+
+│ 1. How often do you usually drive a car or Every day │ 2305│ 2362│
+│other motor vehicle? Several days a week │ 440│ 834│
+│ Frequent Drivers │ 2745│ 3196│
+│ Once a week or less │ 125│ 236│
+│ Only certain times a│ 58│ 72│
+│ year │ │ │
+│ Infrequent Drivers │ 183│ 308│
+│ All Drivers │ 2928│ 3504│
+│ Never │ 192│ 348│
+│ % Not Drivers │ 6.2%│ 9.0%│
+│ Don't know │ 3│ 5│
+│ Refused │ 9│ 10│
+│ Not Drivers or │ 204│ 363│
+│ Missing │ │ │
++────────────────────────────────────────────────────────────────+─────+──────+
+```
+
+
+### PPROPERTIES
+
+```
+/PPROPERTIES &postcompute...
+ [LABEL=string]
+ [FORMAT=[summary format]...]
+  [HIDESOURCECATS={NO | YES}]
+```
+
+The `PPROPERTIES` subcommand, which must appear before `TABLE`, sets
+properties for one or more postcomputes defined on prior `PCOMPUTE`
+subcommands. The subcommand syntax begins with the list of
+postcomputes, each prefixed with `&` as specified on `PCOMPUTE`.
+
+All of the settings on `PPROPERTIES` are optional. Use `LABEL` to
+set the label shown for the postcomputes in table output. The default
+label for a postcompute is the expression used to define it.
+
+A postcompute always uses the same summary functions as the variable
+whose categories contain it, but `FORMAT` allows control over the format
+used to display their values. It takes a list of summary function names
+and format specifiers.
+
+By default, or with `HIDESOURCECATS=NO`, categories referred to by
+computed categories are displayed like other categories. Use
+`HIDESOURCECATS=YES` to hide them.
+
+The previous section provides an example for `PPROPERTIES`.
+
+## Effective Weight
+
+```
+/WEIGHT VARIABLE=variable
+```
+
+The `WEIGHT` subcommand is optional and must appear before `TABLE`.
+If it appears, it must name a numeric variable, known as the
+"effective weight" or "adjustment weight". The effective weight
+variable stands in for the dictionary's [weight variable](weight.md),
+if any, in most calculations in `CTABLES`. The only exceptions are
+the `COUNT`, `TOTALN`, and `VALIDN` summary functions, which use the
+dictionary weight instead.
+
+Weights obtained from the PSPP dictionary are rounded to the nearest
+integer at the case level. Effective weights are not rounded.
+Regardless of the weighting source, PSPP does not analyze cases with
+zero, missing, or negative effective weights.
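+
+For example, the following sketch (the variables `adjwt` and `region`
+are hypothetical) tabulates `COUNT`, weighted by the dictionary weight,
+alongside `ECOUNT`, weighted by the effective weight `adjwt`:
+
+```
+CTABLES
+    /WEIGHT VARIABLE=adjwt
+    /TABLE=region [COUNT, ECOUNT].
+```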
+
+
+## Hiding Small Counts
+
+```
+/HIDESMALLCOUNTS COUNT=count
+```
+
+The `HIDESMALLCOUNTS` subcommand is optional. If it is specified,
+then `COUNT`, `ECOUNT`, and `UCOUNT` values in output tables that are
+less than the value of count are shown as `<count` instead of their
+true values. The value of count must be an integer of 2 or greater.
+
+The following example shows how to use `HIDESMALLCOUNTS`:
+
+```
+CTABLES /HIDESMALLCOUNTS COUNT=10 /TABLE placeOfLastDrinkBeforeDrive.
+```
+
+```
+ Custom Tables
++───────────────────────────────────────────────────────────────────────+─────+
+│ │Count│
++───────────────────────────────────────────────────────────────────────+─────+
+│37. Please think about the most recent occasion that Other (list) │<10 │
+│you drove within two hours of drinking alcoholic Your home │ 182│
+│beverages. Where did you drink on that occasion? Friend's home │ 264│
+│ Bar/Tavern/Club │ 279│
+│ Restaurant │ 495│
+│ Work │ 21│
+│ Bowling alley │<10 │
+│ Hotel/Motel │<10 │
+│ Country Club/ │ 17│
+│ Golf course │ │
+│ Drank in the │<10 │
+│ car/On the road │ │
+│ Sporting event │ 15│
+│ Movie theater │<10 │
+│ Shopping/Store/ │<10 │
+│ Grocery store │ │
+│ Wedding │ 15│
+│ Party at someone│ 81│
+│ else's home │ │
+│ Park/picnic │ 14│
+│ Party at your │<10 │
+│ house │ │
++───────────────────────────────────────────────────────────────────────+─────+
+```
--- /dev/null
+# Data Input and Output
+
+Data are the focus of the PSPP language. Each datum belongs to a “case”
+(also called an “observation”). Each case represents an individual or
+"experimental unit". For example, in the results of a survey, the names
+of the respondents, their sex, age, etc., and their responses are all
+data, and the data pertaining to a single respondent form a case. This
+chapter examines the PSPP commands for defining variables and reading
+and writing data. There are alternative commands to read data from
+predefined sources such as system files or databases.
+
+> These commands tell PSPP how to read data, but the data will
+not actually be read until a procedure is executed.
+++ /dev/null
-# BEGIN DATA…END DATA
-
-```
-BEGIN DATA.
-...
-END DATA.
-```
-
-`BEGIN DATA` and `END DATA` can be used to embed raw ASCII data in a
-PSPP syntax file. [`DATA LIST`](data-list.md) or another input
-procedure must be used before `BEGIN DATA`. `BEGIN DATA` and `END
-DATA` must be used together. `END DATA` must appear by itself on a
-single line, with no leading white space and exactly one space between
-the words `END` and `DATA`.
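-
-For example, the following sketch reads two cases of hypothetical data
-embedded directly in the syntax file:
-
-```
-DATA LIST LIST NOTABLE /name (A10) score *.
-BEGIN DATA.
-alice 10
-bob 12
-END DATA.
-LIST.
-```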
+++ /dev/null
-# CLOSE FILE HANDLE
-
-```
-CLOSE FILE HANDLE HANDLE_NAME.
-```
-
-`CLOSE FILE HANDLE` disassociates the name of a [file
-handle](../../language/files/file-handles.md) from a given file.  The
-only specification is the name of the handle to close.  Afterward, the
-name may be reused on a new `FILE HANDLE` command.
-
-The file named INLINE, which represents data entered between `BEGIN
-DATA` and `END DATA`, cannot be closed. Attempts to close it with
-`CLOSE FILE HANDLE` have no effect.
-
-`CLOSE FILE HANDLE` is a PSPP extension.
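-
-For example, the following sketch (the file names are hypothetical)
-closes a handle so that its name can be reused for a different file:
-
-```
-FILE HANDLE mydata /NAME='first.txt'.
-DATA LIST FILE=mydata NOTABLE /x 1-10.
-EXECUTE.
-CLOSE FILE HANDLE mydata.
-FILE HANDLE mydata /NAME='second.txt'.
-```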
-
+++ /dev/null
-# DATA LIST
-
-Used to read text or binary data, `DATA LIST` is the most fundamental
-data-reading command. Even the more sophisticated input methods use
-`DATA LIST` commands as a building block. Understanding `DATA LIST` is
-important to understanding how to use PSPP to read your data files.
-
- There are two major variants of `DATA LIST`, which are fixed format
-and free format. In addition, free format has a minor variant, list
-format, which is discussed in terms of its differences from vanilla free
-format.
-
- Each form of `DATA LIST` is described in detail below.
-
- See [`GET DATA`](../spss-io/get-data.md) for a command that offers
-a few enhancements over `DATA LIST` and that may be substituted for
-`DATA LIST` in many situations.
-
-## DATA LIST FIXED
-
-```
-DATA LIST [FIXED]
- {TABLE,NOTABLE}
- [FILE='FILE_NAME' [ENCODING='ENCODING']]
- [RECORDS=RECORD_COUNT]
- [END=END_VAR]
- [SKIP=RECORD_COUNT]
- /[line_no] VAR_SPEC...
-
-where each VAR_SPEC takes one of the forms
- VAR_LIST START-END [TYPE_SPEC]
- VAR_LIST (FORTRAN_SPEC)
-```
-
- `DATA LIST FIXED` is used to read data files that have values at
-fixed positions on each line of single-line or multiline records. The
-keyword `FIXED` is optional.
-
- The `FILE` subcommand must be used if input is to be taken from an
-external file. It may be used to specify a file name as a string or a
-[file handle](../../language/files/file-handles.md). If the `FILE`
-subcommand is not used, then input is assumed to be specified within
-the command file using [`BEGIN DATA`...`END DATA`](begin-data.md).
-The `ENCODING` subcommand may only be used if the `FILE` subcommand is
-also used. It specifies the character encoding of the file. See
-[`INSERT`](../utilities/insert.md), for information on supported
-encodings.
-
- The optional `RECORDS` subcommand, which takes a single integer as an
-argument, is used to specify the number of lines per record. If
-`RECORDS` is not specified, then the number of lines per record is
-calculated from the list of variable specifications later in `DATA
-LIST`.
-
- The `END` subcommand is only useful in conjunction with [`INPUT
-PROGRAM`](input-program.md).
-
- The optional `SKIP` subcommand specifies a number of records to skip
-at the beginning of an input file. It can be used to skip over a row
-that contains variable names, for example.
-
- `DATA LIST` can optionally output a table describing how the data
-file is read. The `TABLE` subcommand enables this output, and `NOTABLE`
-disables it. The default is to output the table.
-
- The list of variables to be read from the data list must come last.
-Each line in the data record is introduced by a slash (`/`).
-Optionally, a line number may follow the slash.  Following that, any
-number of variable specifications may be present.
-
- Each variable specification consists of a list of variable names
-followed by a description of their location on the input line. [Sets
-of variables](../../language/datasets/variable-lists.html) may be specified with
-`TO`, e.g. `VAR1 TO VAR5`. There are two ways to specify the location
-of the variable on the line: columnar style and FORTRAN style.
-
- In columnar style, the starting column and ending column for the
-field are specified after the variable name, separated by a dash
-(`-`). For instance, the third through fifth columns on a line would
-be specified `3-5`. By default, variables are considered to be in
-[`F` format](../../language/datasets/formats/basic.html). (Use [`SET
-FORMAT`](../utilities/set.md#format) to change the default.)
-
- In columnar style, to use a variable format other than the default,
-specify the format type in parentheses after the column numbers. For
-instance, for alphanumeric `A` format, use `(A)`.
-
- In addition, implied decimal places can be specified in parentheses
-after the column numbers. As an example, suppose that a data file has a
-field in which the characters `1234` should be interpreted as having the
-value 12.34. Then this field has two implied decimal places, and the
-corresponding specification would be `(2)`. If a field that has implied
-decimal places contains a decimal point, then the implied decimal places
-are not applied.
-
- Changing the variable format and adding implied decimal places can be
-done together; for instance, `(N,5)`.
-
- When using columnar style, the input and output width of each
-variable is computed from the field width.  When a single field is
-assigned to multiple variables, its width must be evenly divisible by
-the number of variables specified.
-
- FORTRAN style is an altogether different approach to specifying field
-locations. With this approach, a list of variable input format
-specifications, separated by commas, are placed after the variable names
-inside parentheses. Each format specifier advances as many characters
-into the input line as it uses.
-
- Implied decimal places also exist in FORTRAN style. A format
-specification with `D` decimal places also has `D` implied decimal places.
-
- In addition to the [standard
- formats](../../language/datasets/formats/index.html), FORTRAN style
- defines some extensions:
-
-* `X`
- Advance the current column on this line by one character position.
-
-* `T<X>`
- Set the current column on this line to column `<X>`, with column
- numbers considered to begin with 1 at the left margin.
-
-* `NEWREC<X>`
- Skip forward `<X>` lines in the current record, resetting the active
- column to the left margin.
-
-* Repeat count
- Any format specifier may be preceded by a number. This causes the
- action of that format specifier to be repeated the specified number
- of times.
-
-* `(SPEC1, ..., SPECN)`
- Use `()` to group specifiers together. This is most useful when
- preceded by a repeat count. Groups may be nested.
-
- FORTRAN and columnar styles may be freely intermixed. Columnar style
-leaves the active column immediately after the ending column specified.
-Record motion using `NEWREC` in FORTRAN style also applies to later
-FORTRAN and columnar specifiers.
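-
- The following sketch shows FORTRAN style with a repeat count,
-grouping, and `X` skips, reading an identifier and three two-digit
-scores from hypothetical embedded data:
-
-```
-DATA LIST NOTABLE /id score1 TO score3 (F2.0, 3(X, F2.0)).
-BEGIN DATA.
-01 10 20 30
-02 15 25 35
-END DATA.
-```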
-
-### Example 1
-
-```
-DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
-
-BEGIN DATA.
-John Smith 102311
-Bob Arnold 122015
-Bill Yates 918 6
-END DATA.
-```
-
-Defines the following variables:
-
- - `NAME`, a 10-character-wide string variable, in columns 1
- through 10.
-
- - `INFO1`, a numeric variable, in columns 12 through 13.
-
- - `INFO2`, a numeric variable, in columns 14 through 15.
-
- - `INFO3`, a numeric variable, in columns 16 through 17.
-
-The `BEGIN DATA`/`END DATA` commands cause three cases to be
-defined:
-
-|Case |NAME |INFO1 |INFO2 |INFO3|
-|------:|:------------|-------:|-------:|----:|
-| 1 |John Smith | 10 | 23 | 11|
-| 2 |Bob Arnold | 12 | 20 | 15|
-| 3 |Bill Yates | 9 | 18 | 6|
-
-The `TABLE` keyword causes PSPP to print out a table describing the
-four variables defined.
-
-### Example 2
-
-```
-DATA LIST FILE="survey.dat"
- /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
- /Q01 TO Q50 7-56
- /.
-```
-
-Defines the following variables:
-
- - `ID`, a numeric variable, in columns 1-5 of the first record.
-
- - `NAME`, a 30-character string variable, in columns 7-36 of the
- first record.
-
- - `SURNAME`, a 30-character string variable, in columns 38-67 of
- the first record.
-
- - `MINITIAL`, a 1-character string variable, in column 69 of the
- first record.
-
- - Fifty variables `Q01`, `Q02`, `Q03`, ..., `Q49`, `Q50`, all
- numeric, `Q01` in column 7, `Q02` in column 8, ..., `Q49` in
- column 55, `Q50` in column 56, all in the second record.
-
-Cases are separated by a blank record.
-
-Data is read from file `survey.dat` in the current directory.
-
-## DATA LIST FREE
-
-```
-DATA LIST FREE
- [({TAB,'C'}, ...)]
- [{NOTABLE,TABLE}]
- [FILE='FILE_NAME' [ENCODING='ENCODING']]
- [SKIP=N_RECORDS]
- /VAR_SPEC...
-
-where each VAR_SPEC takes one of the forms
- VAR_LIST [(TYPE_SPEC)]
- VAR_LIST *
-```
-
- In free format, the input data is, by default, structured as a
-series of fields separated by spaces, tabs, or line breaks. If the
-current [`DECIMAL`](../utilities/set.md#decimal) separator is `DOT`,
-then commas are also treated as field separators. Each field's
-content may be unquoted, or it may be quoted with a pair of
-apostrophes (`'`) or double quotes (`"`).  Unquoted white space
-separates fields but is not part of any field. Any mix of spaces,
-tabs, and line breaks is equivalent to a single space for the purpose
-of separating fields, but consecutive commas will skip a field.
-
- Alternatively, delimiters can be specified explicitly, as a
-parenthesized, comma-separated list of single-character strings
-immediately following `FREE`. The word `TAB` may also be used to
-specify a tab character as a delimiter. When delimiters are specified
-explicitly, only the given characters, plus line breaks, separate
-fields. Furthermore, leading spaces at the beginnings of fields are
-not trimmed, consecutive delimiters define empty fields, and no form
-of quoting is allowed.
-
- The `NOTABLE` and `TABLE` subcommands are as in `DATA LIST FIXED`
-above. `NOTABLE` is the default.
-
- The `FILE`, `SKIP`, and `ENCODING` subcommands are as in `DATA LIST
-FIXED` above.
-
- The variables to be parsed are given as a single list of variable
-names. This list must be introduced by a single slash (`/`). The set
-of variable names may contain [format
-specifications](../../language/datasets/formats/index.html) in
-parentheses. Format specifications apply to all variables back to the
-previous parenthesized format specification.
-
- An asterisk on its own has the same effect as `(F8.0)`, assigning
-the variables preceding it input/output format `F8.0`.
-
- Specified field widths are ignored on input, although all normal
-limits on field width apply, but they are honored on output.
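-
- For example, the following sketch reads comma-delimited hypothetical
-data, using an explicitly specified delimiter list:
-
-```
-DATA LIST FREE (',') NOTABLE /name (A8) x y *.
-BEGIN DATA.
-alice,1,2
-bob,3,4
-END DATA.
-```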
-
-## DATA LIST LIST
-
-```
-DATA LIST LIST
- [({TAB,'C'}, ...)]
- [{NOTABLE,TABLE}]
- [FILE='FILE_NAME' [ENCODING='ENCODING']]
- [SKIP=RECORD_COUNT]
- /VAR_SPEC...
-
-where each VAR_SPEC takes one of the forms
- VAR_LIST [(TYPE_SPEC)]
- VAR_LIST *
-```
-
- With one exception, `DATA LIST LIST` is syntactically and
-semantically equivalent to `DATA LIST FREE`. The exception is that each
-input line is expected to correspond to exactly one input record. If
-more or fewer fields are found on an input line than expected, an
-appropriate diagnostic is issued.
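-
- For example, given the hypothetical input below, `DATA LIST LIST`
-issues a diagnostic for the second data line, which contains only one
-of the two expected fields:
-
-```
-DATA LIST LIST NOTABLE /x y *.
-BEGIN DATA.
-1 2
-3
-END DATA.
-```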
-
+++ /dev/null
-# DATAFILE ATTRIBUTE
-
-```
-DATAFILE ATTRIBUTE
- ATTRIBUTE=NAME('VALUE') [NAME('VALUE')]...
- ATTRIBUTE=NAME[INDEX]('VALUE') [NAME[INDEX]('VALUE')]...
- DELETE=NAME [NAME]...
- DELETE=NAME[INDEX] [NAME[INDEX]]...
-```
-
- `DATAFILE ATTRIBUTE` adds, modifies, or removes user-defined
-attributes associated with the active dataset. Custom data file
-attributes are not interpreted by PSPP, but they are saved as part of
-system files and may be used by other software that reads them.
-
- Use the `ATTRIBUTE` subcommand to add or modify a custom data file
-attribute. Specify the name of the attribute, followed by the desired
-value, in parentheses, as a quoted string. Attribute names that begin
-with `$` are reserved for PSPP's internal use, and attribute names
-that begin with `@` or `$@` are not displayed by most PSPP commands
-that display other attributes. Other attribute names are not treated
-specially.
-
- Attributes may also be organized into arrays. To assign to an array
-element, add an integer array index enclosed in square brackets (`[` and
-`]`) between the attribute name and value. Array indexes start at 1,
-not 0. An attribute array that has a single element (number 1) is not
-distinguished from a non-array attribute.
-
- Use the `DELETE` subcommand to delete an attribute. Specify an
-attribute name by itself to delete an entire attribute, including all
-array elements for attribute arrays. Specify an attribute name followed
-by an array index in square brackets to delete a single element of an
-attribute array. In the latter case, all the array elements numbered
-higher than the deleted element are shifted down, filling the vacated
-position.
-
- To associate custom attributes with particular variables, instead
-of with the entire active dataset, use [`VARIABLE
-ATTRIBUTE`](../variables/variable-attribute.md) instead.
-
- `DATAFILE ATTRIBUTE` takes effect immediately. It is not affected by
-conditional and looping structures such as `DO IF` or `LOOP`.
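-
- For example, the following sketch (the attribute names and values are
-hypothetical) assigns one plain attribute and a two-element attribute
-array, then deletes the first array element:
-
-```
-DATAFILE ATTRIBUTE
-    ATTRIBUTE=Source('2024 survey')
-              Revision[1]('first draft')
-              Revision[2]('final').
-DATAFILE ATTRIBUTE DELETE=Revision[1].
-```
-
-After the `DELETE`, the remaining element `Revision[2]` shifts down to
-become `Revision[1]`.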
-
+++ /dev/null
-# DATASET commands
-
-```
-DATASET NAME NAME [WINDOW={ASIS,FRONT}].
-DATASET ACTIVATE NAME [WINDOW={ASIS,FRONT}].
-DATASET COPY NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}].
-DATASET DECLARE NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}].
-DATASET CLOSE {NAME,*,ALL}.
-DATASET DISPLAY.
-```
-
- The `DATASET` commands simplify use of multiple datasets within a
-PSPP session. They allow datasets to be created and destroyed. At any
-given time, most PSPP commands work with a single dataset, called the
-active dataset.
-
- The `DATASET NAME` command gives the active dataset the specified name,
-or if it already had a name, it renames it. If another dataset already
-had the given name, that dataset is deleted.
-
- The `DATASET ACTIVATE` command selects the named dataset, which must
-already exist, as the active dataset. Before switching the active
-dataset, any pending transformations are executed, as if `EXECUTE` had
-been specified. If the active dataset is unnamed before switching, then
-it is deleted and becomes unavailable after switching.
-
- The `DATASET COPY` command creates a new dataset with the specified
-name, whose contents are a copy of the active dataset. Any pending
-transformations are executed, as if `EXECUTE` had been specified, before
-making the copy. If a dataset with the given name already exists, it is
-replaced. If the name is the name of the active dataset, then the
-active dataset becomes unnamed.
-
- The `DATASET DECLARE` command creates a new dataset that is
-initially "empty," that is, it has no dictionary or data. If a
-dataset with the given name already exists, this has no effect. The
-new dataset can be used with commands that support output to a
-dataset, such as [`AGGREGATE`](../data/aggregate.md).
-
- The `DATASET CLOSE` command deletes a dataset. If the active dataset
-is specified by name, or if `*` is specified, then the active dataset
-becomes unnamed. If a different dataset is specified by name, then it
-is deleted and becomes unavailable. Specifying `ALL` deletes all datasets
-except for the active dataset, which becomes unnamed.
-
- The `DATASET DISPLAY` command lists all the currently defined datasets.
-
- Many `DATASET` commands accept an optional `WINDOW` subcommand. In the
-PSPPIRE GUI, the value given for this subcommand influences how the
-dataset's window is displayed. Outside the GUI, the `WINDOW` subcommand
-has no effect. The valid values are:
-
-* `ASIS`
- Do not change how the window is displayed. This is the default for
- `DATASET NAME` and `DATASET ACTIVATE`.
-
-* `FRONT`
- Raise the dataset's window to the top. Make it the default dataset
- for running syntax.
-
-* `MINIMIZED`
- Display the window "minimized" to an icon. Prefer other datasets
- for running syntax. This is the default for `DATASET COPY` and
- `DATASET DECLARE`.
-
-* `HIDDEN`
- Hide the dataset's window. Prefer other datasets for running
- syntax.
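-
- For example, the following sketch names the active dataset, makes a
-copy, works with the copy, and then returns to the original:
-
-```
-DATASET NAME survey.
-DATASET COPY scratch.
-DATASET ACTIVATE scratch.
-* Transformations here apply only to the copy.
-DATASET ACTIVATE survey.
-DATASET CLOSE scratch.
-```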
+++ /dev/null
-# END CASE
-
-```
-END CASE.
-```
-
-`END CASE` is used only within [`INPUT PROGRAM`](input-program.md) to
-output the current case.
-
+++ /dev/null
-# END FILE
-
-```
-END FILE.
-```
-
-`END FILE` is used only within [`INPUT PROGRAM`](input-program.md) to
-terminate the current input program.
-
+++ /dev/null
-# FILE HANDLE
-
-## Syntax Overview
-
-For text files:
-
-```
-FILE HANDLE HANDLE_NAME
- /NAME='FILE_NAME'
- [/MODE=CHARACTER]
- [/ENDS={CR,CRLF}]
- [/TABWIDTH=TAB_WIDTH]
- [ENCODING='ENCODING']
-```
-
-For binary files in native encoding with fixed-length records:
-```
-FILE HANDLE HANDLE_NAME
- /NAME='FILE_NAME'
- /MODE=IMAGE
- [/LRECL=REC_LEN]
- [ENCODING='ENCODING']
-```
-
-For binary files in native encoding with variable-length records:
-```
-FILE HANDLE HANDLE_NAME
- /NAME='FILE_NAME'
- /MODE=BINARY
- [/LRECL=REC_LEN]
- [ENCODING='ENCODING']
-```
-
-For binary files encoded in EBCDIC:
-```
-FILE HANDLE HANDLE_NAME
- /NAME='FILE_NAME'
- /MODE=360
- /RECFORM={FIXED,VARIABLE,SPANNED}
- [/LRECL=REC_LEN]
- [ENCODING='ENCODING']
-```
-
-## Details
-
- Use `FILE HANDLE` to associate a file handle name with a file and its
-attributes, so that later commands can refer to the file by its handle
-name. Names of text files can be specified directly on commands that
-access files, so that `FILE HANDLE` is only needed when a file is not an
-ordinary file containing lines of text. However, `FILE HANDLE` may be
-used even for text files, and it may be easier to specify a file's name
-once and later refer to it by an abstract handle.
-
-Specify the file handle name as the identifier immediately following
-the `FILE HANDLE` command name. The identifier `INLINE` is reserved
-for representing data embedded in the syntax file (see [BEGIN
-DATA](begin-data.md)). The file handle name must not already have been
-used in a previous invocation of `FILE HANDLE`, unless it has been
-closed with [`CLOSE FILE HANDLE`](close-file-handle.md).
-
-The effect and syntax of `FILE HANDLE` depends on the selected `MODE`:
-
- - In `CHARACTER` mode, the default, the data file is read as a text
- file. Each text line is read as one record.
-
- In `CHARACTER` mode only, tabs are expanded to spaces by input
- programs, except by `DATA LIST FREE` with explicitly specified
- delimiters. Each tab is 4 characters wide by default, but `TABWIDTH`
- (a PSPP extension) may be used to specify an alternate width. Use
- a `TABWIDTH` of 0 to suppress tab expansion.
-
- A file written in `CHARACTER` mode by default uses the line ends of
- the system on which PSPP is running, that is, on Windows, the
- default is CR LF line ends, and on other systems the default is LF
- only. Specify `ENDS` as `CR` or `CRLF` to override the default. PSPP
- reads files using either convention on any kind of system,
- regardless of `ENDS`.
-
- - In `IMAGE` mode, the data file is treated as a series of fixed-length
- binary records. `LRECL` should be used to specify the record length
- in bytes, with a default of 1024. On input, it is an error if an
- `IMAGE` file's length is not an integer multiple of the record length.
- On output, each record is padded with spaces or truncated, if
- necessary, to make it exactly the correct length.
-
- - In `BINARY` mode, the data file is treated as a series of
- variable-length binary records. `LRECL` may be specified, but
- its value is ignored. The data for each record is both preceded
- and followed by a 32-bit signed integer in little-endian byte
- order that specifies the length of the record. (This redundancy
- permits records in these files to be efficiently read in reverse
- order, although PSPP always reads them in forward order.) The
- length does not include either integer.
-
- - Mode `360` reads and writes files in formats first used for tapes
- in the 1960s on IBM mainframe operating systems and still
- supported today by the modern successors of those operating
- systems. For more information, see `OS/400 Tape and Diskette
- Device Programming`, available on IBM's website.
-
- Alphanumeric data in mode `360` files are encoded in EBCDIC. PSPP
- translates EBCDIC to or from the host's native format as necessary
- on input or output, using an ASCII/EBCDIC translation that is
- one-to-one, so that a "round trip" from ASCII to EBCDIC back to
- ASCII, or vice versa, always yields exactly the original data.
-
- The `RECFORM` subcommand is required in mode `360`. The precise
- file format depends on its setting:
-
- * `F`
- `FIXED`
- This record format is equivalent to `IMAGE` mode, except for
- EBCDIC translation.
-
- IBM documentation calls this `*F` (fixed-length, deblocked)
- format.
-
- * `V`
- `VARIABLE`
- The file comprises a sequence of zero or more variable-length
- blocks. Each block begins with a 4-byte "block descriptor
- word" (BDW). The first two bytes of the BDW are an unsigned
- integer in big-endian byte order that specifies the length of
- the block, including the BDW itself. The other two bytes of
- the BDW are ignored on input and written as zeros on output.
-
- Following the BDW, the remainder of each block is a sequence
- of one or more variable-length records, each of which in turn
- begins with a 4-byte "record descriptor word" (RDW) that has
- the same format as the BDW. Following the RDW, the remainder
- of each record is the record data.
-
- The maximum length of a record in `VARIABLE` mode is 65,527
- bytes: 65,535 bytes (the maximum value of a 16-bit unsigned
- integer), minus 4 bytes for the BDW, minus 4 bytes for the
- RDW.
-
- In mode `VARIABLE`, `LRECL` specifies a maximum, not a fixed,
- record length, in bytes. The default is 8,192.
-
- IBM documentation calls this `*VB` (variable-length, blocked,
- unspanned) format.
-
- * `VS`
- `SPANNED`
- This format is like `VARIABLE`, except that logical records may
- be split among multiple physical records (called "segments") or
- blocks. In `SPANNED` mode, the third byte of each RDW is
- called the segment control character (SCC). Odd SCC values
- cause the segment to be appended to a record buffer maintained
- in memory; even values also append the segment and then flush
- its contents to the input procedure. Canonically, SCC value 0
- designates a record not spanned among multiple segments, and
- values 1 through 3 designate the first segment, the last
- segment, or an intermediate segment, respectively, within a
- multi-segment record. The record buffer is also flushed at end
- of file regardless of the final record's SCC.
-
- The maximum length of a logical record in `SPANNED` mode is
- limited only by memory available to PSPP. Segments are
- limited to 65,527 bytes, as in `VARIABLE` mode.
-
- This format is similar to what IBM documentation calls `*VS`
- (variable-length, deblocked, spanned) format.
-
- In mode `360`, fields of type `A` that extend beyond the end of a
- record read from disk are padded with spaces in the host's native
- character set, which are then translated from EBCDIC to the
- native character set. Thus, when the host's native character set
- is based on ASCII, these fields are effectively padded with
- character `X'80'`. This wart is implemented for compatibility.
-
- The `NAME` subcommand specifies the name of the file associated with
-the handle. It is required in all modes but `SCRATCH` mode, in which its
-use is forbidden.
-
- The `ENCODING` subcommand specifies the encoding of text in the
-file. For reading text files in `CHARACTER` mode, all of the forms
-described for `ENCODING` on the [`INSERT`](../utilities/insert.md)
-command are supported. For reading in other file-based modes,
-encoding autodetection is not supported; if the specified encoding
-requests autodetection then the default encoding is used. This is
-also true when a file handle is used for writing a file in any mode.
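-
- For example, the following sketch (the file name and record layout
-are hypothetical) declares a handle for a fixed-length binary file and
-then reads from it:
-
-```
-FILE HANDLE raw /NAME='records.dat' /MODE=IMAGE /LRECL=80.
-DATA LIST FILE=raw NOTABLE /id 1-8 (A) total 9-16.
-```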
-
+++ /dev/null
-# Data Input and Output
-
-Data are the focus of the PSPP language. Each datum belongs to a “case”
-(also called an “observation”). Each case represents an individual or
-"experimental unit". For example, in the results of a survey, the names
-of the respondents, their sex, age, etc. and their responses are all
-data and the data pertaining to single respondent is a case. This
-chapter examines the PSPP commands for defining variables and reading
-and writing data. There are alternative commands to read data from
-predefined sources such as system files or databases.
-
-> These commands tell PSPP how to read data, but the data will
-not actually be read until a procedure is executed.
+++ /dev/null
-# INPUT PROGRAM…END INPUT PROGRAM
-
-```
-INPUT PROGRAM.
-... input commands ...
-END INPUT PROGRAM.
-```
-
- `INPUT PROGRAM`...`END INPUT PROGRAM` specifies a complex input
-program. By placing data input commands within `INPUT PROGRAM`, PSPP
-programs can take advantage of more complex file structures than
-available with only `DATA LIST`.
-
- The simplest sort of extended input program puts multiple `DATA
-LIST` commands within the `INPUT PROGRAM`.  This causes all of the
-data files to be read in parallel.  Input stops when end of file is
-reached on any of the data files.
-
- Transformations, such as conditional and looping constructs, can also
-be included within `INPUT PROGRAM`. These can be used to combine input
-from several data files in more complex ways. However, input will still
-stop when end of file is reached on any of the data files.
-
- To prevent `INPUT PROGRAM` from terminating at the first end of
-file, use the `END` subcommand on `DATA LIST`. This subcommand takes
-a variable name, which should be a numeric [scratch
-variable](../../language/datasets/scratch-variables.md). (It need not
-be a scratch variable but otherwise the results can be surprising.)
-The value of this variable is set to 0 when reading the data file, or
-1 when end of file is encountered.
-
- Two additional commands are useful in conjunction with `INPUT
-PROGRAM`. `END CASE` is the first. Normally each loop through the
-`INPUT PROGRAM` structure produces one case. `END CASE` controls
-exactly when cases are output. When `END CASE` is used, looping from
-the end of `INPUT PROGRAM` to the beginning does not cause a case to be
-output.
-
- `END FILE` is the second. When the `END` subcommand is used on `DATA
-LIST`, there is no way for the `INPUT PROGRAM` construct to stop
-looping, so an infinite loop results. `END FILE`, when executed, stops
-the flow of input data and passes out of the `INPUT PROGRAM` structure.
-
- `INPUT PROGRAM` must contain at least one `DATA LIST` or `END FILE`
-command.
-
-## Example 1: Read two files in parallel to the end of the shorter
-
-The following example reads variable `X` from file `a.txt` and
-variable `Y` from file `b.txt`. If one file is shorter than the other
-then the extra data in the longer file is ignored.
-
-```
-INPUT PROGRAM.
- DATA LIST NOTABLE FILE='a.txt'/X 1-10.
- DATA LIST NOTABLE FILE='b.txt'/Y 1-10.
-END INPUT PROGRAM.
-LIST.
-```
-
-## Example 2: Read two files in parallel, supplementing the shorter
-
-The following example also reads variable `X` from `a.txt` and
-variable `Y` from `b.txt`. If one file is shorter than the other then
-it continues reading the longer to its end, setting the other variable
-to system-missing.
-
-```
-INPUT PROGRAM.
- NUMERIC #A #B.
-
- DO IF NOT #A.
- DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
- END IF.
- DO IF NOT #B.
- DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10.
- END IF.
- DO IF #A AND #B.
- END FILE.
- END IF.
- END CASE.
-END INPUT PROGRAM.
-LIST.
-```
-
-## Example 3: Concatenate two files (version 1)
-
-The following example reads data from file `a.txt`, then from `b.txt`,
-and concatenates them into a single active dataset.
-
-```
-INPUT PROGRAM.
- NUMERIC #A #B.
-
- DO IF #A.
- DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10.
- DO IF #B.
- END FILE.
- ELSE.
- END CASE.
- END IF.
- ELSE.
- DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
- DO IF NOT #A.
- END CASE.
- END IF.
- END IF.
-END INPUT PROGRAM.
-LIST.
-```
-
-## Example 4: Concatenate two files (version 2)
-
-This is another way to do the same thing as Example 3.
-
-```
-INPUT PROGRAM.
- NUMERIC #EOF.
-
- LOOP IF NOT #EOF.
- DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10.
- DO IF NOT #EOF.
- END CASE.
- END IF.
- END LOOP.
-
- COMPUTE #EOF = 0.
- LOOP IF NOT #EOF.
- DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10.
- DO IF NOT #EOF.
- END CASE.
- END IF.
- END LOOP.
-
- END FILE.
-END INPUT PROGRAM.
-LIST.
-```
-
-## Example 5: Generate random variates
-
-The following example creates a dataset that consists of 50 random
-variates between 0 and 10.
-
-```
-INPUT PROGRAM.
- LOOP #I=1 TO 50.
- COMPUTE X=UNIFORM(10).
- END CASE.
- END LOOP.
- END FILE.
-END INPUT PROGRAM.
-LIST /FORMAT=NUMBERED.
-```
+++ /dev/null
-# LIST
-
-```
-LIST
- /VARIABLES=VAR_LIST
- /CASES=FROM START_INDEX TO END_INDEX BY INCR_INDEX
- /FORMAT={UNNUMBERED,NUMBERED} {WRAP,SINGLE}
-```
-
- The `LIST` procedure prints the values of specified variables to the
-listing file.
-
- The `VARIABLES` subcommand specifies the variables whose values are
-to be printed. Keyword `VARIABLES` is optional. If the `VARIABLES`
-subcommand is omitted then all variables in the active dataset are
-printed.
-
- The `CASES` subcommand can be used to specify a subset of cases to be
-printed. Specify `FROM` and the case number of the first case to print,
-`TO` and the case number of the last case to print, and `BY` and the
-number of cases to advance between printing cases, or any subset of
-those settings. If `CASES` is not specified then all cases are printed.
-
- The `FORMAT` subcommand can be used to change the output format.
-`NUMBERED` will print case numbers along with each case; `UNNUMBERED`,
-the default, causes the case numbers to be omitted. The `WRAP` and
-`SINGLE` settings are currently not used.
-
- Case numbers start from 1. They are counted after all
-transformations have been considered.
-
- `LIST` is a procedure. It causes the data to be read.
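-
-For example, the following hypothetical syntax (the variable names
-are placeholders) lists the values of X and Y for every tenth case
-among the first 100 cases, with case numbers shown:
-
-```
-LIST /VARIABLES=X Y /CASES=FROM 10 TO 100 BY 10 /FORMAT=NUMBERED.
-```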
-
+++ /dev/null
-# NEW FILE
-
-```
-NEW FILE.
-```
-
-The `NEW FILE` command clears the dictionary and data from the current
-active dataset.
-
+++ /dev/null
-# PRINT EJECT
-
-```
-PRINT EJECT
- OUTFILE='FILE_NAME'
- RECORDS=N_LINES
- {NOTABLE,TABLE}
- /[LINE_NO] ARG...
-
-ARG takes one of the following forms:
- 'STRING' [START-END]
- VAR_LIST START-END [TYPE_SPEC]
- VAR_LIST (FORTRAN_SPEC)
- VAR_LIST *
-```
-
-`PRINT EJECT` advances to the beginning of a new output page in the
-listing file or output file. It can also output data in the same way as
-`PRINT`.
-
-All `PRINT EJECT` subcommands are optional.
-
-Without `OUTFILE`, `PRINT EJECT` ejects the current page in the
-listing file, then it produces other output, if any is specified.
-
-With `OUTFILE`, `PRINT EJECT` writes its output to the specified
-file. The first line of output is written with `1` inserted in the
-first column. Commonly, this is the only line of output. If additional
-lines of output are specified, these additional lines are written with a
-space inserted in the first column, as with `PRINT`.
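-
-As a hypothetical sketch (the file and variable names are
-placeholders), the following writes each case on its own page of
-`report.txt`, with `1` in the first column of the first line to mark
-the page break:
-
-```
-PRINT EJECT OUTFILE='report.txt' /'Case data:' /X 12-20.
-EXECUTE.
-```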
-
-See [PRINT](print.md) for more information on syntax and usage.
-
+++ /dev/null
-# PRINT SPACE
-
-```
-PRINT SPACE [OUTFILE='file_name'] [ENCODING='ENCODING'] [n_lines].
-```
-
-`PRINT SPACE` prints one or more blank lines to an output file.
-
-The `OUTFILE` subcommand is optional. It may be used to direct output
-to a file specified by file name as a string or [file
-handle](../../language/files/file-handles.md). If `OUTFILE` is not
-specified then output is directed to the listing file.
-
-The `ENCODING` subcommand may only be used if `OUTFILE` is also used.
-It specifies the character encoding of the file. See
-[`INSERT`](../utilities/insert.md) for information on supported
-encodings.
-
-`n_lines` is also optional. If present, it is an
-[expression](../../language/expressions/index.md) for the number of
-blank lines to be printed. The expression must evaluate to a
-nonnegative value.
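-
-For example, the following hypothetical transformations (the file and
-variable names are placeholders) write each case's value of X to
-`out.txt`, followed by two blank lines:
-
-```
-PRINT OUTFILE='out.txt' /X.
-PRINT SPACE OUTFILE='out.txt' 2.
-EXECUTE.
-```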
-
+++ /dev/null
-# PRINT
-
-```
-PRINT
- [OUTFILE='FILE_NAME']
- [RECORDS=N_LINES]
- [{NOTABLE,TABLE}]
- [ENCODING='ENCODING']
- [/[LINE_NO] ARG...]
-
-ARG takes one of the following forms:
- 'STRING' [START]
- VAR_LIST START-END [TYPE_SPEC]
- VAR_LIST (FORTRAN_SPEC)
- VAR_LIST *
-```
-
- The `PRINT` transformation writes variable data to the listing file
-or an output file. `PRINT` is executed when a procedure causes the
-data to be read. Follow `PRINT` by
-[`EXECUTE`](../utilities/execute.md) to print variable data without
-invoking a procedure.
-
- All `PRINT` subcommands are optional. If no strings or variables are
-specified, `PRINT` outputs a single blank line.
-
- The `OUTFILE` subcommand specifies the file to receive the output.
-The file may be a file name as a string or a [file
-handle](../../language/files/file-handles.md). If `OUTFILE` is not
-present then output is sent to PSPP's output listing file. When
-`OUTFILE` is present, the output is written to the file in a plain
-text format, with a space inserted at beginning of each output line,
-even lines that otherwise would be blank.
-
- The `ENCODING` subcommand may only be used if the `OUTFILE`
-subcommand is also used. It specifies the character encoding of the
-file. See [`INSERT`](../utilities/insert.md) for information on
-supported encodings.
-
- The `RECORDS` subcommand specifies the number of lines to be output.
-The number of lines may optionally be surrounded by parentheses.
-
- `TABLE` will cause the `PRINT` command to output a table to the
-listing file that describes what it will print to the output file.
-`NOTABLE`, the default, suppresses this output table.
-
- Introduce the strings and variables to be printed with a slash (`/`).
-Optionally, the slash may be followed by a number indicating which
-output line it specifies. If the line number is omitted, the next
-line is used. Multiple lines may be specified using
-multiple slashes with the intended output for a line following its
-respective slash.
-
- Literal strings may be printed. Specify the string itself.
-Optionally the string may be followed by a column number, specifying the
-column on the line where the string should start. Otherwise, the string
-is printed at the current position on the line.
-
- Variables to be printed can be specified in the same ways as
-available for [`DATA LIST FIXED`](data-list.md#data-list-fixed). In addition,
-a variable list may be followed by an asterisk (`*`), which indicates
-that the variables should be printed in their dictionary print formats,
-separated by spaces. A variable list followed by a slash or the end of
-command is interpreted in the same way.
-
- If a FORTRAN type specification is used to move backwards on the
-current line, then text is written at that point on the line and the
-line is truncated to that length, although text added afterward will
-extend the line again.
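-
-As a hypothetical illustration (the variable names are placeholders),
-the following prints two lines for each case: a label and an ID on
-the first line, and three scores in their dictionary print formats on
-the second:
-
-```
-PRINT /'Subject' 1 ID 10-14
-      /2 SCORE1 TO SCORE3 *.
-EXECUTE.
-```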
-
+++ /dev/null
-# REPEATING DATA
-
-```
-REPEATING DATA
- /STARTS=START-END
- /OCCURS=N_OCCURS
- /FILE='FILE_NAME'
- /LENGTH=LENGTH
- /CONTINUED[=CONT_START-CONT_END]
- /ID=ID_START-ID_END=ID_VAR
- /{TABLE,NOTABLE}
- /DATA=VAR_SPEC...
-
-where each VAR_SPEC takes one of the forms
- VAR_LIST START-END [TYPE_SPEC]
- VAR_LIST (FORTRAN_SPEC)
-```
-
-`REPEATING DATA` parses groups of data repeating in a uniform format,
-possibly with several groups on a single line. Each group of data
-corresponds with one case. `REPEATING DATA` may only be used within
-[`INPUT PROGRAM`](input-program.md). When used with [`DATA
-LIST`](data-list.md), it can be used to parse groups of cases that
-share a subset of variables but differ in their other data.
-
-The `STARTS` subcommand is required. Specify a range of columns,
-using literal numbers or numeric variable names. This range specifies
-the columns on the first line that are used to contain groups of data.
-The ending column is optional. If it is not specified, then the
-record width of the input file is used. For the [inline
-file](begin-data.md), this is 80 columns; for a file with fixed record
-widths it is the record width; for other files it is 1024 characters
-by default.
-
-The `OCCURS` subcommand is required. It must be a number or the name
-of a numeric variable. Its value is the number of groups present in the
-current record.
-
-The `DATA` subcommand is required. It must be the last subcommand
-specified. It is used to specify the data present within each
-repeating group. Column numbers are specified relative to the
-beginning of a group at column 1. Data is specified in the same way
-as with [`DATA LIST FIXED`](data-list.md#data-list-fixed).
-
-All other subcommands are optional.
-
-`FILE` specifies the file to read, either a file name as a string or a
-[file handle](../../language/files/file-handles.md). If `FILE` is not
-present then the default is the last file handle used on the most
-recent `DATA LIST` command.
-
-By default `REPEATING DATA` will output a table describing how it
-will parse the input data. Specifying `NOTABLE` will disable this
-behavior; specifying `TABLE` will explicitly enable it.
-
-The `LENGTH` subcommand specifies the length in characters of each
-group. If it is not present then length is inferred from the `DATA`
-subcommand. `LENGTH` may be a number or a variable name.
-
-Normally all the data groups are expected to be present on a single
-line. Use the `CONTINUED` subcommand to indicate that data can be
-continued onto additional lines. If data on continuation lines starts
-at the left margin and continues through the entire field width, no
-column specifications are necessary on `CONTINUED`. Otherwise, specify
-the possible range of columns in the same way as on `STARTS`.
-
-When data groups are continued from line to line, it is easy for
-cases to get out of sync through careless hand editing. The `ID`
-subcommand allows a case identifier to be present on each line of
-repeating data groups. `REPEATING DATA` will check for the same
-identifier on each line and report mismatches. Specify the range of
-columns that the identifier will occupy, followed by an equals sign
-(`=`) and the identifier variable name. The variable must already have
-been declared with `NUMERIC` or another command.
-
-`REPEATING DATA` should be the last command given within an [`INPUT
-PROGRAM`](input-program.md). It should not be enclosed within
-[`LOOP`…`END LOOP`](../control/loop.md). Use `DATA LIST` before, not
-after, [`REPEATING DATA`](../data-io/repeating-data.md).
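-
-As a sketch (the file and variable names are hypothetical), the
-following reads records that each begin with an order number and a
-count of groups, followed by the groups themselves starting in column
-9:
-
-```
-INPUT PROGRAM.
-  DATA LIST NOTABLE FILE='orders.txt' /ORDER 1-4 NITEMS 6-7.
-  REPEATING DATA STARTS=9 /OCCURS=NITEMS
-    /DATA=ITEM 1-3 (A) QTY 5-6.
-END INPUT PROGRAM.
-LIST.
-```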
+++ /dev/null
-# REREAD
-
-```
-REREAD [FILE=handle] [COLUMN=column] [ENCODING='ENCODING'].
-```
-
-The `REREAD` transformation allows the previous input line in a data
-file already processed by `DATA LIST` or another input command to be
-re-read for further processing.
-
-The `FILE` subcommand, which is optional, is used to specify the file
-to have its line re-read. The file must be specified as the name of a
-[file handle](../../language/files/file-handles.md). If `FILE` is not
-specified then the file specified on the most recent `DATA LIST`
-command is assumed.
-
-By default, the line re-read is re-read in its entirety. With the
-`COLUMN` subcommand, a prefix of the line can be exempted from
-re-reading. Specify an
-[expression](../../language/expressions/index.md) evaluating to the
-first column that should be included in the re-read line. Columns are
-numbered from 1 at the left margin.
-
-The `ENCODING` subcommand may only be used if the `FILE` subcommand is
-also used. It specifies the character encoding of the file. See
-[`INSERT`](../utilities/insert.md) for information on supported
-encodings.
-
-Issuing `REREAD` multiple times will not back up in the data file.
-Instead, it will re-read the same line multiple times.
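-
-As a sketch with hypothetical file and variable names, the following
-reads a record type code from column 1, then re-reads the rest of
-each line differently depending on the code:
-
-```
-INPUT PROGRAM.
-  DATA LIST NOTABLE FILE='data.txt' /KIND 1 (A).
-  DO IF KIND = 'N'.
-    REREAD COLUMN=2.
-    DATA LIST NOTABLE /X 1-8.
-  ELSE.
-    REREAD COLUMN=2.
-    DATA LIST NOTABLE /LABEL 1-8 (A).
-  END IF.
-  END CASE.
-END INPUT PROGRAM.
-LIST.
-```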
-
+++ /dev/null
-# WRITE
-
-```
-WRITE
- OUTFILE='FILE_NAME'
- RECORDS=N_LINES
- {NOTABLE,TABLE}
- /[LINE_NO] ARG...
-
-ARG takes one of the following forms:
- 'STRING' [START-END]
- VAR_LIST START-END [TYPE_SPEC]
- VAR_LIST (FORTRAN_SPEC)
- VAR_LIST *
-```
-
-`WRITE` writes text or binary data to an output file. `WRITE` differs
-from [`PRINT`](print.md) in only a few ways:
-
-- `WRITE` uses write formats by default, whereas `PRINT` uses print
- formats.
-
-- `PRINT` inserts a space between variables unless a format is
- explicitly specified, but `WRITE` never inserts space between
- variables in output.
-
-- `PRINT` inserts a space at the beginning of each line that it writes
- to an output file (and `PRINT EJECT` inserts `1` at the beginning of
- each line that should begin a new page), but `WRITE` does not.
-
-- `PRINT` outputs the system-missing value according to its specified
- output format, whereas `WRITE` outputs the system-missing value as a
- field filled with spaces. Binary formats are an exception.
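-
-For example, the following hypothetical syntax (the file and variable
-names are placeholders) writes each case's values of X and Y to
-`data.txt` using their write formats, with no intervening spaces:
-
-```
-WRITE OUTFILE='data.txt' /X Y.
-EXECUTE.
-```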
-
--- /dev/null
+# DATA LIST
+
+Used to read text or binary data, `DATA LIST` is the most fundamental
+data-reading command. Even the more sophisticated input methods use
+`DATA LIST` commands as a building block. Understanding `DATA LIST` is
+important to understanding how to use PSPP to read your data files.
+
+ There are two major variants of `DATA LIST`: fixed format and free
+format. In addition, free format has a minor variant, list format,
+which is discussed in terms of its differences from vanilla free
+format.
+
+ Each form of `DATA LIST` is described in detail below.
+
+ See [`GET DATA`](get-data.md) for a command that offers a few
+enhancements over `DATA LIST` and that may be substituted for `DATA
+LIST` in many situations.
+
+## DATA LIST FIXED
+
+```
+DATA LIST [FIXED]
+ {TABLE,NOTABLE}
+ [FILE='FILE_NAME' [ENCODING='ENCODING']]
+ [RECORDS=RECORD_COUNT]
+ [END=END_VAR]
+ [SKIP=RECORD_COUNT]
+ /[LINE_NO] VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+```
+
+ `DATA LIST FIXED` is used to read data files that have values at
+fixed positions on each line of single-line or multiline records. The
+keyword `FIXED` is optional.
+
+ The `FILE` subcommand must be used if input is to be taken from an
+external file. It may be used to specify a file name as a string or a
+[file handle](../language/files/file-handles.md). If the `FILE`
+subcommand is not used, then input is assumed to be specified within
+the command file using [`BEGIN DATA`...`END DATA`](begin-data.md).
+The `ENCODING` subcommand may only be used if the `FILE` subcommand is
+also used. It specifies the character encoding of the file. See
+[`INSERT`](insert.md) for information on supported encodings.
+
+ The optional `RECORDS` subcommand, which takes a single integer as an
+argument, is used to specify the number of lines per record. If
+`RECORDS` is not specified, then the number of lines per record is
+calculated from the list of variable specifications later in `DATA
+LIST`.
+
+ The `END` subcommand is only useful in conjunction with [`INPUT
+PROGRAM`](input-program.md).
+
+ The optional `SKIP` subcommand specifies a number of records to skip
+at the beginning of an input file. It can be used to skip over a row
+that contains variable names, for example.
+
+ `DATA LIST` can optionally output a table describing how the data
+file is read. The `TABLE` subcommand enables this output, and `NOTABLE`
+disables it. The default is to output the table.
+
+ The list of variables to be read from the data list must come last.
+Each line in the data record is introduced by a slash (`/`).
+Optionally, a line number may follow the slash. After that, any
+number of variable specifications may be present.
+
+ Each variable specification consists of a list of variable names
+followed by a description of their location on the input line. [Sets
+of variables](../language/datasets/variable-lists.md) may be specified
+with `TO`, e.g. `VAR1 TO VAR5`. There are two ways to specify the location
+of the variable on the line: columnar style and FORTRAN style.
+
+ In columnar style, the starting column and ending column for the
+field are specified after the variable name, separated by a dash
+(`-`). For instance, the third through fifth columns on a line would
+be specified `3-5`. By default, variables are considered to be in
+[`F` format](../language/datasets/formats/basic.md). (Use [`SET
+FORMAT`](set.md#format) to change the default.)
+
+ In columnar style, to use a variable format other than the default,
+specify the format type in parentheses after the column numbers. For
+instance, for alphanumeric `A` format, use `(A)`.
+
+ In addition, implied decimal places can be specified in parentheses
+after the column numbers. As an example, suppose that a data file has a
+field in which the characters `1234` should be interpreted as having the
+value 12.34. Then this field has two implied decimal places, and the
+corresponding specification would be `(2)`. If a field that has implied
+decimal places contains a decimal point, then the implied decimal places
+are not applied.
+
+ Changing the variable format and adding implied decimal places can be
+done together; for instance, `(N,5)`.
+
+ When using columnar style, the input and output width of each
+variable is computed from the field width. The field width must be
+evenly divisible by the number of variables specified.
+
+ FORTRAN style is an altogether different approach to specifying field
+locations. With this approach, a list of variable input format
+specifications, separated by commas, are placed after the variable names
+inside parentheses. Each format specifier advances as many characters
+into the input line as it uses.
+
+ Implied decimal places also exist in FORTRAN style. A format
+specification with `D` decimal places also has `D` implied decimal places.
+
+ In addition to the [standard
+formats](../language/datasets/formats/index.md), FORTRAN style defines
+some extensions:
+
+* `X`
+ Advance the current column on this line by one character position.
+
+* `T<X>`
+ Set the current column on this line to column `<X>`, with column
+ numbers considered to begin with 1 at the left margin.
+
+* `NEWREC<X>`
+ Skip forward `<X>` lines in the current record, resetting the active
+ column to the left margin.
+
+* Repeat count
+ Any format specifier may be preceded by a number. This causes the
+ action of that format specifier to be repeated the specified number
+ of times.
+
+* `(SPEC1, ..., SPECN)`
+ Use `()` to group specifiers together. This is most useful when
+ preceded by a repeat count. Groups may be nested.
+
+ FORTRAN and columnar styles may be freely intermixed. Columnar style
+leaves the active column immediately after the ending column specified.
+Record motion using `NEWREC` in FORTRAN style also applies to later
+FORTRAN and columnar specifiers.
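+
+As a hypothetical illustration of FORTRAN style (the variable names
+are placeholders), the following reads a 10-character name, skips one
+column, then reads three two-digit scores, each followed by a skipped
+column:
+
+```
+DATA LIST NOTABLE /NAME (A10, X) SCORE1 TO SCORE3 (3(F2.0, X)).
+```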
+
+### Example 1
+
+```
+DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
+
+BEGIN DATA.
+John Smith 102311
+Bob Arnold 122015
+Bill Yates 918 6
+END DATA.
+```
+
+Defines the following variables:
+
+ - `NAME`, a 10-character-wide string variable, in columns 1
+ through 10.
+
+ - `INFO1`, a numeric variable, in columns 12 through 13.
+
+ - `INFO2`, a numeric variable, in columns 14 through 15.
+
+ - `INFO3`, a numeric variable, in columns 16 through 17.
+
+The `BEGIN DATA`/`END DATA` commands cause three cases to be
+defined:
+
+|Case |NAME |INFO1 |INFO2 |INFO3|
+|------:|:------------|-------:|-------:|----:|
+| 1 |John Smith | 10 | 23 | 11|
+| 2 |Bob Arnold | 12 | 20 | 15|
+| 3 |Bill Yates | 9 | 18 | 6|
+
+The `TABLE` keyword causes PSPP to print out a table describing the
+four variables defined.
+
+### Example 2
+
+```
+DATA LIST FILE="survey.dat"
+ /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
+ /Q01 TO Q50 7-56
+ /.
+```
+
+Defines the following variables:
+
+ - `ID`, a numeric variable, in columns 1-5 of the first record.
+
+ - `NAME`, a 30-character string variable, in columns 7-36 of the
+ first record.
+
+ - `SURNAME`, a 30-character string variable, in columns 38-67 of
+ the first record.
+
+ - `MINITIAL`, a 1-character string variable, in column 69 of the
+ first record.
+
+ - Fifty variables `Q01`, `Q02`, `Q03`, ..., `Q49`, `Q50`, all
+ numeric, `Q01` in column 7, `Q02` in column 8, ..., `Q49` in
+ column 55, `Q50` in column 56, all in the second record.
+
+Cases are separated by a blank record.
+
+Data is read from file `survey.dat` in the current directory.
+
+## DATA LIST FREE
+
+```
+DATA LIST FREE
+ [({TAB,'C'}, ...)]
+ [{NOTABLE,TABLE}]
+ [FILE='FILE_NAME' [ENCODING='ENCODING']]
+ [SKIP=N_RECORDS]
+ /VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST [(TYPE_SPEC)]
+ VAR_LIST *
+```
+
+ In free format, the input data is, by default, structured as a
+series of fields separated by spaces, tabs, or line breaks. If the
+current [`DECIMAL`](set.md#decimal) separator is `DOT`, then commas
+are also treated as field separators. Each field's content may be
+unquoted, or it may be quoted with a pair of apostrophes (`'`) or
+double quotes (`"`). Unquoted white space separates fields but is not
+part of any field. Any mix of spaces, tabs, and line breaks is
+equivalent to a single space for the purpose of separating fields, but
+consecutive commas will skip a field.
+
+ Alternatively, delimiters can be specified explicitly, as a
+parenthesized, comma-separated list of single-character strings
+immediately following `FREE`. The word `TAB` may also be used to
+specify a tab character as a delimiter. When delimiters are specified
+explicitly, only the given characters, plus line breaks, separate
+fields. Furthermore, leading spaces at the beginnings of fields are
+not trimmed, consecutive delimiters define empty fields, and no form
+of quoting is allowed.
+
+ The `NOTABLE` and `TABLE` subcommands are as in `DATA LIST FIXED`
+above. `NOTABLE` is the default.
+
+ The `FILE`, `SKIP`, and `ENCODING` subcommands are as in `DATA LIST
+FIXED` above.
+
+ The variables to be parsed are given as a single list of variable
+names. This list must be introduced by a single slash (`/`). The set
+of variable names may contain [format
+specifications](../language/datasets/formats/index.md) in
+parentheses. Format specifications apply to all variables back to the
+previous parenthesized format specification.
+
+ An asterisk on its own has the same effect as `(F8.0)`, assigning
+the variables preceding it input/output format `F8.0`.
+
+ Specified field widths are ignored on input (although all normal
+limits on field width apply), but they are honored on output.
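+
+For example, this hypothetical syntax reads an ID in `F8.0` format, a
+quoted 12-character name, and three numeric responses from
+freely-formatted inline data; note that the second case is split
+across two input lines:
+
+```
+DATA LIST FREE /ID * NAME (A12) Q1 TO Q3 *.
+BEGIN DATA.
+1 "John Smith" 1 2 3
+2 "Bob Arnold" 4
+5 6
+END DATA.
+LIST.
+```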
+
+## DATA LIST LIST
+
+```
+DATA LIST LIST
+ [({TAB,'C'}, ...)]
+ [{NOTABLE,TABLE}]
+ [FILE='FILE_NAME' [ENCODING='ENCODING']]
+ [SKIP=RECORD_COUNT]
+ /VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST [(TYPE_SPEC)]
+ VAR_LIST *
+```
+
+ With one exception, `DATA LIST LIST` is syntactically and
+semantically equivalent to `DATA LIST FREE`. The exception is that each
+input line is expected to correspond to exactly one input record. If
+more or fewer fields are found on an input line than expected, an
+appropriate diagnostic is issued.
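+
+For example, with the hypothetical input below, `DATA LIST LIST`
+reads one case per line and would issue a diagnostic for the third
+line, which has only one of the two expected fields:
+
+```
+DATA LIST LIST NOTABLE /X Y.
+BEGIN DATA.
+1 2
+3 4
+5
+END DATA.
+LIST.
+```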
+
--- /dev/null
+# Transforming Data
+
+The PSPP procedures in this chapter manipulate data and prepare the
+active dataset for later analyses. They do not produce output.
+++ /dev/null
-# AGGREGATE
-
-```
-AGGREGATE
- [OUTFILE={*,'FILE_NAME',FILE_HANDLE} [MODE={REPLACE,ADDVARIABLES}]]
- [/MISSING=COLUMNWISE]
- [/PRESORTED]
- [/DOCUMENT]
- [/BREAK=VAR_LIST]
- /DEST_VAR['LABEL']...=AGR_FUNC(SRC_VARS[, ARGS]...)...
-```
-
-`AGGREGATE` summarizes groups of cases into single cases. It divides
-cases into groups that have the same values for one or more variables
-called "break variables". Several functions are available for
-summarizing case contents.
-
-The `AGGREGATE` syntax consists of subcommands to control its
-behavior, all of which are optional, followed by one or more
-destination variable assignments, each of which uses an aggregation
-function to define how it is calculated.
-
-The `OUTFILE` subcommand, which must be first, names the destination
-for `AGGREGATE` output. It may name a system file by file name or
-[file handle](../../language/files/file-handles.md), a
-[dataset](../../language/datasets/index.md) by its name, or `*` to
-replace the active dataset. `AGGREGATE` writes its output to this
-file.
-
-With `OUTFILE=*` only, `MODE` may be specified immediately afterward
-with the value `ADDVARIABLES` or `REPLACE`:
-
-- With `REPLACE`, the default, the active dataset is replaced by a
- new dataset which contains just the break variables and the
-  destination variables. The new file contains as many cases as there
- are unique combinations of the break variables.
-
-- With `ADDVARIABLES`, the destination variables are added to those
- in the existing active dataset. Cases that have the same
- combination of values in their break variables receive identical
- values for the destination variables. The number of cases in the
- active dataset remains unchanged. The data must be sorted on the
-  break variables; that is, `ADDVARIABLES` implies `PRESORTED`.
-
-Without `OUTFILE`, `AGGREGATE` acts as if `OUTFILE=*
-MODE=ADDVARIABLES` were specified.
-
-By default, `AGGREGATE` first sorts the data on the break variables.
-If the active dataset is already sorted or grouped by the break
-variables, specify `PRESORTED` to save time. With
-`MODE=ADDVARIABLES`, the data must be pre-sorted.
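-
-For example, the following sketch (with hypothetical variable names)
-sorts the data, then adds each group's mean to every case in the
-group without changing the number of cases:
-
-```
-SORT CASES BY REGION.
-AGGREGATE OUTFILE=* MODE=ADDVARIABLES
-  /BREAK=REGION
-  /MEAN_SALES=MEAN(SALES).
-```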
-
-Specify [`DOCUMENT`](../utilities/document.md) to copy the documents
-from the active dataset into the aggregate file. Otherwise, the
-aggregate file does not contain any documents, even if the aggregate
-file replaces the active dataset.
-
-Normally, `AGGREGATE` produces a non-missing value whenever there is
-enough non-missing data for the aggregation function in use, that is,
-just one non-missing value or, for the `SD` and `SD.` aggregation
-functions, two non-missing values. Specify `/MISSING=COLUMNWISE` to
-make `AGGREGATE` output a missing value when one or more of the input
-values are missing.
-
-The `BREAK` subcommand is optional but usually present. On `BREAK`,
-list the variables used to divide the active dataset into groups to be
-summarized.
-
-`AGGREGATE` is particular about the order of subcommands. `OUTFILE`
-must be first, followed by `MISSING`. `PRESORTED` and `DOCUMENT`
-follow `MISSING`, in either order, followed by `BREAK`, then followed
-by aggregation variable specifications.
-
-At least one set of aggregation variables is required. Each set
-comprises a list of aggregation variables, an equals sign (`=`), the
-name of an aggregation function (see the list below), and a list of
-source variables in parentheses. A few aggregation functions do not
-accept source variables, and some aggregation functions expect
-additional arguments after the source variable names.
-
-`AGGREGATE` typically creates aggregation variables with no variable
-label, value labels, or missing values. Their default print and write
-formats depend on the aggregation function used, with details given in
-the table below. A variable label for an aggregation variable may be
-specified just after the variable's name in the aggregation variable
-list.
-
-Each set must have exactly as many source variables as aggregation
-variables. Each aggregation variable receives the results of applying
-the specified aggregation function to the corresponding source variable.
-
-The following aggregation functions may be applied only to numeric
-variables:
-
-* `MEAN(VAR_NAME...)`
- Arithmetic mean. Limited to numeric values. The default format is
- `F8.2`.
-
-* `MEDIAN(VAR_NAME...)`
- The median value. Limited to numeric values. The default format
- is `F8.2`.
-
-* `SD(VAR_NAME...)`
-  Standard deviation. Limited to numeric values. The
- default format is `F8.2`.
-
-* `SUM(VAR_NAME...)`
- Sum. Limited to numeric values. The default format is `F8.2`.
-
- These aggregation functions may be applied to numeric and string
-variables:
-
-* `CGT(VAR_NAME..., VALUE)`
- `CLT(VAR_NAME..., VALUE)`
- `CIN(VAR_NAME..., LOW, HIGH)`
- `COUT(VAR_NAME..., LOW, HIGH)`
- Total weight of cases greater than or less than `VALUE` or inside or
- outside the closed range `[LOW,HIGH]`, respectively. The default
- format is `F5.3`.
-
-* `FGT(VAR_NAME..., VALUE)`
- `FLT(VAR_NAME..., VALUE)`
- `FIN(VAR_NAME..., LOW, HIGH)`
- `FOUT(VAR_NAME..., LOW, HIGH)`
- Fraction of values greater than or less than `VALUE` or inside or
- outside the closed range `[LOW,HIGH]`, respectively. The default
- format is `F5.3`.
-
-* `FIRST(VAR_NAME...)`
- `LAST(VAR_NAME...)`
-  First or last non-missing value, respectively, in the break group. The
- aggregation variable receives the complete dictionary information
- from the source variable. The sort performed by `AGGREGATE` (and
- by `SORT CASES`) is stable. This means that the first (or last)
- case with particular values for the break variables before sorting
- is also the first (or last) case in that break group after sorting.
-
-* `MIN(VAR_NAME...)`
- `MAX(VAR_NAME...)`
- Minimum or maximum value, respectively. The aggregation variable
- receives the complete dictionary information from the source
- variable.
-
-* `N(VAR_NAME...)`
- `NMISS(VAR_NAME...)`
- Total weight of non-missing or missing values, respectively. The
- default format is `F7.0` if weighting is not enabled, `F8.2` if it
- is (see [`WEIGHT`](../selection/weight.md)).
-
-* `NU(VAR_NAME...)`
- `NUMISS(VAR_NAME...)`
- Count of non-missing or missing values, respectively, ignoring case
- weights. The default format is `F7.0`.
-
-* `PGT(VAR_NAME..., VALUE)`
- `PLT(VAR_NAME..., VALUE)`
- `PIN(VAR_NAME..., LOW, HIGH)`
- `POUT(VAR_NAME..., LOW, HIGH)`
- Percentage between 0 and 100 of values greater than or less than
- `VALUE` or inside or outside the closed range `[LOW,HIGH]`,
- respectively. The default format is `F5.1`.
-
-These aggregation functions do not accept source variables:
-
-* `N`
- Total weight of cases aggregated to form this group. The default
- format is `F7.0` if weighting is not enabled, `F8.2` if it is (see
- [`WEIGHT`](../selection/weight.md)).
-
-* `NU`
- Count of cases aggregated to form this group, ignoring case
- weights. The default format is `F7.0`.
-
-Aggregation functions compare string values in terms of Unicode
-character codes.
-
-The aggregation functions listed above exclude all user-missing values
-from calculations. To include user-missing values, insert a period
-(`.`) at the end of the function name, e.g. `SUM.`. (Be aware that
-specifying such a function as the last token on a line causes the
-period to be interpreted as the end of the command.)
-
-`AGGREGATE` both ignores and cancels the current [`SPLIT
-FILE`](../selection/split-file.md) settings.
-
-## Example
-
-The `personnel.sav` dataset provides the occupations and salaries of
-many individuals. For many purposes, however, such detailed
-information is less interesting than the aggregated statistics of
-each occupation. Here, the `AGGREGATE` command is used to calculate
-the mean, the median, and the standard deviation of the salary
-within each occupation.
-
-```
-GET FILE="personnel.sav".
-AGGREGATE OUTFILE=* MODE=REPLACE
- /BREAK=occupation
- /occ_mean_salary=MEAN(salary)
- /occ_median_salary=MEDIAN(salary)
- /occ_std_dev_salary=SD(salary).
-LIST.
-```
-
-Since we chose the `MODE=REPLACE` option, cases for the individual
-persons are no longer present. They have each been replaced by a
-single case per aggregated value.
-
-```
- Data List
-┌──────────────────┬───────────────┬─────────────────┬──────────────────┐
-│ occupation │occ_mean_salary│occ_median_salary│occ_std_dev_salary│
-├──────────────────┼───────────────┼─────────────────┼──────────────────┤
-│Artist │ 37836.18│ 34712.50│ 7631.48│
-│Baker │ 45075.20│ 45075.20│ 4411.21│
-│Barrister │ 39504.00│ 39504.00│ .│
-│Carpenter │ 39349.11│ 36190.04│ 7453.40│
-│Cleaner │ 41142.50│ 39647.49│ 14378.98│
-│Cook │ 40357.79│ 43194.00│ 11064.51│
-│Manager │ 46452.14│ 45657.56│ 6901.69│
-│Mathematician │ 34531.06│ 34763.06│ 5267.68│
-│Painter │ 45063.55│ 45063.55│ 15159.67│
-│Payload Specialist│ 34355.72│ 34355.72│ .│
-│Plumber │ 40413.91│ 40410.00│ 4726.05│
-│Scientist │ 36687.07│ 36803.83│ 10873.54│
-│Scrientist │ 42530.65│ 42530.65│ .│
-│Tailor │ 34586.79│ 34586.79│ 3728.98│
-└──────────────────┴───────────────┴─────────────────┴──────────────────┘
-```
-
-Some values for the standard deviation are blank because there is only
-one case with the respective occupation.
-
+++ /dev/null
-# AUTORECODE
-
-```
-AUTORECODE VARIABLES=SRC_VARS INTO DEST_VARS
- [ /DESCENDING ]
- [ /PRINT ]
- [ /GROUP ]
- [ /BLANK = {VALID, MISSING} ]
-```
-
-The `AUTORECODE` procedure considers the N values that a variable
-takes on and maps them onto values 1...N on a new numeric variable.
-
-Subcommand `VARIABLES` is the only required subcommand and must come
-first. Specify `VARIABLES`, an equals sign (`=`), a list of source
-variables, `INTO`, and a list of target variables. There must be the
-same number of source and target variables. The target variables must
-not already exist.
-
-`AUTORECODE` ordinarily assigns each increasing non-missing value of a
-source variable (for a string, this is based on character code
-comparisons) to consecutive values of its target variable. For
-example, the smallest non-missing value of the source variable is
-recoded to value 1, the next smallest to 2, and so on. If the source
-variable has user-missing values, they are recoded to consecutive
-values just above the non-missing values. For example, if a source
-variable has seven distinct non-missing values, then the smallest
-missing value would be recoded to 8, the next smallest to 9, and so
-on.
-
-Use `DESCENDING` to reverse the sort order for non-missing values, so
-that the largest non-missing value is recoded to 1, the second-largest
-to 2, and so on. Even with `DESCENDING`, user-missing values are
-still recoded in ascending order just above the non-missing values.
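-
-For instance, assuming a hypothetical numeric variable `score`, the
-following sketch assigns 1 to the largest score, 2 to the next
-largest, and so on:
-
-```
-* Hypothetical example: recode in descending order, so that
-* the largest score maps to 1.
-AUTORECODE VARIABLES=score INTO score_rank
-  /DESCENDING.
-```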
-
-The system-missing value is always recoded into the system-missing
-value in target variables.
-
-If a source value has a value label, then that value label is retained
-for the new value in the target variable. Otherwise, the source value
-itself becomes each new value's label.
-
-Variable labels are copied from the source to target variables.
-
-`PRINT` is currently ignored.
-
-The `GROUP` subcommand is relevant only if more than one variable is
-to be recoded. It causes a single mapping between source and target
-values to be used, instead of one map per variable. With `GROUP`,
-user-missing values are taken from the first source variable that has
-any user-missing values.
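-
-As a sketch, suppose two hypothetical variables `q1` and `q2` share
-the same set of answer strings.  With `GROUP`, a given answer receives
-the same code regardless of which variable it appears in:
-
-```
-* Hypothetical example: recode q1 and q2 with one shared map.
-AUTORECODE VARIABLES=q1 q2 INTO q1_num q2_num
-  /GROUP.
-```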
-
-If `/BLANK=MISSING` is given, then string variables which contain
-only whitespace are recoded as SYSMIS. If `/BLANK=VALID` is specified
-then they are allocated a value like any other. `/BLANK` is not
-relevant to numeric values. `/BLANK=VALID` is the default.
-
-`AUTORECODE` is a procedure. It causes the data to be read.
-
-## Example
-
-In the file `personnel.sav`, the variable occupation is a string
-variable. Except for data of a purely commentary nature, string
-variables are generally a bad idea. One reason is that data entry
-errors are easily overlooked. This has happened in `personnel.sav`;
-one entry which should read "Scientist" has been mistyped as
-"Scrientist". The syntax below shows how to correct this error in the
-`DO IF` clause[^1], which then uses `AUTORECODE` to create a new numeric
-variable which takes recoded values of occupation. Finally, we remove
-the old variable and rename the new variable to the name of the old
-variable:
-
-[^1]: One must use care when correcting such data input errors rather
-than simply marking them as missing. For example, if an occupation
-has been entered "Barister", did the person mean "Barrister" or
-"Barista"?
-
-```
-get file='personnel.sav'.
-
-* Correct a typing error in the original file.
-do if occupation = "Scrientist".
- compute occupation = "Scientist".
-end if.
-
-autorecode
- variables = occupation into occ
- /blank = missing.
-
-* Delete the old variable.
-delete variables occupation.
-
-* Rename the new variable to the old variable's name.
-rename variables (occ = occupation).
-
-* Inspect the new variable.
-display dictionary /variables=occupation.
-```
-
-
-Notice, in the output below, how the new variable has been
-automatically allocated value labels which correspond to the strings
-of the old variable. This means that in future analyses the
-descriptive strings are reported instead of the numeric values.
-
-```
- Variables
-+----------+--------+--------------+-----+-----+---------+----------+---------+
-| | | Measurement | | | | Print | Write |
-|Name |Position| Level | Role|Width|Alignment| Format | Format |
-+----------+--------+--------------+-----+-----+---------+----------+---------+
-|occupation| 6|Unknown |Input| 8|Right |F2.0 |F2.0 |
-+----------+--------+--------------+-----+-----+---------+----------+---------+
-
- Value Labels
-+---------------+------------------+
-|Variable Value | Label |
-+---------------+------------------+
-|occupation 1 |Artist |
-| 2 |Baker |
-| 3 |Barrister |
-| 4 |Carpenter |
-| 5 |Cleaner |
-| 6 |Cook |
-| 7 |Manager |
-| 8 |Mathematician |
-| 9 |Painter |
-| 10 |Payload Specialist|
-| 11 |Plumber |
-| 12 |Scientist |
-| 13 |Tailor |
-+---------------+------------------+
-```
+++ /dev/null
-# COMPUTE
-
-```
-COMPUTE VARIABLE = EXPRESSION.
- or
-COMPUTE vector(INDEX) = EXPRESSION.
-```
-
-`COMPUTE` assigns the value of an expression to a target variable.
-For each case, the expression is evaluated and its value assigned to
-the target variable. Numeric and string variables may be assigned.
-When a string expression's width differs from the target variable's
-width, the string result of the expression is truncated or padded with
-spaces on the right as necessary. The expression and variable types
-must match.
-
-For numeric variables only, the target variable need not already
-exist. Numeric variables created by `COMPUTE` are assigned an `F8.2`
-output format. String variables must be declared before they can be
-used as targets for `COMPUTE`.
-
-The target variable may be specified as an element of a
-[vector](../../commands/variables/vector.md). In this case, an
-expression `INDEX` must be specified in parentheses following the vector
-name. The expression `INDEX` must evaluate to a numeric value that,
-after rounding down to the nearest integer, is a valid index for the
-named vector.
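-
-As a sketch, assuming a hypothetical index variable `month` with
-values from 1 to 12 and a value variable `score`, one might write:
-
-```
-* Hypothetical example: store each case's score in the
-* element of vector v selected by month.
-VECTOR v(12).
-COMPUTE v(month) = score.
-```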
-
-Using `COMPUTE` to assign to a variable specified on
-[`LEAVE`](../../commands/variables/leave.md) resets the variable's
-left state. Therefore, `LEAVE` should be specified following
-`COMPUTE`, not before.
-
-`COMPUTE` is a transformation. It does not cause the active dataset
-to be read.
-
-When `COMPUTE` is specified following
-[`TEMPORARY`](../selection/temporary.md), the
-[`LAG`](../../language/expressions/functions/miscellaneous.md)
-function may not be used.
-
-## Example
-
-The dataset `physiology.sav` contains the height and weight of
-persons. For some purposes, neither height nor weight alone is of
-interest. Epidemiologists are often more interested in the "body mass
-index" which can sometimes be used as a predictor for clinical
-conditions. The body mass index is defined as the weight of the
-person in kilograms divided by the square of the person's height in
-metres.[^1]
-
-[^1]: Since BMI is a quantity with a ratio scale and has units, the
-term "index" is a misnomer, but that is what it is called.
-
-```
-get file='physiology.sav'.
-
-* height is in mm so we must divide by 1000 to get metres.
-compute bmi = weight / (height/1000)**2.
-variable label bmi "Body Mass Index".
-
-descriptives /weight height bmi.
-```
-
-This syntax shows how you can use `COMPUTE` to generate a new variable
-called bmi and have every case's value calculated from the existing
-values of weight and height. It also shows how you can [add a
-label](../../commands/variables/variable-labels.md) to this new
-variable, so that a more descriptive label appears in subsequent
-analyses, and this can be seen in the output from the `DESCRIPTIVES`
-command, below.
-
-The expression which follows the `=` sign can be as complicated as
-necessary. See [Expressions](../../language/expressions/index.md) for
-a full description of the language accepted.
-
-```
- Descriptive Statistics
-┌─────────────────────┬──┬───────┬───────┬───────┬───────┐
-│ │ N│ Mean │Std Dev│Minimum│Maximum│
-├─────────────────────┼──┼───────┼───────┼───────┼───────┤
-│Weight in kilograms │40│ 72.12│ 26.70│ ─55.6│ 92.1│
-│Height in millimeters│40│1677.12│ 262.87│ 179│ 1903│
-│Body Mass Index │40│ 67.46│ 274.08│ ─21.62│1756.82│
-│Valid N (listwise) │40│ │ │ │ │
-│Missing N (listwise) │ 0│ │ │ │ │
-└─────────────────────┴──┴───────┴───────┴───────┴───────┘
-```
+++ /dev/null
-# FLIP
-
-```
-FLIP /VARIABLES=VAR_LIST /NEWNAMES=VAR_NAME.
-```
-
-`FLIP` transposes rows and columns in the active dataset. It causes
-cases to be swapped with variables, and vice versa.
-
-All variables in the transposed active dataset are numeric. String
-variables take on the system-missing value in the transposed file.
-
-No subcommands are required.  If specified, the `VARIABLES`
-subcommand selects variables to be transformed into cases, and variables
-not specified are discarded. If the `VARIABLES` subcommand is omitted,
-all variables are selected for transposition.
-
-The variable specified by `NEWNAMES`, which must be a string
-variable, is used to give names to the variables created by `FLIP`.
-Only the first 8 characters of the variable are used. If `NEWNAMES`
-is not specified then the default is a variable named CASE_LBL, if it
-exists. If it does not then the variables created by `FLIP` are named
-`VAR000` through `VAR999`, then `VAR1000`, `VAR1001`, and so on.
-
-When a `NEWNAMES` variable is available, the names must be
-canonicalized before becoming variable names. Invalid characters are
-replaced by letter `V` in the first position, or by `_` in subsequent
-positions. If the name thus generated is not unique, then numeric
-extensions are added, starting with 1, until a unique name is found or
-there are no remaining possibilities. If the latter occurs then the
-`FLIP` operation aborts.
-
-The resultant dictionary contains a `CASE_LBL` variable, a string
-variable of width 8, which stores the names of the variables in the
-dictionary before the transposition.  Variable names longer than 8
-characters are truncated. If `FLIP` is called again on this dataset,
-the `CASE_LBL` variable can be passed to the `NEWNAMES` subcommand to
-recreate the original variable names.
-
-`FLIP` honors [`N OF CASES`](../selection/n.md). It ignores
-[`TEMPORARY`](../selection/temporary.md), so that "temporary"
-transformations become permanent.
-
-## Example
-
-In the syntax below, data has been entered using [`DATA
-LIST`](../../commands/data-io/data-list.md) such that the first
-variable in the dataset is a string variable containing a description
-of the other data for the case. Clearly this is not a convenient
-arrangement for performing statistical analyses, so it would have been
-better to think a little more carefully about how the data should have
-been arranged.  However, the data is often provided by a third-party
-source, and you have no control over its form.  Fortunately, we can
-use `FLIP` to exchange the variables and cases in the active dataset.
-
-```
-data list notable list /heading (a16) v1 v2 v3 v4 v5 v6
-begin data.
-date-of-birth 1970 1989 2001 1966 1976 1982
-sex 1 0 0 1 0 1
-score 10 10 9 3 8 9
-end data.
-
-echo 'Before FLIP:'.
-display variables.
-list.
-
-flip /variables = all /newnames = heading.
-
-echo 'After FLIP:'.
-display variables.
-list.
-```
-
-As you can see in the results below, before the `FLIP` command has run
-there are seven variables (six containing data and one for the
-heading) and three cases. Afterwards there are four variables (one
-per case, plus the CASE_LBL variable) and six cases. You can delete
-the CASE_LBL variable (see [DELETE
-VARIABLES](../../commands/variables/delete-variables.md)) if you don't
-need it.
-
-```
-Before FLIP:
-
- Variables
-┌───────┬────────┬────────────┬────────────┐
-│Name │Position│Print Format│Write Format│
-├───────┼────────┼────────────┼────────────┤
-│heading│ 1│A16 │A16 │
-│v1 │ 2│F8.2 │F8.2 │
-│v2 │ 3│F8.2 │F8.2 │
-│v3 │ 4│F8.2 │F8.2 │
-│v4 │ 5│F8.2 │F8.2 │
-│v5 │ 6│F8.2 │F8.2 │
-│v6 │ 7│F8.2 │F8.2 │
-└───────┴────────┴────────────┴────────────┘
-
- Data List
-┌─────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
-│ heading │ v1 │ v2 │ v3 │ v4 │ v5 │ v6 │
-├─────────────┼───────┼───────┼───────┼───────┼───────┼───────┤
-│date─of─birth│1970.00│1989.00│2001.00│1966.00│1976.00│1982.00│
-│sex │ 1.00│ .00│ .00│ 1.00│ .00│ 1.00│
-│score │ 10.00│ 10.00│ 9.00│ 3.00│ 8.00│ 9.00│
-└─────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
-
-After FLIP:
-
- Variables
-┌─────────────┬────────┬────────────┬────────────┐
-│Name │Position│Print Format│Write Format│
-├─────────────┼────────┼────────────┼────────────┤
-│CASE_LBL │ 1│A8 │A8 │
-│date_of_birth│ 2│F8.2 │F8.2 │
-│sex │ 3│F8.2 │F8.2 │
-│score │ 4│F8.2 │F8.2 │
-└─────────────┴────────┴────────────┴────────────┘
-
- Data List
-┌────────┬─────────────┬────┬─────┐
-│CASE_LBL│date_of_birth│ sex│score│
-├────────┼─────────────┼────┼─────┤
-│v1 │ 1970.00│1.00│10.00│
-│v2 │ 1989.00│ .00│10.00│
-│v3 │ 2001.00│ .00│ 9.00│
-│v4 │ 1966.00│1.00│ 3.00│
-│v5 │ 1976.00│ .00│ 8.00│
-│v6 │ 1982.00│1.00│ 9.00│
-└────────┴─────────────┴────┴─────┘
-```
+++ /dev/null
-# IF
-
-```
- IF CONDITION VARIABLE=EXPRESSION.
-or
- IF CONDITION vector(INDEX)=EXPRESSION.
-```
-
-The `IF` transformation evaluates a test expression and, if it is
-true, assigns the value of a target expression to a target variable.
-
-Specify a boolean-valued test
-[expression](../../language/expressions/index.md) to be tested following the
-`IF` keyword. The test expression is evaluated for each case:
-
-- If it is true, then the target expression is evaluated and assigned
- to the specified variable.
-
-- If it is false or missing, nothing is done.
-
-Numeric and string variables may be assigned. When a string
-expression's width differs from the target variable's width, the
-string result is truncated or padded with spaces on the right as
-necessary. The expression and variable types must match.
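-
-As a minimal sketch, assuming hypothetical numeric variables `income`
-and `tax`, the following sets `tax` to zero only for cases whose
-income falls below a threshold, leaving all other cases unchanged:
-
-```
-* Hypothetical example: zero the tax for low incomes.
-IF (income < 10000) tax = 0.
-```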
-
-The target variable may be specified as an element of a
-[vector](../../commands/variables/vector.md). In this case, a vector
-index expression must be specified in parentheses following the vector
-name. The index expression must evaluate to a numeric value that,
-after rounding down to the nearest integer, is a valid index for the
-named vector.
-
-Using `IF` to assign to a variable specified on
-[`LEAVE`](../../commands/variables/leave.md) resets the variable's
-left state. Therefore, use `LEAVE` after `IF`, not before.
-
-When `IF` follows [`TEMPORARY`](../selection/temporary.md), the
-[`LAG`](../../language/expressions/functions/miscellaneous.md) function may not
-be used.
-
+++ /dev/null
-The PSPP procedures in this chapter manipulate data and prepare the
-active dataset for later analyses. They do not produce output.
+++ /dev/null
-# RECODE
-
-The `RECODE` command is used to transform existing values into other,
-user specified values. The general form is:
-
-```
-RECODE SRC_VARS
- (SRC_VALUE SRC_VALUE ... = DEST_VALUE)
- (SRC_VALUE SRC_VALUE ... = DEST_VALUE)
- (SRC_VALUE SRC_VALUE ... = DEST_VALUE) ...
- [INTO DEST_VARS].
-```
-
-Following the `RECODE` keyword itself comes `SRC_VARS`, a list of
-variables whose values are to be transformed.  These variables must be
-all string or all numeric.
-
-After the list of source variables, there should be one or more
-"mappings". Each mapping is enclosed in parentheses, and contains the
-source values and a destination value separated by a single `=`. The
-source values are used to specify the values in the dataset which need
-to change, and the destination value specifies the new value to which
-they should be changed. Each SRC_VALUE may take one of the following
-forms:
-
-* `NUMBER` (numeric source variables only)
- Matches a number.
-
-* `STRING` (string source variables only)
- Matches a string enclosed in single or double quotes.
-
-* `NUM1 THRU NUM2` (numeric source variables only)
- Matches all values in the range between `NUM1` and `NUM2`, including
- both endpoints of the range. `NUM1` should be less than `NUM2`.
- Open-ended ranges may be specified using `LO` or `LOWEST` for `NUM1`
- or `HI` or `HIGHEST` for `NUM2`.
-
-* `MISSING`
-  Matches system-missing and user-missing values.
-
-* `SYSMIS` (numeric source variables only)
-  Matches system-missing values.
-
-* `ELSE`
- Matches any values that are not matched by any other `SRC_VALUE`.
- This should appear only as the last mapping in the command.
-
-After the source values comes an `=` and then the `DEST_VALUE`,
-which may take any of the following forms:
-
-* `NUMBER` (numeric destination variables only)
- A literal numeric value to which the source values should be
- changed.
-
-* `STRING` (string destination variables only)
- A literal string value (enclosed in quotation marks) to which the
- source values should be changed. This implies the destination
- variable must be a string variable.
-
-* `SYSMIS` (numeric destination variables only)
- The keyword `SYSMIS` changes the value to the system missing value.
- This implies the destination variable must be numeric.
-
-* `COPY`
- The special keyword `COPY` means that the source value should not be
- modified, but copied directly to the destination value. This is
- meaningful only if `INTO DEST_VARS` is specified.
-
-Mappings are considered from left to right. Therefore, if a value is
-matched by a `SRC_VALUE` from more than one mapping, the first
-(leftmost) mapping which matches is considered. Any subsequent
-matches are ignored.
-
-The clause `INTO DEST_VARS` is optional. The behaviour of the command
-is slightly different depending on whether it appears or not:
-
-* Without `INTO DEST_VARS`, values are recoded "in place".  This
-  means that the recoded values are written back to the source variables
-  from which the original values came.  In this case, the DEST_VALUE
- for every mapping must imply a value which has the same type as the
- SRC_VALUE. For example, if the source value is a string value, it is
-  not permissible for DEST_VALUE to be `SYSMIS` or another form which
- implies a numeric result. It is also not permissible for DEST_VALUE
- to be longer than the width of the source variable.
-
- The following example recodes two numeric variables `x` and `y` in
- place. 0 becomes 99, the values 1 to 10 inclusive are unchanged,
- values 1000 and higher are recoded to the system-missing value, and
- all other values are changed to 999:
-
- ```
- RECODE x y
- (0 = 99)
- (1 THRU 10 = COPY)
- (1000 THRU HIGHEST = SYSMIS)
- (ELSE = 999).
- ```
-
-* With `INTO DEST_VARS`, recoded values are written into the variables
- specified in `DEST_VARS`, which must therefore contain a list of
- valid variable names. The number of variables in `DEST_VARS` must
- be the same as the number of variables in `SRC_VARS` and the
- respective order of the variables in `DEST_VARS` corresponds to the
- order of `SRC_VARS`. That is to say, the recoded value whose
- original value came from the Nth variable in `SRC_VARS` is placed
- into the Nth variable in `DEST_VARS`. The source variables are
- unchanged. If any mapping implies a string as its destination
- value, then the respective destination variable must already exist,
- or have been declared using `STRING` or another transformation.
- Numeric variables however are automatically created if they don't
- already exist.
-
- The following example deals with two source variables, `a` and `b`
- which contain string values. Hence there are two destination
- variables `v1` and `v2`. Any cases where `a` or `b` contain the
- values `apple`, `pear` or `pomegranate` result in `v1` or `v2` being
- filled with the string `fruit` whilst cases with `tomato`, `lettuce`
- or `carrot` result in `vegetable`. Other values produce the result
- `unknown`:
-
- ```
- STRING v1 (A20).
- STRING v2 (A20).
-
- RECODE a b
- ("apple" "pear" "pomegranate" = "fruit")
- ("tomato" "lettuce" "carrot" = "vegetable")
- (ELSE = "unknown")
- INTO v1 v2.
- ```
-
-There is one special mapping, not mentioned above. If the source
-variable is a string variable then a mapping may be specified as
-`(CONVERT)`.  This mapping, if it appears, must be the last mapping
-given, and the `INTO DEST_VARS` clause must also be given and must not
-refer to a string variable.  `CONVERT` causes a number specified as a
-string to be converted to a numeric value.  For example, it converts
-the string `"3"` into the numeric value 3 (note that it does not
-convert `three` into 3). If the string cannot be parsed as a number,
-then the system-missing value is assigned instead. In the following
-example, cases where the value of `x` (a string variable) is the empty
-string are recoded to 999, and all others are converted to the numeric
-equivalent of the input value. The results are placed into the
-numeric variable `y`:
-
-```
-RECODE x ("" = 999) (CONVERT) INTO y.
-```
-
-It is possible to specify multiple recodings on a single command.
-Introduce additional recodings with a slash (`/`) to separate them from
-the previous recodings:
-
-```
-RECODE
- a (2 = 22) (ELSE = 99)
- /b (1 = 3) INTO z.
-```
-
-Here we have two recodings. The first affects the source variable `a`
-and recodes in-place the value 2 into 22 and all other values to 99.
-The second recoding copies the values of `b` into the variable `z`,
-changing any instances of 1 into 3.
-
+++ /dev/null
-# SORT CASES
-
-```
-SORT CASES BY VAR_LIST[({D|A})] [ VAR_LIST[({D|A})] ] ...
-```
-
-`SORT CASES` sorts the active dataset by the values of one or more
-variables.
-
-Specify `BY` and a list of variables to sort by. By default,
-variables are sorted in ascending order. To override sort order,
-specify `(D)` or `(DOWN)` after a list of variables to get descending
-order, or `(A)` or `(UP)` for ascending order. These apply to all the
-listed variables up until the preceding `(A)`, `(D)`, `(UP)` or
-`(DOWN)`.
-
-`SORT CASES` performs a stable sort, meaning that records with equal
-values of the sort variables have the same relative order before and
-after sorting. Thus, re-sorting an already sorted file does not
-affect the ordering of cases.
-
-`SORT CASES` is a procedure. It causes the data to be read.
-
-`SORT CASES` attempts to sort the entire active dataset in main
-memory. If workspace is exhausted, it falls back to a merge sort
-algorithm which creates numerous temporary files.
-
-`SORT CASES` may not be specified following `TEMPORARY`.
-
-## Example
-
-In the syntax below, the data from the file `physiology.sav` is sorted
-by two variables: sex in descending order and temperature in
-ascending order.
-
-```
-get file='physiology.sav'.
-sort cases by sex (D) temperature(A).
-list.
-```
-
- In the output below, you can see that all the cases with a sex of
-`1` (female) appear before those with a sex of `0` (male). This is
-because they have been sorted in descending order. Within each sex,
-the data is sorted on the temperature variable, this time in ascending
-order.
-
-```
- Data List
-┌───┬──────┬──────┬───────────┐
-│sex│height│weight│temperature│
-├───┼──────┼──────┼───────────┤
-│ 1│ 1606│ 56.1│ 34.56│
-│ 1│ 179│ 56.3│ 35.15│
-│ 1│ 1609│ 55.4│ 35.46│
-│ 1│ 1606│ 56.0│ 36.06│
-│ 1│ 1607│ 56.3│ 36.26│
-│ 1│ 1604│ 56.0│ 36.57│
-│ 1│ 1604│ 56.6│ 36.81│
-│ 1│ 1606│ 56.3│ 36.88│
-│ 1│ 1604│ 57.8│ 37.32│
-│ 1│ 1598│ 55.6│ 37.37│
-│ 1│ 1607│ 55.9│ 37.84│
-│ 1│ 1605│ 54.5│ 37.86│
-│ 1│ 1603│ 56.1│ 38.80│
-│ 1│ 1604│ 58.1│ 38.85│
-│ 1│ 1605│ 57.7│ 38.98│
-│ 1│ 1709│ 55.6│ 39.45│
-│ 1│ 1604│ -55.6│ 39.72│
-│ 1│ 1601│ 55.9│ 39.90│
-│ 0│ 1799│ 90.3│ 32.59│
-│ 0│ 1799│ 89.0│ 33.61│
-│ 0│ 1799│ 90.6│ 34.04│
-│ 0│ 1801│ 90.5│ 34.42│
-│ 0│ 1802│ 87.7│ 35.03│
-│ 0│ 1793│ 90.1│ 35.11│
-│ 0│ 1801│ 92.1│ 35.98│
-│ 0│ 1800│ 89.5│ 36.10│
-│ 0│ 1645│ 92.1│ 36.68│
-│ 0│ 1698│ 90.2│ 36.94│
-│ 0│ 1800│ 89.6│ 37.02│
-│ 0│ 1800│ 88.9│ 37.03│
-│ 0│ 1801│ 88.9│ 37.12│
-│ 0│ 1799│ 90.4│ 37.33│
-│ 0│ 1903│ 91.5│ 37.52│
-│ 0│ 1799│ 90.9│ 37.53│
-│ 0│ 1800│ 91.0│ 37.60│
-│ 0│ 1799│ 90.4│ 37.68│
-│ 0│ 1801│ 91.7│ 38.98│
-│ 0│ 1801│ 90.9│ 39.03│
-│ 0│ 1799│ 89.3│ 39.77│
-│ 0│ 1884│ 88.6│ 39.97│
-└───┴──────┴──────┴───────────┘
-```
-
-`SORT CASES` affects only the active file. It does not have any
-effect upon the `physiology.sav` file itself. For that, you would
-have to rewrite the file using the [`SAVE`](../spss-io/save.md)
-command.
--- /dev/null
+# DATAFILE ATTRIBUTE
+
+```
+DATAFILE ATTRIBUTE
+ ATTRIBUTE=NAME('VALUE') [NAME('VALUE')]...
+ ATTRIBUTE=NAME[INDEX]('VALUE') [NAME[INDEX]('VALUE')]...
+ DELETE=NAME [NAME]...
+ DELETE=NAME[INDEX] [NAME[INDEX]]...
+```
+
+ `DATAFILE ATTRIBUTE` adds, modifies, or removes user-defined
+attributes associated with the active dataset. Custom data file
+attributes are not interpreted by PSPP, but they are saved as part of
+system files and may be used by other software that reads them.
+
+ Use the `ATTRIBUTE` subcommand to add or modify a custom data file
+attribute. Specify the name of the attribute, followed by the desired
+value, in parentheses, as a quoted string. Attribute names that begin
+with `$` are reserved for PSPP's internal use, and attribute names
+that begin with `@` or `$@` are not displayed by most PSPP commands
+that display other attributes. Other attribute names are not treated
+specially.
+
+ Attributes may also be organized into arrays. To assign to an array
+element, add an integer array index enclosed in square brackets (`[` and
+`]`) between the attribute name and value. Array indexes start at 1,
+not 0. An attribute array that has a single element (number 1) is not
+distinguished from a non-array attribute.
+
+ Use the `DELETE` subcommand to delete an attribute. Specify an
+attribute name by itself to delete an entire attribute, including all
+array elements for attribute arrays. Specify an attribute name followed
+by an array index in square brackets to delete a single element of an
+attribute array. In the latter case, all the array elements numbered
+higher than the deleted element are shifted down, filling the vacated
+position.
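+
+ As a sketch, with hypothetical attribute names, the following sets
+one plain attribute and a two-element attribute array, then deletes
+the first array element (shifting the second down to position 1):
+
+```
+* Hypothetical example: add attributes, then delete one
+* element of the attribute array.
+DATAFILE ATTRIBUTE
+  ATTRIBUTE=Source('Survey 2024')
+            Revision[1]('draft') Revision[2]('final').
+DATAFILE ATTRIBUTE DELETE=Revision[1].
+```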
+
+ To associate custom attributes with particular variables, rather
+than with the entire active dataset, use [`VARIABLE
+ATTRIBUTE`](variable-attribute.md).
+
+ `DATAFILE ATTRIBUTE` takes effect immediately. It is not affected by
+conditional and looping structures such as `DO IF` or `LOOP`.
+
--- /dev/null
+# DATASET commands
+
+```
+DATASET NAME NAME [WINDOW={ASIS,FRONT}].
+DATASET ACTIVATE NAME [WINDOW={ASIS,FRONT}].
+DATASET COPY NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}].
+DATASET DECLARE NAME [WINDOW={MINIMIZED,HIDDEN,FRONT}].
+DATASET CLOSE {NAME,*,ALL}.
+DATASET DISPLAY.
+```
+
+ The `DATASET` commands simplify use of multiple datasets within a
+PSPP session. They allow datasets to be created and destroyed. At any
+given time, most PSPP commands work with a single dataset, called the
+active dataset.
+
+ The `DATASET NAME` command gives the active dataset the specified
+name or, if it already had a name, renames it.  If another dataset
+already had the given name, that dataset is deleted.
+
+ The `DATASET ACTIVATE` command selects the named dataset, which must
+already exist, as the active dataset. Before switching the active
+dataset, any pending transformations are executed, as if `EXECUTE` had
+been specified. If the active dataset is unnamed before switching, then
+it is deleted and becomes unavailable after switching.
+
+ The `DATASET COPY` command creates a new dataset with the specified
+name, whose contents are a copy of the active dataset. Any pending
+transformations are executed, as if `EXECUTE` had been specified, before
+making the copy. If a dataset with the given name already exists, it is
+replaced. If the name is the name of the active dataset, then the
+active dataset becomes unnamed.
+
+ The `DATASET DECLARE` command creates a new dataset that is
+initially "empty," that is, it has no dictionary or data. If a
+dataset with the given name already exists, this has no effect. The
+new dataset can be used with commands that support output to a
+dataset, such as [`AGGREGATE`](aggregate.md).
+
+ The `DATASET CLOSE` command deletes a dataset. If the active dataset
+is specified by name, or if `*` is specified, then the active dataset
+becomes unnamed. If a different dataset is specified by name, then it
+is deleted and becomes unavailable. Specifying `ALL` deletes all datasets
+except for the active dataset, which becomes unnamed.
+
+ The `DATASET DISPLAY` command lists all the currently defined datasets.
+
+ Many `DATASET` commands accept an optional `WINDOW` subcommand. In the
+PSPPIRE GUI, the value given for this subcommand influences how the
+dataset's window is displayed. Outside the GUI, the `WINDOW` subcommand
+has no effect. The valid values are:
+
+* `ASIS`
+ Do not change how the window is displayed. This is the default for
+ `DATASET NAME` and `DATASET ACTIVATE`.
+
+* `FRONT`
+ Raise the dataset's window to the top. Make it the default dataset
+ for running syntax.
+
+* `MINIMIZED`
+ Display the window "minimized" to an icon. Prefer other datasets
+ for running syntax. This is the default for `DATASET COPY` and
+ `DATASET DECLARE`.
+
+* `HIDDEN`
+ Hide the dataset's window. Prefer other datasets for running
+ syntax.
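+
+ As a sketch, the commands above might be combined as follows, using
+hypothetical dataset names, to experiment on a copy of the active
+dataset and then return to the original:
+
+```
+* Hypothetical example: work on a copy, then switch back.
+DATASET NAME original.
+DATASET COPY scratch.
+DATASET ACTIVATE scratch.
+* ...transformations to try out go here...
+DATASET ACTIVATE original.
+DATASET CLOSE scratch.
+```
+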
--- /dev/null
+# DEFINE…!ENDDEFINE
+
+<!-- toc -->
+
+## Overview
+
+```
+DEFINE macro_name([argument[/argument]...])
+...body...
+!ENDDEFINE.
+```
+
+Each argument takes the following form:
+```
+{!arg_name= | !POSITIONAL}
+[!DEFAULT(default)]
+[!NOEXPAND]
+{!TOKENS(count) | !CHAREND('token') | !ENCLOSE('start','end') | !CMDEND}
+```
+
+The following directives may be used within body:
+```
+!OFFEXPAND
+!ONEXPAND
+```
+
+The following functions may be used within the body:
+```
+!BLANKS(count)
+!CONCAT(arg...)
+!EVAL(arg)
+!HEAD(arg)
+!INDEX(haystack, needle)
+!LENGTH(arg)
+!NULL
+!QUOTE(arg)
+!SUBSTR(arg, start[, count])
+!TAIL(arg)
+!UNQUOTE(arg)
+!UPCASE(arg)
+```
+
+The body may also include the following constructs:
+```
+!IF (condition) !THEN true-expansion !ENDIF
+!IF (condition) !THEN true-expansion !ELSE false-expansion !ENDIF
+
+!DO !var = start !TO end [!BY step]
+ body
+!DOEND
+!DO !var !IN (expression)
+ body
+!DOEND
+
+!LET !var = expression
+```
+
+## Introduction
+
+The DEFINE command creates a "macro", which is a name for a fragment of
+PSPP syntax called the macro's "body". Following the DEFINE command,
+syntax may "call" the macro by name any number of times. Each call
+substitutes, or "expands", the macro's body in place of the call, as if
+the body had been written in its place.
+
+The following syntax defines a macro named `!vars` that expands to
+the variable names `v1 v2 v3`. The macro's name begins with `!`, which
+is optional for macro names. The `()` following the macro name are
+required:
+
+```
+DEFINE !vars()
+v1 v2 v3
+!ENDDEFINE.
+```
+
+Here are two ways that `!vars` might be called given the preceding
+definition:
+
+```
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+```
+
+With macro expansion, the above calls are equivalent to the
+following:
+
+```
+DESCRIPTIVES v1 v2 v3.
+FREQUENCIES /VARIABLES=v1 v2 v3.
+```
+
+The `!vars` macro expands to a fixed body. Macros may have more
+sophisticated contents:
+
+- Macro "[arguments](#macro-arguments)" that are substituted into the
+ body whenever they are named. The values of a macro's arguments are
+ specified each time it is called.
+
+- Macro "[functions](#macro-functions)", expanded when the macro is
+ called.
+
+- [`!IF` constructs](#macro-conditional-expansion), for conditional expansion.
+
+- Two forms of [`!DO` construct](#macro-loops), for looping over a
+ numerical range or a collection of tokens.
+
+- [`!LET` constructs](#macro-variable-assignment), for assigning to
+ macro variables.
+
+Many identifiers associated with macros begin with `!`, a character
+not normally allowed in identifiers. These identifiers are reserved
+only for use with macros, which helps keep them from being confused with
+other kinds of identifiers.
+
+The following sections provide more details on macro syntax and
+semantics.
+
+## Macro Bodies
+
+As previously shown, a macro body may contain a fragment of a PSPP
+command (such as a variable name). A macro body may also contain full
+PSPP commands. In the latter case, the macro body should also contain
+the command terminators.
+
+Most PSPP commands may occur within a macro. The `DEFINE` command
+itself is one exception, because the inner `!ENDDEFINE` ends the outer
+macro definition. For compatibility, `BEGIN DATA`...`END DATA.`
+should not be used within a macro.
+
+The body of a macro may call another macro. The following shows one
+way that could work:
+
+```
+DEFINE !commands()
+DESCRIPTIVES !vars.
+FREQUENCIES /VARIABLES=!vars.
+!ENDDEFINE.
+
+* Initially define the 'vars' macro to analyze v1...v3.
+DEFINE !vars() v1 v2 v3 !ENDDEFINE.
+!commands
+
+* Redefine 'vars' macro to analyze different variables.
+DEFINE !vars() v4 v5 !ENDDEFINE.
+!commands
+```
+
+The `!commands` macro would be easier to use if it took the variables
+to analyze as an argument rather than through another macro. The
+following section shows how to do that.
+
+## Macro Arguments
+
+This section explains how to use macro arguments. As an initial
+example, the following syntax defines a macro named `!analyze` that
+takes all the syntax up to the first command terminator as an argument:
+
+```
+DEFINE !analyze(!POSITIONAL !CMDEND)
+DESCRIPTIVES !1.
+FREQUENCIES /VARIABLES=!1.
+!ENDDEFINE.
+```
+
+When `!analyze` is called, it expands to a pair of analysis commands
+with each `!1` in the body replaced by the argument. That is, these
+calls:
+
+```
+!analyze v1 v2 v3.
+!analyze v4 v5.
+```
+
+act like the following:
+
+```
+DESCRIPTIVES v1 v2 v3.
+FREQUENCIES /VARIABLES=v1 v2 v3.
+DESCRIPTIVES v4 v5.
+FREQUENCIES /VARIABLES=v4 v5.
+```
+
+Macros may take any number of arguments, described within the
+parentheses in the DEFINE command. Arguments come in two varieties
+based on how their values are specified when the macro is called:
+
+- A "positional" argument has a required value that follows the
+ macro's name. Use the `!POSITIONAL` keyword to declare a
+ positional argument.
+
+ When a macro is called, the positional argument values appear in
+ the same order as their definitions, before any keyword argument
+ values.
+
+ References to a positional argument in a macro body are numbered:
+ `!1` is the first positional argument, `!2` the second, and so on.
+ In addition, `!*` expands to all of the positional arguments'
+ values, separated by spaces.
+
+ The following example uses a positional argument:
+
+ ```
+ DEFINE !analyze(!POSITIONAL !CMDEND)
+ DESCRIPTIVES !1.
+ FREQUENCIES /VARIABLES=!1.
+ !ENDDEFINE.
+
+ !analyze v1 v2 v3.
+ !analyze v4 v5.
+ ```
+
+- A "keyword" argument has a name. In the macro call, its value is
+  specified with the syntax `name=value`. The names allow keyword
+  argument values to appear in any order in the call.
+
+ In declaration and calls, a keyword argument's name may not begin
+ with `!`, but references to it in the macro body do start with a
+ leading `!`.
+
+ The following example uses a keyword argument that defaults to ALL
+ if the argument is not assigned a value:
+
+ ```
+ DEFINE !analyze_kw(vars=!DEFAULT(ALL) !CMDEND)
+ DESCRIPTIVES !vars.
+ FREQUENCIES /VARIABLES=!vars.
+ !ENDDEFINE.
+
+ !analyze_kw vars=v1 v2 v3. /* Analyze specified variables.
+ !analyze_kw. /* Analyze all variables.
+ ```
+
+If a macro has both positional and keyword arguments, then the
+positional arguments must come first in the DEFINE command, and their
+values also come first in macro calls. A keyword argument may be
+omitted by leaving its keyword out of the call, and a positional
+argument may be omitted by putting a command terminator where it would
+appear. (The latter case also omits any following positional
+arguments and all keyword arguments, if there are any.) When an
+argument is omitted, a default value is used: either the value
+specified in `!DEFAULT(value)`, or an empty value otherwise.
+
+Each argument declaration specifies the form of its value:
+
+* `!TOKENS(count)`
+ Exactly `count` tokens, e.g. `!TOKENS(1)` for a single token. Each
+ identifier, number, quoted string, operator, or punctuator is a
+ token (see [Tokens](../language/basics/tokens.md) for details).
+
+ The following variant of `!analyze_kw` accepts only a single
+ variable name (or `ALL`) as its argument:
+
+ ```
+ DEFINE !analyze_one_var(!POSITIONAL !TOKENS(1))
+ DESCRIPTIVES !1.
+ FREQUENCIES /VARIABLES=!1.
+ !ENDDEFINE.
+
+ !analyze_one_var v1.
+ ```
+
+* `!CHAREND('TOKEN')`
+ Any number of tokens up to `TOKEN`, which should be an operator or
+ punctuator token such as `/` or `+`. The `TOKEN` does not become
+ part of the value.
+
+  With the following variant of `!analyze_kw`, the variables must be
+  followed by `/`:
+
+  ```
+  DEFINE !analyze_slash(vars=!CHAREND('/'))
+  DESCRIPTIVES !vars.
+  FREQUENCIES /VARIABLES=!vars.
+  !ENDDEFINE.
+
+  !analyze_slash vars=v1 v2 v3/.
+  ```
+
+* `!ENCLOSE('START','END')`
+ Any number of tokens enclosed between `START` and `END`, which
+ should each be operator or punctuator tokens. For example, use
+ `!ENCLOSE('(',')')` for a value enclosed within parentheses. (Such
+ a value could never have right parentheses inside it, even paired
+ with left parentheses.) The start and end tokens are not part of
+ the value.
+
+ With the following variant of `!analyze_kw`, the variables must be
+ specified within parentheses:
+
+ ```
+ DEFINE !analyze_parens(vars=!ENCLOSE('(',')'))
+ DESCRIPTIVES !vars.
+ FREQUENCIES /VARIABLES=!vars.
+ !ENDDEFINE.
+
+ !analyze_parens vars=(v1 v2 v3).
+ ```
+
+* `!CMDEND`
+ Any number of tokens up to the end of the command. This should be
+ used only for the last positional parameter, since it consumes all
+ of the tokens in the command calling the macro.
+
+ The following variant of `!analyze_kw` takes all the variable names
+ up to the end of the command as its argument:
+
+ ```
+ DEFINE !analyze_kw(vars=!CMDEND)
+ DESCRIPTIVES !vars.
+ FREQUENCIES /VARIABLES=!vars.
+ !ENDDEFINE.
+
+ !analyze_kw vars=v1 v2 v3.
+ ```
+
+By default, when an argument's value contains a macro call, the call
+is expanded each time the argument appears in the macro's body. The
+[`!NOEXPAND` keyword](#controlling-macro-expansion) in an argument
+declaration suppresses this expansion.
+
+## Controlling Macro Expansion
+
+Multiple factors control whether macro calls are expanded in different
+situations. At the highest level, `SET MEXPAND` controls whether
+macro calls are expanded. By default, it is enabled. See [`SET
+MEXPAND`](set.md#mexpand) for details.
+
+A macro body may contain macro calls. By default, these are expanded.
+If a macro body contains `!OFFEXPAND` or `!ONEXPAND` directives, then
+`!OFFEXPAND` disables expansion of macro calls until the following
+`!ONEXPAND`.
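+
+For example, assuming `!vars` is the name of another macro, the
+following sketch defines a macro whose expansion contains the literal
+text `!vars` rather than that macro's expansion:
+
+```
+DEFINE !literal()
+!OFFEXPAND !vars !ONEXPAND
+!ENDDEFINE.
+```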
+
+A macro argument's value may contain a macro call. These macro calls
+are expanded, unless the argument was declared with the `!NOEXPAND`
+keyword.
+
+The argument to a macro function is a special context that does not
+expand macro calls. For example, if `!vars` is the name of a macro,
+then `!LENGTH(!vars)` expands to 5, as does `!LENGTH(!1)` if
+positional argument 1 has value `!vars`. To expand macros in these
+cases, use the [`!EVAL` macro function](#eval),
+e.g. `!LENGTH(!EVAL(!vars))` or `!LENGTH(!EVAL(!1))`.
+
+These rules apply to macro calls, not to uses within a macro body of
+macro functions, macro arguments, and macro variables created by `!DO`
+or `!LET`, which are always expanded.
+
+`SET MEXPAND` may appear within the body of a macro, but it will not
+affect expansion of the macro that it appears in. Use `!OFFEXPAND`
+and `!ONEXPAND` instead.
+
+## Macro Functions
+
+Macro bodies may manipulate syntax using macro functions. Macro
+functions accept tokens as arguments and expand to sequences of
+characters.
+
+The arguments to macro functions have a restricted form. They may
+only be a single token (such as an identifier or a string), a macro
+argument, or a call to a macro function. Thus, the following are
+valid macro function arguments:
+
+- `x`
+- `5.0`
+- `!1`
+- `"5 + 6"`
+- `!CONCAT(x,y)`
+
+and the following are not (because they are each multiple tokens):
+
+- `x y`
+- `5+6`
+
+Macro functions expand to sequences of characters. When these
+character strings are processed further as character strings,
+e.g. with `!LENGTH`, any character string is valid. When they are
+interpreted as PSPP syntax, e.g. when the expansion becomes part of a
+command, they need to be valid for that purpose. For example,
+`!UNQUOTE("It's")` will yield an error if the expansion `It's` becomes
+part of a PSPP command, because it contains unbalanced single quotes,
+but `!LENGTH(!UNQUOTE("It's"))` expands to 4.
+
+The following macro functions are available.
+
+* `!BLANKS(count)`
+ Expands to COUNT unquoted spaces, where COUNT is a nonnegative
+ integer. Outside quotes, any positive number of spaces are
+ equivalent; for a quoted string of spaces, use
+ `!QUOTE(!BLANKS(COUNT))`.
+
+ In the examples below, `_` stands in for a space to make the
+ results visible.
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!BLANKS(0)`|(empty)|
+ |`!BLANKS(1)`|`_`|
+ |`!BLANKS(2)`|`__`|
+  |`!QUOTE(!BLANKS(5))`|`'_____'`|
+
+* `!CONCAT(arg...)`
+ Expands to the concatenation of all of the arguments. Before
+ concatenation, each quoted string argument is unquoted, as if
+ `!UNQUOTE` were applied. This allows for "token pasting",
+ combining two (or more) tokens into a single one:
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!CONCAT(x, y)`|`xy`|
+ |`!CONCAT('x', 'y')`|`xy`|
+ |`!CONCAT(12, 34)`|`1234`|
+ |`!CONCAT(!NULL, 123)`|`123`|
+
+ `!CONCAT` is often used for constructing a series of similar
+ variable names from a prefix followed by a number and perhaps a
+ suffix. For example:
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!CONCAT(x, 0)`|`x0`|
+ |`!CONCAT(x, 0, y)`|`x0y`|
+
+ An identifier token must begin with a letter (or `#` or `@`), which
+ means that attempting to use a number as the first part of an
+ identifier will produce a pair of distinct tokens rather than a
+ single one. For example:
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!CONCAT(0, x)`|`0 x`|
+ |`!CONCAT(0, x, y)`|`0 xy`|
+
+* <a name="eval">`!EVAL(arg)`</a>
+ Expands macro calls in ARG. This is especially useful if ARG is
+ the name of a macro or a macro argument that expands to one,
+ because arguments to macro functions are not expanded by default
+ (see [Controlling Macro Expansion](#controlling-macro-expansion)).
+
+ The following examples assume that `!vars` is a macro that expands
+ to `a b c`:
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!vars`|`a b c`|
+ |`!QUOTE(!vars)`|`'!vars'`|
+ |`!EVAL(!vars)`|`a b c`|
+ |`!QUOTE(!EVAL(!vars))`|`'a b c'`|
+
+ These examples additionally assume that argument `!1` has value
+ `!vars`:
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!1`|`a b c`|
+ |`!QUOTE(!1)`|`'!vars'`|
+ |`!EVAL(!1)`|`a b c`|
+ |`!QUOTE(!EVAL(!1))`|`'a b c'`|
+
+* `!HEAD(arg)`
+ `!TAIL(arg)`
+ `!HEAD` expands to just the first token in an unquoted version of
+ ARG, and `!TAIL` to all the tokens after the first.
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!HEAD('a b c')`|`a`|
+ |`!HEAD('a')`|`a`|
+ |`!HEAD(!NULL)`|(empty)|
+ |`!HEAD('')`|(empty)|
+ |`!TAIL('a b c')`|`b c`|
+ |`!TAIL('a')`|(empty)|
+ |`!TAIL(!NULL)`|(empty)|
+ |`!TAIL('')`|(empty)|
+
+* `!INDEX(haystack, needle)`
+ Looks for NEEDLE in HAYSTACK. If it is present, expands to the
+ 1-based index of its first occurrence; if not, expands to 0.
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!INDEX(banana, an)`|`2`|
+ |`!INDEX(banana, nan)`|`3`|
+ |`!INDEX(banana, apple)`|`0`|
+ |`!INDEX("banana", nan)`|`4`|
+ |`!INDEX("banana", "nan")`|`0`|
+ |`!INDEX(!UNQUOTE("banana"), !UNQUOTE("nan"))`|`3`|
+
+* `!LENGTH(arg)`
+ Expands to a number token representing the number of characters in
+ ARG.
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!LENGTH(123)`|`3`|
+ |`!LENGTH(123.00)`|`6`|
+ |`!LENGTH( 123 )`|`3`|
+ |`!LENGTH("123")`|`5`|
+ |`!LENGTH(xyzzy)`|`5`|
+ |`!LENGTH("xyzzy")`|`7`|
+ |`!LENGTH("xy""zzy")`|`9`|
+ |`!LENGTH(!UNQUOTE("xyzzy"))`|`5`|
+ |`!LENGTH(!UNQUOTE("xy""zzy"))`|`6`|
+ |`!LENGTH(!1)`|`5` (if `!1` is `a b c`)|
+ |`!LENGTH(!1)`|`0` (if `!1` is empty)|
+ |`!LENGTH(!NULL)`|`0`|
+
+* `!NULL`
+ Expands to an empty character sequence.
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!NULL`|(empty)|
+ |`!QUOTE(!NULL)`|`''`|
+
+* `!QUOTE(arg)`
+ `!UNQUOTE(arg)`
+ The `!QUOTE` function expands to its argument surrounded by
+ apostrophes, doubling any apostrophes inside the argument to make
+ sure that it is valid PSPP syntax for a string. If the argument
+ was already a quoted string, `!QUOTE` expands to it unchanged.
+
+  Given a quoted string argument, the `!UNQUOTE` function expands to
+ the string's contents, with the quotes removed and any doubled
+ quote marks reduced to singletons. If the argument was not a
+ quoted string, `!UNQUOTE` expands to the argument unchanged.
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!QUOTE(123.0)`|`'123.0'`|
+ |`!QUOTE( 123 )`|`'123'`|
+ |`!QUOTE('a b c')`|`'a b c'`|
+ |`!QUOTE("a b c")`|`"a b c"`|
+ |`!QUOTE(!1)`|`'a ''b'' c'` (if `!1` is `a 'b' c`)|
+ |`!UNQUOTE(123.0)`|`123.0`|
+ |`!UNQUOTE( 123 )`|`123`|
+ |`!UNQUOTE('a b c')`|`a b c`|
+ |`!UNQUOTE("a b c")`|`a b c`|
+ |`!UNQUOTE(!1)`|`a 'b' c` (if `!1` is `a 'b' c`)|
+ |`!QUOTE(!UNQUOTE(123.0))`|`'123.0'`|
+ |`!QUOTE(!UNQUOTE( 123 ))`|`'123'`|
+ |`!QUOTE(!UNQUOTE('a b c'))`|`'a b c'`|
+ |`!QUOTE(!UNQUOTE("a b c"))`|`'a b c'`|
+ |`!QUOTE(!UNQUOTE(!1))`|`'a ''b'' c'` (if `!1` is `a 'b' c`)|
+
+* `!SUBSTR(arg, start[, count])`
+ Expands to a substring of ARG starting from 1-based position START.
+ If COUNT is given, it limits the number of characters in the
+ expansion; if it is omitted, then the expansion extends to the end
+ of ARG.
+
+  |Call|Expansion|
+  |:-----|:--------|
+  |`!SUBSTR(banana, 3)`|`nana`|
+  |`!SUBSTR(banana, 3, 3)`|`nan`|
+  |`!SUBSTR("banana", 1, 3)`|error (`"ba` is not a valid token)|
+  |`!SUBSTR(!UNQUOTE("banana"), 3)`|`nana`|
+  |`!SUBSTR("banana", 3, 3)`|`ana`|
+  |`!SUBSTR(banana, 3, 0)`|(empty)|
+  |`!SUBSTR(banana, 3, 10)`|`nana`|
+  |`!SUBSTR(banana, 10, 3)`|(empty)|
+
+* `!UPCASE(arg)`
+ Expands to an unquoted version of ARG with all letters converted to
+ uppercase.
+
+ |Call|Expansion|
+ |:-----|:--------|
+ |`!UPCASE(freckle)`|`FRECKLE`|
+ |`!UPCASE('freckle')`|`FRECKLE`|
+ |`!UPCASE('a b c')`|`A B C`|
+ |`!UPCASE('A B C')`|`A B C`|
+
+## Macro Expressions
+
+Macro expressions are used in conditional expansion and loops, which are
+described in the following sections. A macro expression may use the
+following operators, listed in descending order of operator precedence:
+
+* `()`
+ Parentheses override the default operator precedence.
+
+* `!EQ !NE !GT !LT !GE !LE = ~= <> > < >= <=`
+ Relational operators compare their operands and yield a Boolean
+ result, either `0` for false or `1` for true.
+
+ These operators always compare their operands as strings. This can
+ be surprising when the strings are numbers because, e.g., `1 < 1.0`
+ and `10 < 2` both evaluate to `1` (true).
+
+ Comparisons are case sensitive, so that `a = A` evaluates to `0`
+ (false).
+
+* `!NOT ~`
+ `!AND &`
+ `!OR |`
+ Logical operators interpret their operands as Boolean values, where
+ quoted or unquoted `0` is false and anything else is true, and
+ yield a Boolean result, either `0` for false or `1` for true.
+
+Macro expressions do not include any arithmetic operators.
+
+An operand in an expression may be a single token (including a macro
+argument name) or a macro function invocation. Either way, the
+expression evaluator unquotes the operand, so that `1 = '1'` is true.
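+
+As an illustration of these rules, the following sketches use
+[`!LET`](#macro-variable-assignment) to capture the results of
+evaluating two expressions. The first assigns `1` (true) to `!a`,
+because `1` precedes `1.0` in a string comparison; the second assigns
+`0` (false) to `!b`, because comparisons are case sensitive:
+
+```
+!LET !a = (1 !LT 1.0)
+!LET !b = (a !EQ A)
+```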
+
+## Macro Conditional Expansion
+
+The `!IF` construct may be used inside a macro body to allow for
+conditional expansion. It takes the following forms:
+
+```
+!IF (EXPRESSION) !THEN TRUE-EXPANSION !IFEND
+!IF (EXPRESSION) !THEN TRUE-EXPANSION !ELSE FALSE-EXPANSION !IFEND
+```
+
+When `EXPRESSION` evaluates to true, the macro processor expands
+`TRUE-EXPANSION`; otherwise, it expands `FALSE-EXPANSION`, if it is
+present. The macro processor considers quoted or unquoted `0` to be
+false, and anything else to be true.
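+
+For example, the following sketch expands to an analysis of all
+variables when its argument is the token `all`, and otherwise only
+of the named variables:
+
+```
+DEFINE !maybe_all(!POSITIONAL !CMDEND)
+!IF (!1 = all) !THEN
+DESCRIPTIVES ALL.
+!ELSE
+DESCRIPTIVES !1.
+!IFEND
+!ENDDEFINE.
+```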
+
+## Macro Loops
+
+The body of a macro may include two forms of loops: loops over numerical
+ranges and loops over tokens. Both forms expand a "loop body" multiple
+times, each time setting a named "loop variable" to a different value.
+The loop body typically expands the loop variable at least once.
+
+The [`MITERATE` setting](set.md#miterate) limits the number of
+iterations in a loop. This is a safety measure to ensure that macro
+expansion terminates. PSPP issues a warning when the `MITERATE` limit is
+exceeded.
+
+### Loops Over Ranges
+
+```
+!DO !VAR = START !TO END [!BY STEP]
+ BODY
+!DOEND
+```
+
+A loop over a numerical range has the form shown above. `START`,
+`END`, and `STEP` (if included) must be expressions with numeric
+values. The macro processor accepts both integers and real numbers.
+The macro processor expands `BODY` for each numeric value from `START`
+to `END`, inclusive.
+
+The default value for `STEP` is 1. If `STEP` is positive and `START >
+END`, or if `STEP` is negative and `START < END`, then the macro
+processor doesn't expand the body at all. `STEP` may not be zero.
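+
+For example, the following sketch uses a loop over a range to declare
+five new numeric variables `x1` through `x5`:
+
+```
+DEFINE !make_vars()
+NUMERIC !DO !i = 1 !TO 5 !CONCAT(x, !i) !DOEND .
+!ENDDEFINE.
+
+!make_vars
+```
+
+The call expands to `NUMERIC x1 x2 x3 x4 x5 .`.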
+
+### Loops Over Tokens
+
+```
+!DO !VAR !IN (EXPRESSION)
+ BODY
+!DOEND
+```
+
+A loop over tokens takes the form shown above. The macro processor
+evaluates `EXPRESSION` and expands `BODY` once per token in the
+result, substituting the token for `!VAR` each time it appears.
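+
+For example, the following sketch runs `DESCRIPTIVES` separately on
+each variable named in its argument:
+
+```
+DEFINE !analyze_each(!POSITIONAL !CMDEND)
+!DO !var !IN (!1)
+DESCRIPTIVES !var.
+!DOEND
+!ENDDEFINE.
+
+!analyze_each v1 v2 v3.
+```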
+
+## Macro Variable Assignment
+
+The `!LET` construct evaluates an expression and assigns the result to a
+macro variable. It may create a new macro variable or change the value
+of one created by a previous `!LET` or `!DO`, but it may not change the
+value of a macro argument. `!LET` has the following form:
+
+```
+!LET !VAR = EXPRESSION
+```
+
+If `EXPRESSION` is more than one token, it must be enclosed in
+parentheses.
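+
+For example, the following sketch assigns an uppercased, quoted
+version of its argument to the macro variable `!msg`, then uses it
+twice:
+
+```
+DEFINE !announce(!POSITIONAL !TOKENS(1))
+!LET !msg = !QUOTE(!UPCASE(!1))
+ECHO !msg.
+ECHO !msg.
+!ENDDEFINE.
+
+!announce hello.
+```
+
+Each `ECHO` in the expansion receives the quoted string `'HELLO'`.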
+
+## Macro Settings
+
+The [`SET`](set.md) command controls some macro behavior. This
+section describes these settings.
+
+Any `SET` command that changes these settings within a macro body only
+takes effect following the macro. This is because PSPP expands a
+macro's entire body at once, so that `SET` inside the body only
+executes afterwards.
+
+The [`MEXPAND`](set.md#mexpand) setting controls whether
+macros will be expanded at all. By default, macro expansion is on.
+To avoid expansion of macros called within a macro body, use
+[`!OFFEXPAND` and `!ONEXPAND`](#controlling-macro-expansion).
+
+When [`MPRINT`](set.md#mprint) is turned on, PSPP outputs
+an expansion of each macro called. This feature can be useful for
+debugging macro definitions. For reading the expanded version, keep
+in mind that macro expansion removes comments and standardizes white
+space.
+
+[`MNEST`](set.md#mnest) limits the depth of expansion of
+macro calls, that is, the nesting level of macro expansion. The
+default is 50. This is mainly useful to avoid infinite expansion in
+the case of a macro that calls itself.
+
+[`MITERATE`](set.md#miterate) limits the number of
+iterations in a `!DO` construct. The default is 1000.
+
+## Additional Notes
+
+### Calling Macros from Macros
+
+If the body of macro A includes a call to macro B, the call can use
+macro arguments (including `!*`) and macro variables as part of
+arguments to B. For `!TOKENS` arguments, the argument or variable name
+counts as one token regardless of the number that it expands into; for
+`!CHAREND` and `!ENCLOSE` arguments, the delimiters come only from the
+call, not the expansions; and `!CMDEND` ends at the calling command, not
+any end of command within an argument or variable.
+
+Macro functions are not supported as part of the arguments in a macro
+call. To get the same effect, use `!LET` to define a macro variable,
+then pass the macro variable to the macro.
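+
+For example, the following sketch (with hypothetical macro names)
+applies a macro function to a value and passes the result to another
+macro by way of a macro variable:
+
+```
+DEFINE !print_one(!POSITIONAL !TOKENS(1))
+ECHO !QUOTE(!1).
+!ENDDEFINE.
+
+DEFINE !outer()
+!LET !arg = !UPCASE(hello)
+!print_one !arg
+!ENDDEFINE.
+```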
+
+When macro A calls macro B, the order of their `DEFINE` commands
+doesn't matter, as long as macro B has been defined when A is called.
+
+### Command Terminators
+
+Macros and command terminators require care. Macros honor the syntax
+differences between [interactive and batch
+syntax](../language/basics/syntax-variants.md), which means that the
+interpretation of a macro can vary depending on the syntax mode in
+use. We assume here that interactive mode is in use, in which `.` at
+the end of a line is the primary way to end a command.
+
+The `DEFINE` command needs to end with `.` following the `!ENDDEFINE`.
+The macro body may contain `.` if it is intended to expand to whole
+commands, but using `.` within a macro body that expands to just
+syntax fragments (such as a list of variables) will cause syntax
+errors.
+
+Macro directives such as `!IF` and `!DO` do not end with `.`.
+
+### Expansion Contexts
+
+PSPP does not expand macros within comments, whether introduced within
+a line by `/*` or as a separate [`COMMENT` or
+`*`](comment.md) command. (SPSS does expand macros in
+`COMMENT` and `*`.)
+
+Macros do not expand within quoted strings.
+
+Macros are expanded in the [`TITLE`](title.md) and
+[`SUBTITLE`](subtitle.md) commands as long as their
+arguments are not quoted strings.
+
+### PRESERVE and RESTORE
+
+Some macro bodies might use the [`SET`](set.md) command to change
+certain settings. When this is the case, consider using the
+[`PRESERVE` and `RESTORE`](preserve.md) commands to save and then
+restore these settings.
+
--- /dev/null
+# DELETE VARIABLES
+
+`DELETE VARIABLES` deletes the specified variables from the dictionary.
+
+```
+DELETE VARIABLES VAR_LIST.
+```
+
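+For example, the following deletes two hypothetical variables `tmp1`
+and `tmp2` from the dictionary:
+
+```
+DELETE VARIABLES tmp1 tmp2.
+```
+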
+`DELETE VARIABLES` should not be used after defining transformations
+but before executing a procedure. If it is used in that position, it
+causes the data to be read. If it is used while `TEMPORARY` is in
+effect, it causes the temporary transformations to become permanent.
+
+`DELETE VARIABLES` may not be used to delete all variables from the
+dictionary; use [`NEW FILE`](../data-io/new-file.md) instead.
+
--- /dev/null
+# DESCRIPTIVES
+
+```
+DESCRIPTIVES
+ /VARIABLES=VAR_LIST
+ /MISSING={VARIABLE,LISTWISE} {INCLUDE,NOINCLUDE}
+ /FORMAT={LABELS,NOLABELS} {NOINDEX,INDEX} {LINE,SERIAL}
+ /SAVE
+ /STATISTICS={ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
+ SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
+ SESKEWNESS,SEKURTOSIS}
+ /SORT={NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
+ RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME}
+ {A,D}
+```
+
+The `DESCRIPTIVES` procedure reads the active dataset and outputs
+linear descriptive statistics requested by the user. It can also
+compute Z-scores.
+
+The `VARIABLES` subcommand, which is required, specifies the list of
+variables to be analyzed. Keyword `VARIABLES` is optional.
+
+All other subcommands are optional:
+
+The `MISSING` subcommand determines the handling of missing variables.
+If `INCLUDE` is set, then user-missing values are included in the
+calculations. If `NOINCLUDE` is set, which is the default,
+user-missing values are excluded. If `VARIABLE` is set, then missing
+values are excluded on a variable by variable basis; if `LISTWISE` is
+set, then the entire case is excluded whenever any value in that case
+has a system-missing or, if `INCLUDE` is set, user-missing value.
+
+The `FORMAT` subcommand has no effect. It is accepted for backward
+compatibility.
+
+The `SAVE` subcommand causes `DESCRIPTIVES` to calculate Z scores for
+all the specified variables. The Z scores are saved to new variables.
+Variable names are generated by trying first the original variable
+name with Z prepended and truncated to a maximum of 8 characters, then
+the names `ZSC000` through `ZSC999`, `STDZ00` through `STDZ09`,
+`ZZZZ00` through `ZZZZ09`, `ZQZQ00` through `ZQZQ09`, in that order.
+Z-score variable names may also be specified explicitly on `VARIABLES`
+in the variable list by enclosing them in parentheses after each
+variable. When Z scores are calculated, PSPP ignores
+[`TEMPORARY`](temporary.md), treating temporary transformations as
+permanent.
+
+The `STATISTICS` subcommand specifies the statistics to be displayed:
+
+* `ALL`
+ All of the statistics below.
+* `MEAN`
+ Arithmetic mean.
+* `SEMEAN`
+ Standard error of the mean.
+* `STDDEV`
+ Standard deviation.
+* `VARIANCE`
+ Variance.
+* `KURTOSIS`
+ Kurtosis and standard error of the kurtosis.
+* `SKEWNESS`
+ Skewness and standard error of the skewness.
+* `RANGE`
+ Range.
+* `MINIMUM`
+ Minimum value.
+* `MAXIMUM`
+ Maximum value.
+* `SUM`
+ Sum.
+* `DEFAULT`
+  Mean, standard deviation, minimum, maximum.
+* `SEKURTOSIS`
+ Standard error of the kurtosis.
+* `SESKEWNESS`
+ Standard error of the skewness.
+
+The `SORT` subcommand specifies how the statistics should be sorted.
+Most of the possible values should be self-explanatory. `NAME` causes
+the statistics to be sorted by name. By default, the statistics are
+listed in the order that they are specified on the `VARIABLES`
+subcommand. The `A` and `D` settings request an ascending or
+descending sort order, respectively.
+
+## Example
+
+The `physiology.sav` file contains various physiological data for a
+sample of persons. Running the `DESCRIPTIVES` command on the
+variables height and temperature with the default options allows one
+to see simple linear statistics for these two variables. In the
+example below, these variables are specified on the `VARIABLES`
+subcommand and the `SAVE` option has been used to request that Z
+scores be calculated.
+
+After the command completes, this example runs `DESCRIPTIVES` again,
+this time on the zheight and ztemperature variables, which are the two
+normalized (Z-score) variables generated by the first `DESCRIPTIVES`
+command.
+
+```
+get file='physiology.sav'.
+
+descriptives
+ /variables = height temperature
+ /save.
+
+descriptives
+ /variables = zheight ztemperature.
+```
+
+In the output below, we can see that there are 40 valid data for each
+of the variables and no missing values. The mean of the height and
+temperature is 1677.12 and 37.02 respectively. The descriptive
+statistics for temperature seem reasonable. However, there is a very
+high standard deviation for height and a suspiciously low
+minimum. This is due to a data entry error in the data.
+
+In the second Descriptive Statistics output, one can see that the mean
+and standard deviation of both Z score variables is 0 and 1
+respectively. All Z score statistics should have these properties
+since they are normalized versions of the original scores.
+
+```
+ Mapping of Variables to Z-scores
+┌────────────────────────────────────────────┬────────────┐
+│ Source │ Target │
+├────────────────────────────────────────────┼────────────┤
+│Height in millimeters │Zheight │
+│Internal body temperature in degrees Celcius│Ztemperature│
+└────────────────────────────────────────────┴────────────┘
+
+ Descriptive Statistics
+┌──────────────────────────────────────────┬──┬───────┬───────┬───────┬───────┐
+│ │ N│ Mean │Std Dev│Minimum│Maximum│
+├──────────────────────────────────────────┼──┼───────┼───────┼───────┼───────┤
+│Height in millimeters │40│1677.12│ 262.87│ 179│ 1903│
+│Internal body temperature in degrees │40│ 37.02│ 1.82│ 32.59│ 39.97│
+│Celcius │ │ │ │ │ │
+│Valid N (listwise) │40│ │ │ │ │
+│Missing N (listwise) │ 0│ │ │ │ │
+└──────────────────────────────────────────┴──┴───────┴───────┴───────┴───────┘
+
+ Descriptive Statistics
+┌─────────────────────────────────────────┬──┬─────────┬──────┬───────┬───────┐
+│ │ │ │ Std │ │ │
+│ │ N│ Mean │ Dev │Minimum│Maximum│
+├─────────────────────────────────────────┼──┼─────────┼──────┼───────┼───────┤
+│Z-score of Height in millimeters         │40│1.93E-015│  1.00│  -5.70│    .86│
+│Z-score of Internal body temperature in  │40│1.37E-015│  1.00│  -2.44│   1.62│
+│degrees Celcius                          │  │         │      │       │       │
+│Valid N (listwise) │40│ │ │ │ │
+│Missing N (listwise) │ 0│ │ │ │ │
+└─────────────────────────────────────────┴──┴─────────┴──────┴───────┴───────┘
+```
--- /dev/null
+# DISPLAY DOCUMENTS
+
+```
+DISPLAY DOCUMENTS.
+```
+
+`DISPLAY DOCUMENTS` displays the documents in the active dataset.
+Each document is preceded by a line giving the time and date that it
+was added. See also [`DOCUMENT`](document.md).
+
--- /dev/null
+# DISPLAY FILE LABEL
+
+```
+DISPLAY FILE LABEL.
+```
+
+`DISPLAY FILE LABEL` displays the file label contained in the active
+dataset, if any. See also [`FILE LABEL`](file-label.md).
+
+This command is a PSPP extension.
+
--- /dev/null
+# DISPLAY
+
+The `DISPLAY` command displays information about the variables in the
+active dataset. A variety of different forms of information can be
+requested. By default, all variables in the active dataset are
+displayed. However you can select variables of interest using the
+`/VARIABLES` subcommand.
+
+```
+DISPLAY [SORTED] NAMES [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] INDEX [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] LABELS [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] VARIABLES [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] DICTIONARY [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] SCRATCH [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] ATTRIBUTES [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] @ATTRIBUTES [[/VARIABLES=]VAR_LIST].
+DISPLAY [SORTED] VECTORS.
+```
+
+The following keywords primarily cause information about variables to
+be displayed. With these keywords, by default information is
+displayed about all variables in the active dataset, in the order that
+variables occur in the active dataset dictionary. The `SORTED`
+keyword causes output to be sorted alphabetically by variable name.
+
+* `NAMES`
+ The variables' names are displayed.
+
+* `INDEX`
+ The variables' names are displayed along with a value describing
+ their position within the active dataset dictionary.
+
+* `LABELS`
+ Variable names, positions, and variable labels are displayed.
+
+* `VARIABLES`
+ Variable names, positions, print and write formats, and missing
+ values are displayed.
+
+* `DICTIONARY`
+ Variable names, positions, print and write formats, missing values,
+ variable labels, and value labels are displayed.
+
+* `SCRATCH`
+  Variable names are displayed, for [scratch
+  variables](../language/datasets/scratch-variables.md) only.
+
+* `ATTRIBUTES`
+ Datafile and variable attributes are displayed, except attributes
+ whose names begin with `@` or `$@`.
+
+* `@ATTRIBUTES`
+  All datafile and variable attributes are displayed, even those whose
+  names begin with `@` or `$@`.
+
+With the `VECTORS` keyword, `DISPLAY` lists all the currently declared
+vectors. If the `SORTED` keyword is given, the vectors are listed in
+alphabetical order; otherwise, they are listed in textual order of
+definition within the PSPP syntax file.
+
+For related commands, see [`DISPLAY DOCUMENTS`](display-documents.md)
+and [`DISPLAY FILE LABEL`](display-file-label.md).
+
--- /dev/null
+# DO IF…END IF
+
+```
+DO IF condition.
+ ...
+[ELSE IF condition.
+ ...
+]...
+[ELSE.
+ ...]
+END IF.
+```
+
+`DO IF` allows one of several sets of transformations to be executed,
+depending on user-specified conditions.
+
+If the specified boolean expression evaluates as true, then the block
+of code following `DO IF` is executed. If it evaluates as missing,
+then none of the code blocks is executed. If it is false, then the
+boolean expression on the first `ELSE IF`, if present, is tested in
+turn, with the same rules applied. If all expressions evaluate to
+false, then the `ELSE` code block is executed, if it is present.
+
+When `DO IF` or `ELSE IF` is specified following
+[`TEMPORARY`](temporary.md), the
+[`LAG`](../language/expressions/functions/miscellaneous.md) function
+may not be used.
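+
+For example, assuming a numeric variable `score`, the following
+sketch assigns one of three categories to a new variable `grade`:
+
+```
+DO IF score >= 80.
+COMPUTE grade = 1.
+ELSE IF score >= 50.
+COMPUTE grade = 2.
+ELSE.
+COMPUTE grade = 3.
+END IF.
+```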
+
--- /dev/null
+# DO REPEAT…END REPEAT
+
+```
+DO REPEAT dummy_name=expansion....
+ ...
+END REPEAT [PRINT].
+
+expansion takes one of the following forms:
+ var_list
+ num_or_range...
+ 'string'...
+ ALL
+
+num_or_range takes one of the following forms:
+ number
+ num1 TO num2
+```
+
+`DO REPEAT` repeats a block of code, textually substituting different
+variables, numbers, or strings into the block with each repetition.
+
+Specify a dummy variable name followed by an equals sign (`=`) and
+the list of replacements. Replacements can be a list of existing or new
+variables, numbers, strings, or `ALL` to specify all existing variables.
+When numbers are specified, runs of increasing integers may be indicated
+as `NUM1 TO NUM2`, so that `1 TO 5` is short for `1 2 3 4 5`.
+
+Multiple dummy variables can be specified. Each variable must have
+the same number of replacements.
+
+The code within `DO REPEAT` is repeated as many times as there are
+replacements for each variable. The first time, the first value for
+each dummy variable is substituted; the second time, the second value
+for each dummy variable is substituted; and so on.
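+
+As a sketch (with illustrative variable names), the following creates
+five new variables `v1` through `v5` and sets each to a different value:
+
+```
+DO REPEAT v=v1 TO v5 / x=1 TO 5.
+COMPUTE v=x.
+END REPEAT.
+```
+
+The block is expanded five times, as if `COMPUTE v1=1.` through
+`COMPUTE v5=5.` had been written out by hand.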
+
+Dummy variable substitutions work like macros. They take place
+anywhere in a line that the dummy variable name occurs. This includes
+command and subcommand names, so command and subcommand names that
+appear in the code block should not be used as dummy variable
+identifiers. Dummy variable substitutions do not occur inside quoted
+strings, comments, unquoted strings (such as the text on the `TITLE`
+or `DOCUMENT` command), or inside `BEGIN DATA`...`END DATA`.
+
+Substitution occurs only on whole words, so that, for example, a dummy
+variable `PRINT` would not be substituted into the word `PRINTOUT`.
+
+New variable names used as replacements are not automatically created
+as variables; they are created only if used in the code block in a
+context that would create them, e.g. on a `NUMERIC` or `STRING`
+command or on the left side of a `COMPUTE` assignment.
+
+Any command may appear within `DO REPEAT`, including nested `DO
+REPEAT` commands. If `INCLUDE` or `INSERT` appears within `DO
+REPEAT`, the substitutions do not apply to the included file.
+
+If `PRINT` is specified on `END REPEAT`, the commands are printed to
+the listing file after substitutions are made, each prefixed by a plus
+sign (`+`). This feature is not yet implemented.
+
--- /dev/null
+# DOCUMENT
+
+```
+DOCUMENT DOCUMENTARY_TEXT.
+```
+
+`DOCUMENT` adds one or more lines of descriptive commentary to the
+active dataset. Documents added in this way are saved to system
+files. They can be viewed using `SYSFILE INFO` or [`DISPLAY
+DOCUMENTS`](display-documents.md). They can be removed from the
+active dataset with [`DROP DOCUMENTS`](drop-documents.md).
+
+Specify the text of the document following the `DOCUMENT` keyword. It
+is interpreted literally—any quotes or other punctuation marks are
+included in the file. You can extend the documentary text over as
+many lines as necessary, including blank lines to separate paragraphs.
+Lines are truncated at 80 bytes. Don't forget to terminate the
+command with a dot at the end of a line. See also [ADD
+DOCUMENT](add-document.md).
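+
+For example, this sketch (the text is illustrative) adds a two-line
+document; note that only the final line ends with a dot, which
+terminates the command:
+
+```
+DOCUMENT This dataset was collected in the spring survey
+(see the codebook for coding details).
+```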
+
--- /dev/null
+# DROP DOCUMENTS
+
+```
+DROP DOCUMENTS.
+```
+
+`DROP DOCUMENTS` removes all documents from the active dataset. New
+documents can be added with [`DOCUMENT`](document.md).
+
+`DROP DOCUMENTS` changes only the active dataset. It does not modify
+any system files stored on disk.
+
--- /dev/null
+# ECHO
+
+```
+ECHO 'arbitrary text' .
+```
+
+Use `ECHO` to write arbitrary text to the output stream. The text
+should be enclosed in quotation marks following the normal rules for
+[string tokens](../language/basics/tokens.md#strings).
+
--- /dev/null
+# END CASE
+
+```
+END CASE.
+```
+
+`END CASE` is used only within [`INPUT PROGRAM`](input-program.md) to
+output the current case.
+
--- /dev/null
+# END FILE
+
+```
+END FILE.
+```
+
+`END FILE` is used only within [`INPUT PROGRAM`](input-program.md) to
+terminate the current input program.
+
--- /dev/null
+# ERASE
+
+```
+ERASE FILE "FILE_NAME".
+```
+
+`ERASE FILE` deletes a file from the local file system. The file's
+name must be quoted. This command cannot be used if the
+[`SAFER`](set.md#safer) setting is active.
+
--- /dev/null
+# EXAMINE
+
+```
+EXAMINE
+ VARIABLES= VAR1 [VAR2] ... [VARN]
+ [BY FACTOR1 [BY SUBFACTOR1]
+ [ FACTOR2 [BY SUBFACTOR2]]
+ ...
+ [ FACTOR3 [BY SUBFACTOR3]]
+ ]
+ /STATISTICS={DESCRIPTIVES, EXTREME[(N)], ALL, NONE}
+ /PLOT={BOXPLOT, NPPLOT, HISTOGRAM, SPREADLEVEL[(T)], ALL, NONE}
+ /CINTERVAL P
+ /COMPARE={GROUPS,VARIABLES}
+ /ID=IDENTITY_VARIABLE
+ /{TOTAL,NOTOTAL}
+ /PERCENTILE=[PERCENTILES]={HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL }
+ /MISSING={LISTWISE, PAIRWISE} [{EXCLUDE, INCLUDE}]
+ [{NOREPORT,REPORT}]
+```
+
+`EXAMINE` is used to perform exploratory data analysis. In
+particular, it is useful for testing how closely a distribution
+follows a normal distribution, and for finding outliers and extreme
+values.
+
+The `VARIABLES` subcommand is mandatory. It specifies the dependent
+variables and optionally variables to use as factors for the analysis.
+Variables listed before the first `BY` keyword (if any) are the
+dependent variables. The dependent variables may optionally be followed
+by a list of factors which tell PSPP how to break down the analysis for
+each dependent variable.
+
+Following the dependent variables, factors may be specified. The
+factors (if desired) should be preceded by a single `BY` keyword. The
+format for each factor is `FACTORVAR [BY SUBFACTORVAR]`. Each unique
+combination of the values of `FACTORVAR` and `SUBFACTORVAR` divides the
+dataset into "cells". Statistics are calculated for each cell and for
+the entire dataset (unless `NOTOTAL` is given).
+
+The `STATISTICS` subcommand specifies which statistics to show.
+`DESCRIPTIVES` produces a table showing some parametric and
+non-parametric statistics. `EXTREME` produces a table showing the
+extremities of each cell. A number in parentheses determines how many
+upper and lower extremities to show. The default number is 5.
+
+The subcommands `TOTAL` and `NOTOTAL` are mutually exclusive. If
+`TOTAL` appears, then statistics for the entire dataset as well as for
+each cell are produced. If `NOTOTAL` appears, then statistics are
+produced only for the cells (unless no factor variables have been
+given). These subcommands have no effect if there have been no factor
+variables specified.
+
+The `PLOT` subcommand specifies which plots, if any, are to be
+produced. Available plots are `HISTOGRAM`, `NPPLOT`, `BOXPLOT` and
+`SPREADLEVEL`. The first three can be used to visualise how closely
+each cell conforms to a normal distribution, whilst the spread vs. level
+plot can be useful to visualise how the variance differs between
+factors. Boxplots show you the outliers and extreme values.[^1]
+
+[^1]: `HISTOGRAM` uses Sturges' rule to determine the number of bins,
+as approximately \\(1 + \log_2(n)\\), where \\(n\\) is the number of
+samples. ([`FREQUENCIES`](frequencies.md) uses a different algorithm
+to find the bin size.)
+
+The `SPREADLEVEL` plot displays the interquartile range versus the
+median. It takes an optional parameter `T`, which specifies how the
+data should be transformed prior to plotting. The given value `T` is
+a power to which the data are raised. For example, if `T` is given as
+2, then the square of the data is used. Zero, however, is a special
+value: if `T` is 0 or is omitted, the data are transformed by taking
+their natural logarithm instead of being raised to the power of `T`.
+
+When one or more plots are requested, `EXAMINE` also performs the
+Shapiro-Wilk test for each category. There are however a number of
+provisos:
+- All weight values must be integer.
+- The cumulative weight value must be in the range \[3, 5000\].
+
+The `COMPARE` subcommand is only relevant if producing boxplots, and
+it is only useful if there is more than one dependent variable and at least
+one factor. If `/COMPARE=GROUPS` is specified, then one plot per
+dependent variable is produced, each of which contains boxplots for all
+the cells. If `/COMPARE=VARIABLES` is specified, then one plot per cell
+is produced, each containing one boxplot per dependent variable. If the
+`/COMPARE` subcommand is omitted, then PSPP behaves as if
+`/COMPARE=GROUPS` were given.
+
+The `ID` subcommand is relevant only if `/PLOT=BOXPLOT` or
+`/STATISTICS=EXTREME` has been given. If given, it should provide the
+name of a variable which is to be used to label extreme values and
+outliers. Numeric or string variables are permissible. If the `ID`
+subcommand is not given, then the case number is used for labelling.
+
+The `CINTERVAL` subcommand specifies the confidence interval to use
+in the calculation of the descriptive statistics. The default is 95%.
+
+The `PERCENTILES` subcommand specifies which percentiles are to be
+calculated, and which algorithm to use for calculating them. The
+default is to calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using
+the `HAVERAGE` algorithm.
+
+The `TOTAL` and `NOTOTAL` subcommands are mutually exclusive. If
+`TOTAL` is given and factors have been specified in the `VARIABLES`
+subcommand, then statistics for the unfactored dependent variables are
+produced in addition to the factored variables. If there are no factors
+specified then `TOTAL` and `NOTOTAL` have no effect.
+
+The following example generates descriptive statistics and histograms
+for two variables `score1` and `score2`. Two factors are given: `gender`
+and `gender BY culture`. Therefore, the descriptives and histograms are
+generated for each distinct value of `gender` _and_ for each distinct
+combination of the values of `gender` and `culture`. Since the `NOTOTAL`
+keyword is given, statistics and histograms for `score1` and `score2`
+covering the whole dataset are not produced.
+
+```
+EXAMINE score1 score2 BY
+ gender
+ gender BY culture
+ /STATISTICS = DESCRIPTIVES
+ /PLOT = HISTOGRAM
+ /NOTOTAL.
+```
+
+Here is a second example showing how `EXAMINE` may be used to find
+extremities.
+
+```
+EXAMINE height weight BY
+ gender
+ /STATISTICS = EXTREME (3)
+ /PLOT = BOXPLOT
+ /COMPARE = GROUPS
+ /ID = name.
+```
+
+In this example, we look at the height and weight of a sample of
+individuals and how they differ between male and female. A table
+showing the 3 largest and the 3 smallest values of height and weight for
+each gender, and for the whole dataset, is shown. In addition, the
+`/PLOT` subcommand requests boxplots. Because `/COMPARE = GROUPS` was
+specified, boxplots for male and female are shown juxtaposed in the
+same graphic, allowing us to easily see the difference between the
+genders. Since the variable `name` was specified on the `ID` subcommand,
+values of the `name` variable are used to label the extreme values.
+
+> ⚠️ If you specify many dependent variables or factor variables for
+which there are many distinct values, then `EXAMINE` will produce a
+very large quantity of output.
--- /dev/null
+# EXECUTE
+
+```
+EXECUTE.
+```
+
+`EXECUTE` causes the active dataset to be read and all pending
+transformations to be executed.
+
--- /dev/null
+# EXPORT
+
+```
+EXPORT
+ /OUTFILE='FILE_NAME'
+ /UNSELECTED={RETAIN,DELETE}
+ /DIGITS=N
+ /DROP=VAR_LIST
+ /KEEP=VAR_LIST
+ /RENAME=(SRC_NAMES=TARGET_NAMES)...
+ /TYPE={COMM,TAPE}
+ /MAP
+```
+
+ The `EXPORT` procedure writes the active dataset's dictionary and
+data to a specified portable file.
+
+ `UNSELECTED` controls whether cases excluded with
+[`FILTER`](filter.md) are written to the file. These can
+be excluded by specifying `DELETE` on the `UNSELECTED` subcommand.
+The default is `RETAIN`.
+
+ Portable files express real numbers in base 30. Integers are
+always expressed to the maximum precision needed to make them exact.
+Non-integers are, by default, expressed to the machine's maximum
+natural precision (approximately 15 decimal digits on many machines).
+If many numbers require this many digits, the portable file may
+significantly increase in size. As an alternative, the `DIGITS`
+subcommand may be used to specify the number of decimal digits of
+precision to write. `DIGITS` applies only to non-integers.
+
+ The `OUTFILE` subcommand, which is the only required subcommand,
+specifies the portable file to be written as a file name string or a
+[file handle](../language/files/file-handles.md).
+
+`DROP`, `KEEP`, and `RENAME` have the same syntax and meaning as for
+the [`SAVE`](save.md) command.
+
+ The `TYPE` subcommand specifies the character set for use in the
+portable file. Its value is currently not used.
+
+ The `MAP` subcommand is currently ignored.
+
+ `EXPORT` is a procedure. It causes the active dataset to be read.
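+
+For example, this sketch (the file name is illustrative) writes the
+active dataset to a portable file, limiting non-integers to 6 decimal
+digits of precision:
+
+```
+EXPORT /OUTFILE='survey.por' /DIGITS=6.
+```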
+
--- /dev/null
+# FACTOR
+
+```
+FACTOR {
+ VARIABLES=VAR_LIST,
+ MATRIX IN ({CORR,COV}={*,FILE_SPEC})
+ }
+
+ [ /METHOD = {CORRELATION, COVARIANCE} ]
+
+ [ /ANALYSIS=VAR_LIST ]
+
+ [ /EXTRACTION={PC, PAF}]
+
+ [ /ROTATION={VARIMAX, EQUAMAX, QUARTIMAX, PROMAX[(K)], NOROTATE}]
+
+ [ /PRINT=[INITIAL] [EXTRACTION] [ROTATION] [UNIVARIATE] [CORRELATION] [COVARIANCE] [DET] [KMO] [AIC] [SIG] [ALL] [DEFAULT] ]
+
+ [ /PLOT=[EIGEN] ]
+
+ [ /FORMAT=[SORT] [BLANK(N)] [DEFAULT] ]
+
+ [ /CRITERIA=[FACTORS(N)] [MINEIGEN(L)] [ITERATE(M)] [ECONVERGE (DELTA)] [DEFAULT] ]
+
+ [ /MISSING=[{LISTWISE, PAIRWISE}] [{INCLUDE, EXCLUDE}] ]
+```
+
+The `FACTOR` command performs Factor Analysis or Principal Axis
+Factoring on a dataset. It may be used to find common factors in the
+data or for data reduction purposes.
+
+The `VARIABLES` subcommand is required (unless the `MATRIX IN`
+subcommand is used). It lists the variables which are to partake in the
+analysis. (The `ANALYSIS` subcommand may optionally further limit the
+variables that participate; it is useful primarily in conjunction with
+`MATRIX IN`.)
+
+If `MATRIX IN` instead of `VARIABLES` is specified, then the analysis
+is performed on a pre-prepared correlation or covariance matrix file
+instead of on individual data cases. Typically the [matrix
+file](matrices.md#matrix-files) will have been generated by [`MATRIX
+DATA`](matrix-data.md) or provided by a third party. If specified,
+`MATRIX IN` must be followed by `COV` or `CORR`, then by `=` and
+`FILE_SPEC` all in parentheses. `FILE_SPEC` may either be an
+asterisk, which indicates the currently loaded dataset, or it may be a
+file name to be loaded. See [`MATRIX DATA`](matrix-data.md), for the
+expected format of the file.
+
+The `/EXTRACTION` subcommand is used to specify the way in which
+factors (components) are extracted from the data. If `PC` is specified,
+then Principal Components Analysis is used. If `PAF` is specified, then
+Principal Axis Factoring is used. By default Principal Components
+Analysis is used.
+
+The `/ROTATION` subcommand is used to specify the method by which the
+extracted solution is rotated. Three orthogonal rotation methods are
+available: `VARIMAX` (which is the default), `EQUAMAX`, and `QUARTIMAX`.
+There is one oblique rotation method, viz: `PROMAX`. Optionally you may
+enter the power of the promax rotation K, which must be enclosed in
+parentheses. The default value of K is 5. If you don't want any
+rotation to be performed, the word `NOROTATE` prevents the command from
+performing any rotation on the data.
+
+The `/METHOD` subcommand should be used to determine whether the
+covariance matrix or the correlation matrix of the data is to be
+analysed. By default, the correlation matrix is analysed.
+
+The `/PRINT` subcommand may be used to select which features of the
+analysis are reported:
+
+- `UNIVARIATE` A table of mean values, standard deviations and total
+  weights is printed.
+- `INITIAL` Initial communalities and eigenvalues are printed.
+- `EXTRACTION` Extracted communalities and eigenvalues are printed.
+- `ROTATION` Rotated communalities and eigenvalues are printed.
+- `CORRELATION` The correlation matrix is printed.
+- `COVARIANCE` The covariance matrix is printed.
+- `DET` The determinant of the correlation or covariance matrix is
+ printed.
+- `AIC` The anti-image covariance and anti-image correlation matrices
+ are printed.
+- `KMO` The Kaiser-Meyer-Olkin measure of sampling adequacy and the
+  Bartlett test of sphericity are printed.
+- `SIG` The significance of the elements of correlation matrix is
+ printed.
+- `ALL` All of the above are printed.
+- `DEFAULT` Identical to `INITIAL` and `EXTRACTION`.
+
+If `/PLOT=EIGEN` is given, then a "Scree" plot of the eigenvalues is
+printed. This can be useful for visualizing the factors and deciding
+which factors (components) should be retained.
+
+The `/FORMAT` subcommand determines how data are to be displayed in
+loading matrices. If `SORT` is specified, then the variables are sorted
+in descending order of significance. If `BLANK(N)` is specified, then
+coefficients whose absolute value is less than N are not printed. If
+the keyword `DEFAULT` is specified, or if no `/FORMAT` subcommand is
+specified, then no sorting is performed, and all coefficients are
+printed.
+
+You can use the `/CRITERIA` subcommand to specify how the number of
+extracted factors (components) are chosen. If `FACTORS(N)` is
+specified, where N is an integer, then N factors are extracted.
+Otherwise, the `MINEIGEN` setting is used. `MINEIGEN(L)` requests that
+all factors whose eigenvalues are greater than or equal to L are
+extracted. The default value of L is 1. The `ECONVERGE` setting has
+effect only when using iterative algorithms for factor extraction (such
+as Principal Axis Factoring). `ECONVERGE(DELTA)` specifies that
+iteration should cease when the maximum absolute value of the
+communality estimate between one iteration and the previous is less than
+DELTA. The default value of DELTA is 0.001.
+
+The `ITERATE(M)` setting may appear any number of times and is used for two
+different purposes. It is used to set the maximum number of iterations
+(M) for convergence and also to set the maximum number of iterations for
+rotation. Whether it affects convergence or rotation depends upon which
+subcommand follows the `ITERATE` subcommand. If `EXTRACTION` follows,
+it affects convergence. If `ROTATION` follows, it affects rotation. If
+neither `ROTATION` nor `EXTRACTION` follows an `ITERATE` subcommand, then
+the entire subcommand is ignored. The default value of M is 25.
+
+The `MISSING` subcommand determines the handling of missing
+variables. If `INCLUDE` is set, then user-missing values are included
+in the calculations, but system-missing values are not. If `EXCLUDE` is
+set, which is the default, user-missing values are excluded as well as
+system-missing values. If `LISTWISE` is set, then
+the entire case is excluded from analysis whenever any variable
+specified in the `VARIABLES` subcommand contains a missing value.
+
+If `PAIRWISE` is set, then a case is considered missing only if
+either of the values for the particular coefficient are missing. The
+default is `LISTWISE`.
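+
+For example, this sketch (variable names are illustrative) extracts
+principal components with eigenvalues of at least 1 and applies a
+varimax rotation:
+
+```
+FACTOR /VARIABLES=q1 q2 q3 q4 q5
+  /EXTRACTION=PC
+  /CRITERIA=MINEIGEN(1)
+  /ROTATION=VARIMAX
+  /PRINT=INITIAL EXTRACTION ROTATION.
+```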
+
--- /dev/null
+# FILE HANDLE
+
+## Syntax Overview
+
+For text files:
+
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ [/MODE=CHARACTER]
+ [/ENDS={CR,CRLF}]
+ /TABWIDTH=TAB_WIDTH
+ [ENCODING='ENCODING']
+```
+
+For binary files in native encoding with fixed-length records:
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ /MODE=IMAGE
+ [/LRECL=REC_LEN]
+ [ENCODING='ENCODING']
+```
+
+For binary files in native encoding with variable-length records:
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ /MODE=BINARY
+ [/LRECL=REC_LEN]
+ [ENCODING='ENCODING']
+```
+
+For binary files encoded in EBCDIC:
+```
+FILE HANDLE HANDLE_NAME
+ /NAME='FILE_NAME'
+ /MODE=360
+ /RECFORM={FIXED,VARIABLE,SPANNED}
+ [/LRECL=REC_LEN]
+ [ENCODING='ENCODING']
+```
+
+## Details
+
+ Use `FILE HANDLE` to associate a file handle name with a file and its
+attributes, so that later commands can refer to the file by its handle
+name. Names of text files can be specified directly on commands that
+access files, so that `FILE HANDLE` is only needed when a file is not an
+ordinary file containing lines of text. However, `FILE HANDLE` may be
+used even for text files, and it may be easier to specify a file's name
+once and later refer to it by an abstract handle.
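+
+For example, this sketch (the handle and file names are illustrative)
+declares a handle for a text file and then reads fixed-format data
+through it:
+
+```
+FILE HANDLE survey /NAME='survey.dat' /MODE=CHARACTER.
+DATA LIST FILE=survey /id 1-4 score 6-8.
+```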
+
+Specify the file handle name as the identifier immediately following
+the `FILE HANDLE` command name. The identifier `INLINE` is reserved
+for representing data embedded in the syntax file (see [BEGIN
+DATA](begin-data.md)). The file handle name must not already have been
+used in a previous invocation of `FILE HANDLE`, unless it has been
+closed with [`CLOSE FILE HANDLE`](close-file-handle.md).
+
+The effect and syntax of `FILE HANDLE` depends on the selected `MODE`:
+
+ - In `CHARACTER` mode, the default, the data file is read as a text
+ file. Each text line is read as one record.
+
+ In `CHARACTER` mode only, tabs are expanded to spaces by input
+ programs, except by `DATA LIST FREE` with explicitly specified
+ delimiters. Each tab is 4 characters wide by default, but `TABWIDTH`
+ (a PSPP extension) may be used to specify an alternate width. Use
+ a `TABWIDTH` of 0 to suppress tab expansion.
+
+ A file written in `CHARACTER` mode by default uses the line ends of
+ the system on which PSPP is running, that is, on Windows, the
+ default is CR LF line ends, and on other systems the default is LF
+ only. Specify `ENDS` as `CR` or `CRLF` to override the default. PSPP
+ reads files using either convention on any kind of system,
+ regardless of `ENDS`.
+
+ - In `IMAGE` mode, the data file is treated as a series of fixed-length
+ binary records. `LRECL` should be used to specify the record length
+ in bytes, with a default of 1024. On input, it is an error if an
+ `IMAGE` file's length is not an integer multiple of the record length.
+ On output, each record is padded with spaces or truncated, if
+ necessary, to make it exactly the correct length.
+
+ - In `BINARY` mode, the data file is treated as a series of
+ variable-length binary records. `LRECL` may be specified, but
+ its value is ignored. The data for each record is both preceded
+ and followed by a 32-bit signed integer in little-endian byte
+ order that specifies the length of the record. (This redundancy
+ permits records in these files to be efficiently read in reverse
+ order, although PSPP always reads them in forward order.) The
+ length does not include either integer.
+
+ - Mode `360` reads and writes files in formats first used for tapes
+ in the 1960s on IBM mainframe operating systems and still
+ supported today by the modern successors of those operating
+ systems. For more information, see `OS/400 Tape and Diskette
+ Device Programming`, available on IBM's website.
+
+ Alphanumeric data in mode `360` files are encoded in EBCDIC. PSPP
+ translates EBCDIC to or from the host's native format as necessary
+ on input or output, using an ASCII/EBCDIC translation that is
+ one-to-one, so that a "round trip" from ASCII to EBCDIC back to
+ ASCII, or vice versa, always yields exactly the original data.
+
+ The `RECFORM` subcommand is required in mode `360`. The precise
+ file format depends on its setting:
+
+ * `F`
+ `FIXED`
+ This record format is equivalent to `IMAGE` mode, except for
+ EBCDIC translation.
+
+ IBM documentation calls this `*F` (fixed-length, deblocked)
+ format.
+
+ * `V`
+ `VARIABLE`
+ The file comprises a sequence of zero or more variable-length
+ blocks. Each block begins with a 4-byte "block descriptor
+ word" (BDW). The first two bytes of the BDW are an unsigned
+ integer in big-endian byte order that specifies the length of
+ the block, including the BDW itself. The other two bytes of
+ the BDW are ignored on input and written as zeros on output.
+
+ Following the BDW, the remainder of each block is a sequence
+ of one or more variable-length records, each of which in turn
+ begins with a 4-byte "record descriptor word" (RDW) that has
+ the same format as the BDW. Following the RDW, the remainder
+ of each record is the record data.
+
+ The maximum length of a record in `VARIABLE` mode is 65,527
+ bytes: 65,535 bytes (the maximum value of a 16-bit unsigned
+ integer), minus 4 bytes for the BDW, minus 4 bytes for the
+ RDW.
+
+ In mode `VARIABLE`, `LRECL` specifies a maximum, not a fixed,
+ record length, in bytes. The default is 8,192.
+
+ IBM documentation calls this `*VB` (variable-length, blocked,
+ unspanned) format.
+
+ * `VS`
+ `SPANNED`
+ This format is like `VARIABLE`, except that logical records may
+ be split among multiple physical records (called "segments") or
+ blocks. In `SPANNED` mode, the third byte of each RDW is
+ called the segment control character (SCC). Odd SCC values
+ cause the segment to be appended to a record buffer maintained
+ in memory; even values also append the segment and then flush
+ its contents to the input procedure. Canonically, SCC value 0
+ designates a record not spanned among multiple segments, and
+ values 1 through 3 designate the first segment, the last
+ segment, or an intermediate segment, respectively, within a
+ multi-segment record. The record buffer is also flushed at end
+ of file regardless of the final record's SCC.
+
+ The maximum length of a logical record in `SPANNED` mode is
+ limited only by memory available to PSPP. Segments are
+ limited to 65,527 bytes, as in `VARIABLE` mode.
+
+ This format is similar to what IBM documentation calls `*VS`
+ (variable-length, deblocked, spanned) format.
+
+ In mode `360`, fields of type `A` that extend beyond the end of a
+ record read from disk are padded with spaces in the host's native
+ character set, which are then translated from EBCDIC to the
+ native character set. Thus, when the host's native character set
+ is based on ASCII, these fields are effectively padded with
+ character `X'80'`. This wart is implemented for compatibility.
+
+ The `NAME` subcommand specifies the name of the file associated with
+the handle. It is required in all modes but `SCRATCH` mode, in which its
+use is forbidden.
+
+ The `ENCODING` subcommand specifies the encoding of text in the
+file. For reading text files in `CHARACTER` mode, all of the forms
+described for `ENCODING` on the [`INSERT`](insert.md) command are
+supported. For reading in other file-based modes, encoding
+autodetection is not supported; if the specified encoding requests
+autodetection then the default encoding is used. This is also true
+when a file handle is used for writing a file in any mode.
+
--- /dev/null
+# FILE LABEL
+
+```
+FILE LABEL file label.
+```
+
+`FILE LABEL` provides a title for the active dataset. This title is
+saved into system files and portable files that are created during
+this PSPP run.
+
+The file label should not be quoted. If quotes are included, they
+become part of the file label.
+
--- /dev/null
+# FILTER
+
+```
+FILTER BY VAR_NAME.
+FILTER OFF.
+```
+
+`FILTER` allows a boolean-valued variable to be used to select cases
+from the data stream for processing.
+
+To set up filtering, specify `BY` and a variable name. Keyword `BY` is
+optional but recommended. Cases which have a zero or system- or
+user-missing value are excluded from analysis, but not deleted from
+the data stream. Cases with other values are analyzed. To filter
+based on a different condition, use transformations such as `COMPUTE`
+or `RECODE` to compute a filter variable of the required form, then
+specify that variable on `FILTER`.
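+
+For example, this sketch (variable names are illustrative) computes an
+indicator variable and uses it to restrict analysis to adult cases:
+
+```
+COMPUTE adult = (age >= 18).
+FILTER BY adult.
+FREQUENCIES /VARIABLES=income.
+FILTER OFF.
+```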
+
+`FILTER OFF` turns off case filtering.
+
+Filtering takes place immediately before cases pass to a procedure for
+analysis. Only one filter variable may be active at a time.
+Normally, case filtering continues until it is explicitly turned off
+with `FILTER OFF`. However, if `FILTER` is placed after `TEMPORARY`,
+it filters only the next procedure or procedure-like command.
+
--- /dev/null
+# FINISH
+
+```
+FINISH.
+```
+
+`FINISH` terminates the current PSPP session and returns control to
+the operating system.
+
--- /dev/null
+# FLIP
+
+```
+FLIP /VARIABLES=VAR_LIST /NEWNAMES=VAR_NAME.
+```
+
+`FLIP` transposes rows and columns in the active dataset. It causes
+cases to be swapped with variables, and vice versa.
+
+All variables in the transposed active dataset are numeric. String
+variables take on the system-missing value in the transposed file.
+
+No subcommands are required. If specified, the `VARIABLES`
+subcommand selects variables to be transformed into cases, and variables
+not specified are discarded. If the `VARIABLES` subcommand is omitted,
+all variables are selected for transposition.
+
+The variable specified by `NEWNAMES`, which must be a string
+variable, is used to give names to the variables created by `FLIP`.
+Only the first 8 characters of each value are used. If `NEWNAMES`
+is not specified then the default is a variable named `CASE_LBL`, if it
+exists. If it does not then the variables created by `FLIP` are named
+`VAR000` through `VAR999`, then `VAR1000`, `VAR1001`, and so on.
+
+When a `NEWNAMES` variable is available, the names must be
+canonicalized before becoming variable names. Invalid characters are
+replaced by letter `V` in the first position, or by `_` in subsequent
+positions. If the name thus generated is not unique, then numeric
+extensions are added, starting with 1, until a unique name is found or
+there are no remaining possibilities. If the latter occurs then the
+`FLIP` operation aborts.
+
+The resultant dictionary contains a `CASE_LBL` variable, a string
+variable of width 8, which stores the names of the variables in the
+dictionary before the transposition. Variables names longer than 8
+characters are truncated. If `FLIP` is called again on this dataset,
+the `CASE_LBL` variable can be passed to the `NEWNAMES` subcommand to
+recreate the original variable names.
+
+`FLIP` honors [`N OF CASES`](n.md). It ignores
+[`TEMPORARY`](temporary.md), so that "temporary"
+transformations become permanent.
+
+## Example
+
+In the syntax below, data has been entered using [`DATA
+LIST`](data-list.md) such that the first
+variable in the dataset is a string variable containing a description
+of the other data for the case. Clearly this is not a convenient
+arrangement for performing statistical analyses, so it would have been
+better to think a little more carefully about how the data should have
+been arranged. However, the data are often provided by a third-party
+source, and you have no control over their form. Fortunately, we can
+use `FLIP` to exchange the variables and cases in the active dataset.
+
+```
+data list notable list /heading (a16) v1 v2 v3 v4 v5 v6.
+begin data.
+date-of-birth 1970 1989 2001 1966 1976 1982
+sex 1 0 0 1 0 1
+score 10 10 9 3 8 9
+end data.
+
+echo 'Before FLIP:'.
+display variables.
+list.
+
+flip /variables = all /newnames = heading.
+
+echo 'After FLIP:'.
+display variables.
+list.
+```
+
+As you can see in the results below, before the `FLIP` command has run
+there are seven variables (six containing data and one for the
+heading) and three cases. Afterwards there are four variables (one
+per case, plus the CASE_LBL variable) and six cases. You can delete
+the CASE_LBL variable (see [DELETE VARIABLES](delete-variables.md)) if
+you don't need it.
+
+```
+Before FLIP:
+
+ Variables
+┌───────┬────────┬────────────┬────────────┐
+│Name │Position│Print Format│Write Format│
+├───────┼────────┼────────────┼────────────┤
+│heading│ 1│A16 │A16 │
+│v1 │ 2│F8.2 │F8.2 │
+│v2 │ 3│F8.2 │F8.2 │
+│v3 │ 4│F8.2 │F8.2 │
+│v4 │ 5│F8.2 │F8.2 │
+│v5 │ 6│F8.2 │F8.2 │
+│v6 │ 7│F8.2 │F8.2 │
+└───────┴────────┴────────────┴────────────┘
+
+ Data List
+┌─────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
+│ heading │ v1 │ v2 │ v3 │ v4 │ v5 │ v6 │
+├─────────────┼───────┼───────┼───────┼───────┼───────┼───────┤
+│date─of─birth│1970.00│1989.00│2001.00│1966.00│1976.00│1982.00│
+│sex │ 1.00│ .00│ .00│ 1.00│ .00│ 1.00│
+│score │ 10.00│ 10.00│ 9.00│ 3.00│ 8.00│ 9.00│
+└─────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
+
+After FLIP:
+
+ Variables
+┌─────────────┬────────┬────────────┬────────────┐
+│Name │Position│Print Format│Write Format│
+├─────────────┼────────┼────────────┼────────────┤
+│CASE_LBL │ 1│A8 │A8 │
+│date_of_birth│ 2│F8.2 │F8.2 │
+│sex │ 3│F8.2 │F8.2 │
+│score │ 4│F8.2 │F8.2 │
+└─────────────┴────────┴────────────┴────────────┘
+
+ Data List
+┌────────┬─────────────┬────┬─────┐
+│CASE_LBL│date_of_birth│ sex│score│
+├────────┼─────────────┼────┼─────┤
+│v1 │ 1970.00│1.00│10.00│
+│v2 │ 1989.00│ .00│10.00│
+│v3 │ 2001.00│ .00│ 9.00│
+│v4 │ 1966.00│1.00│ 3.00│
+│v5 │ 1976.00│ .00│ 8.00│
+│v6 │ 1982.00│1.00│ 9.00│
+└────────┴─────────────┴────┴─────┘
+```
--- /dev/null
+# FORMATS
+
+```
+FORMATS VAR_LIST (FMT_SPEC) [VAR_LIST (FMT_SPEC)]....
+```
+
+`FORMATS` sets both print and write formats for the specified
+variables to the specified [output
+format](../language/datasets/formats/index.md).
+
+Specify a list of variables followed by a format specification in
+parentheses. The print and write formats of the specified variables
+will be changed. All of the variables listed together must have the
+same type and, for string variables, the same width.
+
+Additional lists of variables and formats may be included following
+the first one.
+
+`FORMATS` takes effect immediately. It is not affected by conditional
+and looping structures such as `DO IF` or `LOOP`.
+
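+For example, the sketch below (the variable names are hypothetical;
+`height` and `weight` are assumed numeric and `name` an existing
+12-byte string) changes the formats of two variable lists in a single
+command:
+
+```
+FORMATS height weight (F8.2) name (A12).
+```
+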
--- /dev/null
+# FREQUENCIES
+
+```
+FREQUENCIES
+ /VARIABLES=VAR_LIST
+ /FORMAT={TABLE,NOTABLE,LIMIT(LIMIT)}
+ {AVALUE,DVALUE,AFREQ,DFREQ}
+ /MISSING={EXCLUDE,INCLUDE}
+ /STATISTICS={DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
+ KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
+ SESKEWNESS,SEKURTOSIS,ALL,NONE}
+ /NTILES=NTILES
+ /PERCENTILES=percent...
+ /HISTOGRAM=[MINIMUM(X_MIN)] [MAXIMUM(X_MAX)]
+ [{FREQ[(Y_MAX)],PERCENT[(Y_MAX)]}] [{NONORMAL,NORMAL}]
+ /PIECHART=[MINIMUM(X_MIN)] [MAXIMUM(X_MAX)]
+ [{FREQ,PERCENT}] [{NOMISSING,MISSING}]
+ /BARCHART=[MINIMUM(X_MIN)] [MAXIMUM(X_MAX)]
+ [{FREQ,PERCENT}]
+ /ORDER={ANALYSIS,VARIABLE}
+
+
+(These options are not currently implemented.)
+ /HBAR=...
+ /GROUPED=...
+```
+
+The `FREQUENCIES` procedure outputs frequency tables for specified
+variables. `FREQUENCIES` can also calculate and display descriptive
+statistics (including median and mode) and percentiles, and various
+graphical representations of the frequency distribution.
+
+The `VARIABLES` subcommand is the only required subcommand. Specify
+the variables to be analyzed.
+
+The `FORMAT` subcommand controls the output format. It has several
+possible settings:
+
+ * `TABLE`, the default, causes a frequency table to be output for
+ every variable specified. `NOTABLE` prevents them from being
+ output. `LIMIT` with a numeric argument causes them to be output
+ except when there are more than the specified number of values in
+ the table.
+
+ * Normally frequency tables are sorted in ascending order by value.
+ This is `AVALUE`. `DVALUE` tables are sorted in descending order
+ by value. `AFREQ` and `DFREQ` tables are sorted in ascending and
+ descending order, respectively, by frequency count.
+
+The `MISSING` subcommand controls the handling of user-missing values.
+When `EXCLUDE`, the default, is set, user-missing values are not
+included in frequency tables or statistics. When `INCLUDE` is set,
+user-missing values are included.  System-missing values are never included
+in statistics, but are listed in frequency tables.
+
+The available `STATISTICS` are the same as available in
+[`DESCRIPTIVES`](descriptives.md), with the addition of `MEDIAN`, the
+data's median value, and `MODE`, the mode. (If there are multiple
+modes, the smallest value is reported.) By default, the mean,
+standard deviation of the mean, minimum, and maximum are reported for
+each variable.
+
+`PERCENTILES` causes the specified percentiles to be reported. The
+percentiles should be given as a list of numbers between 0 and 100
+inclusive. The `NTILES` subcommand causes the percentiles to be
+reported at the boundaries of the data set divided into the specified
+number of ranges. For instance, `/NTILES=4` would cause quartiles to
+be reported.
+
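+For example, assuming a numeric variable `score` (a hypothetical
+name), the following two commands request the same quartile values:
+
+```
+FREQUENCIES /VARIABLES=score /PERCENTILES=25 50 75.
+FREQUENCIES /VARIABLES=score /NTILES=4.
+```
+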
+The `HISTOGRAM` subcommand causes the output to include a histogram
+for each specified numeric variable. The X axis by default ranges
+from the minimum to the maximum value observed in the data, but the
+`MINIMUM` and `MAXIMUM` keywords can set an explicit range.[^1]
+Histograms are not created for string variables.
+
+[^1]: The bin width is chosen according to the Freedman-Diaconis
+rule: $$2 \times IQR(x)n^{-1/3}$$ where \\(IQR(x)\\) is the
+interquartile range of \\(x\\) and \\(n\\) is the number of samples.
+([`EXAMINE`](examine.md) uses a different algorithm to determine bin
+sizes.)
+
+Specify `NORMAL` to superimpose a normal curve on the histogram.
+
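+For instance, this sketch (assuming a numeric variable `weight`)
+draws a histogram over an explicit range with a normal curve
+superimposed, suppressing the frequency table itself:
+
+```
+FREQUENCIES /VARIABLES=weight
+            /FORMAT=NOTABLE
+            /HISTOGRAM=MINIMUM(40) MAXIMUM(120) NORMAL.
+```
+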
+The `PIECHART` subcommand adds a pie chart for each variable to the
+data. Each slice represents one value, with the size of the slice
+proportional to the value's frequency. By default, all non-missing
+values are given slices. The `MINIMUM` and `MAXIMUM` keywords can be
+used to limit the displayed slices to a given range of values. The
+keyword `NOMISSING` causes missing values to be omitted from the
+piechart. This is the default. If instead, `MISSING` is specified,
+then the pie chart includes a single slice representing all system
+missing and user-missing cases.
+
+The `BARCHART` subcommand produces a bar chart for each variable.
+The `MINIMUM` and `MAXIMUM` keywords can be used to omit categories
+whose counts lie outside the specified limits. The `FREQ` option
+(default) causes the ordinate to display the frequency of each category,
+whereas the `PERCENT` option displays relative percentages.
+
+The `FREQ` and `PERCENT` options on `HISTOGRAM` and `PIECHART` are
+accepted but not currently honoured.
+
+The `ORDER` subcommand is accepted but ignored.
+
+## Example
+
+The syntax below runs a frequency analysis on the sex and occupation
+variables from the `personnel.sav` file.  This is useful to get a
+general idea of the way in which these nominal variables are
+distributed.
+
+```
+get file='personnel.sav'.
+
+frequencies /variables = sex occupation
+ /statistics = none.
+```
+
+If you are using the graphic user interface, the dialog box is set up
+such that by default, several statistics are calculated. Some are not
+particularly useful for categorical variables, so you may want to
+disable those.
+
+From the output, shown below, it is evident that there are 33 males,
+21 females and 2 persons for whom their sex has not been entered.
+
+One can also see how many of each occupation there are in the data.
+When dealing with string variables used as nominal values, running a
+frequency analysis is useful to detect data entry errors.  Notice
+that one occupation value has been mistyped as "Scrientist". This
+entry should be corrected, or marked as missing before using the data.
+
+```
+ sex
+┌──────────────┬─────────┬───────┬─────────────┬──────────────────┐
+│ │Frequency│Percent│Valid Percent│Cumulative Percent│
+├──────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Valid Male │ 33│ 58.9%│ 61.1%│ 61.1%│
+│ Female│ 21│ 37.5%│ 38.9%│ 100.0%│
+├──────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Missing . │ 2│ 3.6%│ │ │
+├──────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Total │ 56│ 100.0%│ │ │
+└──────────────┴─────────┴───────┴─────────────┴──────────────────┘
+
+ occupation
+┌────────────────────────┬─────────┬───────┬─────────────┬──────────────────┐
+│ │Frequency│Percent│Valid Percent│Cumulative Percent│
+├────────────────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Valid Artist │ 8│ 14.3%│ 14.3%│ 14.3%│
+│ Baker │ 2│ 3.6%│ 3.6%│ 17.9%│
+│ Barrister │ 1│ 1.8%│ 1.8%│ 19.6%│
+│ Carpenter │ 4│ 7.1%│ 7.1%│ 26.8%│
+│ Cleaner │ 4│ 7.1%│ 7.1%│ 33.9%│
+│ Cook │ 7│ 12.5%│ 12.5%│ 46.4%│
+│ Manager │ 8│ 14.3%│ 14.3%│ 60.7%│
+│ Mathematician │ 4│ 7.1%│ 7.1%│ 67.9%│
+│ Painter │ 2│ 3.6%│ 3.6%│ 71.4%│
+│ Payload Specialist│ 1│ 1.8%│ 1.8%│ 73.2%│
+│ Plumber │ 5│ 8.9%│ 8.9%│ 82.1%│
+│ Scientist │ 7│ 12.5%│ 12.5%│ 94.6%│
+│ Scrientist │ 1│ 1.8%│ 1.8%│ 96.4%│
+│ Tailor │ 2│ 3.6%│ 3.6%│ 100.0%│
+├────────────────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Total │ 56│ 100.0%│ │ │
+└────────────────────────┴─────────┴───────┴─────────────┴──────────────────┘
+```
+
--- /dev/null
+# GET DATA
+
+```
+GET DATA
+ /TYPE={GNM,ODS,PSQL,TXT}
+ ...additional subcommands depending on TYPE...
+```
+
+ The `GET DATA` command is used to read files and other data sources
+created by other applications. When this command is executed, the
+current dictionary and active dataset are replaced with variables and
+data read from the specified source.
+
+ The `TYPE` subcommand is mandatory and must be the first subcommand
+specified. It determines the type of the file or source to read.
+PSPP currently supports the following `TYPE`s:
+
+* `GNM`
+ Spreadsheet files created by Gnumeric (<http://gnumeric.org>).
+
+* `ODS`
+ Spreadsheet files in OpenDocument format
+ (<http://opendocumentformat.org>).
+
+* `PSQL`
+ Relations from PostgreSQL databases (<http://postgresql.org>).
+
+* `TXT`
+ Textual data files in columnar and delimited formats.
+
+Each supported file type has additional subcommands, explained in
+separate sections below.
+
+## Spreadsheet Files
+
+```
+GET DATA /TYPE={GNM, ODS}
+ /FILE={'FILE_NAME'}
+ /SHEET={NAME 'SHEET_NAME', INDEX N}
+ /CELLRANGE={RANGE 'RANGE', FULL}
+ /READNAMES={ON, OFF}
+ /ASSUMEDSTRWIDTH=N.
+```
+
+`GET DATA` can read Gnumeric spreadsheets (<http://gnumeric.org>), and
+spreadsheets in OpenDocument format
+(<http://libreplanet.org/wiki/Group:OpenDocument/Software>). Use the
+`TYPE` subcommand to indicate the file's format. `/TYPE=GNM`
+indicates Gnumeric files, `/TYPE=ODS` indicates OpenDocument. The
+`FILE` subcommand is mandatory.  Use it to specify the name of the
+file to be read.  All other subcommands are optional.
+
+ The format of each variable is determined by the format of the
+spreadsheet cell containing the first datum for the variable. If this
+cell is of string (text) format, then the width of the variable is
+determined from the length of the string it contains, unless the
+`ASSUMEDSTRWIDTH` subcommand is given.
+
+ The `SHEET` subcommand specifies the sheet within the spreadsheet
+file to read. There are two forms of the `SHEET` subcommand. In the
+first form, `/SHEET=name SHEET_NAME`, the string SHEET_NAME is the name
+of the sheet to read.  In the second form, `/SHEET=index IDX`, IDX is
+an integer which is the index of the sheet to read.  The first sheet has
+the index 1. If the `SHEET` subcommand is omitted, then the command
+reads the first sheet in the file.
+
+ The `CELLRANGE` subcommand specifies the range of cells within the
+sheet to read. If the subcommand is given as `/CELLRANGE=FULL`, then
+the entire sheet is read. To read only part of a sheet, use the form
+`/CELLRANGE=range 'TOP_LEFT_CELL:BOTTOM_RIGHT_CELL'`. For example,
+the subcommand `/CELLRANGE=range 'C3:P19'` reads columns C-P and rows
+3-19, inclusive. Without the `CELLRANGE` subcommand, the entire sheet
+is read.
+
+ If `/READNAMES=ON` is specified, then the contents of cells of the
+first row are used as the names of the variables in which to store the
+data from subsequent rows. This is the default. If `/READNAMES=OFF` is
+used, then the variables receive automatically assigned names.
+
+ The `ASSUMEDSTRWIDTH` subcommand specifies the maximum width of
+string variables read from the file. If omitted, the default value is
+determined from the length of the string in the first spreadsheet cell
+for each variable.
+
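+Putting these subcommands together, a sketch of reading part of a
+spreadsheet (the file name, sheet name, and cell range here are
+hypothetical):
+
+```
+GET DATA /TYPE=ODS
+         /FILE='survey.ods'
+         /SHEET=NAME 'Responses'
+         /CELLRANGE=RANGE 'A1:H100'
+         /READNAMES=ON
+         /ASSUMEDSTRWIDTH=32.
+```
+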
+## Postgres Database Queries
+
+```
+GET DATA /TYPE=PSQL
+ /CONNECT={CONNECTION INFO}
+ /SQL={QUERY}
+ [/ASSUMEDSTRWIDTH=W]
+ [/UNENCRYPTED]
+ [/BSIZE=N].
+```
+
+ `GET DATA /TYPE=PSQL` imports data from a local or remote Postgres
+database server. It automatically creates variables based on the table
+column names or the names specified in the SQL query. PSPP cannot
+support the full precision of some Postgres data types, so data of those
+types will lose some precision when PSPP imports them. PSPP does not
+support all Postgres data types. If PSPP cannot support a datum, `GET
+DATA` issues a warning and substitutes the system-missing value.
+
+ The `CONNECT` subcommand must be a string for the parameters of the
+database server from which the data should be fetched. The format of
+the string is given [in the Postgres
+manual](http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT).
+
+ The `SQL` subcommand must be a valid SQL statement to retrieve data
+from the database.
+
+ The `ASSUMEDSTRWIDTH` subcommand specifies the maximum width of
+string variables read from the database. If omitted, the default value
+is determined from the length of the string in the first value read for
+each variable.
+
+ The `UNENCRYPTED` subcommand allows data to be retrieved over an
+insecure connection. If the connection is not encrypted, and the
+`UNENCRYPTED` subcommand is not given, then an error occurs. Whether or
+not the connection is encrypted depends upon the underlying psql library
+and the capabilities of the database server.
+
+ The `BSIZE` subcommand serves only to optimise the speed of data
+transfer.  It specifies an upper limit on the number of cases to fetch from
+the database at once. The default value is 4096. If your SQL statement
+fetches a large number of cases but only a small number of variables,
+then the data transfer may be faster if you increase this value.
+Conversely, if the number of variables is large, or if the machine on
+which PSPP is running has only a small amount of memory, then a smaller
+value is probably better.
+
+### Example
+
+```
+GET DATA /TYPE=PSQL
+ /CONNECT='host=example.com port=5432 dbname=product user=fred passwd=xxxx'
+ /SQL='select * from manufacturer'.
+```
+
+## Textual Data Files
+
+```
+GET DATA /TYPE=TXT
+ /FILE={'FILE_NAME',FILE_HANDLE}
+ [ENCODING='ENCODING']
+ [/ARRANGEMENT={DELIMITED,FIXED}]
+ [/FIRSTCASE={FIRST_CASE}]
+ [/IMPORTCASES=...]
+ ...additional subcommands depending on ARRANGEMENT...
+```
+
+ When `TYPE=TXT` is specified, `GET DATA` reads data in a delimited
+or fixed columnar format, much like [`DATA
+LIST`](data-list.md).
+
+ The `FILE` subcommand must specify the file to be read as a string
+file name or (for textual data only) a [file
+handle](../language/files/file-handles.md).
+
+ The `ENCODING` subcommand specifies the character encoding of the
+file to be read. See [`INSERT`](insert.md), for information on
+supported encodings.
+
+ The `ARRANGEMENT` subcommand determines the file's basic format.
+`DELIMITED`, the default setting, specifies that fields in the input data
+are separated by spaces, tabs, or other user-specified delimiters.
+`FIXED` specifies that fields in the input data appear at particular fixed
+column positions within records of a case.
+
+ By default, cases are read from the input file starting from the
+first line. To skip lines at the beginning of an input file, set
+`FIRSTCASE` to the number of the first line to read: 2 to skip the
+first line, 3 to skip the first two lines, and so on.
+
+ `IMPORTCASES` is ignored, for compatibility. Use [`N OF
+CASES`](n.md) to limit the number of cases read from a file, or
+[`SAMPLE`](sample.md) to obtain a random sample of cases.
+
+ The remaining subcommands apply only to one of the two file
+arrangements, described below.
+
+### Delimited Data
+
+```
+GET DATA /TYPE=TXT
+ /FILE={'FILE_NAME',FILE_HANDLE}
+ [/ARRANGEMENT={DELIMITED,FIXED}]
+ [/FIRSTCASE={FIRST_CASE}]
+ [/IMPORTCASE={ALL,FIRST MAX_CASES,PERCENT PERCENT}]
+
+ /DELIMITERS="DELIMITERS"
+  [/QUALIFIER="QUOTES"]
+ [/DELCASE={LINE,VARIABLES N_VARIABLES}]
+ /VARIABLES=DEL_VAR1 [DEL_VAR2]...
+where each DEL_VAR takes the form:
+ variable format
+```
+
+ The `GET DATA` command with `TYPE=TXT` and `ARRANGEMENT=DELIMITED`
+reads input data from text files in delimited format, where fields are
+separated by a set of user-specified delimiters. Its capabilities are
+similar to those of [`DATA LIST FREE`](data-list.md#data-list-free),
+with a few enhancements.
+
+ The required `FILE` subcommand and optional `FIRSTCASE` and
+`IMPORTCASE` subcommands are described [above](#textual-data-files).
+
+ `DELIMITERS`, which is required, specifies the set of characters that
+may separate fields. Each character in the string specified on
+`DELIMITERS` separates one field from the next. The end of a line also
+separates fields, regardless of `DELIMITERS`. Two consecutive
+delimiters in the input yield an empty field, as does a delimiter at the
+end of a line. A space character as a delimiter is an exception:
+consecutive spaces do not yield an empty field and neither does any
+number of spaces at the end of a line.
+
+ To use a tab as a delimiter, specify `\t` at the beginning of the
+`DELIMITERS` string. To use a backslash as a delimiter, specify `\\` as
+the first delimiter or, if a tab should also be a delimiter, immediately
+following `\t`. To read a data file in which each field appears on a
+separate line, specify the empty string for `DELIMITERS`.
+
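+For instance, a sketch of reading a tab-separated file (the file name
+and variable names are hypothetical):
+
+```
+GET DATA /TYPE=TXT /FILE='scores.tsv' /DELIMITERS="\t"
+         /VARIABLES=name A20
+                    score F5.
+```
+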
+ The optional `QUALIFIER` subcommand names one or more characters that
+can be used to quote values within fields in the input. A field that
+begins with one of the specified quote characters ends at the next
+matching quote. Intervening delimiters become part of the field,
+instead of terminating it. The ability to specify more than one quote
+character is a PSPP extension.
+
+ The character specified on `QUALIFIER` can be embedded within a field
+that it quotes by doubling the qualifier. For example, if `'` is
+specified on `QUALIFIER`, then `'a''b'` specifies a field that contains
+`a'b`.
+
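+As a sketch (hypothetical file and variable names), the following
+reads a file in which single quotes enclose fields and, when doubled,
+stand for themselves:
+
+```
+GET DATA /TYPE=TXT /FILE='names.txt' /DELIMITERS=',' /QUALIFIER=''''
+         /VARIABLES=fullname A20.
+```
+
+With this syntax, the input field `'O''Brien'` is read as `O'Brien`.
+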
+ The `DELCASE` subcommand controls how data may be broken across
+lines in the data file. With `LINE`, the default setting, each line
+must contain all the data for exactly one case. For additional
+flexibility, to allow a single case to be split among lines or
+multiple cases to be contained on a single line, specify `VARIABLES
+n_variables`, where `n_variables` is the number of variables per case.
+
+ The `VARIABLES` subcommand is required and must be the last
+subcommand. Specify the name of each variable and its [input
+format](../language/datasets/formats/index.md), in the order they
+should be read from the input file.
+
+#### Example 1
+
+On a Unix-like system, the `/etc/passwd` file has a format similar to
+this:
+
+```
+root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
+blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
+john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
+jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
+```
+
+The following syntax reads a file in the format used by `/etc/passwd`:
+
+```
+GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
+ /VARIABLES=username A20
+ password A40
+ uid F10
+ gid F10
+ gecos A40
+ home A40
+ shell A40.
+```
+
+#### Example 2
+
+Consider the following data on used cars:
+
+```
+model year mileage price type age
+Civic 2002 29883 15900 Si 2
+Civic 2003 13415 15900 EX 1
+Civic 1992 107000 3800 n/a 12
+Accord 2002 26613 17900 EX 1
+```
+
+The following syntax can be used to read the used car data:
+
+```
+GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
+ /VARIABLES=model A8
+ year F4
+ mileage F6
+ price F5
+ type A4
+ age F2.
+```
+
+#### Example 3
+
+Consider the following information on animals in a pet store:
+
+```
+'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
+, (Years), , , (Dollars), ,
+"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
+"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
+"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
+"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
+```
+
+The following syntax can be used to read the pet store data:
+
+```
+GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"'
+ /FIRSTCASE=3
+ /VARIABLES=name A10
+ age F3.1
+ color A5
+ received EDATE10
+ price F5.2
+ height a5
+ type a10.
+```
+
+### Fixed Columnar Data
+
+```
+GET DATA /TYPE=TXT
+ /FILE={'file_name',FILE_HANDLE}
+ [/ARRANGEMENT={DELIMITED,FIXED}]
+ [/FIRSTCASE={FIRST_CASE}]
+ [/IMPORTCASE={ALL,FIRST MAX_CASES,PERCENT PERCENT}]
+
+ [/FIXCASE=N]
+ /VARIABLES FIXED_VAR [FIXED_VAR]...
+ [/rec# FIXED_VAR [FIXED_VAR]...]...
+where each FIXED_VAR takes the form:
+ VARIABLE START-END FORMAT
+```
+
+ The `GET DATA` command with `TYPE=TXT` and `ARRANGEMENT=FIXED`
+reads input data from text files in fixed format, where each field is
+located in particular fixed column positions within records of a case.
+Its capabilities are similar to those of [`DATA LIST
+FIXED`](data-list.md#data-list-fixed), with a few enhancements.
+
+ The required `FILE` subcommand and optional `FIRSTCASE` and
+`IMPORTCASE` subcommands are described [above](#textual-data-files).
+
+ The optional `FIXCASE` subcommand may be used to specify the positive
+integer number of input lines that make up each case. The default value
+is 1.
+
+ The `VARIABLES` subcommand, which is required, specifies the
+positions at which each variable can be found. For each variable,
+specify its name, followed by its start and end column separated by `-`
+(e.g. `0-9`), followed by an input format type (e.g. `F`) or a full
+format specification (e.g. `DOLLAR12.2`). For this command, columns are
+numbered starting from 0 at the left column. Introduce the variables in
+the second and later lines of a case by a slash followed by the number
+of the line within the case, e.g. `/2` for the second line.
+
+#### Example
+
+Consider the following data on used cars:
+
+```
+model year mileage price type age
+Civic 2002 29883 15900 Si 2
+Civic 2003 13415 15900 EX 1
+Civic 1992 107000 3800 n/a 12
+Accord 2002 26613 17900 EX 1
+```
+
+The following syntax can be used to read the used car data:
+
+```
+GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
+ /VARIABLES=model 0-7 A
+ year 8-15 F
+ mileage 16-23 F
+ price 24-31 F
+ type 32-40 A
+ age 40-47 F.
+```
--- /dev/null
+# GET
+
+```
+GET
+ /FILE={'FILE_NAME',FILE_HANDLE}
+ /DROP=VAR_LIST
+ /KEEP=VAR_LIST
+ /RENAME=(SRC_NAMES=TARGET_NAMES)...
+ /ENCODING='ENCODING'
+```
+
+ `GET` clears the current dictionary and active dataset and replaces
+them with the dictionary and data from a specified file.
+
+ The `FILE` subcommand is the only required subcommand. Specify the
+SPSS system file, SPSS/PC+ system file, or SPSS portable file to be
+read as a string file name or a [file
+handle](../language/files/file-handles.md).
+
+ By default, all the variables in a file are read. The `DROP`
+subcommand can be used to specify a list of variables that are not to
+be read. By contrast, the `KEEP` subcommand can be used to specify
+variables that are to be read, with all other variables not read.
+
+ Normally variables in a file retain the names that they were saved
+under. Use the `RENAME` subcommand to change these names. Specify,
+within parentheses, a list of variable names followed by an equals sign
+(`=`) and the names that they should be renamed to. Multiple
+parenthesized groups of variable names can be included on a single
+`RENAME` subcommand. Variables' names may be swapped using a `RENAME`
+subcommand of the form `/RENAME=(A B=B A)`.
+
+ Alternate syntax for the `RENAME` subcommand allows the parentheses
+to be omitted. When this is done, only a single variable may be
+renamed at once. For instance, `/RENAME=A=B`. This alternate syntax
+is discouraged.
+
+ `DROP`, `KEEP`, and `RENAME` are executed in left-to-right order.
+Each may be present any number of times. `GET` never modifies a file on
+disk. Only the active dataset read from the file is affected by these
+subcommands.
+
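+Combining these subcommands, a sketch (the file and variable names
+are hypothetical) that reads three variables and renames one of them:
+
+```
+GET FILE='personnel.sav'
+    /KEEP=sex occupation salary
+    /RENAME=(salary=pay).
+```
+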
+ PSPP automatically detects the encoding of string data in the file,
+when possible. The character encoding of old SPSS system files cannot
+always be guessed correctly, and SPSS/PC+ system files do not include
+any indication of their encoding. Specify the `ENCODING` subcommand
+with an IANA character set name as its string argument to override the
+default. Use `SYSFILE INFO` to analyze the encodings that might be
+valid for a system file. The `ENCODING` subcommand is a PSPP extension.
+
+ `GET` does not cause the data to be read, only the dictionary. The
+data is read later, when a procedure is executed.
+
+ Use of `GET` to read a portable file is a PSPP extension.
+
--- /dev/null
+# GLM
+
+```
+GLM DEPENDENT_VARS BY FIXED_FACTORS
+ [/METHOD = SSTYPE(TYPE)]
+ [/DESIGN = INTERACTION_0 [INTERACTION_1 [... INTERACTION_N]]]
+ [/INTERCEPT = {INCLUDE|EXCLUDE}]
+ [/MISSING = {INCLUDE|EXCLUDE}]
+```
+
+The `GLM` procedure can be used for fixed effects factorial Anova.
+
+The `DEPENDENT_VARS` are the variables to be analysed. You may analyse
+several variables in the same command, in which case they should all
+appear before the `BY` keyword.
+
+The `FIXED_FACTORS` list must be one or more categorical variables.
+Normally it does not make sense to enter a scalar variable in the
+`FIXED_FACTORS` and doing so may cause PSPP to do a lot of unnecessary
+processing.
+
+The `METHOD` subcommand is used to change the method for producing
+the sums of squares. Available values of `TYPE` are 1, 2 and 3. The
+default is type 3.
+
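+For example, a sketch (with hypothetical variables) requesting type 2
+sums of squares:
+
+```
+GLM score BY treatment sex
+    /METHOD = SSTYPE(2).
+```
+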
+You may specify a custom design using the `DESIGN` subcommand. The
+design comprises a list of interactions where each interaction is a list
+of variables separated by a `*`. For example the command
+```
+GLM subject BY sex age_group race
+  /DESIGN = age_group sex race age_group*sex age_group*race
+```
+specifies the model
+```
+subject = age_group + sex + race + age_group×sex + age_group×race
+```
+If no `DESIGN` subcommand is specified, then the
+default is all possible combinations of the fixed factors. That is to
+say
+```
+GLM subject BY sex age_group race
+```
+implies the model
+```
+subject = age_group + sex + race + age_group×sex + age_group×race + sex×race + age_group×sex×race
+```
+
+The `MISSING` subcommand determines the handling of missing variables.
+If `INCLUDE` is set then, for the purposes of GLM analysis, only
+system-missing values are considered to be missing; user-missing
+values are not regarded as missing. If `EXCLUDE` is set, which is the
+default, then user-missing values are considered to be missing as well
+as system-missing values. A case for which any dependent variable or
+any factor variable has a missing value is excluded from the analysis.
+
--- /dev/null
+# GRAPH
+
+```
+GRAPH
+ /HISTOGRAM [(NORMAL)]= VAR
+ /SCATTERPLOT [(BIVARIATE)] = VAR1 WITH VAR2 [BY VAR3]
+ /BAR = {SUMMARY-FUNCTION(VAR1) | COUNT-FUNCTION} BY VAR2 [BY VAR3]
+ [ /MISSING={LISTWISE, VARIABLE} [{EXCLUDE, INCLUDE}] ]
+ [{NOREPORT,REPORT}]
+```
+
+`GRAPH` produces graphical plots of data.  Only one of the
+subcommands `HISTOGRAM`, `BAR` or `SCATTERPLOT` can be specified,
+i.e. only one plot can be produced per call of `GRAPH`.  The
+`MISSING` subcommand is optional.
+
+## Scatterplot
+
+The subcommand `SCATTERPLOT` produces an xy plot of the data. `GRAPH`
+uses `VAR3`, if specified, to determine the colours and/or
+markers for the plot. The following is an example for producing a
+scatterplot.
+
+```
+GRAPH
+ /SCATTERPLOT = height WITH weight BY gender.
+```
+
+This example produces a scatterplot where `height` is plotted versus
+`weight`. Depending on the value of `gender`, the colour of the
+datapoint is different. With this plot it is possible to analyze
+gender differences for `height` versus `weight` relation.
+
+## Histogram
+
+The subcommand `HISTOGRAM` produces a histogram. Only one variable is
+allowed for the histogram plot. The keyword `NORMAL` may be specified
+in parentheses, to indicate that the ideal normal curve should be
+superimposed over the histogram. For an alternative method to produce
+histograms, see [EXAMINE](examine.md). The following example produces
+a histogram plot for the variable `weight`.
+
+```
+GRAPH
+ /HISTOGRAM = weight.
+```
+
+## Bar Chart
+
+The subcommand `BAR` produces a bar chart. This subcommand requires
+that a `COUNT-FUNCTION` be specified (with no arguments) or a
+`SUMMARY-FUNCTION` with a variable VAR1 in parentheses. Following the
+summary or count function, the keyword `BY` should be specified and
+then a categorical variable, `VAR2`.  The values of `VAR2` determine
+the labels of the bars to be plotted. A second categorical variable
+`VAR3` may be specified, in which case a clustered (grouped) bar chart
+is produced.
+
+Valid count functions are:
+
+* `COUNT`
+ The weighted counts of the cases in each category.
+* `PCT`
+ The weighted counts of the cases in each category expressed as a
+ percentage of the total weights of the cases.
+* `CUFREQ`
+ The cumulative weighted counts of the cases in each category.
+* `CUPCT`
+ The cumulative weighted counts of the cases in each category
+ expressed as a percentage of the total weights of the cases.
+
+The summary function is applied to `VAR1` across all cases in each
+category. The recognised summary functions are:
+
+* `SUM`
+ The sum.
+* `MEAN`
+ The arithmetic mean.
+* `MAXIMUM`
+ The maximum value.
+* `MINIMUM`
+ The minimum value.
+
+The following examples assume a dataset which is the results of a
+survey. Each respondent has indicated annual income, their sex and city
+of residence. One could create a bar chart showing how the mean income
+varies between residents of different cities, thus:
+```
+GRAPH /BAR = MEAN(INCOME) BY CITY.
+```
+
+This can be extended to also indicate how income in each city differs
+between the sexes.
+```
+GRAPH /BAR = MEAN(INCOME) BY CITY BY SEX.
+```
+
+One might also want to see how many respondents there are from each
+city. This can be achieved as follows:
+```
+GRAPH /BAR = COUNT BY CITY.
+```
+
+The [FREQUENCIES](frequencies.md) and [CROSSTABS](crosstabs.md)
+commands can also produce bar charts.
+
--- /dev/null
+# HOST
+
+In the syntax below, the square brackets must be included in the command
+syntax and do not indicate that their contents are optional.
+
+```
+HOST COMMAND=['COMMAND'...]
+ TIMELIMIT=SECS.
+```
+
+`HOST` executes one or more commands, each provided as a string in
+the required `COMMAND` subcommand, in the shell of the underlying
+operating system. PSPP runs each command in a separate shell process
+and waits for it to finish before running the next one. If a command
+fails (with a nonzero exit status, or because it is killed by a signal),
+then PSPP does not run any remaining commands.
+
+PSPP provides `/dev/null` as the shell's standard input. If a
+process needs to read from stdin, redirect from a file or device, or use
+a pipe.
+
+PSPP displays the shell's standard output and standard error as PSPP
+output. Redirect to a file or `/dev/null` or another device if this is
+not desired.
+
+By default, PSPP waits as long as necessary for the series of
+commands to complete. Use the optional `TIMELIMIT` subcommand to limit
+the execution time to the specified number of seconds.
+
+PSPP built for mingw does not support all the features of `HOST`.
+
+PSPP rejects this command if the [`SAFER`](set.md#safer) setting is
+active.
+
+## Example
+
+The following example runs `rsync` to copy a file from a remote
+server to the local file `data.txt`, writing `rsync`'s own output to
+`rsync-log.txt`. PSPP displays the command's error output, if any. If
+`rsync` needs to prompt the user (e.g. to obtain a password), the
+command fails.  Only if `rsync` succeeds does PSPP run the
+`sha512sum` command.
+
+```
+HOST COMMAND=['rsync remote:data.txt data.txt > rsync-log.txt'
+              'sha512sum -c data.txt.sha512sum'].
+```
+
--- /dev/null
+# IF
+
+```
+ IF CONDITION VARIABLE=EXPRESSION.
+or
+ IF CONDITION vector(INDEX)=EXPRESSION.
+```
+
+The `IF` transformation evaluates a test expression and, if it is
+true, assigns the value of a target expression to a target variable.
+
+Specify a boolean-valued test
+[expression](../language/expressions/index.md) to be tested following the
+`IF` keyword. The test expression is evaluated for each case:
+
+- If it is true, then the target expression is evaluated and assigned
+ to the specified variable.
+
+- If it is false or missing, nothing is done.
+
+Numeric and string variables may be assigned. When a string
+expression's width differs from the target variable's width, the
+string result is truncated or padded with spaces on the right as
+necessary. The expression and variable types must match.
+
+The target variable may be specified as an element of a
+[vector](vector.md). In this case, a vector
+index expression must be specified in parentheses following the vector
+name. The index expression must evaluate to a numeric value that,
+after rounding down to the nearest integer, is a valid index for the
+named vector.
+
+Using `IF` to assign to a variable specified on [`LEAVE`](leave.md)
+resets the variable's left state. Therefore, use `LEAVE` after `IF`,
+not before.
+
+When `IF` follows [`TEMPORARY`](temporary.md), the
+[`LAG`](../language/expressions/functions/miscellaneous.md) function
+may not be used.
+
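+## Example
+
+As an illustrative sketch (the variable names are hypothetical, and an
+active dataset containing `income` and `tax` is assumed), the
+following sets `tax` to 25% of `income` for cases where `income`
+exceeds 40000, leaving `tax` unchanged otherwise:
+
+```
+COMPUTE tax = 0.
+IF (income > 40000) tax = income * 0.25.
+```
+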
--- /dev/null
+# IMPORT
+
+```
+IMPORT
+ /FILE='FILE_NAME'
+ /TYPE={COMM,TAPE}
+ /DROP=VAR_LIST
+ /KEEP=VAR_LIST
+ /RENAME=(SRC_NAMES=TARGET_NAMES)...
+```
+
+The `IMPORT` transformation clears the active dataset dictionary and
+data and replaces them with a dictionary and data from a system file or
+portable file.
+
+The `FILE` subcommand, which is the only required subcommand,
+specifies the portable file to be read as a file name string or a
+[file handle](../language/files/file-handles.md).
+
+The `TYPE` subcommand is currently not used.
+
+`DROP`, `KEEP`, and `RENAME` follow the syntax used by
+[`GET`](get.md).
+
+`IMPORT` does not cause the data to be read; only the dictionary.
+The data is read later, when a procedure is executed.
+
+Use of `IMPORT` to read a system file is a PSPP extension.
+
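+## Example
+
+The following sketch (the file name is hypothetical) reads a portable
+file, keeping only two of its variables and renaming one of them:
+
+```
+IMPORT
+ /FILE='survey.por'
+ /KEEP=id score
+ /RENAME=(score=total).
+```
+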
--- /dev/null
+# INCLUDE
+
+```
+INCLUDE [FILE=]'FILE_NAME' [ENCODING='ENCODING'].
+```
+
+`INCLUDE` causes the PSPP command processor to read an additional
+command file as if it were included bodily in the current command file.
+If errors are encountered in the included file, then command processing
+stops and no more commands are processed. Include files may be nested
+to any depth, up to the limit of available memory.
+
+The [`INSERT`](insert.md) command is a more flexible alternative to
+`INCLUDE`. An `INCLUDE` command acts the same as `INSERT` with
+`ERROR=STOP CD=NO SYNTAX=BATCH` specified.
+
+The optional `ENCODING` subcommand has the same meaning as with
+`INSERT`.
+
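+For example, assuming a syntax file named `defs.sps` exists in the
+current directory, the following reads and executes its commands in
+place:
+
+```
+INCLUDE FILE='defs.sps'.
+```
+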
--- /dev/null
+# INPUT PROGRAM…END INPUT PROGRAM
+
+```
+INPUT PROGRAM.
+... input commands ...
+END INPUT PROGRAM.
+```
+
+ `INPUT PROGRAM`...`END INPUT PROGRAM` specifies a complex input
+program. By placing data input commands within `INPUT PROGRAM`, PSPP
+programs can take advantage of more complex file structures than
+available with only `DATA LIST`.
+
+ The first sort of extended input program is to simply put multiple
+`DATA LIST` commands within the `INPUT PROGRAM`. This will cause all of
+the data files to be read in parallel. Input will stop when end of file
+is reached on any of the data files.
+
+ Transformations, such as conditional and looping constructs, can also
+be included within `INPUT PROGRAM`. These can be used to combine input
+from several data files in more complex ways. However, input will still
+stop when end of file is reached on any of the data files.
+
+ To prevent `INPUT PROGRAM` from terminating at the first end of
+file, use the `END` subcommand on `DATA LIST`. This subcommand takes
+a variable name, which should be a numeric [scratch
+variable](../language/datasets/scratch-variables.md). (It need not
+be a scratch variable but otherwise the results can be surprising.)
+The value of this variable is set to 0 when reading the data file, or
+1 when end of file is encountered.
+
+ Two additional commands are useful in conjunction with `INPUT
+PROGRAM`. `END CASE` is the first. Normally each loop through the
+`INPUT PROGRAM` structure produces one case. `END CASE` controls
+exactly when cases are output. When `END CASE` is used, looping from
+the end of `INPUT PROGRAM` to the beginning does not cause a case to be
+output.
+
+ `END FILE` is the second. When the `END` subcommand is used on `DATA
+LIST`, there is no way for the `INPUT PROGRAM` construct to stop
+looping, so an infinite loop results. `END FILE`, when executed, stops
+the flow of input data and passes out of the `INPUT PROGRAM` structure.
+
+ `INPUT PROGRAM` must contain at least one `DATA LIST` or `END FILE`
+command.
+
+## Example 1: Read two files in parallel to the end of the shorter
+
+The following example reads variable `X` from file `a.txt` and
+variable `Y` from file `b.txt`. If one file is shorter than the other
+then the extra data in the longer file is ignored.
+
+```
+INPUT PROGRAM.
+ DATA LIST NOTABLE FILE='a.txt'/X 1-10.
+ DATA LIST NOTABLE FILE='b.txt'/Y 1-10.
+END INPUT PROGRAM.
+LIST.
+```
+
+## Example 2: Read two files in parallel, supplementing the shorter
+
+The following example also reads variable `X` from `a.txt` and
+variable `Y` from `b.txt`. If one file is shorter than the other,
+reading continues to the end of the longer file, with the variable
+from the exhausted file set to system-missing.
+
+```
+INPUT PROGRAM.
+ NUMERIC #A #B.
+
+ DO IF NOT #A.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ END IF.
+ DO IF NOT #B.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10.
+ END IF.
+ DO IF #A AND #B.
+ END FILE.
+ END IF.
+ END CASE.
+END INPUT PROGRAM.
+LIST.
+```
+
+## Example 3: Concatenate two files (version 1)
+
+The following example reads data from file `a.txt`, then from `b.txt`,
+and concatenates them into a single active dataset.
+
+```
+INPUT PROGRAM.
+ NUMERIC #A #B.
+
+ DO IF #A.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10.
+ DO IF #B.
+ END FILE.
+ ELSE.
+ END CASE.
+ END IF.
+ ELSE.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ DO IF NOT #A.
+ END CASE.
+ END IF.
+ END IF.
+END INPUT PROGRAM.
+LIST.
+```
+
+## Example 4: Concatenate two files (version 2)
+
+This is another way to do the same thing as Example 3.
+
+```
+INPUT PROGRAM.
+ NUMERIC #EOF.
+
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ COMPUTE #EOF = 0.
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ END FILE.
+END INPUT PROGRAM.
+LIST.
+```
+
+## Example 5: Generate random variates
+
+The following example creates a dataset that consists of 50 random
+variates between 0 and 10.
+
+```
+INPUT PROGRAM.
+ LOOP #I=1 TO 50.
+ COMPUTE X=UNIFORM(10).
+ END CASE.
+ END LOOP.
+ END FILE.
+END INPUT PROGRAM.
+LIST /FORMAT=NUMBERED.
+```
--- /dev/null
+# INSERT
+
+```
+INSERT [FILE=]'FILE_NAME'
+ [CD={NO,YES}]
+ [ERROR={CONTINUE,STOP}]
+ [SYNTAX={BATCH,INTERACTIVE}]
+ [ENCODING={LOCALE, 'CHARSET_NAME'}].
+```
+
+`INSERT` is similar to [`INCLUDE`](include.md) but more flexible. It
+causes the command processor to read a file as if it were embedded in
+the current command file.
+
+If `CD=YES` is specified, then before including the file, the current
+directory becomes the directory of the included file. The default
+setting is `CD=NO`. This directory remains current until it is
+changed explicitly (with the `CD` command, or a subsequent `INSERT`
+command with the `CD=YES` option). It does not revert to its original
+setting even after the included file is finished processing.
+
+If `ERROR=STOP` is specified, errors encountered in the inserted file
+cause processing to cease immediately. Otherwise processing continues
+at the next command. The default setting is `ERROR=CONTINUE`.
+
+If `SYNTAX=INTERACTIVE` is specified then the syntax contained in the
+included file must conform to [interactive syntax
+conventions](../language/basics/syntax-variants.md). The default
+setting is `SYNTAX=BATCH`.
+
+`ENCODING` optionally specifies the character set used by the
+included file. Its argument, which is not case-sensitive, must be in
+one of the following forms:
+
+* `LOCALE`
+ The encoding used by the system locale, or as overridden by [`SET
+ LOCALE`](set.md#locale). On GNU/Linux and other Unix-like systems,
+ environment variables, e.g. `LANG` or `LC_ALL`, determine the system
+ locale.
+
+* `'CHARSET_NAME'`
+ An [IANA character set
+ name](http://www.iana.org/assignments/character-sets). Some
+ examples are `ASCII` (United States), `ISO-8859-1` (western Europe),
+ `EUC-JP` (Japan), and `windows-1252` (Windows). Not all systems
+ support all character sets.
+
+* `Auto,ENCODING`
+ Automatically detects whether a syntax file is encoded in a Unicode
+ encoding such as UTF-8, UTF-16, or UTF-32. If it is not, then PSPP
+ generally assumes that the file is encoded in `ENCODING` (an IANA
+ character set name). However, if `ENCODING` is UTF-8, and the
+ syntax file is not valid UTF-8, PSPP instead assumes that the file
+ is encoded in `windows-1252`.
+
+ For best results, `ENCODING` should be an ASCII-compatible encoding
+ (the most common locale encodings are all ASCII-compatible),
+ because encodings that are not ASCII compatible cannot be
+ automatically distinguished from UTF-8.
+
+* `Auto`
+ `Auto,Locale`
+ Automatic detection, as above, with the default encoding taken from
+ the system locale or the setting on `SET LOCALE`.
+
+When `ENCODING` is not specified, the default is taken from the
+`--syntax-encoding` command option, if it was specified, and otherwise
+it is `Auto`.
+
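+## Example
+
+The following sketch (the file name is hypothetical) inserts a syntax
+file written with interactive conventions, stops on any error, and
+makes the inserted file's directory current while it runs:
+
+```
+INSERT FILE='setup.sps'
+ ERROR=STOP
+ CD=YES
+ SYNTAX=INTERACTIVE.
+```
+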
--- /dev/null
+# LEAVE
+
+`LEAVE` prevents the specified variables from being reinitialized
+whenever a new case is processed.
+
+```
+LEAVE VAR_LIST.
+```
+
+Normally, when a data file is processed, every variable in the active
+dataset is initialized to the system-missing value or spaces at the
+beginning of processing for each case. When a variable has been
+specified on `LEAVE`, this is not the case. Instead, that variable is
+initialized to 0 (not system-missing) or spaces for the first case.
+After that, it retains its value between cases.
+
+This becomes useful for counters. For instance, in the example below
+the variable `SUM` maintains a running total of the values in the
+`ITEM` variable.
+
+```
+DATA LIST /ITEM 1-3.
+COMPUTE SUM=SUM+ITEM.
+PRINT /ITEM SUM.
+LEAVE SUM.
+BEGIN DATA.
+123
+404
+555
+999
+END DATA.
+```
+
+Partial output from this example:
+
+```
+123 123.00
+404 527.00
+555 1082.00
+999 2081.00
+```
+
+It is best to use the `LEAVE` command immediately before invoking a
+procedure command, because the left status of variables is reset by
+certain transformations, such as `COMPUTE` and `IF`. Left status is
+also reset by all procedure invocations.
+
--- /dev/null
+# LIST
+
+```
+LIST
+ /VARIABLES=VAR_LIST
+ /CASES=FROM START_INDEX TO END_INDEX BY INCR_INDEX
+ /FORMAT={UNNUMBERED,NUMBERED} {WRAP,SINGLE}
+```
+
+ The `LIST` procedure prints the values of specified variables to the
+listing file.
+
+ The `VARIABLES` subcommand specifies the variables whose values are
+to be printed. Keyword `VARIABLES` is optional. If the `VARIABLES`
+subcommand is omitted then all variables in the active dataset are
+printed.
+
+ The `CASES` subcommand can be used to specify a subset of cases to be
+printed. Specify `FROM` and the case number of the first case to print,
+`TO` and the case number of the last case to print, and `BY` and the
+number of cases to advance between printing cases, or any subset of
+those settings. If `CASES` is not specified then all cases are printed.
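+
+For example, the following prints every fifth case, starting with case
+10 and not continuing past case 100:
+
+```
+LIST /CASES=FROM 10 TO 100 BY 5.
+```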
+
+ The `FORMAT` subcommand can be used to change the output format.
+`NUMBERED` will print case numbers along with each case; `UNNUMBERED`,
+the default, causes the case numbers to be omitted. The `WRAP` and
+`SINGLE` settings are currently not used.
+
+ Case numbers start from 1. They are counted after all
+transformations have been considered.
+
+ `LIST` is a procedure. It causes the data to be read.
+
--- /dev/null
+# LOGISTIC REGRESSION
+
+```
+LOGISTIC REGRESSION [VARIABLES =] DEPENDENT_VAR WITH PREDICTORS
+ [/CATEGORICAL = CATEGORICAL_PREDICTORS]
+ [{/NOCONST | /ORIGIN | /NOORIGIN }]
+ [/PRINT = [SUMMARY] [DEFAULT] [CI(CONFIDENCE)] [ALL]]
+  [/CRITERIA = [BCON(MIN_DELTA)] [ITERATE(MAX_ITERATIONS)]
+ [LCON(MIN_LIKELIHOOD_DELTA)] [EPS(MIN_EPSILON)]
+ [CUT(CUT_POINT)]]
+ [/MISSING = {INCLUDE|EXCLUDE}]
+```
+
+Bivariate Logistic Regression is used when you want to explain a
+dichotomous dependent variable in terms of one or more predictor
+variables.
+
+The minimum command is
+```
+LOGISTIC REGRESSION y WITH x1 x2 ... xN.
+```
+
+Here, `y` is the dependent variable, which must be dichotomous and
+`x1` through `xN` are the predictor variables whose coefficients the
+procedure estimates.
+
+By default, a constant term is included in the model. Hence, the
+full model is $${\bf y} = b_0 + b_1 {\bf x_1} + b_2 {\bf x_2} + \dots +
+b_n {\bf x_n}.$$
+
+Predictor variables which are categorical in nature should be listed
+on the `/CATEGORICAL` subcommand. Simple variables as well as
+interactions between variables may be listed here.
+
+If you want a model without the constant term b_0, use the keyword
+`/ORIGIN`. `/NOCONST` is a synonym for `/ORIGIN`.
+
+An iterative Newton-Raphson procedure is used to fit the model. The
+`/CRITERIA` subcommand is used to specify the stopping criteria of the
+procedure, and other parameters. The value of `CUT_POINT` is used in the
+classification table. It is the threshold above which predicted values
+are considered to be 1. Values of `CUT_POINT` must lie in the range
+\[0,1\]. During iterations, if any one of the stopping criteria is
+satisfied, the procedure is considered complete. The stopping criteria
+are:
+
+- The number of iterations exceeds `MAX_ITERATIONS`. The default value
+ of `MAX_ITERATIONS` is 20.
+- The change in all coefficient estimates is less than
+  `MIN_DELTA`. The default value of `MIN_DELTA` is 0.001.
+- The magnitude of change in the likelihood estimate is less than
+ `MIN_LIKELIHOOD_DELTA`. The default value of `MIN_LIKELIHOOD_DELTA`
+ is zero. This means that this criterion is disabled.
+- The differential of the estimated probability for all cases is less
+ than `MIN_EPSILON`. In other words, the probabilities are close to
+ zero or one. The default value of `MIN_EPSILON` is 0.00000001.
+
+The `PRINT` subcommand controls the display of optional statistics.
+Currently there is one such option, `CI`, which indicates that the
+confidence interval of the odds ratio should be displayed as well as its
+value. `CI` should be followed by an integer in parentheses, to
+indicate the confidence level of the desired confidence interval.
+
+The `MISSING` subcommand determines the handling of missing
+values. If `INCLUDE` is set, then user-missing values are included
+in the calculations, but system-missing values are not. If `EXCLUDE` is
+set, which is the default, user-missing values are excluded as well as
+system-missing values.
+
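+## Example
+
+The following sketch (the variable names are hypothetical) estimates a
+model predicting the dichotomous variable `outcome` from `age` and
+`group`, treating `group` as categorical and displaying 95% confidence
+intervals for the odds ratios:
+
+```
+LOGISTIC REGRESSION outcome WITH age group
+ /CATEGORICAL = group
+ /PRINT = CI(95).
+```
+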
--- /dev/null
+# LOOP…END LOOP
+
+```
+LOOP [INDEX_VAR=START TO END [BY INCR]] [IF CONDITION].
+ ...
+END LOOP [IF CONDITION].
+```
+
+`LOOP` iterates a group of commands. A number of termination options
+are offered.
+
+Specify `INDEX_VAR` to make that variable count from one value to
+another by a particular increment. `INDEX_VAR` must be a pre-existing
+numeric variable. `START`, `END`, and `INCR` are numeric
+[expressions](../language/expressions/index.md).
+
+During the first iteration, `INDEX_VAR` is set to the value of
+`START`. During each successive iteration, `INDEX_VAR` is increased
+by the value of `INCR`. If `END > START`, then the loop terminates
+when `INDEX_VAR > END`; otherwise it terminates when `INDEX_VAR <
+END`. If `INCR` is not specified then it defaults to +1 or -1 as
+appropriate.
+
+If `END > START` and `INCR < 0`, or if `END < START` and `INCR > 0`,
+then the loop is never executed. `INDEX_VAR` is nevertheless set to
+the value of `START`.
+
+Modifying `INDEX_VAR` within the loop is allowed, but it has no effect
+on the value of `INDEX_VAR` in the next iteration.
+
+Specify a boolean expression for the condition on `LOOP` to cause the
+loop to be executed only if the condition is true. If the condition
+is false or missing before the loop contents are executed the first
+time, the loop contents are not executed at all.
+
+If index and condition clauses are both present on `LOOP`, the index
+variable is always set before the condition is evaluated. Thus, a
+condition that makes use of the index variable will always see the index
+value to be used in the next execution of the body.
+
+Specify a boolean expression for the condition on `END LOOP` to cause
+the loop to terminate if the condition is true after the enclosed code
+block is executed. The condition is evaluated at the end of the loop,
+not at the beginning, so that the body of a loop with only a condition
+on `END LOOP` will always execute at least once.
+
+If the index clause is not present, then the global
+[`MXLOOPS`](set.md#mxloops) setting, which defaults to
+40, limits the number of iterations.
+
+[`BREAK`](break.md) also terminates `LOOP` execution.
+
+Loop index variables are by default reset to system-missing from one
+case to another, not left, unless a scratch variable is used as index.
+When loops are nested, this is usually undesired behavior, which can
+be corrected with [`LEAVE`](leave.md) or by using a [scratch
+variable](../language/datasets/scratch-variables.md) as the loop
+index.
+
+When `LOOP` or `END LOOP` is specified following
+[`TEMPORARY`](temporary.md), the
+[`LAG`](../language/expressions/functions/miscellaneous.md) function
+may not be used.
+
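+## Example
+
+As a small sketch (an active dataset is assumed, since `LOOP` is a
+transformation), the following uses an index clause with a [scratch
+variable](../language/datasets/scratch-variables.md) to fill a
+five-element vector with squares:
+
+```
+VECTOR v(5).
+LOOP #i=1 TO 5.
+ COMPUTE v(#i) = #i * #i.
+END LOOP.
+```
+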
--- /dev/null
+# MATCH FILES
+
+```
+MATCH FILES
+
+Per input file:
+ /{FILE,TABLE}={*,'FILE_NAME'}
+ [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+ [/IN=VAR_NAME]
+ [/SORT]
+
+Once per command:
+  /BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]...]
+ [/DROP=VAR_LIST]
+ [/KEEP=VAR_LIST]
+ [/FIRST=VAR_NAME]
+ [/LAST=VAR_NAME]
+ [/MAP]
+```
+
+`MATCH FILES` merges sets of corresponding cases in multiple input
+files into single cases in the output, combining their data.
+
+`MATCH FILES` shares the bulk of its syntax with other PSPP commands
+for combining multiple data files (see [Common
+Syntax](combining.md#common-syntax) for details).
+
+How `MATCH FILES` matches up cases from the input files depends on
+whether `BY` is specified:
+
+- If `BY` is not used, `MATCH FILES` combines the first case from
+ each input file to produce the first output case, then the second
+ case from each input file for the second output case, and so on.
+ If some input files have fewer cases than others, then the shorter
+ files do not contribute to cases output after their input has been
+ exhausted.
+
+- If `BY` is used, `MATCH FILES` combines cases from each input file
+ that have identical values for the `BY` variables.
+
+ When `BY` is used, `TABLE` subcommands may be used to introduce
+  "table lookup files". `TABLE` has the same syntax as `FILE`, and the
+  `RENAME`, `IN`, and `SORT` subcommands may follow a `TABLE` in the
+  same way as `FILE`. Regardless of the number of `TABLE`s, at least
+  one `FILE` must be specified. Table lookup files are treated in the
+ same way as other input files for most purposes and, in particular,
+ table lookup files must be sorted on the `BY` variables or the
+ `SORT` subcommand must be specified for that `TABLE`.
+
+ Cases in table lookup files are not consumed after they have been
+ used once. This means that data in table lookup files can
+ correspond to any number of cases in `FILE` input files. Table
+ lookup files are analogous to lookup tables in traditional
+ relational database systems.
+
+ If a table lookup file contains more than one case with a given set
+ of `BY` variables, only the first case is used.
+
+When `MATCH FILES` creates an output case, variables that come only
+from files that do not contribute to the current case are set to the
+system-missing value for numeric variables or spaces for string
+variables.
+
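+## Example
+
+The following sketch (the file names and variable name are
+hypothetical) matches an employee file against a department table
+lookup file, so that each employee case is combined with the data for
+its department:
+
+```
+MATCH FILES
+ /FILE='employees.sav'
+ /TABLE='departments.sav'
+ /BY dept_id.
+```
+
+Both files must already be sorted on `dept_id`; otherwise, add the
+`SORT` subcommand after each input file.
+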
--- /dev/null
+# Matrices
+
+Some PSPP procedures work with matrices by producing numeric matrices
+that report results of data analysis, or by consuming matrices as a
+basis for further analysis. This chapter documents the [format of
+data files](#matrix-files) that store these matrices and commands for
+working with them, as well as PSPP's general-purpose facility for
+matrix operations.
+
+## Matrix Files
+
+A matrix file is an SPSS system file that conforms to the dictionary and
+case structure described in this section. Procedures that read matrices
+from files expect them to be in the matrix file format. Procedures that
+write matrices also use this format.
+
+Text files that contain matrices can be converted to matrix file
+format. The [MATRIX DATA](matrix-data.md) command can read a text
+file as a matrix file.
+
+A matrix file's dictionary must have the following variables in the
+specified order:
+
+1. Zero or more numeric split variables. These are included by
+ procedures when [`SPLIT FILE`](split-file.md) is active. [`MATRIX
+ DATA`](matrix-data.md) assigns split variables format `F4.0`.
+
+2. `ROWTYPE_`, a string variable with width 8. This variable
+ indicates the kind of matrix or vector that a given case
+ represents. The supported row types are listed below.
+
+3. Zero or more numeric factor variables. These are included by
+ procedures that divide data into cells. For within-cell data,
+ factor variables are filled with non-missing values; for pooled
+ data, they are missing. [`MATRIX DATA`](matrix-data.md) assigns
+ factor variables format `F4.0`.
+
+4. `VARNAME_`, a string variable. Matrix data includes one row per
+ continuous variable (see below), naming each continuous variable in
+ order. This column is blank for vector data. [`MATRIX
+ DATA`](matrix-data.md) makes `VARNAME_` wide enough for the name of
+ any of the continuous variables, but at least 8 bytes.
+
+5. One or more numeric continuous variables. These are the variables
+ whose data was analyzed to produce the matrices. [`MATRIX
+ DATA`](matrix-data.md) assigns continuous variables format `F10.4`.
+
+Case weights are ignored in matrix files.
+
+### Row Types
+
+Matrix files support a fixed set of types of matrix and vector data.
+The `ROWTYPE_` variable in each case of a matrix file indicates its row
+type.
+
+The supported matrix row types are listed below. Each type is listed
+with the keyword that identifies it in `ROWTYPE_`. All supported types
+of matrices are square, meaning that each matrix must include one row
+per continuous variable, with the `VARNAME_` variable indicating each
+continuous variable in turn in the same order as the dictionary.
+
+* `CORR`
+ Correlation coefficients.
+
+* `COV`
+ Covariance coefficients.
+
+* `MAT`
+ General-purpose matrix.
+
+* `N_MATRIX`
+ Counts.
+
+* `PROX`
+ Proximities matrix.
+
+The supported vector row types are listed below, along with their
+associated keyword. Vector row types only require a single row, whose
+`VARNAME_` is blank:
+
+* `COUNT`
+ Unweighted counts.
+
+* `DFE`
+ Degrees of freedom.
+
+* `MEAN`
+ Means.
+
+* `MSE`
+ Mean squared errors.
+
+* `N`
+ Counts.
+
+* `STDDEV`
+ Standard deviations.
+
+Only the row types listed above may appear in matrix files. The
+[`MATRIX DATA`](matrix-data.md) command, however, accepts the additional row types
+listed below, which it changes into matrix file row types as part of
+its conversion process:
+
+* `N_VECTOR`
+ Synonym for `N`.
+
+* `SD`
+ Synonym for `STDDEV`.
+
+* `N_SCALAR`
+ Accepts a single number from the [`MATRIX DATA`](matrix-data.md)
+ input and writes it as an `N` row with the number replicated across
+ all the continuous variables.
+
--- /dev/null
+# MATRIX DATA
+
+```
+MATRIX DATA
+ VARIABLES=VARIABLES
+  [FILE={'FILE_NAME' | INLINE}]
+ [/FORMAT=[{LIST | FREE}]
+ [{UPPER | LOWER | FULL}]
+ [{DIAGONAL | NODIAGONAL}]]
+ [/SPLIT=SPLIT_VARS]
+ [/FACTORS=FACTOR_VARS]
+ [/N=N]
+
+The following subcommands are only needed when ROWTYPE_ is not
+specified on the VARIABLES subcommand:
+ [/CONTENTS={CORR,COUNT,COV,DFE,MAT,MEAN,MSE,
+ N_MATRIX,N|N_VECTOR,N_SCALAR,PROX,SD|STDDEV}]
+ [/CELLS=N_CELLS]
+```
+
+The `MATRIX DATA` command converts matrices and vectors from text
+format into the [matrix file format](matrices.md#matrix-files) for use
+by procedures that read matrices. It reads a text file or inline data
+and outputs to the active file, replacing any data already in the
+active dataset. The matrix file may then be used by other commands
+directly from the active file, or it may be written to a `.sav` file
+using the `SAVE` command.
+
+The text data read by `MATRIX DATA` can be delimited by spaces or
+commas. A plus or minus sign, except immediately following a `d` or
+`e`, also begins a new value. Optionally, values may be enclosed in
+single or double quotes.
+
+`MATRIX DATA` can read the types of matrix and vector data supported
+in matrix files (see [Row Types](matrices.md#row-types)).
+
+The `FILE` subcommand specifies the source of the command's input. To
+read input from a text file, specify its name in quotes. To supply
+input inline, omit `FILE` or specify `INLINE`. Inline data must
+directly follow `MATRIX DATA`, inside [`BEGIN DATA`](begin-data.md).
+
+`VARIABLES` is the only required subcommand. It names the variables
+present in each input record in the order that they appear. (`MATRIX
+DATA` reorders the variables in the matrix file it produces, if needed
+to fit the matrix file format.) The variable list must include split
+variables and factor variables, if they are present in the data, in
+addition to the continuous variables that form matrix rows and columns.
+It may also include a special variable named `ROWTYPE_`.
+
+Matrix data may include split variables or factor variables or both.
+List split variables, if any, on the `SPLIT` subcommand and factor
+variables, if any, on the `FACTORS` subcommand. Split and factor
+variables must be numeric. Split and factor variables must also be
+listed on `VARIABLES`, with one exception: if `VARIABLES` does not
+include `ROWTYPE_`, then `SPLIT` may name a single variable that is not
+in `VARIABLES` (see [Example 8](#example-8-split-variable-with-sequential-values)).
+
+The `FORMAT` subcommand accepts settings to describe the format of
+the input data:
+
+* `LIST` (default)
+ `FREE`
+
+ `LIST` requires each row to begin at the start of a new input line.
+ `FREE` allows rows to begin in the middle of a line. Either setting
+ allows a single row to continue across multiple input lines.
+
+* `LOWER` (default)
+ `UPPER`
+ `FULL`
+
+ With `LOWER`, only the lower triangle is read from the input data and
+ the upper triangle is mirrored across the main diagonal. `UPPER`
+ behaves similarly for the upper triangle. `FULL` reads the entire
+ matrix.
+
+* `DIAGONAL` (default)
+ `NODIAGONAL`
+
+ With `DIAGONAL`, the main diagonal is read from the input data. With
+ `NODIAGONAL`, which is incompatible with `FULL`, the main diagonal is
+ not read from the input data but instead set to 1 for correlation
+ matrices and system-missing for others.
+
+The `N` subcommand is a way to specify the size of the population.
+It is equivalent to specifying an `N` vector with the specified value
+for each split file.
+
+`MATRIX DATA` supports two different ways to indicate the kinds of
+matrices and vectors present in the data, depending on whether a
+variable with the special name `ROWTYPE_` is present in `VARIABLES`.
+The following subsections explain `MATRIX DATA` syntax and behavior in
+each case.
+
+<!-- toc -->
+
+## With `ROWTYPE_`
+
+If `VARIABLES` includes `ROWTYPE_`, each case's `ROWTYPE_` indicates
+the type of data contained in the row. See [Row
+Types](matrices.md#row-types) for a list of supported row types.
+
+### Example 1: Defaults with `ROWTYPE_`
+
+This example shows a simple use of `MATRIX DATA` with `ROWTYPE_` plus 8
+variables named `var01` through `var08`.
+
+Because `ROWTYPE_` is the first variable in `VARIABLES`, it appears
+first on each line. The first three lines in the example data have
+`ROWTYPE_` values of `MEAN`, `SD`, and `N`. These indicate that these
+lines contain vectors of means, standard deviations, and counts,
+respectively, for `var01` through `var08` in order.
+
+The remaining 8 lines have a ROWTYPE_ of `CORR` which indicates that
+the values are correlation coefficients. Each of the lines corresponds
+to a row in the correlation matrix: the first line is for `var01`, the
+next line for `var02`, and so on. The input only contains values for
+the lower triangle, including the diagonal, since `FORMAT=LOWER
+DIAGONAL` is the default.
+
+With `ROWTYPE_`, the `CONTENTS` subcommand is optional and the
+`CELLS` subcommand may not be used.
+
+```
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var08.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+N 92 92 92 92 92 92 92 92
+CORR 1.00
+CORR .18 1.00
+CORR -.22 -.17 1.00
+CORR .36 .31 -.14 1.00
+CORR .27 .16 -.12 .22 1.00
+CORR .33 .15 -.17 .24 .21 1.00
+CORR .50 .29 -.20 .32 .12 .38 1.00
+CORR .17 .29 -.05 .20 .27 .20 .04 1.00
+END DATA.
+```
+
+### Example 2: `FORMAT=UPPER NODIAGONAL`
+
+This syntax produces the same matrix file as example 1, but it uses
+`FORMAT=UPPER NODIAGONAL` to specify the upper triangle and omit the
+diagonal. Because the matrix's `ROWTYPE_` is `CORR`, PSPP automatically
+fills in the diagonal with 1.
+
+```
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var08
+ /FORMAT=UPPER NODIAGONAL.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+N 92 92 92 92 92 92 92 92
+CORR .18 -.22 .36 .27 .33 .50 .17
+CORR -.17 .31 .16 .15 .29 .29
+CORR -.14 -.12 -.17 -.20 -.05
+CORR .22 .24 .32 .20
+CORR .21 .12 .27
+CORR .38 .20
+CORR .04
+END DATA.
+```
+
+### Example 3: `N` subcommand
+
+This syntax uses the `N` subcommand in place of an `N` vector. It
+produces the same matrix file as examples 1 and 2.
+
+```
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var08
+ /FORMAT=UPPER NODIAGONAL
+  /N=92.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+CORR .18 -.22 .36 .27 .33 .50 .17
+CORR -.17 .31 .16 .15 .29 .29
+CORR -.14 -.12 -.17 -.20 -.05
+CORR .22 .24 .32 .20
+CORR .21 .12 .27
+CORR .38 .20
+CORR .04
+END DATA.
+```
+
+### Example 4: Split variables
+
+This syntax defines two matrices, using the variable `s1` to distinguish
+between them. Notice how the order of variables in the input matches
+their order on `VARIABLES`. This example also uses `FORMAT=FULL`.
+
+```
+MATRIX DATA
+ VARIABLES=s1 ROWTYPE_ var01 TO var04
+ /SPLIT=s1
+ /FORMAT=FULL.
+BEGIN DATA.
+0 MEAN 34 35 36 37
+0 SD 22 11 55 66
+0 N 99 98 99 92
+0 CORR 1 .9 .8 .7
+0 CORR .9 1 .6 .5
+0 CORR .8 .6 1 .4
+0 CORR .7 .5 .4 1
+1 MEAN 44 45 34 39
+1 SD 23 15 51 46
+1 N 98 34 87 23
+1 CORR 1 .2 .3 .4
+1 CORR .2 1 .5 .6
+1 CORR .3 .5 1 .7
+1 CORR .4 .6 .7 1
+END DATA.
+```
+
+### Example 5: Factor variables
+
+This syntax defines a matrix file that includes a factor variable `f1`.
+The data includes mean, standard deviation, and count vectors for two
+values of the factor variable, plus a correlation matrix for pooled
+data.
+
+```
+MATRIX DATA
+ VARIABLES=ROWTYPE_ f1 var01 TO var04
+  /FACTORS=f1.
+BEGIN DATA.
+MEAN 0 34 35 36 37
+SD 0 22 11 55 66
+N 0 99 98 99 92
+MEAN 1 44 45 34 39
+SD 1 23 15 51 46
+N 1 98 34 87 23
+CORR . 1
+CORR . .9 1
+CORR . .8 .6 1
+CORR . .7 .5 .4 1
+END DATA.
+```
+
+## Without `ROWTYPE_`
+
+If `VARIABLES` does not contain `ROWTYPE_`, the `CONTENTS` subcommand
+defines the row types that appear in the file and their order. If
+`CONTENTS` is omitted, `CONTENTS=CORR` is assumed.
+
+Factor variables without `ROWTYPE_` introduce special requirements,
+illustrated below in Examples 9 and 10.
+
+### Example 6: Defaults without `ROWTYPE_`
+
+This example shows a simple use of `MATRIX DATA` with 8 variables named
+`var01` through `var08`, without `ROWTYPE_`. This yields the same
+matrix file as [Example 1](#example-1-defaults-with-rowtype_).
+
+```
+MATRIX DATA
+ VARIABLES=var01 TO var08
+ /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+ 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+ 92 92 92 92 92 92 92 92
+1.00
+ .18 1.00
+-.22 -.17 1.00
+ .36 .31 -.14 1.00
+ .27 .16 -.12 .22 1.00
+ .33 .15 -.17 .24 .21 1.00
+ .50 .29 -.20 .32 .12 .38 1.00
+ .17 .29 -.05 .20 .27 .20 .04 1.00
+END DATA.
+```
+
+### Example 7: Split variables with explicit values
+
+This syntax defines two matrices, using the variable `s1` to distinguish
+between them. Each line of data begins with `s1`. This yields the same
+matrix file as [Example 4](#example-4-split-variables).
+
+```
+MATRIX DATA
+ VARIABLES=s1 var01 TO var04
+ /SPLIT=s1
+ /FORMAT=FULL
+ /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+0 1 .9 .8 .7
+0 .9 1 .6 .5
+0 .8 .6 1 .4
+0 .7 .5 .4 1
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+1 1 .2 .3 .4
+1 .2 1 .5 .6
+1 .3 .5 1 .7
+1 .4 .6 .7 1
+END DATA.
+```
+
+### Example 8: Split variable with sequential values
+
+Like the previous example, this syntax defines two matrices with split
+variable `s1`. In this case, though, `s1` is not listed in `VARIABLES`,
+which means that its value does not appear in the data. Instead,
+`MATRIX DATA` reads matrix data until the input is exhausted, supplying
+1 for the first split, 2 for the second, and so on.
+
+```
+MATRIX DATA
+ VARIABLES=var01 TO var04
+ /SPLIT=s1
+ /FORMAT=FULL
+ /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+34 35 36 37
+22 11 55 66
+99 98 99 92
+ 1 .9 .8 .7
+.9 1 .6 .5
+.8 .6 1 .4
+.7 .5 .4 1
+44 45 34 39
+23 15 51 46
+98 34 87 23
+ 1 .2 .3 .4
+.2 1 .5 .6
+.3 .5 1 .7
+.4 .6 .7 1
+END DATA.
+```
+
+### Factor variables without `ROWTYPE_`
+
+Without `ROWTYPE_`, factor variables introduce two new wrinkles to
+`MATRIX DATA` syntax. First, the `CELLS` subcommand must declare the
+number of combinations of factor variables present in the data. If
+there is, for example, one factor variable for which the data contains
+three values, one would write `CELLS=3`; if there are two (or more)
+factor variables for which the data contains five combinations, one
+would use `CELLS=5`; and so on.
+
+Second, the `CONTENTS` subcommand must distinguish within-cell data
+from pooled data by enclosing within-cell row types in parentheses.
+When different within-cell row types for a single factor appear in
+subsequent lines, enclose the row types in a single set of parentheses;
+when different factors' values for a given within-cell row type appear
+in subsequent lines, enclose each row type in individual parentheses.
+
+Without `ROWTYPE_`, input lines for pooled data do not include factor
+values, not even as missing values, but input lines for within-cell data
+do.
+
+The following examples aim to clarify this syntax.
+
+#### Example 9: Factor variables, grouping within-cell records by factor
+
+This syntax defines the same matrix file as [Example
+5](#example-5-factor-variables), without using `ROWTYPE_`. It
+declares `CELLS=2` because the data contains two values (0 and 1) for
+factor variable `f1`. Within-cell vector row types `MEAN`, `SD`, and
+`N` are in a single set of parentheses on `CONTENTS` because they are
+grouped together in subsequent lines for a single factor value. The
+data lines with the pooled correlation matrix do not have any factor
+values.
+
+```
+MATRIX DATA
+ VARIABLES=f1 var01 TO var04
+ /FACTOR=f1
+ /CELLS=2
+ /CONTENTS=(MEAN SD N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+ 1
+ .9 1
+ .8 .6 1
+ .7 .5 .4 1
+END DATA.
+```
+
+#### Example 10: Factor variables, grouping within-cell records by row type
+
+This syntax defines the same matrix file as the previous example. The
+only difference is that the within-cell vector rows are grouped
+differently: two rows of means (one for each factor), followed by two
+rows of standard deviations, followed by two rows of counts.
+
+```
+MATRIX DATA
+ VARIABLES=f1 var01 TO var04
+ /FACTOR=f1
+ /CELLS=2
+ /CONTENTS=(MEAN) (SD) (N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+1 44 45 34 39
+0 22 11 55 66
+1 23 15 51 46
+0 99 98 99 92
+1 98 34 87 23
+ 1
+ .9 1
+ .8 .6 1
+ .7 .5 .4 1
+END DATA.
+```
--- /dev/null
+# MATRIX…END MATRIX
+
+<!-- toc -->
+
+## Summary
+
+```
+MATRIX.
+…matrix commands…
+END MATRIX.
+```
+
+The following basic matrix commands are supported:
+
+```
+COMPUTE variable[(index[,index])]=expression.
+CALL procedure(argument, …).
+PRINT [expression]
+ [/FORMAT=format]
+ [/TITLE=title]
+ [/SPACE={NEWPAGE | n}]
+ [{/RLABELS=string… | /RNAMES=expression}]
+ [{/CLABELS=string… | /CNAMES=expression}].
+```
+
+The following matrix commands offer support for flow control:
+
+```
+DO IF expression.
+ …matrix commands…
+[ELSE IF expression.
+ …matrix commands…]…
+[ELSE.
+ …matrix commands…]
+END IF.
+
+LOOP [var=first TO last [BY step]] [IF expression].
+ …matrix commands…
+END LOOP [IF expression].
+
+BREAK.
+```
+
+The following matrix commands support matrix input and output:
+
+```
+READ variable[(index[,index])]
+ [/FILE=file]
+ /FIELD=first TO last [BY width]
+ [/FORMAT=format]
+ [/SIZE=expression]
+ [/MODE={RECTANGULAR | SYMMETRIC}]
+ [/REREAD].
+WRITE expression
+ [/OUTFILE=file]
+ /FIELD=first TO last [BY width]
+ [/MODE={RECTANGULAR | TRIANGULAR}]
+ [/HOLD]
+ [/FORMAT=format].
+GET variable[(index[,index])]
+ [/FILE={file | *}]
+ [/VARIABLES=variable…]
+ [/NAMES=expression]
+ [/MISSING={ACCEPT | OMIT | number}]
+ [/SYSMIS={OMIT | number}].
+SAVE expression
+ [/OUTFILE={file | *}]
+ [/VARIABLES=variable…]
+ [/NAMES=expression]
+ [/STRINGS=variable…].
+MGET [/FILE=file]
+ [/TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}].
+MSAVE expression
+ /TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}
+ [/OUTFILE=file]
+ [/VARIABLES=variable…]
+ [/SNAMES=variable…]
+ [/SPLIT=expression]
+ [/FNAMES=variable…]
+ [/FACTOR=expression].
+```
+
+The following matrix commands provide additional support:
+
+```
+DISPLAY [{DICTIONARY | STATUS}].
+RELEASE variable….
+```
+
+`MATRIX` and `END MATRIX` enclose a special PSPP sub-language, called
+the matrix language. The matrix language does not require an active
+dataset to be defined and only a few of the matrix language commands
+work with any datasets that are defined. Each instance of
+`MATRIX`…`END MATRIX` is a separate program whose state is independent
+of any other instance, so that variables declared within a matrix
+program are forgotten at its end.
+
+The matrix language works with matrices, where a "matrix" is a
+rectangular array of real numbers. An `N`×`M` matrix has `N` rows and
+`M` columns. Some special cases are important: an `N`×1 matrix is a
+"column vector", a 1×`N` matrix is a "row vector", and a 1×1 matrix is
+a "scalar".
+
+The matrix language also has limited support for matrices that
+contain 8-byte strings instead of numbers. Strings longer than 8 bytes
+are truncated, and shorter strings are padded with spaces. String
+matrices are mainly useful for labeling rows and columns when printing
+numerical matrices with the `PRINT` command. Arithmetic
+operations on string matrices will not produce useful results. The user
+should not mix strings and numbers within a matrix.
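+
+As a sketch of that use, a string vector can label printed output (the
+exact quoting rules and output layout follow the `PRINT` command,
+described later):
+
+```
+COMPUTE m={1, 2; 3, 4}.
+COMPUTE names={'one', 'two'}.
+PRINT m
+  /RNAMES=names
+  /CNAMES=names.
+```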
+
+The matrix language does not work with cases. A variable in the
+matrix language represents a single matrix.
+
+The matrix language does not support missing values.
+
+`MATRIX` is a procedure, so it cannot be enclosed inside `DO IF`,
+`LOOP`, etc.
+
+Macros defined before a matrix program may be used within a matrix
+program, and macros may expand to include entire matrix programs. The
+[`DEFINE`](define.md) command to define new
+macros may not appear within a matrix program.
+
+The following sections describe the details of the matrix language:
+first, the syntax of matrix expressions, then each of the supported
+commands. The [`COMMENT`](comment.md) command is also supported.
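+
+As a brief orientation, a minimal complete matrix program might look
+like this (a sketch; the exact output layout depends on `PRINT`,
+described later):
+
+```
+MATRIX.
+COMPUTE m={1, 2; 3, 4}.
+PRINT m
+  /TITLE='A 2×2 matrix'.
+END MATRIX.
+```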
+
+## Matrix Expressions
+
+Many matrix commands use expressions. A matrix expression may use the
+following operators, listed in descending order of operator precedence.
+Within a single level, operators associate from left to right.
+
+- [Function call `()`](#matrix-functions) and [matrix construction `{}`](#matrix-construction-operator-)
+
+- [Indexing `()`](#index-operator-)
+
+- [Unary `+` and `-`](#unary-operators)
+
+- [Integer sequence `:`](#integer-sequence-operator-)
+
+- Matrix [`**`](#matrix-exponentiation-operator-) and elementwise [`&**`](#elementwise-binary-operators) exponentiation.
+
+- Matrix [`*`](#matrix-multiplication-operator-) and elementwise [`&*`](#elementwise-binary-operators) multiplication; [elementwise division `/` and `&/`](#elementwise-binary-operators).
+
+- [Addition `+` and subtraction `-`](#elementwise-binary-operators)
+
+- [Relational `<` `<=` `=` `>=` `>` `<>`](#elementwise-binary-operators)
+
+- [Logical `NOT`](#unary-operators)
+
+- [Logical `AND`](#elementwise-binary-operators)
+
+- [Logical `OR` and `XOR`](#elementwise-binary-operators)
+
+The operators are described in more detail below. [Matrix
+Functions](#matrix-functions) documents matrix functions.
+
+Expressions appear in the matrix language in some contexts where it
+would otherwise be ambiguous whether `/` is an operator or a separator
+between subcommands. In these contexts, only the operators with higher
+precedence than `/` are allowed outside parentheses. Later sections
+call these "restricted expressions".
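+
+For example, on a subcommand that takes a restricted expression, such
+as `READ`'s `SIZE`, a division must be parenthesized so that `/` is
+not taken as the start of another subcommand. This hypothetical sketch
+assumes `n` was computed earlier:
+
+```
+READ x
+  /FILE='data.txt'
+  /FIELD=1 TO 80
+  /SIZE={(n/2), 2}.
+```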
+
+### Matrix Construction Operator `{}`
+
+Use the `{}` operator to construct matrices. Within the curly braces,
+commas separate elements within a row and semicolons separate rows. The
+following examples show a 2×3 matrix, a 1×4 row vector, a 3×1 column
+vector, and a scalar.
+
+```
+{1, 2, 3; 4, 5, 6} ⇒ [1 2 3]
+ [4 5 6]
+{3.14, 6.28, 9.42, 12.57} ⇒ [3.14 6.28 9.42 12.57]
+{1.41; 1.73; 2} ⇒ [1.41]
+ [1.73]
+ [2.00]
+{5} ⇒ 5
+```
+
+ Curly braces are not limited to holding numeric literals. They can
+contain calculations, and they can paste together matrices and vectors
+in any way as long as the result is rectangular. For example, if `m` is
+matrix `{1, 2; 3, 4}`, `r` is row vector `{5, 6}`, and `c` is column
+vector `{7, 8}`, then curly braces can be used as follows:
+
+```
+{m, c; r, 10} ⇒ [1 2 7]
+ [3 4 8]
+ [5 6 10]
+{c, 2 * c, T(r)} ⇒ [7 14 5]
+ [8 16 6]
+```
+
+ The final example above uses the transposition function `T`.
+
+### Integer Sequence Operator `:`
+
+The syntax `FIRST:LAST:STEP` yields a row vector of consecutive integers
+from `FIRST` to `LAST` counting by `STEP`. The final `:STEP` is optional
+and defaults to 1 when omitted.
+
+`FIRST`, `LAST`, and `STEP` must each be a scalar and should be an
+integer (any fractional part is discarded). Because `:` has a high
+precedence, operands other than numeric literals must usually be
+parenthesized.
+
+When `STEP` is positive (or omitted) and `LAST < FIRST`, or when `STEP`
+is negative and `LAST > FIRST`, then the result is an empty matrix. If
+`STEP` is 0, then PSPP reports an error.
+
+Here are some examples:
+
+```
+1:6 ⇒ {1, 2, 3, 4, 5, 6}
+1:6:2 ⇒ {1, 3, 5}
+-1:-5:-1 ⇒ {-1, -2, -3, -4, -5}
+-1:-5 ⇒ {}
+2:1:0 ⇒ (error)
+```
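+
+Because `:` binds so tightly, operands that are themselves expressions
+need parentheses, as in this sketch:
+
+```
+COMPUTE k=3.
+COMPUTE v=(k+1):(2*k). /* Equivalent to 4:6, that is, {4, 5, 6}.
+```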
+
+### Index Operator `()`
+
+The result of the submatrix or indexing operator, written `M(RINDEX,
+CINDEX)`, contains the rows of `M` whose indexes are given in vector
+`RINDEX` and the columns whose indexes are given in vector `CINDEX`.
+
+ In the simplest case, if `RINDEX` and `CINDEX` are both scalars, the
+result is also a scalar:
+
+```
+{10, 20; 30, 40}(1, 1) ⇒ 10
+{10, 20; 30, 40}(1, 2) ⇒ 20
+{10, 20; 30, 40}(2, 1) ⇒ 30
+{10, 20; 30, 40}(2, 2) ⇒ 40
+```
+
+If the index arguments have multiple elements, then the result
+includes multiple rows or columns:
+
+```
+{10, 20; 30, 40}(1:2, 1) ⇒ {10; 30}
+{10, 20; 30, 40}(2, 1:2) ⇒ {30, 40}
+{10, 20; 30, 40}(1:2, 1:2) ⇒ {10, 20; 30, 40}
+```
+
+The special argument `:` may stand in for all the rows or columns in
+the matrix being indexed, like this:
+
+```
+{10, 20; 30, 40}(:, 1) ⇒ {10; 30}
+{10, 20; 30, 40}(2, :) ⇒ {30, 40}
+{10, 20; 30, 40}(:, :) ⇒ {10, 20; 30, 40}
+```
+
+The index arguments do not have to be in order, and they may contain
+repeated values, like this:
+
+```
+{10, 20; 30, 40}({2, 1}, 1) ⇒ {30; 10}
+{10, 20; 30, 40}(2, {2; 2; 1}) ⇒ {40, 40, 30}
+{10, 20; 30, 40}(2:1:-1, :) ⇒ {30, 40; 10, 20}
+```
+
+When the matrix being indexed is a row or column vector, only a
+single index argument is needed, like this:
+
+```
+{11, 12, 13, 14, 15}(2:4) ⇒ {12, 13, 14}
+{11; 12; 13; 14; 15}(2:4) ⇒ {12; 13; 14}
+```
+
+When an index is not an integer, PSPP discards the fractional part.
+It is an error for an index to be less than 1 or greater than the number
+of rows or columns:
+
+```
+{11, 12, 13, 14}({2.5, 4.6}) ⇒ {12, 14}
+{11; 12; 13; 14}(0) ⇒ (error)
+```
+
+### Unary Operators
+
+The unary operators take a single operand of any dimensions and operate
+on each of its elements independently. The unary operators are:
+
+* `-`: Inverts the sign of each element.
+* `+`: No change.
+* `NOT`: Logical inversion: each positive value becomes 0 and each
+ zero or negative value becomes 1.
+
+Examples:
+
+```
+-{1, -2; 3, -4} ⇒ {-1, 2; -3, 4}
++{1, -2; 3, -4} ⇒ {1, -2; 3, -4}
+NOT {1, 0; -1, 1} ⇒ {0, 1; 1, 0}
+```
+
+### Elementwise Binary Operators
+
+The elementwise binary operators require their operands to be matrices
+with the same dimensions. Alternatively, if one operand is a scalar,
+then its value is treated as if it were duplicated to the dimensions of
+the other operand. The result is a matrix of the same size as the
+operands, in which each element is the result of applying the
+operator to the corresponding elements of the operands.
+
+The elementwise binary operators are listed below.
+
+- The arithmetic operators, for familiar arithmetic operations:
+
+ - `+`: Addition.
+
+ - `-`: Subtraction.
+
+ - `*`: Multiplication, if one operand is a scalar. (Otherwise this
+ is matrix multiplication, described below.)
+
+ - `/` or `&/`: Division.
+
+ - `&*`: Multiplication.
+
+ - `&**`: Exponentiation.
+
+- The relational operators, whose results are 1 when a comparison is
+ true and 0 when it is false:
+
+ - `<` or `LT`: Less than.
+
+ - `<=` or `LE`: Less than or equal.
+
+ - `=` or `EQ`: Equal.
+
+ - `>` or `GT`: Greater than.
+
+ - `>=` or `GE`: Greater than or equal.
+
+ - `<>` or `~=` or `NE`: Not equal.
+
+- The logical operators, which treat positive operands as true and
+ nonpositive operands as false. They yield 0 for false and 1 for
+ true:
+
+ - `AND`: True if both operands are true.
+
+ - `OR`: True if at least one operand is true.
+
+ - `XOR`: True if exactly one operand is true.
+
+Examples:
+
+```
+1 + 2 ⇒ 3
+1 + {3; 4} ⇒ {4; 5}
+{66, 77; 88, 99} + 5 ⇒ {71, 82; 93, 104}
+{4, 8; 3, 7} + {1, 0; 5, 2} ⇒ {5, 8; 8, 9}
+{1, 2; 3, 4} < {4, 3; 2, 1} ⇒ {1, 1; 0, 0}
+{1, 3; 2, 4} >= 3 ⇒ {0, 1; 0, 1}
+{0, 0; 1, 1} AND {0, 1; 0, 1} ⇒ {0, 0; 0, 1}
+```
+
+### Matrix Multiplication Operator `*`
+
+If `A` is an `M`×`N` matrix and `B` is an `N`×`P` matrix, then `A*B` is the
+`M`×`P` matrix multiplication product `C`. PSPP reports an error if the
+number of columns in `A` differs from the number of rows in `B`.
+
+The `*` operator performs elementwise multiplication (see above) if
+one of its operands is a scalar.
+
+No built-in operator yields the inverse of matrix multiplication.
+Instead, multiply by the result of `INV` or `GINV`.
+
+Some examples:
+
+```
+{1, 2, 3} * {4; 5; 6} ⇒ 32
+{4; 5; 6} * {1, 2, 3} ⇒ {4, 8, 12;
+ 5, 10, 15;
+ 6, 12, 18}
+```
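+
+Because no operator divides one matrix by another, such a quotient is
+written with an explicit inverse, as in this sketch:
+
+```
+COMPUTE a={4, 7; 2, 6}.
+COMPUTE b={18, 16}.
+COMPUTE x=b*INV(a). /* x satisfies x*a = b.
+```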
+
+### Matrix Exponentiation Operator `**`
+
+The result of `A**B` is defined as follows when `A` is a square matrix
+and `B` is an integer scalar:
+
+ - For `B > 0`, `A**B` is `A*…*A`, where there are `B` `A`s. (PSPP
+ implements this efficiently for large `B`, using exponentiation by
+ squaring.)
+
+ - For `B < 0`, `A**B` is `INV(A**(-B))`.
+
+ - For `B = 0`, `A**B` is the identity matrix.
+
+PSPP reports an error if `A` is not square or `B` is not an integer.
+
+Examples:
+
+```
+{2, 5; 1, 4}**3 ⇒ {48, 165; 33, 114}
+{2, 5; 1, 4}**0 ⇒ {1, 0; 0, 1}
+10*{4, 7; 2, 6}**-1 ⇒ {6, -7; -2, 4}
+```
+
+## Matrix Functions
+
+The matrix language supports numerous functions in multiple categories.
+The following subsections document each of the currently supported
+functions. The first letter of each parameter's name indicates the
+required argument type:
+
+* `S`: A scalar.
+
+* `N`: A nonnegative integer scalar. (Non-integers are accepted and
+ silently rounded down to the nearest integer.)
+
+* `V`: A row or column vector.
+
+* `M`: A matrix.
+
+### Elementwise Functions
+
+These functions act on each element of their argument independently,
+like the [elementwise operators](#elementwise-binary-operators).
+
+* `ABS(M)`
+ Takes the absolute value of each element of M.
+
+ ```
+ ABS({-1, 2; -3, 0}) ⇒ {1, 2; 3, 0}
+ ```
+
+* `ARSIN(M)`
+ `ARTAN(M)`
+ Computes the inverse sine or tangent, respectively, of each
+ element in M. The results are in radians, between \\(-\pi/2\\)
+ and \\(+\pi/2\\), inclusive.
+
+ The value of \\(\pi\\) can be computed as `4*ARTAN(1)`.
+
+ ```
+ ARSIN({-1, 0, 1}) ⇒ {-1.57, 0, 1.57} (approximately)
+
+ ARTAN({-5, -1, 1, 5}) ⇒ {-1.37, -.79, .79, 1.37} (approximately)
+ ```
+
+* `COS(M)`
+ `SIN(M)`
+ Computes the cosine or sine, respectively, of each element in `M`,
+ which must be in radians.
+
+ ```
+ COS({0.785, 1.57; 3.14, 1.57 + 3.14}) ⇒ {.71, 0; -1, 0}
+ (approximately)
+ ```
+
+* `EXP(M)`
+ Computes \\(e^x\\) for each element \\(x\\) in `M`.
+
+ ```
+ EXP({2, 3; 4, 5}) ⇒ {7.39, 20.09; 54.6, 148.4} (approximately)
+ ```
+
+* `LG10(M)`
+ `LN(M)`
+ Takes the logarithm with base 10 or base \\(e\\), respectively, of each
+ element in `M`.
+
+ ```
+ LG10({1, 10, 100, 1000}) ⇒ {0, 1, 2, 3}
+ LG10(0) ⇒ (error)
+
+ LN({EXP(1), 1, 2, 3, 4}) ⇒ {1, 0, .69, 1.1, 1.39} (approximately)
+ LN(0) ⇒ (error)
+ ```
+
+* `MOD(M, S)`
+ Takes each element in `M` modulo nonzero scalar value `S`, that
+ is, the remainder of division by `S`. The sign of the result is
+ the same as the sign of the dividend.
+
+ ```
+ MOD({5, 4, 3, 2, 1, 0}, 3) ⇒ {2, 1, 0, 2, 1, 0}
+ MOD({5, 4, 3, 2, 1, 0}, -3) ⇒ {2, 1, 0, 2, 1, 0}
+ MOD({-5, -4, -3, -2, -1, 0}, 3) ⇒ {-2, -1, 0, -2, -1, 0}
+ MOD({-5, -4, -3, -2, -1, 0}, -3) ⇒ {-2, -1, 0, -2, -1, 0}
+ MOD({5, 4, 3, 2, 1, 0}, 1.5) ⇒ {.5, 1.0, .0, .5, 1.0, .0}
+ MOD({5, 4, 3, 2, 1, 0}, 0) ⇒ (error)
+ ```
+
+* `RND(M)`
+ `TRUNC(M)`
+ Rounds each element of `M` to an integer. `RND` rounds to the
+ nearest integer, with halves rounded to even integers, and
+ `TRUNC` rounds toward zero.
+
+ ```
+ RND({-1.6, -1.5, -1.4}) ⇒ {-2, -2, -1}
+ RND({-.6, -.5, -.4}) ⇒ {-1, 0, 0}
+  RND({.4, .5, .6}) ⇒ {0, 0, 1}
+ RND({1.4, 1.5, 1.6}) ⇒ {1, 2, 2}
+
+ TRUNC({-1.6, -1.5, -1.4}) ⇒ {-1, -1, -1}
+ TRUNC({-.6, -.5, -.4}) ⇒ {0, 0, 0}
+  TRUNC({.4, .5, .6}) ⇒ {0, 0, 0}
+ TRUNC({1.4, 1.5, 1.6}) ⇒ {1, 1, 1}
+ ```
+
+* `SQRT(M)`
+ Takes the square root of each element of `M`, which must not be
+ negative.
+
+ ```
+ SQRT({0, 1, 2, 4, 9, 81}) ⇒ {0, 1, 1.41, 2, 3, 9} (approximately)
+ SQRT(-1) ⇒ (error)
+ ```
+
+### Logical Functions
+
+* `ALL(M)`
+ Returns a scalar with value 1 if all of the elements in `M` are
+ nonzero, or 0 if at least one element is zero.
+
+ ```
+ ALL({1, 2, 3} < {2, 3, 4}) ⇒ 1
+ ALL({2, 2, 3} < {2, 3, 4}) ⇒ 0
+ ALL({2, 3, 3} < {2, 3, 4}) ⇒ 0
+ ALL({2, 3, 4} < {2, 3, 4}) ⇒ 0
+ ```
+
+* `ANY(M)`
+ Returns a scalar with value 1 if any of the elements in `M` is
+ nonzero, or 0 if all of them are zero.
+
+ ```
+ ANY({1, 2, 3} < {2, 3, 4}) ⇒ 1
+ ANY({2, 2, 3} < {2, 3, 4}) ⇒ 1
+ ANY({2, 3, 3} < {2, 3, 4}) ⇒ 1
+ ANY({2, 3, 4} < {2, 3, 4}) ⇒ 0
+ ```
+
+### Matrix Construction Functions
+
+* `BLOCK(M1, …, MN)`
+ Returns a block diagonal matrix with as many rows as the sum of
+ its arguments' row counts and as many columns as the sum of their
+ columns. Each argument matrix is placed along the main diagonal
+ of the result, and all other elements are zero.
+
+ ```
+ BLOCK({1, 2; 3, 4}, 5, {7; 8; 9}, {10, 11}) ⇒
+ 1 2 0 0 0 0
+ 3 4 0 0 0 0
+ 0 0 5 0 0 0
+ 0 0 0 7 0 0
+ 0 0 0 8 0 0
+ 0 0 0 9 0 0
+ 0 0 0 0 10 11
+ ```
+
+* `IDENT(N)`
+ `IDENT(NR, NC)`
+ Returns an identity matrix, whose main diagonal elements are one
+ and whose other elements are zero. The returned matrix has `N`
+ rows and columns or `NR` rows and `NC` columns, respectively.
+
+ ```
+ IDENT(1) ⇒ 1
+ IDENT(2) ⇒
+ 1 0
+ 0 1
+ IDENT(3, 5) ⇒
+ 1 0 0 0 0
+ 0 1 0 0 0
+ 0 0 1 0 0
+ IDENT(5, 3) ⇒
+ 1 0 0
+ 0 1 0
+ 0 0 1
+ 0 0 0
+ 0 0 0
+ ```
+
+* `MAGIC(N)`
+  Returns an `N`×`N` matrix that contains each of the integers
+  1…\\(N^2\\) once, in which each column, each row, and each diagonal
+  sums to \\(N(N^2+1)/2\\).  There are many magic squares with given
+  dimensions, but this function always returns the same one for a
+  given value of `N`.
+
+ ```
+ MAGIC(3) ⇒ {8, 1, 6; 3, 5, 7; 4, 9, 2}
+ MAGIC(4) ⇒ {1, 5, 12, 16; 15, 11, 6, 2; 14, 8, 9, 3; 4, 10, 7, 13}
+ ```
+
+* `MAKE(NR, NC, S)`
+ Returns an `NR`×`NC` matrix whose elements are all `S`.
+
+ ```
+ MAKE(1, 2, 3) ⇒ {3, 3}
+ MAKE(2, 1, 4) ⇒ {4; 4}
+ MAKE(2, 3, 5) ⇒ {5, 5, 5; 5, 5, 5}
+ ```
+
+* <a name="mdiag">`MDIAG(V)`</a>
+  Given `N`-element vector `V`, returns an `N`×`N` matrix whose main
+  diagonal is copied from `V`.  The other elements in the returned
+  matrix are zero.
+
+ Use [`CALL SETDIAG`](#setdiag) to replace the main diagonal of a
+ matrix in-place.
+
+ ```
+ MDIAG({1, 2, 3, 4}) ⇒
+ 1 0 0 0
+ 0 2 0 0
+ 0 0 3 0
+ 0 0 0 4
+ ```
+
+* `RESHAPE(M, NR, NC)`
+ Returns an `NR`×`NC` matrix whose elements come from `M`, which
+ must have the same number of elements as the new matrix, copying
+ elements from `M` to the new matrix row by row.
+
+ ```
+ RESHAPE(1:12, 1, 12) ⇒
+ 1 2 3 4 5 6 7 8 9 10 11 12
+ RESHAPE(1:12, 2, 6) ⇒
+ 1 2 3 4 5 6
+ 7 8 9 10 11 12
+ RESHAPE(1:12, 3, 4) ⇒
+ 1 2 3 4
+ 5 6 7 8
+ 9 10 11 12
+ RESHAPE(1:12, 4, 3) ⇒
+ 1 2 3
+ 4 5 6
+ 7 8 9
+ 10 11 12
+ ```
+
+* `T(M)`
+ `TRANSPOS(M)`
+ Returns `M` with rows exchanged for columns.
+
+ ```
+ T({1, 2, 3}) ⇒ {1; 2; 3}
+ T({1; 2; 3}) ⇒ {1, 2, 3}
+ ```
+
+* `UNIFORM(NR, NC)`
+  Returns an `NR`×`NC` matrix in which each element is randomly
+ chosen from a uniform distribution of real numbers between 0
+ and 1. Random number generation honors the current
+ [seed](set.md#seed) setting.
+
+ The following example shows one possible output, but of course
+ every result will be different (given different seeds):
+
+ ```
+ UNIFORM(4, 5)*10 ⇒
+ 7.71 2.99 .21 4.95 6.34
+ 4.43 7.49 8.32 4.99 5.83
+ 2.25 .25 1.98 7.09 7.61
+ 2.66 1.69 2.64 .88 1.50
+ ```
+
+### Minimum, Maximum, and Sum Functions
+
+* `CMIN(M)`
+ `CMAX(M)`
+ `CSUM(M)`
+ `CSSQ(M)`
+ Returns a row vector with the same number of columns as `M`, in
+ which each element is the minimum, maximum, sum, or sum of
+ squares, respectively, of the elements in the same column of `M`.
+
+ ```
+  CMIN({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {1, 2, 3}
+  CMAX({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {7, 8, 9}
+  CSUM({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {12, 15, 18}
+  CSSQ({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {66, 93, 126}
+ ```
+
+* `MMIN(M)`
+ `MMAX(M)`
+ `MSUM(M)`
+ `MSSQ(M)`
+ Returns the minimum, maximum, sum, or sum of squares, respectively,
+ of the elements of `M`.
+
+ ```
+  MMIN({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 1
+  MMAX({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 9
+  MSUM({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 45
+  MSSQ({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 285
+ ```
+
+* `RMIN(M)`
+ `RMAX(M)`
+ `RSUM(M)`
+ `RSSQ(M)`
+ Returns a column vector with the same number of rows as `M`, in
+ which each element is the minimum, maximum, sum, or sum of
+ squares, respectively, of the elements in the same row of `M`.
+
+ ```
+  RMIN({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {1; 4; 7}
+  RMAX({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {3; 6; 9}
+  RSUM({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {6; 15; 24}
+  RSSQ({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {14; 77; 194}
+ ```
+
+* `SSCP(M)`
+ Returns \\({\bf M}^{\bf T} × \bf M\\).
+
+ ```
+ SSCP({1, 2, 3; 4, 5, 6}) ⇒ {17, 22, 27; 22, 29, 36; 27, 36, 45}
+ ```
+
+* `TRACE(M)`
+ Returns the sum of the elements along `M`'s main diagonal,
+ equivalent to `MSUM(DIAG(M))`.
+
+ ```
+ TRACE(MDIAG(1:5)) ⇒ 15
+ ```
+
+### Matrix Property Functions
+
+* `NROW(M)`
+ `NCOL(M)`
+  Returns the number of rows or columns, respectively, in `M`.
+
+ ```
+ NROW({1, 0; -2, -3; 3, 3}) ⇒ 3
+ NROW(1:5) ⇒ 1
+
+ NCOL({1, 0; -2, -3; 3, 3}) ⇒ 2
+ NCOL(1:5) ⇒ 5
+ ```
+
+* `DIAG(M)`
+  Returns a column vector containing a copy of `M`'s main diagonal.
+ The vector's length is the lesser of `NCOL(M)` and `NROW(M)`.
+
+ ```
+ DIAG({1, 0; -2, -3; 3, 3}) ⇒ {1; -3}
+ ```
+
+### Matrix Rank Ordering Functions
+
+The `GRADE` and `RNKORDER` functions each take a matrix `M` and return
+a matrix `R` with the same dimensions.  Each element in `R` ranges
+between 1 and the number of elements `N` in `M`, inclusive.  When the
+elements in `M` all have unique values, both of these functions yield
+the same results: the smallest element in `M` corresponds to value 1
+in `R`, the next smallest to 2, and so on, up to the largest, which
+corresponds to `N`.  When multiple elements in `M` have the same
+value, these functions use different rules for handling the ties.
+
+* `GRADE(M)`
+ Returns a ranking of `M`, turning duplicate values into sequential
+ ranks. The returned matrix always contains each of the integers 1
+ through the number of elements in the matrix exactly once.
+
+ ```
+ GRADE({1, 0, 3; 3, 1, 2; 3, 0, 5}) ⇒ {3, 1, 6; 7, 4, 5; 8, 2, 9}
+ ```
+
+* `RNKORDER(M)`
+ Returns a ranking of `M`, turning duplicate values into the mean
+ of their sequential ranks.
+
+ ```
+ RNKORDER({1, 0, 3; 3, 1, 2; 3, 0, 5})
+ ⇒ {3.5, 1.5, 7; 7, 3.5, 5; 7, 1.5, 9}
+ ```
+
+One may use `GRADE` to sort a vector:
+
+```
+COMPUTE v(GRADE(v))=v. /* Sort v in ascending order.
+COMPUTE v(GRADE(-v))=v. /* Sort v in descending order.
+```
+
+### Matrix Algebra Functions
+
+* `CHOL(M)`
+ Matrix `M` must be an `N`×`N` symmetric positive-definite matrix.
+ Returns an `N`×`N` matrix `B` such that \\({\bf B}^{\bf T}×{\bf
+ B}=\bf M\\).
+
+ ```
+ CHOL({4, 12, -16; 12, 37, -43; -16, -43, 98}) ⇒
+ 2 6 -8
+ 0 1 5
+ 0 0 3
+ ```
+
+* `DESIGN(M)`
+ Returns a design matrix for `M`. The design matrix has the same
+  number of rows as `M`.  Each column `C` in `M`, from left to right,
+  yields a group of columns in the output.  For each unique value
+  `V` in `C`, from top to bottom, the output gains a column in which
+  the rows where `C` equals `V` are 1 and the other rows are 0.
+
+ PSPP issues a warning if a column only contains a single unique
+ value.
+
+ ```
+ DESIGN({1; 2; 3}) ⇒ {1, 0, 0; 0, 1, 0; 0, 0, 1}
+ DESIGN({5; 8; 5}) ⇒ {1, 0; 0, 1; 1, 0}
+ DESIGN({1, 5; 2, 8; 3, 5})
+ ⇒ {1, 0, 0, 1, 0; 0, 1, 0, 0, 1; 0, 0, 1, 1, 0}
+ DESIGN({5; 5; 5}) ⇒ (warning)
+ ```
+
+* `DET(M)`
+ Returns the determinant of square matrix `M`.
+
+ ```
+ DET({3, 7; 1, -4}) ⇒ -19
+ ```
+
+* <a name="eval">`EVAL(M)`</a>
+ Returns a column vector containing the eigenvalues of symmetric
+  matrix `M`, sorted in descending order.
+
+ Use [`CALL EIGEN`](#eigen) to compute eigenvalues and eigenvectors
+ of a matrix.
+
+ ```
+ EVAL({2, 0, 0; 0, 3, 4; 0, 4, 9}) ⇒ {11; 2; 1}
+ ```
+
+* `GINV(M)`
+ Returns the `K`×`N` matrix `A` that is the "generalized inverse"
+ of `N`×`K` matrix `M`, defined such that \\({\bf M}×{\bf A}×{\bf
+ M}={\bf M}\\) and \\({\bf A}×{\bf M}×{\bf A}={\bf A}\\).
+
+ ```
+ GINV({1, 2}) ⇒ {.2; .4} (approximately)
+ {1:9} * GINV(1:9) * {1:9} ⇒ {1:9} (approximately)
+ ```
+
+* `GSCH(M)`
+  `M` must be an `N`×`M` matrix, `M` ≥ `N`, with rank `N`.  Returns
+ an `N`×`N` orthonormal basis for `M`, obtained using the
+ [Gram-Schmidt
+ process](https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process).
+
+ ```
+ GSCH({3, 2; 1, 2}) * SQRT(10) ⇒ {3, -1; 1, 3} (approximately)
+ ```
+
+* `INV(M)`
+  Returns the `N`×`N` matrix `A` that is the inverse of `N`×`N` matrix `M`,
+  defined such that \\({\bf M}×{\bf A} = {\bf A}×{\bf M} = {\bf I}\\), where `I` is the
+  identity matrix.  `M` must not be singular, that is, \\(\det({\bf M}) ≠ 0\\).
+
+ ```
+ INV({4, 7; 2, 6}) ⇒ {.6, -.7; -.2, .4} (approximately)
+ ```
+
+* `KRONEKER(MA, MB)`
+  Returns the `PM`×`QN` matrix `P` that is the [Kronecker
+  product](https://en.wikipedia.org/wiki/Kronecker_product) of `M`×`N`
+  matrix `MA` and `P`×`Q` matrix `MB`.  One may view `P` as the
+  concatenation of multiple `P`×`Q` blocks, each of which is the
+  product of `MB` by a different element of `MA`.  For example,
+  when `A` is a 2×2 matrix, `KRONEKER(A, B)` is equivalent to
+  `{A(1,1)*B, A(1,2)*B; A(2,1)*B, A(2,2)*B}`.
+
+ ```
+ KRONEKER({1, 2; 3, 4}, {0, 5; 6, 7}) ⇒
+ 0 5 0 10
+ 6 7 12 14
+ 0 15 0 20
+ 18 21 24 28
+ ```
+
+* `RANK(M)`
+  Returns the rank of matrix `M`, an integer scalar whose value is the
+ dimension of the vector space spanned by its columns or,
+ equivalently, by its rows.
+
+ ```
+ RANK({1, 0, 1; -2, -3, 1; 3, 3, 0}) ⇒ 2
+ RANK({1, 1, 0, 2; -1, -1, 0, -2}) ⇒ 1
+ RANK({1, -1; 1, -1; 0, 0; 2, -2}) ⇒ 1
+ RANK({1, 2, 1; -2, -3, 1; 3, 5, 0}) ⇒ 2
+ RANK({1, 0, 2; 2, 1, 0; 3, 2, 1}) ⇒ 3
+ ```
+
+* `SOLVE(MA, MB)`
+  `MA` must be an `N`×`N` matrix, with \\(\det({\bf MA}) ≠ 0\\), and `MB` an `N`×`Q` matrix.
+  Returns an `N`×`Q` matrix `X` such that \\({\bf MA} × {\bf X} = {\bf MB}\\).
+
+ All of the following examples show approximate results:
+
+ ```
+ SOLVE({2, 3; 4, 9}, {6, 2; 15, 5}) ⇒
+ 1.50 .50
+ 1.00 .33
+ SOLVE({1, 3, -2; 3, 5, 6; 2, 4, 3}, {5; 7; 8}) ⇒
+ -15.00
+ 8.00
+ 2.00
+ SOLVE({2, 1, -1; -3, -1, 2; -2, 1, 2}, {8; -11; -3}) ⇒
+ 2.00
+ 3.00
+ -1.00
+ ```
+
+* <a name="sval">`SVAL(M)`</a>
+
+  Given `P`×`Q` matrix `M`, returns a \\(\min(P,Q)\\)-element column vector
+ containing the singular values of `M` in descending order.
+
+ Use [`CALL SVD`](#svd) to compute the full singular value
+ decomposition of a matrix.
+
+ ```
+ SVAL({1, 1; 0, 0}) ⇒ {1.41; .00}
+ SVAL({1, 0, 1; 0, 1, 1; 0, 0, 0}) ⇒ {1.73; 1.00; .00}
+ SVAL({2, 4; 1, 3; 0, 0; 0, 0}) ⇒ {5.46; .37}
+ ```
+
+* `SWEEP(M, NK)`
+  Given `P`×`Q` matrix `M` and integer scalar \\(k\\) = `NK` such that \\(1 ≤ k ≤
+  \min(P,Q)\\), returns the `P`×`Q` sweep matrix `A`.
+
+ If \\({\bf M}_{kk} ≠ 0\\), then:
+
+ $$
+ \begin{align}
+ A_{kk} &= 1/M_{kk},\\\\
+ A_{ik} &= -M_{ik}/M_{kk} \text{ for } i ≠ k,\\\\
+ A_{kj} &= M_{kj}/M_{kk} \text{ for } j ≠ k,\\\\
+ A_{ij} &= M_{ij} - M_{ik}M_{kj}/M_{kk} \text{ for } i ≠ k \text{ and } j ≠ k.
+ \end{align}
+ $$
+
+ If \\({\bf M}_{kk}\\) = 0, then:
+
+ $$
+ \begin{align}
+ A_{ik} &= A_{ki} = 0, \\\\
+ A_{ij} &= M_{ij}, \text{ for } i ≠ k \text{ and } j ≠ k.
+ \end{align}
+ $$
+
+ Given `M = {0, 1, 2; 3, 4, 5; 6, 7, 8}`, then (approximately):
+
+ ```
+ SWEEP(M, 1) ⇒
+ .00 .00 .00
+ .00 4.00 5.00
+ .00 7.00 8.00
+ SWEEP(M, 2) ⇒
+ -.75 -.25 .75
+ .75 .25 1.25
+ .75 -1.75 -.75
+ SWEEP(M, 3) ⇒
+ -1.50 -.75 -.25
+ -.75 -.38 -.63
+ .75 .88 .13
+ ```
+
+### Matrix Statistical Distribution Functions
+
+The matrix language can calculate several functions of standard
+statistical distributions using the same syntax and semantics as in
+PSPP transformation expressions. See [Statistical Distribution
+Functions](../language/expressions/functions/statistical-distributions.md)
+for details.
+
+ The matrix language extends the `PDF`, `CDF`, `SIG`, `IDF`, `NPDF`,
+and `NCDF` functions by allowing the first parameters to each of these
+functions to be a vector or matrix with any dimensions. In addition,
+`CDF.BVNOR` and `PDF.BVNOR` allow either or both of their first two
+parameters to be vectors or matrices; if both are non-scalar then they
+must have the same dimensions. In each case, the result is a matrix
+or vector with the same dimensions as the input populated with
+elementwise calculations.
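+
+For example, evaluating the standard normal CDF elementwise over a row
+vector gives:
+
+```
+CDF.NORMAL({-1.96, 0, 1.96}, 0, 1) ⇒ {.025, .50, .975} (approximately)
+```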
+
+### `EOF` Function
+
+This function works with files being used on the `READ` statement.
+
+* `EOF(FILE)`
+
+ Given a file handle or file name `FILE`, returns an integer scalar 1
+ if the last line in the file has been read or 0 if more lines are
+ available. Determining this requires attempting to read another
+ line, which means that `REREAD` on the next `READ` command
+ following `EOF` on the same file will be ineffective.
+
+The `EOF` function gives a matrix program the flexibility to read a
+file with text data without knowing the length of the file in advance.
+For example, the following program will read all the lines of data in
+`data.txt`, each consisting of three numbers, as rows in matrix `data`:
+
+```
+MATRIX.
+COMPUTE data={}.
+LOOP IF NOT EOF('data.txt').
+ READ row/FILE='data.txt'/FIELD=1 TO 1000/SIZE={1,3}.
+ COMPUTE data={data; row}.
+END LOOP.
+PRINT data.
+END MATRIX.
+```
+
+## `COMPUTE` Command
+
+```
+COMPUTE variable[(index[,index])]=expression.
+```
+
+ The `COMPUTE` command evaluates an expression and assigns the
+result to a variable or a submatrix of a variable. Assigning to a
+submatrix uses the same syntax as the [index
+operator](#index-operator-).
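+
+For example, the following sketch assigns a whole matrix to `m` and
+then replaces its first row through a submatrix index:
+
+```
+MATRIX.
+COMPUTE m = {1, 2; 3, 4}.
+COMPUTE m(1, :) = {5, 6}.
+PRINT m.
+END MATRIX.
+```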
+
+## `CALL` Command
+
+A matrix function returns a single result. The `CALL` command
+implements procedures, which take a similar syntactic form to functions
+but yield results by modifying their arguments rather than returning a
+value.
+
+Each output argument to a `CALL` procedure must be a single variable
+name.
+
+The following procedures are implemented via `CALL` to allow them to
+return multiple results. For these procedures, the output arguments
+need not name existing variables; if they do, then their previous
+values are replaced:
+
+* <a name="eigen">`CALL EIGEN(M, EVEC, EVAL)`</a>
+
+  Computes the eigenvalues and eigenvectors of symmetric `N`×`N` matrix
+  `M`. Assigns the eigenvectors of `M` to the columns of `N`×`N` matrix
+  `EVEC` and the eigenvalues in descending order to `N`-element column
+  vector `EVAL`.
+
+ Use the [`EVAL`](#eval) function to compute just the eigenvalues of
+ a symmetric matrix.
+
+ For example, the following matrix language commands:
+
+ ```
+ CALL EIGEN({1, 0; 0, 1}, evec, eval).
+ PRINT evec.
+ PRINT eval.
+
+ CALL EIGEN({3, 2, 4; 2, 0, 2; 4, 2, 3}, evec2, eval2).
+ PRINT evec2.
+ PRINT eval2.
+ ```
+
+ yield this output:
+
+ ```
+ evec
+ 1 0
+ 0 1
+
+ eval
+ 1
+ 1
+
+ evec2
+ -.6666666667 .0000000000 .7453559925
+ -.3333333333 -.8944271910 -.2981423970
+ -.6666666667 .4472135955 -.5962847940
+
+ eval2
+ 8.0000000000
+ -1.0000000000
+ -1.0000000000
+ ```
+
+* <a name="svd">`CALL SVD(M, U, S, V)`</a>
+
+  Computes the singular value decomposition of `P`×`Q` matrix `M`,
+  assigning to `S` a `P`×`Q` diagonal matrix and to `U` and `V` unitary
+  `P`×`P` and `Q`×`Q` matrices, respectively, such that \\(M = USV^T\\).
+  The main diagonal of `S` contains the singular values of `M`.
+
+ Use the [`SVAL`](#sval) function to compute just the singular values
+ of a matrix.
+
+ For example, the following matrix program:
+
+ ```
+ CALL SVD({3, 2, 2; 2, 3, -2}, u, s, v).
+ PRINT (u * s * T(v))/FORMAT F5.1.
+ ```
+
+ yields this output:
+
+ ```
+ (u * s * T(v))
+ 3.0 2.0 2.0
+ 2.0 3.0 -2.0
+ ```
+
+The final procedure is implemented via `CALL` to allow it to modify a
+matrix instead of returning a modified version. For this procedure,
+the output argument must name an existing variable.
+
+* <a name="setdiag">`CALL SETDIAG(M, V)`</a>
+
+  Replaces the main diagonal of `N`×`P` matrix `M` by the contents of
+  `K`-element vector `V`. If `K` = 1, so that `V` is a scalar, replaces all
+  of the diagonal elements of `M` by `V`. If \\(K < \min(N,P)\\), only the
+  first `K` diagonal elements are replaced; if \\(K > \min(N,P)\\), then the
+  extra elements of `V` are ignored.
+
+ Use the [`MDIAG`](#mdiag) function to construct a new matrix with a
+ specified main diagonal.
+
+ For example, this matrix program:
+
+ ```
+ COMPUTE x={1, 2, 3; 4, 5, 6; 7, 8, 9}.
+ CALL SETDIAG(x, 10).
+ PRINT x.
+ ```
+
+ outputs the following:
+
+ ```
+ x
+ 10 2 3
+ 4 10 6
+ 7 8 10
+ ```
+
+## `PRINT` Command
+
+```
+PRINT [expression]
+ [/FORMAT=format]
+ [/TITLE=title]
+ [/SPACE={NEWPAGE | n}]
+ [{/RLABELS=string… | /RNAMES=expression}]
+ [{/CLABELS=string… | /CNAMES=expression}].
+```
+
+ The `PRINT` command is commonly used to display a matrix. It
+evaluates the restricted EXPRESSION, if present, and outputs it either
+as text or a pivot table, depending on the setting of
+[`MDISPLAY`](set.md#mdisplay).
+
+ Use the `FORMAT` subcommand to specify a format, such as `F8.2`, for
+displaying the matrix elements. `FORMAT` is optional for numerical
+matrices. When it is omitted, PSPP chooses how to format entries
+automatically using \\(m\\), the magnitude of the largest-magnitude element in
+the matrix to be displayed:
+
+ 1. If \\(m < 10^{11}\\) and the matrix's elements are all integers,
+ PSPP chooses the narrowest `F` format that fits \\(m\\) plus a
+ sign. For example, if the matrix is `{1:10}`, then \\(m = 10\\),
+     which fits in 3 columns with room for a sign, so the format is
+ `F3.0`.
+
+ 2. Otherwise, if \\(m ≥ 10^9\\) or \\(m ≤ 10^{-4}\\), PSPP scales
+ all of the numbers in the matrix by \\(10^x\\), where \\(x\\) is
+ the exponent that would be used to display \\(m\\) in scientific
+ notation. For example, for \\(m = 5.123×10^{20}\\), the scale
+ factor is \\(10^{20}\\). PSPP displays the scaled values in
+ format `F13.10` and notes the scale factor in the output.
+
+ 3. Otherwise, PSPP displays the matrix values, without scaling, in
+ format `F13.10`.
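+
+For example, to override the automatic rules and force two decimal
+places, one might write:
+
+```
+PRINT {1.5, 2.25; 3, 4} /FORMAT=F6.2.
+```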
+
+ The optional `TITLE` subcommand specifies a title for the output text
+or table, as a quoted string. When it is omitted, the syntax of the
+matrix expression is used as the title.
+
+ Use the `SPACE` subcommand to request extra space above the matrix
+output. With a numerical argument, it adds the specified number of
+lines of blank space above the matrix. With `NEWPAGE` as an argument,
+it prints the matrix at the top of a new page. The `SPACE` subcommand
+has no effect when a matrix is output as a pivot table.
+
+ The `RLABELS` and `RNAMES` subcommands, which are mutually exclusive,
+can supply a label to accompany each row in the output. With `RLABELS`,
+specify the labels as comma-separated strings or other tokens. With
+`RNAMES`, specify a single expression that evaluates to a vector of
+strings. Either way, if there are more labels than rows, the extra
+labels are ignored, and if there are more rows than labels, the extra
+rows are unlabeled. For output to a pivot table with `RLABELS`, the
+labels can be any length; otherwise, the labels are truncated to 8
+bytes.
+
+ The `CLABELS` and `CNAMES` subcommands work for labeling columns as
+`RLABELS` and `RNAMES` do for labeling rows.
+
+ When the EXPRESSION is omitted, `PRINT` does not output a matrix.
+Instead, it outputs only the text specified on `TITLE`, if any, preceded
+by any space specified on the `SPACE` subcommand, if any. Any other
+subcommands are ignored, and the command acts as if `MDISPLAY` is set to
+`TEXT` regardless of its actual setting.
+
+### Example
+
+ The following syntax demonstrates two different ways to label the
+rows and columns of a matrix with `PRINT`:
+
+```
+MATRIX.
+COMPUTE m={1, 2, 3; 4, 5, 6; 7, 8, 9}.
+PRINT m/RLABELS=a, b, c/CLABELS=x, y, z.
+
+COMPUTE rlabels={"a", "b", "c"}.
+COMPUTE clabels={"x", "y", "z"}.
+PRINT m/RNAMES=rlabels/CNAMES=clabels.
+END MATRIX.
+```
+
+With `MDISPLAY=TEXT` (the default), this program outputs the following
+(twice):
+
+```
+m
+ x y z
+a 1 2 3
+b 4 5 6
+c 7 8 9
+```
+
+With `SET MDISPLAY=TABLES.` added above `MATRIX.`, the output becomes
+the following (twice):
+
+```
+ m
+┌─┬─┬─┬─┐
+│ │x│y│z│
+├─┼─┼─┼─┤
+│a│1│2│3│
+│b│4│5│6│
+│c│7│8│9│
+└─┴─┴─┴─┘
+```
+
+## `DO IF` Command
+
+```
+DO IF expression.
+ …matrix commands…
+[ELSE IF expression.
+ …matrix commands…]…
+[ELSE
+ …matrix commands…]
+END IF.
+```
+
+ A `DO IF` command evaluates its expression argument. If the `DO IF`
+expression evaluates to true, then PSPP executes the associated
+commands. Otherwise, PSPP evaluates the expression on each `ELSE IF`
+clause (if any) in order, and executes the commands associated with the
+first one that yields a true value. Finally, if the `DO IF` and all the
+`ELSE IF` expressions all evaluate to false, PSPP executes the commands
+following the `ELSE` clause (if any).
+
+ Each expression on `DO IF` and `ELSE IF` must evaluate to a scalar.
+Positive scalars are considered to be true, and scalars that are zero or
+negative are considered to be false.
+
+### Example
+
+ The following matrix language fragment sets `b` to the term
+following `a` in the [Juggler
+sequence](https://en.wikipedia.org/wiki/Juggler_sequence):
+
+```
+DO IF MOD(a, 2) = 0.
+ COMPUTE b = TRUNC(a &** (1/2)).
+ELSE.
+ COMPUTE b = TRUNC(a &** (3/2)).
+END IF.
+```
+
+## `LOOP` and `BREAK` Commands
+
+```
+LOOP [var=first TO last [BY step]] [IF expression].
+ …matrix commands…
+END LOOP [IF expression].
+
+BREAK.
+```
+
+ The `LOOP` command executes a nested group of matrix commands,
+called the loop's "body", repeatedly. It has three optional clauses
+that control how many times the loop body executes. Regardless of
+these clauses, the global `MXLOOPS` setting, which defaults to 40,
+also limits the number of iterations of a loop. To iterate more
+times, raise the maximum with [`SET MXLOOPS`](set.md#mxloops) outside
+of the `MATRIX` command.
+
+ The optional index clause causes `VAR` to be assigned successive
+values on each trip through the loop: first `FIRST`, then `FIRST +
+STEP`, then `FIRST + 2 × STEP`, and so on. The loop ends when `VAR >
+LAST`, for positive `STEP`, or `VAR < LAST`, for negative `STEP`. If
+`STEP` is not specified, it defaults to 1. All the index clause
+expressions must evaluate to scalars, and non-integers are rounded
+toward zero. If `STEP` evaluates as zero (or rounds to zero), then
+the loop body never executes.
+
+ The optional `IF` on `LOOP` is evaluated before each iteration
+through the loop body. If its expression, which must evaluate to a
+scalar, is zero or negative, then the loop terminates without executing
+the loop body.
+
+ The optional `IF` on `END LOOP` is evaluated after each iteration
+through the loop body. If its expression, which must evaluate to a
+scalar, is zero or negative, then the loop terminates.
+
+### Example
+
+ The following computes and prints \\(l(n)\\), whose value is the
+number of steps in the [Juggler
+sequence](https://en.wikipedia.org/wiki/Juggler_sequence) for \\(n\\),
+for \\( 2 \le n \le 10\\):
+
+```
+COMPUTE l = {}.
+LOOP n = 2 TO 10.
+ COMPUTE a = n.
+ LOOP i = 1 TO 100.
+ DO IF MOD(a, 2) = 0.
+ COMPUTE a = TRUNC(a &** (1/2)).
+ ELSE.
+ COMPUTE a = TRUNC(a &** (3/2)).
+ END IF.
+ END LOOP IF a = 1.
+ COMPUTE l = {l; i}.
+END LOOP.
+PRINT l.
+```
+
+### `BREAK` Command
+
+The `BREAK` command may be used inside a loop body, ordinarily within a
+`DO IF` command. If it is executed, then the loop terminates
+immediately, jumping to the command just following `END LOOP`. When
+multiple `LOOP` commands nest, `BREAK` terminates the innermost loop.
+
+#### Example
+
+The following example is a revision of the one above that shows how
+`BREAK` could substitute for the index and `IF` clauses on `LOOP` and
+`END LOOP`:
+
+```
+COMPUTE l = {}.
+LOOP n = 2 TO 10.
+ COMPUTE a = n.
+ COMPUTE i = 1.
+ LOOP.
+ DO IF MOD(a, 2) = 0.
+ COMPUTE a = TRUNC(a &** (1/2)).
+ ELSE.
+ COMPUTE a = TRUNC(a &** (3/2)).
+ END IF.
+ DO IF a = 1.
+ BREAK.
+ END IF.
+ COMPUTE i = i + 1.
+ END LOOP.
+ COMPUTE l = {l; i}.
+END LOOP.
+PRINT l.
+```
+
+## `READ` and `WRITE` Commands
+
+The `READ` and `WRITE` commands perform matrix input and output with
+text files. They share the following syntax for specifying how data is
+divided among input lines:
+
+```
+/FIELD=first TO last [BY width]
+[/FORMAT=format]
+```
+
+Both commands require the `FIELD` subcommand. It specifies the range
+of columns, from FIRST to LAST, inclusive, that the data occupies on
+each line of the file. The leftmost column is column 1. The columns
+must be literal numbers, not expressions. To use entire lines, even if
+they might be very long, specify a column range such as `1 TO 100000`.
+
+The `FORMAT` subcommand is optional for numerical matrices. For
+string matrix input and output, specify an `A` format. In addition to
+`FORMAT`, the optional `BY` specification on `FIELD` determines the
+meaning of each text line:
+
+- With neither `BY` nor `FORMAT`, the numbers in the text file are in
+ `F` format separated by spaces or commas. For `WRITE`, PSPP uses
+ as many digits of precision as needed to accurately represent the
+ numbers in the matrix.
+
+- `BY width` divides the input area into fixed-width fields with the
+ given width. The input area must be a multiple of width columns
+ wide. Numbers are read or written as `Fwidth.0` format.
+
+- `FORMAT="countF"` divides the input area into `count` equal-width
+  fields per line, where `count` is an integer. The input area must be
+  a multiple of `count` columns wide. Another format type may be
+  substituted for `F`.
+
+- `FORMAT=Fw[.d]` divides the input area into fixed-width fields
+ with width `w`. The input area must be a multiple of `w` columns
+ wide. Another format type may be substituted for `F`. The
+ `READ` command disregards `d`.
+
+- `FORMAT=F` specifies format `F` without indicating a field width.
+ Another format type may be substituted for `F`. The `WRITE`
+ command accepts this form, but it has no effect unless `BY` is also
+ used to specify a field width.
+
+If `BY` and `FORMAT` both specify or imply a field width, then they
+must indicate the same field width.
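+
+Under these rules, the following `WRITE` specifications should be
+equivalent ways to divide an 80-column area into ten fields of width 8
+(the variable `m` and the file name are hypothetical):
+
+```
+WRITE m /OUTFILE='out.txt' /FIELD=1 TO 80 BY 8.
+WRITE m /OUTFILE='out.txt' /FIELD=1 TO 80 /FORMAT="10F".
+WRITE m /OUTFILE='out.txt' /FIELD=1 TO 80 /FORMAT=F8.0.
+```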
+
+### `READ` Command
+
+```
+READ variable[(index[,index])]
+ [/FILE=file]
+ /FIELD=first TO last [BY width]
+ [/FORMAT=format]
+ [/SIZE=expression]
+ [/MODE={RECTANGULAR | SYMMETRIC}]
+ [/REREAD].
+```
+
+The `READ` command reads from a text file into a matrix variable.
+Specify the target variable just after the command name, either just a
+variable name to create or replace an entire variable, or a variable
+name followed by an indexing expression to replace a submatrix of an
+existing variable.
+
+The `FILE` subcommand is required in the first `READ` command that
+appears within `MATRIX`. It specifies the text file to be read,
+either as a file name in quotes or a file handle previously declared
+on [`FILE HANDLE`](file-handle.md). Later `READ` commands (in syntax
+order) use the previously referenced file if `FILE` is omitted.
+
+The `FIELD` and `FORMAT` subcommands specify how input lines are
+interpreted. `FIELD` is required, but `FORMAT` is optional. See
+[`READ` and `WRITE` Commands](#read-and-write-commands), for details.
+
+The `SIZE` subcommand is required for reading into an entire
+variable. Its restricted expression argument should evaluate to a
+2-element vector `{N, M}` or `{N; M}`, which indicates an `N`×`M`
+matrix destination. A scalar `N` is also allowed and indicates an
+`N`×1 column vector destination. When the destination is a submatrix,
+`SIZE` is optional, and if it is present then it must match the size
+of the submatrix.
+
+By default, or with `MODE=RECTANGULAR`, the command reads an entry
+for every row and column. With `MODE=SYMMETRIC`, the command reads only
+the entries on and below the matrix's main diagonal, and copies the
+entries above the main diagonal from the corresponding symmetric entries
+below it. Only square matrices may use `MODE=SYMMETRIC`.
+
+Ordinarily, each `READ` command starts from a new line in the text
+file. Specify the `REREAD` subcommand to instead start from the last
+line read by the previous `READ` command. This has no effect for the
+first `READ` command to read from a particular file. It is also
+ineffective just after a command that uses the [`EOF` matrix
+function](#eof-function) on a particular file, because `EOF` has to
+try to read the next line from the file to determine whether the file
+contains more input.
+
+#### Example 1: Basic Use
+
+The following matrix program reads the same matrix `{1, 2, 4; 2, 3, 5;
+4, 5, 6}` into matrix variables `v`, `w`, and `x`:
+
+```
+READ v /FILE='input.txt' /FIELD=1 TO 100 /SIZE={3, 3}.
+READ w /FIELD=1 TO 100 /SIZE={3; 3} /MODE=SYMMETRIC.
+READ x /FIELD=1 TO 100 BY 1/SIZE={3, 3} /MODE=SYMMETRIC.
+```
+given that `input.txt` contains the following:
+
+```
+1, 2, 4
+2, 3, 5
+4, 5, 6
+1
+2 3
+4 5 6
+1
+23
+456
+```
+The `READ` command will read as many lines of input as needed for a
+particular row, so it's also acceptable to break any of the lines above
+into multiple lines. For example, the first line `1, 2, 4` could be
+written with a line break following either or both commas.
+
+#### Example 2: Reading into a Submatrix
+
+The following reads a 5×5 matrix from `input2.txt`, reversing the order
+of the rows:
+
+```
+COMPUTE m = MAKE(5, 5, 0).
+LOOP r = 5 TO 1 BY -1.
+ READ m(r, :) /FILE='input2.txt' /FIELD=1 TO 100.
+END LOOP.
+```
+#### Example 3: Using `REREAD`
+
+Suppose each of the 5 lines in a file `input3.txt` starts with an
+integer COUNT followed by COUNT numbers, e.g.:
+
+```
+1 5
+3 1 2 3
+5 6 -1 2 5 1
+2 8 9
+3 1 3 2
+```
+Then, the following reads this file into a matrix `m`:
+
+```
+COMPUTE m = MAKE(5, 5, 0).
+LOOP i = 1 TO 5.
+ READ count /FILE='input3.txt' /FIELD=1 TO 1 /SIZE=1.
+ READ m(i, 1:count) /FIELD=3 TO 100 /REREAD.
+END LOOP.
+```
+### `WRITE` Command
+
+```
+WRITE expression
+ [/OUTFILE=file]
+ /FIELD=first TO last [BY width]
+ [/FORMAT=format]
+ [/MODE={RECTANGULAR | TRIANGULAR}]
+ [/HOLD].
+```
+The `WRITE` command evaluates its expression and writes the value to a
+text file in a specified format. Write the expression to evaluate just
+after the command name.
+
+The `OUTFILE` subcommand is required in the first `WRITE` command that
+appears within `MATRIX`. It specifies the text file to be written,
+either as a file name in quotes or a file handle previously declared
+on [`FILE HANDLE`](file-handle.md). Later `WRITE` commands (in syntax
+order) use the previously referenced file if `OUTFILE` is omitted.
+
+The `FIELD` and `FORMAT` subcommands specify how output lines are
+formed. `FIELD` is required, but `FORMAT` is optional. See [`READ`
+and `WRITE` Commands](#read-and-write-commands), for details.
+
+By default, or with `MODE=RECTANGULAR`, the command writes an entry
+for every row and column. With `MODE=TRIANGULAR`, the command writes
+only the entries on and below the matrix's main diagonal. Entries above
+the diagonal are not written. Only square matrices may be written with
+`MODE=TRIANGULAR`.
+
+Ordinarily, each `WRITE` command writes complete lines to the output
+file. With `HOLD`, the final line written by `WRITE` will be held back
+for the next `WRITE` command to augment. This can be useful to write
+more than one matrix on a single output line.
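+
+For example, this sketch (file name hypothetical) uses `HOLD` to place
+two matrices side by side on the same output lines:
+
+```
+WRITE {1, 2} /OUTFILE='out.txt' /FIELD=1 TO 10 /HOLD.
+WRITE {3, 4} /FIELD=11 TO 20.
+```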
+
+#### Example 1: Basic Usage
+
+This matrix program:
+
+```
+WRITE {1, 2; 3, 4} /OUTFILE='matrix.txt' /FIELD=1 TO 80.
+```
+writes the following to `matrix.txt`:
+
+```
+ 1 2
+ 3 4
+```
+#### Example 2: Triangular Matrix
+
+This matrix program:
+
+```
+WRITE MAGIC(5) /OUTFILE='matrix.txt' /FIELD=1 TO 80 BY 5 /MODE=TRIANGULAR.
+```
+writes the following to `matrix.txt`:
+
+```
+ 17
+ 23 5
+ 4 6 13
+ 10 12 19 21
+ 11 18 25 2 9
+```
+## `GET` Command
+
+```
+GET variable[(index[,index])]
+ [/FILE={file | *}]
+ [/VARIABLES=variable…]
+ [/NAMES=variable]
+ [/MISSING={ACCEPT | OMIT | number}]
+ [/SYSMIS={OMIT | number}].
+```
+ The `GET` command reads numeric data from an SPSS system file,
+SPSS/PC+ system file, or SPSS portable file into a matrix variable or
+submatrix:
+
+- To read data into a variable, specify just its name following
+ `GET`. The variable need not already exist; if it does, it is
+ replaced. The variable will have as many columns as there are
+ variables specified on the `VARIABLES` subcommand and as many rows
+ as there are cases in the input file.
+
+- To read data into a submatrix, specify the name of an existing
+ variable, followed by an indexing expression, just after `GET`.
+ The submatrix must have as many columns as variables specified on
+ `VARIABLES` and as many rows as cases in the input file.
+
+Specify the name or handle of the file to be read on `FILE`. Use
+`*`, or simply omit the `FILE` subcommand, to read from the active file.
+Reading from the active file is only permitted if it was already defined
+outside `MATRIX`.
+
+List the variables to be read as columns in the matrix on the
+`VARIABLES` subcommand. The list can use `TO` for collections of
+variables or `ALL` for all variables. If `VARIABLES` is omitted, all
+variables are read. Only numeric variables may be read.
+
+If a variable is named on `NAMES`, then the names of the variables
+read as data columns are stored in a string vector within the given
+name, replacing any existing matrix variable with that name. Variable
+names are truncated to 8 bytes.
+
+The `MISSING` and `SYSMIS` subcommands control the treatment of
+missing values in the input file. By default, any user- or
+system-missing data in the variables being read from the input causes an
+error that prevents `GET` from executing. To accept missing values,
+specify one of the following settings on `MISSING`:
+
+* `ACCEPT`: Accept user-missing values with no change.
+
+ By default, system-missing values still yield an error. Use the
+ `SYSMIS` subcommand to change this treatment:
+
+ - `OMIT`: Skip any case that contains a system-missing value.
+
+ - `number`: Recode the system-missing value to `number`.
+
+* `OMIT`: Skip any case that contains any user- or system-missing value.
+
+* `number`: Recode all user- and system-missing values to `number`.
+
+The `SYSMIS` subcommand has an effect only with `MISSING=ACCEPT`.
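+
+For example, the following sketch (variable names hypothetical) reads
+variables `x` and `y` from the active file into matrix `m`, recodes any
+missing values to 0, and saves the variable names in `names`:
+
+```
+GET m /VARIABLES=x, y /NAMES=names /MISSING=0.
+```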
+
+## `SAVE` Command
+
+```
+SAVE expression
+ [/OUTFILE={file | *}]
+ [/VARIABLES=variable…]
+ [/NAMES=expression]
+ [/STRINGS=variable…].
+```
+The `SAVE` matrix command evaluates its expression and writes the
+resulting matrix to an SPSS system file. In the system file, each
+matrix row becomes a case and each column becomes a variable.
+
+Specify the name or handle of the SPSS system file on the `OUTFILE`
+subcommand, or `*` to write the output as the new active file. The
+`OUTFILE` subcommand is required on the first `SAVE` command, in syntax
+order, within `MATRIX`. For `SAVE` commands after the first, the
+default output file is the same as the previous.
+
+When multiple `SAVE` commands write to one destination within a
+single `MATRIX`, the later commands append to the same output file. All
+the matrices written to the file must have the same number of columns.
+The `VARIABLES`, `NAMES`, and `STRINGS` subcommands are honored only for
+the first `SAVE` command that writes to a given file.
+
+By default, `SAVE` names the variables in the output file `COL1`
+through `COLn`. Use `VARIABLES` or `NAMES` to give the variables
+meaningful names. The `VARIABLES` subcommand accepts a comma-separated
+list of variable names. Its alternative, `NAMES`, instead accepts an
+expression that must evaluate to a row or column string vector of names.
+The number of names need not exactly match the number of columns in the
+matrix to be written: extra names are ignored; extra columns use default
+names.
+
+By default, `SAVE` assumes that the matrix to be written is all
+numeric. To write string columns, specify a comma-separated list of the
+string columns' variable names on `STRINGS`.
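+
+As a sketch (file and variable names hypothetical), the following
+writes a 2×3 matrix as two cases of three named variables:
+
+```
+SAVE {1, 2, 3; 4, 5, 6} /OUTFILE='out.sav' /VARIABLES=x, y, z.
+```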
+
+## `MGET` Command
+
+```
+MGET [/FILE=file]
+ [/TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}].
+```
+The `MGET` command reads the data from a [matrix file](matrices.md#matrix-files) into matrix variables.
+
+All of `MGET`'s subcommands are optional. Specify the name or handle
+of the matrix file to be read on the `FILE` subcommand; if it is
+omitted, then the command reads the active file.
+
+By default, `MGET` reads all of the data from the matrix file.
+Specify a space-delimited list of matrix types on `TYPE` to limit
+reading to the listed kinds of data:
+
+* `COV`: Covariance matrix.
+* `CORR`: Correlation coefficient matrix.
+* `MEAN`: Vector of means.
+* `STDDEV`: Vector of standard deviations.
+* `N`: Vector of case counts.
+* `COUNT`: Vector of counts.
+
+`MGET` reads the entire matrix file and automatically names, creates,
+and populates matrix variables using its contents. It constructs the
+name of each variable by concatenating the following:
+
+- A 2-character prefix that identifies the type of the matrix:
+
+ * `CV`: Covariance matrix.
+ * `CR`: Correlation coefficient matrix.
+ * `MN`: Vector of means.
+ * `SD`: Vector of standard deviations.
+ * `NC`: Vector of case counts.
+ * `CN`: Vector of counts.
+
+- If the matrix file has factor variables, `Fn`, where `n` is a number
+ identifying a group of factors: `F1` for the first group, `F2` for
+ the second, and so on. This part is omitted for pooled data (where
+ the factors all have the system-missing value).
+
+- If the matrix file has split file variables, `Sn`, where `n` is a
+ number identifying a split group: `S1` for the first group, `S2`
+ for the second, and so on.
+
+If `MGET` chooses the name of an existing variable, it issues a
+warning and does not change the variable.
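+
+For example, if a matrix file contains pooled correlation and mean
+rows and no split groups, then by the naming rules above the following
+sketch (file name hypothetical) creates and prints variables `CR` and
+`MN`:
+
+```
+MGET FILE='matrix.sav' /TYPE=CORR MEAN.
+PRINT CR.
+PRINT MN.
+```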
+
+## `MSAVE` Command
+
+```
+MSAVE expression
+ /TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}
+ [/FACTOR=expression]
+ [/SPLIT=expression]
+ [/OUTFILE=file]
+ [/VARIABLES=variable…]
+ [/SNAMES=variable…]
+ [/FNAMES=variable…].
+```
+The `MSAVE` command evaluates the expression specified just after the
+command name, and writes the resulting matrix to a [matrix file](matrices.md#matrix-files).
+
+The `TYPE` subcommand is required. It specifies the `ROWTYPE_` to
+write along with this matrix.
+
+The `FACTOR` and `SPLIT` subcommands are required on the first
+`MSAVE` if and only if the matrix file has factor or split variables,
+respectively. After that, their values are carried along from one
+`MSAVE` command to the next in syntax order as defaults. Each one takes
+an expression that must evaluate to a vector with the same number of
+entries as the matrix has factor or split variables, respectively. Each
+`MSAVE` only writes data for a single combination of factor and split
+variables, so many `MSAVE` commands (or one inside a loop) may be needed
+to write a complete set.
+
+The remaining `MSAVE` subcommands define the format of the matrix
+file. All of the `MSAVE` commands within a given matrix program write
+to the same matrix file, so these subcommands are only meaningful on the
+first `MSAVE` command within a matrix program. (If they are given again
+on later `MSAVE` commands, then they must have the same values as on the
+first.)
+
+The `OUTFILE` subcommand specifies the name or handle of the matrix
+file to be written. Output must go to an external file, not a data set
+or the active file.
+
+The `VARIABLES` subcommand specifies a comma-separated list of the
+names of the continuous variables to be written to the matrix file. The
+`TO` keyword can be used to define variables named with consecutive
+integer suffixes. These names become column names and names that appear
+in `VARNAME_` in the matrix file. `ROWTYPE_` and `VARNAME_` are not
+allowed on `VARIABLES`. If `VARIABLES` is omitted, then PSPP uses the
+names `COL1`, `COL2`, and so on.
+
+The `FNAMES` subcommand may be used to supply a comma-separated list
+of factor variable names. The default names are `FAC1`, `FAC2`, and so
+on.
+
+The `SNAMES` subcommand can supply a comma-separated list of split
+variable names. The default names are `SPL1`, `SPL2`, and so on.
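+
+As a sketch (file and variable names hypothetical), the following
+writes a vector of means and a correlation matrix for two continuous
+variables to one matrix file:
+
+```
+MSAVE {5.5, 6.25} /TYPE=MEAN /OUTFILE='matrix.sav' /VARIABLES=d1, d2.
+MSAVE {1, .85; .85, 1} /TYPE=CORR.
+```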
+
+## `DISPLAY` Command
+
+```
+DISPLAY [{DICTIONARY | STATUS}].
+```
+The `DISPLAY` command makes PSPP display a table with the name and
+dimensions of each matrix variable. The `DICTIONARY` and `STATUS`
+keywords are accepted but have no effect.
+
+## `RELEASE` Command
+
+```
+RELEASE variable….
+```
+The `RELEASE` command accepts a comma-separated list of matrix
+variable names. It deletes each variable and releases the memory
+associated with it.
+
+The `END MATRIX` command releases all matrix variables.
+++ /dev/null
-# Matrices
-
-Some PSPP procedures work with matrices by producing numeric matrices
-that report results of data analysis, or by consuming matrices as a
-basis for further analysis. This chapter documents the [format of
-data files](#matrix-files) that store these matrices and commands for
-working with them, as well as PSPP's general-purpose facility for
-matrix operations.
-
-## Matrix Files
-
-A matrix file is an SPSS system file that conforms to the dictionary and
-case structure described in this section. Procedures that read matrices
-from files expect them to be in the matrix file format. Procedures that
-write matrices also use this format.
-
-Text files that contain matrices can be converted to matrix file
-format. The [MATRIX DATA](matrix-data.md) command can read a text
-file as a matrix file.
-
-A matrix file's dictionary must have the following variables in the
-specified order:
-
-1. Zero or more numeric split variables. These are included by
- procedures when [`SPLIT
- FILE`](../../commands/selection/split-file.md) is active. [`MATRIX
- DATA`](matrix-data.md) assigns split variables format `F4.0`.
-
-2. `ROWTYPE_`, a string variable with width 8. This variable
- indicates the kind of matrix or vector that a given case
- represents. The supported row types are listed below.
-
-3. Zero or more numeric factor variables. These are included by
- procedures that divide data into cells. For within-cell data,
- factor variables are filled with non-missing values; for pooled
- data, they are missing. [`MATRIX DATA`](matrix-data.md) assigns
- factor variables format `F4.0`.
-
-4. `VARNAME_`, a string variable. Matrix data includes one row per
- continuous variable (see below), naming each continuous variable in
- order. This column is blank for vector data. [`MATRIX
- DATA`](matrix-data.md) makes `VARNAME_` wide enough for the name of
- any of the continuous variables, but at least 8 bytes.
-
-5. One or more numeric continuous variables. These are the variables
- whose data was analyzed to produce the matrices. [`MATRIX
- DATA`](matrix-data.md) assigns continuous variables format `F10.4`.
-
-Case weights are ignored in matrix files.
-
-### Row Types
-
-Matrix files support a fixed set of types of matrix and vector data.
-The `ROWTYPE_` variable in each case of a matrix file indicates its row
-type.
-
-The supported matrix row types are listed below. Each type is listed
-with the keyword that identifies it in `ROWTYPE_`. All supported types
-of matrices are square, meaning that each matrix must include one row
-per continuous variable, with the `VARNAME_` variable indicating each
-continuous variable in turn in the same order as the dictionary.
-
-* `CORR`
- Correlation coefficients.
-
-* `COV`
- Covariance coefficients.
-
-* `MAT`
- General-purpose matrix.
-
-* `N_MATRIX`
- Counts.
-
-* `PROX`
- Proximities matrix.
-
-The supported vector row types are listed below, along with their
-associated keyword. Vector row types only require a single row, whose
-`VARNAME_` is blank:
-
-* `COUNT`
- Unweighted counts.
-
-* `DFE`
- Degrees of freedom.
-
-* `MEAN`
- Means.
-
-* `MSE`
- Mean squared errors.
-
-* `N`
- Counts.
-
-* `STDDEV`
- Standard deviations.
-
-Only the row types listed above may appear in matrix files. The
-[`MATRIX DATA`](matrix-data.md) command, however, accepts the additional row types
-listed below, which it changes into matrix file row types as part of
-its conversion process:
-
-* `N_VECTOR`
- Synonym for `N`.
-
-* `SD`
- Synonym for `STDDEV`.
-
-* `N_SCALAR`
- Accepts a single number from the [`MATRIX DATA`](matrix-data.md)
- input and writes it as an `N` row with the number replicated across
- all the continuous variables.
-
+++ /dev/null
-MATRIX DATA
-================
-
-```
-MATRIX DATA
- VARIABLES=VARIABLES
- [FILE={'FILE_NAME' | INLINE}
- [/FORMAT=[{LIST | FREE}]
- [{UPPER | LOWER | FULL}]
- [{DIAGONAL | NODIAGONAL}]]
- [/SPLIT=SPLIT_VARS]
- [/FACTORS=FACTOR_VARS]
- [/N=N]
-
-The following subcommands are only needed when ROWTYPE_ is not
-specified on the VARIABLES subcommand:
- [/CONTENTS={CORR,COUNT,COV,DFE,MAT,MEAN,MSE,
- N_MATRIX,N|N_VECTOR,N_SCALAR,PROX,SD|STDDEV}]
- [/CELLS=N_CELLS]
-```
-
-The `MATRIX DATA` command converts matrices and vectors from text
-format into the [matrix file format](index.md#matrix-files) for use by
-procedures that read matrices. It reads a text file or inline data and
-outputs to the active file, replacing any data already in the active
-dataset. The matrix file may then be used by other commands directly
-from the active file, or it may be written to a `.sav` file using the
-`SAVE` command.
-
-The text data read by `MATRIX DATA` can be delimited by spaces or
-commas. A plus or minus sign, except immediately following a `d` or
-`e`, also begins a new value. Optionally, values may be enclosed in
-single or double quotes.
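
As an illustration only (this is a rough Python sketch, not PSPP's actual lexer, and `split_values` is a hypothetical helper), the splitting rules above might be modeled like this:

```python
def split_values(line):
    """Split a MATRIX DATA input line into value strings.

    Spaces and commas separate values; '+' or '-' starts a new value
    unless it immediately follows 'd' or 'e' (an exponent, as in 1e-5);
    values may optionally be enclosed in single or double quotes.
    """
    values, cur, quote = [], "", None
    for ch in line:
        if quote:                      # inside a quoted value
            if ch == quote:
                quote = None
            else:
                cur += ch
        elif ch in "'\"":              # start of a quoted value
            quote = ch
        elif ch in " \t,":             # delimiter
            if cur:
                values.append(cur)
            cur = ""
        elif ch in "+-" and cur and cur[-1].lower() not in "de":
            values.append(cur)         # sign starts a new value...
            cur = ch
        else:                          # ...except after an exponent marker
            cur += ch
    if cur:
        values.append(cur)
    return values
```

Note how `1e-5+2` splits into two values while `3+4` also splits into two, because the `-` after `e` is part of an exponent.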
-
-`MATRIX DATA` can read the types of matrix and vector data supported
-in matrix files (see [Row Types](index.md#row-types)).
-
-The `FILE` subcommand specifies the source of the command's input. To
-read input from a text file, specify its name in quotes. To supply
-input inline, omit `FILE` or specify `INLINE`. Inline data must
-directly follow `MATRIX DATA`, inside [`BEGIN
-DATA`](../../commands/data-io/begin-data.md).
-
-`VARIABLES` is the only required subcommand. It names the variables
-present in each input record in the order that they appear. (`MATRIX
-DATA` reorders the variables in the matrix file it produces, if needed
-to fit the matrix file format.) The variable list must include split
-variables and factor variables, if they are present in the data, in
-addition to the continuous variables that form matrix rows and columns.
-It may also include a special variable named `ROWTYPE_`.
-
-Matrix data may include split variables or factor variables or both.
-List split variables, if any, on the `SPLIT` subcommand and factor
-variables, if any, on the `FACTORS` subcommand. Split and factor
-variables must be numeric. Split and factor variables must also be
-listed on `VARIABLES`, with one exception: if `VARIABLES` does not
-include `ROWTYPE_`, then `SPLIT` may name a single variable that is not
-in `VARIABLES` (see [Example 8](#example-8-split-variable-with-sequential-values)).
-
-The `FORMAT` subcommand accepts settings to describe the format of
-the input data:
-
-* `LIST` (default)
- `FREE`
-
- `LIST` requires each row to begin at the start of a new input line.
- `FREE` allows rows to begin in the middle of a line. Either setting
- allows a single row to continue across multiple input lines.
-
-* `LOWER` (default)
- `UPPER`
- `FULL`
-
- With `LOWER`, only the lower triangle is read from the input data and
- the upper triangle is mirrored across the main diagonal. `UPPER`
- behaves similarly for the upper triangle. `FULL` reads the entire
- matrix.
-
-* `DIAGONAL` (default)
- `NODIAGONAL`
-
- With `DIAGONAL`, the main diagonal is read from the input data. With
- `NODIAGONAL`, which is incompatible with `FULL`, the main diagonal is
- not read from the input data but instead set to 1 for correlation
- matrices and system-missing for others.
-
-The `N` subcommand is a way to specify the size of the population.
-It is equivalent to specifying an `N` vector with the specified value
-for each split file.
-
-`MATRIX DATA` supports two different ways to indicate the kinds of
-matrices and vectors present in the data, depending on whether a
-variable with the special name `ROWTYPE_` is present in `VARIABLES`.
-The following subsections explain `MATRIX DATA` syntax and behavior in
-each case.
-
-<!-- toc -->
-
-## With `ROWTYPE_`
-
-If `VARIABLES` includes `ROWTYPE_`, each case's `ROWTYPE_` indicates
-the type of data contained in the row. See [Row
-Types](index.md#row-types) for a list of supported row types.
-
-### Example 1: Defaults with `ROWTYPE_`
-
-This example shows a simple use of `MATRIX DATA` with `ROWTYPE_` plus 8
-variables named `var01` through `var08`.
-
-Because `ROWTYPE_` is the first variable in `VARIABLES`, it appears
-first on each line. The first three lines in the example data have
-`ROWTYPE_` values of `MEAN`, `SD`, and `N`. These indicate that these
-lines contain vectors of means, standard deviations, and counts,
-respectively, for `var01` through `var08` in order.
-
-The remaining 8 lines have a `ROWTYPE_` of `CORR`, which indicates that
-the values are correlation coefficients. Each of the lines corresponds
-to a row in the correlation matrix: the first line is for `var01`, the
-next line for `var02`, and so on. The input only contains values for
-the lower triangle, including the diagonal, since `FORMAT=LOWER
-DIAGONAL` is the default.
-
-With `ROWTYPE_`, the `CONTENTS` subcommand is optional and the
-`CELLS` subcommand may not be used.
-
-```
-MATRIX DATA
- VARIABLES=ROWTYPE_ var01 TO var08.
-BEGIN DATA.
-MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
-SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
-N 92 92 92 92 92 92 92 92
-CORR 1.00
-CORR .18 1.00
-CORR -.22 -.17 1.00
-CORR .36 .31 -.14 1.00
-CORR .27 .16 -.12 .22 1.00
-CORR .33 .15 -.17 .24 .21 1.00
-CORR .50 .29 -.20 .32 .12 .38 1.00
-CORR .17 .29 -.05 .20 .27 .20 .04 1.00
-END DATA.
-```
-
-### Example 2: `FORMAT=UPPER NODIAGONAL`
-
-This syntax produces the same matrix file as example 1, but it uses
-`FORMAT=UPPER NODIAGONAL` to specify the upper triangle and omit the
-diagonal. Because the matrix's `ROWTYPE_` is `CORR`, PSPP automatically
-fills in the diagonal with 1.
-
-```
-MATRIX DATA
- VARIABLES=ROWTYPE_ var01 TO var08
- /FORMAT=UPPER NODIAGONAL.
-BEGIN DATA.
-MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
-SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
-N 92 92 92 92 92 92 92 92
-CORR .18 -.22 .36 .27 .33 .50 .17
-CORR -.17 .31 .16 .15 .29 .29
-CORR -.14 -.12 -.17 -.20 -.05
-CORR .22 .24 .32 .20
-CORR .21 .12 .27
-CORR .38 .20
-CORR .04
-END DATA.
-```
-
-### Example 3: `N` subcommand
-
-This syntax uses the `N` subcommand in place of an `N` vector. It
-produces the same matrix file as examples 1 and 2.
-
-```
-MATRIX DATA
- VARIABLES=ROWTYPE_ var01 TO var08
- /FORMAT=UPPER NODIAGONAL
- /N=92.
-BEGIN DATA.
-MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
-SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
-CORR .18 -.22 .36 .27 .33 .50 .17
-CORR -.17 .31 .16 .15 .29 .29
-CORR -.14 -.12 -.17 -.20 -.05
-CORR .22 .24 .32 .20
-CORR .21 .12 .27
-CORR .38 .20
-CORR .04
-END DATA.
-```
-
-### Example 4: Split variables
-
-This syntax defines two matrices, using the variable `s1` to distinguish
-between them. Notice how the order of variables in the input matches
-their order on `VARIABLES`. This example also uses `FORMAT=FULL`.
-
-```
-MATRIX DATA
- VARIABLES=s1 ROWTYPE_ var01 TO var04
- /SPLIT=s1
- /FORMAT=FULL.
-BEGIN DATA.
-0 MEAN 34 35 36 37
-0 SD 22 11 55 66
-0 N 99 98 99 92
-0 CORR 1 .9 .8 .7
-0 CORR .9 1 .6 .5
-0 CORR .8 .6 1 .4
-0 CORR .7 .5 .4 1
-1 MEAN 44 45 34 39
-1 SD 23 15 51 46
-1 N 98 34 87 23
-1 CORR 1 .2 .3 .4
-1 CORR .2 1 .5 .6
-1 CORR .3 .5 1 .7
-1 CORR .4 .6 .7 1
-END DATA.
-```
-
-### Example 5: Factor variables
-
-This syntax defines a matrix file that includes a factor variable `f1`.
-The data includes mean, standard deviation, and count vectors for two
-values of the factor variable, plus a correlation matrix for pooled
-data.
-
-```
-MATRIX DATA
- VARIABLES=ROWTYPE_ f1 var01 TO var04
- /FACTORS=f1.
-BEGIN DATA.
-MEAN 0 34 35 36 37
-SD 0 22 11 55 66
-N 0 99 98 99 92
-MEAN 1 44 45 34 39
-SD 1 23 15 51 46
-N 1 98 34 87 23
-CORR . 1
-CORR . .9 1
-CORR . .8 .6 1
-CORR . .7 .5 .4 1
-END DATA.
-```
-
-## Without `ROWTYPE_`
-
-If `VARIABLES` does not contain `ROWTYPE_`, the `CONTENTS` subcommand
-defines the row types that appear in the file and their order. If
-`CONTENTS` is omitted, `CONTENTS=CORR` is assumed.
-
-Factor variables without `ROWTYPE_` introduce special requirements,
-illustrated below in Examples 9 and 10.
-
-### Example 6: Defaults without `ROWTYPE_`
-
-This example shows a simple use of `MATRIX DATA` with 8 variables named
-`var01` through `var08`, without `ROWTYPE_`. This yields the same
-matrix file as [Example 1](#example-1-defaults-with-rowtype_).
-
-```
-MATRIX DATA
- VARIABLES=var01 TO var08
- /CONTENTS=MEAN SD N CORR.
-BEGIN DATA.
-24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
- 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
- 92 92 92 92 92 92 92 92
-1.00
- .18 1.00
--.22 -.17 1.00
- .36 .31 -.14 1.00
- .27 .16 -.12 .22 1.00
- .33 .15 -.17 .24 .21 1.00
- .50 .29 -.20 .32 .12 .38 1.00
- .17 .29 -.05 .20 .27 .20 .04 1.00
-END DATA.
-```
-
-### Example 7: Split variables with explicit values
-
-This syntax defines two matrices, using the variable `s1` to distinguish
-between them. Each line of data begins with `s1`. This yields the same
-matrix file as [Example 4](#example-4-split-variables).
-
-```
-MATRIX DATA
- VARIABLES=s1 var01 TO var04
- /SPLIT=s1
- /FORMAT=FULL
- /CONTENTS=MEAN SD N CORR.
-BEGIN DATA.
-0 34 35 36 37
-0 22 11 55 66
-0 99 98 99 92
-0 1 .9 .8 .7
-0 .9 1 .6 .5
-0 .8 .6 1 .4
-0 .7 .5 .4 1
-1 44 45 34 39
-1 23 15 51 46
-1 98 34 87 23
-1 1 .2 .3 .4
-1 .2 1 .5 .6
-1 .3 .5 1 .7
-1 .4 .6 .7 1
-END DATA.
-```
-
-### Example 8: Split variable with sequential values
-
-Like the previous example, this syntax defines two matrices with split
-variable `s1`. In this case, though, `s1` is not listed in `VARIABLES`,
-which means that its value does not appear in the data. Instead,
-`MATRIX DATA` reads matrix data until the input is exhausted, supplying
-1 for the first split, 2 for the second, and so on.
-
-```
-MATRIX DATA
- VARIABLES=var01 TO var04
- /SPLIT=s1
- /FORMAT=FULL
- /CONTENTS=MEAN SD N CORR.
-BEGIN DATA.
-34 35 36 37
-22 11 55 66
-99 98 99 92
- 1 .9 .8 .7
-.9 1 .6 .5
-.8 .6 1 .4
-.7 .5 .4 1
-44 45 34 39
-23 15 51 46
-98 34 87 23
- 1 .2 .3 .4
-.2 1 .5 .6
-.3 .5 1 .7
-.4 .6 .7 1
-END DATA.
-```
-
-### Factor variables without `ROWTYPE_`
-
-Without `ROWTYPE_`, factor variables introduce two new wrinkles to
-`MATRIX DATA` syntax. First, the `CELLS` subcommand must declare the
-number of combinations of factor variables present in the data. If
-there is, for example, one factor variable for which the data contains
-three values, one would write `CELLS=3`; if there are two (or more)
-factor variables for which the data contains five combinations, one
-would use `CELLS=5`; and so on.
-
-Second, the `CONTENTS` subcommand must distinguish within-cell data
-from pooled data by enclosing within-cell row types in parentheses.
-When different within-cell row types for a single factor appear in
-consecutive lines, enclose the row types in a single set of parentheses;
-when different factors' values for a given within-cell row type appear
-in consecutive lines, enclose each row type in individual parentheses.
-
-Without `ROWTYPE_`, input lines for pooled data do not include factor
-values, not even as missing values, but input lines for within-cell data
-do.
-
-The following examples aim to clarify this syntax.
-
-#### Example 9: Factor variables, grouping within-cell records by factor
-
-This syntax defines the same matrix file as [Example
-5](#example-5-factor-variables), without using `ROWTYPE_`. It
-declares `CELLS=2` because the data contains two values (0 and 1) for
-factor variable `f1`. Within-cell vector row types `MEAN`, `SD`, and
-`N` are in a single set of parentheses on `CONTENTS` because they are
-grouped together in consecutive lines for a single factor value. The
-data lines with the pooled correlation matrix do not have any factor
-values.
-
-```
-MATRIX DATA
- VARIABLES=f1 var01 TO var04
- /FACTORS=f1
- /CELLS=2
- /CONTENTS=(MEAN SD N) CORR.
-BEGIN DATA.
-0 34 35 36 37
-0 22 11 55 66
-0 99 98 99 92
-1 44 45 34 39
-1 23 15 51 46
-1 98 34 87 23
- 1
- .9 1
- .8 .6 1
- .7 .5 .4 1
-END DATA.
-```
-
-#### Example 10: Factor variables, grouping within-cell records by row type
-
-This syntax defines the same matrix file as the previous example. The
-only difference is that the within-cell vector rows are grouped
-differently: two rows of means (one for each factor), followed by two
-rows of standard deviations, followed by two rows of counts.
-
-```
-MATRIX DATA
- VARIABLES=f1 var01 TO var04
- /FACTORS=f1
- /CELLS=2
- /CONTENTS=(MEAN) (SD) (N) CORR.
-BEGIN DATA.
-0 34 35 36 37
-1 44 45 34 39
-0 22 11 55 66
-1 23 15 51 46
-0 99 98 99 92
-1 98 34 87 23
- 1
- .9 1
- .8 .6 1
- .7 .5 .4 1
-END DATA.
-```
-
-# MATRIX…END MATRIX
-
-<!-- toc -->
-
-## Summary
-
-```
-MATRIX.
-…matrix commands…
-END MATRIX.
-```
-
-The following basic matrix commands are supported:
-
-```
-COMPUTE variable[(index[,index])]=expression.
-CALL procedure(argument, …).
-PRINT [expression]
- [/FORMAT=format]
- [/TITLE=title]
- [/SPACE={NEWPAGE | n}]
- [{/RLABELS=string… | /RNAMES=expression}]
- [{/CLABELS=string… | /CNAMES=expression}].
-```
-
-The following matrix commands offer support for flow control:
-
-```
-DO IF expression.
- …matrix commands…
-[ELSE IF expression.
- …matrix commands…]…
-[ELSE.
- …matrix commands…]
-END IF.
-
-LOOP [var=first TO last [BY step]] [IF expression].
- …matrix commands…
-END LOOP [IF expression].
-
-BREAK.
-```
-
-The following matrix commands support matrix input and output:
-
-```
-READ variable[(index[,index])]
- [/FILE=file]
- /FIELD=first TO last [BY width]
- [/FORMAT=format]
- [/SIZE=expression]
- [/MODE={RECTANGULAR | SYMMETRIC}]
- [/REREAD].
-WRITE expression
- [/OUTFILE=file]
- /FIELD=first TO last [BY width]
- [/MODE={RECTANGULAR | TRIANGULAR}]
- [/HOLD]
- [/FORMAT=format].
-GET variable[(index[,index])]
- [/FILE={file | *}]
- [/VARIABLES=variable…]
- [/NAMES=expression]
- [/MISSING={ACCEPT | OMIT | number}]
- [/SYSMIS={OMIT | number}].
-SAVE expression
- [/OUTFILE={file | *}]
- [/VARIABLES=variable…]
- [/NAMES=expression]
- [/STRINGS=variable…].
-MGET [/FILE=file]
- [/TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}].
-MSAVE expression
- /TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}
- [/OUTFILE=file]
- [/VARIABLES=variable…]
- [/SNAMES=variable…]
- [/SPLIT=expression]
- [/FNAMES=variable…]
- [/FACTOR=expression].
-```
-
-The following matrix commands provide additional support:
-
-```
-DISPLAY [{DICTIONARY | STATUS}].
-RELEASE variable….
-```
-
-`MATRIX` and `END MATRIX` enclose a special PSPP sub-language, called
-the matrix language. The matrix language does not require an active
-dataset to be defined and only a few of the matrix language commands
-work with any datasets that are defined. Each instance of
-`MATRIX`…`END MATRIX` is a separate program whose state is independent
-of every other instance, so variables declared within a matrix program
-are forgotten at its end.
-
-The matrix language works with matrices, where a "matrix" is a
-rectangular array of real numbers. An `N`×`M` matrix has `N` rows and
-`M` columns. Some special cases are important: an `N`×1 matrix is a
-"column vector", a 1×`N` matrix is a "row vector", and a 1×1 matrix is
-a "scalar".
-
-The matrix language also has limited support for matrices that
-contain 8-byte strings instead of numbers. Strings longer than 8 bytes
-are truncated, and shorter strings are padded with spaces. String
-matrices are mainly useful for labeling rows and columns when printing
-numerical matrices with the matrix language's `PRINT` command. Arithmetic
-operations on string matrices will not produce useful results. The user
-should not mix strings and numbers within a matrix.
-
-The matrix language does not work with cases. A variable in the
-matrix language represents a single matrix.
-
-The matrix language does not support missing values.
-
-`MATRIX` is a procedure, so it cannot be enclosed inside `DO IF`,
-`LOOP`, etc.
-
-Macros defined before a matrix program may be used within a matrix
-program, and macros may expand to include entire matrix programs. The
-[`DEFINE`](../../commands/control/define.md) command to define new
-macros may not appear within a matrix program.
-
-The following sections describe the details of the matrix language:
-first, the syntax of matrix expressions, then each of the supported
-commands. The [`COMMENT`](../utilities/comment.md) command is also
-supported.
-
-## Matrix Expressions
-
-Many matrix commands use expressions. A matrix expression may use the
-following operators, listed in descending order of operator precedence.
-Within a single level, operators associate from left to right.
-
-- [Function call `()`](#matrix-functions) and [matrix construction `{}`](#matrix-construction-operator-)
-
-- [Indexing `()`](#index-operator-)
-
-- [Unary `+` and `-`](#unary-operators)
-
-- [Integer sequence `:`](#integer-sequence-operator-)
-
-- Matrix [`**`](#matrix-exponentiation-operator-) and elementwise [`&**`](#elementwise-binary-operators) exponentiation.
-
-- Matrix [`*`](#matrix-multiplication-operator-) and elementwise [`&*`](#elementwise-binary-operators) multiplication; [elementwise division `/` and `&/`](#elementwise-binary-operators).
-
-- [Addition `+` and subtraction `-`](#elementwise-binary-operators)
-
-- [Relational `<` `<=` `=` `>=` `>` `<>`](#elementwise-binary-operators)
-
-- [Logical `NOT`](#unary-operators)
-
-- [Logical `AND`](#elementwise-binary-operators)
-
-- [Logical `OR` and `XOR`](#elementwise-binary-operators)
-
-The operators are described in more detail below. [Matrix
-Functions](#matrix-functions) documents matrix functions.
-
-Expressions appear in the matrix language in some contexts where there
-would be ambiguity whether `/` is an operator or a separator between
-subcommands. In these contexts, only the operators with higher
-precedence than `/` are allowed outside parentheses. Later sections
-call these "restricted expressions".
-
-### Matrix Construction Operator `{}`
-
-Use the `{}` operator to construct matrices. Within the curly braces,
-commas separate elements within a row and semicolons separate rows. The
-following examples show a 2×3 matrix, a 1×4 row vector, a 3×1 column
-vector, and a scalar.
-
-```
-{1, 2, 3; 4, 5, 6} ⇒ [1 2 3]
- [4 5 6]
-{3.14, 6.28, 9.42, 12.57} ⇒ [3.14 6.28 9.42 12.57]
-{1.41; 1.73; 2} ⇒ [1.41]
- [1.73]
- [2.00]
-{5} ⇒ 5
-```
-
- Curly braces are not limited to holding numeric literals. They can
-contain calculations, and they can paste together matrices and vectors
-in any way as long as the result is rectangular. For example, if `m` is
-matrix `{1, 2; 3, 4}`, `r` is row vector `{5, 6}`, and `c` is column
-vector `{7, 8}`, then curly braces can be used as follows:
-
-```
-{m, c; r, 10} ⇒ [1 2 7]
- [3 4 8]
- [5 6 10]
-{c, 2 * c, T(r)} ⇒ [7 14 5]
- [8 16 6]
-```
-
- The final example above uses the transposition function `T`.
-
-### Integer Sequence Operator `:`
-
-The syntax `FIRST:LAST:STEP` yields a row vector of consecutive integers
-from FIRST to LAST counting by STEP. The final `:STEP` is optional and
-defaults to 1 when omitted.
-
-`FIRST`, `LAST`, and `STEP` must each be a scalar and should be an
-integer (any fractional part is discarded). Because `:` has a high
-precedence, operands other than numeric literals must usually be
-parenthesized.
-
-When `STEP` is positive (or omitted) and `LAST < FIRST`, or if `STEP`
-is negative and `LAST > FIRST`, then the result is an empty matrix. If
-`STEP` is 0, then PSPP reports an error.
-
-Here are some examples:
-
-```
-1:6 ⇒ {1, 2, 3, 4, 5, 6}
-1:6:2 ⇒ {1, 3, 5}
--1:-5:-1 ⇒ {-1, -2, -3, -4, -5}
--1:-5 ⇒ {}
-2:1:0 ⇒ (error)
-```
-
-### Index Operator `()`
-
-The result of the submatrix or indexing operator, written `M(RINDEX,
-CINDEX)`, contains the rows of `M` whose indexes are given in vector
-`RINDEX` and the columns whose indexes are given in vector `CINDEX`.
-
- In the simplest case, if `RINDEX` and `CINDEX` are both scalars, the
-result is also a scalar:
-
-```
-{10, 20; 30, 40}(1, 1) ⇒ 10
-{10, 20; 30, 40}(1, 2) ⇒ 20
-{10, 20; 30, 40}(2, 1) ⇒ 30
-{10, 20; 30, 40}(2, 2) ⇒ 40
-```
-
-If the index arguments have multiple elements, then the result
-includes multiple rows or columns:
-
-```
-{10, 20; 30, 40}(1:2, 1) ⇒ {10; 30}
-{10, 20; 30, 40}(2, 1:2) ⇒ {30, 40}
-{10, 20; 30, 40}(1:2, 1:2) ⇒ {10, 20; 30, 40}
-```
-
-The special argument `:` may stand in for all the rows or columns in
-the matrix being indexed, like this:
-
-```
-{10, 20; 30, 40}(:, 1) ⇒ {10; 30}
-{10, 20; 30, 40}(2, :) ⇒ {30, 40}
-{10, 20; 30, 40}(:, :) ⇒ {10, 20; 30, 40}
-```
-
-The index arguments do not have to be in order, and they may contain
-repeated values, like this:
-
-```
-{10, 20; 30, 40}({2, 1}, 1) ⇒ {30; 10}
-{10, 20; 30, 40}(2, {2; 2; 1}) ⇒ {40, 40, 30}
-{10, 20; 30, 40}(2:1:-1, :) ⇒ {30, 40; 10, 20}
-```
-
-When the matrix being indexed is a row or column vector, only a
-single index argument is needed, like this:
-
-```
-{11, 12, 13, 14, 15}(2:4) ⇒ {12, 13, 14}
-{11; 12; 13; 14; 15}(2:4) ⇒ {12; 13; 14}
-```
-
-When an index is not an integer, PSPP discards the fractional part.
-It is an error for an index to be less than 1 or greater than the number
-of rows or columns:
-
-```
-{11, 12, 13, 14}({2.5, 4.6}) ⇒ {12, 14}
-{11; 12; 13; 14}(0) ⇒ (error)
-```
-
-### Unary Operators
-
-The unary operators take a single operand of any dimensions and operate
-on each of its elements independently. The unary operators are:
-
-* `-`: Inverts the sign of each element.
-* `+`: No change.
-* `NOT`: Logical inversion: each positive value becomes 0 and each
- zero or negative value becomes 1.
-
-Examples:
-
-```
--{1, -2; 3, -4} ⇒ {-1, 2; -3, 4}
-+{1, -2; 3, -4} ⇒ {1, -2; 3, -4}
-NOT {1, 0; -1, 1} ⇒ {0, 1; 1, 0}
-```
-
-### Elementwise Binary Operators
-
-The elementwise binary operators require their operands to be matrices
-with the same dimensions. Alternatively, if one operand is a scalar,
-then its value is treated as if it were duplicated to the dimensions of
-the other operand. The result is a matrix of the same size as the
-operands, in which each element is the result of applying the
-operator to the corresponding elements of the operands.
-
-The elementwise binary operators are listed below.
-
-- The arithmetic operators, for familiar arithmetic operations:
-
- - `+`: Addition.
-
- - `-`: Subtraction.
-
- - `*`: Multiplication, if one operand is a scalar. (Otherwise this
- is matrix multiplication, described below.)
-
- - `/` or `&/`: Division.
-
- - `&*`: Multiplication.
-
- - `&**`: Exponentiation.
-
-- The relational operators, whose results are 1 when a comparison is
- true and 0 when it is false:
-
- - `<` or `LT`: Less than.
-
- - `<=` or `LE`: Less than or equal.
-
- - `=` or `EQ`: Equal.
-
- - `>` or `GT`: Greater than.
-
- - `>=` or `GE`: Greater than or equal.
-
- - `<>` or `~=` or `NE`: Not equal.
-
-- The logical operators, which treat positive operands as true and
- nonpositive operands as false. They yield 0 for false and 1 for
- true:
-
- - `AND`: True if both operands are true.
-
- - `OR`: True if at least one operand is true.
-
- - `XOR`: True if exactly one operand is true.
-
-Examples:
-
-```
-1 + 2 ⇒ 3
-1 + {3; 4} ⇒ {4; 5}
-{66, 77; 88, 99} + 5 ⇒ {71, 82; 93, 104}
-{4, 8; 3, 7} + {1, 0; 5, 2} ⇒ {5, 8; 8, 9}
-{1, 2; 3, 4} < {4, 3; 2, 1} ⇒ {1, 1; 0, 0}
-{1, 3; 2, 4} >= 3 ⇒ {0, 1; 0, 1}
-{0, 0; 1, 1} AND {0, 1; 0, 1} ⇒ {0, 0; 0, 1}
-```
-
-### Matrix Multiplication Operator `*`
-
-If `A` is an `M`×`N` matrix and `B` is an `N`×`P` matrix, then `A*B` is the
-`M`×`P` matrix multiplication product `C`. PSPP reports an error if the
-number of columns in `A` differs from the number of rows in `B`.
-
-The `*` operator performs elementwise multiplication (see above) if
-one of its operands is a scalar.
-
-No built-in operator yields the inverse of matrix multiplication.
-Instead, multiply by the result of `INV` or `GINV`.
-
-Some examples:
-
-```
-{1, 2, 3} * {4; 5; 6} ⇒ 32
-{4; 5; 6} * {1, 2, 3} ⇒ {4, 8, 12;
- 5, 10, 15;
- 6, 12, 18}
-```
-
-### Matrix Exponentiation Operator `**`
-
-The result of `A**B` is defined as follows when `A` is a square matrix
-and `B` is an integer scalar:
-
- - For `B > 0`, `A**B` is `A*…*A`, where there are `B` `A`s. (PSPP
- implements this efficiently for large `B`, using exponentiation by
- squaring.)
-
- - For `B < 0`, `A**B` is `INV(A**(-B))`.
-
- - For `B = 0`, `A**B` is the identity matrix.
-
-PSPP reports an error if `A` is not square or `B` is not an integer.
-
-Examples:
-
-```
-{2, 5; 1, 4}**3 ⇒ {48, 165; 33, 114}
-{2, 5; 1, 4}**0 ⇒ {1, 0; 0, 1}
-10*{4, 7; 2, 6}**-1 ⇒ {6, -7; -2, 4}
-```
-
-## Matrix Functions
-
-The matrix language supports numerous functions in multiple categories.
-The following subsections document each of the currently supported
-functions. The first letter of each parameter's name indicates the
-required argument type:
-
-* `S`: A scalar.
-
-* `N`: A nonnegative integer scalar. (Non-integers are accepted and
- silently rounded down to the nearest integer.)
-
-* `V`: A row or column vector.
-
-* `M`: A matrix.
-
-### Elementwise Functions
-
-These functions act on each element of their argument independently,
-like the [elementwise operators](#elementwise-binary-operators).
-
-* `ABS(M)`
- Takes the absolute value of each element of M.
-
- ```
- ABS({-1, 2; -3, 0}) ⇒ {1, 2; 3, 0}
- ```
-
-* `ARSIN(M)`
- `ARTAN(M)`
- Computes the inverse sine or tangent, respectively, of each
- element in M. The results are in radians, between \\(-\pi/2\\)
- and \\(+\pi/2\\), inclusive.
-
- The value of \\(\pi\\) can be computed as `4*ARTAN(1)`.
-
- ```
- ARSIN({-1, 0, 1}) ⇒ {-1.57, 0, 1.57} (approximately)
-
- ARTAN({-5, -1, 1, 5}) ⇒ {-1.37, -.79, .79, 1.37} (approximately)
- ```
-
-* `COS(M)`
- `SIN(M)`
- Computes the cosine or sine, respectively, of each element in `M`,
- which must be in radians.
-
- ```
- COS({0.785, 1.57; 3.14, 1.57 + 3.14}) ⇒ {.71, 0; -1, 0}
- (approximately)
- ```
-
-* `EXP(M)`
- Computes \\(e^x\\) for each element \\(x\\) in `M`.
-
- ```
- EXP({2, 3; 4, 5}) ⇒ {7.39, 20.09; 54.6, 148.4} (approximately)
- ```
-
-* `LG10(M)`
- `LN(M)`
- Takes the logarithm with base 10 or base \\(e\\), respectively, of each
- element in `M`.
-
- ```
- LG10({1, 10, 100, 1000}) ⇒ {0, 1, 2, 3}
- LG10(0) ⇒ (error)
-
- LN({EXP(1), 1, 2, 3, 4}) ⇒ {1, 0, .69, 1.1, 1.39} (approximately)
- LN(0) ⇒ (error)
- ```
-
-* `MOD(M, S)`
- Takes each element in `M` modulo nonzero scalar value `S`, that
- is, the remainder of division by `S`. The sign of the result is
- the same as the sign of the dividend.
-
- ```
- MOD({5, 4, 3, 2, 1, 0}, 3) ⇒ {2, 1, 0, 2, 1, 0}
- MOD({5, 4, 3, 2, 1, 0}, -3) ⇒ {2, 1, 0, 2, 1, 0}
- MOD({-5, -4, -3, -2, -1, 0}, 3) ⇒ {-2, -1, 0, -2, -1, 0}
- MOD({-5, -4, -3, -2, -1, 0}, -3) ⇒ {-2, -1, 0, -2, -1, 0}
- MOD({5, 4, 3, 2, 1, 0}, 1.5) ⇒ {.5, 1.0, .0, .5, 1.0, .0}
- MOD({5, 4, 3, 2, 1, 0}, 0) ⇒ (error)
- ```
-
-* `RND(M)`
- `TRUNC(M)`
- Rounds each element of `M` to an integer. `RND` rounds to the
- nearest integer, with halves rounded to even integers, and
- `TRUNC` rounds toward zero.
-
- ```
- RND({-1.6, -1.5, -1.4}) ⇒ {-2, -2, -1}
- RND({-.6, -.5, -.4}) ⇒ {-1, 0, 0}
- RND({.4, .5, .6}) ⇒ {0, 0, 1}
- RND({1.4, 1.5, 1.6}) ⇒ {1, 2, 2}
-
- TRUNC({-1.6, -1.5, -1.4}) ⇒ {-1, -1, -1}
- TRUNC({-.6, -.5, -.4}) ⇒ {0, 0, 0}
- TRUNC({.4, .5, .6}) ⇒ {0, 0, 0}
- TRUNC({1.4, 1.5, 1.6}) ⇒ {1, 1, 1}
- ```
-
-* `SQRT(M)`
- Takes the square root of each element of `M`, which must not be
- negative.
-
- ```
- SQRT({0, 1, 2, 4, 9, 81}) ⇒ {0, 1, 1.41, 2, 3, 9} (approximately)
- SQRT(-1) ⇒ (error)
- ```
-
-### Logical Functions
-
-* `ALL(M)`
- Returns a scalar with value 1 if all of the elements in `M` are
- nonzero, or 0 if at least one element is zero.
-
- ```
- ALL({1, 2, 3} < {2, 3, 4}) ⇒ 1
- ALL({2, 2, 3} < {2, 3, 4}) ⇒ 0
- ALL({2, 3, 3} < {2, 3, 4}) ⇒ 0
- ALL({2, 3, 4} < {2, 3, 4}) ⇒ 0
- ```
-
-* `ANY(M)`
- Returns a scalar with value 1 if any of the elements in `M` is
- nonzero, or 0 if all of them are zero.
-
- ```
- ANY({1, 2, 3} < {2, 3, 4}) ⇒ 1
- ANY({2, 2, 3} < {2, 3, 4}) ⇒ 1
- ANY({2, 3, 3} < {2, 3, 4}) ⇒ 1
- ANY({2, 3, 4} < {2, 3, 4}) ⇒ 0
- ```
-
-### Matrix Construction Functions
-
-* `BLOCK(M1, …, MN)`
- Returns a block diagonal matrix with as many rows as the sum of
- its arguments' row counts and as many columns as the sum of their
- columns. Each argument matrix is placed along the main diagonal
- of the result, and all other elements are zero.
-
- ```
- BLOCK({1, 2; 3, 4}, 5, {7; 8; 9}, {10, 11}) ⇒
- 1 2 0 0 0 0
- 3 4 0 0 0 0
- 0 0 5 0 0 0
- 0 0 0 7 0 0
- 0 0 0 8 0 0
- 0 0 0 9 0 0
- 0 0 0 0 10 11
- ```
-
-* `IDENT(N)`
- `IDENT(NR, NC)`
- Returns an identity matrix, whose main diagonal elements are one
- and whose other elements are zero. The returned matrix has `N`
- rows and columns or `NR` rows and `NC` columns, respectively.
-
- ```
- IDENT(1) ⇒ 1
- IDENT(2) ⇒
- 1 0
- 0 1
- IDENT(3, 5) ⇒
- 1 0 0 0 0
- 0 1 0 0 0
- 0 0 1 0 0
- IDENT(5, 3) ⇒
- 1 0 0
- 0 1 0
- 0 0 1
- 0 0 0
- 0 0 0
- ```
-
-* `MAGIC(N)`
- Returns an `N`×`N` matrix that contains each of the integers
- 1…\\(N^2\\) once, in which each column, each row, and each diagonal
- sums to \\(N(N^2+1)/2\\). There are many magic squares with given
- dimensions, but this function always returns the same one for a
- given value of `N`.
-
- ```
- MAGIC(3) ⇒ {8, 1, 6; 3, 5, 7; 4, 9, 2}
- MAGIC(4) ⇒ {1, 5, 12, 16; 15, 11, 6, 2; 14, 8, 9, 3; 4, 10, 7, 13}
- ```
-
-* `MAKE(NR, NC, S)`
- Returns an `NR`×`NC` matrix whose elements are all `S`.
-
- ```
- MAKE(1, 2, 3) ⇒ {3, 3}
- MAKE(2, 1, 4) ⇒ {4; 4}
- MAKE(2, 3, 5) ⇒ {5, 5, 5; 5, 5, 5}
- ```
-
-* <a name="mdiag">`MDIAG(V)`</a>
- Given `N`-element vector `V`, returns an `N`×`N` matrix whose main
- diagonal is copied from `V`. The other elements in the returned
- matrix are zero.
-
- Use [`CALL SETDIAG`](#setdiag) to replace the main diagonal of a
- matrix in-place.
-
- ```
- MDIAG({1, 2, 3, 4}) ⇒
- 1 0 0 0
- 0 2 0 0
- 0 0 3 0
- 0 0 0 4
- ```
-
-* `RESHAPE(M, NR, NC)`
- Returns an `NR`×`NC` matrix whose elements come from `M`, which
- must have the same number of elements as the new matrix, copying
- elements from `M` to the new matrix row by row.
-
- ```
- RESHAPE(1:12, 1, 12) ⇒
- 1 2 3 4 5 6 7 8 9 10 11 12
- RESHAPE(1:12, 2, 6) ⇒
- 1 2 3 4 5 6
- 7 8 9 10 11 12
- RESHAPE(1:12, 3, 4) ⇒
- 1 2 3 4
- 5 6 7 8
- 9 10 11 12
- RESHAPE(1:12, 4, 3) ⇒
- 1 2 3
- 4 5 6
- 7 8 9
- 10 11 12
- ```
-
-* `T(M)`
- `TRANSPOS(M)`
- Returns `M` with rows exchanged for columns.
-
- ```
- T({1, 2, 3}) ⇒ {1; 2; 3}
- T({1; 2; 3}) ⇒ {1, 2, 3}
- ```
-
-* `UNIFORM(NR, NC)`
- Returns an `NR`×`NC` matrix in which each element is randomly
- chosen from a uniform distribution of real numbers between 0
- and 1. Random number generation honors the current
- [seed](../utilities/set.md#seed) setting.
-
- The following example shows one possible output, but of course
- every result will be different (given different seeds):
-
- ```
- UNIFORM(4, 5)*10 ⇒
- 7.71 2.99 .21 4.95 6.34
- 4.43 7.49 8.32 4.99 5.83
- 2.25 .25 1.98 7.09 7.61
- 2.66 1.69 2.64 .88 1.50
- ```
-
-### Minimum, Maximum, and Sum Functions
-
-* `CMIN(M)`
- `CMAX(M)`
- `CSUM(M)`
- `CSSQ(M)`
- Returns a row vector with the same number of columns as `M`, in
- which each element is the minimum, maximum, sum, or sum of
- squares, respectively, of the elements in the same column of `M`.
-
- ```
- CMIN({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {1, 2, 3}
- CMAX({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {7, 8, 9}
- CSUM({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {12, 15, 18}
- CSSQ({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {66, 93, 126}
- ```
-
-* `MMIN(M)`
- `MMAX(M)`
- `MSUM(M)`
- `MSSQ(M)`
- Returns the minimum, maximum, sum, or sum of squares, respectively,
- of the elements of `M`.
-
- ```
- MMIN({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 1
- MMAX({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 9
- MSUM({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 45
- MSSQ({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ 285
- ```
-
-* `RMIN(M)`
- `RMAX(M)`
- `RSUM(M)`
- `RSSQ(M)`
- Returns a column vector with the same number of rows as `M`, in
- which each element is the minimum, maximum, sum, or sum of
- squares, respectively, of the elements in the same row of `M`.
-
- ```
- RMIN({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {1; 4; 7}
- RMAX({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {3; 6; 9}
- RSUM({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {6; 15; 24}
- RSSQ({1, 2, 3; 4, 5, 6; 7, 8, 9}) ⇒ {14; 77; 194}
- ```
-
-* `SSCP(M)`
- Returns \\({\bf M}^{\bf T} × \bf M\\).
-
- ```
- SSCP({1, 2, 3; 4, 5, 6}) ⇒ {17, 22, 27; 22, 29, 36; 27, 36, 45}
- ```
-
-* `TRACE(M)`
- Returns the sum of the elements along `M`'s main diagonal,
- equivalent to `MSUM(DIAG(M))`.
-
- ```
- TRACE(MDIAG(1:5)) ⇒ 15
- ```
-
-### Matrix Property Functions
-
-* `NROW(M)`
- `NCOL(M)`
- Returns the number of rows or columns, respectively, in `M`.
-
- ```
- NROW({1, 0; -2, -3; 3, 3}) ⇒ 3
- NROW(1:5) ⇒ 1
-
- NCOL({1, 0; -2, -3; 3, 3}) ⇒ 2
- NCOL(1:5) ⇒ 5
- ```
-
-* `DIAG(M)`
- Returns a column vector containing a copy of `M`'s main diagonal.
- The vector's length is the lesser of `NCOL(M)` and `NROW(M)`.
-
- ```
- DIAG({1, 0; -2, -3; 3, 3}) ⇒ {1; -3}
- ```
-
-### Matrix Rank Ordering Functions
-
-The `GRADE` and `RNKORDER` functions each take a matrix `M` and return
-a matrix `R` with the same dimensions. Each element in `R` ranges
-between 1 and the number of elements `N` in `M`, inclusive. When the
-elements in `M` all have unique values, both of these functions yield
-the same results: the smallest element in `M` corresponds to value 1
-in `R`, the next smallest to 2, and so on, up to the largest, which
-corresponds to `N`. When multiple elements in `M` have the same value,
-these functions use different rules for handling the ties.
-
-* `GRADE(M)`
- Returns a ranking of `M`, turning duplicate values into sequential
- ranks. The returned matrix always contains each of the integers 1
- through the number of elements in the matrix exactly once.
-
- ```
- GRADE({1, 0, 3; 3, 1, 2; 3, 0, 5}) ⇒ {3, 1, 6; 7, 4, 5; 8, 2, 9}
- ```
-
-* `RNKORDER(M)`
- Returns a ranking of `M`, turning duplicate values into the mean
- of their sequential ranks.
-
- ```
- RNKORDER({1, 0, 3; 3, 1, 2; 3, 0, 5})
- ⇒ {3.5, 1.5, 7; 7, 3.5, 5; 7, 1.5, 9}
- ```
-
-One may use `GRADE` to sort a vector:
-
-```
-COMPUTE v(GRADE(v))=v. /* Sort v in ascending order.
-COMPUTE v(GRADE(-v))=v. /* Sort v in descending order.
-```
-
-### Matrix Algebra Functions
-
-* `CHOL(M)`
- Matrix `M` must be an `N`×`N` symmetric positive-definite matrix.
- Returns an `N`×`N` matrix `B` such that \\({\bf B}^{\bf T}×{\bf
- B}=\bf M\\).
-
- ```
- CHOL({4, 12, -16; 12, 37, -43; -16, -43, 98}) ⇒
- 2 6 -8
- 0 1 5
- 0 0 3
- ```
-
-* `DESIGN(M)`
- Returns a design matrix for `M`. The design matrix has the same
- number of rows as `M`. Each column `C` in `M`, from left to right,
- yields a group of columns in the output. For each unique value
- `V` in `C`, from top to bottom, the output gains a column in
- which `V` becomes 1 and other values become 0.
-
- PSPP issues a warning if a column only contains a single unique
- value.
-
- ```
- DESIGN({1; 2; 3}) ⇒ {1, 0, 0; 0, 1, 0; 0, 0, 1}
- DESIGN({5; 8; 5}) ⇒ {1, 0; 0, 1; 1, 0}
- DESIGN({1, 5; 2, 8; 3, 5})
- ⇒ {1, 0, 0, 1, 0; 0, 1, 0, 0, 1; 0, 0, 1, 1, 0}
- DESIGN({5; 5; 5}) ⇒ (warning)
- ```
-
-* `DET(M)`
- Returns the determinant of square matrix `M`.
-
- ```
- DET({3, 7; 1, -4}) ⇒ -19
- ```
-
-* <a name="eval">`EVAL(M)`</a>
- Returns a column vector containing the eigenvalues of symmetric
- matrix `M`, sorted in descending order.
-
- Use [`CALL EIGEN`](#eigen) to compute eigenvalues and eigenvectors
- of a matrix.
-
- ```
- EVAL({2, 0, 0; 0, 3, 4; 0, 4, 9}) ⇒ {11; 2; 1}
- ```
-
-* `GINV(M)`
- Returns the `K`×`N` matrix `A` that is the "generalized inverse"
- of `N`×`K` matrix `M`, defined such that \\({\bf M}×{\bf A}×{\bf
- M}={\bf M}\\) and \\({\bf A}×{\bf M}×{\bf A}={\bf A}\\).
-
- ```
- GINV({1, 2}) ⇒ {.2; .4} (approximately)
- {1:9} * GINV(1:9) * {1:9} ⇒ {1:9} (approximately)
- ```
-
-* `GSCH(M)`
- `M` must be an `N`×`P` matrix, with `P` ≥ `N` and rank `N`. Returns
- an `N`×`N` orthonormal basis for `M`, obtained using the
- [Gram-Schmidt
- process](https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process).
-
- ```
- GSCH({3, 2; 1, 2}) * SQRT(10) ⇒ {3, -1; 1, 3} (approximately)
- ```
-
-* `INV(M)`
- Returns the `N`×`N` matrix `A` that is the inverse of `N`×`N` matrix `M`,
- defined such that \\({\bf M}×{\bf A} = {\bf A}×{\bf M} = {\bf I}\\), where `I` is the identity matrix.
- `M` must not be singular, that is, \\(\det({\bf M}) ≠ 0\\).
-
- ```
- INV({4, 7; 2, 6}) ⇒ {.6, -.7; -.2, .4} (approximately)
- ```
-
-* `KRONEKER(MA, MB)`
- Returns the `MP`×`NQ` matrix that is the [Kronecker
- product](https://en.wikipedia.org/wiki/Kronecker_product) of `M`×`N`
- matrix `MA` and `P`×`Q` matrix `MB`. One may view the result as the
- concatenation of multiple `P`×`Q` blocks, each of which is `MB`
- multiplied by a different element of `MA`. For example,
- when `MA` is a 2×2 matrix, `KRONEKER(MA, MB)` is equivalent to
- `{MA(1,1)*MB, MA(1,2)*MB; MA(2,1)*MB, MA(2,2)*MB}`.
-
- ```
- KRONEKER({1, 2; 3, 4}, {0, 5; 6, 7}) ⇒
- 0 5 0 10
- 6 7 12 14
- 0 15 0 20
- 18 21 24 28
- ```
-
-* `RANK(M)`
- Returns the rank of matrix `M`, an integer scalar whose value is the
- dimension of the vector space spanned by its columns or,
- equivalently, by its rows.
-
- ```
- RANK({1, 0, 1; -2, -3, 1; 3, 3, 0}) ⇒ 2
- RANK({1, 1, 0, 2; -1, -1, 0, -2}) ⇒ 1
- RANK({1, -1; 1, -1; 0, 0; 2, -2}) ⇒ 1
- RANK({1, 2, 1; -2, -3, 1; 3, 5, 0}) ⇒ 2
- RANK({1, 0, 2; 2, 1, 0; 3, 2, 1}) ⇒ 3
- ```
-
-* `SOLVE(MA, MB)`
- `MA` must be an `N`×`N` matrix, with \\(\det({\bf MA}) ≠ 0\\), and `MB` an `N`×`Q` matrix.
- Returns an `N`×`Q` matrix `X` such that \\({\bf MA} × {\bf X} = {\bf MB}\\).
-
- All of the following examples show approximate results:
-
- ```
- SOLVE({2, 3; 4, 9}, {6, 2; 15, 5}) ⇒
- 1.50 .50
- 1.00 .33
- SOLVE({1, 3, -2; 3, 5, 6; 2, 4, 3}, {5; 7; 8}) ⇒
- -15.00
- 8.00
- 2.00
- SOLVE({2, 1, -1; -3, -1, 2; -2, 1, 2}, {8; -11; -3}) ⇒
- 2.00
- 3.00
- -1.00
- ```
-
-* <a name="sval">`SVAL(M)`</a>
-
- Given `P`×`Q` matrix `M`, returns a \\(\min(P,Q)\\)-element column vector
- containing the singular values of `M` in descending order.
-
- Use [`CALL SVD`](#svd) to compute the full singular value
- decomposition of a matrix.
-
- ```
- SVAL({1, 1; 0, 0}) ⇒ {1.41; .00}
- SVAL({1, 0, 1; 0, 1, 1; 0, 0, 0}) ⇒ {1.73; 1.00; .00}
- SVAL({2, 4; 1, 3; 0, 0; 0, 0}) ⇒ {5.46; .37}
- ```
-
-* `SWEEP(M, NK)`
- Given `P`×`Q` matrix `M` and integer scalar \\(k\\) = `NK` such that \\(1 ≤ k ≤
- \min(P,Q)\\), returns the `P`×`Q` sweep matrix `A`.
-
- If \\({\bf M}_{kk} ≠ 0\\), then:
-
- $$
- \begin{align}
- A_{kk} &= 1/M_{kk},\\\\
- A_{ik} &= -M_{ik}/M_{kk} \text{ for } i ≠ k,\\\\
- A_{kj} &= M_{kj}/M_{kk} \text{ for } j ≠ k,\\\\
- A_{ij} &= M_{ij} - M_{ik}M_{kj}/M_{kk} \text{ for } i ≠ k \text{ and } j ≠ k.
- \end{align}
- $$
-
- If \\({\bf M}_{kk}\\) = 0, then:
-
- $$
- \begin{align}
- A_{ik} &= A_{ki} = 0, \\\\
- A_{ij} &= M_{ij}, \text{ for } i ≠ k \text{ and } j ≠ k.
- \end{align}
- $$
-
- Given `M = {0, 1, 2; 3, 4, 5; 6, 7, 8}`, then (approximately):
-
- ```
- SWEEP(M, 1) ⇒
- .00 .00 .00
- .00 4.00 5.00
- .00 7.00 8.00
- SWEEP(M, 2) ⇒
- -.75 -.25 .75
- .75 .25 1.25
- .75 -1.75 -.75
- SWEEP(M, 3) ⇒
- -1.50 -.75 -.25
- -.75 -.38 -.63
- .75 .88 .13
- ```
-
-### Matrix Statistical Distribution Functions
-
-The matrix language can calculate several functions of standard
-statistical distributions using the same syntax and semantics as in
-PSPP transformation expressions. See [Statistical Distribution
-Functions](../../language/expressions/functions/statistical-distributions.md)
-for details.
-
- The matrix language extends the `PDF`, `CDF`, `SIG`, `IDF`, `NPDF`,
-and `NCDF` functions by allowing the first parameter to each of these
-functions to be a vector or matrix with any dimensions. In addition,
-`CDF.BVNOR` and `PDF.BVNOR` allow either or both of their first two
-parameters to be vectors or matrices; if both are non-scalar then they
-must have the same dimensions. In each case, the result is a matrix
-or vector with the same dimensions as the input populated with
-elementwise calculations.
-
-### `EOF` Function
-
-This function works with files being used on the `READ` statement.
-
-* `EOF(FILE)`
-
- Given a file handle or file name `FILE`, returns an integer scalar 1
- if the last line in the file has been read or 0 if more lines are
- available. Determining this requires attempting to read another
- line, which means that `REREAD` on the next `READ` command
- following `EOF` on the same file will be ineffective.
-
-The `EOF` function gives a matrix program the flexibility to read a
-file with text data without knowing the length of the file in advance.
-For example, the following program will read all the lines of data in
-`data.txt`, each consisting of three numbers, as rows in matrix `data`:
-
-```
-MATRIX.
-COMPUTE data={}.
-LOOP IF NOT EOF('data.txt').
- READ row/FILE='data.txt'/FIELD=1 TO 1000/SIZE={1,3}.
- COMPUTE data={data; row}.
-END LOOP.
-PRINT data.
-END MATRIX.
-```
-
-## `COMPUTE` Command
-
-```
-COMPUTE variable[(index[,index])]=expression.
-```
-
- The `COMPUTE` command evaluates an expression and assigns the
-result to a variable or a submatrix of a variable. Assigning to a
-submatrix uses the same syntax as the [index
-operator](#index-operator-).
-
-## `CALL` Command
-
-A matrix function returns a single result. The `CALL` command
-implements procedures, which take a similar syntactic form to functions
-but yield results by modifying their arguments rather than returning a
-value.
-
-Each output argument to a `CALL` procedure must be a single variable
-name.
-
-The following procedures are implemented via `CALL` to allow them to
-return multiple results. For these procedures, the output arguments
-need not name existing variables; if they do, then their previous
-values are replaced:
-
-* <a name="eigen">`CALL EIGEN(M, EVEC, EVAL)`</a>
-
- Computes the eigenvalues and eigenvectors of symmetric `N`×`N` matrix `M`.
- Assigns the eigenvectors of `M` to the columns of `N`×`N` matrix `EVEC` and
- the eigenvalues in descending order to `N`-element column vector
- `EVAL`.
-
- Use the [`EVAL`](#eval) function to compute just the eigenvalues of
- a symmetric matrix.
-
- For example, the following matrix language commands:
-
- ```
- CALL EIGEN({1, 0; 0, 1}, evec, eval).
- PRINT evec.
- PRINT eval.
-
- CALL EIGEN({3, 2, 4; 2, 0, 2; 4, 2, 3}, evec2, eval2).
- PRINT evec2.
- PRINT eval2.
- ```
-
- yield this output:
-
- ```
- evec
- 1 0
- 0 1
-
- eval
- 1
- 1
-
- evec2
- -.6666666667 .0000000000 .7453559925
- -.3333333333 -.8944271910 -.2981423970
- -.6666666667 .4472135955 -.5962847940
-
- eval2
- 8.0000000000
- -1.0000000000
- -1.0000000000
- ```
-
-* <a name="svd">`CALL SVD(M, U, S, V)`</a>
-
- Computes the singular value decomposition of `P`×`Q` matrix `M`,
- assigning to `S` a `P`×`Q` diagonal matrix and to `U` and `V` unitary
- `P`×`P` and `Q`×`Q` matrices, respectively, such that
- \\({\bf M} = {\bf U}×{\bf S}×{\bf V}^{\bf T}\\). The main diagonal of
- `S` contains the singular values of `M`.
-
- Use the [`SVAL`](#sval) function to compute just the singular values
- of a matrix.
-
- For example, the following matrix program:
-
- ```
- CALL SVD({3, 2, 2; 2, 3, -2}, u, s, v).
- PRINT (u * s * T(v))/FORMAT F5.1.
- ```
-
- yields this output:
-
- ```
- (u * s * T(v))
- 3.0 2.0 2.0
- 2.0 3.0 -2.0
- ```
-
-The final procedure is implemented via `CALL` to allow it to modify a
-matrix instead of returning a modified version. For this procedure,
-the output argument must name an existing variable.
-
-* <a name="setdiag">`CALL SETDIAG(M, V)`</a>
-
- Replaces the main diagonal of `N`×`P` matrix `M` by the contents of
- `K`-element vector `V`. If `K` = 1, so that `V` is a scalar, replaces all
- of the diagonal elements of `M` by `V`. If \\(K < \min(N,P)\\), only the
- first \\(K\\) diagonal elements are replaced; if \\(K > \min(N,P)\\), then the
- extra elements of `V` are ignored.
-
- Use the [`MDIAG`](#mdiag) function to construct a new matrix with a
- specified main diagonal.
-
- For example, this matrix program:
-
- ```
- COMPUTE x={1, 2, 3; 4, 5, 6; 7, 8, 9}.
- CALL SETDIAG(x, 10).
- PRINT x.
- ```
-
- outputs the following:
-
- ```
- x
- 10 2 3
- 4 10 6
- 7 8 10
- ```
-
-## `PRINT` Command
-
-```
-PRINT [expression]
- [/FORMAT=format]
- [/TITLE=title]
- [/SPACE={NEWPAGE | n}]
- [{/RLABELS=string… | /RNAMES=expression}]
- [{/CLABELS=string… | /CNAMES=expression}].
-```
-
- The `PRINT` command is commonly used to display a matrix. It
-evaluates the restricted EXPRESSION, if present, and outputs it either
-as text or a pivot table, depending on the setting of
-[`MDISPLAY`](../utilities/set.md#mdisplay).
-
- Use the `FORMAT` subcommand to specify a format, such as `F8.2`, for
-displaying the matrix elements. `FORMAT` is optional for numerical
-matrices. When it is omitted, PSPP chooses how to format entries
-automatically using \\(m\\), the magnitude of the largest-magnitude element in
-the matrix to be displayed:
-
- 1. If \\(m < 10^{11}\\) and the matrix's elements are all integers,
- PSPP chooses the narrowest `F` format that fits \\(m\\) plus a
- sign. For example, if the matrix is `{1:10}`, then \\(m = 10\\),
- which fits in 3 columns with room for a sign, so the format is
- `F3.0`.
-
- 2. Otherwise, if \\(m ≥ 10^9\\) or \\(m ≤ 10^{-4}\\), PSPP scales
- all of the numbers in the matrix by \\(10^x\\), where \\(x\\) is
- the exponent that would be used to display \\(m\\) in scientific
- notation. For example, for \\(m = 5.123×10^{20}\\), the scale
- factor is \\(10^{20}\\). PSPP displays the scaled values in
- format `F13.10` and notes the scale factor in the output.
-
- 3. Otherwise, PSPP displays the matrix values, without scaling, in
- format `F13.10`.
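These rules can be paraphrased in code. The following Python sketch is a simplified, hypothetical model of the selection logic (not PSPP's actual implementation); it returns a format string and a scale factor:

```python
import math

def choose_format(values):
    # Simplified model of the three rules above; edge cases such as an
    # all-zero matrix are ignored for brevity.
    m = max(abs(v) for v in values)
    if m < 1e11 and all(v == int(v) for v in values):
        width = len(str(int(m))) + 1        # narrowest F that fits m plus a sign
        return "F%d.0" % width, 1
    if m >= 1e9 or m <= 1e-4:
        exponent = math.floor(math.log10(m))
        return "F13.10", 10 ** exponent     # values are scaled by 10**exponent
    return "F13.10", 1

print(choose_format(range(1, 11)))          # ('F3.0', 1)
```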
-
- The optional `TITLE` subcommand specifies a title for the output text
-or table, as a quoted string. When it is omitted, the syntax of the
-matrix expression is used as the title.
-
- Use the `SPACE` subcommand to request extra space above the matrix
-output. With a numerical argument, it adds the specified number of
-lines of blank space above the matrix. With `NEWPAGE` as an argument,
-it prints the matrix at the top of a new page. The `SPACE` subcommand
-has no effect when a matrix is output as a pivot table.
-
- The `RLABELS` and `RNAMES` subcommands, which are mutually exclusive,
-can supply a label to accompany each row in the output. With `RLABELS`,
-specify the labels as comma-separated strings or other tokens. With
-`RNAMES`, specify a single expression that evaluates to a vector of
-strings. Either way, if there are more labels than rows, the extra
-labels are ignored, and if there are more rows than labels, the extra
-rows are unlabeled. For output to a pivot table with `RLABELS`, the
-labels can be any length; otherwise, the labels are truncated to 8
-bytes.
-
- The `CLABELS` and `CNAMES` subcommands work for labeling columns as
-`RLABELS` and `RNAMES` do for labeling rows.
-
- When the EXPRESSION is omitted, `PRINT` does not output a matrix.
-Instead, it outputs only the text specified on `TITLE`, if any, preceded
-by any space specified on the `SPACE` subcommand, if any. Any other
-subcommands are ignored, and the command acts as if `MDISPLAY` is set to
-`TEXT` regardless of its actual setting.
-
-### Example
-
- The following syntax demonstrates two different ways to label the
-rows and columns of a matrix with `PRINT`:
-
-```
-MATRIX.
-COMPUTE m={1, 2, 3; 4, 5, 6; 7, 8, 9}.
-PRINT m/RLABELS=a, b, c/CLABELS=x, y, z.
-
-COMPUTE rlabels={"a", "b", "c"}.
-COMPUTE clabels={"x", "y", "z"}.
-PRINT m/RNAMES=rlabels/CNAMES=clabels.
-END MATRIX.
-```
-
-With `MDISPLAY=TEXT` (the default), this program outputs the following
-(twice):
-
-```
-m
- x y z
-a 1 2 3
-b 4 5 6
-c 7 8 9
-```
-
-With `SET MDISPLAY=TABLES.` added above `MATRIX.`, the output becomes
-the following (twice):
-
-```
- m
-┌─┬─┬─┬─┐
-│ │x│y│z│
-├─┼─┼─┼─┤
-│a│1│2│3│
-│b│4│5│6│
-│c│7│8│9│
-└─┴─┴─┴─┘
-```
-
-
-## `DO IF` Command
-
-```
-DO IF expression.
- …matrix commands…
-[ELSE IF expression.
- …matrix commands…]…
-[ELSE
- …matrix commands…]
-END IF.
-```
-
- A `DO IF` command evaluates its expression argument. If the `DO IF`
-expression evaluates to true, then PSPP executes the associated
-commands. Otherwise, PSPP evaluates the expression on each `ELSE IF`
-clause (if any) in order, and executes the commands associated with the
-first one that yields a true value. Finally, if the `DO IF` and all the
-`ELSE IF` expressions all evaluate to false, PSPP executes the commands
-following the `ELSE` clause (if any).
-
- Each expression on `DO IF` and `ELSE IF` must evaluate to a scalar.
-Positive scalars are considered to be true, and scalars that are zero or
-negative are considered to be false.
-
-### Example
-
- The following matrix language fragment sets `b` to the term
-following `a` in the [Juggler
-sequence](https://en.wikipedia.org/wiki/Juggler_sequence):
-
-```
-DO IF MOD(a, 2) = 0.
- COMPUTE b = TRUNC(a &** (1/2)).
-ELSE.
- COMPUTE b = TRUNC(a &** (3/2)).
-END IF.
-```
-
-## `LOOP` and `BREAK` Commands
-
-```
-LOOP [var=first TO last [BY step]] [IF expression].
- …matrix commands…
-END LOOP [IF expression].
-
-BREAK.
-```
-
- The `LOOP` command executes a nested group of matrix commands,
-called the loop's "body", repeatedly. It has three optional clauses
-that control how many times the loop body executes. Regardless of
-these clauses, the global `MXLOOPS` setting, which defaults to 40,
-also limits the number of iterations of a loop. To iterate more
-times, raise the maximum with [`SET
-MXLOOPS`](../utilities/set.md#mxloops) outside of the `MATRIX`
-command.
-
- The optional index clause causes VAR to be assigned successive
-values on each trip through the loop: first `FIRST`, then `FIRST +
-STEP`, then `FIRST + 2 × STEP`, and so on. The loop ends when `VAR >
-LAST`, for positive `STEP`, or `VAR < LAST`, for negative `STEP`. If
-`STEP` is not specified, it defaults to 1. All the index clause
-expressions must evaluate to scalars, and non-integers are rounded
-toward zero. If `STEP` evaluates as zero (or rounds to zero), then
-the loop body never executes.
-
- The optional `IF` on `LOOP` is evaluated before each iteration
-through the loop body. If its expression, which must evaluate to a
-scalar, is zero or negative, then the loop terminates without executing
-the loop body.
-
- The optional `IF` on `END LOOP` is evaluated after each iteration
-through the loop body. If its expression, which must evaluate to a
-scalar, is zero or negative, then the loop terminates.
-
-### Example
-
- The following computes and prints \\(l(n)\\), whose value is the
-number of steps in the [Juggler
-sequence](https://en.wikipedia.org/wiki/Juggler_sequence) for \\(n\\),
-for \\( 2 \le n \le 10\\):
-
-```
-COMPUTE l = {}.
-LOOP n = 2 TO 10.
- COMPUTE a = n.
- LOOP i = 1 TO 100.
- DO IF MOD(a, 2) = 0.
- COMPUTE a = TRUNC(a &** (1/2)).
- ELSE.
- COMPUTE a = TRUNC(a &** (3/2)).
- END IF.
- END LOOP IF a = 1.
- COMPUTE l = {l; i}.
-END LOOP.
-PRINT l.
-```
-
-### `BREAK` Command
-
-The `BREAK` command may be used inside a loop body, ordinarily within a
-`DO IF` command. If it is executed, then the loop terminates
-immediately, jumping to the command just following `END LOOP`. When
-multiple `LOOP` commands nest, `BREAK` terminates the innermost loop.
-
-#### Example
-
-The following example is a revision of the one above that shows how
-`BREAK` could substitute for the index and `IF` clauses on `LOOP` and
-`END LOOP`:
-
-```
-COMPUTE l = {}.
-LOOP n = 2 TO 10.
- COMPUTE a = n.
- COMPUTE i = 1.
- LOOP.
- DO IF MOD(a, 2) = 0.
- COMPUTE a = TRUNC(a &** (1/2)).
- ELSE.
- COMPUTE a = TRUNC(a &** (3/2)).
- END IF.
- DO IF a = 1.
- BREAK.
- END IF.
- COMPUTE i = i + 1.
- END LOOP.
- COMPUTE l = {l; i}.
-END LOOP.
-PRINT l.
-```
-
-## `READ` and `WRITE` Commands
-
-The `READ` and `WRITE` commands perform matrix input and output with
-text files. They share the following syntax for specifying how data is
-divided among input lines:
-
-```
-/FIELD=first TO last [BY width]
-[/FORMAT=format]
-```
-
-Both commands require the `FIELD` subcommand. It specifies the range
-of columns, from FIRST to LAST, inclusive, that the data occupies on
-each line of the file. The leftmost column is column 1. The columns
-must be literal numbers, not expressions. To use entire lines, even if
-they might be very long, specify a column range such as `1 TO 100000`.
-
-The `FORMAT` subcommand is optional for numerical matrices. For
-string matrix input and output, specify an `A` format. In addition to
-`FORMAT`, the optional `BY` specification on `FIELD` determines the
-meaning of each text line:
-
-- With neither `BY` nor `FORMAT`, the numbers in the text file are in
- `F` format separated by spaces or commas. For `WRITE`, PSPP uses
- as many digits of precision as needed to accurately represent the
- numbers in the matrix.
-
-- `BY width` divides the input area into fixed-width fields with the
- given width. The input area must be a multiple of width columns
- wide. Numbers are read or written as `Fwidth.0` format.
-
-- `FORMAT="countF"` divides the input area into `count` equal-width
-  fields per line, where `count` is an integer. The input area must
-  be a multiple of `count` columns wide. Another format type may be
-  substituted for `F`.
-
-- `FORMAT=Fw[.d]` divides the input area into fixed-width fields
- with width `w`. The input area must be a multiple of `w` columns
- wide. Another format type may be substituted for `F`. The
- `READ` command disregards `d`.
-
-- `FORMAT=F` specifies format `F` without indicating a field width.
- Another format type may be substituted for `F`. The `WRITE`
- command accepts this form, but it has no effect unless `BY` is also
- used to specify a field width.
-
-If `BY` and `FORMAT` both specify or imply a field width, then they
-must indicate the same field width.
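The interaction of `FIELD` and `BY` can be illustrated with a small Python sketch (a hypothetical helper covering only the free-format and fixed-width cases):

```python
def split_fields(line, first, last, width=None):
    # FIELD=first TO last [BY width]: columns are 1-based and inclusive.
    area = line[first - 1:last]
    if width is None:
        # Free format: numbers separated by spaces or commas.
        return [float(tok) for tok in area.replace(",", " ").split()]
    # Fixed width: consecutive width-column fields, read in F format.
    return [float(area[i:i + width]) for i in range(0, len(area.rstrip()), width)]

print(split_fields("1, 2, 4", 1, 100))  # [1.0, 2.0, 4.0]
print(split_fields("456", 1, 100, 1))   # [4.0, 5.0, 6.0]
```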
-
-### `READ` Command
-
-```
-READ variable[(index[,index])]
- [/FILE=file]
- /FIELD=first TO last [BY width]
- [/FORMAT=format]
- [/SIZE=expression]
- [/MODE={RECTANGULAR | SYMMETRIC}]
- [/REREAD].
-```
-
-The `READ` command reads from a text file into a matrix variable.
-Specify the target variable just after the command name, either just a
-variable name to create or replace an entire variable, or a variable
-name followed by an indexing expression to replace a submatrix of an
-existing variable.
-
-The `FILE` subcommand is required in the first `READ` command that
-appears within `MATRIX`. It specifies the text file to be read,
-either as a file name in quotes or a file handle previously declared
-on [`FILE HANDLE`](../data-io/file-handle.md). Later `READ`
-commands (in syntax order) use the previously referenced file if `FILE`
-is omitted.
-
-The `FIELD` and `FORMAT` subcommands specify how input lines are
-interpreted. `FIELD` is required, but `FORMAT` is optional. See
-[`READ` and `WRITE` Commands](#read-and-write-commands), for details.
-
-The `SIZE` subcommand is required for reading into an entire
-variable. Its restricted expression argument should evaluate to a
-2-element vector `{N, M}` or `{N; M}`, which indicates an `N`×`M`
-matrix destination. A scalar `N` is also allowed and indicates an
-`N`×1 column vector destination. When the destination is a submatrix,
-`SIZE` is optional, and if it is present then it must match the size
-of the submatrix.
-
-By default, or with `MODE=RECTANGULAR`, the command reads an entry
-for every row and column. With `MODE=SYMMETRIC`, the command reads only
-the entries on and below the matrix's main diagonal, and copies the
-entries above the main diagonal from the corresponding symmetric entries
-below it. Only square matrices may use `MODE=SYMMETRIC`.
-
-Ordinarily, each `READ` command starts from a new line in the text
-file. Specify the `REREAD` subcommand to instead start from the last
-line read by the previous `READ` command. This has no effect for the
-first `READ` command to read from a particular file. It is also
-ineffective just after a command that uses the [`EOF` matrix
-function](#eof-function) on a particular file, because `EOF` has to
-try to read the next line from the file to determine whether the file
-contains more input.
-
-#### Example 1: Basic Use
-
-The following matrix program reads the same matrix `{1, 2, 4; 2, 3, 5;
-4, 5, 6}` into matrix variables `v`, `w`, and `x`:
-
-```
-READ v /FILE='input.txt' /FIELD=1 TO 100 /SIZE={3, 3}.
-READ w /FIELD=1 TO 100 /SIZE={3; 3} /MODE=SYMMETRIC.
-READ x /FIELD=1 TO 100 BY 1/SIZE={3, 3} /MODE=SYMMETRIC.
-```
-given that `input.txt` contains the following:
-
-```
-1, 2, 4
-2, 3, 5
-4, 5, 6
-1
-2 3
-4 5 6
-1
-23
-456
-```
-The `READ` command will read as many lines of input as needed for a
-particular row, so it's also acceptable to break any of the lines above
-into multiple lines. For example, the first line `1, 2, 4` could be
-written with a line break following either or both commas.
-
-#### Example 2: Reading into a Submatrix
-
-The following reads a 5×5 matrix from `input2.txt`, reversing the order
-of the rows:
-
-```
-COMPUTE m = MAKE(5, 5, 0).
-LOOP r = 5 TO 1 BY -1.
- READ m(r, :) /FILE='input2.txt' /FIELD=1 TO 100.
-END LOOP.
-```
-#### Example 3: Using `REREAD`
-
-Suppose each of the 5 lines in a file `input3.txt` starts with an
-integer COUNT followed by COUNT numbers, e.g.:
-
-```
-1 5
-3 1 2 3
-5 6 -1 2 5 1
-2 8 9
-3 1 3 2
-```
-Then, the following reads this file into a matrix `m`:
-
-```
-COMPUTE m = MAKE(5, 5, 0).
-LOOP i = 1 TO 5.
- READ count /FILE='input3.txt' /FIELD=1 TO 1 /SIZE=1.
- READ m(i, 1:count) /FIELD=3 TO 100 /REREAD.
-END LOOP.
-```
-### `WRITE` Command
-
-```
-WRITE expression
- [/OUTFILE=file]
- /FIELD=first TO last [BY width]
- [/FORMAT=format]
- [/MODE={RECTANGULAR | TRIANGULAR}]
- [/HOLD].
-```
-The `WRITE` command evaluates an expression and writes its value to a
-text file in a specified format. Specify the expression to evaluate
-just after the command name.
-
-The `OUTFILE` subcommand is required in the first `WRITE` command that
-appears within `MATRIX`. It specifies the text file to be written,
-either as a file name in quotes or a file handle previously declared
-on [`FILE HANDLE`](../data-io/file-handle.md). Later `WRITE` commands
-(in syntax order) use the previously referenced file if `OUTFILE` is
-omitted.
-
-The `FIELD` and `FORMAT` subcommands specify how output lines are
-formed. `FIELD` is required, but `FORMAT` is optional. See [`READ`
-and `WRITE` Commands](#read-and-write-commands), for details.
-
-By default, or with `MODE=RECTANGULAR`, the command writes an entry
-for every row and column. With `MODE=TRIANGULAR`, the command writes
-only the entries on and below the matrix's main diagonal. Entries above
-the diagonal are not written. Only square matrices may be written with
-`MODE=TRIANGULAR`.
-
-Ordinarily, each `WRITE` command writes complete lines to the output
-file. With `HOLD`, the final line written by `WRITE` will be held back
-for the next `WRITE` command to augment. This can be useful to write
-more than one matrix on a single output line.
-
-#### Example 1: Basic Usage
-
-This matrix program:
-
-```
-WRITE {1, 2; 3, 4} /OUTFILE='matrix.txt' /FIELD=1 TO 80.
-```
-writes the following to `matrix.txt`:
-
-```
- 1 2
- 3 4
-```
-#### Example 2: Triangular Matrix
-
-This matrix program:
-
-```
-WRITE MAGIC(5) /OUTFILE='matrix.txt' /FIELD=1 TO 80 BY 5 /MODE=TRIANGULAR.
-```
-writes the following to `matrix.txt`:
-
-```
- 17
- 23 5
- 4 6 13
- 10 12 19 21
- 11 18 25 2 9
-```
-## `GET` Command
-
-```
-GET variable[(index[,index])]
- [/FILE={file | *}]
- [/VARIABLES=variable…]
- [/NAMES=variable]
- [/MISSING={ACCEPT | OMIT | number}]
- [/SYSMIS={OMIT | number}].
-```
- The `GET` command reads numeric data from an SPSS system file,
-SPSS/PC+ system file, or SPSS portable file into a matrix variable or
-submatrix:
-
-- To read data into a variable, specify just its name following
- `GET`. The variable need not already exist; if it does, it is
- replaced. The variable will have as many columns as there are
- variables specified on the `VARIABLES` subcommand and as many rows
- as there are cases in the input file.
-
-- To read data into a submatrix, specify the name of an existing
- variable, followed by an indexing expression, just after `GET`.
- The submatrix must have as many columns as variables specified on
- `VARIABLES` and as many rows as cases in the input file.
-
-Specify the name or handle of the file to be read on `FILE`. Use
-`*`, or simply omit the `FILE` subcommand, to read from the active file.
-Reading from the active file is only permitted if it was already defined
-outside `MATRIX`.
-
-List the variables to be read as columns in the matrix on the
-`VARIABLES` subcommand. The list can use `TO` for collections of
-variables or `ALL` for all variables. If `VARIABLES` is omitted, all
-variables are read. Only numeric variables may be read.
-
-If a variable is named on `NAMES`, then the names of the variables
-read as data columns are stored in a string vector within the given
-name, replacing any existing matrix variable with that name. Variable
-names are truncated to 8 bytes.
-
-The `MISSING` and `SYSMIS` subcommands control the treatment of
-missing values in the input file. By default, any user- or
-system-missing data in the variables being read from the input causes an
-error that prevents `GET` from executing. To accept missing values,
-specify one of the following settings on `MISSING`:
-
-* `ACCEPT`: Accept user-missing values with no change.
-
- By default, system-missing values still yield an error. Use the
- `SYSMIS` subcommand to change this treatment:
-
- - `OMIT`: Skip any case that contains a system-missing value.
-
- - `number`: Recode the system-missing value to `number`.
-
-* `OMIT`: Skip any case that contains any user- or system-missing value.
-
-* `number`: Recode all user- and system-missing values to `number`.
-
-The `SYSMIS` subcommand has an effect only with `MISSING=ACCEPT`.
-
-## `SAVE` Command
-
-```
-SAVE expression
- [/OUTFILE={file | *}]
- [/VARIABLES=variable…]
- [/NAMES=expression]
- [/STRINGS=variable…].
-```
-The `SAVE` matrix command evaluates expression and writes the
-resulting matrix to an SPSS system file. In the system file, each
-matrix row becomes a case and each column becomes a variable.
-
-Specify the name or handle of the SPSS system file on the `OUTFILE`
-subcommand, or `*` to write the output as the new active file. The
-`OUTFILE` subcommand is required on the first `SAVE` command, in syntax
-order, within `MATRIX`. For `SAVE` commands after the first, the
-default output file is the same as the previous.
-
-When multiple `SAVE` commands write to one destination within a
-single `MATRIX`, the later commands append to the same output file. All
-the matrices written to the file must have the same number of columns.
-The `VARIABLES`, `NAMES`, and `STRINGS` subcommands are honored only for
-the first `SAVE` command that writes to a given file.
-
-By default, `SAVE` names the variables in the output file `COL1`
-through `COLn`. Use `VARIABLES` or `NAMES` to give the variables
-meaningful names. The `VARIABLES` subcommand accepts a comma-separated
-list of variable names. Its alternative, `NAMES`, instead accepts an
-expression that must evaluate to a row or column string vector of names.
-The number of names need not exactly match the number of columns in the
-matrix to be written: extra names are ignored; extra columns use default
-names.
-
-By default, `SAVE` assumes that the matrix to be written is all
-numeric. To write string columns, specify a comma-separated list of the
-string columns' variable names on `STRINGS`.
-
-## `MGET` Command
-
-```
-MGET [/FILE=file]
- [/TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}].
-```
-The `MGET` command reads the data from a [matrix file](index.md#matrix-files) into matrix variables.
-
-All of `MGET`'s subcommands are optional. Specify the name or handle
-of the matrix file to be read on the `FILE` subcommand; if it is
-omitted, then the command reads the active file.
-
-By default, `MGET` reads all of the data from the matrix file.
-Specify a space-delimited list of matrix types on `TYPE` to limit the
-kinds of data read to those specified:
-
-* `COV`: Covariance matrix.
-* `CORR`: Correlation coefficient matrix.
-* `MEAN`: Vector of means.
-* `STDDEV`: Vector of standard deviations.
-* `N`: Vector of case counts.
-* `COUNT`: Vector of counts.
-
-`MGET` reads the entire matrix file and automatically names, creates,
-and populates matrix variables using its contents. It constructs the
-name of each variable by concatenating the following:
-
-- A 2-character prefix that identifies the type of the matrix:
-
- * `CV`: Covariance matrix.
- * `CR`: Correlation coefficient matrix.
- * `MN`: Vector of means.
- * `SD`: Vector of standard deviations.
- * `NC`: Vector of case counts.
- * `CN`: Vector of counts.
-
-- If the matrix file has factor variables, `Fn`, where `n` is a number
- identifying a group of factors: `F1` for the first group, `F2` for
- the second, and so on. This part is omitted for pooled data (where
- the factors all have the system-missing value).
-
-- If the matrix file has split file variables, `Sn`, where `n` is a
- number identifying a split group: `S1` for the first group, `S2`
- for the second, and so on.
-
-If `MGET` chooses the name of an existing variable, it issues a
-warning and does not change the variable.
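-
-For example, assuming that `moments.sav` is a matrix file containing a
-covariance matrix and a vector of means (and no factor or split
-variables), the sketch below reads both and prints the variables that
-`MGET` creates:
-
-```
-MATRIX.
-MGET /FILE='moments.sav' /TYPE=COV MEAN.
-PRINT CV.
-PRINT MN.
-END MATRIX.
-```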
-
-## `MSAVE` Command
-
-```
-MSAVE expression
- /TYPE={COV | CORR | MEAN | STDDEV | N | COUNT}
- [/FACTOR=expression]
- [/SPLIT=expression]
- [/OUTFILE=file]
- [/VARIABLES=variable…]
- [/SNAMES=variable…]
- [/FNAMES=variable…].
-```
-The `MSAVE` command evaluates the expression specified just after the
-command name, and writes the resulting matrix to a [matrix file](index.md#matrix-files).
-
-The `TYPE` subcommand is required. It specifies the `ROWTYPE_` to
-write along with this matrix.
-
-The `FACTOR` and `SPLIT` subcommands are required on the first
-`MSAVE` if and only if the matrix file has factor or split variables,
-respectively. After that, their values are carried along from one
-`MSAVE` command to the next in syntax order as defaults. Each one takes
-an expression that must evaluate to a vector with the same number of
-entries as the matrix has factor or split variables, respectively. Each
-`MSAVE` only writes data for a single combination of factor and split
-variables, so many `MSAVE` commands (or one inside a loop) may be needed
-to write a complete set.
-
-The remaining `MSAVE` subcommands define the format of the matrix
-file. All of the `MSAVE` commands within a given matrix program write
-to the same matrix file, so these subcommands are only meaningful on the
-first `MSAVE` command within a matrix program. (If they are given again
-on later `MSAVE` commands, then they must have the same values as on the
-first.)
-
-The `OUTFILE` subcommand specifies the name or handle of the matrix
-file to be written. Output must go to an external file, not a data set
-or the active file.
-
-The `VARIABLES` subcommand specifies a comma-separated list of the
-names of the continuous variables to be written to the matrix file. The
-`TO` keyword can be used to define variables named with consecutive
-integer suffixes. These names become column names and names that appear
-in `VARNAME_` in the matrix file. `ROWTYPE_` and `VARNAME_` are not
-allowed on `VARIABLES`. If `VARIABLES` is omitted, then PSPP uses the
-names `COL1`, `COL2`, and so on.
-
-The `FNAMES` subcommand may be used to supply a comma-separated list
-of factor variable names. The default names are `FAC1`, `FAC2`, and so
-on.
-
-The `SNAMES` subcommand can supply a comma-separated list of split
-variable names. The default names are `SPL1`, `SPL2`, and so on.
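-
-As a sketch (file and variable names are hypothetical), the following
-writes a vector of means and a covariance matrix for two continuous
-variables to one matrix file. `OUTFILE` and `VARIABLES` appear only on
-the first `MSAVE`; the second inherits them:
-
-```
-MATRIX.
-MSAVE {1, 2} /TYPE=MEAN /OUTFILE='moments.sav' /VARIABLES=a, b.
-MSAVE {3, 0; 0, 4} /TYPE=COV.
-END MATRIX.
-```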
-
-## `DISPLAY` Command
-
-```
-DISPLAY [{DICTIONARY | STATUS}].
-```
-The `DISPLAY` command makes PSPP display a table with the name and
-dimensions of each matrix variable. The `DICTIONARY` and `STATUS`
-keywords are accepted but have no effect.
-
-## `RELEASE` Command
-
-```
-RELEASE variable….
-```
-The `RELEASE` command accepts a comma-separated list of matrix
-variable names. It deletes each variable and releases the memory
-associated with it.
-
-The `END MATRIX` command releases all matrix variables.
+++ /dev/null
-# MCONVERT
-
-```
-MCONVERT
- [[MATRIX=]
- [IN({'*'|'FILE'})]
- [OUT({'*'|'FILE'})]
- [/{REPLACE,APPEND}].
-```
-
-The `MCONVERT` command converts matrix data from a correlation matrix
-and a vector of standard deviations into a covariance matrix, or vice
-versa.
-
-By default, `MCONVERT` both reads and writes the active file. Use
-the `MATRIX` subcommand to specify other files. To read a matrix file,
-specify its name inside parentheses following `IN`. To write a matrix
-file, specify its name inside parentheses following `OUT`. Use `*` to
-explicitly specify the active file for input or output.
-
-When `MCONVERT` reads the input, by default it substitutes a
-correlation matrix and a vector of standard deviations each time it
-encounters a covariance matrix, and vice versa. Specify `/APPEND` to
-instead have `MCONVERT` add the other form of data without removing the
-existing data. Use `/REPLACE` to explicitly request removing the
-existing data.
-
-The `MCONVERT` command requires its input to be a matrix file. Use
-[`MATRIX DATA`](matrix-data.md) to convert text input into matrix file
-format.
-
--- /dev/null
+# MEANS
+
+```
+MEANS [TABLES =]
+ {VAR_LIST}
+ [ BY {VAR_LIST} [BY {VAR_LIST} [BY {VAR_LIST} ... ]]]
+
+ [ /{VAR_LIST}
+ [ BY {VAR_LIST} [BY {VAR_LIST} [BY {VAR_LIST} ... ]]] ]
+
+ [/CELLS = [MEAN] [COUNT] [STDDEV] [SEMEAN] [SUM] [MIN] [MAX] [RANGE]
+ [VARIANCE] [KURT] [SEKURT]
+ [SKEW] [SESKEW] [FIRST] [LAST]
+ [HARMONIC] [GEOMETRIC]
+ [DEFAULT]
+ [ALL]
+ [NONE] ]
+
+ [/MISSING = [INCLUDE] [DEPENDENT]]
+```
+
+You can use the `MEANS` command to calculate the arithmetic mean and
+similar statistics, either for the dataset as a whole or for categories
+of data.
+
+The simplest form of the command is
+```
+MEANS V.
+```
+which calculates the mean, count and standard deviation for V. If you
+specify a grouping variable, for example
+```
+MEANS V BY G.
+```
+then the mean, count and standard deviation of V are calculated for
+each group defined by G. Instead of the mean, count and standard
+deviation, you could specify the statistics in which you are interested:
+```
+MEANS X Y BY G
+ /CELLS = HARMONIC SUM MIN.
+```
+This example calculates the harmonic mean, the sum and the minimum
+values of X and Y grouped by G.
+
+The `CELLS` subcommand specifies which statistics to calculate. The
+available statistics are:
+- `MEAN`: The arithmetic mean.
+- `COUNT`: The count of the values.
+- `STDDEV`: The standard deviation.
+- `SEMEAN`: The standard error of the mean.
+- `SUM`: The sum of the values.
+- `MIN`: The minimum value.
+- `MAX`: The maximum value.
+- `RANGE`: The difference between the maximum and minimum values.
+- `VARIANCE`: The variance.
+- `FIRST`: The first value in the category.
+- `LAST`: The last value in the category.
+- `SKEW`: The skewness.
+- `SESKEW`: The standard error of the skewness.
+- `KURT`: The kurtosis.
+- `SEKURT`: The standard error of the kurtosis.
+- `HARMONIC`: The harmonic mean.
+- `GEOMETRIC`: The geometric mean.
+
+In addition, three special keywords are recognized:
+- `DEFAULT`: This is the same as `MEAN COUNT STDDEV`.
+- `ALL`: All of the above statistics are calculated.
+- `NONE`: No statistics are calculated (only a summary is shown).
+
+More than one "table" can be specified in a single command. Each
+table is separated by a `/`. For example
+
+```
+ MEANS TABLES =
+ c d e BY x
+ /a b BY x y
+ /f BY y BY z.
+```
+
+has three tables (the `TABLES =` is optional). The first table has
+three dependent variables `c`, `d`, and `e` and a single categorical
+variable `x`. The second table has two dependent variables `a` and
+`b`, and two categorical variables `x` and `y`. The third table has a
+single dependent variable `f` and a categorical variable formed by the
+combination of `y` and `z`.
+
+By default values are omitted from the analysis only if missing
+values (either system missing or user missing) for any of the variables
+directly involved in their calculation are encountered. This behaviour
+can be modified with the `/MISSING` subcommand. Two options are
+possible: `INCLUDE` and `DEPENDENT`.
+
+`/MISSING = INCLUDE` says that user missing values, either in the
+dependent variables or in the categorical variables should be taken at
+their face value, and not excluded.
+
+`/MISSING = DEPENDENT` says that user missing values, in the
+dependent variables should be taken at their face value, however cases
+which have user missing values for the categorical variables should be
+omitted from the calculation.
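+
+For example, a minimal sketch (variable names hypothetical) that keeps
+user-missing values of both the dependent and categorical variables in
+the analysis:
+
+```
+MEANS score BY group
+  /MISSING = INCLUDE.
+```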
+
+## Example
+
+The dataset in `repairs.sav` contains the mean time between failures
+(mtbf) for a sample of artifacts produced by different factories and
+trialed under different operating conditions. Since there are four
+combinations of categorical variables, by simply looking at the list
+of data, it would be hard to see how the scores vary for each category.
+The syntax below shows one way of tabulating the mtbf in a way which
+is easier to understand.
+
+```
+get file='repairs.sav'.
+
+means tables = mtbf
+ by factory by environment.
+```
+
+The results are shown below. The figures shown indicate the mean,
+standard deviation and number of samples in each category. These
+figures however do not indicate whether the results are statistically
+significant. For that, you would need to use the procedures `ONEWAY`,
+`GLM` or `T-TEST` depending on the hypothesis being tested.
+
+```
+ Case Processing Summary
+┌────────────────────────────┬───────────────────────────────┐
+│ │ Cases │
+│ ├──────────┬─────────┬──────────┤
+│ │ Included │ Excluded│ Total │
+│ ├──┬───────┼─┬───────┼──┬───────┤
+│ │ N│Percent│N│Percent│ N│Percent│
+├────────────────────────────┼──┼───────┼─┼───────┼──┼───────┤
+│mtbf * factory * environment│30│ 100.0%│0│ .0%│30│ 100.0%│
+└────────────────────────────┴──┴───────┴─┴───────┴──┴───────┘
+
+ Report
+┌────────────────────────────────────────────┬─────┬──┬──────────────┐
+│Manufacturing facility Operating Environment│ Mean│ N│Std. Deviation│
+├────────────────────────────────────────────┼─────┼──┼──────────────┤
+│0 Temperate │ 7.26│ 9│ 2.57│
+│ Tropical │ 7.47│ 7│ 2.68│
+│ Total │ 7.35│16│ 2.53│
+├────────────────────────────────────────────┼─────┼──┼──────────────┤
+│1 Temperate │13.38│ 6│ 7.77│
+│ Tropical │ 8.20│ 8│ 8.39│
+│ Total │10.42│14│ 8.26│
+├────────────────────────────────────────────┼─────┼──┼──────────────┤
+│Total Temperate │ 9.71│15│ 5.91│
+│ Tropical │ 7.86│15│ 6.20│
+│ Total │ 8.78│30│ 6.03│
+└────────────────────────────────────────────┴─────┴──┴──────────────┘
+```
+
+PSPP does not limit the number of variables for which you can
+calculate statistics, nor the number of categorical variables per layer,
+nor the number of layers. However, running `MEANS` on a large number
+of variables, or with categorical variables containing a large number
+of distinct values, may result in an extremely large output, which
+will not be easy to interpret. So you should consider carefully which
+variables to select for participation in the analysis.
+
--- /dev/null
+# MISSING VALUES
+
+In many situations, the data available for analysis is incomplete, so
+that a placeholder must be used to indicate that the value is unknown.
+One way that missing values are represented, for numeric data, is the
+["system-missing value"](../language/basics/missing-values.html).
+Another, more flexible way is through "user-missing values" which are
+determined on a per variable basis.
+
+The `MISSING VALUES` command sets user-missing values for variables.
+
+```
+MISSING VALUES VAR_LIST (MISSING_VALUES).
+
+where MISSING_VALUES takes one of the following forms:
+ NUM1
+ NUM1, NUM2
+ NUM1, NUM2, NUM3
+ NUM1 THRU NUM2
+ NUM1 THRU NUM2, NUM3
+ STRING1
+ STRING1, STRING2
+ STRING1, STRING2, STRING3
+As part of a range, `LO` or `LOWEST` may take the place of NUM1;
+`HI` or `HIGHEST` may take the place of NUM2.
+```
+
+`MISSING VALUES` sets user-missing values for numeric and string
+variables. Long string variables may have missing values, but
+characters after the first 8 bytes of the missing value must be
+spaces.
+
+Specify a list of variables, followed by a list of their user-missing
+values in parentheses. Up to three discrete values may be given, or,
+for numeric variables only, a range of values optionally accompanied
+by a single discrete value. Ranges may be open-ended on one end,
+indicated through the use of the keyword `LO` or `LOWEST` or `HI` or
+`HIGHEST`.
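+
+For example (variable names hypothetical), the following marks 9 and
+99 as user-missing for `x`, and any value from the lowest up through 0
+for `y`:
+
+```
+MISSING VALUES x (9, 99).
+MISSING VALUES y (LOWEST THRU 0).
+```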
+
+The `MISSING VALUES` command takes effect immediately. It is not
+affected by conditional and looping constructs such as `DO IF` or
+`LOOP`.
+
--- /dev/null
+# MRSETS
+
+`MRSETS` creates, modifies, deletes, and displays multiple response
+sets. A multiple response set is a set of variables that represent
+multiple responses to a survey question.
+
+Multiple responses are represented in one of the two following ways:
+
+- A "multiple dichotomy set" is analogous to a survey question with a
+ set of checkboxes. Each variable in the set is treated in a Boolean
+ fashion: one value (the "counted value") means that the box was
+ checked, and any other value means that it was not.
+
+- A "multiple category set" represents a survey question where the
+ respondent is instructed to list up to N choices. Each variable
+ represents one of the responses.
+
+```
+MRSETS
+ /MDGROUP NAME=NAME VARIABLES=VAR_LIST VALUE=VALUE
+ [CATEGORYLABELS={VARLABELS,COUNTEDVALUES}]
+ [{LABEL='LABEL',LABELSOURCE=VARLABEL}]
+
+ /MCGROUP NAME=NAME VARIABLES=VAR_LIST [LABEL='LABEL']
+
+ /DELETE NAME={[NAMES],ALL}
+
+ /DISPLAY NAME={[NAMES],ALL}
+```
+
+Any number of subcommands may be specified in any order.
+
+The `MDGROUP` subcommand creates a new multiple dichotomy set or
+replaces an existing multiple response set. The `NAME`, `VARIABLES`,
+and `VALUE` specifications are required. The others are optional:
+
+- `NAME` specifies the name used in syntax for the new multiple
+ dichotomy set. The name must begin with `$`; it must otherwise
+  follow the rules for [identifiers](../language/basics/tokens.md).
+
+- `VARIABLES` specifies the variables that belong to the set. At
+ least two variables must be specified. The variables must be all
+ string or all numeric.
+
+- `VALUE` specifies the counted value. If the variables are numeric,
+ the value must be an integer. If the variables are strings, then
+ the value must be a string that is no longer than the shortest of
+ the variables in the set (ignoring trailing spaces).
+
+- `CATEGORYLABELS` optionally specifies the source of the labels for
+ each category in the set:
+
+  - `VARLABELS`, the default, uses variable labels or, for
+ variables without variable labels, variable names. PSPP warns
+ if two variables have the same variable label, since these
+ categories cannot be distinguished in output.
+
+  - `COUNTEDVALUES` instead uses each variable's value label for
+ the counted value. PSPP warns if two variables have the same
+ value label for the counted value or if one of the variables
+ lacks a value label, since such categories cannot be
+ distinguished in output.
+
+- `LABEL` optionally specifies a label for the multiple response set.
+ If neither `LABEL` nor `LABELSOURCE=VARLABEL` is specified, the set
+ is unlabeled.
+
+- `LABELSOURCE=VARLABEL` draws the multiple response set's label from
+ the first variable label among the variables in the set; if none of
+ the variables has a label, the name of the first variable is used.
+ `LABELSOURCE=VARLABEL` must be used with
+ `CATEGORYLABELS=COUNTEDVALUES`. It is mutually exclusive with
+ `LABEL`.
+
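+For example, a minimal sketch (set and variable names hypothetical)
+that defines a multiple dichotomy set over three checkbox variables,
+with 1 as the counted value:
+
+```
+MRSETS /MDGROUP NAME=$boxes VARIABLES=box1 box2 box3 VALUE=1
+       LABEL='Boxes checked'.
+```
+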
+The `MCGROUP` subcommand creates a new multiple category set or
+replaces an existing multiple response set. The `NAME` and
+`VARIABLES` specifications are required, and `LABEL` is optional.
+Their meanings are as described above in `MDGROUP`. PSPP warns if two
+variables in the set have different value labels for a single value,
+since each of the variables in the set should have the same possible
+categories.
+
+The `DELETE` subcommand deletes multiple response groups. A list of
+groups may be named within a set of required square brackets, or `ALL`
+may be used to delete all groups.
+
+The `DISPLAY` subcommand displays information about defined multiple
+response sets. Its syntax is the same as the `DELETE` subcommand.
+
+Multiple response sets are saved to and read from system files by,
+e.g., the `SAVE` and `GET` commands. Otherwise, multiple response sets
+are currently used only by third party software.
+
--- /dev/null
+# N OF CASES
+
+```
+N [OF CASES] NUM_OF_CASES [ESTIMATED].
+```
+
+`N OF CASES` limits the number of cases processed by any procedures
+that follow it in the command stream. `N OF CASES 100`, for example,
+tells PSPP to disregard all cases after the first 100.
+
+When `N OF CASES` is specified after [`TEMPORARY`](temporary.md), it
+affects only the next procedure. Otherwise, cases beyond the limit
+specified are not processed by any later procedure.
+
+If the limit specified on `N OF CASES` is greater than the number of
+cases in the active dataset, it has no effect.
+
+When `N OF CASES` is used along with `SAMPLE` or `SELECT IF`, the
+case limit is applied to the cases obtained after sampling or case
+selection, regardless of how `N OF CASES` is placed relative to `SAMPLE`
+or `SELECT IF` in the command file. Thus, the commands `N OF CASES 100`
+and `SAMPLE .5`, in either order, first randomly sample approximately
+half of the active dataset's cases, then select the first 100 of those
+sampled.
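+
+To illustrate, in the following sketch (variable name hypothetical)
+the limit applies to the cases remaining after selection, even though
+`N OF CASES` appears first:
+
+```
+N OF CASES 100.
+SELECT IF (age >= 18).
+LIST.
+```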
+
+`N OF CASES` with the `ESTIMATED` keyword gives an estimated number of
+cases before `DATA LIST` or another command to read in data.
+`ESTIMATED` never limits the number of cases processed by procedures.
+PSPP currently does not use case count estimates.
+
--- /dev/null
+# NEW FILE
+
+```
+NEW FILE.
+```
+
+The `NEW FILE` command clears the dictionary and data from the current
+active dataset.
+
--- /dev/null
+# NPAR TESTS
+
+```
+NPAR TESTS
+ nonparametric test subcommands
+ .
+ .
+ .
+
+ [ /STATISTICS={DESCRIPTIVES} ]
+
+ [ /MISSING={ANALYSIS, LISTWISE} {INCLUDE, EXCLUDE} ]
+
+ [ /METHOD=EXACT [ TIMER [(N)] ] ]
+```
+
+`NPAR TESTS` performs nonparametric tests. Nonparametric tests make
+very few assumptions about the distribution of the data. One or more
+tests may be specified by using the corresponding subcommand. If the
+`/STATISTICS` subcommand is also specified, then summary statistics
+are produced for each variable that is the subject of any test.
+
+Certain tests may take a long time to execute, if an exact figure is
+required. Therefore, by default asymptotic approximations are used
+unless the subcommand `/METHOD=EXACT` is specified. Exact tests give
+more accurate results, but may take an unacceptably long time to
+perform. If the `TIMER` keyword is used, it sets a maximum time,
+after which the test is abandoned, and a warning message printed. The
+time, in minutes, should be specified in parentheses after the `TIMER`
+keyword. If the `TIMER` keyword is given without this figure, then a
+default value of 5 minutes is used.
+
+<!-- toc -->
+
+## Binomial test
+
+```
+ [ /BINOMIAL[(P)]=VAR_LIST[(VALUE1[, VALUE2])] ]
+```
+
+The `/BINOMIAL` subcommand compares the observed distribution of a
+dichotomous variable with that of a binomial distribution. The
+parameter `P` specifies the test proportion of the binomial
+distribution. The default value of 0.5 is assumed if `P` is omitted.
+
+If a single value appears after the variable list, then that value is
+used as the threshold to partition the observed values. Values less
+than or equal to the threshold value form the first category. Values
+greater than the threshold form the second category.
+
+If two values appear after the variable list, then they are used as
+the values which a variable must take to be in the respective category.
+Cases for which a variable takes a value equal to neither of the
+specified values take no part in the test for that variable.
+
+If no values appear, then the variable must assume dichotomous
+values. If more than two distinct, non-missing values for a variable
+under test are encountered then an error occurs.
+
+If the test proportion is equal to 0.5, then a two-tailed test is
+reported. For any other test proportion, a one-tailed test is reported.
+For one-tailed tests, if the test proportion is less than or equal to
+the observed proportion, then the significance of observing the observed
+proportion or more is reported. If the test proportion is more than the
+observed proportion, then the significance of observing the observed
+proportion or less is reported. That is to say, the test is always
+performed in the observed direction.
+
+PSPP uses a very precise approximation to the gamma function to
+compute the binomial significance. Thus, exact results are reported
+even for very large sample sizes.
+
+## Chi-square Test
+
+```
+ [ /CHISQUARE=VAR_LIST[(LO,HI)] [/EXPECTED={EQUAL|F1, F2 ... FN}] ]
+```
+
+The `/CHISQUARE` subcommand produces a chi-square statistic for the
+differences between the expected and observed frequencies of the
+categories of a variable. Optionally, a range of values may appear
+after the variable list. If a range is given, then non-integer values
+are truncated, and values outside the specified range are excluded
+from the analysis.
+
+The `/EXPECTED` subcommand specifies the expected values of each
+category. There must be exactly one non-zero expected value for each
+observed category, or the `EQUAL` keyword must be specified. You may
+use the notation `N*F` to specify N consecutive expected categories all
+taking a frequency of F. The frequencies given are proportions, not
+absolute frequencies. The sum of the frequencies need not be 1. If no
+`/EXPECTED` subcommand is given, then equal frequencies are expected.
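+
+For example, the following sketch (variable name hypothetical) tests a
+four-category variable against expected proportions 3:1:1:1, using the
+`N*F` notation for the last three categories:
+
+```
+NPAR TESTS
+  /CHISQUARE = grade (1, 4)
+  /EXPECTED = 3, 3*1.
+```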
+
+### Chi-square Example
+
+A researcher wishes to investigate whether there are an equal number of
+persons of each sex in a population. The sample chosen for investigation
+is that from the `physiology.sav` dataset. The null hypothesis for the
+test is that the population comprises an equal number of males and
+females. The analysis is performed as shown below:
+
+```
+get file='physiology.sav'.
+
+npar test
+ /chisquare=sex.
+```
+
+
+There is only one test variable: sex. The other variables in
+the dataset are ignored.
+
+In the output, shown below, the summary box shows that in the sample,
+there are more males than females. However, the significance of the
+chi-square result is greater than 0.05—the most commonly accepted
+p-value—and therefore there is not enough evidence to reject the null
+hypothesis and one must conclude that the evidence does not indicate
+that there is an imbalance of the sexes in the population.
+
+```
+ Sex of subject
+┌──────┬──────────┬──────────┬────────┐
+│Value │Observed N│Expected N│Residual│
+├──────┼──────────┼──────────┼────────┤
+│Male │ 22│ 20.00│ 2.00│
+│Female│ 18│ 20.00│ ─2.00│
+│Total │ 40│ │ │
+└──────┴──────────┴──────────┴────────┘
+
+ Test Statistics
+┌──────────────┬──────────┬──┬───────────┐
+│ │Chi─square│df│Asymp. Sig.│
+├──────────────┼──────────┼──┼───────────┤
+│Sex of subject│ .40│ 1│ .527│
+└──────────────┴──────────┴──┴───────────┘
+```
+
+## Cochran Q Test
+
+```
+ [ /COCHRAN = VAR_LIST ]
+```
+
+The Cochran Q test is used to test for differences between three or
+more groups. The data for `VAR_LIST` in all cases must assume exactly
+two distinct values (other than missing values).
+
+The value of Q is displayed along with its asymptotic significance
+based on a chi-square distribution.
+
+## Friedman Test
+
+```
+ [ /FRIEDMAN = VAR_LIST ]
+```
+
+The Friedman test is used to test for differences between repeated
+measures when there is no indication that the distributions are normally
+distributed.
+
+A list of variables which contain the measured data must be given.
+The procedure prints the sum of ranks for each variable, the test
+statistic and its significance.
+
+## Kendall's W Test
+
+```
+ [ /KENDALL = VAR_LIST ]
+```
+
+The Kendall test investigates whether an arbitrary number of related
+samples come from the same population. It is identical to the
+Friedman test except that the additional statistic W, Kendall's
+Coefficient of Concordance, is printed. It has the range \[0,1\]—a value
+of zero indicates no agreement between the samples whereas a value of
+unity indicates complete agreement.
+
+## Kolmogorov-Smirnov Test
+
+```
+ [ /KOLMOGOROV-SMIRNOV ({NORMAL [MU, SIGMA], UNIFORM [MIN, MAX], POISSON [LAMBDA], EXPONENTIAL [SCALE] }) = VAR_LIST ]
+```
+
+The one sample Kolmogorov-Smirnov subcommand is used to test whether
+or not a dataset is drawn from a particular distribution. Four
+distributions are supported: normal, uniform, Poisson and
+exponential.
+
+Ideally you should provide the parameters of the distribution against
+which you wish to test the data. For example, with the normal
+distribution the mean (`MU`) and standard deviation (`SIGMA`) should
+be given; with the uniform distribution, the minimum (`MIN`) and
+maximum (`MAX`) value should be provided. However, if the parameters
+are omitted, they are imputed from the data. Imputing the parameters
+reduces the power of the test, so it should be avoided if possible.
+
+In the following example, two variables `score` and `age` are tested to
+see if they follow a normal distribution with a mean of 3.5 and a
+standard deviation of 2.0.
+```
+ NPAR TESTS
+ /KOLMOGOROV-SMIRNOV (NORMAL 3.5 2.0) = score age.
+```
+If the variables need to be tested against different distributions,
+then a separate subcommand must be used. For example the following
+syntax tests `score` against a normal distribution with mean of 3.5 and
+standard deviation of 2.0 whilst `age` is tested against a normal
+distribution of mean 40 and standard deviation 1.5.
+```
+ NPAR TESTS
+ /KOLMOGOROV-SMIRNOV (NORMAL 3.5 2.0) = score
+ /KOLMOGOROV-SMIRNOV (NORMAL 40 1.5) = age.
+```
+
+The abbreviated subcommand `K-S` may be used in place of
+`KOLMOGOROV-SMIRNOV`.
+
+## Kruskal-Wallis Test
+
+```
+ [ /KRUSKAL-WALLIS = VAR_LIST BY VAR (LOWER, UPPER) ]
+```
+
+The Kruskal-Wallis test is used to compare data from an arbitrary
+number of populations. It does not assume normality. The data to be
+compared are specified by `VAR_LIST`. The categorical variable
+determining the groups to which the data belongs is given by `VAR`.
+The limits `LOWER` and `UPPER` specify the valid range of `VAR`. If
+`UPPER` is smaller than `LOWER`, then PSPP assumes their values are
+reversed. Any cases for which `VAR` falls outside `[LOWER, UPPER]`
+are ignored.
+
+The mean rank of each group as well as the chi-squared value and
+significance of the test are printed. The abbreviated subcommand `K-W`
+may be used in place of `KRUSKAL-WALLIS`.
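+
+For example (variable names hypothetical), to compare `score` across
+groups 1 through 3 of `region`:
+
+```
+NPAR TESTS
+  /K-W = score BY region (1, 3).
+```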
+
+## Mann-Whitney U Test
+
+```
+ [ /MANN-WHITNEY = VAR_LIST BY var (GROUP1, GROUP2) ]
+```
+
+The Mann-Whitney subcommand is used to test whether two groups of
+data come from different populations. The variables to be tested should
+be specified in `VAR_LIST` and the grouping variable, which determines
+to which group the test variables belong, in `VAR`. `VAR` may be either
+a numeric or a string variable. `GROUP1` and `GROUP2` specify the two
+values
+of VAR which determine the groups of the test data. Cases for which the
+`VAR` value is neither `GROUP1` or `GROUP2` are ignored.
+
+The value of the Mann-Whitney U statistic, the Wilcoxon W, and the
+significance are printed. You may abbreviate the subcommand
+`MANN-WHITNEY` to `M-W`.
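+
+For example (variable names hypothetical), to test whether `height`
+and `weight` differ between the two groups coded 1 and 2 in `sex`:
+
+```
+NPAR TESTS
+  /M-W = height weight BY sex (1, 2).
+```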
+
+
+## McNemar Test
+
+```
+ [ /MCNEMAR VAR_LIST [ WITH VAR_LIST [ (PAIRED) ]]]
+```
+
+Use McNemar's test to analyse the significance of the difference
+between pairs of correlated proportions.
+
+If the `WITH` keyword is omitted, then tests for all combinations of
+the listed variables are performed. If the `WITH` keyword is given, and
+the `(PAIRED)` keyword is also given, then the number of variables
+preceding `WITH` must be the same as the number following it. In this
+case, tests for each respective pair of variables are performed. If the
+`WITH` keyword is given, but the `(PAIRED)` keyword is omitted, then
+tests for each combination of variable preceding `WITH` against variable
+following `WITH` are performed.
+
+The data in each variable must be dichotomous. If there are more
+than two distinct values, an error occurs and the test is not
+run.
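+
+For example, this sketch (variable names hypothetical) tests each
+respective pair, `before1` with `after1` and `before2` with `after2`:
+
+```
+NPAR TESTS
+  /MCNEMAR before1 before2 WITH after1 after2 (PAIRED).
+```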
+
+## Median Test
+
+```
+ [ /MEDIAN [(VALUE)] = VAR_LIST BY VARIABLE (VALUE1, VALUE2) ]
+```
+
+The median test is used to test whether independent samples come from
+populations with a common median. The median of the populations against
+which the samples are to be tested may be given in parentheses
+immediately after the `/MEDIAN` subcommand. If it is not given, the
+median is imputed from the union of all the samples.
+
+The variables of the samples to be tested should immediately follow
+the `=` sign. The keyword `BY` must come next, and then the grouping
+variable. Two values in parentheses should follow. If the first
+value is greater than the second, then a 2-sample test is performed
+using these two values to determine the groups. If, however, the first
+value is less than the second, then a k-sample test is conducted,
+and the group values used are all values encountered which lie in the
+range `[VALUE1,VALUE2]`.
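+
+For example (variable names hypothetical), the following performs a
+2-sample test, since the first value in parentheses is greater than
+the second:
+
+```
+NPAR TESTS
+  /MEDIAN = score BY g (2, 1).
+```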
+
+## Runs Test
+
+```
+ [ /RUNS ({MEAN, MEDIAN, MODE, VALUE}) = VAR_LIST ]
+```
+
+The `/RUNS` subcommand tests whether a data sequence is randomly
+ordered.
+
+It works by examining the number of times a variable's value crosses
+a given threshold. The desired threshold must be specified within
+parentheses. It may either be specified as a number or as one of
+`MEAN`, `MEDIAN` or `MODE`. Following the threshold specification comes
+the list of variables whose values are to be tested.
+
+The subcommand shows the number of runs and the asymptotic
+significance based on the length of the data.
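+
+For example (variable name hypothetical), to test whether `response`
+is randomly ordered about its median:
+
+```
+NPAR TESTS
+  /RUNS (MEDIAN) = response.
+```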
+
+## Sign Test
+
+```
+ [ /SIGN VAR_LIST [ WITH VAR_LIST [ (PAIRED) ]]]
+```
+
+The `/SIGN` subcommand tests for differences between medians of the
+variables listed. The test does not make any assumptions about the
+distribution of the data.
+
+If the `WITH` keyword is omitted, then tests for all combinations of
+the listed variables are performed. If the `WITH` keyword is given, and
+the `(PAIRED)` keyword is also given, then the number of variables
+preceding `WITH` must be the same as the number following it. In this
+case, tests for each respective pair of variables are performed. If the
+`WITH` keyword is given, but the `(PAIRED)` keyword is omitted, then
+tests for each combination of variable preceding `WITH` against variable
+following `WITH` are performed.
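+
+For instance, with hypothetical variables `score_before` and
+`score_after`, this sketch performs a single paired sign test:
+
+```
+NPAR TESTS
+        /SIGN = score_before WITH score_after (PAIRED).
+```
+
+Omitting `WITH` entirely would instead test all combinations of the
+listed variables.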
+
+## Wilcoxon Matched Pairs Signed Ranks Test
+
+```
+ [ /WILCOXON VAR_LIST [ WITH VAR_LIST [ (PAIRED) ]]]
+```
+
+The `/WILCOXON` subcommand tests for differences between medians of
+the variables listed. The test does not make any assumptions about the
+variances of the samples. It does however assume that the distribution
+is symmetrical.
+
+If the `WITH` keyword is omitted, then tests for all combinations of
+the listed variables are performed. If the `WITH` keyword is given, and
+the `(PAIRED)` keyword is also given, then the number of variables
+preceding `WITH` must be the same as the number following it. In this
+case, tests for each respective pair of variables are performed. If the
+`WITH` keyword is given, but the `(PAIRED)` keyword is omitted, then
+tests for each combination of variable preceding `WITH` against variable
+following `WITH` are performed.
+
--- /dev/null
+# NUMERIC
+
+`NUMERIC` explicitly declares new numeric variables, optionally setting
+their output formats.
+
+```
+NUMERIC VAR_LIST [(FMT_SPEC)] [/VAR_LIST [(FMT_SPEC)]]...
+```
+
+ Specify the names of the new numeric variables as `VAR_LIST`. If
+you wish to set the variables' output formats, follow their names by
+an [output format](../language/datasets/formats/index.md) in
+parentheses; otherwise, the default is `F8.2`.
+
+ Variables created with `NUMERIC` are initialized to the
+system-missing value.
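+
+For example (variable names hypothetical), the following declares
+`total` with the default `F8.2` format, and `avg` and `rate` with an
+explicit `F7.3` format; all three are initialized to system-missing:
+
+```
+NUMERIC total /avg rate (F7.3).
+```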
+
--- /dev/null
+# ONEWAY
+
+```
+ONEWAY
+ [/VARIABLES = ] VAR_LIST BY VAR
+ /MISSING={ANALYSIS,LISTWISE} {EXCLUDE,INCLUDE}
+ /CONTRAST= VALUE1 [, VALUE2] ... [,VALUEN]
+ /STATISTICS={DESCRIPTIVES,HOMOGENEITY}
+ /POSTHOC={BONFERRONI, GH, LSD, SCHEFFE, SIDAK, TUKEY, ALPHA ([VALUE])}
+```
+
+The `ONEWAY` procedure performs a one-way analysis of variance of
+variables factored by a single independent variable. It is used to
+compare the means of a population divided into more than two groups.
+
+The dependent variables to be analysed should be given in the
+`VARIABLES` subcommand. The list of variables must be followed by the
+`BY` keyword and the name of the independent (or factor) variable.
+
+You can use the `STATISTICS` subcommand to tell PSPP to display
+ancillary information. The options accepted are:
+- `DESCRIPTIVES`: Displays descriptive statistics about the groups
+factored by the independent variable.
+- `HOMOGENEITY`: Displays the Levene test of Homogeneity of Variance for
+the variables and their groups.
+
+The `CONTRAST` subcommand is used when you anticipate certain
+differences between the groups. The subcommand must be followed by a
+list of numbers which are the coefficients of the groups to be tested.
+The number of coefficients must correspond to the number of distinct
+groups (or values of the independent variable). If the total sum of the
+coefficients is not zero, then PSPP will display a warning, but will
+proceed with the analysis. The `CONTRAST` subcommand may be given up to
+10 times in order to specify different contrast tests.
+
+The `MISSING` subcommand defines how missing values are handled. If
+`LISTWISE` is specified then cases which have missing values for the
+independent variable or any dependent variable are ignored. If
+`ANALYSIS` is specified, then cases are ignored if the independent
+variable is missing or if the dependent variable currently being
+analysed is missing. The default is `ANALYSIS`. A setting of `EXCLUDE`
+means that variables whose values are user-missing are to be excluded
+from the analysis. A setting of `INCLUDE` means they are to be
+included. The default is `EXCLUDE`.
+
+Using the `POSTHOC` subcommand you can perform multiple pairwise
+comparisons on the data. The following comparison methods are
+available:
+- `LSD`: Least Significant Difference.
+- `TUKEY`: Tukey Honestly Significant Difference.
+- `BONFERRONI`: Bonferroni test.
+- `SCHEFFE`: Scheffé's test.
+- `SIDAK`: Sidak test.
+- `GH`: The Games-Howell test.
+
+Use the optional syntax `ALPHA(VALUE)` to indicate that `ONEWAY` should
+perform the posthoc tests at a confidence level of VALUE. If
+`ALPHA(VALUE)` is not specified, then the confidence level used is 0.05.
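+
+As a sketch (variable names hypothetical), the following compares the
+mean of `score` across the groups defined by `region`, requests
+descriptive statistics and the Levene test, tests a linear contrast
+across three groups, and performs Tukey posthoc comparisons at the
+0.01 level:
+
+```
+ONEWAY /VARIABLES = score BY region
+        /STATISTICS = DESCRIPTIVES HOMOGENEITY
+        /CONTRAST = -1 0 1
+        /POSTHOC = TUKEY ALPHA(0.01).
+```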
+
--- /dev/null
+# OUTPUT
+
+In the syntax below, the characters `[` and `]` are literals: they
+must appear in the syntax exactly as shown, rather than indicating
+optional elements:
+
+```
+OUTPUT MODIFY
+ /SELECT TABLES
+ /TABLECELLS SELECT = [ CLASS... ]
+ FORMAT = FMT_SPEC.
+```
+
+`OUTPUT` changes the appearance of the tables in which results are
+printed. In particular, it can be used to set the format and precision
+to which results are displayed.
+
+After running this command, the default table appearance parameters
+will have been modified and each new output table generated uses the new
+parameters.
+
+Following `/TABLECELLS SELECT =` a list of cell classes must appear,
+enclosed in square brackets. This list determines the classes of
+values that should be selected for modification. Each class can be:
+
+* `RESIDUAL`: Residual values. Default: `F40.2`.
+
+* `CORRELATION`: Correlations. Default: `F40.3`.
+
+* `PERCENT`: Percentages. Default: `PCT40.1`.
+
+* `SIGNIFICANCE`: Significance of tests (p-values). Default: `F40.3`.
+
+* `COUNT`: Counts or sums of weights. For a weighted data set, the
+ default is the weight variable's print format. For an unweighted
+ data set, the default is `F40.0`.
+
+For most other numeric values that appear in tables, [`SET
+FORMAT`](set.md#format) may be used to specify the format.
+
+`FMT_SPEC` must be a valid [output
+format](../language/datasets/formats/index.md). Not all possible
+formats are meaningful for all classes.
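+
+For example, to display significance values with five decimal places
+instead of the default three:
+
+```
+OUTPUT MODIFY
+       /SELECT TABLES
+       /TABLECELLS SELECT = [ SIGNIFICANCE ]
+                   FORMAT = F40.5.
+```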
+
--- /dev/null
+# PERMISSIONS
+
+```
+PERMISSIONS
+ FILE='FILE_NAME'
+ /PERMISSIONS = {READONLY,WRITEABLE}.
+```
+
+`PERMISSIONS` changes the permissions of a file. There is one
+mandatory subcommand which specifies the permissions to which the file
+should be changed. If you set a file's permission to `READONLY`, then
+the file will become unwritable either by you or anyone else on the
+system. If you set the permission to `WRITEABLE`, then the file
+becomes writeable by you; the permissions afforded to others are
+unchanged. This command cannot be used if the [`SAFER`](set.md#safer)
+setting is active.
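+
+For example, to protect a hypothetical data file `mydata.sav` from
+modification:
+
+```
+PERMISSIONS
+        FILE='mydata.sav'
+        /PERMISSIONS = READONLY.
+```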
+
--- /dev/null
+# PRESERVE…RESTORE
+
+```
+PRESERVE.
+...
+RESTORE.
+```
+
+`PRESERVE` saves all of the settings that [`SET`](set.md) can adjust.
+A later `RESTORE` command restores those settings.
+
+`PRESERVE` can be nested up to five levels deep.
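+
+A typical use is to change a setting temporarily, as in this sketch
+(the variable names `x` and `y` are hypothetical):
+
+```
+PRESERVE.
+SET FORMAT=F10.3.
+DESCRIPTIVES x y.
+RESTORE.
+```
+
+After `RESTORE`, the default format reverts to whatever was in effect
+when `PRESERVE` was executed.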
+
--- /dev/null
+# PRINT EJECT
+
+```
+PRINT EJECT
+ OUTFILE='FILE_NAME'
+ RECORDS=N_LINES
+ {NOTABLE,TABLE}
+ /[LINE_NO] ARG...
+
+ARG takes one of the following forms:
+ 'STRING' [START-END]
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+ VAR_LIST *
+```
+
+`PRINT EJECT` advances to the beginning of a new output page in the
+listing file or output file. It can also output data in the same way as
+`PRINT`.
+
+All `PRINT EJECT` subcommands are optional.
+
+Without `OUTFILE`, `PRINT EJECT` ejects the current page in the
+listing file, then it produces other output, if any is specified.
+
+With `OUTFILE`, `PRINT EJECT` writes its output to the specified
+file. The first line of output is written with `1` inserted in the
+first column. Commonly, this is the only line of output. If additional
+lines of output are specified, these additional lines are written with a
+space inserted in the first column, as with `PRINT`.
+
+See [PRINT](print.md) for more information on syntax and usage.
+
--- /dev/null
+# PRINT FORMATS
+
+```
+PRINT FORMATS VAR_LIST (FMT_SPEC) [VAR_LIST (FMT_SPEC)]....
+```
+
+`PRINT FORMATS` sets the print formats for the specified variables to
+the specified format specification.
+
+It has the same syntax as [`FORMATS`](formats.md), but `PRINT FORMATS`
+sets only print formats, not write formats.
+
--- /dev/null
+# PRINT SPACE
+
+```
+PRINT SPACE [OUTFILE='file_name'] [ENCODING='ENCODING'] [n_lines].
+```
+
+`PRINT SPACE` prints one or more blank lines to an output file.
+
+The `OUTFILE` subcommand is optional. It may be used to direct output
+to a file specified by file name as a string or [file
+handle](../language/files/file-handles.md). If `OUTFILE` is not
+handle](../language/files/file-handles.md). If `OUTFILE` is not
+specified then output is directed to the listing file.
+
+The `ENCODING` subcommand may only be used if `OUTFILE` is also used.
+It specifies the character encoding of the file. See
+[`INSERT`](insert.md), for information on supported
+encodings.
+
+`n_lines` is also optional. If present, it is an
+[expression](../language/expressions/index.md) for the number of
+blank lines to be printed. The expression must evaluate to a
+nonnegative value.
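+
+For example, the following sketch prints two blank lines to the
+listing file, then three blank lines to a hypothetical output file:
+
+```
+PRINT SPACE 2.
+PRINT SPACE OUTFILE='report.txt' 3.
+```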
+
--- /dev/null
+# PRINT
+
+```
+PRINT
+ [OUTFILE='FILE_NAME']
+ [RECORDS=N_LINES]
+ [{NOTABLE,TABLE}]
+ [ENCODING='ENCODING']
+ [/[LINE_NO] ARG...]
+
+ARG takes one of the following forms:
+ 'STRING' [START]
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+ VAR_LIST *
+```
+
+ The `PRINT` transformation writes variable data to the listing file
+or an output file. `PRINT` is executed when a procedure causes the
+data to be read. Follow `PRINT` by
+[`EXECUTE`](execute.md) to print variable data without
+invoking a procedure.
+
+ All `PRINT` subcommands are optional. If no strings or variables are
+specified, `PRINT` outputs a single blank line.
+
+ The `OUTFILE` subcommand specifies the file to receive the output.
+The file may be a file name as a string or a [file
+handle](../language/files/file-handles.md). If `OUTFILE` is not
+present then output is sent to PSPP's output listing file. When
+`OUTFILE` is present, the output is written to the file in a plain
+text format, with a space inserted at beginning of each output line,
+even lines that otherwise would be blank.
+
+ The `ENCODING` subcommand may only be used if the `OUTFILE`
+subcommand is also used. It specifies the character encoding of the
+file. See [INSERT](insert.md), for information on supported
+encodings.
+
+ The `RECORDS` subcommand specifies the number of lines to be output.
+The number of lines may optionally be surrounded by parentheses.
+
+ `TABLE` will cause the `PRINT` command to output a table to the
+listing file that describes what it will print to the output file.
+`NOTABLE`, the default, suppresses this output table.
+
+ Introduce the strings and variables to be printed with a slash (`/`).
+Optionally, the slash may be followed by a number indicating which
+output line it specifies. In the absence of this line number, the next
+line is used. Multiple lines may be specified using multiple slashes
+with the intended output for a line following its respective slash.
+
+ Literal strings may be printed. Specify the string itself.
+Optionally the string may be followed by a column number, specifying the
+column on the line where the string should start. Otherwise, the string
+is printed at the current position on the line.
+
+ Variables to be printed can be specified in the same ways as
+available for [`DATA LIST FIXED`](data-list.md#data-list-fixed). In addition,
+a variable list may be followed by an asterisk (`*`), which indicates
+that the variables should be printed in their dictionary print formats,
+separated by spaces. A variable list followed by a slash or the end of
+command is interpreted in the same way.
+
+ If a FORTRAN type specification is used to move backwards on the
+current line, then text is written at that point on the line and the
+line is truncated to that length, although additional text subsequently
+added will again extend the line.
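+
+The following sketch (file and variable names hypothetical) writes two
+lines per case: a literal label with `id` in fixed columns, then the
+remaining variables in their dictionary print formats:
+
+```
+PRINT OUTFILE='cases.txt' RECORDS=2 TABLE
+        /1 'Case:' 1 id 10-14
+        /2 score1 score2 *.
+EXECUTE.
+```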
+
--- /dev/null
+# QUICK CLUSTER
+
+```
+QUICK CLUSTER VAR_LIST
+ [/CRITERIA=CLUSTERS(K) [MXITER(MAX_ITER)] CONVERGE(EPSILON) [NOINITIAL]]
+ [/MISSING={EXCLUDE,INCLUDE} {LISTWISE, PAIRWISE}]
+ [/PRINT={INITIAL} {CLUSTER}]
+ [/SAVE[=[CLUSTER[(MEMBERSHIP_VAR)]] [DISTANCE[(DISTANCE_VAR)]]]]
+```
+
+The `QUICK CLUSTER` command performs k-means clustering on the
+dataset. This is useful when you wish to allocate cases into clusters
+of similar values and you already know the number of clusters.
+
+The minimum specification is `QUICK CLUSTER` followed by the names of
+the variables which contain the cluster data. Normally you will also
+want to specify `/CRITERIA=CLUSTERS(K)` where `K` is the number of
+clusters. If this is not specified, then `K` defaults to 2.
+
+If you use `/CRITERIA=NOINITIAL` then a naive algorithm to select the
+initial clusters is used. This will provide for faster execution but
+less well separated initial clusters and hence possibly an inferior
+final result.
+
+`QUICK CLUSTER` uses an iterative algorithm to select the cluster
+centers. The subcommand `/CRITERIA=MXITER(MAX_ITER)` sets the maximum
+number of iterations. During classification, PSPP will continue
+iterating until `MAX_ITER` iterations have been done or the
+convergence criterion (see below) is fulfilled. The default value of
+`MAX_ITER` is 2.
+
+If however, you specify `/CRITERIA=NOUPDATE` then after selecting the
+initial centers, no further update to the cluster centers is done. In
+this case, `MAX_ITER`, if specified, is ignored.
+
+The subcommand `/CRITERIA=CONVERGE(EPSILON)` is used to set the
+convergence criterion. The value of convergence criterion is
+`EPSILON` times the minimum distance between the _initial_ cluster
+centers. Iteration stops when the mean cluster distance between one
+iteration and the next is less than the convergence criterion. The
+default value of `EPSILON` is zero.
+
+The `MISSING` subcommand determines the handling of missing
+variables. If `INCLUDE` is set, then user-missing values are considered
+at their face value and not as missing values. If `EXCLUDE` is set,
+which is the default, user-missing values are excluded as well as
+system-missing values.
+
+If `LISTWISE` is set, then the entire case is excluded from the
+analysis whenever any of the clustering variables contains a missing
+value. If `PAIRWISE` is set, then a case is considered missing only if
+all the clustering variables contain missing values. Otherwise it is
+clustered on the basis of the non-missing values. The default is
+`LISTWISE`.
+
+The `PRINT` subcommand requests additional output to be printed. If
+`INITIAL` is set, then the initial cluster memberships will be printed.
+If `CLUSTER` is set, the cluster memberships of the individual cases are
+displayed (potentially generating lengthy output).
+
+You can specify the subcommand `SAVE` to ask that each case's cluster
+membership and the Euclidean distance between the case and its cluster
+center be saved to a new variable in the active dataset. To save the
+cluster membership use the `CLUSTER` keyword and to save the distance
+use the `DISTANCE` keyword. Each keyword may optionally be followed by
+a variable name in parentheses to specify the new variable which is to
+contain the saved parameter. If no variable name is specified, then
+PSPP will create one.
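+
+Putting these subcommands together (variable names hypothetical):
+
+```
+QUICK CLUSTER x1 x2 x3
+        /CRITERIA = CLUSTERS(3) MXITER(20) CONVERGE(0.01)
+        /MISSING = EXCLUDE LISTWISE
+        /PRINT = CLUSTER
+        /SAVE = CLUSTER(grp) DISTANCE(dist).
+```
+
+This clusters the cases into three groups, saving each case's group in
+`grp` and its distance from the cluster center in `dist`.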
+
--- /dev/null
+# RANK
+
+```
+RANK
+ [VARIABLES=] VAR_LIST [{A,D}] [BY VAR_LIST]
+ /TIES={MEAN,LOW,HIGH,CONDENSE}
+ /FRACTION={BLOM,TUKEY,VW,RANKIT}
+ /PRINT[={YES,NO}]
+ /MISSING={EXCLUDE,INCLUDE}
+
+ /RANK [INTO VAR_LIST]
+ /NTILES(k) [INTO VAR_LIST]
+ /NORMAL [INTO VAR_LIST]
+ /PERCENT [INTO VAR_LIST]
+ /RFRACTION [INTO VAR_LIST]
+ /PROPORTION [INTO VAR_LIST]
+ /N [INTO VAR_LIST]
+ /SAVAGE [INTO VAR_LIST]
+```
+
+The `RANK` command ranks variables and stores the results into new
+variables.
+
+The `VARIABLES` subcommand, which is mandatory, specifies one or more
+variables whose values are to be ranked. After each variable, `A` or
+`D` may appear, indicating that the variable is to be ranked in
+ascending or descending order. Ascending is the default. If a `BY`
+keyword appears, it should be followed by a list of variables which are
+to serve as group variables. In this case, the cases are gathered into
+groups, and ranks calculated for each group.
+
+The `TIES` subcommand specifies how tied values are to be treated.
+The default is to take the mean value of all the tied cases.
+
+The `FRACTION` subcommand specifies how proportional ranks are to be
+calculated. This only has any effect if the `NORMAL` or `PROPORTION`
+rank functions are requested.
+
+The `PRINT` subcommand may be used to specify that a summary of the
+rank variables created should appear in the output.
+
+The function subcommands are `RANK`, `NTILES`, `NORMAL`, `PERCENT`,
+`RFRACTION`, `PROPORTION`, and `SAVAGE`. Any number of function
+subcommands may appear. If none are given, then the default is `RANK`.
+The `NTILES` subcommand must take an integer specifying the number of
+partitions into which values should be ranked. Each subcommand may be
+followed by the `INTO` keyword and a list of variables which are the
+variables to be created and receive the rank scores. There may be as
+many variables specified as there are variables named on the
+`VARIABLES` subcommand. If fewer are specified, then the variable
+names are automatically created.
+
+The `MISSING` subcommand determines how user-missing values are to be
+treated. A setting of `EXCLUDE` means that variables whose values are
+user-missing are to be excluded from the rank scores. A setting of
+`INCLUDE` means they are to be included. The default is `EXCLUDE`.
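+
+As an illustrative sketch (variable names hypothetical), the following
+ranks `income` in descending order within each value of `region`,
+resolving ties by the lowest rank, and also assigns quartiles:
+
+```
+RANK VARIABLES = income (D) BY region
+        /TIES = LOW
+        /RANK INTO income_rank
+        /NTILES(4) INTO income_quartile.
+```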
+
--- /dev/null
+# RECODE
+
+The `RECODE` command is used to transform existing values into other,
+user specified values. The general form is:
+
+```
+RECODE SRC_VARS
+ (SRC_VALUE SRC_VALUE ... = DEST_VALUE)
+ (SRC_VALUE SRC_VALUE ... = DEST_VALUE)
+ (SRC_VALUE SRC_VALUE ... = DEST_VALUE) ...
+ [INTO DEST_VARS].
+```
+
+Following the `RECODE` keyword itself comes `SRC_VARS`, a list of
+variables whose values are to be transformed. These variables must
+be all string or all numeric variables.
+
+After the list of source variables, there should be one or more
+"mappings". Each mapping is enclosed in parentheses, and contains the
+source values and a destination value separated by a single `=`. The
+source values are used to specify the values in the dataset which need
+to change, and the destination value specifies the new value to which
+they should be changed. Each SRC_VALUE may take one of the following
+forms:
+
+* `NUMBER` (numeric source variables only)
+ Matches a number.
+
+* `STRING` (string source variables only)
+ Matches a string enclosed in single or double quotes.
+
+* `NUM1 THRU NUM2` (numeric source variables only)
+ Matches all values in the range between `NUM1` and `NUM2`, including
+ both endpoints of the range. `NUM1` should be less than `NUM2`.
+ Open-ended ranges may be specified using `LO` or `LOWEST` for `NUM1`
+ or `HI` or `HIGHEST` for `NUM2`.
+
+* `MISSING`
+ Matches system missing and user missing values.
+
+* `SYSMIS` (numeric source variables only)
+  Matches system-missing values.
+
+* `ELSE`
+ Matches any values that are not matched by any other `SRC_VALUE`.
+ This should appear only as the last mapping in the command.
+
+After the source variables comes an `=` and then the `DEST_VALUE`,
+which may take any of the following forms:
+
+* `NUMBER` (numeric destination variables only)
+ A literal numeric value to which the source values should be
+ changed.
+
+* `STRING` (string destination variables only)
+ A literal string value (enclosed in quotation marks) to which the
+ source values should be changed. This implies the destination
+ variable must be a string variable.
+
+* `SYSMIS` (numeric destination variables only)
+ The keyword `SYSMIS` changes the value to the system missing value.
+ This implies the destination variable must be numeric.
+
+* `COPY`
+ The special keyword `COPY` means that the source value should not be
+ modified, but copied directly to the destination value. This is
+ meaningful only if `INTO DEST_VARS` is specified.
+
+Mappings are considered from left to right. Therefore, if a value is
+matched by a `SRC_VALUE` from more than one mapping, the first
+(leftmost) mapping which matches is considered. Any subsequent
+matches are ignored.
+
+The clause `INTO DEST_VARS` is optional. The behaviour of the command
+is slightly different depending on whether it appears or not:
+
+* Without `INTO DEST_VARS`, then values are recoded "in place". This
+ means that the recoded values are written back to the source variables
+ from whence the original values came. In this case, the DEST_VALUE
+ for every mapping must imply a value which has the same type as the
+ SRC_VALUE. For example, if the source value is a string value, it is
+  not permissible for DEST_VALUE to be `SYSMIS` or another form which
+  implies a numeric result. It is also not permissible for DEST_VALUE
+  to be longer than the width of the source variable.
+
+ The following example recodes two numeric variables `x` and `y` in
+ place. 0 becomes 99, the values 1 to 10 inclusive are unchanged,
+ values 1000 and higher are recoded to the system-missing value, and
+ all other values are changed to 999:
+
+ ```
+ RECODE x y
+ (0 = 99)
+ (1 THRU 10 = COPY)
+ (1000 THRU HIGHEST = SYSMIS)
+ (ELSE = 999).
+ ```
+
+* With `INTO DEST_VARS`, recoded values are written into the variables
+ specified in `DEST_VARS`, which must therefore contain a list of
+ valid variable names. The number of variables in `DEST_VARS` must
+ be the same as the number of variables in `SRC_VARS` and the
+ respective order of the variables in `DEST_VARS` corresponds to the
+ order of `SRC_VARS`. That is to say, the recoded value whose
+ original value came from the Nth variable in `SRC_VARS` is placed
+ into the Nth variable in `DEST_VARS`. The source variables are
+ unchanged. If any mapping implies a string as its destination
+ value, then the respective destination variable must already exist,
+ or have been declared using `STRING` or another transformation.
+ Numeric variables however are automatically created if they don't
+ already exist.
+
+ The following example deals with two source variables, `a` and `b`
+ which contain string values. Hence there are two destination
+ variables `v1` and `v2`. Any cases where `a` or `b` contain the
+ values `apple`, `pear` or `pomegranate` result in `v1` or `v2` being
+ filled with the string `fruit` whilst cases with `tomato`, `lettuce`
+ or `carrot` result in `vegetable`. Other values produce the result
+ `unknown`:
+
+ ```
+ STRING v1 (A20).
+ STRING v2 (A20).
+
+ RECODE a b
+ ("apple" "pear" "pomegranate" = "fruit")
+ ("tomato" "lettuce" "carrot" = "vegetable")
+ (ELSE = "unknown")
+ INTO v1 v2.
+ ```
+
+There is one special mapping, not mentioned above. If the source
+variable is a string variable then a mapping may be specified as
+`(CONVERT)`. This mapping, if it appears must be the last mapping
+given and the `INTO DEST_VARS` clause must also be given and must not
+refer to a string variable. `CONVERT` causes a number specified as a
+string to be converted to a numeric value. For example it converts
+the string `"3"` into the numeric value 3 (note that it does not
+convert `three` into 3). If the string cannot be parsed as a number,
+then the system-missing value is assigned instead. In the following
+example, cases where the value of `x` (a string variable) is the empty
+string, are recoded to 999 and all others are converted to the numeric
+equivalent of the input value. The results are placed into the
+numeric variable `y`:
+
+```
+RECODE x ("" = 999) (CONVERT) INTO y.
+```
+
+It is possible to specify multiple recodings on a single command.
+Introduce additional recodings with a slash (`/`) to separate them from
+the previous recodings:
+
+```
+RECODE
+ a (2 = 22) (ELSE = 99)
+ /b (1 = 3) INTO z.
+```
+
+Here we have two recodings. The first affects the source variable `a`
+and recodes in-place the value 2 into 22 and all other values to 99.
+The second recoding copies the values of `b` into the variable `z`,
+changing any instances of 1 into 3.
+
--- /dev/null
+# REGRESSION
+
+The `REGRESSION` procedure fits linear models to data via least-squares
+estimation. The procedure is appropriate for data which satisfy those
+assumptions typical in linear regression:
+
+- The data set contains \\(n\\) observations of a dependent variable,
+ say \\(y_1,...,y_n\\), and \\(n\\) observations of one or more
+  explanatory variables. Let \\(x_{11}, x_{12}, ..., x_{1n}\\)
+  denote the \\(n\\) observations of the first explanatory variable;
+  \\(x_{21},...,x_{2n}\\) denote the \\(n\\) observations of the
+  second explanatory variable; and \\(x_{k1},...,x_{kn}\\) denote the
+  \\(n\\) observations of the kth explanatory variable.
+
+- The dependent variable \\(y\\) has the following relationship to the
+ explanatory variables: \\(y_i = b_0 + b_1 x_{1i} + ... + b_k
+ x_{ki} + z_i\\) where \\(b_0, b_1, ..., b_k\\) are unknown
+ coefficients, and \\(z_1,...,z_n\\) are independent, normally
+ distributed "noise" terms with mean zero and common variance. The
+ noise, or "error" terms are unobserved. This relationship is called
+ the "linear model".
+
+ The `REGRESSION` procedure estimates the coefficients
+\\(b_0,...,b_k\\) and produces output relevant to inferences for the
+linear model.
+
+## Syntax
+
+```
+REGRESSION
+ /VARIABLES=VAR_LIST
+ /DEPENDENT=VAR_LIST
+ /STATISTICS={ALL, DEFAULTS, R, COEFF, ANOVA, BCOV, CI[CONF, TOL]}
+ { /ORIGIN | /NOORIGIN }
+ /SAVE={PRED, RESID}
+```
+
+The `REGRESSION` procedure reads the active dataset and outputs
+statistics relevant to the linear model specified by the user.
+
+The `VARIABLES` subcommand, which is required, specifies the list of
+variables to be analyzed. The `DEPENDENT` subcommand, also required,
+specifies the dependent variable of the linear model. All variables
+listed in the `VARIABLES` subcommand, but not listed in the `DEPENDENT`
+subcommand, are treated as explanatory variables in the linear model.
+
+All other subcommands are optional:
+
+The `STATISTICS` subcommand specifies which statistics are to be
+displayed. The following keywords are accepted:
+
+* `ALL`
+ All of the statistics below.
+* `R`
+ The ratio of the sums of squares due to the model to the total sums
+ of squares for the dependent variable.
+* `COEFF`
+ A table containing the estimated model coefficients and their
+ standard errors.
+* `CI (CONF)`
+ This item is only relevant if `COEFF` has also been selected. It
+ specifies that the confidence interval for the coefficients should
+ be printed. The optional value `CONF`, which must be in
+ parentheses, is the desired confidence level expressed as a
+ percentage.
+* `ANOVA`
+ Analysis of variance table for the model.
+* `BCOV`
+ The covariance matrix for the estimated model coefficients.
+* `TOL`
+ The variance inflation factor and its reciprocal. This has no
+ effect unless `COEFF` is also given.
+* `DEFAULTS`
+ The same as if `R`, `COEFF`, and `ANOVA` had been selected. This is
+ what you get if the `/STATISTICS` command is not specified, or if it
+ is specified without any parameters.
+
+The `ORIGIN` and `NOORIGIN` subcommands are mutually exclusive.
+`ORIGIN` indicates that the regression should be performed through the
+origin. You should use this option if, and only if, you have reason to
+believe that the regression does indeed pass through the origin, that
+is to say, the value \\(b_0\\) above is zero. The default is `NOORIGIN`.
+
+The `SAVE` subcommand causes PSPP to save the residuals or predicted
+values from the fitted model to the active dataset. PSPP will store the
+residuals in a variable called `RES1` if no such variable exists, `RES2`
+if `RES1` already exists, `RES3` if `RES1` and `RES2` already exist,
+etc. It will choose the name of the variable for the predicted values
+similarly, but with `PRED` as a prefix. When `SAVE` is used, PSPP
+ignores `TEMPORARY`, treating temporary transformations as permanent.
+
+## Example
+
+The following PSPP syntax will generate the default output and save the
+predicted values and residuals to the active dataset.
+
+```
+title 'Demonstrate REGRESSION procedure'.
+data list / v0 1-2 (A) v1 v2 3-22 (10).
+begin data.
+b 7.735648 -23.97588
+b 6.142625 -19.63854
+a 7.651430 -25.26557
+c 6.125125 -16.57090
+a 8.245789 -25.80001
+c 6.031540 -17.56743
+a 9.832291 -28.35977
+c 5.343832 -16.79548
+a 8.838262 -29.25689
+b 6.200189 -18.58219
+end data.
+list.
+regression /variables=v0 v1 v2 /statistics defaults /dependent=v2
+ /save pred resid /method=enter.
+
+```
--- /dev/null
+# RELIABILITY
+
+```
+RELIABILITY
+ /VARIABLES=VAR_LIST
+ /SCALE (NAME) = {VAR_LIST, ALL}
+ /MODEL={ALPHA, SPLIT[(N)]}
+ /SUMMARY={TOTAL,ALL}
+ /MISSING={EXCLUDE,INCLUDE}
+```
+
+The `RELIABILITY` command performs reliability analysis on the data.
+
+The `VARIABLES` subcommand is required. It determines the set of
+variables upon which analysis is to be performed.
+
+The `SCALE` subcommand determines the variables for which reliability
+is to be calculated. If `SCALE` is omitted, then analysis is performed
+on all variables named in the `VARIABLES` subcommand. Optionally, the
+`NAME` parameter may be specified to set a string name for the scale.
+
+The `MODEL` subcommand determines the type of analysis. If `ALPHA`
+is specified, then Cronbach's Alpha is calculated for the scale. If the
+model is `SPLIT`, then the variables are divided into 2 subsets. An
+optional parameter `N` may be given, to specify how many variables are
+to be in the first subset. If `N` is omitted, then it defaults to one half of
+the variables in the scale, or one half minus one if there are an odd
+number of variables. The default model is `ALPHA`.
+
+By default, any cases with user-missing or system-missing values for
+any variables given in the `VARIABLES` subcommand are omitted from the
+analysis. The `MISSING` subcommand determines whether user missing
+values are included or excluded in the analysis.
+
+The `SUMMARY` subcommand determines the type of summary analysis to
+be performed. Currently there is only one type: `SUMMARY=TOTAL`, which
+displays per-item analysis tested against the totals.
+
+## Example
+
+Before analysing the results of a survey—particularly for a multiple
+choice survey—it is desirable to know whether the respondents have
+considered their answers or simply provided random answers.
+
+In the following example the survey results from the file `hotel.sav`
+are used. All five survey questions are included in the reliability
+analysis. However, before running the analysis, the data must be
+preprocessed. An examination of the survey questions reveals that two
+of them, `v3` and `v5`, are negatively worded, whereas the others
+are positively worded. All questions must be based upon the same
+scale for the analysis to be meaningful. One could use the
+[`RECODE`](recode.md) command, however a simpler way is to use
+[`COMPUTE`](compute.md) and this is what is done in the syntax below.
+
+```
+get file="hotel.sav".
+
+* Recode V3 and V5 inverting the sense of the values.
+compute v3 = 6 - v3.
+compute v5 = 6 - v5.
+
+reliability
+ /variables= all
+ /model=alpha.
+```
+
+In this case, all variables in the data set are used, so we can use
+the special keyword `ALL`.
+
+The output, below, shows that Cronbach's Alpha is 0.11, which is a
+value normally considered too low to indicate consistency within the
+data. This is possibly due to the small number of survey questions.
+The survey should be redesigned before the results are put to serious
+use.
+
+```
+Scale: ANY
+
+Case Processing Summary
+┌────────┬──┬───────┐
+│Cases │ N│Percent│
+├────────┼──┼───────┤
+│Valid │17│ 100.0%│
+│Excluded│ 0│ .0%│
+│Total │17│ 100.0%│
+└────────┴──┴───────┘
+
+ Reliability Statistics
+┌────────────────┬──────────┐
+│Cronbach's Alpha│N of Items│
+├────────────────┼──────────┤
+│ .11│ 5│
+└────────────────┴──────────┘
+```
--- /dev/null
+# RENAME VARIABLES
+
+`RENAME VARIABLES` changes the names of variables in the active dataset.
+
+```
+RENAME VARIABLES (OLD_NAMES=NEW_NAMES)... .
+```
+
+Specify lists of the old variable names and new variable names,
+separated by an equals sign (`=`), within parentheses. There must be
+the same number of old and new variable names. Each old variable is
+renamed to the corresponding new variable name. Multiple
+parenthesized groups of variables may be specified. When the old and
+new variable names contain only a single variable name, the
+parentheses are optional.
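+
+For example, the following sketch (the variable names are
+hypothetical) renames one variable on its own and two more as a
+parenthesized group:
+
+```
+rename variables
+  (age = age_years)
+  (x1 x2 = pretest posttest).
+```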
+
+`RENAME VARIABLES` takes effect immediately. It does not cause the
+data to be read.
+
+`RENAME VARIABLES` may not be specified following
+[`TEMPORARY`](temporary.md).
+
--- /dev/null
+# REPEATING DATA
+
+```
+REPEATING DATA
+ /STARTS=START-END
+ /OCCURS=N_OCCURS
+ /FILE='FILE_NAME'
+ /LENGTH=LENGTH
+ /CONTINUED[=CONT_START-CONT_END]
+ /ID=ID_START-ID_END=ID_VAR
+ /{TABLE,NOTABLE}
+ /DATA=VAR_SPEC...
+
+where each VAR_SPEC takes one of the forms
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+```
+
+`REPEATING DATA` parses groups of data repeating in a uniform format,
+possibly with several groups on a single line. Each group of data
+corresponds with one case. `REPEATING DATA` may only be used within
+[`INPUT PROGRAM`](input-program.md). When used with [`DATA
+LIST`](data-list.md), it can be used to parse groups of cases that
+share a subset of variables but differ in their other data.
+
+The `STARTS` subcommand is required. Specify a range of columns,
+using literal numbers or numeric variable names. This range specifies
+the columns on the first line that are used to contain groups of data.
+The ending column is optional. If it is not specified, then the
+record width of the input file is used. For the [inline
+file](begin-data.md), this is 80 columns; for a file with fixed record
+widths it is the record width; for other files it is 1024 characters
+by default.
+
+The `OCCURS` subcommand is required. It must be a number or the name
+of a numeric variable. Its value is the number of groups present in the
+current record.
+
+The `DATA` subcommand is required. It must be the last subcommand
+specified. It is used to specify the data present within each
+repeating group. Column numbers are specified relative to the
+beginning of a group at column 1. Data is specified in the same way
+as with [`DATA LIST FIXED`](data-list.md#data-list-fixed).
+
+All other subcommands are optional.
+
+`FILE` specifies the file to read, either a file name as a string or a
+[file handle](../language/files/file-handles.md). If `FILE` is not
+present then the default is the last file handle used on the most
+recent `DATA LIST` command.
+
+By default `REPEATING DATA` will output a table describing how it
+will parse the input data. Specifying `NOTABLE` will disable this
+behavior; specifying `TABLE` will explicitly enable it.
+
+The `LENGTH` subcommand specifies the length in characters of each
+group. If it is not present then length is inferred from the `DATA`
+subcommand. `LENGTH` may be a number or a variable name.
+
+Normally all the data groups are expected to be present on a single
+line. Use the `CONTINUED` subcommand to indicate that data can be
+continued onto additional lines. If data on continuation lines starts
+at the left margin and continues through the entire field width, no
+column specifications are necessary on `CONTINUED`. Otherwise, specify
+the possible range of columns in the same way as on `STARTS`.
+
+When data groups are continued from line to line, it is easy for
+cases to get out of sync through careless hand editing. The `ID`
+subcommand allows a case identifier to be present on each line of
+repeating data groups. `REPEATING DATA` will check for the same
+identifier on each line and report mismatches. Specify the range of
+columns that the identifier will occupy, followed by an equals sign
+(`=`) and the identifier variable name. The variable must already have
+been declared with `NUMERIC` or another command.
+
+`REPEATING DATA` should be the last command given within an [`INPUT
+PROGRAM`](input-program.md). It should not be enclosed within
+[`LOOP`…`END LOOP`](loop.md). Use `DATA LIST` before, not after,
+[`REPEATING DATA`](repeating-data.md).
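+
+The following sketch shows one way these subcommands fit together.
+The record layout and variable names are invented for illustration:
+each record holds an order number, a count of line items, and then
+that many 6-column groups holding an item code and a quantity.
+
+```
+input program.
+data list /order 1-4 #nitems 6-7.
+repeating data starts=9 /occurs=#nitems
+  /data=item 1-3 (a) quantity 5-6.
+end input program.
+begin data.
+1001  2 AAA 10BBB 20
+1002  1 CCC  5
+end data.
+list.
+```
+
+Here `OCCURS` names the scratch variable `#nitems` read by `DATA
+LIST`, so each record can carry a different number of groups, and the
+group length of 6 columns is inferred from the `DATA` subcommand.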
--- /dev/null
+# REREAD
+
+```
+REREAD [FILE=handle] [COLUMN=column] [ENCODING='ENCODING'].
+```
+
+The `REREAD` transformation allows the previous input line in a data
+file already processed by `DATA LIST` or another input command to be
+re-read for further processing.
+
+The `FILE` subcommand, which is optional, is used to specify the file
+to have its line re-read. The file must be specified as the name of a
+[file handle](../language/files/file-handles.md). If `FILE` is not
+specified then the file specified on the most recent `DATA LIST`
+command is assumed.
+
+By default, the line re-read is re-read in its entirety. With the
+`COLUMN` subcommand, a prefix of the line can be exempted from
+re-reading. Specify an
+[expression](../language/expressions/index.md) evaluating to the
+first column that should be included in the re-read line. Columns are
+numbered from 1 at the left margin.
+
+The `ENCODING` subcommand may only be used if the `FILE` subcommand is
+also used. It specifies the character encoding of the file. See
+[`INSERT`](insert.md) for information on supported encodings.
+
+Issuing `REREAD` multiple times will not back up in the data file.
+Instead, it will re-read the same line multiple times.
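+
+As an illustrative sketch (the record layout and variable names are
+invented), the following reads a type code from each line, then
+re-reads the rest of the line with a format chosen by that code:
+
+```
+input program.
+data list notable /type 1 (a).
+do if type = 'A'.
+  reread column=3.
+  data list notable /x 1-2.
+else.
+  reread column=5.
+  data list notable /x 1-3.
+end if.
+end input program.
+begin data.
+A 42
+B   371
+end data.
+list.
+```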
+
--- /dev/null
+# ROC
+
+```
+ROC
+ VAR_LIST BY STATE_VAR (STATE_VALUE)
+ /PLOT = { CURVE [(REFERENCE)], NONE }
+ /PRINT = [ SE ] [ COORDINATES ]
+ /CRITERIA = [ CUTOFF({INCLUDE,EXCLUDE}) ]
+ [ TESTPOS ({LARGE,SMALL}) ]
+ [ CI (CONFIDENCE) ]
+ [ DISTRIBUTION ({FREE, NEGEXPO }) ]
+ /MISSING={EXCLUDE,INCLUDE}
+```
+
+The `ROC` command is used to plot the receiver operating
+characteristic curve of a dataset, and to estimate the area under the
+curve. This is useful for analysing the efficacy of a variable as a
+predictor of a state of nature.
+
+The mandatory `VAR_LIST` is the list of predictor variables. The
+variable `STATE_VAR` is the variable whose values represent the actual
+states, and `STATE_VALUE` is the value of this variable which represents
+the positive state.
+
+The optional subcommand `PLOT` is used to determine if and how the
+`ROC` curve is drawn. The keyword `CURVE` means that the `ROC` curve
+should be drawn, and the optional keyword `REFERENCE`, which should be
+enclosed in parentheses, says that the diagonal reference line should be
+drawn. If the keyword `NONE` is given, then no `ROC` curve is drawn.
+By default, the curve is drawn with no reference line.
+
+The optional subcommand `PRINT` determines which additional tables
+should be printed. Two additional tables are available. The `SE`
+keyword says that standard error of the area under the curve should be
+printed as well as the area itself. In addition, a p-value for the null
+hypothesis that the area under the curve equals 0.5 is printed. The
+`COORDINATES` keyword says that a table of coordinates of the `ROC`
+curve should be printed.
+
+The `CRITERIA` subcommand has four optional parameters:
+
+- The `TESTPOS` parameter may be `LARGE` or `SMALL`. `LARGE` is the
+ default, and says that larger values in the predictor variables are
+ to be considered positive. `SMALL` indicates that smaller values
+ should be considered positive.
+
+- The `CI` parameter specifies the confidence interval that should be
+ printed. It has no effect if the `SE` keyword in the `PRINT`
+ subcommand has not been given.
+
+- The `DISTRIBUTION` parameter determines the method to be used when
+  estimating the area under the curve. There are two possibilities:
+  `FREE` and `NEGEXPO`. The `FREE` method uses a non-parametric
+ estimate, and the `NEGEXPO` method a bi-negative exponential
+ distribution estimate. The `NEGEXPO` method should only be used
+ when the number of positive actual states is equal to the number of
+ negative actual states. The default is `FREE`.
+
+- The `CUTOFF` parameter is for compatibility and is ignored.
+
+The `MISSING` subcommand determines whether user missing values are to
+be included or excluded in the analysis. The default behaviour is to
+exclude them. Cases are excluded on a listwise basis: if any of the
+variables in `VAR_LIST`, or the variable `STATE_VAR`, is missing for a
+case, then that entire case is excluded.
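+
+As a sketch with hypothetical variable names, the following evaluates
+two predictors, `score1` and `score2`, against an outcome variable
+`disease`, where the value 1 represents the positive state:
+
+```
+roc score1 score2 by disease (1)
+  /plot = curve(reference)
+  /print = se coordinates
+  /criteria = ci(99).
+```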
+
--- /dev/null
+# SAMPLE
+
+```
+SAMPLE NUM1 [FROM NUM2].
+```
+
+`SAMPLE` randomly samples a proportion of the cases in the active
+file. Unless it follows `TEMPORARY`, it permanently removes cases
+from the active dataset.
+
+The proportion to sample may be expressed as a single number between 0
+and 1. If `N` is the number of currently-selected cases in the active
+dataset, then `SAMPLE K.` will select approximately `K×N` cases.
+
+The proportion to sample can also be specified in the style `SAMPLE M
+FROM N`. With this style, cases are selected as follows:
+
+1. If `N` is the number of currently-selected cases in the active
+ dataset, exactly `M` cases are selected.
+
+2. If `N` is greater than the number of currently-selected cases in
+ the active dataset, an equivalent proportion of cases are selected.
+
+3. If `N` is less than the number of currently-selected cases in the
+   active dataset, exactly `M` cases are selected *from the first `N`
+   cases* in the active dataset.
+
+`SAMPLE` and `SELECT IF` are performed in the order specified by the
+syntax file.
+
+`SAMPLE` is always performed before [`N OF CASES`](n.md), regardless
+of ordering in the syntax file.
+
+The same values for `SAMPLE` may result in different samples. To
+obtain the same sample, use the `SET` command to set the random number
+seed to the same value before each `SAMPLE`. Different samples may
+still result when the file is processed on systems with different
+machine types or PSPP versions. By default, the random number seed is
+based on the system time.
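+
+For example, to draw a reproducible 50% sample, set the random number
+seed first (the seed value here is arbitrary):
+
+```
+set seed=54321.
+sample .5.
+```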
+
--- /dev/null
+# SAVE DATA COLLECTION
+
+```
+SAVE DATA COLLECTION
+ /OUTFILE={'FILE_NAME',FILE_HANDLE}
+ /METADATA={'FILE_NAME',FILE_HANDLE}
+ /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
+ /PERMISSIONS={WRITEABLE,READONLY}
+ /DROP=VAR_LIST
+ /KEEP=VAR_LIST
+ /VERSION=VERSION
+ /RENAME=(SRC_NAMES=TARGET_NAMES)...
+ /NAMES
+ /MAP
+```
+
+Like `SAVE`, `SAVE DATA COLLECTION` writes the dictionary and data in
+the active dataset to a system file. In addition, it writes metadata to
+an additional XML metadata file.
+
+`OUTFILE` is required. Specify the system file to be written as a
+string file name or a [file
+handle](../language/files/file-handles.md).
+
+`METADATA` is also required. Specify the metadata file to be written
+as a string file name or a file handle. Metadata files customarily use
+a `.mdd` extension.
+
+The current implementation of this command is experimental. It only
+outputs an approximation of the metadata file format. Please report
+bugs.
+
+Other subcommands are optional. They have the same meanings as in
+the `SAVE` command.
+
+`SAVE DATA COLLECTION` causes the data to be read. It is a procedure.
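+
+A minimal sketch, with hypothetical file names:
+
+```
+save data collection
+  /outfile='survey.sav'
+  /metadata='survey.mdd'.
+```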
+
--- /dev/null
+# SAVE TRANSLATE
+
+```
+SAVE TRANSLATE
+ /OUTFILE={'FILE_NAME',FILE_HANDLE}
+ /TYPE={CSV,TAB}
+ [/REPLACE]
+ [/MISSING={IGNORE,RECODE}]
+
+ [/DROP=VAR_LIST]
+ [/KEEP=VAR_LIST]
+ [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+ [/UNSELECTED={RETAIN,DELETE}]
+ [/MAP]
+
+ ...additional subcommands depending on TYPE...
+```
+
+The `SAVE TRANSLATE` command is used to save data into various
+formats understood by other applications.
+
+The `OUTFILE` and `TYPE` subcommands are mandatory. `OUTFILE`
+specifies the file to be written, as a string file name or a [file
+handle](../language/files/file-handles.md). `TYPE` determines the
+type of file to be written. It must be one of the following:
+
+* `CSV`
+  Comma-separated value format.
+
+* `TAB`
+ Tab-delimited format.
+
+By default, `SAVE TRANSLATE` does not overwrite an existing file.
+Use `REPLACE` to force an existing file to be overwritten.
+
+With `MISSING=IGNORE`, the default, `SAVE TRANSLATE` treats
+user-missing values as if they were not missing. Specify
+`MISSING=RECODE` to output numeric user-missing values like
+system-missing values and string user-missing values as all spaces.
+
+By default, all the variables in the active dataset dictionary are
+saved to the system file, but `DROP` or `KEEP` can select a subset of
+variables to save. The `RENAME` subcommand can also be used to change
+the names under which variables are saved; because they are used only
+in the output, these names do not have to conform to the usual PSPP
+variable naming rules. `UNSELECTED` determines whether cases filtered
+out by the `FILTER` command are written to the output file. These
+subcommands have the same syntax and meaning as on the
+[`SAVE`](save.md) command.
+
+Each supported file type has additional subcommands, explained in
+separate sections below.
+
+`SAVE TRANSLATE` causes the data to be read. It is a procedure.
+
+## Comma- and Tab-Separated Data Files
+
+```
+SAVE TRANSLATE
+ /OUTFILE={'FILE_NAME',FILE_HANDLE}
+ /TYPE=CSV
+ [/REPLACE]
+ [/MISSING={IGNORE,RECODE}]
+
+ [/DROP=VAR_LIST]
+ [/KEEP=VAR_LIST]
+ [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+ [/UNSELECTED={RETAIN,DELETE}]
+
+ [/FIELDNAMES]
+ [/CELLS={VALUES,LABELS}]
+ [/TEXTOPTIONS DELIMITER='DELIMITER']
+ [/TEXTOPTIONS QUALIFIER='QUALIFIER']
+ [/TEXTOPTIONS DECIMAL={DOT,COMMA}]
+ [/TEXTOPTIONS FORMAT={PLAIN,VARIABLE}]
+```
+
+The `SAVE TRANSLATE` command with `TYPE=CSV` or `TYPE=TAB` writes data in a
+comma- or tab-separated value format similar to that described by
+RFC 4180. Each variable becomes one output column, and each case
+becomes one line of output. If `FIELDNAMES` is specified, an additional
+line at the top of the output file lists variable names.
+
+The `CELLS` and `TEXTOPTIONS FORMAT` settings determine how values are
+written to the output file:
+
+* `CELLS=VALUES FORMAT=PLAIN` (the default settings)
+ Writes variables to the output in "plain" formats that ignore the
+ details of variable formats. Numeric values are written as plain
+ decimal numbers with enough digits to indicate their exact values
+ in machine representation. Numeric values include `e` followed by
+ an exponent if the exponent value would be less than -4 or greater
+ than 16. Dates are written in MM/DD/YYYY format and times in
+ HH:MM:SS format. `WKDAY` and `MONTH` values are written as decimal
+ numbers.
+
+ Numeric values use, by default, the decimal point character set with
+ [`SET DECIMAL`](set.md#decimal). Use `DECIMAL=DOT` or
+ `DECIMAL=COMMA` to force a particular decimal point character.
+
+* `CELLS=VALUES FORMAT=VARIABLE`
+ Writes variables using their print formats. Leading and trailing
+ spaces are removed from numeric values, and trailing spaces are
+ removed from string values.
+
+* `CELLS=LABELS FORMAT=PLAIN`
+  `CELLS=LABELS FORMAT=VARIABLE`
+ Writes value labels where they exist, and otherwise writes the
+ values themselves as described above.
+
+ Regardless of `CELLS` and `TEXTOPTIONS FORMAT`, numeric system-missing
+values are output as a single space.
+
+ For `TYPE=TAB`, tab characters delimit values. For `TYPE=CSV`, the
+`TEXTOPTIONS DELIMITER` and `DECIMAL` settings determine the character
+that separates values within a line. If `DELIMITER` is specified, then
+the specified string separates values. If `DELIMITER` is not
+specified, then the default is a comma with `DECIMAL=DOT` or a
+semicolon with `DECIMAL=COMMA`. If `DECIMAL` is not given either, it
+is inferred from the decimal point character set with [`SET
+DECIMAL`](set.md#decimal).
+
+ The `TEXTOPTIONS QUALIFIER` setting specifies a character that is
+output before and after a value that contains the delimiter character or
+the qualifier character. The default is a double quote (`"`). A
+qualifier character that appears within a value is doubled.
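+
+As a sketch with hypothetical names, the following writes a
+semicolon-separated file with a header row, using value labels where
+they exist:
+
+```
+save translate
+  /outfile='results.csv'
+  /type=csv
+  /replace
+  /fieldnames
+  /cells=labels
+  /textoptions delimiter=';'.
+```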
+
--- /dev/null
+# SAVE
+
+```
+SAVE
+ /OUTFILE={'FILE_NAME',FILE_HANDLE}
+ /UNSELECTED={RETAIN,DELETE}
+ /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
+ /PERMISSIONS={WRITEABLE,READONLY}
+ /DROP=VAR_LIST
+ /KEEP=VAR_LIST
+ /VERSION=VERSION
+ /RENAME=(SRC_NAMES=TARGET_NAMES)...
+ /NAMES
+ /MAP
+```
+
+ The `SAVE` procedure causes the dictionary and data in the active
+dataset to be written to a system file.
+
+ `OUTFILE` is the only required subcommand. Specify the system file
+to be written as a string file name or a [file
+handle](../language/files/file-handles.md).
+
+ By default, cases excluded with `FILTER` are written to the system
+file. These can be excluded by specifying `DELETE` on the `UNSELECTED`
+subcommand. Specifying `RETAIN` makes the default explicit.
+
+ The `UNCOMPRESSED`, `COMPRESSED`, and `ZCOMPRESSED` subcommands
+determine the system file's compression level:
+
+* `UNCOMPRESSED`
+ Data is not compressed. Each numeric value uses 8 bytes of disk
+ space. Each string value uses one byte per column width, rounded
+ up to a multiple of 8 bytes.
+
+* `COMPRESSED`
+ Data is compressed in a simple way. Each integer numeric value
+ between −99 and 151, inclusive, or system missing value uses one
+ byte of disk space. Each 8-byte segment of a string that consists
+ only of spaces uses 1 byte. Any other numeric value or 8-byte
+ string segment uses 9 bytes of disk space.
+
+* `ZCOMPRESSED`
+ Data is compressed with the "deflate" compression algorithm
+ specified in RFC 1951 (the same algorithm used by `gzip`). Files
+ written with this compression level cannot be read by PSPP 0.8.1 or
+ earlier or by SPSS 20 or earlier.
+
+`COMPRESSED` is the default compression level. The [`SET`](set.md)
+command can change this default.
+
+The `PERMISSIONS` subcommand specifies operating system permissions
+for the new system file. `WRITEABLE`, the default, creates the file
+with read and write permission. `READONLY` creates the file for
+read-only access.
+
+By default, all the variables in the active dataset dictionary are
+written to the system file. The `DROP` subcommand can be used to
+specify a list of variables not to be written. In contrast, `KEEP`
+specifies variables to be written, with all variables not specified
+not written.
+
+Normally variables are saved to a system file under the same names
+they have in the active dataset. Use the `RENAME` subcommand to change
+these names. Specify, within parentheses, a list of variable names
+followed by an equals sign (`=`) and the names that they should be
+renamed to. Multiple parenthesized groups of variable names can be
+included on a single `RENAME` subcommand. Variables' names may be
+swapped using a `RENAME` subcommand of the form `/RENAME=(A B=B A)`.
+
+Alternate syntax for the `RENAME` subcommand allows the parentheses to
+be eliminated. When this is done, only a single variable may be
+renamed at once. For instance, `/RENAME=A=B`. This alternate syntax
+is discouraged.
+
+`DROP`, `KEEP`, and `RENAME` are performed in left-to-right order.
+They each may be present any number of times. `SAVE` never modifies
+the active dataset. `DROP`, `KEEP`, and `RENAME` only affect the
+system file written to disk.
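+
+For instance, with hypothetical variable names, the following writes
+only three variables, renaming two of them in the output file:
+
+```
+save
+  /outfile='subset.sav'
+  /keep=id score1 score2
+  /rename=(score1 score2 = pretest posttest).
+```
+
+Because `KEEP` and `RENAME` apply left to right, `KEEP` refers to the
+variables by their original names.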
+
+The `VERSION` subcommand specifies the version of the file format.
+Valid versions are 2 and 3. The default version is 3. In version 2
+system files, variable names longer than 8 bytes are truncated. The
+two versions are otherwise identical.
+
+The `NAMES` and `MAP` subcommands are currently ignored.
+
+`SAVE` causes the data to be read. It is a procedure.
+
--- /dev/null
+# SELECT IF
+
+```
+SELECT IF EXPRESSION.
+```
+
+`SELECT IF` selects cases for analysis based on the value of
+EXPRESSION. Cases not selected are permanently eliminated from the
+active dataset, unless [`TEMPORARY`](temporary.md) is in effect.
+
+Specify a [boolean
+expression](../language/expressions/index.md#boolean-values). If
+the expression is true for a particular case, the case is analyzed.
+If the expression is false or missing, then the case is deleted from
+the data stream.
+
+Place `SELECT IF` early in the command file. Cases that are deleted
+early can be processed more efficiently in time and space. Once cases
+have been deleted from the active dataset using `SELECT IF` they
+cannot be re-instated. If you want to be able to re-instate cases,
+then use [`FILTER`](filter.md) instead.
+
+When `SELECT IF` is specified following [`TEMPORARY`](temporary.md),
+the [`LAG`](../language/expressions/functions/miscellaneous.md)
+function may not be used.
+
+## Example
+
+A shop steward is interested in the salaries of younger personnel in a
+firm. The file `personnel.sav` provides the salaries of all the
+workers and their dates of birth. The syntax below shows how `SELECT
+IF` can be used to limit analysis only to those persons born after
+December 31, 1999.
+
+```
+get file = 'personnel.sav'.
+
+echo 'Salaries of all personnel'.
+descriptives salary.
+
+echo 'Salaries of personnel born after December 31 1999'.
+select if dob > date.dmy (31,12,1999).
+descriptives salary.
+```
+
+From the output shown below, one can see that there are 56 persons
+listed in the dataset, and 17 of them were born after December 31,
+1999.
+
+```
+Salaries of all personnel
+
+ Descriptive Statistics
+┌────────────────────────┬──┬────────┬───────┬───────┬───────┐
+│ │ N│ Mean │Std Dev│Minimum│Maximum│
+├────────────────────────┼──┼────────┼───────┼───────┼───────┤
+│Annual salary before tax│56│40028.97│8721.17│$23,451│$57,044│
+│Valid N (listwise) │56│ │ │ │ │
+│Missing N (listwise) │ 0│ │ │ │ │
+└────────────────────────┴──┴────────┴───────┴───────┴───────┘
+
+Salaries of personnel born after December 31 1999
+
+ Descriptive Statistics
+┌────────────────────────┬──┬────────┬───────┬───────┬───────┐
+│ │ N│ Mean │Std Dev│Minimum│Maximum│
+├────────────────────────┼──┼────────┼───────┼───────┼───────┤
+│Annual salary before tax│17│31828.59│4454.80│$23,451│$39,504│
+│Valid N (listwise) │17│ │ │ │ │
+│Missing N (listwise) │ 0│ │ │ │ │
+└────────────────────────┴──┴────────┴───────┴───────┴───────┘
+```
+
+Note that the `personnel.sav` file from which the data were read is
+unaffected. The transformation affects only the active file.
+
--- /dev/null
+# Selecting Data
+
+This chapter documents PSPP commands that temporarily or permanently
+select data records from the active dataset for analysis.
+++ /dev/null
-# FILTER
-
-```
-FILTER BY VAR_NAME.
-FILTER OFF.
-```
-
-`FILTER` allows a boolean-valued variable to be used to select cases
-from the data stream for processing.
-
-To set up filtering, specify `BY` and a variable name. Keyword `BY` is
-optional but recommended. Cases which have a zero or system- or
-user-missing value are excluded from analysis, but not deleted from
-the data stream. Cases with other values are analyzed. To filter
-based on a different condition, use transformations such as `COMPUTE`
-or `RECODE` to compute a filter variable of the required form, then
-specify that variable on `FILTER`.
-
-`FILTER OFF` turns off case filtering.
-
-Filtering takes place immediately before cases pass to a procedure for
-analysis. Only one filter variable may be active at a time.
-Normally, case filtering continues until it is explicitly turned off
-with `FILTER OFF`. However, if `FILTER` is placed after `TEMPORARY`,
-it filters only the next procedure or procedure-like command.
-
+++ /dev/null
-# Selecting Data
-
-This chapter documents PSPP commands that temporarily or permanently
-select data records from the active dataset for analysis.
+++ /dev/null
-# N OF CASES
-
-```
-N [OF CASES] NUM_OF_CASES [ESTIMATED].
-```
-
-`N OF CASES` limits the number of cases processed by any procedures
-that follow it in the command stream. `N OF CASES 100`, for example,
-tells PSPP to disregard all cases after the first 100.
-
-When `N OF CASES` is specified after [`TEMPORARY`](temporary.md), it
-affects only the next procedure. Otherwise, cases beyond the limit
-specified are not processed by any later procedure.
-
-If the limit specified on `N OF CASES` is greater than the number of
-cases in the active dataset, it has no effect.
-
-When `N OF CASES` is used along with `SAMPLE` or `SELECT IF`, the
-case limit is applied to the cases obtained after sampling or case
-selection, regardless of how `N OF CASES` is placed relative to `SAMPLE`
-or `SELECT IF` in the command file. Thus, the commands `N OF CASES 100`
-and `SAMPLE .5` both randomly sample approximately half of the active
-dataset's cases, then select the first 100 of those sampled, regardless
-of their order in the command file.
-
-`N OF CASES` with the `ESTIMATED` keyword gives an estimated number of
-cases before `DATA LIST` or another command to read in data.
-`ESTIMATED` never limits the number of cases processed by procedures.
-PSPP currently does not use case count estimates.
-
+++ /dev/null
-# SAMPLE
-
-```
-SAMPLE NUM1 [FROM NUM2].
-```
-
-`SAMPLE` randomly samples a proportion of the cases in the active
-file. Unless it follows `TEMPORARY`, it permanently removes cases
-from the active dataset.
-
-The proportion to sample may be expressed as a single number between 0
-and 1. If `N` is the number of currently-selected cases in the active
-dataset, then `SAMPLE K.` will select approximately `K×N` cases.
-
-The proportion to sample can also be specified in the style `SAMPLE M
-FROM N`. With this style, cases are selected as follows:
-
-1. If `N` is the number of currently-selected cases in the active
- dataset, exactly `M` cases are selected.
-
-2. If `N` is greater than the number of currently-selected cases in
- the active dataset, an equivalent proportion of cases are selected.
-
-3. If `N` is less than the number of currently-selected cases in the
- active, exactly `M` cases are selected *from the first `N` cases*
- in the active dataset.
-
-`SAMPLE` and `SELECT IF` are performed in the order specified by the
-syntax file.
-
-`SAMPLE` is always performed before [`N OF CASES`](n.md), regardless
-of ordering in the syntax file.
-
-The same values for `SAMPLE` may result in different samples. To
-obtain the same sample, use the `SET` command to set the random number
-seed to the same value before each `SAMPLE`. Different samples may
-still result when the file is processed on systems with different
-machine types or PSPP versions. By default, the random number seed is
-based on the system time.
-
+++ /dev/null
-# SELECT IF
-
-```
-SELECT IF EXPRESSION.
-```
-
-`SELECT IF` selects cases for analysis based on the value of
-EXPRESSION. Cases not selected are permanently eliminated from the
-active dataset, unless [`TEMPORARY`](temporary.md) is in effect.
-
-Specify a [boolean
-expression](../../language/expressions/index.md#boolean-values). If
-the expression is true for a particular case, the case is analyzed.
-If the expression is false or missing, then the case is deleted from
-the data stream.
-
-Place `SELECT IF` early in the command file. Cases that are deleted
-early can be processed more efficiently in time and space. Once cases
-have been deleted from the active dataset using `SELECT IF` they
-cannot be re-instated. If you want to be able to re-instate cases,
-then use [`FILTER`](filter.md) instead.
-
-When `SELECT IF` is specified following [`TEMPORARY`](temporary.md),
-the [`LAG`](../../language/expressions/functions/miscellaneous.md)
-function may not be used.
-
-## Example
-
-A shop steward is interested in the salaries of younger personnel in a
-firm. The file `personnel.sav` provides the salaries of all the
-workers and their dates of birth. The syntax below shows how `SELECT
-IF` can be used to limit analysis only to those persons born after
-December 31, 1999.
-
-```
-get file = 'personnel.sav'.
-
-echo 'Salaries of all personnel'.
-descriptives salary.
-
-echo 'Salaries of personnel born after December 31 1999'.
-select if dob > date.dmy (31,12,1999).
-descriptives salary.
-```
-
-From the output shown below, one can see that there are 56 persons
-listed in the dataset, and 17 of them were born after December 31,
-1999.
-
-```
-Salaries of all personnel
-
- Descriptive Statistics
-┌────────────────────────┬──┬────────┬───────┬───────┬───────┐
-│ │ N│ Mean │Std Dev│Minimum│Maximum│
-├────────────────────────┼──┼────────┼───────┼───────┼───────┤
-│Annual salary before tax│56│40028.97│8721.17│$23,451│$57,044│
-│Valid N (listwise) │56│ │ │ │ │
-│Missing N (listwise) │ 0│ │ │ │ │
-└────────────────────────┴──┴────────┴───────┴───────┴───────┘
-
-Salaries of personnel born after December 31 1999
-
- Descriptive Statistics
-┌────────────────────────┬──┬────────┬───────┬───────┬───────┐
-│ │ N│ Mean │Std Dev│Minimum│Maximum│
-├────────────────────────┼──┼────────┼───────┼───────┼───────┤
-│Annual salary before tax│17│31828.59│4454.80│$23,451│$39,504│
-│Valid N (listwise) │17│ │ │ │ │
-│Missing N (listwise) │ 0│ │ │ │ │
-└────────────────────────┴──┴────────┴───────┴───────┴───────┘
-```
-
-Note that the `personnel.sav` file from which the data were read is
-unaffected. The transformation affects only the active file.
-
+++ /dev/null
-# SPLIT FILE
-
-```
-SPLIT FILE [{LAYERED, SEPARATE}] BY VAR_LIST.
-SPLIT FILE OFF.
-```
-
-`SPLIT FILE` allows multiple sets of data present in one data file to
-be analyzed separately using single statistical procedure commands.
-
-Specify a list of variable names to analyze multiple sets of data
-separately. Groups of adjacent cases having the same values for these
-variables are analyzed by statistical procedure commands as one group.
-An independent analysis is carried out for each group of cases, and the
-variable values for the group are printed along with the analysis.
-
-When a list of variable names is specified, one of the keywords
-`LAYERED` or `SEPARATE` may also be specified. With `LAYERED`, which
-is the default, the separate analyses for each group are presented
-together in a single table. With `SEPARATE`, each analysis is
-presented in a separate table. Not all procedures honor the
-distinction.
-
-Groups are formed only by _adjacent_ cases. To create a split using a
-variable where like values are not adjacent in the working file, first
-[sort the data](../../commands/data/sort-cases.md) by that variable.
-
-Specify `OFF` to disable `SPLIT FILE` and resume analysis of the
-entire active dataset as a single group of data.
-
-When `SPLIT FILE` is specified after [`TEMPORARY`](temporary.md), it
-affects only the next procedure.
-
-## Example
-
-The file `horticulture.sav` contains data describing the yield of a
-number of horticultural specimens which have been subjected to various
-treatments. If we wanted to investigate linear statistics of the
-yeild, one way to do this is using
-[`DESCRIPTIVES`](../statistics/descriptives.md). However, it is
-reasonable to expect the mean to be different depending on the
-treatment. So we might want to perform three separate procedures --
-one for each treatment.[^1] The following syntax shows how this can be
-done automatically using the `SPLIT FILE` command.
-
-[^1]: There are other, possibly better, ways to achieve a similar
-result using the `MEANS` or `EXAMINE` commands.
-
-```
-get file='horticulture.sav'.
-
-* Ensure cases are sorted before splitting.
-sort cases by treatment.
-
-split file by treatment.
-
-* Run descriptives on the yield variable
-descriptives /variable = yield.
-```
-
-In the following output, you can see that the table of descriptive
-statistics appears 3 times—once for each value of treatment. In this
-example `N`, the number of observations are identical in all splits.
-This is because that experiment was deliberately designed that way.
-However in general one can expect a different `N` for each split.
-
-```
- Split Values
-┌─────────┬───────┐
-│Variable │ Value │
-├─────────┼───────┤
-│treatment│control│
-└─────────┴───────┘
-
- Descriptive Statistics
-┌────────────────────┬──┬─────┬───────┬───────┬───────┐
-│ │ N│ Mean│Std Dev│Minimum│Maximum│
-├────────────────────┼──┼─────┼───────┼───────┼───────┤
-│yield │30│51.23│ 8.28│ 37.86│ 68.59│
-│Valid N (listwise) │30│ │ │ │ │
-│Missing N (listwise)│ 0│ │ │ │ │
-└────────────────────┴──┴─────┴───────┴───────┴───────┘
-
- Split Values
-┌─────────┬────────────┐
-│Variable │ Value │
-├─────────┼────────────┤
-│treatment│conventional│
-└─────────┴────────────┘
-
- Descriptive Statistics
-┌────────────────────┬──┬─────┬───────┬───────┬───────┐
-│ │ N│ Mean│Std Dev│Minimum│Maximum│
-├────────────────────┼──┼─────┼───────┼───────┼───────┤
-│yield │30│53.57│ 8.92│ 36.30│ 70.66│
-│Valid N (listwise) │30│ │ │ │ │
-│Missing N (listwise)│ 0│ │ │ │ │
-└────────────────────┴──┴─────┴───────┴───────┴───────┘
-
- Split Values
-┌─────────┬───────────┐
-│Variable │ Value │
-├─────────┼───────────┤
-│treatment│traditional│
-└─────────┴───────────┘
-
- Descriptive Statistics
-┌────────────────────┬──┬─────┬───────┬───────┬───────┐
-│ │ N│ Mean│Std Dev│Minimum│Maximum│
-├────────────────────┼──┼─────┼───────┼───────┼───────┤
-│yield │30│56.87│ 8.88│ 39.08│ 75.93│
-│Valid N (listwise) │30│ │ │ │ │
-│Missing N (listwise)│ 0│ │ │ │ │
-└────────────────────┴──┴─────┴───────┴───────┴───────┘
-```
-
-Example 13.3: The results of running `DESCRIPTIVES` with an active split
-
-Unless `TEMPORARY` was used, after a split has been defined for a
-dataset it remains active until explicitly disabled.
-
+++ /dev/null
-# TEMPORARY
-
-```
-TEMPORARY.
-```
-
-`TEMPORARY` is used to make the effects of transformations following
-its execution temporary. These transformations affect only the
-execution of the next procedure or procedure-like command. Their
-effects are not be saved to the active dataset.
-
-The only specification on `TEMPORARY` is the command name.
-
-`TEMPORARY` may not appear within a `DO IF` or `LOOP` construct. It
-may appear only once between procedures and procedure-like commands.
-
-Scratch variables cannot be used following `TEMPORARY`.
-
-## Example
-
-In the syntax below, there are two `COMPUTE` transformation. One of
-them immediately follows a `TEMPORARY` command, and therefore affects
-only the next procedure, which in this case is the first
-`DESCRIPTIVES` command.
-
-```
-data list notable /x 1-2.
-begin data.
- 2
- 4
-10
-15
-20
-24
-end data.
-
-compute x=x/2.
-
-temporary.
-compute x=x+3.
-
-descriptives x.
-descriptives x.
-```
-
-The data read by the first `DESCRIPTIVES` procedure are 4, 5, 8, 10.5,
-13, 15. The data read by the second `DESCRIPTIVES` procedure are 1,
-2, 5, 7.5, 10, 12. This is because the second `COMPUTE`
-transformation has no effect on the second `DESCRIPTIVES` procedure.
-You can check these figures in the following output.
-
-```
- Descriptive Statistics
-┌────────────────────┬─┬────┬───────┬───────┬───────┐
-│ │N│Mean│Std Dev│Minimum│Maximum│
-├────────────────────┼─┼────┼───────┼───────┼───────┤
-│x │6│9.25│ 4.38│ 4│ 15│
-│Valid N (listwise) │6│ │ │ │ │
-│Missing N (listwise)│0│ │ │ │ │
-└────────────────────┴─┴────┴───────┴───────┴───────┘
-
- Descriptive Statistics
-┌────────────────────┬─┬────┬───────┬───────┬───────┐
-│ │N│Mean│Std Dev│Minimum│Maximum│
-├────────────────────┼─┼────┼───────┼───────┼───────┤
-│x │6│6.25│ 4.38│ 1│ 12│
-│Valid N (listwise) │6│ │ │ │ │
-│Missing N (listwise)│0│ │ │ │ │
-└────────────────────┴─┴────┴───────┴───────┴───────┘
-```
+++ /dev/null
-# WEIGHT
-
-```
-WEIGHT BY VAR_NAME.
-WEIGHT OFF.
-```
-
-`WEIGHT` assigns cases varying weights, changing the frequency
-distribution of the active dataset. Execution of `WEIGHT` is delayed
-until data have been read.
-
-If a variable name is specified, `WEIGHT` causes the values of that
-variable to be used as weighting factors for subsequent statistical
-procedures. Use of keyword `BY` is optional but recommended.
-Weighting variables must be numeric. [Scratch
-variables](../../language/datasets/scratch-variables.md) may not be
-used for weighting.
-
-When `OFF` is specified, subsequent statistical procedures weight all
-cases equally.
-
-A positive integer weighting factor `W` on a case yields the same
-statistical output as would replicating the case `W` times. A
-weighting factor of 0 is treated for statistical purposes as if the
-case did not exist in the input. Weighting values need not be
-integers, but negative and system-missing values for the weighting
-variable are interpreted as weighting factors of 0. User-missing
-values are not treated specially.
-
-When `WEIGHT` is specified after [`TEMPORARY`](temporary.md), it
-affects only the next procedure.
-
-`WEIGHT` does not cause cases in the active dataset to be replicated
-in memory.
-
-## Example
-
-One could define a dataset containing an inventory of stock items. It
-would be reasonable to use a string variable for a description of the
-item, and a numeric variable for the number in stock, like in the
-syntax below.
-
-```
-data list notable list /item (a16) quantity (f8.0).
-begin data
-nuts 345
-screws 10034
-washers 32012
-bolts 876
-end data.
-
-echo 'Unweighted frequency table'.
-frequencies /variables = item /format=dfreq.
-
-weight by quantity.
-
-echo 'Weighted frequency table'.
-frequencies /variables = item /format=dfreq.
-```
-
-One analysis which most surely would be of interest is the relative
-amounts or each item in stock. However without setting a weight
-variable, [`FREQUENCIES`](../statistics/frequencies.md) does not tell
-us what we want to know, since there is only one case for each stock
-item. The output below shows the difference between the weighted and
-unweighted frequency tables.
-
-```
-Unweighted frequency table
-
- item
-┌─────────────┬─────────┬───────┬─────────────┬──────────────────┐
-│ │Frequency│Percent│Valid Percent│Cumulative Percent│
-├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Valid bolts │ 1│ 25.0%│ 25.0%│ 25.0%│
-│ nuts │ 1│ 25.0%│ 25.0%│ 50.0%│
-│ screws │ 1│ 25.0%│ 25.0%│ 75.0%│
-│ washers│ 1│ 25.0%│ 25.0%│ 100.0%│
-├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Total │ 4│ 100.0%│ │ │
-└─────────────┴─────────┴───────┴─────────────┴──────────────────┘
-
-Weighted frequency table
-
- item
-┌─────────────┬─────────┬───────┬─────────────┬──────────────────┐
-│ │Frequency│Percent│Valid Percent│Cumulative Percent│
-├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Valid washers│ 32012│ 74.0%│ 74.0%│ 74.0%│
-│ screws │ 10034│ 23.2%│ 23.2%│ 97.2%│
-│ bolts │ 876│ 2.0%│ 2.0%│ 99.2%│
-│ nuts │ 345│ .8%│ .8%│ 100.0%│
-├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Total │ 43267│ 100.0%│ │ │
-└─────────────┴─────────┴───────┴─────────────┴──────────────────┘
-```
--- /dev/null
+# SET
+
+```
+SET
+
+(data input)
+ /BLANKS={SYSMIS,'.',number}
+ /DECIMAL={DOT,COMMA}
+ /FORMAT=FMT_SPEC
+ /EPOCH={AUTOMATIC,YEAR}
+ /RIB={NATIVE,MSBFIRST,LSBFIRST}
+
+(interaction)
+ /MXERRS=MAX_ERRS
+ /MXWARNS=MAX_WARNINGS
+ /WORKSPACE=WORKSPACE_SIZE
+
+(syntax execution)
+ /LOCALE='LOCALE'
+ /MXLOOPS=MAX_LOOPS
+ /SEED={RANDOM,SEED_VALUE}
+ /UNDEFINED={WARN,NOWARN}
+ /FUZZBITS=FUZZBITS
+ /SCALEMIN=COUNT
+
+(data output)
+ /CC{A,B,C,D,E}={'NPRE,PRE,SUF,NSUF','NPRE.PRE.SUF.NSUF'}
+ /DECIMAL={DOT,COMMA}
+ /FORMAT=FMT_SPEC
+ /LEADZERO={ON,OFF}
+ /MDISPLAY={TEXT,TABLES}
+ /SMALL=NUMBER
+ /WIB={NATIVE,MSBFIRST,LSBFIRST}
+
+(output routing)
+ /ERRORS={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
+ /MESSAGES={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
+ /PRINTBACK={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
+ /RESULTS={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
+
+(output driver options)
+ /HEADERS={NO,YES,BLANK}
+ /LENGTH={NONE,N_LINES}
+ /WIDTH={NARROW,WIDTH,N_CHARACTERS}
+ /TNUMBERS={VALUES,LABELS,BOTH}
+ /TVARS={NAMES,LABELS,BOTH}
+ /TLOOK={NONE,FILE}
+
+(logging)
+ /JOURNAL={ON,OFF} ['FILE_NAME']
+
+(system files)
+ /SCOMPRESSION={ON,OFF}
+
+(miscellaneous)
+ /SAFER=ON
+ /LOCALE='STRING'
+
+(macros)
+ /MEXPAND={ON,OFF}
+ /MPRINT={ON,OFF}
+ /MITERATE=NUMBER
+ /MNEST=NUMBER
+
+(settings not yet implemented, but accepted and ignored)
+ /BASETEXTDIRECTION={AUTOMATIC,RIGHTTOLEFT,LEFTTORIGHT}
+ /BLOCK='C'
+ /BOX={'XXX','XXXXXXXXXXX'}
+ /CACHE={ON,OFF}
+ /CELLSBREAK=NUMBER
+ /COMPRESSION={ON,OFF}
+ /CMPTRANS={ON,OFF}
+ /HEADER={NO,YES,BLANK}
+```
+
+`SET` adjusts several parameters relating to PSPP's execution.
+Because this command has many subcommands, they are described below
+in groups.
+
+For subcommands that take boolean values, `ON` and `YES` are
+synonymous, as are `OFF` and `NO`, when used as subcommand values.
+
+The data input subcommands affect the way that data is read from data
+files. The data input subcommands are
+
+* `BLANKS`
+  This is the value assigned to a data item that is empty or
+  contains only white space. An argument of `SYSMIS` or `'.'` causes
+  the system-missing value to be assigned to null items. This is the
+  default. Any real value may be assigned.
+
+* <a name="decimal">`DECIMAL`</a>
+ This value may be set to `DOT` or `COMMA`. Setting it to `DOT`
+ causes the decimal point character to be `.` and the grouping
+ character to be `,`. Setting it to `COMMA` causes the decimal point
+ character to be `,` and the grouping character to be `.`. If the
+ setting is `COMMA`, then `,` is not treated as a field separator in
+ the [`DATA LIST`](data-list.md) command. The default
+ value is determined from the system locale.
+
+* <a name="format">`FORMAT`</a>
+ Changes the default numeric [input/output
+ format](../language/datasets/formats/index.md). The default is
+ initially `F8.2`.
+
+* <a name="epoch">`EPOCH`</a>
+ Specifies the range of years used when a 2-digit year is read from a
+ data file or used in a [date construction
+ expression](../language/expressions/functions/time-and-date.md#constructing-dates).
+ If a 4-digit year is specified for the epoch, then 2-digit years are
+ interpreted starting from that year, known as the epoch. If
+ `AUTOMATIC` (the default) is specified, then the epoch begins 69
+ years before the current date.
+
+* <a name="rib">`RIB`</a>
+ PSPP extension to set the byte ordering (endianness) used for
+ reading data in [`IB` or `PIB`
+ format](../language/datasets/formats/binary-and-hex.md#ib-and-pib-formats). In
+ `MSBFIRST` ordering, the most-significant byte appears at the left
+  end of an `IB` or `PIB` field.  In `LSBFIRST` ordering, the
+ least-significant byte appears at the left end. `NATIVE`, the
+ default, is equivalent to `MSBFIRST` or `LSBFIRST` depending on the
+ native format of the machine running PSPP.
+
+Interaction subcommands affect the way that PSPP interacts with an
+online user. The interaction subcommands are
+
+* `MXERRS`
+ The maximum number of errors before PSPP halts processing of the
+ current command file. The default is 50.
+
+* `MXWARNS`
+  The maximum combined number of warnings and errors before PSPP halts
+ processing the current command file. The special value of zero
+ means that all warning situations should be ignored. No warnings
+ are issued, except a single initial warning advising you that
+ warnings will not be given. The default value is 100.
+
+Syntax execution subcommands control the way that PSPP commands
+execute. The syntax execution subcommands are
+
+* `LOCALE`
+ Overrides the system locale for the purpose of reading and writing
+ syntax and data files. The argument should be a locale name in the
+ general form `LANGUAGE_COUNTRY.ENCODING`, where `LANGUAGE` and
+ `COUNTRY` are 2-character language and country abbreviations,
+ respectively, and `ENCODING` is an [IANA character set
+ name](http://www.iana.org/assignments/character-sets). Example
+ locales are `en_US.UTF-8` (UTF-8 encoded English as spoken in the
+ United States) and `ja_JP.EUC-JP` (EUC-JP encoded Japanese as spoken
+ in Japan).
+
+* <a name="mxloops">`MXLOOPS`</a>
+ The maximum number of iterations for an uncontrolled
+ [`LOOP`](loop.md), and for any [loop in the matrix
+ language](matrix.md#the-loop-and-break-commands). The default
+ `MXLOOPS` is 40.
+
+* <a name="seed">`SEED`</a>
+ The initial pseudo-random number seed. Set it to a real number or
+ to `RANDOM`, to obtain an initial seed from the current time of day.
+
+* `UNDEFINED`
+ Currently not used.
+
+* <a name="fuzzbits">`FUZZBITS`</a>
+ The maximum number of bits of errors in the least-significant places
+ to accept for rounding up a value that is almost halfway between two
+ possibilities for rounding with the
+  [`RND`](../language/expressions/functions/mathematical.md#rnd)
+  function. The default `FUZZBITS` is 6.
+
+* <a name="scalemin">`SCALEMIN`</a>
+ The minimum number of distinct valid values for PSPP to assume that
+ a variable has a scale [measurement
+ level](../language/datasets/variables.md#measurement-level).
+
+* `WORKSPACE`
+ The maximum amount of memory (in kilobytes) that PSPP uses to store
+ data being processed. If memory in excess of the workspace size is
+ required, then PSPP starts to use temporary files to store the
+ data. Setting a higher value means that procedures run faster, but
+ may cause other applications to run slower. On platforms without
+ virtual memory management, setting a very large workspace may cause
+ PSPP to abort.
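+
+For example, this sketch fixes the pseudo-random number seed for
+reproducible results and raises the limit on uncontrolled loops:
+
+```
+SET SEED=54321 MXLOOPS=100.
+```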
+
+Data output subcommands affect the format of output data. These
+subcommands are
+
+* `CCA`
+ `CCB`
+ `CCC`
+ `CCD`
+ `CCE`
+ Set up [custom currency
+ formats](../language/datasets/formats/custom-currency.md).
+
+* `DECIMAL`
+ The default `DOT` setting causes the decimal point character to be
+ `.`. A setting of `COMMA` causes the decimal point character to be
+ `,`.
+
+* `FORMAT`
+ Allows the default numeric [input/output
+ format](../language/datasets/formats/index.md) to be specified. The
+ default is `F8.2`.
+
+* <a name="leadzero">`LEADZERO`</a>
+ Controls whether numbers with magnitude less than one are displayed
+ with a zero before the decimal point. For example, with `SET
+  LEADZERO=OFF`, which is the default, one-half is shown as .5, and
+  with `SET LEADZERO=ON`, it is shown as 0.5. This setting affects
+ only the `F`, `COMMA`, and `DOT` formats.
+
+* <a name="mdisplay">`MDISPLAY`</a>
+ Controls how the [`PRINT`](matrix.md#the-print-command) command
+ within [`MATRIX`...`END MATRIX`](matrix.md) outputs matrices. With
+ the default `TEXT`, `PRINT` outputs matrices as text. Change this
+ setting to `TABLES` to instead output matrices as pivot tables.
+
+* `SMALL`
+ This controls how PSPP formats small numbers in pivot tables, in
+ cases where PSPP does not otherwise have a well-defined format for
+ the numbers. When such a number has a magnitude less than the
+ value set here, PSPP formats the number in scientific notation;
+ otherwise, it formats it in standard notation. The default is
+ 0.0001. Set a value of 0 to disable scientific notation.
+
+* <a name="wib">`WIB`</a>
+ PSPP extension to set the byte ordering (endianness) used for
+ writing data in [`IB` or `PIB`
+ format](../language/datasets/formats/binary-and-hex.md#ib-and-pib-formats).
+ In `MSBFIRST` ordering, the most-significant byte appears at the
+  left end of an `IB` or `PIB` field.  In `LSBFIRST` ordering, the
+ least-significant byte appears at the left end. `NATIVE`, the
+ default, is equivalent to `MSBFIRST` or `LSBFIRST` depending on the
+ native format of the machine running PSPP.
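+
+For instance, the following sketch defines custom currency format
+`CCA` with a `$` prefix and a `-` negative prefix, and enables
+leading zeros:
+
+```
+SET CCA='-,$,,' LEADZERO=ON.
+```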
+
+In the PSPP text-based interface, the output routing subcommands
+affect where output is sent. The following values are allowed for each
+of these subcommands:
+
+* `OFF`
+ `NONE`
+ Discard this kind of output.
+
+* `TERMINAL`
+ Write this output to the terminal, but not to listing files and
+ other output devices.
+
+* `LISTING`
+ Write this output to listing files and other output devices, but
+ not to the terminal.
+
+* `ON`
+ `BOTH`
+ Write this type of output to all output devices.
+
+These output routing subcommands are:
+
+* `ERRORS`
+ Applies to error and warning messages. The default is `BOTH`.
+
+* `MESSAGES`
+ Applies to notes. The default is `BOTH`.
+
+* `PRINTBACK`
+ Determines whether the syntax used for input is printed back as
+ part of the output. The default is `NONE`.
+
+* `RESULTS`
+ Applies to everything not in one of the above categories, such as
+ the results of statistical procedures. The default is `BOTH`.
+
+These subcommands have no effect on output in the PSPP GUI
+environment.
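+
+For example, in the text-based interface the following sketch
+discards notes and sends error messages only to the terminal:
+
+```
+SET MESSAGES=NONE ERRORS=TERMINAL.
+```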
+
+Output driver option subcommands affect output drivers' settings.
+These subcommands are:
+
+* `HEADERS`
+
+* `LENGTH`
+
+* <a name="width">`WIDTH`</a>
+
+* `TNUMBERS`
+ The `TNUMBERS` option sets the way in which values are displayed in
+ output tables. The valid settings are `VALUES`, `LABELS` and
+ `BOTH`. If `TNUMBERS` is set to `VALUES`, then all values are
+ displayed with their literal value (which for a numeric value is a
+ number and for a string value an alphanumeric string). If
+ `TNUMBERS` is set to `LABELS`, then values are displayed using their
+ assigned [value labels](value-labels.md), if any. If the value has
+ no label, then the literal value is used for display. If `TNUMBERS`
+ is set to `BOTH`, then values are displayed with both their label
+ (if any) and their literal value in parentheses.
+
+* <a name="tvars">`TVARS`</a>
+ The `TVARS` option sets the way in which variables are displayed in
+ output tables. The valid settings are `NAMES`, `LABELS` and `BOTH`.
+ If `TVARS` is set to `NAMES`, then all variables are displayed using
+ their names. If `TVARS` is set to `LABELS`, then variables are
+ displayed using their [variable label](variable-labels.md), if one
+ has been set. If no label has been set, then the name is used. If
+ `TVARS` is set to `BOTH`, then variables are displayed with both
+ their label (if any) and their name in parentheses.
+
+* <a name="tlook">`TLOOK`</a>
+ The `TLOOK` option sets the style used for subsequent table output.
+ Specifying `NONE` makes PSPP use the default built-in style.
+  Otherwise, specifying `FILE` makes PSPP search for an `.stt` or
+  `.tlo` file in the same way as specifying `--table-look=FILE` on
+  the PSPP command line.
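+
+As an illustration, this sketch makes output tables display value
+labels where they exist, and show both variable names and labels:
+
+```
+SET TNUMBERS=LABELS TVARS=BOTH.
+```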
+
+Logging subcommands affect logging of commands executed to external
+files. These subcommands are
+
+* `JOURNAL`
+ `LOG`
+ These subcommands, which are synonyms, control the journal. The
+ default is `ON`, which causes commands entered interactively to be
+ written to the journal file. Commands included from syntax files
+ that are included interactively and error messages printed by PSPP
+ are also written to the journal file, prefixed by `>`. `OFF`
+ disables use of the journal.
+
+ The journal is named `pspp.jnl` by default. A different name may
+ be specified.
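+
+For example, the following sketch directs the journal to a custom
+file (the file name is illustrative):
+
+```
+SET JOURNAL=ON 'session.jnl'.
+```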
+
+System file subcommands affect the default format of system files
+produced by PSPP. These subcommands are
+
+* <a name="scompression">`SCOMPRESSION`</a>
+ Whether system files created by `SAVE` or `XSAVE` are compressed by
+ default. The default is `ON`.
+
+Security subcommands affect the operations that commands are allowed
+to perform. The security subcommands are
+
+* <a name="safer">`SAFER`</a>
+ Setting this option disables the following operations:
+
+ - The `ERASE` command.
+ - The `HOST` command.
+ - The `PERMISSIONS` command.
+ - Pipes (file names beginning or ending with `|`).
+
+ Be aware that this setting does not guarantee safety (commands can
+ still overwrite files, for instance) but it is an improvement.
+ When set, this setting cannot be reset during the same session, for
+ obvious security reasons.
+
+* <a name="locale">`LOCALE`</a>
+ This item is used to set the default character encoding. The
+ encoding may be specified either as an [IANA encoding name or
+ alias](http://www.iana.org/assignments/character-sets), or as a
+ locale name. If given as a locale name, only the character encoding
+ of the locale is relevant.
+
+ System files written by PSPP use this encoding. System files read
+ by PSPP, for which the encoding is unknown, are interpreted using
+ this encoding.
+
+  The full list of valid encodings and locale names/aliases is
+  operating system dependent. The following are all examples of
+ acceptable syntax on common GNU/Linux systems.
+
+ ```
+ SET LOCALE='iso-8859-1'.
+
+ SET LOCALE='ru_RU.cp1251'.
+
+ SET LOCALE='japanese'.
+ ```
+
+ Contrary to intuition, this command does not affect any aspect of
+ the system's locale.
+
+The following subcommands affect the interpretation of macros. For
+more information, see [Macro Settings](define.md#macro-settings).
+
+* <a name="mexpand">`MEXPAND`</a>
+ Controls whether macros are expanded. The default is `ON`.
+
+* <a name="mprint">`MPRINT`</a>
+ Controls whether the expansion of macros is included in output.
+ This is separate from whether command syntax in general is included
+ in output. The default is `OFF`.
+
+* <a name="miterate">`MITERATE`</a>
+ Limits the number of iterations executed in
+ [`!DO`](define.md#macro-loops) loops within macros. This does not
+ affect other language constructs such as [`LOOP`…`END
+ LOOP`](loop.md). This must be set to a positive integer. The
+ default is 1000.
+
+* <a name="mnest">`MNEST`</a>
+ Limits the number of levels of nested macro expansions. This must
+ be set to a positive integer. The default is 50.
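+
+For example, while debugging a macro one might include its expansion
+in the output, as in this sketch:
+
+```
+SET MPRINT=ON.
+```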
+
+The following subcommands are not yet implemented, but PSPP accepts
+them and ignores the settings:
+
+* `BASETEXTDIRECTION`
+* `BLOCK`
+* `BOX`
+* `CACHE`
+* `CELLSBREAK`
+* `COMPRESSION`
+* `CMPTRANS`
+* `HEADER`
+
--- /dev/null
+# SHOW
+
+```
+SHOW
+ [ALL]
+ [BLANKS]
+ [CC]
+ [CCA]
+ [CCB]
+ [CCC]
+ [CCD]
+ [CCE]
+ [COPYING]
+ [DECIMAL]
+ [DIRECTORY]
+ [ENVIRONMENT]
+ [FORMAT]
+ [FUZZBITS]
+ [LENGTH]
+ [MEXPAND]
+ [MPRINT]
+ [MITERATE]
+ [MNEST]
+ [MXERRS]
+ [MXLOOPS]
+ [MXWARNS]
+ [N]
+ [SCOMPRESSION]
+ [SYSTEM]
+ [TEMPDIR]
+ [UNDEFINED]
+ [VERSION]
+ [WARRANTY]
+ [WEIGHT]
+ [WIDTH]
+```
+
+`SHOW` displays PSPP's settings and status. Any parameter that can
+be changed using [`SET`](set.md) can be examined using `SHOW` with
+the subcommand of the same name. `SHOW` also supports the following
+additional subcommands:
+
+* `ALL`
+ Show all settings.
+* `CC`
+ Show all custom currency settings (`CCA` through `CCE`).
+* `DIRECTORY`
+ Shows the current working directory.
+* `ENVIRONMENT`
+ Shows the operating system details.
+* `N`
+ Reports the number of cases in the active dataset. The reported
+ number is not weighted. If no dataset is defined, then `Unknown`
+ is reported.
+* `SYSTEM`
+ Shows information about how PSPP was built. This information is
+ useful in bug reports.
+* `TEMPDIR`
+ Shows the path of the directory where temporary files are stored.
+* `VERSION`
+ Shows the version of this installation of PSPP.
+* `WARRANTY`
+ Show details of the lack of warranty for PSPP.
+* `COPYING` or `LICENSE`
+ Display the terms of [PSPP's copyright licence](../license.md).
+
+Specifying `SHOW` without any subcommands is equivalent to `SHOW
+ALL`.
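+
+For example, this sketch displays the current default numeric format
+and the number of cases in the active dataset:
+
+```
+SHOW FORMAT N.
+```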
+
--- /dev/null
+# SORT CASES
+
+```
+SORT CASES BY VAR_LIST[({D|A})] [ VAR_LIST[({D|A})] ] ...
+```
+
+`SORT CASES` sorts the active dataset by the values of one or more
+variables.
+
+Specify `BY` and a list of variables to sort by. By default,
+variables are sorted in ascending order. To override sort order,
+specify `(D)` or `(DOWN)` after a list of variables to get descending
+order, or `(A)` or `(UP)` for ascending order. These apply to all the
+listed variables up until the preceding `(A)`, `(D)`, `(UP)` or
+`(DOWN)`.
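+
+For example, in the following sketch (with illustrative variable
+names) `region` is sorted in ascending order, while `income` and
+`age` are both sorted in descending order, because the `(D)` applies
+to every listed variable back to the preceding `(A)`:
+
+```
+SORT CASES BY region (A) income age (D).
+```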
+
+`SORT CASES` performs a stable sort, meaning that records with equal
+values of the sort variables have the same relative order before and
+after sorting. Thus, re-sorting an already sorted file does not
+affect the ordering of cases.
+
+`SORT CASES` is a procedure. It causes the data to be read.
+
+`SORT CASES` attempts to sort the entire active dataset in main
+memory. If workspace is exhausted, it falls back to a merge sort
+algorithm which creates numerous temporary files.
+
+`SORT CASES` may not be specified following `TEMPORARY`.
+
+## Example
+
+In the syntax below, the data from the file `physiology.sav` are
+sorted by two variables: sex in descending order and temperature in
+ascending order.
+
+```
+get file='physiology.sav'.
+sort cases by sex (D) temperature(A).
+list.
+```
+
+In the output below, you can see that all the cases with a sex of
+`1` (female) appear before those with a sex of `0` (male). This is
+because they have been sorted in descending order. Within each sex,
+the data is sorted on the temperature variable, this time in ascending
+order.
+
+```
+ Data List
+┌───┬──────┬──────┬───────────┐
+│sex│height│weight│temperature│
+├───┼──────┼──────┼───────────┤
+│ 1│ 1606│ 56.1│ 34.56│
+│ 1│ 179│ 56.3│ 35.15│
+│ 1│ 1609│ 55.4│ 35.46│
+│ 1│ 1606│ 56.0│ 36.06│
+│ 1│ 1607│ 56.3│ 36.26│
+│ 1│ 1604│ 56.0│ 36.57│
+│ 1│ 1604│ 56.6│ 36.81│
+│ 1│ 1606│ 56.3│ 36.88│
+│ 1│ 1604│ 57.8│ 37.32│
+│ 1│ 1598│ 55.6│ 37.37│
+│ 1│ 1607│ 55.9│ 37.84│
+│ 1│ 1605│ 54.5│ 37.86│
+│ 1│ 1603│ 56.1│ 38.80│
+│ 1│ 1604│ 58.1│ 38.85│
+│ 1│ 1605│ 57.7│ 38.98│
+│ 1│ 1709│ 55.6│ 39.45│
+│ 1│ 1604│ -55.6│ 39.72│
+│ 1│ 1601│ 55.9│ 39.90│
+│ 0│ 1799│ 90.3│ 32.59│
+│ 0│ 1799│ 89.0│ 33.61│
+│ 0│ 1799│ 90.6│ 34.04│
+│ 0│ 1801│ 90.5│ 34.42│
+│ 0│ 1802│ 87.7│ 35.03│
+│ 0│ 1793│ 90.1│ 35.11│
+│ 0│ 1801│ 92.1│ 35.98│
+│ 0│ 1800│ 89.5│ 36.10│
+│ 0│ 1645│ 92.1│ 36.68│
+│ 0│ 1698│ 90.2│ 36.94│
+│ 0│ 1800│ 89.6│ 37.02│
+│ 0│ 1800│ 88.9│ 37.03│
+│ 0│ 1801│ 88.9│ 37.12│
+│ 0│ 1799│ 90.4│ 37.33│
+│ 0│ 1903│ 91.5│ 37.52│
+│ 0│ 1799│ 90.9│ 37.53│
+│ 0│ 1800│ 91.0│ 37.60│
+│ 0│ 1799│ 90.4│ 37.68│
+│ 0│ 1801│ 91.7│ 38.98│
+│ 0│ 1801│ 90.9│ 39.03│
+│ 0│ 1799│ 89.3│ 39.77│
+│ 0│ 1884│ 88.6│ 39.97│
+└───┴──────┴──────┴───────────┘
+```
+
+`SORT CASES` affects only the active file. It does not have any
+effect upon the `physiology.sav` file itself. For that, you would
+have to rewrite the file using the [`SAVE`](save.md) command.
--- /dev/null
+# SORT VARIABLES
+
+`SORT VARIABLES` reorders the variables in the active dataset's
+dictionary according to a chosen sort key.
+
+```
+SORT VARIABLES [BY]
+ (NAME | TYPE | FORMAT | LABEL | VALUES | MISSING | MEASURE
+ | ROLE | COLUMNS | ALIGNMENT | ATTRIBUTE NAME)
+ [(D)].
+```
+
+The main specification is one of the following identifiers, which
+determines how the variables are sorted:
+
+* `NAME`
+ Sorts the variables according to their names, in a case-insensitive
+ fashion. However, when variable names differ only in a number at
+ the end, they are sorted numerically. For example, `VAR5` is
+ sorted before `VAR400` even though `4` precedes `5`.
+
+* `TYPE`
+ Sorts numeric variables before string variables, and shorter string
+ variables before longer ones.
+
+* `FORMAT`
+ Groups variables by print format; within a format, sorts narrower
+ formats before wider ones; with the same format and width, sorts
+ fewer decimal places before more decimal places. See [`PRINT
+ FORMATS`](print-formats.md).
+
+* `LABEL`
+ Sorts variables without a variable label before those with one.
+ See [VARIABLE LABELS](variable-labels.md).
+
+* `VALUES`
+ Sorts variables without value labels before those with some. See
+ [VALUE LABELS](value-labels.md).
+
+* `MISSING`
+ Sorts variables without missing values before those with some. See
+ [MISSING VALUES](missing-values.md).
+
+* `MEASURE`
+ Sorts nominal variables first, followed by ordinal variables,
+ followed by scale variables. See [VARIABLE
+ LEVEL](variable-level.md).
+
+* `ROLE`
+ Groups variables according to their role. See [VARIABLE
+ ROLE](variable-role.md).
+
+* `COLUMNS`
+ Sorts variables in ascending display width. See [VARIABLE
+ WIDTH](variable-width.md).
+
+* `ALIGNMENT`
+ Sorts variables according to their alignment, first left-aligned,
+ then right-aligned, then centered. See [VARIABLE
+ ALIGNMENT](variable-alignment.md).
+
+* `ATTRIBUTE NAME`
+ Sorts variables according to the first value of their `NAME`
+ attribute. Variables without attributes are sorted first. See
+ [VARIABLE ATTRIBUTE](variable-attribute.md).
+
+Only one sort criterion can be specified. The sort is "stable," so to
+sort on multiple criteria one may perform multiple sorts. For
+example, the following will sort primarily based on alignment, with
+variables that have the same alignment ordered based on display width:
+
+```
+SORT VARIABLES BY COLUMNS.
+SORT VARIABLES BY ALIGNMENT.
+```
+
+Specify `(D)` to reverse the sort order.
+
--- /dev/null
+# SPLIT FILE
+
+```
+SPLIT FILE [{LAYERED, SEPARATE}] BY VAR_LIST.
+SPLIT FILE OFF.
+```
+
+`SPLIT FILE` allows multiple sets of data present in one data file to
+be analyzed separately using single statistical procedure commands.
+
+Specify a list of variable names to analyze multiple sets of data
+separately. Groups of adjacent cases having the same values for these
+variables are analyzed by statistical procedure commands as one group.
+An independent analysis is carried out for each group of cases, and the
+variable values for the group are printed along with the analysis.
+
+When a list of variable names is specified, one of the keywords
+`LAYERED` or `SEPARATE` may also be specified. With `LAYERED`, which
+is the default, the separate analyses for each group are presented
+together in a single table. With `SEPARATE`, each analysis is
+presented in a separate table. Not all procedures honor the
+distinction.
+
+Groups are formed only by _adjacent_ cases. To create a split using a
+variable where like values are not adjacent in the working file, first
+[sort the data](sort-cases.md) by that variable.
+
+Specify `OFF` to disable `SPLIT FILE` and resume analysis of the
+entire active dataset as a single group of data.
+
+When `SPLIT FILE` is specified after [`TEMPORARY`](temporary.md), it
+affects only the next procedure.
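+
+For instance, this sketch applies a split to a single procedure only,
+after which the whole dataset is again analyzed as one group:
+
+```
+temporary.
+split file by treatment.
+descriptives /variables=yield.
+```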
+
+## Example
+
+The file `horticulture.sav` contains data describing the yield of a
+number of horticultural specimens which have been subjected to various
+treatments. One way to investigate linear statistics of the yield is
+to use [`DESCRIPTIVES`](descriptives.md).
+However, it is reasonable to expect the mean to be different depending
+on the treatment. So we might want to perform three separate
+procedures -- one for each treatment.[^1] The following syntax shows
+how this can be done automatically using the `SPLIT FILE` command.
+
+[^1]: There are other, possibly better, ways to achieve a similar
+result using the `MEANS` or `EXAMINE` commands.
+
+```
+get file='horticulture.sav'.
+
+* Ensure cases are sorted before splitting.
+sort cases by treatment.
+
+split file by treatment.
+
+* Run descriptives on the yield variable
+descriptives /variable = yield.
+```
+
+In the following output, you can see that the table of descriptive
+statistics appears three times, once for each value of treatment. In
+this example `N`, the number of observations, is identical in all
+splits because the experiment was deliberately designed that way.
+However, in general one can expect a different `N` for each split.
+
+```
+ Split Values
+┌─────────┬───────┐
+│Variable │ Value │
+├─────────┼───────┤
+│treatment│control│
+└─────────┴───────┘
+
+ Descriptive Statistics
+┌────────────────────┬──┬─────┬───────┬───────┬───────┐
+│ │ N│ Mean│Std Dev│Minimum│Maximum│
+├────────────────────┼──┼─────┼───────┼───────┼───────┤
+│yield │30│51.23│ 8.28│ 37.86│ 68.59│
+│Valid N (listwise) │30│ │ │ │ │
+│Missing N (listwise)│ 0│ │ │ │ │
+└────────────────────┴──┴─────┴───────┴───────┴───────┘
+
+ Split Values
+┌─────────┬────────────┐
+│Variable │ Value │
+├─────────┼────────────┤
+│treatment│conventional│
+└─────────┴────────────┘
+
+ Descriptive Statistics
+┌────────────────────┬──┬─────┬───────┬───────┬───────┐
+│ │ N│ Mean│Std Dev│Minimum│Maximum│
+├────────────────────┼──┼─────┼───────┼───────┼───────┤
+│yield │30│53.57│ 8.92│ 36.30│ 70.66│
+│Valid N (listwise) │30│ │ │ │ │
+│Missing N (listwise)│ 0│ │ │ │ │
+└────────────────────┴──┴─────┴───────┴───────┴───────┘
+
+ Split Values
+┌─────────┬───────────┐
+│Variable │ Value │
+├─────────┼───────────┤
+│treatment│traditional│
+└─────────┴───────────┘
+
+ Descriptive Statistics
+┌────────────────────┬──┬─────┬───────┬───────┬───────┐
+│ │ N│ Mean│Std Dev│Minimum│Maximum│
+├────────────────────┼──┼─────┼───────┼───────┼───────┤
+│yield │30│56.87│ 8.88│ 39.08│ 75.93│
+│Valid N (listwise) │30│ │ │ │ │
+│Missing N (listwise)│ 0│ │ │ │ │
+└────────────────────┴──┴─────┴───────┴───────┴───────┘
+```
+
+Example 13.3: The results of running `DESCRIPTIVES` with an active split
+
+Unless `TEMPORARY` was used, a split, once defined for a dataset,
+remains active until it is explicitly disabled.
+
--- /dev/null
+# Working with SPSS Data Files
+
+These commands read and write data files in SPSS and other proprietary
+or specialized data formats.
+++ /dev/null
-# APPLY DICTIONARY
-
-```
-APPLY DICTIONARY FROM={'FILE_NAME',FILE_HANDLE}.
-```
-
-`APPLY DICTIONARY` applies the variable labels, value labels, and
-missing values taken from a file to corresponding variables in the
-active dataset. In some cases it also updates the weighting variable.
-
-The `FROM` clause is mandatory. Use it to specify a system file or
-portable file's name in single quotes, or a [file handle
-name](../../language/files/file-handles.md). The dictionary in the
-file is read, but it does not replace the active dataset's dictionary.
-The file's data is not read.
-
-Only variables with names that exist in both the active dataset and
-the system file are considered. Variables with the same name but
-different types (numeric, string) cause an error message. Otherwise,
-the system file variables' attributes replace those in their matching
-active dataset variables:
-
-- If a system file variable has a variable label, then it replaces
- the variable label of the active dataset variable. If the system
- file variable does not have a variable label, then the active
- dataset variable's variable label, if any, is retained.
-
-- If the system file variable has [variable
- attributes](../variables/variable-attribute.md), then those
- attributes replace the active dataset variable's variable
-  attributes.  If the system file variable does not have variable
-  attributes, then the active dataset variable's variable attributes,
-  if any, are retained.
-
-- If the active dataset variable is numeric or short string, then
- value labels and missing values, if any, are copied to the active
- dataset variable. If the system file variable does not have value
- labels or missing values, then those in the active dataset
- variable, if any, are not disturbed.
-
-In addition to properties of variables, some properties of the active
-file dictionary as a whole are updated:
-
-- If the system file has custom attributes (see [DATAFILE
-  ATTRIBUTE](../../commands/data-io/datafile-attribute.md)), then
-  those attributes replace the active dataset's custom attributes.
-
-- If the active dataset has a [weight
- variable](../selection/weight.md), and the system file does not, or
- if the weighting variable in the system file does not exist in the
- active dataset, then the active dataset weighting variable, if any,
- is retained. Otherwise, the weighting variable in the system file
- becomes the active dataset weighting variable.
-
-`APPLY DICTIONARY` takes effect immediately. It does not read the
-active dataset. The system file is not modified.
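-
-As an illustrative sketch (the file names here are hypothetical), the
-following reads `data.sav` into the active dataset and then copies
-variable labels, value labels, and missing values onto its matching
-variables from a previously saved file `labels.sav`:
-
-```
-GET FILE='data.sav'.
-APPLY DICTIONARY FROM='labels.sav'.
-```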
-
+++ /dev/null
-# EXPORT
-
-```
-EXPORT
- /OUTFILE='FILE_NAME'
- /UNSELECTED={RETAIN,DELETE}
- /DIGITS=N
- /DROP=VAR_LIST
- /KEEP=VAR_LIST
- /RENAME=(SRC_NAMES=TARGET_NAMES)...
- /TYPE={COMM,TAPE}
- /MAP
-```
-
- The `EXPORT` procedure writes the active dataset's dictionary and
-data to a specified portable file.
-
- `UNSELECTED` controls whether cases excluded with
-[`FILTER`](../selection/filter.md) are written to the file. These can
-be excluded by specifying `DELETE` on the `UNSELECTED` subcommand.
-The default is `RETAIN`.
-
- Portable files express real numbers in base 30. Integers are
-always expressed to the maximum precision needed to make them exact.
-Non-integers are, by default, expressed to the machine's maximum
-natural precision (approximately 15 decimal digits on many machines).
-If many numbers require this many digits, the portable file may
-significantly increase in size. As an alternative, the `DIGITS`
-subcommand may be used to specify the number of decimal digits of
-precision to write. `DIGITS` applies only to non-integers.
-
- The `OUTFILE` subcommand, which is the only required subcommand,
-specifies the portable file to be written as a file name string or a
-[file handle](../../language/files/file-handles.md).
-
-`DROP`, `KEEP`, and `RENAME` have the same syntax and meaning as for
-the [`SAVE`](save.md) command.
-
- The `TYPE` subcommand specifies the character set for use in the
-portable file. Its value is currently not used.
-
- The `MAP` subcommand is currently ignored.
-
- `EXPORT` is a procedure. It causes the active dataset to be read.
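-
-For example, the following hypothetical syntax writes the active
-dataset to a portable file, limiting non-integer values to 6 decimal
-digits of precision to keep the file small:
-
-```
-EXPORT /OUTFILE='data.por' /DIGITS=6.
-```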
-
+++ /dev/null
-# GET DATA
-
-```
-GET DATA
- /TYPE={GNM,ODS,PSQL,TXT}
- ...additional subcommands depending on TYPE...
-```
-
- The `GET DATA` command is used to read files and other data sources
-created by other applications. When this command is executed, the
-current dictionary and active dataset are replaced with variables and
-data read from the specified source.
-
- The `TYPE` subcommand is mandatory and must be the first subcommand
-specified. It determines the type of the file or source to read.
-PSPP currently supports the following `TYPE`s:
-
-* `GNM`
- Spreadsheet files created by Gnumeric (<http://gnumeric.org>).
-
-* `ODS`
- Spreadsheet files in OpenDocument format
- (<http://opendocumentformat.org>).
-
-* `PSQL`
- Relations from PostgreSQL databases (<http://postgresql.org>).
-
-* `TXT`
- Textual data files in columnar and delimited formats.
-
-Each supported file type has additional subcommands, explained in
-separate sections below.
-
-## Spreadsheet Files
-
-```
-GET DATA /TYPE={GNM, ODS}
- /FILE={'FILE_NAME'}
- /SHEET={NAME 'SHEET_NAME', INDEX N}
- /CELLRANGE={RANGE 'RANGE', FULL}
- /READNAMES={ON, OFF}
- /ASSUMEDSTRWIDTH=N.
-```
-
-`GET DATA` can read Gnumeric spreadsheets (<http://gnumeric.org>), and
-spreadsheets in OpenDocument format
-(<http://libreplanet.org/wiki/Group:OpenDocument/Software>). Use the
-`TYPE` subcommand to indicate the file's format. `/TYPE=GNM`
-indicates Gnumeric files, `/TYPE=ODS` indicates OpenDocument. The
-`FILE` subcommand is mandatory.  Use it to specify the name of the
-file to be read.  All other subcommands are optional.
-
- The format of each variable is determined by the format of the
-spreadsheet cell containing the first datum for the variable. If this
-cell is of string (text) format, then the width of the variable is
-determined from the length of the string it contains, unless the
-`ASSUMEDSTRWIDTH` subcommand is given.
-
- The `SHEET` subcommand specifies the sheet within the spreadsheet
-file to read. There are two forms of the `SHEET` subcommand. In the
-first form, `/SHEET=name SHEET_NAME`, the string SHEET_NAME is the name
-of the sheet to read.  In the second form, `/SHEET=index IDX`, IDX is an
-integer giving the index of the sheet to read.  The first sheet has
-the index 1. If the `SHEET` subcommand is omitted, then the command
-reads the first sheet in the file.
-
- The `CELLRANGE` subcommand specifies the range of cells within the
-sheet to read. If the subcommand is given as `/CELLRANGE=FULL`, then
-the entire sheet is read. To read only part of a sheet, use the form
-`/CELLRANGE=range 'TOP_LEFT_CELL:BOTTOM_RIGHT_CELL'`. For example,
-the subcommand `/CELLRANGE=range 'C3:P19'` reads columns C-P and rows
-3-19, inclusive. Without the `CELLRANGE` subcommand, the entire sheet
-is read.
-
- If `/READNAMES=ON` is specified, then the contents of cells of the
-first row are used as the names of the variables in which to store the
-data from subsequent rows. This is the default. If `/READNAMES=OFF` is
-used, then the variables receive automatically assigned names.
-
- The `ASSUMEDSTRWIDTH` subcommand specifies the maximum width of
-string variables read from the file. If omitted, the default value is
-determined from the length of the string in the first spreadsheet cell
-for each variable.
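-
-### Example
-
-The following sketch (the file and sheet names are hypothetical) reads
-the cell range `A1:E50` from a sheet named `Expenses` in an
-OpenDocument spreadsheet, taking variable names from the first row:
-
-```
-GET DATA /TYPE=ODS
-         /FILE='accounts.ods'
-         /SHEET=NAME 'Expenses'
-         /CELLRANGE=RANGE 'A1:E50'
-         /READNAMES=ON.
-```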
-
-## Postgres Database Queries
-
-```
-GET DATA /TYPE=PSQL
- /CONNECT={CONNECTION INFO}
- /SQL={QUERY}
- [/ASSUMEDSTRWIDTH=W]
- [/UNENCRYPTED]
- [/BSIZE=N].
-```
-
- `GET DATA /TYPE=PSQL` imports data from a local or remote Postgres
-database server. It automatically creates variables based on the table
-column names or the names specified in the SQL query. PSPP cannot
-support the full precision of some Postgres data types, so data of those
-types will lose some precision when PSPP imports them. PSPP does not
-support all Postgres data types. If PSPP cannot support a datum, `GET
-DATA` issues a warning and substitutes the system-missing value.
-
- The `CONNECT` subcommand specifies a connection string giving the
-parameters of the database server from which the data should be
-fetched.  The format of
-the string is given [in the Postgres
-manual](http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT).
-
- The `SQL` subcommand must be a valid SQL statement to retrieve data
-from the database.
-
- The `ASSUMEDSTRWIDTH` subcommand specifies the maximum width of
-string variables read from the database. If omitted, the default value
-is determined from the length of the string in the first value read for
-each variable.
-
- The `UNENCRYPTED` subcommand allows data to be retrieved over an
-insecure connection. If the connection is not encrypted, and the
-`UNENCRYPTED` subcommand is not given, then an error occurs. Whether or
-not the connection is encrypted depends upon the underlying psql library
-and the capabilities of the database server.
-
- The `BSIZE` subcommand serves only to optimise the speed of data
-transfer. It specifies an upper limit on number of cases to fetch from
-the database at once. The default value is 4096. If your SQL statement
-fetches a large number of cases but only a small number of variables,
-then the data transfer may be faster if you increase this value.
-Conversely, if the number of variables is large, or if the machine on
-which PSPP is running has only a small amount of memory, then a smaller
-value is probably better.
-
-### Example
-
-```
-GET DATA /TYPE=PSQL
- /CONNECT='host=example.com port=5432 dbname=product user=fred passwd=xxxx'
- /SQL='select * from manufacturer'.
-```
-
-## Textual Data Files
-
-```
-GET DATA /TYPE=TXT
- /FILE={'FILE_NAME',FILE_HANDLE}
- [ENCODING='ENCODING']
- [/ARRANGEMENT={DELIMITED,FIXED}]
- [/FIRSTCASE={FIRST_CASE}]
- [/IMPORTCASES=...]
- ...additional subcommands depending on ARRANGEMENT...
-```
-
- When `TYPE=TXT` is specified, `GET DATA` reads data in a delimited
-or fixed columnar format, much like [`DATA
-LIST`](../../commands/data-io/data-list.md).
-
- The `FILE` subcommand must specify the file to be read as a string
-file name or (for textual data only) a [file
-handle](../../language/files/file-handles.md).
-
- The `ENCODING` subcommand specifies the character encoding of the
-file to be read. See [`INSERT`](../utilities/insert.md), for
-information on supported encodings.
-
- The `ARRANGEMENT` subcommand determines the file's basic format.
-`DELIMITED`, the default setting, specifies that fields in the input data
-are separated by spaces, tabs, or other user-specified delimiters.
-`FIXED` specifies that fields in the input data appear at particular fixed
-column positions within records of a case.
-
- By default, cases are read from the input file starting from the
-first line. To skip lines at the beginning of an input file, set
-`FIRSTCASE` to the number of the first line to read: 2 to skip the
-first line, 3 to skip the first two lines, and so on.
-
- `IMPORTCASES` is ignored, for compatibility. Use [`N OF
-CASES`](../selection/n.md) to limit the number of cases read from a
-file, or [`SAMPLE`](../selection/sample.md) to obtain a random sample
-of cases.
-
- The remaining subcommands apply only to one of the two file
-arrangements, described below.
-
-### Delimited Data
-
-```
-GET DATA /TYPE=TXT
- /FILE={'FILE_NAME',FILE_HANDLE}
- [/ARRANGEMENT={DELIMITED,FIXED}]
- [/FIRSTCASE={FIRST_CASE}]
- [/IMPORTCASE={ALL,FIRST MAX_CASES,PERCENT PERCENT}]
-
- /DELIMITERS="DELIMITERS"
-        [/QUALIFIER="QUOTES"]
- [/DELCASE={LINE,VARIABLES N_VARIABLES}]
- /VARIABLES=DEL_VAR1 [DEL_VAR2]...
-where each DEL_VAR takes the form:
- variable format
-```
-
- The `GET DATA` command with `TYPE=TXT` and `ARRANGEMENT=DELIMITED`
-reads input data from text files in delimited format, where fields are
-separated by a set of user-specified delimiters. Its capabilities are
-similar to those of [`DATA LIST
-FREE`](../../commands/data-io/data-list.md#data-list-free), with a few
-enhancements.
-
- The required `FILE` subcommand and optional `FIRSTCASE` and
-`IMPORTCASE` subcommands are described [above](#textual-data-files).
-
- `DELIMITERS`, which is required, specifies the set of characters that
-may separate fields. Each character in the string specified on
-`DELIMITERS` separates one field from the next. The end of a line also
-separates fields, regardless of `DELIMITERS`. Two consecutive
-delimiters in the input yield an empty field, as does a delimiter at the
-end of a line. A space character as a delimiter is an exception:
-consecutive spaces do not yield an empty field and neither does any
-number of spaces at the end of a line.
-
- To use a tab as a delimiter, specify `\t` at the beginning of the
-`DELIMITERS` string. To use a backslash as a delimiter, specify `\\` as
-the first delimiter or, if a tab should also be a delimiter, immediately
-following `\t`. To read a data file in which each field appears on a
-separate line, specify the empty string for `DELIMITERS`.
-
- The optional `QUALIFIER` subcommand names one or more characters that
-can be used to quote values within fields in the input. A field that
-begins with one of the specified quote characters ends at the next
-matching quote. Intervening delimiters become part of the field,
-instead of terminating it. The ability to specify more than one quote
-character is a PSPP extension.
-
- The character specified on `QUALIFIER` can be embedded within a field
-that it quotes by doubling the qualifier. For example, if `'` is
-specified on `QUALIFIER`, then `'a''b'` specifies a field that contains
-`a'b`.
-
- The `DELCASE` subcommand controls how data may be broken across
-lines in the data file. With `LINE`, the default setting, each line
-must contain all the data for exactly one case. For additional
-flexibility, to allow a single case to be split among lines or
-multiple cases to be contained on a single line, specify `VARIABLES
-n_variables`, where `n_variables` is the number of variables per case.
-
- The `VARIABLES` subcommand is required and must be the last
-subcommand. Specify the name of each variable and its [input
-format](../../language/datasets/formats/index.md), in the order they
-should be read from the input file.
-
-#### Example 1
-
-On a Unix-like system, the `/etc/passwd` file has a format similar to
-this:
-
-```
-root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
-blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
-john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
-jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
-```
-
-The following syntax reads a file in the format used by `/etc/passwd`:
-
-```
-GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
- /VARIABLES=username A20
- password A40
- uid F10
- gid F10
- gecos A40
- home A40
- shell A40.
-```
-
-#### Example 2
-
-Consider the following data on used cars:
-
-```
-model year mileage price type age
-Civic 2002 29883 15900 Si 2
-Civic 2003 13415 15900 EX 1
-Civic 1992 107000 3800 n/a 12
-Accord 2002 26613 17900 EX 1
-```
-
-The following syntax can be used to read the used car data:
-
-```
-GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
- /VARIABLES=model A8
- year F4
- mileage F6
- price F5
- type A4
- age F2.
-```
-
-#### Example 3
-
-Consider the following information on animals in a pet store:
-
-```
-'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
-, (Years), , , (Dollars), ,
-"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
-"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
-"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
-"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
-```
-
-The following syntax can be used to read the pet store data:
-
-```
-GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
- /FIRSTCASE=3
- /VARIABLES=name A10
- age F3.1
- color A5
- received EDATE10
- price F5.2
- height a5
- type a10.
-```
-
-### Fixed Columnar Data
-
-```
-GET DATA /TYPE=TXT
- /FILE={'file_name',FILE_HANDLE}
- [/ARRANGEMENT={DELIMITED,FIXED}]
- [/FIRSTCASE={FIRST_CASE}]
- [/IMPORTCASE={ALL,FIRST MAX_CASES,PERCENT PERCENT}]
-
- [/FIXCASE=N]
- /VARIABLES FIXED_VAR [FIXED_VAR]...
- [/rec# FIXED_VAR [FIXED_VAR]...]...
-where each FIXED_VAR takes the form:
- VARIABLE START-END FORMAT
-```
-
- The `GET DATA` command with `TYPE=TXT` and `ARRANGEMENT=FIXED`
-reads input data from text files in fixed format, where each field is
-located in particular fixed column positions within records of a case.
-Its capabilities are similar to those of [`DATA LIST
-FIXED`](../../commands/data-io/data-list.md#data-list-fixed), with a
-few enhancements.
-
- The required `FILE` subcommand and optional `FIRSTCASE` and
-`IMPORTCASE` subcommands are described [above](#textual-data-files).
-
- The optional `FIXCASE` subcommand may be used to specify the positive
-integer number of input lines that make up each case. The default value
-is 1.
-
- The `VARIABLES` subcommand, which is required, specifies the
-positions at which each variable can be found. For each variable,
-specify its name, followed by its start and end column separated by `-`
-(e.g. `0-9`), followed by an input format type (e.g. `F`) or a full
-format specification (e.g. `DOLLAR12.2`). For this command, columns are
-numbered starting from 0 at the left column. Introduce the variables in
-the second and later lines of a case by a slash followed by the number
-of the line within the case, e.g. `/2` for the second line.
-
-#### Example
-
-Consider the following data on used cars:
-
-```
-model year mileage price type age
-Civic 2002 29883 15900 Si 2
-Civic 2003 13415 15900 EX 1
-Civic 1992 107000 3800 n/a 12
-Accord 2002 26613 17900 EX 1
-```
-
-The following syntax can be used to read the used car data:
-
-```
-GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
- /VARIABLES=model 0-7 A
- year 8-15 F
- mileage 16-23 F
- price 24-31 F
- type 32-40 A
- age 40-47 F.
-```
+++ /dev/null
-# GET
-
-```
-GET
- /FILE={'FILE_NAME',FILE_HANDLE}
- /DROP=VAR_LIST
- /KEEP=VAR_LIST
- /RENAME=(SRC_NAMES=TARGET_NAMES)...
- /ENCODING='ENCODING'
-```
-
- `GET` clears the current dictionary and active dataset and replaces
-them with the dictionary and data from a specified file.
-
- The `FILE` subcommand is the only required subcommand. Specify the
-SPSS system file, SPSS/PC+ system file, or SPSS portable file to be
-read as a string file name or a [file
-handle](../../language/files/file-handles.md).
-
- By default, all the variables in a file are read. The `DROP`
-subcommand can be used to specify a list of variables that are not to
-be read. By contrast, the `KEEP` subcommand can be used to specify
-variables that are to be read, with all other variables not read.
-
- Normally variables in a file retain the names that they were saved
-under. Use the `RENAME` subcommand to change these names. Specify,
-within parentheses, a list of variable names followed by an equals sign
-(`=`) and the names that they should be renamed to. Multiple
-parenthesized groups of variable names can be included on a single
-`RENAME` subcommand. Variables' names may be swapped using a `RENAME`
-subcommand of the form `/RENAME=(A B=B A)`.
-
- Alternate syntax for the `RENAME` subcommand allows the parentheses
-to be omitted. When this is done, only a single variable may be
-renamed at once. For instance, `/RENAME=A=B`. This alternate syntax
-is discouraged.
-
- `DROP`, `KEEP`, and `RENAME` are executed in left-to-right order.
-Each may be present any number of times. `GET` never modifies a file on
-disk. Only the active dataset read from the file is affected by these
-subcommands.
-
- PSPP automatically detects the encoding of string data in the file,
-when possible. The character encoding of old SPSS system files cannot
-always be guessed correctly, and SPSS/PC+ system files do not include
-any indication of their encoding. Specify the `ENCODING` subcommand
-with an IANA character set name as its string argument to override the
-default. Use `SYSFILE INFO` to analyze the encodings that might be
-valid for a system file. The `ENCODING` subcommand is a PSPP extension.
-
- `GET` does not cause the data to be read, only the dictionary. The
-data is read later, when a procedure is executed.
-
- Use of `GET` to read a portable file is a PSPP extension.
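-
-As an illustrative sketch (variable and file names are hypothetical),
-the following reads a system file, omitting two scratch variables and
-renaming another:
-
-```
-GET FILE='survey.sav'
-    /DROP=tmp1 tmp2
-    /RENAME=(q1=satisfaction).
-```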
-
+++ /dev/null
-# IMPORT
-
-```
-IMPORT
- /FILE='FILE_NAME'
- /TYPE={COMM,TAPE}
- /DROP=VAR_LIST
- /KEEP=VAR_LIST
- /RENAME=(SRC_NAMES=TARGET_NAMES)...
-```
-
-The `IMPORT` transformation clears the active dataset dictionary and
-data and replaces them with a dictionary and data from a system file or
-portable file.
-
-The `FILE` subcommand, which is the only required subcommand,
-specifies the portable file to be read as a file name string or a
-[file handle](../../language/files/file-handles.md).
-
-The `TYPE` subcommand is currently not used.
-
-`DROP`, `KEEP`, and `RENAME` follow the syntax used by
-[`GET`](get.md).
-
-`IMPORT` does not cause the data to be read; only the dictionary.
-The data is read later, when a procedure is executed.
-
-Use of `IMPORT` to read a system file is a PSPP extension.
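-
-A minimal sketch (the file and variable names are hypothetical) that
-reads a portable file, keeping only two of its variables:
-
-```
-IMPORT /FILE='archive.por'
-       /KEEP=id score.
-```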
-
+++ /dev/null
-# Working with SPSS Data Files
-
-These commands read and write data files in SPSS and other proprietary
-or specialized data formats.
+++ /dev/null
-# SAVE DATA COLLECTION
-
-```
-SAVE DATA COLLECTION
- /OUTFILE={'FILE_NAME',FILE_HANDLE}
- /METADATA={'FILE_NAME',FILE_HANDLE}
- /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
- /PERMISSIONS={WRITEABLE,READONLY}
- /DROP=VAR_LIST
- /KEEP=VAR_LIST
- /VERSION=VERSION
- /RENAME=(SRC_NAMES=TARGET_NAMES)...
- /NAMES
- /MAP
-```
-
-Like `SAVE`, `SAVE DATA COLLECTION` writes the dictionary and data in
-the active dataset to a system file. In addition, it writes metadata to
-an additional XML metadata file.
-
-`OUTFILE` is required. Specify the system file to be written as a
-string file name or a [file
-handle](../../language/files/file-handles.md).
-
-`METADATA` is also required. Specify the metadata file to be written
-as a string file name or a file handle. Metadata files customarily use
-a `.mdd` extension.
-
-The current implementation of this command is experimental. It only
-outputs an approximation of the metadata file format. Please report
-bugs.
-
-Other subcommands are optional. They have the same meanings as in
-the `SAVE` command.
-
-`SAVE DATA COLLECTION` causes the data to be read. It is a procedure.
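-
-A minimal sketch (the file names are hypothetical):
-
-```
-SAVE DATA COLLECTION
-     /OUTFILE='survey.sav'
-     /METADATA='survey.mdd'.
-```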
-
+++ /dev/null
-# SAVE TRANSLATE
-
-```
-SAVE TRANSLATE
- /OUTFILE={'FILE_NAME',FILE_HANDLE}
- /TYPE={CSV,TAB}
- [/REPLACE]
- [/MISSING={IGNORE,RECODE}]
-
- [/DROP=VAR_LIST]
- [/KEEP=VAR_LIST]
- [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
- [/UNSELECTED={RETAIN,DELETE}]
- [/MAP]
-
- ...additional subcommands depending on TYPE...
-```
-
-The `SAVE TRANSLATE` command is used to save data into various
-formats understood by other applications.
-
-The `OUTFILE` and `TYPE` subcommands are mandatory. `OUTFILE`
-specifies the file to be written, as a string file name or a [file
-handle](../../language/files/file-handles.md). `TYPE` determines the
-type of the file to be written.  It must be one of the following:
-
-* `CSV`
-  Comma-separated value format.
-
-* `TAB`
- Tab-delimited format.
-
-By default, `SAVE TRANSLATE` does not overwrite an existing file.
-Use `REPLACE` to force an existing file to be overwritten.
-
-With `MISSING=IGNORE`, the default, `SAVE TRANSLATE` treats
-user-missing values as if they were not missing. Specify
-`MISSING=RECODE` to output numeric user-missing values like
-system-missing values and string user-missing values as all spaces.
-
-By default, all the variables in the active dataset dictionary are
-saved to the system file, but `DROP` or `KEEP` can select a subset of
-variable to save. The `RENAME` subcommand can also be used to change
-the names under which variables are saved; because they are used only
-in the output, these names do not have to conform to the usual PSPP
-variable naming rules. `UNSELECTED` determines whether cases filtered
-out by the `FILTER` command are written to the output file. These
-subcommands have the same syntax and meaning as on the
-[`SAVE`](save.md) command.
-
-Each supported file type has additional subcommands, explained in
-separate sections below.
-
-`SAVE TRANSLATE` causes the data to be read. It is a procedure.
-
-## Comma- and Tab-Separated Data Files
-
-```
-SAVE TRANSLATE
- /OUTFILE={'FILE_NAME',FILE_HANDLE}
- /TYPE=CSV
- [/REPLACE]
- [/MISSING={IGNORE,RECODE}]
-
- [/DROP=VAR_LIST]
- [/KEEP=VAR_LIST]
- [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
- [/UNSELECTED={RETAIN,DELETE}]
-
- [/FIELDNAMES]
- [/CELLS={VALUES,LABELS}]
- [/TEXTOPTIONS DELIMITER='DELIMITER']
- [/TEXTOPTIONS QUALIFIER='QUALIFIER']
- [/TEXTOPTIONS DECIMAL={DOT,COMMA}]
- [/TEXTOPTIONS FORMAT={PLAIN,VARIABLE}]
-```
-
-The `SAVE TRANSLATE` command with `TYPE=CSV` or `TYPE=TAB` writes data in a
-comma- or tab-separated value format similar to that described by
-RFC 4180. Each variable becomes one output column, and each case
-becomes one line of output. If `FIELDNAMES` is specified, an additional
-line at the top of the output file lists variable names.
-
-The `CELLS` and `TEXTOPTIONS FORMAT` settings determine how values are
-written to the output file:
-
-* `CELLS=VALUES FORMAT=PLAIN` (the default settings)
- Writes variables to the output in "plain" formats that ignore the
- details of variable formats. Numeric values are written as plain
- decimal numbers with enough digits to indicate their exact values
- in machine representation. Numeric values include `e` followed by
- an exponent if the exponent value would be less than -4 or greater
- than 16. Dates are written in MM/DD/YYYY format and times in
- HH:MM:SS format. `WKDAY` and `MONTH` values are written as decimal
- numbers.
-
- Numeric values use, by default, the decimal point character set with
- [`SET DECIMAL`](../utilities/set.md#decimal). Use `DECIMAL=DOT` or
- `DECIMAL=COMMA` to force a particular decimal point character.
-
-* `CELLS=VALUES FORMAT=VARIABLE`
- Writes variables using their print formats. Leading and trailing
- spaces are removed from numeric values, and trailing spaces are
- removed from string values.
-
-* `CELLS=LABELS FORMAT=PLAIN`
-  `CELLS=LABELS FORMAT=VARIABLE`
- Writes value labels where they exist, and otherwise writes the
- values themselves as described above.
-
- Regardless of `CELLS` and `TEXTOPTIONS FORMAT`, numeric system-missing
-values are output as a single space.
-
- For `TYPE=TAB`, tab characters delimit values. For `TYPE=CSV`, the
-`TEXTOPTIONS DELIMITER` and `DECIMAL` settings determine the character
-that separates values within a line.  If `DELIMITER` is specified, then
-the specified string separates values.  If `DELIMITER` is not
-specified, then the default is a comma with `DECIMAL=DOT` or a
-semicolon with `DECIMAL=COMMA`. If `DECIMAL` is not given either, it
-is inferred from the decimal point character set with [`SET
-DECIMAL`](../utilities/set.md#decimal).
-
- The `TEXTOPTIONS QUALIFIER` setting specifies a character that is
-output before and after a value that contains the delimiter character or
-the qualifier character. The default is a double quote (`"`). A
-qualifier character that appears within a value is doubled.
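-
-### Example
-
-The following sketch (the file name is hypothetical) writes the active
-dataset as a semicolon-separated file with a header line of variable
-names, using value labels where they exist:
-
-```
-SAVE TRANSLATE /OUTFILE='results.csv' /TYPE=CSV /REPLACE
-               /FIELDNAMES
-               /CELLS=LABELS
-               /TEXTOPTIONS DELIMITER=';'.
-```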
-
+++ /dev/null
-# SAVE
-
-```
-SAVE
- /OUTFILE={'FILE_NAME',FILE_HANDLE}
- /UNSELECTED={RETAIN,DELETE}
- /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
- /PERMISSIONS={WRITEABLE,READONLY}
- /DROP=VAR_LIST
- /KEEP=VAR_LIST
- /VERSION=VERSION
- /RENAME=(SRC_NAMES=TARGET_NAMES)...
- /NAMES
- /MAP
-```
-
- The `SAVE` procedure causes the dictionary and data in the active
-dataset to be written to a system file.
-
- `OUTFILE` is the only required subcommand. Specify the system file
-to be written as a string file name or a [file
-handle](../../language/files/file-handles.md).
-
- By default, cases excluded with `FILTER` are written to the system
-file. These can be excluded by specifying `DELETE` on the `UNSELECTED`
-subcommand. Specifying `RETAIN` makes the default explicit.
-
- The `UNCOMPRESSED`, `COMPRESSED`, and `ZCOMPRESSED` subcommands
-determine the system file's compression level:
-
-* `UNCOMPRESSED`
- Data is not compressed. Each numeric value uses 8 bytes of disk
- space. Each string value uses one byte per column width, rounded
- up to a multiple of 8 bytes.
-
-* `COMPRESSED`
- Data is compressed in a simple way. Each integer numeric value
- between −99 and 151, inclusive, or system missing value uses one
- byte of disk space. Each 8-byte segment of a string that consists
- only of spaces uses 1 byte. Any other numeric value or 8-byte
- string segment uses 9 bytes of disk space.
-
-* `ZCOMPRESSED`
- Data is compressed with the "deflate" compression algorithm
- specified in RFC 1951 (the same algorithm used by `gzip`). Files
- written with this compression level cannot be read by PSPP 0.8.1 or
- earlier or by SPSS 20 or earlier.
-
-`COMPRESSED` is the default compression level. The
-[`SET`](../utilities/set.md) command can change this default.
-
-The `PERMISSIONS` subcommand specifies operating system permissions
-for the new system file. `WRITEABLE`, the default, creates the file
-with read and write permission. `READONLY` creates the file for
-read-only access.
-
-By default, all the variables in the active dataset dictionary are
-written to the system file. The `DROP` subcommand can be used to
-specify a list of variables not to be written. In contrast, `KEEP`
-specifies variables to be written, with all other variables omitted.
-
-Normally variables are saved to a system file under the same names
-they have in the active dataset. Use the `RENAME` subcommand to change
-these names. Specify, within parentheses, a list of variable names
-followed by an equals sign (`=`) and the names that they should be
-renamed to. Multiple parenthesized groups of variable names can be
-included on a single `RENAME` subcommand. Variables' names may be
-swapped using a `RENAME` subcommand of the form `/RENAME=(A B=B A)`.
-
-Alternate syntax for the `RENAME` subcommand allows the parentheses to
-be eliminated. When this is done, only a single variable may be
-renamed at once. For instance, `/RENAME=A=B`. This alternate syntax
-is discouraged.
-
-`DROP`, `KEEP`, and `RENAME` are performed in left-to-right order.
-They each may be present any number of times. `SAVE` never modifies
-the active dataset. `DROP`, `KEEP`, and `RENAME` only affect the
-system file written to disk.
-
-The `VERSION` subcommand specifies the version of the file format.
-Valid versions are 2 and 3. The default version is 3. In version 2
-system files, variable names longer than 8 bytes are truncated. The
-two versions are otherwise identical.
-
-The `NAMES` and `MAP` subcommands are currently ignored.
-
-`SAVE` is a procedure: it causes the data to be read.
-
+++ /dev/null
-# SYSFILE INFO
-
-```
-SYSFILE INFO FILE='FILE_NAME' [ENCODING='ENCODING'].
-```
-
-`SYSFILE INFO` reads the dictionary in an SPSS system file, SPSS/PC+
-system file, or SPSS portable file, and displays the information that
-the dictionary contains.
-
-Specify a file name or file handle naming the file to be read.
-
-PSPP automatically detects the encoding of string data in the file,
-when possible. The character encoding of old SPSS system files cannot
-always be guessed correctly, and SPSS/PC+ system files do not include
-any indication of their encoding. Specify the `ENCODING` subcommand
-with an IANA character set name as its string argument to override the
-default, or specify `ENCODING='DETECT'` to analyze and report possibly
-valid encodings for the system file. The `ENCODING` subcommand is a
-PSPP extension.
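-
-For example, this sketch (with a hypothetical file name) asks PSPP to
-report candidate encodings rather than reading the dictionary with a
-fixed one:
-
-```
-SYSFILE INFO FILE='legacy.sav' ENCODING='DETECT'.
-```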
-
-`SYSFILE INFO` does not affect the current active dataset.
-
+++ /dev/null
-# XEXPORT
-
-```
-XEXPORT
- /OUTFILE='FILE_NAME'
- /DIGITS=N
- /DROP=VAR_LIST
- /KEEP=VAR_LIST
- /RENAME=(SRC_NAMES=TARGET_NAMES)...
- /TYPE={COMM,TAPE}
- /MAP
-```
-
-The `XEXPORT` transformation writes the active dataset dictionary and
-data to a specified portable file.
-
-This transformation is a PSPP extension.
-
-It is similar to the `EXPORT` procedure, with two differences:
-
-- `XEXPORT` is a transformation, not a procedure. It is executed when
- the data is read by a procedure or procedure-like command.
-
-- `XEXPORT` does not support the `UNSELECTED` subcommand.
-
-See [`EXPORT`](export.md) for more information.
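-
-Because `XEXPORT` is a transformation, it has no effect until a
-procedure reads the data.  In this hypothetical sketch, the portable
-file is written when `LIST` executes:
-
-```
-XEXPORT /OUTFILE='data.por'.
-LIST.
-```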
-
+++ /dev/null
-# XSAVE
-
-```
-XSAVE
- /OUTFILE='FILE_NAME'
- /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
- /PERMISSIONS={WRITEABLE,READONLY}
- /DROP=VAR_LIST
- /KEEP=VAR_LIST
- /VERSION=VERSION
- /RENAME=(SRC_NAMES=TARGET_NAMES)...
- /NAMES
- /MAP
-```
-
-The `XSAVE` transformation writes the active dataset's dictionary and
-data to a system file. It is similar to the `SAVE` procedure, with
-two differences:
-
-- `XSAVE` is a transformation, not a procedure. It is executed when
- the data is read by a procedure or procedure-like command.
-
-- `XSAVE` does not support the `UNSELECTED` subcommand.
-
-See [`SAVE`](save.md) for more information.
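-
-For example, this hypothetical sketch writes a system file that
-includes a computed variable; the file is actually written when the
-`LIST` procedure reads the data:
-
-```
-COMPUTE total = a + b.
-XSAVE /OUTFILE='with-total.sav'.
-LIST.
-```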
-
--- /dev/null
+# Statistics
+
+This chapter documents the statistical procedures that PSPP supports.
+++ /dev/null
-# CORRELATIONS
-
-```
-CORRELATIONS
- /VARIABLES = VAR_LIST [ WITH VAR_LIST ]
- [
- .
- .
- .
- /VARIABLES = VAR_LIST [ WITH VAR_LIST ]
- /VARIABLES = VAR_LIST [ WITH VAR_LIST ]
- ]
-
- [ /PRINT={TWOTAIL, ONETAIL} {SIG, NOSIG} ]
- [ /STATISTICS=DESCRIPTIVES XPROD ALL]
- [ /MISSING={PAIRWISE, LISTWISE} {INCLUDE, EXCLUDE} ]
-```
-
-The `CORRELATIONS` procedure produces tables of the Pearson
-correlation coefficient for a set of variables.  The significance of
-each coefficient is also given.
-
-At least one `VARIABLES` subcommand is required.  If you specify the
-`WITH` keyword, then a non-square correlation table is produced.  The
-variables preceding `WITH` are used as the rows of the table, and the
-variables following `WITH` are used as the columns of the table.  If
-the `WITH` keyword is omitted, then `CORRELATIONS` produces a square,
-symmetrical table using all variables.
-
-The `MISSING` subcommand determines the handling of missing
-values.  If `INCLUDE` is set, then user-missing values are included
-in the calculations, but system-missing values are not. If `EXCLUDE` is
-set, which is the default, user-missing values are excluded as well as
-system-missing values.
-
-If `LISTWISE` is set, then the entire case is excluded from analysis
-whenever any variable specified in any `/VARIABLES` subcommand contains
-a missing value. If `PAIRWISE` is set, then a case is considered
-missing only if either of the values for the particular coefficient is
-missing. The default is `PAIRWISE`.
-
-The `PRINT` subcommand is used to control how the reported
-significance values are printed. If the `TWOTAIL` option is used, then
-a two-tailed test of significance is printed. If the `ONETAIL` option
-is given, then a one-tailed test is used. The default is `TWOTAIL`.
-
-If the `NOSIG` option is specified, then correlation coefficients
-with significance less than 0.05 are highlighted.  If `SIG`, the
-default, is specified, then no highlighting is performed.
-
-The `STATISTICS` subcommand requests additional statistics to be
-displayed.  The keyword `DESCRIPTIVES` requests that the mean, number of
-non-missing cases, and the non-biased estimator of the standard
-deviation be displayed.  These statistics appear in a separate
-table, for all the variables listed in any `/VARIABLES` subcommand.  The
-`XPROD` keyword requests cross-product deviations and covariance
-estimators to be displayed for each pair of variables. The keyword
-`ALL` is the union of `DESCRIPTIVES` and `XPROD`.
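-
-Putting these subcommands together, a hypothetical invocation (the
-variable names are illustrative) might be:
-
-```
-CORRELATIONS
-  /VARIABLES = height weight age
-  /STATISTICS = DESCRIPTIVES
-  /MISSING = LISTWISE.
-```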
-
+++ /dev/null
-# CROSSTABS
-
-```
-CROSSTABS
- /TABLES=VAR_LIST BY VAR_LIST [BY VAR_LIST]...
- /MISSING={TABLE,INCLUDE,REPORT}
- /FORMAT={TABLES,NOTABLES}
- {AVALUE,DVALUE}
- /CELLS={COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
- ASRESIDUAL,ALL,NONE}
- /COUNT={ASIS,CASE,CELL}
- {ROUND,TRUNCATE}
- /STATISTICS={CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
- KAPPA,ETA,CORR,ALL,NONE}
- /BARCHART
-
-(Integer mode.)
- /VARIABLES=VAR_LIST (LOW,HIGH)...
-```
-
-The `CROSSTABS` procedure displays crosstabulation tables requested
-by the user. It can calculate several statistics for each cell in the
-crosstabulation tables. In addition, a number of statistics can be
-calculated for each table itself.
-
-The `TABLES` subcommand is used to specify the tables to be reported.
-Any number of dimensions is permitted, and any number of variables per
-dimension is allowed. The `TABLES` subcommand may be repeated as many
-times as needed. This is the only required subcommand in "general
-mode".
-
-Occasionally, one may want to invoke a special mode called "integer
-mode". Normally, in general mode, PSPP automatically determines what
-values occur in the data. In integer mode, the user specifies the range
-of values that the data assumes. To invoke this mode, specify the
-`VARIABLES` subcommand, giving a range of data values in parentheses for
-each variable to be used on the `TABLES` subcommand. Data values inside
-the range are truncated to the nearest integer, then assigned to that
-value. If values occur outside this range, they are discarded. When it
-is present, the `VARIABLES` subcommand must precede the `TABLES`
-subcommand.
-
-In general mode, numeric and string variables may be specified on
-`TABLES`. In integer mode, only numeric variables are allowed.
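-
-For example, this hypothetical sketch runs in integer mode, declaring
-that `V1` takes values from 1 to 5 and `V2` takes values 1 and 2:
-
-```
-CROSSTABS
-  /VARIABLES=V1 (1,5) V2 (1,2)
-  /TABLES=V1 BY V2.
-```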
-
-The `MISSING` subcommand determines the handling of user-missing
-values. When set to `TABLE`, the default, missing values are dropped on
-a table by table basis. When set to `INCLUDE`, user-missing values are
-included in tables and statistics. When set to `REPORT`, which is
-allowed only in integer mode, user-missing values are included in tables
-but marked with a footnote and excluded from statistical calculations.
-
-The `FORMAT` subcommand controls the characteristics of the
-crosstabulation tables to be displayed. It has a number of possible
-settings:
-
-* `TABLES`, the default, causes crosstabulation tables to be output.
-
-* `NOTABLES`, which is equivalent to `CELLS=NONE`, suppresses them.
-
-* `AVALUE`, the default, causes values to be sorted in ascending
-  order.  `DVALUE` requests a descending sort order.
-
-The `CELLS` subcommand controls the contents of each cell in the
-displayed crosstabulation table. The possible settings are:
-
-* `COUNT`
- Frequency count.
-* `ROW`
- Row percent.
-* `COLUMN`
- Column percent.
-* `TOTAL`
- Table percent.
-* `EXPECTED`
- Expected value.
-* `RESIDUAL`
- Residual.
-* `SRESIDUAL`
- Standardized residual.
-* `ASRESIDUAL`
- Adjusted standardized residual.
-* `ALL`
- All of the above.
-* `NONE`
- Suppress cells entirely.
-
-`/CELLS` without any settings specified requests `COUNT`, `ROW`,
-`COLUMN`, and `TOTAL`. If `CELLS` is not specified at all then only
-`COUNT` is selected.
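-
-For example, a sketch (with hypothetical variables) that displays only
-counts and row percentages in each cell:
-
-```
-CROSSTABS
-  /TABLES=a BY b
-  /CELLS=COUNT ROW.
-```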
-
-By default, crosstabulation and statistics use raw case weights,
-without rounding. Use the `/COUNT` subcommand to perform rounding:
-`CASE` rounds the weights of individual cases as they are read,
-`CELL` rounds the weights of cells within each crosstabulation table
-after it has been constructed, and `ASIS` explicitly specifies the
-default non-rounding behavior. When rounding is requested, `ROUND`,
-the default, rounds to the nearest integer and `TRUNCATE` rounds
-toward zero.
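-
-For example, to round each case's weight to the nearest integer as it
-is read (the variable names are hypothetical):
-
-```
-CROSSTABS
-  /TABLES=a BY b
-  /COUNT=CASE ROUND.
-```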
-
-The `STATISTICS` subcommand selects statistics for computation:
-
-* `CHISQ`
- Pearson chi-square, likelihood ratio, Fisher's exact test,
- continuity correction, linear-by-linear association.
-* `PHI`
- Phi.
-* `CC`
- Contingency coefficient.
-* `LAMBDA`
- Lambda.
-* `UC`
- Uncertainty coefficient.
-* `BTAU`
- Tau-b.
-* `CTAU`
- Tau-c.
-* `RISK`
- Risk estimate.
-* `GAMMA`
- Gamma.
-* `D`
- Somers' D.
-* `KAPPA`
- Cohen's Kappa.
-* `ETA`
- Eta.
-* `CORR`
- Spearman correlation, Pearson's r.
-* `ALL`
- All of the above.
-* `NONE`
- No statistics.
-
-Selected statistics are calculated only when appropriate for the
-table.  Certain statistics require tables of a particular size, and
-some statistics are calculated only in integer mode.
-
-`/STATISTICS` without any settings selects `CHISQ`.  If the `STATISTICS`
-subcommand is not given, no statistics are calculated.
-
-The `/BARCHART` subcommand produces a clustered bar chart for the
-first two variables on each table. If a table has more than two
-variables, the counts for the third and subsequent levels are aggregated
-and the chart is produced as if there were only two variables.
-
-> Currently the implementation of `CROSSTABS` has the
-> following limitations:
->
-> - Significance of some symmetric and directional measures is not
-> calculated.
-> - Asymptotic standard error is not calculated for Goodman and
-> Kruskal's tau or symmetric Somers' d.
-> - Approximate T is not calculated for symmetric uncertainty
-> coefficient.
->
-> Fixes for any of these deficiencies would be welcomed.
-
-## Example
-
-A researcher wishes to know if, in an industry, a person's sex is
-related to the person's occupation. To investigate this, she has
-determined that `personnel.sav` is a representative, randomly
-selected sample of persons. The researcher's null hypothesis is that a
-person's sex has no relation to a person's occupation. She uses a
-chi-squared test of independence to investigate the hypothesis.
-
-```
-get file="personnel.sav".
-
-crosstabs
- /tables= occupation by sex
- /cells = count expected
- /statistics=chisq.
-```
-
-The syntax above conducts a chi-squared test of independence. The
-line `/tables = occupation by sex` indicates that occupation and sex
-are the variables to be tabulated.
-
-As shown in the output below, `CROSSTABS` generates a contingency
-table containing the observed count and the expected count of each sex
-and each occupation. The expected count is the count which would be
-observed if the null hypothesis were true.
-
-The significance of the Pearson Chi-Square value is much larger than
-the conventionally accepted threshold of 0.05, so one cannot reject
-the null hypothesis.  Thus the researcher must conclude that the data
-provide no evidence that a person's sex is related to the person's
-occupation.
-
-```
- Summary
-┌────────────────┬───────────────────────────────┐
-│ │ Cases │
-│ ├──────────┬─────────┬──────────┤
-│ │ Valid │ Missing │ Total │
-│ ├──┬───────┼─┬───────┼──┬───────┤
-│ │ N│Percent│N│Percent│ N│Percent│
-├────────────────┼──┼───────┼─┼───────┼──┼───────┤
-│occupation × sex│54│ 96.4%│2│ 3.6%│56│ 100.0%│
-└────────────────┴──┴───────┴─┴───────┴──┴───────┘
-
- occupation × sex
-┌──────────────────────────────────────┬───────────┬─────┐
-│ │ sex │ │
-│ ├────┬──────┤ │
-│ │Male│Female│Total│
-├──────────────────────────────────────┼────┼──────┼─────┤
-│occupation Artist Count │ 2│ 6│ 8│
-│ Expected│4.89│ 3.11│ .15│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Baker Count │ 1│ 1│ 2│
-│ Expected│1.22│ .78│ .04│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Barrister Count │ 0│ 1│ 1│
-│ Expected│ .61│ .39│ .02│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Carpenter Count │ 3│ 1│ 4│
-│ Expected│2.44│ 1.56│ .07│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Cleaner Count │ 4│ 0│ 4│
-│ Expected│2.44│ 1.56│ .07│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Cook Count │ 3│ 2│ 5│
-│ Expected│3.06│ 1.94│ .09│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Manager Count │ 4│ 4│ 8│
-│ Expected│4.89│ 3.11│ .15│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Mathematician Count │ 3│ 1│ 4│
-│ Expected│2.44│ 1.56│ .07│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Painter Count │ 1│ 1│ 2│
-│ Expected│1.22│ .78│ .04│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Payload Specialist Count │ 1│ 0│ 1│
-│ Expected│ .61│ .39│ .02│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Plumber Count │ 5│ 0│ 5│
-│ Expected│3.06│ 1.94│ .09│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Scientist Count │ 5│ 2│ 7│
-│ Expected│4.28│ 2.72│ .13│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Scrientist Count │ 0│ 1│ 1│
-│ Expected│ .61│ .39│ .02│
-│ ────────────────────────────┼────┼──────┼─────┤
-│ Tailor Count │ 1│ 1│ 2│
-│ Expected│1.22│ .78│ .04│
-├──────────────────────────────────────┼────┼──────┼─────┤
-│Total Count │ 33│ 21│ 54│
-│ Expected│ .61│ .39│ 1.00│
-└──────────────────────────────────────┴────┴──────┴─────┘
-
- Chi─Square Tests
-┌──────────────────┬─────┬──┬──────────────────────────┐
-│ │Value│df│Asymptotic Sig. (2─tailed)│
-├──────────────────┼─────┼──┼──────────────────────────┤
-│Pearson Chi─Square│15.59│13│ .272│
-│Likelihood Ratio │19.66│13│ .104│
-│N of Valid Cases │ 54│ │ │
-└──────────────────┴─────┴──┴──────────────────────────┘
-```
+++ /dev/null
-# CTABLES
-
-`CTABLES` has the following overall syntax. At least one `TABLE`
-subcommand is required:
-
-```
-CTABLES
- ...global subcommands...
- [/TABLE axis [BY axis [BY axis]]
- ...per-table subcommands...]...
-```
-
-where each axis may be empty or take one of the following forms:
-
-```
-variable
-variable [{C | S}]
-axis + axis
-axis > axis
-(axis)
-axis [summary [string] [format]]
-```
-
-The following subcommands precede the first `TABLE` subcommand and
-apply to all of the output tables. All of these subcommands are
-optional:
-
-```
-/FORMAT
- [MINCOLWIDTH={DEFAULT | width}]
- [MAXCOLWIDTH={DEFAULT | width}]
- [UNITS={POINTS | INCHES | CM}]
- [EMPTY={ZERO | BLANK | string}]
- [MISSING=string]
-/VLABELS
- VARIABLES=variables
- DISPLAY={DEFAULT | NAME | LABEL | BOTH | NONE}
-/SMISSING {VARIABLE | LISTWISE}
-/PCOMPUTE &postcompute=EXPR(expression)
-/PPROPERTIES &postcompute...
- [LABEL=string]
- [FORMAT=[summary format]...]
- [HIDESOURCECATS={NO | YES}]
-/WEIGHT VARIABLE=variable
-/HIDESMALLCOUNTS COUNT=count
-```
-
-The following subcommands follow `TABLE` and apply only to the
-previous `TABLE`. All of these subcommands are optional:
-
-```
-/SLABELS
- [POSITION={COLUMN | ROW | LAYER}]
- [VISIBLE={YES | NO}]
-/CLABELS {AUTO | {ROWLABELS|COLLABELS}={OPPOSITE|LAYER}}
-/CATEGORIES VARIABLES=variables
- {[value, value...]
- | [ORDER={A | D}]
- [KEY={VALUE | LABEL | summary(variable)}]
- [MISSING={EXCLUDE | INCLUDE}]}
- [TOTAL={NO | YES} [LABEL=string] [POSITION={AFTER | BEFORE}]]
- [EMPTY={INCLUDE | EXCLUDE}]
-/TITLES
- [TITLE=string...]
- [CAPTION=string...]
- [CORNER=string...]
-```
-
-The `CTABLES` (aka "custom tables") command produces
-multi-dimensional tables from categorical and scale data. It offers
-many options for data summarization and formatting.
-
-This section's examples use data from the 2008 (USA) National Survey
-of Drinking and Driving Attitudes and Behaviors, a public domain data
-set from the (USA) National Highway Traffic Safety Administration,
-available at <https://data.transportation.gov>. PSPP includes this data set, with
-a modified dictionary, as `examples/nhtsa.sav`.
-
-<!-- toc -->
-
-## Basics
-
-The only required subcommand is `TABLE`, which specifies the variables
-to include along each axis:
-
-```
- /TABLE rows [BY columns [BY layers]]
-```
-
-In `TABLE`, each of `ROWS`, `COLUMNS`, and `LAYERS` is either empty or
-an axis expression that specifies one or more variables. At least one
-must specify an axis expression.
-
-## Categorical Variables
-
-An axis expression that names a categorical variable divides the data
-into cells according to the values of that variable. When all the
-variables named on `TABLE` are categorical, by default each cell
-displays the number of cases that it contains, so specifying a single
-variable yields a frequency table, much like the output of the
-[`FREQUENCIES`](frequencies.md) command:
-
-```
- CTABLES /TABLE=ageGroup.
-```
-
-```
- Custom Tables
-┌───────────────────────┬─────┐
-│ │Count│
-├───────────────────────┼─────┤
-│Age group 15 or younger│ 0│
-│ 16 to 25 │ 1099│
-│ 26 to 35 │ 967│
-│ 36 to 45 │ 1037│
-│ 46 to 55 │ 1175│
-│ 56 to 65 │ 1247│
-│ 66 or older │ 1474│
-└───────────────────────┴─────┘
-```
-
-Specifying a row and a column categorical variable yields a
-crosstabulation, much like the output of the
-[`CROSSTABS`](crosstabs.md) command:
-
-```
-CTABLES /TABLE=ageGroup BY gender.
-```
-
-```
- Custom Tables
-┌───────────────────────┬────────────┐
-│ │S3a. GENDER:│
-│ ├─────┬──────┤
-│ │ Male│Female│
-│ ├─────┼──────┤
-│ │Count│ Count│
-├───────────────────────┼─────┼──────┤
-│Age group 15 or younger│ 0│ 0│
-│ 16 to 25 │ 594│ 505│
-│ 26 to 35 │ 476│ 491│
-│ 36 to 45 │ 489│ 548│
-│ 46 to 55 │ 526│ 649│
-│ 56 to 65 │ 516│ 731│
-│ 66 or older │ 531│ 943│
-└───────────────────────┴─────┴──────┘
-```
-
-The `>` "nesting" operator nests multiple variables on a single axis,
-e.g.:
-
-```
-CTABLES /TABLE likelihoodOfBeingStoppedByPolice BY ageGroup > gender.
-```
-
-```
- Custom Tables
-┌─────────────────────────────────┬───────────────────────────────────────────┐
-│ │ 86. In the past year, have you hosted a │
-│ │ social event or party where alcohol was │
-│ │ served to adults? │
-│ ├─────────────────────┬─────────────────────┤
-│ │ Yes │ No │
-│ ├─────────────────────┼─────────────────────┤
-│ │ Count │ Count │
-├─────────────────────────────────┼─────────────────────┼─────────────────────┤
-│Age 15 or S3a. Male │ 0│ 0│
-│group younger GENDER: Female│ 0│ 0│
-│ ───────────────────────────┼─────────────────────┼─────────────────────┤
-│ 16 to 25 S3a. Male │ 208│ 386│
-│ GENDER: Female│ 202│ 303│
-│ ───────────────────────────┼─────────────────────┼─────────────────────┤
-│ 26 to 35 S3a. Male │ 225│ 251│
-│ GENDER: Female│ 242│ 249│
-│ ───────────────────────────┼─────────────────────┼─────────────────────┤
-│ 36 to 45 S3a. Male │ 223│ 266│
-│ GENDER: Female│ 240│ 307│
-│ ───────────────────────────┼─────────────────────┼─────────────────────┤
-│ 46 to 55 S3a. Male │ 201│ 325│
-│ GENDER: Female│ 282│ 366│
-│ ───────────────────────────┼─────────────────────┼─────────────────────┤
-│ 56 to 65 S3a. Male │ 196│ 320│
-│ GENDER: Female│ 279│ 452│
-│ ───────────────────────────┼─────────────────────┼─────────────────────┤
-│ 66 or S3a. Male │ 162│ 367│
-│ older GENDER: Female│ 243│ 700│
-└─────────────────────────────────┴─────────────────────┴─────────────────────┘
-```
-
-The `+` "stacking" operator allows a single output table to include
-multiple data analyses. With `+`, `CTABLES` divides the output table
-into multiple "sections", each of which includes an analysis of the full
-data set. For example, the following command separately tabulates age
-group and driving frequency by gender:
-
-```
-CTABLES /TABLE ageGroup + freqOfDriving BY gender.
-```
-
-```
- Custom Tables
-┌────────────────────────────────────────────────────────────────┬────────────┐
-│ │S3a. GENDER:│
-│ ├─────┬──────┤
-│ │ Male│Female│
-│ ├─────┼──────┤
-│ │Count│ Count│
-├────────────────────────────────────────────────────────────────┼─────┼──────┤
-│Age group 15 or younger │ 0│ 0│
-│ 16 to 25 │ 594│ 505│
-│ 26 to 35 │ 476│ 491│
-│ 36 to 45 │ 489│ 548│
-│ 46 to 55 │ 526│ 649│
-│ 56 to 65 │ 516│ 731│
-│ 66 or older │ 531│ 943│
-├────────────────────────────────────────────────────────────────┼─────┼──────┤
-│ 1. How often do you usually drive a car or Every day │ 2305│ 2362│
-│other motor vehicle? Several days a week│ 440│ 834│
-│ Once a week or less│ 125│ 236│
-│ Only certain times │ 58│ 72│
-│ a year │ │ │
-│ Never │ 192│ 348│
-└────────────────────────────────────────────────────────────────┴─────┴──────┘
-```
-
-When `+` and `>` are used together, `>` binds more tightly. Use
-parentheses to override operator precedence. Thus:
-
-```
-CTABLES /TABLE hasConsideredReduction + hasBeenCriticized > gender.
-CTABLES /TABLE (hasConsideredReduction + hasBeenCriticized) > gender.
-```
-
-```
- Custom Tables
-┌───────────────────────────────────────────────────────────────────────┬─────┐
-│ │Count│
-├───────────────────────────────────────────────────────────────────────┼─────┤
-│26. During the last 12 months, has there been a Yes │ 513│
-│time when you felt you should cut down on your ─────────────────────┼─────┤
-│drinking? No │ 3710│
-├───────────────────────────────────────────────────────────────────────┼─────┤
-│27. During the last 12 months, has there been a Yes S3a. Male │ 135│
-│time when people criticized your drinking? GENDER: Female│ 49│
-│ ─────────────────────┼─────┤
-│ No S3a. Male │ 1916│
-│ GENDER: Female│ 2126│
-└───────────────────────────────────────────────────────────────────────┴─────┘
-
- Custom Tables
-┌───────────────────────────────────────────────────────────────────────┬─────┐
-│ │Count│
-├───────────────────────────────────────────────────────────────────────┼─────┤
-│26. During the last 12 months, has there been a Yes S3a. Male │ 333│
-│time when you felt you should cut down on your GENDER: Female│ 180│
-│drinking? ─────────────────────┼─────┤
-│ No S3a. Male │ 1719│
-│ GENDER: Female│ 1991│
-├───────────────────────────────────────────────────────────────────────┼─────┤
-│27. During the last 12 months, has there been a Yes S3a. Male │ 135│
-│time when people criticized your drinking? GENDER: Female│ 49│
-│ ─────────────────────┼─────┤
-│ No S3a. Male │ 1916│
-│ GENDER: Female│ 2126│
-└───────────────────────────────────────────────────────────────────────┴─────┘
-```
-
-
-## Scalar Variables
-
-For a categorical variable, `CTABLES` divides the table into a cell per
-category. For a scalar variable, `CTABLES` instead calculates a summary
-measure, by default the mean, of the values that fall into a cell. For
-example, if the only variable specified is a scalar variable, then the
-output is a single cell that holds the mean of all of the data:
-
-```
-CTABLES /TABLE age.
-```
-
-```
- Custom Tables
-┌──────────────────────────┬────┐
-│ │Mean│
-├──────────────────────────┼────┤
-│D1. AGE: What is your age?│ 48│
-└──────────────────────────┴────┘
-```
-
-A scalar variable may nest with categorical variables. The following
-example shows the mean age of survey respondents across gender and
-language groups:
-
-```
-CTABLES /TABLE gender > age BY region.
-```
-
-```
-Custom Tables
-┌─────────────────────────────────────┬───────────────────────────────────────┐
-│ │Was this interview conducted in English│
-│ │ or Spanish? │
-│ ├───────────────────┬───────────────────┤
-│ │ English │ Spanish │
-│ ├───────────────────┼───────────────────┤
-│ │ Mean │ Mean │
-├─────────────────────────────────────┼───────────────────┼───────────────────┤
-│D1. AGE: What is S3a. Male │ 46│ 37│
-│your age? GENDER: Female│ 51│ 39│
-└─────────────────────────────────────┴───────────────────┴───────────────────┘
-```
-
-The order of nesting of scalar and categorical variables affects
-table labeling, but it does not affect the data displayed in the table.
-The following example shows how the output changes when the nesting
-order of the scalar and categorical variable are interchanged:
-
-```
-CTABLES /TABLE age > gender BY region.
-```
-
-```
- Custom Tables
-┌─────────────────────────────────────┬───────────────────────────────────────┐
-│ │Was this interview conducted in English│
-│ │ or Spanish? │
-│ ├───────────────────┬───────────────────┤
-│ │ English │ Spanish │
-│ ├───────────────────┼───────────────────┤
-│ │ Mean │ Mean │
-├─────────────────────────────────────┼───────────────────┼───────────────────┤
-│S3a. Male D1. AGE: What is │ 46│ 37│
-│GENDER: your age? │ │ │
-│ ───────────────────────────┼───────────────────┼───────────────────┤
-│ Female D1. AGE: What is │ 51│ 39│
-│ your age? │ │ │
-└─────────────────────────────────────┴───────────────────┴───────────────────┘
-```
-
-Only a single scalar variable may appear in each section; that is, a
-scalar variable may not nest inside a scalar variable directly or
-indirectly. Scalar variables may only appear on one axis within
-`TABLE`.
-
-## Overriding Measurement Level
-
-By default, `CTABLES` uses a variable's measurement level to decide
-whether to treat it as categorical or scalar.  Variables assigned the
-nominal or ordinal measurement level are treated as categorical, and
-variables assigned the scale measurement level are treated as scalar.
-
-When PSPP reads data from a file in an external format, such as a text
-file, variables' measurement levels are often unknown. If `CTABLES`
-runs when a variable has an unknown measurement level, it makes an
-initial pass through the data to [guess measurement
-levels](../../language/datasets/variables.md). Use the [`VARIABLE
-LEVEL`](../../commands/variables/variable-level.md) command to set or
-change a variable's measurement level.
-
-To treat a variable as categorical or scalar only for one use on
-`CTABLES`, add `[C]` or `[S]`, respectively, after the variable name.
-The following example shows the output when variable
-`monthDaysMin1drink` is analyzed as scalar (the default for its
-measurement level) and as categorical:
-
-```
-CTABLES
- /TABLE monthDaysMin1drink BY gender
- /TABLE monthDaysMin1drink [C] BY gender.
-```
-
-```
- Custom Tables
-┌────────────────────────────────────────────────────────────────┬────────────┐
-│ │S3a. GENDER:│
-│ ├────┬───────┤
-│ │Male│ Female│
-│ ├────┼───────┤
-│ │Mean│ Mean │
-├────────────────────────────────────────────────────────────────┼────┼───────┤
-│20. On how many of the thirty days in this typical month did you│ 7│ 5│
-│have one or more alcoholic beverages to drink? │ │ │
-└────────────────────────────────────────────────────────────────┴────┴───────┘
-
- Custom Tables
-┌────────────────────────────────────────────────────────────────┬────────────┐
-│ │S3a. GENDER:│
-│ ├─────┬──────┤
-│ │ Male│Female│
-│ ├─────┼──────┤
-│ │Count│ Count│
-├────────────────────────────────────────────────────────────────┼─────┼──────┤
-│20. On how many of the thirty days in this typical month None │ 152│ 258│
-│did you have one or more alcoholic beverages to drink? 1 │ 403│ 653│
-│ 2 │ 284│ 324│
-│ 3 │ 169│ 215│
-│ 4 │ 178│ 143│
-│ 5 │ 107│ 106│
-│ 6 │ 67│ 59│
-│ 7 │ 31│ 11│
-│ 8 │ 101│ 74│
-│ 9 │ 6│ 4│
-│ 10 │ 95│ 75│
-│ 11 │ 4│ 0│
-│ 12 │ 58│ 33│
-│ 13 │ 3│ 2│
-│ 14 │ 13│ 3│
-│ 15 │ 79│ 58│
-│ 16 │ 10│ 6│
-│ 17 │ 4│ 2│
-│ 18 │ 5│ 4│
-│ 19 │ 2│ 0│
-│ 20 │ 105│ 47│
-│ 21 │ 2│ 0│
-│ 22 │ 3│ 3│
-│ 23 │ 0│ 3│
-│ 24 │ 3│ 0│
-│ 25 │ 35│ 25│
-│ 26 │ 1│ 1│
-│ 27 │ 3│ 3│
-│ 28 │ 13│ 8│
-│ 29 │ 3│ 3│
-│ Every │ 104│ 43│
-│ day │ │ │
-└────────────────────────────────────────────────────────────────┴─────┴──────┘
-```
-
-## Data Summarization
-
-The `CTABLES` command allows the user to control how the data are
-summarized with "summary specifications": one or more summary function
-names, optionally separated by commas, enclosed in square brackets
-following a variable name on the `TABLE` subcommand.  When all the
-variables are categorical, summary
-specifications can be given for the innermost nested variables on any
-one axis. When a scalar variable is present, only the scalar variable
-may have summary specifications.
-
-The following example includes a summary specification for column and
-row percentages for categorical variables, and mean and median for a
-scalar variable:
-
-```
-CTABLES
- /TABLE=age [MEAN, MEDIAN] BY gender
- /TABLE=ageGroup [COLPCT, ROWPCT] BY gender.
-```
-
-```
- Custom Tables
-┌──────────────────────────┬───────────────────────┐
-│ │ S3a. GENDER: │
-│ ├───────────┬───────────┤
-│ │ Male │ Female │
-│ ├────┬──────┼────┬──────┤
-│ │Mean│Median│Mean│Median│
-├──────────────────────────┼────┼──────┼────┼──────┤
-│D1. AGE: What is your age?│ 46│ 45│ 50│ 52│
-└──────────────────────────┴────┴──────┴────┴──────┘
-
- Custom Tables
-┌───────────────────────┬─────────────────────────────┐
-│ │ S3a. GENDER: │
-│ ├──────────────┬──────────────┤
-│ │ Male │ Female │
-│ ├────────┬─────┼────────┬─────┤
-│ │Column %│Row %│Column %│Row %│
-├───────────────────────┼────────┼─────┼────────┼─────┤
-│Age group 15 or younger│ .0%│ .│ .0%│ .│
-│ 16 to 25 │ 19.0%│54.0%│ 13.1%│46.0%│
-│ 26 to 35 │ 15.2%│49.2%│ 12.7%│50.8%│
-│ 36 to 45 │ 15.6%│47.2%│ 14.2%│52.8%│
-│ 46 to 55 │ 16.8%│44.8%│ 16.8%│55.2%│
-│ 56 to 65 │ 16.5%│41.4%│ 18.9%│58.6%│
-│ 66 or older │ 17.0%│36.0%│ 24.4%│64.0%│
-└───────────────────────┴────────┴─────┴────────┴─────┘
-```
-
-A summary specification may override the default label and format by
-appending a string or format specification or both (in that order) to
-the summary function name. For example:
-
-```
-CTABLES /TABLE=ageGroup [COLPCT 'Gender %' PCT5.0,
- ROWPCT 'Age Group %' PCT5.0]
- BY gender.
-```
-
-```
- Custom Tables
-┌───────────────────────┬─────────────────────────────────────────┐
-│ │ S3a. GENDER: │
-│ ├────────────────────┬────────────────────┤
-│ │ Male │ Female │
-│ ├────────┬───────────┼────────┬───────────┤
-│ │Gender %│Age Group %│Gender %│Age Group %│
-├───────────────────────┼────────┼───────────┼────────┼───────────┤
-│Age group 15 or younger│ 0%│ .│ 0%│ .│
-│ 16 to 25 │ 19%│ 54%│ 13%│ 46%│
-│ 26 to 35 │ 15%│ 49%│ 13%│ 51%│
-│ 36 to 45 │ 16%│ 47%│ 14%│ 53%│
-│ 46 to 55 │ 17%│ 45%│ 17%│ 55%│
-│ 56 to 65 │ 16%│ 41%│ 19%│ 59%│
-│ 66 or older │ 17%│ 36%│ 24%│ 64%│
-└───────────────────────┴────────┴───────────┴────────┴───────────┘
-```
-
-In addition to the standard formats, `CTABLES` allows the user to
-specify the following special formats:
-
-|Format|Description|Positive Example|Negative Example|
-|:-----|:----------|-------:|-------:|
-|`NEGPARENw.d`|Encloses negative numbers in parentheses.|42.96|(42.96)|
-|`NEQUALw.d`|Adds a `N=` prefix.|N=42.96|N=-42.96|
-|`PARENw.d`|Encloses all numbers in parentheses.|(42.96)|(-42.96)|
-|`PCTPARENw.d`|Encloses all numbers in parentheses with a `%` suffix.|(42.96%)|(-42.96%)|
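-
-For example, a hypothetical specification that displays the mean with
-negative values enclosed in parentheses:
-
-```
-CTABLES /TABLE=balance [MEAN 'Mean Balance' NEGPAREN8.2].
-```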
-
-Parentheses provide a shorthand to apply summary specifications to
-multiple variables. For example, both of these commands:
-
-```
-CTABLES /TABLE=ageGroup[COLPCT] + membersOver16[COLPCT] BY gender.
-CTABLES /TABLE=(ageGroup + membersOver16)[COLPCT] BY gender.
-```
-
-produce the same output shown below:
-
-```
- Custom Tables
-┌─────────────────────────────────────────────────────────────┬───────────────┐
-│ │ S3a. GENDER: │
-│ ├───────┬───────┤
-│ │ Male │ Female│
-│ ├───────┼───────┤
-│ │ Column│ Column│
-│ │ % │ % │
-├─────────────────────────────────────────────────────────────┼───────┼───────┤
-│Age group 15 or │ .0%│ .0%│
-│ younger │ │ │
-│ 16 to 25 │ 19.0%│ 13.1%│
-│ 26 to 35 │ 15.2%│ 12.7%│
-│ 36 to 45 │ 15.6%│ 14.2%│
-│ 46 to 55 │ 16.8%│ 16.8%│
-│ 56 to 65 │ 16.5%│ 18.9%│
-│ 66 or older│ 17.0%│ 24.4%│
-├─────────────────────────────────────────────────────────────┼───────┼───────┤
-│S1. Including yourself, how many members of this None │ .0%│ .0%│
-│household are age 16 or older? 1 │ 21.4%│ 35.0%│
-│ 2 │ 61.9%│ 52.3%│
-│ 3 │ 11.0%│ 8.2%│
-│ 4 │ 4.2%│ 3.2%│
-│ 5 │ 1.1%│ .9%│
-│ 6 or more │ .4%│ .4%│
-└─────────────────────────────────────────────────────────────┴───────┴───────┘
-```
-
-The following sections list the available summary functions. Each
-function's name is followed by its default label and format. If no
-format is listed, then the default format is the print format for the
-variable being summarized.
-
-### Summary Functions for Individual Cells
-
-This section lists the summary functions that consider only an
-individual cell in `CTABLES`. Only one such summary function, `COUNT`,
-may be applied to both categorical and scale variables:
-
-* `COUNT` ("Count", F40.0)
- The sum of weights in a cell.
-
- If `CATEGORIES` for one or more of the variables in a table include
- missing values (see [Per-Variable Category
- Options](#per-variable-category-options)), then some or all of the
- categories for a cell might be missing values. `COUNT` counts data
- included in a cell regardless of whether its categories are missing.
-
-The following summary functions apply only to scale variables or
-totals and subtotals for categorical variables. Be cautious about
-interpreting the summary value in the latter case, because it is not
-necessarily meaningful; however, the mean of a Likert scale, for
-example, may have a straightforward interpretation.
-
-* `MAXIMUM` ("Maximum")
- The largest value.
-
-* `MEAN` ("Mean")
- The mean.
-
-* `MEDIAN` ("Median")
- The median value.
-
-* `MINIMUM` ("Minimum")
- The smallest value.
-
-* `MISSING` ("Missing")
- Sum of weights of user- and system-missing values.
-
-* `MODE` ("Mode")
- The highest-frequency value. Ties are broken by taking the
- smallest mode.
-
-* `PTILE` n ("Percentile n")
- The Nth percentile, where 0 ≤ N ≤ 100.
-
-* `RANGE` ("Range")
- The maximum minus the minimum.
-
-* `SEMEAN` ("Std Error of Mean")
- The standard error of the mean.
-
-* `STDDEV` ("Std Deviation")
- The standard deviation.
-
-* `SUM` ("Sum")
- The sum.
-
-* `TOTALN` ("Total N", F40.0)
- The sum of weights in a cell.
-
- For scale data, `COUNT` and `TOTALN` are the same.
-
- For categorical data, `TOTALN` counts missing values in excluded
- categories, that is, user-missing values not in an explicit category
- list on `CATEGORIES` (see [Per-Variable Category
- Options](#per-variable-category-options)), or user-missing values
- excluded because `MISSING=EXCLUDE` is in effect on `CATEGORIES`, or
- system-missing values. `COUNT` does not count these.
-
- See [Missing Values for Summary
- Variables](#missing-values-for-summary-variables), for details of
- how `CTABLES` summarizes missing values.
-
-* `VALIDN` ("Valid N", F40.0)
- The sum of valid count weights in included categories.
-
- For categorical variables, `VALIDN` does not count missing values
- regardless of whether they are in included categories via
- `CATEGORIES`. `VALIDN` does not count valid values that are in
- excluded categories. See [Missing Values for Summary
- Variables](#missing-values-for-summary-variables) for details.
-
-* `VARIANCE` ("Variance")
- The variance.
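-
-Several of these functions may be requested at once for a single scale
-variable. As a sketch (output not shown), the following requests the
-median, 90th percentile, and range of `age` within each gender:
-
-```
-CTABLES /TABLE=age [MEDIAN, PTILE 90, RANGE] BY gender.
-```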
-
-### Summary Functions for Groups of Cells
-
-These summary functions summarize over multiple cells within an area of
-the output chosen by the user and specified as part of the function
-name. The following basic AREAs are supported, in decreasing order of
-size:
-
-* `TABLE`
- A "section". Stacked variables divide sections of the output from
- each other. Sections may span multiple layers.
-
-* `LAYER`
- A section within a single layer.
-
-* `SUBTABLE`
- A "subtable", whose contents are the cells that pair an innermost
- row variable and an innermost column variable within a single
- layer.
-
-The following shows how the output for the table expression
-`hasBeenPassengerOfDesignatedDriver > hasBeenPassengerOfDrunkDriver BY
-isLicensedDriver > hasHostedEventWithAlcohol + hasBeenDesignatedDriver
-BY gender`[^1] is divided up into `TABLE`, `LAYER`, and `SUBTABLE`
-areas. Each unique value for Table ID is one section, and similarly
-for Layer ID and Subtable ID. Thus, this output has two `TABLE` areas
-(one for `isLicensedDriver` and one for `hasBeenDesignatedDriver`),
-four `LAYER` areas (for those two variables, per layer), and 12
-`SUBTABLE` areas.
-
-```
- Custom Tables
-Male
-┌─────────────────────────────────┬─────────────────┬──────┐
-│ │ licensed │desDrv│
-│ ├────────┬────────┼───┬──┤
-│ │ Yes │ No │ │ │
-│ ├────────┼────────┤ │ │
-│ │ hostAlc│ hostAlc│ │ │
-│ ├────┬───┼────┬───┤ │ │
-│ │ Yes│ No│ Yes│ No│Yes│No│
-├─────────────────────────────────┼────┼───┼────┼───┼───┼──┤
-│desPas Yes druPas Yes Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Subtable ID│ 1│ 1│ 2│ 2│ 3│ 3│
-│ ────────────────┼────┼───┼────┼───┼───┼──┤
-│ No Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Subtable ID│ 1│ 1│ 2│ 2│ 3│ 3│
-│ ───────────────────────────┼────┼───┼────┼───┼───┼──┤
-│ No druPas Yes Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Subtable ID│ 4│ 4│ 5│ 5│ 6│ 6│
-│ ────────────────┼────┼───┼────┼───┼───┼──┤
-│ No Table ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Layer ID │ 1│ 1│ 1│ 1│ 2│ 2│
-│ Subtable ID│ 4│ 4│ 5│ 5│ 6│ 6│
-└─────────────────────────────────┴────┴───┴────┴───┴───┴──┘
-```
-
-`CTABLES` also supports the following AREAs that further divide a
-subtable or a layer within a section:
-
-* `LAYERROW`
- `LAYERCOL`
- A row or column, respectively, in one layer of a section.
-
-* `ROW`
- `COL`
- A row or column, respectively, in a subtable.
-
-The following summary functions for groups of cells are available for
-each AREA described above, for both categorical and scale variables:
-
-* `areaPCT` or `areaPCT.COUNT` ("Area %", PCT40.1)
- A percentage of total counts within AREA.
-
-* `areaPCT.VALIDN` ("Area Valid N %", PCT40.1)
- A percentage of total counts for valid values within AREA.
-
-* `areaPCT.TOTALN` ("Area Total N %", PCT40.1)
- A percentage of total counts for all values within AREA.
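-
-The area name is spelled out as part of the function name itself. For
-example (output not shown), the following sketch requests counts as
-percentages of each subtable:
-
-```
-CTABLES /TABLE=ageGroup BY gender [SUBTABLEPCT.COUNT].
-```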
-
-Scale variables and totals and subtotals for categorical variables
-may use the following additional group cell summary function:
-
-* `areaPCT.SUM` ("Area Sum %", PCT40.1)
- Percentage of the sum of the values within AREA.
-
-
-[^1]: This is not necessarily a meaningful table. To make it easier to
-read, short variable labels are used.
-
-### Summary Functions for Adjusted Weights
-
-If the `WEIGHT` subcommand specified an [effective weight
-variable](#effective-weight), then the following summary functions use
-its value instead of the dictionary weight variable. Otherwise, they
-are equivalent to the summary function without the `E`-prefix:
-
-- `ECOUNT` ("Adjusted Count", F40.0)
-
-- `ETOTALN` ("Adjusted Total N", F40.0)
-
-- `EVALIDN` ("Adjusted Valid N", F40.0)
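-
-As a sketch (output not shown), the following displays ordinary and
-adjusted counts side by side. It assumes a variable named
-`finalWeight` (a hypothetical name) is suitable as the effective
-weight variable named on the `WEIGHT` subcommand:
-
-```
-CTABLES /WEIGHT VARIABLE=finalWeight
-        /TABLE=ageGroup [COUNT, ECOUNT].
-```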
-
-### Unweighted Summary Functions
-
-The following summary functions with a `U`-prefix are equivalent to the
-same ones without the prefix, except that they use unweighted counts:
-
-- `UCOUNT` ("Unweighted Count", F40.0)
-
-- `UareaPCT` or `UareaPCT.COUNT` ("Unweighted Area %", PCT40.1)
-
-- `UareaPCT.VALIDN` ("Unweighted Area Valid N %", PCT40.1)
-
-- `UareaPCT.TOTALN` ("Unweighted Area Total N %", PCT40.1)
-
-- `UMEAN` ("Unweighted Mean")
-
-- `UMEDIAN` ("Unweighted Median")
-
-- `UMISSING` ("Unweighted Missing")
-
-- `UMODE` ("Unweighted Mode")
-
-- `UareaPCT.SUM` ("Unweighted Area Sum %", PCT40.1)
-
-- `UPTILE` n ("Unweighted Percentile n")
-
-- `USEMEAN` ("Unweighted Std Error of Mean")
-
-- `USTDDEV` ("Unweighted Std Deviation")
-
-- `USUM` ("Unweighted Sum")
-
-- `UTOTALN` ("Unweighted Total N", F40.0)
-
-- `UVALIDN` ("Unweighted Valid N", F40.0)
-
-- `UVARIANCE` ("Unweighted Variance", F40.0)
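-
-Displaying a weighted summary next to its unweighted counterpart can
-show how much the weighting influences the results. A minimal sketch
-(output not shown):
-
-```
-CTABLES /TABLE=ageGroup [COUNT, UCOUNT] BY gender.
-```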
-
-## Statistics Positions and Labels
-
-```
-/SLABELS
- [POSITION={COLUMN | ROW | LAYER}]
- [VISIBLE={YES | NO}]
-```
-
-The `SLABELS` subcommand controls the position and visibility of
-summary statistics for the `TABLE` subcommand that it follows.
-
-`POSITION` sets the axis on which summary statistics appear. With
-`POSITION=COLUMN`, which is the default, each summary statistic appears in
-a column. For example:
-
-```
-CTABLES /TABLE=age [MEAN, MEDIAN] BY gender.
-```
-
-```
- Custom Tables
-+──────────────────────────+───────────────────────+
-│ │ S3a. GENDER: │
-│ +───────────+───────────+
-│ │ Male │ Female │
-│ +────+──────+────+──────+
-│ │Mean│Median│Mean│Median│
-+──────────────────────────+────+──────+────+──────+
-│D1. AGE: What is your age?│ 46│ 45│ 50│ 52│
-+──────────────────────────+────+──────+────+──────+
-```
-
-
-With `POSITION=ROW`, each summary statistic appears in a row, as shown
-below:
-
-```
-CTABLES /TABLE=age [MEAN, MEDIAN] BY gender /SLABELS POSITION=ROW.
-```
-
-```
- Custom Tables
-+─────────────────────────────────+─────────────+
-│ │ S3a. GENDER:│
-│ +─────+───────+
-│ │ Male│ Female│
-+─────────────────────────────────+─────+───────+
-│D1. AGE: What is your age? Mean │ 46│ 50│
-│ Median│ 45│ 52│
-+─────────────────────────────────+─────+───────+
-```
-
-
-`POSITION=LAYER` is also available to place each summary statistic in a
-separate layer.
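-
-For instance, the following sketch (output not shown) places the mean
-and the median in separate layers:
-
-```
-CTABLES /TABLE=age [MEAN, MEDIAN] BY gender /SLABELS POSITION=LAYER.
-```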
-
-Labels for summary statistics are shown by default. Use `VISIBLE=NO`
-to suppress them. Because unlabeled data can cause confusion,
-suppressing labels should be considered only when the meaning of the
-data is evident, as in a simple case like this:
-
-```
-CTABLES /TABLE=ageGroup [TABLEPCT] /SLABELS VISIBLE=NO.
-```
-
-```
- Custom Tables
-+───────────────────────+─────+
-│Age group 15 or younger│ .0%│
-│ 16 to 25 │15.7%│
-│ 26 to 35 │13.8%│
-│ 36 to 45 │14.8%│
-│ 46 to 55 │16.8%│
-│ 56 to 65 │17.8%│
-│ 66 or older │21.1%│
-+───────────────────────+─────+
-```
-
-
-## Category Label Positions
-
-```
-/CLABELS {AUTO │ {ROWLABELS│COLLABELS}={OPPOSITE│LAYER}}
-```
-
-The `CLABELS` subcommand controls the position of category labels for
-the `TABLE` subcommand that it follows. By default, or if `AUTO` is
-specified, category labels for a given variable nest inside the
-variable's label on the same axis. For example, the command below
-results in age categories nesting within the age group variable on the
-rows axis and gender categories within the gender variable on the
-columns axis:
-
-```
-CTABLES /TABLE ageGroup BY gender.
-```
-
-```
- Custom Tables
-+───────────────────────+────────────+
-│ │S3a. GENDER:│
-│ +─────+──────+
-│ │ Male│Female│
-│ +─────+──────+
-│ │Count│ Count│
-+───────────────────────+─────+──────+
-│Age group 15 or younger│ 0│ 0│
-│ 16 to 25 │ 594│ 505│
-│ 26 to 35 │ 476│ 491│
-│ 36 to 45 │ 489│ 548│
-│ 46 to 55 │ 526│ 649│
-│ 56 to 65 │ 516│ 731│
-│ 66 or older │ 531│ 943│
-+───────────────────────+─────+──────+
-```
-
-
-`ROWLABELS=OPPOSITE` or `COLLABELS=OPPOSITE` moves row or column variable
-category labels, respectively, to the opposite axis. The setting
-affects only the innermost variable or variables, which must be
-categorical, on the given axis. For example:
-
-```
-CTABLES /TABLE ageGroup BY gender /CLABELS ROWLABELS=OPPOSITE.
-CTABLES /TABLE ageGroup BY gender /CLABELS COLLABELS=OPPOSITE.
-```
-
-```
- Custom Tables
-+─────+──────────────────────────────────────────────────────────────────────
-│ │ S3a. GENDER:
-│ +───────────────────────────────────────────+──────────────────────────
-│ │ Male │ Female
-│ +───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
-│ │ 15 or │16 to│26 to│36 to│46 to│56 to│66 or│ 15 or │16 to│26 to│36 to│
-│ │younger│ 25 │ 35 │ 45 │ 55 │ 65 │older│younger│ 25 │ 35 │ 45 │
-│ +───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
-│ │ Count │Count│Count│Count│Count│Count│Count│ Count │Count│Count│Count│
-+─────+───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
-│Age │ 0│ 594│ 476│ 489│ 526│ 516│ 531│ 0│ 505│ 491│ 548│
-│group│ │ │ │ │ │ │ │ │ │ │ │
-+─────+───────+─────+─────+─────+─────+─────+─────+───────+─────+─────+─────+
-
-+─────+─────────────────+
-│ │ │
-│ +─────────────────+
-│ │ │
-│ +─────+─────+─────+
-│ │46 to│56 to│66 or│
-│ │ 55 │ 65 │older│
-│ +─────+─────+─────+
-│ │Count│Count│Count│
-+─────+─────+─────+─────+
-│Age │ 649│ 731│ 943│
-│group│ │ │ │
-+─────+─────+─────+─────+
-
- Custom Tables
-+──────────────────────────────+────────────+
-│ │S3a. GENDER:│
-│ +────────────+
-│ │ Count │
-+──────────────────────────────+────────────+
-│Age group 15 or younger Male │ 0│
-│ Female│ 0│
-│ ─────────────────────+────────────+
-│ 16 to 25 Male │ 594│
-│ Female│ 505│
-│ ─────────────────────+────────────+
-│ 26 to 35 Male │ 476│
-│ Female│ 491│
-│ ─────────────────────+────────────+
-│ 36 to 45 Male │ 489│
-│ Female│ 548│
-│ ─────────────────────+────────────+
-│ 46 to 55 Male │ 526│
-│ Female│ 649│
-│ ─────────────────────+────────────+
-│ 56 to 65 Male │ 516│
-│ Female│ 731│
-│ ─────────────────────+────────────+
-│ 66 or older Male │ 531│
-│ Female│ 943│
-+──────────────────────────────+────────────+
-```
-
-
-`ROWLABELS=LAYER` or `COLLABELS=LAYER` move the innermost row or column
-variable category labels, respectively, to the layer axis.
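-
-For example, the following sketch (output not shown) moves the gender
-categories, which belong to the innermost column variable, to the
-layer axis:
-
-```
-CTABLES /TABLE ageGroup BY gender /CLABELS COLLABELS=LAYER.
-```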
-
-Only one axis's labels may be moved, whether to the opposite axis or
-to the layer axis.
-
-### Effect on Summary Statistics
-
-`CLABELS` primarily affects the appearance of tables, not the data
-displayed in them. However, `CLABELS` can affect the values displayed
-for statistics that summarize areas of a table, since it can change the
-definitions of these areas.
-
-For example, consider the following syntax and output:
-
-```
-CTABLES /TABLE ageGroup BY gender [ROWPCT, COLPCT].
-```
-
-```
- Custom Tables
-+───────────────────────+─────────────────────────────+
-│ │ S3a. GENDER: │
-│ +──────────────+──────────────+
-│ │ Male │ Female │
-│ +─────+────────+─────+────────+
-│ │Row %│Column %│Row %│Column %│
-+───────────────────────+─────+────────+─────+────────+
-│Age group 15 or younger│ .│ .0%│ .│ .0%│
-│ 16 to 25 │54.0%│ 19.0%│46.0%│ 13.1%│
-│ 26 to 35 │49.2%│ 15.2%│50.8%│ 12.7%│
-│ 36 to 45 │47.2%│ 15.6%│52.8%│ 14.2%│
-│ 46 to 55 │44.8%│ 16.8%│55.2%│ 16.8%│
-│ 56 to 65 │41.4%│ 16.5%│58.6%│ 18.9%│
-│ 66 or older │36.0%│ 17.0%│64.0%│ 24.4%│
-+───────────────────────+─────+────────+─────+────────+
-```
-
-
-Using `COLLABELS=OPPOSITE` changes the definitions of rows and columns,
-so that column percentages display what were previously row percentages
-and the new row percentages become meaningless (because there is only
-one cell per row):
-
-```
-CTABLES
- /TABLE ageGroup BY gender [ROWPCT, COLPCT]
- /CLABELS COLLABELS=OPPOSITE.
-```
-
-```
- Custom Tables
-+──────────────────────────────+───────────────+
-│ │ S3a. GENDER: │
-│ +──────+────────+
-│ │ Row %│Column %│
-+──────────────────────────────+──────+────────+
-│Age group 15 or younger Male │ .│ .│
-│ Female│ .│ .│
-│ ─────────────────────+──────+────────+
-│ 16 to 25 Male │100.0%│ 54.0%│
-│ Female│100.0%│ 46.0%│
-│ ─────────────────────+──────+────────+
-│ 26 to 35 Male │100.0%│ 49.2%│
-│ Female│100.0%│ 50.8%│
-│ ─────────────────────+──────+────────+
-│ 36 to 45 Male │100.0%│ 47.2%│
-│ Female│100.0%│ 52.8%│
-│ ─────────────────────+──────+────────+
-│ 46 to 55 Male │100.0%│ 44.8%│
-│ Female│100.0%│ 55.2%│
-│ ─────────────────────+──────+────────+
-│ 56 to 65 Male │100.0%│ 41.4%│
-│ Female│100.0%│ 58.6%│
-│ ─────────────────────+──────+────────+
-│ 66 or older Male │100.0%│ 36.0%│
-│ Female│100.0%│ 64.0%│
-+──────────────────────────────+──────+────────+
-```
-
-
-### Moving Categories for Stacked Variables
-
-If `CLABELS` moves category labels from an axis with stacked
-variables, the variables that are moved must have the same category
-specifications (see [Per-Variable Category
-Options](#per-variable-category-options)) and the same value labels.
-
-The following shows both moving stacked category variables and
-adapting to the changing definitions of rows and columns:
-
-```
-CTABLES /TABLE (likelihoodOfBeingStoppedByPolice
- + likelihoodOfHavingAnAccident) [COLPCT].
-CTABLES /TABLE (likelihoodOfBeingStoppedByPolice
- + likelihoodOfHavingAnAccident) [ROWPCT]
- /CLABELS ROWLABELS=OPPOSITE.
-```
-
-```
- Custom Tables
-+─────────────────────────────────────────────────────────────────────+───────+
-│ │ Column│
-│ │ % │
-+─────────────────────────────────────────────────────────────────────+───────+
-│105b. How likely is it that drivers who have had too Almost │ 10.2%│
-│much to drink to drive safely will A. Get stopped by the certain │ │
-│police? Very likely │ 21.8%│
-│ Somewhat │ 40.2%│
-│ likely │ │
-│ Somewhat │ 19.0%│
-│ unlikely │ │
-│ Very │ 8.9%│
-│ unlikely │ │
-+─────────────────────────────────────────────────────────────────────+───────+
-│105b. How likely is it that drivers who have had too Almost │ 15.9%│
-│much to drink to drive safely will B. Have an accident? certain │ │
-│ Very likely │ 40.8%│
-│ Somewhat │ 35.0%│
-│ likely │ │
-│ Somewhat │ 6.2%│
-│ unlikely │ │
-│ Very │ 2.0%│
-│ unlikely │ │
-+─────────────────────────────────────────────────────────────────────+───────+
-
- Custom Tables
-+─────────────────────────────+────────+───────+─────────+──────────+─────────+
-│ │ Almost │ Very │ Somewhat│ Somewhat │ Very │
-│ │ certain│ likely│ likely │ unlikely │ unlikely│
-│ +────────+───────+─────────+──────────+─────────+
-│ │ Row % │ Row % │ Row % │ Row % │ Row % │
-+─────────────────────────────+────────+───────+─────────+──────────+─────────+
-│105b. How likely is it that │ 10.2%│ 21.8%│ 40.2%│ 19.0%│ 8.9%│
-│drivers who have had too much│ │ │ │ │ │
-│to drink to drive safely will│ │ │ │ │ │
-│A. Get stopped by the police?│ │ │ │ │ │
-│105b. How likely is it that │ 15.9%│ 40.8%│ 35.0%│ 6.2%│ 2.0%│
-│drivers who have had too much│ │ │ │ │ │
-│to drink to drive safely will│ │ │ │ │ │
-│B. Have an accident? │ │ │ │ │ │
-+─────────────────────────────+────────+───────+─────────+──────────+─────────+
-```
-
-
-## Per-Variable Category Options
-
-```
-/CATEGORIES VARIABLES=variables
- {[value, value...]
- | [ORDER={A | D}]
- [KEY={VALUE | LABEL | summary(variable)}]
- [MISSING={EXCLUDE | INCLUDE}]}
- [TOTAL={NO | YES} [LABEL=string] [POSITION={AFTER | BEFORE}]]
- [EMPTY={INCLUDE | EXCLUDE}]
-```
-
-The `CATEGORIES` subcommand specifies, for one or more categorical
-variables, the categories to include and exclude, the sort order for
-included categories, and treatment of missing values. It also controls
-the totals and subtotals to display. It may be specified any number of
-times, each time for a different set of variables. `CATEGORIES` applies
-to the table produced by the `TABLE` subcommand that it follows.
-
-`CATEGORIES` does not apply to scale variables.
-
-`VARIABLES` is required and must list the variables for the subcommand
-to affect.
-
-The syntax may specify the categories to include and their sort order
-either explicitly or implicitly. The following sections give the
-details of each form of syntax, followed by information on totals and
-subtotals and the `EMPTY` setting.
-
-### Explicit Categories
-
-To explicitly specify the categories to include, list them within
-square brackets in the desired sort order. Use spaces or commas to
-separate values. Categories not covered by the list are excluded from
-analysis.
-
-Each element of the list takes one of the following forms:
-
-* `number`
- `'string'`
- A numeric or string category value, for variables that have the
- corresponding type.
-
-* `'date'`
- `'time'`
- A date or time category value, for variables that have a date or
- time print format.
-
-* `min THRU max`
- `LO THRU max`
- `min THRU HI`
- A range of category values, where `min` and `max` each takes one of
- the forms above, in increasing order.
-
-* `MISSING`
- All user-missing values. (To match individual user-missing values,
- specify their category values.)
-
-* `OTHERNM`
- Any non-missing value not covered by any other element of the list
- (regardless of where `OTHERNM` is placed in the list).
-
-* `&postcompute`
- A [computed category name](#computed-categories).
-
-* `SUBTOTAL`
- `HSUBTOTAL`
- A [subtotal](#totals-and-subtotals).
-
-If multiple elements of the list cover a given category, the last one
-in the list takes precedence.
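-
-As a sketch of this precedence rule (output not shown), in the
-following list the value 3 is covered both by the range and by the
-later individual element, so the later element wins and 3 appears
-after the rest of the range:
-
-```
-CTABLES /TABLE freqOfDriving
-        /CATEGORIES VARIABLES=freqOfDriving [1 THRU 5, 3].
-```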
-
-The following example syntax and output show how an explicit category
-list can limit the displayed categories:
-
-```
-CTABLES /TABLE freqOfDriving.
-CTABLES /TABLE freqOfDriving /CATEGORIES VARIABLES=freqOfDriving [1, 2, 3].
-```
-
-```
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│ 1. How often do you usually drive a car or other Every day │ 4667│
-│motor vehicle? Several days a week │ 1274│
-│ Once a week or less │ 361│
-│ Only certain times a│ 130│
-│ year │ │
-│ Never │ 540│
-+───────────────────────────────────────────────────────────────────────+─────+
-
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│ 1. How often do you usually drive a car or other Every day │ 4667│
-│motor vehicle? Several days a │ 1274│
-│ week │ │
-│ Once a week or │ 361│
-│ less │ │
-+───────────────────────────────────────────────────────────────────────+─────+
-```
-
-
-### Implicit Categories
-
-In the absence of an explicit list of categories, `CATEGORIES` allows
-`KEY`, `ORDER`, and `MISSING` to specify how to select and sort
-categories.
-
-The `KEY` setting specifies the sort key. By default, or with
-`KEY=VALUE`, categories are sorted by value. Categories may also be
-sorted by value label, with `KEY=LABEL`, or by the value of a summary
-function, e.g. `KEY=COUNT`.
-
-By default, or with `ORDER=A`, categories are sorted in ascending
-order. Specify `ORDER=D` to sort in descending order.
-
-User-missing values are excluded by default, or with
-`MISSING=EXCLUDE`. Specify `MISSING=INCLUDE` to include user-missing
-values. The system-missing value is always excluded.
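-
-For instance, the following sketch (output not shown) sorts the
-categories of `freqOfDriving` in descending order of count, so that
-the most common response is listed first:
-
-```
-CTABLES /TABLE freqOfDriving
-        /CATEGORIES VARIABLES=freqOfDriving KEY=COUNT ORDER=D.
-```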
-
-The following example syntax and output show how `MISSING=INCLUDE`
-causes missing values to be included in a category list:
-
-```
-CTABLES /TABLE freqOfDriving.
-CTABLES /TABLE freqOfDriving
- /CATEGORIES VARIABLES=freqOfDriving MISSING=INCLUDE.
-```
-
-```
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│ 1. How often do you usually drive a car or other Every day │ 4667│
-│motor vehicle? Several days a week │ 1274│
-│ Once a week or less │ 361│
-│ Only certain times a│ 130│
-│ year │ │
-│ Never │ 540│
-+───────────────────────────────────────────────────────────────────────+─────+
-
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│ 1. How often do you usually drive a car or other Every day │ 4667│
-│motor vehicle? Several days a week │ 1274│
-│ Once a week or less │ 361│
-│ Only certain times a│ 130│
-│ year │ │
-│ Never │ 540│
-│ Don't know │ 8│
-│ Refused │ 19│
-+───────────────────────────────────────────────────────────────────────+─────+
-```
-
-
-### Totals and Subtotals
-
-`CATEGORIES` also controls display of totals and subtotals. By default,
-or with `TOTAL=NO`, totals are not displayed. Use `TOTAL=YES` to
-display a total. By default, the total is labeled "Total"; use
-`LABEL="label"` to override it.
-
-Subtotals are also not displayed by default. To add one or more
-subtotals, use an explicit category list and insert `SUBTOTAL` or
-`HSUBTOTAL` in the position or positions where the subtotal should
-appear. The subtotal becomes an extra row or column or layer.
-`HSUBTOTAL` additionally hides the categories that make up the
-subtotal. Either way, the default label is "Subtotal"; use
-`SUBTOTAL="label"` or `HSUBTOTAL="label"` to specify a custom label.
-
-The following example syntax and output show how to use `TOTAL=YES`
-and `SUBTOTAL`:
-
-```
-CTABLES
- /TABLE freqOfDriving
- /CATEGORIES VARIABLES=freqOfDriving [OTHERNM, SUBTOTAL='Valid Total',
- MISSING, SUBTOTAL='Missing Total']
- TOTAL=YES LABEL='Overall Total'.
-```
-
-```
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│ 1. How often do you usually drive a car or other Every day │ 4667│
-│motor vehicle? Several days a week │ 1274│
-│ Once a week or less │ 361│
-│ Only certain times a│ 130│
-│ year │ │
-│ Never │ 540│
-│ Valid Total │ 6972│
-│ Don't know │ 8│
-│ Refused │ 19│
-│ Missing Total │ 27│
-│ Overall Total │ 6999│
-+───────────────────────────────────────────────────────────────────────+─────+
-```
-
-
-By default, or with `POSITION=AFTER`, totals are displayed in the
-output after the last category and subtotals apply to categories that
-precede them. With `POSITION=BEFORE`, totals come before the first
-category and subtotals apply to categories that follow them.
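-
-For example, the following sketch (output not shown) uses
-`POSITION=BEFORE` to place the total row above the first category:
-
-```
-CTABLES /TABLE freqOfDriving
-        /CATEGORIES VARIABLES=freqOfDriving TOTAL=YES POSITION=BEFORE.
-```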
-
-Only categorical variables may have totals and subtotals. Scale
-variables may be "totaled" indirectly by enabling totals and subtotals
-on a categorical variable within which the scalar variable is
-summarized. For example, the following syntax produces a mean, count,
-and valid count across all data by adding a total on the categorical
-`region` variable, as shown:
-
-```
-CTABLES /TABLE=region > monthDaysMin1drink [MEAN, COUNT, VALIDN]
- /CATEGORIES VARIABLES=region TOTAL=YES LABEL='All regions'.
-```
-
-```
- Custom Tables
-+───────────────────────────────────────────────────────────+────+─────+──────+
-│ │ │ │ Valid│
-│ │Mean│Count│ N │
-+───────────────────────────────────────────────────────────+────+─────+──────+
-│20. On how many of the thirty days in this Region NE │ 5.6│ 1409│ 945│
-│typical month did you have one or more MW │ 5.0│ 1654│ 1026│
-│alcoholic beverages to drink? S │ 6.0│ 2390│ 1285│
-│ W │ 6.5│ 1546│ 953│
-│ All │ 5.8│ 6999│ 4209│
-│ regions │ │ │ │
-+───────────────────────────────────────────────────────────+────+─────+──────+
-```
-
-
-By default, PSPP uses the same summary functions for totals and
-subtotals as other categories. To summarize totals and subtotals
-differently, specify the summary functions for totals and subtotals
-after the ordinary summary functions inside a nested set of `[]`
-following `TOTALS`. For example, the following syntax displays `COUNT`
-for individual categories, and both `COUNT` and `VALIDN` for totals, as shown:
-
-```
-CTABLES
- /TABLE isLicensedDriver [COUNT, TOTALS[COUNT, VALIDN]]
- /CATEGORIES VARIABLES=isLicensedDriver TOTAL=YES MISSING=INCLUDE.
-```
-
-```
- Custom Tables
-+────────────────────────────────────────────────────────────────+─────+──────+
-│ │ │ Valid│
-│ │Count│ N │
-+────────────────────────────────────────────────────────────────+─────+──────+
-│D7a. Are you a licensed driver; that is, do you have a Yes │ 6379│ │
-│valid driver's license? No │ 572│ │
-│ Don't │ 4│ │
-│ know │ │ │
-│ Refused │ 44│ │
-│ Total │ 6999│ 6951│
-+────────────────────────────────────────────────────────────────+─────+──────+
-```
-
-
-### Categories Without Values
-
-Some categories might not be included in the data set being analyzed.
-For example, our example data set has no cases in the "15 or younger"
-age group. By default, or with `EMPTY=INCLUDE`, PSPP includes these
-empty categories in output tables. To exclude them, specify
-`EMPTY=EXCLUDE`.
-
-For implicit categories, empty categories potentially include all the
-values with value labels for a given variable; for explicit categories,
-they include all the values listed individually and all values with
-value labels that are covered by ranges or `MISSING` or `OTHERNM`.
-
-The following example syntax and output show the effect of
-`EMPTY=EXCLUDE` for the `membersOver16` variable, in which 0 is labeled
-"None" but no cases exist with that value:
-
-```
-CTABLES /TABLE=membersOver16.
-CTABLES /TABLE=membersOver16 /CATEGORIES VARIABLES=membersOver16 EMPTY=EXCLUDE.
-```
-
-```
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│S1. Including yourself, how many members of this household are None │ 0│
-│age 16 or older? 1 │ 1586│
-│ 2 │ 3031│
-│ 3 │ 505│
-│ 4 │ 194│
-│ 5 │ 55│
-│ 6 or │ 21│
-│ more │ │
-+───────────────────────────────────────────────────────────────────────+─────+
-
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│S1. Including yourself, how many members of this household are 1 │ 1586│
-│age 16 or older? 2 │ 3031│
-│ 3 │ 505│
-│ 4 │ 194│
-│ 5 │ 55│
-│ 6 or │ 21│
-│ more │ │
-+───────────────────────────────────────────────────────────────────────+─────+
-```
-
-
-## Titles
-
-```
-/TITLES
- [TITLE=string...]
- [CAPTION=string...]
- [CORNER=string...]
-```
-
-The `TITLES` subcommand sets the title, caption, and corner text for
-the table output for the previous `TABLE` subcommand. Any number of
-strings may be specified for each kind of text, with each string
-appearing on a separate line in the output. The title appears above the
-table, the caption below the table, and the corner text appears in the
-table's upper left corner. By default, the title is "Custom Tables" and
-the caption and corner text are empty. With some table output styles,
-the corner text is not displayed.
-
-The strings provided in this subcommand may contain the following
-macro-like keywords that PSPP substitutes at the time that it runs the
-command:
-
-* `)DATE`
- The current date, e.g. MM/DD/YY. The format is locale-dependent.
-
-* `)TIME`
- The current time, e.g. HH:MM:SS. The format is locale-dependent.
-
-* `)TABLE`
- The expression specified on the `TABLE` command. Summary and
- measurement level specifications are omitted, and variable labels
- are used in place of variable names.
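-
-As a sketch (output not shown), the following sets a title that
-incorporates the table expression and a caption that records when the
-table was created:
-
-```
-CTABLES /TABLE=ageGroup BY gender
-        /TITLES TITLE='Counts for )TABLE'
-                CAPTION='Produced )DATE at )TIME'.
-```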
-
-## Table Formatting
-
-```
-/FORMAT
- [MINCOLWIDTH={DEFAULT | width}]
- [MAXCOLWIDTH={DEFAULT | width}]
- [UNITS={POINTS | INCHES | CM}]
- [EMPTY={ZERO | BLANK | string}]
- [MISSING=string]
-```
-
-The `FORMAT` subcommand, which must precede the first `TABLE`
-subcommand, controls formatting for all the output tables. `FORMAT` and
-all of its settings are optional.
-
-Use `MINCOLWIDTH` and `MAXCOLWIDTH` to control the minimum or maximum
-width of columns in output tables. By default, with `DEFAULT`, column
-width varies based on content. Otherwise, specify a number for either
-or both of these settings. If both are specified, `MAXCOLWIDTH` must be
-greater than or equal to `MINCOLWIDTH`. By default, or with
-`UNITS=POINTS`, widths are measured in points (1/72 inch); specify
-`UNITS=INCHES` to measure in inches or `UNITS=CM` to measure in
-centimeters. PSPP does not currently honor any of these settings.
-
-By default, or with `EMPTY=ZERO`, zero values are displayed in their
-usual format. Use `EMPTY=BLANK` to use an empty cell instead, or
-`EMPTY="string"` to use the specified string.
-
-By default, missing values are displayed as `.`, the same as in other
-tables. Specify `MISSING="string"` to instead use a custom string.
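-
-For example, the following sketch (output not shown) leaves empty
-cells blank and displays missing summary values as "n/a":
-
-```
-CTABLES /FORMAT EMPTY=BLANK MISSING='n/a'
-        /TABLE=ageGroup BY gender.
-```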
-
-## Display of Variable Labels
-
-```
-/VLABELS
- VARIABLES=variables
- DISPLAY={DEFAULT | NAME | LABEL | BOTH | NONE}
-```
-
-The `VLABELS` subcommand, which must precede the first `TABLE`
-subcommand, controls display of variable labels in all the output
-tables. `VLABELS` is optional. It may appear multiple times to adjust
-settings for different variables.
-
-`VARIABLES` and `DISPLAY` are required. The value of `DISPLAY`
-controls how variable labels are displayed for the variables listed on
-`VARIABLES`. The supported values are:
-
-* `DEFAULT`
- Use the setting from [`SET TVARS`](../utilities/set.md#tvars).
-
-* `NAME`
- Show only a variable name.
-
-* `LABEL`
- Show only a variable label.
-
-* `BOTH`
- Show variable name and label.
-
-* `NONE`
- Show nothing.
-
-## Missing Value Treatment
-
-The `TABLE` subcommand on `CTABLES` specifies two different kinds of
-variables: variables that divide tables into cells (which are always
-categorical) and variables being summarized (which may be categorical or
-scale). PSPP treats missing values differently in each kind of
-variable, as described in the sections below.
-
-### Missing Values for Cell-Defining Variables
-
-For variables that divide tables into cells, per-variable category
-options, as described in [Per-Variable Category
-Options](#per-variable-category-options), determine which data is
-analyzed. If any of the categories for such a variable would exclude
-a case, then that case is not included.
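This rule can be sketched in Python (not PSPP syntax), using the same user-missing value 9 as the artificial dataset below; the `included` helper is purely illustrative:

```python
# Illustrative sketch: a case is analyzed only if every cell-defining
# variable falls in an included category.  Value 9 is user-missing and
# excluded unless MISSING=INCLUDE is given for that variable.
cases = [{"x": 1, "y": 1}, {"x": 1, "y": 9}, {"x": 9, "y": 2}]

def included(case, include_missing=()):
    missing = {9}            # user-missing values in this toy dataset
    return all(case[var] not in missing or var in include_missing
               for var in case)

analyzed = [c for c in cases if included(c)]
with_y = [c for c in cases if included(c, include_missing={"y"})]
```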
-
-As an example, consider the following entirely artificial dataset, in
-which `x` and `y` are categorical variables with missing value 9, and
-`z` is scale:
-
-```
- Data List
-+─+─+─────────+
-│x│y│ z │
-+─+─+─────────+
-│1│1│ 1│
-│1│2│ 10│
-│1│9│ 100│
-│2│1│ 1000│
-│2│2│ 10000│
-│2│9│ 100000│
-│9│1│ 1000000│
-│9│2│ 10000000│
-│9│9│100000000│
-+─+─+─────────+
-```
-
-
-Using `x` and `y` to define cells, and summarizing `z`, by default
-PSPP omits all the cases that have `x` or `y` (or both) missing:
-
-```
-CTABLES /TABLE x > y > z [SUM].
-```
-
-```
- Custom Tables
-+─────────+─────+
-│ │ Sum │
-+─────────+─────+
-│x 1 y 1 z│ 1│
-│ ────+─────+
-│ 2 z│ 10│
-│ ────────+─────+
-│ 2 y 1 z│ 1000│
-│ ────+─────+
-│ 2 z│10000│
-+─────────+─────+
-```
-
-
-If, however, we add `CATEGORIES` specifications to include missing
-values for `y` or for `x` and `y`, the output table includes them, like
-so:
-
-```
-CTABLES /TABLE x > y > z [SUM] /CATEGORIES VARIABLES=y MISSING=INCLUDE.
-CTABLES /TABLE x > y > z [SUM] /CATEGORIES VARIABLES=x y MISSING=INCLUDE.
-```
-
-```
- Custom Tables
-+─────────+──────+
-│ │ Sum │
-+─────────+──────+
-│x 1 y 1 z│ 1│
-│ ────+──────+
-│ 2 z│ 10│
-│ ────+──────+
-│ 9 z│ 100│
-│ ────────+──────+
-│ 2 y 1 z│ 1000│
-│ ────+──────+
-│ 2 z│ 10000│
-│ ────+──────+
-│ 9 z│100000│
-+─────────+──────+
-
- Custom Tables
-+─────────+─────────+
-│ │ Sum │
-+─────────+─────────+
-│x 1 y 1 z│ 1│
-│ ────+─────────+
-│ 2 z│ 10│
-│ ────+─────────+
-│ 9 z│ 100│
-│ ────────+─────────+
-│ 2 y 1 z│ 1000│
-│ ────+─────────+
-│ 2 z│ 10000│
-│ ────+─────────+
-│ 9 z│ 100000│
-│ ────────+─────────+
-│ 9 y 1 z│ 1000000│
-│ ────+─────────+
-│ 2 z│ 10000000│
-│ ────+─────────+
-│ 9 z│100000000│
-+─────────+─────────+
-```
-
-
-### Missing Values for Summary Variables
-
-For summary variables, values that are valid and in included categories
-are analyzed, and values that are missing or in excluded categories are
-not analyzed, with the following exceptions:
-
-- The `VALIDN` summary functions (`VALIDN`, `EVALIDN`, `UVALIDN`,
- `areaPCT.VALIDN`, and `UareaPCT.VALIDN`) only count valid values in
- included categories (not missing values in included categories).
-
-- The `TOTALN` summary functions (`TOTALN`, `ETOTALN`, `UTOTALN`,
- `areaPCT.TOTALN`, and `UareaPCT.TOTALN`) count all values (valid
- and missing) in included categories and missing (but not valid)
- values in excluded categories.
-
-For categorical variables, system-missing values are never in included
-categories. For scale variables, there is no notion of included and
-excluded categories, so all values are effectively included.
-
-The following table provides another view of the above rules:
-
-||`VALIDN`|other|`TOTALN`|
-|:--|:--|:--|:--|
-|Categorical variables:||||
-| Valid values in included categories|yes|yes|yes|
-| Missing values in included categories|--|yes|yes|
-| Missing values in excluded categories|--|--|yes|
-| Valid values in excluded categories|--|--|--|
-|Scale variables:||||
-| Valid values|yes|yes|yes|
-| User- or system-missing values|--|yes|yes|
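The same rules can be expressed as a small Python sketch (illustrative only, not PSPP code), where each case is classified by whether its value is missing and whether its category is included:

```python
# Illustrative counting rules from the table above.  Each case is a
# tuple (value, is_missing, in_included_category).
def counts(cases):
    validn = sum(1 for _, missing, inc in cases if inc and not missing)
    other  = sum(1 for _, missing, inc in cases if inc)
    totaln = sum(1 for _, missing, inc in cases if inc or missing)
    return validn, other, totaln

cases = [
    (1, False, True),    # valid, included   -> VALIDN, other, TOTALN
    (9, True,  True),    # missing, included -> other and TOTALN
    (9, True,  False),   # missing, excluded -> TOTALN only
    (2, False, False),   # valid, excluded   -> none
]
```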
-
-
-### Scale Missing Values
-
-```
-/SMISSING {VARIABLE | LISTWISE}
-```
-
-The `SMISSING` subcommand, which must precede the first `TABLE`
-subcommand, controls treatment of missing values for scale variables in
-producing all the output tables. `SMISSING` is optional.
-
-With `SMISSING=VARIABLE`, which is the default, missing values are
-excluded on a variable-by-variable basis. With `SMISSING=LISTWISE`,
-when stacked scale variables are nested together with a categorical
-variable, a missing value for any of the scale variables causes the
-case to be excluded for all of them.
-
-As an example, consider the following dataset, in which `x` is a
-categorical variable and `y` and `z` are scale:
-
-```
- Data List
-+─+─────+─────+
-│x│ y │ z │
-+─+─────+─────+
-│1│ .│40.00│
-│1│10.00│50.00│
-│1│20.00│60.00│
-│1│30.00│ .│
-+─+─────+─────+
-```
-
-
-With the default missing-value treatment, `y`'s mean is 20, based on the
-values 10, 20, and 30, and `z`'s mean is 50, based on 40, 50, and 60:
-
-```
-CTABLES /TABLE (y + z) > x.
-```
-
-```
-Custom Tables
-+─────+─────+
-│ │ Mean│
-+─────+─────+
-│y x 1│20.00│
-+─────+─────+
-│z x 1│50.00│
-+─────+─────+
-```
-
-
-By adding `SMISSING=LISTWISE`, only cases where `y` and `z` are both
-non-missing are considered, so `y`'s mean becomes 15, as the average of
-10 and 20, and `z`'s mean becomes 55, the average of 50 and 60:
-
-```
-CTABLES /SMISSING LISTWISE /TABLE (y + z) > x.
-```
-
-```
-Custom Tables
-+─────+─────+
-│ │ Mean│
-+─────+─────+
-│y x 1│15.00│
-+─────+─────+
-│z x 1│55.00│
-+─────+─────+
-```
-
-
-Even with `SMISSING=LISTWISE`, if `y` and `z` are separately nested with
-`x`, instead of using a single `>` operator, missing values revert to
-being considered on a variable-by-variable basis:
-
-```
-CTABLES /SMISSING LISTWISE /TABLE (y > x) + (z > x).
-```
-
-```
-Custom Tables
-+─────+─────+
-│ │ Mean│
-+─────+─────+
-│y x 1│20.00│
-+─────+─────+
-│z x 1│50.00│
-+─────+─────+
-```
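The two missing-value treatments can be mimicked in Python (not PSPP syntax) for the same dataset, reproducing the means shown above:

```python
# Illustrative sketch of SMISSING=VARIABLE versus SMISSING=LISTWISE;
# None stands for a missing value.
y = [None, 10.0, 20.0, 30.0]
z = [40.0, 50.0, 60.0, None]

def mean(values):
    vals = [v for v in values if v is not None]
    return sum(vals) / len(vals)

# VARIABLE (default): each variable's missing values excluded separately.
var_means = mean(y), mean(z)

# LISTWISE: a case missing on either variable is dropped from both.
pairs = [(a, b) for a, b in zip(y, z) if a is not None and b is not None]
listwise_means = (mean([a for a, _ in pairs]),
                  mean([b for _, b in pairs]))
```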
-
-
-## Computed Categories
-
-```
-/PCOMPUTE &postcompute=EXPR(expression)
-/PPROPERTIES &postcompute...
- [LABEL=string]
- [FORMAT=[summary format]...]
- [HIDESOURCECATS={NO | YES}]
-```
-
-"Computed categories", also called "postcomputes", are categories
-created using arithmetic on categories obtained from the data. The
-`PCOMPUTE` subcommand creates a postcompute, which may then be used on
-`CATEGORIES` within an [explicit category
-list](#explicit-categories). Optionally, `PPROPERTIES` refines how
-a postcompute is displayed. The following sections provide the
-details.
-
-### PCOMPUTE
-
-```
-/PCOMPUTE &postcompute=EXPR(expression)
-```
-
-The `PCOMPUTE` subcommand, which must precede the first `TABLE`
-subcommand, defines computed categories. It is optional and may be used
-any number of times to define multiple postcomputes.
-
-Each `PCOMPUTE` defines one postcompute. Its syntax consists of a
-name to identify the postcompute as a PSPP identifier prefixed by `&`,
-followed by `=` and a postcompute expression enclosed in `EXPR(...)`. A
-postcompute expression consists of:
-
-* `[category]`
- This form evaluates to the summary statistic for the given category,
- e.g. `[1]` evaluates to the value of the summary statistic associated
- with category 1. The category may be a number, a quoted string, or
- a quoted time or date value. All of the categories for a given
- postcompute must have the same form. The category must appear in
- all the `CATEGORIES` lists in which the postcompute is used.
-
-* `[min THRU max]`
- `[LO THRU max]`
- `[min THRU HI]`
- `MISSING`
- `OTHERNM`
- These forms evaluate to the summary statistics for a category
- specified with the same syntax, as described in a previous section
- (see [Explicit Category List](#explicit-categories)). The
- category must appear in all the `CATEGORIES` lists in which the
- postcompute is used.
-
-* `SUBTOTAL`
- The summary statistic for the subtotal category. This form is
- allowed only if the `CATEGORIES` lists that include this
- postcompute have exactly one subtotal.
-
-* `SUBTOTAL[index]`
- The summary statistic for subtotal category index, where 1 is the
- first subtotal, 2 is the second, and so on. This form may be used
- for `CATEGORIES` lists with any number of subtotals.
-
-* `TOTAL`
- The summary statistic for the total. The `CATEGORIES` lists that
- include this postcompute must have a total enabled.
-
-* `a + b`
- `a - b`
- `a * b`
- `a / b`
- `a ** b`
- These forms perform arithmetic on the values of postcompute
- expressions a and b. The usual operator precedence rules apply.
-
-* `number`
- Numeric constants may be used in postcompute expressions.
-
-* `(a)`
- Parentheses override operator precedence.
-
-A postcompute is not associated with any particular variable.
-Instead, it may be referenced within `CATEGORIES` for any suitable
-variable (e.g. only a string variable is suitable for a postcompute
-expression that refers to a string category, only a variable with
-subtotals for an expression that refers to subtotals, ...).
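Conceptually, a postcompute is just arithmetic over per-category summary statistics. A hypothetical Python sketch (the `stats` dictionary and the two functions are invented for illustration, not part of PSPP):

```python
# Illustrative sketch: category references look up summary statistics
# already computed per category; ordinary arithmetic combines them.
stats = {1: 440, 2: 2305, "SUBTOTAL[1]": 2745, "TOTAL": 2949}

combined = lambda s: s[1] + s[2]                    # like EXPR([1] + [2])
pct_of_total = lambda s: s["SUBTOTAL[1]"] / s["TOTAL"] * 100
```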
-
-Normally a named postcompute is defined only once, but if a later
-`PCOMPUTE` redefines a postcompute with the same name as an earlier one,
-the later one takes precedence.
-
-The following syntax and output shows how `PCOMPUTE` can compute a
-total over subtotals, summing the "Frequent Drivers" and "Infrequent
-Drivers" subtotals to form an "All Drivers" postcompute. It also
-shows how to calculate and display a percentage, in this case the
-percentage of valid responses that report never driving. It uses
-[`PPROPERTIES`](#pproperties) to display the latter in `PCT` format.
-
-```
-CTABLES
- /PCOMPUTE &all_drivers=EXPR([1 THRU 2] + [3 THRU 4])
- /PPROPERTIES &all_drivers LABEL='All Drivers'
- /PCOMPUTE &pct_never=EXPR([5] / ([1 THRU 2] + [3 THRU 4] + [5]) * 100)
- /PPROPERTIES &pct_never LABEL='% Not Drivers' FORMAT=COUNT PCT40.1
- /TABLE=freqOfDriving BY gender
- /CATEGORIES VARIABLES=freqOfDriving
- [1 THRU 2, SUBTOTAL='Frequent Drivers',
- 3 THRU 4, SUBTOTAL='Infrequent Drivers',
- &all_drivers, 5, &pct_never,
- MISSING, SUBTOTAL='Not Drivers or Missing'].
-```
-
-```
- Custom Tables
-+────────────────────────────────────────────────────────────────+────────────+
-│ │S3a. GENDER:│
-│ +─────+──────+
-│ │ Male│Female│
-│ +─────+──────+
-│ │Count│ Count│
-+────────────────────────────────────────────────────────────────+─────+──────+
-│ 1. How often do you usually drive a car or Every day │ 2305│ 2362│
-│other motor vehicle? Several days a week │ 440│ 834│
-│ Frequent Drivers │ 2745│ 3196│
-│ Once a week or less │ 125│ 236│
-│ Only certain times a│ 58│ 72│
-│ year │ │ │
-│ Infrequent Drivers │ 183│ 308│
-│ All Drivers │ 2928│ 3504│
-│ Never │ 192│ 348│
-│ % Not Drivers │ 6.2%│ 9.0%│
-│ Don't know │ 3│ 5│
-│ Refused │ 9│ 10│
-│ Not Drivers or │ 204│ 363│
-│ Missing │ │ │
-+────────────────────────────────────────────────────────────────+─────+──────+
-```
-
-
-### PPROPERTIES
-
-```
-/PPROPERTIES &postcompute...
- [LABEL=string]
- [FORMAT=[summary format]...]
- [HIDESOURCECATS={NO | YES}]
-```
-
-The `PPROPERTIES` subcommand, which must appear before `TABLE`, sets
-properties for one or more postcomputes defined on prior `PCOMPUTE`
-subcommands. The subcommand syntax begins with the list of
-postcomputes, each prefixed with `&` as specified on `PCOMPUTE`.
-
-All of the settings on `PPROPERTIES` are optional. Use `LABEL` to
-set the label shown for the postcomputes in table output. The default
-label for a postcompute is the expression used to define it.
-
-A postcompute always uses the same summary functions as the variable
-whose categories contain it, but `FORMAT` allows control over the format
-used to display their values. It takes a list of summary function names
-and format specifiers.
-
-By default, or with `HIDESOURCECATS=NO`, categories referred to by
-computed categories are displayed like other categories. Use
-`HIDESOURCECATS=YES` to hide them.
-
-The previous section provides an example for `PPROPERTIES`.
-
-## Effective Weight
-
-```
-/WEIGHT VARIABLE=variable
-```
-
-The `WEIGHT` subcommand is optional and must appear before `TABLE`.
-If it appears, it must name a numeric variable, known as the
-"effective weight" or "adjustment weight". The effective weight
-variable stands in for the dictionary's [weight
-variable](../../commands/selection/weight.md), if any, in most
-calculations in `CTABLES`. The only exceptions are the `COUNT`,
-`TOTALN`, and `VALIDN` summary functions, which use the dictionary
-weight instead.
-
-Weights obtained from the PSPP dictionary are rounded to the nearest
-integer at the case level. Effective weights are not rounded.
-Regardless of the weighting source, PSPP does not analyze cases with
-zero, missing, or negative effective weights.
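A Python sketch of these weighting rules (illustrative only; the function name is an assumption):

```python
# Illustrative sketch: dictionary weights are rounded per case,
# effective weights are not, and cases with zero, missing (None), or
# negative weights are excluded from analysis.
def usable_weights(weights, effective=True):
    out = []
    for w in weights:
        if w is None or w <= 0:
            continue                       # case not analyzed
        out.append(w if effective else round(w))
    return out
```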
-
-
-## Hiding Small Counts
-
-```
-/HIDESMALLCOUNTS COUNT=count
-```
-
-The `HIDESMALLCOUNTS` subcommand is optional. If it is specified, then
-`COUNT`, `ECOUNT`, and `UCOUNT` values in output tables that are less
-than the value of `count` are shown as `<count` instead of their true
-values. The value of `count` must be an integer of at least 2.
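The substitution can be sketched in Python (not PSPP code; `format_count` is illustrative):

```python
# Illustrative sketch: counts below the HIDESMALLCOUNTS threshold are
# shown as "<count" instead of their true values.
def format_count(value, threshold):
    assert threshold == int(threshold) and threshold >= 2
    return f"<{threshold}" if value < threshold else str(value)
```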
-
-The following syntax and example shows how to use `HIDESMALLCOUNTS`:
-
-```
-CTABLES /HIDESMALLCOUNTS COUNT=10 /TABLE placeOfLastDrinkBeforeDrive.
-```
-
-```
- Custom Tables
-+───────────────────────────────────────────────────────────────────────+─────+
-│ │Count│
-+───────────────────────────────────────────────────────────────────────+─────+
-│37. Please think about the most recent occasion that Other (list) │<10 │
-│you drove within two hours of drinking alcoholic Your home │ 182│
-│beverages. Where did you drink on that occasion? Friend's home │ 264│
-│ Bar/Tavern/Club │ 279│
-│ Restaurant │ 495│
-│ Work │ 21│
-│ Bowling alley │<10 │
-│ Hotel/Motel │<10 │
-│ Country Club/ │ 17│
-│ Golf course │ │
-│ Drank in the │<10 │
-│ car/On the road │ │
-│ Sporting event │ 15│
-│ Movie theater │<10 │
-│ Shopping/Store/ │<10 │
-│ Grocery store │ │
-│ Wedding │ 15│
-│ Party at someone│ 81│
-│ else's home │ │
-│ Park/picnic │ 14│
-│ Party at your │<10 │
-│ house │ │
-+───────────────────────────────────────────────────────────────────────+─────+
-```
+++ /dev/null
-# DESCRIPTIVES
-
-```
-DESCRIPTIVES
- /VARIABLES=VAR_LIST
- /MISSING={VARIABLE,LISTWISE} {INCLUDE,NOINCLUDE}
- /FORMAT={LABELS,NOLABELS} {NOINDEX,INDEX} {LINE,SERIAL}
- /SAVE
- /STATISTICS={ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
- SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
- SESKEWNESS,SEKURTOSIS}
- /SORT={NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
- RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME}
- {A,D}
-```
-
-The `DESCRIPTIVES` procedure reads the active dataset and outputs
-linear descriptive statistics requested by the user. It can also
-compute Z-scores.
-
-The `VARIABLES` subcommand, which is required, specifies the list of
-variables to be analyzed. Keyword `VARIABLES` is optional.
-
-All other subcommands are optional:
-
-The `MISSING` subcommand determines the handling of missing values.
-If `INCLUDE` is set, then user-missing values are included in the
-calculations. If `NOINCLUDE` is set, which is the default,
-user-missing values are excluded. If `VARIABLE` is set, then missing
-values are excluded on a variable by variable basis; if `LISTWISE` is
-set, then the entire case is excluded whenever any value in that case
-has a system-missing or, if `INCLUDE` is set, user-missing value.
-
-The `FORMAT` subcommand has no effect. It is accepted for backward
-compatibility.
-
-The `SAVE` subcommand causes `DESCRIPTIVES` to calculate Z scores for
-all the specified variables. The Z scores are saved to new variables.
-Variable names are generated by trying first the original variable
-name with Z prepended and truncated to a maximum of 8 characters, then
-the names `ZSC000` through `ZSC999`, `STDZ00` through `STDZ09`,
-`ZZZZ00` through `ZZZZ09`, `ZQZQ00` through `ZQZQ09`, in that order.
-Z-score variable names may also be specified explicitly on `VARIABLES`
-in the variable list by enclosing them in parentheses after each
-variable. When Z scores are calculated, PSPP ignores
-[`TEMPORARY`](../../commands/selection/temporary.md), treating
-temporary transformations as permanent.
-
-The `STATISTICS` subcommand specifies the statistics to be displayed:
-
-* `ALL`
- All of the statistics below.
-* `MEAN`
- Arithmetic mean.
-* `SEMEAN`
- Standard error of the mean.
-* `STDDEV`
- Standard deviation.
-* `VARIANCE`
- Variance.
-* `KURTOSIS`
- Kurtosis and standard error of the kurtosis.
-* `SKEWNESS`
- Skewness and standard error of the skewness.
-* `RANGE`
- Range.
-* `MINIMUM`
- Minimum value.
-* `MAXIMUM`
- Maximum value.
-* `SUM`
- Sum.
-* `DEFAULT`
- Mean, standard deviation, minimum, maximum.
-* `SEKURTOSIS`
- Standard error of the kurtosis.
-* `SESKEWNESS`
- Standard error of the skewness.
-
-The `SORT` subcommand specifies how the statistics should be sorted.
-Most of the possible values should be self-explanatory. `NAME` causes
-the statistics to be sorted by name. By default, the statistics are
-listed in the order that they are specified on the `VARIABLES`
-subcommand. The `A` and `D` settings request an ascending or
-descending sort order, respectively.
-
-## Example
-
-The `physiology.sav` file contains various physiological data for a
-sample of persons. Running the `DESCRIPTIVES` command on the
-variables height and temperature with the default options allows one
-to see simple linear statistics for these two variables. In the
-example below, these variables are specified on the `VARIABLES`
-subcommand and the `SAVE` option has been used, to request that Z
-scores be calculated.
-
-After the command completes, this example runs `DESCRIPTIVES` again,
-this time on the zheight and ztemperature variables, which are the two
-normalized (Z-score) variables generated by the first `DESCRIPTIVES`
-command.
-
-```
-get file='physiology.sav'.
-
-descriptives
- /variables = height temperature
- /save.
-
-descriptives
- /variables = zheight ztemperature.
-```
-
-In the output below, we can see that there are 40 valid cases for each
-of the variables and no missing values. The mean height is 1677.12 and
-the mean temperature is 37.02. The descriptive statistics for
-temperature seem reasonable. However, there is a very high standard
-deviation for height and a suspiciously low minimum. This is due to a
-data entry error in the data.
-
-In the second Descriptive Statistics output, one can see that the mean
-and standard deviation of both Z-score variables are 0 and 1
-respectively. All Z-score variables should have these properties
-since they are normalized versions of the original scores.
-
-```
- Mapping of Variables to Z-scores
-┌────────────────────────────────────────────┬────────────┐
-│ Source │ Target │
-├────────────────────────────────────────────┼────────────┤
-│Height in millimeters │Zheight │
-│Internal body temperature in degrees Celcius│Ztemperature│
-└────────────────────────────────────────────┴────────────┘
-
- Descriptive Statistics
-┌──────────────────────────────────────────┬──┬───────┬───────┬───────┬───────┐
-│ │ N│ Mean │Std Dev│Minimum│Maximum│
-├──────────────────────────────────────────┼──┼───────┼───────┼───────┼───────┤
-│Height in millimeters │40│1677.12│ 262.87│ 179│ 1903│
-│Internal body temperature in degrees │40│ 37.02│ 1.82│ 32.59│ 39.97│
-│Celcius │ │ │ │ │ │
-│Valid N (listwise) │40│ │ │ │ │
-│Missing N (listwise) │ 0│ │ │ │ │
-└──────────────────────────────────────────┴──┴───────┴───────┴───────┴───────┘
-
- Descriptive Statistics
-┌─────────────────────────────────────────┬──┬─────────┬──────┬───────┬───────┐
-│ │ │ │ Std │ │ │
-│ │ N│ Mean │ Dev │Minimum│Maximum│
-├─────────────────────────────────────────┼──┼─────────┼──────┼───────┼───────┤
-│Z-score of Height in millimeters         │40│1.93E-015│  1.00│  -5.70│    .86│
-│Z-score of Internal body temperature in  │40│1.37E-015│  1.00│  -2.44│   1.62│
-│degrees Celcius │ │ │ │ │ │
-│Valid N (listwise) │40│ │ │ │ │
-│Missing N (listwise) │ 0│ │ │ │ │
-└─────────────────────────────────────────┴──┴─────────┴──────┴───────┴───────┘
-```
+++ /dev/null
-# EXAMINE
-
-```
-EXAMINE
- VARIABLES= VAR1 [VAR2] ... [VARN]
- [BY FACTOR1 [BY SUBFACTOR1]
- [ FACTOR2 [BY SUBFACTOR2]]
- ...
- [ FACTOR3 [BY SUBFACTOR3]]
- ]
- /STATISTICS={DESCRIPTIVES, EXTREME[(N)], ALL, NONE}
- /PLOT={BOXPLOT, NPPLOT, HISTOGRAM, SPREADLEVEL[(T)], ALL, NONE}
- /CINTERVAL P
- /COMPARE={GROUPS,VARIABLES}
- /ID=IDENTITY_VARIABLE
- /{TOTAL,NOTOTAL}
- /PERCENTILE=[PERCENTILES]={HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL}
- /MISSING={LISTWISE, PAIRWISE} [{EXCLUDE, INCLUDE}]
- [{NOREPORT,REPORT}]
-```
-
-`EXAMINE` is used to perform exploratory data analysis. In
-particular, it is useful for testing how closely a distribution
-follows a normal distribution, and for finding outliers and extreme
-values.
-
-The `VARIABLES` subcommand is mandatory. It specifies the dependent
-variables and optionally variables to use as factors for the analysis.
-Variables listed before the first `BY` keyword (if any) are the
-dependent variables. The dependent variables may optionally be followed
-by a list of factors which tell PSPP how to break down the analysis for
-each dependent variable.
-
-Following the dependent variables, factors may be specified. The
-factors (if desired) should be preceded by a single `BY` keyword. The
-format for each factor is `FACTORVAR [BY SUBFACTORVAR]`. Each unique
-combination of the values of `FACTORVAR` and `SUBFACTORVAR` divides the
-dataset into "cells". Statistics are calculated for each cell and for
-the entire dataset (unless `NOTOTAL` is given).
-
-The `STATISTICS` subcommand specifies which statistics to show.
-`DESCRIPTIVES` produces a table showing some parametric and
-non-parametric statistics. `EXTREME` produces a table showing the
-extremities of each cell. A number in parentheses determines how many
-upper and lower extremities to show. The default number is 5.
-
-The subcommands `TOTAL` and `NOTOTAL` are mutually exclusive. If
-`TOTAL` appears, then statistics for the entire dataset as well as for
-each cell are produced. If `NOTOTAL` appears, then statistics are
-produced only for the cells (unless no factor variables have been
-given). These subcommands have no effect if there have been no factor
-variables specified.
-
-The `PLOT` subcommand specifies which plots are to be produced, if
-any. Available plots are `HISTOGRAM`, `NPPLOT`, `BOXPLOT` and
-`SPREADLEVEL`. The first three can be used to visualise how closely
-each cell conforms to a normal distribution, whilst the spread vs. level
-plot can be useful to visualise how the variance differs between
-factors. Boxplots show you the outliers and extreme values.[^1]
-
-[^1]: `HISTOGRAM` uses Sturges' rule to determine the number of bins,
-as approximately \\(1 + \log_2(n)\\), where \\(n\\) is the number of
-samples. ([`FREQUENCIES`](frequencies.md) uses a different algorithm
-to find the bin size.)
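The footnote's rule can be computed directly; a Python sketch (rounding up is an assumption here, since the footnote only says "approximately"):

```python
# Illustrative sketch of Sturges' rule: roughly 1 + log2(n) bins.
import math

def sturges_bins(n):
    return math.ceil(1 + math.log2(n))
```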
-
-The `SPREADLEVEL` plot displays the interquartile range versus the
-median. It takes an optional parameter `T`, which specifies how the
-data should be transformed prior to plotting. The given value `T` is
-a power to which the data are raised. For example, if `T` is given as
-2, then the square of the data is used. Zero, however, is a special
-value: if `T` is 0 or is omitted, then the data are transformed by
-taking their natural logarithm instead of raising them to the power of `T`.
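In Python terms (an illustrative sketch, not PSPP internals):

```python
# Illustrative sketch of the SPREADLEVEL transformation: raise the data
# to the power T, except that T = 0 (or omitted) means natural log.
import math

def spreadlevel_transform(data, t=0):
    if t == 0:
        return [math.log(x) for x in data]
    return [x ** t for x in data]
```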
-
-When one or more plots are requested, `EXAMINE` also performs the
-Shapiro-Wilk test for each category. There are, however, a number of
-provisos:
-- All weight values must be integer.
-- The cumulative weight value must be in the range \[3, 5000\].
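These provisos amount to a simple check, sketched here in Python (the function is illustrative, not PSPP's actual test):

```python
# Illustrative sketch: the Shapiro-Wilk test is attempted only when all
# weights are integers and the cumulative weight lies in [3, 5000].
def shapiro_wilk_applicable(weights):
    return (all(float(w).is_integer() for w in weights)
            and 3 <= sum(weights) <= 5000)
```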
-
-The `COMPARE` subcommand is only relevant if producing boxplots, and
-it is only useful if there is more than one dependent variable and at least
-one factor. If `/COMPARE=GROUPS` is specified, then one plot per
-dependent variable is produced, each of which contain boxplots for all
-the cells. If `/COMPARE=VARIABLES` is specified, then one plot per cell
-is produced, each containing one boxplot per dependent variable. If the
-`/COMPARE` subcommand is omitted, then PSPP behaves as if
-`/COMPARE=GROUPS` were given.
-
-The `ID` subcommand is relevant only if `/PLOT=BOXPLOT` or
-`/STATISTICS=EXTREME` has been given. If given, it should provide the
-name of a variable which is to be used to label extreme values and
-outliers. Numeric or string variables are permissible. If the `ID`
-subcommand is not given, then the case number is used for labelling.
-
-The `CINTERVAL` subcommand specifies the confidence interval to use
-in the calculation of the descriptive statistics. The default is 95%.
-
-The `PERCENTILES` subcommand specifies which percentiles are to be
-calculated, and which algorithm to use for calculating them. The
-default is to calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using
-the `HAVERAGE` algorithm.
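For reference, the `HAVERAGE` ("weighted average") method is commonly defined as linear interpolation at position \\((n+1)p/100\\) in the sorted data; a Python sketch under that assumption (not PSPP's actual implementation):

```python
# Illustrative sketch of the HAVERAGE percentile method: the p-th
# percentile sits at position (n + 1) * p / 100 in the sorted data,
# with linear interpolation between neighboring values.
def haverage_percentile(data, p):
    xs = sorted(data)
    n = len(xs)
    h = (n + 1) * p / 100
    k = int(h)                 # integer part of the position
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    frac = h - k
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])
```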
-
-The `TOTAL` and `NOTOTAL` subcommands are mutually exclusive. If
-`TOTAL` is given and factors have been specified in the `VARIABLES`
-subcommand, then statistics for the unfactored dependent variables are
-produced in addition to the factored variables. If there are no factors
-specified then `TOTAL` and `NOTOTAL` have no effect.
-
-The following example generates descriptive statistics and histograms
-for two variables `score1` and `score2`. Two factors are given: `gender`
-and `gender BY culture`. Therefore, the descriptives and histograms are
-generated for each distinct value of `gender` _and_ for each distinct
-combination of the values of `gender` and `culture`. Since the `NOTOTAL`
-keyword is given, statistics and histograms for `score1` and `score2`
-covering the whole dataset are not produced.
-
-```
-EXAMINE score1 score2 BY
- gender
- gender BY culture
- /STATISTICS = DESCRIPTIVES
- /PLOT = HISTOGRAM
- /NOTOTAL.
-```
-
-Here is a second example showing how `EXAMINE` may be used to find
-extremities.
-
-```
-EXAMINE height weight BY
- gender
- /STATISTICS = EXTREME (3)
- /PLOT = BOXPLOT
- /COMPARE = GROUPS
- /ID = name.
-```
-
-In this example, we look at the height and weight of a sample of
-individuals and how they differ between male and female. A table
-showing the 3 largest and the 3 smallest values of height and weight for
-each gender, and for the whole dataset, is shown. In addition, the
-`/PLOT` subcommand requests boxplots. Because `/COMPARE = GROUPS` was
-specified, boxplots for male and female are juxtaposed in the
-same graphic, allowing us to easily see the difference between the
-genders. Since the variable `name` was specified on the `ID` subcommand,
-values of the `name` variable are used to label the extreme values.
-
-> ⚠️ If you specify many dependent variables or factor variables for
-> which there are many distinct values, then `EXAMINE` will produce a
-> very large quantity of output.
+++ /dev/null
-# FACTOR
-
-```
-FACTOR {
- VARIABLES=VAR_LIST,
- MATRIX IN ({CORR,COV}={*,FILE_SPEC})
- }
-
- [ /METHOD = {CORRELATION, COVARIANCE} ]
-
- [ /ANALYSIS=VAR_LIST ]
-
- [ /EXTRACTION={PC, PAF}]
-
- [ /ROTATION={VARIMAX, EQUAMAX, QUARTIMAX, PROMAX[(K)], NOROTATE}]
-
- [ /PRINT=[INITIAL] [EXTRACTION] [ROTATION] [UNIVARIATE] [CORRELATION] [COVARIANCE] [DET] [KMO] [AIC] [SIG] [ALL] [DEFAULT] ]
-
- [ /PLOT=[EIGEN] ]
-
- [ /FORMAT=[SORT] [BLANK(N)] [DEFAULT] ]
-
- [ /CRITERIA=[FACTORS(N)] [MINEIGEN(L)] [ITERATE(M)] [ECONVERGE (DELTA)] [DEFAULT] ]
-
- [ /MISSING=[{LISTWISE, PAIRWISE}] [{INCLUDE, EXCLUDE}] ]
-```
-
-The `FACTOR` command performs Factor Analysis or Principal Axis
-Factoring on a dataset. It may be used to find common factors in the
-data or for data reduction purposes.
-
-The `VARIABLES` subcommand is required (unless the `MATRIX IN`
-subcommand is used). It lists the variables which are to partake in the
-analysis. (The `ANALYSIS` subcommand may optionally further limit the
-variables that participate; it is useful primarily in conjunction with
-`MATRIX IN`.)
-
-If `MATRIX IN` instead of `VARIABLES` is specified, then the analysis
-is performed on a pre-prepared correlation or covariance matrix file
-instead of on individual data cases. Typically the [matrix
-file](../matrix/index.md#matrix-files) will have been generated by
-[`MATRIX DATA`](../matrix/matrix-data.md) or provided by a third
-party. If specified, `MATRIX IN` must be followed by `COV` or `CORR`,
-then by `=` and `FILE_SPEC` all in parentheses. `FILE_SPEC` may
-either be an asterisk, which indicates the currently loaded dataset,
-or it may be a file name to be loaded. See [`MATRIX
-DATA`](../matrix/matrix-data.md), for the expected format of the file.
-
-The `/EXTRACTION` subcommand is used to specify the way in which
-factors (components) are extracted from the data. If `PC` is specified,
-then Principal Components Analysis is used. If `PAF` is specified, then
-Principal Axis Factoring is used. By default Principal Components
-Analysis is used.
-
-The `/ROTATION` subcommand is used to specify the method by which the
-extracted solution is rotated. Three orthogonal rotation methods are
-available: `VARIMAX` (which is the default), `EQUAMAX`, and `QUARTIMAX`.
-There is one oblique rotation method, viz: `PROMAX`. Optionally you may
-enter the power of the promax rotation K, which must be enclosed in
-parentheses. The default value of K is 5. Specify `NOROTATE` to
-prevent any rotation from being performed.
-
-The `/METHOD` subcommand should be used to determine whether the
-covariance matrix or the correlation matrix of the data is to be
-analysed. By default, the correlation matrix is analysed.
-
-The `/PRINT` subcommand may be used to select which features of the
-analysis are reported:
-
-- `UNIVARIATE` A table of mean values, standard deviations and total
- weights is printed.
-- `INITIAL` Initial communalities and eigenvalues are printed.
-- `EXTRACTION` Extracted communalities and eigenvalues are printed.
-- `ROTATION` Rotated communalities and eigenvalues are printed.
-- `CORRELATION` The correlation matrix is printed.
-- `COVARIANCE` The covariance matrix is printed.
-- `DET` The determinant of the correlation or covariance matrix is
- printed.
-- `AIC` The anti-image covariance and anti-image correlation matrices
- are printed.
-- `KMO` The Kaiser-Meyer-Olkin measure of sampling adequacy and the
- Bartlett test of sphericity are printed.
-- `SIG` The significance of the elements of correlation matrix is
- printed.
-- `ALL` All of the above are printed.
-- `DEFAULT` Identical to `INITIAL` and `EXTRACTION`.
-
-If `/PLOT=EIGEN` is given, then a "Scree" plot of the eigenvalues is
-printed. This can be useful for visualizing the factors and deciding
-which factors (components) should be retained.
-
-The `/FORMAT` subcommand determines how data are to be displayed in
-loading matrices. If `SORT` is specified, then the variables are sorted
-in descending order of significance. If `BLANK(N)` is specified, then
-coefficients whose absolute value is less than N are not printed. If
-the keyword `DEFAULT` is specified, or if no `/FORMAT` subcommand is
-specified, then no sorting is performed, and all coefficients are
-printed.
-
-You can use the `/CRITERIA` subcommand to specify how the number of
-extracted factors (components) is chosen. If `FACTORS(N)` is
-specified, where N is an integer, then N factors are extracted.
-Otherwise, the `MINEIGEN` setting is used. `MINEIGEN(L)` requests that
-all factors whose eigenvalues are greater than or equal to L be
-extracted. The default value of L is 1. The `ECONVERGE` setting has
-effect only when using iterative algorithms for factor extraction (such
-as Principal Axis Factoring). `ECONVERGE(DELTA)` specifies that
-iteration should cease when the maximum absolute change in the
-communality estimates between one iteration and the next is less than
-DELTA. The default value of DELTA is 0.001.
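The stopping rule can be sketched generically in Python (illustrative only; `step` stands in for one iteration of the extraction algorithm):

```python
# Illustrative sketch of the ECONVERGE stopping rule: iterate until the
# largest absolute change in the communality estimates between
# successive iterations drops below DELTA, or ITERATE's maximum M
# iterations have been performed.
def iterate_until_converged(step, estimates, delta=0.001, max_iter=25):
    for i in range(max_iter):
        new = step(estimates)
        change = max(abs(a - b) for a, b in zip(new, estimates))
        estimates = new
        if change < delta:
            return estimates, i + 1
    return estimates, max_iter

# Example: repeated halving converges geometrically toward zero.
est, n_iter = iterate_until_converged(lambda xs: [x / 2 for x in xs], [1.0])
```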
-
-The `ITERATE(M)` setting may appear any number of times and is used for two
-different purposes. It is used to set the maximum number of iterations
-(M) for convergence and also to set the maximum number of iterations for
-rotation. Whether it affects convergence or rotation depends upon which
-subcommand follows the `ITERATE` subcommand. If `EXTRACTION` follows,
-it affects convergence. If `ROTATION` follows, it affects rotation. If
-neither `ROTATION` nor `EXTRACTION` follows an `ITERATE` subcommand, then
-the entire subcommand is ignored. The default value of M is 25.
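-
-The placement rule above can be sketched as follows (variable names are
-illustrative): the first `ITERATE` limits the extraction iterations,
-because `EXTRACTION` follows it, while the second limits the rotation
-iterations:
-
-```
-FACTOR /VARIABLES = v1 v2 v3 v4
-       /CRITERIA = ITERATE(50)
-       /EXTRACTION = PAF
-       /CRITERIA = ITERATE(100)
-       /ROTATION = VARIMAX.
-```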
-
-The `MISSING` subcommand determines the handling of missing
-values. If `INCLUDE` is set, then user-missing values are included
-in the calculations, but system-missing values are not. If `EXCLUDE` is
-set, which is the default, user-missing values are excluded as well as
-system-missing values. If `LISTWISE` is set, then
-the entire case is excluded from analysis whenever any variable
-specified in the `VARIABLES` subcommand contains a missing value.
-
-If `PAIRWISE` is set, then a case is considered missing only if
-either of the values for the particular coefficient are missing. The
-default is `LISTWISE`.
-
+++ /dev/null
-# FREQUENCIES
-
-```
-FREQUENCIES
- /VARIABLES=VAR_LIST
- /FORMAT={TABLE,NOTABLE,LIMIT(LIMIT)}
- {AVALUE,DVALUE,AFREQ,DFREQ}
- /MISSING={EXCLUDE,INCLUDE}
- /STATISTICS={DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
- KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
- SESKEWNESS,SEKURTOSIS,ALL,NONE}
- /NTILES=NTILES
- /PERCENTILES=percent...
- /HISTOGRAM=[MINIMUM(X_MIN)] [MAXIMUM(X_MAX)]
- [{FREQ[(Y_MAX)],PERCENT[(Y_MAX)]}] [{NONORMAL,NORMAL}]
- /PIECHART=[MINIMUM(X_MIN)] [MAXIMUM(X_MAX)]
- [{FREQ,PERCENT}] [{NOMISSING,MISSING}]
- /BARCHART=[MINIMUM(X_MIN)] [MAXIMUM(X_MAX)]
- [{FREQ,PERCENT}]
- /ORDER={ANALYSIS,VARIABLE}
-
-
-(These options are not currently implemented.)
- /HBAR=...
- /GROUPED=...
-```
-
-The `FREQUENCIES` procedure outputs frequency tables for specified
-variables. `FREQUENCIES` can also calculate and display descriptive
-statistics (including median and mode) and percentiles, and various
-graphical representations of the frequency distribution.
-
-The `VARIABLES` subcommand is the only required subcommand. Specify
-the variables to be analyzed.
-
-The `FORMAT` subcommand controls the output format. It has several
-possible settings:
-
- * `TABLE`, the default, causes a frequency table to be output for
- every variable specified. `NOTABLE` prevents them from being
- output. `LIMIT` with a numeric argument causes them to be output
- except when there are more than the specified number of values in
- the table.
-
- * Normally frequency tables are sorted in ascending order by value.
- This is `AVALUE`. `DVALUE` tables are sorted in descending order
- by value. `AFREQ` and `DFREQ` tables are sorted in ascending and
- descending order, respectively, by frequency count.
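-
-For example, the following sketch (the variable name is illustrative)
-sorts each table in descending order of frequency and suppresses any
-table with more than 50 distinct values:
-
-```
-FREQUENCIES /VARIABLES = region
-            /FORMAT = DFREQ LIMIT(50).
-```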
-
-The `MISSING` subcommand controls the handling of user-missing values.
-When `EXCLUDE`, the default, is set, user-missing values are not
-included in frequency tables or statistics. When `INCLUDE` is set,
-user-missing values are included. System-missing values are never included
-in statistics, but are listed in frequency tables.
-
-The available `STATISTICS` are the same as available in
-[`DESCRIPTIVES`](descriptives.md), with the addition of `MEDIAN`, the
-data's median value, and `MODE`, the mode. (If there are multiple
-modes, the smallest value is reported.) By default, the mean,
-standard deviation of the mean, minimum, and maximum are reported for
-each variable.
-
-`PERCENTILES` causes the specified percentiles to be reported. The
-percentiles should be given as a list of numbers between 0 and 100
-inclusive. The `NTILES` subcommand causes the percentiles to be
-reported at the boundaries of the data set divided into the specified
-number of ranges. For instance, `/NTILES=4` would cause quartiles to
-be reported.
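-
-The two subcommands may be combined. The following sketch (the variable
-name is illustrative) reports quartiles together with the 10th and 90th
-percentiles:
-
-```
-FREQUENCIES /VARIABLES = score
-            /PERCENTILES = 10 90
-            /NTILES = 4.
-```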
-
-The `HISTOGRAM` subcommand causes the output to include a histogram
-for each specified numeric variable. The X axis by default ranges
-from the minimum to the maximum value observed in the data, but the
-`MINIMUM` and `MAXIMUM` keywords can set an explicit range.[^1]
-Histograms are not created for string variables.
-
-[^1]: The bin width is chosen according to the Freedman-Diaconis
-rule: $$2 \times IQR(x)\,n^{-1/3},$$ where \\(IQR(x)\\) is the
-interquartile range of \\(x\\) and \\(n\\) is the number of samples.
-([`EXAMINE`](examine.md) uses a different algorithm to determine bin
-sizes.)
-
-Specify `NORMAL` to superimpose a normal curve on the histogram.
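-
-For instance, the following sketch (the variable name is illustrative)
-draws a histogram over an explicit range with a normal curve
-superimposed, while suppressing the frequency table itself:
-
-```
-FREQUENCIES /VARIABLES = weight
-            /FORMAT = NOTABLE
-            /HISTOGRAM = MINIMUM(40) MAXIMUM(120) NORMAL.
-```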
-
-The `PIECHART` subcommand adds a pie chart for each variable to the
-output. Each slice represents one value, with the size of the slice
-proportional to the value's frequency. By default, all non-missing
-values are given slices. The `MINIMUM` and `MAXIMUM` keywords can be
-used to limit the displayed slices to a given range of values. The
-keyword `NOMISSING`, the default, causes missing values to be omitted
-from the pie chart. If instead `MISSING` is specified, then the pie
-chart includes a single slice representing all system-missing and
-user-missing cases.
-
-The `BARCHART` subcommand produces a bar chart for each variable.
-The `MINIMUM` and `MAXIMUM` keywords can be used to omit categories
-whose counts lie outside the specified limits. The `FREQ` option
-(default) causes the ordinate to display the frequency of each category,
-whereas the `PERCENT` option displays relative percentages.
-
-The `FREQ` and `PERCENT` options on `HISTOGRAM` and `PIECHART` are
-accepted but not currently honoured.
-
-The `ORDER` subcommand is accepted but ignored.
-
-## Example
-
-The syntax below runs a frequency analysis on the sex and occupation
-variables from the `personnel.sav` file. This is useful to get a
-general idea of the way in which these nominal variables are
-distributed.
-
-```
-get file='personnel.sav'.
-
-frequencies /variables = sex occupation
- /statistics = none.
-```
-
-If you are using the graphical user interface, the dialog box is set up
-such that by default, several statistics are calculated. Some are not
-particularly useful for categorical variables, so you may want to
-disable those.
-
-From the output, shown below, it is evident that there are 33 males,
-21 females, and 2 persons whose sex has not been entered.
-
-One can also see how many of each occupation there are in the data.
-When dealing with string variables used as nominal values, running a
-frequency analysis is useful to detect data entry errors. Notice
-that one occupation value has been mistyped as "Scrientist". This
-entry should be corrected, or marked as missing before using the data.
-
-```
- sex
-┌──────────────┬─────────┬───────┬─────────────┬──────────────────┐
-│ │Frequency│Percent│Valid Percent│Cumulative Percent│
-├──────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Valid Male │ 33│ 58.9%│ 61.1%│ 61.1%│
-│ Female│ 21│ 37.5%│ 38.9%│ 100.0%│
-├──────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Missing . │ 2│ 3.6%│ │ │
-├──────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Total │ 56│ 100.0%│ │ │
-└──────────────┴─────────┴───────┴─────────────┴──────────────────┘
-
- occupation
-┌────────────────────────┬─────────┬───────┬─────────────┬──────────────────┐
-│ │Frequency│Percent│Valid Percent│Cumulative Percent│
-├────────────────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Valid Artist │ 8│ 14.3%│ 14.3%│ 14.3%│
-│ Baker │ 2│ 3.6%│ 3.6%│ 17.9%│
-│ Barrister │ 1│ 1.8%│ 1.8%│ 19.6%│
-│ Carpenter │ 4│ 7.1%│ 7.1%│ 26.8%│
-│ Cleaner │ 4│ 7.1%│ 7.1%│ 33.9%│
-│ Cook │ 7│ 12.5%│ 12.5%│ 46.4%│
-│ Manager │ 8│ 14.3%│ 14.3%│ 60.7%│
-│ Mathematician │ 4│ 7.1%│ 7.1%│ 67.9%│
-│ Painter │ 2│ 3.6%│ 3.6%│ 71.4%│
-│ Payload Specialist│ 1│ 1.8%│ 1.8%│ 73.2%│
-│ Plumber │ 5│ 8.9%│ 8.9%│ 82.1%│
-│ Scientist │ 7│ 12.5%│ 12.5%│ 94.6%│
-│ Scrientist │ 1│ 1.8%│ 1.8%│ 96.4%│
-│ Tailor │ 2│ 3.6%│ 3.6%│ 100.0%│
-├────────────────────────┼─────────┼───────┼─────────────┼──────────────────┤
-│Total │ 56│ 100.0%│ │ │
-└────────────────────────┴─────────┴───────┴─────────────┴──────────────────┘
-```
-
+++ /dev/null
-# GLM
-
-```
-GLM DEPENDENT_VARS BY FIXED_FACTORS
- [/METHOD = SSTYPE(TYPE)]
- [/DESIGN = INTERACTION_0 [INTERACTION_1 [... INTERACTION_N]]]
- [/INTERCEPT = {INCLUDE|EXCLUDE}]
- [/MISSING = {INCLUDE|EXCLUDE}]
-```
-
-The `GLM` procedure can be used for fixed-effects factorial ANOVA.
-
-The `DEPENDENT_VARS` are the variables to be analysed. You may analyse
-several variables in the same command in which case they should all
-appear before the `BY` keyword.
-
-The `FIXED_FACTORS` list must be one or more categorical variables.
-Normally it does not make sense to enter a scalar variable in the
-`FIXED_FACTORS` list, and doing so may cause PSPP to do a lot of
-unnecessary processing.
-
-The `METHOD` subcommand is used to change the method for producing
-the sums of squares. Available values of `TYPE` are 1, 2 and 3. The
-default is type 3.
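-
-For example, the following sketch (variable names are illustrative)
-requests type 2 sums of squares instead of the default type 3:
-
-```
-GLM score BY treatment sex
-    /METHOD = SSTYPE(2).
-```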
-
-You may specify a custom design using the `DESIGN` subcommand. The
-design comprises a list of interactions where each interaction is a list
-of variables separated by a `*`. For example the command
-```
-GLM subject BY sex age_group race
- /DESIGN = age_group sex race age_group*sex age_group*race
-```
-specifies the model
-```
-subject = age_group + sex + race + age_group×sex + age_group×race
-```
-If no `DESIGN` subcommand is specified, then the
-default is all possible combinations of the fixed factors. That is to
-say
-```
-GLM subject BY sex age_group race
-```
-implies the model
-```
-subject = age_group + sex + race + age_group×sex + age_group×race + sex×race + age_group×sex×race
-```
-
-The `MISSING` subcommand determines the handling of missing values.
-If `INCLUDE` is set then, for the purposes of GLM analysis, only
-system-missing values are considered to be missing; user-missing
-values are not regarded as missing. If `EXCLUDE` is set, which is the
-default, then user-missing values are considered to be missing as well
-as system-missing values. A case for which any dependent variable or
-any factor variable has a missing value is excluded from the analysis.
-
+++ /dev/null
-# GRAPH
-
-```
-GRAPH
- /HISTOGRAM [(NORMAL)]= VAR
- /SCATTERPLOT [(BIVARIATE)] = VAR1 WITH VAR2 [BY VAR3]
- /BAR = {SUMMARY-FUNCTION(VAR1) | COUNT-FUNCTION} BY VAR2 [BY VAR3]
- [ /MISSING={LISTWISE, VARIABLE} [{EXCLUDE, INCLUDE}] ]
- [{NOREPORT,REPORT}]
-```
-
-`GRAPH` produces graphical plots of data. Only one of the
-subcommands `HISTOGRAM`, `BAR` or `SCATTERPLOT` can be specified, i.e.
-only one plot can be produced per call of `GRAPH`. The `MISSING`
-subcommand is optional.
-
-## Scatterplot
-
-The subcommand `SCATTERPLOT` produces an xy plot of the data. `GRAPH`
-uses `VAR3`, if specified, to determine the colours and/or
-markers for the plot. The following is an example for producing a
-scatterplot.
-
-```
-GRAPH
- /SCATTERPLOT = height WITH weight BY gender.
-```
-
-This example produces a scatterplot where `height` is plotted versus
-`weight`. Depending on the value of `gender`, the colour of the
-datapoint is different. With this plot it is possible to analyze
-gender differences in the `height` versus `weight` relation.
-
-## Histogram
-
-The subcommand `HISTOGRAM` produces a histogram. Only one variable is
-allowed for the histogram plot. The keyword `NORMAL` may be specified
-in parentheses, to indicate that the ideal normal curve should be
-superimposed over the histogram. For an alternative method to produce
-histograms, see [EXAMINE](examine.md). The following example produces
-a histogram plot for the variable `weight`.
-
-```
-GRAPH
- /HISTOGRAM = weight.
-```
-
-## Bar Chart
-
-The subcommand `BAR` produces a bar chart. This subcommand requires
-that a `COUNT-FUNCTION` be specified (with no arguments) or a
-`SUMMARY-FUNCTION` with a variable `VAR1` in parentheses. Following the
-summary or count function, the keyword `BY` should be specified and
-then a categorical variable, `VAR2`. The values of `VAR2` determine
-the labels of the bars to be plotted. A second categorical variable
-`VAR3` may be specified, in which case a clustered (grouped) bar chart
-is produced.
-
-Valid count functions are:
-
-* `COUNT`
- The weighted counts of the cases in each category.
-* `PCT`
- The weighted counts of the cases in each category expressed as a
- percentage of the total weights of the cases.
-* `CUFREQ`
- The cumulative weighted counts of the cases in each category.
-* `CUPCT`
- The cumulative weighted counts of the cases in each category
- expressed as a percentage of the total weights of the cases.
-
-The summary function is applied to `VAR1` across all cases in each
-category. The recognised summary functions are:
-
-* `SUM`
- The sum.
-* `MEAN`
- The arithmetic mean.
-* `MAXIMUM`
- The maximum value.
-* `MINIMUM`
- The minimum value.
-
-The following examples assume a dataset containing the results of a
-survey. Each respondent has indicated annual income, their sex and city
-of residence. One could create a bar chart showing how the mean income
-varies between residents of different cities, thus:
-```
-GRAPH /BAR = MEAN(INCOME) BY CITY.
-```
-
-This can be extended to also indicate how income in each city differs
-between the sexes.
-```
-GRAPH /BAR = MEAN(INCOME) BY CITY BY SEX.
-```
-
-One might also want to see how many respondents there are from each
-city. This can be achieved as follows:
-```
-GRAPH /BAR = COUNT BY CITY.
-```
-
-The [FREQUENCIES](frequencies.md) and [CROSSTABS](crosstabs.md)
-commands can also produce bar charts.
-
+++ /dev/null
-This chapter documents the statistical procedures that PSPP supports.
+++ /dev/null
-# LOGISTIC REGRESSION
-
-```
-LOGISTIC REGRESSION [VARIABLES =] DEPENDENT_VAR WITH PREDICTORS
- [/CATEGORICAL = CATEGORICAL_PREDICTORS]
- [{/NOCONST | /ORIGIN | /NOORIGIN }]
- [/PRINT = [SUMMARY] [DEFAULT] [CI(CONFIDENCE)] [ALL]]
- [/CRITERIA = [BCON(MIN_DELTA)] [ITERATE(MAX_ITERATIONS)]
- [LCON(MIN_LIKELIHOOD_DELTA)] [EPS(MIN_EPSILON)]
- [CUT(CUT_POINT)]]
- [/MISSING = {INCLUDE|EXCLUDE}]
-```
-
-Bivariate Logistic Regression is used when you want to explain a
-dichotomous dependent variable in terms of one or more predictor
-variables.
-
-The minimum command is
-```
-LOGISTIC REGRESSION y WITH x1 x2 ... xN.
-```
-
-Here, `y` is the dependent variable, which must be dichotomous and
-`x1` through `xN` are the predictor variables whose coefficients the
-procedure estimates.
-
-By default, a constant term is included in the model. Hence, the
-full model is $${\bf y} = b_0 + b_1 {\bf x_1} + b_2 {\bf x_2} + \dots +
-b_n {\bf x_n}.$$
-
-Predictor variables which are categorical in nature should be listed
-on the `/CATEGORICAL` subcommand. Simple variables as well as
-interactions between variables may be listed here.
-
-If you want a model without the constant term \\(b_0\\), use the keyword
-`/ORIGIN`. `/NOCONST` is a synonym for `/ORIGIN`.
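-
-For example, the following sketch (variable names are illustrative)
-declares one predictor as categorical and requests 95% confidence
-intervals for the odds ratios:
-
-```
-LOGISTIC REGRESSION outcome WITH age weight region
-    /CATEGORICAL = region
-    /PRINT = CI(95).
-```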
-
-An iterative Newton-Raphson procedure is used to fit the model. The
-`/CRITERIA` subcommand is used to specify the stopping criteria of the
-procedure, and other parameters. The value of `CUT_POINT` is used in the
-classification table. It is the threshold above which predicted values
-are considered to be 1. Values of `CUT_POINT` must lie in the range
-\[0,1\]. During iterations, if any one of the stopping criteria are
-satisfied, the procedure is considered complete. The stopping criteria
-are:
-
-- The number of iterations exceeds `MAX_ITERATIONS`. The default value
- of `MAX_ITERATIONS` is 20.
-- The change in all coefficient estimates is less than
- `MIN_DELTA`. The default value of `MIN_DELTA` is 0.001.
-- The magnitude of change in the likelihood estimate is less than
- `MIN_LIKELIHOOD_DELTA`. The default value of `MIN_LIKELIHOOD_DELTA`
- is zero. This means that this criterion is disabled.
-- The differential of the estimated probability for all cases is less
- than `MIN_EPSILON`. In other words, the probabilities are close to
- zero or one. The default value of `MIN_EPSILON` is 0.00000001.
-
-The `PRINT` subcommand controls the display of optional statistics.
-Currently there is one such option, `CI`, which indicates that the
-confidence interval of the odds ratio should be displayed as well as its
-value. `CI` should be followed by an integer in parentheses, to
-indicate the confidence level of the desired confidence interval.
-
-The `MISSING` subcommand determines the handling of missing
-values. If `INCLUDE` is set, then user-missing values are included
-in the calculations, but system-missing values are not. If `EXCLUDE` is
-set, which is the default, user-missing values are excluded as well as
-system-missing values.
-
+++ /dev/null
-# MEANS
-
-```
-MEANS [TABLES =]
- {VAR_LIST}
- [ BY {VAR_LIST} [BY {VAR_LIST} [BY {VAR_LIST} ... ]]]
-
- [ /{VAR_LIST}
- [ BY {VAR_LIST} [BY {VAR_LIST} [BY {VAR_LIST} ... ]]] ]
-
- [/CELLS = [MEAN] [COUNT] [STDDEV] [SEMEAN] [SUM] [MIN] [MAX] [RANGE]
- [VARIANCE] [KURT] [SEKURT]
- [SKEW] [SESKEW] [FIRST] [LAST]
- [HARMONIC] [GEOMETRIC]
- [DEFAULT]
- [ALL]
- [NONE] ]
-
- [/MISSING = [INCLUDE] [DEPENDENT]]
-```
-
-You can use the `MEANS` command to calculate the arithmetic mean and
-similar statistics, either for the dataset as a whole or for categories
-of data.
-
-The simplest form of the command is
-```
-MEANS V.
-```
-which calculates the mean, count and standard deviation for V. If you
-specify a grouping variable, for example
-```
-MEANS V BY G.
-```
-then the means, counts and standard deviations for V after having been
-grouped by G are calculated. Instead of the mean, count and standard
-deviation, you could specify the statistics in which you are interested:
-```
-MEANS X Y BY G
- /CELLS = HARMONIC SUM MIN.
-```
-This example calculates the harmonic mean, the sum and the minimum
-values of X and Y grouped by G.
-
-The `CELLS` subcommand specifies which statistics to calculate. The
-available statistics are:
-- `MEAN`: The arithmetic mean.
-- `COUNT`: The count of the values.
-- `STDDEV`: The standard deviation.
-- `SEMEAN`: The standard error of the mean.
-- `SUM`: The sum of the values.
-- `MIN`: The minimum value.
-- `MAX`: The maximum value.
-- `RANGE`: The difference between the maximum and minimum values.
-- `VARIANCE`: The variance.
-- `FIRST`: The first value in the category.
-- `LAST`: The last value in the category.
-- `SKEW`: The skewness.
-- `SESKEW`: The standard error of the skewness.
-- `KURT`: The kurtosis
-- `SEKURT`: The standard error of the kurtosis.
-- `HARMONIC`: The harmonic mean.
-- `GEOMETRIC`: The geometric mean.
-
-In addition, three special keywords are recognized:
-- `DEFAULT`: This is the same as `MEAN COUNT STDDEV`.
-- `ALL`: All of the above statistics are calculated.
-- `NONE`: No statistics are calculated (only a summary is shown).
-
-More than one "table" can be specified in a single command. Each
-table is separated by a `/`. For example
-
-```
- MEANS TABLES =
- c d e BY x
- /a b BY x y
- /f BY y BY z.
-```
-
-has three tables (the `TABLES =` is optional). The first table has
-three dependent variables `c`, `d`, and `e` and a single categorical
-variable `x`. The second table has two dependent variables `a` and
-`b`, and two categorical variables `x` and `y`. The third table has a
-single dependent variable `f` and a categorical variable formed by the
-combination of `y` and `z`.
-
-By default values are omitted from the analysis only if missing
-values (either system missing or user missing) for any of the variables
-directly involved in their calculation are encountered. This behaviour
-can be modified with the `/MISSING` subcommand. Two options are
-possible: `INCLUDE` and `DEPENDENT`.
-
-`/MISSING = INCLUDE` says that user-missing values, either in the
-dependent variables or in the categorical variables, should be taken
-at face value and not excluded.
-
-`/MISSING = DEPENDENT` says that user-missing values in the
-dependent variables should be taken at face value, but cases
-which have user-missing values for the categorical variables should be
-omitted from the calculation.
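-
-For example, the following sketch (variable names are illustrative)
-takes user-missing values of `score` at face value while still dropping
-cases whose group membership is user-missing:
-
-```
-MEANS TABLES = score BY group
-      /CELLS = MEAN COUNT
-      /MISSING = DEPENDENT.
-```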
-
-## Example
-
-The dataset in `repairs.sav` contains the mean time between failures
-(mtbf) for a sample of artifacts produced by different factories and
-trialed under different operating conditions. Since there are four
-combinations of categorical variables, simply looking at the list
-of data makes it hard to see how the scores vary for each category.
-The syntax below shows one way of tabulating the mtbf in a way which
-is easier to understand.
-
-```
-get file='repairs.sav'.
-
-means tables = mtbf
- by factory by environment.
-```
-
-The results are shown below. The figures shown indicate the mean,
-standard deviation and number of samples in each category. These
-figures however do not indicate whether the results are statistically
-significant. For that, you would need to use the procedures `ONEWAY`,
-`GLM` or `T-TEST` depending on the hypothesis being tested.
-
-```
- Case Processing Summary
-┌────────────────────────────┬───────────────────────────────┐
-│ │ Cases │
-│ ├──────────┬─────────┬──────────┤
-│ │ Included │ Excluded│ Total │
-│ ├──┬───────┼─┬───────┼──┬───────┤
-│ │ N│Percent│N│Percent│ N│Percent│
-├────────────────────────────┼──┼───────┼─┼───────┼──┼───────┤
-│mtbf * factory * environment│30│ 100.0%│0│ .0%│30│ 100.0%│
-└────────────────────────────┴──┴───────┴─┴───────┴──┴───────┘
-
- Report
-┌────────────────────────────────────────────┬─────┬──┬──────────────┐
-│Manufacturing facility Operating Environment│ Mean│ N│Std. Deviation│
-├────────────────────────────────────────────┼─────┼──┼──────────────┤
-│0 Temperate │ 7.26│ 9│ 2.57│
-│ Tropical │ 7.47│ 7│ 2.68│
-│ Total │ 7.35│16│ 2.53│
-├────────────────────────────────────────────┼─────┼──┼──────────────┤
-│1 Temperate │13.38│ 6│ 7.77│
-│ Tropical │ 8.20│ 8│ 8.39│
-│ Total │10.42│14│ 8.26│
-├────────────────────────────────────────────┼─────┼──┼──────────────┤
-│Total Temperate │ 9.71│15│ 5.91│
-│ Tropical │ 7.86│15│ 6.20│
-│ Total │ 8.78│30│ 6.03│
-└────────────────────────────────────────────┴─────┴──┴──────────────┘
-```
-
-PSPP does not limit the number of variables for which you can
-calculate statistics, nor the number of categorical variables per layer,
-nor the number of layers. However, running `MEANS` on a large number
-of variables, or with categorical variables containing a large number
-of distinct values, may result in an extremely large output, which
-will not be easy to interpret. So you should consider carefully which
-variables to select for participation in the analysis.
-
+++ /dev/null
-# NPAR TESTS
-
-```
-NPAR TESTS
- nonparametric test subcommands
- .
- .
- .
-
- [ /STATISTICS={DESCRIPTIVES} ]
-
- [ /MISSING={ANALYSIS, LISTWISE} {INCLUDE, EXCLUDE} ]
-
- [ /METHOD=EXACT [ TIMER [(N)] ] ]
-```
-
-`NPAR TESTS` performs nonparametric tests. Nonparametric tests make
-very few assumptions about the distribution of the data. One or more
-tests may be specified by using the corresponding subcommand. If the
-`/STATISTICS` subcommand is also specified, then summary statistics
-are produced for each variable that is the subject of any test.
-
-Certain tests may take a long time to execute if an exact figure is
-required. Therefore, by default asymptotic approximations are used
-unless the subcommand `/METHOD=EXACT` is specified. Exact tests give
-more accurate results, but may take an unacceptably long time to
-perform. If the `TIMER` keyword is used, it sets a maximum time,
-after which the test is abandoned and a warning message is printed. The
-time, in minutes, should be specified in parentheses after the `TIMER`
-keyword. If the `TIMER` keyword is given without this figure, then a
-default value of 5 minutes is used.
-
-<!-- toc -->
-
-## Binomial test
-
-```
- [ /BINOMIAL[(P)]=VAR_LIST[(VALUE1[, VALUE2])] ]
-```
-
-The `/BINOMIAL` subcommand compares the observed distribution of a
-dichotomous variable with that of a binomial distribution. The variable
-`P` specifies the test proportion of the binomial distribution. The
-default value of 0.5 is assumed if `P` is omitted.
-
-If a single value appears after the variable list, then that value is
-used as the threshold to partition the observed values. Values less
-than or equal to the threshold value form the first category. Values
-greater than the threshold form the second category.
-
-If two values appear after the variable list, then they are used as
-the values which a variable must take to be in the respective category.
-Cases for which a variable takes a value equal to neither of the
-specified values take no part in the test for that variable.
-
-If no values appear, then the variable must assume dichotomous
-values. If more than two distinct, non-missing values for a variable
-under test are encountered then an error occurs.
-
-If the test proportion is equal to 0.5, then a two tailed test is
-reported. For any other test proportion, a one tailed test is reported.
-For one tailed tests, if the test proportion is less than or equal to
-the observed proportion, then the significance of observing the observed
-proportion or more is reported. If the test proportion is more than the
-observed proportion, then the significance of observing the observed
-proportion or less is reported. That is to say, the test is always
-performed in the observed direction.
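-
-As an illustration (the variable name is hypothetical), the following
-tests whether the proportion of cases taking the value 0 differs from a
-test proportion of 0.6, with the values 0 and 1 defining the two
-categories:
-
-```
-NPAR TESTS
-     /BINOMIAL(0.6) = passed (0, 1).
-```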
-
-PSPP uses a very precise approximation to the gamma function to
-compute the binomial significance. Thus, exact results are reported
-even for very large sample sizes.
-
-## Chi-square Test
-
-```
- [ /CHISQUARE=VAR_LIST[(LO,HI)] [/EXPECTED={EQUAL|F1, F2 ... FN}] ]
-```
-
-The `/CHISQUARE` subcommand produces a chi-square statistic for the
-differences between the expected and observed frequencies of the
-categories of a variable. Optionally, a range of values may appear
-after the variable list. If a range is given, then non-integer values
-are truncated, and values outside the specified range are excluded
-from the analysis.
-
-The `/EXPECTED` subcommand specifies the expected values of each
-category. There must be exactly one non-zero expected value for each
-observed category, or the `EQUAL` keyword must be specified. You may
-use the notation `N*F` to specify N consecutive expected categories all
-taking a frequency of F. The frequencies given are proportions, not
-absolute frequencies. The sum of the frequencies need not be 1. If no
-`/EXPECTED` subcommand is given, then equal frequencies are expected.
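-
-For example, the following sketch (the variable name is illustrative)
-tests a five-category variable against expected proportions of
-1:2:2:2:1; here `3*2` uses the `N*F` notation to give three consecutive
-categories an expected proportion of 2 each:
-
-```
-NPAR TESTS
-     /CHISQUARE = rating (1, 5)
-     /EXPECTED = 1 3*2 1.
-```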
-
-### Chi-square Example
-
-A researcher wishes to investigate whether there are an equal number of
-persons of each sex in a population. The sample chosen for investigation
-is that from the `physiology.sav` dataset. The null hypothesis for the
-test is that the population comprises an equal number of males and
-females. The analysis is performed as shown below:
-
-```
-get file='physiology.sav'.
-
-npar test
- /chisquare=sex.
-```
-
-
-There is only one test variable: sex. The other variables in
-the dataset are ignored.
-
-In the output, shown below, the summary box shows that in the sample,
-there are more males than females. However, the significance of the
-chi-square result is greater than 0.05—the most commonly accepted
-p-value—and therefore there is not enough evidence to reject the null
-hypothesis and one must conclude that the evidence does not indicate
-that there is an imbalance of the sexes in the population.
-
-```
- Sex of subject
-┌──────┬──────────┬──────────┬────────┐
-│Value │Observed N│Expected N│Residual│
-├──────┼──────────┼──────────┼────────┤
-│Male │ 22│ 20.00│ 2.00│
-│Female│ 18│ 20.00│ ─2.00│
-│Total │ 40│ │ │
-└──────┴──────────┴──────────┴────────┘
-
- Test Statistics
-┌──────────────┬──────────┬──┬───────────┐
-│ │Chi─square│df│Asymp. Sig.│
-├──────────────┼──────────┼──┼───────────┤
-│Sex of subject│ .40│ 1│ .527│
-└──────────────┴──────────┴──┴───────────┘
-```
-
-## Cochran Q Test
-
-```
- [ /COCHRAN = VAR_LIST ]
-```
-
-The Cochran Q test is used to test for differences between three or
-more groups. The data for `VAR_LIST` in all cases must assume exactly
-two distinct values (other than missing values).
-
-The value of Q is displayed along with its asymptotic significance
-based on a chi-square distribution.
-
-## Friedman Test
-
-```
- [ /FRIEDMAN = VAR_LIST ]
-```
-
-The Friedman test is used to test for differences between repeated
-measures when there is no indication that the distributions are normally
-distributed.
-
-A list of variables which contain the measured data must be given.
-The procedure prints the sum of ranks for each variable, the test
-statistic and its significance.
-
-## Kendall's W Test
-
-```
- [ /KENDALL = VAR_LIST ]
-```
-
-The Kendall test investigates whether an arbitrary number of related
-samples come from the same population. It is identical to the
-Friedman test except that the additional statistic W, Kendall's
-Coefficient of Concordance, is printed. It has the range \[0,1\]—a value
-of zero indicates no agreement between the samples whereas a value of
-unity indicates complete agreement.
-
-## Kolmogorov-Smirnov Test
-
-```
- [ /KOLMOGOROV-SMIRNOV ({NORMAL [MU, SIGMA], UNIFORM [MIN, MAX], POISSON [LAMBDA], EXPONENTIAL [SCALE] }) = VAR_LIST ]
-```
-
-The one sample Kolmogorov-Smirnov subcommand is used to test whether
-or not a dataset is drawn from a particular distribution. Four
-distributions are supported: normal, uniform, Poisson and
-exponential.
-
-Ideally you should provide the parameters of the distribution against
-which you wish to test the data. For example, with the normal
-distribution the mean (`MU`) and standard deviation (`SIGMA`) should
-be given; with the uniform distribution, the minimum (`MIN`) and
-maximum (`MAX`) value should be provided. However, if the parameters
-are omitted they are imputed from the data. Imputing the parameters
-reduces the power of the test so should be avoided if possible.
-
-In the following example, two variables `score` and `age` are tested to
-see if they follow a normal distribution with a mean of 3.5 and a
-standard deviation of 2.0.
-```
- NPAR TESTS
- /KOLMOGOROV-SMIRNOV (NORMAL 3.5 2.0) = score age.
-```
-If the variables need to be tested against different distributions,
-then a separate subcommand must be used. For example the following
-syntax tests `score` against a normal distribution with mean of 3.5 and
-standard deviation of 2.0 whilst `age` is tested against a normal
-distribution of mean 40 and standard deviation 1.5.
-```
- NPAR TESTS
- /KOLMOGOROV-SMIRNOV (NORMAL 3.5 2.0) = score
- /KOLMOGOROV-SMIRNOV (NORMAL 40 1.5) = age.
-```
-
-The abbreviated subcommand `K-S` may be used in place of
-`KOLMOGOROV-SMIRNOV`.
-
-## Kruskal-Wallis Test
-
-```
- [ /KRUSKAL-WALLIS = VAR_LIST BY VAR (LOWER, UPPER) ]
-```
-
-The Kruskal-Wallis test is used to compare data from an arbitrary
-number of populations. It does not assume normality. The data to be
-compared are specified by `VAR_LIST`. The categorical variable
-determining the groups to which the data belongs is given by `VAR`.
-The limits `LOWER` and `UPPER` specify the valid range of `VAR`. If
-`UPPER` is smaller than `LOWER`, then PSPP will assume their values to
-be reversed. Any cases for which `VAR` falls outside `[LOWER, UPPER]`
-are ignored.
-
-The mean rank of each group as well as the chi-squared value and
-significance of the test are printed. The abbreviated subcommand `K-W`
-may be used in place of `KRUSKAL-WALLIS`.
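-
-For example, the following sketch (variable names are illustrative)
-compares `score` across three groups coded 1 to 3:
-
-```
-NPAR TESTS
-     /K-W = score BY group (1, 3).
-```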
-
-## Mann-Whitney U Test
-
-```
- [ /MANN-WHITNEY = VAR_LIST BY var (GROUP1, GROUP2) ]
-```
-
-The Mann-Whitney subcommand is used to test whether two groups of
-data come from different populations. The variables to be tested should
-be specified in `VAR_LIST` and the grouping variable, that determines to
-which group the test variables belong, in `VAR`. `VAR` may be either a
-numeric or a string variable. `GROUP1` and `GROUP2` specify the two values
-of VAR which determine the groups of the test data. Cases for which the
-`VAR` value is neither `GROUP1` or `GROUP2` are ignored.
-
-The value of the Mann-Whitney U statistic, the Wilcoxon W, and the
-significance are printed. You may abbreviate the subcommand
-`MANN-WHITNEY` to `M-W`.
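-
-For example, assuming a hypothetical test variable `score` and a
-grouping variable `sex` coded 0 and 1, the following tests whether the
-two groups come from populations with the same distribution of
-`score`:
-```
- NPAR TESTS
- /M-W = score BY sex (0, 1).
-```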
-
-
-## McNemar Test
-
-```
- [ /MCNEMAR VAR_LIST [ WITH VAR_LIST [ (PAIRED) ]]]
-```
-
-Use McNemar's test to analyse the significance of the difference
-between pairs of correlated proportions.
-
-If the `WITH` keyword is omitted, then tests for all combinations of
-the listed variables are performed. If the `WITH` keyword is given, and
-the `(PAIRED)` keyword is also given, then the number of variables
-preceding `WITH` must be the same as the number following it. In this
-case, tests for each respective pair of variables are performed. If the
-`WITH` keyword is given, but the `(PAIRED)` keyword is omitted, then
-tests for each combination of variable preceding `WITH` against variable
-following `WITH` are performed.
-
-The data in each variable must be dichotomous. If a variable contains
-more than two distinct values, an error will occur and the test will
-not be run.
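-
-For example, assuming hypothetical dichotomous variables `before` and
-`after`, each recording a yes/no response before and after some
-intervention, the following performs a McNemar test on the pair:
-```
- NPAR TESTS
- /MCNEMAR before WITH after (PAIRED).
-```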
-
-## Median Test
-
-```
- [ /MEDIAN [(VALUE)] = VAR_LIST BY VARIABLE (VALUE1, VALUE2) ]
-```
-
-The median test is used to test whether independent samples come from
-populations with a common median. The median of the populations against
-which the samples are to be tested may be given in parentheses
-immediately after the `/MEDIAN` subcommand. If it is not given, the
-median is imputed from the union of all the samples.
-
-The variables of the samples to be tested should immediately follow
-the `=` sign. The keyword `BY` must come next, and then the grouping
-variable. Two values in parentheses should follow. If the first
-value is greater than the second, then a 2-sample test is performed
-using these two values to determine the groups. If however, the first
-value is less than the second, then a k-sample test is conducted
-and the group values used are all values encountered which lie in the
-range `[VALUE1,VALUE2]`.
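-
-For example, assuming hypothetical variables `score` and `group`, the
-following performs a k-sample median test against a hypothesized
-median of 50, using all values of `group` between 1 and 3 to define
-the groups:
-```
- NPAR TESTS
- /MEDIAN (50) = score BY group (1, 3).
-```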
-
-## Runs Test
-
-```
- [ /RUNS ({MEAN, MEDIAN, MODE, VALUE}) = VAR_LIST ]
-```
-
-The `/RUNS` subcommand tests whether a data sequence is randomly
-ordered.
-
-It works by examining the number of times a variable's value crosses
-a given threshold. The desired threshold must be specified within
-parentheses. It may either be specified as a number or as one of
-`MEAN`, `MEDIAN` or `MODE`. Following the threshold specification comes
-the list of variables whose values are to be tested.
-
-The subcommand shows the number of runs and the asymptotic
-significance based on the length of the data.
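-
-For example, the following sketch tests whether the values of a
-hypothetical variable `score` are randomly ordered about their median:
-```
- NPAR TESTS
- /RUNS (MEDIAN) = score.
-```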
-
-## Sign Test
-
-```
- [ /SIGN VAR_LIST [ WITH VAR_LIST [ (PAIRED) ]]]
-```
-
-The `/SIGN` subcommand tests for differences between medians of the
-variables listed. The test does not make any assumptions about the
-distribution of the data.
-
-If the `WITH` keyword is omitted, then tests for all combinations of
-the listed variables are performed. If the `WITH` keyword is given, and
-the `(PAIRED)` keyword is also given, then the number of variables
-preceding `WITH` must be the same as the number following it. In this
-case, tests for each respective pair of variables are performed. If the
-`WITH` keyword is given, but the `(PAIRED)` keyword is omitted, then
-tests for each combination of variable preceding `WITH` against variable
-following `WITH` are performed.
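-
-For example, the following sketch (with hypothetical variables `v1`,
-`v2` and `v3`) performs sign tests for every combination of the three
-variables:
-```
- NPAR TESTS
- /SIGN v1 v2 v3.
-```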
-
-## Wilcoxon Matched Pairs Signed Ranks Test
-
-```
- [ /WILCOXON VAR_LIST [ WITH VAR_LIST [ (PAIRED) ]]]
-```
-
-The `/WILCOXON` subcommand tests for differences between medians of
-the variables listed. The test does not make any assumptions about the
-variances of the samples. It does however assume that the distribution
-is symmetrical.
-
-If the `WITH` keyword is omitted, then tests for all combinations of
-the listed variables are performed. If the `WITH` keyword is given, and
-the `(PAIRED)` keyword is also given, then the number of variables
-preceding `WITH` must be the same as the number following it. In this
-case, tests for each respective pair of variables are performed. If the
-`WITH` keyword is given, but the `(PAIRED)` keyword is omitted, then
-tests for each combination of variable preceding `WITH` against variable
-following `WITH` are performed.
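-
-For example, assuming hypothetical variables `before` and `after`
-containing repeated measurements on the same cases, the following
-performs a Wilcoxon test on the pair:
-```
- NPAR TESTS
- /WILCOXON before WITH after (PAIRED).
-```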
-
+++ /dev/null
-# ONEWAY
-
-```
-ONEWAY
- [/VARIABLES = ] VAR_LIST BY VAR
- /MISSING={ANALYSIS,LISTWISE} {EXCLUDE,INCLUDE}
- /CONTRAST= VALUE1 [, VALUE2] ... [,VALUEN]
- /STATISTICS={DESCRIPTIVES,HOMOGENEITY}
- /POSTHOC={BONFERRONI, GH, LSD, SCHEFFE, SIDAK, TUKEY, ALPHA ([VALUE])}
-```
-
-The `ONEWAY` procedure performs a one-way analysis of variance of
-variables factored by a single independent variable. It is used to
-compare the means of a population divided into more than two groups.
-
-The dependent variables to be analysed should be given in the
-`VARIABLES` subcommand. The list of variables must be followed by the
-`BY` keyword and the name of the independent (or factor) variable.
-
-You can use the `STATISTICS` subcommand to tell PSPP to display
-ancillary information. The options accepted are:
-- `DESCRIPTIVES`: Displays descriptive statistics about the groups
-factored by the independent variable.
-- `HOMOGENEITY`: Displays the Levene test of Homogeneity of Variance for
-the variables and their groups.
-
-The `CONTRAST` subcommand is used when you anticipate certain
-differences between the groups. The subcommand must be followed by a
-list of numerals which are the coefficients of the groups to be tested.
-The number of coefficients must correspond to the number of distinct
-groups (or values of the independent variable). If the sum of the
-coefficients is not zero, then PSPP displays a warning but proceeds
-with the analysis. The `CONTRAST` subcommand may be given up to 10
-times in order to specify different contrast tests.
-
-The `MISSING` subcommand defines how missing values are handled. If
-`LISTWISE` is
-specified then cases which have missing values for the independent
-variable or any dependent variable are ignored. If `ANALYSIS` is
-specified, then cases are ignored if the independent variable is missing
-or if the dependent variable currently being analysed is missing. The
-default is `ANALYSIS`. A setting of `EXCLUDE` means that variables
-whose values are user-missing are to be excluded from the analysis. A
-setting of `INCLUDE` means they are to be included. The default is
-`EXCLUDE`.
-
-Using the `POSTHOC` subcommand you can perform multiple pairwise
-comparisons on the data. The following comparison methods are
-available:
-- `LSD`: Least Significant Difference.
-- `TUKEY`: Tukey Honestly Significant Difference.
-- `BONFERRONI`: Bonferroni test.
-- `SCHEFFE`: Scheffé's test.
-- `SIDAK`: Sidak test.
-- `GH`: The Games-Howell test.
-
-Use the optional syntax `ALPHA(VALUE)` to indicate that `ONEWAY` should
-perform the posthoc tests at a confidence level of `VALUE`. If
-`ALPHA(VALUE)` is not specified, then the confidence level used is 0.05.
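-
-As an illustrative sketch, assuming a dependent variable `score` and a
-factor variable `group` with three distinct values (both names are
-hypothetical), the following requests descriptive statistics, a
-contrast of the first group against the other two, and Tukey posthoc
-comparisons:
-```
-ONEWAY /VARIABLES = score BY group
- /STATISTICS = DESCRIPTIVES
- /CONTRAST = -2 1 1
- /POSTHOC = TUKEY.
-```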
-
+++ /dev/null
-# QUICK CLUSTER
-
-```
-QUICK CLUSTER VAR_LIST
- [/CRITERIA=CLUSTERS(K) [MXITER(MAX_ITER)] CONVERGE(EPSILON) [NOINITIAL]]
- [/MISSING={EXCLUDE,INCLUDE} {LISTWISE, PAIRWISE}]
- [/PRINT={INITIAL} {CLUSTER}]
- [/SAVE[=[CLUSTER[(MEMBERSHIP_VAR)]] [DISTANCE[(DISTANCE_VAR)]]]]
-```
-
-The `QUICK CLUSTER` command performs k-means clustering on the
-dataset. This is useful when you wish to allocate cases into clusters
-of similar values and you already know the number of clusters.
-
-The minimum specification is `QUICK CLUSTER` followed by the names of
-the variables which contain the cluster data. Normally you will also
-want to specify `/CRITERIA=CLUSTERS(K)` where `K` is the number of
-clusters. If this is not specified, then `K` defaults to 2.
-
-If you use `/CRITERIA=NOINITIAL` then a naive algorithm to select the
-initial clusters is used. This will provide for faster execution but
-less well separated initial clusters and hence possibly an inferior
-final result.
-
-`QUICK CLUSTER` uses an iterative algorithm to select the cluster
-centers. The subcommand `/CRITERIA=MXITER(MAX_ITER)` sets the maximum
-number of iterations. During classification, PSPP will continue
-iterating until `MAX_ITER` iterations have been done or the
-convergence criterion (see below) is fulfilled. The default value of
-`MAX_ITER` is 2.
-
-If however, you specify `/CRITERIA=NOUPDATE` then after selecting the
-initial centers, no further update to the cluster centers is done. In
-this case, `MAX_ITER`, if specified, is ignored.
-
-The subcommand `/CRITERIA=CONVERGE(EPSILON)` is used to set the
-convergence criterion. The value of convergence criterion is
-`EPSILON` times the minimum distance between the _initial_ cluster
-centers. Iteration stops when the mean cluster distance between one
-iteration and the next is less than the convergence criterion. The
-default value of `EPSILON` is zero.
-
-The `MISSING` subcommand determines the handling of missing
-values. If `INCLUDE` is set, then user-missing values are considered
-at their face value and not as missing values. If `EXCLUDE` is set,
-which is the default, user-missing values are excluded as well as
-system-missing values.
-
-If `LISTWISE` is set, then the entire case is excluded from the
-analysis whenever any of the clustering variables contains a missing
-value. If `PAIRWISE` is set, then a case is considered missing only if
-all the clustering variables contain missing values. Otherwise it is
-clustered on the basis of the non-missing values. The default is
-`LISTWISE`.
-
-The `PRINT` subcommand requests additional output to be printed. If
-`INITIAL` is set, then the initial cluster memberships will be printed.
-If `CLUSTER` is set, the cluster memberships of the individual cases are
-displayed (potentially generating lengthy output).
-
-You can specify the subcommand `SAVE` to ask that each case's cluster
-membership and the euclidean distance between the case and its cluster
-center be saved to a new variable in the active dataset. To save the
-cluster membership use the `CLUSTER` keyword and to save the distance
-use the `DISTANCE` keyword. Each keyword may optionally be followed by
-a variable name in parentheses to specify the new variable which is to
-contain the saved parameter. If no variable name is specified, then
-PSPP will create one.
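-
-As an illustrative sketch, assuming hypothetical clustering variables
-`x1`, `x2` and `x3`, the following partitions the cases into three
-clusters, allows up to 20 iterations, and saves each case's cluster
-membership in a new variable `grp`:
-```
-QUICK CLUSTER x1 x2 x3
- /CRITERIA=CLUSTERS(3) MXITER(20)
- /SAVE=CLUSTER(grp).
-```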
-
+++ /dev/null
-# RANK
-
-```
-RANK
- [VARIABLES=] VAR_LIST [{A,D}] [BY VAR_LIST]
- /TIES={MEAN,LOW,HIGH,CONDENSE}
- /FRACTION={BLOM,TUKEY,VW,RANKIT}
- /PRINT[={YES,NO}]
- /MISSING={EXCLUDE,INCLUDE}
-
- /RANK [INTO VAR_LIST]
- /NTILES(k) [INTO VAR_LIST]
- /NORMAL [INTO VAR_LIST]
- /PERCENT [INTO VAR_LIST]
- /RFRACTION [INTO VAR_LIST]
- /PROPORTION [INTO VAR_LIST]
- /N [INTO VAR_LIST]
- /SAVAGE [INTO VAR_LIST]
-```
-
-The `RANK` command ranks variables and stores the results into new
-variables.
-
-The `VARIABLES` subcommand, which is mandatory, specifies one or more
-variables whose values are to be ranked. After each variable, `A` or
-`D` may appear, indicating that the variable is to be ranked in
-ascending or descending order. Ascending is the default. If a `BY`
-keyword appears, it should be followed by a list of variables which are
-to serve as group variables. In this case, the cases are gathered into
-groups, and ranks calculated for each group.
-
-The `TIES` subcommand specifies how tied values are to be treated.
-The default is to take the mean value of all the tied cases.
-
-The `FRACTION` subcommand specifies how proportional ranks are to be
-calculated. This only has any effect if the `NORMAL` or `PROPORTION`
-rank functions are requested.
-
-The `PRINT` subcommand may be used to specify that a summary of the
-rank variables created should appear in the output.
-
-The function subcommands are `RANK`, `NTILES`, `NORMAL`, `PERCENT`,
-`RFRACTION`, `PROPORTION`, and `SAVAGE`. Any number of function
-subcommands may appear. If none are given, then the default is `RANK`.
-The `NTILES` subcommand must take an integer specifying the number of
-partitions into which values should be ranked. Each subcommand may be
-followed by the `INTO` keyword and a list of variables which are the
-variables to be created and receive the rank scores. There may be as
-many variables specified as there are variables named on the
-`VARIABLES` subcommand. If fewer are specified, then the variable
-names are automatically created.
-
-The `MISSING` subcommand determines how user missing values are to be
-treated. A setting of `EXCLUDE` means that variables whose values are
-user-missing are to be excluded from the rank scores. A setting of
-`INCLUDE` means they are to be included. The default is `EXCLUDE`.
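-
-As an illustrative sketch, assuming a hypothetical variable `score`
-grouped by `group`, the following ranks `score` in descending order
-within each group, resolving ties by the lowest rank, and stores the
-ranks in a new variable `rscore`:
-```
-RANK VARIABLES = score (D) BY group
- /TIES = LOW
- /RANK INTO rscore.
-```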
-
+++ /dev/null
-# REGRESSION
-
-The `REGRESSION` procedure fits linear models to data via least-squares
-estimation. The procedure is appropriate for data which satisfy those
-assumptions typical in linear regression:
-
-- The data set contains \\(n\\) observations of a dependent variable,
- say \\(y_1,...,y_n\\), and \\(n\\) observations of one or more
- explanatory variables. Let \\(x_{11}, x_{12}, ..., x_{1n}\\)
- denote the \\(n\\) observations of the first explanatory variable;
- \\(x_{21},...,x_{2n}\\) denote the \\(n\\) observations of the
- second explanatory variable; and \\(x_{k1},...,x_{kn}\\) denote the
- \\(n\\) observations of the \\(k\\)th explanatory variable.
-
-- The dependent variable \\(y\\) has the following relationship to the
- explanatory variables: \\(y_i = b_0 + b_1 x_{1i} + ... + b_k
- x_{ki} + z_i\\) where \\(b_0, b_1, ..., b_k\\) are unknown
- coefficients, and \\(z_1,...,z_n\\) are independent, normally
- distributed "noise" terms with mean zero and common variance. The
- noise, or "error" terms are unobserved. This relationship is called
- the "linear model".
-
- The `REGRESSION` procedure estimates the coefficients
-\\(b_0,...,b_k\\) and produces output relevant to inferences for the
-linear model.
-
-## Syntax
-
-```
-REGRESSION
- /VARIABLES=VAR_LIST
- /DEPENDENT=VAR_LIST
- /STATISTICS={ALL, DEFAULTS, R, COEFF, ANOVA, BCOV, CI[CONF], TOL}
- { /ORIGIN | /NOORIGIN }
- /SAVE={PRED, RESID}
-```
-
-The `REGRESSION` procedure reads the active dataset and outputs
-statistics relevant to the linear model specified by the user.
-
-The `VARIABLES` subcommand, which is required, specifies the list of
-variables to be analyzed. The
-`DEPENDENT` subcommand specifies the dependent variable of the linear
-model. The `DEPENDENT` subcommand is required. All variables listed
-in the `VARIABLES` subcommand, but not listed in the `DEPENDENT`
-subcommand, are treated as explanatory variables in the linear model.
-
-All other subcommands are optional:
-
-The `STATISTICS` subcommand specifies which statistics are to be
-displayed. The following keywords are accepted:
-
-* `ALL`
- All of the statistics below.
-* `R`
- The ratio of the sums of squares due to the model to the total sums
- of squares for the dependent variable.
-* `COEFF`
- A table containing the estimated model coefficients and their
- standard errors.
-* `CI (CONF)`
- This item is only relevant if `COEFF` has also been selected. It
- specifies that the confidence interval for the coefficients should
- be printed. The optional value `CONF`, which must be in
- parentheses, is the desired confidence level expressed as a
- percentage.
-* `ANOVA`
- Analysis of variance table for the model.
-* `BCOV`
- The covariance matrix for the estimated model coefficients.
-* `TOL`
- The variance inflation factor and its reciprocal. This has no
- effect unless `COEFF` is also given.
-* `DEFAULTS`
-  The same as if `R`, `COEFF`, and `ANOVA` had been selected. This is
-  what you get if the `/STATISTICS` subcommand is not specified, or if
-  it is specified without any parameters.
-
-The `ORIGIN` and `NOORIGIN` subcommands are mutually exclusive.
-`ORIGIN` indicates that the regression should be performed through the
-origin. You should use this option if, and only if, you have reason to
-believe that the regression does indeed pass through the origin -- that
-is to say, the value \\(b_0\\) above is zero. The default is `NOORIGIN`.
-
-The `SAVE` subcommand causes PSPP to save the residuals or predicted
-values from the fitted model to the active dataset. PSPP will store the
-residuals in a variable called `RES1` if no such variable exists, `RES2`
-if `RES1` already exists, `RES3` if `RES1` and `RES2` already exist,
-etc. It will choose the name of the variable for the predicted values
-similarly, but with `PRED` as a prefix. When `SAVE` is used, PSPP
-ignores `TEMPORARY`, treating temporary transformations as permanent.
-
-## Example
-
-The following PSPP syntax will generate the default output and save the
-predicted values and residuals to the active dataset.
-
-```
-title 'Demonstrate REGRESSION procedure'.
-data list / v0 1-2 (A) v1 v2 3-22 (10).
-begin data.
-b 7.735648 -23.97588
-b 6.142625 -19.63854
-a 7.651430 -25.26557
-c 6.125125 -16.57090
-a 8.245789 -25.80001
-c 6.031540 -17.56743
-a 9.832291 -28.35977
-c 5.343832 -16.79548
-a 8.838262 -29.25689
-b 6.200189 -18.58219
-end data.
-list.
-regression /variables=v0 v1 v2 /statistics defaults /dependent=v2
- /save pred resid /method=enter.
-```
+++ /dev/null
-# RELIABILITY
-
-```
-RELIABILITY
- /VARIABLES=VAR_LIST
- /SCALE (NAME) = {VAR_LIST, ALL}
- /MODEL={ALPHA, SPLIT[(N)]}
- /SUMMARY={TOTAL,ALL}
- /MISSING={EXCLUDE,INCLUDE}
-```
-
-The `RELIABILITY` command performs reliability analysis on the data.
-
-The `VARIABLES` subcommand is required. It determines the set of
-variables upon which analysis is to be performed.
-
-The `SCALE` subcommand determines the variables for which reliability
-is to be calculated. If `SCALE` is omitted, then all variables named
-in the `VARIABLES` subcommand are used. Optionally, the `NAME`
-parameter may be specified to set a string name for the scale.
-
-The `MODEL` subcommand determines the type of analysis. If `ALPHA`
-is specified, then Cronbach's Alpha is calculated for the scale. If the
-model is `SPLIT`, then the variables are divided into 2 subsets. An
-optional parameter `N` may be given to specify how many variables are
-to be in the first subset. If `N` is omitted, then it defaults to one
-half of the variables in the scale, or one half minus one if there is
-an odd number of variables. The default model is `ALPHA`.
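-
-For example, the following sketch (with hypothetical variables `v1` to
-`v5`) performs a split-half analysis in which the first two variables
-form one subset and the remaining three form the other:
-```
-RELIABILITY
- /VARIABLES = v1 v2 v3 v4 v5
- /SCALE ('split') = ALL
- /MODEL = SPLIT(2).
-```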
-
-By default, any cases with user missing, or system missing values for
-any variables given in the `VARIABLES` subcommand are omitted from the
-analysis. The `MISSING` subcommand determines whether user missing
-values are included or excluded in the analysis.
-
-The `SUMMARY` subcommand determines the type of summary analysis to
-be performed. Currently there is only one type: `SUMMARY=TOTAL`, which
-displays per-item analysis tested against the totals.
-
-## Example
-
-Before analysing the results of a survey—particularly for a multiple
-choice survey—it is desirable to know whether the respondents have
-considered their answers or simply provided random answers.
-
-In the following example the survey results from the file `hotel.sav`
-are used. All five survey questions are included in the reliability
-analysis. However, before running the analysis, the data must be
-preprocessed. An examination of the survey questions reveals that two
-questions, viz: `v3` and `v5`, are negatively worded, whereas the others
-are positively worded. All questions must be based upon the same
-scale for the analysis to be meaningful. One could use the
-[`RECODE`](../../commands/data/recode.md) command, however a simpler
-way is to use [`COMPUTE`](../../commands/data/compute.md) and this is what
-is done in the syntax below.
-
-```
-get file="hotel.sav".
-
-* Recode V3 and V5 inverting the sense of the values.
-compute v3 = 6 - v3.
-compute v5 = 6 - v5.
-
-reliability
- /variables= all
- /model=alpha.
-```
-
-In this case, all variables in the data set are used, so we can use
-the special keyword `ALL`.
-
-The output, below, shows that Cronbach's Alpha is 0.11 which is a
-value normally considered too low to indicate consistency within the
-data. This is possibly due to the small number of survey questions.
-The survey should be redesigned before the results are put to serious
-use.
-
-```
-Scale: ANY
-
-Case Processing Summary
-┌────────┬──┬───────┐
-│Cases │ N│Percent│
-├────────┼──┼───────┤
-│Valid │17│ 100.0%│
-│Excluded│ 0│ .0%│
-│Total │17│ 100.0%│
-└────────┴──┴───────┘
-
- Reliability Statistics
-┌────────────────┬──────────┐
-│Cronbach's Alpha│N of Items│
-├────────────────┼──────────┤
-│ .11│ 5│
-└────────────────┴──────────┘
-```
+++ /dev/null
-# ROC
-
-```
-ROC
- VAR_LIST BY STATE_VAR (STATE_VALUE)
- /PLOT = { CURVE [(REFERENCE)], NONE }
- /PRINT = [ SE ] [ COORDINATES ]
- /CRITERIA = [ CUTOFF({INCLUDE,EXCLUDE}) ]
- [ TESTPOS ({LARGE,SMALL}) ]
- [ CI (CONFIDENCE) ]
- [ DISTRIBUTION ({FREE, NEGEXPO }) ]
- /MISSING={EXCLUDE,INCLUDE}
-```
-
-The `ROC` command is used to plot the receiver operating
-characteristic curve of a dataset, and to estimate the area under the
-curve. This is useful for analysing the efficacy of a variable as a
-predictor of a state of nature.
-
-The mandatory `VAR_LIST` is the list of predictor variables. The
-variable `STATE_VAR` is the variable whose values represent the actual
-states, and `STATE_VALUE` is the value of this variable which represents
-the positive state.
-
-The optional subcommand `PLOT` is used to determine if and how the
-`ROC` curve is drawn. The keyword `CURVE` means that the `ROC` curve
-should be drawn, and the optional keyword `REFERENCE`, which should be
-enclosed in parentheses, says that the diagonal reference line should be
-drawn. If the keyword `NONE` is given, then no `ROC` curve is drawn.
-By default, the curve is drawn with no reference line.
-
-The optional subcommand `PRINT` determines which additional tables
-should be printed. Two additional tables are available. The `SE`
-keyword says that standard error of the area under the curve should be
-printed as well as the area itself. In addition, a p-value for the null
-hypothesis that the area under the curve equals 0.5 is printed. The
-`COORDINATES` keyword says that a table of coordinates of the `ROC`
-curve should be printed.
-
-The `CRITERIA` subcommand has four optional parameters:
-
-- The `TESTPOS` parameter may be `LARGE` or `SMALL`. `LARGE` is the
- default, and says that larger values in the predictor variables are
- to be considered positive. `SMALL` indicates that smaller values
- should be considered positive.
-
-- The `CI` parameter specifies the confidence interval that should be
- printed. It has no effect if the `SE` keyword in the `PRINT`
- subcommand has not been given.
-
-- The `DISTRIBUTION` parameter determines the method to be used when
- estimating the area under the curve. There are two possibilities,
- viz: `FREE` and `NEGEXPO`. The `FREE` method uses a non-parametric
- estimate, and the `NEGEXPO` method a bi-negative exponential
- distribution estimate. The `NEGEXPO` method should only be used
- when the number of positive actual states is equal to the number of
- negative actual states. The default is `FREE`.
-
-- The `CUTOFF` parameter is for compatibility and is ignored.
-
-The `MISSING` subcommand determines whether user missing values are to
-be included or excluded in the analysis. The default behaviour is to
-exclude them. Cases are excluded on a listwise basis; if any of the
-variables in `VAR_LIST` or the variable `STATE_VAR` is missing for a
-case, then the entire case is excluded.
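-
-As an illustrative sketch, assuming a hypothetical predictor `score`
-and a state variable `disease` in which the value 1 represents the
-positive state, the following draws the ROC curve with a reference
-line and prints the standard error of the area under the curve:
-```
-ROC score BY disease (1)
- /PLOT = CURVE(REFERENCE)
- /PRINT = SE.
-```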
-
+++ /dev/null
-# T-TEST
-
-```
-T-TEST
- /MISSING={ANALYSIS,LISTWISE} {EXCLUDE,INCLUDE}
- /CRITERIA=CI(CONFIDENCE)
-
-
-(One Sample mode.)
- TESTVAL=TEST_VALUE
- /VARIABLES=VAR_LIST
-
-
-(Independent Samples mode.)
- GROUPS=var(VALUE1 [, VALUE2])
- /VARIABLES=VAR_LIST
-
-
-(Paired Samples mode.)
- PAIRS=VAR_LIST [WITH VAR_LIST [(PAIRED)] ]
-```
-
-The `T-TEST` procedure outputs tables used in testing hypotheses
-about means. It operates in one of three modes:
-- [One Sample mode](#one-sample-mode).
-- [Independent Samples mode](#independent-samples-mode).
-- [Paired Samples mode](#paired-samples-mode).
-
-Each of these modes is described in more detail below. There are two
-optional subcommands which are common to all modes.
-
-The `/CRITERIA` subcommand tells PSPP the confidence interval used in
-the tests. The default value is 0.95.
-
-The `MISSING` subcommand determines the handling of missing
-values. If `INCLUDE` is set, then user-missing values are included
-in the calculations, but system-missing values are not. If `EXCLUDE` is
-set, which is the default, user-missing values are excluded as well as
-system-missing values.
-
-If `LISTWISE` is set, then the entire case is excluded from analysis
-whenever any variable specified in the `/VARIABLES`, `/PAIRS` or
-`/GROUPS` subcommands contains a missing value. If `ANALYSIS` is set,
-then missing values are excluded only in the analysis for which they
-would be needed. This is the default.
-
-## One Sample Mode
-
-The `TESTVAL` subcommand invokes the One Sample mode. This mode is used
-to test a population mean against a hypothesized mean. The value given
-to the `TESTVAL` subcommand is the value against which you wish to test.
-In this mode, you must also use the `/VARIABLES` subcommand to tell PSPP
-which variables you wish to test.
-
-### Example
-
-A researcher wishes to know whether the weight of persons in a
-population is different from the national average. The samples are
-drawn from the population under investigation and recorded in the file
-`physiology.sav`. From the Department of Health, she knows that the
-national average weight of healthy adults is 76.8kg. Accordingly the
-`TESTVAL` is set to 76.8. The null hypothesis therefore is that the
-mean weight of the population from which the sample was drawn is
-76.8kg.
-
-As previously noted, one sample in the dataset contains a weight
-value which is clearly incorrect. So this is excluded from the
-analysis using the `SELECT` command.
-
-```
-GET FILE='physiology.sav'.
-
-SELECT IF (weight > 0).
-
-T-TEST TESTVAL = 76.8
- /VARIABLES = weight.
-```
-
-The output below shows that the mean of our sample differs from the
-test value by -1.40kg. However the significance is very high (0.610).
-So one cannot reject the null hypothesis, and must conclude there is
-not enough evidence to suggest that the mean weight of the persons in
-our population is different from 76.8kg.
-
-```
- One─Sample Statistics
-┌───────────────────┬──┬─────┬──────────────┬─────────┐
-│ │ N│ Mean│Std. Deviation│S.E. Mean│
-├───────────────────┼──┼─────┼──────────────┼─────────┤
-│Weight in kilograms│39│75.40│ 17.08│ 2.73│
-└───────────────────┴──┴─────┴──────────────┴─────────┘
-
- One─Sample Test
-┌──────────────┬──────────────────────────────────────────────────────────────┐
-│ │ Test Value = 76.8 │
-│ ├────┬──┬────────────┬────────────┬────────────────────────────┤
-│ │ │ │ │ │ 95% Confidence Interval of │
-│ │ │ │ │ │ the Difference │
-│ │ │ │ Sig. (2─ │ Mean ├──────────────┬─────────────┤
-│ │ t │df│ tailed) │ Difference │ Lower │ Upper │
-├──────────────┼────┼──┼────────────┼────────────┼──────────────┼─────────────┤
-│Weight in │─.51│38│ .610│ ─1.40│ ─6.94│ 4.13│
-│kilograms │ │ │ │ │ │ │
-└──────────────┴────┴──┴────────────┴────────────┴──────────────┴─────────────┘
-```
-
-## Independent Samples Mode
-
-The `GROUPS` subcommand invokes Independent Samples mode or 'Groups'
-mode. This mode is used to test whether two groups of values have the
-same population mean. In this mode, you must also use the `/VARIABLES`
-subcommand to tell PSPP the dependent variables you wish to test.
-
-The variable given in the `GROUPS` subcommand is the independent
-variable which determines to which group the samples belong. The values
-in parentheses are the specific values of the independent variable for
-each group. If the parentheses are omitted and no values are given, the
-default values of 1.0 and 2.0 are assumed.
-
-If the independent variable is numeric, it is acceptable to specify
-only one value inside the parentheses. If you do this, cases where the
-independent variable is greater than or equal to this value belong to
-the first group, and cases less than this value belong to the second
-group. When using this form of the `GROUPS` subcommand, missing values
-in the independent variable are excluded on a listwise basis, regardless
-of whether `/MISSING=LISTWISE` was specified.
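-
-For example, the following sketch (with hypothetical variables
-`weight` and `age`) compares the mean of `weight` between cases with
-`age` greater than or equal to 18 and cases with `age` less than 18:
-```
-T-TEST /VARIABLES = weight
- /GROUPS = age (18).
-```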
-
-### Example
-
-A researcher wishes to know whether within a population, adult males are
-taller than adult females. The samples are drawn from the population
-under investigation and recorded in the file `physiology.sav`.
-
-As previously noted, one sample in the dataset contains a height value
-which is clearly incorrect. So this is excluded from the analysis
-using the `SELECT` command.
-
-```
-get file='physiology.sav'.
-
-select if (height >= 200).
-
-t-test /variables = height
- /groups = sex(0,1).
-```
-
-The null hypothesis is that both males and females are on average of
-equal height.
-
-From the output, shown below, one can clearly see that the _sample_
-mean height is greater for males than for females. However in order
-to see if this is a significant result, one must consult the T-Test
-table.
-
-The T-Test table contains two rows; one for use if the variance of
-the samples in each group may be safely assumed to be equal, and the
-second row if the variances in each group may not be safely assumed to
-be equal.
-
-In this case however, both rows show a 2-tailed significance less
-than 0.001 and one must therefore reject the null hypothesis and
-conclude that within the population the mean height of males and of
-females are unequal.
-
-```
- Group Statistics
-┌────────────────────────────┬──┬───────┬──────────────┬─────────┐
-│ Group │ N│ Mean │Std. Deviation│S.E. Mean│
-├────────────────────────────┼──┼───────┼──────────────┼─────────┤
-│Height in millimeters Male │22│1796.49│ 49.71│ 10.60│
-│ Female│17│1610.77│ 25.43│ 6.17│
-└────────────────────────────┴──┴───────┴──────────────┴─────────┘
-
- Independent Samples Test
-┌─────────────────────┬──────────┬──────────────────────────────────────────
-│ │ Levene's │
-│ │ Test for │
-│ │ Equality │
-│ │ of │
-│ │ Variances│ T─Test for Equality of Means
-│ ├────┬─────┼─────┬─────┬───────┬──────────┬──────────┐
-│ │ │ │ │ │ │ │ │
-│ │ │ │ │ │ │ │ │
-│ │ │ │ │ │ │ │ │
-│ │ │ │ │ │ │ │ │
-│ │ │ │ │ │ Sig. │ │ │
-│ │ │ │ │ │ (2─ │ Mean │Std. Error│
-│ │ F │ Sig.│ t │ df │tailed)│Difference│Difference│
-├─────────────────────┼────┼─────┼─────┼─────┼───────┼──────────┼──────────┤
-│Height in Equal │ .97│ .331│14.02│37.00│ .000│ 185.72│ 13.24│
-│millimeters variances│ │ │ │ │ │ │ │
-│ assumed │ │ │ │ │ │ │ │
-│ Equal │ │ │15.15│32.71│ .000│ 185.72│ 12.26│
-│ variances│ │ │ │ │ │ │ │
-│ not │ │ │ │ │ │ │ │
-│ assumed │ │ │ │ │ │ │ │
-└─────────────────────┴────┴─────┴─────┴─────┴───────┴──────────┴──────────┘
-
-┌─────────────────────┬─────────────┐
-│ │ │
-│ │ │
-│ │ │
-│ │ │
-│ │ │
-│ ├─────────────┤
-│ │ 95% │
-│ │ Confidence │
-│ │ Interval of │
-│ │ the │
-│ │ Difference │
-│ ├──────┬──────┤
-│ │ Lower│ Upper│
-├─────────────────────┼──────┼──────┤
-│Height in Equal │158.88│212.55│
-│millimeters variances│ │ │
-│ assumed │ │ │
-│ Equal │160.76│210.67│
-│ variances│ │ │
-│ not │ │ │
-│ assumed │ │ │
-└─────────────────────┴──────┴──────┘
-```
-
-## Paired Samples Mode
-
-The `PAIRS` subcommand introduces Paired Samples mode. Use this mode
-when repeated measures have been taken from the same samples. If the
-`WITH` keyword is omitted, then tables for all combinations of variables
-given in the `PAIRS` subcommand are generated. If the `WITH` keyword is
-given, and the `(PAIRED)` keyword is also given, then the number of
-variables preceding `WITH` must be the same as the number following it.
-In this case, tables for each respective pair of variables are
-generated. In the event that the `WITH` keyword is given, but the
-`(PAIRED)` keyword is omitted, then tables for each combination of
-variable preceding `WITH` against variable following `WITH` are
-generated.
-
--- /dev/null
+# STRING
+
+`STRING` creates new string variables.
+
+```
+STRING VAR_LIST (FMT_SPEC) [/VAR_LIST (FMT_SPEC)] [...].
+```
+
+Specify a list of names for the variables you want to create, followed
+by the desired [output
+format](../language/datasets/formats/index.md) in parentheses.
+Variable widths are implicitly derived from the specified output
+formats. The created variables will be initialized to spaces.
+
+If you want to create several variables with distinct output formats,
+you can either use two or more separate `STRING` commands, or you can
+specify further variable list and format specification pairs, each
+separated from the previous by a slash (`/`).
+
+The following example is one way to create three string variables; two
+of the variables have format `A24` and the third `A80`:
+
+```
+STRING firstname lastname (A24) / address (A80).
+```
+
+Here is another way to achieve the same result:
+
+```
+STRING firstname lastname (A24).
+STRING address (A80).
+```
+
+... and here is yet another way:
+
+```
+STRING firstname (A24).
+STRING lastname (A24).
+STRING address (A80).
+```
--- /dev/null
+# SUBTITLE
+
+```
+SUBTITLE 'SUBTITLE_STRING'.
+ or
+SUBTITLE SUBTITLE_STRING.
+```
+
+`SUBTITLE` provides a subtitle to a particular PSPP run. This
+subtitle appears at the top of each output page below the title, if
+headers are enabled on the output device.
+
+Specify a subtitle as a string in quotes. The alternate syntax that
+did not require quotes is now obsolete. If it is used then the subtitle
+is converted to all uppercase.
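+
+For example, the following sketch (the subtitle string is invented for
+illustration) sets a subtitle for subsequent output pages:
+
+```
+SUBTITLE 'Preliminary analysis'.
+```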
+
--- /dev/null
+# SYSFILE INFO
+
+```
+SYSFILE INFO FILE='FILE_NAME' [ENCODING='ENCODING'].
+```
+
+`SYSFILE INFO` reads an SPSS system file, SPSS/PC+ system file, or
+SPSS portable file, and displays the information in its dictionary.
+
+Specify a file name or file handle. `SYSFILE INFO` reads that file
+and displays information on its dictionary.
+
+PSPP automatically detects the encoding of string data in the file,
+when possible. The character encoding of old SPSS system files cannot
+always be guessed correctly, and SPSS/PC+ system files do not include
+any indication of their encoding. Specify the `ENCODING` subcommand
+with an IANA character set name as its string argument to override the
+default, or specify `ENCODING='DETECT'` to analyze and report possibly
+valid encodings for the system file. The `ENCODING` subcommand is a
+PSPP extension.
+
+`SYSFILE INFO` does not affect the current active dataset.
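+
+For example, the following sketch (assuming a system file named
+`physiology.sav` exists in the current directory) displays that file's
+dictionary without disturbing the active dataset:
+
+```
+SYSFILE INFO FILE='physiology.sav'.
+```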
+
--- /dev/null
+# T-TEST
+
+```
+T-TEST
+ /MISSING={ANALYSIS,LISTWISE} {EXCLUDE,INCLUDE}
+ /CRITERIA=CI(CONFIDENCE)
+
+
+(One Sample mode.)
+ TESTVAL=TEST_VALUE
+ /VARIABLES=VAR_LIST
+
+
+(Independent Samples mode.)
+ GROUPS=var(VALUE1 [, VALUE2])
+ /VARIABLES=VAR_LIST
+
+
+(Paired Samples mode.)
+ PAIRS=VAR_LIST [WITH VAR_LIST [(PAIRED)] ]
+```
+
+The `T-TEST` procedure outputs tables used in testing hypotheses
+about means. It operates in one of three modes:
+- [One Sample mode](#one-sample-mode).
+- [Independent Samples mode](#independent-samples-mode).
+- [Paired Samples mode](#paired-samples-mode).
+
+Each of these modes is described in more detail below. Two optional
+subcommands are common to all modes.
+
+The `/CRITERIA` subcommand tells PSPP the confidence level used in
+the tests. The default value is 0.95.
+
+The `MISSING` subcommand determines the handling of missing
+values. If `INCLUDE` is set, then user-missing values are included
+in the calculations, but system-missing values are not. If `EXCLUDE` is
+set, then user-missing values are excluded as well as system-missing
+values. This is the default.
+
+If `LISTWISE` is set, then the entire case is excluded from analysis
+whenever any variable specified in the `/VARIABLES`, `/PAIRS` or
+`/GROUPS` subcommands contains a missing value. If `ANALYSIS` is set,
+then missing values are excluded only in the analysis for which they
+would be needed. This is the default.
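+
+For example, this sketch (using the `weight` variable from the One
+Sample example below, with an illustrative 99% confidence level)
+combines both common subcommands with One Sample mode:
+
+```
+T-TEST TESTVAL = 76.8
+       /VARIABLES = weight
+       /CRITERIA = CI(0.99)
+       /MISSING = LISTWISE EXCLUDE.
+```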
+
+## One Sample Mode
+
+The `TESTVAL` subcommand invokes the One Sample mode. This mode is used
+to test a population mean against a hypothesized mean. The value given
+to the `TESTVAL` subcommand is the value against which you wish to test.
+In this mode, you must also use the `/VARIABLES` subcommand to tell PSPP
+which variables you wish to test.
+
+### Example
+
+A researcher wishes to know whether the weight of persons in a
+population is different from the national average. The samples are
+drawn from the population under investigation and recorded in the file
+`physiology.sav`. From the Department of Health, she knows that the
+national average weight of healthy adults is 76.8kg. Accordingly the
+`TESTVAL` is set to 76.8. The null hypothesis therefore is that the
+mean average weight of the population from which the sample was drawn is
+76.8kg.
+
+As previously noted, one sample in the dataset contains a weight
+value which is clearly incorrect, so it is excluded from the
+analysis using the `SELECT IF` command.
+
+```
+GET FILE='physiology.sav'.
+
+SELECT IF (weight > 0).
+
+T-TEST TESTVAL = 76.8
+ /VARIABLES = weight.
+```
+
+The output below shows that the mean of our sample differs from the
+test value by -1.40kg. However the significance is very high (0.610).
+So one cannot reject the null hypothesis, and must conclude there is
+not enough evidence to suggest that the mean weight of the persons in
+our population is different from 76.8kg.
+
+```
+ One─Sample Statistics
+┌───────────────────┬──┬─────┬──────────────┬─────────┐
+│ │ N│ Mean│Std. Deviation│S.E. Mean│
+├───────────────────┼──┼─────┼──────────────┼─────────┤
+│Weight in kilograms│39│75.40│ 17.08│ 2.73│
+└───────────────────┴──┴─────┴──────────────┴─────────┘
+
+ One─Sample Test
+┌──────────────┬──────────────────────────────────────────────────────────────┐
+│ │ Test Value = 76.8 │
+│ ├────┬──┬────────────┬────────────┬────────────────────────────┤
+│ │ │ │ │ │ 95% Confidence Interval of │
+│ │ │ │ │ │ the Difference │
+│ │ │ │ Sig. (2─ │ Mean ├──────────────┬─────────────┤
+│ │ t │df│ tailed) │ Difference │ Lower │ Upper │
+├──────────────┼────┼──┼────────────┼────────────┼──────────────┼─────────────┤
+│Weight in │─.51│38│ .610│ ─1.40│ ─6.94│ 4.13│
+│kilograms │ │ │ │ │ │ │
+└──────────────┴────┴──┴────────────┴────────────┴──────────────┴─────────────┘
+```
+
+## Independent Samples Mode
+
+The `GROUPS` subcommand invokes Independent Samples mode or 'Groups'
+mode. This mode is used to test whether two groups of values have the
+same population mean. In this mode, you must also use the `/VARIABLES`
+subcommand to tell PSPP the dependent variables you wish to test.
+
+The variable given in the `GROUPS` subcommand is the independent
+variable which determines to which group the samples belong. The values
+in parentheses are the specific values of the independent variable for
+each group. If the parentheses are omitted and no values are given, the
+default values of 1.0 and 2.0 are assumed.
+
+If the independent variable is numeric, it is acceptable to specify
+only one value inside the parentheses. If you do this, cases where the
+independent variable is greater than or equal to this value belong to
+the first group, and cases less than this value belong to the second
+group. When using this form of the `GROUPS` subcommand, missing values
+in the independent variable are excluded on a listwise basis, regardless
+of whether `/MISSING=LISTWISE` was specified.
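+
+For example, the following sketch (with an arbitrary cut-point of
+1700, in millimeters) places cases with `height` of 1700 or more in
+the first group and all shorter cases in the second:
+
+```
+T-TEST /VARIABLES = weight
+       /GROUPS = height(1700).
+```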
+
+### Example
+
+A researcher wishes to know whether within a population, adult males are
+taller than adult females. The samples are drawn from the population
+under investigation and recorded in the file `physiology.sav`.
+
+As previously noted, one sample in the dataset contains a height value
+which is clearly incorrect, so it is excluded from the analysis
+using the `SELECT IF` command.
+
+```
+get file='physiology.sav'.
+
+select if (height >= 200).
+
+t-test /variables = height
+ /groups = sex(0,1).
+```
+
+The null hypothesis is that both males and females are on average of
+equal height.
+
+From the output, shown below, one can clearly see that the _sample_
+mean height is greater for males than for females. However in order
+to see if this is a significant result, one must consult the T-Test
+table.
+
+The T-Test table contains two rows; one for use if the variance of
+the samples in each group may be safely assumed to be equal, and the
+second row if the variances in each group may not be safely assumed to
+be equal.
+
+In this case however, both rows show a 2-tailed significance less
+than 0.001 and one must therefore reject the null hypothesis and
+conclude that within the population the mean height of males and of
+females are unequal.
+
+```
+ Group Statistics
+┌────────────────────────────┬──┬───────┬──────────────┬─────────┐
+│ Group │ N│ Mean │Std. Deviation│S.E. Mean│
+├────────────────────────────┼──┼───────┼──────────────┼─────────┤
+│Height in millimeters Male │22│1796.49│ 49.71│ 10.60│
+│ Female│17│1610.77│ 25.43│ 6.17│
+└────────────────────────────┴──┴───────┴──────────────┴─────────┘
+
+ Independent Samples Test
+┌─────────────────────┬──────────┬──────────────────────────────────────────
+│ │ Levene's │
+│ │ Test for │
+│ │ Equality │
+│ │ of │
+│ │ Variances│ T─Test for Equality of Means
+│ ├────┬─────┼─────┬─────┬───────┬──────────┬──────────┐
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ Sig. │ │ │
+│ │ │ │ │ │ (2─ │ Mean │Std. Error│
+│ │ F │ Sig.│ t │ df │tailed)│Difference│Difference│
+├─────────────────────┼────┼─────┼─────┼─────┼───────┼──────────┼──────────┤
+│Height in Equal │ .97│ .331│14.02│37.00│ .000│ 185.72│ 13.24│
+│millimeters variances│ │ │ │ │ │ │ │
+│ assumed │ │ │ │ │ │ │ │
+│ Equal │ │ │15.15│32.71│ .000│ 185.72│ 12.26│
+│ variances│ │ │ │ │ │ │ │
+│ not │ │ │ │ │ │ │ │
+│ assumed │ │ │ │ │ │ │ │
+└─────────────────────┴────┴─────┴─────┴─────┴───────┴──────────┴──────────┘
+
+┌─────────────────────┬─────────────┐
+│ │ │
+│ │ │
+│ │ │
+│ │ │
+│ │ │
+│ ├─────────────┤
+│ │ 95% │
+│ │ Confidence │
+│ │ Interval of │
+│ │ the │
+│ │ Difference │
+│ ├──────┬──────┤
+│ │ Lower│ Upper│
+├─────────────────────┼──────┼──────┤
+│Height in Equal │158.88│212.55│
+│millimeters variances│ │ │
+│ assumed │ │ │
+│ Equal │160.76│210.67│
+│ variances│ │ │
+│ not │ │ │
+│ assumed │ │ │
+└─────────────────────┴──────┴──────┘
+```
+
+## Paired Samples Mode
+
+The `PAIRS` subcommand introduces Paired Samples mode. Use this mode
+when repeated measures have been taken from the same samples. If the
+`WITH` keyword is omitted, then tables for all combinations of variables
+given in the `PAIRS` subcommand are generated. If the `WITH` keyword is
+given, and the `(PAIRED)` keyword is also given, then the number of
+variables preceding `WITH` must be the same as the number following it.
+In this case, tables for each respective pair of variables are
+generated. In the event that the `WITH` keyword is given, but the
+`(PAIRED)` keyword is omitted, then tables for each combination of
+variable preceding `WITH` against variable following `WITH` are
+generated.
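+
+For example, given hypothetical variables `before` and `after` holding
+repeated measures on the same cases, either of the following sketches
+produces a test for that single pair:
+
+```
+T-TEST /PAIRS = before after.
+T-TEST /PAIRS = before WITH after (PAIRED).
+```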
+
--- /dev/null
+# TEMPORARY
+
+```
+TEMPORARY.
+```
+
+`TEMPORARY` is used to make the effects of transformations following
+its execution temporary. These transformations affect only the
+execution of the next procedure or procedure-like command. Their
+effects are not saved to the active dataset.
+
+The only specification on `TEMPORARY` is the command name.
+
+`TEMPORARY` may not appear within a `DO IF` or `LOOP` construct. It
+may appear only once between procedures and procedure-like commands.
+
+Scratch variables cannot be used following `TEMPORARY`.
+
+## Example
+
+In the syntax below, there are two `COMPUTE` transformations. One of
+them immediately follows a `TEMPORARY` command, and therefore affects
+only the next procedure, which in this case is the first
+`DESCRIPTIVES` command.
+
+```
+data list notable /x 1-2.
+begin data.
+ 2
+ 4
+10
+15
+20
+24
+end data.
+
+compute x=x/2.
+
+temporary.
+compute x=x+3.
+
+descriptives x.
+descriptives x.
+```
+
+The data read by the first `DESCRIPTIVES` procedure are 4, 5, 8, 10.5,
+13, 15. The data read by the second `DESCRIPTIVES` procedure are 1,
+2, 5, 7.5, 10, 12. This is because the second `COMPUTE`
+transformation has no effect on the second `DESCRIPTIVES` procedure.
+You can check these figures in the following output.
+
+```
+ Descriptive Statistics
+┌────────────────────┬─┬────┬───────┬───────┬───────┐
+│ │N│Mean│Std Dev│Minimum│Maximum│
+├────────────────────┼─┼────┼───────┼───────┼───────┤
+│x │6│9.25│ 4.38│ 4│ 15│
+│Valid N (listwise) │6│ │ │ │ │
+│Missing N (listwise)│0│ │ │ │ │
+└────────────────────┴─┴────┴───────┴───────┴───────┘
+
+ Descriptive Statistics
+┌────────────────────┬─┬────┬───────┬───────┬───────┐
+│ │N│Mean│Std Dev│Minimum│Maximum│
+├────────────────────┼─┼────┼───────┼───────┼───────┤
+│x │6│6.25│ 4.38│ 1│ 12│
+│Valid N (listwise) │6│ │ │ │ │
+│Missing N (listwise)│0│ │ │ │ │
+└────────────────────┴─┴────┴───────┴───────┴───────┘
+```
--- /dev/null
+# TITLE
+
+```
+TITLE 'TITLE_STRING'.
+ or
+TITLE TITLE_STRING.
+```
+
+`TITLE` provides a title to a particular PSPP run. This title
+appears at the top of each output page, if headers are enabled on the
+output device.
+
+Specify a title as a string in quotes. The alternate syntax that did
+not require quotes is now obsolete. If it is used then the title is
+converted to all uppercase.
+
--- /dev/null
+# UPDATE
+
+```
+UPDATE
+
+Per input file:
+ /FILE={*,'FILE_NAME'}
+ [/RENAME=(SRC_NAMES=TARGET_NAMES)...]
+ [/IN=VAR_NAME]
+ [/SORT]
+
+Once per command:
+ /BY VAR_LIST[({D|A})] [VAR_LIST[({D|A})]]...
+ [/DROP=VAR_LIST]
+ [/KEEP=VAR_LIST]
+ [/MAP]
+```
+
+`UPDATE` updates a "master file" by applying modifications from one
+or more "transaction files".
+
+`UPDATE` shares the bulk of its syntax with other PSPP commands for
+combining multiple data files (see [Common
+Syntax](index.md#common-syntax) for details).
+
+At least two `FILE` subcommands must be specified. The first `FILE`
+subcommand names the master file, and the rest name transaction files.
+Every input file must either be sorted on the variables named on the
+`BY` subcommand, or the `SORT` subcommand must be used just after the
+`FILE` subcommand for that input file.
+
+`UPDATE` uses the variables specified on the `BY` subcommand, which
+is required, to attempt to match each case in a transaction file with a
+case in the master file:
+
+- When a match is found, then the values of the variables present in
+ the transaction file replace those variables' values in the new
+  active file. If there are matching cases in more than one
+ transaction file, PSPP applies the replacements from the first
+ transaction file, then from the second transaction file, and so on.
+ Similarly, if a single transaction file has cases with duplicate
+ `BY` values, then those are applied in order to the master file.
+
+ When a variable in a transaction file has a missing value or when a
+ string variable's value is all blanks, that value is never used to
+ update the master file.
+
+- If a case in the master file has no matching case in any transaction
+ file, then it is copied unchanged to the output.
+
+- If a case in a transaction file has no matching case in the master
+ file, then it causes a new case to be added to the output,
+ initialized from the values in the transaction file.
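+
+For example, the following sketch (with invented file names and a
+hypothetical key variable `id`) applies one transaction file to a
+master file, sorting each input on the key first:
+
+```
+UPDATE /FILE='master.sav' /SORT
+       /FILE='changes.sav' /SORT
+       /BY id.
+```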
+
--- /dev/null
+# Utility Commands
+
+This chapter describes commands that don't fit in other categories.
+
+Most of these commands are not affected by commands like `IF` and
+`LOOP`: they take effect only once, unconditionally, at the time that
+they are encountered in the input.
+++ /dev/null
-# ADD DOCUMENT
-
-```
-ADD DOCUMENT
- 'line one' 'line two' ... 'last line' .
-```
-
-`ADD DOCUMENT` adds one or more lines of descriptive commentary to
-the active dataset. Documents added in this way are saved to system
-files. They can be viewed using `SYSFILE INFO` or `DISPLAY DOCUMENTS`.
-They can be removed from the active dataset with `DROP DOCUMENTS`.
-
-Each line of documentary text must be enclosed in quotation marks, and
-may not be more than 80 bytes long. See also
-[`DOCUMENT`](document.md).
-
+++ /dev/null
-# CACHE
-
-```
-CACHE.
-```
-
-This command is accepted, for compatibility, but it has no effect.
-
+++ /dev/null
-# CD
-
-```
-CD 'new directory' .
-```
-
-`CD` changes the current directory. The new directory becomes that
-specified by the command.
-
+++ /dev/null
-# COMMENT
-
-```
-Comment commands:
- COMMENT comment text ... .
- *comment text ... .
-
-Comments within a line of syntax:
- FREQUENCIES /VARIABLES=v0 v1 v2. /* All our categorical variables.
-```
-
-`COMMENT` is ignored. It is used to provide information to the
-author and other readers of the PSPP syntax file.
-
-`COMMENT` can extend over any number of lines. It ends at a dot at
-the end of a line or a blank line. The comment may contain any
-characters.
-
-PSPP also supports comments within a line of syntax, introduced with
-`/*`. These comments end at the first `*/` or at the end of the line,
-whichever comes first. A line that contains just this kind of comment
-is considered blank and ends the current command.
-
+++ /dev/null
-# DISPLAY DOCUMENTS
-
-```
-DISPLAY DOCUMENTS.
-```
-
-`DISPLAY DOCUMENTS` displays the documents in the active dataset.
-Each document is preceded by a line giving the time and date that it
-was added. See also [`DOCUMENT`](document.md).
-
+++ /dev/null
-# DISPLAY FILE LABEL
-
-```
-DISPLAY FILE LABEL.
-```
-
-`DISPLAY FILE LABEL` displays the file label contained in the active
-dataset, if any. See also [`FILE LABEL`](file-label.md).
-
-This command is a PSPP extension.
-
+++ /dev/null
-# DOCUMENT
-
-```
-DOCUMENT DOCUMENTARY_TEXT.
-```
-
-`DOCUMENT` adds one or more lines of descriptive commentary to the
-active dataset. Documents added in this way are saved to system
-files. They can be viewed using `SYSFILE INFO` or [`DISPLAY
-DOCUMENTS`](display-documents.md). They can be removed from the
-active dataset with [`DROP DOCUMENTS`](drop-documents.md).
-
-Specify the text of the document following the `DOCUMENT` keyword. It
-is interpreted literally—any quotes or other punctuation marks are
-included in the file. You can extend the documentary text over as
-many lines as necessary, including blank lines to separate paragraphs.
-Lines are truncated at 80 bytes. Don't forget to terminate the
-command with a dot at the end of a line. See also [ADD
-DOCUMENT](add-document.md).
-
+++ /dev/null
-# DROP DOCUMENTS
-
-```
-DROP DOCUMENTS.
-```
-
-`DROP DOCUMENTS` removes all documents from the active dataset. New
-documents can be added with [`DOCUMENT`](document.md).
-
-`DROP DOCUMENTS` changes only the active dataset. It does not modify
-any system files stored on disk.
-
+++ /dev/null
-# ECHO
-
-```
-ECHO 'arbitrary text' .
-```
-
-Use `ECHO` to write arbitrary text to the output stream. The text
-should be enclosed in quotation marks following the normal rules for
-[string tokens](../../language/basics/tokens.md#strings).
-
+++ /dev/null
-# ERASE
-
-```
-ERASE FILE "FILE_NAME".
-```
-
-`ERASE FILE` deletes a file from the local file system. The file's
-name must be quoted. This command cannot be used if the
-[`SAFER`](../utilities/set.md#safer) setting is active.
-
+++ /dev/null
-# EXECUTE
-
-```
-EXECUTE.
-```
-
-`EXECUTE` causes the active dataset to be read and all pending
-transformations to be executed.
-
+++ /dev/null
-# FILE LABEL
-
-```
-FILE LABEL file label.
-```
-
-`FILE LABEL` provides a title for the active dataset. This title is
-saved into system files and portable files that are created during
-this PSPP run.
-
-The file label should not be quoted. If quotes are included, they
-become part of the file label.
-
+++ /dev/null
-# FINISH
-
-```
-FINISH.
-```
-
-`FINISH` terminates the current PSPP session and returns control to
-the operating system.
-
+++ /dev/null
-# HOST
-
-In the syntax below, the square brackets must be included in the command
-syntax and do not indicate that their contents are optional.
-
-```
-HOST COMMAND=['COMMAND'...]
- TIMELIMIT=SECS.
-```
-
-`HOST` executes one or more commands, each provided as a string in
-the required `COMMAND` subcommand, in the shell of the underlying
-operating system. PSPP runs each command in a separate shell process
-and waits for it to finish before running the next one. If a command
-fails (with a nonzero exit status, or because it is killed by a signal),
-then PSPP does not run any remaining commands.
-
-PSPP provides `/dev/null` as the shell's standard input. If a
-process needs to read from stdin, redirect from a file or device, or use
-a pipe.
-
-PSPP displays the shell's standard output and standard error as PSPP
-output. Redirect to a file or `/dev/null` or another device if this is
-not desired.
-
-By default, PSPP waits as long as necessary for the series of
-commands to complete. Use the optional `TIMELIMIT` subcommand to limit
-the execution time to the specified number of seconds.
-
-PSPP built for mingw does not support all the features of `HOST`.
-
-PSPP rejects this command if the [`SAFER`](../utilities/set.md#safer)
-setting is active.
-
-## Example
-
-The following example runs `rsync` to copy a file from a remote
-server to the local file `data.txt`, writing `rsync`'s own output to
-`rsync-log.txt`. PSPP displays the command's error output, if any. If
-`rsync` needs to prompt the user (e.g. to obtain a password), the
-command fails. Only if the `rsync` command succeeds does PSPP run the
-`sha512sum` command.
-
-```
-HOST COMMAND=['rsync remote:data.txt data.txt > rsync-log.txt'
-              'sha512sum -c data.txt.sha512sum'].
-```
-
+++ /dev/null
-# INCLUDE
-
-```
-INCLUDE [FILE=]'FILE_NAME' [ENCODING='ENCODING'].
-```
-
-`INCLUDE` causes the PSPP command processor to read an additional
-command file as if it were included bodily in the current command file.
-If errors are encountered in the included file, then command processing
-stops and no more commands are processed. Include files may be nested
-to any depth, up to the limit of available memory.
-
-The [`INSERT`](insert.md) command is a more flexible alternative to
-`INCLUDE`. An `INCLUDE` command acts the same as `INSERT` with
-`ERROR=STOP CD=NO SYNTAX=BATCH` specified.
-
-The optional `ENCODING` subcommand has the same meaning as with
-`INSERT`.
-
+++ /dev/null
-# Utility Commands
-
-This chapter describes commands that don't fit in other categories.
-
-Most of these commands are not affected by commands like `IF` and
-`LOOP`: they take effect only once, unconditionally, at the time that
-they are encountered in the input.
+++ /dev/null
-# INSERT
-
-```
-INSERT [FILE=]'FILE_NAME'
- [CD={NO,YES}]
- [ERROR={CONTINUE,STOP}]
- [SYNTAX={BATCH,INTERACTIVE}]
- [ENCODING={LOCALE, 'CHARSET_NAME'}].
-```
-
-`INSERT` is similar to [`INCLUDE`](include.md) but more flexible. It
-causes the command processor to read a file as if it were embedded in
-the current command file.
-
-If `CD=YES` is specified, then before including the file, the current
-directory becomes the directory of the included file. The default
-setting is `CD=NO`. This directory remains current until it is
-changed explicitly (with the `CD` command, or a subsequent `INSERT`
-command with the `CD=YES` option). It does not revert to its original
-setting even after the included file is finished processing.
-
-If `ERROR=STOP` is specified, errors encountered in the inserted file
-cause processing to cease immediately. Otherwise processing continues
-at the next command. The default setting is `ERROR=CONTINUE`.
-
-If `SYNTAX=INTERACTIVE` is specified then the syntax contained in the
-included file must conform to [interactive syntax
-conventions](../../language/basics/syntax-variants.md). The default
-setting is `SYNTAX=BATCH`.
-
-`ENCODING` optionally specifies the character set used by the
-included file. Its argument, which is not case-sensitive, must be in
-one of the following forms:
-
-* `LOCALE`
- The encoding used by the system locale, or as overridden by [`SET
- LOCALE`](../utilities/set.md#locale). On GNU/Linux and other
- Unix-like systems, environment variables, e.g. `LANG` or `LC_ALL`,
- determine the system locale.
-
-* `'CHARSET_NAME'`
- An [IANA character set
- name](http://www.iana.org/assignments/character-sets). Some
- examples are `ASCII` (United States), `ISO-8859-1` (western Europe),
- `EUC-JP` (Japan), and `windows-1252` (Windows). Not all systems
- support all character sets.
-
-* `Auto,ENCODING`
- Automatically detects whether a syntax file is encoded in a Unicode
- encoding such as UTF-8, UTF-16, or UTF-32. If it is not, then PSPP
- generally assumes that the file is encoded in `ENCODING` (an IANA
- character set name). However, if `ENCODING` is UTF-8, and the
- syntax file is not valid UTF-8, PSPP instead assumes that the file
- is encoded in `windows-1252`.
-
- For best results, `ENCODING` should be an ASCII-compatible encoding
- (the most common locale encodings are all ASCII-compatible),
- because encodings that are not ASCII compatible cannot be
- automatically distinguished from UTF-8.
-
-* `Auto`
- `Auto,Locale`
- Automatic detection, as above, with the default encoding taken from
- the system locale or the setting on `SET LOCALE`.
-
-When `ENCODING` is not specified, the default is taken from the
-`--syntax-encoding` command option, if it was specified, and otherwise
-it is `Auto`.
-
+++ /dev/null
-# OUTPUT
-
-In the syntax below, the characters `[` and `]` are literals. They
-must appear in the syntax to be interpreted:
-
-```
-OUTPUT MODIFY
- /SELECT TABLES
- /TABLECELLS SELECT = [ CLASS... ]
- FORMAT = FMT_SPEC.
-```
-
-`OUTPUT` changes the appearance of the tables in which results are
-printed. In particular, it can be used to set the format and precision
-to which results are displayed.
-
-After running this command, the default table appearance parameters
-will have been modified and each new output table generated uses the new
-parameters.
-
-Following `/TABLECELLS SELECT =` a list of cell classes must appear,
-enclosed in square brackets. This list determines which classes of
-values are selected for modification. Each class can be:
-
-* `RESIDUAL`: Residual values. Default: `F40.2`.
-
-* `CORRELATION`: Correlations. Default: `F40.3`.
-
-* `PERCENT`: Percentages. Default: `PCT40.1`.
-
-* `SIGNIFICANCE`: Significance of tests (p-values). Default: `F40.3`.
-
-* `COUNT`: Counts or sums of weights. For a weighted data set, the
- default is the weight variable's print format. For an unweighted
- data set, the default is `F40.0`.
-
-For most other numeric values that appear in tables, [`SET
-FORMAT`](set.md#format) may be used to specify the format.
-
-`FMT_SPEC` must be a valid [output
-format](../../language/datasets/formats/index.md). Not all possible
-formats are meaningful for all classes.
-
+++ /dev/null
-# PERMISSIONS
-
-```
-PERMISSIONS
- FILE='FILE_NAME'
- /PERMISSIONS = {READONLY,WRITEABLE}.
-```
-
-`PERMISSIONS` changes the permissions of a file. There is one
-mandatory subcommand which specifies the permissions to which the file
-should be changed. If you set a file's permission to `READONLY`, then
-the file will become unwritable either by you or anyone else on the
-system. If you set the permission to `WRITEABLE`, then the file becomes
-writeable by you; the permissions afforded to others are unchanged.
-This command cannot be used if the [`SAFER`](../utilities/set.md#safer) setting is
-active.
-
+++ /dev/null
-# PRESERVE…RESTORE
-
-```
-PRESERVE.
-...
-RESTORE.
-```
-
-`PRESERVE` saves all of the settings that [`SET`](set.md) can adjust.
-A later `RESTORE` command restores those settings.
-
-`PRESERVE` can be nested up to five levels deep.
-
+++ /dev/null
-# SET
-
-```
-SET
-
-(data input)
- /BLANKS={SYSMIS,'.',number}
- /DECIMAL={DOT,COMMA}
- /FORMAT=FMT_SPEC
- /EPOCH={AUTOMATIC,YEAR}
- /RIB={NATIVE,MSBFIRST,LSBFIRST}
-
-(interaction)
- /MXERRS=MAX_ERRS
- /MXWARNS=MAX_WARNINGS
- /WORKSPACE=WORKSPACE_SIZE
-
-(syntax execution)
- /LOCALE='LOCALE'
- /MXLOOPS=MAX_LOOPS
- /SEED={RANDOM,SEED_VALUE}
- /UNDEFINED={WARN,NOWARN}
- /FUZZBITS=FUZZBITS
- /SCALEMIN=COUNT
-
-(data output)
- /CC{A,B,C,D,E}={'NPRE,PRE,SUF,NSUF','NPRE.PRE.SUF.NSUF'}
- /DECIMAL={DOT,COMMA}
- /FORMAT=FMT_SPEC
- /LEADZERO={ON,OFF}
- /MDISPLAY={TEXT,TABLES}
- /SMALL=NUMBER
- /WIB={NATIVE,MSBFIRST,LSBFIRST}
-
-(output routing)
- /ERRORS={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
- /MESSAGES={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
- /PRINTBACK={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
- /RESULTS={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
-
-(output driver options)
- /HEADERS={NO,YES,BLANK}
- /LENGTH={NONE,N_LINES}
- /WIDTH={NARROW,WIDTH,N_CHARACTERS}
- /TNUMBERS={VALUES,LABELS,BOTH}
- /TVARS={NAMES,LABELS,BOTH}
- /TLOOK={NONE,FILE}
-
-(logging)
- /JOURNAL={ON,OFF} ['FILE_NAME']
-
-(system files)
- /SCOMPRESSION={ON,OFF}
-
-(miscellaneous)
- /SAFER=ON
- /LOCALE='STRING'
-
-(macros)
- /MEXPAND={ON,OFF}
- /MPRINT={ON,OFF}
- /MITERATE=NUMBER
- /MNEST=NUMBER
-
-(settings not yet implemented, but accepted and ignored)
- /BASETEXTDIRECTION={AUTOMATIC,RIGHTTOLEFT,LEFTTORIGHT}
- /BLOCK='C'
- /BOX={'XXX','XXXXXXXXXXX'}
- /CACHE={ON,OFF}
- /CELLSBREAK=NUMBER
- /COMPRESSION={ON,OFF}
- /CMPTRANS={ON,OFF}
- /HEADER={NO,YES,BLANK}
-```
-
-`SET` allows the user to adjust several parameters relating to PSPP's
-execution. Since this command has many subcommands, they are examined
-in groups below.
-
-For subcommands that take boolean values, `ON` and `YES` are
-synonymous, as are `OFF` and `NO`, when used as subcommand values.
-
-The data input subcommands affect the way that data is read from data
-files. The data input subcommands are
-
-* `BLANKS`
-  This is the value assigned to a data item that is empty or
- contains only white space. An argument of SYSMIS or '.' causes
- the system-missing value to be assigned to null items. This is the
- default. Any real value may be assigned.
-
-* <a name="decimal">`DECIMAL`</a>
- This value may be set to `DOT` or `COMMA`. Setting it to `DOT`
- causes the decimal point character to be `.` and the grouping
- character to be `,`. Setting it to `COMMA` causes the decimal point
- character to be `,` and the grouping character to be `.`. If the
- setting is `COMMA`, then `,` is not treated as a field separator in
- the [`DATA LIST`](../data-io/data-list.md) command. The default
- value is determined from the system locale.
-
-* <a name="format">`FORMAT`</a>
- Changes the default numeric [input/output
- format](../../language/datasets/formats/index.md). The default is
- initially `F8.2`.
-
-* <a name="epoch">`EPOCH`</a>
- Specifies the range of years used when a 2-digit year is read from a
- data file or used in a [date construction
- expression](../../language/expressions/functions/time-and-date.md#constructing-dates).
- If a 4-digit year is specified for the epoch, then 2-digit years are
- interpreted starting from that year, known as the epoch. If
- `AUTOMATIC` (the default) is specified, then the epoch begins 69
- years before the current date.
-
-* <a name="rib">`RIB`</a>
- PSPP extension to set the byte ordering (endianness) used for
- reading data in [`IB` or `PIB`
- format](../../language/datasets/formats/binary-and-hex.md#ib-and-pib-formats). In
- `MSBFIRST` ordering, the most-significant byte appears at the left
- end of an `IB` or `PIB` field. In `LSBFIRST` ordering, the
- least-significant byte appears at the left end. `NATIVE`, the
- default, is equivalent to `MSBFIRST` or `LSBFIRST` depending on the
- native format of the machine running PSPP.
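-
-For instance, a brief sketch of these data input settings in use (the
-values shown are illustrative, not defaults):
-
-```
-SET BLANKS=SYSMIS.
-SET DECIMAL=DOT.
-SET EPOCH=1900.
-```
-
-With `EPOCH` set to 1900, a 2-digit year such as 45 read from a data
-file is interpreted as 1945.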
-
-Interaction subcommands affect the way that PSPP interacts with an
-online user. The interaction subcommands are
-
-* `MXERRS`
- The maximum number of errors before PSPP halts processing of the
- current command file. The default is 50.
-
-* `MXWARNS`
- The maximum combined number of warnings and errors before PSPP
- halts processing the current command file. The special value of
- zero means that all warning situations should be ignored. No
- warnings are issued, except a single initial warning advising you
- that warnings will not be given. The default value is 100.
-
-Syntax execution subcommands control the way that PSPP commands
-execute. The syntax execution subcommands are
-
-* `LOCALE`
- Overrides the system locale for the purpose of reading and writing
- syntax and data files. The argument should be a locale name in the
- general form `LANGUAGE_COUNTRY.ENCODING`, where `LANGUAGE` and
- `COUNTRY` are 2-character language and country abbreviations,
- respectively, and `ENCODING` is an [IANA character set
- name](http://www.iana.org/assignments/character-sets). Example
- locales are `en_US.UTF-8` (UTF-8 encoded English as spoken in the
- United States) and `ja_JP.EUC-JP` (EUC-JP encoded Japanese as spoken
- in Japan).
-
-* <a name="mxloops">`MXLOOPS`</a>
- The maximum number of iterations for an uncontrolled
- [`LOOP`](../../commands/control/loop.md), and for any [loop in the
- matrix
- language](../../commands/matrix/matrix.md#the-loop-and-break-commands).
- The default `MXLOOPS` is 40.
-
-* <a name="seed">`SEED`</a>
- The initial pseudo-random number seed. Set it to a real number or
- to `RANDOM`, to obtain an initial seed from the current time of day.
-
-* `UNDEFINED`
- Currently not used.
-
-* <a name="fuzzbits">`FUZZBITS`</a>
- The maximum number of bits of errors in the least-significant places
- to accept for rounding up a value that is almost halfway between two
- possibilities for rounding with the
- [`RND`](../../language/expressions/functions/mathematical.md#rnd)
- function. The default `FUZZBITS` is 6.
-
-* <a name="scalemin">`SCALEMIN`</a>
- The minimum number of distinct valid values for PSPP to assume that
- a variable has a scale [measurement
- level](../../language/datasets/variables.md#measurement-level).
-
-* `WORKSPACE`
- The maximum amount of memory (in kilobytes) that PSPP uses to store
- data being processed. If memory in excess of the workspace size is
- required, then PSPP starts to use temporary files to store the
- data. Setting a higher value means that procedures run faster, but
- may cause other applications to run slower. On platforms without
- virtual memory management, setting a very large workspace may cause
- PSPP to abort.
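-
-A brief illustration of these execution settings (the values are
-chosen for illustration only):
-
-```
-SET SEED=54321.
-SET MXLOOPS=200.
-SET WORKSPACE=131072.
-```
-
-Here the workspace is raised to 131072 kB (128 MB) and uncontrolled
-loops may iterate up to 200 times.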
-
-Data output subcommands affect the format of output data. These
-subcommands are
-
-* `CCA`
- `CCB`
- `CCC`
- `CCD`
- `CCE`
- Set up [custom currency
- formats](../../language/datasets/formats/custom-currency.md).
-
-* `DECIMAL`
- The default `DOT` setting causes the decimal point character to be
- `.`. A setting of `COMMA` causes the decimal point character to be
- `,`.
-
-* `FORMAT`
- Allows the default numeric [input/output
- format](../../language/datasets/formats/index.md) to be specified. The
- default is `F8.2`.
-
-* <a name="leadzero">`LEADZERO`</a>
- Controls whether numbers with magnitude less than one are displayed
- with a zero before the decimal point. For example, with `SET
- LEADZERO=OFF`, which is the default, one-half is shown as 0.5, and
- with `SET LEADZERO=ON`, it is shown as .5. This setting affects
- only the `F`, `COMMA`, and `DOT` formats.
-
-* <a name="mdisplay">`MDISPLAY`</a>
- Controls how the
- [`PRINT`](../../commands/matrix/matrix.md#the-print-command) command
- within [`MATRIX`...`END MATRIX`](../../commands/matrix/matrix.md)
- outputs matrices. With the default `TEXT`, `PRINT` outputs matrices
- as text. Change this setting to `TABLES` to instead output matrices
- as pivot tables.
-
-* `SMALL`
- This controls how PSPP formats small numbers in pivot tables, in
- cases where PSPP does not otherwise have a well-defined format for
- the numbers. When such a number has a magnitude less than the
- value set here, PSPP formats the number in scientific notation;
- otherwise, it formats it in standard notation. The default is
- 0.0001. Set a value of 0 to disable scientific notation.
-
-* <a name="wib">`WIB`</a>
- PSPP extension to set the byte ordering (endianness) used for
- writing data in [`IB` or `PIB`
- format](../../language/datasets/formats/binary-and-hex.md#ib-and-pib-formats).
- In `MSBFIRST` ordering, the most-significant byte appears at the
- left end of an `IB` or `PIB` field. In `LSBFIRST` ordering, the
- least-significant byte appears at the left end. `NATIVE`, the
- default, is equivalent to `MSBFIRST` or `LSBFIRST` depending on the
- native format of the machine running PSPP.
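-
-As an illustration, the following defines the `CCA` custom currency
-format and applies it to a hypothetical variable `cost` (the format
-string is an example, not a default):
-
-```
-SET CCA='-,$,,'.
-SET LEADZERO=ON.
-FORMATS cost (CCA10.2).
-```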
-
-In the PSPP text-based interface, the output routing subcommands
-affect where output is sent. The following values are allowed for each
-of these subcommands:
-
-* `OFF`
- `NONE`
- Discard this kind of output.
-
-* `TERMINAL`
- Write this output to the terminal, but not to listing files and
- other output devices.
-
-* `LISTING`
- Write this output to listing files and other output devices, but
- not to the terminal.
-
-* `ON`
- `BOTH`
- Write this type of output to all output devices.
-
-These output routing subcommands are:
-
-* `ERRORS`
- Applies to error and warning messages. The default is `BOTH`.
-
-* `MESSAGES`
- Applies to notes. The default is `BOTH`.
-
-* `PRINTBACK`
- Determines whether the syntax used for input is printed back as
- part of the output. The default is `NONE`.
-
-* `RESULTS`
- Applies to everything not in one of the above categories, such as
- the results of statistical procedures. The default is `BOTH`.
-
-These subcommands have no effect on output in the PSPP GUI
-environment.
-
-Output driver option subcommands affect output drivers' settings.
-These subcommands are:
-
-* `HEADERS`
-
-* `LENGTH`
-
-* <a name="width">`WIDTH`</a>
-
-* `TNUMBERS`
- The `TNUMBERS` option sets the way in which values are displayed in
- output tables. The valid settings are `VALUES`, `LABELS` and
- `BOTH`. If `TNUMBERS` is set to `VALUES`, then all values are
- displayed with their literal value (which for a numeric value is a
- number and for a string value an alphanumeric string). If
- `TNUMBERS` is set to `LABELS`, then values are displayed using their
- assigned [value labels](../../commands/variables/value-labels.md), if
- any. If the value has no label, then the literal value is used for
- display. If `TNUMBERS` is set to `BOTH`, then values are displayed
- with both their label (if any) and their literal value in
- parentheses.
-
-* <a name="tvars">`TVARS`</a>
- The `TVARS` option sets the way in which variables are displayed in
- output tables. The valid settings are `NAMES`, `LABELS` and `BOTH`.
- If `TVARS` is set to `NAMES`, then all variables are displayed using
- their names. If `TVARS` is set to `LABELS`, then variables are
- displayed using their [variable
- label](../../commands/variables/variable-labels.md), if one has been
- set. If no label has been set, then the name is used. If `TVARS`
- is set to `BOTH`, then variables are displayed with both their label
- (if any) and their name in parentheses.
-
-* <a name="tlook">`TLOOK`</a>
- The `TLOOK` option sets the style used for subsequent table output.
- Specifying `NONE` makes PSPP use the default built-in style.
- Otherwise, specifying `FILE` makes PSPP search for an `.stt` or
- `.tlo` file in the same way as specifying `--table-look=FILE` on
- the PSPP command line.
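-
-For example, to label values and variables in output tables and to
-load a hypothetical table style file:
-
-```
-SET TNUMBERS=LABELS.
-SET TVARS=BOTH.
-SET TLOOK='my-style.stt'.
-```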
-
-Logging subcommands affect logging of commands executed to external
-files. These subcommands are
-
-* `JOURNAL`
- `LOG`
- These subcommands, which are synonyms, control the journal. The
- default is `ON`, which causes commands entered interactively to be
- written to the journal file. Commands included from syntax files
- that are included interactively and error messages printed by PSPP
- are also written to the journal file, prefixed by `>`. `OFF`
- disables use of the journal.
-
- The journal is named `pspp.jnl` by default. A different name may
- be specified.
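-
- For example, to send the journal to a different file (the name
- `session.jnl` is illustrative):
-
- ```
- SET JOURNAL='session.jnl'.
- ```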
-
-System file subcommands affect the default format of system files
-produced by PSPP. These subcommands are
-
-* <a name="scompression">`SCOMPRESSION`</a>
- Whether system files created by `SAVE` or `XSAVE` are compressed by
- default. The default is `ON`.
-
-Security subcommands affect the operations that commands are allowed
-to perform. The security subcommands are
-
-* <a name="safer">`SAFER`</a>
- Setting this option disables the following operations:
-
- - The `ERASE` command.
- - The `HOST` command.
- - The `PERMISSIONS` command.
- - Pipes (file names beginning or ending with `|`).
-
- Be aware that this setting does not guarantee safety (commands can
- still overwrite files, for instance) but it is an improvement.
- When set, this setting cannot be reset during the same session, for
- obvious security reasons.
-
-* <a name="locale">`LOCALE`</a>
- This item is used to set the default character encoding. The
- encoding may be specified either as an [IANA encoding name or
- alias](http://www.iana.org/assignments/character-sets), or as a
- locale name. If given as a locale name, only the character encoding
- of the locale is relevant.
-
- System files written by PSPP use this encoding. System files read
- by PSPP, for which the encoding is unknown, are interpreted using
- this encoding.
-
- The full list of valid encodings and locale names/alias are
- operating system dependent. The following are all examples of
- acceptable syntax on common GNU/Linux systems.
-
- ```
- SET LOCALE='iso-8859-1'.
-
- SET LOCALE='ru_RU.cp1251'.
-
- SET LOCALE='japanese'.
- ```
-
- Contrary to intuition, this command does not affect any aspect of
- the system's locale.
-
-The following subcommands affect the interpretation of macros. For
-more information, see [Macro
-Settings](../control/define.md#macro-settings).
-
-* <a name="mexpand">`MEXPAND`</a>
- Controls whether macros are expanded. The default is `ON`.
-
-* <a name="mprint">`MPRINT`</a>
- Controls whether the expansion of macros is included in output.
- This is separate from whether command syntax in general is included
- in output. The default is `OFF`.
-
-* <a name="miterate">`MITERATE`</a>
- Limits the number of iterations executed in
- [`!DO`](../../commands/control/define.md#macro-loops) loops within
- macros. This does not affect other language constructs such as
- [`LOOP`…`END LOOP`](../../commands/control/loop.md). This must be set to a
- positive integer. The default is 1000.
-
-* <a name="mnest">`MNEST`</a>
- Limits the number of levels of nested macro expansions. This must
- be set to a positive integer. The default is 50.
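-
-For instance, to include macro expansions in output and raise the
-macro loop limit:
-
-```
-SET MPRINT=ON.
-SET MITERATE=2000.
-```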
-
-The following subcommands are not yet implemented, but PSPP accepts
-them and ignores the settings:
-
-* `BASETEXTDIRECTION`
-* `BLOCK`
-* `BOX`
-* `CACHE`
-* `CELLSBREAK`
-* `COMPRESSION`
-* `CMPTRANS`
-* `HEADER`
-
+++ /dev/null
-# SHOW
-
-```
-SHOW
- [ALL]
- [BLANKS]
- [CC]
- [CCA]
- [CCB]
- [CCC]
- [CCD]
- [CCE]
- [COPYING]
- [DECIMAL]
- [DIRECTORY]
- [ENVIRONMENT]
- [FORMAT]
- [FUZZBITS]
- [LENGTH]
- [MEXPAND]
- [MPRINT]
- [MITERATE]
- [MNEST]
- [MXERRS]
- [MXLOOPS]
- [MXWARNS]
- [N]
- [SCOMPRESSION]
- [SYSTEM]
- [TEMPDIR]
- [UNDEFINED]
- [VERSION]
- [WARRANTY]
- [WEIGHT]
- [WIDTH]
-```
-
-`SHOW` displays PSPP's settings and status. Parameters that can be
-changed using [`SET`](set.md) can be examined with `SHOW` using the
-subcommand of the same name. `SHOW` supports the following
-additional subcommands:
-
-* `ALL`
- Show all settings.
-* `CC`
- Show all custom currency settings (`CCA` through `CCE`).
-* `DIRECTORY`
- Shows the current working directory.
-* `ENVIRONMENT`
- Shows the operating system details.
-* `N`
- Reports the number of cases in the active dataset. The reported
- number is not weighted. If no dataset is defined, then `Unknown`
- is reported.
-* `SYSTEM`
- Shows information about how PSPP was built. This information is
- useful in bug reports.
-* `TEMPDIR`
- Shows the path of the directory where temporary files are stored.
-* `VERSION`
- Shows the version of this installation of PSPP.
-* `WARRANTY`
- Show details of the lack of warranty for PSPP.
-* `COPYING` or `LICENSE`
- Display the terms of [PSPP's copyright licence](../../license.md).
-
-Specifying `SHOW` without any subcommands is equivalent to `SHOW
-ALL`.
-
+++ /dev/null
-# SUBTITLE
-
-```
-SUBTITLE 'SUBTITLE_STRING'.
- or
-SUBTITLE SUBTITLE_STRING.
-```
-
-`SUBTITLE` provides a subtitle to a particular PSPP run. This
-subtitle appears at the top of each output page below the title, if
-headers are enabled on the output device.
-
-Specify a subtitle as a string in quotes. The alternate syntax that
-did not require quotes is now obsolete. If it is used then the subtitle
-is converted to all uppercase.
-
+++ /dev/null
-# TITLE
-
-```
-TITLE 'TITLE_STRING'.
- or
-TITLE TITLE_STRING.
-```
-
-`TITLE` provides a title to a particular PSPP run. This title
-appears at the top of each output page, if headers are enabled on the
-output device.
-
-Specify a title as a string in quotes. The alternate syntax that did
-not require quotes is now obsolete. If it is used then the title is
-converted to all uppercase.
-
--- /dev/null
+# VALUE LABELS
+
+The values of a variable can be associated with explanatory text
+strings. In this way, a short value can stand for a longer, more
+descriptive label.
+
+Both numeric and string variables can be given labels. For string
+variables, the values are case-sensitive, so that, for example, a
+capitalized value and its lowercase variant would have to be labeled
+separately if both are present in the data.
+
+```
+VALUE LABELS
+ /VAR_LIST VALUE 'LABEL' [VALUE 'LABEL']...
+```
+
+`VALUE LABELS` allows values of variables to be associated with
+labels.
+
+To set up value labels for one or more variables, specify the variable
+names after a slash (`/`), followed by a list of values and their
+associated labels, separated by spaces.
+
+Value labels in output are normally broken into lines automatically.
+Put `\n` in a label string to force a line break at that point. The
+label may still be broken into lines at additional points.
+
+Before `VALUE LABELS` is executed, any existing value labels are
+cleared from the variables specified. Use [`ADD VALUE
+LABELS`](add-value-labels.md) to add value labels without clearing
+those already present.
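+
+As an illustration, the following assigns labels to two hypothetical
+variables, one numeric and one string:
+
+```
+VALUE LABELS
+  /sex 0 'Male' 1 'Female'
+  /grade 'a' 'Excellent' 'b' 'Good'.
+```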
+
--- /dev/null
+# VARIABLE ALIGNMENT
+
+`VARIABLE ALIGNMENT` sets the alignment of variables for display
+editing purposes. It does not affect the display of variables in PSPP
+output.
+
+```
+VARIABLE ALIGNMENT
+ VAR_LIST ( LEFT | RIGHT | CENTER )
+ [ /VAR_LIST ( LEFT | RIGHT | CENTER ) ]
+ .
+ .
+ .
+ [ /VAR_LIST ( LEFT | RIGHT | CENTER ) ]
+
+```
--- /dev/null
+# VARIABLE ATTRIBUTE
+
+`VARIABLE ATTRIBUTE` adds, modifies, or removes user-defined attributes
+associated with variables in the active dataset. Custom variable
+attributes are not interpreted by PSPP, but they are saved as part of
+system files and may be used by other software that reads them.
+
+```
+VARIABLE ATTRIBUTE
+ VARIABLES=VAR_LIST
+ ATTRIBUTE=NAME('VALUE') [NAME('VALUE')]...
+ ATTRIBUTE=NAME[INDEX]('VALUE') [NAME[INDEX]('VALUE')]...
+ DELETE=NAME [NAME]...
+ DELETE=NAME[INDEX] [NAME[INDEX]]...
+```
+
+The required `VARIABLES` subcommand must come first. Specify the
+variables to which the following `ATTRIBUTE` or `DELETE` subcommand
+should apply.
+
+Use the `ATTRIBUTE` subcommand to add or modify custom variable
+attributes. Specify the name of the attribute as an
+[identifier](../language/basics/tokens.md), followed by the desired
+value, in parentheses, as a quoted string. The specified attributes
+are then added or modified in the variables specified on `VARIABLES`.
+Attribute names that begin with `$` are reserved for PSPP's internal
+use, and attribute names that begin with `@` or `$@` are not displayed
+by most PSPP commands that display other attributes. Other attribute
+names are not treated specially.
+
+Attributes may also be organized into arrays. To assign to an array
+element, add an integer array index enclosed in square brackets (`[`
+and `]`) between the attribute name and value. Array indexes start at
+1, not 0. An attribute array that has a single element (number 1) is
+not distinguished from a non-array attribute.
+
+Use the `DELETE` subcommand to delete an attribute from the variables
+specified on `VARIABLES`. Specify an attribute name by itself to
+delete an entire attribute, including all array elements for attribute
+arrays. Specify an attribute name followed by an array index in
+square brackets to delete a single element of an attribute array. In
+the latter case, all the array elements numbered higher than the
+deleted element are shifted down, filling the vacated position.
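+
+A brief sketch, using a hypothetical variable `income`:
+
+```
+VARIABLE ATTRIBUTE
+  VARIABLES=income
+  ATTRIBUTE=Source('2020 survey') Note[1]('Self-reported').
+VARIABLE ATTRIBUTE
+  VARIABLES=income
+  DELETE=Note[1].
+```
+
+The first command adds a `Source` attribute and the first element of a
+`Note` attribute array; the second deletes that array element again.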
+
+To associate custom attributes with the entire active dataset, instead
+of with particular variables, use [`DATAFILE
+ATTRIBUTE`](datafile-attribute.md) instead.
+
+`VARIABLE ATTRIBUTE` takes effect immediately. It is not affected by
+conditional and looping structures such as `DO IF` or `LOOP`.
+
--- /dev/null
+# VARIABLE LABELS
+
+Each variable can have a "label" to supplement its name. Whereas a
+variable name is a concise, easy-to-type mnemonic for the variable, a
+label may be longer and more descriptive.
+
+```
+VARIABLE LABELS
+ VARIABLE 'LABEL'
+ [VARIABLE 'LABEL']...
+```
+
+`VARIABLE LABELS` associates explanatory names with variables. This
+name, called a "variable label", is displayed by statistical
+procedures.
+
+Specify each variable followed by its label as a quoted string.
+Variable-label pairs may be separated by an optional slash `/`.
+
+If a listed variable already has a label, the new one replaces it.
+Specifying an empty string as the label, e.g. `''`, removes a label.
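+
+For example, with two hypothetical variables:
+
+```
+VARIABLE LABELS
+  tempc 'Temperature (degrees Celsius)'
+  /pulse 'Resting pulse (beats per minute)'.
+```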
+
--- /dev/null
+# VARIABLE LEVEL
+
+```
+VARIABLE LEVEL variables ({SCALE | NOMINAL | ORDINAL})...
+```
+
+`VARIABLE LEVEL` sets the [measurement
+level](../language/datasets/variables.md) of the listed variables.
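+
+For example (the variable names are illustrative):
+
+```
+VARIABLE LEVEL income (SCALE) satisfaction (ORDINAL) region (NOMINAL).
+```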
+
--- /dev/null
+# VARIABLE ROLE
+
+```
+VARIABLE ROLE
+ /ROLE VAR_LIST
+ [/ROLE VAR_LIST]...
+```
+
+`VARIABLE ROLE` sets the intended role of a variable for use in dialog
+boxes in graphical user interfaces. Each `ROLE` specifies one of the
+following roles for the variables that follow it:
+
+* `INPUT`
+ An input variable, such as an independent variable.
+
+* `TARGET`
+ An output variable, such as a dependent variable.
+
+* `BOTH`
+ A variable used for input and output.
+
+* `NONE`
+ No role assigned. (This is a variable's default role.)
+
+* `PARTITION`
+ Used to break the data into groups for testing.
+
+* `SPLIT`
+ No meaning except for certain third party software. (This role's
+ meaning is unrelated to `SPLIT FILE`.)
+
+The PSPPIRE GUI does not yet use variable roles.
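+
+For example, to mark hypothetical predictor and outcome variables:
+
+```
+VARIABLE ROLE
+  /INPUT age education
+  /TARGET income.
+```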
--- /dev/null
+# VARIABLE WIDTH
+
+```
+VARIABLE WIDTH
+ VAR_LIST (width)
+ [ /VAR_LIST (width) ]
+ .
+ .
+ .
+ [ /VAR_LIST (width) ]
+```
+
+`VARIABLE WIDTH` sets the column width of variables for display
+editing purposes. It does not affect the display of variables in the
+PSPP output.
+
--- /dev/null
+# Manipulating Variables
+
+Every value in a dataset is associated with a
+[variable](../language/datasets/variables.md). Variables describe
+what the values represent and properties of those values, such as the
+format in which they should be displayed, whether they are numeric or
+alphabetic and how missing values should be represented. There are
+several utility commands for examining and adjusting variables.
+++ /dev/null
-# ADD VALUE LABELS
-
-`ADD VALUE LABELS` has the same syntax and purpose as [`VALUE
-LABELS`](value-labels.md), but it does not clear value labels from the
-variables before adding the ones specified.
-
-```
-ADD VALUE LABELS
- /VAR_LIST VALUE 'LABEL' [VALUE 'LABEL']...
-```
+++ /dev/null
-# DELETE VARIABLES
-
-`DELETE VARIABLES` deletes the specified variables from the dictionary.
-
-```
-DELETE VARIABLES VAR_LIST.
-```
-
-`DELETE VARIABLES` should not be used after defining transformations
-but before executing a procedure. If it is used in that position
-anyway, it causes the data to be read. If it is used while
-`TEMPORARY` is in effect, it causes the temporary transformations to
-become permanent.
-
-`DELETE VARIABLES` may not be used to delete all variables from the
-dictionary; use [`NEW FILE`](../../commands/data-io/new-file.html)
-instead.
-
+++ /dev/null
-# DISPLAY
-
-The `DISPLAY` command displays information about the variables in the
-active dataset. A variety of different forms of information can be
-requested. By default, all variables in the active dataset are
-displayed. However you can select variables of interest using the
-`/VARIABLES` subcommand.
-
-```
-DISPLAY [SORTED] NAMES [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] INDEX [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] LABELS [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] VARIABLES [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] DICTIONARY [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] SCRATCH [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] ATTRIBUTES [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] @ATTRIBUTES [[/VARIABLES=]VAR_LIST].
-DISPLAY [SORTED] VECTORS.
-```
-
-The following keywords primarily cause information about variables to
-be displayed. With these keywords, by default information is
-displayed about all variables in the active dataset, in the order
-that variables occur in the active dataset dictionary. The `SORTED`
-keyword causes output to be sorted alphabetically by variable name.
-
-* `NAMES`
- The variables' names are displayed.
-
-* `INDEX`
- The variables' names are displayed along with a value describing
- their position within the active dataset dictionary.
-
-* `LABELS`
- Variable names, positions, and variable labels are displayed.
-
-* `VARIABLES`
- Variable names, positions, print and write formats, and missing
- values are displayed.
-
-* `DICTIONARY`
- Variable names, positions, print and write formats, missing values,
- variable labels, and value labels are displayed.
-
-* `SCRATCH`
- Variable names are displayed, for [scratch
- variables](../../language/datasets/scratch-variables.md) only.
-
-* `ATTRIBUTES`
- Datafile and variable attributes are displayed, except attributes
- whose names begin with `@` or `$@`.
-
-* `@ATTRIBUTES`
- All datafile and variable attributes, even those whose names begin
- with `@` or `$@`.
-
-With the `VECTORS` keyword, `DISPLAY` lists all the currently declared
-vectors. If the `SORTED` keyword is given, the vectors are listed in
-alphabetical order; otherwise, they are listed in textual order of
-definition within the PSPP syntax file.
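-
-For example, the following displays full dictionary information for
-two hypothetical variables, then lists all declared vectors:
-
-```
-DISPLAY SORTED DICTIONARY /VARIABLES=age income.
-DISPLAY VECTORS.
-```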
-
-For related commands, see [`DISPLAY
-DOCUMENTS`](../utilities/display-documents.md) and [`DISPLAY FILE
-LABEL`](../utilities/display-file-label.md).
-
+++ /dev/null
-# FORMATS
-
-```
-FORMATS VAR_LIST (FMT_SPEC) [VAR_LIST (FMT_SPEC)]....
-```
-
-`FORMATS` sets both print and write formats for the specified
-variables to the specified [output
-format](../../language/datasets/formats/index.md).
-
-Specify a list of variables followed by a format specification in
-parentheses. The print and write formats of the specified variables
-will be changed. All of the variables listed together must have the
-same type and, for string variables, the same width.
-
-Additional lists of variables and formats may be included following
-the first one.
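-
-For example, assuming numeric variables `height` and `weight` and a
-date variable `dob`:
-
-```
-FORMATS height weight (F5.1) dob (DATE11).
-```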
-
-`FORMATS` takes effect immediately. It is not affected by conditional
-and looping structures such as `DO IF` or `LOOP`.
-
+++ /dev/null
-# Manipulating Variables
-
-Every value in a dataset is associated with a
-[variable](../../language/datasets/variables.md). Variables describe
-what the values represent and properties of those values, such as the
-format in which they should be displayed, whether they are numeric or
-alphabetic and how missing values should be represented. There are
-several utility commands for examining and adjusting variables.
+++ /dev/null
-# LEAVE
-
-`LEAVE` prevents the specified variables from being reinitialized
-whenever a new case is processed.
-
-```
-LEAVE VAR_LIST.
-```
-
-Normally, when a data file is processed, every variable in the active
-dataset is initialized to the system-missing value or spaces at the
-beginning of processing for each case. When a variable has been
-specified on `LEAVE`, this is not the case. Instead, that variable is
-initialized to 0 (not system-missing) or spaces for the first case.
-After that, it retains its value between cases.
-
-This becomes useful for counters. For instance, in the example below
-the variable `SUM` maintains a running total of the values in the
-`ITEM` variable.
-
-```
-DATA LIST /ITEM 1-3.
-COMPUTE SUM=SUM+ITEM.
-PRINT /ITEM SUM.
-LEAVE SUM.
-BEGIN DATA.
-123
-404
-555
-999
-END DATA.
-```
-
-Partial output from this example:
-
-```
-123 123.00
-404 527.00
-555 1082.00
-999 2081.00
-```
-
-It is best to use the `LEAVE` command immediately before invoking a
-procedure command, because the left status of variables is reset by
-certain transformations, for instance `COMPUTE` and `IF`. Left
-status is also reset by all procedure invocations.
-
+++ /dev/null
-# MISSING VALUES
-
-In many situations, the data available for analysis is incomplete, so
-that a placeholder must be used to indicate that the value is unknown.
-One way that missing values are represented, for numeric data, is the
-["system-missing value"](../../language/basics/missing-values.html).
-Another, more flexible way is through "user-missing values" which are
-determined on a per variable basis.
-
-The `MISSING VALUES` command sets user-missing values for variables.
-
-```
-MISSING VALUES VAR_LIST (MISSING_VALUES).
-
-where MISSING_VALUES takes one of the following forms:
- NUM1
- NUM1, NUM2
- NUM1, NUM2, NUM3
- NUM1 THRU NUM2
- NUM1 THRU NUM2, NUM3
- STRING1
- STRING1, STRING2
- STRING1, STRING2, STRING3
-As part of a range, `LO` or `LOWEST` may take the place of NUM1;
-`HI` or `HIGHEST` may take the place of NUM2.
-```
-
-`MISSING VALUES` sets user-missing values for numeric and string
-variables. Long string variables may have missing values, but
-characters after the first 8 bytes of the missing value must be
-spaces.
-
-Specify a list of variables, followed by a list of their user-missing
-values in parentheses. Up to three discrete values may be given, or,
-for numeric variables only, a range of values optionally accompanied
-by a single discrete value. Ranges may be open-ended on one end,
-indicated through the use of the keyword `LO` or `LOWEST` or `HI` or
-`HIGHEST`.
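-
-Some illustrative examples (the variable names are hypothetical):
-
-```
-MISSING VALUES age (9, 99).
-MISSING VALUES income (LOWEST THRU 0, 99999).
-MISSING VALUES answer ('NA').
-```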
-
-The `MISSING VALUES` command takes effect immediately. It is not
-affected by conditional and looping constructs such as `DO IF` or
-`LOOP`.
-
+++ /dev/null
-# MRSETS
-
-`MRSETS` creates, modifies, deletes, and displays multiple response
-sets. A multiple response set is a set of variables that represent
-multiple responses to a survey question.
-
-Multiple responses are represented in one of the two following ways:
-
-- A "multiple dichotomy set" is analogous to a survey question with a
- set of checkboxes. Each variable in the set is treated in a Boolean
- fashion: one value (the "counted value") means that the box was
- checked, and any other value means that it was not.
-
-- A "multiple category set" represents a survey question where the
- respondent is instructed to list up to N choices. Each variable
- represents one of the responses.
-
-```
-MRSETS
- /MDGROUP NAME=NAME VARIABLES=VAR_LIST VALUE=VALUE
- [CATEGORYLABELS={VARLABELS,COUNTEDVALUES}]
- [{LABEL='LABEL',LABELSOURCE=VARLABEL}]
-
- /MCGROUP NAME=NAME VARIABLES=VAR_LIST [LABEL='LABEL']
-
- /DELETE NAME={[NAMES],ALL}
-
- /DISPLAY NAME={[NAMES],ALL}
-```
-
-Any number of subcommands may be specified in any order.
-
-The `MDGROUP` subcommand creates a new multiple dichotomy set or
-replaces an existing multiple response set. The `NAME`, `VARIABLES`,
-and `VALUE` specifications are required. The others are optional:
-
-- `NAME` specifies the name used in syntax for the new multiple
- dichotomy set. The name must begin with `$`; it must otherwise
- follow the rules for [identifiers](../../language/basics/tokens.md).
-
-- `VARIABLES` specifies the variables that belong to the set. At
- least two variables must be specified. The variables must be all
- string or all numeric.
-
-- `VALUE` specifies the counted value. If the variables are numeric,
- the value must be an integer. If the variables are strings, then
- the value must be a string that is no longer than the shortest of
- the variables in the set (ignoring trailing spaces).
-
-- `CATEGORYLABELS` optionally specifies the source of the labels for
- each category in the set:
-
- - `VARLABELS`, the default, uses variable labels or, for
- variables without variable labels, variable names. PSPP warns
- if two variables have the same variable label, since these
- categories cannot be distinguished in output.
-
- - `COUNTEDVALUES` instead uses each variable's value label for
- the counted value. PSPP warns if two variables have the same
- value label for the counted value or if one of the variables
- lacks a value label, since such categories cannot be
- distinguished in output.
-
-- `LABEL` optionally specifies a label for the multiple response set.
- If neither `LABEL` nor `LABELSOURCE=VARLABEL` is specified, the set
- is unlabeled.
-
-- `LABELSOURCE=VARLABEL` draws the multiple response set's label from
- the first variable label among the variables in the set; if none of
- the variables has a label, the name of the first variable is used.
- `LABELSOURCE=VARLABEL` must be used with
- `CATEGORYLABELS=COUNTEDVALUES`. It is mutually exclusive with
- `LABEL`.
-
-The `MCGROUP` subcommand creates a new multiple category set or
-replaces an existing multiple response set. The `NAME` and
-`VARIABLES` specifications are required, and `LABEL` is optional.
-Their meanings are as described above in `MDGROUP`. PSPP warns if two
-variables in the set have different value labels for a single value,
-since each of the variables in the set should have the same possible
-categories.
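-
-As an illustration, the following defines a multiple dichotomy set
-from three hypothetical yes/no variables, then displays it:
-
-```
-MRSETS
-  /MDGROUP NAME=$fruits VARIABLES=apple banana cherry VALUE=1
-   LABEL='Fruits eaten this week'.
-MRSETS /DISPLAY NAME=[$fruits].
-```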
-
-The `DELETE` subcommand deletes multiple response groups. A list of
-groups may be named within a set of required square brackets, or
-`ALL` may be used to delete all groups.
-
-The `DISPLAY` subcommand displays information about defined multiple
-response sets. Its syntax is the same as the `DELETE` subcommand.
-
-Multiple response sets are saved to and read from system files by,
-e.g., the `SAVE` and `GET` commands. Otherwise, multiple response
-sets are currently used only by third party software.
-
+++ /dev/null
-# NUMERIC
-
-`NUMERIC` explicitly declares new numeric variables, optionally setting
-their output formats.
-
-```
-NUMERIC VAR_LIST [(FMT_SPEC)] [/VAR_LIST [(FMT_SPEC)]]...
-```
-
- Specify the names of the new numeric variables as `VAR_LIST`. If
-you wish to set the variables' output formats, follow their names by
-an [output format](../../language/datasets/formats/index.html) in
-parentheses; otherwise, the default is `F8.2`.
-
- Variables created with `NUMERIC` are initialized to the
-system-missing value.
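-
-For example, the following creates three new numeric variables, two
-with explicit formats and one with the default `F8.2`:
-
-```
-NUMERIC score (F5.1) /idnum (N6) /ratio.
-```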
-
+++ /dev/null
-# PRINT FORMATS
-
-```
-PRINT FORMATS VAR_LIST (FMT_SPEC) [VAR_LIST (FMT_SPEC)]....
-```
-
-`PRINT FORMATS` sets the print formats for the specified variables to
-the specified format specification.
-
-It has the same syntax as [`FORMATS`](formats.md), but `PRINT FORMATS`
-sets only print formats, not write formats.
-
+++ /dev/null
-# RENAME VARIABLES
-
-`RENAME VARIABLES` changes the names of variables in the active dataset.
-
-```
-RENAME VARIABLES (OLD_NAMES=NEW_NAMES)... .
-```
-
-Specify lists of the old variable names and new variable names,
-separated by an equals sign (`=`), within parentheses. There must be
-the same number of old and new variable names. Each old variable is
-renamed to the corresponding new variable name. Multiple
-parenthesized groups of variables may be specified. When the old and
-new variable names contain only a single variable name, the
-parentheses are optional.
-
-`RENAME VARIABLES` takes effect immediately. It does not cause the
-data to be read.
-
-`RENAME VARIABLES` may not be specified following
-[`TEMPORARY`](../selection/temporary.md).
-
+++ /dev/null
-# SORT VARIABLES
-
-`SORT VARIABLES` reorders the variables in the active dataset's
-dictionary according to a chosen sort key.
-
-```
-SORT VARIABLES [BY]
- (NAME | TYPE | FORMAT | LABEL | VALUES | MISSING | MEASURE
- | ROLE | COLUMNS | ALIGNMENT | ATTRIBUTE NAME)
- [(D)].
-```
-
-The main specification is one of the following identifiers, which
-determines how the variables are sorted:
-
-* `NAME`
- Sorts the variables according to their names, in a case-insensitive
- fashion. However, when variable names differ only in a number at
- the end, they are sorted numerically. For example, `VAR5` is
- sorted before `VAR400` even though `4` precedes `5`.
-
-* `TYPE`
- Sorts numeric variables before string variables, and shorter string
- variables before longer ones.
-
-* `FORMAT`
- Groups variables by print format; within a format, sorts narrower
- formats before wider ones; with the same format and width, sorts
- fewer decimal places before more decimal places. See [`PRINT
- FORMATS`](print-formats.md).
-
-* `LABEL`
- Sorts variables without a variable label before those with one.
- See [VARIABLE LABELS](variable-labels.md).
-
-* `VALUES`
- Sorts variables without value labels before those with some. See
- [VALUE LABELS](value-labels.md).
-
-* `MISSING`
- Sorts variables without missing values before those with some. See
- [MISSING VALUES](missing-values.md).
-
-* `MEASURE`
- Sorts nominal variables first, followed by ordinal variables,
- followed by scale variables. See [VARIABLE
- LEVEL](variable-level.md).
-
-* `ROLE`
- Groups variables according to their role. See [VARIABLE
- ROLE](variable-role.md).
-
-* `COLUMNS`
- Sorts variables in ascending display width. See [VARIABLE
- WIDTH](variable-width.md).
-
-* `ALIGNMENT`
- Sorts variables according to their alignment, first left-aligned,
- then right-aligned, then centered. See [VARIABLE
- ALIGNMENT](variable-alignment.md).
-
-* `ATTRIBUTE NAME`
- Sorts variables according to the first value of their `NAME`
- attribute. Variables without attributes are sorted first. See
- [VARIABLE ATTRIBUTE](variable-attribute.md).
-
-Only one sort criterion can be specified. The sort is "stable," so to
-sort on multiple criteria one may perform multiple sorts. For
-example, the following will sort primarily based on alignment, with
-variables that have the same alignment ordered based on display width:
-
-```
-SORT VARIABLES BY COLUMNS.
-SORT VARIABLES BY ALIGNMENT.
-```
-
-
-Specify `(D)` to reverse the sort order.
-
+++ /dev/null
-# STRING
-
-`STRING` creates new string variables.
-
-```
-STRING VAR_LIST (FMT_SPEC) [/VAR_LIST (FMT_SPEC)] [...].
-```
-
-Specify a list of names for the variable you want to create, followed
-by the desired [output
-format](../../language/datasets/formats/index.html) in parentheses.
-Variable widths are implicitly derived from the specified output
-formats. The created variables will be initialized to spaces.
-
-If you want to create several variables with distinct output formats,
-you can either use two or more separate `STRING` commands, or you can
-specify further variable list and format specification pairs, each
-separated from the previous by a slash (`/`).
-
-The following example is one way to create three string variables; Two
-of the variables have format `A24` and the other `A80`:
-
-```
-STRING firstname lastname (A24) / address (A80).
-```
-
-Here is another way to achieve the same result:
-
-```
-STRING firstname lastname (A24).
-STRING address (A80).
-```
-
-... and here is yet another way:
-
-```
-STRING firstname (A24).
-STRING lastname (A24).
-STRING address (A80).
-```
+++ /dev/null
-# VALUE LABELS
-
-The values of a variable can be associated with explanatory text
-strings. In this way, a short value can stand for a longer, more
-descriptive label.
-
-Both numeric and string variables can be given labels. For string
-variables, the values are case-sensitive, so that, for example, a
-capitalized value and its lowercase variant would have to be labeled
-separately if both are present in the data.
-
-```
-VALUE LABELS
- /VAR_LIST VALUE 'LABEL' [VALUE 'LABEL']...
-```
-
-`VALUE LABELS` allows values of variables to be associated with
-labels.
-
-To set up value labels for one or more variables, specify the variable
-names after a slash (`/`), followed by a list of values and their
-associated labels, separated by spaces.
-
-Value labels in output are normally broken into lines automatically.
-Put `\n` in a label string to force a line break at that point. The
-label may still be broken into lines at additional points.
-
-Before `VALUE LABELS` is executed, any existing value labels are
-cleared from the variables specified. Use [`ADD VALUE
-LABELS`](add-value-labels.md) to add value labels without clearing
-those already present.
-
+++ /dev/null
-# VARIABLE ALIGNMENT
-
-`VARIABLE ALIGNMENT` sets the alignment of variables for display
-editing purposes. It does not affect the display of variables in PSPP
-output.
-
-```
-VARIABLE ALIGNMENT
- VAR_LIST ( LEFT | RIGHT | CENTER )
- [ /VAR_LIST ( LEFT | RIGHT | CENTER ) ]
- .
- .
- .
- [ /VAR_LIST ( LEFT | RIGHT | CENTER ) ]
-
-```
+++ /dev/null
-# VARIABLE ATTRIBUTE
-
-`VARIABLE ATTRIBUTE` adds, modifies, or removes user-defined attributes
-associated with variables in the active dataset. Custom variable
-attributes are not interpreted by PSPP, but they are saved as part of
-system files and may be used by other software that reads them.
-
-```
-VARIABLE ATTRIBUTE
- VARIABLES=VAR_LIST
- ATTRIBUTE=NAME('VALUE') [NAME('VALUE')]...
- ATTRIBUTE=NAME[INDEX]('VALUE') [NAME[INDEX]('VALUE')]...
- DELETE=NAME [NAME]...
- DELETE=NAME[INDEX] [NAME[INDEX]]...
-```
-
-The required `VARIABLES` subcommand must come first. Specify the
-variables to which the following `ATTRIBUTE` or `DELETE` subcommand
-should apply.
-
-Use the `ATTRIBUTE` subcommand to add or modify custom variable
-attributes. Specify the name of the attribute as an
-[identifier](../../language/basics/tokens.md), followed by the desired
-value, in parentheses, as a quoted string. The specified attributes
-are then added or modified in the variables specified on `VARIABLES`.
-Attribute names that begin with `$` are reserved for PSPP's internal
-use, and attribute names that begin with `@` or `$@` are not displayed
-by most PSPP commands that display other attributes. Other attribute
-names are not treated specially.
-
-Attributes may also be organized into arrays. To assign to an array
-element, add an integer array index enclosed in square brackets (`[`
-and `]`) between the attribute name and value. Array indexes start at
-1, not 0. An attribute array that has a single element (number 1) is
-not distinguished from a non-array attribute.
-
-Use the `DELETE` subcommand to delete an attribute from the variable
-specified on `VARIABLES`. Specify an attribute name by itself to
-delete an entire attribute, including all array elements for attribute
-arrays. Specify an attribute name followed by an array index in
-square brackets to delete a single element of an attribute array. In
-the latter case, all the array elements numbered higher than the
-deleted element are shifted down, filling the vacated position.
-
-To associate custom attributes with the entire active dataset, instead
-of with particular variables, use [`DATAFILE
-ATTRIBUTE`](../../commands/data-io/datafile-attribute.md) instead.
-
-`VARIABLE ATTRIBUTE` takes effect immediately. It is not affected by
-conditional and looping structures such as `DO IF` or `LOOP`.
-
+++ /dev/null
-# VARIABLE LABELS
-
-Each variable can have a "label" to supplement its name. Whereas a
-variable name is a concise, easy-to-type mnemonic for the variable, a
-label may be longer and more descriptive.
-
-```
-VARIABLE LABELS
- VARIABLE 'LABEL'
- [VARIABLE 'LABEL']...
-```
-
-`VARIABLE LABELS` associates explanatory names with variables. This
-name, called a "variable label", is displayed by statistical
-procedures.
-
-Specify each variable followed by its label as a quoted string.
-Variable-label pairs may be separated by an optional slash `/`.
-
-If a listed variable already has a label, the new one replaces it.
-Specifying an empty string as the label, e.g. `''`, removes a label.
-
+++ /dev/null
-# VARIABLE LEVEL
-
-```
-VARIABLE LEVEL variables ({SCALE | NOMINAL | ORDINAL})...
-```
-
-`VARIABLE LEVEL` sets the [measurement
-level](../../language/datasets/variables.md) of the listed variables.
-
+++ /dev/null
-# VARIABLE ROLE
-
-```
-VARIABLE ROLE
- /ROLE VAR_LIST
- [/ROLE VAR_LIST]...
-```
-
-`VARIABLE ROLE` sets the intended role of a variable for use in dialog
-boxes in graphical user interfaces. Each `ROLE` specifies one of the
-following roles for the variables that follow it:
-
-* `INPUT`
- An input variable, such as an independent variable.
-
-* `TARGET`
- An output variable, such as an dependent variable.
-
-* `BOTH`
- A variable used for input and output.
-
-* `NONE`
- No role assigned. (This is a variable's default role.)
-
-* `PARTITION`
- Used to break the data into groups for testing.
-
-* `SPLIT`
- No meaning except for certain third party software. (This role's
- meaning is unrelated to `SPLIT FILE`.)
-
-The PSPPIRE GUI does not yet use variable roles.
+++ /dev/null
-# VARIABLE WIDTH
-
-```
-VARIABLE WIDTH
- VAR_LIST (width)
- [ /VAR_LIST (width) ]
- .
- .
- .
- [ /VAR_LIST (width) ]
-```
-
-`VARIABLE WIDTH` sets the column width of variables for display
-editing purposes. It does not affect the display of variables in the
-PSPP output.
-
+++ /dev/null
-# VECTOR
-
-```
-Two possible syntaxes:
- VECTOR VEC_NAME=VAR_LIST.
- VECTOR VEC_NAME_LIST(COUNT [FORMAT]).
-```
-
-`VECTOR` allows a group of variables to be accessed as if they were
-consecutive members of an array with a `vector(index)` notation.
-
-To make a vector out of a set of existing variables, specify a name
-for the vector followed by an equals sign (`=`) and the variables to
-put in the vector. The variables must be all numeric or all string,
-and string variables must have the same width.
-
-To make a vector and create variables at the same time, specify one or
-more vector names followed by a count in parentheses. This will
-create variables named `VEC1` through `VEC<count>`. By default, the
-new variables are numeric with format `F8.2`, but an alternate format
-may be specified inside the parentheses before or after the count and
-separated from it by white space or a comma. With a string format
-such as `A8`, the variables will be string variables; with a numeric
-format, they will be numeric. Variable names including the suffixes
-may not exceed 64 characters in length, and none of the variables may
-exist prior to `VECTOR`.
-
-Vectors created with `VECTOR` disappear after any procedure or
-procedure-like command is executed. The variables contained in the
-vectors remain, unless they are [scratch
-variables](../../language/datasets/scratch-variables.md).
-
-Variables within a vector may be referenced in expressions using
-`vector(index)` syntax.
-
+++ /dev/null
-# WRITE FORMATS
-
-```
-WRITE FORMATS VAR_LIST (FMT_SPEC) [VAR_LIST (FMT_SPEC)]....
-```
-
-`WRITE FORMATS` sets the write formats for the specified variables to
-the specified format specification. It has the same syntax as
-[`FORMATS`](formats.md), but `WRITE FORMATS` sets only write formats,
-not print formats.
-
--- /dev/null
+# VECTOR
+
+```
+Two possible syntaxes:
+ VECTOR VEC_NAME=VAR_LIST.
+ VECTOR VEC_NAME_LIST(COUNT [FORMAT]).
+```
+
+`VECTOR` allows a group of variables to be accessed as if they were
+consecutive members of an array with a `vector(index)` notation.
+
+To make a vector out of a set of existing variables, specify a name
+for the vector followed by an equals sign (`=`) and the variables to
+put in the vector. The variables must be all numeric or all string,
+and string variables must have the same width.
+
+To make a vector and create variables at the same time, specify one or
+more vector names followed by a count in parentheses. This will
+create variables named `VEC1` through `VEC<count>`. By default, the
+new variables are numeric with format `F8.2`, but an alternate format
+may be specified inside the parentheses before or after the count and
+separated from it by white space or a comma. With a string format
+such as `A8`, the variables will be string variables; with a numeric
+format, they will be numeric. Variable names including the suffixes
+may not exceed 64 characters in length, and none of the variables may
+exist prior to `VECTOR`.
+
+Vectors created with `VECTOR` disappear after any procedure or
+procedure-like command is executed. The variables contained in the
+vectors remain, unless they are [scratch
+variables](../language/datasets/scratch-variables.md).
+
+Variables within a vector may be referenced in expressions using
+`vector(index)` syntax.
+
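+As a brief sketch of both forms, assuming that numeric variables
+`q1`, `q2`, and `q3` already exist in the active dataset:
+
+```
+* Collect three existing variables into a vector named quest.
+VECTOR quest=q1 q2 q3.
+
+* Create new numeric variables score1 through score5 with format
+* F4.1, accessible as a vector named score.
+VECTOR score(5, F4.1).
+```
+
+The first command only defines the vector `quest` over the existing
+variables; the second both creates `score1` through `score5` and
+defines the vector `score` over them.
+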
--- /dev/null
+# WEIGHT
+
+```
+WEIGHT BY VAR_NAME.
+WEIGHT OFF.
+```
+
+`WEIGHT` assigns cases varying weights, changing the frequency
+distribution of the active dataset. Execution of `WEIGHT` is delayed
+until data have been read.
+
+If a variable name is specified, `WEIGHT` causes the values of that
+variable to be used as weighting factors for subsequent statistical
+procedures. Use of keyword `BY` is optional but recommended.
+Weighting variables must be numeric. [Scratch
+variables](../language/datasets/scratch-variables.md) may not be
+used for weighting.
+
+When `OFF` is specified, subsequent statistical procedures weight all
+cases equally.
+
+A positive integer weighting factor `W` on a case yields the same
+statistical output as would replicating the case `W` times. A
+weighting factor of 0 is treated for statistical purposes as if the
+case did not exist in the input. Weighting values need not be
+integers, but negative and system-missing values for the weighting
+variable are interpreted as weighting factors of 0. User-missing
+values are not treated specially.
+
+When `WEIGHT` is specified after [`TEMPORARY`](temporary.md), it
+affects only the next procedure.
+
+`WEIGHT` does not cause cases in the active dataset to be replicated
+in memory.
+
+## Example
+
+One could define a dataset containing an inventory of stock items. It
+would be reasonable to use a string variable for a description of the
+item, and a numeric variable for the number in stock, as in the
+syntax below.
+
+```
+data list notable list /item (a16) quantity (f8.0).
+begin data
+nuts 345
+screws 10034
+washers 32012
+bolts 876
+end data.
+
+echo 'Unweighted frequency table'.
+frequencies /variables = item /format=dfreq.
+
+weight by quantity.
+
+echo 'Weighted frequency table'.
+frequencies /variables = item /format=dfreq.
+```
+
+One analysis that would surely be of interest is the relative
+amount of each item in stock. However, without setting a weight
+variable, [`FREQUENCIES`](frequencies.md) does not tell
+us what we want to know, since there is only one case for each stock
+item. The output below shows the difference between the weighted and
+unweighted frequency tables.
+
+```
+Unweighted frequency table
+
+ item
+┌─────────────┬─────────┬───────┬─────────────┬──────────────────┐
+│ │Frequency│Percent│Valid Percent│Cumulative Percent│
+├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Valid bolts │ 1│ 25.0%│ 25.0%│ 25.0%│
+│ nuts │ 1│ 25.0%│ 25.0%│ 50.0%│
+│ screws │ 1│ 25.0%│ 25.0%│ 75.0%│
+│ washers│ 1│ 25.0%│ 25.0%│ 100.0%│
+├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Total │ 4│ 100.0%│ │ │
+└─────────────┴─────────┴───────┴─────────────┴──────────────────┘
+
+Weighted frequency table
+
+ item
+┌─────────────┬─────────┬───────┬─────────────┬──────────────────┐
+│ │Frequency│Percent│Valid Percent│Cumulative Percent│
+├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Valid washers│ 32012│ 74.0%│ 74.0%│ 74.0%│
+│ screws │ 10034│ 23.2%│ 23.2%│ 97.2%│
+│ bolts │ 876│ 2.0%│ 2.0%│ 99.2%│
+│ nuts │ 345│ .8%│ .8%│ 100.0%│
+├─────────────┼─────────┼───────┼─────────────┼──────────────────┤
+│Total │ 43267│ 100.0%│ │ │
+└─────────────┴─────────┴───────┴─────────────┴──────────────────┘
+```
--- /dev/null
+# WRITE FORMATS
+
+```
+WRITE FORMATS VAR_LIST (FMT_SPEC) [VAR_LIST (FMT_SPEC)]....
+```
+
+`WRITE FORMATS` sets the write formats for the specified variables to
+the specified format specification. It has the same syntax as
+[`FORMATS`](formats.md), but `WRITE FORMATS` sets only write formats,
+not print formats.
+
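+For example, assuming the active dataset contains a numeric variable
+`income`, the following sets its write format without affecting its
+print format:
+
+```
+WRITE FORMATS income (DOLLAR10.2).
+```
+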
--- /dev/null
+# WRITE
+
+```
+WRITE
+ OUTFILE='FILE_NAME'
+ RECORDS=N_LINES
+ {NOTABLE,TABLE}
+ /[LINE_NO] ARG...
+
+ARG takes one of the following forms:
+ 'STRING' [START-END]
+ VAR_LIST START-END [TYPE_SPEC]
+ VAR_LIST (FORTRAN_SPEC)
+ VAR_LIST *
+```
+
+`WRITE` writes text or binary data to an output file. `WRITE` differs
+from [`PRINT`](print.md) in only a few ways:
+
+- `WRITE` uses write formats by default, whereas `PRINT` uses print
+ formats.
+
+- `PRINT` inserts a space between variables unless a format is
+ explicitly specified, but `WRITE` never inserts space between
+ variables in output.
+
+- `PRINT` inserts a space at the beginning of each line that it writes
+ to an output file (and `PRINT EJECT` inserts `1` at the beginning of
+ each line that should begin a new page), but `WRITE` does not.
+
+- `PRINT` outputs the system-missing value according to its specified
+ output format, whereas `WRITE` outputs the system-missing value as a
+ field filled with spaces. Binary formats are an exception.
+
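+As a sketch, assuming numeric variables `id` and `score` exist in the
+active dataset, the following writes one record per case to a text
+file, using the variables' write formats with no separating spaces:
+
+```
+WRITE OUTFILE='scores.txt' /id score.
+EXECUTE.
+```
+
+Because `WRITE` is a transformation, nothing is written until a
+procedure or `EXECUTE` causes the data to be read.
+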
--- /dev/null
+# XEXPORT
+
+```
+XEXPORT
+ /OUTFILE='FILE_NAME'
+ /DIGITS=N
+ /DROP=VAR_LIST
+ /KEEP=VAR_LIST
+ /RENAME=(SRC_NAMES=TARGET_NAMES)...
+ /TYPE={COMM,TAPE}
+ /MAP
+```
+
+The `XEXPORT` transformation writes the active dataset dictionary and
+data to a specified portable file.
+
+This transformation is a PSPP extension.
+
+It is similar to the `EXPORT` procedure, with two differences:
+
+- `XEXPORT` is a transformation, not a procedure. It is executed when
+ the data is read by a procedure or procedure-like command.
+
+- `XEXPORT` does not support the `UNSELECTED` subcommand.
+
+See [`EXPORT`](export.md) for more information.
+
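+As a sketch, with an illustrative output file name, the portable file
+is written as a side effect of the following procedure reading the
+data:
+
+```
+XEXPORT /OUTFILE='backup.por'.
+DESCRIPTIVES /VARIABLES=ALL.
+```
+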
--- /dev/null
+# XSAVE
+
+```
+XSAVE
+ /OUTFILE='FILE_NAME'
+ /{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED}
+ /PERMISSIONS={WRITEABLE,READONLY}
+ /DROP=VAR_LIST
+ /KEEP=VAR_LIST
+ /VERSION=VERSION
+ /RENAME=(SRC_NAMES=TARGET_NAMES)...
+ /NAMES
+ /MAP
+```
+
+The `XSAVE` transformation writes the active dataset's dictionary and
+data to a system file. It is similar to the `SAVE` procedure, with
+two differences:
+
+- `XSAVE` is a transformation, not a procedure. It is executed when
+ the data is read by a procedure or procedure-like command.
+
+- `XSAVE` does not support the `UNSELECTED` subcommand.
+
+See [`SAVE`](save.md) for more information.
+
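+As a sketch, assuming variables `id` and `score` exist in the active
+dataset, the system file below is written when the `FREQUENCIES`
+procedure reads the data:
+
+```
+XSAVE /OUTFILE='subset.sav' /KEEP=id score.
+FREQUENCIES /VARIABLES=score.
+```
+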
variables, specify `PRESORTED` to save time. With
`MODE=ADDVARIABLES`, the data must be pre-sorted.
-Specify `DOCUMENT` to copy the [documents](../utilities/document.md)
-from the active dataset into the aggregate file. Otherwise, the
-aggregate file does not contain any documents, even if the aggregate
-file replaces the active dataset.
+Specify `DOCUMENT` to copy the [documents](document.md) from the
+active dataset into the aggregate file. Otherwise, the aggregate file
+does not contain any documents, even if the aggregate file replaces
+the active dataset.
Normally, `AGGREGATE` produces a non-missing value whenever there is
enough non-missing data for the aggregation function in use, that is,
* `N(VAR_NAME...)`
`NMISS(VAR_NAME...)`
Total weight of non-missing or missing values, respectively. The
- default format is `F7.0` if weighting is not enabled, `F8.2` if it is
- (see [`WEIGHT`](../selection/weight.md)).
+ default format is `F7.0` if weighting is not enabled, `F8.2` if it
+ is (see [`WEIGHT`](weight.md)).
* `NU(VAR_NAME...)`
`NUMISS(VAR_NAME...)`
* `N`
Total weight of cases aggregated to form this group. The default
format is `F7.0` if weighting is not enabled, `F8.2` if it is (see
- [`WEIGHT`](../selection/weight.md)).
+ [`WEIGHT`](weight.md)).
* `NU`
Count of cases aggregated to form this group, ignoring case
period to be interpreted as the end of the command.)
`AGGREGATE` both ignores and cancels the current [`SPLIT
-FILE`](../selection/split-file.md) settings.
+FILE`](split-file.md) settings.
## Aggregate Example
`--data`, also outputs cases from the system file.
This can be useful as an alternative to PSPP syntax commands such as
- [`SYSFILE INFO`](../commands/spss-io/sysfile-info.md) or [`DISPLAY
- DICTIONARY`](../commands/variables/display.md).
+ [`SYSFILE INFO`](../commands/sysfile-info.md) or [`DISPLAY
+ DICTIONARY`](../commands/display.md).
[`pspp convert`](pspp-convert.md) is a better way to convert a
system file to another format.
- Numbers in `COMMA` format are additionally grouped every three digits
by inserting a grouping character. The grouping character is
ordinarily a comma, but it can be changed to a period (with [`SET
- DECIMAL`](../../../commands/utilities/set.md#decimal)).
+ DECIMAL`](../../../commands/set.md#decimal)).
- `DOT` format is like `COMMA` format, but it interchanges the role of
the decimal point and grouping characters. That is, the current
follow the letter or the sign or both.
On [fixed-format `DATA
-LIST`](../../../commands/data-io/data-list.md#data-list-fixed) and in
+LIST`](../../../commands/data-list.md#data-list-fixed) and in
a few other contexts, decimals are implied when the field does not
contain a decimal point. In `F6.5` format, for example, the field
`314159` is taken as the value 3.14159 with implied decimals.
output is 0, and a decimal point is included, then PSPP ordinarily
drops the zero before the decimal point. However, in `F`, `COMMA`,
or `DOT` formats, PSPP keeps the zero if [`SET
- LEADZERO`](../../../commands/utilities/set.md#leadzero) is set to
+ LEADZERO`](../../../commands/set.md#leadzero) is set to
`ON`.
In scientific notation, the number always includes a decimal point,
read from data files with variable-length records, such as ordinary
text files. They may be read from or written to data files with
fixed-length records. See [`FILE
-HANDLE`](../../../commands/data-io/file-handle.md), for information on
+HANDLE`](../../../commands/file-handle.md), for information on
working with fixed-length records.
## `P` and `PK` Formats
These are integer binary formats. `IB` reads and writes 2's
complement binary integers, and `PIB` reads and writes unsigned binary
integers. The byte ordering is by default the host machine's, but
-[`SET RIB`](../../../commands/utilities/set.md#rib) may be used to
-select a specific byte ordering for reading and [`SET
-WIB`](../../../commands/utilities/set.md#wib), similarly, for writing.
+[`SET RIB`](../../../commands/set.md#rib) may be used to select a
+specific byte ordering for reading and [`SET
+WIB`](../../../commands/set.md#wib), similarly, for writing.
The maximum field width is 8. Decimal places may range from 0 up to
the number of decimal digits in the largest value representable in the
This is a binary format for real numbers. It reads and writes the
host machine's floating-point format. The byte ordering is by default
-the host machine's, but [`SET
-RIB`](../../../commands/utilities/set.md#rib) may be used to select a
-specific byte ordering for reading and [`SET
-WIB`](../../../commands/utilities/set.md#wib), similarly, for writing.
+the host machine's, but [`SET RIB`](../../../commands/set.md#rib) may
+be used to select a specific byte ordering for reading and [`SET
+WIB`](../../../commands/set.md#wib), similarly, for writing.
The field width should be 4, for 32-bit floating-point numbers, or 8,
for 64-bit floating-point numbers. Other field widths do not produce
Every variable has two output formats, called its "print format"
and "write format". Print formats are used in most output contexts;
-only the [`WRITE`](../../../commands/data-io/write.md) command uses
-write formats. Newly created variables have identical print and write
-formats, and [`FORMATS`](../../../commands/variables/formats.md), the
-most commonly used command for changing formats, sets both of them to
-the same value as well. This means that the distinction between print
-and write formats is usually unimportant.
+only the [`WRITE`](../../../commands/write.md) command uses write
+formats. Newly created variables have identical print and write
+formats, and [`FORMATS`](../../../commands/formats.md), the most
+commonly used command for changing formats, sets both of them to the
+same value as well. This means that the distinction between print and
+write formats is usually unimportant.
Input and output formats are specified to PSPP with a "format
specification" of the form `TypeW` or `TypeW.D`, where `Type` is one
The decimal point character for input and output is always `.`,
even if the decimal point character is a comma (see [`SET
-DECIMAL`](../../../commands/utilities/set.md#decimal)).
+DECIMAL`](../../../commands/set.md#decimal)).
Nonzero, negative values output in `Z` format are marked as
negative even when no nonzero digits are output. For example, -0.2 is
Year. In output, `DATETIME` and `YMDHMS` always produce 4-digit
years; other formats can produce a 2- or 4-digit year. The century
assumed for 2-digit years depends on the
- [`EPOCH`](../../../commands/utilities/set.md#epoch) setting. In
- output, a year outside the epoch causes the whole field to be filled
- with asterisks (`*`).
+ [`EPOCH`](../../../commands/set.md#epoch) setting. In output, a
+ year outside the epoch causes the whole field to be filled with
+ asterisks (`*`).
* `jjj`
Day of year (Julian day), from 1 to 366. This is exactly three
On input, seconds and fractional seconds are optional. The
`DECIMAL` setting controls the character accepted and displayed as
the decimal point (see [`SET
- DECIMAL`](../../../commands/utilities/set.md#decimal)).
+ DECIMAL`](../../../commands/set.md#decimal)).
For output, the date and time formats use the delimiters indicated in
the table. For input, date components may be separated by spaces or by
called the "active dataset". Most PSPP commands work only with the
active dataset. In addition to the active dataset, PSPP also supports
any number of additional open datasets. The [`DATASET`
-commands](../../commands/data-io/dataset.md) can choose a new active
-dataset from among those that are open, as well as create and destroy
+commands](../../commands/dataset.md) can choose a new active dataset
+from among those that are open, as well as create and destroy
datasets.
However, sometimes it's useful to have a variable that keeps its
value between cases. You can do this with
-[`LEAVE`](../../commands/variables/leave.md), or you can use a
-"scratch variable". Scratch variables are variables whose names begin
-with an octothorpe (`#`).
+[`LEAVE`](../../commands/leave.md), or you can use a "scratch
+variable". Scratch variables are variables whose names begin with an
+octothorpe (`#`).
Scratch variables have the same properties as variables left with
`LEAVE`: they retain their values between cases, and for the first
property that they are deleted before the execution of any procedure.
For this reason, scratch variables can't be used for analysis. To use
a scratch variable in an analysis, use
-[`COMPUTE`](../../commands/data/compute.md) to copy its value into an
+[`COMPUTE`](../../commands/compute.md) to copy its value into an
ordinary variable, then use that ordinary variable in the analysis.
* Position
Variables in the dictionary are arranged in a specific order.
- [`DISPLAY`](../../commands/variables/display.md) can show this
- order.
+ [`DISPLAY`](../../commands/display.md) can show this order.
* Initialization
Either reinitialized to 0 or spaces for each case, or left at its
- existing value. Use [`LEAVE`](../../commands/variables/leave.md) to
- avoid reinitializing a variable.
+ existing value. Use [`LEAVE`](../../commands/leave.md) to avoid
+ reinitializing a variable.
* Missing values
Optionally, up to three values, or a range of values, or a specific
while the system-missing value is not a value at all. See [Handling
Missing Values](../../language/basics/missing-values.md) for more
information on missing values. The [`MISSING
- VALUES`](../../commands/variables/missing-values.md) command sets
- missing values.
+ VALUES`](../../commands/missing-values.md) command sets missing
+ values.
* Variable label
A string that describes the variable. The [`VARIABLE
- LABELS`](../../commands/variables/variable-labels.md) command sets
- variable labels.
+ LABELS`](../../commands/variable-labels.md) command sets variable
+ labels.
* Value label
Optionally, these associate each possible value of the variable with
- a string. The [`VALUE
- LABELS`](../../commands/variables/value-labels.md) and [`ADD VALUE
- LABELS`](../../commands/variables/add-value-labels.md) commands set
- value labels.
+ a string. The [`VALUE LABELS`](../../commands/value-labels.md) and
+ [`ADD VALUE LABELS`](../../commands/add-value-labels.md) commands
+ set value labels.
* Print format
Display width, format, and (for numeric variables) number of decimal
places. This attribute does not affect how data are stored, just
how they are displayed. See [Input and Output
Formats](../../language/datasets/formats/index.md) for details. The
- [`FORMATS`](../../commands/variables/formats.md) and [`PRINT
- FORMATS`](../../commands/variables/print-formats.md) commands set
- print formats.
+ [`FORMATS`](../../commands/formats.md) and [`PRINT
+ FORMATS`](../../commands/print-formats.md) commands set print
+ formats.
* Write format
Similar to print format, but used by the
- [`WRITE`](../../commands/data-io/write.md) command. The
- [`FORMATS`](../../commands/variables/formats.md) and [`WRITE
- FORMATS`](../../commands/variables/write-formats.md) commands set
- write formats.
+ [`WRITE`](../../commands/write.md) command. The
+ [`FORMATS`](../../commands/formats.md) and [`WRITE
+ FORMATS`](../../commands/write-formats.md) commands set write
+ formats.
* <a name="measurement-level">Measurement level</a>
One of the following:
attached, such as age in years, income in dollars, or distance in
miles. Only numeric variables are scalar.
- The [`VARIABLE LEVEL`](../../commands/variables/variable-level.md)
- command sets measurement levels.
+ The [`VARIABLE LEVEL`](../../commands/variable-level.md) command
+ sets measurement levels.
Variables created by `COMPUTE` and similar transformations,
obtained from external sources, etc., initially have an unknown
- Scale, if the variable has 24 or more unique valid values. The
value 24 is the default. Use [`SET
- SCALEMIN`](../../commands/utilities/set.md#scalemin) to change the
- default.
+ SCALEMIN`](../../commands/set.md#scalemin) to change the default.
Finally, if none of the above is true, PSPP assigns the variable a
nominal measurement level.
* Custom attributes
User-defined associations between names and values. The [`VARIABLE
- ATTRIBUTE`](../../commands/variables/variable-attribute.md) command
- sets variable atributes.
+ ATTRIBUTE`](../../commands/variable-attribute.md) command sets
+ variable attributes.
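For example, the following attaches a hypothetical attribute named `source` to a variable `salary`:

```
VARIABLE ATTRIBUTE
    VARIABLES=salary
    ATTRIBUTE=source('2020 survey').
```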
* Role
The intended role of a variable for use in dialog boxes in graphical
user interfaces. The [`VARIABLE
- ROLE`](../../commands/variables/variable-role.md) command sets
- variable roles.
+ ROLE`](../../commands/variable-role.md) command sets variable roles.
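A brief sketch, with hypothetical variable names:

```
VARIABLE ROLE
    /INPUT age income
    /TARGET purchased.
```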
Halves are rounded away from zero, as are values that fall short of
halves by less than `FUZZBITS` of errors in the least-significant
bits of X. If `FUZZBITS` is not specified then the default is taken
- from [`SET FUZZBITS`](../../../commands/utilities/set.md#fuzzbits),
- which is 6 unless overridden.
+ from [`SET FUZZBITS`](../../../commands/set.md#fuzzbits), which is 6
+ unless overridden.
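For example:

```
COMPUTE r1 = RND(2.5).    /* Yields 3. */
COMPUTE r2 = RND(-2.5).   /* Yields -3: halves round away from zero. */
```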
* `SQRT(X)`
Takes the square root of `X`. If `X` is negative, the result is
`FUZZBITS` of errors in the least-significant bits of `X` are
rounded away from zero. If `FUZZBITS` is not specified then the
default is taken from [`SET
- FUZZBITS`](../../../commands/utilities/set.md#fuzzbits), which is 6
- unless overridden.
+ FUZZBITS`](../../../commands/set.md#fuzzbits), which is 6 unless
+ overridden.
values. Some statistics can be computed on numeric or string values;
others can only be computed on numeric values. Their results have the
same type as their arguments. The current case's
-[weight](../../../commands/selection/weight.md) has no effect on
-statistical functions.
+[weight](../../../commands/weight.md) has no effect on statistical
+functions.
These functions' argument lists may include entire ranges of
variables using the `VAR1 TO VAR2` syntax.
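For example, assuming hypothetical variables `v1` through `v5` are adjacent in the dictionary:

```
COMPUTE avg = MEAN(v1 TO v5).
COMPUTE n_ok = NVALID(v1 TO v5).
```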
* `YEAR`
Refers to a year, 1582 or greater. Years between 0 and 99 are
treated according to the epoch set on [`SET
- EPOCH`](../../../commands/utilities/set.md#epoch), by default
- beginning 69 years before the current date.
+ EPOCH`](../../../commands/set.md#epoch), by default beginning 69
+ years before the current date.
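For example, to treat two-digit years as falling in 1900–1999:

```
SET EPOCH=1900.
```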
If these functions' arguments are out-of-range, they are correctly
normalized before conversion to date format. Non-integers are rounded
mydata.sav.gz'`), and for many other purposes.
PSPP also supports declaring named file handles with the [`FILE
-HANDLE`](../../commands/data-io/file-handle.md) command. This command
+HANDLE`](../../commands/file-handle.md) command. This command
associates an identifier of your choice (the file handle's name) with
a file. Later, the file handle name can be substituted for the name
of the file. When PSPP syntax accesses a file multiple times,
`INLINE` is reserved as a file handle name. It refers to the "data
file" embedded into the syntax file between [`BEGIN DATA` and `END
-DATA`](../../commands/data-io/begin-data.md).
+DATA`](../../commands/begin-data.md).
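For example, the following reads two variables from data embedded in the syntax file (`DATA LIST` with no `FILE` subcommand reads the inline data):

```
DATA LIST LIST /x y.
BEGIN DATA.
1 2
3 4
END DATA.
LIST.
```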
The file to which a file handle refers may be reassigned on a later
`FILE HANDLE` command if it is first closed using [`CLOSE FILE
-HANDLE`](../../commands/data-io/close-file-handle.md).
+HANDLE`](../../commands/close-file-handle.md).
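A sketch of the full life cycle, with hypothetical file names:

```
FILE HANDLE mydata /NAME='measurements.txt'.
DATA LIST FILE=mydata LIST /x y.
CLOSE FILE HANDLE mydata.
FILE HANDLE mydata /NAME='followup.txt'.
```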
These names (synonyms) refer to the file that contains instructions
that tell PSPP what to do. The syntax file's name is specified on
the PSPP command line. Syntax files can also be read with
- [`INCLUDE`](../../commands/utilities/include.md) or
- [`INSERT`](../../commands/utilities/insert.md).
+ [`INCLUDE`](../../commands/include.md) or
+ [`INSERT`](../../commands/insert.md).
* data file
Data files contain raw data in text or binary format. Data can
time to repair might be related to the time between failures and the
duty cycle of the equipment. The p-value of 0.1 was chosen for this
investigation. In order to investigate this hypothesis, the
-[`REGRESSION`](../../commands/statistics/regression.md) command was
-used. This command not only tests if the variables are related, but
-also identifies the potential linear relationship.
+[`REGRESSION`](../../commands/regression.md) command was used. This
+command not only tests if the variables are related, but also
+identifies the potential linear relationship.
A first attempt includes `duty_cycle`:
There are several things to note about this example.
- The words `data list list` are an example of the [`DATA
- LIST`](../../commands/data-io/data-list.md). command, which tells
- PSPP to prepare for reading data. The word `list` intentionally
- appears twice. The first occurrence is part of the `DATA LIST`
- call, whilst the second tells PSPP that the data is to be read as
- free format data with one record per line.
+ LIST`](../../commands/data-list.md) command, which tells PSPP to
+ prepare for reading data. The word `list` intentionally appears
+ twice. The first occurrence is part of the `DATA LIST` call, whilst
+ the second tells PSPP that the data is to be read as free format
+ data with one record per line.
Usually this manual shows command names and other fixed elements of
syntax in upper case, but case doesn't matter in most parts of
Note that the numeric variable height is displayed to 2 decimal
places, because the format for that variable is `F8.2`. For a
complete description of the `LIST` command, see
-[`LIST`](../../commands/data-io/list.md).
+[`LIST`](../../commands/list.md).
## Reading data from a text file
variables and their formats, since this information is not contained
in the file. It is also possible to specify the file's character
encoding and other parameters. For full details refer to [`DATA
-LIST`](../../commands/data-io/data-list.md).
+LIST`](../../commands/data-list.md).
## Reading data from a pre-prepared PSPP file
Sometimes it's useful to be able to read data from comma-separated
text, from spreadsheets, databases or other sources. In these
-instances you should use the [`GET
-DATA`](../../commands/spss-io/get-data.md) command.
+instances you should use the [`GET DATA`](../../commands/get-data.md)
+command.
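For example, a sketch that reads a hypothetical comma-separated file with a header row:

```
GET DATA
    /TYPE=TXT
    /FILE='survey.csv'
    /DELIMITERS=','
    /FIRSTCASE=2
    /VARIABLES=age F3.0
               income F8.2.
```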
## Exiting PSPP
procedures which can be used to help identify data which might be
incorrect.
-The [`DESCRIPTIVES`](../../commands/statistics/descriptives.md)
-command is used to generate simple linear statistics for a dataset.
-It is also useful for identifying potential problems in the data. The
-example file `physiology.sav` contains a number of physiological
-measurements of a sample of healthy adults selected at random.
-However, the data entry clerk made a number of mistakes when entering
-the data. The following example illustrates the use of `DESCRIPTIVES`
-to screen this data and identify the erroneous values:
+The [`DESCRIPTIVES`](../../commands/descriptives.md) command is used
+to generate simple linear statistics for a dataset. It is also useful
+for identifying potential problems in the data. The example file
+`physiology.sav` contains a number of physiological measurements of a
+sample of healthy adults selected at random. However, the data entry
+clerk made a number of mistakes when entering the data. The following
+example illustrates the use of `DESCRIPTIVES` to screen this data and
+identify the erroneous values:
```
PSPP> get file='/usr/local/share/pspp/examples/physiology.sav'.
values](../../language/basics/missing-values.md), whereby data can
assume the special value 'SYSMIS', and will be disregarded in future
analysis. You can set the two suspect values to the `SYSMIS` value
-using the [`RECODE`](../../commands/data/recode.md) command.
+using the [`RECODE`](../../commands/recode.md) command.
```
PSPP> recode height (179 = SYSMIS).
```
However an easier and more elegant way uses the
-[`COMPUTE`](../../commands/data/compute.md) command. Since the
-variables are Likert variables in the range (1 ... 5), subtracting
-their value from 6 has the effect of inverting them:
+[`COMPUTE`](../../commands/compute.md) command. Since the variables
+are Likert variables in the range (1 ... 5), subtracting their value
+from 6 has the effect of inverting them:
```
compute VAR = 6 - VAR.
labels of variables v1, v3 and v4, you will notice that they ask very
similar questions. One would therefore expect the values of these
variables (after recoding) to closely follow one another, and we can
-test that with the
-[`RELIABILITY`](../../commands/statistics/reliability.md) command.
-The following example shows a PSPP session where the user recodes
-negatively scaled variables and then requests reliability statistics
-for v1, v3, and v4.
+test that with the [`RELIABILITY`](../../commands/reliability.md)
+command. The following example shows a PSPP session where the user
+recodes negatively scaled variables and then requests reliability
+statistics for v1, v3, and v4.
```
PSPP> get file='/usr/local/share/pspp/examples/hotel.sav'.
variable is normally distributed and thus safe for linear analysis.
In the event that no suitable transformation can be found, then it
would be worth considering an appropriate non-parametric test instead
-of a linear one. See [`NPAR
-TESTS`](../../commands/statistics/npar-tests.md), for information
-about non-parametric tests.
+of a linear one. See [`NPAR TESTS`](../../commands/npar-tests.md)
+for information about non-parametric tests.
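For instance, a Wilcoxon signed-rank test on a hypothetical pair of before/after variables:

```
NPAR TESTS /WILCOXON = score_before WITH score_after (PAIRED).
```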
seconds are always written as two digits.
* `char file_label[64];`
- [File label](commands/utilities/file-label.md) declared by the user,
- if any. Padded on the right with spaces.
+ [File label](commands/file-label.md) declared by the user, if any.
+ Padded on the right with spaces.
## Record 1: Variables Record
`command`.
`include-leading-zero` is the
-[`LEADZERO`](../commands/utilities/set.md#leadzero) setting for the
-table, where false is `OFF` (the default) and true is `ON`.
+[`LEADZERO`](../commands/set.md#leadzero) setting for the table, where
+false is `OFF` (the default) and true is `ON`.
`missing` is the character used to indicate that a cell contains a
missing value. It is always observed as `.`.