[Introduction](introduction.md)
[License](license.md)
+# Language Syntax
+
+- [Basics](language/basics/index.md)
+ - [Tokens](language/basics/tokens.md)
+ - [Forming Commands](language/basics/commands.md)
+ - [Syntax Variants](language/basics/syntax-variants.md)
+ - [Handling Missing Values](language/basics/missing-values.md)
+- [Datasets](language/datasets/index.md)
+ - [Variables](language/datasets/variables.md)
+ - [Variable Lists](language/datasets/variable-lists.md)
+ - [Input and Output Formats](language/datasets/formats/index.md)
+ - [Basic Numeric Formats](language/datasets/formats/basic.md)
+
# Developer Documentation
- [System File Format](system-file/index.md)
--- /dev/null
+# Forming Commands
+
+Most PSPP commands share a common structure. A command begins with a
+command name, such as `FREQUENCIES`, `DATA LIST`, or `N OF CASES`. The
+command name may be abbreviated to its first word, and each word in the
+command name may be abbreviated to its first three or more characters,
+where these abbreviations are unambiguous.
+
+The command name may be followed by one or more "subcommands". Each
+subcommand begins with a subcommand name, which may be abbreviated to
+its first three letters. Some subcommands accept a series of one or
+more specifications, which follow the subcommand name, optionally
+separated from it by an equals sign (`=`). Specifications may be
+separated from each other by commas or spaces. Each subcommand must
+be separated from the next (if any) by a forward slash (`/`).
+
+There are multiple ways to mark the end of a command. The most common
+way is to end the last line of the command with a period (`.`) as
+described in the previous section. A blank line, or one that consists
+only of white space or comments, also ends a command.
+
--- /dev/null
+This chapter discusses elements common to many PSPP commands. Later
+chapters describe individual commands in detail.
--- /dev/null
+# Handling Missing Values
+
+PSPP includes special support for unknown numeric data values. Missing
+observations are assigned a special value, called the "system-missing
+value". This "value" actually indicates the absence of a value; it
+means that the actual value is unknown. Procedures automatically
+exclude from analyses those observations or cases that have missing
+values. Details of missing value exclusion depend on the procedure and
+can often be controlled by the user; refer to descriptions of individual
+procedures for details.
+
+ The system-missing value exists only for numeric variables. String
+variables always have a defined value, even if it is only a string of
+spaces.
+
+ Variables, whether numeric or string, can have designated
+"user-missing values". Every user-missing value is an actual value for
+that variable. However, most of the time user-missing values are
+treated in the same way as the system-missing value.
--- /dev/null
+# Syntax Variants
+
+There are three variants of command syntax, which vary only in how they
+detect the end of one command and the start of the next.
+
+ In "interactive mode", which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line ends
+a command. A blank line also ends a command.
+
+ In "batch mode", an end-of-line period or a blank line also ends a
+command. Additionally, it treats any line that has a non-blank
+character in the leftmost column as beginning a new command. Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+ Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin a
+new command. This is most useful in batch mode, in which the first line
+of a new command could not otherwise be indented, but it is accepted
+regardless of syntax mode.
+
+ The default mode for reading commands from a file is "auto mode". It
+is the same as batch mode, except that a line with a non-blank in the
+leftmost column only starts a new command if that line begins with the
+name of a PSPP command. This correctly interprets most valid PSPP
+syntax files regardless of the syntax mode for which they are intended.
+
+ The `--interactive` (or `-i`) or `--batch` (or `-b`) options set the
+syntax mode for files listed on the PSPP command line.
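
The end-of-command rules for interactive and batch mode can be sketched in Python. This is only an illustration, not PSPP's lexer: `split_commands` is a hypothetical helper, it ignores comments and quoted strings, and it does not model auto mode.

```python
def split_commands(lines, batch=False):
    """Group syntax lines into commands (simplified sketch)."""
    commands, current = [], []

    def flush():
        # Finish the command collected so far, if any.
        if current:
            commands.append(" ".join(current))
            current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line.strip():
            flush()                      # a blank line ends a command
            continue
        if line[0] in "+-.":
            flush()                      # marker in column 1 starts a new command
            line = " " + line[1:]        # the marker itself is ignored
        elif batch and not line[0].isspace():
            flush()                      # batch mode: unindented line starts a command
        ends = line.endswith(".")        # period as last non-blank character
        text = line[:-1].strip() if ends else line.strip()
        if text:
            current.append(text)
        if ends:
            flush()
    flush()
    return commands
```

With the lines `FREQUENCIES`, `  /VARIABLES=x`, `DESCRIPTIVES x` and no periods, the sketch yields two commands in batch mode but a single run-on command in interactive mode, which is why batch-mode continuation lines must be indented.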
+
--- /dev/null
+# Tokens
+
+PSPP divides most syntax file lines into a series of short chunks called
+"tokens". Tokens are then grouped to form commands, each of which
+tells PSPP to take some action, such as reading in data, writing out
+data, or performing a statistical procedure. Each type of token is
+described below.
+
+## Identifiers
+
+Identifiers are names that typically specify variables, commands, or
+subcommands. The first character in an identifier must be a letter,
+`#`, or `@`. The remaining characters in the identifier must be
+letters, digits, or one of the following special characters:
+
+```
+. _ $ # @
+```
+
+Identifiers may be any length, but only the first 64 bytes are
+significant. Identifiers are not case-sensitive: `foobar`,
+`Foobar`, `FooBar`, `FOOBAR`, and `FoObaR` are different
+representations of the same identifier.
+
+Some identifiers are reserved. Reserved identifiers may not be
+used in any context besides those explicitly described in this
+manual. The reserved identifiers are:
+
+```
+ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
+```
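
The identifier rules above can be approximated in Python as follows. This is a sketch under stated assumptions: it handles ASCII letters only, it treats the 64-byte limit as 64 characters, and the helper names (`is_identifier`, `is_reserved`, `same_identifier`) are hypothetical, not part of PSPP.

```python
import re

# Reserved identifiers listed in the text.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# First character: letter, '#', or '@'; then letters, digits, or . _ $ # @
_IDENT = re.compile(r"[A-Za-z#@][A-Za-z0-9._$#@]*")

def is_identifier(text):
    """True if text follows the (ASCII-only) identifier pattern."""
    return _IDENT.fullmatch(text) is not None

def is_reserved(text):
    """True if text is one of the reserved identifiers, in any case."""
    return text.upper() in RESERVED

def same_identifier(a, b):
    """Compare case-insensitively on the first 64 characters only."""
    return a[:64].upper() == b[:64].upper()
```

For example, `same_identifier("foobar", "FoObaR")` is true, while `is_identifier("1abc")` is false because the first character is a digit.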
+
+## Keywords
+
+Keywords are a subclass of identifiers that form a fixed part of
+command syntax. For example, command and subcommand names are
+keywords. Keywords may be abbreviated to their first 3 characters
+if this abbreviation is unambiguous. (Unique abbreviations of 3 or
+more characters are also accepted: `FRE`, `FREQ`, and `FREQUENCIES`
+are equivalent when the last is a keyword.)
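
A minimal Python sketch of the abbreviation rule follows. It assumes a caller-supplied list of candidate keywords and treats an abbreviation as ambiguous when it is a prefix of more than one candidate; how PSPP itself resolves such cases may differ, and `match_keyword` is a hypothetical name.

```python
def match_keyword(word, keywords):
    """Return the keyword that `word` names or abbreviates, else None."""
    w = word.upper()
    # An exact match always wins.
    for keyword in keywords:
        if keyword.upper() == w:
            return keyword
    # Abbreviations need at least 3 characters.
    if len(w) < 3:
        return None
    hits = [k for k in keywords if k.upper().startswith(w)]
    # Accept only an unambiguous abbreviation.
    return hits[0] if len(hits) == 1 else None
```

With candidates `FREQUENCIES` and `DESCRIPTIVES`, both `FRE` and `freq` resolve to `FREQUENCIES`, while `FR` is rejected as too short.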
+
+Reserved identifiers are always used as keywords. Other
+identifiers may be used both as keywords and as user-defined
+identifiers, such as variable names.
+
+## Numbers
+
+Numbers are expressed in decimal. A decimal point is optional.
+Numbers may be expressed in scientific notation by adding `e` and a
+base-10 exponent, so that `1.234e3` has the value 1234. Here are
+some more examples of valid numbers:
+
+```
+-5 3.14159265359 1e100 -.707 8945.
+```
+
+Negative numbers are expressed with a `-` prefix. However, in
+situations where a literal `-` token is expected, what appears to
+be a negative number is treated as `-` followed by a positive
+number.
+
+No white space is allowed within a number token, except for
+horizontal white space between `-` and the rest of the number.
+
+The last example above, `8945.`, is interpreted as two tokens,
+`8945` and `.`, if it is the last token on a line. See [Forming
+Commands](commands.md).
+
+## Strings
+
+Strings are literal sequences of characters enclosed in pairs of
+single quotes (`'`) or double quotes (`"`). To include the
+character used for quoting in the string, double it, e.g. `'it''s
+an apostrophe'`. White space and case of letters are significant
+inside strings.
+
+Strings can be concatenated using `+`, so that `"a" + 'b' + 'c'` is
+equivalent to `'abc'`. So that a long string may be broken across
+lines, a line break may precede or follow, or both precede and
+follow, the `+`. (However, an entirely blank line preceding or
+following the `+` is interpreted as ending the current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by `x` or `X`. Regardless of
+the syntax file or active dataset's encoding, the hexadecimal
+digits in the string are interpreted as Unicode characters in UTF-8
+encoding.
+
+Individual Unicode code points may also be expressed by specifying
+the hexadecimal code point number in single or double quotes
+preceded by `u` or `U`. For example, Unicode code point U+1D11E,
+the musical G clef character, could be expressed as `U'1D11E'`.
+Invalid Unicode code points (above U+10FFFF or in between U+D800
+and U+DFFF) are not allowed.
+
+When strings are concatenated with `+`, each segment's prefix is
+considered individually. For example, `'The G clef symbol is:' +
+u"1d11e" + "."` inserts a G clef symbol in the middle of an
+otherwise plain text string.
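
The string forms described in this section can be illustrated with a small Python decoder. It is a sketch, not PSPP's lexer: `decode_segment` is a hypothetical helper that takes one already-isolated string token (plain, `x`-prefixed, or `u`-prefixed) and returns its text.

```python
def decode_segment(token):
    """Decode one quoted string token into its text (simplified)."""
    prefix = ""
    if token[0] in "xXuU":
        prefix, token = token[0].lower(), token[1:]
    quote = token[0]
    if quote not in "'\"" or len(token) < 2 or token[-1] != quote:
        raise ValueError("not a quoted string")
    # A doubled quote character stands for one literal quote.
    body = token[1:-1].replace(quote * 2, quote)
    if prefix == "x":
        # Hex digits give the bytes of UTF-8 encoded text.
        return bytes.fromhex(body).decode("utf-8")
    if prefix == "u":
        code_point = int(body, 16)
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("invalid Unicode code point")
        return chr(code_point)
    return body

# Concatenation with `+` decodes each segment with its own prefix.
segments = ["'The G clef symbol is:'", 'u"1d11e"', '"."']
text = "".join(decode_segment(s) for s in segments)
```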
+
+## Punctuators and Operators
+
+These tokens are the punctuators and operators:
+
+```
+, / = ( ) + - * / ** < <= <> > >= ~= & | .
+```
+
+Most of these appear within the syntax of commands, but the period
+(`.`) punctuator is used only at the end of a command. It is a
+punctuator only as the last character on a line (except white
+space). When it is the last non-space character on a line, a
+period is not treated as part of another token, even if it would
+otherwise be part of, e.g., an identifier or a floating-point
+number.
+
--- /dev/null
+# Basic Numeric Formats
+
+The basic numeric formats are used for input and output of real numbers
+in standard or scientific notation. The following table shows an
+example of how each format displays positive and negative numbers with
+the default decimal point setting:
+
+|Format |3141.59 |-3141.59|
+|:--------------|--------------:|---------:|
+|`F8.2` |` 3141.59` |`-3141.59`|
+|`COMMA9.2` |` 3,141.59` |`-3,141.59`|
+|`DOT9.2` |` 3.141,59` |`-3.141,59`|
+|`DOLLAR10.2` |` $3,141.59` |`-$3,141.59`|
+|`PCT9.2` |` 3141.59%` |`-3141.59%`|
+|`E8.1` |` 3.1E+003` |`-3.1E+003`|
+
+ On output, numbers in `F` format are expressed in standard decimal
+notation with the requested number of decimal places. The other formats
+output some variation on this style:
+
+ - Numbers in `COMMA` format are additionally grouped every three digits
+ by inserting a grouping character. The grouping character is
+ ordinarily a comma, but it can be changed to a period (see
+ `SET DECIMAL`).
+
+ - `DOT` format is like `COMMA` format, but it interchanges the role of
+ the decimal point and grouping characters. That is, the current
+ grouping character is used as a decimal point and vice versa.
+
+ - `DOLLAR` format is like `COMMA` format, but it prefixes the number with
+ `$`.
+
+ - `PCT` format is like `F` format, but adds `%` after the number.
+
+ - The `E` format always produces output in scientific notation.
+
+ On input, the basic numeric formats accept positive and negative
+numbers in standard decimal notation or scientific notation. Leading
+and trailing spaces are allowed. An empty or all-spaces field, or one
+that contains only a single period, is treated as the system-missing
+value.
+
+ In scientific notation, the exponent may be introduced by a sign (`+`
+or `-`), or by one of the letters `e` or `d` (in uppercase or
+lowercase), or by a letter followed by a sign. A single space may
+follow the letter or the sign or both.
+
+ On fixed-format `DATA LIST` (see `DATA LIST FIXED`) and in a few
+other contexts, decimals are implied when the field does not contain a
+decimal point. In `F6.5` format, for example, the field `314159` is taken
+as the value 3.14159 with implied decimals. Decimals are never implied
+if an explicit decimal point is present or if scientific notation is
+used.
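
As an illustration of implied decimals, the following Python sketch reads a field the way the paragraph above describes. It is an assumption-laden simplification: `read_f_field` is a hypothetical helper, `None` stands in for the system-missing value, and the exponent variants using `d` or a bare sign are not handled.

```python
from decimal import Decimal

def read_f_field(field, decimals):
    """Read an F-format field, applying implied decimals when allowed."""
    text = field.strip()
    if text in ("", "."):
        return None                      # stands in for system-missing
    digits = text.lstrip("+-")
    if digits.isascii() and digits.isdigit():
        # No decimal point and no exponent: shift by the implied decimals.
        return Decimal(text).scaleb(-decimals)
    # Explicit decimal point or exponent: no decimals are implied.
    return Decimal(text)
```

Under these assumptions, `read_f_field("314159", 5)` gives 3.14159, while `read_f_field("3.5", 5)` stays 3.5 because the decimal point is explicit.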
+
+ `E` and `F` formats accept the basic syntax already described. The other
+formats allow some additional variations:
+
+- `COMMA`, `DOLLAR`, and `DOT` formats ignore grouping characters within
+ the integer part of the input field. The identity of the grouping
+ character depends on the format.
+
+- `DOLLAR` format allows a dollar sign to precede the number. In a
+ negative number, the dollar sign may precede or follow the minus
+ sign.
+
+- `PCT` format allows a percent sign to follow the number.
+
+ All of the basic number formats have a maximum field width of 40 and
+accept no more than 16 decimal places, on both input and output. Some
+additional restrictions apply:
+
+- As input formats, the basic numeric formats allow no more decimal
+ places than the field width. As output formats, the field width
+ must be greater than the number of decimal places; that is, large
+ enough to allow for a decimal point and the number of requested
+ decimal places. `DOLLAR` and `PCT` formats must allow an additional
+ column for `$` or `%`.
+
+- The default output format for a given input format increases the
+ field width enough to make room for optional input characters. If
+ an input format calls for decimal places, the width is increased by
+ 1 to make room for an implied decimal point. `COMMA`, `DOT`, and
+ `DOLLAR` formats also increase the output width to make room for
+ grouping characters. `DOLLAR` and `PCT` further increase the output
+ field width by 1 to make room for `$` or `%`. The increased output
+ width is capped at 40, the maximum field width.
+
+- The `E` format is exceptional. For output, `E` format has a minimum
+ width of 7 plus the number of decimal places. The default output
+ format for an `E` input format is an `E` format with at least 3 decimal
+ places and thus a minimum width of 10.
+
+More details of basic numeric output formatting are given below:
+
+- Output rounds to nearest, with ties rounded away from zero. Thus,
+ 2.5 is output as `3` in `F1.0` format, and -1.125 as `-1.13` in `F5.2`
+ format.
+
+- The system-missing value is output as a period in a field of
+ spaces, placed in the decimal point's position, or in the rightmost
+ column if no decimal places are requested. A period is used even
+ if the decimal point character is a comma.
+
+- A number that does not fill its field is right-justified within the
+ field.
+
+- A number that is too large for its field causes decimal places to be
+ dropped to make room. If dropping decimals does not make enough
+ room, scientific notation is used if the field is wide enough. If
+ a number does not fit in the field, even in scientific notation,
+ the overflow is indicated by filling the field with asterisks
+ (`*`).
+
+- `COMMA`, `DOT`, and `DOLLAR` formats insert grouping characters only if
+ space is available for all of them. Grouping characters are never
+ inserted when all decimal places must be dropped. Thus, 1234.56 in
+ `COMMA5.2` format is output as ` 1235` without a comma, even though
+ there is room for one, because all decimal places were dropped.
+
+- `DOLLAR` or `PCT` format drops the `$` or `%` only if the number would
+ not fit at all without it. Scientific notation with `$` or `%` is
+ preferred to ordinary decimal notation without it.
+
+- Except in scientific notation, a decimal point is included only
+ when it is followed by a digit. If the integer part of the number
+ being output is 0, and a decimal point is included, then PSPP
+ ordinarily drops the zero before the decimal point. However, in
+ `F`, `COMMA`, or `DOT` formats, PSPP keeps the zero if `SET
+ LEADZERO` is set to `ON`.
+
+ In scientific notation, the number always includes a decimal point,
+ even if it is not followed by a digit.
+
+- A negative number includes a minus sign only in the presence of a
+ nonzero digit: -0.01 is output as `-.01` in `F4.2` format but as
+ ` .0` in `F4.1` format. Thus, a "negative zero" never includes a
+ minus sign.
+
+- In negative numbers output in `DOLLAR` format, the dollar sign
+ follows the negative sign. Thus, -9.99 in `DOLLAR6.2` format is
+ output as `-$9.99`.
+
+- In scientific notation, the exponent is output as `E` followed by
+ `+` or `-` and exactly three digits. Numbers with magnitude less
+ than 10**-999 or larger than 10**999 are not supported by most
+ computers, but if they are supported then their output is
+ considered to overflow the field and they are output as asterisks.
+
+- On most computers, no more than 15 decimal digits are significant
+ in output, even if more are printed. In any case, output precision
+ cannot be any higher than input precision; few data sets are
+ accurate to 15 digits of precision. Unavoidable loss of precision
+ in intermediate calculations may also reduce precision of output.
+
+- Special values such as infinities and "not a number" values are
+ usually converted to the system-missing value before printing. In
+ a few circumstances, these values are output directly. In fields
+ of width 3 or greater, special values are output as however many
+ characters fit from `+Infinity` or `-Infinity` for infinities, from
+ `NaN` for "not a number," or from `Unknown` for other values (if
+ any are supported by the system). In fields under 3 columns wide,
+ special values are output as asterisks.
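
Several of the `F` format output rules listed above (rounding ties away from zero, right-justification, the dropped leading zero, no minus sign on a negative zero, and the system-missing period) can be sketched in Python. This is an illustration only: `format_f` is a hypothetical helper, `None` stands in for the system-missing value, and overflow is simplified to asterisks without first dropping decimals or trying scientific notation.

```python
from decimal import Decimal, ROUND_HALF_UP

def format_f(value, width, decimals):
    """Format a number roughly like F<width>.<decimals> output."""
    if value is None:
        # Period in the decimal point's column, or rightmost if no decimals.
        return ("." + " " * decimals).rjust(width)
    step = Decimal(1).scaleb(-decimals)          # e.g. 0.01 for 2 decimals
    # ROUND_HALF_UP rounds ties away from zero.
    rounded = Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)
    text = format(abs(rounded), "f")
    if text.startswith("0."):
        text = text[1:]                          # drop the leading zero
    if rounded < 0:                              # false for a negative zero
        text = "-" + text
    if len(text) > width:
        return "*" * width                       # simplified overflow handling
    return text.rjust(width)
```

For instance, `format_f("-0.01", 4, 2)` gives `-.01`, while `format_f("-0.01", 4, 1)` rounds to zero and therefore shows no minus sign.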
--- /dev/null
+# Input and Output Formats
+
+An "input format" describes how to interpret the contents of an input
+field as a number or a string. It might specify that the field contains
+an ordinary decimal number, a time or date, a number in binary or
+hexadecimal notation, or one of several other notations. Input formats
+are used by commands such as `DATA LIST` that read data or syntax files
+into the PSPP active dataset.
+
+ Every input format corresponds to a default "output format" that
+specifies the formatting used when the value is output later. It is
+always possible to explicitly specify an output format that resembles
+the input format. Usually, this is the default, but in cases where the
+input format is unfriendly to human readability, such as binary or
+hexadecimal formats, the default output format is an easier-to-read
+decimal format.
+
+ Every variable has two output formats, called its "print format" and
+"write format". Print formats are used in most output contexts; write
+formats are used only by `WRITE`. Newly created
+variables have identical print and write formats, and `FORMATS`, the
+most commonly used command for changing formats, sets
+both of them to the same value as well. Thus, most of the time, the
+distinction between print and write formats is unimportant.
+
+ Input and output formats are specified to PSPP with a "format
+specification" of the form `TypeW` or `TypeW.D`, where `Type` is one
+of the format types described later, `W` is a field width measured in
+columns, and `D` is an optional number of decimal places. If `D` is
+omitted, a value of 0 is assumed. Some formats do not allow a nonzero
+`D` to be specified.
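
A format specification of this shape is straightforward to take apart. The Python sketch below is a hypothetical helper (`parse_format`), not PSPP code; it assumes format type names consist of letters only and does not check whether a given type permits decimals.

```python
import re

_SPEC = re.compile(r"([A-Za-z]+)(\d+)(?:\.(\d+))?")

def parse_format(spec):
    """Split 'TypeW' or 'TypeW.D' into (type, width, decimals)."""
    match = _SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a format specification: {spec!r}")
    type_name, width, decimals = match.groups()
    # An omitted D is taken as 0.
    return type_name.upper(), int(width), int(decimals or 0)
```

So `parse_format("F8.2")` yields `("F", 8, 2)` and `parse_format("comma9")` yields `("COMMA", 9, 0)`.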
--- /dev/null
+# Datasets
+
+PSPP works with data organized into "datasets". A dataset consists of a
+set of "variables", which taken together are said to form a
+"dictionary", and one or more "cases", each of which has one value for
+each variable.
+
+ At any given time PSPP has exactly one distinguished dataset, called
+the "active dataset". Most PSPP commands work only with the active
+dataset. In addition to the active dataset, PSPP also supports any
+number of additional open datasets. The `DATASET` commands can choose a
+new active dataset from among those that are open, as well as create and
+destroy datasets.
--- /dev/null
+# Variable Lists
+
+To refer to a set of variables, list their names one after another.
+Optionally, their names may be separated by commas. To include a
+range of variables from the dictionary in the list, write the name of
+the first and last variable in the range, separated by `TO`. For
+instance, if the dictionary contains six variables with the names
+`ID`, `X1`, `X2`, `GOAL`, `MET`, and `NEXTGOAL`, in that order, then
+`X2 TO MET` would include variables `X2`, `GOAL`, and `MET`.
+
+ Commands that define variables, such as `DATA LIST`, give `TO` an
+alternate meaning. With these commands, `TO` defines sequences of
+variables whose names end in consecutive integers. The syntax is two
+identifiers that begin with the same root and end with numbers,
+separated by `TO`. The syntax `X1 TO X5` defines 5 variables, named
+`X1`, `X2`, `X3`, `X4`, and `X5`. The syntax `ITEM0008 TO ITEM0013`
+defines 6 variables, named `ITEM0008`, `ITEM0009`, `ITEM0010`,
+`ITEM0011`, `ITEM0012`, and `ITEM0013`. The syntaxes `QUES001 TO
+QUES9` and `QUES6 TO QUES3` are invalid.
+
+ After a set of variables has been defined with `DATA LIST` or
+another command with this method, the same set can be referenced on
+later commands using the same syntax.
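
Both meanings of `TO` can be sketched in Python. The helper names are hypothetical, and one rule is a guess: the text only says that `QUES001 TO QUES9` is invalid, so the sketch assumes the reason is that the last name has fewer digits than the first.

```python
import re

_NUMBERED = re.compile(r"(.*?)(\d+)")   # root, then trailing digits

def dictionary_range(dictionary, first, last):
    """`first TO last` as a range of existing variables, in dictionary order."""
    names = [name.upper() for name in dictionary]
    i, j = names.index(first.upper()), names.index(last.upper())
    if i > j:
        raise ValueError("range runs backward")
    return dictionary[i:j + 1]

def expand_to(first, last):
    """`first TO last` as a sequence of new, consecutively numbered names."""
    m1, m2 = _NUMBERED.fullmatch(first), _NUMBERED.fullmatch(last)
    if m1 is None or m2 is None:
        raise ValueError("both names must end in a number")
    (root1, digits1), (root2, digits2) = m1.groups(), m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("names must share the same root")
    start, end = int(digits1), int(digits2)
    if start > end:
        raise ValueError("numbers must be in ascending order")
    if len(digits2) < len(digits1):      # assumed rule, see lead-in
        raise ValueError("last name has fewer digits than the first")
    width = len(digits1)                 # keep the first name's zero padding
    return [f"{root1}{n:0{width}d}" for n in range(start, end + 1)]
```

With the dictionary from the first example, `dictionary_range(["ID", "X1", "X2", "GOAL", "MET", "NEXTGOAL"], "X2", "MET")` returns `X2`, `GOAL`, and `MET`.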
+
--- /dev/null
+# Attributes of Variables
+
+Each variable has a number of attributes, including:
+
+* Name
+ An identifier, up to 64 bytes long. Each variable must have a
+ different name. See [Tokens](../basics/tokens.md).
+
+ Some system variable names begin with `$`, but user-defined
+ variables' names may not begin with `$`.
+
+ The final character in a variable name should not be `.`, because
+ such an identifier will be misinterpreted when it is the final
+ token on a line: `FOO.` is divided into two separate tokens, `FOO`
+ and `.`, indicating end-of-command. See [Tokens](../basics/tokens.md).
+
+ The final character in a variable name should not be `_`, because
+ some such identifiers are used for special purposes by PSPP
+ procedures.
+
+ As with all PSPP identifiers, variable names are not
+ case-sensitive. PSPP capitalizes variable names on output the same
+ way they were capitalized at their point of definition in the
+ input.
+
+* Type
+ Numeric or string.
+
+* Width (string variables only)
+ String variables with a width of 8
+ characters or fewer are called "short string variables". Short
+ string variables may be used in a few contexts where "long string
+ variables" (those with widths greater than 8) are not allowed.
+
+* Position
+ Variables in the dictionary are arranged in a specific order.
+ `DISPLAY` can be used to show this order.
+
+* Initialization
+ Either reinitialized to 0 or spaces for each case, or left at its
+ existing value. See `LEAVE`.
+
+* Missing values
+ Optionally, up to three values, or a range of values, or a specific
+ value plus a range, can be specified as "user-missing values".
+ There is also a "system-missing value" that is assigned to an
+ observation when there is no other obvious value for that
+ observation. Observations with missing values are automatically
+ excluded from analyses. User-missing values are actual data
+ values, while the system-missing value is not a value at all.
+ See [Handling Missing Values](../basics/missing-values.md).
+
+* Variable label
+ A string that describes the variable. See `VARIABLE LABELS`.
+
+* Value label
+ Optionally, these associate each possible value of the variable
+ with a string. See `VALUE LABELS`.
+
+* Print format
+ Display width, format, and (for numeric variables) number of
+ decimal places. This attribute does not affect how data are
+ stored, just how they are displayed. Example: a width of 8, with 2
+ decimal places. See [Input and Output Formats](formats/index.md).
+
+* Write format
+ Similar to print format, but used by the `WRITE` command.
+
+* Measurement level
+ One of the following:
+
+ - *Nominal*: Each value of a nominal variable represents a distinct
+ category. The possible categories are finite and often have value
+ labels. The order of categories is not significant. Political
+ parties, US states, and yes/no choices are nominal. Numeric and
+ string variables can be nominal.
+
+ - *Ordinal*: Ordinal variables also represent distinct categories, but
+ their values are arranged according to some natural order. Likert
+ scales, e.g. from strongly disagree to strongly agree, are
+ ordinal. Data grouped into ranges, e.g. age groups or income
+ groups, are ordinal. Both numeric and string variables can be
+ ordinal. String values are ordered alphabetically, so letter
+ grades from A to F will work as expected, but `poor`,
+ `satisfactory`, `excellent` will not.
+
+ - *Scale*: Scale variables are ones for which differences and ratios
+ are meaningful. These are often values which have a natural unit
+ attached, such as age in years, income in dollars, or distance in
+ miles. Only numeric variables can be scale variables.
+
+ Variables created by `COMPUTE` and similar transformations,
+ obtained from external sources, etc., initially have an unknown
+ measurement level. Any procedure that reads the data will then
+ assign a default measurement level. PSPP can assign some defaults
+ without reading the data:
+
+ - Nominal, if it's a string variable.
+
+ - Nominal, if the variable has a WKDAY or MONTH print format.
+
+ - Scale, if the variable has a DOLLAR, CCA through CCE, or time
+ or date print format.
+
+ Otherwise, PSPP reads the data and decides based on its
+ distribution:
+
+ - Nominal, if all observations are missing.
+
+ - Scale, if one or more valid observations are noninteger or
+ negative.
+
+ - Scale, if no valid observation is less than 10.
+
+ - Scale, if the variable has 24 or more unique valid values.
+ The value 24 is the default and can be adjusted (see
+ `SET SCALEMIN`).
+
+ Finally, if none of the above is true, PSPP assigns the variable a
+ nominal measurement level.
+
+* Custom attributes
+ User-defined associations between names and values. See
+ `VARIABLE ATTRIBUTE`.
+
+* Role
+ The intended role of a variable for use in dialog boxes in
+ graphical user interfaces. See `VARIABLE ROLE`.
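
The default measurement level rules listed under "Measurement level" can be read as a decision procedure. The Python sketch below follows the order given there; it is not PSPP's implementation. `default_measurement_level` is a hypothetical name, `None` stands in for any missing observation, and the set of time and date format names is an assumption about which formats the text means.

```python
# Assumed names of the "DOLLAR, CCA through CCE, or time or date" formats.
SCALE_FORMATS = {
    "DOLLAR", "CCA", "CCB", "CCC", "CCD", "CCE",
    "DATE", "ADATE", "EDATE", "JDATE", "SDATE", "QYR", "MOYR", "WKYR",
    "DATETIME", "YMDHMS", "MTIME", "TIME", "DTIME",
}

def default_measurement_level(values, is_string=False,
                              print_format="F", scalemin=24):
    """Pick 'nominal' or 'scale' following the documented order of tests."""
    # Decisions that do not require reading the data.
    if is_string:
        return "nominal"
    fmt = print_format.upper()
    if fmt in ("WKDAY", "MONTH"):
        return "nominal"
    if fmt in SCALE_FORMATS:
        return "scale"
    # Decisions based on the distribution of valid observations.
    valid = [v for v in values if v is not None]
    if not valid:
        return "nominal"
    if any(v < 0 or v != int(v) for v in valid):
        return "scale"
    if min(valid) >= 10:
        return "scale"
    if len(set(valid)) >= scalemin:
        return "scale"
    return "nominal"
```

For example, the values 1, 2, 3 come out nominal, while 1.5 and 2 come out scale because one observation is not an integer.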
+
--- /dev/null
+This chapter discusses elements common to many PSPP commands. Later
+chapters describe individual commands in detail.
License](https://www.gnu.org/licenses/licenses.html#GPL), version 3 or
later. This manual is licensed under the [GNU Free Documentation
License](https://www.gnu.org/licenses/licenses.html#FDL), version 1.3
-or later.
+or later; with no Invariant Sections, no Front-Cover Texts, and no
+Back-Cover Texts.