@quotation
@strong{Please note:} PSPP is not even close to completion.
-Only a few actual statistical procedures are implemented. PSPP
+Only a few statistical procedures are implemented. PSPP
is a work in progress.
@end quotation
* Missing Observations:: Handling missing observations.
* Variables:: The unit of data storage.
* Files:: Files used by PSPP.
+* File Handles:: How files are named.
* BNF:: How command syntax is described.
@end menu
@cindex language, tokens
@cindex tokens
@cindex lexical analysis
-@cindex lexemes
PSPP divides most syntax file lines into series of short chunks
-called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}. These
-tokens are then grouped to form commands, each of which tells
+called @dfn{tokens}.
+Tokens are then grouped to form commands, each of which tells
PSPP to take some action---read in data, write out data, perform
-a statistical procedure, etc. The process of dividing input into tokens
-is @dfn{tokenization}, or @dfn{lexical analysis}. Each type of token is
+a statistical procedure, etc. Each type of token is
described below.
-@cindex delimiters
-@cindex whitespace
-Tokens must be separated from each other by @dfn{delimiters}.
-Delimiters include whitespace (spaces, tabs, carriage returns, line
-feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
-operators (plus, minus, times, divide, etc.) Note that while whitespace
-only separates tokens, other delimiters are tokens in themselves.
-
@table @strong
@cindex identifiers
@item Identifiers
-Identifiers are names that specify variable names, commands, or command
-details.
-
-@itemize @bullet
-@item
-The first character in an identifier must be a letter, @samp{#}, or
-@samp{@@}. Some system identifiers begin with @samp{$}, but
-user-defined variables' names may not begin with @samp{$}.
-
-@item
-The remaining characters in the identifier must be letters, digits, or
-one of the following special characters:
+Identifiers are names that typically specify variables, commands, or
+subcommands. The first character in an identifier must be a letter,
+@samp{#}, or @samp{@@}. The remaining characters in the identifier
+must be letters, digits, or one of the following special characters:
@example
-. _ $ # @@
+@center @. _ $ # @@
@end example
-@item
-@cindex variable names
-@cindex names, variable
-Variable names may be up any length up to 64 bytes long.
-
-
-@item
@cindex case-sensitivity
-Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
-@code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
-representations of the same identifier.
+Identifiers may be any length, but only the first 64 bytes are
+significant. Identifiers are not case-sensitive: @code{foobar},
+@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
+different representations of the same identifier.
-@item
-@cindex keywords
-Identifiers other than variable names may be abbreviated to their first
-3 characters if this abbreviation is unambiguous. These identifiers are
-often called @dfn{keywords}. (Unique abbreviations of 3 or more
-characters are also accepted: @samp{FRE}, @samp{FREQ}, and
-@samp{FREQUENCIES} are equivalent when the last is a keyword.)
-
-@item
-Whether an identifier is a keyword depends on the context.
-
-@item
-@cindex keywords, reserved
-@cindex reserved keywords
-Some keywords are reserved. These keywords may not be used in any
-context besides those explicitly described in this manual. The reserved
-keywords are:
+@cindex identifiers, reserved
+@cindex reserved identifiers
+Some identifiers are reserved. Reserved identifiers may not be used
+in any context besides those explicitly described in this manual. The
+reserved identifiers are:
@example
-ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
+@center ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
@end example
-@item
-Since keywords are identifiers, all the rules for identifiers apply.
-Specifically, they must be delimited as are other identifiers:
-@code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
-variable name.
-@end itemize
+@item Keywords
+Keywords are a subclass of identifiers that form a fixed part of
+command syntax. For example, command and subcommand names are
+keywords. Keywords may be abbreviated to their first 3 characters if
+this abbreviation is unambiguous. (Unique abbreviations of 3 or more
+characters are also accepted: @samp{FRE}, @samp{FREQ}, and
+@samp{FREQUENCIES} are equivalent when the last is a keyword.)
-@cindex @samp{.}
-@cindex period
-@cindex variable names, ending with period
-@strong{Caution:} It is legal to end a variable name with a period, but
-@emph{don't do it!} The variable name will be misinterpreted when it is
-the final token on a line: @code{FOO.} will be divided into two separate
-tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
-@xref{Commands, , Forming commands of tokens}.
+Reserved identifiers are always used as keywords. Other identifiers
+may be used both as keywords and as user-defined identifiers, such as
+variable names.
@item Numbers
@cindex numbers
@cindex integers
@cindex reals
-Numbers may be specified as integers or reals. Integers are internally
-converted into reals. Scientific notation is not supported. Here are
-some examples of valid numbers:
+Numbers are expressed in decimal. A decimal point is optional.
+Numbers may be expressed in scientific notation by adding @samp{e} and
+a base-10 exponent, so that @samp{1.234e3} has the value 1234. Here
+are some more examples of valid numbers:
@example
-1234 3.14159265359 .707106781185 8945.
+-5 3.14159265359 1e100 -.707 8945.
@end example
-@strong{Caution:} The last example will be interpreted as two tokens,
-@samp{8945} and @samp{.}, if it is the last token on a line.
+Negative numbers are expressed with a @samp{-} prefix. However, in
+situations where a literal @samp{-} token is expected, what appears to
+be a negative number is treated as @samp{-} followed by a positive
+number.
+
+No white space is allowed within a number token, except for horizontal
+white space between @samp{-} and the rest of the number.
+
+The last example above, @samp{8945.}, will be interpreted as two
+tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
+@xref{Commands, , Forming commands of tokens}.
@item Strings
@cindex strings
@cindex @samp{'}
@cindex @samp{"}
@cindex case-sensitivity
-Strings are literal sequences of characters enclosed in pairs of single
-quotes (@samp{'}) or double quotes (@samp{"}).
-
-@itemize @bullet
-@item
-Whitespace and case of letters @emph{are} significant inside strings.
-@item
-Whitespace characters inside a string are not delimiters.
-@item
-To include single-quote characters in a string, enclose the string in
-double quotes.
-@item
-To include double-quote characters in a string, enclose the string in
-single quotes.
-@item
-It is not possible to put both single- and double-quote characters
-inside one string.
-@end itemize
-
-@item Hexstrings
-@cindex hexstrings
-Hexstrings are string variants that use hex digits to specify
-characters.
-
-@itemize @bullet
-@item
-A hexstring may be used anywhere that an ordinary string is allowed.
-
-@item
-@cindex @samp{X'}
-@cindex @samp{'}
-A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
-
-@cindex whitespace
-@item
-No whitespace is allowed between the initial @samp{X} and @samp{'}.
-
-@item
-Double quotes @samp{"} may be used in place of single quotes @samp{'} if
-done in both places.
-
-@item
-Each pair of hex digits is internally changed into a single character
-with the given value.
-
-@item
-If there is an odd number of hex digits, the missing last digit is
-assumed to be @samp{0}.
-
-@item
-@cindex portability
-@strong{Please note:} Use of hexstrings is nonportable because the same
-numeric values are associated with different glyphs by different
-operating systems. Therefore, their use should be confined to syntax
-files that will not be widely distributed.
+Strings are literal sequences of characters enclosed in pairs of
+single quotes (@samp{'}) or double quotes (@samp{"}). To include the
+character used for quoting in the string, double it, e.g.@:
+@samp{'it''s an apostrophe'}. White space and case of letters are
+significant inside strings.
+
+Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
+'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
+splitting a single string across multiple source lines. The maximum
+length of a string, after concatenation, is 255 characters.
+
+Strings may also be expressed as hexadecimal, octal, or binary
+character values by prefixing the initial quote character by @samp{X},
+@samp{O}, or @samp{B} or their lowercase equivalents. Each pair,
+triplet, or octet of characters, according to the radix, is
+transformed into a single character with the given value. If there is
+an incomplete group of characters, the missing final digits are
+assumed to be @samp{0}. These forms of strings are nonportable
+because numeric values are associated with different characters by
+different operating systems. Therefore, their use should be confined
+to syntax files that will not be widely distributed.
-@item
@cindex characters, reserved
@cindex 0
-@cindex whitespace
-@strong{Please note also:} The character with value 00 is reserved for
+@cindex white space
+The character with value 00 is reserved for
internal use by PSPP. Its use in strings causes an error and
-replacement with a blank space (in ASCII, hex 20, decimal 32).
-@end itemize
-
-@item Punctuation
-@cindex punctuation
-Punctuation separates tokens; punctuators are delimiters. These are the
-punctuation characters:
+replacement by a space character.
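+
+As a brief illustration of the string forms described above, each of
+the following lines is a valid string. The hexadecimal example
+assumes an ASCII-based character set, in which it would be equivalent
+to @samp{'ABC'}:
+
+@example
+'it''s an apostrophe'
+"say ""hello"""
+'first part, ' + "second part"
+X'414243'
+@end example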
-@example
-, / = ( )
-@end example
-
-@item Operators
+@item Punctuators and Operators
+@cindex punctuators
@cindex operators
-Operators describe mathematical operations. Some operators are delimiters:
+These tokens are the punctuators and operators:
@example
-( ) + - * / **
+@center , / = ( ) + - * ** < <= <> > >= ~= & | .
@end example
-Many of the above operators are also punctuators. Punctuators are
-distinguished from operators by context.
-
-The other operators are all reserved keywords. None of these are
-delimiters:
-
-@example
-AND EQ GE GT LE LT NE OR
-@end example
-
-@item Terminal Dot
-@cindex terminal dot
-@cindex dot, terminal
-@cindex period
-@cindex @samp{.}
-A period (@samp{.}) at the end of a line (except for whitespace) is one
-type of a @dfn{terminal dot}, although not every terminal dot is a
-period at the end of a line. @xref{Commands, , Forming commands of
-tokens}. A period is a terminal dot @emph{only}
-when it is at the end of a line; otherwise it is part of a
-floating-point number. (A period outside a number in the middle of a
-line is an error.)
-
-@quotation
-@cindex terminal dot, changing
-@cindex dot, terminal, changing
-@strong{Please note:} The character used for the @dfn{terminal dot}
-can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}). This
-is strongly discouraged, and throughout all the remainder of this
-manual it will be assumed that the default setting is in effect.
-@end quotation
-
+Most of these appear within the syntax of commands, but the period
+(@samp{.}) punctuator is used only at the end of a command. It is a
+punctuator only as the last character on a line (except white space).
+When it is the last non-space character on a line, a period is not
+treated as part of another token, even if it would otherwise be part
+of, e.g.@:, an identifier or a floating-point number.
+
+Actually, the character that ends a command can be changed with
+@cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
+doing so. Throughout the remainder of this manual we will assume that
+the default setting is in effect.
@end table
@node Commands, Types of Commands, Tokens, Language
@cindex language, command structure
@cindex commands, structure
-Most PSPP commands share a common structure, diagrammed below:
-
-@example
-@var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
-@end example
-
-@cindex @samp{[ ]}
-In the above, rather daunting, expression, pairs of square brackets
-(@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
-indicate parts of the syntax that vary from command to command.
-Ellipses (@samp{...}) indicate that the preceding part may be repeated
-an arbitrary number of times. Let's pick apart what it says above:
-
-@itemize @bullet
-@cindex commands, names
-@item
-A command begins with a command name of one or more keywords, such as
-@cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}. @var{cmd}
-may be abbreviated to its first word if that is unambiguous; each word
-in @var{cmd} may be abbreviated to a unique prefix of three or more
-characters as described above.
-
-@cindex subcommands
-@item
-The command name may be followed by one or more @dfn{subcommands}:
-
-@itemize @minus
-@item
-Each subcommand begins with a unique keyword, indicated by @var{sbc}
-above. This is analogous to the command name.
-
-@item
-The subcommand name is optionally followed by an equals sign (@samp{=}).
-
-@item
-Some subcommands accept a series of one or more specifications
-(@var{spec}), optionally separated by commas.
-
-@item
-Each subcommand must be separated from the next (if any) by a forward
-slash (@samp{/}).
-@end itemize
-
-@cindex dot, terminal
-@cindex terminal dot
-@item
-Each command must be terminated with a @dfn{terminal dot}.
-The terminal dot may be given one of three ways:
-
-@itemize @minus
-@item
-(most commonly) A period character at the very end of a line, as
-described above.
-
-@item
-(only if NULLINE is on: @xref{SET, , Setting user preferences}, for more
-details.) A completely blank line.
-
-@item
-(in batch mode only) Any line that is not indented from the left side of
-the page causes a terminal dot to be inserted before that line.
-Therefore, each command begins with a line that is flush left, followed
-by zero or more lines that are indented one or more characters from the
-left margin.
-
-In batch mode, PSPP will ignore a plus sign, minus sign, or period
-(@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
-line. Any of these characters as the first character on a line will
-begin a new command. This allows for visual indentation of a command
-without that command being considered part of the previous command.
-
-PSPP is in batch mode when it is reading input from a file, rather
-than from an interactive user. Note that the other forms of the
-terminal dot may also be used in batch mode.
-
-Sometimes, one encounters syntax files that are intended to be
-interpreted in interactive mode rather than batch mode (for instance,
-this can happen if a session log file is used directly as a syntax
-file). When this occurs, use the @samp{-i} command line option to force
-interpretation in interactive mode (@pxref{Language control options}).
-@end itemize
-@end itemize
-
-PSPP ignores empty commands when they are generated by the above
-rules. Note that, as a consequence of these rules, each command must
-begin on a new line.
+Most PSPP commands share a common structure. A command begins with a
+command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
+CASES}. The command name may be abbreviated to its first word, and
+each word in the command name may be abbreviated to its first three
+or more characters, where these abbreviations are unambiguous.
+
+The command name may be followed by one or more @dfn{subcommands}.
+Each subcommand begins with a subcommand name, which may be
+abbreviated to its first three letters. Some subcommands accept a
+series of one or more specifications, which follow the subcommand
+name, optionally separated from it by an equals sign
+(@samp{=}). Specifications may be separated from each other
+by commas or spaces. Each subcommand must be separated from the next (if any)
+by a forward slash (@samp{/}).
+
+There are multiple ways to mark the end of a command. The most common
+way is to end the last line of the command with a period (@samp{.}) as
+described in the previous section (@pxref{Tokens}). A blank line, or
+one that consists only of white space or comments, also ends a command
+by default, although you can use the NULLINE subcommand of @cmd{SET}
+to disable this feature (@pxref{SET}).
+
+In batch mode only, that is, when reading commands from a file instead
+of an interactive user, any line that contains a non-space character
+in the leftmost column begins a new command. Thus, each command
+consists of a flush-left line followed by any number of lines indented
+from the left margin. In this mode, a plus or minus sign
+(@samp{+}, @samp{@minus{}}) as the first character
+in a line is ignored and causes that line to begin a new command,
+which allows for visual indentation of a command without that command
+being considered part of the previous command.
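+
+For illustration, the two commands below are intended to be
+equivalent; the variable names @code{X1} and @code{X2} are
+hypothetical. The first spells out every name, continues across
+indented lines, and ends with a period. The second abbreviates the
+command and subcommand names and separates specifications with commas:
+
+@example
+FREQUENCIES
+    /VARIABLES=X1 X2
+    /STATISTICS=MEAN MEDIAN.
+FREQ /VAR=X1, X2 /STAT=MEAN, MEDIAN.
+@end example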
@node Types of Commands, Order of Commands, Commands, Language
@section Types of Commands
@item File definition commands
@cindex file definition commands
Give instructions for reading data from text files or from special
-binary ``system files''. Most of these commands discard any previous
-data or variables to replace it with the new data and
-variables. At least one must appear before the first command in any of
+binary ``system files''. Most of these commands replace any previous
+data or variables with new data or
+variables. At least one file definition command must appear before the first command in any of
the categories below. @xref{Data Input and Output}.
@item Input program commands
@cindex input program commands
-Though rarely used, these provide powerful tools for reading data files
+Though rarely used, these provide tools for reading data files
in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
@item Transformations
@item Restricted transformations
@cindex restricted transformations
-Same as transformations for most purposes. @xref{Order of Commands}, for a
-detailed description of the differences.
+Transformations that cannot appear in certain contexts. @xref{Order
+of Commands}, for details.
@item Procedures
@cindex procedures
@cindex commands, ordering
@cindex order of commands
-PSPP does not place many restrictions on ordering of commands.
-The main restriction is that variables must be defined with one of the
-file-definition commands before they are otherwise referred to.
+PSPP does not place many restrictions on ordering of commands. The
+main restriction is that variables must be defined before they are otherwise
+referenced. This section describes the details of command ordering,
+but most users will have no need to refer to them.
-Of course, there are specific rules, for those who are interested.
PSPP possesses five internal states, called initial, INPUT PROGRAM,
FILE TYPE, transformation, and procedure states. (Please note the
distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
@emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
-PSPP starts up in the initial state. Each successful completion
+PSPP starts in the initial state. Each successful completion
of a command may cause a state transition. Each type of command has its
own rules for state transitions:
@item Utility commands
@itemize @bullet
@item
-Legal in all states.
+Valid in any state.
@item
Do not cause state transitions. Exception: when @cmd{N OF CASES}
is executed in the procedure state, it causes a transition to the
@item @cmd{DATA LIST}
@itemize @bullet
@item
-Legal in all states.
+Valid in any state.
@item
When executed in the initial or procedure state, causes a transition to
the transformation state.
PSPP includes special support for unknown numeric data values.
Missing observations are assigned a special value, called the
@dfn{system-missing value}. This ``value'' actually indicates the
-absence of value; it means that the actual value is unknown. Procedures
+absence of a value; it means that the actual value is unknown. Procedures
automatically exclude from analyses those observations or cases that
-have missing values. Whether single observations or entire cases are
-excluded depends on the procedure.
+have missing values. Details of missing value exclusion depend on the
+procedure and can often be controlled by the user; refer to
+descriptions of individual procedures for details.
The system-missing value exists only for numeric variables. String
variables always have a defined value, even if it is only a string of
@table @strong
@item Name
-This is an identifier. Each variable must have a different name.
+An identifier, up to 64 bytes long. Each variable must have a different name.
@xref{Tokens}.
+Some system variable names begin with @samp{$}, but user-defined
+variables' names may not begin with @samp{$}.
+
+@cindex @samp{.}
+@cindex period
+@cindex variable names, ending with period
+The final character in a variable name should not be @samp{.}, because
+such an identifier will be misinterpreted when it is the final token
+on a line: @code{FOO.} will be divided into two separate tokens,
+@samp{FOO} and @samp{.}, indicating end-of-command. @xref{Tokens}.
+
+@cindex @samp{_}
+The final character in a variable name should not be @samp{_}, because
+some such identifiers are used for special purposes by PSPP
+procedures.
+
+As with all PSPP identifiers, variable names are not case-sensitive.
+PSPP capitalizes variable names on output the same way they were
+capitalized at their point of definition in the input.
+
@cindex variables, type
@cindex type of variables
@item Type
can be used in many procedures where @dfn{long string variables} (those
with widths greater than 8) are not allowed.
-@quotation
-@strong{Please note:} Certain systems may consider strings longer than 8
+Certain systems may consider strings longer than 8
characters to be short strings. Eight characters represents a minimum
figure for the maximum length of a short string.
-@end quotation
@item Position
Variables in the dictionary are arranged in a specific order.
@cindex variables, system
There are seven system variables. These are not like ordinary
-variables, as they are not stored in each case. They can only be used
+variables because system variables are not always stored. They can be used only
in expressions. These system variables, whose values and output formats
cannot be modified, are described below.
@cindex TO convention
@cindex convention, TO
-There are several ways to specify a set of variables:
-
-@enumerate
-@item
-(Most commonly.) List the variable names one after another, optionally
-separating them by commas.
-
-@cindex @code{TO}
-@item
-(This method cannot be used on commands that define the dictionary, such
-as @cmd{DATA LIST}.) The syntax is the names of two existing variables,
-separated by the reserved keyword @code{TO}. The meaning is to include
-every variable in the dictionary between and including the variables
-specified. For instance, if the dictionary contains six variables with
-the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
+To refer to a set of variables, list their names one after another.
+Optionally, their names may be separated by commas. To include a
+range of variables from the dictionary in the list, write the name of
+the first and last variable in the range, separated by @code{TO}. For
+instance, if the dictionary contains six variables with the names
+@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
variables @code{X2}, @code{GOAL}, and @code{MET}.
-@item
-(This method can be used only on commands that define the dictionary,
-such as @cmd{DATA LIST}.) It is used to define sequences of variables
-that end in consecutive integers. The syntax is two identifiers that
-end in numbers. This method is best illustrated with examples:
-
-@itemize @bullet
-@item
-The syntax @code{X1 TO X5} defines 5 variables:
-
-@itemize @minus
-@item
-X1
-@item
-X2
-@item
-X3
-@item
-X4
-@item
-X5
-@end itemize
-
-@item
-The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
-
-@itemize @minus
-@item
-ITEM0008
-@item
-ITEM0009
-@item
-ITEM0010
-@item
-ITEM0011
-@item
-ITEM0012
-@item
-ITEM0013
-@end itemize
-
-@item
-Each of the syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3}
-are invalid, although for different reasons, which should be evident.
-@end itemize
-
-Note that after a set of variables has been defined with @cmd{DATA LIST}
-or another command with this method, the same set can be referenced on
+Commands that define variables, such as @cmd{DATA LIST}, give
+@code{TO} an alternate meaning. With these commands, @code{TO} defines
+sequences of variables whose names end in consecutive integers. The
+syntax is two identifiers that begin with the same root and end with
+numbers, separated by @code{TO}. The syntax @code{X1 TO X5} defines 5
+variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
+@code{X5}. The syntax @code{ITEM0008 TO ITEM0013} defines 6
+variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
+@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}. The syntaxes
+@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
+
+After a set of variables has been defined with @cmd{DATA LIST} or
+another command with this method, the same set can be referenced on
later commands using the same syntax.
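+
+For example, assuming a hypothetical data set with an identifier
+followed by five numbered items, the first command below defines
+@code{ID} and @code{X1} through @code{X5}, and the second then refers
+to @code{X2}, @code{X3}, and @code{X4} by their order in the
+dictionary:
+
+@example
+DATA LIST LIST /ID X1 TO X5.
+DESCRIPTIVES /VARIABLES=X2 TO X4.
+@end example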
-@item
-The above methods can be combined, either one after another or delimited
-by commas. For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
-is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
-is individually legal.
-@end enumerate
-
@node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
@subsection Input and Output Formats
included then it is assumed to be 0. Some formats do not allow @var{d}
to be specified.
-When an input format is specified on @cmd{DATA LIST} or another
-command, then
-it is converted to an output format for the purposes of @cmd{PRINT}
-and other
-data output commands. For most purposes, input and output formats are
-the same; the salient differences are described below.
+When @cmd{DATA LIST} or another command specifies an input format,
+that format is converted to an output format for the purposes of
+@cmd{PRINT} and other data output commands. For most purposes, input
+and output formats are the same; the salient differences are described
+below.
Below are listed the input and output formats supported by PSPP. If an
input format is mapped to a different output format by default, then
@item PIB @result{} F: 1 <= iw,ow <= 8
Positive integer binary format. The field is interpreted as a
fixed-point positive binary number. The location of the decimal point
-is implied. Endianness is teh same as the host machine.
+is implied. Endianness is the same as the host machine.
The default output format follows the rules for IB format.
@table @dfn
@item leader
-All formats accept an optional whitespace leader.
+All formats accept an optional white space leader.
@item day
An integer between 1 and 31 representing the day of month.
An integer representing a number of days.
@item date-delimiter
-One or more characters of whitespace or the following characters:
+One or more characters of white space or the following characters:
@code{- / . ,}
@item month
The letters @samp{wk} in any case.
@item time-delimiter
-At least one characters of whitespace or @samp{:} or @samp{.}.
+At least one character of white space or @samp{:} or @samp{.}.
@item hour
An integer greater than 0 representing an hour.
At least the first two characters of an English day word.
@item spaces
-Any amount or no amount of whitespace.
+Any amount or no amount of white space.
@item sign
An optional positive or negative sign.
@item trailer
-All formats accept an optional whitespace trailer.
+All formats accept an optional white space trailer.
@end table
The date input formats are strung together from the above pieces. On
@item DATETIMEw.d: 17 <= iw,ow <= 40
Date and time format. Input format: leader + day + date-delimiter +
-month + date-delimiter + yaer + time-delimiter + hour24 + time-delimiter
+month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
+ minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
@var{w} > 19 then seconds @samp{:SS} is added. If @var{w} > 22 and
@var{d} > 0 then fractional seconds @samp{.SS} are added.
names begin with an octothorpe (@samp{#}).
Scratch variables have the same properties as variables left with
-@cmd{LEAVE}:
-they retain their values between cases, and for the first case they are
-initialized to 0 or blanks. They have the additional property that they
-are deleted before the execution of any procedure. For this reason,
-scratch variables can't be used for analysis. To obtain the same
-effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
-value into an ordinary variable, then analysis that variable.
-
-@node Files, BNF, Variables, Language
+@cmd{LEAVE}: they retain their values between cases, and for the first
+case they are initialized to 0 or blanks. They have the additional
+property that they are deleted before the execution of any procedure.
+For this reason, scratch variables can't be used for analysis. To use
+a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
+to copy its value into an ordinary variable, then use that ordinary
+variable in the analysis.
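+
+A minimal sketch of this pattern, assuming existing numeric variables
+with the hypothetical names @code{X1} and @code{X2}:
+
+@example
+COMPUTE #TMP = X1 + X2.
+COMPUTE TOTAL = #TMP.
+DESCRIPTIVES /VARIABLES=TOTAL.
+@end example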
+
+@node Files
@section Files Used by PSPP
PSPP makes use of many files each time it runs. Some of these it
@cindex syntax file
@item command file
@itemx syntax file
-These names (synonyms) refer to the file that contains instructions to
-PSPP that tell it what to do. The syntax file's name is specified on
-the PSPP command line. Syntax files can also be pulled in with
+These names (synonyms) refer to the file that contains instructions
+that tell PSPP what to do. The syntax file's name is specified on
+the PSPP command line. Syntax files can also be read with
@cmd{INCLUDE} (@pxref{INCLUDE}).
@cindex file, data
@cindex data file
@item data file
-Data files contain raw data in ASCII format suitable for being read in
-by @cmd{DATA LIST}. Data can be embedded in the syntax
-file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
-syntax file a data file too.
+Data files contain raw data in text or binary format. Data can also
+be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
@cindex file, output
@cindex output file
@cindex active file
@cindex file, active
@item active file
-The active file is the ``file'' on which all PSPP procedures
-are performed. The active file contains variable definitions and
-cases. The active file is not necessarily a disk file: it is stored
-in memory if there is room.
+The active file is the ``file'' on which all PSPP procedures are
+performed. The active file consists of a dictionary and a set of cases.
+The active file is not necessarily a disk file: it is stored in memory
+if there is room.
+
+@cindex system file
+@cindex file, system
+@item system file
+System files are binary files that store a dictionary and a set of
+cases. @cmd{GET} and @cmd{SAVE} read and write system files.
+
+@cindex portable file
+@cindex file, portable
+@item portable file
+Portable files are files in a text-based format that store a dictionary
+and a set of cases. @cmd{IMPORT} and @cmd{EXPORT} read and write
+portable files.
+
+@cindex scratch file
+@cindex file, scratch
+@item scratch file
+Scratch files consist of a dictionary and cases and may be stored in
+memory or on disk. Most procedures that act on a system file or
+portable file can use a scratch file instead. The contents of scratch
+files persist within a single PSPP session only. @cmd{GET} and
+@cmd{SAVE} can be used to read and write scratch files. Scratch files
+are a PSPP extension.
@end table
-@node BNF, , Files, Language
+@node File Handles
+@section File Handles
+@cindex file handles
+
+A @dfn{file handle} is a reference to a data file, system file, portable
+file, or scratch file. Most often, a file handle is specified as the
+name of a file as a string, that is, enclosed within @samp{'} or
+@samp{"}.
+
+PSPP also supports declaring named file handles with the @cmd{FILE
+HANDLE} command. This command associates an identifier of your choice
+(the file handle's name) with a file. Later, the file handle name can
+be substituted for the name of the file. When PSPP syntax accesses a
+file multiple times, declaring a named file handle simplifies updating
+the syntax later to use a different file. Use of @cmd{FILE HANDLE} is
+also required to read data files in binary formats. @xref{FILE HANDLE},
+for more information.
+
+PSPP assumes that a file handle name that begins with @samp{#} refers to
+a scratch file, unless the name has already been declared on @cmd{FILE
+HANDLE} to refer to another kind of file. A scratch file is similar to
+a system file, except that it persists only for the duration of a given
+PSPP session. Most commands that read or write a system or portable
+file, such as @cmd{GET} and @cmd{SAVE}, also accept scratch file
+handles. Scratch file handles may also be declared explicitly with
+@cmd{FILE HANDLE}. Scratch files are a PSPP extension.
+
+In some circumstances, PSPP must distinguish whether a file handle
+refers to a system file or a portable file. When this is necessary to
+read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
+PSPP uses the file's contents to decide. In the context of writing a
+file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, PSPP
+decides based on the file's name: if it ends in @samp{.por} (with any
+capitalization), then PSPP writes a portable file; otherwise, PSPP
+writes a system file.
+
+INLINE is reserved as a file handle name. It refers to the ``data
+file'' embedded into the syntax file between @cmd{BEGIN DATA} and
+@cmd{END DATA}. @xref{BEGIN DATA}, for more information.
+
+The file to which a file handle refers may be reassigned on a later
+@cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
+HANDLE}. The @cmd{CLOSE FILE HANDLE} command is also useful to free the
+storage associated with a scratch file. @xref{CLOSE FILE HANDLE}, for
+more information.
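+
+As a sketch of how these pieces fit together (the handle names
+@code{survey} and @code{#copy} and the file names are hypothetical),
+the following declares a named file handle, reads a system file
+through it, keeps a copy in a scratch file, writes a portable file,
+and then closes the named handle:
+
+@example
+FILE HANDLE survey /NAME='survey.sav'.
+GET FILE=survey.
+SAVE OUTFILE=#copy.
+SAVE OUTFILE='survey.por'.
+CLOSE FILE HANDLE survey.
+@end example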
+
+@node BNF
@section Backus-Naur Form
@cindex BNF
@cindex Backus-Naur Form
@item
Words in all-uppercase are PSPP keyword tokens. In BNF, these are
often called @dfn{terminals}. There are some special terminals, which
-are actually written in lowercase for clarity:
+are written in lowercase for clarity:
@table @asis
@cindex @code{number}
Operators and punctuators.
@cindex @code{.}
-@cindex terminal dot
-@cindex dot, terminal
@item @code{.}
-The terminal dot. This is not necessarily an actual dot in the syntax
-file: @xref{Commands}, for more details.
+The end of the command. This is not necessarily an actual dot in the
+syntax file: @xref{Commands}, for more details.
@end table
@item
@end table
@item
-@cindex @code{::=}
@cindex ``is defined as''
@cindex productions
@samp{::=} means ``is defined as''. The left side of @samp{::=} gives