Increment version to 0.7.9 to send to Translation Project.

[pspp-builds.git] / doc / language.texi
diff --git a/doc/language.texi b/doc/language.texi

index 810037100bba6fbfe347d1968a5c19c1922eca13..a5ef4b69d6e338084847b97278a1357d8478ebaa 100644 (file)
--- a/doc/language.texi
+++ b/doc/language.texi
@@ -1,353 +1,221 @@
-@node Language, Expressions, Invocation, Top
+@node Language
  @chapter The PSPP language
  @cindex language, PSPP
  @cindex PSPP, language
  
-@quotation
-@strong{Please note:} PSPP is not even close to completion.
-Only a few actual statistical procedures are implemented.  PSPP
-is a work in progress.
-@end quotation
-
  This chapter discusses elements common to many PSPP commands.
  Later chapters will describe individual commands in detail.
  
  @menu
  * Tokens::                      Characters combine to form tokens.
  * Commands::                    Tokens combine to form commands.
+* Syntax Variants::             Batch vs. Interactive mode
  * Types of Commands::           Commands come in several flavors.
  * Order of Commands::           Commands combine to form syntax files.
  * Missing Observations::        Handling missing observations.
-* Variables::                   The unit of data storage.
+* Datasets::                    Data organization.
  * Files::                       Files used by PSPP.
+* File Handles::                How files are named.
  * BNF::                         How command syntax is described.
  @end menu
  
-@node Tokens, Commands, Language, Language
+
+@node Tokens
  @section Tokens
  @cindex language, lexical analysis
  @cindex language, tokens
  @cindex tokens
  @cindex lexical analysis
-@cindex lexemes
  
  PSPP divides most syntax file lines into series of short chunks
-called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}.  These
-tokens are then grouped to form commands, each of which tells
+called @dfn{tokens}.
+Tokens are then grouped to form commands, each of which tells
  PSPP to take some action---read in data, write out data, perform
-a statistical procedure, etc.  The process of dividing input into tokens
-is @dfn{tokenization}, or @dfn{lexical analysis}.  Each type of token is
+a statistical procedure, etc.  Each type of token is
  described below.
  
-@cindex delimiters
-@cindex whitespace
-Tokens must be separated from each other by @dfn{delimiters}.
-Delimiters include whitespace (spaces, tabs, carriage returns, line
-feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
-operators (plus, minus, times, divide, etc.)  Note that while whitespace
-only separates tokens, other delimiters are tokens in themselves.
-
  @table @strong
  @cindex identifiers
  @item Identifiers
-Identifiers are names that specify variable names, commands, or command
-details.
-
-@itemize @bullet
-@item
-The first character in an identifier must be a letter, @samp{#}, or
-@samp{@@}.  Some system identifiers begin with @samp{$}, but
-user-defined variables' names may not begin with @samp{$}.
-
-@item
-The remaining characters in the identifier must be letters, digits, or
-one of the following special characters: 
+Identifiers are names that typically specify variables, commands, or
+subcommands.  The first character in an identifier must be a letter,
+@samp{#}, or @samp{@@}.  The remaining characters in the identifier
+must be letters, digits, or one of the following special characters:
  
  @example
-.  _  $  #  @@
+@center @.  _  $  #  @@
  @end example
  
-@item
-@cindex variable names
-@cindex names, variable
-Variable names may be up any length up to 64 bytes long.
-
-
-@item
  @cindex case-sensitivity
-Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
-@code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
-representations of the same identifier.
+Identifiers may be any length, but only the first 64 bytes are
+significant.  Identifiers are not case-sensitive: @code{foobar},
+@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
+different representations of the same identifier.
  
-@item
-@cindex keywords
-Identifiers other than variable names may be abbreviated to their first
-3 characters if this abbreviation is unambiguous.  These identifiers are
-often called @dfn{keywords}.  (Unique abbreviations of 3 or more
-characters are also accepted: @samp{FRE}, @samp{FREQ}, and
-@samp{FREQUENCIES} are equivalent when the last is a keyword.)
-
-@item
-Whether an identifier is a keyword depends on the context.
-
-@item
-@cindex keywords, reserved
-@cindex reserved keywords
-Some keywords are reserved.  These keywords may not be used in any
-context besides those explicitly described in this manual.  The reserved
-keywords are:
+@cindex identifiers, reserved
+@cindex reserved identifiers
+Some identifiers are reserved.  Reserved identifiers may not be used
+in any context besides those explicitly described in this manual.  The
+reserved identifiers are:
  
  @example
-ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
+@center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
  @end example
  
-@item 
-Since keywords are identifiers, all the rules for identifiers apply.
-Specifically, they must be delimited as are other identifiers:
-@code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
-variable name.
-@end itemize
+@item Keywords
+Keywords are a subclass of identifiers that form a fixed part of
+command syntax.  For example, command and subcommand names are
+keywords.  Keywords may be abbreviated to their first 3 characters if
+this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
+characters are also accepted: @samp{FRE}, @samp{FREQ}, and
+@samp{FREQUENCIES} are equivalent when the last is a keyword.)
  
-@cindex @samp{.}
-@cindex period
-@cindex variable names, ending with period
-@strong{Caution:} It is legal to end a variable name with a period, but
-@emph{don't do it!}  The variable name will be misinterpreted when it is
-the final token on a line: @code{FOO.} will be divided into two separate
-tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
-@xref{Commands, , Forming commands of tokens}.
+Reserved identifiers are always used as keywords.  Other identifiers
+may be used both as keywords and as user-defined identifiers, such as
+variable names.
  
  @item Numbers
  @cindex numbers
  @cindex integers
  @cindex reals
-Numbers may be specified as integers or reals.  Integers are internally
-converted into reals.  Scientific notation is not supported.  Here are
-some examples of valid numbers:
+Numbers are expressed in decimal.  A decimal point is optional.
+Numbers may be expressed in scientific notation by adding @samp{e} and
+a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
+are some more examples of valid numbers:
  
  @example
-1234  3.14159265359  .707106781185  8945.
+-5  3.14159265359  1e100  -.707  8945.
  @end example
  
-@strong{Caution:} The last example will be interpreted as two tokens,
-@samp{8945} and @samp{.}, if it is the last token on a line.
+Negative numbers are expressed with a @samp{-} prefix.  However, in
+situations where a literal @samp{-} token is expected, what appears to
+be a negative number is treated as @samp{-} followed by a positive
+number.
+
+No white space is allowed within a number token, except for horizontal
+white space between @samp{-} and the rest of the number.
+
+The last example above, @samp{8945.} will be interpreted as two
+tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
+@xref{Commands, , Forming commands of tokens}.
  
  @item Strings
  @cindex strings
  @cindex @samp{'}
  @cindex @samp{"}
  @cindex case-sensitivity
-Strings are literal sequences of characters enclosed in pairs of single
-quotes (@samp{'}) or double quotes (@samp{"}).
-
-@itemize @bullet
-@item
-Whitespace and case of letters @emph{are} significant inside strings.
-@item
-Whitespace characters inside a string are not delimiters.
-@item
-To include single-quote characters in a string, enclose the string in
-double quotes.
-@item
-To include double-quote characters in a string, enclose the string in
-single quotes.
-@item
-It is not possible to put both single- and double-quote characters
-inside one string.
-@end itemize
-
-@item Hexstrings
-@cindex hexstrings
-Hexstrings are string variants that use hex digits to specify
-characters.
-
-@itemize @bullet
-@item
-A hexstring may be used anywhere that an ordinary string is allowed.
-
-@item
-@cindex @samp{X'}
-@cindex @samp{'}
-A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
-
-@cindex whitespace
-@item
-No whitespace is allowed between the initial @samp{X} and @samp{'}.
-
-@item
-Double quotes @samp{"} may be used in place of single quotes @samp{'} if
-done in both places.
-
-@item
-Each pair of hex digits is internally changed into a single character
-with the given value.
-
-@item
-If there is an odd number of hex digits, the missing last digit is
-assumed to be @samp{0}.
-
-@item
-@cindex portability
-@strong{Please note:} Use of hexstrings is nonportable because the same
-numeric values are associated with different glyphs by different
-operating systems.  Therefore, their use should be confined to syntax
-files that will not be widely distributed.
-
-@item
-@cindex characters, reserved
-@cindex 0
-@cindex whitespace
-@strong{Please note also:} The character with value 00 is reserved for
-internal use by PSPP.  Its use in strings causes an error and
-replacement with a blank space (in ASCII, hex 20, decimal 32).
-@end itemize
-
-@item Punctuation
-@cindex punctuation
-Punctuation separates tokens; punctuators are delimiters.  These are the
-punctuation characters:
-
-@example
-,  /  =  (  )
-@end example
-
-@item Operators
+Strings are literal sequences of characters enclosed in pairs of
+single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
+character used for quoting in the string, double it, e.g.@:
+@samp{'it''s an apostrophe'}.  White space and case of letters are
+significant inside strings.
+
+Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
+'c'} is equivalent to @samp{'abc'}.  So that a long string may be
+broken across lines, a line break may precede or follow, or both
+precede and follow, the @samp{+}.  (However, an entirely blank line
+preceding or following the @samp{+} is interpreted as ending the
+current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by @samp{x} or @samp{X}.
+Regardless of the syntax file or active dataset's encoding, the
+hexadecimal digits in the string are interpreted as Unicode characters
+in UTF-8 encoding.
+
+Individual Unicode code points may also be expressed by specifying the
+hexadecimal code point number in single or double quotes preceded by
+@samp{u} or @samp{U}.  For example, Unicode code point U+1D11E, the
+musical G clef character, could be expressed as @code{U'1D11E'}.
+Invalid Unicode code points (above U+10FFFF or in between U+D800 and
+U+DFFF) are not allowed.
+
+When strings are concatenated with @samp{+}, each segment's prefix is
+considered individually.  For example, @code{'The G clef symbol is:' +
+u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
+plain text string.
+
+@item Punctuators and Operators
+@cindex punctuators
  @cindex operators
-Operators describe mathematical operations.  Some operators are delimiters:
-
-@example
-(  )  +  -  *  /  **
-@end example
-
-Many of the above operators are also punctuators.  Punctuators are
-distinguished from operators by context. 
-
-The other operators are all reserved keywords.  None of these are
-delimiters:
+These tokens are the punctuators and operators:
  
  @example
-AND  EQ  GE  GT  LE  LT  NE  OR
+@center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
  @end example
  
-@item Terminal Dot
-@cindex terminal dot
-@cindex dot, terminal
-@cindex period
-@cindex @samp{.}
-A period (@samp{.}) at the end of a line (except for whitespace) is one
-type of a @dfn{terminal dot}, although not every terminal dot is a
-period at the end of a line.  @xref{Commands, , Forming commands of
-tokens}.  A period is a terminal dot @emph{only}
-when it is at the end of a line; otherwise it is part of a
-floating-point number.  (A period outside a number in the middle of a
-line is an error.)
-
-@quotation
-@cindex terminal dot, changing
-@cindex dot, terminal, changing
-@strong{Please note:} The character used for the @dfn{terminal dot}
-can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}).  This
-is strongly discouraged, and throughout all the remainder of this
-manual it will be assumed that the default setting is in effect.
-@end quotation
-
+Most of these appear within the syntax of commands, but the period
+(@samp{.}) punctuator is used only at the end of a command.  It is a
+punctuator only as the last character on a line (except white space).
+When it is the last non-space character on a line, a period is not
+treated as part of another token, even if it would otherwise be part
+of, e.g.@:, an identifier or a floating-point number.
  @end table
  
-@node Commands, Types of Commands, Tokens, Language
+@node Commands
  @section Forming commands of tokens
  
  @cindex PSPP, command structure
  @cindex language, command structure
  @cindex commands, structure
  
-Most PSPP commands share a common structure, diagrammed below:
-
-@example
-@var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
-@end example
-
-@cindex @samp{[  ]}
-In the above, rather daunting, expression, pairs of square brackets
-(@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
-indicate parts of the syntax that vary from command to command.
-Ellipses (@samp{...}) indicate that the preceding part may be repeated
-an arbitrary number of times.  Let's pick apart what it says above:
-
-@itemize @bullet
-@cindex commands, names
-@item
-A command begins with a command name of one or more keywords, such as
-@cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}.  @var{cmd}
-may be abbreviated to its first word if that is unambiguous; each word
-in @var{cmd} may be abbreviated to a unique prefix of three or more
-characters as described above.
-
-@cindex subcommands
-@item
-The command name may be followed by one or more @dfn{subcommands}:
-
-@itemize @minus
-@item
-Each subcommand begins with a unique keyword, indicated by @var{sbc}
-above.  This is analogous to the command name.
-
-@item
-The subcommand name is optionally followed by an equals sign (@samp{=}).
-
-@item
-Some subcommands accept a series of one or more specifications
-(@var{spec}), optionally separated by commas.
-
-@item
-Each subcommand must be separated from the next (if any) by a forward
-slash (@samp{/}).
-@end itemize
-
-@cindex dot, terminal
-@cindex terminal dot
-@item
-Each command must be terminated with a @dfn{terminal dot}.  
-The terminal dot may be given one of three ways:
-
-@itemize @minus
-@item
-(most commonly) A period character at the very end of a line, as
-described above.
-
-@item
-(only if NULLINE is on: @xref{SET, , Setting user preferences}, for more
-details.)  A completely blank line.
-
-@item
-(in batch mode only) Any line that is not indented from the left side of
-the page causes a terminal dot to be inserted before that line.
-Therefore, each command begins with a line that is flush left, followed
-by zero or more lines that are indented one or more characters from the
-left margin.
-
-In batch mode, PSPP will ignore a plus sign, minus sign, or period
-(@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
-line.  Any of these characters as the first character on a line will
-begin a new command.  This allows for visual indentation of a command
-without that command being considered part of the previous command.
-
-PSPP is in batch mode when it is reading input from a file, rather
-than from an interactive user.  Note that the other forms of the
-terminal dot may also be used in batch mode.
-
-Sometimes, one encounters syntax files that are intended to be
-interpreted in interactive mode rather than batch mode (for instance,
-this can happen if a session log file is used directly as a syntax
-file).  When this occurs, use the @samp{-i} command line option to force
-interpretation in interactive mode (@pxref{Language control options}).
-@end itemize
-@end itemize
-
-PSPP ignores empty commands when they are generated by the above
-rules.  Note that, as a consequence of these rules, each command must
-begin on a new line.
-
-@node Types of Commands, Order of Commands, Commands, Language
+Most PSPP commands share a common structure.  A command begins with a
+command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
+CASES}.  The command name may be abbreviated to its first word, and
+each word in the command name may be abbreviated to its first three
+or more characters, where these abbreviations are unambiguous.
+
+The command name may be followed by one or more @dfn{subcommands}.
+Each subcommand begins with a subcommand name, which may be
+abbreviated to its first three letters.  Some subcommands accept a
+series of one or more specifications, which follow the subcommand
+name, optionally separated from it by an equals sign
+(@samp{=}). Specifications may be separated from each other
+by commas or spaces.  Each subcommand must be separated from the next (if any)
+by a forward slash (@samp{/}).
+
+There are multiple ways to mark the end of a command.  The most common
+way is to end the last line of the command with a period (@samp{.}) as
+described in the previous section (@pxref{Tokens}).  A blank line, or
+one that consists only of white space or comments, also ends a command.
+
+@node Syntax Variants
+@section Syntax Variants
+
+@cindex Batch syntax
+@cindex Interactive syntax
+
+There are three variants of command syntax, which vary only in how
+they detect the end of one command and the start of the next.
+
+In @dfn{interactive mode}, which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line
+ends a command.  A blank line also ends a command.
+
+In @dfn{batch mode}, an end-of-line period or a blank line also ends a
+command.  Additionally, it treats any line that has a non-blank
+character in the leftmost column as beginning a new command.  Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin
+a new command.  This is most useful in batch mode, in which the first
+line of a new command could not otherwise be indented, but it is
+accepted regardless of syntax mode.
+
+The default mode for reading commands from a file is @dfn{auto mode}.
+It is the same as batch mode, except that a line with a non-blank in
+the leftmost column only starts a new command if that line begins with
+the name of a PSPP command.  This correctly interprets most valid PSPP
+syntax files regardless of the syntax mode for which they are
+intended.
+
+The @option{--interactive} (or @option{-i}) or @option{--batch} (or
+@option{-b}) options set the syntax mode for files listed on the PSPP
+command line.  @xref{Main Options}, for more details.
+
+@node Types of Commands
  @section Types of Commands
  
  Commands in PSPP are divided roughly into six categories:
@@ -362,14 +230,14 @@ commands}.
  @item File definition commands
  @cindex file definition commands
  Give instructions for reading data from text files or from special
-binary ``system files''.  Most of these commands discard any previous
-data or variables to replace it with the new data and
-variables.  At least one must appear before the first command in any of
+binary ``system files''.  Most of these commands replace any previous
+data or variables with new data or
+variables.  At least one file definition command must appear before the first command in any of
  the categories below.  @xref{Data Input and Output}.
  
  @item Input program commands
  @cindex input program commands
-Though rarely used, these provide powerful tools for reading data files
+Though rarely used, these provide tools for reading data files
  in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
  
  @item Transformations
@@ -379,33 +247,33 @@ are not carried out until a procedure is executed.
  
  @item Restricted transformations
  @cindex restricted transformations
-Same as transformations for most purposes.  @xref{Order of Commands}, for a
-detailed description of the differences.
+Transformations that cannot appear in certain contexts.  @xref{Order
+of Commands}, for details.
  
  @item Procedures
  @cindex procedures
  Analyze data, writing results of analyses to the listing file.  Cause
  transformations specified earlier in the file to be performed.  In a
  more general sense, a @dfn{procedure} is any command that causes the
-active file (the data) to be read.
+active dataset (the data) to be read.
  @end table
  
-@node Order of Commands, Missing Observations, Types of Commands, Language
+@node Order of Commands
  @section Order of Commands
  @cindex commands, ordering
  @cindex order of commands
  
-PSPP does not place many restrictions on ordering of commands.
-The main restriction is that variables must be defined with one of the
-file-definition commands before they are otherwise referred to.
+PSPP does not place many restrictions on ordering of commands.  The
+main restriction is that variables must be defined before they are otherwise
+referenced.  This section describes the details of command ordering,
+but most users will have no need to refer to them.
  
-Of course, there are specific rules, for those who are interested.
  PSPP possesses five internal states, called initial, INPUT PROGRAM,
  FILE TYPE, transformation, and procedure states.  (Please note the
  distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
  @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
  
-PSPP starts up in the initial state.  Each successful completion
+PSPP starts in the initial state.  Each successful completion
  of a command may cause a state transition.  Each type of command has its
  own rules for state transitions:
  
@@ -413,7 +281,7 @@ own rules for state transitions:
  @item Utility commands
  @itemize @bullet
  @item
-Legal in all states.
+Valid in any state.
  @item
  Do not cause state transitions.  Exception: when @cmd{N OF CASES}
  is executed in the procedure state, it causes a transition to the
@@ -423,12 +291,12 @@ transformation state.
  @item @cmd{DATA LIST}
  @itemize @bullet
  @item
-Legal in all states.
+Valid in any state.
  @item
  When executed in the initial or procedure state, causes a transition to
  the transformation state.  
  @item
-Clears the active file if executed in the procedure or transformation
+Clears the active dataset if executed in the procedure or transformation
  state.
  @end itemize
  
@@ -439,7 +307,7 @@ Invalid in INPUT PROGRAM and FILE TYPE states.
  @item
  Causes a transition to the INPUT PROGRAM state.  
  @item
-Clears the active file.
+Clears the active dataset.
  @end itemize
  
  @item @cmd{FILE TYPE}
@@ -449,7 +317,7 @@ Invalid in INPUT PROGRAM and FILE TYPE states.
  @item
  Causes a transition to the FILE TYPE state.
  @item
-Clears the active file.
+Clears the active dataset.
  @end itemize
  
  @item Other file definition commands
@@ -459,7 +327,7 @@ Invalid in INPUT PROGRAM and FILE TYPE states.
  @item
  Cause a transition to the transformation state.
  @item
-Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
+Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
  and @cmd{UPDATE}.
  @end itemize
  
@@ -488,7 +356,7 @@ Cause a transition to the procedure state.
  @end itemize
  @end table
  
-@node Missing Observations, Variables, Order of Commands, Language
+@node Missing Observations
  @section Handling missing observations
  @cindex missing values
  @cindex values, missing
@@ -496,10 +364,11 @@ Cause a transition to the procedure state.
  PSPP includes special support for unknown numeric data values.
  Missing observations are assigned a special value, called the
  @dfn{system-missing value}.  This ``value'' actually indicates the
-absence of value; it means that the actual value is unknown.  Procedures
+absence of a value; it means that the actual value is unknown.  Procedures
  automatically exclude from analyses those observations or cases that
-have missing values.  Whether single observations or entire cases are
-excluded depends on the procedure.
+have missing values.  Details of missing value exclusion depend on the
+procedure and can often be controlled by the user; refer to
+descriptions of individual procedures for details.
  
  The system-missing value exists only for numeric variables.  String
  variables always have a defined value, even if it is only a string of
@@ -508,34 +377,42 @@ spaces.
  Variables, whether numeric or string, can have designated
  @dfn{user-missing values}.  Every user-missing value is an actual value
  for that variable.  However, most of the time user-missing values are
-treated in the same way as the system-missing value.  String variables
-that are wider than a certain width, usually 8 characters (depending on
-computer architecture), cannot have user-missing values.
+treated in the same way as the system-missing value.
  
  For more information on missing values, see the following sections:
-@ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
+@ref{Datasets}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
  documentation on individual procedures for information on how they
  handle missing values.
  
-@node Variables, Files, Missing Observations, Language
-@section Variables
-@cindex variables
+@node Datasets
+@section Datasets
+@cindex dataset
+@cindex variable
  @cindex dictionary
  
-Variables are the basic unit of data storage in PSPP.  All the
-variables in a file taken together, apart from any associated data, are
-said to form a @dfn{dictionary}.  
-Some details of variables are described in the sections below.
+PSPP works with data organized into @dfn{datasets}.  A dataset
+consists of a set of @dfn{variables}, which taken together are said to
+form a @dfn{dictionary}, and one or more @dfn{cases}, each of which
+has one value for each variable.
+
+At any given time PSPP has exactly one distinguished dataset, called
+the @dfn{active dataset}.  Most PSPP commands work only with the
+active dataset.  In addition to the active dataset, PSPP also supports
+any number of additional open datasets.  The @cmd{DATASET} commands
+can choose a new active dataset from among those that are open, as
+well as create and destroy datasets (@pxref{DATASET}).
+
+The sections below describe variables in more detail.
  
  @menu
  * Attributes::                  Attributes of variables.
  * System Variables::            Variables automatically defined by PSPP.
  * Sets of Variables::           Lists of variable names.
-* Input/Output Formats::        Input and output formats.
+* Input and Output Formats::    Input and output formats.
  * Scratch Variables::           Variables deleted by procedures.
  @end menu
  
-@node Attributes, System Variables, Variables, Variables
+@node Attributes
  @subsection Attributes of Variables
  @cindex variables, attributes of
  @cindex attributes of variables
@@ -543,9 +420,29 @@ Each variable has a number of attributes, including:
  
  @table @strong
  @item Name
-This is an identifier.  Each variable must have a different name.
+An identifier, up to 64 bytes long.  Each variable must have a different name.
  @xref{Tokens}.
  
+Some system variable names begin with @samp{$}, but user-defined
+variables' names may not begin with @samp{$}.
+
+@cindex @samp{.}
+@cindex period
+@cindex variable names, ending with period
+The final character in a variable name should not be @samp{.}, because
+such an identifier will be misinterpreted when it is the final token
+on a line: @code{FOO.} will be divided into two separate tokens,
+@samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.
+
+@cindex @samp{_}
+The final character in a variable name should not be @samp{_}, because
+some such identifiers are used for special purposes by PSPP
+procedures.
+
+As with all PSPP identifiers, variable names are not case-sensitive.
+PSPP capitalizes variable names on output the same way they were
+capitalized at their point of definition in the input.
+
  @cindex variables, type
  @cindex type of variables
  @item Type
@@ -556,15 +453,9 @@ Numeric or string.
  @item Width
  (string variables only) String variables with a width of 8 characters or
  fewer are called @dfn{short string variables}.  Short string variables
-can be used in many procedures where @dfn{long string variables} (those
+may be used in a few contexts where @dfn{long string variables} (those
  with widths greater than 8) are not allowed.
  
-@quotation
-@strong{Please note:} Certain systems may consider strings longer than 8
-characters to be short strings.  Eight characters represents a minimum
-figure for the maximum length of a short string.
-@end quotation
-
  @item Position
  Variables in the dictionary are arranged in a specific order.
  @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
@@ -600,21 +491,26 @@ string.  @xref{VALUE LABELS}.
  Display width, format, and (for numeric variables) number of decimal
  places.  This attribute does not affect how data are stored, just how
  they are displayed.  Example: a width of 8, with 2 decimal places.
-@xref{PRINT FORMATS}.
+@xref{Input and Output Formats}.
  
  @cindex write format
  @item Write format
-Similar to print format, but used by certain commands that are
-designed to write to binary files.  @xref{WRITE FORMATS}.
+Similar to print format, but used by the @cmd{WRITE} command
+(@pxref{WRITE}).
+
+@cindex custom attributes
+@item Custom attributes
+User-defined associations between names and values.  @xref{VARIABLE
+ATTRIBUTE}.
  @end table
  
-@node System Variables, Sets of Variables, Attributes, Variables
+@node System Variables
  @subsection Variables Automatically Defined by PSPP
  @cindex system variables
  @cindex variables, system
  
  There are seven system variables.  These are not like ordinary
-variables, as they are not stored in each case.  They can only be used
+variables because system variables are not always stored.  They can be used only
  in expressions.  These system variables, whose values and output formats
  cannot be modified, are described below.
  
@@ -644,7 +540,7 @@ System missing value, in format F1.
  
  @cindex @code{$TIME}
  @item $TIME
-Number of seconds between midnight 14 Oct 1582 and the time the active file
+Number of seconds between midnight 14 Oct 1582 and the time the active dataset
  was read, in format F20.
  
  @cindex @code{$WIDTH}
@@ -652,411 +548,723 @@ was read, in format F20.
  Page width, in characters, in format F3.
  @end table
  
-@node Sets of Variables, Input/Output Formats, System Variables, Variables
+@node Sets of Variables
  @subsection Lists of variable names
-@cindex TO convention
-@cindex convention, TO
+@cindex @code{TO} convention
+@cindex convention, @code{TO}
+
+To refer to a set of variables, list their names one after another.
+Optionally, their names may be separated by commas.  To include a
+range of variables from the dictionary in the list, write the name of
+the first and last variable in the range, separated by @code{TO}.  For
+instance, if the dictionary contains six variables with the names
+@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
+@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
+variables @code{X2}, @code{GOAL}, and @code{MET}.
  
-There are several ways to specify a set of variables:
+Commands that define variables, such as @cmd{DATA LIST}, give
+@code{TO} an alternate meaning.  With these commands, @code{TO} define
+sequences of variables whose names end in consecutive integers.  The
+syntax is two identifiers that begin with the same root and end with
+numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
+variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
+@code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
+variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
+@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM00013}.  The syntaxes
+@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
+
+After a set of variables has been defined with @cmd{DATA LIST} or
+another command with this method, the same set can be referenced on
+later commands using the same syntax.
  
-@enumerate
-@item
-(Most commonly.)  List the variable names one after another, optionally
-separating them by commas.
+@node Input and Output Formats
+@subsection Input and Output Formats
  
-@cindex @code{TO}
-@item
-(This method cannot be used on commands that define the dictionary, such
-as @cmd{DATA LIST}.)  The syntax is the names of two existing variables,
-separated by the reserved keyword @code{TO}.  The meaning is to include
-every variable in the dictionary between and including the variables
-specified.  For instance, if the dictionary contains six variables with
-the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
-@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
-variables @code{X2}, @code{GOAL}, and @code{MET}.
+@cindex formats
+An @dfn{input format} describes how to interpret the contents of an
+input field as a number or a string.  It might specify that the field
+contains an ordinary decimal number, a time or date, a number in binary
+or hexadecimal notation, or one of several other notations.  Input
+formats are used by commands such as @cmd{DATA LIST} that read data or
+syntax files into the PSPP active dataset.
+
+Every input format corresponds to a default @dfn{output format} that
+specifies the formatting used when the value is output later.  It is
+always possible to explicitly specify an output format that resembles
+the input format.  Usually, this is the default, but in cases where the
+input format is unfriendly to human readability, such as binary or
+hexadecimal formats, the default output format is an easier-to-read
+decimal format.
+
+Every variable has two output formats, called its @dfn{print format} and
+@dfn{write format}.  Print formats are used in most output contexts;
+write formats are used only by @cmd{WRITE} (@pxref{WRITE}).  Newly
+created variables have identical print and write formats, and
+@cmd{FORMATS}, the most commonly used command for changing formats
+(@pxref{FORMATS}), sets both of them to the same value as well.  Thus,
+most of the time, the distinction between print and write formats is
+unimportant.
+
+Input and output formats are specified to PSPP with a @dfn{format
+specification} of the form @code{TYPEw} or @code{TYPEw.d}, where
+@code{TYPE} is one of the format types described later, @code{w} is a
+field width measured in columns, and @code{d} is an optional number of
+decimal places.  If @code{d} is omitted, a value of 0 is assumed.  Some
+formats do not allow a nonzero @code{d} to be specified.
+
+The following sections describe the input and output formats supported
+by PSPP.
  
-@item
-(This method can be used only on commands that define the dictionary,
-such as @cmd{DATA LIST}.)  It is used to define sequences of variables
-that end in consecutive integers.  The syntax is two identifiers that
-end in numbers.  This method is best illustrated with examples:
+@menu
+* Basic Numeric Formats::       
+* Custom Currency Formats::     
+* Legacy Numeric Formats::      
+* Binary and Hexadecimal Numeric Formats::  
+* Time and Date Formats::       
+* Date Component Formats::      
+* String Formats::              
+@end menu
+
+@node Basic Numeric Formats
+@subsubsection Basic Numeric Formats
+
+@cindex numeric formats
+The basic numeric formats are used for input and output of real numbers
+in standard or scientific notation.  The following table shows an
+example of how each format displays positive and negative numbers with
+the default decimal point setting:
+
+@float
+@multitable {DOLLAR10.2} {@code{@tie{}$3,141.59}} {@code{-$3,141.59}}
+@headitem Format @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
+@item F8.2       @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
+@item COMMA9.2   @tab @code{@tie{}3,141.59}  @tab @code{-3,141.59}
+@item DOT9.2     @tab @code{@tie{}3.141,59}  @tab @code{-3.141,59}
+@item DOLLAR10.2 @tab @code{@tie{}$3,141.59} @tab @code{-$3,141.59}
+@item PCT9.2     @tab @code{@tie{}3141.59%}  @tab @code{-3141.59%}
+@item E8.1       @tab @code{@tie{}3.1E+003}  @tab @code{-3.1E+003}
+@end multitable
+@end float
+
+On output, numbers in F format are expressed in standard decimal
+notation with the requested number of decimal places.  The other formats
+output some variation on this style:
  
  @itemize @bullet
  @item
-The syntax @code{X1 TO X5} defines 5 variables:
+Numbers in COMMA format are additionally grouped every three digits by
+inserting a grouping character.  The grouping character is ordinarily a
+comma, but it can be changed to a period (@pxref{SET DECIMAL}).
  
-@itemize @minus
-@item
-X1
-@item
-X2
-@item
-X3
-@item
-X4
  @item
-X5
-@end itemize
+DOT format is like COMMA format, but it interchanges the role of the
+decimal point and grouping characters.  That is, the current grouping
+character is used as a decimal point and vice versa.
  
  @item
-The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
+DOLLAR format is like COMMA format, but it prefixes the number with
+@samp{$}.
  
-@itemize @minus
-@item
-ITEM0008
-@item
-ITEM0009
-@item
-ITEM0010
-@item
-ITEM0011
-@item
-ITEM0012
  @item
-ITEM0013
-@end itemize
+PCT format is like F format, but adds @samp{%} after the number.
  
  @item
-Each of the syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3}
-are invalid, although for different reasons, which should be evident.
+The E format always produces output in scientific notation.
  @end itemize
  
-Note that after a set of variables has been defined with @cmd{DATA LIST}
-or another command with this method, the same set can be referenced on
-later commands using the same syntax.
-
-@item
-The above methods can be combined, either one after another or delimited
-by commas.  For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
-is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
-is individually legal.
-@end enumerate
+On input, the basic numeric formats accept positive and numbers in
+standard decimal notation or scientific notation.  Leading and trailing
+spaces are allowed.  An empty or all-spaces field, or one that contains
+only a single period, is treated as the system missing value.
  
-@node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
-@subsection Input and Output Formats
+In scientific notation, the exponent may be introduced by a sign
+(@samp{+} or @samp{-}), or by one of the letters @samp{e} or @samp{d}
+(in uppercase or lowercase), or by a letter followed by a sign.  A
+single space may follow the letter or the sign or both.
  
-Data that PSPP inputs and outputs must have one of a number of formats.
-These formats are described, in general, by a format specification of
-the form @code{NAMEw.d}, where @var{name} is the
-format name and @var{w} is a field width.  @var{d} is the optional
-desired number of decimal places, if appropriate.  If @var{d} is not
-included then it is assumed to be 0.  Some formats do not allow @var{d}
-to be specified.
-
-When an input format is specified on @cmd{DATA LIST} or another
-command, then
-it is converted to an output format for the purposes of @cmd{PRINT}
-and other
-data output commands.  For most purposes, input and output formats are
-the same; the salient differences are described below.
-
-Below are listed the input and output formats supported by PSPP.  If an
-input format is mapped to a different output format by default, then
-that mapping is indicated with @result{}.  Each format has the listed
-bounds on input width (iw) and output width (ow).
-
-The standard numeric input and output formats are given in the following
-table:
+On fixed-format @cmd{DATA LIST} (@pxref{DATA LIST FIXED}) and in a few
+other contexts, decimals are implied when the field does not contain a
+decimal point.  In F6.5 format, for example, the field @code{314159} is
+taken as the value 3.14159 with implied decimals.  Decimals are never
+implied if an explicit decimal point is present or if scientific
+notation is used.
  
-@table @asis
-@item Fw.d: 1 <= iw,ow <= 40
-Standard decimal format with @var{d} decimal places.  If the number is
-too large to fit within the field width, it is expressed in scientific
-notation (@code{1.2+34}) if w >= 6, with always at least two digits in
-the exponent.  When used as an input format, scientific notation is
-allowed but an E or an F must be used to introduce the exponent.
-
-The default output format is the same as the input format, except if
-@var{d} > 1.  In that case the output @var{w} is always made to be at
-least 2 + @var{d}.
+E and F formats accept the basic syntax already described.  The other
+formats allow some additional variations:
  
-@item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
-For input this is equivalent to F format except that no E or F is
-require to introduce the exponent.  For output, produces scientific
-notation in the form @code{1.2+34}.  There are always at least two
-digits given in the exponent.
+@itemize @bullet
+@item
+COMMA, DOLLAR, and DOT formats ignore grouping characters within the
+integer part of the input field.  The identity of the grouping
+character depends on the format.
  
-The default output @var{w} is the largest of the input @var{w}, the
-input @var{d} + 7, and 10.  The default output @var{d} is the input
-@var{d}, but at least 3.
+@item
+DOLLAR format allows a dollar sign to precede the number.  In a negative
+number, the dollar sign may precede or follow the minus sign.
  
-@item COMMAw.d: 1 <= iw,ow <= 40
-Equivalent to F format, except that groups of three digits are
-comma-separated on output.  If the number is too large to express in the
-field width, then first commas are eliminated, then if there is still
-not enough space the number is expressed in scientific notation given
-that w >= 6.  Commas are allowed and ignored when this is used as an
-input format.  
+@item
+PCT format allows a percent sign to follow the number.
+@end itemize
  
-@item DOTw.d: 1 <= iw,ow <= 40
-Equivalent to COMMA format except that the roles of comma and decimal
-point are interchanged.  However: If SET /DECIMAL=DOT is in effect, then
-COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
-decimal point.
+All of the basic number formats have a maximum field width of 40 and
+accept no more than 16 decimal places, on both input and output.  Some
+additional restrictions apply:
  
-@item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
-Equivalent to COMMA format, except that the number is prefixed by a
-dollar sign (@samp{$}) if there is room.  On input the value is allowed
-to be prefixed by a dollar sign, which is ignored.
+@itemize @bullet
+@item
+As input formats, the basic numeric formats allow no more decimal places
+than the field width.  As output formats, the field width must be
+greater than the number of decimal places; that is, large enough to
+allow for a decimal point and the number of requested decimal places.
+DOLLAR and PCT formats must allow an additional column for @samp{$} or
+@samp{%}.
  
-The default output @var{w} is the input @var{w}, but at least 2.
+@item
+The default output format for a given input format increases the field
+width enough to make room for optional input characters.  If an input
+format calls for decimal places, the width is increased by 1 to make
+room for an implied decimal point.  COMMA, DOT, and DOLLAR formats also
+increase the output width to make room for grouping characters.  DOLLAR
+and PCT further increase the output field width by 1 to make room for
+@samp{$} or @samp{%}.  The increased output width is capped at 40, the
+maximum field width.
  
-@item PCTw.d: 2 <= iw,ow <= 40
-Equivalent to F format, except that the number is suffixed by a percent
-sign (@samp{%}) if there is room.  On input the value is allowed to be
-suffixed by a percent sign, which is ignored.
+@item
+The E format is exceptional.  For output, E format has a minimum width
+of 7 plus the number of decimal places.  The default output format for
+an E input format is an E format with at least 3 decimal places and
+thus a minimum width of 10.
+@end itemize
  
-The default output @var{w} is the input @var{w}, but at least 2.
+More details of basic numeric output formatting are given below:
  
-@item Nw.d: 1 <= iw,ow <= 40
-Only digits are allowed within the field width.  The decimal point is
-assumed to be @var{d} digits from the right margin.
+@itemize @bullet
+@item
+Output rounds to nearest, with ties rounded away from zero.  Thus, 2.5
+is output as @code{3} in F1.0 format, and -1.125 as @code{-1.13} in F5.1
+format.
  
-The default output format is F with the same @var{w} and @var{d}, except
-if @var{d} > 1.  In that case the output @var{w} is always made to be at
-least 2 + @var{d}.
+@item
+The system-missing value is output as a period in a field of spaces,
+placed in the decimal point's position, or in the rightmost column if no
+decimal places are requested.  A period is used even if the decimal
+point character is a comma.
  
-@item Zw.d @result{} F: 1 <= iw,ow <= 40
-Zoned decimal input.  If you need to use this then you know how.
+@item
+A number that does not fill its field is right-justified within the
+field.
  
-@item IBw.d @result{} F: 1 <= iw,ow <= 8
-Integer binary format.  The field is interpreted as a fixed-point
-positive or negative binary number in two's-complement notation.  The
-location of the decimal point is implied.  Endianness is the same as the
-host machine.
+@item
+A number is too large for its field causes decimal places to be dropped
+to make room.  If dropping decimals does not make enough room,
+scientific notation is used if the field is wide enough.  If a number
+does not fit in the field, even in scientific notation, the overflow is
+indicated by filling the field with asterisks (@samp{*}).
  
-The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
-with output @var{w} as 9 + input @var{d} and output @var{d} as input
-@var{d}.
+@item
+COMMA, DOT, and DOLLAR formats insert grouping characters only if space
+is available for all of them.  Grouping characters are never inserted
+when all decimal places must be dropped.  Thus, 1234.56 in COMMA5.2
+format is output as @samp{@tie{}1235} without a comma, even though there
+is room for one, because all decimal places were dropped.
  
-@item PIB @result{} F: 1 <= iw,ow <= 8
-Positive integer binary format.  The field is interpreted as a
-fixed-point positive binary number.  The location of the decimal point
-is implied.  Endianness is teh same as the host machine.
+@item
+DOLLAR or PCT format drop the @samp{$} or @samp{%} only if the number
+would not fit at all without it.  Scientific notation with @samp{$} or
+@samp{%} is preferred to ordinary decimal notation without it.
  
-The default output format follows the rules for IB format.
+@item
+Except in scientific notation, a decimal point is included only when
+it is followed by a digit.  If the integer part of the number being
+output is 0, and a decimal point is included, then the zero before the
+decimal point is dropped.
  
-@item Pw.d @result{} F: 1 <= iw,ow <= 16
-Binary coded decimal format.  Each byte from left to right, except the
-rightmost, represents two digits.  The upper nibble of each byte is more
-significant.  The upper nibble of the final byte is the least
-significant digit.  The lower nibble of the final byte is the sign; a
-value of D represents a negative sign and all other values are
-considered positive.  The decimal point is implied.
+In scientific notation, the number always includes a decimal point,
+even if it is not followed by a digit.
  
-The default output format follows the rules for IB format.
+@item
+A negative number includes a minus sign only in the presence of a
+nonzero digit: -0.01 is output as @samp{-.01} in F4.2 format but as
+@samp{@tie{}@tie{}.0} in F4.1 format.  Thus, a ``negative zero'' never
+includes a minus sign.
  
-@item PKw.d @result{} F: 1 <= iw,ow <= 16
-Positive binary code decimal format.  Same as P but the last byte is the
-same as the others.
+@item
+In negative numbers output in DOLLAR format, the dollar sign follows the
+negative sign.  Thus, -9.99 in DOLLAR6.2 format is output as
+@code{-$9.99}.
  
-The default output format follows the rules for IB format.
+@item
+In scientific notation, the exponent is output as @samp{E} followed by
+@samp{+} or @samp{-} and exactly three digits.  Numbers with magnitude
+less than 10**-999 or larger than 10**999 are not supported by most
+computers, but if they are supported then their output is considered
+to overflow the field and will be output as asterisks.
  
-@item RBw @result{} F: 2 <= iw,ow <= 8
+@item
+On most computers, no more than 15 decimal digits are significant in
+output, even if more are printed.  In any case, output precision cannot
+be any higher than input precision; few data sets are accurate to 15
+digits of precision.  Unavoidable loss of precision in intermediate
+calculations may also reduce precision of output.
  
-Binary C architecture-dependent ``double'' format.  For a standard
-IEEE754 implementation @var{w} should be 8.
+@item
+Special values such as infinities and ``not a number'' values are
+usually converted to the system-missing value before printing.  In a few
+circumstances, these values are output directly.  In fields of width 3
+or greater, special values are output as however many characters will
+fit from @code{+Infinity} or @code{-Infinity} for infinities, from
+@code{NaN} for ``not a number,'' or from @code{Unknown} for other values
+(if any are supported by the system).  In fields under 3 columns wide,
+special values are output as asterisks.
+@end itemize
  
-The default output format follows the rules for IB format.
+@node Custom Currency Formats
+@subsubsection Custom Currency Formats
+
+@cindex currency formats
+The custom currency formats are closely related to the basic numeric
+formats, but they allow users to customize the output format.  The
+SET command configures custom currency formats, using the syntax
+@display
+SET CC@var{x}=@t{"}@var{string}@t{"}.
+@end display
+@noindent 
+where @var{x} is A, B, C, D, or E, and @var{string} is no more than 16
+characters long.
+
+@var{string} must contain exactly three commas or exactly three periods
+(but not both), except that a single quote character may be used to
+``escape'' a following comma, period, or single quote.  If three commas
+are used, commas will be used for grouping in output, and a period will
+be used as the decimal point.  Uses of periods reverses these roles.
+
+The commas or periods divide @var{string} into four fields, called the
+@dfn{negative prefix}, @dfn{prefix}, @dfn{suffix}, and @dfn{negative
+suffix}, respectively.  The prefix and suffix are added to output
+whenever space is available.  The negative prefix and negative suffix
+are always added to a negative number when the output includes a nonzero
+digit.
+
+The following syntax shows how custom currency formats could be used to
+reproduce basic numeric formats:
  
-@item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
-PIB format encoded as textual hex digit pairs.  @var{w} must be even.
+@example
+@group
+SET CCA="-,,,".  /* Same as COMMA.
+SET CCB="-...".  /* Same as DOT.
+SET CCC="-,$,,". /* Same as DOLLAR.
+SET CCD="-,,%,". /* Like PCT, but groups with commas.
+@end group
+@end example
  
-The input width is mapped to a default output width as follows:
-2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
-12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
-decimal places.
+Here are some more examples of custom currency formats.  The final
+example shows how to use a single quote to escape a delimiter:
  
-@item RBHEXw @result{} F: 4 <= iw,ow <= 16
+@example
+@group
+SET CCA=",EUR,,-".   /* Euro.
+SET CCB="(,USD ,,)". /* US dollar.
+SET CCC="-.R$..".    /* Brazilian real.
+SET CCD="-,, NIS,".  /* Israel shekel.
+SET CCE="-.Rp'. ..". /* Indonesia Rupiah.
+@end group
+@end example
  
-RB format encoded as textual hex digits pairs.  @var{w} must be even.
+@noindent These formats would yield the following output:
  
-The default output format is F8.2.
+@float
+@multitable {CCD13.2} {@code{@tie{}@tie{}USD 3,145.59}} {@code{(USD 3,145.59)}}
+@headitem Format @tab @code{@tie{}3145.59}         @tab @code{-3145.59}
+@item CCA12.2 @tab @code{@tie{}EUR3,145.59}        @tab @code{EUR3,145.59-}
+@item CCB14.2 @tab @code{@tie{}@tie{}USD 3,145.59} @tab @code{(USD 3,145.59)}
+@item CCC11.2 @tab @code{@tie{}R$3.145,59}         @tab @code{-R$3.145,59}
+@item CCD13.2 @tab @code{@tie{}3,145.59 NIS}       @tab @code{-3,145.59 NIS}
+@item CCE10.0 @tab @code{@tie{}Rp. 3.146}          @tab @code{-Rp. 3.146}
+@end multitable
+@end float
  
-@item CCAw.d: 1 <= ow <= 40
-@itemx CCBw.d: 1 <= ow <= 40
-@itemx CCCw.d: 1 <= ow <= 40
-@itemx CCDw.d: 1 <= ow <= 40
-@itemx CCEw.d: 1 <= ow <= 40
+The default for all the custom currency formats is @samp{-,,,},
+equivalent to COMMA format.
  
-User-defined custom currency formats.  May not be used as an input
-format.  @xref{SET}, for more details.
-@end table
+@node Legacy Numeric Formats
+@subsubsection Legacy Numeric Formats
  
-The date and time numeric input and output formats accept a number of
-possible formats.  Before describing the formats themselves, some
-definitions of the elements that make up their formats will be helpful:
+The N and Z numeric formats provide compatibility with legacy file
+formats.  They have much in common:
  
-@table @dfn
-@item leader
-All formats accept an optional whitespace leader.
+@itemize @bullet
+@item
+Output is rounded to the nearest representable value, with ties rounded
+away from zero.
  
-@item day
-An integer between 1 and 31 representing the day of month.
+@item
+Numbers too large to display are output as a field filled with asterisks
+(@samp{*}).
  
-@item day-count
-An integer representing a number of days.
+@item
+The decimal point is always implicitly the specified number of digits
+from the right edge of the field, except that Z format input allows an
+explicit decimal point.
  
-@item date-delimiter
-One or more characters of whitespace or the following characters:
-@code{- / . ,}
+@item
+Scientific notation may not be used.
  
-@item month
-A month name in one of the following forms:
-@itemize @bullet
  @item
-An integer between 1 and 12.
+The system-missing value is output as a period in a field of spaces.
+The period is placed just to the right of the implied decimal point in
+Z format, or at the right end in N format or in Z format if no decimal
+places are requested.  A period is used even if the decimal point
+character is a comma.
+
  @item
-Roman numerals representing an integer between 1 and 12.
+Field width may range from 1 to 40.  Decimal places may range from 0 up
+to the field width, to a maximum of 16.
+
  @item
-At least the first three characters of an English month name (January,
-February, @dots{}).
+When a legacy numeric format used for input is converted to an output
+format, it is changed into the equivalent F format.  The field width is
+increased by 1 if any decimal places are specified, to make room for a
+decimal point.  For Z format, the field width is increased by 1 more
+column, to make room for a negative sign.  The output field width is
+capped at 40 columns.
  @end itemize
  
-@item year
-An integer year number between 1582 and 19999, or between 1 and 199.
-Years between 1 and 199 will have 1900 added.
+@subsubheading N Format
+
+The N format supports input and output of fields that contain only
+digits.  On input, leading or trailing spaces, a decimal point, or any
+other non-digit character causes the field to be read as the
+system-missing value.  As a special exception, an N format used on
+@cmd{DATA LIST FREE} or @cmd{DATA LIST LIST} is treated as the
+equivalent F format.
+
+On output, N pads the field on the left with zeros.  Negative numbers
+are output like the system-missing value.
+
+@subsubheading Z Format
+
+The Z format is a ``zoned decimal'' format used on IBM mainframes.  Z
+format encodes the sign as part of the final digit, which must be one of
+the following:
+@example
+0123456789
+@{ABCDEFGHI
+@}JKLMNOPQR
+@end example
+@noindent
+where the characters in each row represent digits 0 through 9 in order.
+Characters in the first two rows indicate a positive sign; those in the
+third indicate a negative sign.
+
+On output, Z fields are padded on the left with spaces.  On input,
+leading and trailing spaces are ignored.  Any character in an input
+field other than spaces, the digit characters above, and @samp{.} causes
+the field to be read as system-missing.
+
+The decimal point character for input and output is always @samp{.},
+even if the decimal point character is a comma (@pxref{SET DECIMAL}).
+
+Nonzero, negative values output in Z format are marked as negative even
+when no nonzero digits are output.  For example, -0.2 is output in Z1.0
+format as @samp{J}.  The ``negative zero'' value supported by most
+machines is output as positive.
+
+@node Binary and Hexadecimal Numeric Formats
+@subsubsection Binary and Hexadecimal Numeric Formats
+
+@cindex binary formats
+@cindex hexadecimal formats
+The binary and hexadecimal formats are primarily designed for
+compatibility with existing machine formats, not for human readability.
+All of them therefore have a F format as default output format.  Some of
+these formats are only portable between machines with compatible byte
+ordering (endianness) or floating-point format.
+
+Binary formats use byte values that in text files are interpreted as
+special control functions, such as carriage return and line feed.  Thus,
+data in binary formats should not be included in syntax files or read
+from data files with variable-length records, such as ordinary text
+files.  They may be read from or written to data files with fixed-length
+records.  @xref{FILE HANDLE}, for information on working with
+fixed-length records.
+
+@subsubheading P and PK Formats
+
+These are binary-coded decimal formats, in which every byte (except the
+last, in P format) represents two decimal digits.  The most-significant
+4 bits of the first byte is the most-significant decimal digit, the
+least-significant 4 bits of the first byte is the next decimal digit,
+and so on.
+
+In P format, the most-significant 4 bits of the last byte are the
+least-significant decimal digit.  The least-significant 4 bits represent
+the sign: decimal 15 indicates a negative value, decimal 13 indicates a
+positive value.
+
+Numbers are rounded downward on output.  The system-missing value and
+numbers outside representable range are output as zero.
+
+The maximum field width is 16.  Decimal places may range from 0 up to
+the number of decimal digits represented by the field.
+
+The default output format is an F format with twice the input field
+width, plus one column for a decimal point (if decimal places were
+requested).
+
+@subsubheading IB and PIB Formats
+
+These are integer binary formats.  IB reads and writes 2's complement
+binary integers, and PIB reads and writes unsigned binary integers.  The
+byte ordering is by default the host machine's, but SET RIB may be used
+to select a specific byte ordering for reading (@pxref{SET RIB}) and
+SET WIB, similarly, for writing (@pxref{SET WIB}).
+
+The maximum field width is 8.  Decimal places may range from 0 up to the
+number of decimal digits in the largest value representable in the field
+width.
+
+The default output format is an F format whose width is the number of
+decimal digits in the largest value representable in the field width,
+plus 1 if the format has decimal places.
+
+@subsubheading RB Format
+
+This is a binary format for real numbers.  By default it reads and
+writes the host machine's floating-point format, but SET RRB may be
+used to select an alternate floating-point format for reading
+(@pxref{SET RRB}) and SET WRB, similarly, for writing (@pxref{SET
+WRB}).
+
+The recommended field width depends on the floating-point format.
+NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use
+a field width of 8.  ISL, ISB, VF, and ZS formats should use a field
+width of 4.  Other field widths will not produce useful results.  The
+maximum field width is 8.  No decimal places may be specified.
  
-@item julian
-A single number with a year number in the first 2, 3, or 4 digits (as
-above) and the day number within the year in the last 3 digits.
+The default output format is F8.2.
  
-@item quarter
-An integer between 1 and 4 representing a quarter.
+@subsubheading PIBHEX and RBHEX Formats
+
+These are hexadecimal formats, for reading and writing binary formats
+where each byte has been recoded as a pair of hexadecimal digits.
+
+A hexadecimal field consists solely of hexadecimal digits
+@samp{0}@dots{}@samp{9} and @samp{A}@dots{}@samp{F}.  Uppercase and
+lowercase are accepted on input; output is in uppercase.
+
+Other than the hexadecimal representation, these formats are equivalent
+to PIB and RB formats, respectively.  However, bytes in PIBHEX format
+are always ordered with the most-significant byte first (big-endian
+order), regardless of the host machine's native byte order or PSPP
+settings.
+
+Field widths must be even and between 2 and 16.  RBHEX format allows no
+decimal places; PIBHEX allows as many decimal places as a PIB format
+with half the given width.
+
+@node Time and Date Formats
+@subsubsection Time and Date Formats
+
+@cindex time formats
+@cindex date formats
+In PSPP, a @dfn{time} is an interval.  The time formats translate
+between human-friendly descriptions of time intervals and PSPP's
+internal representation of time intervals, which is simply the number of
+seconds in the interval.  PSPP has two time formats:
+
+@float
+@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
+@headitem Time Format @tab Template                  @tab Example
+@item TIME     @tab @code{hh:MM:SS.ss}          @tab @code{04:31:17.01}
+@item DTIME    @tab @code{DD HH:MM:SS.ss}       @tab @code{00 04:31:17.01}
+@end multitable
+@end float
+
+A @dfn{date} is a moment in the past or the future.  Internally, PSPP
+represents a date as the number of seconds since the @dfn{epoch},
+midnight, Oct. 14, 1582.  The date formats translate between
+human-readable dates and PSPP's numeric representation of dates and
+times.  PSPP has several date formats:
+
+@float
+@multitable {Date Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
+@headitem Date Format @tab Template                  @tab Example
+@item DATE     @tab @code{dd-mmm-yyyy}          @tab @code{01-OCT-1978}
+@item ADATE    @tab @code{mm/dd/yyyy}           @tab @code{10/01/1978}
+@item EDATE    @tab @code{dd.mm.yyyy}           @tab @code{01.10.1978}
+@item JDATE    @tab @code{yyyyjjj}              @tab @code{1978274}
+@item SDATE    @tab @code{yyyy/mm/dd}           @tab @code{1978/10/01}
+@item QYR      @tab @code{q Q yyyy}             @tab @code{3 Q 1978}
+@item MOYR     @tab @code{mmm yyyy}             @tab @code{OCT 1978}
+@item WKYR     @tab @code{ww WK yyyy}           @tab @code{40 WK 1978}
+@item DATETIME @tab @code{dd-mmm-yyyy HH:MM:SS.ss} @tab @code{01-OCT-1978 04:31:17.01}
+@end multitable
+@end float
+
+The templates in the preceding tables describe how the time and date
+formats are input and output:
  
-@item q-delimiter
-The letter @samp{Q} or @samp{q}.
+@table @code
+@item dd
+Day of month, from 1 to 31.  Always output as two digits.
+
+@item mm
+@itemx mmm
+Month.  In output, @code{mm} is output as two digits, @code{mmm} as the
+first three letters of an English month name (January, February,
+@dots{}).  In input, both of these formats, plus Roman numerals, are
+accepted.
+
+@item yyyy
+Year.  In output, DATETIME always produces a 4-digit year; other
+formats can produce a 2- or 4-digit year.  The century assumed for
+2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).  In
+output, a year outside the epoch causes the whole field to be filled
+with asterisks (@samp{*}).
+
+@item jjj
+Day of year (Julian day), from 1 to 366.  This is exactly three digits
+giving the count of days from the start of the year.  January 1 is
+considered day 1.
+
+@item q
+Quarter of year, from 1 to 4.  Quarters start on January 1, April 1,
+July 1, and October 1.
+
+@item ww
+Week of year, from 1 to 53.  Output as exactly two digits.  January 1 is
+the first day of week 1.
+
+@item DD
+Count of days, which may be positive or negative.  Output as at least
+two digits.
+
+@item hh
+Count of hours, which may be positive or negative.  Output as at least
+two digits.
+
+@item HH
+Hour of day, from 0 to 23.  Output as exactly two digits.
+
+@item MM
+Minute of hour, from 0 to 59.  Output as exactly two digits.
+
+@item SS.ss
+Seconds within minute, from 0 to 59.  The integer part is output as
+exactly two digits.  On output, seconds and fractional seconds may or
+may not be included, depending on field width and decimal places.  On
+input, seconds and fractional seconds are optional.  The DECIMAL setting
+controls the character accepted and displayed as the decimal point
+(@pxref{SET DECIMAL}).
+@end table
  
-@item week
-An integer between 1 and 53 representing a week within a year.
+For output, the date and time formats use the delimiters indicated in
+the table.  For input, date components may be separated by spaces or by
+one of the characters @samp{-}, @samp{/}, @samp{.}, or @samp{,}, and
+time components may be separated by spaces, @samp{:}, or @samp{.}.  On
+input, the @samp{Q} separating quarter from year and the @samp{WK}
+separating week from year may be uppercase or lowercase, and the spaces
+around them are optional.
+
+On input, all time and date formats accept any amount of leading and
+trailing white space.
+
+The maximum width for time and date formats is 40 columns.  Minimum
+input and output width for each of the time and date formats is shown
+below:
+
+@float
+@multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
+@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option 
+@item DATE @tab 8 @tab 9 @tab 4-digit year
+@item ADATE @tab 8 @tab 8 @tab 4-digit year
+@item EDATE @tab 8 @tab 8 @tab 4-digit year
+@item JDATE @tab 5 @tab 5 @tab 4-digit year
+@item SDATE @tab 8 @tab 8 @tab 4-digit year
+@item QYR @tab 4 @tab 6 @tab 4-digit year
+@item MOYR @tab 6 @tab 6 @tab 4-digit year
+@item WKYR @tab 6 @tab 8 @tab 4-digit year
+@item DATETIME @tab 17 @tab 17 @tab seconds
+@item TIME @tab 5 @tab 5 @tab seconds
+@item DTIME @tab 8 @tab 8 @tab seconds
+@end multitable
+@end float
+@noindent 
+In the table, ``Option'' describes what increased output width enables:
  
-@item wk-delimiter
-The letters @samp{wk} in any case.
+@table @asis
+@item 4-digit year
+A field 2 columns wider than minimum will include a 4-digit year.
+(DATETIME format always includes a 4-digit year.)
+
+@item seconds
+A field 3 columns wider than minimum will include seconds as well as
+minutes.  A field 5 columns wider than minimum, or more, can also
+include a decimal point and fractional seconds (but no more than allowed
+by the format's decimal places).
+@end table
  
-@item time-delimiter
-At least one characters of whitespace or @samp{:} or @samp{.}.
+For the time and date formats, the default output format is the same as
+the input format, except that PSPP increases the field width, if
+necessary, to the minimum allowed for output.
  
-@item hour
-An integer greater than 0 representing an hour.
+Time or dates narrower than the field width are right-justified within
+the field.
  
-@item minute
-An integer between 0 and 59 representing a minute within an hour.
+When a time or date exceeds the field width, characters are trimmed from
+the end until it fits.  This can occur in an unusual situation, e.g.@:
+with a year greater than 9999 (which adds an extra digit), or for a
+negative value on TIME or DTIME (which adds a leading minus sign).
  
-@item opt-second
-Optionally, a time-delimiter followed by a real number representing a
-number of seconds.
+@c What about out-of-range values?
  
-@item hour24
-An integer between 0 and 23 representing an hour within a day.
+The system-missing value is output as a period at the right end of the
+field.  
  
-@item weekday
-At least the first two characters of an English day word.
+@node Date Component Formats
+@subsubsection Date Component Formats
  
-@item spaces
-Any amount or no amount of whitespace.
+The WKDAY and MONTH formats provide input and output for the names of
+weekdays and months, respectively.
  
-@item sign
-An optional positive or negative sign.
+On output, these formats convert a number between 1 and 7, for WKDAY, or
+between 1 and 12, for MONTH, into the English name of a day or month,
+respectively.  If the name is longer than the field, it is trimmed to
+fit.  If the name is shorter than the field, it is padded on the right
+with spaces.  Values outside the valid range, and the system-missing
+value, are output as all spaces.
  
-@item trailer
-All formats accept an optional whitespace trailer.
-@end table
+On input, English weekday or month names (in uppercase or lowercase) are
+converted back to their corresponding numbers.  Weekday and month names
+may be abbreviated to their first 2 or 3 letters, respectively.
  
-The date input formats are strung together from the above pieces.  On
-output, the date formats are always printed in a single canonical
-manner, based on field width.  The date input and output formats are
-described below:
+The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for
+MONTH.  No decimal places are allowed.
  
-@table @asis
-@item DATEw: 9 <= iw,ow <= 40
-Date format. Input format: leader + day + date-delimiter +
-month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
-@var{w} < 11, DD-MMM-YYYY otherwise.
-
-@item EDATEw: 8 <= iw,ow <= 40
-European date format.  Input format same as DATE.  Output format:
-DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
-
-@item SDATEw: 8 <= iw,ow <= 40
-Standard date format. Input format: leader + year + date-delimiter +
-month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
-@var{w} < 10, YYYY/MM/DD otherwise.
-
-@item ADATEw: 8 <= iw,ow <= 40
-American date format.  Input format: leader + month + date-delimiter +
-day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
-@var{w} < 10, MM/DD/YYYY otherwise.
-
-@item JDATEw: 5 <= iw,ow <= 40
-Julian date format.  Input format: leader + julian + trailer.  Output
-format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
-
-@item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
-Quarter/year format.  Input format: leader + quarter + q-delimiter +
-year + trailer.  Output format: @samp{Q Q YY}, where the first
-@samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
-YYYY} otherwise.
-
-@item MOYRw: 6 <= iw,ow <= 40
-Month/year format.  Input format: leader + month + date-delimiter + year
-+ trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
-YYYY} otherwise.
-
-@item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
-Week/year format.  Input format: leader + week + wk-delimiter + year +
-trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
-YYYY} otherwise.
-
-@item DATETIMEw.d: 17 <= iw,ow <= 40
-Date and time format.  Input format: leader + day + date-delimiter +
-month + date-delimiter + yaer + time-delimiter + hour24 + time-delimiter
-+ minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
-@var{w} > 19 then seconds @samp{:SS} is added.  If @var{w} > 22 and
-@var{d} > 0 then fractional seconds @samp{.SS} are added.
-
-@item TIMEw.d: 5 <= iw,ow <= 40
-Time format.  Input format: leader + sign + spaces + hour +
-time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
-Seconds and fractional seconds are available with @var{w} of at least 8
-and 10, respectively.
-
-@item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
-Time format with day count.  Input format: leader + sign + spaces +
-day-count + time-delimiter + hour + time-delimiter + minute +
-opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
-seconds are available with @var{w} of at least 8 and 10, respectively.
-
-@item WKDAYw: 2 <= iw,ow <= 40
-A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
-leader + weekday + trailer.  Output format: as many characters, in all
-capital letters, of the English name of the weekday as will fit in the
-field width.
-
-@item MONTHw: 3 <= iw,ow <= 40
-A month as a number between 1 and 12, where 1 is January.  Input format:
-leader + month + trailer.  Output format: as many character, in all
-capital letters, of the English name of the month as will fit in the
-field width.
-@end table
+The default output format is the same as the input format.
  
-There are only two formats that may be used with string variables:
+@node String Formats
+@subsubsection String Formats
  
-@table @asis
-@item Aw: 1 <= iw <= 255, 1 <= ow <= 254
-The entire field is treated as a string value.
+@cindex string formats
+The A and AHEX formats are the only ones that may be assigned to string
+variables.  Neither format allows any decimal places.
  
-@item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
-The field is composed of characters in a string encoded as textual hex
-digit pairs.
+In A format, the entire field is treated as a string value.  The field
+width may range from 1 to 32,767, the maximum string width.  The default
+output format is the same as the input format.
  
-The default output @var{w} is half the input @var{w}.
-@end table
+In AHEX format, the field is composed of characters in a string encoded
+as hex digit pairs.  On output, hex digits are output in uppercase; on
+input, uppercase and lowercase are both accepted.  The default output
+format is A format with half the input width.
  
-@node Scratch Variables,  , Input/Output Formats, Variables
+@node Scratch Variables
  @subsection Scratch Variables
  
+@cindex scratch variables
  Most of the time, variables don't retain their values between cases.
-Instead, either they're being read from a data file or the active file,
+Instead, either they're being read from a data file or the active dataset,
  in which case they assume the value read, or, if created with
  @cmd{COMPUTE} or
  another transformation, they're initialized to the system-missing value
@@ -1068,15 +1276,15 @@ use a @dfn{scratch variable}.  Scratch variables are variables whose
  names begin with an octothorpe (@samp{#}).  
  
  Scratch variables have the same properties as variables left with
-@cmd{LEAVE}:
-they retain their values between cases, and for the first case they are
-initialized to 0 or blanks.  They have the additional property that they
-are deleted before the execution of any procedure.  For this reason,
-scratch variables can't be used for analysis.  To obtain the same
-effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
-value into an ordinary variable, then analysis that variable.
-
-@node Files, BNF, Variables, Language
+@cmd{LEAVE}: they retain their values between cases, and for the first
+case they are initialized to 0 or blanks.  They have the additional
+property that they are deleted before the execution of any procedure.
+For this reason, scratch variables can't be used for analysis.  To use
+a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
+to copy its value into an ordinary variable, then use that ordinary
+variable in the analysis.
+
+@node Files
  @section Files Used by PSPP
  
  PSPP makes use of many files each time it runs.  Some of these it
@@ -1090,18 +1298,16 @@ most important of these files:
  @cindex syntax file
  @item command file
  @itemx syntax file
-These names (synonyms) refer to the file that contains instructions to
-PSPP that tell it what to do.  The syntax file's name is specified on
-the PSPP command line.  Syntax files can also be pulled in with
+These names (synonyms) refer to the file that contains instructions
+that tell PSPP what to do.  The syntax file's name is specified on
+the PSPP command line.  Syntax files can also be read with
  @cmd{INCLUDE} (@pxref{INCLUDE}).
  
  @cindex file, data
  @cindex data file
  @item data file
-Data files contain raw data in ASCII format suitable for being read in
-by @cmd{DATA LIST}.  Data can be embedded in the syntax
-file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
-syntax file a data file too.
+Data files contain raw data in text or binary format.  Data can also
+be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
  
  @cindex file, output
  @cindex output file
@@ -1111,16 +1317,65 @@ run.  The output files receive the tables and charts produced by
  statistical procedures.  The output files may be in any number of formats,
  depending on how PSPP is configured.
  
-@cindex active file
-@cindex file, active
-@item active file
-The active file is the ``file'' on which all PSPP procedures
-are performed.  The active file contains variable definitions and
-cases.  The active file is not necessarily a disk file: it is stored
-in memory if there is room.
+@cindex system file
+@cindex file, system
+@item system file
+System files are binary files that store a dictionary and a set of
+cases.  @cmd{GET} and @cmd{SAVE} read and write system files.
+
+@cindex portable file
+@cindex file, portable
+@item portable file
+Portable files are files in a text-based format that store a dictionary
+and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
+portable files.
  @end table
  
-@node BNF,  , Files, Language
+@node File Handles
+@section File Handles
+@cindex file handles
+
+A @dfn{file handle} is a reference to a data file, system file, or 
+portable file.  Most often, a file handle is specified as the
+name of a file as a string, that is, enclosed within @samp{'} or
+@samp{"}.
+
+A file name string that begins or ends with @samp{|} is treated as the
+name of a command to pipe data to or from.  You can use this feature
+to read data over the network using a program such as @samp{curl}
+(e.g.@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
+read compressed data from a file using a program such as @samp{zcat}
+(e.g.@: @code{GET '|zcat mydata.sav.gz'}), and for many other
+purposes.
+
+PSPP also supports declaring named file handles with the @cmd{FILE
+HANDLE} command.  This command associates an identifier of your choice
+(the file handle's name) with a file.  Later, the file handle name can
+be substituted for the name of the file.  When PSPP syntax accesses a
+file multiple times, declaring a named file handle simplifies updating
+the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
+also required to read data files in binary formats.  @xref{FILE HANDLE},
+for more information.
+
+In some circumstances, PSPP must distinguish whether a file handle
+refers to a system file or a portable file.  When this is necessary to
+read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
+PSPP uses the file's contents to decide.  In the context of writing a
+file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, PSPP
+decides based on the file's name: if it ends in @samp{.por} (with any
+capitalization), then PSPP writes a portable file; otherwise, PSPP
+writes a system file.
+
+INLINE is reserved as a file handle name.  It refers to the ``data
+file'' embedded into the syntax file between @cmd{BEGIN DATA} and
+@cmd{END DATA}.  @xref{BEGIN DATA}, for more information.
+
+The file to which a file handle refers may be reassigned on a later
+@cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
+HANDLE}.  @xref{CLOSE FILE HANDLE}, for
+more information.
+
+@node BNF
  @section Backus-Naur Form
  @cindex BNF
  @cindex Backus-Naur Form
@@ -1137,7 +1392,7 @@ following table describes BNF:
  @item
  Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
  often called @dfn{terminals}.  There are some special terminals, which
-are actually written in lowercase for clarity:
+are written in lowercase for clarity:
  
  @table @asis
  @cindex @code{number}
@@ -1162,11 +1417,9 @@ A single variable name.
  Operators and punctuators.
  
  @cindex @code{.}
-@cindex terminal dot
-@cindex dot, terminal
  @item @code{.}
-The terminal dot.  This is not necessarily an actual dot in the syntax
-file: @xref{Commands}, for more details.
+The end of the command.  This is not necessarily an actual dot in the
+syntax file: @xref{Commands}, for more details.
  @end table
  
  @item
@@ -1188,7 +1441,6 @@ An expression.  @xref{Expressions}, for details.
  @end table
  
  @item
-@cindex @code{::=}
  @cindex ``is defined as''
  @cindex productions
  @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
@@ -1214,4 +1466,3 @@ The first nonterminal defined in a set of productions is called the
  @dfn{start symbol}.  The start symbol defines the entire syntax for
  that command.
  @end itemize
-@setfilename ignored