work on docs

[pspp] / doc / language.texi
diff --git a/doc/language.texi b/doc/language.texi

index 810037100bba6fbfe347d1968a5c19c1922eca13..1c9a4f0105245a5550f7ad71f577be5db2be0c53 100644 (file)
--- a/doc/language.texi
+++ b/doc/language.texi
@@ -1,411 +1,288 @@
-@node Language, Expressions, Invocation, Top
-@chapter The PSPP language
-@cindex language, PSPP
-@cindex PSPP, language
-
-@quotation
-@strong{Please note:} PSPP is not even close to completion.
-Only a few actual statistical procedures are implemented.  PSPP
-is a work in progress.
-@end quotation
-
-This chapter discusses elements common to many PSPP commands.
-Later chapters will describe individual commands in detail.
+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
+@node Language
+@chapter The @pspp{} language
+@cindex language, @pspp{}
+@cindex @pspp{}, language
+
+This chapter discusses elements common to many @pspp{} commands.
+Later chapters describe individual commands in detail.
  
  @menu
  * Tokens::                      Characters combine to form tokens.
  * Commands::                    Tokens combine to form commands.
  
  @menu
  * Tokens::                      Characters combine to form tokens.
  * Commands::                    Tokens combine to form commands.
+* Syntax Variants::             Batch vs. Interactive mode
  * Types of Commands::           Commands come in several flavors.
  * Order of Commands::           Commands combine to form syntax files.
  * Missing Observations::        Handling missing observations.
  * Types of Commands::           Commands come in several flavors.
  * Order of Commands::           Commands combine to form syntax files.
  * Missing Observations::        Handling missing observations.
-* Variables::                   The unit of data storage.
-* Files::                       Files used by PSPP.
+* Datasets::                    Data organization.
+* Files::                       Files used by @pspp{}.
+* File Handles::                How files are named.
  * BNF::                         How command syntax is described.
  @end menu
  
  * BNF::                         How command syntax is described.
  @end menu
  
-@node Tokens, Commands, Language, Language
+
+@node Tokens
  @section Tokens
  @cindex language, lexical analysis
  @cindex language, tokens
  @cindex tokens
  @cindex lexical analysis
  @section Tokens
  @cindex language, lexical analysis
  @cindex language, tokens
  @cindex tokens
  @cindex lexical analysis
-@cindex lexemes
-
-PSPP divides most syntax file lines into series of short chunks
-called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}.  These
-tokens are then grouped to form commands, each of which tells
-PSPP to take some action---read in data, write out data, perform
-a statistical procedure, etc.  The process of dividing input into tokens
-is @dfn{tokenization}, or @dfn{lexical analysis}.  Each type of token is
-described below.
  
  
-@cindex delimiters
-@cindex whitespace
-Tokens must be separated from each other by @dfn{delimiters}.
-Delimiters include whitespace (spaces, tabs, carriage returns, line
-feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
-operators (plus, minus, times, divide, etc.)  Note that while whitespace
-only separates tokens, other delimiters are tokens in themselves.
+@pspp{} divides most syntax file lines into series of short chunks
+called @dfn{tokens}.
+Tokens are then grouped to form commands, each of which tells
+@pspp{} to take some action---read in data, write out data, perform
+a statistical procedure, etc.  Each type of token is
+described below.
  
  @table @strong
  @cindex identifiers
  @item Identifiers
  
  @table @strong
  @cindex identifiers
  @item Identifiers
-Identifiers are names that specify variable names, commands, or command
-details.
-
-@itemize @bullet
-@item
-The first character in an identifier must be a letter, @samp{#}, or
-@samp{@@}.  Some system identifiers begin with @samp{$}, but
-user-defined variables' names may not begin with @samp{$}.
-
-@item
-The remaining characters in the identifier must be letters, digits, or
-one of the following special characters: 
+Identifiers are names that typically specify variables, commands, or
+subcommands.  The first character in an identifier must be a letter,
+@samp{#}, or @samp{@@}.  The remaining characters in the identifier
+must be letters, digits, or one of the following special characters:
  
  @example
  
  @example
-.  _  $  #  @@
+@center @.  _  $  #  @@
  @end example
  
  @end example
  
-@item
-@cindex variable names
-@cindex names, variable
-Variable names may be up any length up to 64 bytes long.
-
-
-@item
  @cindex case-sensitivity
  @cindex case-sensitivity
-Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
-@code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
-representations of the same identifier.
+Identifiers may be any length, but only the first 64 bytes are
+significant.  Identifiers are not case-sensitive: @code{foobar},
+@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
+different representations of the same identifier.
  
  
-@item
-@cindex keywords
-Identifiers other than variable names may be abbreviated to their first
-3 characters if this abbreviation is unambiguous.  These identifiers are
-often called @dfn{keywords}.  (Unique abbreviations of 3 or more
-characters are also accepted: @samp{FRE}, @samp{FREQ}, and
-@samp{FREQUENCIES} are equivalent when the last is a keyword.)
-
-@item
-Whether an identifier is a keyword depends on the context.
-
-@item
-@cindex keywords, reserved
-@cindex reserved keywords
-Some keywords are reserved.  These keywords may not be used in any
-context besides those explicitly described in this manual.  The reserved
-keywords are:
+@cindex identifiers, reserved
+@cindex reserved identifiers
+Some identifiers are reserved.  Reserved identifiers may not be used
+in any context besides those explicitly described in this manual.  The
+reserved identifiers are:
  
  @example
  
  @example
-ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
+@center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
  @end example
  
  @end example
  
-@item 
-Since keywords are identifiers, all the rules for identifiers apply.
-Specifically, they must be delimited as are other identifiers:
-@code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
-variable name.
-@end itemize
+@item Keywords
+Keywords are a subclass of identifiers that form a fixed part of
+command syntax.  For example, command and subcommand names are
+keywords.  Keywords may be abbreviated to their first 3 characters if
+this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
+characters are also accepted: @samp{FRE}, @samp{FREQ}, and
+@samp{FREQUENCIES} are equivalent when the last is a keyword.)
  
  
-@cindex @samp{.}
-@cindex period
-@cindex variable names, ending with period
-@strong{Caution:} It is legal to end a variable name with a period, but
-@emph{don't do it!}  The variable name will be misinterpreted when it is
-the final token on a line: @code{FOO.} will be divided into two separate
-tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
-@xref{Commands, , Forming commands of tokens}.
+Reserved identifiers are always used as keywords.  Other identifiers
+may be used both as keywords and as user-defined identifiers, such as
+variable names.
  
  @item Numbers
  @cindex numbers
  @cindex integers
  @cindex reals
  
  @item Numbers
  @cindex numbers
  @cindex integers
  @cindex reals
-Numbers may be specified as integers or reals.  Integers are internally
-converted into reals.  Scientific notation is not supported.  Here are
-some examples of valid numbers:
+Numbers are expressed in decimal.  A decimal point is optional.
+Numbers may be expressed in scientific notation by adding @samp{e} and
+a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
+are some more examples of valid numbers:
  
  @example
  
  @example
-1234  3.14159265359  .707106781185  8945.
+-5  3.14159265359  1e100  -.707  8945.
  @end example
  
  @end example
  
-@strong{Caution:} The last example will be interpreted as two tokens,
-@samp{8945} and @samp{.}, if it is the last token on a line.
+Negative numbers are expressed with a @samp{-} prefix.  However, in
+situations where a literal @samp{-} token is expected, what appears to
+be a negative number is treated as @samp{-} followed by a positive
+number.
+
+No white space is allowed within a number token, except for horizontal
+white space between @samp{-} and the rest of the number.
+
+The last example above, @samp{8945.} is interpreted as two
+tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
+@xref{Commands, , Forming commands of tokens}.
  
  @item Strings
  @cindex strings
  @cindex @samp{'}
  @cindex @samp{"}
  @cindex case-sensitivity
  
  @item Strings
  @cindex strings
  @cindex @samp{'}
  @cindex @samp{"}
  @cindex case-sensitivity
-Strings are literal sequences of characters enclosed in pairs of single
-quotes (@samp{'}) or double quotes (@samp{"}).
-
-@itemize @bullet
-@item
-Whitespace and case of letters @emph{are} significant inside strings.
-@item
-Whitespace characters inside a string are not delimiters.
-@item
-To include single-quote characters in a string, enclose the string in
-double quotes.
-@item
-To include double-quote characters in a string, enclose the string in
-single quotes.
-@item
-It is not possible to put both single- and double-quote characters
-inside one string.
-@end itemize
-
-@item Hexstrings
-@cindex hexstrings
-Hexstrings are string variants that use hex digits to specify
-characters.
-
-@itemize @bullet
-@item
-A hexstring may be used anywhere that an ordinary string is allowed.
-
-@item
-@cindex @samp{X'}
-@cindex @samp{'}
-A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
-
-@cindex whitespace
-@item
-No whitespace is allowed between the initial @samp{X} and @samp{'}.
-
-@item
-Double quotes @samp{"} may be used in place of single quotes @samp{'} if
-done in both places.
-
-@item
-Each pair of hex digits is internally changed into a single character
-with the given value.
-
-@item
-If there is an odd number of hex digits, the missing last digit is
-assumed to be @samp{0}.
-
-@item
-@cindex portability
-@strong{Please note:} Use of hexstrings is nonportable because the same
-numeric values are associated with different glyphs by different
-operating systems.  Therefore, their use should be confined to syntax
-files that will not be widely distributed.
-
-@item
-@cindex characters, reserved
-@cindex 0
-@cindex whitespace
-@strong{Please note also:} The character with value 00 is reserved for
-internal use by PSPP.  Its use in strings causes an error and
-replacement with a blank space (in ASCII, hex 20, decimal 32).
-@end itemize
-
-@item Punctuation
-@cindex punctuation
-Punctuation separates tokens; punctuators are delimiters.  These are the
-punctuation characters:
-
-@example
-,  /  =  (  )
-@end example
-
-@item Operators
+Strings are literal sequences of characters enclosed in pairs of
+single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
+character used for quoting in the string, double it, @i{e.g.}@:
+@samp{'it''s an apostrophe'}.  White space and case of letters are
+significant inside strings.
+
+Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
+'c'} is equivalent to @samp{'abc'}.  So that a long string may be
+broken across lines, a line break may precede or follow, or both
+precede and follow, the @samp{+}.  (However, an entirely blank line
+preceding or following the @samp{+} is interpreted as ending the
+current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by @samp{x} or @samp{X}.
+Regardless of the syntax file or active dataset's encoding, the
+hexadecimal digits in the string are interpreted as Unicode characters
+in UTF-8 encoding.
+
+Individual Unicode code points may also be expressed by specifying the
+hexadecimal code point number in single or double quotes preceded by
+@samp{u} or @samp{U}.  For example, Unicode code point U+1D11E, the
+musical G clef character, could be expressed as @code{U'1D11E'}.
+Invalid Unicode code points (above U+10FFFF or in between U+D800 and
+U+DFFF) are not allowed.
+
+When strings are concatenated with @samp{+}, each segment's prefix is
+considered individually.  For example, @code{'The G clef symbol is:' +
+u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
+plain text string.
+
+@item Punctuators and Operators
+@cindex punctuators
  @cindex operators
  @cindex operators
-Operators describe mathematical operations.  Some operators are delimiters:
-
-@example
-(  )  +  -  *  /  **
-@end example
-
-Many of the above operators are also punctuators.  Punctuators are
-distinguished from operators by context. 
-
-The other operators are all reserved keywords.  None of these are
-delimiters:
+These tokens are the punctuators and operators:
  
  @example
  
  @example
-AND  EQ  GE  GT  LE  LT  NE  OR
+@center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
  @end example
  
  @end example
  
-@item Terminal Dot
-@cindex terminal dot
-@cindex dot, terminal
-@cindex period
-@cindex @samp{.}
-A period (@samp{.}) at the end of a line (except for whitespace) is one
-type of a @dfn{terminal dot}, although not every terminal dot is a
-period at the end of a line.  @xref{Commands, , Forming commands of
-tokens}.  A period is a terminal dot @emph{only}
-when it is at the end of a line; otherwise it is part of a
-floating-point number.  (A period outside a number in the middle of a
-line is an error.)
-
-@quotation
-@cindex terminal dot, changing
-@cindex dot, terminal, changing
-@strong{Please note:} The character used for the @dfn{terminal dot}
-can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}).  This
-is strongly discouraged, and throughout all the remainder of this
-manual it will be assumed that the default setting is in effect.
-@end quotation
-
+Most of these appear within the syntax of commands, but the period
+(@samp{.}) punctuator is used only at the end of a command.  It is a
+punctuator only as the last character on a line (except white space).
+When it is the last non-space character on a line, a period is not
+treated as part of another token, even if it would otherwise be part
+of, @i{e.g.}@:, an identifier or a floating-point number.
  @end table
  
  @end table
  
-@node Commands, Types of Commands, Tokens, Language
+@node Commands
  @section Forming commands of tokens
  
  @section Forming commands of tokens
  
-@cindex PSPP, command structure
+@cindex @pspp{}, command structure
  @cindex language, command structure
  @cindex commands, structure
  
  @cindex language, command structure
  @cindex commands, structure
  
-Most PSPP commands share a common structure, diagrammed below:
-
-@example
-@var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
-@end example
-
-@cindex @samp{[  ]}
-In the above, rather daunting, expression, pairs of square brackets
-(@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
-indicate parts of the syntax that vary from command to command.
-Ellipses (@samp{...}) indicate that the preceding part may be repeated
-an arbitrary number of times.  Let's pick apart what it says above:
-
-@itemize @bullet
-@cindex commands, names
-@item
-A command begins with a command name of one or more keywords, such as
-@cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}.  @var{cmd}
-may be abbreviated to its first word if that is unambiguous; each word
-in @var{cmd} may be abbreviated to a unique prefix of three or more
-characters as described above.
-
-@cindex subcommands
-@item
-The command name may be followed by one or more @dfn{subcommands}:
-
-@itemize @minus
-@item
-Each subcommand begins with a unique keyword, indicated by @var{sbc}
-above.  This is analogous to the command name.
-
-@item
-The subcommand name is optionally followed by an equals sign (@samp{=}).
-
-@item
-Some subcommands accept a series of one or more specifications
-(@var{spec}), optionally separated by commas.
-
-@item
-Each subcommand must be separated from the next (if any) by a forward
-slash (@samp{/}).
-@end itemize
-
-@cindex dot, terminal
-@cindex terminal dot
-@item
-Each command must be terminated with a @dfn{terminal dot}.  
-The terminal dot may be given one of three ways:
-
-@itemize @minus
-@item
-(most commonly) A period character at the very end of a line, as
-described above.
-
-@item
-(only if NULLINE is on: @xref{SET, , Setting user preferences}, for more
-details.)  A completely blank line.
-
-@item
-(in batch mode only) Any line that is not indented from the left side of
-the page causes a terminal dot to be inserted before that line.
-Therefore, each command begins with a line that is flush left, followed
-by zero or more lines that are indented one or more characters from the
-left margin.
-
-In batch mode, PSPP will ignore a plus sign, minus sign, or period
-(@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
-line.  Any of these characters as the first character on a line will
-begin a new command.  This allows for visual indentation of a command
-without that command being considered part of the previous command.
-
-PSPP is in batch mode when it is reading input from a file, rather
-than from an interactive user.  Note that the other forms of the
-terminal dot may also be used in batch mode.
-
-Sometimes, one encounters syntax files that are intended to be
-interpreted in interactive mode rather than batch mode (for instance,
-this can happen if a session log file is used directly as a syntax
-file).  When this occurs, use the @samp{-i} command line option to force
-interpretation in interactive mode (@pxref{Language control options}).
-@end itemize
-@end itemize
-
-PSPP ignores empty commands when they are generated by the above
-rules.  Note that, as a consequence of these rules, each command must
-begin on a new line.
-
-@node Types of Commands, Order of Commands, Commands, Language
+Most @pspp{} commands share a common structure.  A command begins with a
+command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
+CASES}.  The command name may be abbreviated to its first word, and
+each word in the command name may be abbreviated to its first three
+or more characters, where these abbreviations are unambiguous.
+
+The command name may be followed by one or more @dfn{subcommands}.
+Each subcommand begins with a subcommand name, which may be
+abbreviated to its first three letters.  Some subcommands accept a
+series of one or more specifications, which follow the subcommand
+name, optionally separated from it by an equals sign
+(@samp{=}). Specifications may be separated from each other
+by commas or spaces.  Each subcommand must be separated from the next (if any)
+by a forward slash (@samp{/}).
+
+There are multiple ways to mark the end of a command.  The most common
+way is to end the last line of the command with a period (@samp{.}) as
+described in the previous section (@pxref{Tokens}).  A blank line, or
+one that consists only of white space or comments, also ends a command.
+
+@node Syntax Variants
+@section Syntax Variants
+
+@cindex Batch syntax
+@cindex Interactive syntax
+
+There are three variants of command syntax, which vary only in how
+they detect the end of one command and the start of the next.
+
+In @dfn{interactive mode}, which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line
+ends a command.  A blank line also ends a command.
+
+In @dfn{batch mode}, an end-of-line period or a blank line also ends a
+command.  Additionally, it treats any line that has a non-blank
+character in the leftmost column as beginning a new command.  Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin
+a new command.  This is most useful in batch mode, in which the first
+line of a new command could not otherwise be indented, but it is
+accepted regardless of syntax mode.
+
+The default mode for reading commands from a file is @dfn{auto mode}.
+It is the same as batch mode, except that a line with a non-blank in
+the leftmost column only starts a new command if that line begins with
+the name of a @pspp{} command.  This correctly interprets most valid @pspp{}
+syntax files regardless of the syntax mode for which they are
+intended.
+
+The @option{--interactive} (or @option{-i}) or @option{--batch} (or
+@option{-b}) options set the syntax mode for files listed on the @pspp{}
+command line.  @xref{Main Options}, for more details.
+
+@node Types of Commands
  @section Types of Commands
  
  @section Types of Commands
  
-Commands in PSPP are divided roughly into six categories:
+Commands in @pspp{} are divided roughly into six categories:
  
  @table @strong
  @item Utility commands
  @cindex utility commands
  
  @table @strong
  @item Utility commands
  @cindex utility commands
-Set or display various global options that affect PSPP operations.
+Set or display various global options that affect @pspp{} operations.
  May appear anywhere in a syntax file.  @xref{Utilities, , Utility
  commands}.
  
  @item File definition commands
  @cindex file definition commands
  Give instructions for reading data from text files or from special
  May appear anywhere in a syntax file.  @xref{Utilities, , Utility
  commands}.
  
  @item File definition commands
  @cindex file definition commands
  Give instructions for reading data from text files or from special
-binary ``system files''.  Most of these commands discard any previous
-data or variables to replace it with the new data and
-variables.  At least one must appear before the first command in any of
+binary ``system files''.  Most of these commands replace any previous
+data or variables with new data or
+variables.  At least one file definition command must appear before the first command in any of
  the categories below.  @xref{Data Input and Output}.
  
  @item Input program commands
  @cindex input program commands
  the categories below.  @xref{Data Input and Output}.
  
  @item Input program commands
  @cindex input program commands
-Though rarely used, these provide powerful tools for reading data files
+Though rarely used, these provide tools for reading data files
  in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
  
  @item Transformations
  @cindex transformations
  Perform operations on data and write data to output files.  Transformations
  in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
  
  @item Transformations
  @cindex transformations
  Perform operations on data and write data to output files.  Transformations
-are not carried out until a procedure is executed.  
+are not carried out until a procedure is executed.
  
  @item Restricted transformations
  @cindex restricted transformations
  
  @item Restricted transformations
  @cindex restricted transformations
-Same as transformations for most purposes.  @xref{Order of Commands}, for a
-detailed description of the differences.
+Transformations that cannot appear in certain contexts.  @xref{Order
+of Commands}, for details.
  
  @item Procedures
  @cindex procedures
  Analyze data, writing results of analyses to the listing file.  Cause
  transformations specified earlier in the file to be performed.  In a
  more general sense, a @dfn{procedure} is any command that causes the
  
  @item Procedures
  @cindex procedures
  Analyze data, writing results of analyses to the listing file.  Cause
  transformations specified earlier in the file to be performed.  In a
  more general sense, a @dfn{procedure} is any command that causes the
-active file (the data) to be read.
+active dataset (the data) to be read.
  @end table
  
  @end table
  
-@node Order of Commands, Missing Observations, Types of Commands, Language
+@node Order of Commands
  @section Order of Commands
  @cindex commands, ordering
  @cindex order of commands
  
  @section Order of Commands
  @cindex commands, ordering
  @cindex order of commands
  
-PSPP does not place many restrictions on ordering of commands.
-The main restriction is that variables must be defined with one of the
-file-definition commands before they are otherwise referred to.
+@pspp{} does not place many restrictions on ordering of commands.  The
+main restriction is that variables must be defined before they are otherwise
+referenced.  This section describes the details of command ordering,
+but most users will have no need to refer to them.
  
  
-Of course, there are specific rules, for those who are interested.
-PSPP possesses five internal states, called initial, INPUT PROGRAM,
-FILE TYPE, transformation, and procedure states.  (Please note the
+@pspp{} possesses five internal states, called @dfn{initial}, @dfn{input-program}
+@dfn{file-type}, @dfn{transformation}, and @dfn{procedure} states.  (Please note the
  distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
  distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
-@emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
+@emph{commands} and the @dfn{input-program} and @dfn{file-type} @emph{states}.)
  
  
-PSPP starts up in the initial state.  Each successful completion
+@pspp{} starts in the initial state.  Each successful completion
  of a command may cause a state transition.  Each type of command has its
  own rules for state transitions:
  
  of a command may cause a state transition.  Each type of command has its
  own rules for state transitions:
  
@@ -413,7 +290,7 @@ own rules for state transitions:
  @item Utility commands
  @itemize @bullet
  @item
  @item Utility commands
  @itemize @bullet
  @item
-Legal in all states.
+Valid in any state.
  @item
  Do not cause state transitions.  Exception: when @cmd{N OF CASES}
  is executed in the procedure state, it causes a transition to the
  @item
  Do not cause state transitions.  Exception: when @cmd{N OF CASES}
  is executed in the procedure state, it causes a transition to the
@@ -423,50 +300,50 @@ transformation state.
  @item @cmd{DATA LIST}
  @itemize @bullet
  @item
  @item @cmd{DATA LIST}
  @itemize @bullet
  @item
-Legal in all states.
+Valid in any state.
  @item
  When executed in the initial or procedure state, causes a transition to
  @item
  When executed in the initial or procedure state, causes a transition to
-the transformation state.  
+the transformation state.
  @item
  @item
-Clears the active file if executed in the procedure or transformation
+Clears the active dataset if executed in the procedure or transformation
  state.
  @end itemize
  
  @item @cmd{INPUT PROGRAM}
  @itemize @bullet
  @item
  state.
  @end itemize
  
  @item @cmd{INPUT PROGRAM}
  @itemize @bullet
  @item
-Invalid in INPUT PROGRAM and FILE TYPE states.
+Invalid in input-program and file-type states.
  @item
  @item
-Causes a transition to the INPUT PROGRAM state.  
+Causes a transition to the intput-program state.
  @item
  @item
-Clears the active file.
+Clears the active dataset.
  @end itemize
  
  @item @cmd{FILE TYPE}
  @itemize @bullet
  @item
  @end itemize
  
  @item @cmd{FILE TYPE}
  @itemize @bullet
  @item
-Invalid in INPUT PROGRAM and FILE TYPE states.
+Invalid in intput-program and file-type states.
  @item
  @item
-Causes a transition to the FILE TYPE state.
+Causes a transition to the file-type state.
  @item
  @item
-Clears the active file.
+Clears the active dataset.
  @end itemize
  
  @item Other file definition commands
  @itemize @bullet
  @item
  @end itemize
  
  @item Other file definition commands
  @itemize @bullet
  @item
-Invalid in INPUT PROGRAM and FILE TYPE states.
+Invalid in input-program and file-type states.
  @item
  Cause a transition to the transformation state.
  @item
  @item
  Cause a transition to the transformation state.
  @item
-Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
+Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
  and @cmd{UPDATE}.
  @end itemize
  
  @item Transformations
  @itemize @bullet
  @item
  and @cmd{UPDATE}.
  @end itemize
  
  @item Transformations
  @itemize @bullet
  @item
-Invalid in initial and FILE TYPE states.
+Invalid in initial and file-type states.
  @item
  Cause a transition to the transformation state.
  @end itemize
  @item
  Cause a transition to the transformation state.
  @end itemize
@@ -474,7 +351,7 @@ Cause a transition to the transformation state.
  @item Restricted transformations
  @itemize @bullet
  @item
  @item Restricted transformations
  @itemize @bullet
  @item
-Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
+Invalid in initial, input-program, and file-type states.
  @item
  Cause a transition to the transformation state.
  @end itemize
  @item
  Cause a transition to the transformation state.
  @end itemize
@@ -482,24 +359,25 @@ Cause a transition to the transformation state.
  @item Procedures
  @itemize @bullet
  @item
  @item Procedures
  @itemize @bullet
  @item
-Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
+Invalid in initial, input-program, and file-type states.
  @item
  Cause a transition to the procedure state.
  @end itemize
  @end table
  
  @item
  Cause a transition to the procedure state.
  @end itemize
  @end table
  
-@node Missing Observations, Variables, Order of Commands, Language
+@node Missing Observations
  @section Handling missing observations
  @cindex missing values
  @cindex values, missing
  
  @section Handling missing observations
  @cindex missing values
  @cindex values, missing
  
-PSPP includes special support for unknown numeric data values.
+@pspp{} includes special support for unknown numeric data values.
  Missing observations are assigned a special value, called the
  @dfn{system-missing value}.  This ``value'' actually indicates the
  Missing observations are assigned a special value, called the
  @dfn{system-missing value}.  This ``value'' actually indicates the
-absence of value; it means that the actual value is unknown.  Procedures
+absence of a value; it means that the actual value is unknown.  Procedures
  automatically exclude from analyses those observations or cases that
  automatically exclude from analyses those observations or cases that
-have missing values.  Whether single observations or entire cases are
-excluded depends on the procedure.
+have missing values.  Details of missing value exclusion depend on the
+procedure and can often be controlled by the user; refer to
+descriptions of individual procedures for details.
  
  The system-missing value exists only for numeric variables.  String
  variables always have a defined value, even if it is only a string of
  
  The system-missing value exists only for numeric variables.  String
  variables always have a defined value, even if it is only a string of
@@ -508,34 +386,42 @@ spaces.
  Variables, whether numeric or string, can have designated
  @dfn{user-missing values}.  Every user-missing value is an actual value
  for that variable.  However, most of the time user-missing values are
  Variables, whether numeric or string, can have designated
  @dfn{user-missing values}.  Every user-missing value is an actual value
  for that variable.  However, most of the time user-missing values are
-treated in the same way as the system-missing value.  String variables
-that are wider than a certain width, usually 8 characters (depending on
-computer architecture), cannot have user-missing values.
+treated in the same way as the system-missing value.
  
  For more information on missing values, see the following sections:
  
  For more information on missing values, see the following sections:
-@ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
+@ref{Datasets}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
  documentation on individual procedures for information on how they
  handle missing values.
  
  documentation on individual procedures for information on how they
  handle missing values.
  
-@node Variables, Files, Missing Observations, Language
-@section Variables
-@cindex variables
+@node Datasets
+@section Datasets
+@cindex dataset
+@cindex variable
  @cindex dictionary
  
  @cindex dictionary
  
-Variables are the basic unit of data storage in PSPP.  All the
-variables in a file taken together, apart from any associated data, are
-said to form a @dfn{dictionary}.  
-Some details of variables are described in the sections below.
+@pspp{} works with data organized into @dfn{datasets}.  A dataset
+consists of a set of @dfn{variables}, which taken together are said to
+form a @dfn{dictionary}, and one or more @dfn{cases}, each of which
+has one value for each variable.
+
+At any given time @pspp{} has exactly one distinguished dataset, called
+the @dfn{active dataset}.  Most @pspp{} commands work only with the
+active dataset.  In addition to the active dataset, @pspp{} also supports
+any number of additional open datasets.  The @cmd{DATASET} commands
+can choose a new active dataset from among those that are open, as
+well as create and destroy datasets (@pxref{DATASET}).
+
+The sections below describe variables in more detail.
  
  @menu
  * Attributes::                  Attributes of variables.
  
  @menu
  * Attributes::                  Attributes of variables.
-* System Variables::            Variables automatically defined by PSPP.
+* System Variables::            Variables automatically defined by @pspp{}.
  * Sets of Variables::           Lists of variable names.
  * Sets of Variables::           Lists of variable names.
-* Input/Output Formats::        Input and output formats.
+* Input and Output Formats::    Input and output formats.
  * Scratch Variables::           Variables deleted by procedures.
  @end menu
  
  * Scratch Variables::           Variables deleted by procedures.
  @end menu
  
-@node Attributes, System Variables, Variables, Variables
+@node Attributes
  @subsection Attributes of Variables
  @cindex variables, attributes of
  @cindex attributes of variables
  @subsection Attributes of Variables
  @cindex variables, attributes of
  @cindex attributes of variables
@@ -543,9 +429,29 @@ Each variable has a number of attributes, including:
  
  @table @strong
  @item Name
  
  @table @strong
  @item Name
-This is an identifier.  Each variable must have a different name.
+An identifier, up to 64 bytes long.  Each variable must have a different name.
  @xref{Tokens}.
  
  @xref{Tokens}.
  
+Some system variable names begin with @samp{$}, but user-defined
+variables' names may not begin with @samp{$}.
+
+@cindex @samp{.}
+@cindex period
+@cindex variable names, ending with period
+The final character in a variable name should not be @samp{.}, because
+such an identifier will be misinterpreted when it is the final token
+on a line: @code{FOO.} is divided into two separate tokens,
+@samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.
+
+@cindex @samp{_}
+The final character in a variable name should not be @samp{_}, because
+some such identifiers are used for special purposes by @pspp{}
+procedures.
+
+As with all @pspp{} identifiers, variable names are not case-sensitive.
+@pspp{} capitalizes variable names on output the same way they were
+capitalized at their point of definition in the input.
+
  @cindex variables, type
  @cindex type of variables
  @item Type
  @cindex variables, type
  @cindex type of variables
  @item Type
@@ -556,15 +462,9 @@ Numeric or string.
  @item Width
  (string variables only) String variables with a width of 8 characters or
  fewer are called @dfn{short string variables}.  Short string variables
  @item Width
  (string variables only) String variables with a width of 8 characters or
  fewer are called @dfn{short string variables}.  Short string variables
-can be used in many procedures where @dfn{long string variables} (those
+may be used in a few contexts where @dfn{long string variables} (those
  with widths greater than 8) are not allowed.
  
  with widths greater than 8) are not allowed.
  
-@quotation
-@strong{Please note:} Certain systems may consider strings longer than 8
-characters to be short strings.  Eight characters represents a minimum
-figure for the maximum length of a short string.
-@end quotation
-
  @item Position
  Variables in the dictionary are arranged in a specific order.
  @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
  @item Position
  Variables in the dictionary are arranged in a specific order.
  @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
@@ -600,21 +500,60 @@ string.  @xref{VALUE LABELS}.
  Display width, format, and (for numeric variables) number of decimal
  places.  This attribute does not affect how data are stored, just how
  they are displayed.  Example: a width of 8, with 2 decimal places.
  Display width, format, and (for numeric variables) number of decimal
  places.  This attribute does not affect how data are stored, just how
  they are displayed.  Example: a width of 8, with 2 decimal places.
-@xref{PRINT FORMATS}.
+@xref{Input and Output Formats}.
  
  @cindex write format
  @item Write format
  
  @cindex write format
  @item Write format
-Similar to print format, but used by certain commands that are
-designed to write to binary files.  @xref{WRITE FORMATS}.
+Similar to print format, but used by the @cmd{WRITE} command
+(@pxref{WRITE}).
+
+@cindex measurement level
+@item Measurement level
+One of the following:
+
+@table @asis
+@item Nominal
+Each value of a nominal variable represents a distinct category.  The
+possible categories are finite and often have value labels.  The order
+of categories is not significant.  Political parties, US states, and
+yes/no choices are nominal.  Numeric and string variables can be
+nominal.
+
+@item Ordinal
+Ordinal variables also represent distinct categories, but their values
+are arranged according to some natural order.  Likert scales, e.g.@:
+from strongly disagree to strongly agree, are ordinal.  Data grouped
+into ranges, e.g.@: age groups or income groups, are ordinal.  Both
+numeric and string variables can be ordinal.  String values are
+ordered alphabetically, so letter grades from A to F will work as
+expected, but @code{poor}, @code{satisfactory}, @code{excellent} will
+not.
+
+@item Scale
+Scale variables are ones for which differences and ratios are
+meaningful.  These are often values which have a natural unit
+attached, such as age in years, income in dollars, or distance in
+miles.  Only numeric variables are scalar.
  @end table
  
  @end table
  
-@node System Variables, Sets of Variables, Attributes, Variables
-@subsection Variables Automatically Defined by PSPP
+@cindex custom attributes
+@item Custom attributes
+User-defined associations between names and values.  @xref{VARIABLE
+ATTRIBUTE}.
+
+@cindex variable role
+@item Role
+The intended role of a variable for use in dialog boxes in graphical
+user interfaces.  @xref{VARIABLE ROLE}.
+@end table
+
+@node System Variables
+@subsection Variables Automatically Defined by @pspp{}
  @cindex system variables
  @cindex variables, system
  
  There are seven system variables.  These are not like ordinary
  @cindex system variables
  @cindex variables, system
  
  There are seven system variables.  These are not like ordinary
-variables, as they are not stored in each case.  They can only be used
+variables because system variables are not always stored.  They can be used only
  in expressions.  These system variables, whose values and output formats
  cannot be modified, are described below.
  
  in expressions.  These system variables, whose values and output formats
  cannot be modified, are described below.
  
@@ -626,12 +565,17 @@ shuffled around.
  
  @cindex @code{$DATE}
  @item $DATE
  
  @cindex @code{$DATE}
  @item $DATE
-Date the PSPP process was started, in format A9, following the
-pattern @code{DD MMM YY}.
+Date the @pspp{} process was started, in format A9, following the
+pattern @code{DD-MMM-YY}.
+
+@cindex @code{$DATE11}
+@item $DATE11
+Date the @pspp{} process was started, in format A11, following the
+pattern @code{DD-MMM-YYYY}.
  
  @cindex @code{$JDATE}
  @item $JDATE
  
  @cindex @code{$JDATE}
  @item $JDATE
-Number of days between 15 Oct 1582 and the time the PSPP process
+Number of days between 15 Oct 1582 and the time the @pspp{} process
  was started.
  
  @cindex @code{$LENGTH}
  was started.
  
  @cindex @code{$LENGTH}
@@ -644,7 +588,7 @@ System missing value, in format F1.
  
  @cindex @code{$TIME}
  @item $TIME
  
  @cindex @code{$TIME}
  @item $TIME
-Number of seconds between midnight 14 Oct 1582 and the time the active file
+Number of seconds between midnight 14 Oct 1582 and the time the active dataset
  was read, in format F20.
  
  @cindex @code{$WIDTH}
  was read, in format F20.
  
  @cindex @code{$WIDTH}
@@ -652,411 +596,732 @@ was read, in format F20.
  Page width, in characters, in format F3.
  @end table
  
  Page width, in characters, in format F3.
  @end table
  
-@node Sets of Variables, Input/Output Formats, System Variables, Variables
+@node Sets of Variables
  @subsection Lists of variable names
  @subsection Lists of variable names
-@cindex TO convention
-@cindex convention, TO
+@cindex @code{TO} convention
+@cindex convention, @code{TO}
+
+To refer to a set of variables, list their names one after another.
+Optionally, their names may be separated by commas.  To include a
+range of variables from the dictionary in the list, write the name of
+the first and last variable in the range, separated by @code{TO}.  For
+instance, if the dictionary contains six variables with the names
+@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
+@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
+variables @code{X2}, @code{GOAL}, and @code{MET}.
  
  
-There are several ways to specify a set of variables:
+Commands that define variables, such as @cmd{DATA LIST}, give
+@code{TO} an alternate meaning.  With these commands, @code{TO} define
+sequences of variables whose names end in consecutive integers.  The
+syntax is two identifiers that begin with the same root and end with
+numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
+variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
+@code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
+variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
+@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM00013}.  The syntaxes
+@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
+
+After a set of variables has been defined with @cmd{DATA LIST} or
+another command with this method, the same set can be referenced on
+later commands using the same syntax.
  
  
-@enumerate
-@item
-(Most commonly.)  List the variable names one after another, optionally
-separating them by commas.
+@node Input and Output Formats
+@subsection Input and Output Formats
  
  
-@cindex @code{TO}
-@item
-(This method cannot be used on commands that define the dictionary, such
-as @cmd{DATA LIST}.)  The syntax is the names of two existing variables,
-separated by the reserved keyword @code{TO}.  The meaning is to include
-every variable in the dictionary between and including the variables
-specified.  For instance, if the dictionary contains six variables with
-the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
-@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
-variables @code{X2}, @code{GOAL}, and @code{MET}.
+@cindex formats
+An @dfn{input format} describes how to interpret the contents of an
+input field as a number or a string.  It might specify that the field
+contains an ordinary decimal number, a time or date, a number in binary
+or hexadecimal notation, or one of several other notations.  Input
+formats are used by commands such as @cmd{DATA LIST} that read data or
+syntax files into the @pspp{} active dataset.
+
+Every input format corresponds to a default @dfn{output format} that
+specifies the formatting used when the value is output later.  It is
+always possible to explicitly specify an output format that resembles
+the input format.  Usually, this is the default, but in cases where the
+input format is unfriendly to human readability, such as binary or
+hexadecimal formats, the default output format is an easier-to-read
+decimal format.
+
+Every variable has two output formats, called its @dfn{print format} and
+@dfn{write format}.  Print formats are used in most output contexts;
+write formats are used only by @cmd{WRITE} (@pxref{WRITE}).  Newly
+created variables have identical print and write formats, and
+@cmd{FORMATS}, the most commonly used command for changing formats
+(@pxref{FORMATS}), sets both of them to the same value as well.  Thus,
+most of the time, the distinction between print and write formats is
+unimportant.
+
+Input and output formats are specified to @pspp{} with
+a @dfn{format specification} of the
+form @subcmd{@var{TYPE}@var{w}} or @code{TYPE@var{w}.@var{d}}, where
+@var{TYPE} is one of the format types described later, @var{w} is a
+field width measured in columns, and @var{d} is an optional number of
+decimal places.  If @var{d} is omitted, a value of 0 is assumed.  Some
+formats do not allow a nonzero @var{d} to be specified.
+
+The following sections describe the input and output formats supported
+by @pspp{}.
  
  
-@item
-(This method can be used only on commands that define the dictionary,
-such as @cmd{DATA LIST}.)  It is used to define sequences of variables
-that end in consecutive integers.  The syntax is two identifiers that
-end in numbers.  This method is best illustrated with examples:
+@menu
+* Basic Numeric Formats::
+* Custom Currency Formats::
+* Legacy Numeric Formats::
+* Binary and Hexadecimal Numeric Formats::
+* Time and Date Formats::
+* Date Component Formats::
+* String Formats::
+@end menu
+
+@node Basic Numeric Formats
+@subsubsection Basic Numeric Formats
+
+@cindex numeric formats
+The basic numeric formats are used for input and output of real numbers
+in standard or scientific notation.  The following table shows an
+example of how each format displays positive and negative numbers with
+the default decimal point setting:
+
+@float
+@multitable {DOLLAR10.2} {@code{@tie{}$3,141.59}} {@code{-$3,141.59}}
+@headitem Format @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
+@item F8.2       @tab @code{@tie{}3141.59}   @tab @code{-3141.59}
+@item COMMA9.2   @tab @code{@tie{}3,141.59}  @tab @code{-3,141.59}
+@item DOT9.2     @tab @code{@tie{}3.141,59}  @tab @code{-3.141,59}
+@item DOLLAR10.2 @tab @code{@tie{}$3,141.59} @tab @code{-$3,141.59}
+@item PCT9.2     @tab @code{@tie{}3141.59%}  @tab @code{-3141.59%}
+@item E8.1       @tab @code{@tie{}3.1E+003}  @tab @code{-3.1E+003}
+@end multitable
+@end float
+
+On output, numbers in F format are expressed in standard decimal
+notation with the requested number of decimal places.  The other formats
+output some variation on this style:
  
  @itemize @bullet
  @item
  
  @itemize @bullet
  @item
-The syntax @code{X1 TO X5} defines 5 variables:
+Numbers in COMMA format are additionally grouped every three digits by
+inserting a grouping character.  The grouping character is ordinarily a
+comma, but it can be changed to a period (@pxref{SET DECIMAL}).
  
  
-@itemize @minus
-@item
-X1
-@item
-X2
  @item
  @item
-X3
-@item
-X4
-@item
-X5
-@end itemize
+DOT format is like COMMA format, but it interchanges the role of the
+decimal point and grouping characters.  That is, the current grouping
+character is used as a decimal point and vice versa.
  
  @item
  
  @item
-The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
+DOLLAR format is like COMMA format, but it prefixes the number with
+@samp{$}.
  
  
-@itemize @minus
-@item
-ITEM0008
-@item
-ITEM0009
-@item
-ITEM0010
  @item
  @item
-ITEM0011
-@item
-ITEM0012
-@item
-ITEM0013
-@end itemize
+PCT format is like F format, but adds @samp{%} after the number.
  
  @item
  
  @item
-Each of the syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3}
-are invalid, although for different reasons, which should be evident.
+The E format always produces output in scientific notation.
  @end itemize
  
  @end itemize
  
-Note that after a set of variables has been defined with @cmd{DATA LIST}
-or another command with this method, the same set can be referenced on
-later commands using the same syntax.
-
-@item
-The above methods can be combined, either one after another or delimited
-by commas.  For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
-is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
-is individually legal.
-@end enumerate
+On input, the basic numeric formats accept positive and numbers in
+standard decimal notation or scientific notation.  Leading and trailing
+spaces are allowed.  An empty or all-spaces field, or one that contains
+only a single period, is treated as the system missing value.
  
  
-@node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
-@subsection Input and Output Formats
+In scientific notation, the exponent may be introduced by a sign
+(@samp{+} or @samp{-}), or by one of the letters @samp{e} or @samp{d}
+(in uppercase or lowercase), or by a letter followed by a sign.  A
+single space may follow the letter or the sign or both.
  
  
-Data that PSPP inputs and outputs must have one of a number of formats.
-These formats are described, in general, by a format specification of
-the form @code{NAMEw.d}, where @var{name} is the
-format name and @var{w} is a field width.  @var{d} is the optional
-desired number of decimal places, if appropriate.  If @var{d} is not
-included then it is assumed to be 0.  Some formats do not allow @var{d}
-to be specified.
-
-When an input format is specified on @cmd{DATA LIST} or another
-command, then
-it is converted to an output format for the purposes of @cmd{PRINT}
-and other
-data output commands.  For most purposes, input and output formats are
-the same; the salient differences are described below.
-
-Below are listed the input and output formats supported by PSPP.  If an
-input format is mapped to a different output format by default, then
-that mapping is indicated with @result{}.  Each format has the listed
-bounds on input width (iw) and output width (ow).
-
-The standard numeric input and output formats are given in the following
-table:
+On fixed-format @cmd{DATA LIST} (@pxref{DATA LIST FIXED}) and in a few
+other contexts, decimals are implied when the field does not contain a
+decimal point.  In F6.5 format, for example, the field @code{314159} is
+taken as the value 3.14159 with implied decimals.  Decimals are never
+implied if an explicit decimal point is present or if scientific
+notation is used.
  
  
-@table @asis
-@item Fw.d: 1 <= iw,ow <= 40
-Standard decimal format with @var{d} decimal places.  If the number is
-too large to fit within the field width, it is expressed in scientific
-notation (@code{1.2+34}) if w >= 6, with always at least two digits in
-the exponent.  When used as an input format, scientific notation is
-allowed but an E or an F must be used to introduce the exponent.
-
-The default output format is the same as the input format, except if
-@var{d} > 1.  In that case the output @var{w} is always made to be at
-least 2 + @var{d}.
+E and F formats accept the basic syntax already described.  The other
+formats allow some additional variations:
  
  
-@item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
-For input this is equivalent to F format except that no E or F is
-require to introduce the exponent.  For output, produces scientific
-notation in the form @code{1.2+34}.  There are always at least two
-digits given in the exponent.
+@itemize @bullet
+@item
+COMMA, DOLLAR, and DOT formats ignore grouping characters within the
+integer part of the input field.  The identity of the grouping
+character depends on the format.
  
  
-The default output @var{w} is the largest of the input @var{w}, the
-input @var{d} + 7, and 10.  The default output @var{d} is the input
-@var{d}, but at least 3.
+@item
+DOLLAR format allows a dollar sign to precede the number.  In a negative
+number, the dollar sign may precede or follow the minus sign.
  
  
-@item COMMAw.d: 1 <= iw,ow <= 40
-Equivalent to F format, except that groups of three digits are
-comma-separated on output.  If the number is too large to express in the
-field width, then first commas are eliminated, then if there is still
-not enough space the number is expressed in scientific notation given
-that w >= 6.  Commas are allowed and ignored when this is used as an
-input format.  
+@item
+PCT format allows a percent sign to follow the number.
+@end itemize
  
  
-@item DOTw.d: 1 <= iw,ow <= 40
-Equivalent to COMMA format except that the roles of comma and decimal
-point are interchanged.  However: If SET /DECIMAL=DOT is in effect, then
-COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
-decimal point.
+All of the basic number formats have a maximum field width of 40 and
+accept no more than 16 decimal places, on both input and output.  Some
+additional restrictions apply:
  
  
-@item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
-Equivalent to COMMA format, except that the number is prefixed by a
-dollar sign (@samp{$}) if there is room.  On input the value is allowed
-to be prefixed by a dollar sign, which is ignored.
+@itemize @bullet
+@item
+As input formats, the basic numeric formats allow no more decimal places
+than the field width.  As output formats, the field width must be
+greater than the number of decimal places; that is, large enough to
+allow for a decimal point and the number of requested decimal places.
+DOLLAR and PCT formats must allow an additional column for @samp{$} or
+@samp{%}.
  
  
-The default output @var{w} is the input @var{w}, but at least 2.
+@item
+The default output format for a given input format increases the field
+width enough to make room for optional input characters.  If an input
+format calls for decimal places, the width is increased by 1 to make
+room for an implied decimal point.  COMMA, DOT, and DOLLAR formats also
+increase the output width to make room for grouping characters.  DOLLAR
+and PCT further increase the output field width by 1 to make room for
+@samp{$} or @samp{%}.  The increased output width is capped at 40, the
+maximum field width.
  
  
-@item PCTw.d: 2 <= iw,ow <= 40
-Equivalent to F format, except that the number is suffixed by a percent
-sign (@samp{%}) if there is room.  On input the value is allowed to be
-suffixed by a percent sign, which is ignored.
+@item
+The E format is exceptional.  For output, E format has a minimum width
+of 7 plus the number of decimal places.  The default output format for
+an E input format is an E format with at least 3 decimal places and
+thus a minimum width of 10.
+@end itemize
  
  
-The default output @var{w} is the input @var{w}, but at least 2.
+More details of basic numeric output formatting are given below:
  
  
-@item Nw.d: 1 <= iw,ow <= 40
-Only digits are allowed within the field width.  The decimal point is
-assumed to be @var{d} digits from the right margin.
+@itemize @bullet
+@item
+Output rounds to nearest, with ties rounded away from zero.  Thus, 2.5
+is output as @code{3} in F1.0 format, and -1.125 as @code{-1.13} in F5.1
+format.
  
  
-The default output format is F with the same @var{w} and @var{d}, except
-if @var{d} > 1.  In that case the output @var{w} is always made to be at
-least 2 + @var{d}.
+@item
+The system-missing value is output as a period in a field of spaces,
+placed in the decimal point's position, or in the rightmost column if no
+decimal places are requested.  A period is used even if the decimal
+point character is a comma.
  
  
-@item Zw.d @result{} F: 1 <= iw,ow <= 40
-Zoned decimal input.  If you need to use this then you know how.
+@item
+A number that does not fill its field is right-justified within the
+field.
  
  
-@item IBw.d @result{} F: 1 <= iw,ow <= 8
-Integer binary format.  The field is interpreted as a fixed-point
-positive or negative binary number in two's-complement notation.  The
-location of the decimal point is implied.  Endianness is the same as the
-host machine.
+@item
+A number is too large for its field causes decimal places to be dropped
+to make room.  If dropping decimals does not make enough room,
+scientific notation is used if the field is wide enough.  If a number
+does not fit in the field, even in scientific notation, the overflow is
+indicated by filling the field with asterisks (@samp{*}).
  
  
-The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
-with output @var{w} as 9 + input @var{d} and output @var{d} as input
-@var{d}.
+@item
+COMMA, DOT, and DOLLAR formats insert grouping characters only if space
+is available for all of them.  Grouping characters are never inserted
+when all decimal places must be dropped.  Thus, 1234.56 in COMMA5.2
+format is output as @samp{@tie{}1235} without a comma, even though there
+is room for one, because all decimal places were dropped.
  
  
-@item PIB @result{} F: 1 <= iw,ow <= 8
-Positive integer binary format.  The field is interpreted as a
-fixed-point positive binary number.  The location of the decimal point
-is implied.  Endianness is teh same as the host machine.
+@item
+DOLLAR or PCT format drop the @samp{$} or @samp{%} only if the number
+would not fit at all without it.  Scientific notation with @samp{$} or
+@samp{%} is preferred to ordinary decimal notation without it.
  
  
-The default output format follows the rules for IB format.
+@item
+Except in scientific notation, a decimal point is included only when
+it is followed by a digit.  If the integer part of the number being
+output is 0, and a decimal point is included, then the zero before the
+decimal point is dropped.
  
  
-@item Pw.d @result{} F: 1 <= iw,ow <= 16
-Binary coded decimal format.  Each byte from left to right, except the
-rightmost, represents two digits.  The upper nibble of each byte is more
-significant.  The upper nibble of the final byte is the least
-significant digit.  The lower nibble of the final byte is the sign; a
-value of D represents a negative sign and all other values are
-considered positive.  The decimal point is implied.
+In scientific notation, the number always includes a decimal point,
+even if it is not followed by a digit.
  
  
-The default output format follows the rules for IB format.
+@item
+A negative number includes a minus sign only in the presence of a
+nonzero digit: -0.01 is output as @samp{-.01} in F4.2 format but as
+@samp{@tie{}@tie{}.0} in F4.1 format.  Thus, a ``negative zero'' never
+includes a minus sign.
  
  
-@item PKw.d @result{} F: 1 <= iw,ow <= 16
-Positive binary code decimal format.  Same as P but the last byte is the
-same as the others.
+@item
+In negative numbers output in DOLLAR format, the dollar sign follows the
+negative sign.  Thus, -9.99 in DOLLAR6.2 format is output as
+@code{-$9.99}.
  
  
-The default output format follows the rules for IB format.
+@item
+In scientific notation, the exponent is output as @samp{E} followed by
+@samp{+} or @samp{-} and exactly three digits.  Numbers with magnitude
+less than 10**-999 or larger than 10**999 are not supported by most
+computers, but if they are supported then their output is considered
+to overflow the field and they are output as asterisks.
  
  
-@item RBw @result{} F: 2 <= iw,ow <= 8
+@item
+On most computers, no more than 15 decimal digits are significant in
+output, even if more are printed.  In any case, output precision cannot
+be any higher than input precision; few data sets are accurate to 15
+digits of precision.  Unavoidable loss of precision in intermediate
+calculations may also reduce precision of output.
  
  
-Binary C architecture-dependent ``double'' format.  For a standard
-IEEE754 implementation @var{w} should be 8.
+@item
+Special values such as infinities and ``not a number'' values are
+usually converted to the system-missing value before printing.  In a few
+circumstances, these values are output directly.  In fields of width 3
+or greater, special values are output as however many characters
+fit from @code{+Infinity} or @code{-Infinity} for infinities, from
+@code{NaN} for ``not a number,'' or from @code{Unknown} for other values
+(if any are supported by the system).  In fields under 3 columns wide,
+special values are output as asterisks.
+@end itemize
  
  
-The default output format follows the rules for IB format.
+@node Custom Currency Formats
+@subsubsection Custom Currency Formats
+
+@cindex currency formats
+The custom currency formats are closely related to the basic numeric
+formats, but they allow users to customize the output format.  The
+SET command configures custom currency formats, using the syntax
+@display
+SET CC@var{x}=@t{"}@var{string}@t{"}.
+@end display
+@noindent
+where @var{x} is A, B, C, D, or E, and @var{string} is no more than 16
+characters long.
+
+@var{string} must contain exactly three commas or exactly three periods
+(but not both), except that a single quote character may be used to
+``escape'' a following comma, period, or single quote.  If three commas
+are used, commas are used for grouping in output, and a period
+is used as the decimal point.  Uses of periods reverses these roles.
+
+The commas or periods divide @var{string} into four fields, called the
+@dfn{negative prefix}, @dfn{prefix}, @dfn{suffix}, and @dfn{negative
+suffix}, respectively.  The prefix and suffix are added to output
+whenever space is available.  The negative prefix and negative suffix
+are always added to a negative number when the output includes a nonzero
+digit.
+
+The following syntax shows how custom currency formats could be used to
+reproduce basic numeric formats:
  
  
-@item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
-PIB format encoded as textual hex digit pairs.  @var{w} must be even.
+@example
+@group
+SET CCA="-,,,".  /* Same as COMMA.
+SET CCB="-...".  /* Same as DOT.
+SET CCC="-,$,,". /* Same as DOLLAR.
+SET CCD="-,,%,". /* Like PCT, but groups with commas.
+@end group
+@end example
  
  
-The input width is mapped to a default output width as follows:
-2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
-12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
-decimal places.
+Here are some more examples of custom currency formats.  The final
+example shows how to use a single quote to escape a delimiter:
  
  
-@item RBHEXw @result{} F: 4 <= iw,ow <= 16
+@example
+@group
+SET CCA=",EUR,,-".   /* Euro.
+SET CCB="(,USD ,,)". /* US dollar.
+SET CCC="-.R$..".    /* Brazilian real.
+SET CCD="-,, NIS,".  /* Israel shekel.
+SET CCE="-.Rp'. ..". /* Indonesia Rupiah.
+@end group
+@end example
  
  
-RB format encoded as textual hex digits pairs.  @var{w} must be even.
+@noindent These formats would yield the following output:
  
  
-The default output format is F8.2.
+@float
+@multitable {CCD13.2} {@code{@tie{}@tie{}USD 3,145.59}} {@code{(USD 3,145.59)}}
+@headitem Format @tab @code{@tie{}3145.59}         @tab @code{-3145.59}
+@item CCA12.2 @tab @code{@tie{}EUR3,145.59}        @tab @code{EUR3,145.59-}
+@item CCB14.2 @tab @code{@tie{}@tie{}USD 3,145.59} @tab @code{(USD 3,145.59)}
+@item CCC11.2 @tab @code{@tie{}R$3.145,59}         @tab @code{-R$3.145,59}
+@item CCD13.2 @tab @code{@tie{}3,145.59 NIS}       @tab @code{-3,145.59 NIS}
+@item CCE10.0 @tab @code{@tie{}Rp. 3.146}          @tab @code{-Rp. 3.146}
+@end multitable
+@end float
  
  
-@item CCAw.d: 1 <= ow <= 40
-@itemx CCBw.d: 1 <= ow <= 40
-@itemx CCCw.d: 1 <= ow <= 40
-@itemx CCDw.d: 1 <= ow <= 40
-@itemx CCEw.d: 1 <= ow <= 40
+The default for all the custom currency formats is @samp{-,,,},
+equivalent to COMMA format.
  
  
-User-defined custom currency formats.  May not be used as an input
-format.  @xref{SET}, for more details.
-@end table
+@node Legacy Numeric Formats
+@subsubsection Legacy Numeric Formats
  
  
-The date and time numeric input and output formats accept a number of
-possible formats.  Before describing the formats themselves, some
-definitions of the elements that make up their formats will be helpful:
+The N and Z numeric formats provide compatibility with legacy file
+formats.  They have much in common:
  
  
-@table @dfn
-@item leader
-All formats accept an optional whitespace leader.
+@itemize @bullet
+@item
+Output is rounded to the nearest representable value, with ties rounded
+away from zero.
  
  
-@item day
-An integer between 1 and 31 representing the day of month.
+@item
+Numbers too large to display are output as a field filled with asterisks
+(@samp{*}).
  
  
-@item day-count
-An integer representing a number of days.
+@item
+The decimal point is always implicitly the specified number of digits
+from the right edge of the field, except that Z format input allows an
+explicit decimal point.
  
  
-@item date-delimiter
-One or more characters of whitespace or the following characters:
-@code{- / . ,}
+@item
+Scientific notation may not be used.
  
  
-@item month
-A month name in one of the following forms:
-@itemize @bullet
  @item
  @item
-An integer between 1 and 12.
+The system-missing value is output as a period in a field of spaces.
+The period is placed just to the right of the implied decimal point in
+Z format, or at the right end in N format or in Z format if no decimal
+places are requested.  A period is used even if the decimal point
+character is a comma.
+
  @item
  @item
-Roman numerals representing an integer between 1 and 12.
+Field width may range from 1 to 40.  Decimal places may range from 0 up
+to the field width, to a maximum of 16.
+
  @item
  @item
-At least the first three characters of an English month name (January,
-February, @dots{}).
+When a legacy numeric format used for input is converted to an output
+format, it is changed into the equivalent F format.  The field width is
+increased by 1 if any decimal places are specified, to make room for a
+decimal point.  For Z format, the field width is increased by 1 more
+column, to make room for a negative sign.  The output field width is
+capped at 40 columns.
  @end itemize
  
  @end itemize
  
-@item year
-An integer year number between 1582 and 19999, or between 1 and 199.
-Years between 1 and 199 will have 1900 added.
+@subsubheading N Format
  
  
-@item julian
-A single number with a year number in the first 2, 3, or 4 digits (as
-above) and the day number within the year in the last 3 digits.
+The N format supports input and output of fields that contain only
+digits.  On input, leading or trailing spaces, a decimal point, or any
+other non-digit character causes the field to be read as the
+system-missing value.  As a special exception, an N format used on
+@cmd{DATA LIST FREE} or @cmd{DATA LIST LIST} is treated as the
+equivalent F format.
  
  
-@item quarter
-An integer between 1 and 4 representing a quarter.
+On output, N pads the field on the left with zeros.  Negative numbers
+are output like the system-missing value.
  
  
-@item q-delimiter
-The letter @samp{Q} or @samp{q}.
+@subsubheading Z Format
  
  
-@item week
-An integer between 1 and 53 representing a week within a year.
+The Z format is a ``zoned decimal'' format used on IBM mainframes.  Z
+format encodes the sign as part of the final digit, which must be one of
+the following:
+@example
+0123456789
+@{ABCDEFGHI
+@}JKLMNOPQR
+@end example
+@noindent
+where the characters in each row represent digits 0 through 9 in order.
+Characters in the first two rows indicate a positive sign; those in the
+third indicate a negative sign.
+
+On output, Z fields are padded on the left with spaces.  On input,
+leading and trailing spaces are ignored.  Any character in an input
+field other than spaces, the digit characters above, and @samp{.} causes
+the field to be read as system-missing.
+
+The decimal point character for input and output is always @samp{.},
+even if the decimal point character is a comma (@pxref{SET DECIMAL}).
+
+Nonzero, negative values output in Z format are marked as negative even
+when no nonzero digits are output.  For example, -0.2 is output in Z1.0
+format as @samp{J}.  The ``negative zero'' value supported by most
+machines is output as positive.
+
+@node Binary and Hexadecimal Numeric Formats
+@subsubsection Binary and Hexadecimal Numeric Formats
+
+@cindex binary formats
+@cindex hexadecimal formats
+The binary and hexadecimal formats are primarily designed for
+compatibility with existing machine formats, not for human readability.
+All of them therefore have a F format as default output format.  Some of
+these formats are only portable between machines with compatible byte
+ordering (endianness) or floating-point format.
+
+Binary formats use byte values that in text files are interpreted as
+special control functions, such as carriage return and line feed.  Thus,
+data in binary formats should not be included in syntax files or read
+from data files with variable-length records, such as ordinary text
+files.  They may be read from or written to data files with fixed-length
+records.  @xref{FILE HANDLE}, for information on working with
+fixed-length records.
+
+@subsubheading P and PK Formats
+
+These are binary-coded decimal formats, in which every byte (except the
+last, in P format) represents two decimal digits.  The most-significant
+4 bits of the first byte is the most-significant decimal digit, the
+least-significant 4 bits of the first byte is the next decimal digit,
+and so on.
+
+In P format, the most-significant 4 bits of the last byte are the
+least-significant decimal digit.  The least-significant 4 bits represent
+the sign: decimal 15 indicates a negative value, decimal 13 indicates a
+positive value.
+
+Numbers are rounded downward on output.  The system-missing value and
+numbers outside representable range are output as zero.
+
+The maximum field width is 16.  Decimal places may range from 0 up to
+the number of decimal digits represented by the field.
+
+The default output format is an F format with twice the input field
+width, plus one column for a decimal point (if decimal places were
+requested).
+
+@subsubheading IB and PIB Formats
+
+These are integer binary formats.  IB reads and writes 2's complement
+binary integers, and PIB reads and writes unsigned binary integers.  The
+byte ordering is by default the host machine's, but SET RIB may be used
+to select a specific byte ordering for reading (@pxref{SET RIB}) and
+SET WIB, similarly, for writing (@pxref{SET WIB}).
+
+The maximum field width is 8.  Decimal places may range from 0 up to the
+number of decimal digits in the largest value representable in the field
+width.
+
+The default output format is an F format whose width is the number of
+decimal digits in the largest value representable in the field width,
+plus 1 if the format has decimal places.
+
+@subsubheading RB Format
+
+This is a binary format for real numbers.  By default it reads and
+writes the host machine's floating-point format, but SET RRB may be
+used to select an alternate floating-point format for reading
+(@pxref{SET RRB}) and SET WRB, similarly, for writing (@pxref{SET
+WRB}).
+
+The recommended field width depends on the floating-point format.
+NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use
+a field width of 8.  ISL, ISB, VF, and ZS formats should use a field
+width of 4.  Other field widths do not produce useful results.  The
+maximum field width is 8.  No decimal places may be specified.
  
  
-@item wk-delimiter
-The letters @samp{wk} in any case.
+The default output format is F8.2.
  
  
-@item time-delimiter
-At least one characters of whitespace or @samp{:} or @samp{.}.
+@subsubheading PIBHEX and RBHEX Formats
+
+These are hexadecimal formats, for reading and writing binary formats
+where each byte has been recoded as a pair of hexadecimal digits.
+
+A hexadecimal field consists solely of hexadecimal digits
+@samp{0}@dots{}@samp{9} and @samp{A}@dots{}@samp{F}.  Uppercase and
+lowercase are accepted on input; output is in uppercase.
+
+Other than the hexadecimal representation, these formats are equivalent
+to PIB and RB formats, respectively.  However, bytes in PIBHEX format
+are always ordered with the most-significant byte first (big-endian
+order), regardless of the host machine's native byte order or @pspp{}
+settings.
+
+Field widths must be even and between 2 and 16.  RBHEX format allows no
+decimal places; PIBHEX allows as many decimal places as a PIB format
+with half the given width.
+
+@node Time and Date Formats
+@subsubsection Time and Date Formats
+
+@cindex time formats
+@cindex date formats
+In @pspp{}, a @dfn{time} is an interval.  The time formats translate
+between human-friendly descriptions of time intervals and @pspp{}'s
+internal representation of time intervals, which is simply the number of
+seconds in the interval.  @pspp{} has three time formats:
+
+@float
+@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 01:31:17.01}}
+@headitem Time Format @tab Template                  @tab Example
+@item MTIME    @tab @code{MM:SS.ss}             @tab @code{91:17.01}
+@item TIME     @tab @code{hh:MM:SS.ss}          @tab @code{01:31:17.01}
+@item DTIME    @tab @code{DD HH:MM:SS.ss}       @tab @code{00 04:31:17.01}
+@end multitable
+@end float
+
+A @dfn{date} is a moment in the past or the future.  Internally, @pspp{}
+represents a date as the number of seconds since the @dfn{epoch},
+midnight, Oct. 14, 1582.  The date formats translate between
+human-readable dates and @pspp{}'s numeric representation of dates and
+times.  @pspp{} has several date formats:
+
+@float
+@multitable {Date Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
+@headitem Date Format @tab Template                  @tab Example
+@item DATE     @tab @code{dd-mmm-yyyy}          @tab @code{01-OCT-1978}
+@item ADATE    @tab @code{mm/dd/yyyy}           @tab @code{10/01/1978}
+@item EDATE    @tab @code{dd.mm.yyyy}           @tab @code{01.10.1978}
+@item JDATE    @tab @code{yyyyjjj}              @tab @code{1978274}
+@item SDATE    @tab @code{yyyy/mm/dd}           @tab @code{1978/10/01}
+@item QYR      @tab @code{q Q yyyy}             @tab @code{3 Q 1978}
+@item MOYR     @tab @code{mmm yyyy}             @tab @code{OCT 1978}
+@item WKYR     @tab @code{ww WK yyyy}           @tab @code{40 WK 1978}
+@item DATETIME @tab @code{dd-mmm-yyyy HH:MM:SS.ss} @tab @code{01-OCT-1978 04:31:17.01}
+@item YMDHMS   @tab @code{yyyy-mm-dd HH:MM:SS.ss} @tab @code{1978-01-OCT 04:31:17.01}
+@end multitable
+@end float
+
+The templates in the preceding tables describe how the time and date
+formats are input and output:
  
  
-@item hour
-An integer greater than 0 representing an hour.
+@table @code
+@item dd
+Day of month, from 1 to 31.  Always output as two digits.
+
+@item mm
+@itemx mmm
+Month.  In output, @code{mm} is output as two digits, @code{mmm} as the
+first three letters of an English month name (January, February,
+@dots{}).  In input, both of these formats, plus Roman numerals, are
+accepted.
+
+@item yyyy
+Year.  In output, DATETIME and YMDHMS always produce 4-digit years;
+other formats can produce a 2- or 4-digit year.  The century assumed
+for 2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).
+In output, a year outside the epoch causes the whole field to be
+filled with asterisks (@samp{*}).
+
+@item jjj
+Day of year (Julian day), from 1 to 366.  This is exactly three digits
+giving the count of days from the start of the year.  January 1 is
+considered day 1.
+
+@item q
+Quarter of year, from 1 to 4.  Quarters start on January 1, April 1,
+July 1, and October 1.
+
+@item ww
+Week of year, from 1 to 53.  Output as exactly two digits.  January 1 is
+the first day of week 1.
+
+@item DD
+Count of days, which may be positive or negative.  Output as at least
+two digits.
+
+@item hh
+Count of hours, which may be positive or negative.  Output as at least
+two digits.
+
+@item HH
+Hour of day, from 0 to 23.  Output as exactly two digits.
+
+@item MM
+In MTIME, count of minutes, which may be positive or negative.  Output
+as at least two digits.
+
+In other formats, minute of hour, from 0 to 59.  Output as exactly two
+digits.
+
+@item SS.ss
+Seconds within minute, from 0 to 59.  The integer part is output as
+exactly two digits.  On output, seconds and fractional seconds may or
+may not be included, depending on field width and decimal places.  On
+input, seconds and fractional seconds are optional.  The DECIMAL setting
+controls the character accepted and displayed as the decimal point
+(@pxref{SET DECIMAL}).
+@end table
  
  
-@item minute
-An integer between 0 and 59 representing a minute within an hour.
+For output, the date and time formats use the delimiters indicated in
+the table.  For input, date components may be separated by spaces or by
+one of the characters @samp{-}, @samp{/}, @samp{.}, or @samp{,}, and
+time components may be separated by spaces or @samp{:}.  On
+input, the @samp{Q} separating quarter from year and the @samp{WK}
+separating week from year may be uppercase or lowercase, and the spaces
+around them are optional.
+
+On input, all time and date formats accept any amount of leading and
+trailing white space.
+
+The maximum width for time and date formats is 40 columns.  Minimum
+input and output width for each of the time and date formats is shown
+below:
+
+@float
+@multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
+@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option
+@item DATE @tab 8 @tab 9 @tab 4-digit year
+@item ADATE @tab 8 @tab 8 @tab 4-digit year
+@item EDATE @tab 8 @tab 8 @tab 4-digit year
+@item JDATE @tab 5 @tab 5 @tab 4-digit year
+@item SDATE @tab 8 @tab 8 @tab 4-digit year
+@item QYR @tab 4 @tab 6 @tab 4-digit year
+@item MOYR @tab 6 @tab 6 @tab 4-digit year
+@item WKYR @tab 6 @tab 8 @tab 4-digit year
+@item DATETIME @tab 17 @tab 17 @tab seconds
+@item YMDHMS @tab 12 @tab 16 @tab seconds
+@item MTIME @tab 4 @tab 5
+@item TIME @tab 5 @tab 5 @tab seconds
+@item DTIME @tab 8 @tab 8 @tab seconds
+@end multitable
+@end float
+@noindent
+In the table, ``Option'' describes what increased output width enables:
  
  
-@item opt-second
-Optionally, a time-delimiter followed by a real number representing a
-number of seconds.
+@table @asis
+@item 4-digit year
+A field 2 columns wider than the minimum includes a 4-digit year.
+(DATETIME and YMDHMS formats always include a 4-digit year.)
+
+@item seconds
+A field 3 columns wider than the minimum includes seconds as well as
+minutes.  A field 5 columns wider than minimum, or more, can also
+include a decimal point and fractional seconds (but no more than allowed
+by the format's decimal places).
+@end table
  
  
-@item hour24
-An integer between 0 and 23 representing an hour within a day.
+For the time and date formats, the default output format is the same as
+the input format, except that @pspp{} increases the field width, if
+necessary, to the minimum allowed for output.
  
  
-@item weekday
-At least the first two characters of an English day word.
+Time or dates narrower than the field width are right-justified within
+the field.
  
  
-@item spaces
-Any amount or no amount of whitespace.
+When a time or date exceeds the field width, characters are trimmed from
+the end until it fits.  This can occur in an unusual situation, @i{e.g.}@:
+with a year greater than 9999 (which adds an extra digit), or for a
+negative value on MTIME, TIME, or DTIME (which adds a leading minus sign).
  
  
-@item sign
-An optional positive or negative sign.
+@c What about out-of-range values?
  
  
-@item trailer
-All formats accept an optional whitespace trailer.
-@end table
+The system-missing value is output as a period at the right end of the
+field.
  
  
-The date input formats are strung together from the above pieces.  On
-output, the date formats are always printed in a single canonical
-manner, based on field width.  The date input and output formats are
-described below:
+@node Date Component Formats
+@subsubsection Date Component Formats
  
  
-@table @asis
-@item DATEw: 9 <= iw,ow <= 40
-Date format. Input format: leader + day + date-delimiter +
-month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
-@var{w} < 11, DD-MMM-YYYY otherwise.
-
-@item EDATEw: 8 <= iw,ow <= 40
-European date format.  Input format same as DATE.  Output format:
-DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
-
-@item SDATEw: 8 <= iw,ow <= 40
-Standard date format. Input format: leader + year + date-delimiter +
-month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
-@var{w} < 10, YYYY/MM/DD otherwise.
-
-@item ADATEw: 8 <= iw,ow <= 40
-American date format.  Input format: leader + month + date-delimiter +
-day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
-@var{w} < 10, MM/DD/YYYY otherwise.
-
-@item JDATEw: 5 <= iw,ow <= 40
-Julian date format.  Input format: leader + julian + trailer.  Output
-format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
-
-@item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
-Quarter/year format.  Input format: leader + quarter + q-delimiter +
-year + trailer.  Output format: @samp{Q Q YY}, where the first
-@samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
-YYYY} otherwise.
-
-@item MOYRw: 6 <= iw,ow <= 40
-Month/year format.  Input format: leader + month + date-delimiter + year
-+ trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
-YYYY} otherwise.
-
-@item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
-Week/year format.  Input format: leader + week + wk-delimiter + year +
-trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
-YYYY} otherwise.
-
-@item DATETIMEw.d: 17 <= iw,ow <= 40
-Date and time format.  Input format: leader + day + date-delimiter +
-month + date-delimiter + yaer + time-delimiter + hour24 + time-delimiter
-+ minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
-@var{w} > 19 then seconds @samp{:SS} is added.  If @var{w} > 22 and
-@var{d} > 0 then fractional seconds @samp{.SS} are added.
-
-@item TIMEw.d: 5 <= iw,ow <= 40
-Time format.  Input format: leader + sign + spaces + hour +
-time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
-Seconds and fractional seconds are available with @var{w} of at least 8
-and 10, respectively.
-
-@item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
-Time format with day count.  Input format: leader + sign + spaces +
-day-count + time-delimiter + hour + time-delimiter + minute +
-opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
-seconds are available with @var{w} of at least 8 and 10, respectively.
-
-@item WKDAYw: 2 <= iw,ow <= 40
-A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
-leader + weekday + trailer.  Output format: as many characters, in all
-capital letters, of the English name of the weekday as will fit in the
-field width.
-
-@item MONTHw: 3 <= iw,ow <= 40
-A month as a number between 1 and 12, where 1 is January.  Input format:
-leader + month + trailer.  Output format: as many character, in all
-capital letters, of the English name of the month as will fit in the
-field width.
-@end table
+The WKDAY and MONTH formats provide input and output for the names of
+weekdays and months, respectively.
  
  
-There are only two formats that may be used with string variables:
+On output, these formats convert a number between 1 and 7, for WKDAY, or
+between 1 and 12, for MONTH, into the English name of a day or month,
+respectively.  If the name is longer than the field, it is trimmed to
+fit.  If the name is shorter than the field, it is padded on the right
+with spaces.  Values outside the valid range, and the system-missing
+value, are output as all spaces.
  
  
-@table @asis
-@item Aw: 1 <= iw <= 255, 1 <= ow <= 254
-The entire field is treated as a string value.
+On input, English weekday or month names (in uppercase or lowercase) are
+converted back to their corresponding numbers.  Weekday and month names
+may be abbreviated to their first 2 or 3 letters, respectively.
  
  
-@item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
-The field is composed of characters in a string encoded as textual hex
-digit pairs.
+The field width may range from 2 to 40, for WKDAY, or from 3 to 40, for
+MONTH.  No decimal places are allowed.
  
  
-The default output @var{w} is half the input @var{w}.
-@end table
+The default output format is the same as the input format.
+
+@node String Formats
+@subsubsection String Formats
+
+@cindex string formats
+The A and AHEX formats are the only ones that may be assigned to string
+variables.  Neither format allows any decimal places.
+
+In A format, the entire field is treated as a string value.  The field
+width may range from 1 to 32,767, the maximum string width.  The default
+output format is the same as the input format.
+
+In AHEX format, the field is composed of characters in a string encoded
+as hex digit pairs.  On output, hex digits are output in uppercase; on
+input, uppercase and lowercase are both accepted.  The default output
+format is A format with half the input width.
  
  
-@node Scratch Variables,  , Input/Output Formats, Variables
+@node Scratch Variables
  @subsection Scratch Variables
  
  @subsection Scratch Variables
  
+@cindex scratch variables
  Most of the time, variables don't retain their values between cases.
  Most of the time, variables don't retain their values between cases.
-Instead, either they're being read from a data file or the active file,
+Instead, either they're being read from a data file or the active dataset,
  in which case they assume the value read, or, if created with
  @cmd{COMPUTE} or
  another transformation, they're initialized to the system-missing value
  in which case they assume the value read, or, if created with
  @cmd{COMPUTE} or
  another transformation, they're initialized to the system-missing value
@@ -1065,21 +1330,21 @@ or to blanks, depending on type.
  However, sometimes it's useful to have a variable that keeps its value
  between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
  use a @dfn{scratch variable}.  Scratch variables are variables whose
  However, sometimes it's useful to have a variable that keeps its value
  between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
  use a @dfn{scratch variable}.  Scratch variables are variables whose
-names begin with an octothorpe (@samp{#}).  
+names begin with an octothorpe (@samp{#}).
  
  Scratch variables have the same properties as variables left with
  
  Scratch variables have the same properties as variables left with
-@cmd{LEAVE}:
-they retain their values between cases, and for the first case they are
-initialized to 0 or blanks.  They have the additional property that they
-are deleted before the execution of any procedure.  For this reason,
-scratch variables can't be used for analysis.  To obtain the same
-effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
-value into an ordinary variable, then analysis that variable.
-
-@node Files, BNF, Variables, Language
-@section Files Used by PSPP
-
-PSPP makes use of many files each time it runs.  Some of these it
+@cmd{LEAVE}: they retain their values between cases, and for the first
+case they are initialized to 0 or blanks.  They have the additional
+property that they are deleted before the execution of any procedure.
+For this reason, scratch variables can't be used for analysis.  To use
+a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
+to copy its value into an ordinary variable, then use that ordinary
+variable in the analysis.
+
+@node Files
+@section Files Used by @pspp{}
+
+@pspp{} makes use of many files each time it runs.  Some of these it
  reads, some it writes, some it creates.  Here is a table listing the
  most important of these files:
  
  reads, some it writes, some it creates.  Here is a table listing the
  most important of these files:
  
@@ -1090,44 +1355,91 @@ most important of these files:
  @cindex syntax file
  @item command file
  @itemx syntax file
  @cindex syntax file
  @item command file
  @itemx syntax file
-These names (synonyms) refer to the file that contains instructions to
-PSPP that tell it what to do.  The syntax file's name is specified on
-the PSPP command line.  Syntax files can also be pulled in with
+These names (synonyms) refer to the file that contains instructions
+that tell @pspp{} what to do.  The syntax file's name is specified on
+the @pspp{} command line.  Syntax files can also be read with
  @cmd{INCLUDE} (@pxref{INCLUDE}).
  
  @cindex file, data
  @cindex data file
  @item data file
  @cmd{INCLUDE} (@pxref{INCLUDE}).
  
  @cindex file, data
  @cindex data file
  @item data file
-Data files contain raw data in ASCII format suitable for being read in
-by @cmd{DATA LIST}.  Data can be embedded in the syntax
-file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
-syntax file a data file too.
+Data files contain raw data in text or binary format.  Data can also
+be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
  
  @cindex file, output
  @cindex output file
  @item listing file
  
  @cindex file, output
  @cindex output file
  @item listing file
-One or more output files are created by PSPP each time it is
+One or more output files are created by @pspp{} each time it is
  run.  The output files receive the tables and charts produced by
  statistical procedures.  The output files may be in any number of formats,
  run.  The output files receive the tables and charts produced by
  statistical procedures.  The output files may be in any number of formats,
-depending on how PSPP is configured.
-
-@cindex active file
-@cindex file, active
-@item active file
-The active file is the ``file'' on which all PSPP procedures
-are performed.  The active file contains variable definitions and
-cases.  The active file is not necessarily a disk file: it is stored
-in memory if there is room.
+depending on how @pspp{} is configured.
+
+@cindex system file
+@cindex file, system
+@item system file
+System files are binary files that store a dictionary and a set of
+cases.  @cmd{GET} and @cmd{SAVE} read and write system files.
+
+@cindex portable file
+@cindex file, portable
+@item portable file
+Portable files are files in a text-based format that store a dictionary
+and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
+portable files.
  @end table
  
  @end table
  
-@node BNF,  , Files, Language
+@node File Handles
+@section File Handles
+@cindex file handles
+
+A @dfn{file handle} is a reference to a data file, system file, or
+portable file.  Most often, a file handle is specified as the
+name of a file as a string, that is, enclosed within @samp{'} or
+@samp{"}.
+
+A file name string that begins or ends with @samp{|} is treated as the
+name of a command to pipe data to or from.  You can use this feature
+to read data over the network using a program such as @samp{curl}
+(@i{e.g.}@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
+read compressed data from a file using a program such as @samp{zcat}
+(@i{e.g.}@: @code{GET '|zcat mydata.sav.gz'}), and for many other
+purposes.
+
+@pspp{} also supports declaring named file handles with the @cmd{FILE
+HANDLE} command.  This command associates an identifier of your choice
+(the file handle's name) with a file.  Later, the file handle name can
+be substituted for the name of the file.  When @pspp{} syntax accesses a
+file multiple times, declaring a named file handle simplifies updating
+the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
+also required to read data files in binary formats.  @xref{FILE HANDLE},
+for more information.
+
+In some circumstances, @pspp{} must distinguish whether a file handle
+refers to a system file or a portable file.  When this is necessary to
+read a file, @i{e.g.}@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
+@pspp{} uses the file's contents to decide.  In the context of writing a
+file, @i{e.g.}@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, @pspp{}
+decides based on the file's name: if it ends in @samp{.por} (with any
+capitalization), then @pspp{} writes a portable file; otherwise, @pspp{}
+writes a system file.
+
+INLINE is reserved as a file handle name.  It refers to the ``data
+file'' embedded into the syntax file between @cmd{BEGIN DATA} and
+@cmd{END DATA}.  @xref{BEGIN DATA}, for more information.
+
+The file to which a file handle refers may be reassigned on a later
+@cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
+HANDLE}.  @xref{CLOSE FILE HANDLE}, for
+more information.
+
+@node BNF
  @section Backus-Naur Form
  @cindex BNF
  @cindex Backus-Naur Form
  @cindex command syntax, description of
  @cindex description of command syntax
  
  @section Backus-Naur Form
  @cindex BNF
  @cindex Backus-Naur Form
  @cindex command syntax, description of
  @cindex description of command syntax
  
-The syntax of some parts of the PSPP language is presented in this
+The syntax of some parts of the @pspp{} language is presented in this
  manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
  following table describes BNF:
  
  manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
  following table describes BNF:
  
@@ -1135,9 +1447,9 @@ following table describes BNF:
  @cindex keywords
  @cindex terminals
  @item
  @cindex keywords
  @cindex terminals
  @item
-Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
+Words in all-uppercase are @pspp{} keyword tokens.  In BNF, these are
  often called @dfn{terminals}.  There are some special terminals, which
  often called @dfn{terminals}.  There are some special terminals, which
-are actually written in lowercase for clarity:
+are written in lowercase for clarity:
  
  @table @asis
  @cindex @code{number}
  
  @table @asis
  @cindex @code{number}
@@ -1162,11 +1474,9 @@ A single variable name.
  Operators and punctuators.
  
  @cindex @code{.}
  Operators and punctuators.
  
  @cindex @code{.}
-@cindex terminal dot
-@cindex dot, terminal
  @item @code{.}
  @item @code{.}
-The terminal dot.  This is not necessarily an actual dot in the syntax
-file: @xref{Commands}, for more details.
+The end of the command.  This is not necessarily an actual dot in the
+syntax file (@pxref{Commands}).
  @end table
  
  @item
  @end table
  
  @item
@@ -1188,7 +1498,6 @@ An expression.  @xref{Expressions}, for details.
  @end table
  
  @item
  @end table
  
  @item
-@cindex @code{::=}
  @cindex ``is defined as''
  @cindex productions
  @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
  @cindex ``is defined as''
  @cindex productions
  @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
@@ -1214,4 +1523,3 @@ The first nonterminal defined in a set of productions is called the
  @dfn{start symbol}.  The start symbol defines the entire syntax for
  that command.
  @end itemize
  @dfn{start symbol}.  The start symbol defines the entire syntax for
  that command.
  @end itemize
-@setfilename ignored