Automatically infer variables' measurement level from format and data.

diff --git a/doc/language.texi b/doc/language.texi

index 6226741463a4086a1d7021e3c159f4043918c0ab..7b750eb7ae75ac3acf8053184aaacded62fc97a2 100644
--- a/doc/language.texi
+++ b/doc/language.texi
@@ -1,16 +1,19 @@
+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
  @node Language
-@chapter The PSPP language
-@cindex language, PSPP
-@cindex PSPP, language
+@chapter The @pspp{} language
+@cindex language, @pspp{}
+@cindex @pspp{}, language
  
-@quotation
-@strong{Please note:} PSPP is not even close to completion.
-Only a few statistical procedures are implemented.  PSPP
-is a work in progress.
-@end quotation
-
-This chapter discusses elements common to many PSPP commands.
-Later chapters will describe individual commands in detail.
+This chapter discusses elements common to many @pspp{} commands.
+Later chapters describe individual commands in detail.
  
  @menu
  * Tokens::                      Characters combine to form tokens.
@@ -19,8 +22,8 @@ Later chapters will describe individual commands in detail.
  * Types of Commands::           Commands come in several flavors.
  * Order of Commands::           Commands combine to form syntax files.
  * Missing Observations::        Handling missing observations.
-* Variables::                   The unit of data storage.
-* Files::                       Files used by PSPP.
+* Datasets::                    Data organization.
+* Files::                       Files used by @pspp{}.
  * File Handles::                How files are named.
  * BNF::                         How command syntax is described.
  @end menu
@@ -33,10 +36,10 @@ Later chapters will describe individual commands in detail.
  @cindex tokens
  @cindex lexical analysis
  
-PSPP divides most syntax file lines into series of short chunks
+@pspp{} divides most syntax file lines into series of short chunks
  called @dfn{tokens}.
  Tokens are then grouped to form commands, each of which tells
-PSPP to take some action---read in data, write out data, perform
+@pspp{} to take some action---read in data, write out data, perform
  a statistical procedure, etc.  Each type of token is
  described below.
  
@@ -101,7 +104,7 @@ number.
  No white space is allowed within a number token, except for horizontal
  white space between @samp{-} and the rest of the number.
  
-The last example above, @samp{8945.} will be interpreted as two
+The last example above, @samp{8945.} is interpreted as two
  tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
  @xref{Commands, , Forming commands of tokens}.
  
@@ -112,32 +115,34 @@ tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
  @cindex case-sensitivity
  Strings are literal sequences of characters enclosed in pairs of
  single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
-character used for quoting in the string, double it, e.g.@:
+character used for quoting in the string, double it, @i{e.g.}@:
  @samp{'it''s an apostrophe'}.  White space and case of letters are
  significant inside strings.
  
  Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
-'c'} is equivalent to @samp{'abc'}.  Concatenation is useful for
-splitting a single string across multiple source lines. The maximum
-length of a string, after concatenation, is 255 characters.
-
-Strings may also be expressed as hexadecimal, octal, or binary
-character values by prefixing the initial quote character by @samp{X},
-@samp{O}, or @samp{B} or their lowercase equivalents.  Each pair,
-triplet, or octet of characters, according to the radix, is
-transformed into a single character with the given value.  If there is
-an incomplete group of characters, the missing final digits are
-assumed to be @samp{0}.  These forms of strings are nonportable
-because numeric values are associated with different characters by
-different operating systems.  Therefore, their use should be confined
-to syntax files that will not be widely distributed.
-
-@cindex characters, reserved
-@cindex 0
-@cindex white space
-The character with value 00 is reserved for
-internal use by PSPP.  Its use in strings causes an error and
-replacement by a space character.
+'c'} is equivalent to @samp{'abc'}.  So that a long string may be
+broken across lines, a line break may precede or follow, or both
+precede and follow, the @samp{+}.  (However, an entirely blank line
+preceding or following the @samp{+} is interpreted as ending the
+current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by @samp{x} or @samp{X}.
+Regardless of the syntax file or active dataset's encoding, the
+hexadecimal digits in the string are interpreted as Unicode characters
+in UTF-8 encoding.
+
+Individual Unicode code points may also be expressed by specifying the
+hexadecimal code point number in single or double quotes preceded by
+@samp{u} or @samp{U}.  For example, Unicode code point U+1D11E, the
+musical G clef character, could be expressed as @code{U'1D11E'}.
+Invalid Unicode code points (above U+10FFFF or in between U+D800 and
+U+DFFF) are not allowed.
+
+When strings are concatenated with @samp{+}, each segment's prefix is
+considered individually.  For example, @code{'The G clef symbol is:' +
+u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
+plain text string.
  
  @item Punctuators and Operators
  @cindex punctuators
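
Annotation on the string rules added in the hunk above: the following Python sketch is a reading aid, not PSPP's lexer. The helper names `decode_segment` and `concat` are hypothetical. It assumes the digits after an `x` prefix are the bytes of a UTF-8 string, the digits after a `u` prefix name a single code point, and each concatenated segment is decoded according to its own prefix.

```python
def decode_segment(prefix: str, body: str) -> str:
    """Decode one quoted segment given its prefix: '', 'x'/'X', or 'u'/'U'."""
    kind = prefix.lower()
    if kind == "":
        # Plain string: the body is taken literally.
        return body
    if kind == "x":
        # Hexadecimal string: digits are bytes, interpreted as UTF-8.
        return bytes.fromhex(body).decode("utf-8")
    if kind == "u":
        # Single Unicode code point given in hexadecimal.
        code_point = int(body, 16)
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"invalid Unicode code point U+{code_point:04X}")
        return chr(code_point)
    raise ValueError(f"unknown string prefix {prefix!r}")


def concat(*segments: tuple[str, str]) -> str:
    """Join (prefix, body) segments, decoding each one individually."""
    return "".join(decode_segment(prefix, body) for prefix, body in segments)


# 'The G clef symbol is:' + u"1d11e" + "."
text = concat(("", "The G clef symbol is:"), ("u", "1d11e"), ("", "."))
```

Under these assumptions, `decode_segment("X", "F09D849E")` yields the same G clef character as `decode_segment("U", "1D11E")`, because F0 9D 84 9E is the UTF-8 encoding of U+1D11E.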
@@ -153,22 +158,17 @@ Most of these appear within the syntax of commands, but the period
  punctuator only as the last character on a line (except white space).
  When it is the last non-space character on a line, a period is not
  treated as part of another token, even if it would otherwise be part
-of, e.g.@:, an identifier or a floating-point number.
-
-Actually, the character that ends a command can be changed with
-@cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
-doing so.  Throughout the remainder of this manual we will assume that
-the default setting is in effect.
+of, @i{e.g.}@:, an identifier or a floating-point number.
  @end table
  
  @node Commands
  @section Forming commands of tokens
  
-@cindex PSPP, command structure
+@cindex @pspp{}, command structure
  @cindex language, command structure
  @cindex commands, structure
  
-Most PSPP commands share a common structure.  A command begins with a
+Most @pspp{} commands share a common structure.  A command begins with a
  command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
  CASES}.  The command name may be abbreviated to its first word, and
  each word in the command name may be abbreviated to its first three
@@ -186,48 +186,53 @@ by a forward slash (@samp{/}).
  There are multiple ways to mark the end of a command.  The most common
  way is to end the last line of the command with a period (@samp{.}) as
  described in the previous section (@pxref{Tokens}).  A blank line, or
-one that consists only of white space or comments, also ends a command
-by default, although you can use the NULLINE subcommand of @cmd{SET}
-to disable this feature (@pxref{SET}).
+one that consists only of white space or comments, also ends a command.
  
  @node Syntax Variants
-@section Variants of syntax.
+@section Syntax Variants
  
  @cindex Batch syntax
  @cindex Interactive syntax
  
-There are two variants of command syntax, @i{viz}: @dfn{batch} mode and
-@dfn{interactive} mode.
-Batch mode is the default when reading commands from a file.
-Interactive mode is the default when commands are typed at a prompt
-by a user.
-Certain commands, such as @cmd{INSERT} (@pxref{INSERT}), may explicitly
-change the syntax mode. 
-
-In batch mode, any line that contains a non-space character
-in the leftmost column begins a new command. 
-Thus, each command consists of a flush-left line followed by any
-number of lines indented from the left margin. 
-In this mode, a plus or minus sign (@samp{+}, @samp{@minus{}}) as the
-first character in a line is ignored and causes that line to begin a
-new command, which allows for visual indentation of a command without
-that command being considered part of the previous command. 
-The period terminating the end of a command is optional but recommended.
-
-In interactive mode, each command must  either be terminated with a period,
-or an empty line must follow the command.
-The use of (@samp{+} and @samp{@minus{}} as continuation characters is not
-permitted.
+There are three variants of command syntax, which vary only in how
+they detect the end of one command and the start of the next.
+
+In @dfn{interactive mode}, which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line
+ends a command.  A blank line also ends a command.
+
+In @dfn{batch mode}, an end-of-line period or a blank line also ends a
+command.  Additionally, batch mode treats any line that has a non-blank
+character in the leftmost column as beginning a new command.  Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin
+a new command.  This is most useful in batch mode, in which the first
+line of a new command could not otherwise be indented, but it is
+accepted regardless of syntax mode.
+
+The default mode for reading commands from a file is @dfn{auto mode}.
+It is the same as batch mode, except that a line with a non-blank in
+the leftmost column only starts a new command if that line begins with
+the name of a @pspp{} command.  This correctly interprets most valid @pspp{}
+syntax files regardless of the syntax mode for which they are
+intended.
+
+The @option{--interactive} (or @option{-i}) or @option{--batch} (or
+@option{-b}) options set the syntax mode for files listed on the @pspp{}
+command line.  @xref{Main Options}, for more details.
  
  @node Types of Commands
  @section Types of Commands
  
-Commands in PSPP are divided roughly into six categories:
+Commands in @pspp{} are divided roughly into six categories:
  
  @table @strong
  @item Utility commands
  @cindex utility commands
-Set or display various global options that affect PSPP operations.
+Set or display various global options that affect @pspp{} operations.
  May appear anywhere in a syntax file.  @xref{Utilities, , Utility
  commands}.
  
@@ -247,7 +252,7 @@ in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
  @item Transformations
  @cindex transformations
  Perform operations on data and write data to output files.  Transformations
-are not carried out until a procedure is executed.  
+are not carried out until a procedure is executed.
  
  @item Restricted transformations
  @cindex restricted transformations
@@ -259,7 +264,7 @@ of Commands}, for details.
  Analyze data, writing results of analyses to the listing file.  Cause
  transformations specified earlier in the file to be performed.  In a
  more general sense, a @dfn{procedure} is any command that causes the
-active file (the data) to be read.
+active dataset (the data) to be read.
  @end table
  
  @node Order of Commands
@@ -267,17 +272,17 @@ active file (the data) to be read.
  @cindex commands, ordering
  @cindex order of commands
  
-PSPP does not place many restrictions on ordering of commands.  The
+@pspp{} does not place many restrictions on ordering of commands.  The
  main restriction is that variables must be defined before they are otherwise
  referenced.  This section describes the details of command ordering,
  but most users will have no need to refer to them.
  
-PSPP possesses five internal states, called initial, INPUT PROGRAM,
-FILE TYPE, transformation, and procedure states.  (Please note the
+@pspp{} possesses five internal states, called @dfn{initial}, @dfn{input-program},
+@dfn{file-type}, @dfn{transformation}, and @dfn{procedure} states.  (Please note the
  distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
-@emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
+@emph{commands} and the @dfn{input-program} and @dfn{file-type} @emph{states}.)
  
-PSPP starts in the initial state.  Each successful completion
+@pspp{} starts in the initial state.  Each successful completion
  of a command may cause a state transition.  Each type of command has its
  own rules for state transitions:
  
@@ -298,47 +303,47 @@ transformation state.
  Valid in any state.
  @item
  When executed in the initial or procedure state, causes a transition to
-the transformation state.  
+the transformation state.
  @item
-Clears the active file if executed in the procedure or transformation
+Clears the active dataset if executed in the procedure or transformation
  state.
  @end itemize
  
  @item @cmd{INPUT PROGRAM}
  @itemize @bullet
  @item
-Invalid in INPUT PROGRAM and FILE TYPE states.
+Invalid in input-program and file-type states.
  @item
-Causes a transition to the INPUT PROGRAM state.  
+Causes a transition to the input-program state.
  @item
-Clears the active file.
+Clears the active dataset.
  @end itemize
  
  @item @cmd{FILE TYPE}
  @itemize @bullet
  @item
-Invalid in INPUT PROGRAM and FILE TYPE states.
+Invalid in input-program and file-type states.
  @item
-Causes a transition to the FILE TYPE state.
+Causes a transition to the file-type state.
  @item
-Clears the active file.
+Clears the active dataset.
  @end itemize
  
  @item Other file definition commands
  @itemize @bullet
  @item
-Invalid in INPUT PROGRAM and FILE TYPE states.
+Invalid in input-program and file-type states.
  @item
  Cause a transition to the transformation state.
  @item
-Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
+Clear the active dataset, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
  and @cmd{UPDATE}.
  @end itemize
  
  @item Transformations
  @itemize @bullet
  @item
-Invalid in initial and FILE TYPE states.
+Invalid in initial and file-type states.
  @item
  Cause a transition to the transformation state.
  @end itemize
@@ -346,7 +351,7 @@ Cause a transition to the transformation state.
  @item Restricted transformations
  @itemize @bullet
  @item
-Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
+Invalid in initial, input-program, and file-type states.
  @item
  Cause a transition to the transformation state.
  @end itemize
@@ -354,7 +359,7 @@ Cause a transition to the transformation state.
  @item Procedures
  @itemize @bullet
  @item
-Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
+Invalid in initial, input-program, and file-type states.
  @item
  Cause a transition to the procedure state.
  @end itemize
@@ -365,7 +370,7 @@ Cause a transition to the procedure state.
  @cindex missing values
  @cindex values, missing
  
-PSPP includes special support for unknown numeric data values.
+@pspp{} includes special support for unknown numeric data values.
  Missing observations are assigned a special value, called the
  @dfn{system-missing value}.  This ``value'' actually indicates the
  absence of a value; it means that the actual value is unknown.  Procedures
@@ -381,28 +386,36 @@ spaces.
  Variables, whether numeric or string, can have designated
  @dfn{user-missing values}.  Every user-missing value is an actual value
  for that variable.  However, most of the time user-missing values are
-treated in the same way as the system-missing value.  String variables
-that are wider than a certain width, usually 8 characters (depending on
-computer architecture), cannot have user-missing values.
+treated in the same way as the system-missing value.
  
  For more information on missing values, see the following sections:
-@ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
+@ref{Datasets}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
  documentation on individual procedures for information on how they
  handle missing values.
  
-@node Variables
-@section Variables
-@cindex variables
+@node Datasets
+@section Datasets
+@cindex dataset
+@cindex variable
  @cindex dictionary
  
-Variables are the basic unit of data storage in PSPP.  All the
-variables in a file taken together, apart from any associated data, are
-said to form a @dfn{dictionary}.  
-Some details of variables are described in the sections below.
+@pspp{} works with data organized into @dfn{datasets}.  A dataset
+consists of a set of @dfn{variables}, which taken together are said to
+form a @dfn{dictionary}, and one or more @dfn{cases}, each of which
+has one value for each variable.
+
+At any given time @pspp{} has exactly one distinguished dataset, called
+the @dfn{active dataset}.  Most @pspp{} commands work only with the
+active dataset.  In addition to the active dataset, @pspp{} also supports
+any number of additional open datasets.  The @cmd{DATASET} commands
+can choose a new active dataset from among those that are open, as
+well as create and destroy datasets (@pxref{DATASET}).
+
+The sections below describe variables in more detail.
  
  @menu
  * Attributes::                  Attributes of variables.
-* System Variables::            Variables automatically defined by PSPP.
+* System Variables::            Variables automatically defined by @pspp{}.
  * Sets of Variables::           Lists of variable names.
  * Input and Output Formats::    Input and output formats.
  * Scratch Variables::           Variables deleted by procedures.
@@ -427,16 +440,16 @@ variables' names may not begin with @samp{$}.
  @cindex variable names, ending with period
  The final character in a variable name should not be @samp{.}, because
  such an identifier will be misinterpreted when it is the final token
-on a line: @code{FOO.} will be divided into two separate tokens,
+on a line: @code{FOO.} is divided into two separate tokens,
  @samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.
  
  @cindex @samp{_}
  The final character in a variable name should not be @samp{_}, because
-some such identifiers are used for special purposes by PSPP
+some such identifiers are used for special purposes by @pspp{}
  procedures.
  
-As with all PSPP identifiers, variable names are not case-sensitive.
-PSPP capitalizes variable names on output the same way they were
+As with all @pspp{} identifiers, variable names are not case-sensitive.
+@pspp{} capitalizes variable names on output the same way they were
  capitalized at their point of definition in the input.
  
  @cindex variables, type
@@ -449,13 +462,9 @@ Numeric or string.
  @item Width
  (string variables only) String variables with a width of 8 characters or
  fewer are called @dfn{short string variables}.  Short string variables
-can be used in many procedures where @dfn{long string variables} (those
+may be used in a few contexts where @dfn{long string variables} (those
  with widths greater than 8) are not allowed.
  
-Certain systems may consider strings longer than 8
-characters to be short strings.  Eight characters represents a minimum
-figure for the maximum length of a short string.
-
  @item Position
  Variables in the dictionary are arranged in a specific order.
  @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
@@ -498,14 +507,88 @@ they are displayed.  Example: a width of 8, with 2 decimal places.
  Similar to print format, but used by the @cmd{WRITE} command
  (@pxref{WRITE}).
  
+@cindex measurement level
+@item Measurement level
+@anchor{Measurement Level}
+One of the following:
+
+@table @asis
+@item Nominal
+Each value of a nominal variable represents a distinct category.  The
+possible categories are finite and often have value labels.  The order
+of categories is not significant.  Political parties, US states, and
+yes/no choices are nominal.  Numeric and string variables can be
+nominal.
+
+@item Ordinal
+Ordinal variables also represent distinct categories, but their values
+are arranged according to some natural order.  Likert scales, @i{e.g.}@:
+from strongly disagree to strongly agree, are ordinal.  Data grouped
+into ranges, @i{e.g.}@: age groups or income groups, are ordinal.  Both
+numeric and string variables can be ordinal.  String values are
+ordered alphabetically, so letter grades from A to F will work as
+expected, but @code{poor}, @code{satisfactory}, @code{excellent} will
+not.
+
+@item Scale
+Scale variables are ones for which differences and ratios are
+meaningful.  These are often values which have a natural unit
+attached, such as age in years, income in dollars, or distance in
+miles.  Only numeric variables can be scale.
+@end table
+
+Variables created by @cmd{COMPUTE} and similar transformations,
+obtained from external sources, etc., initially have an unknown
+measurement level.  Any procedure that reads the data will then assign
+a default measurement level.  @pspp{} can assign some defaults without
+reading the data:
+
+@itemize @bullet
+@item
+Nominal, if it's a string variable.
+
+@item
+Nominal, if the variable has a WKDAY or MONTH print format.
+
+@item
+Scale, if the variable has a DOLLAR, CCA through CCE, or time or date
+print format.
+@end itemize
+
+Otherwise, @pspp{} reads the data and decides based on its
+distribution:
+
+@itemize @bullet
+@item
+Nominal, if all observations are missing.
+
+@item
+Scale, if one or more valid observations are noninteger or negative.
+
+@item
+Scale, if no valid observation is less than 10.
+
+@item
+Scale, if the variable has 24 or more unique valid values.  The value
+24 is the default and can be adjusted (@pxref{SET SCALEMIN}).
+@end itemize
+
+Finally, if none of the above is true, @pspp{} assigns the variable a
+nominal measurement level.
+
  @cindex custom attributes
  @item Custom attributes
  User-defined associations between names and values.  @xref{VARIABLE
  ATTRIBUTE}.
+
+@cindex variable role
+@item Role
+The intended role of a variable for use in dialog boxes in graphical
+user interfaces.  @xref{VARIABLE ROLE}.
  @end table
  
  @node System Variables
-@subsection Variables Automatically Defined by PSPP
+@subsection Variables Automatically Defined by @pspp{}
  @cindex system variables
  @cindex variables, system
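
Annotation on the measurement-level rules added in the hunk above: the default-assignment steps can be sketched in Python as follows. This is an illustration of the documented rules only, not the PSPP implementation. The function name `default_measure`, the use of `None` for a missing observation, and the exact set of time and date format names are assumptions made for the example.

```python
from typing import Iterable, Optional

# Print formats that decide the level without reading the data.
NOMINAL_FORMATS = {"WKDAY", "MONTH"}
SCALE_FORMATS = (
    {"DOLLAR"}
    | {f"CC{c}" for c in "ABCDE"}
    # Assumed list of time and date formats; the manual only says
    # "time or date print format".
    | {"MTIME", "TIME", "DTIME", "DATE", "ADATE", "EDATE", "JDATE",
       "SDATE", "QYR", "MOYR", "WKYR", "DATETIME", "YMDHMS"}
)


def default_measure(is_string: bool, print_format: str,
                    values: Iterable[Optional[float]],
                    scalemin: int = 24) -> str:
    """Return 'nominal' or 'scale' following the documented rules."""
    # Rules that need no data.
    if is_string:
        return "nominal"
    if print_format.upper() in NOMINAL_FORMATS:
        return "nominal"
    if print_format.upper() in SCALE_FORMATS:
        return "scale"

    # Rules based on the data; None stands for a missing observation.
    valid = [v for v in values if v is not None]
    if not valid:
        return "nominal"
    if any(v < 0 or not float(v).is_integer() for v in valid):
        return "scale"
    if min(valid) >= 10:
        return "scale"
    if len(set(valid)) >= scalemin:
        return "scale"
    return "nominal"
```

The `scalemin` parameter stands in for the adjustable threshold that the text says defaults to 24. The rules are checked in the order the manual lists them, so the first match wins.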
  
@@ -522,12 +605,17 @@ shuffled around.
  
  @cindex @code{$DATE}
  @item $DATE
-Date the PSPP process was started, in format A9, following the
-pattern @code{DD MMM YY}.
+Date the @pspp{} process was started, in format A9, following the
+pattern @code{DD-MMM-YY}.
+
+@cindex @code{$DATE11}
+@item $DATE11
+Date the @pspp{} process was started, in format A11, following the
+pattern @code{DD-MMM-YYYY}.
  
  @cindex @code{$JDATE}
  @item $JDATE
-Number of days between 15 Oct 1582 and the time the PSPP process
+Number of days between 15 Oct 1582 and the time the @pspp{} process
  was started.
  
  @cindex @code{$LENGTH}
@@ -540,7 +628,7 @@ System missing value, in format F1.
  
  @cindex @code{$TIME}
  @item $TIME
-Number of seconds between midnight 14 Oct 1582 and the time the active file
+Number of seconds between midnight 14 Oct 1582 and the time the active dataset
  was read, in format F20.
  
  @cindex @code{$WIDTH}
@@ -586,7 +674,7 @@ input field as a number or a string.  It might specify that the field
  contains an ordinary decimal number, a time or date, a number in binary
  or hexadecimal notation, or one of several other notations.  Input
  formats are used by commands such as @cmd{DATA LIST} that read data or
-syntax files into the PSPP active file.
+syntax files into the @pspp{} active dataset.
  
  Every input format corresponds to a default @dfn{output format} that
  specifies the formatting used when the value is output later.  It is
@@ -605,24 +693,25 @@ created variables have identical print and write formats, and
  most of the time, the distinction between print and write formats is
  unimportant.
  
-Input and output formats are specified to PSPP with a @dfn{format
-specification} of the form @code{TYPEw} or @code{TYPEw.d}, where
-@code{TYPE} is one of the format types described later, @code{w} is a
-field width measured in columns, and @code{d} is an optional number of
-decimal places.  If @code{d} is omitted, a value of 0 is assumed.  Some
-formats do not allow a nonzero @code{d} to be specified.
+Input and output formats are specified to @pspp{} with
+a @dfn{format specification} of the
+form @code{@var{TYPE}@var{w}} or @code{@var{TYPE}@var{w}.@var{d}}, where
+@var{TYPE} is one of the format types described later, @var{w} is a
+field width measured in columns, and @var{d} is an optional number of
+decimal places.  If @var{d} is omitted, a value of 0 is assumed.  Some
+formats do not allow a nonzero @var{d} to be specified.
  
  The following sections describe the input and output formats supported
-by PSPP.
+by @pspp{}.
  
  @menu
-* Basic Numeric Formats::       
-* Custom Currency Formats::     
-* Legacy Numeric Formats::      
-* Binary and Hexadecimal Numeric Formats::  
-* Time and Date Formats::       
-* Date Component Formats::      
-* String Formats::              
+* Basic Numeric Formats::
+* Custom Currency Formats::
+* Legacy Numeric Formats::
+* Binary and Hexadecimal Numeric Formats::
+* Time and Date Formats::
+* Date Component Formats::
+* String Formats::
  @end menu
  
  @node Basic Numeric Formats
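
Annotation on the format specification paragraph in the hunk above: a specification of the form TYPEw or TYPEw.d can be split into its parts as in the Python sketch below. The function name `parse_format` is hypothetical, and the sketch assumes that type names consist only of letters and are not case-sensitive. It does not check whether a given type exists or permits decimals.

```python
import re

# TYPE (letters), w (digits), optional ".d" (digits).
_FORMAT_RE = re.compile(r"^([A-Za-z]+)(\d+)(?:\.(\d+))?$")


def parse_format(spec: str) -> tuple[str, int, int]:
    """Split a format specification into (type, width, decimals)."""
    match = _FORMAT_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"not a format specification: {spec!r}")
    type_name, width, decimals = match.groups()
    # If d is omitted, a value of 0 is assumed.
    return type_name.upper(), int(width), int(decimals or 0)
```

For example, `parse_format("F8.2")` gives a type of `F`, a width of 8, and 2 decimal places, while `parse_format("A9")` gives 0 decimal places.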
@@ -776,8 +865,10 @@ would not fit at all without it.  Scientific notation with @samp{$} or
  @item
  Except in scientific notation, a decimal point is included only when
  it is followed by a digit.  If the integer part of the number being
-output is 0, and a decimal point is included, then the zero before the
-decimal point is dropped.
+output is 0, and a decimal point is included, then @pspp{} ordinarily
+drops the zero before the decimal point.  However, in @code{F},
+@code{COMMA}, or @code{DOT} formats, @pspp{} keeps the zero if
+@code{SET LEADZERO} is set to @code{ON} (@pxref{SET LEADZERO}).
  
  In scientific notation, the number always includes a decimal point,
  even if it is not followed by a digit.
@@ -798,7 +889,7 @@ In scientific notation, the exponent is output as @samp{E} followed by
  @samp{+} or @samp{-} and exactly three digits.  Numbers with magnitude
  less than 10**-999 or larger than 10**999 are not supported by most
  computers, but if they are supported then their output is considered
-to overflow the field and will be output as asterisks.
+to overflow the field and they are output as asterisks.
  
  @item
  On most computers, no more than 15 decimal digits are significant in
@@ -811,7 +902,7 @@ calculations may also reduce precision of output.
  Special values such as infinities and ``not a number'' values are
  usually converted to the system-missing value before printing.  In a few
  circumstances, these values are output directly.  In fields of width 3
-or greater, special values are output as however many characters will
+or greater, special values are output as however many characters
  fit from @code{+Infinity} or @code{-Infinity} for infinities, from
  @code{NaN} for ``not a number,'' or from @code{Unknown} for other values
  (if any are supported by the system).  In fields under 3 columns wide,
@@ -828,15 +919,15 @@ SET command configures custom currency formats, using the syntax
  @display
  SET CC@var{x}=@t{"}@var{string}@t{"}.
  @end display
-@noindent 
+@noindent
  where @var{x} is A, B, C, D, or E, and @var{string} is no more than 16
  characters long.
  
  @var{string} must contain exactly three commas or exactly three periods
  (but not both), except that a single quote character may be used to
  ``escape'' a following comma, period, or single quote.  If three commas
-are used, commas will be used for grouping in output, and a period will
-be used as the decimal point.  Uses of periods reverses these roles.
+are used, commas are used for grouping in output, and a period
+is used as the decimal point.  Use of periods reverses these roles.
  
  The commas or periods divide @var{string} into four fields, called the
  @dfn{negative prefix}, @dfn{prefix}, @dfn{suffix}, and @dfn{negative
@@ -1038,7 +1129,7 @@ WRB}).
  The recommended field width depends on the floating-point format.
  NATIVE (the default format), IDL, IDB, VD, VG, and ZL formats should use
  a field width of 8.  ISL, ISB, VF, and ZS formats should use a field
-width of 4.  Other field widths will not produce useful results.  The
+width of 4.  Other field widths do not produce useful results.  The
  maximum field width is 8.  No decimal places may be specified.
  
  The default output format is F8.2.
@@ -1055,7 +1146,7 @@ lowercase are accepted on input; output is in uppercase.
  Other than the hexadecimal representation, these formats are equivalent
  to PIB and RB formats, respectively.  However, bytes in PIBHEX format
  are always ordered with the most-significant byte first (big-endian
-order), regardless of the host machine's native byte order or PSPP
+order), regardless of the host machine's native byte order or @pspp{}
  settings.
  
  Field widths must be even and between 2 and 16.  RBHEX format allows no
@@ -1067,24 +1158,25 @@ with half the given width.
  
  @cindex time formats
  @cindex date formats
-In PSPP, a @dfn{time} is an interval.  The time formats translate
-between human-friendly descriptions of time intervals and PSPP's
+In @pspp{}, a @dfn{time} is an interval.  The time formats translate
+between human-friendly descriptions of time intervals and @pspp{}'s
  internal representation of time intervals, which is simply the number of
-seconds in the interval.  PSPP has two time formats:
+seconds in the interval.  @pspp{} has three time formats:
  
  @float
-@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
+@multitable {Time Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 01:31:17.01}}
  @headitem Time Format @tab Template                  @tab Example
-@item TIME     @tab @code{hh:MM:SS.ss}          @tab @code{04:31:17.01}
+@item MTIME    @tab @code{MM:SS.ss}             @tab @code{91:17.01}
+@item TIME     @tab @code{hh:MM:SS.ss}          @tab @code{01:31:17.01}
  @item DTIME    @tab @code{DD HH:MM:SS.ss}       @tab @code{00 04:31:17.01}
  @end multitable
  @end float
  
-A @dfn{date} is a moment in the past or the future.  Internally, PSPP
+A @dfn{date} is a moment in the past or the future.  Internally, @pspp{}
  represents a date as the number of seconds since the @dfn{epoch},
  midnight, Oct. 14, 1582.  The date formats translate between
-human-readable dates and PSPP's numeric representation of dates and
-times.  PSPP has several date formats:
+human-readable dates and @pspp{}'s numeric representation of dates and
+times.  @pspp{} has several date formats:
  
  @float
  @multitable {Date Format} {@code{dd-mmm-yyyy HH:MM:SS.ss}} {@code{01-OCT-1978 04:31:17.01}}
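
Annotation on the time formats table in the hunk above: because a time is stored as a number of seconds, the MTIME and TIME examples in the table describe the same interval of 5477.01 seconds. The Python sketch below shows the arithmetic. The function names are hypothetical, the output is fixed at two decimal places, and the placement of the minus sign is an assumption. Real output also depends on the field width and decimals of the format.

```python
def _centiseconds(seconds: float) -> tuple[str, int]:
    """Return a sign string and the magnitude in whole hundredths."""
    sign = "-" if seconds < 0 else ""
    return sign, round(abs(seconds) * 100)


def format_mtime(seconds: float) -> str:
    """Render an interval as MM:SS.ss, where MM is a count of minutes."""
    sign, total = _centiseconds(seconds)
    minutes, rest = divmod(total, 60 * 100)
    secs, hundredths = divmod(rest, 100)
    # Minutes are output as at least two digits and may exceed 59.
    return f"{sign}{minutes:02d}:{secs:02d}.{hundredths:02d}"


def format_time(seconds: float) -> str:
    """Render an interval as hh:MM:SS.ss."""
    sign, total = _centiseconds(seconds)
    hours, rest = divmod(total, 3600 * 100)
    minutes, rest = divmod(rest, 60 * 100)
    secs, hundredths = divmod(rest, 100)
    return f"{sign}{hours:02d}:{minutes:02d}:{secs:02d}.{hundredths:02d}"
```

Working in whole hundredths of a second avoids floating-point artifacts when splitting the value into minutes and seconds.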
@@ -1098,6 +1190,7 @@ times.  PSPP has several date formats:
  @item MOYR     @tab @code{mmm yyyy}             @tab @code{OCT 1978}
  @item WKYR     @tab @code{ww WK yyyy}           @tab @code{40 WK 1978}
  @item DATETIME @tab @code{dd-mmm-yyyy HH:MM:SS.ss} @tab @code{01-OCT-1978 04:31:17.01}
+@item YMDHMS   @tab @code{yyyy-mm-dd HH:MM:SS.ss} @tab @code{1978-10-01 04:31:17.01}
  @end multitable
  @end float
  
@@ -1116,11 +1209,11 @@ first three letters of an English month name (January, February,
  accepted.
  
  @item yyyy
-Year.  In output, DATETIME always produces a 4-digit year; other
-formats can produce a 2- or 4-digit year.  The century assumed for
-2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).  In
-output, a year outside the epoch causes the whole field to be filled
-with asterisks (@samp{*}).
+Year.  In output, DATETIME and YMDHMS always produce 4-digit years;
+other formats can produce a 2- or 4-digit year.  The century assumed
+for 2-digit years depends on the EPOCH setting (@pxref{SET EPOCH}).
+In output, a year outside the epoch causes the whole field to be
+filled with asterisks (@samp{*}).
  
  @item jjj
  Day of year (Julian day), from 1 to 366.  This is exactly three digits
@@ -1147,7 +1240,11 @@ two digits.
  Hour of day, from 0 to 23.  Output as exactly two digits.
  
  @item MM
-Minute of hour, from 0 to 59.  Output as exactly two digits.
+In MTIME, count of minutes, which may be positive or negative.  Output
+as at least two digits.
+
+In other formats, minute of hour, from 0 to 59.  Output as exactly two
+digits.
  
  @item SS.ss
  Seconds within minute, from 0 to 59.  The integer part is output as
@@ -1161,7 +1258,7 @@ controls the character accepted and displayed as the decimal point
  For output, the date and time formats use the delimiters indicated in
  the table.  For input, date components may be separated by spaces or by
  one of the characters @samp{-}, @samp{/}, @samp{.}, or @samp{,}, and
-time components may be separated by spaces, @samp{:}, or @samp{.}.  On
+time components may be separated by spaces or @samp{:}.  On
  input, the @samp{Q} separating quarter from year and the @samp{WK}
  separating week from year may be uppercase or lowercase, and the spaces
  around them are optional.
@@ -1175,7 +1272,7 @@ below:
  
  @float
  @multitable {DATETIME} {Min. Input Width} {Min. Output Width} {4-digit year}
-@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option 
+@headitem Format @tab Min. Input Width @tab Min. Output Width @tab Option
  @item DATE @tab 8 @tab 9 @tab 4-digit year
  @item ADATE @tab 8 @tab 8 @tab 4-digit year
  @item EDATE @tab 8 @tab 8 @tab 4-digit year
@@ -1185,41 +1282,43 @@ below:
  @item MOYR @tab 6 @tab 6 @tab 4-digit year
  @item WKYR @tab 6 @tab 8 @tab 4-digit year
  @item DATETIME @tab 17 @tab 17 @tab seconds
+@item YMDHMS @tab 12 @tab 16 @tab seconds
+@item MTIME @tab 4 @tab 5
  @item TIME @tab 5 @tab 5 @tab seconds
  @item DTIME @tab 8 @tab 8 @tab seconds
  @end multitable
  @end float
-@noindent 
+@noindent
  In the table, ``Option'' describes what increased output width enables:
  
  @table @asis
  @item 4-digit year
-A field 2 columns wider than minimum will include a 4-digit year.
-(DATETIME format always includes a 4-digit year.)
+A field 2 columns wider than the minimum includes a 4-digit year.
+(DATETIME and YMDHMS formats always include a 4-digit year.)
  
  @item seconds
-A field 3 columns wider than minimum will include seconds as well as
+A field 3 columns wider than the minimum includes seconds as well as
  minutes.  A field 5 columns wider than minimum, or more, can also
  include a decimal point and fractional seconds (but no more than allowed
  by the format's decimal places).
  @end table
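
The ``seconds'' option can be read as a small rule on the extra width beyond the minimum. The Python sketch below encodes that reading for the formats whose option is ``seconds''. The function name is hypothetical, and the exact count of fractional digits (one column for the decimal point, the remainder for digits) is an assumption inferred from the text rather than taken from the PSPP sources.

```python
# Minimum output widths copied from the table above.
MIN_OUTPUT_WIDTH = {"DATETIME": 17, "YMDHMS": 16, "TIME": 5, "DTIME": 8}


def seconds_layout(fmt, width, decimals):
    """Return (shows_seconds, fractional_digits) for an output format."""
    extra = width - MIN_OUTPUT_WIDTH[fmt]
    if extra < 0:
        raise ValueError(f"{fmt}{width} is narrower than the minimum output width")
    # 3 extra columns hold ":SS".
    shows_seconds = extra >= 3
    # Assumption: the 4th extra column is the decimal point, so fractional
    # digits start at 5 extra columns and are capped by the decimal places.
    fractional_digits = min(decimals, extra - 4) if extra >= 5 else 0
    return shows_seconds, fractional_digits


print(seconds_layout("TIME", 5, 0))   # (False, 0)
print(seconds_layout("TIME", 8, 0))   # (True, 0)
print(seconds_layout("TIME", 11, 2))  # (True, 2)
```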
  
  For the time and date formats, the default output format is the same as
-the input format, except that PSPP increases the field width, if
+the input format, except that @pspp{} increases the field width, if
  necessary, to the minimum allowed for output.
  
  Times or dates narrower than the field width are right-justified within
  the field.
  
  When a time or date exceeds the field width, characters are trimmed from
-the end until it fits.  This can occur in an unusual situation, e.g.@:
+the end until it fits.  This can occur in an unusual situation, @i{e.g.}@:
  with a year greater than 9999 (which adds an extra digit), or for a
-negative value on TIME or DTIME (which adds a leading minus sign).
+negative value on MTIME, TIME, or DTIME (which adds a leading minus sign).
  
  @c What about out-of-range values?
  
  The system-missing value is output as a period at the right end of the
-field.  
+field.
  
  @node Date Component Formats
  @subsubsection Date Component Formats
@@ -1264,7 +1363,7 @@ format is A format with half the input width.
  
  @cindex scratch variables
  Most of the time, variables don't retain their values between cases.
-Instead, either they're being read from a data file or the active file,
+Instead, either they're being read from a data file or the active dataset,
  in which case they assume the value read, or, if created with
  @cmd{COMPUTE} or
  another transformation, they're initialized to the system-missing value
@@ -1273,7 +1372,7 @@ or to blanks, depending on type.
  However, sometimes it's useful to have a variable that keeps its value
  between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
  use a @dfn{scratch variable}.  Scratch variables are variables whose
-names begin with an octothorpe (@samp{#}).  
+names begin with an octothorpe (@samp{#}).
  
  Scratch variables have the same properties as variables left with
  @cmd{LEAVE}: they retain their values between cases, and for the first
@@ -1285,9 +1384,9 @@ to copy its value into an ordinary variable, then use that ordinary
  variable in the analysis.
  
  @node Files
-@section Files Used by PSPP
+@section Files Used by @pspp{}
  
-PSPP makes use of many files each time it runs.  Some of these it
+@pspp{} makes use of many files each time it runs.  Some of these it
  reads, some it writes, some it creates.  Here is a table listing the
  most important of these files:
  
@@ -1299,8 +1398,8 @@ most important of these files:
  @item command file
  @itemx syntax file
  These names (synonyms) refer to the file that contains instructions
-that tell PSPP what to do.  The syntax file's name is specified on
-the PSPP command line.  Syntax files can also be read with
+that tell @pspp{} what to do.  The syntax file's name is specified on
+the @pspp{} command line.  Syntax files can also be read with
  @cmd{INCLUDE} (@pxref{INCLUDE}).
  
  @cindex file, data
@@ -1312,18 +1411,10 @@ be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
  @cindex file, output
  @cindex output file
  @item listing file
-One or more output files are created by PSPP each time it is
+One or more output files are created by @pspp{} each time it is
  run.  The output files receive the tables and charts produced by
  statistical procedures.  The output files may be in any number of formats,
-depending on how PSPP is configured.
-
-@cindex active file
-@cindex file, active
-@item active file
-The active file is the ``file'' on which all PSPP procedures are
-performed.  The active file consists of a dictionary and a set of cases.
-The active file is not necessarily a disk file: it is stored in memory
-if there is room.
+depending on how @pspp{} is configured.
  
  @cindex system file
  @cindex file, system
@@ -1337,60 +1428,41 @@ cases.  @cmd{GET} and @cmd{SAVE} read and write system files.
  Portable files are files in a text-based format that store a dictionary
  and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
  portable files.
-
-@cindex scratch file
-@cindex file, scratch
-@item scratch file
-Scratch files consist of a dictionary and cases and may be stored in
-memory or on disk.  Most procedures that act on a system file or
-portable file can use a scratch file instead.  The contents of scratch
-files persist within a single PSPP session only.  @cmd{GET} and
-@cmd{SAVE} can be used to read and write scratch files.  Scratch files
-are a PSPP extension.
  @end table
  
  @node File Handles
  @section File Handles
  @cindex file handles
  
-A @dfn{file handle} is a reference to a data file, system file, portable
-file, or scratch file.  Most often, a file handle is specified as the
+A @dfn{file handle} is a reference to a data file, system file, or
+portable file.  Most often, a file handle is specified as the
  name of a file as a string, that is, enclosed within @samp{'} or
  @samp{"}.
  
  A file name string that begins or ends with @samp{|} is treated as the
  name of a command to pipe data to or from.  You can use this feature
  to read data over the network using a program such as @samp{curl}
-(e.g.@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
+(@i{e.g.}@: @code{GET '|curl -s -S http://example.com/mydata.sav'}), to
  read compressed data from a file using a program such as @samp{zcat}
-(e.g.@: @code{GET '|zcat mydata.sav.gz'}), and for many other
+(@i{e.g.}@: @code{GET '|zcat mydata.sav.gz'}), and for many other
  purposes.
  
-PSPP also supports declaring named file handles with the @cmd{FILE
+@pspp{} also supports declaring named file handles with the @cmd{FILE
  HANDLE} command.  This command associates an identifier of your choice
  (the file handle's name) with a file.  Later, the file handle name can
-be substituted for the name of the file.  When PSPP syntax accesses a
+be substituted for the name of the file.  When @pspp{} syntax accesses a
  file multiple times, declaring a named file handle simplifies updating
  the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
  also required to read data files in binary formats.  @xref{FILE HANDLE},
  for more information.
  
-PSPP assumes that a file handle name that begins with @samp{#} refers to
-a scratch file, unless the name has already been declared on @cmd{FILE
-HANDLE} to refer to another kind of file.  A scratch file is similar to
-a system file, except that it persists only for the duration of a given
-PSPP session.  Most commands that read or write a system or portable
-file, such as @cmd{GET} and @cmd{SAVE}, also accept scratch file
-handles.  Scratch file handles may also be declared explicitly with
-@cmd{FILE HANDLE}.  Scratch files are a PSPP extension.
-
-In some circumstances, PSPP must distinguish whether a file handle
+In some circumstances, @pspp{} must distinguish whether a file handle
  refers to a system file or a portable file.  When this is necessary to
-read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
-PSPP uses the file's contents to decide.  In the context of writing a
-file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, PSPP
+read a file, @i{e.g.}@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
+@pspp{} uses the file's contents to decide.  In the context of writing a
+file, @i{e.g.}@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, @pspp{}
  decides based on the file's name: if it ends in @samp{.por} (with any
-capitalization), then PSPP writes a portable file; otherwise, PSPP
+capitalization), then @pspp{} writes a portable file; otherwise, @pspp{}
  writes a system file.
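
The rule for output files can be sketched as a one-line check on the file name. The function name below is hypothetical. The sketch covers only the writing case, since for reading the text says the decision is made from the file's contents.

```python
def output_file_kind(file_name):
    """Kind of file written for a given name, per the rule in the text."""
    # A name ending in ".por", with any capitalization, selects a portable file.
    return "portable" if file_name.lower().endswith(".por") else "system"


print(output_file_kind("survey.POR"))  # portable
print(output_file_kind("survey.sav"))  # system
```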
  
  INLINE is reserved as a file handle name.  It refers to the ``data
@@ -1399,8 +1471,7 @@ file'' embedded into the syntax file between @cmd{BEGIN DATA} and
  
  The file to which a file handle refers may be reassigned on a later
  @cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
-HANDLE}.  The @cmd{CLOSE FILE HANDLE} command is also useful to free the
-storage associated with a scratch file.  @xref{CLOSE FILE HANDLE}, for
+HANDLE}.  @xref{CLOSE FILE HANDLE}, for
  more information.
  
  @node BNF
@@ -1410,7 +1481,7 @@ more information.
  @cindex command syntax, description of
  @cindex description of command syntax
  
-The syntax of some parts of the PSPP language is presented in this
+The syntax of some parts of the @pspp{} language is presented in this
  manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
  following table describes BNF:
  
@@ -1418,7 +1489,7 @@ following table describes BNF:
  @cindex keywords
  @cindex terminals
  @item
-Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
+Words in all-uppercase are @pspp{} keyword tokens.  In BNF, these are
  often called @dfn{terminals}.  There are some special terminals, which
  are written in lowercase for clarity:
  
@@ -1447,7 +1518,7 @@ Operators and punctuators.
  @cindex @code{.}
  @item @code{.}
  The end of the command.  This is not necessarily an actual dot in the
-syntax file: @xref{Commands}, for more details.
+syntax file (@pxref{Commands}).
  @end table
  
  @item