Split pspp.texi into one texi file per chapter

[pspp-builds.git] / doc / language.texi
diff --git a/doc/language.texi b/doc/language.texi

new file mode 100644 (file)

index 0000000..6a5a19e
--- /dev/null
+++ b/doc/language.texi
@@ -0,0 +1,1217 @@
+@node Language, Expressions, Invocation, Top
+@chapter The PSPP language
+@cindex language, PSPP
+@cindex PSPP, language
+
+@quotation
+@strong{Please note:} PSPP is not even close to completion.
+Only a few actual statistical procedures are implemented.  PSPP
+is a work in progress.
+@end quotation
+
+This chapter discusses elements common to many PSPP commands.
+Later chapters will describe individual commands in detail.
+
+@menu
+* Tokens::                      Characters combine to form tokens.
+* Commands::                    Tokens combine to form commands.
+* Types of Commands::           Commands come in several flavors.
+* Order of Commands::           Commands combine to form syntax files.
+* Missing Observations::        Handling missing observations.
+* Variables::                   The unit of data storage.
+* Files::                       Files used by PSPP.
+* BNF::                         How command syntax is described.
+@end menu
+
+@node Tokens, Commands, Language, Language
+@section Tokens
+@cindex language, lexical analysis
+@cindex language, tokens
+@cindex tokens
+@cindex lexical analysis
+@cindex lexemes
+
+PSPP divides most syntax file lines into series of short chunks
+called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}.  These
+tokens are then grouped to form commands, each of which tells
+PSPP to take some action---read in data, write out data, perform
+a statistical procedure, etc.  The process of dividing input into tokens
+is @dfn{tokenization}, or @dfn{lexical analysis}.  Each type of token is
+described below.
+
+@cindex delimiters
+@cindex whitespace
+Tokens must be separated from each other by @dfn{delimiters}.
+Delimiters include whitespace (spaces, tabs, carriage returns, line
+feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
+operators (plus, minus, times, divide, etc.)  Note that while whitespace
+only separates tokens, other delimiters are tokens in themselves.
+
+@table @strong
+@cindex identifiers
+@item Identifiers
+Identifiers are names that specify variable names, commands, or command
+details.
+
+@itemize @bullet
+@item
+The first character in an identifier must be a letter, @samp{#}, or
+@samp{@@}.  Some system identifiers begin with @samp{$}, but
+user-defined variables' names may not begin with @samp{$}.
+
+@item
+The remaining characters in the identifier must be letters, digits, or
+one of the following special characters: 
+
+@example
+.  _  $  #  @@
+@end example
+
+@item
+@cindex variable names
+@cindex names, variable
+Variable names may be any length, but only the first 8 characters are
+significant.
+
+@item
+@cindex case-sensitivity
+Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
+@code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
+representations of the same identifier.
+
+@item
+@cindex keywords
+Identifiers other than variable names may be abbreviated to their first
+3 characters if this abbreviation is unambiguous.  These identifiers are
+often called @dfn{keywords}.  (Unique abbreviations of 3 or more
+characters are also accepted: @samp{FRE}, @samp{FREQ}, and
+@samp{FREQUENCIES} are equivalent when the last is a keyword.)
+
+@item
+Whether an identifier is a keyword depends on the context.
+
+@item
+@cindex keywords, reserved
+@cindex reserved keywords
+Some keywords are reserved.  These keywords may not be used in any
+context besides those explicitly described in this manual.  The reserved
+keywords are:
+
+@example
+ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
+@end example
+
+@item 
+Since keywords are identifiers, all the rules for identifiers apply.
+Specifically, they must be delimited as are other identifiers:
+@code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
+variable name.
+@end itemize
+
+@cindex @samp{.}
+@cindex period
+@cindex variable names, ending with period
+@strong{Caution:} It is legal to end a variable name with a period, but
+@emph{don't do it!}  The variable name will be misinterpreted when it is
+the final token on a line: @code{FOO.} will be divided into two separate
+tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
+@xref{Commands, , Forming commands of tokens}.
+
+@item Numbers
+@cindex numbers
+@cindex integers
+@cindex reals
+Numbers may be specified as integers or reals.  Integers are internally
+converted into reals.  Scientific notation is not supported.  Here are
+some examples of valid numbers:
+
+@example
+1234  3.14159265359  .707106781185  8945.
+@end example
+
+@strong{Caution:} The last example will be interpreted as two tokens,
+@samp{8945} and @samp{.}, if it is the last token on a line.
+
+@item Strings
+@cindex strings
+@cindex @samp{'}
+@cindex @samp{"}
+@cindex case-sensitivity
+Strings are literal sequences of characters enclosed in pairs of single
+quotes (@samp{'}) or double quotes (@samp{"}).
+
+@itemize @bullet
+@item
+Whitespace and case of letters @emph{are} significant inside strings.
+@item
+Whitespace characters inside a string are not delimiters.
+@item
+To include single-quote characters in a string, enclose the string in
+double quotes.
+@item
+To include double-quote characters in a string, enclose the string in
+single quotes.
+@item
+It is not possible to put both single- and double-quote characters
+inside one string.
+@end itemize
+
+@item Hexstrings
+@cindex hexstrings
+Hexstrings are string variants that use hex digits to specify
+characters.
+
+@itemize @bullet
+@item
+A hexstring may be used anywhere that an ordinary string is allowed.
+
+@item
+@cindex @samp{X'}
+@cindex @samp{'}
+A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
+
+@cindex whitespace
+@item
+No whitespace is allowed between the initial @samp{X} and @samp{'}.
+
+@item
+Double quotes @samp{"} may be used in place of single quotes @samp{'} if
+done in both places.
+
+@item
+Each pair of hex digits is internally changed into a single character
+with the given value.
+
+@item
+If there is an odd number of hex digits, the missing last digit is
+assumed to be @samp{0}.
+
+@item
+@cindex portability
+@strong{Please note:} Use of hexstrings is nonportable because the same
+numeric values are associated with different glyphs by different
+operating systems.  Therefore, their use should be confined to syntax
+files that will not be widely distributed.
+
+@item
+@cindex characters, reserved
+@cindex 0
+@cindex whitespace
+@strong{Please note also:} The character with value 00 is reserved for
+internal use by PSPP.  Its use in strings causes an error and
+replacement with a blank space (in ASCII, hex 20, decimal 32).
+@end itemize
+
+@item Punctuation
+@cindex punctuation
+Punctuation separates tokens; punctuators are delimiters.  These are the
+punctuation characters:
+
+@example
+,  /  =  (  )
+@end example
+
+@item Operators
+@cindex operators
+Operators describe mathematical operations.  Some operators are delimiters:
+
+@example
+(  )  +  -  *  /  **
+@end example
+
+Many of the above operators are also punctuators.  Punctuators are
+distinguished from operators by context. 
+
+The other operators are all reserved keywords.  None of these are
+delimiters:
+
+@example
+AND  EQ  GE  GT  LE  LT  NE  OR
+@end example
+
+@item Terminal Dot
+@cindex terminal dot
+@cindex dot, terminal
+@cindex period
+@cindex @samp{.}
+A period (@samp{.}) at the end of a line (except for whitespace) is one
+type of a @dfn{terminal dot}, although not every terminal dot is a
+period at the end of a line.  @xref{Commands, , Forming commands of
+tokens}.  A period is a terminal dot @emph{only}
+when it is at the end of a line; otherwise it is part of a
+floating-point number.  (A period outside a number in the middle of a
+line is an error.)
+
+@quotation
+@cindex terminal dot, changing
+@cindex dot, terminal, changing
+@strong{Please note:} The character used for the @dfn{terminal dot}
+can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}).  This
+is strongly discouraged, and throughout all the remainder of this
+manual it will be assumed that the default setting is in effect.
+@end quotation
+
+@end table
+
+@node Commands, Types of Commands, Tokens, Language
+@section Forming commands of tokens
+
+@cindex PSPP, command structure
+@cindex language, command structure
+@cindex commands, structure
+
+Most PSPP commands share a common structure, diagrammed below:
+
+@example
+@var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
+@end example
+
+@cindex @samp{[  ]}
+In the above, rather daunting, expression, pairs of square brackets
+(@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
+indicate parts of the syntax that vary from command to command.
+Ellipses (@samp{...}) indicate that the preceding part may be repeated
+an arbitrary number of times.  Let's pick apart what it says above:
+
+@itemize @bullet
+@cindex commands, names
+@item
+A command begins with a command name of one or more keywords, such as
+@cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}.  @var{cmd}
+may be abbreviated to its first word if that is unambiguous; each word
+in @var{cmd} may be abbreviated to a unique prefix of three or more
+characters as described above.
+
+@cindex subcommands
+@item
+The command name may be followed by one or more @dfn{subcommands}:
+
+@itemize @minus
+@item
+Each subcommand begins with a unique keyword, indicated by @var{sbc}
+above.  This is analogous to the command name.
+
+@item
+The subcommand name is optionally followed by an equals sign (@samp{=}).
+
+@item
+Some subcommands accept a series of one or more specifications
+(@var{spec}), optionally separated by commas.
+
+@item
+Each subcommand must be separated from the next (if any) by a forward
+slash (@samp{/}).
+@end itemize
+
+@cindex dot, terminal
+@cindex terminal dot
+@item
+Each command must be terminated with a @dfn{terminal dot}.  
+The terminal dot may be given one of three ways:
+
+@itemize @minus
+@item
+(most commonly) A period character at the very end of a line, as
+described above.
+
+@item
+(only if NULLINE is on: @xref{SET, , Setting user preferences}, for more
+details.)  A completely blank line.
+
+@item
+(in batch mode only) Any line that is not indented from the left side of
+the page causes a terminal dot to be inserted before that line.
+Therefore, each command begins with a line that is flush left, followed
+by zero or more lines that are indented one or more characters from the
+left margin.
+
+In batch mode, PSPP will ignore a plus sign, minus sign, or period
+(@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
+line.  Any of these characters as the first character on a line will
+begin a new command.  This allows for visual indentation of a command
+without that command being considered part of the previous command.
+
+PSPP is in batch mode when it is reading input from a file, rather
+than from an interactive user.  Note that the other forms of the
+terminal dot may also be used in batch mode.
+
+Sometimes, one encounters syntax files that are intended to be
+interpreted in interactive mode rather than batch mode (for instance,
+this can happen if a session log file is used directly as a syntax
+file).  When this occurs, use the @samp{-i} command line option to force
+interpretation in interactive mode (@pxref{Language control options}).
+@end itemize
+@end itemize
+
+PSPP ignores empty commands when they are generated by the above
+rules.  Note that, as a consequence of these rules, each command must
+begin on a new line.
+
+@node Types of Commands, Order of Commands, Commands, Language
+@section Types of Commands
+
+Commands in PSPP are divided roughly into six categories:
+
+@table @strong
+@item Utility commands
+@cindex utility commands
+Set or display various global options that affect PSPP operations.
+May appear anywhere in a syntax file.  @xref{Utilities, , Utility
+commands}.
+
+@item File definition commands
+@cindex file definition commands
+Give instructions for reading data from text files or from special
+binary ``system files''.  Most of these commands discard any previous
+data or variables to replace it with the new data and
+variables.  At least one must appear before the first command in any of
+the categories below.  @xref{Data Input and Output}.
+
+@item Input program commands
+@cindex input program commands
+Though rarely used, these provide powerful tools for reading data files
+in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
+
+@item Transformations
+@cindex transformations
+Perform operations on data and write data to output files.  Transformations
+are not carried out until a procedure is executed.  
+
+@item Restricted transformations
+@cindex restricted transformations
+Same as transformations for most purposes.  @xref{Order of Commands}, for a
+detailed description of the differences.
+
+@item Procedures
+@cindex procedures
+Analyze data, writing results of analyses to the listing file.  Cause
+transformations specified earlier in the file to be performed.  In a
+more general sense, a @dfn{procedure} is any command that causes the
+active file (the data) to be read.
+@end table
+
+@node Order of Commands, Missing Observations, Types of Commands, Language
+@section Order of Commands
+@cindex commands, ordering
+@cindex order of commands
+
+PSPP does not place many restrictions on ordering of commands.
+The main restriction is that variables must be defined with one of the
+file-definition commands before they are otherwise referred to.
+
+Of course, there are specific rules, for those who are interested.
+PSPP possesses five internal states, called initial, INPUT PROGRAM,
+FILE TYPE, transformation, and procedure states.  (Please note the
+distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
+@emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
+
+PSPP starts up in the initial state.  Each successful completion
+of a command may cause a state transition.  Each type of command has its
+own rules for state transitions:
+
+@table @strong
+@item Utility commands
+@itemize @bullet
+@item
+Legal in all states.
+@item
+Do not cause state transitions.  Exception: when @cmd{N OF CASES}
+is executed in the procedure state, it causes a transition to the
+transformation state.
+@end itemize
+
+@item @cmd{DATA LIST}
+@itemize @bullet
+@item
+Legal in all states.
+@item
+When executed in the initial or procedure state, causes a transition to
+the transformation state.  
+@item
+Clears the active file if executed in the procedure or transformation
+state.
+@end itemize
+
+@item @cmd{INPUT PROGRAM}
+@itemize @bullet
+@item
+Invalid in INPUT PROGRAM and FILE TYPE states.
+@item
+Causes a transition to the INPUT PROGRAM state.  
+@item
+Clears the active file.
+@end itemize
+
+@item @cmd{FILE TYPE}
+@itemize @bullet
+@item
+Invalid in INPUT PROGRAM and FILE TYPE states.
+@item
+Causes a transition to the FILE TYPE state.
+@item
+Clears the active file.
+@end itemize
+
+@item Other file definition commands
+@itemize @bullet
+@item
+Invalid in INPUT PROGRAM and FILE TYPE states.
+@item
+Cause a transition to the transformation state.
+@item
+Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
+and @cmd{UPDATE}.
+@end itemize
+
+@item Transformations
+@itemize @bullet
+@item
+Invalid in initial and FILE TYPE states.
+@item
+Cause a transition to the transformation state.
+@end itemize
+
+@item Restricted transformations
+@itemize @bullet
+@item
+Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
+@item
+Cause a transition to the transformation state.
+@end itemize
+
+@item Procedures
+@itemize @bullet
+@item
+Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
+@item
+Cause a transition to the procedure state.
+@end itemize
+@end table
+
+@node Missing Observations, Variables, Order of Commands, Language
+@section Handling missing observations
+@cindex missing values
+@cindex values, missing
+
+PSPP includes special support for unknown numeric data values.
+Missing observations are assigned a special value, called the
+@dfn{system-missing value}.  This ``value'' actually indicates the
+absence of value; it means that the actual value is unknown.  Procedures
+automatically exclude from analyses those observations or cases that
+have missing values.  Whether single observations or entire cases are
+excluded depends on the procedure.
+
+The system-missing value exists only for numeric variables.  String
+variables always have a defined value, even if it is only a string of
+spaces.
+
+Variables, whether numeric or string, can have designated
+@dfn{user-missing values}.  Every user-missing value is an actual value
+for that variable.  However, most of the time user-missing values are
+treated in the same way as the system-missing value.  String variables
+that are wider than a certain width, usually 8 characters (depending on
+computer architecture), cannot have user-missing values.
+
+For more information on missing values, see the following sections:
+@ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
+documentation on individual procedures for information on how they
+handle missing values.
+
+@node Variables, Files, Missing Observations, Language
+@section Variables
+@cindex variables
+@cindex dictionary
+
+Variables are the basic unit of data storage in PSPP.  All the
+variables in a file taken together, apart from any associated data, are
+said to form a @dfn{dictionary}.  
+Some details of variables are described in the sections below.
+
+@menu
+* Attributes::                  Attributes of variables.
+* System Variables::            Variables automatically defined by PSPP.
+* Sets of Variables::           Lists of variable names.
+* Input/Output Formats::        Input and output formats.
+* Scratch Variables::           Variables deleted by procedures.
+@end menu
+
+@node Attributes, System Variables, Variables, Variables
+@subsection Attributes of Variables
+@cindex variables, attributes of
+@cindex attributes of variables
+Each variable has a number of attributes, including:
+
+@table @strong
+@item Name
+This is an identifier.  Each variable must have a different name.
+@xref{Tokens}.
+
+@cindex variables, type
+@cindex type of variables
+@item Type
+Numeric or string.
+
+@cindex variables, width
+@cindex width of variables
+@item Width
+(string variables only) String variables with a width of 8 characters or
+fewer are called @dfn{short string variables}.  Short string variables
+can be used in many procedures where @dfn{long string variables} (those
+with widths greater than 8) are not allowed.
+
+@quotation
+@strong{Please note:} Certain systems may consider strings longer than 8
+characters to be short strings.  Eight characters represents a minimum
+figure for the maximum length of a short string.
+@end quotation
+
+@item Position
+Variables in the dictionary are arranged in a specific order.
+@cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
+
+@item Initialization
+Either reinitialized to 0 or spaces for each case, or left at its
+existing value.  @xref{LEAVE}.
+
+@cindex missing values
+@cindex values, missing
+@item Missing values
+Optionally, up to three values, or a range of values, or a specific
+value plus a range, can be specified as @dfn{user-missing values}.
+There is also a @dfn{system-missing value} that is assigned to an
+observation when there is no other obvious value for that observation.
+Observations with missing values are automatically excluded from
+analyses.  User-missing values are actual data values, while the
+system-missing value is not a value at all.  @xref{Missing Observations}.
+
+@cindex variable labels
+@cindex labels, variable
+@item Variable label
+A string that describes the variable.  @xref{VARIABLE LABELS}.
+
+@cindex value labels
+@cindex labels, value
+@item Value label
+Optionally, these associate each possible value of the variable with a
+string.  @xref{VALUE LABELS}.
+
+@cindex print format
+@item Print format
+Display width, format, and (for numeric variables) number of decimal
+places.  This attribute does not affect how data are stored, just how
+they are displayed.  Example: a width of 8, with 2 decimal places.
+@xref{PRINT FORMATS}.
+
+@cindex write format
+@item Write format
+Similar to print format, but used by certain commands that are
+designed to write to binary files.  @xref{WRITE FORMATS}.
+@end table
+
+@node System Variables, Sets of Variables, Attributes, Variables
+@subsection Variables Automatically Defined by PSPP
+@cindex system variables
+@cindex variables, system
+
+There are seven system variables.  These are not like ordinary
+variables, as they are not stored in each case.  They can only be used
+in expressions.  These system variables, whose values and output formats
+cannot be modified, are described below.
+
+@table @code
+@cindex @code{$CASENUM}
+@item $CASENUM
+Case number of the case at the moment.  This changes as cases are
+shuffled around.
+
+@cindex @code{$DATE}
+@item $DATE
+Date the PSPP process was started, in format A9, following the
+pattern @code{DD MMM YY}.
+
+@cindex @code{$JDATE}
+@item $JDATE
+Number of days between 15 Oct 1582 and the time the PSPP process
+was started.
+
+@cindex @code{$LENGTH}
+@item $LENGTH
+Page length, in lines, in format F11.
+
+@cindex @code{$SYSMIS}
+@item $SYSMIS
+System missing value, in format F1.
+
+@cindex @code{$TIME}
+@item $TIME
+Number of seconds between midnight 14 Oct 1582 and the time the active file
+was read, in format F20.
+
+@cindex @code{$WIDTH}
+@item $WIDTH
+Page width, in characters, in format F3.
+@end table
+
+@node Sets of Variables, Input/Output Formats, System Variables, Variables
+@subsection Lists of variable names
+@cindex TO convention
+@cindex convention, TO
+
+There are several ways to specify a set of variables:
+
+@enumerate
+@item
+(Most commonly.)  List the variable names one after another, optionally
+separating them by commas.
+
+@cindex @code{TO}
+@item
+(This method cannot be used on commands that define the dictionary, such
+as @cmd{DATA LIST}.)  The syntax is the names of two existing variables,
+separated by the reserved keyword @code{TO}.  The meaning is to include
+every variable in the dictionary between and including the variables
+specified.  For instance, if the dictionary contains six variables with
+the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
+@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
+variables @code{X2}, @code{GOAL}, and @code{MET}.
+
+@item
+(This method can be used only on commands that define the dictionary,
+such as @cmd{DATA LIST}.)  It is used to define sequences of variables
+that end in consecutive integers.  The syntax is two identifiers that
+end in numbers.  This method is best illustrated with examples:
+
+@itemize @bullet
+@item
+The syntax @code{X1 TO X5} defines 5 variables:
+
+@itemize @minus
+@item
+X1
+@item
+X2
+@item
+X3
+@item
+X4
+@item
+X5
+@end itemize
+
+@item
+The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
+
+@itemize @minus
+@item
+ITEM0008
+@item
+ITEM0009
+@item
+ITEM0010
+@item
+ITEM0011
+@item
+ITEM0012
+@item
+ITEM0013
+@end itemize
+
+@item
+Each of the syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3}
+are invalid, although for different reasons, which should be evident.
+@end itemize
+
+Note that after a set of variables has been defined with @cmd{DATA LIST}
+or another command with this method, the same set can be referenced on
+later commands using the same syntax.
+
+@item
+The above methods can be combined, either one after another or delimited
+by commas.  For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
+is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
+is individually legal.
+@end enumerate
+
+@node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
+@subsection Input and Output Formats
+
+Data that PSPP inputs and outputs must have one of a number of formats.
+These formats are described, in general, by a format specification of
+the form @code{NAMEw.d}, where @var{name} is the
+format name and @var{w} is a field width.  @var{d} is the optional
+desired number of decimal places, if appropriate.  If @var{d} is not
+included then it is assumed to be 0.  Some formats do not allow @var{d}
+to be specified.
+
+When an input format is specified on @cmd{DATA LIST} or another
+command, then
+it is converted to an output format for the purposes of @cmd{PRINT}
+and other
+data output commands.  For most purposes, input and output formats are
+the same; the salient differences are described below.
+
+Below are listed the input and output formats supported by PSPP.  If an
+input format is mapped to a different output format by default, then
+that mapping is indicated with @result{}.  Each format has the listed
+bounds on input width (iw) and output width (ow).
+
+The standard numeric input and output formats are given in the following
+table:
+
+@table @asis
+@item Fw.d: 1 <= iw,ow <= 40
+Standard decimal format with @var{d} decimal places.  If the number is
+too large to fit within the field width, it is expressed in scientific
+notation (@code{1.2+34}) if w >= 6, with always at least two digits in
+the exponent.  When used as an input format, scientific notation is
+allowed but an E or an F must be used to introduce the exponent.
+
+The default output format is the same as the input format, except if
+@var{d} > 1.  In that case the output @var{w} is always made to be at
+least 2 + @var{d}.
+
+@item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
+For input this is equivalent to F format except that no E or F is
+require to introduce the exponent.  For output, produces scientific
+notation in the form @code{1.2+34}.  There are always at least two
+digits given in the exponent.
+
+The default output @var{w} is the largest of the input @var{w}, the
+input @var{d} + 7, and 10.  The default output @var{d} is the input
+@var{d}, but at least 3.
+
+@item COMMAw.d: 1 <= iw,ow <= 40
+Equivalent to F format, except that groups of three digits are
+comma-separated on output.  If the number is too large to express in the
+field width, then first commas are eliminated, then if there is still
+not enough space the number is expressed in scientific notation given
+that w >= 6.  Commas are allowed and ignored when this is used as an
+input format.  
+
+@item DOTw.d: 1 <= iw,ow <= 40
+Equivalent to COMMA format except that the roles of comma and decimal
+point are interchanged.  However: If SET /DECIMAL=DOT is in effect, then
+COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
+decimal point.
+
+@item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
+Equivalent to COMMA format, except that the number is prefixed by a
+dollar sign (@samp{$}) if there is room.  On input the value is allowed
+to be prefixed by a dollar sign, which is ignored.
+
+The default output @var{w} is the input @var{w}, but at least 2.
+
+@item PCTw.d: 2 <= iw,ow <= 40
+Equivalent to F format, except that the number is suffixed by a percent
+sign (@samp{%}) if there is room.  On input the value is allowed to be
+suffixed by a percent sign, which is ignored.
+
+The default output @var{w} is the input @var{w}, but at least 2.
+
+@item Nw.d: 1 <= iw,ow <= 40
+Only digits are allowed within the field width.  The decimal point is
+assumed to be @var{d} digits from the right margin.
+
+The default output format is F with the same @var{w} and @var{d}, except
+if @var{d} > 1.  In that case the output @var{w} is always made to be at
+least 2 + @var{d}.
+
+@item Zw.d @result{} F: 1 <= iw,ow <= 40
+Zoned decimal input.  If you need to use this then you know how.
+
+@item IBw.d @result{} F: 1 <= iw,ow <= 8
+Integer binary format.  The field is interpreted as a fixed-point
+positive or negative binary number in two's-complement notation.  The
+location of the decimal point is implied.  Endianness is the same as the
+host machine.
+
+The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
+with output @var{w} as 9 + input @var{d} and output @var{d} as input
+@var{d}.
+
+@item PIB @result{} F: 1 <= iw,ow <= 8
+Positive integer binary format.  The field is interpreted as a
+fixed-point positive binary number.  The location of the decimal point
+is implied.  Endianness is teh same as the host machine.
+
+The default output format follows the rules for IB format.
+
+@item Pw.d @result{} F: 1 <= iw,ow <= 16
+Binary coded decimal format.  Each byte from left to right, except the
+rightmost, represents two digits.  The upper nibble of each byte is more
+significant.  The upper nibble of the final byte is the least
+significant digit.  The lower nibble of the final byte is the sign; a
+value of D represents a negative sign and all other values are
+considered positive.  The decimal point is implied.
+
+The default output format follows the rules for IB format.
+
+@item PKw.d @result{} F: 1 <= iw,ow <= 16
+Positive binary code decimal format.  Same as P but the last byte is the
+same as the others.
+
+The default output format follows the rules for IB format.
+
+@item RBw @result{} F: 2 <= iw,ow <= 8
+
+Binary C architecture-dependent ``double'' format.  For a standard
+IEEE754 implementation @var{w} should be 8.
+
+The default output format follows the rules for IB format.
+
+@item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
+PIB format encoded as textual hex digit pairs.  @var{w} must be even.
+
+The input width is mapped to a default output width as follows:
+2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
+12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
+decimal places.
+
+@item RBHEXw @result{} F: 4 <= iw,ow <= 16
+
+RB format encoded as textual hex digits pairs.  @var{w} must be even.
+
+The default output format is F8.2.
+
+@item CCAw.d: 1 <= ow <= 40
+@itemx CCBw.d: 1 <= ow <= 40
+@itemx CCCw.d: 1 <= ow <= 40
+@itemx CCDw.d: 1 <= ow <= 40
+@itemx CCEw.d: 1 <= ow <= 40
+
+User-defined custom currency formats.  May not be used as an input
+format.  @xref{SET}, for more details.
+@end table
+
+The date and time numeric input and output formats accept a number of
+possible formats.  Before describing the formats themselves, some
+definitions of the elements that make up their formats will be helpful:
+
+@table @dfn
+@item leader
+All formats accept an optional whitespace leader.
+
+@item day
+An integer between 1 and 31 representing the day of month.
+
+@item day-count
+An integer representing a number of days.
+
+@item date-delimiter
+One or more characters of whitespace or the following characters:
+@code{- / . ,}
+
+@item month
+A month name in one of the following forms:
+@itemize @bullet
+@item
+An integer between 1 and 12.
+@item
+Roman numerals representing an integer between 1 and 12.
+@item
+At least the first three characters of an English month name (January,
+February, @dots{}).
+@end itemize
+
+@item year
+An integer year number between 1582 and 19999, or between 1 and 199.
+Years between 1 and 199 will have 1900 added.
+
+@item julian
+A single number with a year number in the first 2, 3, or 4 digits (as
+above) and the day number within the year in the last 3 digits.
+
+@item quarter
+An integer between 1 and 4 representing a quarter.
+
+@item q-delimiter
+The letter @samp{Q} or @samp{q}.
+
+@item week
+An integer between 1 and 53 representing a week within a year.
+
+@item wk-delimiter
+The letters @samp{wk} in any case.
+
+@item time-delimiter
+At least one characters of whitespace or @samp{:} or @samp{.}.
+
+@item hour
+An integer greater than 0 representing an hour.
+
+@item minute
+An integer between 0 and 59 representing a minute within an hour.
+
+@item opt-second
+Optionally, a time-delimiter followed by a real number representing a
+number of seconds.
+
+@item hour24
+An integer between 0 and 23 representing an hour within a day.
+
+@item weekday
+At least the first two characters of an English day word.
+
+@item spaces
+Any amount or no amount of whitespace.
+
+@item sign
+An optional positive or negative sign.
+
+@item trailer
+All formats accept an optional whitespace trailer.
+@end table
+
+The date input formats are strung together from the above pieces.  On
+output, the date formats are always printed in a single canonical
+manner, based on field width.  The date input and output formats are
+described below:
+
+@table @asis
+@item DATEw: 9 <= iw,ow <= 40
+Date format. Input format: leader + day + date-delimiter +
+month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
+@var{w} < 11, DD-MMM-YYYY otherwise.
+
+@item EDATEw: 8 <= iw,ow <= 40
+European date format.  Input format same as DATE.  Output format:
+DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
+
+@item SDATEw: 8 <= iw,ow <= 40
+Standard date format. Input format: leader + year + date-delimiter +
+month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
+@var{w} < 10, YYYY/MM/DD otherwise.
+
+@item ADATEw: 8 <= iw,ow <= 40
+American date format.  Input format: leader + month + date-delimiter +
+day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
+@var{w} < 10, MM/DD/YYYY otherwise.
+
+@item JDATEw: 5 <= iw,ow <= 40
+Julian date format.  Input format: leader + julian + trailer.  Output
+format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
+
+@item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
+Quarter/year format.  Input format: leader + quarter + q-delimiter +
+year + trailer.  Output format: @samp{Q Q YY}, where the first
+@samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
+YYYY} otherwise.
+
+@item MOYRw: 6 <= iw,ow <= 40
+Month/year format.  Input format: leader + month + date-delimiter + year
++ trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
+YYYY} otherwise.
+
+@item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
+Week/year format.  Input format: leader + week + wk-delimiter + year +
+trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
+YYYY} otherwise.
+
+@item DATETIMEw.d: 17 <= iw,ow <= 40
+Date and time format.  Input format: leader + day + date-delimiter +
+month + date-delimiter + yaer + time-delimiter + hour24 + time-delimiter
++ minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
+@var{w} > 19 then seconds @samp{:SS} is added.  If @var{w} > 22 and
+@var{d} > 0 then fractional seconds @samp{.SS} are added.
+
+@item TIMEw.d: 5 <= iw,ow <= 40
+Time format.  Input format: leader + sign + spaces + hour +
+time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
+Seconds and fractional seconds are available with @var{w} of at least 8
+and 10, respectively.
+
+@item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
+Time format with day count.  Input format: leader + sign + spaces +
+day-count + time-delimiter + hour + time-delimiter + minute +
+opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
+seconds are available with @var{w} of at least 8 and 10, respectively.
+
+@item WKDAYw: 2 <= iw,ow <= 40
+A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
+leader + weekday + trailer.  Output format: as many characters, in all
+capital letters, of the English name of the weekday as will fit in the
+field width.
+
+@item MONTHw: 3 <= iw,ow <= 40
+A month as a number between 1 and 12, where 1 is January.  Input format:
+leader + month + trailer.  Output format: as many character, in all
+capital letters, of the English name of the month as will fit in the
+field width.
+@end table
+
+There are only two formats that may be used with string variables:
+
+@table @asis
+@item Aw: 1 <= iw <= 255, 1 <= ow <= 254
+The entire field is treated as a string value.
+
+@item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
+The field is composed of characters in a string encoded as textual hex
+digit pairs.
+
+The default output @var{w} is half the input @var{w}.
+@end table
+
+@node Scratch Variables,  , Input/Output Formats, Variables
+@subsection Scratch Variables
+
+Most of the time, variables don't retain their values between cases.
+Instead, either they're being read from a data file or the active file,
+in which case they assume the value read, or, if created with
+@cmd{COMPUTE} or
+another transformation, they're initialized to the system-missing value
+or to blanks, depending on type.
+
+However, sometimes it's useful to have a variable that keeps its value
+between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
+use a @dfn{scratch variable}.  Scratch variables are variables whose
+names begin with an octothorpe (@samp{#}).  
+
+Scratch variables have the same properties as variables left with
+@cmd{LEAVE}:
+they retain their values between cases, and for the first case they are
+initialized to 0 or blanks.  They have the additional property that they
+are deleted before the execution of any procedure.  For this reason,
+scratch variables can't be used for analysis.  To obtain the same
+effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
+value into an ordinary variable, then analysis that variable.
+
+@node Files, BNF, Variables, Language
+@section Files Used by PSPP
+
+PSPP makes use of many files each time it runs.  Some of these it
+reads, some it writes, some it creates.  Here is a table listing the
+most important of these files:
+
+@table @strong
+@cindex file, command
+@cindex file, syntax file
+@cindex command file
+@cindex syntax file
+@item command file
+@itemx syntax file
+These names (synonyms) refer to the file that contains instructions to
+PSPP that tell it what to do.  The syntax file's name is specified on
+the PSPP command line.  Syntax files can also be pulled in with
+@cmd{INCLUDE} (@pxref{INCLUDE}).
+
+@cindex file, data
+@cindex data file
+@item data file
+Data files contain raw data in ASCII format suitable for being read in
+by @cmd{DATA LIST}.  Data can be embedded in the syntax
+file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
+syntax file a data file too.
+
+@cindex file, output
+@cindex output file
+@item listing file
+One or more output files are created by PSPP each time it is
+run.  The output files receive the tables and charts produced by
+statistical procedures.  The output files may be in any number of formats,
+depending on how PSPP is configured.
+
+@cindex active file
+@cindex file, active
+@item active file
+The active file is the ``file'' on which all PSPP procedures
+are performed.  The active file contains variable definitions and
+cases.  The active file is not necessarily a disk file: it is stored
+in memory if there is room.
+@end table
+
+@node BNF,  , Files, Language
+@section Backus-Naur Form
+@cindex BNF
+@cindex Backus-Naur Form
+@cindex command syntax, description of
+@cindex description of command syntax
+
+The syntax of some parts of the PSPP language is presented in this
+manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
+following table describes BNF:
+
+@itemize @bullet
+@cindex keywords
+@cindex terminals
+@item
+Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
+often called @dfn{terminals}.  There are some special terminals, which
+are actually written in lowercase for clarity:
+
+@table @asis
+@cindex @code{number}
+@item @code{number}
+A real number.
+
+@cindex @code{integer}
+@item @code{integer}
+An integer number.
+
+@cindex @code{string}
+@item @code{string}
+A string.
+
+@cindex @code{var-name}
+@item @code{var-name}
+A single variable name.
+
+@cindex operators
+@cindex punctuators
+@item @code{=}, @code{/}, @code{+}, @code{-}, etc.
+Operators and punctuators.
+
+@cindex @code{.}
+@cindex terminal dot
+@cindex dot, terminal
+@item @code{.}
+The terminal dot.  This is not necessarily an actual dot in the syntax
+file: @xref{Commands}, for more details.
+@end table
+
+@item
+@cindex productions
+@cindex nonterminals
+Other words in all lowercase refer to BNF definitions, called
+@dfn{productions}.  These productions are also known as
+@dfn{nonterminals}.  Some nonterminals are very common, so they are
+defined here in English for clarity:
+
+@table @code
+@cindex @code{var-list}
+@item var-list
+A list of one or more variable names or the keyword @code{ALL}.
+
+@cindex @code{expression}
+@item expression
+An expression.  @xref{Expressions}, for details.
+@end table
+
+@item
+@cindex @code{::=}
+@cindex ``is defined as''
+@cindex productions
+@samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
+the name of the nonterminal being defined.  The right side of @samp{::=}
+gives the definition of that nonterminal.  If the right side is empty,
+then one possible expansion of that nonterminal is nothing.  A BNF
+definition is called a @dfn{production}.
+
+@item
+@cindex terminals and nonterminals, differences
+So, the key difference between a terminal and a nonterminal is that a
+terminal cannot be broken into smaller parts---in fact, every terminal
+is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
+composed of a (possibly empty) sequence of terminals and nonterminals.
+Thus, terminals indicate the deepest level of syntax description.  (In
+parsing theory, terminals are the leaves of the parse tree; nonterminals
+form the branches.)
+
+@item
+@cindex start symbol
+@cindex symbol, start
+The first nonterminal defined in a set of productions is called the
+@dfn{start symbol}.  The start symbol defines the entire syntax for
+that command.
+@end itemize
+@setfilename ignored