doc/language.texi

   1 @node Language, Expressions, Invocation, Top
   2 @chapter The PSPP language
   3 @cindex language, PSPP
   4 @cindex PSPP, language
   5
   6 @quotation
   7 @strong{Please note:} PSPP is not even close to completion.
   8 Only a few statistical procedures are implemented.  PSPP
   9 is a work in progress.
  10 @end quotation
  11
  12 This chapter discusses elements common to many PSPP commands.
  13 Later chapters will describe individual commands in detail.
  14
  15 @menu
  16 * Tokens::                      Characters combine to form tokens.
  17 * Commands::                    Tokens combine to form commands.
  18 * Types of Commands::           Commands come in several flavors.
  19 * Order of Commands::           Commands combine to form syntax files.
  20 * Missing Observations::        Handling missing observations.
  21 * Variables::                   The unit of data storage.
  22 * Files::                       Files used by PSPP.
  23 * File Handles::                How files are named.
  24 * BNF::                         How command syntax is described.
  25 @end menu
  26
  27 @node Tokens, Commands, Language, Language
  28 @section Tokens
  29 @cindex language, lexical analysis
  30 @cindex language, tokens
  31 @cindex tokens
  32 @cindex lexical analysis
  33
  34 PSPP divides most syntax file lines into a series of short chunks
  35 called @dfn{tokens}.
  36 Tokens are then grouped to form commands, each of which tells
  37 PSPP to take some action---read in data, write out data, perform
  38 a statistical procedure, etc.  Each type of token is
  39 described below.
  40
  41 @table @strong
  42 @cindex identifiers
  43 @item Identifiers
  44 Identifiers are names that typically specify variables, commands, or
  45 subcommands.  The first character in an identifier must be a letter,
  46 @samp{#}, or @samp{@@}.  The remaining characters in the identifier
  47 must be letters, digits, or one of the following special characters:
  48
  49 @example
  50 @center @.  _  $  #  @@
  51 @end example
  52
  53 @cindex case-sensitivity
  54 Identifiers may be any length, but only the first 64 bytes are
  55 significant.  Identifiers are not case-sensitive: @code{foobar},
  56 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
  57 different representations of the same identifier.
  58
  59 @cindex identifiers, reserved
  60 @cindex reserved identifiers
  61 Some identifiers are reserved.  Reserved identifiers may not be used
  62 in any context besides those explicitly described in this manual.  The
  63 reserved identifiers are:
  64
  65 @example
  66 @center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
  67 @end example
  68
  69 @item Keywords
  70 Keywords are a subclass of identifiers that form a fixed part of
  71 command syntax.  For example, command and subcommand names are
  72 keywords.  Keywords may be abbreviated to their first 3 characters if
  73 this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
  74 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
  75 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
  76
  77 Reserved identifiers are always used as keywords.  Other identifiers
  78 may be used both as keywords and as user-defined identifiers, such as
  79 variable names.
  80
  81 @item Numbers
  82 @cindex numbers
  83 @cindex integers
  84 @cindex reals
  85 Numbers are expressed in decimal.  A decimal point is optional.
  86 Numbers may be expressed in scientific notation by adding @samp{e} and
  87 a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
  88 are some more examples of valid numbers:
  89
  90 @example
  91 -5  3.14159265359  1e100  -.707  8945.
  92 @end example
  93
  94 Negative numbers are expressed with a @samp{-} prefix.  However, in
  95 situations where a literal @samp{-} token is expected, what appears to
  96 be a negative number is treated as @samp{-} followed by a positive
  97 number.
  98
  99 No white space is allowed within a number token, except for horizontal
 100 white space between @samp{-} and the rest of the number.
 101
 102 The last example above, @samp{8945.}, will be interpreted as two
 103 tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
 104 @xref{Commands, , Forming commands of tokens}.
 105
 106 @item Strings
 107 @cindex strings
 108 @cindex @samp{'}
 109 @cindex @samp{"}
 110 @cindex case-sensitivity
 111 Strings are literal sequences of characters enclosed in pairs of
 112 single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
 113 character used for quoting in the string, double it, e.g.@:
 114 @samp{'it''s an apostrophe'}.  White space and case of letters are
 115 significant inside strings.
 116
 117 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
 118 'c'} is equivalent to @samp{'abc'}.  Concatenation is useful for
 119 splitting a single string across multiple source lines.  The maximum
 120 length of a string, after concatenation, is 255 characters.
 121
 122 Strings may also be expressed as hexadecimal, octal, or binary
 123 character values by prefixing the initial quote character by @samp{X},
 124 @samp{O}, or @samp{B} or their lowercase equivalents.  Each pair,
 125 triplet, or octet of characters, according to the radix, is
 126 transformed into a single character with the given value.  If there is
 127 an incomplete group of characters, the missing final digits are
 128 assumed to be @samp{0}.  These forms of strings are nonportable
 129 because numeric values are associated with different characters by
 130 different operating systems.  Therefore, their use should be confined
 131 to syntax files that will not be widely distributed.
 132
 133 @cindex characters, reserved
 134 @cindex 0
 135 @cindex white space
 136 The character with value 00 is reserved for
 137 internal use by PSPP.  Its use in strings causes an error and
 138 replacement by a space character.
 139
 140 @item Punctuators and Operators
 141 @cindex punctuators
 142 @cindex operators
 143 These tokens are the punctuators and operators:
 144
 145 @example
 146 @center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
 147 @end example
 148
 149 Most of these appear within the syntax of commands, but the period
 150 (@samp{.}) punctuator is used only at the end of a command.  It is a
 151 punctuator only as the last character on a line (except white space).
 152 When it is the last non-space character on a line, a period is not
 153 treated as part of another token, even if it would otherwise be part
 154 of, e.g.@:, an identifier or a floating-point number.
 155
 156 Actually, the character that ends a command can be changed with
 157 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
 158 doing so.  Throughout the remainder of this manual we will assume that
 159 the default setting is in effect.
 160 @end table
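
The token rules above can be illustrated with a few fragments.  This is
only a sketch: the identifiers are made up for illustration, and the
character values given for the radix-prefixed strings assume an
ASCII-based system.

@example
foobar  FooBar  FOOBAR          @r{one identifier, written three ways}
#temp   v1.2    rate_2005       @r{other valid identifiers}
FRE     FREQ    FREQUENCIES     @r{one keyword, abbreviated two ways}
-5  1e100  -.707                @r{numbers}
'it''s an apostrophe'           @r{string with a doubled quote}
"a" + 'b' + 'c'                 @r{equivalent to the string abc}
X'414243'  O'101'  B'01000001'  @r{ABC, A, and A in ASCII}
@end example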
 161
 162 @node Commands, Types of Commands, Tokens, Language
 163 @section Forming commands of tokens
 164
 165 @cindex PSPP, command structure
 166 @cindex language, command structure
 167 @cindex commands, structure
 168
 169 Most PSPP commands share a common structure.  A command begins with a
 170 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
 171 CASES}.  The command name may be abbreviated to its first word, and
 172 each word in the command name may be abbreviated to its first three
 173 or more characters, where these abbreviations are unambiguous.
 174
 175 The command name may be followed by one or more @dfn{subcommands}.
 176 Each subcommand begins with a subcommand name, which may be
 177 abbreviated to its first three letters.  Some subcommands accept a
 178 series of one or more specifications, which follow the subcommand
 179 name, optionally separated from it by an equals sign
 180 (@samp{=}).  Specifications may be separated from each other
 181 by commas or spaces.  Each subcommand must be separated from the next (if any)
 182 by a forward slash (@samp{/}).
 183
 184 There are multiple ways to mark the end of a command.  The most common
 185 way is to end the last line of the command with a period (@samp{.}) as
 186 described in the previous section (@pxref{Tokens}).  A blank line, or
 187 one that consists only of white space or comments, also ends a command
 188 by default, although you can use the NULLINE subcommand of @cmd{SET}
 189 to disable this feature (@pxref{SET}).
 190
 191 In batch mode only, that is, when reading commands from a file instead
 192 of an interactive user, any line that contains a non-space character
 193 in the leftmost column begins a new command.  Thus, each command
 194 consists of a flush-left line followed by any number of lines indented
 195 from the left margin.  In this mode, a plus sign, minus sign, or
 196 period (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character
 197 in a line is ignored and causes that line to begin a new command,
 198 which allows for visual indentation of a command without that command
 199 being considered part of the previous command.
 200
 201 Sometimes, one encounters syntax files that are intended to be
 202 interpreted in interactive mode rather than batch mode.  When this
 203 occurs, use the @samp{-i} command line option to force interpretation
 204 in interactive mode (@pxref{Language control options}).
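
As a sketch of this structure, the two commands below should be
equivalent.  The variable names @code{age} and @code{sex} are
hypothetical, and the second form assumes that each three-letter
abbreviation is unambiguous.

@example
FREQUENCIES /VARIABLES=age sex
            /STATISTICS=MEAN STDDEV.

FRE /VAR=age, sex /STA=MEAN STDDEV.
@end example

The first command is split across two lines, with the continuation
line indented so that it is not taken as a new command in batch mode.
Each command ends with a period as the last character on its final
line.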
 205
 206 @node Types of Commands, Order of Commands, Commands, Language
 207 @section Types of Commands
 208
 209 Commands in PSPP are divided roughly into six categories:
 210
 211 @table @strong
 212 @item Utility commands
 213 @cindex utility commands
 214 Set or display various global options that affect PSPP operations.
 215 May appear anywhere in a syntax file.  @xref{Utilities, , Utility
 216 commands}.
 217
 218 @item File definition commands
 219 @cindex file definition commands
 220 Give instructions for reading data from text files or from special
 221 binary ``system files''.  Most of these commands replace any previous
 222 data or variables with new data or variables.  At least one file
 223 definition command must appear before the first command in any of
 224 the categories below.  @xref{Data Input and Output}.
 225
 226 @item Input program commands
 227 @cindex input program commands
 228 Though rarely used, these provide tools for reading data files
 229 in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
 230
 231 @item Transformations
 232 @cindex transformations
 233 Perform operations on data and write data to output files.  Transformations
 234 are not carried out until a procedure is executed.
 235
 236 @item Restricted transformations
 237 @cindex restricted transformations
 238 Transformations that cannot appear in certain contexts.  @xref{Order
 239 of Commands}, for details.
 240
 241 @item Procedures
 242 @cindex procedures
 243 Analyze data, writing results of analyses to the listing file.  Cause
 244 transformations specified earlier in the file to be performed.  In a
 245 more general sense, a @dfn{procedure} is any command that causes the
 246 active file (the data) to be read.
 247 @end table
 248
 249 @node Order of Commands, Missing Observations, Types of Commands, Language
 250 @section Order of Commands
 251 @cindex commands, ordering
 252 @cindex order of commands
 253
 254 PSPP does not place many restrictions on ordering of commands.  The
 255 main restriction is that variables must be defined before they are otherwise
 256 referenced.  This section describes the details of command ordering,
 257 but most users will have no need to refer to them.
 258
 259 PSPP possesses five internal states, called initial, INPUT PROGRAM,
 260 FILE TYPE, transformation, and procedure states.  (Please note the
 261 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
 262 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
 263
 264 PSPP starts in the initial state.  Each successful completion
 265 of a command may cause a state transition.  Each type of command has its
 266 own rules for state transitions:
 267
 268 @table @strong
 269 @item Utility commands
 270 @itemize @bullet
 271 @item
 272 Valid in any state.
 273 @item
 274 Do not cause state transitions.  Exception: when @cmd{N OF CASES}
 275 is executed in the procedure state, it causes a transition to the
 276 transformation state.
 277 @end itemize
 278
 279 @item @cmd{DATA LIST}
 280 @itemize @bullet
 281 @item
 282 Valid in any state.
 283 @item
 284 When executed in the initial or procedure state, causes a transition to
 285 the transformation state.
 286 @item
 287 Clears the active file if executed in the procedure or transformation
 288 state.
 289 @end itemize
 290
 291 @item @cmd{INPUT PROGRAM}
 292 @itemize @bullet
 293 @item
 294 Invalid in INPUT PROGRAM and FILE TYPE states.
 295 @item
 296 Causes a transition to the INPUT PROGRAM state.
 297 @item
 298 Clears the active file.
 299 @end itemize
 300
 301 @item @cmd{FILE TYPE}
 302 @itemize @bullet
 303 @item
 304 Invalid in INPUT PROGRAM and FILE TYPE states.
 305 @item
 306 Causes a transition to the FILE TYPE state.
 307 @item
 308 Clears the active file.
 309 @end itemize
 310
 311 @item Other file definition commands
 312 @itemize @bullet
 313 @item
 314 Invalid in INPUT PROGRAM and FILE TYPE states.
 315 @item
 316 Cause a transition to the transformation state.
 317 @item
 318 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
 319 and @cmd{UPDATE}.
 320 @end itemize
 321
 322 @item Transformations
 323 @itemize @bullet
 324 @item
 325 Invalid in initial and FILE TYPE states.
 326 @item
 327 Cause a transition to the transformation state.
 328 @end itemize
 329
 330 @item Restricted transformations
 331 @itemize @bullet
 332 @item
 333 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 334 @item
 335 Cause a transition to the transformation state.
 336 @end itemize
 337
 338 @item Procedures
 339 @itemize @bullet
 340 @item
 341 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 342 @item
 343 Cause a transition to the procedure state.
 344 @end itemize
 345 @end table
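
As an illustration of these rules, the comments in the following
sketch trace the state after each command.  The file name
@file{scores.txt} and the variable names are hypothetical.

@example
* PSPP starts in the initial state.
DATA LIST FILE='scores.txt' /x 1-3.
* DATA LIST causes a transition to the transformation state.
COMPUTE y = x * 2.
* COMPUTE is a transformation, so the state stays the same.
DESCRIPTIVES /VARIABLES=y.
* DESCRIPTIVES is a procedure, so PSPP enters the procedure state.
COMPUTE z = y + 1.
* This transformation returns PSPP to the transformation state.
@end example

The comments themselves are utility commands, so they are valid in
every state and cause no transitions.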
 346
 347 @node Missing Observations, Variables, Order of Commands, Language
 348 @section Handling missing observations
 349 @cindex missing values
 350 @cindex values, missing
 351
 352 PSPP includes special support for unknown numeric data values.
 353 Missing observations are assigned a special value, called the
 354 @dfn{system-missing value}.  This ``value'' actually indicates the
 355 absence of a value; it means that the actual value is unknown.  Procedures
 356 automatically exclude from analyses those observations or cases that
 357 have missing values.  Details of missing value exclusion depend on the
 358 procedure and can often be controlled by the user; refer to
 359 descriptions of individual procedures for details.
 360
 361 The system-missing value exists only for numeric variables.  String
 362 variables always have a defined value, even if it is only a string of
 363 spaces.
 364
 365 Variables, whether numeric or string, can have designated
 366 @dfn{user-missing values}.  Every user-missing value is an actual value
 367 for that variable.  However, most of the time user-missing values are
 368 treated in the same way as the system-missing value.  String variables
 369 that are wider than a certain width, usually 8 characters (depending on
 370 computer architecture), cannot have user-missing values.
 371
 372 For more information on missing values, see the following sections:
 373 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
 374 documentation on individual procedures for information on how they
 375 handle missing values.
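
As a brief sketch, user-missing values are declared with @cmd{MISSING
VALUES}.  The variable names and codes here are hypothetical.

@example
MISSING VALUES income (998, 999).
MISSING VALUES sex ('X').
@end example

After these commands, an @code{income} of 998 or 999 and a @code{sex}
of @samp{X} remain stored as actual values, but most of the time they
are treated in the same way as the system-missing value.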
 376
 377 @node Variables, Files, Missing Observations, Language
 378 @section Variables
 379 @cindex variables
 380 @cindex dictionary
 381
 382 Variables are the basic unit of data storage in PSPP.  All the
 383 variables in a file taken together, apart from any associated data, are
 384 said to form a @dfn{dictionary}.
 385 Some details of variables are described in the sections below.
 386
 387 @menu
 388 * Attributes::                  Attributes of variables.
 389 * System Variables::            Variables automatically defined by PSPP.
 390 * Sets of Variables::           Lists of variable names.
 391 * Input/Output Formats::        Input and output formats.
 392 * Scratch Variables::           Variables deleted by procedures.
 393 @end menu
 394
 395 @node Attributes, System Variables, Variables, Variables
 396 @subsection Attributes of Variables
 397 @cindex variables, attributes of
 398 @cindex attributes of variables
 399 Each variable has a number of attributes, including:
 400
 401 @table @strong
 402 @item Name
 403 An identifier, up to 64 bytes long.  Each variable must have a different name.
 404 @xref{Tokens}.
 405
 406 Some system variable names begin with @samp{$}, but user-defined
 407 variables' names may not begin with @samp{$}.
 408
 409 @cindex @samp{.}
 410 @cindex period
 411 @cindex variable names, ending with period
 412 The final character in a variable name should not be @samp{.}, because
 413 such an identifier will be misinterpreted when it is the final token
 414 on a line: @code{FOO.} will be divided into two separate tokens,
 415 @samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.
 416
 417 @cindex @samp{_}
 418 The final character in a variable name should not be @samp{_}, because
 419 some such identifiers are used for special purposes by PSPP
 420 procedures.
 421
 422 As with all PSPP identifiers, variable names are not case-sensitive.
 423 PSPP capitalizes variable names on output the same way they were
 424 capitalized at their point of definition in the input.
 425
 426 @cindex variables, type
 427 @cindex type of variables
 428 @item Type
 429 Numeric or string.
 430
 431 @cindex variables, width
 432 @cindex width of variables
 433 @item Width
 434 (string variables only) String variables with a width of 8 characters or
 435 fewer are called @dfn{short string variables}.  Short string variables
 436 can be used in many procedures where @dfn{long string variables} (those
 437 with widths greater than 8) are not allowed.
 438
 439 Certain systems may consider strings longer than 8
 440 characters to be short strings.  Eight characters represents a minimum
 441 figure for the maximum length of a short string.
 442
 443 @item Position
 444 Variables in the dictionary are arranged in a specific order.
 445 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
 446
 447 @item Initialization
 448 Either reinitialized to 0 or spaces for each case, or left at its
 449 existing value.  @xref{LEAVE}.
 450
 451 @cindex missing values
 452 @cindex values, missing
 453 @item Missing values
 454 Optionally, up to three values, or a range of values, or a specific
 455 value plus a range, can be specified as @dfn{user-missing values}.
 456 There is also a @dfn{system-missing value} that is assigned to an
 457 observation when there is no other obvious value for that observation.
 458 Observations with missing values are automatically excluded from
 459 analyses.  User-missing values are actual data values, while the
 460 system-missing value is not a value at all.  @xref{Missing Observations}.
 461
 462 @cindex variable labels
 463 @cindex labels, variable
 464 @item Variable label
 465 A string that describes the variable.  @xref{VARIABLE LABELS}.
 466
 467 @cindex value labels
 468 @cindex labels, value
 469 @item Value label
 470 Optionally, these associate each possible value of the variable with a
 471 string.  @xref{VALUE LABELS}.
 472
 473 @cindex print format
 474 @item Print format
 475 Display width, format, and (for numeric variables) number of decimal
 476 places.  This attribute does not affect how data are stored, just how
 477 they are displayed.  Example: a width of 8, with 2 decimal places.
 478 @xref{PRINT FORMATS}.
 479
 480 @cindex write format
 481 @item Write format
 482 Similar to print format, but used by certain commands that are
 483 designed to write to binary files.  @xref{WRITE FORMATS}.
 484 @end table
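
Several of these attributes are set by their own commands.  The
following sketch assumes a dictionary that contains the hypothetical
variables @code{age}, @code{sex}, and @code{total}.

@example
VARIABLE LABELS age 'Age in years'.
VALUE LABELS /sex 1 'Male' 2 'Female'.
PRINT FORMATS age (F8.2).
LEAVE total.
@end example

Here @code{age} receives a variable label and a print format with a
width of 8 and 2 decimal places, @code{sex} receives value labels, and
@code{total} is left at its existing value from case to case instead
of being reinitialized.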
 485
 486 @node System Variables, Sets of Variables, Attributes, Variables
 487 @subsection Variables Automatically Defined by PSPP
 488 @cindex system variables
 489 @cindex variables, system
 490
 491 There are seven system variables.  These are not like ordinary
 492 variables because system variables are not always stored.  They can be used only
 493 in expressions.  These system variables, whose values and output formats
 494 cannot be modified, are described below.
 495
 496 @table @code
 497 @cindex @code{$CASENUM}
 498 @item $CASENUM
 499 Case number of the current case.  This changes as cases are
 500 shuffled around.
 501
 502 @cindex @code{$DATE}
 503 @item $DATE
 504 Date the PSPP process was started, in format A9, following the
 505 pattern @code{DD MMM YY}.
 506
 507 @cindex @code{$JDATE}
 508 @item $JDATE
 509 Number of days between 15 Oct 1582 and the time the PSPP process
 510 was started.
 511
 512 @cindex @code{$LENGTH}
 513 @item $LENGTH
 514 Page length, in lines, in format F11.
 515
 516 @cindex @code{$SYSMIS}
 517 @item $SYSMIS
 518 System missing value, in format F1.
 519
 520 @cindex @code{$TIME}
 521 @item $TIME
 522 Number of seconds between midnight 14 Oct 1582 and the time the active file
 523 was read, in format F20.
 524
 525 @cindex @code{$WIDTH}
 526 @item $WIDTH
 527 Page width, in characters, in format F3.
 528 @end table
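
Because system variables can be used only in expressions, they
typically appear on the right-hand side of a transformation.  A
minimal sketch, with hypothetical target variable names:

@example
COMPUTE seq = $CASENUM.
COMPUTE unknown = $SYSMIS.
@end example

The first command stores the current case number in @code{seq}; the
second sets @code{unknown} to the system-missing value for every case.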
 529
 530 @node Sets of Variables, Input/Output Formats, System Variables, Variables
 531 @subsection Lists of variable names
 532 @cindex TO convention
 533 @cindex convention, TO
 534
 535 To refer to a set of variables, list their names one after another.
 536 Optionally, their names may be separated by commas.  To include a
 537 range of variables from the dictionary in the list, write the name of
 538 the first and last variable in the range, separated by @code{TO}.  For
 539 instance, if the dictionary contains six variables with the names
 540 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
 541 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
 542 variables @code{X2}, @code{GOAL}, and @code{MET}.
 543
 544 Commands that define variables, such as @cmd{DATA LIST}, give
 545 @code{TO} an alternate meaning.  With these commands, @code{TO} defines
 546 sequences of variables whose names end in consecutive integers.  The
 547 syntax is two identifiers that begin with the same root and end with
 548 numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
 549 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
 550 @code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
 551 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
 552 @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
 553 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
 554
 555 After a set of variables has been defined with @cmd{DATA LIST} or
 556 another command with this method, the same set can be referenced on
 557 later commands using the same syntax.
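
The two meanings of @code{TO} can be seen side by side in the
following sketch.  The file name @file{survey.txt} and the column
positions are hypothetical, and the example assumes that @cmd{DATA
LIST} divides the column range 6-10 evenly among the five variables.

@example
DATA LIST FILE='survey.txt' /ID 1-4 X1 TO X5 6-10.
DESCRIPTIVES /VARIABLES=X2 TO X4.
@end example

On @cmd{DATA LIST}, @code{X1 TO X5} defines the five variables
@code{X1} through @code{X5}.  On @cmd{DESCRIPTIVES}, @code{X2 TO X4}
selects the variables from @code{X2} to @code{X4} in dictionary order,
that is, @code{X2}, @code{X3}, and @code{X4}.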
 558
 559 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
 560 @subsection Input and Output Formats
 561
 562 Data that PSPP inputs and outputs must have one of a number of formats.
 563 These formats are described, in general, by a format specification of
 564 the form @code{NAMEw.d}, where @var{name} is the
 565 format name and @var{w} is a field width.  @var{d} is the optional
 566 desired number of decimal places, if appropriate.  If @var{d} is not
 567 included then it is assumed to be 0.  Some formats do not allow @var{d}
 568 to be specified.
 569
 570 When @cmd{DATA LIST} or another command specifies an input format,
 571 that format is converted to an output format for the purposes of
 572 @cmd{PRINT} and other data output commands.  For most purposes, input
 573 and output formats are the same; the salient differences are described
 574 below.
 575
 576 Below are listed the input and output formats supported by PSPP.  If an
 577 input format is mapped to a different output format by default, then
 578 that mapping is indicated with @result{}.  Each format has the listed
 579 bounds on input width (iw) and output width (ow).
 580
 581 The standard numeric input and output formats are given in the following
 582 table:
 583
 584 @table @asis
 585 @item Fw.d: 1 <= iw,ow <= 40
 586 Standard decimal format with @var{d} decimal places.  If the number is
 587 too large to fit within the field width, it is expressed in scientific
 588 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
 589 the exponent.  When used as an input format, scientific notation is
 590 allowed but an E or an F must be used to introduce the exponent.
 591
 592 The default output format is the same as the input format, except if
 593 @var{d} > 1.  In that case the output @var{w} is always made to be at
 594 least 2 + @var{d}.
 595
 596 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
 597 For input this is equivalent to F format except that no E or F is
 598 required to introduce the exponent.  For output, produces scientific
 599 notation in the form @code{1.2+34}.  There are always at least two
 600 digits given in the exponent.
 601
 602 The default output @var{w} is the largest of the input @var{w}, the
 603 input @var{d} + 7, and 10.  The default output @var{d} is the input
 604 @var{d}, but at least 3.
 605
 606 @item COMMAw.d: 1 <= iw,ow <= 40
 607 Equivalent to F format, except that groups of three digits are
 608 comma-separated on output.  If the number is too large to express in the
 609 field width, then first commas are eliminated, then if there is still
 610 not enough space the number is expressed in scientific notation given
 611 that w >= 6.  Commas are allowed and ignored when this is used as an
 612 input format.
 613
 614 @item DOTw.d: 1 <= iw,ow <= 40
 615 Equivalent to COMMA format except that the roles of comma and decimal
 616 point are interchanged.  However, if SET /DECIMAL=DOT is in effect, then
 617 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
 618 decimal point.
 619
 620 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
 621 Equivalent to COMMA format, except that the number is prefixed by a
 622 dollar sign (@samp{$}) if there is room.  On input the value is allowed
 623 to be prefixed by a dollar sign, which is ignored.
 624
 625 The default output @var{w} is the input @var{w}, but at least 2.
 626
 627 @item PCTw.d: 2 <= iw,ow <= 40
 628 Equivalent to F format, except that the number is suffixed by a percent
 629 sign (@samp{%}) if there is room.  On input the value is allowed to be
 630 suffixed by a percent sign, which is ignored.
 631
 632 The default output @var{w} is the input @var{w}, but at least 2.
 633
 634 @item Nw.d: 1 <= iw,ow <= 40
 635 Only digits are allowed within the field width.  The decimal point is
 636 assumed to be @var{d} digits from the right margin.
 637
 638 The default output format is F with the same @var{w} and @var{d}, except
 639 if @var{d} > 1.  In that case the output @var{w} is always made to be at
 640 least 2 + @var{d}.
 641
 642 @item Zw.d @result{} F: 1 <= iw,ow <= 40
 643 Zoned decimal input.  If you need to use this then you know how.
 644
 645 @item IBw.d @result{} F: 1 <= iw,ow <= 8
 646 Integer binary format.  The field is interpreted as a fixed-point
 647 positive or negative binary number in two's-complement notation.  The
 648 location of the decimal point is implied.  Endianness is the same as the
 649 host machine.
 650
 651 The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
 652 with output @var{w} as 9 + input @var{d} and output @var{d} as input
 653 @var{d}.
 654
 655 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
 656 Positive integer binary format.  The field is interpreted as a
 657 fixed-point positive binary number.  The location of the decimal point
 658 is implied.  Endianness is the same as the host machine.
 659
 660 The default output format follows the rules for IB format.
 661
 662 @item Pw.d @result{} F: 1 <= iw,ow <= 16
 663 Binary coded decimal format.  Each byte from left to right, except the
 664 rightmost, represents two digits.  The upper nibble of each byte is more
 665 significant.  The upper nibble of the final byte is the least
 666 significant digit.  The lower nibble of the final byte is the sign; a
 667 value of D represents a negative sign and all other values are
 668 considered positive.  The decimal point is implied.
 669
 670 The default output format follows the rules for IB format.
 671
 672 @item PKw.d @result{} F: 1 <= iw,ow <= 16
 673 Positive binary coded decimal format.  Same as P but the last byte is the
 674 same as the others.
 675
 676 The default output format follows the rules for IB format.
 677
 678 @item RBw @result{} F: 2 <= iw,ow <= 8
 679
 680 Binary C architecture-dependent ``double'' format.  For a standard
 681 IEEE754 implementation @var{w} should be 8.
 682
 683 The default output format follows the rules for IB format.
 684
 685 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
 686 PIB format encoded as textual hex digit pairs.  @var{w} must be even.
 687
 688 The input width is mapped to a default output width as follows:
 689 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
 690 12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
 691 decimal places.
 692
 693 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
 694
 695 RB format encoded as textual hex digit pairs.  @var{w} must be even.
 696
 697 The default output format is F8.2.
 698
 699 @item CCAw.d: 1 <= ow <= 40
 700 @itemx CCBw.d: 1 <= ow <= 40
 701 @itemx CCCw.d: 1 <= ow <= 40
 702 @itemx CCDw.d: 1 <= ow <= 40
 703 @itemx CCEw.d: 1 <= ow <= 40
 704
 705 User-defined custom currency formats.  May not be used as an input
 706 format.  @xref{SET}, for more details.
 707 @end table
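
The default output format rules above can be worked through for a few
input formats.  The results below follow only from the rules as stated
in this table, applied by hand, so treat them as a sketch rather than
as a record of actual PSPP output.

@example
F3.2       @result{} F4.2       @r{d > 1, so w is raised to 2 + 2 = 4}
N5.2       @result{} F5.2       @r{d > 1, but w is already at least 4}
E8.2       @result{} E10.3      @r{w = max(8, 2 + 7, 10); d = max(2, 3)}
DOLLAR1.0  @result{} DOLLAR2.0  @r{w is raised to the minimum of 2}
IB4.2      @result{} F11.2      @r{w = 9 + 2 = 11; d = 2}
IB4        @result{} F8.2       @r{d is 0}
@end example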
 708
 709 The date and time numeric input and output formats accept a number of
 710 possible formats.  Before describing the formats themselves, some
 711 definitions of the elements that make up their formats will be helpful:
 712
 713 @table @dfn
 714 @item leader
 715 All formats accept an optional white space leader.
 716
 717 @item day
 718 An integer between 1 and 31 representing the day of month.
 719
 720 @item day-count
 721 An integer representing a number of days.
 722
 723 @item date-delimiter
 724 One or more characters of white space or the following characters:
 725 @code{- / . ,}
 726
 727 @item month
 728 A month name in one of the following forms:
 729 @itemize @bullet
 730 @item
 731 An integer between 1 and 12.
 732 @item
 733 Roman numerals representing an integer between 1 and 12.
 734 @item
 735 At least the first three characters of an English month name (January,
 736 February, @dots{}).
 737 @end itemize
 738
 739 @item year
 740 An integer year number between 1582 and 19999, or between 1 and 199.
 741 Years between 1 and 199 will have 1900 added.
 742
 743 @item julian
 744 A single number with a year number in the first 2, 3, or 4 digits (as
 745 above) and the day number within the year in the last 3 digits.
 746
 747 @item quarter
 748 An integer between 1 and 4 representing a quarter.
 749
 750 @item q-delimiter
 751 The letter @samp{Q} or @samp{q}.
 752
 753 @item week
 754 An integer between 1 and 53 representing a week within a year.
 755
 756 @item wk-delimiter
 757 The letters @samp{wk} in any case.
 758
 759 @item time-delimiter
 760 At least one character of white space or @samp{:} or @samp{.}.
 761
 762 @item hour
 763 An integer greater than 0 representing an hour.
 764
 765 @item minute
 766 An integer between 0 and 59 representing a minute within an hour.
 767
 768 @item opt-second
 769 Optionally, a time-delimiter followed by a real number representing a
 770 number of seconds.
 771
 772 @item hour24
 773 An integer between 0 and 23 representing an hour within a day.
 774
 775 @item weekday
 776 At least the first two characters of an English weekday name.
 777
 778 @item spaces
 779 Any amount or no amount of white space.
 780
 781 @item sign
 782 An optional positive or negative sign.
 783
 784 @item trailer
 785 All formats accept an optional white space trailer.
 786 @end table
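
The element definitions above can be sketched in Python.  This is only an
illustration of the stated rules, not PSPP's own parser; the function
names are invented for the example, and details the text leaves open
(such as whether a day number is checked against leap years) are
simplified and marked in comments.

```python
MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]
ROMAN = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x",
         "xi", "xii"]
DIGITS = "0123456789"


def all_digits(text):
    """True if text is non-empty and consists only of ASCII digits."""
    return len(text) > 0 and all(c in DIGITS for c in text)


def parse_month(text):
    """Return 1..12 for an integer, a Roman numeral, or a name prefix."""
    t = text.strip().lower()
    if all_digits(t):
        n = int(t)
        if 1 <= n <= 12:
            return n
        raise ValueError("month out of range: %r" % text)
    if t in ROMAN:
        return ROMAN.index(t) + 1
    if len(t) >= 3:
        # At least the first three characters of an English month name.
        for i, name in enumerate(MONTH_NAMES):
            if name.startswith(t):
                return i + 1
    raise ValueError("unrecognized month: %r" % text)


def normalize_year(year):
    """Apply the year rule: 1..199 has 1900 added, 1582..19999 is kept."""
    if 1 <= year <= 199:
        return year + 1900
    if 1582 <= year <= 19999:
        return year
    raise ValueError("year out of range: %r" % year)


def parse_julian(text):
    """Split a julian field into (year, day-of-year).

    The last 3 digits are the day number; the leading 2, 3, or 4 digits
    are the year.  Assumption: the day is only checked against 1..366,
    not against the actual length of the year.
    """
    digits = text.strip()
    if not all_digits(digits) or not 5 <= len(digits) <= 7:
        raise ValueError("bad julian value: %r" % text)
    year = normalize_year(int(digits[:-3]))
    day_of_year = int(digits[-3:])
    if not 1 <= day_of_year <= 366:
        raise ValueError("day of year out of range: %r" % text)
    return year, day_of_year
```

For instance, under these rules @code{parse_julian("99032")} yields the
year 1999 and day 32, because the two-digit year 99 has 1900 added.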
 787
 788 The date input formats are strung together from the above pieces.  On
 789 output, the date formats are always printed in a single canonical
 790 manner, based on field width.  The date input and output formats are
 791 described below:
 792
 793 @table @asis
 794 @item DATEw: 9 <= iw,ow <= 40
 795 Date format. Input format: leader + day + date-delimiter +
 796 month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
 797 @var{w} < 11, DD-MMM-YYYY otherwise.
 798
 799 @item EDATEw: 8 <= iw,ow <= 40
 800 European date format.  Input format same as DATE.  Output format:
 801 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
 802
 803 @item SDATEw: 8 <= iw,ow <= 40
 804 Standard date format. Input format: leader + year + date-delimiter +
 805 month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
 806 @var{w} < 10, YYYY/MM/DD otherwise.
 807
 808 @item ADATEw: 8 <= iw,ow <= 40
 809 American date format.  Input format: leader + month + date-delimiter +
 810 day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
 811 @var{w} < 10, MM/DD/YYYY otherwise.
 812
 813 @item JDATEw: 5 <= iw,ow <= 40
 814 Julian date format.  Input format: leader + julian + trailer.  Output
 815 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
 816
 817 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
 818 Quarter/year format.  Input format: leader + quarter + q-delimiter +
 819 year + trailer.  Output format: @samp{Q Q YY} for @var{w} < 8,
 820 @samp{Q Q YYYY} otherwise, where the first @samp{Q} is one of the
 821 digits 1, 2, 3, 4.
 822
 823 @item MOYRw: 6 <= iw,ow <= 40
 824 Month/year format.  Input format: leader + month + date-delimiter + year
 825 + trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
 826 YYYY} otherwise.
 827
 828 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
 829 Week/year format.  Input format: leader + week + wk-delimiter + year +
 830 trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
 831 YYYY} otherwise.
 832
 833 @item DATETIMEw.d: 17 <= iw,ow <= 40
 834 Date and time format.  Input format: leader + day + date-delimiter +
 835 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
 836 + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
 837 @var{w} > 19 then seconds @samp{:SS} are added.  If @var{w} > 22 and
 838 @var{d} > 0 then fractional seconds @samp{.SS} are added.
 839
 840 @item TIMEw.d: 5 <= iw,ow <= 40
 841 Time format.  Input format: leader + sign + spaces + hour +
 842 time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
 843 Seconds and fractional seconds are available with @var{w} of at least 8
 844 and 10, respectively.
 845
 846 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
 847 Time format with day count.  Input format: leader + sign + spaces +
 848 day-count + time-delimiter + hour + time-delimiter + minute +
 849 opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
 850 seconds are available with @var{w} of at least 11 and 13, respectively.
 851
 852 @item WKDAYw: 2 <= iw,ow <= 40
 853 A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
 854 leader + weekday + trailer.  Output format: as many characters, in all
 855 capital letters, of the English name of the weekday as will fit in the
 856 field width.
 857
 858 @item MONTHw: 3 <= iw,ow <= 40
 859 A month as a number between 1 and 12, where 1 is January.  Input format:
 860 leader + month + trailer.  Output format: as many characters, in all
 861 capital letters, of the English name of the month as will fit in the
 862 field width.
 863 @end table
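
As a rough illustration of how the field width selects between the
two-digit and four-digit year forms on output, the following Python
sketch covers DATE, EDATE, SDATE, and ADATE only.  The function name
@code{format_date} is hypothetical.  Zero-padded day and month numbers
and uppercase month abbreviations are assumptions, and padding of the
result to the full field width is left out.

```python
import datetime

MONTH_ABBR = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Minimum output widths taken from the table above; the maximum is 40.
MIN_WIDTH = dict(DATE=9, EDATE=8, SDATE=8, ADATE=8)


def format_date(fmt, w, d):
    """Render date d in format fmt, choosing the year form from width w."""
    if fmt not in MIN_WIDTH:
        raise ValueError("format not covered by this sketch: %r" % fmt)
    if not MIN_WIDTH[fmt] <= w <= 40:
        raise ValueError("width %d is out of range for %s" % (w, fmt))
    yy = "%02d" % (d.year % 100)
    yyyy = "%04d" % d.year
    if fmt == "DATE":
        # DD-MMM-YY for w < 11, DD-MMM-YYYY otherwise.
        return "%02d-%s-%s" % (d.day, MONTH_ABBR[d.month - 1],
                               yy if w < 11 else yyyy)
    year = yy if w < 10 else yyyy
    if fmt == "EDATE":
        return "%02d.%02d.%s" % (d.day, d.month, year)
    if fmt == "SDATE":
        return "%s/%02d/%02d" % (year, d.month, d.day)
    return "%02d/%02d/%s" % (d.month, d.day, year)  # ADATE
```

With 7 March 2004 as the value, a width of 9 under DATE gives
@samp{07-MAR-04}, while a width of 11 gives @samp{07-MAR-2004}.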
 864
 865 There are only two formats that may be used with string variables:
 866
 867 @table @asis
 868 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
 869 The entire field is treated as a string value.
 870
 871 @item AHEXw @result{} A: 2 <= iw <= 254, 2 <= ow <= 510
 872 The field is composed of characters in a string encoded as textual hex
 873 digit pairs.
 874
 875 The default output @var{w} is half the input @var{w}.
 876 @end table
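
The relationship between an AHEX field and the string it encodes can be
sketched in Python as follows.  The helper names are invented for the
example, and producing uppercase hex digits on output is an assumption,
not something the text above specifies.

```python
def decode_ahex(field):
    """Decode a field of hex digit pairs into the bytes it encodes."""
    if len(field) % 2 != 0:
        raise ValueError("AHEX field needs an even number of hex digits")
    return bytes.fromhex(field)


def encode_ahex(value):
    """Encode bytes as hex digit pairs (uppercase is an assumption)."""
    return value.hex().upper()
```

Each character takes two hex digits, which is why the string width is
half the AHEX field width: the 10-digit field @samp{48656C6C6F} decodes
to the 5-character string @samp{Hello}.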
 877
 878 @node Scratch Variables,  , Input/Output Formats, Variables
 879 @subsection Scratch Variables
 880
 881 Most of the time, variables don't retain their values between cases.
 882 Instead, a variable that is read from a data file or the active file
 883 takes on the value read for each case, and a variable created with
 884 @cmd{COMPUTE} or another transformation is initialized for each case
 885 to the system-missing value (for a numeric variable) or to blanks
 886 (for a string variable).
 887
 888 However, sometimes it's useful to have a variable that keeps its value
 889 between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
 890 use a @dfn{scratch variable}.  Scratch variables are variables whose
 891 names begin with an octothorpe (@samp{#}).
 892
 893 Scratch variables have the same properties as variables left with
 894 @cmd{LEAVE}: they retain their values between cases, and for the first
 895 case they are initialized to 0 or blanks.  They have the additional
 896 property that they are deleted before the execution of any procedure.
 897 For this reason, scratch variables can't be used directly in an
 898 analysis.  Instead, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the
 899 scratch variable's value into an ordinary variable, then use that
 900 ordinary variable in the analysis.
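
The difference between a variable that is reinitialized for every case
and one that keeps its value can be modeled with a small Python
simulation.  This is not PSPP syntax.  It only mimics a running total
computed as ``total = total + x'' over a sequence of cases, with
@code{None} standing in for the system-missing value and the function
name chosen for the example.

```python
def accumulate(values, retain):
    """Simulate 'total = total + x' evaluated once per case.

    retain=True models a scratch (or LEAVE) variable: it starts at 0
    and keeps its value from one case to the next.  retain=False models
    an ordinary computed variable: it is reset to system-missing (None
    here) at the start of every case, so the sum is always missing.
    """
    total = 0 if retain else None
    results = []
    for x in values:
        if not retain:
            total = None  # reinitialized for each case
        # Arithmetic on a missing value yields a missing value.
        total = None if total is None else total + x
        results.append(total)
    return results
```

In this model the retained variable accumulates 1, 3, 6 over the cases
1, 2, 3, while the ordinary variable is missing in every case.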
 901
 902 @node Files
 903 @section Files Used by PSPP
 904
 905 PSPP makes use of many files each time it runs.  Some of these it
 906 reads, some it writes, some it creates.  Here is a table listing the
 907 most important of these files:
 908
 909 @table @strong
 910 @cindex file, command
 911 @cindex file, syntax file
 912 @cindex command file
 913 @cindex syntax file
 914 @item command file
 915 @itemx syntax file
 916 These names (synonyms) refer to the file that contains instructions
 917 that tell PSPP what to do.  The syntax file's name is specified on
 918 the PSPP command line.  Syntax files can also be read with
 919 @cmd{INCLUDE} (@pxref{INCLUDE}).
 920
 921 @cindex file, data
 922 @cindex data file
 923 @item data file
 924 Data files contain raw data in text or binary format.  Data can also
 925 be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
 926
 927 @cindex file, output
 928 @cindex output file
 929 @item listing file
 930 One or more output files are created by PSPP each time it is
 931 run.  The output files receive the tables and charts produced by
 932 statistical procedures.  The output files may be in any number of formats,
 933 depending on how PSPP is configured.
 934
 935 @cindex active file
 936 @cindex file, active
 937 @item active file
 938 The active file is the ``file'' on which all PSPP procedures are
 939 performed.  The active file consists of a dictionary and a set of cases.
 940 The active file is not necessarily a disk file: it is stored in memory
 941 if there is room.
 942
 943 @cindex system file
 944 @cindex file, system
 945 @item system file
 946 System files are binary files that store a dictionary and a set of
 947 cases.  @cmd{GET} and @cmd{SAVE} read and write system files.
 948
 949 @cindex portable file
 950 @cindex file, portable
 951 @item portable file
 952 Portable files are files in a text-based format that store a dictionary
 953 and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
 954 portable files.
 955
 956 @cindex scratch file
 957 @cindex file, scratch
 958 @item scratch file
 959 Scratch files consist of a dictionary and cases and may be stored in
 960 memory or on disk.  Most procedures that act on a system file or
 961 portable file can use a scratch file instead.  The contents of scratch
 962 files persist within a single PSPP session only.  @cmd{GET} and
 963 @cmd{SAVE} can be used to read and write scratch files.  Scratch files
 964 are a PSPP extension.
 965 @end table
 966
 967 @node File Handles
 968 @section File Handles
 969 @cindex file handles
 970
 971 A @dfn{file handle} is a reference to a data file, system file, portable
 972 file, or scratch file.  Most often, a file handle is specified as the
 973 name of a file as a string, that is, enclosed within @samp{'} or
 974 @samp{"}.
 975
 976 PSPP also supports declaring named file handles with the @cmd{FILE
 977 HANDLE} command.  This command associates an identifier of your choice
 978 (the file handle's name) with a file.  Later, the file handle name can
 979 be substituted for the name of the file.  When PSPP syntax accesses a
 980 file multiple times, declaring a named file handle simplifies updating
 981 the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
 982 also required to read data files in binary formats.  @xref{FILE HANDLE},
 983 for more information.
 984
 985 PSPP assumes that a file handle name that begins with @samp{#} refers to
 986 a scratch file, unless the name has already been declared on @cmd{FILE
 987 HANDLE} to refer to another kind of file.  A scratch file is similar to
 988 a system file, except that it persists only for the duration of a given
 989 PSPP session.  Most commands that read or write a system or portable
 990 file, such as @cmd{GET} and @cmd{SAVE}, also accept scratch file
 991 handles.  Scratch file handles may also be declared explicitly with
 992 @cmd{FILE HANDLE}.  Scratch files are a PSPP extension.
 993
 994 In some circumstances, PSPP must distinguish whether a file handle
 995 refers to a system file or a portable file.  When this is necessary to
 996 read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
 997 PSPP uses the file's contents to decide.  In the context of writing a
 998 file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, PSPP
 999 decides based on the file's name: if it ends in @samp{.por} (with any
1000 capitalization), then PSPP writes a portable file; otherwise, PSPP
1001 writes a system file.
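
The naming rules described in this section can be summarized in a short
Python sketch.  Both function names and the @code{declared_kinds}
mapping are hypothetical; the mapping stands in for whatever has been
declared earlier with @cmd{FILE HANDLE}.

```python
def output_file_kind(file_name):
    """Pick the kind of file written, based only on the file's name."""
    # A name ending in .por, with any capitalization, means portable.
    if file_name.lower().endswith(".por"):
        return "portable"
    return "system"


def refers_to_scratch(handle_name, declared_kinds):
    """Decide whether a handle name refers to a scratch file.

    declared_kinds maps already-declared handle names to a kind such as
    "scratch", "system", or "data" (illustrative labels).  An explicit
    declaration wins; otherwise a leading '#' implies a scratch file.
    """
    kind = declared_kinds.get(handle_name)
    if kind is not None:
        return kind == "scratch"
    return handle_name.startswith("#")
```

Note that the check for reading is different: when reading, the file's
contents decide, so no name-based rule like the one above applies.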
1002
1003 INLINE is reserved as a file handle name.  It refers to the ``data
1004 file'' embedded into the syntax file between @cmd{BEGIN DATA} and
1005 @cmd{END DATA}.  @xref{BEGIN DATA}, for more information.
1006
1007 The file to which a file handle refers may be reassigned on a later
1008 @cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
1009 HANDLE}.  The @cmd{CLOSE FILE HANDLE} command is also useful to free the
1010 storage associated with a scratch file.  @xref{CLOSE FILE HANDLE}, for
1011 more information.
1012
1013 @node BNF
1014 @section Backus-Naur Form
1015 @cindex BNF
1016 @cindex Backus-Naur Form
1017 @cindex command syntax, description of
1018 @cindex description of command syntax
1019
1020 The syntax of some parts of the PSPP language is presented in this
1021 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF.  The
1022 following list describes BNF:
1023
1024 @itemize @bullet
1025 @cindex keywords
1026 @cindex terminals
1027 @item
1028 Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
1029 often called @dfn{terminals}.  There are some special terminals, which
1030 are written in lowercase for clarity:
1031
1032 @table @asis
1033 @cindex @code{number}
1034 @item @code{number}
1035 A real number.
1036
1037 @cindex @code{integer}
1038 @item @code{integer}
1039 An integer number.
1040
1041 @cindex @code{string}
1042 @item @code{string}
1043 A string.
1044
1045 @cindex @code{var-name}
1046 @item @code{var-name}
1047 A single variable name.
1048
1049 @cindex operators
1050 @cindex punctuators
1051 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
1052 Operators and punctuators.
1053
1054 @cindex @code{.}
1055 @item @code{.}
1056 The end of the command.  This is not necessarily an actual dot in the
1057 syntax file.  @xref{Commands}, for more details.
1058 @end table
1059
1060 @item
1061 @cindex productions
1062 @cindex nonterminals
1063 Other words in all lowercase refer to BNF definitions, called
1064 @dfn{productions}.  The names that productions define are known as
1065 @dfn{nonterminals}.  Some nonterminals are very common, so they are
1066 defined here in English for clarity:
1067
1068 @table @code
1069 @cindex @code{var-list}
1070 @item var-list
1071 A list of one or more variable names or the keyword @code{ALL}.
1072
1073 @cindex @code{expression}
1074 @item expression
1075 An expression.  @xref{Expressions}, for details.
1076 @end table
1077
1078 @item
1079 @cindex ``is defined as''
1080 @cindex productions
1081 @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
1082 the name of the nonterminal being defined.  The right side of @samp{::=}
1083 gives the definition of that nonterminal.  If the right side is empty,
1084 then one possible expansion of that nonterminal is nothing.  A BNF
1085 definition is called a @dfn{production}.
1086
1087 @item
1088 @cindex terminals and nonterminals, differences
1089 So, the key difference between a terminal and a nonterminal is that a
1090 terminal cannot be broken into smaller parts---in fact, every terminal
1091 is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
1092 composed of a (possibly empty) sequence of terminals and nonterminals.
1093 Thus, terminals indicate the deepest level of syntax description.  (In
1094 parsing theory, terminals are the leaves of the parse tree; nonterminals
1095 form the branches.)
1096
1097 @item
1098 @cindex start symbol
1099 @cindex symbol, start
1100 The first nonterminal defined in a set of productions is called the
1101 @dfn{start symbol}.  The start symbol defines the entire syntax for
1102 that command.
1103 @end itemize
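
To make the conventions above concrete, here is a minimal Python sketch
that reads a few productions and classifies their symbols.  The grammar
in @code{GRAMMAR} describes a made-up GREET command that does not exist
in PSPP, and the function names are likewise only illustrative.

```python
# Terminals that are written in lowercase by convention.
SPECIAL_TERMINALS = frozenset(["number", "integer", "string", "var-name"])

# A hypothetical grammar; GREET is not a real PSPP command.
GRAMMAR = """
greet-cmd ::= GREET target-list .
target-list ::= var-name target-list
target-list ::=
"""


def classify(symbol):
    """Return 'terminal' or 'nonterminal' following the conventions."""
    if symbol in SPECIAL_TERMINALS:
        return "terminal"
    if any(c.isalpha() for c in symbol):
        # All-uppercase words are keywords; other words are nonterminals.
        return "terminal" if symbol.isupper() else "nonterminal"
    return "terminal"  # operators and punctuators such as = / + - .


def parse_productions(text):
    """Return (start_symbol, rules) from lines 'name ::= sym sym ...'."""
    rules = []
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # An empty right side means the nonterminal may expand to nothing.
        rules.append((left.strip(), right.split()))
    # The first nonterminal defined is the start symbol.
    return rules[0][0], rules
```

Here @code{greet-cmd} is the start symbol because it is defined first,
and the third production shows a nonterminal whose expansion is empty.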
1104 @setfilename ignored