   1 @node Language, Expressions, Invocation, Top
   2 @chapter The PSPP language
   3 @cindex language, PSPP
   4 @cindex PSPP, language
   5
   6 @quotation
   7 @strong{Please note:} PSPP is not even close to completion.
   8 Only a few actual statistical procedures are implemented.  PSPP
   9 is a work in progress.
  10 @end quotation
  11
  12 This chapter discusses elements common to many PSPP commands.
  13 Later chapters will describe individual commands in detail.
  14
  15 @menu
  16 * Tokens::                      Characters combine to form tokens.
  17 * Commands::                    Tokens combine to form commands.
  18 * Types of Commands::           Commands come in several flavors.
  19 * Order of Commands::           Commands combine to form syntax files.
  20 * Missing Observations::        Handling missing observations.
  21 * Variables::                   The unit of data storage.
  22 * Files::                       Files used by PSPP.
  23 * BNF::                         How command syntax is described.
  24 @end menu
  25
  26 @node Tokens, Commands, Language, Language
  27 @section Tokens
  28 @cindex language, lexical analysis
  29 @cindex language, tokens
  30 @cindex tokens
  31 @cindex lexical analysis
  32
  33 PSPP divides most syntax file lines into a series of short chunks
  34 called @dfn{tokens}.
  35 Tokens are then grouped to form commands, each of which tells
  36 PSPP to take some action---read in data, write out data, perform
  37 a statistical procedure, etc.  Each type of token is
  38 described below.
  39
  40 @table @strong
  41 @cindex identifiers
  42 @item Identifiers
  43 Identifiers are names that typically specify variables, commands, or
  44 subcommands.  The first character in an identifier must be a letter,
  45 @samp{#}, or @samp{@@}.  The remaining characters in the identifier
  46 must be letters, digits, or one of the following special characters:
  47
  48 @example
  49 @center @.  _  $  #  @@
  50 @end example
  51
  52 @cindex case-sensitivity
  53 Identifiers may be of any length, but only the first 64 bytes are
  54 significant.  Identifiers are not case-sensitive: @code{foobar},
  55 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
  56 different representations of the same identifier.
  57
  58 @cindex identifiers, reserved
  59 @cindex reserved identifiers
  60 Some identifiers are reserved.  Reserved identifiers may not be used
  61 in any context besides those explicitly described in this manual.  The
  62 reserved identifiers are:
  63
  64 @example
  65 @center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
  66 @end example
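
The identifier rules above can be sketched in Python.  This is only an
illustration, not PSPP's lexer: it assumes that ``letter'' means an ASCII
letter, and the function names are hypothetical.

```python
import string

# Reserved identifiers, as listed in this section.
RESERVED = frozenset("ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH".split())

# Assumption: only ASCII letters count as letters in this sketch.
FIRST_CHARS = string.ascii_letters + "#@"
REST_CHARS = string.ascii_letters + string.digits + "._$#@"
SIGNIFICANT_BYTES = 64


def is_valid_identifier(name):
    """Check the first-character and remaining-character rules."""
    if not name or name[0] not in FIRST_CHARS:
        return False
    return all(ch in REST_CHARS for ch in name[1:])


def is_reserved(name):
    """Reserved identifiers are matched without regard to case."""
    return name.upper() in RESERVED


def same_identifier(a, b):
    """Compare two identifiers: case-insensitive, first 64 bytes only."""
    key_a = a.upper().encode("ascii")[:SIGNIFICANT_BYTES]
    key_b = b.upper().encode("ascii")[:SIGNIFICANT_BYTES]
    return key_a == key_b
```

With these definitions, @code{same_identifier("FooBar", "FOOBAR")} is true,
and two names that differ only after their 64th byte also compare equal.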
  67
  68 @item Keywords
  69 Keywords are a subclass of identifiers that form a fixed part of
  70 command syntax.  For example, command and subcommand names are
  71 keywords.  Keywords may be abbreviated to their first 3 characters if
  72 this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
  73 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
  74 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
  75
  76 Reserved identifiers are always used as keywords.  Other identifiers
  77 may be used both as keywords and as user-defined identifiers, such as
  78 variable names.
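
As a rough sketch of the abbreviation rule, the Python function below
accepts a token when it is the full keyword, or a prefix of at least three
characters that matches no other keyword valid in the same context.  The
function name and the @code{candidates} parameter are hypothetical and are
not part of PSPP.

```python
def matches_keyword(token, keyword, candidates=()):
    """Return True if token is an acceptable spelling of keyword.

    candidates lists the other keywords that are valid in the same
    context; it is used only to detect ambiguous abbreviations.
    """
    token = token.upper()
    keyword = keyword.upper()
    if token == keyword:
        return True
    # Abbreviations need at least 3 characters and must be a prefix.
    if len(token) < 3 or not keyword.startswith(token):
        return False
    rivals = [
        other for other in candidates
        if other.upper() != keyword and other.upper().startswith(token)
    ]
    return not rivals
```

For example, @code{matches_keyword("freq", "FREQUENCIES")} is true, while
@code{matches_keyword("FR", "FREQUENCIES")} is false because the token is
shorter than three characters.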
  79
  80 @item Numbers
  81 @cindex numbers
  82 @cindex integers
  83 @cindex reals
  84 Numbers are expressed in decimal.  A decimal point is optional.
  85 Numbers may be expressed in scientific notation by adding @samp{e} and
  86 a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
  87 are some more examples of valid numbers:
  88
  89 @example
  90 -5  3.14159265359  1e100  -.707  8945.
  91 @end example
  92
  93 Negative numbers are expressed with a @samp{-} prefix.  However, in
  94 situations where a literal @samp{-} token is expected, what appears to
  95 be a negative number is treated as @samp{-} followed by a positive
  96 number.
  97
  98 No white space is allowed within a number token, except for horizontal
  99 white space between @samp{-} and the rest of the number.
 100
 101 The last example above, @samp{8945.}, will be interpreted as two
 102 tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
 103 @xref{Commands, , Forming commands of tokens}.
 104
 105 @item Strings
 106 @cindex strings
 107 @cindex @samp{'}
 108 @cindex @samp{"}
 109 @cindex case-sensitivity
 110 Strings are literal sequences of characters enclosed in pairs of
 111 single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
 112 character used for quoting in the string, double it, e.g.@:
 113 @samp{'it''s an apostrophe'}.  White space and case of letters are
 114 significant inside strings.
 115
 116 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
 117 'c'} is equivalent to @samp{'abc'}.  Concatenation is useful for
 118 splitting a single string across multiple source lines. The maximum
 119 length of a string, after concatenation, is 255 characters.
 120
 121 Strings may also be expressed as hexadecimal, octal, or binary
 122 character values by prefixing the initial quote character by @samp{X},
 123 @samp{O}, or @samp{B} or their lowercase equivalents.  Each pair,
 124 triplet, or octet of characters, according to the radix, is
 125 transformed into a single character with the given value.  If there is
 126 an incomplete group of characters, the missing final digits are
 127 assumed to be @samp{0}.  These forms of strings are nonportable
 128 because numeric values are associated with different characters by
 129 different operating systems.  Therefore, their use should be confined
 130 to syntax files that will not be widely distributed.
 131
 132 @cindex characters, reserved
 133 @cindex 0
 134 @cindex white space
 135 The character with value 00 is reserved for
 136 internal use by PSPP.  Its use in strings causes an error and
 137 replacement by a space character.
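
The grouping rule for hexadecimal, octal, and binary strings can be
illustrated with a short Python sketch.  It is an assumption-laden
illustration rather than PSPP's implementation: the function name is
hypothetical, and it returns numeric character values instead of
characters, because the mapping from values to characters depends on the
operating system.

```python
# Radix prefix -> (numeric base, digits per character).
GROUPS = {"X": (16, 2), "O": (8, 3), "B": (2, 8)}


def decode_radix_string(prefix, digits):
    """Turn the digits of an X'..', O'..', or B'..' string into values.

    An incomplete final group is padded on the right with '0' digits.
    int() raises ValueError if a digit is not valid for the radix.
    """
    base, size = GROUPS[prefix.upper()]
    values = []
    for start in range(0, len(digits), size):
        group = digits[start:start + size].ljust(size, "0")
        values.append(int(group, base))
    return values
```

Here @code{decode_radix_string("X", "414")} treats the trailing @samp{4} as
@samp{40}, giving the values 65 and 64.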
 138
 139 @item Punctuators and Operators
 140 @cindex punctuators
 141 @cindex operators
 142 These tokens are the punctuators and operators:
 143
 144 @example
 145 @center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
 146 @end example
 147
 148 Most of these appear within the syntax of commands, but the period
 149 (@samp{.}) punctuator is used only at the end of a command.  It is a
 150 punctuator only as the last character on a line (except white space).
 151 When it is the last non-space character on a line, a period is not
 152 treated as part of another token, even if it would otherwise be part
 153 of e.g.@: an identifier or a floating-point number.
 154
 155 Actually, the character that ends a command can be changed with
 156 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
 157 doing so.  Throughout the remainder of this manual we will assume that
 158 the default setting is in effect.
 159 @end table
 160
 161 @node Commands, Types of Commands, Tokens, Language
 162 @section Forming commands of tokens
 163
 164 @cindex PSPP, command structure
 165 @cindex language, command structure
 166 @cindex commands, structure
 167
 168 Most PSPP commands share a common structure.  A command begins with a
 169 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
 170 CASES}.  The command name may be abbreviated to its first word, and
 171 each word in the command name may be abbreviated to its first three
 172 or more characters, where these abbreviations are unambiguous.
 173
 174 The command name may be followed by one or more @dfn{subcommands}.
 175 Each subcommand begins with a subcommand name, which may be
 176 abbreviated to its first three letters.  Some subcommands accept a
 177 series of one or more specifications, which follow the subcommand name,
 178 optionally separated from it by an equals sign (@samp{=}), and
 179 optionally separated from each other by commas.  Each subcommand must
 180 be separated from the next (if any) by a forward slash (@samp{/}).
 181
 182 There are multiple ways to mark the end of a command.  The most common
 183 way is to end the last line of the command with a period (@samp{.}) as
 184 described in the previous section (@pxref{Tokens}).  A blank line, or
 185 one that consists only of white space or comments, also ends a command
 186 by default, although you can use the NULLINE subcommand of @cmd{SET}
 187 to disable this feature (@pxref{SET}).
 188
 189 In batch mode only, that is, when reading commands from a file instead
 190 of an interactive user, any line that contains a non-space character
 191 in the leftmost column begins a new command.  Thus, each command
 192 consists of a flush-left line followed by any number of lines indented
 193 from the left margin.  In this mode, a plus sign, minus sign, or
 194 period (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character
 195 in a line is ignored and causes that line to begin a new command,
 196 which allows for visual indentation of a command without that command
 197 being considered part of the previous command.
 198
 199 Sometimes, one encounters syntax files that are intended to be
 200 interpreted in interactive mode rather than batch mode.  When this
 201 occurs, use the @samp{-i} command line option to force interpretation
 202 in interactive mode (@pxref{Language control options}).
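
The batch-mode rules in this section can be approximated by the following
Python sketch.  It is a simplification under stated assumptions: the
function name is hypothetical, the default command terminator and
blank-line handling are assumed, and comment-only lines are not modeled.

```python
def split_batch_commands(lines):
    """Group batch-mode source lines into commands (simplified)."""
    commands = []
    current = []

    def flush():
        if current:
            commands.append(" ".join(current))
            current.clear()

    for line in lines:
        if not line.strip():
            flush()                      # a blank line ends the command
            continue
        starts_new = not line[0].isspace()
        if line[0] in "+-.":
            line = " " + line[1:]        # the marker itself is ignored
        if starts_new:
            flush()                      # flush-left line begins a command
        text = line.strip()
        if text:
            current.append(text)
        if text.endswith("."):
            flush()                      # a trailing period ends the command
    flush()
    return commands
```

For instance, the lines @samp{FREQUENCIES}, @samp{    /VARIABLES=x}, and
@samp{+   DESCRIPTIVES x} yield two commands, because the leading plus sign
starts a new command even though the command text is visually indented.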
 203
 204 @node Types of Commands, Order of Commands, Commands, Language
 205 @section Types of Commands
 206
 207 Commands in PSPP are divided roughly into six categories:
 208
 209 @table @strong
 210 @item Utility commands
 211 @cindex utility commands
 212 Set or display various global options that affect PSPP operations.
 213 May appear anywhere in a syntax file.  @xref{Utilities, , Utility
 214 commands}.
 215
 216 @item File definition commands
 217 @cindex file definition commands
 218 Give instructions for reading data from text files or from special
 219 binary ``system files''.  Most of these commands discard any previous
 220 data or variables and replace them with the new data and
 221 variables.  At least one must appear before the first command in any of
 222 the categories below.  @xref{Data Input and Output}.
 223
 224 @item Input program commands
 225 @cindex input program commands
 226 Though rarely used, these provide powerful tools for reading data files
 227 in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
 228
 229 @item Transformations
 230 @cindex transformations
 231 Perform operations on data and write data to output files.  Transformations
 232 are not carried out until a procedure is executed.
 233
 234 @item Restricted transformations
 235 @cindex restricted transformations
 236 Transformations that cannot appear in certain contexts.  @xref{Order
 237 of Commands}, for details.
 238
 239 @item Procedures
 240 @cindex procedures
 241 Analyze data, writing results of analyses to the listing file.  Cause
 242 transformations specified earlier in the file to be performed.  In a
 243 more general sense, a @dfn{procedure} is any command that causes the
 244 active file (the data) to be read.
 245 @end table
 246
 247 @node Order of Commands, Missing Observations, Types of Commands, Language
 248 @section Order of Commands
 249 @cindex commands, ordering
 250 @cindex order of commands
 251
 252 PSPP does not place many restrictions on ordering of commands.  The
 253 main restriction is that variables must be defined before they are otherwise
 254 referenced.  This section describes the details of command ordering,
 255 but most users will have no need to refer to them.
 256
 257 PSPP possesses five internal states, called initial, INPUT PROGRAM,
 258 FILE TYPE, transformation, and procedure states.  (Please note the
 259 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
 260 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
 261
 262 PSPP starts up in the initial state.  Each successful completion
 263 of a command may cause a state transition.  Each type of command has its
 264 own rules for state transitions:
 265
 266 @table @strong
 267 @item Utility commands
 268 @itemize @bullet
 269 @item
 270 Valid in any state.
 271 @item
 272 Do not cause state transitions.  Exception: when @cmd{N OF CASES}
 273 is executed in the procedure state, it causes a transition to the
 274 transformation state.
 275 @end itemize
 276
 277 @item @cmd{DATA LIST}
 278 @itemize @bullet
 279 @item
 280 Valid in any state.
 281 @item
 282 When executed in the initial or procedure state, causes a transition to
 283 the transformation state.
 284 @item
 285 Clears the active file if executed in the procedure or transformation
 286 state.
 287 @end itemize
 288
 289 @item @cmd{INPUT PROGRAM}
 290 @itemize @bullet
 291 @item
 292 Invalid in INPUT PROGRAM and FILE TYPE states.
 293 @item
 294 Causes a transition to the INPUT PROGRAM state.
 295 @item
 296 Clears the active file.
 297 @end itemize
 298
 299 @item @cmd{FILE TYPE}
 300 @itemize @bullet
 301 @item
 302 Invalid in INPUT PROGRAM and FILE TYPE states.
 303 @item
 304 Causes a transition to the FILE TYPE state.
 305 @item
 306 Clears the active file.
 307 @end itemize
 308
 309 @item Other file definition commands
 310 @itemize @bullet
 311 @item
 312 Invalid in INPUT PROGRAM and FILE TYPE states.
 313 @item
 314 Cause a transition to the transformation state.
 315 @item
 316 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
 317 and @cmd{UPDATE}.
 318 @end itemize
 319
 320 @item Transformations
 321 @itemize @bullet
 322 @item
 323 Invalid in initial and FILE TYPE states.
 324 @item
 325 Cause a transition to the transformation state.
 326 @end itemize
 327
 328 @item Restricted transformations
 329 @itemize @bullet
 330 @item
 331 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 332 @item
 333 Cause a transition to the transformation state.
 334 @end itemize
 335
 336 @item Procedures
 337 @itemize @bullet
 338 @item
 339 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 340 @item
 341 Cause a transition to the procedure state.
 342 @end itemize
 343 @end table
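
The transition rules listed above can be encoded as a small state machine.
The Python sketch below follows the table literally; the command-kind
labels are hypothetical names chosen for this example, and commands that
leave the INPUT PROGRAM and FILE TYPE states are not listed in the table
and so are not modeled.

```python
INITIAL = "initial"
INPUT_PROGRAM = "INPUT PROGRAM"
FILE_TYPE = "FILE TYPE"
TRANSFORMATION = "transformation"
PROCEDURE = "procedure"

# kind -> (states in which the command is invalid, state it leads to).
# A target of None means the command causes no state transition.
RULES = {
    "utility": ((), None),
    "input_program": ((INPUT_PROGRAM, FILE_TYPE), INPUT_PROGRAM),
    "file_type": ((INPUT_PROGRAM, FILE_TYPE), FILE_TYPE),
    "other_file_definition": ((INPUT_PROGRAM, FILE_TYPE), TRANSFORMATION),
    "transformation": ((INITIAL, FILE_TYPE), TRANSFORMATION),
    "restricted_transformation":
        ((INITIAL, INPUT_PROGRAM, FILE_TYPE), TRANSFORMATION),
    "procedure": ((INITIAL, INPUT_PROGRAM, FILE_TYPE), PROCEDURE),
}


def next_state(state, kind):
    """Return the state after a successful command of the given kind."""
    if kind == "n_of_cases":
        # The one utility command that can cause a transition.
        return TRANSFORMATION if state == PROCEDURE else state
    if kind == "data_list":
        # Valid in any state.
        return TRANSFORMATION if state in (INITIAL, PROCEDURE) else state
    invalid_in, target = RULES[kind]
    if state in invalid_in:
        raise ValueError(kind + " is invalid in the " + state + " state")
    return state if target is None else target


def run(kinds):
    """Apply a sequence of command kinds, starting in the initial state."""
    state = INITIAL
    for kind in kinds:
        state = next_state(state, kind)
    return state
```

Running @code{run(["data_list", "transformation", "procedure"])} ends in the
procedure state, while a procedure as the very first command raises an
error because procedures are invalid in the initial state.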
 344
 345 @node Missing Observations, Variables, Order of Commands, Language
 346 @section Handling missing observations
 347 @cindex missing values
 348 @cindex values, missing
 349
 350 PSPP includes special support for unknown numeric data values.
 351 Missing observations are assigned a special value, called the
 352 @dfn{system-missing value}.  This ``value'' actually indicates the
 353 absence of a value; it means that the actual value is unknown.  Procedures
 354 automatically exclude from analyses those observations or cases that
 355 have missing values.  Details of missing value exclusion depend on the
 356 procedure and can often be controlled by the user; refer to
 357 descriptions of individual procedures for details.
 358
 359 The system-missing value exists only for numeric variables.  String
 360 variables always have a defined value, even if it is only a string of
 361 spaces.
 362
 363 Variables, whether numeric or string, can have designated
 364 @dfn{user-missing values}.  Every user-missing value is an actual value
 365 for that variable.  However, most of the time user-missing values are
 366 treated in the same way as the system-missing value.  String variables
 367 that are wider than a certain width, usually 8 characters (depending on
 368 computer architecture), cannot have user-missing values.
 369
 370 For more information on missing values, see the following sections:
 371 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
 372 documentation on individual procedures for information on how they
 373 handle missing values.
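
To make the distinction concrete, here is a minimal Python sketch of how a
procedure might exclude missing observations when computing a mean.  It is
an illustration only: @code{None} stands in for the system-missing value,
the function names are hypothetical, and user-missing values are modeled
as a plain collection of discrete values.

```python
SYSMIS = None  # stand-in for the system-missing value in this sketch


def is_missing(value, user_missing=()):
    """True for the system-missing value or any user-missing value."""
    return value is SYSMIS or value in user_missing


def mean_excluding_missing(values, user_missing=()):
    """Mean of the non-missing values, or SYSMIS if none remain."""
    valid = [v for v in values if not is_missing(v, user_missing)]
    if not valid:
        return SYSMIS
    return sum(valid) / len(valid)
```

With 99 declared user-missing, the observations 1, system-missing, 3, and
99 have a mean of 2: both kinds of missing value are left out, although 99
remains an actual data value.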
 374
 375 @node Variables, Files, Missing Observations, Language
 376 @section Variables
 377 @cindex variables
 378 @cindex dictionary
 379
 380 Variables are the basic unit of data storage in PSPP.  All the
 381 variables in a file taken together, apart from any associated data, are
 382 said to form a @dfn{dictionary}.
 383 Some details of variables are described in the sections below.
 384
 385 @menu
 386 * Attributes::                  Attributes of variables.
 387 * System Variables::            Variables automatically defined by PSPP.
 388 * Sets of Variables::           Lists of variable names.
 389 * Input/Output Formats::        Input and output formats.
 390 * Scratch Variables::           Variables deleted by procedures.
 391 @end menu
 392
 393 @node Attributes, System Variables, Variables, Variables
 394 @subsection Attributes of Variables
 395 @cindex variables, attributes of
 396 @cindex attributes of variables
 397 Each variable has a number of attributes, including:
 398
 399 @table @strong
 400 @item Name
 401 An identifier, up to 64 bytes long.  Each variable must have a different name.
 402 @xref{Tokens}.
 403
 404 Some system variable names begin with @samp{$}, but user-defined
 405 variables' names may not begin with @samp{$}.
 406
 407 @cindex @samp{.}
 408 @cindex period
 409 @cindex variable names, ending with period
 410 The final character in a variable name should not be @samp{.}, because
 411 such an identifier will be misinterpreted when it is the final token
 412 on a line: @code{FOO.} will be divided into two separate tokens,
 413 @samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.
 414
 415 @cindex @samp{_}
 416 The final character in a variable name should not be @samp{_}, because
 417 some such identifiers are used for special purposes by PSPP
 418 procedures.
 419
 420 As with all PSPP identifiers, variable names are not case-sensitive.
 421 PSPP capitalizes variable names on output the same way they were
 422 capitalized at their point of definition in the input.
 423
 424 @cindex variables, type
 425 @cindex type of variables
 426 @item Type
 427 Numeric or string.
 428
 429 @cindex variables, width
 430 @cindex width of variables
 431 @item Width
 432 (string variables only) String variables with a width of 8 characters or
 433 fewer are called @dfn{short string variables}.  Short string variables
 434 can be used in many procedures where @dfn{long string variables} (those
 435 with widths greater than 8) are not allowed.
 436
 437 Certain systems may consider strings longer than 8
 438 characters to be short strings.  Eight characters represents a minimum
 439 figure for the maximum length of a short string.
 440
 441 @item Position
 442 Variables in the dictionary are arranged in a specific order.
 443 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
 444
 445 @item Initialization
 446 Either reinitialized to 0 or spaces for each case, or left at its
 447 existing value.  @xref{LEAVE}.
 448
 449 @cindex missing values
 450 @cindex values, missing
 451 @item Missing values
 452 Optionally, up to three values, or a range of values, or a specific
 453 value plus a range, can be specified as @dfn{user-missing values}.
 454 There is also a @dfn{system-missing value} that is assigned to an
 455 observation when there is no other obvious value for that observation.
 456 Observations with missing values are automatically excluded from
 457 analyses.  User-missing values are actual data values, while the
 458 system-missing value is not a value at all.  @xref{Missing Observations}.
 459
 460 @cindex variable labels
 461 @cindex labels, variable
 462 @item Variable label
 463 A string that describes the variable.  @xref{VARIABLE LABELS}.
 464
 465 @cindex value labels
 466 @cindex labels, value
 467 @item Value label
 468 Optionally, these associate each possible value of the variable with a
 469 string.  @xref{VALUE LABELS}.
 470
 471 @cindex print format
 472 @item Print format
 473 Display width, format, and (for numeric variables) number of decimal
 474 places.  This attribute does not affect how data are stored, just how
 475 they are displayed.  Example: a width of 8, with 2 decimal places.
 476 @xref{PRINT FORMATS}.
 477
 478 @cindex write format
 479 @item Write format
 480 Similar to print format, but used by certain commands that are
 481 designed to write to binary files.  @xref{WRITE FORMATS}.
 482 @end table
 483
 484 @node System Variables, Sets of Variables, Attributes, Variables
 485 @subsection Variables Automatically Defined by PSPP
 486 @cindex system variables
 487 @cindex variables, system
 488
 489 There are seven system variables.  These are not like ordinary
 490 variables, as they are not stored in each case.  They can only be used
 491 in expressions.  These system variables, whose values and output formats
 492 cannot be modified, are described below.
 493
 494 @table @code
 495 @cindex @code{$CASENUM}
 496 @item $CASENUM
 497 Case number of the current case.  This changes as cases are
 498 shuffled around.
 499
 500 @cindex @code{$DATE}
 501 @item $DATE
 502 Date the PSPP process was started, in format A9, following the
 503 pattern @code{DD MMM YY}.
 504
 505 @cindex @code{$JDATE}
 506 @item $JDATE
 507 Number of days between 15 Oct 1582 and the time the PSPP process
 508 was started.
 509
 510 @cindex @code{$LENGTH}
 511 @item $LENGTH
 512 Page length, in lines, in format F11.
 513
 514 @cindex @code{$SYSMIS}
 515 @item $SYSMIS
 516 System missing value, in format F1.
 517
 518 @cindex @code{$TIME}
 519 @item $TIME
 520 Number of seconds between midnight 14 Oct 1582 and the time the active file
 521 was read, in format F20.
 522
 523 @cindex @code{$WIDTH}
 524 @item $WIDTH
 525 Page width, in characters, in format F3.
 526 @end table
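
As an illustration of the epoch used by @code{$TIME}, the Python sketch
below counts the seconds from midnight 14 Oct 1582 to a given moment.  It
assumes Python's proleptic Gregorian calendar and ignores time zones; the
function name is hypothetical.

```python
from datetime import datetime

# Midnight, 14 Oct 1582, the reference point given for $TIME.
EPOCH = datetime(1582, 10, 14)


def seconds_since_epoch(moment):
    """Seconds elapsed between the epoch and a naive datetime."""
    return (moment - EPOCH).total_seconds()
```

One day after the epoch the result is 86400 seconds.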
 527
 528 @node Sets of Variables, Input/Output Formats, System Variables, Variables
 529 @subsection Lists of variable names
 530 @cindex TO convention
 531 @cindex convention, TO
 532
 533 To refer to a set of variables, list their names one after another.
 534 Optionally, their names may be separated by commas.  To include a
 535 range of variables from the dictionary in the list, write the name of
 536 the first and last variable in the range, separated by @code{TO}.  For
 537 instance, if the dictionary contains six variables with the names
 538 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
 539 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
 540 variables @code{X2}, @code{GOAL}, and @code{MET}.
 541
 542 Commands that define variables, such as @cmd{DATA LIST}, give
 543 @code{TO} an alternate meaning.  With these commands, @code{TO} defines
 544 sequences of variables whose names end in consecutive integers.  The
 545 syntax is two identifiers that begin with the same root and end with
 546 numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
 547 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
 548 @code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
 549 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
 550 @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
 551 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
 552
 553 After a set of variables has been defined with @cmd{DATA LIST} or
 554 another command with this method, the same set can be referenced on
 555 later commands using the same syntax.
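
Both meanings of @code{TO} can be sketched in Python.  The functions below
are hypothetical illustrations, not PSPP code.  In particular, the rule
that generated names are zero-padded to the width of the first suffix, and
that the last generated name must equal the name written after @code{TO},
is an assumption chosen to reproduce the valid and invalid examples above.

```python
import re

# A non-empty root followed by all of the trailing digits.
_SUFFIX = re.compile(r"^(.+?)(\d+)$")


def expand_dictionary_range(dictionary, first, last):
    """Variables from first to last, in dictionary order."""
    names = [name.upper() for name in dictionary]
    i = names.index(first.upper())
    j = names.index(last.upper())
    if i > j:
        raise ValueError("first variable must precede last variable")
    return dictionary[i:j + 1]


def expand_numbered_range(first, last):
    """Names defined by 'first TO last' on a variable-defining command."""
    m1 = _SUFFIX.match(first)
    m2 = _SUFFIX.match(last)
    if not m1 or not m2:
        raise ValueError("both names must end in digits")
    root1, digits1 = m1.groups()
    root2, digits2 = m2.groups()
    start, end = int(digits1), int(digits2)
    if root1.upper() != root2.upper() or start > end:
        raise ValueError("invalid TO range")
    width = len(digits1)
    names = [root1 + str(n).zfill(width) for n in range(start, end + 1)]
    if names[-1].upper() != last.upper():
        raise ValueError("suffix widths are inconsistent")
    return names
```

Under these assumptions @code{expand_numbered_range("ITEM0008", "ITEM0013")}
returns six names with their leading zeros preserved, and both
@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are rejected.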
 556
 557 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
 558 @subsection Input and Output Formats
 559
 560 Data that PSPP inputs and outputs must have one of a number of formats.
 561 These formats are described, in general, by a format specification of
 562 the form @code{NAMEw.d}, where @var{name} is the
 563 format name and @var{w} is a field width.  @var{d} is the optional
 564 desired number of decimal places, if appropriate.  If @var{d} is not
 565 included then it is assumed to be 0.  Some formats do not allow @var{d}
 566 to be specified.
 567
 568 When an input format is specified on @cmd{DATA LIST} or another
 569 command, then
 570 it is converted to an output format for the purposes of @cmd{PRINT}
 571 and other
 572 data output commands.  For most purposes, input and output formats are
 573 the same; the salient differences are described below.
 574
 575 Below are listed the input and output formats supported by PSPP.  If an
 576 input format is mapped to a different output format by default, then
 577 that mapping is indicated with @result{}.  Each format has the listed
 578 bounds on input width (iw) and output width (ow).
 579
 580 The standard numeric input and output formats are given in the following
 581 table:
 582
 583 @table @asis
 584 @item Fw.d: 1 <= iw,ow <= 40
 585 Standard decimal format with @var{d} decimal places.  If the number is
 586 too large to fit within the field width, it is expressed in scientific
 587 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
 588 the exponent.  When used as an input format, scientific notation is
 589 allowed but an E or an F must be used to introduce the exponent.
 590
 591 The default output format is the same as the input format, except if
 592 @var{d} > 1.  In that case the output @var{w} is always made to be at
 593 least 2 + @var{d}.
 594
 595 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
 596 For input this is equivalent to F format except that no E or F is
 597 required to introduce the exponent.  For output, produces scientific
 598 notation in the form @code{1.2+34}.  There are always at least two
 599 digits given in the exponent.
 600
 601 The default output @var{w} is the largest of the input @var{w}, the
 602 input @var{d} + 7, and 10.  The default output @var{d} is the input
 603 @var{d}, but at least 3.
 604
 605 @item COMMAw.d: 1 <= iw,ow <= 40
 606 Equivalent to F format, except that groups of three digits are
 607 comma-separated on output.  If the number is too large to express in the
 608 field width, then first commas are eliminated, then if there is still
 609 not enough space the number is expressed in scientific notation given
 610 that w >= 6.  Commas are allowed and ignored when this is used as an
 611 input format.
 612
 613 @item DOTw.d: 1 <= iw,ow <= 40
 614 Equivalent to COMMA format except that the roles of comma and decimal
 615 point are interchanged.  However: If SET /DECIMAL=DOT is in effect, then
 616 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
 617 decimal point.
 618
 619 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
 620 Equivalent to COMMA format, except that the number is prefixed by a
 621 dollar sign (@samp{$}) if there is room.  On input the value is allowed
 622 to be prefixed by a dollar sign, which is ignored.
 623
 624 The default output @var{w} is the input @var{w}, but at least 2.
 625
 626 @item PCTw.d: 2 <= iw,ow <= 40
 627 Equivalent to F format, except that the number is suffixed by a percent
 628 sign (@samp{%}) if there is room.  On input the value is allowed to be
 629 suffixed by a percent sign, which is ignored.
 630
 631 The default output @var{w} is the input @var{w}, but at least 2.
 632
 633 @item Nw.d: 1 <= iw,ow <= 40
 634 Only digits are allowed within the field width.  The decimal point is
 635 assumed to be @var{d} digits from the right margin.
 636
 637 The default output format is F with the same @var{w} and @var{d}, except
 638 if @var{d} > 1.  In that case the output @var{w} is always made to be at
 639 least 2 + @var{d}.
 640
 641 @item Zw.d @result{} F: 1 <= iw,ow <= 40
 642 Zoned decimal input.  If you need to use this then you know how.
 643
 644 @item IBw.d @result{} F: 1 <= iw,ow <= 8
 645 Integer binary format.  The field is interpreted as a fixed-point
 646 positive or negative binary number in two's-complement notation.  The
 647 location of the decimal point is implied.  Endianness is the same as the
 648 host machine.
 649
 650 The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
 651 with output @var{w} as 9 + input @var{d} and output @var{d} as input
 652 @var{d}.
 653
 654 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
 655 Positive integer binary format.  The field is interpreted as a
 656 fixed-point positive binary number.  The location of the decimal point
 657 is implied.  Endianness is the same as the host machine.
 658
 659 The default output format follows the rules for IB format.
 660
 661 @item Pw.d @result{} F: 1 <= iw,ow <= 16
 662 Binary coded decimal format.  Each byte from left to right, except the
 663 rightmost, represents two digits.  The upper nibble of each byte is more
 664 significant.  The upper nibble of the final byte is the least
 665 significant digit.  The lower nibble of the final byte is the sign; a
 666 value of D represents a negative sign and all other values are
 667 considered positive.  The decimal point is implied.
 668
 669 The default output format follows the rules for IB format.
 670
 671 @item PKw.d @result{} F: 1 <= iw,ow <= 16
 672 Positive binary coded decimal format.  Same as P but the last byte is the
 673 same as the others.
 674
 675 The default output format follows the rules for IB format.
 676
 677 @item RBw @result{} F: 2 <= iw,ow <= 8
 678
 679 Binary C architecture-dependent ``double'' format.  For a standard
 680 IEEE754 implementation @var{w} should be 8.
 681
 682 The default output format follows the rules for IB format.
 683
 684 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
 685 PIB format encoded as textual hex digit pairs.  @var{w} must be even.
 686
 687 The input width is mapped to a default output width as follows:
 688 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
 689 12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
 690 decimal places.
 691
 692 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
 693
 694 RB format encoded as textual hex digit pairs.  @var{w} must be even.
 695
 696 The default output format is F8.2.
 697
 698 @item CCAw.d: 1 <= ow <= 40
 699 @itemx CCBw.d: 1 <= ow <= 40
 700 @itemx CCCw.d: 1 <= ow <= 40
 701 @itemx CCDw.d: 1 <= ow <= 40
 702 @itemx CCEw.d: 1 <= ow <= 40
 703
 704 User-defined custom currency formats.  May not be used as an input
 705 format.  @xref{SET}, for more details.
 706 @end table
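
The default mappings from input formats to output formats stated in the
table above can be collected into one Python function.  This is a sketch
of those stated rules only: the function name and the tuple representation
are hypothetical, formats whose default mapping is not spelled out above
are rejected, and passing @var{d} through unchanged for PIBHEX is an
assumption.

```python
# Input width -> default output width for PIBHEX, as listed above.
PIBHEX_WIDTHS = {2: 4, 4: 6, 6: 9, 8: 11, 10: 14, 12: 16, 14: 18, 16: 21}


def default_output_format(name, w, d=0):
    """Return (name, w, d) of the default output format."""
    name = name.upper()
    if name in ("F", "N"):
        # N maps to F; both widen w to at least 2 + d when d > 1.
        if d > 1:
            w = max(w, 2 + d)
        return ("F", w, d)
    if name == "E":
        return ("E", max(w, d + 7, 10), max(d, 3))
    if name in ("DOLLAR", "PCT"):
        return (name, max(w, 2), d)
    if name in ("IB", "PIB", "P", "PK", "RB"):
        # PIB, P, PK, and RB follow the rules given for IB.
        if d == 0:
            return ("F", 8, 2)
        return ("F", 9 + d, d)
    if name == "PIBHEX":
        return ("F", PIBHEX_WIDTHS[w], d)  # d pass-through is an assumption
    if name == "RBHEX":
        return ("F", 8, 2)
    raise ValueError("no default mapping described for " + name)
```

For example, an input format of F3.2 maps to F4.2, and E5.0 maps to E10.3.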
 707
 708 The date and time numeric input and output formats accept a number of
 709 possible formats.  Before describing the formats themselves, some
 710 definitions of the elements that make up their formats will be helpful:
 711
 712 @table @dfn
 713 @item leader
 714 All formats accept an optional white space leader.
 715
 716 @item day
 717 An integer between 1 and 31 representing the day of month.
 718
 719 @item day-count
 720 An integer representing a number of days.
 721
 722 @item date-delimiter
 723 One or more characters of white space or the following characters:
 724 @code{- / . ,}
 725
 726 @item month
 727 A month name in one of the following forms:
 728 @itemize @bullet
 729 @item
 730 An integer between 1 and 12.
 731 @item
 732 Roman numerals representing an integer between 1 and 12.
 733 @item
 734 At least the first three characters of an English month name (January,
 735 February, @dots{}).
 736 @end itemize
 737
 738 @item year
 739 An integer year number between 1582 and 19999, or between 1 and 199.
 740 Years between 1 and 199 will have 1900 added.
 741
 742 @item julian
 743 A single number with a year number in the first 2, 3, or 4 digits (as
 744 above) and the day number within the year in the last 3 digits.
 745
 746 @item quarter
 747 An integer between 1 and 4 representing a quarter.
 748
 749 @item q-delimiter
 750 The letter @samp{Q} or @samp{q}.
 751
 752 @item week
 753 An integer between 1 and 53 representing a week within a year.
 754
 755 @item wk-delimiter
 756 The letters @samp{wk} in any case.
 757
 758 @item time-delimiter
 759 At least one character of white space or @samp{:} or @samp{.}.
 760
 761 @item hour
 762 An integer greater than 0 representing an hour.
 763
 764 @item minute
 765 An integer between 0 and 59 representing a minute within an hour.
 766
 767 @item opt-second
 768 Optionally, a time-delimiter followed by a real number representing a
 769 number of seconds.
 770
 771 @item hour24
 772 An integer between 0 and 23 representing an hour within a day.
 773
 774 @item weekday
 775 At least the first two characters of an English weekday name.
 776
 777 @item spaces
 778 Any amount or no amount of white space.
 779
 780 @item sign
 781 An optional positive or negative sign.
 782
 783 @item trailer
 784 All formats accept an optional white space trailer.
 785 @end table
 786
 787 The date input formats are strung together from the above pieces.  On
 788 output, the date formats are always printed in a single canonical
 789 manner, based on field width.  The date input and output formats are
 790 described below:
 791
 792 @table @asis
 793 @item DATEw: 9 <= iw,ow <= 40
 794 Date format. Input format: leader + day + date-delimiter +
 795 month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
 796 @var{w} < 11, DD-MMM-YYYY otherwise.
 797
 798 @item EDATEw: 8 <= iw,ow <= 40
 799 European date format.  Input format same as DATE.  Output format:
 800 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
 801
 802 @item SDATEw: 8 <= iw,ow <= 40
 803 Standard date format. Input format: leader + year + date-delimiter +
 804 month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
 805 @var{w} < 10, YYYY/MM/DD otherwise.
 806
 807 @item ADATEw: 8 <= iw,ow <= 40
 808 American date format.  Input format: leader + month + date-delimiter +
 809 day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
 810 @var{w} < 10, MM/DD/YYYY otherwise.
 811
 812 @item JDATEw: 5 <= iw,ow <= 40
 813 Julian date format.  Input format: leader + julian + trailer.  Output
 814 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
 815
 816 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
 817 Quarter/year format.  Input format: leader + quarter + q-delimiter +
 818 year + trailer.  Output format: @samp{Q Q YY} if @var{w} < 8,
 819 @samp{Q Q YYYY} otherwise, where the first @samp{Q} is one of the
 820 digits 1, 2, 3, 4.
 821
 822 @item MOYRw: 6 <= iw,ow <= 40
 823 Month/year format.  Input format: leader + month + date-delimiter + year
 824 + trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
 825 YYYY} otherwise.
 826
 827 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
 828 Week/year format.  Input format: leader + week + wk-delimiter + year +
 829 trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
 830 YYYY} otherwise.
 831
 832 @item DATETIMEw.d: 17 <= iw,ow <= 40
 833 Date and time format.  Input format: leader + day + date-delimiter +
 834 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
 835 + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
 836 @var{w} > 19 then seconds @samp{:SS} is added.  If @var{w} > 22 and
 837 @var{d} > 0 then fractional seconds @samp{.SS} are added.
 838
 839 @item TIMEw.d: 5 <= iw,ow <= 40
 840 Time format.  Input format: leader + sign + spaces + hour +
 841 time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
 842 Seconds and fractional seconds are available with @var{w} of at least 8
 843 and 10, respectively.
 844
 845 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
 846 Time format with day count.  Input format: leader + sign + spaces +
 847 day-count + time-delimiter + hour + time-delimiter + minute +
 848 opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
 849 seconds are available with @var{w} of at least 8 and 10, respectively.
 850
 851 @item WKDAYw: 2 <= iw,ow <= 40
 852 A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
 853 leader + weekday + trailer.  Output format: as many characters, in all
 854 capital letters, of the English name of the weekday as will fit in the
 855 field width.
 856
 857 @item MONTHw: 3 <= iw,ow <= 40
 858 A month as a number between 1 and 12, where 1 is January.  Input format:
 859 leader + month + trailer.  Output format: as many characters, in all
 860 capital letters, of the English name of the month as will fit in the
 861 field width.
 862 @end table
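
As an illustration of how the input pieces combine, the following
strings all match the @code{DATE} input format (day + date-delimiter +
month + date-delimiter + year) and denote the same date, since a month
may be given as a name, an integer, or Roman numerals:

@example
17-JUN-2005
17/6/2005
17.VI.2005
@end example

A minimal sketch of reading such values follows.  It assumes that
@cmd{DATA LIST} in free-field @code{LIST} mode accepts an input format
in parentheses after the variable name; the variable name @code{d} is
arbitrary.

@example
DATA LIST LIST /d (DATE11).
BEGIN DATA.
17-JUN-2005
17/6/2005
17.VI.2005
END DATA.
LIST.
@end example

Because the width is 11, the rule given above for @code{DATE} output
suggests that each value is listed in @samp{DD-MMM-YYYY} form.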
 863
 864 There are only two formats that may be used with string variables:
 865
 866 @table @asis
 867 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
 868 The entire field is treated as a string value.
 869
 870 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
 871 The field is composed of characters in a string encoded as textual hex
 872 digit pairs.
 873
 874 The default output @var{w} is half the input @var{w}.
 875 @end table
 876
 877 @node Scratch Variables,  , Input/Output Formats, Variables
 878 @subsection Scratch Variables
 879
 880 Most of the time, variables don't retain their values between cases.
 881 Instead, either they're being read from a data file or the active file,
 882 in which case they assume the value read, or, if created with
 883 @cmd{COMPUTE} or
 884 another transformation, they're initialized to the system-missing value
 885 or to blanks, depending on type.
 886
 887 However, sometimes it's useful to have a variable that keeps its value
 888 between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
 889 use a @dfn{scratch variable}.  Scratch variables are variables whose
 890 names begin with an octothorpe (@samp{#}).
 891
 892 Scratch variables have the same properties as variables left with
 893 @cmd{LEAVE}:
 894 they retain their values between cases, and for the first case they are
 895 initialized to 0 or blanks.  They have the additional property that they
 896 are deleted before the execution of any procedure.  For this reason,
 897 scratch variables can't be used for analysis.  To obtain the same
 898 effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
 899 value into an ordinary variable, then analyze that variable.
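
For example, a scratch variable can carry a running total from one case
to the next.  This is a sketch rather than a definitive recipe: the
names @code{x}, @code{#total}, and @code{running} are arbitrary, and it
assumes that @cmd{NUMERIC} may be used to create a scratch variable.

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.
NUMERIC #total.
COMPUTE #total = #total + x.
COMPUTE running = #total.
LIST.
@end example

Because @code{#total} starts at 0 and keeps its value between cases,
@code{running} should hold the cumulative sum of @code{x}.  The scratch
variable itself is deleted before @cmd{LIST} runs, so only @code{x} and
@code{running} are available to the procedure.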
 900
 901 @node Files, BNF, Variables, Language
 902 @section Files Used by PSPP
 903
 904 PSPP makes use of many files each time it runs.  Some of these it
 905 reads, some it writes, some it creates.  Here is a table listing the
 906 most important of these files:
 907
 908 @table @strong
 909 @cindex file, command
 910 @cindex file, syntax file
 911 @cindex command file
 912 @cindex syntax file
 913 @item command file
 914 @itemx syntax file
 915 These synonymous names refer to the file that contains the instructions
 916 that tell PSPP what to do.  The syntax file's name is specified on
 917 the PSPP command line.  Syntax files can also be pulled in with
 918 @cmd{INCLUDE} (@pxref{INCLUDE}).
 919
 920 @cindex file, data
 921 @cindex data file
 922 @item data file
 923 Data files contain raw data in ASCII format suitable for being read in
 924 by @cmd{DATA LIST}.  Data can be embedded in the syntax
 925 file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
 926 syntax file a data file too.
 927
 928 @cindex file, output
 929 @cindex output file
 930 @item listing file
 931 One or more output files are created by PSPP each time it is
 932 run.  The output files receive the tables and charts produced by
 933 statistical procedures.  The output files may be in any number of formats,
 934 depending on how PSPP is configured.
 935
 936 @cindex active file
 937 @cindex file, active
 938 @item active file
 939 The active file is the ``file'' on which all PSPP procedures
 940 are performed.  The active file contains variable definitions and
 941 cases.  The active file is not necessarily a disk file: it is stored
 942 in memory if there is room.
 943 @end table
 944
 945 @node BNF,  , Files, Language
 946 @section Backus-Naur Form
 947 @cindex BNF
 948 @cindex Backus-Naur Form
 949 @cindex command syntax, description of
 950 @cindex description of command syntax
 951
 952 The syntax of some parts of the PSPP language is presented in this
 953 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF@.  The
 954 following list describes BNF:
 955
 956 @itemize @bullet
 957 @cindex keywords
 958 @cindex terminals
 959 @item
 960 Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
 961 often called @dfn{terminals}.  There are some special terminals, which
 962 are actually written in lowercase for clarity:
 963
 964 @table @asis
 965 @cindex @code{number}
 966 @item @code{number}
 967 A real number.
 968
 969 @cindex @code{integer}
 970 @item @code{integer}
 971 An integer number.
 972
 973 @cindex @code{string}
 974 @item @code{string}
 975 A string.
 976
 977 @cindex @code{var-name}
 978 @item @code{var-name}
 979 A single variable name.
 980
 981 @cindex operators
 982 @cindex punctuators
 983 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
 984 Operators and punctuators.
 985
 986 @cindex @code{.}
 987 @item @code{.}
 988 The end of the command.  This is not necessarily an actual dot in the
 989 syntax file: @xref{Commands}, for more details.
 990 @end table
 991
 992 @item
 993 @cindex productions
 994 @cindex nonterminals
 995 Other words in all lowercase refer to BNF definitions, called
 996 @dfn{productions}.  These productions are also known as
 997 @dfn{nonterminals}.  Some nonterminals are very common, so they are
 998 defined here in English for clarity:
 999
1000 @table @code
1001 @cindex @code{var-list}
1002 @item var-list
1003 A list of one or more variable names or the keyword @code{ALL}.
1004
1005 @cindex @code{expression}
1006 @item expression
1007 An expression.  @xref{Expressions}, for details.
1008 @end table
1009
1010 @item
1011 @cindex @code{::=}
1012 @cindex ``is defined as''
1013 @cindex productions
1014 @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
1015 the name of the nonterminal being defined.  The right side of @samp{::=}
1016 gives the definition of that nonterminal.  If the right side is empty,
1017 then one possible expansion of that nonterminal is nothing.  A BNF
1018 definition is called a @dfn{production}.
1019
1020 @item
1021 @cindex terminals and nonterminals, differences
1022 So, the key difference between a terminal and a nonterminal is that a
1023 terminal cannot be broken into smaller parts---in fact, every terminal
1024 is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
1025 composed of a (possibly empty) sequence of terminals and nonterminals.
1026 Thus, terminals indicate the deepest level of syntax description.  (In
1027 parsing theory, terminals are the leaves of the parse tree; nonterminals
1028 form the branches.)
1029
1030 @item
1031 @cindex start symbol
1032 @cindex symbol, start
1033 The first nonterminal defined in a set of productions is called the
1034 @dfn{start symbol}.  The start symbol defines the entire syntax for
1035 that command.
1036 @end itemize
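
As an illustration of these conventions, here is a set of productions
for a made-up command.  @code{FROB} and @code{SCALE} are hypothetical
keywords chosen only for this sketch; they are not part of PSPP.

@example
frob-cmd  ::= FROB var-list opt-scale .
opt-scale ::=
opt-scale ::= / SCALE = number
@end example

Here @code{frob-cmd} is the start symbol, because it is the first
nonterminal defined.  @code{FROB} and @code{SCALE} are keyword
terminals, @code{number} and @code{.} are special terminals, and
@code{var-list} is one of the common nonterminals described above.
The first @code{opt-scale} production has an empty right side, so the
@samp{/ SCALE = number} clause may be omitted.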
1037 @setfilename ignored