doc/language.texi

   1 @node Language, Expressions, Invocation, Top
   2 @chapter The PSPP language
   3 @cindex language, PSPP
   4 @cindex PSPP, language
   5
   6 @quotation
   7 @strong{Please note:} PSPP is not even close to completion.
   8 Only a few statistical procedures are implemented.  PSPP
   9 is a work in progress.
  10 @end quotation
  11
  12 This chapter discusses elements common to many PSPP commands.
  13 Later chapters will describe individual commands in detail.
  14
  15 @menu
  16 * Tokens::                      Characters combine to form tokens.
  17 * Commands::                    Tokens combine to form commands.
  18 * Types of Commands::           Commands come in several flavors.
  19 * Order of Commands::           Commands combine to form syntax files.
  20 * Missing Observations::        Handling missing observations.
  21 * Variables::                   The unit of data storage.
  22 * Files::                       Files used by PSPP.
  23 * BNF::                         How command syntax is described.
  24 @end menu
  25
  26 @node Tokens, Commands, Language, Language
  27 @section Tokens
  28 @cindex language, lexical analysis
  29 @cindex language, tokens
  30 @cindex tokens
  31 @cindex lexical analysis
  32
  33 PSPP divides most syntax file lines into a series of short chunks
  34 called @dfn{tokens}.
  35 Tokens are then grouped to form commands, each of which tells
  36 PSPP to take some action---read in data, write out data, perform
  37 a statistical procedure, etc.  Each type of token is
  38 described below.
  39
  40 @table @strong
  41 @cindex identifiers
  42 @item Identifiers
  43 Identifiers are names that typically specify variables, commands, or
  44 subcommands.  The first character in an identifier must be a letter,
  45 @samp{#}, or @samp{@@}.  The remaining characters in the identifier
  46 must be letters, digits, or one of the following special characters:
  47
  48 @example
  49 @center @.  _  $  #  @@
  50 @end example
  51
  52 @cindex case-sensitivity
  53 Identifiers may be any length, but only the first 64 bytes are
  54 significant.  Identifiers are not case-sensitive: @code{foobar},
  55 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
  56 different representations of the same identifier.
  57
  58 @cindex identifiers, reserved
  59 @cindex reserved identifiers
  60 Some identifiers are reserved.  Reserved identifiers may not be used
  61 in any context besides those explicitly described in this manual.  The
  62 reserved identifiers are:
  63
  64 @example
  65 @center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
  66 @end example
  67
  68 @item Keywords
  69 Keywords are a subclass of identifiers that form a fixed part of
  70 command syntax.  For example, command and subcommand names are
  71 keywords.  Keywords may be abbreviated to their first 3 characters if
  72 this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
  73 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
  74 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
  75
  76 Reserved identifiers are always used as keywords.  Other identifiers
  77 may be used both as keywords and as user-defined identifiers, such as
  78 variable names.
  79
  80 @item Numbers
  81 @cindex numbers
  82 @cindex integers
  83 @cindex reals
  84 Numbers are expressed in decimal.  A decimal point is optional.
  85 Numbers may be expressed in scientific notation by adding @samp{e} and
  86 a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
  87 are some more examples of valid numbers:
  88
  89 @example
  90 -5  3.14159265359  1e100  -.707  8945.
  91 @end example
  92
  93 Negative numbers are expressed with a @samp{-} prefix.  However, in
  94 situations where a literal @samp{-} token is expected, what appears to
  95 be a negative number is treated as @samp{-} followed by a positive
  96 number.
  97
  98 No white space is allowed within a number token, except for horizontal
  99 white space between @samp{-} and the rest of the number.
 100
 101 The last example above, @samp{8945.}, will be interpreted as two
 102 tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
 103 @xref{Commands, , Forming commands of tokens}.
 104
 105 @item Strings
 106 @cindex strings
 107 @cindex @samp{'}
 108 @cindex @samp{"}
 109 @cindex case-sensitivity
 110 Strings are literal sequences of characters enclosed in pairs of
 111 single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
 112 character used for quoting in the string, double it, e.g.@:
 113 @samp{'it''s an apostrophe'}.  White space and case of letters are
 114 significant inside strings.
 115
 116 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
 117 'c'} is equivalent to @samp{'abc'}.  Concatenation is useful for
 118 splitting a single string across multiple source lines.  The maximum
 119 length of a string, after concatenation, is 255 characters.
 120
 121 Strings may also be expressed as hexadecimal, octal, or binary
 122 character values by prefixing the initial quote character by @samp{X},
 123 @samp{O}, or @samp{B} or their lowercase equivalents.  Each pair,
 124 triplet, or octet of characters, according to the radix, is
 125 transformed into a single character with the given value.  If there is
 126 an incomplete group of characters, the missing final digits are
 127 assumed to be @samp{0}.  These forms of strings are nonportable
 128 because numeric values are associated with different characters by
 129 different operating systems.  Therefore, their use should be confined
 130 to syntax files that will not be widely distributed.
 131
 132 @cindex characters, reserved
 133 @cindex 0
 134 @cindex white space
 135 The character with value 00 is reserved for
 136 internal use by PSPP.  Its use in strings causes an error and
 137 replacement by a space character.
 138
 139 @item Punctuators and Operators
 140 @cindex punctuators
 141 @cindex operators
 142 These tokens are the punctuators and operators:
 143
 144 @example
 145 @center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
 146 @end example
 147
 148 Most of these appear within the syntax of commands, but the period
 149 (@samp{.}) punctuator is used only at the end of a command.  It is a
 150 punctuator only as the last character on a line (except white space).
 151 When it is the last non-space character on a line, a period is not
 152 treated as part of another token, even if it would otherwise be part
 153 of, e.g.@:, an identifier or a floating-point number.
 154
 155 Actually, the character that ends a command can be changed with
 156 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
 157 doing so.  Throughout the remainder of this manual we will assume that
 158 the default setting is in effect.
 159 @end table
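
As an informal illustration of the token rules above, the following
sketch shows equivalent keyword abbreviations and two ways of writing
string tokens.  The variable name @code{salary} is hypothetical, and the
hexadecimal string assumes an ASCII-compatible character set:

@example
FREQUENCIES /VARIABLES=salary.
FREQ /VAR=salary.
VARIABLE LABELS salary 'Annual salary, ' +
    'in thousands of dollars'.
VARIABLE LABELS salary X'41424344'.
@end example

The first two commands should be equivalent, because each keyword is
abbreviated to at least its first three characters.  The third command
joins two strings with @samp{+} to form a single label, and the fourth
should yield the label @samp{ABCD} on an ASCII system, since each pair
of hexadecimal digits becomes one character.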
 160
 161 @node Commands, Types of Commands, Tokens, Language
 162 @section Forming commands of tokens
 163
 164 @cindex PSPP, command structure
 165 @cindex language, command structure
 166 @cindex commands, structure
 167
 168 Most PSPP commands share a common structure.  A command begins with a
 169 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
 170 CASES}.  The command name may be abbreviated to its first word, and
 171 each word in the command name may be abbreviated to its first three
 172 or more characters, where these abbreviations are unambiguous.
 173
 174 The command name may be followed by one or more @dfn{subcommands}.
 175 Each subcommand begins with a subcommand name, which may be
 176 abbreviated to its first three letters.  Some subcommands accept a
 177 series of one or more specifications, which follow the subcommand
 178 name, optionally separated from it by an equals sign
 179 (@samp{=}).  Specifications may be separated from each other
 180 by commas or spaces.  Each subcommand must be separated from the next (if any)
 181 by a forward slash (@samp{/}).
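
For instance, the command below (a sketch in which the variable names
@code{age} and @code{income} are hypothetical) consists of a command
name followed by two subcommands:

@example
FREQUENCIES
    /VARIABLES=age, income
    /STATISTICS=MEAN MEDIAN.
@end example

Here @samp{VARIABLES} and @samp{STATISTICS} are subcommand names, each
followed by an equals sign and its specifications.  The specifications
are separated by a comma in the first subcommand and by a space in the
second, and a forward slash introduces each subcommand.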
 182
 183 There are multiple ways to mark the end of a command.  The most common
 184 way is to end the last line of the command with a period (@samp{.}) as
 185 described in the previous section (@pxref{Tokens}).  A blank line, or
 186 one that consists only of white space or comments, also ends a command
 187 by default, although you can use the NULLINE subcommand of @cmd{SET}
 188 to disable this feature (@pxref{SET}).
 189
 190 In batch mode only, that is, when reading commands from a file instead
 191 of an interactive user, any line that contains a non-space character
 192 in the leftmost column begins a new command.  Thus, each command
 193 consists of a flush-left line followed by any number of lines indented
 194 from the left margin.  In this mode, a plus sign, minus sign, or
 195 period (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character
 196 in a line is ignored and causes that line to begin a new command,
 197 which allows for visual indentation of a command without that command
 198 being considered part of the previous command.
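
The following sketch shows how these batch-mode rules might apply.  The
variable names are hypothetical:

@example
DESCRIPTIVES
    /VARIABLES=age income
DO IF age > 65.
+   COMPUTE senior = 1.
END IF.
@end example

The flush-left @samp{DO IF} line begins a new command even though the
preceding @cmd{DESCRIPTIVES} command does not end with a period.  The
leading @samp{+} is ignored, so @cmd{COMPUTE} starts a new command
while still appearing indented.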
 199
 200 Sometimes, one encounters syntax files that are intended to be
 201 interpreted in interactive mode rather than batch mode.  When this
 202 occurs, use the @samp{-i} command line option to force interpretation
 203 in interactive mode (@pxref{Language control options}).
 204
 205 @node Types of Commands, Order of Commands, Commands, Language
 206 @section Types of Commands
 207
 208 Commands in PSPP are divided roughly into six categories:
 209
 210 @table @strong
 211 @item Utility commands
 212 @cindex utility commands
 213 Set or display various global options that affect PSPP operations.
 214 May appear anywhere in a syntax file.  @xref{Utilities, , Utility
 215 commands}.
 216
 217 @item File definition commands
 218 @cindex file definition commands
 219 Give instructions for reading data from text files or from special
 220 binary ``system files''.  Most of these commands replace any previous
 221 data or variables with new data or variables.  At least one file
 222 definition command must appear before the first command in any of
 223 the categories below.  @xref{Data Input and Output}.
 224
 225 @item Input program commands
 226 @cindex input program commands
 227 Though rarely used, these provide tools for reading data files
 228 in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
 229
 230 @item Transformations
 231 @cindex transformations
 232 Perform operations on data and write data to output files.  Transformations
 233 are not carried out until a procedure is executed.
 234
 235 @item Restricted transformations
 236 @cindex restricted transformations
 237 Transformations that cannot appear in certain contexts.  @xref{Order
 238 of Commands}, for details.
 239
 240 @item Procedures
 241 @cindex procedures
 242 Analyze data, writing results of analyses to the listing file.  Cause
 243 transformations specified earlier in the file to be performed.  In a
 244 more general sense, a @dfn{procedure} is any command that causes the
 245 active file (the data) to be read.
 246 @end table
 247
 248 @node Order of Commands, Missing Observations, Types of Commands, Language
 249 @section Order of Commands
 250 @cindex commands, ordering
 251 @cindex order of commands
 252
 253 PSPP does not place many restrictions on ordering of commands.  The
 254 main restriction is that variables must be defined before they are otherwise
 255 referenced.  This section describes the details of command ordering,
 256 but most users will have no need to refer to them.
 257
 258 PSPP possesses five internal states, called initial, INPUT PROGRAM,
 259 FILE TYPE, transformation, and procedure states.  (Please note the
 260 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
 261 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
 262
 263 PSPP starts in the initial state.  Each successful completion
 264 of a command may cause a state transition.  Each type of command has its
 265 own rules for state transitions:
 266
 267 @table @strong
 268 @item Utility commands
 269 @itemize @bullet
 270 @item
 271 Valid in any state.
 272 @item
 273 Do not cause state transitions.  Exception: when @cmd{N OF CASES}
 274 is executed in the procedure state, it causes a transition to the
 275 transformation state.
 276 @end itemize
 277
 278 @item @cmd{DATA LIST}
 279 @itemize @bullet
 280 @item
 281 Valid in any state.
 282 @item
 283 When executed in the initial or procedure state, causes a transition to
 284 the transformation state.
 285 @item
 286 Clears the active file if executed in the procedure or transformation
 287 state.
 288 @end itemize
 289
 290 @item @cmd{INPUT PROGRAM}
 291 @itemize @bullet
 292 @item
 293 Invalid in INPUT PROGRAM and FILE TYPE states.
 294 @item
 295 Causes a transition to the INPUT PROGRAM state.
 296 @item
 297 Clears the active file.
 298 @end itemize
 299
 300 @item @cmd{FILE TYPE}
 301 @itemize @bullet
 302 @item
 303 Invalid in INPUT PROGRAM and FILE TYPE states.
 304 @item
 305 Causes a transition to the FILE TYPE state.
 306 @item
 307 Clears the active file.
 308 @end itemize
 309
 310 @item Other file definition commands
 311 @itemize @bullet
 312 @item
 313 Invalid in INPUT PROGRAM and FILE TYPE states.
 314 @item
 315 Cause a transition to the transformation state.
 316 @item
 317 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
 318 and @cmd{UPDATE}.
 319 @end itemize
 320
 321 @item Transformations
 322 @itemize @bullet
 323 @item
 324 Invalid in initial and FILE TYPE states.
 325 @item
 326 Cause a transition to the transformation state.
 327 @end itemize
 328
 329 @item Restricted transformations
 330 @itemize @bullet
 331 @item
 332 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 333 @item
 334 Cause a transition to the transformation state.
 335 @end itemize
 336
 337 @item Procedures
 338 @itemize @bullet
 339 @item
 340 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 341 @item
 342 Cause a transition to the procedure state.
 343 @end itemize
 344 @end table
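
As a sketch of these rules, consider the short syntax file below, in
which the file name @file{survey.txt} and the variable names are
hypothetical:

@example
DATA LIST LIST FILE='survey.txt' /age income.
COMPUTE ratio = income / age.
DESCRIPTIVES /VARIABLES=ratio.
@end example

@cmd{DATA LIST} is executed in the initial state and causes a
transition to the transformation state.  @cmd{COMPUTE}, a
transformation, is valid there and leaves PSPP in the transformation
state.  @cmd{DESCRIPTIVES}, a procedure, then causes a transition to
the procedure state.  Placing @cmd{COMPUTE} before @cmd{DATA LIST}
would be an error, because transformations are invalid in the initial
state.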
 345
 346 @node Missing Observations, Variables, Order of Commands, Language
 347 @section Handling missing observations
 348 @cindex missing values
 349 @cindex values, missing
 350
 351 PSPP includes special support for unknown numeric data values.
 352 Missing observations are assigned a special value, called the
 353 @dfn{system-missing value}.  This ``value'' actually indicates the
 354 absence of a value; it means that the actual value is unknown.  Procedures
 355 automatically exclude from analyses those observations or cases that
 356 have missing values.  Details of missing value exclusion depend on the
 357 procedure and can often be controlled by the user; refer to
 358 descriptions of individual procedures for details.
 359
 360 The system-missing value exists only for numeric variables.  String
 361 variables always have a defined value, even if it is only a string of
 362 spaces.
 363
 364 Variables, whether numeric or string, can have designated
 365 @dfn{user-missing values}.  Every user-missing value is an actual value
 366 for that variable.  However, most of the time user-missing values are
 367 treated in the same way as the system-missing value.  String variables
 368 that are wider than a certain width, usually 8 characters (depending on
 369 computer architecture), cannot have user-missing values.
 370
 371 For more information on missing values, see the following sections:
 372 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
 373 documentation on individual procedures for information on how they
 374 handle missing values.
 375
 376 @node Variables, Files, Missing Observations, Language
 377 @section Variables
 378 @cindex variables
 379 @cindex dictionary
 380
 381 Variables are the basic unit of data storage in PSPP.  All the
 382 variables in a file taken together, apart from any associated data, are
 383 said to form a @dfn{dictionary}.
 384 Some details of variables are described in the sections below.
 385
 386 @menu
 387 * Attributes::                  Attributes of variables.
 388 * System Variables::            Variables automatically defined by PSPP.
 389 * Sets of Variables::           Lists of variable names.
 390 * Input/Output Formats::        Input and output formats.
 391 * Scratch Variables::           Variables deleted by procedures.
 392 @end menu
 393
 394 @node Attributes, System Variables, Variables, Variables
 395 @subsection Attributes of Variables
 396 @cindex variables, attributes of
 397 @cindex attributes of variables
 398 Each variable has a number of attributes, including:
 399
 400 @table @strong
 401 @item Name
 402 An identifier, up to 64 bytes long.  Each variable must have a different name.
 403 @xref{Tokens}.
 404
 405 Some system variable names begin with @samp{$}, but user-defined
 406 variables' names may not begin with @samp{$}.
 407
 408 @cindex @samp{.}
 409 @cindex period
 410 @cindex variable names, ending with period
 411 The final character in a variable name should not be @samp{.}, because
 412 such an identifier will be misinterpreted when it is the final token
 413 on a line: @code{FOO.} will be divided into two separate tokens,
 414 @samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.
 415
 416 @cindex @samp{_}
 417 The final character in a variable name should not be @samp{_}, because
 418 some such identifiers are used for special purposes by PSPP
 419 procedures.
 420
 421 As with all PSPP identifiers, variable names are not case-sensitive.
 422 PSPP capitalizes variable names on output the same way they were
 423 capitalized at their point of definition in the input.
 424
 425 @cindex variables, type
 426 @cindex type of variables
 427 @item Type
 428 Numeric or string.
 429
 430 @cindex variables, width
 431 @cindex width of variables
 432 @item Width
 433 (string variables only) String variables with a width of 8 characters or
 434 fewer are called @dfn{short string variables}.  Short string variables
 435 can be used in many procedures where @dfn{long string variables} (those
 436 with widths greater than 8) are not allowed.
 437
 438 Certain systems may consider strings longer than 8
 439 characters to be short strings.  Eight characters represents a minimum
 440 figure for the maximum length of a short string.
 441
 442 @item Position
 443 Variables in the dictionary are arranged in a specific order.
 444 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
 445
 446 @item Initialization
 447 Either reinitialized to 0 or spaces for each case, or left at its
 448 existing value.  @xref{LEAVE}.
 449
 450 @cindex missing values
 451 @cindex values, missing
 452 @item Missing values
 453 Optionally, up to three values, or a range of values, or a specific
 454 value plus a range, can be specified as @dfn{user-missing values}.
 455 There is also a @dfn{system-missing value} that is assigned to an
 456 observation when there is no other obvious value for that observation.
 457 Observations with missing values are automatically excluded from
 458 analyses.  User-missing values are actual data values, while the
 459 system-missing value is not a value at all.  @xref{Missing Observations}.
 460
 461 @cindex variable labels
 462 @cindex labels, variable
 463 @item Variable label
 464 A string that describes the variable.  @xref{VARIABLE LABELS}.
 465
 466 @cindex value labels
 467 @cindex labels, value
 468 @item Value label
 469 Optionally, these associate each possible value of the variable with a
 470 string.  @xref{VALUE LABELS}.
 471
 472 @cindex print format
 473 @item Print format
 474 Display width, format, and (for numeric variables) number of decimal
 475 places.  This attribute does not affect how data are stored, just how
 476 they are displayed.  Example: a width of 8, with 2 decimal places.
 477 @xref{PRINT FORMATS}.
 478
 479 @cindex write format
 480 @item Write format
 481 Similar to print format, but used by certain commands that are
 482 designed to write to binary files.  @xref{WRITE FORMATS}.
 483 @end table
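
Several of these attributes can be set with commands described later in
this manual.  The sketch below assumes hypothetical numeric variables
@code{income} and @code{sex}; consult the referenced sections for the
exact syntax of each command:

@example
VARIABLE LABELS income 'Household income'.
VALUE LABELS /sex 1 'Male' 2 'Female'.
MISSING VALUES income (-1, -9).
PRINT FORMATS income (F8.2).
@end example

The last command requests the print format used as an example in the
table above: a width of 8 with 2 decimal places.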
 484
 485 @node System Variables, Sets of Variables, Attributes, Variables
 486 @subsection Variables Automatically Defined by PSPP
 487 @cindex system variables
 488 @cindex variables, system
 489
 490 There are seven system variables.  These are not like ordinary
 491 variables because system variables are not always stored.  They can be used only
 492 in expressions.  These system variables, whose values and output formats
 493 cannot be modified, are described below.
 494
 495 @table @code
 496 @cindex @code{$CASENUM}
 497 @item $CASENUM
 498 Case number of the case currently being processed.  This changes as
 499 cases are shuffled around.
 500
 501 @cindex @code{$DATE}
 502 @item $DATE
 503 Date the PSPP process was started, in format A9, following the
 504 pattern @code{DD MMM YY}.
 505
 506 @cindex @code{$JDATE}
 507 @item $JDATE
 508 Number of days between 15 Oct 1582 and the time the PSPP process
 509 was started.
 510
 511 @cindex @code{$LENGTH}
 512 @item $LENGTH
 513 Page length, in lines, in format F11.
 514
 515 @cindex @code{$SYSMIS}
 516 @item $SYSMIS
 517 System missing value, in format F1.
 518
 519 @cindex @code{$TIME}
 520 @item $TIME
 521 Number of seconds between midnight 14 Oct 1582 and the time the active file
 522 was read, in format F20.
 523
 524 @cindex @code{$WIDTH}
 525 @item $WIDTH
 526 Page width, in characters, in format F3.
 527 @end table
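
Because system variables can be used only in expressions, a typical use
is on the right-hand side of @cmd{COMPUTE}.  In this sketch the target
variable names are hypothetical:

@example
COMPUTE caseno = $CASENUM.
COMPUTE unknown = $SYSMIS.
@end example

The first command copies the current case number into an ordinary
variable; the second sets a variable to the system-missing value.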
 528
 529 @node Sets of Variables, Input/Output Formats, System Variables, Variables
 530 @subsection Lists of variable names
 531 @cindex TO convention
 532 @cindex convention, TO
 533
 534 To refer to a set of variables, list their names one after another.
 535 Optionally, their names may be separated by commas.  To include a
 536 range of variables from the dictionary in the list, write the name of
 537 the first and last variable in the range, separated by @code{TO}.  For
 538 instance, if the dictionary contains six variables with the names
 539 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
 540 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
 541 variables @code{X2}, @code{GOAL}, and @code{MET}.
 542
 543 Commands that define variables, such as @cmd{DATA LIST}, give
 544 @code{TO} an alternate meaning.  With these commands, @code{TO} defines
 545 sequences of variables whose names end in consecutive integers.  The
 546 syntax is two identifiers that begin with the same root and end with
 547 numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
 548 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
 549 @code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
 550 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
 551 @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
 552 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
 553
 554 After a set of variables has been defined with @cmd{DATA LIST} or
 555 another command with this method, the same set can be referenced on
 556 later commands using the same syntax.
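
For example, in the following sketch (the file name and variable names
are hypothetical) @code{TO} is used with both meanings:

@example
DATA LIST LIST FILE='items.txt' /id x1 TO x5.
DESCRIPTIVES /VARIABLES=x2 TO x4.
@end example

On @cmd{DATA LIST}, @code{x1 TO x5} defines the five variables
@code{x1} through @code{x5}.  On @cmd{DESCRIPTIVES}, @code{x2 TO x4}
refers to the variables from @code{x2} to @code{x4} in dictionary
order, that is, @code{x2}, @code{x3}, and @code{x4}.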
 557
 558 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
 559 @subsection Input and Output Formats
 560
 561 Data that PSPP inputs and outputs must have one of a number of formats.
 562 These formats are described, in general, by a format specification of
 563 the form @code{NAMEw.d}, where @var{name} is the
 564 format name and @var{w} is a field width.  @var{d} is the optional
 565 desired number of decimal places, if appropriate.  If @var{d} is not
 566 included then it is assumed to be 0.  Some formats do not allow @var{d}
 567 to be specified.
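
A few format specifications, shown here with @cmd{PRINT FORMATS} and
hypothetical variable names, illustrate the pattern:

@example
PRINT FORMATS income (COMMA12.2).
PRINT FORMATS age (F3).
@end example

In @code{COMMA12.2}, the format name is COMMA, @var{w} is 12, and
@var{d} is 2.  In @code{F3}, @var{d} is omitted and so is taken to be
0.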
 568
 569 When @cmd{DATA LIST} or another command specifies an input format,
 570 that format is converted to an output format for the purposes of
 571 @cmd{PRINT} and other data output commands.  For most purposes, input
 572 and output formats are the same; the salient differences are described
 573 below.
 574
 575 Below are listed the input and output formats supported by PSPP.  If an
 576 input format is mapped to a different output format by default, then
 577 that mapping is indicated with @result{}.  Each format has the listed
 578 bounds on input width (iw) and output width (ow).
 579
 580 The standard numeric input and output formats are given in the following
 581 table:
 582
 583 @table @asis
 584 @item Fw.d: 1 <= iw,ow <= 40
 585 Standard decimal format with @var{d} decimal places.  If the number is
 586 too large to fit within the field width, it is expressed in scientific
 587 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
 588 the exponent.  When used as an input format, scientific notation is
 589 allowed but an E or an F must be used to introduce the exponent.
 590
 591 The default output format is the same as the input format, except if
 592 @var{d} > 1.  In that case the output @var{w} is always made to be at
 593 least 2 + @var{d}.
 594
 595 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
 596 For input this is equivalent to F format except that no E or F is
 597 required to introduce the exponent.  For output, produces scientific
 598 notation in the form @code{1.2+34}.  There are always at least two
 599 digits given in the exponent.
 600
 601 The default output @var{w} is the largest of the input @var{w}, the
 602 input @var{d} + 7, and 10.  The default output @var{d} is the input
 603 @var{d}, but at least 3.
 604
 605 @item COMMAw.d: 1 <= iw,ow <= 40
 606 Equivalent to F format, except that groups of three digits are
 607 comma-separated on output.  If the number is too large to express in the
 608 field width, then first commas are eliminated, then if there is still
 609 not enough space the number is expressed in scientific notation given
 610 that w >= 6.  Commas are allowed and ignored when this is used as an
 611 input format.
 612
 613 @item DOTw.d: 1 <= iw,ow <= 40
 614 Equivalent to COMMA format except that the roles of comma and decimal
 615 point are interchanged.  However, if SET /DECIMAL=DOT is in effect, then
 616 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
 617 decimal point.
 618
 619 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
 620 Equivalent to COMMA format, except that the number is prefixed by a
 621 dollar sign (@samp{$}) if there is room.  On input the value is allowed
 622 to be prefixed by a dollar sign, which is ignored.
 623
 624 The default output @var{w} is the input @var{w}, but at least 2.
 625
 626 @item PCTw.d: 2 <= iw,ow <= 40
 627 Equivalent to F format, except that the number is suffixed by a percent
 628 sign (@samp{%}) if there is room.  On input the value is allowed to be
 629 suffixed by a percent sign, which is ignored.
 630
 631 The default output @var{w} is the input @var{w}, but at least 2.
 632
 633 @item Nw.d: 1 <= iw,ow <= 40
 634 Only digits are allowed within the field width.  The decimal point is
 635 assumed to be @var{d} digits from the right margin.
 636
 637 The default output format is F with the same @var{w} and @var{d}, except
 638 if @var{d} > 1.  In that case the output @var{w} is always made to be at
 639 least 2 + @var{d}.
 640
 641 @item Zw.d @result{} F: 1 <= iw,ow <= 40
 642 Zoned decimal input.  If you need to use this then you know how.
 643
 644 @item IBw.d @result{} F: 1 <= iw,ow <= 8
 645 Integer binary format.  The field is interpreted as a fixed-point
 646 positive or negative binary number in two's-complement notation.  The
 647 location of the decimal point is implied.  Endianness is the same as the
 648 host machine.
 649
 650 The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
 651 with output @var{w} as 9 + input @var{d} and output @var{d} as input
 652 @var{d}.
 653
 654 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
 655 Positive integer binary format.  The field is interpreted as a
 656 fixed-point positive binary number.  The location of the decimal point
 657 is implied.  Endianness is the same as the host machine.
 658
 659 The default output format follows the rules for IB format.
 660
 661 @item Pw.d @result{} F: 1 <= iw,ow <= 16
 662 Binary coded decimal format.  Each byte from left to right, except the
 663 rightmost, represents two digits.  The upper nibble of each byte is more
 664 significant.  The upper nibble of the final byte is the least
 665 significant digit.  The lower nibble of the final byte is the sign; a
 666 value of D represents a negative sign and all other values are
 667 considered positive.  The decimal point is implied.
 668
 669 The default output format follows the rules for IB format.
 670
 671 @item PKw.d @result{} F: 1 <= iw,ow <= 16
 672 Positive binary coded decimal format.  Same as P but the last byte is the
 673 same as the others.
 674
 675 The default output format follows the rules for IB format.
 676
 677 @item RBw @result{} F: 2 <= iw,ow <= 8
 678
 679 Binary C architecture-dependent ``double'' format.  For a standard
 680 IEEE754 implementation @var{w} should be 8.
 681
 682 The default output format follows the rules for IB format.
 683
 684 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
 685 PIB format encoded as textual hex digit pairs.  @var{w} must be even.
 686
 687 The input width is mapped to a default output width as follows:
 688 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
 689 12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
 690 decimal places.
 691
 692 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
 693
 694 RB format encoded as textual hex digit pairs.  @var{w} must be even.
 695
 696 The default output format is F8.2.
 697
 698 @item CCAw.d: 1 <= ow <= 40
 699 @itemx CCBw.d: 1 <= ow <= 40
 700 @itemx CCCw.d: 1 <= ow <= 40
 701 @itemx CCDw.d: 1 <= ow <= 40
 702 @itemx CCEw.d: 1 <= ow <= 40
 703
 704 User-defined custom currency formats.  May not be used as an input
 705 format.  @xref{SET}, for more details.
 706 @end table
 707
 708 The date and time numeric input and output formats accept a number of
 709 possible formats.  Before describing the formats themselves, some
 710 definitions of the elements that make up their formats will be helpful:
 711
 712 @table @dfn
 713 @item leader
 714 All formats accept an optional white space leader.
 715
 716 @item day
 717 An integer between 1 and 31 representing the day of month.
 718
 719 @item day-count
 720 An integer representing a number of days.
 721
 722 @item date-delimiter
 723 One or more characters of white space or the following characters:
 724 @code{- / . ,}
 725
 726 @item month
 727 A month name in one of the following forms:
 728 @itemize @bullet
 729 @item
 730 An integer between 1 and 12.
 731 @item
 732 Roman numerals representing an integer between 1 and 12.
 733 @item
 734 At least the first three characters of an English month name (January,
 735 February, @dots{}).
 736 @end itemize
 737
 738 @item year
 739 An integer year number between 1582 and 19999, or between 1 and 199.
 740 Years between 1 and 199 will have 1900 added.
 741
 742 @item julian
 743 A single number with a year number in the first 2, 3, or 4 digits (as
 744 above) and the day number within the year in the last 3 digits.
 745
 746 @item quarter
 747 An integer between 1 and 4 representing a quarter.
 748
 749 @item q-delimiter
 750 The letter @samp{Q} or @samp{q}.
 751
 752 @item week
 753 An integer between 1 and 53 representing a week within a year.
 754
 755 @item wk-delimiter
 756 The letters @samp{wk} in any case.
 757
 758 @item time-delimiter
 759 At least one character of white space or @samp{:} or @samp{.}.
 760
 761 @item hour
 762 An integer greater than 0 representing an hour.
 763
 764 @item minute
 765 An integer between 0 and 59 representing a minute within an hour.
 766
 767 @item opt-second
 768 Optionally, a time-delimiter followed by a real number representing a
 769 number of seconds.
 770
 771 @item hour24
 772 An integer between 0 and 23 representing an hour within a day.
 773
 774 @item weekday
 775 At least the first two characters of an English weekday name.
 776
 777 @item spaces
 778 Zero or more characters of white space.
 779
 780 @item sign
 781 An optional positive or negative sign.
 782
 783 @item trailer
 784 All formats accept an optional white space trailer.
 785 @end table
 786
 787 The date input formats are strung together from the above pieces.  On
 788 output, the date formats are always printed in a single canonical
 789 manner, based on field width.  The date input and output formats are
 790 described below:
 791
 792 @table @asis
 793 @item DATEw: 9 <= iw,ow <= 40
 794 Date format.  Input format: leader + day + date-delimiter +
 795 month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
 796 @var{w} < 11, DD-MMM-YYYY otherwise.
 797
 798 @item EDATEw: 8 <= iw,ow <= 40
 799 European date format.  Input format same as DATE.  Output format:
 800 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
 801
 802 @item SDATEw: 8 <= iw,ow <= 40
 803 Standard date format.  Input format: leader + year + date-delimiter +
 804 month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
 805 @var{w} < 10, YYYY/MM/DD otherwise.
 806
 807 @item ADATEw: 8 <= iw,ow <= 40
 808 American date format.  Input format: leader + month + date-delimiter +
 809 day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
 810 @var{w} < 10, MM/DD/YYYY otherwise.
 811
 812 @item JDATEw: 5 <= iw,ow <= 40
 813 Julian date format.  Input format: leader + julian + trailer.  Output
 814 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
 815
 816 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
 817 Quarter/year format.  Input format: leader + quarter + q-delimiter +
 818 year + trailer.  Output format: @samp{Q Q YY} for @var{w} < 8,
 819 @samp{Q Q YYYY} otherwise, where the first @samp{Q} is one of the
 820 digits 1, 2, 3, or 4.
 821
 822 @item MOYRw: 6 <= iw,ow <= 40
 823 Month/year format.  Input format: leader + month + date-delimiter + year
 824 + trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
 825 YYYY} otherwise.
 826
 827 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
 828 Week/year format.  Input format: leader + week + wk-delimiter + year +
 829 trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
 830 YYYY} otherwise.
 831
 832 @item DATETIMEw.d: 17 <= iw,ow <= 40
 833 Date and time format.  Input format: leader + day + date-delimiter +
 834 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
 835 + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
 836 @var{w} > 19 then seconds @samp{:SS} are added.  If @var{w} > 22 and
 837 @var{d} > 0 then fractional seconds @samp{.SS} are added.
 838
 839 @item TIMEw.d: 5 <= iw,ow <= 40
 840 Time format.  Input format: leader + sign + spaces + hour +
 841 time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
 842 Seconds and fractional seconds are available with @var{w} of at least 8
 843 and 10, respectively.
 844
 845 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
 846 Time format with day count.  Input format: leader + sign + spaces +
 847 day-count + time-delimiter + hour + time-delimiter + minute +
 848 opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
 849 seconds are available with @var{w} of at least 8 and 10, respectively.
 850
 851 @item WKDAYw: 2 <= iw,ow <= 40
 852 A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
 853 leader + weekday + trailer.  Output format: as many characters, in all
 854 capital letters, of the English name of the weekday as will fit in the
 855 field width.
 856
 857 @item MONTHw: 3 <= iw,ow <= 40
 858 A month as a number between 1 and 12, where 1 is January.  Input format:
 859 leader + month + trailer.  Output format: as many characters, in all
 860 capital letters, of the English name of the month as will fit in the
 861 field width.
 862 @end table
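
As a brief illustration of the difference between input and output
formats, the following sketch reads dates in American order and then
displays them with @code{DATE}.  The variable name @code{visit} and the
data values are made up for this example:

@example
DATA LIST LIST /visit (ADATE10).
BEGIN DATA.
10/31/2004
1/2/2005
END DATA.
FORMATS visit (DATE11).
LIST.
@end example

Because @code{DATE11} has a width of at least 11, the table above
indicates that the values are shown in DD-MMM-YYYY form.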
 863
 864 There are only two formats that may be used with string variables:
 865
 866 @table @asis
 867 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
 868 The entire field is treated as a string value.
 869
 870 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
 871 The field is composed of characters in a string encoded as textual hex
 872 digit pairs.
 873
 874 The default output @var{w} is half the input @var{w}.
 875 @end table
 876
 877 @node Scratch Variables,  , Input/Output Formats, Variables
 878 @subsection Scratch Variables
 879
 880 Most of the time, variables don't retain their values between cases.
 881 Instead, a variable read from a data file or the active file takes on
 882 the value read, and a variable created with
 883 @cmd{COMPUTE} or
 884 another transformation is initialized to the system-missing value
 885 or to blanks, depending on type.
 886
 887 However, sometimes it's useful to have a variable that keeps its value
 888 between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
 889 use a @dfn{scratch variable}.  Scratch variables are variables whose
 890 names begin with an octothorpe (@samp{#}).
 891
 892 Scratch variables have the same properties as variables left with
 893 @cmd{LEAVE}: they retain their values between cases, and for the first
 894 case they are initialized to 0 or blanks.  They have the additional
 895 property that they are deleted before the execution of any procedure.
 896 For this reason, scratch variables can't be used directly for analysis.
 897 To analyze the values of a scratch variable, use @cmd{COMPUTE} (@pxref{COMPUTE})
 898 to copy them into an ordinary variable, then use that ordinary
 899 variable in the analysis.
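
For instance, a running total can be kept in a scratch variable and
copied into an ordinary variable for later use.  The names @code{x},
@code{#sum}, and @code{total} below are chosen only for illustration:

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.
COMPUTE #sum = #sum + x.
COMPUTE total = #sum.
LIST.
@end example

Since @code{#sum} starts at 0 and keeps its value from case to case,
@code{total} should hold 1, 3, and 6 for the three cases.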
 900
 901 @node Files, BNF, Variables, Language
 902 @section Files Used by PSPP
 903
 904 PSPP makes use of many files each time it runs.  Some of these it
 905 reads, some it writes, some it creates.  Here is a table listing the
 906 most important of these files:
 907
 908 @table @strong
 909 @cindex file, command
 910 @cindex file, syntax file
 911 @cindex command file
 912 @cindex syntax file
 913 @item command file
 914 @itemx syntax file
 915 These names (synonyms) refer to the file that contains instructions
 916 that tell PSPP what to do.  The syntax file's name is specified on
 917 the PSPP command line.  Syntax files can also be pulled in with
 918 @cmd{INCLUDE} (@pxref{INCLUDE}).
 919
 920 @cindex file, data
 921 @cindex data file
 922 @item data file
 923 Data files contain raw data in ASCII format suitable for being read in
 924 by @cmd{DATA LIST}.  Data can be embedded in the syntax
 925 file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
 926 syntax file a data file too.
 927
 928 @cindex file, output
 929 @cindex output file
 930 @item listing file
 931 One or more output files, also called listing files, are created by
 932 PSPP each time it is run.  They receive the tables and charts produced
 933 by statistical procedures.  The output files may be in any number of
 934 formats, depending on how PSPP is configured.
 935
 936 @cindex active file
 937 @cindex file, active
 938 @item active file
 939 The active file is the ``file'' on which all PSPP procedures
 940 are performed.  The active file contains variable definitions and
 941 cases.  The active file is not necessarily a disk file: it is stored
 942 in memory if there is room.
 943 @end table
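
The two ways of supplying raw data can be sketched as follows.  The
file name @file{survey.dat} and the variable names are hypothetical.  In
the first fragment the data is kept in a separate data file; in the
second, it is embedded in the syntax file itself:

@example
DATA LIST LIST FILE='survey.dat' /height weight.
LIST.
@end example

@example
DATA LIST LIST /height weight.
BEGIN DATA.
172 68
165 59
END DATA.
LIST.
@end example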
 944
 945 @node BNF,  , Files, Language
 946 @section Backus-Naur Form
 947 @cindex BNF
 948 @cindex Backus-Naur Form
 949 @cindex command syntax, description of
 950 @cindex description of command syntax
 951
 952 The syntax of some parts of the PSPP language is presented in this
 953 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF.  The
 954 following list describes BNF:
 955
 956 @itemize @bullet
 957 @cindex keywords
 958 @cindex terminals
 959 @item
 960 Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
 961 often called @dfn{terminals}.  There are some special terminals, which
 962 are written in lowercase for clarity:
 963
 964 @table @asis
 965 @cindex @code{number}
 966 @item @code{number}
 967 A real number.
 968
 969 @cindex @code{integer}
 970 @item @code{integer}
 971 An integer number.
 972
 973 @cindex @code{string}
 974 @item @code{string}
 975 A string.
 976
 977 @cindex @code{var-name}
 978 @item @code{var-name}
 979 A single variable name.
 980
 981 @cindex operators
 982 @cindex punctuators
 983 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
 984 Operators and punctuators.
 985
 986 @cindex @code{.}
 987 @item @code{.}
 988 The end of the command.  This is not necessarily an actual dot in the
 989 syntax file.  @xref{Commands}, for more details.
 990 @end table
 991
 992 @item
 993 @cindex productions
 994 @cindex nonterminals
 995 Other words in all lowercase refer to BNF definitions, called
 996 @dfn{productions}.  The names that productions define are known as
 997 @dfn{nonterminals}.  Some nonterminals are very common, so they are
 998 defined here in English for clarity:
 999
1000 @table @code
1001 @cindex @code{var-list}
1002 @item var-list
1003 A list of one or more variable names or the keyword @code{ALL}.
1004
1005 @cindex @code{expression}
1006 @item expression
1007 An expression.  @xref{Expressions}, for details.
1008 @end table
1009
1010 @item
1011 @cindex ``is defined as''
1012 @cindex productions
1013 @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
1014 the name of the nonterminal being defined.  The right side of @samp{::=}
1015 gives the definition of that nonterminal.  If the right side is empty,
1016 then one possible expansion of that nonterminal is nothing.  A BNF
1017 definition is called a @dfn{production}.
1018
1019 @item
1020 @cindex terminals and nonterminals, differences
1021 So, the key difference between a terminal and a nonterminal is that a
1022 terminal cannot be broken into smaller parts---in fact, every terminal
1023 is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
1024 composed of a (possibly empty) sequence of terminals and nonterminals.
1025 Thus, terminals indicate the deepest level of syntax description.  (In
1026 parsing theory, terminals are the leaves of the parse tree; nonterminals
1027 form the branches.)
1028
1029 @item
1030 @cindex start symbol
1031 @cindex symbol, start
1032 The first nonterminal defined in a set of productions is called the
1033 @dfn{start symbol}.  The start symbol defines the entire syntax for
1034 the command being described.
1035 @end itemize
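
As an illustration of the notation, here is a small set of productions
for a made-up command.  @code{EXAMPLE} is not a real PSPP command; it
and the nonterminal names @code{example-cmd} and @code{opt-width} are
invented for this sketch:

@example
example-cmd ::= EXAMPLE var-list opt-width .
opt-width   ::=
opt-width   ::= WIDTH = integer
@end example

Here @code{example-cmd} is the start symbol.  @code{EXAMPLE} and
@code{WIDTH} are keyword terminals, @code{=} is an operator,
@code{integer} and @code{.} are special terminals, and @code{var-list}
is one of the common nonterminals defined above.  Because one
production for @code{opt-width} has an empty right side,
@code{opt-width} may expand to nothing.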
1036 @setfilename ignored