doc/language.texi

   1 @node Language, Expressions, Invocation, Top
   2 @chapter The PSPP language
   3 @cindex language, PSPP
   4 @cindex PSPP, language
   5
   6 @quotation
   7 @strong{Please note:} PSPP is not even close to completion.
   8 Only a few actual statistical procedures are implemented.  PSPP
   9 is a work in progress.
  10 @end quotation
  11
  12 This chapter discusses elements common to many PSPP commands.
  13 Later chapters will describe individual commands in detail.
  14
  15 @menu
  16 * Tokens::                      Characters combine to form tokens.
  17 * Commands::                    Tokens combine to form commands.
  18 * Types of Commands::           Commands come in several flavors.
  19 * Order of Commands::           Commands combine to form syntax files.
  20 * Missing Observations::        Handling missing observations.
  21 * Variables::                   The unit of data storage.
  22 * Files::                       Files used by PSPP.
  23 * BNF::                         How command syntax is described.
  24 @end menu
  25
  26 @node Tokens, Commands, Language, Language
  27 @section Tokens
  28 @cindex language, lexical analysis
  29 @cindex language, tokens
  30 @cindex tokens
  31 @cindex lexical analysis
  32 @cindex lexemes
  33
  34 PSPP divides most syntax file lines into a series of short chunks
  35 called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}.  These
  36 tokens are then grouped to form commands, each of which tells
  37 PSPP to take some action---read in data, write out data, perform
  38 a statistical procedure, etc.  The process of dividing input into tokens
  39 is @dfn{tokenization}, or @dfn{lexical analysis}.  Each type of token is
  40 described below.
  41
  42 @cindex delimiters
  43 @cindex whitespace
  44 Tokens must be separated from each other by @dfn{delimiters}.
  45 Delimiters include whitespace (spaces, tabs, carriage returns, line
  46 feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
  47 operators (plus, minus, times, divide, etc.).  Note that while whitespace
  48 only separates tokens, other delimiters are tokens in themselves.
  49
  50 @table @strong
  51 @cindex identifiers
  52 @item Identifiers
  53 Identifiers are names that specify variable names, commands, or command
  54 details.
  55
  56 @itemize @bullet
  57 @item
  58 The first character in an identifier must be a letter, @samp{#}, or
  59 @samp{@@}.  Some system identifiers begin with @samp{$}, but
  60 user-defined variables' names may not begin with @samp{$}.
  61
  62 @item
  63 The remaining characters in the identifier must be letters, digits, or
  64 one of the following special characters:
  65
  66 @example
  67 .  _  $  #  @@
  68 @end example
  69
  70 @item
  71 @cindex variable names
  72 @cindex names, variable
  73 Variable names may be any length, but only the first 8 characters are
  74 significant.
  75
  76 @item
  77 @cindex case-sensitivity
  78 Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
  79 @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
  80 representations of the same identifier.
  81
  82 @item
  83 @cindex keywords
  84 Identifiers other than variable names may be abbreviated to their first
  85 3 characters if this abbreviation is unambiguous.  These identifiers are
  86 often called @dfn{keywords}.  (Unique abbreviations of 3 or more
  87 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
  88 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
  89
  90 @item
  91 Whether an identifier is a keyword depends on the context.
  92
  93 @item
  94 @cindex keywords, reserved
  95 @cindex reserved keywords
  96 Some keywords are reserved.  These keywords may not be used in any
  97 context besides those explicitly described in this manual.  The reserved
  98 keywords are:
  99
 100 @example
 101 ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
 102 @end example
 103
 104 @item
 105 Since keywords are identifiers, all the rules for identifiers apply.
 106 Specifically, they must be delimited as are other identifiers:
 107 @code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
 108 variable name.
 109 @end itemize
 110
 111 @cindex @samp{.}
 112 @cindex period
 113 @cindex variable names, ending with period
 114 @strong{Caution:} It is legal to end a variable name with a period, but
 115 @emph{don't do it!}  The variable name will be misinterpreted when it is
 116 the final token on a line: @code{FOO.} will be divided into two separate
 117 tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
 118 @xref{Commands, , Forming commands of tokens}.
 119
 120 @item Numbers
 121 @cindex numbers
 122 @cindex integers
 123 @cindex reals
 124 Numbers may be specified as integers or reals.  Integers are internally
 125 converted into reals.  Scientific notation is not supported.  Here are
 126 some examples of valid numbers:
 127
 128 @example
 129 1234  3.14159265359  .707106781185  8945.
 130 @end example
 131
 132 @strong{Caution:} The last example will be interpreted as two tokens,
 133 @samp{8945} and @samp{.}, if it is the last token on a line.
 134
 135 @item Strings
 136 @cindex strings
 137 @cindex @samp{'}
 138 @cindex @samp{"}
 139 @cindex case-sensitivity
 140 Strings are literal sequences of characters enclosed in pairs of single
 141 quotes (@samp{'}) or double quotes (@samp{"}).
 142
 143 @itemize @bullet
 144 @item
 145 Whitespace and case of letters @emph{are} significant inside strings.
 146 @item
 147 Whitespace characters inside a string are not delimiters.
 148 @item
 149 To include single-quote characters in a string, enclose the string in
 150 double quotes.
 151 @item
 152 To include double-quote characters in a string, enclose the string in
 153 single quotes.
 154 @item
 155 It is not possible to put both single- and double-quote characters
 156 inside one string.
 157 @end itemize
 158
 159 @item Hexstrings
 160 @cindex hexstrings
 161 Hexstrings are string variants that use hex digits to specify
 162 characters.
 163
 164 @itemize @bullet
 165 @item
 166 A hexstring may be used anywhere that an ordinary string is allowed.
 167
 168 @item
 169 @cindex @samp{X'}
 170 @cindex @samp{'}
 171 A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
 172
 173 @cindex whitespace
 174 @item
 175 No whitespace is allowed between the initial @samp{X} and @samp{'}.
 176
 177 @item
 178 Double quotes @samp{"} may be used in place of single quotes @samp{'} if
 179 done in both places.
 180
 181 @item
 182 Each pair of hex digits is internally changed into a single character
 183 with the given value.
 184
 185 @item
 186 If there is an odd number of hex digits, the missing last digit is
 187 assumed to be @samp{0}.
 188
 189 @item
 190 @cindex portability
 191 @strong{Please note:} Use of hexstrings is nonportable because the same
 192 numeric values are associated with different glyphs by different
 193 operating systems.  Therefore, their use should be confined to syntax
 194 files that will not be widely distributed.
 195
 196 @item
 197 @cindex characters, reserved
 198 @cindex 0
 199 @cindex whitespace
 200 @strong{Please note also:} The character with value 00 is reserved for
 201 internal use by PSPP.  Its use in strings causes an error and
 202 replacement with a blank space (in ASCII, hex 20, decimal 32).
 203 @end itemize
 204
 205 @item Punctuation
 206 @cindex punctuation
 207 Punctuation separates tokens; punctuators are delimiters.  These are the
 208 punctuation characters:
 209
 210 @example
 211 ,  /  =  (  )
 212 @end example
 213
 214 @item Operators
 215 @cindex operators
 216 Operators describe mathematical operations.  Some operators are delimiters:
 217
 218 @example
 219 (  )  +  -  *  /  **
 220 @end example
 221
 222 Many of the above operators are also punctuators.  Punctuators are
 223 distinguished from operators by context.
 224
 225 The other operators are all reserved keywords.  None of these are
 226 delimiters:
 227
 228 @example
 229 AND  EQ  GE  GT  LE  LT  NE  NOT  OR
 230 @end example
 231
 232 @item Terminal Dot
 233 @cindex terminal dot
 234 @cindex dot, terminal
 235 @cindex period
 236 @cindex @samp{.}
 237 A period (@samp{.}) at the end of a line (ignoring trailing whitespace) is one
 238 type of @dfn{terminal dot}, although not every terminal dot is a
 239 period at the end of a line.  @xref{Commands, , Forming commands of
 240 tokens}.  A period is a terminal dot @emph{only}
 241 when it is at the end of a line; otherwise it is part of a
 242 floating-point number.  (A period outside a number in the middle of a
 243 line is an error.)
 244
 245 @quotation
 246 @cindex terminal dot, changing
 247 @cindex dot, terminal, changing
 248 @strong{Please note:} The character used for the @dfn{terminal dot}
 249 can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}).  This
 250 is strongly discouraged, and throughout the remainder of this
 251 manual it will be assumed that the default setting is in effect.
 252 @end quotation
 253
 254 @end table
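
The identifier rules above can be sketched in Python.  This is only an
illustration, not PSPP's lexer: the function names are hypothetical,
``letter'' is assumed to mean an ASCII letter, and the check that an
abbreviation is unambiguous among all keywords is left out.

```python
import re

# First character: a letter, '#', or '@' ('$' only for system identifiers).
# Remaining characters: letters, digits, or one of . _ $ # @
# Assumption: "letter" means an ASCII letter.
_IDENT_RE = re.compile(r"[A-Za-z#@$][A-Za-z0-9._$#@]*\Z")

RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}


def is_identifier(text):
    """True if text follows the identifier rules listed above."""
    return bool(_IDENT_RE.match(text))


def is_user_variable_name(text):
    """True if text may name a user-defined variable."""
    return (is_identifier(text)
            and not text.startswith("$")
            and text.upper() not in RESERVED)


def same_variable(a, b):
    """Compare two names ignoring case, using only the first 8 characters."""
    return a[:8].upper() == b[:8].upper()


def matches_keyword(abbrev, keyword):
    """True if abbrev is the keyword or a prefix of it of 3 or more characters."""
    a, k = abbrev.upper(), keyword.upper()
    return a == k or (len(a) >= 3 and k.startswith(a))
```

Under these assumptions @code{WITHOUT} passes
@code{is_user_variable_name} while @code{WITH} does not, because the
reserved-word test applies to the whole identifier.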
 255
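The hexstring rules described under Tokens can be illustrated with a
short Python sketch.  The function name @code{decode_hexstring} is
hypothetical, and the sketch silently substitutes a space for the
reserved value 00 instead of also reporting an error as PSPP is
described as doing.

```python
import string


def decode_hexstring(token):
    """Decode a hexstring token such as X'4142' into bytes."""
    if len(token) < 3 or token[0] not in "Xx":
        raise ValueError("a hexstring begins with X or x followed by a quote")
    quote = token[1]
    if quote not in "'\"" or token[-1] != quote:
        raise ValueError("the same quote character must open and close")
    digits = token[2:-1]
    if any(c not in string.hexdigits for c in digits):
        raise ValueError("only hex digits are allowed between the quotes")
    if len(digits) % 2:
        digits += "0"              # a missing last digit is assumed to be 0
    data = bytes.fromhex(digits)   # each pair of digits becomes one character
    # The value 00 is reserved; it is replaced with a blank (hex 20).
    return data.replace(b"\x00", b" ")
```

Which glyph each resulting byte maps to still depends on the host
character set, which is why hexstrings are described above as
nonportable.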
 256 @node Commands, Types of Commands, Tokens, Language
 257 @section Forming commands of tokens
 258
 259 @cindex PSPP, command structure
 260 @cindex language, command structure
 261 @cindex commands, structure
 262
 263 Most PSPP commands share a common structure, diagrammed below:
 264
 265 @example
 266 @var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
 267 @end example
 268
 269 @cindex @samp{[  ]}
 270 In the above, rather daunting, expression, pairs of square brackets
 271 (@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
 272 indicate parts of the syntax that vary from command to command.
 273 Ellipses (@samp{...}) indicate that the preceding part may be repeated
 274 an arbitrary number of times.  Let's pick apart what it says above:
 275
 276 @itemize @bullet
 277 @cindex commands, names
 278 @item
 279 A command begins with a command name of one or more keywords, such as
 280 @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}.  @var{cmd}
 281 may be abbreviated to its first word if that is unambiguous; each word
 282 in @var{cmd} may be abbreviated to a unique prefix of three or more
 283 characters as described above.
 284
 285 @cindex subcommands
 286 @item
 287 The command name may be followed by one or more @dfn{subcommands}:
 288
 289 @itemize @minus
 290 @item
 291 Each subcommand begins with a unique keyword, indicated by @var{sbc}
 292 above.  This is analogous to the command name.
 293
 294 @item
 295 The subcommand name is optionally followed by an equals sign (@samp{=}).
 296
 297 @item
 298 Some subcommands accept a series of one or more specifications
 299 (@var{spec}), optionally separated by commas.
 300
 301 @item
 302 Each subcommand must be separated from the next (if any) by a forward
 303 slash (@samp{/}).
 304 @end itemize
 305
 306 @cindex dot, terminal
 307 @cindex terminal dot
 308 @item
 309 Each command must be terminated with a @dfn{terminal dot}.
 310 The terminal dot may be given in one of three ways:
 311
 312 @itemize @minus
 313 @item
 314 (most commonly) A period character at the very end of a line, as
 315 described above.
 316
 317 @item
 318 (only if NULLINE is on; see @ref{SET, , Setting user preferences}, for more
 319 details.)  A completely blank line.
 320
 321 @item
 322 (in batch mode only) Any line that is not indented from the left side of
 323 the page causes a terminal dot to be inserted before that line.
 324 Therefore, each command begins with a line that is flush left, followed
 325 by zero or more lines that are indented one or more characters from the
 326 left margin.
 327
 328 In batch mode, PSPP will ignore a plus sign, minus sign, or period
 329 (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
 330 line.  Any of these characters as the first character on a line will
 331 begin a new command.  This allows for visual indentation of a command
 332 without that command being considered part of the previous command.
 333
 334 PSPP is in batch mode when it is reading input from a file, rather
 335 than from an interactive user.  Note that the other forms of the
 336 terminal dot may also be used in batch mode.
 337
 338 Sometimes, one encounters syntax files that are intended to be
 339 interpreted in interactive mode rather than batch mode (for instance,
 340 this can happen if a session log file is used directly as a syntax
 341 file).  When this occurs, use the @samp{-i} command line option to force
 342 interpretation in interactive mode (@pxref{Language control options}).
 343 @end itemize
 344 @end itemize
 345
 346 PSPP ignores empty commands when they are generated by the above
 347 rules.  Note that, as a consequence of these rules, each command must
 348 begin on a new line.
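
The three ways of giving a terminal dot in batch mode can be sketched as
a line-grouping routine.  This is a rough illustration, not PSPP's
reader: @code{split_batch_commands} is a hypothetical name, the
@code{nulline} argument stands in for the NULLINE setting, and the
default period is assumed for the terminal dot.

```python
def split_batch_commands(lines, nulline):
    """Group syntax lines into commands using the batch-mode rules above."""
    commands, current = [], []

    def finish():
        text = " ".join(current).strip()
        if text:                       # empty commands are ignored
            commands.append(text)
        current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line:
            if nulline:                # a blank line ends the command
                finish()
            continue
        if line[0] in "+-.":
            finish()                   # the marker begins a new command ...
            line = " " + line[1:]      # ... and is itself ignored
        elif not line[0].isspace():
            finish()                   # a flush-left line begins a new command
        body = line.strip()
        ends_here = body.endswith(".")
        if ends_here:
            body = body[:-1].rstrip()
        if body:
            current.append(body)
        if ends_here:                  # a period at the very end of the line
            finish()
    finish()
    return commands
```

The sketch joins continuation lines with single spaces for readability;
it does no tokenization, so a period inside a quoted string at the end
of a line would be misread.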
 349
 350 @node Types of Commands, Order of Commands, Commands, Language
 351 @section Types of Commands
 352
 353 Commands in PSPP are divided roughly into six categories:
 354
 355 @table @strong
 356 @item Utility commands
 357 @cindex utility commands
 358 Set or display various global options that affect PSPP operations.
 359 May appear anywhere in a syntax file.  @xref{Utilities, , Utility
 360 commands}.
 361
 362 @item File definition commands
 363 @cindex file definition commands
 364 Give instructions for reading data from text files or from special
 365 binary ``system files''.  Most of these commands discard any previous
 366 data or variables and replace them with the new data and
 367 variables.  At least one must appear before the first command in any of
 368 the categories below.  @xref{Data Input and Output}.
 369
 370 @item Input program commands
 371 @cindex input program commands
 372 Though rarely used, these provide powerful tools for reading data files
 373 in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
 374
 375 @item Transformations
 376 @cindex transformations
 377 Perform operations on data and write data to output files.  Transformations
 378 are not carried out until a procedure is executed.
 379
 380 @item Restricted transformations
 381 @cindex restricted transformations
 382 Same as transformations for most purposes.  @xref{Order of Commands}, for a
 383 detailed description of the differences.
 384
 385 @item Procedures
 386 @cindex procedures
 387 Analyze data, writing results of analyses to the listing file.  Cause
 388 transformations specified earlier in the file to be performed.  In a
 389 more general sense, a @dfn{procedure} is any command that causes the
 390 active file (the data) to be read.
 391 @end table
 392
 393 @node Order of Commands, Missing Observations, Types of Commands, Language
 394 @section Order of Commands
 395 @cindex commands, ordering
 396 @cindex order of commands
 397
 398 PSPP does not place many restrictions on ordering of commands.
 399 The main restriction is that variables must be defined with one of the
 400 file-definition commands before they are otherwise referred to.
 401
 402 Of course, there are specific rules, for those who are interested.
 403 PSPP possesses five internal states, called initial, INPUT PROGRAM,
 404 FILE TYPE, transformation, and procedure states.  (Please note the
 405 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
 406 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
 407
 408 PSPP starts up in the initial state.  Each successful completion
 409 of a command may cause a state transition.  Each type of command has its
 410 own rules for state transitions:
 411
 412 @table @strong
 413 @item Utility commands
 414 @itemize @bullet
 415 @item
 416 Legal in all states.
 417 @item
 418 Do not cause state transitions.  Exception: when @cmd{N OF CASES}
 419 is executed in the procedure state, it causes a transition to the
 420 transformation state.
 421 @end itemize
 422
 423 @item @cmd{DATA LIST}
 424 @itemize @bullet
 425 @item
 426 Legal in all states.
 427 @item
 428 When executed in the initial or procedure state, causes a transition to
 429 the transformation state.
 430 @item
 431 Clears the active file if executed in the procedure or transformation
 432 state.
 433 @end itemize
 434
 435 @item @cmd{INPUT PROGRAM}
 436 @itemize @bullet
 437 @item
 438 Invalid in INPUT PROGRAM and FILE TYPE states.
 439 @item
 440 Causes a transition to the INPUT PROGRAM state.
 441 @item
 442 Clears the active file.
 443 @end itemize
 444
 445 @item @cmd{FILE TYPE}
 446 @itemize @bullet
 447 @item
 448 Invalid in INPUT PROGRAM and FILE TYPE states.
 449 @item
 450 Causes a transition to the FILE TYPE state.
 451 @item
 452 Clears the active file.
 453 @end itemize
 454
 455 @item Other file definition commands
 456 @itemize @bullet
 457 @item
 458 Invalid in INPUT PROGRAM and FILE TYPE states.
 459 @item
 460 Cause a transition to the transformation state.
 461 @item
 462 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
 463 and @cmd{UPDATE}.
 464 @end itemize
 465
 466 @item Transformations
 467 @itemize @bullet
 468 @item
 469 Invalid in initial and FILE TYPE states.
 470 @item
 471 Cause a transition to the transformation state.
 472 @end itemize
 473
 474 @item Restricted transformations
 475 @itemize @bullet
 476 @item
 477 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 478 @item
 479 Cause a transition to the transformation state.
 480 @end itemize
 481
 482 @item Procedures
 483 @itemize @bullet
 484 @item
 485 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 486 @item
 487 Cause a transition to the procedure state.
 488 @end itemize
 489 @end table
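
The transition rules in the table above amount to a small state
machine.  The following Python sketch encodes them literally; the
names @code{RULES} and @code{next_state} are hypothetical, the command
kinds are passed in as plain strings, and clearing of the active file
is not modeled.

```python
INITIAL = "initial"
INPUT_PROGRAM = "INPUT PROGRAM"
FILE_TYPE = "FILE TYPE"
TRANSFORMATION = "transformation"
PROCEDURE = "procedure"

# Command kind -> (states where it is invalid, state entered on success).
# A target of None means "stay put unless a special case below applies".
RULES = {
    "utility": (set(), None),
    "DATA LIST": (set(), None),
    "INPUT PROGRAM": ({INPUT_PROGRAM, FILE_TYPE}, INPUT_PROGRAM),
    "FILE TYPE": ({INPUT_PROGRAM, FILE_TYPE}, FILE_TYPE),
    "other file definition": ({INPUT_PROGRAM, FILE_TYPE}, TRANSFORMATION),
    "transformation": ({INITIAL, FILE_TYPE}, TRANSFORMATION),
    "restricted transformation":
        ({INITIAL, INPUT_PROGRAM, FILE_TYPE}, TRANSFORMATION),
    "procedure": ({INITIAL, INPUT_PROGRAM, FILE_TYPE}, PROCEDURE),
}


def next_state(state, kind, name=""):
    """Return the state after a successful command, or raise if it is invalid."""
    invalid_in, target = RULES[kind]
    if state in invalid_in:
        raise ValueError(f"{kind} command is not valid in the {state} state")
    # Exception for utilities: N OF CASES leaves the procedure state.
    if kind == "utility" and name.upper() == "N OF CASES" and state == PROCEDURE:
        return TRANSFORMATION
    # DATA LIST changes state only from the initial or procedure state.
    if kind == "DATA LIST" and state in (INITIAL, PROCEDURE):
        return TRANSFORMATION
    return state if target is None else target
```

For example, starting from @code{INITIAL}, a @cmd{DATA LIST} followed
by a procedure passes through the transformation state and ends in the
procedure state, while a procedure issued first raises an error.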
 490
 491 @node Missing Observations, Variables, Order of Commands, Language
 492 @section Handling missing observations
 493 @cindex missing values
 494 @cindex values, missing
 495
 496 PSPP includes special support for unknown numeric data values.
 497 Missing observations are assigned a special value, called the
 498 @dfn{system-missing value}.  This ``value'' actually indicates the
 499 absence of value; it means that the actual value is unknown.  Procedures
 500 automatically exclude from analyses those observations or cases that
 501 have missing values.  Whether single observations or entire cases are
 502 excluded depends on the procedure.
 503
 504 The system-missing value exists only for numeric variables.  String
 505 variables always have a defined value, even if it is only a string of
 506 spaces.
 507
 508 Variables, whether numeric or string, can have designated
 509 @dfn{user-missing values}.  Every user-missing value is an actual value
 510 for that variable.  However, most of the time user-missing values are
 511 treated in the same way as the system-missing value.  String variables
 512 that are wider than a certain width, usually 8 characters (depending on
 513 computer architecture), cannot have user-missing values.
 514
 515 For more information on missing values, see the following sections:
 516 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
 517 documentation on individual procedures for information on how they
 518 handle missing values.
 519
 520 @node Variables, Files, Missing Observations, Language
 521 @section Variables
 522 @cindex variables
 523 @cindex dictionary
 524
 525 Variables are the basic unit of data storage in PSPP.  All the
 526 variables in a file taken together, apart from any associated data, are
 527 said to form a @dfn{dictionary}.
 528 Some details of variables are described in the sections below.
 529
 530 @menu
 531 * Attributes::                  Attributes of variables.
 532 * System Variables::            Variables automatically defined by PSPP.
 533 * Sets of Variables::           Lists of variable names.
 534 * Input/Output Formats::        Input and output formats.
 535 * Scratch Variables::           Variables deleted by procedures.
 536 @end menu
 537
 538 @node Attributes, System Variables, Variables, Variables
 539 @subsection Attributes of Variables
 540 @cindex variables, attributes of
 541 @cindex attributes of variables
 542 Each variable has a number of attributes, including:
 543
 544 @table @strong
 545 @item Name
 546 This is an identifier.  Each variable must have a different name.
 547 @xref{Tokens}.
 548
 549 @cindex variables, type
 550 @cindex type of variables
 551 @item Type
 552 Numeric or string.
 553
 554 @cindex variables, width
 555 @cindex width of variables
 556 @item Width
 557 (string variables only) String variables with a width of 8 characters or
 558 fewer are called @dfn{short string variables}.  Short string variables
 559 can be used in many procedures where @dfn{long string variables} (those
 560 with widths greater than 8) are not allowed.
 561
 562 @quotation
 563 @strong{Please note:} Certain systems may consider strings longer than 8
 564 characters to be short strings.  Eight characters represents a minimum
 565 figure for the maximum length of a short string.
 566 @end quotation
 567
 568 @item Position
 569 Variables in the dictionary are arranged in a specific order.
 570 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
 571
 572 @item Initialization
 573 Either reinitialized to 0 or spaces for each case, or left at its
 574 existing value.  @xref{LEAVE}.
 575
 576 @cindex missing values
 577 @cindex values, missing
 578 @item Missing values
 579 Optionally, up to three values, or a range of values, or a specific
 580 value plus a range, can be specified as @dfn{user-missing values}.
 581 There is also a @dfn{system-missing value} that is assigned to an
 582 observation when there is no other obvious value for that observation.
 583 Observations with missing values are automatically excluded from
 584 analyses.  User-missing values are actual data values, while the
 585 system-missing value is not a value at all.  @xref{Missing Observations}.
 586
 587 @cindex variable labels
 588 @cindex labels, variable
 589 @item Variable label
 590 A string that describes the variable.  @xref{VARIABLE LABELS}.
 591
 592 @cindex value labels
 593 @cindex labels, value
 594 @item Value label
 595 Optionally, these associate each possible value of the variable with a
 596 string.  @xref{VALUE LABELS}.
 597
 598 @cindex print format
 599 @item Print format
 600 Display width, format, and (for numeric variables) number of decimal
 601 places.  This attribute does not affect how data are stored, just how
 602 they are displayed.  Example: a width of 8, with 2 decimal places.
 603 @xref{PRINT FORMATS}.
 604
 605 @cindex write format
 606 @item Write format
 607 Similar to print format, but used by certain commands that are
 608 designed to write to binary files.  @xref{WRITE FORMATS}.
 609 @end table
 610
 611 @node System Variables, Sets of Variables, Attributes, Variables
 612 @subsection Variables Automatically Defined by PSPP
 613 @cindex system variables
 614 @cindex variables, system
 615
 616 There are seven system variables.  These are not like ordinary
 617 variables, as they are not stored in each case.  They can only be used
 618 in expressions.  These system variables, whose values and output formats
 619 cannot be modified, are described below.
 620
 621 @table @code
 622 @cindex @code{$CASENUM}
 623 @item $CASENUM
 624 Case number of the current case.  This changes as cases are
 625 shuffled around.
 626
 627 @cindex @code{$DATE}
 628 @item $DATE
 629 Date the PSPP process was started, in format A9, following the
 630 pattern @code{DD MMM YY}.
 631
 632 @cindex @code{$JDATE}
 633 @item $JDATE
 634 Number of days between 15 Oct 1582 and the time the PSPP process
 635 was started.
 636
 637 @cindex @code{$LENGTH}
 638 @item $LENGTH
 639 Page length, in lines, in format F11.
 640
 641 @cindex @code{$SYSMIS}
 642 @item $SYSMIS
 643 System missing value, in format F1.
 644
 645 @cindex @code{$TIME}
 646 @item $TIME
 647 Number of seconds between midnight 14 Oct 1582 and the time the active file
 648 was read, in format F20.
 649
 650 @cindex @code{$WIDTH}
 651 @item $WIDTH
 652 Page width, in characters, in format F3.
 653 @end table
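
As a worked illustration of the @code{$JDATE} and @code{$TIME}
descriptions, the following Python sketch computes both offsets with
the standard @code{datetime} module.  The function names are
hypothetical, and the sketch follows the wording above literally
(whole days counted from 15 Oct 1582, seconds counted from midnight
14 Oct 1582); it is not taken from PSPP's implementation.

```python
from datetime import date, datetime


def jdate(day):
    """Days between 15 Oct 1582 and the given date."""
    return (day - date(1582, 10, 15)).days


def time_value(moment):
    """Seconds between midnight 14 Oct 1582 and the given moment."""
    return (moment - datetime(1582, 10, 14)).total_seconds()
```

Python's @code{datetime} uses the Gregorian calendar throughout, which
matches the calendar in force from 15 Oct 1582 onward.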
 654
 655 @node Sets of Variables, Input/Output Formats, System Variables, Variables
 656 @subsection Lists of variable names
 657 @cindex TO convention
 658 @cindex convention, TO
 659
 660 There are several ways to specify a set of variables:
 661
 662 @enumerate
 663 @item
 664 (Most commonly.)  List the variable names one after another, optionally
 665 separating them by commas.
 666
 667 @cindex @code{TO}
 668 @item
 669 (This method cannot be used on commands that define the dictionary, such
 670 as @cmd{DATA LIST}.)  The syntax is the names of two existing variables,
 671 separated by the reserved keyword @code{TO}.  The meaning is to include
 672 every variable in the dictionary between and including the variables
 673 specified.  For instance, if the dictionary contains six variables with
 674 the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
 675 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
 676 variables @code{X2}, @code{GOAL}, and @code{MET}.
 677
 678 @item
 679 (This method can be used only on commands that define the dictionary,
 680 such as @cmd{DATA LIST}.)  It is used to define sequences of variables
 681 that end in consecutive integers.  The syntax is two identifiers that
 682 end in numbers.  This method is best illustrated with examples:
 683
 684 @itemize @bullet
 685 @item
 686 The syntax @code{X1 TO X5} defines 5 variables:
 687
 688 @itemize @minus
 689 @item
 690 X1
 691 @item
 692 X2
 693 @item
 694 X3
 695 @item
 696 X4
 697 @item
 698 X5
 699 @end itemize
 700
 701 @item
 702 The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
 703
 704 @itemize @minus
 705 @item
 706 ITEM0008
 707 @item
 708 ITEM0009
 709 @item
 710 ITEM0010
 711 @item
 712 ITEM0011
 713 @item
 714 ITEM0012
 715 @item
 716 ITEM0013
 717 @end itemize
 718
 719 @item
 720 The syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are both
 721 invalid, although for different reasons, which should be evident.
 722 @end itemize
 723
 724 Note that after a set of variables has been defined with @cmd{DATA LIST}
 725 or another command with this method, the same set can be referenced on
 726 later commands using the same syntax.
 727
 728 @item
 729 The above methods can be combined, either one after another or delimited
 730 by commas.  For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
 731 is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
 732 is individually legal.
 733 @end enumerate
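
Both uses of @code{TO} can be sketched in Python.  The function names
are hypothetical.  Two points are assumptions of the sketch and not
statements from this manual: a dictionary range given in reverse order
is rejected, and numeric suffixes must have equal widths whenever
either one has a leading zero (one plausible reading of why
@code{QUES001 TO QUES9} is invalid).

```python
import re

_SUFFIX_RE = re.compile(r"(.*?)(\d+)\Z")


def range_from_dictionary(dictionary, first, last):
    """Existing variables from first through last, in dictionary order."""
    names = [name.upper() for name in dictionary]
    i, j = names.index(first.upper()), names.index(last.upper())
    if i > j:                       # assumption: reverse ranges are rejected
        raise ValueError(f"{first} follows {last} in the dictionary")
    return dictionary[i:j + 1]


def _has_leading_zero(digits):
    return len(digits) > 1 and digits[0] == "0"


def define_numbered(first, last):
    """Names defined by `first TO last` on a dictionary-defining command."""
    m1, m2 = _SUFFIX_RE.match(first), _SUFFIX_RE.match(last)
    if not (m1 and m2):
        raise ValueError("both names must end in digits")
    (root1, num1), (root2, num2) = m1.groups(), m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("both names must share the same root")
    padded = _has_leading_zero(num1) or _has_leading_zero(num2)
    if padded and len(num1) != len(num2):   # assumption, e.g. QUES001 TO QUES9
        raise ValueError("zero-padded suffixes must have the same width")
    if int(num1) > int(num2):               # e.g. QUES6 TO QUES3
        raise ValueError("the first number must not exceed the second")
    width = len(num1) if padded else 0
    return [root1 + str(n).zfill(width)
            for n in range(int(num1), int(num2) + 1)]
```

With the six-variable dictionary from the example above,
@code{range_from_dictionary} returns @code{X2}, @code{GOAL}, and
@code{MET} for @code{X2 TO MET}.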
 734
 735 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
 736 @subsection Input and Output Formats
 737
 738 Data that PSPP inputs and outputs must have one of a number of formats.
 739 These formats are described, in general, by a format specification of
 740 the form @code{NAMEw.d}, where @var{name} is the
 741 format name and @var{w} is a field width.  @var{d} is the optional
 742 desired number of decimal places, if appropriate.  If @var{d} is not
 743 included then it is assumed to be 0.  Some formats do not allow @var{d}
 744 to be specified.
 745
 746 When an input format is specified on @cmd{DATA LIST} or another
 747 command, then
 748 it is converted to an output format for the purposes of @cmd{PRINT}
 749 and other
 750 data output commands.  For most purposes, input and output formats are
 751 the same; the salient differences are described below.
 752
 753 Below are listed the input and output formats supported by PSPP.  If an
 754 input format is mapped to a different output format by default, then
 755 that mapping is indicated with @result{}.  Each format has the listed
 756 bounds on input width (iw) and output width (ow).
 757
 758 The standard numeric input and output formats are given in the following
 759 table:
 760
 761 @table @asis
 762 @item Fw.d: 1 <= iw,ow <= 40
 763 Standard decimal format with @var{d} decimal places.  If the number is
 764 too large to fit within the field width, it is expressed in scientific
 765 notation (@code{1.2+34}) if @var{w} >= 6, with always at least two digits in
 766 the exponent.  When used as an input format, scientific notation is
 767 allowed but an E or an F must be used to introduce the exponent.
 768
 769 The default output format is the same as the input format, except if
 770 @var{d} > 1.  In that case the output @var{w} is always made to be at
 771 least 2 + @var{d}.
 772
 773 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
 774 For input this is equivalent to F format except that no E or F is
 775 required to introduce the exponent.  For output, produces scientific
 776 notation in the form @code{1.2+34}.  There are always at least two
 777 digits given in the exponent.
 778
 779 The default output @var{w} is the largest of the input @var{w}, the
 780 input @var{d} + 7, and 10.  The default output @var{d} is the input
 781 @var{d}, but at least 3.
 782
 783 @item COMMAw.d: 1 <= iw,ow <= 40
 784 Equivalent to F format, except that groups of three digits are
 785 comma-separated on output.  If the number is too large to express in the
 786 field width, then first commas are eliminated, then if there is still
 787 not enough space the number is expressed in scientific notation given
 788 that @var{w} >= 6.  Commas are allowed and ignored when this is used as an
 789 input format.
 790
 791 @item DOTw.d: 1 <= iw,ow <= 40
 792 Equivalent to COMMA format except that the roles of comma and decimal
 793 point are interchanged.  However: If SET /DECIMAL=DOT is in effect, then
 794 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
 795 decimal point.
 796
 797 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
 798 Equivalent to COMMA format, except that the number is prefixed by a
 799 dollar sign (@samp{$}) if there is room.  On input the value is allowed
 800 to be prefixed by a dollar sign, which is ignored.
 801
 802 The default output @var{w} is the input @var{w}, but at least 2.
 803
 804 @item PCTw.d: 2 <= iw,ow <= 40
 805 Equivalent to F format, except that the number is suffixed by a percent
 806 sign (@samp{%}) if there is room.  On input the value is allowed to be
 807 suffixed by a percent sign, which is ignored.
 808
 809 The default output @var{w} is the input @var{w}, but at least 2.
 810
 811 @item Nw.d: 1 <= iw,ow <= 40
 812 Only digits are allowed within the field width.  The decimal point is
 813 assumed to be @var{d} digits from the right margin.
 814
 815 The default output format is F with the same @var{w} and @var{d}, except
 816 if @var{d} > 1.  In that case the output @var{w} is always made to be at
 817 least 2 + @var{d}.
 818
 819 @item Zw.d @result{} F: 1 <= iw,ow <= 40
 820 Zoned decimal input.  If you need to use this then you know how.
 821
 822 @item IBw.d @result{} F: 1 <= iw,ow <= 8
 823 Integer binary format.  The field is interpreted as a fixed-point
 824 positive or negative binary number in two's-complement notation.  The
 825 location of the decimal point is implied.  Endianness is the same as the
 826 host machine.
 827
 828 The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
 829 with output @var{w} as 9 + input @var{d} and output @var{d} as input
 830 @var{d}.
 831
 832 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
 833 Positive integer binary format.  The field is interpreted as a
 834 fixed-point positive binary number.  The location of the decimal point
 835 is implied.  Endianness is the same as the host machine.
 836
 837 The default output format follows the rules for IB format.
 838
 839 @item Pw.d @result{} F: 1 <= iw,ow <= 16
 840 Binary coded decimal format.  Each byte from left to right, except the
 841 rightmost, represents two digits.  The upper nibble of each byte is more
 842 significant.  The upper nibble of the final byte is the least
 843 significant digit.  The lower nibble of the final byte is the sign; a
 844 value of D represents a negative sign and all other values are
 845 considered positive.  The decimal point is implied.
 846
 847 The default output format follows the rules for IB format.
 848
 849 @item PKw.d @result{} F: 1 <= iw,ow <= 16
 850 Positive binary coded decimal format.  Same as P, except that the last
 851 byte holds two digits like the other bytes and carries no sign.
 852
 853 The default output format follows the rules for IB format.
 854
 855 @item RBw @result{} F: 2 <= iw,ow <= 8
 856
 857 Binary C architecture-dependent ``double'' format.  For a standard
 858 IEEE754 implementation @var{w} should be 8.
 859
 860 The default output format follows the rules for IB format.
 861
 862 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
 863 PIB format encoded as textual hex digit pairs.  @var{w} must be even.
 864
 865 The input width is mapped to a default output width as follows:
 866 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
 867 12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
 868 decimal places.
 869
 870 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
 871
 872 RB format encoded as textual hex digit pairs.  @var{w} must be even.
 873
 874 The default output format is F8.2.
 875
 876 @item CCAw.d: 1 <= ow <= 40
 877 @itemx CCBw.d: 1 <= ow <= 40
 878 @itemx CCCw.d: 1 <= ow <= 40
 879 @itemx CCDw.d: 1 <= ow <= 40
 880 @itemx CCEw.d: 1 <= ow <= 40
 881
 882 User-defined custom currency formats.  May not be used as an input
 883 format.  @xref{SET}, for more details.
 884 @end table
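
The byte layout described for P format can be sketched in Python.  This
is only an illustration of the description above, not PSPP's own
implementation: the function name @code{decode_p} is hypothetical, and
the sketch assumes that every digit nibble holds a value from 0 to 9
without checking it.

@example
```python
from fractions import Fraction

def decode_p(field, d=0):
    """Decode a P (packed decimal) field given as bytes, with d implied decimals."""
    digits = 0
    # Every byte except the last holds two digits, upper nibble first.
    for byte in field[:-1]:
        digits = digits * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    last = field[-1]
    # The upper nibble of the final byte is the least significant digit.
    digits = digits * 10 + (last >> 4)
    # The lower nibble of the final byte is the sign; only D means negative.
    if (last & 0x0F) == 0x0D:
        digits = -digits
    # Apply the implied decimal point (hypothetical choice: an exact fraction).
    return Fraction(digits, 10 ** d)
```
@end example

For instance, under these assumptions the three bytes 12, 34, 5D (in
hexadecimal) read with @var{d} of 2 decode to @minus{}123.45.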
 885
 886 The date and time numeric formats accept input in a number of possible
 887 forms.  Before describing the formats themselves, some definitions of
 888 the elements that make up that input will be helpful:
 889
 890 @table @dfn
 891 @item leader
 892 All formats accept an optional whitespace leader.
 893
 894 @item day
 895 An integer between 1 and 31 representing the day of month.
 896
 897 @item day-count
 898 An integer representing a number of days.
 899
 900 @item date-delimiter
 901 One or more characters of whitespace or the following characters:
 902 @code{- / . ,}
 903
 904 @item month
 905 A month in one of the following forms:
 906 @itemize @bullet
 907 @item
 908 An integer between 1 and 12.
 909 @item
 910 Roman numerals representing an integer between 1 and 12.
 911 @item
 912 At least the first three characters of an English month name (January,
 913 February, @dots{}).
 914 @end itemize
 915
 916 @item year
 917 An integer year number between 1582 and 19999, or between 1 and 199.
 918 Years between 1 and 199 will have 1900 added.
 919
 920 @item julian
 921 A single number with a year number in the first 2, 3, or 4 digits (as
 922 above) and the day number within the year in the last 3 digits.
 923
 924 @item quarter
 925 An integer between 1 and 4 representing a quarter.
 926
 927 @item q-delimiter
 928 The letter @samp{Q} or @samp{q}.
 929
 930 @item week
 931 An integer between 1 and 53 representing a week within a year.
 932
 933 @item wk-delimiter
 934 The letters @samp{wk} in any case.
 935
 936 @item time-delimiter
 937 At least one character of whitespace or @samp{:} or @samp{.}.
 938
 939 @item hour
 940 An integer greater than 0 representing an hour.
 941
 942 @item minute
 943 An integer between 0 and 59 representing a minute within an hour.
 944
 945 @item opt-second
 946 Optionally, a time-delimiter followed by a real number representing a
 947 number of seconds.
 948
 949 @item hour24
 950 An integer between 0 and 23 representing an hour within a day.
 951
 952 @item weekday
 953 At least the first two characters of an English weekday name.
 954
 955 @item spaces
 956 Any amount or no amount of whitespace.
 957
 958 @item sign
 959 An optional positive or negative sign.
 960
 961 @item trailer
 962 All formats accept an optional whitespace trailer.
 963 @end table
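
The year and julian elements defined above can be illustrated with a
short Python sketch.  The function names @code{normalize_year} and
@code{split_julian} are hypothetical, and the sketch does not check
that the day number is valid for the year, because the definitions
above do not say how that case is handled.

@example
```python
def normalize_year(year):
    """Apply the year rule: 1..199 get 1900 added; 1582..19999 pass through."""
    if 1 <= year <= 199:
        return year + 1900
    if 1582 <= year <= 19999:
        return year
    raise ValueError("year out of range: %d" % year)

def split_julian(text):
    """Split a julian field into a (year, day-within-year) pair."""
    text = text.strip()  # drop the optional whitespace leader and trailer
    # 2, 3, or 4 year digits followed by exactly 3 day digits.
    if not text.isdigit() or not 5 <= len(text) <= 7:
        raise ValueError("bad julian value: %r" % text)
    year = normalize_year(int(text[:-3]))
    day = int(text[-3:])
    return year, day
```
@end example

With these definitions, the input @samp{98032} would be read as day 32
of 1998, and @samp{2004366} as day 366 of 2004.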
 964
 965 The date input formats are strung together from the above pieces.  On
 966 output, the date formats are always printed in a single canonical
 967 manner, based on field width.  The date input and output formats are
 968 described below:
 969
 970 @table @asis
 971 @item DATEw: 9 <= iw,ow <= 40
 972 Date format.  Input format: leader + day + date-delimiter +
 973 month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
 974 @var{w} < 11, DD-MMM-YYYY otherwise.
 975
 976 @item EDATEw: 8 <= iw,ow <= 40
 977 European date format.  Input format same as DATE.  Output format:
 978 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
 979
 980 @item SDATEw: 8 <= iw,ow <= 40
 981 Standard date format.  Input format: leader + year + date-delimiter +
 982 month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
 983 @var{w} < 10, YYYY/MM/DD otherwise.
 984
 985 @item ADATEw: 8 <= iw,ow <= 40
 986 American date format.  Input format: leader + month + date-delimiter +
 987 day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
 988 @var{w} < 10, MM/DD/YYYY otherwise.
 989
 990 @item JDATEw: 5 <= iw,ow <= 40
 991 Julian date format.  Input format: leader + julian + trailer.  Output
 992 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
 993
 994 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
 995 Quarter/year format.  Input format: leader + quarter + q-delimiter +
 996 year + trailer.  Output format: @samp{Q Q YY} for @var{w} < 8, @samp{Q Q
 997 YYYY} otherwise, where the first @samp{Q} is one of the digits 1, 2, 3,
 998 or 4.
 999
1000 @item MOYRw: 6 <= iw,ow <= 40
1001 Month/year format.  Input format: leader + month + date-delimiter + year
1002 + trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
1003 YYYY} otherwise.
1004
1005 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
1006 Week/year format.  Input format: leader + week + wk-delimiter + year +
1007 trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
1008 YYYY} otherwise.
1009
1010 @item DATETIMEw.d: 17 <= iw,ow <= 40
1011 Date and time format.  Input format: leader + day + date-delimiter +
1012 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
1013 + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
1014 @var{w} > 19 then seconds @samp{:SS} are added.  If @var{w} > 22 and
1015 @var{d} > 0 then fractional seconds @samp{.SS} are added.
1016
1017 @item TIMEw.d: 5 <= iw,ow <= 40
1018 Time format.  Input format: leader + sign + spaces + hour +
1019 time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
1020 Seconds and fractional seconds are available with @var{w} of at least 8
1021 and 10, respectively.
1022
1023 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
1024 Time format with day count.  Input format: leader + sign + spaces +
1025 day-count + time-delimiter + hour + time-delimiter + minute +
1026 opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
1027 seconds are available with @var{w} of at least 8 and 10, respectively.
1028
1029 @item WKDAYw: 2 <= iw,ow <= 40
1030 A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
1031 leader + weekday + trailer.  Output format: as many characters, in all
1032 capital letters, of the English name of the weekday as will fit in the
1033 field width.
1034
1035 @item MONTHw: 3 <= iw,ow <= 40
1036 A month as a number between 1 and 12, where 1 is January.  Input format:
1037 leader + month + trailer.  Output format: as many characters, in all
1038 capital letters, of the English name of the month as will fit in the
1039 field width.
1040 @end table
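
The way the field width selects between two-digit and four-digit years
on output can be sketched in Python for two of the formats above.  The
function names are hypothetical.  The sketch assumes uppercase
three-letter month abbreviations and zero-padded day and month numbers,
neither of which is stated explicitly in the table.

@example
```python
import datetime

# Assumed three-letter month abbreviations; the table only shows "MMM".
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

def format_date(value, w):
    """DATE output: DD-MMM-YY for w < 11, DD-MMM-YYYY otherwise."""
    month = MONTHS[value.month - 1]
    if w < 11:
        return "%02d-%s-%02d" % (value.day, month, value.year % 100)
    return "%02d-%s-%04d" % (value.day, month, value.year)

def format_sdate(value, w):
    """SDATE output: YY/MM/DD for w < 10, YYYY/MM/DD otherwise."""
    if w < 10:
        return "%02d/%02d/%02d" % (value.year % 100, value.month, value.day)
    return "%04d/%02d/%02d" % (value.year, value.month, value.day)
```
@end example

The other date output formats follow the same pattern, each with its
own width threshold as listed in the table.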
1041
1042 There are only two formats that may be used with string variables:
1043
1044 @table @asis
1045 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
1046 The entire field is treated as a string value.
1047
1048 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
1049 The field is composed of characters in a string encoded as textual hex
1050 digit pairs.
1051
1052 The default output @var{w} is half the input @var{w}.
1053 @end table
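
As a rough illustration of AHEX, the following Python sketch turns a
field of hex digit pairs into the bytes of the string it encodes.  The
function name @code{decode_ahex} is hypothetical, and the sketch leaves
the character encoding of the resulting bytes unspecified.

@example
```python
def decode_ahex(field):
    """Decode an AHEX field: each pair of hex digits encodes one character."""
    if len(field) % 2 != 0:
        raise ValueError("AHEX width must be even")
    # bytes.fromhex() converts each pair of hex digits into one byte.
    return bytes.fromhex(field)
```
@end example

An input field of width 8 therefore yields a 4-character string, which
matches the default output @var{w} of half the input @var{w}.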
1054
1055 @node Scratch Variables,  , Input/Output Formats, Variables
1056 @subsection Scratch Variables
1057
1058 Most of the time, variables don't retain their values between cases.
1059 A variable that is read from a data file or the active file takes on
1060 the value read for each case.  A variable created with
1061 @cmd{COMPUTE} or another transformation is instead initialized for
1062 each case to the system-missing value or to blanks, depending on its
1063 type.
1064
1065 However, sometimes it's useful to have a variable that keeps its value
1066 between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
1067 use a @dfn{scratch variable}.  Scratch variables are variables whose
1068 names begin with an octothorpe (@samp{#}).
1069
1070 Scratch variables have the same properties as variables left with
1071 @cmd{LEAVE}:
1072 they retain their values between cases, and for the first case they are
1073 initialized to 0 or blanks.  They have the additional property that they
1074 are deleted before the execution of any procedure.  For this reason,
1075 scratch variables can't be used for analysis.  To obtain the same
1076 effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
1077 value into an ordinary variable, then analyze that variable.
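
The difference between the two kinds of variables can be modeled in
Python.  This is a simulation of the behavior described above, not
PSPP syntax and not PSPP's implementation; the names
@code{run_cases}, @code{scratch_total}, and @code{total} are
hypothetical, and @code{None} stands in for the system-missing value.

@example
```python
def run_cases(values):
    """Model a running total kept in a scratch variable."""
    scratch_total = 0  # scratch variable: set to 0 for the first case only
    result = []
    for x in values:
        total = None   # ordinary variable: starts each case system-missing
        scratch_total = scratch_total + x  # keeps its value between cases
        total = scratch_total              # copy into the ordinary variable
        result.append(total)
    return result
```
@end example

Because the scratch value is copied into an ordinary variable for each
case, the running total remains available for analysis after the
scratch variable itself is deleted.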
1078
1079 @node Files, BNF, Variables, Language
1080 @section Files Used by PSPP
1081
1082 PSPP makes use of many files each time it runs.  Some of these it
1083 reads, some it writes, some it creates.  Here is a table listing the
1084 most important of these files:
1085
1086 @table @strong
1087 @cindex file, command
1088 @cindex file, syntax file
1089 @cindex command file
1090 @cindex syntax file
1091 @item command file
1092 @itemx syntax file
1093 These names (synonyms) refer to the file that contains instructions to
1094 PSPP that tell it what to do.  The syntax file's name is specified on
1095 the PSPP command line.  Syntax files can also be pulled in with
1096 @cmd{INCLUDE} (@pxref{INCLUDE}).
1097
1098 @cindex file, data
1099 @cindex data file
1100 @item data file
1101 Data files contain raw data in ASCII format suitable for being read in
1102 by @cmd{DATA LIST}.  Data can be embedded in the syntax
1103 file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
1104 syntax file a data file too.
1105
1106 @cindex file, output
1107 @cindex output file
1108 @item listing file
1109 One or more output files are created by PSPP each time it is
1110 run.  The output files receive the tables and charts produced by
1111 statistical procedures.  The output files may be in any number of formats,
1112 depending on how PSPP is configured.
1113
1114 @cindex active file
1115 @cindex file, active
1116 @item active file
1117 The active file is the ``file'' on which all PSPP procedures
1118 are performed.  The active file contains variable definitions and
1119 cases.  The active file is not necessarily a disk file: it is stored
1120 in memory if there is room.
1121 @end table
1122
1123 @node BNF,  , Files, Language
1124 @section Backus-Naur Form
1125 @cindex BNF
1126 @cindex Backus-Naur Form
1127 @cindex command syntax, description of
1128 @cindex description of command syntax
1129
1130 The syntax of some parts of the PSPP language is presented in this
1131 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF.  The
1132 following list describes BNF:
1133
1134 @itemize @bullet
1135 @cindex keywords
1136 @cindex terminals
1137 @item
1138 Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
1139 often called @dfn{terminals}.  There are some special terminals, which
1140 are actually written in lowercase for clarity:
1141
1142 @table @asis
1143 @cindex @code{number}
1144 @item @code{number}
1145 A real number.
1146
1147 @cindex @code{integer}
1148 @item @code{integer}
1149 An integer number.
1150
1151 @cindex @code{string}
1152 @item @code{string}
1153 A string.
1154
1155 @cindex @code{var-name}
1156 @item @code{var-name}
1157 A single variable name.
1158
1159 @cindex operators
1160 @cindex punctuators
1161 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
1162 Operators and punctuators.
1163
1164 @cindex @code{.}
1165 @cindex terminal dot
1166 @cindex dot, terminal
1167 @item @code{.}
1168 The terminal dot.  This is not necessarily an actual dot in the syntax
1169 file.  @xref{Commands}, for more details.
1170 @end table
1171
1172 @item
1173 @cindex productions
1174 @cindex nonterminals
1175 Other words in all lowercase refer to BNF definitions, called
1176 @dfn{productions}.  These productions are also known as
1177 @dfn{nonterminals}.  Some nonterminals are very common, so they are
1178 defined here in English for clarity:
1179
1180 @table @code
1181 @cindex @code{var-list}
1182 @item var-list
1183 A list of one or more variable names or the keyword @code{ALL}.
1184
1185 @cindex @code{expression}
1186 @item expression
1187 An expression.  @xref{Expressions}, for details.
1188 @end table
1189
1190 @item
1191 @cindex @code{::=}
1192 @cindex ``is defined as''
1193 @cindex productions
1194 @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
1195 the name of the nonterminal being defined.  The right side of @samp{::=}
1196 gives the definition of that nonterminal.  If the right side is empty,
1197 then one possible expansion of that nonterminal is nothing.  A BNF
1198 definition is called a @dfn{production}.
1199
1200 @item
1201 @cindex terminals and nonterminals, differences
1202 So, the key difference between a terminal and a nonterminal is that a
1203 terminal cannot be broken into smaller parts---in fact, every terminal
1204 is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
1205 composed of a (possibly empty) sequence of terminals and nonterminals.
1206 Thus, terminals indicate the deepest level of syntax description.  (In
1207 parsing theory, terminals are the leaves of the parse tree; nonterminals
1208 form the branches.)
1209
1210 @item
1211 @cindex start symbol
1212 @cindex symbol, start
1213 The first nonterminal defined in a set of productions is called the
1214 @dfn{start symbol}.  The start symbol defines the entire syntax for
1215 that command.
1216 @end itemize
1217 @setfilename ignored