   1 @node Language, Expressions, Invocation, Top
   2 @chapter The PSPP language
   3 @cindex language, PSPP
   4 @cindex PSPP, language
   5
   6 @quotation
   7 @strong{Please note:} PSPP is not even close to completion.
   8 Only a few statistical procedures are implemented.  PSPP
   9 is a work in progress.
  10 @end quotation
  11
  12 This chapter discusses elements common to many PSPP commands.
  13 Later chapters will describe individual commands in detail.
  14
  15 @menu
  16 * Tokens::                      Characters combine to form tokens.
  17 * Commands::                    Tokens combine to form commands.
  18 * Types of Commands::           Commands come in several flavors.
  19 * Order of Commands::           Commands combine to form syntax files.
  20 * Missing Observations::        Handling missing observations.
  21 * Variables::                   The unit of data storage.
  22 * Files::                       Files used by PSPP.
  23 * File Handles::                How files are named.
  24 * BNF::                         How command syntax is described.
  25 @end menu
  26
  27 @node Tokens, Commands, Language, Language
  28 @section Tokens
  29 @cindex language, lexical analysis
  30 @cindex language, tokens
  31 @cindex tokens
  32 @cindex lexical analysis
  33
PSPP divides most syntax file lines into a series of short chunks
called @dfn{tokens}.
  36 Tokens are then grouped to form commands, each of which tells
  37 PSPP to take some action---read in data, write out data, perform
  38 a statistical procedure, etc.  Each type of token is
  39 described below.
  40
  41 @table @strong
  42 @cindex identifiers
  43 @item Identifiers
  44 Identifiers are names that typically specify variables, commands, or
  45 subcommands.  The first character in an identifier must be a letter,
  46 @samp{#}, or @samp{@@}.  The remaining characters in the identifier
  47 must be letters, digits, or one of the following special characters:
  48
  49 @example
  50 @center @.  _  $  #  @@
  51 @end example
  52
  53 @cindex case-sensitivity
  54 Identifiers may be any length, but only the first 64 bytes are
  55 significant.  Identifiers are not case-sensitive: @code{foobar},
  56 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
  57 different representations of the same identifier.
  58
  59 @cindex identifiers, reserved
  60 @cindex reserved identifiers
  61 Some identifiers are reserved.  Reserved identifiers may not be used
  62 in any context besides those explicitly described in this manual.  The
  63 reserved identifiers are:
  64
  65 @example
  66 @center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
  67 @end example
  68
  69 @item Keywords
  70 Keywords are a subclass of identifiers that form a fixed part of
  71 command syntax.  For example, command and subcommand names are
  72 keywords.  Keywords may be abbreviated to their first 3 characters if
  73 this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
  74 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
  75 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
  76
  77 Reserved identifiers are always used as keywords.  Other identifiers
  78 may be used both as keywords and as user-defined identifiers, such as
  79 variable names.
  80
  81 @item Numbers
  82 @cindex numbers
  83 @cindex integers
  84 @cindex reals
  85 Numbers are expressed in decimal.  A decimal point is optional.
  86 Numbers may be expressed in scientific notation by adding @samp{e} and
  87 a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
  88 are some more examples of valid numbers:
  89
  90 @example
  91 -5  3.14159265359  1e100  -.707  8945.
  92 @end example
  93
  94 Negative numbers are expressed with a @samp{-} prefix.  However, in
  95 situations where a literal @samp{-} token is expected, what appears to
  96 be a negative number is treated as @samp{-} followed by a positive
  97 number.
  98
  99 No white space is allowed within a number token, except for horizontal
 100 white space between @samp{-} and the rest of the number.
 101
The last example above, @samp{8945.}, will be interpreted as two
tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
 104 @xref{Commands, , Forming commands of tokens}.
 105
 106 @item Strings
 107 @cindex strings
 108 @cindex @samp{'}
 109 @cindex @samp{"}
 110 @cindex case-sensitivity
 111 Strings are literal sequences of characters enclosed in pairs of
 112 single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
 113 character used for quoting in the string, double it, e.g.@:
 114 @samp{'it''s an apostrophe'}.  White space and case of letters are
 115 significant inside strings.
 116
 117 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
 118 'c'} is equivalent to @samp{'abc'}.  Concatenation is useful for
 119 splitting a single string across multiple source lines. The maximum
 120 length of a string, after concatenation, is 255 characters.
 121
Strings may also be expressed as hexadecimal, octal, or binary
character values by prefixing the initial quote character with @samp{X},
@samp{O}, or @samp{B} or their lowercase equivalents.  Each pair,
 125 triplet, or octet of characters, according to the radix, is
 126 transformed into a single character with the given value.  If there is
 127 an incomplete group of characters, the missing final digits are
 128 assumed to be @samp{0}.  These forms of strings are nonportable
 129 because numeric values are associated with different characters by
 130 different operating systems.  Therefore, their use should be confined
 131 to syntax files that will not be widely distributed.
 132
 133 @cindex characters, reserved
 134 @cindex 0
 135 @cindex white space
 136 The character with value 00 is reserved for
 137 internal use by PSPP.  Its use in strings causes an error and
 138 replacement by a space character.
 139
 140 @item Punctuators and Operators
 141 @cindex punctuators
 142 @cindex operators
 143 These tokens are the punctuators and operators:
 144
 145 @example
 146 @center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
 147 @end example
 148
 149 Most of these appear within the syntax of commands, but the period
 150 (@samp{.}) punctuator is used only at the end of a command.  It is a
 151 punctuator only as the last character on a line (except white space).
 152 When it is the last non-space character on a line, a period is not
 153 treated as part of another token, even if it would otherwise be part
 154 of, e.g.@:, an identifier or a floating-point number.
 155
 156 Actually, the character that ends a command can be changed with
 157 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
 158 doing so.  Throughout the remainder of this manual we will assume that
 159 the default setting is in effect.
 160 @end table
 161
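As a hedged illustration of the token rules above, the following sketch
shows two equivalent spellings of a keyword, a doubled quote inside a
string, and a string built by concatenation.  The variable name
@code{score} is hypothetical and is assumed to have been defined
already; @cmd{TITLE} is used here only as a convenient command that
accepts a string.

@example
FREQUENCIES /VARIABLES=score.
FREQ /VAR=score.
TITLE 'it''s an apostrophe'.
TITLE "a" + 'b' + 'c'.
@end example

The first two lines should be read as the same command, because
@samp{FREQ} and @samp{VAR} are unambiguous abbreviations of at least
three characters.  The last line should set the same title as
@samp{TITLE 'abc'.} would.
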
 162 @node Commands, Types of Commands, Tokens, Language
 163 @section Forming commands of tokens
 164
 165 @cindex PSPP, command structure
 166 @cindex language, command structure
 167 @cindex commands, structure
 168
 169 Most PSPP commands share a common structure.  A command begins with a
 170 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
 171 CASES}.  The command name may be abbreviated to its first word, and
 172 each word in the command name may be abbreviated to its first three
 173 or more characters, where these abbreviations are unambiguous.
 174
 175 The command name may be followed by one or more @dfn{subcommands}.
 176 Each subcommand begins with a subcommand name, which may be
 177 abbreviated to its first three letters.  Some subcommands accept a
 178 series of one or more specifications, which follow the subcommand
 179 name, optionally separated from it by an equals sign
 180 (@samp{=}). Specifications may be separated from each other
 181 by commas or spaces.  Each subcommand must be separated from the next (if any)
 182 by a forward slash (@samp{/}).
 183
 184 There are multiple ways to mark the end of a command.  The most common
 185 way is to end the last line of the command with a period (@samp{.}) as
 186 described in the previous section (@pxref{Tokens}).  A blank line, or
 187 one that consists only of white space or comments, also ends a command
 188 by default, although you can use the NULLINE subcommand of @cmd{SET}
 189 to disable this feature (@pxref{SET}).
 190
 191 In batch mode only, that is, when reading commands from a file instead
 192 of an interactive user, any line that contains a non-space character
 193 in the leftmost column begins a new command.  Thus, each command
 194 consists of a flush-left line followed by any number of lines indented
 195 from the left margin.  In this mode, a plus or minus sign
 196 (@samp{+}, @samp{@minus{}}) as the first character
 197 in a line is ignored and causes that line to begin a new command,
 198 which allows for visual indentation of a command without that command
 199 being considered part of the previous command.
 200
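The structure described above might look as follows in practice.  This
is only a sketch: the variables @code{x} and @code{y} are hypothetical
and are assumed to exist in the active file.

@example
FREQUENCIES
  /VARIABLES=x y
  /FORMAT=NOTABLE
  /STATISTICS=MEAN STDDEV.
@end example

Here @cmd{FREQUENCIES} is the command name, each subcommand is
introduced by a slash, the specifications follow an optional equals
sign, and the period on the last line ends the command.  In batch mode,
two commands could instead be delimited by indentation alone, with a
leading @samp{+} letting the second command be indented visually:

@example
DESCRIPTIVES
    /VARIABLES=x
+   FREQUENCIES
    /VARIABLES=y
@end example
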
 201 @node Types of Commands, Order of Commands, Commands, Language
 202 @section Types of Commands
 203
 204 Commands in PSPP are divided roughly into six categories:
 205
 206 @table @strong
 207 @item Utility commands
 208 @cindex utility commands
 209 Set or display various global options that affect PSPP operations.
 210 May appear anywhere in a syntax file.  @xref{Utilities, , Utility
 211 commands}.
 212
 213 @item File definition commands
 214 @cindex file definition commands
Give instructions for reading data from text files or from special
binary ``system files''.  Most of these commands replace any previous
data or variables with new data or variables.  At least one file
definition command must appear before the first command in any of
the categories below.  @xref{Data Input and Output}.
 220
 221 @item Input program commands
 222 @cindex input program commands
 223 Though rarely used, these provide tools for reading data files
 224 in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
 225
 226 @item Transformations
 227 @cindex transformations
 228 Perform operations on data and write data to output files.  Transformations
 229 are not carried out until a procedure is executed.
 230
 231 @item Restricted transformations
 232 @cindex restricted transformations
 233 Transformations that cannot appear in certain contexts.  @xref{Order
 234 of Commands}, for details.
 235
 236 @item Procedures
 237 @cindex procedures
 238 Analyze data, writing results of analyses to the listing file.  Cause
 239 transformations specified earlier in the file to be performed.  In a
 240 more general sense, a @dfn{procedure} is any command that causes the
 241 active file (the data) to be read.
 242 @end table
 243
 244 @node Order of Commands, Missing Observations, Types of Commands, Language
 245 @section Order of Commands
 246 @cindex commands, ordering
 247 @cindex order of commands
 248
 249 PSPP does not place many restrictions on ordering of commands.  The
 250 main restriction is that variables must be defined before they are otherwise
 251 referenced.  This section describes the details of command ordering,
 252 but most users will have no need to refer to them.
 253
 254 PSPP possesses five internal states, called initial, INPUT PROGRAM,
 255 FILE TYPE, transformation, and procedure states.  (Please note the
 256 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
 257 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
 258
 259 PSPP starts in the initial state.  Each successful completion
 260 of a command may cause a state transition.  Each type of command has its
 261 own rules for state transitions:
 262
 263 @table @strong
 264 @item Utility commands
 265 @itemize @bullet
 266 @item
 267 Valid in any state.
 268 @item
 269 Do not cause state transitions.  Exception: when @cmd{N OF CASES}
 270 is executed in the procedure state, it causes a transition to the
 271 transformation state.
 272 @end itemize
 273
 274 @item @cmd{DATA LIST}
 275 @itemize @bullet
 276 @item
 277 Valid in any state.
 278 @item
 279 When executed in the initial or procedure state, causes a transition to
 280 the transformation state.
 281 @item
 282 Clears the active file if executed in the procedure or transformation
 283 state.
 284 @end itemize
 285
 286 @item @cmd{INPUT PROGRAM}
 287 @itemize @bullet
 288 @item
 289 Invalid in INPUT PROGRAM and FILE TYPE states.
 290 @item
 291 Causes a transition to the INPUT PROGRAM state.
 292 @item
 293 Clears the active file.
 294 @end itemize
 295
 296 @item @cmd{FILE TYPE}
 297 @itemize @bullet
 298 @item
 299 Invalid in INPUT PROGRAM and FILE TYPE states.
 300 @item
 301 Causes a transition to the FILE TYPE state.
 302 @item
 303 Clears the active file.
 304 @end itemize
 305
 306 @item Other file definition commands
 307 @itemize @bullet
 308 @item
 309 Invalid in INPUT PROGRAM and FILE TYPE states.
 310 @item
 311 Cause a transition to the transformation state.
 312 @item
 313 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
 314 and @cmd{UPDATE}.
 315 @end itemize
 316
 317 @item Transformations
 318 @itemize @bullet
 319 @item
 320 Invalid in initial and FILE TYPE states.
 321 @item
 322 Cause a transition to the transformation state.
 323 @end itemize
 324
 325 @item Restricted transformations
 326 @itemize @bullet
 327 @item
 328 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 329 @item
 330 Cause a transition to the transformation state.
 331 @end itemize
 332
 333 @item Procedures
 334 @itemize @bullet
 335 @item
 336 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
 337 @item
 338 Cause a transition to the procedure state.
 339 @end itemize
 340 @end table
 341
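To make the state transitions above more concrete, here is a small
sketch of a syntax file.  The file name @file{scores.txt} and the
variable names are hypothetical.

@example
DATA LIST LIST FILE='scores.txt' /score.
COMPUTE doubled = score * 2.
LIST.
COMPUTE tripled = score * 3.
DESCRIPTIVES /VARIABLES=doubled tripled.
@end example

Following the rules listed above, @cmd{DATA LIST} would move PSPP from
the initial state to the transformation state, the first @cmd{COMPUTE}
would leave it there, @cmd{LIST} would move it to the procedure state,
the second @cmd{COMPUTE} would return it to the transformation state,
and @cmd{DESCRIPTIVES} would move it to the procedure state again.
Placing either @cmd{COMPUTE} before @cmd{DATA LIST} would be invalid,
because transformations are not allowed in the initial state.
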
 342 @node Missing Observations, Variables, Order of Commands, Language
 343 @section Handling missing observations
 344 @cindex missing values
 345 @cindex values, missing
 346
 347 PSPP includes special support for unknown numeric data values.
 348 Missing observations are assigned a special value, called the
 349 @dfn{system-missing value}.  This ``value'' actually indicates the
 350 absence of a value; it means that the actual value is unknown.  Procedures
 351 automatically exclude from analyses those observations or cases that
 352 have missing values.  Details of missing value exclusion depend on the
 353 procedure and can often be controlled by the user; refer to
 354 descriptions of individual procedures for details.
 355
 356 The system-missing value exists only for numeric variables.  String
 357 variables always have a defined value, even if it is only a string of
 358 spaces.
 359
 360 Variables, whether numeric or string, can have designated
 361 @dfn{user-missing values}.  Every user-missing value is an actual value
 362 for that variable.  However, most of the time user-missing values are
 363 treated in the same way as the system-missing value.  String variables
 364 that are wider than a certain width, usually 8 characters (depending on
 365 computer architecture), cannot have user-missing values.
 366
 367 For more information on missing values, see the following sections:
 368 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
 369 documentation on individual procedures for information on how they
 370 handle missing values.
 371
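As a brief sketch of user-missing values, assume a hypothetical numeric
variable @code{age} in which the codes 998 and 999 were used to record
``refused'' and ``unknown'':

@example
MISSING VALUES age (998, 999).
DESCRIPTIVES /VARIABLES=age.
@end example

After the @cmd{MISSING VALUES} command, cases whose @code{age} is 998
or 999 would be expected to be left out of the statistics, in the same
way as cases whose @code{age} is system-missing, even though 998 and
999 remain actual stored values.
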
 372 @node Variables, Files, Missing Observations, Language
 373 @section Variables
 374 @cindex variables
 375 @cindex dictionary
 376
 377 Variables are the basic unit of data storage in PSPP.  All the
 378 variables in a file taken together, apart from any associated data, are
 379 said to form a @dfn{dictionary}.
 380 Some details of variables are described in the sections below.
 381
 382 @menu
 383 * Attributes::                  Attributes of variables.
 384 * System Variables::            Variables automatically defined by PSPP.
 385 * Sets of Variables::           Lists of variable names.
 386 * Input/Output Formats::        Input and output formats.
 387 * Scratch Variables::           Variables deleted by procedures.
 388 @end menu
 389
 390 @node Attributes, System Variables, Variables, Variables
 391 @subsection Attributes of Variables
 392 @cindex variables, attributes of
 393 @cindex attributes of variables
 394 Each variable has a number of attributes, including:
 395
 396 @table @strong
 397 @item Name
 398 An identifier, up to 64 bytes long.  Each variable must have a different name.
 399 @xref{Tokens}.
 400
 401 Some system variable names begin with @samp{$}, but user-defined
 402 variables' names may not begin with @samp{$}.
 403
 404 @cindex @samp{.}
 405 @cindex period
 406 @cindex variable names, ending with period
 407 The final character in a variable name should not be @samp{.}, because
 408 such an identifier will be misinterpreted when it is the final token
 409 on a line: @code{FOO.} will be divided into two separate tokens,
 410 @samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.
 411
 412 @cindex @samp{_}
 413 The final character in a variable name should not be @samp{_}, because
 414 some such identifiers are used for special purposes by PSPP
 415 procedures.
 416
 417 As with all PSPP identifiers, variable names are not case-sensitive.
 418 PSPP capitalizes variable names on output the same way they were
 419 capitalized at their point of definition in the input.
 420
 421 @cindex variables, type
 422 @cindex type of variables
 423 @item Type
 424 Numeric or string.
 425
 426 @cindex variables, width
 427 @cindex width of variables
 428 @item Width
 429 (string variables only) String variables with a width of 8 characters or
 430 fewer are called @dfn{short string variables}.  Short string variables
 431 can be used in many procedures where @dfn{long string variables} (those
 432 with widths greater than 8) are not allowed.
 433
Some systems may treat strings wider than 8 characters as short
strings.  Eight characters is only the guaranteed minimum for the
maximum width of a short string.
 437
 438 @item Position
 439 Variables in the dictionary are arranged in a specific order.
 440 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
 441
 442 @item Initialization
 443 Either reinitialized to 0 or spaces for each case, or left at its
 444 existing value.  @xref{LEAVE}.
 445
 446 @cindex missing values
 447 @cindex values, missing
 448 @item Missing values
 449 Optionally, up to three values, or a range of values, or a specific
 450 value plus a range, can be specified as @dfn{user-missing values}.
 451 There is also a @dfn{system-missing value} that is assigned to an
 452 observation when there is no other obvious value for that observation.
 453 Observations with missing values are automatically excluded from
 454 analyses.  User-missing values are actual data values, while the
 455 system-missing value is not a value at all.  @xref{Missing Observations}.
 456
 457 @cindex variable labels
 458 @cindex labels, variable
 459 @item Variable label
 460 A string that describes the variable.  @xref{VARIABLE LABELS}.
 461
 462 @cindex value labels
 463 @cindex labels, value
 464 @item Value label
 465 Optionally, these associate each possible value of the variable with a
 466 string.  @xref{VALUE LABELS}.
 467
 468 @cindex print format
 469 @item Print format
 470 Display width, format, and (for numeric variables) number of decimal
 471 places.  This attribute does not affect how data are stored, just how
 472 they are displayed.  Example: a width of 8, with 2 decimal places.
 473 @xref{PRINT FORMATS}.
 474
 475 @cindex write format
 476 @item Write format
 477 Similar to print format, but used by certain commands that are
 478 designed to write to binary files.  @xref{WRITE FORMATS}.
 479 @end table
 480
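Several of these attributes can be set with dedicated commands.  The
following sketch assumes hypothetical variables @code{height},
@code{sex}, and @code{total} that have already been defined; see the
nodes referenced above for the authoritative syntax of each command.

@example
VARIABLE LABELS height 'Height in centimeters'.
VALUE LABELS /sex 1 'Male' 2 'Female'.
MISSING VALUES height (999).
PRINT FORMATS height (F8.2).
LEAVE total.
DISPLAY DICTIONARY.
@end example

The final command is included only as one way to inspect the result;
none of the other commands changes how the data themselves are stored.
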
 481 @node System Variables, Sets of Variables, Attributes, Variables
 482 @subsection Variables Automatically Defined by PSPP
 483 @cindex system variables
 484 @cindex variables, system
 485
 486 There are seven system variables.  These are not like ordinary
 487 variables because system variables are not always stored.  They can be used only
 488 in expressions.  These system variables, whose values and output formats
 489 cannot be modified, are described below.
 490
 491 @table @code
 492 @cindex @code{$CASENUM}
 493 @item $CASENUM
Case number of the current case.  This changes as cases are
shuffled around.
 496
 497 @cindex @code{$DATE}
 498 @item $DATE
 499 Date the PSPP process was started, in format A9, following the
 500 pattern @code{DD MMM YY}.
 501
 502 @cindex @code{$JDATE}
 503 @item $JDATE
 504 Number of days between 15 Oct 1582 and the time the PSPP process
 505 was started.
 506
 507 @cindex @code{$LENGTH}
 508 @item $LENGTH
 509 Page length, in lines, in format F11.
 510
 511 @cindex @code{$SYSMIS}
 512 @item $SYSMIS
 513 System missing value, in format F1.
 514
 515 @cindex @code{$TIME}
 516 @item $TIME
 517 Number of seconds between midnight 14 Oct 1582 and the time the active file
 518 was read, in format F20.
 519
 520 @cindex @code{$WIDTH}
 521 @item $WIDTH
 522 Page width, in characters, in format F3.
 523 @end table
 524
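Because system variables can be used only in expressions, they
typically appear on the right-hand side of a transformation.  A minimal
sketch, in which @code{caseid}, @code{started}, and @code{score} are
hypothetical variable names:

@example
COMPUTE caseid = $CASENUM.
COMPUTE started = $JDATE.
IF (score = 0) score = $SYSMIS.
@end example

The first two lines copy system variable values into ordinary
variables, which can then be stored and displayed like any other
variable.  The last line sets @code{score} to the system-missing value
for cases in which it is 0.
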
 525 @node Sets of Variables, Input/Output Formats, System Variables, Variables
 526 @subsection Lists of variable names
 527 @cindex TO convention
 528 @cindex convention, TO
 529
 530 To refer to a set of variables, list their names one after another.
 531 Optionally, their names may be separated by commas.  To include a
 532 range of variables from the dictionary in the list, write the name of
 533 the first and last variable in the range, separated by @code{TO}.  For
 534 instance, if the dictionary contains six variables with the names
 535 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
 536 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
 537 variables @code{X2}, @code{GOAL}, and @code{MET}.
 538
 539 Commands that define variables, such as @cmd{DATA LIST}, give
 540 @code{TO} an alternate meaning.  With these commands, @code{TO} define
 541 sequences of variables whose names end in consecutive integers.  The
 542 syntax is two identifiers that begin with the same root and end with
 543 numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
 544 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
 545 @code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
 546 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
 548 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
 549
 550 After a set of variables has been defined with @cmd{DATA LIST} or
 551 another command with this method, the same set can be referenced on
 552 later commands using the same syntax.
 553
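The two meanings of @code{TO} can be seen side by side in the following
sketch, where the file name @file{items.txt} and the variable names are
hypothetical:

@example
DATA LIST LIST FILE='items.txt' /id x1 TO x5.
DESCRIPTIVES /VARIABLES=x2 TO x4.
@end example

On @cmd{DATA LIST}, @code{x1 TO x5} defines the five variables
@code{x1} through @code{x5}.  On @cmd{DESCRIPTIVES}, @code{x2 TO x4}
selects the variables from @code{x2} to @code{x4} in dictionary order,
which here are @code{x2}, @code{x3}, and @code{x4}.
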
 554 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
 555 @subsection Input and Output Formats
 556
 557 Data that PSPP inputs and outputs must have one of a number of formats.
 558 These formats are described, in general, by a format specification of
 559 the form @code{NAMEw.d}, where @var{name} is the
 560 format name and @var{w} is a field width.  @var{d} is the optional
 561 desired number of decimal places, if appropriate.  If @var{d} is not
 562 included then it is assumed to be 0.  Some formats do not allow @var{d}
 563 to be specified.
 564
 565 When @cmd{DATA LIST} or another command specifies an input format,
 566 that format is converted to an output format for the purposes of
 567 @cmd{PRINT} and other data output commands.  For most purposes, input
 568 and output formats are the same; the salient differences are described
 569 below.
 570
 571 Below are listed the input and output formats supported by PSPP.  If an
 572 input format is mapped to a different output format by default, then
 573 that mapping is indicated with @result{}.  Each format has the listed
 574 bounds on input width (iw) and output width (ow).
 575
 576 The standard numeric input and output formats are given in the following
 577 table:
 578
 579 @table @asis
 580 @item Fw.d: 1 <= iw,ow <= 40
Standard decimal format with @var{d} decimal places.  If the number is
too large to fit within the field width, it is expressed in scientific
notation (@code{1.2+34}) if @var{w} >= 6, always with at least two digits
in the exponent.  When used as an input format, scientific notation is
allowed but an E or an F must be used to introduce the exponent.
 586
 587 The default output format is the same as the input format, except if
 588 @var{d} > 1.  In that case the output @var{w} is always made to be at
 589 least 2 + @var{d}.
 590
 591 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
For input this is equivalent to F format except that no E or F is
required to introduce the exponent.  For output, produces scientific
 594 notation in the form @code{1.2+34}.  There are always at least two
 595 digits given in the exponent.
 596
 597 The default output @var{w} is the largest of the input @var{w}, the
 598 input @var{d} + 7, and 10.  The default output @var{d} is the input
 599 @var{d}, but at least 3.
 600
 601 @item COMMAw.d: 1 <= iw,ow <= 40
Equivalent to F format, except that groups of three digits are
comma-separated on output.  If the number is too large to express in the
field width, then commas are eliminated first; if there is still
not enough space, the number is expressed in scientific notation, provided
that @var{w} >= 6.  Commas are allowed and ignored when this is used as an
input format.
 608
 609 @item DOTw.d: 1 <= iw,ow <= 40
 610 Equivalent to COMMA format except that the roles of comma and decimal
point are interchanged.  However, if SET /DECIMAL=DOT is in effect, then
 612 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
 613 decimal point.
 614
 615 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
 616 Equivalent to COMMA format, except that the number is prefixed by a
 617 dollar sign (@samp{$}) if there is room.  On input the value is allowed
 618 to be prefixed by a dollar sign, which is ignored.
 619
 620 The default output @var{w} is the input @var{w}, but at least 2.
 621
 622 @item PCTw.d: 2 <= iw,ow <= 40
 623 Equivalent to F format, except that the number is suffixed by a percent
 624 sign (@samp{%}) if there is room.  On input the value is allowed to be
 625 suffixed by a percent sign, which is ignored.
 626
 627 The default output @var{w} is the input @var{w}, but at least 2.
 628
 629 @item Nw.d: 1 <= iw,ow <= 40
 630 Only digits are allowed within the field width.  The decimal point is
 631 assumed to be @var{d} digits from the right margin.
 632
 633 The default output format is F with the same @var{w} and @var{d}, except
 634 if @var{d} > 1.  In that case the output @var{w} is always made to be at
 635 least 2 + @var{d}.
 636
 637 @item Zw.d @result{} F: 1 <= iw,ow <= 40
 638 Zoned decimal input.  If you need to use this then you know how.
 639
 640 @item IBw.d @result{} F: 1 <= iw,ow <= 8
 641 Integer binary format.  The field is interpreted as a fixed-point
 642 positive or negative binary number in two's-complement notation.  The
 643 location of the decimal point is implied.  Endianness is the same as the
 644 host machine.
 645
 646 The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
 647 with output @var{w} as 9 + input @var{d} and output @var{d} as input
 648 @var{d}.
 649
@item PIBw.d @result{} F: 1 <= iw,ow <= 8
 651 Positive integer binary format.  The field is interpreted as a
 652 fixed-point positive binary number.  The location of the decimal point
 653 is implied.  Endianness is the same as the host machine.
 654
 655 The default output format follows the rules for IB format.
 656
 657 @item Pw.d @result{} F: 1 <= iw,ow <= 16
 658 Binary coded decimal format.  Each byte from left to right, except the
 659 rightmost, represents two digits.  The upper nibble of each byte is more
 660 significant.  The upper nibble of the final byte is the least
 661 significant digit.  The lower nibble of the final byte is the sign; a
 662 value of D represents a negative sign and all other values are
 663 considered positive.  The decimal point is implied.
 664
 665 The default output format follows the rules for IB format.
 666
 667 @item PKw.d @result{} F: 1 <= iw,ow <= 16
Positive binary coded decimal format.  Same as P but the last byte is the
 669 same as the others.
 670
 671 The default output format follows the rules for IB format.
 672
 673 @item RBw @result{} F: 2 <= iw,ow <= 8
 674
 675 Binary C architecture-dependent ``double'' format.  For a standard
 676 IEEE754 implementation @var{w} should be 8.
 677
 678 The default output format follows the rules for IB format.
 679
 680 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
 681 PIB format encoded as textual hex digit pairs.  @var{w} must be even.
 682
 683 The input width is mapped to a default output width as follows:
 684 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
 685 12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
 686 decimal places.
 687
 688 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
 689
RB format encoded as textual hex digit pairs.  @var{w} must be even.
 691
 692 The default output format is F8.2.
 693
 694 @item CCAw.d: 1 <= ow <= 40
 695 @itemx CCBw.d: 1 <= ow <= 40
 696 @itemx CCCw.d: 1 <= ow <= 40
 697 @itemx CCDw.d: 1 <= ow <= 40
 698 @itemx CCEw.d: 1 <= ow <= 40
 699
 700 User-defined custom currency formats.  May not be used as an input
 701 format.  @xref{SET}, for more details.
 702 @end table
 703
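The default output format rules in the table above can be worked
through by hand.  The following lines are only a sketch that applies
those rules to a few arbitrarily chosen input formats; they are derived
from the descriptions above and not from running PSPP.

@example
Input     Default output   Reasoning
F3.2      F4.2             d > 1, so w is raised to 2 + d = 4
E8.2      E10.3            w = max(8, 2 + 7, 10) = 10; d = max(2, 3) = 3
IB4       F8.2             d is 0
IB4.2     F11.2            w = 9 + d = 11; d is unchanged
PIBHEX4   F, width 6       from the width mapping 4 => 6
@end example
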
 704 The date and time numeric input and output formats accept a number of
 705 possible formats.  Before describing the formats themselves, some
 706 definitions of the elements that make up their formats will be helpful:
 707
 708 @table @dfn
 709 @item leader
 710 All formats accept an optional white space leader.
 711
 712 @item day
 713 An integer between 1 and 31 representing the day of month.
 714
 715 @item day-count
 716 An integer representing a number of days.
 717
 718 @item date-delimiter
 719 One or more characters of white space or the following characters:
 720 @code{- / . ,}
 721
 722 @item month
 723 A month name in one of the following forms:
 724 @itemize @bullet
 725 @item
 726 An integer between 1 and 12.
 727 @item
 728 Roman numerals representing an integer between 1 and 12.
 729 @item
 730 At least the first three characters of an English month name (January,
 731 February, @dots{}).
 732 @end itemize
 733
 734 @item year
 735 An integer year number between 1582 and 19999, or between 1 and 199.
 736 Years between 1 and 199 will have 1900 added.
 737
 738 @item julian
 739 A single number with a year number in the first 2, 3, or 4 digits (as
 740 above) and the day number within the year in the last 3 digits.
 741
 742 @item quarter
 743 An integer between 1 and 4 representing a quarter.
 744
 745 @item q-delimiter
 746 The letter @samp{Q} or @samp{q}.
 747
 748 @item week
 749 An integer between 1 and 53 representing a week within a year.
 750
 751 @item wk-delimiter
 752 The letters @samp{wk} in any case.
 753
 754 @item time-delimiter
At least one character of white space or @samp{:} or @samp{.}.
 756
 757 @item hour
 758 An integer greater than 0 representing an hour.
 759
 760 @item minute
 761 An integer between 0 and 59 representing a minute within an hour.
 762
 763 @item opt-second
 764 Optionally, a time-delimiter followed by a real number representing a
 765 number of seconds.
 766
 767 @item hour24
 768 An integer between 0 and 23 representing an hour within a day.
 769
 770 @item weekday
 771 At least the first two characters of an English weekday name.
 772
 773 @item spaces
 774 Any amount or no amount of white space.
 775
 776 @item sign
 777 An optional positive or negative sign.
 778
 779 @item trailer
 780 All formats accept an optional white space trailer.
 781 @end table
 782
 783 The date input formats are strung together from the above pieces.  On
 784 output, the date formats are always printed in a single canonical
 785 manner, based on field width.  The date input and output formats are
 786 described below:
 787
 788 @table @asis
 789 @item DATEw: 9 <= iw,ow <= 40
 790 Date format. Input format: leader + day + date-delimiter +
 791 month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
 792 @var{w} < 11, DD-MMM-YYYY otherwise.
 793
 794 @item EDATEw: 8 <= iw,ow <= 40
 795 European date format.  Input format same as DATE.  Output format:
 796 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
 797
 798 @item SDATEw: 8 <= iw,ow <= 40
 799 Standard date format. Input format: leader + year + date-delimiter +
 800 month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
 801 @var{w} < 10, YYYY/MM/DD otherwise.
 802
 803 @item ADATEw: 8 <= iw,ow <= 40
 804 American date format.  Input format: leader + month + date-delimiter +
 805 day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
 806 @var{w} < 10, MM/DD/YYYY otherwise.
 807
 808 @item JDATEw: 5 <= iw,ow <= 40
 809 Julian date format.  Input format: leader + julian + trailer.  Output
 810 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
 811
 812 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
 813 Quarter/year format.  Input format: leader + quarter + q-delimiter +
 814 year + trailer.  Output format: @samp{Q Q YY} for @var{w} < 8, @samp{Q Q
 815 YYYY} otherwise, where the first @samp{Q} is one of the digits 1, 2, 3,
 816 or 4.
 817
 818 @item MOYRw: 6 <= iw,ow <= 40
 819 Month/year format.  Input format: leader + month + date-delimiter + year
 820 + trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
 821 YYYY} otherwise.
 822
 823 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
 824 Week/year format.  Input format: leader + week + wk-delimiter + year +
 825 trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
 826 YYYY} otherwise.
 827
 828 @item DATETIMEw.d: 17 <= iw,ow <= 40
 829 Date and time format.  Input format: leader + day + date-delimiter +
 830 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
 831 + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
 832 @var{w} > 19 then seconds @samp{:SS} are added.  If @var{w} > 22 and
 833 @var{d} > 0 then fractional seconds @samp{.SS} are added.
 834
 835 @item TIMEw.d: 5 <= iw,ow <= 40
 836 Time format.  Input format: leader + sign + spaces + hour +
 837 time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
 838 Seconds and fractional seconds are available with @var{w} of at least 8
 839 and 10, respectively.
 840
 841 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
 842 Time format with day count.  Input format: leader + sign + spaces +
 843 day-count + time-delimiter + hour + time-delimiter + minute +
 844 opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
 845 seconds are available with @var{w} of at least 11 and 13, respectively.
 846
 847 @item WKDAYw: 2 <= iw,ow <= 40
 848 A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
 849 leader + weekday + trailer.  Output format: as many characters, in all
 850 capital letters, of the English name of the weekday as will fit in the
 851 field width.
 852
 853 @item MONTHw: 3 <= iw,ow <= 40
 854 A month as a number between 1 and 12, where 1 is January.  Input format:
 855 leader + month + trailer.  Output format: as many characters, in all
 856 capital letters, of the English name of the month as will fit in the
 857 field width.
 858 @end table
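
As an illustration, the following sketch reads three dates that write the
month in each of the forms listed above, then lists them in two output
formats.  The variable name @code{d} and the data values are invented for
this example, and the sketch assumes that @cmd{DATA LIST} in list mode
accepts a date input format.

@example
DATA LIST LIST /d (DATE11).
BEGIN DATA.
28-Feb-2005
1/III/2005
15.6.2005
END DATA.
FORMATS d (DATE9).
LIST.
FORMATS d (SDATE10).
LIST.
@end example

Going by the table above, the first listing should print dates as
DD-MMM-YY, because @var{w} < 11, and the second as YYYY/MM/DD.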
 859
 860 There are only two formats that may be used with string variables:
 861
 862 @table @asis
 863 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
 864 The entire field is treated as a string value.
 865
 866 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
 867 The field is composed of characters in a string encoded as textual hex
 868 digit pairs.
 869
 870 The default output @var{w} is half the input @var{w}.
 871 @end table
 872
 873 @node Scratch Variables,  , Input/Output Formats, Variables
 874 @subsection Scratch Variables
 875
 876 Most of the time, variables don't retain their values between cases.
 877 Instead, a variable read from a data file or the active file takes on
 878 the value read for each case, and a variable created with
 879 @cmd{COMPUTE} or
 880 another transformation is initialized to the system-missing value
 881 or to blanks, depending on type.
 882
 883 However, sometimes it's useful to have a variable that keeps its value
 884 between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
 885 use a @dfn{scratch variable}.  Scratch variables are variables whose
 886 names begin with an octothorpe (@samp{#}).
 887
 888 Scratch variables have the same properties as variables left with
 889 @cmd{LEAVE}: they retain their values between cases, and for the first
 890 case they are initialized to 0 or blanks.  They have the additional
 891 property that they are deleted before the execution of any procedure.
 892 For this reason, scratch variables can't be used for analysis.  To use
 893 a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
 894 to copy its value into an ordinary variable, then use that ordinary
 895 variable in the analysis.
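
A running total is a typical use.  The sketch below is illustrative only:
the variable names are invented, and it assumes that @cmd{NUMERIC} may be
used to declare a scratch variable before it is first referenced in an
expression.

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.
NUMERIC #total.
COMPUTE #total = #total + x.
COMPUTE running = #total.
LIST.
@end example

Because @code{#total} starts at 0 and keeps its value between cases,
@code{running} is expected to hold the cumulative sum of @code{x}.  The
scratch variable itself is deleted before @cmd{LIST} runs, so only
@code{x} and @code{running} should appear in the listing.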
 896
 897 @node Files
 898 @section Files Used by PSPP
 899
 900 PSPP makes use of many files each time it runs.  Some of these it
 901 reads, some it writes, some it creates.  Here is a table listing the
 902 most important of these files:
 903
 904 @table @strong
 905 @cindex file, command
 906 @cindex file, syntax file
 907 @cindex command file
 908 @cindex syntax file
 909 @item command file
 910 @itemx syntax file
 911 These names (synonyms) refer to the file that contains instructions
 912 that tell PSPP what to do.  The syntax file's name is specified on
 913 the PSPP command line.  Syntax files can also be read with
 914 @cmd{INCLUDE} (@pxref{INCLUDE}).
 915
 916 @cindex file, data
 917 @cindex data file
 918 @item data file
 919 Data files contain raw data in text or binary format.  Data can also
 920 be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
 921
 922 @cindex file, output
 923 @cindex output file
 924 @item listing file
 925 One or more output files are created by PSPP each time it is
 926 run.  The output files receive the tables and charts produced by
 927 statistical procedures.  The output files may be in any number of formats,
 928 depending on how PSPP is configured.
 929
 930 @cindex active file
 931 @cindex file, active
 932 @item active file
 933 The active file is the ``file'' on which all PSPP procedures are
 934 performed.  The active file consists of a dictionary and a set of cases.
 935 The active file is not necessarily a disk file: it is stored in memory
 936 if there is room.
 937
 938 @cindex system file
 939 @cindex file, system
 940 @item system file
 941 System files are binary files that store a dictionary and a set of
 942 cases.  @cmd{GET} and @cmd{SAVE} read and write system files.
 943
 944 @cindex portable file
 945 @cindex file, portable
 946 @item portable file
 947 Portable files are files in a text-based format that store a dictionary
 948 and a set of cases.  @cmd{IMPORT} and @cmd{EXPORT} read and write
 949 portable files.
 950
 951 @cindex scratch file
 952 @cindex file, scratch
 953 @item scratch file
 954 Scratch files consist of a dictionary and cases and may be stored in
 955 memory or on disk.  Most procedures that act on a system file or
 956 portable file can use a scratch file instead.  The contents of scratch
 957 files persist within a single PSPP session only.  @cmd{GET} and
 958 @cmd{SAVE} can be used to read and write scratch files.  Scratch files
 959 are a PSPP extension.
 960 @end table
 961
 962 @node File Handles
 963 @section File Handles
 964 @cindex file handles
 965
 966 A @dfn{file handle} is a reference to a data file, system file, portable
 967 file, or scratch file.  Most often, a file handle is specified as the
 968 name of a file as a string, that is, enclosed within @samp{'} or
 969 @samp{"}.
 970
 971 PSPP also supports declaring named file handles with the @cmd{FILE
 972 HANDLE} command.  This command associates an identifier of your choice
 973 (the file handle's name) with a file.  Later, the file handle name can
 974 be substituted for the name of the file.  When PSPP syntax accesses a
 975 file multiple times, declaring a named file handle simplifies updating
 976 the syntax later to use a different file.  Use of @cmd{FILE HANDLE} is
 977 also required to read data files in binary formats.  @xref{FILE HANDLE},
 978 for more information.
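
For instance, a minimal sketch, in which the handle name @code{survey},
the file name, and the column layout are all hypothetical:

@example
FILE HANDLE survey /NAME='survey.dat'.
DATA LIST FILE=survey /id 1-4 age 6-7.
LIST.
@end example

Pointing this syntax at a different file then requires changing only the
@cmd{FILE HANDLE} line.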
 979
 980 PSPP assumes that a file handle name that begins with @samp{#} refers to
 981 a scratch file, unless the name has already been declared on @cmd{FILE
 982 HANDLE} to refer to another kind of file.  A scratch file is similar to
 983 a system file, except that it persists only for the duration of a given
 984 PSPP session.  Most commands that read or write a system or portable
 985 file, such as @cmd{GET} and @cmd{SAVE}, also accept scratch file
 986 handles.  Scratch file handles may also be declared explicitly with
 987 @cmd{FILE HANDLE}.  Scratch files are a PSPP extension.
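
As a sketch of this, the following fragment assumes that an active file
with a numeric variable @code{x} already exists; the handle name
@code{#backup} is arbitrary.  It saves a copy of the active file, modifies
the data, and then restores the saved copy:

@example
SAVE OUTFILE=#backup.
COMPUTE x = x * 10.
EXECUTE.
GET FILE=#backup.
@end example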
 988
 989 In some circumstances, PSPP must distinguish whether a file handle
 990 refers to a system file or a portable file.  When this is necessary to
 991 read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
 992 PSPP uses the file's contents to decide.  In the context of writing a
 993 file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, PSPP
 994 decides based on the file's name: if it ends in @samp{.por} (with any
 995 capitalization), then PSPP writes a portable file; otherwise, PSPP
 996 writes a system file.
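
For example, with hypothetical file names, and assuming an active file is
already defined:

@example
SAVE OUTFILE='results.sav'.
SAVE OUTFILE='results.POR'.
@end example

By the rule above, the first command should write a system file and the
second a portable file, since the capitalization of @samp{.por} does not
matter.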
 997
 998 INLINE is reserved as a file handle name.  It refers to the ``data
 999 file'' embedded into the syntax file between @cmd{BEGIN DATA} and
1000 @cmd{END DATA}.  @xref{BEGIN DATA}, for more information.
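
The following sketch names INLINE explicitly.  The variable names and
data are invented, and the example assumes that the @code{FILE}
subcommand may follow the @code{LIST} keyword on @cmd{DATA LIST}:

@example
DATA LIST LIST FILE=INLINE /x y.
BEGIN DATA.
1 2
3 4
END DATA.
LIST.
@end example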
1001
1002 The file to which a file handle refers may be reassigned on a later
1003 @cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
1004 HANDLE}.  The @cmd{CLOSE FILE HANDLE} command is also useful to free the
1005 storage associated with a scratch file.  @xref{CLOSE FILE HANDLE}, for
1006 more information.
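
A sketch of reassigning a handle, using a hypothetical handle name, file
names, and column layout:

@example
FILE HANDLE monthly /NAME='january.dat'.
DATA LIST FILE=monthly /sales 1-8.
LIST.
CLOSE FILE HANDLE monthly.
FILE HANDLE monthly /NAME='february.dat'.
@end example

After the second @cmd{FILE HANDLE}, later references to @code{monthly}
refer to the new file.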
1007
1008 @node BNF
1009 @section Backus-Naur Form
1010 @cindex BNF
1011 @cindex Backus-Naur Form
1012 @cindex command syntax, description of
1013 @cindex description of command syntax
1014
1015 The syntax of some parts of the PSPP language is presented in this
1016 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF.  The
1017 following list describes BNF:
1018
1019 @itemize @bullet
1020 @cindex keywords
1021 @cindex terminals
1022 @item
1023 Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
1024 often called @dfn{terminals}.  There are some special terminals, which
1025 are written in lowercase for clarity:
1026
1027 @table @asis
1028 @cindex @code{number}
1029 @item @code{number}
1030 A real number.
1031
1032 @cindex @code{integer}
1033 @item @code{integer}
1034 An integer number.
1035
1036 @cindex @code{string}
1037 @item @code{string}
1038 A string.
1039
1040 @cindex @code{var-name}
1041 @item @code{var-name}
1042 A single variable name.
1043
1044 @cindex operators
1045 @cindex punctuators
1046 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
1047 Operators and punctuators.
1048
1049 @cindex @code{.}
1050 @item @code{.}
1051 The end of the command.  This is not necessarily an actual dot in the
1052 syntax file.  @xref{Commands}, for more details.
1053 @end table
1054
1055 @item
1056 @cindex productions
1057 @cindex nonterminals
1058 Other words in all lowercase refer to BNF definitions, called
1059 @dfn{productions}.  These productions are also known as
1060 @dfn{nonterminals}.  Some nonterminals are very common, so they are
1061 defined here in English for clarity:
1062
1063 @table @code
1064 @cindex @code{var-list}
1065 @item var-list
1066 A list of one or more variable names or the keyword @code{ALL}.
1067
1068 @cindex @code{expression}
1069 @item expression
1070 An expression.  @xref{Expressions}, for details.
1071 @end table
1072
1073 @item
1074 @cindex ``is defined as''
1075 @cindex productions
1076 @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
1077 the name of the nonterminal being defined.  The right side of @samp{::=}
1078 gives the definition of that nonterminal.  If the right side is empty,
1079 then one possible expansion of that nonterminal is nothing.  A BNF
1080 definition is called a @dfn{production}.
1081
1082 @item
1083 @cindex terminals and nonterminals, differences
1084 So, the key difference between a terminal and a nonterminal is that a
1085 terminal cannot be broken into smaller parts---in fact, every terminal
1086 is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
1087 composed of a (possibly empty) sequence of terminals and nonterminals.
1088 Thus, terminals indicate the deepest level of syntax description.  (In
1089 parsing theory, terminals are the leaves of the parse tree; nonterminals
1090 form the branches.)
1091
1092 @item
1093 @cindex start symbol
1094 @cindex symbol, start
1095 The first nonterminal defined in a set of productions is called the
1096 @dfn{start symbol}.  The start symbol defines the entire syntax for
1097 that command.
1098 @end itemize
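
To tie these conventions together, here is a small grammar for an
invented command.  @code{DEMO} is not a real PSPP command, and the names
@code{demo-cmd} and @code{opt-limit} are hypothetical:

@example
demo-cmd  ::= DEMO var-list opt-limit .
opt-limit ::=
opt-limit ::= / LIMIT = integer
@end example

Here @code{demo-cmd} is the start symbol, @code{DEMO} and @code{LIMIT}
are keyword terminals, and @code{integer} is one of the special
lowercase terminals.  The production with an empty right side makes
@code{opt-limit} optional, so both @samp{DEMO x y.} and
@samp{DEMO ALL /LIMIT=10.} would match this grammar.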
1099 @setfilename ignored