doc/data-io.texi

   1 @node Data Input and Output
   2 @chapter Data Input and Output
   3 @cindex input
   4 @cindex output
   5 @cindex data
   6 @cindex cases
   7 @cindex observations
   8
   9 Data are the focus of the PSPP language.
  10 Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
  11 Each case represents an individual or ``experimental unit''.
  12 For example, in the results of a survey, the names of the respondents,
  13 their sex, age, etc.@: and their responses are all data and the data
  14 pertaining to a single respondent constitute a case.
  15 This chapter examines
  16 the PSPP commands for defining variables and reading and writing data.
  17
  18 @quotation Note
  19 These commands tell PSPP how to read data, but the data will not
  20 actually be read until a procedure is executed.
  21 @end quotation
  22
  23 @menu
  24 * BEGIN DATA::                  Embed data within a syntax file.
  25 * CLOSE FILE HANDLE::           Close a file handle.
  26 * DATA LIST::                   Fundamental data reading command.
  27 * END CASE::                    Output the current case.
  28 * END FILE::                    Terminate the current input program.
  29 * FILE HANDLE::                 Support for special file formats.
  30 * INPUT PROGRAM::               Support for complex input programs.
  31 * LIST::                        List cases in the active file.
  32 * NEW FILE::                    Clear the active file and dictionary.
  33 * PRINT::                       Display values in print formats.
  34 * PRINT EJECT::                 Eject the current page then print.
  35 * PRINT SPACE::                 Print blank lines.
  36 * REREAD::                      Take another look at the previous input line.
  37 * REPEATING DATA::              Multiple cases on a single line.
  38 * WRITE::                       Display values in write formats.
  39 @end menu
  40
  41 @node BEGIN DATA
  42 @section BEGIN DATA
  43 @vindex BEGIN DATA
  44 @vindex END DATA
  45 @cindex Embedding data in syntax files
  46 @cindex Data, embedding in syntax files
  47
  48 @display
  49 BEGIN DATA.
  50 @dots{}
  51 END DATA.
  52 @end display
  53
  54 @cmd{BEGIN DATA} and @cmd{END DATA} can be used to embed raw ASCII
  55 data in a PSPP syntax file.  @cmd{DATA LIST} or another input
  56 procedure must be used before @cmd{BEGIN DATA} (@pxref{DATA LIST}).
  57 @cmd{BEGIN DATA} and @cmd{END DATA} must be used together.  @cmd{END
  58 DATA} must appear by itself on a single line, with no leading
  59 white space and exactly one space between the words @code{END} and
  60 @code{DATA}, like this:
  61
  62 @example
  63 END DATA.
  64 @end example
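
As a minimal sketch of how these commands fit together (the variable
names and values are illustrative only), the following syntax defines
two variables, embeds three cases, and then lists them:

@example
DATA LIST LIST NOTABLE /id (F3.0) score (F5.1).
BEGIN DATA.
1 72.5
2 88.0
3 64.5
END DATA.
LIST.
@end example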
  65
  66 @node CLOSE FILE HANDLE
  67 @section CLOSE FILE HANDLE
  68
  69 @display
  70 CLOSE FILE HANDLE handle_name.
  71 @end display
  72
  73 @cmd{CLOSE FILE HANDLE} disassociates the name of a file handle from a
  74 given file.  The only specification is the name of the handle to close.
  75 Afterward, the handle name may be reused in a later
  76 @cmd{FILE HANDLE}.
  77
  78 If the file handle name refers to a scratch file, then the storage
  79 associated with the scratch file in memory or on disk will be freed.
  80 If the scratch file is in use, e.g.@: it has been specified on a
  81 @cmd{GET} command whose execution has not completed, then freeing is
  82 delayed until it is no longer in use.
  83
  84 The file handle named INLINE, which represents data entered between @cmd{BEGIN
  85 DATA} and @cmd{END DATA}, cannot be closed.  Attempts to close it with
  86 @cmd{CLOSE FILE HANDLE} have no effect.
  87
  88 @cmd{CLOSE FILE HANDLE} is a PSPP extension.
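
The following sketch shows one plausible use: a handle is closed so
that its name can be associated with a different file.  The handle
name @code{survey} and both file names are hypothetical.

@example
FILE HANDLE survey /NAME='survey.dat'.
DATA LIST FILE=survey NOTABLE /id 1-5.
LIST.

CLOSE FILE HANDLE survey.
FILE HANDLE survey /NAME='survey2.dat'.
@end example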
  89
  90 @node DATA LIST
  91 @section DATA LIST
  92 @vindex DATA LIST
  93 @cindex reading data from a file
  94 @cindex data, reading from a file
  95 @cindex data, embedding in syntax files
  96 @cindex embedding data in syntax files
  97
  98 Used to read text or binary data, @cmd{DATA LIST} is the most
  99 fundamental data-reading command.  Even the more sophisticated input
 100 methods use @cmd{DATA LIST} commands as a building block.
 101 Understanding @cmd{DATA LIST} is important to understanding how to use
 102 PSPP to read your data files.
 103
 104 There are two major variants of @cmd{DATA LIST}, which are fixed
 105 format and free format.  In addition, free format has a minor variant,
 106 list format, which is discussed in terms of its differences from vanilla
 107 free format.
 108
 109 Each form of @cmd{DATA LIST} is described in detail below.
 110
 111 @xref{GET DATA}, for a command that offers a few enhancements over
 112 @cmd{DATA LIST} and that may be substituted for @cmd{DATA LIST} in many
 113 situations.
 114
 115 @menu
 116 * DATA LIST FIXED::             Fixed columnar locations for data.
 117 * DATA LIST FREE::              Any spacing you like.
 118 * DATA LIST LIST::              Each case must be on a single line.
 119 @end menu
 120
 121 @node DATA LIST FIXED
 122 @subsection DATA LIST FIXED
 123 @vindex DATA LIST FIXED
 124 @cindex reading fixed-format data
 125 @cindex fixed-format data, reading
 126 @cindex data, fixed-format, reading
 127 @cindex embedding fixed-format data
 128
 129 @display
 130 DATA LIST [FIXED]
 131         [@{TABLE,NOTABLE@}]
 132         [FILE='file-name']
 133         [RECORDS=record_count]
 134         [END=end_var]
 135         [SKIP=record_count]
 136         /[line_no] var_spec@dots{}
 137
 138 where each var_spec takes one of the forms
 139         var_list start-end [type_spec]
 140         var_list (fortran_spec)
 141 @end display
 142
 143 @cmd{DATA LIST FIXED} is used to read data files that have values at fixed
 144 positions on each line of single-line or multiline records.  The
 145 keyword FIXED is optional.
 146
 147 The FILE subcommand must be used if input is to be taken from an
 148 external file.  It may be used to specify a file name as a string or a
 149 file handle (@pxref{File Handles}).  If the FILE subcommand is not used,
 150 then input is assumed to be specified within the command file using
 151 @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}).
 152
 153 The optional RECORDS subcommand, which takes a single integer as an
 154 argument, is used to specify the number of lines per record.  If RECORDS
 155 is not specified, then the number of lines per record is calculated from
 156 the list of variable specifications later in @cmd{DATA LIST}.
 157
 158 The END subcommand is only useful in conjunction with @cmd{INPUT
 159 PROGRAM}.  @xref{INPUT PROGRAM}, for details.
 160
 161 The optional SKIP subcommand specifies a number of records to skip at
 162 the beginning of an input file.  It can be used to skip over a row
 163 that contains variable names, for example.
 164
 165 @cmd{DATA LIST} can optionally output a table describing how the data file
 166 will be read.  The TABLE subcommand enables this output, and NOTABLE
 167 disables it.  The default is to output the table.
 168
 169 The list of variables to be read from the data list must come last.
 170 Each line in the data record is introduced by a slash (@samp{/}).
 171 Optionally, a line number may follow the slash.  After that, any number
 172 of variable specifications may be present.
 173
 174 Each variable specification consists of a list of variable names
 175 followed by a description of their location on the input line.  Sets of
 176 variables may be specified using the @code{DATA LIST} TO convention
 177 (@pxref{Sets of
 178 Variables}).  There are two ways to specify the location of the variable
 179 on the line: columnar style and FORTRAN style.
 180
 181 In columnar style, the starting column and ending column for the field
 182 are specified after the variable name, separated by a dash (@samp{-}).
 183 For instance, the third through fifth columns on a line would be
 184 specified @samp{3-5}.  By default, variables are considered to be in
 185 @samp{F} format (@pxref{Input and Output Formats}).  (This default can be
 186 changed; see @ref{SET} for more information.)
 187
 188 In columnar style, to use a variable format other than the default,
 189 specify the format type in parentheses after the column numbers.  For
 190 instance, for alphanumeric @samp{A} format, use @samp{(A)}.
 191
 192 In addition, implied decimal places can be specified in parentheses
 193 after the column numbers.  As an example, suppose that a data file has a
 194 field in which the characters @samp{1234} should be interpreted as
 195 having the value 12.34.  Then this field has two implied decimal places,
 196 and the corresponding specification would be @samp{(2)}.  If a field
 197 that has implied decimal places contains a decimal point, then the
 198 implied decimal places are not applied.
 199
 200 Changing the variable format and adding implied decimal places can be
 201 done together; for instance, @samp{(N,5)}.
 202
 203 When using columnar style, the input and output width of each variable is
 204 computed from the field width.  The field width must be evenly divisible
 205 by the number of variables specified.
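
As an illustrative sketch of columnar style (the variable names and
data are made up for this example), the following reads an @samp{N}
format field, a string field, and a numeric field with two implied
decimal places:

@example
DATA LIST FIXED NOTABLE /id 1-3 (N) name 5-12 (A) price 14-18 (2).
BEGIN DATA.
001 widget   01250
002 gadget   12.50
END DATA.
LIST.
@end example

Under the rules above, @code{price} should be read as 12.50 in both
cases: the first field has no decimal point, so the two implied
decimal places apply, whereas the second contains an explicit decimal
point, so they do not.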
 206
 207 FORTRAN style is an altogether different approach to specifying field
 208 locations.  With this approach, a list of variable input format
 209 specifications, separated by commas, is placed after the variable names
 210 inside parentheses.  Each format specifier advances as many characters
 211 into the input line as it uses.
 212
 213 Implied decimal places also exist in FORTRAN style.  A format
 214 specification with @var{d} decimal places also has @var{d} implied
 215 decimal places.
 216
 217 In addition to the standard format specifiers (@pxref{Input and Output
 218 Formats}), FORTRAN style defines some extensions:
 219
 220 @table @asis
 221 @item @code{X}
 222 Advance the current column on this line by one character position.
 223
 224 @item @code{T}@var{x}
 225 Set the current column on this line to column @var{x}, with column
 226 numbers considered to begin with 1 at the left margin.
 227
 228 @item @code{NEWREC}@var{x}
 229 Skip forward @var{x} lines in the current record, resetting the active
 230 column to the left margin.
 231
 232 @item Repeat count
 233 Any format specifier may be preceded by a number.  This causes the
 234 action of that format specifier to be repeated the specified number of
 235 times.
 236
 237 @item (@var{spec1}, @dots{}, @var{specN})
 238 Group the given specifiers together.  This is most useful when preceded
 239 by a repeat count.  Groups may be nested arbitrarily.
 240 @end table
 241
 242 FORTRAN and columnar styles may be freely intermixed.  Columnar style
 243 leaves the active column immediately after the ending column
 244 specified.  Record motion using @code{NEWREC} in FORTRAN style also
 245 applies to later FORTRAN and columnar specifiers.
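
The following is a sketch of FORTRAN style, using only the @code{X}
and @code{T} extensions and a repeat count; the variable names and
data are hypothetical.  @code{id} occupies columns 1 through 3,
@code{X} skips column 4, @code{name} occupies columns 5 through 12,
and @code{T15} moves to column 15, where four one-column fields are
read.

@example
DATA LIST NOTABLE /id (F3.0) name (X,A8) q1 TO q4 (T15,4F1.0).
BEGIN DATA.
101 Alice     3142
102 Bob       2431
END DATA.
LIST.
@end example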
 246
 247 @menu
 248 * DATA LIST FIXED Examples::    Examples of DATA LIST FIXED.
 249 @end menu
 250
 251 @node DATA LIST FIXED Examples
 252 @unnumberedsubsubsec Examples
 253
 254 @enumerate
 255 @item
 256 @example
 257 DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
 258
 259 BEGIN DATA.
 260 John Smith 102311
 261 Bob Arnold 122015
 262 Bill Yates  918 6
 263 END DATA.
 264 @end example
 265
 266 Defines the following variables:
 267
 268 @itemize @bullet
 269 @item
 270 @code{NAME}, a 10-character-wide long string variable, in columns 1
 271 through 10.
 272
 273 @item
 274 @code{INFO1}, a numeric variable, in columns 12 through 13.
 275
 276 @item
 277 @code{INFO2}, a numeric variable, in columns 14 through 15.
 278
 279 @item
 280 @code{INFO3}, a numeric variable, in columns 16 through 17.
 281 @end itemize
 282
 283 The @code{BEGIN DATA}/@code{END DATA} commands cause three cases to be
 284 defined:
 285
 286 @example
 287 Case   NAME         INFO1   INFO2   INFO3
 288    1   John Smith    1.0     2.3     1.1
 289    2   Bob Arnold    1.2     2.0     1.5
 290    3   Bill Yates     .9     1.8      .6
 291 @end example
 292
 293 The @code{TABLE} keyword causes PSPP to print out a table
 294 describing the four variables defined.
 295
 296 @item
 297 @example
 298 DAT LIS FIL="survey.dat"
 299         /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
 300         /Q01 TO Q50 7-56
 301         /.
 302 @end example
 303
 304 Defines the following variables:
 305
 306 @itemize @bullet
 307 @item
 308 @code{ID}, a numeric variable, in columns 1-5 of the first record.
 309
 310 @item
 311 @code{NAME}, a 30-character long string variable, in columns 7-36 of the
 312 first record.
 313
 314 @item
 315 @code{SURNAME}, a 30-character long string variable, in columns 38-67 of
 316 the first record.
 317
 318 @item
 319 @code{MINITIAL}, a 1-character short string variable, in column 69 of
 320 the first record.
 321
 322 @item
 323 Fifty variables @code{Q01}, @code{Q02}, @code{Q03}, @dots{}, @code{Q49},
 324 @code{Q50}, all numeric, @code{Q01} in column 7, @code{Q02} in column 8,
 325 @dots{}, @code{Q49} in column 55, @code{Q50} in column 56, all in the second
 326 record.
 327 @end itemize
 328
 329 Cases are separated by a blank record.
 330
 331 Data is read from file @file{survey.dat} in the current directory.
 332
 333 This example shows keywords abbreviated to their first 3 letters.
 334
 335 @end enumerate
 336
 337 @node DATA LIST FREE
 338 @subsection DATA LIST FREE
 339 @vindex DATA LIST FREE
 340
 341 @display
 342 DATA LIST FREE
 343         [(@{TAB,'c'@}, @dots{})]
 344         [@{NOTABLE,TABLE@}]
 345         [FILE='file-name']
 346         [SKIP=record_count]
 347         /var_spec@dots{}
 348
 349 where each var_spec takes one of the forms
 350         var_list [(type_spec)]
 351         var_list *
 352 @end display
 353
 354 In free format, the input data is, by default, structured as a series
 355 of fields separated by spaces, tabs, commas, or line breaks.  Each
 356 field's content may be unquoted, or it may be quoted with a pair of
 357 apostrophes (@samp{'}) or double quotes (@samp{"}).  Unquoted white
 358 space separates fields but is not part of any field.  Any mix of
 359 spaces, tabs, and line breaks is equivalent to a single space for the
 360 purpose of separating fields, but consecutive commas will skip a
 361 field.
 362
 363 Alternatively, delimiters can be specified explicitly, as a
 364 parenthesized, comma-separated list of single-character strings
 365 immediately following FREE.  The word TAB may also be used to specify
 366 a tab character as a delimiter.  When delimiters are specified
 367 explicitly, only the given characters, plus line breaks, separate
 368 fields.  Furthermore, leading spaces at the beginnings of fields are
 369 not trimmed, consecutive delimiters define empty fields, and no form
 370 of quoting is allowed.
 371
 372 The NOTABLE and TABLE subcommands are as in @cmd{DATA LIST FIXED} above.
 373 NOTABLE is the default.
 374
 375 The FILE and SKIP subcommands are as in @cmd{DATA LIST FIXED} above.
 376
 377 The variables to be parsed are given as a single list of variable names.
 378 This list must be introduced by a single slash (@samp{/}).  The set of
 379 variable names may contain format specifications in parentheses
 380 (@pxref{Input and Output Formats}).  Format specifications apply to all
 381 variables back to the previous parenthesized format specification.
 382
 383 In addition, an asterisk may be used to indicate that all variables
 384 preceding it are to have input/output format @samp{F8.0}.
 385
 386 Specified field widths are ignored on input, although all normal limits
 387 on field width apply, but they are honored on output.
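
Two short sketches may help; the variable names and data are
illustrative only.  The first uses the default delimiters, with quoted
fields and a case that spans a line break:

@example
DATA LIST FREE NOTABLE /name (A10) age (F3.0) height (F5.1).
BEGIN DATA.
'Ann Lee' 34 1.62 Raj 41 1.80
"O'Neil"
29 1.75
END DATA.
LIST.
@end example

This should yield three cases, since fields are consumed in order
regardless of line boundaries.  The second sketch specifies a comma as
the only explicit delimiter, so that the consecutive commas on the
second data line define an empty field for @code{b}:

@example
DATA LIST FREE (',') NOTABLE /a b c (F4.0).
BEGIN DATA.
1,2,3
4,,6
END DATA.
LIST.
@end example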
 388
 389 @node DATA LIST LIST
 390 @subsection DATA LIST LIST
 391 @vindex DATA LIST LIST
 392
 393 @display
 394 DATA LIST LIST
 395         [(@{TAB,'c'@}, @dots{})]
 396         [@{NOTABLE,TABLE@}]
 397         [FILE='file-name']
 398         [SKIP=record_count]
 399         /var_spec@dots{}
 400
 401 where each var_spec takes one of the forms
 402         var_list [(type_spec)]
 403         var_list *
 404 @end display
 405
 406 With one exception, @cmd{DATA LIST LIST} is syntactically and
 407 semantically equivalent to @cmd{DATA LIST FREE}.  The exception is
 408 that each input line is expected to correspond to exactly one input
 409 record.  If more or fewer fields are found on an input line than
 410 expected, an appropriate diagnostic is issued.
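
As a sketch of the difference (names and data are hypothetical), the
third data line below supplies only two of the three expected fields,
so @cmd{DATA LIST LIST} should issue a diagnostic for it rather than
borrowing a field from a following line:

@example
DATA LIST LIST (',') NOTABLE /id (F3.0) city (A10) temp (F5.1).
BEGIN DATA.
1,Oslo,4.5
2,Lima,19.0
3,Cairo
END DATA.
LIST.
@end example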
 411
 412 @node END CASE
 413 @section END CASE
 414 @vindex END CASE
 415
 416 @display
 417 END CASE.
 418 @end display
 419
 420 @cmd{END CASE} is used only within @cmd{INPUT PROGRAM} to output the
 421 current case.  @xref{INPUT PROGRAM}, for details.
 422
 423 @node END FILE
 424 @section END FILE
 425 @vindex END FILE
 426
 427 @display
 428 END FILE.
 429 @end display
 430
 431 @cmd{END FILE} is used only within @cmd{INPUT PROGRAM} to terminate
 432 the current input program.  @xref{INPUT PROGRAM}.
 433
 434 @node FILE HANDLE
 435 @section FILE HANDLE
 436 @vindex FILE HANDLE
 437
 438 @display
 439 For text files:
 440         FILE HANDLE handle_name
 441                 /NAME='file-name'
 442                 [/MODE=CHARACTER]
 443                 /TABWIDTH=tab_width
 444
 445 For binary files in native encoding with fixed-length records:
 446         FILE HANDLE handle_name
 447                 /NAME='file-name'
 448                 /MODE=IMAGE
 449                 [/LRECL=rec_len]
 450
 451 For binary files in native encoding with variable-length records:
 452         FILE HANDLE handle_name
 453                 /NAME='file-name'
 454                 /MODE=BINARY
 455                 [/LRECL=rec_len]
 456
 457 For binary files encoded in EBCDIC:
 458         FILE HANDLE handle_name
 459                 /NAME='file-name'
 460                 /MODE=360
 461                 /RECFORM=@{FIXED,VARIABLE,SPANNED@}
 462                 [/LRECL=rec_len]
 463
 464 To explicitly declare a scratch handle:
 465         FILE HANDLE handle_name
 466                 /MODE=SCRATCH
 467 @end display
 468
 469 Use @cmd{FILE HANDLE} to associate a file handle name with a file and
 470 its attributes, so that later commands can refer to the file by its
 471 handle name.  Names of text files can be specified directly on
 472 commands that access files, so that @cmd{FILE HANDLE} is only needed when a
 473 file is not an ordinary file containing lines of text.  However,
 474 @cmd{FILE HANDLE} may be used even for text files, and it may be
 475 easier to specify a file's name once and later refer to it by an
 476 abstract handle.
 477
 478 Specify the file handle name as the identifier immediately following the
 479 @cmd{FILE HANDLE} command name.  The identifier INLINE is reserved for
 480 representing data embedded in the syntax file (@pxref{BEGIN DATA}).  The
 481 file handle name must not already have been used in a previous
 482 invocation of @cmd{FILE HANDLE}, unless it has been closed by an
 483 intervening command (@pxref{CLOSE FILE HANDLE}).
 484
 485 The effect and syntax of @cmd{FILE HANDLE} depend on the selected MODE:
 486
 487 @itemize
 488 @item
 489 In CHARACTER mode, the default, the data file is read as a text file,
 490 according to the local system's conventions, and each text line is
 491 read as one record.
 492
 493 In CHARACTER mode only, tabs are expanded to spaces by input programs,
 494 except by @cmd{DATA LIST FREE} with explicitly specified delimiters.
 495 Each tab is 4 characters wide by default, but TABWIDTH (a PSPP
 496 extension) may be used to specify an alternate width.  Use a TABWIDTH
 497 of 0 to suppress tab expansion.
 498
 499 @item
 500 In IMAGE mode, the data file is treated as a series of fixed-length
 501 binary records.  LRECL should be used to specify the record length in
 502 bytes, with a default of 1024.  On input, it is an error if an IMAGE
 503 file's length is not an integer multiple of the record length.  On
 504 output, each record is padded with spaces or truncated, if necessary,
 505 to make it exactly the correct length.
 506
 507 @item
 508 In BINARY mode, the data file is treated as a series of
 509 variable-length binary records.  LRECL may be specified, but its value
 510 is ignored.  The data for each record is both preceded and followed by
 511 a 32-bit signed integer in little-endian byte order that specifies the
 512 length of the record.  (This redundancy permits records in these
 513 files to be efficiently read in reverse order, although PSPP always
 514 reads them in forward order.)  The length does not include either
 515 integer.
 516
 517 @item
 518 Mode 360 reads and writes files in formats first used for tapes in the
 519 1960s on IBM mainframe operating systems and still supported today by
 520 the modern successors of those operating systems.  For more
 521 information, see @cite{OS/400 Tape and Diskette Device Programming},
 522 available on IBM's website.
 523
 524 Alphanumeric data in mode 360 files are encoded in EBCDIC.  PSPP
 525 translates EBCDIC to or from the host's native format as necessary on
 526 input or output, using an ASCII/EBCDIC translation that is one-to-one,
 527 so that a ``round trip'' from ASCII to EBCDIC back to ASCII, or vice
 528 versa, always yields exactly the original data.
 529
 530 The RECFORM subcommand is required in mode 360.  The precise file
 531 format depends on its setting:
 532
 533 @table @asis
 534 @item F
 535 @itemx FIXED
 536 This record format is equivalent to IMAGE mode, except for EBCDIC
 537 translation.
 538
 539 IBM documentation calls this @code{*F} (fixed-length, deblocked)
 540 format.
 541
 542 @item V
 543 @itemx VARIABLE
 544 The file comprises a sequence of zero or more variable-length blocks.
 545 Each block begins with a 4-byte @dfn{block descriptor word} (BDW).
 546 The first two bytes of the BDW are an unsigned integer in big-endian
 547 byte order that specifies the length of the block, including the BDW
 548 itself.  The other two bytes of the BDW are ignored on input and
 549 written as zeros on output.
 550
 551 Following the BDW, the remainder of each block is a sequence of one or
 552 more variable-length records, each of which in turn begins with a
 553 4-byte @dfn{record descriptor word} (RDW) that has the same format as
 554 the BDW.  Following the RDW, the remainder of each record is the
 555 record data.
 556
 557 The maximum length of a record in VARIABLE mode is 65,527 bytes:
 558 65,535 bytes (the maximum value of a 16-bit unsigned integer), minus 4
 559 bytes for the BDW, minus 4 bytes for the RDW.
 560
 561 In mode VARIABLE, LRECL specifies a maximum, not a fixed, record
 562 length, in bytes.  The default is 8,192.
 563
 564 IBM documentation calls this @code{*VB} (variable-length, blocked,
 565 unspanned) format.
 566
 567 @item VS
 568 @itemx SPANNED
 569 The file format is like that of VARIABLE mode, except that logical
 570 records may be split among multiple physical records (called
 571 @dfn{segments}) or blocks.  In SPANNED mode, the third byte of each
 572 RDW is called the segment control character (SCC).  Odd SCC values
 573 cause the segment to be appended to a record buffer maintained in
 574 memory; even values also append the segment and then flush its
 575 contents to the input procedure.  Canonically, SCC value 0 designates
 576 a record not spanned among multiple segments, and values 1 through 3
 577 designate the first segment, the last segment, or an intermediate
 578 segment, respectively, within a multi-segment record.  The record
 579 buffer is also flushed at end of file regardless of the final record's
 580 SCC.
 581
 582 The maximum length of a logical record in SPANNED mode is limited
 583 only by memory available to PSPP.  Segments are limited to 65,527
 584 bytes, as in VARIABLE mode.
 585
 586 This format is similar to what IBM documentation calls @code{*VS}
 587 (variable-length, deblocked, spanned) format.
 588 @end table
 589
 590 In mode 360, fields of type A that extend beyond the end of a record
 591 read from disk are padded with spaces in the host's native character
 592 set, which are then translated from EBCDIC to the native character
 593 set.  Thus, when the host's native character set is based on ASCII,
 594 these fields are effectively padded with character @code{X'80'}.  This
 595 wart is implemented for compatibility.
 596
 597 @item
 598 SCRATCH mode is a PSPP extension that designates the file handle as a
 599 scratch file handle.
 600 Its use is usually unnecessary because file handle names that begin with
 601 @samp{#} are assumed to refer to scratch files.  @xref{File Handles},
 602 for more information.
 603 @end itemize
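
To make the record layouts above concrete, here is a worked byte-level
sketch, in hexadecimal, of a single record holding the five characters
@samp{hello}.  It is derived from the descriptions above rather than
from a real file, and the second layout assumes that the RDW length
counts the RDW itself, as the BDW length counts the BDW.

@example
BINARY mode (data in ASCII; lengths are little-endian and exclude
both length fields):

05 00 00 00   68 65 6c 6c 6f   05 00 00 00
length = 5    h  e  l  l  o    length = 5

Mode 360 with RECFORM=VARIABLE (data in EBCDIC; lengths are
big-endian and include the descriptor word itself):

00 0d 00 00   00 09 00 00   88 85 93 93 96
BDW = 13      RDW = 9       h  e  l  l  o
@end example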
 604
 605 The NAME subcommand specifies the name of the file associated with the
 606 handle.  It is required in all modes but SCRATCH mode, in which its
 607 use is forbidden.
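
The following sketch declares one handle in each of three modes and
then reads from one of them.  All handle names, file names, and record
lengths are hypothetical.

@example
FILE HANDLE tabbed /NAME='tabbed.txt' /MODE=CHARACTER /TABWIDTH=8.
FILE HANDLE fixedrec /NAME='records.bin' /MODE=IMAGE /LRECL=80.
FILE HANDLE tape /NAME='tape.dat' /MODE=360 /RECFORM=VARIABLE
        /LRECL=4096.

DATA LIST FILE=fixedrec NOTABLE /id 1-5 name 6-25 (A).
LIST.
@end example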
 608
 609 @node INPUT PROGRAM
 610 @section INPUT PROGRAM
 611 @vindex INPUT PROGRAM
 612
 613 @display
 614 INPUT PROGRAM.
 615 @dots{} input commands @dots{}
 616 END INPUT PROGRAM.
 617 @end display
 618
 619 @cmd{INPUT PROGRAM}@dots{}@cmd{END INPUT PROGRAM} specifies a
 620 complex input program.  By placing data input commands within @cmd{INPUT
 621 PROGRAM}, PSPP programs can take advantage of more complex file
 622 structures than available with only @cmd{DATA LIST}.
 623
 624 The first sort of extended input program is to simply put multiple @cmd{DATA
 625 LIST} commands within the @cmd{INPUT PROGRAM}.  This will cause all of
 626 the data
 627 files to be read in parallel.  Input will stop when end of file is
 628 reached on any of the data files.
 629
 630 Transformations, such as conditional and looping constructs, can also be
 631 included within @cmd{INPUT PROGRAM}.  These can be used to combine input
 632 from several data files in more complex ways.  However, input will still
 633 stop when end of file is reached on any of the data files.
 634
 635 To prevent @cmd{INPUT PROGRAM} from terminating at the first end of
 636 file, use
 637 the END subcommand on @cmd{DATA LIST}.  This subcommand takes a
 638 variable name,
 639 which should be a numeric scratch variable (@pxref{Scratch Variables}).
 640 (It need not be a scratch variable but otherwise the results can be
 641 surprising.)  The value of this variable is set to 0 when reading the
 642 data file, or 1 when end of file is encountered.
 643
 644 Two additional commands are useful in conjunction with @cmd{INPUT PROGRAM}.
 645 @cmd{END CASE} is the first.  Normally each loop through the
 646 @cmd{INPUT PROGRAM}
 647 structure produces one case.  @cmd{END CASE} controls exactly
 648 when cases are output.  When @cmd{END CASE} is used, looping from the end of
 649 @cmd{INPUT PROGRAM} to the beginning does not cause a case to be output.
 650
 651 @cmd{END FILE} is the second.  When the END subcommand is used on @cmd{DATA
 652 LIST}, there is no way for the @cmd{INPUT PROGRAM} construct to stop
 653 looping,
 654 so an infinite loop results.  @cmd{END FILE}, when executed,
 655 stops the flow of input data and passes out of the @cmd{INPUT PROGRAM}
 656 structure.
 657
 658 All this is very confusing.  A few examples should help to clarify.
 659
 660 @c If you change this example, change the regression test1 in
 661 @c tests/command/input-program.sh to match.
 662 @example
 663 INPUT PROGRAM.
 664         DATA LIST NOTABLE FILE='a.data'/X 1-10.
 665         DATA LIST NOTABLE FILE='b.data'/Y 1-10.
 666 END INPUT PROGRAM.
 667 LIST.
 668 @end example
 669
 670 The example above reads variable X from file @file{a.data} and variable
 671 Y from file @file{b.data}.  If one file is shorter than the other then
 672 the extra data in the longer file is ignored.
 673
 674 @c If you change this example, change the regression test2 in
 675 @c tests/command/input-program.sh to match.
 676 @example
 677 INPUT PROGRAM.
 678         NUMERIC #A #B.
 679
 680         DO IF NOT #A.
 681                 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
 682         END IF.
 683         DO IF NOT #B.
 684                 DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
 685         END IF.
 686         DO IF #A AND #B.
 687                 END FILE.
 688         END IF.
 689         END CASE.
 690 END INPUT PROGRAM.
 691 LIST.
 692 @end example
 693
 694 The above example reads variable X from @file{a.data} and variable Y from
 695 @file{b.data}.  If one file is shorter than the other then the missing
 696 field is set to the system-missing value alongside the present value for
 697 the remaining length of the longer file.
 698
 699 @c If you change this example, change the regression test3 in
 700 @c tests/command/input-program.sh to match.
 701 @example
 702 INPUT PROGRAM.
 703         NUMERIC #A #B.
 704
 705         DO IF #A.
 706                 DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
 707                 DO IF #B.
 708                         END FILE.
 709                 ELSE.
 710                         END CASE.
 711                 END IF.
 712         ELSE.
 713                 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
 714                 DO IF NOT #A.
 715                         END CASE.
 716                 END IF.
 717         END IF.
 718 END INPUT PROGRAM.
 719 LIST.
 720 @end example
 721
 722 The above example reads data from file @file{a.data}, then from
 723 @file{b.data}, and concatenates them into a single active file.
 724
 725 @c If you change this example, change the regression test4 in
 726 @c tests/command/input-program.sh to match.
 727 @example
 728 INPUT PROGRAM.
 729         NUMERIC #EOF.
 730
 731         LOOP IF NOT #EOF.
 732                 DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
 733                 DO IF NOT #EOF.
 734                         END CASE.
 735                 END IF.
 736         END LOOP.
 737
 738         COMPUTE #EOF = 0.
 739         LOOP IF NOT #EOF.
 740                 DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
 741                 DO IF NOT #EOF.
 742                         END CASE.
 743                 END IF.
 744         END LOOP.
 745
 746         END FILE.
 747 END INPUT PROGRAM.
 748 LIST.
 749 @end example
 750
 751 The above example does the same thing as the previous example, in a
 752 different way.
 753
 754 @c If you change this example, make similar changes to the regression
 755 @c test5 in tests/command/input-program.sh.
 756 @example
 757 INPUT PROGRAM.
 758         LOOP #I=1 TO 50.
 759                 COMPUTE X=UNIFORM(10).
 760                 END CASE.
 761         END LOOP.
 762         END FILE.
 763 END INPUT PROGRAM.
 764 LIST/FORMAT=NUMBERED.
 765 @end example
 766
 767 The above example causes an active file to be created consisting of 50
 768 random variates between 0 and 10.
 769
 770 @node LIST
 771 @section LIST
 772 @vindex LIST
 773
 774 @display
 775 LIST
 776         /VARIABLES=var_list
 777         /CASES=FROM start_index TO end_index BY incr_index
 778         /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@}
 779                 @{NOWEIGHT,WEIGHT@}
 780 @end display
 781
 782 The @cmd{LIST} procedure prints the values of specified variables to the
 783 listing file.
 784
 785 The VARIABLES subcommand specifies the variables whose values are to be
 786 printed.  Keyword VARIABLES is optional.  If the VARIABLES subcommand is not
 787 specified then all variables in the active file are printed.
 788
 789 The CASES subcommand can be used to specify a subset of cases to be
 790 printed.  Specify FROM and the case number of the first case to print,
 791 TO and the case number of the last case to print, and BY and the number
 792 of cases to advance between printing cases, or any subset of those
 793 settings.  If CASES is not specified then all cases are printed.
 794
 795 The FORMAT subcommand can be used to change the output format.  NUMBERED
 796 will print case numbers along with each case; UNNUMBERED, the default,
 797 causes the case numbers to be omitted.  The WRAP and SINGLE settings are
 798 currently not used.  WEIGHT will cause case weights to be printed along
 799 with variable values; NOWEIGHT, the default, causes case weights to be
 800 omitted from the output.
 801
 802 Case numbers start from 1.  They are counted after all transformations
 803 have been considered.
 804
 805 @cmd{LIST} attempts to fit all the values on a single line.  If needed
 806 to make them fit, variable names are displayed vertically.  If values
 807 cannot fit on a single line, then a multi-line format will be used.
 808
 809 @cmd{LIST} is a procedure.  It causes the data to be read.
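
As a minimal sketch of the subcommands described above, the following
syntax reads five cases from inline data and lists a subset of them.
The variable names @code{id} and @code{score} and the data values are
invented for illustration.

@example
DATA LIST LIST /id score.
BEGIN DATA.
1 85
2 90
3 78
4 92
5 67
END DATA.

* Start at case 2, stop at case 4, advance 2 cases at a time.
LIST /VARIABLES=id score /CASES=FROM 2 TO 4 BY 2 /FORMAT=NUMBERED.
@end example

Given the description of CASES, this should list cases 2 and 4, each
preceded by its case number.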
 810
 811 @node NEW FILE
 812 @section NEW FILE
 813 @vindex NEW FILE
 814
 815 @display
 816 NEW FILE.
 817 @end display
 818
 819 The @cmd{NEW FILE} command clears the current active file and its dictionary.
 820
 821 @node PRINT
 822 @section PRINT
 823 @vindex PRINT
 824
 825 @display
 826 PRINT
 827         OUTFILE='file-name'
 828         RECORDS=n_lines
 829         @{NOTABLE,TABLE@}
 830         [/[line_no] arg@dots{}]
 831
 832 arg takes one of the following forms:
 833         'string' [start-end]
 834         var_list start-end [type_spec]
 835         var_list (fortran_spec)
 836         var_list *
 837 @end display
 838
 839 The @cmd{PRINT} transformation writes variable data to the listing
 840 file or an output file.  @cmd{PRINT} is executed when a procedure
 841 causes the data to be read.  Follow @cmd{PRINT} by @cmd{EXECUTE} to
 842 print variable data without invoking a procedure (@pxref{EXECUTE}).
 843
 844 All @cmd{PRINT} subcommands are optional.  If no strings or variables
 845 are specified, @cmd{PRINT} outputs a single blank line.
 846
 847 The OUTFILE subcommand specifies the file to receive the output.  The
 848 file may be a file name as a string or a file handle (@pxref{File
 849 Handles}).  If OUTFILE is not present then output will be sent to
 850 PSPP's output listing file.  When OUTFILE is present, a space is
 851 inserted at the beginning of each output line, even lines that
 852 otherwise would be blank.
 853
 854 The RECORDS subcommand specifies the number of lines to be output.  The
 855 number of lines may optionally be surrounded by parentheses.
 856
 857 TABLE will cause the PRINT command to output a table to the listing file
 858 that describes what it will print to the output file.  NOTABLE, the
 859 default, suppresses this output table.
 860
 861 Introduce the strings and variables to be printed with a slash
 862 (@samp{/}).  Optionally, the slash may be followed by a number
 863 indicating which output line will be specified.  In the absence of this
 864 line number, the next line number will be specified.  Multiple lines may
 865 be specified using multiple slashes with the intended output for a line
 866 following its respective slash.
 867
 868 Literal strings may be printed.  Specify the string itself.  Optionally
 869 the string may be followed by a column number or range of column
 870 numbers, specifying the location on the line for the string to be
 871 printed.  Otherwise, the string will be printed at the current position
 872 on the line.
 873
 874 Variables to be printed can be specified in the same ways as available
 875 for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}).  In addition, a
 876 variable
 877 list may be followed by an asterisk (@samp{*}), which indicates that the
 878 variables should be printed in their dictionary print formats, separated
 879 by spaces.  A variable list followed by a slash or the end of command
 880 will be interpreted the same way.
 881
 882 If a FORTRAN type specification is used to move backwards on the current
 883 line and text is then written at that point, the line is truncated to
 884 that length.  Text added afterwards extends the line again from that
 885 point.
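
The following sketch combines several of the argument forms described
above.  The variable names, the data, and the file name
@file{respondents.txt} are invented for illustration.

@example
DATA LIST LIST /name (A8) age.
BEGIN DATA.
Alice 34
Bob 27
END DATA.

* Two output lines per case, with strings and variables at fixed columns.
PRINT RECORDS=2
        /1 'Name:' 1 name 7-14 (A)
        /2 'Age:' 1 age 7-9.

* Dictionary print formats, written to a file instead of the listing.
PRINT OUTFILE='respondents.txt' /name age *.

* PRINT is a transformation, so force a pass through the data.
EXECUTE.
@end example

The first @cmd{PRINT} places each string at column 1 and each variable
in an explicit column range.  The second relies on the asterisk form, so
the variables are written in their dictionary print formats.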
 886
 887 @node PRINT EJECT
 888 @section PRINT EJECT
 889 @vindex PRINT EJECT
 890
 891 @display
 892 PRINT EJECT
 893         OUTFILE='file-name'
 894         RECORDS=n_lines
 895         @{NOTABLE,TABLE@}
 896         /[line_no] arg@dots{}
 897
 898 arg takes one of the following forms:
 899         'string' [start-end]
 900         var_list start-end [type_spec]
 901         var_list (fortran_spec)
 902         var_list *
 903 @end display
 904
 905 @cmd{PRINT EJECT} advances to the beginning of a new output page in
 906 the listing file or output file.  It can also output data in the same
 907 way as @cmd{PRINT}.
 908
 909 All @cmd{PRINT EJECT} subcommands are optional.
 910
 911 Without OUTFILE, @cmd{PRINT EJECT} ejects the current page in
 912 the listing file, then it produces other output, if any is specified.
 913
 914 With OUTFILE, @cmd{PRINT EJECT} writes its output to the specified file.
 915 The first line of output is written with @samp{1} inserted in the
 916 first column.  Commonly, this is the only line of output.  If
 917 additional lines of output are specified, these additional lines are
 918 written with a space inserted in the first column, as with @cmd{PRINT}.
 919
 920 @xref{PRINT}, for more information on syntax and usage.
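
One possible use is a heading at the top of a fresh page.  The sketch
below assumes that @cmd{PRINT EJECT}, like @cmd{PRINT}, runs once for
every case, so it guards the command with a test on the system variable
@code{$CASENUM}.  The variable names and data are invented for
illustration.

@example
DATA LIST LIST /x y.
BEGIN DATA.
1 10
2 20
3 30
END DATA.

* Eject once, before the first case only, and print a heading.
DO IF $CASENUM = 1.
        PRINT EJECT /'Values of x and y' 1.
END IF.
PRINT /x y *.
EXECUTE.
@end example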
 921
 922 @node PRINT SPACE
 923 @section PRINT SPACE
 924 @vindex PRINT SPACE
 925
 926 @display
 927 PRINT SPACE OUTFILE='file-name' n_lines.
 928 @end display
 929
 930 @cmd{PRINT SPACE} prints one or more blank lines to an output file.
 931
 932 The OUTFILE subcommand is optional.  It may be used to direct output to
 933 a file specified by file name as a string or file handle (@pxref{File
 934 Handles}).  If OUTFILE is not specified then output will be directed to
 935 the listing file.
 936
 937 n_lines is also optional.  If present, it is an expression
 938 (@pxref{Expressions}) specifying the number of blank lines to be
 939 printed.  The expression must evaluate to a nonnegative value.
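
Because the line count is an expression, it may vary from case to case.
In the following sketch, the variables @code{x} and @code{gap} and
their values are invented for illustration.

@example
DATA LIST LIST /x gap.
BEGIN DATA.
10 0
20 1
30 2
END DATA.

PRINT /x *.
* The number of blank lines is an expression, here the value of gap.
PRINT SPACE gap.
EXECUTE.
@end example

Each value of @code{x} should be followed by as many blank lines as
@code{gap} specifies for that case.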
 940
 941 @node REREAD
 942 @section REREAD
 943 @vindex REREAD
 944
 945 @display
 946 REREAD FILE=handle COLUMN=column.
 947 @end display
 948
 949 The @cmd{REREAD} transformation allows the previous input line in a
 950 data file
 951 already processed by @cmd{DATA LIST} or another input command to be re-read
 952 for further processing.
 953
 954 The FILE subcommand, which is optional, is used to specify the file to
 955 have its line re-read.  The file must be specified as the name of a file
 956 handle (@pxref{File Handles}).  If FILE is not specified then the last
 957 file specified on @cmd{DATA LIST} will be assumed (last file specified
 958 lexically, not in terms of flow-of-control).
 959
 960 By default, the line re-read is re-read in its entirety.  With the
 961 COLUMN subcommand, a prefix of the line can be exempted from
 962 re-reading.  Specify an expression (@pxref{Expressions}) evaluating to
 963 the first column that should be included in the re-read line.  Columns
 964 are numbered from 1 at the left margin.
 965
 966 Issuing @cmd{REREAD} multiple times will not back up in the data
 967 file.  Instead, it will re-read the same line multiple times.
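
A typical use is a file whose lines have different layouts depending on
a record type.  The sketch below assumes a hypothetical file
@file{parts.txt} in which column 1 holds a type code, and the remaining
data begins in column 3: a quantity for type @samp{A} lines, a price
for all others.  The handle name, file name, and variable names are
invented for illustration.

@example
FILE HANDLE parts /NAME='parts.txt'.

INPUT PROGRAM.
        DATA LIST FILE=parts /kind 1 (A).
        DO IF kind = 'A'.
                REREAD FILE=parts.
                DATA LIST FILE=parts /qty 3-5.
        ELSE.
                REREAD FILE=parts COLUMN=3.
                DATA LIST FILE=parts /price 1-5.
        END IF.
END INPUT PROGRAM.
LIST.
@end example

The first branch re-reads the whole line, so @code{qty} is given in
original column positions.  The second branch uses COLUMN=3 and, on the
assumption that the re-read line then begins at original column 3,
gives the columns for @code{price} relative to that point.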
 968
 969 @node REPEATING DATA
 970 @section REPEATING DATA
 971 @vindex REPEATING DATA
 972
 973 @display
 974 REPEATING DATA
 975         /STARTS=start-end
 976         /OCCURS=n_occurs
 977         /FILE='file-name'
 978         /LENGTH=length
 979         /CONTINUED[=cont_start-cont_end]
 980         /ID=id_start-id_end=id_var
 981         /@{TABLE,NOTABLE@}
 982         /DATA=var_spec@dots{}
 983
 984 where each var_spec takes one of the forms
 985         var_list start-end [type_spec]
 986         var_list (fortran_spec)
 987 @end display
 988
 989 @cmd{REPEATING DATA} parses groups of data repeating in
 990 a uniform format, possibly with several groups on a single line.  Each
 991 group of data corresponds with one case.  @cmd{REPEATING DATA} may only be
 992 used within an @cmd{INPUT PROGRAM} structure (@pxref{INPUT PROGRAM}).
 993 When used with @cmd{DATA LIST}, it
 994 can be used to parse groups of cases that share a subset of variables
 995 but differ in their other data.
 996
 997 The STARTS subcommand is required.  Specify a range of columns, using
 998 literal numbers or numeric variable names.  This range specifies the
 999 columns on the first line that are used to contain groups of data.  The
1000 ending column is optional.  If it is not specified, then the record
1001 width of the input file is used.  For the inline file (@pxref{BEGIN
1002 DATA}) this is 80 columns; for a file with fixed record widths it is the
1003 record width; for other files it is 1024 characters by default.
1004
1005 The OCCURS subcommand is required.  It must be a number or the name of a
1006 numeric variable.  Its value is the number of groups present in the
1007 current record.
1008
1009 The DATA subcommand is required.  It must be the last subcommand
1010 specified.  It is used to specify the data present within each repeating
1011 group.  Column numbers are specified relative to the beginning of a
1012 group at column 1.  Data is specified in the same way as with @cmd{DATA LIST
1013 FIXED} (@pxref{DATA LIST FIXED}).
1014
1015 All other subcommands are optional.
1016
1017 FILE specifies the file to read, either a file name as a string or a
1018 file handle (@pxref{File Handles}).  If FILE is not present then the
1019 default is the last file handle used on @cmd{DATA LIST} (lexically, not in
1020 terms of flow of control).
1021
1022 By default @cmd{REPEATING DATA} will output a table describing how it will
1023 parse the input data.  Specifying NOTABLE will disable this behavior;
1024 specifying TABLE will explicitly enable it.
1025
1026 The LENGTH subcommand specifies the length in characters of each group.
1027 If it is not present then length is inferred from the DATA subcommand.
1028 LENGTH can be a number or a variable name.
1029
1030 Normally all the data groups are expected to be present on a single
1031 line.  Use the CONTINUED subcommand to indicate that data can be continued
1032 onto additional lines.  If data on continuation lines starts at the left
1033 margin and continues through the entire field width, no column
1034 specifications are necessary on CONTINUED.  Otherwise, specify the
1035 possible range of columns in the same way as on STARTS.
1036
1037 When data groups are continued from line to line, it is easy
1038 for cases to get out of sync through careless hand editing.  The
1039 ID subcommand allows a case identifier to be present on each line of
1040 repeating data groups.  @cmd{REPEATING DATA} will check for the same
1041 identifier on each line and report mismatches.  Specify the range of
1042 columns that the identifier will occupy, followed by an equals sign
1043 (@samp{=}) and the identifier variable name.  The variable must already
1044 have been declared with @cmd{NUMERIC} or another command.
1045
1046 @cmd{REPEATING DATA} should be the last command given within an
1047 @cmd{INPUT PROGRAM}.  It should not be enclosed within a @cmd{LOOP}
1048 structure (@pxref{LOOP}).  Use @cmd{DATA LIST} before, not after,
1049 @cmd{REPEATING DATA}.
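
The following sketch follows the syntax described above.  It assumes a
hypothetical file @file{orders.txt} in which each line holds an order
number in columns 1--3, a count of items in column 5, and then that many
six-column groups starting at column 7, for example
@samp{101 2 10 03 22 01}.  The handle name, file name, and variable
names are invented for illustration.

@example
FILE HANDLE orders /NAME='orders.txt'.

INPUT PROGRAM.
        DATA LIST FILE=orders /order 1-3 n_items 5.
        REPEATING DATA
                /STARTS=7
                /OCCURS=n_items
                /LENGTH=6
                /DATA=item 1-2 qty 4-5.
END INPUT PROGRAM.
LIST.
@end example

FILE is omitted, so the file from the preceding @cmd{DATA LIST} is
used.  LENGTH is given explicitly because each group here includes a
trailing separator column that the DATA subcommand does not mention.
With the sample line above, two cases should result, both with
@code{order} equal to 101.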
1050
1051 @node WRITE
1052 @section WRITE
1053 @vindex WRITE
1054
1055 @display
1056 WRITE
1057         OUTFILE='file-name'
1058         RECORDS=n_lines
1059         @{NOTABLE,TABLE@}
1060         /[line_no] arg@dots{}
1061
1062 arg takes one of the following forms:
1063         'string' [start-end]
1064         var_list start-end [type_spec]
1065         var_list (fortran_spec)
1066         var_list *
1067 @end display
1068
1069 @cmd{WRITE} writes text or binary data to an output file.
1070
1071 @xref{PRINT}, for more information on syntax and usage.  @cmd{PRINT}
1072 and @cmd{WRITE} differ in only a few ways:
1073
1074 @itemize @bullet
1075 @item
1076 @cmd{WRITE} uses write formats by default, whereas @cmd{PRINT} uses
1077 print formats.
1078
1079 @item
1080 @cmd{PRINT} inserts a space between variables unless a format is
1081 explicitly specified, but @cmd{WRITE} never inserts space between
1082 variables in output.
1083
1084 @item
1085 @cmd{PRINT} inserts a space at the beginning of each line that it
1086 writes to an output file (and @cmd{PRINT EJECT} inserts @samp{1} at
1087 the beginning of each line that should begin a new page), but
1088 @cmd{WRITE} does not.
1089
1090 @item
1091 @cmd{PRINT} outputs the system-missing value according to its
1092 specified output format, whereas @cmd{WRITE} outputs the
1093 system-missing value as a field filled with spaces.  Binary formats
1094 are an exception.
1095 @end itemize
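
To compare the two commands side by side, the same variables can be
sent to two files.  In this sketch the file names @file{print.txt} and
@file{write.txt}, the variable names, and the data are invented for
illustration.

@example
DATA LIST LIST /id score.
BEGIN DATA.
1 85
2 90
END DATA.

* Same variables, once through PRINT and once through WRITE.
PRINT OUTFILE='print.txt' /id score.
WRITE OUTFILE='write.txt' /id score.
EXECUTE.
@end example

According to the differences listed above, each line of
@file{print.txt} should begin with a space and separate the two values
with a space, whereas @file{write.txt} should contain the values in
their write formats with nothing added.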
1096 @setfilename ignored