
   1 @node Data Input and Output, System and Portable Files, Expressions, Top
   2 @chapter Data Input and Output
   3 @cindex input
   4 @cindex output
   5 @cindex data
   6 @cindex cases
   7 @cindex observations
   8
   9 Data are the focus of the PSPP language.
  10 Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
  11 Each case represents an individual or `experimental unit'.
  12 For example, in the results of a survey, the names of the respondents,
  13 their sex, age, @i{etc}., and their responses are all data, and the data
  14 pertaining to a single respondent is a case.
  15 This chapter examines
  16 the PSPP commands for defining variables and reading and writing data.
  17
  18 @quotation
  19 @strong{Please note:} Data is not actually read until a procedure is
  20 executed.  These commands tell PSPP how to read data, but they
  21 do not @emph{cause} PSPP to read data.
  22 @end quotation
  23
  24 @menu
  25 * BEGIN DATA::                  Embed data within a syntax file.
  26 * CLEAR TRANSFORMATIONS::       Clear pending transformations.
  27 * DATA LIST::                   Fundamental data reading command.
  28 * END CASE::                    Output the current case.
  29 * END FILE::                    Terminate the current input program.
  30 * FILE HANDLE::                 Support for fixed-length records.
  31 * INPUT PROGRAM::               Support for complex input programs.
  32 * LIST::                        List cases in the active file.
  33 * MATRIX DATA::                 Read matrices in text format.
  34 * NEW FILE::                    Clear the active file and dictionary.
  35 * PRINT::                       Display values in print formats.
  36 * PRINT EJECT::                 Eject the current page then print.
  37 * PRINT SPACE::                 Print blank lines.
  38 * REREAD::                      Take another look at the previous input line.
  39 * REPEATING DATA::              Multiple cases on a single line.
  40 * WRITE::                       Display values in write formats.
  41 @end menu
  42
  43 @node BEGIN DATA, CLEAR TRANSFORMATIONS, Data Input and Output, Data Input and Output
  44 @section BEGIN DATA
  45 @vindex BEGIN DATA
  46 @vindex END DATA
  47 @cindex Embedding data in syntax files
  48 @cindex Data, embedding in syntax files
  49
  50 @display
  51 BEGIN DATA.
  52 @dots{}
  53 END DATA.
  54 @end display
  55
  56 @cmd{BEGIN DATA} and @cmd{END DATA} can be used to embed raw ASCII
  57 data in a PSPP syntax file.  @cmd{DATA LIST} or another input
  58 procedure must be used before @cmd{BEGIN DATA} (@pxref{DATA LIST}).
  59 @cmd{BEGIN DATA} and @cmd{END DATA} must be used together.  @cmd{END
  60 DATA} must appear by itself on a single line, with no leading
  61 white space and exactly one space between the words @code{END} and
  62 @code{DATA}, like this:
  63
  64 @example
  65 END DATA.
  66 @end example
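
As a minimal sketch of how the pieces fit together (the variable names
and data values here are made up for illustration), a @cmd{DATA LIST}
command followed by inline data might look like this:

@example
DATA LIST NOTABLE /ID 1-3 SCORE 5-6.
BEGIN DATA.
101 87
102 92
END DATA.
LIST.
@end example

Because data is not read until a procedure is executed, the final
@cmd{LIST} is what actually causes the two cases to be read and
displayed.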
  67
  68 @node CLEAR TRANSFORMATIONS, DATA LIST, BEGIN DATA, Data Input and Output
  69 @section CLEAR TRANSFORMATIONS
  70 @vindex CLEAR TRANSFORMATIONS
  71
  72 @display
  73 CLEAR TRANSFORMATIONS.
  74 @end display
  75
  76 @cmd{CLEAR TRANSFORMATIONS} clears out all pending
  77 transformations.  It does not cancel the current input program.  It is
  78 valid only when PSPP is interactive, not in syntax files.
  79
  80 @node DATA LIST, END CASE, CLEAR TRANSFORMATIONS, Data Input and Output
  81 @section DATA LIST
  82 @vindex DATA LIST
  83 @cindex reading data from a file
  84 @cindex data, reading from a file
  85 @cindex data, embedding in syntax files
  86 @cindex embedding data in syntax files
  87
  88 Used to read text or binary data, @cmd{DATA LIST} is the most
  89 fundamental data-reading command.  Even the more sophisticated input
  90 methods use @cmd{DATA LIST} commands as a building block.
  91 Understanding @cmd{DATA LIST} is important to understanding how to use
  92 PSPP to read your data files.
  93
  94 There are two major variants of @cmd{DATA LIST}, which are fixed
  95 format and free format.  In addition, free format has a minor variant,
  96 list format, which is discussed in terms of its differences from vanilla
  97 free format.
  98
  99 Each form of @cmd{DATA LIST} is described in detail below.
 100
 101 @menu
 102 * DATA LIST FIXED::             Fixed columnar locations for data.
 103 * DATA LIST FREE::              Any spacing you like.
 104 * DATA LIST LIST::              Each case must be on a single line.
 105 @end menu
 106
 107 @node DATA LIST FIXED, DATA LIST FREE, DATA LIST, DATA LIST
 108 @subsection DATA LIST FIXED
 109 @vindex DATA LIST FIXED
 110 @cindex reading fixed-format data
 111 @cindex fixed-format data, reading
 112 @cindex data, fixed-format, reading
 113 @cindex embedding fixed-format data
 114
 115 @display
 116 DATA LIST [FIXED]
 117         @{TABLE,NOTABLE@}
 118         FILE='filename'
 119         RECORDS=record_count
 120         END=end_var
 121         /[line_no] var_spec@dots{}
 122
 123 where each var_spec takes one of the forms
 124         var_list start-end [type_spec]
 125         var_list (fortran_spec)
 126 @end display
 127
 128 @cmd{DATA LIST FIXED} is used to read data files that have values at fixed
 129 positions on each line of single-line or multiline records.  The
 130 keyword FIXED is optional.
 131
 132 The FILE subcommand must be used if input is to be taken from an
 133 external file.  It may be used to specify a filename as a string or a
 134 file handle (@pxref{FILE HANDLE}).  If the FILE subcommand is not used,
 135 then input is assumed to be specified within the command file using
 136 @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}).
 137
 138 The optional RECORDS subcommand, which takes a single integer as an
 139 argument, is used to specify the number of lines per record.  If RECORDS
 140 is not specified, then the number of lines per record is calculated from
 141 the list of variable specifications later in @cmd{DATA LIST}.
 142
 143 The END subcommand is only useful in conjunction with @cmd{INPUT
 144 PROGRAM}.  @xref{INPUT PROGRAM}, for details.
 145
 146 @cmd{DATA LIST} can optionally output a table describing how the data file
 147 will be read.  The TABLE subcommand enables this output, and NOTABLE
 148 disables it.  The default is to output the table.
 149
 150 The list of variables to be read from the data list must come last.
 151 Each line in the data record is introduced by a slash (@samp{/}).
 152 Optionally, a line number may follow the slash.  Any number of variable
 153 specifications may then be given.
 154
 155 Each variable specification consists of a list of variable names
 156 followed by a description of their location on the input line.  Sets of
 157 variables may be specified using the @code{DATA LIST} TO convention
 158 (@pxref{Sets of
 159 Variables}).  There are two ways to specify the location of the variable
 160 on the line: columnar style and FORTRAN style.
 161
 162 In columnar style, the starting column and ending column for the field
 163 are specified after the variable name, separated by a dash (@samp{-}).
 164 For instance, the third through fifth columns on a line would be
 165 specified @samp{3-5}.  By default, variables are considered to be in
 166 @samp{F} format (@pxref{Input/Output Formats}).  (This default can be
 167 changed; see @ref{SET} for more information.)
 168
 169 In columnar style, to use a variable format other than the default,
 170 specify the format type in parentheses after the column numbers.  For
 171 instance, for alphanumeric @samp{A} format, use @samp{(A)}.
 172
 173 In addition, implied decimal places can be specified in parentheses
 174 after the column numbers.  As an example, suppose that a data file has a
 175 field in which the characters @samp{1234} should be interpreted as
 176 having the value 12.34.  Then this field has two implied decimal places,
 177 and the corresponding specification would be @samp{(2)}.  If a field
 178 that has implied decimal places contains a decimal point, then the
 179 implied decimal places are not applied.
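
For instance, the following sketch (with a hypothetical variable
@code{AMOUNT} and made-up data) reads a four-column field with two
implied decimal places:

@example
DATA LIST NOTABLE /AMOUNT 1-4 (2).
BEGIN DATA.
1234
12.5
END DATA.
LIST.
@end example

Under the rule just described, the first line should be read as 12.34,
while the second, which contains an explicit decimal point, should be
read as 12.5.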
 180
 181 Changing the variable format and adding implied decimal places can be
 182 done together; for instance, @samp{(N,5)}.
 183
 184 When using columnar style, the input and output width of each variable is
 185 computed from the field width.  The field width must be evenly divisible
 186 by the number of variables specified.
 187
 188 FORTRAN style is an altogether different approach to specifying field
 189 locations.  With this approach, a list of variable input format
 190 specifications, separated by commas, is placed after the variable names
 191 inside parentheses.  Each format specifier advances as many characters
 192 into the input line as it uses.
 193
 194 Implied decimal places also exist in FORTRAN style.  A format
 195 specification with @var{d} decimal places also has @var{d} implied
 196 decimal places.
 197
 198 In addition to the standard format specifiers (@pxref{Input/Output
 199 Formats}), FORTRAN style defines some extensions:
 200
 201 @table @asis
 202 @item @code{X}
 203 Advance the current column on this line by one character position.
 204
 205 @item @code{T}@var{x}
 206 Set the current column on this line to column @var{x}, with column
 207 numbers considered to begin with 1 at the left margin.
 208
 209 @item @code{NEWREC}@var{x}
 210 Skip forward @var{x} lines in the current record, resetting the active
 211 column to the left margin.
 212
 213 @item Repeat count
 214 Any format specifier may be preceded by a number.  This causes the
 215 action of that format specifier to be repeated the specified number of
 216 times.
 217
 218 @item (@var{spec1}, @dots{}, @var{specN})
 219 Group the given specifiers together.  This is most useful when preceded
 220 by a repeat count.  Groups may be nested arbitrarily.
 221 @end table
 222
 223 FORTRAN and columnar styles may be freely intermixed.  Columnar style
 224 leaves the active column immediately after the ending column
 225 specified.  Record motion using @code{NEWREC} in FORTRAN style also
 226 applies to later FORTRAN and columnar specifiers.
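
As an illustrative sketch (the variable names and the data line are
hypothetical), the following command mixes ordinary format specifiers
with the @code{X} and @code{T} extensions:

@example
DATA LIST NOTABLE /ID AGE SCORE (F3.0, X, F2.0, T10, F4.1).
BEGIN DATA.
101 34   0875
END DATA.
LIST.
@end example

Here @code{ID} occupies columns 1 through 3, @code{X} skips column 4,
@code{AGE} occupies columns 5 and 6, and @code{T10} moves to column 10,
where @code{SCORE} is read from columns 10 through 13.  Since
@code{F4.1} has one decimal place, @samp{0875} should be interpreted
as 87.5.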
 227
 228 @menu
 229 * DATA LIST FIXED Examples::    Examples of DATA LIST FIXED.
 230 @end menu
 231
 232 @node DATA LIST FIXED Examples,  , DATA LIST FIXED, DATA LIST FIXED
 233 @unnumberedsubsubsec Examples
 234
 235 @enumerate
 236 @item
 237 @example
 238 DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
 239
 240 BEGIN DATA.
 241 John Smith 102311
 242 Bob Arnold 122015
 243 Bill Yates  918 6
 244 END DATA.
 245 @end example
 246
 247 Defines the following variables:
 248
 249 @itemize @bullet
 250 @item
 251 @code{NAME}, a 10-character-wide long string variable, in columns 1
 252 through 10.
 253
 254 @item
 255 @code{INFO1}, a numeric variable, in columns 12 through 13.
 256
 257 @item
 258 @code{INFO2}, a numeric variable, in columns 14 through 15.
 259
 260 @item
 261 @code{INFO3}, a numeric variable, in columns 16 through 17.
 262 @end itemize
 263
 264 The @code{BEGIN DATA}/@code{END DATA} commands cause three cases to be
 265 defined:
 266
 267 @example
 268 Case   NAME         INFO1   INFO2   INFO3
 269    1   John Smith    1.0     2.3     1.1
 270    2   Bob Arnold    1.2     2.0     1.5
 271    3   Bill Yates    0.9     1.8     0.6
 272 @end example
 273
 274 The @code{TABLE} keyword causes PSPP to print out a table
 275 describing the four variables defined.
 276
 277 @item
 278 @example
 279 DAT LIS FIL="survey.dat"
 280         /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
 281         /Q01 TO Q50 7-56
 282         /.
 283 @end example
 284
 285 Defines the following variables:
 286
 287 @itemize @bullet
 288 @item
 289 @code{ID}, a numeric variable, in columns 1-5 of the first record.
 290
 291 @item
 292 @code{NAME}, a 30-character long string variable, in columns 7-36 of the
 293 first record.
 294
 295 @item
 296 @code{SURNAME}, a 30-character long string variable, in columns 38-67 of
 297 the first record.
 298
 299 @item
 300 @code{MINITIAL}, a 1-character short string variable, in column 69 of
 301 the first record.
 302
 303 @item
 304 Fifty variables @code{Q01}, @code{Q02}, @code{Q03}, @dots{}, @code{Q49},
 305 @code{Q50}, all numeric, @code{Q01} in column 7, @code{Q02} in column 8,
 306 @dots{}, @code{Q49} in column 55, @code{Q50} in column 56, all in the second
 307 record.
 308 @end itemize
 309
 310 Cases are separated by a blank record.
 311
 312 Data is read from file @file{survey.dat} in the current directory.
 313
 314 This example shows keywords abbreviated to their first 3 letters.
 315
 316 @end enumerate
 317
 318 @node DATA LIST FREE, DATA LIST LIST, DATA LIST FIXED, DATA LIST
 319 @subsection DATA LIST FREE
 320 @vindex DATA LIST FREE
 321
 322 @display
 323 DATA LIST FREE
 324         [(@{TAB,'c'@}, @dots{})]
 325         [@{NOTABLE,TABLE@}]
 326         FILE='filename'
 327         END=end_var
 328         /var_spec@dots{}
 329
 330 where each var_spec takes one of the forms
 331         var_list [(type_spec)]
 332         var_list *
 333 @end display
 334
 335 In free format, the input data is, by default, structured as a series
 336 of fields separated by spaces, tabs, commas, or line breaks.  Each
 337 field's content may be unquoted, or it may be quoted with a pair of
 338 apostrophes (@samp{'}) or double quotes (@samp{"}).  Unquoted white
 339 space separates fields but is not part of any field.  Any mix of
 340 spaces, tabs, and line breaks is equivalent to a single space for the
 341 purpose of separating fields, but consecutive commas will skip a
 342 field.
 343
 344 Alternatively, delimiters can be specified explicitly, as a
 345 parenthesized, comma-separated list of single-character strings
 346 immediately following FREE.  The word TAB may also be used to specify
 347 a tab character as a delimiter.  When delimiters are specified
 348 explicitly, only the given characters, plus line breaks, separate
 349 fields.  Furthermore, leading spaces at the beginnings of fields are
 350 not trimmed, consecutive delimiters define empty fields, and no form
 351 of quoting is allowed.
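
A short sketch of explicit delimiters, using made-up variable names and
data:

@example
DATA LIST FREE (',') /A B C.
BEGIN DATA.
1,,3
END DATA.
LIST.
@end example

With the comma given as the only delimiter, the two consecutive commas
define an empty second field instead of being collapsed, so the empty
field corresponds to @code{B}.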
 352
 353 The NOTABLE and TABLE subcommands are as in @cmd{DATA LIST FIXED} above.
 354 NOTABLE is the default.
 355
 356 The FILE and END subcommands are as in @cmd{DATA LIST FIXED} above.
 357
 358 The variables to be parsed are given as a single list of variable names.
 359 This list must be introduced by a single slash (@samp{/}).  The set of
 360 variable names may contain format specifications in parentheses
 361 (@pxref{Input/Output Formats}).  Format specifications apply to all
 362 variables back to the previous parenthesized format specification.
 363
 364 In addition, an asterisk may be used to indicate that all variables
 365 preceding it are to have input/output format @samp{F8.0}.
 366
 367 Specified field widths are ignored on input, although all normal limits
 368 on field width apply, but they are honored on output.
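
The following sketch shows default free-format input; the variable
names, formats, and data are assumptions chosen for illustration:

@example
DATA LIST FREE /NAME (A10) AGE HEIGHT (F8.2).
BEGIN DATA.
'Ann Lee' 34 1.62 Bob
41 1.80
END DATA.
LIST.
@end example

The @samp{(A10)} specification applies to @code{NAME} only, and
@samp{(F8.2)} applies to both @code{AGE} and @code{HEIGHT}.  The first
field is quoted because it contains a space, and the second case is
split across two lines, which free format permits.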
 369
 370 @node DATA LIST LIST,  , DATA LIST FREE, DATA LIST
 371 @subsection DATA LIST LIST
 372 @vindex DATA LIST LIST
 373
 374 @display
 375 DATA LIST LIST
 376         [(@{TAB,'c'@}, @dots{})]
 377         [@{NOTABLE,TABLE@}]
 378         FILE='filename'
 379         END=end_var
 380         /var_spec@dots{}
 381
 382 where each var_spec takes one of the forms
 383         var_list [(type_spec)]
 384         var_list *
 385 @end display
 386
 387 With one exception, @cmd{DATA LIST LIST} is syntactically and
 388 semantically equivalent to @cmd{DATA LIST FREE}.  The exception is
 389 that each input line is expected to correspond to exactly one input
 390 record.  If more or fewer fields are found on an input line than
 391 expected, an appropriate diagnostic is issued.
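
As a small sketch with hypothetical variables and data, the last data
line below is deliberately one field short:

@example
DATA LIST LIST NOTABLE /ID AGE SCORE.
BEGIN DATA.
1 34 87
2 41 92
3 29
END DATA.
LIST.
@end example

Each of the first two lines supplies exactly one case.  The third line
has only two fields where three are expected, so PSPP should issue a
diagnostic for it.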
 392
 393 @node END CASE, END FILE, DATA LIST, Data Input and Output
 394 @section END CASE
 395 @vindex END CASE
 396
 397 @display
 398 END CASE.
 399 @end display
 400
 401 @cmd{END CASE} is used only within @cmd{INPUT PROGRAM} to output the
 402 current case.  @xref{INPUT PROGRAM}, for details.
 403
 404 @node END FILE, FILE HANDLE, END CASE, Data Input and Output
 405 @section END FILE
 406 @vindex END FILE
 407
 408 @display
 409 END FILE.
 410 @end display
 411
 412 @cmd{END FILE} is used only within @cmd{INPUT PROGRAM} to terminate
 413 the current input program.  @xref{INPUT PROGRAM}.
 414
 415 @node FILE HANDLE, INPUT PROGRAM, END FILE, Data Input and Output
 416 @section FILE HANDLE
 417 @vindex FILE HANDLE
 418
 419 @display
 420 FILE HANDLE handle_name
 421         /NAME='filename'
 422         /MODE=@{CHARACTER,IMAGE@}
 423         /LRECL=rec_len
 424         /TABWIDTH=tab_width
 425 @end display
 426
 427 Use @cmd{FILE HANDLE} to associate a file handle name with a file and
 428 its attributes, so that later commands can refer to the file by its
 429 handle name.  Because names of text files can be specified directly on
 430 commands that access files, @cmd{FILE HANDLE} is only needed when a
 431 file is not an ordinary file containing lines of text.  However,
 432 @cmd{FILE HANDLE} may be used even for text files, and it may be
 433 easier to specify a file's name once and later refer to it by an
 434 abstract handle.
 435
 436 Specify the file handle name as an identifier.  Any given identifier may
 437 only appear once in a PSPP run.  File handles may not be reassigned to a
 438 different file.  The file handle name must immediately follow the @cmd{FILE
 439 HANDLE} command name.
 440
 441 The NAME subcommand specifies the name of the file associated with the
 442 handle.  It is the only required subcommand.
 443
 444 MODE specifies a file mode.  In CHARACTER mode, the default, the data
 445 file is opened in ANSI C text mode, so that local end of line
 446 conventions are followed, and each text line is read as one record.
 447 In CHARACTER mode, most input programs will expand tabs to spaces
 448 (@cmd{DATA LIST FREE} with explicitly specified delimiters is an
 449 exception).  By default, each tab is 4 characters wide, but an
 450 alternate width may be specified on TABWIDTH.  A tab width of 0
 451 suppresses tab expansion entirely.
 452
 453 By contrast, in IMAGE mode, the data file is opened in ANSI C binary
 454 mode and records are a fixed length.  In IMAGE mode, LRECL specifies
 455 the record length in bytes, with a default of 1024.  Tab characters
 456 are never expanded to spaces in IMAGE mode.
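
The sketch below shows both kinds of handle.  The handle names
@code{SURVEY} and @code{RAW}, the file names, and the variable layout
are all hypothetical; the second command uses the @code{IMAGE} keyword
from the syntax summary above:

@example
FILE HANDLE SURVEY /NAME='survey.dat' /TABWIDTH=8.
DATA LIST FILE=SURVEY NOTABLE /ID 1-5 SCORE 7-9.
LIST.

FILE HANDLE RAW /NAME='records.bin' /MODE=IMAGE /LRECL=80.
@end example

The first handle describes an ordinary text file whose tabs are
expanded to 8 columns.  The second describes a file of fixed-length
80-byte records.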
 457
 458 @node INPUT PROGRAM, LIST, FILE HANDLE, Data Input and Output
 459 @section INPUT PROGRAM
 460 @vindex INPUT PROGRAM
 461
 462 @display
 463 INPUT PROGRAM.
 464 @dots{} input commands @dots{}
 465 END INPUT PROGRAM.
 466 @end display
 467
 468 @cmd{INPUT PROGRAM}@dots{}@cmd{END INPUT PROGRAM} specifies a
 469 complex input program.  By placing data input commands within @cmd{INPUT
 470 PROGRAM}, PSPP programs can take advantage of more complex file
 471 structures than available with only @cmd{DATA LIST}.
 472
 473 The first sort of extended input program is to simply put multiple @cmd{DATA
 474 LIST} commands within the @cmd{INPUT PROGRAM}.  This will cause all of
 475 the data
 476 files to be read in parallel.  Input will stop when end of file is
 477 reached on any of the data files.
 478
 479 Transformations, such as conditional and looping constructs, can also be
 480 included within @cmd{INPUT PROGRAM}.  These can be used to combine input
 481 from several data files in more complex ways.  However, input will still
 482 stop when end of file is reached on any of the data files.
 483
 484 To prevent @cmd{INPUT PROGRAM} from terminating at the first end of
 485 file, use
 486 the END subcommand on @cmd{DATA LIST}.  This subcommand takes a
 487 variable name,
 488 which should be a numeric scratch variable (@pxref{Scratch Variables}).
 489 (It need not be a scratch variable but otherwise the results can be
 490 surprising.)  The value of this variable is set to 0 when reading the
 491 data file, or 1 when end of file is encountered.
 492
 493 Two additional commands are useful in conjunction with @cmd{INPUT PROGRAM}.
 494 @cmd{END CASE} is the first.  Normally each loop through the
 495 @cmd{INPUT PROGRAM}
 496 structure produces one case.  @cmd{END CASE} controls exactly
 497 when cases are output.  When @cmd{END CASE} is used, looping from the end of
 498 @cmd{INPUT PROGRAM} to the beginning does not cause a case to be output.
 499
 500 @cmd{END FILE} is the second.  When the END subcommand is used on @cmd{DATA
 501 LIST}, there is no way for the @cmd{INPUT PROGRAM} construct to stop
 502 looping,
 503 so an infinite loop results.  @cmd{END FILE}, when executed,
 504 stops the flow of input data and passes out of the @cmd{INPUT PROGRAM}
 505 structure.
 506
 507 All this is very confusing.  A few examples should help to clarify.
 508
 509 @example
 510 INPUT PROGRAM.
 511         DATA LIST NOTABLE FILE='a.data'/X 1-10.
 512         DATA LIST NOTABLE FILE='b.data'/Y 1-10.
 513 END INPUT PROGRAM.
 514 LIST.
 515 @end example
 516
 517 The example above reads variable X from file @file{a.data} and variable
 518 Y from file @file{b.data}.  If one file is shorter than the other then
 519 the extra data in the longer file is ignored.
 520
 521 @example
 522 INPUT PROGRAM.
 523         NUMERIC #A #B.
 524
 525         DO IF NOT #A.
 526                 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
 527         END IF.
 528         DO IF NOT #B.
 529                 DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
 530         END IF.
 531         DO IF #A AND #B.
 532                 END FILE.
 533         END IF.
 534         END CASE.
 535 END INPUT PROGRAM.
 536 LIST.
 537 @end example
 538
 539 The above example reads variable X from @file{a.data} and variable Y from
 540 @file{b.data}.  If one file is shorter than the other then the missing
 541 field is set to the system-missing value alongside the present value for
 542 the remaining length of the longer file.
 543
 544 @example
 545 INPUT PROGRAM.
 546         NUMERIC #A #B.
 547
 548         DO IF #A.
 549                 DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
 550                 DO IF #B.
 551                         END FILE.
 552                 ELSE.
 553                         END CASE.
 554                 END IF.
 555         ELSE.
 556                 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
 557                 DO IF NOT #A.
 558                         END CASE.
 559                 END IF.
 560         END IF.
 561 END INPUT PROGRAM.
 562 LIST.
 563 @end example
 564
 565 The above example reads data from file @file{a.data}, then from
 566 @file{b.data}, and concatenates them into a single active file.
 567
 568 @example
 569 INPUT PROGRAM.
 570         NUMERIC #EOF.
 571
 572         LOOP IF NOT #EOF.
 573                 DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
 574                 DO IF NOT #EOF.
 575                         END CASE.
 576                 END IF.
 577         END LOOP.
 578
 579         COMPUTE #EOF = 0.
 580         LOOP IF NOT #EOF.
 581                 DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
 582                 DO IF NOT #EOF.
 583                         END CASE.
 584                 END IF.
 585         END LOOP.
 586
 587         END FILE.
 588 END INPUT PROGRAM.
 589 LIST.
 590 @end example
 591
 592 The above example does the same thing as the previous example, in a
 593 different way.
 594
 595 @example
 596 INPUT PROGRAM.
 597         LOOP #I=1 TO 50.
 598                 COMPUTE X=UNIFORM(10).
 599                 END CASE.
 600         END LOOP.
 601         END FILE.
 602 END INPUT PROGRAM.
 603 LIST/FORMAT=NUMBERED.
 604 @end example
 605
 606 The above example causes an active file to be created consisting of 50
 607 random variates between 0 and 10.
 608
 609 @node LIST, MATRIX DATA, INPUT PROGRAM, Data Input and Output
 610 @section LIST
 611 @vindex LIST
 612
 613 @display
 614 LIST
 615         /VARIABLES=var_list
 616         /CASES=FROM start_index TO end_index BY incr_index
 617         /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@}
 618                 @{NOWEIGHT,WEIGHT@}
 619 @end display
 620
 621 The @cmd{LIST} procedure prints the values of specified variables to the
 622 listing file.
 623
 624 The VARIABLES subcommand specifies the variables whose values are to be
 625 printed.  Keyword VARIABLES is optional.  If the VARIABLES subcommand is
 626 not specified then all variables in the active file are printed.
 627
 628 The CASES subcommand can be used to specify a subset of cases to be
 629 printed.  Specify FROM and the case number of the first case to print,
 630 TO and the case number of the last case to print, and BY and the number
 631 of cases to advance between printing cases, or any subset of those
 632 settings.  If CASES is not specified then all cases are printed.
 633
 634 The FORMAT subcommand can be used to change the output format.  NUMBERED
 635 will print case numbers along with each case; UNNUMBERED, the default,
 636 causes the case numbers to be omitted.  The WRAP and SINGLE settings are
 637 currently not used.  WEIGHT will cause case weights to be printed along
 638 with variable values; NOWEIGHT, the default, causes case weights to be
 639 omitted from the output.
 640
 641 Case numbers start from 1.  They are counted after all transformations
 642 have been considered.
 643
 644 @cmd{LIST} attempts to fit all the values on a single line.  If needed
 645 to make them fit, variable names are displayed vertically.  If values
 646 cannot fit on a single line, then a multi-line format will be used.
 647
 648 @cmd{LIST} is a procedure.  It causes the data to be read.
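
For example, assuming the active file contains variables named
@code{NAME} and @code{AGE} (hypothetical names), the subcommands can be
combined like this:

@example
LIST /VARIABLES=NAME AGE
     /CASES=FROM 1 TO 10 BY 2
     /FORMAT=NUMBERED.
@end example

This should print @code{NAME} and @code{AGE} for every second case from
case 1 through case 10, with the case number shown alongside each case.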
 649
 650 @node MATRIX DATA, NEW FILE, LIST, Data Input and Output
 651 @section MATRIX DATA
 652 @vindex MATRIX DATA
 653
 654 @display
 655 MATRIX DATA
 656         /VARIABLES=var_list
 657         /FILE='filename'
 658         /FORMAT=@{LIST,FREE@} @{LOWER,UPPER,FULL@} @{DIAGONAL,NODIAGONAL@}
 659         /SPLIT=@{new_var,var_list@}
 660         /FACTORS=var_list
 661         /CELLS=n_cells
 662         /N=n
 663         /CONTENTS=@{N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
 664                    DFE,MAT,COV,CORR,PROX@}
 665 @end display
 666
 667 The @cmd{MATRIX DATA} command reads square matrices in one of several
 668 textual formats.  @cmd{MATRIX DATA} clears the dictionary, replaces it
 669 with a new one, and reads a data file.
 671
 672 Use VARIABLES to specify the variables that form the rows and columns of
 673 the matrices.  You may not specify a variable named @code{VARNAME_}.  You
 674 should specify VARIABLES first.
 675
 676 Specify the file to read on FILE, either as a file name string or a file
 677 handle (@pxref{FILE HANDLE}).  If FILE is not specified then matrix data
 678 must immediately follow @cmd{MATRIX DATA} with a @cmd{BEGIN
 679 DATA}@dots{}@cmd{END DATA}
 680 construct (@pxref{BEGIN DATA}).
 681
 682 The FORMAT subcommand specifies how the matrices are formatted.  LIST,
 683 the default, indicates that there is one line per row of matrix data;
 684 FREE allows single matrix rows to be broken across multiple lines.  This
 685 is analogous to the difference between @cmd{DATA LIST FREE} and
 686 @cmd{DATA LIST LIST}
 687 (@pxref{DATA LIST}).  LOWER, the default, indicates that the lower
 688 triangle of the matrix is given; UPPER indicates the upper triangle; and
 689 FULL indicates that the entire matrix is given.  DIAGONAL, the default,
 690 indicates that the diagonal is part of the data; NODIAGONAL indicates
 691 that it is omitted.  DIAGONAL/NODIAGONAL have no effect when FULL is
 692 specified.
 693
 694 The SPLIT subcommand is used to specify @cmd{SPLIT FILE} variables for the
 695 input matrices (@pxref{SPLIT FILE}).  Specify either a single variable
 696 not specified on VARIABLES, or one or more variables that are specified
 697 on VARIABLES.  In the former case, the SPLIT values are not present in
 698 the data and ROWTYPE_ may not be specified on VARIABLES.  In the latter
 699 case, the SPLIT values are present in the data.
 700
 701 Specify a list of factor variables on FACTORS.  Factor variables must
 702 also be listed on VARIABLES.  Factor variables are used when there are
 703 some variables where, for each possible combination of their values,
 704 statistics on the matrix variables are included in the data.
 705
 706 If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the
 707 CELLS subcommand is required.  Specify the number of factor variable
 708 combinations that are given.  For instance, if factor variable A has 2
 709 values and factor variable B has 3 values, specify 6.
 710
 711 The N subcommand specifies a population number of observations.  When N
 712 is specified, one N record is output for each @cmd{SPLIT FILE}.
 713
 714 Use CONTENTS to specify what sort of information the matrices include.
 715 Each possible option is described in more detail below.  When ROWTYPE_
 716 is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS
 717 is not specified then /CONTENTS=CORR is assumed.
 718
 719 @table @asis
 720 @item N
 721 @itemx N_VECTOR
 722 Number of observations as a vector, one value for each variable.
 723 @item N_SCALAR
 724 Number of observations as a single value.
 725 @item N_MATRIX
 726 Matrix of counts.
 727 @item MEAN
 728 Vector of means.
 729 @item STDDEV
 730 Vector of standard deviations.
 731 @item COUNT
 732 Vector of counts.
 733 @item MSE
 734 Vector of mean squared errors.
 735 @item DFE
 736 Vector of degrees of freedom.
 737 @item MAT
 738 Generic matrix.
 739 @item COV
 740 Covariance matrix.
 741 @item CORR
 742 Correlation matrix.
 743 @item PROX
 744 Proximities matrix.
 745 @end table
 746
 747 The exact semantics of the matrices read by @cmd{MATRIX DATA} are complex.
 748 Right now @cmd{MATRIX DATA} isn't too useful due to a lack of procedures
 749 accepting or producing related data, so these semantics aren't
 750 documented.  Later, they'll be described here in detail.
 751
 752 @node NEW FILE, PRINT, MATRIX DATA, Data Input and Output
 753 @section NEW FILE
 754 @vindex NEW FILE
 755
 756 @display
 757 NEW FILE.
 758 @end display
 759
 760 The @cmd{NEW FILE} command clears the current active file.
 761
 762 @node PRINT, PRINT EJECT, NEW FILE, Data Input and Output
 763 @section PRINT
 764 @vindex PRINT
 765
 766 @display
 767 PRINT
 768         OUTFILE='filename'
 769         RECORDS=n_lines
 770         @{NOTABLE,TABLE@}
 771         /[line_no] arg@dots{}
 772
 773 arg takes one of the following forms:
 774         'string' [start-end]
 775         var_list start-end [type_spec]
 776         var_list (fortran_spec)
 777         var_list *
 778 @end display
 779
 780 The @cmd{PRINT} transformation writes variable data to an output file.
 781 @cmd{PRINT} is executed when a procedure causes the data to be read.
 782 Follow @cmd{PRINT} by @cmd{EXECUTE} to print variable data without
 783 invoking a procedure (@pxref{EXECUTE}).
 784
 785 All @cmd{PRINT} subcommands are optional.
 786
 787 The OUTFILE subcommand specifies the file to receive the output.  The
 788 file may be a file name as a string or a file handle (@pxref{FILE
 789 HANDLE}).  If OUTFILE is not present then output will be sent to PSPP's
 790 output listing file.
 791
 792 The RECORDS subcommand specifies the number of lines to be output.  The
 793 number of lines may optionally be surrounded by parentheses.
 794
 795 TABLE will cause the PRINT command to output a table to the listing file
 796 that describes what it will print to the output file.  NOTABLE, the
 797 default, suppresses this output table.
 798
 799 Introduce the strings and variables to be printed with a slash
 800 (@samp{/}).  Optionally, the slash may be followed by a number
 801 indicating which output line will be specified.  In the absence of this
 802 line number, the next line number will be specified.  Multiple lines may
 803 be specified using multiple slashes with the intended output for a line
 804 following its respective slash.
 805
 806 Literal strings may be printed.  Specify the string itself.  Optionally
 807 the string may be followed by a column number or range of column
 808 numbers, specifying the location on the line for the string to be
 809 printed.  Otherwise, the string will be printed at the current position
 810 on the line.
 811
 812 Variables to be printed can be specified in the same ways as available
 813 for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}).  In addition, a
 814 variable
 815 list may be followed by an asterisk (@samp{*}), which indicates that the
 816 variables should be printed in their dictionary print formats, separated
 817 by spaces.  A variable list followed by a slash or the end of command
 818 will be interpreted the same way.
 819
 820 If a FORTRAN type specification is used to move backwards on the current
 821 line and text is then written at that point, the line is truncated to
 822 that length, although any text added afterward extends the line
 823 again.
 824
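As a minimal sketch of the subcommands described above, the following
writes two lines per case to a file.  The file name @file{report.txt}
and the variables @code{name} (assumed to be a string variable of width
20), @code{age}, and @code{height} are hypothetical and are assumed to
be defined already.

@example
PRINT OUTFILE='report.txt' RECORDS=2
        /1 'Respondent:' 1-11 name 13-32 (A)
        /2 age height *.
EXECUTE.
@end example

The first line places a literal string in columns 1 through 11 and
@code{name} in columns 13 through 32.  The second line prints
@code{age} and @code{height} in their dictionary print formats,
separated by spaces.
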
 825 @node PRINT EJECT, PRINT SPACE, PRINT, Data Input and Output
 826 @section PRINT EJECT
 827 @vindex PRINT EJECT
 828
 829 @display
 830 PRINT EJECT
 831         OUTFILE='filename'
 832         RECORDS=n_lines
 833         @{NOTABLE,TABLE@}
 834         /[line_no] arg@dots{}
 835
 836 arg takes one of the following forms:
 837         'string' [start-end]
 838         var_list start-end [type_spec]
 839         var_list (fortran_spec)
 840         var_list *
 841 @end display
 842
 843 @cmd{PRINT EJECT} writes data to an output file.  Before the data is
 844 written, the current page in the listing file is ejected.
 845
 846 @xref{PRINT}, for more information on syntax and usage.
 847
 848 @node PRINT SPACE, REREAD, PRINT EJECT, Data Input and Output
 849 @section PRINT SPACE
 850 @vindex PRINT SPACE
 851
 852 @display
 853 PRINT SPACE OUTFILE='filename' n_lines.
 854 @end display
 855
 856 @cmd{PRINT SPACE} prints one or more blank lines to an output file.
 857
 858 The OUTFILE subcommand is optional.  It may be used to direct output to
 859 a file specified by file name as a string or file handle (@pxref{FILE
 860 HANDLE}).  If OUTFILE is not specified then output will be directed to
 861 the listing file.
 862
 863 n_lines is also optional.  If present, it is an expression
 864 (@pxref{Expressions}) specifying the number of blank lines to be
 865 printed.  The expression must evaluate to a nonnegative value.
 866
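For illustration, the following sketch prints one line per case to the
listing file, followed by two blank lines.  The variables @code{name}
and @code{age} are hypothetical.  @cmd{EXECUTE} is included on the
assumption that @cmd{PRINT SPACE}, like @cmd{PRINT}, takes effect only
when the data is read.

@example
PRINT /name age.
PRINT SPACE 2.
EXECUTE.
@end example
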
 867 @node REREAD, REPEATING DATA, PRINT SPACE, Data Input and Output
 868 @section REREAD
 869 @vindex REREAD
 870
 871 @display
 872 REREAD FILE=handle COLUMN=column.
 873 @end display
 874
 875 The @cmd{REREAD} transformation allows the previous input line in a
 876 data file
 877 already processed by @cmd{DATA LIST} or another input command to be re-read
 878 for further processing.
 879
 880 The FILE subcommand, which is optional, is used to specify the file to
 881 have its line re-read.  The file must be specified in the form of a file
 882 handle (@pxref{FILE HANDLE}).  If FILE is not specified then the last
 883 file specified on @cmd{DATA LIST} will be assumed (last file specified
 884 lexically, not in terms of flow-of-control).
 885
 886 By default, the entire line is re-read.  With the
 887 COLUMN subcommand, a prefix of the line can be exempted from
 888 re-reading.  Specify an expression (@pxref{Expressions}) evaluating to
 889 the first column that should be included in the re-read line.  Columns
 890 are numbered from 1 at the left margin.
 891
 892 Issuing @cmd{REREAD} multiple times will not back up in the data
 893 file.  Instead, it will re-read the same line multiple times.
 894
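One plausible use is to read a record type code and then re-read the
same line with a layout chosen by that code.  The following is a sketch
only: the variable names and column layout are hypothetical, and the
way the data is supplied is left out.

@example
INPUT PROGRAM.
DATA LIST /rectype 1 (A).
REREAD.
DO IF rectype = 'P'.
        DATA LIST /name 3-12 (A).
ELSE.
        DATA LIST /amount 3-10.
END IF.
END INPUT PROGRAM.
@end example

Because FILE is omitted, @cmd{REREAD} applies to the file named on the
preceding @cmd{DATA LIST}.
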
 895 @node REPEATING DATA, WRITE, REREAD, Data Input and Output
 896 @section REPEATING DATA
 897 @vindex REPEATING DATA
 898
 899 @display
 900 REPEATING DATA
 901         /STARTS=start-end
 902         /OCCURS=n_occurs
 903         /FILE='filename'
 904         /LENGTH=length
 905         /CONTINUED[=cont_start-cont_end]
 906         /ID=id_start-id_end=id_var
 907         /@{TABLE,NOTABLE@}
 908         /DATA=var_spec@dots{}
 909
 910 where each var_spec takes one of the forms
 911         var_list start-end [type_spec]
 912         var_list (fortran_spec)
 913 @end display
 914
 915 @cmd{REPEATING DATA} parses groups of data repeating in
 916 a uniform format, possibly with several groups on a single line.  Each
 917 group of data corresponds with one case.  @cmd{REPEATING DATA} may only be
 918 used within an @cmd{INPUT PROGRAM} structure (@pxref{INPUT PROGRAM}).
 919 When used with @cmd{DATA LIST}, it
 920 can be used to parse groups of cases that share a subset of variables
 921 but differ in their other data.
 922
 923 The STARTS subcommand is required.  Specify a range of columns, using
 924 literal numbers or numeric variable names.  This range specifies the
 925 columns on the first line that are used to contain groups of data.  The
 926 ending column is optional.  If it is not specified, then the record
 927 width of the input file is used.  For the inline file (@pxref{BEGIN
 928 DATA}) this is 80 columns; for a file with fixed record widths it is the
 929 record width; for other files it is 1024 characters by default.
 930
 931 The OCCURS subcommand is required.  It must be a number or the name of a
 932 numeric variable.  Its value is the number of groups present in the
 933 current record.
 934
 935 The DATA subcommand is required.  It must be the last subcommand
 936 specified.  It is used to specify the data present within each repeating
 937 group.  Column numbers are specified relative to the beginning of a
 938 group at column 1.  Data is specified in the same way as with @cmd{DATA LIST
 939 FIXED} (@pxref{DATA LIST FIXED}).
 940
 941 All other subcommands are optional.
 942
 943 FILE specifies the file to read, either a file name as a string or a
 944 file handle (@pxref{FILE HANDLE}).  If FILE is not present then the
 945 default is the last file handle used on @cmd{DATA LIST} (lexically, not in
 946 terms of flow of control).
 947
 948 By default @cmd{REPEATING DATA} will output a table describing how it will
 949 parse the input data.  Specifying NOTABLE will disable this behavior;
 950 specifying TABLE will explicitly enable it.
 951
 952 The LENGTH subcommand specifies the length in characters of each group.
 953 If it is not present then length is inferred from the DATA subcommand.
 954 LENGTH can be a number or a variable name.
 955
 956 Normally all the data groups are expected to be present on a single
 957 line.  Use the CONTINUED subcommand to indicate that data can be continued
 958 onto additional lines.  If data on continuation lines starts at the left
 959 margin and continues through the entire field width, no column
 960 specifications are necessary on CONTINUED.  Otherwise, specify the
 961 possible range of columns in the same way as on STARTS.
 962
 963 When data groups are continued from line to line, it is easy
 964 for cases to get out of sync through careless hand editing.  The
 965 ID subcommand allows a case identifier to be present on each line of
 966 repeating data groups.  @cmd{REPEATING DATA} will check for the same
 967 identifier on each line and report mismatches.  Specify the range of
 968 columns that the identifier will occupy, followed by an equals sign
 969 (@samp{=}) and the identifier variable name.  The variable must already
 970 have been declared with @cmd{NUMERIC} or another command.
 971
 972 @cmd{REPEATING DATA} should be the last command given within an
 973 @cmd{INPUT PROGRAM}.  It should not be enclosed within a @cmd{LOOP}
 974 structure (@pxref{LOOP}).  Use @cmd{DATA LIST} before, not after,
 975 @cmd{REPEATING DATA}.
 976
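The following sketch shows one way the required subcommands might fit
together.  The variable names, column layout, and data are hypothetical.
Each record holds an order number, a count of items, and then that many
item groups of 8 columns each, starting at column 9.

@example
INPUT PROGRAM.
DATA LIST /order_id 1-4 n_items 6-7.
REPEATING DATA
        /STARTS=9
        /OCCURS=n_items
        /LENGTH=8
        /DATA=item 1-4 qty 6-7.
END INPUT PROGRAM.
LIST.
BEGIN DATA.
1001  2 0100 03 0200 01
1002  1 0300 05
END DATA.
@end example

With this input the intent is to produce three cases: two for the first
record and one for the second.  LENGTH is given explicitly so that each
group includes its trailing separator column.
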
 977 @node WRITE,  , REPEATING DATA, Data Input and Output
 978 @section WRITE
 979 @vindex WRITE
 980
 981 @display
 982 WRITE
 983         OUTFILE='filename'
 984         RECORDS=n_lines
 985         @{NOTABLE,TABLE@}
 986         /[line_no] arg@dots{}
 987
 988 arg takes one of the following forms:
 989         'string' [start-end]
 990         var_list start-end [type_spec]
 991         var_list (fortran_spec)
 992         var_list *
 993 @end display
 994
 995 @cmd{WRITE} writes text or binary data to an output file.
 996
 997 @xref{PRINT}, for more information on syntax and usage.  The main
 998 difference between @cmd{PRINT} and @cmd{WRITE} is that @cmd{WRITE}
 999 uses write formats by default, whereas @cmd{PRINT} uses print formats.
1000
1001 The sole additional difference is that if @cmd{WRITE} is used to send output
1002 to a binary file, carriage control characters will not be output.
1003 @xref{FILE HANDLE}, for information on how to declare a file as binary.
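
As a brief sketch, the following writes two variables for each case to
a file using their write formats.  The file name @file{export.dat} and
the variables @code{id} and @code{score} are hypothetical.
@cmd{EXECUTE} is included on the assumption that @cmd{WRITE}, like
@cmd{PRINT}, takes effect only when the data is read.

@example
WRITE OUTFILE='export.dat' /id score.
EXECUTE.
@end example
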
1004 @setfilename ignored