1 This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.
4 * PSPP: (pspp). Statistical analysis package.
7 PSPP, for statistical analysis of sampled data, by Ben Pfaff.
9 This file documents PSPP, a statistical package for analysis of
10 sampled data that uses a command language compatible with SPSS.
12 Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.
14 This version of the PSPP documentation is consistent with version 2
17 Permission is granted to make and distribute verbatim copies of this
18 manual provided the copyright notice and this permission notice are
19 preserved on all copies.
21 Permission is granted to copy and distribute modified versions of
22 this manual under the conditions for verbatim copying, provided that the
23 entire resulting derived work is distributed under the terms of a
24 permission notice identical to this one.
26 Permission is granted to copy and distribute translations of this
27 manual into another language, under the above condition for modified
28 versions, except that this permission notice may be stated in a
29 translation approved by the Free Software Foundation.
32 File: pspp.info, Node: Date Extraction, Prev: Date Construction, Up: Time & Date
34 Functions that Examine Dates
35 ............................
These functions take numeric arguments in PSPP date or time format
and give numeric results. These names are used for arguments:

`DATE'
     A numeric value in PSPP date format.

`TIME'
     A numeric value in PSPP time format.

`TIME-OR-DATE'
     A numeric value in PSPP time or date format.
- Function: XDATE.DATE (TIME-OR-DATE)
For a time, results in the time corresponding to the number of
whole days TIME-OR-DATE includes. For a date, results in the date
corresponding to the latest midnight at or before TIME-OR-DATE;
that is, gives the date that TIME-OR-DATE is in. (XDATE.DATE(X)
is equivalent to TRUNC(X/86400)*86400.) Applying this function to
a time is a non-portable feature.

- Function: XDATE.HOUR (TIME-OR-DATE)
For a time, results in the number of whole hours beyond the number
of whole days represented by TIME-OR-DATE. For a date, results in
the hour (as an integer between 0 and 23) corresponding to
TIME-OR-DATE. (XDATE.HOUR(X) is equivalent to
MOD(TRUNC(X/3600),24).) Applying this function to a time is a
non-portable feature.
65 - Function: XDATE.JDAY(DATE)
66 Results in the day of the year (as an integer between 1 and 366)
67 corresponding to DATE.
69 - Function: XDATE.MDAY(DATE)
70 Results in the day of the month (as an integer between 1 and 31)
71 corresponding to DATE.
- Function: XDATE.MINUTE(TIME-OR-DATE)
Results in the number of minutes (as an integer between 0 and 59)
after the last hour in TIME-OR-DATE. (XDATE.MINUTE(X) is
equivalent to MOD(TRUNC(X/60),60).) Applying this function to a
time is a non-portable feature.
79 - Function: XDATE.MONTH(DATE)
80 Results in the month of the year (as an integer between 1 and 12)
81 corresponding to DATE.
83 - Function: XDATE.QUARTER(DATE)
84 Results in the quarter of the year (as an integer between 1 and 4)
85 corresponding to DATE.
- Function: XDATE.SECOND(TIME-OR-DATE)
Results in the number of whole seconds after the last whole minute
(as an integer between 0 and 59) in TIME-OR-DATE.
(XDATE.SECOND(X) is equivalent to MOD(X,60).) Applying this
function to a time is a non-portable feature.
93 - Function: XDATE.TDAY(TIME)
94 Results in the number of whole days (as an integer) in TIME.
95 (XDATE.TDAY(X) is equivalent to TRUNC(X/86400).)
- Function: XDATE.TIME(DATE)
Results in the time of day at the instant corresponding to DATE,
in PSPP time format. This is the number of seconds since midnight
on the day corresponding to DATE. (XDATE.TIME(X) is equivalent to
MOD(X,86400).)
103 - Function: XDATE.WEEK(DATE)
104 Results in the week of the year (as an integer between 1 and 53)
105 corresponding to DATE.
107 - Function: XDATE.WKDAY(DATE)
108 Results in the day of week (as an integer between 1 and 7)
109 corresponding to DATE. The days of the week are:
132 - Function: XDATE.YEAR (DATE)
133 Returns the year (as an integer between 1582 and 19999)
134 corresponding to DATE.
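The arithmetic equivalences quoted above for XDATE.DATE, XDATE.HOUR,
XDATE.MINUTE, XDATE.SECOND, and XDATE.TDAY can be checked outside PSPP.
The following is a minimal Python sketch, not PSPP syntax. It assumes
the argument is a nonnegative number of seconds, and the function names
are illustrative only.

```python
import math

SECONDS_PER_DAY = 86400


def xdate_date(x):
    """TRUNC(X/86400)*86400: the value truncated to a whole number of days."""
    return math.trunc(x / SECONDS_PER_DAY) * SECONDS_PER_DAY


def xdate_hour(x):
    """MOD(TRUNC(X/3600),24): whole hours past the last whole day."""
    return math.trunc(x / 3600) % 24


def xdate_minute(x):
    """MOD(TRUNC(X/60),60): whole minutes past the last whole hour."""
    return math.trunc(x / 60) % 60


def xdate_second(x):
    """MOD(X,60): seconds past the last whole minute."""
    return x % 60


def xdate_tday(x):
    """TRUNC(X/86400): number of whole days."""
    return math.trunc(x / SECONDS_PER_DAY)


# 2 days, 3 hours, 4 minutes, 5 seconds expressed in seconds.
sample = 2 * 86400 + 3 * 3600 + 4 * 60 + 5
parts = (xdate_tday(sample), xdate_hour(sample),
         xdate_minute(sample), xdate_second(sample))
```

For the sample value, `parts` holds the day, hour, minute, and second
components that were used to build it.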
137 File: pspp.info, Node: Miscellaneous Functions, Next: Functions Not Implemented, Prev: Time & Date, Up: Functions
139 Miscellaneous Functions
140 -----------------------
Miscellaneous functions take various arguments and produce various
results.
145 - Function: LAG (VARIABLE)
146 VARIABLE must be a numeric or string variable name. `LAG' results
147 in the value of that variable for the case before the current one.
148 In case-selection procedures, `LAG' results in the value of the
149 variable for the last case selected. Results in system-missing
150 (for numeric variables) or blanks (for string variables) for the
151 first case or before any cases are selected.
153 - Function: LAG (VARIABLE, NCASES)
154 VARIABLE must be a numeric or string variable name. NCASES must
155 be a small positive constant integer, although there is no explicit
156 limit. (Use of a large value for NCASES will increase memory
consumption, since PSPP must keep NCASES cases in memory.) `LAG
(VARIABLE, NCASES)' results in the value of VARIABLE that is NCASES
cases before the case currently being processed. See `LAG (VARIABLE)'
160 above for more details.
162 - Function: YRMODA (YEAR, MONTH, DAY)
163 YEAR is a year between 0 and 199 or 1582 and 19999. MONTH is a
164 month between 1 and 12. DAY is a day between 1 and 31. If MONTH
165 or DAY is out-of-range, it changes the next higher unit. For
166 instance, a DAY of 0 refers to the last day of the previous month,
167 and a MONTH of 13 refers to the first month of the next year.
168 YEAR must be in range. If YEAR is between 0 and 199, 1900 is
169 added. YEAR, MONTH, and DAY must all be integers.
171 `YRMODA' results in the number of days between 15 Oct 1582 and the
172 date specified, plus one. The date passed to `YRMODA' must be on
173 or after 15 Oct 1582. 15 Oct 1582 has a value of 1.
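The day count that `YRMODA' describes can be sketched in Python, whose
`datetime.date' uses the proleptic Gregorian calendar. This is only an
illustration of the rules stated above, not PSPP's implementation: the
function name is hypothetical, the handling of MONTH values below 1 is
an assumption made by symmetry with the MONTH 13 case, and Python dates
stop at year 9999, so the upper part of the documented range is not
covered.

```python
from datetime import date, timedelta

# The day before 15 Oct 1582, so that 15 Oct 1582 itself counts as day 1.
_DAY_ZERO = date(1582, 10, 14)


def yrmoda(year, month, day):
    """Days between 15 Oct 1582 and the given date, plus one."""
    if 0 <= year <= 199:
        year += 1900                      # two- and three-digit years
    elif not 1582 <= year <= 19999:
        raise ValueError("YEAR out of range")
    # An out-of-range MONTH changes the year (assumption for MONTH < 1).
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    # An out-of-range DAY changes the month: DAY 0 is the last day of
    # the previous month.
    target = date(year, month, 1) + timedelta(days=day - 1)
    count = (target - _DAY_ZERO).days
    if count < 1:
        raise ValueError("date must be on or after 15 Oct 1582")
    return count
```

With this sketch, `yrmoda(1582, 10, 15)` is 1, and `yrmoda(2000, 3, 0)`
gives the same value as `yrmoda(2000, 2, 29)`.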
176 File: pspp.info, Node: Functions Not Implemented, Prev: Miscellaneous Functions, Up: Functions
178 Functions Not Implemented
179 -------------------------
181 These functions are not yet implemented and thus not yet documented,
197 File: pspp.info, Node: Order of Operations, Prev: Functions, Up: Expressions
202 The following table describes operator precedence. Smaller-numbered
203 levels in the table have higher precedence. Within a level, operations
204 are performed from left to right, except for level 2 (exponentiation),
205 where operations are performed from right to left. If an operator
206 appears in the table in two places (`-'), the first occurrence is
207 unary, the second is binary.
219 6. `EQ GE GT LE LT NE'
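The difference between left-to-right and right-to-left evaluation
matters only when an operator is chained. As a small illustration in
Python (not PSPP syntax; the helper names are hypothetical), the two
groupings of a chain of exponentiations can be compared like this:

```python
from functools import reduce


def power_right_to_left(operands):
    """Group a ** b ** c as a ** (b ** c), as described for level 2."""
    return reduce(lambda acc, base: base ** acc, reversed(operands))


def power_left_to_right(operands):
    """Group a ** b ** c as (a ** b) ** c, shown only for contrast."""
    return reduce(lambda acc, exponent: acc ** exponent, operands)


right = power_right_to_left([2, 3, 2])   # 2 ** (3 ** 2)
left = power_left_to_right([2, 3, 2])    # (2 ** 3) ** 2
```

Here `right` is 512 and `left` is 64, so the grouping rule changes the
result.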
224 File: pspp.info, Node: Data Input and Output, Next: System and Portable Files, Prev: Expressions, Up: Top
226 Data Input and Output
227 *********************
229 Data is the focus of the PSPP language. This chapter examines the
230 PSPP commands for defining variables and reading and writing data.
232 *Please note:* Data is not actually read until a procedure is
233 executed. These commands tell PSPP how to read data, but they do
234 not _cause_ PSPP to read data.
238 * BEGIN DATA:: Embed data within a syntax file.
239 * CLEAR TRANSFORMATIONS:: Clear pending transformations.
240 * DATA LIST:: Fundamental data reading command.
241 * END CASE:: Output the current case.
242 * END FILE:: Terminate the current input program.
243 * FILE HANDLE:: Support for fixed-length records.
244 * INPUT PROGRAM:: Support for complex input programs.
245 * LIST:: List cases in the active file.
246 * MATRIX DATA:: Read matrices in text format.
247 * NEW FILE:: Clear the active file and dictionary.
248 * PRINT:: Display values in print formats.
249 * PRINT EJECT:: Eject the current page then print.
250 * PRINT SPACE:: Print blank lines.
251 * REREAD:: Take another look at the previous input line.
252 * REPEATING DATA:: Multiple cases on a single line.
253 * WRITE:: Display values in write formats.
256 File: pspp.info, Node: BEGIN DATA, Next: CLEAR TRANSFORMATIONS, Prev: Data Input and Output, Up: Data Input and Output
265 BEGIN DATA and END DATA can be used to embed raw ASCII data in a PSPP
266 syntax file. DATA LIST or another input procedure must be used before
267 BEGIN DATA (*note DATA LIST::). BEGIN DATA and END DATA must be used
268 together. The END DATA command must appear by itself on a single line,
269 with no leading whitespace and exactly one space between the words
`END' and `DATA', followed immediately by the terminal dot, like this:

     END DATA.
275 File: pspp.info, Node: CLEAR TRANSFORMATIONS, Next: DATA LIST, Prev: BEGIN DATA, Up: Data Input and Output
277 CLEAR TRANSFORMATIONS
278 =====================
280 CLEAR TRANSFORMATIONS.
282 The CLEAR TRANSFORMATIONS command clears out all pending
283 transformations. It does not cancel the current input program. It is
284 valid only when PSPP is interactive, not in syntax files.
287 File: pspp.info, Node: DATA LIST, Next: END CASE, Prev: CLEAR TRANSFORMATIONS, Up: Data Input and Output
292 Used to read text or binary data, DATA LIST is the most fundamental
293 data-reading command. Even the more sophisticated input methods use
294 DATA LIST commands as a building block. Understanding DATA LIST is
295 important to understanding how to use PSPP to read your data files.
297 There are two major variants of DATA LIST, which are fixed format
298 and free format. In addition, free format has a minor variant, list
299 format, which is discussed in terms of its differences from vanilla
302 Each form of DATA LIST is described in detail below.
306 * DATA LIST FIXED:: Fixed columnar locations for data.
307 * DATA LIST FREE:: Any spacing you like.
308 * DATA LIST LIST:: Each case must be on a single line.
311 File: pspp.info, Node: DATA LIST FIXED, Next: DATA LIST FREE, Prev: DATA LIST, Up: DATA LIST
321 /[line_no] var_spec...
323 where each var_spec takes one of the forms
324 var_list start-end [type_spec]
325 var_list (fortran_spec)
327 DATA LIST FIXED is used to read data files that have values at fixed
328 positions on each line of single-line or multiline records. The
329 keyword FIXED is optional.
331 The FILE subcommand must be used if input is to be taken from an
332 external file. It may be used to specify a filename as a string or a
333 file handle (*note FILE HANDLE::). If the FILE subcommand is not used,
334 then input is assumed to be specified within the command file using
335 BEGIN DATA...END DATA (*note BEGIN DATA::).
337 The optional RECORDS subcommand, which takes a single integer as an
338 argument, is used to specify the number of lines per record. If RECORDS
339 is not specified, then the number of lines per record is calculated from
340 the list of variable specifications later in the DATA LIST command.
342 The END subcommand is only useful in conjunction with the INPUT
343 PROGRAM input procedure, and for that reason it is not discussed here
344 (*note INPUT PROGRAM::).
346 DATA LIST can optionally output a table describing how the data file
347 will be read. The TABLE subcommand enables this output, and NOTABLE
348 disables it. The default is to output the table.
350 The list of variables to be read from the data list must come last in
351 the DATA LIST command. Each line in the data record is introduced by a
352 slash (`/'). Optionally, a line number may follow the slash.
Any number of variable specifications may follow.
355 Each variable specification consists of a list of variable names
356 followed by a description of their location on the input line. Sets of
variables may be specified using DATA LIST's TO convention (*note Sets of
358 Variables::). There are two ways to specify the location of the
359 variable on the line: SPSS style and FORTRAN style.
361 With SPSS style, the starting column and ending column for the field
362 are specified after the variable name, separated by a dash (`-'). For
363 instance, the third through fifth columns on a line would be specified
364 `3-5'. By default, variables are considered to be in `F' format (*note
365 Input/Output Formats::). (This default can be changed; see *Note SET::
366 for more information.)
368 When using SPSS style, to use a variable format other than the
369 default, specify the format type in parentheses after the column
370 numbers. For instance, for alphanumeric `A' format, use `(A)'.
372 In addition, implied decimal places can be specified in parentheses
373 after the column numbers. As an example, suppose that a data file has a
374 field in which the characters `1234' should be interpreted as having
375 the value 12.34. Then this field has two implied decimal places, and
376 the corresponding specification would be `(2)'. If a field that has
377 implied decimal places contains a decimal point, then the implied
378 decimal places are not applied.
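The implied-decimal rule can be illustrated with a short Python sketch.
It is not PSPP code: the function name and its arguments are
hypothetical, columns are numbered from 1 as in the column
specifications above, and blank or malformed fields are not handled.

```python
def read_numeric_field(line, start, end, implied_decimals=0):
    """Read columns start..end (1-based, inclusive) as a number."""
    text = line[start - 1:end].strip()
    if "." in text:
        # An explicit decimal point overrides the implied decimal places.
        return float(text)
    return int(text) / 10 ** implied_decimals


with_implied = read_numeric_field("1234", 1, 4, implied_decimals=2)
with_point = read_numeric_field("12.3", 1, 4, implied_decimals=2)
```

Under these assumptions `with_implied` is 12.34, while `with_point`
stays at 12.3 because the field already contains a decimal point.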
380 Changing the variable format and adding implied decimal places can be
381 done together; for instance, `(N,5)'.
When using SPSS style, the input and output width of each variable is
computed from the field width. The field width must be evenly divisible
by the number of variables specified.
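As an illustration of how a column range is shared among several
variables, for instance in a specification such as `INFO1 TO INFO3
12-17', the following Python sketch divides the range into equal
fields. The function name is hypothetical and the error handling is an
assumption; PSPP's own diagnostics are not reproduced here.

```python
def split_columns(start, end, n_vars):
    """Divide columns start..end (1-based, inclusive) among n_vars fields."""
    width = end - start + 1
    if n_vars < 1 or width % n_vars != 0:
        raise ValueError("field width is not divisible by the variable count")
    each = width // n_vars
    return [(start + i * each, start + (i + 1) * each - 1)
            for i in range(n_vars)]


columns = split_columns(12, 17, 3)
```

Here `columns` is `[(12, 13), (14, 15), (16, 17)]`: three variables of
two columns each.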
387 FORTRAN style is an altogether different approach to specifying field
locations. With this approach, a list of variable input format
specifications, separated by commas, is placed after the variable names
inside parentheses. Each format specifier advances as many characters
391 into the input line as it uses.
393 In addition to the standard format specifiers (*note Input/Output
394 Formats::), FORTRAN style defines some extensions:
397 Advance the current column on this line by one character position.
400 Set the current column on this line to column X, with column
401 numbers considered to begin with 1 at the left margin.
404 Skip forward X lines in the current record, resetting the active
405 column to the left margin.
Any format specifier may be preceded by a number. This causes the
action of that format specifier to be repeated the specified
number of times.
413 Group the given specifiers together. This is most useful when
414 preceded by a repeat count. Groups may be nested arbitrarily.
416 FORTRAN and SPSS styles may be freely intermixed. SPSS style leaves
417 the active column immediately after the ending column specified. Record
418 motion using `NEWREC' in FORTRAN style also applies to later FORTRAN
423 * DATA LIST FIXED Examples:: Examples of DATA LIST FIXED.
426 File: pspp.info, Node: DATA LIST FIXED Examples, Prev: DATA LIST FIXED, Up: DATA LIST FIXED
431 1. DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
439 Defines the following variables:
441 * `NAME', a 10-character-wide long string variable, in columns 1
444 * `INFO1', a numeric variable, in columns 12 through 13.
446 * `INFO2', a numeric variable, in columns 14 through 15.
448 * `INFO3', a numeric variable, in columns 16 through 17.
450 The `BEGIN DATA'/`END DATA' commands cause three cases to be
453 Case NAME INFO1 INFO2 INFO3
454 1 John Smith 10 23 11
455 2 Bob Arnold 12 20 15
458 The `TABLE' keyword causes PSPP to print out a table describing
459 the four variables defined.
461 2. DAT LIS FIL="survey.dat"
462 /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
466 Defines the following variables:
468 * `ID', a numeric variable, in columns 1-5 of the first record.
470 * `NAME', a 30-character long string variable, in columns 7-36
473 * `SURNAME', a 30-character long string variable, in columns
474 38-67 of the first record.
476 * `MINITIAL', a 1-character short string variable, in column 69
479 * Fifty variables `Q01', `Q02', `Q03', ..., `Q49', `Q50', all
480 numeric, `Q01' in column 7, `Q02' in column 8, ..., `Q49' in
481 column 55, `Q50' in column 56, all in the second record.
483 Cases are separated by a blank record.
485 Data is read from file `survey.dat' in the current directory.
487 This example shows keywords abbreviated to their first 3 letters.
491 File: pspp.info, Node: DATA LIST FREE, Next: DATA LIST LIST, Prev: DATA LIST FIXED, Up: DATA LIST
502 where each var_spec takes one of the forms
503 var_list [(type_spec)]
506 In free format, the input data is structured as a series of comma- or
507 whitespace-delimited fields (end of line is one form of whitespace; it
508 is not treated specially). Field contents may be surrounded by matched
509 pairs of apostrophes (`'') or quotes (`"'), or they may be unenclosed.
For any type of field, leading whitespace (up to the apostrophe or
quote, if any) is not included in the field.
513 Multiple consecutive delimiters are equivalent to a single delimiter.
514 To specify an empty field, write an empty set of single or double
515 quotes; for instance, `""'.
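The field-splitting rules described above can be approximated in a few
lines of Python. This is a rough sketch rather than PSPP's actual
parser: the function name is hypothetical, and unterminated quotes are
silently skipped instead of being diagnosed.

```python
import re

# One alternative per field style. Commas and whitespace (including
# newlines) between matches are skipped, so runs of delimiters collapse.
_FIELD = re.compile(
    r"'([^']*)'"       # field enclosed in apostrophes
    r'|"([^"]*)"'      # field enclosed in double quotes
    r"|([^\s,'\"]+)"   # unenclosed field
)


def split_free_format(text):
    """Return the list of fields found in free-format input text."""
    fields = []
    for match in _FIELD.finditer(text):
        # Exactly one group is set for each match; an empty quoted
        # field yields the empty string.
        fields.append(next(g for g in match.groups() if g is not None))
    return fields


fields = split_free_format("1, 2  3,,\"\",'a b'\n4")
```

For this input, `fields` is `['1', '2', '3', '', 'a b', '4']`: the
doubled comma adds no field, and only the explicit `""' produces an
empty one.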
517 The NOTABLE and TABLE subcommands are as in DATA LIST FIXED above.
518 NOTABLE is the default.
520 The FILE and END subcommands are as in DATA LIST FIXED above.
522 The variables to be parsed are given as a single list of variable
523 names. This list must be introduced by a single slash (`/'). The set
524 of variable names may contain format specifications in parentheses
525 (*note Input/Output Formats::). Format specifications apply to all
526 variables back to the previous parenthesized format specification.
528 In addition, an asterisk may be used to indicate that all variables
529 preceding it are to have input/output format `F8.0'.
531 Specified field widths are ignored on input, although all normal
532 limits on field width apply, but they are honored on output.
535 File: pspp.info, Node: DATA LIST LIST, Prev: DATA LIST FREE, Up: DATA LIST
546 where each var_spec takes one of the forms
547 var_list [(type_spec)]
550 Syntactically and semantically, DATA LIST LIST is equivalent to DATA
551 LIST FREE, with one exception: each input line is expected to correspond
552 to exactly one input record. If more or fewer fields are found on an
553 input line than expected, an appropriate diagnostic is issued.
556 File: pspp.info, Node: END CASE, Next: END FILE, Prev: DATA LIST, Up: Data Input and Output
563 END CASE is used within INPUT PROGRAM to output the current case.
564 *Note INPUT PROGRAM::.
567 File: pspp.info, Node: END FILE, Next: FILE HANDLE, Prev: END CASE, Up: Data Input and Output
574 END FILE is used within INPUT PROGRAM to terminate the current input
575 program. *Note INPUT PROGRAM::.
578 File: pspp.info, Node: FILE HANDLE, Next: INPUT PROGRAM, Prev: END FILE, Up: Data Input and Output
583 FILE HANDLE handle_name
585 /RECFORM={VARIABLE,FIXED,SPANNED}
587 /MODE={CHARACTER,IMAGE,BINARY,MULTIPUNCH,360}
589 Use the FILE HANDLE command to define the attributes of a file that
590 does not use conventional variable-length records terminated by newline
593 Specify the file handle name as an identifier. Any given identifier
594 may only appear once in a PSPP run. File handles may not be reassigned
595 to a different file. The file handle name must immediately follow the
596 FILE HANDLE command name.
598 The NAME subcommand specifies the name of the file associated with
599 the handle. It is the only required subcommand.
601 The RECFORM subcommand specifies how the file is laid out. VARIABLE
602 specifies variable-length lines terminated with newlines, and it is the
603 default. FIXED specifies fixed-length records. SPANNED is not
606 LRECL specifies the length of fixed-length records. It is required
607 if `/RECFORM FIXED' is specified.
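To make the idea of fixed-length records concrete, here is a minimal
Python sketch that cuts a byte string into records of LRECL bytes. It
illustrates the layout only: the function name is hypothetical, and
keeping a short trailing record is an assumption, not documented PSPP
behavior.

```python
def fixed_records(data, lrecl):
    """Split bytes into consecutive records of lrecl bytes each."""
    if lrecl <= 0:
        raise ValueError("record length must be positive")
    return [data[i:i + lrecl] for i in range(0, len(data), lrecl)]


records = fixed_records(b"AAAABBBBCC", 4)
```

Here `records` is `[b'AAAA', b'BBBB', b'CC']`. No newline separates the
records; only the record length determines where each one ends.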
609 MODE specifies a file mode. CHARACTER, the default, causes the data
610 file to be opened in ANSI C text mode. BINARY causes the data file to
611 be opened in ANSI C binary mode. The other possibilities are not
615 File: pspp.info, Node: INPUT PROGRAM, Next: LIST, Prev: FILE HANDLE, Up: Data Input and Output
621 ... input commands ...
624 The INPUT PROGRAM...END INPUT PROGRAM construct is used to specify a
625 complex input program. By placing data input commands within INPUT
626 PROGRAM, PSPP programs can take advantage of more complex file
627 structures than available by using DATA LIST by itself.
629 The first sort of extended input program is to simply put multiple
630 DATA LIST commands within the INPUT PROGRAM. This will cause all of
631 the data files to be read in parallel. Input will stop when end of
632 file is reached on any of the data files.
634 Transformations, such as conditional and looping constructs, can
635 also be included within an INPUT PROGRAM. These can be used to combine
636 input from several data files in more complex ways. However, input
637 will still stop when end of file is reached on any of the data files.
639 To prevent INPUT PROGRAM from terminating at the first end of file,
640 use the END subcommand on DATA LIST. This subcommand takes a variable
641 name, which should be a numeric scratch variable (*note Scratch
642 Variables::). (It need not be a scratch variable but otherwise the
643 results can be surprising.) The value of this variable is set to 0
644 when reading the data file, or 1 when end of file is encountered.
646 Some additional commands are useful in conjunction with INPUT
647 PROGRAM. END CASE is the first one. Normally each loop through the
648 INPUT PROGRAM structure produces one case. But with END CASE you can
649 control exactly when cases are output. When END CASE is used, looping
650 from the end of INPUT PROGRAM to the beginning does not cause a case to
653 END FILE is the other command. When the END subcommand is used on
654 DATA LIST, there is no way for the INPUT PROGRAM construct to stop
655 looping, so an infinite loop results. The END FILE command, when
656 executed, stops the flow of input data and passes out of the INPUT
659 All this is very confusing. A few examples should help to clarify.
662 DATA LIST NOTABLE FILE='a.data'/X 1-10.
663 DATA LIST NOTABLE FILE='b.data'/Y 1-10.
667 The example above reads variable X from file `a.data' and variable Y
668 from file `b.data'. If one file is shorter than the other then the
669 extra data in the longer file is ignored.
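As a loose analogy outside PSPP, reading two files in parallel until
either one runs out behaves like Python's built-in `zip'. The sketch
below uses plain lists in place of the data files, and the function
name is hypothetical.

```python
def read_parallel(a_values, b_values):
    """Pair values case by case, stopping at the end of the shorter input."""
    return [{"X": x, "Y": y} for x, y in zip(a_values, b_values)]


cases = read_parallel([1, 2, 3], [10, 20])
```

Here `cases` holds two cases; the third value of the longer input is
ignored.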
675 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
678 DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
687 This example reads variable X from `a.data' and variable Y from
688 `b.data'. If one file is shorter than the other then the missing field
689 is set to the system-missing value alongside the present value for the
690 remaining length of the longer file.
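The same analogy extends to this case: continuing until both inputs are
exhausted resembles `itertools.zip_longest'. In the Python sketch
below, plain lists stand in for the data files, `None' stands in for
the system-missing value, and the function name is hypothetical.

```python
from itertools import zip_longest


def read_until_both_end(a_values, b_values):
    """Pair values case by case, padding the shorter input with None."""
    return [{"X": x, "Y": y}
            for x, y in zip_longest(a_values, b_values, fillvalue=None)]


padded = read_until_both_end([1, 2, 3], [10, 20])
```

Here `padded` holds three cases, and the last one has `None' for Y.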
696 DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
703 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
711 The above example reads data from file `a.data', then from `b.data',
712 and concatenates them into a single active file.
718 DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
726 DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
736 The above example does the same thing as the previous example, in a
741 COMPUTE X=UNIFORM(10).
746 LIST/FORMAT=NUMBERED.
748 The above example causes an active file to be created consisting of
749 50 random variates between 0 and 10.
752 File: pspp.info, Node: LIST, Next: MATRIX DATA, Prev: INPUT PROGRAM, Up: Data Input and Output
759 /CASES=FROM start_index TO end_index BY incr_index
760 /FORMAT={UNNUMBERED,NUMBERED} {WRAP,SINGLE}
763 The LIST procedure prints the values of specified variables to the
766 The VARIABLES subcommand specifies the variables whose values are to
be printed. Keyword VARIABLES is optional. If the VARIABLES subcommand
is not specified, then all variables in the active file are printed.
770 The CASES subcommand can be used to specify a subset of cases to be
771 printed. Specify FROM and the case number of the first case to print,
772 TO and the case number of the last case to print, and BY and the number
773 of cases to advance between printing cases, or any subset of those
774 settings. If CASES is not specified then all cases are printed.
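The FROM, TO, and BY settings select cases much as a slice does. The
following Python sketch shows the selection under the assumption that
TO is inclusive and that case numbers start from 1; the function and
parameter names are hypothetical.

```python
def select_cases(cases, first=1, last=None, by=1):
    """Select cases numbered first..last (1-based, inclusive), every by-th."""
    if first < 1 or by < 1:
        raise ValueError("FROM and BY must be positive")
    return cases[first - 1:last:by]


selected = select_cases(list(range(1, 11)), first=2, last=8, by=3)
```

With ten cases numbered 1 through 10, `selected` is `[2, 5, 8]`.
Omitting all three settings returns every case.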
776 The FORMAT subcommand can be used to change the output format.
777 NUMBERED will print case numbers along with each case; UNNUMBERED, the
778 default, causes the case numbers to be omitted. The WRAP and SINGLE
779 settings are currently not used. WEIGHT will cause case weights to be
780 printed along with variable values; NOWEIGHT, the default, causes case
781 weights to be omitted from the output.
783 Case numbers start from 1. They are counted after all
784 transformations have been considered.
LIST will attempt to fit all the values on a single line. If
necessary, variable names will be displayed vertically in order to fit.
If values cannot fit on a single line, then a multi-line format will be
used.
791 LIST is a procedure. It causes the data to be read.
794 File: pspp.info, Node: MATRIX DATA, Next: NEW FILE, Prev: LIST, Up: Data Input and Output
802 /FORMAT={LIST,FREE} {LOWER,UPPER,FULL} {DIAGONAL,NODIAGONAL}
803 /SPLIT={new_var,var_list}
807 /CONTENTS={N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
808 DFE,MAT,COV,CORR,PROX}
810 The MATRIX DATA command reads square matrices in one of several
811 textual formats. MATRIX DATA clears the dictionary and replaces it and
814 Use VARIABLES to specify the variables that form the rows and
815 columns of the matrices. You may not specify a variable named
816 VARNAME_. You should specify VARIABLES first.
818 Specify the file to read on FILE, either as a file name string or a
819 file handle (*note FILE HANDLE::). If FILE is not specified then
820 matrix data must immediately follow MATRIX DATA with a BEGIN DATA...END
821 DATA construct (*note BEGIN DATA::).
823 The FORMAT subcommand specifies how the matrices are formatted.
824 LIST, the default, indicates that there is one line per row of matrix
825 data; FREE allows single matrix rows to be broken across multiple
826 lines. This is analogous to the difference between DATA LIST FREE and
827 DATA LIST LIST (*note DATA LIST::). LOWER, the default, indicates that
828 the lower triangle of the matrix is given; UPPER indicates the upper
829 triangle; and FULL indicates that the entire matrix is given.
830 DIAGONAL, the default, indicates that the diagonal is part of the data;
831 NODIAGONAL indicates that it is omitted. DIAGONAL/NODIAGONAL have no
832 effect when FULL is specified.
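To show what LOWER with DIAGONAL or NODIAGONAL implies for the data, the
Python sketch below rebuilds a full matrix from its lower triangle. It
is an illustration only: it assumes the matrix is symmetric and the
rows are already parsed into numbers, the fill value of 1.0 for an
omitted diagonal is an assumption suited to correlation matrices, and
the function name is hypothetical.

```python
def full_from_lower(rows, diagonal=True, fill=1.0):
    """Expand lower-triangle rows into a full symmetric matrix."""
    n = len(rows) if diagonal else len(rows) + 1
    matrix = [[fill] * n for _ in range(n)]
    for k, row in enumerate(rows):
        i = k if diagonal else k + 1      # matrix row this input row fills
        expected = i + 1 if diagonal else i
        if len(row) != expected:
            raise ValueError("row %d has the wrong number of values" % (k + 1))
        for j, value in enumerate(row):
            matrix[i][j] = value
            matrix[j][i] = value          # mirror into the upper triangle
    return matrix


with_diag = full_from_lower([[1.0], [0.5, 1.0]])
without_diag = full_from_lower([[0.5], [0.3, 0.2]], diagonal=False)
```

The first call yields a 2 by 2 matrix; the second yields a 3 by 3
matrix whose diagonal is filled in with the assumed value.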
834 The SPLIT subcommand is used to specify SPLIT FILE variables for the
835 input matrices (*note SPLIT FILE::). Specify either a single variable
836 not specified on VARIABLES, or one or more variables that are specified
837 on VARIABLES. In the former case, the SPLIT values are not present in
838 the data and ROWTYPE_ may not be specified on VARIABLES. In the latter
839 case, the SPLIT values are present in the data.
841 Specify a list of factor variables on FACTORS. Factor variables must
842 also be listed on VARIABLES. Factor variables are used when there are
843 some variables where, for each possible combination of their values,
844 statistics on the matrix variables are included in the data.
846 If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES,
847 the CELLS subcommand is required. Specify the number of factor variable
848 combinations that are given. For instance, if factor variable A has 2
849 values and factor variable B has 3 values, specify 6.
851 The N subcommand specifies a population number of observations.
852 When N is specified, one N record is output for each SPLIT FILE.
854 Use CONTENTS to specify what sort of information the matrices
855 include. Each possible option is described in more detail below. When
856 ROWTYPE_ is specified on VARIABLES, CONTENTS is optional; otherwise, if
857 CONTENTS is not specified then /CONTENTS=CORR is assumed.
862 Number of observations as a vector, one value for each variable.
865 Number of observations as a single value.
874 Vector of standard deviations.
880 Vector of mean squared errors.
883 Vector of degrees of freedom.
897 The exact semantics of the matrices read by MATRIX DATA are complex.
898 Right now MATRIX DATA isn't too useful due to a lack of procedures
899 accepting or producing related data, so these semantics aren't
900 documented. Later, they'll be described here in detail.
903 File: pspp.info, Node: NEW FILE, Next: PRINT, Prev: MATRIX DATA, Up: Data Input and Output
910 The NEW FILE command clears the current active file.
913 File: pspp.info, Node: PRINT, Next: PRINT EJECT, Prev: NEW FILE, Up: Data Input and Output
924 arg takes one of the following forms:
926 var_list start-end [type_spec]
927 var_list (fortran_spec)
930 The PRINT transformation writes variable data to an output file.
931 PRINT is executed when a procedure causes the data to be read. In
932 order to execute the PRINT transformation without invoking a procedure,
933 use the EXECUTE command (*note EXECUTE::).
935 All PRINT subcommands are optional.
937 The OUTFILE subcommand specifies the file to receive the output. The
938 file may be a file name as a string or a file handle (*note FILE
939 HANDLE::). If OUTFILE is not present then output will be sent to PSPP's
942 The RECORDS subcommand specifies the number of lines to be output.
943 The number of lines may optionally be surrounded by parentheses.
945 TABLE will cause the PRINT command to output a table to the listing
946 file that describes what it will print to the output file. NOTABLE, the
947 default, suppresses this output table.
949 Introduce the strings and variables to be printed with a slash
950 (`/'). Optionally, the slash may be followed by a number indicating
951 which output line will be specified. In the absence of this line
952 number, the next line number will be specified. Multiple lines may be
953 specified using multiple slashes with the intended output for a line
954 following its respective slash.
956 Literal strings may be printed. Specify the string itself.
957 Optionally the string may be followed by a column number or range of
958 column numbers, specifying the location on the line for the string to be
959 printed. Otherwise, the string will be printed at the current position
962 Variables to be printed can be specified in the same ways as
963 available for DATA LIST FIXED (*note DATA LIST FIXED::). In addition,
964 a variable list may be followed by an asterisk (`*'), which indicates
965 that the variables should be printed in their dictionary print formats,
966 separated by spaces. A variable list followed by a slash or the end of
967 command will be interpreted the same way.
If a FORTRAN type specification is used to move backwards on the
current line and text is then written at that point, the line is
truncated to that length, although text added afterward extends the
line again.
975 File: pspp.info, Node: PRINT EJECT, Next: PRINT SPACE, Prev: PRINT, Up: Data Input and Output
986 arg takes one of the following forms:
988 var_list start-end [type_spec]
989 var_list (fortran_spec)
992 PRINT EJECT is used to write data to an output file. Before the
993 data is written, the current page in the listing file is ejected.
995 *Note PRINT::, for more information on syntax and usage.
998 File: pspp.info, Node: PRINT SPACE, Next: REREAD, Prev: PRINT EJECT, Up: Data Input and Output
1003 PRINT SPACE OUTFILE='filename' n_lines.
The PRINT SPACE command prints one or more blank lines to an output file.
1007 The OUTFILE subcommand is optional. It may be used to direct output
1008 to a file specified by file name as a string or file handle (*note FILE
1009 HANDLE::). If OUTFILE is not specified then output will be directed to
1012 n_lines is also optional. If present, it is an expression (*note
1013 Expressions::) specifying the number of blank lines to be printed. The
1014 expression must evaluate to a nonnegative value.
1017 File: pspp.info, Node: REREAD, Next: REPEATING DATA, Prev: PRINT SPACE, Up: Data Input and Output
1022 REREAD FILE=handle COLUMN=column.
1024 The REREAD transformation allows the previous input line in a data
1025 file already processed by DATA LIST or another input command to be
1026 re-read for further processing.
1028 The FILE subcommand, which is optional, is used to specify the file
1029 to have its line re-read. The file must be specified in the form of a
1030 file handle (*note FILE HANDLE::). If FILE is not specified then the
1031 last file specified on DATA LIST will be assumed (last file specified
1032 lexically, not in terms of flow-of-control).
1034 By default, the line is re-read in its entirety. With the
1035 COLUMN subcommand, a prefix of the line can be exempted from
1036 re-reading. Specify an expression (*note Expressions::) evaluating to
1037 the first column that should be included in the re-read line. Columns
1038 are numbered from 1 at the left margin.
1040 Multiple REREAD commands will not back up in the data file. Instead,
1041 they will re-read the same line multiple times.
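
The COLUMN and repeated re-read behavior described above can be modelled in a few lines of Python. This is an illustrative sketch only, not PSPP code: the `LineReader` class and its method names are hypothetical, and it assumes that the excluded prefix is simply dropped from the re-read line.

```python
class LineReader:
    """Hypothetical model of a data file reader that supports re-reading."""

    def __init__(self, lines):
        self._lines = list(lines)
        self._pos = 0       # index of the next unread line
        self._last = None   # most recently read line

    def read(self):
        """Read the next line, as DATA LIST would."""
        self._last = self._lines[self._pos]
        self._pos += 1
        return self._last

    def reread(self, column=1):
        """Return the last line again, starting at a 1-based column.

        The file position is not moved, so repeated calls return
        the same line rather than backing up in the file.
        """
        if self._last is None:
            raise RuntimeError("no line has been read yet")
        if column < 1:
            raise ValueError("columns are numbered from 1")
        return self._last[column - 1:]


reader = LineReader(["A 10 20", "B 30 40"])
first = reader.read()           # the whole first line
tail = reader.reread(column=3)  # same line, from column 3 onward
again = reader.reread()         # same line once more, in its entirety
```

After two calls to `reread`, the next `read` still returns the second line, which mirrors the statement that multiple REREAD commands do not back up in the data file.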
1044 File: pspp.info, Node: REPEATING DATA, Next: WRITE, Prev: REREAD, Up: Data Input and Output
1054 /CONTINUED[=cont_start-cont_end]
1055 /ID=id_start-id_end=id_var
1059 where each var_spec takes one of the forms
1060 var_list start-end [type_spec]
1061 var_list (fortran_spec)
1063 The REPEATING DATA command is used to parse groups of data repeating
1064 in a uniform format, possibly with several groups on a single line.
1065 Each group of data corresponds with one case. REPEATING DATA may only
1066 be used within an INPUT PROGRAM structure. When used with DATA LIST, it
1067 can be used to parse groups of cases that share a subset of variables
1068 but differ in their other data.
1070 The STARTS subcommand is required. Specify a range of columns, using
1071 literal numbers or numeric variable names. This range specifies the
1072 columns on the first line that are used to contain groups of data. The
1073 ending column is optional. If it is not specified, then the record
1074 width of the input file is used. For the inline file (*note BEGIN
1075 DATA::) this is 80 columns; for a file with fixed record widths it is
1076 the record width; for other files it is 1024 characters by default.
1078 The OCCURS subcommand is required. It must be a number or the name
1079 of a numeric variable. Its value is the number of groups present in the
1082 The DATA subcommand is required. It must be the last subcommand
1083 specified. It is used to specify the data present within each repeating
1084 group. Column numbers are specified relative to the beginning of a
1085 group at column 1. Data is specified in the same way as with DATA LIST
1086 FIXED (*note DATA LIST FIXED::).
1088 All other subcommands are optional.
1090 FILE specifies the file to read, either a file name as a string or a
1091 file handle (*note FILE HANDLE::). If FILE is not present then the
1092 default is the last file handle used on DATA LIST (lexically, not in
1093 terms of flow of control).
1095 By default REPEATING DATA will output a table describing how it will
1096 parse the input data. Specifying NOTABLE will disable this behavior;
1097 specifying TABLE will explicitly enable it.
1099 The LENGTH subcommand specifies the length in characters of each
1100 group. If it is not present then length is inferred from the DATA
1101 subcommand. LENGTH can be a number or a variable name.
1103 Normally all the data groups are expected to be present on a single
1104 line. Use the CONTINUED subcommand to indicate that data can be continued
1105 onto additional lines. If data on continuation lines starts at the left
1106 margin and continues through the entire field width, no column
1107 specifications are necessary on CONTINUED. Otherwise, specify the
1108 possible range of columns in the same way as on STARTS.
1110 When data groups are continued from line to line, it's easily
1111 possible for cases to get out of sync if hand editing is not done
1112 carefully. The ID subcommand allows a case identifier to be present on
1113 each line of repeating data groups. REPEATING DATA will check for the
1114 same identifier on each line and report mismatches. Specify the range
1115 of columns that the identifier will occupy, followed by an equals sign
1116 (`=') and the identifier variable name. The variable must already have
1117 been declared with NUMERIC or another command.
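
The way STARTS, OCCURS, LENGTH, and DATA cooperate can be sketched in Python. The sketch below is not PSPP's implementation: the function names `split_groups` and `parse_group` are hypothetical, continuation lines and ID checking are not modelled, and a missing ending column is assumed to mean the end of the line rather than the record width of the file.

```python
def split_groups(line, start, occurs, length, end=None):
    """Split one input line into repeating groups.

    start, end -- 1-based inclusive columns holding the groups (STARTS)
    occurs     -- number of groups expected (OCCURS)
    length     -- characters per group (LENGTH)
    """
    if end is None:
        end = len(line)  # simplification: use the line length
    region = line[start - 1:end]
    groups = []
    for i in range(occurs):
        chunk = region[i * length:(i + 1) * length]
        if len(chunk) < length:
            break  # ran out of data; continuation is not modelled
        groups.append(chunk)
    return groups


def parse_group(group, fields):
    """Extract fields from one group.

    fields maps a variable name to (first, last) columns, numbered
    relative to the beginning of the group at column 1 (DATA).
    """
    return {name: group[a - 1:b].strip() for name, (a, b) in fields.items()}


line = "ID01 3 A12B34C56"
groups = split_groups(line, start=8, occurs=3, length=3)
cases = [parse_group(g, {"code": (1, 1), "score": (2, 3)}) for g in groups]
```

Each element of `cases` corresponds to one case, matching the rule that each group of data corresponds with one case.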
1120 File: pspp.info, Node: WRITE, Prev: REPEATING DATA, Up: Data Input and Output
1131 arg takes one of the following forms:
1132 'string' [start-end]
1133 var_list start-end [type_spec]
1134 var_list (fortran_spec)
1137 WRITE is used to write text or binary data to an output file.
1139 *Note PRINT::, for more information on syntax and usage. The main
1140 difference between PRINT and WRITE is that whereas by default PRINT uses
1141 variables' print formats, WRITE uses write formats.
1143 The sole additional difference is that if WRITE is used to send
1144 output to a binary file, carriage control characters will not be output.
1145 *Note FILE HANDLE::, for information on how to declare a file as binary.
1148 File: pspp.info, Node: System and Portable Files, Next: Variable Attributes, Prev: Data Input and Output, Up: Top
1150 System Files and Portable Files
1151 *******************************
1153 The commands in this chapter read, write, and examine system files
1158 * APPLY DICTIONARY:: Apply system file dictionary to active file.
1159 * EXPORT:: Write to a portable file.
1160 * GET:: Read from a system file.
1161 * IMPORT:: Read from a portable file.
1162 * MATCH FILES:: Merge system files.
1163 * SAVE:: Write to a system file.
1164 * SYSFILE INFO:: Display system file dictionary.
1165 * XSAVE:: Write to a system file, as a transform.
1168 File: pspp.info, Node: APPLY DICTIONARY, Next: EXPORT, Prev: System and Portable Files, Up: System and Portable Files
1173 APPLY DICTIONARY FROM='filename'.
1175 The APPLY DICTIONARY command applies the variable labels, value
1176 labels, and missing values from variables in a system file to
1177 corresponding variables in the active file. In some cases it also
1178 updates the weighting variable.
1180 Specify a system file with a file name string or as a file handle
1181 (*note FILE HANDLE::). The dictionary in the system file will be read,
1182 but it will not replace the active file dictionary. The system file's
1183 data will not be read.
1185 Only variables with names that exist in both the active file and the
1186 system file are considered. Variables with the same name but different
1187 types (numeric, string) will cause an error message. Otherwise, the
1188 system file variables' attributes will replace those in their matching
1189 active file variables, as described below.
1191 If a system file variable has a variable label, then it will replace
1192 the active file variable's variable label. If the system file variable
1193 does not have a variable label, then the active file variable's variable
1194 label, if any, will be retained.
1196 If the active file variable is numeric or short string, then value
1197 labels and missing values, if any, will be copied to the active file
1198 variable. If the system file variable does not have value labels or
1199 missing values, then those in the active file variable, if any, will not
1202 Finally, weighting of the active file is updated (*note WEIGHT::).
1203 If the active file has a weighting variable, and the system file does
1204 not, or if the weighting variable in the system file does not exist in
1205 the active file, then the active file weighting variable, if any, is
1206 retained. Otherwise, the weighting variable in the system file becomes
1207 the active file weighting variable.
1209 APPLY DICTIONARY takes effect immediately. It does not read the
1210 active file. The system file is not modified.
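
The variable label and weighting rules above can be expressed as a short Python sketch. It is a model for illustration, not PSPP code: the dictionary layout and the function name `apply_dictionary` are hypothetical, value labels and missing values are left out, and a type mismatch is assumed to skip the variable after recording an error.

```python
import copy


def apply_dictionary(active, system):
    """Return (new_active, errors) after applying `system` to `active`.

    Each dictionary has the hypothetical shape
    {"vars": {name: {"type": ..., "label": ...}}, "weight": name or None}.
    Neither input is modified.
    """
    result = copy.deepcopy(active)
    errors = []
    for name, sys_var in system["vars"].items():
        act_var = result["vars"].get(name)
        if act_var is None:
            continue  # only names present in both files are considered
        if act_var["type"] != sys_var["type"]:
            errors.append(name)  # same name, different type
            continue
        if sys_var.get("label"):
            act_var["label"] = sys_var["label"]  # otherwise the label is retained
    sys_weight = system.get("weight")
    if sys_weight is not None and sys_weight in result["vars"]:
        result["weight"] = sys_weight  # otherwise the active weight is retained
    return result, errors


active = {
    "vars": {
        "age": {"type": "numeric", "label": "Age"},
        "sex": {"type": "string", "label": None},
        "w": {"type": "numeric", "label": None},
    },
    "weight": "w",
}
system = {
    "vars": {
        "age": {"type": "numeric", "label": "Age in years"},
        "sex": {"type": "numeric", "label": "Sex"},
        "income": {"type": "numeric", "label": "Income"},
    },
    "weight": "income",
}
updated, errors = apply_dictionary(active, system)
```

In this example the weighting variable stays `w`, because the system file's weighting variable `income` does not exist in the active file.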
1213 File: pspp.info, Node: EXPORT, Next: GET, Prev: APPLY DICTIONARY, Up: System and Portable Files
1222 /RENAME=(src_names=target_names)...
1224 The EXPORT procedure writes the active file dictionary and data to a
1225 specified portable file.
1227 The OUTFILE subcommand, which is the only required subcommand,
1228 specifies the portable file to be written as a file name string or a
1229 file handle (*note FILE HANDLE::).
1231 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
1234 EXPORT is a procedure. It causes the active file to be read.
1237 File: pspp.info, Node: GET, Next: IMPORT, Prev: EXPORT, Up: System and Portable Files
1246 /RENAME=(src_names=target_names)...
1248 The GET transformation clears the current dictionary and active file
1249 and replaces them with the dictionary and data from a specified system
1252 The FILE subcommand is the only required subcommand. Specify the
1253 system file to be read as a string file name or a file handle (*note
1256 By default, all the variables in a system file are read. The DROP
1257 subcommand can be used to specify a list of variables that are not to be
1258 read. By contrast, the KEEP subcommand can be used to specify variables
1259 that are to be read, with all other variables not read.
1261 Normally variables in a system file retain the names that they were
1262 saved under. Use the RENAME subcommand to change these names. Specify,
1263 within parentheses, a list of variable names followed by an equals sign
1264 (`=') and the names that they should be renamed to. Multiple
1265 parenthesized groups of variable names can be included on a single
1266 RENAME subcommand. Variables' names may be swapped using a RENAME
1267 subcommand of the form `/RENAME=(A B=B A)'.
1269 Alternate syntax for the RENAME subcommand allows the parentheses to
1270 be eliminated. When this is done, only a single variable may be
1271 renamed at once, for instance `/RENAME=A=B'. This alternate syntax is
1274 DROP, KEEP, and RENAME are performed in left-to-right order. They
1275 each may be present any number of times.
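
Because DROP, KEEP, and RENAME are applied left to right, their order changes the result. The following Python sketch models that on a plain list of variable names. The function `apply_subcommands` and its tuple encoding are hypothetical, and it assumes that KEEP preserves the existing variable order and that later subcommands see names as already renamed.

```python
def apply_subcommands(names, subcommands):
    """Apply DROP, KEEP, and RENAME in left-to-right order.

    subcommands is a list of tuples:
      ("DROP", [names]), ("KEEP", [names]),
      ("RENAME", ([src_names], [target_names]))
    """
    names = list(names)
    for kind, arg in subcommands:
        if kind == "DROP":
            names = [n for n in names if n not in arg]
        elif kind == "KEEP":
            names = [n for n in names if n in arg]
        elif kind == "RENAME":
            sources, targets = arg
            # One parenthesized group is renamed simultaneously,
            # which is what makes a swap such as (A B=B A) possible.
            mapping = dict(zip(sources, targets))
            names = [mapping.get(n, n) for n in names]
        else:
            raise ValueError("unknown subcommand: " + kind)
    return names


swapped = apply_subcommands(
    ["A", "B", "C", "D"],
    [("RENAME", (["A", "B"], ["B", "A"]))],
)
then_dropped = apply_subcommands(
    ["A", "B", "C", "D"],
    [("RENAME", (["A", "B"], ["B", "A"])), ("DROP", ["A"])],
)
```

In `then_dropped`, the DROP removes the variable that was originally named B, since the swap has already taken effect by the time DROP is processed.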
1277 Please note that DROP, KEEP, and RENAME do not cause the system file
1278 on disk to be modified. Only the active file read from the system file
1281 GET does not cause the data to be read, only the dictionary. The
1282 data is read later, when a procedure is executed.
1285 File: pspp.info, Node: IMPORT, Next: MATCH FILES, Prev: GET, Up: System and Portable Files
1295 /RENAME=(src_names=target_names)...
1297 The IMPORT transformation clears the active file dictionary and data
1298 and replaces them with a dictionary and data from a portable file on
1301 The FILE subcommand, which is the only required subcommand, specifies
1302 the portable file to be read as a file name string or a file handle
1303 (*note FILE HANDLE::).
1305 The TYPE subcommand is currently not used.
1307 DROP, KEEP, and RENAME follow the syntax used by GET (*note GET::).
1309 IMPORT does not cause the data to be read, only the dictionary. The
1310 data is read later, when a procedure is executed.
1313 File: pspp.info, Node: MATCH FILES, Next: SAVE, Prev: IMPORT, Up: System and Portable Files
1320 /{FILE,TABLE}={*,'filename'}
1323 /RENAME=(src_names=target_names)...
1329 The MATCH FILES command merges one or more system files, optionally
1330 including the active file. Records with the same values for BY
1331 variables are combined into a single record. Records with different
1332 values are output in order. Thus, multiple sorted system files are
1333 combined into a single sorted system file based on the value of the BY
1336 The BY subcommand specifies a list of variables that are used to
1337 match records from each of the system files. Variables specified must
1338 exist in all the files specified on FILE and TABLE. BY should usually
1339 be specified. If TABLE is used then BY is required.
1341 Specify FILE with a system file as a file name string or file handle
1342 (*note FILE HANDLE::). An asterisk (`*') may also be specified to
1343 indicate the current active file. The files specified on FILE are
1344 merged together based on the BY variables, or combined case-by-case if
1345 BY is not specified. Normally at least two FILE subcommands should be
1348 Specify TABLE with a system file in order to use it as a "table
1349 lookup file". Records in table lookup files are not used up after
1350 they've been used once. This means that data in table lookup files can
1351 correspond to any number of records in FILE files. Table lookup files
1352 correspond to lookup tables in traditional relational database systems.
1353 It is incorrect to have records with duplicate BY values in table lookup
1356 Any number of FILE and TABLE subcommands may be specified. Each
1357 instance of FILE or TABLE can be followed by DROP, KEEP, and/or RENAME
1358 subcommands. These take the same form as the corresponding subcommands
1359 of GET (*note GET::), and perform the same functions.
1361 Variables belonging to files that are not present for the current
1362 case are set to the system-missing value for numeric variables or
1363 spaces for string variables.
1365 IN, FIRST, LAST, and MAP are currently not used.
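
The difference between FILE and TABLE inputs can be illustrated with a simplified Python model. It is a sketch under several assumptions, not PSPP's algorithm: the function `match_files` is hypothetical, records are plain dictionaries, each FILE input is assumed to hold at most one record per BY value, variable names other than the BY variables are assumed to be distinct across inputs, and `None` stands in for both the system-missing value and blank strings.

```python
def match_files(files, tables, by):
    """Merge FILE inputs on the BY variables, looking up TABLE inputs.

    files, tables -- lists of inputs; each input is a list of dict records
    by            -- list of BY variable names
    """
    # Collect every variable name in first-seen order.
    all_vars = []
    for source in files + tables:
        for rec in source:
            for name in rec:
                if name not in all_vars:
                    all_vars.append(name)

    def key(rec):
        return tuple(rec[v] for v in by)

    indexed = [{key(rec): rec for rec in f} for f in files]
    lookups = [{key(rec): rec for rec in t} for t in tables]

    # Only FILE inputs contribute output records; TABLE inputs are
    # consulted for matching keys and are never used up.
    keys = sorted(set().union(*indexed))
    out = []
    for k in keys:
        merged = dict.fromkeys(all_vars)  # None models a missing value
        for source in indexed + lookups:
            merged.update(source.get(k, {}))
        out.append(merged)
    return out


left = [{"id": 1, "x": 10}, {"id": 3, "x": 30}]
right = [{"id": 1, "y": "a"}, {"id": 2, "y": "b"}]
table = [{"id": 1, "region": "N"}, {"id": 2, "region": "S"}]
merged = match_files([left, right], [table], by=["id"])
```

Here the record for `id` 3 has no match in `right` or `table`, so `y` and `region` are left missing, while no output record is produced from `table` alone.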