This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.

START-INFO-DIR-ENTRY
* PSPP: (pspp).                 Statistical analysis package.
END-INFO-DIR-ENTRY

   PSPP, for statistical analysis of sampled data, by Ben Pfaff.

   This file documents PSPP, a statistical package for analysis of sampled data that uses a command language compatible with SPSS.

   Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.

   This version of the PSPP documentation is consistent with version 2 of "texinfo.tex".

   Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

   Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

   Permission is granted to copy and distribute translations of this manual into another language, under the above condition for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation.


File: pspp.info, Node: Date Extraction, Prev: Date Construction, Up: Time & Date

Functions that Examine Dates
............................

   These functions take numeric arguments in PSPP date or time format and give numeric results. These names are used for arguments:

DATE
     A numeric value in PSPP date format.

TIME
     A numeric value in PSPP time format.

TIME-OR-DATE
     A numeric value in PSPP time or date format.

 - Function: XDATE.DATE (TIME-OR-DATE)
     For a time, results in the time corresponding to the number of whole days TIME-OR-DATE includes. For a date, results in the date corresponding to the latest midnight at or before TIME-OR-DATE; that is, gives the date that TIME-OR-DATE is in. (XDATE.DATE(X) is equivalent to TRUNC(X/86400)*86400.) Applying this function to a time is a Portability: none feature.
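   The equivalences quoted for XDATE.DATE and for the related XDATE functions in this node are plain integer arithmetic on a count of seconds. The Python sketch below reproduces those formulas so that they can be checked by hand. It is not PSPP's implementation: the function names are hypothetical, and nonnegative arguments are assumed so that TRUNC and MOD behave like Python's `math.trunc' and `%'.

```python
import math

SECONDS_PER_DAY = 86400

# Hypothetical Python equivalents of the formulas quoted in this node.
# Arguments are counts of seconds, as in PSPP date and time format;
# nonnegative values are assumed so that Python's % matches MOD.

def xdate_date(x):
    """XDATE.DATE(X) = TRUNC(X/86400)*86400: latest midnight at or before X."""
    return math.trunc(x / SECONDS_PER_DAY) * SECONDS_PER_DAY

def xdate_hour(x):
    """XDATE.HOUR(X) = MOD(TRUNC(X/3600), 24)."""
    return math.trunc(x / 3600) % 24

def xdate_minute(x):
    """XDATE.MINUTE(X) = MOD(TRUNC(X/60), 60)."""
    return math.trunc(x / 60) % 60

def xdate_second(x):
    """XDATE.SECOND(X) = MOD(X, 60)."""
    return x % 60

def xdate_tday(x):
    """XDATE.TDAY(X) = TRUNC(X/86400): whole days in a time."""
    return math.trunc(x / SECONDS_PER_DAY)

# 1 day, 1 hour, 1 minute and 1 second, expressed in seconds.
t = 1 * 86400 + 1 * 3600 + 1 * 60 + 1
print(xdate_date(t), xdate_hour(t), xdate_minute(t), xdate_second(t), xdate_tday(t))
# prints: 86400 1 1 1 1
```

   Subtracting `xdate_date(t)' from `t' leaves the seconds elapsed since that midnight, which is the time of day within the date.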
 - Function: XDATE.HOUR (TIME-OR-DATE)
     For a time, results in the number of whole hours beyond the number of whole days represented by TIME-OR-DATE. For a date, results in the hour (as an integer between 0 and 23) corresponding to TIME-OR-DATE. (XDATE.HOUR(X) is equivalent to MOD(TRUNC(X/3600),24).) Applying this function to a time is a Portability: none feature.

 - Function: XDATE.JDAY (DATE)
     Results in the day of the year (as an integer between 1 and 366) corresponding to DATE.

 - Function: XDATE.MDAY (DATE)
     Results in the day of the month (as an integer between 1 and 31) corresponding to DATE.

 - Function: XDATE.MINUTE (TIME-OR-DATE)
     Results in the number of minutes (as an integer between 0 and 59) after the last hour in TIME-OR-DATE. (XDATE.MINUTE(X) is equivalent to MOD(TRUNC(X/60),60).) Applying this function to a time is a Portability: none feature.

 - Function: XDATE.MONTH (DATE)
     Results in the month of the year (as an integer between 1 and 12) corresponding to DATE.

 - Function: XDATE.QUARTER (DATE)
     Results in the quarter of the year (as an integer between 1 and 4) corresponding to DATE.

 - Function: XDATE.SECOND (TIME-OR-DATE)
     Results in the number of whole seconds after the last whole minute (as an integer between 0 and 59) in TIME-OR-DATE. (XDATE.SECOND(X) is equivalent to MOD(X, 60).) Applying this function to a time is a Portability: none feature.

 - Function: XDATE.TDAY (TIME)
     Results in the number of whole days (as an integer) in TIME. (XDATE.TDAY(X) is equivalent to TRUNC(X/86400).)

 - Function: XDATE.TIME (DATE)
     Results in the time of day at the instant corresponding to DATE, in PSPP time format. This is the number of seconds since midnight on the day corresponding to DATE. (XDATE.TIME(X) is equivalent to X - TRUNC(X/86400)*86400.)

 - Function: XDATE.WEEK (DATE)
     Results in the week of the year (as an integer between 1 and 53) corresponding to DATE.

 - Function: XDATE.WKDAY (DATE)
     Results in the day of week (as an integer between 1 and 7) corresponding to DATE.
     The days of the week are:

          1    Sunday
          2    Monday
          3    Tuesday
          4    Wednesday
          5    Thursday
          6    Friday
          7    Saturday

 - Function: XDATE.YEAR (DATE)
     Returns the year (as an integer between 1582 and 19999) corresponding to DATE.


File: pspp.info, Node: Miscellaneous Functions, Next: Functions Not Implemented, Prev: Time & Date, Up: Functions

Miscellaneous Functions
-----------------------

   Miscellaneous functions take various arguments and produce various results.

 - Function: LAG (VARIABLE)
     VARIABLE must be a numeric or string variable name. `LAG' results in the value of that variable for the case before the current one. In case-selection procedures, `LAG' results in the value of the variable for the last case selected. Results in system-missing (for numeric variables) or blanks (for string variables) for the first case or before any cases are selected.

 - Function: LAG (VARIABLE, NCASES)
     VARIABLE must be a numeric or string variable name. NCASES must be a small positive constant integer, although there is no explicit limit. (Use of a large value for NCASES will increase memory consumption, since PSPP must keep NCASES cases in memory.) `LAG (VARIABLE, NCASES)' results in the value of VARIABLE that is NCASES before the case currently being processed. See `LAG (VARIABLE)' above for more details.

 - Function: YRMODA (YEAR, MONTH, DAY)
     YEAR is a year between 0 and 199 or 1582 and 19999. MONTH is a month between 1 and 12. DAY is a day between 1 and 31. If MONTH or DAY is out of range, the excess carries into the next higher unit. For instance, a DAY of 0 refers to the last day of the previous month, and a MONTH of 13 refers to the first month of the next year. YEAR must be in range. If YEAR is between 0 and 199, 1900 is added. YEAR, MONTH, and DAY must all be integers.

     `YRMODA' results in the number of days between 15 Oct 1582 and the date specified, plus one. The date passed to `YRMODA' must be on or after 15 Oct 1582. 15 Oct 1582 has a value of 1.
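   A rough Python model of LAG and YRMODA may make the rules above easier to check. `lag_values' mimics `LAG (VARIABLE, NCASES)' over a plain list of values by keeping only the last NCASES values (which is why a large NCASES costs memory), with `None' standing in for system-missing. `yrmoda' applies the YRMODA rules: 1900 is added to years 0 through 199, out-of-range months and days carry into the next higher unit, and 15 Oct 1582 counts as day 1. Both names are hypothetical, the sketch ignores case selection, and Python's `datetime.date' only reaches year 9999, so it cannot check the upper part of YRMODA's year range.

```python
from collections import deque
from datetime import date, timedelta

def lag_values(values, ncases=1):
    """Model of LAG(VARIABLE, NCASES): each case sees the value NCASES cases back.

    None plays the role of system-missing for the first NCASES cases.
    """
    window = deque(maxlen=ncases)  # only the last NCASES values are kept
    result = []
    for value in values:
        result.append(window[0] if len(window) == ncases else None)
        window.append(value)
    return result

GREGORIAN_START = date(1582, 10, 15)  # YRMODA value 1

def yrmoda(year, month, day):
    """Model of YRMODA: days since 15 Oct 1582, plus one."""
    if 0 <= year <= 199:
        year += 1900
    # A month outside 1..12 carries into the year (13 -> January of next year).
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    # A day outside the month carries into the month (0 -> last day of previous month).
    d = date(year, month, 1) + timedelta(days=day - 1)
    if d < GREGORIAN_START:
        raise ValueError("YRMODA dates must be on or after 15 Oct 1582")
    return (d - GREGORIAN_START).days + 1

print(lag_values([5, 6, 7]))  # [None, 5, 6]
print(yrmoda(1582, 10, 15))   # 1
```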
File: pspp.info, Node: Functions Not Implemented, Prev: Miscellaneous Functions, Up: Functions

Functions Not Implemented
-------------------------

   These functions are not yet implemented and thus not yet documented, since it's a hassle.

   * `CDF.xxx'

   * `CDFNORM'

   * `IDF.xxx'

   * `NCDF.xxx'

   * `PROBIT'

   * `RV.xxx'


File: pspp.info, Node: Order of Operations, Prev: Functions, Up: Expressions

Operator Precedence
===================

   The following table describes operator precedence. Smaller-numbered levels in the table have higher precedence. Within a level, operations are performed from left to right, except for level 2 (exponentiation), where operations are performed from right to left. If an operator appears in the table in two places (`-'), the first occurrence is unary and the second is binary.

  1. `( )'

  2. `**'

  3. `-'

  4. `* /'

  5. `+ -'

  6. `EQ GE GT LE LT NE'

  7. `AND NOT OR'


File: pspp.info, Node: Data Input and Output, Next: System and Portable Files, Prev: Expressions, Up: Top

Data Input and Output
*********************

   Data is the focus of the PSPP language. This chapter examines the PSPP commands for defining variables and reading and writing data.

   *Please note:* Data is not actually read until a procedure is executed. These commands tell PSPP how to read data, but they do not _cause_ PSPP to read data.

* Menu:

* BEGIN DATA::                  Embed data within a syntax file.
* CLEAR TRANSFORMATIONS::       Clear pending transformations.
* DATA LIST::                   Fundamental data reading command.
* END CASE::                    Output the current case.
* END FILE::                    Terminate the current input program.
* FILE HANDLE::                 Support for fixed-length records.
* INPUT PROGRAM::               Support for complex input programs.
* LIST::                        List cases in the active file.
* MATRIX DATA::                 Read matrices in text format.
* NEW FILE::                    Clear the active file and dictionary.
* PRINT::                       Display values in print formats.
* PRINT EJECT::                 Eject the current page then print.
* PRINT SPACE::                 Print blank lines.
* REREAD::                      Take another look at the previous input line.
* REPEATING DATA::              Multiple cases on a single line.
* WRITE::                       Display values in write formats.


File: pspp.info, Node: BEGIN DATA, Next: CLEAR TRANSFORMATIONS, Prev: Data Input and Output, Up: Data Input and Output

BEGIN DATA
==========

     BEGIN DATA.
     ...
     END DATA.

   BEGIN DATA and END DATA can be used to embed raw ASCII data in a PSPP syntax file. DATA LIST or another input procedure must be used before BEGIN DATA (*note DATA LIST::). BEGIN DATA and END DATA must be used together. The END DATA command must appear by itself on a single line, with no leading whitespace and exactly one space between the words `END' and `DATA', followed immediately by the terminal dot, like this:

     END DATA.


File: pspp.info, Node: CLEAR TRANSFORMATIONS, Next: DATA LIST, Prev: BEGIN DATA, Up: Data Input and Output

CLEAR TRANSFORMATIONS
=====================

     CLEAR TRANSFORMATIONS.

   The CLEAR TRANSFORMATIONS command clears out all pending transformations. It does not cancel the current input program. It is valid only when PSPP is interactive, not in syntax files.


File: pspp.info, Node: DATA LIST, Next: END CASE, Prev: CLEAR TRANSFORMATIONS, Up: Data Input and Output

DATA LIST
=========

   Used to read text or binary data, DATA LIST is the most fundamental data-reading command. Even the more sophisticated input methods use DATA LIST commands as a building block. Understanding DATA LIST is important to understanding how to use PSPP to read your data files.

   There are two major variants of DATA LIST, which are fixed format and free format. In addition, free format has a minor variant, list format, which is discussed in terms of its differences from vanilla free format.

   Each form of DATA LIST is described in detail below.

* Menu:

* DATA LIST FIXED::             Fixed columnar locations for data.
* DATA LIST FREE::              Any spacing you like.
* DATA LIST LIST::              Each case must be on a single line.
File: pspp.info, Node: DATA LIST FIXED, Next: DATA LIST FREE, Prev: DATA LIST, Up: DATA LIST

DATA LIST FIXED
---------------

     DATA LIST [FIXED]
             {TABLE,NOTABLE}
             FILE='filename'
             RECORDS=record_count
             END=end_var
             /[line_no] var_spec...
     
     where each var_spec takes one of the forms
             var_list start-end [type_spec]
             var_list (fortran_spec)

   DATA LIST FIXED is used to read data files that have values at fixed positions on each line of single-line or multiline records. The keyword FIXED is optional.

   The FILE subcommand must be used if input is to be taken from an external file. It may be used to specify a filename as a string or a file handle (*note FILE HANDLE::). If the FILE subcommand is not used, then input is assumed to be specified within the command file using BEGIN DATA...END DATA (*note BEGIN DATA::).

   The optional RECORDS subcommand, which takes a single integer as an argument, is used to specify the number of lines per record. If RECORDS is not specified, then the number of lines per record is calculated from the list of variable specifications later in the DATA LIST command.

   The END subcommand is only useful in conjunction with the INPUT PROGRAM input procedure, and for that reason it is not discussed here (*note INPUT PROGRAM::).

   DATA LIST can optionally output a table describing how the data file will be read. The TABLE subcommand enables this output, and NOTABLE disables it. The default is to output the table.

   The list of variables to be read from the data list must come last in the DATA LIST command. Each line in the data record is introduced by a slash (`/'). Optionally, a line number may follow the slash. Any number of variable specifications may then follow.

   Each variable specification consists of a list of variable names followed by a description of their location on the input line. Sets of variables may be specified using DATA LIST's TO convention (*note Sets of Variables::).
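   The TO convention mentioned above turns a pair of names such as `INFO1 TO INFO3' or `Q01 TO Q50' (both used in the DATA LIST FIXED examples) into a run of numbered names. The Python sketch below shows the idea under stated assumptions: both names share a prefix and end in digits, and the generated numbers keep the zero-padded width of the first name. The full rules are in the Sets of Variables node and are not reproduced here; `expand_to' is a hypothetical name.

```python
import re

def expand_to(first, last):
    """Expand 'first TO last' into numbered variable names (sketch).

    Assumes a shared prefix, a numeric suffix on both names, and that the
    zero padding of the first name is kept for every generated name.
    """
    m1 = re.fullmatch(r"(.*?)(\d+)", first)
    m2 = re.fullmatch(r"(.*?)(\d+)", last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("both names need the same prefix and a numeric suffix")
    prefix, width = m1.group(1), len(m1.group(2))
    lo, hi = int(m1.group(2)), int(m2.group(2))
    if lo > hi:
        raise ValueError("the first number must not exceed the last")
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]

print(expand_to("INFO1", "INFO3"))  # ['INFO1', 'INFO2', 'INFO3']
```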
   There are two ways to specify the location of the variable on the line: SPSS style and FORTRAN style.

   With SPSS style, the starting column and ending column for the field are specified after the variable name, separated by a dash (`-'). For instance, the third through fifth columns on a line would be specified `3-5'. By default, variables are considered to be in `F' format (*note Input/Output Formats::). (This default can be changed; *note SET::, for more information.)

   When using SPSS style, to use a variable format other than the default, specify the format type in parentheses after the column numbers. For instance, for alphanumeric `A' format, use `(A)'.

   In addition, implied decimal places can be specified in parentheses after the column numbers. As an example, suppose that a data file has a field in which the characters `1234' should be interpreted as having the value 12.34. Then this field has two implied decimal places, and the corresponding specification would be `(2)'. If a field that has implied decimal places contains a decimal point, then the implied decimal places are not applied.

   Changing the variable format and adding implied decimal places can be done together; for instance, `(N,5)'.

   When using SPSS style, the input and output width of each variable is computed from the field width. The field width must be evenly divisible by the number of variables specified.

   FORTRAN style is an altogether different approach to specifying field locations. With this approach, a list of variable input format specifications, separated by commas, is placed after the variable names inside parentheses. Each format specifier advances as many characters into the input line as it uses.

   In addition to the standard format specifiers (*note Input/Output Formats::), FORTRAN style defines some extensions:

`X'
     Advance the current column on this line by one character position.
`T'X
     Set the current column on this line to column X, with column numbers considered to begin with 1 at the left margin.

`NEWREC'X
     Skip forward X lines in the current record, resetting the active column to the left margin.

Repeat count
     Any format specifier may be preceded by a number. This causes the action of that format specifier to be repeated the specified number of times.

(SPEC1, ..., SPECN)
     Group the given specifiers together. This is most useful when preceded by a repeat count. Groups may be nested arbitrarily.

   FORTRAN and SPSS styles may be freely intermixed. SPSS style leaves the active column immediately after the ending column specified. Record motion using `NEWREC' in FORTRAN style also applies to later FORTRAN and SPSS specifiers.

* Menu:

* DATA LIST FIXED Examples::    Examples of DATA LIST FIXED.


File: pspp.info, Node: DATA LIST FIXED Examples, Prev: DATA LIST FIXED, Up: DATA LIST FIXED

Examples
........

  1.      DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
          
          BEGIN DATA.
          John Smith 102311
          Bob Arnold 122015
          Bill Yates  918 6
          END DATA.

     Defines the following variables:

        * `NAME', a 10-character-wide long string variable, in columns 1 through 10.

        * `INFO1', a numeric variable with one implied decimal place, in columns 12 through 13.

        * `INFO2', a numeric variable with one implied decimal place, in columns 14 through 15.

        * `INFO3', a numeric variable with one implied decimal place, in columns 16 through 17.

     The `BEGIN DATA'/`END DATA' commands cause three cases to be defined:

          Case   NAME         INFO1   INFO2   INFO3
             1   John Smith     1.0     2.3     1.1
             2   Bob Arnold     1.2     2.0     1.5
             3   Bill Yates     0.9     1.8     0.6

     The `TABLE' keyword causes PSPP to print out a table describing the four variables defined.

  2.      DAT LIS FIL="survey.dat"
                  /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
                  /Q01 TO Q50 7-56
                  /.

     Defines the following variables:

        * `ID', a numeric variable, in columns 1-5 of the first record.

        * `NAME', a 30-character long string variable, in columns 7-36 of the first record.

        * `SURNAME', a 30-character long string variable, in columns 38-67 of the first record.
        * `MINITIAL', a 1-character short string variable, in column 69 of the first record.

        * Fifty variables `Q01', `Q02', `Q03', ..., `Q49', `Q50', all numeric, `Q01' in column 7, `Q02' in column 8, ..., `Q49' in column 55, `Q50' in column 56, all in the second record.

     Cases are separated by a blank record.

     Data is read from file `survey.dat' in the current directory.

     This example shows keywords abbreviated to their first 3 letters.


File: pspp.info, Node: DATA LIST FREE, Next: DATA LIST LIST, Prev: DATA LIST FIXED, Up: DATA LIST

DATA LIST FREE
--------------

     DATA LIST FREE
             [{NOTABLE,TABLE}]
             FILE='filename'
             END=end_var
             /var_spec...
     
     where each var_spec takes one of the forms
             var_list [(type_spec)]
             var_list *

   In free format, the input data is structured as a series of comma- or whitespace-delimited fields (end of line is one form of whitespace; it is not treated specially). Field contents may be surrounded by matched pairs of apostrophes (`'') or quotes (`"'), or they may be unenclosed. For any type of field, leading white space (up to the apostrophe or quote, if any) is not included in the field.

   Multiple consecutive delimiters are equivalent to a single delimiter. To specify an empty field, write an empty set of single or double quotes; for instance, `""'.

   The NOTABLE and TABLE subcommands are as in DATA LIST FIXED above. NOTABLE is the default.

   The FILE and END subcommands are as in DATA LIST FIXED above.

   The variables to be parsed are given as a single list of variable names. This list must be introduced by a single slash (`/'). The set of variable names may contain format specifications in parentheses (*note Input/Output Formats::). Format specifications apply to all variables back to the previous parenthesized format specification.

   In addition, an asterisk may be used to indicate that all variables preceding it are to have input/output format `F8.0'.
   Specified field widths are ignored on input (although all the normal limits on field width still apply), but they are honored on output.


File: pspp.info, Node: DATA LIST LIST, Prev: DATA LIST FREE, Up: DATA LIST

DATA LIST LIST
--------------

     DATA LIST LIST
             [{NOTABLE,TABLE}]
             FILE='filename'
             END=end_var
             /var_spec...
     
     where each var_spec takes one of the forms
             var_list [(type_spec)]
             var_list *

   Syntactically and semantically, DATA LIST LIST is equivalent to DATA LIST FREE, with one exception: each input line is expected to correspond to exactly one input record. If more or fewer fields are found on an input line than expected, an appropriate diagnostic is issued.


File: pspp.info, Node: END CASE, Next: END FILE, Prev: DATA LIST, Up: Data Input and Output

END CASE
========

     END CASE.

   END CASE is used within INPUT PROGRAM to output the current case. *Note INPUT PROGRAM::.


File: pspp.info, Node: END FILE, Next: FILE HANDLE, Prev: END CASE, Up: Data Input and Output

END FILE
========

     END FILE.

   END FILE is used within INPUT PROGRAM to terminate the current input program. *Note INPUT PROGRAM::.


File: pspp.info, Node: FILE HANDLE, Next: INPUT PROGRAM, Prev: END FILE, Up: Data Input and Output

FILE HANDLE
===========

     FILE HANDLE handle_name
             /NAME='filename'
             /RECFORM={VARIABLE,FIXED,SPANNED}
             /LRECL=rec_len
             /MODE={CHARACTER,IMAGE,BINARY,MULTIPUNCH,360}

   Use the FILE HANDLE command to define the attributes of a file that does not use conventional variable-length records terminated by newline characters.

   Specify the file handle name as an identifier. Any given identifier may only appear once in a PSPP run. File handles may not be reassigned to a different file. The file handle name must immediately follow the FILE HANDLE command name.

   The NAME subcommand specifies the name of the file associated with the handle. It is the only required subcommand.

   The RECFORM subcommand specifies how the file is laid out.
VARIABLE specifies variable-length lines terminated with newlines, and it is the default. FIXED specifies fixed-length records. SPANNED is not supported.

   LRECL specifies the length of fixed-length records. It is required if `/RECFORM FIXED' is specified.

   MODE specifies a file mode. CHARACTER, the default, causes the data file to be opened in ANSI C text mode. BINARY causes the data file to be opened in ANSI C binary mode. The other possibilities are not supported.


File: pspp.info, Node: INPUT PROGRAM, Next: LIST, Prev: FILE HANDLE, Up: Data Input and Output

INPUT PROGRAM
=============

     INPUT PROGRAM.
     ... input commands ...
     END INPUT PROGRAM.

   The INPUT PROGRAM...END INPUT PROGRAM construct is used to specify a complex input program. By placing data input commands within INPUT PROGRAM, PSPP programs can take advantage of more complex file structures than are available with DATA LIST alone.

   The simplest kind of extended input program puts multiple DATA LIST commands within the INPUT PROGRAM. This causes all of the data files to be read in parallel. Input stops when end of file is reached on any of the data files.

   Transformations, such as conditional and looping constructs, can also be included within an INPUT PROGRAM. These can be used to combine input from several data files in more complex ways. However, input will still stop when end of file is reached on any of the data files.

   To prevent INPUT PROGRAM from terminating at the first end of file, use the END subcommand on DATA LIST. This subcommand takes a variable name, which should be a numeric scratch variable (*note Scratch Variables::). (It need not be a scratch variable, but otherwise the results can be surprising.) The value of this variable is set to 0 when reading the data file, or 1 when end of file is encountered.

   Some additional commands are useful in conjunction with INPUT PROGRAM. END CASE is the first one. Normally each loop through the INPUT PROGRAM structure produces one case.
But with END CASE you can control exactly when cases are output. When END CASE is used, looping from the end of INPUT PROGRAM to the beginning does not cause a case to be output.

   END FILE is the other command. When the END subcommand is used on DATA LIST, there is no way for the INPUT PROGRAM construct to stop looping, so an infinite loop results. The END FILE command, when executed, stops the flow of input data and passes out of the INPUT PROGRAM structure.

   All this is very confusing. A few examples should help to clarify.

     INPUT PROGRAM.
             DATA LIST NOTABLE FILE='a.data'/X 1-10.
             DATA LIST NOTABLE FILE='b.data'/Y 1-10.
     END INPUT PROGRAM.
     LIST.

   The example above reads variable X from file `a.data' and variable Y from file `b.data'. If one file is shorter than the other, then the extra data in the longer file is ignored.

     INPUT PROGRAM.
             NUMERIC #A #B.
     
             DO IF NOT #A.
                     DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
             END IF.
             DO IF NOT #B.
                     DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
             END IF.
             DO IF #A AND #B.
                     END FILE.
             END IF.
             END CASE.
     END INPUT PROGRAM.
     LIST.

   This example reads variable X from `a.data' and variable Y from `b.data'. If one file is shorter than the other, then for the remaining cases of the longer file the variable read from the shorter file is set to the system-missing value, while the other variable keeps the values read from the longer file.

     INPUT PROGRAM.
             NUMERIC #A #B.
     
             DO IF #A.
                     DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
                     DO IF #B.
                             END FILE.
                     ELSE.
                             END CASE.
                     END IF.
             ELSE.
                     DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
                     DO IF NOT #A.
                             END CASE.
                     END IF.
             END IF.
     END INPUT PROGRAM.
     LIST.

   The above example reads data from file `a.data', then from `b.data', and concatenates them into a single active file.

     INPUT PROGRAM.
             NUMERIC #EOF.
     
             LOOP IF NOT #EOF.
                     DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
                     DO IF NOT #EOF.
                             END CASE.
                     END IF.
             END LOOP.
     
             COMPUTE #EOF = 0.
             LOOP IF NOT #EOF.
                     DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
                     DO IF NOT #EOF.
                             END CASE.
                     END IF.
             END LOOP.
     
             END FILE.
     END INPUT PROGRAM.
     LIST.
   The above example does the same thing as the previous example, in a different way.

     INPUT PROGRAM.
             LOOP #I=1 TO 50.
                     COMPUTE X=UNIFORM(10).
                     END CASE.
             END LOOP.
     
             END FILE.
     END INPUT PROGRAM.
     LIST/FORMAT=NUMBERED.

   The above example causes an active file to be created consisting of 50 random variates between 0 and 10.


File: pspp.info, Node: LIST, Next: MATRIX DATA, Prev: INPUT PROGRAM, Up: Data Input and Output

LIST
====

     LIST
             /VARIABLES=var_list
             /CASES=FROM start_index TO end_index BY incr_index
             /FORMAT={UNNUMBERED,NUMBERED} {WRAP,SINGLE}
                     {NOWEIGHT,WEIGHT}

   The LIST procedure prints the values of specified variables to the listing file.

   The VARIABLES subcommand specifies the variables whose values are to be printed. Keyword VARIABLES is optional. If the VARIABLES subcommand is not specified, then all variables in the active file are printed.

   The CASES subcommand can be used to specify a subset of cases to be printed. Specify FROM and the case number of the first case to print, TO and the case number of the last case to print, and BY and the number of cases to advance between printing cases, or any subset of those settings. If CASES is not specified, then all cases are printed.

   The FORMAT subcommand can be used to change the output format. NUMBERED will print case numbers along with each case; UNNUMBERED, the default, causes the case numbers to be omitted. The WRAP and SINGLE settings are currently not used. WEIGHT will cause case weights to be printed along with variable values; NOWEIGHT, the default, causes case weights to be omitted from the output.

   Case numbers start from 1. They are counted after all transformations have been considered.

   LIST will attempt to fit all the values on a single line. If necessary, variable names will be displayed vertically in order to fit. If values cannot fit on a single line, then a multi-line format will be used.

   LIST is a procedure. It causes the data to be read.
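   The CASES subcommand of LIST selects case numbers much like a counted loop. The Python sketch below lists which case numbers would be printed for a given FROM, TO, and BY. Treating an omitted FROM as 1, an omitted TO as the last case, and an omitted BY as 1, and clamping TO to the number of cases, are assumptions, since the text above does not state the defaults; `listed_cases' is a hypothetical name.

```python
def listed_cases(n_cases, start=1, end=None, step=1):
    """Case numbers printed by LIST /CASES=FROM start TO end BY step (sketch).

    Case numbers start from 1, as in LIST. The defaults for omitted
    settings and the clamping of END are assumptions of this sketch.
    """
    if start < 1 or step < 1:
        raise ValueError("FROM and BY must be positive")
    if end is None or end > n_cases:
        end = n_cases
    return list(range(start, end + 1, step))

print(listed_cases(10, start=2, end=8, step=3))  # [2, 5, 8]
```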
File: pspp.info, Node: MATRIX DATA, Next: NEW FILE, Prev: LIST, Up: Data Input and Output

MATRIX DATA
===========

     MATRIX DATA
             /VARIABLES=var_list
             /FILE='filename'
             /FORMAT={LIST,FREE} {LOWER,UPPER,FULL} {DIAGONAL,NODIAGONAL}
             /SPLIT={new_var,var_list}
             /FACTORS=var_list
             /CELLS=n_cells
             /N=n
             /CONTENTS={N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
                        DFE,MAT,COV,CORR,PROX}

   The MATRIX DATA command reads square matrices in one of several textual formats. MATRIX DATA clears the dictionary, replaces it, and reads a data file.

   Use VARIABLES to specify the variables that form the rows and columns of the matrices. You may not specify a variable named VARNAME_. You should specify VARIABLES first.

   Specify the file to read on FILE, either as a file name string or a file handle (*note FILE HANDLE::). If FILE is not specified, then matrix data must immediately follow MATRIX DATA with a BEGIN DATA...END DATA construct (*note BEGIN DATA::).

   The FORMAT subcommand specifies how the matrices are formatted. LIST, the default, indicates that there is one line per row of matrix data; FREE allows single matrix rows to be broken across multiple lines. This is analogous to the difference between DATA LIST LIST and DATA LIST FREE (*note DATA LIST::). LOWER, the default, indicates that the lower triangle of the matrix is given; UPPER indicates the upper triangle; and FULL indicates that the entire matrix is given. DIAGONAL, the default, indicates that the diagonal is part of the data; NODIAGONAL indicates that it is omitted. DIAGONAL/NODIAGONAL have no effect when FULL is specified.

   The SPLIT subcommand is used to specify SPLIT FILE variables for the input matrices (*note SPLIT FILE::). Specify either a single variable not specified on VARIABLES, or one or more variables that are specified on VARIABLES. In the former case, the SPLIT values are not present in the data and ROWTYPE_ may not be specified on VARIABLES. In the latter case, the SPLIT values are present in the data.
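   To make the LOWER, UPPER, FULL, and DIAGONAL/NODIAGONAL settings concrete, the Python sketch below rebuilds one full square matrix from the rows in which it might appear in the data. It is only an illustration: the function name is hypothetical, the matrix is assumed to be symmetric (as a COV or CORR matrix is), and because the exact semantics are left undocumented above, the value placed on an omitted diagonal is simply a parameter (1.0 would suit a correlation matrix).

```python
def expand_matrix(rows, shape="LOWER", diagonal=True, fill=1.0):
    """Rebuild a full symmetric matrix from MATRIX DATA-style rows (sketch).

    With NODIAGONAL (diagonal=False), the diagonal is filled with FILL.
    """
    if shape == "FULL":
        return [list(row) for row in rows]  # DIAGONAL/NODIAGONAL has no effect
    if shape not in ("LOWER", "UPPER"):
        raise ValueError("shape must be LOWER, UPPER or FULL")
    n = len(rows) if diagonal else len(rows) + 1
    full = [[fill] * n for _ in range(n)]
    for k, row in enumerate(rows):
        if shape == "LOWER":
            i = k if diagonal else k + 1  # matrix row held by data line k
            cols = range(0, i + 1 if diagonal else i)
        else:
            i = k
            cols = range(i if diagonal else i + 1, n)
        if len(row) != len(cols):
            raise ValueError(f"data line {k + 1} has {len(row)} values, expected {len(cols)}")
        for j, value in zip(cols, row):
            full[i][j] = value
            full[j][i] = value  # mirror across the diagonal
    return full

print(expand_matrix([[0.5], [0.2, 0.3]], diagonal=False))
# [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
```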
   Specify a list of factor variables on FACTORS. Factor variables must also be listed on VARIABLES. Factor variables are used when there are some variables where, for each possible combination of their values, statistics on the matrix variables are included in the data.

   If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the CELLS subcommand is required. Specify the number of factor variable combinations that are given. For instance, if factor variable A has 2 values and factor variable B has 3 values, specify 6.

   The N subcommand specifies a population number of observations. When N is specified, one N record is output for each SPLIT FILE.

   Use CONTENTS to specify what sort of information the matrices include. Each possible option is described in more detail below. When ROWTYPE_ is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS is not specified then /CONTENTS=CORR is assumed.

N
N_VECTOR
     Number of observations as a vector, one value for each variable.

N_SCALAR
     Number of observations as a single value.

N_MATRIX
     Matrix of counts.

MEAN
     Vector of means.

STDDEV
     Vector of standard deviations.

COUNT
     Vector of counts.

MSE
     Vector of mean squared errors.

DFE
     Vector of degrees of freedom.

MAT
     Generic matrix.

COV
     Covariance matrix.

CORR
     Correlation matrix.

PROX
     Proximities matrix.

   The exact semantics of the matrices read by MATRIX DATA are complex. Right now MATRIX DATA isn't too useful due to a lack of procedures accepting or producing related data, so these semantics aren't documented. Later, they'll be described here in detail.


File: pspp.info, Node: NEW FILE, Next: PRINT, Prev: MATRIX DATA, Up: Data Input and Output

NEW FILE
========

     NEW FILE.

   The NEW FILE command clears the current active file.


File: pspp.info, Node: PRINT, Next: PRINT EJECT, Prev: NEW FILE, Up: Data Input and Output

PRINT
=====

     PRINT
             OUTFILE='filename'
             RECORDS=n_lines
             {NOTABLE,TABLE}
             /[line_no] arg...
     
     arg takes one of the following forms:
             'string' [start-end]
             var_list start-end [type_spec]
             var_list (fortran_spec)
             var_list *

   The PRINT transformation writes variable data to an output file. PRINT is executed when a procedure causes the data to be read. In order to execute the PRINT transformation without invoking a procedure, use the EXECUTE command (*note EXECUTE::).

   All PRINT subcommands are optional.

   The OUTFILE subcommand specifies the file to receive the output. The file may be a file name as a string or a file handle (*note FILE HANDLE::). If OUTFILE is not present then output will be sent to PSPP's output listing file.

   The RECORDS subcommand specifies the number of lines to be output. The number of lines may optionally be surrounded by parentheses.

   TABLE will cause the PRINT command to output a table to the listing file that describes what it will print to the output file. NOTABLE, the default, suppresses this output table.

   Introduce the strings and variables to be printed with a slash (`/'). Optionally, the slash may be followed by a number indicating which output line will be specified. In the absence of this line number, the next line number will be specified. Multiple lines may be specified using multiple slashes with the intended output for a line following its respective slash.

   Literal strings may be printed. Specify the string itself. Optionally the string may be followed by a column number or range of column numbers, specifying the location on the line for the string to be printed. Otherwise, the string will be printed at the current position on the line.

   Variables to be printed can be specified in the same ways as available for DATA LIST FIXED (*note DATA LIST FIXED::). In addition, a variable list may be followed by an asterisk (`*'), which indicates that the variables should be printed in their dictionary print formats, separated by spaces. A variable list followed by a slash or the end of command will be interpreted the same way.
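   The placement rules for literal strings (at an explicit column or column range, or else at the current position) amount to writing into a line buffer. The Python sketch below models a single output line with columns numbered from 1. The class name is hypothetical; padding or cutting a string to the width of its column range is an assumption, since the text above does not say what happens when the two differ in length; and the sketch only writes left to right, so moving backward on the line is not modeled.

```python
class OutputLine:
    """Hypothetical model of one PRINT output line with 1-based columns."""

    def __init__(self):
        self.text = ""
        self.col = 1  # current position: the next column to be written

    def put(self, s, start=None, end=None):
        """Write s at column START (or the current position), optionally within START-END."""
        if start is None:
            start = self.col
        if start - 1 < len(self.text):
            raise ValueError("moving backward on the line is not modeled")
        if end is not None:
            width = end - start + 1
            s = s[:width].ljust(width)  # assumption: fit s to the column range
        self.text = self.text.ljust(start - 1) + s  # blanks up to the start column
        self.col = start + len(s)
        return self

line = OutputLine().put("ID:", 1).put("42", 5, 7).put("!")
print(line.text)  # ID: 42 !
```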
   If a FORTRAN type specification is used to move backward on the current line, then text is written starting at that point and the line is truncated to that length, although writing more text will extend the line again.


File: pspp.info, Node: PRINT EJECT, Next: PRINT SPACE, Prev: PRINT, Up: Data Input and Output

PRINT EJECT
===========

     PRINT EJECT
             OUTFILE='filename'
             RECORDS=n_lines
             {NOTABLE,TABLE}
             /[line_no] arg...
     
     arg takes one of the following forms:
             'string' [start-end]
             var_list start-end [type_spec]
             var_list (fortran_spec)
             var_list *

   PRINT EJECT is used to write data to an output file. Before the data is written, the current page in the listing file is ejected.

   *Note PRINT::, for more information on syntax and usage.


File: pspp.info, Node: PRINT SPACE, Next: REREAD, Prev: PRINT EJECT, Up: Data Input and Output

PRINT SPACE
===========

     PRINT SPACE OUTFILE='filename' n_lines.

   PRINT SPACE prints one or more blank lines to an output file.

   The OUTFILE subcommand is optional. It may be used to direct output to a file specified by file name as a string or file handle (*note FILE HANDLE::). If OUTFILE is not specified, then output will be directed to the listing file.

   n_lines is also optional. If present, it is an expression (*note Expressions::) specifying the number of blank lines to be printed. The expression must evaluate to a nonnegative value.


File: pspp.info, Node: REREAD, Next: REPEATING DATA, Prev: PRINT SPACE, Up: Data Input and Output

REREAD
======

     REREAD FILE=handle COLUMN=column.

   The REREAD transformation allows the previous input line in a data file already processed by DATA LIST or another input command to be re-read for further processing.

   The FILE subcommand, which is optional, is used to specify the file to have its line re-read. The file must be specified in the form of a file handle (*note FILE HANDLE::).
If FILE is not specified, the last file specified on DATA LIST is assumed (the last file specified lexically, not in terms of flow of control).

By default, the entire line is re-read. With the COLUMN subcommand, a prefix of the line can be exempted from re-reading. Specify an expression (*note Expressions::) evaluating to the first column that should be included in the re-read line. Columns are numbered from 1 at the left margin.

Multiple REREAD commands do not back up in the data file. Instead, they re-read the same line multiple times.


File: pspp.info, Node: REPEATING DATA, Next: WRITE, Prev: REREAD, Up: Data Input and Output

REPEATING DATA
==============

     REPEATING DATA
             /STARTS=start-end
             /OCCURS=n_occurs
             /FILE='filename'
             /LENGTH=length
             /CONTINUED[=cont_start-cont_end]
             /ID=id_start-id_end=id_var
             /{TABLE,NOTABLE}
             /DATA=var_spec...

     where each var_spec takes one of the forms
             var_list start-end [type_spec]
             var_list (fortran_spec)

The REPEATING DATA command is used to parse groups of data repeating in a uniform format, possibly with several groups on a single line. Each group of data corresponds to one case. REPEATING DATA may only be used within an INPUT PROGRAM structure. When used with DATA LIST, it can be used to parse groups of cases that share a subset of variables but differ in their other data.

The STARTS subcommand is required. Specify a range of columns, using literal numbers or numeric variable names. This range specifies the columns on the first line that are used to contain groups of data. The ending column is optional. If it is not specified, then the record width of the input file is used. For the inline file (*note BEGIN DATA::) this is 80 columns; for a file with fixed record widths it is the record width; for other files it is 1024 characters by default.

The OCCURS subcommand is required. It must be a number or the name of a numeric variable. Its value is the number of groups present in the current record.
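A minimal Python sketch of how STARTS and OCCURS carve one record into groups, assuming every group has the same fixed length (the value that the LENGTH subcommand gives, or that is inferred from DATA). The function `split_groups` is hypothetical, continuation lines (CONTINUED) are not modeled, and raising an error when a group does not fit is this sketch's own choice rather than documented PSPP behavior.

```python
def split_groups(record, start, end, occurs, length):
    """Return the first `occurs` groups of `length` characters found in
    columns start..end (1-based, inclusive) of `record`.

    end=None stands for "use the record width", as when STARTS has no
    ending column.
    """
    if end is None:
        end = len(record)
    area = record[start - 1:end]
    groups = []
    for i in range(occurs):
        group = area[i * length:(i + 1) * length]
        if len(group) < length:
            # Assumption: treat a short group as an error in this model.
            raise ValueError(f"group {i + 1} does not fit in columns {start}-{end}")
        groups.append(group)
    return groups
```

With `split_groups("HDR 01a02b03c", 5, None, 3, 3)` the groups are `01a`, `02b`, and `03c`. Columns inside each group then count from 1 at the start of the group, which is how the DATA subcommand numbers them.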

The DATA subcommand is required. It must be the last subcommand specified. It is used to specify the data present within each repeating group. Column numbers are specified relative to the beginning of a group at column 1. Data is specified in the same way as with DATA LIST FIXED (*note DATA LIST FIXED::).

All other subcommands are optional.

FILE specifies the file to read, either a file name as a string or a file handle (*note FILE HANDLE::). If FILE is not present, the default is the last file handle used on DATA LIST (lexically, not in terms of flow of control).

By default REPEATING DATA outputs a table describing how it will parse the input data. Specifying NOTABLE disables this behavior; specifying TABLE explicitly enables it.

The LENGTH subcommand specifies the length in characters of each group. If it is not present, the length is inferred from the DATA subcommand. LENGTH can be a number or a variable name.

Normally all the data groups are expected to be present on a single line. Use the CONTINUED subcommand to indicate that data can be continued onto additional lines. If data on continuation lines starts at the left margin and continues through the entire field width, no column specifications are necessary on CONTINUED. Otherwise, specify the possible range of columns in the same way as on STARTS.

When data groups are continued from line to line, it is easy for cases to get out of sync if hand editing is not done carefully. The ID subcommand allows a case identifier to be present on each line of repeating data groups. REPEATING DATA checks for the same identifier on each line and reports mismatches. Specify the range of columns that the identifier will occupy, followed by an equals sign (`=') and the identifier variable name. The variable must already have been declared with NUMERIC or another command.
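The ID check can be pictured with the following Python sketch: it reads the identifier columns from each line belonging to one case and reports every line whose identifier differs from the one on the first line. The name `check_ids` is hypothetical, the identifier field is assumed to hold a plain number (the ID variable is numeric), and PSPP's actual message text and recovery behavior are not modeled.

```python
def check_ids(lines, id_start, id_end):
    """Return (line_index, found, expected) for each line whose identifier
    in columns id_start..id_end (1-based, inclusive) differs from the
    identifier on the first line.

    Identifiers are compared as numbers, so "0001" and "   1" match.
    """
    ids = [float(line[id_start - 1:id_end]) for line in lines]
    expected = ids[0]
    return [(i, found, expected)
            for i, found in enumerate(ids) if found != expected]
```

For example, `check_ids(["0001a", "   1b", "0002c"], 1, 4)` flags only the third line (index 2), whose identifier is 2 rather than 1.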


File: pspp.info, Node: WRITE, Prev: REPEATING DATA, Up: Data Input and Output

WRITE
=====

     WRITE
             OUTFILE='filename'
             RECORDS=n_lines
             {NOTABLE,TABLE}
             /[line_no] arg...

     arg takes one of the following forms:
             'string' [start-end]
             var_list start-end [type_spec]
             var_list (fortran_spec)
             var_list *

WRITE is used to write text or binary data to an output file.

*Note PRINT::, for more information on syntax and usage. The main difference between PRINT and WRITE is that PRINT uses variables' print formats by default, whereas WRITE uses their write formats. The only other difference is that when WRITE sends output to a binary file, carriage control characters are not output. *Note FILE HANDLE::, for information on how to declare a file as binary.


File: pspp.info, Node: System and Portable Files, Next: Variable Attributes, Prev: Data Input and Output, Up: Top

System Files and Portable Files
*******************************

The commands in this chapter read, write, and examine system files and portable files.

* Menu:

* APPLY DICTIONARY::            Apply system file dictionary to active file.
* EXPORT::                      Write to a portable file.
* GET::                         Read from a system file.
* IMPORT::                      Read from a portable file.
* MATCH FILES::                 Merge system files.
* SAVE::                        Write to a system file.
* SYSFILE INFO::                Display system file dictionary.
* XSAVE::                       Write to a system file, as a transform.


File: pspp.info, Node: APPLY DICTIONARY, Next: EXPORT, Prev: System and Portable Files, Up: System and Portable Files

APPLY DICTIONARY
================

     APPLY DICTIONARY FROM='filename'.

The APPLY DICTIONARY command applies the variable labels, value labels, and missing values from variables in a system file to corresponding variables in the active file. In some cases it also updates the weighting variable.

Specify a system file with a file name string or as a file handle (*note FILE HANDLE::). The dictionary in the system file will be read, but it will not replace the active file dictionary.
The system file's data will not be read.

Only variables with names that exist in both the active file and the system file are considered. Variables with the same name but different types (numeric, string) will cause an error message. Otherwise, the system file variables' attributes will replace those in their matching active file variables, as described below.

If a system file variable has a variable label, then it will replace the active file variable's variable label. If the system file variable does not have a variable label, then the active file variable's variable label, if any, will be retained.

If the active file variable is numeric or short string, then value labels and missing values, if any, will be copied to the active file variable. If the system file variable does not have value labels or missing values, then those in the active file variable, if any, will not be disturbed.

Finally, weighting of the active file is updated (*note WEIGHT::). If the active file has a weighting variable, and the system file does not, or if the weighting variable in the system file does not exist in the active file, then the active file weighting variable, if any, is retained. Otherwise, the weighting variable in the system file becomes the active file weighting variable.

APPLY DICTIONARY takes effect immediately. It does not read the active file. The system file is not modified.


File: pspp.info, Node: EXPORT, Next: GET, Prev: APPLY DICTIONARY, Up: System and Portable Files

EXPORT
======

     EXPORT
             /OUTFILE='filename'
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

The EXPORT procedure writes the active file dictionary and data to a specified portable file.

The OUTFILE subcommand, which is the only required subcommand, specifies the portable file to be written as a file name string or a file handle (*note FILE HANDLE::).

DROP, KEEP, and RENAME follow the same format as the SAVE procedure (*note SAVE::).

EXPORT is a procedure.
It causes the active file to be read.


File: pspp.info, Node: GET, Next: IMPORT, Prev: EXPORT, Up: System and Portable Files

GET
===

     GET
             /FILE='filename'
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

The GET transformation clears the current dictionary and active file and replaces them with the dictionary and data from a specified system file.

The FILE subcommand is the only required subcommand. Specify the system file to be read as a string file name or a file handle (*note FILE HANDLE::).

By default, all the variables in a system file are read. The DROP subcommand can be used to specify a list of variables that are not to be read. By contrast, the KEEP subcommand can be used to specify variables that are to be read, with all other variables not read.

Normally variables in a system file retain the names that they were saved under. Use the RENAME subcommand to change these names. Specify, within parentheses, a list of variable names followed by an equals sign (`=') and the names that they should be renamed to. Multiple parenthesized groups of variable names can be included on a single RENAME subcommand. Variables' names may be swapped using a RENAME subcommand of the form `/RENAME=(A B=B A)'.

Alternate syntax for the RENAME subcommand allows the parentheses to be eliminated. When this is done, only a single variable may be renamed at once. For instance, `/RENAME=A=B'. This alternate syntax is deprecated.

DROP, KEEP, and RENAME are performed in left-to-right order. Each may be present any number of times.

Note that DROP, KEEP, and RENAME do not cause the system file on disk to be modified. Only the active file read from the system file is changed.

GET does not cause the data to be read, only the dictionary. The data is read later, when a procedure is executed.
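The following Python sketch models the effect of DROP, KEEP, and RENAME on a list of variable names when they are applied in left-to-right order, including a swap such as `/RENAME=(A B=B A)'. The function name `apply_subcommands` and its tuple format are hypothetical. The sketch performs no error checking (unknown or duplicate names), and keeping the surviving variables in their existing order after KEEP is an assumption of the model.

```python
def apply_subcommands(names, ops):
    """Apply DROP, KEEP, and RENAME to a list of variable names, in order.

    ops is a list of ("DROP", [names]), ("KEEP", [names]) or
    ("RENAME", [old names], [new names]) tuples.
    """
    names = list(names)
    for op in ops:
        if op[0] == "DROP":
            names = [n for n in names if n not in op[1]]
        elif op[0] == "KEEP":
            # Assumption: kept variables stay in their existing order.
            names = [n for n in names if n in op[1]]
        elif op[0] == "RENAME":
            # All renames in one group are looked up against the old names
            # at once, which is what lets (A B=B A) swap two variables.
            mapping = dict(zip(op[1], op[2]))
            names = [mapping.get(n, n) for n in names]
        else:
            raise ValueError(f"unknown subcommand {op[0]}")
    return names
```

For example, renaming with `("RENAME", ["A", "B"], ["B", "A"])` turns `["A", "B", "C"]` into `["B", "A", "C"]`. Because the operations run in order, a KEEP that follows a RENAME must refer to the new names.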


File: pspp.info, Node: IMPORT, Next: MATCH FILES, Prev: GET, Up: System and Portable Files

IMPORT
======

     IMPORT
             /FILE='filename'
             /TYPE={COMM,TAPE}
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

The IMPORT transformation clears the active file dictionary and data and replaces them with a dictionary and data from a portable file on disk.

The FILE subcommand, which is the only required subcommand, specifies the portable file to be read as a file name string or a file handle (*note FILE HANDLE::).

The TYPE subcommand is currently not used.

DROP, KEEP, and RENAME follow the syntax used by GET (*note GET::).

IMPORT does not cause the data to be read, only the dictionary. The data is read later, when a procedure is executed.


File: pspp.info, Node: MATCH FILES, Next: SAVE, Prev: IMPORT, Up: System and Portable Files

MATCH FILES
===========

     MATCH FILES
             /BY var_list
             /{FILE,TABLE}={*,'filename'}
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...
             /IN=var_name
             /FIRST=var_name
             /LAST=var_name
             /MAP

The MATCH FILES command merges one or more system files, optionally including the active file. Records with the same values for BY variables are combined into a single record. Records with different values are output in order. Thus, multiple sorted system files are combined into a single sorted system file based on the value of the BY variables.

The BY subcommand specifies a list of variables that are used to match records from each of the system files. Variables specified must exist in all the files specified on FILE and TABLE. BY should usually be specified. If TABLE is used then BY is required.

Specify FILE with a system file as a file name string or file handle (*note FILE HANDLE::). An asterisk (`*') may also be specified to indicate the current active file. The files specified on FILE are merged together based on the BY variables, or combined case-by-case if BY is not specified. Normally at least two FILE subcommands should be specified.
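A minimal Python sketch of the FILE merge described above. It assumes each input is a list of dictionaries already sorted by the BY variables, with at most one record per BY value in each file, and with no non-BY variable names shared between files; case-by-case combination without BY is not modeled. `match_files` is a hypothetical name, and `None` stands in for the system-missing value (or spaces, for string variables).

```python
def match_files(files, by):
    """Merge inputs sorted on the BY variables into one sorted list."""
    # The non-BY variables of each file, so absent ones can be filled in.
    variables = [sorted({k for rec in f for k in rec} - set(by)) for f in files]
    pos = [0] * len(files)  # next unread record in each file
    out = []
    while True:
        keys = [tuple(f[p][v] for v in by)
                for f, p in zip(files, pos) if p < len(f)]
        if not keys:
            return out
        key = min(keys)  # smallest BY value not yet output
        merged = dict(zip(by, key))
        for i, f in enumerate(files):
            rec = f[pos[i]] if pos[i] < len(f) else None
            if rec is not None and tuple(rec[v] for v in by) == key:
                merged.update({v: rec[v] for v in variables[i]})
                pos[i] += 1
            else:
                # This file has no record for the key: "system-missing".
                merged.update({v: None for v in variables[i]})
        out.append(merged)
```

Merging `[{"id": 1, "x": 10}, {"id": 3, "x": 30}]` with `[{"id": 1, "y": "p"}, {"id": 2, "y": "q"}]` by `["id"]` yields three records in `id` order, with `x` missing for id 2 and `y` missing for id 3.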

Specify TABLE with a system file in order to use it as a "table lookup file". Records in table lookup files are not used up after they have been used once. This means that data in table lookup files can correspond to any number of records in FILE files. Table lookup files correspond to lookup tables in traditional relational database systems. It is incorrect to have records with duplicate BY values in table lookup files.

Any number of FILE and TABLE subcommands may be specified. Each instance of FILE or TABLE can be followed by DROP, KEEP, and/or RENAME subcommands. These take the same form as the corresponding subcommands of GET (*note GET::), and perform the same functions.

Variables belonging to files that are not present for the current case are set to the system-missing value for numeric variables or spaces for string variables.

IN, FIRST, LAST, and MAP are currently not used.
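Table lookup can be sketched in Python as a dictionary keyed on the BY values: each FILE record picks up the matching TABLE record, which is not used up, so any number of FILE records can share it. This is an illustrative model, not PSPP's implementation. `table_lookup` is a hypothetical name, `None` stands in for system-missing, and rejecting duplicate BY values with an exception is this sketch's own way of handling the case the text calls incorrect.

```python
def table_lookup(file_records, table_records, by):
    """Attach TABLE variables to each FILE record with matching BY values."""
    table = {}
    for rec in table_records:
        key = tuple(rec[v] for v in by)
        if key in table:
            raise ValueError(f"duplicate BY value {key} in table lookup file")
        table[key] = rec
    table_vars = sorted({k for rec in table_records for k in rec} - set(by))
    out = []
    for rec in file_records:
        match = table.get(tuple(rec[v] for v in by))  # reused, never consumed
        merged = dict(rec)
        for v in table_vars:
            merged[v] = match[v] if match is not None else None
        out.append(merged)
    return out
```

Here two FILE records with `dept` `"A"` both receive the same `floor` from one TABLE record, a FILE record with no match gets `None`, and a TABLE record that matches no FILE record contributes nothing to the output.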