This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.

START-INFO-DIR-ENTRY
* PSPP: (pspp).             Statistical analysis package.
END-INFO-DIR-ENTRY

   PSPP, for statistical analysis of sampled data, by Ben Pfaff.

   This file documents PSPP, a statistical package for analysis of
sampled data that uses a command language compatible with SPSS.

   Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.

   This version of the PSPP documentation is consistent with version 2
of "texinfo.tex".

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above condition for modified
versions, except that this permission notice may be stated in a
translation approved by the Free Software Foundation.


File: pspp.info,  Node: Date Extraction,  Prev: Date Construction,  Up: Time & Date

Functions that Examine Dates
............................

   These functions take numeric arguments in PSPP date or time format
and give numeric results.  These names are used for arguments:

DATE
     A numeric value in PSPP date format.

TIME
     A numeric value in PSPP time format.

TIME-OR-DATE
     A numeric value in PSPP time or date format.

 - Function:  XDATE.DATE (TIME-OR-DATE)
     For a time, results in the time corresponding to the number of
     whole days TIME-OR-DATE includes.  For a date, results in the date
     corresponding to the latest midnight at or before TIME-OR-DATE;
     that is, gives the date that TIME-OR-DATE is in.  (XDATE.DATE(X)
     is equivalent to TRUNC(X/86400)*86400.)  Applying this function to
     a time is a non-portable feature.

 - Function:  XDATE.HOUR (TIME-OR-DATE)
     For a time, results in the number of whole hours beyond the number
     of whole days represented by TIME-OR-DATE.  For a date, results in
     the hour (as an integer between 0 and 23) corresponding to
     TIME-OR-DATE.  (XDATE.HOUR(X) is equivalent to
     MOD(TRUNC(X/3600),24).)  Applying this function to a time is a
     non-portable feature.

 - Function:  XDATE.JDAY(DATE)
     Results in the day of the year (as an integer between 1 and 366)
     corresponding to DATE.

 - Function:  XDATE.MDAY(DATE)
     Results in the day of the month (as an integer between 1 and 31)
     corresponding to DATE.

 - Function:  XDATE.MINUTE(TIME-OR-DATE)
     Results in the number of minutes (as an integer between 0 and 59)
     after the last hour in TIME-OR-DATE.  (XDATE.MINUTE(X) is
     equivalent to MOD(TRUNC(X/60),60).)  Applying this function to a
     time is a non-portable feature.

 - Function:  XDATE.MONTH(DATE)
     Results in the month of the year (as an integer between 1 and 12)
     corresponding to DATE.

 - Function:  XDATE.QUARTER(DATE)
     Results in the quarter of the year (as an integer between 1 and 4)
     corresponding to DATE.

 - Function:  XDATE.SECOND(TIME-OR-DATE)
     Results in the number of whole seconds after the last whole minute
     (as an integer between 0 and 59) in TIME-OR-DATE.
     (XDATE.SECOND(X) is equivalent to MOD(X, 60).)  Applying this
     function to a time is a non-portable feature.

 - Function:  XDATE.TDAY(TIME)
     Results in the number of whole days (as an integer) in TIME.
     (XDATE.TDAY(X) is equivalent to TRUNC(X/86400).)

 - Function:  XDATE.TIME(DATE)
     Results in the time of day at the instant corresponding to DATE,
     in PSPP time format.  This is the number of seconds since midnight
     on the day corresponding to DATE.  (XDATE.TIME(X) is equivalent to
     X - TRUNC(X/86400)*86400.)

 - Function:  XDATE.WEEK(DATE)
     Results in the week of the year (as an integer between 1 and 53)
     corresponding to DATE.

 - Function:  XDATE.WKDAY(DATE)
     Results in the day of week (as an integer between 1 and 7)
     corresponding to DATE.  The days of the week are:

    1
          Sunday

    2
          Monday

    3
          Tuesday

    4
          Wednesday

    5
          Thursday

    6
          Friday

    7
          Saturday

 - Function:  XDATE.YEAR (DATE)
     Returns the year (as an integer between 1582 and 19999)
     corresponding to DATE.
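
   As a rough illustration of the equivalences quoted above, the sketch
below applies four of these functions to a time value.  The variable
names and the value 93784 (1 day, 2 hours, 3 minutes, and 4 seconds,
counted in seconds) are assumptions made for this example only:

     * T is an invented time value of 93784 seconds.
     DATA LIST FREE /T.
     BEGIN DATA.
     93784
     END DATA.
     COMPUTE DAYS = XDATE.TDAY(T).
     COMPUTE HOURS = XDATE.HOUR(T).
     COMPUTE MINS = XDATE.MINUTE(T).
     COMPUTE SECS = XDATE.SECOND(T).
     LIST.

   Working through the formulas by hand, DAYS should be 1
(TRUNC(93784/86400)), HOURS 2 (MOD(26,24)), MINS 3 (MOD(1563,60)), and
SECS 4 (MOD(93784,60)).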


File: pspp.info,  Node: Miscellaneous Functions,  Next: Functions Not Implemented,  Prev: Time & Date,  Up: Functions

Miscellaneous Functions
-----------------------

   Miscellaneous functions take various arguments and produce various
results.

 - Function:  LAG (VARIABLE)
     VARIABLE must be a numeric or string variable name.  `LAG' results
     in the value of that variable for the case before the current one.
     In case-selection procedures, `LAG' results in the value of the
     variable for the last case selected.  Results in system-missing
     (for numeric variables) or blanks (for string variables) for the
     first case or before any cases are selected.

 - Function:  LAG (VARIABLE, NCASES)
     VARIABLE must be a numeric or string variable name.  NCASES must
     be a small positive constant integer, although there is no explicit
     limit.  (Use of a large value for NCASES will increase memory
     consumption, since PSPP must keep NCASES cases in memory.)  `LAG
     (VARIABLE, NCASES)' results in the value of VARIABLE for the case
     that is NCASES cases before the one currently being processed.
     See `LAG (VARIABLE)' above for more details.

 - Function:  YRMODA (YEAR, MONTH, DAY)
     YEAR is a year between 0 and 199 or 1582 and 19999.  MONTH is a
     month between 1 and 12.  DAY is a day between 1 and 31.  If MONTH
     or DAY is out-of-range, it changes the next higher unit.  For
     instance, a DAY of 0 refers to the last day of the previous month,
     and a MONTH of 13 refers to the first month of the next year.
     YEAR must be in range.  If YEAR is between 0 and 199, 1900 is
     added.  YEAR, MONTH, and DAY must all be integers.

     `YRMODA' results in the number of days between 15 Oct 1582 and the
     date specified, plus one.  The date passed to `YRMODA' must be on
     or after 15 Oct 1582.  15 Oct 1582 has a value of 1.
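
   The sketch below shows how `LAG' and `YRMODA' might be used
together; the variable names and data are invented for illustration:

     * X, PREV, PREV2, D1 and D17 are made-up names.
     DATA LIST FREE /X.
     BEGIN DATA.
     10 20 30
     END DATA.
     COMPUTE PREV = LAG(X).
     COMPUTE PREV2 = LAG(X, 2).
     COMPUTE D1 = YRMODA(1582, 10, 15).
     COMPUTE D17 = YRMODA(1582, 11, 0).
     LIST.

   Going by the descriptions above, PREV should be system-missing for
the first case and then 10 and 20, while PREV2 should hold 10 only in
the third case.  D1 should be 1, and D17 should be 17, because a DAY of
0 in November refers to 31 Oct 1582, sixteen days after 15 Oct 1582.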


File: pspp.info,  Node: Functions Not Implemented,  Prev: Miscellaneous Functions,  Up: Functions

Functions Not Implemented
-------------------------

   These functions are not yet implemented and thus not yet documented,
since it's a hassle.

   * `CDF.xxx'

   * `CDFNORM'

   * `IDF.xxx'

   * `NCDF.xxx'

   * `PROBIT'

   * `RV.xxx'


File: pspp.info,  Node: Order of Operations,  Prev: Functions,  Up: Expressions

Operator Precedence
===================

   The following table describes operator precedence.  Smaller-numbered
levels in the table have higher precedence.  Within a level, operations
are performed from left to right, except for level 2 (exponentiation),
where operations are performed from right to left.  If an operator
appears in the table in two places (`-'), the first occurrence is
unary, the second is binary.

  1. `(  )'

  2. `**'

  3. `-'

  4. `*  /'

  5. `+  -'

  6. `EQ  GE  GT  LE  LT  NE'

  7. `AND  NOT  OR'
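
   As a small sketch of these rules (variable names invented for the
example), consider:

     * Assumes an active file has already been defined.
     COMPUTE A = 2 + 3 * 4.
     COMPUTE B = (2 + 3) * 4.
     COMPUTE C = 2 ** 3 ** 2.

   According to the table, A should be 14, since `*' binds more tightly
than `+'; B should be 20, since parentheses come first; and C should be
512, since exponentiation groups from right to left, giving 2 ** 9.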


File: pspp.info,  Node: Data Input and Output,  Next: System and Portable Files,  Prev: Expressions,  Up: Top

Data Input and Output
*********************

   Data is the focus of the PSPP language.  This chapter examines the
PSPP commands for defining variables and reading and writing data.

     *Please note:* Data is not actually read until a procedure is
     executed.  These commands tell PSPP how to read data, but they do
     not _cause_ PSPP to read data.

* Menu:

* BEGIN DATA::                  Embed data within a syntax file.
* CLEAR TRANSFORMATIONS::       Clear pending transformations.
* DATA LIST::                   Fundamental data reading command.
* END CASE::                    Output the current case.
* END FILE::                    Terminate the current input program.
* FILE HANDLE::                 Support for fixed-length records.
* INPUT PROGRAM::               Support for complex input programs.
* LIST::                        List cases in the active file.
* MATRIX DATA::                 Read matrices in text format.
* NEW FILE::                    Clear the active file and dictionary.
* PRINT::                       Display values in print formats.
* PRINT EJECT::                 Eject the current page then print.
* PRINT SPACE::                 Print blank lines.
* REREAD::                      Take another look at the previous input line.
* REPEATING DATA::              Multiple cases on a single line.
* WRITE::                       Display values in write formats.


File: pspp.info,  Node: BEGIN DATA,  Next: CLEAR TRANSFORMATIONS,  Prev: Data Input and Output,  Up: Data Input and Output

BEGIN DATA
==========

     BEGIN DATA.
     ...
     END DATA.

   BEGIN DATA and END DATA can be used to embed raw ASCII data in a PSPP
syntax file.  DATA LIST or another input procedure must be used before
BEGIN DATA (*note DATA LIST::).  BEGIN DATA and END DATA must be used
together.  The END DATA command must appear by itself on a single line,
with no leading whitespace and exactly one space between the words
`END' and `DATA', followed immediately by the terminal dot, like this:

     END DATA.
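
   A minimal sketch of the pairing described above; the variable names
and data values are invented for illustration:

     * ID and SCORE are hypothetical variables.
     DATA LIST /ID 1-3 SCORE 5-6.
     BEGIN DATA.
     101 87
     102 93
     END DATA.
     LIST.

   Here DATA LIST comes first and describes the layout, the two data
lines sit between BEGIN DATA and END DATA, and LIST is the procedure
that actually causes the data to be read.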


File: pspp.info,  Node: CLEAR TRANSFORMATIONS,  Next: DATA LIST,  Prev: BEGIN DATA,  Up: Data Input and Output

CLEAR TRANSFORMATIONS
=====================

     CLEAR TRANSFORMATIONS.

   The CLEAR TRANSFORMATIONS command clears out all pending
transformations.  It does not cancel the current input program.  It is
valid only when PSPP is interactive, not in syntax files.


File: pspp.info,  Node: DATA LIST,  Next: END CASE,  Prev: CLEAR TRANSFORMATIONS,  Up: Data Input and Output

DATA LIST
=========

   Used to read text or binary data, DATA LIST is the most fundamental
data-reading command.  Even the more sophisticated input methods use
DATA LIST commands as a building block.  Understanding DATA LIST is
important to understanding how to use PSPP to read your data files.

   There are two major variants of DATA LIST, which are fixed format
and free format.  In addition, free format has a minor variant, list
format, which is discussed in terms of its differences from vanilla
free format.

   Each form of DATA LIST is described in detail below.

* Menu:

* DATA LIST FIXED::             Fixed columnar locations for data.
* DATA LIST FREE::              Any spacing you like.
* DATA LIST LIST::              Each case must be on a single line.


File: pspp.info,  Node: DATA LIST FIXED,  Next: DATA LIST FREE,  Prev: DATA LIST,  Up: DATA LIST

DATA LIST FIXED
---------------

     DATA LIST [FIXED]
             {TABLE,NOTABLE}
             FILE='filename'
             RECORDS=record_count
             END=end_var
             /[line_no] var_spec...
     
     where each var_spec takes one of the forms
             var_list start-end [type_spec]
             var_list (fortran_spec)

   DATA LIST FIXED is used to read data files that have values at fixed
positions on each line of single-line or multiline records.  The
keyword FIXED is optional.

   The FILE subcommand must be used if input is to be taken from an
external file.  It may be used to specify a filename as a string or a
file handle (*note FILE HANDLE::).  If the FILE subcommand is not used,
then input is assumed to be specified within the command file using
BEGIN DATA...END DATA (*note BEGIN DATA::).

   The optional RECORDS subcommand, which takes a single integer as an
argument, is used to specify the number of lines per record.  If RECORDS
is not specified, then the number of lines per record is calculated from
the list of variable specifications later in the DATA LIST command.

   The END subcommand is only useful in conjunction with the INPUT
PROGRAM input procedure, and for that reason it is not discussed here
(*note INPUT PROGRAM::).

   DATA LIST can optionally output a table describing how the data file
will be read.  The TABLE subcommand enables this output, and NOTABLE
disables it.  The default is to output the table.

   The list of variables to be read from the data list must come last in
the DATA LIST command.  Each line in the data record is introduced by a
slash (`/').  Optionally, a line number may follow the slash.  After
that, any number of variable specifications may be present.

   Each variable specification consists of a list of variable names
followed by a description of their location on the input line.  Sets of
variables may be specified using DATA LIST's TO convention (*note Sets
of Variables::).  There are two ways to specify the location of the
variable on the line: SPSS style and FORTRAN style.

   With SPSS style, the starting column and ending column for the field
are specified after the variable name, separated by a dash (`-').  For
instance, the third through fifth columns on a line would be specified
`3-5'.  By default, variables are considered to be in `F' format (*note
Input/Output Formats::).  (This default can be changed; see *Note SET::
for more information.)

   When using SPSS style, to use a variable format other than the
default, specify the format type in parentheses after the column
numbers.  For instance, for alphanumeric `A' format, use `(A)'.

   In addition, implied decimal places can be specified in parentheses
after the column numbers.  As an example, suppose that a data file has a
field in which the characters `1234' should be interpreted as having
the value 12.34.  Then this field has two implied decimal places, and
the corresponding specification would be `(2)'.  If a field that has
implied decimal places contains a decimal point, then the implied
decimal places are not applied.

   Changing the variable format and adding implied decimal places can be
done together; for instance, `(N,5)'.
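
   For instance, the sketch below (with a made-up variable name and
data) reads a four-column field with two implied decimal places:

     * PRICE is a hypothetical variable in columns 1 through 4.
     DATA LIST NOTABLE /PRICE 1-4 (2).
     BEGIN DATA.
     1234
     1.50
     END DATA.
     LIST.

   Following the rules above, the first case should be read as 12.34.
The second field already contains a decimal point, so it should be
read as 1.5 without the implied decimal places being applied.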

   When using SPSS style, the input and output width of each variable is
computed from the field width.  The field width must be evenly divisible
by the number of variables specified.

   FORTRAN style is an altogether different approach to specifying field
locations.  With this approach, a list of variable input format
specifications, separated by commas, is placed after the variable names
inside parentheses.  Each format specifier advances as many characters
into the input line as it uses.

   In addition to the standard format specifiers (*note Input/Output
Formats::), FORTRAN style defines some extensions:

`X'
     Advance the current column on this line by one character position.

`T'X
     Set the current column on this line to column X, with column
     numbers considered to begin with 1 at the left margin.

`NEWREC'X
     Skip forward X lines in the current record, resetting the active
     column to the left margin.

Repeat count
     Any format specifier may be preceded by a number.  This causes the
     action of that format specifier to be repeated the specified
     number of times.

(SPEC1, ..., SPECN)
     Group the given specifiers together.  This is most useful when
     preceded by a repeat count.  Groups may be nested arbitrarily.

   FORTRAN and SPSS styles may be freely intermixed.  SPSS style leaves
the active column immediately after the ending column specified.  Record
motion using `NEWREC' in FORTRAN style also applies to later FORTRAN
and SPSS specifiers.
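
   A minimal sketch that mixes the two styles, using invented variable
names and data, might look like this:

     * ID, NAME and SCORE are hypothetical variables.
     DATA LIST NOTABLE /ID NAME (F3.0, X, A10) SCORE 16-17.
     BEGIN DATA.
     101 John Smith 87
     102 Bob Arnold 93
     END DATA.
     LIST.

   Here `F3.0' should consume columns 1 through 3 for ID, `X' should
skip column 4, and `A10' should take columns 5 through 14 for NAME.
SCORE is then given in SPSS style at columns 16 through 17.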

* Menu:

* DATA LIST FIXED Examples::    Examples of DATA LIST FIXED.


File: pspp.info,  Node: DATA LIST FIXED Examples,  Prev: DATA LIST FIXED,  Up: DATA LIST FIXED

Examples
........

  1.      DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
          
          BEGIN DATA.
          John Smith 102311
          Bob Arnold 122015
          Bill Yates  918 6
          END DATA.

     Defines the following variables:

        * `NAME', a 10-character-wide long string variable, in columns 1
          through 10.

        * `INFO1', a numeric variable, in columns 12 through 13.

        * `INFO2', a numeric variable, in columns 14 through 15.

        * `INFO3', a numeric variable, in columns 16 through 17.

     The `BEGIN DATA'/`END DATA' commands cause three cases to be
     defined:

          Case   NAME         INFO1   INFO2   INFO3
             1   John Smith    1.0     2.3     1.1
             2   Bob Arnold    1.2     2.0     1.5
             3   Bill Yates     .9     1.8      .6

     The `TABLE' keyword causes PSPP to print out a table describing
     the four variables defined.

  2.      DAT LIS FIL="survey.dat"
                  /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
                  /Q01 TO Q50 7-56
                  /.

     Defines the following variables:

        * `ID', a numeric variable, in columns 1-5 of the first record.

        * `NAME', a 30-character long string variable, in columns 7-36
          of the first record.

        * `SURNAME', a 30-character long string variable, in columns
          38-67 of the first record.

        * `MINITIAL', a 1-character short string variable, in column 69
          of the first record.

        * Fifty variables `Q01', `Q02', `Q03', ..., `Q49', `Q50', all
          numeric, `Q01' in column 7, `Q02' in column 8, ..., `Q49' in
          column 55, `Q50' in column 56, all in the second record.

     Cases are separated by a blank record.

     Data is read from file `survey.dat' in the current directory.

     This example shows keywords abbreviated to their first 3 letters.



File: pspp.info,  Node: DATA LIST FREE,  Next: DATA LIST LIST,  Prev: DATA LIST FIXED,  Up: DATA LIST

DATA LIST FREE
--------------

     DATA LIST FREE
             [{NOTABLE,TABLE}]
             FILE='filename'
             END=end_var
             /var_spec...
     
     where each var_spec takes one of the forms
             var_list [(type_spec)]
             var_list *

   In free format, the input data is structured as a series of comma- or
whitespace-delimited fields (end of line is one form of whitespace; it
is not treated specially).  Field contents may be surrounded by matched
pairs of apostrophes (`'') or quotes (`"'), or they may be unenclosed.
For any type of field, leading white space (up to the apostrophe or
quote, if any) is not included in the field.

   Multiple consecutive delimiters are equivalent to a single delimiter.
To specify an empty field, write an empty set of single or double
quotes; for instance, `""'.

   The NOTABLE and TABLE subcommands are as in DATA LIST FIXED above.
NOTABLE is the default.

   The FILE and END subcommands are as in DATA LIST FIXED above.

   The variables to be parsed are given as a single list of variable
names.  This list must be introduced by a single slash (`/').  The set
of variable names may contain format specifications in parentheses
(*note Input/Output Formats::).  Format specifications apply to all
variables back to the previous parenthesized format specification.

   In addition, an asterisk may be used to indicate that all variables
preceding it are to have input/output format `F8.0'.

   Specified field widths are ignored on input, although all normal
limits on field width apply, but they are honored on output.
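
   The following sketch, with invented variable names and data, shows
quoted and unquoted fields with both kinds of delimiter:

     * NAME, AGE and WEIGHT are made-up variables.
     DATA LIST FREE /NAME (A10) AGE WEIGHT *.
     BEGIN DATA.
     'John Smith' 34 180
     "Bob Arnold",41,165
     END DATA.
     LIST.

   This should define two cases.  NAME takes format `A10' from its
parenthesized specification, and the asterisk gives AGE and WEIGHT
format `F8.0'.  Because end of line is just another form of whitespace
in free format, both cases could equally be written on a single line.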


File: pspp.info,  Node: DATA LIST LIST,  Prev: DATA LIST FREE,  Up: DATA LIST

DATA LIST LIST
--------------

     DATA LIST LIST
             [{NOTABLE,TABLE}]
             FILE='filename'
             END=end_var
             /var_spec...
     
     where each var_spec takes one of the forms
             var_list [(type_spec)]
             var_list *

   Syntactically and semantically, DATA LIST LIST is equivalent to DATA
LIST FREE, with one exception: each input line is expected to correspond
to exactly one input record.  If more or fewer fields are found on an
input line than expected, an appropriate diagnostic is issued.
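
   As a sketch of the difference from free format (variable names and
data invented for the example):

     * X, Y and Z are hypothetical variables.
     DATA LIST LIST /X Y Z.
     BEGIN DATA.
     1 2 3
     4 5
     END DATA.
     LIST.

   The first data line supplies all three fields.  The second supplies
only two, so it should draw a diagnostic rather than borrowing a value
from a following line as free format would.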


File: pspp.info,  Node: END CASE,  Next: END FILE,  Prev: DATA LIST,  Up: Data Input and Output

END CASE
========

     END CASE.

   END CASE is used within INPUT PROGRAM to output the current case.
*Note INPUT PROGRAM::.


File: pspp.info,  Node: END FILE,  Next: FILE HANDLE,  Prev: END CASE,  Up: Data Input and Output

END FILE
========

     END FILE.

   END FILE is used within INPUT PROGRAM to terminate the current input
program.  *Note INPUT PROGRAM::.


File: pspp.info,  Node: FILE HANDLE,  Next: INPUT PROGRAM,  Prev: END FILE,  Up: Data Input and Output

FILE HANDLE
===========

     FILE HANDLE handle_name
             /NAME='filename'
             /RECFORM={VARIABLE,FIXED,SPANNED}
             /LRECL=rec_len
             /MODE={CHARACTER,IMAGE,BINARY,MULTIPUNCH,360}

   Use the FILE HANDLE command to define the attributes of a file that
does not use conventional variable-length records terminated by newline
characters.

   Specify the file handle name as an identifier.  Any given identifier
may only appear once in a PSPP run.  File handles may not be reassigned
to a different file.  The file handle name must immediately follow the
FILE HANDLE command name.

   The NAME subcommand specifies the name of the file associated with
the handle.  It is the only required subcommand.

   The RECFORM subcommand specifies how the file is laid out.  VARIABLE
specifies variable-length lines terminated with newlines, and it is the
default.  FIXED specifies fixed-length records.  SPANNED is not
supported.

   LRECL specifies the length of fixed-length records.  It is required
if `/RECFORM FIXED' is specified.

   MODE specifies a file mode.  CHARACTER, the default, causes the data
file to be opened in ANSI C text mode.  BINARY causes the data file to
be opened in ANSI C binary mode.  The other possibilities are not
supported.
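
   A minimal sketch of a handle for fixed-length records follows.  The
handle name, file name, record length, and column layout are all
assumptions made for illustration:

     * SURVEY and survey.bin are hypothetical names.
     FILE HANDLE SURVEY /NAME='survey.bin'
             /RECFORM=FIXED /LRECL=80 /MODE=BINARY.
     DATA LIST FILE=SURVEY /ID 1-5 SCORE 7-9.
     LIST.

   The handle is defined once and then named on the FILE subcommand of
DATA LIST in place of a filename string.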


File: pspp.info,  Node: INPUT PROGRAM,  Next: LIST,  Prev: FILE HANDLE,  Up: Data Input and Output

INPUT PROGRAM
=============

     INPUT PROGRAM.
     ... input commands ...
     END INPUT PROGRAM.

   The INPUT PROGRAM...END INPUT PROGRAM construct is used to specify a
complex input program.  By placing data input commands within INPUT
PROGRAM, PSPP programs can take advantage of more complex file
structures than available by using DATA LIST by itself.

   The first sort of extended input program is to simply put multiple
DATA LIST commands within the INPUT PROGRAM.  This will cause all of
the data files to be read in parallel.  Input will stop when end of
file is reached on any of the data files.

   Transformations, such as conditional and looping constructs, can
also be included within an INPUT PROGRAM.  These can be used to combine
input from several data files in more complex ways.  However, input
will still stop when end of file is reached on any of the data files.

   To prevent INPUT PROGRAM from terminating at the first end of file,
use the END subcommand on DATA LIST.  This subcommand takes a variable
name, which should be a numeric scratch variable (*note Scratch
Variables::).  (It need not be a scratch variable but otherwise the
results can be surprising.)  The value of this variable is set to 0
when reading the data file, or 1 when end of file is encountered.

   Some additional commands are useful in conjunction with INPUT
PROGRAM.  END CASE is the first one.  Normally each loop through the
INPUT PROGRAM structure produces one case.  But with END CASE you can
control exactly when cases are output.  When END CASE is used, looping
from the end of INPUT PROGRAM to the beginning does not cause a case to
be output.

   END FILE is the other command.  When the END subcommand is used on
DATA LIST, there is no way for the INPUT PROGRAM construct to stop
looping, so an infinite loop results.  The END FILE command, when
executed, stops the flow of input data and passes out of the INPUT
PROGRAM structure.

   All this is very confusing.  A few examples should help to clarify.

     INPUT PROGRAM.
             DATA LIST NOTABLE FILE='a.data'/X 1-10.
             DATA LIST NOTABLE FILE='b.data'/Y 1-10.
     END INPUT PROGRAM.
     LIST.

   The example above reads variable X from file `a.data' and variable Y
from file `b.data'.  If one file is shorter than the other then the
extra data in the longer file is ignored.

     INPUT PROGRAM.
             NUMERIC #A #B.
     
             DO IF NOT #A.
                     DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
             END IF.
             DO IF NOT #B.
                     DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
             END IF.
             DO IF #A AND #B.
                     END FILE.
             END IF.
             END CASE.
     END INPUT PROGRAM.
     LIST.

   This example reads variable X from `a.data' and variable Y from
`b.data'.  If one file is shorter than the other then the missing field
is set to the system-missing value alongside the present value for the
remaining length of the longer file.

     INPUT PROGRAM.
             NUMERIC #A #B.
     
             DO IF #A.
                     DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
                     DO IF #B.
                             END FILE.
                     ELSE.
                             END CASE.
                     END IF.
             ELSE.
                     DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
                     DO IF NOT #A.
                             END CASE.
                     END IF.
             END IF.
     END INPUT PROGRAM.
     LIST.

   The above example reads data from file `a.data', then from `b.data',
and concatenates them into a single active file.

     INPUT PROGRAM.
             NUMERIC #EOF.
     
             LOOP IF NOT #EOF.
                     DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
                     DO IF NOT #EOF.
                             END CASE.
                     END IF.
             END LOOP.
     
             COMPUTE #EOF = 0.
             LOOP IF NOT #EOF.
                     DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
                     DO IF NOT #EOF.
                             END CASE.
                     END IF.
             END LOOP.
     
             END FILE.
     END INPUT PROGRAM.
     LIST.

   The above example does the same thing as the previous example, in a
different way.

     INPUT PROGRAM.
             LOOP #I=1 TO 50.
                     COMPUTE X=UNIFORM(10).
                     END CASE.
             END LOOP.
             END FILE.
     END INPUT PROGRAM.
     LIST/FORMAT=NUMBERED.

   The above example causes an active file to be created consisting of
50 random variates between 0 and 10.


File: pspp.info,  Node: LIST,  Next: MATRIX DATA,  Prev: INPUT PROGRAM,  Up: Data Input and Output

LIST
====

     LIST
             /VARIABLES=var_list
             /CASES=FROM start_index TO end_index BY incr_index
             /FORMAT={UNNUMBERED,NUMBERED} {WRAP,SINGLE}
                     {NOWEIGHT,WEIGHT}

   The LIST procedure prints the values of specified variables to the
listing file.

   The VARIABLES subcommand specifies the variables whose values are to
be printed.  Keyword VARIABLES is optional.  If the VARIABLES
subcommand is not specified, then all variables in the active file are
printed.

   The CASES subcommand can be used to specify a subset of cases to be
printed.  Specify FROM and the case number of the first case to print,
TO and the case number of the last case to print, and BY and the number
of cases to advance between printing cases, or any subset of those
settings.  If CASES is not specified then all cases are printed.

   The FORMAT subcommand can be used to change the output format.
NUMBERED will print case numbers along with each case; UNNUMBERED, the
default, causes the case numbers to be omitted.  The WRAP and SINGLE
settings are currently not used.  WEIGHT will cause case weights to be
printed along with variable values; NOWEIGHT, the default, causes case
weights to be omitted from the output.

   Case numbers start from 1.  They are counted after all
transformations have been considered.

   LIST will attempt to fit all the values on a single line.  If
necessary, variable names will be displayed vertically in order to fit.
If values cannot fit on a single line, then a multi-line format will be
used.

   LIST is a procedure.  It causes the data to be read.
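
   For example, assuming an active file that contains variables named
X and Y (names invented for this sketch), the subcommands can be
combined like this:

     * X and Y are hypothetical variables in the active file.
     LIST /VARIABLES=X Y /CASES=FROM 1 TO 9 BY 2 /FORMAT=NUMBERED.

   This should print X and Y for cases 1, 3, 5, 7, and 9, with the case
number shown alongside each one.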


File: pspp.info,  Node: MATRIX DATA,  Next: NEW FILE,  Prev: LIST,  Up: Data Input and Output

MATRIX DATA
===========

     MATRIX DATA
             /VARIABLES=var_list
             /FILE='filename'
             /FORMAT={LIST,FREE} {LOWER,UPPER,FULL} {DIAGONAL,NODIAGONAL}
             /SPLIT={new_var,var_list}
             /FACTORS=var_list
             /CELLS=n_cells
             /N=n
             /CONTENTS={N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
                        DFE,MAT,COV,CORR,PROX}

   The MATRIX DATA command reads square matrices in one of several
textual formats.  MATRIX DATA clears the dictionary, replaces it with a
new one, and reads a data file.

   Use VARIABLES to specify the variables that form the rows and
columns of the matrices.  You may not specify a variable named
VARNAME_.  You should specify VARIABLES first.

   Specify the file to read on FILE, either as a file name string or a
file handle (*note FILE HANDLE::).  If FILE is not specified then
matrix data must immediately follow MATRIX DATA with a BEGIN DATA...END
DATA construct (*note BEGIN DATA::).

   The FORMAT subcommand specifies how the matrices are formatted.
LIST, the default, indicates that there is one line per row of matrix
data; FREE allows single matrix rows to be broken across multiple
lines.  This is analogous to the difference between DATA LIST FREE and
DATA LIST LIST (*note DATA LIST::).  LOWER, the default, indicates that
the lower triangle of the matrix is given; UPPER indicates the upper
triangle; and FULL indicates that the entire matrix is given.
DIAGONAL, the default, indicates that the diagonal is part of the data;
NODIAGONAL indicates that it is omitted.  DIAGONAL/NODIAGONAL have no
effect when FULL is specified.

   The SPLIT subcommand is used to specify SPLIT FILE variables for the
input matrices (*note SPLIT FILE::).  Specify either a single variable
not specified on VARIABLES, or one or more variables that are specified
on VARIABLES.  In the former case, the SPLIT values are not present in
the data and ROWTYPE_ may not be specified on VARIABLES.  In the latter
case, the SPLIT values are present in the data.

   Specify a list of factor variables on FACTORS.  Factor variables must
also be listed on VARIABLES.  Factor variables are used when there are
some variables where, for each possible combination of their values,
statistics on the matrix variables are included in the data.

   If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES,
the CELLS subcommand is required.  Specify the number of factor variable
combinations that are given.  For instance, if factor variable A has 2
values and factor variable B has 3 values, specify 6.

   The N subcommand specifies a population number of observations.
When N is specified, one N record is output for each SPLIT FILE.

   Use CONTENTS to specify what sort of information the matrices
include.  Each possible option is described in more detail below.  When
ROWTYPE_ is specified on VARIABLES, CONTENTS is optional; otherwise, if
CONTENTS is not specified then /CONTENTS=CORR is assumed.

N
N_VECTOR
     Number of observations as a vector, one value for each variable.

N_SCALAR
     Number of observations as a single value.

N_MATRIX
     Matrix of counts.

MEAN
     Vector of means.

STDDEV
     Vector of standard deviations.

COUNT
     Vector of counts.

MSE
     Vector of mean squared errors.

DFE
     Vector of degrees of freedom.

MAT
     Generic matrix.

COV
     Covariance matrix.

CORR
     Correlation matrix.

PROX
     Proximities matrix.

   The exact semantics of the matrices read by MATRIX DATA are complex.
Right now MATRIX DATA isn't too useful due to a lack of procedures
accepting or producing related data, so these semantics aren't
documented.  Later, they'll be described here in detail.


File: pspp.info,  Node: NEW FILE,  Next: PRINT,  Prev: MATRIX DATA,  Up: Data Input and Output

NEW FILE
========

     NEW FILE.

   The NEW FILE command clears the current active file.


File: pspp.info,  Node: PRINT,  Next: PRINT EJECT,  Prev: NEW FILE,  Up: Data Input and Output

PRINT
=====

     PRINT
             OUTFILE='filename'
             RECORDS=n_lines
             {NOTABLE,TABLE}
             /[line_no] arg...
     
     arg takes one of the following forms:
             'string' [start-end]
             var_list start-end [type_spec]
             var_list (fortran_spec)
             var_list *

   The PRINT transformation writes variable data to an output file.
PRINT is executed when a procedure causes the data to be read.  In
order to execute the PRINT transformation without invoking a procedure,
use the EXECUTE command (*note EXECUTE::).

   All PRINT subcommands are optional.

   The OUTFILE subcommand specifies the file to receive the output.  The
file may be a file name as a string or a file handle (*note FILE
HANDLE::).  If OUTFILE is not present then output will be sent to PSPP's
output listing file.

   The RECORDS subcommand specifies the number of lines to be output.
The number of lines may optionally be surrounded by parentheses.

   TABLE will cause the PRINT command to output a table to the listing
file that describes what it will print to the output file.  NOTABLE, the
default, suppresses this output table.

   Introduce the strings and variables to be printed with a slash
(`/').  Optionally, the slash may be followed by a number indicating
which output line is being specified.  In the absence of this line
number, the next output line is assumed.  Multiple lines may be
specified using multiple slashes, with the intended output for each
line following its respective slash.

   Literal strings may be printed.  Specify the string itself.
Optionally the string may be followed by a column number or range of
column numbers, specifying the location on the line for the string to be
printed.  Otherwise, the string will be printed at the current position
on the line.

   Variables to be printed can be specified in the same ways as
available for DATA LIST FIXED (*note DATA LIST FIXED::).  In addition,
a variable list may be followed by an asterisk (`*'), which indicates
that the variables should be printed in their dictionary print formats,
separated by spaces.  A variable list followed by a slash or the end of
command will be interpreted the same way.

   If a FORTRAN type specification is used to move backwards on the
current line and text is then written at that point, the line will be
truncated to that length, although text added afterward will extend
the line again.
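
   As a minimal sketch of the syntax above, the following writes two
output lines per case to a file.  The file name `report.txt' and the
variables ID, SCORE1, and SCORE2 are hypothetical:

     PRINT OUTFILE='report.txt'
             /'Subject:' 1-8 id 10-14
             /score1 score2 *.
     EXECUTE.

   The first slash introduces output line 1, which holds a literal
string in columns 1-8 followed by ID in columns 10-14.  The second
slash introduces line 2, where the asterisk requests the dictionary
print formats.  Because PRINT is a transformation, nothing is written
until EXECUTE (or a procedure) causes the data to be read.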


File: pspp.info,  Node: PRINT EJECT,  Next: PRINT SPACE,  Prev: PRINT,  Up: Data Input and Output

PRINT EJECT
===========

     PRINT EJECT
             OUTFILE='filename'
             RECORDS=n_lines
             {NOTABLE,TABLE}
             /[line_no] arg...
     
     arg takes one of the following forms:
             'string' [start-end]
             var_list start-end [type_spec]
             var_list (fortran_spec)
             var_list *

   PRINT EJECT is used to write data to an output file.  Before the
data is written, the current page in the listing file is ejected.

   *Note PRINT::, for more information on syntax and usage.


File: pspp.info,  Node: PRINT SPACE,  Next: REREAD,  Prev: PRINT EJECT,  Up: Data Input and Output

PRINT SPACE
===========

     PRINT SPACE OUTFILE='filename' n_lines.

   PRINT SPACE prints one or more blank lines to an output file.

   The OUTFILE subcommand is optional.  It may be used to direct output
to a file specified by file name as a string or file handle (*note FILE
HANDLE::).  If OUTFILE is not specified then output will be directed to
the listing file.

   n_lines is also optional.  If present, it is an expression (*note
Expressions::) specifying the number of blank lines to be printed.  The
expression must evaluate to a nonnegative value.
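
   For example, assuming a hypothetical output file `report.txt', this
sketch prints two blank lines to it each time it is executed:

     PRINT SPACE OUTFILE='report.txt' 2.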


File: pspp.info,  Node: REREAD,  Next: REPEATING DATA,  Prev: PRINT SPACE,  Up: Data Input and Output

REREAD
======

     REREAD FILE=handle COLUMN=column.

   The REREAD transformation allows the previous input line in a data
file already processed by DATA LIST or another input command to be
re-read for further processing.

   The FILE subcommand, which is optional, is used to specify the file
to have its line re-read.  The file must be specified in the form of a
file handle (*note FILE HANDLE::).  If FILE is not specified then the
last file specified on DATA LIST will be assumed (last file specified
lexically, not in terms of flow-of-control).

   By default, the line is re-read in its entirety.  With the
COLUMN subcommand, a prefix of the line can be exempted from
re-reading.  Specify an expression (*note Expressions::) evaluating to
the first column that should be included in the re-read line.  Columns
are numbered from 1 at the left margin.

   Multiple REREAD commands will not back up in the data file.  Instead,
they will re-read the same line multiple times.
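
   The following sketch shows one plausible use, in which the first
column of each record determines how the rest of the record is laid
out.  The variable names and column positions are hypothetical, and
the surrounding INPUT PROGRAM and DO IF structure is an assumption
about how REREAD would typically be combined with other commands:

     INPUT PROGRAM.
     DATA LIST /kind 1.
     DO IF kind = 1.
     REREAD.
     DATA LIST /age 3-4.
     ELSE.
     REREAD COLUMN=3.
     DATA LIST /name 1-8 (A).
     END IF.
     END INPUT PROGRAM.

   In the ELSE branch, COLUMN=3 exempts the first two columns from
re-reading, so the second DATA LIST should see the original column 3
as the first column of the re-read line.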


File: pspp.info,  Node: REPEATING DATA,  Next: WRITE,  Prev: REREAD,  Up: Data Input and Output

REPEATING DATA
==============

     REPEATING DATA
             /STARTS=start-end
             /OCCURS=n_occurs
             /FILE='filename'
             /LENGTH=length
             /CONTINUED[=cont_start-cont_end]
             /ID=id_start-id_end=id_var
             /{TABLE,NOTABLE}
             /DATA=var_spec...
     
     where each var_spec takes one of the forms
             var_list start-end [type_spec]
             var_list (fortran_spec)

   The REPEATING DATA command is used to parse groups of data repeating
in a uniform format, possibly with several groups on a single line.
Each group of data corresponds with one case.  REPEATING DATA may only
be used within an INPUT PROGRAM structure.  When used with DATA LIST, it
can be used to parse groups of cases that share a subset of variables
but differ in their other data.

   The STARTS subcommand is required.  Specify a range of columns, using
literal numbers or numeric variable names.  This range specifies the
columns on the first line that are used to contain groups of data.  The
ending column is optional.  If it is not specified, then the record
width of the input file is used.  For the inline file (*note BEGIN
DATA::) this is 80 columns; for a file with fixed record widths it is
the record width; for other files it is 1024 characters by default.

   The OCCURS subcommand is required.  It must be a number or the name
of a numeric variable.  Its value is the number of groups present in the
current record.

   The DATA subcommand is required.  It must be the last subcommand
specified.  It is used to specify the data present within each repeating
group.  Column numbers are specified relative to the beginning of a
group at column 1.  Data is specified in the same way as with DATA LIST
FIXED (*note DATA LIST FIXED::).

   All other subcommands are optional.

   FILE specifies the file to read, either a file name as a string or a
file handle (*note FILE HANDLE::).  If FILE is not present then the
default is the last file handle used on DATA LIST (lexically, not in
terms of flow of control).

   By default REPEATING DATA will output a table describing how it will
parse the input data.  Specifying NOTABLE will disable this behavior;
specifying TABLE will explicitly enable it.

   The LENGTH subcommand specifies the length in characters of each
group.  If it is not present then length is inferred from the DATA
subcommand.  LENGTH can be a number or a variable name.

   Normally all the data groups are expected to be present on a single
line.  Use the CONTINUED subcommand to indicate that data can be
continued onto additional lines.  If data on continuation lines starts
at the left margin and continues through the entire field width, no
column specifications are necessary on CONTINUED.  Otherwise, specify
the possible range of columns in the same way as on STARTS.

   When data groups are continued from line to line, it is easy for
cases to get out of sync if hand editing is not done carefully.  The ID
subcommand allows a case identifier to be present on
each line of repeating data groups.  REPEATING DATA will check for the
same identifier on each line and report mismatches.  Specify the range
of columns that the identifier will occupy, followed by an equals sign
(`=') and the identifier variable name.  The variable must already have
been declared with NUMERIC or another command.
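
   As a sketch of how these subcommands fit together, suppose each
input record holds an order number, a count of items, and then that
many fixed-width item groups.  All variable names and column positions
here are hypothetical:

     INPUT PROGRAM.
     DATA LIST /order 1-5 n_items 7-8.
     REPEATING DATA
             /STARTS=10
             /OCCURS=n_items
             /DATA=item 1-4 (A) qty 6-8.
     END INPUT PROGRAM.

   Groups begin at column 10, and N_ITEMS gives the number of groups on
the current record.  Columns on DATA are relative to the start of each
group, so ITEM occupies the group's columns 1-4 and QTY its columns
6-8.  Since LENGTH is omitted, the group length is inferred from DATA.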


File: pspp.info,  Node: WRITE,  Prev: REPEATING DATA,  Up: Data Input and Output

WRITE
=====

     WRITE
             OUTFILE='filename'
             RECORDS=n_lines
             {NOTABLE,TABLE}
             /[line_no] arg...
     
     arg takes one of the following forms:
             'string' [start-end]
             var_list start-end [type_spec]
             var_list (fortran_spec)
             var_list *

   WRITE is used to write text or binary data to an output file.

   *Note PRINT::, for more information on syntax and usage.  The main
difference between PRINT and WRITE is that whereas by default PRINT uses
variables' print formats, WRITE uses write formats.

   The sole additional difference is that if WRITE is used to send
output to a binary file, carriage control characters will not be output.
*Note FILE HANDLE::, for information on how to declare a file as binary.


File: pspp.info,  Node: System and Portable Files,  Next: Variable Attributes,  Prev: Data Input and Output,  Up: Top

System Files and Portable Files
*******************************

   The commands in this chapter read, write, and examine system files
and portable files.

* Menu:

* APPLY DICTIONARY::            Apply system file dictionary to active file.
* EXPORT::                      Write to a portable file.
* GET::                         Read from a system file.
* IMPORT::                      Read from a portable file.
* MATCH FILES::                 Merge system files.
* SAVE::                        Write to a system file.
* SYSFILE INFO::                Display system file dictionary.
* XSAVE::                       Write to a system file, as a transform.


File: pspp.info,  Node: APPLY DICTIONARY,  Next: EXPORT,  Prev: System and Portable Files,  Up: System and Portable Files

APPLY DICTIONARY
================

     APPLY DICTIONARY FROM='filename'.

   The APPLY DICTIONARY command applies the variable labels, value
labels, and missing values from variables in a system file to
corresponding variables in the active file.  In some cases it also
updates the weighting variable.

   Specify a system file with a file name string or as a file handle
(*note FILE HANDLE::).  The dictionary in the system file will be read,
but it will not replace the active file dictionary.  The system file's
data will not be read.

   Only variables with names that exist in both the active file and the
system file are considered.  Variables with the same name but different
types (numeric, string) will cause an error message.  Otherwise, the
system file variables' attributes will replace those in their matching
active file variables, as described below.

   If a system file variable has a variable label, then it will replace
the active file variable's variable label.  If the system file variable
does not have a variable label, then the active file variable's variable
label, if any, will be retained.

   If the active file variable is numeric or short string, then the
system file variable's value labels and missing values, if any, will be
copied to the active file variable.  If the system file variable does
not have value labels or missing values, then those in the active file
variable, if any, will not be disturbed.

   Finally, weighting of the active file is updated (*note WEIGHT::).
If the active file has a weighting variable, and the system file does
not, or if the weighting variable in the system file does not exist in
the active file, then the active file weighting variable, if any, is
retained.  Otherwise, the weighting variable in the system file becomes
the active file weighting variable.

   APPLY DICTIONARY takes effect immediately.  It does not read the
active file.  The system file is not modified.


File: pspp.info,  Node: EXPORT,  Next: GET,  Prev: APPLY DICTIONARY,  Up: System and Portable Files

EXPORT
======

     EXPORT
             /OUTFILE='filename'
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

   The EXPORT procedure writes the active file dictionary and data to a
specified portable file.

   The OUTFILE subcommand, which is the only required subcommand,
specifies the portable file to be written as a file name string or a
file handle (*note FILE HANDLE::).

   DROP, KEEP, and RENAME follow the same format as the SAVE procedure
(*note SAVE::).

   EXPORT is a procedure.  It causes the active file to be read.


File: pspp.info,  Node: GET,  Next: IMPORT,  Prev: EXPORT,  Up: System and Portable Files

GET
===

     GET
             /FILE='filename'
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

   The GET transformation clears the current dictionary and active file
and replaces them with the dictionary and data from a specified system
file.

   The FILE subcommand is the only required subcommand.  Specify the
system file to be read as a string file name or a file handle (*note
FILE HANDLE::).

   By default, all the variables in a system file are read.  The DROP
subcommand can be used to specify a list of variables that are not to be
read.  By contrast, the KEEP subcommand can be used to specify variables
that are to be read, with all other variables not read.

   Normally variables in a system file retain the names that they were
saved under.  Use the RENAME subcommand to change these names.  Specify,
within parentheses, a list of variable names followed by an equals sign
(`=') and the names that they should be renamed to.  Multiple
parenthesized groups of variable names can be included on a single
RENAME subcommand.  Variables' names may be swapped using a RENAME
subcommand of the form `/RENAME=(A B=B A)'.

   Alternate syntax for the RENAME subcommand allows the parentheses to
be eliminated.  When this is done, only a single variable may be
renamed at once, for instance, `/RENAME=A=B'.  This alternate syntax is
deprecated.

   DROP, KEEP, and RENAME are performed in left-to-right order.  They
each may be present any number of times.

   Please note that DROP, KEEP, and RENAME do not cause the system file
on disk to be modified.  Only the active file read from the system file
is changed.

   GET does not cause the data to be read, only the dictionary.  The
data is read later, when a procedure is executed.
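
   For example, the following sketch reads a hypothetical system file
`survey.sav', omitting variable Q3 and renaming Q1 and Q2.  The file
and variable names are illustrative only:

     GET /FILE='survey.sav'
             /DROP=q3
             /RENAME=(q1 q2=age income).

   Because the subcommands are performed in left-to-right order, Q3 is
dropped before the remaining variables are renamed.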


File: pspp.info,  Node: IMPORT,  Next: MATCH FILES,  Prev: GET,  Up: System and Portable Files

IMPORT
======

     IMPORT
             /FILE='filename'
             /TYPE={COMM,TAPE}
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

   The IMPORT transformation clears the active file dictionary and data
and replaces them with a dictionary and data from a portable file on
disk.

   The FILE subcommand, which is the only required subcommand, specifies
the portable file to be read as a file name string or a file handle
(*note FILE HANDLE::).

   The TYPE subcommand is currently not used.

   DROP, KEEP, and RENAME follow the syntax used by GET (*note GET::).

   IMPORT does not cause the data to be read, only the dictionary.  The
data is read later, when a procedure is executed.


File: pspp.info,  Node: MATCH FILES,  Next: SAVE,  Prev: IMPORT,  Up: System and Portable Files

MATCH FILES
===========

     MATCH FILES
             /BY var_list
             /{FILE,TABLE}={*,'filename'}
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...
             /IN=var_name
             /FIRST=var_name
             /LAST=var_name
             /MAP

   The MATCH FILES command merges one or more system files, optionally
including the active file.  Records with the same values for BY
variables are combined into a single record.  Records with different
values are output in order.  Thus, multiple sorted system files are
combined into a single sorted system file based on the value of the BY
variables.

   The BY subcommand specifies a list of variables that are used to
match records from each of the system files.  Variables specified must
exist in all the files specified on FILE and TABLE.  BY should usually
be specified.  If TABLE is used then BY is required.

   Specify FILE with a system file as a file name string or file handle
(*note FILE HANDLE::).  An asterisk (`*') may also be specified to
indicate the current active file.  The files specified on FILE are
merged together based on the BY variables, or combined case-by-case if
BY is not specified.  Normally at least two FILE subcommands should be
specified.

   Specify TABLE with a system file in order to use it as a "table
lookup file".  Records in table lookup files are not used up after
they've been used once.  This means that data in table lookup files can
correspond to any number of records in FILE files.  Table lookup files
correspond to lookup tables in traditional relational database systems.
It is incorrect to have records with duplicate BY values in table lookup
files.

   Any number of FILE and TABLE subcommands may be specified.  Each
instance of FILE or TABLE can be followed by DROP, KEEP, and/or RENAME
subcommands.  These take the same form as the corresponding subcommands
of GET (*note GET::), and perform the same functions.

   Variables belonging to files that are not present for the current
case are set to the system-missing value for numeric variables or
spaces for string variables.
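
   The following sketch merges the active file with a hypothetical
system file `scores.sav' and looks up school-level data from a
hypothetical table lookup file `schools.sav'.  The BY variable SCHOOL
is assumed to exist in all three files, and the files are assumed to
be sorted on it.  BY is placed last here as a conservative choice:

     MATCH FILES
             /FILE=*
             /FILE='scores.sav'
             /TABLE='schools.sav'
             /BY school.

   Each record of `schools.sav' may be matched against any number of
records from the two FILE inputs that share its value of SCHOOL.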

   IN, FIRST, LAST, and MAP are currently not used.