This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.

START-INFO-DIR-ENTRY
* PSPP: (pspp).             Statistical analysis package.
END-INFO-DIR-ENTRY

   PSPP, for statistical analysis of sampled data, by Ben Pfaff.

   This file documents PSPP, a statistical package for analysis of
sampled data that uses a command language compatible with SPSS.

   Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.

   This version of the PSPP documentation is consistent with version 2
of "texinfo.tex".

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above condition for modified
versions, except that this permission notice may be stated in a
translation approved by the Free Software Foundation.


File: pspp.info,  Node: SAVE,  Next: SYSFILE INFO,  Prev: MATCH FILES,  Up: System and Portable Files

SAVE
====

     SAVE
             /OUTFILE='filename'
             /{COMPRESSED,UNCOMPRESSED}
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

   The SAVE procedure causes the dictionary and data in the active file
to be written to a system file.

   The OUTFILE subcommand is the only required subcommand.  Specify the
system file to be written as a string file name or a file handle (*note
FILE HANDLE::).

   The COMPRESSED and UNCOMPRESSED subcommands determine whether the
saved system file is compressed.  By default, system files are
compressed.  This default can be changed with the SET command (*note
SET::).

   By default, all the variables in the active file dictionary are
written to the system file.  The DROP subcommand can be used to specify
a list of variables not to be written.  In contrast, KEEP specifies
variables to be written, with all variables not specified not written.

   Normally variables are saved to a system file under the same names
they have in the active file.  Use the RENAME subcommand to change these
names.  Specify, within parentheses, a list of variable names followed
by an equals sign (`=') and the names that they should be renamed to.
Multiple parenthesized groups of variable names can be included on a
single RENAME subcommand.  Variables' names may be swapped using a
RENAME subcommand of the form `/RENAME=(A B=B A)'.

   Alternate syntax for the RENAME subcommand allows the parentheses to
be eliminated.  When this is done, only a single variable may be
renamed at once.  For instance, `/RENAME=A=B'.  This alternate syntax is
deprecated.

   DROP, KEEP, and RENAME are performed in left-to-right order.  They
each may be present any number of times.

   Please note that DROP, KEEP, and RENAME do not cause the active file
to be modified.  Only the system file written to disk is changed.

   SAVE causes the data to be read.  It is a procedure.
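
   The left-to-right handling of DROP, KEEP, and RENAME can be
illustrated with a small Python model.  This is only a sketch, not PSPP
code: the function name and the tuple encoding of subcommands are
hypothetical, and it assumes that KEEP leaves the surviving variables in
their dictionary order.

```python
def apply_save_subcommands(variables, subcommands):
    """Return the variable names that would be written to the system file.

    variables: names in active-file order.
    subcommands: ("DROP", [names]), ("KEEP", [names]) or
    ("RENAME", {old: new}) tuples, applied left to right.
    """
    names = list(variables)  # work on a copy; the active file is not modified
    for kind, arg in subcommands:
        if kind == "DROP":
            names = [n for n in names if n not in arg]
        elif kind == "KEEP":
            # Assumption: kept variables stay in dictionary order.
            names = [n for n in names if n in arg]
        elif kind == "RENAME":
            # All renames in a group happen at once, so (A B=B A) swaps names.
            names = [arg.get(n, n) for n in names]
        else:
            raise ValueError(f"unknown subcommand: {kind}")
    return names


active = ["A", "B", "C", "D"]
written = apply_save_subcommands(
    active,
    [("DROP", ["D"]), ("RENAME", {"A": "B", "B": "A"}), ("KEEP", ["A", "C"])],
)
print(written)  # ['A', 'C']
print(active)   # ['A', 'B', 'C', 'D']
```

   Because RENAME runs before KEEP here, the `A' that is kept is the
variable that was called `B' in the active file.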


File: pspp.info,  Node: SYSFILE INFO,  Next: XSAVE,  Prev: SAVE,  Up: System and Portable Files

SYSFILE INFO
============

     SYSFILE INFO FILE='filename'.

   The SYSFILE INFO command reads the dictionary in a system file and
displays the information that it contains.

   Specify the system file as a file name string or a file handle.

   The file does not replace the current active file.


File: pspp.info,  Node: XSAVE,  Prev: SYSFILE INFO,  Up: System and Portable Files

XSAVE
=====

     XSAVE
             /OUTFILE='filename'
             /{COMPRESSED,UNCOMPRESSED}
             /DROP=var_list
             /KEEP=var_list
             /RENAME=(src_names=target_names)...

   The XSAVE transformation writes the active file dictionary and data
to a system file stored on disk.

   XSAVE is a transformation, not a procedure.  It is executed when the
data is read by a procedure or procedure-like command.  In all other
respects, XSAVE is identical to SAVE.  *Note SAVE::, for more
information on syntax and usage.


File: pspp.info,  Node: Variable Attributes,  Next: Data Manipulation,  Prev: System and Portable Files,  Up: Top

Manipulating variables
**********************

   The variables in the active file dictionary are important.  There are
several utility functions for examining and adjusting them.

* Menu:

* ADD VALUE LABELS::            Add value labels to variables.
* DISPLAY::                     Display variable names & descriptions.
* DISPLAY VECTORS::             Display a list of vectors.
* FORMATS::                     Set print and write formats.
* LEAVE::                       Don't clear variables between cases.
* MISSING VALUES::              Set missing values for variables.
* MODIFY VARS::                 Rename, reorder, and drop variables.
* NUMERIC::                     Create new numeric variables.
* PRINT FORMATS::               Set variable print formats.
* RENAME VARIABLES::            Rename variables.
* VALUE LABELS::                Set value labels for variables.
* STRING::                      Create new string variables.
* VARIABLE LABELS::             Set variable labels for variables.
* VECTOR::                      Declare an array of variables.
* WRITE FORMATS::               Set variable write formats.


File: pspp.info,  Node: ADD VALUE LABELS,  Next: DISPLAY,  Prev: Variable Attributes,  Up: Variable Attributes

ADD VALUE LABELS
================

     ADD VALUE LABELS
             /var_list value 'label' [value 'label']...

   ADD VALUE LABELS has the same syntax and purpose as VALUE LABELS
(*note VALUE LABELS::), but it does not clear existing value labels from
the variables before adding the ones specified.


File: pspp.info,  Node: DISPLAY,  Next: DISPLAY VECTORS,  Prev: ADD VALUE LABELS,  Up: Variable Attributes

DISPLAY
=======

     DISPLAY {NAMES,INDEX,LABELS,VARIABLES,DICTIONARY,SCRATCH}
             [SORTED] [var_list]

   DISPLAY displays requested information on variables.  Variables can
optionally be sorted alphabetically.  The entire dictionary or just
specified variables can be described.

   One of the following keywords can be present:

NAMES
     The variables' names are displayed.

INDEX
     The variables' names are displayed along with a value describing
     their position within the active file dictionary.

LABELS
     Variable names, positions, and variable labels are displayed.

VARIABLES
     Variable names, positions, print and write formats, and missing
     values are displayed.

DICTIONARY
     Variable names, positions, print and write formats, missing values,
     variable labels, and value labels are displayed.

SCRATCH
     Variable names are displayed, for scratch variables only (*note
     Scratch Variables::).

   If SORTED is specified, then the variables are displayed in ascending
order based on their names; otherwise, they are displayed in the order
that they occur in the active file dictionary.


File: pspp.info,  Node: DISPLAY VECTORS,  Next: FORMATS,  Prev: DISPLAY,  Up: Variable Attributes

DISPLAY VECTORS
===============

     DISPLAY VECTORS.

   The DISPLAY VECTORS command causes a list of the currently declared
vectors to be displayed.


File: pspp.info,  Node: FORMATS,  Next: LEAVE,  Prev: DISPLAY VECTORS,  Up: Variable Attributes

FORMATS
=======

     FORMATS var_list (fmt_spec).

   The FORMATS command sets the print and write formats for the specified
variables to the specified format specification.  *Note Input/Output
Formats::.

   Specify a list of variables followed by a format specification in
parentheses.  The print and write formats of the specified variables
will be changed.

   Additional lists of variables and formats may be included if they are
delimited by a slash (`/').

   The FORMATS command takes effect immediately.  It is not affected by
conditional and looping structures such as DO IF or LOOP.


File: pspp.info,  Node: LEAVE,  Next: MISSING VALUES,  Prev: FORMATS,  Up: Variable Attributes

LEAVE
=====

     LEAVE var_list.

   The LEAVE command prevents the specified variables from being
reinitialized whenever a new case is processed.

   Normally, when a data file is processed, every variable in the active
file is initialized to the system-missing value or spaces at the
beginning of processing for each case.  When a variable has been
specified on LEAVE, this is not the case.  Instead, that variable is
initialized to 0 (not system-missing) or spaces for the first case.
After that, it retains its value between cases.

   This becomes useful for counters.  For instance, in the example below
the variable SUM maintains a running total of the values in the ITEM
variable.

     DATA LIST /ITEM 1-3.
     COMPUTE SUM=SUM+ITEM.
     PRINT /ITEM SUM.
     LEAVE SUM.
     BEGIN DATA.
     123
     404
     555
     999
     END DATA.

Partial output from this example:

     123   123.00
     404   527.00
     555  1082.00
     999  2081.00

   It is best to use the LEAVE command immediately before invoking a
procedure command, because it is reset by certain transformations--for
instance, COMPUTE and IF.  LEAVE is also reset by all procedure
invocations.
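
   The effect of LEAVE on the running total above can be modelled in
Python.  This is a sketch under stated assumptions, not PSPP code: the
function name is hypothetical, `None' stands in for the system-missing
value, and arithmetic on a system-missing operand is assumed to yield
system-missing.

```python
SYSMIS = None  # stand-in for the system-missing value


def running_total(items, leave):
    """Model COMPUTE SUM=SUM+ITEM, with or without LEAVE SUM."""
    rows = []
    total = 0 if leave else SYSMIS  # a LEAVE variable starts at 0
    for item in items:
        if not leave:
            total = SYSMIS  # reinitialized at the start of every case
        # Assumption: a system-missing operand makes the result missing.
        total = SYSMIS if total is SYSMIS else total + item
        rows.append((item, total))
    return rows


with_leave = running_total([123, 404, 555, 999], leave=True)
without_leave = running_total([123, 404, 555, 999], leave=False)

for item, total in with_leave:
    print(f"{item:3d} {total:8.2f}")
print(without_leave)
```

   With `leave=True' the totals are 123, 527, 1082, and 2081, as in the
partial output shown above.  Without it, SUM is system-missing for
every case in this model.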


File: pspp.info,  Node: MISSING VALUES,  Next: MODIFY VARS,  Prev: LEAVE,  Up: Variable Attributes

MISSING VALUES
==============

     MISSING VALUES var_list (missing_values).
     
     missing_values takes one of the following forms:
             num1
             num1, num2
             num1, num2, num3
             num1 THRU num2
             num1 THRU num2, num3
             string1
             string1, string2
             string1, string2, string3
     As part of a range, LO or LOWEST may take the place of num1;
     HI or HIGHEST may take the place of num2.

   The MISSING VALUES command sets user-missing values for numeric and
short string variables.  Long string variables may not have missing
values.

   Specify a list of variables, followed by a list of their user-missing
values in parentheses.  Up to three discrete values may be given, or,
for numeric variables only, a range of values optionally accompanied by
a single discrete value.  Ranges may be open-ended on one end, indicated
through the use of the keyword LO or LOWEST or HI or HIGHEST.

   The MISSING VALUES command takes effect immediately.  It is not
affected by conditional and looping constructs such as DO IF or LOOP.
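
   The forms of missing_values for a numeric variable can be read as a
membership test.  The Python sketch below is illustrative only: the
function names are hypothetical, and ranges are assumed to include both
endpoints.

```python
import math


def make_missing_test(discrete=(), low=None, high=None):
    """Build a predicate for a numeric variable's user-missing values.

    discrete: up to three discrete values, or at most one when a range
    is also given.  low/high: range bounds; use -math.inf for LO or
    LOWEST and math.inf for HI or HIGHEST.
    """
    has_range = low is not None and high is not None
    limit = 1 if has_range else 3
    if len(discrete) > limit:
        raise ValueError("too many discrete missing values")

    def is_user_missing(value):
        if value in discrete:
            return True
        # Assumption: the range is inclusive at both ends.
        return has_range and low <= value <= high

    return is_user_missing


# Models MISSING VALUES X (LO THRU 0, 99).
x_missing = make_missing_test(discrete=(99,), low=-math.inf, high=0)
print([v for v in (-5, 0, 1, 50, 99) if x_missing(v)])  # [-5, 0, 99]
```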


File: pspp.info,  Node: MODIFY VARS,  Next: NUMERIC,  Prev: MISSING VALUES,  Up: Variable Attributes

MODIFY VARS
===========

     MODIFY VARS
             /REORDER={FORWARD,BACKWARD} {POSITIONAL,ALPHA} (var_list)...
             /RENAME=(old_names=new_names)...
             /{DROP,KEEP}=var_list
             /MAP

   The MODIFY VARS command allows variables in the active file to be
reordered, renamed, or deleted from the active file.

   At least one subcommand must be specified, and no subcommand may be
specified more than once.  DROP and KEEP may not both be specified.

   The REORDER subcommand changes the order of variables in the active
file.  Specify one or more lists of variable names in parentheses.  By
default, each list of variables is rearranged into the specified order.
To put the variables into the reverse of the specified order, put
keyword BACKWARD before the parentheses.  To put them into alphabetical
order in the dictionary, specify keyword ALPHA before the parentheses.
BACKWARD and ALPHA may also be combined.

   To rename variables in the active file, specify RENAME, an equals
sign (`='), and lists of the old variable names and new variable names
separated by another equals sign within parentheses.  There must be the
same number of old and new variable names.  Each old variable is
renamed to the corresponding new variable name.  Multiple parenthesized
groups of variables may be specified.

   The DROP subcommand deletes a specified list of variables from the
active file.

   The KEEP subcommand keeps the specified list of variables in the
active file.  Any unlisted variables are deleted from the active file.

   MAP is currently ignored.

   MODIFY VARS takes effect immediately.  It does not cause the data to
be read.


File: pspp.info,  Node: NUMERIC,  Next: PRINT FORMATS,  Prev: MODIFY VARS,  Up: Variable Attributes

NUMERIC
=======

     NUMERIC /var_list [(fmt_spec)].

   The NUMERIC command explicitly declares new numeric variables,
optionally setting their output formats.

   Specify a slash (`/'), followed by the names of the new numeric
variables.  If you wish to set their output formats, follow their names
by an output format specification in parentheses (*note Input/Output
Formats::).  If no output format specification is given then the
variables will default to F8.2.

   Variables created with NUMERIC will be initialized to the
system-missing value.


File: pspp.info,  Node: PRINT FORMATS,  Next: RENAME VARIABLES,  Prev: NUMERIC,  Up: Variable Attributes

PRINT FORMATS
=============

     PRINT FORMATS var_list (fmt_spec).

   The PRINT FORMATS command sets the print formats for the specified
variables to the specified format specification.

   Syntax is identical to that of FORMATS (*note FORMATS::), but the
PRINT FORMATS command sets only print formats, not write formats.


File: pspp.info,  Node: RENAME VARIABLES,  Next: VALUE LABELS,  Prev: PRINT FORMATS,  Up: Variable Attributes

RENAME VARIABLES
================

     RENAME VARIABLES (old_names=new_names)... .

   The RENAME VARIABLES command allows the names of variables in the
active file to be changed.

   To rename variables, specify lists of the old variable names and new
variable names, separated by an equals sign (`='), within parentheses.
There must be the same number of old and new variable names.  Each old
variable is renamed to the corresponding new variable name.  Multiple
parenthesized groups of variables may be specified.

   RENAME VARIABLES takes effect immediately.  It does not cause the
data to be read.


File: pspp.info,  Node: VALUE LABELS,  Next: STRING,  Prev: RENAME VARIABLES,  Up: Variable Attributes

VALUE LABELS
============

     VALUE LABELS
             /var_list value 'label' [value 'label']...

   The VALUE LABELS command allows values of numeric and short string
variables to be associated with labels.  In this way, a short value can
stand for a longer description.

   In order to set up value labels for a set of variables, specify the
variable names after a slash (`/'), followed by a list of values and
their associated labels, separated by spaces.

   Before the VALUE LABELS command is executed, any existing value
labels are cleared from the variables specified.


File: pspp.info,  Node: STRING,  Next: VARIABLE LABELS,  Prev: VALUE LABELS,  Up: Variable Attributes

STRING
======

     STRING /var_list (fmt_spec).

   The STRING command creates new string variables for use in
transformations.

   Specify a slash (`/'), followed by the names of the string variables
to create and the desired output format specification in parentheses
(*note Input/Output Formats::).  Variable widths are implicitly derived
from the specified output formats.

   Created variables are initialized to spaces.


File: pspp.info,  Node: VARIABLE LABELS,  Next: VECTOR,  Prev: STRING,  Up: Variable Attributes

VARIABLE LABELS
===============

     VARIABLE LABELS
             /var_list 'var_label'.

   The VARIABLE LABELS command is used to associate an explanatory name
with a group of variables.  This name (a variable label) is displayed by
statistical procedures.

   To assign a variable label to a group of variables, specify a slash
(`/'), followed by the list of variable names and the variable label as
a string.


File: pspp.info,  Node: VECTOR,  Next: WRITE FORMATS,  Prev: VARIABLE LABELS,  Up: Variable Attributes

VECTOR
======

     Two possible syntaxes:
             VECTOR vec_name=var_list.
             VECTOR vec_name_list(count).

   The VECTOR command allows a group of variables to be accessed as if
they were consecutive members of an array with a vector(index) notation.

   To make a vector out of a set of existing variables, specify a name
for the vector followed by an equals sign (`=') and the variables that
belong in the vector.

   To make a vector and create variables at the same time, specify one
or more vector names followed by a count in parentheses.  For a vector
named `VEC', this will cause variables named `VEC1' through `VECcount'
to be created as numeric variables.  Variable names including numeric
suffixes may not exceed 8 characters in length, and none of the
variables may exist prior to the VECTOR command.

   All the variables in a vector must be the same type.

   Vectors created with VECTOR disappear after any procedure or
procedure-like command is executed.  The variables contained in the
vectors remain, unless they are scratch variables (*note Scratch
Variables::).

   Variables within a vector may be referenced in expressions using
vector(index) syntax.
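
   The naming rule for the second form of VECTOR can be sketched in
Python.  The function name and the error messages are hypothetical;
only the numbering from 1, the 8-character limit, and the rule that the
variables must not already exist are taken from the text above.

```python
def vector_variable_names(vec_name, count, existing=()):
    """Names created by VECTOR vec_name(count): vec_name1 ... vec_namecount."""
    names = [f"{vec_name}{i}" for i in range(1, count + 1)]
    for name in names:
        if len(name) > 8:
            raise ValueError(f"{name} exceeds 8 characters")
        if name in existing:
            raise ValueError(f"{name} already exists")
    return names


print(vector_variable_names("SCORE", 3))  # ['SCORE1', 'SCORE2', 'SCORE3']
```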


File: pspp.info,  Node: WRITE FORMATS,  Prev: VECTOR,  Up: Variable Attributes

WRITE FORMATS
=============

     WRITE FORMATS var_list (fmt_spec).

   The WRITE FORMATS command sets the write formats for the specified
variables to the specified format specification.

   Syntax is identical to that of FORMATS (*note FORMATS::), but the
WRITE FORMATS command sets only write formats, not print formats.


File: pspp.info,  Node: Data Manipulation,  Next: Data Selection,  Prev: Variable Attributes,  Up: Top

Data transformations
********************

   The PSPP procedures examined in this chapter manipulate data and
prepare the active file for later analyses.  They do not produce output,
as a rule.

* Menu:

* AGGREGATE::                   Summarize multiple cases into a single case.
* AUTORECODE::                  Automatic recoding of variables.
* COMPUTE::                     Assigning a variable a calculated value.
* COUNT::                       Counting variables with particular values.
* FLIP::                        Exchange variables with cases.
* IF::                          Conditionally assigning a calculated value.
* RECODE::                      Mapping values from one set to another.
* SORT CASES::                  Sort the active file.


File: pspp.info,  Node: AGGREGATE,  Next: AUTORECODE,  Prev: Data Manipulation,  Up: Data Manipulation

AGGREGATE
=========

     AGGREGATE
             /BREAK=var_list
             /PRESORTED
             /OUTFILE={*,'filename'}
             /DOCUMENT
             /MISSING=COLUMNWISE
             /dest_vars=agr_func(src_vars, args...)...

   The AGGREGATE command summarizes groups of cases into single cases.
Cases are divided into groups that have the same values for one or more
variables called "break variables".  Several functions are available
for summarizing case contents.

   BREAK is the only required subcommand (in addition, at least one
aggregation variable must be specified).  Specify a list of variable
names.  The values of these variables are used to divide the active file
into groups to be summarized.

   By default, the active file is sorted based on the break variables
before aggregation takes place.  If the active file is already sorted,
specify PRESORTED to save time.

   The OUTFILE subcommand specifies a system file by file name string or
file handle (*note FILE HANDLE::).  The aggregated cases are sent to
this file.  If OUTFILE is not specified, or if `*' is specified, then
the aggregated cases replace the active file.

   Normally the aggregate file does not receive the documents from the
active file, even if the aggregate file replaces the active file.
Specify DOCUMENT to have the documents from the active file copied to
the aggregate file.

   At least one aggregation variable must be specified.  Specify a list
of aggregation variables, an equals sign (`='), an aggregation function
name (see the list below), and a list of source variables in
parentheses.  In addition, some aggregation functions expect additional
arguments in the parentheses following the source variable names.

   There must be exactly as many source variables as aggregation
variables.  Each aggregation variable receives the results of applying
the specified aggregation function to the corresponding source
variable.  Most aggregation functions may be applied to numeric and
short and long string variables.  Others are restricted to numeric
values; these are marked as such in the list below.

   Any number of sets of aggregation variables may be specified.

   The available aggregation functions are as follows:

SUM(var_name)
     Sum.  Limited to numeric values.

MEAN(var_name)
     Arithmetic mean.  Limited to numeric values.

SD(var_name)
     Standard deviation of the mean.  Limited to numeric values.

MAX(var_name)
     Maximum value.

MIN(var_name)
     Minimum value.

FGT(var_name, value)
PGT(var_name, value)
     Fraction between 0 and 1, or percentage between 0 and 100,
     respectively, of values greater than the specified constant.

FLT(var_name, value)
PLT(var_name, value)
     Fraction or percentage, respectively, of values less than the
     specified constant.

FIN(var_name, low, high)
PIN(var_name, low, high)
     Fraction or percentage, respectively, of values within the
     specified inclusive range of constants.

FOUT(var_name, low, high)
POUT(var_name, low, high)
     Fraction or percentage, respectively, of values strictly outside
     the specified range of constants.

N(var_name)
     Number of non-missing values.

N
     Number of cases aggregated to form this group.  Don't supply a
     source variable for this aggregation function.

NU(var_name)
     Number of non-missing values.  Each case is considered to have a
     weight of 1, regardless of the current weighting variable (*note
     WEIGHT::).

NU
     Number of cases aggregated to form this group.  Each case is
     considered to have a weight of 1, regardless of the current
     weighting variable.

NMISS(var_name)
     Number of missing values.

NUMISS(var_name)
     Number of missing values.  Each case is considered to have a
     weight of 1, regardless of the current weighting variable.

FIRST(var_name)
     First value in this group.

LAST(var_name)
     Last value in this group.

   When aggregation functions compare string values, the comparison is
done in terms of internal character codes.  On most modern computers,
this is a form of ASCII.

   In addition, there is a parallel set of aggregation functions having
the same names as those above, but with a dot after the last character
(for instance, `SUM.').  These functions are the same as the above,
except that they cause user-missing values, which are normally excluded
from calculations, to be included.

   Normally, only a single case (2 for SD and SD.) need be non-missing
in each group in order for the aggregate variable to be non-missing.  If
/MISSING=COLUMNWISE is specified, the behavior reverses: that is, a
single missing value is enough to make the aggregate variable become a
missing value.

   AGGREGATE ignores the current SPLIT FILE settings and causes them to
be canceled (*note SPLIT FILE::).
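
   The grouping performed by AGGREGATE can be modelled in Python for a
single break variable and a few of the functions listed above.  This is
a sketch, not PSPP code: the function and key names are hypothetical,
cases are unweighted, `None' stands in for a missing value, and the
fraction for FGT is assumed to be taken over the non-missing values.

```python
from itertools import groupby


def aggregate(cases, break_var, src_var, threshold):
    """Summarize src_var within groups that share a break_var value."""
    # Sort on the break variable first, as AGGREGATE does unless
    # PRESORTED is given; groupby needs equal keys to be adjacent.
    ordered = sorted(cases, key=lambda c: c[break_var])
    result = []
    for key, group in groupby(ordered, key=lambda c: c[break_var]):
        group = list(group)
        valid = [c[src_var] for c in group if c[src_var] is not None]
        row = {break_var: key, "N_CASES": len(group), "N_VALID": len(valid)}
        if valid:
            row["SUM"] = sum(valid)
            row["MEAN"] = sum(valid) / len(valid)
            # Assumption: the fraction is taken over non-missing values.
            row["FGT"] = sum(v > threshold for v in valid) / len(valid)
        else:
            row["SUM"] = row["MEAN"] = row["FGT"] = None
        result.append(row)
    return result


cases = [
    {"G": 2, "X": 10},
    {"G": 1, "X": 4},
    {"G": 1, "X": None},
    {"G": 2, "X": 30},
    {"G": 1, "X": 8},
]
for row in aggregate(cases, "G", "X", threshold=5):
    print(row)
```

   Here `N_CASES' corresponds to N without a source variable and
`N_VALID' to N(var_name).  The group with G equal to 1 has three cases
but only two non-missing values of X.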


File: pspp.info,  Node: AUTORECODE,  Next: COMPUTE,  Prev: AGGREGATE,  Up: Data Manipulation

AUTORECODE
==========

     AUTORECODE VARIABLES=src_vars INTO dest_vars
             /DESCENDING
             /PRINT

   The AUTORECODE procedure considers the N values that a variable
takes on and maps them onto values 1...N on a new numeric variable.

   Subcommand VARIABLES is the only required subcommand and must come
first.  Specify VARIABLES, an equals sign (`='), a list of source
variables, INTO, and a list of target variables.  There must be the same
number of source and target variables.  The target variables must not
already exist.

   By default, increasing values of a source variable (for a string,
this is based on character code comparisons) are recoded to increasing
values of its target variable.  To cause increasing values of a source
variable to be recoded to decreasing values of its target variable (N
down to 1), specify DESCENDING.

   PRINT is currently ignored.

   AUTORECODE is a procedure.  It causes the data to be read.
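
   The mapping that AUTORECODE builds can be sketched in Python.  The
function name is hypothetical and missing values are not modelled.
Python compares strings by character code, which matches the
comparison described above.

```python
def autorecode(values, descending=False):
    """Map the N distinct values of a variable onto 1...N."""
    distinct = sorted(set(values), reverse=descending)
    codes = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [codes[v] for v in values]


fruit = ["pear", "apple", "fig", "apple"]
print(autorecode(fruit))                   # [3, 1, 2, 1]
print(autorecode(fruit, descending=True))  # [1, 3, 2, 3]
```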


File: pspp.info,  Node: COMPUTE,  Next: COUNT,  Prev: AUTORECODE,  Up: Data Manipulation

COMPUTE
=======

     COMPUTE var_name = expression.

   `COMPUTE' creates a variable with the name specified (if necessary),
then evaluates the given expression for every case and assigns the
result to the variable.  *Note Expressions::.

   Numeric variables created or computed by `COMPUTE' are assigned an
output width of 8 characters with two decimal places (`F8.2').  String
variables created or computed by `COMPUTE' have the same width as the
existing variable or constant.

   COMPUTE is a transformation.  It does not cause the active file to be
read.


File: pspp.info,  Node: COUNT,  Next: FLIP,  Prev: COMPUTE,  Up: Data Manipulation

COUNT
=====

     COUNT var_name = var... (value...).
     
     Each value takes one of the following forms:
             number
             string
             num1 THRU num2
             MISSING
             SYSMIS
     In addition, num1 and num2 can be LO or LOWEST, or HI or HIGHEST,
     respectively.

   `COUNT' creates or replaces a numeric "target" variable that counts
the occurrence of a "criterion" value or set of values over one or more
"test" variables for each case.

   The target variable values are always nonnegative integers.  They are
never missing.  The target variable is assigned an F8.2 output format.
*Note Input/Output Formats::.  Any variables, including long and short
string variables, may be test variables.

   User-missing values of test variables are treated just like any other
values.  They are *not* treated as system-missing values.  User-missing
values that are criterion values or inside ranges of criterion values
are counted as any other values.  However (for numeric variables),
keyword `MISSING' may be used to refer to all system- and user-missing
values.

   `COUNT' target variables are assigned values in the order specified.
In the command `COUNT A=A B(1) /B=A B(2).', the following actions
occur:

   - The number of occurrences of 1 between `A' and `B' is counted.

   - `A' is assigned this value.

   - The number of occurrences of 2 between `B' and the *new* value of
     `A' is counted.

   - `B' is assigned this value.

   Despite this ordering, all `COUNT' criterion variables must exist
before the procedure is executed--they may not be created as target
variables earlier in the command!  Break such a command into two
separate commands.

   The examples below may help to clarify.

  A. Assuming `Q0', `Q1', ..., `Q9' are numeric variables, the
     following commands:

       1. Count the number of times the value 1 occurs through these
          variables for each case and assign the count to variable
          `QCOUNT'.

       2. Print out the total number of times the value 1 occurs
          throughout _all_ cases using `DESCRIPTIVES'.  *Note
          DESCRIPTIVES::, for details.

          COUNT QCOUNT=Q0 TO Q9(1).
          DESCRIPTIVES QCOUNT /STATISTICS=SUM.

  B. Given these same variables, the following commands:

       1. Count the number of valid values of these variables for each
          case and assign the count to variable `QVALID'.

       2. Multiply each value of `QVALID' by 10 to obtain a
          percentage of valid values, using `COMPUTE'.  *Note
          COMPUTE::, for details.

       3. Print out the percentage of valid values across all cases,
          using `DESCRIPTIVES'.  *Note DESCRIPTIVES::, for details.

          COUNT QVALID=Q0 TO Q9 (LO THRU HI).
          COMPUTE QVALID=QVALID*10.
          DESCRIPTIVES QVALID /STATISTICS=MEAN.
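
   The order in which `COUNT' assigns its targets can be modelled in
Python for a single case.  This is a sketch only: the function name and
the dictionary representation of a case are hypothetical, ranges are
assumed to be inclusive, and missing values are not modelled.

```python
def count_matches(case, test_vars, criteria):
    """Count the test variables whose value satisfies any criterion.

    Each criterion is a plain value or a (low, high) tuple that models
    low THRU high.
    """
    def matches(value):
        for criterion in criteria:
            if isinstance(criterion, tuple):
                if criterion[0] <= value <= criterion[1]:
                    return True
            elif value == criterion:
                return True
        return False

    return sum(1 for name in test_vars if matches(case[name]))


# Models COUNT A=A B(1) /B=A B(2). for one case.
case = {"A": 2, "B": 1}
case["A"] = count_matches(case, ["A", "B"], [1])  # only B is 1, so A becomes 1
case["B"] = count_matches(case, ["A", "B"], [2])  # new A is 1, B is 1, so B becomes 0
print(case)  # {'A': 1, 'B': 0}
```

   Had the second count used the original value of `A' (2), `B' would
have been assigned 1 rather than 0.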


File: pspp.info,  Node: FLIP,  Next: IF,  Prev: COUNT,  Up: Data Manipulation

FLIP
====

     FLIP /VARIABLES=var_list /NEWNAMES=var_name.

   The FLIP command transposes rows and columns in the active file.  It
causes cases to be swapped with variables, and vice versa.

   There are no required subcommands.  The VARIABLES subcommand
specifies variables that will be transformed into cases.  Variables not
specified are discarded.  By default, all variables are selected for
transposition.

   The variable specified by NEWNAMES, which must be a string
variable, is used to give names to the variables created by FLIP.  If
NEWNAMES is not specified then the default is a variable named
CASE_LBL, if it exists.  If it does not then the variables created by
FLIP are named VAR000 through VAR999, then VAR1000, VAR1001, and so on.

   When a NEWNAMES variable is available, the names must be
canonicalized before becoming variable names.  Invalid characters are
replaced by letter `V' in the first position, or by `_' in subsequent
positions.  If the name thus generated is not unique, then numeric
extensions are added, starting with 1, until a unique name is found or
there are no remaining possibilities.  If the latter occurs then the
FLIP operation aborts.

   The resultant dictionary contains a CASE_LBL variable, which stores
the names of the variables in the dictionary before the transposition.
If the active file is subsequently transposed using FLIP, this variable
can be used to recreate the original variable names.


File: pspp.info,  Node: IF,  Next: RECODE,  Prev: FLIP,  Up: Data Manipulation

IF
==

     Two possible syntaxes:
             IF test_expr target_var=target_expr.
             IF test_expr target_vec(target_index)=target_expr.

   The IF transformation conditionally assigns the value of a target
expression to a target variable, based on the truth of a test
expression.

   Specify a boolean-valued expression (*note Expressions::) to be
tested following the IF keyword.  This expression is calculated for
each case.  If the value is true, then the value of target_expr is
computed and assigned to target_var.  If the value is false or missing,
nothing is done.  Numeric and short and long string variables may be
used.  The type of target_expr must match the type of target_var.

   For numeric variables only, target_var need not exist before the IF
transformation is executed.  In this case, target_var is assigned the
system-missing value if the IF condition is not true.  String variables
must be declared before they can be used as targets for IF.

   In addition to ordinary variables, the target variable may be an
element of a vector.  In this case, the vector index must be specified
in parentheses following the vector name.
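
   The treatment of true, false, and missing test values can be
modelled in Python.  This is a sketch, not PSPP code: the function and
variable names are hypothetical, `None' stands in for both a missing
test result and the system-missing value, and a comparison involving a
missing value is assumed to be missing.

```python
SYSMIS = None  # stand-in for the system-missing value


def apply_if(case, test, target, expr):
    """Model IF test target=expr for one case held in a dict."""
    if test(case) is True:      # false and missing both leave the target alone
        case[target] = expr(case)
    elif target not in case:
        case[target] = SYSMIS   # new numeric target, condition not true
    return case


def age_is_adult(case):
    # Assumption: comparing a missing value gives a missing result.
    return SYSMIS if case["AGE"] is SYSMIS else case["AGE"] >= 18


cases = [{"AGE": 30}, {"AGE": 12}, {"AGE": SYSMIS}]
for c in cases:
    apply_if(c, age_is_adult, "ADULT", lambda case: 1)
print(cases)
```

   Only the first case receives ADULT=1.  The other two are left with
the system-missing value, because ADULT did not exist before the
transformation.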


File: pspp.info,  Node: RECODE,  Next: SORT CASES,  Prev: IF,  Up: Data Manipulation

RECODE
======

     RECODE var_list (src_value...=dest_value)... [INTO var_list].
     
     src_value may take the following forms:
             number
             string
             num1 THRU num2
             MISSING
             SYSMIS
             ELSE
     Open-ended ranges may be specified using LO or LOWEST for num1
     or HI or HIGHEST for num2.
     
     dest_value may take the following forms:
             num
             string
             SYSMIS
             COPY

   The RECODE command is used to translate data from one range of
values to another, using flexible user-specified mappings.  Data may be
remapped in-place or copied to new variables.  Numeric, short string,
and long string data can be recoded.

   Specify the list of source variables, followed by one or more mapping
specifications each enclosed in parentheses.  If the data is to be
copied to new variables, specify INTO, then the list of target
variables.  String target variables must already have been declared
using STRING or another transformation, but numeric target variables can
be created on the fly.  There must be exactly as many target variables
as source variables.  Each source variable is remapped into its
corresponding target variable.

   When INTO is not used, the input and output variables must be of the
same type.  Otherwise, string values can be recoded into numeric values,
and vice versa.  When this is done and there is no mapping for a
particular value, either a value consisting of all spaces or the
system-missing value is assigned, depending on variable type.

   Mappings are considered from left to right.  The first src_value that
matches the value of the source variable causes the target variable to
receive the value indicated by the dest_value.  Literal number, string,
and range src_value's should be self-explanatory.  MISSING as a
src_value matches any user- or system-missing value.  SYSMIS matches the
system-missing value only.  ELSE is a catch-all that matches anything.
It should be the last src_value specified.

   Numeric and string dest_value's should also be self-explanatory.
COPY causes the input values to be copied to the output.  This is only
valid if the source and target variables are of the same type.  SYSMIS
indicates the system-missing value.

   If the source variables are strings and the target variables are
numeric, then there is one additional mapping available: (CONVERT),
which must be the last specified mapping.  CONVERT causes a number
specified as a string to be converted to a numeric value.  If the string
cannot be parsed as a number, then the system-missing value is assigned.

   Multiple recodings can be specified on the same RECODE command.
Introduce additional recodings with a slash (`/') in order to separate
them from the previous recodings.
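
   The following sketch illustrates these forms.  All variable names
(`age', `agegrp', `q1' through `q3', `code', `num') are hypothetical,
and `code' is assumed to be a string variable:

     RECODE age (LO THRU 17=1) (18 THRU 64=2) (65 THRU HI=3)
            (ELSE=SYSMIS) INTO agegrp.
     RECODE q1 q2 (9=SYSMIS) / q3 (1=5) (2=4) (4=2) (5=1).
     RECODE code ('N/A'=SYSMIS) (CONVERT) INTO num.

   The first command bands `age' into a new numeric variable, the
second performs two in-place recodings separated by a slash, and the
third converts numeric strings, with CONVERT as the last mapping.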


File: pspp.info,  Node: SORT CASES,  Prev: RECODE,  Up: Data Manipulation

SORT CASES
==========

     SORT CASES BY var_list.

   SORT CASES sorts the active file by the values of one or more
variables.

   Specify BY and a list of variables to sort by.  By default, variables
are sorted in ascending order.  To override sort order, specify (D) or
(DOWN) after a list of variables to get descending order, or (A) or (UP)
for ascending order.  These apply to the entire list of variables
preceding them.
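
   For instance, assuming hypothetical variables `region', `income',
and `name', the following sorts by `region' and `income' in descending
order and then by `name' in ascending order:

     SORT CASES BY region income (D) name (A).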

   SORT CASES is a procedure.  It causes the data to be read.

   SORT CASES will attempt to sort the entire active file in main
memory.  If main memory is exhausted then it will use a merge sort
algorithm that involves writing and reading numerous temporary files.
Environment variables determine the temporary files' location.  The
first of SPSSTMPDIR, SPSSXTMPDIR, or TMPDIR that is set determines the
location.  Otherwise, if the compiler environment defined P_tmpdir,
that is used.  Otherwise, under Unix-like OSes /tmp is used; under
MS-DOS, the first of TEMP, TMP, or root on the current drive is used;
under other OSes, the current directory.


File: pspp.info,  Node: Data Selection,  Next: Conditionals and Looping,  Prev: Data Manipulation,  Up: Top

Selecting data for analysis
***************************

   This chapter documents PSPP commands that temporarily or permanently
select data records from the active file for analysis.

* Menu:

* FILTER::                      Exclude cases based on a variable.
* N OF CASES::                  Limit the size of the active file.
* PROCESS IF::                  Temporarily excluding cases.
* SAMPLE::                      Select a specified proportion of cases.
* SELECT IF::                   Permanently delete selected cases.
* SPLIT FILE::                  Do multiple analyses with one command.
* TEMPORARY::                   Make transformations' effects temporary.
* WEIGHT::                      Weight cases by a variable.


File: pspp.info,  Node: FILTER,  Next: N OF CASES,  Prev: Data Selection,  Up: Data Selection

FILTER
======

     FILTER BY var_name.
     FILTER OFF.

   The FILTER command allows a boolean-valued variable to be used to
select cases from the data stream for processing.

   In order to set up filtering, specify BY and a variable name.
Keyword BY is optional but recommended.  Cases which have a zero or
system- or user-missing value are excluded from analysis, but not
deleted from the data stream.  Cases with other values are analyzed.

   Use FILTER OFF to turn off case filtering.

   Filtering takes place immediately before cases pass to a procedure
for analysis.  Only one filter variable may be active at once.
Normally, case filtering continues until it is explicitly turned off
with FILTER OFF.  However, if FILTER is placed after TEMPORARY, then
filtering stops after execution of the next procedure or procedure-like
command.
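
   A minimal sketch, assuming hypothetical numeric variables `age' and
`income' and a filter variable `keep' computed for the purpose:

     COMPUTE keep = (age >= 18).
     FILTER BY keep.
     DESCRIPTIVES income.
     FILTER OFF.
     DESCRIPTIVES income.

   The first DESCRIPTIVES analyzes only cases for which `keep' is
nonzero and not missing.  The second analyzes every case, since the
filtered cases were never deleted.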


File: pspp.info,  Node: N OF CASES,  Next: PROCESS IF,  Prev: FILTER,  Up: Data Selection

N OF CASES
==========

     N [OF CASES] num_of_cases [ESTIMATED].

   Sometimes you may want to disregard cases of your input.  The `N'
command can be used to do this.  `N 100' tells PSPP to disregard all
cases after the first 100.

   If the value specified for `N' is greater than the number of cases
read in, the value is ignored.

   `N' does not discard cases or cause them not to be read in.  It just
causes cases beyond the last one specified to be ignored by data
analysis commands.

   A later `N' command can increase or decrease the number of cases
selected.  (To select all the cases without knowing how many there are,
specify a very high number: 100000 or whatever you think is large
enough.)

   Transformation procedures performed after `N' is executed _do_ cause
cases to be discarded.

   The `SAMPLE', `PROCESS IF', and `SELECT IF' commands have precedence
over `N'--the same results are obtained by both of the following
fragments, given the same random number seeds:

     ...set up, read in data...
     N 100.
     SAMPLE .5.
     ...analyze data...
     
     ...set up, read in data...
     SAMPLE .5.
     N 100.
     ...analyze data...

   Both fragments above first randomly sample approximately half of the
cases, then select the first 100 of those sampled.

   `N' with the `ESTIMATED' keyword can be used to give an estimated
number of cases before DATA LIST or another command to read in data.
(`ESTIMATED' never limits the number of cases processed by procedures.)


File: pspp.info,  Node: PROCESS IF,  Next: SAMPLE,  Prev: N OF CASES,  Up: Data Selection

PROCESS IF
==========

     PROCESS IF expression.

   The PROCESS IF command is used to temporarily eliminate cases from
the data stream.  Its effects are active only through the execution of
the next procedure or procedure-like command.

   Specify a boolean expression (*note Expressions::).  If the value of
the expression is true for a particular case, the case will be
analyzed.  If the expression has a false or missing value, then the
case will be deleted from the data stream for this procedure only.

   Regardless of its placement relative to other commands, PROCESS IF
always takes effect immediately before data passes to the procedure.
Only one PROCESS IF command may be in effect at any given time.

   The effects of PROCESS IF are similar, but not identical, to the
effects of executing TEMPORARY then SELECT IF (*note SELECT IF::).

   Use of PROCESS IF is deprecated.  It is included for compatibility
with old command files.  New syntax files should use SELECT IF or FILTER
instead.


File: pspp.info,  Node: SAMPLE,  Next: SELECT IF,  Prev: PROCESS IF,  Up: Data Selection

SAMPLE
======

     SAMPLE num1 [FROM num2].

   `SAMPLE' is used to randomly sample a proportion of the cases in the
active file.  `SAMPLE' is temporary, affecting only the next procedure,
unless that is a data transformation, such as `SELECT IF' or `RECODE'.

   The proportion to sample can be expressed as a single number between
0 and 1.  If `k' is the number specified, and `N' is the number of
currently-selected cases in the active file, then after `SAMPLE k.',
there will be `k*N', plus or minus one, cases selected.

   The proportion to sample can also be specified in the style `SAMPLE
M FROM N'.  With this style, cases are selected as follows:

  1. If N is equal to the number of currently-selected cases in the
     active file, exactly M cases will be selected.

  2. If N is greater than the number of currently-selected cases in the
     active file, an equivalent proportion of cases will be selected.

  3. If N is less than the number of currently-selected cases in the
     active file, exactly M cases will be selected _from the first N
     cases in the active file._

   `SAMPLE', `SELECT IF', and `PROCESS IF' are performed in the order
specified by the syntax file.

   `SAMPLE' is ignored before `SORT CASES'.

   `SAMPLE' is always performed before `N OF CASES', regardless of
ordering in the syntax file.  *Note N OF CASES::.

   The same values for `SAMPLE' may result in different samples.  To
obtain the same sample, use the `SET' command to set the random number
seed to the same value before each `SAMPLE'.  By default, the random
number seed is based on the system time.
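
   As an illustrative sketch, suppose the active file holds 200 cases
and has a hypothetical variable `income'.  The `SEED' subcommand of
`SET' is assumed here to be the one that sets the random number seed
(*note SET::), and the seed value itself is arbitrary:

     SET SEED=12345.
     SAMPLE .25.
     DESCRIPTIVES income.
     SET SEED=12345.
     SAMPLE 50 FROM 200.
     DESCRIPTIVES income.

   The first `SAMPLE' selects 0.25*200 = 50 cases, plus or minus one.
The second selects exactly 50, because 200 equals the number of
currently-selected cases.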


File: pspp.info,  Node: SELECT IF,  Next: SPLIT FILE,  Prev: SAMPLE,  Up: Data Selection

SELECT IF
=========

     SELECT IF expression.

   The SELECT IF command is used to select particular cases for analysis
based on the value of a boolean expression.  Cases not selected are
permanently eliminated, unless TEMPORARY is in effect (*note
TEMPORARY::).

   Specify a boolean expression (*note Expressions::).  If the value of
the expression is true for a particular case, the case will be
analyzed.  If the expression has a false or missing value, then the
case will be deleted from the data stream.

   Always place SELECT IF commands as early in the command file as
possible.  Cases that are deleted early can be processed more
efficiently in time and space.
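
   For example, assuming hypothetical numeric variables `age' and
`income':

     SELECT IF (age >= 18).
     DESCRIPTIVES income.
     TEMPORARY.
     SELECT IF (income > 0).
     DESCRIPTIVES income.
     DESCRIPTIVES income.

   The first SELECT IF permanently removes cases where `age' is below
18 or missing.  The second follows TEMPORARY, so it restricts only the
second DESCRIPTIVES; the third DESCRIPTIVES again sees all cases with
`age' of at least 18.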


File: pspp.info,  Node: SPLIT FILE,  Next: TEMPORARY,  Prev: SELECT IF,  Up: Data Selection

SPLIT FILE
==========

     Two possible syntaxes:
             SPLIT FILE BY var_list.
             SPLIT FILE OFF.

   The SPLIT FILE command allows multiple sets of data present in one
data file to be analyzed separately using single statistical procedure
commands.

   Specify a list of variable names in order to analyze multiple sets of
data separately.  Groups of cases having the same values for these
variables are analyzed by statistical procedure commands as one group.
An independent analysis is carried out for each group of cases, and the
variable values for the group are printed along with the analysis.

   Specify OFF in order to disable SPLIT FILE and resume analysis of the
entire active file as a single group of data.
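
   A short sketch, assuming hypothetical variables `region' and
`income'.  The data are sorted first on the assumption that groups are
formed from adjacent cases with equal values; this section does not
state whether sorting is required:

     SORT CASES BY region.
     SPLIT FILE BY region.
     DESCRIPTIVES income.
     SPLIT FILE OFF.
     DESCRIPTIVES income.

   The first DESCRIPTIVES produces one analysis per value of `region';
the second treats the active file as a single group.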


File: pspp.info,  Node: TEMPORARY,  Next: WEIGHT,  Prev: SPLIT FILE,  Up: Data Selection

TEMPORARY
=========

     TEMPORARY.

   The TEMPORARY command is used to make the effects of transformations
following its execution temporary.  These transformations will affect
only the execution of the next procedure or procedure-like command.
Their effects will not be saved to the active file.

   The only specification is the command name.

   TEMPORARY may not appear within a DO IF or LOOP construct.  It may
appear only once between procedures and procedure-like commands.

   An example may help to clarify:

     DATA LIST /X 1-2.
     BEGIN DATA.
      2
      4
     10
     15
     20
     24
     END DATA.
     COMPUTE X=X/2.
     TEMPORARY.
     COMPUTE X=X+3.
     DESCRIPTIVES X.
     DESCRIPTIVES X.

   The data read by the first DESCRIPTIVES command are 4, 5, 8, 10.5,
13, 15.  The data read by the second DESCRIPTIVES command are 1, 2, 5,
7.5, 10, 12.


File: pspp.info,  Node: WEIGHT,  Prev: TEMPORARY,  Up: Data Selection

WEIGHT
======

     WEIGHT BY var_name.
     WEIGHT OFF.

   WEIGHT can be used to assign cases varying weights in order to
change the frequency distribution of the active file.  Execution of
WEIGHT is delayed until data have been read in.

   If a variable name is specified, WEIGHT causes the values of that
variable to be used as weighting factors for subsequent statistical
procedures.  Use of keyword BY is optional but recommended.  Weighting
variables must be numeric.  Scratch variables may not be used for
weighting (*note Scratch Variables::).

   When OFF is specified, subsequent statistical procedures will weight
all cases equally.

   Weighting values do not need to be integers.  However, negative and
system- and user-missing values for the weighting variable are
interpreted as weighting factors of 0.

   WEIGHT does not cause cases in the active file to be replicated in
memory.
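
   As a sketch, the following reads three cases with hypothetical
variables `score' and `freq', and then uses `freq' as the weight.  It
is assumed here that DESCRIPTIVES honors the weighting:

     DATA LIST /score 1 freq 3-4.
     BEGIN DATA.
     1  5
     2 12
     3  3
     END DATA.
     WEIGHT BY freq.
     DESCRIPTIVES score.
     WEIGHT OFF.

   With weighting on, the three cases count as 5 + 12 + 3 = 20 cases,
although only three cases are held in the active file.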


File: pspp.info,  Node: Conditionals and Looping,  Next: Statistics,  Prev: Data Selection,  Up: Top

Conditional and Looping Constructs
**********************************

   This chapter documents PSPP commands used for conditional execution,
looping, and flow of control.

* Menu:

* BREAK::                       Exit a loop.
* DO IF::                       Conditionally execute a block of code.
* DO REPEAT::                   Textually repeat a code block.
* LOOP::                        Repeat a block of code.


File: pspp.info,  Node: BREAK,  Next: DO IF,  Prev: Conditionals and Looping,  Up: Conditionals and Looping

BREAK
=====

     BREAK.

   BREAK terminates execution of the innermost currently executing LOOP
construct.

   BREAK is allowed only inside a LOOP construct.  *Note LOOP::, for
more details.
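
   A minimal sketch, using a hypothetical counter variable `n' and
assuming that MXLOOPS (*note SET::) is at least 5:

     COMPUTE n = 0.
     LOOP.
     COMPUTE n = n + 1.
     DO IF (n >= 5).
     BREAK.
     END IF.
     END LOOP.

   The loop exits through BREAK on its fifth iteration, leaving `n'
equal to 5 for every case.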


File: pspp.info,  Node: DO IF,  Next: DO REPEAT,  Prev: BREAK,  Up: Conditionals and Looping

DO IF
=====

     DO IF condition.
             ...
     [ELSE IF condition.
             ...
     ]...
     [ELSE.
             ...]
     END IF.

   The DO IF command allows one of several sets of transformations to be
executed, depending on user-specified conditions.

   Specify a boolean expression.  If the condition is true, then the
block of code following DO IF is executed.  If the condition is
missing, then none of the code blocks is executed.  If the condition is
false, then the boolean expression on each ELSE IF, if present, is
tested in turn, with the same rules applied.  If all expressions
evaluate to false, then the ELSE code block is executed, if it is
present.
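
   For illustration, assuming a hypothetical numeric variable `age'
and a new variable `agegrp':

     DO IF (age < 18).
     COMPUTE agegrp = 1.
     ELSE IF (age < 65).
     COMPUTE agegrp = 2.
     ELSE.
     COMPUTE agegrp = 3.
     END IF.

   For a case where `age' is missing, the first condition is missing,
so none of the blocks is executed and `agegrp' is left system-missing.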


File: pspp.info,  Node: DO REPEAT,  Next: LOOP,  Prev: DO IF,  Up: Conditionals and Looping

DO REPEAT
=========

     DO REPEAT repvar_name=expansion....
             ...
     END REPEAT [PRINT].
     
     expansion takes one of the following forms:
             var_list
             num_or_range...
             'string'...
     
     num_or_range takes one of the following forms:
             number
             num1 TO num2

   The DO REPEAT command causes a block of code to be repeated a number
of times with different variables, numbers, or strings textually
substituted into the block with each repetition.

   Specify a repeat variable name followed by an equals sign (`=') and
the list of replacements.  Replacements can be a list of variables
(which may be existing variables or new variables or a combination
thereof), of numbers, or of strings.  When new variable names are
specified, DO REPEAT creates them as numeric variables.  When numbers
are specified, runs of integers may be indicated with TO notation, for
instance `1 TO 5' and `1 2 3 4 5' would be equivalent.  There is no
equivalent notation for string values.

   Multiple repeat variables can be specified.  When this is done, each
variable must have the same number of replacements.

   The code within DO REPEAT is repeated as many times as there are
replacements for each variable.  The first time, the first value for
each repeat variable is substituted; the second time, the second value
for each repeat variable is substituted; and so on.

   Repeat variable substitutions work like macros.  They take place
anywhere in a line that the repeat variable name occurs as a token,
including command and subcommand names.  For this reason it is not a
good idea to select words commonly used in command and subcommand names
as repeat variable identifiers.

   If PRINT is specified on END REPEAT, the commands after substitutions
are made are printed to the listing file, prefixed by a plus sign (`+').


File: pspp.info,  Node: LOOP,  Prev: DO REPEAT,  Up: Conditionals and Looping

LOOP
====

     LOOP [index_var=start TO end [BY incr]] [IF condition].
             ...
     END LOOP [IF condition].

   The LOOP command allows a group of commands to be iterated.  A
number of termination options are offered.

   Specify index_var in order to make that variable count from one
value to another by a particular increment.  index_var must be a
pre-existing numeric variable.  start, end, and incr are numeric
expressions (*note Expressions::).

   During the first iteration, index_var is set to the value of start.
During each successive iteration, index_var is increased by the value of
incr.  If end > start, then the loop terminates when index_var > end;
otherwise it terminates when index_var < end.  If incr is not specified
then it defaults to +1 or -1 as appropriate.

   If end > start and incr < 0, or if end < start and incr > 0, then the
loop is never executed.  index_var is nevertheless set to the value of
start.

   Modifying index_var within the loop is allowed, but it has no effect
on the value of index_var in the next iteration.

   Specify a boolean expression for the condition on the LOOP command to
cause the loop to be executed only if the condition is true.  If the
condition is false or missing before the loop contents are executed the
first time, the loop contents are not executed at all.

   If index and condition clauses are both present on LOOP, the index
clause is always evaluated first.

   Specify a boolean expression for the condition on the END LOOP to
cause the loop to terminate if the condition is not true after the
enclosed code block is executed.  The condition is evaluated at the end
of the loop, not at the beginning.

   If neither the index clause nor either condition clause is present,
then the loop is executed MXLOOPS (*note SET::) times or until BREAK
(*note BREAK::) is executed.

   The BREAK command provides another way to terminate execution of a
LOOP construct.
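
   A small sketch of an indexed loop, with hypothetical variables `i'
and `total'.  Both are created beforehand, since index_var must
already exist:

     COMPUTE i = 0.
     COMPUTE total = 0.
     LOOP i=1 TO 5.
     COMPUTE total = total + i.
     END LOOP.

   The loop body runs with `i' set to 1, 2, 3, 4, and 5 in turn, so
`total' is 1+2+3+4+5 = 15 for every case.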


File: pspp.info,  Node: Statistics,  Next: Utilities,  Prev: Conditionals and Looping,  Up: Top

Statistics
**********

   This chapter documents the statistical procedures that PSPP supports
so far.

* Menu:

* DESCRIPTIVES::                Descriptive statistics.
* FREQUENCIES::                 Frequency tables.
* CROSSTABS::                   Crosstabulation tables.