@cindex transformations
The PSPP procedures examined in this chapter manipulate data and
-prepare the active file for later analyses. They do not produce output,
+prepare the active dataset for later analyses. They do not produce output,
as a rule.
@menu
* FLIP:: Exchange variables with cases.
* IF:: Conditionally assigning a calculated value.
* RECODE:: Mapping values from one set to another.
-* SORT CASES:: Sort the active file.
+* SORT CASES:: Sort the active dataset.
@end menu
@node AGGREGATE
@display
AGGREGATE
- OUTFILE=@{*,'file-name',file_handle@}
+ OUTFILE=@{*,'file-name',file_handle@} [MODE=@{REPLACE, ADDVARIABLES@}]
/PRESORTED
/DOCUMENT
/MISSING=COLUMNWISE
for summarizing case contents.
The OUTFILE subcommand is required and must appear first. Specify a
-system file, portable file, or scratch file by file name or file
-handle (@pxref{File Handles}).
+system file or portable file by file name or file
+handle (@pxref{File Handles}), or a dataset by its name
+(@pxref{Datasets}).
The aggregated cases are written to this file. If @samp{*} is
-specified, then the aggregated cases replace the active file. Use of
-OUTFILE to write a portable file or scratch file is a PSPP extension.
-
-By default, the active file will be sorted based on the break variables
-before aggregation takes place. If the active file is already sorted
+specified, then the aggregated cases replace the active dataset's data.
+Use of OUTFILE to write a portable file is a PSPP extension.
+
+If OUTFILE=@samp{*} is given, then the subcommand MODE may also be
+specified.
+The MODE subcommand has two possible values: ADDVARIABLES and REPLACE.
+In REPLACE mode, the entire active dataset is replaced by a new dataset
+which contains just the break variables and the destination variables.
+In this mode, the new dataset will contain as many cases as there are
+unique combinations of the break variables.
+In ADDVARIABLES mode, the destination variables will be appended to
+the existing active dataset.
+Cases which have identical combinations of values in their break
+variables will receive identical values for the destination variables.
+The number of cases in the active dataset will remain unchanged.
+Note that if ADDVARIABLES is specified, then the data @emph{must} be
+sorted on the break variables.
+
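+As an illustrative sketch (the variable names @code{region},
+@code{income}, and @code{mean_income} are hypothetical and not part of
+PSPP), the following syntax sorts the data on the break variable and
+then appends each region's mean income to every case:
+
+@example
+SORT CASES BY region.
+AGGREGATE OUTFILE=* MODE=ADDVARIABLES
+  /BREAK=region
+  /mean_income = MEAN(income).
+@end example
+
+With MODE=REPLACE, the same command would instead leave one case per
+distinct value of @code{region}, containing only @code{region} and
+@code{mean_income}.
+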
+By default, the active dataset will be sorted based on the break variables
+before aggregation takes place. If the active dataset is already sorted
or otherwise grouped in terms of the break variables, specify
PRESORTED to save time.
+PRESORTED is assumed if MODE=ADDVARIABLES is used.
-Specify DOCUMENT to copy the documents from the active file into the
+Specify DOCUMENT to copy the documents from the active dataset into the
aggregate file (@pxref{DOCUMENT}). Otherwise, the aggregate file will
not contain any documents, even if the aggregate file replaces the
-active file.
+active dataset.
Normally, only a single case (for SD and SD., two cases) need be
non-missing in each group for the aggregate variable to be
At least one break variable must be specified on BREAK, a
required subcommand. The values of these variables are used to divide
-the active file into groups to be summarized. In addition, at least
+the active dataset into groups to be summarized. In addition, at least
one @var{dest_var} must be specified.
One or more sets of aggregation variables must be specified. Each set
@display
AUTORECODE VARIABLES=src_vars INTO dest_vars
- /DESCENDING
- /PRINT
+ [ /DESCENDING ]
+ [ /PRINT ]
+ [ /GROUP ]
+ [ /BLANK = @{VALID, MISSING@} ]
@end display
The @cmd{AUTORECODE} procedure considers the @var{n} values that a variable
PRINT is currently ignored.
+The GROUP subcommand is relevant only if more than one variable is to be
+recoded. It causes a single mapping between source and target values to
+be used, instead of one map per variable.
+
+If /BLANK=MISSING is given, then string values which contain only
+whitespace are recoded as SYSMIS. If /BLANK=VALID is given, then they
+are allocated a value like any other. /BLANK is not relevant
+to numeric values. /BLANK=VALID is the default.
+
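+As an illustrative sketch (the variable names are hypothetical), the
+following syntax recodes two string variables with a single shared
+mapping and treats whitespace-only strings as system-missing:
+
+@example
+AUTORECODE VARIABLES=home_city work_city INTO home_code work_code
+  /GROUP
+  /BLANK=MISSING.
+@end example
+
+With /GROUP, a city name that appears in both source variables
+receives the same code in both destination variables.
+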
@cmd{AUTORECODE} is a procedure. It causes the data to be read.
@node COMPUTE
(@pxref{LEAVE}) resets the variable's left state. Therefore,
@code{LEAVE} should be specified following @cmd{COMPUTE}, not before.
-@cmd{COMPUTE} is a transformation. It does not cause the active file to be
+@cmd{COMPUTE} is a transformation. It does not cause the active dataset to be
read.
When @cmd{COMPUTE} is specified following @cmd{TEMPORARY}
FLIP /VARIABLES=var_list /NEWNAMES=var_name.
@end display
-@cmd{FLIP} transposes rows and columns in the active file. It
+@cmd{FLIP} transposes rows and columns in the active dataset. It
causes cases to be swapped with variables, and vice versa.
-All variables in the transposed active file are numeric. String
+All variables in the transposed active dataset are numeric. String
variables take on the system-missing value in the transposed file.
No subcommands are required. If specified, the VARIABLES subcommand
The resultant dictionary contains a CASE_LBL variable, a string
variable of width 8, which stores the names of the variables in the
dictionary before the transposition. Variable names longer than 8
-characters are truncated. If the active file is subsequently
+characters are truncated. If the active dataset is subsequently
transposed using @cmd{FLIP}, this variable can be used to recreate the
original variable names.
SORT CASES BY var_list[(@{D|A@})] [ var_list[(@{D|A@})] ] ...
@end display
-@cmd{SORT CASES} sorts the active file by the values of one or more
+@cmd{SORT CASES} sorts the active dataset by the values of one or more
variables.
Specify BY and a list of variables to sort by. By default, variables
@cmd{SORT CASES} is a procedure. It causes the data to be read.
-@cmd{SORT CASES} attempts to sort the entire active file in main memory.
+@cmd{SORT CASES} attempts to sort the entire active dataset in main memory.
If workspace is exhausted, it falls back to a merge sort algorithm that
involves creating numerous temporary files.