@c PSPP - a program for statistical analysis.
-@c Copyright (C) 2017 Free Software Foundation, Inc.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
IF}, the case limit is applied to the cases obtained after sampling or
case selection, regardless of how @cmd{N OF CASES} is placed relative
to @cmd{SAMPLE} or @cmd{SELECT IF} in the command file. Thus, the
-commands @code{N OF CASES 100} and @code{SAMPLE .5} will both randomly
+commands @code{N OF CASES 100} and @code{SAMPLE .5} both randomly
sample approximately half of the active dataset's cases, then select the
first 100 of those sampled, regardless of their order in the command
file.
The proportion to sample can be expressed as a single number between 0
and 1. If @var{k} is the number specified, and @var{N} is the number
of currently-selected cases in the active dataset, then after
-@subcmd{SAMPLE @var{k}.}, approximately @var{k}*@var{N} cases will be
+@subcmd{SAMPLE @var{k}.}, approximately @var{k}*@var{N} cases are
selected.
The proportion to sample can also be specified in the style @subcmd{SAMPLE
@enumerate
@item
If @var{N} is equal to the number of currently-selected cases in the
-active dataset, exactly @var{m} cases will be selected.
+active dataset, exactly @var{m} cases are selected.
@item
If @var{N} is greater than the number of currently-selected cases in the
-active dataset, an equivalent proportion of cases will be selected.
+active dataset, an equivalent proportion of cases is selected.
@item
If @var{N} is less than the number of currently-selected cases in the
-active, exactly @var{m} cases will be selected @emph{from the first
+active dataset, exactly @var{m} cases are selected @emph{from the first
@var{N} cases in the active dataset.}
@end enumerate
(@pxref{TEMPORARY}).
Specify a boolean expression (@pxref{Expressions}). If the value of the
-expression is true for a particular case, the case will be analyzed. If
-the expression has a false or missing value, then the case will be
+expression is true for a particular case, the case is analyzed. If
+the expression has a false or missing value, then the case is
deleted from the data stream.
Place @cmd{SELECT IF} as early in the command file as
possible. Cases that are deleted early can be processed more
efficiently in time and space.
+Once cases have been deleted from the active dataset using @cmd{SELECT IF}, they
+cannot be re-instated.
+If you want to be able to re-instate cases, then use @cmd{FILTER} (@pxref{FILTER})
+instead.
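+
+As a minimal sketch of the difference, assuming a dataset with
+hypothetical numeric variables @code{age} and @code{salary} (these
+names are illustrative only), the following excludes cases from
+analysis without deleting them, so that they are available again
+after @cmd{FILTER OFF}:
+
+@example
+* Hypothetical variables: age, salary.
+COMPUTE young = (age < 30).
+FILTER BY young.
+DESCRIPTIVES salary.
+FILTER OFF.
+DESCRIPTIVES salary.
+@end example
+
+Here the first @cmd{DESCRIPTIVES} should see only the cases for which
+@code{young} is true, and the second should see every case again.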
When @cmd{SELECT IF} is specified following @cmd{TEMPORARY}
(@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
(@pxref{LAG}).
+@subsection Example Select-If
+
+A shop steward is interested in the salaries of younger personnel in a firm.
+The file @file{personnel.sav} provides the salaries of all the workers and their
+dates of birth. The syntax in @ref{select-if:ex} shows how @cmd{SELECT IF} can
+be used to limit analysis only to those persons born after December 31, 1999.
+
+@float Example, select-if:ex
+@psppsyntax {select-if.sps}
+@caption {Using @cmd{SELECT IF} to select persons born on or after a certain date.}
+@end float
+
+From @ref{select-if:res} one can see that there are 56 persons listed in the dataset,
+and 17 of them were born after December 31, 1999.
+
+@float Result, select-if:res
+@psppoutput {select-if}
+@caption {Salary descriptives before and after the @cmd{SELECT IF} transformation.}
+@end float
+
+Note that the @file{personnel.sav} file from which the data were read is unaffected.
+The transformation affects only the active file.
+
@node SPLIT FILE
@section SPLIT FILE
@vindex SPLIT FILE
variable values for the group are printed along with the analysis.
When a list of variable names is specified, one of the keywords
-@subcmd{LAYERED} or @subcmd{SEPARATE} may also be specified. If provided, either
-keyword are ignored.
+@subcmd{LAYERED} or @subcmd{SEPARATE} may also be specified. With
+@subcmd{LAYERED}, which is the default, the separate analyses for each
+group are presented together in a single table. With
+@subcmd{SEPARATE}, each analysis is presented in a separate table.
+Not all procedures honor the distinction.
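+
+A minimal sketch of requesting each layout, assuming hypothetical
+variables @code{treatment} and @code{score} (illustrative names only).
+For a procedure that does not honor the distinction, the two runs may
+look the same:
+
+@example
+* Hypothetical variables: treatment, score.
+SORT CASES BY treatment.
+SPLIT FILE SEPARATE BY treatment.
+DESCRIPTIVES score.
+SPLIT FILE LAYERED BY treatment.
+DESCRIPTIVES score.
+SPLIT FILE OFF.
+@end example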
Groups are formed only by @emph{adjacent} cases. To create a split
using a variable where like values are not adjacent in the working file,
-you should first sort the data by that variable (@pxref{SORT CASES}).
+first sort the data by that variable (@pxref{SORT CASES}).
Specify @subcmd{OFF} to disable @cmd{SPLIT FILE} and resume analysis of the
entire active dataset as a single group of data.
split.
@float Example, split:res
-@psppsyntax {split.out}
+@psppoutput {split}
@caption {The results of running @cmd{DESCRIPTIVES} with an active split}
@end float
Unless @cmd{TEMPORARY} was used, after a split has been defined for
a dataset it remains active until explicitly disabled.
+In the graphical user interface, the active split variable (if any) is
+displayed in the status bar (@pxref{split-status-bar:scr}).
+If a dataset is saved to a system file (@pxref{SAVE}) whilst a split
+is active, the split status is stored in the file and is
+automatically loaded when that file is loaded.
+
+@float Screenshot, split-status-bar:scr
+@psppimage {split-status-bar}
+@caption {The status bar indicating that the dataset is split using the @exvar{treatment} variable}
+@end float
+
@node TEMPORARY
@section TEMPORARY
@end display
@cmd{TEMPORARY} is used to make the effects of transformations
-following its execution temporary. These transformations will
+following its execution temporary. These transformations
affect only the execution of the next procedure or procedure-like
-command. Their effects will not be saved to the active dataset.
+command. Their effects are not saved to the active dataset.
The only specification on @cmd{TEMPORARY} is the command name.
Scratch variables cannot be used following @cmd{TEMPORARY}.
-An example may help to clarify:
+@subsection Example Temporary
-@example
-DATA LIST /X 1-2.
-BEGIN DATA.
- 2
- 4
-10
-15
-20
-24
-END DATA.
+In @ref{temporary:ex} there are two @cmd{COMPUTE} transformations. One
+of them immediately follows a @cmd{TEMPORARY} command, and therefore has
+effect only for the next procedure, which in this case is the first
+@cmd{DESCRIPTIVES} command.
-COMPUTE X=X/2.
+@float Example, temporary:ex
+@psppsyntax {temporary.sps}
+@caption {Running a @cmd{COMPUTE} transformation after @cmd{TEMPORARY}}
+@end float
-TEMPORARY.
-COMPUTE X=X+3.
+The data read by the first @cmd{DESCRIPTIVES} procedure are 4, 5, 8,
+10.5, 13, 15. The data read by the second @cmd{DESCRIPTIVES} procedure are 1, 2,
+5, 7.5, 10, 12. This is because the second @cmd{COMPUTE} transformation
+has no effect on the second @cmd{DESCRIPTIVES} procedure. You can check these
+figures in @ref{temporary:res}.
-DESCRIPTIVES X.
-DESCRIPTIVES X.
-@end example
+@float Result, temporary:res
+@psppoutput {temporary}
+@caption {The results of running two consecutive @cmd{DESCRIPTIVES} commands after
+ a temporary transformation}
+@end float
-The data read by the first @cmd{DESCRIPTIVES} are 4, 5, 8,
-10.5, 13, 15. The data read by the first @cmd{DESCRIPTIVES} are 1, 2,
-5, 7.5, 10, 12.
@node WEIGHT
@section WEIGHT
variables must be numeric. Scratch variables may not be used for
weighting (@pxref{Scratch Variables}).
-When @subcmd{OFF} is specified, subsequent statistical procedures will weight all
+When @subcmd{OFF} is specified, subsequent statistical procedures weight all
cases equally.
-A positive integer weighting factor @var{w} on a case will yield the
+A positive integer weighting factor @var{w} on a case yields the
same statistical output as would replicating the case @var{w} times.
A weighting factor of 0 is treated for statistical purposes as if the
case did not exist in the input. Weighting values need not be
One analysis which most surely would be of interest is
the relative amounts or each item in stock.
-However without setting a weight variable, @cmd{FREQUENCIES} (@pxref{FREQUENCIES}) will not
-tell us what we want to know, since there is only one case for each stock item.
-@ref{weight:res} shows the difference between the weighted and unweighted
-frequency tables.
+However, without setting a weight variable, @cmd{FREQUENCIES}
+(@pxref{FREQUENCIES}) does not tell us what we want to know, since
+there is only one case for each stock item. @ref{weight:res} shows the
+difference between the weighted and unweighted frequency tables.
@float Example, weight:res
@psppoutput {weight}