-@node Data Selection, Conditionals and Looping, Data Manipulation, Top
+@node Data Selection
@chapter Selecting data for analysis
This chapter documents PSPP commands that temporarily or permanently
@menu
* FILTER:: Exclude cases based on a variable.
* N OF CASES:: Limit the size of the active file.
-* PROCESS IF:: Temporarily excluding cases.
* SAMPLE:: Select a specified proportion of cases.
* SELECT IF:: Permanently delete selected cases.
* SPLIT FILE:: Do multiple analyses with one command.
* WEIGHT:: Weight cases by a variable.
@end menu
-@node FILTER, N OF CASES, Data Selection, Data Selection
+@node FILTER
@section FILTER
@vindex FILTER
OFF}. However, if @cmd{FILTER} is placed after TEMPORARY, it filters only
the next procedure or procedure-like command.
-@node N OF CASES, PROCESS IF, FILTER, Data Selection
+@node N OF CASES
@section N OF CASES
@vindex N OF CASES
N [OF CASES] num_of_cases [ESTIMATED].
@end display
-Sometimes you may want to disregard cases of your input. @cmd{N} can
-do this. @code{N 100} tells PSPP to disregard all cases after the
-first 100.
-
-If the value specified for @cmd{N} is greater than the number of cases
-read in, the value is ignored.
-
-@cmd{N} does not discard cases or prevent them from being read. It
-just causes cases beyond the last one specified to be ignored by data
-analysis commands.
-
-A later @cmd{N} command can increase or decrease the number of cases
-selected. (To select all the cases without knowing how many there are,
-specify a very high number: 100000 or whatever you think is large enough.)
-
-Transformation procedures performed after @cmd{N} is executed
-@emph{do} cause cases to be discarded.
-
-@cmd{SAMPLE}, @cmd{PROCESS IF}, and @cmd{SELECT IF} have
-precedence over @cmd{N}---the same results are obtained by both of the
-following fragments, given the same random number seeds:
-
-@example
-@i{@dots{}set up, read in data@dots{}}
-N 100.
-SAMPLE .5.
-@i{@dots{}analyze data@dots{}}
-
-@i{@dots{}set up, read in data@dots{}}
-SAMPLE .5.
-N 100.
-@i{@dots{}analyze data@dots{}}
-@end example
-
-Both fragments above first randomly sample approximately half of the
-cases, then select the first 100 of those sampled.
-
-@cmd{N} with the @code{ESTIMATED} keyword gives an
-estimated number of cases before @cmd{DATA LIST} or another command to
-read in data. @code{ESTIMATED} never limits the number of cases
-processed by procedures. PSPP currently does not make use of
-case count estimates.
-
-When @cmd{N} is specified after @cmd{TEMPORARY}, it affects only
-the next procedure (@pxref{TEMPORARY}).
-
-@node PROCESS IF, SAMPLE, N OF CASES, Data Selection
-@section PROCESS IF
-@vindex PROCESS IF
-
-@example
-PROCESS IF expression.
-@end example
-
-@cmd{PROCESS IF} temporarily eliminates cases from the
-data stream. Its effects are active only through the execution of the
-next procedure or procedure-like command.
-
-Specify a boolean expression (@pxref{Expressions}). If the value of the
-expression is true for a particular case, the case will be analyzed. If
-the expression has a false or missing value, then the case will be
-deleted from the data stream for this procedure only.
-
-Regardless of its placement relative to other commands, @cmd{PROCESS IF}
-always takes effect immediately before data passes to the procedure.
-Only one @cmd{PROCESS IF} command may be in effect at any given time.
-
-The effects of @cmd{PROCESS IF} are similar, but not identical, to the
-effects of executing @cmd{TEMPORARY}, then @cmd{SELECT IF}
-(@pxref{SELECT IF}).
-
-The filtering performed by @cmd{PROCESS IF} takes place immediately
-before cases pass to a procedure for analysis. Because @cmd{PROCESS
-IF} affects only a single procedure, its placement relative to
-@cmd{TEMPORARY} is unimportant.
-
-@cmd{PROCESS IF} is deprecated. It is included for compatibility with
-old command files. New syntax files should use @cmd{SELECT IF} or
-@cmd{FILTER} instead.
-
-@node SAMPLE, SELECT IF, PROCESS IF, Data Selection
+@cmd{N OF CASES} limits the number of cases processed by any
+procedures that follow it in the command stream. @code{N OF CASES
+100}, for example, tells PSPP to disregard all cases after the first
+100.
+
+When @cmd{N OF CASES} is specified after @cmd{TEMPORARY}, it affects
+only the next procedure (@pxref{TEMPORARY}). Otherwise, cases beyond
+the limit specified are not processed by any later procedure.
+
+If the limit specified on @cmd{N OF CASES} is greater than the number
+of cases in the active file, it has no effect.
+
+When @cmd{N OF CASES} is used along with @cmd{SAMPLE} or @cmd{SELECT
+IF}, the case limit is applied to the cases obtained after sampling or
+case selection, regardless of how @cmd{N OF CASES} is placed relative
+to @cmd{SAMPLE} or @cmd{SELECT IF} in the command file.  Thus, the
+commands @code{N OF CASES 100} and @code{SAMPLE .5}, used together,
+randomly sample approximately half of the active file's cases, then
+select the first 100 of those sampled, whichever command comes first.
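+
+As an illustration, assuming the same random number seed in both
+cases, the two fragments below should produce the same results (the
+commands that read and analyze the data are elided):
+
+@example
+@i{@dots{}set up, read in data@dots{}}
+N OF CASES 100.
+SAMPLE .5.
+@i{@dots{}analyze data@dots{}}
+
+@i{@dots{}set up, read in data@dots{}}
+SAMPLE .5.
+N OF CASES 100.
+@i{@dots{}analyze data@dots{}}
+@end example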
+
+@cmd{N OF CASES} with the @code{ESTIMATED} keyword, placed before
+@cmd{DATA LIST} or another command that reads in data, gives an
+estimated number of cases.  @code{ESTIMATED} never limits the number of
+cases processed by procedures.  PSPP currently does not make use of
+case count estimates.
+
+@node SAMPLE
@section SAMPLE
@vindex SAMPLE
differing endianness or floating-point formats. By default, the
random number seed is based on the system time.
-@node SELECT IF, SPLIT FILE, SAMPLE, Data Selection
+@node SELECT IF
@section SELECT IF
@vindex SELECT IF
(@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
(@pxref{LAG}).
-@node SPLIT FILE, TEMPORARY, SELECT IF, Data Selection
+@node SPLIT FILE
@section SPLIT FILE
@vindex SPLIT FILE
@display
-SPLIT FILE [{LAYERED, SEPARATE}] BY var_list.
+SPLIT FILE [@{LAYERED, SEPARATE@}] BY var_list.
SPLIT FILE OFF.
@end display
When @cmd{SPLIT FILE} is specified after @cmd{TEMPORARY}, it affects only
the next procedure (@pxref{TEMPORARY}).
-@node TEMPORARY, WEIGHT, SPLIT FILE, Data Selection
+@node TEMPORARY
@section TEMPORARY
@vindex TEMPORARY
10.5, 13, 15. The data read by the first @cmd{DESCRIPTIVES} are 1, 2,
5, 7.5, 10, 12.
-@node WEIGHT, , TEMPORARY, Data Selection
+@node WEIGHT
@section WEIGHT
@vindex WEIGHT
@cmd{WEIGHT} does not cause cases in the active file to be replicated in
memory.
-@setfilename ignored