Change terminology from "active file" to "active dataset".

diff --git a/doc/data-selection.texi b/doc/data-selection.texi

index d7f4c79fc11d756c2fdbe278a10c0edb1862d62a..d46dd310aa1654efdfa8486926afb9241fa2e60e 100644
--- a/doc/data-selection.texi
+++ b/doc/data-selection.texi
@@ -1,12 +1,12 @@
-@node Data Selection, Conditionals and Looping, Data Manipulation, Top
+@node Data Selection
  @chapter Selecting data for analysis
  
  This chapter documents PSPP commands that temporarily or permanently
-select data records from the active file for analysis.
+select data records from the active dataset for analysis.
  
  @menu
  * FILTER::                      Exclude cases based on a variable.
-* N OF CASES::                  Limit the size of the active file.
+* N OF CASES::                  Limit the size of the active dataset.
  * SAMPLE::                      Select a specified proportion of cases.
  * SELECT IF::                   Permanently delete selected cases.
  * SPLIT FILE::                  Do multiple analyses with one command.
@@ -14,7 +14,7 @@ select data records from the active file for analysis.
  * WEIGHT::                      Weight cases by a variable.
  @end menu
  
-@node FILTER, N OF CASES, Data Selection, Data Selection
+@node FILTER
  @section FILTER
  @vindex FILTER
  
@@ -51,51 +51,31 @@ the next procedure or procedure-like command.
  N [OF CASES] num_of_cases [ESTIMATED].
  @end display
  
-Sometimes you may want to disregard cases of your input.  @cmd{N} can
-do this.  @code{N 100} tells PSPP to disregard all cases after the
-first 100.
+@cmd{N OF CASES} limits the number of cases processed by any
+procedures that follow it in the command stream.  @code{N OF CASES
+100}, for example, tells PSPP to disregard all cases after the first
+100.
  
-If the value specified for @cmd{N} is greater than the number of cases
-read in, the value is ignored.
+When @cmd{N OF CASES} is specified after @cmd{TEMPORARY}, it affects
+only the next procedure (@pxref{TEMPORARY}).  Otherwise, cases beyond
+the limit specified are not processed by any later procedure.
  
-@cmd{N} does not discard cases or prevent them from being read.  It
-just causes cases beyond the last one specified to be ignored by data
-analysis commands.
+If the limit specified on @cmd{N OF CASES} is greater than the number
+of cases in the active dataset, it has no effect.
  
-A later @cmd{N} command can increase or decrease the number of cases
-selected.  (To select all the cases without knowing how many there are,
-specify a very high number: 100000 or whatever you think is large enough.)
+When @cmd{N OF CASES} is used along with @cmd{SAMPLE} or @cmd{SELECT
+IF}, the case limit is applied to the cases obtained after sampling or
+case selection, regardless of how @cmd{N OF CASES} is placed relative
+to @cmd{SAMPLE} or @cmd{SELECT IF} in the command file.  Thus, the
+commands @code{N OF CASES 100} and @code{SAMPLE .5} will both randomly
+sample approximately half of the active dataset's cases, then select the
+first 100 of those sampled, regardless of their order in the command
+file.
  
-Transformation procedures performed after @cmd{N} is executed
-@emph{do} cause cases to be discarded.
-
-@cmd{SAMPLE} and @cmd{SELECT IF} have
-precedence over @cmd{N}---the same results are obtained by both of the
-following fragments, given the same random number seeds:
-
-@example
-@i{@dots{}set up, read in data@dots{}}
-N 100.
-SAMPLE .5.
-@i{@dots{}analyze data@dots{}}
-
-@i{@dots{}set up, read in data@dots{}}  
-SAMPLE .5.
-N 100.
-@i{@dots{}analyze data@dots{}}
-@end example
-
-Both fragments above first randomly sample approximately half of the
-cases, then select the first 100 of those sampled.
-
-@cmd{N} with the @code{ESTIMATED} keyword gives an
-estimated number of cases before @cmd{DATA LIST} or another command to
-read in data.  @code{ESTIMATED} never limits the number of cases
-processed by procedures.  PSPP currently does not make use of
-case count estimates.
-
-When @cmd{N} is specified after @cmd{TEMPORARY}, it affects only
-the next procedure (@pxref{TEMPORARY}).
+@cmd{N OF CASES} with the @code{ESTIMATED} keyword gives an estimated
+number of cases before @cmd{DATA LIST} or another command to read in
+data.  @code{ESTIMATED} never limits the number of cases processed by
+procedures.  PSPP currently does not make use of case count estimates.
  
  @node SAMPLE
  @section SAMPLE
@@ -107,11 +87,11 @@ SAMPLE num1 [FROM num2].
  
  @cmd{SAMPLE} randomly samples a proportion of the cases in the active
-file.  Unless it follows @cmd{TEMPORARY}, it operates as a
-transformation, permanently removing cases from the active file.
+dataset.  Unless it follows @cmd{TEMPORARY}, it operates as a
+transformation, permanently removing cases from the active dataset.
  
  The proportion to sample can be expressed as a single number between 0
  and 1.  If @code{k} is the number specified, and @code{N} is the number
-of currently-selected cases in the active file, then after
+of currently-selected cases in the active dataset, then after
  @code{SAMPLE @var{k}.}, approximately @code{k*N} cases will be
  selected.
  
@@ -121,16 +101,16 @@ The proportion to sample can also be specified in the style @code{SAMPLE
  @enumerate
  @item
  If @var{N} is equal to the number of currently-selected cases in the
-active file, exactly @var{m} cases will be selected.
+active dataset, exactly @var{m} cases will be selected.
  
  @item
  If @var{N} is greater than the number of currently-selected cases in the
-active file, an equivalent proportion of cases will be selected.
+active dataset, an equivalent proportion of cases will be selected.
  
  @item
  If @var{N} is less than the number of currently-selected cases in the
-active, exactly @var{m} cases will be selected @emph{from the first
-@var{N} cases in the active file.}
+active dataset, exactly @var{m} cases will be selected @emph{from the
+first @var{N} cases in the active dataset.}
  @end enumerate
  
  @cmd{SAMPLE} and @cmd{SELECT IF} are performed in
@@ -146,7 +126,7 @@ samples may still result when the file is processed on systems with
  differing endianness or floating-point formats.  By default, the
  random number seed is based on the system time.
  
-@node SELECT IF, SPLIT FILE, SAMPLE, Data Selection
+@node SELECT IF
  @section SELECT IF
  @vindex SELECT IF
  
@@ -156,7 +136,7 @@ SELECT IF expression.
  
  @cmd{SELECT IF} selects cases for analysis based on the value of a
  boolean expression.  Cases not selected are permanently eliminated
-from the active file, unless @cmd{TEMPORARY} is in effect
+from the active dataset, unless @cmd{TEMPORARY} is in effect
  (@pxref{TEMPORARY}).
  
  Specify a boolean expression (@pxref{Expressions}).  If the value of the
@@ -172,7 +152,7 @@ When @cmd{SELECT IF} is specified following @cmd{TEMPORARY}
  (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
  (@pxref{LAG}).
  
-@node SPLIT FILE, TEMPORARY, SELECT IF, Data Selection
+@node SPLIT FILE
  @section SPLIT FILE
  @vindex SPLIT FILE
  
@@ -200,12 +180,12 @@ using a variable where like values are not adjacent in the working file,
  you should first sort the data by that variable (@pxref{SORT CASES}).
  
  Specify OFF to disable @cmd{SPLIT FILE} and resume analysis of the
-entire active file as a single group of data.
+entire active dataset as a single group of data.
  
  When @cmd{SPLIT FILE} is specified after @cmd{TEMPORARY}, it affects only
  the next procedure (@pxref{TEMPORARY}).
  
-@node TEMPORARY, WEIGHT, SPLIT FILE, Data Selection
+@node TEMPORARY
  @section TEMPORARY
  @vindex TEMPORARY
  
@@ -216,7 +196,7 @@ TEMPORARY.
  @cmd{TEMPORARY} is used to make the effects of transformations
  following its execution temporary.  These transformations will
  affect only the execution of the next procedure or procedure-like
-command.  Their effects will not be saved to the active file.
+command.  Their effects will not be saved to the active dataset.
  
  The only specification on @cmd{TEMPORARY} is the command name.
  
@@ -249,7 +229,7 @@ The data read by the first @cmd{DESCRIPTIVES} are 4, 5, 8,
-10.5, 13, 15.  The data read by the first @cmd{DESCRIPTIVES} are 1, 2,
+10.5, 13, 15.  The data read by the second @cmd{DESCRIPTIVES} are 1, 2,
  5, 7.5, 10, 12.
  
-@node WEIGHT,  , TEMPORARY, Data Selection
+@node WEIGHT
  @section WEIGHT
  @vindex WEIGHT
  
@@ -259,7 +239,7 @@ WEIGHT OFF.
  @end display
  
  @cmd{WEIGHT} assigns cases varying weights,
-changing the frequency distribution of the active file.  Execution of
+changing the frequency distribution of the active dataset.  Execution of
  @cmd{WEIGHT} is delayed until data have been read.
  
  If a variable name is specified, @cmd{WEIGHT} causes the values of that
@@ -282,6 +262,5 @@ values are not treated specially.
  When @cmd{WEIGHT} is specified after @cmd{TEMPORARY}, it affects only
  the next procedure (@pxref{TEMPORARY}).
  
-@cmd{WEIGHT} does not cause cases in the active file to be replicated in
-memory.
-@setfilename ignored
+@cmd{WEIGHT} does not cause cases in the active dataset to be
+replicated in memory.