X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-selection.texi;h=04269f32413733af6d626c3320b3142755264912;hb=HEAD;hp=4b6a748deaac594b4763d6b7751942dd1c42e753;hpb=e5279727df5324fbdc465e983043d27d26d2b42d;p=pspp-builds.git

diff --git a/doc/data-selection.texi b/doc/data-selection.texi
index 4b6a748d..04269f32 100644
--- a/doc/data-selection.texi
+++ b/doc/data-selection.texi
@@ -1,4 +1,4 @@
-@node Data Selection, Conditionals and Looping, Data Manipulation, Top
+@node Data Selection
 @chapter Selecting data for analysis
 
 This chapter documents PSPP commands that temporarily or permanently
@@ -7,7 +7,6 @@ select data records from the active file for analysis.
 @menu
 * FILTER:: Exclude cases based on a variable.
 * N OF CASES:: Limit the size of the active file.
-* PROCESS IF:: Temporarily excluding cases.
 * SAMPLE:: Select a specified proportion of cases.
 * SELECT IF:: Permanently delete selected cases.
 * SPLIT FILE:: Do multiple analyses with one command.
@@ -15,7 +14,7 @@ select data records from the active file for analysis.
 * WEIGHT:: Weight cases by a variable.
 @end menu
 
-@node FILTER, N OF CASES, Data Selection, Data Selection
+@node FILTER
 @section FILTER
 @vindex FILTER
 
@@ -44,7 +43,7 @@ case filtering continues until it is explicitly turned off with
 @code{FILTER OFF}. However, if @cmd{FILTER} is placed after TEMPORARY, it filters only
 the next procedure or procedure-like command.
 
-@node N OF CASES, PROCESS IF, FILTER, Data Selection
+@node N OF CASES
 @section N OF CASES
 @vindex N OF CASES
 
@@ -52,87 +51,33 @@ the next procedure or procedure-like command.
 N [OF CASES] num_of_cases [ESTIMATED].
 @end display
 
-Sometimes you may want to disregard cases of your input. @cmd{N} can
-do this. @code{N 100} tells PSPP to disregard all cases after the
-first 100.
-
-If the value specified for @cmd{N} is greater than the number of cases
-read in, the value is ignored.
-
-@cmd{N} does not discard cases or prevent them from being read. It
-just causes cases beyond the last one specified to be ignored by data
-analysis commands.
-
-A later @cmd{N} command can increase or decrease the number of cases
-selected. (To select all the cases without knowing how many there are,
-specify a very high number: 100000 or whatever you think is large enough.)
-
-Transformation procedures performed after @cmd{N} is executed
-@emph{do} cause cases to be discarded.
-
-@cmd{SAMPLE}, @cmd{PROCESS IF}, and @cmd{SELECT IF} have
-precedence over @cmd{N}---the same results are obtained by both of the
-following fragments, given the same random number seeds:
-
-@example
-@i{@dots{}set up, read in data@dots{}}
-N 100.
-SAMPLE .5.
-@i{@dots{}analyze data@dots{}}
-
-@i{@dots{}set up, read in data@dots{}}
-SAMPLE .5.
-N 100.
-@i{@dots{}analyze data@dots{}}
-@end example
-
-Both fragments above first randomly sample approximately half of the
-cases, then select the first 100 of those sampled.
-
-@cmd{N} with the @code{ESTIMATED} keyword gives an
-estimated number of cases before @cmd{DATA LIST} or another command to
-read in data. @code{ESTIMATED} never limits the number of cases
-processed by procedures. PSPP currently does not make use of
-case count estimates.
-
-When @cmd{N} is specified after @cmd{TEMPORARY}, it affects only
-the next procedure (@pxref{TEMPORARY}).
-
-@node PROCESS IF, SAMPLE, N OF CASES, Data Selection
-@section PROCESS IF
-@vindex PROCESS IF
-
-@example
-PROCESS IF expression.
-@end example
-
-@cmd{PROCESS IF} temporarily eliminates cases from the
-data stream. Its effects are active only through the execution of the
-next procedure or procedure-like command.
-
-Specify a boolean expression (@pxref{Expressions}). If the value of the
-expression is true for a particular case, the case will be analyzed. If
-the expression has a false or missing value, then the case will be
-deleted from the data stream for this procedure only.
-
-Regardless of its placement relative to other commands, @cmd{PROCESS IF}
-always takes effect immediately before data passes to the procedure.
-Only one @cmd{PROCESS IF} command may be in effect at any given time.
-
-The effects of @cmd{PROCESS IF} are similar, but not identical, to the
-effects of executing @cmd{TEMPORARY}, then @cmd{SELECT IF}
-(@pxref{SELECT IF}).
-
-The filtering performed by @cmd{PROCESS IF} takes place immediately
-before cases pass to a procedure for analysis. Because @cmd{PROCESS
-IF} affects only a single procedure, its placement relative to
-@cmd{TEMPORARY} is unimportant.
-
-@cmd{PROCESS IF} is deprecated. It is included for compatibility with
-old command files. New syntax files should use @cmd{SELECT IF} or
-@cmd{FILTER} instead.
-
-@node SAMPLE, SELECT IF, PROCESS IF, Data Selection
+@cmd{N OF CASES} limits the number of cases processed by any
+procedures that follow it in the command stream. @code{N OF CASES
+100}, for example, tells PSPP to disregard all cases after the first
+100.
+
+When @cmd{N OF CASES} is specified after @cmd{TEMPORARY}, it affects
+only the next procedure (@pxref{TEMPORARY}). Otherwise, cases beyond
+the limit specified are not processed by any later procedure.
+
+If the limit specified on @cmd{N OF CASES} is greater than the number
+of cases in the active file, it has no effect.
+
+When @cmd{N OF CASES} is used along with @cmd{SAMPLE} or @cmd{SELECT
+IF}, the case limit is applied to the cases obtained after sampling or
+case selection, regardless of how @cmd{N OF CASES} is placed relative
+to @cmd{SAMPLE} or @cmd{SELECT IF} in the command file. Thus, the
+commands @code{N OF CASES 100} and @code{SAMPLE .5} will both randomly
+sample approximately half of the active file's cases, then select the
+first 100 of those sampled, regardless of their order in the command
+file.
+
+@cmd{N OF CASES} with the @code{ESTIMATED} keyword gives an estimated
+number of cases before @cmd{DATA LIST} or another command to read in
+data. @code{ESTIMATED} never limits the number of cases processed by
+procedures. PSPP currently does not make use of case count estimates.
+
+@node SAMPLE
 @section SAMPLE
 @vindex SAMPLE
 
@@ -181,7 +126,7 @@ samples may still result when the file is processed on systems with
 differing endianness or floating-point formats. By default, the
 random number seed is based on the system time.
 
-@node SELECT IF, SPLIT FILE, SAMPLE, Data Selection
+@node SELECT IF
 @section SELECT IF
 @vindex SELECT IF
 
@@ -207,7 +152,7 @@ When @cmd{SELECT IF} is specified following @cmd{TEMPORARY}
 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
 (@pxref{LAG}).
 
-@node SPLIT FILE, TEMPORARY, SELECT IF, Data Selection
+@node SPLIT FILE
 @section SPLIT FILE
 @vindex SPLIT FILE
 
@@ -240,7 +185,7 @@ entire active file as a single group of data.
 When @cmd{SPLIT FILE} is specified after @cmd{TEMPORARY}, it affects only
 the next procedure (@pxref{TEMPORARY}).
 
-@node TEMPORARY, WEIGHT, SPLIT FILE, Data Selection
+@node TEMPORARY
 @section TEMPORARY
 @vindex TEMPORARY
 
@@ -284,7 +229,7 @@ The data read by the first @cmd{DESCRIPTIVES} are 4, 5, 8, 10.5, 13,
 15. The data read by the first @cmd{DESCRIPTIVES} are 1, 2, 5, 7.5,
 10, 12.
 
-@node WEIGHT, , TEMPORARY, Data Selection
+@node WEIGHT
 @section WEIGHT
 @vindex WEIGHT
 
@@ -319,4 +264,3 @@ the next procedure (@pxref{TEMPORARY}).
 
 @cmd{WEIGHT} does not cause cases in the active file to be
 replicated in memory.
-@setfilename ignored
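
Reviewer note: the rewritten N OF CASES text states that the case limit is applied after sampling or case selection, regardless of command order. The toy Python model below is only a sketch of that documented behavior, not PSPP code; the function `run_procedure`, the `(name, argument)` command tuples, and the seeded generator are all hypothetical names introduced for illustration.

```python
import random


def run_procedure(cases, commands, seed=0):
    """Toy model (an assumption, not PSPP's implementation):
    SAMPLE is applied first and N OF CASES second, whatever
    order the two commands appear in."""
    rng = random.Random(seed)  # fixed seed so both orderings draw the same sample
    fraction = next((arg for name, arg in commands if name == "SAMPLE"), None)
    limit = next((arg for name, arg in commands if name == "N OF CASES"), None)

    selected = list(cases)
    if fraction is not None:
        # Keep each case with probability `fraction` (approximate proportion).
        selected = [c for c in selected if rng.random() < fraction]
    if limit is not None:
        # The limit truncates what remains after sampling; a limit larger
        # than the number of remaining cases has no effect.
        selected = selected[:limit]
    return selected


limit_first = run_procedure(range(1000), [("N OF CASES", 100), ("SAMPLE", 0.5)])
sample_first = run_procedure(range(1000), [("SAMPLE", 0.5), ("N OF CASES", 100)])
unlimited = run_procedure(range(10), [("N OF CASES", 50)])
```

With the same seed, both orderings select the same cases, which mirrors the claim in the added documentation paragraph.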