work on getting better output into documentation

diff --git a/doc/data-selection.texi b/doc/data-selection.texi

index 9d8cbbf2f51d61dfa796fa0f0b65e6527a6a0a06..71b4e3de5aa45a8b4c72a67ff8725157446731f4 100644
--- a/doc/data-selection.texi
+++ b/doc/data-selection.texi
@@ -1,5 +1,5 @@
  @c PSPP - a program for statistical analysis.
-@c Copyright (C) 2017 Free Software Foundation, Inc.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
  @c Permission is granted to copy, distribute and/or modify this document
  @c under the terms of the GNU Free Documentation License, Version 1.3
  @c or any later version published by the Free Software Foundation;
@@ -76,7 +76,7 @@ When @cmd{N OF CASES} is used along with @cmd{SAMPLE} or @cmd{SELECT
  IF}, the case limit is applied to the cases obtained after sampling or
  case selection, regardless of how @cmd{N OF CASES} is placed relative
  to @cmd{SAMPLE} or @cmd{SELECT IF} in the command file.  Thus, the
-commands @code{N OF CASES 100} and @code{SAMPLE .5} will both randomly
+commands @code{N OF CASES 100} and @code{SAMPLE .5} both randomly
  sample approximately half of the active dataset's cases, then select the
  first 100 of those sampled, regardless of their order in the command
  file.
@@ -101,7 +101,7 @@ transformation, permanently removing cases from the active dataset.
  The proportion to sample can be expressed as a single number between 0
  and 1.  If @var{k} is the number specified, and @var{N} is the number
  of currently-selected cases in the active dataset, then after
-@subcmd{SAMPLE @var{k}.}, approximately @var{k}*@var{N} cases will be
+@subcmd{SAMPLE @var{k}.}, approximately @var{k}*@var{N} cases are
  selected.
  
  The proportion to sample can also be specified in the style @subcmd{SAMPLE
@@ -110,15 +110,15 @@ The proportion to sample can also be specified in the style @subcmd{SAMPLE
  @enumerate
  @item
  If @var{N} is equal to the number of currently-selected cases in the
-active dataset, exactly @var{m} cases will be selected.
+active dataset, exactly @var{m} cases are selected.
  
  @item
  If @var{N} is greater than the number of currently-selected cases in the
-active dataset, an equivalent proportion of cases will be selected.
+active dataset, an equivalent proportion of cases is selected.
  
  @item
  If @var{N} is less than the number of currently-selected cases in the
-active, exactly @var{m} cases will be selected @emph{from the first
+active dataset, exactly @var{m} cases are selected @emph{from the first
  @var{N} cases in the active dataset.}
  @end enumerate
  
@@ -149,8 +149,8 @@ from the active dataset, unless @cmd{TEMPORARY} is in effect
  (@pxref{TEMPORARY}).
  
  Specify a boolean expression (@pxref{Expressions}).  If the value of the
-expression is true for a particular case, the case will be analyzed.  If
-the expression has a false or missing value, then the case will be
+expression is true for a particular case, the case is analyzed.  If
+the expression has a false or missing value, then the case is
  deleted from the data stream.
  
  Place @cmd{SELECT IF} as early in the command file as
@@ -184,7 +184,7 @@ When a list of variable names is specified, one of the keywords
  @subcmd{LAYERED} or @subcmd{SEPARATE} may also be specified.  If provided, either
  keyword are ignored.
  
-Groups are formed only by @emph{adjacent} cases.  To create a split 
+Groups are formed only by @emph{adjacent} cases.  To create a split
  using a variable where like values are not adjacent in the working file,
  you should first sort the data by that variable (@pxref{SORT CASES}).
  
@@ -194,6 +194,40 @@ entire active dataset as a single group of data.
  When @cmd{SPLIT FILE} is specified after @cmd{TEMPORARY}, it affects only
  the next procedure (@pxref{TEMPORARY}).
  
+@subsection Example Split
+
+The file @file{horticulture.sav} contains data describing the @exvar{yield}
+of a number of horticultural specimens which have been subjected to
+various @exvar{treatment}s.  To investigate linear statistics of the
+@exvar{yield}, one option is the @cmd{DESCRIPTIVES} command (@pxref{DESCRIPTIVES}).
+However, it is reasonable to expect the mean to be different depending
+on the @exvar{treatment}.   So we might want to perform three separate
+procedures --- one for each treatment.
+@footnote{There are other, possibly better, ways to achieve a similar result
+using the @cmd{MEANS} or @cmd{EXAMINE} commands.}
+@ref{split:ex} shows how this can be done automatically using
+the @cmd{SPLIT FILE} command.
+
+@float Example, split:ex
+@psppsyntax {split.sps}
+@caption {Running @cmd{DESCRIPTIVES} on each value of @exvar{treatment}}
+@end float
+
+In @ref{split:res} you can see that the table of descriptive statistics
+appears 3 times --- once for each value of @exvar{treatment}.
+In this example, @samp{N}, the number of observations, is identical in
+all splits.  This is because the experiment was deliberately designed
+that way.  However, in general one can expect a different @samp{N} for each
+split.
+
+@float Example, split:res
+@psppoutput {split}
+@caption {The results of running @cmd{DESCRIPTIVES} with an active split}
+@end float
+
+Unless @cmd{TEMPORARY} was used, after a split has been defined for
+a dataset it remains active until explicitly disabled.
+
  @node TEMPORARY
  @section TEMPORARY
  @vindex TEMPORARY
@@ -203,9 +237,9 @@ TEMPORARY.
  @end display
  
  @cmd{TEMPORARY} is used to make the effects of transformations
-following its execution temporary.  These transformations will
+following its execution temporary.  These transformations
  affect only the execution of the next procedure or procedure-like
-command.  Their effects will not be saved to the active dataset.
+command.  Their effects are not saved to the active dataset.
  
  The only specification on @cmd{TEMPORARY} is the command name.
  
@@ -260,10 +294,10 @@ procedures.  Use of keyword @subcmd{BY} is optional but recommended.  Weighting
  variables must be numeric.  Scratch variables may not be used for
  weighting (@pxref{Scratch Variables}).
  
-When @subcmd{OFF} is specified, subsequent statistical procedures will weight all
+When @subcmd{OFF} is specified, subsequent statistical procedures weight all
  cases equally.
  
-A positive integer weighting factor @var{w} on a case will yield the
+A positive integer weighting factor @var{w} on a case yields the
  same statistical output as would replicating the case @var{w} times.
  A weighting factor of 0 is treated for statistical purposes as if the
  case did not exist in the input.  Weighting values need not be
@@ -276,3 +310,27 @@ the next procedure (@pxref{TEMPORARY}).
  
  @cmd{WEIGHT} does not cause cases in the active dataset to be
  replicated in memory.
+
+
+@subsection Example Weights
+
+One could define a dataset containing an inventory of stock items.
+It would be reasonable to use a string variable for a description of the
+item, and a numeric variable for the number in stock, as in @ref{weight:ex}.
+
+@float Example, weight:ex
+@psppsyntax {weight.sps}
+@caption {Setting the weight on the variable @exvar{quantity}}
+@end float
+
+One analysis that would almost certainly be of interest is
+the relative amount of each item in stock.
+However, without setting a weight variable, @cmd{FREQUENCIES}
+(@pxref{FREQUENCIES}) does not tell us what we want to know, since
+there is only one case for each stock item.  @ref{weight:res} shows the
+difference between the weighted and unweighted frequency tables.
+
+@float Example, weight:res
+@psppoutput {weight}
+@caption {Weighted and unweighted frequency tables of @exvar{items}}
+@end float
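
The patch refers to weight.sps through @psppsyntax but does not show its contents. The following is only a sketch of what such a syntax file could look like, based on the prose added above (a string variable describing each stock item, a numeric variable holding the number in stock). The variable names item and quantity, the field widths, and all data values are assumptions made for illustration and are not taken from the commit.

```
* Hypothetical inventory: one case per stock item.
* Variable names, widths and values are illustrative assumptions.
DATA LIST LIST NOTABLE /item (A16) quantity (F8.0).
BEGIN DATA
nuts     345
screws   10034
washers  32012
bolts    876
END DATA.

* Unweighted: every item counts once, because there is one case per item.
FREQUENCIES /VARIABLES=item.

* Weighted: each case now counts as many times as its quantity.
WEIGHT BY quantity.
FREQUENCIES /VARIABLES=item.

* Restore equal weighting for any later procedures.
WEIGHT OFF.
```

Running FREQUENCIES once before and once after WEIGHT BY matches the weighted and unweighted comparison that the new text describes. The final WEIGHT OFF is included because the documentation states that OFF makes subsequent procedures weight all cases equally.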