Handle multiple postcomputes.

diff --git a/doc/statistics.texi b/doc/statistics.texi
index 8c4a2dde445ec6c67b802ca40665d7aa19a56c6c..4966d6afb8ef5ad078b56577be5a827d7af50be4 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -923,7 +923,7 @@ where each @i{axis} may be empty or take one of the following forms:
  @i{axis} + @i{axis}
  @i{axis} > @i{axis}
  (@i{axis})
-@i{axis} @t{(}@i{summary} [@i{string}] [@i{format}]@t{)}
+@i{axis} @t{[}@i{summary} [@i{string}] [@i{format}]@t{]}
  @end display
  
  The following subcommands precede the first @code{TABLE} subcommand
@@ -944,8 +944,8 @@ optional:
  @t{/MRSETS COUNTDUPLICATES=}@{@t{YES} @math{|} @t{NO}@}
  @end ignore
  @t{/SMISSING} @{@t{VARIABLE} @math{|} @t{LISTWISE}@}
-@t{/PCOMPUTE} @t{&}@i{category}@t{=EXPR(}@i{expression}@t{)}
-@t{/PPROPERTIES} @t{&}@i{category}@dots{}
+@t{/PCOMPUTE} @t{&}@i{postcompute}@t{=EXPR(}@i{expression}@t{)}
+@t{/PPROPERTIES} @t{&}@i{postcompute}@dots{}
      [@t{LABEL=}@i{string}]
      [@t{FORMAT=}[@i{summary} @i{format}]@dots{}]
      [@t{HIDESOURCECATS=}@{@t{NO} @math{|} @t{YES}@}
@@ -1242,6 +1242,12 @@ for each function is listed in parentheses:
  @item @code{COUNT} (``Count'')
  The sum of weights in a cell.
  
+If @code{CATEGORIES} for one or more of the variables in a table
+include missing values (@pxref{CTABLES Per-Variable Category
+Options}), then some or all of the categories for a cell might be
+missing values.  @code{COUNT} counts data included in a cell
+regardless of whether its categories are missing.
+
  @item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'')
  A percentage within the specified @var{area}.
  
@@ -1252,7 +1258,11 @@ A percentage of valid values within the specified @var{area}.
  A percentage of total values within the specified @var{area}.
  @end table
  
-The following summary functions apply only to scalar variables:
+The following summary functions apply only to scalar variables or to
+totals and subtotals for categorical variables.  Be cautious about
+interpreting the summary value in the latter case, because it is not
+necessarily meaningful; however, the mean of a Likert scale, etc.@:
+may have a straightforward interpretation.
  
  @table @asis
  @item @code{MAXIMUM} (``Maximum'')
@@ -1292,10 +1302,23 @@ The standard deviation.
  The sum.
  
  @item @code{TOTALN} (``Total N'')
-The sum of total count weights.
+The sum of weights in a cell.
+
+For scale data, @code{COUNT} and @code{TOTALN} are the same.
+
+For categorical data, @code{TOTALN} counts missing values in excluded
+categories, that is, user-missing values not in an explicit category
+list on @code{CATEGORIES} (@pxref{CTABLES Per-Variable Category
+Options}), or user-missing values excluded because
+@code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or
+system-missing values.  @code{COUNT} does not count these.
  
  @item @code{VALIDN} (``Valid N'')
-The sum of valid count weights.
+The sum of valid count weights in included categories.
+
+@code{VALIDN} does not count missing values regardless of whether they
+are in included categories via @code{CATEGORIES}.  @code{VALIDN} does
+not count valid values that are in excluded categories.
  
  @item @code{VARIANCE} (``Variance'')
  The variance.
@@ -1483,7 +1506,7 @@ order:
  
  @table @asis
  @item Explicit categories.
-@anchor{CTABLE Explicit Category List}
+@anchor{CTABLES Explicit Category List}
  To explicitly specify categories to include, list the categories
  within square brackets in the desired sort order.  Use spaces or
  commas to separate values.  Categories not covered by the list are
@@ -1516,7 +1539,7 @@ specify their category values.)
  Any non-missing value not covered by any other element of the list
  (regardless of where @t{OTHERNM} is placed in the list).
  
-@item &@i{pcompute}
+@item &@i{postcompute}
  A computed category name (@pxref{CTABLES Computed Categories}).
  @end table
  
@@ -1684,33 +1707,188 @@ is optional.
  
  With @code{SMISSING=VARIABLE}, which is the default, missing values
  are excluded on a variable-by-variable basis.  With
-@code{SMISSING=LISTWISE}, when scalar variables are stacked, a missing
-value for any of the scalar variables causes the case to be excluded
-for all of them.
+@code{SMISSING=LISTWISE}, when stacked scalar variables are nested
+together with a categorical variable, a missing value for any of the
+scalar variables causes the case to be excluded for all of them.
+
+As an example, consider the following dataset, in which @samp{x} is a
+categorical variable and @samp{y} and @samp{z} are scale:
+
+@psppoutput{ctables18}
+
+@noindent
+With the default missing-value treatment, @samp{y}'s mean is 20, based
+on the values 10, 20, and 30, and @samp{z}'s mean is 50, based on 40,
+50, and 60:
+
+@example
+CTABLES /TABLE (y + z) > x.
+@end example
+@psppoutput{ctables19}
+
+@noindent
+By adding @code{SMISSING=LISTWISE}, only cases where @samp{y} and
+@samp{z} are both non-missing are considered, so @samp{y}'s mean
+becomes 15, as the average of 10 and 20, and @samp{z}'s mean becomes
+55, the average of 50 and 60:
+
+@example
+CTABLES /SMISSING LISTWISE /TABLE (y + z) > x.
+@end example
+@psppoutput{ctables20}
+
+@noindent
+Even with @code{SMISSING=LISTWISE}, if @samp{y} and @samp{z} are
+separately nested with @samp{x}, instead of using a single @samp{>}
+operator, missing values revert to being considered on a
+variable-by-variable basis:
+
+@example
+CTABLES /SMISSING LISTWISE /TABLE (y > x) + (z > x).
+@end example
+@psppoutput{ctables21}
  
  @node CTABLES Computed Categories
  @subsection Computed Categories
  
  @display
-@t{/PCOMPUTE} @t{&}@i{category}@t{=EXPR(}@i{expression}@t{)}
-@t{/PPROPERTIES} @t{&}@i{category}@dots{}
-    [@t{LABEL=}@i{string}]
-    [@t{FORMAT=}[@i{summary} @i{format}]@dots{}]
-    [@t{HIDESOURCECATS=}@{@t{NO} @math{|} @t{YES}@}
+@t{/PCOMPUTE} @t{&}@i{postcompute}@t{=EXPR(}@i{expression}@t{)}
  @end display
  
  @dfn{Computed categories}, also called @dfn{postcomputes}, are
  categories created using arithmetic on categories obtained from the
  data.  The @code{PCOMPUTE} subcommand defines computed categories,
  which can then be used in two places: on @code{CATEGORIES} within an
-explicit category list (@pxref{CTABLE Explicit Category List}), and on
+explicit category list (@pxref{CTABLES Explicit Category List}), and on
  the @code{PPROPERTIES} subcommand to define further properties for a
  given postcompute.
  
  @code{PCOMPUTE} must precede the first @code{TABLE} command.  It is
-optional and it may be used multiple times to define multiple
+optional and it may be used any number of times to define multiple
  postcomputes.
  
+Each @code{PCOMPUTE} defines one postcompute.  Its syntax consists of
+a name for the postcompute, written as a @pspp{} identifier prefixed
+by @samp{&}, followed by @samp{=} and a postcompute expression enclosed
+in @code{EXPR(@dots{})}.  A postcompute expression consists of:
+
+@table @t
+@item [@i{category}]
+This form evaluates to the summary statistic for @i{category}, e.g.@:
+@code{[1]} evaluates to the value of the summary statistic associated
+with category 1.  The @i{category} may be a number, a quoted string,
+or a quoted time or date value, and all of the categories for a given
+postcompute must have the same form.
+
+@item [@i{min} THRU @i{max}]
+@itemx [LO THRU @i{max}]
+@itemx [@i{min} THRU HI]
+@itemx MISSING
+@itemx OTHERNM
+These forms evaluate to the summary statistics for categories matching
+the given syntax, as described in previous sections (@pxref{CTABLES
+Explicit Category List}).  If more than one category matches, their
+values are summed.
+
+@item SUBTOTAL
+The summary statistic for the subtotal category.  This form is allowed
+only for variables with exactly one subtotal.
+
+@item SUBTOTAL[@i{index}]
+The summary statistic for subtotal category @i{index}, where 1 is the
+first subtotal, 2 is the second, and so on.  This form may be used for
+any number of subtotals.
+
+@item TOTAL
+The summary statistic for the total.
+
+@item @i{a} + @i{b}
+@itemx @i{a} - @i{b}
+@itemx @i{a} * @i{b}
+@itemx @i{a} / @i{b}
+@itemx @i{a} ** @i{b}
+These forms perform arithmetic on the values of postcompute
+expressions @i{a} and @i{b}.  The usual operator precedence rules
+apply.
+
+@item @i{number}
+Numeric constants may be used in postcompute expressions.
+
+@item (@i{a})
+Parentheses override operator precedence.
+@end table
+
+A postcompute is not associated with any particular variable.
+Instead, it may be referenced within @code{CATEGORIES} for any
+suitable variable (e.g.@: only a string variable is suitable for a
+postcompute expression that refers to a string category, only a
+variable with subtotals for an expression that refers to subtotals,
+@dots{}).
+
+Normally a named postcompute is defined only once, but if a later
+@code{PCOMPUTE} redefines a postcompute with the same name as an
+earlier one, the later one takes precedence.
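+
+As an illustrative sketch (the variable @samp{a} and its category
+values 1, 2, and 3 are hypothetical), the following defines a
+postcompute that adds the summary statistics for categories 1 and 2
+and then places it in an explicit category list:
+
+@example
+CTABLES
+    /PCOMPUTE &sum12=EXPR([1] + [2])
+    /TABLE a
+    /CATEGORIES VARIABLES=a [1, 2, &sum12, 3].
+@end example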
+
+@node CTABLES Computed Category Properties
+@subsection Computed Category Properties
+
+@display
+@t{/PPROPERTIES} @t{&}@i{postcompute}@dots{}
+    [@t{LABEL=}@i{string}]
+    [@t{FORMAT=}[@i{summary} @i{format}]@dots{}]
+    [@t{HIDESOURCECATS=}@{@t{NO} @math{|} @t{YES}@}]
+@end display
+
+The @code{PPROPERTIES} subcommand, which must appear before
+@code{TABLE}, sets properties for one or more postcomputes defined on
+prior @code{PCOMPUTE} subcommands.  The subcommand syntax begins with
+the list of postcomputes, each prefixed with @samp{&} as specified on
+@code{PCOMPUTE}.
+
+All of the settings on @code{PPROPERTIES} are optional.  Use
+@code{LABEL} to set the label shown for the postcomputes in table
+output.  The default label for a postcompute is the expression used to
+define it.
+
+The @code{FORMAT} setting sets summary statistics and display formats
+for the postcomputes.
+
+By default, or with @code{HIDESOURCECATS=NO}, categories referred to
+by computed categories are displayed like other categories.  Use
+@code{HIDESOURCECATS=YES} to hide them.
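+
+For instance, assuming a hypothetical variable @samp{a} with
+categories 1, 2, and 3, a postcompute might be given a label and have
+its source categories hidden along these lines:
+
+@example
+CTABLES
+    /PCOMPUTE &sum12=EXPR([1] + [2])
+    /PPROPERTIES &sum12 LABEL='1 or 2' HIDESOURCECATS=YES
+    /TABLE a
+    /CATEGORIES VARIABLES=a [1, 2, &sum12, 3].
+@end example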
+
+@node CTABLES Base Weight
+@subsection Base Weight
+
+@display
+@t{/WEIGHT VARIABLE=}@i{variable}
+@end display
+
+The @code{WEIGHT} subcommand is optional and must appear before
+@code{TABLE}.  If it appears, it must name a numeric variable, known
+as the @dfn{effective base weight} or @dfn{adjustment weight}.  The
+effective base weight variable is used for the @code{ECOUNT},
+@code{ETOTALN}, and @code{EVALIDN} summary functions.
+
+Cases with zero, missing, or negative effective base weight are
+excluded from all analysis.
+
+Weights obtained from the @pspp{} dictionary are rounded to the
+nearest integer.  Effective base weights are not rounded.
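+
+As a sketch, assuming hypothetical variables @samp{a} (categorical)
+and @samp{w} (numeric), an effective base weight might be specified
+and then used by a summary function as follows:
+
+@example
+CTABLES /WEIGHT VARIABLE=w /TABLE a [ECOUNT].
+@end example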
+
+@node CTABLES Hiding Small Counts
+@subsection Hiding Small Counts
+
+@display
+@t{/HIDESMALLCOUNTS COUNT=@i{count}}
+@end display
+
+The @code{HIDESMALLCOUNTS} subcommand is optional.  If it is specified,
+then count values in output tables less than the value of @i{count}
+are shown as @code{<@i{count}} instead of their true values.  The
+value of @i{count} must be an integer and must be at least 2.  Case
+weights are considered for deciding whether to hide a count.
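+
+For example, assuming a hypothetical categorical variable @samp{a},
+the following would be expected to show counts below 5 as @code{<5}:
+
+@example
+CTABLES /HIDESMALLCOUNTS COUNT=5 /TABLE a.
+@end example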
+
  @node FACTOR
  @section FACTOR