X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=4966d6afb8ef5ad078b56577be5a827d7af50be4;hb=refs%2Fheads%2Fctables12;hp=adf1f67d95dac39ae5a6f8d04061c0874671867f;hpb=b9f78e2b4d46e219985adc8ab22adf8f67e08b0a;p=pspp diff --git a/doc/statistics.texi b/doc/statistics.texi index adf1f67d95..4966d6afb8 100644 --- a/doc/statistics.texi +++ b/doc/statistics.texi @@ -923,7 +923,7 @@ where each @i{axis} may be empty or take one of the following forms: @i{axis} + @i{axis} @i{axis} > @i{axis} (@i{axis}) -@i{axis} @t{(}@i{summary} [@i{string}] [@i{format}]@t{)} +@i{axis} @t{[}@i{summary} [@i{string}] [@i{format}]@t{]} @end display The following subcommands precede the first @code{TABLE} subcommand @@ -1242,6 +1242,12 @@ for each function is listed in parentheses: @item @code{COUNT} (``Count'') The sum of weights in a cell. +If @code{CATEGORIES} for one or more of the variables in a table +include missing values (@pxref{CTABLES Per-Variable Category +Options}), then some or all of the categories for a cell might be +missing values. @code{COUNT} counts data included in a cell +regardless of whether its categories are missing. + @item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'') A percentage within the specified @var{area}. @@ -1252,7 +1258,11 @@ A percentage of valid values within the specified @var{area}. A percentage of total values within the specified @var{area}. @end table -The following summary functions apply only to scalar variables: +The following summary functions apply only to scalar variables or +totals and subtotals for categorical variables. Be cautious about +interpreting the summary value in the latter case, because it is not +necessarily meaningful; however, the mean of a Likert scale, etc.@: +may have a straightforward interpreation. @table @asis @item @code{MAXIMUM} (``Maximum'') @@ -1292,10 +1302,23 @@ The standard deviation. The sum. 
@item @code{TOTALN} (``Total N'') -The sum of total count weights. +The sum of weights in a cell. + +For scale data, @code{COUNT} and @code{TOTALN} are the same. + +For categorical data, @code{TOTALN} counts missing values in excluded +categories, that is, user-missing values not in an explicit category +list on @code{CATEGORIES} (@pxref{CTABLES Per-Variable Category +Options}), or user-missing values excluded because +@code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or +system-missing values. @code{COUNT} does not count these. @item @code{VALIDN} (``Valid N'') -The sum of valid count weights. +The sum of valid count weights in included categories. + +@code{VALIDN} does not count missing values regardless of whether they +are in included categories via @code{CATEGORIES}. @code{VALIDN} does +not count valid values that are in excluded categories. @item @code{VARIANCE} (``Variance'') The variance. @@ -1483,7 +1506,7 @@ order: @table @asis @item Explicit categories. -@anchor{CTABLE Explicit Category List} +@anchor{CTABLES Explicit Category List} To explicitly specify categories to include, list the categories within square brackets in the desired sort order. Use spaces or commas to separate values. Categories not covered by the list are @@ -1684,9 +1707,46 @@ is optional. With @code{SMISSING=VARIABLE}, which is the default, missing values are excluded on a variable-by-variable basis. With -@code{SMISSING=LISTWISE}, when scalar variables are stacked, a missing -value for any of the scalar variables causes the case to be excluded -for all of them. +@code{SMISSING=LISTWISE}, when stacked scalar variables are nested +together with a categorical variable, a missing value for any of the +scalar variables causes the case to be excluded for all of them. 
+ +As an example, consider the following dataset, in which @samp{x} is a +categorical variable and @samp{y} and @samp{z} are scale: + +@psppoutput{ctables18} + +@noindent +With the default missing-value treatment, @samp{y}'s mean is 20, based +on the values 10, 20, and 30, and @samp{z}'s mean is 50, based on 40, +50, and 60: + +@example +CTABLES /TABLE (y + z) > x. +@end example +@psppoutput{ctables19} + +@noindent +By adding @code{SMISSING=LISTWISE}, only cases where @samp{y} and +@samp{z} are both non-missing are considered, so @samp{y}'s mean +becomes 15, as the average of 10 and 20, and @samp{z}'s mean becomes +55, the average of 50 and 60: + +@example +CTABLES /SMISSING LISTWISE /TABLE (y + z) > x. +@end example +@psppoutput{ctables20} + +@noindent +Even with @code{SMISSING=LISTWISE}, if @samp{y} and @samp{z} are +separately nested with @samp{x}, instead of using a single @samp{>} +operator, missing values revert to being considered on a +variable-by-variable basis: + +@example +CTABLES /SMISSING LISTWISE /TABLE (y > x) + (z > x). +@end example +@psppoutput{ctables21} @node CTABLES Computed Categories @subsection Computed Categories @@ -1699,7 +1759,7 @@ for all of them. categories created using arithmetic on categories obtained from the data. The @code{PCOMPUTE} subcommand defines computed categories, which can then be used in two places: on @code{CATEGORIES} within an -explicit category list (@pxref{CTABLE Explicit Category List}), and on +explicit category list (@pxref{CTABLES Explicit Category List}), and on the @code{PPROPERTIES} subcommand to define further properties for a given postcompute. @@ -1726,7 +1786,7 @@ postcompute must have the same form. @itemx MISSING @itemx OTHERNM These forms evaluate to the summary statistics for categories matching -the given syntax, as described in previous sections (@pxref{CTABLE +the given syntax, as described in previous sections (@pxref{CTABLES Explicit Category List}). 
If more than one category matches, their values are summed.
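
Reviewer note on the SMISSING hunk: the arithmetic in the new example can be checked with a small sketch. This is not PSPP code and does not call any PSPP API. The four cases below are a hypothetical reconstruction, chosen only so that the means match the figures quoted in the patch (20 and 50 variable-by-variable, 15 and 55 listwise); the dataset actually shown in ctables18 may differ, and the function names are made up for illustration.

```python
# Hypothetical reconstruction of the example data: each case is (y, z),
# with None standing in for a missing value. The categorical variable x
# is left out because here it only groups the cases.
CASES = [(None, 40), (10, 50), (20, 60), (30, None)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def variable_wise_means(cases):
    """Mimic SMISSING=VARIABLE: drop missing values per variable."""
    ys = [y for y, _ in cases if y is not None]
    zs = [z for _, z in cases if z is not None]
    return mean(ys), mean(zs)


def listwise_means(cases):
    """Mimic SMISSING=LISTWISE: keep only cases with both y and z present."""
    complete = [(y, z) for y, z in cases if y is not None and z is not None]
    return mean([y for y, _ in complete]), mean([z for _, z in complete])


print(variable_wise_means(CASES))  # (20.0, 50.0)
print(listwise_means(CASES))       # (15.0, 55.0)
```

Under these assumed cases, the listwise means drop the case where only y is present and the case where only z is present, which is why both means shift.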