X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=82855902f29c530d9257b8af3aa4ee2f8211f6c5;hb=fe829ccb26aa877de1068a3f1e237930ea8f2d34;hp=b1e040283d1b6dcf19231f9e5cf87607df71124f;hpb=f14ad7d1cee6228053fbadadfee9c86eb80da765;p=pspp

diff --git a/doc/statistics.texi b/doc/statistics.texi
index b1e040283d..82855902f2 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -1121,7 +1121,7 @@ When @pspp{} reads data from a file in an external format, such as a
 text file, variables' measurement levels are often unknown. If
 @code{CTABLES} runs when a variable has an unknown measurement level,
 it makes an initial pass through the data to guess measurement levels
-using the rules described earlier in this manual (@pxref{Measurement
+using the rules described in an earlier section (@pxref{Measurement
 Level}). Use the @code{VARIABLE LEVEL} command to set or change a
 variable's measurement level (@pxref{VARIABLE LEVEL}).
 
@@ -1293,12 +1293,17 @@ Options}), or user-missing values excluded because
 @code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or
 system-missing values. @code{COUNT} does not count these.
 
+@xref{CTABLES Missing Values for Summary Variables}, for details of
+how @code{CTABLES} summarizes missing values.
+
 @item @code{VALIDN} (``Valid N'', F40.0)
 The sum of valid count weights in included categories.
 
-@code{VALIDN} does not count missing values regardless of whether they
-are in included categories via @code{CATEGORIES}. @code{VALIDN} does
-not count valid values that are in excluded categories.
+For categorical variables, @code{VALIDN} does not count missing values
+regardless of whether they are in included categories via
+@code{CATEGORIES}. @code{VALIDN} does not count valid values that are
+in excluded categories. @xref{CTABLES Missing Values for Summary
+Variables}, for details.
 
 @item @code{VARIANCE} (``Variance'')
 The variance.
@@ -1375,10 +1380,11 @@ Percentage of the sum of the values within @var{area}.
 @node CTABLES Summary Functions for Adjusted Weights
 @subsubsection Summary Functions for Adjusted Weights
 
-If the @code{WEIGHT} subcommand specified an adjustment weight
-variable, then the following summary functions use its value instead
-of the dictionary weight variable. Otherwise, they are equivalent to
-the summary function without the @samp{E}-prefix:
+If the @code{WEIGHT} subcommand specified an effective weight variable
+(@pxref{CTABLES Effective Weight}), then the following summary functions
+use its value instead of the dictionary weight variable. Otherwise,
+they are equivalent to the summary function without the
+@samp{E}-prefix:
 
 @itemize @bullet
 @item
@@ -1421,7 +1427,7 @@ counts:
 @code{UMISSING} (``Unweighted Missing'')
 
 @item
-@code{UMODE} (``Unweight Mode'')
+@code{UMODE} (``Unweighted Mode'')
 
 @item
 @code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'', PCT40.1)
@@ -1697,7 +1703,20 @@ variables may be ``totaled'' indirectly by enabling totals and
 subtotals on a categorical variable within which the scalar variable
 is summarized.
 
-@c TODO Specifying summaries for totals and subtotals
+By default, @pspp{} uses the same summary functions for totals and
+subtotals as other categories. To summarize totals and subtotals
+differently, specify the summary functions for totals and subtotals
+after the ordinary summary functions inside a nested set of @code{[]}
+following @code{TOTALS}. For example, the following syntax displays
+@code{COUNT} for individual categories and totals and @code{VALIDN}
+for totals, as shown:
+
+@example
+CTABLES
+    /TABLE qnd7a [COUNT, TOTALS[COUNT, VALIDN]]
+    /CATEGORIES VARIABLES=qnd7a TOTAL=YES MISSING=INCLUDE.
+@end example
+@psppoutput {ctables26}
 
 @subsubheading Categories Without Values
 
@@ -1823,55 +1842,68 @@ Show nothing.
 @node CTABLES Missing Value Treatment
 @subsection Missing Value Treatment
 
+The @code{TABLE} subcommand on @code{CTABLES} specifies two different
+kinds of variables: variables that divide tables into cells (which are
+always categorical) and variables being summarized (which may be
+categorical or scale). @pspp{} treats missing values differently in
+each kind of variable:
 
+@itemize @bullet
+@item
+For variables that divide tables into cells, per-variable category
+options determine which data is analyzed. If any of the categories
+for such a variable would exclude a case, then that case is not
+included.
 
-The sections below describe how @code{CTABLES} treats missing values
-in categorical and scale variables.
+@item
+The treatment of missing values in variables being summarized varies
+between scale and categorical variables. The following section
+describes their treatment in detail.
 
-@node CTABLES Categorical Missing Values
-@subsubsection Categorical Missing Values
+By default, each summarized variable is considered separately for
+missing value treatment. A section below describes how to consider
+missing values listwise for summarizing scale variables.
+@end itemize
 
-For categorical variables, in most cases, values that are valid and in
-included categories are analyzed, and values that are missing or in
-excluded categories are not analyzed. (@xref{CTABLES Per-Variable
-Category Options}), for information on included and excluded
-categories.) The exact rules are shown in the following chart, in
-which cells that contain ``yes'' indicate that a value is analyzed:
+@node CTABLES Missing Values for Summary Variables
+@subsubsection Missing Values for Summary Variables
 
-@multitable {@headitemfont{System-Missing}} {Included Category} {Excluded Category}
-@headitem @tab Included Category @tab Excluded Category
-@item @headitemfont{Valid} @tab yes @tab ---
-@item @headitemfont{User-Missing} @tab yes [*] @tab --- [+]
-@item @headitemfont{System-Missing} @tab n/a [#] @tab --- [+]
-@end multitable
+For summary variables, values that are valid and in included
+categories are analyzed, and values that are missing or in excluded
+categories are not analyzed, with the following exceptions:
 
-@table @asis
-@item [*]
-Exceptions: The ``@t{VALIDN}'' summary functions (@code{VALIDN},
-@code{EVALIDN}, @code{UVALIDN}, @code{@i{area}PCT.VALIDN}, and
-@code{U@i{area}PCT.VALIDN}), which only count valid values in included
-categories.
+@itemize @bullet
+@item
+The ``@t{VALIDN}'' summary functions (@code{VALIDN}, @code{EVALIDN},
+@code{UVALIDN}, @code{@i{area}PCT.VALIDN}, and
+@code{U@i{area}PCT.VALIDN}) only count valid values in included
+categories (not missing values in included categories).
 
-@item [+]
-Exceptions: The ``@t{TOTALN}'' summary functions (@code{TOTALN},
-@code{ETOTALN}, @code{UTOTALN}, @code{@i{area}PCT.TOTALN}), and
-@code{U@i{area}PCT.TOTALN}, which count all values (valid and missing)
-in included categories and missing (but not valid) values in excluded
+@item
+The ``@t{TOTALN}'' summary functions (@code{TOTALN}, @code{ETOTALN},
+@code{UTOTALN}, @code{@i{area}PCT.TOTALN}, and
+@code{U@i{area}PCT.TOTALN}) count all values (valid and missing) in
+included categories and missing (but not valid) values in excluded
 categories.
-
-@item [#]
-System-missing values are never in included categories.
-@end table
+@end itemize
 
 @noindent
-The following table provides another view of the same information:
-
-@multitable {Missing values in excluded categories} {@code{VALIDN}} {other} {@code{TOTALN}}
-@headitem @tab @code{VALIDN} @tab other @tab @code{TOTALN}
-@item Valid values in included categories @tab yes @tab yes @tab yes
-@item Missing values in included categories @tab --- @tab yes @tab yes
-@item Missing values in excluded categories @tab --- @tab --- @tab yes
-@item Valid values in excluded categories @tab --- @tab --- @tab ---
+For categorical variables, system-missing values are never in included
+categories. For scale variables, there is no notion of included and
+excluded categories, so all values are effectively included.
+
+The following table provides another view of the above rules:
+
+@multitable {@w{ }@w{ }@w{ }@w{ }Missing values in excluded categories} {@t{VALIDN}} {other} {@t{TOTALN}}
+@headitem @tab @t{VALIDN} @tab other @tab @t{TOTALN}
+@item @headitemfont{Categorical variables:}
+@item @w{ }@w{ }@w{ }@w{ }Valid values in included categories @tab yes @tab yes @tab yes
+@item @w{ }@w{ }@w{ }@w{ }Missing values in included categories @tab --- @tab yes @tab yes
+@item @w{ }@w{ }@w{ }@w{ }Missing values in excluded categories @tab --- @tab --- @tab yes
+@item @w{ }@w{ }@w{ }@w{ }Valid values in excluded categories @tab --- @tab --- @tab ---
+@item @headitemfont{Scale variables:}
+@item @w{ }@w{ }@w{ }@w{ }Valid values @tab yes @tab yes @tab yes
+@item @w{ }@w{ }@w{ }@w{ }User- or system-missing values @tab --- @tab yes @tab yes
 @end multitable
 
 @node CTABLES Scale Missing Values
@@ -2041,8 +2073,8 @@ By default, or with @code{HIDESOURCECATS=NO}, categories referred to
 by computed categories are displayed like other categories. Use
 @code{HIDESOURCECATS=YES} to hide them.
-@node CTABLES Base Weight
-@subsection Base Weight
+@node CTABLES Effective Weight
+@subsection Effective Weight
 
 @display
 @t{/WEIGHT VARIABLE=}@i{variable}
@@ -2050,17 +2082,17 @@ by computed categories are displayed like other categories. Use
 
 The @code{WEIGHT} subcommand is optional and must appear before
 @code{TABLE}. If it appears, it must name a numeric variable, known
-as the @dfn{effective base weight} or @dfn{adjustment weight}. The
-effective base weight variable stands in for the dictionary's weight
+as the @dfn{effective weight} or @dfn{adjustment weight}. The
+effective weight variable stands in for the dictionary's weight
 variable (@pxref{WEIGHT}), if any, in most calculations in
 @code{CTABLES}. The only exceptions are the @code{COUNT},
 @code{TOTALN}, and @code{VALIDN} summary functions, which use the
 dictionary weight instead.
 
 Weights obtained from the @pspp{} dictionary are rounded to the
-nearest integer at the case level. Effective base weights are not
-rounded. Regardless of the weighting source, @pspp{} does not analyze
-cases with zero, missing, or negative effective weights.
+nearest integer at the case level. Effective weights are not rounded.
+Regardless of the weighting source, @pspp{} does not analyze cases
+with zero, missing, or negative effective weights.
 
 @node CTABLES Hiding Small Counts
 @subsection Hiding Small Counts