X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=82855902f29c530d9257b8af3aa4ee2f8211f6c5;hb=fe829ccb26aa877de1068a3f1e237930ea8f2d34;hp=1b0a294218608be25f11f92e67b935472d8b9824;hpb=c6caca37f19989f96ad843e2baee09a54c4f23ba;p=pspp diff --git a/doc/statistics.texi b/doc/statistics.texi index 1b0a294218..82855902f2 100644 --- a/doc/statistics.texi +++ b/doc/statistics.texi @@ -1014,12 +1014,6 @@ In @code{TABLE}, each of @var{rows}, @var{columns}, and @var{layers} is either empty or an axis expression that specifies one or more variables. At least one must specify an axis expression. -@menu -* CTABLES Categorical Variable Basics:: -* CTABLES Scalar Variable Basics:: -* CTABLES Overriding Measurement Level:: -@end menu - @node CTABLES Categorical Variable Basics @subsubsection Categorical Variables @@ -1127,7 +1121,7 @@ When @pspp{} reads data from a file in an external format, such as a text file, variables' measurement levels are often unknown. If @code{CTABLES} runs when a variable has an unknown measurement level, it makes an initial pass through the data to guess measurement levels -using the rules described earlier in this manual (@pxref{Measurement +using the rules described in an earlier section (@pxref{Measurement Level}). Use the @code{VARIABLE LEVEL} command to set or change a variable's measurement level (@pxref{VARIABLE LEVEL}). @@ -1155,8 +1149,6 @@ sets. 
@node CTABLES Data Summarization @subsection Data Summarization -@c TODO Summary function default formats - The @code{CTABLES} command allows the user to control how the data are summarized with @dfn{summary specifications}, syntax that lists one or more summary function names, optionally separated by commas, and which @@ -1188,6 +1180,31 @@ CTABLES /TABLE=AgeGroup [COLPCT 'Gender %' PCT5.0, @end example @psppoutput {ctables11} +In addition to the standard formats, @code{CTABLES} allows the user to +specify the following special formats: + +@multitable {@code{NEGPAREN@i{w}.@i{d}}} {Encloses all numbers in parentheses.} {@t{(42.96%)}} {@t{(-42.96%)}} +@item @code{NEGPAREN@i{w}.@i{d}} +@tab Encloses negative numbers in parentheses. +@tab @t{@w{ }42.96} +@tab @t{@w{ }(42.96)} + +@item @code{NEQUAL@i{w}.@i{d}} +@tab Adds an @code{N=} prefix. +@tab @t{@w{ }N=42.96} +@tab @t{@w{ }N=-42.96} + +@item @code{PAREN@i{w}.@i{d}} +@tab Encloses all numbers in parentheses. +@tab @t{@w{ }(42.96)} +@tab @t{@w{ }(-42.96)} + +@item @code{PCTPAREN@i{w}.@i{d}} +@tab Encloses all numbers in parentheses with a @samp{%} suffix. +@tab @t{@w{ }(42.96%)} +@tab @t{(-42.96%)} +@end multitable + Parentheses provide a shorthand to apply summary specifications to multiple variables. For example, both of these commands: @@ -1201,14 +1218,10 @@ produce the same output shown below: @psppoutput {ctables12} -The following sections list the available summary functions. - -@menu -* CTABLES Summary Functions for Individual Cells:: -* CTABLES Summary Functions for Groups of Cells:: -* CTABLES Summary Functions for Adjusted Weights:: -* CTABLES Unweighted Summary Functions:: -@end menu +The following sections list the available summary functions. Each +function's name is followed by its default label and format. If no +format is listed, then the default format is the print format for the +variable being summarized.
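As an informal illustration of the special formats tabulated above (@code{NEGPAREN}, @code{NEQUAL}, @code{PAREN}, and @code{PCTPAREN}), the following Python sketch reproduces the example renderings from that table. It is not @pspp{} code: the function name @code{format_special} is hypothetical, and the sketch assumes that the width @i{w} only pads the result, so it handles just the number of decimals @i{d}.

```python
def format_special(value, kind, decimals=2):
    """Render VALUE the way the special-formats table shows for KIND.

    Hypothetical helper for illustration only; field width and padding
    are ignored (an assumption, not documented PSPP behavior).
    """
    if kind == "NEGPAREN":
        # Parentheses stand in for the minus sign on negative numbers.
        body = f"{abs(value):.{decimals}f}"
        return f"({body})" if value < 0 else body

    body = f"{value:.{decimals}f}"
    if kind == "NEQUAL":
        return f"N={body}"      # "N=" prefix
    if kind == "PAREN":
        return f"({body})"      # always parenthesized
    if kind == "PCTPAREN":
        return f"({body}%)"     # parenthesized with "%" suffix
    raise ValueError(f"unknown special format: {kind}")


if __name__ == "__main__":
    for kind in ("NEGPAREN", "NEQUAL", "PAREN", "PCTPAREN"):
        print(kind, format_special(42.96, kind), format_special(-42.96, kind))
```

The values 42.96 and -42.96 are the same ones used in the table, so the printed strings can be compared against its last two columns, leading padding aside.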
@node CTABLES Summary Functions for Individual Cells @subsubsection Summary Functions for Individual Cells @@ -1218,7 +1231,7 @@ individual cell in @code{CTABLES}. Only one such summary function, @code{COUNT}, may be applied to both categorical and scale variables: @table @asis -@item @code{COUNT} (``Count'') +@item @code{COUNT} (``Count'', F40.0) The sum of weights in a cell. If @code{CATEGORIES} for one or more of the variables in a table @@ -1268,7 +1281,7 @@ The standard deviation. @item @code{SUM} (``Sum'') The sum. -@item @code{TOTALN} (``Total N'') +@item @code{TOTALN} (``Total N'', F40.0) The sum of weights in a cell. For scale data, @code{COUNT} and @code{TOTALN} are the same. @@ -1280,12 +1293,17 @@ Options}), or user-missing values excluded because @code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or system-missing values. @code{COUNT} does not count these. -@item @code{VALIDN} (``Valid N'') +@xref{CTABLES Missing Values for Summary Variables}, for details of +how @code{CTABLES} summarizes missing values. + +@item @code{VALIDN} (``Valid N'', F40.0) The sum of valid count weights in included categories. -@code{VALIDN} does not count missing values regardless of whether they -are in included categories via @code{CATEGORIES}. @code{VALIDN} does -not count valid values that are in excluded categories. +For categorical variables, @code{VALIDN} does not count missing values +regardless of whether they are in included categories via +@code{CATEGORIES}. @code{VALIDN} does not count valid values that are +in excluded categories. @xref{CTABLES Missing Values for Summary +Variables}, for details. @item @code{VARIANCE} (``Variance'') The variance. @@ -1341,13 +1359,13 @@ each @var{area} described above, for both categorical and scale variables: @table @asis -@item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'') +@item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'', PCT40.1) A percentage of total counts within @var{area}. 
-@item @code{@i{area}PCT.VALIDN} (``@i{Area} Valid N %'') +@item @code{@i{area}PCT.VALIDN} (``@i{Area} Valid N %'', PCT40.1) A percentage of total counts for valid values within @var{area}. -@item @code{@i{area}PCT.TOTALN} (``@i{Area} Total N %'') +@item @code{@i{area}PCT.TOTALN} (``@i{Area} Total N %'', PCT40.1) A percentage of total counts for all values within @var{area}. @end table @@ -1355,27 +1373,28 @@ Scale variables and totals and subtotals for categorical variables may use the following additional group cell summary function: @table @asis -@item @code{@i{area}PCT.SUM} (``@i{Area} Sum %'') +@item @code{@i{area}PCT.SUM} (``@i{Area} Sum %'', PCT40.1) Percentage of the sum of the values within @var{area}. @end table @node CTABLES Summary Functions for Adjusted Weights @subsubsection Summary Functions for Adjusted Weights -If the @code{WEIGHT} subcommand specified an adjustment weight -variable, then the following summary functions use its value instead -of the dictionary weight variable. Otherwise, they are equivalent to -the summary function without the @samp{E}-prefix: +If the @code{WEIGHT} subcommand specified an effective weight variable +(@pxref{CTABLES Effective Weight}), then the following summary functions +use its value instead of the dictionary weight variable. 
Otherwise, +they are equivalent to the summary function without the +@samp{E}-prefix: @itemize @bullet @item -@code{ECOUNT} (``Adjusted Count'') +@code{ECOUNT} (``Adjusted Count'', F40.0) @item -@code{ETOTALN} (``Adjusted Total N'') +@code{ETOTALN} (``Adjusted Total N'', F40.0) @item -@code{EVALIDN} (``Adjusted Valid N'') +@code{EVALIDN} (``Adjusted Valid N'', F40.0) @end itemize @node CTABLES Unweighted Summary Functions @@ -1387,16 +1406,16 @@ counts: @itemize @bullet @item -@code{UCOUNT} (``Unweighted Count'') +@code{UCOUNT} (``Unweighted Count'', F40.0) @item -@code{U@i{area}PCT} or @code{U@i{area}PCT.COUNT} (``Unweighted @i{Area} %'') +@code{U@i{area}PCT} or @code{U@i{area}PCT.COUNT} (``Unweighted @i{Area} %'', PCT40.1) @item -@code{U@i{area}PCT.VALIDN} (``Unweighted @i{Area} Valid N %'') +@code{U@i{area}PCT.VALIDN} (``Unweighted @i{Area} Valid N %'', PCT40.1) @item -@code{U@i{area}PCT.TOTALN} (``Unweighted @i{Area} Total N %'') +@code{U@i{area}PCT.TOTALN} (``Unweighted @i{Area} Total N %'', PCT40.1) @item @code{UMEAN} (``Unweighted Mean'') @@ -1408,10 +1427,10 @@ counts: @code{UMISSING} (``Unweighted Missing'') @item -@code{UMODE} (``Unweight Mode'') +@code{UMODE} (``Unweighted Mode'') @item -@code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'') +@code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'', PCT40.1) @item @code{UPTILE} @i{n} (``Unweighted Percentile @i{n}'') @@ -1426,13 +1445,13 @@ counts: @code{USUM} (``Unweighted Sum'') @item -@code{UTOTALN} (``Unweighted Total N'') +@code{UTOTALN} (``Unweighted Total N'', F40.0) @item -@code{UVALIDN} (``Unweighted Valid N'') +@code{UVALIDN} (``Unweighted Valid N'', F40.0) @item -@code{UVARIANCE} (``Unweighted Variance'') +@code{UVARIANCE} (``Unweighted Variance'', F40.0) @end itemize @node CTABLES Statistics Positions and Labels @@ -1501,8 +1520,8 @@ CTABLES /TABLE AgeGroup BY qns3a. 
@t{ROWLABELS=OPPOSITE} or @t{COLLABELS=OPPOSITE} move row or column variable category labels, respectively, to the opposite axis. The -setting affects only the innermost variable on the given axis. For -example: +setting affects only the innermost variable or variables, which must +be categorical, on the given axis. For example: @example CTABLES /TABLE AgeGroup BY qns3a /CLABELS ROWLABELS=OPPOSITE. @@ -1516,8 +1535,6 @@ column variable category labels, respectively, to the layer axis. Only one axis's labels may be moved, whether to the opposite axis or to the layer axis. -@c TODO Moving category labels for stacked variables - @subsubheading Effect on Summary Statistics @code{CLABELS} primarily affects the appearance of tables, not the @@ -1545,6 +1562,23 @@ CTABLES @end example @psppoutput {ctables24} +@subsubheading Moving Categories for Stacked Variables + +If @code{CLABELS} moves category labels from an axis with stacked +variables, the variables that are moved must have the same category +specifications (@pxref{CTABLES Per-Variable Category Options}) and the +same value labels. + +The following shows both moving stacked category variables and +adapting to the changing definitions of rows and columns: + +@example +CTABLES /TABLE (qn105ba + qn105bb) [COLPCT]. +CTABLES /TABLE (qn105ba + qn105bb) [ROWPCT] + /CLABELS ROW=OPPOSITE. +@end example +@psppoutput {ctables25} + @node CTABLES Per-Variable Category Options @subsection Per-Variable Category Options @@ -1669,7 +1703,20 @@ variables may be ``totaled'' indirectly by enabling totals and subtotals on a categorical variable within which the scalar variable is summarized. -@c TODO Specifying summaries for totals and subtotals +By default, @pspp{} uses the same summary functions for totals and +subtotals as other categories. 
To summarize totals and subtotals +differently, specify the summary functions for totals and subtotals +after the ordinary summary functions inside a nested set of @code{[]} +following @code{TOTALS}. For example, the following syntax +displays @code{COUNT} for individual categories and both +@code{COUNT} and @code{VALIDN} for totals, as shown: + +@example +CTABLES + /TABLE qnd7a [COUNT, TOTALS[COUNT, VALIDN]] + /CATEGORIES VARIABLES=qnd7a TOTAL=YES MISSING=INCLUDE. +@end example +@psppoutput {ctables26} @subsubheading Categories Without Values @@ -1695,14 +1742,34 @@ or @code{OTHERNM}. [@t{CORNER=}@i{string}@dots{}] @end display -@c TODO Describe substitution variables - The @code{TITLES} subcommand sets the title, caption, and corner text -for the table output for the previous @code{TABLE} subcommand. The -title appears above the table, the caption below the table, and the -corner text appears in the table's upper left corner. By default, the -title is ``Custom Tables'' and the caption and corner text are empty. -With some table output styles, the corner text is not displayed. +for the table output for the previous @code{TABLE} subcommand. Any +number of strings may be specified for each kind of text, with each +string appearing on a separate line in the output. The title appears +above the table, the caption below the table, and the corner text +appears in the table's upper left corner. By default, the title is +``Custom Tables'' and the caption and corner text are empty. With +some table output styles, the corner text is not displayed. + +The strings provided in this subcommand may contain the following +macro-like keywords that @pspp{} substitutes at the time that it runs +the command: + +@table @code @c ( +@item )DATE +The current date, e.g.@: MM/DD/YY. The format is locale-dependent. + +@c ( +@item )TIME +The current time, e.g.@: HH:MM:SS. The format is locale-dependent. + +@c ( +@item )TABLE +The expression specified on the @code{TABLE} subcommand.
Summary +and measurement level specifications are omitted, and variable labels are used in place of variable names. +@end table + +@c TODO example @node CTABLES Table Formatting @subsection Table Formatting @@ -1775,6 +1842,73 @@ Show nothing. @node CTABLES Missing Value Treatment @subsection Missing Value Treatment +The @code{TABLE} subcommand on @code{CTABLES} specifies two different +kinds of variables: variables that divide tables into cells (which are +always categorical) and variables being summarized (which may be +categorical or scale). @pspp{} treats missing values differently in +each kind of variable: + +@itemize @bullet +@item +For variables that divide tables into cells, per-variable category +options determine which data are analyzed. If any of the categories +for such a variable would exclude a case, then that case is not +included. + +@item +The treatment of missing values in variables being summarized varies +between scale and categorical variables. The following +section describes their treatment in detail. + +By default, each summarized variable is considered separately for +missing value treatment. A section below describes how to consider +missing values listwise for summarizing scale variables. +@end itemize + +@node CTABLES Missing Values for Summary Variables +@subsubsection Missing Values for Summary Variables + +For summary variables, values that are valid and in included +categories are analyzed, and values that are missing or in excluded +categories are not analyzed, with the following exceptions: + +@itemize @bullet +@item +The ``@t{VALIDN}'' summary functions (@code{VALIDN}, @code{EVALIDN}, +@code{UVALIDN}, @code{@i{area}PCT.VALIDN}, and +@code{U@i{area}PCT.VALIDN}) count only valid values in included +categories (not missing values in included categories).
+ +@item +The ``@t{TOTALN}'' summary functions (@code{TOTALN}, @code{ETOTALN}, +@code{UTOTALN}, @code{@i{area}PCT.TOTALN}, and +@code{U@i{area}PCT.TOTALN}) count all values (valid and missing) in +included categories and missing (but not valid) values in excluded +categories. +@end itemize + +@noindent +For categorical variables, system-missing values are never in included +categories. For scale variables, there is no notion of included and +excluded categories, so all values are effectively included. + +The following table provides another view of the above rules: + +@multitable {@w{ }@w{ }@w{ }@w{ }Missing values in excluded categories} {@t{VALIDN}} {other} {@t{TOTALN}} +@headitem @tab @t{VALIDN} @tab other @tab @t{TOTALN} +@item @headitemfont{Categorical variables:} +@item @w{ }@w{ }@w{ }@w{ }Valid values in included categories @tab yes @tab yes @tab yes +@item @w{ }@w{ }@w{ }@w{ }Missing values in included categories @tab --- @tab yes @tab yes +@item @w{ }@w{ }@w{ }@w{ }Missing values in excluded categories @tab --- @tab --- @tab yes +@item @w{ }@w{ }@w{ }@w{ }Valid values in excluded categories @tab --- @tab --- @tab --- +@item @headitemfont{Scale variables:} +@item @w{ }@w{ }@w{ }@w{ }Valid values @tab yes @tab yes @tab yes +@item @w{ }@w{ }@w{ }@w{ }User- or system-missing values @tab --- @tab yes @tab yes +@end multitable + +@node CTABLES Scale Missing Values +@subsubsection Scale Missing Values + @display @t{/SMISSING} @{@t{VARIABLE} @math{|} @t{LISTWISE}@} @end display @@ -1939,8 +2073,8 @@ By default, or with @code{HIDESOURCECATS=NO}, categories referred to by computed categories are displayed like other categories.
Use The @code{WEIGHT} subcommand is optional and must appear before @code{TABLE}. If it appears, it must name a numeric variable, known -as the @dfn{effective base weight} or @dfn{adjustment weight}. The -effective base weight variable stands in for the dictionary's weight +as the @dfn{effective weight} or @dfn{adjustment weight}. The +effective weight variable stands in for the dictionary's weight variable (@pxref{WEIGHT}), if any, in most calculations in @code{CTABLES}. The only exceptions are the @code{COUNT}, @code{TOTALN}, and @code{VALIDN} summary functions, which use the dictionary weight instead. Weights obtained from the @pspp{} dictionary are rounded to the -nearest integer at the case level. Effective base weights are not -rounded. Regardless of the weighting source, @pspp{} does not analyze -cases with zero, missing, or negative effective weights. +nearest integer at the case level. Effective weights are not rounded. +Regardless of the weighting source, @pspp{} does not analyze cases +with zero, missing, or negative effective weights. @node CTABLES Hiding Small Counts @subsection Hiding Small Counts