X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=0cfceb225afa5d59988a7a426f5042866d0291a2;hb=2088d7438791ad96dda2037a6ac7e9b0f3998c8b;hp=351842e94e24b285995c3e5acdbba683bdfc35f9;hpb=86e6b87d7ad411378c3204fe87504c7e6749be78;p=pspp diff --git a/doc/statistics.texi b/doc/statistics.texi index 351842e94e..0cfceb225a 100644 --- a/doc/statistics.texi +++ b/doc/statistics.texi @@ -1121,7 +1121,7 @@ When @pspp{} reads data from a file in an external format, such as a text file, variables' measurement levels are often unknown. If @code{CTABLES} runs when a variable has an unknown measurement level, it makes an initial pass through the data to guess measurement levels -using the rules described earlier in this manual (@pxref{Measurement +using the rules described in an earlier section (@pxref{Measurement Level}). Use the @code{VARIABLE LEVEL} command to set or change a variable's measurement level (@pxref{VARIABLE LEVEL}). @@ -1293,12 +1293,17 @@ Options}), or user-missing values excluded because @code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or system-missing values. @code{COUNT} does not count these. +@xref{CTABLES Missing Values for Summary Variables}, for details of +how @code{CTABLES} summarizes missing values. + @item @code{VALIDN} (``Valid N'', F40.0) The sum of valid count weights in included categories. -@code{VALIDN} does not count missing values regardless of whether they -are in included categories via @code{CATEGORIES}. @code{VALIDN} does -not count valid values that are in excluded categories. +For categorical variables, @code{VALIDN} does not count missing values +regardless of whether they are in included categories via +@code{CATEGORIES}. @code{VALIDN} does not count valid values that are +in excluded categories. @xref{CTABLES Missing Values for Summary +Variables}, for details. @item @code{VARIANCE} (``Variance'') The variance. @@ -1375,10 +1380,11 @@ Percentage of the sum of the values within @var{area}. @node CTABLES Summary Functions for Adjusted Weights @subsubsection Summary Functions for Adjusted Weights -If the @code{WEIGHT} subcommand specified an adjustment weight -variable, then the following summary functions use its value instead -of the dictionary weight variable. Otherwise, they are equivalent to -the summary function without the @samp{E}-prefix: +If the @code{WEIGHT} subcommand specified an effective weight variable +(@pxref{CTABLES Effective Weight}), then the following summary functions +use its value instead of the dictionary weight variable. Otherwise, +they are equivalent to the summary function without the +@samp{E}-prefix: @itemize @bullet @item @@ -1421,7 +1427,7 @@ counts: @code{UMISSING} (``Unweighted Missing'') @item -@code{UMODE} (``Unweight Mode'') +@code{UMODE} (``Unweighted Mode'') @item @code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'', PCT40.1) @@ -1599,16 +1605,20 @@ variables. @code{CATEGORIES} applies to the table produced by the @t{VARIABLES} is required and must list the variables for the subcommand to affect. -There are two way to specify the Categories to include and their sort -order: +The syntax may specify the categories to include and their sort order +either explicitly or implicitly. The following sections give the +details of each form of syntax, followed by information on totals and +subtotals and the @code{EMPTY} setting. + +@node CTABLES Explicit Categories +@subsubsection Explicit Categories -@table @asis -@item Explicit categories. @anchor{CTABLES Explicit Category List} -To explicitly specify categories to include, list the categories -within square brackets in the desired sort order. Use spaces or -commas to separate values. Categories not covered by the list are -excluded from analysis. + +To use @code{CTABLES} to explicitly specify categories to include, +list the categories within square brackets in the desired sort order. +Use spaces or commas to separate values. Categories not covered by +the list are excluded from analysis. Each element of the list takes one of the following forms: @@ -1639,15 +1649,30 @@ Any non-missing value not covered by any other element of the list @item &@i{postcompute} A computed category name (@pxref{CTABLES Computed Categories}). + +@item SUBTOTAL +@itemx HSUBTOTAL +A subtotal (@pxref{CTABLES Totals and Subtotals}). @end table -Additional forms, described later, allow for subtotals. If multiple elements of the list cover a given category, the last one in the list takes precedence. -@item Implicit categories. -Without an explicit list of categories, @pspp{} sorts -categories automatically. +The following example syntax and output show how an explicit category +can limit the displayed categories: + +@example +CTABLES /TABLE qn1. +CTABLES /TABLE qn1 /CATEGORIES VARIABLES=qn1 [1, 2, 3]. +@end example +@psppoutput {ctables27} + +@node CTABLES Implicit Categories +@subsubsection Implicit Categories + +In the absence of an explicit list of categories, @code{CATEGORIES} +allows @code{KEY}, @code{ORDER}, and @code{MISSING} to specify how to +select and sort categories. The @code{KEY} setting specifies the sort key. By default, or with @code{KEY=VALUE}, categories are sorted by default. Categories may @@ -1668,24 +1693,45 @@ order. Specify @code{ORDER=D} to sort in descending order. User-missing values are excluded by default, or with @code{MISSING=EXCLUDE}. Specify @code{MISSING=INCLUDE} to include user-missing values. The system-missing value is always excluded. -@end table -@subsubheading Totals and Subtotals +The following example syntax and output show how +@code{MISSING=INCLUDE} causes missing values to be included in a +category list. -@code{CATEGORIES} also controls display of totals and subtotals. -Totals are not displayed with @code{TOTAL=NO}, which is also the -default. Specify @code{TOTAL=YES} to display a total. By default, -the total is labeled ``Total''; use @code{LABEL="@i{label}"} to -override it. +@example +CTABLES /TABLE qn1. +CTABLES /TABLE qn1 /CATEGORIES VARIABLES=qn1 MISSING=INCLUDE. +@end example +@psppoutput {ctables28} + +@node CTABLES Totals and Subtotals +@subsubsection Totals and Subtotals + +@code{CATEGORIES} also controls display of totals and subtotals. By +default, or with @code{TOTAL=NO}, totals are not displayed. Use +@code{TOTAL=YES} to display a total. By default, the total is labeled +``Total''; use @code{LABEL="@i{label}"} to override it. Subtotals are also not displayed by default. To add one or more subtotals, use an explicit category list and insert @code{SUBTOTAL} or @code{HSUBTOTAL} in the position or positions where the subtotal -should appear. With @code{SUBTOTAL}, the subtotal becomes an extra -row or column or layer; @code{HSUBTOTAL} additionally hides the -categories that make up the subtotal. Either way, the default label -is ``Subtotal'', use @code{SUBTOTAL="@i{label}"} or -@code{HSUBTOTAL="@i{label}"} to specify a custom label. +should appear. The subtotal becomes an extra row or column or layer. +@code{HSUBTOTAL} additionally hides the categories that make up the +subtotal. Either way, the default label is ``Subtotal'', use +@code{SUBTOTAL="@i{label}"} or @code{HSUBTOTAL="@i{label}"} to specify +a custom label. + +The following example syntax and output show how to use +@code{TOTAL=YES} and @code{SUBTOTAL}: + +@example +CTABLES + /TABLE qn1 + /CATEGORIES VARIABLES=qn1 [OTHERNM, SUBTOTAL='Valid Total', + MISSING, SUBTOTAL='Missing Total'] + TOTAL=YES LABEL='Overall Total'. +@end example +@psppoutput {ctables29} By default, or with @code{POSITION=AFTER}, totals are displayed in the output after the last category and subtotals apply to categories that @@ -1694,8 +1740,16 @@ first category and subtotals apply to categories that follow them. Only categorical variables may have totals and subtotals. Scalar variables may be ``totaled'' indirectly by enabling totals and -subtotals on a categorical variable within which the scalar variable is -summarized. +subtotals on a categorical variable within which the scalar variable +is summarized. For example, the following syntax produces a mean, +count, and valid count across all data by adding a total on the +categorical @code{region} variable, as shown: + +@example +CTABLES /TABLE=region > qn20 [MEAN, VALIDN] + /CATEGORIES VARIABLES=region TOTAL=YES LABEL='All regions'. +@end example +@psppoutput {ctables30} By default, @pspp{} uses the same summary functions for totals and subtotals as other categories. To summarize totals and subtotals @@ -1712,7 +1766,8 @@ CTABLES @end example @psppoutput {ctables26} -@subsubheading Categories Without Values +@node CTABLES Categories Without Values +@subsubsection Categories Without Values Some categories might not be included in the data set being analyzed. For example, our example data set has no cases in the ``15 or @@ -1726,6 +1781,8 @@ categories, they include all the values listed individually and all values with value labels that are covered by ranges or @code{MISSING} or @code{OTHERNM}. +@c TODO + @node CTABLES Titles @subsection Titles @@ -1788,7 +1845,8 @@ specify a number for either or both of these settings. If both are specified, @code{MAXCOLWIDTH} must be greater than or equal to @code{MINCOLWIDTH}. The default unit, or with @code{UNITS=POINTS}, is points (1/72 inch), or specify @code{UNITS=INCHES} to use inches or -@code{UNITS=CM} for centimeters. +@code{UNITS=CM} for centimeters. @pspp{} does not currently honor any +of these settings. By default, or with @code{EMPTY=ZERO}, zero values are displayed in their usual format. Use @code{EMPTY=BLANK} to use an empty cell @@ -1833,6 +1891,8 @@ Show variable name and label. Show nothing. @end table +@c TODO example + @node CTABLES Missing Value Treatment @subsection Missing Value Treatment @@ -1859,6 +1919,8 @@ missing value treatment. A section below describes how to consider missing values listwise for summarizing scale variables. @end itemize +@c TODO example + @node CTABLES Missing Values for Summary Variables @subsubsection Missing Values for Summary Variables @@ -2039,6 +2101,8 @@ Normally a named postcompute is defined only once, but if a later @code{PCOMPUTE} redefines a postcompute with the same name as an earlier one, the later one take precedence. +@c TODO example + @node CTABLES Computed Category Properties @subsection Computed Category Properties @@ -2060,15 +2124,19 @@ All of the settings on @code{PPROPERTIES} are optional. Use output. The default label for a postcompute is the expression used to define it. -The @code{FORMAT} setting sets summary statistics and display formats -for the postcomputes. +A postcompute always uses same summary functions as the variable whose +categories contain it, but @code{FORMAT} allows control over the +format used to display their values. It takes a list of summary +function names and format specifiers. By default, or with @code{HIDESOURCECATS=NO}, categories referred to by computed categories are displayed like other categories. Use @code{HIDESOURCECATS=YES} to hide them. -@node CTABLES Base Weight -@subsection Base Weight +@c TODO example + +@node CTABLES Effective Weight +@subsection Effective Weight @display @t{/WEIGHT VARIABLE=}@i{variable} @@ -2076,17 +2144,17 @@ by computed categories are displayed like other categories. Use The @code{WEIGHT} subcommand is optional and must appear before @code{TABLE}. If it appears, it must name a numeric variable, known -as the @dfn{effective base weight} or @dfn{adjustment weight}. The -effective base weight variable stands in for the dictionary's weight +as the @dfn{effective weight} or @dfn{adjustment weight}. The +effective weight variable stands in for the dictionary's weight variable (@pxref{WEIGHT}), if any, in most calculations in @code{CTABLES}. The only exceptions are the @code{COUNT}, @code{TOTALN}, and @code{VALIDN} summary functions, which use the dictionary weight instead. Weights obtained from the @pspp{} dictionary are rounded to the -nearest integer at the case level. Effective base weights are not -rounded. Regardless of the weighting source, @pspp{} does not analyze -cases with zero, missing, or negative effective weights. +nearest integer at the case level. Effective weights are not rounded. +Regardless of the weighting source, @pspp{} does not analyze cases +with zero, missing, or negative effective weights. @node CTABLES Hiding Small Counts @subsection Hiding Small Counts @@ -2101,6 +2169,8 @@ are shown as @code{<@i{count}} instead of their true values. The value of @i{count} must be an integer and must be at least 2. Case weights are considered for deciding whether to hide a count. +@c TODO example + @node FACTOR @section FACTOR