X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=4c65898b8eb6be7f70e2f61b586fd2fb2f8c5ecd;hb=d98583b9425b8a053dc21b539203406bac74adc5;hp=2e133dd651ad3ded4525ef91f655ba6c1abd1344;hpb=c1b1583b96cc05a2bf9f3f6d01bbfa063fafb253;p=pspp diff --git a/doc/statistics.texi b/doc/statistics.texi index 2e133dd651..4c65898b8e 100644 --- a/doc/statistics.texi +++ b/doc/statistics.texi @@ -1014,12 +1014,6 @@ In @code{TABLE}, each of @var{rows}, @var{columns}, and @var{layers} is either empty or an axis expression that specifies one or more variables. At least one must specify an axis expression. -@menu -* CTABLES Categorical Variable Basics:: -* CTABLES Scalar Variable Basics:: -* CTABLES Overriding Measurement Level:: -@end menu - @node CTABLES Categorical Variable Basics @subsubsection Categorical Variables @@ -1027,7 +1021,8 @@ An axis expression that names a categorical variable divides the data into cells according to the values of that variable. When all the variables named on @code{TABLE} are categorical, by default each cell displays the number of cases that it contains, so specifying a single -variable yields a frequency table: +variable yields a frequency table, much like the output of the +@code{FREQUENCIES} command (@pxref{FREQUENCIES}): @example CTABLES /TABLE=AgeGroup. @@ -1036,7 +1031,8 @@ CTABLES /TABLE=AgeGroup. @noindent Specifying a row and a column categorical variable yields a -crosstabulation: +crosstabulation, much like the output of the @code{CROSSTABS} command +(@pxref{CROSSTABS}): @example CTABLES /TABLE=AgeGroup BY qns3a. @@ -1121,15 +1117,24 @@ decide whether to treat it as categorical or scalar. Variables assigned the nominal or ordinal measurement level are treated as categorical, and scalar variables are treated as scalar. -Use the @code{VARIABLE LEVEL} command to change a variable's -measurement level (@pxref{VARIABLE LEVEL}). 
To treat a variable as -categorical or scalar only for one use on @code{CTABLES}, add -@samp{[C]} or @samp{[S]}, respectively, after the variable name. The -following example shows how to analyze the scalar variable @code{qn20} -as categorical: +When @pspp{} reads data from a file in an external format, such as a +text file, variables' measurement levels are often unknown. If +@code{CTABLES} runs when a variable has an unknown measurement level, +it makes an initial pass through the data to guess measurement levels +using the rules described in an earlier section (@pxref{Measurement +Level}). Use the @code{VARIABLE LEVEL} command to set or change a +variable's measurement level (@pxref{VARIABLE LEVEL}). + +To treat a variable as categorical or scalar only for one use on +@code{CTABLES}, add @samp{[C]} or @samp{[S]}, respectively, after the +variable name. The following example shows the output when variable +@code{qn20} is analyzed as scalar (the default for its measurement +level) and as categorical: @example -CTABLES /TABLE qn20 [C] BY qns3a. +CTABLES + /TABLE qn20 BY qns3a + /TABLE qn20 [C] BY qns3a. @end example @psppoutput {ctables9} @@ -1145,14 +1150,17 @@ sets. @subsection Data Summarization The @code{CTABLES} command allows the user to control how the data are -summarized with summary specifications, which are enclosed in square -brackets following a variable name on the @code{TABLE} subcommand. -When all the variables are categorical, summary specifications can be -given for the innermost nested variables on any one axis. When a -scalar variable is present, only the scalar variable may have summary -specifications. 
The following example includes a summary -specification for column and row percentages for categorical -variables, and mean and median for a scalar variable: +summarized with @dfn{summary specifications}, which list one or more +summary function names, optionally separated by commas, and are +enclosed in square brackets following a variable name on the +@code{TABLE} subcommand. When all the variables are categorical, +summary specifications can be given for the innermost nested variables +on any one axis. When a scalar variable is present, only the scalar +variable may have summary specifications. + +The following example includes a summary specification for column and +row percentages for categorical variables, and mean and median for a +scalar variable: @example CTABLES @@ -1172,6 +1180,31 @@ CTABLES /TABLE=AgeGroup [COLPCT 'Gender %' PCT5.0, @end example @psppoutput {ctables11} +In addition to the standard formats, @code{CTABLES} allows the user to +specify the following special formats: + +@multitable {@code{NEGPAREN@i{w}.@i{d}}} {Encloses all numbers in parentheses.} {@t{(42.96%)}} {@t{(-42.96%)}} +@item @code{NEGPAREN@i{w}.@i{d}} +@tab Encloses negative numbers in parentheses. +@tab @t{@w{ }42.96} +@tab @t{@w{ }(42.96)} + +@item @code{NEQUAL@i{w}.@i{d}} +@tab Adds an @code{N=} prefix. +@tab @t{@w{ }N=42.96} +@tab @t{@w{ }N=-42.96} + +@item @code{PAREN@i{w}.@i{d}} +@tab Encloses all numbers in parentheses. +@tab @t{@w{ }(42.96)} +@tab @t{@w{ }(-42.96)} + +@item @code{PCTPAREN@i{w}.@i{d}} +@tab Encloses all numbers in parentheses with a @samp{%} suffix. +@tab @t{@w{ }(42.96%)} +@tab @t{(-42.96%)} +@end multitable + Parentheses provide a shorthand to apply summary specifications to multiple variables. For example, both of these commands: @@ -1185,14 +1218,10 @@ produce the same output shown below: @psppoutput {ctables12} -The following sections list the available summary functions. 
- -@menu -* CTABLES Summary Functions for Individual Cells:: -* CTABLES Summary Functions for Groups of Cells:: -* CTABLES Summary Functions for Adjusted Weights:: -* CTABLES Unweighted Summary Functions:: -@end menu +The following sections list the available summary functions. After +each function's name is given its default label and format. If no +format is listed, then the default format is the print format for the +variable being summarized. @node CTABLES Summary Functions for Individual Cells @subsubsection Summary Functions for Individual Cells @@ -1202,7 +1231,7 @@ individual cell in @code{CTABLES}. Only one such summary function, @code{COUNT}, may be applied to both categorical and scale variables: @table @asis -@item @code{COUNT} (``Count'') +@item @code{COUNT} (``Count'', F40.0) The sum of weights in a cell. If @code{CATEGORIES} for one or more of the variables in a table @@ -1252,7 +1281,7 @@ The standard deviation. @item @code{SUM} (``Sum'') The sum. -@item @code{TOTALN} (``Total N'') +@item @code{TOTALN} (``Total N'', F40.0) The sum of weights in a cell. For scale data, @code{COUNT} and @code{TOTALN} are the same. @@ -1264,12 +1293,17 @@ Options}), or user-missing values excluded because @code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or system-missing values. @code{COUNT} does not count these. -@item @code{VALIDN} (``Valid N'') +@xref{CTABLES Missing Values for Summary Variables}, for details of +how @code{CTABLES} summarizes missing values. + +@item @code{VALIDN} (``Valid N'', F40.0) The sum of valid count weights in included categories. -@code{VALIDN} does not count missing values regardless of whether they -are in included categories via @code{CATEGORIES}. @code{VALIDN} does -not count valid values that are in excluded categories. +For categorical variables, @code{VALIDN} does not count missing values +regardless of whether they are in included categories via +@code{CATEGORIES}. 
@code{VALIDN} does not count valid values that are +in excluded categories. @xref{CTABLES Missing Values for Summary +Variables}, for details. @item @code{VARIANCE} (``Variance'') The variance. @@ -1325,13 +1359,13 @@ each @var{area} described above, for both categorical and scale variables: @table @asis -@item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'') +@item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'', PCT40.1) A percentage of total counts within @var{area}. -@item @code{@i{area}PCT.VALIDN} (``@i{Area} Valid N %'') +@item @code{@i{area}PCT.VALIDN} (``@i{Area} Valid N %'', PCT40.1) A percentage of total counts for valid values within @var{area}. -@item @code{@i{area}PCT.TOTALN} (``@i{Area} Total N %'') +@item @code{@i{area}PCT.TOTALN} (``@i{Area} Total N %'', PCT40.1) A percentage of total counts for all values within @var{area}. @end table @@ -1339,27 +1373,28 @@ Scale variables and totals and subtotals for categorical variables may use the following additional group cell summary function: @table @asis -@item @code{@i{area}PCT.SUM} (``@i{Area} Sum %'') +@item @code{@i{area}PCT.SUM} (``@i{Area} Sum %'', PCT40.1) Percentage of the sum of the values within @var{area}. @end table @node CTABLES Summary Functions for Adjusted Weights @subsubsection Summary Functions for Adjusted Weights -If the @code{WEIGHT} subcommand specified an adjustment weight -variable, then the following summary functions use its value instead -of the dictionary weight variable. Otherwise, they are equivalent to -the summary function without the @samp{E}-prefix: +If the @code{WEIGHT} subcommand specified an effective weight variable +(@pxref{CTABLES Effective Weight}), then the following summary functions +use its value instead of the dictionary weight variable. 
Otherwise, +they are equivalent to the summary function without the +@samp{E}-prefix: @itemize @bullet @item -@code{ECOUNT} (``Adjusted Count'') +@code{ECOUNT} (``Adjusted Count'', F40.0) @item -@code{ETOTALN} (``Adjusted Total N'') +@code{ETOTALN} (``Adjusted Total N'', F40.0) @item -@code{EVALIDN} (``Adjusted Valid N'') +@code{EVALIDN} (``Adjusted Valid N'', F40.0) @end itemize @node CTABLES Unweighted Summary Functions @@ -1371,16 +1406,16 @@ counts: @itemize @bullet @item -@code{UCOUNT} (``Unweighted Count'') +@code{UCOUNT} (``Unweighted Count'', F40.0) @item -@code{U@i{area}PCT} or @code{U@i{area}PCT.COUNT} (``Unweighted @i{Area} %'') +@code{U@i{area}PCT} or @code{U@i{area}PCT.COUNT} (``Unweighted @i{Area} %'', PCT40.1) @item -@code{U@i{area}PCT.VALIDN} (``Unweighted @i{Area} Valid N %'') +@code{U@i{area}PCT.VALIDN} (``Unweighted @i{Area} Valid N %'', PCT40.1) @item -@code{U@i{area}PCT.TOTALN} (``Unweighted @i{Area} Total N %'') +@code{U@i{area}PCT.TOTALN} (``Unweighted @i{Area} Total N %'', PCT40.1) @item @code{UMEAN} (``Unweighted Mean'') @@ -1392,10 +1427,10 @@ counts: @code{UMISSING} (``Unweighted Missing'') @item -@code{UMODE} (``Unweight Mode'') +@code{UMODE} (``Unweighted Mode'') @item -@code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'') +@code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'', PCT40.1) @item @code{UPTILE} @i{n} (``Unweighted Percentile @i{n}'') @@ -1410,13 +1445,13 @@ counts: @code{USUM} (``Unweighted Sum'') @item -@code{UTOTALN} (``Unweighted Total N'') +@code{UTOTALN} (``Unweighted Total N'', F40.0) @item -@code{UVALIDN} (``Unweighted Valid N'') +@code{UVALIDN} (``Unweighted Valid N'', F40.0) @item -@code{UVARIANCE} (``Unweighted Variance'') +@code{UVARIANCE} (``Unweighted Variance'', F40.0) @end itemize @node CTABLES Statistics Positions and Labels @@ -1485,8 +1520,8 @@ CTABLES /TABLE AgeGroup BY qns3a. 
@t{ROWLABELS=OPPOSITE} or @t{COLLABELS=OPPOSITE} move row or column variable category labels, respectively, to the opposite axis. The -setting affects only the innermost variable on the given axis. For -example: +setting affects only the innermost variable or variables, which must +be categorical, on the given axis. For example: @example CTABLES /TABLE AgeGroup BY qns3a /CLABELS ROWLABELS=OPPOSITE. @@ -1527,6 +1562,23 @@ CTABLES @end example @psppoutput {ctables24} +@subsubheading Moving Categories for Stacked Variables + +If @code{CLABELS} moves category labels from an axis with stacked +variables, the variables that are moved must have the same category +specifications (@pxref{CTABLES Per-Variable Category Options}) and the +same value labels. + +The following shows both moving stacked category variables and +adapting to the changing definitions of rows and columns: + +@example +CTABLES /TABLE (qn105ba + qn105bb) [COLPCT]. +CTABLES /TABLE (qn105ba + qn105bb) [ROWPCT] + /CLABELS ROW=OPPOSITE. +@end example +@psppoutput {ctables25} + @node CTABLES Per-Variable Category Options @subsection Per-Variable Category Options @@ -1550,19 +1602,23 @@ variables. @code{CATEGORIES} applies to the table produced by the @code{CATEGORIES} does not apply to scalar variables. -@t{VARIABLES} is required. List the variables for the subcommand +@t{VARIABLES} is required and must list the variables for the subcommand to affect. -There are two way to specify the Categories to include and their sort -order: +The syntax may specify the categories to include and their sort order +either explicitly or implicitly. The following sections give the +details of each form of syntax, followed by information on totals and +subtotals and the @code{EMPTY} setting. + +@node CTABLES Explicit Categories +@subsubsection Explicit Categories -@table @asis -@item Explicit categories. 
@anchor{CTABLES Explicit Category List} -To explicitly specify categories to include, list the categories -within square brackets in the desired sort order. Use spaces or -commas to separate values. Categories not covered by the list are -excluded from analysis. + +To use @code{CTABLES} to explicitly specify categories to include, +list the categories within square brackets in the desired sort order. +Use spaces or commas to separate values. Categories not covered by +the list are excluded from analysis. Each element of the list takes one of the following forms: @@ -1593,26 +1649,43 @@ Any non-missing value not covered by any other element of the list @item &@i{postcompute} A computed category name (@pxref{CTABLES Computed Categories}). + +@item SUBTOTAL +@itemx HSUBTOTAL +A subtotal (@pxref{CTABLES Totals and Subtotals}). @end table -Additional forms, described later, allow for subtotals. If multiple elements of the list cover a given category, the last one -in the list is considered to be a match. +in the list takes precedence. -@item Implicit categories. -Without an explicit list of categories, @pspp{} sorts -categories automatically. +The following example syntax and output show how an explicit category +can limit the displayed categories: + +@example +CTABLES /TABLE qn1. +CTABLES /TABLE qn1 /CATEGORIES VARIABLES=qn1 [1, 2, 3]. +@end example +@psppoutput {ctables27} + +@node CTABLES Implicit Categories +@subsubsection Implicit Categories + +In the absence of an explicit list of categories, @code{CATEGORIES} +allows @code{KEY}, @code{ORDER}, and @code{MISSING} to specify how to +select and sort categories. The @code{KEY} setting specifies the sort key. By default, or with @code{KEY=VALUE}, categories are sorted by default. Categories may also be sorted by value label, with @code{KEY=LABEL}, or by the value -of a summary function, e.g.@: @code{KEY=COUNT}. 
For summary -functions, a variable name may be specified in parentheses, e.g.@: -@code{KEY=MAXIUM(qnd1)}, and this is required for functions that apply -only to scalar variables. The @code{PTILE} function also requires a -percentage argument, e.g.@: @code{KEY=PTILE(qnd1, 90)}. Only summary -functions used in the table may be used, except that @code{COUNT} is -always allowed. +of a summary function, e.g.@: @code{KEY=COUNT}. +@ignore @c Not yet implemented +For summary functions, a variable name may be specified in +parentheses, e.g.@: @code{KEY=MAXIMUM(qnd1)}, and this is required for +functions that apply only to scalar variables. The @code{PTILE} +function also requires a percentage argument, e.g.@: +@code{KEY=PTILE(qnd1, 90)}. Only summary functions used in the table +may be used, except that @code{COUNT} is always allowed. +@end ignore By default, or with @code{ORDER=A}, categories are sorted in ascending order. Specify @code{ORDER=D} to sort in descending order. @@ -1620,35 +1693,81 @@ order. Specify @code{ORDER=D} to sort in descending order. User-missing values are excluded by default, or with @code{MISSING=EXCLUDE}. Specify @code{MISSING=INCLUDE} to include user-missing values. The system-missing value is always excluded. -@end table -@subsubheading Totals and Subtotals +The following example syntax and output show how +@code{MISSING=INCLUDE} causes missing values to be included in a +category list. + +@example +CTABLES /TABLE qn1. +CTABLES /TABLE qn1 /CATEGORIES VARIABLES=qn1 MISSING=INCLUDE. +@end example +@psppoutput {ctables28} + +@node CTABLES Totals and Subtotals +@subsubsection Totals and Subtotals -@code{CATEGORIES} also controls display of totals and subtotals. -Totals are not displayed by default, or with @code{TOTAL=NO}. Specify +@code{CATEGORIES} also controls display of totals and subtotals. By +default, or with @code{TOTAL=NO}, totals are not displayed. Use @code{TOTAL=YES} to display a total. 
By default, the total is labeled ``Total''; use @code{LABEL="@i{label}"} to override it. Subtotals are also not displayed by default. To add one or more subtotals, use an explicit category list and insert @code{SUBTOTAL} or @code{HSUBTOTAL} in the position or positions where the subtotal -should appear. With @code{SUBTOTAL}, the subtotal becomes an extra -row or column or layer; @code{HSUBTOTAL} additionally hides the -categories that make up the subtotal. Either way, the default label -is ``Subtotal'', use @code{SUBTOTAL="@i{label}"} or -@code{HSUBTOTAL="@i{label}"} to specify a custom label. +should appear. The subtotal becomes an extra row or column or layer. +@code{HSUBTOTAL} additionally hides the categories that make up the +subtotal. Either way, the default label is ``Subtotal''; use +@code{SUBTOTAL="@i{label}"} or @code{HSUBTOTAL="@i{label}"} to specify +a custom label. + +The following example syntax and output show how to use +@code{TOTAL=YES} and @code{SUBTOTAL}: -By default, or with @code{POSITION=AFTER}, totals come after the last -category and subtotals apply to categories that precede them. With -@code{POSITION=BEFORE}, totals come before the first category and -subtotals apply to categories that follow them. +@example +CTABLES + /TABLE qn1 + /CATEGORIES VARIABLES=qn1 [OTHERNM, SUBTOTAL='Valid Total', + MISSING, SUBTOTAL='Missing Total'] + TOTAL=YES LABEL='Overall Total'. +@end example +@psppoutput {ctables29} + +By default, or with @code{POSITION=AFTER}, totals are displayed in the +output after the last category and subtotals apply to categories that +precede them. With @code{POSITION=BEFORE}, totals come before the +first category and subtotals apply to categories that follow them. Only categorical variables may have totals and subtotals. Scalar variables may be ``totaled'' indirectly by enabling totals and -subtotals on a categorical variable within which the scalar variable is -summarized. 
+subtotals on a categorical variable within which the scalar variable +is summarized. For example, the following syntax produces a mean, +count, and valid count across all data by adding a total on the +categorical @code{region} variable, as shown: + +@example +CTABLES /TABLE=region > qn20 [MEAN, VALIDN] + /CATEGORIES VARIABLES=region TOTAL=YES LABEL='All regions'. +@end example +@psppoutput {ctables30} + +By default, @pspp{} uses the same summary functions for totals and +subtotals as other categories. To summarize totals and subtotals +differently, specify the summary functions for totals and subtotals +after the ordinary summary functions inside a nested set of @code{[]} +following @code{TOTALS}. For example, the following syntax displays +@code{COUNT} for individual categories and totals and @code{VALIDN} +for totals, as shown: + +@example +CTABLES + /TABLE qnd7a [COUNT, TOTALS[COUNT, VALIDN]] + /CATEGORIES VARIABLES=qnd7a TOTAL=YES MISSING=INCLUDE. +@end example +@psppoutput {ctables26} -@subsubheading Categories Without Values +@node CTABLES Categories Without Values +@subsubsection Categories Without Values Some categories might not be included in the data set being analyzed. For example, our example data set has no cases in the ``15 or @@ -1657,9 +1776,20 @@ younger'' age group. By default, or with @code{EMPTY=INCLUDE}, them, specify @code{EMPTY=EXCLUDE}. For implicit categories, empty categories potentially include all the -values with labels for a given variable; for explicit categories, they -include all the values listed individually and all labeled values -covered by ranges or @code{MISSING} or @code{OTHERNM}. +values with value labels for a given variable; for explicit +categories, they include all the values listed individually and all +values with value labels that are covered by ranges or @code{MISSING} +or @code{OTHERNM}. 
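The rule above for which values count as potential empty categories can be sketched in Python. This is only an illustrative model, not how @pspp{} implements it: the function names, the dictionary of value labels, and the use of `(low, high)` tuples to stand in for `THRU` ranges are all assumptions made for the example, and `MISSING`, `OTHERNM`, and sort order are deliberately left out.

```python
def candidate_categories(value_labels, explicit=None):
    """Return values that may appear as categories even with no cases.

    value_labels: dict mapping value -> label (hypothetical input).
    explicit: None for implicit categories, otherwise a list whose
              elements are single values or (low, high) tuples that
              stand in for THRU ranges (an assumption of this sketch).
    """
    if explicit is None:
        # Implicit categories: every value that has a value label.
        return set(value_labels)
    candidates = set()
    for element in explicit:
        if isinstance(element, tuple):
            low, high = element
            # A range contributes only the labeled values it covers.
            candidates |= {v for v in value_labels if low <= v <= high}
        else:
            # An individually listed value is always a candidate.
            candidates.add(element)
    return candidates


def _covered(value, explicit):
    """True if an explicit list (single values and ranges) covers value."""
    for element in explicit:
        if isinstance(element, tuple):
            if element[0] <= value <= element[1]:
                return True
        elif element == value:
            return True
    return False


def displayed_categories(observed, value_labels, explicit=None,
                         empty="INCLUDE"):
    """Model which categories are shown under EMPTY=INCLUDE or EXCLUDE.

    The result is sorted only to make it deterministic; real output
    order follows the category specification.
    """
    if explicit is None:
        shown = set(observed)
    else:
        shown = {v for v in observed if _covered(v, explicit)}
    if empty == "INCLUDE":
        shown |= candidate_categories(value_labels, explicit)
    return sorted(shown)


if __name__ == "__main__":
    # Hypothetical variable: 0 is labeled but never occurs in the data.
    labels = {0: "None", 1: "One", 2: "Two or more"}
    print(displayed_categories({1, 2}, labels))
    print(displayed_categories({1, 2}, labels, empty="EXCLUDE"))
    print(displayed_categories({1}, labels, explicit=[(1, 3), 5]))
```

In this model, a labeled value with no cases is shown under `EMPTY=INCLUDE` and dropped under `EMPTY=EXCLUDE`. With an explicit list, an individually listed value is a candidate even without a label, while a range contributes only its labeled values.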
+ +The following example syntax and output show the effect of +@code{EMPTY=EXCLUDE} for the @code{qns1} variable, in which 0 is labeled +``None'' but no cases exist with that value: + +@example +CTABLES /TABLE=qns1. +CTABLES /TABLE=qns1 /CATEGORIES VARIABLES=qns1 EMPTY=EXCLUDE. +@end example +@psppoutput {ctables31} @node CTABLES Titles @subsection Titles @@ -1672,10 +1802,31 @@ covered by ranges or @code{MISSING} or @code{OTHERNM}. @end display The @code{TITLES} subcommand sets the title, caption, and corner text -for the table output for the previous @code{TABLE} subcommand. The -title appears above the table, the caption below the table, and the -corner text appears in the table's upper left corner. By default, the -title is ``Custom Tables'' and the caption and corner text are empty. +for the table output for the previous @code{TABLE} subcommand. Any +number of strings may be specified for each kind of text, with each +string appearing on a separate line in the output. The title appears +above the table, the caption below the table, and the corner text +appears in the table's upper left corner. By default, the title is +``Custom Tables'' and the caption and corner text are empty. With +some table output styles, the corner text is not displayed. + +The strings provided in this subcommand may contain the following +macro-like keywords that @pspp{} substitutes at the time that it runs +the command: + +@table @code @c ( +@item )DATE +The current date, e.g.@: MM/DD/YY. The format is locale-dependent. + +@c ( +@item )TIME +The current time, e.g.@: HH:MM:SS. The format is locale-dependent. + +@c ( +@item )TABLE +The expression specified on the @code{TABLE} subcommand. Summary +and measurement level specifications are omitted, and variable labels are used in place of variable names. +@end table @node CTABLES Table Formatting @subsection Table Formatting @@ -1694,13 +1845,14 @@ The @code{FORMAT} subcommand, which must precede the first tables. 
@code{FORMAT} and all of its settings are optional. Use @code{MINCOLWIDTH} and @code{MAXCOLWIDTH} to control the minimum -or maximum width of columns in output tables. By default, or with +or maximum width of columns in output tables. By default, with @code{DEFAULT}, column width varies based on content. Otherwise, specify a number for either or both of these settings. If both are -specified, @code{MAXCOLWIDTH} must be bigger than @code{MINCOLWIDTH}. -The default unit, or with @code{UNITS=POINTS}, is points (1/72 inch), -but specify @code{UNITS=INCHES} to use inches or @code{UNITS=CM} for -centimeters. +specified, @code{MAXCOLWIDTH} must be greater than or equal to +@code{MINCOLWIDTH}. The default unit, or with @code{UNITS=POINTS}, is +points (1/72 inch), or specify @code{UNITS=INCHES} to use inches or +@code{UNITS=CM} for centimeters. @pspp{} does not currently honor any +of these settings. By default, or with @code{EMPTY=ZERO}, zero values are displayed in their usual format. Use @code{EMPTY=BLANK} to use an empty cell @@ -1730,7 +1882,7 @@ variables listed on @code{VARIABLES}. The supported values are: @table @code @item DEFAULT -Uses the setting from @ref{SET TVARS}. +Use the setting from @code{SET TVARS} (@pxref{SET TVARS}). @item NAME Show only a variable name. @@ -1748,6 +1900,89 @@ Show nothing. @node CTABLES Missing Value Treatment @subsection Missing Value Treatment +The @code{TABLE} subcommand on @code{CTABLES} specifies two different +kinds of variables: variables that divide tables into cells (which are +always categorical) and variables being summarized (which may be +categorical or scale). @pspp{} treats missing values differently in +each kind of variable, as described in the sections below. 
+ +@node CTABLES Missing Values for Cell-Defining Variables +@subsubsection Missing Values for Cell-Defining Variables + +For variables that divide tables into cells, per-variable category +options, as described in @ref{CTABLES Per-Variable Category Options}, +determine which data is analyzed. If any of the categories for such a +variable would exclude a case, then that case is not included. + +As an example, consider the following entirely artificial dataset, in +which @samp{x} and @samp{y} are categorical variables with missing +value 9, and @samp{z} is scale: + +@psppoutput{ctables32} + +Using @samp{x} and @samp{y} to define cells, and summarizing @samp{z}, +by default @pspp{} omits all the cases that have @samp{x} or @samp{y} (or both) +missing: + +@example +CTABLES /TABLE x > y > z [SUM]. +@end example +@psppoutput{ctables33} + +If, however, we add @code{CATEGORIES} specifications to include +missing values for @samp{y} or for @samp{x} and @samp{y}, the output +table includes them, like so: + +@example +CTABLES /TABLE x > y > z [SUM] /CATEGORIES VARIABLES=y MISSING=INCLUDE. +CTABLES /TABLE x > y > z [SUM] /CATEGORIES VARIABLES=x y MISSING=INCLUDE. +@end example +@psppoutput{ctables34} + +@node CTABLES Missing Values for Summary Variables +@subsubsection Missing Values for Summary Variables + +For summary variables, values that are valid and in included +categories are analyzed, and values that are missing or in excluded +categories are not analyzed, with the following exceptions: + +@itemize @bullet +@item +The ``@t{VALIDN}'' summary functions (@code{VALIDN}, @code{EVALIDN}, +@code{UVALIDN}, @code{@i{area}PCT.VALIDN}, and +@code{U@i{area}PCT.VALIDN}) only count valid values in included +categories (not missing values in included categories). 
+ +@item +The ``@t{TOTALN}'' summary functions (@code{TOTALN}, @code{ETOTALN}, +@code{UTOTALN}, @code{@i{area}PCT.TOTALN}, and +@code{U@i{area}PCT.TOTALN}) count all values (valid and missing) in +included categories and missing (but not valid) values in excluded +categories. +@end itemize + +@noindent +For categorical variables, system-missing values are never in included +categories. For scale variables, there is no notion of included and +excluded categories, so all values are effectively included. + +The following table provides another view of the above rules: + +@multitable {@w{ }@w{ }@w{ }@w{ }Missing values in excluded categories} {@t{VALIDN}} {other} {@t{TOTALN}} +@headitem @tab @t{VALIDN} @tab other @tab @t{TOTALN} +@item @headitemfont{Categorical variables:} +@item @w{ }@w{ }@w{ }@w{ }Valid values in included categories @tab yes @tab yes @tab yes +@item @w{ }@w{ }@w{ }@w{ }Missing values in included categories @tab --- @tab yes @tab yes +@item @w{ }@w{ }@w{ }@w{ }Missing values in excluded categories @tab --- @tab --- @tab yes +@item @w{ }@w{ }@w{ }@w{ }Valid values in excluded categories @tab --- @tab --- @tab --- +@item @headitemfont{Scale variables:} +@item @w{ }@w{ }@w{ }@w{ }Valid values @tab yes @tab yes @tab yes +@item @w{ }@w{ }@w{ }@w{ }User- or system-missing values @tab --- @tab yes @tab yes +@end multitable + +@node CTABLES Scale Missing Values +@subsubsection Scale Missing Values + @display @t{/SMISSING} @{@t{VARIABLE} @math{|} @t{LISTWISE}@} @end display @@ -1805,19 +2040,30 @@ CTABLES /SMISSING LISTWISE /TABLE (y > x) + (z > x). @display @t{/PCOMPUTE} @t{&}@i{postcompute}@t{=EXPR(}@i{expression}@t{)} +@t{/PPROPERTIES} @t{&}@i{postcompute}@dots{} + [@t{LABEL=}@i{string}] + [@t{FORMAT=}[@i{summary} @i{format}]@dots{}] + [@t{HIDESOURCECATS=}@{@t{NO} @math{|} @t{YES}@}] @end display @dfn{Computed categories}, also called @dfn{postcomputes}, are categories created using arithmetic on categories obtained from the -data. 
The @code{PCOMPUTE} subcommand defines computed categories, -which can then be used in two places: on @code{CATEGORIES} within an -explicit category list (@pxref{CTABLES Explicit Category List}), and on -the @code{PPROPERTIES} subcommand to define further properties for a -given postcompute. +data. The @code{PCOMPUTE} subcommand creates a postcompute, which may +then be used on @code{CATEGORIES} within an explicit category list +(@pxref{CTABLES Explicit Category List}). Optionally, +@code{PPROPERTIES} refines how a postcompute is displayed. The +following sections provide the details. + +@node CTABLES PCOMPUTE +@subsubsection PCOMPUTE + +@display +@t{/PCOMPUTE} @t{&}@i{postcompute}@t{=EXPR(}@i{expression}@t{)} +@end display -@code{PCOMPUTE} must precede the first @code{TABLE} command. It is -optional and it may be used any number of times to define multiple -postcomputes. +The @code{PCOMPUTE} subcommand, which must precede the first +@code{TABLE} subcommand, defines computed categories. It is optional and +may be used any number of times to define multiple postcomputes. Each @code{PCOMPUTE} defines one postcompute. Its syntax consists of a name to identify the postcompute as a @pspp{} identifier prefixed by @@ -1829,30 +2075,33 @@ in @code{EXPR(@dots{})}. A postcompute expression consists of: This form evaluates to the summary statistic for @i{category}, e.g.@: @code{[1]} evaluates to the value of the summary statistic associated with category 1. The @i{category} may be a number, a quoted string, -or a quoted time or date value, and all of the categories for a given -postcompute must have the same form. +or a quoted time or date value. All of the categories for a given +postcompute must have the same form. The category must appear in all +the @code{CATEGORIES} lists in which the postcompute is used. 
@item [@i{min} THRU @i{max}] @itemx [LO THRU @i{max}] @itemx [@i{min} THRU HI] @itemx MISSING @itemx OTHERNM -These forms evaluate to the summary statistics for categories matching -the given syntax, as described in previous sections (@pxref{CTABLES -Explicit Category List}). If more than one category matches, their -values are summed. +These forms evaluate to the summary statistics for a category +specified with the same syntax, as described in a previous section +(@pxref{CTABLES Explicit Category List}). The category must appear in +all the @code{CATEGORIES} lists in which the postcompute is used. @item SUBTOTAL The summary statistic for the subtotal category. This form is allowed -only for variables with exactly one subtotal. +only if the @code{CATEGORIES} lists that include this postcompute have +exactly one subtotal. @item SUBTOTAL[@i{index}] The summary statistic for subtotal category @i{index}, where 1 is the first subtotal, 2 is the second, and so on. This form may be used for -any number of subtotals. +@code{CATEGORIES} lists with any number of subtotals. @item TOTAL -The summary statistic for the total. +The summary statistic for the total. The @code{CATEGORIES} lists that +include this postcompute must have a total enabled. @item @i{a} + @i{b} @itemx @i{a} - @i{b} @@ -1881,8 +2130,30 @@ Normally a named postcompute is defined only once, but if a later @code{PCOMPUTE} redefines a postcompute with the same name as an earlier one, the later one take precedence. -@node CTABLES Computed Category Properties -@subsection Computed Category Properties +The following syntax and output show how @code{PCOMPUTE} can compute +a total over subtotals, summing the ``Frequent Drivers'' and +``Infrequent Drivers'' subtotals to form an ``All Drivers'' +postcompute. It also shows how to calculate and display a percentage, +in this case the percentage of valid responses that report never +driving. 
It uses @code{PPROPERTIES} (@pxref{CTABLES PPROPERTIES}) to +display the latter in @code{PCT} format. + +@example +CTABLES + /PCOMPUTE &all_drivers=EXPR([1 THRU 2] + [3 THRU 4]) + /PPROPERTIES &all_drivers LABEL='All Drivers' + /PCOMPUTE &pct_never=EXPR([5] / ([1 THRU 2] + [3 THRU 4] + [5]) * 100) + /PPROPERTIES &pct_never LABEL='% Not Drivers' FORMAT=COUNT PCT40.1 + /TABLE=qn1 BY qns3a + /CATEGORIES VARIABLES=qn1 [1 THRU 2, SUBTOTAL='Frequent Drivers', + 3 THRU 4, SUBTOTAL='Infrequent Drivers', + &all_drivers, 5, &pct_never, + MISSING, SUBTOTAL='Not Drivers or Missing']. +@end example +@psppoutput{ctables35} + +@node CTABLES PPROPERTIES +@subsubsection PPROPERTIES @display @t{/PPROPERTIES} @t{&}@i{postcompute}@dots{} @@ -1902,15 +2173,19 @@ All of the settings on @code{PPROPERTIES} are optional. Use output. The default label for a postcompute is the expression used to define it. -The @code{FORMAT} setting sets summary statistics and display formats -for the postcomputes. +A postcompute always uses the same summary functions as the variable whose +categories contain it, but @code{FORMAT} allows control over the +format used to display their values. It takes a list of summary +function names and format specifiers. By default, or with @code{HIDESOURCECATS=NO}, categories referred to by computed categories are displayed like other categories. Use @code{HIDESOURCECATS=YES} to hide them. -@node CTABLES Base Weight -@subsection Base Weight +The previous section provides an example for @code{PPROPERTIES}. + +@node CTABLES Effective Weight +@subsection Effective Weight @display @t{/WEIGHT VARIABLE=}@i{variable} @@ -1918,15 +2193,17 @@ by computed categories are displayed like other categories. Use The @code{WEIGHT} subcommand is optional and must appear before @code{TABLE}. If it appears, it must name a numeric variable, known -as the @dfn{effective base weight} or @dfn{adjustment weight}. 
The -effective base weight variable is used for the @code{ECOUNT}, -@code{ETOTALN}, and @code{EVALIDN} summary functions. - -Cases with zero, missing, or negative effective base weight are -excluded from all analysis. +as the @dfn{effective weight} or @dfn{adjustment weight}. The +effective weight variable stands in for the dictionary's weight +variable (@pxref{WEIGHT}), if any, in most calculations in +@code{CTABLES}. The only exceptions are the @code{COUNT}, +@code{TOTALN}, and @code{VALIDN} summary functions, which use the +dictionary weight instead. Weights obtained from the @pspp{} dictionary are rounded to the -nearest integer. Effective base weights are not rounded. +nearest integer at the case level. Effective weights are not rounded. +Regardless of the weighting source, @pspp{} does not analyze cases +with zero, missing, or negative effective weights. @node CTABLES Hiding Small Counts @subsection Hiding Small Counts @@ -1936,10 +2213,18 @@ nearest integer. Effective base weights are not rounded. @end display The @code{HIDESMALLCOUNTS} subcommand is optional. If it specified, -then count values in output tables less than the value of @i{count} -are shown as @code{<@i{count}} instead of their true values. The -value of @i{count} must be an integer and must be at least 2. Case -weights are considered for deciding whether to hide a count. +then @code{COUNT}, @code{ECOUNT}, and @code{UCOUNT} values in output +tables less than the value of @i{count} are shown as @code{<@i{count}} +instead of their true values. The value of @i{count} must be an +integer and must be at least 2. + +The following syntax and output show how to use +@code{HIDESMALLCOUNTS}: + +@example +CTABLES /HIDESMALLCOUNTS COUNT=10 /TABLE qn37. +@end example +@psppoutput{ctables36} @node FACTOR @section FACTOR