work on docs

diff --git a/doc/statistics.texi b/doc/statistics.texi
index 1b0a294218608be25f11f92e67b935472d8b9824..82855902f29c530d9257b8af3aa4ee2f8211f6c5 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -1014,12 +1014,6 @@ In @code{TABLE}, each of @var{rows}, @var{columns}, and @var{layers}
  is either empty or an axis expression that specifies one or more
  variables.  At least one must specify an axis expression.
  
-@menu
-* CTABLES Categorical Variable Basics::
-* CTABLES Scalar Variable Basics::
-* CTABLES Overriding Measurement Level::
-@end menu
-
  @node CTABLES Categorical Variable Basics
  @subsubsection Categorical Variables
  
@@ -1127,7 +1121,7 @@ When @pspp{} reads data from a file in an external format, such as a
  text file, variables' measurement levels are often unknown.  If
  @code{CTABLES} runs when a variable has an unknown measurement level,
  it makes an initial pass through the data to guess measurement levels
-using the rules described earlier in this manual (@pxref{Measurement
+using the rules described in an earlier section (@pxref{Measurement
  Level}).  Use the @code{VARIABLE LEVEL} command to set or change a
  variable's measurement level (@pxref{VARIABLE LEVEL}).
  
@@ -1155,8 +1149,6 @@ sets.
  @node CTABLES Data Summarization
  @subsection Data Summarization
  
-@c TODO Summary function default formats
-
  The @code{CTABLES} command allows the user to control how the data are
  summarized with @dfn{summary specifications}, syntax that lists one or
  more summary function names, optionally separated by commas, and which
@@ -1188,6 +1180,31 @@ CTABLES /TABLE=AgeGroup [COLPCT 'Gender %' PCT5.0,
  @end example
  @psppoutput {ctables11}
  
+In addition to the standard formats, @code{CTABLES} allows the user to
+specify the following special formats:
+
+@multitable {@code{NEGPAREN@i{w}.@i{d}}} {Encloses all numbers in parentheses.} {@t{(42.96%)}} {@t{(-42.96%)}}
+@item @code{NEGPAREN@i{w}.@i{d}}
+@tab Encloses negative numbers in parentheses.
+@tab @t{@w{    }42.96}
+@tab @t{@w{  }(42.96)}
+
+@item @code{NEQUAL@i{w}.@i{d}}
+@tab Adds an @code{N=} prefix.
+@tab @t{@w{  }N=42.96}
+@tab @t{@w{ }N=-42.96}
+
+@item @code{PAREN@i{w}.@i{d}}
+@tab Encloses all numbers in parentheses.
+@tab @t{@w{  }(42.96)}
+@tab @t{@w{ }(-42.96)}
+
+@item @code{PCTPAREN@i{w}.@i{d}}
+@tab Encloses all numbers in parentheses with a @samp{%} suffix.
+@tab @t{@w{ }(42.96%)}
+@tab @t{(-42.96%)}
+@end multitable
+
  Parentheses provide a shorthand to apply summary specifications to
  multiple variables.  For example, both of these commands:
  
@@ -1201,14 +1218,10 @@ produce the same output shown below:
  
  @psppoutput {ctables12}
  
-The following sections list the available summary functions.
-
-@menu
-* CTABLES Summary Functions for Individual Cells::
-* CTABLES Summary Functions for Groups of Cells::
-* CTABLES Summary Functions for Adjusted Weights::
-* CTABLES Unweighted Summary Functions::
-@end menu
+The following sections list the available summary functions.  Each
+function's name is followed by its default label and format.  If no
+format is listed, then the default format is the print format for the
+variable being summarized.
  
  @node CTABLES Summary Functions for Individual Cells
  @subsubsection Summary Functions for Individual Cells
@@ -1218,7 +1231,7 @@ individual cell in @code{CTABLES}.  Only one such summary function,
  @code{COUNT}, may be applied to both categorical and scale variables:
  
  @table @asis
-@item @code{COUNT} (``Count'')
+@item @code{COUNT} (``Count'', F40.0)
  The sum of weights in a cell.
  
  If @code{CATEGORIES} for one or more of the variables in a table
@@ -1268,7 +1281,7 @@ The standard deviation.
  @item @code{SUM} (``Sum'')
  The sum.
  
-@item @code{TOTALN} (``Total N'')
+@item @code{TOTALN} (``Total N'', F40.0)
  The sum of weights in a cell.
  
  For scale data, @code{COUNT} and @code{TOTALN} are the same.
@@ -1280,12 +1293,17 @@ Options}), or user-missing values excluded because
  @code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or
  system-missing values.  @code{COUNT} does not count these.
  
-@item @code{VALIDN} (``Valid N'')
+@xref{CTABLES Missing Values for Summary Variables}, for details of
+how @code{CTABLES} summarizes missing values.
+
+@item @code{VALIDN} (``Valid N'', F40.0)
  The sum of valid count weights in included categories.
  
-@code{VALIDN} does not count missing values regardless of whether they
-are in included categories via @code{CATEGORIES}.  @code{VALIDN} does
-not count valid values that are in excluded categories.
+For categorical variables, @code{VALIDN} does not count missing values
+regardless of whether they are in included categories via
+@code{CATEGORIES}.  @code{VALIDN} does not count valid values that are
+in excluded categories.  @xref{CTABLES Missing Values for Summary
+Variables}, for details.
  
  @item @code{VARIANCE} (``Variance'')
  The variance.
@@ -1341,13 +1359,13 @@ each @var{area} described above, for both categorical and scale
  variables:
  
  @table @asis
-@item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'')
+@item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'', PCT40.1)
  A percentage of total counts within @var{area}.
  
-@item @code{@i{area}PCT.VALIDN} (``@i{Area} Valid N %'')
+@item @code{@i{area}PCT.VALIDN} (``@i{Area} Valid N %'', PCT40.1)
  A percentage of total counts for valid values within @var{area}.
  
-@item @code{@i{area}PCT.TOTALN} (``@i{Area} Total N %'')
+@item @code{@i{area}PCT.TOTALN} (``@i{Area} Total N %'', PCT40.1)
  A percentage of total counts for all values within @var{area}.
  @end table
  
@@ -1355,27 +1373,28 @@ Scale variables and totals and subtotals for categorical variables may
  use the following additional group cell summary function:
  
  @table @asis
-@item @code{@i{area}PCT.SUM} (``@i{Area} Sum %'')
+@item @code{@i{area}PCT.SUM} (``@i{Area} Sum %'', PCT40.1)
  Percentage of the sum of the values within @var{area}.
  @end table
  
  @node CTABLES Summary Functions for Adjusted Weights
  @subsubsection Summary Functions for Adjusted Weights
  
-If the @code{WEIGHT} subcommand specified an adjustment weight
-variable, then the following summary functions use its value instead
-of the dictionary weight variable.  Otherwise, they are equivalent to
-the summary function without the @samp{E}-prefix:
+If the @code{WEIGHT} subcommand specified an effective weight variable
+(@pxref{CTABLES Effective Weight}), then the following summary functions
+use its value instead of the dictionary weight variable.  Otherwise,
+they are equivalent to the summary function without the
+@samp{E}-prefix:
  
  @itemize @bullet
  @item
-@code{ECOUNT} (``Adjusted Count'')
+@code{ECOUNT} (``Adjusted Count'', F40.0)
  
  @item
-@code{ETOTALN} (``Adjusted Total N'')
+@code{ETOTALN} (``Adjusted Total N'', F40.0)
  
  @item
-@code{EVALIDN} (``Adjusted Valid N'')
+@code{EVALIDN} (``Adjusted Valid N'', F40.0)
  @end itemize
  
  @node CTABLES Unweighted Summary Functions
@@ -1387,16 +1406,16 @@ counts:
  
  @itemize @bullet
  @item
-@code{UCOUNT} (``Unweighted Count'')
+@code{UCOUNT} (``Unweighted Count'', F40.0)
  
  @item
-@code{U@i{area}PCT} or @code{U@i{area}PCT.COUNT} (``Unweighted @i{Area} %'')
+@code{U@i{area}PCT} or @code{U@i{area}PCT.COUNT} (``Unweighted @i{Area} %'', PCT40.1)
  
  @item
-@code{U@i{area}PCT.VALIDN} (``Unweighted @i{Area} Valid N %'')
+@code{U@i{area}PCT.VALIDN} (``Unweighted @i{Area} Valid N %'', PCT40.1)
  
  @item
-@code{U@i{area}PCT.TOTALN} (``Unweighted @i{Area} Total N %'')
+@code{U@i{area}PCT.TOTALN} (``Unweighted @i{Area} Total N %'', PCT40.1)
  
  @item
  @code{UMEAN} (``Unweighted Mean'')
@@ -1408,10 +1427,10 @@ counts:
  @code{UMISSING} (``Unweighted Missing'')
  
  @item
-@code{UMODE} (``Unweight Mode'')
+@code{UMODE} (``Unweighted Mode'')
  
  @item
-@code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'')
+@code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'', PCT40.1)
  
  @item
  @code{UPTILE} @i{n} (``Unweighted Percentile @i{n}'') 
@@ -1426,13 +1445,13 @@ counts:
  @code{USUM} (``Unweighted Sum'')
  
  @item
-@code{UTOTALN} (``Unweighted Total N'')
+@code{UTOTALN} (``Unweighted Total N'', F40.0)
  
  @item
-@code{UVALIDN} (``Unweighted Valid N'')
+@code{UVALIDN} (``Unweighted Valid N'', F40.0)
  
  @item
-@code{UVARIANCE} (``Unweighted Variance'')
+@code{UVARIANCE} (``Unweighted Variance'', F40.0)
  @end itemize
  
  @node CTABLES Statistics Positions and Labels
@@ -1501,8 +1520,8 @@ CTABLES /TABLE AgeGroup BY qns3a.
  
  @t{ROWLABELS=OPPOSITE} or @t{COLLABELS=OPPOSITE} move row or column
  variable category labels, respectively, to the opposite axis.  The
-setting affects only the innermost variable on the given axis.  For
-example:
+setting affects only the innermost variable or variables, which must
+be categorical, on the given axis.  For example:
  
  @example
  CTABLES /TABLE AgeGroup BY qns3a /CLABELS ROWLABELS=OPPOSITE.
@@ -1516,8 +1535,6 @@ column variable category labels, respectively, to the layer axis.
  Only one axis's labels may be moved, whether to the opposite axis or
  to the layer axis.
  
-@c TODO Moving category labels for stacked variables
-
  @subsubheading Effect on Summary Statistics
  
  @code{CLABELS} primarily affects the appearance of tables, not the
@@ -1545,6 +1562,23 @@ CTABLES
  @end example
  @psppoutput {ctables24}
  
+@subsubheading Moving Categories for Stacked Variables
+
+If @code{CLABELS} moves category labels from an axis with stacked
+variables, the variables that are moved must have the same category
+specifications (@pxref{CTABLES Per-Variable Category Options}) and the
+same value labels.
+
+The following shows both moving stacked category variables and
+adapting to the changing definitions of rows and columns:
+
+@example
+CTABLES /TABLE (qn105ba + qn105bb) [COLPCT].
+CTABLES /TABLE (qn105ba + qn105bb) [ROWPCT]
+  /CLABELS ROW=OPPOSITE.
+@end example
+@psppoutput {ctables25}
+
  @node CTABLES Per-Variable Category Options
  @subsection Per-Variable Category Options
  
@@ -1669,7 +1703,20 @@ variables may be ``totaled'' indirectly by enabling totals and
  subtotals on a categorical variable within which the scalar variable is
  summarized.
  
-@c TODO Specifying summaries for totals and subtotals
+By default, @pspp{} uses the same summary functions for totals and
+subtotals as for other categories.  To summarize totals and subtotals
+differently, specify the summary functions for totals and subtotals
+after the ordinary summary functions inside a nested set of @code{[]}
+following @code{TOTALS}.  For example, the following syntax displays
+@code{COUNT} for individual categories, and both @code{COUNT} and
+@code{VALIDN} for totals, as shown:
+
+@example
+CTABLES
+    /TABLE qnd7a [COUNT, TOTALS[COUNT, VALIDN]]
+    /CATEGORIES VARIABLES=qnd7a TOTAL=YES MISSING=INCLUDE.
+@end example
+@psppoutput {ctables26}
  
  @subsubheading Categories Without Values
  
@@ -1695,14 +1742,34 @@ or @code{OTHERNM}.
      [@t{CORNER=}@i{string}@dots{}]
  @end display
  
-@c TODO Describe substitution variables
-
  The @code{TITLES} subcommand sets the title, caption, and corner text
-for the table output for the previous @code{TABLE} subcommand.  The
-title appears above the table, the caption below the table, and the
-corner text appears in the table's upper left corner.  By default, the
-title is ``Custom Tables'' and the caption and corner text are empty.
-With some table output styles, the corner text is not displayed.
+for the table output for the previous @code{TABLE} subcommand.  Any
+number of strings may be specified for each kind of text, with each
+string appearing on a separate line in the output.  The title appears
+above the table, the caption below the table, and the corner text
+appears in the table's upper left corner.  By default, the title is
+``Custom Tables'' and the caption and corner text are empty.  With
+some table output styles, the corner text is not displayed.
+
+The strings provided in this subcommand may contain the following
+macro-like keywords that @pspp{} substitutes at the time that it runs
+the command:
+
+@table @code @c (
+@item )DATE
+The current date, e.g.@: MM/DD/YY.  The format is locale-dependent.
+
+@c (
+@item )TIME
+The current time, e.g.@: HH:MM:SS.  The format is locale-dependent.
+
+@c (
+@item )TABLE
+The expression specified on the @code{TABLE} subcommand.  Summary
+and measurement level specifications are omitted, and variable labels are used in place of variable names.
+@end table
+
+@c TODO example
  
  @node CTABLES Table Formatting
  @subsection Table Formatting
@@ -1775,6 +1842,73 @@ Show nothing.
  @node CTABLES Missing Value Treatment
  @subsection Missing Value Treatment
  
+The @code{TABLE} subcommand on @code{CTABLES} specifies two different
+kinds of variables: variables that divide tables into cells (which are
+always categorical) and variables being summarized (which may be
+categorical or scale).  @pspp{} treats missing values differently in
+each kind of variable:
+
+@itemize @bullet
+@item
+For variables that divide tables into cells, per-variable category
+options determine which data is analyzed.  If any of the categories
+for such a variable would exclude a case, then that case is not
+included.
+
+@item
+The treatment of missing values in variables being summarized varies
+between scale and categorical variables.  The following section
+describes their treatment in detail.
+
+By default, each summarized variable is considered separately for
+missing value treatment.  A section below describes how to consider
+missing values listwise for summarizing scale variables.
+@end itemize
+
+@node CTABLES Missing Values for Summary Variables
+@subsubsection Missing Values for Summary Variables
+
+For summary variables, values that are valid and in included
+categories are analyzed, and values that are missing or in excluded
+categories are not analyzed, with the following exceptions:
+
+@itemize @bullet
+@item
+The ``@t{VALIDN}'' summary functions (@code{VALIDN}, @code{EVALIDN},
+@code{UVALIDN}, @code{@i{area}PCT.VALIDN}, and
+@code{U@i{area}PCT.VALIDN}) only count valid values in included
+categories (not missing values in included categories).
+
+@item
+The ``@t{TOTALN}'' summary functions (@code{TOTALN}, @code{ETOTALN},
+@code{UTOTALN}, @code{@i{area}PCT.TOTALN}, and
+@code{U@i{area}PCT.TOTALN}) count all values (valid and missing) in
+included categories and missing (but not valid) values in excluded
+categories.
+@end itemize
+
+@noindent
+For categorical variables, system-missing values are never in included
+categories.  For scale variables, there is no notion of included and
+excluded categories, so all values are effectively included.
+
+The following table provides another view of the above rules:
+
+@multitable {@w{ }@w{ }@w{ }@w{ }Missing values in excluded categories} {@t{VALIDN}} {other} {@t{TOTALN}}
+@headitem @tab @t{VALIDN} @tab other @tab @t{TOTALN}
+@item @headitemfont{Categorical variables:}
+@item @w{ }@w{ }@w{ }@w{ }Valid values in included categories   @tab yes @tab yes @tab yes
+@item @w{ }@w{ }@w{ }@w{ }Missing values in included categories @tab --- @tab yes @tab yes
+@item @w{ }@w{ }@w{ }@w{ }Missing values in excluded categories @tab --- @tab --- @tab yes
+@item @w{ }@w{ }@w{ }@w{ }Valid values in excluded categories   @tab --- @tab --- @tab ---
+@item @headitemfont{Scale variables:}
+@item @w{ }@w{ }@w{ }@w{ }Valid values                          @tab yes @tab yes @tab yes
+@item @w{ }@w{ }@w{ }@w{ }User- or system-missing values        @tab --- @tab yes @tab yes
+@end multitable
+
+@node CTABLES Scale Missing Values
+@subsubsection Scale Missing Values
+
  @display
  @t{/SMISSING} @{@t{VARIABLE} @math{|} @t{LISTWISE}@}
  @end display
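
As a reading aid for the counting rules added in the hunk above, here is a minimal Python sketch (not PSPP code) that encodes the yes/no table from "Missing Values for Summary Variables". The function name `is_counted` and the family labels `"VALIDN"`, `"TOTALN"`, and `"OTHER"` are hypothetical names chosen for illustration. Scale variables are modeled by assuming every value is in an included category, which is how the added text describes them.

```python
def is_counted(family, valid, included=True):
    """Return True if a value contributes to a summary function.

    family:   "VALIDN", "TOTALN", or "OTHER" (hypothetical labels for the
              three table columns; not PSPP identifiers).
    valid:    True for a valid value, False for a missing one.
    included: True if the value falls in an included category.  For scale
              variables, leave this at True (assumption taken from the
              added text: all values are effectively included).
    """
    if family == "VALIDN":
        # Only valid values in included categories.
        return valid and included
    if family == "TOTALN":
        # Everything in included categories, plus missing values in
        # excluded categories.
        return included or not valid
    if family == "OTHER":
        # All other summary functions: anything in an included category.
        return included
    raise ValueError(f"unknown summary function family: {family!r}")


if __name__ == "__main__":
    rows = [
        ("valid, included", True, True),
        ("missing, included", False, True),
        ("missing, excluded", False, False),
        ("valid, excluded", True, False),
    ]
    for label, valid, included in rows:
        flags = [is_counted(f, valid, included)
                 for f in ("VALIDN", "OTHER", "TOTALN")]
        print(label, flags)
```

Running the script prints one row per line in the same column order as the table (`VALIDN`, other, `TOTALN`), which makes it easy to compare against the Texinfo source.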
@@ -1939,8 +2073,8 @@ By default, or with @code{HIDESOURCECATS=NO}, categories referred to
  by computed categories are displayed like other categories.  Use
  @code{HIDESOURCECATS=YES} to hide them.
  
-@node CTABLES Base Weight
-@subsection Base Weight
+@node CTABLES Effective Weight
+@subsection Effective Weight
  
  @display
  @t{/WEIGHT VARIABLE=}@i{variable}
@@ -1948,17 +2082,17 @@ by computed categories are displayed like other categories.  Use
  
  The @code{WEIGHT} subcommand is optional and must appear before
  @code{TABLE}.  If it appears, it must name a numeric variable, known
-as the @dfn{effective base weight} or @dfn{adjustment weight}.  The
-effective base weight variable stands in for the dictionary's weight
+as the @dfn{effective weight} or @dfn{adjustment weight}.  The
+effective weight variable stands in for the dictionary's weight
  variable (@pxref{WEIGHT}), if any, in most calculations in
  @code{CTABLES}.  The only exceptions are the @code{COUNT},
  @code{TOTALN}, and @code{VALIDN} summary functions, which use the
  dictionary weight instead.
  
  Weights obtained from the @pspp{} dictionary are rounded to the
-nearest integer at the case level.  Effective base weights are not
-rounded.  Regardless of the weighting source, @pspp{} does not analyze
-cases with zero, missing, or negative effective weights.
+nearest integer at the case level.  Effective weights are not rounded.
+Regardless of the weighting source, @pspp{} does not analyze cases
+with zero, missing, or negative effective weights.
  
  @node CTABLES Hiding Small Counts
  @subsection Hiding Small Counts
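
The special formats that this patch documents (`NEGPAREN`, `NEQUAL`, `PAREN`, `PCTPAREN`) can be approximated with a short Python sketch. This is an illustration only, not PSPP's implementation: the function name `format_special` is hypothetical, and the width of 9 with 2 decimals is an assumption inferred from the padding in the patch's example columns. Overflow handling and locale-specific decimal points are ignored.

```python
def format_special(kind, value, w=9, d=2):
    """Render value in one of the CTABLES special formats (sketch).

    kind: "NEGPAREN", "NEQUAL", "PAREN", or "PCTPAREN".
    w, d: total field width and number of decimals (assumed defaults).
    """
    signed = f"{value:.{d}f}"          # e.g. "-42.96"
    magnitude = f"{abs(value):.{d}f}"  # e.g. "42.96"
    if kind == "NEGPAREN":
        # Parentheses replace the minus sign for negative numbers only.
        text = f"({magnitude})" if value < 0 else magnitude
    elif kind == "NEQUAL":
        text = f"N={signed}"
    elif kind == "PAREN":
        text = f"({signed})"
    elif kind == "PCTPAREN":
        text = f"({signed}%)"
    else:
        raise ValueError(f"unknown format: {kind!r}")
    # Right-justify in the field, as the example columns suggest.
    return text.rjust(w)


if __name__ == "__main__":
    for kind in ("NEGPAREN", "NEQUAL", "PAREN", "PCTPAREN"):
        print(kind, repr(format_special(kind, 42.96)),
              repr(format_special(kind, -42.96)))
```

The printed pairs can be compared with the positive and negative example columns of the `@multitable` added in the hunk at line 1188.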