@c PSPP - a program for statistical analysis.
-@c Copyright (C) 2017 Free Software Foundation, Inc.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
* GRAPH:: Plot data.
* CORRELATIONS:: Correlation tables.
* CROSSTABS:: Crosstabulation tables.
+* CTABLES:: Custom tables.
* FACTOR:: Factor analysis and Principal Components analysis.
* GLM:: Univariate Linear Models.
* LOGISTIC REGRESSION:: Bivariate Logistic Regression.
* ONEWAY:: One way analysis of variance.
* QUICK CLUSTER:: K-Means clustering.
* RANK:: Compute rank scores.
-* REGRESSION:: Linear regression.
* RELIABILITY:: Reliability analysis.
* ROC:: Receiver Operating Characteristic.
@end menu
@end display
The @cmd{DESCRIPTIVES} procedure reads the active dataset and outputs
-descriptive
-statistics requested by the user. In addition, it can optionally
+linear descriptive statistics requested by the user. In addition, it can optionally
compute Z-scores.
The @subcmd{VARIABLES} subcommand, which is required, specifies the list of
The @subcmd{A} and @subcmd{D} settings request an ascending or descending
sort order, respectively.
+@subsection Descriptives Example
+
+The @file{physiology.sav} file contains various physiological data for a sample
+of persons. Running the @cmd{DESCRIPTIVES} command on the variables @exvar{height}
+and @exvar{temperature} with the default options allows one to see simple linear
+statistics for these two variables. In @ref{descriptives:ex}, these variables
+are specified on the @subcmd{VARIABLES} subcommand, and the @subcmd{SAVE} option
+is used to request that Z scores be calculated.
+
+After the command has completed, this example runs @cmd{DESCRIPTIVES} again, this
+time on the @exvar{zheight} and @exvar{ztemperature} variables,
+which are the two normalized (Z-score) variables generated by the
+first @cmd{DESCRIPTIVES} command.
+
+@float Example, descriptives:ex
+@psppsyntax {descriptives.sps}
+@caption {Running two @cmd{DESCRIPTIVES} commands, one with the @subcmd{SAVE} subcommand}
+@end float
+
+@float Screenshot, descriptives:scr
+@psppimage {descriptives}
+@caption {The Descriptives dialog box with two variables and Z-Scores option selected}
+@end float
+
+In @ref{descriptives:res}, we can see that there are 40 valid values for each of the variables
+and no missing values. The means of height and temperature are 16677.12
+and 37.02, respectively. The descriptive statistics for temperature seem reasonable.
+However, @exvar{height} has a very high standard deviation and a suspiciously
+low minimum. This is due to a data entry error
+(@pxref{Identifying incorrect data}).
+
+The output of the second @cmd{DESCRIPTIVES} command shows that the mean and standard
+deviation of both Z score variables are 0 and 1, respectively. All Z score variables
+should have these properties, since they are normalized versions of the original scores.
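+
+For reference, each Z score is computed from an original value @math{x}, the
+variable's mean @math{\bar{x}}, and its standard deviation @math{s} as
+@math{z = (x - \bar{x}) / s}.  For instance, a hypothetical temperature of 38
+for a variable with mean 37 and standard deviation 0.5 has a Z score of
+@math{(38 - 37) / 0.5 = 2}.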
+
+@float Result, descriptives:res
+@psppoutput {descriptives}
+@caption {Descriptive statistics including two normalized variables (Z-scores)}
+@end float
+
@node FREQUENCIES
@section FREQUENCIES
displayed slices to a given range of values.
The keyword @subcmd{NOMISSING} causes missing values to be omitted from the
piechart. This is the default.
-If instead, @subcmd{MISSING} is specified, then a single slice
-will be included representing all system missing and user-missing cases.
+If instead @subcmd{MISSING} is specified, then the pie chart includes
+a single slice representing all system-missing and user-missing cases.
@cindex bar chart
The @subcmd{BARCHART} subcommand produces a bar chart for each variable.
The @subcmd{MINIMUM} and @subcmd{MAXIMUM} keywords can be used to omit
-categories whose counts which lie outside the specified limits.
+categories whose counts lie outside the specified limits.
The @subcmd{FREQ} option (default) causes the ordinate to display the frequency
-of each category, whereas the @subcmd{PERCENT} option will display relative
+of each category, whereas the @subcmd{PERCENT} option displays relative
percentages.
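+
+As a sketch, the following command requests both a pie chart that includes a
+slice for missing values and a bar chart that shows relative percentages.  The
+variable name @var{rating} is hypothetical:
+
+@example
+FREQUENCIES
+        /VARIABLES = rating
+        /PIECHART = MISSING
+        /BARCHART = PERCENT.
+@end example
+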
The @subcmd{FREQ} and @subcmd{PERCENT} options on @subcmd{HISTOGRAM} and
The @subcmd{ORDER} subcommand is accepted but ignored.
+@subsection Frequencies Example
+
+@ref{frequencies:ex} runs a frequency analysis on the @exvar{sex}
+and @exvar{occupation} variables from the @file{personnel.sav} file.
+This is useful to get a general idea of the way in which these nominal
+variables are distributed.
+
+@float Example, frequencies:ex
+@psppsyntax {frequencies.sps}
+@caption {Running frequencies on the @exvar{sex} and @exvar{occupation} variables}
+@end float
+
+If you are using the graphical user interface, the dialog box is set up so that
+by default, several statistics are calculated. Some are not particularly useful
+for categorical variables, so you may want to disable those.
+
+@float Screenshot, frequencies:scr
+@psppimage {frequencies}
+@caption {The frequencies dialog box with the @exvar{sex} and @exvar{occupation} variables selected}
+@end float
+
+From @ref{frequencies:res} it is evident that there are 33 males, 21 females, and
+2 persons whose sex has not been entered.
+
+One can also see how many of each occupation there are in the data.
+When dealing with string variables used as nominal values, running a frequency
+analysis is useful to detect data entry errors. Notice that
+one @exvar{occupation} value has been mistyped as ``Scrientist''. This entry should
+be corrected or marked as missing before the data are used.
+
+@float Result, frequencies:res
+@psppoutput {frequencies}
+@caption {The relative frequencies of @exvar{sex} and @exvar{occupation}}
+@end float
+
@node EXAMINE
@section EXAMINE
@end display
Each unique combination of the values of @var{factorvar} and
@var{subfactorvar} divide the dataset into @dfn{cells}.
-Statistics will be calculated for each cell
+Statistics are calculated for each cell
and for the entire dataset (unless @subcmd{NOTOTAL} is given).
The @subcmd{STATISTICS} subcommand specifies which statistics to show.
-@subcmd{DESCRIPTIVES} will produce a table showing some parametric and
+@subcmd{DESCRIPTIVES} produces a table showing some parametric and
-non-parametrics statistics.
+non-parametric statistics.
@subcmd{EXTREME} produces a table showing the extremities of each cell.
A number in parentheses, @var{n} determines
The default number is 5.
The subcommands @subcmd{TOTAL} and @subcmd{NOTOTAL} are mutually exclusive.
-If @subcmd{TOTAL} appears, then statistics will be produced for the entire dataset
-as well as for each cell.
-If @subcmd{NOTOTAL} appears, then statistics will be produced only for the cells
+If @subcmd{TOTAL} appears, then statistics are produced for the entire dataset
+as well as for each cell.
+If @subcmd{NOTOTAL} appears, then statistics are produced only for the cells
(unless no factor variables have been given).
These subcommands have no effect if there have been no factor variables
specified.
@subcmd{SPREADLEVEL}.
The first three can be used to visualise how closely each cell conforms to a
normal distribution, whilst the spread vs.@: level plot can be useful to visualise
-how the variance of differs between factors.
-Boxplots will also show you the outliers and extreme values.
+how the variance differs between factors.
+Boxplots show you the outliers and extreme values.
@footnote{@subcmd{HISTOGRAM} uses Sturges' rule to determine the number of
-bins, as approximately @math{1 + \log2(n)}, where @math{n} is the number of samples.
+bins, as approximately @math{1 + \log_2(n)}, where @math{n} is the number of samples.
Note that @cmd{FREQUENCIES} uses a different algorithm to find the bin size.}
The @subcmd{SPREADLEVEL} plot displays the interquartile range versus the
median. It takes an optional parameter @var{t}, which specifies how the data
should be transformed prior to plotting.
-The given value @var{t} is a power to which the data is raised. For example, if
-@var{t} is given as 2, then the data will be squared.
+The given value @var{t} is a power to which the data are raised. For example, if
+@var{t} is given as 2, then the square of the data is used.
-Zero, however is a special value. If @var{t} is 0 or
+Zero, however, is a special value. If @var{t} is 0 or
-is omitted, then data will be transformed by taking its natural logarithm instead of
+is omitted, then the data are transformed by taking their natural logarithm instead of
raising to the power of @var{t}.
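+
+As a sketch, the following command, which uses hypothetical variables
+@var{height} and @var{gender}, requests a spread vs.@: level plot of
+@var{height} for each value of @var{gender}, with the data squared before
+plotting:
+
+@example
+EXAMINE height BY gender
+        /PLOT = SPREADLEVEL(2).
+@end example
+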
@cindex Shapiro-Wilk
If given, it should provide the name of a variable which is to be used
-to labels extreme values and outliers.
+to label extreme values and outliers.
Numeric or string variables are permissible.
-If the @subcmd{ID} subcommand is not given, then the case number will be used for
+If the @subcmd{ID} subcommand is not given, then the case number is used for
labelling.
The @subcmd{CINTERVAL} subcommand specifies the confidence interval to use in
factors specified then @subcmd{TOTAL} and @subcmd{NOTOTAL} have no effect.
-The following example will generate descriptive statistics and histograms for
+The following example generates descriptive statistics and histograms for
two variables @var{score1} and @var{score2}.
Two factors are given, @i{viz}: @var{gender} and @var{gender} BY @var{culture}.
-Therefore, the descriptives and histograms will be generated for each
+Therefore, the descriptives and histograms are generated for each
distinct value
of @var{gender} @emph{and} for each distinct combination of the values
-of @var{gender} and @var{race}.
+of @var{gender} and @var{culture}.
@end example
In this example, we look at the height and weight of a sample of individuals and
how they differ between male and female.
-A table showing the 3 largest and the 3 smallest values of @var{height} and
-@var{weight} for each gender, and for the whole dataset will be shown.
-Boxplots will also be produced.
-Because @subcmd{/COMPARE = GROUPS} was given, boxplots for male and female will be
-shown in the same graphic, allowing us to easily see the difference between
+A table shows the 3 largest and the 3 smallest values of @exvar{height} and
+@exvar{weight} for each gender and for the whole dataset.
+In addition, the @subcmd{/PLOT} subcommand requests boxplots.
+Because @subcmd{/COMPARE = GROUPS} was specified, boxplots for male and female are
+juxtaposed in the same graphic, allowing us to easily see the difference between
the genders.
-Since the variable @var{name} was specified on the @subcmd{ID} subcommand, this will be
-used to label the extreme values.
+Since the variable @var{name} was specified on the @subcmd{ID} subcommand,
+values of the @var{name} variable are used to label the extreme values.
@strong{Warning!}
-If many dependent variables are specified, or if factor variables are
-specified for which
-there are many distinct values, then @cmd{EXAMINE} will produce a very
+If you specify many dependent variables, or factor variables
+with many distinct values, then @cmd{EXAMINE} produces a very
large quantity of output.
@node GRAPH
@end display
-The @cmd{GRAPH} produces graphical plots of data. Only one of the subcommands
-@subcmd{HISTOGRAM} or @subcmd{SCATTERPLOT} can be specified, i.e. only one plot
+The @cmd{GRAPH} command produces graphical plots of data. Only one of the subcommands
+@subcmd{HISTOGRAM}, @subcmd{BAR} or @subcmd{SCATTERPLOT} can be specified, @i{i.e.} only one plot
-can be produced per call of @cmd{GRAPH}. The @subcmd{MISSING} is optional.
+can be produced per call of @cmd{GRAPH}. The @subcmd{MISSING} subcommand is optional.
@menu
@cindex scatterplot
The subcommand @subcmd{SCATTERPLOT} produces an xy plot of the
-data. The different values of the optional third variable @var{var3}
-will result in different colours and/or markers for the plot. The
-following is an example for producing a scatterplot.
+data.
+@cmd{GRAPH} uses the third variable @var{var3}, if specified, to determine
+the colours and/or markers for the plot.
+The following example produces a scatterplot.
@example
GRAPH
/SCATTERPLOT = @var{height} WITH @var{weight} BY @var{gender}.
@end example
-This example will produce a scatterplot where @var{height} is plotted versus @var{weight}. Depending
+This example produces a scatterplot where @var{height} is plotted versus @var{weight}. Depending
on the value of the @var{gender} variable, the colour of the datapoint is different. With
-this plot it is possible to analyze gender differences for @var{height} vs.@: @var{weight} relation.
+this plot it is possible to analyze gender differences in the relationship between @var{height} and @var{weight}.
@node HISTOGRAM
@subsection Histogram
The @cmd{CORRELATIONS} procedure produces tables of the Pearson correlation coefficient
-for a set of variables. The significance of the coefficients are also given.
+for a set of variables. The significance of the coefficients is also given.
-At least one @subcmd{VARIABLES} subcommand is required. If the @subcmd{WITH}
-keyword is used, then a non-square correlation table will be produced.
-The variables preceding @subcmd{WITH}, will be used as the rows of the table,
-and the variables following will be the columns of the table.
-If no @subcmd{WITH} subcommand is given, then a square, symmetrical table using all variables is produced.
-
+At least one @subcmd{VARIABLES} subcommand is required. If you specify the @subcmd{WITH}
+keyword, then a non-square correlation table is produced.
+The variables preceding @subcmd{WITH} are used as the rows of the table,
+and the variables following @subcmd{WITH} are used as the columns of the table.
+If the @subcmd{WITH} keyword is not specified, then @cmd{CORRELATIONS} produces a
+square, symmetrical table using all variables.
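+
+For instance, in the following sketch, which uses hypothetical variable names,
+the first command produces a square table of all three variables, whereas the
+second produces a table whose rows are @var{height} and @var{weight} and whose
+only column is @var{age}:
+
+@example
+CORRELATIONS /VARIABLES = height weight age.
+CORRELATIONS /VARIABLES = height weight WITH age.
+@end example
+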
-The @cmd{MISSING} subcommand determines the handling of missing variables.
+The @subcmd{MISSING} subcommand determines the handling of missing values.
If @subcmd{INCLUDE} is set, then user-missing values are included in the
The @subcmd{STATISTICS} subcommand requests additional statistics to be displayed. The keyword
@subcmd{DESCRIPTIVES} requests that the mean, number of non-missing cases, and the non-biased
estimator of the standard deviation are displayed.
-These statistics will be displayed in a separated table, for all the variables listed
+These statistics are displayed in a separate table, for all the variables listed
in any @subcmd{/VARIABLES} subcommand.
The @subcmd{XPROD} keyword requests cross-product deviations and covariance estimators to
be displayed for each pair of variables.
CROSSTABS
/TABLES=@var{var_list} BY @var{var_list} [BY @var{var_list}]@dots{}
/MISSING=@{TABLE,INCLUDE,REPORT@}
- /WRITE=@{NONE,CELLS,ALL@}
/FORMAT=@{TABLES,NOTABLES@}
- @{PIVOT,NOPIVOT@}
@{AVALUE,DVALUE@}
- @{NOINDEX,INDEX@}
- @{BOX,NOBOX@}
/CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
ASRESIDUAL,ALL,NONE@}
/COUNT=@{ASIS,CASE,CELL@}
integer mode, user-missing values are included in tables but marked with
a footnote and excluded from statistical calculations.
-Currently the @subcmd{WRITE} subcommand is ignored.
-
The @subcmd{FORMAT} subcommand controls the characteristics of the
crosstabulation tables to be displayed. It has a number of possible
settings:
@itemize @w{}
@item
@subcmd{TABLES}, the default, causes crosstabulation tables to be output.
-@subcmd{NOTABLES} suppresses them.
-
-@item
-@subcmd{PIVOT}, the default, causes each @subcmd{TABLES} subcommand to be displayed in a
-pivot table format. @subcmd{NOPIVOT} causes the old-style crosstabulation format
-to be used.
+@subcmd{NOTABLES}, which is equivalent to @code{CELLS=NONE}, suppresses them.
@item
@subcmd{AVALUE}, the default, causes values to be sorted in ascending order.
@subcmd{DVALUE} asserts a descending sort order.
-
-@item
-@subcmd{INDEX} and @subcmd{NOINDEX} are currently ignored.
-
-@item
-@subcmd{BOX} and @subcmd{NOBOX} is currently ignored.
@end itemize
The @subcmd{CELLS} subcommand controls the contents of each cell in the displayed
@samp{/CELLS} without any settings specified requests @subcmd{COUNT}, @subcmd{ROW},
@subcmd{COLUMN}, and @subcmd{TOTAL}.
If @subcmd{CELLS} is not specified at all then only @subcmd{COUNT}
-will be selected.
+is selected.
By default, crosstabulation and statistics use raw case weights,
without rounding. Use the @subcmd{/COUNT} subcommand to perform
@table @asis
@item CHISQ
-@cindex chisquare
@cindex chi-square
Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
The @samp{/BARCHART} subcommand produces a clustered bar chart for the first two
variables on each table.
If a table has more than two variables, the counts for the third and subsequent levels
-will be aggregated and the chart will be produces as if there were only two variables.
+are aggregated and the chart is produced as if there were only two variables.
@strong{Please note:} Currently the implementation of @cmd{CROSSTABS} has the
Fixes for any of these deficiencies would be welcomed.
+@subsection Crosstabs Example
+
+@cindex chi-square test of independence
+
+A researcher wishes to know whether, in an industry, a person's sex is related to
+the person's occupation. She has determined that @file{personnel.sav} is a
+representative, randomly selected sample of persons.
+The researcher's null hypothesis is that a person's sex has no relation to a
+person's occupation. She uses a chi-squared test of independence to investigate
+the hypothesis.
+
+@float Example, crosstabs:ex
+@psppsyntax {crosstabs.sps}
+@caption {Running crosstabs on the @exvar{sex} and @exvar{occupation} variables}
+@end float
+
+The syntax in @ref{crosstabs:ex} conducts a chi-squared test of independence.
+The line @code{/tables = occupation by sex} indicates that @exvar{occupation}
+and @exvar{sex} are the variables to be tabulated. To do this using the @gui{},
+place these variable names in the @samp{Row} and
+@samp{Column} fields, respectively, as shown in @ref{crosstabs:scr}.
+
+@float Screenshot, crosstabs:scr
+@psppimage {crosstabs}
+@caption {The Crosstabs dialog box with the @exvar{sex} and @exvar{occupation} variables selected}
+@end float
+
+Similarly, the @samp{Cells} button shows a dialog box to select the @code{count}
+and @code{expected} options. All other cell options can be deselected for this
+test.
+
+You would use the @samp{Format} and @samp{Statistics} buttons to select options
+for the @subcmd{FORMAT} and @subcmd{STATISTICS} subcommands. In this example,
+the @samp{Statistics} dialog requires only the @samp{Chisq} option to be checked. All
+other options should be unchecked. No special settings are required from the
+@samp{Format} dialog.
+
+As shown in @ref{crosstabs:res}, @cmd{CROSSTABS} generates a contingency table
+containing the observed count and the expected count of each sex and each
+occupation. The expected count is the count which would be observed if the
+null hypothesis were true.
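+
+The expected count of a cell is the product of the cell's row total and column
+total, divided by the total number of cases.  For example, with hypothetical
+totals of 20 cases in a row, 30 cases in a column, and 60 cases overall,
+the expected count of the cell where they meet is 20 times 30 divided by 60,
+that is, 10.  The Pearson chi-square statistic is the sum, over all cells, of
+@math{(O - E)^2 / E}, where @math{O} and @math{E} are the cell's observed and
+expected counts.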
+
+The significance of the Pearson Chi-Square value is much larger than the
+conventional significance level of 0.05, so one cannot reject the null hypothesis.
+Thus the researcher cannot conclude that a person's sex is related to the
+person's occupation.
+
+@float Results, crosstabs:res
+@psppoutput {crosstabs}
+@caption {The results of a test of independence between @exvar{sex} and @exvar{occupation}}
+@end float
+
+@node CTABLES
+@section CTABLES
+
+@vindex CTABLES
+@cindex custom tables
+@cindex tables, custom
+
+@code{CTABLES} has the following overall syntax. At least one
+@code{TABLE} subcommand is required:
+
+@display
+@t{CTABLES}
+ @dots{}@i{global subcommands}@dots{}
+ [@t{/TABLE} @i{axis} [@t{BY} @i{axis} [@t{BY} @i{axis}]]
+ @dots{}@i{per-table subcommands}@dots{}]@dots{}
+@end display
+
+@noindent
+where each @i{axis} may be empty or take one of the following forms:
+
+@display
+@i{variable}
+@i{variable} @t{[}@{@t{C} @math{|} @t{S}@}@t{]}
+@i{axis} + @i{axis}
+@i{axis} > @i{axis}
+(@i{axis})
+@i{axis} @t{[}@i{summary} [@i{string}] [@i{format}]@t{]}
+@end display
+
+The following subcommands precede the first @code{TABLE} subcommand
+and apply to all of the output tables. All of these subcommands are
+optional:
+
+@display
+@t{/FORMAT}
+ [@t{MINCOLWIDTH=}@{@t{DEFAULT} @math{|} @i{width}@}]
+ [@t{MAXCOLWIDTH=}@{@t{DEFAULT} @math{|} @i{width}@}]
+ [@t{UNITS=}@{@t{POINTS} @math{|} @t{INCHES} @math{|} @t{CM}@}]
+ [@t{EMPTY=}@{@t{ZERO} @math{|} @t{BLANK} @math{|} @i{string}@}]
+ [@t{MISSING=}@i{string}]
+@t{/VLABELS}
+ @t{VARIABLES=}@i{variables}
+ @t{DISPLAY}=@{@t{DEFAULT} @math{|} @t{NAME} @math{|} @t{LABEL} @math{|} @t{BOTH} @math{|} @t{NONE}@}
+@ignore @c not yet implemented
+@t{/MRSETS COUNTDUPLICATES=}@{@t{YES} @math{|} @t{NO}@}
+@end ignore
+@t{/SMISSING} @{@t{VARIABLE} @math{|} @t{LISTWISE}@}
+@t{/PCOMPUTE} @t{&}@i{postcompute}@t{=EXPR(}@i{expression}@t{)}
+@t{/PPROPERTIES} @t{&}@i{postcompute}@dots{}
+ [@t{LABEL=}@i{string}]
+ [@t{FORMAT=}[@i{summary} @i{format}]@dots{}]
+ [@t{HIDESOURCECATS=}@{@t{NO} @math{|} @t{YES}@}]
+@t{/WEIGHT VARIABLE=}@i{variable}
+@t{/HIDESMALLCOUNTS COUNT=@i{count}}
+@end display
+
+The following subcommands follow @code{TABLE} and apply only to the
+previous @code{TABLE}. All of these subcommands are optional:
+
+@display
+@t{/SLABELS}
+ [@t{POSITION=}@{@t{COLUMN} @math{|} @t{ROW} @math{|} @t{LAYER}@}]
+ [@t{VISIBLE=}@{@t{YES} @math{|} @t{NO}@}]
+@t{/CLABELS} @{@t{AUTO} @math{|} @{@t{ROWLABELS}@math{|}@t{COLLABELS}@}@t{=}@{@t{OPPOSITE}@math{|}@t{LAYER}@}@}
+@t{/CATEGORIES} @t{VARIABLES=}@i{variables}
+ @{@t{[}@i{value}@t{,} @i{value}@dots{}@t{]}
+ @math{|} [@t{ORDER=}@{@t{A} @math{|} @t{D}@}]
+ [@t{KEY=}@{@t{VALUE} @math{|} @t{LABEL} @math{|} @i{summary}@t{(}@i{variable}@t{)}@}]
+ [@t{MISSING=}@{@t{EXCLUDE} @math{|} @t{INCLUDE}@}]@}
+ [@t{TOTAL=}@{@t{NO} @math{|} @t{YES}@} [@t{LABEL=}@i{string}] [@t{POSITION=}@{@t{AFTER} @math{|} @t{BEFORE}@}]]
+ [@t{EMPTY=}@{@t{INCLUDE} @math{|} @t{EXCLUDE}@}]
+@t{/TITLES}
+ [@t{TITLE=}@i{string}@dots{}]
+ [@t{CAPTION=}@i{string}@dots{}]
+ [@t{CORNER=}@i{string}@dots{}]
+@ignore @c not yet implemented
+@t{/CRITERIA CILEVEL=}@i{percentage}
+@t{/SIGTEST TYPE=CHISQUARE}
+ [@t{ALPHA=}@i{siglevel}]
+ [@t{INCLUDEMRSETS=}@{@t{YES} @math{|} @t{NO}@}]
+ [@t{CATEGORIES=}@{@t{ALLVISIBLE} @math{|} @t{SUBTOTALS}@}]
+@t{/COMPARETEST TYPE=}@{@t{PROP} @math{|} @t{MEAN}@}
+ [@t{ALPHA=}@i{value}[@t{,} @i{value}]]
+ [@t{ADJUST=}@{@t{BONFERRONI} @math{|} @t{BH} @math{|} @t{NONE}@}]
+ [@t{INCLUDEMRSETS=}@{@t{YES} @math{|} @t{NO}@}]
+ [@t{MEANSVARIANCE=}@{@t{ALLCATS} @math{|} @t{TESTEDCATS}@}]
+ [@t{CATEGORIES=}@{@t{ALLVISIBLE} @math{|} @t{SUBTOTALS}@}]
+ [@t{MERGE=}@{@t{NO} @math{|} @t{YES}@}]
+ [@t{STYLE=}@{@t{APA} @math{|} @t{SIMPLE}@}]
+ [@t{SHOWSIG=}@{@t{NO} @math{|} @t{YES}@}]
+@end ignore
+@end display
+
+The @code{CTABLES} (aka ``custom tables'') command produces
+multi-dimensional tables from categorical and scale data. It offers
+many options for data summarization and formatting.
+
+This section's examples use data from the 2008 (USA) National Survey
+of Drinking and Driving Attitudes and Behaviors, a public domain data
+set from the (USA) National Highway Traffic Safety Administration and
+available at @url{https://data.transportation.gov}. @pspp{} includes
+this data set, with a slightly modified dictionary, as
+@file{examples/nhtsa.sav}.
+
+@node CTABLES Basics
+@subsection Basics
+
+The only required subcommand is @code{TABLE}, which specifies the
+variables to include along each axis:
+@display
+@t{/TABLE} @i{rows} [@t{BY} @i{columns} [@t{BY} @i{layers}]]
+@end display
+@noindent
+In @code{TABLE}, each of @var{rows}, @var{columns}, and @var{layers}
+is either empty or an axis expression that specifies one or more
+variables. At least one must specify an axis expression.
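+
+As a sketch, the following command specifies all three axes.  It should
+produce a crosstabulation of age group by gender in a separate layer for
+each value of @code{region}:
+
+@example
+CTABLES /TABLE=AgeGroup BY qns3a BY region.
+@end example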
+
+@node CTABLES Categorical Variable Basics
+@subsubsection Categorical Variables
+
+An axis expression that names a categorical variable divides the data
+into cells according to the values of that variable. When all the
+variables named on @code{TABLE} are categorical, by default each cell
+displays the number of cases that it contains, so specifying a single
+variable yields a frequency table, much like the output of the
+@code{FREQUENCIES} command (@pxref{FREQUENCIES}):
+
+@example
+CTABLES /TABLE=AgeGroup.
+@end example
+@psppoutput {ctables1}
+
+@noindent
+Specifying a row and a column categorical variable yields a
+crosstabulation, much like the output of the @code{CROSSTABS} command
+(@pxref{CROSSTABS}):
+
+@example
+CTABLES /TABLE=AgeGroup BY qns3a.
+@end example
+@psppoutput {ctables2}
+
+@noindent
+The @samp{>} ``nesting'' operator nests multiple variables on a single
+axis, e.g.:
+
+@example
+CTABLES /TABLE qn105ba BY AgeGroup > qns3a.
+@end example
+@psppoutput {ctables3}
+
+@noindent
+The @samp{+} ``stacking'' operator allows a single output table to
+include multiple data analyses. With @samp{+}, @code{CTABLES} divides
+the output table into multiple @dfn{sections}, each of which includes
+an analysis of the full data set. For example, the following command
+separately tabulates age group and driving frequency by gender:
+
+@example
+CTABLES /TABLE AgeGroup + qn1 BY qns3a.
+@end example
+@psppoutput {ctables4}
+
+@noindent
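+When @samp{+} and @samp{>} are used together, @samp{>} binds more
+tightly. Use parentheses to override operator precedence. Thus, in the
+first command below, @code{qns3a} is nested only under @code{qn27}, whereas
+in the second it is nested under both @code{qn26} and @code{qn27}: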
+
+@example
+CTABLES /TABLE qn26 + qn27 > qns3a.
+CTABLES /TABLE (qn26 + qn27) > qns3a.
+@end example
+@psppoutput {ctables5}
+
+@node CTABLES Scalar Variable Basics
+@subsubsection Scalar Variables
+
+For a categorical variable, @code{CTABLES} divides the table into a
+cell per category. For a scalar variable, @code{CTABLES} instead
+calculates a summary measure, by default the mean, of the values that
+fall into a cell. For example, if the only variable specified is a
+scalar variable, then the output is a single cell that holds the mean
+of all of the data:
+
+@example
+CTABLES /TABLE qnd1.
+@end example
+@psppoutput {ctables6}
+
+A scalar variable may nest with categorical variables. The following
+example shows the mean age of survey respondents across gender and
+language groups:
+
+@example
+CTABLES /TABLE qns3a > qnd1 BY region.
+@end example
+@psppoutput {ctables7}
+
+The order of nesting of scalar and categorical variables affects table
+labeling, but it does not affect the data displayed in the table. The
+following example shows how the output changes when the nesting order
+of the scalar and categorical variables are interchanged:
+
+@example
+CTABLES /TABLE qnd1 > qns3a BY region.
+@end example
+@psppoutput {ctables8}
+
+Only a single scalar variable may appear in each section; that is, a
+scalar variable may not nest inside a scalar variable directly or
+indirectly. Scalar variables may only appear on one axis within
+@code{TABLE}.
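+
+For example, @code{qnd1} and @code{qn20} are both scalar variables by
+default, so both of the following commands should be rejected: the first
+nests one scalar variable inside another, and the second places scalar
+variables on two different axes.
+
+@example
+CTABLES /TABLE qnd1 > qn20.
+CTABLES /TABLE qnd1 BY qn20.
+@end example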
+
+@node CTABLES Overriding Measurement Level
+@subsubsection Overriding Measurement Level
+
+By default, @code{CTABLES} uses a variable's measurement level to
+decide whether to treat it as categorical or scalar. Variables
+assigned the nominal or ordinal measurement level are treated as
+categorical, and variables with the scale measurement level are treated as scalar.
+
+When @pspp{} reads data from a file in an external format, such as a
+text file, variables' measurement levels are often unknown. If
+@code{CTABLES} runs when a variable has an unknown measurement level,
+it makes an initial pass through the data to guess measurement levels
+using the rules described earlier in this manual (@pxref{Measurement
+Level}). Use the @code{VARIABLE LEVEL} command to set or change a
+variable's measurement level (@pxref{VARIABLE LEVEL}).
+
+To treat a variable as categorical or scalar only for one use on
+@code{CTABLES}, add @samp{[C]} or @samp{[S]}, respectively, after the
+variable name. The following example shows the output when variable
+@code{qn20} is analyzed as scalar (the default for its measurement
+level) and as categorical:
+
+@example
+CTABLES
+ /TABLE qn20 BY qns3a
+ /TABLE qn20 [C] BY qns3a.
+@end example
+@psppoutput {ctables9}
+
+@ignore
+@node CTABLES Multiple Response Sets
+@subsubheading Multiple Response Sets
+
+The @code{CTABLES} command does not yet support multiple response
+sets.
+@end ignore
+
+@node CTABLES Data Summarization
+@subsection Data Summarization
+
+The @code{CTABLES} command allows the user to control how the data are
+summarized with @dfn{summary specifications}: lists of one or more summary
+function names, optionally separated by commas, enclosed in square brackets
+following a variable name on the
+@code{TABLE} subcommand. When all the variables are categorical,
+summary specifications can be given for the innermost nested variables
+on any one axis. When a scalar variable is present, only the scalar
+variable may have summary specifications.
+
+The following example includes a summary specification for column and
+row percentages for categorical variables, and mean and median for a
+scalar variable:
+
+@example
+CTABLES
+ /TABLE=qnd1 [MEAN, MEDIAN] BY qns3a
+ /TABLE=AgeGroup [COLPCT, ROWPCT] BY qns3a.
+@end example
+@psppoutput {ctables10}
+
+A summary specification may override the default label and format by
+appending a string or format specification or both (in that order) to
+the summary function name. For example:
+
+@example
+CTABLES /TABLE=AgeGroup [COLPCT 'Gender %' PCT5.0,
+ ROWPCT 'Age Group %' PCT5.0]
+ BY qns3a.
+@end example
+@psppoutput {ctables11}
+
+In addition to the standard formats, @code{CTABLES} allows the user to
+specify the following special formats:
+
+@multitable {@code{NEGPAREN@i{w}.@i{d}}} {Encloses all numbers in parentheses.} {@t{(42.96%)}} {@t{(-42.96%)}}
+@item @code{NEGPAREN@i{w}.@i{d}}
+@tab Encloses negative numbers in parentheses.
+@tab @t{@w{ }42.96}
+@tab @t{@w{ }(42.96)}
+
+@item @code{NEQUAL@i{w}.@i{d}}
+@tab Adds a @code{N=} prefix.
+@tab @t{@w{ }N=42.96}
+@tab @t{@w{ }N=-42.96}
+
+@item @code{PAREN@i{w}.@i{d}}
+@tab Encloses all numbers in parentheses.
+@tab @t{@w{ }(42.96)}
+@tab @t{@w{ }(-42.96)}
+
+@item @code{PCTPAREN@i{w}.@i{d}}
+@tab Encloses all numbers in parentheses with a @samp{%} suffix.
+@tab @t{@w{ }(42.96%)}
+@tab @t{(-42.96%)}
+@end multitable
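+
+As a sketch, the following command applies @code{PCTPAREN} to column
+percentages, so that each percentage should appear in parentheses with a
+@samp{%} suffix:
+
+@example
+CTABLES /TABLE=AgeGroup [COLPCT PCTPAREN8.1] BY qns3a.
+@end example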
+
+Parentheses provide a shorthand to apply summary specifications to
+multiple variables. For example, both of these commands:
+
+@example
+CTABLES /TABLE=AgeGroup[COLPCT] + qns1[COLPCT] BY qns3a.
+CTABLES /TABLE=(AgeGroup + qns1)[COLPCT] BY qns3a.
+@end example
+
+@noindent
+produce the same output shown below:
+
+@psppoutput {ctables12}
+
+The following sections list the available summary functions. Each
+function's name is followed by its default label and format. If no
+format is listed, then the default format is the print format for the
+variable being summarized.
+
+@node CTABLES Summary Functions for Individual Cells
+@subsubsection Summary Functions for Individual Cells
+
+This section lists the summary functions that consider only an
+individual cell in @code{CTABLES}. Only one such summary function,
+@code{COUNT}, may be applied to both categorical and scale variables:
+
+@table @asis
+@item @code{COUNT} (``Count'', F40.0)
+The sum of weights in a cell.
+
+If @code{CATEGORIES} for one or more of the variables in a table
+include missing values (@pxref{CTABLES Per-Variable Category
+Options}), then some or all of the categories for a cell might be
+missing values. @code{COUNT} counts data included in a cell
+regardless of whether its categories are missing.
+@end table
+
+The following summary functions apply only to scale variables or
+totals and subtotals for categorical variables. Be cautious about
+interpreting the summary value in the latter case, because it is not
+necessarily meaningful; however, the mean of a Likert scale, etc.@:
+may have a straightforward interpretation.
+
+@table @asis
+@item @code{MAXIMUM} (``Maximum'')
+The largest value.
+
+@item @code{MEAN} (``Mean'')
+The mean.
+
+@item @code{MEDIAN} (``Median'')
+The median value.
+
+@item @code{MINIMUM} (``Minimum'')
+The smallest value.
+
+@item @code{MISSING} (``Missing'')
+Sum of weights of user- and system-missing values.
+
+@item @code{MODE} (``Mode'')
+The highest-frequency value. Ties are broken by taking the smallest mode.
+
+@item @code{PTILE} @i{n} (``Percentile @i{n}'')
+The @var{n}th percentile, where @math{0 @leq{} @var{n} @leq{} 100}.
+
+@item @code{RANGE} (``Range'')
+The maximum minus the minimum.
+
+@item @code{SEMEAN} (``Std Error of Mean'')
+The standard error of the mean.
+
+@item @code{STDDEV} (``Std Deviation'')
+The standard deviation.
+
+@item @code{SUM} (``Sum'')
+The sum.
+
+@item @code{TOTALN} (``Total N'', F40.0)
+The sum of weights in a cell.
+
+For scale data, @code{COUNT} and @code{TOTALN} are the same.
+
+For categorical data, @code{TOTALN} counts missing values in excluded
+categories, that is, user-missing values not in an explicit category
+list on @code{CATEGORIES} (@pxref{CTABLES Per-Variable Category
+Options}), or user-missing values excluded because
+@code{MISSING=EXCLUDE} is in effect on @code{CATEGORIES}, or
+system-missing values. @code{COUNT} does not count these.
+
+@item @code{VALIDN} (``Valid N'', F40.0)
+The sum of valid count weights in included categories.
+
+@code{VALIDN} does not count missing values regardless of whether they
+are in included categories via @code{CATEGORIES}. @code{VALIDN} does
+not count valid values that are in excluded categories.
+
+@item @code{VARIANCE} (``Variance'')
+The variance.
+@end table
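
The @code{MODE} tie-breaking rule can be made concrete with a short Python sketch. It is only an illustration, not PSPP's implementation. @code{weighted_mode} is a hypothetical helper that sums case weights per value for a single cell. Among the values that share the highest total weight, it returns the smallest.

```python
from collections import defaultdict


def weighted_mode(cases):
    """Return the highest-frequency value among (value, weight) pairs.

    Frequencies are sums of case weights; ties go to the smallest value,
    mirroring the MODE rule described above.
    """
    totals = defaultdict(float)
    for value, weight in cases:
        totals[value] += weight
    best = max(totals.values())
    # Among all values that reach the maximum frequency, keep the smallest.
    return min(v for v, w in totals.items() if w == best)


# 3 and 7 both have total weight 2.0, so the smaller value, 3, wins.
print(weighted_mode([(7, 1.0), (3, 2.0), (7, 1.0), (5, 1.5)]))
```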
+
+@node CTABLES Summary Functions for Groups of Cells
+@subsubsection Summary Functions for Groups of Cells
+
+These summary functions summarize over multiple cells within an area
+of the output chosen by the user and specified as part of the function
+name. The following basic @var{area}s are supported, in decreasing
+order of size:
+
+@table @code
+@item TABLE
+A @dfn{section}. Stacked variables divide sections of the output from
+each other. Sections may span multiple layers.
+
+@item LAYER
+A section within a single layer.
+
+@item SUBTABLE
+A @dfn{subtable}, whose contents are the cells that pair an innermost
+row variable and an innermost column variable within a single layer.
+@end table
+
+The following shows how the output for the table expression @code{qn61
+> qn57 BY qnd7a > qn86 + qn64b BY qns3a}@footnote{This is not
+necessarily a meaningful table, so for clarity variable labels are
+omitted.} is divided up into @code{TABLE}, @code{LAYER}, and
+@code{SUBTABLE} areas. Each unique value for Table ID is one section,
+and similarly for Layer ID and Subtable ID. Thus, this output has two
+@code{TABLE} areas (one for @code{qnd7a} and one for @code{qn64b}),
+four @code{LAYER} areas (for those two variables, per layer), and 12
+@code{SUBTABLE} areas.
+@psppoutput {ctables22}
+
+@code{CTABLES} also supports the following @var{area}s that further
+divide a subtable or a layer within a section:
+
+@table @code
+@item LAYERROW
+@itemx LAYERCOL
+A row or column, respectively, in one layer of a section.
+
+@item ROW
+@itemx COL
+A row or column, respectively, in a subtable.
+@end table
+
+The following summary functions for groups of cells are available for
+each @var{area} described above, for both categorical and scale
+variables:
+
+@table @asis
+@item @code{@i{area}PCT} or @code{@i{area}PCT.COUNT} (``@i{Area} %'', PCT40.1)
+A percentage of total counts within @var{area}.
+
+@item @code{@i{area}PCT.VALIDN} (``@i{Area} Valid N %'', PCT40.1)
+A percentage of total counts for valid values within @var{area}.
+
+@item @code{@i{area}PCT.TOTALN} (``@i{Area} Total N %'', PCT40.1)
+A percentage of total counts for all values within @var{area}.
+@end table
+
+Scale variables and totals and subtotals for categorical variables may
+use the following additional group cell summary function:
+
+@table @asis
+@item @code{@i{area}PCT.SUM} (``@i{Area} Sum %'', PCT40.1)
+Percentage of the sum of the values within @var{area}.
+@end table
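
The following Python sketch gives a rough idea of how the choice of @var{area} changes the denominator. It computes row, column, and subtable percentages of counts for one subtable. It assumes a plain grid of weighted counts and ignores layers, stacking, totals, and missing-value rules. The helper name @code{area_percentages} is hypothetical.

```python
def area_percentages(counts):
    """Compute ROWPCT, COLPCT, and SUBTABLEPCT for a grid of cell counts.

    counts[r][c] is the (weighted) COUNT in row r, column c of a single
    subtable.  Returns three grids of percentages in the same shape.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)

    def pct(part, whole):
        # This sketch reports 0.0 for an empty area; PSPP may differ.
        return 100.0 * part / whole if whole else 0.0

    rowpct = [[pct(x, row_totals[r]) for x in row]
              for r, row in enumerate(counts)]
    colpct = [[pct(x, col_totals[c]) for c, x in enumerate(row)]
              for row in counts]
    subpct = [[pct(x, grand_total) for x in row] for row in counts]
    return rowpct, colpct, subpct


rowpct, colpct, subpct = area_percentages([[10, 30], [30, 30]])
print(rowpct[0])  # [25.0, 75.0]
print(colpct[0])  # [25.0, 50.0]
print(subpct[0])  # [10.0, 30.0]
```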
+
+@node CTABLES Summary Functions for Adjusted Weights
+@subsubsection Summary Functions for Adjusted Weights
+
+If the @code{WEIGHT} subcommand specified an adjustment weight
+variable, then the following summary functions use its value instead
+of the dictionary weight variable. Otherwise, they are equivalent to
+the summary function without the @samp{E}-prefix:
+
+@itemize @bullet
+@item
+@code{ECOUNT} (``Adjusted Count'', F40.0)
+
+@item
+@code{ETOTALN} (``Adjusted Total N'', F40.0)
+
+@item
+@code{EVALIDN} (``Adjusted Valid N'', F40.0)
+@end itemize
+
+@node CTABLES Unweighted Summary Functions
+@subsubsection Unweighted Summary Functions
+
+The following summary functions with a @samp{U}-prefix are equivalent
+to the same ones without the prefix, except that they use unweighted
+counts:
+
+@itemize @bullet
+@item
+@code{UCOUNT} (``Unweighted Count'', F40.0)
+
+@item
+@code{U@i{area}PCT} or @code{U@i{area}PCT.COUNT} (``Unweighted @i{Area} %'', PCT40.1)
+
+@item
+@code{U@i{area}PCT.VALIDN} (``Unweighted @i{Area} Valid N %'', PCT40.1)
+
+@item
+@code{U@i{area}PCT.TOTALN} (``Unweighted @i{Area} Total N %'', PCT40.1)
+
+@item
+@code{UMEAN} (``Unweighted Mean'')
+
+@item
+@code{UMEDIAN} (``Unweighted Median'')
+
+@item
+@code{UMISSING} (``Unweighted Missing'')
+
+@item
+@code{UMODE} (``Unweighted Mode'')
+
+@item
+@code{U@i{area}PCT.SUM} (``Unweighted @i{Area} Sum %'', PCT40.1)
+
+@item
+@code{UPTILE} @i{n} (``Unweighted Percentile @i{n}'')
+
+@item
+@code{USEMEAN} (``Unweighted Std Error of Mean'')
+
+@item
+@code{USTDDEV} (``Unweighted Std Deviation'')
+
+@item
+@code{USUM} (``Unweighted Sum'')
+
+@item
+@code{UTOTALN} (``Unweighted Total N'', F40.0)
+
+@item
+@code{UVALIDN} (``Unweighted Valid N'', F40.0)
+
+@item
+@code{UVARIANCE} (``Unweighted Variance'', F40.0)
+@end itemize
+
+@node CTABLES Statistics Positions and Labels
+@subsection Statistics Positions and Labels
+
+@display
+@t{/SLABELS}
+ [@t{POSITION=}@{@t{COLUMN} @math{|} @t{ROW} @math{|} @t{LAYER}@}]
+ [@t{VISIBLE=}@{@t{YES} @math{|} @t{NO}@}]
+@end display
+
+The @code{SLABELS} subcommand controls the position and visibility of
+summary statistics for the @code{TABLE} subcommand that it follows.
+
+@code{POSITION} sets the axis on which summary statistics appear.
+With @t{POSITION=COLUMN}, which is the default, each summary statistic
+appears in a column. For example:
+
+@example
+CTABLES /TABLE=qnd1 [MEAN, MEDIAN] BY qns3a.
+@end example
+@psppoutput {ctables13}
+
+@noindent
+With @t{POSITION=ROW}, each summary statistic appears in a row, as
+shown below:
+
+@example
+CTABLES /TABLE=qnd1 [MEAN, MEDIAN] BY qns3a /SLABELS POSITION=ROW.
+@end example
+@psppoutput {ctables14}
+
+@noindent
+@t{POSITION=LAYER} is also available to place each summary statistic in
+a separate layer.
+
+Labels for summary statistics are shown by default. Use
+@t{VISIBLE=NO} to suppress them. Because unlabeled data can cause
+confusion, suppressing labels should be considered only if the meaning
+of the data is evident, as in a simple case like this:
+
+@example
+CTABLES /TABLE=AgeGroup [TABLEPCT] /SLABELS VISIBLE=NO.
+@end example
+@psppoutput {ctables15}
+
+@node CTABLES Category Label Positions
+@subsection Category Label Positions
+
+@display
+@t{/CLABELS} @{@t{AUTO} @math{|} @{@t{ROWLABELS}@math{|}@t{COLLABELS}@}@t{=}@{@t{OPPOSITE}@math{|}@t{LAYER}@}@}
+@end display
+
+The @code{CLABELS} subcommand controls the position of category labels
+for the @code{TABLE} subcommand that it follows. By default, or if
+@t{AUTO} is specified, category labels for a given variable nest
+inside the variable's label on the same axis. For example, the
+command below results in age categories nesting within the age group
+variable on the rows axis and gender categories within the gender
+variable on the columns axis:
+
+@example
+CTABLES /TABLE AgeGroup BY qns3a.
+@end example
+@psppoutput {ctables16}
+
+@t{ROWLABELS=OPPOSITE} or @t{COLLABELS=OPPOSITE} move row or column
+variable category labels, respectively, to the opposite axis. The
+setting affects only the innermost variable or variables, which must
+be categorical, on the given axis. For example:
+
+@example
+CTABLES /TABLE AgeGroup BY qns3a /CLABELS ROWLABELS=OPPOSITE.
+CTABLES /TABLE AgeGroup BY qns3a /CLABELS COLLABELS=OPPOSITE.
+@end example
+@psppoutput {ctables17}
+
+@t{ROWLABELS=LAYER} or @t{COLLABELS=LAYER} move the innermost row or
+column variable category labels, respectively, to the layer axis.
+
+Only one axis's labels may be moved, whether to the opposite axis or
+to the layer axis.
+
+@subsubheading Effect on Summary Statistics
+
+@code{CLABELS} primarily affects the appearance of tables, not the
+data displayed in them. However, @code{CLABELS} can affect the values
+displayed for statistics that summarize areas of a table, since it can
+change the definitions of these areas.
+
+For example, consider the following syntax and output:
+
+@example
+CTABLES /TABLE AgeGroup BY qns3a [ROWPCT, COLPCT].
+@end example
+@psppoutput {ctables23}
+
+@noindent
+Using @code{COLLABELS=OPPOSITE} changes the definitions of rows and
+columns, so that column percentages display what were previously row
+percentages and the new row percentages become meaningless (because
+there is only one cell per row):
+
+@example
+CTABLES
+ /TABLE AgeGroup BY qns3a [ROWPCT, COLPCT]
+ /CLABELS COLLABELS=OPPOSITE.
+@end example
+@psppoutput {ctables24}
+
+@subsubheading Moving Categories for Stacked Variables
+
+If @code{CLABELS} moves category labels from an axis with stacked
+variables, the variables that are moved must have the same category
+specifications (@pxref{CTABLES Per-Variable Category Options}) and the
+same value labels.
+
+The following shows both moving stacked category variables and
+adapting to the changing definitions of rows and columns:
+
+@example
+CTABLES /TABLE (qn105ba + qn105bb) [COLPCT].
+CTABLES /TABLE (qn105ba + qn105bb) [ROWPCT]
+ /CLABELS ROWLABELS=OPPOSITE.
+@end example
+@psppoutput {ctables25}
+
+@node CTABLES Per-Variable Category Options
+@subsection Per-Variable Category Options
+
+@display
+@t{/CATEGORIES} @t{VARIABLES=}@i{variables}
+ @{@t{[}@i{value}@t{,} @i{value}@dots{}@t{]}
+ @math{|} [@t{ORDER=}@{@t{A} @math{|} @t{D}@}]
+ [@t{KEY=}@{@t{VALUE} @math{|} @t{LABEL} @math{|} @i{summary}@t{(}@i{variable}@t{)}@}]
+ [@t{MISSING=}@{@t{EXCLUDE} @math{|} @t{INCLUDE}@}]@}
+ [@t{TOTAL=}@{@t{NO} @math{|} @t{YES}@} [@t{LABEL=}@i{string}] [@t{POSITION=}@{@t{AFTER} @math{|} @t{BEFORE}@}]]
+ [@t{EMPTY=}@{@t{INCLUDE} @math{|} @t{EXCLUDE}@}]
+@end display
+
+The @code{CATEGORIES} subcommand specifies, for one or more
+categorical variables, the categories to include and exclude, the sort
+order for included categories, and treatment of missing values. It
+also controls the totals and subtotals to display. It may be
+specified any number of times, each time for a different set of
+variables. @code{CATEGORIES} applies to the table produced by the
+@code{TABLE} subcommand that it follows.
+
+@code{CATEGORIES} does not apply to scale variables.
+
+@t{VARIABLES} is required and must list the variables for the subcommand
+to affect.
+
+There are two ways to specify the categories to include and their sort
+order:
+
+@table @asis
+@item Explicit categories.
+@anchor{CTABLES Explicit Category List}
+To explicitly specify categories to include, list the categories
+within square brackets in the desired sort order. Use spaces or
+commas to separate values. Categories not covered by the list are
+excluded from analysis.
+
+Each element of the list takes one of the following forms:
+
+@table @t
+@item @i{number}
+@itemx '@i{string}'
+A numeric or string category value, for variables that have the
+corresponding type.
+
+@item '@i{date}'
+@itemx '@i{time}'
+A date or time category value, for variables that have a date or time
+print format.
+
+@item @i{min} THRU @i{max}
+@itemx LO THRU @i{max}
+@itemx @i{min} THRU HI
+A range of category values, where @var{min} and @var{max} each takes
+one of the forms above, in increasing order.
+
+@item MISSING
+All user-missing values. (To match individual user-missing values,
+specify their category values.)
+
+@item OTHERNM
+Any non-missing value not covered by any other element of the list
+(regardless of where @t{OTHERNM} is placed in the list).
+
+@item &@i{postcompute}
+A computed category name (@pxref{CTABLES Computed Categories}).
+@end table
+
+Additional forms, described later, allow for subtotals.
+If multiple elements of the list cover a given category, the last one
+in the list takes precedence.
+
+@item Implicit categories.
+Without an explicit list of categories, @pspp{} sorts
+categories automatically.
+
+The @code{KEY} setting specifies the sort key. By default, or with
+@code{KEY=VALUE}, categories are sorted by value. Categories may
+also be sorted by value label, with @code{KEY=LABEL}, or by the value
+of a summary function, e.g.@: @code{KEY=COUNT}.
+@ignore @c Not yet implemented
+For summary functions, a variable name may be specified in
+parentheses, e.g.@: @code{KEY=MAXIMUM(qnd1)}, and this is required for
+functions that apply only to scale variables. The @code{PTILE}
+function also requires a percentage argument, e.g.@:
+@code{KEY=PTILE(qnd1, 90)}. Only summary functions used in the table
+may be used, except that @code{COUNT} is always allowed.
+@end ignore
+
+By default, or with @code{ORDER=A}, categories are sorted in ascending
+order. Specify @code{ORDER=D} to sort in descending order.
+
+User-missing values are excluded by default, or with
+@code{MISSING=EXCLUDE}. Specify @code{MISSING=INCLUDE} to include
+user-missing values. The system-missing value is always excluded.
+@end table
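
The following Python sketch models the precedence rules for an explicit category list described above. The last element that covers a value wins. @code{OTHERNM} applies only to non-missing values that no other element covers, wherever it appears in the list. The tuple encoding and the name @code{match_category} are hypothetical, and the sketch leaves out subtotals and computed categories.

```python
def match_category(value, is_missing, elements):
    """Return the index of the explicit-list element that covers value.

    elements is a list of tuples: ("value", v), ("range", lo, hi),
    ("missing",), or ("othernm",).  Returns None if the value is excluded.
    """
    matched = None
    othernm = None
    for i, element in enumerate(elements):
        kind = element[0]
        if kind == "othernm":
            othernm = i  # remembered, but only used as a fallback
        elif kind == "value" and value == element[1]:
            matched = i  # later elements override earlier ones
        elif kind == "range" and element[1] <= value <= element[2]:
            matched = i
        elif kind == "missing" and is_missing:
            matched = i
    if matched is None and othernm is not None and not is_missing:
        matched = othernm  # OTHERNM catches leftover non-missing values
    return matched


elements = [("range", 1, 5), ("value", 3), ("othernm",), ("missing",)]
print(match_category(3, False, elements))  # 1: the later ("value", 3) wins
print(match_category(9, False, elements))  # 2: only OTHERNM covers 9
```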
+
+@subsubheading Totals and Subtotals
+
+@code{CATEGORIES} also controls display of totals and subtotals.
+Totals are not displayed with @code{TOTAL=NO}, which is also the
+default. Specify @code{TOTAL=YES} to display a total. By default,
+the total is labeled ``Total''; use @code{LABEL="@i{label}"} to
+override it.
+
+Subtotals are also not displayed by default. To add one or more
+subtotals, use an explicit category list and insert @code{SUBTOTAL} or
+@code{HSUBTOTAL} in the position or positions where the subtotal
+should appear. With @code{SUBTOTAL}, the subtotal becomes an extra
+row or column or layer; @code{HSUBTOTAL} additionally hides the
+categories that make up the subtotal. Either way, the default label
+is ``Subtotal''; use @code{SUBTOTAL="@i{label}"} or
+@code{HSUBTOTAL="@i{label}"} to specify a custom label.
+
+By default, or with @code{POSITION=AFTER}, totals are displayed in the
+output after the last category and subtotals apply to categories that
+precede them. With @code{POSITION=BEFORE}, totals come before the
+first category and subtotals apply to categories that follow them.
+
+Only categorical variables may have totals and subtotals. Scale
+variables may be ``totaled'' indirectly by enabling totals and
+subtotals on a categorical variable within which the scale variable is
+summarized.
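
The sketch below shows one way to expand an explicit category list that contains @code{SUBTOTAL} markers, under both @code{POSITION} settings. It assumes that each subtotal covers the categories back to the previous subtotal (with @code{AFTER}) or forward to the next one (with @code{BEFORE}). That grouping rule, the helper name @code{with_subtotals}, and the sample counts are all assumptions for illustration.

```python
def with_subtotals(items, counts, position="AFTER"):
    """Expand an explicit category list containing "SUBTOTAL" markers.

    items lists category values and the marker "SUBTOTAL"; counts maps each
    category to its COUNT.  Returns (label, value) rows in display order.
    Assumes each subtotal covers the categories back to the previous
    subtotal (AFTER) or forward to the next one (BEFORE).
    """
    if position == "BEFORE":
        # Scanning in reverse turns "following" into "preceding".
        return list(reversed(with_subtotals(list(reversed(items)), counts)))
    rows, running = [], 0
    for item in items:
        if item == "SUBTOTAL":
            rows.append(("Subtotal", running))
            running = 0  # the next subtotal starts a fresh group
        else:
            rows.append((item, counts[item]))
            running += counts[item]
    return rows


counts = dict(zip([1, 2, 3, 4], [5, 6, 7, 8]))
print(with_subtotals([1, 2, "SUBTOTAL", 3, 4, "SUBTOTAL"], counts))
```

With @code{HSUBTOTAL}, the category rows that feed each subtotal would additionally be dropped from the output. This sketch does not model that.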
+
+@c TODO Specifying summaries for totals and subtotals
+
+@subsubheading Categories Without Values
+
+Some categories might not be included in the data set being analyzed.
+For example, our example data set has no cases in the ``15 or
+younger'' age group. By default, or with @code{EMPTY=INCLUDE},
+@pspp{} includes these empty categories in output tables. To exclude
+them, specify @code{EMPTY=EXCLUDE}.
+
+For implicit categories, empty categories potentially include all the
+values with value labels for a given variable; for explicit
+categories, they include all the values listed individually and all
+values with value labels that are covered by ranges or @code{MISSING}
+or @code{OTHERNM}.
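
For implicit categories, the effect of @code{EMPTY} can be sketched in Python as a set union of the values observed in the data and the values that have value labels. The helper @code{implicit_categories} is hypothetical. For brevity, it ignores missing values and sorts only by value.

```python
def implicit_categories(data_values, labeled_values, empty="INCLUDE"):
    """Categories shown for a variable without an explicit category list.

    With EMPTY=INCLUDE, values that have value labels appear even when no
    case has them; with EMPTY=EXCLUDE, only values present in the data do.
    """
    present = set(data_values)
    if empty == "INCLUDE":
        present |= set(labeled_values)
    return sorted(present)  # ascending, as with ORDER=A and KEY=VALUE


# Value 1 is labeled but never occurs, like an empty "15 or younger" group.
print(implicit_categories([2, 3, 3, 4], [1, 2, 3, 4, 5]))
print(implicit_categories([2, 3, 3, 4], [1, 2, 3, 4, 5], "EXCLUDE"))
```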
+
+@node CTABLES Titles
+@subsection Titles
+
+@display
+@t{/TITLES}
+ [@t{TITLE=}@i{string}@dots{}]
+ [@t{CAPTION=}@i{string}@dots{}]
+ [@t{CORNER=}@i{string}@dots{}]
+@end display
+
+The @code{TITLES} subcommand sets the title, caption, and corner text
+for the table output for the previous @code{TABLE} subcommand. Any
+number of strings may be specified for each kind of text, with each
+string appearing on a separate line in the output. The title appears
+above the table, the caption below the table, and the corner text
+appears in the table's upper left corner. By default, the title is
+``Custom Tables'' and the caption and corner text are empty. With
+some table output styles, the corner text is not displayed.
+
+The strings provided in this subcommand may contain the following
+macro-like keywords that @pspp{} substitutes at the time that it runs
+the command:
+
+@table @code @c (
+@item )DATE
+The current date, e.g.@: MM/DD/YY. The format is locale-dependent.
+
+@c (
+@item )TIME
+The current time, e.g.@: HH:MM:SS. The format is locale-dependent.
+
+@c (
+@item )TABLE
+The expression specified on the @code{TABLE} subcommand. Summary
+and measurement level specifications are omitted, and variable labels
+are used in place of variable names.
+@end table
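
A minimal Python sketch of the keyword substitution follows. It fixes the date and time layouts to the MM/DD/YY and HH:MM:SS examples above, whereas PSPP uses locale-dependent formats. @code{expand_title} and the sample table expression are hypothetical.

```python
import datetime


def expand_title(text, table_expr, now):
    """Substitute the )DATE, )TIME, and )TABLE keywords in a title string.

    The MM/DD/YY and HH:MM:SS layouts are fixed here for illustration;
    PSPP's actual formats depend on the locale.
    """
    text = text.replace(")DATE", now.strftime("%m/%d/%y"))
    text = text.replace(")TIME", now.strftime("%H:%M:%S"))
    return text.replace(")TABLE", table_expr)


stamp = datetime.datetime(2020, 3, 14, 9, 26, 53)
print(expand_title("Run on )DATE at )TIME", "Age group by Gender", stamp))
# Run on 03/14/20 at 09:26:53
```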
+
+@c TODO example
+
+@node CTABLES Table Formatting
+@subsection Table Formatting
+
+@display
+@t{/FORMAT}
+ [@t{MINCOLWIDTH=}@{@t{DEFAULT} @math{|} @i{width}@}]
+ [@t{MAXCOLWIDTH=}@{@t{DEFAULT} @math{|} @i{width}@}]
+ [@t{UNITS=}@{@t{POINTS} @math{|} @t{INCHES} @math{|} @t{CM}@}]
+ [@t{EMPTY=}@{@t{ZERO} @math{|} @t{BLANK} @math{|} @i{string}@}]
+ [@t{MISSING=}@i{string}]
+@end display
+
+The @code{FORMAT} subcommand, which must precede the first
+@code{TABLE} subcommand, controls formatting for all the output
+tables. @code{FORMAT} and all of its settings are optional.
+
+Use @code{MINCOLWIDTH} and @code{MAXCOLWIDTH} to control the minimum
+or maximum width of columns in output tables. By default, with
+@code{DEFAULT}, column width varies based on content. Otherwise,
+specify a number for either or both of these settings. If both are
+specified, @code{MAXCOLWIDTH} must be greater than or equal to
+@code{MINCOLWIDTH}. The default unit, or with @code{UNITS=POINTS}, is
+points (1/72 inch), or specify @code{UNITS=INCHES} to use inches or
+@code{UNITS=CM} for centimeters.
+
+By default, or with @code{EMPTY=ZERO}, zero values are displayed in
+their usual format. Use @code{EMPTY=BLANK} to use an empty cell
+instead, or @code{EMPTY="@i{string}"} to use the specified string.
+
+By default, missing values are displayed as @samp{.}, the same as in
+other tables. Specify @code{MISSING="@i{string}"} to instead use a
+custom string.
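
The unit conversion and the @code{EMPTY} and @code{MISSING} display rules can be sketched in Python as follows. The sketch makes simplifying assumptions. @code{width_in_points} and @code{format_cell} are hypothetical helpers, and @code{format_cell} renders numbers with Python's @code{str} instead of applying a print format.

```python
# Points per unit: a point is 1/72 inch, and an inch is 2.54 cm.
POINTS_PER_UNIT = dict(POINTS=1.0, INCHES=72.0, CM=72.0 / 2.54)


def width_in_points(width, units="POINTS"):
    """Convert a MINCOLWIDTH or MAXCOLWIDTH value to points."""
    return width * POINTS_PER_UNIT[units]


def format_cell(value, empty="ZERO", missing="."):
    """Render one cell under the FORMAT EMPTY= and MISSING= settings.

    value is None for a missing value.  empty is "ZERO", "BLANK", or a
    replacement string.
    """
    if value is None:
        return missing
    if value == 0 and empty != "ZERO":
        return "" if empty == "BLANK" else empty
    return str(value)


print(width_in_points(1, "INCHES"))      # 72.0
print(format_cell(0, empty="n/a"))       # n/a
```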
+
+@node CTABLES Display of Variable Labels
+@subsection Display of Variable Labels
+
+@display
+@t{/VLABELS}
+ @t{VARIABLES=}@i{variables}
+ @t{DISPLAY}=@{@t{DEFAULT} @math{|} @t{NAME} @math{|} @t{LABEL} @math{|} @t{BOTH} @math{|} @t{NONE}@}
+@end display
+
+The @code{VLABELS} subcommand, which must precede the first
+@code{TABLE} subcommand, controls display of variable labels in all
+the output tables. @code{VLABELS} is optional. It may appear
+multiple times to adjust settings for different variables.
+
+@code{VARIABLES} and @code{DISPLAY} are required. The value of
+@code{DISPLAY} controls how variable labels are displayed for the
+variables listed on @code{VARIABLES}. The supported values are:
+
+@table @code
+@item DEFAULT
+Use the setting from @code{SET TVARS} (@pxref{SET TVARS}).
+
+@item NAME
+Show only a variable name.
+
+@item LABEL
+Show only a variable label.
+
+@item BOTH
+Show variable name and label.
+
+@item NONE
+Show nothing.
+@end table
+
+@node CTABLES Missing Value Treatment
+@subsection Missing Value Treatment
+
+The sections below describe how @code{CTABLES} treats missing values
+in categorical and scale variables.
+
+@node CTABLES Categorical Missing Values
+@subsubsection Categorical Missing Values
+
+For categorical variables, in most cases, values that are valid and in
+included categories are analyzed, and values that are missing or in
+excluded categories are not analyzed. (@xref{CTABLES Per-Variable
+Category Options}, for information on included and excluded
+categories.) The exact rules are shown in the following chart, in
+which cells that contain ``yes'' indicate that a value is analyzed:
+
+@multitable {@headitemfont{System-Missing}} {Included Category} {Excluded Category}
+@headitem @tab Included Category @tab Excluded Category
+@item @headitemfont{Valid} @tab yes @tab ---
+@item @headitemfont{User-Missing} @tab yes [*] @tab --- [+]
+@item @headitemfont{System-Missing} @tab n/a [#] @tab --- [+]
+@end multitable
+
+@table @asis
+@item [*]
+Exceptions: The ``@t{VALIDN}'' summary functions (@code{VALIDN},
+@code{EVALIDN}, @code{UVALIDN}, @code{@i{area}PCT.VALIDN}, and
+@code{U@i{area}PCT.VALIDN}), which only count valid values in included
+categories.
+
+@item [+]
+Exceptions: The ``@t{TOTALN}'' summary functions (@code{TOTALN},
+@code{ETOTALN}, @code{UTOTALN}, @code{@i{area}PCT.TOTALN}, and
+@code{U@i{area}PCT.TOTALN}), which count all values (valid and missing)
+in included categories and missing (but not valid) values in excluded
+categories.
+
+@item [#]
+System-missing values are never in included categories.
+@end table
+
+@noindent
+The following table provides another view of the same information:
+
+@multitable {Missing values in excluded categories} {@code{VALIDN}} {other} {@code{TOTALN}}
+@headitem @tab @code{VALIDN} @tab other @tab @code{TOTALN}
+@item Valid values in included categories @tab yes @tab yes @tab yes
+@item Missing values in included categories @tab --- @tab yes @tab yes
+@item Missing values in excluded categories @tab --- @tab --- @tab yes
+@item Valid values in excluded categories @tab --- @tab --- @tab ---
+@end multitable
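
The same rules can be expressed as a small Python tally. This sketch is not PSPP's code. Each case carries a status (valid, user-missing, or system-missing), a flag that says whether its value is in an included category, and a weight. @code{categorical_counts} is a hypothetical name.

```python
def categorical_counts(cases):
    """Tally VALIDN, COUNT, and TOTALN following the chart above.

    Each case is (status, included, weight), where status is "valid",
    "user-missing", or "system-missing", and included says whether the
    value falls in an included category.
    """
    validn = count = totaln = 0.0
    for status, included, weight in cases:
        if status == "system-missing":
            included = False  # system-missing is never in an included category
        if included:
            count += weight
            totaln += weight
            if status == "valid":
                validn += weight
        elif status != "valid":
            # Missing values in excluded categories count only toward TOTALN.
            totaln += weight
        # Valid values in excluded categories are not counted at all.
    return validn, count, totaln


print(categorical_counts([("valid", True, 2.0), ("user-missing", True, 1.0),
                          ("user-missing", False, 1.0),
                          ("system-missing", False, 3.0),
                          ("valid", False, 5.0)]))  # (2.0, 3.0, 7.0)
```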
+
+@node CTABLES Scale Missing Values
+@subsubsection Scale Missing Values
+
+@display
+@t{/SMISSING} @{@t{VARIABLE} @math{|} @t{LISTWISE}@}
+@end display
+
+The @code{SMISSING} subcommand, which must precede the first
+@code{TABLE} subcommand, controls treatment of missing values for
+scale variables in producing all the output tables. @code{SMISSING}
+is optional.
+
+With @code{SMISSING=VARIABLE}, which is the default, missing values
+are excluded on a variable-by-variable basis. With
+@code{SMISSING=LISTWISE}, when stacked scale variables are nested
+together with a categorical variable, a missing value for any of the
+scale variables causes the case to be excluded for all of them.
+
+As an example, consider the following dataset, in which @samp{x} is a
+categorical variable and @samp{y} and @samp{z} are scale:
+
+@psppoutput{ctables18}
+
+@noindent
+With the default missing-value treatment, @samp{y}'s mean is 20, based
+on the values 10, 20, and 30, and @samp{z}'s mean is 50, based on 40,
+50, and 60:
+
+@example
+CTABLES /TABLE (y + z) > x.
+@end example
+@psppoutput{ctables19}
+
+@noindent
+By adding @code{SMISSING=LISTWISE}, only cases where @samp{y} and
+@samp{z} are both non-missing are considered, so @samp{y}'s mean
+becomes 15, as the average of 10 and 20, and @samp{z}'s mean becomes
+55, the average of 50 and 60:
+
+@example
+CTABLES /SMISSING LISTWISE /TABLE (y + z) > x.
+@end example
+@psppoutput{ctables20}
+
+@noindent
+Even with @code{SMISSING=LISTWISE}, if @samp{y} and @samp{z} are
+separately nested with @samp{x}, instead of using a single @samp{>}
+operator, missing values revert to being considered on a
+variable-by-variable basis:
+
+@example
+CTABLES /SMISSING LISTWISE /TABLE (y > x) + (z > x).
+@end example
+@psppoutput{ctables21}
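
A short Python sketch reproduces the arithmetic behind the first two examples. The four (@samp{y}, @samp{z}) cases below are hypothetical. They are chosen only to yield means of 20 and 50 for the scale variables under @code{SMISSING=VARIABLE} and 15 and 55 under @code{SMISSING=LISTWISE}, because the dataset shown earlier is not reproduced here. All cases are treated as belonging to a single category of @samp{x}, and @code{smissing_means} is a hypothetical name.

```python
def mean(values):
    return sum(values) / len(values)


def smissing_means(cases, listwise):
    """Means of y and z under SMISSING=VARIABLE or SMISSING=LISTWISE.

    Each case is a (y, z) pair with None standing for a missing value.
    """
    if listwise:
        # Drop any case where either stacked scale variable is missing.
        cases = [(y, z) for y, z in cases if y is not None and z is not None]
    ys = [y for y, _ in cases if y is not None]
    zs = [z for _, z in cases if z is not None]
    return mean(ys), mean(zs)


# Hypothetical data, chosen only to reproduce the means discussed above.
data = [(10, 50), (20, 60), (30, None), (None, 40)]
print(smissing_means(data, listwise=False))  # (20.0, 50.0)
print(smissing_means(data, listwise=True))   # (15.0, 55.0)
```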
+
+@node CTABLES Computed Categories
+@subsection Computed Categories
+
+@display
+@t{/PCOMPUTE} @t{&}@i{postcompute}@t{=EXPR(}@i{expression}@t{)}
+@end display
+
+@dfn{Computed categories}, also called @dfn{postcomputes}, are
+categories created using arithmetic on categories obtained from the
+data. The @code{PCOMPUTE} subcommand defines computed categories,
+which can then be used in two places: on @code{CATEGORIES} within an
+explicit category list (@pxref{CTABLES Explicit Category List}), and on
+the @code{PPROPERTIES} subcommand to define further properties for a
+given postcompute.
+
+@code{PCOMPUTE} must precede the first @code{TABLE} subcommand. It is
+optional and it may be used any number of times to define multiple
+postcomputes.
+
+Each @code{PCOMPUTE} defines one postcompute. Its syntax consists of
+a name to identify the postcompute as a @pspp{} identifier prefixed by
+@samp{&}, followed by @samp{=} and a postcompute expression enclosed
+in @code{EXPR(@dots{})}. A postcompute expression consists of:
+
+@table @t
+@item [@i{category}]
+This form evaluates to the summary statistic for @i{category}, e.g.@:
+@code{[1]} evaluates to the value of the summary statistic associated
+with category 1. The @i{category} may be a number, a quoted string,
+or a quoted time or date value. All of the categories for a given
+postcompute must have the same form. The category must appear in every
+@code{CATEGORIES} list in which the postcompute is used.
+
+@item [@i{min} THRU @i{max}]
+@itemx [LO THRU @i{max}]
+@itemx [@i{min} THRU HI]
+@itemx MISSING
+@itemx OTHERNM
+These forms evaluate to the summary statistics for a category
+specified with the same syntax, as described in the previous section
+(@pxref{CTABLES Explicit Category List}). The category must appear in
+every @code{CATEGORIES} list in which the postcompute is used.
+
+@item SUBTOTAL
+The summary statistic for the subtotal category. This form is allowed
+only if the @code{CATEGORIES} lists that include this postcompute have
+exactly one subtotal.
+
+@item SUBTOTAL[@i{index}]
+The summary statistic for subtotal category @i{index}, where 1 is the
+first subtotal, 2 is the second, and so on. This form may be used for
+@code{CATEGORIES} lists with any number of subtotals.
+
+@item TOTAL
+The summary statistic for the total. The @code{CATEGORIES} lists that
+include this postcompute must have a total enabled.
+
+@item @i{a} + @i{b}
+@itemx @i{a} - @i{b}
+@itemx @i{a} * @i{b}
+@itemx @i{a} / @i{b}
+@itemx @i{a} ** @i{b}
+These forms perform arithmetic on the values of postcompute
+expressions @i{a} and @i{b}. The usual operator precedence rules
+apply.
+
+@item @i{number}
+Numeric constants may be used in postcompute expressions.
+
+@item (@i{a})
+Parentheses override operator precedence.
+@end table
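
The following Python sketch shows how these forms combine. It evaluates an already-parsed postcompute expression, represented as a nested tuple, against a table of summary statistics for the categories that the expression refers to. Parsing and operator precedence are not shown. The tuple encoding, @code{evaluate}, and the sample statistics are hypothetical.

```python
import operator

# Arithmetic operators allowed in postcompute expressions.
OPS = dict([("+", operator.add), ("-", operator.sub), ("*", operator.mul),
            ("/", operator.truediv), ("**", operator.pow)])


def evaluate(expr, stats):
    """Evaluate a postcompute expression tree against category statistics.

    expr is a nested tuple: ("cat", value), ("subtotal", index),
    ("total",), ("num", constant), or (op, left, right) for an arithmetic
    operator.  stats maps keys such as ("cat", 1) to summary statistics.
    """
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind in ("cat", "subtotal", "total"):
        return stats[expr]
    return OPS[kind](evaluate(expr[1], stats), evaluate(expr[2], stats))


# Hypothetical statistics for categories 1 and 2 and for the total.
stats = dict([(("cat", 1), 30.0), (("cat", 2), 10.0), (("total",), 80.0)])
# EXPR([1] + [2])
print(evaluate(("+", ("cat", 1), ("cat", 2)), stats))  # 40.0
# EXPR(([1] - [2]) / TOTAL * 100)
share = ("*", ("/", ("-", ("cat", 1), ("cat", 2)), ("total",)), ("num", 100))
print(evaluate(share, stats))  # 25.0
```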
+
+A postcompute is not associated with any particular variable.
+Instead, it may be referenced within @code{CATEGORIES} for any
+suitable variable (e.g.@: only a string variable is suitable for a
+postcompute expression that refers to a string category, only a
+variable with subtotals for an expression that refers to subtotals,
+@dots{}).
+
+Normally a named postcompute is defined only once, but if a later
+@code{PCOMPUTE} redefines a postcompute with the same name as an
+earlier one, the later one takes precedence.
+
+@node CTABLES Computed Category Properties
+@subsection Computed Category Properties
+
+@display
+@t{/PPROPERTIES} @t{&}@i{postcompute}@dots{}
+ [@t{LABEL=}@i{string}]
+ [@t{FORMAT=}[@i{summary} @i{format}]@dots{}]
+ [@t{HIDESOURCECATS=}@{@t{NO} @math{|} @t{YES}@}]
+@end display
+
+The @code{PPROPERTIES} subcommand, which must appear before
+@code{TABLE}, sets properties for one or more postcomputes defined on
+prior @code{PCOMPUTE} subcommands. The subcommand syntax begins with
+the list of postcomputes, each prefixed with @samp{&} as specified on
+@code{PCOMPUTE}.
+
+All of the settings on @code{PPROPERTIES} are optional. Use
+@code{LABEL} to set the label shown for the postcomputes in table
+output. The default label for a postcompute is the expression used to
+define it.
+
+The @code{FORMAT} setting sets summary statistics and display formats
+for the postcomputes.
+
+By default, or with @code{HIDESOURCECATS=NO}, categories referred to
+by computed categories are displayed like other categories. Use
+@code{HIDESOURCECATS=YES} to hide them.
+
+@node CTABLES Base Weight
+@subsection Base Weight
+
+@display
+@t{/WEIGHT VARIABLE=}@i{variable}
+@end display
+
+The @code{WEIGHT} subcommand is optional and must appear before
+@code{TABLE}. If it appears, it must name a numeric variable, known
+as the @dfn{effective base weight} or @dfn{adjustment weight}. The
+effective base weight variable stands in for the dictionary's weight
+variable (@pxref{WEIGHT}), if any, in most calculations in
+@code{CTABLES}. The only exceptions are the @code{COUNT},
+@code{TOTALN}, and @code{VALIDN} summary functions, which use the
+dictionary weight instead.
+
+Weights obtained from the @pspp{} dictionary are rounded to the
+nearest integer at the case level. Effective base weights are not
+rounded. Regardless of the weighting source, @pspp{} does not analyze
+cases with zero, missing, or negative effective weights.
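
The following Python sketch contrasts @code{COUNT}, @code{ECOUNT}, and @code{UCOUNT} under these rules. It rests on several assumptions and is not PSPP's code. Each case is a (dictionary weight, adjustment weight) pair, and NaN stands in for a missing adjustment weight. Dictionary weights are assumed never to be missing, and exact halves are rounded up, a tie rule this section does not specify. @code{weighted_counts} is a hypothetical name.

```python
import math


def weighted_counts(cases, has_weight_subcommand):
    """Sketch COUNT, ECOUNT, and UCOUNT over (dict_weight, adj_weight) cases.

    adj_weight is ignored unless has_weight_subcommand is true; a missing
    adjustment weight is written as float("nan").  Dictionary weights are
    assumed never to be missing.
    """
    count = ecount = ucount = 0
    for dict_weight, adj_weight in cases:
        # Dictionary weights are rounded per case; halves round up here,
        # which is an assumption about ties.
        rounded = math.floor(dict_weight + 0.5)
        effective = adj_weight if has_weight_subcommand else rounded
        # Cases with a zero, missing, or negative effective weight are skipped.
        if math.isnan(effective) or effective <= 0:
            continue
        count += rounded      # COUNT uses the dictionary weight
        ecount += effective   # adjustment weights are not rounded
        ucount += 1           # UCOUNT ignores weights entirely
    return count, ecount, ucount


cases = [(1.4, 2.5), (2.6, 0.5), (3.0, 0.0), (1.0, float("nan"))]
print(weighted_counts(cases, True))   # (4, 3.0, 2)
print(weighted_counts(cases, False))  # (8, 8, 4)
```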
+
+@node CTABLES Hiding Small Counts
+@subsection Hiding Small Counts
+
+@display
+@t{/HIDESMALLCOUNTS COUNT=@i{count}}
+@end display
+
+The @code{HIDESMALLCOUNTS} subcommand is optional. If it is specified,
+then count values in output tables less than the value of @i{count}
+are shown as @code{<@i{count}} instead of their true values. The
+value of @i{count} must be an integer and must be at least 2. Case
+weights are considered for deciding whether to hide a count.
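
The display rule can be sketched in Python as follows. @code{display_count} is a hypothetical helper that applies the rule above literally to a possibly weighted count. It does not model which cells PSPP treats as counts.

```python
def display_count(count, threshold):
    """Format a (possibly weighted) count under HIDESMALLCOUNTS COUNT=threshold."""
    if not isinstance(threshold, int) or threshold < 2:
        raise ValueError("COUNT must be an integer of at least 2")
    if count < threshold:
        return "<" + str(threshold)
    return str(count)


print(display_count(4.6, 5))  # <5, because a weighted count of 4.6 is below 5
print(display_count(12, 5))   # 12
```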
+
@node FACTOR
@section FACTOR
If specified, @subcmd{MATRIX IN} must be followed by @samp{COV} or @samp{CORR},
then by @samp{=} and @var{file_spec} all in parentheses.
@var{file_spec} may either be an asterisk, which indicates the currently loaded
-dataset, or it may be a filename to be loaded. @xref{MATRIX DATA}, for the expected
+dataset, or it may be a file name to be loaded. @xref{MATRIX DATA}, for the expected
format of the file.
-The @subcmd{/EXTRACTION} subcommand is used to specify the way in which factors (components) are extracted from the data.
+The @subcmd{/EXTRACTION} subcommand is used to specify the way in which factors
+(components) are extracted from the data.
If @subcmd{PC} is specified, then Principal Components Analysis is used.
If @subcmd{PAF} is specified, then Principal Axis Factoring is
-used. By default Principal Components Analysis will be used.
+used. By default Principal Components Analysis is used.
-The @subcmd{/ROTATION} subcommand is used to specify the method by which the extracted solution will be rotated.
-Three orthogonal rotation methods are available:
+The @subcmd{/ROTATION} subcommand is used to specify the method by which the
+extracted solution is rotated. Three orthogonal rotation methods are available:
@subcmd{VARIMAX} (which is the default), @subcmd{EQUAMAX}, and @subcmd{QUARTIMAX}.
There is one oblique rotation method, @i{viz}: @subcmd{PROMAX}.
Optionally you may enter the power of the promax rotation @var{k}, which must be enclosed in parentheses.
The default value of @var{k} is 5.
-If you don't want any rotation to be performed, the word @subcmd{NOROTATE} will prevent the command from performing any
-rotation on the data.
+If you don't want any rotation to be performed, the word @subcmd{NOROTATE}
+prevents the command from performing any rotation on the data.
-The @subcmd{/METHOD} subcommand should be used to determine whether the covariance matrix or the correlation matrix of the data is
+The @subcmd{/METHOD} subcommand should be used to determine whether the
+covariance matrix or the correlation matrix of the data is
to be analysed. By default, the correlation matrix is analysed.
The @subcmd{/PRINT} subcommand may be used to select which features of the analysis are reported:
Identical to @subcmd{INITIAL} and @subcmd{EXTRACTION}.
@end itemize
-If @subcmd{/PLOT=EIGEN} is given, then a ``Scree'' plot of the eigenvalues will be printed. This can be useful for visualizing
+If @subcmd{/PLOT=EIGEN} is given, then a ``Scree'' plot of the eigenvalues is
+printed. This can be useful for visualizing the factors and deciding
which factors (components) should be retained.
-The @subcmd{/FORMAT} subcommand determined how data are to be displayed in loading matrices. If @subcmd{SORT} is specified, then the variables
-are sorted in descending order of significance. If @subcmd{BLANK(@var{n})} is specified, then coefficients whose absolute value is less
-than @var{n} will not be printed. If the keyword @subcmd{DEFAULT} is given, or if no @subcmd{/FORMAT} subcommand is given, then no sorting is
-performed, and all coefficients will be printed.
-
-The @subcmd{/CRITERIA} subcommand is used to specify how the number of extracted factors (components) are chosen.
-If @subcmd{FACTORS(@var{n})} is
-specified, where @var{n} is an integer, then @var{n} factors will be extracted. Otherwise, the @subcmd{MINEIGEN} setting will
-be used.
-@subcmd{MINEIGEN(@var{l})} requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted.
-The default value of @var{l} is 1.
-The @subcmd{ECONVERGE} setting has effect only when iterative algorithms for factor
-extraction (such as Principal Axis Factoring) are used.
-@subcmd{ECONVERGE(@var{delta})} specifies that
-iteration should cease when
-the maximum absolute value of the communality estimate between one iteration and the previous is less than @var{delta}. The
-default value of @var{delta} is 0.001.
-The @subcmd{ITERATE(@var{m})} may appear any number of times and is used for two different purposes.
-It is used to set the maximum number of iterations (@var{m}) for convergence and also to set the maximum number of iterations
-for rotation.
-Whether it affects convergence or rotation depends upon which subcommand follows the @subcmd{ITERATE} subcommand.
+The @subcmd{/FORMAT} subcommand determines how data are to be
+displayed in loading matrices. If @subcmd{SORT} is specified, then
+the variables are sorted in descending order of significance. If
+@subcmd{BLANK(@var{n})} is specified, then coefficients whose absolute
+value is less than @var{n} are not printed. If the keyword
+@subcmd{DEFAULT} is specified, or if no @subcmd{/FORMAT} subcommand is
+specified, then no sorting is performed, and all coefficients are printed.
+
+You can use the @subcmd{/CRITERIA} subcommand to specify how the number of
+extracted factors (components) is chosen. If @subcmd{FACTORS(@var{n})} is
+specified, where @var{n} is an integer, then @var{n} factors are
+extracted. Otherwise, the @subcmd{MINEIGEN} setting is used.
+@subcmd{MINEIGEN(@var{l})} requests that all factors whose eigenvalues
+are greater than or equal to @var{l} are extracted. The default value
+of @var{l} is 1. The @subcmd{ECONVERGE} setting has effect only when
+using iterative algorithms for factor extraction (such as Principal Axis
+Factoring). @subcmd{ECONVERGE(@var{delta})} specifies that
+iteration should cease when the maximum absolute change in the
+communality estimates between one iteration and the previous is less
+than @var{delta}. The default value of @var{delta} is 0.001.
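
The extraction criteria above can be sketched in Python. This is an illustration under stated assumptions, not @pspp{}'s implementation. The function names @code{n_factors} and @code{communalities_converged} are hypothetical, and the eigenvalues are made up. @subcmd{ECONVERGE} is read here as a bound on the largest change in any communality estimate between successive iterations.

```python
def n_factors(eigenvalues, factors=None, mineigen=1.0):
    """Number of factors to extract (hypothetical helper).

    An explicit FACTORS(n) wins; otherwise every factor whose
    eigenvalue is >= the MINEIGEN threshold is kept.
    """
    if factors is not None:
        return factors
    return sum(1 for ev in eigenvalues if ev >= mineigen)


def communalities_converged(previous, current, delta=0.001):
    """ECONVERGE-style stopping rule (assumed reading): stop once the
    largest absolute change in any communality estimate is below delta."""
    return max(abs(c - p) for p, c in zip(previous, current)) < delta


eigenvalues = [2.9, 1.4, 1.0, 0.45, 0.25]   # made-up eigenvalues
print(n_factors(eigenvalues))               # 3: 2.9, 1.4 and 1.0 are >= 1
print(n_factors(eigenvalues, factors=2))    # 2: FACTORS(2) overrides MINEIGEN
print(communalities_converged([0.61, 0.52], [0.6105, 0.5198]))  # True
```

Because @subcmd{FACTORS} takes precedence in this sketch, @subcmd{MINEIGEN} only matters when no explicit factor count is given.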
+
+The @subcmd{ITERATE(@var{m})} subcommand may appear any number of
+times and serves two purposes: it sets the maximum number of
+iterations (@var{m}) for convergence, and it sets the maximum
+number of iterations for rotation.
+Whether it affects convergence or rotation depends upon which
+subcommand follows the @subcmd{ITERATE} subcommand.
If @subcmd{EXTRACTION} follows, it affects convergence.
If @subcmd{ROTATION} follows, it affects rotation.
-If neither @subcmd{ROTATION} nor @subcmd{EXTRACTION} follow a @subcmd{ITERATE} subcommand it will be ignored.
+If neither @subcmd{ROTATION} nor @subcmd{EXTRACTION} follows an
+@subcmd{ITERATE} subcommand, then the entire subcommand is ignored.
The default value of @var{m} is 25.
-The @cmd{MISSING} subcommand determines the handling of missing variables.
-If @subcmd{INCLUDE} is set, then user-missing values are included in the
-calculations, but system-missing values are not.
+The @subcmd{MISSING} subcommand determines the handling of missing
+values. If @subcmd{INCLUDE} is set, then user-missing values are
+included in the calculations, but system-missing values are not.
If @subcmd{EXCLUDE} is set, which is the default, user-missing
-values are excluded as well as system-missing values.
-This is the default.
-If @subcmd{LISTWISE} is set, then the entire case is excluded from analysis
-whenever any variable specified in the @cmd{VARIABLES} subcommand
-contains a missing value.
-If @subcmd{PAIRWISE} is set, then a case is considered missing only if either of the
-values for the particular coefficient are missing.
+values are excluded as well as system-missing values.
+If @subcmd{LISTWISE} is set, then the entire case is excluded
+from analysis whenever any variable specified in the @subcmd{VARIABLES}
+subcommand contains a missing value.
+
+If @subcmd{PAIRWISE} is set, then a case is considered missing only if
+either of the values for the particular coefficient is missing.
The default is @subcmd{LISTWISE}.
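
The difference between @subcmd{LISTWISE} and @subcmd{PAIRWISE} deletion can be sketched in Python. The sketch assumes that missing values are represented by @code{None}, after the @subcmd{INCLUDE} or @subcmd{EXCLUDE} rule has already been applied. The data and the function names are hypothetical.

```python
# Each tuple is one case; None stands for a missing value (hypothetical data).
cases = [
    (1.0, 2.0, 3.0),
    (2.0, None, 4.0),
    (3.0, 5.0, None),
    (4.0, 6.0, 8.0),
]


def listwise(cases):
    """Cases with no missing value in any of the variables."""
    return [c for c in cases if all(v is not None for v in c)]


def pairwise(cases, i, j):
    """Cases usable for the coefficient between variables i and j."""
    return [(c[i], c[j]) for c in cases if c[i] is not None and c[j] is not None]


print(len(listwise(cases)))        # 2
print(len(pairwise(cases, 0, 1)))  # 3
print(len(pairwise(cases, 1, 2)))  # 2
```

With listwise deletion, every coefficient is computed from the same two cases. With pairwise deletion, each pair of variables keeps as many cases as it can, so different coefficients may rest on different numbers of cases.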
@node GLM
appear before the @code{BY} keyword.
The @var{fixed_factors} list must be one or more categorical variables. Normally it
-will not make sense to enter a scalar variable in the @var{fixed_factors} and doing
+does not make sense to enter a scalar variable in the @var{fixed_factors} and doing
so may cause @pspp{} to do a lot of unnecessary processing.
The @subcmd{METHOD} subcommand is used to change the method for producing the sums of
MEANS @var{v} BY @var{g}.
@end example
@noindent then the means, counts and standard deviations for @var{v} after having
-been grouped by @var{g} will be calculated.
+been grouped by @var{g} are calculated.
Instead of the mean, count and standard deviation, you could specify the statistics
in which you are interested:
@example
@item @subcmd{DEFAULT}
This is the same as @subcmd{MEAN} @subcmd{COUNT} @subcmd{STDDEV}.
@item @subcmd{ALL}
- All of the above statistics will be calculated.
+ All of the above statistics are calculated.
@item @subcmd{NONE}
- No statistics will be calculated (only a summary will be shown).
+ No statistics are calculated (only a summary is shown).
@end itemize
have user missing values for the categorical variables should be omitted
from the calculation.
+@subsection Example Means
+
+The dataset in @file{repairs.sav} contains the mean time between failures (@exvar{mtbf})
+for a sample of artifacts produced by different factories and trialed under
+different operating conditions.
+Since there are four combinations of categorical variables, it would be hard
+to see how the scores vary for each category simply by looking at the list of data.
+@ref{means:ex} shows one way of tabulating @exvar{mtbf} that is
+easier to understand.
+
+@float Example, means:ex
+@psppsyntax {means.sps}
+@caption {Running @cmd{MEANS} on the @exvar{mtbf} score with categories @exvar{factory} and @exvar{environment}}
+@end float
+
+The results are shown in @ref{means:res}. The figures shown indicate the mean,
+standard deviation and number of samples in each category.
+These figures however do not indicate whether the results are statistically
+significant. For that, you would need to use the procedures @cmd{ONEWAY}, @cmd{GLM} or
+@cmd{T-TEST} depending on the hypothesis being tested.
+
+@float Result, means:res
+@psppoutput {means}
+@caption {The @exvar{mtbf} categorised by @exvar{factory} and @exvar{environment}}
+@end float
+
+Note that there is no limit to the number of variables for which you can calculate
+statistics, nor to the number of categorical variables per layer, nor to the number
+of layers.
+However, running @cmd{MEANS} on a large number of variables, or with categorical variables
+containing a large number of distinct values, may result in extremely large output, which
+is not easy to interpret.
+So you should consider carefully which variables to select for participation in the analysis.
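
To make the default statistics concrete, the following Python sketch computes the mean, count and standard deviation of a variable for each value of a single categorical variable. This is roughly what @code{MEANS @var{v} BY @var{g}} reports for one layer. The sketch assumes the sample (n - 1) standard deviation and ignores missing values. The function name and the data are hypothetical and are not taken from @file{repairs.sav}.

```python
import statistics
from collections import defaultdict


def means_by_group(values, groups):
    """Rows of (group, mean, count, stddev): the DEFAULT statistics."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    rows = []
    for g in sorted(buckets):
        vs = buckets[g]
        # Sample standard deviation (n - 1 divisor); undefined for one case.
        sd = statistics.stdev(vs) if len(vs) > 1 else float("nan")
        rows.append((g, statistics.mean(vs), len(vs), sd))
    return rows


for row in means_by_group([10, 12, 14, 20, 22], ["a", "a", "a", "b", "b"]):
    print(row)
```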
+
@node NPAR TESTS
@section NPAR TESTS
subcommand @subcmd{/METHOD=EXACT} is specified.
Exact tests give more accurate results, but may take an unacceptably long
time to perform. If the @subcmd{TIMER} keyword is used, it sets a maximum time,
-after which the test will be abandoned, and a warning message printed.
+after which the test is abandoned and a warning message is printed.
The time, in minutes, should be specified in parentheses after the @subcmd{TIMER} keyword.
If the @subcmd{TIMER} keyword is given without this figure, then a default value of 5 minutes
is used.
@menu
* BINOMIAL:: Binomial Test
-* CHISQUARE:: Chisquare Test
+* CHISQUARE:: Chi-square Test
* COCHRAN:: Cochran Q Test
* FRIEDMAN:: Friedman Test
* KENDALL:: Kendall's W Test
than or equal to the threshold value form the first category. Values
greater than the threshold form the second category.
-If two values appear after the variable list, then they will be used
+If two values appear after the variable list, then they are used
as the values which a variable must take to be in the respective
category.
Cases for which a variable takes a value equal to neither of the specified
even for very large sample sizes.
-
@node CHISQUARE
-@subsection Chisquare Test
+@subsection Chi-square Test
@vindex CHISQUARE
-@cindex chisquare test
+@cindex chi-square test
@display
If no @subcmd{/EXPECTED} subcommand is given, then equal frequencies
are expected.
+@subsubsection Chi-square Example
+
+A researcher wishes to investigate whether there is an equal number of
+persons of each sex in a population. The sample chosen for investigation
+is that from the @file{physiology.sav} dataset. The null hypothesis for
+the test is that the population comprises an equal number of males and females.
+The analysis is performed as shown in @ref{chisquare:ex}.
+
+@float Example, chisquare:ex
+@psppsyntax {chisquare.sps}
+@caption {Performing a chi-square test to check for equal distribution of sexes}
+@end float
+
+There is only one test variable, @i{viz:} @exvar{sex}. The other variables in the dataset
+are ignored.
+
+@float Screenshot, chisquare:scr
+@psppimage {chisquare}
+@caption {Performing a chi-square test using the graphic user interface}
+@end float
+
+In @ref{chisquare:res} the summary box shows that in the sample, there are more males
+than females. However, the significance of the chi-square result is greater than 0.05,
+the most commonly used significance level, and therefore
+there is not enough evidence to reject the null hypothesis. One must conclude
+that the evidence does not indicate an imbalance of the sexes
+in the population.
+
+@float Result, chisquare:res
+@psppoutput {chisquare}
+@caption {The results of running a chi-square test on @exvar{sex}}
+@end float
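
The statistic behind this test can be sketched in Python. With no @subcmd{/EXPECTED} subcommand, each of the @var{k} categories is expected to hold @var{n}/@var{k} cases. The statistic is the sum, over the categories, of (observed - expected) squared divided by expected, with @var{k} - 1 degrees of freedom. The counts below are hypothetical and are not the @file{physiology.sav} figures. For one degree of freedom, the upper-tail p-value has the closed form @code{erfc(sqrt(x / 2))}. Other degrees of freedom would need a general chi-square distribution function, which this sketch does not attempt. The function names are illustrative only.

```python
import math


def chisquare_equal(counts):
    """Goodness-of-fit statistic against equal expected frequencies.

    Returns (statistic, degrees of freedom = categories - 1).
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return stat, k - 1


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square variable with 1 df:
    P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical sample: 22 males and 18 females.
stat, df = chisquare_equal([22, 18])
print(stat, df)                      # 0.4 1
print(round(p_value_df1(stat), 3))   # about 0.527, well above 0.05
```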
+
@node COCHRAN
@subsection Cochran Q Test
@end display
The Cochran Q test is used to test for differences between three or more groups.
-The data for @var{var_list} in all cases must assume exactly two distinct values (other than missing values).
+The data for @var{var_list} in all cases must assume exactly two
+distinct values (other than missing values).
-The value of Q will be displayed and its Asymptotic significance based on a chi-square distribution.
+The value of Q is displayed along with its asymptotic significance
+based on a chi-square distribution.
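
As an illustration, Cochran's Q can be computed from a table with one row per case and one column per variable. The Python sketch below assumes that the two distinct values have already been coded as 1 and 0 and that no values are missing. The function name and the data are hypothetical. Under the null hypothesis, Q is referred to a chi-square distribution with @var{k} - 1 degrees of freedom, where @var{k} is the number of variables.

```python
def cochran_q(table):
    """Cochran's Q for a cases-by-variables table of 0/1 values.

    Q = (k - 1) * (k * sum(C_j ** 2) - N ** 2) / (k * N - sum(R_i ** 2)),
    where C_j are the variable (column) totals, R_i the case (row)
    totals, N the grand total and k the number of variables.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator


# Five cases, three dichotomous variables (hypothetical data).
table = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
print(cochran_q(table))  # 28 / 6, about 4.667, on k - 1 = 2 df
```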
@node FRIEDMAN
@subsection Friedman Test
drawn from a particular distribution. Four distributions are supported, @i{viz:}
Normal, Uniform, Poisson and Exponential.
-Ideally you should provide the parameters of the distribution against which you wish to test
-the data. For example, with the normal distribution the mean (@var{mu})and standard deviation (@var{sigma})
-should be given; with the uniform distribution, the minimum (@var{min})and maximum (@var{max}) value should
-be provided.
-However, if the parameters are omitted they will be imputed from the data. Imputing the
-parameters reduces the power of the test so should be avoided if possible.
-
-In the following example, two variables @var{score} and @var{age} are tested to see if
-they follow a normal distribution with a mean of 3.5 and a standard deviation of 2.0.
+Ideally you should provide the parameters of the distribution against
+which you wish to test the data. For example, with the normal
+distribution the mean (@var{mu}) and standard deviation (@var{sigma})
+should be given; with the uniform distribution, the minimum
+(@var{min}) and maximum (@var{max}) value should be provided.
+However, if the parameters are omitted they are imputed from the
+data. Imputing the parameters reduces the power of the test, so it
+should be avoided if possible.
+
+In the following example, two variables @var{score} and @var{age} are
+tested to see if they follow a normal distribution with a mean of 3.5
+and a standard deviation of 2.0.
@example
NPAR TESTS
/KOLMOGOROV-SMIRNOV (normal 3.5 2.0) = @var{score} @var{age}.
The data to be compared are specified by @var{var_list}.
The categorical variable determining the groups to which the
data belongs is given by @var{var}. The limits @var{lower} and
-@var{upper} specify the valid range of @var{var}. Any cases for
-which @var{var} falls outside [@var{lower}, @var{upper}] will be
-ignored.
+@var{upper} specify the valid range of @var{var}.
+If @var{upper} is smaller than @var{lower}, @pspp{} assumes that their values
+are reversed. Any cases for which @var{var} falls outside
+[@var{lower}, @var{upper}] are ignored.
-The mean rank of each group as well as the chi-squared value and significance
-of the test will be printed.
-The abbreviated subcommand @subcmd{K-W} may be used in place of @subcmd{KRUSKAL-WALLIS}.
+The mean rank of each group as well as the chi-squared value and
+significance of the test are printed.
+The abbreviated subcommand @subcmd{K-W} may be used in place of
+@subcmd{KRUSKAL-WALLIS}.
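
The range handling and the statistic can be sketched in Python. This is an illustration, not @pspp{}'s code. The function names are hypothetical, and tied values receive average ranks. The H statistic is computed without a correction for ties, so @pspp{}'s printed chi-squared value may differ when ties are present. H is compared against a chi-square distribution with one fewer degree of freedom than the number of groups.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; give each their mean.
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(data, groups, lower, upper):
    """H statistic (no tie correction) for cases with lower <= group <= upper."""
    if upper < lower:  # reversed limits are treated as if swapped
        lower, upper = upper, lower
    kept = [(d, g) for d, g in zip(data, groups) if lower <= g <= upper]
    ranks = average_ranks([d for d, _ in kept])
    n = len(kept)
    rank_sums = dict()
    sizes = dict()
    for r, (_, g) in zip(ranks, kept):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        sizes[g] = sizes.get(g, 0) + 1
    total = sum(rank_sums[g] ** 2 / sizes[g] for g in rank_sums)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Group 9 lies outside [1, 2] and is dropped; limits given as (2, 1) are swapped.
print(kruskal_wallis_h([1, 2, 3, 4, 5, 6], [1, 1, 2, 2, 9, 9], 2, 1))  # about 2.4
```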
@node MANN-WHITNEY
[ /MANN-WHITNEY = @var{var_list} BY var (@var{group1}, @var{group2}) ]
@end display
-The Mann-Whitney subcommand is used to test whether two groups of data come from different populations.
-The variables to be tested should be specified in @var{var_list} and the grouping variable, that determines to which group the test variables belong, in @var{var}.
+The Mann-Whitney subcommand is used to test whether two groups of data
+come from different populations. The variables to be tested should be
+specified in @var{var_list} and the grouping variable, which determines
+to which group the test variables belong, in @var{var}.
@var{Var} may be either a string or an alpha variable.
@var{Group1} and @var{group2} specify the
two values of @var{var} which determine the groups of the test data.
-Cases for which the @var{var} value is neither @var{group1} or @var{group2} will be ignored.
+Cases for which the @var{var} value is neither @var{group1} nor
+@var{group2} are ignored.
+
+The value of the Mann-Whitney U statistic, the Wilcoxon W, and the
+significance are printed.
+You may abbreviate the subcommand @subcmd{MANN-WHITNEY} to
+@subcmd{M-W}.
-The value of the Mann-Whitney U statistic, the Wilcoxon W, and the significance will be printed.
-The abbreviated subcommand @subcmd{M-W} may be used in place of @subcmd{MANN-WHITNEY}.
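
The following Python sketch shows how the U statistic and the Wilcoxon rank sums relate. It is illustrative only: the function name and the data are hypothetical. The text above does not say which group's rank sum @pspp{} reports as W, so the sketch returns both.

```python
def mann_whitney(values, groups, group1, group2):
    """Return (U, W1, W2); cases in neither group are ignored.

    U1 counts pairs in which a group1 value exceeds a group2 value,
    with ties counting one half; U is the smaller of U1 and U2.
    W1 and W2 are the rank sums of each group, via W = U + n(n + 1)/2.
    """
    x = [v for v, g in zip(values, groups) if g == group1]
    y = [v for v, g in zip(values, groups) if g == group2]
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u2 = len(x) * len(y) - u1
    w1 = u1 + len(x) * (len(x) + 1) / 2
    w2 = u2 + len(y) * (len(y) + 1) / 2
    return min(u1, u2), w1, w2


print(mann_whitney([1, 2, 3, 4, 5], ["a", "a", "b", "b", "c"], "a", "b"))
# (0.0, 3.0, 7.0): ranks 1 and 2 against ranks 3 and 4; "c" is ignored
```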
@node MCNEMAR
@subsection McNemar Test
populations with a common median.
The median of the populations against which the samples are to be tested
may be given in parentheses immediately after the
-@subcmd{/MEDIAN} subcommand. If it is not given, the median will be imputed from the
+@subcmd{/MEDIAN} subcommand. If it is not given, the median is imputed from the
union of all the samples.
The variables of the samples to be tested should immediately follow the @samp{=} sign. The
In this mode, you must also use the @subcmd{/VARIABLES} subcommand to
tell @pspp{} which variables you wish to test.
+@subsubsection Example - One Sample T-test
+
+A researcher wishes to know whether the weight of persons in a population
+is different from the national average.
+The samples are drawn from the population under investigation and recorded
+in the file @file{physiology.sav}.
+From the Department of Health, she
+knows that the national average weight of healthy adults is 76.8kg.
+Accordingly the @subcmd{TESTVAL} is set to 76.8.
+The null hypothesis therefore is that the mean weight of the
+population from which the sample was drawn is 76.8kg.
+
+As previously noted (@pxref{Identifying incorrect data}), one
+sample in the dataset contains a weight value
+which is clearly incorrect. So this is excluded from the analysis
+using the @cmd{SELECT} command.
+
+@float Example, one-sample-t:ex
+@psppsyntax {one-sample-t.sps}
+@caption {Running a one sample T-Test after excluding all non-positive values}
+@end float
+
+@float Screenshot, one-sample-t:scr
+@psppimage {one-sample-t}
+@caption {Using the One Sample T-Test dialog box to test @exvar{weight} for a mean of 76.8kg}
+@end float
+
+
+@ref{one-sample-t:res} shows that the mean of our sample differs from the test value
+by -1.40kg. However the significance is very high (0.610). So one cannot
+reject the null hypothesis, and must conclude there is not enough evidence
+to suggest that the mean weight of the persons in our population is different
+from 76.8kg.
+
+@float Results, one-sample-t:res
+@psppoutput {one-sample-t}
+@caption {The results of a one sample T-test of @exvar{weight} using a test value of 76.8kg}
+@end float
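
The main figures in this table can be reproduced in outline with the Python sketch below. It computes the mean difference, the statistic t = (mean - test value) / (s / sqrt(n)), and n - 1 degrees of freedom. The data are hypothetical and the function name is illustrative. The sketch stops short of the p-value, which requires the t distribution.

```python
import math
import statistics


def one_sample_t(sample, test_value):
    """Return (mean difference, t, degrees of freedom)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    difference = mean - test_value
    return difference, difference / std_error, n - 1


# Hypothetical data: mean 10, standard deviation 2, so the standard error is 1.
print(one_sample_t([7, 11, 11, 11], 8))  # difference 2, t = 2.0, 3 df
```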
+
@node Independent Samples Mode
@subsection Independent Samples Mode
the independent variable are excluded on a listwise basis, regardless
of whether @subcmd{/MISSING=LISTWISE} was specified.
+@subsubsection Example - Independent Samples T-test
+
+A researcher wishes to know whether within a population, adult males
+are taller than adult females.
+The samples are drawn from the population under investigation and recorded
+in the file @file{physiology.sav}.
+
+As previously noted (@pxref{Identifying incorrect data}), one
+sample in the dataset contains a height value
+which is clearly incorrect. So this is excluded from the analysis
+using the @cmd{SELECT} command.
+
+
+@float Example, independent-samples-t:ex
+@psppsyntax {independent-samples-t.sps}
+@caption {Running an independent samples T-Test after excluding all height values less than 200}
+@end float
+
+
+The null hypothesis is that both males and females are on average
+of equal height.
+
+@float Screenshot, independent-samples-t:scr
+@psppimage {independent-samples-t}
+@caption {Using the Independent Sample T-test dialog, to test for differences of @exvar{height} between values of @exvar{sex}}
+@end float
+
+
+In this case, the grouping variable is @exvar{sex}, so this is entered
+as the variable for the @subcmd{GROUP} subcommand. The group values are 0 (male) and
+1 (female).
+
+If you are running the procedure using syntax, then you need to enter
+the values corresponding to each group within parentheses.
+If you are using the graphic user interface, then you have to open
+the ``Define Groups'' dialog box and enter the values corresponding
+to each group as shown in @ref{define-groups-t:scr}. If, as in this case, the dataset has defined value
+labels for the group variable, then you can enter them by label
+or by value.
+
+@float Screenshot, define-groups-t:scr
+@psppimage {define-groups-t}
+@caption {Setting the values of the grouping variable for an Independent Samples T-test}
+@end float
+
+From @ref{independent-samples-t:res}, one can clearly see that the @emph{sample} mean height
+is greater for males than for females. However in order to see if this
+is a significant result, one must consult the T-Test table.
+
+The T-Test table contains two rows; one for use if the variance of the samples
+in each group may be safely assumed to be equal, and the second row
+if the variances in each group may not be safely assumed to be equal.
+
+In this case, however, both rows show a 2-tailed significance less than 0.001, and
+one must therefore reject the null hypothesis and conclude that within
+the population the mean heights of males and females are unequal.
+
+@float Result, independent-samples-t:res
+@psppoutput {independent-samples-t}
+@caption {The results of an independent samples T-test of @exvar{height} by @exvar{sex}}
+@end float
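
The two rows of the T-Test table presumably correspond to the two textbook ways of computing t, and the Python sketch below shows both. The first pools the two sample variances and uses @var{n1} + @var{n2} - 2 degrees of freedom. The second is Welch's unequal-variances t with Welch-Satterthwaite degrees of freedom. This is a generic illustration that assumes @pspp{} follows these standard formulas. The function name and the data are hypothetical.

```python
import math
import statistics


def independent_t(x, y):
    """Return ((t, df) with equal variances assumed,
    (t, df) with equal variances not assumed)."""
    n1, n2 = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    # Equal variances assumed: pool the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # Not assumed: Welch's t with Welch-Satterthwaite degrees of freedom.
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a * a / (n1 - 1) + b * b / (n2 - 1))
    return (t_pooled, n1 + n2 - 2), (t_welch, df_welch)


# Hypothetical heights for two small groups.
equal, unequal = independent_t([1720, 1800, 1760, 1850], [1600, 1650, 1620])
print(equal)
print(unequal)
```

When the group sizes and variances are equal, the two rows coincide. Otherwise, the Welch row usually has fewer degrees of freedom.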
@node Paired Samples Mode
@subsection Paired Samples Mode
to specify different contrast tests.
The @subcmd{MISSING} subcommand defines how missing values are handled.
If @subcmd{LISTWISE} is specified then cases which have missing values for
-the independent variable or any dependent variable will be ignored.
-If @subcmd{ANALYSIS} is specified, then cases will be ignored if the independent
+the independent variable or any dependent variable are ignored.
+If @subcmd{ANALYSIS} is specified, then cases are ignored if the independent
variable is missing or if the dependent variable currently being
analysed is missing. The default is @subcmd{ANALYSIS}.
A setting of @subcmd{EXCLUDE} means that variables whose values are
@end itemize
@noindent
-The optional syntax @code{ALPHA(@var{value})} is used to indicate
-that @var{value} should be used as the
-confidence level for which the posthoc tests will be performed.
-The default is 0.05.
+Use the optional syntax @code{ALPHA(@var{value})} to indicate that
+@cmd{ONEWAY} should perform the posthoc tests at a significance level of
+@var{value}. If @code{ALPHA(@var{value})} is not specified, then the
+significance level used is 0.05.
@node QUICK CLUSTER
@section QUICK CLUSTER
If @subcmd{INITIAL} is set, then the initial cluster memberships will
be printed.
If @subcmd{CLUSTER} is set, the cluster memberships of the individual
-cases will be displayed (potentially generating lengthy output).
+cases are displayed (potentially generating lengthy output).
You can specify the subcommand @subcmd{SAVE} to ask that each case's cluster membership
and the euclidean distance between the case and its cluster center be saved to
The @subcmd{VARIABLES} subcommand is required. It determines the set of variables
upon which analysis is to be performed.
-The @subcmd{SCALE} subcommand determines which variables reliability is to be
-calculated for. If it is omitted, then analysis for all variables named
-in the @subcmd{VARIABLES} subcommand will be used.
+The @subcmd{SCALE} subcommand determines the variables for which
+reliability is to be calculated. If @subcmd{SCALE} is omitted, then
+all variables named in the @subcmd{VARIABLES} subcommand are used.
Optionally, the @var{name} parameter may be specified to set a string name
for the scale.
The default model is @subcmd{ALPHA}.
By default, any cases with user missing, or system missing values for
-any variables given
-in the @subcmd{VARIABLES} subcommand will be omitted from analysis.
-The @subcmd{MISSING} subcommand determines whether user missing values are to
-be included or excluded in the analysis.
+any variables given in the @subcmd{VARIABLES} subcommand are omitted
+from the analysis. The @subcmd{MISSING} subcommand determines whether
+user missing values are included in or excluded from the analysis.
The @subcmd{SUMMARY} subcommand determines the type of summary analysis to be performed.
Currently there is only one type: @subcmd{SUMMARY=TOTAL}, which displays per-item
analysis tested against the totals.
+@subsection Example - Reliability
+
+Before analysing the results of a survey, particularly a multiple choice survey,
+it is desirable to know whether the respondents have considered their answers
+or simply provided random answers.
+
+In the following example the survey results from the file @file{hotel.sav} are used.
+All five survey questions are included in the reliability analysis.
+However, before running the analysis, the data must be preprocessed.
+An examination of the survey questions reveals that two questions, @i{viz:} @exvar{v3} and @exvar{v5},
+are negatively worded, whereas the others are positively worded.
+All questions must be based upon the same scale for the analysis to be meaningful.
+One could use the @cmd{RECODE} command (@pxref{RECODE}); however, a simpler way is
+to use @cmd{COMPUTE} (@pxref{COMPUTE}), and this is what is done in @ref{reliability:ex}.
+
+@float Example, reliability:ex
+@psppsyntax {reliability.sps}
+@caption {Investigating the reliability of survey responses}
+@end float
+
+In this case, all variables in the dataset are used, so we can use the special
+keyword @samp{ALL} (@pxref{BNF}).
+
+@float Screenshot, reliability:scr
+@psppimage {reliability}
+@caption {Reliability dialog box with all variables selected}
+@end float
+
+@ref{reliability:res} shows that Cronbach's Alpha is 0.11, which is a value normally considered too
+low to indicate consistency within the data. This is possibly due to the small number of
+survey questions. The survey should be redesigned before the results are put to
+serious use.
+
+@float Result, reliability:res
+@psppoutput {reliability}
+@caption {The results of the reliability command on @file{hotel.sav}}
+@end float
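
Cronbach's Alpha can be sketched in Python, together with the kind of reverse scoring that aligns negatively worded items with the others. The sketch uses the usual formula: alpha = k / (k - 1) * (1 - sum of item variances / variance of the case totals). The 1 to 3 response range in the example and all the data are hypothetical, so check the actual range of the @file{hotel.sav} items. The function names are illustrative and are not the code in @file{reliability.sps}.

```python
import statistics


def reverse_score(x, low, high):
    """Reverse an item scored low..high, e.g. on 1..5: 1 -> 5, 2 -> 4."""
    return low + high - x


def cronbach_alpha(items):
    """items holds one list of scores per item, over the same cases.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(case) for case in zip(*items)]
    item_variance = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


raw = [[1, 2, 3], [3, 2, 1]]  # the second item is negatively worded
aligned = [raw[0], [reverse_score(x, 1, 3) for x in raw[1]]]
print(cronbach_alpha(aligned))  # 1.0: the two aligned items agree perfectly
```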
@node ROC
If the keyword @subcmd{NONE} is given, then no @subcmd{ROC} curve is drawn.
By default, the curve is drawn with no reference line.
-The optional subcommand @subcmd{PRINT} determines which additional tables should be printed.
-Two additional tables are available.
-The @subcmd{SE} keyword says that standard error of the area under the curve should be printed as well as
-the area itself.
-In addition, a p-value under the null hypothesis that the area under the curve equals 0.5 will be
-printed.
-The @subcmd{COORDINATES} keyword says that a table of coordinates of the @subcmd{ROC} curve should be printed.
+The optional subcommand @subcmd{PRINT} determines which additional
+tables should be printed. Two additional tables are available. The
+@subcmd{SE} keyword says that the standard error of the area under the
+curve should be printed as well as the area itself. In addition, a
+p-value for the null hypothesis that the area under the curve equals
+0.5 is printed. The @subcmd{COORDINATES} keyword says that a
+table of coordinates of the @subcmd{ROC} curve should be printed.
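
The area under the curve has a simple probabilistic reading. It is the probability that a randomly chosen case in the positive state has a larger test value than a randomly chosen case in the negative state, with ties counting one half. A test with no discriminating power therefore has an area of 0.5. The Python sketch below computes the area in that way. It assumes that larger test values indicate the positive state. The function name and the data are hypothetical, and the standard error printed by @subcmd{SE} is not reproduced.

```python
def roc_auc(scores, states, positive):
    """Area under the ROC curve, assuming larger scores indicate the
    positive state: P(positive score > negative score) + 0.5 * P(tie)."""
    pos = [s for s, st in zip(scores, states) if st == positive]
    neg = [s for s, st in zip(scores, states) if st != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0], 1))  # 0.75
```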
The @subcmd{CRITERIA} subcommand has four optional parameters:
@itemize @bullet
be included or excluded in the analysis. The default behaviour is to
exclude them.
Cases are excluded on a listwise basis; if any of the variables in @var{var_list}
-or if the variable @var{state_var} is missing, then the entire case will be
+or if the variable @var{state_var} is missing, then the entire case is
excluded.
@c LocalWords: subcmd subcommand