+@subsection Crosstabs Example
+
+@cindex chi-square test of independence
+
+A researcher wishes to know if, in an industry, a person's sex is related to
+the person's occupation. To investigate this, she has determined that the file
+@file{personnel.sav} contains a representative, randomly selected sample of persons.
+The researcher's null hypothesis is that a person's sex has no relation to a
+person's occupation. She uses a chi-squared test of independence to investigate
+the hypothesis.
+
+@float Example, crosstabs:ex
+@psppsyntax {crosstabs.sps}
+@caption {Running crosstabs on the @exvar{sex} and @exvar{occupation} variables}
+@end float
+
+The syntax in @ref{crosstabs:ex} conducts a chi-squared test of independence.
+The line @code{/tables = occupation by sex} indicates that @exvar{occupation}
+and @exvar{sex} are the variables to be tabulated. To do this using the @gui{}
+you must place these variable names respectively in the @samp{Row} and
+@samp{Column} fields as shown in @ref{crosstabs:scr}.
+
+@float Screenshot, crosstabs:scr
+@psppimage {crosstabs}
+@caption {The Crosstabs dialog box with the @exvar{sex} and @exvar{occupation} variables selected}
+@end float
+
+Similarly, the @samp{Cells} button shows a dialog box to select the @code{count}
+and @code{expected} options. All other cell options can be deselected for this
+test.
+
+You would use the @samp{Format} and @samp{Statistics} buttons to select options
+for the @subcmd{FORMAT} and @subcmd{STATISTICS} subcommands. In this example,
+the @samp{Statistics} dialog requires only the @samp{Chisq} option to be checked. All
+other options should be unchecked. No special settings are required from the
+@samp{Format} dialog.
+
+As shown in @ref{crosstabs:res}, @cmd{CROSSTABS} generates a contingency table
+containing the observed count and the expected count for each combination of
+sex and occupation. The expected count is the count which would be observed if
+the null hypothesis were true.
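+
+As an illustration in conventional notation (the symbols here are introduced
+only for this sketch and do not appear in the output), the expected count for
+the cell in row @math{i} and column @math{j} of a contingency table is usually
+computed from the marginal totals as
+@math{E_{ij} = R_i C_j / N},
+where @math{R_i} is the total of row @math{i}, @math{C_j} is the total of
+column @math{j} and @math{N} is the total number of cases.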
+
+The significance of the Pearson Chi-Square value is much larger than the
+conventional threshold of 0.05, so one cannot reject the null hypothesis.
+Thus the researcher must conclude that the data provide no evidence that a
+person's sex is related to the person's occupation.
+
+@float Results, crosstabs:res
+@psppoutput {crosstabs}
+@caption {The results of a test of independence between @exvar{sex} and @exvar{occupation}}
+@end float
+
+
+@node FACTOR
+@section FACTOR
+
+@vindex FACTOR
+@cindex factor analysis
+@cindex principal components analysis
+@cindex principal axis factoring
+@cindex data reduction
+
+@display
+FACTOR @{
+ VARIABLES=@var{var_list},
+ MATRIX IN (@{CORR,COV@}=@{*,@var{file_spec}@})
+ @}
+
+ [ /METHOD = @{CORRELATION, COVARIANCE@} ]
+
+ [ /ANALYSIS=@var{var_list} ]
+
+ [ /EXTRACTION=@{PC, PAF@}]
+
+ [ /ROTATION=@{VARIMAX, EQUAMAX, QUARTIMAX, PROMAX[(@var{k})], NOROTATE@}]
+
+ [ /PRINT=[INITIAL] [EXTRACTION] [ROTATION] [UNIVARIATE] [CORRELATION] [COVARIANCE] [DET] [KMO] [AIC] [SIG] [ALL] [DEFAULT] ]
+
+ [ /PLOT=[EIGEN] ]
+
+ [ /FORMAT=[SORT] [BLANK(@var{n})] [DEFAULT] ]
+
+ [ /CRITERIA=[FACTORS(@var{n})] [MINEIGEN(@var{l})] [ITERATE(@var{m})] [ECONVERGE (@var{delta})] [DEFAULT] ]
+
+ [ /MISSING=[@{LISTWISE, PAIRWISE@}] [@{INCLUDE, EXCLUDE@}] ]
+@end display
+
+The @cmd{FACTOR} command performs Factor Analysis or Principal Axis Factoring on a dataset. It may be used to find
+common factors in the data or for data reduction purposes.
+
+The @subcmd{VARIABLES} subcommand is required (unless the @subcmd{MATRIX IN}
+subcommand is used).
+It lists the variables which are to partake in the analysis. (The @subcmd{ANALYSIS}
+subcommand may optionally further limit the variables that
+participate; it is useful primarily in conjunction with @subcmd{MATRIX IN}.)
+
+If @subcmd{MATRIX IN} instead of @subcmd{VARIABLES} is specified, then the analysis
+is performed on a pre-prepared correlation or covariance matrix file instead of on
+individual data cases. Typically the matrix file will have been generated by
+@cmd{MATRIX DATA} (@pxref{MATRIX DATA}) or provided by a third party.
+If specified, @subcmd{MATRIX IN} must be followed by @samp{COV} or @samp{CORR},
+then by @samp{=} and @var{file_spec} all in parentheses.
+@var{file_spec} may either be an asterisk, which indicates the currently loaded
+dataset, or it may be a file name to be loaded. @xref{MATRIX DATA}, for the expected
+format of the file.
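+
+As a minimal sketch, assuming the currently loaded dataset already holds a
+correlation matrix in the expected format, the analysis might be requested as
+follows. The variable names @var{v1} to @var{v3} are placeholders, and the
+@subcmd{ANALYSIS} subcommand is shown only to illustrate restricting the
+variables taken from the matrix.
+@example
+* Placeholder variable names; the active dataset is assumed to be a matrix file.
+FACTOR MATRIX IN (CORR = *)
+  /ANALYSIS = @var{v1} @var{v2} @var{v3}.
+@end example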
+
+The @subcmd{/EXTRACTION} subcommand is used to specify the way in which factors
+(components) are extracted from the data.
+If @subcmd{PC} is specified, then Principal Components Analysis is used.
+If @subcmd{PAF} is specified, then Principal Axis Factoring is
+used. By default Principal Components Analysis is used.
+
+The @subcmd{/ROTATION} subcommand is used to specify the method by which the
+extracted solution is rotated. Three orthogonal rotation methods are available:
+@subcmd{VARIMAX} (which is the default), @subcmd{EQUAMAX}, and @subcmd{QUARTIMAX}.
+There is one oblique rotation method, @i{viz}: @subcmd{PROMAX}.
+Optionally you may enter the power of the promax rotation @var{k}, which must be enclosed in parentheses.
+The default value of @var{k} is 5.
+If you don't want any rotation to be performed, specify @subcmd{NOROTATE}.
+
+The @subcmd{/METHOD} subcommand should be used to determine whether the
+covariance matrix or the correlation matrix of the data is
+to be analysed. By default, the correlation matrix is analysed.
+
+The @subcmd{/PRINT} subcommand may be used to select which features of the analysis are reported:
+
+@itemize
+@item @subcmd{UNIVARIATE}
+ A table of mean values, standard deviations and total weights is printed.
+@item @subcmd{INITIAL}
+ Initial communalities and eigenvalues are printed.
+@item @subcmd{EXTRACTION}
+ Extracted communalities and eigenvalues are printed.
+@item @subcmd{ROTATION}
+ Rotated communalities and eigenvalues are printed.
+@item @subcmd{CORRELATION}
+ The correlation matrix is printed.
+@item @subcmd{COVARIANCE}
+ The covariance matrix is printed.
+@item @subcmd{DET}
+ The determinant of the correlation or covariance matrix is printed.
+@item @subcmd{AIC}
+ The anti-image covariance and anti-image correlation matrices are printed.
+@item @subcmd{KMO}
+ The Kaiser-Meyer-Olkin measure of sampling adequacy and the Bartlett test of sphericity are printed.
+@item @subcmd{SIG}
+ The significance of the elements of the correlation matrix is printed.
+@item @subcmd{ALL}
+ All of the above are printed.
+@item @subcmd{DEFAULT}
+ Identical to @subcmd{INITIAL} and @subcmd{EXTRACTION}.
+@end itemize
+
+If @subcmd{/PLOT=EIGEN} is given, then a ``Scree'' plot of the eigenvalues is
+printed. This can be useful for visualizing the factors and deciding
+which factors (components) should be retained.
+
+The @subcmd{/FORMAT} subcommand determines how data are to be
+displayed in loading matrices. If @subcmd{SORT} is specified, then
+the variables are sorted in descending order of significance. If
+@subcmd{BLANK(@var{n})} is specified, then coefficients whose absolute
+value is less than @var{n} are not printed. If the keyword
+@subcmd{DEFAULT} is specified, or if no @subcmd{/FORMAT} subcommand is
+specified, then no sorting is performed, and all coefficients are printed.
+
+You can use the @subcmd{/CRITERIA} subcommand to specify how the number of
+extracted factors (components) is chosen. If @subcmd{FACTORS(@var{n})} is
+specified, where @var{n} is an integer, then @var{n} factors are
+extracted. Otherwise, the @subcmd{MINEIGEN} setting is used.
+@subcmd{MINEIGEN(@var{l})} requests that all factors whose eigenvalues
+are greater than or equal to @var{l} are extracted. The default value
+of @var{l} is 1. The @subcmd{ECONVERGE} setting has effect only when
+using iterative algorithms for factor extraction (such as Principal Axis
+Factoring). @subcmd{ECONVERGE(@var{delta})} specifies that
+iteration should cease when the maximum absolute change in the
+communality estimate between one iteration and the previous is less
+than @var{delta}. The default value of @var{delta} is 0.001.
+
+The @subcmd{ITERATE(@var{m})} may appear any number of times and is
+used for two different purposes. It is used to set the maximum number
+of iterations (@var{m}) for convergence and also to set the maximum
+number of iterations for rotation.
+Whether it affects convergence or rotation depends upon which
+subcommand follows the @subcmd{ITERATE} subcommand.
+If @subcmd{EXTRACTION} follows, it affects convergence.
+If @subcmd{ROTATION} follows, it affects rotation.
+If neither @subcmd{ROTATION} nor @subcmd{EXTRACTION} follows an
+@subcmd{ITERATE} subcommand, then the entire subcommand is ignored.
+The default value of @var{m} is 25.
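+
+The following sketch illustrates the placement rule described above; the
+variable names and the limits of 50 and 100 are arbitrary values assumed for
+illustration. The first @subcmd{ITERATE} precedes @subcmd{EXTRACTION} and so
+limits the extraction iterations; the second precedes @subcmd{ROTATION} and so
+limits the rotation iterations.
+@example
+* Placeholder variables and arbitrary iteration limits.
+FACTOR VARIABLES = @var{v1} @var{v2} @var{v3} @var{v4}
+  /CRITERIA = ITERATE(50) ECONVERGE(0.0001)
+  /EXTRACTION = PAF
+  /CRITERIA = ITERATE(100)
+  /ROTATION = VARIMAX.
+@end example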
+
+The @subcmd{MISSING} subcommand determines the handling of missing
+values. If @subcmd{INCLUDE} is set, then user-missing values are
+included in the calculations, but system-missing values are not.
+If @subcmd{EXCLUDE} is set, which is the default, user-missing
+values are excluded as well as system-missing values.
+If @subcmd{LISTWISE} is set, then the entire case is excluded
+from analysis whenever any variable specified in the @subcmd{VARIABLES}
+subcommand contains a missing value.
+
+If @subcmd{PAIRWISE} is set, then a case is considered missing only if
+either of the values for the particular coefficient is missing.
+The default is @subcmd{LISTWISE}.
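+
+Putting several of these subcommands together, a request might look like the
+following sketch. The variable names are placeholders and the option values
+are assumptions chosen for illustration, not recommendations.
+@example
+* Placeholder variables; option values are illustrative only.
+FACTOR VARIABLES = @var{v1} @var{v2} @var{v3} @var{v4} @var{v5}
+  /EXTRACTION = PAF
+  /ROTATION = PROMAX(4)
+  /PRINT = INITIAL EXTRACTION ROTATION KMO
+  /PLOT = EIGEN
+  /FORMAT = SORT BLANK(0.3)
+  /MISSING = PAIRWISE INCLUDE.
+@end example
+@noindent This requests Principal Axis Factoring with a promax rotation of power 4,
+a scree plot, and sorted loading matrices in which coefficients with an absolute
+value below 0.3 are left blank.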
+
+@node GLM
+@section GLM
+
+@vindex GLM
+@cindex univariate analysis of variance
+@cindex fixed effects
+@cindex factorial anova
+@cindex analysis of variance
+@cindex ANOVA
+
+
+@display
+GLM @var{dependent_vars} BY @var{fixed_factors}
+ [/METHOD = SSTYPE(@var{type})]
+ [/DESIGN = @var{interaction_0} [@var{interaction_1} [... @var{interaction_n}]]]
+ [/INTERCEPT = @{INCLUDE|EXCLUDE@}]
+ [/MISSING = @{INCLUDE|EXCLUDE@}]
+@end display
+
+The @cmd{GLM} procedure can be used for fixed effects factorial ANOVA.
+
+The @var{dependent_vars} are the variables to be analysed.
+You may analyse several variables in the same command in which case they should all
+appear before the @code{BY} keyword.
+
+The @var{fixed_factors} list must be one or more categorical variables. Normally it
+does not make sense to enter a scalar variable in the @var{fixed_factors} and doing
+so may cause @pspp{} to do a lot of unnecessary processing.
+
+The @subcmd{METHOD} subcommand is used to change the method for producing the sums of
+squares. Available values of @var{type} are 1, 2 and 3. The default is type 3.
+
+You may specify a custom design using the @subcmd{DESIGN} subcommand.
+The design comprises a list of interactions where each interaction is a
+list of variables separated by a @samp{*}. For example the command
+@display
+GLM subject BY sex age_group race
+ /DESIGN = age_group sex race age_group*sex age_group*race
+@end display
+@noindent specifies the model @math{subject = age_group + sex + race + age_group*sex + age_group*race}.
+If no @subcmd{DESIGN} subcommand is specified, then the default is all possible combinations
+of the fixed factors. That is to say
+@display
+GLM subject BY sex age_group race
+@end display
+implies the model
+@math{subject = age_group + sex + race + age_group*sex + age_group*race + sex*race + age_group*sex*race}.
+
+
+The @subcmd{MISSING} subcommand determines the handling of missing
+values.
+If @subcmd{INCLUDE} is set then, for the purposes of GLM analysis,
+only system-missing values are considered
+to be missing; user-missing values are not regarded as missing.
+If @subcmd{EXCLUDE} is set, which is the default, then user-missing
+values are considered to be missing as well as system-missing values.
+A case for which any dependent variable or any factor
+variable has a missing value is excluded from the analysis.
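+
+As a sketch of how these subcommands combine, the following uses the variables
+from the example above to fit a main-effects-only model with type 2 sums of
+squares, treating user-missing values as valid. The choice of options is an
+assumption made purely for illustration.
+@example
+* Main effects only; options chosen for illustration.
+GLM subject BY sex age_group race
+  /METHOD = SSTYPE(2)
+  /DESIGN = sex age_group race
+  /MISSING = INCLUDE.
+@end example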
+
+@node LOGISTIC REGRESSION
+@section LOGISTIC REGRESSION
+
+@vindex LOGISTIC REGRESSION
+@cindex logistic regression
+@cindex bivariate logistic regression
+
+@display
+LOGISTIC REGRESSION [VARIABLES =] @var{dependent_var} WITH @var{predictors}
+
+ [/CATEGORICAL = @var{categorical_predictors}]
+
+ [@{/NOCONST | /ORIGIN | /NOORIGIN @}]
+
+ [/PRINT = [SUMMARY] [DEFAULT] [CI(@var{confidence})] [ALL]]
+
+ [/CRITERIA = [BCON(@var{min_delta})] [ITERATE(@var{max_iterations})]
+ [LCON(@var{min_likelihood_delta})] [EPS(@var{min_epsilon})]
+ [CUT(@var{cut_point})]]
+
+ [/MISSING = @{INCLUDE|EXCLUDE@}]
+@end display
+
+Bivariate Logistic Regression is used when you want to explain a dichotomous dependent
+variable in terms of one or more predictor variables.
+
+The minimum command is
+@example
+LOGISTIC REGRESSION @var{y} WITH @var{x1} @var{x2} @dots{} @var{xn}.
+@end example
+Here, @var{y} is the dependent variable, which must be dichotomous and @var{x1} @dots{} @var{xn}
+are the predictor variables whose coefficients the procedure estimates.
+
+By default, a constant term is included in the model.
+Hence, the full model is
+@math{
+{\bf y}
+= b_0 + b_1 {\bf x_1}
++ b_2 {\bf x_2}
++ \dots
++ b_n {\bf x_n}
+}
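+
+As background, in the usual formulation of logistic regression the linear
+combination above is interpreted as the log-odds of @var{y} being 1 rather
+than as the observed value of @var{y}. Writing @math{p} (a symbol introduced
+here for illustration) for the predicted probability that @var{y} is 1, this
+corresponds to
+@math{p = 1 / (1 + e^{-(b_0 + b_1 x_1 + \dots + b_n x_n)})}.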
+
+Predictor variables which are categorical in nature should be listed on the @subcmd{/CATEGORICAL} subcommand.
+Simple variables as well as interactions between variables may be listed here.
+
+If you want a model without the constant term @math{b_0}, use the keyword @subcmd{/ORIGIN}.
+@subcmd{/NOCONST} is a synonym for @subcmd{/ORIGIN}.
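+
+For instance, the following sketch treats the placeholder predictor @var{x2}
+as categorical and fits a model without the constant term. Listing @var{x2}
+both after @subcmd{WITH} and on @subcmd{/CATEGORICAL} is an assumption of this
+example.
+@example
+* Placeholder variable names.
+LOGISTIC REGRESSION @var{y} WITH @var{x1} @var{x2}
+  /CATEGORICAL = @var{x2}
+  /ORIGIN.
+@end example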
+
+An iterative Newton-Raphson procedure is used to fit the model.
+The @subcmd{/CRITERIA} subcommand is used to specify the stopping criteria of the procedure,
+and other parameters.
+The value of @var{cut_point} is used in the classification table. It is the
+threshold above which predicted values are considered to be 1. Values
+of @var{cut_point} must lie in the range [0,1].
+During iterations, if any one of the stopping criteria is satisfied, the procedure is
+considered complete.
+The stopping criteria are:
+@itemize
+@item The number of iterations exceeds @var{max_iterations}.
+ The default value of @var{max_iterations} is 20.
+@item The change in all coefficient estimates is less than @var{min_delta}.
+The default value of @var{min_delta} is 0.001.
+@item The magnitude of change in the likelihood estimate is less than @var{min_likelihood_delta}.
+The default value of @var{min_likelihood_delta} is zero.
+This means that this criterion is disabled.
+@item The differential of the estimated probability for all cases is less than @var{min_epsilon}.
+In other words, the probabilities are close to zero or one.
+The default value of @var{min_epsilon} is 0.00000001.
+@end itemize
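+
+A sketch of adjusting these settings, with placeholder variables and arbitrary
+values assumed for illustration:
+@example
+* Allow up to 50 iterations, tighten the coefficient criterion, and move the cut point.
+LOGISTIC REGRESSION @var{y} WITH @var{x1} @var{x2}
+  /CRITERIA = ITERATE(50) BCON(0.0001) CUT(0.6).
+@end example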
+
+
+The @subcmd{PRINT} subcommand controls the display of optional statistics.
+Currently there is one such option, @subcmd{CI}, which indicates that the
+confidence interval of the odds ratio should be displayed as well as its value.
+@subcmd{CI} should be followed by an integer in parentheses, to indicate the
+confidence level of the desired confidence interval.
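+
+For example, assuming placeholder variables, a 95% confidence interval for the
+odds ratios could be requested as in this sketch:
+@example
+* Placeholder variable names.
+LOGISTIC REGRESSION @var{y} WITH @var{x1} @var{x2}
+  /PRINT = CI(95).
+@end example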
+
+The @subcmd{MISSING} subcommand determines the handling of missing
+values.
+If @subcmd{INCLUDE} is set, then user-missing values are included in the
+calculations, but system-missing values are not.
+If @subcmd{EXCLUDE} is set, which is the default, user-missing
+values are excluded as well as system-missing values.
+
+@node MEANS
+@section MEANS
+
+@vindex MEANS
+@cindex means
+
+@display
+MEANS [TABLES =]
+ @{@var{var_list}@}
+ [ BY @{@var{var_list}@} [BY @{@var{var_list}@} [BY @{@var{var_list}@} @dots{} ]]]
+
+ [ /@{@var{var_list}@}
+ [ BY @{@var{var_list}@} [BY @{@var{var_list}@} [BY @{@var{var_list}@} @dots{} ]]] ]
+
+ [/CELLS = [MEAN] [COUNT] [STDDEV] [SEMEAN] [SUM] [MIN] [MAX] [RANGE]
+ [VARIANCE] [KURT] [SEKURT]
+ [SKEW] [SESKEW] [FIRST] [LAST]
+ [HARMONIC] [GEOMETRIC]
+ [DEFAULT]
+ [ALL]
+ [NONE] ]
+
+ [/MISSING = [INCLUDE] [DEPENDENT]]
+@end display
+
+You can use the @cmd{MEANS} command to calculate the arithmetic mean and similar
+statistics, either for the dataset as a whole or for categories of data.
+
+The simplest form of the command is
+@example
+MEANS @var{v}.
+@end example
+@noindent which calculates the mean, count and standard deviation for @var{v}.
+If you specify a grouping variable, for example
+@example
+MEANS @var{v} BY @var{g}.
+@end example
+@noindent then the means, counts and standard deviations for @var{v} after having
+been grouped by @var{g} are calculated.
+Instead of the mean, count and standard deviation, you could specify the statistics
+in which you are interested:
+@example
+MEANS @var{x} @var{y} BY @var{g}
+ /CELLS = HARMONIC SUM MIN.
+@end example
+This example calculates the harmonic mean, the sum and the minimum values of @var{x} and @var{y}
+grouped by @var{g}.
+
+The @subcmd{CELLS} subcommand specifies which statistics to calculate. The available statistics
+are:
+@itemize
+@item @subcmd{MEAN}
+@cindex arithmetic mean
+ The arithmetic mean.
+@item @subcmd{COUNT}
+ The count of the values.
+@item @subcmd{STDDEV}
+ The standard deviation.
+@item @subcmd{SEMEAN}
+ The standard error of the mean.
+@item @subcmd{SUM}
+ The sum of the values.
+@item @subcmd{MIN}
+ The minimum value.
+@item @subcmd{MAX}
+ The maximum value.
+@item @subcmd{RANGE}
+ The difference between the maximum and minimum values.
+@item @subcmd{VARIANCE}
+ The variance.
+@item @subcmd{FIRST}
+ The first value in the category.
+@item @subcmd{LAST}
+ The last value in the category.
+@item @subcmd{SKEW}
+ The skewness.
+@item @subcmd{SESKEW}
+ The standard error of the skewness.
+@item @subcmd{KURT}
+ The kurtosis.
+@item @subcmd{SEKURT}
+ The standard error of the kurtosis.
+@item @subcmd{HARMONIC}
+@cindex harmonic mean
+ The harmonic mean.
+@item @subcmd{GEOMETRIC}
+@cindex geometric mean
+ The geometric mean.
+@end itemize
+
+In addition, three special keywords are recognized:
+@itemize
+@item @subcmd{DEFAULT}
+ This is the same as @subcmd{MEAN} @subcmd{COUNT} @subcmd{STDDEV}.
+@item @subcmd{ALL}
+ All of the above statistics are calculated.
+@item @subcmd{NONE}
+ No statistics are calculated (only a summary is shown).
+@end itemize
+
+
+More than one @dfn{table} can be specified in a single command.
+Each table is separated by a @samp{/}. For
+example
+@example
+MEANS TABLES =
+ @var{c} @var{d} @var{e} BY @var{x}
+ /@var{a} @var{b} BY @var{x} @var{y}
+ /@var{f} BY @var{y} BY @var{z}.
+@end example
+has three tables (the @samp{TABLES =} is optional).
+The first table has three dependent variables @var{c}, @var{d} and @var{e}
+and a single categorical variable @var{x}.
+The second table has two dependent variables @var{a} and @var{b},
+and two categorical variables @var{x} and @var{y}.
+The third table has a single dependent variable @var{f}
+and a categorical variable formed by the combination of @var{y} and @var{z}.
+
+
+By default values are omitted from the analysis only if missing values
+(either system missing or user missing)
+for any of the variables directly involved in their calculation are
+encountered.
+This behaviour can be modified with the @subcmd{/MISSING} subcommand.
+Two options are possible: @subcmd{INCLUDE} and @subcmd{DEPENDENT}.
+
+@subcmd{/MISSING = INCLUDE} says that user missing values, either in the dependent
+variables or in the categorical variables, should be taken at their face
+value, and not excluded.
+
+@subcmd{/MISSING = DEPENDENT} says that user missing values in the dependent
+variables should be taken at their face value; however, cases which
+have user missing values for the categorical variables should be omitted
+from the calculation.
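+
+For instance, using the placeholder names from the earlier examples, the
+following sketch keeps user missing values of @var{v} in the statistics while
+omitting cases whose grouping variable @var{g} is user missing:
+@example
+* Placeholder variable names.
+MEANS @var{v} BY @var{g}
+  /MISSING = DEPENDENT.
+@end example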
+
+@subsection Example Means
+
+The dataset in @file{repairs.sav} contains the mean time between failures (@exvar{mtbf})
+for a sample of artifacts produced by different factories and trialed under
+different operating conditions.
+Since there are four combinations of categorical variables, by simply looking
+at the list of data, it would be hard to see how the scores vary for each category.
+@ref{means:ex} shows one way of tabulating the @exvar{mtbf} in a way which is
+easier to understand.
+
+@float Example, means:ex
+@psppsyntax {means.sps}
+@caption {Running @cmd{MEANS} on the @exvar{mtbf} score with categories @exvar{factory} and @exvar{environment}}
+@end float
+
+The results are shown in @ref{means:res}. The figures shown indicate the mean,
+standard deviation and number of samples in each category.
+These figures however do not indicate whether the results are statistically
+significant. For that, you would need to use the procedures @cmd{ONEWAY}, @cmd{GLM} or
+@cmd{T-TEST} depending on the hypothesis being tested.
+
+@float Result, means:res
+@psppoutput {means}
+@caption {The @exvar{mtbf} categorised by @exvar{factory} and @exvar{environment}}
+@end float
+
+Note that there is no limit to the number of variables for which you can calculate
+statistics, nor to the number of categorical variables per layer, nor the number
+of layers.
+However, running @cmd{MEANS} on a large number of variables, or with categorical variables
+containing a large number of distinct values may result in an extremely large output, which
+will not be easy to interpret.
+So you should consider carefully which variables to select for participation in the analysis.
+