Documentation add subsections for each GRAPH type

diff --git a/doc/statistics.texi b/doc/statistics.texi

index f8a8864da75e2c7db9990a93d94e542cc6797835..3877a8d43d93db8ddd35a49aa287121ac907bd86 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -8,9 +8,11 @@ far.
  * DESCRIPTIVES::                Descriptive statistics.
  * FREQUENCIES::                 Frequency tables.
  * EXAMINE::                     Testing data for normality.
+* GRAPH::                       Plot data.
  * CORRELATIONS::                Correlation tables.
  * CROSSTABS::                   Crosstabulation tables.
-* FACTOR::                      Factor analysis and Principal Components analysis
+* FACTOR::                      Factor analysis and Principal Components analysis.
+* LOGISTIC REGRESSION::         Bivariate Logistic Regression.
  * MEANS::                       Average values and other statistics.
  * NPAR TESTS::                  Nonparametric tests.
  * T-TEST::                      Test hypotheses about means.
@@ -51,17 +53,18 @@ variables to be analyzed.  Keyword @subcmd{VARIABLES} is optional.
  All other subcommands are optional:
  
  The @subcmd{MISSING} subcommand determines the handling of missing variables.  If
-INCLUDE is set, then user-missing values are included in the
-calculations.  If NOINCLUDE is set, which is the default, user-missing
-values are excluded.  If VARIABLE is set, then missing values are
-excluded on a variable by variable basis; if LISTWISE is set, then
+@subcmd{INCLUDE} is set, then user-missing values are included in the
+calculations.  If @subcmd{NOINCLUDE} is set, which is the default, user-missing
+values are excluded.  If @subcmd{VARIABLE} is set, then missing values are
+excluded on a variable by variable basis; if @subcmd{LISTWISE} is set, then
  the entire case is excluded whenever any value in that case has a
-system-missing or, if INCLUDE is set, user-missing value.
+system-missing or, if @subcmd{INCLUDE} is set, user-missing value.
  
  The @subcmd{FORMAT} subcommand affects the output format.  Currently the
-LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
+@subcmd{LABELS/NOLABELS} and @subcmd{NOINDEX/INDEX} settings are not used.
+When @subcmd{SERIAL} is
  set, both valid and missing number of cases are listed in the output;
-when NOSERIAL is set, only valid cases are listed.
+when @subcmd{NOSERIAL} is set, only valid cases are listed.
  
  The @subcmd{SAVE} subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  the specified variables.  The Z scores are saved to new variables.
@@ -69,27 +72,29 @@ Variable names are generated by trying first the original variable name
  with Z prepended and truncated to a maximum of 8 characters, then the
  names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
-variable names can be specified explicitly on VARIABLES in the variable
+variable names can be specified explicitly on @subcmd{VARIABLES} in the variable
  list by enclosing them in parentheses after each variable.
+When Z scores are calculated, @pspp{} ignores @cmd{TEMPORARY},
+treating temporary transformations as permanent.
  
  The @subcmd{STATISTICS} subcommand specifies the statistics to be displayed:
  
  @table @code
-@item ALL
+@item @subcmd{ALL}
  All of the statistics below.
-@item MEAN
+@item @subcmd{MEAN}
  Arithmetic mean.
-@item SEMEAN
+@item @subcmd{SEMEAN}
  Standard error of the mean.
-@item STDDEV
+@item @subcmd{STDDEV}
  Standard deviation.
-@item VARIANCE
+@item @subcmd{VARIANCE}
  Variance.
-@item KURTOSIS
+@item @subcmd{KURTOSIS}
  Kurtosis and standard error of the kurtosis.
-@item SKEWNESS
+@item @subcmd{SKEWNESS}
  Skewness and standard error of the skewness.
-@item RANGE
+@item @subcmd{RANGE}
  Range.
  @item MINIMUM
  Minimum value.
@@ -106,11 +111,11 @@ Standard error of the skewness.
  @end table
  
  The @subcmd{SORT} subcommand specifies how the statistics should be sorted.  Most
-of the possible values should be self-explanatory.  NAME causes the
+of the possible values should be self-explanatory.  @subcmd{NAME} causes the
  statistics to be sorted by name.  By default, the statistics are listed
-in the order that they are specified on the @subcmd{VARIABLES} subcommand.  The A
-and D settings request an ascending or descending sort order,
-respectively.
+in the order that they are specified on the @subcmd{VARIABLES} subcommand.
+The @subcmd{A} and @subcmd{D} settings request an ascending or descending
+sort order, respectively.
  
  @node FREQUENCIES
  @section FREQUENCIES
@@ -131,9 +136,12 @@ FREQUENCIES
                     [@{FREQ[(@var{y_max})],PERCENT[(@var{y_max})]@}] [@{NONORMAL,NORMAL@}]
          /PIECHART=[MINIMUM(@var{x_min})] [MAXIMUM(@var{x_max})]
                    [@{FREQ,PERCENT@}] [@{NOMISSING,MISSING@}]
+        /BARCHART=[MINIMUM(@var{x_min})] [MAXIMUM(@var{x_max})]
+                  [@{FREQ,PERCENT@}]
+        /ORDER=@{ANALYSIS,VARIABLE@}
+
  
  (These options are not currently implemented.)
-        /BARCHART=@dots{}
          /HBAR=@dots{}
          /GROUPED=@dots{}
  @end display
@@ -141,9 +149,8 @@ FREQUENCIES
  The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
  variables.
  @cmd{FREQUENCIES} can also calculate and display descriptive statistics
-(including median and mode) and percentiles,
-@cmd{FREQUENCIES} can also output
-histograms and pie charts.  
+(including median and mode) and percentiles, and various graphical representations
+of the frequency distribution.
  
  The @subcmd{VARIABLES} subcommand is the only required subcommand.  Specify the
  variables to be analyzed.
@@ -166,13 +173,14 @@ respectively, by frequency count.
  @end itemize
  
  The @subcmd{MISSING} subcommand controls the handling of user-missing values.
-When EXCLUDE, the default, is set, user-missing values are not included
-in frequency tables or statistics.  When INCLUDE is set, user-missing
+When @subcmd{EXCLUDE}, the default, is set, user-missing values are not included
+in frequency tables or statistics.  When @subcmd{INCLUDE} is set, user-missing
  are included.  System-missing values are never included in statistics,
  but are listed in frequency tables.
  
-The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
-(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
+The available @subcmd{STATISTICS} are the same as available 
+in @cmd{DESCRIPTIVES} (@pxref{DESCRIPTIVES}), with the addition 
+of @subcmd{MEDIAN}, the data's median
  value, and MODE, the mode.  (If there are multiple modes, the smallest
  value is reported.)  By default, the mean, standard deviation of the
  mean, minimum, and maximum are reported for each variable.
@@ -189,27 +197,48 @@ For instance, @subcmd{/NTILES=4} would cause quartiles to be reported.
  The @subcmd{HISTOGRAM} subcommand causes the output to include a histogram for
  each specified numeric variable.  The X axis by default ranges from
  the minimum to the maximum value observed in the data, but the @subcmd{MINIMUM}
-and @subcmd{MAXIMUM} keywords can set an explicit range.  Specify @subcmd{NORMAL} to
-superimpose a normal curve on the histogram.  Histograms are not
-created for string variables.
+and @subcmd{MAXIMUM} keywords can set an explicit range. 
+@footnote{The number of
+bins is chosen according to the Freedman-Diaconis rule, which sets the bin width to
+@math{2 \times IQR(x)n^{-1/3}}, where @math{IQR(x)} is the interquartile range of @math{x}
+and @math{n} is the number of samples.    Note that
+@cmd{EXAMINE} uses a different algorithm to determine bin sizes.}
+Histograms are not created for string variables.
+
+Specify @subcmd{NORMAL} to superimpose a normal curve on the
+histogram.
  
  @cindex piechart
  The @subcmd{PIECHART} subcommand adds a pie chart for each variable to the data.  Each
  slice represents one value, with the size of the slice proportional to
  the value's frequency.  By default, all non-missing values are given
-slices.  The @subcmd{MINIMUM} and @subcmd{MAXIMUM} keywords can be used to limit the
-displayed slices to a given range of values.  The @subcmd{MISSING} keyword adds
-slices for missing values.
-
-The @subcmd{FREQ} and @subcmd{PERCENT} options on @subcmd{HISTOGRAM} and @subcmd{PIECHART} are accepted
-but not currently honoured.
+slices.  
+The @subcmd{MINIMUM} and @subcmd{MAXIMUM} keywords can be used to limit the
+displayed slices to a given range of values.  
+The keyword @subcmd{NOMISSING} causes missing values to be omitted from the
+piechart.  This is the default.
+If @subcmd{MISSING} is specified instead, then a single slice
+is included, representing all system-missing and user-missing cases.
+
+@cindex bar chart
+The @subcmd{BARCHART} subcommand produces a bar chart for each variable.
+The @subcmd{MINIMUM} and @subcmd{MAXIMUM} keywords can be used to omit
+categories whose counts lie outside the specified limits.
+The @subcmd{FREQ} option (default) causes the ordinate to display the frequency
+of each category, whereas the @subcmd{PERCENT} option will display relative
+percentages.
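+
+As a minimal sketch, assuming a hypothetical categorical variable
+@var{city}, the following requests a bar chart whose ordinate shows
+relative percentages rather than frequencies:
+
+@example
+* city is an illustrative variable name, not part of any shipped dataset.
+FREQUENCIES /VARIABLES = @var{city}
+        /BARCHART = PERCENT.
+@end example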
+
+The @subcmd{FREQ} and @subcmd{PERCENT} options on @subcmd{HISTOGRAM} and 
+@subcmd{PIECHART} are accepted but not currently honoured.
+
+The @subcmd{ORDER} subcommand is accepted but ignored.
  
  @node EXAMINE
  @section EXAMINE
  
  @vindex EXAMINE
  @cindex Exploratory data analysis
-@cindex Normality, testing for
+@cindex normality, testing
  
  @display
  EXAMINE
@@ -238,14 +267,14 @@ normal distribution, and for finding outliers and extreme values.
  The @subcmd{VARIABLES} subcommand is mandatory.  
  It specifies the dependent variables and optionally variables to use as
  factors for the analysis.
-Variables listed before the first BY keyword (if any) are the 
+Variables listed before the first @subcmd{BY} keyword (if any) are the 
  dependent variables.
  The dependent variables may optionally be followed by a list of
  factors which tell @pspp{} how to break down the analysis for each
  dependent variable. 
  
  Following the dependent variables, factors may be specified.
-The factors (if desired) should be preceeded by a single BY keyword.
+The factors (if desired) should be preceded by a single @subcmd{BY} keyword.
  The format for each factor is 
  @display
  @var{factorvar} [BY @var{subfactorvar}].
@@ -282,6 +311,9 @@ The first three can be used to visualise how closely each cell conforms to a
  normal distribution, whilst the spread vs.@: level plot can be useful to visualise
  how the variance of differs between factors.
  Boxplots will also show you the outliers and extreme values.
+@footnote{@subcmd{HISTOGRAM} uses Sturges' rule to determine the number of
+bins, as approximately @math{1 + \log_2(n)}, where @math{n} is the number of samples.
+Note that @cmd{FREQUENCIES} uses a different algorithm to find the bin size.}
  
  The @subcmd{SPREADLEVEL} plot displays the interquartile range versus the 
  median.  It takes an optional parameter @var{t}, which specifies how the data
@@ -304,10 +336,10 @@ If the @subcmd{/COMPARE} subcommand is omitted, then @pspp{} behaves as if
   
  The @subcmd{ID} subcommand is relevant only if @subcmd{/PLOT=BOXPLOT} or 
  @subcmd{/STATISTICS=EXTREME} has been given.
-If given, it shoule provide the name of a variable which is to be used
+If given, it should provide the name of a variable which is to be used
  to labels extreme values and outliers.
  Numeric or string variables are permissible.  
-If the @subcmd{ID} subcommand is not given, then the casenumber will be used for
+If the @subcmd{ID} subcommand is not given, then the case number will be used for
  labelling.
  
  The @subcmd{CINTERVAL} subcommand specifies the confidence interval to use in
@@ -317,7 +349,7 @@ calculation of the descriptives command.  The default is 95%.
  The @subcmd{PERCENTILES} subcommand specifies which percentiles are to be calculated, 
  and which algorithm to use for calculating them.  The default is to
  calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
-HAVERAGE algorithm.
+@subcmd{HAVERAGE} algorithm.
  
  The @subcmd{TOTAL} and @subcmd{NOTOTAL} subcommands are mutually exclusive.  If @subcmd{NOTOTAL}
  is given and factors have been specified in the @subcmd{VARIABLES} subcommand,
@@ -370,6 +402,121 @@ specified for which
  there are many distinct values, then @cmd{EXAMINE} will produce a very
  large quantity of output.
  
+@node GRAPH
+@section GRAPH
+
+@vindex GRAPH
+@cindex Exploratory data analysis
+@cindex normality, testing
+
+@display
+GRAPH
+        /HISTOGRAM [(NORMAL)]= @var{var}
+        /SCATTERPLOT [(BIVARIATE)] = @var{var1} WITH @var{var2} [BY @var{var3}]
+        /BAR = @{@var{summary-function}(@var{var1}) | @var{count-function}@} BY @var{var2} [BY @var{var3}] 
+        [ /MISSING=@{LISTWISE, VARIABLE@} [@{EXCLUDE, INCLUDE@}] ] 
+               [@{NOREPORT,REPORT@}]
+
+@end display
+
+The @cmd{GRAPH} command produces graphical plots of data. Only one of the subcommands 
+@subcmd{HISTOGRAM}, @subcmd{SCATTERPLOT} or @subcmd{BAR} can be specified, i.e.@: only one plot
+can be produced per call of @cmd{GRAPH}. The @subcmd{MISSING} subcommand is optional. 
+
+@menu
+* SCATTERPLOT::             Cartesian Plots
+* HISTOGRAM::               Histograms
+* BAR CHART::               Bar Charts
+@end menu
+
+@node SCATTERPLOT
+@subsection Scatterplot
+@cindex scatterplot
+
+The subcommand @subcmd{SCATTERPLOT} produces an xy plot of the
+data. The different values of the optional third variable @var{var3}
+will result in different colours and/or markers for the plot. The
+following is an example for producing a scatterplot.
+
+@example
+GRAPH   
+        /SCATTERPLOT = @var{height} WITH @var{weight} BY @var{gender}.
+@end example
+
+This example will produce a scatterplot where @var{height} is plotted versus @var{weight}. Depending
+on the value of the @var{gender} variable, the colour of the data point is different. With
+this plot it is possible to analyze gender differences in the relationship between @var{height} and @var{weight}.
+
+@node HISTOGRAM
+@subsection Histogram
+@cindex histogram
+
+The subcommand @subcmd{HISTOGRAM} produces a histogram. Only one variable is allowed for
+the histogram plot.
+The keyword @subcmd{NORMAL} may be specified in parentheses, to indicate that the ideal normal curve
+should be superimposed over the histogram.
+For an alternative method to produce histograms, see @ref{EXAMINE}. The
+following example produces a histogram plot for the variable @var{weight}.
+
+@example
+GRAPH   
+        /HISTOGRAM = @var{weight}.
+@end example
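+
+To superimpose the normal curve, place @subcmd{NORMAL} in parentheses
+before the equals sign, as the synopsis above indicates.  A sketch, again
+using the illustrative variable @var{weight}:
+
+@example
+* Histogram of weight with the ideal normal curve overlaid.
+GRAPH
+        /HISTOGRAM (NORMAL) = @var{weight}.
+@end example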
+
+@node BAR CHART
+@subsection Bar Chart
+@cindex bar chart
+
+The subcommand @subcmd{BAR} produces a bar chart.
+This subcommand requires that a @var{count-function} be specified (with no arguments) or a @var{summary-function} with a variable @var{var1} in parentheses.
+Following the summary or count function, the keyword @subcmd{BY} should be specified and then a categorical variable, @var{var2}.
+The values of the variable @var{var2} determine the labels of the bars to be plotted.
+Optionally a second categorical variable @var{var3} may be specified in which case a clustered (grouped) bar chart is produced.
+
+Valid count functions are
+@table @subcmd
+@item COUNT
+The weighted counts of the cases in each category.
+@item PCT
+The weighted counts of the cases in each category expressed as a percentage of the total weights of the cases.
+@item CUFREQ
+The cumulative weighted counts of the cases in each category.
+@item CUPCT
+The cumulative weighted counts of the cases in each category expressed as a percentage of the total weights of the cases.
+@end table
+
+The summary function is applied to @var{var1} across all cases in each category.
+The recognised summary functions are:
+@table @subcmd
+@item SUM
+The sum.
+@item MEAN
+The arithmetic mean.
+@item MAXIMUM
+The maximum value.
+@item MINIMUM
+The minimum value.
+@end table
+
+The following examples assume a dataset which is the results of a survey.
+Each respondent has indicated annual income, their sex and city of residence.
+One could create a bar chart showing how the mean income varies between residents of different cities, thus:
+@example
+GRAPH  /BAR  = MEAN(@var{income}) BY @var{city}.
+@end example
+
+This can be extended to also indicate how income in each city differs between the sexes.
+@example
+GRAPH  /BAR  = MEAN(@var{income}) BY @var{city} BY @var{sex}.
+@end example
+
+One might also want to see how many respondents there are from each city.  This can be achieved as follows:
+@example
+GRAPH  /BAR  = COUNT BY @var{city}.
+@end example
+
+Bar charts can also be produced using the @ref{FREQUENCIES} and @ref{CROSSTABS} commands.
+
  @node CORRELATIONS
  @section CORRELATIONS
  
@@ -394,19 +541,18 @@ CORRELATIONS
  The @cmd{CORRELATIONS} procedure produces tables of the Pearson correlation coefficient
  for a set of variables.  The significance of the coefficients are also given.
  
-At least one @subcmd{VARIABLES} subcommand is required. If the WITH keyword is used, then a non-square
-correlation table will be produced.
-The variables preceding WITH, will be used as the rows of the table, and the variables following
-will be the columns of the table.
+At least one @subcmd{VARIABLES} subcommand is required. If the @subcmd{WITH} 
+keyword is used, then a non-square correlation table will be produced.
+The variables preceding @subcmd{WITH}, will be used as the rows of the table,
+and the variables following will be the columns of the table.
  If no @subcmd{WITH} subcommand is given, then a square, symmetrical table using all variables is produced.
  
  
  The @cmd{MISSING} subcommand determines the handling of missing variables.  
-If INCLUDE is set, then user-missing values are included in the
+If @subcmd{INCLUDE} is set, then user-missing values are included in the
  calculations, but system-missing values are not.
  If @subcmd{EXCLUDE} is set, which is the default, user-missing
  values are excluded as well as system-missing values. 
-This is the default.
  
  If @subcmd{LISTWISE} is set, then the entire case is excluded from analysis
  whenever any variable  specified in any @cmd{/VARIABLES} subcommand
@@ -426,13 +572,13 @@ If @subcmd{SIG} is specified, then no highlighting is performed.  This is the de
  
  @cindex covariance
  The @subcmd{STATISTICS} subcommand requests additional statistics to be displayed.  The keyword 
-DESCRIPTIVES requests that the mean, number of non-missing cases, and the non-biased
+@subcmd{DESCRIPTIVES} requests that the mean, number of non-missing cases, and the non-biased
  estimator of the standard deviation are displayed.
  These statistics will be displayed in a separated table, for all the variables listed
  in any @subcmd{/VARIABLES} subcommand.
  The @subcmd{XPROD} keyword requests cross-product deviations and covariance estimators to 
  be displayed for each pair of variables.
-The keyword ALL is the union of @subcmd{DESCRIPTIVES} and @subcmd{XPROD}.
+The keyword @subcmd{ALL} is the union of @subcmd{DESCRIPTIVES} and @subcmd{XPROD}.
  
  @node CROSSTABS
  @section CROSSTABS
@@ -452,6 +598,7 @@ CROSSTABS
                  ASRESIDUAL,ALL,NONE@}
          /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
                       KAPPA,ETA,CORR,ALL,NONE@}
+        /BARCHART
          
  (Integer mode.)
          /VARIABLES=@var{var_list} (@var{low},@var{high})@dots{}
@@ -483,9 +630,9 @@ In general mode, numeric and string variables may be specified on
  TABLES.  In integer mode, only numeric variables are allowed.
  
  The @subcmd{MISSING} subcommand determines the handling of user-missing values.
-When set to TABLE, the default, missing values are dropped on a table by
-table basis.  When set to INCLUDE, user-missing values are included in
-tables and statistics.  When set to REPORT, which is allowed only in
+When set to @subcmd{TABLE}, the default, missing values are dropped on a table by
+table basis.  When set to @subcmd{INCLUDE}, user-missing values are included in
+tables and statistics.  When set to @subcmd{REPORT}, which is allowed only in
  integer mode, user-missing values are included in tables but marked with
  an @samp{M} (for ``missing'') and excluded from statistical
  calculations.
@@ -496,10 +643,10 @@ The @subcmd{FORMAT} subcommand controls the characteristics of the
  crosstabulation tables to be displayed.  It has a number of possible
  settings:
  
-@itemize @subcmd{}
+@itemize @w{}
  @item
-TABLES, the default, causes crosstabulation tables to be output.
-NOTABLES suppresses them.
+@subcmd{TABLES}, the default, causes crosstabulation tables to be output.
+@subcmd{NOTABLES} suppresses them.
  
  @item
  @subcmd{PIVOT}, the default, causes each @subcmd{TABLES} subcommand to be displayed in a
@@ -544,7 +691,8 @@ Suppress cells entirely.
  @end table
  
  @samp{/CELLS} without any settings specified requests @subcmd{COUNT}, @subcmd{ROW},
-@subcmd{COLUMN}, and @subcmd{TOTAL}.  If CELLS is not specified at all then only @subcmd{COUNT}
+@subcmd{COLUMN}, and @subcmd{TOTAL}.  
+If @subcmd{CELLS} is not specified at all then only @subcmd{COUNT}
  will be selected.
  
  The @subcmd{STATISTICS} subcommand selects statistics for computation:
@@ -593,24 +741,24 @@ some statistics are calculated only in integer mode.
  @samp{/STATISTICS} without any settings selects CHISQ.  If the
  @subcmd{STATISTICS} subcommand is not given, no statistics are calculated.
  
-@strong{Please note:} Currently the implementation of CROSSTABS has the
-followings bugs:
+@cindex bar chart
+The @samp{/BARCHART} subcommand produces a clustered bar chart for the first two
+variables on each table.
+If a table has more than two variables, the counts for the third and subsequent levels 
+will be aggregated and the chart will be produced as if there were only two variables.  
+
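+For instance, assuming hypothetical variables @var{city} and @var{sex},
+the following sketch would request the crosstabulation together with a
+clustered bar chart of the counts:
+
+@example
+* city and sex are illustrative variable names.
+CROSSTABS
+        /TABLES = @var{city} BY @var{sex}
+        /BARCHART.
+@end example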
+
+@strong{Please note:} Currently the implementation of @cmd{CROSSTABS} has the
+following limitations:
  
  @itemize @bullet
  @item
-Pearson's R (but not Spearman) is off a little.
-@item
-T values for Spearman's R and Pearson's R are wrong.
-@item
-Significance of symmetric and directional measures is not calculated.
-@item
-Asymmetric ASEs and T values for lambda are wrong.
+Significance of some symmetric and directional measures is not calculated.
  @item
-ASE of Goodman and Kruskal's tau is not calculated.
+Asymptotic standard error is not calculated for
+Goodman and Kruskal's tau or symmetric Somers' d.
  @item
-ASE of symmetric somers' d is wrong.
-@item
-Approximate T of uncertainty coefficient is wrong.
+Approximate T is not calculated for symmetric uncertainty coefficient.
  @end itemize
  
  Fixes for any of these deficiencies would be welcomed.
@@ -629,9 +777,11 @@ FACTOR  VARIABLES=@var{var_list}
  
          [ /METHOD = @{CORRELATION, COVARIANCE@} ]
  
+        [ /ANALYSIS=@var{var_list} ]
+
          [ /EXTRACTION=@{PC, PAF@}] 
  
-        [ /ROTATION=@{VARIMAX, EQUAMAX, QUARTIMAX, NOROTATE@}]
+        [ /ROTATION=@{VARIMAX, EQUAMAX, QUARTIMAX, PROMAX[(@var{k})], NOROTATE@}]
  
          [ /PRINT=[INITIAL] [EXTRACTION] [ROTATION] [UNIVARIATE] [CORRELATION] [COVARIANCE] [DET] [KMO] [SIG] [ALL] [DEFAULT] ]
  
@@ -644,68 +794,86 @@ FACTOR  VARIABLES=@var{var_list}
          [ /MISSING=[@{LISTWISE, PAIRWISE@}] [@{INCLUDE, EXCLUDE@}] ]
  @end display
  
-The FACTOR command performs Factor Analysis or Principal Axis Factoring on a dataset.  It may be used to find
+The @cmd{FACTOR} command performs Factor Analysis or Principal Axis Factoring on a dataset.  It may be used to find
  common factors in the data or for data reduction purposes.
  
-The @subcmd{VARIABLES} subcommand is required.  It lists the variables which are to partake in the analysis.
+The @subcmd{VARIABLES} subcommand is required.  It lists the variables
+which are to partake in the analysis.  (The @subcmd{ANALYSIS}
+subcommand may optionally further limit the variables that
+participate; it is not useful and implemented only for compatibility.)
  
  The @subcmd{/EXTRACTION} subcommand is used to specify the way in which factors (components) are extracted from the data.
-If PC is specified, then Principal Components Analysis is used.  If PAF is specified, then Principal Axis Factoring is
+If @subcmd{PC} is specified, then Principal Components Analysis is used.  
+If @subcmd{PAF} is specified, then Principal Axis Factoring is
  used. By default Principal Components Analysis will be used.
  
  The @subcmd{/ROTATION} subcommand is used to specify the method by which the extracted solution will be rotated.
-Three methods are available: VARIMAX (which is the default), EQUAMAX, and QUARTIMAX.
-If don't want any rotation to be performed, the word NOROTATE will prevent the command from performing any
-rotation on the data. Oblique rotations are not supported.
+Three orthogonal rotation methods are available: 
+@subcmd{VARIMAX} (which is the default), @subcmd{EQUAMAX}, and @subcmd{QUARTIMAX}.
+There is one oblique rotation method, @i{viz}: @subcmd{PROMAX}.
+Optionally you may enter the power of the promax rotation @var{k}, which must be enclosed in parentheses.
+The default value of @var{k} is 5.
+If you don't want any rotation to be performed, the word @subcmd{NOROTATE} will prevent the command from performing any
+rotation on the data. 
  
  The @subcmd{/METHOD} subcommand should be used to determine whether the covariance matrix or the correlation matrix of the data is
  to be analysed.  By default, the correlation matrix is analysed.
  
  The @subcmd{/PRINT} subcommand may be used to select which features of the analysis are reported:
  
-@itemize @subcmd{}
-@item UNIVARIATE
+@itemize 
+@item @subcmd{UNIVARIATE}
        A table of mean values, standard deviations and total weights are printed.
-@item INITIAL
+@item @subcmd{INITIAL}
        Initial communalities and eigenvalues are printed.
-@item EXTRACTION
+@item @subcmd{EXTRACTION}
        Extracted communalities and eigenvalues are printed.
-@item ROTATION
+@item @subcmd{ROTATION}
        Rotated communalities and eigenvalues are printed.
-@item CORRELATION
+@item @subcmd{CORRELATION}
        The correlation matrix is printed.
-@item COVARIANCE
+@item @subcmd{COVARIANCE}
        The covariance matrix is printed.
-@item DET
+@item @subcmd{DET}
        The determinant of the correlation or covariance matrix is printed.
-@item KMO
+@item @subcmd{KMO}
        The Kaiser-Meyer-Olkin measure of sampling adequacy and the Bartlett test of sphericity is printed.
-@item SIG
+@item @subcmd{SIG}
        The significance of the elements of correlation matrix is printed.
-@item ALL
+@item @subcmd{ALL}
        All of the above are printed.
-@item DEFAULT
-      Identical to INITIAL and EXTRACTION.
+@item @subcmd{DEFAULT}
+      Identical to @subcmd{INITIAL} and @subcmd{EXTRACTION}.
  @end itemize
  
  If @subcmd{/PLOT=EIGEN} is given, then a ``Scree'' plot of the eigenvalues will be printed.  This can be useful for visualizing
  which factors (components) should be retained.
  
-The @subcmd{/FORMAT} subcommand determined how data are to be displayed in loading matrices.  If SORT is specified, then the variables
-are sorted in descending order of significance.  If BLANK(@var{n}) is specified, then coefficients whose absolute value is less
-than @var{n} will not be printed.  If the keyword DEFAULT is given, or if no @subcmd{/FORMAT} subcommand is given, then no sorting is 
+The @subcmd{/FORMAT} subcommand determines how data are to be displayed in loading matrices.  If @subcmd{SORT} is specified, then the variables
+are sorted in descending order of significance.  If @subcmd{BLANK(@var{n})} is specified, then coefficients whose absolute value is less
+than @var{n} will not be printed.  If the keyword @subcmd{DEFAULT} is given, or if no @subcmd{/FORMAT} subcommand is given, then no sorting is 
  performed, and all coefficients will be printed.
  
  The @subcmd{/CRITERIA} subcommand is used to specify how the number of extracted factors (components) are chosen.
  If @subcmd{FACTORS(@var{n})} is
  specified, where @var{n} is an integer, then @var{n} factors will be extracted.  Otherwise, the @subcmd{MINEIGEN} setting will
-be used.  @subcmd{MINEIGEN(@var{l})} requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted.
-The default value of @var{l} is 1.    The @subcmd{ECONVERGE} and @subcmd{ITERATE} settings have effect only when iterative algorithms for factor
-extraction (such as Principal Axis Factoring) are used.   @subcmd{ECONVERGE(@var{delta})} specifies that
+be used.  
+@subcmd{MINEIGEN(@var{l})} requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted.
+The default value of @var{l} is 1.    
+The @subcmd{ECONVERGE} setting has effect only when iterative algorithms for factor
+extraction (such as Principal Axis Factoring) are used.   
+@subcmd{ECONVERGE(@var{delta})} specifies that
  iteration should cease when
  the maximum absolute value of the communality estimate between one iteration and the previous is less than @var{delta}. The
  default value of @var{delta} is 0.001.
-The @subcmd{ITERATE(@var{m})} setting sets the maximum number of iterations to @var{m}.  The default value of @var{m} is 25.
+The @subcmd{ITERATE(@var{m})} setting may appear any number of times and is used for two different purposes.  
+It is used to set the maximum number of iterations (@var{m}) for convergence and also to set the maximum number of iterations
+for rotation.
+Whether it affects convergence or rotation depends upon which subcommand follows the @subcmd{ITERATE} setting.
+If @subcmd{EXTRACTION} follows, it affects convergence.  
+If @subcmd{ROTATION} follows, it affects rotation.  
+If neither @subcmd{ROTATION} nor @subcmd{EXTRACTION} follows an @subcmd{ITERATE} setting, it is ignored.
+The default value of @var{m} is 25.
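+
+Because the meaning of @subcmd{ITERATE} depends on the subcommand that
+follows it, the order of subcommands matters.  The following sketch,
+which assumes hypothetical variables @var{v1} to @var{v3} and arbitrary
+limits, is intended to set separate iteration limits for extraction and
+for rotation:
+
+@example
+* v1, v2 and v3 and the limits 50 and 100 are illustrative only.
+FACTOR  VARIABLES = @var{v1} @var{v2} @var{v3}
+        /CRITERIA = ITERATE(50)
+        /EXTRACTION = PAF
+        /CRITERIA = ITERATE(100)
+        /ROTATION = PROMAX(4).
+@end example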
  
  The @cmd{MISSING} subcommand determines the handling of missing variables.  
  If @subcmd{INCLUDE} is set, then user-missing values are included in the
@@ -720,6 +888,92 @@ If @subcmd{PAIRWISE} is set, then a case is considered missing only if either of
  values  for the particular coefficient are missing.
  The default is @subcmd{LISTWISE}.
  
+@node LOGISTIC REGRESSION
+@section LOGISTIC REGRESSION
+
+@vindex LOGISTIC REGRESSION
+@cindex logistic regression
+@cindex bivariate logistic regression
+
+@display
+LOGISTIC REGRESSION [VARIABLES =] @var{dependent_var} WITH @var{predictors}
+
+     [/CATEGORICAL = @var{categorical_predictors}]
+
+     [@{/NOCONST | /ORIGIN | /NOORIGIN @}]
+
+     [/PRINT = [SUMMARY] [DEFAULT] [CI(@var{confidence})] [ALL]]
+
+     [/CRITERIA = [BCON(@var{min_delta})] [ITERATE(@var{max_iterations})]
+                  [LCON(@var{min_likelihood_delta})] [EPS(@var{min_epsilon})]
+                  [CUT(@var{cut_point})]]
+
+     [/MISSING = @{INCLUDE|EXCLUDE@}]
+@end display
+
+Bivariate Logistic Regression is used when you want to explain a dichotomous dependent
+variable in terms of one or more predictor variables.
+
+The minimum command is
+@example
+LOGISTIC REGRESSION @var{y} WITH @var{x1} @var{x2} @dots{} @var{xn}.
+@end example
+Here, @var{y} is the dependent variable, which must be dichotomous and @var{x1} @dots{} @var{xn}
+are the predictor variables whose coefficients the procedure estimates.
+
+By default, a constant term is included in the model.
+Hence, the full model is
+@math{
+{\bf y} 
+= b_0 + b_1 {\bf x_1} 
++ b_2 {\bf x_2} 
++ \dots
++ b_n {\bf x_n}
+}
+
+Predictor variables which are categorical in nature should be listed on the @subcmd{/CATEGORICAL} subcommand.
+Simple variables as well as interactions between variables may be listed here.
+
+If you want a model without the constant term @math{b_0}, use the keyword @subcmd{/ORIGIN}.
+@subcmd{/NOCONST} is a synonym for @subcmd{/ORIGIN}.
+
+An iterative Newton-Raphson procedure is used to fit the model.
+The @subcmd{/CRITERIA} subcommand is used to specify the stopping criteria of the procedure,
+and other parameters.
+The value of @var{cut_point} is used in the classification table.  It is the 
+threshold above which predicted values are considered to be 1.  Values
+of @var{cut_point} must lie in the range [0,1].
+During iterations, if any one of the stopping criteria is satisfied, the procedure is
+considered complete.
+The stopping criteria are:
+@itemize
+@item The number of iterations exceeds @var{max_iterations}.  
+      The default value of @var{max_iterations} is 20.
+@item The change in all coefficient estimates is less than @var{min_delta}.
+The default value of @var{min_delta} is 0.001.
+@item The magnitude of change in the likelihood estimate is less than @var{min_likelihood_delta}.
+The default value of @var{min_likelihood_delta} is zero.
+This means that this criterion is disabled.
+@item The differential of the estimated probability for all cases is less than @var{min_epsilon}.
+In other words, the probabilities are close to zero or one.
+The default value of @var{min_epsilon} is 0.00000001.
+@end itemize
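+
+For instance, the following sketch raises the iteration limit to 50, tightens
+the coefficient tolerance to 0.0001, and classifies a case as 1 when its
+predicted value exceeds 0.5.  The variable names and the particular values
+are illustrative assumptions only:
+@example
+LOGISTIC REGRESSION outcome WITH age dose
+     /CRITERIA = ITERATE(50) BCON(0.0001) CUT(0.5).
+@end example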
+
+
+The @subcmd{PRINT} subcommand controls the display of optional statistics.
+Currently there is one such option, @subcmd{CI}, which indicates that the 
+confidence interval of the odds ratio should be displayed as well as its value.
+@subcmd{CI} should be followed by an integer in parentheses, to indicate the
+confidence level of the desired confidence interval.
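+
+A sketch of requesting a 95% confidence interval for the odds ratios,
+using hypothetical variable names:
+@example
+LOGISTIC REGRESSION outcome WITH age dose
+     /PRINT = CI(95).
+@end example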
+
+The @subcmd{MISSING} subcommand determines the handling of missing
+variables.  
+If @subcmd{INCLUDE} is set, then user-missing values are included in the
+calculations, but system-missing values are not.
+If @subcmd{EXCLUDE} is set, then user-missing
+values are excluded as well as system-missing values. 
+This is the default.
+
  @node MEANS
  @section MEANS
  
@@ -983,7 +1237,7 @@ outside the  specified range are excluded from the analysis.
  The @subcmd{/EXPECTED} subcommand specifies the expected values of each
  category.  
  There must be exactly one non-zero expected value, for each observed
-category, or the @subcmd{EQUAL} keywork must be specified.
+category, or the @subcmd{EQUAL} keyword must be specified.
  You may use the notation @subcmd{@var{n}*@var{f}} to specify @var{n}
  consecutive expected categories all taking a frequency of @var{f}.
  The frequencies given are proportions, not absolute frequencies.  The
@@ -1234,7 +1488,7 @@ of variable preceding @code{WITH} against variable following
  The @subcmd{/WILCOXON} subcommand tests for differences between medians of the 
  variables listed.
  The test does not make any assumptions about the variances of the samples.
-It does however assume that the distribution is symetrical.
+It does however assume that the distribution is symmetrical.
  
  If the @subcmd{WITH} keyword is omitted, then tests for all
  combinations of the listed variables are performed.
@@ -1256,7 +1510,7 @@ of variable preceding @subcmd{WITH} against variable following
  @display
  T-TEST
          /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
-        /CRITERIA=CIN(@var{confidence})
+        /CRITERIA=CI(@var{confidence})
  
  
  (One Sample mode.)
@@ -1398,7 +1652,7 @@ The list of variables must be followed by the @subcmd{BY} keyword and
  the name of the independent (or factor) variable.
  
  You can use the @subcmd{STATISTICS} subcommand to tell @pspp{} to display
-ancilliary information.  The options accepted are:
+ancillary information.  The options accepted are:
  @itemize
  @item DESCRIPTIVES
  Displays descriptive statistics about the groups factored by the independent
@@ -1461,8 +1715,9 @@ The default is 0.05.
  
  @display
  QUICK CLUSTER @var{var_list}
-      [/CRITERIA=CLUSTERS(@var{k}) [MXITER(@var{max_iter})]]
+      [/CRITERIA=CLUSTERS(@var{k}) [MXITER(@var{max_iter})] [CONVERGE(@var{epsilon})] [NOINITIAL] [NOUPDATE]]
        [/MISSING=@{EXCLUDE,INCLUDE@} @{LISTWISE, PAIRWISE@}]
+      [/PRINT=@{INITIAL@} @{CLUSTERS@}]
  @end display
  
  The @cmd{QUICK CLUSTER} command performs k-means clustering on the
@@ -1472,11 +1727,29 @@ of similar values and you already know the number of clusters.
  The minimum specification is @samp{QUICK CLUSTER} followed by the names
  of the variables which contain the cluster data.  Normally you will also
  want to specify @subcmd{/CRITERIA=CLUSTERS(@var{k})} where @var{k} is the
-number of clusters.  If this is not given, then @var{k} defaults to 2.
+number of clusters.  If this is not specified, then @var{k} defaults to 2.
+
+If you use @subcmd{/CRITERIA=NOINITIAL}, then a naive algorithm is used to
+select the initial clusters.  This provides faster execution, but the
+initial clusters are less well separated, so the final result may be
+inferior.
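+
+For example, assuming hypothetical variables @samp{height} and @samp{weight},
+the naive initial selection might be requested for three clusters with:
+@example
+QUICK CLUSTER height weight
+      /CRITERIA=CLUSTERS(3) NOINITIAL.
+@end example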
+
  
-The command uses an iterative algorithm to determine the clusters for
-each case.  It will continue iterating until convergence, or until @var{max_iter}
-iterations have been done.  The default value of @var{max_iter} is 2.
+@cmd{QUICK CLUSTER} uses an iterative algorithm to select the cluster centers.
+The subcommand @subcmd{/CRITERIA=MXITER(@var{max_iter})} sets the maximum number of iterations.
+During classification, @pspp{} will continue iterating until @var{max_iter}
+iterations have been done or the convergence criterion (see below) is fulfilled.
+The default value of @var{max_iter} is 2.
+
+If, however, you specify @subcmd{/CRITERIA=NOUPDATE}, then after selecting the initial centers,
+no further update to the cluster centers is done.  In this case, @var{max_iter}, if specified,
+is ignored.
+
+The subcommand @subcmd{/CRITERIA=CONVERGE(@var{epsilon})} is used
+to set the convergence criterion.  The value of the convergence criterion is @var{epsilon}
+times the minimum distance between the @emph{initial} cluster centers.  Iteration stops when
+the mean cluster distance between one iteration and the next
+is less than the convergence criterion.  The default value of @var{epsilon} is zero.
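+
+As a worked illustration with assumed numbers: if @var{epsilon} is 0.01 and
+the two closest initial cluster centers are 50 units apart, then the
+convergence criterion is 0.01 times 50, that is 0.5.  Iteration then stops
+once the mean cluster distance between successive iterations falls below 0.5,
+or after @var{max_iter} iterations, whichever comes first.
+A command along these lines, with hypothetical variable names, would be:
+@example
+QUICK CLUSTER height weight
+      /CRITERIA=CLUSTERS(3) MXITER(20) CONVERGE(0.01).
+@end example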
  
  The @subcmd{MISSING} subcommand determines the handling of missing variables.  
  If @subcmd{INCLUDE} is set, then user-missing values are considered at their face
@@ -1491,6 +1764,12 @@ clustering variables contain missing values.  Otherwise it is clustered
  on the basis of the non-missing values.
  The default is @subcmd{LISTWISE}.
  
+The @subcmd{PRINT} subcommand requests additional output to be printed.
+If @subcmd{INITIAL} is set, then the initial cluster memberships will
+be printed.
+If @subcmd{CLUSTERS} is set, the cluster memberships of the individual
+cases will be displayed (potentially generating lengthy output).
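+
+For example (the variable names are hypothetical), both kinds of output
+could be requested together:
+@example
+QUICK CLUSTER height weight
+      /CRITERIA=CLUSTERS(3)
+      /PRINT=INITIAL CLUSTERS.
+@end example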
+
  
  @node RANK
  @section RANK
@@ -1522,7 +1801,7 @@ more variables whose values are to be ranked.
  After each variable, @samp{A} or @samp{D} may appear, indicating that
  the variable is to be ranked in ascending or descending order.
  Ascending is the default.
-If a BY keyword appears, it should be followed by a list of variables
+If a @subcmd{BY} keyword appears, it should be followed by a list of variables
  which are to serve as group variables.  
  In this case, the cases are gathered into groups, and ranks calculated
  for each group.
@@ -1531,7 +1810,7 @@ The @subcmd{TIES} subcommand specifies how tied values are to be treated.  The
  default is to take the mean value of all the tied cases.
  
  The @subcmd{FRACTION} subcommand specifies how proportional ranks are to be
-calculated.  This only has any effect if NORMAL or PROPORTIONAL rank
+calculated.  This only has any effect if @subcmd{NORMAL} or @subcmd{PROPORTIONAL} rank
  functions are requested.
  
  The @subcmd{PRINT} subcommand may be used to specify that a summary of the rank
@@ -1570,7 +1849,7 @@ RELIABILITY
  @end display
  
  @cindex Cronbach's Alpha
-The @cmd{RELIABILTY} command performs reliability analysis on the data.
+The @cmd{RELIABILITY} command performs reliability analysis on the data.
  
  The @subcmd{VARIABLES} subcommand is required. It determines the set of variables 
  upon which analysis is to be performed.