* DESCRIPTIVES:: Descriptive statistics.
* FREQUENCIES:: Frequency tables.
* EXAMINE:: Testing data for normality.
+* CORRELATIONS:: Correlation tables.
* CROSSTABS:: Crosstabulation tables.
+* FACTOR:: Factor analysis and Principal Components analysis.
* NPAR TESTS:: Nonparametric tests.
* T-TEST:: Test hypotheses about means.
* ONEWAY:: One way analysis of variance.
* RANK:: Compute rank scores.
* REGRESSION:: Linear regression.
+* RELIABILITY:: Reliability analysis.
+* ROC:: Receiver Operating Characteristic.
@end menu
@node DESCRIPTIVES
For instance, @code{/NTILES=4} would cause quartiles to be reported.
The HISTOGRAM subcommand causes the output to include a histogram for
-each specified variable. The X axis by default ranges from the
+each specified numeric variable. The X axis by default ranges from the
minimum to the maximum value observed in the data, but the MINIMUM and
MAXIMUM keywords can set an explicit range. The Y axis by default is
labeled in frequencies; use the PERCENT keyword to cause it to be
labeled in percent of the total observed count. Specify NORMAL to
superimpose a normal curve on the histogram.
+Histograms are not created for string variables.
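+
+As a sketch of how these keywords combine, the following requests a
+histogram over a fixed range, labeled in percent, with a normal curve
+superimposed. The variable name @code{age} and the range limits are
+hypothetical, and the parenthesised form of MINIMUM and MAXIMUM is assumed
+from the full FREQUENCIES synopsis, which is not reproduced here.
+
+@example
+FREQUENCIES /VARIABLES=age
+  /HISTOGRAM=MINIMUM(0) MAXIMUM(100) PERCENT NORMAL.
+@end example
+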
The PIECHART adds a pie chart for each variable to the data. Each
slice represents one value, with the size of the slice proportional to
/PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
/CINTERVAL n
/COMPARE=@{GROUPS,VARIABLES@}
- /ID=@{case_number, var_name@}
+ /ID=var_name
/@{TOTAL,NOTOTAL@}
/PERCENTILE=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
/MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
each containing one boxplot per dependent variable.
If the /COMPARE subcommand is omitted, then PSPP uses the default value of
/COMPARE=GROUPS.
+
+The ID subcommand also pertains to boxplots. If given, it must
+specify a variable name. Outliers and extreme cases plotted in
+boxplots will be labelled with that variable's value for the case. Numeric or
+string variables are permissible. If the ID subcommand is not given,
+then the case number will be used for labelling.
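+
+For example, assuming a dataset with the hypothetical variables @code{height},
+@code{sex} and @code{name}, the following sketch would label outliers in
+each boxplot with the corresponding value of @code{name}. The leading
+@code{VARIABLES=... BY ...} form is assumed from the EXAMINE synopsis.
+
+@example
+EXAMINE VARIABLES=height BY sex
+  /PLOT=BOXPLOT
+  /ID=name.
+@end example
+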
The CINTERVAL subcommand specifies the confidence interval to use in
calculation of the descriptives command. The default is 95%.
there are many distinct values, then @cmd{EXAMINE} will produce a very
large quantity of output.
+@node CORRELATIONS
+@section CORRELATIONS
+
+@vindex CORRELATIONS
+@display
+CORRELATIONS
+ /VARIABLES = varlist [ WITH varlist ]
+ [
+ .
+ .
+ .
+ /VARIABLES = varlist [ WITH varlist ]
+ /VARIABLES = varlist [ WITH varlist ]
+ ]
+
+ [ /PRINT=@{TWOTAIL, ONETAIL@} @{SIG, NOSIG@} ]
+ [ /STATISTICS=DESCRIPTIVES XPROD ALL]
+ [ /MISSING=@{PAIRWISE, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
+@end display
+
+@cindex correlation
+The @cmd{CORRELATIONS} procedure produces tables of the Pearson correlation coefficient
+for a set of variables. The significance of each coefficient is also given.
+
+At least one VARIABLES subcommand is required. If the WITH keyword is used, then a non-square
+correlation table will be produced.
+The variables preceding WITH will be used as the rows of the table, and the variables following
+it will be the columns of the table.
+If the WITH keyword is omitted, then a square, symmetrical table using all variables is produced.
+
+
+The @cmd{MISSING} subcommand determines the handling of missing values.
+If INCLUDE is set, then user-missing values are included in the
+calculations, but system-missing values are not.
+If EXCLUDE is set, which is the default, user-missing
+values are excluded as well as system-missing values.
+
+If LISTWISE is set, then the entire case is excluded from analysis
+whenever any variable specified in any @cmd{/VARIABLES} subcommand
+contains a missing value.
+If PAIRWISE is set, then a case is considered missing only if either of the
+values for the particular coefficient is missing.
+The default is PAIRWISE.
+
+The PRINT subcommand is used to control how the reported significance values are printed.
+If the TWOTAIL option is used, then a two-tailed test of significance is
+printed. If the ONETAIL option is given, then a one-tailed test is used.
+The default is TWOTAIL.
+
+If the NOSIG option is specified, then correlation coefficients with significance less than
+0.05 are highlighted.
+If SIG is specified, then no highlighting is performed. This is the default.
+
+@cindex covariance
+The STATISTICS subcommand requests additional statistics to be displayed. The keyword
+DESCRIPTIVES requests that the mean, number of non-missing cases, and the unbiased
+estimator of the standard deviation be displayed.
+These statistics will be displayed in a separate table, for all the variables listed
+in any /VARIABLES subcommand.
+The XPROD keyword requests cross-product deviations and covariance estimators to
+be displayed for each pair of variables.
+The keyword ALL is the union of DESCRIPTIVES and XPROD.
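+
+As an illustrative sketch, the following command uses the hypothetical
+variables @code{height}, @code{weight}, @code{age} and @code{income}. It
+requests a non-square table with @code{height} and @code{weight} as rows and
+@code{age} and @code{income} as columns, highlights significant coefficients,
+adds the descriptive statistics table, and excludes cases listwise.
+
+@example
+CORRELATIONS
+  /VARIABLES = height weight WITH age income
+  /PRINT = TWOTAIL NOSIG
+  /STATISTICS = DESCRIPTIVES
+  /MISSING = LISTWISE EXCLUDE.
+@end example
+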
@node CROSSTABS
@section CROSSTABS
subcommand.
In general mode, numeric and string variables may be specified on
-TABLES. Although long string variables are allowed, only their
-initial short-string parts are used. In integer mode, only numeric
-variables are allowed.
+TABLES. In integer mode, only numeric variables are allowed.
The MISSING subcommand determines the handling of user-missing values.
When set to TABLE, the default, missing values are dropped on a table by
Fixes for any of these deficiencies would be welcomed.
+@node FACTOR
+@section FACTOR
+
+@vindex FACTOR
+@cindex factor analysis
+@cindex principal components analysis
+@cindex principal axis factoring
+@cindex data reduction
+
+@display
+FACTOR VARIABLES=var_list
+
+ [ /METHOD = @{CORRELATION, COVARIANCE@} ]
+
+ [ /EXTRACTION=@{PC, PAF@}]
+
+ [ /PRINT=[INITIAL] [EXTRACTION] [UNIVARIATE] [CORRELATION] [COVARIANCE] [DET] [SIG] [ALL] [DEFAULT] ]
+
+ [ /PLOT=[EIGEN] ]
+
+ [ /FORMAT=[SORT] [BLANK(@var{n})] [DEFAULT] ]
+
+ [ /CRITERIA=[FACTORS(@var{n})] [MINEIGEN(@var{l})] [ITERATE(@var{m})] [ECONVERGE (@var{delta})] [DEFAULT] ]
+
+ [ /MISSING=[@{LISTWISE, PAIRWISE@}] [@{INCLUDE, EXCLUDE@}] ]
+@end display
+
+The FACTOR command performs Principal Components Analysis or Principal Axis Factoring on a dataset. It may be used to find
+common factors in the data or for data reduction purposes.
+
+The VARIABLES subcommand is required. It lists the variables which are to partake in the analysis.
+
+The /EXTRACTION subcommand is used to specify the way in which factors (components) are extracted from the data.
+If PC is specified, then Principal Components Analysis is used. If PAF is specified, then Principal Axis Factoring is
+used. By default Principal Components Analysis will be used.
+
+The /METHOD subcommand should be used to determine whether the covariance matrix or the correlation matrix of the data is
+to be analysed. By default, the correlation matrix is analysed.
+
+The /PRINT subcommand may be used to select which features of the analysis are reported:
+
+@itemize
+@item UNIVARIATE
+ A table of mean values, standard deviations and total weights is printed.
+@item INITIAL
+ Initial communalities and eigenvalues are printed.
+@item EXTRACTION
+ Extracted communalities and eigenvalues are printed.
+@item CORRELATION
+ The correlation matrix is printed.
+@item COVARIANCE
+ The covariance matrix is printed.
+@item DET
+ The determinant of the correlation or covariance matrix is printed.
+@item SIG
+ The significance of the elements of the correlation matrix is printed.
+@item ALL
+ All of the above are printed.
+@item DEFAULT
+ Identical to INITIAL and EXTRACTION.
+@end itemize
+
+If /PLOT=EIGEN is given, then a ``Scree'' plot of the eigenvalues will be printed. This can be useful for visualising
+which factors (components) should be retained.
+
+The /FORMAT subcommand determines how data are to be displayed in loading matrices. If SORT is specified, then the variables
+are sorted in descending order of significance. If BLANK(@var{n}) is specified, then coefficients whose absolute value is less
+than @var{n} will not be printed. If the keyword DEFAULT is given, or if no /FORMAT subcommand is given, then no sorting is
+performed, and all coefficients will be printed.
+
+The /CRITERIA subcommand is used to specify how the number of extracted factors (components) is chosen. If FACTORS(@var{n}) is
+specified, where @var{n} is an integer, then @var{n} factors will be extracted. Otherwise, the MINEIGEN setting will
+be used. MINEIGEN(@var{l}) requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted.
+The default value of @var{l} is 1. The ECONVERGE and ITERATE settings have effect only when iterative algorithms for factor
+extraction (such as Principal Axis Factoring) are used. ECONVERGE(@var{delta}) specifies that iteration should cease when
+the maximum absolute change in the communality estimates between one iteration and the previous is less than @var{delta}. The
+default value of @var{delta} is 0.001.
+The ITERATE(@var{m}) setting sets the maximum number of iterations to @var{m}. The default value of @var{m} is 25.
+
+The @cmd{MISSING} subcommand determines the handling of missing values.
+If INCLUDE is set, then user-missing values are included in the
+calculations, but system-missing values are not.
+If EXCLUDE is set, which is the default, user-missing
+values are excluded as well as system-missing values.
+If LISTWISE is set, then the entire case is excluded from analysis
+whenever any variable specified in the @cmd{VARIABLES} subcommand
+contains a missing value.
+If PAIRWISE is set, then a case is considered missing only if either of the
+values for the particular coefficient is missing.
+The default is LISTWISE.
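+
+A minimal sketch, assuming six hypothetical questionnaire items @code{q1}
+to @code{q6}: extract exactly two factors by Principal Axis Factoring,
+allow up to 50 iterations with a tighter convergence limit, plot the
+eigenvalues, and hide loadings whose absolute value is below 0.3. The
+numeric settings are illustrative, not recommendations.
+
+@example
+FACTOR VARIABLES=q1 q2 q3 q4 q5 q6
+  /EXTRACTION=PAF
+  /CRITERIA=FACTORS(2) ITERATE(50) ECONVERGE(0.0001)
+  /PRINT=INITIAL EXTRACTION
+  /PLOT=EIGEN
+  /FORMAT=SORT BLANK(0.3).
+@end example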
+
+
@node NPAR TESTS
@section NPAR TESTS
[ /STATISTICS=@{DESCRIPTIVES@} ]
[ /MISSING=@{ANALYSIS, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
+
+ [ /METHOD=EXACT [ TIMER [(n)] ] ]
@end display
NPAR TESTS performs nonparametric tests.
If the /STATISTICS subcommand is also specified, then summary statistics are
produced for each variable that is the subject of any test.
+Certain tests may take a long time to execute if an exact figure is required.
+Therefore, by default asymptotic approximations are used unless the
+subcommand /METHOD=EXACT is specified.
+Exact tests give more accurate results, but may take an unacceptably long
+time to perform. If the TIMER keyword is used, it sets a maximum time,
+after which the test will be abandoned, and a warning message printed.
+The time, in minutes, should be specified in parentheses after the TIMER keyword.
+If the TIMER keyword is given without this figure, then a default value of 5 minutes
+is used.
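+
+As an illustration only, the following sketch requests exact computation
+with a limit of two minutes. The variable names @code{before} and
+@code{after} are hypothetical, and which tests actually honour
+/METHOD=EXACT is not stated here.
+
+@example
+NPAR TESTS
+  /WILCOXON before WITH after (PAIRED)
+  /METHOD=EXACT TIMER(2).
+@end example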
+
@menu
* BINOMIAL:: Binomial Test
* CHISQUARE:: Chisquare Test
+* WILCOXON:: Wilcoxon Signed Ranks Test
+* SIGN:: The Sign Test
@end menu
[ /BINOMIAL[(p)]=var_list[(value1[, value2)] ] ]
@end display
-The binomial test compares the observed distribution of a dichotomous
+The /BINOMIAL subcommand compares the observed distribution of a dichotomous
variable with that of a binomial distribution.
The variable @var{p} specifies the test proportion of the binomial
distribution.
If more than two distinct, non-missing values for a variable
under test are encountered then an error occurs.
-If the test proportion is equal to 0.5, then a one tailed test is
+If the test proportion is equal to 0.5, then a two tailed test is
reported. For any other test proportion, a one tailed test is
reported.
For one tailed tests, if the test proportion is less than
@node CHISQUARE
-@subsection Chisquare test
+@subsection Chisquare Test
@vindex CHISQUARE
@cindex chisquare test
@end display
-The chisquare test produces a chi-square statistic for the differences
+The /CHISQUARE subcommand produces a chi-square statistic for the differences
between the expected and observed frequencies of the categories of a variable.
Optionally, a range of values may appear after the variable list.
If a range is given, then non integer values are truncated, and values
If no /EXPECTED subcommand is given, then equal frequencies
are expected.
+@node WILCOXON
+@subsection Wilcoxon Matched Pairs Signed Ranks Test
+@comment node-name, next, previous, up
+@vindex WILCOXON
+@cindex wilcoxon matched pairs signed ranks test
+
+@display
+ [ /WILCOXON varlist [ WITH varlist [ (PAIRED) ]]]
+@end display
+
+The /WILCOXON subcommand tests for differences between medians of the
+variables listed.
+The test does not make any assumptions about the variances of the samples.
+It does however assume that the distribution is symmetrical.
+
+If the @code{WITH} keyword is omitted, then tests for all
+combinations of the listed variables are performed.
+If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
+is also given, then the number of variables preceding @code{WITH}
+must be the same as the number following it.
+In this case, tests for each respective pair of variables are
+performed.
+If the @code{WITH} keyword is given, but the
+@code{(PAIRED)} keyword is omitted, then each variable
+preceding @code{WITH} is tested against each variable following
+@code{WITH}.
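+
+The three forms can be sketched as follows, using the hypothetical variables
+@code{a}, @code{b}, @code{x} and @code{y}. On this reading of the rules,
+the first command tests @code{a} against @code{b}; the second tests
+@code{a} against @code{x} and @code{b} against @code{y}; the third tests
+every variable before @code{WITH} against every variable after it, giving
+four tests.
+
+@example
+NPAR TESTS /WILCOXON a b.
+NPAR TESTS /WILCOXON a b WITH x y (PAIRED).
+NPAR TESTS /WILCOXON a b WITH x y.
+@end example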
+
+
+@node SIGN
+@subsection Sign Test
+@vindex SIGN
+@cindex sign test
+
+@display
+ [ /SIGN varlist [ WITH varlist [ (PAIRED) ]]]
+@end display
+
+The /SIGN subcommand tests for differences between medians of the
+variables listed.
+The test does not make any assumptions about the
+distribution of the data.
+
+If the @code{WITH} keyword is omitted, then tests for all
+combinations of the listed variables are performed.
+If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
+is also given, then the number of variables preceding @code{WITH}
+must be the same as the number following it.
+In this case, tests for each respective pair of variables are
+performed.
+If the @code{WITH} keyword is given, but the
+@code{(PAIRED)} keyword is omitted, then each variable
+preceding @code{WITH} is tested against each variable following
+@code{WITH}.
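+
+For instance, with hypothetical before-and-after measurements, a sketch
+that compares @code{pre1} with @code{post1} and @code{pre2} with
+@code{post2}, and also requests summary statistics, might read:
+
+@example
+NPAR TESTS
+  /SIGN pre1 pre2 WITH post1 post2 (PAIRED)
+  /STATISTICS=DESCRIPTIVES.
+@end example
+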
@node T-TEST
@comment node-name, next, previous, up
display a warning, but will proceed with the analysis.
The @code{CONTRAST} subcommand may be given up to 10 times in order
to specify different contrast tests.
-@setfilename ignored
@node RANK
@comment node-name, next, previous, up
INCLUDE means they are to be included. The default is EXCLUDE.
@include regression.texi
+
+
+@node RELIABILITY
+@section RELIABILITY
+
+@vindex RELIABILITY
+@display
+RELIABILITY
+ /VARIABLES=var_list
+ /SCALE (@var{name}) = @{var_list, ALL@}
+ /MODEL=@{ALPHA, SPLIT[(N)]@}
+ /SUMMARY=@{TOTAL,ALL@}
+ /MISSING=@{EXCLUDE,INCLUDE@}
+@end display
+
+@cindex Cronbach's Alpha
+The @cmd{RELIABILITY} command performs reliability analysis on the data.
+
+The VARIABLES subcommand is required. It determines the set of variables
+upon which analysis is to be performed.
+
+The SCALE subcommand determines the variables for which reliability is to be
+calculated. If it is omitted, then analysis will be performed for all variables named
+in the VARIABLES subcommand.
+Optionally, the @var{name} parameter may be specified to set a string name
+for the scale.
+
+The MODEL subcommand determines the type of analysis. If ALPHA is specified,
+then Cronbach's Alpha is calculated for the scale. If the model is SPLIT,
+then the variables are divided into 2 subsets. An optional parameter
+@var{N} may be given, to specify how many variables are to be in the first subset.
+If @var{N} is omitted, then it defaults to one half of the variables in the
+scale, or one half minus one if there is an odd number of variables.
+The default model is ALPHA.
+
+By default, any cases with user missing or system missing values for
+any variables given in the VARIABLES subcommand will be omitted from analysis.
+The MISSING subcommand determines whether user missing values are to
+be included or excluded in the analysis.
+
+The SUMMARY subcommand determines the type of summary analysis to be performed.
+Currently there is only one type: SUMMARY=TOTAL, which displays per-item
+analysis tested against the totals.
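+
+A minimal sketch, assuming five hypothetical items @code{item1} to
+@code{item5}: a split-half analysis that places the first two variables
+in the first subset and the remaining three in the second, with the
+per-item summary. The SCALE subcommand is omitted, so all the listed
+variables form the scale.
+
+@example
+RELIABILITY
+  /VARIABLES=item1 item2 item3 item4 item5
+  /MODEL=SPLIT(2)
+  /SUMMARY=TOTAL.
+@end example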
+
+
+
+@node ROC
+@section ROC
+
+@vindex ROC
+@cindex Receiver Operating Characteristic
+@cindex Area under curve
+
+@display
+ROC @var{var_list} BY @var{state_var} (@var{state_value})
+ /PLOT = @{ CURVE [(REFERENCE)], NONE @}
+ /PRINT = [ SE ] [ COORDINATES ]
+ /CRITERIA = [ CUTOFF(@{INCLUDE,EXCLUDE@}) ]
+ [ TESTPOS (@{LARGE,SMALL@}) ]
+ [ CI (@var{confidence}) ]
+ [ DISTRIBUTION (@{FREE, NEGEXPO @}) ]
+ /MISSING=@{EXCLUDE,INCLUDE@}
+@end display
+
+
+The @cmd{ROC} command is used to plot the receiver operating characteristic curve
+of a dataset, and to estimate the area under the curve.
+This is useful for analysing the efficacy of a variable as a predictor of a state of nature.
+
+The mandatory @var{var_list} is the list of predictor variables.
+The variable @var{state_var} is the variable whose values represent the actual states,
+and @var{state_value} is the value of this variable which represents the positive state.
+
+The optional subcommand PLOT is used to determine if and how the ROC curve is drawn.
+The keyword CURVE means that the ROC curve should be drawn, and the optional keyword REFERENCE,
+which should be enclosed in parentheses, says that the diagonal reference line should be drawn.
+If the keyword NONE is given, then no ROC curve is drawn.
+By default, the curve is drawn with no reference line.
+
+The optional subcommand PRINT determines which additional tables should be printed.
+Two additional tables are available.
+The SE keyword says that the standard error of the area under the curve should be printed as well as
+the area itself.
+In addition, a p-value under the null hypothesis that the area under the curve equals 0.5 will be
+printed.
+The COORDINATES keyword says that a table of coordinates of the ROC curve should be printed.
+
+The CRITERIA subcommand has four optional parameters:
+@itemize @bullet
+@item The TESTPOS parameter may be LARGE or SMALL.
+LARGE is the default, and says that larger values in the predictor variables are to be
+considered positive. SMALL indicates that smaller values should be considered positive.
+
+@item The CI parameter specifies the confidence interval that should be printed.
+It has no effect if the SE keyword in the PRINT subcommand has not been given.
+
+@item The DISTRIBUTION parameter determines the method to be used when estimating the area
+under the curve.
+There are two possibilities, @i{viz}: FREE and NEGEXPO.
+The FREE method uses a non-parametric estimate, and the NEGEXPO method a bi-negative
+exponential distribution estimate.
+The NEGEXPO method should only be used when the number of positive actual states is
+equal to the number of negative actual states.
+The default is FREE.
+
+@item The CUTOFF parameter is for compatibility and is ignored.
+@end itemize
+
+The MISSING subcommand determines whether user missing values are to
+be included or excluded in the analysis. The default behaviour is to
+exclude them.
+Cases are excluded on a listwise basis; if any of the variables in @var{var_list}
+or the variable @var{state_var} has a missing value, then the entire case will be
+excluded.
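+
+As a sketch, suppose a hypothetical predictor @code{test_score} and a state
+variable @code{disease} in which the value 1 marks the positive state.
+The following draws the curve with its reference line and prints the
+standard error, confidence interval and coordinate tables. Writing the
+confidence level as a percentage, as in @code{CI(95)}, is an assumption;
+the units are not stated above.
+
+@example
+ROC test_score BY disease (1)
+  /PLOT=CURVE(REFERENCE)
+  /PRINT=SE COORDINATES
+  /CRITERIA=TESTPOS(LARGE) CI(95) DISTRIBUTION(FREE)
+  /MISSING=EXCLUDE.
+@end example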
+
+