X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=cdc5fdf6706dee358a119e252273d0ce22caf4f3;hb=9ff56712d2c03adaf0fb23a501c3ba0e00471b20;hp=76925d7e954d865d76217b97e79c86fa3a0ca272;hpb=bd17d2af982332ee1791998361b1ac6731fe14fa;p=pspp-builds.git

diff --git a/doc/statistics.texi b/doc/statistics.texi
index 76925d7e..cdc5fdf6 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -8,7 +8,9 @@ far.
 * DESCRIPTIVES:: Descriptive statistics.
 * FREQUENCIES:: Frequency tables.
 * EXAMINE:: Testing data for normality.
+* CORRELATIONS:: Correlation tables.
 * CROSSTABS:: Crosstabulation tables.
+* FACTOR:: Factor analysis and Principal Components analysis
 * NPAR TESTS:: Nonparametric tests.
 * T-TEST:: Test hypotheses about means.
 * ONEWAY:: One way analysis of variance.
@@ -301,6 +303,69 @@ If many dependent variable are given, or factors are given for which
 there are many distinct values, then @cmd{EXAMINE} will produce a very
 large quantity of output.
+@node CORRELATIONS
+@section CORRELATIONS
+
+@vindex CORRELATIONS
+@display
+CORRELATIONS
+ /VARIABLES = varlist [ WITH varlist ]
+ [
+ .
+ .
+ .
+ /VARIABLES = varlist [ WITH varlist ]
+ /VARIABLES = varlist [ WITH varlist ]
+ ]
+
+ [ /PRINT=@{TWOTAIL, ONETAIL@} @{SIG, NOSIG@} ]
+ [ /STATISTICS=DESCRIPTIVES XPROD ALL]
+ [ /MISSING=@{PAIRWISE, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
+@end display
+
+@cindex correlation
+The @cmd{CORRELATIONS} procedure produces tables of the Pearson correlation coefficient
+for a set of variables. The significance of the coefficients is also given.
+
+At least one VARIABLES subcommand is required. If the WITH keyword is used, then a non-square
+correlation table will be produced.
+The variables preceding WITH will be used as the rows of the table, and the variables following
+will be the columns of the table.
+If no WITH keyword is given, then a square, symmetrical table using all variables is produced.
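+
+As a minimal sketch of the WITH keyword, assuming hypothetical variables named
+@code{height}, @code{weight} and @code{age}, the following syntax would request a
+non-square table with @code{height} and @code{weight} as rows and @code{age} as the only column:
+
+@example
+@c The variable names here are illustrative assumptions, not part of any real dataset.
+CORRELATIONS /VARIABLES = height weight WITH age.
+@end example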
+
+
+The @cmd{MISSING} subcommand determines the handling of missing values.
+If INCLUDE is set, then user-missing values are included in the
+calculations, but system-missing values are not.
+If EXCLUDE is set, user-missing
+values are excluded as well as system-missing values.
+This is the default.
+
+If LISTWISE is set, then the entire case is excluded from analysis
+whenever any variable specified in any @cmd{/VARIABLES} subcommand
+contains a missing value.
+If PAIRWISE is set, then a case is considered missing only if either of the
+values for the particular coefficient is missing.
+The default is PAIRWISE.
+
+The PRINT subcommand is used to control how the reported significance values are printed.
+If the TWOTAIL option is used, then a two-tailed test of significance is
+printed. If the ONETAIL option is given, then a one-tailed test is used.
+The default is TWOTAIL.
+
+If the NOSIG option is specified, then correlation coefficients with significance less than
+0.05 are highlighted.
+If SIG is specified, then no highlighting is performed. This is the default.
+
+@cindex covariance
+The STATISTICS subcommand requests additional statistics to be displayed. The keyword
+DESCRIPTIVES requests that the mean, number of non-missing cases, and the unbiased
+estimator of the standard deviation are displayed.
+These statistics will be displayed in a separate table, for all the variables listed
+in any /VARIABLES subcommand.
+The XPROD keyword requests cross-product deviations and covariance estimators to
+be displayed for each pair of variables.
+The keyword ALL is the union of DESCRIPTIVES and XPROD.
 @node CROSSTABS
 @section CROSSTABS
@@ -489,6 +554,99 @@ Approximate T of uncertainty coefficient is wrong.
 Fixes for any of these deficiencies would be welcomed.
+@node FACTOR
+@section FACTOR
+
+@vindex FACTOR
+@cindex factor analysis
+@cindex principal components analysis
+@cindex principal axis factoring
+@cindex data reduction
+
+@display
+FACTOR VARIABLES=var_list
+
+ [ /METHOD = @{CORRELATION, COVARIANCE@} ]
+
+ [ /EXTRACTION=@{PC, PAF@}]
+
+ [ /PRINT=[INITIAL] [EXTRACTION] [UNIVARIATE] [CORRELATION] [COVARIANCE] [DET] [SIG] [ALL] [DEFAULT] ]
+
+ [ /PLOT=[EIGEN] ]
+
+ [ /FORMAT=[SORT] [BLANK(@var{n})] [DEFAULT] ]
+
+ [ /CRITERIA=[FACTORS(@var{n})] [MINEIGEN(@var{l})] [ITERATE(@var{m})] [ECONVERGE(@var{delta})] [DEFAULT] ]
+
+ [ /MISSING=[@{LISTWISE, PAIRWISE@}] [@{INCLUDE, EXCLUDE@}] ]
+@end display
+
+The FACTOR command performs Factor Analysis or Principal Axis Factoring on a dataset. It may be used to find
+common factors in the data or for data reduction purposes.
+
+The VARIABLES subcommand is required. It lists the variables to be included in the analysis.
+
+The /EXTRACTION subcommand is used to specify the way in which factors (components) are extracted from the data.
+If PC is specified, then Principal Components Analysis is used. If PAF is specified, then Principal Axis Factoring is
+used. By default, Principal Components Analysis will be used.
+
+The /METHOD subcommand should be used to determine whether the covariance matrix or the correlation matrix of the data is
+to be analysed. By default, the correlation matrix is analysed.
+
+The /PRINT subcommand may be used to select which features of the analysis are reported:
+
+@itemize
+@item UNIVARIATE
+ A table of mean values, standard deviations and total weights is printed.
+@item INITIAL
+ Initial communalities and eigenvalues are printed.
+@item EXTRACTION
+ Extracted communalities and eigenvalues are printed.
+@item CORRELATION
+ The correlation matrix is printed.
+@item COVARIANCE
+ The covariance matrix is printed.
+@item DET
+ The determinant of the correlation or covariance matrix is printed.
+@item SIG
+ The significance of the elements of the correlation matrix is printed.
+@item ALL
+ All of the above are printed.
+@item DEFAULT
+ Identical to INITIAL and EXTRACTION.
+@end itemize
+
+If /PLOT=EIGEN is given, then a ``Scree'' plot of the eigenvalues will be printed. This can be useful for visualising
+which factors (components) should be retained.
+
+The /FORMAT subcommand determines how data are to be displayed in loading matrices. If SORT is specified, then the variables
+are sorted in descending order of significance. If BLANK(@var{n}) is specified, then coefficients whose absolute value is less
+than @var{n} will not be printed. If the keyword DEFAULT is given, or if no /FORMAT subcommand is given, then no sorting is
+performed, and all coefficients will be printed.
+
+The /CRITERIA subcommand is used to specify how the number of extracted factors (components) is chosen. If FACTORS(@var{n}) is
+specified, where @var{n} is an integer, then @var{n} factors will be extracted. Otherwise, the MINEIGEN setting will
+be used. MINEIGEN(@var{l}) requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted.
+The default value of @var{l} is 1. The ECONVERGE and ITERATE settings have an effect only when iterative algorithms for factor
+extraction (such as Principal Axis Factoring) are used. ECONVERGE(@var{delta}) specifies that iteration should cease when
+the maximum absolute change in the communality estimates between one iteration and the previous is less than @var{delta}. The
+default value of @var{delta} is 0.001.
+The ITERATE(@var{m}) setting sets the maximum number of iterations to @var{m}. The default value of @var{m} is 25.
+
+The @cmd{MISSING} subcommand determines the handling of missing values.
+If INCLUDE is set, then user-missing values are included in the
+calculations, but system-missing values are not.
+If EXCLUDE is set, user-missing
+values are excluded as well as system-missing values.
+This is the default.
+If LISTWISE is set, then the entire case is excluded from analysis
+whenever any variable specified in the @cmd{VARIABLES} subcommand
+contains a missing value.
+If PAIRWISE is set, then a case is considered missing only if either of the
+values for the particular coefficient is missing.
+The default is LISTWISE.
+
+
 @node NPAR TESTS
 @section NPAR TESTS