X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=36727c44c5f75dfa7620e900029875d4c2dd874b;hb=93a7decc9a02886e5852ecf37ef820665ec798d3;hp=c475f306b3644431b9e248e629b6e04b8b550b3b;hpb=38b40b6e700680ede4e2f6203e1200ad4197b4b4;p=pspp-builds.git

diff --git a/doc/statistics.texi b/doc/statistics.texi
index c475f306..36727c44 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -14,6 +14,7 @@ far.
 * NPAR TESTS:: Nonparametric tests.
 * T-TEST:: Test hypotheses about means.
 * ONEWAY:: One way analysis of variance.
+* QUICK CLUSTER:: K-Means clustering.
 * RANK:: Compute rank scores.
 * REGRESSION:: Linear regression.
 * RELIABILITY:: Reliability analysis.
@@ -38,7 +39,7 @@ DESCRIPTIVES
 @{A,D@}
 @end display
 
-The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
+The @cmd{DESCRIPTIVES} procedure reads the active dataset and outputs
 descriptive statistics requested by the user. In addition, it can optionally
 compute Z-scores.
 
@@ -681,8 +682,15 @@ is used.
 @menu
 * BINOMIAL:: Binomial Test
 * CHISQUARE:: Chisquare Test
-* WILCOXON:: Wilcoxon Signed Ranks Test
+* COCHRAN:: Cochran Q Test
+* FRIEDMAN:: Friedman Test
+* KENDALL:: Kendall's W Test
+* KRUSKAL-WALLIS:: Kruskal-Wallis Test
+* MANN-WHITNEY:: Mann-Whitney U Test
+* MCNEMAR:: McNemar Test
+* RUNS:: Runs Test
 * SIGN:: The Sign Test
+* WILCOXON:: Wilcoxon Signed Ranks Test
 @end menu
 
 
@@ -762,20 +770,108 @@ sum of the frequencies need not be 1.
 If no /EXPECTED subcommand is given, then equal frequencies
 are expected.
 
-@node WILCOXON
-@subsection Wilcoxon Matched Pairs Signed Ranks Test
-@comment node-name, next, previous, up
-@vindex WILCOXON
-@cindex wilcoxon matched pairs signed ranks test
+
+@node COCHRAN
+@subsection Cochran Q Test
+@vindex COCHRAN
+@cindex Cochran Q test
+@cindex Q, Cochran Q
 
 @display
- [ /WILCOXON varlist [ WITH varlist [ (PAIRED) ]]]
+ [ /COCHRAN = varlist ]
 @end display
 
-The /WILCOXON subcommand tests for differences between medians of the
-variables listed.
-The test does not make any assumptions about the variances of the samples.
-It does however assume that the distribution is symetrical.
+The Cochran Q test is used to test for differences between three or more groups.
+The data for @var{varlist} in all cases must assume exactly two distinct values (other than missing values).
+
+The value of Q is displayed, along with its asymptotic significance based on a chi-square distribution.
+
+@node FRIEDMAN
+@subsection Friedman Test
+@vindex FRIEDMAN
+@cindex Friedman test
+
+@display
+ [ /FRIEDMAN = varlist ]
+@end display
+
+The Friedman test is used to test for differences between repeated measures when there is no indication that the data are normally distributed.
+
+A list of variables which contain the measured data must be given. The procedure prints the sum of ranks for each variable, the test statistic and its significance.
+
+@node KENDALL
+@subsection Kendall's W Test
+@vindex KENDALL
+@cindex Kendall's W test
+@cindex coefficient of concordance
+
+@display
+ [ /KENDALL = varlist ]
+@end display
+
+The Kendall test investigates whether an arbitrary number of related samples come from the
+same population.
+It is identical to the Friedman test except that the additional statistic W, Kendall's Coefficient of Concordance, is printed.
+It has the range [0,1]: a value of zero indicates no agreement between the samples, whereas a value of
+unity indicates complete agreement.
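+
+@c Editorial illustration, not part of the original commit.
+@c The variable names below are hypothetical.
+As a minimal sketch, assuming three hypothetical rating variables
+@code{judge1}, @code{judge2} and @code{judge3} in the active dataset,
+the subcommand might be invoked as follows:
+
+@example
+NPAR TESTS
+        /KENDALL = judge1 judge2 judge3.
+@end example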
+
+
+@node KRUSKAL-WALLIS
+@subsection Kruskal-Wallis Test
+@vindex KRUSKAL-WALLIS
+@vindex K-W
+@cindex Kruskal-Wallis test
+
+@display
+ [ /KRUSKAL-WALLIS = varlist BY var (lower, upper) ]
+@end display
+
+The Kruskal-Wallis test is used to compare data from an
+arbitrary number of populations. It does not assume normality.
+The data to be compared are specified by @var{varlist}.
+The categorical variable determining the groups to which the
+data belong is given by @var{var}. The limits @var{lower} and
+@var{upper} specify the valid range of @var{var}. Any cases for
+which @var{var} falls outside [@var{lower}, @var{upper}] will be
+ignored.
+
+The mean rank of each group as well as the chi-squared value and significance
+of the test will be printed.
+The abbreviated subcommand K-W may be used in place of KRUSKAL-WALLIS.
+
+
+@node MANN-WHITNEY
+@subsection Mann-Whitney U Test
+@vindex MANN-WHITNEY
+@vindex M-W
+@cindex Mann-Whitney U test
+@cindex U, Mann-Whitney U
+
+@display
+ [ /MANN-WHITNEY = varlist BY var (group1, group2) ]
+@end display
+
+The Mann-Whitney subcommand is used to test whether two groups of data come from different populations.
+The variables to be tested should be specified in @var{varlist} and the grouping variable, which determines the group to which each case belongs, in @var{var}.
+@var{Var} may be either a string or an alpha variable.
+@var{Group1} and @var{group2} specify the
+two values of @var{var} which determine the groups of the test data.
+Cases for which the @var{var} value is neither @var{group1} nor @var{group2} will be ignored.
+
+The value of the Mann-Whitney U statistic, the Wilcoxon W, and the significance will be printed.
+The abbreviated subcommand M-W may be used in place of MANN-WHITNEY.
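+
+@c Editorial illustration, not part of the original commit.
+@c The variable names and group values below are hypothetical.
+As a minimal sketch, assuming a hypothetical test variable @code{score}
+and a hypothetical string grouping variable @code{gender} whose values
+of interest are @samp{m} and @samp{f}, the test might be requested as
+follows:
+
+@example
+NPAR TESTS
+        /MANN-WHITNEY = score BY gender ('m', 'f').
+@end example
+
+Cases whose @code{gender} value is neither @samp{m} nor @samp{f} would
+be ignored.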
+
+@node MCNEMAR
+@subsection McNemar Test
+@vindex MCNEMAR
+@cindex McNemar test
+
+@display
+ [ /MCNEMAR varlist [ WITH varlist [ (PAIRED) ]]]
+@end display
+
+Use McNemar's test to analyse the significance of the difference between
+pairs of correlated proportions.
 
 If the @code{WITH} keyword is omitted, then tests for all
 combinations of the listed variables are performed.
@@ -789,6 +885,29 @@ If the @code{WITH} keyword is given, but the
 of variable preceding @code{WITH} against variable following
 @code{WITH} are performed.
 
+The data in each variable must be dichotomous. If there are more
+than two distinct values an error will occur and the test will
+not be run.
+
+@node RUNS
+@subsection Runs Test
+@vindex RUNS
+@cindex runs test
+
+@display
+ [ /RUNS (@{MEAN, MEDIAN, MODE, value@}) varlist ]
+@end display
+
+The /RUNS subcommand tests whether a data sequence is randomly ordered.
+
+It works by examining the number of times a variable's value crosses a given threshold.
+The desired threshold must be specified within parentheses.
+It may be specified either as a number or as one of MEAN, MEDIAN or MODE.
+Following the threshold specification comes the list of variables whose values are to be
+tested.
+
+The subcommand shows the number of runs and the asymptotic significance based on the
+length of the data.
 
 @node SIGN
 @subsection Sign Test
@@ -816,6 +935,33 @@ If the @code{WITH} keyword is given, but the
 of variable preceding @code{WITH} against variable following
 @code{WITH} are performed.
 
+@node WILCOXON
+@subsection Wilcoxon Matched Pairs Signed Ranks Test
+@comment node-name, next, previous, up
+@vindex WILCOXON
+@cindex wilcoxon matched pairs signed ranks test
+
+@display
+ [ /WILCOXON varlist [ WITH varlist [ (PAIRED) ]]]
+@end display
+
+The /WILCOXON subcommand tests for differences between medians of the
+variables listed.
+The test does not make any assumptions about the variances of the samples.
+It does however assume that the distribution is symmetrical.
+
+If the @code{WITH} keyword is omitted, then tests for all
+combinations of the listed variables are performed.
+If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
+is also given, then the number of variables preceding @code{WITH}
+must be the same as the number following it.
+In this case, tests for each respective pair of variables are
+performed.
+If the @code{WITH} keyword is given, but the
+@code{(PAIRED)} keyword is omitted, then tests for each combination
+of variable preceding @code{WITH} against variable following
+@code{WITH} are performed.
+
 @node T-TEST
 @comment node-name, next, previous, up
 @section T-TEST
@@ -956,7 +1102,7 @@ ONEWAY
 /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 /CONTRAST= value1 [, value2] ... [,valueN]
 /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
-
+ /POSTHOC=@{BONFERRONI, GH, LSD, SCHEFFE, SIDAK, TUKEY, ALPHA ([value])@}
 @end display
 
 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
@@ -1000,11 +1146,76 @@ A setting of EXCLUDE means that variables whose values are user-missing are
 to be excluded from the analysis. A setting of INCLUDE means they are
 to be included. The default is EXCLUDE.
 
+Using the @code{POSTHOC} subcommand you can perform multiple
+pairwise comparisons on the data. The following comparison methods
+are available:
+@itemize
+@item LSD
+Least Significant Difference.
+@item TUKEY
+Tukey Honestly Significant Difference.
+@item BONFERRONI
+Bonferroni test.
+@item SCHEFFE
+Scheff@'e's test.
+@item SIDAK
+Sidak test.
+@item GH
+The Games-Howell test.
+@end itemize
+
+@noindent
+The optional syntax @code{ALPHA(@var{value})} is used to indicate
+that @var{value} should be used as the
+significance level at which the posthoc tests will be performed.
+The default is 0.05.
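+
+@c Editorial illustration, not part of the original commit.
+@c The variable names below are hypothetical, and the spacing of the
+@c POSTHOC keywords is an assumption based on the display above.
+As a minimal sketch, assuming a hypothetical dependent variable
+@code{score} and a hypothetical factor variable @code{treatment},
+Tukey and Bonferroni comparisons with an alpha of 0.01 might be
+requested as follows:
+
+@example
+ONEWAY /VARIABLES = score BY treatment
+       /POSTHOC = TUKEY BONFERRONI ALPHA (0.01).
+@end example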
+
+@node QUICK CLUSTER
+@comment node-name, next, previous, up
+@section QUICK CLUSTER
+@vindex QUICK CLUSTER
+
+@cindex K-means clustering
+@cindex clustering
+
+@display
+QUICK CLUSTER var_list
+ [/CRITERIA=CLUSTERS(@var{k}) [MXITER(@var{max_iter})]]
+ [/MISSING=@{EXCLUDE,INCLUDE@} @{LISTWISE, PAIRWISE@}]
+@end display
+
+The @cmd{QUICK CLUSTER} command performs k-means clustering on the
+dataset. This is useful when you wish to allocate cases into clusters
+of similar values and you already know the number of clusters.
+
+The minimum specification is @samp{QUICK CLUSTER} followed by the names
+of the variables which contain the cluster data. Normally you will also
+want to specify @samp{/CRITERIA=CLUSTERS(@var{k})} where @var{k} is the
+number of clusters. If this is not given, then @var{k} defaults to 2.
+
+The command uses an iterative algorithm to determine the clusters for
+each case. It will continue iterating until convergence, or until @var{max_iter}
+iterations have been done. The default value of @var{max_iter} is 2.
+
+The @cmd{MISSING} subcommand determines the handling of missing values.
+If INCLUDE is set, then user-missing values are considered at their face
+value and not as missing values.
+If EXCLUDE is set, which is the default, user-missing
+values are excluded as well as system-missing values.
+
+If LISTWISE is set, then the entire case is excluded from the analysis
+whenever any of the clustering variables contains a missing value.
+If PAIRWISE is set, then a case is considered missing only if all the
+clustering variables contain missing values. Otherwise it is clustered
+on the basis of the non-missing values.
+The default is LISTWISE.
+
 @node RANK
 @comment node-name, next, previous, up
 @section RANK
+
 @vindex RANK
 @display
 RANK