From: John Darrington
Date: Sat, 9 Jul 2011 07:54:15 +0000 (+0200)
Subject: QUICK CLUSTER: Add documentation
X-Git-Tag: v0.7.9~218
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=93a7decc9a02886e5852ecf37ef820665ec798d3;p=pspp-builds.git

QUICK CLUSTER: Add documentation
---

diff --git a/doc/statistics.texi b/doc/statistics.texi
index 599e7e42..36727c44 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -14,6 +14,7 @@ far.
 * NPAR TESTS:: Nonparametric tests.
 * T-TEST:: Test hypotheses about means.
 * ONEWAY:: One way analysis of variance.
+* QUICK CLUSTER:: K-Means clustering.
 * RANK:: Compute rank scores.
 * REGRESSION:: Linear regression.
 * RELIABILITY:: Reliability analysis.
@@ -1169,10 +1170,52 @@ that @var{value} should be used as the
 confidence level for which the posthoc tests will be performed.
 The default is 0.05.
 
+@node QUICK CLUSTER
+@comment node-name, next, previous, up
+@section QUICK CLUSTER
+@vindex QUICK CLUSTER
+
+@cindex K-means clustering
+@cindex clustering
+
+@display
+QUICK CLUSTER var_list
+        [/CRITERIA=CLUSTERS(@var{k}) [MXITER(@var{max_iter})]]
+        [/MISSING=@{EXCLUDE,INCLUDE@} @{LISTWISE, PAIRWISE@}]
+@end display
+
+The @cmd{QUICK CLUSTER} command performs k-means clustering on the
+dataset. This is useful when you wish to allocate cases into clusters
+of similar values and you already know the number of clusters.
+
+The minimum specification is @samp{QUICK CLUSTER} followed by the names
+of the variables which contain the cluster data. Normally you will also
+want to specify @samp{/CRITERIA=CLUSTERS(@var{k})} where @var{k} is the
+number of clusters. If this is not given, then @var{k} defaults to 2.
+
+The command uses an iterative algorithm to determine the clusters for
+each case. It will continue iterating until convergence, or until @var{max_iter}
+iterations have been done. The default value of @var{max_iter} is 2.
+
+The @cmd{MISSING} subcommand determines the handling of missing variables.
+If INCLUDE is set, then user-missing values are considered at their face
+value and not as missing values.
+If EXCLUDE is set, which is the default, user-missing
+values are excluded as well as system-missing values.
+
+If LISTWISE is set, then the entire case is excluded from the analysis
+whenever any of the clustering variables contains a missing value.
+If PAIRWISE is set, then a case is considered missing only if all the
+clustering variables contain missing values. Otherwise it is clustered
+on the basis of the non-missing values.
+The default is LISTWISE.
+
+
 @node RANK
 @comment node-name, next, previous, up
 @section RANK
 
+
 @vindex RANK
 @display
 RANK
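
The iteration limit and the LISTWISE/PAIRWISE rules documented in the patch above can be illustrated with a small Python sketch. This is not PSPP's implementation, only an illustration of the documented behaviour under several assumptions made here: the names `usable`, `distance` and `quick_cluster` are hypothetical, a missing value is represented by `None`, the initial centres are simply the first k complete cases, and distance is squared Euclidean over the non-missing values. The INCLUDE/EXCLUDE distinction for user-missing values is not modelled.

```python
def usable(case, pairwise):
    """Decide whether a case takes part in the clustering.

    LISTWISE (pairwise=False): drop the case if any value is missing.
    PAIRWISE (pairwise=True): drop it only if every value is missing.
    """
    present = [v is not None for v in case]
    return any(present) if pairwise else all(present)


def distance(case, centre):
    # Squared Euclidean distance, using only the non-missing values.
    return sum((v - c) ** 2 for v, c in zip(case, centre) if v is not None)


def quick_cluster(cases, k=2, max_iter=2, pairwise=False):
    """Illustrative k-means; defaults mirror the documented k=2, max_iter=2."""
    data = [c for c in cases if usable(c, pairwise)]
    # Assumption: seed the centres with the first k complete cases.
    centres = [list(c) for c in data if None not in c][:k]
    if len(centres) < k:
        raise ValueError("need at least k complete cases to seed the centres")
    labels = None
    for _ in range(max_iter):
        # Assign every case to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: distance(c, centres[j])) for c in data
        ]
        if new_labels == labels:
            break  # converged: no case changed cluster
        labels = new_labels
        # Move each centre to the mean of its members, per variable,
        # ignoring missing values.
        for j in range(k):
            for d in range(len(centres[j])):
                vals = [
                    c[d]
                    for c, lab in zip(data, labels)
                    if lab == j and c[d] is not None
                ]
                if vals:
                    centres[j][d] = sum(vals) / len(vals)
    return labels, centres


cases = [
    (1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1),
    (None, 9.0),   # partly missing: kept only under PAIRWISE
    (None, None),  # wholly missing: always dropped
]
listwise_labels, listwise_centres = quick_cluster(cases, k=2, max_iter=10)
pairwise_labels, _ = quick_cluster(cases, k=2, max_iter=10, pairwise=True)
```

Under these assumptions the listwise run clusters only the four complete cases, while the pairwise run also assigns the partly missing case on the basis of its one observed value.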