X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=6957b3836eae3a2c07f8bad2f64d30d781db1249;hb=2dd762f7a21ca4e1ac232eb5350f19fd1e70be0f;hp=2f74f883fd1e7dda67622f7a9634abcf78d8677f;hpb=f751a381d7b792ee539ff2297a2b1ab9afe2d8c3;p=pspp diff --git a/doc/statistics.texi b/doc/statistics.texi index 2f74f883fd..6957b3836e 100644 --- a/doc/statistics.texi +++ b/doc/statistics.texi @@ -20,6 +20,7 @@ far. * GRAPH:: Plot data. * CORRELATIONS:: Correlation tables. * CROSSTABS:: Crosstabulation tables. +* CTABLES:: Custom tables. * FACTOR:: Factor analysis and Principal Components analysis. * GLM:: Univariate Linear Models. * LOGISTIC REGRESSION:: Bivariate Logistic Regression. @@ -29,7 +30,6 @@ far. * ONEWAY:: One way analysis of variance. * QUICK CLUSTER:: K-Means clustering. * RANK:: Compute rank scores. -* REGRESSION:: Linear regression. * RELIABILITY:: Reliability analysis. * ROC:: Receiver Operating Characteristic. @end menu @@ -513,9 +513,9 @@ The @cmd{GRAPH} command produces graphical plots of data. Only one of the subcom can be produced per call of @cmd{GRAPH}. The @subcmd{MISSING} is optional. @menu -* SCATTERPLOT:: Cartesian Plots -* HISTOGRAM:: Histograms -* BAR CHART:: Bar Charts +* SCATTERPLOT:: Cartesian Plots +* HISTOGRAM:: Histograms +* BAR CHART:: Bar Charts @end menu @node SCATTERPLOT @@ -897,6 +897,170 @@ person's occupation. @caption {The results of a test of independence between @exvar{sex} and @exvar{occupation}} @end float +@node CTABLES +@section CTABLES + +@vindex CTABLES +@cindex custom tables +@cindex tables, custom + +@code{CTABLES} has the following overall syntax. 
At least one +@code{TABLE} subcommand is required: + +@display +@t{CTABLES} + @dots{}@i{global subcommands}@dots{} + [@t{/TABLE} @i{axis} [@t{BY} @i{axis} [@t{BY} @i{axis}]] + @dots{}@i{per-table subcommands}@dots{}]@dots{} +@end display + +@noindent +where each @i{axis} may be empty or take one of the following forms: + +@display +@i{variable} +@i{variable} @t{[}@{@t{C} @math{|} @t{S}@}@t{]} +@i{axis} + @i{axis} +@i{axis} > @i{axis} +(@i{axis}) +@i{axis} @t{(}@i{summary} [@i{string}] [@i{format}]@t{)} +@end display + +The following subcommands precede the first @code{TABLE} subcommand +and apply to all of the output tables. All of these subcommands are +optional: + +@display +@t{/FORMAT} + [@t{MINCOLWIDTH=}@{@t{DEFAULT} @math{|} @i{width}@}] + [@t{MAXCOLWIDTH=}@{@t{DEFAULT} @math{|} @i{width}@}] + [@t{UNITS=}@{@t{POINTS} @math{|} @t{INCHES} @math{|} @t{CM}@}] + [@t{EMPTY=}@{@t{ZERO} @math{|} @t{BLANK} @math{|} @i{string}@}] + [@t{MISSING=}@i{string}] +@t{/VLABELS} + @t{VARIABLES=}@i{variables} + @t{DISPLAY}=@{@t{DEFAULT} @math{|} @t{NAME} @math{|} @t{LABEL} @math{|} @t{BOTH} @math{|} @t{NONE}@} +@t{/MRSETS COUNTDUPLICATES=}@{@t{YES} @math{|} @t{NO}@} +@t{/SMISSING} @{@t{VARIABLE} @math{|} @t{LISTWISE}@} +@t{/PCOMPUTE} @t{&}@i{category}@t{=EXPR(}@i{expression}@t{)} +@t{/PPROPERTIES} @t{&}@i{category}@dots{} + [@t{LABEL=}@i{string}] + [@t{FORMAT=}[@i{summary} @i{format}]@dots{}] + [@t{HIDESOURCECATS=}@{@t{NO} @math{|} @t{YES}@}] +@t{/WEIGHT VARIABLE=}@i{variable} +@t{/HIDESMALLCOUNTS COUNT=@i{count}} +@end display + +The following subcommands follow @code{TABLE} and apply only to the +previous @code{TABLE}. 
All of these subcommands are optional: + +@display +@t{/SLABELS} + [@t{POSITION=}@{@t{COLUMN} @math{|} @t{ROW} @math{|} @t{LAYER}@}] + [@t{VISIBLE=}@{@t{YES} @math{|} @t{NO}@}] +@t{/CLABELS} @{@t{AUTO} @math{|} @{@t{ROWLABELS}@math{|}@t{COLLABELS}@}@t{=}@{@t{OPPOSITE}@math{|}@t{LAYER}@}@} +@t{/CRITERIA CILEVEL=}@i{percentage} +@t{/CATEGORIES} @t{VARIABLES=}@i{variables} + @{@t{[}@i{value}@t{,} @i{value}@dots{}@t{]} + @math{|} [@t{ORDER=}@{@t{A} @math{|} @t{D}@}] + [@t{KEY=}@{@t{VALUE} @math{|} @t{LABEL} @math{|} @i{summary}@t{(}@i{variable}@t{)}@}] + [@t{MISSING=}@{@t{EXCLUDE} @math{|} @t{INCLUDE}@}]@} + [@t{TOTAL=}@{@t{NO} @math{|} @t{YES}@} [@t{LABEL=}@i{string}] [@t{POSITION=}@{@t{AFTER} @math{|} @t{BEFORE}@}]] + [@t{EMPTY=}@{@t{INCLUDE} @math{|} @t{EXCLUDE}@}] +@t{/TITLES} + [@t{TITLE=}@i{string}@dots{}] + [@t{CAPTION=}@i{string}@dots{}] + [@t{CORNER=}@i{string}@dots{}] +@t{/SIGTEST TYPE=CHISQUARE} + [@t{ALPHA=}@i{siglevel}] + [@t{INCLUDEMRSETS=}@{@t{YES} @math{|} @t{NO}@}] + [@t{CATEGORIES=}@{@t{ALLVISIBLE} @math{|} @t{SUBTOTALS}@}] +@t{/COMPARETEST TYPE=}@{@t{PROP} @math{|} @t{MEAN}@} + [@t{ALPHA=}@i{value}[@t{,} @i{value}]] + [@t{ADJUST=}@{@t{BONFERRONI} @math{|} @t{BH} @math{|} @t{NONE}@}] + [@t{INCLUDEMRSETS=}@{@t{YES} @math{|} @t{NO}@}] + [@t{MEANSVARIANCE=}@{@t{ALLCATS} @math{|} @t{TESTEDCATS}@}] + [@t{CATEGORIES=}@{@t{ALLVISIBLE} @math{|} @t{SUBTOTALS}@}] + [@t{MERGE=}@{@t{NO} @math{|} @t{YES}@}] + [@t{STYLE=}@{@t{APA} @math{|} @t{SIMPLE}@}] + [@t{SHOWSIG=}@{@t{NO} @math{|} @t{YES}@}] +@end display + +The @code{CTABLES} (aka ``custom tables'') command produces +multi-dimensional tables from categorical and scale data. It offers +many options for data summarization and formatting. + +This section's examples use data from the 2008 (USA) National Survey +of Drinking and Driving Attitudes and Behaviors, a public domain data +set from the (USA) National Highway Traffic Safety Administration and +available at @url{https://data.transportation.gov}. 
@pspp{} includes +this data set, with a slightly modified dictionary, as +@file{examples/nhtsa.sav}. + +@menu +* CTABLES Basics:: +@end menu + +@node CTABLES Basics +@subsection Basics + +The only required subcommand is @code{TABLE}, which specifies the +variables to include along each axis: +@display +@t{/TABLE} @i{rows} [@t{BY} @i{columns} [@t{BY} @i{layers}]] +@end display +@noindent +In @code{TABLE}, each of @var{rows}, @var{columns}, and @var{layers} +is either empty or an axis expression that specifies one or more +variables. An axis expression that names a categorical variable +divides the data into cells according to the values of that variable. +When all the variables named on @code{TABLE} are categorical, by +default each cell displays the number of cases that it contains, so +specifying a single variable yields a frequency table: + +@example +CTABLES /TABLE=AgeGroup. +@end example +@psppoutput {ctables1} + +@noindent +Specifying a row and a column categorical variable yields a +crosstabulation: + +@example +CTABLES /TABLE=AgeGroup BY qns3a. +@end example +@psppoutput {ctables2} + +@noindent +The @samp{>} operator nests multiple variables on a single axis, e.g.: + +@example +CTABLES /TABLE qn105ba BY AgeGroup > qns3a. +@end example +@psppoutput {ctables3} + +@noindent +The @samp{+} operator allows a single output table to include multiple +data analyses. With @samp{+}, @code{CTABLES} divides the output table +into multiple sections, each of which includes an analysis of the full +data set. For example, the following command separately tabulates age +group and driving frequency by gender: + +@example +CTABLES /TABLE AgeGroup + qn1 BY qns3a. +@end example +@psppoutput {ctables4} + +@noindent +If @samp{+} and @samp{>} are used together, @samp{>} binds more +tightly. Use parentheses to override operator precedence. Thus: + +@example +CTABLES /TABLE qn26 + qn27 > qns3a. +CTABLES /TABLE (qn26 + qn27) > qns3a. 
+@end example +@psppoutput {ctables5} @node FACTOR @section FACTOR @@ -1403,19 +1567,19 @@ is used. @menu -* BINOMIAL:: Binomial Test -* CHISQUARE:: Chi-square Test -* COCHRAN:: Cochran Q Test -* FRIEDMAN:: Friedman Test -* KENDALL:: Kendall's W Test -* KOLMOGOROV-SMIRNOV:: Kolmogorov Smirnov Test -* KRUSKAL-WALLIS:: Kruskal-Wallis Test -* MANN-WHITNEY:: Mann Whitney U Test -* MCNEMAR:: McNemar Test -* MEDIAN:: Median Test -* RUNS:: Runs Test -* SIGN:: The Sign Test -* WILCOXON:: Wilcoxon Signed Ranks Test +* BINOMIAL:: Binomial Test +* CHISQUARE:: Chi-square Test +* COCHRAN:: Cochran Q Test +* FRIEDMAN:: Friedman Test +* KENDALL:: Kendall's W Test +* KOLMOGOROV-SMIRNOV:: Kolmogorov Smirnov Test +* KRUSKAL-WALLIS:: Kruskal-Wallis Test +* MANN-WHITNEY:: Mann Whitney U Test +* MCNEMAR:: McNemar Test +* MEDIAN:: Median Test +* RUNS:: Runs Test +* SIGN:: The Sign Test +* WILCOXON:: Wilcoxon Signed Ranks Test @end menu @@ -1634,9 +1798,10 @@ arbitrary number of populations. It does not assume normality. The data to be compared are specified by @var{var_list}. The categorical variable determining the groups to which the data belongs is given by @var{var}. The limits @var{lower} and -@var{upper} specify the valid range of @var{var}. Any cases for -which @var{var} falls outside [@var{lower}, @var{upper}] are -ignored. +@var{upper} specify the valid range of @var{var}. +If @var{upper} is smaller than @var{lower}, @pspp{} assumes that their values +are reversed. Any cases for which @var{var} falls outside +[@var{lower}, @var{upper}] are ignored. The mean rank of each group as well as the chi-squared value and significance of the test are printed.
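
The range rule added in the last hunk (reversed limits are swapped, and cases outside the range are ignored) can be illustrated with a small sketch. This is not PSPP's implementation. The function name `select_cases` and the sample values are hypothetical, chosen only to show the behavior the text describes.

```python
def select_cases(values, lower, upper):
    """Return the grouping values that fall inside [lower, upper].

    Hypothetical illustration of the documented rule: if upper is
    smaller than lower, the two limits are treated as reversed.
    """
    if upper < lower:
        # Limits given in reverse order: swap them.
        lower, upper = upper, lower
    # Cases whose grouping value lies outside the range are ignored.
    return [v for v in values if lower <= v <= upper]


groups = [1, 2, 3, 4, 5]
print(select_cases(groups, 2, 4))
print(select_cases(groups, 4, 2))  # reversed limits select the same cases
```

Under this reading, writing the limits as (4, 2) selects the same cases as writing them as (2, 4).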