doc/statistics.texi

   1 @node Statistics, Utilities, Conditionals and Looping, Top
   2 @chapter Statistics
   3
   4 This chapter documents the statistical procedures that PSPP supports so
   5 far.
   6
   7 @c If you add any new commands, then don't forget to remove the entry in
   8 @c not-implemented.texi
   9
  10 @menu
  11 * DESCRIPTIVES::                Descriptive statistics.
  12 * FREQUENCIES::                 Frequency tables.
  13 * EXAMINE::                     Testing data for normality.
  14 * CROSSTABS::                   Crosstabulation tables.
  15 * T-TEST::                      Test hypotheses about means.
  16 * ONEWAY::                      One way analysis of variance.
  17 * REGRESSION::                  Linear regression.
  18 @end menu
  19
  20 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
  21 @section DESCRIPTIVES
  22
  23 @vindex DESCRIPTIVES
  24 @display
  25 DESCRIPTIVES
  26         /VARIABLES=var_list
  27         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
  28         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
  29         /SAVE
  30         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
  31                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
  32                      SESKEWNESS,SEKURTOSIS@}
  33         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
  34                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
  35               @{A,D@}
  36 @end display
  37
  38 The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
  39 descriptive
  40 statistics requested by the user.  In addition, it can optionally
  41 compute Z-scores.
  42
  43 The VARIABLES subcommand, which is required, specifies the list of
  44 variables to be analyzed.  Keyword VARIABLES is optional.
  45
  46 All other subcommands are optional:
  47
  48 The MISSING subcommand determines the handling of missing values.  If
  49 INCLUDE is set, then user-missing values are included in the
  50 calculations.  If NOINCLUDE is set, which is the default, user-missing
  51 values are excluded.  If VARIABLE is set, then missing values are
  52 excluded on a variable by variable basis; if LISTWISE is set, then
  53 the entire case is excluded whenever any value in that case has a
  54 system-missing or, unless INCLUDE is set, user-missing value.
  55
  56 The FORMAT subcommand affects the output format.  Currently the
  57 LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
  58 set, the numbers of both valid and missing cases are listed in the output;
  59 when LINE is set, only the number of valid cases is listed.
  60
  61 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  62 the specified variables.  The Z scores are saved to new variables.
  63 Variable names are generated by trying first the original variable name
  64 with Z prepended and truncated to a maximum of 8 characters, then the
  65 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  66 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
  67 variable names can be specified explicitly on VARIABLES in the variable
  68 list by enclosing them in parentheses after each variable.
  69
  70 The STATISTICS subcommand specifies the statistics to be displayed:
  71
  72 @table @code
  73 @item ALL
  74 All of the statistics below.
  75 @item MEAN
  76 Arithmetic mean.
  77 @item SEMEAN
  78 Standard error of the mean.
  79 @item STDDEV
  80 Standard deviation.
  81 @item VARIANCE
  82 Variance.
  83 @item KURTOSIS
  84 Kurtosis and standard error of the kurtosis.
  85 @item SKEWNESS
  86 Skewness and standard error of the skewness.
  87 @item RANGE
  88 Range.
  89 @item MINIMUM
  90 Minimum value.
  91 @item MAXIMUM
  92 Maximum value.
  93 @item SUM
  94 Sum.
  95 @item DEFAULT
  96 Mean, standard deviation, minimum, maximum.
  97 @item SEKURTOSIS
  98 Standard error of the kurtosis.
  99 @item SESKEWNESS
 100 Standard error of the skewness.
 101 @end table
 102
 103 The SORT subcommand specifies how the statistics should be sorted.  Most
 104 of the possible values should be self-explanatory.  NAME causes the
 105 statistics to be sorted by variable name.  By default, the statistics are
 106 listed in the order that the variables are specified on the VARIABLES
 107 subcommand.  The A and D settings request an ascending or descending sort
 108 order, respectively.
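
     As a minimal sketch, assuming an active file with hypothetical numeric
     variables @code{height} and @code{weight}, the following command
     displays the mean and standard deviation, sorts the output by mean,
     and saves Z scores.  The name @code{zh} in parentheses is an
     explicitly chosen Z-score variable name for @code{height};
     @code{weight} has none, so its Z-score variable should receive a
     generated name, starting with the original name with Z prepended.

     @example
     DESCRIPTIVES
             /VARIABLES=height (zh) weight
             /STATISTICS=MEAN STDDEV
             /SORT=MEAN
             /SAVE.
     @end example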
 109
 110 @node FREQUENCIES, EXAMINE, DESCRIPTIVES, Statistics
 111 @section FREQUENCIES
 112
 113 @vindex FREQUENCIES
 114 @display
 115 FREQUENCIES
 116         /VARIABLES=var_list
 117         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
 118                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
 119                 @{LABELS,NOLABELS@}
 120                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
 121                 @{SINGLE,DOUBLE@}
 122                 @{OLDPAGE,NEWPAGE@}
 123         /MISSING=@{EXCLUDE,INCLUDE@}
 124         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
 125                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
 126                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
 127         /NTILES=ntiles
 128         /PERCENTILES=percent@dots{}
 129
 130 (These options are not currently implemented.)
 131         /BARCHART=@dots{}
 132         /HISTOGRAM=@dots{}
 133         /HBAR=@dots{}
 134         /GROUPED=@dots{}
 135
 136 (Integer mode.)
 137         /VARIABLES=var_list (low,high)@dots{}
 138 @end display
 139
 140 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
 141 variables.
 142 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
 143 (including median and mode) and percentiles.
 144
 145 In the future, @cmd{FREQUENCIES} will also support graphical output in the
 146 form of bar charts and histograms.  In addition, it will be able to
 147 support percentiles for grouped data.
 148
 149 The VARIABLES subcommand is the only required subcommand.  Specify the
 150 variables to be analyzed.  In most cases, this is all that is required.
 151 This is known as @dfn{general mode}.
 152
 153 Occasionally, one may want to invoke a special mode called @dfn{integer
 154 mode}.  Normally, in general mode, PSPP will automatically determine
 155 what values occur in the data.  In integer mode, the user specifies the
 156 range of values that the data assumes.  To invoke this mode, specify a
 157 range of data values in parentheses, separated by a comma.  Data values
 158 inside the range are truncated to the nearest integer, then assigned to
 159 that value.  If values occur outside this range, they are discarded.
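
     A minimal sketch of integer mode, assuming a hypothetical numeric
     variable @code{rating} whose values of interest run from 1 to 5.
     Giving the range in parentheses after the variable name should select
     integer mode, so values outside 1 through 5 are discarded rather than
     tabulated.

     @example
     FREQUENCIES
             /VARIABLES=rating (1,5).
     @end example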
 160
 161 The FORMAT subcommand controls the output format.  It has several
 162 possible settings:
 163
 164 @itemize @bullet
 165 @item
 166 TABLE, the default, causes a frequency table to be output for every
 167 variable specified.  NOTABLE prevents them from being output.  LIMIT
 168 with a numeric argument causes them to be output except when there are
 169 more than the specified number of values in the table.
 170
 171 @item
 172 STANDARD frequency tables contain more complete information, but also
 173 take up more space on the printed page.  CONDENSE frequency tables are
 174 less informative but take up less space.  ONEPAGE with a numeric
 175 argument will output standard frequency tables if there are the
 176 specified number of values or fewer, condensed tables otherwise.  ONEPAGE
 177 without an argument defaults to a threshold of 50 values.
 178
 179 @item
 180 LABELS causes value labels to be displayed in STANDARD frequency
 181 tables.  NOLABELS prevents this.
 182
 183 @item
 184 Normally frequency tables are sorted in ascending order by value.  This
 185 is AVALUE.  DVALUE tables are sorted in descending order by value.
 186 AFREQ and DFREQ tables are sorted in ascending and descending order,
 187 respectively, by frequency count.
 188
 189 @item
 190 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
 191 frequency tables have wider spacing.
 192
 193 @item
 194 OLDPAGE and NEWPAGE are not currently used.
 195 @end itemize
 196
 197 The MISSING subcommand controls the handling of user-missing values.
 198 When EXCLUDE, the default, is set, user-missing values are not included
 199 in frequency tables or statistics.  When INCLUDE is set, user-missing
 200 values are included.  System-missing values are never included in statistics,
 201 but are listed in frequency tables.
 202
 203 The available STATISTICS are the same as those available in @cmd{DESCRIPTIVES}
 204 (@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
 205 value, and MODE, the mode.  (If there are multiple modes, the smallest
 206 value is reported.)  By default, the mean, standard deviation,
 207 minimum, and maximum are reported for each variable.
 208
 209 PERCENTILES causes the specified percentiles to be reported.
 210 The percentiles should be given as a list of numbers between 0
 211 and 100 inclusive.
 212 The NTILES subcommand causes the percentiles to be reported at the
 213 boundaries of the data set divided into the specified number of ranges.
 214 For instance, @code{/NTILES=4} would cause quartiles to be reported.
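
     To illustrate the relationship between the two subcommands, here is a
     sketch that assumes a hypothetical numeric variable @code{income}.
     Dividing the data into four ranges gives boundaries at the 25th, 50th,
     and 75th percentiles, so the two commands below should request the
     same cut points.  NOTABLE is used only to suppress the frequency table.

     @example
     FREQUENCIES
             /VARIABLES=income
             /FORMAT=NOTABLE
             /NTILES=4.

     FREQUENCIES
             /VARIABLES=income
             /FORMAT=NOTABLE
             /PERCENTILES=25 50 75.
     @end example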
 215
 216
 217 @node EXAMINE, CROSSTABS, FREQUENCIES, Statistics
 218 @comment  node-name,  next,  previous,  up
 219 @section EXAMINE
 220 @vindex EXAMINE
 221
 222 @cindex Normality, testing for
 223
 224 @display
 225 EXAMINE
 226         VARIABLES=var_list [BY factor_list ]
 227         /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
 228         /PLOT=@{STEMLEAF, BOXPLOT, NPPLOT, SPREADLEVEL(n), HISTOGRAM,
 229                ALL, NONE@}
 230         /CINTERVAL n
 231         /COMPARE=@{GROUPS,VARIABLES@}
 232         /ID=@{case_number, var_name@}
 233         /@{TOTAL,NOTOTAL@}
 234         /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL@}
 235         /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
 236                 [@{NOREPORT,REPORT@}]
 237
 238 @end display
 239
 240 The @cmd{EXAMINE} command is used to test how closely a distribution follows a
 241 normal distribution.  It also shows you outliers and extreme values.
 242
 243 The VARIABLES subcommand specifies the dependent variables and the
 244 independent variables to use as factors for the analysis.   Variables
 245 listed before the first BY keyword are the dependent variables.
 246 The dependent variables may optionally be followed by a list of
 247 factors which tell PSPP how to break down the analysis for each
 248 dependent variable.  The format for each factor is
 249 @display
 250 var [BY var].
 251 @end display
 252
 253
 254 The STATISTICS subcommand specifies the analysis to be done.
 255 DESCRIPTIVES will produce a table showing some parametric and
 256 non-parametric statistics.  EXTREME produces a table showing extreme
 257 values of the dependent variable.  A number in parentheses determines
 258 how many upper and lower extremes to show.  The default number is 5.
 259
 260
 261 The PLOT subcommand specifies which plots are to be produced if any.
 262
 263 The COMPARE subcommand is only relevant if producing boxplots, and it is only
 264 useful if there is more than one dependent variable and at least one factor.   If
 265 /COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
 266 containing boxplots for all the factors.
 267 If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
 268 each containing one boxplot per dependent variable.
 269 If the /COMPARE subcommand is omitted, then PSPP uses the default value of
 270 /COMPARE=GROUPS.
 271
 272 The CINTERVAL subcommand specifies the confidence interval to use in
 273 calculation of the descriptive statistics.  The default is 95%.
 274
 275 The PERCENTILES subcommand specifies which percentiles are to be calculated,
 276 and which algorithm to use for calculating them.  The default is to
 277 calculate the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles
 278 using the HAVERAGE algorithm.
 279
 280 The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
 281 is given and factors have been specified in the VARIABLES subcommand,
 282 then statistics for the unfactored dependent variables are
 283 produced in addition to the factored variables.  If there are no
 284 factors specified then TOTAL and NOTOTAL have no effect.
 285
 286 @strong{Warning!}
 287 If many dependent variables are given, or factors are given for which
 288 there are many distinct values, then @cmd{EXAMINE} will produce a very
 289 large quantity of output.
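
     A small sketch that keeps the output manageable, assuming a
     hypothetical dependent variable @code{score} and a single hypothetical
     factor @code{grp}.  It requests the descriptives table, the three
     highest and three lowest values, and boxplots grouped by factor.

     @example
     EXAMINE
             VARIABLES=score BY grp
             /STATISTICS=DESCRIPTIVES EXTREME(3)
             /PLOT=BOXPLOT
             /COMPARE=GROUPS.
     @end example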
 290
 291
 292 @node CROSSTABS, T-TEST, EXAMINE, Statistics
 293 @section CROSSTABS
 294
 295 @vindex CROSSTABS
 296 @display
 297 CROSSTABS
 298         /TABLES=var_list BY var_list [BY var_list]@dots{}
 299         /MISSING=@{TABLE,INCLUDE,REPORT@}
 300         /WRITE=@{NONE,CELLS,ALL@}
 301         /FORMAT=@{TABLES,NOTABLES@}
 302                 @{LABELS,NOLABELS,NOVALLABS@}
 303                 @{PIVOT,NOPIVOT@}
 304                 @{AVALUE,DVALUE@}
 305                 @{NOINDEX,INDEX@}
 306                 @{BOX,NOBOX@}
 307         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
 308                 ASRESIDUAL,ALL,NONE@}
 309         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
 310                      KAPPA,ETA,CORR,ALL,NONE@}
 311
 312 (Integer mode.)
 313         /VARIABLES=var_list (low,high)@dots{}
 314 @end display
 315
 316 The @cmd{CROSSTABS} procedure displays crosstabulation
 317 tables requested by the user.  It can calculate several statistics for
 318 each cell in the crosstabulation tables.  In addition, a number of
 319 statistics can be calculated for each table itself.
 320
 321 The TABLES subcommand is used to specify the tables to be reported.  Any
 322 number of dimensions is permitted, and any number of variables per
 323 dimension is allowed.  The TABLES subcommand may be repeated as many
 324 times as needed.  This is the only required subcommand in @dfn{general
 325 mode}.
 326
 327 Occasionally, one may want to invoke a special mode called @dfn{integer
 328 mode}.  Normally, in general mode, PSPP automatically determines
 329 what values occur in the data.  In integer mode, the user specifies the
 330 range of values that the data assumes.  To invoke this mode, specify the
 331 VARIABLES subcommand, giving a range of data values in parentheses for
 332 each variable to be used on the TABLES subcommand.  Data values inside
 333 the range are truncated to the nearest integer, then assigned to that
 334 value.  If values occur outside this range, they are discarded.  When it
 335 is present, the VARIABLES subcommand must precede the TABLES
 336 subcommand.
 337
 338 In general mode, numeric and string variables may be specified on
 339 TABLES.  Although long string variables are allowed, only their
 340 initial short-string parts are used.  In integer mode, only numeric
 341 variables are allowed.
 342
 343 The MISSING subcommand determines the handling of user-missing values.
 344 When set to TABLE, the default, missing values are dropped on a table by
 345 table basis.  When set to INCLUDE, user-missing values are included in
 346 tables and statistics.  When set to REPORT, which is allowed only in
 347 integer mode, user-missing values are included in tables but marked with
 348 an @samp{M} (for ``missing'') and excluded from statistical
 349 calculations.
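
     A sketch of integer mode, assuming hypothetical numeric variables
     @code{sex}, coded 1 to 2, and @code{region}, coded 1 to 4.  The
     VARIABLES subcommand comes before TABLES, and because ranges are
     given, MISSING may be set to REPORT.

     @example
     CROSSTABS
             /VARIABLES=sex (1,2) region (1,4)
             /TABLES=sex BY region
             /MISSING=REPORT.
     @end example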
 350
 351 Currently the WRITE subcommand is ignored.
 352
 353 The FORMAT subcommand controls the characteristics of the
 354 crosstabulation tables to be displayed.  It has a number of possible
 355 settings:
 356
 357 @itemize @bullet
 358 @item
 359 TABLES, the default, causes crosstabulation tables to be output.
 360 NOTABLES suppresses them.
 361
 362 @item
 363 LABELS, the default, allows variable labels and value labels to appear
 364 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
 365 labels but suppresses value labels.
 366
 367 @item
 368 PIVOT, the default, causes each TABLES subcommand to be displayed in a
 369 pivot table format.  NOPIVOT causes the old-style crosstabulation format
 370 to be used.
 371
 372 @item
 373 AVALUE, the default, causes values to be sorted in ascending order.
 374 DVALUE asserts a descending sort order.
 375
 376 @item
 377 INDEX/NOINDEX is currently ignored.
 378
 379 @item
 380 BOX/NOBOX is currently ignored.
 381 @end itemize
 382
 383 The CELLS subcommand controls the contents of each cell in the displayed
 384 crosstabulation table.  The possible settings are:
 385
 386 @table @asis
 387 @item COUNT
 388 Frequency count.
 389 @item ROW
 390 Row percent.
 391 @item COLUMN
 392 Column percent.
 393 @item TOTAL
 394 Table percent.
 395 @item EXPECTED
 396 Expected value.
 397 @item RESIDUAL
 398 Residual.
 399 @item SRESIDUAL
 400 Standardized residual.
 401 @item ASRESIDUAL
 402 Adjusted standardized residual.
 403 @item ALL
 404 All of the above.
 405 @item NONE
 406 Suppress cells entirely.
 407 @end table
 408
 409 @samp{/CELLS} without any settings specified requests COUNT, ROW,
 410 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
 411 will be selected.
 412
 413 The STATISTICS subcommand selects statistics for computation:
 414
 415 @table @asis
 416 @item CHISQ
 417 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
 418 correction, linear-by-linear association.
 419 @item PHI
 420 Phi.
 421 @item CC
 422 Contingency coefficient.
 423 @item LAMBDA
 424 Lambda.
 425 @item UC
 426 Uncertainty coefficient.
 427 @item BTAU
 428 Tau-b.
 429 @item CTAU
 430 Tau-c.
 431 @item RISK
 432 Risk estimate.
 433 @item GAMMA
 434 Gamma.
 435 @item D
 436 Somers' D.
 437 @item KAPPA
 438 Cohen's Kappa.
 439 @item ETA
 440 Eta.
 441 @item CORR
 442 Spearman correlation, Pearson's r.
 443 @item ALL
 444 All of the above.
 445 @item NONE
 446 No statistics.
 447 @end table
 448
 449 Selected statistics are only calculated when appropriate for the
 450 statistic.  Certain statistics require tables of a particular size, and
 451 some statistics are calculated only in integer mode.
 452
 453 @samp{/STATISTICS} without any settings selects CHISQ.  If the
 454 STATISTICS subcommand is not given, no statistics are calculated.
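
     A general-mode sketch, again assuming hypothetical variables
     @code{sex} and @code{region}.  Each cell shows its count with row and
     column percentages, and the chi-square family of statistics is
     requested for the table as a whole.

     @example
     CROSSTABS
             /TABLES=sex BY region
             /CELLS=COUNT ROW COLUMN
             /STATISTICS=CHISQ.
     @end example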
 455
 456 @strong{Please note:} Currently the implementation of CROSSTABS has the
 457 following bugs:
 458
 459 @itemize @bullet
 460 @item
 461 Pearson's R (but not Spearman) is off a little.
 462 @item
 463 T values for Spearman's R and Pearson's R are wrong.
 464 @item
 465 Significance of symmetric and directional measures is not calculated.
 466 @item
 467 Asymmetric ASEs and T values for lambda are wrong.
 468 @item
 469 ASE of Goodman and Kruskal's tau is not calculated.
 470 @item
 471 ASE of symmetric Somers' D is wrong.
 472 @item
 473 Approximate T of uncertainty coefficient is wrong.
 474 @end itemize
 475
 476 Fixes for any of these deficiencies would be welcomed.
 477
 478 @node T-TEST, ONEWAY, CROSSTABS, Statistics
 479 @comment  node-name,  next,  previous,  up
 480 @section T-TEST
 481
 482 @vindex T-TEST
 483 @display
 484 T-TEST
 485         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 486         /CRITERIA=CIN(confidence)
 487
 488
 489 (One Sample mode.)
 490         TESTVAL=test_value
 491         /VARIABLES=var_list
 492
 493
 494 (Independent Samples mode.)
 495         GROUPS=var(value1 [, value2])
 496         /VARIABLES=var_list
 497
 498
 499 (Paired Samples mode.)
 500         PAIRS=var_list [WITH var_list [(PAIRED)] ]
 501
 502 @end display
 503
 504
 505 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
 506 means.
 507 It operates in one of three modes:
 508 @itemize
 509 @item One Sample mode.
 510 @item Independent Samples mode.
 511 @item Paired Samples mode.
 512 @end itemize
 513
 514 @noindent
 515 Each of these modes is described in more detail below.
 516 There are two optional subcommands which are common to all modes.
 517
 518 The @cmd{/CRITERIA} subcommand tells PSPP the confidence interval used
 519 in the tests.  The default value is 0.95.
 520
 521
 522 The @cmd{MISSING} subcommand determines the handling of missing
 523 values.
 524 If INCLUDE is set, then user-missing values are included in the
 525 calculations, but system-missing values are not.
 526 If EXCLUDE is set, which is the default, user-missing
 527 values are excluded as well as system-missing values.
 528
 529
 530 If LISTWISE is set, then the entire case is excluded from analysis
 531 whenever any variable  specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
 532 @cmd{/GROUPS} subcommands contains a missing value.
 533 If ANALYSIS is set, then missing values are excluded only in the analysis for
 534 which they would be needed. This is the default.
 535
 536
 537 @menu
 538 * One Sample Mode::             Testing against a hypothesised mean
 539 * Independent Samples Mode::    Testing two independent groups for equal mean
 540 * Paired Samples Mode::         Testing two interdependent groups for equal mean
 541 @end menu
 542
 543 @node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
 544 @subsection One Sample Mode
 545
 546 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
 547 This mode is used to test a population mean against a hypothesised
 548 mean.
 549 The value given to the @cmd{TESTVAL} subcommand is the value against
 550 which you wish to test.
 551 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 552 tell PSPP which variables you wish to test.
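
     A minimal sketch, assuming a hypothetical numeric variable
     @code{score} and a hypothesised mean of 100.  The CRITERIA subcommand
     is optional; here it replaces the default confidence level of 0.95
     with 0.99.

     @example
     T-TEST
             TESTVAL=100
             /VARIABLES=score
             /CRITERIA=CIN(0.99).
     @end example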
 553
 554 @node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
 555 @comment  node-name,  next,  previous,  up
 556 @subsection Independent Samples Mode
 557
 558 The @cmd{GROUPS} subcommand invokes Independent Samples mode or
 559 `Groups' mode.
 560 This mode is used to test whether two groups of values have the
 561 same population mean.
 562 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 563 tell PSPP the dependent variables you wish to test.
 564
 565 The variable given in the @cmd{GROUPS} subcommand is the independent
 566 variable which determines to which group the samples belong.
 567 The values in parentheses are the specific values of the independent
 568 variable for each group.
 569 If the parentheses are omitted and no values are given, the default values
 570 of 1.0 and 2.0 are assumed.
 571
 572 If the independent variable is numeric,
 573 it is acceptable to specify only one value inside the parentheses.
 574 If you do this, cases where the independent variable is
 575 less than  or equal to this value belong to the first group, and cases
 576 greater than this value belong to the second group.
 577 When using this form of the @cmd{GROUPS} subcommand, missing values in
 578 the independent variable are excluded on a listwise basis, regardless
 579 of whether @cmd{/MISSING=LISTWISE} was specified.
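
     Two sketches, assuming hypothetical numeric variables @code{sex},
     @code{age}, @code{height}, and @code{weight}.  The first names both
     group values explicitly.  The second gives a single value, so the
     groups are formed by comparing @code{age} with 40 as described above.

     @example
     T-TEST
             GROUPS=sex(1,2)
             /VARIABLES=height weight.

     T-TEST
             GROUPS=age(40)
             /VARIABLES=height weight.
     @end example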
 580
 581
 582 @node Paired Samples Mode,  , Independent Samples Mode, T-TEST
 583 @comment  node-name,  next,  previous,  up
 584 @subsection Paired Samples Mode
 585
 586 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
 587 Use this mode when repeated measures have been taken from the same
 588 samples.
 589 If the @code{WITH} keyword is omitted, then tables for all
 590 combinations of variables given in the @cmd{PAIRS} subcommand are
 591 generated.
 592 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 593 is also given, then the number of variables preceding @code{WITH}
 594 must be the same as the number following it.
 595 In this case, tables for each respective pair of variables are
 596 generated.
 597 In the event that the @code{WITH} keyword is given, but the
 598 @code{(PAIRED)} keyword is omitted, then tables for each combination
 599 of a variable preceding @code{WITH} with a variable following
 600 @code{WITH} are generated.
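
     To make the difference concrete, here is a sketch assuming four
     hypothetical variables @code{pre1}, @code{pre2}, @code{post1}, and
     @code{post2}.  With @code{(PAIRED)}, the first command should test
     @code{pre1} against @code{post1} and @code{pre2} against
     @code{post2}.  Without it, the second command should test each of the
     two variables before @code{WITH} against each of the two after it,
     giving four pairs.

     @example
     T-TEST
             PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).

     T-TEST
             PAIRS=pre1 pre2 WITH post1 post2.
     @end example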
 601
 602
 603 @node ONEWAY, REGRESSION, T-TEST, Statistics
 604 @comment  node-name,  next,  previous,  up
 605 @section ONEWAY
 606
 607 @vindex ONEWAY
 608 @cindex analysis of variance
 609 @cindex ANOVA
 610
 611 @display
 612 ONEWAY
 613         [/VARIABLES = ] var_list BY var
 614         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 615         /CONTRASTS= value1 [, value2] ... [,valueN]
 616         /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
 617
 618 @end display
 619
 620 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
 621 variables factored by a single independent variable.
 622 It is used to compare the means of a population
 623 divided into more than two groups.
 624
 625 The  variables to be analysed should be given in the @code{VARIABLES}
 626 subcommand.
 627 The list of variables must be followed by the @code{BY} keyword and
 628 the name of the independent (or factor) variable.
 629
 630 You can use the @code{STATISTICS} subcommand to tell PSPP to display
 631 ancillary information.  The options accepted are:
 632 @itemize
 633 @item DESCRIPTIVES
 634 Displays descriptive statistics about the groups factored by the independent
 635 variable.
 636 @item HOMOGENEITY
 637 Displays the Levene test of Homogeneity of Variance for the
 638 variables and their groups.
 639 @end itemize
 640
 641 The @code{CONTRASTS} subcommand is used when you anticipate certain
 642 differences between the groups.
 643 The subcommand must be followed by a list of numerals which are the
 644 coefficients of the groups to be tested.
 645 The number of coefficients must correspond to the number of distinct
 646 groups (or values of the independent variable).
 647 If the total sum of the coefficients is not zero, then PSPP will
 648 display a warning, but will proceed with the analysis.
 649 The @code{CONTRASTS} subcommand may be given up to 10 times in order
 650 to specify different contrast tests.
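
     A minimal sketch, assuming a hypothetical dependent variable
     @code{score} and a hypothetical factor @code{grp} that takes exactly
     three distinct values.  Each CONTRASTS subcommand supplies three
     coefficients, one per group, and each list sums to zero.  The first
     contrast compares the first group with the other two; the second
     compares the second group with the third.

     @example
     ONEWAY
             /VARIABLES=score BY grp
             /STATISTICS=DESCRIPTIVES HOMOGENEITY
             /CONTRASTS=-2, 1, 1
             /CONTRASTS=0, -1, 1.
     @end example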
 651 @setfilename ignored
 652
 653 @include regression.texi