   1 @node Statistics, Utilities, Conditionals and Looping, Top
   2 @chapter Statistics
   3
   4 This chapter documents the statistical procedures that PSPP supports so
   5 far.
   6
   7 @c If you add any new commands, then don't forget to remove the entry in
   8 @c not-implemented.texi
   9
  10 @menu
  11 * DESCRIPTIVES::                Descriptive statistics.
  12 * FREQUENCIES::                 Frequency tables.
  13 * EXAMINE::                     Testing data for normality.
  14 * CROSSTABS::                   Crosstabulation tables.
  15 * T-TEST::                      Test hypotheses about means.
* ONEWAY::                      One-way analysis of variance.
  17 @end menu
  18
  19 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
  20 @section DESCRIPTIVES
  21
  22 @vindex DESCRIPTIVES
  23 @display
  24 DESCRIPTIVES
  25         /VARIABLES=var_list
  26         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
  27         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
  28         /SAVE
  29         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
  30                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
  31                      SESKEWNESS,SEKURTOSIS@}
  32         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
  33                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
  34               @{A,D@}
  35 @end display
  36
The @cmd{DESCRIPTIVES} procedure reads the active file and outputs the
descriptive statistics requested by the user.  It can also compute
Z scores.
  41
  42 The VARIABLES subcommand, which is required, specifies the list of
  43 variables to be analyzed.  Keyword VARIABLES is optional.
  44
  45 All other subcommands are optional:
  46
The MISSING subcommand determines the handling of missing values.  If
INCLUDE is set, then user-missing values are included in the
calculations.  If NOINCLUDE is set, which is the default, user-missing
values are excluded.  If VARIABLE is set, then missing values are
excluded on a variable by variable basis; if LISTWISE is set, then
the entire case is excluded whenever any value in that case is
system-missing or, unless INCLUDE is set, user-missing.
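
As a minimal sketch, assuming hypothetical variables @code{height} and
@code{weight}, the following command requests listwise handling.  A
case in which either variable is system-missing is then left out of the
statistics for both variables, rather than only for the affected one:

@example
DESCRIPTIVES /VARIABLES=height weight
        /MISSING=LISTWISE.
@end example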
  54
The FORMAT subcommand affects the output format.  Currently the
LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
set, both the valid and missing numbers of cases are listed in the
output; when LINE is set, only the number of valid cases is listed.
  59
  60 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  61 the specified variables.  The Z scores are saved to new variables.
  62 Variable names are generated by trying first the original variable name
  63 with Z prepended and truncated to a maximum of 8 characters, then the
  64 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  65 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
  66 variable names can be specified explicitly on VARIABLES in the variable
  67 list by enclosing them in parentheses after each variable.
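
For illustration only, the command below saves Z scores for two
hypothetical variables, @code{height} and @code{weight}.  The name
@code{zht} is given explicitly for @code{height}; @code{weight} has no
name in parentheses, so its Z score variable receives a generated name,
starting with the attempt to prepend Z to @code{weight}:

@example
DESCRIPTIVES /VARIABLES=height (zht) weight
        /SAVE.
@end example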
  68
  69 The STATISTICS subcommand specifies the statistics to be displayed:
  70
  71 @table @code
  72 @item ALL
  73 All of the statistics below.
  74 @item MEAN
  75 Arithmetic mean.
  76 @item SEMEAN
  77 Standard error of the mean.
  78 @item STDDEV
  79 Standard deviation.
  80 @item VARIANCE
  81 Variance.
  82 @item KURTOSIS
  83 Kurtosis and standard error of the kurtosis.
  84 @item SKEWNESS
  85 Skewness and standard error of the skewness.
  86 @item RANGE
  87 Range.
  88 @item MINIMUM
  89 Minimum value.
  90 @item MAXIMUM
  91 Maximum value.
  92 @item SUM
  93 Sum.
  94 @item DEFAULT
Mean, standard deviation, minimum, maximum.
  96 @item SEKURTOSIS
  97 Standard error of the kurtosis.
  98 @item SESKEWNESS
  99 Standard error of the skewness.
 100 @end table
 101
The SORT subcommand specifies how the statistics should be sorted.  Most
of the possible values should be self-explanatory.  NAME causes the
statistics to be sorted by variable name.  By default, the statistics
are listed in the order that the variables are specified on the
VARIABLES subcommand.  The A and D settings request an ascending or
descending sort order, respectively.
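
The following sketch combines STATISTICS and SORT.  The variable names
are hypothetical, and the example assumes that settings within a
subcommand may be separated by spaces, as the syntax summary at the top
of this section suggests.  It reports the mean, standard deviation, and
range of three variables, listing the variable with the largest mean
first:

@example
DESCRIPTIVES /VARIABLES=height weight age
        /STATISTICS=MEAN STDDEV RANGE
        /SORT=MEAN D.
@end example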
 108
 109 @node FREQUENCIES, EXAMINE, DESCRIPTIVES, Statistics
 110 @section FREQUENCIES
 111
 112 @vindex FREQUENCIES
 113 @display
 114 FREQUENCIES
 115         /VARIABLES=var_list
 116         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
 117                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
 118                 @{LABELS,NOLABELS@}
 119                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
 120                 @{SINGLE,DOUBLE@}
 121                 @{OLDPAGE,NEWPAGE@}
 122         /MISSING=@{EXCLUDE,INCLUDE@}
 123         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
 124                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
 125                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
 126         /NTILES=ntiles
 127         /PERCENTILES=percent@dots{}
 128
 129 (These options are not currently implemented.)
 130         /BARCHART=@dots{}
 131         /HISTOGRAM=@dots{}
 132         /HBAR=@dots{}
 133         /GROUPED=@dots{}
 134
 135 (Integer mode.)
 136         /VARIABLES=var_list (low,high)@dots{}
 137 @end display
 138
 139 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
 140 variables.
 141 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
 142 (including median and mode) and percentiles.
 143
 144 In the future, @cmd{FREQUENCIES} will also support graphical output in the
 145 form of bar charts and histograms.  In addition, it will be able to
 146 support percentiles for grouped data.
 147
 148 The VARIABLES subcommand is the only required subcommand.  Specify the
 149 variables to be analyzed.  In most cases, this is all that is required.
 150 This is known as @dfn{general mode}.
 151
Occasionally, one may want to invoke a special mode called @dfn{integer
mode}.  Normally, in general mode, PSPP will automatically determine
what values occur in the data.  In integer mode, the user specifies the
range of values that the data assume.  To invoke this mode, follow the
variable list with a range of data values in parentheses, separated by
a comma.  Data values inside the range are truncated to an integer,
then counted under that value.  If values occur outside this range,
they are discarded.
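
A minimal sketch of integer mode, assuming a hypothetical variable
@code{rating} whose valid values run from 1 to 5.  Any value outside
that range is discarded:

@example
FREQUENCIES /VARIABLES=rating (1,5).
@end example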
 159
 160 The FORMAT subcommand controls the output format.  It has several
 161 possible settings:
 162
 163 @itemize @bullet
 164 @item
 165 TABLE, the default, causes a frequency table to be output for every
 166 variable specified.  NOTABLE prevents them from being output.  LIMIT
 167 with a numeric argument causes them to be output except when there are
 168 more than the specified number of values in the table.
 169
 170 @item
STANDARD frequency tables contain more complete information, but also
take up more space on the printed page.  CONDENSE frequency tables are
less informative but take up less space.  ONEPAGE with a numeric
argument will output standard frequency tables if there are the
specified number of values or fewer, condensed tables otherwise.  ONEPAGE
without an argument defaults to a threshold of 50 values.
 177
 178 @item
LABELS causes value labels to be displayed in STANDARD frequency
tables.  NOLABELS prevents this.
 181
 182 @item
 183 Normally frequency tables are sorted in ascending order by value.  This
 184 is AVALUE.  DVALUE tables are sorted in descending order by value.
 185 AFREQ and DFREQ tables are sorted in ascending and descending order,
 186 respectively, by frequency count.
 187
 188 @item
 189 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
 190 frequency tables have wider spacing.
 191
 192 @item
 193 OLDPAGE and NEWPAGE are not currently used.
 194 @end itemize
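
As an illustration of combining FORMAT settings (the variable name
@code{region} is hypothetical, and space-separated settings are
assumed), this command sorts the table so that the most frequent values
come first, and suppresses the table entirely if it would contain more
than 20 values:

@example
FREQUENCIES /VARIABLES=region
        /FORMAT=DFREQ LIMIT(20).
@end example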
 195
The MISSING subcommand controls the handling of user-missing values.
When EXCLUDE, the default, is set, user-missing values are not included
in frequency tables or statistics.  When INCLUDE is set, user-missing
values are included.  System-missing values are never included in
statistics, but are listed in frequency tables.
 201
The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
value, and MODE, the mode.  (If there are multiple modes, the smallest
value is reported.)  By default, the mean, standard deviation, minimum,
and maximum are reported for each variable.
 207
PERCENTILES causes the specified percentiles to be reported.
The percentiles should be presented as a list of numbers between 0
and 100 inclusive.
 211 The NTILES subcommand causes the percentiles to be reported at the
 212 boundaries of the data set divided into the specified number of ranges.
 213 For instance, @code{/NTILES=4} would cause quartiles to be reported.
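
A sketch that uses both subcommands, assuming a hypothetical variable
@code{income} and assuming that the percentile list is separated by
spaces.  It suppresses the frequency table and reports the quartile
boundaries together with the 5th and 95th percentiles:

@example
FREQUENCIES /VARIABLES=income
        /FORMAT=NOTABLE
        /NTILES=4
        /PERCENTILES=5 95.
@end example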
 214
 215
 216 @node EXAMINE, CROSSTABS, FREQUENCIES, Statistics
 217 @comment  node-name,  next,  previous,  up
 218 @section EXAMINE
 219 @vindex EXAMINE
 220
 221 @cindex Normality, testing for
 222
 223 @display
 224 EXAMINE
 225         VARIABLES=var_list [BY factor_list ]
 226         /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
 227         /PLOT=@{STEMLEAF, BOXPLOT, NPPLOT, SPREADLEVEL(n), HISTOGRAM,
 228                ALL, NONE@}
 229         /CINTERVAL n
 230         /COMPARE=@{GROUPS,VARIABLES@}
 231         /ID=@{case_number, var_name@}
 232         /@{TOTAL,NOTOTAL@}
 233         /PERCENTILE=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
 234         /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
 235                 [@{NOREPORT,REPORT@}]
 236
 237 @end display
 238
The @cmd{EXAMINE} command is used to test how closely a distribution
follows a normal distribution.  It also shows outliers and extreme
values.
 241
The VARIABLES subcommand specifies the dependent variables and the
independent variables to use as factors for the analysis.  Variables
 244 listed before the first BY keyword are the dependent variables.
 245 The dependent variables may optionally be followed by a list of
 246 factors which tell PSPP how to break down the analysis for each
 247 dependent variable.  The format for each factor is
 248 @display
 249 var [BY var].
 250 @end display
 251
 252
The STATISTICS subcommand specifies the analysis to be done.
DESCRIPTIVES will produce a table showing some parametric and
non-parametric statistics.  EXTREME produces a table showing extreme
values of the dependent variable.  A number in parentheses determines
how many upper and lower extremes to show.  The default number is 5.
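
For example, with hypothetical variables @code{score} and
@code{gender}, and assuming that STATISTICS settings may be separated
by spaces, the following sketch requests the descriptives table plus
the three highest and three lowest values of @code{score}, broken down
by @code{gender}:

@example
EXAMINE VARIABLES=score BY gender
        /STATISTICS=DESCRIPTIVES EXTREME(3).
@end example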
 258
 259
 260 The PLOT subcommand specifies which plots are to be produced if any.
 261
The CINTERVAL subcommand specifies the confidence interval to use in
calculating the descriptive statistics.  The default is 95%.
 264
 265 The PERCENTILES subcommand specifies which percentiles are to be calculated,
 266 and which algorithm to use for calculating them.  The default is to
 267 calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
 268 HAVERAGE algorithm.
 269
 270 The TOTAL and NOTOTAL subcommands are mutually exclusive.  If NOTOTAL
 271 is given and factors have been specified in the VARIABLES subcommand,
then statistics for the unfactored dependent variables are
 273 produced in addition to the factored variables.  If there are no
 274 factors specified then TOTAL and NOTOTAL have no effect.
 275
 276 @strong{Warning!}
If many dependent variables are given, or factors are given for which
 278 there are many distinct values, then @cmd{EXAMINE} will produce a very
 279 large quantity of output.
 280
 281
 282 @node CROSSTABS, T-TEST, EXAMINE, Statistics
 283 @section CROSSTABS
 284
 285 @vindex CROSSTABS
 286 @display
 287 CROSSTABS
 288         /TABLES=var_list BY var_list [BY var_list]@dots{}
 289         /MISSING=@{TABLE,INCLUDE,REPORT@}
 290         /WRITE=@{NONE,CELLS,ALL@}
 291         /FORMAT=@{TABLES,NOTABLES@}
 292                 @{LABELS,NOLABELS,NOVALLABS@}
 293                 @{PIVOT,NOPIVOT@}
 294                 @{AVALUE,DVALUE@}
 295                 @{NOINDEX,INDEX@}
 296                 @{BOX,NOBOX@}
 297         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
 298                 ASRESIDUAL,ALL,NONE@}
 299         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
 300                      KAPPA,ETA,CORR,ALL,NONE@}
 301
 302 (Integer mode.)
 303         /VARIABLES=var_list (low,high)@dots{}
 304 @end display
 305
 306 The @cmd{CROSSTABS} procedure displays crosstabulation
 307 tables requested by the user.  It can calculate several statistics for
 308 each cell in the crosstabulation tables.  In addition, a number of
 309 statistics can be calculated for each table itself.
 310
 311 The TABLES subcommand is used to specify the tables to be reported.  Any
 312 number of dimensions is permitted, and any number of variables per
 313 dimension is allowed.  The TABLES subcommand may be repeated as many
 314 times as needed.  This is the only required subcommand in @dfn{general
 315 mode}.
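
The sketch below, which uses hypothetical variable names, repeats the
TABLES subcommand.  The first TABLES requests a single two-dimensional
table.  The second lists two variables in its first dimension and adds
a third dimension, so it requests one three-dimensional table for
@code{gender} and another for @code{smoker}:

@example
CROSSTABS /TABLES=gender BY region
        /TABLES=gender smoker BY region BY agegroup.
@end example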
 316
Occasionally, one may want to invoke a special mode called @dfn{integer
mode}.  Normally, in general mode, PSPP automatically determines
what values occur in the data.  In integer mode, the user specifies the
range of values that the data assume.  To invoke this mode, specify the
VARIABLES subcommand, giving a range of data values in parentheses for
each variable to be used on the TABLES subcommand.  Data values inside
the range are truncated to an integer, then counted under that
value.  If values occur outside this range, they are discarded.  When it
is present, the VARIABLES subcommand must precede the TABLES
subcommand.
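
A minimal integer mode sketch, assuming hypothetical numeric variables
@code{gender}, coded 1 and 2, and @code{rating}, coded 1 through 5.
VARIABLES comes first, as required:

@example
CROSSTABS /VARIABLES=gender (1,2) rating (1,5)
        /TABLES=gender BY rating.
@end example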
 327
 328 In general mode, numeric and string variables may be specified on
 329 TABLES.  Although long string variables are allowed, only their
 330 initial short-string parts are used.  In integer mode, only numeric
 331 variables are allowed.
 332
 333 The MISSING subcommand determines the handling of user-missing values.
 334 When set to TABLE, the default, missing values are dropped on a table by
 335 table basis.  When set to INCLUDE, user-missing values are included in
 336 tables and statistics.  When set to REPORT, which is allowed only in
 337 integer mode, user-missing values are included in tables but marked with
 338 an @samp{M} (for ``missing'') and excluded from statistical
 339 calculations.
 340
 341 Currently the WRITE subcommand is ignored.
 342
 343 The FORMAT subcommand controls the characteristics of the
 344 crosstabulation tables to be displayed.  It has a number of possible
 345 settings:
 346
 347 @itemize @bullet
 348 @item
 349 TABLES, the default, causes crosstabulation tables to be output.
 350 NOTABLES suppresses them.
 351
 352 @item
 353 LABELS, the default, allows variable labels and value labels to appear
 354 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
 355 labels but suppresses value labels.
 356
 357 @item
 358 PIVOT, the default, causes each TABLES subcommand to be displayed in a
 359 pivot table format.  NOPIVOT causes the old-style crosstabulation format
 360 to be used.
 361
 362 @item
 363 AVALUE, the default, causes values to be sorted in ascending order.
 364 DVALUE asserts a descending sort order.
 365
 366 @item
 367 INDEX/NOINDEX is currently ignored.
 368
 369 @item
 370 BOX/NOBOX is currently ignored.
 371 @end itemize
 372
 373 The CELLS subcommand controls the contents of each cell in the displayed
 374 crosstabulation table.  The possible settings are:
 375
 376 @table @asis
 377 @item COUNT
 378 Frequency count.
 379 @item ROW
 380 Row percent.
 381 @item COLUMN
 382 Column percent.
 383 @item TOTAL
 384 Table percent.
 385 @item EXPECTED
 386 Expected value.
 387 @item RESIDUAL
 388 Residual.
 389 @item SRESIDUAL
 390 Standardized residual.
 391 @item ASRESIDUAL
 392 Adjusted standardized residual.
 393 @item ALL
 394 All of the above.
 395 @item NONE
 396 Suppress cells entirely.
 397 @end table
 398
 399 @samp{/CELLS} without any settings specified requests COUNT, ROW,
 400 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
 401 will be selected.
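
As an illustration with hypothetical variables, and assuming that
settings may be separated by spaces, this command shows the observed
count, the expected value, and the residual in each cell:

@example
CROSSTABS /TABLES=gender BY region
        /CELLS=COUNT EXPECTED RESIDUAL.
@end example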
 402
 403 The STATISTICS subcommand selects statistics for computation:
 404
 405 @table @asis
 406 @item CHISQ
 407 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
 408 correction, linear-by-linear association.
 409 @item PHI
 410 Phi.
 411 @item CC
 412 Contingency coefficient.
 413 @item LAMBDA
 414 Lambda.
 415 @item UC
 416 Uncertainty coefficient.
 417 @item BTAU
 418 Tau-b.
 419 @item CTAU
 420 Tau-c.
 421 @item RISK
 422 Risk estimate.
 423 @item GAMMA
 424 Gamma.
 425 @item D
 426 Somers' D.
 427 @item KAPPA
 428 Cohen's Kappa.
 429 @item ETA
 430 Eta.
 431 @item CORR
 432 Spearman correlation, Pearson's r.
 433 @item ALL
 434 All of the above.
 435 @item NONE
 436 No statistics.
 437 @end table
 438
 439 Selected statistics are only calculated when appropriate for the
 440 statistic.  Certain statistics require tables of a particular size, and
 441 some statistics are calculated only in integer mode.
 442
 443 @samp{/STATISTICS} without any settings selects CHISQ.  If the
 444 STATISTICS subcommand is not given, no statistics are calculated.
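
A short sketch, again with hypothetical variable names and
space-separated settings, that requests the chi-square tests and phi
for a single table:

@example
CROSSTABS /TABLES=gender BY smoker
        /STATISTICS=CHISQ PHI.
@end example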
 445
@strong{Please note:} Currently the implementation of CROSSTABS has the
following bugs:
 448
 449 @itemize @bullet
 450 @item
 451 Pearson's R (but not Spearman) is off a little.
 452 @item
 453 T values for Spearman's R and Pearson's R are wrong.
 454 @item
 455 Significance of symmetric and directional measures is not calculated.
 456 @item
 457 Asymmetric ASEs and T values for lambda are wrong.
 458 @item
 459 ASE of Goodman and Kruskal's tau is not calculated.
 460 @item
ASE of symmetric Somers' D is wrong.
 462 @item
 463 Approximate T of uncertainty coefficient is wrong.
 464 @end itemize
 465
 466 Fixes for any of these deficiencies would be welcomed.
 467
 468 @node T-TEST, ONEWAY, CROSSTABS, Statistics
 469 @comment  node-name,  next,  previous,  up
 470 @section T-TEST
 471
 472 @vindex T-TEST
 473 @display
 474 T-TEST
 475         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 476         /CRITERIA=CIN(confidence)
 477
 478
 479 (One Sample mode.)
 480         TESTVAL=test_value
 481         /VARIABLES=var_list
 482
 483
 484 (Independent Samples mode.)
 485         GROUPS=var(value1 [, value2])
 486         /VARIABLES=var_list
 487
 488
 489 (Paired Samples mode.)
 490         PAIRS=var_list [WITH var_list [(PAIRED)] ]
 491
 492 @end display
 493
 494
 495 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
 496 means.
 497 It operates in one of three modes:
 498 @itemize
 499 @item One Sample mode.
@item Independent Samples mode.
@item Paired Samples mode.
 502 @end itemize
 503
 504 @noindent
Each of these modes is described in more detail below.
 506 There are two optional subcommands which are common to all modes.
 507
The @cmd{/CRITERIA} subcommand tells PSPP the confidence level to use
for the confidence intervals in the tests.  The default value is 0.95.
 510
 511
The @cmd{MISSING} subcommand determines the handling of missing
values.
If INCLUDE is set, then user-missing values are included in the
calculations, but system-missing values are not.
If EXCLUDE is set, which is the default, user-missing
values are excluded as well as system-missing values.
 519
 520 If LISTWISE is set, then the entire case is excluded from analysis
 521 whenever any variable  specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
 522 @cmd{/GROUPS} subcommands contains a missing value.
 523 If ANALYSIS is set, then missing values are excluded only in the analysis for
 524 which they would be needed. This is the default.
 525
 526
 527 @menu
 528 * One Sample Mode::             Testing against a hypothesised mean
 529 * Independent Samples Mode::    Testing two independent groups for equal mean
 530 * Paired Samples Mode::         Testing two interdependent groups for equal mean
 531 @end menu
 532
 533 @node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
 534 @subsection One Sample Mode
 535
 536 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
 537 This mode is used to test a population mean against a hypothesised
 538 mean.
 539 The value given to the @cmd{TESTVAL} subcommand is the value against
 540 which you wish to test.
 541 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 542 tell PSPP which variables you wish to test.
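
A minimal sketch of One Sample mode, assuming a hypothetical variable
@code{iq} and assuming that the optional @cmd{/CRITERIA} subcommand may
follow the mode-specific subcommands.  It tests the mean of @code{iq}
against 100 and uses a 0.99 confidence level instead of the default:

@example
T-TEST TESTVAL=100
        /VARIABLES=iq
        /CRITERIA=CIN(0.99).
@end example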
 543
 544 @node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
 545 @comment  node-name,  next,  previous,  up
 546 @subsection Independent Samples Mode
 547
 548 The @cmd{GROUPS} subcommand invokes Independent Samples mode or
 549 `Groups' mode.
 550 This mode is used to test whether two groups of values have the
 551 same population mean.
 552 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 553 tell PSPP the dependent variables you wish to test.
 554
 555 The variable given in the @cmd{GROUPS} subcommand is the independent
 556 variable which determines to which group the samples belong.
 557 The values in parentheses are the specific values of the independent
 558 variable for each group.
 559 If the parentheses are omitted and no values are given, the default values
 560 of 1.0 and 2.0 are assumed.
 561
 562 If the independent variable is numeric,
 563 it is acceptable to specify only one value inside the parentheses.
 564 If you do this, cases where the independent variable is
 565 less than  or equal to this value belong to the first group, and cases
 566 greater than this value belong to the second group.
 567 When using this form of the @cmd{GROUPS} subcommand, missing values in
 568 the independent variable are excluded on a listwise basis, regardless
 569 of whether @cmd{/MISSING=LISTWISE} was specified.
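
Both forms of @cmd{GROUPS} can be sketched as follows; all variable
names are hypothetical.  The first command compares cases where
@code{gender} is 1 with cases where it is 2.  The second gives a single
value, so cases with @code{age} less than or equal to 40 form the first
group and cases with @code{age} greater than 40 form the second:

@example
T-TEST GROUPS=gender(1,2)
        /VARIABLES=height.

T-TEST GROUPS=age(40)
        /VARIABLES=height.
@end example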
 570
 571
 572 @node Paired Samples Mode,  , Independent Samples Mode, T-TEST
 573 @comment  node-name,  next,  previous,  up
 574 @subsection Paired Samples Mode
 575
 576 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
 577 Use this mode when repeated measures have been taken from the same
 578 samples.
If the @code{WITH} keyword is omitted, then tables for all
 580 combinations of variables given in the @cmd{PAIRS} subcommand are
 581 generated.
 582 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 583 is also given, then the number of variables preceding @code{WITH}
 584 must be the same as the number following it.
 585 In this case, tables for each respective pair of variables are
 586 generated.
If the @code{WITH} keyword is given but the
@code{(PAIRED)} keyword is omitted, then tables are generated for each
combination of a variable preceding @code{WITH} with a variable
following @code{WITH}.
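
To illustrate the difference, assume four hypothetical variables,
@code{pre1}, @code{pre2}, @code{post1}, and @code{post2}.  The first
command below produces tables for the two respective pairs,
@code{pre1} with @code{post1} and @code{pre2} with @code{post2}.  The
second omits @code{(PAIRED)}, so it produces tables for all four
combinations of a variable before @code{WITH} and a variable after it:

@example
T-TEST PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).

T-TEST PAIRS=pre1 pre2 WITH post1 post2.
@end example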
 591
 592
 593 @node ONEWAY, , T-TEST, Statistics
 594 @comment  node-name,  next,  previous,  up
@section ONEWAY
 596
 597 @vindex ONEWAY
 598 @cindex analysis of variance
 599 @cindex ANOVA
 600
 601 @display
 602 ONEWAY
 603         [/VARIABLES = ] var_list BY var
 604         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 605         /CONTRASTS= value1 [, value2] ... [,valueN]
 606         /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
 607
 608 @end display
 609
 610 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
 611 variables factored by a single independent variable.
 612 It is used to compare the means of a population
 613 divided into more than two groups.
 614
 615 The  variables to be analysed should be given in the @code{VARIABLES}
 616 subcommand.
 617 The list of variables must be followed by the @code{BY} keyword and
 618 the name of the independent (or factor) variable.
 619
 620 You can use the @code{STATISTICS} subcommand to tell PSPP to display
 621 ancilliary information.  The options accepted are:
 622 @itemize
 623 @item DESCRIPTIVES
 624 Displays descriptive statistics about the groups factored by the independent
 625 variable.
 626 @item HOMOGENEITY
 627 Displays the Levene test of Homogeneity of Variance for the
 628 variables and their groups.
 629 @end itemize
 630
 631 The @code{CONTRASTS} subcommand is used when you anticipate certain
 632 differences between the groups.
 633 The subcommand must be followed by a list of numerals which are the
 634 coefficients of the groups to be tested.
 635 The number of coefficients must correspond to the number of distinct
 636 groups (or values of the independent variable).
 637 If the total sum of the coefficients are not zero, then PSPP will
 638 display a warning, but will proceed with the analysis.
 639 The @code{CONTRASTS} subcommand may be given up to 10 times in order
 640 to specify different contrast tests.
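
The following is a sketch only.  It assumes a hypothetical dependent
variable @code{yield} and a factor @code{fertilizer} that takes exactly
three distinct values, so each contrast needs three coefficients.  The
first contrast compares the first group with the third; the second
compares the middle group with the other two.  Each set of
coefficients sums to zero (@math{-1 + 0 + 1 = 0} and
@math{1 - 2 + 1 = 0}), so no warning is expected:

@example
ONEWAY /VARIABLES=yield BY fertilizer
        /STATISTICS=DESCRIPTIVES HOMOGENEITY
        /CONTRASTS=-1, 0, 1
        /CONTRASTS=1, -2, 1.
@end example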
 641 @setfilename ignored