doc/statistics.texi

   1 @node Statistics, Utilities, Conditionals and Looping, Top
   2 @chapter Statistics
   3
   4 This chapter documents the statistical procedures that PSPP supports so
   5 far.
   6
   7 @c If you add any new commands, then don't forget to remove the entry in
   8 @c not-implemented.texi
   9
  10 @menu
  11 * DESCRIPTIVES::                Descriptive statistics.
  12 * FREQUENCIES::                 Frequency tables.
  13 * EXAMINE::                     Testing data for normality.
  14 * CROSSTABS::                   Crosstabulation tables.
  15 * T-TEST::                      Test hypotheses about means.
  16 * ONEWAY::                      One way analysis of variance.
  17 * RANK::                        Compute rank scores.
  18 * REGRESSION::                  Linear regression.
  19 @end menu
  20
  21 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
  22 @section DESCRIPTIVES
  23
  24 @vindex DESCRIPTIVES
  25 @display
  26 DESCRIPTIVES
  27         /VARIABLES=var_list
  28         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
  29         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
  30         /SAVE
  31         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
  32                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
  33                      SESKEWNESS,SEKURTOSIS@}
  34         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
  35                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
  36               @{A,D@}
  37 @end display
  38
  39 The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
  40 descriptive
  41 statistics requested by the user.  In addition, it can optionally
  42 compute Z-scores.
  43
  44 The VARIABLES subcommand, which is required, specifies the list of
  45 variables to be analyzed.  Keyword VARIABLES is optional.
  46
  47 All other subcommands are optional:
  48
The MISSING subcommand determines the handling of missing values.  If
INCLUDE is set, then user-missing values are included in the
calculations.  If NOINCLUDE is set, which is the default, user-missing
values are excluded.  If VARIABLE is set, then missing values are
excluded on a variable by variable basis; if LISTWISE is set, then
the entire case is excluded whenever any value in that case has a
system-missing or, unless INCLUDE is set, user-missing value.
  56
The FORMAT subcommand affects the output format.  Currently the
LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
set, the numbers of both valid and missing cases are listed in the
output; when LINE is set, only the number of valid cases is listed.
  61
  62 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  63 the specified variables.  The Z scores are saved to new variables.
  64 Variable names are generated by trying first the original variable name
  65 with Z prepended and truncated to a maximum of 8 characters, then the
  66 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  67 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
  68 variable names can be specified explicitly on VARIABLES in the variable
  69 list by enclosing them in parentheses after each variable.
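
For instance, assuming hypothetical numeric variables @code{height} and
@code{weight} in the active file, a command along the following lines
would save their Z scores under the explicitly chosen names @code{zh}
and @code{zw}:

@example
DESCRIPTIVES /VARIABLES=height (zh) weight (zw)
        /SAVE.
@end example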
  70
  71 The STATISTICS subcommand specifies the statistics to be displayed:
  72
  73 @table @code
  74 @item ALL
  75 All of the statistics below.
  76 @item MEAN
  77 Arithmetic mean.
  78 @item SEMEAN
  79 Standard error of the mean.
  80 @item STDDEV
  81 Standard deviation.
  82 @item VARIANCE
  83 Variance.
  84 @item KURTOSIS
  85 Kurtosis and standard error of the kurtosis.
  86 @item SKEWNESS
  87 Skewness and standard error of the skewness.
  88 @item RANGE
  89 Range.
  90 @item MINIMUM
  91 Minimum value.
  92 @item MAXIMUM
  93 Maximum value.
  94 @item SUM
  95 Sum.
  96 @item DEFAULT
Mean, standard deviation, minimum, maximum.
  98 @item SEKURTOSIS
  99 Standard error of the kurtosis.
 100 @item SESKEWNESS
 101 Standard error of the skewness.
 102 @end table
 103
The SORT subcommand specifies how the variables should be ordered in the
output.  Most of the possible values name the statistic to sort on.
NAME causes the variables to be sorted by name.  By default, the
variables are listed in the order that they are specified on the
VARIABLES subcommand.  The A and D settings request an ascending or
descending sort order, respectively.
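
As an illustrative sketch that combines the subcommands described above
(the variable names @code{height} and @code{weight} are hypothetical),
the following command requests four statistics, excludes incomplete
cases listwise, and lists the variables by name:

@example
DESCRIPTIVES /VARIABLES=height weight
        /MISSING=LISTWISE
        /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
        /SORT=NAME.
@end example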
 110
 111 @node FREQUENCIES, EXAMINE, DESCRIPTIVES, Statistics
 112 @section FREQUENCIES
 113
 114 @vindex FREQUENCIES
 115 @display
 116 FREQUENCIES
 117         /VARIABLES=var_list
 118         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
 119                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
 120                 @{LABELS,NOLABELS@}
 121                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
 122                 @{SINGLE,DOUBLE@}
 123                 @{OLDPAGE,NEWPAGE@}
 124         /MISSING=@{EXCLUDE,INCLUDE@}
 125         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
 126                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
 127                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
 128         /NTILES=ntiles
 129         /PERCENTILES=percent@dots{}
 130
 131 (These options are not currently implemented.)
 132         /BARCHART=@dots{}
 133         /HISTOGRAM=@dots{}
 134         /HBAR=@dots{}
 135         /GROUPED=@dots{}
 136
 137 (Integer mode.)
 138         /VARIABLES=var_list (low,high)@dots{}
 139 @end display
 140
 141 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
 142 variables.
 143 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
 144 (including median and mode) and percentiles.
 145
 146 In the future, @cmd{FREQUENCIES} will also support graphical output in the
 147 form of bar charts and histograms.  In addition, it will be able to
 148 support percentiles for grouped data.
 149
 150 The VARIABLES subcommand is the only required subcommand.  Specify the
 151 variables to be analyzed.  In most cases, this is all that is required.
 152 This is known as @dfn{general mode}.
 153
Occasionally, one may want to invoke a special mode called @dfn{integer
mode}.  Normally, in general mode, PSPP will automatically determine
what values occur in the data.  In integer mode, the user specifies the
range of values that the data assume.  To invoke this mode, specify a
range of data values in parentheses, separated by a comma.  Data values
inside the range are truncated to integers and tallied under the
resulting value.  Values outside this range are discarded.
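
As a brief sketch of integer mode, assuming a hypothetical numeric
variable @code{score} whose values of interest lie between 1 and 10:

@example
FREQUENCIES /VARIABLES=score (1,10).
@end example

Going by the description above, a value such as 3.7 would be counted
under 3, and a value such as 12 would be discarded.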
 161
 162 The FORMAT subcommand controls the output format.  It has several
 163 possible settings:
 164
 165 @itemize @bullet
 166 @item
 167 TABLE, the default, causes a frequency table to be output for every
 168 variable specified.  NOTABLE prevents them from being output.  LIMIT
 169 with a numeric argument causes them to be output except when there are
 170 more than the specified number of values in the table.
 171
 172 @item
STANDARD frequency tables contain more complete information, but also
take up more space on the printed page.  CONDENSE frequency tables are
less informative but take up less space.  ONEPAGE with a numeric
argument will output standard frequency tables if there are the
specified number of values or fewer, condensed tables otherwise.  ONEPAGE
without an argument defaults to a threshold of 50 values.
 179
 180 @item
LABELS causes value labels to be displayed in STANDARD frequency
tables.  NOLABELS prevents this.
 183
 184 @item
 185 Normally frequency tables are sorted in ascending order by value.  This
 186 is AVALUE.  DVALUE tables are sorted in descending order by value.
 187 AFREQ and DFREQ tables are sorted in ascending and descending order,
 188 respectively, by frequency count.
 189
 190 @item
 191 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
 192 frequency tables have wider spacing.
 193
 194 @item
 195 OLDPAGE and NEWPAGE are not currently used.
 196 @end itemize
 197
The MISSING subcommand controls the handling of user-missing values.
When EXCLUDE, the default, is set, user-missing values are not included
in frequency tables or statistics.  When INCLUDE is set, user-missing
values are included.  System-missing values are never included in
statistics, but are listed in frequency tables.
 203
The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
value, and MODE, the mode.  (If there are multiple modes, the smallest
value is reported.)  By default, the mean, standard deviation,
minimum, and maximum are reported for each variable.
 209
PERCENTILES causes the specified percentiles to be reported.
The percentiles should be presented as a list of numbers between 0
and 100 inclusive.
The NTILES subcommand causes the percentiles to be reported at the
boundaries of the data set divided into the specified number of ranges.
For instance, @code{/NTILES=4} would cause quartiles to be reported.
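
A minimal sketch, assuming a hypothetical numeric variable
@code{income}, that suppresses the frequency table and reports the
median, the mode, the 10th and 90th percentiles, and the quartiles:

@example
FREQUENCIES /VARIABLES=income
        /FORMAT=NOTABLE
        /STATISTICS=MEDIAN MODE
        /PERCENTILES=10 90
        /NTILES=4.
@end example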
 216
 217
 218 @node EXAMINE, CROSSTABS, FREQUENCIES, Statistics
 219 @comment  node-name,  next,  previous,  up
 220 @section EXAMINE
 221 @vindex EXAMINE
 222
 223 @cindex Normality, testing for
 224
 225 @display
 226 EXAMINE
 227         VARIABLES=var_list [BY factor_list ]
 228         /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
 229         /PLOT=@{STEMLEAF, BOXPLOT, NPPLOT, SPREADLEVEL(n), HISTOGRAM,
 230                ALL, NONE@}
 231         /CINTERVAL n
 232         /COMPARE=@{GROUPS,VARIABLES@}
 233         /ID=@{case_number, var_name@}
 234         /@{TOTAL,NOTOTAL@}
 235         /PERCENTILE=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
 236         /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
 237                 [@{NOREPORT,REPORT@}]
 238
 239 @end display
 240
The @cmd{EXAMINE} command is used to test how closely a distribution
follows a normal distribution.  It also shows you outliers and extreme
values.
 243
The VARIABLES subcommand specifies the dependent variables and the
independent variables to use as factors for the analysis.  Variables
listed before the first BY keyword are the dependent variables.
The dependent variables may optionally be followed by a list of
factors which tell PSPP how to break down the analysis for each
dependent variable.  The format for each factor is
 250 @display
 251 var [BY var].
 252 @end display
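
For example, assuming hypothetical dependent variables @code{height}
and @code{weight} and a hypothetical factor @code{gender}, a command of
the following shape would break the analysis of each dependent variable
down by @code{gender} and show the three highest and three lowest
values:

@example
EXAMINE VARIABLES=height weight BY gender
        /STATISTICS=DESCRIPTIVES EXTREME(3).
@end example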
 253
 254
The STATISTICS subcommand specifies the analysis to be done.
DESCRIPTIVES will produce a table showing some parametric and
non-parametric statistics.  EXTREME produces a table showing extreme
values of the dependent variable.  A number in parentheses determines
how many upper and lower extremes to show.  The default number is 5.
 260
 261
 262 The PLOT subcommand specifies which plots are to be produced if any.
 263
The COMPARE subcommand is only relevant if producing boxplots, and it is only
useful if there is more than one dependent variable and at least one factor.  If
/COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
containing boxplots for all the factors.
If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
each containing one boxplot per dependent variable.
If the /COMPARE subcommand is omitted, then PSPP uses the default value of
/COMPARE=GROUPS.
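
As a sketch with hypothetical variables (@code{height} and
@code{weight} as dependent variables, @code{gender} as the factor), the
following command requests boxplots grouped so that the dependent
variables appear side by side:

@example
EXAMINE VARIABLES=height weight BY gender
        /PLOT=BOXPLOT
        /COMPARE=VARIABLES.
@end example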
 272
The CINTERVAL subcommand specifies the confidence interval to use when
calculating the descriptive statistics.  The default is 95%.
 275
 276 The PERCENTILES subcommand specifies which percentiles are to be calculated,
 277 and which algorithm to use for calculating them.  The default is to
 278 calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
 279 HAVERAGE algorithm.
 280
The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
is given and factors have been specified in the VARIABLES subcommand,
then statistics for the unfactored dependent variables are
produced in addition to the factored variables.  If there are no
factors specified then TOTAL and NOTOTAL have no effect.
 286
 287 @strong{Warning!}
If many dependent variables are given, or factors are given for which
 289 there are many distinct values, then @cmd{EXAMINE} will produce a very
 290 large quantity of output.
 291
 292
 293 @node CROSSTABS, T-TEST, EXAMINE, Statistics
 294 @section CROSSTABS
 295
 296 @vindex CROSSTABS
 297 @display
 298 CROSSTABS
 299         /TABLES=var_list BY var_list [BY var_list]@dots{}
 300         /MISSING=@{TABLE,INCLUDE,REPORT@}
 301         /WRITE=@{NONE,CELLS,ALL@}
 302         /FORMAT=@{TABLES,NOTABLES@}
 303                 @{LABELS,NOLABELS,NOVALLABS@}
 304                 @{PIVOT,NOPIVOT@}
 305                 @{AVALUE,DVALUE@}
 306                 @{NOINDEX,INDEX@}
 307                 @{BOX,NOBOX@}
 308         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
 309                 ASRESIDUAL,ALL,NONE@}
 310         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
 311                      KAPPA,ETA,CORR,ALL,NONE@}
 312
 313 (Integer mode.)
 314         /VARIABLES=var_list (low,high)@dots{}
 315 @end display
 316
 317 The @cmd{CROSSTABS} procedure displays crosstabulation
 318 tables requested by the user.  It can calculate several statistics for
 319 each cell in the crosstabulation tables.  In addition, a number of
 320 statistics can be calculated for each table itself.
 321
 322 The TABLES subcommand is used to specify the tables to be reported.  Any
 323 number of dimensions is permitted, and any number of variables per
 324 dimension is allowed.  The TABLES subcommand may be repeated as many
 325 times as needed.  This is the only required subcommand in @dfn{general
 326 mode}.
 327
Occasionally, one may want to invoke a special mode called @dfn{integer
mode}.  Normally, in general mode, PSPP automatically determines
what values occur in the data.  In integer mode, the user specifies the
range of values that the data assume.  To invoke this mode, specify the
VARIABLES subcommand, giving a range of data values in parentheses for
each variable to be used on the TABLES subcommand.  Data values inside
the range are truncated to integers and tallied under the resulting
value.  Values outside this range are discarded.  When it
is present, the VARIABLES subcommand must precede the TABLES
subcommand.
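
A sketch of integer mode, assuming hypothetical numeric variables
@code{gender} (coded 1 to 2) and @code{region} (coded 1 to 4).  Note
that VARIABLES comes before TABLES, as required above:

@example
CROSSTABS /VARIABLES=gender (1,2) region (1,4)
        /TABLES=gender BY region.
@end example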
 338
 339 In general mode, numeric and string variables may be specified on
 340 TABLES.  Although long string variables are allowed, only their
 341 initial short-string parts are used.  In integer mode, only numeric
 342 variables are allowed.
 343
 344 The MISSING subcommand determines the handling of user-missing values.
 345 When set to TABLE, the default, missing values are dropped on a table by
 346 table basis.  When set to INCLUDE, user-missing values are included in
 347 tables and statistics.  When set to REPORT, which is allowed only in
 348 integer mode, user-missing values are included in tables but marked with
 349 an @samp{M} (for ``missing'') and excluded from statistical
 350 calculations.
 351
 352 Currently the WRITE subcommand is ignored.
 353
 354 The FORMAT subcommand controls the characteristics of the
 355 crosstabulation tables to be displayed.  It has a number of possible
 356 settings:
 357
 358 @itemize @bullet
 359 @item
 360 TABLES, the default, causes crosstabulation tables to be output.
 361 NOTABLES suppresses them.
 362
 363 @item
 364 LABELS, the default, allows variable labels and value labels to appear
 365 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
 366 labels but suppresses value labels.
 367
 368 @item
 369 PIVOT, the default, causes each TABLES subcommand to be displayed in a
 370 pivot table format.  NOPIVOT causes the old-style crosstabulation format
 371 to be used.
 372
 373 @item
 374 AVALUE, the default, causes values to be sorted in ascending order.
 375 DVALUE asserts a descending sort order.
 376
 377 @item
 378 INDEX/NOINDEX is currently ignored.
 379
 380 @item
 381 BOX/NOBOX is currently ignored.
 382 @end itemize
 383
 384 The CELLS subcommand controls the contents of each cell in the displayed
 385 crosstabulation table.  The possible settings are:
 386
 387 @table @asis
 388 @item COUNT
 389 Frequency count.
 390 @item ROW
 391 Row percent.
 392 @item COLUMN
 393 Column percent.
 394 @item TOTAL
 395 Table percent.
 396 @item EXPECTED
 397 Expected value.
 398 @item RESIDUAL
 399 Residual.
 400 @item SRESIDUAL
 401 Standardized residual.
 402 @item ASRESIDUAL
 403 Adjusted standardized residual.
 404 @item ALL
 405 All of the above.
 406 @item NONE
 407 Suppress cells entirely.
 408 @end table
 409
 410 @samp{/CELLS} without any settings specified requests COUNT, ROW,
 411 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
 412 will be selected.
 413
 414 The STATISTICS subcommand selects statistics for computation:
 415
 416 @table @asis
 417 @item CHISQ
 418 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
 419 correction, linear-by-linear association.
 420 @item PHI
 421 Phi.
 422 @item CC
 423 Contingency coefficient.
 424 @item LAMBDA
 425 Lambda.
 426 @item UC
 427 Uncertainty coefficient.
 428 @item BTAU
 429 Tau-b.
 430 @item CTAU
 431 Tau-c.
 432 @item RISK
 433 Risk estimate.
 434 @item GAMMA
 435 Gamma.
 436 @item D
 437 Somers' D.
 438 @item KAPPA
 439 Cohen's Kappa.
 440 @item ETA
 441 Eta.
 442 @item CORR
 443 Spearman correlation, Pearson's r.
 444 @item ALL
 445 All of the above.
 446 @item NONE
 447 No statistics.
 448 @end table
 449
 450 Selected statistics are only calculated when appropriate for the
 451 statistic.  Certain statistics require tables of a particular size, and
 452 some statistics are calculated only in integer mode.
 453
 454 @samp{/STATISTICS} without any settings selects CHISQ.  If the
 455 STATISTICS subcommand is not given, no statistics are calculated.
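
By way of illustration, and again assuming hypothetical variables
@code{gender} and @code{region}, the following general-mode command
shows counts with row and column percentages in each cell and requests
the chi-square statistics and phi for the table:

@example
CROSSTABS /TABLES=gender BY region
        /CELLS=COUNT ROW COLUMN
        /STATISTICS=CHISQ PHI.
@end example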
 456
@strong{Please note:} Currently the implementation of CROSSTABS has the
following bugs:
 459
 460 @itemize @bullet
 461 @item
 462 Pearson's R (but not Spearman) is off a little.
 463 @item
 464 T values for Spearman's R and Pearson's R are wrong.
 465 @item
 466 Significance of symmetric and directional measures is not calculated.
 467 @item
 468 Asymmetric ASEs and T values for lambda are wrong.
 469 @item
 470 ASE of Goodman and Kruskal's tau is not calculated.
 471 @item
ASE of symmetric Somers' d is wrong.
 473 @item
 474 Approximate T of uncertainty coefficient is wrong.
 475 @end itemize
 476
 477 Fixes for any of these deficiencies would be welcomed.
 478
 479 @node T-TEST, ONEWAY, CROSSTABS, Statistics
 480 @comment  node-name,  next,  previous,  up
 481 @section T-TEST
 482
 483 @vindex T-TEST
 484 @display
 485 T-TEST
 486         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 487         /CRITERIA=CIN(confidence)
 488
 489
 490 (One Sample mode.)
 491         TESTVAL=test_value
 492         /VARIABLES=var_list
 493
 494
 495 (Independent Samples mode.)
 496         GROUPS=var(value1 [, value2])
 497         /VARIABLES=var_list
 498
 499
 500 (Paired Samples mode.)
 501         PAIRS=var_list [WITH var_list [(PAIRED)] ]
 502
 503 @end display
 504
 505
 506 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
 507 means.
 508 It operates in one of three modes:
 509 @itemize
@item One Sample mode.
@item Independent Samples mode.
@item Paired Samples mode.
 513 @end itemize
 514
 515 @noindent
Each of these modes is described in more detail below.
 517 There are two optional subcommands which are common to all modes.
 518
The @cmd{/CRITERIA} subcommand tells PSPP the confidence level to use
for the confidence intervals reported by the tests, expressed as a
proportion.  The default value is 0.95.
 521
 522
The @cmd{MISSING} subcommand determines the handling of missing
values.
If INCLUDE is set, then user-missing values are included in the
calculations, but system-missing values are not.
If EXCLUDE is set, which is the default, user-missing
values are excluded as well as system-missing values.
 530
 531 If LISTWISE is set, then the entire case is excluded from analysis
 532 whenever any variable  specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
 533 @cmd{/GROUPS} subcommands contains a missing value.
 534 If ANALYSIS is set, then missing values are excluded only in the analysis for
 535 which they would be needed. This is the default.
 536
 537
 538 @menu
 539 * One Sample Mode::             Testing against a hypothesised mean
 540 * Independent Samples Mode::    Testing two independent groups for equal mean
 541 * Paired Samples Mode::         Testing two interdependent groups for equal mean
 542 @end menu
 543
 544 @node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
 545 @subsection One Sample Mode
 546
 547 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
 548 This mode is used to test a population mean against a hypothesised
 549 mean.
 550 The value given to the @cmd{TESTVAL} subcommand is the value against
 551 which you wish to test.
 552 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 553 tell PSPP which variables you wish to test.
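
For instance, to test whether the mean of a hypothetical variable
@code{height} differs from an assumed value of 170, one might write:

@example
T-TEST TESTVAL=170
        /VARIABLES=height.
@end example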
 554
 555 @node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
 556 @comment  node-name,  next,  previous,  up
 557 @subsection Independent Samples Mode
 558
 559 The @cmd{GROUPS} subcommand invokes Independent Samples mode or
 560 `Groups' mode.
 561 This mode is used to test whether two groups of values have the
 562 same population mean.
 563 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 564 tell PSPP the dependent variables you wish to test.
 565
 566 The variable given in the @cmd{GROUPS} subcommand is the independent
 567 variable which determines to which group the samples belong.
 568 The values in parentheses are the specific values of the independent
 569 variable for each group.
 570 If the parentheses are omitted and no values are given, the default values
 571 of 1.0 and 2.0 are assumed.
 572
 573 If the independent variable is numeric,
 574 it is acceptable to specify only one value inside the parentheses.
 575 If you do this, cases where the independent variable is
 576 less than  or equal to this value belong to the first group, and cases
 577 greater than this value belong to the second group.
 578 When using this form of the @cmd{GROUPS} subcommand, missing values in
 579 the independent variable are excluded on a listwise basis, regardless
 580 of whether @cmd{/MISSING=LISTWISE} was specified.
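
The two forms of @cmd{GROUPS} can be sketched as follows, using
hypothetical variables.  The first command compares the cases where
@code{gender} is 1 with those where it is 2; the second splits the
cases at a single cut point, so that cases with @code{age} of 40 or
less form the first group and the rest form the second:

@example
T-TEST GROUPS=gender(1, 2)
        /VARIABLES=height weight.

T-TEST GROUPS=age(40)
        /VARIABLES=income.
@end example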
 581
 582
 583 @node Paired Samples Mode,  , Independent Samples Mode, T-TEST
 584 @comment  node-name,  next,  previous,  up
 585 @subsection Paired Samples Mode
 586
 587 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
 588 Use this mode when repeated measures have been taken from the same
 589 samples.
If the @code{WITH} keyword is omitted, then tables for all
 591 combinations of variables given in the @cmd{PAIRS} subcommand are
 592 generated.
 593 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 594 is also given, then the number of variables preceding @code{WITH}
 595 must be the same as the number following it.
 596 In this case, tables for each respective pair of variables are
 597 generated.
 598 In the event that the @code{WITH} keyword is given, but the
 599 @code{(PAIRED)} keyword is omitted, then tables for each combination
 600 of variable preceding @code{WITH} against variable following
 601 @code{WITH} are generated.
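
As a sketch with hypothetical variables, the following command pairs
the variables by position, producing one table for @code{pre1} with
@code{post1} and one for @code{pre2} with @code{post2}:

@example
T-TEST PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).
@end example

Without @code{(PAIRED)}, the same variable lists would instead yield a
table for each of the four combinations of a variable before
@code{WITH} and a variable after it.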
 602
 603
 604 @node ONEWAY, RANK, T-TEST, Statistics
 605 @comment  node-name,  next,  previous,  up
 606 @section ONEWAY
 607
 608 @vindex ONEWAY
 609 @cindex analysis of variance
 610 @cindex ANOVA
 611
 612 @display
 613 ONEWAY
 614         [/VARIABLES = ] var_list BY var
 615         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 616         /CONTRASTS= value1 [, value2] ... [,valueN]
 617         /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
 618
 619 @end display
 620
 621 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
 622 variables factored by a single independent variable.
 623 It is used to compare the means of a population
 624 divided into more than two groups.
 625
 626 The  variables to be analysed should be given in the @code{VARIABLES}
 627 subcommand.
 628 The list of variables must be followed by the @code{BY} keyword and
 629 the name of the independent (or factor) variable.
 630
You can use the @code{STATISTICS} subcommand to tell PSPP to display
ancillary information.  The options accepted are:
 633 @itemize
 634 @item DESCRIPTIVES
 635 Displays descriptive statistics about the groups factored by the independent
 636 variable.
 637 @item HOMOGENEITY
 638 Displays the Levene test of Homogeneity of Variance for the
 639 variables and their groups.
 640 @end itemize
 641
 642 The @code{CONTRASTS} subcommand is used when you anticipate certain
 643 differences between the groups.
 644 The subcommand must be followed by a list of numerals which are the
 645 coefficients of the groups to be tested.
 646 The number of coefficients must correspond to the number of distinct
 647 groups (or values of the independent variable).
If the sum of the coefficients is not zero, then PSPP will
display a warning, but will proceed with the analysis.
 650 The @code{CONTRASTS} subcommand may be given up to 10 times in order
 651 to specify different contrast tests.
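
A minimal sketch, following the syntax diagram above and assuming a
hypothetical dependent variable @code{score} and a hypothetical factor
@code{group} that takes exactly three distinct values.  The contrast
compares the first group with the third; its coefficients sum to zero,
so no warning is expected:

@example
ONEWAY /VARIABLES=score BY group
        /STATISTICS=DESCRIPTIVES HOMOGENEITY
        /CONTRASTS=-1, 0, 1.
@end example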
 652 @setfilename ignored
 653
 654 @node RANK, REGRESSION, ONEWAY, Statistics
 655 @comment  node-name,  next,  previous,  up
 656 @section RANK
 657
 658 @vindex RANK
 659 @cindex RANK
 660
@display
RANK
        [VARIABLES=] var_list [@{A,D@}] [BY var_list]
        /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
        /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
        /PRINT[=@{YES,NO@}]
        /MISSING=@{EXCLUDE,INCLUDE@}

        /RANK [INTO var_list]
        /NTILES(k) [INTO var_list]
        /NORMAL [INTO var_list]
        /PERCENT [INTO var_list]
        /RFRACTION [INTO var_list]
        /PROPORTION [INTO var_list]
        /N [INTO var_list]
        /SAVAGE [INTO var_list]
@end display
 678
 679 The @cmd{RANK} command ranks variables and stores the results into new
 680 variables.
 681
 682 The VARIABLES subcommand, which is mandatory, specifies one or
 683 more variables whose values are to be ranked.
 684 After each variable, @samp{A} or @samp{D} may appear, indicating that
 685 the variable is to be ranked in ascending or descending order.
 686 Ascending is the default.
 687 If a BY keyword appears, it should be followed by a list of variables
 688 which are to serve as group variables.
 689 In this case, the cases are gathered into groups, and ranks calculated
 690 for each group.
 691
 692 The TIES subcommand specifies how tied values are to be treated.  The
 693 default is to take the mean value of all the tied cases.
 694
The FRACTION subcommand specifies how proportional ranks are to be
calculated.  This only has any effect if the NORMAL or PROPORTION rank
functions are requested.
 698
 699 The PRINT subcommand may be used to specify that a summary of the rank
 700 variables created should appear in the output.
 701
The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
If none are given, then the default is RANK.
The NTILES subcommand must take an integer specifying the number of
partitions into which values should be ranked.
Each subcommand may be followed by the INTO keyword and a list of
variables which are the variables to be created and receive the rank
scores.  At most as many variables may be specified as there are
variables named on the VARIABLES subcommand.  If fewer are specified,
then the remaining variable names are automatically created.
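
As an illustration with hypothetical names, the following command ranks
@code{score} separately within each value of @code{class}, storing the
ranks in a new variable @code{srank} and the quartile membership in a
new variable @code{squart}:

@example
RANK VARIABLES=score BY class
        /RANK INTO srank
        /NTILES(4) INTO squart.
@end example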
 712
The MISSING subcommand determines how user-missing values are to be
treated.  A setting of EXCLUDE means that user-missing values are
excluded from the rank scores.  A setting of INCLUDE means they are
included.  The default is EXCLUDE.
 717
 718 @include regression.texi