@node Statistics, Utilities, Conditionals and Looping, Top
@chapter Statistics

This chapter documents the statistical procedures that PSPP supports so
far.

@menu
* DESCRIPTIVES::                Descriptive statistics.
* FREQUENCIES::                 Frequency tables.
* EXAMINE::                     Testing data for normality.
* CROSSTABS::                   Crosstabulation tables.
* T-TEST::                      Test hypotheses about means.
* ONEWAY::                      One way analysis of variance.
* RANK::                        Compute rank scores.
* REGRESSION::                  Linear regression.
@end menu

@node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
@section DESCRIPTIVES

@vindex DESCRIPTIVES
@display
DESCRIPTIVES
        /VARIABLES=var_list
        /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
        /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
        /SAVE
        /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
                     SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
                     SESKEWNESS,SEKURTOSIS@}
        /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
               RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
              @{A,D@}
@end display

The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
descriptive statistics requested by the user.  In addition, it can
optionally compute Z scores.

The VARIABLES subcommand, which is required, specifies the list of
variables to be analyzed.  Keyword VARIABLES is optional.

All other subcommands are optional:

The MISSING subcommand determines the handling of missing values.  If
INCLUDE is set, then user-missing values are included in the
calculations.  If NOINCLUDE is set, which is the default, user-missing
values are excluded.  If VARIABLE is set, then missing values are
excluded on a variable by variable basis; if LISTWISE is set, then
the entire case is excluded whenever any value in that case has a
system-missing or, unless INCLUDE is set, user-missing value.

The FORMAT subcommand affects the output format.  Currently the
LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
set, both the number of valid cases and the number of missing cases are
listed in the output; when LINE is set, only the number of valid cases
is listed.

The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
the specified variables.  The Z scores are saved to new variables.
Variable names are generated by trying first the original variable name
with Z prepended and truncated to a maximum of 8 characters, then the
names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
variable names can be specified explicitly on VARIABLES in the variable
list by enclosing them in parentheses after each variable.
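
As an illustration of the SAVE subcommand, the following sketch requests
Z scores for two variables.  The variable names @code{height} and
@code{weight}, and the explicit Z score name @code{zht}, are
hypothetical and serve only to show the syntax described above:

@example
* Hypothetical variables: height, weight.  zht is an assumed name.
DESCRIPTIVES
        /VARIABLES=height (zht) weight
        /SAVE.
@end example

Under the naming rules above, the Z scores for @code{height} would be
saved in @code{zht}, while the name for the Z scores of @code{weight}
would be generated by prepending Z to the original name, provided that
the resulting name is not already in use.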

The STATISTICS subcommand specifies the statistics to be displayed:

@table @code
@item ALL
All of the statistics below.
@item MEAN
Arithmetic mean.
@item SEMEAN
Standard error of the mean.
@item STDDEV
Standard deviation.
@item VARIANCE
Variance.
@item KURTOSIS
Kurtosis and standard error of the kurtosis.
@item SKEWNESS
Skewness and standard error of the skewness.
@item RANGE
Range.
@item MINIMUM
Minimum value.
@item MAXIMUM
Maximum value.
@item SUM
Sum.
@item DEFAULT
Mean, standard deviation, minimum, maximum.
@item SEKURTOSIS
Standard error of the kurtosis.
@item SESKEWNESS
Standard error of the skewness.
@end table

The SORT subcommand specifies how the statistics should be sorted.  Most
of the possible values should be self-explanatory.  NAME causes the
statistics to be sorted by variable name.  By default, the statistics
are listed in the order in which the variables are specified on the
VARIABLES subcommand.  The A and D settings request an ascending or
descending sort order, respectively.
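
The following sketch combines several of the subcommands described
above.  The variable names are hypothetical, and the choice of
statistics is arbitrary:

@example
* Hypothetical variables: height, weight, age.
DESCRIPTIVES
        /VARIABLES=height weight age
        /MISSING=LISTWISE
        /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
        /SORT=MEAN.
@end example

Going by the descriptions above, this should report the four requested
statistics for each variable, drop any case that has a missing value in
one of the three variables, and order the output by mean.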

@node FREQUENCIES, EXAMINE, DESCRIPTIVES, Statistics
@section FREQUENCIES

@vindex FREQUENCIES
@display
FREQUENCIES
        /VARIABLES=var_list
        /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
                @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
                @{LABELS,NOLABELS@}
                @{AVALUE,DVALUE,AFREQ,DFREQ@}
                @{SINGLE,DOUBLE@}
                @{OLDPAGE,NEWPAGE@}
        /MISSING=@{EXCLUDE,INCLUDE@}
        /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
                     KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
                     SESKEWNESS,SEKURTOSIS,ALL,NONE@}
        /NTILES=ntiles
        /PERCENTILES=percent@dots{}

(These options are not currently implemented.)
        /BARCHART=@dots{}
        /HISTOGRAM=@dots{}
        /HBAR=@dots{}
        /GROUPED=@dots{}

(Integer mode.)
        /VARIABLES=var_list (low,high)@dots{}
@end display

The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
variables.  @cmd{FREQUENCIES} can also calculate and display descriptive
statistics (including median and mode) and percentiles.

In the future, @cmd{FREQUENCIES} will also support graphical output in the
form of bar charts and histograms.  In addition, it will be able to
support percentiles for grouped data.

The VARIABLES subcommand is the only required subcommand.  Specify the
variables to be analyzed.  In most cases, this is all that is required.
This is known as @dfn{general mode}.

Occasionally, one may want to invoke a special mode called @dfn{integer
mode}.  Normally, in general mode, PSPP will automatically determine
what values occur in the data.  In integer mode, the user specifies the
range of values that the data assumes.  To invoke this mode, follow the
variable list with the low and high ends of the range in parentheses,
separated by a comma.  Data values inside the range are truncated to an
integer, then assigned to that value.  If values occur outside this
range, they are discarded.
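
A minimal sketch of integer mode follows.  The variable name
@code{rating} and its range of 1 to 5 are assumptions made for the
sake of the example:

@example
* Hypothetical variable: rating, assumed to take values from 1 to 5.
FREQUENCIES
        /VARIABLES=rating (1,5).
@end example

According to the rules above, a value such as 3.7 would be truncated
and counted under 3, whereas a value such as 7 would be discarded.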

The FORMAT subcommand controls the output format.  It has several
possible settings:

@itemize @bullet
@item
TABLE, the default, causes a frequency table to be output for every
variable specified.  NOTABLE prevents them from being output.  LIMIT
with a numeric argument causes them to be output except when there are
more than the specified number of values in the table.

@item
STANDARD frequency tables contain more complete information, but also
take up more space on the printed page.  CONDENSE frequency tables are
less informative but take up less space.  ONEPAGE with a numeric
argument will output standard frequency tables if there are the
specified number of values or fewer, condensed tables otherwise.  ONEPAGE
without an argument defaults to a threshold of 50 values.

@item
LABELS causes value labels to be displayed in STANDARD frequency
tables.  NOLABELS prevents this.

@item
Normally frequency tables are sorted in ascending order by value.  This
is AVALUE.  DVALUE tables are sorted in descending order by value.
AFREQ and DFREQ tables are sorted in ascending and descending order,
respectively, by frequency count.

@item
SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
frequency tables have wider spacing.

@item
OLDPAGE and NEWPAGE are not currently used.
@end itemize

The MISSING subcommand controls the handling of user-missing values.
When EXCLUDE, the default, is set, user-missing values are not included
in frequency tables or statistics.  When INCLUDE is set, user-missing
values are included.  System-missing values are never included in
statistics, but are listed in frequency tables.

The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
value, and MODE, the mode.  (If there are multiple modes, the smallest
value is reported.)  By default, the mean, standard deviation,
minimum, and maximum are reported for each variable.

@cindex percentiles
PERCENTILES causes the specified percentiles to be reported.
The percentiles should be given as a list of numbers between 0
and 100 inclusive.
The NTILES subcommand causes the percentiles to be reported at the
boundaries of the data set divided into the specified number of ranges.
For instance, @code{/NTILES=4} would cause quartiles to be reported.
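
The sketch below shows how these subcommands might be combined in
general mode.  The variable name @code{income} is hypothetical:

@example
* Hypothetical variable: income.
FREQUENCIES
        /VARIABLES=income
        /FORMAT=NOTABLE
        /NTILES=4
        /PERCENTILES=10 90
        /STATISTICS=MEAN MEDIAN MODE.
@end example

Here NOTABLE suppresses the frequency table itself, so only the
requested statistics, the quartiles, and the 10th and 90th percentiles
would be expected in the output.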


@node EXAMINE, CROSSTABS, FREQUENCIES, Statistics
@comment  node-name,  next,  previous,  up
@section EXAMINE
@vindex EXAMINE

@cindex Normality, testing for

@display
EXAMINE
        VARIABLES=var_list [BY factor_list ]
        /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
        /PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
        /CINTERVAL n
        /COMPARE=@{GROUPS,VARIABLES@}
        /ID=@{case_number, var_name@}
        /@{TOTAL,NOTOTAL@}
        /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
        /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
                [@{NOREPORT,REPORT@}]

@end display

The @cmd{EXAMINE} command is used to test how closely a distribution
follows a normal distribution.  It also shows you outliers and extreme
values.

The VARIABLES subcommand specifies the dependent variables and the
independent variables to use as factors for the analysis.  Variables
listed before the first BY keyword are the dependent variables.
The dependent variables may optionally be followed by a list of
factors which tell PSPP how to break down the analysis for each
dependent variable.  The format for each factor is
@display
var [BY var].
@end display


The STATISTICS subcommand specifies the analysis to be done.
DESCRIPTIVES will produce a table showing some parametric and
non-parametric statistics.  EXTREME produces a table showing extreme
values of the dependent variable.  A number in parentheses determines
how many upper and lower extremes to show.  The default number is 5.


The PLOT subcommand specifies which plots are to be produced, if any.

The COMPARE subcommand is only relevant if producing boxplots, and it is only
useful if there is more than one dependent variable and at least one factor.  If
/COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
containing boxplots for all the factors.
If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
each containing one boxplot per dependent variable.
If the /COMPARE subcommand is omitted, then PSPP uses the default value of
/COMPARE=GROUPS.

The CINTERVAL subcommand specifies the confidence interval to use when
calculating the descriptive statistics.  The default is 95%.

@cindex percentiles
The PERCENTILES subcommand specifies which percentiles are to be calculated,
and which algorithm to use for calculating them.  The default is to
calculate the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles
using the HAVERAGE algorithm.

The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
is given and factors have been specified in the VARIABLES subcommand,
then statistics for the unfactored dependent variables are
produced in addition to the factored variables.  If there are no
factors specified then TOTAL and NOTOTAL have no effect.

@strong{Warning!}
If many dependent variables are given, or factors are given for which
there are many distinct values, then @cmd{EXAMINE} will produce a very
large quantity of output.
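
As a rough sketch of typical usage, the following command examines one
dependent variable broken down by one factor.  The variable names
@code{score} and @code{group} are hypothetical:

@example
* Hypothetical variables: score (dependent), group (factor).
EXAMINE
        VARIABLES=score BY group
        /STATISTICS=DESCRIPTIVES EXTREME(3)
        /COMPARE=GROUPS
        /PLOT=BOXPLOT.
@end example

With @code{EXTREME(3)}, the three highest and three lowest values
should be listed instead of the default five.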


@node CROSSTABS, T-TEST, EXAMINE, Statistics
@section CROSSTABS

@vindex CROSSTABS
@display
CROSSTABS
        /TABLES=var_list BY var_list [BY var_list]@dots{}
        /MISSING=@{TABLE,INCLUDE,REPORT@}
        /WRITE=@{NONE,CELLS,ALL@}
        /FORMAT=@{TABLES,NOTABLES@}
                @{LABELS,NOLABELS,NOVALLABS@}
                @{PIVOT,NOPIVOT@}
                @{AVALUE,DVALUE@}
                @{NOINDEX,INDEX@}
                @{BOX,NOBOX@}
        /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
                ASRESIDUAL,ALL,NONE@}
        /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
                     KAPPA,ETA,CORR,ALL,NONE@}

(Integer mode.)
        /VARIABLES=var_list (low,high)@dots{}
@end display

The @cmd{CROSSTABS} procedure displays crosstabulation
tables requested by the user.  It can calculate several statistics for
each cell in the crosstabulation tables.  In addition, a number of
statistics can be calculated for each table itself.

The TABLES subcommand is used to specify the tables to be reported.  Any
number of dimensions is permitted, and any number of variables per
dimension is allowed.  The TABLES subcommand may be repeated as many
times as needed.  This is the only required subcommand in @dfn{general
mode}.

Occasionally, one may want to invoke a special mode called @dfn{integer
mode}.  Normally, in general mode, PSPP automatically determines
what values occur in the data.  In integer mode, the user specifies the
range of values that the data assumes.  To invoke this mode, specify the
VARIABLES subcommand, giving a range of data values in parentheses for
each variable to be used on the TABLES subcommand.  Data values inside
the range are truncated to an integer, then assigned to that
value.  If values occur outside this range, they are discarded.  When it
is present, the VARIABLES subcommand must precede the TABLES
subcommand.
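
A sketch of integer mode follows.  The variable names and their ranges
are assumptions chosen only for illustration:

@example
* Hypothetical variables: sex (coded 1 to 2), region (coded 1 to 4).
CROSSTABS
        /VARIABLES=sex (1,2) region (1,4)
        /TABLES=sex BY region.
@end example

Note that VARIABLES comes before TABLES, as required above.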

In general mode, numeric and string variables may be specified on
TABLES.  Although long string variables are allowed, only their
initial short-string parts are used.  In integer mode, only numeric
variables are allowed.

The MISSING subcommand determines the handling of user-missing values.
When set to TABLE, the default, missing values are dropped on a table by
table basis.  When set to INCLUDE, user-missing values are included in
tables and statistics.  When set to REPORT, which is allowed only in
integer mode, user-missing values are included in tables but marked with
an @samp{M} (for ``missing'') and excluded from statistical
calculations.

Currently the WRITE subcommand is ignored.

The FORMAT subcommand controls the characteristics of the
crosstabulation tables to be displayed.  It has a number of possible
settings:

@itemize @bullet
@item
TABLES, the default, causes crosstabulation tables to be output.
NOTABLES suppresses them.

@item
LABELS, the default, allows variable labels and value labels to appear
in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
labels but suppresses value labels.

@item
PIVOT, the default, causes each TABLES subcommand to be displayed in a
pivot table format.  NOPIVOT causes the old-style crosstabulation format
to be used.

@item
AVALUE, the default, causes values to be sorted in ascending order.
DVALUE asserts a descending sort order.

@item
INDEX/NOINDEX is currently ignored.

@item
BOX/NOBOX is currently ignored.
@end itemize

The CELLS subcommand controls the contents of each cell in the displayed
crosstabulation table.  The possible settings are:

@table @asis
@item COUNT
Frequency count.
@item ROW
Row percent.
@item COLUMN
Column percent.
@item TOTAL
Table percent.
@item EXPECTED
Expected value.
@item RESIDUAL
Residual.
@item SRESIDUAL
Standardized residual.
@item ASRESIDUAL
Adjusted standardized residual.
@item ALL
All of the above.
@item NONE
Suppress cells entirely.
@end table

@samp{/CELLS} without any settings specified requests COUNT, ROW,
COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
will be selected.

The STATISTICS subcommand selects statistics for computation:

@table @asis
@item CHISQ
@cindex chisquare
@cindex chi-square

Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
correction, linear-by-linear association.
@item PHI
Phi.
@item CC
Contingency coefficient.
@item LAMBDA
Lambda.
@item UC
Uncertainty coefficient.
@item BTAU
Tau-b.
@item CTAU
Tau-c.
@item RISK
Risk estimate.
@item GAMMA
Gamma.
@item D
Somers' D.
@item KAPPA
Cohen's Kappa.
@item ETA
Eta.
@item CORR
Spearman correlation, Pearson's r.
@item ALL
All of the above.
@item NONE
No statistics.
@end table

Selected statistics are only calculated when appropriate for the
statistic.  Certain statistics require tables of a particular size, and
some statistics are calculated only in integer mode.

@samp{/STATISTICS} without any settings selects CHISQ.  If the
STATISTICS subcommand is not given, no statistics are calculated.
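
To illustrate, the following general mode sketch requests a single
two-way table with selected cell contents and statistics.  The variable
names are hypothetical:

@example
* Hypothetical variables: sex, region.
CROSSTABS
        /TABLES=sex BY region
        /FORMAT=AVALUE NOVALLABS
        /CELLS=COUNT ROW COLUMN
        /STATISTICS=CHISQ PHI.
@end example

Each cell of the resulting table would be expected to show the count
along with the row and column percentages, followed by the chi-square
tests and phi for the table as a whole.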

@strong{Please note:} Currently the implementation of CROSSTABS has the
following bugs:

@itemize @bullet
@item
Pearson's R (but not Spearman) is off a little.
@item
T values for Spearman's R and Pearson's R are wrong.
@item
Significance of symmetric and directional measures is not calculated.
@item
Asymmetric ASEs and T values for lambda are wrong.
@item
ASE of Goodman and Kruskal's tau is not calculated.
@item
ASE of symmetric Somers' D is wrong.
@item
Approximate T of uncertainty coefficient is wrong.
@end itemize

Fixes for any of these deficiencies would be welcomed.

@node T-TEST, ONEWAY, CROSSTABS, Statistics
@comment  node-name,  next,  previous,  up
@section T-TEST

@vindex T-TEST
@display
T-TEST
        /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
        /CRITERIA=CIN(confidence)


(One Sample mode.)
        TESTVAL=test_value
        /VARIABLES=var_list


(Independent Samples mode.)
        GROUPS=var(value1 [, value2])
        /VARIABLES=var_list


(Paired Samples mode.)
        PAIRS=var_list [WITH var_list [(PAIRED)] ]

@end display


The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
means.
It operates in one of three modes:
@itemize
@item One Sample mode.
@item Independent Samples mode.
@item Paired Samples mode.
@end itemize

@noindent
Each of these modes is described in more detail below.
There are two optional subcommands which are common to all modes.

The @cmd{/CRITERIA} subcommand tells PSPP the confidence interval used
in the tests.  The default value is 0.95.


The @cmd{MISSING} subcommand determines the handling of missing
values.
If INCLUDE is set, then user-missing values are included in the
calculations, but system-missing values are not.
If EXCLUDE is set, which is the default, user-missing
values are excluded as well as system-missing values.

If LISTWISE is set, then the entire case is excluded from analysis
whenever any variable specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
@cmd{/GROUPS} subcommands contains a missing value.
If ANALYSIS is set, then missing values are excluded only in the analysis for
which they would be needed.  This is the default.


@menu
* One Sample Mode::             Testing against a hypothesised mean
* Independent Samples Mode::    Testing two independent groups for equal mean
* Paired Samples Mode::         Testing two interdependent groups for equal mean
@end menu

@node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
@subsection One Sample Mode

The @cmd{TESTVAL} subcommand invokes the One Sample mode.
This mode is used to test a population mean against a hypothesised
mean.
The value given to the @cmd{TESTVAL} subcommand is the value against
which you wish to test.
In this mode, you must also use the @cmd{/VARIABLES} subcommand to
tell PSPP which variables you wish to test.
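
For example, the following sketch tests whether the mean of a variable
differs from a hypothesised value.  The variable name @code{height} and
the test value 170 are assumptions made for illustration:

@example
* Hypothetical variable: height.  170 is an assumed test value.
T-TEST TESTVAL=170
        /VARIABLES=height.
@end example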

@node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
@comment  node-name,  next,  previous,  up
@subsection Independent Samples Mode

The @cmd{GROUPS} subcommand invokes Independent Samples mode or
`Groups' mode.
This mode is used to test whether two groups of values have the
same population mean.
In this mode, you must also use the @cmd{/VARIABLES} subcommand to
tell PSPP the dependent variables you wish to test.

The variable given in the @cmd{GROUPS} subcommand is the independent
variable which determines to which group the samples belong.
The values in parentheses are the specific values of the independent
variable for each group.
If the parentheses are omitted and no values are given, the default values
of 1.0 and 2.0 are assumed.

If the independent variable is numeric,
it is acceptable to specify only one value inside the parentheses.
If you do this, cases where the independent variable is
less than or equal to this value belong to the first group, and cases
greater than this value belong to the second group.
When using this form of the @cmd{GROUPS} subcommand, missing values in
the independent variable are excluded on a listwise basis, regardless
of whether @cmd{/MISSING=LISTWISE} was specified.
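
The two forms of the @cmd{GROUPS} subcommand can be sketched as
follows.  All variable names, codes, and the cut point are
hypothetical:

@example
* Hypothetical variables: sex (coded 1 and 2), age, height, weight.
T-TEST GROUPS=sex(1,2)
        /VARIABLES=height weight.

* A single cut point: cases with age of 40 or less form the first
  group, and cases with age above 40 form the second.
T-TEST GROUPS=age(40)
        /VARIABLES=height weight.
@end example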


@node Paired Samples Mode,  , Independent Samples Mode, T-TEST
@comment  node-name,  next,  previous,  up
@subsection Paired Samples Mode

The @cmd{PAIRS} subcommand introduces Paired Samples mode.
Use this mode when repeated measures have been taken from the same
samples.
If the @code{WITH} keyword is omitted, then tables for all
combinations of variables given in the @cmd{PAIRS} subcommand are
generated.
If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
is also given, then the number of variables preceding @code{WITH}
must be the same as the number following it.
In this case, tables for each respective pair of variables are
generated.
In the event that the @code{WITH} keyword is given, but the
@code{(PAIRED)} keyword is omitted, then tables for each combination
of a variable preceding @code{WITH} against a variable following
@code{WITH} are generated.
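
As a sketch, with hypothetical variable names, the following command
pairs each variable before @code{WITH} with the variable in the same
position after it:

@example
* Hypothetical variables: pre1, pre2, post1, post2.
T-TEST PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).
@end example

By the rules above, this should yield two tables, one for @code{pre1}
against @code{post1} and one for @code{pre2} against @code{post2}.
Omitting @code{(PAIRED)} would instead yield four tables, one for each
combination of a variable before @code{WITH} and a variable after it.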


@node ONEWAY, RANK, T-TEST, Statistics
@comment  node-name,  next,  previous,  up
@section ONEWAY

@vindex ONEWAY
@cindex analysis of variance
@cindex ANOVA

@display
ONEWAY
        [/VARIABLES = ] var_list BY var
        /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
        /CONTRASTS= value1 [, value2] ... [,valueN]
        /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}

@end display

The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
variables factored by a single independent variable.
It is used to compare the means of a population
divided into more than two groups.

The variables to be analysed should be given in the @code{VARIABLES}
subcommand.
The list of variables must be followed by the @code{BY} keyword and
the name of the independent (or factor) variable.

You can use the @code{STATISTICS} subcommand to tell PSPP to display
ancillary information.  The options accepted are:
@itemize
@item DESCRIPTIVES
Displays descriptive statistics about the groups factored by the independent
variable.
@item HOMOGENEITY
Displays the Levene test of Homogeneity of Variance for the
variables and their groups.
@end itemize

The @code{CONTRASTS} subcommand is used when you anticipate certain
differences between the groups.
The subcommand must be followed by a list of numerals which are the
coefficients of the groups to be tested.
The number of coefficients must correspond to the number of distinct
groups (or values of the independent variable).
If the sum of the coefficients is not zero, then PSPP will
display a warning, but will proceed with the analysis.
The @code{CONTRASTS} subcommand may be given up to 10 times in order
to specify different contrast tests.
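
The following sketch shows a one-way analysis with a single contrast.
The variable names are hypothetical, and the factor @code{group} is
assumed to take exactly three distinct values, so that three
coefficients are needed:

@example
* Hypothetical variables: score, group (assumed to have 3 values).
ONEWAY
        /VARIABLES=score BY group
        /CONTRASTS=-1, 0, 1
        /STATISTICS=DESCRIPTIVES HOMOGENEITY.
@end example

The coefficients here sum to zero, so no warning would be expected.
The contrast compares the first group with the third and gives the
second group no weight.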
@setfilename ignored

@node RANK, REGRESSION, ONEWAY, Statistics
@comment  node-name,  next,  previous,  up
@section RANK

@vindex RANK
@display
RANK
        [VARIABLES=] var_list [@{A,D@}] [BY var_list]
        /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
        /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
        /PRINT[=@{YES,NO@}]
        /MISSING=@{EXCLUDE,INCLUDE@}

        /RANK [INTO var_list]
        /NTILES(k) [INTO var_list]
        /NORMAL [INTO var_list]
        /PERCENT [INTO var_list]
        /RFRACTION [INTO var_list]
        /PROPORTION [INTO var_list]
        /N [INTO var_list]
        /SAVAGE [INTO var_list]
@end display

The @cmd{RANK} command ranks variables and stores the results into new
variables.

The VARIABLES subcommand, which is mandatory, specifies one or
more variables whose values are to be ranked.
After each variable, @samp{A} or @samp{D} may appear, indicating that
the variable is to be ranked in ascending or descending order.
Ascending is the default.
If a BY keyword appears, it should be followed by a list of variables
which are to serve as group variables.
In this case, the cases are gathered into groups, and ranks calculated
for each group.

The TIES subcommand specifies how tied values are to be treated.  The
default is to take the mean value of all the tied cases.

The FRACTION subcommand specifies how proportional ranks are to be
calculated.  This only has any effect if the NORMAL or PROPORTION rank
functions are requested.

The PRINT subcommand may be used to specify that a summary of the rank
variables created should appear in the output.

The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
If none are given, then the default is RANK.
The NTILES subcommand must take an integer specifying the number of
partitions into which values should be ranked.
Each subcommand may be followed by the INTO keyword and a list of
variables which are the variables to be created and receive the rank
scores.  There may be up to as many variables specified as there are
variables named on the VARIABLES subcommand.  If fewer are specified,
then the remaining variable names are automatically created.

The MISSING subcommand determines how user-missing values are to be
treated.  A setting of EXCLUDE means that user-missing values are
excluded from the rank scores.  A setting of INCLUDE means they are
included.  The default is EXCLUDE.
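
A minimal sketch of @cmd{RANK} follows.  The source variables
@code{score} and @code{group}, and the target names @code{rscore} and
@code{qscore}, are all hypothetical:

@example
* Hypothetical variables: score, group.  rscore and qscore are
  assumed names for the new variables.
RANK
        VARIABLES=score BY group
        /TIES=LOW
        /RANK INTO rscore
        /NTILES(4) INTO qscore.
@end example

This is intended to rank @code{score} separately within each value of
@code{group}, storing the plain ranks in @code{rscore} and the
quartile membership in @code{qscore}.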

@include regression.texi