   1 @node Statistics, Utilities, Conditionals and Looping, Top
   2 @chapter Statistics
   3
   4 This chapter documents the statistical procedures that PSPP supports so
   5 far.
   6
   7 @menu
   8 * DESCRIPTIVES::                Descriptive statistics.
   9 * FREQUENCIES::                 Frequency tables.
  10 * EXAMINE::                     Testing data for normality.
  11 * CROSSTABS::                   Crosstabulation tables.
  12 * T-TEST::                      Test hypotheses about means.
  13 * ONEWAY::                      One way analysis of variance.
  14 * RANK::                        Compute rank scores.
  15 * REGRESSION::                  Linear regression.
  16 @end menu
  17
  18 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
  19 @section DESCRIPTIVES
  20
  21 @vindex DESCRIPTIVES
  22 @display
  23 DESCRIPTIVES
  24         /VARIABLES=var_list
  25         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
  26         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
  27         /SAVE
  28         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
  29                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
  30                      SESKEWNESS,SEKURTOSIS@}
  31         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
  32                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
  33               @{A,D@}
  34 @end display
  35
  36 The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
  37 descriptive
  38 statistics requested by the user.  In addition, it can optionally
  39 compute Z-scores.
  40
  41 The VARIABLES subcommand, which is required, specifies the list of
  42 variables to be analyzed.  Keyword VARIABLES is optional.
  43
  44 All other subcommands are optional:
  45
  46 The MISSING subcommand determines the handling of missing values.  If
  47 INCLUDE is set, then user-missing values are included in the
  48 calculations.  If NOINCLUDE is set, which is the default, user-missing
  49 values are excluded.  If VARIABLE is set, then missing values are
  50 excluded on a variable by variable basis; if LISTWISE is set, then
  51 the entire case is excluded whenever any value in that case has a
  52 system-missing or, unless INCLUDE is set, user-missing value.
  53
  54 The FORMAT subcommand affects the output format.  Currently the
  55 LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
  56 set, the numbers of both valid and missing cases are listed in the
  57 output; when LINE is set, only the number of valid cases is listed.
  58
  59 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  60 the specified variables.  The Z scores are saved to new variables.
  61 Variable names are generated by trying first the original variable name
  62 with Z prepended and truncated to a maximum of 8 characters, then the
  63 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  64 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
  65 variable names can be specified explicitly on VARIABLES in the variable
  66 list by enclosing them in parentheses after each variable.
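
A minimal sketch of the explicit-name form, assuming hypothetical
numeric variables @code{height} and @code{weight}; the Z score name
@code{zheight} is likewise only illustrative:

@example
* height, weight and zheight are hypothetical names.
DESCRIPTIVES
        /VARIABLES=height (zheight) weight
        /SAVE.
@end example

@noindent
Following the rules above, the Z scores for @code{height} would be
saved as @code{zheight}, while @code{weight}, which has no explicit
name, would be expected to fall back to a generated name.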
  67
  68 The STATISTICS subcommand specifies the statistics to be displayed:
  69
  70 @table @code
  71 @item ALL
  72 All of the statistics below.
  73 @item MEAN
  74 Arithmetic mean.
  75 @item SEMEAN
  76 Standard error of the mean.
  77 @item STDDEV
  78 Standard deviation.
  79 @item VARIANCE
  80 Variance.
  81 @item KURTOSIS
  82 Kurtosis and standard error of the kurtosis.
  83 @item SKEWNESS
  84 Skewness and standard error of the skewness.
  85 @item RANGE
  86 Range.
  87 @item MINIMUM
  88 Minimum value.
  89 @item MAXIMUM
  90 Maximum value.
  91 @item SUM
  92 Sum.
  93 @item DEFAULT
  94 Mean, standard deviation, minimum, maximum.
  95 @item SEKURTOSIS
  96 Standard error of the kurtosis.
  97 @item SESKEWNESS
  98 Standard error of the skewness.
  99 @end table
 100
 101 The SORT subcommand specifies how the variables should be sorted in
 102 the output.  Most of the possible values should be self-explanatory.
 103 NAME causes the variables to be sorted by name.  By default, the
 104 variables are listed in the order that they are specified on the
 105 VARIABLES subcommand.  The A and D settings request an ascending or
 106 descending sort order, respectively.
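
As an illustration of how these subcommands combine, the following
sketch requests a few statistics and sorts the output by mean in
descending order.  The variable names are assumptions made for the
example.  The keyword lists are written with spaces as separators and
the sort direction is written as the synopsis shows it; both forms are
assumed, not confirmed, to be accepted:

@example
* age, income and score are hypothetical variables.
DESCRIPTIVES
        /VARIABLES=age income score
        /MISSING=LISTWISE
        /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
        /SORT=MEAN D.
@end example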
 107
 108 @node FREQUENCIES, EXAMINE, DESCRIPTIVES, Statistics
 109 @section FREQUENCIES
 110
 111 @vindex FREQUENCIES
 112 @display
 113 FREQUENCIES
 114         /VARIABLES=var_list
 115         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
 116                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
 117                 @{LABELS,NOLABELS@}
 118                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
 119                 @{SINGLE,DOUBLE@}
 120                 @{OLDPAGE,NEWPAGE@}
 121         /MISSING=@{EXCLUDE,INCLUDE@}
 122         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
 123                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
 124                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
 125         /NTILES=ntiles
 126         /PERCENTILES=percent@dots{}
 127         /HISTOGRAM=[MINIMUM(x_min)] [MAXIMUM(x_max)]
 128                    [@{FREQ,PCNT@}] [@{NONORMAL,NORMAL@}]
 129         /PIECHART=[MINIMUM(x_min)] [MAXIMUM(x_max)] @{NOMISSING,MISSING@}
 130
 131 (These options are not currently implemented.)
 132         /BARCHART=@dots{}
 133         /HBAR=@dots{}
 134         /GROUPED=@dots{}
 135
 136 (Integer mode.)
 137         /VARIABLES=var_list (low,high)@dots{}
 138 @end display
 139
 140 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
 141 variables.
 142 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
 143 (including median and mode) and percentiles.
 144
 145 @cmd{FREQUENCIES} also supports graphical output in the form of
 146 histograms and pie charts.  In the future, it will be able to produce
 147 bar charts and output percentiles for grouped data.
 148
 149 The VARIABLES subcommand is the only required subcommand.  Specify the
 150 variables to be analyzed.  In most cases, this is all that is required.
 151 This is known as @dfn{general mode}.
 152
 153 Occasionally, one may want to invoke a special mode called @dfn{integer
 154 mode}.  Normally, in general mode, PSPP will automatically determine
 155 what values occur in the data.  In integer mode, the user specifies the
 156 range of values that the data assumes.  To invoke this mode, specify a
 157 range of data values in parentheses, separated by a comma.  Data values
 158 inside the range are truncated to the nearest integer, then assigned to
 159 that value.  If values occur outside this range, they are discarded.
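
For instance, assuming a hypothetical variable @code{rating} whose
values are expected to lie between 1 and 5, integer mode might be
requested as follows:

@example
* rating is a hypothetical variable.
FREQUENCIES
        /VARIABLES=rating (1,5).
@end example

@noindent
Under the rules above, a value such as 3.7 would be truncated and
counted as 3, and a value such as 9 would be discarded.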
 160
 161 The FORMAT subcommand controls the output format.  It has several
 162 possible settings:
 163
 164 @itemize @bullet
 165 @item
 166 TABLE, the default, causes a frequency table to be output for every
 167 variable specified.  NOTABLE prevents them from being output.  LIMIT
 168 with a numeric argument causes them to be output except when there are
 169 more than the specified number of values in the table.
 170
 171 @item
 172 STANDARD frequency tables contain more complete information, but also
 173 take up more space on the printed page.  CONDENSE frequency tables are
 174 less informative but take up less space.  ONEPAGE with a numeric
 175 argument will output standard frequency tables if there are the
 176 specified number of values or fewer, condensed tables otherwise.  ONEPAGE
 177 without an argument defaults to a threshold of 50 values.
 178
 179 @item
 180 LABELS causes value labels to be displayed in STANDARD frequency
 181 tables.  NOLABELS prevents this.
 182
 183 @item
 184 Normally frequency tables are sorted in ascending order by value.  This
 185 is AVALUE.  DVALUE tables are sorted in descending order by value.
 186 AFREQ and DFREQ tables are sorted in ascending and descending order,
 187 respectively, by frequency count.
 188
 189 @item
 190 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
 191 frequency tables have wider spacing.
 192
 193 @item
 194 OLDPAGE and NEWPAGE are not currently used.
 195 @end itemize
 196
 197 The MISSING subcommand controls the handling of user-missing values.
 198 When EXCLUDE, the default, is set, user-missing values are not included
 199 in frequency tables or statistics.  When INCLUDE is set, user-missing
 200 values are included.  System-missing values are never included in
 201 statistics, but are listed in frequency tables.
 202
 203 The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
 204 (@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
 205 value, and MODE, the mode.  (If there are multiple modes, the smallest
 206 value is reported.)  By default, the mean, standard deviation, minimum,
 207 and maximum are reported for each variable.
 208
 209 @cindex percentiles
 210 PERCENTILES causes the specified percentiles to be reported.
 211 The percentiles should be presented as a list of numbers between 0
 212 and 100 inclusive.
 213 The NTILES subcommand causes the percentiles to be reported at the
 214 boundaries of the data set divided into the specified number of ranges.
 215 For instance, @code{/NTILES=4} would cause quartiles to be reported.
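
A short sketch combining the two subcommands, assuming a hypothetical
variable @code{income}.  The percentile list is written here with
spaces as separators, which is an assumption about the accepted form:

@example
* income is a hypothetical variable.
FREQUENCIES
        /VARIABLES=income
        /FORMAT=NOTABLE
        /NTILES=4
        /PERCENTILES=10 90.
@end example

@noindent
This would request the quartiles plus the 10th and 90th percentiles,
while suppressing the frequency table itself.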
 216
 217 The HISTOGRAM subcommand causes the output to include a histogram for
 218 each specified variable.  The X axis by default ranges from the
 219 minimum to the maximum value observed in the data, but the MINIMUM and
 220 MAXIMUM keywords can set an explicit range.  The Y axis by default is
 221 labeled in frequencies; use the PERCENT keyword to cause it to be
 222 labeled in percent of the total observed count.  Specify NORMAL to
 223 superimpose a normal curve on the histogram.
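
As a sketch, assuming a hypothetical variable @code{age} whose values
of interest lie between 0 and 100, a histogram with an explicit X axis
range and a superimposed normal curve might be requested like this:

@example
* age is a hypothetical variable and 0 to 100 an illustrative range.
FREQUENCIES
        /VARIABLES=age
        /FORMAT=NOTABLE
        /HISTOGRAM=MINIMUM(0) MAXIMUM(100) NORMAL.
@end example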
 224
 225 The PIECHART subcommand adds a pie chart for each variable to the output.  Each
 226 slice represents one value, with the size of the slice proportional to
 227 the value's frequency.  By default, all non-missing values are given
 228 slices.  The MINIMUM and MAXIMUM keywords can be used to limit the
 229 displayed slices to a given range of values.  The MISSING keyword adds
 230 slices for missing values.
 231
 232 @node EXAMINE, CROSSTABS, FREQUENCIES, Statistics
 233 @comment  node-name,  next,  previous,  up
 234 @section EXAMINE
 235 @vindex EXAMINE
 236
 237 @cindex Normality, testing for
 238
 239 @display
 240 EXAMINE
 241         VARIABLES=var_list [BY factor_list ]
 242         /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
 243         /PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
 244         /CINTERVAL n
 245         /COMPARE=@{GROUPS,VARIABLES@}
 246         /ID=@{case_number, var_name@}
 247         /@{TOTAL,NOTOTAL@}
 248         /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
 249         /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
 250                 [@{NOREPORT,REPORT@}]
 251
 252 @end display
 253
 254 The @cmd{EXAMINE} command is used to test how closely a distribution follows a
 255 normal distribution.  It also shows you outliers and extreme values.
 256
 257 The VARIABLES subcommand specifies the dependent variables and the
 258 independent variables to use as factors for the analysis.  Variables
 259 listed before the first BY keyword are the dependent variables.
 260 The dependent variables may optionally be followed by a list of
 261 factors which tell PSPP how to break down the analysis for each
 262 dependent variable.  The format for each factor is
 263 @display
 264 var [BY var].
 265 @end display
 266
 267
 268 The STATISTICS subcommand specifies the analysis to be done.
 269 DESCRIPTIVES will produce a table showing some parametric and
 270 non-parametric statistics.  EXTREME produces a table showing extreme
 271 values of the dependent variable.  A number in parentheses determines
 272 how many upper and lower extremes to show.  The default number is 5.
 273
 274
 275 The PLOT subcommand specifies which plots are to be produced if any.
 276
 277 The COMPARE subcommand is only relevant if producing boxplots, and it is only
 278 useful if there is more than one dependent variable and at least one factor.  If
 279 /COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
 280 containing boxplots for all the factors.
 281 If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
 282 each containing one boxplot per dependent variable.
 283 If the /COMPARE subcommand is omitted, then PSPP uses the default value of
 284 /COMPARE=GROUPS.
 285
 286 The CINTERVAL subcommand specifies the confidence interval to use when
 287 calculating the descriptive statistics.  The default is 95%.
 288
 289 @cindex percentiles
 290 The PERCENTILES subcommand specifies which percentiles are to be calculated,
 291 and which algorithm to use for calculating them.  The default is to
 292 calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
 293 HAVERAGE algorithm.
 294
 295 The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
 296 is given and factors have been specified in the VARIABLES subcommand,
 297 then statistics for the unfactored dependent variables are
 298 produced in addition to the factored variables.  If there are no
 299 factors specified then TOTAL and NOTOTAL have no effect.
 300
 301 @strong{Warning!}
 302 If many dependent variables are given, or factors are given for which
 303 there are many distinct values, then @cmd{EXAMINE} will produce a very
 304 large quantity of output.
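
A minimal sketch of a factored analysis, assuming hypothetical
dependent variables @code{height} and @code{weight} and a hypothetical
factor @code{sex}.  The space-separated keyword list on STATISTICS is
an assumption about the accepted form:

@example
* height, weight and sex are hypothetical variables.
EXAMINE
        VARIABLES=height weight BY sex
        /STATISTICS=DESCRIPTIVES EXTREME(3)
        /PLOT=BOXPLOT
        /COMPARE=VARIABLES.
@end example

@noindent
As described above, @code{EXTREME(3)} would limit the table of extremes
to the three highest and three lowest values, and
@code{/COMPARE=VARIABLES} would place the boxplots for both dependent
variables in a single plot for the factor.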
 305
 306
 307 @node CROSSTABS, T-TEST, EXAMINE, Statistics
 308 @section CROSSTABS
 309
 310 @vindex CROSSTABS
 311 @display
 312 CROSSTABS
 313         /TABLES=var_list BY var_list [BY var_list]@dots{}
 314         /MISSING=@{TABLE,INCLUDE,REPORT@}
 315         /WRITE=@{NONE,CELLS,ALL@}
 316         /FORMAT=@{TABLES,NOTABLES@}
 317                 @{LABELS,NOLABELS,NOVALLABS@}
 318                 @{PIVOT,NOPIVOT@}
 319                 @{AVALUE,DVALUE@}
 320                 @{NOINDEX,INDEX@}
 321                 @{BOX,NOBOX@}
 322         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
 323                 ASRESIDUAL,ALL,NONE@}
 324         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
 325                      KAPPA,ETA,CORR,ALL,NONE@}
 326
 327 (Integer mode.)
 328         /VARIABLES=var_list (low,high)@dots{}
 329 @end display
 330
 331 The @cmd{CROSSTABS} procedure displays crosstabulation
 332 tables requested by the user.  It can calculate several statistics for
 333 each cell in the crosstabulation tables.  In addition, a number of
 334 statistics can be calculated for each table itself.
 335
 336 The TABLES subcommand is used to specify the tables to be reported.  Any
 337 number of dimensions is permitted, and any number of variables per
 338 dimension is allowed.  The TABLES subcommand may be repeated as many
 339 times as needed.  This is the only required subcommand in @dfn{general
 340 mode}.
 341
 342 Occasionally, one may want to invoke a special mode called @dfn{integer
 343 mode}.  Normally, in general mode, PSPP automatically determines
 344 what values occur in the data.  In integer mode, the user specifies the
 345 range of values that the data assumes.  To invoke this mode, specify the
 346 VARIABLES subcommand, giving a range of data values in parentheses for
 347 each variable to be used on the TABLES subcommand.  Data values inside
 348 the range are truncated to the nearest integer, then assigned to that
 349 value.  If values occur outside this range, they are discarded.  When it
 350 is present, the VARIABLES subcommand must precede the TABLES
 351 subcommand.
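
For instance, assuming hypothetical numeric variables @code{sex}, coded
1 to 2, and @code{agegrp}, coded 1 to 5, integer mode might be invoked
as follows, with VARIABLES placed before TABLES as required:

@example
* sex and agegrp are hypothetical numeric variables.
CROSSTABS
        /VARIABLES=sex (1,2) agegrp (1,5)
        /TABLES=sex BY agegrp.
@end example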
 352
 353 In general mode, numeric and string variables may be specified on
 354 TABLES.  Although long string variables are allowed, only their
 355 initial short-string parts are used.  In integer mode, only numeric
 356 variables are allowed.
 357
 358 The MISSING subcommand determines the handling of user-missing values.
 359 When set to TABLE, the default, missing values are dropped on a table by
 360 table basis.  When set to INCLUDE, user-missing values are included in
 361 tables and statistics.  When set to REPORT, which is allowed only in
 362 integer mode, user-missing values are included in tables but marked with
 363 an @samp{M} (for ``missing'') and excluded from statistical
 364 calculations.
 365
 366 Currently the WRITE subcommand is ignored.
 367
 368 The FORMAT subcommand controls the characteristics of the
 369 crosstabulation tables to be displayed.  It has a number of possible
 370 settings:
 371
 372 @itemize @bullet
 373 @item
 374 TABLES, the default, causes crosstabulation tables to be output.
 375 NOTABLES suppresses them.
 376
 377 @item
 378 LABELS, the default, allows variable labels and value labels to appear
 379 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
 380 labels but suppresses value labels.
 381
 382 @item
 383 PIVOT, the default, causes each TABLES subcommand to be displayed in a
 384 pivot table format.  NOPIVOT causes the old-style crosstabulation format
 385 to be used.
 386
 387 @item
 388 AVALUE, the default, causes values to be sorted in ascending order.
 389 DVALUE asserts a descending sort order.
 390
 391 @item
 392 INDEX/NOINDEX is currently ignored.
 393
 394 @item
 395 BOX/NOBOX is currently ignored.
 396 @end itemize
 397
 398 The CELLS subcommand controls the contents of each cell in the displayed
 399 crosstabulation table.  The possible settings are:
 400
 401 @table @asis
 402 @item COUNT
 403 Frequency count.
 404 @item ROW
 405 Row percent.
 406 @item COLUMN
 407 Column percent.
 408 @item TOTAL
 409 Table percent.
 410 @item EXPECTED
 411 Expected value.
 412 @item RESIDUAL
 413 Residual.
 414 @item SRESIDUAL
 415 Standardized residual.
 416 @item ASRESIDUAL
 417 Adjusted standardized residual.
 418 @item ALL
 419 All of the above.
 420 @item NONE
 421 Suppress cells entirely.
 422 @end table
 423
 424 @samp{/CELLS} without any settings specified requests COUNT, ROW,
 425 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
 426 will be selected.
 427
 428 The STATISTICS subcommand selects statistics for computation:
 429
 430 @table @asis
 431 @item CHISQ
 432 @cindex chisquare
 433 @cindex chi-square
 434
 435 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
 436 correction, linear-by-linear association.
 437 @item PHI
 438 Phi.
 439 @item CC
 440 Contingency coefficient.
 441 @item LAMBDA
 442 Lambda.
 443 @item UC
 444 Uncertainty coefficient.
 445 @item BTAU
 446 Tau-b.
 447 @item CTAU
 448 Tau-c.
 449 @item RISK
 450 Risk estimate.
 451 @item GAMMA
 452 Gamma.
 453 @item D
 454 Somers' D.
 455 @item KAPPA
 456 Cohen's Kappa.
 457 @item ETA
 458 Eta.
 459 @item CORR
 460 Spearman correlation, Pearson's r.
 461 @item ALL
 462 All of the above.
 463 @item NONE
 464 No statistics.
 465 @end table
 466
 467 Selected statistics are only calculated when appropriate for the
 468 statistic.  Certain statistics require tables of a particular size, and
 469 some statistics are calculated only in integer mode.
 470
 471 @samp{/STATISTICS} without any settings selects CHISQ.  If the
 472 STATISTICS subcommand is not given, no statistics are calculated.
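
The following sketch shows a general mode request that selects specific
cell contents and table statistics.  The variable names are
hypothetical, and the space-separated keyword lists are an assumption
about the accepted form:

@example
* sex and agegrp are hypothetical variables.
CROSSTABS
        /TABLES=sex BY agegrp
        /CELLS=COUNT ROW COLUMN
        /STATISTICS=CHISQ PHI.
@end example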
 473
 474 @strong{Please note:} Currently the implementation of CROSSTABS has the
 475 following bugs:
 476
 477 @itemize @bullet
 478 @item
 479 Pearson's R (but not Spearman) is off a little.
 480 @item
 481 T values for Spearman's R and Pearson's R are wrong.
 482 @item
 483 Significance of symmetric and directional measures is not calculated.
 484 @item
 485 Asymmetric ASEs and T values for lambda are wrong.
 486 @item
 487 ASE of Goodman and Kruskal's tau is not calculated.
 488 @item
 489 ASE of symmetric Somers' D is wrong.
 490 @item
 491 Approximate T of uncertainty coefficient is wrong.
 492 @end itemize
 493
 494 Fixes for any of these deficiencies would be welcomed.
 495
 496 @node T-TEST, ONEWAY, CROSSTABS, Statistics
 497 @comment  node-name,  next,  previous,  up
 498 @section T-TEST
 499
 500 @vindex T-TEST
 501 @display
 502 T-TEST
 503         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 504         /CRITERIA=CIN(confidence)
 505
 506
 507 (One Sample mode.)
 508         TESTVAL=test_value
 509         /VARIABLES=var_list
 510
 511
 512 (Independent Samples mode.)
 513         GROUPS=var(value1 [, value2])
 514         /VARIABLES=var_list
 515
 516
 517 (Paired Samples mode.)
 518         PAIRS=var_list [WITH var_list [(PAIRED)] ]
 519
 520 @end display
 521
 522
 523 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
 524 means.
 525 It operates in one of three modes:
 526 @itemize
 527 @item One Sample mode.
 528 @item Independent Samples mode.
 529 @item Paired Samples mode.
 530 @end itemize
 531
 532 @noindent
 533 Each of these modes is described in more detail below.
 534 There are two optional subcommands which are common to all modes.
 535
 536 The @cmd{/CRITERIA} subcommand tells PSPP the confidence interval used
 537 in the tests.  The default value is 0.95.
 538
 539
 540 The @cmd{MISSING} subcommand determines the handling of missing
 541 values.
 542 If INCLUDE is set, then user-missing values are included in the
 543 calculations, but system-missing values are not.
 544 If EXCLUDE is set, which is the default,
 545 both user-missing and system-missing values
 546 are excluded.
 547
 548 If LISTWISE is set, then the entire case is excluded from analysis
 549 whenever any variable  specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
 550 @cmd{/GROUPS} subcommands contains a missing value.
 551 If ANALYSIS is set, then missing values are excluded only in the analysis for
 552 which they would be needed. This is the default.
 553
 554
 555 @menu
 556 * One Sample Mode::             Testing against a hypothesised mean
 557 * Independent Samples Mode::    Testing two independent groups for equal mean
 558 * Paired Samples Mode::         Testing two interdependent groups for equal mean
 559 @end menu
 560
 561 @node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
 562 @subsection One Sample Mode
 563
 564 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
 565 This mode is used to test a population mean against a hypothesised
 566 mean.
 567 The value given to the @cmd{TESTVAL} subcommand is the value against
 568 which you wish to test.
 569 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 570 tell PSPP which variables you wish to test.
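
A minimal sketch, assuming a hypothetical variable @code{iq} that is to
be tested against a hypothesised mean of 100:

@example
* iq is a hypothetical variable and 100 an illustrative test value.
T-TEST
        TESTVAL=100
        /VARIABLES=iq.
@end example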
 571
 572 @node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
 573 @comment  node-name,  next,  previous,  up
 574 @subsection Independent Samples Mode
 575
 576 The @cmd{GROUPS} subcommand invokes Independent Samples mode or
 577 `Groups' mode.
 578 This mode is used to test whether two groups of values have the
 579 same population mean.
 580 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 581 tell PSPP the dependent variables you wish to test.
 582
 583 The variable given in the @cmd{GROUPS} subcommand is the independent
 584 variable which determines to which group the samples belong.
 585 The values in parentheses are the specific values of the independent
 586 variable for each group.
 587 If the parentheses are omitted and no values are given, the default values
 588 of 1.0 and 2.0 are assumed.
 589
 590 If the independent variable is numeric,
 591 it is acceptable to specify only one value inside the parentheses.
 592 If you do this, cases where the independent variable is
 593 less than  or equal to this value belong to the first group, and cases
 594 greater than this value belong to the second group.
 595 When using this form of the @cmd{GROUPS} subcommand, missing values in
 596 the independent variable are excluded on a listwise basis, regardless
 597 of whether @cmd{/MISSING=LISTWISE} was specified.
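
Both forms of the @cmd{GROUPS} subcommand can be sketched as follows,
assuming a hypothetical dependent variable @code{height} and
hypothetical independent variables @code{sex} and @code{age}:

@example
* height, sex and age are hypothetical variables.
T-TEST
        GROUPS=sex(1,2)
        /VARIABLES=height.

T-TEST
        GROUPS=age(40)
        /VARIABLES=height.
@end example

@noindent
The first command would compare cases where @code{sex} is 1 against
those where it is 2.  In the second, cases with @code{age} less than or
equal to 40 would form the first group and the remaining cases the
second.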
 598
 599
 600 @node Paired Samples Mode,  , Independent Samples Mode, T-TEST
 601 @comment  node-name,  next,  previous,  up
 602 @subsection Paired Samples Mode
 603
 604 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
 605 Use this mode when repeated measures have been taken from the same
 606 samples.
 607 If the @code{WITH} keyword is omitted, then tables for all
 608 combinations of variables given in the @cmd{PAIRS} subcommand are
 609 generated.
 610 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 611 is also given, then the number of variables preceding @code{WITH}
 612 must be the same as the number following it.
 613 In this case, tables for each respective pair of variables are
 614 generated.
 615 In the event that the @code{WITH} keyword is given, but the
 616 @code{(PAIRED)} keyword is omitted, then tables for each combination
 617 of a variable preceding @code{WITH} against a variable following
 618 @code{WITH} are generated.
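
As a sketch, assuming four hypothetical variables holding measurements
taken before and after a treatment:

@example
* pre1, pre2, post1 and post2 are hypothetical variables.
T-TEST
        PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).
@end example

@noindent
According to the rules above, this would produce tables for
@code{pre1} with @code{post1} and for @code{pre2} with @code{post2}.
Without @code{(PAIRED)}, all four combinations would be tested.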
 619
 620
 621 @node ONEWAY, RANK, T-TEST, Statistics
 622 @comment  node-name,  next,  previous,  up
 623 @section ONEWAY
 624
 625 @vindex ONEWAY
 626 @cindex analysis of variance
 627 @cindex ANOVA
 628
 629 @display
 630 ONEWAY
 631         [/VARIABLES = ] var_list BY var
 632         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 633         /CONTRASTS= value1 [, value2] ... [,valueN]
 634         /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
 635
 636 @end display
 637
 638 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
 639 variables factored by a single independent variable.
 640 It is used to compare the means of a population
 641 divided into more than two groups.
 642
 643 The  variables to be analysed should be given in the @code{VARIABLES}
 644 subcommand.
 645 The list of variables must be followed by the @code{BY} keyword and
 646 the name of the independent (or factor) variable.
 647
 648 You can use the @code{STATISTICS} subcommand to tell PSPP to display
 649 ancillary information.  The options accepted are:
 650 @table @code
 651 @item DESCRIPTIVES
 652 Displays descriptive statistics about the groups factored by the independent
 653 variable.
 654 @item HOMOGENEITY
 655 Displays the Levene test of Homogeneity of Variance for the
 656 variables and their groups.
 657 @end table
 658
 659 The @code{CONTRASTS} subcommand is used when you anticipate certain
 660 differences between the groups.
 661 The subcommand must be followed by a list of numerals which are the
 662 coefficients of the groups to be tested.
 663 The number of coefficients must correspond to the number of distinct
 664 groups (or values of the independent variable).
 665 If the sum of the coefficients is not zero, then PSPP will
 666 display a warning, but will proceed with the analysis.
 667 The @code{CONTRASTS} subcommand may be given up to 10 times in order
 668 to specify different contrast tests.
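
A minimal sketch with two contrasts, assuming a hypothetical dependent
variable @code{score} and a hypothetical factor @code{group} that takes
exactly three distinct values.  The space-separated keyword list on
STATISTICS is an assumption about the accepted form:

@example
* score and group are hypothetical; group is assumed to have 3 values.
ONEWAY
        /VARIABLES=score BY group
        /STATISTICS=DESCRIPTIVES HOMOGENEITY
        /CONTRASTS=-1, 0, 1
        /CONTRASTS=-1, 2, -1.
@end example

@noindent
Each contrast supplies one coefficient per group, and each list sums to
zero, so no warning would be expected.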
 669 @setfilename ignored
 670
 671 @node RANK, REGRESSION, ONEWAY, Statistics
 672 @comment  node-name,  next,  previous,  up
 673 @section RANK
 674
 675 @vindex RANK
 676 @display
 677 RANK
 678         [VARIABLES=] var_list [@{A,D@}] [BY var_list]
 679         /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
 680         /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
 681         /PRINT[=@{YES,NO@}]
 682         /MISSING=@{EXCLUDE,INCLUDE@}
 683
 684         /RANK [INTO var_list]
 685         /NTILES(k) [INTO var_list]
 686         /NORMAL [INTO var_list]
 687         /PERCENT [INTO var_list]
 688         /RFRACTION [INTO var_list]
 689         /PROPORTION [INTO var_list]
 690         /N [INTO var_list]
 691         /SAVAGE [INTO var_list]
 692 @end display
 693
 694 The @cmd{RANK} command ranks variables and stores the results into new
 695 variables.
 696
 697 The VARIABLES subcommand, which is mandatory, specifies one or
 698 more variables whose values are to be ranked.
 699 After each variable, @samp{A} or @samp{D} may appear, indicating that
 700 the variable is to be ranked in ascending or descending order.
 701 Ascending is the default.
 702 If a BY keyword appears, it should be followed by a list of variables
 703 which are to serve as group variables.
 704 In this case, the cases are gathered into groups, and ranks calculated
 705 for each group.
 706
 707 The TIES subcommand specifies how tied values are to be treated.  The
 708 default is to take the mean value of all the tied cases.
 709
 710 The FRACTION subcommand specifies how proportional ranks are to be
 711 calculated.  This only has any effect if NORMAL or PROPORTION rank
 712 functions are requested.
 713
 714 The PRINT subcommand may be used to specify that a summary of the rank
 715 variables created should appear in the output.
 716
 717 The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
 718 PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
 719 If none are given, then the default is RANK.
 720 The NTILES subcommand must take an integer specifying the number of
 721 partitions into which values should be ranked.
 722 Each subcommand may be followed by the INTO keyword and a list of
 723 variables which are the variables to be created and receive the rank
 724 scores.  There may be at most as many variables specified as there are
 725 variables named on the VARIABLES subcommand.  If fewer are specified,
 726 then the remaining variable names are automatically created.
 727
 728 The MISSING subcommand determines how user-missing values are to be
 729 treated.  A setting of EXCLUDE means that user-missing values are
 730 excluded from the rank scores.  A setting of
 731 INCLUDE means they are to be included.  The default is EXCLUDE.
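
The following sketch ranks one variable within groups and stores two
kinds of rank scores.  All variable names are hypothetical:
@code{score} and @code{class} are assumed to exist already, while
@code{rscore} and @code{qscore} are new variables to be created:

@example
* score and class are hypothetical; rscore and qscore are new names.
RANK
        VARIABLES=score BY class
        /TIES=LOW
        /RANK INTO rscore
        /NTILES(4) INTO qscore.
@end example

@noindent
Here @code{rscore} would receive the plain rank of @code{score} within
each value of @code{class}, and @code{qscore} would receive the
partition, out of four, into which each value falls.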
 732
 733 @include regression.texi