doc/statistics.texi

   1 @node Statistics
   2 @chapter Statistics
   3
   4 This chapter documents the statistical procedures that PSPP supports so
   5 far.
   6
   7 @menu
   8 * DESCRIPTIVES::                Descriptive statistics.
   9 * FREQUENCIES::                 Frequency tables.
  10 * EXAMINE::                     Testing data for normality.
  11 * CROSSTABS::                   Crosstabulation tables.
  12 * NPAR TESTS::                  Nonparametric tests.
  13 * T-TEST::                      Test hypotheses about means.
  14 * ONEWAY::                      One way analysis of variance.
  15 * RANK::                        Compute rank scores.
  16 * REGRESSION::                  Linear regression.
  17 @end menu
  18
  19 @node DESCRIPTIVES
  20 @section DESCRIPTIVES
  21
  22 @vindex DESCRIPTIVES
  23 @display
  24 DESCRIPTIVES
  25         /VARIABLES=var_list
  26         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
  27         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
  28         /SAVE
  29         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
  30                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
  31                      SESKEWNESS,SEKURTOSIS@}
  32         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
  33                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
  34               @{A,D@}
  35 @end display
  36
  37 The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
  38 descriptive
  39 statistics requested by the user.  In addition, it can optionally
  40 compute Z-scores.
  41
  42 The VARIABLES subcommand, which is required, specifies the list of
  43 variables to be analyzed.  Keyword VARIABLES is optional.
  44
  45 All other subcommands are optional:
  46
  47 The MISSING subcommand determines the handling of missing values.  If
  48 INCLUDE is set, then user-missing values are included in the
  49 calculations.  If NOINCLUDE is set, which is the default, user-missing
  50 values are excluded.  If VARIABLE is set, then missing values are
  51 excluded on a variable by variable basis; if LISTWISE is set, then
  52 the entire case is excluded whenever any value in that case has a
  53 system-missing or, if NOINCLUDE is set, user-missing value.
  54
  55 The FORMAT subcommand affects the output format.  Currently the
  56 LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
  57 set, the numbers of both valid and missing cases are listed in the output;
  58 when LINE is set, only the number of valid cases is listed.
  59
  60 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  61 the specified variables.  The Z scores are saved to new variables.
  62 Variable names are generated by trying first the original variable name
  63 with Z prepended and truncated to a maximum of 8 characters, then the
  64 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  65 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
  66 variable names can be specified explicitly on VARIABLES in the variable
  67 list by enclosing them in parentheses after each variable.
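
As a minimal sketch, assuming the active file contains hypothetical
numeric variables named @code{height} and @code{weight}, the following
requests a few statistics and saves Z scores.  Under the naming rule
described above, the new variables would be expected to take the
original names with Z prepended.

@example
* height and weight are hypothetical variable names.
DESCRIPTIVES
        /VARIABLES=height weight
        /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
        /SAVE.
@end example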
  68
  69 The STATISTICS subcommand specifies the statistics to be displayed:
  70
  71 @table @code
  72 @item ALL
  73 All of the statistics below.
  74 @item MEAN
  75 Arithmetic mean.
  76 @item SEMEAN
  77 Standard error of the mean.
  78 @item STDDEV
  79 Standard deviation.
  80 @item VARIANCE
  81 Variance.
  82 @item KURTOSIS
  83 Kurtosis and standard error of the kurtosis.
  84 @item SKEWNESS
  85 Skewness and standard error of the skewness.
  86 @item RANGE
  87 Range.
  88 @item MINIMUM
  89 Minimum value.
  90 @item MAXIMUM
  91 Maximum value.
  92 @item SUM
  93 Sum.
  94 @item DEFAULT
  95 Mean, standard deviation, minimum, maximum.
  96 @item SEKURTOSIS
  97 Standard error of the kurtosis.
  98 @item SESKEWNESS
  99 Standard error of the skewness.
 100 @end table
 101
 102 The SORT subcommand specifies how the statistics should be sorted.  Most
 103 of the possible values should be self-explanatory.  NAME causes the
 104 statistics to be sorted by variable name.  By default, the statistics are listed
 105 in the order that they are specified on the VARIABLES subcommand.  The A
 106 and D settings request an ascending or descending sort order,
 107 respectively.
 108
 109 @node FREQUENCIES
 110 @section FREQUENCIES
 111
 112 @vindex FREQUENCIES
 113 @display
 114 FREQUENCIES
 115         /VARIABLES=var_list
 116         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
 117                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
 118                 @{LABELS,NOLABELS@}
 119                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
 120                 @{SINGLE,DOUBLE@}
 121                 @{OLDPAGE,NEWPAGE@}
 122         /MISSING=@{EXCLUDE,INCLUDE@}
 123         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
 124                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
 125                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
 126         /NTILES=ntiles
 127         /PERCENTILES=percent@dots{}
 128         /HISTOGRAM=[MINIMUM(x_min)] [MAXIMUM(x_max)]
 129                    [@{FREQ,PCNT@}] [@{NONORMAL,NORMAL@}]
 130         /PIECHART=[MINIMUM(x_min)] [MAXIMUM(x_max)] @{NOMISSING,MISSING@}
 131
 132 (These options are not currently implemented.)
 133         /BARCHART=@dots{}
 134         /HBAR=@dots{}
 135         /GROUPED=@dots{}
 136 @end display
 137
 138 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
 139 variables.
 140 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
 141 (including median and mode) and percentiles.
 142
 143 @cmd{FREQUENCIES} also supports graphical output in the form of
 144 histograms and pie charts.  In the future, it will be able to produce
 145 bar charts and output percentiles for grouped data.
 146
 147 The VARIABLES subcommand is the only required subcommand.  Specify the
 148 variables to be analyzed.
 149
 150 The FORMAT subcommand controls the output format.  It has several
 151 possible settings:
 152
 153 @itemize @bullet
 154 @item
 155 TABLE, the default, causes a frequency table to be output for every
 156 variable specified.  NOTABLE prevents them from being output.  LIMIT
 157 with a numeric argument causes them to be output except when there are
 158 more than the specified number of values in the table.
 159
 160 @item
 161 STANDARD frequency tables contain more complete information, but also
 162 take up more space on the printed page.  CONDENSE frequency tables are
 163 less informative but take up less space.  ONEPAGE with a numeric
 164 argument will output standard frequency tables if there are the
 165 specified number of values or fewer, condensed tables otherwise.  ONEPAGE
 166 without an argument defaults to a threshold of 50 values.
 167
 168 @item
 169 LABELS causes value labels to be displayed in STANDARD frequency
 170 tables.  NOLABELS prevents this.
 171
 172 @item
 173 Normally frequency tables are sorted in ascending order by value.  This
 174 is AVALUE.  DVALUE tables are sorted in descending order by value.
 175 AFREQ and DFREQ tables are sorted in ascending and descending order,
 176 respectively, by frequency count.
 177
 178 @item
 179 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
 180 frequency tables have wider spacing.
 181
 182 @item
 183 OLDPAGE and NEWPAGE are not currently used.
 184 @end itemize
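
As an illustrative sketch, assuming a hypothetical variable named
@code{income}, the following combines two of the FORMAT settings
described above: the frequency table is suppressed if it would contain
more than 20 values, and is otherwise sorted by descending frequency
count.

@example
* income is a hypothetical variable name.
FREQUENCIES
        /VARIABLES=income
        /FORMAT=LIMIT(20) DFREQ.
@end example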
 185
 186 The MISSING subcommand controls the handling of user-missing values.
 187 When EXCLUDE, the default, is set, user-missing values are not included
 188 in frequency tables or statistics.  When INCLUDE is set, user-missing
 189 values are included.  System-missing values are never included in statistics,
 190 but are listed in frequency tables.
 191
 192 The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
 193 (@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
 194 value, and MODE, the mode.  (If there are multiple modes, the smallest
 195 value is reported.)  By default, the mean, standard deviation,
 196 minimum, and maximum are reported for each variable.
 197
 198 @cindex percentiles
 199 PERCENTILES causes the specified percentiles to be reported.
 200 The percentiles should be presented as a list of numbers between 0
 201 and 100 inclusive.
 202 The NTILES subcommand causes the percentiles to be reported at the
 203 boundaries of the data set divided into the specified number of ranges.
 204 For instance, @code{/NTILES=4} would cause quartiles to be reported.
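
A minimal sketch, again assuming a hypothetical variable named
@code{income}, that requests quartiles together with the 5th and 95th
percentiles while suppressing the frequency table itself:

@example
* income is a hypothetical variable name.
FREQUENCIES
        /VARIABLES=income
        /FORMAT=NOTABLE
        /NTILES=4
        /PERCENTILES=5 95.
@end example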
 205
 206 The HISTOGRAM subcommand causes the output to include a histogram for
 207 each specified variable.  The X axis by default ranges from the
 208 minimum to the maximum value observed in the data, but the MINIMUM and
 209 MAXIMUM keywords can set an explicit range.  The Y axis by default is
 210 labeled in frequencies; use the PERCENT keyword to cause it to be
 211 labeled in percent of the total observed count.  Specify NORMAL to
 212 superimpose a normal curve on the histogram.
 213
 214 The PIECHART subcommand adds a pie chart for each variable to the output.  Each
 215 slice represents one value, with the size of the slice proportional to
 216 the value's frequency.  By default, all non-missing values are given
 217 slices.  The MINIMUM and MAXIMUM keywords can be used to limit the
 218 displayed slices to a given range of values.  The MISSING keyword adds
 219 slices for missing values.
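
The chart subcommands might be combined as in the following sketch,
which assumes a hypothetical variable named @code{score} whose values
lie between 0 and 100.  It requests a histogram over that explicit
range with a superimposed normal curve, plus a pie chart that also
shows missing values.

@example
* score is a hypothetical variable name.
FREQUENCIES
        /VARIABLES=score
        /FORMAT=NOTABLE
        /HISTOGRAM=MINIMUM(0) MAXIMUM(100) NORMAL
        /PIECHART=MISSING.
@end example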
 220
 221 @node EXAMINE
 222 @comment  node-name,  next,  previous,  up
 223 @section EXAMINE
 224 @vindex EXAMINE
 225
 226 @cindex Normality, testing for
 227
 228 @display
 229 EXAMINE
 230         VARIABLES=var_list [BY factor_list ]
 231         /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
 232         /PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
 233         /CINTERVAL n
 234         /COMPARE=@{GROUPS,VARIABLES@}
 235         /ID=var_name
 236         /@{TOTAL,NOTOTAL@}
 237         /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
 238         /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
 239                 [@{NOREPORT,REPORT@}]
 240
 241 @end display
 242
 243 The @cmd{EXAMINE} command is used to test how closely a distribution follows a
 244 normal distribution.  It also shows you outliers and extreme values.
 245
 246 The VARIABLES subcommand specifies the dependent variables and the
 247 independent variables to use as factors for the analysis.   Variables
 248 listed before the first BY keyword are the dependent variables.
 249 The dependent variables may optionally be followed by a list of
 250 factors which tell PSPP how to break down the analysis for each
 251 dependent variable.  The format for each factor is
 252 @display
 253 var [BY var].
 254 @end display
 255
 256
 257 The STATISTICS subcommand specifies the analysis to be done.
 258 DESCRIPTIVES will produce a table showing some parametric and
 259 non-parametric statistics.  EXTREME produces a table showing extreme
 260 values of the dependent variable.  A number in parentheses determines
 261 how many upper and lower extremes to show.  The default number is 5.
 262
 263
 264 The PLOT subcommand specifies which plots are to be produced if any.
 265
 266 The COMPARE subcommand is only relevant if producing boxplots, and it is only
 267 useful if there is more than one dependent variable and at least one factor.   If
 268 /COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
 269 containing boxplots for all the factors.
 270 If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
 271 each containing one boxplot per dependent variable.
 272 If the /COMPARE subcommand is omitted, then PSPP uses the default value of
 273 /COMPARE=GROUPS.
 274
 275 The ID subcommand also pertains to boxplots.  If given, it must
 276 specify a variable name.   Outliers and extreme cases plotted in
 277 boxplots will be labelled with the value of that variable.  Numeric or
 278 string variables are permissible.  If the ID subcommand is not given,
 279 then the case number will be used for labelling.
 280
 281 The CINTERVAL subcommand specifies the confidence interval to use
 282 when calculating the DESCRIPTIVES statistics.  The default is 95%.
 283
 284 @cindex percentiles
 285 The PERCENTILES subcommand specifies which percentiles are to be calculated,
 286 and which algorithm to use for calculating them.  The default is to
 287 calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
 288 HAVERAGE algorithm.
 289
 290 The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
 291 is given and factors have been specified in the VARIABLES subcommand,
 292 then statistics for the unfactored dependent variables are
 293 produced in addition to the factored variables.  If there are no
 294 factors specified then TOTAL and NOTOTAL have no effect.
 295
 296 @strong{Warning!}
 297 If many dependent variables are given, or factors are given for which
 298 there are many distinct values, then @cmd{EXAMINE} will produce a very
 299 large quantity of output.
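
As a hedged sketch, assuming hypothetical variables @code{score} (a
numeric dependent variable), @code{group} (a factor) and
@code{subject} (an identifying variable), an analysis using the
subcommands described above might look like this:

@example
* score, group and subject are hypothetical variable names.
EXAMINE
        VARIABLES=score BY group
        /STATISTICS=DESCRIPTIVES EXTREME(3)
        /PLOT=BOXPLOT NPPLOT
        /ID=subject.
@end example

Here @code{EXTREME(3)} asks for the three highest and three lowest
values instead of the default five, and outliers in the boxplots are
labelled using @code{subject}.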
 300
 301
 302 @node CROSSTABS
 303 @section CROSSTABS
 304
 305 @vindex CROSSTABS
 306 @display
 307 CROSSTABS
 308         /TABLES=var_list BY var_list [BY var_list]@dots{}
 309         /MISSING=@{TABLE,INCLUDE,REPORT@}
 310         /WRITE=@{NONE,CELLS,ALL@}
 311         /FORMAT=@{TABLES,NOTABLES@}
 312                 @{LABELS,NOLABELS,NOVALLABS@}
 313                 @{PIVOT,NOPIVOT@}
 314                 @{AVALUE,DVALUE@}
 315                 @{NOINDEX,INDEX@}
 316                 @{BOX,NOBOX@}
 317         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
 318                 ASRESIDUAL,ALL,NONE@}
 319         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
 320                      KAPPA,ETA,CORR,ALL,NONE@}
 321
 322 (Integer mode.)
 323         /VARIABLES=var_list (low,high)@dots{}
 324 @end display
 325
 326 The @cmd{CROSSTABS} procedure displays crosstabulation
 327 tables requested by the user.  It can calculate several statistics for
 328 each cell in the crosstabulation tables.  In addition, a number of
 329 statistics can be calculated for each table itself.
 330
 331 The TABLES subcommand is used to specify the tables to be reported.  Any
 332 number of dimensions is permitted, and any number of variables per
 333 dimension is allowed.  The TABLES subcommand may be repeated as many
 334 times as needed.  This is the only required subcommand in @dfn{general
 335 mode}.
 336
 337 Occasionally, one may want to invoke a special mode called @dfn{integer
 338 mode}.  Normally, in general mode, PSPP automatically determines
 339 what values occur in the data.  In integer mode, the user specifies the
 340 range of values that the data assumes.  To invoke this mode, specify the
 341 VARIABLES subcommand, giving a range of data values in parentheses for
 342 each variable to be used on the TABLES subcommand.  Data values inside
 343 the range are truncated to the nearest integer, then assigned to that
 344 value.  If values occur outside this range, they are discarded.  When it
 345 is present, the VARIABLES subcommand must precede the TABLES
 346 subcommand.
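
A minimal sketch of integer mode, assuming hypothetical numeric
variables @code{sex}, coded 1 to 2, and @code{agegrp}, coded 1 to 5.
Note that VARIABLES precedes TABLES, as required:

@example
* sex and agegrp are hypothetical variable names.
CROSSTABS
        /VARIABLES=sex (1,2) agegrp (1,5)
        /TABLES=sex BY agegrp.
@end example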
 347
 348 In general mode, numeric and string variables may be specified on
 349 TABLES.  Although long string variables are allowed, only their
 350 initial short-string parts are used.  In integer mode, only numeric
 351 variables are allowed.
 352
 353 The MISSING subcommand determines the handling of user-missing values.
 354 When set to TABLE, the default, missing values are dropped on a table by
 355 table basis.  When set to INCLUDE, user-missing values are included in
 356 tables and statistics.  When set to REPORT, which is allowed only in
 357 integer mode, user-missing values are included in tables but marked with
 358 an @samp{M} (for ``missing'') and excluded from statistical
 359 calculations.
 360
 361 Currently the WRITE subcommand is ignored.
 362
 363 The FORMAT subcommand controls the characteristics of the
 364 crosstabulation tables to be displayed.  It has a number of possible
 365 settings:
 366
 367 @itemize @bullet
 368 @item
 369 TABLES, the default, causes crosstabulation tables to be output.
 370 NOTABLES suppresses them.
 371
 372 @item
 373 LABELS, the default, allows variable labels and value labels to appear
 374 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
 375 labels but suppresses value labels.
 376
 377 @item
 378 PIVOT, the default, causes each TABLES subcommand to be displayed in a
 379 pivot table format.  NOPIVOT causes the old-style crosstabulation format
 380 to be used.
 381
 382 @item
 383 AVALUE, the default, causes values to be sorted in ascending order.
 384 DVALUE asserts a descending sort order.
 385
 386 @item
 387 INDEX/NOINDEX is currently ignored.
 388
 389 @item
 390 BOX/NOBOX is currently ignored.
 391 @end itemize
 392
 393 The CELLS subcommand controls the contents of each cell in the displayed
 394 crosstabulation table.  The possible settings are:
 395
 396 @table @asis
 397 @item COUNT
 398 Frequency count.
 399 @item ROW
 400 Row percent.
 401 @item COLUMN
 402 Column percent.
 403 @item TOTAL
 404 Table percent.
 405 @item EXPECTED
 406 Expected value.
 407 @item RESIDUAL
 408 Residual.
 409 @item SRESIDUAL
 410 Standardized residual.
 411 @item ASRESIDUAL
 412 Adjusted standardized residual.
 413 @item ALL
 414 All of the above.
 415 @item NONE
 416 Suppress cells entirely.
 417 @end table
 418
 419 @samp{/CELLS} without any settings specified requests COUNT, ROW,
 420 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
 421 will be selected.
 422
 423 The STATISTICS subcommand selects statistics for computation:
 424
 425 @table @asis
 426 @item CHISQ
 427 @cindex chisquare
 428 @cindex chi-square
 429
 430 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
 431 correction, linear-by-linear association.
 432 @item PHI
 433 Phi.
 434 @item CC
 435 Contingency coefficient.
 436 @item LAMBDA
 437 Lambda.
 438 @item UC
 439 Uncertainty coefficient.
 440 @item BTAU
 441 Tau-b.
 442 @item CTAU
 443 Tau-c.
 444 @item RISK
 445 Risk estimate.
 446 @item GAMMA
 447 Gamma.
 448 @item D
 449 Somers' D.
 450 @item KAPPA
 451 Cohen's Kappa.
 452 @item ETA
 453 Eta.
 454 @item CORR
 455 Spearman correlation, Pearson's r.
 456 @item ALL
 457 All of the above.
 458 @item NONE
 459 No statistics.
 460 @end table
 461
 462 Selected statistics are only calculated when appropriate for the
 463 statistic.  Certain statistics require tables of a particular size, and
 464 some statistics are calculated only in integer mode.
 465
 466 @samp{/STATISTICS} without any settings selects CHISQ.  If the
 467 STATISTICS subcommand is not given, no statistics are calculated.
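
To illustrate how CELLS and STATISTICS combine in general mode, here is
a sketch that assumes hypothetical variables @code{sex} and
@code{agegrp}.  It displays counts with row and column percentages in
each cell and requests the chi-square tests and phi for the table.

@example
* sex and agegrp are hypothetical variable names.
CROSSTABS
        /TABLES=sex BY agegrp
        /CELLS=COUNT ROW COLUMN
        /STATISTICS=CHISQ PHI.
@end example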
 468
 469 @strong{Please note:} Currently the implementation of CROSSTABS has the
 470 following bugs:
 471
 472 @itemize @bullet
 473 @item
 474 Pearson's R (but not Spearman) is off a little.
 475 @item
 476 T values for Spearman's R and Pearson's R are wrong.
 477 @item
 478 Significance of symmetric and directional measures is not calculated.
 479 @item
 480 Asymmetric ASEs and T values for lambda are wrong.
 481 @item
 482 ASE of Goodman and Kruskal's tau is not calculated.
 483 @item
 484 ASE of symmetric Somers' d is wrong.
 485 @item
 486 Approximate T of uncertainty coefficient is wrong.
 487 @end itemize
 488
 489 Fixes for any of these deficiencies would be welcomed.
 490
 491 @node NPAR TESTS
 492 @section NPAR TESTS
 493
 494 @vindex NPAR TESTS
 495 @cindex nonparametric tests
 496
 497 @display
 498 NPAR TESTS
 499
 500      nonparametric test subcommands
 501      .
 502      .
 503      .
 504
 505      [ /STATISTICS=@{DESCRIPTIVES@} ]
 506
 507      [ /MISSING=@{ANALYSIS, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
 508 @end display
 509
 510 NPAR TESTS performs nonparametric tests.
 511 Nonparametric tests make very few assumptions about the distribution of the
 512 data.
 513 One or more tests may be specified by using the corresponding subcommand.
 514 If the /STATISTICS subcommand is also specified, then summary statistics are
 515 produced for each variable that is the subject of any test.
 516
 517
 518 @menu
 519 * BINOMIAL::                Binomial Test
 520 * CHISQUARE::               Chisquare Test
 521 @end menu
 522
 523
 524 @node    BINOMIAL
 525 @subsection Binomial test
 526 @vindex BINOMIAL
 527 @cindex binomial test
 528
 529 @display
 530      [ /BINOMIAL[(p)]=var_list[(value1[, value2])] ]
 531 @end display
 532
 533 The binomial test compares the observed distribution of a dichotomous
 534 variable with that of a binomial distribution.
 535 The variable @var{p} specifies the test proportion of the binomial
 536 distribution.
 537 The default value of 0.5 is assumed if @var{p} is omitted.
 538
 539 If a single value appears after the variable list, then that value is
 540 used as the threshold to partition the observed values. Values less
 541 than or equal to the threshold value form the first category.  Values
 542 greater than the threshold form the second category.
 543
 544 If two values appear after the variable list, then they will be used
 545 as the values which a variable must take to be in the respective
 546 category.
 547 Cases for which a variable takes a value equal to neither of the specified
 548 values, take no part in the test for that variable.
 549
 550 If no values appear, then the variable must assume dichotomous
 551 values.
 552 If more than two distinct, non-missing values for a variable
 553 under test are encountered then an error occurs.
 554
 555 If the test proportion is equal to 0.5, then a two tailed test is
 556 reported.   For any other test proportion, a one tailed test is
 557 reported.
 558 For one tailed tests, if the test proportion is less than
 559 or equal to the observed proportion, then the significance of
 560 observing the observed proportion or more is reported.
 561 If the test proportion is more than the observed proportion, then the
 562 significance of observing the observed proportion or less is reported.
 563 That is to say, the test is always performed in the observed
 564 direction.
 565
 566 PSPP uses a very precise approximation to the gamma function to
 567 compute the binomial significance.  Thus, exact results are reported
 568 even for very large sample sizes.
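
As a sketch of the forms described above, assume two hypothetical
variables: @code{passed}, coded 0 or 1, and @code{score}, a numeric
measurement.  The first test uses a test proportion of 0.3 and names
the two category values explicitly; the second keeps the default
proportion of 0.5 and partitions @code{score} at a threshold of 50.

@example
* passed and score are hypothetical variable names.
NPAR TESTS
        /BINOMIAL(0.3)=passed (0, 1)
        /BINOMIAL=score (50).
@end example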
 569
 570
 571
 572 @node    CHISQUARE
 573 @subsection Chisquare test
 574 @vindex CHISQUARE
 575 @cindex chisquare test
 576
 577
 578 @display
 579      [ /CHISQUARE=var_list[(lo,hi)] [/EXPECTED=@{EQUAL|f1, f2 @dots{} fn@}] ]
 580 @end display
 581
 582
 583 The chisquare test produces a chi-square statistic for the differences
 584 between the expected and observed frequencies of the categories of a variable.
 585 Optionally, a range of values may appear after the variable list.
 586 If a range is given, then non-integer values are truncated, and values
 587 outside the specified range are excluded from the analysis.
 588
 589 The /EXPECTED subcommand specifies the expected values of each
 590 category.
 591 There must be exactly one non-zero expected value for each observed
 592 category, or the EQUAL keyword must be specified.
 593 You may use the notation @var{n}*@var{f} to specify @var{n}
 594 consecutive expected categories all taking a frequency of @var{f}.
 595 The frequencies given are proportions, not absolute frequencies.  The
 596 sum of the frequencies need not be 1.
 597 If no /EXPECTED subcommand is given, then equal frequencies
 598 are expected.
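
The @var{n}*@var{f} notation can be sketched as follows, assuming a
hypothetical variable @code{colour} coded with the integers 1 to 4.
The expected frequencies below are in the proportions 1:1:3:3, that
is, the first two categories are each expected one third as often as
each of the last two.

@example
* colour is a hypothetical variable name.
NPAR TESTS
        /CHISQUARE=colour (1,4)
        /EXPECTED=2*1, 2*3.
@end example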
 599
 600
 601 @node T-TEST
 602 @comment  node-name,  next,  previous,  up
 603 @section T-TEST
 604
 605 @vindex T-TEST
 606
 607 @display
 608 T-TEST
 609         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 610         /CRITERIA=CIN(confidence)
 611
 612
 613 (One Sample mode.)
 614         TESTVAL=test_value
 615         /VARIABLES=var_list
 616
 617
 618 (Independent Samples mode.)
 619         GROUPS=var(value1 [, value2])
 620         /VARIABLES=var_list
 621
 622
 623 (Paired Samples mode.)
 624         PAIRS=var_list [WITH var_list [(PAIRED)] ]
 625
 626 @end display
 627
 628
 629 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
 630 means.
 631 It operates in one of three modes:
 632 @itemize
 633 @item One Sample mode.
 634 @item Independent Samples mode.
 635 @item Paired Samples mode.
 636 @end itemize
 637
 638 @noindent
 639 Each of these modes is described in more detail below.
 640 There are two optional subcommands which are common to all modes.
 641
 642 The @cmd{/CRITERIA} subcommand tells PSPP the confidence interval used
 643 in the tests.  The default value is 0.95.
 644
 645
 646 The @cmd{MISSING} subcommand determines the handling of missing
 647 values.
 648 If INCLUDE is set, then user-missing values are included in the
 649 calculations, but system-missing values are not.
 650 If EXCLUDE is set, user-missing
 651 values are excluded as well as system-missing values.
 652 This is the default.
 653
 654 If LISTWISE is set, then the entire case is excluded from analysis
 655 whenever any variable  specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
 656 @cmd{/GROUPS} subcommands contains a missing value.
 657 If ANALYSIS is set, then missing values are excluded only in the analysis for
 658 which they would be needed. This is the default.
 659
 660
 661 @menu
 662 * One Sample Mode::             Testing against a hypothesised mean
 663 * Independent Samples Mode::    Testing two independent groups for equal mean
 664 * Paired Samples Mode::         Testing two interdependent groups for equal mean
 665 @end menu
 666
 667 @node One Sample Mode
 668 @subsection One Sample Mode
 669
 670 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
 671 This mode is used to test a population mean against a hypothesised
 672 mean.
 673 The value given to the @cmd{TESTVAL} subcommand is the value against
 674 which you wish to test.
 675 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 676 tell PSPP which variables you wish to test.
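
A minimal sketch of One Sample mode, assuming a hypothetical numeric
variable @code{iq} whose mean is to be tested against a hypothesised
value of 100, with the confidence interval raised from the default to
0.99:

@example
* iq is a hypothetical variable name.
T-TEST
        TESTVAL=100
        /VARIABLES=iq
        /CRITERIA=CIN(0.99).
@end example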
 677
 678 @node Independent Samples Mode
 679 @comment  node-name,  next,  previous,  up
 680 @subsection Independent Samples Mode
 681
 682 The @cmd{GROUPS} subcommand invokes Independent Samples mode or
 683 `Groups' mode.
 684 This mode is used to test whether two groups of values have the
 685 same population mean.
 686 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 687 tell PSPP the dependent variables you wish to test.
 688
 689 The variable given in the @cmd{GROUPS} subcommand is the independent
 690 variable which determines to which group the samples belong.
 691 The values in parentheses are the specific values of the independent
 692 variable for each group.
 693 If the parentheses are omitted and no values are given, the default values
 694 of 1.0 and 2.0 are assumed.
 695
 696 If the independent variable is numeric,
 697 it is acceptable to specify only one value inside the parentheses.
 698 If you do this, cases where the independent variable is
 699 greater than or equal to this value belong to the first group, and cases
 700 less than this value belong to the second group.
 701 When using this form of the @cmd{GROUPS} subcommand, missing values in
 702 the independent variable are excluded on a listwise basis, regardless
 703 of whether @cmd{/MISSING=LISTWISE} was specified.
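
Both forms of @cmd{GROUPS} can be sketched as follows, assuming
hypothetical variables @code{sex} (coded 1 and 2), @code{age},
@code{height} and @code{weight}.  The first command compares the two
groups defined by the explicit values of @code{sex}; the second splits
the cases at an @code{age} of 40.

@example
* sex, age, height and weight are hypothetical variable names.
T-TEST
        GROUPS=sex(1,2)
        /VARIABLES=height weight.

T-TEST
        GROUPS=age(40)
        /VARIABLES=height weight.
@end example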
 704
 705
 706 @node Paired Samples Mode
 707 @comment  node-name,  next,  previous,  up
 708 @subsection Paired Samples Mode
 709
 710 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
 711 Use this mode when repeated measures have been taken from the same
 712 samples.
 713 If the @code{WITH} keyword is omitted, then tables for all
 714 combinations of variables given in the @cmd{PAIRS} subcommand are
 715 generated.
 716 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 717 is also given, then the number of variables preceding @code{WITH}
 718 must be the same as the number following it.
 719 In this case, tables for each respective pair of variables are
 720 generated.
 721 In the event that the @code{WITH} keyword is given, but the
 722 @code{(PAIRED)} keyword is omitted, then tables for each combination
 723 of variable preceding @code{WITH} against variable following
 724 @code{WITH} are generated.
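
As an illustration, assume four hypothetical variables holding two
measurements taken before and after a treatment.  With
@code{(PAIRED)}, the sketch below would be expected to produce tables
for @code{pre1} against @code{post1} and for @code{pre2} against
@code{post2} only; without it, all four combinations of a variable
before @code{WITH} and a variable after it would be tested.

@example
* pre1, pre2, post1 and post2 are hypothetical variable names.
T-TEST
        PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).
@end example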
 725
 726
 727 @node ONEWAY
 728 @comment  node-name,  next,  previous,  up
 729 @section ONEWAY
 730
 731 @vindex ONEWAY
 732 @cindex analysis of variance
 733 @cindex ANOVA
 734
 735 @display
 736 ONEWAY
 737         [/VARIABLES = ] var_list BY var
 738         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 739         /CONTRAST= value1 [, value2] ... [,valueN]
 740         /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
 741
 742 @end display
 743
 744 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
 745 variables factored by a single independent variable.
 746 It is used to compare the means of a population
 747 divided into more than two groups.
 748
 749 The  variables to be analysed should be given in the @code{VARIABLES}
 750 subcommand.
 751 The list of variables must be followed by the @code{BY} keyword and
 752 the name of the independent (or factor) variable.
 753
 754 You can use the @code{STATISTICS} subcommand to tell PSPP to display
 755 ancillary information.  The options accepted are:
 756 @itemize
 757 @item DESCRIPTIVES
 758 Displays descriptive statistics about the groups factored by the independent
 759 variable.
 760 @item HOMOGENEITY
 761 Displays the Levene test of Homogeneity of Variance for the
 762 variables and their groups.
 763 @end itemize
 764
 765 The @code{CONTRAST} subcommand is used when you anticipate certain
 766 differences between the groups.
 767 The subcommand must be followed by a list of numerals which are the
 768 coefficients of the groups to be tested.
 769 The number of coefficients must correspond to the number of distinct
 770 groups (or values of the independent variable).
 771 If the total sum of the coefficients is not zero, then PSPP will
 772 display a warning, but will proceed with the analysis.
 773 The @code{CONTRAST} subcommand may be given up to 10 times in order
 774 to specify different contrast tests.
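
A hedged sketch, assuming a hypothetical dependent variable
@code{yield} and a factor @code{fert} that takes exactly three
distinct values.  Each @code{CONTRAST} supplies one coefficient per
group, and each list sums to zero: the first compares the first two
groups with each other, and the second compares those two groups
together against the third.

@example
* yield and fert are hypothetical variable names.
ONEWAY
        /VARIABLES=yield BY fert
        /STATISTICS=DESCRIPTIVES HOMOGENEITY
        /CONTRAST=1, -1, 0
        /CONTRAST=1, 1, -2.
@end example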
 775 @setfilename ignored
 776
 777 @node RANK
 778 @comment  node-name,  next,  previous,  up
 779 @section RANK
 780
 781 @vindex RANK
 782 @display
 783 RANK
 784         [VARIABLES=] var_list [@{A,D@}] [BY var_list]
 785         /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
 786         /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
 787         /PRINT[=@{YES,NO@}]
 788         /MISSING=@{EXCLUDE,INCLUDE@}
 789
 790         /RANK [INTO var_list]
 791         /NTILES(k) [INTO var_list]
 792         /NORMAL [INTO var_list]
 793         /PERCENT [INTO var_list]
 794         /RFRACTION [INTO var_list]
 795         /PROPORTION [INTO var_list]
 796         /N [INTO var_list]
 797         /SAVAGE [INTO var_list]
 798 @end display
 799
 800 The @cmd{RANK} command ranks variables and stores the results into new
 801 variables.
 802
 803 The VARIABLES subcommand, which is mandatory, specifies one or
 804 more variables whose values are to be ranked.
 805 After each variable, @samp{A} or @samp{D} may appear, indicating that
 806 the variable is to be ranked in ascending or descending order.
 807 Ascending is the default.
 808 If a BY keyword appears, it should be followed by a list of variables
 809 which are to serve as group variables.
 810 In this case, the cases are gathered into groups, and ranks calculated
 811 for each group.
 812
 813 The TIES subcommand specifies how tied values are to be treated.  The
 814 default is to take the mean value of all the tied cases.
 815
 816 The FRACTION subcommand specifies how proportional ranks are to be
 817 calculated.  This only has any effect if NORMAL or PROPORTION rank
 818 functions are requested.
 819
 820 The PRINT subcommand may be used to specify that a summary of the rank
 821 variables created should appear in the output.
 822
 823 The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
 824 PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
 825 If none are given, then the default is RANK.
 826 The NTILES subcommand must take an integer specifying the number of
 827 partitions into which values should be ranked.
 828 Each subcommand may be followed by the INTO keyword and a list of
 829 variables which are the variables to be created and receive the rank
 830 scores.  At most as many variables may be specified as there are
 831 variables named on the VARIABLES subcommand.  If fewer are specified,
 832 then the variable names are automatically created.
 833
 834 The MISSING subcommand determines how user-missing values are to be
 835 treated. A setting of EXCLUDE means that variables whose values are
 836 user-missing are to be excluded from the rank scores. A setting of
 837 INCLUDE means they are to be included.  The default is EXCLUDE.
 838
 839 @include regression.texi