   1 @node Statistics, Utilities, Conditionals and Looping, Top
   2 @chapter Statistics
   3
   4 This chapter documents the statistical procedures that PSPP supports so
   5 far.
   6
   7 @menu
   8 * DESCRIPTIVES::                Descriptive statistics.
   9 * FREQUENCIES::                 Frequency tables.
  10 * EXAMINE::                     Testing data for normality.
  11 * CROSSTABS::                   Crosstabulation tables.
  12 * NPAR TESTS::                  Nonparametric tests.
  13 * T-TEST::                      Test hypotheses about means.
  14 * ONEWAY::                      One way analysis of variance.
  15 * RANK::                        Compute rank scores.
  16 * REGRESSION::                  Linear regression.
  17 @end menu
  18
  19 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
  20 @section DESCRIPTIVES
  21
  22 @vindex DESCRIPTIVES
  23 @display
  24 DESCRIPTIVES
  25         /VARIABLES=var_list
  26         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
  27         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
  28         /SAVE
  29         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
  30                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
  31                      SESKEWNESS,SEKURTOSIS@}
  32         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
  33                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
  34               @{A,D@}
  35 @end display
  36
  37 The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
  38 descriptive
  39 statistics requested by the user.  In addition, it can optionally
  40 compute Z-scores.
  41
  42 The VARIABLES subcommand, which is required, specifies the list of
  43 variables to be analyzed.  Keyword VARIABLES is optional.
  44
  45 All other subcommands are optional:
  46
  47 The MISSING subcommand determines the handling of missing values.  If
  48 INCLUDE is set, then user-missing values are included in the
  49 calculations.  If NOINCLUDE is set, which is the default, user-missing
  50 values are excluded.  If VARIABLE is set, then missing values are
  51 excluded on a variable by variable basis; if LISTWISE is set, then
  52 the entire case is excluded whenever any value in that case has a
  53 system-missing or, if NOINCLUDE is set, user-missing value.
  54
  55 The FORMAT subcommand affects the output format.  Currently the
  56 LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
  57 set, the numbers of both valid and missing cases are listed in the output;
  58 when LINE is set, only valid cases are listed.
  59
  60 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  61 the specified variables.  The Z scores are saved to new variables.
  62 Variable names are generated by trying first the original variable name
  63 with Z prepended and truncated to a maximum of 8 characters, then the
  64 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  65 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
  66 variable names can be specified explicitly on VARIABLES in the variable
  67 list by enclosing them in parentheses after each variable.
  68
  69 The STATISTICS subcommand specifies the statistics to be displayed:
  70
  71 @table @code
  72 @item ALL
  73 All of the statistics below.
  74 @item MEAN
  75 Arithmetic mean.
  76 @item SEMEAN
  77 Standard error of the mean.
  78 @item STDDEV
  79 Standard deviation.
  80 @item VARIANCE
  81 Variance.
  82 @item KURTOSIS
  83 Kurtosis and standard error of the kurtosis.
  84 @item SKEWNESS
  85 Skewness and standard error of the skewness.
  86 @item RANGE
  87 Range.
  88 @item MINIMUM
  89 Minimum value.
  90 @item MAXIMUM
  91 Maximum value.
  92 @item SUM
  93 Sum.
  94 @item DEFAULT
  95 Mean, standard deviation, minimum, maximum.
  96 @item SEKURTOSIS
  97 Standard error of the kurtosis.
  98 @item SESKEWNESS
  99 Standard error of the skewness.
 100 @end table
 101
 102 The SORT subcommand specifies how the statistics should be sorted.  Most
 103 of the possible values should be self-explanatory.  NAME causes the
 104 statistics to be sorted by variable name.  By default, the statistics are
 105 listed in the order that the variables are specified on the VARIABLES
 106 subcommand.  The A and D settings request an ascending or descending sort
 107 order, respectively.
 108
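As an illustration of the subcommands above, the following sketch
requests three statistics, saves Z scores, and sorts the output by
mean.  The variable names @code{height} and @code{weight} and the
Z score name @code{zht} are hypothetical.

@example
DESCRIPTIVES
        /VARIABLES=height (zht) weight
        /MISSING=LISTWISE
        /STATISTICS=MEAN STDDEV RANGE
        /SAVE
        /SORT=MEAN.
@end example

Assuming the name-generation rules described above, the Z scores for
@code{weight} would be saved under a generated name such as
@code{Zweight}.
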
 109 @node FREQUENCIES, EXAMINE, DESCRIPTIVES, Statistics
 110 @section FREQUENCIES
 111
 112 @vindex FREQUENCIES
 113 @display
 114 FREQUENCIES
 115         /VARIABLES=var_list
 116         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
 117                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
 118                 @{LABELS,NOLABELS@}
 119                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
 120                 @{SINGLE,DOUBLE@}
 121                 @{OLDPAGE,NEWPAGE@}
 122         /MISSING=@{EXCLUDE,INCLUDE@}
 123         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
 124                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
 125                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
 126         /NTILES=ntiles
 127         /PERCENTILES=percent@dots{}
 128         /HISTOGRAM=[MINIMUM(x_min)] [MAXIMUM(x_max)]
 129                    [@{FREQ,PCNT@}] [@{NONORMAL,NORMAL@}]
 130         /PIECHART=[MINIMUM(x_min)] [MAXIMUM(x_max)] @{NOMISSING,MISSING@}
 131
 132 (These options are not currently implemented.)
 133         /BARCHART=@dots{}
 134         /HBAR=@dots{}
 135         /GROUPED=@dots{}
 136
 137 (Integer mode.)
 138         /VARIABLES=var_list (low,high)@dots{}
 139 @end display
 140
 141 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
 142 variables.
 143 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
 144 (including median and mode) and percentiles.
 145
 146 @cmd{FREQUENCIES} also supports graphical output in the form of
 147 histograms and pie charts.  In the future, it will be able to produce
 148 bar charts and output percentiles for grouped data.
 149
 150 The VARIABLES subcommand is the only required subcommand.  Specify the
 151 variables to be analyzed.  In most cases, this is all that is required.
 152 This is known as @dfn{general mode}.
 153
 154 Occasionally, one may want to invoke a special mode called @dfn{integer
 155 mode}.  Normally, in general mode, PSPP will automatically determine
 156 what values occur in the data.  In integer mode, the user specifies the
 157 range of values that the data assumes.  To invoke this mode, specify a
 158 range of data values in parentheses, separated by a comma.  Data values
 159 inside the range are truncated to the nearest integer, then assigned to
 160 that value.  If values occur outside this range, they are discarded.
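
For example, assuming a hypothetical variable @code{rating} coded from
1 to 5, integer mode could be requested as sketched below.  Under the
rules above, a value of 3.7 would be counted as 3 and a value of 6
would be discarded.

@example
FREQUENCIES
        /VARIABLES=rating (1,5).
@end example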
 161
 162 The FORMAT subcommand controls the output format.  It has several
 163 possible settings:
 164
 165 @itemize @bullet
 166 @item
 167 TABLE, the default, causes a frequency table to be output for every
 168 variable specified.  NOTABLE prevents them from being output.  LIMIT
 169 with a numeric argument causes them to be output except when there are
 170 more than the specified number of values in the table.
 171
 172 @item
 173 STANDARD frequency tables contain more complete information, but also
 174 take up more space on the printed page.  CONDENSE frequency tables are
 175 less informative but take up less space.  ONEPAGE with a numeric
 176 argument will output standard frequency tables if there are the
 177 specified number of values or fewer, condensed tables otherwise.  ONEPAGE
 178 without an argument defaults to a threshold of 50 values.
 179
 180 @item
 181 LABELS causes value labels to be displayed in STANDARD frequency
 182 tables.  NOLABELS prevents this.
 183
 184 @item
 185 Normally frequency tables are sorted in ascending order by value.  This
 186 is AVALUE.  DVALUE tables are sorted in descending order by value.
 187 AFREQ and DFREQ tables are sorted in ascending and descending order,
 188 respectively, by frequency count.
 189
 190 @item
 191 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
 192 frequency tables have wider spacing.
 193
 194 @item
 195 OLDPAGE and NEWPAGE are not currently used.
 196 @end itemize
 197
 198 The MISSING subcommand controls the handling of user-missing values.
 199 When EXCLUDE, the default, is set, user-missing values are not included
 200 in frequency tables or statistics.  When INCLUDE is set, user-missing
 201 values are included.  System-missing values are never included in statistics,
 202 but are listed in frequency tables.
 203
 204 The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
 205 (@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
 206 value, and MODE, the mode.  (If there are multiple modes, the smallest
 207 value is reported.)  By default, the mean, standard deviation,
 208 minimum, and maximum are reported for each variable.
 209
 210 @cindex percentiles
 211 PERCENTILES causes the specified percentiles to be reported.
 212 The percentiles should be presented as a list of numbers between 0
 213 and 100 inclusive.
 214 The NTILES subcommand causes the percentiles to be reported at the
 215 boundaries of the data set divided into the specified number of ranges.
 216 For instance, @code{/NTILES=4} would cause quartiles to be reported.
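
As a sketch, the following command suppresses the frequency table and
reports quartiles together with the 5th and 95th percentiles.  The
variable name @code{income} is hypothetical.

@example
FREQUENCIES
        /VARIABLES=income
        /FORMAT=NOTABLE
        /NTILES=4
        /PERCENTILES=5 95.
@end example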
 217
 218 The HISTOGRAM subcommand causes the output to include a histogram for
 219 each specified variable.  The X axis by default ranges from the
 220 minimum to the maximum value observed in the data, but the MINIMUM and
 221 MAXIMUM keywords can set an explicit range.  The Y axis by default is
 222 labeled in frequencies; use the PERCENT keyword to cause it to be
 223 labeled in percent of the total observed count.  Specify NORMAL to
 224 superimpose a normal curve on the histogram.
 225
 226 The PIECHART subcommand adds a pie chart for each variable to the output.  Each
 227 slice represents one value, with the size of the slice proportional to
 228 the value's frequency.  By default, all non-missing values are given
 229 slices.  The MINIMUM and MAXIMUM keywords can be used to limit the
 230 displayed slices to a given range of values.  The MISSING keyword adds
 231 slices for missing values.
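
A minimal sketch of both chart types follows; the variable names
@code{age} and @code{region} are hypothetical.  The first command draws
a histogram over a fixed range with a normal curve superimposed, and
the second draws a pie chart that includes slices for missing values.

@example
FREQUENCIES
        /VARIABLES=age
        /FORMAT=NOTABLE
        /HISTOGRAM=MINIMUM(0) MAXIMUM(100) NORMAL.

FREQUENCIES
        /VARIABLES=region
        /PIECHART=MISSING.
@end example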
 232
 233 @node EXAMINE, CROSSTABS, FREQUENCIES, Statistics
 234 @comment  node-name,  next,  previous,  up
 235 @section EXAMINE
 236 @vindex EXAMINE
 237
 238 @cindex Normality, testing for
 239
 240 @display
 241 EXAMINE
 242         VARIABLES=var_list [BY factor_list ]
 243         /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
 244         /PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
 245         /CINTERVAL n
 246         /COMPARE=@{GROUPS,VARIABLES@}
 247         /ID=@{case_number, var_name@}
 248         /@{TOTAL,NOTOTAL@}
 249         /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
 250         /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
 251                 [@{NOREPORT,REPORT@}]
 252
 253 @end display
 254
 255 The @cmd{EXAMINE} command is used to test how closely a distribution follows a
 256 normal distribution.  It also shows you outliers and extreme values.
 257
 258 The VARIABLES subcommand specifies the dependent variables and the
 259 independent variables to use as factors for the analysis.   Variables
 260 listed before the first BY keyword are the dependent variables.
 261 The dependent variables may optionally be followed by a list of
 262 factors which tell PSPP how to break down the analysis for each
 263 dependent variable.  The format for each factor is
 264 @display
 265 var [BY var].
 266 @end display
 267
 268
 269 The STATISTICS subcommand specifies the analysis to be done.
 270 DESCRIPTIVES will produce a table showing some parametric and
 271 non-parametric statistics.  EXTREME produces a table showing extreme
 272 values of the dependent variable.  A number in parentheses determines
 273 how many upper and lower extremes to show.  The default number is 5.
 274
 275
 276 The PLOT subcommand specifies which plots are to be produced if any.
 277
 278 The COMPARE subcommand is only relevant if producing boxplots, and it is only
 279 useful if there is more than one dependent variable and at least one factor.  If
 280 /COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
 281 containing boxplots for all the factors.
 282 If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
 283 each containing one boxplot per dependent variable.
 284 If the /COMPARE subcommand is omitted, then PSPP uses the default value of
 285 /COMPARE=GROUPS.
 286
 287 The CINTERVAL subcommand specifies the confidence level to use when
 288 calculating the descriptive statistics.  The default is 95%.
 289
 290 @cindex percentiles
 291 The PERCENTILES subcommand specifies which percentiles are to be calculated,
 292 and which algorithm to use for calculating them.  The default is to
 293 calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
 294 HAVERAGE algorithm.
 295
 296 The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
 297 is given and factors have been specified in the VARIABLES subcommand,
 298 then statistics for the unfactored dependent variables are
 299 produced in addition to the factored variables.  If there are no
 300 factors specified then TOTAL and NOTOTAL have no effect.
 301
 302 @strong{Warning!}
 303 If many dependent variables are given, or factors are given for which
 304 there are many distinct values, then @cmd{EXAMINE} will produce a very
 305 large quantity of output.
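
The following sketch examines one dependent variable broken down by one
factor, shows the three most extreme values at each end, and uses a 90%
confidence interval.  The variable names @code{score} and @code{group}
are hypothetical, and the subcommands are written as they appear in the
syntax summary above.

@example
EXAMINE
        VARIABLES=score BY group
        /STATISTICS=DESCRIPTIVES EXTREME(3)
        /PLOT=BOXPLOT
        /CINTERVAL 90.
@end example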
 306
 307
 308 @node CROSSTABS, NPAR TESTS, EXAMINE, Statistics
 309 @section CROSSTABS
 310
 311 @vindex CROSSTABS
 312 @display
 313 CROSSTABS
 314         /TABLES=var_list BY var_list [BY var_list]@dots{}
 315         /MISSING=@{TABLE,INCLUDE,REPORT@}
 316         /WRITE=@{NONE,CELLS,ALL@}
 317         /FORMAT=@{TABLES,NOTABLES@}
 318                 @{LABELS,NOLABELS,NOVALLABS@}
 319                 @{PIVOT,NOPIVOT@}
 320                 @{AVALUE,DVALUE@}
 321                 @{NOINDEX,INDEX@}
 322                 @{BOX,NOBOX@}
 323         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
 324                 ASRESIDUAL,ALL,NONE@}
 325         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
 326                      KAPPA,ETA,CORR,ALL,NONE@}
 327
 328 (Integer mode.)
 329         /VARIABLES=var_list (low,high)@dots{}
 330 @end display
 331
 332 The @cmd{CROSSTABS} procedure displays crosstabulation
 333 tables requested by the user.  It can calculate several statistics for
 334 each cell in the crosstabulation tables.  In addition, a number of
 335 statistics can be calculated for each table itself.
 336
 337 The TABLES subcommand is used to specify the tables to be reported.  Any
 338 number of dimensions is permitted, and any number of variables per
 339 dimension is allowed.  The TABLES subcommand may be repeated as many
 340 times as needed.  This is the only required subcommand in @dfn{general
 341 mode}.
 342
 343 Occasionally, one may want to invoke a special mode called @dfn{integer
 344 mode}.  Normally, in general mode, PSPP automatically determines
 345 what values occur in the data.  In integer mode, the user specifies the
 346 range of values that the data assumes.  To invoke this mode, specify the
 347 VARIABLES subcommand, giving a range of data values in parentheses for
 348 each variable to be used on the TABLES subcommand.  Data values inside
 349 the range are truncated to the nearest integer, then assigned to that
 350 value.  If values occur outside this range, they are discarded.  When it
 351 is present, the VARIABLES subcommand must precede the TABLES
 352 subcommand.
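
For example, assuming hypothetical numeric variables @code{sex} coded
1 to 2 and @code{rating} coded 1 to 5, integer mode might be invoked
as follows.  Note that VARIABLES comes before TABLES.

@example
CROSSTABS
        /VARIABLES=sex (1,2) rating (1,5)
        /TABLES=sex BY rating.
@end example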
 353
 354 In general mode, numeric and string variables may be specified on
 355 TABLES.  Although long string variables are allowed, only their
 356 initial short-string parts are used.  In integer mode, only numeric
 357 variables are allowed.
 358
 359 The MISSING subcommand determines the handling of user-missing values.
 360 When set to TABLE, the default, missing values are dropped on a table by
 361 table basis.  When set to INCLUDE, user-missing values are included in
 362 tables and statistics.  When set to REPORT, which is allowed only in
 363 integer mode, user-missing values are included in tables but marked with
 364 an @samp{M} (for ``missing'') and excluded from statistical
 365 calculations.
 366
 367 Currently the WRITE subcommand is ignored.
 368
 369 The FORMAT subcommand controls the characteristics of the
 370 crosstabulation tables to be displayed.  It has a number of possible
 371 settings:
 372
 373 @itemize @bullet
 374 @item
 375 TABLES, the default, causes crosstabulation tables to be output.
 376 NOTABLES suppresses them.
 377
 378 @item
 379 LABELS, the default, allows variable labels and value labels to appear
 380 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
 381 labels but suppresses value labels.
 382
 383 @item
 384 PIVOT, the default, causes each TABLES subcommand to be displayed in a
 385 pivot table format.  NOPIVOT causes the old-style crosstabulation format
 386 to be used.
 387
 388 @item
 389 AVALUE, the default, causes values to be sorted in ascending order.
 390 DVALUE asserts a descending sort order.
 391
 392 @item
 393 INDEX/NOINDEX is currently ignored.
 394
 395 @item
 396 BOX/NOBOX is currently ignored.
 397 @end itemize
 398
 399 The CELLS subcommand controls the contents of each cell in the displayed
 400 crosstabulation table.  The possible settings are:
 401
 402 @table @asis
 403 @item COUNT
 404 Frequency count.
 405 @item ROW
 406 Row percent.
 407 @item COLUMN
 408 Column percent.
 409 @item TOTAL
 410 Table percent.
 411 @item EXPECTED
 412 Expected value.
 413 @item RESIDUAL
 414 Residual.
 415 @item SRESIDUAL
 416 Standardized residual.
 417 @item ASRESIDUAL
 418 Adjusted standardized residual.
 419 @item ALL
 420 All of the above.
 421 @item NONE
 422 Suppress cells entirely.
 423 @end table
 424
 425 @samp{/CELLS} without any settings specified requests COUNT, ROW,
 426 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
 427 will be selected.
 428
 429 The STATISTICS subcommand selects statistics for computation:
 430
 431 @table @asis
 432 @item CHISQ
 433 @cindex chisquare
 434 @cindex chi-square
 435
 436 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
 437 correction, linear-by-linear association.
 438 @item PHI
 439 Phi.
 440 @item CC
 441 Contingency coefficient.
 442 @item LAMBDA
 443 Lambda.
 444 @item UC
 445 Uncertainty coefficient.
 446 @item BTAU
 447 Tau-b.
 448 @item CTAU
 449 Tau-c.
 450 @item RISK
 451 Risk estimate.
 452 @item GAMMA
 453 Gamma.
 454 @item D
 455 Somers' D.
 456 @item KAPPA
 457 Cohen's Kappa.
 458 @item ETA
 459 Eta.
 460 @item CORR
 461 Spearman correlation, Pearson's r.
 462 @item ALL
 463 All of the above.
 464 @item NONE
 465 No statistics.
 466 @end table
 467
 468 Selected statistics are only calculated when appropriate for the
 469 statistic.  Certain statistics require tables of a particular size, and
 470 some statistics are calculated only in integer mode.
 471
 472 @samp{/STATISTICS} without any settings selects CHISQ.  If the
 473 STATISTICS subcommand is not given, no statistics are calculated.
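
A general-mode sketch that combines the CELLS and STATISTICS
subcommands is shown below; the variable names @code{region} and
@code{product} are hypothetical.

@example
CROSSTABS
        /TABLES=region BY product
        /FORMAT=AVALUE
        /CELLS=COUNT ROW EXPECTED
        /STATISTICS=CHISQ PHI.
@end example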
 474
 475 @strong{Please note:} Currently the implementation of CROSSTABS has the
 476 following bugs:
 477
 478 @itemize @bullet
 479 @item
 480 Pearson's R (but not Spearman) is off a little.
 481 @item
 482 T values for Spearman's R and Pearson's R are wrong.
 483 @item
 484 Significance of symmetric and directional measures is not calculated.
 485 @item
 486 Asymmetric ASEs and T values for lambda are wrong.
 487 @item
 488 ASE of Goodman and Kruskal's tau is not calculated.
 489 @item
 490 ASE of symmetric Somers' D is wrong.
 491 @item
 492 Approximate T of uncertainty coefficient is wrong.
 493 @end itemize
 494
 495 Fixes for any of these deficiencies would be welcomed.
 496
 497 @node NPAR TESTS, T-TEST, CROSSTABS, Statistics
 498 @section NPAR TESTS
 499
 500 @vindex NPAR TESTS
 501 @cindex nonparametric tests
 502
 503 @display
 504 NPAR TESTS
 505
 506      nonparametric test subcommands
 507      .
 508      .
 509      .
 510
 511      [ /STATISTICS=@{DESCRIPTIVES@} ]
 512
 513      [ /MISSING=@{ANALYSIS, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
 514 @end display
 515
 516 @cmd{NPAR TESTS} performs nonparametric tests.
 517 Nonparametric tests make very few assumptions about the distribution of the
 518 data.
 519 One or more tests may be specified by using the corresponding subcommand.
 520 If the /STATISTICS subcommand is also specified, then summary statistics are
 521 produced for each variable that is the subject of any test.
 522
 523
 524 @menu
 525 * BINOMIAL::                Binomial Test
 526 * CHISQUARE::               Chisquare Test
 527 @end menu
 528
 529
 530 @node    BINOMIAL,  CHISQUARE, NPAR TESTS, NPAR TESTS
 531 @subsection Binomial test
 532 @vindex BINOMIAL
 533 @cindex binomial test
 534
 535 @display
 536      [ /BINOMIAL[(p)]=var_list[(value1[, value2])] ]
 537 @end display
 538
 539 The binomial test compares the observed distribution of a dichotomous
 540 variable with that of a binomial distribution.
 541 The variable @var{p} specifies the test proportion of the binomial
 542 distribution.
 543 The default value of 0.5 is assumed if @var{p} is omitted.
 544
 545 If a single value appears after the variable list, then that value is
 546 used as the threshold to partition the observed values. Values less
 547 than or equal to the threshold value form the first category.  Values
 548 greater than the threshold form the second category.
 549
 550 If two values appear after the variable list, then they will be used
 551 as the values which a variable must take to be in the respective
 552 category.
 553 Cases for which a variable takes a value equal to neither of the specified
 554 values take no part in the test for that variable.
 555
 556 If no values appear, then the variable must assume dichotomous
 557 values.
 558 If more than two distinct, non-missing values for a variable
 559 under test are encountered then an error occurs.
 560
 561 If the test proportion is equal to 0.5, then a two tailed test is
 562 reported.  For any other test proportion, a one tailed test is
 563 reported.
 564 For one tailed tests, if the test proportion is less than
 565 or equal to the observed proportion, then the significance of
 566 observing the observed proportion or more is reported.
 567 If the test proportion is more than the observed proportion, then the
 568 significance of observing the observed proportion or less is reported.
 569 That is to say, the test is always performed in the observed
 570 direction.
 571
 572 PSPP uses a very precise approximation to the gamma function to
 573 compute the binomial significance.  Thus, exact results are reported
 574 even for very large sample sizes.
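
As a sketch, the following command compares the observed proportion of
a dichotomous variable with a test proportion of 0.3.  The hypothetical
variable @code{smoker} is assumed to be coded 1 and 2; under the rules
above, cases with any other value take no part in the test.

@example
NPAR TESTS
        /BINOMIAL(0.3)=smoker (1, 2)
        /STATISTICS=DESCRIPTIVES.
@end example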
 575
 576
 577
 578 @node    CHISQUARE, , BINOMIAL, NPAR TESTS
 579 @subsection Chisquare test
 580 @vindex CHISQUARE
 581 @cindex chisquare test
 582
 583
 584 @display
 585      [ /CHISQUARE=var_list[(lo,hi)] [/EXPECTED=@{EQUAL|f1, f2 @dots{} fn@}] ]
 586 @end display
 587
 588
 589 The chisquare test produces a chi-square statistic for the differences
 590 between the expected and observed frequencies of the categories of a variable.
 591 Optionally, a range of values may appear after the variable list.
 592 If a range is given, then non-integer values are truncated, and values
 593 outside the specified range are excluded from the analysis.
 594
 595 The /EXPECTED subcommand specifies the expected values of each
 596 category.
 597 There must be exactly one non-zero expected value for each observed
 598 category, or the EQUAL keyword must be specified.
 599 You may use the notation @var{n}*@var{f} to specify @var{n}
 600 consecutive expected categories all taking a frequency of @var{f}.
 601 The frequencies given are proportions, not absolute frequencies.  The
 602 sum of the frequencies need not be 1.
 603 If no /EXPECTED subcommand is given, then equal frequencies
 604 are expected.
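
For example, the sketch below tests a hypothetical variable
@code{colour}, restricted to the values 1 through 4, against expected
proportions of 1:1:1:2.  The notation @code{3*1} stands for three
consecutive categories with an expected frequency of 1 each.

@example
NPAR TESTS
        /CHISQUARE=colour (1,4)
        /EXPECTED=3*1, 2.
@end example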
 605
 606
 607 @node T-TEST, ONEWAY, NPAR TESTS, Statistics
 608 @comment  node-name,  next,  previous,  up
 609 @section T-TEST
 610
 611 @vindex T-TEST
 612
 613 @display
 614 T-TEST
 615         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 616         /CRITERIA=CIN(confidence)
 617
 618
 619 (One Sample mode.)
 620         TESTVAL=test_value
 621         /VARIABLES=var_list
 622
 623
 624 (Independent Samples mode.)
 625         GROUPS=var(value1 [, value2])
 626         /VARIABLES=var_list
 627
 628
 629 (Paired Samples mode.)
 630         PAIRS=var_list [WITH var_list [(PAIRED)] ]
 631
 632 @end display
 633
 634
 635 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
 636 means.
 637 It operates in one of three modes:
 638 @itemize
 639 @item One Sample mode.
 640 @item Independent Samples mode.
 641 @item Paired Samples mode.
 642 @end itemize
 643
 644 @noindent
 645 Each of these modes is described in more detail below.
 646 There are two optional subcommands which are common to all modes.
 647
 648 The @cmd{/CRITERIA} subcommand tells PSPP the confidence level used
 649 in the tests.  The default value is 0.95.
 650
 651
 652 The @cmd{MISSING} subcommand determines the handling of missing
 653 values.
 654 If INCLUDE is set, then user-missing values are included in the
 655 calculations, but system-missing values are not.
 656 If EXCLUDE is set, which is the default, user-missing
 657 values are excluded as well as system-missing values.
 659
 660 If LISTWISE is set, then the entire case is excluded from analysis
 661 whenever any variable  specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
 662 @cmd{/GROUPS} subcommands contains a missing value.
 663 If ANALYSIS is set, then missing values are excluded only in the analysis for
 664 which they would be needed. This is the default.
 665
 666
 667 @menu
 668 * One Sample Mode::             Testing against a hypothesised mean
 669 * Independent Samples Mode::    Testing two independent groups for equal mean
 670 * Paired Samples Mode::         Testing two interdependent groups for equal mean
 671 @end menu
 672
 673 @node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
 674 @subsection One Sample Mode
 675
 676 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
 677 This mode is used to test a population mean against a hypothesised
 678 mean.
 679 The value given to the @cmd{TESTVAL} subcommand is the value against
 680 which you wish to test.
 681 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 682 tell PSPP which variables you wish to test.
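
For instance, to test the population mean of a hypothetical variable
@code{iq} against a hypothesised mean of 100, one might write:

@example
T-TEST
        TESTVAL=100
        /VARIABLES=iq.
@end example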
 683
 684 @node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
 685 @comment  node-name,  next,  previous,  up
 686 @subsection Independent Samples Mode
 687
 688 The @cmd{GROUPS} subcommand invokes Independent Samples mode or
 689 `Groups' mode.
 690 This mode is used to test whether two groups of values have the
 691 same population mean.
 692 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 693 tell PSPP the dependent variables you wish to test.
 694
 695 The variable given in the @cmd{GROUPS} subcommand is the independent
 696 variable which determines to which group the samples belong.
 697 The values in parentheses are the specific values of the independent
 698 variable for each group.
 699 If the parentheses are omitted and no values are given, the default values
 700 of 1.0 and 2.0 are assumed.
 701
 702 If the independent variable is numeric,
 703 it is acceptable to specify only one value inside the parentheses.
 704 If you do this, cases where the independent variable is
 705 less than  or equal to this value belong to the first group, and cases
 706 greater than this value belong to the second group.
 707 When using this form of the @cmd{GROUPS} subcommand, missing values in
 708 the independent variable are excluded on a listwise basis, regardless
 709 of whether @cmd{/MISSING=LISTWISE} was specified.
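
Both forms of the @cmd{GROUPS} subcommand are sketched below, using the
hypothetical variables @code{sex}, @code{age}, and @code{height}.  The
first command compares cases where @code{sex} is 1 with cases where it
is 2; the second splits the cases at the value 40.

@example
T-TEST
        GROUPS=sex(1,2)
        /VARIABLES=height.

T-TEST
        GROUPS=age(40)
        /VARIABLES=height.
@end example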
 710
 711
 712 @node Paired Samples Mode,  , Independent Samples Mode, T-TEST
 713 @comment  node-name,  next,  previous,  up
 714 @subsection Paired Samples Mode
 715
 716 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
 717 Use this mode when repeated measures have been taken from the same
 718 samples.
 719 If the @code{WITH} keyword is omitted, then tables for all
 720 combinations of variables given in the @cmd{PAIRS} subcommand are
 721 generated.
 722 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 723 is also given, then the number of variables preceding @code{WITH}
 724 must be the same as the number following it.
 725 In this case, tables for each respective pair of variables are
 726 generated.
 727 In the event that the @code{WITH} keyword is given, but the
 728 @code{(PAIRED)} keyword is omitted, then tables for each combination
 729 of variable preceding @code{WITH} against variable following
 730 @code{WITH} are generated.
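
The three forms can be sketched as follows, using the hypothetical
variables @code{pre1}, @code{pre2}, @code{post1}, and @code{post2}:

@example
T-TEST PAIRS=pre1 pre2 post1.
T-TEST PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).
T-TEST PAIRS=pre1 pre2 WITH post1 post2.
@end example

Under the rules above, the first command would produce tables for the
three pairs among @code{pre1}, @code{pre2}, and @code{post1}; the second
would pair @code{pre1} with @code{post1} and @code{pre2} with
@code{post2}; the third would test all four combinations of a variable
before @code{WITH} against a variable after it.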
 731
 732
 733 @node ONEWAY, RANK, T-TEST, Statistics
 734 @comment  node-name,  next,  previous,  up
 735 @section ONEWAY
 736
 737 @vindex ONEWAY
 738 @cindex analysis of variance
 739 @cindex ANOVA
 740
 741 @display
 742 ONEWAY
 743         [/VARIABLES = ] var_list BY var
 744         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 745         /CONTRASTS= value1 [, value2] ... [,valueN]
 746         /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
 747
 748 @end display
 749
 750 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
 751 variables factored by a single independent variable.
 752 It is used to compare the means of a population
 753 divided into more than two groups.
 754
 755 The  variables to be analysed should be given in the @code{VARIABLES}
 756 subcommand.
 757 The list of variables must be followed by the @code{BY} keyword and
 758 the name of the independent (or factor) variable.
 759
 760 You can use the @code{STATISTICS} subcommand to tell PSPP to display
 761 ancillary information.  The options accepted are:
 762 @itemize
 763 @item DESCRIPTIVES
 764 Displays descriptive statistics about the groups factored by the independent
 765 variable.
 766 @item HOMOGENEITY
 767 Displays the Levene test of Homogeneity of Variance for the
 768 variables and their groups.
 769 @end itemize
 770
 771 The @code{CONTRASTS} subcommand is used when you anticipate certain
 772 differences between the groups.
 773 The subcommand must be followed by a list of numerals which are the
 774 coefficients of the groups to be tested.
 775 The number of coefficients must correspond to the number of distinct
 776 groups (or values of the independent variable).
 777 If the sum of the coefficients is not zero, then PSPP will
 778 display a warning, but will proceed with the analysis.
 779 The @code{CONTRASTS} subcommand may be given up to 10 times in order
 780 to specify different contrast tests.
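
As a sketch, assume a hypothetical dependent variable @code{score} and
a factor @code{group} with three distinct values.  The following
command requests both kinds of additional output and two contrasts,
each with one coefficient per group and coefficients that sum to zero.

@example
ONEWAY
        /VARIABLES=score BY group
        /STATISTICS=DESCRIPTIVES HOMOGENEITY
        /CONTRASTS=-1, 0, 1
        /CONTRASTS=-1, 2, -1.
@end example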
 781 @setfilename ignored
 782
 783 @node RANK, REGRESSION, ONEWAY, Statistics
 784 @comment  node-name,  next,  previous,  up
 785 @section RANK
 786
 787 @vindex RANK
 788 @display
 789 RANK
 790         [VARIABLES=] var_list [@{A,D@}] [BY var_list]
 791         /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
 792         /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
 793         /PRINT[=@{YES,NO@}]
 794         /MISSING=@{EXCLUDE,INCLUDE@}
 795
 796         /RANK [INTO var_list]
 797         /NTILES(k) [INTO var_list]
 798         /NORMAL [INTO var_list]
 799         /PERCENT [INTO var_list]
 800         /RFRACTION [INTO var_list]
 801         /PROPORTION [INTO var_list]
 802         /N [INTO var_list]
 803         /SAVAGE [INTO var_list]
 804 @end display
 805
 806 The @cmd{RANK} command ranks variables and stores the results into new
 807 variables.
 808
 809 The VARIABLES subcommand, which is mandatory, specifies one or
 810 more variables whose values are to be ranked.
 811 After each variable, @samp{A} or @samp{D} may appear, indicating that
 812 the variable is to be ranked in ascending or descending order.
 813 Ascending is the default.
 814 If a BY keyword appears, it should be followed by a list of variables
 815 which are to serve as group variables.
 816 In this case, the cases are gathered into groups, and ranks calculated
 817 for each group.
 818
 819 The TIES subcommand specifies how tied values are to be treated.  The
 820 default is to take the mean value of all the tied cases.
 821
 822 The FRACTION subcommand specifies how proportional ranks are to be
 823 calculated.  This only has any effect if NORMAL or PROPORTION rank
 824 functions are requested.
 825
 826 The PRINT subcommand may be used to specify that a summary of the rank
 827 variables created should appear in the output.
 828
 829 The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
 830 PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
 831 If none are given, then the default is RANK.
 832 The NTILES subcommand must take an integer specifying the number of
 833 partitions into which values should be ranked.
 834 Each subcommand may be followed by the INTO keyword and a list of
 835 variables which are the variables to be created and receive the rank
 836 scores.  There may be up to as many variables specified as there are
 837 variables named on the VARIABLES subcommand.  If fewer are specified,
 838 then the variable names are automatically created.
 839
 840 The MISSING subcommand determines how user-missing values are to be
 841 treated.  A setting of EXCLUDE means that user-missing values are
 842 excluded from the rank scores.  A setting of
 843 INCLUDE means they are to be included.  The default is EXCLUDE.
 844
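For example, the following sketch ranks a hypothetical variable
@code{score} within groups defined by a hypothetical variable
@code{class}, treating ties with the default MEAN method.  Ranks are
stored in @code{rscore} and quartile membership in @code{qscore}; both
names are illustrative.

@example
RANK
        VARIABLES=score BY class
        /TIES=MEAN
        /PRINT=YES
        /RANK INTO rscore
        /NTILES(4) INTO qscore.
@end example
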
 845 @include regression.texi