doc/statistics.texi

   1 @node Statistics
   2 @chapter Statistics
   3
   4 This chapter documents the statistical procedures that PSPP supports so
   5 far.
   6
   7 @menu
   8 * DESCRIPTIVES::                Descriptive statistics.
   9 * FREQUENCIES::                 Frequency tables.
  10 * EXAMINE::                     Testing data for normality.
  11 * CORRELATIONS::                Correlation tables.
  12 * CROSSTABS::                   Crosstabulation tables.
  13 * FACTOR::                      Factor analysis and Principal Components analysis
  14 * NPAR TESTS::                  Nonparametric tests.
  15 * T-TEST::                      Test hypotheses about means.
  16 * ONEWAY::                      One way analysis of variance.
  17 * RANK::                        Compute rank scores.
  18 * REGRESSION::                  Linear regression.
  19 * RELIABILITY::                 Reliability analysis.
  20 * ROC::                         Receiver Operating Characteristic.
  21 @end menu
  22
  23 @node DESCRIPTIVES
  24 @section DESCRIPTIVES
  25
  26 @vindex DESCRIPTIVES
  27 @display
  28 DESCRIPTIVES
  29         /VARIABLES=var_list
  30         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
  31         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
  32         /SAVE
  33         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
  34                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
  35                      SESKEWNESS,SEKURTOSIS@}
  36         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
  37                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
  38               @{A,D@}
  39 @end display
  40
  41 The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
  42 descriptive
  43 statistics requested by the user.  In addition, it can optionally
  44 compute Z-scores.
  45
  46 The VARIABLES subcommand, which is required, specifies the list of
  47 variables to be analyzed.  Keyword VARIABLES is optional.
  48
  49 All other subcommands are optional:
  50
  51 The MISSING subcommand determines the handling of missing values.  If
  52 INCLUDE is set, then user-missing values are included in the
  53 calculations.  If NOINCLUDE is set, which is the default, user-missing
  54 values are excluded.  If VARIABLE is set, then missing values are
  55 excluded on a variable by variable basis; if LISTWISE is set, then
  56 the entire case is excluded whenever any value in that case has a
  57 system-missing or, if NOINCLUDE is set, user-missing value.
  58
  59 The FORMAT subcommand affects the output format.  Currently the
  60 LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
  61 set, the numbers of both valid and missing cases are listed in the output;
  62 when LINE is set, only valid cases are listed.
  63
  64 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
  65 the specified variables.  The Z scores are saved to new variables.
  66 Variable names are generated by trying first the original variable name
  67 with Z prepended and truncated to a maximum of 8 characters, then the
  68 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
  69 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z score
  70 variable names can be specified explicitly on VARIABLES in the variable
  71 list by enclosing them in parentheses after each variable.
  72
  73 The STATISTICS subcommand specifies the statistics to be displayed:
  74
  75 @table @code
  76 @item ALL
  77 All of the statistics below.
  78 @item MEAN
  79 Arithmetic mean.
  80 @item SEMEAN
  81 Standard error of the mean.
  82 @item STDDEV
  83 Standard deviation.
  84 @item VARIANCE
  85 Variance.
  86 @item KURTOSIS
  87 Kurtosis and standard error of the kurtosis.
  88 @item SKEWNESS
  89 Skewness and standard error of the skewness.
  90 @item RANGE
  91 Range.
  92 @item MINIMUM
  93 Minimum value.
  94 @item MAXIMUM
  95 Maximum value.
  96 @item SUM
  97 Sum.
  98 @item DEFAULT
  99 Mean, standard deviation, minimum, maximum.
 100 @item SEKURTOSIS
 101 Standard error of the kurtosis.
 102 @item SESKEWNESS
 103 Standard error of the skewness.
 104 @end table
 105
 106 The SORT subcommand specifies how the statistics should be sorted.  Most
 107 of the possible values should be self-explanatory.  NAME causes the
 108 statistics to be sorted by variable name.  By default, the statistics are
 109 listed in the order that the variables are specified on the VARIABLES
 110 subcommand.  The A and D settings request an ascending or descending sort
 111 order, respectively.
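
As a minimal sketch of how these subcommands combine, the following
syntax requests four statistics for two variables, sorts the output by
variable name, and saves Z scores.  The variable names @code{height}
and @code{weight} are hypothetical; they stand for any numeric
variables in the active file.

@example
DESCRIPTIVES
        /VARIABLES=height weight
        /MISSING=LISTWISE
        /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
        /SORT=NAME
        /SAVE.
@end example

Following the naming rule described above, the Z scores would be
expected in new variables named by prepending Z to each original name.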
 112
 113 @node FREQUENCIES
 114 @section FREQUENCIES
 115
 116 @vindex FREQUENCIES
 117 @display
 118 FREQUENCIES
 119         /VARIABLES=var_list
 120         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
 121                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
 122                 @{LABELS,NOLABELS@}
 123                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
 124                 @{SINGLE,DOUBLE@}
 125                 @{OLDPAGE,NEWPAGE@}
 126         /MISSING=@{EXCLUDE,INCLUDE@}
 127         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
 128                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
 129                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
 130         /NTILES=ntiles
 131         /PERCENTILES=percent@dots{}
 132         /HISTOGRAM=[MINIMUM(x_min)] [MAXIMUM(x_max)]
 133                    [@{FREQ,PCNT@}] [@{NONORMAL,NORMAL@}]
 134         /PIECHART=[MINIMUM(x_min)] [MAXIMUM(x_max)] @{NOMISSING,MISSING@}
 135
 136 (These options are not currently implemented.)
 137         /BARCHART=@dots{}
 138         /HBAR=@dots{}
 139         /GROUPED=@dots{}
 140 @end display
 141
 142 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
 143 variables.
 144 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
 145 (including median and mode) and percentiles.
 146
 147 @cmd{FREQUENCIES} also supports graphical output in the form of
 148 histograms and pie charts.  In the future, it will be able to produce
 149 bar charts and output percentiles for grouped data.
 150
 151 The VARIABLES subcommand is the only required subcommand.  Specify the
 152 variables to be analyzed.
 153
 154 The FORMAT subcommand controls the output format.  It has several
 155 possible settings:
 156
 157 @itemize @bullet
 158 @item
 159 TABLE, the default, causes a frequency table to be output for every
 160 variable specified.  NOTABLE prevents them from being output.  LIMIT
 161 with a numeric argument causes them to be output except when there are
 162 more than the specified number of values in the table.
 163
 164 @item
 165 STANDARD frequency tables contain more complete information, but also
 166 take up more space on the printed page.  CONDENSE frequency tables are
 167 less informative but take up less space.  ONEPAGE with a numeric
 168 argument will output standard frequency tables if there are the
 169 specified number of values or fewer, condensed tables otherwise.  ONEPAGE
 170 without an argument defaults to a threshold of 50 values.
 171
 172 @item
 173 LABELS causes value labels to be displayed in STANDARD frequency
 174 tables.  NOLABELS prevents this.
 175
 176 @item
 177 Normally frequency tables are sorted in ascending order by value.  This
 178 is AVALUE.  DVALUE tables are sorted in descending order by value.
 179 AFREQ and DFREQ tables are sorted in ascending and descending order,
 180 respectively, by frequency count.
 181
 182 @item
 183 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
 184 frequency tables have wider spacing.
 185
 186 @item
 187 OLDPAGE and NEWPAGE are not currently used.
 188 @end itemize
 189
 190 The MISSING subcommand controls the handling of user-missing values.
 191 When EXCLUDE, the default, is set, user-missing values are not included
 192 in frequency tables or statistics.  When INCLUDE is set, user-missing
 193 values are included.  System-missing values are never included in statistics,
 194 but are listed in frequency tables.
 195
 196 The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
 197 (@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
 198 value, and MODE, the mode.  (If there are multiple modes, the smallest
 199 value is reported.)  By default, the mean, standard deviation,
 200 minimum, and maximum are reported for each variable.
 201
 202 @cindex percentiles
 203 PERCENTILES causes the specified percentiles to be reported.
 204 The percentiles should be presented as a list of numbers between 0
 205 and 100 inclusive.
 206 The NTILES subcommand causes the percentiles to be reported at the
 207 boundaries of the data set divided into the specified number of ranges.
 208 For instance, @code{/NTILES=4} would cause quartiles to be reported.
 209
 210 The HISTOGRAM subcommand causes the output to include a histogram for
 211 each specified numeric variable.  The X axis by default ranges from the
 212 minimum to the maximum value observed in the data, but the MINIMUM and
 213 MAXIMUM keywords can set an explicit range.  The Y axis by default is
 214 labeled in frequencies; use the PERCENT keyword to cause it to be
 215 labeled in percent of the total observed count.  Specify NORMAL to
 216 superimpose a normal curve on the histogram.
 217 Histograms are not created for string variables.
 218
 219 The PIECHART subcommand adds a pie chart for each variable to the output.  Each
 220 slice represents one value, with the size of the slice proportional to
 221 the value's frequency.  By default, all non-missing values are given
 222 slices.  The MINIMUM and MAXIMUM keywords can be used to limit the
 223 displayed slices to a given range of values.  The MISSING keyword adds
 224 slices for missing values.
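
The following sketch shows one plausible combination of these
subcommands.  It suppresses the frequency tables, reports a few
statistics together with quartiles and two extra percentiles, and adds
a histogram with a superimposed normal curve.  The variable names
@code{age} and @code{income} are hypothetical.

@example
FREQUENCIES
        /VARIABLES=age income
        /FORMAT=NOTABLE
        /STATISTICS=MEAN MEDIAN MODE
        /NTILES=4
        /PERCENTILES=5 95
        /HISTOGRAM=NORMAL.
@end example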
 225
 226 @node EXAMINE
 227 @comment  node-name,  next,  previous,  up
 228 @section EXAMINE
 229 @vindex EXAMINE
 230
 231 @cindex Normality, testing for
 232
 233 @display
 234 EXAMINE
 235         VARIABLES=var_list [BY factor_list ]
 236         /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
 237         /PLOT=@{BOXPLOT, NPPLOT, HISTOGRAM, ALL, NONE@}
 238         /CINTERVAL n
 239         /COMPARE=@{GROUPS,VARIABLES@}
 240         /ID=var_name
 241         /@{TOTAL,NOTOTAL@}
 242         /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL @}
 243         /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
 244                 [@{NOREPORT,REPORT@}]
 245
 246 @end display
 247
 248 The @cmd{EXAMINE} command is used to test how closely a distribution follows a
 249 normal distribution.  It also shows you outliers and extreme values.
 250
 251 The VARIABLES subcommand specifies the dependent variables and the
 252 independent variables to use as factors for the analysis.   Variables
 253 listed before the first BY keyword are the dependent variables.
 254 The dependent variables may optionally be followed by a list of
 255 factors which tell PSPP how to break down the analysis for each
 256 dependent variable.  The format for each factor is
 257 @display
 258 var [BY var].
 259 @end display
 260
 261
 262 The STATISTICS subcommand specifies the analysis to be done.
 263 DESCRIPTIVES will produce a table showing some parametric and
 264 non-parametric statistics.  EXTREME produces a table showing extreme
 265 values of the dependent variable.  A number in parentheses determines
 266 how many upper and lower extremes to show.  The default number is 5.
 267
 268
 269 The PLOT subcommand specifies which plots are to be produced if any.
 270
 271 The COMPARE subcommand is only relevant if producing boxplots, and it is only
 272 useful if there is more than one dependent variable and at least one factor.   If
 273 /COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
 274 containing boxplots for all the factors.
 275 If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
 276 each containing one boxplot per dependent variable.
 277 If the /COMPARE subcommand is omitted, then PSPP uses the default value of
 278 /COMPARE=GROUPS.
 279
 280 The ID subcommand also pertains to boxplots.  If given, it must
 281 specify a variable name.   Outliers and extreme cases plotted in
 282 boxplots will be labelled with the value of that variable.  Numeric or
 283 string variables are permissible.  If the ID subcommand is not given,
 284 then the case number will be used for labelling.
 285
 286 The CINTERVAL subcommand specifies the confidence level to use when
 287 calculating the DESCRIPTIVES statistics.  The default is 95%.
 288
 289 @cindex percentiles
 290 The PERCENTILES subcommand specifies which percentiles are to be calculated,
 291 and which algorithm to use for calculating them.  The default is to
 292 calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
 293 HAVERAGE algorithm.
 294
 295 The TOTAL and NOTOTAL subcommands are mutually exclusive.  If TOTAL
 296 is given and factors have been specified in the VARIABLES subcommand,
 297 then statistics for the unfactored dependent variables are
 298 produced in addition to the factored variables.  If there are no
 299 factors specified then TOTAL and NOTOTAL have no effect.
 300
 301 @strong{Warning!}
 302 If many dependent variables are given, or factors are given for which
 303 there are many distinct values, then @cmd{EXAMINE} will produce a very
 304 large quantity of output.
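
For illustration only, the syntax below examines one dependent
variable broken down by one factor.  It requests the descriptives
table, the three most extreme values at each end, and two kinds of
plot, with outliers labelled by an identifying variable.  The names
@code{score}, @code{group}, and @code{subject} are hypothetical.

@example
EXAMINE
        VARIABLES=score BY group
        /STATISTICS=DESCRIPTIVES EXTREME(3)
        /PLOT=BOXPLOT NPPLOT
        /CINTERVAL 90
        /ID=subject.
@end example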
 305
 306 @node CORRELATIONS
 307 @section CORRELATIONS
 308
 309 @vindex CORRELATIONS
 310 @display
 311 CORRELATIONS
 312      /VARIABLES = varlist [ WITH varlist ]
 313      [
 314       .
 315       .
 316       .
 317       /VARIABLES = varlist [ WITH varlist ]
 318       /VARIABLES = varlist [ WITH varlist ]
 319      ]
 320
 321      [ /PRINT=@{TWOTAIL, ONETAIL@} @{SIG, NOSIG@} ]
 322      [ /STATISTICS=DESCRIPTIVES XPROD ALL]
 323      [ /MISSING=@{PAIRWISE, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
 324 @end display
 325
 326 @cindex correlation
 327 The @cmd{CORRELATIONS} procedure produces tables of the Pearson correlation coefficient
 328 for a set of variables.  The significance of the coefficients is also given.
 329
 330 At least one VARIABLES subcommand is required. If the WITH keyword is used, then a non-square
 331 correlation table will be produced.
 332 The variables preceding WITH will be used as the rows of the table, and the variables following
 333 it will be the columns of the table.
 334 If the WITH keyword is not given, then a square, symmetrical table using all variables is produced.
 335
 336
 337 The @cmd{MISSING} subcommand determines the handling of missing values.
 338 If INCLUDE is set, then user-missing values are included in the
 339 calculations, but system-missing values are not.
 340 If EXCLUDE is set, user-missing
 341 values are excluded as well as system-missing values.
 342 This is the default.
 343
 344 If LISTWISE is set, then the entire case is excluded from analysis
 345 whenever any variable  specified in any @cmd{/VARIABLES} subcommand
 346 contains a missing value.
 347 If PAIRWISE is set, then a case is considered missing only if either of the
 348 values for the particular coefficient is missing.
 349 The default is PAIRWISE.
 350
 351 The PRINT subcommand is used to control how the reported significance values are printed.
 352 If the TWOTAIL option is used, then a two-tailed test of significance is
 353 printed.  If the ONETAIL option is given, then a one-tailed test is used.
 354 The default is TWOTAIL.
 355
 356 If the NOSIG option is specified, then correlation coefficients with significance less than
 357 0.05 are highlighted.
 358 If SIG is specified, then no highlighting is performed.  This is the default.
 359
 360 @cindex covariance
 361 The STATISTICS subcommand requests additional statistics to be displayed.  The keyword
 362 DESCRIPTIVES requests that the mean, number of non-missing cases, and the non-biased
 363 estimator of the standard deviation be displayed.
 364 These statistics will be displayed in a separate table, for all the variables listed
 365 in any /VARIABLES subcommand.
 366 The XPROD keyword requests cross-product deviations and covariance estimators to
 367 be displayed for each pair of variables.
 368 The keyword ALL is the union of DESCRIPTIVES and XPROD.
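
As a sketch of the WITH form, the following syntax would produce a
non-square table with two rows and two columns, print two-tailed
significance values, and add the descriptive statistics table.  The
variable names are hypothetical.

@example
CORRELATIONS
     /VARIABLES = height weight WITH age income
     /PRINT = TWOTAIL NOSIG
     /STATISTICS = DESCRIPTIVES
     /MISSING = LISTWISE EXCLUDE.
@end example

Omitting @code{WITH age income} would instead give a square table of
@code{height} against @code{weight}.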
 369
 370 @node CROSSTABS
 371 @section CROSSTABS
 372
 373 @vindex CROSSTABS
 374 @display
 375 CROSSTABS
 376         /TABLES=var_list BY var_list [BY var_list]@dots{}
 377         /MISSING=@{TABLE,INCLUDE,REPORT@}
 378         /WRITE=@{NONE,CELLS,ALL@}
 379         /FORMAT=@{TABLES,NOTABLES@}
 380                 @{LABELS,NOLABELS,NOVALLABS@}
 381                 @{PIVOT,NOPIVOT@}
 382                 @{AVALUE,DVALUE@}
 383                 @{NOINDEX,INDEX@}
 384                 @{BOX,NOBOX@}
 385         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
 386                 ASRESIDUAL,ALL,NONE@}
 387         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
 388                      KAPPA,ETA,CORR,ALL,NONE@}
 389
 390 (Integer mode.)
 391         /VARIABLES=var_list (low,high)@dots{}
 392 @end display
 393
 394 The @cmd{CROSSTABS} procedure displays crosstabulation
 395 tables requested by the user.  It can calculate several statistics for
 396 each cell in the crosstabulation tables.  In addition, a number of
 397 statistics can be calculated for each table itself.
 398
 399 The TABLES subcommand is used to specify the tables to be reported.  Any
 400 number of dimensions is permitted, and any number of variables per
 401 dimension is allowed.  The TABLES subcommand may be repeated as many
 402 times as needed.  This is the only required subcommand in @dfn{general
 403 mode}.
 404
 405 Occasionally, one may want to invoke a special mode called @dfn{integer
 406 mode}.  Normally, in general mode, PSPP automatically determines
 407 what values occur in the data.  In integer mode, the user specifies the
 408 range of values that the data assumes.  To invoke this mode, specify the
 409 VARIABLES subcommand, giving a range of data values in parentheses for
 410 each variable to be used on the TABLES subcommand.  Data values inside
 411 the range are truncated to the nearest integer, then assigned to that
 412 value.  If values occur outside this range, they are discarded.  When it
 413 is present, the VARIABLES subcommand must precede the TABLES
 414 subcommand.
 415
 416 In general mode, numeric and string variables may be specified on
 417 TABLES.  In integer mode, only numeric variables are allowed.
 418
 419 The MISSING subcommand determines the handling of user-missing values.
 420 When set to TABLE, the default, missing values are dropped on a table by
 421 table basis.  When set to INCLUDE, user-missing values are included in
 422 tables and statistics.  When set to REPORT, which is allowed only in
 423 integer mode, user-missing values are included in tables but marked with
 424 an @samp{M} (for ``missing'') and excluded from statistical
 425 calculations.
 426
 427 Currently the WRITE subcommand is ignored.
 428
 429 The FORMAT subcommand controls the characteristics of the
 430 crosstabulation tables to be displayed.  It has a number of possible
 431 settings:
 432
 433 @itemize @bullet
 434 @item
 435 TABLES, the default, causes crosstabulation tables to be output.
 436 NOTABLES suppresses them.
 437
 438 @item
 439 LABELS, the default, allows variable labels and value labels to appear
 440 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
 441 labels but suppresses value labels.
 442
 443 @item
 444 PIVOT, the default, causes each TABLES subcommand to be displayed in a
 445 pivot table format.  NOPIVOT causes the old-style crosstabulation format
 446 to be used.
 447
 448 @item
 449 AVALUE, the default, causes values to be sorted in ascending order.
 450 DVALUE asserts a descending sort order.
 451
 452 @item
 453 INDEX/NOINDEX is currently ignored.
 454
 455 @item
 456 BOX/NOBOX is currently ignored.
 457 @end itemize
 458
 459 The CELLS subcommand controls the contents of each cell in the displayed
 460 crosstabulation table.  The possible settings are:
 461
 462 @table @asis
 463 @item COUNT
 464 Frequency count.
 465 @item ROW
 466 Row percent.
 467 @item COLUMN
 468 Column percent.
 469 @item TOTAL
 470 Table percent.
 471 @item EXPECTED
 472 Expected value.
 473 @item RESIDUAL
 474 Residual.
 475 @item SRESIDUAL
 476 Standardized residual.
 477 @item ASRESIDUAL
 478 Adjusted standardized residual.
 479 @item ALL
 480 All of the above.
 481 @item NONE
 482 Suppress cells entirely.
 483 @end table
 484
 485 @samp{/CELLS} without any settings specified requests COUNT, ROW,
 486 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
 487 will be selected.
 488
 489 The STATISTICS subcommand selects statistics for computation:
 490
 491 @table @asis
 492 @item CHISQ
 493 @cindex chisquare
 494 @cindex chi-square
 495
 496 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
 497 correction, linear-by-linear association.
 498 @item PHI
 499 Phi.
 500 @item CC
 501 Contingency coefficient.
 502 @item LAMBDA
 503 Lambda.
 504 @item UC
 505 Uncertainty coefficient.
 506 @item BTAU
 507 Tau-b.
 508 @item CTAU
 509 Tau-c.
 510 @item RISK
 511 Risk estimate.
 512 @item GAMMA
 513 Gamma.
 514 @item D
 515 Somers' D.
 516 @item KAPPA
 517 Cohen's Kappa.
 518 @item ETA
 519 Eta.
 520 @item CORR
 521 Spearman correlation, Pearson's r.
 522 @item ALL
 523 All of the above.
 524 @item NONE
 525 No statistics.
 526 @end table
 527
 528 Selected statistics are only calculated when appropriate for the
 529 statistic.  Certain statistics require tables of a particular size, and
 530 some statistics are calculated only in integer mode.
 531
 532 @samp{/STATISTICS} without any settings selects CHISQ.  If the
 533 STATISTICS subcommand is not given, no statistics are calculated.
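
The two modes can be sketched as follows, assuming hypothetical
variables @code{sex} and @code{region}.  The first command uses
general mode; the second invokes integer mode by giving a range for
each variable on VARIABLES before TABLES, as described above.  The
ranges shown are assumptions chosen for the example.

@example
CROSSTABS
        /TABLES=sex BY region
        /CELLS=COUNT ROW COLUMN
        /STATISTICS=CHISQ PHI.

CROSSTABS
        /VARIABLES=sex (1,2) region (1,4)
        /TABLES=sex BY region
        /CELLS=COUNT EXPECTED.
@end example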
 534
 535 @strong{Please note:} Currently the implementation of CROSSTABS has the
 536 following bugs:
 537
 538 @itemize @bullet
 539 @item
 540 Pearson's R (but not Spearman) is off a little.
 541 @item
 542 T values for Spearman's R and Pearson's R are wrong.
 543 @item
 544 Significance of symmetric and directional measures is not calculated.
 545 @item
 546 Asymmetric ASEs and T values for lambda are wrong.
 547 @item
 548 ASE of Goodman and Kruskal's tau is not calculated.
 549 @item
 550 ASE of symmetric Somers' d is wrong.
 551 @item
 552 Approximate T of uncertainty coefficient is wrong.
 553 @end itemize
 554
 555 Fixes for any of these deficiencies would be welcomed.
 556
 557 @node FACTOR
 558 @section FACTOR
 559
 560 @vindex FACTOR
 561 @cindex factor analysis
 562 @cindex principal components analysis
 563 @cindex principal axis factoring
 564 @cindex data reduction
 565
 566 @display
 567 FACTOR  VARIABLES=var_list
 568
 569         [ /METHOD = @{CORRELATION, COVARIANCE@} ]
 570
 571         [ /EXTRACTION=@{PC, PAF@}]
 572
 573         [ /PRINT=[INITIAL] [EXTRACTION] [UNIVARIATE] [CORRELATION] [COVARIANCE] [DET] [SIG] [ALL] [DEFAULT] ]
 574
 575         [ /PLOT=[EIGEN] ]
 576
 577         [ /FORMAT=[SORT] [BLANK(@var{n})] [DEFAULT] ]
 578
 579         [ /CRITERIA=[FACTORS(@var{n})] [MINEIGEN(@var{l})] [ITERATE(@var{m})] [ECONVERGE (@var{delta})] [DEFAULT] ]
 580
 581         [ /MISSING=[@{LISTWISE, PAIRWISE@}] [@{INCLUDE, EXCLUDE@}] ]
 582 @end display
 583
 584 The FACTOR command performs Factor Analysis or Principal Axis Factoring on a dataset.  It may be used to find
 585 common factors in the data or for data reduction purposes.
 586
 587 The VARIABLES subcommand is required.  It lists the variables which are to partake in the analysis.
 588
 589 The /EXTRACTION subcommand is used to specify the way in which factors (components) are extracted from the data.
 590 If PC is specified, then Principal Components Analysis is used.  If PAF is specified, then Principal Axis Factoring is
 591 used. By default Principal Components Analysis will be used.
 592
 593 The /METHOD subcommand should be used to determine whether the covariance matrix or the correlation matrix of the data is
 594 to be analysed.  By default, the correlation matrix is analysed.
 595
 596 The /PRINT subcommand may be used to select which features of the analysis are reported:
 597
 598 @itemize
 599 @item UNIVARIATE
 600       A table of mean values, standard deviations and total weights is printed.
 601 @item INITIAL
 602       Initial communalities and eigenvalues are printed.
 603 @item EXTRACTION
 604       Extracted communalities and eigenvalues are printed.
 605 @item CORRELATION
 606       The correlation matrix is printed.
 607 @item COVARIANCE
 608       The covariance matrix is printed.
 609 @item DET
 610       The determinant of the correlation or covariance matrix is printed.
 611 @item SIG
 612       The significance of the elements of the correlation matrix is printed.
 613 @item ALL
 614       All of the above are printed.
 615 @item DEFAULT
 616       Identical to INITIAL and EXTRACTION.
 617 @end itemize
 618
 619 If /PLOT=EIGEN is given, then a ``Scree'' plot of the eigenvalues will be printed.  This can be useful for visualising
 620 which factors (components) should be retained.
 621
 622 The /FORMAT subcommand determines how data are to be displayed in loading matrices.  If SORT is specified, then the variables
 623 are sorted in descending order of significance.  If BLANK(@var{n}) is specified, then coefficients whose absolute value is less
 624 than @var{n} will not be printed.  If the keyword DEFAULT is given, or if no /FORMAT subcommand is given, then no sorting is
 625 performed, and all coefficients will be printed.
 626
 627 The /CRITERIA subcommand is used to specify how the number of extracted factors (components) is chosen.  If FACTORS(@var{n}) is
 628 specified, where @var{n} is an integer, then @var{n} factors will be extracted.  Otherwise, the MINEIGEN setting will
 629 be used.  MINEIGEN(@var{l}) requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted.
 630 The default value of @var{l} is 1.    The ECONVERGE and ITERATE settings have effect only when iterative algorithms for factor
 631 extraction (such as Principal Axis Factoring) are used.   ECONVERGE(@var{delta}) specifies that iteration should cease when
 632 the maximum absolute difference in the communality estimates between one iteration and the previous is less than @var{delta}. The
 633 default value of @var{delta} is 0.001.
 634 The ITERATE(@var{m}) setting sets the maximum number of iterations to @var{m}.  The default value of @var{m} is 25.
 635
 636 The @cmd{MISSING} subcommand determines the handling of missing values.
 637 If INCLUDE is set, then user-missing values are included in the
 638 calculations, but system-missing values are not.
 639 If EXCLUDE is set, user-missing
 640 values are excluded as well as system-missing values.
 641 This is the default.
 642 If LISTWISE is set, then the entire case is excluded from analysis
 643 whenever any variable  specified in the @cmd{VARIABLES} subcommand
 644 contains a missing value.
 645 If PAIRWISE is set, then a case is considered missing only if either of the
 646 values for the particular coefficient is missing.
 647 The default is LISTWISE.
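
A possible invocation, given here only as a sketch, extracts factors
by Principal Axis Factoring from the correlation matrix, keeps factors
whose eigenvalues are at least 1, and hides small loadings.  The
variable names @code{q1} through @code{q6} and the numeric settings
are hypothetical choices, not recommendations.

@example
FACTOR  VARIABLES=q1 q2 q3 q4 q5 q6
        /METHOD = CORRELATION
        /EXTRACTION=PAF
        /PRINT=INITIAL EXTRACTION DET
        /PLOT=EIGEN
        /FORMAT=SORT BLANK(0.3)
        /CRITERIA=MINEIGEN(1) ITERATE(50) ECONVERGE(0.0001)
        /MISSING=LISTWISE.
@end example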
 648
 649
 650 @node NPAR TESTS
 651 @section NPAR TESTS
 652
 653 @vindex NPAR TESTS
 654 @cindex nonparametric tests
 655
 656 @display
 657 NPAR TESTS
 658
 659      nonparametric test subcommands
 660      .
 661      .
 662      .
 663
 664      [ /STATISTICS=@{DESCRIPTIVES@} ]
 665
 666      [ /MISSING=@{ANALYSIS, LISTWISE@} @{INCLUDE, EXCLUDE@} ]
 667
 668      [ /METHOD=EXACT [ TIMER [(n)] ] ]
 669 @end display
 670
 671 NPAR TESTS performs nonparametric tests.
 672 Nonparametric tests make very few assumptions about the distribution of the
 673 data.
 674 One or more tests may be specified by using the corresponding subcommand.
 675 If the /STATISTICS subcommand is also specified, then summary statistics are
 676 produced for each variable that is the subject of any test.
 677
 678 Certain tests may take a long time to execute if an exact figure is required.
 679 Therefore, by default asymptotic approximations are used unless the
 680 subcommand /METHOD=EXACT is specified.
 681 Exact tests give more accurate results, but may take an unacceptably long
 682 time to perform.  If the TIMER keyword is used, it sets a maximum time,
 683 after which the test will be abandoned, and a warning message printed.
 684 The time, in minutes, should be specified in parentheses after the TIMER keyword.
 685 If the TIMER keyword is given without this figure, then a default value of 5 minutes
 686 is used.
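
As an illustrative sketch, the command below runs one test (described
later in this section), requests summary statistics for the tested
variables, and asks for exact results with a 10 minute time limit.
The variable names @code{pre} and @code{post} are hypothetical.

@example
NPAR TESTS
     /WILCOXON pre WITH post (PAIRED)
     /STATISTICS=DESCRIPTIVES
     /METHOD=EXACT TIMER(10).
@end example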
 687
 688
 689 @menu
 690 * BINOMIAL::                Binomial Test
 691 * CHISQUARE::               Chisquare Test
 692 * WILCOXON::                Wilcoxon Signed Ranks Test
 693 * SIGN::                    The Sign Test
 694 @end menu
 695
 696
 697 @node    BINOMIAL
 698 @subsection Binomial test
 699 @vindex BINOMIAL
 700 @cindex binomial test
 701
 702 @display
 703      [ /BINOMIAL[(p)]=var_list[(value1[, value2])] ]
 704 @end display
 705
 706 The /BINOMIAL subcommand compares the observed distribution of a dichotomous
 707 variable with that of a binomial distribution.
 708 The variable @var{p} specifies the test proportion of the binomial
 709 distribution.
 710 The default value of 0.5 is assumed if @var{p} is omitted.
 711
 712 If a single value appears after the variable list, then that value is
 713 used as the threshold to partition the observed values. Values less
 714 than or equal to the threshold value form the first category.  Values
 715 greater than the threshold form the second category.
 716
 717 If two values appear after the variable list, then they will be used
 718 as the values which a variable must take to be in the respective
 719 category.
 720 Cases for which a variable takes a value equal to neither of the specified
 721 values take no part in the test for that variable.
 722
 723 If no values appear, then the variable must assume dichotomous
 724 values.
 725 If more than two distinct, non-missing values for a variable
 726 under test are encountered then an error occurs.
 727
 728 If the test proportion is equal to 0.5, then a two tailed test is
 729 reported.   For any other test proportion, a one tailed test is
 730 reported.
 731 For one tailed tests, if the test proportion is less than
 732 or equal to the observed proportion, then the significance of
 733 observing the observed proportion or more is reported.
 734 If the test proportion is more than the observed proportion, then the
 735 significance of observing the observed proportion or less is reported.
 736 That is to say, the test is always performed in the observed
 737 direction.
 738
 739 PSPP uses a very precise approximation to the gamma function to
 740 compute the binomial significance.  Thus, exact results are reported
 741 even for very large sample sizes.
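
The two ways of defining the categories can be sketched as follows.
The first command tests a hypothetical variable @code{outcome} against
a test proportion of 0.3, using the values 0 and 1 as the two
categories.  The second uses the default proportion of 0.5 and a
threshold of 50 on a hypothetical variable @code{score}, so that
values of 50 or less form the first category.

@example
NPAR TESTS
     /BINOMIAL(0.3)=outcome (0, 1).

NPAR TESTS
     /BINOMIAL=score (50).
@end example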
 742
 743
 744
 745 @node    CHISQUARE
 746 @subsection Chisquare Test
 747 @vindex CHISQUARE
 748 @cindex chisquare test
 749
 750
 751 @display
 752      [ /CHISQUARE=var_list[(lo,hi)] [/EXPECTED=@{EQUAL|f1, f2 @dots{} fn@}] ]
 753 @end display
 754
 755
 756 The /CHISQUARE subcommand produces a chi-square statistic for the differences
 757 between the expected and observed frequencies of the categories of a variable.
 758 Optionally, a range of values may appear after the variable list.
 759 If a range is given, then non-integer values are truncated, and values
 760 outside the specified range are excluded from the analysis.
 761
 762 The /EXPECTED subcommand specifies the expected values of each
 763 category.
 764 There must be exactly one non-zero expected value for each observed
 765 category, or the EQUAL keyword must be specified.
 766 You may use the notation @var{n}*@var{f} to specify @var{n}
 767 consecutive expected categories all taking a frequency of @var{f}.
 768 The frequencies given are proportions, not absolute frequencies.  The
 769 sum of the frequencies need not be 1.
 770 If no /EXPECTED subcommand is given, then equal frequencies
 771 are expected.
 772
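As a sketch of how the two subcommands combine, the following
hypothetical example tests whether each of the first two categories is
one third as frequent as each of the last two.  The variable
@code{colour} and its coding from 1 to 4 are assumptions made purely
for illustration.

@example
NPAR TESTS
        /CHISQUARE=colour (1, 4)
        /EXPECTED=2*1, 2*3.
@end example

@noindent
Here @code{2*1, 2*3} is shorthand for @code{1, 1, 3, 3}.  Because the
frequencies are proportions, @code{/EXPECTED=2*10, 2*30} would request
the same test.
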
 773 @node WILCOXON
 774 @subsection Wilcoxon Matched Pairs Signed Ranks Test
 775 @comment  node-name,  next,  previous,  up
 776 @vindex WILCOXON
 777 @cindex wilcoxon matched pairs signed ranks test
 778
 779 @display
 780      [ /WILCOXON varlist [ WITH varlist [ (PAIRED) ]]]
 781 @end display
 782
 783 The /WILCOXON subcommand tests for differences between medians of the
 784 variables listed.
 785 The test does not make any assumptions about the variances of the samples.
 786 It does, however, assume that the distribution is symmetrical.
 787
 788 If the @code{WITH} keyword is omitted, then tests for all
 789 combinations of the listed variables are performed.
 790 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 791 is also given, then the number of variables preceding @code{WITH}
 792 must be the same as the number following it.
 793 In this case, tests for each respective pair of variables are
 794 performed.
 795 If the @code{WITH} keyword is given, but the
 796 @code{(PAIRED)} keyword is omitted, then tests of each variable
 797 preceding @code{WITH} against each variable following
 798 @code{WITH} are performed.
 799
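The three forms might be written as follows.  The variable names
@code{a}, @code{b}, @code{c}, @code{x} and @code{y} are hypothetical
and serve only to show which pairs are tested.

@example
NPAR TESTS /WILCOXON a b c.
NPAR TESTS /WILCOXON a b WITH x y (PAIRED).
NPAR TESTS /WILCOXON a b WITH x y.
@end example

@noindent
The first command tests the pairs (a, b), (a, c) and (b, c).
The second tests only (a, x) and (b, y).
The third tests (a, x), (a, y), (b, x) and (b, y).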
 800
 801 @node SIGN
 802 @subsection Sign Test
 803 @vindex SIGN
 804 @cindex sign test
 805
 806 @display
 807      [ /SIGN varlist [ WITH varlist [ (PAIRED) ]]]
 808 @end display
 809
 810 The /SIGN subcommand tests for differences between medians of the
 811 variables listed.
 812 The test does not make any assumptions about the
 813 distribution of the data.
 814
 815 If the @code{WITH} keyword is omitted, then tests for all
 816 combinations of the listed variables are performed.
 817 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 818 is also given, then the number of variables preceding @code{WITH}
 819 must be the same as the number following it.
 820 In this case, tests for each respective pair of variables are
 821 performed.
 822 If the @code{WITH} keyword is given, but the
 823 @code{(PAIRED)} keyword is omitted, then tests of each variable
 824 preceding @code{WITH} against each variable following
 825 @code{WITH} are performed.
 826
 827 @node T-TEST
 828 @comment  node-name,  next,  previous,  up
 829 @section T-TEST
 830
 831 @vindex T-TEST
 832
 833 @display
 834 T-TEST
 835         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 836         /CRITERIA=CIN(confidence)
 837
 838
 839 (One Sample mode.)
 840         TESTVAL=test_value
 841         /VARIABLES=var_list
 842
 843
 844 (Independent Samples mode.)
 845         GROUPS=var(value1 [, value2])
 846         /VARIABLES=var_list
 847
 848
 849 (Paired Samples mode.)
 850         PAIRS=var_list [WITH var_list [(PAIRED)] ]
 851
 852 @end display
 853
 854
 855 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
 856 means.
 857 It operates in one of three modes:
 858 @itemize
 859 @item One Sample mode.
 860 @item Independent Samples mode.
 861 @item Paired Samples mode.
 862 @end itemize
 863
 864 @noindent
 865 Each of these modes is described in more detail below.
 866 There are two optional subcommands which are common to all modes.
 867
 868 The @cmd{/CRITERIA} subcommand sets the confidence level used
 869 in the tests.  The default value is 0.95.
 870
 871
 872 The @cmd{MISSING} subcommand determines the handling of missing
 873 values.
 874 If INCLUDE is set, then user-missing values are included in the
 875 calculations, but system-missing values are not.
 876 If EXCLUDE is set, which is the default, then both user-missing
 877 and system-missing values are excluded.
 879
 880 If LISTWISE is set, then the entire case is excluded from analysis
 881 whenever any variable specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
 882 @cmd{/GROUPS} subcommands contains a missing value.
 883 If ANALYSIS is set, then missing values are excluded only in the analysis for
 884 which they would be needed. This is the default.
 885
 886
 887 @menu
 888 * One Sample Mode::             Testing against a hypothesised mean
 889 * Independent Samples Mode::    Testing two independent groups for equal mean
 890 * Paired Samples Mode::         Testing two interdependent groups for equal mean
 891 @end menu
 892
 893 @node One Sample Mode
 894 @subsection One Sample Mode
 895
 896 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
 897 This mode is used to test a population mean against a hypothesised
 898 mean.
 899 The value given to the @cmd{TESTVAL} subcommand is the value against
 900 which you wish to test.
 901 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 902 tell PSPP which variables you wish to test.
 903
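For instance, assuming a hypothetical variable @code{score}, the
following sketch tests whether its population mean differs from 100:

@example
T-TEST
        TESTVAL=100
        /VARIABLES=score.
@end example
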
 904 @node Independent Samples Mode
 905 @comment  node-name,  next,  previous,  up
 906 @subsection Independent Samples Mode
 907
 908 The @cmd{GROUPS} subcommand invokes Independent Samples mode, also
 909 called `Groups' mode.
 910 This mode is used to test whether two groups of values have the
 911 same population mean.
 912 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
 913 tell PSPP the dependent variables you wish to test.
 914
 915 The variable given in the @cmd{GROUPS} subcommand is the independent
 916 variable which determines to which group the samples belong.
 917 The values in parentheses are the specific values of the independent
 918 variable for each group.
 919 If the parentheses are omitted and no values are given, the default values
 920 of 1.0 and 2.0 are assumed.
 921
 922 If the independent variable is numeric,
 923 it is acceptable to specify only one value inside the parentheses.
 924 If you do this, cases where the independent variable is
 925 greater than or equal to this value belong to the first group, and cases
 926 less than this value belong to the second group.
 927 When using this form of the @cmd{GROUPS} subcommand, missing values in
 928 the independent variable are excluded on a listwise basis, regardless
 929 of whether @cmd{/MISSING=LISTWISE} was specified.
 930
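The two forms of the @cmd{GROUPS} subcommand might be used as follows.
The variable names and group codes are hypothetical.

@example
T-TEST
        GROUPS=treatment(1, 2)
        /VARIABLES=weight height.

T-TEST
        GROUPS=age(40)
        /VARIABLES=weight.
@end example

@noindent
The first command compares the cases for which @code{treatment} is 1
with those for which it is 2.
The second compares the cases for which @code{age} is at least 40 with
those for which it is less than 40.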
 931
 932 @node Paired Samples Mode
 933 @comment  node-name,  next,  previous,  up
 934 @subsection Paired Samples Mode
 935
 936 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
 937 Use this mode when repeated measures have been taken from the same
 938 samples.
 939 If the @code{WITH} keyword is omitted, then tables for all
 940 combinations of variables given in the @cmd{PAIRS} subcommand are
 941 generated.
 942 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
 943 is also given, then the number of variables preceding @code{WITH}
 944 must be the same as the number following it.
 945 In this case, tables for each respective pair of variables are
 946 generated.
 947 If the @code{WITH} keyword is given, but the
 948 @code{(PAIRED)} keyword is omitted, then tables of each variable
 949 preceding @code{WITH} against each variable following
 950 @code{WITH} are generated.
 951
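As an illustration, with hypothetical variables holding measurements
taken before and after some treatment, the @code{(PAIRED)} form might
look like this:

@example
T-TEST
        PAIRS=pre1 pre2 WITH post1 post2 (PAIRED).
@end example

@noindent
This generates tables for @code{pre1} with @code{post1} and for
@code{pre2} with @code{post2} only.
Without @code{(PAIRED)}, tables for all four combinations would be
generated.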
 952
 953 @node ONEWAY
 954 @comment  node-name,  next,  previous,  up
 955 @section ONEWAY
 956
 957 @vindex ONEWAY
 958 @cindex analysis of variance
 959 @cindex ANOVA
 960
 961 @display
 962 ONEWAY
 963         [/VARIABLES = ] var_list BY var
 964         /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
 965         /CONTRAST= value1 [, value2] ... [,valueN]
 966         /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
 967
 968 @end display
 969
 970 The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
 971 variables factored by a single independent variable.
 972 It is used to compare the means of a population
 973 divided into more than two groups.
 974
 975 The variables to be analysed should be given in the @code{VARIABLES}
 976 subcommand.
 977 The list of variables must be followed by the @code{BY} keyword and
 978 the name of the independent (or factor) variable.
 979
 980 You can use the @code{STATISTICS} subcommand to tell PSPP to display
 981 ancillary information.  The options accepted are:
 982 @itemize
 983 @item DESCRIPTIVES
 984 Displays descriptive statistics about the groups factored by the independent
 985 variable.
 986 @item HOMOGENEITY
 987 Displays the Levene test of Homogeneity of Variance for the
 988 variables and their groups.
 989 @end itemize
 990
 991 The @code{CONTRAST} subcommand is used when you anticipate certain
 992 differences between the groups.
 993 The subcommand must be followed by a list of numerals which are the
 994 coefficients of the groups to be tested.
 995 The number of coefficients must correspond to the number of distinct
 996 groups (or values of the independent variable).
 997 If the sum of the coefficients is not zero, then PSPP will
 998 display a warning, but will proceed with the analysis.
 999 The @code{CONTRAST} subcommand may be given up to 10 times in order
1000 to specify different contrast tests.
1001
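The following sketch shows two contrasts in one command.  It assumes a
hypothetical dependent variable @code{yield} and a factor
@code{fertiliser} that takes exactly three distinct values, so each
@code{CONTRAST} subcommand lists three coefficients.

@example
ONEWAY
        /VARIABLES=yield BY fertiliser
        /STATISTICS=HOMOGENEITY
        /CONTRAST=-2, 1, 1
        /CONTRAST=0, 1, -1.
@end example

@noindent
The first contrast sets the first group against the other two taken
together, and the second sets the second group against the third.
Both sets of coefficients sum to zero, so no warning is displayed.
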
1002 @node RANK
1003 @comment  node-name,  next,  previous,  up
1004 @section RANK
1005
1006 @vindex RANK
1007 @display
1008 RANK
1009         [VARIABLES=] var_list [@{A,D@}] [BY var_list]
1010         /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
1011         /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
1012         /PRINT[=@{YES,NO@}]
1013         /MISSING=@{EXCLUDE,INCLUDE@}
1014
1015         /RANK [INTO var_list]
1016         /NTILES(k) [INTO var_list]
1017         /NORMAL [INTO var_list]
1018         /PERCENT [INTO var_list]
1019         /RFRACTION [INTO var_list]
1020         /PROPORTION [INTO var_list]
1021         /N [INTO var_list]
1022         /SAVAGE [INTO var_list]
1023 @end display
1024
1025 The @cmd{RANK} command ranks variables and stores the results into new
1026 variables.
1027
1028 The VARIABLES subcommand, which is mandatory, specifies one or
1029 more variables whose values are to be ranked.
1030 After each variable, @samp{A} or @samp{D} may appear, indicating that
1031 the variable is to be ranked in ascending or descending order.
1032 Ascending is the default.
1033 If a BY keyword appears, it should be followed by a list of variables
1034 which are to serve as group variables.
1035 In this case, the cases are gathered into groups, and ranks calculated
1036 for each group.
1037
1038 The TIES subcommand specifies how tied values are to be treated.  The
1039 default is to take the mean value of all the tied cases.
1040
1041 The FRACTION subcommand specifies how proportional ranks are to be
1042 calculated.  This only has any effect if NORMAL or PROPORTION rank
1043 functions are requested.
1044
1045 The PRINT subcommand may be used to specify that a summary of the rank
1046 variables created should appear in the output.
1047
1048 The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
1049 PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
1050 If none are given, then the default is RANK.
1051 The NTILES subcommand must take an integer specifying the number of
1052 partitions into which values should be ranked.
1053 Each subcommand may be followed by the INTO keyword and a list of
1054 variables to be created to receive the rank scores.  At most as many
1055 variables may be specified as there are variables named on the
1056 VARIABLES subcommand.  If fewer are specified, then names for the
1057 remaining variables are created automatically.
1058
1059 The MISSING subcommand determines how user-missing values are to be
1060 treated. A setting of EXCLUDE means that variables whose values are
1061 user-missing are to be excluded from the rank scores. A setting of
1062 INCLUDE means they are to be included.  The default is EXCLUDE.
1063
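As a sketch, assuming hypothetical variables @code{income} and
@code{region}, the following command ranks @code{income} separately
within each region and also assigns each case to one of four
partitions.  The names given after @code{INTO} are likewise
illustrative.

@example
RANK
        VARIABLES=income BY region
        /RANK INTO inc_rank
        /NTILES(4) INTO inc_quart.
@end example
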
1064 @include regression.texi
1065
1066
1067 @node RELIABILITY
1068 @section RELIABILITY
1069
1070 @vindex RELIABILITY
1071 @display
1072 RELIABILITY
1073         /VARIABLES=var_list
1074         /SCALE (@var{name}) = @{var_list, ALL@}
1075         /MODEL=@{ALPHA, SPLIT[(N)]@}
1076         /SUMMARY=@{TOTAL,ALL@}
1077         /MISSING=@{EXCLUDE,INCLUDE@}
1078 @end display
1079
1080 @cindex Cronbach's Alpha
1081 The @cmd{RELIABILITY} command performs reliability analysis on the data.
1082
1083 The VARIABLES subcommand is required. It determines the set of variables
1084 upon which analysis is to be performed.
1085
1086 The SCALE subcommand determines the variables for which reliability is
1087 to be calculated.  If it is omitted, then all variables named in the
1088 VARIABLES subcommand are used.
1089 Optionally, the @var{name} parameter may be specified to set a string name
1090 for the scale.
1091
1092 The MODEL subcommand determines the type of analysis. If ALPHA is specified,
1093 then Cronbach's Alpha is calculated for the scale.  If the model is SPLIT,
1094 then the variables are divided into 2 subsets.  An optional parameter
1095 @var{N} may be given, to specify how many variables are to be in the first subset.
1096 If @var{N} is omitted, then it defaults to one half of the variables in the
1097 scale, or one half minus one if there is an odd number of variables.
1098 The default model is ALPHA.
1099
1100 By default, any cases with user-missing or system-missing values for
1101 any variables given
1102 in the VARIABLES subcommand will be omitted from analysis.
1103 The MISSING subcommand determines whether user-missing values are to
1104 be included or excluded in the analysis.
1105
1106 The SUMMARY subcommand determines the type of summary analysis to be performed.
1107 Currently there is only one type: SUMMARY=TOTAL, which displays per-item
1108 analysis tested against the totals.
1109
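For example, assuming six hypothetical questionnaire items named
@code{q1} to @code{q6}, a split-half analysis with two items in the
first subset might be requested like this:

@example
RELIABILITY
        /VARIABLES=q1 q2 q3 q4 q5 q6
        /MODEL=SPLIT(2)
        /SUMMARY=TOTAL.
@end example

@noindent
Since no SCALE subcommand is given, all six variables are analysed.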
1110
1111
1112 @node ROC
1113 @section ROC
1114
1115 @vindex ROC
1116 @cindex Receiver Operating Characteristic
1117 @cindex Area under curve
1118
1119 @display
1120 ROC     @var{var_list} BY @var{state_var} (@var{state_value})
1121         /PLOT = @{ CURVE [(REFERENCE)], NONE @}
1122         /PRINT = [ SE ] [ COORDINATES ]
1123         /CRITERIA = [ CUTOFF(@{INCLUDE,EXCLUDE@}) ]
1124           [ TESTPOS (@{LARGE,SMALL@}) ]
1125           [ CI (@var{confidence}) ]
1126           [ DISTRIBUTION (@{FREE, NEGEXPO @}) ]
1127         /MISSING=@{EXCLUDE,INCLUDE@}
1128 @end display
1129
1130
1131 The @cmd{ROC} command is used to plot the receiver operating characteristic curve
1132 of a dataset, and to estimate the area under the curve.
1133 This is useful for analysing the efficacy of a variable as a predictor of a state of nature.
1134
1135 The mandatory @var{var_list} is the list of predictor variables.
1136 The variable @var{state_var} is the variable whose values represent the actual states,
1137 and @var{state_value} is the value of this variable which represents the positive state.
1138
1139 The optional subcommand PLOT is used to determine if and how the ROC curve is drawn.
1140 The keyword CURVE means that the ROC curve should be drawn, and the optional keyword REFERENCE,
1141 which should be enclosed in parentheses, says that the diagonal reference line should be drawn.
1142 If the keyword NONE is given, then no ROC curve is drawn.
1143 By default, the curve is drawn with no reference line.
1144
1145 The optional subcommand PRINT determines which additional tables should be printed.
1146 Two additional tables are available.
1147 The SE keyword says that standard error of the area under the curve should be printed as well as
1148 the area itself.
1149 In addition, a p-value under the null hypothesis that the area under the curve equals 0.5 will be
1150 printed.
1151 The COORDINATES keyword says that a table of coordinates of the ROC curve should be printed.
1152
1153 The CRITERIA subcommand has four optional parameters:
1154 @itemize @bullet
1155 @item The TESTPOS parameter may be LARGE or SMALL.
1156 LARGE is the default, and says that larger values in the predictor variables are to be
1157 considered positive.  SMALL indicates that smaller values should be considered positive.
1158
1159 @item The CI parameter specifies the confidence interval that should be printed.
1160 It has no effect if the SE keyword in the PRINT subcommand has not been given.
1161
1162 @item The DISTRIBUTION parameter determines the method to be used when estimating the area
1163 under the curve.
1164 There are two possibilities, @i{viz}: FREE and NEGEXPO.
1165 The FREE method uses a non-parametric estimate, and the NEGEXPO method a bi-negative
1166 exponential distribution estimate.
1167 The NEGEXPO method should only be used when the number of positive actual states is
1168 equal to the number of negative actual states.
1169 The default is FREE.
1170
1171 @item The CUTOFF parameter is for compatibility and is ignored.
1172 @end itemize
1173
1174 The MISSING subcommand determines whether user-missing values are to
1175 be included or excluded in the analysis.  The default behaviour is to
1176 exclude them.
1177 Cases are excluded on a listwise basis; if any of the variables in @var{var_list}
1178 or the variable @var{state_var} is missing, then the entire case will be
1179 excluded.
1180
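The following sketch puts several of these subcommands together.  It
assumes a hypothetical predictor variable @code{marker} and a state
variable @code{disease} in which the value 1 represents the positive
state.

@example
ROC
        marker BY disease (1)
        /PLOT=CURVE(REFERENCE)
        /PRINT=SE COORDINATES
        /CRITERIA=TESTPOS(SMALL).
@end example

@noindent
This draws the curve with its diagonal reference line, prints both
additional tables, and treats smaller values of @code{marker} as
positive.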
1181