@c PSPP - a program for statistical analysis.
-@c Copyright (C) 2017 Free Software Foundation, Inc.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@end display
The @cmd{DESCRIPTIVES} procedure reads the active dataset and outputs
-descriptive
-statistics requested by the user. In addition, it can optionally
+linear descriptive statistics requested by the user. In addition, it can optionally
compute Z-scores.
The @subcmd{VARIABLES} subcommand, which is required, specifies the list of
The @subcmd{A} and @subcmd{D} settings request an ascending or descending
sort order, respectively.
+@subsection Descriptives Example
+
+The @file{physiology.sav} file contains various physiological data for a sample
+of persons. Running the @cmd{DESCRIPTIVES} command on the variables @exvar{height}
+and @exvar{temperature} with the default options allows one to see simple linear
+statistics for these two variables. In @ref{descriptives:ex}, these variables
+are specified on the @subcmd{VARIABLES} subcommand and the @subcmd{SAVE} option
+has been used to request that Z scores be calculated.
+
+After the command has completed, this example runs @cmd{DESCRIPTIVES} again, this
+time on the @exvar{zheight} and @exvar{ztemperature} variables,
+which are the two normalized (Z-score) variables generated by the
+first @cmd{DESCRIPTIVES} command.
+
+@float Example, descriptives:ex
+@psppsyntax {descriptives.sps}
+@caption {Running two @cmd{DESCRIPTIVES} commands, one with the @subcmd{SAVE} subcommand}
+@end float
+
+In @ref{descriptives:res}, we can see that there are 40 valid data for each of the variables
+and no missing values. The means of @exvar{height} and @exvar{temperature} are 16677.12
+and 37.02 respectively. The descriptive statistics for temperature seem reasonable.
+However there is a very high standard deviation for @exvar{height} and a suspiciously
+low minimum. This is due to a data entry error
+(@pxref{Identifying incorrect data}).
+
+In the second Descriptive Statistics command, one can see that the mean and standard
+deviation of both Z score variables are 0 and 1 respectively. All Z score statistics
+should have these properties since they are normalized versions of the original scores.
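+
+This follows from the way each Z score is derived from the original value.
+As a sketch, writing @var{x} for an original value, and @var{m} and @var{s} for
+the mean and standard deviation of its variable (these symbols are used here
+for illustration only):
+
+@example
+z = (x - m) / s
+@end example
+
+Subtracting @var{m} moves the mean to 0, and dividing by @var{s} rescales the
+standard deviation to 1.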
+
+@float Result, descriptives:res
+@psppoutput {descriptives}
+@caption {Descriptive statistics including two normalized variables (Z-scores)}
+@end float
+
@node FREQUENCIES
@section FREQUENCIES
The @subcmd{ORDER} subcommand is accepted but ignored.
+@subsection Frequencies Example
+
+@ref{frequencies:ex} runs a frequency analysis on the @exvar{sex}
+and @exvar{occupation} variables from the @file{personnel.sav} file.
+This is useful to get a general idea of the way in which these nominal
+variables are distributed.
+
+@float Example, frequencies:ex
+@psppsyntax {frequencies.sps}
+@caption {Running frequencies on the @exvar{sex} and @exvar{occupation} variables}
+@end float
+
+If you are using the graphical user interface, the dialog box is set up such that
+by default, several statistics are calculated. These are not particularly useful
+for these variables, so you will want to disable those.
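+
+When using syntax instead, these statistics can be suppressed with the
+@subcmd{STATISTICS} subcommand. A minimal sketch, which is not necessarily
+identical to @file{frequencies.sps}:
+
+@example
+GET FILE='personnel.sav'.
+* Request frequency tables only, without summary statistics.
+FREQUENCIES /VARIABLES = sex occupation
+            /STATISTICS = NONE.
+@end example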
+
+From @ref{frequencies:res} it is evident that there are 33 males, 21 females and
+2 persons whose sex has not been entered.
+
+One can also see how many of each occupation there are in the data.
+When dealing with string variables used as nominal values, running a frequency
+analysis is useful to detect data input errors. Notice that
+one @exvar{occupation} value has been mistyped as ``Scrientist''. This entry should
+be corrected, or marked as missing before using the data.
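+
+As a minimal sketch of the first option, the mistyped value could be corrected
+with @cmd{RECODE} (@pxref{RECODE}). The replacement spelling ``Scientist'' is an
+assumption about what was intended:
+
+@example
+* Replace the mistyped occupation with the assumed intended spelling.
+RECODE occupation ('Scrientist' = 'Scientist').
+EXECUTE.
+@end example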
+
+@float Result, frequencies:res
+@psppoutput {frequencies}
+@caption {The relative frequencies of @exvar{sex} and @exvar{occupation}}
+@end float
+
@node EXAMINE
@section EXAMINE
@end example
In this example, we look at the height and weight of a sample of individuals and
how they differ between male and female.
-A table showing the 3 largest and the 3 smallest values of @var{height} and
-@var{weight} for each gender, and for the whole dataset will be shown.
+A table showing the 3 largest and the 3 smallest values of @exvar{height} and
+@exvar{weight} for each gender, and for the whole dataset will be shown.
Boxplots will also be produced.
Because @subcmd{/COMPARE = GROUPS} was given, boxplots for male and female will be
shown in the same graphic, allowing us to easily see the difference between
@table @asis
@item CHISQ
-@cindex chisquare
@cindex chi-square
Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
have user missing values for the categorical variables should be omitted
from the calculation.
+@subsection Example Means
+
+The dataset in @file{repairs.sav} contains the mean time between failures (@exvar{mtbf})
+for a sample of artifacts produced by different factories and trialed under
+different operating conditions.
+Since there are four combinations of categorical variables, by simply looking
+at the list of data, it would be hard to see how the scores vary for each category.
+@ref{means:ex} shows one way of tabulating the @exvar{mtbf} in a way which is
+easier to understand.
+
+@float Example, means:ex
+@psppsyntax {means.sps}
+@caption {Running @cmd{MEANS} on the @exvar{mtbf} score with categories @exvar{factory} and @exvar{environment}}
+@end float
+
+The results are shown in @ref{means:res}. The figures shown indicate the mean,
+standard deviation and number of samples in each category.
+These figures however do not indicate whether the results are statistically
+significant. For that, you would need to use the procedures @cmd{ONEWAY}, @cmd{GLM} or
+@cmd{T-TEST} depending on the hypothesis being tested.
+
+@float Result, means:res
+@psppoutput {means}
+@caption {The @exvar{mtbf} categorised by @exvar{factory} and @exvar{environment}}
+@end float
+
+Note that there is no limit to the number of variables for which you can calculate
+statistics, nor to the number of categorical variables per layer, nor the number
+of layers.
+However, running @cmd{MEANS} on a large number of variables, or with categorical variables
+containing a large number of distinct values may result in an extremely large output, which
+will not be easy to interpret.
+So you should consider carefully which variables to select for participation in the analysis.
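+
+As a sketch of how layers are written in syntax, each @samp{BY} keyword
+starts a new layer, whereas variables listed after the same @samp{BY}
+share a layer and are tabulated separately. The first command below is not
+necessarily identical to @file{means.sps}:
+
+@example
+GET FILE='repairs.sav'.
+* Two layers: factory in the first, environment in the second.
+MEANS TABLES = mtbf BY factory BY environment.
+* One layer holding both categorical variables.
+MEANS TABLES = mtbf BY factory environment.
+@end example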
+
@node NPAR TESTS
@section NPAR TESTS
@menu
* BINOMIAL:: Binomial Test
-* CHISQUARE:: Chisquare Test
+* CHISQUARE:: Chi-square Test
* COCHRAN:: Cochran Q Test
* FRIEDMAN:: Friedman Test
* KENDALL:: Kendall's W Test
even for very large sample sizes.
-
@node CHISQUARE
-@subsection Chisquare Test
+@subsection Chi-square Test
@vindex CHISQUARE
-@cindex chisquare test
+@cindex chi-square test
@display
If no @subcmd{/EXPECTED} subcommand is given, then equal frequencies
are expected.
+@subsubsection Chi-square Example
+
+A researcher wishes to investigate whether there are an equal number of
+persons of each sex in a population. The sample chosen for investigation
+is that from the @file{physiology.sav} dataset. The null hypothesis for
+the test is that the population comprises an equal number of males and females.
+The analysis is performed as shown in @ref{chisquare:ex}.
+
+@float Example, chisquare:ex
+@psppsyntax {chisquare.sps}
+@caption {Performing a chi-square test to check for equal distribution of sexes}
+@end float
+
+There is only one test variable, @i{viz:} @exvar{sex}. The other variables in the dataset
+are ignored.
+
+In @ref{chisquare:res} the summary box shows that in the sample, there are more males
+than females. However the significance of the chi-square result is greater than 0.05
+(the most commonly used significance level) and therefore
+there is not enough evidence to reject the null hypothesis and one must conclude
+that the evidence does not indicate that there is an imbalance of the sexes
+in the population.
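+
+As a sketch of the calculation behind this result, with @var{k} categories
+and @var{n} valid cases, the expected frequency of each category under the
+null hypothesis is @var{n}/@var{k} when no @subcmd{/EXPECTED} subcommand is
+given. Writing @var{o_i} for the observed frequency of category @var{i}
+and @var{e_i} for its expected frequency (symbols chosen here for illustration):
+
+@example
+chi-square = sum over i of (o_i - e_i)^2 / e_i
+@end example
+
+The statistic is then compared against a chi-square distribution
+with @var{k} - 1 degrees of freedom.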
+
+@float Result, chisquare:res
+@psppoutput {chisquare}
+@caption {The results of running a chi-square test on @exvar{sex}}
+@end float
+
@node COCHRAN
@subsection Cochran Q Test
In this mode, you must also use the @subcmd{/VARIABLES} subcommand to
tell @pspp{} which variables you wish to test.
+@subsubsection Example - One Sample T-test
+
+A researcher wishes to know whether the weight of persons in a population
+is different from the national average.
+The samples are drawn from the population under investigation and recorded
+in the file @file{physiology.sav}.
+From the Department of Health, she
+knows that the national average weight of healthy adults is 76.8kg.
+Accordingly the @subcmd{TESTVAL} is set to 76.8.
+The null hypothesis therefore is that the mean average weight of the
+population from which the sample was drawn is 76.8kg.
+
+As previously noted (@pxref{Identifying incorrect data}), one
+sample in the dataset contains a weight value
+which is clearly incorrect. So this is excluded from the analysis
+using the @cmd{SELECT IF} command.
+
+@float Example, one-sample-t:ex
+@psppsyntax {one-sample-t.sps}
+@caption {Running a one sample T-Test after excluding all non-positive values}
+@end float
+
+@ref{one-sample-t:res} shows that the mean of our sample differs from the test value
+by -1.40kg. However the significance is very high (0.610). So one cannot
+reject the null hypothesis, and must conclude there is not enough evidence
+to suggest that the mean weight of the persons in our population is different
+from 76.8kg.
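+
+The test statistic behind this significance can be sketched as follows, where
+@var{m} is the sample mean, @var{s} the sample standard deviation and
+@var{n} the number of valid cases (symbols chosen here for illustration):
+
+@example
+t = (m - 76.8) / (s / sqrt(n))
+@end example
+
+The significance is obtained by comparing @var{t} against a
+t distribution with @var{n} - 1 degrees of freedom.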
+
+@float Result, one-sample-t:res
+@psppoutput {one-sample-t}
+@caption {The results of a one sample T-test of @exvar{weight} using a test value of 76.8kg}
+@end float
+
@node Independent Samples Mode
@subsection Independent Samples Mode
the independent variable are excluded on a listwise basis, regardless
of whether @subcmd{/MISSING=LISTWISE} was specified.
+@subsubsection Example - Independent Samples T-test
+
+A researcher wishes to know whether within a population, adult males
+are taller than adult females.
+The samples are drawn from the population under investigation and recorded
+in the file @file{physiology.sav}.
+
+As previously noted (@pxref{Identifying incorrect data}), one
+sample in the dataset contains a height value
+which is clearly incorrect. So this is excluded from the analysis
+using the @cmd{SELECT IF} command.
+
+
+@float Example, indepdendent-samples-t:ex
+@psppsyntax {independent-samples-t.sps}
+@caption {Running an independent samples T-Test after excluding all observations with a height less than 200}
+@end float
+
+
+The null hypothesis is that both males and females are on average
+of equal height.
+
+In this case, the grouping variable is @exvar{sex}, so this is entered
+as the variable for the @subcmd{GROUPS} subcommand. The group values are 0 (male) and
+1 (female).
+
+If you are running the procedure using syntax, then you need to enter
+the values corresponding to each group within parentheses.
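+
+A minimal sketch of such a command, not necessarily identical
+to @file{independent-samples-t.sps}:
+
+@example
+* Compare the mean height of group 0 (male) with group 1 (female).
+T-TEST /VARIABLES = height
+       /GROUPS = sex(0,1).
+@end example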
+
+
+From @ref{independent-samples-t:res}, one can clearly see that the @emph{sample} mean height
+is greater for males than for females. However in order to see if this
+is a significant result, one must consult the T-Test table.
+
+The T-Test table contains two rows; one for use if the variance of the samples
+in each group may be safely assumed to be equal, and the second row
+if the variances in each group may not be safely assumed to be equal.
+
+In this case however, both rows show a 2-tailed significance less than 0.001 and
+one must therefore reject the null hypothesis and conclude that within
+the population the mean height of males and of females are unequal.
+
+@float Result, independent-samples-t:res
+@psppoutput {independent-samples-t}
+@caption {The results of an independent samples T-test of @exvar{height} by @exvar{sex}}
+@end float
@node Paired Samples Mode
@subsection Paired Samples Mode
Currently there is only one type: @subcmd{SUMMARY=TOTAL}, which displays per-item
analysis tested against the totals.
+@subsection Example - Reliability
+
+Before analysing the results of a survey -- particularly for a multiple choice survey --
+it is desirable to know whether the respondents have considered their answers
+or simply provided random answers.
+
+In the following example the survey results from the file @file{hotel.sav} are used.
+All five survey questions are included in the reliability analysis.
+However, before running the analysis, the data must be preprocessed.
+An examination of the survey questions reveals that two questions, @i{viz:} v3 and v5
+are negatively worded, whereas the others are positively worded.
+All questions must be based upon the same scale for the analysis to be meaningful.
+One could use the @cmd{RECODE} command (@pxref{RECODE}), however a simpler way is
+to use @cmd{COMPUTE} (@pxref{COMPUTE}) and this is what is done in @ref{reliability:ex}.
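+
+The arithmetic relies on the response scale. As a sketch, assuming that each
+question is answered on a scale from 1 to 5 (an assumption; the scale is not
+stated here), subtracting a response from 6 reverses its sense:
+
+@example
+* Assumes a 1 to 5 response scale, so 1 becomes 5, 2 becomes 4, and so on.
+COMPUTE v3 = 6 - v3.
+COMPUTE v5 = 6 - v5.
+@end example
+
+The equivalent @cmd{RECODE} would need every pair of values listed,
+for example @samp{RECODE v3 v5 (1=5) (2=4) (4=2) (5=1).}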
+
+@float Example, reliability:ex
+@psppsyntax {reliability.sps}
+@caption {Investigating the reliability of survey responses}
+@end float
+
+In this case, all variables in the data set are used. So we can use the special
+keyword @samp{ALL} (@pxref{BNF}).
+
+@ref{reliability:res} shows that Cronbach's Alpha is 0.11 which is a value normally considered too
+low to indicate consistency within the data. This is possibly due to the small number of
+survey questions. The survey should be redesigned before the results are put to
+serious use.
+
+@float Result, reliability:res
+@psppoutput {reliability}
+@caption {The results of the reliability command on @file{hotel.sav}}
+@end float
@node ROC