# Language Overview
+- [Tutorial](language/tutorial/index.md)
+ - [Data Preparation](language/tutorial/preparation.md)
+ - [Data Screening and Transformation](language/tutorial/transformations.md)
+ - [Hypothesis Testing](language/tutorial/hypotheses.md)
- [Basics](language/basics/index.md)
- [Tokens](language/basics/tokens.md)
- [Forming Commands](language/basics/commands.md)
--- /dev/null
+# Hypothesis Testing
+
+One of the most fundamental purposes of statistical analysis is
+hypothesis testing. Researchers commonly need to test hypotheses about
+a set of data. For example, a researcher might want to test whether one
+set of data comes from the same distribution as another, or whether the
+mean of a dataset differs significantly from a particular value. This
+section presents just some of the possible tests that PSPP offers.
+
+The researcher starts by making a "null hypothesis". Often this is a
+hypothesis which he suspects to be false. For example, if he suspects
+that A is greater than B he will state the null hypothesis as A = B.[^1]
+
+[^1]: This example assumes that it is already proven that B is not
+greater than A.
+
+The "p-value" is a recurring concept in hypothesis testing. It is
+the highest acceptable probability that evidence implying that the null
+hypothesis is false could have been obtained when the null hypothesis
+is in fact true. Note that this is not the same as "the probability of
+making an error" nor is it the same as "the probability of rejecting a
+hypothesis when it is true".
+
+## Testing for differences of means
+
+A common statistical test involves hypotheses about means. The `T-TEST`
+command is used to find out whether or not two separate subsets have the
+same mean.
+
+A researcher suspected that the heights and core body temperature of
+persons might be different depending upon their sex. To investigate
+this, he posed two null hypotheses based on the data from
+`physiology.sav` previously encountered:
+
+ - The mean heights of males and females in the population are equal.
+
+ - The mean body temperatures of males and females in the population
+ are equal.
+
+For the purposes of the investigation the researcher decided to use a
+p-value of 0.05.
+
+In addition to the T-test, the `T-TEST` command also performs the
+Levene test for equal variances. If the variances are equal, then a
+more powerful form of the T-test can be used. However if it is unsafe
+to assume equal variances, then an alternative calculation is necessary.
+PSPP performs both calculations.
+
+For the height variable, the output shows the significance of the
+Levene test to be 0.33 which means there is a 33% probability that the
+Levene test produces this outcome when the variances are equal. Had the
+significance been less than 0.05, then it would have been unsafe to
+assume that the variances were equal. However, because the value is
+higher than 0.05 the homogeneity of variances assumption is safe and the
+"Equal Variances" row (the more powerful test) can be used. Examining
+this row, the two tailed significance for the height t-test is less than
+0.05, so it is safe to reject the null hypothesis and conclude that the
+mean heights of males and females are unequal.
+
+For the temperature variable, the significance of the Levene test is
+0.58 so again, it is safe to use the row for equal variances. The equal
+variances row indicates that the two tailed significance for temperature
+is 0.20. Since this is greater than 0.05 we cannot reject the null
+hypothesis, and we conclude that there is insufficient evidence to
+suggest that the body temperatures of male and female persons are
+different.
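
The two-step reading of the output described above (first the Levene test, then the matching t-test row) can be sketched in Python. This is only an illustration of the decision rule, not something PSPP runs; the function names are invented for this sketch, and the significance values are copied from the Independent Samples Test table in the output.

```python
ALPHA = 0.05  # significance level chosen by the researcher


def choose_row(levene_sig, alpha=ALPHA):
    """Pick which t-test row to read, based on the Levene test."""
    if levene_sig < alpha:
        return "Equal variances not assumed"
    return "Equal variances assumed"


def decide(sig_2_tailed, alpha=ALPHA):
    """Reject the null hypothesis only when the significance is below alpha."""
    if sig_2_tailed < alpha:
        return "reject"
    return "fail to reject"


# (Levene Sig., Sig. (2-tailed) in the equal-variances row), as printed by PSPP.
results = {"height": (0.331, 0.000), "temperature": (0.581, 0.198)}

for name, (levene_sig, t_sig) in results.items():
    print(name, "->", choose_row(levene_sig), "->", decide(t_sig))
```

Note that "fail to reject" is deliberately not called "accept": a large significance value only means the data do not provide enough evidence against the null hypothesis.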
+
+The syntax for this analysis is:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/physiology.sav'.
+PSPP> recode height (179 = SYSMIS).
+PSPP> t-test group=sex(0,1) /variables = height temperature.
+```
+
+PSPP produces the following output for this syntax:
+
+```
+ Group Statistics
+┌───────────────────────────────────────────┬──┬───────┬─────────────┬────────┐
+│ │ │ │ Std. │ S.E. │
+│ Group │ N│ Mean │ Deviation │ Mean │
+├───────────────────────────────────────────┼──┼───────┼─────────────┼────────┤
+│Height in millimeters Male │22│1796.49│ 49.71│ 10.60│
+│ Female│17│1610.77│ 25.43│ 6.17│
+├───────────────────────────────────────────┼──┼───────┼─────────────┼────────┤
+│Internal body temperature in degrees Male │22│ 36.68│ 1.95│ .42│
+│Celcius Female│18│ 37.43│ 1.61│ .38│
+└───────────────────────────────────────────┴──┴───────┴─────────────┴────────┘
+
+ Independent Samples Test
+┌─────────────────────┬──────────┬──────────────────────────────────────────
+│ │ Levene's │
+│ │ Test for │
+│ │ Equality │
+│ │ of │
+│ │ Variances│ T─Test for Equality of Means
+│ ├────┬─────┼─────┬─────┬───────┬──────────┬──────────┐
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ │ │ │
+│ │ │ │ │ │ Sig. │ │ │
+│ │ │ │ │ │ (2─ │ Mean │Std. Error│
+│ │ F │ Sig.│ t │ df │tailed)│Difference│Difference│
+├─────────────────────┼────┼─────┼─────┼─────┼───────┼──────────┼──────────┤
+│Height in Equal │ .97│ .331│14.02│37.00│ .000│ 185.72│ 13.24│
+│millimeters variances│ │ │ │ │ │ │ │
+│ assumed │ │ │ │ │ │ │ │
+│ Equal │ │ │15.15│32.71│ .000│ 185.72│ 12.26│
+│ variances│ │ │ │ │ │ │ │
+│ not │ │ │ │ │ │ │ │
+│ assumed │ │ │ │ │ │ │ │
+├─────────────────────┼────┼─────┼─────┼─────┼───────┼──────────┼──────────┤
+│Internal Equal │ .31│ .581│─1.31│38.00│ .198│ ─.75│ .57│
+│body variances│ │ │ │ │ │ │ │
+│temperature assumed │ │ │ │ │ │ │ │
+│in degrees Equal │ │ │─1.33│37.99│ .190│ ─.75│ .56│
+│Celcius variances│ │ │ │ │ │ │ │
+│ not │ │ │ │ │ │ │ │
+│ assumed │ │ │ │ │ │ │ │
+└─────────────────────┴────┴─────┴─────┴─────┴───────┴──────────┴──────────┘
+
+┌─────────────────────┬─────────────┐
+│ │ │
+│ │ │
+│ │ │
+│ │ │
+│ │ │
+│ ├─────────────┤
+│ │ 95% │
+│ │ Confidence │
+│ │ Interval of │
+│ │ the │
+│ │ Difference │
+│ ├──────┬──────┤
+│ │ Lower│ Upper│
+├─────────────────────┼──────┼──────┤
+│Height in Equal │158.88│212.55│
+│millimeters variances│ │ │
+│ assumed │ │ │
+│ Equal │160.76│210.67│
+│ variances│ │ │
+│ not │ │ │
+│ assumed │ │ │
+├─────────────────────┼──────┼──────┤
+│Internal Equal │ ─1.91│ .41│
+│body variances│ │ │
+│temperature assumed │ │ │
+│in degrees Equal │ ─1.89│ .39│
+│Celcius variances│ │ │
+│ not │ │ │
+│ assumed │ │ │
+└─────────────────────┴──────┴──────┘
+```
+
+The `T-TEST` command tests for differences of means. Here, the height
+variable's two tailed significance is less than 0.05, so the null
+hypothesis can be rejected. Thus, the evidence suggests there is a
+difference between the heights of male and female persons. However
+the significance of the test for the temperature variable is greater
+than 0.05 so the null hypothesis cannot be rejected, and there is
+insufficient evidence to suggest a difference in body temperature.
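
As a cross-check, the t statistics for height can be recomputed by hand from the Group Statistics table. The Python sketch below uses the textbook pooled-variance formula for the "equal variances assumed" row and, as an assumption, the usual Welch-Satterthwaite approximation for the "not assumed" row. Because the inputs are the rounded values printed in the table, the results should only be expected to agree with the PSPP output to about two decimal places.

```python
import math

# Group statistics for height, copied from the Group Statistics table.
n_m, mean_m, sd_m = 22, 1796.49, 49.71  # Male
n_f, mean_f, sd_f = 17, 1610.77, 25.43  # Female

diff = mean_m - mean_f

# Equal variances assumed: pool the two sample variances.
df_pooled = n_m + n_f - 2
var_pooled = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / df_pooled
se_pooled = math.sqrt(var_pooled * (1 / n_m + 1 / n_f))
t_pooled = diff / se_pooled

# Equal variances not assumed (Welch): keep the variances separate.
v_m, v_f = sd_m**2 / n_m, sd_f**2 / n_f
se_welch = math.sqrt(v_m + v_f)
t_welch = diff / se_welch
df_welch = (v_m + v_f) ** 2 / (v_m**2 / (n_m - 1) + v_f**2 / (n_f - 1))

print(f"pooled: t={t_pooled:.2f} df={df_pooled} se={se_pooled:.2f}")
print(f"welch:  t={t_welch:.2f} df={df_welch:.2f} se={se_welch:.2f}")
```

Comparing the printed values with the t, df and Std. Error Difference columns is a quick way to confirm which rows and columns of the table correspond to which calculation.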
+
+## Linear Regression
+
+Linear regression is a technique used to investigate if and how a
+variable is linearly related to others. If a variable is found to be
+linearly related, then this can be used to predict future values of that
+variable.
+
+In the following example, the service department of the company wanted
+to be able to predict the time to repair equipment, in order to
+improve the accuracy of their quotations. It was suggested that the
+time to repair might be related to the time between failures and the
+duty cycle of the equipment. The p-value of 0.1 was chosen for this
+investigation. In order to investigate this hypothesis, the
+[`REGRESSION`](../../commands/statistics/regression.md) command was
+used. This command not only tests if the variables are related, but
+also identifies the potential linear relationship.
+
+A first attempt includes `duty_cycle`:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/repairs.sav'.
+PSPP> regression /variables = mtbf duty_cycle /dependent = mttr.
+```
+
+This attempt yields the following output (in part):
+
+```
+ Coefficients (Mean time to repair (hours) )
+┌────────────────────────┬─────────────────────┬───────────────────┬─────┬────┐
+│ │ Unstandardized │ Standardized │ │ │
+│ │ Coefficients │ Coefficients │ │ │
+│ ├─────────┬───────────┼───────────────────┤ │ │
+│ │ B │ Std. Error│ Beta │ t │Sig.│
+├────────────────────────┼─────────┼───────────┼───────────────────┼─────┼────┤
+│(Constant) │ 10.59│ 3.11│ .00│ 3.40│.002│
+│Mean time between │ 3.02│ .20│ .95│14.88│.000│
+│failures (months) │ │ │ │ │ │
+│Ratio of working to non─│ ─1.12│ 3.69│ ─.02│ ─.30│.763│
+│working time │ │ │ │ │ │
+└────────────────────────┴─────────┴───────────┴───────────────────┴─────┴────┘
+```
+
+The coefficients in the above table suggest that the formula
+\\(\textrm{MTTR} = 10.59 + 3.02 \times \textrm{MTBF} - 1.12 \times
+\textrm{DUTY\_CYCLE}\\) can be used to predict the time to repair.
+However, the significance value for the `DUTY_CYCLE` coefficient is
+very high (0.763, far above the chosen 0.1), which would make this an
+unsafe predictor. For this reason, the test was repeated, but omitting
+the `duty_cycle` variable:
+
+```
+PSPP> regression /variables = mtbf /dependent = mttr.
+```
+
+This second try produces the following output (in part):
+
+```
+ Coefficients (Mean time to repair (hours) )
+┌───────────────────────┬──────────────────────┬───────────────────┬─────┬────┐
+│ │ Unstandardized │ Standardized │ │ │
+│ │ Coefficients │ Coefficients │ │ │
+│ ├─────────┬────────────┼───────────────────┤ │ │
+│ │ B │ Std. Error │ Beta │ t │Sig.│
+├───────────────────────┼─────────┼────────────┼───────────────────┼─────┼────┤
+│(Constant) │ 9.90│ 2.10│ .00│ 4.71│.000│
+│Mean time between │ 3.01│ .20│ .94│15.21│.000│
+│failures (months) │ │ │ │ │ │
+└───────────────────────┴─────────┴────────────┴───────────────────┴─────┴────┘
+```
+
+This time, the significance of every coefficient is reported as .000,
+well below the chosen level of 0.1, suggesting that the formula
+\\(\textrm{MTTR} = 9.90 + 3.01 \times \textrm{MTBF}\\) is a reliable
+predictor of the time to repair.
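
To show how the fitted line would be used for a quotation, here is a minimal Python sketch that plugs the B coefficients from the second Coefficients table into the linear formula. The function name and the example MTBF values are invented for illustration.

```python
# B coefficients copied from the second Coefficients table (mtbf only).
INTERCEPT = 9.90   # (Constant)
SLOPE_MTBF = 3.01  # Mean time between failures (months)


def predict_mttr(mtbf_months):
    """Predicted mean time to repair, in hours, for a given MTBF in months."""
    return INTERCEPT + SLOPE_MTBF * mtbf_months


for mtbf in (2, 8, 20):
    print(f"MTBF {mtbf:>2} months -> predicted MTTR {predict_mttr(mtbf):.2f} hours")
```

As with any regression, predictions are most trustworthy for MTBF values inside the range that was actually observed in the data.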
+
--- /dev/null
+# PSPP Language Tutorial
+
+PSPP is a tool for the statistical analysis of sampled data. You can
+use it to discover patterns in the data, to explain differences in one
+subset of data in terms of another subset and to find out whether
+certain beliefs about the data are justified. This chapter does not
+attempt to introduce the theory behind the statistical analysis, but it
+shows how such analysis can be performed using PSPP.
+
+This tutorial assumes that you are using PSPP in its interactive mode
+from the command line. However, the example commands can also be
+typed into a file and executed in a post-hoc mode by typing `pspp
+FILE-NAME` at a shell prompt, where `FILE-NAME` is the name of the
+file containing the commands. Alternatively, from the graphical
+interface, you can select File → New → Syntax to open a new syntax
+window and use the Run menu when a syntax fragment is ready to be
+executed. Whichever method you choose, the syntax is identical.
+
+When using the interactive method, PSPP tells you that it's waiting
+for your data with a string like `PSPP>` or `data>`. In the examples
+of this chapter, whenever you see text like this, it indicates the
+prompt displayed by PSPP, _not_ something that you should type.
+
+Throughout this chapter reference is made to a number of sample data
+files. So that you can try the examples for yourself, you should have
+received these files along with your copy of PSPP.[^1]
+
+> Normally these files are installed in the directory
+`/usr/local/share/pspp/examples`. If however your system
+administrator or operating system vendor has chosen to install them in
+a different location, you will have to adjust the examples
+accordingly.
+
+[^1]: These files contain purely fictitious data. They should not be
+used for research purposes.
+
--- /dev/null
+# Preparation of Data Files
+
+Before analysis can commence, the data must be loaded into PSPP and
+arranged such that both PSPP and humans can understand what the data
+represents. There are two aspects of data:
+
+- The variables—these are the parameters of a quantity which has
+ been measured or estimated in some way. For example height, weight
+ and geographic location are all variables.
+
+- The observations (also called 'cases') of the variables—each
+ observation represents an instance when the variables were measured
+ or observed.
+
+For example, a data set which has the variables height, weight, and
+name, might have the observations:
+
+```
+1881 89.2 Ahmed
+1192 107.01 Frank
+1230 67 Julie
+```
+
+The following sections explain how to define a dataset.
+
+## Defining Variables
+
+Variables come in two basic types: "numeric" and "string".
+Variables such as age, height and satisfaction are numeric, whereas name
+is a string variable. String variables are best reserved for commentary
+data to assist the human observer. However they can also be used for
+nominal or categorical data.
+
+The following example defines two variables, `forename` and `height`,
+and reads data into them by manual input:
+
+```
+PSPP> data list list /forename (A12) height.
+PSPP> begin data.
+data> Ahmed 188
+data> Bertram 167
+data> Catherine 134.231
+data> David 109.1
+data> end data
+PSPP>
+```
+
+There are several things to note about this example.
+
+- The words `data list list` are an example of the [`DATA
+ LIST`](../../commands/data-io/data-list.md) command, which tells
+ PSPP to prepare for reading data. The word `list` intentionally
+ appears twice. The first occurrence is part of the `DATA LIST`
+ call, whilst the second tells PSPP that the data is to be read as
+ free format data with one record per line.
+
+ Usually this manual shows command names and other fixed elements of
+ syntax in upper case, but case doesn't matter in most parts of
+ command syntax. In the tutorial, we usually show them in lowercase
+ because they are easier to type that way.
+
+- The `/` character is important. It marks the start of the list of
+ variables which you wish to define.
+
+- The text `forename` is the name of the first variable, and `(A12)`
+ says that the variable forename is a string variable and that its
+ maximum length is 12 bytes. The second variable's name is specified
+ by the text `height`. Since no format is given, this variable has
+ the default format. Normally the default format expects numeric
+ data, which should be entered in the locale of the operating system.
+ Thus, the example is correct for English locales and other locales
+ which use a period (`.`) as the decimal separator. However if you
+ are using a system with a locale which uses the comma (`,`) as the
+ decimal separator, then you should in the subsequent lines
+ substitute `.` with `,`. Alternatively, you could explicitly tell
+ PSPP that the height variable is to be read using a period as its
+ decimal separator by appending the text `DOT8.3` after the word
+ `height`. For more information on data formats, see [Input and
+ Output Formats](../../language/datasets/formats/index.md).
+
+- PSPP displays the prompt `PSPP>` when it's expecting a command.
+ When it's expecting data, the prompt changes to `data>` so that you
+ know to enter data and not a command.
+
+- At the end of every command there is a terminating `.` which tells
+ PSPP that the end of a command has been encountered. You should not
+ enter `.` when data is expected (i.e. when the `data>` prompt is
+ current) since it is appropriate only for terminating commands.
+
+ You can also terminate a command with a blank line.
+
+## Listing the data
+
+Once the data has been entered, you could type
+```
+PSPP> list /format=numbered.
+```
+to list the data. The optional text `/format=numbered` requests the
+case numbers to be shown along with the data. It should show the
+following output:
+
+```
+ Data List
+┌───────────┬─────────┬──────┐
+│Case Number│ forename│height│
+├───────────┼─────────┼──────┤
+│1 │Ahmed │188.00│
+│2 │Bertram │167.00│
+│3 │Catherine│134.23│
+│4 │David │109.10│
+└───────────┴─────────┴──────┘
+```
+
+
+Note that the numeric variable height is displayed to 2 decimal
+places, because the format for that variable is `F8.2`. For a
+complete description of the `LIST` command, see
+[`LIST`](../../commands/data-io/list.md).
+
+## Reading data from a text file
+
+The previous example showed how to define a set of variables and to
+manually enter the data for those variables. Manual entering of data is
+tedious work, and often a file containing the data will have been
+previously prepared. Let us assume that you have a file called
+`mydata.dat` containing the ASCII-encoded data:
+
+```
+Ahmed 188.00
+Bertram 167.00
+Catherine 134.23
+David 109.10
+ .
+ .
+ .
+Zachariah 113.02
+```
+
+You can tell the `DATA LIST` command to read the data directly from
+this file instead of by manual entry, with a command like
+`data list file='mydata.dat' list /forename (A12) height.` entered at
+the `PSPP>` prompt. Notice, however, that it is still necessary to
+specify the names of the variables and their formats, since this
+information is not contained in the file. It is also possible to
+specify the file's character encoding and other parameters. For full
+details refer to [`DATA LIST`](../../commands/data-io/data-list.md).
+
+## Reading data from a pre-prepared PSPP file
+
+When working with other PSPP users, or users of other software which
+uses the PSPP data format, you may be given the data in a pre-prepared
+PSPP file. Such files contain not only the data, but the variable
+definitions, along with their formats, labels and other meta-data.
+Conventionally, these files (sometimes called "system" files) have the
+suffix `.sav`, but that is not mandatory. The following syntax loads a
+file called `my-file.sav`.
+
+```
+PSPP> get file='my-file.sav'.
+```
+
+You will encounter several instances of this in future examples.
+
+## Saving data to a PSPP file
+
+If you want to save your data, along with the variable definitions so
+that you or other PSPP users can use it later, you can do this with the
+`SAVE` command.
+
+The following syntax will save the existing data and variables to a
+file called `my-new-file.sav`.
+
+```
+PSPP> save outfile='my-new-file.sav'.
+```
+
+If `my-new-file.sav` already exists, then it will be overwritten.
+Otherwise it will be created.
+
+## Reading data from other sources
+
+Sometimes it's useful to be able to read data from comma separated
+text, from spreadsheets, databases or other sources. In these
+instances you should use the [`GET
+DATA`](../../commands/spss-io/get-data.md) command.
+
+## Exiting PSPP
+
+Use the `FINISH` command to exit PSPP by entering `finish.` at the
+`PSPP>` prompt.
+
--- /dev/null
+# Data Screening and Transformation
+
+Once data has been entered, it is often desirable, or even necessary,
+to transform it in some way before performing analysis upon it. At
+the very least, it's good practice to check for errors.
+
+## Identifying incorrect data
+
+Data from real sources is rarely error free. PSPP has a number of
+procedures which can be used to help identify data which might be
+incorrect.
+
+The [`DESCRIPTIVES`](../../commands/statistics/descriptives.md)
+command is used to generate simple linear statistics for a dataset.
+It is also useful for identifying potential problems in the data. The
+example file `physiology.sav` contains a number of physiological
+measurements of a sample of healthy adults selected at random.
+However, the data entry clerk made a number of mistakes when entering
+the data. The following example illustrates the use of `DESCRIPTIVES`
+to screen this data and identify the erroneous values:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/physiology.sav'.
+PSPP> descriptives sex, weight, height.
+```
+
+For this example, PSPP produces the following output:
+
+```
+ Descriptive Statistics
+┌─────────────────────┬──┬───────┬───────┬───────┬───────┐
+│ │ N│ Mean │Std Dev│Minimum│Maximum│
+├─────────────────────┼──┼───────┼───────┼───────┼───────┤
+│Sex of subject │40│ .45│ .50│Male │Female │
+│Weight in kilograms │40│ 72.12│ 26.70│ ─55.6│ 92.1│
+│Height in millimeters│40│1677.12│ 262.87│ 179│ 1903│
+│Valid N (listwise) │40│ │ │ │ │
+│Missing N (listwise) │ 0│ │ │ │ │
+└─────────────────────┴──┴───────┴───────┴───────┴───────┘
+```
+
+
+The most interesting column in the output is the minimum value. The
+weight variable has a minimum value of less than zero, which is clearly
+erroneous. Similarly, the height variable's minimum value seems to be
+very low. In fact, it is more than 5 standard deviations from the mean,
+and is a seemingly bizarre height for an adult person.
+
+We can look deeper into these discrepancies by issuing an additional
+`EXAMINE` command:
+
+```
+PSPP> examine height, weight /statistics=extreme(3).
+```
+
+This command produces the following additional output (in part):
+
+```
+ Extreme Values
+┌───────────────────────────────┬───────────┬─────┐
+│ │Case Number│Value│
+├───────────────────────────────┼───────────┼─────┤
+│Height in millimeters Highest 1│ 14│ 1903│
+│ 2│ 15│ 1884│
+│ 3│ 12│ 1802│
+│ ──────────┼───────────┼─────┤
+│ Lowest 1│ 30│ 179│
+│ 2│ 31│ 1598│
+│ 3│ 28│ 1601│
+├───────────────────────────────┼───────────┼─────┤
+│Weight in kilograms Highest 1│ 13│ 92.1│
+│ 2│ 5│ 92.1│
+│ 3│ 17│ 91.7│
+│ ──────────┼───────────┼─────┤
+│ Lowest 1│ 38│─55.6│
+│ 2│ 39│ 54.5│
+│ 3│ 33│ 55.4│
+└───────────────────────────────┴───────────┴─────┘
+```
+
+From this new output, you can see that the lowest value of height is 179
+(which we suspect to be erroneous), but the second lowest is 1598 which
+we know from `DESCRIPTIVES` is within 1 standard deviation from the
+mean. Similarly, the lowest value of weight is negative, but its second
+lowest value is plausible. This suggests that the two extreme values
+are outliers and probably represent data entry errors.
+
+The output also identifies the case numbers for each extreme value,
+so we can see that cases 30 and 38 are the ones with the erroneous
+values.
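
The "standard deviations from the mean" reasoning above is simple arithmetic and can be reproduced with a few lines of Python. The means and standard deviations are copied from the `DESCRIPTIVES` output and the extreme values from the `EXAMINE` output; the helper name `z_score` is just a label chosen for this sketch.

```python
def z_score(value, mean, std_dev):
    """Distance of a value from the mean, in standard deviations."""
    return (value - mean) / std_dev


# Mean and Std Dev from the DESCRIPTIVES output; values from EXAMINE.
height_mean, height_sd = 1677.12, 262.87
weight_mean, weight_sd = 72.12, 26.70

for label, value, mean, sd in [
    ("height, lowest", 179, height_mean, height_sd),
    ("height, second lowest", 1598, height_mean, height_sd),
    ("weight, lowest", -55.6, weight_mean, weight_sd),
    ("weight, second lowest", 54.5, weight_mean, weight_sd),
]:
    print(f"{label:<22} z = {z_score(value, mean, sd):6.2f}")
```

Keep in mind that the mean and standard deviation used here were themselves computed with the erroneous values included, so the z-scores are only a rough screening aid.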
+
+## Dealing with suspicious data
+
+If possible, suspect data should be checked and re-measured. However,
+this may not always be feasible, in which case the researcher may
+decide to disregard these values. PSPP has a feature for [missing
+values](../../language/basics/missing-values.md), whereby data can
+assume the special value 'SYSMIS', and will be disregarded in future
+analysis. You can set the two suspect values to the `SYSMIS` value
+using the [`RECODE`](../../commands/data/recode.md) command.
+
+```
+PSPP> recode height (179 = SYSMIS).
+PSPP> recode weight (LOWEST THRU 0 = SYSMIS).
+```
+
+The first command says that for any observation which has a height value
+of 179, that value should be changed to the SYSMIS value. The second
+command says that any weight values of zero or less should be changed to
+SYSMIS. From now on, they will be ignored in analysis.
+
+If you now re-run the `DESCRIPTIVES` or `EXAMINE` commands from the
+previous section, you will see a data summary with more plausible
+parameters. You will also notice that the data summaries indicate the
+two missing values.
+
+## Inverting negatively coded variables
+
+Data entry errors are not the only reason for wanting to recode data.
+The sample file `hotel.sav` comprises data gathered from a customer
+satisfaction survey of clients at a particular hotel. The following
+commands load the file and display its variables and associated data:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/hotel.sav'.
+PSPP> display dictionary.
+```
+
+It yields the following output:
+
+```
+ Variables
+┌────┬────────┬─────────────┬────────────┬─────┬─────┬─────────┬──────┬───────┐
+│ │ │ │ Measurement│ │ │ │ Print│ Write │
+│Name│Position│ Label │ Level │ Role│Width│Alignment│Format│ Format│
+├────┼────────┼─────────────┼────────────┼─────┼─────┼─────────┼──────┼───────┤
+│v1 │ 1│I am │Ordinal │Input│ 8│Right │F8.0 │F8.0 │
+│ │ │satisfied │ │ │ │ │ │ │
+│ │ │with the │ │ │ │ │ │ │
+│ │ │level of │ │ │ │ │ │ │
+│ │ │service │ │ │ │ │ │ │
+│v2 │ 2│The value for│Ordinal │Input│ 8│Right │F8.0 │F8.0 │
+│ │ │money was │ │ │ │ │ │ │
+│ │ │good │ │ │ │ │ │ │
+│v3 │ 3│The staff │Ordinal │Input│ 8│Right │F8.0 │F8.0 │
+│ │ │were slow in │ │ │ │ │ │ │
+│ │ │responding │ │ │ │ │ │ │
+│v4 │ 4│My concerns │Ordinal │Input│ 8│Right │F8.0 │F8.0 │
+│ │ │were dealt │ │ │ │ │ │ │
+│ │ │with in an │ │ │ │ │ │ │
+│ │ │efficient │ │ │ │ │ │ │
+│ │ │manner │ │ │ │ │ │ │
+│v5 │ 5│There was too│Ordinal │Input│ 8│Right │F8.0 │F8.0 │
+│ │ │much noise in│ │ │ │ │ │ │
+│ │ │the rooms │ │ │ │ │ │ │
+└────┴────────┴─────────────┴────────────┴─────┴─────┴─────────┴──────┴───────┘
+
+ Value Labels
+┌────────────────────────────────────────────────────┬─────────────────┐
+│Variable Value │ Label │
+├────────────────────────────────────────────────────┼─────────────────┤
+│I am satisfied with the level of service 1│Strongly Disagree│
+│ 2│Disagree │
+│ 3│No Opinion │
+│ 4│Agree │
+│ 5│Strongly Agree │
+├────────────────────────────────────────────────────┼─────────────────┤
+│The value for money was good 1│Strongly Disagree│
+│ 2│Disagree │
+│ 3│No Opinion │
+│ 4│Agree │
+│ 5│Strongly Agree │
+├────────────────────────────────────────────────────┼─────────────────┤
+│The staff were slow in responding 1│Strongly Disagree│
+│ 2│Disagree │
+│ 3│No Opinion │
+│ 4│Agree │
+│ 5│Strongly Agree │
+├────────────────────────────────────────────────────┼─────────────────┤
+│My concerns were dealt with in an efficient manner 1│Strongly Disagree│
+│ 2│Disagree │
+│ 3│No Opinion │
+│ 4│Agree │
+│ 5│Strongly Agree │
+├────────────────────────────────────────────────────┼─────────────────┤
+│There was too much noise in the rooms 1│Strongly Disagree│
+│ 2│Disagree │
+│ 3│No Opinion │
+│ 4│Agree │
+│ 5│Strongly Agree │
+└────────────────────────────────────────────────────┴─────────────────┘
+```
+
+The output shows that all of the variables v1 through v5 are measured
+on a 5 point Likert scale, with 1 meaning "Strongly disagree" and 5
+meaning "Strongly agree". However, some of the questions are positively
+worded (v1, v2, v4) and others are negatively worded (v3, v5). To
+perform meaningful analysis, we need to recode the variables so that
+they all measure in the same direction. We could use the `RECODE`
+command, with syntax such as:
+
+```
+recode v3 (1 = 5) (2 = 4) (4 = 2) (5 = 1).
+```
+
+However an easier and more elegant way uses the
+[`COMPUTE`](../../commands/data/compute.md) command. Since the
+variables are Likert variables in the range (1 ... 5), subtracting
+their value from 6 has the effect of inverting them:
+
+```
+compute VAR = 6 - VAR.
+```
+
+The following section uses this technique to recode the
+variables v3 and v5. After applying `COMPUTE` for both variables, all
+subsequent commands will use the inverted values.
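
To see why subtracting from 6 is equivalent to the explicit `RECODE` mapping, the following Python sketch compares the two for every point on the scale. The function name and the `low`/`high` parameters are illustrative only; they are not PSPP features.

```python
def invert_likert(value, low=1, high=5):
    """Mirror a Likert response around the midpoint of its scale."""
    return (low + high) - value


# The explicit RECODE mapping from the text; 3 is left unchanged.
recode_map = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}

for v in range(1, 6):
    assert invert_likert(v) == recode_map[v]
    print(v, "->", invert_likert(v))
```

The constant 6 is simply the sum of the lowest and highest scale values, so a 7 point scale coded 1 to 7 would use 8 instead.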
+
+## Testing data consistency
+
+A sensible check to perform on survey data is the calculation of
+reliability. This gives the statistician some confidence that the
+questionnaires have been completed thoughtfully. If you examine the
+labels of variables v1, v3 and v4, you will notice that they ask very
+similar questions. One would therefore expect the values of these
+variables (after recoding) to closely follow one another, and we can
+test that with the
+[`RELIABILITY`](../../commands/statistics/reliability.md) command.
+The following example shows a PSPP session where the user recodes
+negatively scaled variables and then requests reliability statistics
+for v1, v3, and v4.
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/hotel.sav'.
+PSPP> compute v3 = 6 - v3.
+PSPP> compute v5 = 6 - v5.
+PSPP> reliability v1, v3, v4.
+```
+
+This yields the following output:
+
+```
+Scale: ANY
+
+Case Processing Summary
+┌────────┬──┬───────┐
+│Cases │ N│Percent│
+├────────┼──┼───────┤
+│Valid │17│ 100.0%│
+│Excluded│ 0│ .0%│
+│Total │17│ 100.0%│
+└────────┴──┴───────┘
+
+ Reliability Statistics
+┌────────────────┬──────────┐
+│Cronbach's Alpha│N of Items│
+├────────────────┼──────────┤
+│ .81│ 3│
+└────────────────┴──────────┘
+```
+
+As a rule of thumb, many statisticians consider a value of Cronbach's
+Alpha of 0.7 or higher to indicate reliable data.
+
+Here, the value is 0.81, which suggests a high degree of reliability
+among variables v1, v3 and v4, so the data and the recoding that we
+performed are vindicated.
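
For readers curious about what the statistic measures, here is a minimal Python sketch of the standard textbook formula for Cronbach's Alpha. Whether PSPP computes it in exactly this way is an assumption, and the responses below are made up for illustration; they are not taken from `hotel.sav`.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of per-case scores."""
    k = len(items)
    totals = [sum(case) for case in zip(*items)]  # scale score per case
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Made-up responses for six cases (NOT taken from hotel.sav).
v1 = [4, 5, 2, 3, 4, 1]
v3 = [4, 4, 2, 3, 5, 2]
v4 = [5, 5, 1, 3, 4, 2]

print(f"alpha = {cronbach_alpha([v1, v3, v4]):.2f}")
```

Alpha approaches 1 when the items rise and fall together across cases, which is why forgetting to invert a negatively worded item such as v3 would drag the value down.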
+
+## Testing for normality
+
+Many statistical tests rely upon certain properties of the data. One
+common property, upon which many linear tests depend, is that of
+normality -- the data must have been drawn from a normal distribution.
+It is necessary then to ensure normality before deciding upon the test
+procedure to use. One way to do this uses the `EXAMINE` command.
+
+In the following example, a researcher was examining the failure
+rates of equipment produced by an engineering company. The file
+`repairs.sav` contains the mean time between failures (mtbf) of some
+items of equipment subject to the study. Before performing linear
+analysis on the data, the researcher wanted to ascertain that the data
+is normally distributed.
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/repairs.sav'.
+PSPP> examine mtbf /statistics=descriptives.
+```
+
+This produces the following output:
+
+```
+ Descriptives
+┌──────────────────────────────────────────────────────────┬─────────┬────────┐
+│ │ │ Std. │
+│ │Statistic│ Error │
+├──────────────────────────────────────────────────────────┼─────────┼────────┤
+│Mean time between Mean │ 8.78│ 1.10│
+│failures (months) ──────────────────────────────────┼─────────┼────────┤
+│ 95% Confidence Interval Lower │ 6.53│ │
+│ for Mean Bound │ │ │
+│ Upper │ 11.04│ │
+│ Bound │ │ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ 5% Trimmed Mean │ 8.20│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Median │ 8.29│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Variance │ 36.34│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Std. Deviation │ 6.03│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Minimum │ 1.63│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Maximum │ 26.47│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Range │ 24.84│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Interquartile Range │ 6.03│ │
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Skewness │ 1.65│ .43│
+│ ──────────────────────────────────┼─────────┼────────┤
+│ Kurtosis │ 3.41│ .83│
+└──────────────────────────────────────────────────────────┴─────────┴────────┘
+```
+
+A normal distribution has a skewness and kurtosis of zero. The
+skewness of mtbf in the output above makes it clear that the mtbf
+figures have a lot of positive skew and are therefore not drawn from a
+normally distributed variable. Positive skew can often be compensated
+for by applying a logarithmic transformation, as in the following
+continuation of the example:
+
+```
+PSPP> compute mtbf_ln = ln (mtbf).
+PSPP> examine mtbf_ln /statistics=descriptives.
+```
+
+which produces the following additional output:
+
+```
+ Descriptives
+┌────────────────────────────────────────────────────┬─────────┬──────────┐
+│ │Statistic│Std. Error│
+├────────────────────────────────────────────────────┼─────────┼──────────┤
+│mtbf_ln Mean │ 1.95│ .13│
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ 95% Confidence Interval for Mean Lower Bound│ 1.69│ │
+│ Upper Bound│ 2.22│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ 5% Trimmed Mean │ 1.96│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Median │ 2.11│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Variance │ .49│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Std. Deviation │ .70│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Minimum │ .49│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Maximum │ 3.28│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Range │ 2.79│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Interquartile Range │ .88│ │
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Skewness │ ─.37│ .43│
+│ ─────────────────────────────────────────────┼─────────┼──────────┤
+│ Kurtosis │ .01│ .83│
+└────────────────────────────────────────────────────┴─────────┴──────────┘
+```
+
+The `COMPUTE` command in the first line above performs the logarithmic
+transformation: `compute mtbf_ln = ln (mtbf).` Rather than
+redefining the existing variable, this use of `COMPUTE` defines a new
+variable mtbf_ln which is the natural logarithm of mtbf. The final
+command in this example calls `EXAMINE` on this new variable. The
+results show that both the skewness and kurtosis for mtbf_ln are very
+close to zero. This provides some confidence that the mtbf_ln
+variable is normally distributed and thus safe for linear analysis.
+In the event that no suitable transformation can be found, then it
+would be worth considering an appropriate non-parametric test instead
+of a linear one. See [`NPAR
+TESTS`](../../commands/statistics/npar-tests.md), for information
+about non-parametric tests.
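
The effect of the logarithmic transformation on skewness can be illustrated outside PSPP with a short Python sketch. The skewness function below uses the adjusted Fisher-Pearson definition, which is one common convention and is assumed here rather than confirmed to be the one PSPP uses. The data are made up and deliberately right-skewed; they are not the contents of `repairs.sav`.

```python
import math


def skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (one common definition)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2**1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


# Made-up, right-skewed values (NOT the contents of repairs.sav).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

Because these values double at each step, their logarithms are evenly spaced and the skew disappears almost entirely; real data will rarely behave this neatly, which is why the transformed variable should always be re-examined.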
+
* `flt64 bias;`
- Compression bias, ordinarily set to 100. Only integers between `1
- - bias` and `251 - bias` can be compressed.
+ Compression bias, usually 100. Only integers between `1 - bias` and
+ `251 - bias` can be compressed.
By assuming that its value is 100, PSPP uses `bias` to determine the
file's floating-point format and endianness. If the compression