From: Ben Pfaff <blp@cs.stanford.edu>
Date: Sat, 10 May 2025 15:39:13 +0000 (-0700)
Subject: work on manual
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=39e7f4f214956b848be53211c9d4b9bc6024db2e;p=pspp

work on manual
---

diff --git a/rust/doc/src/SUMMARY.md b/rust/doc/src/SUMMARY.md
index 65862d132f..13185aa046 100644
--- a/rust/doc/src/SUMMARY.md
+++ b/rust/doc/src/SUMMARY.md
@@ -5,6 +5,10 @@
 
 # Language Overview
 
+- [Tutorial](language/tutorial/index.md)
+  - [Data Preparation](language/tutorial/preparation.md)
+  - [Data Screening and Transformation](language/tutorial/transformations.md)
+  - [Hypothesis Testing](language/tutorial/hypotheses.md)
 - [Basics](language/basics/index.md)
   - [Tokens](language/basics/tokens.md)
   - [Forming Commands](language/basics/commands.md)
diff --git a/rust/doc/src/language/tutorial/hypotheses.md b/rust/doc/src/language/tutorial/hypotheses.md
new file mode 100644
index 0000000000..4806cb3282
--- /dev/null
+++ b/rust/doc/src/language/tutorial/hypotheses.md
@@ -0,0 +1,236 @@
+# Hypothesis Testing
+
+One of the most fundamental purposes of statistical analysis is
+hypothesis testing.  Researchers commonly need to test hypotheses about
+a set of data.  For example, a researcher might want to test whether one
+set of data comes from the same distribution as another, or whether the
+mean of a dataset differs significantly from a particular value.  This
+section presents just some of the tests that PSPP offers.
+
+The researcher starts by making a "null hypothesis".  Often this is a
+hypothesis which he suspects to be false.  For example, if he suspects
+that A is greater than B he will state the null hypothesis as A = B.[^1]
+
+[^1]: This example assumes that it is already proven that B is not
+greater than A.
+
+The "p-value" is a recurring concept in hypothesis testing.  It is
+the highest acceptable probability that evidence implying that the null
+hypothesis is false could have been obtained when the null hypothesis
+is in fact true.  Note that this is not the same as "the probability of
+making an error", nor is it the same as "the probability of rejecting a
+hypothesis when it is true".
+
+## Testing for differences of means
+
+A common statistical test involves hypotheses about means.  The `T-TEST`
+command is used to find out whether or not two separate subsets have the
+same mean.
+
+A researcher suspected that the heights and core body temperature of
+persons might be different depending upon their sex.  To investigate
+this, he posed two null hypotheses based on the data from
+`physiology.sav` previously encountered:
+
+   - The mean heights of males and females in the population are equal.
+
+   - The mean body temperatures of males and females in the population
+     are equal.
+
+For the purposes of the investigation the researcher decided to use a
+p-value of 0.05.
+
+In addition to the T-test, the `T-TEST` command also performs the
+Levene test for equal variances.  If the variances are equal, then a
+more powerful form of the T-test can be used.  However, if it is unsafe
+to assume equal variances, then an alternative calculation is necessary.
+PSPP performs both calculations.
+
+For the height variable, the output shows the significance of the
+Levene test to be 0.33, which means there is a 33% probability that the
+Levene test produces this outcome when the variances are equal.  Had the
+significance been less than 0.05, then it would have been unsafe to
+assume that the variances were equal.  However, because the value is
+higher than 0.05, the homogeneity of variances assumption is safe and the
+"Equal variances assumed" row (the more powerful test) can be used.
+In this row, the two-tailed significance for the height t-test is less than
+0.05, so it is safe to reject the null hypothesis and conclude that the
+mean heights of males and females are unequal.
+
+For the temperature variable, the significance of the Levene test is
+0.58, so again it is safe to use the row for equal variances.  The equal
+variances row indicates that the two-tailed significance for temperature
+is 0.20.  Since this is greater than 0.05, we cannot reject the null
+hypothesis, and must conclude that there is insufficient evidence to suggest
+that the body temperatures of male and female persons are different.
+
+The syntax for this analysis is:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/physiology.sav'.
+PSPP> recode height (179 = SYSMIS).
+PSPP> t-test group=sex(0,1) /variables = height temperature.
+```
+
+PSPP produces the following output for this syntax:
+
+```
+                                Group Statistics
+┌───────────────────────────────────────────┬──┬───────┬─────────────┬────────┐
+│                                           │  │       │     Std.    │  S.E.  │
+│                                     Group │ N│  Mean │  Deviation  │  Mean  │
+├───────────────────────────────────────────┼──┼───────┼─────────────┼────────┤
+│Height in millimeters                Male  │22│1796.49│        49.71│   10.60│
+│                                     Female│17│1610.77│        25.43│    6.17│
+├───────────────────────────────────────────┼──┼───────┼─────────────┼────────┤
+│Internal body temperature in degrees Male  │22│  36.68│         1.95│     .42│
+│Celcius                              Female│18│  37.43│         1.61│     .38│
+└───────────────────────────────────────────┴──┴───────┴─────────────┴────────┘
+
+                          Independent Samples Test
+┌─────────────────────┬──────────┬──────────────────────────────────────────
+│                     │ Levene's │
+│                     │ Test for │
+│                     │ Equality │
+│                     │    of    │
+│                     │ Variances│              T-Test for Equality of Means
+│                     ├────┬─────┼─────┬─────┬───────┬──────────┬──────────┐
+│                     │    │     │     │     │       │          │          │
+│                     │    │     │     │     │       │          │          │
+│                     │    │     │     │     │       │          │          │
+│                     │    │     │     │     │       │          │          │
+│                     │    │     │     │     │  Sig. │          │          │
+│                     │    │     │     │     │  (2-  │   Mean   │Std. Error│
+│                     │  F │ Sig.│  t  │  df │tailed)│Difference│Difference│
+├─────────────────────┼────┼─────┼─────┼─────┼───────┼──────────┼──────────┤
+│Height in   Equal    │ .97│ .331│14.02│37.00│   .000│    185.72│     13.24│
+│millimeters variances│    │     │     │     │       │          │          │
+│            assumed  │    │     │     │     │       │          │          │
+│            Equal    │    │     │15.15│32.71│   .000│    185.72│     12.26│
+│            variances│    │     │     │     │       │          │          │
+│            not      │    │     │     │     │       │          │          │
+│            assumed  │    │     │     │     │       │          │          │
+├─────────────────────┼────┼─────┼─────┼─────┼───────┼──────────┼──────────┤
+│Internal    Equal    │ .31│ .581│-1.31│38.00│   .198│      -.75│       .57│
+│body        variances│    │     │     │     │       │          │          │
+│temperature assumed  │    │     │     │     │       │          │          │
+│in degrees  Equal    │    │     │-1.33│37.99│   .190│      -.75│       .56│
+│Celcius     variances│    │     │     │     │       │          │          │
+│            not      │    │     │     │     │       │          │          │
+│            assumed  │    │     │     │     │       │          │          │
+└─────────────────────┴────┴─────┴─────┴─────┴───────┴──────────┴──────────┘
+
+┌─────────────────────┬─────────────┐
+│                     │             │
+│                     │             │
+│                     │             │
+│                     │             │
+│                     │             │
+│                     ├─────────────┤
+│                     │     95%     │
+│                     │  Confidence │
+│                     │ Interval of │
+│                     │     the     │
+│                     │  Difference │
+│                     ├──────┬──────┤
+│                     │ Lower│ Upper│
+├─────────────────────┼──────┼──────┤
+│Height in   Equal    │158.88│212.55│
+│millimeters variances│      │      │
+│            assumed  │      │      │
+│            Equal    │160.76│210.67│
+│            variances│      │      │
+│            not      │      │      │
+│            assumed  │      │      │
+├─────────────────────┼──────┼──────┤
+│Internal    Equal    │ -1.91│   .41│
+│body        variances│      │      │
+│temperature assumed  │      │      │
+│in degrees  Equal    │ -1.89│   .39│
+│Celcius     variances│      │      │
+│            not      │      │      │
+│            assumed  │      │      │
+└─────────────────────┴──────┴──────┘
+```
+
+The `T-TEST` command tests for differences of means.  Here, the height
+variable's two-tailed significance is less than 0.05, so the null
+hypothesis can be rejected.  Thus, the evidence suggests there is a
+difference between the heights of male and female persons.  However,
+the significance of the test for the temperature variable is greater
+than 0.05, so the null hypothesis cannot be rejected, and there is
+insufficient evidence to suggest a difference in body temperature.
+
+## Linear Regression
+
+Linear regression is a technique used to investigate if and how a
+variable is linearly related to others.  If a variable is found to be
+linearly related, then this can be used to predict future values of that
+variable.
+
+In the following example, the service department of a company wanted
+to be able to predict the time to repair equipment, in order to
+improve the accuracy of their quotations.  It was suggested that the
+time to repair might be related to the time between failures and the
+duty cycle of the equipment.  A p-value of 0.1 was chosen for this
+investigation.  In order to investigate this hypothesis, the
+[`REGRESSION`](../../commands/statistics/regression.md) command was
+used.  This command not only tests if the variables are related, but
+also identifies the potential linear relationship.
+
+A first attempt includes `duty_cycle`:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/repairs.sav'.
+PSPP> regression /variables = mtbf duty_cycle /dependent = mttr.
+```
+
+This attempt yields the following output (in part):
+
+```
+                  Coefficients (Mean time to repair (hours) )
+┌────────────────────────┬─────────────────────┬───────────────────┬─────┬────┐
+│                        │    Unstandardized   │    Standardized   │     │    │
+│                        │     Coefficients    │    Coefficients   │     │    │
+│                        ├─────────┬───────────┼───────────────────┤     │    │
+│                        │    B    │ Std. Error│        Beta       │  t  │Sig.│
+├────────────────────────┼─────────┼───────────┼───────────────────┼─────┼────┤
+│(Constant)              │    10.59│       3.11│                .00│ 3.40│.002│
+│Mean time between       │     3.02│        .20│                .95│14.88│.000│
+│failures (months)       │         │           │                   │     │    │
+│Ratio of working to non-│    -1.12│       3.69│               -.02│ -.30│.763│
+│working time            │         │           │                   │     │    │
+└────────────────────────┴─────────┴───────────┴───────────────────┴─────┴────┘
+```
+
+The coefficients in the above table suggest that the formula
+\\(\textrm{MTTR} = 10.59 + 3.02 \times \textrm{MTBF} - 1.12 \times
+\textrm{DUTY\_CYCLE}\\) can be used to predict the time to repair.
+However, the significance value for the `DUTY_CYCLE` coefficient is
+0.763, well above the chosen 0.1, which makes it an unsafe predictor.  For
+this reason, the test was repeated, but omitting the `duty_cycle` variable:
+
+```
+PSPP> regression /variables = mtbf /dependent = mttr.
+```
+
+This second try produces the following output (in part):
+
+```
+                  Coefficients (Mean time to repair (hours) )
+┌───────────────────────┬──────────────────────┬───────────────────┬─────┬────┐
+│                       │    Unstandardized    │    Standardized   │     │    │
+│                       │     Coefficients     │    Coefficients   │     │    │
+│                       ├─────────┬────────────┼───────────────────┤     │    │
+│                       │    B    │ Std. Error │        Beta       │  t  │Sig.│
+├───────────────────────┼─────────┼────────────┼───────────────────┼─────┼────┤
+│(Constant)             │     9.90│        2.10│                .00│ 4.71│.000│
+│Mean time between      │     3.01│         .20│                .94│15.21│.000│
+│failures (months)      │         │            │                   │     │    │
+└───────────────────────┴─────────┴────────────┴───────────────────┴─────┴────┘
+```
+
+This time, the significance of all coefficients is no higher than
+0.06, suggesting that at the 0.06 level, the formula \\(\textrm{MTTR} = 9.90 +
+3.01 \times \textrm{MTBF}\\) is a reliable predictor of the time to repair.
+
diff --git a/rust/doc/src/language/tutorial/index.md b/rust/doc/src/language/tutorial/index.md
new file mode 100644
index 0000000000..39e9d6d163
--- /dev/null
+++ b/rust/doc/src/language/tutorial/index.md
@@ -0,0 +1,36 @@
+# PSPP Language Tutorial
+
+PSPP is a tool for the statistical analysis of sampled data.  You can
+use it to discover patterns in the data, to explain differences in one
+subset of data in terms of another subset and to find out whether
+certain beliefs about the data are justified.  This chapter does not
+attempt to introduce the theory behind the statistical analysis, but it
+shows how such analysis can be performed using PSPP.
+
+This tutorial assumes that you are using PSPP in its interactive mode
+from the command line.  However, the example commands can also be
+typed into a file and executed in a post-hoc mode by typing `pspp
+FILE-NAME` at a shell prompt, where `FILE-NAME` is the name of the
+file containing the commands.  Alternatively, from the graphical
+interface, you can select File → New → Syntax to open a new syntax
+window and use the Run menu when a syntax fragment is ready to be
+executed.  Whichever method you choose, the syntax is identical.
+
+When using the interactive method, PSPP tells you that it's waiting
+for your input with a prompt like `PSPP>` or `data>`.  In the examples
+of this chapter, whenever you see text like this, it indicates the
+prompt displayed by PSPP, _not_ something that you should type.
+
+Throughout this chapter reference is made to a number of sample data
+files.  So that you can try the examples for yourself, you should have
+received these files along with your copy of PSPP.[^1]
+
+> Normally these files are installed in the directory
+`/usr/local/share/pspp/examples`.  If however your system
+administrator or operating system vendor has chosen to install them in
+a different location, you will have to adjust the examples
+accordingly.
+
+[^1]: These files contain purely fictitious data.  They should not be
+used for research purposes.
+
diff --git a/rust/doc/src/language/tutorial/preparation.md b/rust/doc/src/language/tutorial/preparation.md
new file mode 100644
index 0000000000..d5f0b3791c
--- /dev/null
+++ b/rust/doc/src/language/tutorial/preparation.md
@@ -0,0 +1,191 @@
+# Preparation of Data Files
+
+Before analysis can commence, the data must be loaded into PSPP and
+arranged such that both PSPP and humans can understand what the data
+represents.  There are two aspects of data:
+
+- The variables.  These are the parameters of a quantity which has
+  been measured or estimated in some way.  For example, height, weight,
+  and geographic location are all variables.
+
+- The observations (also called 'cases') of the variables.  Each
+  observation represents an instance when the variables were measured
+  or observed.
+
+For example, a data set which has the variables height, weight, and
+name, might have the observations:
+
+```
+1881 89.2 Ahmed
+1192 107.01 Frank
+1230 67 Julie
+```
+
+The following sections explain how to define a dataset.
+
+## Defining Variables
+
+Variables come in two basic types: "numeric" and "string".
+Variables such as age, height and satisfaction are numeric, whereas name
+is a string variable.  String variables are best reserved for commentary
+data to assist the human observer.  However, they can also be used for
+nominal or categorical data.
+
+The following example defines two variables, `forename` and `height`,
+and reads data into them by manual input:
+
+```
+PSPP> data list list /forename (A12) height.
+PSPP> begin data.
+data> Ahmed 188
+data> Bertram 167
+data> Catherine 134.231
+data> David 109.1
+data> end data
+PSPP>
+```
+
+There are several things to note about this example.
+
+- The words `data list list` are an example of the [`DATA
+  LIST`](../../commands/data-io/data-list.md) command, which tells
+  PSPP to prepare for reading data.  The word `list` intentionally
+  appears twice.  The first occurrence is part of the `DATA LIST`
+  command name, whilst the second tells PSPP that the data is to be
+  read as free format data with one record per line.
+
+  Usually this manual shows command names and other fixed elements of
+  syntax in upper case, but case doesn't matter in most parts of
+  command syntax.  In the tutorial, we usually show them in lowercase
+  because they are easier to type that way.
+
+- The `/` character is important.  It marks the start of the list of
+  variables which you wish to define.
+
+- The text `forename` is the name of the first variable, and `(A12)`
+  says that the variable forename is a string variable and that its
+  maximum length is 12 bytes.  The second variable's name is specified
+  by the text `height`.  Since no format is given, this variable has
+  the default format.  Normally the default format expects numeric
+  data, which should be entered in the locale of the operating system.
+  Thus, the example is correct for English locales and other locales
+  which use a period (`.`) as the decimal separator.  However, if you
+  are using a system with a locale which uses the comma (`,`) as the
+  decimal separator, then in the subsequent lines you should
+  replace `.` with `,`.  Alternatively, you could explicitly tell
+  PSPP that the height variable is to be read using a period as its
+  decimal separator by appending the text `DOT8.3` after the word
+  `height`.  For more information on data formats, see [Input and
+  Output Formats](../../language/datasets/formats/index.md).
+
+- PSPP displays the prompt `PSPP>` when it's expecting a command.
+  When it's expecting data, the prompt changes to `data>` so that you
+  know to enter data and not a command.
+
+- At the end of every command there is a terminating `.` which tells
+  PSPP that the end of a command has been encountered.  You should not
+  enter `.` when data is expected (i.e., when the `data>` prompt is
+  current) since it is appropriate only for terminating commands.
+
+  You can also terminate a command with a blank line.
+
+## Listing the data
+
+Once the data has been entered, you could type
+```
+PSPP> list /format=numbered.
+```
+to list the data.  The optional text `/format=numbered` requests the
+case numbers to be shown along with the data.  It should show the
+following output:
+
+```
+           Data List
+┌───────────┬─────────┬──────┐
+│Case Number│ forename│height│
+├───────────┼─────────┼──────┤
+│1          │Ahmed    │188.00│
+│2          │Bertram  │167.00│
+│3          │Catherine│134.23│
+│4          │David    │109.10│
+└───────────┴─────────┴──────┘
+```
+
+
+Note that the numeric variable height is displayed to 2 decimal
+places, because the format for that variable is `F8.2`.  For a
+complete description of the `LIST` command, see
+[`LIST`](../../commands/data-io/list.md).
+
+## Reading data from a text file
+
+The previous example showed how to define a set of variables and to
+manually enter the data for those variables.  Manual entry of data is
+tedious work, and often a file containing the data will have been
+previously prepared.  Let us assume that you have a file called
+`mydata.dat` containing the ASCII-encoded data:
+
+```
+Ahmed          188.00
+Bertram        167.00
+Catherine      134.23
+David          109.10
+              .
+              .
+              .
+Zachariah      113.02
+```
+
+You can tell the `DATA LIST` command to read the data directly
+from this file instead of by manual entry, with a command like
+`data list file='mydata.dat' list /forename (A12) height.`  Notice,
+however, that it is still necessary to specify the names of the
+variables and their formats, since this information is not contained
+in the file.  It is also possible to specify the file's character
+encoding and other parameters.  For full details refer to [`DATA
+LIST`](../../commands/data-io/data-list.md).
+
+## Reading data from a pre-prepared PSPP file
+
+When working with other PSPP users, or users of other software which
+uses the PSPP data format, you may be given the data in a pre-prepared
+PSPP file.  Such files contain not only the data, but the variable
+definitions, along with their formats, labels and other meta-data.
+Conventionally, these files (sometimes called "system" files) have the
+suffix `.sav`, but that is not mandatory.  The following syntax loads a
+file called `my-file.sav`.
+
+```
+PSPP> get file='my-file.sav'.
+```
+
+You will encounter several instances of this in future examples.
+
+## Saving data to a PSPP file
+
+If you want to save your data, along with the variable definitions so
+that you or other PSPP users can use it later, you can do this with the
+`SAVE` command.
+
+The following syntax will save the existing data and variables to a
+file called `my-new-file.sav`.
+
+```
+PSPP> save outfile='my-new-file.sav'.
+```
+
+If `my-new-file.sav` already exists, then it will be overwritten.
+Otherwise it will be created.
+
+## Reading data from other sources
+
+Sometimes it's useful to be able to read data from comma-separated
+text, spreadsheets, databases, or other sources.  In these
+instances you should use the [`GET
+DATA`](../../commands/spss-io/get-data.md) command.
+
+## Exiting PSPP
+
+Use the `FINISH` command to exit PSPP by typing `finish.` at the
+`PSPP>` prompt.
+
diff --git a/rust/doc/src/language/tutorial/transformations.md b/rust/doc/src/language/tutorial/transformations.md
new file mode 100644
index 0000000000..1975bfb257
--- /dev/null
+++ b/rust/doc/src/language/tutorial/transformations.md
@@ -0,0 +1,385 @@
+# Data Screening and Transformation
+
+Once data has been entered, it is often desirable, or even necessary,
+to transform it in some way before performing analysis upon it.  At
+the very least, it's good practice to check for errors.
+
+## Identifying incorrect data
+
+Data from real sources is rarely error free.  PSPP has a number of
+procedures which can be used to help identify data which might be
+incorrect.
+
+The [`DESCRIPTIVES`](../../commands/statistics/descriptives.md)
+command is used to generate simple linear statistics for a dataset.
+It is also useful for identifying potential problems in the data.  The
+example file `physiology.sav` contains a number of physiological
+measurements of a sample of healthy adults selected at random.
+However, the data entry clerk made a number of mistakes when entering
+the data.  The following example illustrates the use of `DESCRIPTIVES`
+to screen this data and identify the erroneous values:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/physiology.sav'.
+PSPP> descriptives sex, weight, height.
+```
+
+For this example, PSPP produces the following output:
+
+```
+                  Descriptive Statistics
+┌─────────────────────┬──┬───────┬───────┬───────┬───────┐
+│                     │ N│  Mean │Std Dev│Minimum│Maximum│
+├─────────────────────┼──┼───────┼───────┼───────┼───────┤
+│Sex of subject       │40│    .45│    .50│Male   │Female │
+│Weight in kilograms  │40│  72.12│  26.70│  -55.6│   92.1│
+│Height in millimeters│40│1677.12│ 262.87│    179│   1903│
+│Valid N (listwise)   │40│       │       │       │       │
+│Missing N (listwise) │ 0│       │       │       │       │
+└─────────────────────┴──┴───────┴───────┴───────┴───────┘
+```
+
+
+The most interesting column in the output is the minimum value.  The
+weight variable has a minimum value of less than zero, which is clearly
+erroneous.  Similarly, the height variable's minimum value seems to be
+very low.  In fact, it is more than 5 standard deviations from the mean,
+and is a seemingly bizarre height for an adult person.
+
+We can look deeper into these discrepancies by issuing an additional
+`EXAMINE` command:
+
+```
+PSPP> examine height, weight /statistics=extreme(3).
+```
+
+This command produces the following additional output (in part):
+
+```
+                   Extreme Values
+┌───────────────────────────────┬───────────┬─────┐
+│                               │Case Number│Value│
+├───────────────────────────────┼───────────┼─────┤
+│Height in millimeters Highest 1│         14│ 1903│
+│                              2│         15│ 1884│
+│                              3│         12│ 1802│
+│                     ──────────┼───────────┼─────┤
+│                      Lowest  1│         30│  179│
+│                              2│         31│ 1598│
+│                              3│         28│ 1601│
+├───────────────────────────────┼───────────┼─────┤
+│Weight in kilograms   Highest 1│         13│ 92.1│
+│                              2│          5│ 92.1│
+│                              3│         17│ 91.7│
+│                     ──────────┼───────────┼─────┤
+│                      Lowest  1│         38│-55.6│
+│                              2│         39│ 54.5│
+│                              3│         33│ 55.4│
+└───────────────────────────────┴───────────┴─────┘
+```
+
+From this new output, you can see that the lowest value of height is 179
+(which we suspect to be erroneous), but the second lowest is 1598 which
+we know from `DESCRIPTIVES` is within 1 standard deviation from the
+mean.  Similarly, the lowest value of weight is negative, but its second
+lowest value is plausible.  This suggests that the two extreme values
+are outliers and probably represent data entry errors.
+
+The output also identifies the case numbers for each extreme value,
+so we can see that cases 30 and 38 are the ones with the erroneous
+values.
+
+## Dealing with suspicious data
+
+If possible, suspect data should be checked and re-measured.  However,
+this may not always be feasible, in which case the researcher may
+decide to disregard these values.  PSPP has a feature for [missing
+values](../../language/basics/missing-values.md), whereby data can
+assume the special value `SYSMIS`, and will be disregarded in future
+analysis.  You can set the two suspect values to the `SYSMIS` value
+using the [`RECODE`](../../commands/data/recode.md) command.
+
+```
+PSPP> recode height (179 = SYSMIS).
+PSPP> recode weight (LOWEST THRU 0 = SYSMIS).
+```
+
+The first command says that for any observation which has a height value
+of 179, that value should be changed to the `SYSMIS` value.  The second
+command says that any weight values of zero or less should be changed to
+`SYSMIS`.  From now on, they will be ignored in analysis.
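+
+As a sketch of an equivalent form, the two recodings can be combined
+into a single `RECODE` command by separating them with a slash.  This
+assumes the same `height` and `weight` variables as above and is shown
+only as an illustration, not as a required style:
+
+```
+PSPP> recode height (179 = SYSMIS) / weight (LOWEST THRU 0 = SYSMIS).
+```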
+
+If you now re-run the `DESCRIPTIVES` or `EXAMINE` commands from the
+previous section, you will see a data summary with more plausible
+parameters.  You will also notice that the data summaries indicate the
+two missing values.
+
+## Inverting negatively coded variables
+
+Data entry errors are not the only reason for wanting to recode data.
+The sample file `hotel.sav` comprises data gathered from a customer
+satisfaction survey of clients at a particular hotel.  The following
+commands load the file and display its variables and associated data:
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/hotel.sav'.
+PSPP> display dictionary.
+```
+
+These commands yield the following output:
+
+```
+                                   Variables
+┌────┬────────┬─────────────┬────────────┬─────┬─────┬─────────┬──────┬───────┐
+│    │        │             │ Measurement│     │     │         │ Print│ Write │
+│Name│Position│    Label    │    Level   │ Role│Width│Alignment│Format│ Format│
+├────┼────────┼─────────────┼────────────┼─────┼─────┼─────────┼──────┼───────┤
+│v1  │       1│I am         │Ordinal     │Input│    8│Right    │F8.0  │F8.0   │
+│    │        │satisfied    │            │     │     │         │      │       │
+│    │        │with the     │            │     │     │         │      │       │
+│    │        │level of     │            │     │     │         │      │       │
+│    │        │service      │            │     │     │         │      │       │
+│v2  │       2│The value for│Ordinal     │Input│    8│Right    │F8.0  │F8.0   │
+│    │        │money was    │            │     │     │         │      │       │
+│    │        │good         │            │     │     │         │      │       │
+│v3  │       3│The staff    │Ordinal     │Input│    8│Right    │F8.0  │F8.0   │
+│    │        │were slow in │            │     │     │         │      │       │
+│    │        │responding   │            │     │     │         │      │       │
+│v4  │       4│My concerns  │Ordinal     │Input│    8│Right    │F8.0  │F8.0   │
+│    │        │were dealt   │            │     │     │         │      │       │
+│    │        │with in an   │            │     │     │         │      │       │
+│    │        │efficient    │            │     │     │         │      │       │
+│    │        │manner       │            │     │     │         │      │       │
+│v5  │       5│There was too│Ordinal     │Input│    8│Right    │F8.0  │F8.0   │
+│    │        │much noise in│            │     │     │         │      │       │
+│    │        │the rooms    │            │     │     │         │      │       │
+└────┴────────┴─────────────┴────────────┴─────┴─────┴─────────┴──────┴───────┘
+
+                              Value Labels
+┌────────────────────────────────────────────────────┬─────────────────┐
+│Variable Value                                      │      Label      │
+├────────────────────────────────────────────────────┼─────────────────┤
+│I am satisfied with the level of service           1│Strongly Disagree│
+│                                                   2│Disagree         │
+│                                                   3│No Opinion       │
+│                                                   4│Agree            │
+│                                                   5│Strongly Agree   │
+├────────────────────────────────────────────────────┼─────────────────┤
+│The value for money was good                       1│Strongly Disagree│
+│                                                   2│Disagree         │
+│                                                   3│No Opinion       │
+│                                                   4│Agree            │
+│                                                   5│Strongly Agree   │
+├────────────────────────────────────────────────────┼─────────────────┤
+│The staff were slow in responding                  1│Strongly Disagree│
+│                                                   2│Disagree         │
+│                                                   3│No Opinion       │
+│                                                   4│Agree            │
+│                                                   5│Strongly Agree   │
+├────────────────────────────────────────────────────┼─────────────────┤
+│My concerns were dealt with in an efficient manner 1│Strongly Disagree│
+│                                                   2│Disagree         │
+│                                                   3│No Opinion       │
+│                                                   4│Agree            │
+│                                                   5│Strongly Agree   │
+├────────────────────────────────────────────────────┼─────────────────┤
+│There was too much noise in the rooms              1│Strongly Disagree│
+│                                                   2│Disagree         │
+│                                                   3│No Opinion       │
+│                                                   4│Agree            │
+│                                                   5│Strongly Agree   │
+└────────────────────────────────────────────────────┴─────────────────┘
+```
+
+The output shows that all of the variables v1 through v5 are measured
+on a 5-point Likert scale, with 1 meaning "Strongly Disagree" and 5
+meaning "Strongly Agree".  However, some of the questions are positively
+worded (v1, v2, v4) and others are negatively worded (v3, v5).  To
+perform meaningful analysis, we need to recode the variables so that
+they all measure in the same direction.  We could use the `RECODE`
+command, with syntax such as:
+
+```
+recode v3 (1 = 5) (2 = 4) (4 = 2) (5 = 1).
+```
+
+However, an easier and more elegant way uses the
+[`COMPUTE`](../../commands/data/compute.md) command.  Since the
+variables are Likert variables in the range (1 ... 5), subtracting
+their value from 6 has the effect of inverting them:
+
+```
+compute VAR = 6 - VAR.
+```
+
+The following section uses this technique to recode the
+variables v3 and v5.  After applying `COMPUTE` for both variables, all
+subsequent commands will use the inverted values.
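+
+If the original responses should be kept, a variant of the same idea
+is to store the inverted value in a new variable rather than
+overwriting the old one.  In this sketch the name `v3_inv` is
+hypothetical, chosen only for illustration; later commands would then
+need to refer to `v3_inv` instead of `v3`:
+
+```
+compute v3_inv = 6 - v3.
+```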
+
+## Testing data consistency
+
+A sensible check to perform on survey data is the calculation of
+reliability.  This gives the statistician some confidence that the
+questionnaires have been completed thoughtfully.  If you examine the
+labels of variables v1, v3 and v4, you will notice that they ask very
+similar questions.  One would therefore expect the values of these
+variables (after recoding) to closely follow one another, and we can
+test that with the
+[`RELIABILITY`](../../commands/statistics/reliability.md) command.
+The following example shows a PSPP session where the user recodes
+negatively scaled variables and then requests reliability statistics
+for v1, v3, and v4.
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/hotel.sav'.
+PSPP> compute v3 = 6 - v3.
+PSPP> compute v5 = 6 - v5.
+PSPP> reliability v1, v3, v4.
+```
+
+This yields the following output:
+
+```
+Scale: ANY
+
+Case Processing Summary
+┌────────┬──┬───────┐
+│Cases   │ N│Percent│
+├────────┼──┼───────┤
+│Valid   │17│ 100.0%│
+│Excluded│ 0│    .0%│
+│Total   │17│ 100.0%│
+└────────┴──┴───────┘
+
+    Reliability Statistics
+┌────────────────┬──────────┐
+│Cronbach's Alpha│N of Items│
+├────────────────┼──────────┤
+│             .81│         3│
+└────────────────┴──────────┘
+```
+
+As a rule of thumb, many statisticians consider a value of Cronbach's
+Alpha of 0.7 or higher to indicate reliable data.
+
+Here, the value is 0.81, which suggests a high degree of reliability
+among variables v1, v3 and v4, so the data and the recoding that we
+performed are vindicated.
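+
+To see how each item contributes to the scale, `RELIABILITY` also
+accepts a `/SUMMARY=TOTAL` subcommand, which requests per-item
+statistics tested against the totals.  The following is a minimal
+sketch that assumes the recoded variables from the session above and
+uses the explicit `/VARIABLES` form of the command; its output is not
+reproduced here:
+
+```
+PSPP> reliability /variables=v1 v3 v4 /summary=total.
+```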
+
+## Testing for normality
+
+Many statistical tests rely upon certain properties of the data.  One
+common property, upon which many linear tests depend, is that of
+normality -- the data must have been drawn from a normal distribution.
+It is therefore necessary to check for normality before deciding upon
+the test procedure to use.  One way to do this uses the `EXAMINE` command.
+
+In the following example, a researcher was examining the failure
+rates of equipment produced by an engineering company.  The file
+`repairs.sav` contains the mean time between failures (mtbf) of some
+items of equipment subject to the study.  Before performing linear
+analysis on the data, the researcher wanted to ascertain whether the
+data is normally distributed.
+
+```
+PSPP> get file='/usr/local/share/pspp/examples/repairs.sav'.
+PSPP> examine mtbf /statistics=descriptives.
+```
+
+This produces the following output:
+
+```
+                                  Descriptives
+┌──────────────────────────────────────────────────────────┬─────────┬────────┐
+│                                                          │         │  Std.  │
+│                                                          │Statistic│  Error │
+├──────────────────────────────────────────────────────────┼─────────┼────────┤
+│Mean time between        Mean                             │     8.78│    1.10│
+│failures (months)       ├─────────────────────────────────┼─────────┼────────┤
+│                         95% Confidence Interval Lower    │     6.53│        │
+│                         for Mean                Bound    │         │        │
+│                                                 Upper    │    11.04│        │
+│                                                 Bound    │         │        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         5% Trimmed Mean                  │     8.20│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Median                           │     8.29│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Variance                         │    36.34│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Std. Deviation                   │     6.03│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Minimum                          │     1.63│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Maximum                          │    26.47│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Range                            │    24.84│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Interquartile Range              │     6.03│        │
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Skewness                         │     1.65│     .43│
+│                        ├─────────────────────────────────┼─────────┼────────┤
+│                         Kurtosis                         │     3.41│     .83│
+└──────────────────────────────────────────────────────────┴─────────┴────────┘
+```
+
+A normal distribution has a skewness of zero, and its kurtosis, as
+reported here in the form of excess kurtosis, is also zero.  The
+skewness of mtbf in the output above shows that the mtbf figures have
+a lot of positive skew and are therefore unlikely to have been drawn
+from a normally distributed variable.  Positive skew can often be
+compensated for by applying a logarithmic transformation, as in the
+following continuation of the example:
+
+```
+PSPP> compute mtbf_ln = ln (mtbf).
+PSPP> examine mtbf_ln /statistics=descriptives.
+```
+
+which produces the following additional output:
+
+```
+                                Descriptives
+┌────────────────────────────────────────────────────┬─────────┬──────────┐
+│                                                    │Statistic│Std. Error│
+├────────────────────────────────────────────────────┼─────────┼──────────┤
+│mtbf_ln Mean                                        │     1.95│       .13│
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        95% Confidence Interval for Mean Lower Bound│     1.69│          │
+│                                         Upper Bound│     2.22│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        5% Trimmed Mean                             │     1.96│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Median                                      │     2.11│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Variance                                    │      .49│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Std. Deviation                              │      .70│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Minimum                                     │      .49│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Maximum                                     │     3.28│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Range                                       │     2.79│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Interquartile Range                         │      .88│          │
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Skewness                                    │     −.37│       .43│
+│       ├────────────────────────────────────────────┼─────────┼──────────┤
+│        Kurtosis                                    │      .01│       .83│
+└────────────────────────────────────────────────────┴─────────┴──────────┘
+```
+
+The `COMPUTE` command in the first line above performs the logarithmic
+transformation: `compute mtbf_ln = ln (mtbf).`  Rather than
+redefining the existing variable, this use of `COMPUTE` defines a new
+variable mtbf_ln which is the natural logarithm of mtbf.  The final
+command in this example calls `EXAMINE` on this new variable.  The
+results show that both the skewness and kurtosis for mtbf_ln are
+close to zero, each within one standard error.  This provides some
+confidence that the mtbf_ln variable is normally distributed and thus
+safe for linear analysis.
+
+In the event that no suitable transformation can be found, it would be
+worth considering an appropriate non-parametric test instead of a
+linear one.  See [`NPAR TESTS`](../../commands/statistics/npar-tests.md)
+for information about non-parametric tests.
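+
+As one illustration of `NPAR TESTS` syntax, a one-sample
+Kolmogorov-Smirnov test can compare a variable against a normal
+distribution, which offers a further check on normality beyond
+inspecting skewness and kurtosis.  The following is a sketch only: it
+assumes the `mtbf_ln` variable computed above, and its output is not
+shown here:
+
+```
+PSPP> npar tests /kolmogorov-smirnov (normal) = mtbf_ln.
+```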
+
diff --git a/rust/doc/src/system-file/file-header-record.md b/rust/doc/src/system-file/file-header-record.md
index 9b95430898..d444313f3b 100644
--- a/rust/doc/src/system-file/file-header-record.md
+++ b/rust/doc/src/system-file/file-header-record.md
@@ -89,8 +89,8 @@ A system file begins with the file header, with the following format:
 
 * `flt64 bias;`
 
-  Compression bias, ordinarily set to 100.  Only integers between `1
-  - bias` and `251 - bias` can be compressed.
+  Compression bias, usually 100.  Only integers between `1 - bias` and
+  `251 - bias` can be compressed.
 
   By assuming that its value is 100, PSPP uses `bias` to determine the
   file's floating-point format and endianness.  If the compression