- [VARIABLE WIDTH](commands/variables/variable-width.md)
- [VECTOR](commands/variables/vector.md)
- [WRITE FORMATS](commands/variables/write-formats.md)
+- [Transforming Data](commands/data/index.md)
+ - [AGGREGATE](commands/data/aggregate.md)
+ - [AUTORECODE](commands/data/autorecode.md)
+ - [COMPUTE](commands/data/compute.md)
# Developer Documentation
--- /dev/null
+# AGGREGATE
+
+```
+AGGREGATE
+ [OUTFILE={*,'FILE_NAME',FILE_HANDLE} [MODE={REPLACE,ADDVARIABLES}]]
+ [/MISSING=COLUMNWISE]
+ [/PRESORTED]
+ [/DOCUMENT]
+ [/BREAK=VAR_LIST]
+ /DEST_VAR['LABEL']...=AGR_FUNC(SRC_VARS[, ARGS]...)...
+```
+
+`AGGREGATE` summarizes groups of cases into single cases. It divides
+cases into groups that have the same values for one or more variables
+called "break variables". Several functions are available for
+summarizing case contents.
+
+The `AGGREGATE` syntax consists of subcommands to control its
+behavior, all of which are optional, followed by one or more
+destination variable assignments, each of which uses an aggregation
+function to define how it is calculated.
+
+The `OUTFILE` subcommand, which must be first, names the destination
+for `AGGREGATE` output. It may name a system file by file name or
+[file handle](../../language/files/file-handles.md), a
+[dataset](../../language/datasets/index.md) by its name, or `*` to
+replace the active dataset. `AGGREGATE` writes its output to this
+file.
+
+With `OUTFILE=*` only, `MODE` may be specified immediately afterward
+with the value `ADDVARIABLES` or `REPLACE`:
+
+- With `REPLACE`, the default, the active dataset is replaced by a
+ new dataset which contains just the break variables and the
+ destination variables. The new file contains as many cases as there
+ are unique combinations of the break variables.
+
+- With `ADDVARIABLES`, the destination variables are added to those
+ in the existing active dataset. Cases that have the same
+ combination of values in their break variables receive identical
+ values for the destination variables. The number of cases in the
+ active dataset remains unchanged. The data must be sorted on the
+ break variables, that is, `ADDVARIABLES` implies `PRESORTED`.
+
+Without `OUTFILE`, `AGGREGATE` acts as if `OUTFILE=*
+MODE=ADDVARIABLES` were specified.
+
+By default, `AGGREGATE` first sorts the data on the break variables.
+If the active dataset is already sorted or grouped by the break
+variables, specify `PRESORTED` to save time. With
+`MODE=ADDVARIABLES`, the data must be pre-sorted.
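+
+As a minimal sketch of the pre-sorting requirement, assuming the
+`occupation` and `salary` variables of the `personnel.sav` example
+below, the data could be sorted explicitly before each case is given
+its group's mean (the name `occ_mean_salary` is only illustrative):
+
+```
+GET FILE="personnel.sav".
+SORT CASES BY occupation.
+AGGREGATE OUTFILE=* MODE=ADDVARIABLES
+  /PRESORTED
+  /BREAK=occupation
+  /occ_mean_salary=MEAN(salary).
+```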
+
+Specify `DOCUMENT` (see the `DOCUMENT` command) to copy the documents from the
+active dataset into the aggregate file. Otherwise, the aggregate file
+does not contain any documents, even if the aggregate file replaces
+the active dataset.
+
+Normally, `AGGREGATE` produces a non-missing value whenever there is
+enough non-missing data for the aggregation function in use, that is,
+just one non-missing value or, for the `SD` and `SD.` aggregation
+functions, two non-missing values. Specify `/MISSING=COLUMNWISE` to
+make `AGGREGATE` output a missing value when one or more of the input
+values are missing.
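+
+For illustration, and again assuming the `personnel.sav` variables,
+the following sketch would yield a missing `total_salary` (a
+hypothetical name) for any occupation in which at least one salary is
+missing:
+
+```
+AGGREGATE OUTFILE=*
+  /MISSING=COLUMNWISE
+  /BREAK=occupation
+  /total_salary=SUM(salary).
+```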
+
+The `BREAK` subcommand is optional but usually present. On `BREAK`,
+list the variables used to divide the active dataset into groups to be
+summarized.
+
+`AGGREGATE` is particular about the order of subcommands. `OUTFILE`
+must be first, followed by `MISSING`. `PRESORTED` and `DOCUMENT`
+follow `MISSING`, in either order, followed by `BREAK`, then followed
+by aggregation variable specifications.
+
+At least one set of aggregation variables is required. Each set
+comprises a list of aggregation variables, an equals sign (`=`), the
+name of an aggregation function (see the list below), and a list of
+source variables in parentheses. A few aggregation functions do not
+accept source variables, and some aggregation functions expect
+additional arguments after the source variable names.
+
+`AGGREGATE` typically creates aggregation variables with no variable
+label, value labels, or missing values. Their default print and write
+formats depend on the aggregation function used, with details given in
+the table below. A variable label for an aggregation variable may be
+specified just after the variable's name in the aggregation variable
+list.
+
+Each set must have exactly as many source variables as aggregation
+variables. Each aggregation variable receives the results of applying
+the specified aggregation function to the corresponding source variable.
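+
+A sketch of one set with two aggregation variables, each with a
+variable label; `age` and the destination names are hypothetical:
+
+```
+AGGREGATE OUTFILE=*
+  /BREAK=occupation
+  /mean_salary 'Mean salary' mean_age 'Mean age'=MEAN(salary age).
+```
+
+Here `mean_salary` receives the mean of `salary` and `mean_age` the
+mean of `age`.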
+
+The following aggregation functions may be applied only to numeric
+variables:
+
+* `MEAN(VAR_NAME...)`
+ Arithmetic mean. Limited to numeric values. The default format is
+ `F8.2`.
+
+* `MEDIAN(VAR_NAME...)`
+ The median value. Limited to numeric values. The default format
+ is `F8.2`.
+
+* `SD(VAR_NAME...)`
+ Standard deviation. Limited to numeric values. The
+ default format is `F8.2`.
+
+* `SUM(VAR_NAME...)`
+ Sum. Limited to numeric values. The default format is `F8.2`.
+
+These aggregation functions may be applied to numeric and string
+variables:
+
+* `CGT(VAR_NAME..., VALUE)`
+ `CLT(VAR_NAME..., VALUE)`
+ `CIN(VAR_NAME..., LOW, HIGH)`
+ `COUT(VAR_NAME..., LOW, HIGH)`
+ Total weight of cases greater than or less than `VALUE` or inside or
+ outside the closed range `[LOW,HIGH]`, respectively. The default
+ format is `F5.3`.
+
+* `FGT(VAR_NAME..., VALUE)`
+ `FLT(VAR_NAME..., VALUE)`
+ `FIN(VAR_NAME..., LOW, HIGH)`
+ `FOUT(VAR_NAME..., LOW, HIGH)`
+ Fraction of values greater than or less than `VALUE` or inside or
+ outside the closed range `[LOW,HIGH]`, respectively. The default
+ format is `F5.3`.
+
+* `FIRST(VAR_NAME...)`
+ `LAST(VAR_NAME...)`
+ First or last non-missing value, respectively, in the break group. The
+ aggregation variable receives the complete dictionary information
+ from the source variable. The sort performed by `AGGREGATE` (and
+ by `SORT CASES`) is stable. This means that the first (or last)
+ case with particular values for the break variables before sorting
+ is also the first (or last) case in that break group after sorting.
+
+* `MIN(VAR_NAME...)`
+ `MAX(VAR_NAME...)`
+ Minimum or maximum value, respectively. The aggregation variable
+ receives the complete dictionary information from the source
+ variable.
+
+* `N(VAR_NAME...)`
+ `NMISS(VAR_NAME...)`
+ Total weight of non-missing or missing values, respectively. The
+ default format is `F7.0` if weighting is not enabled, `F8.2` if it is
+ (see `WEIGHT`).
+
+* `NU(VAR_NAME...)`
+ `NUMISS(VAR_NAME...)`
+ Count of non-missing or missing values, respectively, ignoring case
+ weights. The default format is `F7.0`.
+
+* `PGT(VAR_NAME..., VALUE)`
+ `PLT(VAR_NAME..., VALUE)`
+ `PIN(VAR_NAME..., LOW, HIGH)`
+ `POUT(VAR_NAME..., LOW, HIGH)`
+ Percentage between 0 and 100 of values greater than or less than
+ `VALUE` or inside or outside the closed range `[LOW,HIGH]`,
+ respectively. The default format is `F5.1`.
+
+These aggregation functions do not accept source variables:
+
+* `N`
+ Total weight of cases aggregated to form this group. The default
+ format is `F7.0` if weighting is not enabled, `F8.2` if it is (see
+ `WEIGHT`).
+
+* `NU`
+ Count of cases aggregated to form this group, ignoring case
+ weights. The default format is `F7.0`.
+
+Aggregation functions compare string values in terms of Unicode
+character codes.
+
+The aggregation functions listed above exclude all user-missing values
+from calculations. To include user-missing values, insert a period
+(`.`) at the end of the function name (e.g. `SUM.`). Be aware that
+specifying such a function as the last token on a line causes the
+period to be interpreted as the end of the command.
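+
+The sketch below combines several of these forms, assuming the
+`personnel.sav` variables; the thresholds and destination names are
+arbitrary. The last specification uses `SUM.` so that user-missing
+salaries are included in the sum:
+
+```
+AGGREGATE OUTFILE=*
+  /BREAK=occupation
+  /n_cases=N
+  /pct_over_40k=PGT(salary, 40000)
+  /frac_mid=FIN(salary, 30000, 40000)
+  /sum_all=SUM.(salary).
+```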
+
+`AGGREGATE` both ignores and cancels the current `SPLIT FILE` settings.
+
+## Example
+
+The `personnel.sav` dataset provides the occupations and salaries of
+many individuals. Often such detailed information is less interesting
+than the aggregated statistics for each occupation. Here, the
+`AGGREGATE` command is used to calculate the mean, the median, and the
+standard deviation of the salaries within each occupation.
+
+```
+GET FILE="personnel.sav".
+AGGREGATE OUTFILE=* MODE=REPLACE
+ /BREAK=occupation
+ /occ_mean_salary=MEAN(salary)
+ /occ_median_salary=MEDIAN(salary)
+ /occ_std_dev_salary=SD(salary).
+LIST.
+```
+
+Since we chose the `MODE=REPLACE` option, cases for the individual
+persons are no longer present. They have been replaced by a single
+case per occupation.
+
+```
+                                Data List
+┌──────────────────┬───────────────┬─────────────────┬──────────────────┐
+│    occupation    │occ_mean_salary│occ_median_salary│occ_std_dev_salary│
+├──────────────────┼───────────────┼─────────────────┼──────────────────┤
+│Artist            │       37836.18│         34712.50│           7631.48│
+│Baker             │       45075.20│         45075.20│           4411.21│
+│Barrister         │       39504.00│         39504.00│                 .│
+│Carpenter         │       39349.11│         36190.04│           7453.40│
+│Cleaner           │       41142.50│         39647.49│          14378.98│
+│Cook              │       40357.79│         43194.00│          11064.51│
+│Manager           │       46452.14│         45657.56│           6901.69│
+│Mathematician     │       34531.06│         34763.06│           5267.68│
+│Painter           │       45063.55│         45063.55│          15159.67│
+│Payload Specialist│       34355.72│         34355.72│                 .│
+│Plumber           │       40413.91│         40410.00│           4726.05│
+│Scientist         │       36687.07│         36803.83│          10873.54│
+│Scrientist        │       42530.65│         42530.65│                 .│
+│Tailor            │       34586.79│         34586.79│           3728.98│
+└──────────────────┴───────────────┴─────────────────┴──────────────────┘
+```
+
+The standard deviation is missing (shown as `.`) for occupations that
+have only one case.
+
--- /dev/null
+# AUTORECODE
+
+```
+AUTORECODE VARIABLES=SRC_VARS INTO DEST_VARS
+ [ /DESCENDING ]
+ [ /PRINT ]
+ [ /GROUP ]
+ [ /BLANK = {VALID, MISSING} ]
+```
+
+The `AUTORECODE` procedure considers the N values that a variable
+takes on and maps them onto values 1...N on a new numeric variable.
+
+Subcommand `VARIABLES` is the only required subcommand and must come
+first. Specify `VARIABLES`, an equals sign (`=`), a list of source
+variables, `INTO`, and a list of target variables. There must be the
+same number of source and target variables. The target variables must
+not already exist.
+
+`AUTORECODE` ordinarily assigns each increasing non-missing value of a
+source variable (for a string, this is based on character code
+comparisons) to consecutive values of its target variable. For
+example, the smallest non-missing value of the source variable is
+recoded to value 1, the next smallest to 2, and so on. If the source
+variable has user-missing values, they are recoded to consecutive
+values just above the non-missing values. For example, if a source
+variable has seven distinct non-missing values, then the smallest
+missing value would be recoded to 8, the next smallest to 9, and so
+on.
+
+Use `DESCENDING` to reverse the sort order for non-missing values, so
+that the largest non-missing value is recoded to 1, the second-largest
+to 2, and so on. Even with `DESCENDING`, user-missing values are
+still recoded in ascending order just above the non-missing values.
+
+The system-missing value is always recoded into the system-missing
+value in target variables.
+
+If a source value has a value label, then that value label is retained
+for the new value in the target variable. Otherwise, the source value
+itself becomes the new value's label.
+
+Variable labels are copied from the source to target variables.
+
+`PRINT` is currently ignored.
+
+The `GROUP` subcommand is relevant only if more than one variable is
+to be recoded. It causes a single mapping between source and target
+values to be used, instead of one map per variable. With `GROUP`,
+user-missing values are taken from the first source variable that has
+any user-missing values.
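+
+As a sketch, assuming two hypothetical string variables `q1` and `q2`
+that share one set of answer codes, a single mapping could be
+requested for both, with the largest value recoded to 1:
+
+```
+AUTORECODE VARIABLES=q1 q2 INTO q1_code q2_code
+  /DESCENDING
+  /GROUP.
+```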
+
+If `/BLANK=MISSING` is given, then string values which contain
+only whitespace are recoded as SYSMIS. If `/BLANK=VALID` is specified,
+then they are allocated a value like any other. `/BLANK` is not
+relevant to numeric values. `/BLANK=VALID` is the default.
+
+`AUTORECODE` is a procedure. It causes the data to be read.
+
+## Example
+
+In the file `personnel.sav`, the variable `occupation` is a string
+variable. Except for data of a purely commentary nature, string
+variables are generally a bad idea. One reason is that data entry
+errors are easily overlooked. This has happened in `personnel.sav`;
+one entry which should read "Scientist" has been mistyped as
+"Scrientist". The syntax below first corrects this error with a
+`DO IF` clause[^1], then uses `AUTORECODE` to create a new numeric
+variable which takes recoded values of `occupation`. Finally, it removes
+the old variable and renames the new variable to the name of the old
+variable:
+
+[^1]: One must use care when correcting such data input errors rather
+than simply marking them as missing. For example, if an occupation
+has been entered "Barister", did the person mean "Barrister" or
+"Barista"?
+
+```
+get file='personnel.sav'.
+
+* Correct a typing error in the original file.
+do if occupation = "Scrientist".
+ compute occupation = "Scientist".
+end if.
+
+autorecode
+ variables = occupation into occ
+ /blank = missing.
+
+* Delete the old variable.
+delete variables occupation.
+
+* Rename the new variable to the old variable's name.
+rename variables (occ = occupation).
+
+* Inspect the new variable.
+display dictionary /variables=occupation.
+```
+
+
+Notice, in the output below, how the new variable has been
+automatically allocated value labels which correspond to the strings
+of the old variable. This means that in future analyses the
+descriptive strings are reported instead of the numeric values.
+
+```
+                                   Variables
++----------+--------+--------------+-----+-----+---------+----------+---------+
+|          |        | Measurement  |     |     |         |  Print   |  Write  |
+|Name      |Position|    Level     | Role|Width|Alignment|  Format  | Format  |
++----------+--------+--------------+-----+-----+---------+----------+---------+
+|occupation|       6|Unknown       |Input|    8|Right    |F2.0      |F2.0     |
++----------+--------+--------------+-----+-----+---------+----------+---------+
+
+            Value Labels
++---------------+------------------+
+|Variable Value |      Label       |
++---------------+------------------+
+|occupation 1   |Artist            |
+|           2   |Baker             |
+|           3   |Barrister         |
+|           4   |Carpenter         |
+|           5   |Cleaner           |
+|           6   |Cook              |
+|           7   |Manager           |
+|           8   |Mathematician     |
+|           9   |Painter           |
+|           10  |Payload Specialist|
+|           11  |Plumber           |
+|           12  |Scientist         |
+|           13  |Tailor            |
++---------------+------------------+
+```
--- /dev/null
+# COMPUTE
+
+```
+COMPUTE VARIABLE = EXPRESSION.
+ or
+COMPUTE vector(INDEX) = EXPRESSION.
+```
+
+`COMPUTE` assigns the value of an expression to a target variable.
+For each case, the expression is evaluated and its value assigned to
+the target variable. Numeric and string variables may be assigned.
+When a string expression's width differs from the target variable's
+width, the string result of the expression is truncated or padded with
+spaces on the right as necessary. The expression and variable types
+must match.
+
+For numeric variables only, the target variable need not already
+exist. Numeric variables created by `COMPUTE` are assigned an `F8.2`
+output format. String variables must be declared before they can be
+used as targets for `COMPUTE`.
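+
+A minimal sketch of the difference, using hypothetical variables
+`score`, `doubled`, and `grade`: the numeric target is created on the
+fly, whereas the string target is first declared with `STRING`:
+
+```
+COMPUTE doubled = score * 2.
+STRING grade (A4).
+COMPUTE grade = 'pass'.
+```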
+
+The target variable may be specified as an element of a
+[vector](../../commands/variables/vector.md). In this case, an
+expression `INDEX` must be specified in parentheses following the vector
+name. The expression `INDEX` must evaluate to a numeric value that,
+after rounding down to the nearest integer, is a valid index for the
+named vector.
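+
+For example, the following sketch creates a three-element vector `v`
+(a hypothetical name, giving variables `v1` to `v3`) and assigns each
+element through an index expression:
+
+```
+VECTOR v(3).
+LOOP #i = 1 TO 3.
+  COMPUTE v(#i) = #i * 10.
+END LOOP.
+EXECUTE.
+```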
+
+Using `COMPUTE` to assign to a variable specified on
+[`LEAVE`](../../commands/variables/leave.md) resets the variable's
+left state. Therefore, `LEAVE` should be specified following
+`COMPUTE`, not before.
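+
+A sketch of a running total that follows this order, assuming a
+numeric variable `salary` and a hypothetical new variable `total`:
+
+```
+COMPUTE total = total + salary.
+LEAVE total.
+```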
+
+`COMPUTE` is a transformation. It does not cause the active dataset
+to be read.
+
+When `COMPUTE` is specified following the `TEMPORARY` command,
+the [`LAG`](../../language/expressions/functions/miscellaneous.md)
+function may not be used.
+
+## Examples
+
+The dataset `physiology.sav` contains the height and weight of
+persons. For some purposes, neither height nor weight alone is of
+interest. Epidemiologists are often more interested in the "body mass
+index" which can sometimes be used as a predictor for clinical
+conditions. The body mass index is defined as the weight of the
+person in kilograms divided by the square of the person's height in
+metres.[^1]
+
+[^1]: Since BMI is a quantity with a ratio scale and has units, the
+term "index" is a misnomer, but that is what it is called.
+
+```
+get file='physiology.sav'.
+
+* height is in mm so we must divide by 1000 to get metres.
+compute bmi = weight / (height/1000)**2.
+variable label bmi "Body Mass Index".
+
+descriptives /weight height bmi.
+```
+
+This syntax shows how you can use `COMPUTE` to generate a new variable
+called `bmi` and have every case's value calculated from the existing
+values of `weight` and `height`. It also shows how you can [add a
+label](../../commands/variables/variable-labels.md) to this new
+variable, so that a more descriptive label appears in subsequent
+analyses. The label can be seen in the output from the `DESCRIPTIVES`
+command, below.
+
+The expression which follows the `=` sign can be as complicated as
+necessary. See [Expressions](../../language/expressions.md) for a
+full description of the language accepted.
+
+```
+                  Descriptive Statistics
+┌─────────────────────┬──┬───────┬───────┬───────┬───────┐
+│                     │ N│ Mean  │Std Dev│Minimum│Maximum│
+├─────────────────────┼──┼───────┼───────┼───────┼───────┤
+│Weight in kilograms  │40│  72.12│  26.70│  -55.6│   92.1│
+│Height in millimeters│40│1677.12│ 262.87│    179│   1903│
+│Body Mass Index      │40│  67.46│ 274.08│ -21.62│1756.82│
+│Valid N (listwise)   │40│       │       │       │       │
+│Missing N (listwise) │ 0│       │       │       │       │
+└─────────────────────┴──┴───────┴───────┴───────┴───────┘
+```
--- /dev/null
+The PSPP procedures in this chapter manipulate data and prepare the
+active dataset for later analyses. They do not produce output.