From: Ben Pfaff
Date: Wed, 7 May 2025 19:03:00 +0000 (-0700)
Subject: work on manual
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=a8491059bfd7c1200219306ff350199dfdee1bef;p=pspp

work on manual
---

diff --git a/rust/doc/src/SUMMARY.md b/rust/doc/src/SUMMARY.md
index 4b01a29331..77e058bd01 100644
--- a/rust/doc/src/SUMMARY.md
+++ b/rust/doc/src/SUMMARY.md
@@ -101,6 +101,10 @@
   - [AGGREGATE](commands/data/aggregate.md)
   - [AUTORECODE](commands/data/autorecode.md)
   - [COMPUTE](commands/data/compute.md)
+  - [FLIP](commands/data/flip.md)
+  - [IF](commands/data/if.md)
+  - [RECODE](commands/data/recode.md)
+  - [SORT CASES](commands/data/sort-cases.md)

 # Developer Documentation

diff --git a/rust/doc/src/commands/data/compute.md b/rust/doc/src/commands/data/compute.md
index dd3067bf52..9181d955d5 100644
--- a/rust/doc/src/commands/data/compute.md
+++ b/rust/doc/src/commands/data/compute.md
@@ -38,7 +38,7 @@ When `COMPUTE` is specified following `TEMPORARY` (*note TEMPORARY::), the
 [`LAG`](../../language/expressions/functions/miscellaneous.md) function may not
 be used.

-## Examples
+## Example

 The dataset `physiology.sav` contains the height and weight of
 persons. For some purposes, neither height nor weight alone is of
diff --git a/rust/doc/src/commands/data/flip.md b/rust/doc/src/commands/data/flip.md
new file mode 100644
index 0000000000..ccef0d626b
--- /dev/null
+++ b/rust/doc/src/commands/data/flip.md
@@ -0,0 +1,131 @@
+# FLIP
+
+```
+FLIP /VARIABLES=VAR_LIST /NEWNAMES=VAR_NAME.
+```
+
+`FLIP` transposes rows and columns in the active dataset. It causes
+cases to be swapped with variables, and vice versa.
+
+All variables in the transposed active dataset are numeric. String
+variables take on the system-missing value in the transposed file.
+
+No subcommands are required. If specified, the `VARIABLES`
+subcommand selects variables to be transformed into cases, and variables
+not specified are discarded. If the `VARIABLES` subcommand is omitted,
+all variables are selected for transposition.
+
+The variable specified by `NEWNAMES`, which must be a string
+variable, is used to give names to the variables created by `FLIP`.
+Only the first 8 characters of the variable are used. If `NEWNAMES`
+is not specified then the default is a variable named CASE_LBL, if it
+exists. If it does not then the variables created by `FLIP` are named
+`VAR000` through `VAR999`, then `VAR1000`, `VAR1001`, and so on.
+
+When a `NEWNAMES` variable is available, the names must be
+canonicalized before becoming variable names. Invalid characters are
+replaced by letter `V` in the first position, or by `_` in subsequent
+positions. If the name thus generated is not unique, then numeric
+extensions are added, starting with 1, until a unique name is found or
+there are no remaining possibilities. If the latter occurs then the
+`FLIP` operation aborts.
+
+The resultant dictionary contains a `CASE_LBL` variable, a string
+variable of width 8, which stores the names of the variables in the
+dictionary before the transposition. Variable names longer than 8
+characters are truncated. If `FLIP` is called again on this dataset,
+the `CASE_LBL` variable can be passed to the `NEWNAMES` subcommand to
+recreate the original variable names.
+
+`FLIP` honors `N OF CASES` (*note N OF CASES::). It ignores
+`TEMPORARY` (*note TEMPORARY::), so that "temporary" transformations
+become permanent.
+
+## Example
+
+In the syntax below, data has been entered using [`DATA
+LIST`](../../commands/data-io/data-list.md) such that the first
+variable in the dataset is a string variable containing a description
+of the other data for the case. Clearly this is not a convenient
+arrangement for performing statistical analyses, so it would have been
+better to think a little more carefully about how the data should have
+been arranged. However, often the data is provided by some third party
+source, and you have no control over the form. Fortunately, we can
+use `FLIP` to exchange the variables and cases in the active dataset.
+
+```
+data list notable list /heading (a16) v1 v2 v3 v4 v5 v6
+begin data.
+date-of-birth 1970 1989 2001 1966 1976 1982
+sex 1 0 0 1 0 1
+score 10 10 9 3 8 9
+end data.
+
+echo 'Before FLIP:'.
+display variables.
+list.
+
+flip /variables = all /newnames = heading.
+
+echo 'After FLIP:'.
+display variables.
+list.
+```
+
+As you can see in the results below, before the `FLIP` command has run
+there are seven variables (six containing data and one for the
+heading) and three cases. Afterwards there are four variables (one
+per case, plus the CASE_LBL variable) and six cases. You can delete
+the CASE_LBL variable (see [DELETE
+VARIABLES](../../commands/variables/delete-variables.md)) if you don't
+need it.
+
+```
+Before FLIP:
+
+                 Variables
+┌───────┬────────┬────────────┬────────────┐
+│Name   │Position│Print Format│Write Format│
+├───────┼────────┼────────────┼────────────┤
+│heading│       1│A16         │A16         │
+│v1     │       2│F8.2        │F8.2        │
+│v2     │       3│F8.2        │F8.2        │
+│v3     │       4│F8.2        │F8.2        │
+│v4     │       5│F8.2        │F8.2        │
+│v5     │       6│F8.2        │F8.2        │
+│v6     │       7│F8.2        │F8.2        │
+└───────┴────────┴────────────┴────────────┘
+
+                           Data List
+┌─────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
+│   heading   │   v1  │   v2  │   v3  │   v4  │   v5  │   v6  │
+├─────────────┼───────┼───────┼───────┼───────┼───────┼───────┤
+│date-of-birth│1970.00│1989.00│2001.00│1966.00│1976.00│1982.00│
+│sex          │   1.00│    .00│    .00│   1.00│    .00│   1.00│
+│score        │  10.00│  10.00│   9.00│   3.00│   8.00│   9.00│
+└─────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
+
+After FLIP:
+
+                    Variables
+┌─────────────┬────────┬────────────┬────────────┐
+│Name         │Position│Print Format│Write Format│
+├─────────────┼────────┼────────────┼────────────┤
+│CASE_LBL     │       1│A8          │A8          │
+│date_of_birth│       2│F8.2        │F8.2        │
+│sex          │       3│F8.2        │F8.2        │
+│score        │       4│F8.2        │F8.2        │
+└─────────────┴────────┴────────────┴────────────┘
+
+             Data List
+┌────────┬─────────────┬────┬─────┐
+│CASE_LBL│date_of_birth│ sex│score│
+├────────┼─────────────┼────┼─────┤
+│v1      │      1970.00│1.00│10.00│
+│v2      │      1989.00│ .00│10.00│
+│v3      │      2001.00│ .00│ 9.00│
+│v4      │      1966.00│1.00│ 3.00│
+│v5      │      1976.00│ .00│ 8.00│
+│v6      │      1982.00│1.00│ 9.00│
+└────────┴─────────────┴────┴─────┘
+```
diff --git a/rust/doc/src/commands/data/if.md b/rust/doc/src/commands/data/if.md
new file mode 100644
index 0000000000..a1a3273392
--- /dev/null
+++ b/rust/doc/src/commands/data/if.md
@@ -0,0 +1,40 @@
+# IF
+
+```
+     IF CONDITION VARIABLE=EXPRESSION.
+or
+     IF CONDITION vector(INDEX)=EXPRESSION.
+```
+
+The `IF` transformation evaluates a test expression and, if it is
+true, assigns the value of a target expression to a target variable.
+
+Specify a boolean-valued test
+[expression](../../language/expressions/index.md) following the
+`IF` keyword. The test expression is evaluated for each case:
+
+- If it is true, then the target expression is evaluated and assigned
+  to the specified variable.
+
+- If it is false or missing, nothing is done.
+
+Numeric and string variables may be assigned. When a string
+expression's width differs from the target variable's width, the
+string result is truncated or padded with spaces on the right as
+necessary. The expression and variable types must match.
+
+The target variable may be specified as an element of a
+[vector](../../commands/variables/vector.md). In this case, a vector
+index expression must be specified in parentheses following the vector
+name. The index expression must evaluate to a numeric value that,
+after rounding down to the nearest integer, is a valid index for the
+named vector.
+
+Using `IF` to assign to a variable specified on
+[`LEAVE`](../../commands/variables/leave.md) resets the variable's
+left state. Therefore, use `LEAVE` after `IF`, not before.
+
+When `IF` follows `TEMPORARY` (*note TEMPORARY::), the
+[`LAG`](../../language/expressions/functions/miscellaneous.md) function may not
+be used.
+
diff --git a/rust/doc/src/commands/data/recode.md b/rust/doc/src/commands/data/recode.md
new file mode 100644
index 0000000000..1b4bfd4103
--- /dev/null
+++ b/rust/doc/src/commands/data/recode.md
@@ -0,0 +1,164 @@
+# RECODE
+
+The `RECODE` command is used to transform existing values into other,
+user-specified values. The general form is:
+
+```
+RECODE SRC_VARS
+    (SRC_VALUE SRC_VALUE ... = DEST_VALUE)
+    (SRC_VALUE SRC_VALUE ... = DEST_VALUE)
+    (SRC_VALUE SRC_VALUE ... = DEST_VALUE) ...
+    [INTO DEST_VARS].
+```
+
+Following the `RECODE` keyword itself comes `SRC_VARS`, a list of
+variables whose values are to be transformed. These variables must
+all be string or all be numeric variables.
+
+After the list of source variables, there should be one or more
+"mappings". Each mapping is enclosed in parentheses, and contains the
+source values and a destination value separated by a single `=`. The
+source values are used to specify the values in the dataset which need
+to change, and the destination value specifies the new value to which
+they should be changed. Each SRC_VALUE may take one of the following
+forms:
+
+* `NUMBER` (numeric source variables only)
+  Matches a number.
+
+* `STRING` (string source variables only)
+  Matches a string enclosed in single or double quotes.
+
+* `NUM1 THRU NUM2` (numeric source variables only)
+  Matches all values in the range between `NUM1` and `NUM2`, including
+  both endpoints of the range. `NUM1` should be less than `NUM2`.
+  Open-ended ranges may be specified using `LO` or `LOWEST` for `NUM1`
+  or `HI` or `HIGHEST` for `NUM2`.
+
+* `MISSING`
+  Matches system-missing and user-missing values.
+
+* `SYSMIS` (numeric source variables only)
+  Matches system-missing values.
+
+* `ELSE`
+  Matches any values that are not matched by any other `SRC_VALUE`.
+  This should appear only as the last mapping in the command.
+
+After the source values comes an `=` and then the `DEST_VALUE`,
+which may take any of the following forms:
+
+* `NUMBER` (numeric destination variables only)
+  A literal numeric value to which the source values should be
+  changed.
+
+* `STRING` (string destination variables only)
+  A literal string value (enclosed in quotation marks) to which the
+  source values should be changed. This implies the destination
+  variable must be a string variable.
+
+* `SYSMIS` (numeric destination variables only)
+  The keyword `SYSMIS` changes the value to the system-missing value.
+  This implies the destination variable must be numeric.
+
+* `COPY`
+  The special keyword `COPY` means that the source value should not be
+  modified, but copied directly to the destination value. This is
+  meaningful only if `INTO DEST_VARS` is specified.
+
+Mappings are considered from left to right. Therefore, if a value is
+matched by a `SRC_VALUE` from more than one mapping, the first
+(leftmost) mapping which matches is used. Any subsequent
+matches are ignored.
+
+The clause `INTO DEST_VARS` is optional. The behaviour of the command
+is slightly different depending on whether it appears or not:
+
+* Without `INTO DEST_VARS`, values are recoded "in place". This
+  means that the recoded values are written back to the source variables
+  from whence the original values came. In this case, the DEST_VALUE
+  for every mapping must imply a value which has the same type as the
+  SRC_VALUE. For example, if the source value is a string value, it is
+  not permissible for DEST_VALUE to be `SYSMIS` or another form which
+  implies a numeric result. It is also not permissible for DEST_VALUE
+  to be longer than the width of the source variable.
+
+  The following example recodes two numeric variables `x` and `y` in
+  place. 0 becomes 99, the values 1 to 10 inclusive are unchanged,
+  values 1000 and higher are recoded to the system-missing value, and
+  all other values are changed to 999:
+
+  ```
+  RECODE x y
+      (0 = 99)
+      (1 THRU 10 = COPY)
+      (1000 THRU HIGHEST = SYSMIS)
+      (ELSE = 999).
+  ```
+
+* With `INTO DEST_VARS`, recoded values are written into the variables
+  specified in `DEST_VARS`, which must therefore contain a list of
+  valid variable names. The number of variables in `DEST_VARS` must
+  be the same as the number of variables in `SRC_VARS` and the
+  respective order of the variables in `DEST_VARS` corresponds to the
+  order of `SRC_VARS`. That is to say, the recoded value whose
+  original value came from the Nth variable in `SRC_VARS` is placed
+  into the Nth variable in `DEST_VARS`. The source variables are
+  unchanged. If any mapping implies a string as its destination
+  value, then the respective destination variable must already exist,
+  or have been declared using `STRING` or another transformation.
+  Numeric variables, however, are automatically created if they don't
+  already exist.
+
+  The following example deals with two source variables, `a` and `b`
+  which contain string values. Hence there are two destination
+  variables `v1` and `v2`. Any cases where `a` or `b` contain the
+  values `apple`, `pear` or `pomegranate` result in `v1` or `v2` being
+  filled with the string `fruit` whilst cases with `tomato`, `lettuce`
+  or `carrot` result in `vegetable`. Other values produce the result
+  `unknown`:
+
+  ```
+  STRING v1 (A20).
+  STRING v2 (A20).
+
+  RECODE a b
+      ("apple" "pear" "pomegranate" = "fruit")
+      ("tomato" "lettuce" "carrot" = "vegetable")
+      (ELSE = "unknown")
+      INTO v1 v2.
+  ```
+
+There is one special mapping, not mentioned above. If the source
+variable is a string variable then a mapping may be specified as
+`(CONVERT)`. This mapping, if it appears, must be the last mapping
+given and the `INTO DEST_VARS` clause must also be given and must not
+refer to a string variable. `CONVERT` causes a number specified as a
+string to be converted to a numeric value. For example it converts
+the string `"3"` into the numeric value 3 (note that it does not
+convert `three` into 3). If the string cannot be parsed as a number,
+then the system-missing value is assigned instead. In the following
+example, cases where the value of `x` (a string variable) is the empty
+string, are recoded to 999 and all others are converted to the numeric
+equivalent of the input value. The results are placed into the
+numeric variable `y`:
+
+```
+RECODE x ("" = 999) (CONVERT) INTO y.
+```
+
+It is possible to specify multiple recodings on a single command.
+Introduce additional recodings with a slash (`/`) to separate them from
+the previous recodings:
+
+```
+RECODE
+    a (2 = 22) (ELSE = 99)
+    /b (1 = 3) INTO z.
+```
+
+Here we have two recodings. The first affects the source variable `a`
+and recodes in-place the value 2 into 22 and all other values to 99.
+The second recoding copies the values of `b` into the variable `z`,
+changing any instances of 1 into 3.
+
diff --git a/rust/doc/src/commands/data/sort-cases.md b/rust/doc/src/commands/data/sort-cases.md
new file mode 100644
index 0000000000..2501c63cb6
--- /dev/null
+++ b/rust/doc/src/commands/data/sort-cases.md
@@ -0,0 +1,98 @@
+# SORT CASES
+
+```
+SORT CASES BY VAR_LIST[({D|A})] [ VAR_LIST[({D|A})] ] ...
+```
+
+`SORT CASES` sorts the active dataset by the values of one or more
+variables.
+
+Specify `BY` and a list of variables to sort by. By default,
+variables are sorted in ascending order. To override sort order,
+specify `(D)` or `(DOWN)` after a list of variables to get descending
+order, or `(A)` or `(UP)` for ascending order. These apply to all the
+listed variables up until the preceding `(A)`, `(D)`, `(UP)` or
+`(DOWN)`.
+
+`SORT CASES` performs a stable sort, meaning that records with equal
+values of the sort variables have the same relative order before and
+after sorting. Thus, re-sorting an already sorted file does not
+affect the ordering of cases.
+
+`SORT CASES` is a procedure. It causes the data to be read.
+
+`SORT CASES` attempts to sort the entire active dataset in main
+memory. If workspace is exhausted, it falls back to a merge sort
+algorithm which creates numerous temporary files.
+
+`SORT CASES` may not be specified following `TEMPORARY`.
+
+## Example
+
+In the syntax below, the data from the file `physiology.sav` is sorted
+by two variables, viz. sex in descending order and temperature in
+ascending order.
+
+```
+get file='physiology.sav'.
+sort cases by sex (D) temperature(A).
+list.
+```
+
+In the output below, you can see that all the cases with a sex of
+`1` (female) appear before those with a sex of `0` (male). This is
+because they have been sorted in descending order. Within each sex,
+the data is sorted on the temperature variable, this time in ascending
+order.
+
+```
+           Data List
+┌───┬──────┬──────┬───────────┐
+│sex│height│weight│temperature│
+├───┼──────┼──────┼───────────┤
+│  1│  1606│  56.1│      34.56│
+│  1│   179│  56.3│      35.15│
+│  1│  1609│  55.4│      35.46│
+│  1│  1606│  56.0│      36.06│
+│  1│  1607│  56.3│      36.26│
+│  1│  1604│  56.0│      36.57│
+│  1│  1604│  56.6│      36.81│
+│  1│  1606│  56.3│      36.88│
+│  1│  1604│  57.8│      37.32│
+│  1│  1598│  55.6│      37.37│
+│  1│  1607│  55.9│      37.84│
+│  1│  1605│  54.5│      37.86│
+│  1│  1603│  56.1│      38.80│
+│  1│  1604│  58.1│      38.85│
+│  1│  1605│  57.7│      38.98│
+│  1│  1709│  55.6│      39.45│
+│  1│  1604│ -55.6│      39.72│
+│  1│  1601│  55.9│      39.90│
+│  0│  1799│  90.3│      32.59│
+│  0│  1799│  89.0│      33.61│
+│  0│  1799│  90.6│      34.04│
+│  0│  1801│  90.5│      34.42│
+│  0│  1802│  87.7│      35.03│
+│  0│  1793│  90.1│      35.11│
+│  0│  1801│  92.1│      35.98│
+│  0│  1800│  89.5│      36.10│
+│  0│  1645│  92.1│      36.68│
+│  0│  1698│  90.2│      36.94│
+│  0│  1800│  89.6│      37.02│
+│  0│  1800│  88.9│      37.03│
+│  0│  1801│  88.9│      37.12│
+│  0│  1799│  90.4│      37.33│
+│  0│  1903│  91.5│      37.52│
+│  0│  1799│  90.9│      37.53│
+│  0│  1800│  91.0│      37.60│
+│  0│  1799│  90.4│      37.68│
+│  0│  1801│  91.7│      38.98│
+│  0│  1801│  90.9│      39.03│
+│  0│  1799│  89.3│      39.77│
+│  0│  1884│  88.6│      39.97│
+└───┴──────┴──────┴───────────┘
+```
+
+`SORT CASES` affects only the active file. It does not have any
+effect upon the `physiology.sav` file itself. For that, you would
+have to rewrite the file using the `SAVE` command (*note SAVE::).
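
The ordering that the `SORT CASES` page in this patch describes (a stable sort, with `(D)` for descending and `(A)` for ascending keys) can be illustrated with a minimal Python sketch. This is not PSPP's implementation and is not part of the patch. The `sort_cases` helper and the sample rows are hypothetical, made up for this illustration only.

```python
from operator import itemgetter


def sort_cases(cases, keys):
    """Sort cases by (variable, direction) pairs, most significant key first.

    Hypothetical helper: direction is "A" (ascending) or "D" (descending).
    Python's sorted() is stable, even with reverse=True, so sorting by the
    least significant key first and the most significant key last yields
    the combined ordering while ties keep their original relative order.
    """
    result = list(cases)
    for name, direction in reversed(keys):
        result = sorted(result, key=itemgetter(name), reverse=(direction == "D"))
    return result


# Made-up sample rows, not taken from physiology.sav.
cases = [
    {"sex": 0, "temperature": 37.0, "id": "a"},
    {"sex": 1, "temperature": 36.5, "id": "b"},
    {"sex": 0, "temperature": 35.2, "id": "c"},
    {"sex": 1, "temperature": 36.5, "id": "d"},
]

# Roughly what `sort cases by sex (D) temperature (A).` asks for.
ordered = sort_cases(cases, [("sex", "D"), ("temperature", "A")])
print([row["id"] for row in ordered])
```

Rows `b` and `d` tie on both keys, so they stay in their input order, which is the property the page calls a stable sort. Sorting `ordered` again with the same keys returns the same sequence.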