From: Ben Pfaff <blp@cs.stanford.edu>
Date: Tue, 6 May 2025 02:38:45 +0000 (-0700)
Subject: work on manual
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=e7e12d3e146633cf87a44701dd045158bae05fab;p=pspp

work on manual
---

diff --git a/rust/doc/src/SUMMARY.md b/rust/doc/src/SUMMARY.md
index 69876f7a74..e6e59f1056 100644
--- a/rust/doc/src/SUMMARY.md
+++ b/rust/doc/src/SUMMARY.md
@@ -3,6 +3,19 @@
 [Introduction](introduction.md)
 [License](license.md)
 
+# Language Syntax
+
+- [Basics](language/basics/index.md)
+  - [Tokens](language/basics/tokens.md)
+  - [Forming Commands](language/basics/commands.md)
+  - [Syntax Variants](language/basics/syntax-variants.md)
+  - [Handling Missing Values](language/basics/missing-values.md)
+- [Datasets](language/datasets/index.md)
+  - [Variables](language/datasets/variables.md)
+  - [Variable Lists](language/datasets/variable-lists.md)
+  - [Input and Output Formats](language/datasets/formats/index.md)
+    - [Basic Numeric Formats](language/datasets/formats/basic.md)
+
 # Developer Documentation
 
 - [System File Format](system-file/index.md)
diff --git a/rust/doc/src/language/basics/commands.md b/rust/doc/src/language/basics/commands.md
new file mode 100644
index 0000000000..9e5241cb56
--- /dev/null
+++ b/rust/doc/src/language/basics/commands.md
@@ -0,0 +1,21 @@
+# Forming Commands
+
+Most PSPP commands share a common structure.  A command begins with a
+command name, such as `FREQUENCIES`, `DATA LIST`, or `N OF CASES`.  The
+command name may be abbreviated to its first word, and each word in the
+command name may be abbreviated to its first three or more characters,
+where these abbreviations are unambiguous.
+
+The command name may be followed by one or more "subcommands".  Each
+subcommand begins with a subcommand name, which may be abbreviated to
+its first three letters.  Some subcommands accept a series of one or
+more specifications, which follow the subcommand name, optionally
+separated from it by an equals sign (`=`).  Specifications may be
+separated from each other by commas or spaces.  Each subcommand must
+be separated from the next (if any) by a forward slash (`/`).
+
+There are multiple ways to mark the end of a command.  The most common
+way is to end the last line of the command with a period (`.`) as
+described in the previous section.  A blank line, or one that consists
+only of white space or comments, also ends a command.
+
diff --git a/rust/doc/src/language/basics/index.md b/rust/doc/src/language/basics/index.md
new file mode 100644
index 0000000000..b7d305c858
--- /dev/null
+++ b/rust/doc/src/language/basics/index.md
@@ -0,0 +1,2 @@
+This chapter discusses elements common to many PSPP commands.  Later
+chapters describe individual commands in detail.
diff --git a/rust/doc/src/language/basics/missing-values.md b/rust/doc/src/language/basics/missing-values.md
new file mode 100644
index 0000000000..a63e0a45cf
--- /dev/null
+++ b/rust/doc/src/language/basics/missing-values.md
@@ -0,0 +1,19 @@
+# Handling Missing Values
+
+PSPP includes special support for unknown numeric data values.  Missing
+observations are assigned a special value, called the "system-missing
+value".  This "value" actually indicates the absence of a value; it
+means that the actual value is unknown.  Procedures automatically
+exclude from analyses those observations or cases that have missing
+values.  Details of missing value exclusion depend on the procedure and
+can often be controlled by the user; refer to descriptions of individual
+procedures for details.
+
+   The system-missing value exists only for numeric variables.  String
+variables always have a defined value, even if it is only a string of
+spaces.
+
+   Variables, whether numeric or string, can have designated
+"user-missing values".  Every user-missing value is an actual value for
+that variable.  However, most of the time user-missing values are
+treated in the same way as the system-missing value.
diff --git a/rust/doc/src/language/basics/syntax-variants.md b/rust/doc/src/language/basics/syntax-variants.md
new file mode 100644
index 0000000000..59092ab566
--- /dev/null
+++ b/rust/doc/src/language/basics/syntax-variants.md
@@ -0,0 +1,30 @@
+# Syntax Variants
+
+There are three variants of command syntax, which vary only in how they
+detect the end of one command and the start of the next.
+
+   In "interactive mode", which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line ends
+a command.  A blank line also ends a command.
+
+   In "batch mode", an end-of-line period or a blank line also ends a
+command.  Additionally, it treats any line that has a non-blank
+character in the leftmost column as beginning a new command.  Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+   Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin a
+new command.  This is most useful in batch mode, in which the first line
+of a new command could not otherwise be indented, but it is accepted
+regardless of syntax mode.
+
+   The default mode for reading commands from a file is "auto mode".  It
+is the same as batch mode, except that a line with a non-blank in the
+leftmost column only starts a new command if that line begins with the
+name of a PSPP command.  This correctly interprets most valid PSPP
+syntax files regardless of the syntax mode for which they are intended.
+
+   The `--interactive` (or `-i`) and `--batch` (or `-b`) options set the
+syntax mode for files listed on the PSPP command line.
+
diff --git a/rust/doc/src/language/basics/tokens.md b/rust/doc/src/language/basics/tokens.md
new file mode 100644
index 0000000000..5da24061b9
--- /dev/null
+++ b/rust/doc/src/language/basics/tokens.md
@@ -0,0 +1,115 @@
+# Tokens
+
+PSPP divides most syntax file lines into series of short chunks called
+"tokens".  Tokens are then grouped to form commands, each of which
+tells PSPP to take some action: read in data, write out data, perform a
+statistical procedure, etc.  Each type of token is described below.
+
+## Identifiers
+
+Identifiers are names that typically specify variables, commands, or
+subcommands.  The first character in an identifier must be a letter,
+`#`, or `@`.  The remaining characters in the identifier must be
+letters, digits, or one of the following special characters:
+
+```
+. _ $ # @
+```
+
+Identifiers may be any length, but only the first 64 bytes are
+significant.  Identifiers are not case-sensitive: `foobar`,
+`Foobar`, `FooBar`, `FOOBAR`, and `FoObaR` are different
+representations of the same identifier.
+
+Some identifiers are reserved.  Reserved identifiers may not be
+used in any context besides those explicitly described in this
+manual.  The reserved identifiers are:
+
+```
+ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
+```
+
+## Keywords
+
+Keywords are a subclass of identifiers that form a fixed part of
+command syntax.  For example, command and subcommand names are
+keywords.  Keywords may be abbreviated to their first 3 characters
+if this abbreviation is unambiguous.  (Unique abbreviations of 3 or
+more characters are also accepted: `FRE`, `FREQ`, and `FREQUENCIES`
+are equivalent when the last is a keyword.)
+
+Reserved identifiers are always used as keywords.  Other
+identifiers may be used both as keywords and as user-defined
+identifiers, such as variable names.
+
+## Numbers
+
+Numbers are expressed in decimal.  A decimal point is optional.
+Numbers may be expressed in scientific notation by adding `e` and a
+base-10 exponent, so that `1.234e3` has the value 1234.  Here are
+some more examples of valid numbers:
+
+```
+-5  3.14159265359  1e100  -.707  8945.
+```
+
+Negative numbers are expressed with a `-` prefix.  However, in
+situations where a literal `-` token is expected, what appears to
+be a negative number is treated as `-` followed by a positive
+number.
+
+No white space is allowed within a number token, except for
+horizontal white space between `-` and the rest of the number.
+
+The last example above, `8945.`, is interpreted as two tokens,
+`8945` and `.`, if it is the last token on a line.  See
+[Forming Commands](commands.md).
+
+## Strings
+
+Strings are literal sequences of characters enclosed in pairs of
+single quotes (`'`) or double quotes (`"`).  To include the
+character used for quoting in the string, double it, e.g. `'it''s
+an apostrophe'`.  White space and case of letters are significant
+inside strings.
+
+Strings can be concatenated using `+`, so that `"a" + 'b' + 'c'` is
+equivalent to `'abc'`.  So that a long string may be broken across
+lines, a line break may precede or follow, or both precede and
+follow, the `+`.  (However, an entirely blank line preceding or
+following the `+` is interpreted as ending the current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by `x` or `X`.  Regardless of
+the syntax file or active dataset's encoding, the hexadecimal
+digits in the string are interpreted as Unicode characters in UTF-8
+encoding.
+
+Individual Unicode code points may also be expressed by specifying
+the hexadecimal code point number in single or double quotes
+preceded by `u` or `U`.  For example, Unicode code point U+1D11E,
+the musical G clef character, could be expressed as `U'1D11E'`.
+Invalid Unicode code points (above U+10FFFF or in between U+D800
+and U+DFFF) are not allowed.
+
+When strings are concatenated with `+`, each segment's prefix is
+considered individually.  For example, `'The G clef symbol is:' +
+u"1d11e" + "."` inserts a G clef symbol in the middle of an
+otherwise plain text string.
+
+## Punctuators and Operators
+
+These tokens are the punctuators and operators:
+
+```
+, / = ( ) + - * / ** < <= <> > >= ~= & | .
+```
+
+Most of these appear within the syntax of commands, but the period
+(`.`) punctuator is used only at the end of a command.  It is a
+punctuator only as the last character on a line (except white
+space).  When it is the last non-space character on a line, a
+period is not treated as part of another token, even if it would
+otherwise be part of, e.g., an identifier or a floating-point
+number.
+
diff --git a/rust/doc/src/language/datasets/formats/basic.md b/rust/doc/src/language/datasets/formats/basic.md
new file mode 100644
index 0000000000..10d4d316ee
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/basic.md
@@ -0,0 +1,161 @@
+# Basic Numeric Formats
+
+The basic numeric formats are used for input and output of real numbers
+in standard or scientific notation.  The following table shows an
+example of how each format displays positive and negative numbers with
+the default decimal point setting:
+
+|Format         |3141.59        |-3141.59|
+|:--------------|--------------:|---------:|
+|`F8.2`         |` 3141.59`     |`-3141.59`|
+|`COMMA9.2`     |` 3,141.59`    |`-3,141.59`|
+|`DOT9.2`       |` 3.141,59`    |`-3.141,59`|
+|`DOLLAR10.2`   |` $3,141.59`   |`-$3,141.59`|
+|`PCT9.2`       |` 3141.59%`    |`-3141.59%`|
+|`E8.1`         |` 3.1E+003`    |`-3.1E+003`|
+
+   On output, numbers in `F` format are expressed in standard decimal
+notation with the requested number of decimal places.  The other formats
+output some variation on this style:
+
+   - Numbers in `COMMA` format are additionally grouped every three digits
+     by inserting a grouping character.  The grouping character is
+     ordinarily a comma, but it can be changed to a period (see
+     `SET DECIMAL`).
+
+   - `DOT` format is like `COMMA` format, but it interchanges the role of
+     the decimal point and grouping characters.  That is, the current
+     grouping character is used as a decimal point and vice versa.
+
+   - `DOLLAR` format is like `COMMA` format, but it prefixes the number with
+     `$`.
+
+   - `PCT` format is like `F` format, but adds `%` after the number.
+
+   - The `E` format always produces output in scientific notation.
+
+   On input, the basic numeric formats accept positive and negative numbers in
+standard decimal notation or scientific notation.  Leading and trailing
+spaces are allowed.  An empty or all-spaces field, or one that contains
+only a single period, is treated as the system-missing value.
+
+   In scientific notation, the exponent may be introduced by a sign (`+`
+or `-`), or by one of the letters `e` or `d` (in uppercase or
+lowercase), or by a letter followed by a sign.  A single space may
+follow the letter or the sign or both.
+
+   On fixed-format `DATA LIST` (see `DATA LIST FIXED`) and in a few
+other contexts, decimals are implied when the field does not contain a
+decimal point.  In `F6.5` format, for example, the field `314159` is taken
+as the value 3.14159 with implied decimals.  Decimals are never implied
+if an explicit decimal point is present or if scientific notation is
+used.
+
+   `E` and `F` formats accept the basic syntax already described.  The other
+formats allow some additional variations:
+
+- `COMMA`, `DOLLAR`, and `DOT` formats ignore grouping characters within
+  the integer part of the input field.  The identity of the grouping
+  character depends on the format.
+
+- `DOLLAR` format allows a dollar sign to precede the number.  In a
+  negative number, the dollar sign may precede or follow the minus
+  sign.
+
+- `PCT` format allows a percent sign to follow the number.
+
+   All of the basic number formats have a maximum field width of 40 and
+accept no more than 16 decimal places, on both input and output.  Some
+additional restrictions apply:
+
+- As input formats, the basic numeric formats allow no more decimal
+  places than the field width.  As output formats, the field width
+  must be greater than the number of decimal places; that is, large
+  enough to allow for a decimal point and the number of requested
+  decimal places.  `DOLLAR` and `PCT` formats must allow an additional
+  column for `$` or `%`.
+
+- The default output format for a given input format increases the
+  field width enough to make room for optional input characters.  If
+  an input format calls for decimal places, the width is increased by
+  1 to make room for an implied decimal point.  `COMMA`, `DOT`, and
+  `DOLLAR` formats also increase the output width to make room for
+  grouping characters.  `DOLLAR` and `PCT` further increase the output
+  field width by 1 to make room for `$` or `%`.  The increased output
+  width is capped at 40, the maximum field width.
+
+- The `E` format is exceptional.  For output, `E` format has a minimum
+  width of 7 plus the number of decimal places.  The default output
+  format for an `E` input format is an `E` format with at least 3 decimal
+  places and thus a minimum width of 10.
+
+More details of basic numeric output formatting are given below:
+
+- Output rounds to nearest, with ties rounded away from zero.  Thus,
+  2.5 is output as `3` in `F1.0` format, and -1.125 as `-1.13` in `F5.2`
+  format.
+
+- The system-missing value is output as a period in a field of
+  spaces, placed in the decimal point's position, or in the rightmost
+  column if no decimal places are requested.  A period is used even
+  if the decimal point character is a comma.
+
+- A number that does not fill its field is right-justified within the
+  field.
+
+- A number that is too large for its field causes decimal places to be
+  dropped to make room.  If dropping decimals does not make enough
+  room, scientific notation is used if the field is wide enough.  If
+  a number does not fit in the field, even in scientific notation,
+  the overflow is indicated by filling the field with asterisks
+  (`*`).
+
+- `COMMA`, `DOT`, and `DOLLAR` formats insert grouping characters only if
+  space is available for all of them.  Grouping characters are never
+  inserted when all decimal places must be dropped.  Thus, 1234.56 in
+  `COMMA5.2` format is output as ` 1235` without a comma, even though
+  there is room for one, because all decimal places were dropped.
+
+- `DOLLAR` and `PCT` formats drop the `$` or `%` only if the number would
+  not fit at all without it.  Scientific notation with `$` or `%` is
+  preferred to ordinary decimal notation without it.
+
+- Except in scientific notation, a decimal point is included only
+  when it is followed by a digit.  If the integer part of the number
+  being output is 0, and a decimal point is included, then PSPP
+  ordinarily drops the zero before the decimal point.  However, in
+  `F`, `COMMA`, or `DOT` formats, PSPP keeps the zero if `SET LEADZERO`
+  is set to `ON`.
+
+  In scientific notation, the number always includes a decimal point,
+  even if it is not followed by a digit.
+
+- A negative number includes a minus sign only in the presence of a
+  nonzero digit: -0.01 is output as `-.01` in `F4.2` format but as
+  `  .0` in `F4.1` format.  Thus, a "negative zero" never includes a
+  minus sign.
+
+- In negative numbers output in `DOLLAR` format, the dollar sign
+  follows the negative sign.  Thus, -9.99 in `DOLLAR6.2` format is
+  output as `-$9.99`.
+
+- In scientific notation, the exponent is output as `E` followed by
+  `+` or `-` and exactly three digits.  Numbers with magnitude less
+  than 10**-999 or larger than 10**999 are not supported by most
+  computers, but if they are supported then their output is
+  considered to overflow the field and they are output as asterisks.
+
+- On most computers, no more than 15 decimal digits are significant
+  in output, even if more are printed.  In any case, output precision
+  cannot be any higher than input precision; few data sets are
+  accurate to 15 digits of precision.  Unavoidable loss of precision
+  in intermediate calculations may also reduce precision of output.
+
+- Special values such as infinities and "not a number" values are
+  usually converted to the system-missing value before printing.  In
+  a few circumstances, these values are output directly.  In fields
+  of width 3 or greater, special values are output as however many
+  characters fit from `+Infinity` or `-Infinity` for infinities, from
+  `NaN` for "not a number," or from `Unknown` for other values (if
+  any are supported by the system).  In fields under 3 columns wide,
+  special values are output as asterisks.
diff --git a/rust/doc/src/language/datasets/formats/index.md b/rust/doc/src/language/datasets/formats/index.md
new file mode 100644
index 0000000000..4d68607530
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/index.md
@@ -0,0 +1,31 @@
+# Input and Output Formats
+
+An "input format" describes how to interpret the contents of an input
+field as a number or a string.  It might specify that the field contains
+an ordinary decimal number, a time or date, a number in binary or
+hexadecimal notation, or one of several other notations.  Input formats
+are used by commands such as `DATA LIST` that read data or syntax files
+into the PSPP active dataset.
+
+   Every input format corresponds to a default "output format" that
+specifies the formatting used when the value is output later.  It is
+always possible to explicitly specify an output format that resembles
+the input format.  Usually, this is the default, but in cases where the
+input format is unfriendly to human readability, such as binary or
+hexadecimal formats, the default output format is an easier-to-read
+decimal format.
+
+   Every variable has two output formats, called its "print format" and
+"write format".  Print formats are used in most output contexts; write
+formats are used only by the `WRITE` command.  Newly created
+variables have identical print and write formats, and `FORMATS`, the
+most commonly used command for changing formats, sets
+both of them to the same value as well.  Thus, most of the time, the
+distinction between print and write formats is unimportant.
+
+   Input and output formats are specified to PSPP with a "format
+specification" of the form `TypeW` or `TypeW.D`, where `Type` is one
+of the format types described later, `W` is a field width measured in
+columns, and `D` is an optional number of decimal places.  If `D` is
+omitted, a value of 0 is assumed.  Some formats do not allow a nonzero
+`D` to be specified.
diff --git a/rust/doc/src/language/datasets/index.md b/rust/doc/src/language/datasets/index.md
new file mode 100644
index 0000000000..679a0a9811
--- /dev/null
+++ b/rust/doc/src/language/datasets/index.md
@@ -0,0 +1,13 @@
+# Datasets
+
+PSPP works with data organized into "datasets".  A dataset consists of a
+set of "variables", which taken together are said to form a
+"dictionary", and one or more "cases", each of which has one value for
+each variable.
+
+   At any given time PSPP has exactly one distinguished dataset, called
+the "active dataset".  Most PSPP commands work only with the active
+dataset.  In addition to the active dataset, PSPP also supports any
+number of additional open datasets.  The `DATASET` commands can choose a
+new active dataset from among those that are open, as well as create and
+destroy datasets.
diff --git a/rust/doc/src/language/datasets/variable-lists.md b/rust/doc/src/language/datasets/variable-lists.md
new file mode 100644
index 0000000000..2e46e9316d
--- /dev/null
+++ b/rust/doc/src/language/datasets/variable-lists.md
@@ -0,0 +1,24 @@
+# Variable Lists
+
+To refer to a set of variables, list their names one after another.
+Optionally, their names may be separated by commas.  To include a
+range of variables from the dictionary in the list, write the name of
+the first and last variable in the range, separated by `TO`.  For
+instance, if the dictionary contains six variables with the names
+`ID`, `X1`, `X2`, `GOAL`, `MET`, and `NEXTGOAL`, in that order, then
+`X2 TO MET` would include variables `X2`, `GOAL`, and `MET`.
+
+   Commands that define variables, such as `DATA LIST`, give `TO` an
+alternate meaning.  With these commands, `TO` defines sequences of
+variables whose names end in consecutive integers.  The syntax is two
+identifiers that begin with the same root and end with numbers,
+separated by `TO`.  The syntax `X1 TO X5` defines 5 variables, named
+`X1`, `X2`, `X3`, `X4`, and `X5`.  The syntax `ITEM0008 TO ITEM0013`
+defines 6 variables, named `ITEM0008`, `ITEM0009`, `ITEM0010`,
+`ITEM0011`, `ITEM0012`, and `ITEM0013`.  The syntaxes `QUES001 TO
+QUES9` and `QUES6 TO QUES3` are invalid.
+
+   After a set of variables has been defined with `DATA LIST` or
+another command with this method, the same set can be referenced on
+later commands using the same syntax.
+
diff --git a/rust/doc/src/language/datasets/variables.md b/rust/doc/src/language/datasets/variables.md
new file mode 100644
index 0000000000..74dae3f42c
--- /dev/null
+++ b/rust/doc/src/language/datasets/variables.md
@@ -0,0 +1,130 @@
+# Attributes of Variables
+
+Each variable has a number of attributes, including:
+
+* Name  
+  An identifier, up to 64 bytes long.  Each variable must have a
+  different name.  See [Tokens](../basics/tokens.md).
+
+  Some system variable names begin with `$`, but user-defined
+  variables' names may not begin with `$`.
+
+  The final character in a variable name should not be `.`, because
+  such an identifier will be misinterpreted when it is the final
+  token on a line: `FOO.` is divided into two separate tokens, `FOO`
+  and `.`, indicating end-of-command.  See [Tokens](../basics/tokens.md).
+
+  The final character in a variable name should not be `_`, because
+  some such identifiers are used for special purposes by PSPP
+  procedures.
+
+  As with all PSPP identifiers, variable names are not
+  case-sensitive.  PSPP capitalizes variable names on output the same
+  way they were capitalized at their point of definition in the
+  input.
+
+* Type  
+  Numeric or string.
+
+* Width (string variables only)  
+  String variables with a width of 8
+  characters or fewer are called "short string variables".  Short
+  string variables may be used in a few contexts where "long string
+  variables" (those with widths greater than 8) are not allowed.
+
+* Position  
+  Variables in the dictionary are arranged in a specific order.
+  The `DISPLAY` command can be used to show this order.
+
+* Initialization  
+  Either reinitialized to 0 or spaces for each case, or left at its
+  existing value.  See `LEAVE`.
+
+* Missing values  
+  Optionally, up to three values, or a range of values, or a specific
+  value plus a range, can be specified as "user-missing values".
+  There is also a "system-missing value" that is assigned to an
+  observation when there is no other obvious value for that
+  observation.  Observations with missing values are automatically
+  excluded from analyses.  User-missing values are actual data
+  values, while the system-missing value is not a value at all.
+  See [Handling Missing Values](../basics/missing-values.md).
+
+* Variable label  
+  A string that describes the variable.  See `VARIABLE LABELS`.
+
+* Value labels  
+  Optionally, these associate each possible value of the variable
+  with a string.  See `VALUE LABELS`.
+
+* Print format  
+  Display width, format, and (for numeric variables) number of
+  decimal places.  This attribute does not affect how data are
+  stored, just how they are displayed.  Example: a width of 8, with 2
+  decimal places.  See [Input and Output Formats](formats/index.md).
+
+* Write format  
+  Similar to print format, but used only by the `WRITE`
+  command.
+
+* Measurement level  
+  One of the following:
+
+  - *Nominal*: Each value of a nominal variable represents a distinct
+    category.  The possible categories are finite and often have value
+    labels.  The order of categories is not significant.  Political
+    parties, US states, and yes/no choices are nominal.  Numeric and
+    string variables can be nominal.
+
+  - *Ordinal*: Ordinal variables also represent distinct categories, but
+    their values are arranged according to some natural order.  Likert
+    scales, e.g. from strongly disagree to strongly agree, are
+    ordinal.  Data grouped into ranges, e.g. age groups or income
+    groups, are ordinal.  Both numeric and string variables can be
+    ordinal.  String values are ordered alphabetically, so letter
+    grades from A to F will work as expected, but `poor`,
+    `satisfactory`, `excellent` will not.
+
+  - *Scale*: Scale variables are ones for which differences and ratios
+    are meaningful.  These are often values which have a natural unit
+    attached, such as age in years, income in dollars, or distance in
+    miles.  Only numeric variables can be scale variables.
+
+  Variables created by `COMPUTE` and similar transformations,
+  obtained from external sources, etc., initially have an unknown
+  measurement level.  Any procedure that reads the data will then
+  assign a default measurement level.  PSPP can assign some defaults
+  without reading the data:
+
+  - Nominal, if it's a string variable.
+
+  - Nominal, if the variable has a WKDAY or MONTH print format.
+
+  - Scale, if the variable has a DOLLAR, CCA through CCE, or time
+    or date print format.
+
+  Otherwise, PSPP reads the data and decides based on its
+  distribution:
+
+  - Nominal, if all observations are missing.
+
+  - Scale, if one or more valid observations are noninteger or
+    negative.
+
+  - Scale, if no valid observation is less than 10.
+
+  - Scale, if the variable has 24 or more unique valid values.
+    The value 24 is the default and can be adjusted (see
+    `SET SCALEMIN`).
+
+  Finally, if none of the above is true, PSPP assigns the variable a
+  nominal measurement level.
+
+* Custom attributes  
+  User-defined associations between names and values.  See
+  `VARIABLE ATTRIBUTE`.
+
+* Role  
+  The intended role of a variable for use in dialog boxes in
+  graphical user interfaces.  See `VARIABLE ROLE`.
+
diff --git a/rust/doc/src/language/index.md b/rust/doc/src/language/index.md
new file mode 100644
index 0000000000..b7d305c858
--- /dev/null
+++ b/rust/doc/src/language/index.md
@@ -0,0 +1,2 @@
+This chapter discusses elements common to many PSPP commands.  Later
+chapters describe individual commands in detail.
diff --git a/rust/doc/src/license.md b/rust/doc/src/license.md
index d273a3cbb8..3ed60ec097 100644
--- a/rust/doc/src/license.md
+++ b/rust/doc/src/license.md
@@ -33,4 +33,5 @@ PSPP is licensed under the [GNU General Public
 License](https://www.gnu.org/licenses/licenses.html#GPL), version 3 or
 later.  This manual is licensed under the [GNU Free Documentation
 License](https://www.gnu.org/licenses/licenses.html#FDL), version 1.3
-or later.
+or later; with no Invariant Sections, no Front-Cover Texts, and no
+Back-Cover Texts.