work on manual

author Ben Pfaff <blp@cs.stanford.edu>

Tue, 6 May 2025 15:17:00 +0000 (08:17 -0700)

committer Ben Pfaff <blp@cs.stanford.edu>

Tue, 6 May 2025 15:17:00 +0000 (08:17 -0700)
diff --git a/rust/doc/src/SUMMARY.md b/rust/doc/src/SUMMARY.md
index e6e59f1056c228e4b2d615a57bea2b16e5947771..d2b7594fc9c3d41cb87691a00a6fb5ab939b94da 100644
--- a/rust/doc/src/SUMMARY.md
+++ b/rust/doc/src/SUMMARY.md
@@ -15,6 +15,16 @@
    - [Variable Lists](language/datasets/variable-lists.md)
    - [Input and Output Formats](language/datasets/formats/index.md)
      - [Basic Numeric Formats](language/datasets/formats/basic.md)
+    - [Custom Currency Formats](language/datasets/formats/custom-currency.md)
+    - [Legacy Numeric Formats](language/datasets/formats/legacy-numeric.md)
+    - [Binary and Hexadecimal Numeric Formats](language/datasets/formats/binary-and-hex.md)
+    - [Time and Date Formats](language/datasets/formats/time-and-date.md)
+    - [Date Component Formats](language/datasets/formats/date-component.md)
+    - [String Formats](language/datasets/formats/string.md)
+  - [Scratch Variables](language/datasets/scratch-variables.md)
+- [Files Used by PSPP](language/files/index.md)
+  - [File Handles](language/files/file-handles.md)
+- [Syntax Diagrams](language/syntax-diagrams.md)
  
  # Developer Documentation
  
diff --git a/rust/doc/src/language/datasets/formats/binary-and-hex.md b/rust/doc/src/language/datasets/formats/binary-and-hex.md
new file mode 100644
index 0000000..9cfa9cf
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/binary-and-hex.md
@@ -0,0 +1,89 @@
+# Binary and Hexadecimal Numeric Formats
+
+The binary and hexadecimal formats are primarily designed for
+compatibility with existing machine formats, not for human
+readability.  All of them therefore have an `F` format as default
+output format.  Some of these formats are only portable between
+machines with compatible byte ordering (endianness).
+
+   Binary formats use byte values that in text files are interpreted
+as special control functions, such as carriage return and line feed.
+Thus, data in binary formats should not be included in syntax files or
+read from data files with variable-length records, such as ordinary
+text files.  They may be read from or written to data files with
+fixed-length records.  See `FILE HANDLE` for information on working
+with fixed-length records.
+
+## `P` and `PK` Formats
+
+These are binary-coded decimal formats, in which every byte (except
+the last, in `P` format) represents two decimal digits.  The
+most-significant 4 bits of the first byte are the most-significant
+decimal digit, the least-significant 4 bits of the first byte are the
+next decimal digit, and so on.
+
+   In `P` format, the most-significant 4 bits of the last byte are the
+least-significant decimal digit.  The least-significant 4 bits
+represent the sign: decimal 15 indicates a negative value, decimal 13
+indicates a positive value.
+
+   Numbers are rounded downward on output.  The system-missing value and
+numbers outside representable range are output as zero.
+
+   The maximum field width is 16.  Decimal places may range from 0 up to
+the number of decimal digits represented by the field.
+
+   The default output format is an `F` format with twice the input
+field width, plus one column for a decimal point (if decimal places
+were requested).
+
+## `IB` and `PIB` Formats
+
+These are integer binary formats.  `IB` reads and writes 2's
+complement binary integers, and `PIB` reads and writes unsigned binary
+integers.  The byte ordering is by default the host machine's, but
+`SET RIB` may be used to select a specific byte ordering for reading
+and `SET WIB`, similarly, for writing (see the `SET` command for
+both settings).
+
+   The maximum field width is 8.  Decimal places may range from 0 up to
+the number of decimal digits in the largest value representable in the
+field width.
+
+   The default output format is an `F` format whose width is the
+number of decimal digits in the largest value representable in the
+field width, plus 1 if the format has decimal places.
+
+## `RB` Format
+
+This is a binary format for real numbers.  By default it reads and
+writes the host machine's floating-point format, but `SET RRB` may be
+used to select an alternate floating-point format for reading and
+`SET WRB`, similarly, for writing (see the `SET` command).
+
+The field width should be 4, for 32-bit floating-point numbers, or 8,
+for 64-bit floating-point numbers.  Other field widths do not produce
+useful results.  The maximum field width is 8.  No decimal places may
+be specified.
+
+   The default output format is `F8.2`.
+
+## `PIBHEX` and `RBHEX` Formats
+
+These are hexadecimal formats, for reading and writing binary formats
+where each byte has been recoded as a pair of hexadecimal digits.
+
+   A hexadecimal field consists solely of hexadecimal digits `0`...`9`
+and `A`...`F`.  Uppercase and lowercase are accepted on input; output is
+in uppercase.
+
+   Other than the hexadecimal representation, these formats are
+equivalent to `PIB` and `RB` formats, respectively.  However, bytes in
+`PIBHEX` format are always ordered with the most-significant byte
+first (big-endian order), regardless of the host machine's native byte
+order or PSPP settings.
+
+   Field widths must be even and between 2 and 16.  `RBHEX` format
+allows no decimal places; `PIBHEX` allows as many decimal places as a
+`PIB` format with half the given width.
+
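As an illustration of the `PK` and `PIBHEX` descriptions in `binary-and-hex.md` above, here is a minimal Python sketch of how such fields could be decoded. The function names `decode_pk` and `decode_pibhex` are hypothetical, and two behaviors are assumptions rather than statements from the manual: invalid BCD nibbles raise an error, and decimal places are treated as an implied decimal point.

```python
def decode_pk(field: bytes, decimals: int = 0) -> float:
    """Decode a PK field: every byte holds two binary-coded decimal digits."""
    value = 0
    for byte in field:
        high, low = byte >> 4, byte & 0x0F  # most-significant digit first
        if high > 9 or low > 9:
            # Assumption: reject nibbles that are not decimal digits.
            raise ValueError(f"invalid BCD byte {byte:#04x}")
        value = value * 100 + high * 10 + low
    # Assumption: decimal places imply a decimal point that many digits
    # from the right end of the number.
    return value / 10**decimals


def decode_pibhex(field: str) -> int:
    """Decode a PIBHEX field: unsigned, most-significant byte first."""
    if len(field) % 2 != 0 or not 2 <= len(field) <= 16:
        raise ValueError("PIBHEX width must be even and between 2 and 16")
    # bytes.fromhex accepts both uppercase and lowercase hex digits.
    return int.from_bytes(bytes.fromhex(field), byteorder="big")
```

For example, the two bytes `0x12 0x34` decode as 1234 with no decimal places, and the hexadecimal field `01FF` decodes as 511 regardless of the host byte order.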
diff --git a/rust/doc/src/language/datasets/formats/custom-currency.md b/rust/doc/src/language/datasets/formats/custom-currency.md
new file mode 100644
index 0000000..1b6ef16
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/custom-currency.md
@@ -0,0 +1,60 @@
+# Custom Currency Formats
+
+The custom currency formats are closely related to the basic numeric
+formats, but they allow users to customize the output format.  The SET
+command configures custom currency formats, using the syntax
+
+```
+SET CCX="STRING".
+```
+
+where X is `A`, `B`, `C`, `D`, or `E`, and `STRING` is no more than 16
+characters long.
+
+   `STRING` must contain exactly three commas or exactly three periods
+(but not both), except that a single quote character may be used to
+"escape" a following comma, period, or single quote.  If three commas
+are used, commas are used for grouping in output, and a period is used
+as the decimal point.  Using periods reverses these roles.
+
+   The commas or periods divide `STRING` into four fields, called the
+"negative prefix", "prefix", "suffix", and "negative suffix",
+respectively.  The prefix and suffix are added to output whenever
+space is available.  The negative prefix and negative suffix are
+always added to a negative number when the output includes a nonzero
+digit.
+
+   The following syntax shows how custom currency formats could be used
+to reproduce basic numeric formats:
+
+```
+SET CCA="-,,,".  /* Same as COMMA.
+SET CCB="-...".  /* Same as DOT.
+SET CCC="-,$,,". /* Same as DOLLAR.
+SET CCD="-,,%,". /* Like PCT, but groups with commas.
+```
+
+   Here are some more examples of custom currency formats.  The final
+example shows how to use a single quote to escape a delimiter:
+
+```
+SET CCA=",EUR,,-".   /* Euro.
+SET CCB="(,USD ,,)". /* US dollar.
+SET CCC="-.R$..".    /* Brazilian real.
+SET CCD="-,, NIS,".  /* Israeli shekel.
+SET CCE="-.Rp'. ..". /* Indonesian rupiah.
+```
+
+These formats would yield the following output:
+
+|Format    |` 3145.59`         |`-3145.59`|
+|:---------|------------------:|---------------:|
+|`CCA12.2` |  ` EUR3,145.59`   |  `EUR3,145.59-`|
+|`CCB14.2` |  `  USD 3,145.59` |  `(USD 3,145.59)`|
+|`CCC11.2` |  ` R$3.145,59`    |  `-R$3.145,59`|
+|`CCD13.2` |  ` 3,145.59 NIS`  |  `-3,145.59 NIS`|
+|`CCE10.0` |  ` Rp. 3.146`     |  `-Rp. 3.146`|
+
+   The default for all the custom currency formats is `-,,,`, equivalent
+to `COMMA` format.
+
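The splitting rule described in `custom-currency.md` above (three unescaped commas or three unescaped periods, with a single quote escaping a following comma, period, or single quote) can be sketched in Python as follows. The names `parse_cc` and `_tokens` are hypothetical, and treating a stray single quote as a literal character is an assumption, not documented behavior.

```python
def _tokens(spec: str):
    """Yield (character, escaped) pairs, resolving single-quote escapes."""
    i = 0
    while i < len(spec):
        if spec[i] == "'" and i + 1 < len(spec) and spec[i + 1] in ",.'":
            yield spec[i + 1], True
            i += 2
        else:
            # Assumption: a quote that escapes nothing is kept literally.
            yield spec[i], False
            i += 1


def parse_cc(spec: str):
    """Return (neg_prefix, prefix, suffix, neg_suffix, grouping, decimal)."""
    if len(spec) > 16:
        raise ValueError("custom currency string is longer than 16 characters")
    tokens = list(_tokens(spec))
    commas = sum(1 for ch, escaped in tokens if ch == "," and not escaped)
    periods = sum(1 for ch, escaped in tokens if ch == "." and not escaped)
    if (commas == 3) == (periods == 3):
        raise ValueError("need exactly three commas or three periods, not both")
    delimiter = "," if commas == 3 else "."
    fields = [""]
    for ch, escaped in tokens:
        if ch == delimiter and not escaped:
            fields.append("")  # an unescaped delimiter starts the next field
        else:
            fields[-1] += ch
    neg_prefix, prefix, suffix, neg_suffix = fields
    # The delimiter doubles as the grouping character; the other character
    # becomes the decimal point.
    decimal = "." if delimiter == "," else ","
    return neg_prefix, prefix, suffix, neg_suffix, delimiter, decimal
```

With the `CCE` example from the text, `parse_cc("-.Rp'. ..")` yields the negative prefix `-` and the prefix `Rp. `, with a period for grouping and a comma as the decimal point.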
diff --git a/rust/doc/src/language/datasets/formats/date-component.md b/rust/doc/src/language/datasets/formats/date-component.md
new file mode 100644
index 0000000..376e8ba
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/date-component.md
@@ -0,0 +1,21 @@
+# Date Component Formats
+
+The `WKDAY` and `MONTH` formats provide input and output for the names of
+weekdays and months, respectively.
+
+   On output, these formats convert a number between 1 and 7, for
+`WKDAY`, or between 1 and 12, for `MONTH`, into the English name of a
+day or month, respectively.  If the name is longer than the field, it
+is trimmed to fit.  If the name is shorter than the field, it is
+padded on the right with spaces.  Values outside the valid range, and
+the system-missing value, are output as all spaces.
+
+   On input, English weekday or month names (in uppercase or lowercase)
+are converted back to their corresponding numbers.  Weekday and month
+names may be abbreviated to their first 2 or 3 letters, respectively.
+
+   The field width may range from 2 to 40, for `WKDAY`, or from 3 to
+40, for `MONTH`. No decimal places are allowed.
+
+   The default output format is the same as the input format.
+
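The trimming, padding, and abbreviation rules in `date-component.md` above can be sketched for `MONTH` in Python. The names `format_month` and `parse_month` are hypothetical. Two details are assumptions: names are written in uppercase, and on input any prefix of three or more letters is accepted.

```python
MONTHS = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
]


def format_month(value, width: int) -> str:
    """Format a month number (or None for system-missing) in `width` columns."""
    if value is None or not 1 <= value <= 12:
        return " " * width  # out of range or missing: all spaces
    # Trim names that are too long; pad short names on the right.
    return MONTHS[value - 1][:width].ljust(width)


def parse_month(field: str):
    """Return the month number for a name or abbreviation, else None."""
    name = field.strip().upper()
    if len(name) >= 3:  # months need at least their first 3 letters
        for number, month in enumerate(MONTHS, start=1):
            if month.startswith(name):
                return number
    return None
```

Here `format_month(9, 5)` trims `SEPTEMBER` to `SEPTE`, and `parse_month("oct")` returns 10.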
diff --git a/rust/doc/src/language/datasets/formats/legacy-numeric.md b/rust/doc/src/language/datasets/formats/legacy-numeric.md
new file mode 100644
index 0000000..28c2b58
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/legacy-numeric.md
@@ -0,0 +1,74 @@
+# Legacy Numeric Formats
+
+The `N` and `Z` numeric formats provide compatibility with legacy file
+formats.  They have much in common:
+
+   - Output is rounded to the nearest representable value, with ties
+     rounded away from zero.
+
+   - Numbers too large to display are output as a field filled with
+     asterisks (`*`).
+
+   - The decimal point is always implicitly the specified number of
+     digits from the right edge of the field, except that `Z` format input
+     allows an explicit decimal point.
+
+   - Scientific notation may not be used.
+
+   - The system-missing value is output as a period in a field of
+     spaces.  The period is placed just to the right of the implied
+     decimal point in `Z` format, or at the right end in `N` format or
+     in `Z` format if no decimal places are requested.  A period is
+     used even if the decimal point character is a comma.
+
+   - Field width may range from 1 to 40.  Decimal places may range from
+     0 up to the field width, to a maximum of 16.
+
+   - When a legacy numeric format used for input is converted to an
+     output format, it is changed into the equivalent `F` format.  The
+     field width is increased by 1 if any decimal places are
+     specified, to make room for a decimal point.  For `Z` format, the
+     field width is increased by 1 more column, to make room for a
+     negative sign.  The output field width is capped at 40 columns.
+
+## `N` Format
+
+The `N` format supports input and output of fields that contain only
+digits.  On input, leading or trailing spaces, a decimal point, or any
+other non-digit character causes the field to be read as the
+system-missing value.  As a special exception, an `N` format used on
+`DATA LIST FREE` or `DATA LIST LIST` is treated as the equivalent `F`
+format.
+
+   On output, `N` pads the field on the left with zeros.  Negative
+numbers are output like the system-missing value.
+
+## `Z` Format
+
+The `Z` format is a "zoned decimal" format used on IBM mainframes.  `Z`
+format encodes the sign as part of the final digit, which must be one of
+the following:
+
+```
+0123456789
+{ABCDEFGHI
+}JKLMNOPQR
+```
+
+where the characters on each line represent digits 0 through 9 in
+order.  Characters on the first two lines indicate a positive sign;
+those on the third indicate a negative sign.
+
+   On output, `Z` fields are padded on the left with spaces.  On
+input, leading and trailing spaces are ignored.  Any character in an
+input field other than spaces, the digit characters above, and `.`
+causes the field to be read as system-missing.
+
+   The decimal point character for input and output is always `.`, even
+if the `DECIMAL` setting selects a comma (see `SET DECIMAL`).
+
+   Nonzero, negative values output in `Z` format are marked as
+negative even when no nonzero digits are output.  For example, -0.2 is
+output in `Z1.0` format as `J`.  The "negative zero" value supported
+by most machines is output as positive.
+
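The zoned-decimal input rules in `legacy-numeric.md` above can be sketched in Python as follows. The name `decode_z` is hypothetical, `None` stands in for the system-missing value, and reading an all-blank field as system-missing is an assumption.

```python
DIGITS = "0123456789"
POSITIVE = "{ABCDEFGHI"  # zoned final digits 0..9 with a positive sign
NEGATIVE = "}JKLMNOPQR"  # zoned final digits 0..9 with a negative sign


def decode_z(field: str, decimals: int = 0):
    """Decode a Z input field; return None for system-missing."""
    text = field.strip(" ")  # leading and trailing spaces are ignored
    if not text:
        return None  # Assumption: a blank field is system-missing.
    body, last = text[:-1], text[-1]
    if last in DIGITS:
        sign, digit = 1, last
    elif last in POSITIVE:
        sign, digit = 1, str(POSITIVE.index(last))
    elif last in NEGATIVE:
        sign, digit = -1, str(NEGATIVE.index(last))
    else:
        return None
    # Everything before the final digit must be digits, with at most one
    # explicit decimal point.
    if any(ch not in DIGITS + "." for ch in body) or body.count(".") > 1:
        return None
    if "." in body:
        return sign * float(body + digit)  # explicit decimal point wins
    return sign * int(body + digit) / 10**decimals  # implied decimal point
```

For instance, `decode_z("12L", 2)` reads the final `L` as a negative 3 and returns -1.23, while `decode_z("12X")` returns `None`.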
diff --git a/rust/doc/src/language/datasets/formats/string.md b/rust/doc/src/language/datasets/formats/string.md
new file mode 100644
index 0000000..24d4a4c
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/string.md
@@ -0,0 +1,14 @@
+# String Formats
+
+The `A` and `AHEX` formats are the only ones that may be assigned to
+string variables.  Neither format allows any decimal places.
+
+   In `A` format, the entire field is treated as a string value.  The
+field width may range from 1 to 32,767, the maximum string width.  The
+default output format is the same as the input format.
+
+   In `AHEX` format, the field is composed of characters in a string
+encoded as hex digit pairs.  On output, hex digits are output in
+uppercase; on input, uppercase and lowercase are both accepted.  The
+default output format is `A` format with half the input width.
+
diff --git a/rust/doc/src/language/datasets/formats/time-and-date.md b/rust/doc/src/language/datasets/formats/time-and-date.md
new file mode 100644
index 0000000..ed7ddbb
--- /dev/null
+++ b/rust/doc/src/language/datasets/formats/time-and-date.md
@@ -0,0 +1,147 @@
+# Time and Date Formats
+
+In PSPP, a "time" is an interval.  The time formats translate between
+human-friendly descriptions of time intervals and PSPP's internal
+representation of time intervals, which is simply the number of seconds
+in the interval.  PSPP has three time formats:
+
+|Time Format   |Template                    |Example|
+|:-------------|:---------------------------|:---------------------------|
+|MTIME         |`MM:SS.ss`                  |`91:17.01`|
+|TIME          |`hh:MM:SS.ss`               |`01:31:17.01`|
+|DTIME         |`DD HH:MM:SS.ss`            |`00 04:31:17.01`|
+
+   A "date" is a moment in the past or the future.  Internally, PSPP
+represents a date as the number of seconds since the "epoch", midnight,
+October 14, 1582.  The date formats translate between human-readable dates
+and PSPP's numeric representation of dates and times.  PSPP has several
+date formats:
+
+|Date Format   |Template                    |Example|
+|:-------------|:---------------------------|:---------------------------|
+|DATE          |`dd-mmm-yyyy`               |`01-OCT-1978`|
+|ADATE         |`mm/dd/yyyy`                |`10/01/1978`|
+|EDATE         |`dd.mm.yyyy`                |`01.10.1978`|
+|JDATE         |`yyyyjjj`                   |`1978274`|
+|SDATE         |`yyyy/mm/dd`                |`1978/10/01`|
+|QYR           |`q Q yyyy`                  |`3 Q 1978`|
+|MOYR          |`mmm yyyy`                  |`OCT 1978`|
+|WKYR          |`ww WK yyyy`                |`40 WK 1978`|
+|DATETIME      |`dd-mmm-yyyy HH:MM:SS.ss`   |`01-OCT-1978 04:31:17.01`|
+|YMDHMS        |`yyyy-mm-dd HH:MM:SS.ss`    |`1978-10-01 04:31:17.01`|
+
+   The templates in the preceding tables describe how the time and date
+formats are input and output:
+
+* `dd`  
+  Day of month, from 1 to 31.  Always output as two digits.
+
+* `mm`  
+  `mmm`  
+  Month.  In output, `mm` is output as two digits, `mmm` as the first
+  three letters of an English month name (January, February, ...).
+  In input, both of these formats, plus Roman numerals, are accepted.
+
+* `yyyy`  
+  Year.  In output, `DATETIME` and `YMDHMS` always produce 4-digit years;
+  other formats can produce a 2- or 4-digit year.  The century
+  assumed for 2-digit years depends on the `EPOCH` setting (see `SET
+  EPOCH`).  In output, a year outside the epoch causes the whole
+  field to be filled with asterisks (`*`).
+
+* `jjj`  
+  Day of year (Julian day), from 1 to 366.  This is exactly three
+  digits giving the count of days from the start of the year.
+  January 1 is considered day 1.
+
+* `q`  
+  Quarter of year, from 1 to 4.  Quarters start on January 1, April
+  1, July 1, and October 1.
+
+* `ww`  
+  Week of year, from 1 to 53.  Output as exactly two digits.  January
+  1 is the first day of week 1.
+
+* `DD`  
+  Count of days, which may be positive or negative.  Output as at
+  least two digits.
+
+* `hh`  
+  Count of hours, which may be positive or negative.  Output as at
+  least two digits.
+
+* `HH`  
+  Hour of day, from 0 to 23.  Output as exactly two digits.
+
+* `MM`  
+  In MTIME, count of minutes, which may be positive or negative.
+  Output as at least two digits.
+
+  In other formats, minute of hour, from 0 to 59.  Output as exactly
+  two digits.
+
+* `SS.ss`  
+  Seconds within minute, from 0 to 59.  The integer part is output as
+  exactly two digits.  On output, seconds and fractional seconds may
+  or may not be included, depending on field width and decimal
+  places.  On input, seconds and fractional seconds are optional.
+  The `DECIMAL` setting controls the character accepted and displayed
+  as the decimal point (see `SET DECIMAL`).
+
+   For output, the date and time formats use the delimiters indicated in
+the table.  For input, date components may be separated by spaces or by
+one of the characters `-`, `/`, `.`, or `,`, and time components may be
+separated by spaces or `:`.  On input, the `Q` separating quarter from
+year and the `WK` separating week from year may be uppercase or
+lowercase, and the spaces around them are optional.
+
+   On input, all time and date formats accept any amount of leading and
+trailing white space.
+
+   The maximum width for time and date formats is 40 columns.  Minimum
+input and output width for each of the time and date formats is shown
+below:
+
+|Format       |Min. Input Width    |Min. Output Width    |Option|
+|:------------|-------------------:|--------------------:|:------------|
+|`DATE`       |8                   |9                    |4-digit year|
+|`ADATE`      |8                   |8                    |4-digit year|
+|`EDATE`      |8                   |8                    |4-digit year|
+|`JDATE`      |5                   |5                    |4-digit year|
+|`SDATE`      |8                   |8                    |4-digit year|
+|`QYR`        |4                   |6                    |4-digit year|
+|`MOYR`       |6                   |6                    |4-digit year|
+|`WKYR`       |6                   |8                    |4-digit year|
+|`DATETIME`   |17                  |17                   |seconds|
+|`YMDHMS`     |12                  |16                   |seconds|
+|`MTIME`      |4                   |5                    |             |
+|`TIME`       |5                   |5                    |seconds|
+|`DTIME`      |8                   |8                    |seconds|
+
+In the table, "Option" describes what increased output width enables:
+
+* "4-digit year": A field 2 columns wider than the minimum includes a
+  4-digit year.  (`DATETIME` and `YMDHMS` formats always include a
+  4-digit year.)
+
+* "seconds": A field 3 columns wider than the minimum includes seconds
+  as well as minutes.  A field 5 columns wider than minimum, or more,
+  can also include a decimal point and fractional seconds (but no more
+  than allowed by the format's decimal places).
+
+   For the time and date formats, the default output format is the same
+as the input format, except that PSPP increases the field width, if
+necessary, to the minimum allowed for output.
+
+   Times or dates narrower than the field width are right-justified
+within the field.
+
+   When a time or date exceeds the field width, characters are trimmed
+from the end until it fits.  This can occur in an unusual situation,
+e.g. with a year greater than 9999 (which adds an extra digit), or for
+a negative value on `MTIME`, `TIME`, or `DTIME` (which adds a leading
+minus sign).
+
+   The system-missing value is output as a period at the right end of
+the field.
+
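The internal representations described in `time-and-date.md` above (a time is a number of seconds, a date is a number of seconds since midnight, October 14, 1582) can be sketched in Python. All names here are hypothetical. The formatters always print two fractional digits and ignore field width and decimal places, which is a simplification; the sketch also relies on Python's proleptic Gregorian calendar.

```python
from datetime import datetime, timedelta

# The epoch stated in the text: midnight, October 14, 1582.
PSPP_EPOCH = datetime(1582, 10, 14)


def datetime_to_pspp(moment: datetime) -> float:
    """Convert a datetime to seconds since the epoch."""
    return (moment - PSPP_EPOCH).total_seconds()


def pspp_to_datetime(seconds: float) -> datetime:
    """Convert seconds since the epoch back to a datetime."""
    return PSPP_EPOCH + timedelta(seconds=seconds)


def _sign_and_centiseconds(seconds: float):
    # Work in whole hundredths of a second to avoid float formatting drift.
    return ("-" if seconds < 0 else ""), round(abs(seconds) * 100)


def format_mtime(seconds: float) -> str:
    """Format an interval using the MM:SS.ss template."""
    sign, centis = _sign_and_centiseconds(seconds)
    minutes, rest = divmod(centis, 60 * 100)
    return f"{sign}{minutes:02d}:{rest // 100:02d}.{rest % 100:02d}"


def format_time(seconds: float) -> str:
    """Format an interval using the hh:MM:SS.ss template."""
    sign, centis = _sign_and_centiseconds(seconds)
    hours, rest = divmod(centis, 3600 * 100)
    minutes, rest = divmod(rest, 60 * 100)
    return f"{sign}{hours:02d}:{minutes:02d}:{rest // 100:02d}.{rest % 100:02d}"
```

An interval of 5477.01 seconds formats as `91:17.01` with `format_mtime` and as `01:31:17.01` with `format_time`, matching the examples in the first table.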
diff --git a/rust/doc/src/language/datasets/scratch-variables.md b/rust/doc/src/language/datasets/scratch-variables.md
new file mode 100644
index 0000000..cd1397a
--- /dev/null
+++ b/rust/doc/src/language/datasets/scratch-variables.md
@@ -0,0 +1,22 @@
+# Scratch Variables
+
+Most of the time, variables don't retain their values between cases.
+Instead, either they're being read from a data file or the active
+dataset, in which case they assume the value read, or, if created with
+`COMPUTE` or another transformation, they're initialized to the
+system-missing value or to blanks, depending on type.
+
+   However, sometimes it's useful to have a variable that keeps its
+value between cases.  You can do this with the `LEAVE` command, or
+you can use a "scratch variable".  Scratch variables are variables whose
+names begin with an octothorpe (`#`).
+
+   Scratch variables have the same properties as variables left with
+`LEAVE`: they retain their values between cases, and for the first case
+they are initialized to 0 or blanks.  They have the additional property
+that they are deleted before the execution of any procedure.  For this
+reason, scratch variables can't be used for analysis.  To use a scratch
+variable in an analysis, use the `COMPUTE` command to copy its
+value into an ordinary variable, then use that ordinary variable in the
+analysis.
+
diff --git a/rust/doc/src/language/files/file-handles.md b/rust/doc/src/language/files/file-handles.md
new file mode 100644
index 0000000..d987801
--- /dev/null
+++ b/rust/doc/src/language/files/file-handles.md
@@ -0,0 +1,38 @@
+# File Handles
+
+A "file handle" is a reference to a data file, system file, or portable
+file.  Most often, a file handle is specified as the name of a file as a
+string, that is, enclosed within `'` or `"`.
+
+   A file name string that begins or ends with `|` is treated as the
+name of a command to pipe data to or from.  You can use this feature to
+read data over the network using a program such as `curl` (e.g. `GET
+'|curl -s -S http://example.com/mydata.sav'`), to read compressed data
+from a file using a program such as `zcat` (e.g. `GET '|zcat
+mydata.sav.gz'`), and for many other purposes.
+
+   PSPP also supports declaring named file handles with the `FILE
+HANDLE` command.  This command associates an identifier of your choice
+(the file handle's name) with a file.  Later, the file handle name can
+be substituted for the name of the file.  When PSPP syntax accesses a
+file multiple times, declaring a named file handle simplifies updating
+the syntax later to use a different file.  Use of `FILE HANDLE` is also
+required to read data files in binary formats.  See `FILE HANDLE` for
+more information.
+
+   In some circumstances, PSPP must distinguish whether a file handle
+refers to a system file or a portable file.  When this is necessary to
+read a file, e.g. as an input file for `GET` or `MATCH FILES`, PSPP uses
+the file's contents to decide.  In the context of writing a file, e.g.
+as an output file for `SAVE` or `AGGREGATE`, PSPP decides based on the
+file's name: if it ends in `.por` (with any capitalization), then PSPP
+writes a portable file; otherwise, PSPP writes a system file.
+
+   `INLINE` is reserved as a file handle name.  It refers to the "data
+file" embedded into the syntax file between `BEGIN DATA` and `END
+DATA`.  See `BEGIN DATA` for more information.
+
+   The file to which a file handle refers may be reassigned on a later
+`FILE HANDLE` command if it is first closed using `CLOSE FILE HANDLE`.
+See `CLOSE FILE HANDLE` for more information.
+
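Two of the file-name rules in `file-handles.md` above are simple enough to sketch in Python: a name that begins or ends with `|` denotes a command to pipe data to or from, and when writing, a name ending in `.por` (in any capitalization) selects a portable file. The function names are hypothetical and the sketch only classifies names; it does not open anything.

```python
def is_pipe(file_name: str) -> bool:
    """True if the name denotes a command to pipe data to or from."""
    return file_name.startswith("|") or file_name.endswith("|")


def output_file_kind(file_name: str) -> str:
    """Return which kind of file would be written for this name."""
    # Only the name matters when writing; capitalization is ignored.
    return "portable" if file_name.lower().endswith(".por") else "system"
```

So `output_file_kind("survey.POR")` returns `"portable"`, while `output_file_kind("survey.sav")` returns `"system"`.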
diff --git a/rust/doc/src/language/files/index.md b/rust/doc/src/language/files/index.md
new file mode 100644
index 0000000..fe0f4c3
--- /dev/null
+++ b/rust/doc/src/language/files/index.md
@@ -0,0 +1,32 @@
+# Files Used by PSPP
+
+PSPP makes use of many files each time it runs.  Some of these it
+reads, some it writes, some it creates.  Here is a list of the
+most important of these files:
+
+* command file  
+  syntax file  
+  These names (synonyms) refer to the file that contains instructions
+  that tell PSPP what to do.  The syntax file's name is specified on
+  the PSPP command line.  Syntax files can also be read with
+  the `INCLUDE` command.
+
+* data file  
+  Data files contain raw data in text or binary format.  Data can
+  also be embedded in a syntax file with `BEGIN DATA` and `END DATA`.
+
+* listing file  
+  One or more output files are created by PSPP each time it is run.
+  The output files receive the tables and charts produced by
+  statistical procedures.  The output files may be in any number of
+  formats, depending on how PSPP is configured.
+
+* system file  
+  System files are binary files that store a dictionary and a set of
+  cases.  `GET` and `SAVE` read and write system files.
+
+* portable file  
+  Portable files are files in a text-based format that store a
+  dictionary and a set of cases.  `IMPORT` and `EXPORT` read and
+  write portable files.
+
diff --git a/rust/doc/src/language/syntax-diagrams.md b/rust/doc/src/language/syntax-diagrams.md
new file mode 100644
index 0000000..937599f
--- /dev/null
+++ b/rust/doc/src/language/syntax-diagrams.md
@@ -0,0 +1,47 @@
+# Syntax Diagrams
+
+The syntax of PSPP commands is presented in this manual with syntax
+diagrams.
+
+A syntax diagram is a series of definitions of "nonterminals".  Each
+nonterminal is defined by its name, then `::=`, then what the nonterminal
+consists of.  If a nonterminal has multiple definitions, then any of
+them is acceptable.  If the definition is empty, then one possible
+expansion of that nonterminal is nothing.  Otherwise, the definition
+consists of a series of nonterminals and "terminals".  The latter
+represent single tokens and consist of:
+
+- `KEYWORD`  
+  Any word written in uppercase is that literal syntax keyword.
+
+- `number`  
+  A real number.
+
+- `integer`  
+  An integer number.
+
+- `string`  
+  A string.
+
+- `var-name`  
+  A single variable name.
+
+- `=`, `/`, `+`, `-`, etc.  
+  Operators and punctuators.
+
+- `.`  
+  The end of the command.  This is not necessarily an actual dot in
+  the syntax file (see the Commands section).
+
+Some nonterminals are very common, so they are defined here in English
+for clarity:
+
+- `var-list`  
+  A list of one or more variable names or the keyword `ALL`.
+
+- `expression`  
+  An expression.  See the Expressions section for details.
+
+The first nonterminal defined in a syntax diagram for a command is
+the entire syntax for that command.
+