From: Ben Pfaff Date: Tue, 6 May 2025 15:17:00 +0000 (-0700) Subject: work on manual X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=ed6bb61438b487d0fc610b0da013ee87df20dabf;p=pspp work on manual --- diff --git a/rust/doc/src/SUMMARY.md b/rust/doc/src/SUMMARY.md index e6e59f1056..d2b7594fc9 100644 --- a/rust/doc/src/SUMMARY.md +++ b/rust/doc/src/SUMMARY.md @@ -15,6 +15,16 @@ - [Variable Lists](language/datasets/variable-lists.md) - [Input and Output Formats](language/datasets/formats/index.md) - [Basic Numeric Formats](language/datasets/formats/basic.md) + - [Custom Currency Formats](language/datasets/formats/custom-currency.md) + - [Legacy Numeric Formats](language/datasets/formats/legacy-numeric.md) + - [Binary and Hexadecimal Numeric Formats](language/datasets/formats/binary-and-hex.md) + - [Time and Date Formats](language/datasets/formats/time-and-date.md) + - [Date Component Formats](language/datasets/formats/date-component.md) + - [String Formats](language/datasets/formats/string.md) + - [Scratch Variables](language/datasets/scratch-variables.md) +- [Files Used by PSPP](language/files/index.md) + - [File Handles](language/files/file-handles.md) +- [Syntax Diagrams](language/syntax-diagrams.md) # Developer Documentation diff --git a/rust/doc/src/language/datasets/formats/binary-and-hex.md b/rust/doc/src/language/datasets/formats/binary-and-hex.md new file mode 100644 index 0000000000..9cfa9cf272 --- /dev/null +++ b/rust/doc/src/language/datasets/formats/binary-and-hex.md @@ -0,0 +1,89 @@ +# Binary and Hexadecimal Numeric Formats + +The binary and hexadecimal formats are primarily designed for +compatibility with existing machine formats, not for human +readability. All of them therefore have an `F` format as their default +output format. Some of these formats are only portable between +machines with compatible byte ordering (endianness). 
+ + Binary formats use byte values that in text files are interpreted +as special control functions, such as carriage return and line feed. +Thus, data in binary formats should not be included in syntax files or +read from data files with variable-length records, such as ordinary +text files. They may be read from or written to data files with +fixed-length records. *Note FILE HANDLE::, for information on working +with fixed-length records. + +## `P` and `PK` Formats + +These are binary-coded decimal formats, in which every byte (except +the last, in `P` format) represents two decimal digits. The +most-significant 4 bits of the first byte are the most-significant +decimal digit, the least-significant 4 bits of the first byte are the +next decimal digit, and so on. + + In `P` format, the most-significant 4 bits of the last byte are the +least-significant decimal digit. The least-significant 4 bits +represent the sign: decimal 15 indicates a positive value, decimal 13 +indicates a negative value. + + Numbers are rounded downward on output. The system-missing value and +numbers outside the representable range are output as zero. + + The maximum field width is 16. Decimal places may range from 0 up to +the number of decimal digits represented by the field. + + The default output format is an `F` format with twice the input +field width, plus one column for a decimal point (if decimal places +were requested). + +## `IB` and `PIB` Formats + +These are integer binary formats. `IB` reads and writes 2's +complement binary integers, and `PIB` reads and writes unsigned binary +integers. The byte ordering is by default the host machine's, but +`SET RIB` may be used to select a specific byte ordering for reading +(*note SET RIB::) and `SET WIB`, similarly, for writing (*note SET +WIB::). + + The maximum field width is 8. Decimal places may range from 0 up to +the number of decimal digits in the largest value representable in the +field width. 
+ + The default output format is an `F` format whose width is the +number of decimal digits in the largest value representable in the +field width, plus 1 if the format has decimal places. + +## `RB` Format + +This is a binary format for real numbers. By default it reads and +writes the host machine's floating-point format, but `SET RRB` may be +used to select an alternate floating-point format for reading (*note +SET RRB::) and `SET WRB`, similarly, for writing (*note SET WRB::). + +The field width should be 4, for 32-bit floating-point numbers, or 8, +for 64-bit floating-point numbers. Other field widths do not produce +useful results. The maximum field width is 8. No decimal places may +be specified. + + The default output format is `F8.2`. + +## `PIBHEX` and `RBHEX` Formats + +These are hexadecimal formats, for reading and writing binary formats +where each byte has been recoded as a pair of hexadecimal digits. + + A hexadecimal field consists solely of hexadecimal digits `0`...`9` +and `A`...`F`. Uppercase and lowercase are accepted on input; output is +in uppercase. + + Other than the hexadecimal representation, these formats are +equivalent to `PIB` and `RB` formats, respectively. However, bytes in +`PIBHEX` format are always ordered with the most-significant byte +first (big-endian order), regardless of the host machine's native byte +order or PSPP settings. + + Field widths must be even and between 2 and 16. `RBHEX` format +allows no decimal places; `PIBHEX` allows as many decimal places as a +`PIB` format with half the given width. 
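
As a rough illustration of the `PIBHEX` rules just described, the following Python sketch decodes a single field. The helper name `decode_pibhex` is hypothetical, and treating decimal places as an implied power-of-ten scale is an assumption made for this example; it is not PSPP's actual implementation.

```python
import string


def decode_pibhex(field: str, decimals: int = 0) -> float:
    """Decode a PIBHEX field (hypothetical helper, not part of PSPP)."""
    # Field widths must be even and between 2 and 16.
    if len(field) % 2 != 0 or not 2 <= len(field) <= 16:
        raise ValueError("width must be even and between 2 and 16")
    # Only hexadecimal digits are allowed; either case is accepted on input.
    if any(c not in string.hexdigits for c in field):
        raise ValueError("field must contain only hexadecimal digits")
    # PIBHEX is always big-endian, so the digits read as one unsigned integer.
    value = int(field, 16)
    # Assumption: decimal places scale the integer by a power of ten.
    return value / 10 ** decimals
```

Because the byte order is fixed, a call such as `decode_pibhex("0100")` gives the same result on any host, regardless of its native endianness.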
+ diff --git a/rust/doc/src/language/datasets/formats/custom-currency.md b/rust/doc/src/language/datasets/formats/custom-currency.md new file mode 100644 index 0000000000..1b6ef16a76 --- /dev/null +++ b/rust/doc/src/language/datasets/formats/custom-currency.md @@ -0,0 +1,60 @@ +# Custom Currency Formats + +The custom currency formats are closely related to the basic numeric +formats, but they allow users to customize the output format. The SET +command configures custom currency formats, using the syntax + +``` +SET CCX="STRING". +``` + +where X is `A`, `B`, `C`, `D`, or `E`, and `STRING` is no more than 16 +characters long. + + `STRING` must contain exactly three commas or exactly three periods +(but not both), except that a single quote character may be used to +"escape" a following comma, period, or single quote. If three commas +are used, commas are used for grouping in output, and a period is used +as the decimal point. Using three periods instead reverses these roles. + + The commas or periods divide `STRING` into four fields, called the +"negative prefix", "prefix", "suffix", and "negative suffix", +respectively. The prefix and suffix are added to output whenever +space is available. The negative prefix and negative suffix are +always added to a negative number when the output includes a nonzero +digit. + + The following syntax shows how custom currency formats could be used +to reproduce basic numeric formats: + +``` +SET CCA="-,,,". /* Same as COMMA. +SET CCB="-...". /* Same as DOT. +SET CCC="-,$,,". /* Same as DOLLAR. +SET CCD="-,,%,". /* Like PCT, but groups with commas. +``` + + Here are some more examples of custom currency formats. The final +example shows how to use a single quote to escape a delimiter: + +``` +SET CCA=",EUR,,-". /* Euro. +SET CCB="(,USD ,,)". /* US dollar. +SET CCC="-.R$..". /* Brazilian real. +SET CCD="-,, NIS,". /* Israel shekel. +SET CCE="-.Rp'. ..". /* Indonesia Rupiah. 
+``` + +These formats would yield the following output: + +|Format |` 3145.59` |`-3145.59`| +|:---------|------------------:|---------------:| +|`CCA12.2` | ` EUR3,145.59` | `EUR3,145.59-`| +|`CCB14.2` | ` USD 3,145.59` | `(USD 3,145.59)`| +|`CCC11.2` | ` R$3.145,59` | `-R$3.145,59`| +|`CCD13.2` | ` 3,145.59 NIS` | `-3,145.59 NIS`| +|`CCE10.0` | ` Rp. 3.146` | `-Rp. 3.146`| + + The default for all the custom currency formats is `-,,,`, equivalent +to `COMMA` format. + diff --git a/rust/doc/src/language/datasets/formats/date-component.md b/rust/doc/src/language/datasets/formats/date-component.md new file mode 100644 index 0000000000..376e8baac6 --- /dev/null +++ b/rust/doc/src/language/datasets/formats/date-component.md @@ -0,0 +1,21 @@ +# Date Component Formats + +The `WKDAY` and `MONTH` formats provide input and output for the names of +weekdays and months, respectively. + + On output, these formats convert a number between 1 and 7, for +`WKDAY`, or between 1 and 12, for `MONTH`, into the English name of a +day or month, respectively. If the name is longer than the field, it +is trimmed to fit. If the name is shorter than the field, it is +padded on the right with spaces. Values outside the valid range, and +the system-missing value, are output as all spaces. + + On input, English weekday or month names (in uppercase or lowercase) +are converted back to their corresponding numbers. Weekday and month +names may be abbreviated to their first 2 or 3 letters, respectively. + + The field width may range from 2 to 40, for `WKDAY`, or from 3 to +40, for `MONTH`. No decimal places are allowed. + + The default output format is the same as the input format. 
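
The `WKDAY` output rules above (trim long names, pad short ones on the right, blank out invalid values) can be sketched in a few lines of Python. This is only an illustration: the function `format_wkday` is hypothetical, and the choices that 1 means Sunday, that names are written in uppercase, and that the system-missing value is modeled as `None` are assumptions not stated in the text.

```python
# English weekday names. Assumptions: 1 means Sunday, and names are
# written in uppercase.
WEEKDAYS = ["SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY",
            "THURSDAY", "FRIDAY", "SATURDAY"]


def format_wkday(value, width):
    """Render a WKDAY value in a field of the given width (illustrative)."""
    # The field width may range from 2 to 40 for WKDAY.
    if not 2 <= width <= 40:
        raise ValueError("WKDAY width must be between 2 and 40")
    # The system-missing value (modeled here as None) and out-of-range
    # values are output as all spaces.
    if value is None or not 1 <= value <= 7:
        return " " * width
    name = WEEKDAYS[int(value) - 1]
    # Trim names that are too long; pad short names on the right.
    return name[:width].ljust(width)
```

A `MONTH` version would follow the same pattern with twelve names and a minimum width of 3.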
+ diff --git a/rust/doc/src/language/datasets/formats/legacy-numeric.md b/rust/doc/src/language/datasets/formats/legacy-numeric.md new file mode 100644 index 0000000000..28c2b58c14 --- /dev/null +++ b/rust/doc/src/language/datasets/formats/legacy-numeric.md @@ -0,0 +1,74 @@ +# Legacy Numeric Formats + +The `N` and `Z` numeric formats provide compatibility with legacy file +formats. They have much in common: + + - Output is rounded to the nearest representable value, with ties + rounded away from zero. + + - Numbers too large to display are output as a field filled with + asterisks (`*`). + + - The decimal point is always implicitly the specified number of + digits from the right edge of the field, except that `Z` format input + allows an explicit decimal point. + + - Scientific notation may not be used. + + - The system-missing value is output as a period in a field of + spaces. The period is placed just to the right of the implied + decimal point in `Z` format, or at the right end in `N` format or + in `Z` format if no decimal places are requested. A period is + used even if the decimal point character is a comma. + + - Field width may range from 1 to 40. Decimal places may range from + 0 up to the field width, to a maximum of 16. + + - When a legacy numeric format used for input is converted to an + output format, it is changed into the equivalent `F` format. The + field width is increased by 1 if any decimal places are + specified, to make room for a decimal point. For `Z` format, the + field width is increased by 1 more column, to make room for a + negative sign. The output field width is capped at 40 columns. + +## `N` Format + +The `N` format supports input and output of fields that contain only +digits. On input, leading or trailing spaces, a decimal point, or any +other non-digit character causes the field to be read as the +system-missing value. 
As a special exception, an `N` format used on +`DATA LIST FREE` or `DATA LIST LIST` is treated as the equivalent `F` +format. + + On output, `N` pads the field on the left with zeros. Negative +numbers are output like the system-missing value. + +## `Z` Format + +The `Z` format is a "zoned decimal" format used on IBM mainframes. `Z` +format encodes the sign as part of the final digit, which must be one of +the following: + +``` +0123456789 +{ABCDEFGHI +}JKLMNOPQR +``` + +where the characters on each line represent digits 0 through 9 in +order. Characters on the first two lines indicate a positive sign; +those on the third indicate a negative sign. + + On output, `Z` fields are padded on the left with spaces. On +input, leading and trailing spaces are ignored. Any character in an +input field other than spaces, the digit characters above, and `.` +causes the field to be read as system-missing. + + The decimal point character for input and output is always `.`, even +if the decimal point character is a comma (*note SET DECIMAL::). + + Nonzero, negative values output in `Z` format are marked as +negative even when no nonzero digits are output. For example, -0.2 is +output in `Z1.0` format as `J`. The "negative zero" value supported +by most machines is output as positive. + diff --git a/rust/doc/src/language/datasets/formats/string.md b/rust/doc/src/language/datasets/formats/string.md new file mode 100644 index 0000000000..24d4a4cee3 --- /dev/null +++ b/rust/doc/src/language/datasets/formats/string.md @@ -0,0 +1,14 @@ +# String Formats + +The `A` and `AHEX` formats are the only ones that may be assigned to +string variables. Neither format allows any decimal places. + + In `A` format, the entire field is treated as a string value. The +field width may range from 1 to 32,767, the maximum string width. The +default output format is the same as the input format. + + In `AHEX` format, the field is composed of characters in a string +encoded as hex digit pairs. 
On output, hex digits are output in +uppercase; on input, uppercase and lowercase are both accepted. The +default output format is `A` format with half the input width. + diff --git a/rust/doc/src/language/datasets/formats/time-and-date.md b/rust/doc/src/language/datasets/formats/time-and-date.md new file mode 100644 index 0000000000..ed7ddbb49d --- /dev/null +++ b/rust/doc/src/language/datasets/formats/time-and-date.md @@ -0,0 +1,147 @@ +# Time and Date Formats + +In PSPP, a "time" is an interval. The time formats translate between +human-friendly descriptions of time intervals and PSPP's internal +representation of time intervals, which is simply the number of seconds +in the interval. PSPP has three time formats: + +|Time Format |Template |Example| +|:-------------|:---------------------------|:---------------------------| +|MTIME |`MM:SS.ss` |`91:17.01`| +|TIME |`hh:MM:SS.ss` |`01:31:17.01`| +|DTIME |`DD HH:MM:SS.ss` |`00 04:31:17.01`| + + A "date" is a moment in the past or the future. Internally, PSPP +represents a date as the number of seconds since the "epoch", midnight, +Oct. 14, 1582. The date formats translate between human-readable dates +and PSPP's numeric representation of dates and times. PSPP has several +date formats: + +|Date Format |Template |Example| +|:-------------|:---------------------------|:---------------------------| +|DATE |`dd-mmm-yyyy` |`01-OCT-1978`| +|ADATE |`mm/dd/yyyy` |`10/01/1978`| +|EDATE |`dd.mm.yyyy` |`01.10.1978`| +|JDATE |`yyyyjjj` |`1978274`| +|SDATE |`yyyy/mm/dd` |`1978/10/01`| +|QYR |`q Q yyyy` |`3 Q 1978`| +|MOYR |`mmm yyyy` |`OCT 1978`| +|WKYR |`ww WK yyyy` |`40 WK 1978`| +|DATETIME |`dd-mmm-yyyy HH:MM:SS.ss` |`01-OCT-1978 04:31:17.01`| +|YMDHMS |`yyyy-mm-dd HH:MM:SS.ss` |`1978-10-01 04:31:17.01`| + + The templates in the preceding tables describe how the time and date +formats are input and output: + +* `dd` + Day of month, from 1 to 31. Always output as two digits. + +* `mm` + `mmm` + Month. 
In output, `mm` is output as two digits, `mmm` as the first + three letters of an English month name (January, February, ...). + In input, both of these formats, plus Roman numerals, are accepted. + +* `yyyy` + Year. In output, `DATETIME` and `YMDHMS` always produce 4-digit years; + other formats can produce a 2- or 4-digit year. The century + assumed for 2-digit years depends on the `EPOCH` setting (*note SET + EPOCH::). In output, a year outside the epoch causes the whole + field to be filled with asterisks (`*`). + +* `jjj` + Day of year (Julian day), from 1 to 366. This is exactly three + digits giving the count of days from the start of the year. + January 1 is considered day 1. + +* `q` + Quarter of year, from 1 to 4. Quarters start on January 1, April + 1, July 1, and October 1. + +* `ww` + Week of year, from 1 to 53. Output as exactly two digits. January + 1 is the first day of week 1. + +* `DD` + Count of days, which may be positive or negative. Output as at + least two digits. + +* `hh` + Count of hours, which may be positive or negative. Output as at + least two digits. + +* `HH` + Hour of day, from 0 to 23. Output as exactly two digits. + +* `MM` + In MTIME, count of minutes, which may be positive or negative. + Output as at least two digits. + + In other formats, minute of hour, from 0 to 59. Output as exactly + two digits. + +* `SS.ss` + Seconds within minute, from 0 to 59. The integer part is output as + exactly two digits. On output, seconds and fractional seconds may + or may not be included, depending on field width and decimal + places. On input, seconds and fractional seconds are optional. + The `DECIMAL` setting controls the character accepted and displayed + as the decimal point (*note SET DECIMAL::). + + For output, the date and time formats use the delimiters indicated in +the table. 
For input, date components may be separated by spaces or by +one of the characters `-`, `/`, `.`, or `,`, and time components may be +separated by spaces or `:`. On input, the `Q` separating quarter from +year and the `WK` separating week from year may be uppercase or +lowercase, and the spaces around them are optional. + + On input, all time and date formats accept any amount of leading and +trailing white space. + + The maximum width for time and date formats is 40 columns. Minimum +input and output widths for each of the time and date formats are shown +below: + +|Format |Min. Input Width |Min. Output Width |Option| +|:------------|-------------------:|--------------------:|:------------| +|`DATE` |8 |9 |4-digit year| +|`ADATE` |8 |8 |4-digit year| +|`EDATE` |8 |8 |4-digit year| +|`JDATE` |5 |5 |4-digit year| +|`SDATE` |8 |8 |4-digit year| +|`QYR` |4 |6 |4-digit year| +|`MOYR` |6 |6 |4-digit year| +|`WKYR` |6 |8 |4-digit year| +|`DATETIME` |17 |17 |seconds| +|`YMDHMS` |12 |16 |seconds| +|`MTIME` |4 |5 | | +|`TIME` |5 |5 |seconds| +|`DTIME` |8 |8 |seconds| + +In the table, "Option" describes what increased output width enables: + +* "4-digit year": A field 2 columns wider than the minimum includes a + 4-digit year. (`DATETIME` and `YMDHMS` formats always include a + 4-digit year.) + +* "seconds": A field 3 columns wider than the minimum includes seconds + as well as minutes. A field 5 columns wider than the minimum, or more, + can also include a decimal point and fractional seconds (but no more + than allowed by the format's decimal places). + + For the time and date formats, the default output format is the same +as the input format, except that PSPP increases the field width, if +necessary, to the minimum allowed for output. + + Times or dates narrower than the field width are right-justified +within the field. + + When a time or date exceeds the field width, characters are trimmed +from the end until it fits. This can occur in an unusual situation, +e.g. 
with a year greater than 9999 (which adds an extra digit), or for +a negative value on `MTIME`, `TIME`, or `DTIME` (which adds a leading +minus sign). + + The system-missing value is output as a period at the right end of +the field. + diff --git a/rust/doc/src/language/datasets/scratch-variables.md b/rust/doc/src/language/datasets/scratch-variables.md new file mode 100644 index 0000000000..cd1397a9e2 --- /dev/null +++ b/rust/doc/src/language/datasets/scratch-variables.md @@ -0,0 +1,22 @@ +# Scratch Variables + +Most of the time, variables don't retain their values between cases. +Instead, either they're being read from a data file or the active +dataset, in which case they assume the value read, or, if created with +`COMPUTE` or another transformation, they're initialized to the +system-missing value or to blanks, depending on type. + + However, sometimes it's useful to have a variable that keeps its +value between cases. You can do this with `LEAVE` (*note LEAVE::), or +you can use a "scratch variable". Scratch variables are variables whose +names begin with an octothorpe (`#`). + + Scratch variables have the same properties as variables left with +`LEAVE`: they retain their values between cases, and for the first case +they are initialized to 0 or blanks. They have the additional property +that they are deleted before the execution of any procedure. For this +reason, scratch variables can't be used for analysis. To use a scratch +variable in an analysis, use `COMPUTE` (*note COMPUTE::) to copy its +value into an ordinary variable, then use that ordinary variable in the +analysis. + diff --git a/rust/doc/src/language/files/file-handles.md b/rust/doc/src/language/files/file-handles.md new file mode 100644 index 0000000000..d987801afc --- /dev/null +++ b/rust/doc/src/language/files/file-handles.md @@ -0,0 +1,38 @@ +# File Handles + +A "file handle" is a reference to a data file, system file, or portable +file. 
Most often, a file handle is specified as the name of a file as a +string, that is, enclosed within `'` or `"`. + + A file name string that begins or ends with `|` is treated as the +name of a command to pipe data to or from. You can use this feature to +read data over the network using a program such as `curl` (e.g. `GET +'|curl -s -S http://example.com/mydata.sav'`), to read compressed data +from a file using a program such as `zcat` (e.g. `GET '|zcat +mydata.sav.gz'`), and for many other purposes. + + PSPP also supports declaring named file handles with the `FILE +HANDLE` command. This command associates an identifier of your choice +(the file handle's name) with a file. Later, the file handle name can +be substituted for the name of the file. When PSPP syntax accesses a +file multiple times, declaring a named file handle simplifies updating +the syntax later to use a different file. Use of `FILE HANDLE` is also +required to read data files in binary formats. *Note FILE HANDLE::, for +more information. + + In some circumstances, PSPP must distinguish whether a file handle +refers to a system file or a portable file. When this is necessary to +read a file, e.g. as an input file for `GET` or `MATCH FILES`, PSPP uses +the file's contents to decide. In the context of writing a file, e.g. +as an output file for `SAVE` or `AGGREGATE`, PSPP decides based on the +file's name: if it ends in `.por` (with any capitalization), then PSPP +writes a portable file; otherwise, PSPP writes a system file. + + `INLINE` is reserved as a file handle name. It refers to the "data +file" embedded into the syntax file between `BEGIN DATA` and `END +DATA`. *Note BEGIN DATA::, for more information. + + The file to which a file handle refers may be reassigned on a later +`FILE HANDLE` command if it is first closed using `CLOSE FILE HANDLE`. +*Note CLOSE FILE HANDLE::, for more information. 
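
Two of the file-name rules described in this section, the `|` pipe convention and the `.por` test used when writing, are simple enough to express directly. The Python sketch below is only illustrative; the helper names `is_pipe` and `output_file_kind` are hypothetical and do not correspond to PSPP functions.

```python
def is_pipe(file_name: str) -> bool:
    """True if the name is treated as a command to pipe data to or from."""
    # A file name that begins or ends with `|` names a command, not a file.
    return file_name.startswith("|") or file_name.endswith("|")


def output_file_kind(file_name: str) -> str:
    """Kind of file written for this name: 'portable' or 'system'."""
    # A name ending in `.por`, with any capitalization, selects a portable
    # file; every other name selects a system file.
    return "portable" if file_name.lower().endswith(".por") else "system"
```

Note that the second rule applies only when writing; when reading, the text above says that PSPP decides from the file's contents instead.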
+ diff --git a/rust/doc/src/language/files/index.md b/rust/doc/src/language/files/index.md new file mode 100644 index 0000000000..fe0f4c3f46 --- /dev/null +++ b/rust/doc/src/language/files/index.md @@ -0,0 +1,32 @@ +# Files Used by PSPP + +PSPP makes use of many files each time it runs. Some of these it +reads, some it writes, some it creates. Here is a list of the +most important of these files: + +* command file + syntax file + These names (synonyms) refer to the file that contains instructions + that tell PSPP what to do. The syntax file's name is specified on + the PSPP command line. Syntax files can also be read with + `INCLUDE` (*note INCLUDE::). + +* data file + Data files contain raw data in text or binary format. Data can + also be embedded in a syntax file with `BEGIN DATA` and `END DATA`. + +* listing file + One or more output files are created by PSPP each time it is run. + The output files receive the tables and charts produced by + statistical procedures. The output files may be in any number of + formats, depending on how PSPP is configured. + +* system file + System files are binary files that store a dictionary and a set of + cases. `GET` and `SAVE` read and write system files. + +* portable file + Portable files are files in a text-based format that store a + dictionary and a set of cases. `IMPORT` and `EXPORT` read and + write portable files. + diff --git a/rust/doc/src/language/syntax-diagrams.md b/rust/doc/src/language/syntax-diagrams.md new file mode 100644 index 0000000000..937599fe72 --- /dev/null +++ b/rust/doc/src/language/syntax-diagrams.md @@ -0,0 +1,47 @@ +# Syntax Diagrams + +The syntax of PSPP commands is presented in this manual with syntax +diagrams. + +A syntax diagram is a series of definitions of "nonterminals". Each +nonterminal is defined by its name, then `::=`, then what the nonterminal +consists of. If a nonterminal has multiple definitions, then any of +them is acceptable. 
If the definition is empty, then one possible +expansion of that nonterminal is nothing. Otherwise, the definition +consists of a series of nonterminals and "terminals". The latter +represent single tokens and consist of: + +- `KEYWORD` + Any word written in uppercase is that literal syntax keyword. + +- `number` + A real number. + +- `integer` + An integer number. + +- `string` + A string. + +- `var-name` + A single variable name. + +- `=`, `/`, `+`, `-`, etc. + Operators and punctuators. + +- `.` + The end of the command. This is not necessarily an actual dot in + the syntax file (*note Commands::). + +Some nonterminals are very common, so they are defined here in English +for clarity: + +- `var-list` + A list of one or more variable names or the keyword `ALL`. + +- `expression` + An expression. *Note Expressions::, for details. + +The first nonterminal defined in a syntax diagram for a command is +the entire syntax for that command. +