- [Variable Lists](language/datasets/variable-lists.md)
- [Input and Output Formats](language/datasets/formats/index.md)
- [Basic Numeric Formats](language/datasets/formats/basic.md)
+ - [Custom Currency Formats](language/datasets/formats/custom-currency.md)
+ - [Legacy Numeric Formats](language/datasets/formats/legacy-numeric.md)
+ - [Binary and Hexadecimal Numeric Formats](language/datasets/formats/binary-and-hex.md)
+ - [Time and Date Formats](language/datasets/formats/time-and-date.md)
+ - [Date Component Formats](language/datasets/formats/date-component.md)
+ - [String Formats](language/datasets/formats/string.md)
+ - [Scratch Variables](language/datasets/scratch-variables.md)
+- [Files Used by PSPP](language/files/index.md)
+ - [File Handles](language/files/file-handles.md)
+- [Syntax Diagrams](language/syntax-diagrams.md)
# Developer Documentation
--- /dev/null
+# Binary and Hexadecimal Numeric Formats
+
+The binary and hexadecimal formats are primarily designed for
+compatibility with existing machine formats, not for human
+readability. All of them therefore have an `F` format as default
+output format. Some of these formats are only portable between
+machines with compatible byte ordering (endianness).
+
+ Binary formats use byte values that in text files are interpreted
+as special control functions, such as carriage return and line feed.
+Thus, data in binary formats should not be included in syntax files or
+read from data files with variable-length records, such as ordinary
+text files. They may be read from or written to data files with
+fixed-length records. *Note FILE HANDLE::, for information on working
+with fixed-length records.
+
+## `P` and `PK` Formats
+
+These are binary-coded decimal formats, in which every byte (except
+the last, in `P` format) represents two decimal digits. The
+most-significant 4 bits of the first byte hold the most-significant
+decimal digit, the least-significant 4 bits of the first byte hold the
+next decimal digit, and so on.
+
+ In `P` format, the most-significant 4 bits of the last byte are the
+least-significant decimal digit. The least-significant 4 bits
+represent the sign: decimal 15 indicates a positive value, decimal 13
+indicates a negative value.
+
+ Numbers are rounded downward on output. The system-missing value and
+numbers outside representable range are output as zero.
+
+ The maximum field width is 16. Decimal places may range from 0 up to
+the number of decimal digits represented by the field.
+
+ The default output format is an `F` format with twice the input
+field width, plus one column for a decimal point (if decimal places
+were requested).
+
+## `IB` and `PIB` Formats
+
+These are integer binary formats. `IB` reads and writes 2's
+complement binary integers, and `PIB` reads and writes unsigned binary
+integers. The byte ordering is by default the host machine's, but
+`SET RIB` may be used to select a specific byte ordering for reading
+(*note SET RIB::) and `SET WIB`, similarly, for writing (*note SET
+WIB::).
+
+ The maximum field width is 8. Decimal places may range from 0 up to
+the number of decimal digits in the largest value representable in the
+field width.
+
+ The default output format is an `F` format whose width is the
+number of decimal digits in the largest value representable in the
+field width, plus 1 if the format has decimal places.
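+
+As a sketch of how such fields might be read, the following syntax
+declares a file handle with fixed-length records and reads two 4-byte
+integers from each record. The handle name `bindata`, the file name
+`counts.bin`, the 8-byte record length, and the big-endian layout are
+all assumptions made for illustration:
+
+```
+SET RIB=MSBFIRST.    /* Read IB and PIB fields as big-endian.
+FILE HANDLE bindata /NAME='counts.bin' /MODE=IMAGE /LRECL=8.
+DATA LIST FIXED FILE=bindata /delta (IB4) total (PIB4).
+LIST.
+```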
+
+## `RB` Format
+
+This is a binary format for real numbers. By default it reads and
+writes the host machine's floating-point format, but `SET RRB` may be
+used to select an alternate floating-point format for reading (*note
+SET RRB::) and `SET WRB`, similarly, for writing (*note SET WRB::).
+
+The field width should be 4, for 32-bit floating-point numbers, or 8,
+for 64-bit floating-point numbers. Other field widths do not produce
+useful results. The maximum field width is 8. No decimal places may
+be specified.
+
+ The default output format is `F8.2`.
+
+## `PIBHEX` and `RBHEX` Formats
+
+These are hexadecimal formats, for reading and writing binary formats
+where each byte has been recoded as a pair of hexadecimal digits.
+
+ A hexadecimal field consists solely of hexadecimal digits `0`...`9`
+and `A`...`F`. Uppercase and lowercase are accepted on input; output is
+in uppercase.
+
+ Other than the hexadecimal representation, these formats are
+equivalent to `PIB` and `RB` formats, respectively. However, bytes in
+`PIBHEX` format are always ordered with the most-significant byte
+first (big-endian order), regardless of the host machine's native byte
+order or PSPP settings.
+
+ Field widths must be even and between 2 and 16. `RBHEX` format
+allows no decimal places; `PIBHEX` allows as many decimal places as a
+`PIB` format with half the given width.
+
--- /dev/null
+# Custom Currency Formats
+
+The custom currency formats are closely related to the basic numeric
+formats, but they allow users to customize the output format. The SET
+command configures custom currency formats, using the syntax
+
+```
+SET CCX="STRING".
+```
+
+where X is `A`, `B`, `C`, `D`, or `E`, and `STRING` is no more than 16
+characters long.
+
+ `STRING` must contain exactly three commas or exactly three periods
+(but not both), except that a single quote character may be used to
+"escape" a following comma, period, or single quote. If three commas
+are used, commas are used for grouping in output, and a period is used
+as the decimal point. Use of periods reverses these roles.
+
+ The commas or periods divide `STRING` into four fields, called the
+"negative prefix", "prefix", "suffix", and "negative suffix",
+respectively. The prefix and suffix are added to output whenever
+space is available. The negative prefix and negative suffix are
+always added to a negative number when the output includes a nonzero
+digit.
+
+ The following syntax shows how custom currency formats could be used
+to reproduce basic numeric formats:
+
+```
+SET CCA="-,,,". /* Same as COMMA.
+SET CCB="-...". /* Same as DOT.
+SET CCC="-,$,,". /* Same as DOLLAR.
+SET CCD="-,,%,". /* Like PCT, but groups with commas.
+```
+
+ Here are some more examples of custom currency formats. The final
+example shows how to use a single quote to escape a delimiter:
+
+```
+SET CCA=",EUR,,-". /* Euro.
+SET CCB="(,USD ,,)". /* US dollar.
+SET CCC="-.R$..". /* Brazilian real.
+SET CCD="-,, NIS,". /* Israel shekel.
+SET CCE="-.Rp'. ..". /* Indonesia Rupiah.
+```
+
+These formats would yield the following output:
+
+|Format |` 3145.59` |`-3145.59`|
+|:---------|------------------:|---------------:|
+|`CCA12.2` | ` EUR3,145.59` | `EUR3,145.59-`|
+|`CCB14.2` | ` USD 3,145.59` | `(USD 3,145.59)`|
+|`CCC11.2` | ` R$3.145,59` | `-R$3.145,59`|
+|`CCD13.2` | ` 3,145.59 NIS` | `-3,145.59 NIS`|
+|`CCE10.0` | ` Rp. 3.146` | `-Rp. 3.146`|
+
+ The default for all the custom currency formats is `-,,,`, equivalent
+to `COMMA` format.
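+
+A minimal sketch of applying one of these formats to a variable
+follows. The variable name `amount` and the inline data are made up for
+the example:
+
+```
+SET CCA=",EUR,,-".
+DATA LIST LIST /amount (F8.2).
+BEGIN DATA
+3145.59
+-3145.59
+END DATA.
+FORMATS amount (CCA12.2).
+LIST.
+```
+
+`LIST` should then display the two values as in the `CCA12.2` row of
+the table above.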
+
--- /dev/null
+# Date Component Formats
+
+The `WKDAY` and `MONTH` formats provide input and output for the names of
+weekdays and months, respectively.
+
+ On output, these formats convert a number between 1 and 7, for
+`WKDAY`, or between 1 and 12, for `MONTH`, into the English name of a
+day or month, respectively. If the name is longer than the field, it
+is trimmed to fit. If the name is shorter than the field, it is
+padded on the right with spaces. Values outside the valid range, and
+the system-missing value, are output as all spaces.
+
+ On input, English weekday or month names (in uppercase or lowercase)
+are converted back to their corresponding numbers. Weekday and month
+names may be abbreviated to their first 2 or 3 letters, respectively.
+
+ The field width may range from 2 to 40, for `WKDAY`, or from 3 to
+40, for `MONTH`. No decimal places are allowed.
+
+ The default output format is the same as the input format.
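+
+For instance, the sketch below displays numeric codes as names. The
+variable names and data are illustrative only:
+
+```
+DATA LIST LIST /day month.
+BEGIN DATA
+1 10
+7 12
+END DATA.
+FORMATS day (WKDAY9) month (MONTH9).
+LIST.
+```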
+
--- /dev/null
+# Legacy Numeric Formats
+
+The `N` and `Z` numeric formats provide compatibility with legacy file
+formats. They have much in common:
+
+ - Output is rounded to the nearest representable value, with ties
+ rounded away from zero.
+
+ - Numbers too large to display are output as a field filled with
+ asterisks (`*`).
+
+ - The decimal point is always implicitly the specified number of
+ digits from the right edge of the field, except that `Z` format input
+ allows an explicit decimal point.
+
+ - Scientific notation may not be used.
+
+ - The system-missing value is output as a period in a field of
+ spaces. The period is placed just to the right of the implied
+ decimal point in `Z` format, or at the right end in `N` format or
+ in `Z` format if no decimal places are requested. A period is
+ used even if the decimal point character is a comma.
+
+ - Field width may range from 1 to 40. Decimal places may range from
+ 0 up to the field width, to a maximum of 16.
+
+ - When a legacy numeric format used for input is converted to an
+ output format, it is changed into the equivalent `F` format. The
+ field width is increased by 1 if any decimal places are
+ specified, to make room for a decimal point. For `Z` format, the
+ field width is increased by 1 more column, to make room for a
+ negative sign. The output field width is capped at 40 columns.
+
+## `N` Format
+
+The `N` format supports input and output of fields that contain only
+digits. On input, leading or trailing spaces, a decimal point, or any
+other non-digit character causes the field to be read as the
+system-missing value. As a special exception, an `N` format used on
+`DATA LIST FREE` or `DATA LIST LIST` is treated as the equivalent `F`
+format.
+
+ On output, `N` pads the field on the left with zeros. Negative
+numbers are output like the system-missing value.
+
+## `Z` Format
+
+The `Z` format is a "zoned decimal" format used on IBM mainframes. `Z`
+format encodes the sign as part of the final digit, which must be one of
+the following:
+
+```
+0123456789
+{ABCDEFGHI
+}JKLMNOPQR
+```
+
+where the characters on each line represent digits 0 through 9 in
+order. Characters on the first two lines indicate a positive sign;
+those on the third indicate a negative sign.
+
+ On output, `Z` fields are padded on the left with spaces. On
+input, leading and trailing spaces are ignored. Any character in an
+input field other than spaces, the digit characters above, and `.`
+causes the field to be read as system-missing.
+
+ The decimal point character for input and output is always `.`, even
+if the decimal point character is a comma (*note SET DECIMAL::).
+
+ Nonzero, negative values output in `Z` format are marked as
+negative even when no nonzero digits are output. For example, -0.2 is
+output in `Z1.0` format as `J`. The "negative zero" value supported
+by most machines is output as positive.
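+
+As an illustration, this sketch reads a 5-digit `N` field followed by a
+`Z5.2` field from inline data (the variable names and data values are
+invented). By the table above, the trailing `E` in the first record
+marks a positive value whose last digit is 5, and the trailing `N` in
+the second record marks a negative value whose last digit is 5:
+
+```
+DATA LIST FIXED /code (N5) amount (Z5.2).
+BEGIN DATA
+000421234E
+000071234N
+END DATA.
+LIST.
+```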
+
--- /dev/null
+# String Formats
+
+The `A` and `AHEX` formats are the only ones that may be assigned to
+string variables. Neither format allows any decimal places.
+
+ In `A` format, the entire field is treated as a string value. The
+field width may range from 1 to 32,767, the maximum string width. The
+default output format is the same as the input format.
+
+ In `AHEX` format, the field is composed of characters in a string
+encoded as hex digit pairs. On output, hex digits are output in
+uppercase; on input, uppercase and lowercase are both accepted. The
+default output format is `A` format with half the input width.
+
--- /dev/null
+# Time and Date Formats
+
+In PSPP, a "time" is an interval. The time formats translate between
+human-friendly descriptions of time intervals and PSPP's internal
+representation of time intervals, which is simply the number of seconds
+in the interval. PSPP has three time formats:
+
+|Time Format |Template |Example|
+|:-------------|:---------------------------|:---------------------------|
+|MTIME |`MM:SS.ss` |`91:17.01`|
+|TIME |`hh:MM:SS.ss` |`01:31:17.01`|
+|DTIME |`DD HH:MM:SS.ss` |`00 04:31:17.01`|
+
+ A "date" is a moment in the past or the future. Internally, PSPP
+represents a date as the number of seconds since the "epoch", midnight,
+Oct. 14, 1582. The date formats translate between human-readable dates
+and PSPP's numeric representation of dates and times. PSPP has several
+date formats:
+
+|Date Format |Template |Example|
+|:-------------|:---------------------------|:---------------------------|
+|DATE |`dd-mmm-yyyy` |`01-OCT-1978`|
+|ADATE |`mm/dd/yyyy` |`10/01/1978`|
+|EDATE |`dd.mm.yyyy` |`01.10.1978`|
+|JDATE |`yyyyjjj` |`1978274`|
+|SDATE |`yyyy/mm/dd` |`1978/10/01`|
+|QYR |`q Q yyyy` |`4 Q 1978`|
+|MOYR |`mmm yyyy` |`OCT 1978`|
+|WKYR |`ww WK yyyy` |`40 WK 1978`|
+|DATETIME |`dd-mmm-yyyy HH:MM:SS.ss` |`01-OCT-1978 04:31:17.01`|
+|YMDHMS |`yyyy-mm-dd HH:MM:SS.ss` |`1978-10-01 04:31:17.01`|
+
+ The templates in the preceding tables describe how the time and date
+formats are input and output:
+
+* `dd`
+ Day of month, from 1 to 31. Always output as two digits.
+
+* `mm`
+ `mmm`
+ Month. In output, `mm` is output as two digits, `mmm` as the first
+ three letters of an English month name (January, February, ...).
+ In input, both of these formats, plus Roman numerals, are accepted.
+
+* `yyyy`
+ Year. In output, `DATETIME` and `YMDHMS` always produce 4-digit years;
+ other formats can produce a 2- or 4-digit year. The century
+ assumed for 2-digit years depends on the `EPOCH` setting (*note SET
+ EPOCH::). In output, a year outside the epoch causes the whole
+ field to be filled with asterisks (`*`).
+
+* `jjj`
+ Day of year (Julian day), from 1 to 366. This is exactly three
+ digits giving the count of days from the start of the year.
+ January 1 is considered day 1.
+
+* `q`
+ Quarter of year, from 1 to 4. Quarters start on January 1, April
+ 1, July 1, and October 1.
+
+* `ww`
+ Week of year, from 1 to 53. Output as exactly two digits. January
+ 1 is the first day of week 1.
+
+* `DD`
+ Count of days, which may be positive or negative. Output as at
+ least two digits.
+
+* `hh`
+ Count of hours, which may be positive or negative. Output as at
+ least two digits.
+
+* `HH`
+ Hour of day, from 0 to 23. Output as exactly two digits.
+
+* `MM`
+ In MTIME, count of minutes, which may be positive or negative.
+ Output as at least two digits.
+
+ In other formats, minute of hour, from 0 to 59. Output as exactly
+ two digits.
+
+* `SS.ss`
+ Seconds within minute, from 0 to 59. The integer part is output as
+ exactly two digits. On output, seconds and fractional seconds may
+ or may not be included, depending on field width and decimal
+ places. On input, seconds and fractional seconds are optional.
+ The `DECIMAL` setting controls the character accepted and displayed
+ as the decimal point (*note SET DECIMAL::).
+
+ For output, the date and time formats use the delimiters indicated in
+the table. For input, date components may be separated by spaces or by
+one of the characters `-`, `/`, `.`, or `,`, and time components may be
+separated by spaces or `:`. On input, the `Q` separating quarter from
+year and the `WK` separating week from year may be uppercase or
+lowercase, and the spaces around them are optional.
+
+ On input, all time and date formats accept any amount of leading and
+trailing white space.
+
+ The maximum width for time and date formats is 40 columns. Minimum
+input and output width for each of the time and date formats is shown
+below:
+
+|Format |Min. Input Width |Min. Output Width |Option|
+|:------------|-------------------:|--------------------:|:------------|
+|`DATE` |8 |9 |4-digit year|
+|`ADATE` |8 |8 |4-digit year|
+|`EDATE` |8 |8 |4-digit year|
+|`JDATE` |5 |5 |4-digit year|
+|`SDATE` |8 |8 |4-digit year|
+|`QYR` |4 |6 |4-digit year|
+|`MOYR` |6 |6 |4-digit year|
+|`WKYR` |6 |8 |4-digit year|
+|`DATETIME` |17 |17 |seconds|
+|`YMDHMS` |12 |16 |seconds|
+|`MTIME` |4 |5 | |
+|`TIME` |5 |5 |seconds|
+|`DTIME` |8 |8 |seconds|
+
+In the table, "Option" describes what increased output width enables:
+
+* "4-digit year": A field 2 columns wider than the minimum includes a
+ 4-digit year. (`DATETIME` and `YMDHMS` formats always include a
+ 4-digit year.)
+
+* "seconds": A field 3 columns wider than the minimum includes seconds
+ as well as minutes. A field 5 columns wider than minimum, or more,
+ can also include a decimal point and fractional seconds (but no more
+ than allowed by the format's decimal places).
+
+ For the time and date formats, the default output format is the same
+as the input format, except that PSPP increases the field width, if
+necessary, to the minimum allowed for output.
+
+ Times or dates narrower than the field width are right-justified
+within the field.
+
+ When a time or date exceeds the field width, characters are trimmed
+from the end until it fits. This can occur in an unusual situation,
+e.g. with a year greater than 9999 (which adds an extra digit), or for
+a negative value on `MTIME`, `TIME`, or `DTIME` (which adds a leading
+minus sign).
+
+ The system-missing value is output as a period at the right end of
+the field.
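+
+The relationship between a date and its underlying number of seconds
+can be inspected with a sketch like the following, which reads one date
+and copies it into a variable displayed with a plain `F` format. The
+variable names are arbitrary:
+
+```
+DATA LIST LIST /d (DATE11).
+BEGIN DATA
+01-OCT-1978
+END DATA.
+COMPUTE secs = d.
+FORMATS secs (F12.0).
+LIST.
+```
+
+Changing the format of `d`, e.g. with `FORMATS d (SDATE10)`, changes
+only how the date is displayed, not the stored value.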
+
--- /dev/null
+# Scratch Variables
+
+Most of the time, variables don't retain their values between cases.
+Instead, either they're being read from a data file or the active
+dataset, in which case they assume the value read, or, if created with
+`COMPUTE` or another transformation, they're initialized to the
+system-missing value or to blanks, depending on type.
+
+ However, sometimes it's useful to have a variable that keeps its
+value between cases. You can do this with `LEAVE` (*note LEAVE::), or
+you can use a "scratch variable". Scratch variables are variables whose
+names begin with an octothorpe (`#`).
+
+ Scratch variables have the same properties as variables left with
+`LEAVE`: they retain their values between cases, and for the first case
+they are initialized to 0 or blanks. They have the additional property
+that they are deleted before the execution of any procedure. For this
+reason, scratch variables can't be used for analysis. To use a scratch
+variable in an analysis, use `COMPUTE` (*note COMPUTE::) to copy its
+value into an ordinary variable, then use that ordinary variable in the
+analysis.
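+
+As a sketch, the following syntax keeps a running total in a scratch
+variable and copies it into an ordinary variable so that it survives
+the procedure. The variable names are invented, and declaring the
+scratch variable with `NUMERIC` before its first use is an assumption
+of this example:
+
+```
+DATA LIST LIST /x.
+BEGIN DATA
+1
+2
+3
+END DATA.
+NUMERIC #total.
+COMPUTE #total = #total + x.
+COMPUTE running = #total.
+LIST.
+```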
+
--- /dev/null
+# File Handles
+
+A "file handle" is a reference to a data file, system file, or portable
+file. Most often, a file handle is specified as the name of a file as a
+string, that is, enclosed within `'` or `"`.
+
+ A file name string that begins or ends with `|` is treated as the
+name of a command to pipe data to or from. You can use this feature to
+read data over the network using a program such as `curl` (e.g. `GET
+'|curl -s -S http://example.com/mydata.sav'`), to read compressed data
+from a file using a program such as `zcat` (e.g. `GET '|zcat
+mydata.sav.gz'`), and for many other purposes.
+
+ PSPP also supports declaring named file handles with the `FILE
+HANDLE` command. This command associates an identifier of your choice
+(the file handle's name) with a file. Later, the file handle name can
+be substituted for the name of the file. When PSPP syntax accesses a
+file multiple times, declaring a named file handle simplifies updating
+the syntax later to use a different file. Use of `FILE HANDLE` is also
+required to read data files in binary formats. *Note FILE HANDLE::, for
+more information.
+
+ In some circumstances, PSPP must distinguish whether a file handle
+refers to a system file or a portable file. When this is necessary to
+read a file, e.g. as an input file for `GET` or `MATCH FILES`, PSPP uses
+the file's contents to decide. In the context of writing a file, e.g.
+as an output file for `SAVE` or `AGGREGATE`, PSPP decides based on the
+file's name: if it ends in `.por` (with any capitalization), then PSPP
+writes a portable file; otherwise, PSPP writes a system file.
+
+ `INLINE` is reserved as a file handle name. It refers to the "data
+file" embedded into the syntax file between `BEGIN DATA` and `END
+DATA`. *Note BEGIN DATA::, for more information.
+
+ The file to which a file handle refers may be reassigned on a later
+`FILE HANDLE` command if it is first closed using `CLOSE FILE HANDLE`.
+*Note CLOSE FILE HANDLE::, for more information.
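+
+For example, a named handle might be declared once and then used in
+place of the file name. The handle name `survey` and the file names
+here are hypothetical:
+
+```
+FILE HANDLE survey /NAME='survey.txt'.
+DATA LIST LIST FILE=survey /id score.
+SAVE OUTFILE='survey.por'.    /* Name ends in .por: portable file.
+```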
+
--- /dev/null
+# Files Used by PSPP
+
+PSPP makes use of many files each time it runs. Some of these it
+reads, some it writes, some it creates. Here is a list of the
+most important of these files:
+
+* command file
+ syntax file
+ These names (synonyms) refer to the file that contains instructions
+ that tell PSPP what to do. The syntax file's name is specified on
+ the PSPP command line. Syntax files can also be read with
+ `INCLUDE` (*note INCLUDE::).
+
+* data file
+ Data files contain raw data in text or binary format. Data can
+ also be embedded in a syntax file with `BEGIN DATA` and `END DATA`.
+
+* listing file
+ One or more output files are created by PSPP each time it is run.
+ The output files receive the tables and charts produced by
+ statistical procedures. The output files may be in any number of
+ formats, depending on how PSPP is configured.
+
+* system file
+ System files are binary files that store a dictionary and a set of
+ cases. `GET` and `SAVE` read and write system files.
+
+* portable file
+ Portable files are files in a text-based format that store a
+ dictionary and a set of cases. `IMPORT` and `EXPORT` read and
+ write portable files.
+
--- /dev/null
+# Syntax Diagrams
+
+The syntax of PSPP commands is presented in this manual with syntax
+diagrams.
+
+A syntax diagram is a series of definitions of "nonterminals". Each
+nonterminal is defined by its name, then `::=`, then what the nonterminal
+consists of. If a nonterminal has multiple definitions, then any of
+them is acceptable. If the definition is empty, then one possible
+expansion of that nonterminal is nothing. Otherwise, the definition
+consists of a series of nonterminals and "terminals". The latter
+represent single tokens and consist of:
+
+- `KEYWORD`
+ Any word written in uppercase is that literal syntax keyword.
+
+- `number`
+ A real number.
+
+- `integer`
+ An integer number.
+
+- `string`
+ A string.
+
+- `var-name`
+ A single variable name.
+
+- `=`, `/`, `+`, `-`, etc.
+ Operators and punctuators.
+
+- `.`
+ The end of the command. This is not necessarily an actual dot in
+ the syntax file (*note Commands::).
+
+Some nonterminals are very common, so they are defined here in English
+for clarity:
+
+- `var-list`
+ A list of one or more variable names or the keyword `ALL`.
+
+- `expression`
+ An expression. *Note Expressions::, for details.
+
+The first nonterminal defined in a syntax diagram for a command is
+the entire syntax for that command.
+