1 @node Language, Expressions, Invocation, Top
2 @chapter The PSPP language
7 @strong{Please note:} PSPP is not even close to completion.
8 Only a few statistical procedures are implemented. PSPP
12 This chapter discusses elements common to many PSPP commands.
13 Later chapters will describe individual commands in detail.
16 * Tokens:: Characters combine to form tokens.
17 * Commands:: Tokens combine to form commands.
18 * Types of Commands:: Commands come in several flavors.
19 * Order of Commands:: Commands combine to form syntax files.
20 * Missing Observations:: Handling missing observations.
21 * Variables:: The unit of data storage.
22 * Files:: Files used by PSPP.
23 * File Handles:: How files are named.
24 * BNF:: How command syntax is described.
27 @node Tokens, Commands, Language, Language
29 @cindex language, lexical analysis
30 @cindex language, tokens
32 @cindex lexical analysis
34 PSPP divides most syntax file lines into a series of short chunks
36 Tokens are then grouped to form commands, each of which tells
37 PSPP to take some action---read in data, write out data, perform
38 a statistical procedure, etc. Each type of token is
44 Identifiers are names that typically specify variables, commands, or
45 subcommands. The first character in an identifier must be a letter,
46 @samp{#}, or @samp{@@}. The remaining characters in the identifier
47 must be letters, digits, or one of the following special characters:
53 @cindex case-sensitivity
54 Identifiers may be any length, but only the first 64 bytes are
55 significant. Identifiers are not case-sensitive: @code{foobar},
56 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
57 different representations of the same identifier.
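The two rules above can be modeled as a comparison key: fold case, then keep only the first 64 bytes. The following Python sketch is illustrative only and is not PSPP's implementation. It assumes identifiers are encoded in UTF-8 and that only ASCII letters are case-folded, neither of which this section specifies. The names @code{identifier_key} and @code{same_identifier} are hypothetical.

```python
def identifier_key(name):
    # Hypothetical helper: encode as UTF-8 (an assumption), keep only the
    # 64 significant bytes, and fold ASCII letters to upper case.
    return name.encode("utf-8")[:64].upper()


def same_identifier(a, b):
    # Two spellings denote the same identifier if their keys are equal.
    return identifier_key(a) == identifier_key(b)


# Every spelling from the example above compares equal.
for spelling in ["foobar", "Foobar", "FooBar", "FOOBAR", "FoObaR"]:
    assert same_identifier(spelling, "FOOBAR")
```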
59 @cindex identifiers, reserved
60 @cindex reserved identifiers
61 Some identifiers are reserved. Reserved identifiers may not be used
62 in any context besides those explicitly described in this manual. The
63 reserved identifiers are:
66 @center ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
70 Keywords are a subclass of identifiers that form a fixed part of
71 command syntax. For example, command and subcommand names are
72 keywords. Keywords may be abbreviated to their first 3 characters if
73 this abbreviation is unambiguous. (Unique abbreviations of 3 or more
74 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
75 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
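Keyword matching can be sketched as follows. This hypothetical Python function assumes that a word matches a keyword when it spells the keyword in full, or when it is a prefix of at least 3 characters shared by exactly one keyword in the list. The keyword list in the example is made up for illustration.

```python
def match_keyword(word, keywords):
    """Return the keyword that WORD stands for, or None."""
    w = word.upper()
    for k in keywords:
        if k.upper() == w:
            return k  # the keyword written out in full
    if len(w) < 3:
        return None  # abbreviations need at least 3 characters
    candidates = [k for k in keywords if k.upper().startswith(w)]
    # Accept the abbreviation only if it is unambiguous.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical keyword list, for illustration only.
KEYWORDS = ["FREQUENCIES", "FORMATS", "FORWARD"]
print(match_keyword("fre", KEYWORDS))   # FREQUENCIES
print(match_keyword("FOR", KEYWORDS))   # None (ambiguous)
```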
77 Reserved identifiers are always used as keywords. Other identifiers
78 may be used both as keywords and as user-defined identifiers, such as
85 Numbers are expressed in decimal. A decimal point is optional.
86 Numbers may be expressed in scientific notation by adding @samp{e} and
87 a base-10 exponent, so that @samp{1.234e3} has the value 1234. Here
88 are some more examples of valid numbers:
91 -5 3.14159265359 1e100 -.707 8945.
94 Negative numbers are expressed with a @samp{-} prefix. However, in
95 situations where a literal @samp{-} token is expected, what appears to
96 be a negative number is treated as @samp{-} followed by a positive
99 No white space is allowed within a number token, except for horizontal
100 white space between @samp{-} and the rest of the number.
102 The last example above, @samp{8945.}, will be interpreted as two
103 tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
104 @xref{Commands, , Forming commands of tokens}.
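A single number token, taken in isolation, can be recognized with a regular expression. This Python sketch assumes that the exponent marker may be written in either case and may carry a sign, which the text above does not state. It does not model the context-dependent treatment of @samp{-} or of a trailing period.

```python
import re

# Optional '-' (possibly followed by spaces or tabs), a decimal mantissa
# with an optional point, and an optional exponent such as e3 or E-2.
NUMBER = re.compile(r"(-[ \t]*)?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def number_value(token):
    """Return the value of a single number token, or raise ValueError."""
    m = NUMBER.fullmatch(token)
    if m is None:
        raise ValueError("not a number token: " + repr(token))
    magnitude = float(m.group(2) + (m.group(3) or ""))
    return -magnitude if m.group(1) else magnitude


for text in ["-5", "3.14159265359", "1e100", "-.707", "8945.", "1.234e3"]:
    print(text, number_value(text))
```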
110 @cindex case-sensitivity
111 Strings are literal sequences of characters enclosed in pairs of
112 single quotes (@samp{'}) or double quotes (@samp{"}). To include the
113 character used for quoting in the string, double it, e.g.@:
114 @samp{'it''s an apostrophe'}. White space and case of letters are
115 significant inside strings.
117 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
118 'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
119 splitting a single string across multiple source lines. The maximum
120 length of a string, after concatenation, is 255 characters.
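Decoding a quoted string and concatenating several of them can be sketched in Python as below. The sketch assumes a lexer has already split the input into individual string tokens and discarded the @samp{+} operators between them. The function names are hypothetical.

```python
QUOTES = ("'", '"')


def parse_string(token):
    """Decode one quoted string token such as 'it''s'."""
    if len(token) < 2 or token[0] not in QUOTES or token[-1] != token[0]:
        raise ValueError("not a quoted string: " + repr(token))
    quote, body = token[0], token[1:-1]
    # Inside the string, the quote character must appear doubled.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("unpaired quote in " + repr(token))
    return body.replace(quote * 2, quote)


def concatenate(tokens, max_length=255):
    """Join string tokens that were separated by '+' in the syntax."""
    value = "".join(parse_string(t) for t in tokens)
    if len(value) > max_length:
        raise ValueError("string exceeds 255 characters")
    return value


print(parse_string("'it''s an apostrophe'"))   # it's an apostrophe
print(concatenate(['"a"', "'b'", "'c'"]))       # abc
```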
122 Strings may also be expressed as hexadecimal, octal, or binary
123 character values by prefixing the initial quote character with @samp{X},
124 @samp{O}, or @samp{B}, or their lowercase equivalents. Each pair,
125 triplet, or octet of characters, according to the radix, is
126 transformed into a single character with the given value. If there is
127 an incomplete group of characters, the missing final digits are
128 assumed to be @samp{0}. These forms of strings are nonportable
129 because numeric values are associated with different characters by
130 different operating systems. Therefore, their use should be confined
131 to syntax files that will not be widely distributed.
133 @cindex characters, reserved
136 The character with value 00 is reserved for
137 internal use by PSPP. Using it in a string causes an error, and the
138 character is replaced by a space.
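The hexadecimal, octal, and binary forms can be decoded as in this Python sketch. It assumes that hexadecimal digits come in pairs, octal digits in triplets, and binary digits in octets, and that the prefix letter and quotes have already been removed. It models only the replacement of value 00 by a space, not the accompanying error. The name @code{decode_radix_string} is hypothetical.

```python
# Prefix letter -> (radix, digits per character). The group sizes are an
# assumption: pairs for hex, triplets for octal, octets for binary.
RADIX = dict(X=(16, 2), O=(8, 3), B=(2, 8))


def decode_radix_string(prefix, digits):
    """Decode the digits found between the quotes of an X, O, or B string."""
    base, group = RADIX[prefix.upper()]
    remainder = len(digits) % group
    if remainder:
        # Missing final digits of an incomplete group are taken as '0'.
        digits += "0" * (group - remainder)
    values = [int(digits[i:i + group], base)
              for i in range(0, len(digits), group)]
    # Value 00 is reserved: PSPP reports an error and substitutes a space.
    # Only the substitution is modeled here.
    return "".join(" " if v == 0 else chr(v) for v in values)


print(decode_radix_string("x", "414243"))   # ABC
```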
140 @item Punctuators and Operators
143 These tokens are the punctuators and operators:
146 @center , / = ( ) + - * / ** < <= <> > >= ~= & | .
149 Most of these appear within the syntax of commands, but the period
150 (@samp{.}) punctuator is used only at the end of a command. It is a
151 punctuator only as the last character on a line (except white space).
152 When it is the last non-space character on a line, a period is not
153 treated as part of another token, even if it would otherwise be part
154 of, e.g.@:, an identifier or a floating-point number.
156 Actually, the character that ends a command can be changed with
157 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
158 doing so. Throughout the remainder of this manual we will assume that
159 the default setting is in effect.
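Deciding whether a line ends a command can be sketched by looking at its last non-space character. This hypothetical Python helper ignores strings, comments, and blank lines, and its @code{endcmd} parameter is only a stand-in for the @cmd{SET} ENDCMD setting.

```python
def split_terminator(line, endcmd="."):
    """Return (text, ends_command) for one line of syntax.

    endcmd stands in for the SET ENDCMD character; strings and comments
    are not considered.
    """
    stripped = line.rstrip()
    if stripped.endswith(endcmd):
        # The period is a separate token, even after a number or a name.
        return stripped[:-len(endcmd)], True
    return stripped, False


print(split_terminator("COMPUTE y = 8945.  "))   # ('COMPUTE y = 8945', True)
```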
162 @node Commands, Types of Commands, Tokens, Language
163 @section Forming commands of tokens
165 @cindex PSPP, command structure
166 @cindex language, command structure
167 @cindex commands, structure
169 Most PSPP commands share a common structure. A command begins with a
170 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
171 CASES}. The command name may be abbreviated to its first word, and
172 each word in the command name may be abbreviated to its first three
173 or more characters, where these abbreviations are unambiguous.
175 The command name may be followed by one or more @dfn{subcommands}.
176 Each subcommand begins with a subcommand name, which may be
177 abbreviated to its first three letters. Some subcommands accept a
178 series of one or more specifications, which follow the subcommand
179 name, optionally separated from it by an equals sign
180 (@samp{=}). Specifications may be separated from each other
181 by commas or spaces. Each subcommand must be separated from the next (if any)
182 by a forward slash (@samp{/}).
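The subcommand structure described above can be illustrated with a rough Python splitter. It is a sketch only: it assumes subcommand names are plain alphanumeric words and that the command name and terminating period have already been removed, and it would mis-split quoted strings or expressions that contain @samp{/}. The function name is hypothetical.

```python
import re


def parse_subcommands(text):
    """Split the text after a command name into (name, specs) pairs."""
    result = []
    for chunk in text.split("/"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Subcommand name, then an optional '=', then the specifications.
        m = re.match(r"([A-Za-z][A-Za-z0-9]*)\s*=?\s*(.*)", chunk, re.S)
        if m is None:
            raise ValueError("bad subcommand: " + repr(chunk))
        # Specifications are separated by commas and/or white space.
        specs = [s for s in re.split(r"[\s,]+", m.group(2)) if s]
        result.append((m.group(1).upper(), specs))
    return result


print(parse_subcommands("/VARIABLES=X, Y Z /STATISTICS ALL"))
```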
184 There are multiple ways to mark the end of a command. The most common
185 way is to end the last line of the command with a period (@samp{.}) as
186 described in the previous section (@pxref{Tokens}). A blank line, or
187 one that consists only of white space or comments, also ends a command
188 by default, although you can use the NULLINE subcommand of @cmd{SET}
189 to disable this feature (@pxref{SET}).
191 In batch mode only, that is, when reading commands from a file rather
192 than from an interactive user, any line that contains a non-space character
193 in the leftmost column begins a new command. Thus, each command
194 consists of a flush-left line followed by any number of lines indented
195 from the left margin. In this mode, a plus or minus sign
196 (@samp{+}, @samp{@minus{}}) as the first character
197 in a line is ignored and causes that line to begin a new command,
198 which allows for visual indentation of a command without that command
199 being considered part of the previous command.
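The batch-mode rules can be sketched as a function that groups lines into commands. This Python sketch models only the flush-left rule, the leading @samp{+} or @samp{-}, and blank lines ending a command. Terminating periods and comment lines are not considered, and the function name is hypothetical.

```python
def batch_commands(lines):
    """Group batch-mode syntax lines into commands (a list of line lists)."""
    commands = []
    start_new = True
    for line in lines:
        if not line.strip():
            start_new = True              # a blank line ends the command
            continue
        if line[0] in "+-":
            commands.append([line[1:]])   # marker dropped; new command
        elif start_new or not line[0].isspace():
            commands.append([line])       # flush-left line: new command
        else:
            commands[-1].append(line)     # indented line: continuation
        start_new = False
    return commands


syntax = ["DATA LIST FREE", "  /X Y.", "+  LIST.", "FREQUENCIES", "   /VARIABLES=X."]
for command in batch_commands(syntax):
    print(command)
```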
201 @node Types of Commands, Order of Commands, Commands, Language
202 @section Types of Commands
204 Commands in PSPP are divided roughly into six categories:
207 @item Utility commands
208 @cindex utility commands
209 Set or display various global options that affect PSPP operations.
210 May appear anywhere in a syntax file. @xref{Utilities, , Utility
213 @item File definition commands
214 @cindex file definition commands
215 Give instructions for reading data from text files or from special
216 binary ``system files''. Most of these commands replace any previous
217 data or variables with new data or
218 variables. At least one file definition command must appear before the first command in any of
219 the categories below. @xref{Data Input and Output}.
221 @item Input program commands
222 @cindex input program commands
223 Though rarely used, these provide tools for reading data files
224 in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
226 @item Transformations
227 @cindex transformations
228 Perform operations on data and write data to output files. Transformations
229 are not carried out until a procedure is executed.
231 @item Restricted transformations
232 @cindex restricted transformations
233 Transformations that cannot appear in certain contexts. @xref{Order
234 of Commands}, for details.
238 Analyze data, writing results of analyses to the listing file. Cause
239 transformations specified earlier in the file to be performed. In a
240 more general sense, a @dfn{procedure} is any command that causes the
241 active file (the data) to be read.
244 @node Order of Commands, Missing Observations, Types of Commands, Language
245 @section Order of Commands
246 @cindex commands, ordering
247 @cindex order of commands
249 PSPP does not place many restrictions on the ordering of commands. The
250 main restriction is that variables must be defined before they are otherwise
251 referenced. This section describes the details of command ordering,
252 but most users will have no need to refer to them.
254 PSPP possesses five internal states, called initial, INPUT PROGRAM,
255 FILE TYPE, transformation, and procedure states. (Please note the
256 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
257 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
259 PSPP starts in the initial state. Each successful completion
260 of a command may cause a state transition. Each type of command has its
261 own rules for state transitions:
264 @item Utility commands
269 Do not cause state transitions. Exception: when @cmd{N OF CASES}
270 is executed in the procedure state, it causes a transition to the
271 transformation state.
274 @item @cmd{DATA LIST}
279 When executed in the initial or procedure state, causes a transition to
280 the transformation state.
282 Clears the active file if executed in the procedure or transformation
286 @item @cmd{INPUT PROGRAM}
289 Invalid in INPUT PROGRAM and FILE TYPE states.
291 Causes a transition to the INPUT PROGRAM state.
293 Clears the active file.
296 @item @cmd{FILE TYPE}
299 Invalid in INPUT PROGRAM and FILE TYPE states.
301 Causes a transition to the FILE TYPE state.
303 Clears the active file.
306 @item Other file definition commands
309 Invalid in INPUT PROGRAM and FILE TYPE states.
311 Cause a transition to the transformation state.
313 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
317 @item Transformations
320 Invalid in initial and FILE TYPE states.
322 Cause a transition to the transformation state.
325 @item Restricted transformations
328 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
330 Cause a transition to the transformation state.
336 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
338 Cause a transition to the procedure state.
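The transitions listed above can be collected into a small state machine. This Python sketch follows the rules exactly as listed. The command-kind strings such as @code{restricted_transformation} are hypothetical labels, and for @cmd{DATA LIST} the sketch assumes no transition in the states the list does not mention.

```python
INITIAL = "initial"
INPUT_PROGRAM = "INPUT PROGRAM"
FILE_TYPE = "FILE TYPE"
TRANSFORMATION = "transformation"
PROCEDURE = "procedure"

# Hypothetical command kinds -> (states where invalid, resulting state).
RULES = dict(
    input_program=((INPUT_PROGRAM, FILE_TYPE), INPUT_PROGRAM),
    file_type=((INPUT_PROGRAM, FILE_TYPE), FILE_TYPE),
    other_file_definition=((INPUT_PROGRAM, FILE_TYPE), TRANSFORMATION),
    transformation=((INITIAL, FILE_TYPE), TRANSFORMATION),
    restricted_transformation=((INITIAL, INPUT_PROGRAM, FILE_TYPE),
                               TRANSFORMATION),
    procedure=((INITIAL, INPUT_PROGRAM, FILE_TYPE), PROCEDURE),
)


def next_state(state, kind, command=""):
    """Return the state after a command of the given kind succeeds."""
    if kind == "utility":
        # Only N OF CASES, in the procedure state, changes the state.
        if command.upper() == "N OF CASES" and state == PROCEDURE:
            return TRANSFORMATION
        return state
    if kind == "data_list":
        # Other states: assumed unchanged (not spelled out above).
        return TRANSFORMATION if state in (INITIAL, PROCEDURE) else state
    invalid, target = RULES[kind]
    if state in invalid:
        raise ValueError(kind + " is invalid in the " + state + " state")
    return target


state = INITIAL
for kind in ["data_list", "transformation", "procedure"]:
    state = next_state(state, kind)
    print(kind, "->", state)
```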
342 @node Missing Observations, Variables, Order of Commands, Language
343 @section Handling missing observations
344 @cindex missing values
345 @cindex values, missing
347 PSPP includes special support for unknown numeric data values.
348 Missing observations are assigned a special value, called the
349 @dfn{system-missing value}. This ``value'' actually indicates the
350 absence of a value; it means that the actual value is unknown. Procedures
351 automatically exclude from analyses those observations or cases that
352 have missing values. Details of missing value exclusion depend on the
353 procedure and can often be controlled by the user; refer to
354 descriptions of individual procedures for details.
356 The system-missing value exists only for numeric variables. String
357 variables always have a defined value, even if it is only a string of
360 Variables, whether numeric or string, can have designated
361 @dfn{user-missing values}. Every user-missing value is an actual value
362 for that variable. However, most of the time user-missing values are
363 treated in the same way as the system-missing value. String variables
364 that are wider than a certain width, usually 8 characters (depending on
365 computer architecture), cannot have user-missing values.
367 For more information on missing values, see the following sections:
368 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
369 documentation on individual procedures for information on how they
370 handle missing values.
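The distinction between system-missing and user-missing values can be sketched for a single numeric observation as follows. This Python sketch assumes that Python's @code{None} stands in for the system-missing value and that a user-missing range is inclusive. The limits on discrete values follow the description of the missing-values attribute: up to three values, or a range, or one value plus a range. The function name is hypothetical.

```python
SYSMIS = None  # stand-in for the system-missing value (an assumption)


def is_missing(value, discrete=(), value_range=None):
    """Return True if a numeric observation is system- or user-missing.

    discrete: up to three user-missing values, or one if value_range
    is also given; value_range: an inclusive (low, high) pair.
    """
    if value is SYSMIS:
        return True  # no value at all
    limit = 1 if value_range is not None else 3
    if len(discrete) > limit:
        raise ValueError("too many discrete user-missing values")
    if value in discrete:
        return True  # an actual value, but treated as missing
    if value_range is not None:
        low, high = value_range
        return low <= value <= high
    return False


print(is_missing(9, discrete=(8, 9)))                       # True
print(is_missing(50, discrete=(-1,), value_range=(97, 99)))  # False
```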
372 @node Variables, Files, Missing Observations, Language
377 Variables are the basic unit of data storage in PSPP. All the
378 variables in a file taken together, apart from any associated data, are
379 said to form a @dfn{dictionary}.
380 Some details of variables are described in the sections below.
383 * Attributes:: Attributes of variables.
384 * System Variables:: Variables automatically defined by PSPP.
385 * Sets of Variables:: Lists of variable names.
386 * Input/Output Formats:: Input and output formats.
387 * Scratch Variables:: Variables deleted by procedures.
390 @node Attributes, System Variables, Variables, Variables
391 @subsection Attributes of Variables
392 @cindex variables, attributes of
393 @cindex attributes of variables
394 Each variable has a number of attributes, including:
398 An identifier, up to 64 bytes long. Each variable must have a different name.
401 Some system variable names begin with @samp{$}, but user-defined
402 variables' names may not begin with @samp{$}.
406 @cindex variable names, ending with period
407 The final character in a variable name should not be @samp{.}, because
408 such an identifier will be misinterpreted when it is the final token
409 on a line: @code{FOO.} will be divided into two separate tokens,
410 @samp{FOO} and @samp{.}, indicating end-of-command. @xref{Tokens}.
413 The final character in a variable name should not be @samp{_}, because
414 some such identifiers are used for special purposes by PSPP
417 As with all PSPP identifiers, variable names are not case-sensitive.
418 PSPP capitalizes variable names on output the same way they were
419 capitalized at their point of definition in the input.
421 @cindex variables, type
422 @cindex type of variables
426 @cindex variables, width
427 @cindex width of variables
429 (string variables only) String variables with a width of 8 characters or
430 fewer are called @dfn{short string variables}. Short string variables
431 can be used in many procedures where @dfn{long string variables} (those
432 with widths greater than 8) are not allowed.
434 Certain systems may consider strings longer than 8
435 characters to be short strings. Eight characters represents a minimum
436 figure for the maximum length of a short string.
439 Variables in the dictionary are arranged in a specific order.
440 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
443 Either reinitialized to 0 or spaces for each case, or left at its
444 existing value. @xref{LEAVE}.
446 @cindex missing values
447 @cindex values, missing
449 Optionally, up to three values, or a range of values, or a specific
450 value plus a range, can be specified as @dfn{user-missing values}.
451 There is also a @dfn{system-missing value} that is assigned to an
452 observation when there is no other obvious value for that observation.
453 Observations with missing values are automatically excluded from
454 analyses. User-missing values are actual data values, while the
455 system-missing value is not a value at all. @xref{Missing Observations}.
457 @cindex variable labels
458 @cindex labels, variable
460 A string that describes the variable. @xref{VARIABLE LABELS}.
463 @cindex labels, value
465 Optionally, these associate each possible value of the variable with a
466 string. @xref{VALUE LABELS}.
470 Display width, format, and (for numeric variables) number of decimal
471 places. This attribute does not affect how data are stored, just how
472 they are displayed. Example: a width of 8, with 2 decimal places.
473 @xref{PRINT FORMATS}.
477 Similar to print format, but used by certain commands that are
478 designed to write to binary files. @xref{WRITE FORMATS}.
481 @node System Variables, Sets of Variables, Attributes, Variables
482 @subsection Variables Automatically Defined by PSPP
483 @cindex system variables
484 @cindex variables, system
486 There are seven system variables. These are not like ordinary
487 variables because system variables are not always stored. They can be used only
488 in expressions. These system variables, whose values and output formats
489 cannot be modified, are described below.
492 @cindex @code{$CASENUM}
494 Case number of the case at the moment. This changes as cases are
499 Date the PSPP process was started, in format A9, following the
500 pattern @code{DD MMM YY}.
502 @cindex @code{$JDATE}
504 Number of days between 15 Oct 1582 and the time the PSPP process
507 @cindex @code{$LENGTH}
509 Page length, in lines, in format F11.
511 @cindex @code{$SYSMIS}
513 System missing value, in format F1.
517 Number of seconds between midnight 14 Oct 1582 and the time the active file
518 was read, in format F20.
520 @cindex @code{$WIDTH}
522 Page width, in characters, in format F3.
525 @node Sets of Variables, Input/Output Formats, System Variables, Variables
526 @subsection Lists of variable names
527 @cindex TO convention
528 @cindex convention, TO
530 To refer to a set of variables, list their names one after another.
531 Optionally, their names may be separated by commas. To include a
532 range of variables from the dictionary in the list, write the name of
533 the first and last variable in the range, separated by @code{TO}. For
534 instance, if the dictionary contains six variables with the names
535 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
536 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
537 variables @code{X2}, @code{GOAL}, and @code{MET}.
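Expanding a @code{TO} range against the dictionary amounts to taking a slice of the ordered variable list. In this Python sketch the dictionary is just a list of names. Treating a range that runs backward as an error is an assumption not stated above, and the function name is hypothetical.

```python
def expand_to(dictionary, first, last):
    """Return the variables from FIRST through LAST in dictionary order."""
    upper = [name.upper() for name in dictionary]  # names ignore case
    i, j = upper.index(first.upper()), upper.index(last.upper())
    if i > j:
        # Assumption: a range that runs backward is an error.
        raise ValueError(first + " comes after " + last)
    return dictionary[i:j + 1]


names = ["ID", "X1", "X2", "GOAL", "MET", "NEXTGOAL"]
print(expand_to(names, "X2", "MET"))   # ['X2', 'GOAL', 'MET']
```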
539 Commands that define variables, such as @cmd{DATA LIST}, give
540 @code{TO} an alternate meaning. With these commands, @code{TO} defines
541 sequences of variables whose names end in consecutive integers. The
542 syntax is two identifiers that begin with the same root and end with
543 numbers, separated by @code{TO}. The syntax @code{X1 TO X5} defines 5
544 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
545 @code{X5}. The syntax @code{ITEM0008 TO ITEM0013} defines 6
546 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
547 @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}. The syntaxes
548 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
550 After a set of variables has been defined with @cmd{DATA LIST} or
551 another command with this method, the same set can be referenced on
552 later commands using the same syntax.
554 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
555 @subsection Input and Output Formats
557 Data that PSPP inputs and outputs must have one of a number of formats.
558 These formats are described, in general, by a format specification of
559 the form @code{NAMEw.d}, where @var{name} is the
560 format name and @var{w} is a field width. @var{d} is the optional
561 desired number of decimal places, if appropriate. If @var{d} is not
562 included then it is assumed to be 0. Some formats do not allow @var{d}
565 When @cmd{DATA LIST} or another command specifies an input format,
566 that format is converted to an output format for the purposes of
567 @cmd{PRINT} and other data output commands. For most purposes, input
568 and output formats are the same; the salient differences are described
571 Below are listed the input and output formats supported by PSPP. If an
572 input format is mapped to a different output format by default, then
573 that mapping is indicated with @result{}. Each format has the listed
574 bounds on input width (iw) and output width (ow).
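Splitting a format specification into its parts can be sketched with a regular expression. This hypothetical Python helper does not check the width bounds listed below or whether a particular format accepts @var{d}.

```python
import re

FORMAT_SPEC = re.compile(r"([A-Za-z]+)(\d+)(?:\.(\d+))?")


def parse_format(spec):
    """Split e.g. 'F8.2' into ('F', 8, 2); a missing d is taken as 0."""
    m = FORMAT_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("bad format specification: " + repr(spec))
    return m.group(1).upper(), int(m.group(2)), int(m.group(3) or 0)


for spec in ["F8.2", "a10", "PIBHEX8", "DATETIME20.3"]:
    print(spec, parse_format(spec))
```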
576 The standard numeric input and output formats are given in the following
580 @item Fw.d: 1 <= iw,ow <= 40
581 Standard decimal format with @var{d} decimal places. If the number is
582 too large to fit within the field width, it is expressed in scientific
583 notation (@code{1.2+34}) if w >= 6, always with at least two digits in
584 the exponent. When used as an input format, scientific notation is
585 allowed but an E or an F must be used to introduce the exponent.
587 The default output format is the same as the input format, except if
588 @var{d} > 1. In that case the output @var{w} is always made to be at
591 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
592 For input this is equivalent to F format except that no E or F is
593 required to introduce the exponent. For output, it produces scientific
594 notation in the form @code{1.2+34}. There are always at least two
595 digits given in the exponent.
597 The default output @var{w} is the largest of the input @var{w}, the
598 input @var{d} + 7, and 10. The default output @var{d} is the input
599 @var{d}, but at least 3.
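As a worked example of the rule above, the default output format for an E input format can be computed as in this minimal Python sketch. The function name is hypothetical.

```python
def e_default_output(w, d):
    """Default output format (name, w, d) for input format Ew.d."""
    # w: largest of input w, input d + 7, and 10; d: input d, but at least 3.
    return "E", max(w, d + 7, 10), max(d, 3)


print(e_default_output(8, 2))    # ('E', 10, 3)
print(e_default_output(12, 6))   # ('E', 13, 6)
```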
601 @item COMMAw.d: 1 <= iw,ow <= 40
602 Equivalent to F format, except that groups of three digits are
603 comma-separated on output. If the number is too large to express in the
604 field width, then commas are eliminated first; if there is still
605 not enough space, the number is expressed in scientific notation, provided
606 that w >= 6. Commas are allowed and ignored when this is used as an
609 @item DOTw.d: 1 <= iw,ow <= 40
610 Equivalent to COMMA format except that the roles of comma and decimal
611 point are interchanged. However, if SET /DECIMAL=DOT is in effect, then
612 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
615 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
616 Equivalent to COMMA format, except that the number is prefixed by a
617 dollar sign (@samp{$}) if there is room. On input the value is allowed
618 to be prefixed by a dollar sign, which is ignored.
620 The default output @var{w} is the input @var{w}, but at least 2.
622 @item PCTw.d: 2 <= iw,ow <= 40
623 Equivalent to F format, except that the number is suffixed by a percent
624 sign (@samp{%}) if there is room. On input the value is allowed to be
625 suffixed by a percent sign, which is ignored.
627 The default output @var{w} is the input @var{w}, but at least 2.
629 @item Nw.d: 1 <= iw,ow <= 40
630 Only digits are allowed within the field width. The decimal point is
631 assumed to be @var{d} digits from the right margin.
633 The default output format is F with the same @var{w} and @var{d}, except
634 if @var{d} > 1. In that case the output @var{w} is always made to be at
637 @item Zw.d @result{} F: 1 <= iw,ow <= 40
638 Zoned decimal input. If you need to use this then you know how.
640 @item IBw.d @result{} F: 1 <= iw,ow <= 8
641 Integer binary format. The field is interpreted as a fixed-point
642 positive or negative binary number in two's-complement notation. The
643 location of the decimal point is implied. Endianness is the same as the
646 The default output format is F8.2 if @var{d} is 0. Otherwise it is F,
647 with output @var{w} as 9 + input @var{d} and output @var{d} as input
650 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
651 Positive integer binary format. The field is interpreted as a
652 fixed-point positive binary number. The location of the decimal point
653 is implied. Endianness is the same as the host machine.
655 The default output format follows the rules for IB format.
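The N, IB, and PIB layouts described above can be decoded as in this Python sketch. It assumes the raw field bytes are already in hand, uses the host byte order as stated above, and divides by a power of 10 to place the implied decimal point. The function names are hypothetical.

```python
import sys

DIGITS = "0123456789"


def decode_n(field, d):
    """N format: only digits; the point is implied d digits from the right."""
    if not field or any(c not in DIGITS for c in field):
        raise ValueError("N format allows only digits: " + repr(field))
    return int(field) / 10 ** d


def decode_ib(raw, d, signed=True):
    """IB (signed two's complement) or PIB (signed=False) field in host
    byte order, with the decimal point implied d digits from the right."""
    return int.from_bytes(raw, sys.byteorder, signed=signed) / 10 ** d


print(decode_n("12345", 2))                                           # 123.45
print(decode_ib((-1234).to_bytes(2, sys.byteorder, signed=True), 2))  # -12.34
```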
657 @item Pw.d @result{} F: 1 <= iw,ow <= 16
658 Binary coded decimal format. Each byte from left to right, except the
659 rightmost, represents two digits. The upper nibble of each byte is more
660 significant. The upper nibble of the final byte is the least
661 significant digit. The lower nibble of the final byte is the sign; a
662 value of D represents a negative sign and all other values are
663 considered positive. The decimal point is implied.
665 The default output format follows the rules for IB format.
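The P layout can be decoded nibble by nibble, as in this Python sketch. It follows the description above and does not validate digit nibbles greater than 9. The function name is hypothetical.

```python
def decode_p(raw, d):
    """Decode a P (packed decimal) field with d implied decimal places."""
    digits = []
    for byte in raw[:-1]:
        # Each byte but the last holds two digits, upper nibble first.
        digits += [byte >> 4, byte & 0x0F]
    last = raw[-1]
    digits.append(last >> 4)               # least significant digit
    negative = (last & 0x0F) == 0x0D       # sign nibble D means negative
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return (-value if negative else value) / 10 ** d


print(decode_p(bytes([0x12, 0x34, 0x5D]), 2))   # -123.45
```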
667 @item PKw.d @result{} F: 1 <= iw,ow <= 16
668 Positive binary coded decimal format. Same as P but the last byte is the
671 The default output format follows the rules for IB format.
673 @item RBw @result{} F: 2 <= iw,ow <= 8
675 Binary C architecture-dependent ``double'' format. For a standard
676 IEEE754 implementation @var{w} should be 8.
678 The default output format follows the rules for IB format.
680 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
681 PIB format encoded as textual hex digit pairs. @var{w} must be even.
683 The input width is mapped to a default output width as follows:
684 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
685 12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for
688 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
690 RB format encoded as textual hex digit pairs. @var{w} must be even.
692 The default output format is F8.2.
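On an IEEE 754 host, where @var{w} is 8 for RB (and, by the same reasoning, presumably 16 for RBHEX), both formats can be decoded with Python's standard @code{struct} module, as sketched below. The function names are hypothetical.

```python
import struct


def decode_rb(raw):
    """RB format: a host-native C double (8 bytes on IEEE 754 hosts)."""
    (value,) = struct.unpack("=d", raw)
    return value


def decode_rbhex(text):
    """RBHEX format: the RB bytes written as pairs of hex digits."""
    if len(text) % 2:
        raise ValueError("RBHEX width must be even")
    return decode_rb(bytes.fromhex(text))


sample = struct.pack("=d", 1.5).hex()   # 16 hex digits in host byte order
print(sample, decode_rbhex(sample))
```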
694 @item CCAw.d: 1 <= ow <= 40
695 @itemx CCBw.d: 1 <= ow <= 40
696 @itemx CCCw.d: 1 <= ow <= 40
697 @itemx CCDw.d: 1 <= ow <= 40
698 @itemx CCEw.d: 1 <= ow <= 40
700 User-defined custom currency formats. May not be used as an input
701 format. @xref{SET}, for more details.
704 The date and time numeric input and output formats accept a number of
705 possible formats. Before describing the formats themselves, some
706 definitions of the elements that make up their formats will be helpful:
710 All formats accept an optional white space leader.
713 An integer between 1 and 31 representing the day of month.
716 An integer representing a number of days.
719 One or more characters of white space or the following characters:
723 A month name in one of the following forms:
726 An integer between 1 and 12.
728 Roman numerals representing an integer between 1 and 12.
730 At least the first three characters of an English month name (January,
735 An integer year number between 1582 and 19999, or between 1 and 199.
736 Years between 1 and 199 will have 1900 added.
739 A single number with a year number in the first 2, 3, or 4 digits (as
740 above) and the day number within the year in the last 3 digits.
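The year and julian rules can be expressed as follows. This Python sketch is illustrative. It does not check that the day number is valid for the year, and the function names are hypothetical.

```python
def normalize_year(year):
    """Apply the year rule: 1..199 has 1900 added; 1582..19999 is kept."""
    if 1 <= year <= 199:
        return year + 1900
    if 1582 <= year <= 19999:
        return year
    raise ValueError("year out of range: " + str(year))


def split_julian(text):
    """Split a julian field: year in the leading 2-4 digits, day in the last 3."""
    if not 5 <= len(text) <= 7 or any(c not in "0123456789" for c in text):
        raise ValueError("expected 5 to 7 digits: " + repr(text))
    return normalize_year(int(text[:-3])), int(text[-3:])


print(split_julian("97032"))    # (1997, 32)
print(split_julian("104032"))   # (2004, 32)
```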
743 An integer between 1 and 4 representing a quarter.
746 The letter @samp{Q} or @samp{q}.
749 An integer between 1 and 53 representing a week within a year.
752 The letters @samp{wk} in any case.
755 At least one character of white space or @samp{:} or @samp{.}.
758 An integer greater than 0 representing an hour.
761 An integer between 0 and 59 representing a minute within an hour.
764 Optionally, a time-delimiter followed by a real number representing a
768 An integer between 0 and 23 representing an hour within a day.
771 At least the first two characters of an English day word.
774 Any amount or no amount of white space.
777 An optional positive or negative sign.
780 All formats accept an optional white space trailer.
783 The date input formats are strung together from the above pieces. On
784 output, the date formats are always printed in a single canonical
785 manner, based on field width. The date input and output formats are
789 @item DATEw: 9 <= iw,ow <= 40
790 Date format. Input format: leader + day + date-delimiter +
791 month + date-delimiter + year + trailer. Output format: DD-MMM-YY for
792 @var{w} < 11, DD-MMM-YYYY otherwise.
794 @item EDATEw: 8 <= iw,ow <= 40
795 European date format. Input format same as DATE. Output format:
796 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
798 @item SDATEw: 8 <= iw,ow <= 40
799 Standard date format. Input format: leader + year + date-delimiter +
800 month + date-delimiter + day + trailer. Output format: YY/MM/DD for
801 @var{w} < 10, YYYY/MM/DD otherwise.
803 @item ADATEw: 8 <= iw,ow <= 40
804 American date format. Input format: leader + month + date-delimiter +
805 day + date-delimiter + year + trailer. Output format: MM/DD/YY for
806 @var{w} < 10, MM/DD/YYYY otherwise.
808 @item JDATEw: 5 <= iw,ow <= 40
809 Julian date format. Input format: leader + julian + trailer. Output
810 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
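The numeric date output layouts above can be sketched in Python as follows. The sketch assumes that day, month, and two-digit year fields are zero-padded, and it does not model padding to wider fields. DATE is omitted because the capitalization of its month abbreviation is not specified here. The function name is hypothetical.

```python
import datetime


def format_date(kind, w, date):
    """Render date in the EDATE, SDATE, ADATE, or JDATE output layout."""
    four_digit = w >= (7 if kind == "JDATE" else 10)
    year = ("%04d" % date.year) if four_digit else ("%02d" % (date.year % 100))
    if kind == "EDATE":
        return "%02d.%02d.%s" % (date.day, date.month, year)
    if kind == "SDATE":
        return "%s/%02d/%02d" % (year, date.month, date.day)
    if kind == "ADATE":
        return "%02d/%02d/%s" % (date.month, date.day, year)
    if kind == "JDATE":
        return "%s%03d" % (year, date.timetuple().tm_yday)
    raise ValueError("unsupported format: " + kind)


d = datetime.date(2004, 2, 1)
for kind, w in [("EDATE", 8), ("SDATE", 10), ("ADATE", 10), ("JDATE", 5)]:
    print(kind, w, format_date(kind, w, d))
```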
812 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
813 Quarter/year format. Input format: leader + quarter + q-delimiter +
814 year + trailer. Output format: @samp{Q Q YY}, where the first
815 @samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
818 @item MOYRw: 6 <= iw,ow <= 40
819 Month/year format. Input format: leader + month + date-delimiter + year
820 + trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
823 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
824 Week/year format. Input format: leader + week + wk-delimiter + year +
825 trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
828 @item DATETIMEw.d: 17 <= iw,ow <= 40
829 Date and time format. Input format: leader + day + date-delimiter +
830 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
831 + minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
832 @var{w} > 19 then seconds (@samp{:SS}) are added. If @var{w} > 22 and
833 @var{d} > 0 then fractional seconds @samp{.SS} are added.
835 @item TIMEw.d: 5 <= iw,ow <= 40
836 Time format. Input format: leader + sign + spaces + hour +
837 time-delimiter + minute + opt-second. Output format: @samp{HH:MM}.
838 Seconds and fractional seconds are available with @var{w} of at least 8
839 and 10, respectively.
841 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
842 Time format with day count. Input format: leader + sign + spaces +
843 day-count + time-delimiter + hour + time-delimiter + minute +
844 opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional
845 seconds are available with @var{w} of at least 8 and 10, respectively.
847 @item WKDAYw: 2 <= iw,ow <= 40
848 A weekday as a number between 1 and 7, where 1 is Sunday. Input format:
849 leader + weekday + trailer. Output format: as many characters, in all
850 capital letters, of the English name of the weekday as will fit in the
853 @item MONTHw: 3 <= iw,ow <= 40
854 A month as a number between 1 and 12, where 1 is January. Input format:
855 leader + month + trailer. Output format: as many characters, in all
856 capital letters, of the English name of the month as will fit in the
860 There are only two formats that may be used with string variables:
863 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
864 The entire field is treated as a string value.
866 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
867 The field is composed of characters in a string encoded as textual hex
870 The default output @var{w} is half the input @var{w}.
873 @node Scratch Variables, , Input/Output Formats, Variables
874 @subsection Scratch Variables
876 Most of the time, variables don't retain their values between cases.
877 Instead, either they're being read from a data file or the active file,
878 in which case they assume the value read, or, if created with
880 another transformation, they're initialized to the system-missing value
881 or to blanks, depending on type.
883 However, sometimes it's useful to have a variable that keeps its value
884 between cases. You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
885 use a @dfn{scratch variable}. Scratch variables are variables whose
886 names begin with an octothorpe (@samp{#}).
888 Scratch variables have the same properties as variables left with
889 @cmd{LEAVE}: they retain their values between cases, and for the first
890 case they are initialized to 0 or blanks. They have the additional
891 property that they are deleted before the execution of any procedure.
892 For this reason, scratch variables can't be used for analysis. To use
893 a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
894 to copy its value into an ordinary variable, then use that ordinary
895 variable in the analysis.
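
The following minimal sketch shows the technique just described. The
scratch variable @code{#total} accumulates a running sum of @code{x}
across cases, and @cmd{COMPUTE} copies it into the ordinary variable
@code{running_total} so that @cmd{LIST} can display it. Both names are
illustrative. Since @code{#total} starts at 0 and keeps its value from
one case to the next, @code{running_total} should be 1, 3, and 6 for
the three cases.

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.
* #total is a scratch variable, so it keeps its value between cases.
COMPUTE #total = #total + x.
* Copy the running sum into an ordinary variable that procedures can use.
COMPUTE running_total = #total.
LIST.
@end example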
898 @section Files Used by PSPP
PSPP makes use of many files each time it runs: some it reads, some it
writes, and some it creates. The following table lists the most
important of these files:
905 @cindex file, command
906 @cindex file, syntax file
911 These names (synonyms) refer to the file that contains instructions
912 that tell PSPP what to do. The syntax file's name is specified on
913 the PSPP command line. Syntax files can also be read with
914 @cmd{INCLUDE} (@pxref{INCLUDE}).
919 Data files contain raw data in text or binary format. Data can also
920 be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
925 One or more output files are created by PSPP each time it is
926 run. The output files receive the tables and charts produced by
927 statistical procedures. The output files may be in any number of formats,
928 depending on how PSPP is configured.
933 The active file is the ``file'' on which all PSPP procedures are
934 performed. The active file consists of a dictionary and a set of cases.
935 The active file is not necessarily a disk file: it is stored in memory
941 System files are binary files that store a dictionary and a set of
942 cases. @cmd{GET} and @cmd{SAVE} read and write system files.
944 @cindex portable file
945 @cindex file, portable
947 Portable files are files in a text-based format that store a dictionary
and a set of cases. @cmd{IMPORT} and @cmd{EXPORT} read and write
portable files.
952 @cindex file, scratch
954 Scratch files consist of a dictionary and cases and may be stored in
955 memory or on disk. Most procedures that act on a system file or
956 portable file can use a scratch file instead. The contents of scratch
957 files persist within a single PSPP session only. @cmd{GET} and
958 @cmd{SAVE} can be used to read and write scratch files. Scratch files
959 are a PSPP extension.
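
As a sketch of the commands named in the table above, assuming that an
active file has already been defined (for example, with @cmd{DATA LIST}),
the following syntax writes the active file as a system file and as a
portable file, then reads each back. The file names are hypothetical.

@example
* Write, then read back, a system file.
SAVE OUTFILE='survey.sav'.
GET FILE='survey.sav'.
* Write, then read back, a portable file.
EXPORT OUTFILE='survey.por'.
IMPORT FILE='survey.por'.
@end example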
963 @section File Handles
966 A @dfn{file handle} is a reference to a data file, system file, portable
967 file, or scratch file. Most often, a file handle is specified as the
name of a file as a string, that is, enclosed within @samp{'} or
@samp{"}.
971 PSPP also supports declaring named file handles with the @cmd{FILE
972 HANDLE} command. This command associates an identifier of your choice
973 (the file handle's name) with a file. Later, the file handle name can
974 be substituted for the name of the file. When PSPP syntax accesses a
975 file multiple times, declaring a named file handle simplifies updating
976 the syntax later to use a different file. Use of @cmd{FILE HANDLE} is
977 also required to read data files in binary formats. @xref{FILE HANDLE},
978 for more information.
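
For instance, assuming a plain-text file @file{scores.txt} (a
hypothetical name) that holds one number per line, this sketch declares
the handle @code{scores} once and then uses it in place of the file
name. Switching to a different data file would then require changing
only the @cmd{FILE HANDLE} command.

@example
FILE HANDLE scores /NAME='scores.txt'.
DATA LIST LIST FILE=scores /score.
LIST.
@end example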
980 PSPP assumes that a file handle name that begins with @samp{#} refers to
981 a scratch file, unless the name has already been declared on @cmd{FILE
982 HANDLE} to refer to another kind of file. A scratch file is similar to
983 a system file, except that it persists only for the duration of a given
984 PSPP session. Most commands that read or write a system or portable
985 file, such as @cmd{GET} and @cmd{SAVE}, also accept scratch file
986 handles. Scratch file handles may also be declared explicitly with
987 @cmd{FILE HANDLE}. Scratch files are a PSPP extension.
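
For example, assuming an active file already exists, this sketch saves
it under the hypothetical scratch handle @code{#backup} and restores it
later in the same session. Because @code{#backup} begins with @samp{#}
and is not declared on @cmd{FILE HANDLE}, PSPP treats it as a scratch
file, so its contents are discarded when the session ends.

@example
SAVE OUTFILE=#backup.
* Transformations and procedures that modify the active file go here.
GET FILE=#backup.
@end example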
989 In some circumstances, PSPP must distinguish whether a file handle
990 refers to a system file or a portable file. When this is necessary to
991 read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
992 PSPP uses the file's contents to decide. In the context of writing a
993 file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, PSPP
994 decides based on the file's name: if it ends in @samp{.por} (with any
995 capitalization), then PSPP writes a portable file; otherwise, PSPP
996 writes a system file.
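
Under this rule, the file name alone determines what @cmd{SAVE} writes.
In the sketch below (file names hypothetical), the first command writes
a portable file, because @samp{.POR} matches regardless of case, and the
second writes a system file. When @cmd{GET} later reads
@file{results.POR}, PSPP identifies it as a portable file from its
contents, not its name.

@example
SAVE OUTFILE='results.POR'.
SAVE OUTFILE='results.sav'.
GET FILE='results.POR'.
@end example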
998 INLINE is reserved as a file handle name. It refers to the ``data
999 file'' embedded into the syntax file between @cmd{BEGIN DATA} and
1000 @cmd{END DATA}. @xref{BEGIN DATA}, for more information.
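
As a sketch, the following syntax names @code{INLINE} explicitly on
@cmd{DATA LIST}, so the data are read from the lines between
@cmd{BEGIN DATA} and @cmd{END DATA}. The variable name is illustrative.
We assume that writing @code{FILE=INLINE} behaves the same as omitting
@code{FILE} altogether.

@example
DATA LIST LIST FILE=INLINE /x.
BEGIN DATA.
10
20
END DATA.
LIST.
@end example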
1002 The file to which a file handle refers may be reassigned on a later
1003 @cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
1004 HANDLE}. The @cmd{CLOSE FILE HANDLE} command is also useful to free the
storage associated with a scratch file. @xref{CLOSE FILE HANDLE}, for
more information.
1009 @section Backus-Naur Form
1011 @cindex Backus-Naur Form
1012 @cindex command syntax, description of
1013 @cindex description of command syntax
1015 The syntax of some parts of the PSPP language is presented in this
1016 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
1017 following table describes BNF:
1023 Words in all-uppercase are PSPP keyword tokens. In BNF, these are
1024 often called @dfn{terminals}. There are some special terminals, which
1025 are written in lowercase for clarity:
1028 @cindex @code{number}
1032 @cindex @code{integer}
1033 @item @code{integer}
1036 @cindex @code{string}
1040 @cindex @code{var-name}
1041 @item @code{var-name}
1042 A single variable name.
1046 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
1047 Operators and punctuators.
The end of the command. This is not necessarily an actual dot in the
syntax file. @xref{Commands}, for more details.
1057 @cindex nonterminals
Other words in all lowercase are @dfn{nonterminals}: names that refer
to BNF definitions, called @dfn{productions}. Some nonterminals are
very common, so they are defined here in English for clarity:
1064 @cindex @code{var-list}
1066 A list of one or more variable names or the keyword @code{ALL}.
1068 @cindex @code{expression}
1070 An expression. @xref{Expressions}, for details.
1074 @cindex ``is defined as''
1076 @samp{::=} means ``is defined as''. The left side of @samp{::=} gives
1077 the name of the nonterminal being defined. The right side of @samp{::=}
1078 gives the definition of that nonterminal. If the right side is empty,
1079 then one possible expansion of that nonterminal is nothing. A BNF
1080 definition is called a @dfn{production}.
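
As an illustration, the productions below describe a hypothetical
command, @code{ARRANGE}, that is not part of PSPP. @code{ARRANGE},
@code{BY}, @code{A}, @code{D}, the parentheses, and the final @samp{.}
are terminals. @code{arrange-command}, @code{sort-list}, and
@code{direction} are nonterminals, and @code{var-list} is the common
nonterminal described above. Because the first production for
@code{direction} has an empty right side, the direction may be omitted
entirely.

@example
arrange-command ::= ARRANGE BY sort-list .
sort-list ::= var-list direction
direction ::=
direction ::= ( A )
direction ::= ( D )
@end example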
1083 @cindex terminals and nonterminals, differences
1084 So, the key difference between a terminal and a nonterminal is that a
1085 terminal cannot be broken into smaller parts---in fact, every terminal
1086 is a single token (@pxref{Tokens}). On the other hand, nonterminals are
1087 composed of a (possibly empty) sequence of terminals and nonterminals.
1088 Thus, terminals indicate the deepest level of syntax description. (In
parsing theory, terminals are the leaves of the parse tree; nonterminals
form the branches.)
1093 @cindex start symbol
1094 @cindex symbol, start
1095 The first nonterminal defined in a set of productions is called the
@dfn{start symbol}. The start symbol defines the entire syntax for the
command being described.
1099 @setfilename ignored