1 @node Language, Expressions, Invocation, Top
2 @chapter The PSPP language
7 @strong{Please note:} PSPP is not even close to completion.
8 Only a few statistical procedures are implemented. PSPP
12 This chapter discusses elements common to many PSPP commands.
13 Later chapters will describe individual commands in detail.
16 * Tokens:: Characters combine to form tokens.
17 * Commands:: Tokens combine to form commands.
18 * Types of Commands:: Commands come in several flavors.
19 * Order of Commands:: Commands combine to form syntax files.
20 * Missing Observations:: Handling missing observations.
21 * Variables:: The unit of data storage.
22 * Files:: Files used by PSPP.
23 * File Handles:: How files are named.
24 * BNF:: How command syntax is described.
27 @node Tokens, Commands, Language, Language
29 @cindex language, lexical analysis
30 @cindex language, tokens
32 @cindex lexical analysis
34 PSPP divides most syntax file lines into series of short chunks
36 Tokens are then grouped to form commands, each of which tells
37 PSPP to take some action---read in data, write out data, perform
38 a statistical procedure, etc. Each type of token is
44 Identifiers are names that typically specify variables, commands, or
45 subcommands. The first character in an identifier must be a letter,
46 @samp{#}, or @samp{@@}. The remaining characters in the identifier
47 must be letters, digits, or one of the following special characters:
53 @cindex case-sensitivity
54 Identifiers may be any length, but only the first 64 bytes are
55 significant. Identifiers are not case-sensitive: @code{foobar},
56 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
57 different representations of the same identifier.
59 @cindex identifiers, reserved
60 @cindex reserved identifiers
61 Some identifiers are reserved. Reserved identifiers may not be used
62 in any context besides those explicitly described in this manual. The
63 reserved identifiers are:
66 @center ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
70 Keywords are a subclass of identifiers that form a fixed part of
71 command syntax. For example, command and subcommand names are
72 keywords. Keywords may be abbreviated to their first 3 characters if
73 this abbreviation is unambiguous. (Unique abbreviations of 3 or more
74 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
75 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
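
The abbreviation rule can be illustrated outside PSPP. The following
Python sketch is not PSPP's own implementation. The function names
@code{matches_keyword} and @code{resolve_keyword} are hypothetical, and
letting an exact spelling always win over abbreviations is an assumption.

```python
def matches_keyword(token, keyword):
    """Return True if token spells keyword or a valid abbreviation of it."""
    token, keyword = token.upper(), keyword.upper()  # keywords ignore case
    if token == keyword:
        return True
    # An abbreviation needs at least 3 characters and must be a prefix.
    return len(token) >= 3 and keyword.startswith(token)


def resolve_keyword(token, keywords):
    """Return the single keyword that token denotes, or None."""
    exact = [k for k in keywords if k.upper() == token.upper()]
    if exact:
        return exact[0]
    hits = [k for k in keywords if matches_keyword(token, k)]
    # More than one hit means the abbreviation is ambiguous.
    return hits[0] if len(hits) == 1 else None
```

With a keyword list of @code{FREQUENCIES} and @code{DESCRIPTIVES}, the
tokens @code{FRE} and @code{freq} both resolve to @code{FREQUENCIES},
while @code{FR} is too short to resolve to anything.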
77 Reserved identifiers are always used as keywords. Other identifiers
78 may be used both as keywords and as user-defined identifiers, such as
85 Numbers are expressed in decimal. A decimal point is optional.
86 Numbers may be expressed in scientific notation by adding @samp{e} and
87 a base-10 exponent, so that @samp{1.234e3} has the value 1234. Here
88 are some more examples of valid numbers:
91 -5 3.14159265359 1e100 -.707 8945.
94 Negative numbers are expressed with a @samp{-} prefix. However, in
95 situations where a literal @samp{-} token is expected, what appears to
96 be a negative number is treated as @samp{-} followed by a positive
99 No white space is allowed within a number token, except for horizontal
100 white space between @samp{-} and the rest of the number.
102 The last example above, @samp{8945.} will be interpreted as two
103 tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
104 @xref{Commands, , Forming commands of tokens}.
110 @cindex case-sensitivity
111 Strings are literal sequences of characters enclosed in pairs of
112 single quotes (@samp{'}) or double quotes (@samp{"}). To include the
113 character used for quoting in the string, double it, e.g.@:
114 @samp{'it''s an apostrophe'}. White space and case of letters are
115 significant inside strings.
117 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
118 'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
119 splitting a single string across multiple source lines. The maximum
120 length of a string, after concatenation, is 255 characters.
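
As an illustration of quote doubling and concatenation, here is a small
Python sketch. It is not PSPP code: @code{unquote} and
@code{concatenate} are hypothetical helper names, each argument is
assumed to be one complete quoted literal, and the sketch does not check
for stray single quotes inside the body.

```python
MAX_STRING_LENGTH = 255  # limit stated in the text, after concatenation


def unquote(literal):
    """Strip the enclosing quotes and collapse doubled quote characters."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal: " + literal)
    quote = literal[0]
    body = literal[1:-1]
    # A doubled quoting character stands for one literal occurrence.
    return body.replace(quote * 2, quote)


def concatenate(*literals):
    """Join literals as the + operator would and enforce the length limit."""
    value = "".join(unquote(lit) for lit in literals)
    if len(value) > MAX_STRING_LENGTH:
        raise ValueError("string longer than 255 characters")
    return value
```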
122 Strings may also be expressed as hexadecimal, octal, or binary
123 character values by prefixing the initial quote character by @samp{X},
124 @samp{O}, or @samp{B} or their lowercase equivalents. Each pair,
125 triplet, or octet of characters, according to the radix, is
126 transformed into a single character with the given value. If there is
127 an incomplete group of characters, the missing final digits are
128 assumed to be @samp{0}. These forms of strings are nonportable
129 because numeric values are associated with different characters by
130 different operating systems. Therefore, their use should be confined
131 to syntax files that will not be widely distributed.
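
The grouping and padding rule can be sketched in Python as follows.
This is an illustration only: @code{decode_radix_string} is a
hypothetical name, the result is returned as raw bytes, and rejecting
octal groups above 255 is an assumption that the text does not address.

```python
# Prefix letter mapped to (radix, digits per character).
GROUPS = dict(X=(16, 2), O=(8, 3), B=(2, 8))


def decode_radix_string(prefix, digits):
    """Decode the digits of an X, O, or B string into bytes."""
    base, size = GROUPS[prefix.upper()]  # lowercase prefixes are equivalent
    remainder = len(digits) % size
    if remainder:
        # Missing final digits of an incomplete group are taken to be 0.
        digits = digits + "0" * (size - remainder)
    out = bytearray()
    for start in range(0, len(digits), size):
        value = int(digits[start:start + size], base)
        if value > 255:
            raise ValueError("character value out of range")
        out.append(value)
    return bytes(out)
```

Under these assumptions, the hexadecimal digits @code{4142} decode to
the two characters with values 65 and 66.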
133 @cindex characters, reserved
136 The character with value 00 is reserved for
137 internal use by PSPP. Its use in strings causes an error and
138 replacement by a space character.
140 @item Punctuators and Operators
143 These tokens are the punctuators and operators:
146 @center , / = ( ) + - * / ** < <= <> > >= ~= & | .
149 Most of these appear within the syntax of commands, but the period
150 (@samp{.}) punctuator is used only at the end of a command. It is a
151 punctuator only as the last character on a line (except white space).
152 When it is the last non-space character on a line, a period is not
153 treated as part of another token, even if it would otherwise be part
154 of, e.g.@:, an identifier or a floating-point number.
156 Actually, the character that ends a command can be changed with
157 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
158 doing so. Throughout the remainder of this manual we will assume that
159 the default setting is in effect.
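
The end-of-line rule for the period can be sketched in Python. The
function name @code{split_end_of_command} is hypothetical, and its
@code{endcmd} parameter merely stands in for the ENDCMD setting; this is
not how PSPP itself is written.

```python
def split_end_of_command(line, endcmd="."):
    """Return (text, ended) for one line of syntax.

    The terminator counts only when it is the last character on the
    line apart from trailing white space.
    """
    stripped = line.rstrip()
    if stripped.endswith(endcmd):
        # The terminator is its own token, never part of the one before it.
        return stripped[:-len(endcmd)], True
    return stripped, False
```

For a line ending in @code{8945.} the function returns the text up to
and including @code{8945} and reports that the command has ended, whereas
@code{8945.5} is left intact.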
162 @node Commands, Types of Commands, Tokens, Language
163 @section Forming commands of tokens
165 @cindex PSPP, command structure
166 @cindex language, command structure
167 @cindex commands, structure
169 Most PSPP commands share a common structure. A command begins with a
170 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
171 CASES}. The command name may be abbreviated to its first word, and
172 each word in the command name may be abbreviated to its first three
173 or more characters, where these abbreviations are unambiguous.
175 The command name may be followed by one or more @dfn{subcommands}.
176 Each subcommand begins with a subcommand name, which may be
177 abbreviated to its first three letters. Some subcommands accept a
178 series of one or more specifications, which follow the subcommand
179 name, optionally separated from it by an equals sign
180 (@samp{=}). Specifications may be separated from each other
181 by commas or spaces. Each subcommand must be separated from the next (if any)
182 by a forward slash (@samp{/}).
184 There are multiple ways to mark the end of a command. The most common
185 way is to end the last line of the command with a period (@samp{.}) as
186 described in the previous section (@pxref{Tokens}). A blank line, or
187 one that consists only of white space or comments, also ends a command
188 by default, although you can use the NULLINE subcommand of @cmd{SET}
189 to disable this feature (@pxref{SET}).
191 In batch mode only, that is, when reading commands from a file instead
192 of an interactive user, any line that contains a non-space character
193 in the leftmost column begins a new command. Thus, each command
194 consists of a flush-left line followed by any number of lines indented
195 from the left margin. In this mode, a plus sign, minus sign, or
196 period (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character
197 in a line is ignored and causes that line to begin a new command,
198 which allows for visual indentation of a command without that command
199 being considered part of the previous command.
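
The batch-mode rule can be illustrated with a short Python sketch. It
is a simplification rather than PSPP's parser: the name
@code{split_batch_commands} is hypothetical, blank lines are simply
skipped, and terminating periods are left in place.

```python
def split_batch_commands(lines):
    """Group syntax lines into commands using the flush-left rule."""
    commands = []
    for line in lines:
        if not line.strip():
            continue  # blank-line handling is not modeled here
        starts_new = not line[0].isspace()
        if starts_new and line[0] in "+-.":
            # The marker is ignored, but the line still begins a command.
            line = " " + line[1:]
        if starts_new or not commands:
            commands.append([line.strip()])
        else:
            commands[-1].append(line.strip())  # indented continuation line
    return [" ".join(parts) for parts in commands]
```

Given the lines @code{FREQUENCIES}, an indented @code{/VARIABLES=x y.},
and @code{+  LIST.}, the sketch produces two commands: the first two
lines joined together, and @code{LIST.} on its own.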
201 Sometimes, one encounters syntax files that are intended to be
202 interpreted in interactive mode rather than batch mode. When this
203 occurs, use the @samp{-i} command line option to force interpretation
204 in interactive mode (@pxref{Language control options}).
206 @node Types of Commands, Order of Commands, Commands, Language
207 @section Types of Commands
209 Commands in PSPP are divided roughly into six categories:
212 @item Utility commands
213 @cindex utility commands
214 Set or display various global options that affect PSPP operations.
215 May appear anywhere in a syntax file. @xref{Utilities, , Utility
218 @item File definition commands
219 @cindex file definition commands
220 Give instructions for reading data from text files or from special
221 binary ``system files''. Most of these commands replace any previous
222 data or variables with new data or
223 variables. At least one file definition command must appear before the first command in any of
224 the categories below. @xref{Data Input and Output}.
226 @item Input program commands
227 @cindex input program commands
228 Though rarely used, these provide tools for reading data files
229 in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
231 @item Transformations
232 @cindex transformations
233 Perform operations on data and write data to output files. Transformations
234 are not carried out until a procedure is executed.
236 @item Restricted transformations
237 @cindex restricted transformations
238 Transformations that cannot appear in certain contexts. @xref{Order
239 of Commands}, for details.
243 Analyze data, writing results of analyses to the listing file. Cause
244 transformations specified earlier in the file to be performed. In a
245 more general sense, a @dfn{procedure} is any command that causes the
246 active file (the data) to be read.
249 @node Order of Commands, Missing Observations, Types of Commands, Language
250 @section Order of Commands
251 @cindex commands, ordering
252 @cindex order of commands
254 PSPP does not place many restrictions on ordering of commands. The
255 main restriction is that variables must be defined before they are otherwise
256 referenced. This section describes the details of command ordering,
257 but most users will have no need to refer to them.
259 PSPP possesses five internal states, called initial, INPUT PROGRAM,
260 FILE TYPE, transformation, and procedure states. (Please note the
261 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
262 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
264 PSPP starts in the initial state. Each successful completion
265 of a command may cause a state transition. Each type of command has its
266 own rules for state transitions:
269 @item Utility commands
274 Do not cause state transitions. Exception: when @cmd{N OF CASES}
275 is executed in the procedure state, it causes a transition to the
276 transformation state.
279 @item @cmd{DATA LIST}
284 When executed in the initial or procedure state, causes a transition to
285 the transformation state.
287 Clears the active file if executed in the procedure or transformation
291 @item @cmd{INPUT PROGRAM}
294 Invalid in INPUT PROGRAM and FILE TYPE states.
296 Causes a transition to the INPUT PROGRAM state.
298 Clears the active file.
301 @item @cmd{FILE TYPE}
304 Invalid in INPUT PROGRAM and FILE TYPE states.
306 Causes a transition to the FILE TYPE state.
308 Clears the active file.
311 @item Other file definition commands
314 Invalid in INPUT PROGRAM and FILE TYPE states.
316 Cause a transition to the transformation state.
318 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
322 @item Transformations
325 Invalid in initial and FILE TYPE states.
327 Cause a transition to the transformation state.
330 @item Restricted transformations
333 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
335 Cause a transition to the transformation state.
341 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
343 Cause a transition to the procedure state.
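
The transition rules above can be summarized as a small state machine.
The Python sketch below is an illustration, not PSPP's implementation.
The state and category labels are hypothetical spellings, the last group
of rules is taken to describe procedures, and @cmd{DATA LIST} is assumed
to leave the state unchanged when it is executed outside the initial and
procedure states.

```python
# States in which each category of command is invalid.
INVALID_IN = dict(
    utility=(),
    data_list=(),
    input_program=("input_program", "file_type"),
    file_type=("input_program", "file_type"),
    file_definition=("input_program", "file_type"),
    transformation=("initial", "file_type"),
    restricted=("initial", "input_program", "file_type"),
    procedure=("initial", "input_program", "file_type"),
)

# State entered after a command of each category completes.
TARGET = dict(
    input_program="input_program",
    file_type="file_type",
    file_definition="transformation",
    transformation="transformation",
    restricted="transformation",
    procedure="procedure",
)


def next_state(state, category, name=""):
    """Return the state that follows a successful command."""
    if state in INVALID_IN[category]:
        raise ValueError(category + " is invalid in the " + state + " state")
    if category == "utility":
        # N OF CASES is the one utility command that changes state.
        if name.upper() == "N OF CASES" and state == "procedure":
            return "transformation"
        return state
    if category == "data_list":
        if state in ("initial", "procedure"):
            return "transformation"
        return state  # assumption: no transition from other states
    return TARGET[category]
```

Starting from @code{"initial"}, a @cmd{DATA LIST} followed by a
transformation and a procedure passes through the transformation state
and ends in the procedure state.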
347 @node Missing Observations, Variables, Order of Commands, Language
348 @section Handling missing observations
349 @cindex missing values
350 @cindex values, missing
352 PSPP includes special support for unknown numeric data values.
353 Missing observations are assigned a special value, called the
354 @dfn{system-missing value}. This ``value'' actually indicates the
355 absence of a value; it means that the actual value is unknown. Procedures
356 automatically exclude from analyses those observations or cases that
357 have missing values. Details of missing value exclusion depend on the
358 procedure and can often be controlled by the user; refer to
359 descriptions of individual procedures for details.
361 The system-missing value exists only for numeric variables. String
362 variables always have a defined value, even if it is only a string of
365 Variables, whether numeric or string, can have designated
366 @dfn{user-missing values}. Every user-missing value is an actual value
367 for that variable. However, most of the time user-missing values are
368 treated in the same way as the system-missing value. String variables
369 that are wider than a certain width, usually 8 characters (depending on
370 computer architecture), cannot have user-missing values.
372 For more information on missing values, see the following sections:
373 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
374 documentation on individual procedures for information on how they
375 handle missing values.
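
To make the distinction concrete, the following Python sketch computes a
mean while excluding missing observations. It is only an analogy:
@code{SYSMIS} is represented by Python's @code{None}, the helper names
are hypothetical, and returning the system-missing value when no valid
observation remains is an assumption.

```python
SYSMIS = None  # stand-in for the system-missing value


def is_missing(value, user_missing=()):
    """True for the system-missing value or any user-missing value."""
    return value is SYSMIS or value in user_missing


def mean_excluding_missing(values, user_missing=()):
    """Average the observations that are not missing."""
    valid = [v for v in values if not is_missing(v, user_missing)]
    if not valid:
        return SYSMIS  # assumption: no valid cases yields system-missing
    return sum(valid) / len(valid)
```

Here a user-missing value such as 99 is an actual data value, yet it is
dropped from the computation in the same way as @code{SYSMIS}.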
377 @node Variables, Files, Missing Observations, Language
382 Variables are the basic unit of data storage in PSPP. All the
383 variables in a file taken together, apart from any associated data, are
384 said to form a @dfn{dictionary}.
385 Some details of variables are described in the sections below.
388 * Attributes:: Attributes of variables.
389 * System Variables:: Variables automatically defined by PSPP.
390 * Sets of Variables:: Lists of variable names.
391 * Input/Output Formats:: Input and output formats.
392 * Scratch Variables:: Variables deleted by procedures.
395 @node Attributes, System Variables, Variables, Variables
396 @subsection Attributes of Variables
397 @cindex variables, attributes of
398 @cindex attributes of variables
399 Each variable has a number of attributes, including:
403 An identifier, up to 64 bytes long. Each variable must have a different name.
406 Some system variable names begin with @samp{$}, but user-defined
407 variables' names may not begin with @samp{$}.
411 @cindex variable names, ending with period
412 The final character in a variable name should not be @samp{.}, because
413 such an identifier will be misinterpreted when it is the final token
414 on a line: @code{FOO.} will be divided into two separate tokens,
415 @samp{FOO} and @samp{.}, indicating end-of-command. @xref{Tokens}.
418 The final character in a variable name should not be @samp{_}, because
419 some such identifiers are used for special purposes by PSPP
422 As with all PSPP identifiers, variable names are not case-sensitive.
423 PSPP capitalizes variable names on output the same way they were
424 capitalized at their point of definition in the input.
426 @cindex variables, type
427 @cindex type of variables
431 @cindex variables, width
432 @cindex width of variables
434 (string variables only) String variables with a width of 8 characters or
435 fewer are called @dfn{short string variables}. Short string variables
436 can be used in many procedures where @dfn{long string variables} (those
437 with widths greater than 8) are not allowed.
439 Certain systems may consider strings longer than 8
440 characters to be short strings. Eight characters represents a minimum
441 figure for the maximum length of a short string.
444 Variables in the dictionary are arranged in a specific order.
445 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
448 Either reinitialized to 0 or spaces for each case, or left at its
449 existing value. @xref{LEAVE}.
451 @cindex missing values
452 @cindex values, missing
454 Optionally, up to three values, or a range of values, or a specific
455 value plus a range, can be specified as @dfn{user-missing values}.
456 There is also a @dfn{system-missing value} that is assigned to an
457 observation when there is no other obvious value for that observation.
458 Observations with missing values are automatically excluded from
459 analyses. User-missing values are actual data values, while the
460 system-missing value is not a value at all. @xref{Missing Observations}.
462 @cindex variable labels
463 @cindex labels, variable
465 A string that describes the variable. @xref{VARIABLE LABELS}.
468 @cindex labels, value
470 Optionally, these associate each possible value of the variable with a
471 string. @xref{VALUE LABELS}.
475 Display width, format, and (for numeric variables) number of decimal
476 places. This attribute does not affect how data are stored, just how
477 they are displayed. Example: a width of 8, with 2 decimal places.
478 @xref{PRINT FORMATS}.
482 Similar to print format, but used by certain commands that are
483 designed to write to binary files. @xref{WRITE FORMATS}.
486 @node System Variables, Sets of Variables, Attributes, Variables
487 @subsection Variables Automatically Defined by PSPP
488 @cindex system variables
489 @cindex variables, system
491 There are seven system variables. These are not like ordinary
492 variables because system variables are not always stored. They can be used only
493 in expressions. These system variables, whose values and output formats
494 cannot be modified, are described below.
497 @cindex @code{$CASENUM}
499 Case number of the current case. This changes as cases are
504 Date the PSPP process was started, in format A9, following the
505 pattern @code{DD MMM YY}.
507 @cindex @code{$JDATE}
509 Number of days between 15 Oct 1582 and the time the PSPP process
512 @cindex @code{$LENGTH}
514 Page length, in lines, in format F11.
516 @cindex @code{$SYSMIS}
518 System missing value, in format F1.
522 Number of seconds between midnight 14 Oct 1582 and the time the active file
523 was read, in format F20.
525 @cindex @code{$WIDTH}
527 Page width, in characters, in format F3.
530 @node Sets of Variables, Input/Output Formats, System Variables, Variables
531 @subsection Lists of variable names
532 @cindex TO convention
533 @cindex convention, TO
535 To refer to a set of variables, list their names one after another.
536 Optionally, their names may be separated by commas. To include a
537 range of variables from the dictionary in the list, write the name of
538 the first and last variable in the range, separated by @code{TO}. For
539 instance, if the dictionary contains six variables with the names
540 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
541 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
542 variables @code{X2}, @code{GOAL}, and @code{MET}.
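
Selecting a range from the dictionary amounts to slicing an ordered
list. The Python sketch below illustrates this; @code{expand_range} is
a hypothetical name, and raising an error when the first name follows
the last in dictionary order is an assumption.

```python
def expand_range(dictionary, first, last):
    """Return the variables from first TO last in dictionary order."""
    names = [name.upper() for name in dictionary]  # names ignore case
    start = names.index(first.upper())
    stop = names.index(last.upper())
    if start > stop:
        raise ValueError("first variable must not follow the last one")
    return dictionary[start:stop + 1]
```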
544 Commands that define variables, such as @cmd{DATA LIST}, give
545 @code{TO} an alternate meaning. With these commands, @code{TO} defines
546 sequences of variables whose names end in consecutive integers. The
547 syntax is two identifiers that begin with the same root and end with
548 numbers, separated by @code{TO}. The syntax @code{X1 TO X5} defines 5
549 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
550 @code{X5}. The syntax @code{ITEM0008 TO ITEM0013} defines 6
551 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
552 @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}. The syntaxes
553 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
555 After a set of variables has been defined with @cmd{DATA LIST} or
556 another command with this method, the same set can be referenced on
557 later commands using the same syntax.
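
The defining form of @code{TO} can be sketched in Python. The
validation rule used here is an assumption chosen to agree with the
examples above, not a statement of PSPP's exact rule: numbers are
zero-padded to the width of the first suffix, and the last generated
name must match the second identifier. The name @code{define_range} is
hypothetical.

```python
import re

SUFFIX = re.compile(r"^(.*?)(\d+)$")  # root, then trailing digits


def define_range(first, last):
    """Expand a defining range such as X1 TO X5 into variable names."""
    m1, m2 = SUFFIX.match(first), SUFFIX.match(last)
    if not m1 or not m2:
        raise ValueError("both names must end in a number")
    root1, digits1 = m1.groups()
    root2, digits2 = m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("names must share the same root")
    low, high = int(digits1), int(digits2)
    if low > high:
        raise ValueError("numbers must not decrease")
    width = len(digits1)  # keep the zero padding of the first name
    names = [root1 + str(n).zfill(width) for n in range(low, high + 1)]
    if names[-1].upper() != last.upper():
        raise ValueError("inconsistent number of digits")
    return names
```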
559 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
560 @subsection Input and Output Formats
562 Data that PSPP inputs and outputs must have one of a number of formats.
563 These formats are described, in general, by a format specification of
564 the form @code{NAMEw.d}, where @var{name} is the
565 format name and @var{w} is a field width. @var{d} is the optional
566 desired number of decimal places, if appropriate. If @var{d} is not
567 included then it is assumed to be 0. Some formats do not allow @var{d}
570 When @cmd{DATA LIST} or another command specifies an input format,
571 that format is converted to an output format for the purposes of
572 @cmd{PRINT} and other data output commands. For most purposes, input
573 and output formats are the same; the salient differences are described
576 Below are listed the input and output formats supported by PSPP. If an
577 input format is mapped to a different output format by default, then
578 that mapping is indicated with @result{}. Each format has the listed
579 bounds on input width (iw) and output width (ow).
581 The standard numeric input and output formats are given in the following
585 @item Fw.d: 1 <= iw,ow <= 40
586 Standard decimal format with @var{d} decimal places. If the number is
587 too large to fit within the field width, it is expressed in scientific
588 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
589 the exponent. When used as an input format, scientific notation is
590 allowed but an E or an F must be used to introduce the exponent.
592 The default output format is the same as the input format, except if
593 @var{d} > 1. In that case the output @var{w} is always made to be at
596 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
597 For input this is equivalent to F format except that no E or F is
598 required to introduce the exponent. For output, produces scientific
599 notation in the form @code{1.2+34}. There are always at least two
600 digits given in the exponent.
602 The default output @var{w} is the largest of the input @var{w}, the
603 input @var{d} + 7, and 10. The default output @var{d} is the input
604 @var{d}, but at least 3.
606 @item COMMAw.d: 1 <= iw,ow <= 40
607 Equivalent to F format, except that groups of three digits are
608 comma-separated on output. If the number is too large to express in the
609 field width, then first commas are eliminated, then if there is still
610 not enough space the number is expressed in scientific notation given
611 that w >= 6. Commas are allowed and ignored when this is used as an
614 @item DOTw.d: 1 <= iw,ow <= 40
615 Equivalent to COMMA format except that the roles of comma and decimal
616 point are interchanged. However, if SET /DECIMAL=DOT is in effect, then
617 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
620 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
621 Equivalent to COMMA format, except that the number is prefixed by a
622 dollar sign (@samp{$}) if there is room. On input the value is allowed
623 to be prefixed by a dollar sign, which is ignored.
625 The default output @var{w} is the input @var{w}, but at least 2.
627 @item PCTw.d: 2 <= iw,ow <= 40
628 Equivalent to F format, except that the number is suffixed by a percent
629 sign (@samp{%}) if there is room. On input the value is allowed to be
630 suffixed by a percent sign, which is ignored.
632 The default output @var{w} is the input @var{w}, but at least 2.
634 @item Nw.d: 1 <= iw,ow <= 40
635 Only digits are allowed within the field width. The decimal point is
636 assumed to be @var{d} digits from the right margin.
638 The default output format is F with the same @var{w} and @var{d}, except
639 if @var{d} > 1. In that case the output @var{w} is always made to be at
642 @item Zw.d @result{} F: 1 <= iw,ow <= 40
643 Zoned decimal input. If you need to use this then you know how.
645 @item IBw.d @result{} F: 1 <= iw,ow <= 8
646 Integer binary format. The field is interpreted as a fixed-point
647 positive or negative binary number in two's-complement notation. The
648 location of the decimal point is implied. Endianness is the same as the
651 The default output format is F8.2 if @var{d} is 0. Otherwise it is F,
652 with output @var{w} as 9 + input @var{d} and output @var{d} as input
655 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
656 Positive integer binary format. The field is interpreted as a
657 fixed-point positive binary number. The location of the decimal point
658 is implied. Endianness is the same as the host machine.
660 The default output format follows the rules for IB format.
662 @item Pw.d @result{} F: 1 <= iw,ow <= 16
663 Binary coded decimal format. Each byte from left to right, except the
664 rightmost, represents two digits. The upper nibble of each byte is more
665 significant. The upper nibble of the final byte is the least
666 significant digit. The lower nibble of the final byte is the sign; a
667 value of D represents a negative sign and all other values are
668 considered positive. The decimal point is implied.
670 The default output format follows the rules for IB format.
672 @item PKw.d @result{} F: 1 <= iw,ow <= 16
673 Positive binary coded decimal format. Same as P but the last byte is the
676 The default output format follows the rules for IB format.
678 @item RBw @result{} F: 2 <= iw,ow <= 8
680 Binary C architecture-dependent ``double'' format. For a standard
681 IEEE754 implementation @var{w} should be 8.
683 The default output format follows the rules for IB format.
685 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
686 PIB format encoded as textual hex digit pairs. @var{w} must be even.
688 The input width is mapped to a default output width as follows:
689 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
690 12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for
693 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
695 RB format encoded as textual hex digit pairs. @var{w} must be even.
697 The default output format is F8.2.
699 @item CCAw.d: 1 <= ow <= 40
700 @itemx CCBw.d: 1 <= ow <= 40
701 @itemx CCCw.d: 1 <= ow <= 40
702 @itemx CCDw.d: 1 <= ow <= 40
703 @itemx CCEw.d: 1 <= ow <= 40
705 User-defined custom currency formats. May not be used as an input
706 format. @xref{SET}, for more details.
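
Some of the default output rules above are simple arithmetic. As an
illustration, the Python sketch below applies the rules stated for the
E, DOLLAR, and PCT formats only. The function name
@code{default_output_format} is hypothetical, and the other formats are
deliberately left out.

```python
def default_output_format(name, w, d=0):
    """Map an input format to its default output (name, w, d)."""
    name = name.upper()
    if name == "E":
        # w: largest of input w, input d + 7, and 10; d: at least 3.
        return name, max(w, d + 7, 10), max(d, 3)
    if name in ("DOLLAR", "PCT"):
        # Output w is the input w, but at least 2.
        return name, max(w, 2), d
    raise NotImplementedError("rule not covered by this sketch: " + name)
```

For example, an input format of E5.1 yields a default output format of
E10.3 under these rules.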
709 The date and time numeric input and output formats accept a number of
710 possible formats. Before describing the formats themselves, some
711 definitions of the elements that make up their formats will be helpful:
715 All formats accept an optional white space leader.
718 An integer between 1 and 31 representing the day of month.
721 An integer representing a number of days.
724 One or more characters of white space or the following characters:
728 A month name in one of the following forms:
731 An integer between 1 and 12.
733 Roman numerals representing an integer between 1 and 12.
735 At least the first three characters of an English month name (January,
740 An integer year number between 1582 and 19999, or between 1 and 199.
741 Years between 1 and 199 will have 1900 added.
744 A single number with a year number in the first 2, 3, or 4 digits (as
745 above) and the day number within the year in the last 3 digits.
748 An integer between 1 and 4 representing a quarter.
751 The letter @samp{Q} or @samp{q}.
754 An integer between 1 and 53 representing a week within a year.
757 The letters @samp{wk} in any case.
760 At least one character of white space or @samp{:} or @samp{.}.
763 An integer greater than 0 representing an hour.
766 An integer between 0 and 59 representing a minute within an hour.
769 Optionally, a time-delimiter followed by a real number representing a
773 An integer between 0 and 23 representing an hour within a day.
776 At least the first two characters of an English day word.
779 Any amount or no amount of white space.
782 An optional positive or negative sign.
785 All formats accept an optional white space trailer.
788 The date input formats are strung together from the above pieces. On
789 output, the date formats are always printed in a single canonical
790 manner, based on field width. The date input and output formats are
794 @item DATEw: 9 <= iw,ow <= 40
795 Date format. Input format: leader + day + date-delimiter +
796 month + date-delimiter + year + trailer. Output format: DD-MMM-YY for
797 @var{w} < 11, DD-MMM-YYYY otherwise.
799 @item EDATEw: 8 <= iw,ow <= 40
800 European date format. Input format same as DATE. Output format:
801 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
803 @item SDATEw: 8 <= iw,ow <= 40
804 Standard date format. Input format: leader + year + date-delimiter +
805 month + date-delimiter + day + trailer. Output format: YY/MM/DD for
806 @var{w} < 10, YYYY/MM/DD otherwise.
808 @item ADATEw: 8 <= iw,ow <= 40
809 American date format. Input format: leader + month + date-delimiter +
810 day + date-delimiter + year + trailer. Output format: MM/DD/YY for
811 @var{w} < 10, MM/DD/YYYY otherwise.
813 @item JDATEw: 5 <= iw,ow <= 40
814 Julian date format. Input format: leader + julian + trailer. Output
815 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
817 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
818 Quarter/year format. Input format: leader + quarter + q-delimiter +
819 year + trailer. Output format: @samp{Q Q YY}, where the first
820 @samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
823 @item MOYRw: 6 <= iw,ow <= 40
824 Month/year format. Input format: leader + month + date-delimiter + year
825 + trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
828 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
829 Week/year format. Input format: leader + week + wk-delimiter + year +
830 trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
833 @item DATETIMEw.d: 17 <= iw,ow <= 40
834 Date and time format. Input format: leader + day + date-delimiter +
835 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
836 + minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
837 @var{w} > 19 then seconds @samp{:SS} are added. If @var{w} > 22 and
838 @var{d} > 0 then fractional seconds @samp{.SS} are added.
840 @item TIMEw.d: 5 <= iw,ow <= 40
841 Time format. Input format: leader + sign + spaces + hour +
842 time-delimiter + minute + opt-second. Output format: @samp{HH:MM}.
843 Seconds and fractional seconds are available with @var{w} of at least 8
844 and 10, respectively.
846 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
847 Time format with day count. Input format: leader + sign + spaces +
848 day-count + time-delimiter + hour + time-delimiter + minute +
849 opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional
850 seconds are available with @var{w} of at least 8 and 10, respectively.
852 @item WKDAYw: 2 <= iw,ow <= 40
853 A weekday as a number between 1 and 7, where 1 is Sunday. Input format:
854 leader + weekday + trailer. Output format: as many characters, in all
855 capital letters, of the English name of the weekday as will fit in the
858 @item MONTHw: 3 <= iw,ow <= 40
859 A month as a number between 1 and 12, where 1 is January. Input format:
860 leader + month + trailer. Output format: as many characters, in all
861 capital letters, of the English name of the month as will fit in the
865 There are only two formats that may be used with string variables:
868 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
869 The entire field is treated as a string value.
871 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
872 The field is composed of characters in a string encoded as textual hex
875 The default output @var{w} is half the input @var{w}.
878 @node Scratch Variables, , Input/Output Formats, Variables
879 @subsection Scratch Variables
881 Most of the time, variables don't retain their values between cases.
882 Instead, either they're being read from a data file or the active file,
883 in which case they assume the value read, or, if created with
885 another transformation, they're initialized to the system-missing value
886 or to blanks, depending on type.
888 However, sometimes it's useful to have a variable that keeps its value
889 between cases. You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
890 use a @dfn{scratch variable}. Scratch variables are variables whose
891 names begin with an octothorpe (@samp{#}).
893 Scratch variables have the same properties as variables left with
894 @cmd{LEAVE}: they retain their values between cases, and for the first
895 case they are initialized to 0 or blanks. They have the additional
896 property that they are deleted before the execution of any procedure.
897 For this reason, scratch variables can't be used for analysis. To use
898 a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
899 to copy its value into an ordinary variable, then use that ordinary
900 variable in the analysis.
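As a minimal sketch of this pattern, the following syntax accumulates a
running total in a scratch variable and copies it into an ordinary
variable that a procedure can use. The variable names @code{x},
@code{#total}, and @code{running} are hypothetical, and the sketch
assumes that @cmd{COMPUTE} creates @code{#total} on first use,
initialized to 0 as described above.

@example
DATA LIST LIST /x.
BEGIN DATA.
10
20
30
END DATA.
* #total keeps its value from one case to the next.
COMPUTE #total = #total + x.
* Copy the scratch value into an ordinary variable.
COMPUTE running = #total.
LIST.
@end example

Under those assumptions, @code{running} is intended to hold the
cumulative sum of @code{x}. @code{#total} itself should not appear in
the listing, because scratch variables are deleted before @cmd{LIST}
runs.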
903 @section Files Used by PSPP
905 PSPP makes use of many files each time it runs. Some of these it
906 reads, some it writes, some it creates. Here is a table listing the
907 most important of these files:
910 @cindex file, command
911 @cindex file, syntax file
916 These names (synonyms) refer to the file that contains instructions
917 that tell PSPP what to do. The syntax file's name is specified on
918 the PSPP command line. Syntax files can also be read with
919 @cmd{INCLUDE} (@pxref{INCLUDE}).
924 Data files contain raw data in text or binary format. Data can also
925 be embedded in a syntax file with @cmd{BEGIN DATA} and @cmd{END DATA}.
930 One or more output files are created by PSPP each time it is
931 run. The output files receive the tables and charts produced by
932 statistical procedures. The output files may be in any number of formats,
933 depending on how PSPP is configured.
938 The active file is the ``file'' on which all PSPP procedures are
939 performed. The active file consists of a dictionary and a set of cases.
940 The active file is not necessarily a disk file: it is stored in memory
946 System files are binary files that store a dictionary and a set of
947 cases. @cmd{GET} and @cmd{SAVE} read and write system files.
949 @cindex portable file
950 @cindex file, portable
952 Portable files are files in a text-based format that store a dictionary
953 and a set of cases. @cmd{IMPORT} and @cmd{EXPORT} read and write
957 @cindex file, scratch
959 Scratch files consist of a dictionary and cases and may be stored in
960 memory or on disk. Most procedures that act on a system file or
961 portable file can use a scratch file instead. The contents of scratch
962 files persist within a single PSPP session only. @cmd{GET} and
963 @cmd{SAVE} can be used to read and write scratch files. Scratch files
964 are a PSPP extension.
968 @section File Handles
971 A @dfn{file handle} is a reference to a data file, system file, portable
972 file, or scratch file. Most often, a file handle is specified as the
973 name of a file as a string, that is, enclosed within @samp{'} or
976 PSPP also supports declaring named file handles with the @cmd{FILE
977 HANDLE} command. This command associates an identifier of your choice
978 (the file handle's name) with a file. Later, the file handle name can
979 be substituted for the name of the file. When PSPP syntax accesses a
980 file multiple times, declaring a named file handle simplifies updating
981 the syntax later to use a different file. Use of @cmd{FILE HANDLE} is
982 also required to read data files in binary formats. @xref{FILE HANDLE},
983 for more information.
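For illustration, the sketch below reads the same data file first by
naming it as a string and then through a named file handle. The file
name @file{survey.dat}, the handle name @code{survey}, and the variable
layout are assumptions made up for this example.

@example
* File named directly as a string.
DATA LIST FILE='survey.dat' /id 1-4 age 6-7.
* The same file through a named file handle.
FILE HANDLE survey /NAME='survey.dat'.
DATA LIST FILE=survey /id 1-4 age 6-7.
@end example

With the named handle, pointing the syntax at a different file means
changing only the @cmd{FILE HANDLE} line.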
985 PSPP assumes that a file handle name that begins with @samp{#} refers to
986 a scratch file, unless the name has already been declared on @cmd{FILE
987 HANDLE} to refer to another kind of file. A scratch file is similar to
988 a system file, except that it persists only for the duration of a given
989 PSPP session. Most commands that read or write a system or portable
990 file, such as @cmd{GET} and @cmd{SAVE}, also accept scratch file
991 handles. Scratch file handles may also be declared explicitly with
992 @cmd{FILE HANDLE}. Scratch files are a PSPP extension.
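A minimal sketch of a scratch file in use, assuming an active file
already exists and that the hypothetical handle name @code{#snapshot}
has not been declared on @cmd{FILE HANDLE}:

@example
* Write the active file to a scratch file.
SAVE OUTFILE=#snapshot.
* Later in the same session, read it back.
GET FILE=#snapshot.
@end example

Because the handle name begins with @samp{#}, no declaration is needed,
and the saved cases are not expected to outlive the session.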
994 In some circumstances, PSPP must distinguish whether a file handle
995 refers to a system file or a portable file. When this is necessary to
996 read a file, e.g.@: as an input file for @cmd{GET} or @cmd{MATCH FILES},
997 PSPP uses the file's contents to decide. In the context of writing a
998 file, e.g.@: as an output file for @cmd{SAVE} or @cmd{AGGREGATE}, PSPP
999 decides based on the file's name: if it ends in @samp{.por} (with any
1000 capitalization), then PSPP writes a portable file; otherwise, PSPP
1001 writes a system file.
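The naming rule can be sketched as follows. The file names
@file{results.por} and @file{results.sav} are hypothetical, and an
active file is assumed to exist.

@example
* Name ends in .por, so a portable file is written.
SAVE OUTFILE='results.por'.
* Any other name produces a system file.
SAVE OUTFILE='results.sav'.
* When reading, the file's contents decide, not its name.
GET FILE='results.por'.
@end example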
1003 INLINE is reserved as a file handle name. It refers to the ``data
1004 file'' embedded into the syntax file between @cmd{BEGIN DATA} and
1005 @cmd{END DATA}. @xref{BEGIN DATA}, for more information.
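As a small sketch, the following syntax names @code{INLINE} explicitly
as the data source. The variable names and column positions are
assumptions chosen for the example.

@example
DATA LIST FILE=INLINE /x 1-2 y 4-5.
BEGIN DATA.
12 34
56 78
END DATA.
LIST.
@end example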
1007 The file to which a file handle refers may be reassigned on a later
1008 @cmd{FILE HANDLE} command if it is first closed using @cmd{CLOSE FILE
1009 HANDLE}. The @cmd{CLOSE FILE HANDLE} command is also useful to free the
1010 storage associated with a scratch file. @xref{CLOSE FILE HANDLE}, for
1014 @section Backus-Naur Form
1016 @cindex Backus-Naur Form
1017 @cindex command syntax, description of
1018 @cindex description of command syntax
1020 The syntax of some parts of the PSPP language is presented in this
1021 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
1022 following table describes BNF:
1028 Words in all-uppercase are PSPP keyword tokens. In BNF, these are
1029 often called @dfn{terminals}. There are some special terminals, which
1030 are written in lowercase for clarity:
1033 @cindex @code{number}
1037 @cindex @code{integer}
1038 @item @code{integer}
1041 @cindex @code{string}
1045 @cindex @code{var-name}
1046 @item @code{var-name}
1047 A single variable name.
1051 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
1052 Operators and punctuators.
1056 The end of the command. This is not necessarily an actual dot in the
1057 syntax file. @xref{Commands}, for more details.
1062 @cindex nonterminals
1063 Other words in all lowercase refer to BNF definitions, called
1064 @dfn{productions}. These productions are also known as
1065 @dfn{nonterminals}. Some nonterminals are very common, so they are
1066 defined here in English for clarity:
1069 @cindex @code{var-list}
1071 A list of one or more variable names or the keyword @code{ALL}.
1073 @cindex @code{expression}
1075 An expression. @xref{Expressions}, for details.
1079 @cindex ``is defined as''
1081 @samp{::=} means ``is defined as''. The left side of @samp{::=} gives
1082 the name of the nonterminal being defined. The right side of @samp{::=}
1083 gives the definition of that nonterminal. If the right side is empty,
1084 then one possible expansion of that nonterminal is nothing. A BNF
1085 definition is called a @dfn{production}.
1088 @cindex terminals and nonterminals, differences
1089 So, the key difference between a terminal and a nonterminal is that a
1090 terminal cannot be broken into smaller parts---in fact, every terminal
1091 is a single token (@pxref{Tokens}). On the other hand, nonterminals are
1092 composed of a (possibly empty) sequence of terminals and nonterminals.
1093 Thus, terminals indicate the deepest level of syntax description. (In
1094 parsing theory, terminals are the leaves of the parse tree; nonterminals
1098 @cindex start symbol
1099 @cindex symbol, start
1100 The first nonterminal defined in a set of productions is called the
1101 @dfn{start symbol}. The start symbol defines the entire syntax for
1104 @setfilename ignored