1 @node Language, Expressions, Invocation, Top
2 @chapter The PSPP language
7 @strong{Please note:} PSPP is not even close to completion.
8 Only a few statistical procedures are implemented. PSPP
12 This chapter discusses elements common to many PSPP commands.
13 Later chapters will describe individual commands in detail.
16 * Tokens:: Characters combine to form tokens.
17 * Commands:: Tokens combine to form commands.
18 * Types of Commands:: Commands come in several flavors.
19 * Order of Commands:: Commands combine to form syntax files.
20 * Missing Observations:: Handling missing observations.
21 * Variables:: The unit of data storage.
22 * Files:: Files used by PSPP.
23 * BNF:: How command syntax is described.
26 @node Tokens, Commands, Language, Language
28 @cindex language, lexical analysis
29 @cindex language, tokens
31 @cindex lexical analysis
33 PSPP divides most syntax file lines into series of short chunks
35 Tokens are then grouped to form commands, each of which tells
36 PSPP to take some action---read in data, write out data, perform
37 a statistical procedure, etc. Each type of token is
43 Identifiers are names that typically specify variables, commands, or
44 subcommands. The first character in an identifier must be a letter,
45 @samp{#}, or @samp{@@}. The remaining characters in the identifier
46 must be letters, digits, or one of the following special characters:
52 @cindex case-sensitivity
53 Identifiers may be any length, but only the first 64 bytes are
54 significant. Identifiers are not case-sensitive: @code{foobar},
55 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
56 different representations of the same identifier.
58 @cindex identifiers, reserved
59 @cindex reserved identifiers
60 Some identifiers are reserved. Reserved identifiers may not be used
61 in any context besides those explicitly described in this manual. The
62 reserved identifiers are:
65 @center ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
69 Keywords are a subclass of identifiers that form a fixed part of
70 command syntax. For example, command and subcommand names are
71 keywords. Keywords may be abbreviated to their first 3 characters if
72 this abbreviation is unambiguous. (Unique abbreviations of 3 or more
73 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
74 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
76 Reserved identifiers are always used as keywords. Other identifiers
77 may be used both as keywords and as user-defined identifiers, such as
84 Numbers are expressed in decimal. A decimal point is optional.
85 Numbers may be expressed in scientific notation by adding @samp{e} and
86 a base-10 exponent, so that @samp{1.234e3} has the value 1234. Here
87 are some more examples of valid numbers:
90 -5 3.14159265359 1e100 -.707 8945.
93 Negative numbers are expressed with a @samp{-} prefix. However, in
94 situations where a literal @samp{-} token is expected, what appears to
95 be a negative number is treated as @samp{-} followed by a positive
98 No white space is allowed within a number token, except for horizontal
99 white space between @samp{-} and the rest of the number.
101 The last example above, @samp{8945.}, will be interpreted as two
102 tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
103 @xref{Commands, , Forming commands of tokens}.
109 @cindex case-sensitivity
110 Strings are literal sequences of characters enclosed in pairs of
111 single quotes (@samp{'}) or double quotes (@samp{"}). To include the
112 character used for quoting in the string, double it, e.g.@:
113 @samp{'it''s an apostrophe'}. White space and case of letters are
114 significant inside strings.
116 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
117 'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
118 splitting a single string across multiple source lines. The maximum
119 length of a string, after concatenation, is 255 characters.
121 Strings may also be expressed as hexadecimal, octal, or binary
122 character values by prefixing the initial quote character by @samp{X},
123 @samp{O}, or @samp{B} or their lowercase equivalents. Each pair,
124 triplet, or octet of characters, according to the radix, is
125 transformed into a single character with the given value. If there is
126 an incomplete group of characters, the missing final digits are
127 assumed to be @samp{0}. These forms of strings are nonportable
128 because numeric values are associated with different characters by
129 different operating systems. Therefore, their use should be confined
130 to syntax files that will not be widely distributed.
132 @cindex characters, reserved
135 The character with value 00 is reserved for
136 internal use by PSPP. Its use in strings causes an error and
137 replacement by a space character.
139 @item Punctuators and Operators
142 These tokens are the punctuators and operators:
145 @center , / = ( ) + - * / ** < <= <> > >= ~= & | .
148 Most of these appear within the syntax of commands, but the period
149 (@samp{.}) punctuator is used only at the end of a command. It is a
150 punctuator only as the last character on a line (except white space).
151 When it is the last non-space character on a line, a period is not
152 treated as part of another token, even if it would otherwise be part
153 of, e.g.@:, an identifier or a floating-point number.
155 Actually, the character that ends a command can be changed with
156 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
157 doing so. Throughout the remainder of this manual we will assume that
158 the default setting is in effect.
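The hexadecimal, octal, and binary string forms described above can be
illustrated with a short Python sketch. It groups the digits according
to the radix and completes an incomplete final group with zeros. The
function name @code{decode_radix_string} is hypothetical and not part of
PSPP. The sketch returns numeric character values rather than
characters, because the mapping from values to characters depends on the
operating system.

```python
# Digits per character for each prefix: hex pair, octal triplet, binary octet.
GROUPS = {"X": (16, 2), "O": (8, 3), "B": (2, 8)}


def decode_radix_string(prefix, digits):
    """Return the list of character values encoded by DIGITS."""
    radix, size = GROUPS[prefix.upper()]   # lowercase prefixes are equivalent
    # An incomplete final group is completed with trailing zeros.
    digits += "0" * (-len(digits) % size)
    return [int(digits[i:i + size], radix)
            for i in range(0, len(digits), size)]


print(decode_radix_string("X", "414243"))   # three complete hex pairs
print(decode_radix_string("x", "4"))        # "4" is read as "40"
```

The sketch does not model the reserved character value 00, which PSPP
reports as an error and replaces with a space.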
161 @node Commands, Types of Commands, Tokens, Language
162 @section Forming commands of tokens
164 @cindex PSPP, command structure
165 @cindex language, command structure
166 @cindex commands, structure
168 Most PSPP commands share a common structure. A command begins with a
169 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
170 CASES}. The command name may be abbreviated to its first word, and
171 each word in the command name may be abbreviated to its first three
172 or more characters, where these abbreviations are unambiguous.
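As a rough illustration of the abbreviation rule above, the following
Python sketch matches what the user typed against a list of command
names. The function @code{match_command} and the short command list are
hypothetical. The sketch assumes that a word shorter than three
characters must be typed in full, and it does not claim to reproduce
every detail of PSPP's own matching.

```python
def match_command(typed, commands):
    """Return the single command name that TYPED abbreviates, or None."""
    typed_words = typed.upper().split()      # names are not case-sensitive

    def abbreviates(name):
        words = name.split()
        if not typed_words or len(typed_words) > len(words):
            return False
        # Each typed word must be the full word or a prefix of 3+ characters.
        return all(
            full.startswith(part) and (len(part) >= 3 or part == full)
            for part, full in zip(typed_words, words)
        )

    matches = [name for name in commands if abbreviates(name)]
    return matches[0] if len(matches) == 1 else None   # reject ambiguity


COMMANDS = ["FREQUENCIES", "DATA LIST", "N OF CASES"]
print(match_command("freq", COMMANDS))
print(match_command("dat lis", COMMANDS))
print(match_command("fr", COMMANDS))      # too short, so no match
```

Returning @code{None} for both ``no match'' and ``ambiguous'' keeps the
sketch short; a real implementation would likely report the two cases
differently.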
174 The command name may be followed by one or more @dfn{subcommands}.
175 Each subcommand begins with a subcommand name, which may be
176 abbreviated to its first three letters. Some subcommands accept a
177 series of one or more specifications, which follow the subcommand
178 name, optionally separated from it by an equals sign
179 (@samp{=}). Specifications may be separated from each other
180 by commas or spaces. Each subcommand must be separated from the next (if any)
181 by a forward slash (@samp{/}).
183 There are multiple ways to mark the end of a command. The most common
184 way is to end the last line of the command with a period (@samp{.}) as
185 described in the previous section (@pxref{Tokens}). A blank line, or
186 one that consists only of white space or comments, also ends a command
187 by default, although you can use the NULLINE subcommand of @cmd{SET}
188 to disable this feature (@pxref{SET}).
190 In batch mode only, that is, when reading commands from a file instead
191 of an interactive user, any line that contains a non-space character
192 in the leftmost column begins a new command. Thus, each command
193 consists of a flush-left line followed by any number of lines indented
194 from the left margin. In this mode, a plus sign, minus sign, or
195 period (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character
196 in a line is ignored and causes that line to begin a new command,
197 which allows for visual indentation of a command without that command
198 being considered part of the previous command.
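The batch-mode rules above can be sketched in Python as follows. The
function @code{split_batch_commands} is a hypothetical helper, not part
of PSPP. It models only the leftmost-column rule, the treatment of a
leading @samp{+}, @samp{-}, or @samp{.}, and blank lines ending a
command; it ignores comments and the terminating period.

```python
def split_batch_commands(lines):
    """Group LINES into commands using the batch-mode column rules."""
    commands, current = [], []
    for line in lines:
        if not line.strip():
            # A blank line ends the current command (the default setting).
            if current:
                commands.append(" ".join(current))
                current = []
            continue
        starts_command = not line[0].isspace()
        if line[0] in "+-.":
            # A leading +, -, or . is ignored but still starts a command.
            line = " " + line[1:]
        if starts_command and current:
            commands.append(" ".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        commands.append(" ".join(current))
    return commands


syntax = [
    "DATA LIST LIST",
    "    /x y.",
    "+   COMPUTE z = x + y.",
    "LIST.",
]
for command in split_batch_commands(syntax):
    print(command)
```

In this example the indented second line continues @cmd{DATA LIST},
while the line beginning with @samp{+} starts a new command even though
its text is visually indented.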
200 Sometimes, one encounters syntax files that are intended to be
201 interpreted in interactive mode rather than batch mode. When this
202 occurs, use the @samp{-i} command line option to force interpretation
203 in interactive mode (@pxref{Language control options}).
205 @node Types of Commands, Order of Commands, Commands, Language
206 @section Types of Commands
208 Commands in PSPP are divided roughly into six categories:
211 @item Utility commands
212 @cindex utility commands
213 Set or display various global options that affect PSPP operations.
214 May appear anywhere in a syntax file. @xref{Utilities, , Utility
217 @item File definition commands
218 @cindex file definition commands
219 Give instructions for reading data from text files or from special
220 binary ``system files''. Most of these commands replace any previous
221 data or variables with new data or
222 variables. At least one file definition command must appear before the first command in any of
223 the categories below. @xref{Data Input and Output}.
225 @item Input program commands
226 @cindex input program commands
227 Though rarely used, these provide tools for reading data files
228 in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
230 @item Transformations
231 @cindex transformations
232 Perform operations on data and write data to output files. Transformations
233 are not carried out until a procedure is executed.
235 @item Restricted transformations
236 @cindex restricted transformations
237 Transformations that cannot appear in certain contexts. @xref{Order
238 of Commands}, for details.
242 Analyze data, writing results of analyses to the listing file. Cause
243 transformations specified earlier in the file to be performed. In a
244 more general sense, a @dfn{procedure} is any command that causes the
245 active file (the data) to be read.
248 @node Order of Commands, Missing Observations, Types of Commands, Language
249 @section Order of Commands
250 @cindex commands, ordering
251 @cindex order of commands
253 PSPP does not place many restrictions on ordering of commands. The
254 main restriction is that variables must be defined before they are otherwise
255 referenced. This section describes the details of command ordering,
256 but most users will have no need to refer to them.
258 PSPP possesses five internal states, called initial, INPUT PROGRAM,
259 FILE TYPE, transformation, and procedure states. (Please note the
260 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
261 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
263 PSPP starts in the initial state. Each successful completion
264 of a command may cause a state transition. Each type of command has its
265 own rules for state transitions:
268 @item Utility commands
273 Do not cause state transitions. Exception: when @cmd{N OF CASES}
274 is executed in the procedure state, it causes a transition to the
275 transformation state.
278 @item @cmd{DATA LIST}
283 When executed in the initial or procedure state, causes a transition to
284 the transformation state.
286 Clears the active file if executed in the procedure or transformation
290 @item @cmd{INPUT PROGRAM}
293 Invalid in INPUT PROGRAM and FILE TYPE states.
295 Causes a transition to the INPUT PROGRAM state.
297 Clears the active file.
300 @item @cmd{FILE TYPE}
303 Invalid in INPUT PROGRAM and FILE TYPE states.
305 Causes a transition to the FILE TYPE state.
307 Clears the active file.
310 @item Other file definition commands
313 Invalid in INPUT PROGRAM and FILE TYPE states.
315 Cause a transition to the transformation state.
317 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
321 @item Transformations
324 Invalid in initial and FILE TYPE states.
326 Cause a transition to the transformation state.
329 @item Restricted transformations
332 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
334 Cause a transition to the transformation state.
340 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
342 Cause a transition to the procedure state.
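The transition rules listed above amount to a small state machine. The
Python sketch below encodes them as a lookup table. The names
@code{RULES} and @code{next_state} are hypothetical, the @cmd{N OF CASES}
exception for utility commands is not modeled, and the sketch assumes
that @cmd{DATA LIST} leaves the state unchanged in any state other than
initial or procedure.

```python
# States named in this section.
INITIAL, INPUT_PROGRAM, FILE_TYPE, TRANSFORMATION, PROCEDURE = (
    "initial", "INPUT PROGRAM", "FILE TYPE", "transformation", "procedure")

# For each command class: (states where it is invalid, state it leads to).
RULES = {
    "INPUT PROGRAM": ({INPUT_PROGRAM, FILE_TYPE}, INPUT_PROGRAM),
    "FILE TYPE": ({INPUT_PROGRAM, FILE_TYPE}, FILE_TYPE),
    "other file definition": ({INPUT_PROGRAM, FILE_TYPE}, TRANSFORMATION),
    "transformation": ({INITIAL, FILE_TYPE}, TRANSFORMATION),
    "restricted transformation":
        ({INITIAL, INPUT_PROGRAM, FILE_TYPE}, TRANSFORMATION),
    "procedure": ({INITIAL, INPUT_PROGRAM, FILE_TYPE}, PROCEDURE),
}


def next_state(state, command_class):
    """Return the state after a successful command, or raise ValueError."""
    if command_class == "utility":
        return state                      # N OF CASES exception not modeled
    if command_class == "DATA LIST":
        # Assumption: states other than initial and procedure are unchanged.
        return TRANSFORMATION if state in (INITIAL, PROCEDURE) else state
    invalid_in, target = RULES[command_class]
    if state in invalid_in:
        raise ValueError(f"{command_class} is invalid in the {state} state")
    return target


state = INITIAL
for command_class in ["DATA LIST", "transformation", "procedure"]:
    state = next_state(state, command_class)
print(state)
```

Running a procedure before any file definition command raises
@code{ValueError} in this sketch, which mirrors the rule that procedures
are invalid in the initial state.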
346 @node Missing Observations, Variables, Order of Commands, Language
347 @section Handling missing observations
348 @cindex missing values
349 @cindex values, missing
351 PSPP includes special support for unknown numeric data values.
352 Missing observations are assigned a special value, called the
353 @dfn{system-missing value}. This ``value'' actually indicates the
354 absence of a value; it means that the actual value is unknown. Procedures
355 automatically exclude from analyses those observations or cases that
356 have missing values. Details of missing value exclusion depend on the
357 procedure and can often be controlled by the user; refer to
358 descriptions of individual procedures for details.
360 The system-missing value exists only for numeric variables. String
361 variables always have a defined value, even if it is only a string of
364 Variables, whether numeric or string, can have designated
365 @dfn{user-missing values}. Every user-missing value is an actual value
366 for that variable. However, most of the time user-missing values are
367 treated in the same way as the system-missing value. String variables
368 that are wider than a certain width, usually 8 characters (depending on
369 computer architecture), cannot have user-missing values.
371 For more information on missing values, see the following sections:
372 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
373 documentation on individual procedures for information on how they
374 handle missing values.
376 @node Variables, Files, Missing Observations, Language
381 Variables are the basic unit of data storage in PSPP. All the
382 variables in a file taken together, apart from any associated data, are
383 said to form a @dfn{dictionary}.
384 Some details of variables are described in the sections below.
387 * Attributes:: Attributes of variables.
388 * System Variables:: Variables automatically defined by PSPP.
389 * Sets of Variables:: Lists of variable names.
390 * Input/Output Formats:: Input and output formats.
391 * Scratch Variables:: Variables deleted by procedures.
394 @node Attributes, System Variables, Variables, Variables
395 @subsection Attributes of Variables
396 @cindex variables, attributes of
397 @cindex attributes of variables
398 Each variable has a number of attributes, including:
402 An identifier, up to 64 bytes long. Each variable must have a different name.
405 Some system variable names begin with @samp{$}, but user-defined
406 variables' names may not begin with @samp{$}.
410 @cindex variable names, ending with period
411 The final character in a variable name should not be @samp{.}, because
412 such an identifier will be misinterpreted when it is the final token
413 on a line: @code{FOO.} will be divided into two separate tokens,
414 @samp{FOO} and @samp{.}, indicating end-of-command. @xref{Tokens}.
417 The final character in a variable name should not be @samp{_}, because
418 some such identifiers are used for special purposes by PSPP
421 As with all PSPP identifiers, variable names are not case-sensitive.
422 PSPP capitalizes variable names on output the same way they were
423 capitalized at their point of definition in the input.
425 @cindex variables, type
426 @cindex type of variables
430 @cindex variables, width
431 @cindex width of variables
433 (string variables only) String variables with a width of 8 characters or
434 fewer are called @dfn{short string variables}. Short string variables
435 can be used in many procedures where @dfn{long string variables} (those
436 with widths greater than 8) are not allowed.
438 Certain systems may consider strings longer than 8
439 characters to be short strings. Eight characters represents a minimum
440 figure for the maximum length of a short string.
443 Variables in the dictionary are arranged in a specific order.
444 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
447 Either reinitialized to 0 or spaces for each case, or left at its
448 existing value. @xref{LEAVE}.
450 @cindex missing values
451 @cindex values, missing
453 Optionally, up to three values, or a range of values, or a specific
454 value plus a range, can be specified as @dfn{user-missing values}.
455 There is also a @dfn{system-missing value} that is assigned to an
456 observation when there is no other obvious value for that observation.
457 Observations with missing values are automatically excluded from
458 analyses. User-missing values are actual data values, while the
459 system-missing value is not a value at all. @xref{Missing Observations}.
461 @cindex variable labels
462 @cindex labels, variable
464 A string that describes the variable. @xref{VARIABLE LABELS}.
467 @cindex labels, value
469 Optionally, these associate each possible value of the variable with a
470 string. @xref{VALUE LABELS}.
474 Display width, format, and (for numeric variables) number of decimal
475 places. This attribute does not affect how data are stored, just how
476 they are displayed. Example: a width of 8, with 2 decimal places.
477 @xref{PRINT FORMATS}.
481 Similar to print format, but used by certain commands that are
482 designed to write to binary files. @xref{WRITE FORMATS}.
485 @node System Variables, Sets of Variables, Attributes, Variables
486 @subsection Variables Automatically Defined by PSPP
487 @cindex system variables
488 @cindex variables, system
490 There are seven system variables. These are not like ordinary
491 variables because system variables are not always stored. They can be used only
492 in expressions. These system variables, whose values and output formats
493 cannot be modified, are described below.
496 @cindex @code{$CASENUM}
498 Case number of the current case. This changes as cases are
503 Date the PSPP process was started, in format A9, following the
504 pattern @code{DD MMM YY}.
506 @cindex @code{$JDATE}
508 Number of days between 15 Oct 1582 and the time the PSPP process
511 @cindex @code{$LENGTH}
513 Page length, in lines, in format F11.
515 @cindex @code{$SYSMIS}
517 System missing value, in format F1.
521 Number of seconds between midnight 14 Oct 1582 and the time the active file
522 was read, in format F20.
524 @cindex @code{$WIDTH}
526 Page width, in characters, in format F3.
529 @node Sets of Variables, Input/Output Formats, System Variables, Variables
530 @subsection Lists of variable names
531 @cindex TO convention
532 @cindex convention, TO
534 To refer to a set of variables, list their names one after another.
535 Optionally, their names may be separated by commas. To include a
536 range of variables from the dictionary in the list, write the name of
537 the first and last variable in the range, separated by @code{TO}. For
538 instance, if the dictionary contains six variables with the names
539 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
540 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
541 variables @code{X2}, @code{GOAL}, and @code{MET}.
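The dictionary-order meaning of @code{TO} can be sketched in Python as
follows. The function @code{expand_to} is hypothetical, and treating a
reversed range as an error is an assumption made for the sketch rather
than something stated above.

```python
def expand_to(dictionary, first, last):
    """Return the variables from FIRST through LAST in dictionary order."""
    names = [name.upper() for name in dictionary]   # names ignore case
    start, end = names.index(first.upper()), names.index(last.upper())
    if start > end:
        # Assumption: the first variable must not follow the last one.
        raise ValueError(f"{first} does not precede {last} in the dictionary")
    return dictionary[start:end + 1]


variables = ["ID", "X1", "X2", "GOAL", "MET", "NEXTGOAL"]
print(expand_to(variables, "X2", "MET"))
```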
543 Commands that define variables, such as @cmd{DATA LIST}, give
544 @code{TO} an alternate meaning. With these commands, @code{TO} defines
545 sequences of variables whose names end in consecutive integers. The
546 syntax is two identifiers that begin with the same root and end with
547 numbers, separated by @code{TO}. The syntax @code{X1 TO X5} defines 5
548 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
549 @code{X5}. The syntax @code{ITEM0008 TO ITEM0013} defines 6
550 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
551 @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}. The syntaxes
552 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
554 After a set of variables has been defined with @cmd{DATA LIST} or
555 another command with this method, the same set can be referenced on
556 later commands using the same syntax.
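The variable-defining form of @code{TO} can be sketched in Python as
shown below. The function @code{define_sequence} is hypothetical. The
text above does not say why @code{QUES001 TO QUES9} is invalid; the
sketch assumes the reason is that the second number has fewer digits
than the first, and it pads generated names to the width of the first
number.

```python
import re


def define_sequence(first, last):
    """Expand `FIRST TO LAST` into the list of variable names it defines."""
    m1 = re.fullmatch(r"(.*?)(\d+)", first)
    m2 = re.fullmatch(r"(.*?)(\d+)", last)
    if not m1 or not m2:
        raise ValueError("both names must end with a number")
    root, digits1 = m1.groups()
    root2, digits2 = m2.groups()
    if root.upper() != root2.upper():
        raise ValueError("both names must begin with the same root")
    if int(digits2) < int(digits1):
        raise ValueError("the second number must not be smaller")
    if len(digits2) < len(digits1):
        # Assumed reading of why QUES001 TO QUES9 is rejected.
        raise ValueError("the second number has too few digits")
    width = len(digits1)
    return [f"{root}{n:0{width}d}"
            for n in range(int(digits1), int(digits2) + 1)]


print(define_sequence("X1", "X5"))
print(define_sequence("ITEM0008", "ITEM0013"))
```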
558 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
559 @subsection Input and Output Formats
561 Data that PSPP inputs and outputs must have one of a number of formats.
562 These formats are described, in general, by a format specification of
563 the form @code{NAMEw.d}, where @var{name} is the
564 format name and @var{w} is a field width. @var{d} is the optional
565 desired number of decimal places, if appropriate. If @var{d} is not
566 included then it is assumed to be 0. Some formats do not allow @var{d}
569 When @cmd{DATA LIST} or another command specifies an input format,
570 that format is converted to an output format for the purposes of
571 @cmd{PRINT} and other data output commands. For most purposes, input
572 and output formats are the same; the salient differences are described
575 Below are listed the input and output formats supported by PSPP. If an
576 input format is mapped to a different output format by default, then
577 that mapping is indicated with @result{}. Each format has the listed
578 bounds on input width (iw) and output width (ow).
580 The standard numeric input and output formats are given in the following
584 @item Fw.d: 1 <= iw,ow <= 40
585 Standard decimal format with @var{d} decimal places. If the number is
586 too large to fit within the field width, it is expressed in scientific
587 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
588 the exponent. When used as an input format, scientific notation is
589 allowed but an E or an F must be used to introduce the exponent.
591 The default output format is the same as the input format, except if
592 @var{d} > 1. In that case the output @var{w} is always made to be at
595 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
596 For input this is equivalent to F format except that no E or F is
597 required to introduce the exponent. For output, produces scientific
598 notation in the form @code{1.2+34}. There are always at least two
599 digits given in the exponent.
601 The default output @var{w} is the largest of the input @var{w}, the
602 input @var{d} + 7, and 10. The default output @var{d} is the input
603 @var{d}, but at least 3.
605 @item COMMAw.d: 1 <= iw,ow <= 40
606 Equivalent to F format, except that groups of three digits are
607 comma-separated on output. If the number is too large to express in the
608 field width, then first commas are eliminated, then if there is still
609 not enough space the number is expressed in scientific notation given
610 that w >= 6. Commas are allowed and ignored when this is used as an
613 @item DOTw.d: 1 <= iw,ow <= 40
614 Equivalent to COMMA format except that the roles of comma and decimal
615 point are interchanged. However: If SET /DECIMAL=DOT is in effect, then
616 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
619 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
620 Equivalent to COMMA format, except that the number is prefixed by a
621 dollar sign (@samp{$}) if there is room. On input the value is allowed
622 to be prefixed by a dollar sign, which is ignored.
624 The default output @var{w} is the input @var{w}, but at least 2.
626 @item PCTw.d: 2 <= iw,ow <= 40
627 Equivalent to F format, except that the number is suffixed by a percent
628 sign (@samp{%}) if there is room. On input the value is allowed to be
629 suffixed by a percent sign, which is ignored.
631 The default output @var{w} is the input @var{w}, but at least 2.
633 @item Nw.d: 1 <= iw,ow <= 40
634 Only digits are allowed within the field width. The decimal point is
635 assumed to be @var{d} digits from the right margin.
637 The default output format is F with the same @var{w} and @var{d}, except
638 if @var{d} > 1. In that case the output @var{w} is always made to be at
641 @item Zw.d @result{} F: 1 <= iw,ow <= 40
642 Zoned decimal input. If you need to use this then you know how.
644 @item IBw.d @result{} F: 1 <= iw,ow <= 8
645 Integer binary format. The field is interpreted as a fixed-point
646 positive or negative binary number in two's-complement notation. The
647 location of the decimal point is implied. Endianness is the same as the
650 The default output format is F8.2 if @var{d} is 0. Otherwise it is F,
651 with output @var{w} as 9 + input @var{d} and output @var{d} as input
654 @item PIB @result{} F: 1 <= iw,ow <= 8
655 Positive integer binary format. The field is interpreted as a
656 fixed-point positive binary number. The location of the decimal point
657 is implied. Endianness is the same as the host machine.
659 The default output format follows the rules for IB format.
661 @item Pw.d @result{} F: 1 <= iw,ow <= 16
662 Binary coded decimal format. Each byte from left to right, except the
663 rightmost, represents two digits. The upper nibble of each byte is more
664 significant. The upper nibble of the final byte is the least
665 significant digit. The lower nibble of the final byte is the sign; a
666 value of D represents a negative sign and all other values are
667 considered positive. The decimal point is implied.
669 The default output format follows the rules for IB format.
671 @item PKw.d @result{} F: 1 <= iw,ow <= 16
672 Positive binary coded decimal format. Same as P but the last byte is the
675 The default output format follows the rules for IB format.
677 @item RBw @result{} F: 2 <= iw,ow <= 8
679 Binary C architecture-dependent ``double'' format. For a standard
680 IEEE754 implementation @var{w} should be 8.
682 The default output format follows the rules for IB format.
684 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
685 PIB format encoded as textual hex digit pairs. @var{w} must be even.
687 The input width is mapped to a default output width as follows:
688 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
689 12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for
692 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
694 RB format encoded as textual hex digit pairs. @var{w} must be even.
696 The default output format is F8.2.
698 @item CCAw.d: 1 <= ow <= 40
699 @itemx CCBw.d: 1 <= ow <= 40
700 @itemx CCCw.d: 1 <= ow <= 40
701 @itemx CCDw.d: 1 <= ow <= 40
702 @itemx CCEw.d: 1 <= ow <= 40
704 User-defined custom currency formats. May not be used as an input
705 format. @xref{SET}, for more details.
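The byte layout given for P format in the table above can be made
concrete with a short Python sketch. The function @code{decode_packed}
is hypothetical and is not PSPP's implementation. It follows the
description literally: two digits per byte, the final byte holding the
last digit and the sign, and @var{d} implied decimal places. It does
not check that each nibble is a valid decimal digit.

```python
def decode_packed(field, d=0):
    """Decode a P-format field (bytes) with D implied decimal places."""
    digits = []
    for byte in field[:-1]:
        # Upper nibble is the more significant digit of the pair.
        digits += [byte >> 4, byte & 0x0F]
    digits.append(field[-1] >> 4)            # least significant digit
    # Lower nibble of the final byte is the sign: D means negative.
    sign = -1 if field[-1] & 0x0F == 0xD else 1
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return sign * value / 10 ** d


print(decode_packed(bytes([0x12, 0x34, 0x5D]), d=2))   # digits 12345, negative
print(decode_packed(bytes([0x7C])))                    # single digit 7
```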
708 The date and time numeric input and output formats accept a number of
709 possible formats. Before describing the formats themselves, some
710 definitions of the elements that make up their formats will be helpful:
714 All formats accept an optional white space leader.
717 An integer between 1 and 31 representing the day of month.
720 An integer representing a number of days.
723 One or more characters of white space or the following characters:
727 A month name in one of the following forms:
730 An integer between 1 and 12.
732 Roman numerals representing an integer between 1 and 12.
734 At least the first three characters of an English month name (January,
739 An integer year number between 1582 and 19999, or between 1 and 199.
740 Years between 1 and 199 will have 1900 added.
743 A single number with a year number in the first 2, 3, or 4 digits (as
744 above) and the day number within the year in the last 3 digits.
747 An integer between 1 and 4 representing a quarter.
750 The letter @samp{Q} or @samp{q}.
753 An integer between 1 and 53 representing a week within a year.
756 The letters @samp{wk} in any case.
759 At least one character of white space or @samp{:} or @samp{.}.
762 An integer greater than 0 representing an hour.
765 An integer between 0 and 59 representing a minute within an hour.
768 Optionally, a time-delimiter followed by a real number representing a
772 An integer between 0 and 23 representing an hour within a day.
775 At least the first two characters of an English day word.
778 Any amount or no amount of white space.
781 An optional positive or negative sign.
784 All formats accept an optional white space trailer.
787 The date input formats are strung together from the above pieces. On
788 output, the date formats are always printed in a single canonical
789 manner, based on field width. The date input and output formats are
793 @item DATEw: 9 <= iw,ow <= 40
794 Date format. Input format: leader + day + date-delimiter +
795 month + date-delimiter + year + trailer. Output format: DD-MMM-YY for
796 @var{w} < 11, DD-MMM-YYYY otherwise.
798 @item EDATEw: 8 <= iw,ow <= 40
799 European date format. Input format same as DATE. Output format:
800 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
802 @item SDATEw: 8 <= iw,ow <= 40
803 Standard date format. Input format: leader + year + date-delimiter +
804 month + date-delimiter + day + trailer. Output format: YY/MM/DD for
805 @var{w} < 10, YYYY/MM/DD otherwise.
807 @item ADATEw: 8 <= iw,ow <= 40
808 American date format. Input format: leader + month + date-delimiter +
809 day + date-delimiter + year + trailer. Output format: MM/DD/YY for
810 @var{w} < 10, MM/DD/YYYY otherwise.
812 @item JDATEw: 5 <= iw,ow <= 40
813 Julian date format. Input format: leader + julian + trailer. Output
814 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
816 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
817 Quarter/year format. Input format: leader + quarter + q-delimiter +
818 year + trailer. Output format: @samp{Q Q YY}, where the first
819 @samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
822 @item MOYRw: 6 <= iw,ow <= 40
823 Month/year format. Input format: leader + month + date-delimiter + year
824 + trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
827 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
828 Week/year format. Input format: leader + week + wk-delimiter + year +
829 trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
832 @item DATETIMEw.d: 17 <= iw,ow <= 40
833 Date and time format. Input format: leader + day + date-delimiter +
834 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
835 + minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
836 @var{w} > 19 then seconds @samp{:SS} are added. If @var{w} > 22 and
837 @var{d} > 0 then fractional seconds @samp{.SS} are added.
839 @item TIMEw.d: 5 <= iw,ow <= 40
840 Time format. Input format: leader + sign + spaces + hour +
841 time-delimiter + minute + opt-second. Output format: @samp{HH:MM}.
842 Seconds and fractional seconds are available with @var{w} of at least 8
843 and 10, respectively.
845 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
846 Time format with day count. Input format: leader + sign + spaces +
847 day-count + time-delimiter + hour + time-delimiter + minute +
848 opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional
849 seconds are available with @var{w} of at least 8 and 10, respectively.
851 @item WKDAYw: 2 <= iw,ow <= 40
852 A weekday as a number between 1 and 7, where 1 is Sunday. Input format:
853 leader + weekday + trailer. Output format: as many characters, in all
854 capital letters, of the English name of the weekday as will fit in the
857 @item MONTHw: 3 <= iw,ow <= 40
858 A month as a number between 1 and 12, where 1 is January. Input format:
859 leader + month + trailer. Output format: as many characters, in all
860 capital letters, of the English name of the month as will fit in the
864 There are only two formats that may be used with string variables:
867 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
868 The entire field is treated as a string value.
870 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
871 The field is composed of characters in a string encoded as textual hex
874 The default output @var{w} is half the input @var{w}.
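Several formats in this section state how an input format is mapped to
a default output format. The Python sketch below collects the rules
given for E, DOLLAR, PCT, IB, and AHEX. The function
@code{default_output} is hypothetical and covers only those formats;
the rules for the remaining formats are not modeled.

```python
def default_output(fmt, w, d=0):
    """Return (format, w, d) used for output, given an input format."""
    fmt = fmt.upper()
    if fmt == "E":
        # w: largest of input w, input d + 7, and 10; d: at least 3.
        return ("E", max(w, d + 7, 10), max(d, 3))
    if fmt in ("DOLLAR", "PCT"):
        return (fmt, max(w, 2), d)           # output w is at least 2
    if fmt == "IB":
        # F8.2 when d is 0, otherwise F with w = 9 + d.
        return ("F", 8, 2) if d == 0 else ("F", 9 + d, d)
    if fmt == "AHEX":
        return ("A", w // 2, 0)              # two hex digits per character
    raise ValueError(f"no rule sketched for {fmt}")


print(default_output("E", 5, 1))
print(default_output("IB", 4, 3))
print(default_output("AHEX", 16))
```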
877 @node Scratch Variables, , Input/Output Formats, Variables
878 @subsection Scratch Variables
880 Most of the time, variables don't retain their values between cases.
881 Instead, either they're being read from a data file or the active file,
882 in which case they assume the value read, or, if created with
884 another transformation, they're initialized to the system-missing value
885 or to blanks, depending on type.
887 However, sometimes it's useful to have a variable that keeps its value
888 between cases. You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
889 use a @dfn{scratch variable}. Scratch variables are variables whose
890 names begin with an octothorpe (@samp{#}).
892 Scratch variables have the same properties as variables left with
893 @cmd{LEAVE}: they retain their values between cases, and for the first
894 case they are initialized to 0 or blanks. They have the additional
895 property that they are deleted before the execution of any procedure.
896 For this reason, scratch variables can't be used directly in an analysis. To
897 analyze a scratch variable's values, use @cmd{COMPUTE} (@pxref{COMPUTE})
898 to copy them into an ordinary variable, then use that ordinary
899 variable in the analysis.
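
As a minimal sketch of this idiom, the following syntax keeps a running
total in a scratch variable and copies it into an ordinary variable so
that a procedure can see it.  The variable names @code{x},
@code{#total}, and @code{running} and the data values are invented for
illustration.

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.

* #total starts at 0 and keeps its value from case to case.
COMPUTE #total = #total + x.
* Copy the scratch value into an ordinary variable for analysis.
COMPUTE running = #total.
LIST.
@end example

Because scratch variables are not available to procedures, only
@code{x} and @code{running} should appear in the listing.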
901 @node Files, BNF, Variables, Language
902 @section Files Used by PSPP
904 PSPP makes use of many files each time it runs. Some of these it
905 reads, some it writes, some it creates. Here is a table listing the
906 most important of these files:
909 @cindex file, command
910 @cindex file, syntax file
915 These names (synonyms) refer to the file that contains instructions
916 that tell PSPP what to do. The syntax file's name is specified on
917 the PSPP command line. Syntax files can also be pulled in with
918 @cmd{INCLUDE} (@pxref{INCLUDE}).
923 Data files contain raw data in ASCII format suitable for being read in
924 by @cmd{DATA LIST}. Data can be embedded in the syntax
925 file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
926 syntax file a data file too.
931 One or more output files are created by PSPP each time it is
932 run. The output files receive the tables and charts produced by
933 statistical procedures. The output files may be in any number of formats,
934 depending on how PSPP is configured.
939 The active file is the ``file'' on which all PSPP procedures
940 are performed. The active file contains variable definitions and
941 cases. The active file is not necessarily a disk file: it is stored
942 in memory if there is room.
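
As a rough illustration of how these files fit together, a syntax file
might read raw data from a separate data file and pull in further
commands from another syntax file.  The file names @file{survey.dat}
and @file{labels.stat} and the variable layout are hypothetical.

@example
* Read fixed-format raw data from an external data file.
DATA LIST FILE='survey.dat' /id 1-3 age 5-6.
* Pull in further commands from a second syntax file.
INCLUDE 'labels.stat'.
* The cases now form the active file; LIST writes them to the output.
LIST.
@end example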
945 @node BNF, , Files, Language
946 @section Backus-Naur Form
948 @cindex Backus-Naur Form
949 @cindex command syntax, description of
950 @cindex description of command syntax
952 The syntax of some parts of the PSPP language is presented in this
953 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
954 following table describes BNF:
960 Words in all-uppercase are PSPP keyword tokens. In BNF, these are
961 often called @dfn{terminals}. There are some special terminals, which
962 are written in lowercase for clarity:
965 @cindex @code{number}
969 @cindex @code{integer}
973 @cindex @code{string}
977 @cindex @code{var-name}
978 @item @code{var-name}
979 A single variable name.
983 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
984 Operators and punctuators.
988 The end of the command. This is not necessarily an actual dot in the
989 syntax file. @xref{Commands}, for more details.
995 Other words in all lowercase refer to BNF definitions, called
996 @dfn{productions}. These productions are also known as
997 @dfn{nonterminals}. Some nonterminals are very common, so they are
998 defined here in English for clarity:
1001 @cindex @code{var-list}
1003 A list of one or more variable names or the keyword @code{ALL}.
1005 @cindex @code{expression}
1007 An expression. @xref{Expressions}, for details.
1011 @cindex ``is defined as''
1013 @samp{::=} means ``is defined as''. The left side of @samp{::=} gives
1014 the name of the nonterminal being defined. The right side of @samp{::=}
1015 gives the definition of that nonterminal. If the right side is empty,
1016 then one possible expansion of that nonterminal is nothing. A BNF
1017 definition is called a @dfn{production}.
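
For instance, a small set of productions for a made-up command might
look like the following.  @code{EXAMPLE} is not a real PSPP command and
@code{opt-by} is a hypothetical nonterminal; both are used here only to
illustrate the notation.

@example
example-cmd ::= EXAMPLE var-list opt-by .
opt-by      ::=
opt-by      ::= BY var-name
@end example

Here @code{EXAMPLE} and @code{BY} are keyword terminals, @code{.} is
the end-of-command terminal, and @code{var-list} and @code{var-name}
are the predefined symbols described earlier in this table.  The empty
right side of the first @code{opt-by} production means that the
@code{BY} clause may be omitted.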
1020 @cindex terminals and nonterminals, differences
1021 So, the key difference between a terminal and a nonterminal is that a
1022 terminal cannot be broken into smaller parts---in fact, every terminal
1023 is a single token (@pxref{Tokens}). On the other hand, nonterminals are
1024 composed of a (possibly empty) sequence of terminals and nonterminals.
1025 Thus, terminals indicate the deepest level of syntax description. (In
1026 parsing theory, terminals are the leaves of the parse tree; nonterminals
1030 @cindex start symbol
1031 @cindex symbol, start
1032 The first nonterminal defined in a set of productions is called the
1033 @dfn{start symbol}. The start symbol defines the entire syntax for
1036 @setfilename ignored