1 @node Language, Expressions, Invocation, Top
2 @chapter The PSPP language
7 @strong{Please note:} PSPP is not even close to completion.
8 Only a few actual statistical procedures are implemented. PSPP
12 This chapter discusses elements common to many PSPP commands.
13 Later chapters will describe individual commands in detail.
16 * Tokens:: Characters combine to form tokens.
17 * Commands:: Tokens combine to form commands.
18 * Types of Commands:: Commands come in several flavors.
19 * Order of Commands:: Commands combine to form syntax files.
20 * Missing Observations:: Handling missing observations.
21 * Variables:: The unit of data storage.
22 * Files:: Files used by PSPP.
23 * BNF:: How command syntax is described.
26 @node Tokens, Commands, Language, Language
28 @cindex language, lexical analysis
29 @cindex language, tokens
31 @cindex lexical analysis
33 PSPP divides most syntax file lines into a series of short chunks
35 Tokens are then grouped to form commands, each of which tells
36 PSPP to take some action---read in data, write out data, perform
37 a statistical procedure, etc. Each type of token is
43 Identifiers are names that typically specify variables, commands, or
44 subcommands. The first character in an identifier must be a letter,
45 @samp{#}, or @samp{@@}. The remaining characters in the identifier
46 must be letters, digits, or one of the following special characters:
52 @cindex case-sensitivity
53 Identifiers may be any length, but only the first 64 bytes are
54 significant. Identifiers are not case-sensitive: @code{foobar},
55 @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
56 different representations of the same identifier.
58 @cindex identifiers, reserved
59 @cindex reserved identifiers
60 Some identifiers are reserved. Reserved identifiers may not be used
61 in any context besides those explicitly described in this manual. The
62 reserved identifiers are:
65 @center ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
69 Keywords are a subclass of identifiers that form a fixed part of
70 command syntax. For example, command and subcommand names are
71 keywords. Keywords may be abbreviated to their first 3 characters if
72 this abbreviation is unambiguous. (Unique abbreviations of 3 or more
73 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
74 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
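The abbreviation rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not PSPP's own lexer; the function names @code{matches_keyword} and @code{resolve} are hypothetical.

```python
from typing import List, Optional


def matches_keyword(token: str, keyword: str) -> bool:
    """True if token is the keyword itself or a prefix of 3 or more characters."""
    token, keyword = token.upper(), keyword.upper()  # keywords ignore case
    if token == keyword:
        return True
    return len(token) >= 3 and keyword.startswith(token)


def resolve(token: str, keywords: List[str]) -> Optional[str]:
    """Return the single keyword that token names, or None if none or ambiguous."""
    hits = [k for k in keywords if matches_keyword(token, k)]
    exact = [k for k in hits if k.upper() == token.upper()]
    if exact:
        return exact[0]  # a full spelling always wins over a longer keyword
    return hits[0] if len(hits) == 1 else None
```

With the keyword list @code{["FREQUENCIES", "FORMAT"]}, the tokens @samp{FRE} and @samp{freq} both resolve to @code{FREQUENCIES}, while @samp{FR} is too short to be accepted.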
76 Reserved identifiers are always used as keywords. Other identifiers
77 may be used both as keywords and as user-defined identifiers, such as
84 Numbers are expressed in decimal. A decimal point is optional.
85 Numbers may be expressed in scientific notation by adding @samp{e} and
86 a base-10 exponent, so that @samp{1.234e3} has the value 1234. Here
87 are some more examples of valid numbers:
90 -5 3.14159265359 1e100 -.707 8945.
93 Negative numbers are expressed with a @samp{-} prefix. However, in
94 situations where a literal @samp{-} token is expected, what appears to
95 be a negative number is treated as @samp{-} followed by a positive
98 No white space is allowed within a number token, except for horizontal
99 white space between @samp{-} and the rest of the number.
101 The last example above, @samp{8945.}, will be interpreted as two
102 tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
103 @xref{Commands, , Forming commands of tokens}.
109 @cindex case-sensitivity
110 Strings are literal sequences of characters enclosed in pairs of
111 single quotes (@samp{'}) or double quotes (@samp{"}). To include the
112 character used for quoting in the string, double it, e.g.@:
113 @samp{'it''s an apostrophe'}. White space and case of letters are
114 significant inside strings.
116 Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
117 'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
118 splitting a single string across multiple source lines. The maximum
119 length of a string, after concatenation, is 255 characters.
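As an illustration of the quoting and concatenation rules, here is a minimal Python sketch. It is not PSPP's implementation; @code{unquote} and @code{concatenate} are hypothetical helper names, and the 255-character limit is taken from the text above.

```python
def unquote(literal: str) -> str:
    """Strip the enclosing quotes and collapse doubled quote characters."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string: %r" % literal)
    quote = literal[0]
    body = literal[1:-1]
    # After removing doubled quotes, no lone quote character may remain.
    if quote in body.replace(quote * 2, ""):
        raise ValueError("unescaped quote inside %r" % literal)
    return body.replace(quote * 2, quote)


def concatenate(*literals: str) -> str:
    """Join string literals as '+' would, enforcing the 255-character limit."""
    result = "".join(unquote(lit) for lit in literals)
    if len(result) > 255:
        raise ValueError("string longer than 255 characters")
    return result
```

Under these assumptions, @code{unquote("'it''s an apostrophe'")} yields the text with a single apostrophe, and @code{concatenate('"a"', "'b'", "'c'")} yields @samp{abc}.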
121 Strings may also be expressed as hexadecimal, octal, or binary
122 character values by prefixing the initial quote character by @samp{X},
123 @samp{O}, or @samp{B} or their lowercase equivalents. Each pair,
124 triplet, or octet of characters, according to the radix, is
125 transformed into a single character with the given value. If there is
126 an incomplete group of characters, the missing final digits are
127 assumed to be @samp{0}. These forms of strings are nonportable
128 because numeric values are associated with different characters by
129 different operating systems. Therefore, their use should be confined
130 to syntax files that will not be widely distributed.
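The grouping and padding rule can be sketched as follows. This Python fragment is an illustration only: the function name is hypothetical, and mapping each numeric value to a character with @code{chr} is an assumption, since (as noted above) the actual character depends on the system.

```python
# Radix prefix -> (numeric base, digits per character), as described above.
RADIX = {"X": (16, 2), "O": (8, 3), "B": (2, 8)}


def decode_radix_string(literal: str) -> str:
    """Decode X'...', O'...', or B'...' into a string, one character per group."""
    if len(literal) < 3 or literal[0].upper() not in RADIX:
        raise ValueError("not a radix string: %r" % literal)
    quote = literal[1]
    if quote not in "'\"" or literal[-1] != quote:
        raise ValueError("badly quoted radix string: %r" % literal)
    base, group = RADIX[literal[0].upper()]
    digits = literal[2:-1]
    # An incomplete final group is padded on the right with '0' digits.
    if len(digits) % group:
        digits += "0" * (group - len(digits) % group)
    return "".join(
        chr(int(digits[i:i + group], base))  # assumption: value -> code point
        for i in range(0, len(digits), group)
    )
```

For example, @code{X'4142'}, @code{O'101102'}, and @code{B'0100000101000010'} all decode to the two characters with values 65 and 66. The sketch does not model the special handling of the reserved value 00.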
132 @cindex characters, reserved
135 The character with value 00 is reserved for
136 internal use by PSPP. Its use in strings causes an error and
137 replacement by a space character.
139 @item Punctuators and Operators
142 These tokens are the punctuators and operators:
145 @center , / = ( ) + - * / ** < <= <> > >= ~= & | .
148 Most of these appear within the syntax of commands, but the period
149 (@samp{.}) punctuator is used only at the end of a command. It is a
150 punctuator only as the last character on a line (except white space).
151 When it is the last non-space character on a line, a period is not
152 treated as part of another token, even if it would otherwise be part
153 of e.g.@: an identifier or a floating-point number.
155 Actually, the character that ends a command can be changed with
156 @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
157 doing so. Throughout the remainder of this manual we will assume that
158 the default setting is in effect.
161 @node Commands, Types of Commands, Tokens, Language
162 @section Forming commands of tokens
164 @cindex PSPP, command structure
165 @cindex language, command structure
166 @cindex commands, structure
168 Most PSPP commands share a common structure. A command begins with a
169 command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
170 CASES}. The command name may be abbreviated to its first word, and
171 each word in the command name may be abbreviated to its first three
172 or more characters, where these abbreviations are unambiguous.
174 The command name may be followed by one or more @dfn{subcommands}.
175 Each subcommand begins with a subcommand name, which may be
176 abbreviated to its first three letters. Some subcommands accept a
177 series of one or more specifications, which follow the subcommand name,
178 optionally separated from it by an equals sign (@samp{=}) and
179 optionally separated from each other by commas. Each subcommand must
180 be separated from the next (if any) by a forward slash (@samp{/}).
182 There are multiple ways to mark the end of a command. The most common
183 way is to end the last line of the command with a period (@samp{.}) as
184 described in the previous section (@pxref{Tokens}). A blank line, or
185 one that consists only of white space or comments, also ends a command
186 by default, although you can use the NULLINE subcommand of @cmd{SET}
187 to disable this feature (@pxref{SET}).
189 In batch mode only, that is, when reading commands from a file instead
190 of an interactive user, any line that contains a non-space character
191 in the leftmost column begins a new command. Thus, each command
192 consists of a flush-left line followed by any number of lines indented
193 from the left margin. In this mode, a plus sign, minus sign, or
194 period (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character
195 in a line is ignored and causes that line to begin a new command,
196 which allows for visual indentation of a command without that command
197 being considered part of the previous command.
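The batch-mode rules above can be sketched as a small line splitter. This is a simplified Python illustration, not PSPP's parser: it ignores comments and strings, and the function name @code{split_batch_commands} is hypothetical.

```python
def split_batch_commands(lines):
    """Group source lines into commands using the batch-mode rules."""
    commands, current = [], []

    def flush():
        # Finish the command collected so far, if any.
        if current:
            commands.append(" ".join(current))
            current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line:                 # a blank line ends the command
            flush()
            continue
        starts_new = not line[0].isspace()
        if line[0] in "+-.":         # marker is ignored, but still starts a command
            line = " " + line[1:]
        if starts_new:
            flush()
        text = line.strip()
        ends = text.endswith(".")    # trailing period ends the command
        if ends:
            text = text[:-1].rstrip()
        if text:
            current.append(text)
        if ends:
            flush()
    flush()
    return commands
```

Given the lines @code{FREQUENCIES}, an indented @code{/VARIABLES=x}, and then @code{+  COMPUTE z = x + y}, the sketch returns two commands: the first two lines joined together, and the @code{COMPUTE} line on its own despite its visual indentation.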
199 Sometimes, one encounters syntax files that are intended to be
200 interpreted in interactive mode rather than batch mode. When this
201 occurs, use the @samp{-i} command line option to force interpretation
202 in interactive mode (@pxref{Language control options}).
204 @node Types of Commands, Order of Commands, Commands, Language
205 @section Types of Commands
207 Commands in PSPP are divided roughly into six categories:
210 @item Utility commands
211 @cindex utility commands
212 Set or display various global options that affect PSPP operations.
213 May appear anywhere in a syntax file. @xref{Utilities, , Utility
216 @item File definition commands
217 @cindex file definition commands
218 Give instructions for reading data from text files or from special
219 binary ``system files''. Most of these commands discard any previous
220 data or variables and replace them with the new data and
221 variables. At least one must appear before the first command in any of
222 the categories below. @xref{Data Input and Output}.
224 @item Input program commands
225 @cindex input program commands
226 Though rarely used, these provide powerful tools for reading data files
227 in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
229 @item Transformations
230 @cindex transformations
231 Perform operations on data and write data to output files. Transformations
232 are not carried out until a procedure is executed.
234 @item Restricted transformations
235 @cindex restricted transformations
236 Transformations that cannot appear in certain contexts. @xref{Order
237 of Commands}, for details.
241 Analyze data, writing results of analyses to the listing file. Cause
242 transformations specified earlier in the file to be performed. In a
243 more general sense, a @dfn{procedure} is any command that causes the
244 active file (the data) to be read.
247 @node Order of Commands, Missing Observations, Types of Commands, Language
248 @section Order of Commands
249 @cindex commands, ordering
250 @cindex order of commands
252 PSPP does not place many restrictions on ordering of commands. The
253 main restriction is that variables must be defined before they are otherwise
254 referenced. This section describes the details of command ordering,
255 but most users will have no need to refer to them.
257 PSPP possesses five internal states, called initial, INPUT PROGRAM,
258 FILE TYPE, transformation, and procedure states. (Please note the
259 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
260 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
262 PSPP starts up in the initial state. Each successful completion
263 of a command may cause a state transition. Each type of command has its
264 own rules for state transitions:
267 @item Utility commands
272 Do not cause state transitions. Exception: when @cmd{N OF CASES}
273 is executed in the procedure state, it causes a transition to the
274 transformation state.
277 @item @cmd{DATA LIST}
282 When executed in the initial or procedure state, causes a transition to
283 the transformation state.
285 Clears the active file if executed in the procedure or transformation
289 @item @cmd{INPUT PROGRAM}
292 Invalid in INPUT PROGRAM and FILE TYPE states.
294 Causes a transition to the INPUT PROGRAM state.
296 Clears the active file.
299 @item @cmd{FILE TYPE}
302 Invalid in INPUT PROGRAM and FILE TYPE states.
304 Causes a transition to the FILE TYPE state.
306 Clears the active file.
309 @item Other file definition commands
312 Invalid in INPUT PROGRAM and FILE TYPE states.
314 Cause a transition to the transformation state.
316 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
320 @item Transformations
323 Invalid in initial and FILE TYPE states.
325 Cause a transition to the transformation state.
328 @item Restricted transformations
331 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
333 Cause a transition to the transformation state.
339 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
341 Cause a transition to the procedure state.
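The transition rules listed above can be summarized as a small state machine. The Python sketch below encodes only the rules stated in this section; the command-kind labels and the @code{next_state} function are hypothetical names chosen for illustration, and clearing of the active file is not modeled.

```python
INITIAL, INPUT_PROGRAM, FILE_TYPE, TRANSFORMATION, PROCEDURE = (
    "initial", "INPUT PROGRAM", "FILE TYPE", "transformation", "procedure")

# command kind -> (states where it is invalid, state entered on success)
RULES = {
    "INPUT PROGRAM": ({INPUT_PROGRAM, FILE_TYPE}, INPUT_PROGRAM),
    "FILE TYPE": ({INPUT_PROGRAM, FILE_TYPE}, FILE_TYPE),
    "other file definition": ({INPUT_PROGRAM, FILE_TYPE}, TRANSFORMATION),
    "transformation": ({INITIAL, FILE_TYPE}, TRANSFORMATION),
    "restricted transformation":
        ({INITIAL, INPUT_PROGRAM, FILE_TYPE}, TRANSFORMATION),
    "procedure": ({INITIAL, INPUT_PROGRAM, FILE_TYPE}, PROCEDURE),
}


def next_state(state, kind):
    """Return the state after a command of the given kind completes."""
    if kind == "utility":
        return state                      # no transition
    if kind == "N OF CASES":              # the one utility exception
        return TRANSFORMATION if state == PROCEDURE else state
    if kind == "DATA LIST":
        return TRANSFORMATION if state in (INITIAL, PROCEDURE) else state
    invalid, target = RULES[kind]
    if state in invalid:
        raise ValueError("%s is invalid in the %s state" % (kind, state))
    return target
```

Starting from the initial state, the sequence @cmd{DATA LIST}, a transformation, and a procedure ends in the procedure state, whereas a procedure issued in the initial state is rejected.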
345 @node Missing Observations, Variables, Order of Commands, Language
346 @section Handling missing observations
347 @cindex missing values
348 @cindex values, missing
350 PSPP includes special support for unknown numeric data values.
351 Missing observations are assigned a special value, called the
352 @dfn{system-missing value}. This ``value'' actually indicates the
353 absence of a value; it means that the actual value is unknown. Procedures
354 automatically exclude from analyses those observations or cases that
355 have missing values. Details of missing value exclusion depend on the
356 procedure and can often be controlled by the user; refer to
357 descriptions of individual procedures for details.
359 The system-missing value exists only for numeric variables. String
360 variables always have a defined value, even if it is only a string of
363 Variables, whether numeric or string, can have designated
364 @dfn{user-missing values}. Every user-missing value is an actual value
365 for that variable. However, most of the time user-missing values are
366 treated in the same way as the system-missing value. String variables
367 that are wider than a certain width, usually 8 characters (depending on
368 computer architecture), cannot have user-missing values.
370 For more information on missing values, see the following sections:
371 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
372 documentation on individual procedures for information on how they
373 handle missing values.
375 @node Variables, Files, Missing Observations, Language
380 Variables are the basic unit of data storage in PSPP. All the
381 variables in a file taken together, apart from any associated data, are
382 said to form a @dfn{dictionary}.
383 Some details of variables are described in the sections below.
386 * Attributes:: Attributes of variables.
387 * System Variables:: Variables automatically defined by PSPP.
388 * Sets of Variables:: Lists of variable names.
389 * Input/Output Formats:: Input and output formats.
390 * Scratch Variables:: Variables deleted by procedures.
393 @node Attributes, System Variables, Variables, Variables
394 @subsection Attributes of Variables
395 @cindex variables, attributes of
396 @cindex attributes of variables
397 Each variable has a number of attributes, including:
401 An identifier, up to 64 bytes long. Each variable must have a different name.
404 Some system variable names begin with @samp{$}, but user-defined
405 variables' names may not begin with @samp{$}.
409 @cindex variable names, ending with period
410 The final character in a variable name should not be @samp{.}, because
411 such an identifier will be misinterpreted when it is the final token
412 on a line: @code{FOO.} will be divided into two separate tokens,
413 @samp{FOO} and @samp{.}, indicating end-of-command. @xref{Tokens}.
416 The final character in a variable name should not be @samp{_}, because
417 some such identifiers are used for special purposes by PSPP
420 As with all PSPP identifiers, variable names are not case-sensitive.
421 PSPP capitalizes variable names on output the same way they were
422 capitalized at their point of definition in the input.
424 @cindex variables, type
425 @cindex type of variables
429 @cindex variables, width
430 @cindex width of variables
432 (string variables only) String variables with a width of 8 characters or
433 fewer are called @dfn{short string variables}. Short string variables
434 can be used in many procedures where @dfn{long string variables} (those
435 with widths greater than 8) are not allowed.
437 Certain systems may consider strings longer than 8
438 characters to be short strings. Eight characters represents a minimum
439 figure for the maximum length of a short string.
442 Variables in the dictionary are arranged in a specific order.
443 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
446 Either reinitialized to 0 or spaces for each case, or left at its
447 existing value. @xref{LEAVE}.
449 @cindex missing values
450 @cindex values, missing
452 Optionally, up to three values, or a range of values, or a specific
453 value plus a range, can be specified as @dfn{user-missing values}.
454 There is also a @dfn{system-missing value} that is assigned to an
455 observation when there is no other obvious value for that observation.
456 Observations with missing values are automatically excluded from
457 analyses. User-missing values are actual data values, while the
458 system-missing value is not a value at all. @xref{Missing Observations}.
460 @cindex variable labels
461 @cindex labels, variable
463 A string that describes the variable. @xref{VARIABLE LABELS}.
466 @cindex labels, value
468 Optionally, these associate each possible value of the variable with a
469 string. @xref{VALUE LABELS}.
473 Display width, format, and (for numeric variables) number of decimal
474 places. This attribute does not affect how data are stored, just how
475 they are displayed. Example: a width of 8, with 2 decimal places.
476 @xref{PRINT FORMATS}.
480 Similar to print format, but used by certain commands that are
481 designed to write to binary files. @xref{WRITE FORMATS}.
484 @node System Variables, Sets of Variables, Attributes, Variables
485 @subsection Variables Automatically Defined by PSPP
486 @cindex system variables
487 @cindex variables, system
489 There are seven system variables. These are not like ordinary
490 variables, as they are not stored in each case. They can only be used
491 in expressions. These system variables, whose values and output formats
492 cannot be modified, are described below.
495 @cindex @code{$CASENUM}
497 Case number of the case at the moment. This changes as cases are
502 Date the PSPP process was started, in format A9, following the
503 pattern @code{DD MMM YY}.
505 @cindex @code{$JDATE}
507 Number of days between 15 Oct 1582 and the time the PSPP process
510 @cindex @code{$LENGTH}
512 Page length, in lines, in format F11.
514 @cindex @code{$SYSMIS}
516 System missing value, in format F1.
520 Number of seconds between midnight 14 Oct 1582 and the time the active file
521 was read, in format F20.
523 @cindex @code{$WIDTH}
525 Page width, in characters, in format F3.
528 @node Sets of Variables, Input/Output Formats, System Variables, Variables
529 @subsection Lists of variable names
530 @cindex TO convention
531 @cindex convention, TO
533 To refer to a set of variables, list their names one after another.
534 Optionally, their names may be separated by commas. To include a
535 range of variables from the dictionary in the list, write the name of
536 the first and last variable in the range, separated by @code{TO}. For
537 instance, if the dictionary contains six variables with the names
538 @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
539 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
540 variables @code{X2}, @code{GOAL}, and @code{MET}.
542 Commands that define variables, such as @cmd{DATA LIST}, give
543 @code{TO} an alternate meaning. With these commands, @code{TO} defines
544 sequences of variables whose names end in consecutive integers. The
545 syntax is two identifiers that begin with the same root and end with
546 numbers, separated by @code{TO}. The syntax @code{X1 TO X5} defines 5
547 variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
548 @code{X5}. The syntax @code{ITEM0008 TO ITEM0013} defines 6
549 variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
550 @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}. The syntaxes
551 @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.
553 After a set of variables has been defined with @cmd{DATA LIST} or
554 another command with this method, the same set can be referenced on
555 later commands using the same syntax.
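The variable-defining form of @code{TO} can be sketched in Python as shown below. The manual does not spell out exactly why @code{QUES001 TO QUES9} is invalid, so this sketch assumes two checks: the numbers must be in ascending order, and the second number may not have fewer digits than the first. The function name @code{expand_to} is hypothetical.

```python
import re
from typing import List


def expand_to(first: str, last: str) -> List[str]:
    """Expand 'first TO last' into the list of variable names it defines."""
    m1 = re.fullmatch(r"(.*?)(\d+)", first)
    m2 = re.fullmatch(r"(.*?)(\d+)", last)
    if not m1 or not m2:
        raise ValueError("both names must end in a number")
    root1, digits1 = m1.groups()
    root2, digits2 = m2.groups()
    if root1.upper() != root2.upper():      # identifiers ignore case
        raise ValueError("names must share the same root")
    lo, hi = int(digits1), int(digits2)
    if lo > hi:
        raise ValueError("numbers must be in ascending order")
    if len(digits2) < len(digits1):         # assumption, see lead-in
        raise ValueError("inconsistent number of digits")
    width = len(digits1)                    # keep the first name's zero padding
    return ["%s%0*d" % (root1, width, n) for n in range(lo, hi + 1)]
```

Under these assumptions, @code{expand_to("ITEM0008", "ITEM0013")} returns the six names listed above, and both @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} raise an error.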
557 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
558 @subsection Input and Output Formats
560 Data that PSPP inputs and outputs must have one of a number of formats.
561 These formats are described, in general, by a format specification of
562 the form @code{NAMEw.d}, where @var{name} is the
563 format name and @var{w} is a field width. @var{d} is the optional
564 desired number of decimal places, if appropriate. If @var{d} is not
565 included then it is assumed to be 0. Some formats do not allow @var{d}
568 When an input format is specified on @cmd{DATA LIST} or another
570 it is converted to an output format for the purposes of @cmd{PRINT}
572 data output commands. For most purposes, input and output formats are
573 the same; the salient differences are described below.
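To make the @code{NAMEw.d} notation concrete, the following Python sketch splits a format specification into its three parts. It is an illustration only and does not check the per-format width bounds; @code{parse_format} is a hypothetical name.

```python
import re

# A format name (letters), a width, and an optional ".d" decimal count.
_SPEC = re.compile(r"([A-Za-z]+)(\d+)(?:\.(\d+))?")


def parse_format(spec: str):
    """Return (name, w, d) for a specification such as 'F8.2' or 'A10'."""
    m = _SPEC.fullmatch(spec.strip())
    if not m:
        raise ValueError("not a format specification: %r" % spec)
    name, w, d = m.groups()
    # When d is omitted it is assumed to be 0, as stated above.
    return name.upper(), int(w), (int(d) if d else 0)
```

For instance, @code{F8.2} parses as name @code{F}, width 8, and 2 decimal places, while @code{DATETIME17} parses with @var{d} equal to 0.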
575 Below are listed the input and output formats supported by PSPP. If an
576 input format is mapped to a different output format by default, then
577 that mapping is indicated with @result{}. Each format has the listed
578 bounds on input width (iw) and output width (ow).
580 The standard numeric input and output formats are given in the following
584 @item Fw.d: 1 <= iw,ow <= 40
585 Standard decimal format with @var{d} decimal places. If the number is
586 too large to fit within the field width, it is expressed in scientific
587 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
588 the exponent. When used as an input format, scientific notation is
589 allowed but an E or an F must be used to introduce the exponent.
591 The default output format is the same as the input format, except if
592 @var{d} > 1. In that case the output @var{w} is always made to be at
595 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
596 For input this is equivalent to F format except that no E or F is
597 required to introduce the exponent. For output, produces scientific
598 notation in the form @code{1.2+34}. There are always at least two
599 digits given in the exponent.
601 The default output @var{w} is the largest of the input @var{w}, the
602 input @var{d} + 7, and 10. The default output @var{d} is the input
603 @var{d}, but at least 3.
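The default mapping for E format is simple arithmetic, shown here as a Python sketch (the function name is hypothetical):

```python
def e_default_output(w: int, d: int):
    """Default output (w, d) for an E input format with the given w and d."""
    out_w = max(w, d + 7, 10)  # largest of input w, input d + 7, and 10
    out_d = max(d, 3)          # input d, but at least 3
    return out_w, out_d
```

An input format of E5.0 therefore maps to a default output format of E10.3, and E12.8 maps to E15.8.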
605 @item COMMAw.d: 1 <= iw,ow <= 40
606 Equivalent to F format, except that groups of three digits are
607 comma-separated on output. If the number is too large to express in the
608 field width, then first commas are eliminated, then if there is still
609 not enough space the number is expressed in scientific notation given
610 that w >= 6. Commas are allowed and ignored when this is used as an
613 @item DOTw.d: 1 <= iw,ow <= 40
614 Equivalent to COMMA format except that the roles of comma and decimal
615 point are interchanged. However: If SET /DECIMAL=DOT is in effect, then
616 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
619 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
620 Equivalent to COMMA format, except that the number is prefixed by a
621 dollar sign (@samp{$}) if there is room. On input the value is allowed
622 to be prefixed by a dollar sign, which is ignored.
624 The default output @var{w} is the input @var{w}, but at least 2.
626 @item PCTw.d: 2 <= iw,ow <= 40
627 Equivalent to F format, except that the number is suffixed by a percent
628 sign (@samp{%}) if there is room. On input the value is allowed to be
629 suffixed by a percent sign, which is ignored.
631 The default output @var{w} is the input @var{w}, but at least 2.
633 @item Nw.d: 1 <= iw,ow <= 40
634 Only digits are allowed within the field width. The decimal point is
635 assumed to be @var{d} digits from the right margin.
637 The default output format is F with the same @var{w} and @var{d}, except
638 if @var{d} > 1. In that case the output @var{w} is always made to be at
641 @item Zw.d @result{} F: 1 <= iw,ow <= 40
642 Zoned decimal input. If you need to use this then you know how.
644 @item IBw.d @result{} F: 1 <= iw,ow <= 8
645 Integer binary format. The field is interpreted as a fixed-point
646 positive or negative binary number in two's-complement notation. The
647 location of the decimal point is implied. Endianness is the same as the
650 The default output format is F8.2 if @var{d} is 0. Otherwise it is F,
651 with output @var{w} as 9 + input @var{d} and output @var{d} as input
654 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
655 Positive integer binary format. The field is interpreted as a
656 fixed-point positive binary number. The location of the decimal point
657 is implied. Endianness is the same as the host machine.
659 The default output format follows the rules for IB format.
661 @item Pw.d @result{} F: 1 <= iw,ow <= 16
662 Binary coded decimal format. Each byte from left to right, except the
663 rightmost, represents two digits. The upper nibble of each byte is more
664 significant. The upper nibble of the final byte is the least
665 significant digit. The lower nibble of the final byte is the sign; a
666 value of D represents a negative sign and all other values are
667 considered positive. The decimal point is implied.
669 The default output format follows the rules for IB format.
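The nibble layout of P format can be illustrated with a short Python decoder. This is a sketch of the layout described above rather than PSPP's own code; @code{decode_packed} is a hypothetical name, and rejecting digit nibbles above 9 is an assumption.

```python
def decode_packed(field: bytes, d: int = 0) -> float:
    """Decode a P-format (packed decimal) field with d implied decimal places."""
    if not field:
        raise ValueError("empty field")
    digits = []
    for byte in field[:-1]:
        digits += [byte >> 4, byte & 0x0F]   # upper nibble is more significant
    digits.append(field[-1] >> 4)            # last byte: upper nibble is a digit
    sign_nibble = field[-1] & 0x0F           # last byte: lower nibble is the sign
    if any(x > 9 for x in digits):           # assumption: reject non-decimal nibbles
        raise ValueError("invalid digit nibble")
    value = int("".join(str(x) for x in digits))
    if sign_nibble == 0xD:                   # D means negative; all else positive
        value = -value
    return value / 10 ** d
```

For example, the three bytes with hexadecimal values 12, 34, and 5C decode to 123.45 when @var{d} is 2, and the bytes 01 and 2D decode to -12 when @var{d} is 0.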
671 @item PKw.d @result{} F: 1 <= iw,ow <= 16
672 Positive binary coded decimal format. Same as P but the last byte is the
675 The default output format follows the rules for IB format.
677 @item RBw @result{} F: 2 <= iw,ow <= 8
679 Binary C architecture-dependent ``double'' format. For a standard
680 IEEE754 implementation @var{w} should be 8.
682 The default output format follows the rules for IB format.
684 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
685 PIB format encoded as textual hex digit pairs. @var{w} must be even.
687 The input width is mapped to a default output width as follows:
688 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
689 12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for
692 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
694 RB format encoded as textual hex digit pairs. @var{w} must be even.
696 The default output format is F8.2.
698 @item CCAw.d: 1 <= ow <= 40
699 @itemx CCBw.d: 1 <= ow <= 40
700 @itemx CCCw.d: 1 <= ow <= 40
701 @itemx CCDw.d: 1 <= ow <= 40
702 @itemx CCEw.d: 1 <= ow <= 40
704 User-defined custom currency formats. May not be used as an input
705 format. @xref{SET}, for more details.
708 The date and time numeric input and output formats accept a number of
709 possible formats. Before describing the formats themselves, some
710 definitions of the elements that make up their formats will be helpful:
714 All formats accept an optional white space leader.
717 An integer between 1 and 31 representing the day of month.
720 An integer representing a number of days.
723 One or more characters of white space or the following characters:
727 A month name in one of the following forms:
730 An integer between 1 and 12.
732 Roman numerals representing an integer between 1 and 12.
734 At least the first three characters of an English month name (January,
739 An integer year number between 1582 and 19999, or between 1 and 199.
740 Years between 1 and 199 will have 1900 added.
743 A single number with a year number in the first 2, 3, or 4 digits (as
744 above) and the day number within the year in the last 3 digits.
747 An integer between 1 and 4 representing a quarter.
750 The letter @samp{Q} or @samp{q}.
753 An integer between 1 and 53 representing a week within a year.
756 The letters @samp{wk} in any case.
759 At least one character of white space or @samp{:} or @samp{.}.
762 An integer greater than 0 representing an hour.
765 An integer between 0 and 59 representing a minute within an hour.
768 Optionally, a time-delimiter followed by a real number representing a
772 An integer between 0 and 23 representing an hour within a day.
775 At least the first two characters of an English day word.
778 Any amount or no amount of white space.
781 An optional positive or negative sign.
784 All formats accept an optional white space trailer.
787 The date input formats are strung together from the above pieces. On
788 output, the date formats are always printed in a single canonical
789 manner, based on field width. The date input and output formats are
793 @item DATEw: 9 <= iw,ow <= 40
794 Date format. Input format: leader + day + date-delimiter +
795 month + date-delimiter + year + trailer. Output format: DD-MMM-YY for
796 @var{w} < 11, DD-MMM-YYYY otherwise.
798 @item EDATEw: 8 <= iw,ow <= 40
799 European date format. Input format same as DATE. Output format:
800 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
802 @item SDATEw: 8 <= iw,ow <= 40
803 Standard date format. Input format: leader + year + date-delimiter +
804 month + date-delimiter + day + trailer. Output format: YY/MM/DD for
805 @var{w} < 10, YYYY/MM/DD otherwise.
807 @item ADATEw: 8 <= iw,ow <= 40
808 American date format. Input format: leader + month + date-delimiter +
809 day + date-delimiter + year + trailer. Output format: MM/DD/YY for
810 @var{w} < 10, MM/DD/YYYY otherwise.
812 @item JDATEw: 5 <= iw,ow <= 40
813 Julian date format. Input format: leader + julian + trailer. Output
814 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
816 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
817 Quarter/year format. Input format: leader + quarter + q-delimiter +
818 year + trailer. Output format: @samp{Q Q YY}, where the first
819 @samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
822 @item MOYRw: 6 <= iw,ow <= 40
823 Month/year format. Input format: leader + month + date-delimiter + year
824 + trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
827 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
828 Week/year format. Input format: leader + week + wk-delimiter + year +
829 trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
832 @item DATETIMEw.d: 17 <= iw,ow <= 40
833 Date and time format. Input format: leader + day + date-delimiter +
834 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
835 + minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
836 @var{w} > 19 then seconds @samp{:SS} is added. If @var{w} > 22 and
837 @var{d} > 0 then fractional seconds @samp{.SS} are added.
839 @item TIMEw.d: 5 <= iw,ow <= 40
840 Time format. Input format: leader + sign + spaces + hour +
841 time-delimiter + minute + opt-second. Output format: @samp{HH:MM}.
842 Seconds and fractional seconds are available with @var{w} of at least 8
843 and 10, respectively.
845 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
846 Time format with day count. Input format: leader + sign + spaces +
847 day-count + time-delimiter + hour + time-delimiter + minute +
848 opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional
849 seconds are available with @var{w} of at least 8 and 10, respectively.
851 @item WKDAYw: 2 <= iw,ow <= 40
852 A weekday as a number between 1 and 7, where 1 is Sunday. Input format:
853 leader + weekday + trailer. Output format: as many characters, in all
854 capital letters, of the English name of the weekday as will fit in the
857 @item MONTHw: 3 <= iw,ow <= 40
858 A month as a number between 1 and 12, where 1 is January. Input format:
859 leader + month + trailer. Output format: as many characters, in all
860 capital letters, of the English name of the month as will fit in the
864 There are only two formats that may be used with string variables:
867 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
868 The entire field is treated as a string value.
870 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
871 The field is composed of characters in a string encoded as textual hex
874 The default output @var{w} is half the input @var{w}.
877 @node Scratch Variables, , Input/Output Formats, Variables
878 @subsection Scratch Variables
880 Most of the time, variables don't retain their values between cases.
881 Instead, either they're being read from a data file or the active file,
882 in which case they assume the value read, or, if created with
884 another transformation, they're initialized to the system-missing value
885 or to blanks, depending on type.
887 However, sometimes it's useful to have a variable that keeps its value
888 between cases. You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
889 use a @dfn{scratch variable}. Scratch variables are variables whose
890 names begin with an octothorpe (@samp{#}).
892 Scratch variables have the same properties as variables left with
894 they retain their values between cases, and for the first case they are
895 initialized to 0 or blanks. They have the additional property that they
896 are deleted before the execution of any procedure. For this reason,
897 scratch variables can't be used for analysis. To obtain the same
898 effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
899 value into an ordinary variable, then analyze that variable.
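
A minimal sketch of this technique follows.  The variable names
@code{x}, @code{#total}, and @code{runtotal} and the data values are
made up for illustration.  The scratch variable @code{#total} starts at
0 and keeps its value from one case to the next, so it accumulates a
running sum, which @cmd{COMPUTE} copies into the ordinary variable
@code{runtotal}.

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.
COMPUTE #total = #total + x.
COMPUTE runtotal = #total.
LIST.
@end example

Because @code{#total} is deleted before @cmd{LIST} executes, only
@code{x} and @code{runtotal} should appear in the listing, with
@code{runtotal} holding the running totals 1, 3, and 6.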
901 @node Files, BNF, Variables, Language
902 @section Files Used by PSPP
904 PSPP makes use of many files each time it runs. Some of these it
905 reads, some it writes, some it creates. Here is a table listing the
906 most important of these files:
909 @cindex file, command
910 @cindex file, syntax file
915 These names (synonyms) refer to the file that contains instructions to
916 PSPP that tell it what to do. The syntax file's name is specified on
917 the PSPP command line. Syntax files can also be pulled in with
918 @cmd{INCLUDE} (@pxref{INCLUDE}).
923 Data files contain raw data in ASCII format suitable for being read in
924 by @cmd{DATA LIST}. Data can be embedded in the syntax
925 file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
926 syntax file a data file too.
931 One or more output files are created by PSPP each time it is
932 run. The output files receive the tables and charts produced by
933 statistical procedures. The output files may be in any number of formats,
934 depending on how PSPP is configured.
939 The active file is the ``file'' on which all PSPP procedures
940 are performed. The active file contains variable definitions and
941 cases. The active file is not necessarily a disk file: it is stored
942 in memory if there is room.
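
As a small illustration of how the files described here relate to one
another, the following sketch is a complete syntax file that embeds its
own data, so that it also serves as its data file.  The variable names
and values are invented for the example.

@example
DATA LIST LIST /id score.
BEGIN DATA.
1 10
2 20
END DATA.
LIST.
@end example

When PSPP runs this syntax file, the two cases should be read into the
active file, and the table produced by @cmd{LIST} should be written to
whatever output files PSPP is configured to create.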
945 @node BNF, , Files, Language
946 @section Backus-Naur Form
948 @cindex Backus-Naur Form
949 @cindex command syntax, description of
950 @cindex description of command syntax
952 The syntax of some parts of the PSPP language is presented in this
953 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
954 following table describes BNF:
960 Words in all-uppercase are PSPP keyword tokens. In BNF, these are
961 often called @dfn{terminals}. There are some special terminals, which
962 are actually written in lowercase for clarity:
965 @cindex @code{number}
969 @cindex @code{integer}
973 @cindex @code{string}
977 @cindex @code{var-name}
978 @item @code{var-name}
979 A single variable name.
983 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
984 Operators and punctuators.
988 The end of the command.  This is not necessarily an actual dot in the
989 syntax file.  @xref{Commands}, for more details.
995 Other words in all lowercase refer to BNF definitions, called
996 @dfn{productions}. These productions are also known as
997 @dfn{nonterminals}. Some nonterminals are very common, so they are
998 defined here in English for clarity:
1001 @cindex @code{var-list}
1003 A list of one or more variable names or the keyword @code{ALL}.
1005 @cindex @code{expression}
1007 An expression. @xref{Expressions}, for details.
1012 @cindex ``is defined as''
1014 @samp{::=} means ``is defined as''. The left side of @samp{::=} gives
1015 the name of the nonterminal being defined. The right side of @samp{::=}
1016 gives the definition of that nonterminal. If the right side is empty,
1017 then one possible expansion of that nonterminal is nothing. A BNF
1018 definition is called a @dfn{production}.
1021 @cindex terminals and nonterminals, differences
1022 So, the key difference between a terminal and a nonterminal is that a
1023 terminal cannot be broken into smaller parts---in fact, every terminal
1024 is a single token (@pxref{Tokens}). On the other hand, nonterminals are
1025 composed of a (possibly empty) sequence of terminals and nonterminals.
1026 Thus, terminals indicate the deepest level of syntax description. (In
1027 parsing theory, terminals are the leaves of the parse tree; nonterminals
1031 @cindex start symbol
1032 @cindex symbol, start
1033 The first nonterminal defined in a set of productions is called the
1034 @dfn{start symbol}. The start symbol defines the entire syntax for
1037 @setfilename ignored