1 @node Language, Expressions, Invocation, Top
2 @chapter The PSPP language
7 @strong{Please note:} PSPP is not even close to completion.
8 Only a few actual statistical procedures are implemented. PSPP
12 This chapter discusses elements common to many PSPP commands.
13 Later chapters will describe individual commands in detail.
16 * Tokens:: Characters combine to form tokens.
17 * Commands:: Tokens combine to form commands.
18 * Types of Commands:: Commands come in several flavors.
19 * Order of Commands:: Commands combine to form syntax files.
20 * Missing Observations:: Handling missing observations.
21 * Variables:: The unit of data storage.
22 * Files:: Files used by PSPP.
23 * BNF:: How command syntax is described.
26 @node Tokens, Commands, Language, Language
28 @cindex language, lexical analysis
29 @cindex language, tokens
31 @cindex lexical analysis
PSPP divides most syntax file lines into a series of short chunks
35 called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}. These
36 tokens are then grouped to form commands, each of which tells
37 PSPP to take some action---read in data, write out data, perform
38 a statistical procedure, etc. The process of dividing input into tokens
39 is @dfn{tokenization}, or @dfn{lexical analysis}. Each type of token is
44 Tokens must be separated from each other by @dfn{delimiters}.
45 Delimiters include whitespace (spaces, tabs, carriage returns, line
46 feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
operators (plus, minus, times, divide, etc.). Note that while whitespace
48 only separates tokens, other delimiters are tokens in themselves.
53 Identifiers are names that specify variable names, commands, or command
58 The first character in an identifier must be a letter, @samp{#}, or
59 @samp{@@}. Some system identifiers begin with @samp{$}, but
60 user-defined variables' names may not begin with @samp{$}.
63 The remaining characters in the identifier must be letters, digits, or
64 one of the following special characters:
71 @cindex variable names
72 @cindex names, variable
Variable names may be any length up to 64 bytes.
77 @cindex case-sensitivity
78 Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
79 @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
80 representations of the same identifier.
84 Identifiers other than variable names may be abbreviated to their first
85 3 characters if this abbreviation is unambiguous. These identifiers are
86 often called @dfn{keywords}. (Unique abbreviations of 3 or more
87 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
88 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
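
As a rough illustration of the abbreviation rule, the following Python
sketch resolves a possibly abbreviated token against a list of candidate
keywords. The function names @code{matches_keyword} and
@code{resolve_keyword} are hypothetical and are not part of PSPP. The
sketch also assumes that an exact match always wins over a longer
candidate that shares the same prefix.

```python
def matches_keyword(token, keyword):
    """True if token is keyword or a prefix of it with 3+ characters."""
    token, keyword = token.upper(), keyword.upper()  # not case-sensitive
    return token == keyword or (len(token) >= 3 and keyword.startswith(token))


def resolve_keyword(token, candidates):
    """Return the single keyword that token names, or None.

    None means the token matches nothing or is ambiguous.
    """
    wanted = token.upper()
    for keyword in candidates:
        if keyword.upper() == wanted:
            return keyword  # assumption: an exact match always wins
    hits = [k for k in candidates if matches_keyword(token, k)]
    return hits[0] if len(hits) == 1 else None
```

With the candidates @code{FREQUENCIES} and @code{DESCRIPTIVES}, the
tokens @code{fre}, @code{FREQ}, and @code{Frequencies} all resolve to
@code{FREQUENCIES}, while @code{FR} is too short to match anything.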
91 Whether an identifier is a keyword depends on the context.
94 @cindex keywords, reserved
95 @cindex reserved keywords
96 Some keywords are reserved. These keywords may not be used in any
97 context besides those explicitly described in this manual. The reserved
101 ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
105 Since keywords are identifiers, all the rules for identifiers apply.
106 Specifically, they must be delimited as are other identifiers:
107 @code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
113 @cindex variable names, ending with period
114 @strong{Caution:} It is legal to end a variable name with a period, but
115 @emph{don't do it!} The variable name will be misinterpreted when it is
116 the final token on a line: @code{FOO.} will be divided into two separate
117 tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
118 @xref{Commands, , Forming commands of tokens}.
124 Numbers may be specified as integers or reals. Integers are internally
125 converted into reals. Scientific notation is not supported. Here are
126 some examples of valid numbers:
129 1234 3.14159265359 .707106781185 8945.
132 @strong{Caution:} The last example will be interpreted as two tokens,
133 @samp{8945} and @samp{.}, if it is the last token on a line.
139 @cindex case-sensitivity
140 Strings are literal sequences of characters enclosed in pairs of single
141 quotes (@samp{'}) or double quotes (@samp{"}).
145 Whitespace and case of letters @emph{are} significant inside strings.
147 Whitespace characters inside a string are not delimiters.
149 To include single-quote characters in a string, enclose the string in
152 To include double-quote characters in a string, enclose the string in
155 It is not possible to put both single- and double-quote characters
161 Hexstrings are string variants that use hex digits to specify
166 A hexstring may be used anywhere that an ordinary string is allowed.
171 A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
175 No whitespace is allowed between the initial @samp{X} and @samp{'}.
178 Double quotes @samp{"} may be used in place of single quotes @samp{'} if
182 Each pair of hex digits is internally changed into a single character
183 with the given value.
186 If there is an odd number of hex digits, the missing last digit is
187 assumed to be @samp{0}.
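
The decoding rules above can be sketched in Python as follows. This is
only an illustration, not PSPP's implementation: the name
@code{decode_hexstring} is hypothetical, the result is returned as raw
bytes, and the sketch assumes that the closing quote must be the same
character as the opening quote. It does not model any character values
that PSPP treats specially.

```python
import string


def decode_hexstring(token):
    """Decode a hexstring token such as X'414243' into bytes."""
    if len(token) < 3 or token[0] not in "Xx":
        raise ValueError("a hexstring starts with X or x")
    quote = token[1]  # no whitespace is allowed between X and the quote
    if quote not in "'\"" or token[-1] != quote:
        raise ValueError("hex digits must be enclosed in matching quotes")
    digits = token[2:-1]
    if any(ch not in string.hexdigits for ch in digits):
        raise ValueError("only hex digits are allowed between the quotes")
    if len(digits) % 2 == 1:
        digits += "0"  # a missing last digit is assumed to be 0
    return bytes.fromhex(digits)  # each digit pair becomes one byte
```

For example, @code{X'414243'} decodes to the three bytes with values
hex 41, 42, and 43, and @code{X'4'} is treated as @code{X'40'}.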
191 @strong{Please note:} Use of hexstrings is nonportable because the same
192 numeric values are associated with different glyphs by different
193 operating systems. Therefore, their use should be confined to syntax
194 files that will not be widely distributed.
197 @cindex characters, reserved
200 @strong{Please note also:} The character with value 00 is reserved for
201 internal use by PSPP. Its use in strings causes an error and
202 replacement with a blank space (in ASCII, hex 20, decimal 32).
207 Punctuation separates tokens; punctuators are delimiters. These are the
208 punctuation characters:
216 Operators describe mathematical operations. Some operators are delimiters:
222 Many of the above operators are also punctuators. Punctuators are
223 distinguished from operators by context.
225 The other operators are all reserved keywords. None of these are
229 AND EQ GE GT LE LT NE OR
234 @cindex dot, terminal
237 A period (@samp{.}) at the end of a line (except for whitespace) is one
type of @dfn{terminal dot}, although not every terminal dot is a
239 period at the end of a line. @xref{Commands, , Forming commands of
240 tokens}. A period is a terminal dot @emph{only}
241 when it is at the end of a line; otherwise it is part of a
242 floating-point number. (A period outside a number in the middle of a
246 @cindex terminal dot, changing
247 @cindex dot, terminal, changing
248 @strong{Please note:} The character used for the @dfn{terminal dot}
249 can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}). This
is strongly discouraged, and throughout the remainder of this
251 manual it will be assumed that the default setting is in effect.
256 @node Commands, Types of Commands, Tokens, Language
257 @section Forming commands of tokens
259 @cindex PSPP, command structure
260 @cindex language, command structure
261 @cindex commands, structure
263 Most PSPP commands share a common structure, diagrammed below:
266 @var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
270 In the above, rather daunting, expression, pairs of square brackets
271 (@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
272 indicate parts of the syntax that vary from command to command.
273 Ellipses (@samp{...}) indicate that the preceding part may be repeated
274 an arbitrary number of times. Let's pick apart what it says above:
277 @cindex commands, names
279 A command begins with a command name of one or more keywords, such as
280 @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}. @var{cmd}
281 may be abbreviated to its first word if that is unambiguous; each word
282 in @var{cmd} may be abbreviated to a unique prefix of three or more
283 characters as described above.
287 The command name may be followed by one or more @dfn{subcommands}:
291 Each subcommand begins with a unique keyword, indicated by @var{sbc}
292 above. This is analogous to the command name.
295 The subcommand name is optionally followed by an equals sign (@samp{=}).
298 Some subcommands accept a series of one or more specifications
299 (@var{spec}), optionally separated by commas.
302 Each subcommand must be separated from the next (if any) by a forward
306 @cindex dot, terminal
309 Each command must be terminated with a @dfn{terminal dot}.
310 The terminal dot may be given one of three ways:
314 (most commonly) A period character at the very end of a line, as
(only if NULLINE is on; @pxref{SET, , Setting user preferences}, for more
319 details.) A completely blank line.
322 (in batch mode only) Any line that is not indented from the left side of
323 the page causes a terminal dot to be inserted before that line.
324 Therefore, each command begins with a line that is flush left, followed
325 by zero or more lines that are indented one or more characters from the
328 In batch mode, PSPP will ignore a plus sign, minus sign, or period
329 (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
330 line. Any of these characters as the first character on a line will
331 begin a new command. This allows for visual indentation of a command
332 without that command being considered part of the previous command.
334 PSPP is in batch mode when it is reading input from a file, rather
335 than from an interactive user. Note that the other forms of the
336 terminal dot may also be used in batch mode.
338 Sometimes, one encounters syntax files that are intended to be
339 interpreted in interactive mode rather than batch mode (for instance,
340 this can happen if a session log file is used directly as a syntax
341 file). When this occurs, use the @samp{-i} command line option to force
342 interpretation in interactive mode (@pxref{Language control options}).
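
The batch-mode rules above can be sketched in Python as follows. The
function name @code{split_batch_commands} and the sample input are
hypothetical. The sketch assumes that NULLINE is off (blank lines are
simply skipped) and it does not tokenize the input, so a period that
ends a line inside a quoted string would be misread as a terminal dot.

```python
def split_batch_commands(lines):
    """Split syntax lines into commands using the batch-mode rules."""
    commands = []
    current = []

    def flush():
        # Emit the pending command, if any; empty commands are dropped.
        if current:
            commands.append(" ".join(current))
            current.clear()

    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # assumption: NULLINE is off, so blank lines are skipped
        starts_new = not line[0].isspace()
        if line[0] in "+-.":
            line = line[1:]  # the marker character itself is ignored
        if starts_new:
            flush()  # a flush-left line implies a terminal dot before it
        text = line.strip()
        ends_command = text.endswith(".")
        if ends_command:
            text = text[:-1].rstrip()  # period at the very end of the line
        if text:
            current.append(text)
        if ends_command:
            flush()
    flush()
    return commands


# Made-up input: the plus sign starts a new command that looks indented.
example = ["FREQUENCIES", "    /VARIABLES=X Y", "+   DESCRIPTIVES X."]
# split_batch_commands(example)
# returns ["FREQUENCIES /VARIABLES=X Y", "DESCRIPTIVES X"]
```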
346 PSPP ignores empty commands when they are generated by the above
347 rules. Note that, as a consequence of these rules, each command must
350 @node Types of Commands, Order of Commands, Commands, Language
351 @section Types of Commands
353 Commands in PSPP are divided roughly into six categories:
356 @item Utility commands
357 @cindex utility commands
358 Set or display various global options that affect PSPP operations.
359 May appear anywhere in a syntax file. @xref{Utilities, , Utility
362 @item File definition commands
363 @cindex file definition commands
364 Give instructions for reading data from text files or from special
365 binary ``system files''. Most of these commands discard any previous
data or variables and replace them with the new data and
367 variables. At least one must appear before the first command in any of
368 the categories below. @xref{Data Input and Output}.
370 @item Input program commands
371 @cindex input program commands
372 Though rarely used, these provide powerful tools for reading data files
373 in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
375 @item Transformations
376 @cindex transformations
377 Perform operations on data and write data to output files. Transformations
378 are not carried out until a procedure is executed.
380 @item Restricted transformations
381 @cindex restricted transformations
382 Same as transformations for most purposes. @xref{Order of Commands}, for a
383 detailed description of the differences.
387 Analyze data, writing results of analyses to the listing file. Cause
388 transformations specified earlier in the file to be performed. In a
389 more general sense, a @dfn{procedure} is any command that causes the
390 active file (the data) to be read.
393 @node Order of Commands, Missing Observations, Types of Commands, Language
394 @section Order of Commands
395 @cindex commands, ordering
396 @cindex order of commands
398 PSPP does not place many restrictions on ordering of commands.
399 The main restriction is that variables must be defined with one of the
400 file-definition commands before they are otherwise referred to.
402 Of course, there are specific rules, for those who are interested.
403 PSPP possesses five internal states, called initial, INPUT PROGRAM,
404 FILE TYPE, transformation, and procedure states. (Please note the
405 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
406 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
408 PSPP starts up in the initial state. Each successful completion
409 of a command may cause a state transition. Each type of command has its
410 own rules for state transitions:
413 @item Utility commands
418 Do not cause state transitions. Exception: when @cmd{N OF CASES}
419 is executed in the procedure state, it causes a transition to the
420 transformation state.
423 @item @cmd{DATA LIST}
428 When executed in the initial or procedure state, causes a transition to
429 the transformation state.
431 Clears the active file if executed in the procedure or transformation
435 @item @cmd{INPUT PROGRAM}
438 Invalid in INPUT PROGRAM and FILE TYPE states.
440 Causes a transition to the INPUT PROGRAM state.
442 Clears the active file.
445 @item @cmd{FILE TYPE}
448 Invalid in INPUT PROGRAM and FILE TYPE states.
450 Causes a transition to the FILE TYPE state.
452 Clears the active file.
455 @item Other file definition commands
458 Invalid in INPUT PROGRAM and FILE TYPE states.
460 Cause a transition to the transformation state.
462 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
466 @item Transformations
469 Invalid in initial and FILE TYPE states.
471 Cause a transition to the transformation state.
474 @item Restricted transformations
477 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
479 Cause a transition to the transformation state.
485 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
487 Cause a transition to the procedure state.
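
The unconditional rules above can be summarized as a small lookup
table, sketched here in Python. This is an illustration only: the names
@code{RULES} and @code{next_state} are hypothetical, the final entry
(the one that leads to the procedure state) is taken to describe
procedures, and @cmd{DATA LIST}, utility commands, and @cmd{N OF CASES}
are left out because their behavior depends on the current state.

```python
STATES = ("initial", "INPUT PROGRAM", "FILE TYPE", "transformation", "procedure")

# command kind -> (states in which it is invalid, state entered on success)
RULES = {
    "INPUT PROGRAM": (("INPUT PROGRAM", "FILE TYPE"), "INPUT PROGRAM"),
    "FILE TYPE": (("INPUT PROGRAM", "FILE TYPE"), "FILE TYPE"),
    "other file definition": (("INPUT PROGRAM", "FILE TYPE"), "transformation"),
    "transformation": (("initial", "FILE TYPE"), "transformation"),
    "restricted transformation": (
        ("initial", "INPUT PROGRAM", "FILE TYPE"),
        "transformation",
    ),
    "procedure": (("initial", "INPUT PROGRAM", "FILE TYPE"), "procedure"),
}


def next_state(state, kind):
    """Return the state reached after a command of the given kind succeeds."""
    if state not in STATES:
        raise ValueError("unknown state: " + state)
    invalid_in, target = RULES[kind]
    if state in invalid_in:
        raise ValueError(kind + " is invalid in the " + state + " state")
    return target
```

For instance, a procedure is rejected in the initial state, but after a
file definition command has moved PSPP to the transformation state the
same procedure is accepted and leads to the procedure state.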
491 @node Missing Observations, Variables, Order of Commands, Language
492 @section Handling missing observations
493 @cindex missing values
494 @cindex values, missing
496 PSPP includes special support for unknown numeric data values.
497 Missing observations are assigned a special value, called the
498 @dfn{system-missing value}. This ``value'' actually indicates the
499 absence of value; it means that the actual value is unknown. Procedures
500 automatically exclude from analyses those observations or cases that
501 have missing values. Whether single observations or entire cases are
502 excluded depends on the procedure.
504 The system-missing value exists only for numeric variables. String
505 variables always have a defined value, even if it is only a string of
508 Variables, whether numeric or string, can have designated
509 @dfn{user-missing values}. Every user-missing value is an actual value
510 for that variable. However, most of the time user-missing values are
511 treated in the same way as the system-missing value. String variables
512 that are wider than a certain width, usually 8 characters (depending on
513 computer architecture), cannot have user-missing values.
515 For more information on missing values, see the following sections:
516 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
517 documentation on individual procedures for information on how they
518 handle missing values.
520 @node Variables, Files, Missing Observations, Language
525 Variables are the basic unit of data storage in PSPP. All the
526 variables in a file taken together, apart from any associated data, are
527 said to form a @dfn{dictionary}.
528 Some details of variables are described in the sections below.
531 * Attributes:: Attributes of variables.
532 * System Variables:: Variables automatically defined by PSPP.
533 * Sets of Variables:: Lists of variable names.
534 * Input/Output Formats:: Input and output formats.
535 * Scratch Variables:: Variables deleted by procedures.
538 @node Attributes, System Variables, Variables, Variables
539 @subsection Attributes of Variables
540 @cindex variables, attributes of
541 @cindex attributes of variables
542 Each variable has a number of attributes, including:
546 This is an identifier. Each variable must have a different name.
549 @cindex variables, type
550 @cindex type of variables
554 @cindex variables, width
555 @cindex width of variables
557 (string variables only) String variables with a width of 8 characters or
558 fewer are called @dfn{short string variables}. Short string variables
559 can be used in many procedures where @dfn{long string variables} (those
560 with widths greater than 8) are not allowed.
563 @strong{Please note:} Certain systems may consider strings longer than 8
564 characters to be short strings. Eight characters represents a minimum
565 figure for the maximum length of a short string.
569 Variables in the dictionary are arranged in a specific order.
570 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
573 Either reinitialized to 0 or spaces for each case, or left at its
574 existing value. @xref{LEAVE}.
576 @cindex missing values
577 @cindex values, missing
579 Optionally, up to three values, or a range of values, or a specific
580 value plus a range, can be specified as @dfn{user-missing values}.
581 There is also a @dfn{system-missing value} that is assigned to an
582 observation when there is no other obvious value for that observation.
583 Observations with missing values are automatically excluded from
584 analyses. User-missing values are actual data values, while the
585 system-missing value is not a value at all. @xref{Missing Observations}.
587 @cindex variable labels
588 @cindex labels, variable
590 A string that describes the variable. @xref{VARIABLE LABELS}.
593 @cindex labels, value
595 Optionally, these associate each possible value of the variable with a
596 string. @xref{VALUE LABELS}.
600 Display width, format, and (for numeric variables) number of decimal
601 places. This attribute does not affect how data are stored, just how
602 they are displayed. Example: a width of 8, with 2 decimal places.
603 @xref{PRINT FORMATS}.
607 Similar to print format, but used by certain commands that are
608 designed to write to binary files. @xref{WRITE FORMATS}.
611 @node System Variables, Sets of Variables, Attributes, Variables
612 @subsection Variables Automatically Defined by PSPP
613 @cindex system variables
614 @cindex variables, system
616 There are seven system variables. These are not like ordinary
617 variables, as they are not stored in each case. They can only be used
618 in expressions. These system variables, whose values and output formats
619 cannot be modified, are described below.
622 @cindex @code{$CASENUM}
Case number of the current case. This changes as cases are
629 Date the PSPP process was started, in format A9, following the
630 pattern @code{DD MMM YY}.
632 @cindex @code{$JDATE}
634 Number of days between 15 Oct 1582 and the time the PSPP process
637 @cindex @code{$LENGTH}
639 Page length, in lines, in format F11.
641 @cindex @code{$SYSMIS}
System-missing value, in format F1.
647 Number of seconds between midnight 14 Oct 1582 and the time the active file
648 was read, in format F20.
650 @cindex @code{$WIDTH}
652 Page width, in characters, in format F3.
655 @node Sets of Variables, Input/Output Formats, System Variables, Variables
656 @subsection Lists of variable names
657 @cindex TO convention
658 @cindex convention, TO
660 There are several ways to specify a set of variables:
664 (Most commonly.) List the variable names one after another, optionally
665 separating them by commas.
669 (This method cannot be used on commands that define the dictionary, such
670 as @cmd{DATA LIST}.) The syntax is the names of two existing variables,
671 separated by the reserved keyword @code{TO}. The meaning is to include
672 every variable in the dictionary between and including the variables
673 specified. For instance, if the dictionary contains six variables with
674 the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
675 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
676 variables @code{X2}, @code{GOAL}, and @code{MET}.
679 (This method can be used only on commands that define the dictionary,
680 such as @cmd{DATA LIST}.) It is used to define sequences of variables
681 that end in consecutive integers. The syntax is two identifiers that
682 end in numbers. This method is best illustrated with examples:
686 The syntax @code{X1 TO X5} defines 5 variables:
702 The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
The syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3}
are both invalid, although for different reasons, which should be evident.
724 Note that after a set of variables has been defined with @cmd{DATA LIST}
725 or another command with this method, the same set can be referenced on
726 later commands using the same syntax.
729 The above methods can be combined, either one after another or delimited
730 by commas. For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
731 is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
732 is individually legal.
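
Both uses of @code{TO} described above can be sketched in Python. The
function names are hypothetical and the code is not taken from PSPP. In
particular, the sketch assumes that the second name may not have fewer
digits than the first, which is one reading of why
@code{QUES001 TO QUES9} is rejected, and that new names are zero-padded
to the width of the first number.

```python
import re


def expand_existing(dictionary, first, last):
    """Expand FIRST TO LAST against the ordered names in a dictionary."""
    names = [name.upper() for name in dictionary]
    i, j = names.index(first.upper()), names.index(last.upper())
    if i > j:
        raise ValueError("the first variable must precede the second")
    return dictionary[i:j + 1]


def expand_numbered(first, last):
    """Expand a defining range such as X1 TO X5 into new variable names."""
    m1 = re.fullmatch(r"(.*?)(\d+)", first)
    m2 = re.fullmatch(r"(.*?)(\d+)", last)
    if not m1 or not m2:
        raise ValueError("both names must end in digits")
    root1, digits1 = m1.groups()
    root2, digits2 = m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("the names must share the same prefix")
    if len(digits2) < len(digits1):
        raise ValueError("second number has too few digits")  # assumption
    low, high = int(digits1), int(digits2)
    if low > high:
        raise ValueError("the numbers must be in ascending order")
    width = len(digits1)  # assumption: pad to the first number's width
    return [root1 + str(n).zfill(width) for n in range(low, high + 1)]
```

Under these assumptions, @code{expand_numbered("ITEM0008", "ITEM0013")}
yields the six names @code{ITEM0008} through @code{ITEM0013}, and
@code{expand_existing} applied to the six-variable dictionary from the
example above returns @code{X2}, @code{GOAL}, and @code{MET} for
@code{X2 TO MET}.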
735 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
736 @subsection Input and Output Formats
738 Data that PSPP inputs and outputs must have one of a number of formats.
739 These formats are described, in general, by a format specification of
740 the form @code{NAMEw.d}, where @var{name} is the
741 format name and @var{w} is a field width. @var{d} is the optional
742 desired number of decimal places, if appropriate. If @var{d} is not
743 included then it is assumed to be 0. Some formats do not allow @var{d}
746 When an input format is specified on @cmd{DATA LIST} or another
748 it is converted to an output format for the purposes of @cmd{PRINT}
750 data output commands. For most purposes, input and output formats are
751 the same; the salient differences are described below.
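
The general @code{NAMEw.d} form of a format specification can be picked
apart with a short Python sketch. The function name
@code{parse_format} is hypothetical, and the sketch only splits the
text: it does not check that the format name exists, that the width is
within bounds, or that the format allows decimal places. Folding the
name to upper case is also an assumption.

```python
import re


def parse_format(spec):
    """Split a format specification into (name, w, d)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)(?:\.(\d+))?", spec)
    if match is None:
        raise ValueError("not of the form NAMEw.d: " + spec)
    name, w, d = match.groups()
    decimals = int(d) if d is not None else 0  # d defaults to 0 when omitted
    return name.upper(), int(w), decimals
```

For example, @code{F8.2} splits into the name @code{F}, a width of 8,
and 2 decimal places, while @code{A10} has a width of 10 and 0 decimal
places.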
753 Below are listed the input and output formats supported by PSPP. If an
754 input format is mapped to a different output format by default, then
755 that mapping is indicated with @result{}. Each format has the listed
756 bounds on input width (iw) and output width (ow).
758 The standard numeric input and output formats are given in the following
762 @item Fw.d: 1 <= iw,ow <= 40
763 Standard decimal format with @var{d} decimal places. If the number is
764 too large to fit within the field width, it is expressed in scientific
765 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
766 the exponent. When used as an input format, scientific notation is
767 allowed but an E or an F must be used to introduce the exponent.
769 The default output format is the same as the input format, except if
770 @var{d} > 1. In that case the output @var{w} is always made to be at
773 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
774 For input this is equivalent to F format except that no E or F is
required to introduce the exponent. For output, produces scientific
776 notation in the form @code{1.2+34}. There are always at least two
777 digits given in the exponent.
779 The default output @var{w} is the largest of the input @var{w}, the
780 input @var{d} + 7, and 10. The default output @var{d} is the input
781 @var{d}, but at least 3.
783 @item COMMAw.d: 1 <= iw,ow <= 40
784 Equivalent to F format, except that groups of three digits are
785 comma-separated on output. If the number is too large to express in the
786 field width, then first commas are eliminated, then if there is still
787 not enough space the number is expressed in scientific notation given
788 that w >= 6. Commas are allowed and ignored when this is used as an
791 @item DOTw.d: 1 <= iw,ow <= 40
792 Equivalent to COMMA format except that the roles of comma and decimal
point are interchanged. However, if SET /DECIMAL=DOT is in effect, then
794 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
797 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
798 Equivalent to COMMA format, except that the number is prefixed by a
799 dollar sign (@samp{$}) if there is room. On input the value is allowed
800 to be prefixed by a dollar sign, which is ignored.
802 The default output @var{w} is the input @var{w}, but at least 2.
804 @item PCTw.d: 2 <= iw,ow <= 40
805 Equivalent to F format, except that the number is suffixed by a percent
806 sign (@samp{%}) if there is room. On input the value is allowed to be
807 suffixed by a percent sign, which is ignored.
809 The default output @var{w} is the input @var{w}, but at least 2.
811 @item Nw.d: 1 <= iw,ow <= 40
812 Only digits are allowed within the field width. The decimal point is
813 assumed to be @var{d} digits from the right margin.
815 The default output format is F with the same @var{w} and @var{d}, except
816 if @var{d} > 1. In that case the output @var{w} is always made to be at
819 @item Zw.d @result{} F: 1 <= iw,ow <= 40
820 Zoned decimal input. If you need to use this then you know how.
822 @item IBw.d @result{} F: 1 <= iw,ow <= 8
823 Integer binary format. The field is interpreted as a fixed-point
824 positive or negative binary number in two's-complement notation. The
825 location of the decimal point is implied. Endianness is the same as the
828 The default output format is F8.2 if @var{d} is 0. Otherwise it is F,
829 with output @var{w} as 9 + input @var{d} and output @var{d} as input
832 @item PIB @result{} F: 1 <= iw,ow <= 8
833 Positive integer binary format. The field is interpreted as a
834 fixed-point positive binary number. The location of the decimal point
is implied. Endianness is the same as the host machine.
837 The default output format follows the rules for IB format.
839 @item Pw.d @result{} F: 1 <= iw,ow <= 16
840 Binary coded decimal format. Each byte from left to right, except the
841 rightmost, represents two digits. The upper nibble of each byte is more
842 significant. The upper nibble of the final byte is the least
843 significant digit. The lower nibble of the final byte is the sign; a
844 value of D represents a negative sign and all other values are
845 considered positive. The decimal point is implied.
847 The default output format follows the rules for IB format.
849 @item PKw.d @result{} F: 1 <= iw,ow <= 16
Positive binary coded decimal format. Same as P but the last byte is the
853 The default output format follows the rules for IB format.
855 @item RBw @result{} F: 2 <= iw,ow <= 8
857 Binary C architecture-dependent ``double'' format. For a standard
858 IEEE754 implementation @var{w} should be 8.
860 The default output format follows the rules for IB format.
862 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
863 PIB format encoded as textual hex digit pairs. @var{w} must be even.
865 The input width is mapped to a default output width as follows:
866 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
867 12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for
870 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
RB format encoded as textual hex digit pairs. @var{w} must be even.
874 The default output format is F8.2.
876 @item CCAw.d: 1 <= ow <= 40
877 @itemx CCBw.d: 1 <= ow <= 40
878 @itemx CCCw.d: 1 <= ow <= 40
879 @itemx CCDw.d: 1 <= ow <= 40
880 @itemx CCEw.d: 1 <= ow <= 40
882 User-defined custom currency formats. May not be used as an input
883 format. @xref{SET}, for more details.
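
As an illustration of the P format described in the table above, the
following Python sketch decodes a packed field into a number. The
function name @code{decode_packed} is hypothetical, and the code follows
only the description given here rather than PSPP's own implementation.
The parameter @code{d} is the implied number of decimal places, and
rejecting digit nibbles greater than 9 is an assumption.

```python
def decode_packed(field, d=0):
    """Decode a P-format (binary coded decimal) field given as bytes."""
    if not field:
        raise ValueError("the field must contain at least one byte")
    digits = []
    for byte in field[:-1]:
        digits.append(byte >> 4)    # upper nibble is more significant
        digits.append(byte & 0x0F)
    digits.append(field[-1] >> 4)   # least significant digit
    sign_nibble = field[-1] & 0x0F  # lower nibble of the final byte
    if any(digit > 9 for digit in digits):
        raise ValueError("invalid decimal digit")  # assumption: reject A-F
    magnitude = 0
    for digit in digits:
        magnitude = magnitude * 10 + digit
    value = magnitude / 10 ** d     # the decimal point is implied
    return -value if sign_nibble == 0xD else value
```

For example, the three bytes hex 12, 34, and 5D with @code{d} equal to 2
decode to -123.45: the digits are 1, 2, 3, 4, and 5, and the final
nibble D marks the value as negative.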
886 The date and time numeric input and output formats accept a number of
887 possible formats. Before describing the formats themselves, some
888 definitions of the elements that make up their formats will be helpful:
892 All formats accept an optional whitespace leader.
895 An integer between 1 and 31 representing the day of month.
898 An integer representing a number of days.
901 One or more characters of whitespace or the following characters:
905 A month name in one of the following forms:
908 An integer between 1 and 12.
910 Roman numerals representing an integer between 1 and 12.
912 At least the first three characters of an English month name (January,
917 An integer year number between 1582 and 19999, or between 1 and 199.
918 Years between 1 and 199 will have 1900 added.
921 A single number with a year number in the first 2, 3, or 4 digits (as
922 above) and the day number within the year in the last 3 digits.
925 An integer between 1 and 4 representing a quarter.
928 The letter @samp{Q} or @samp{q}.
931 An integer between 1 and 53 representing a week within a year.
934 The letters @samp{wk} in any case.
At least one character of whitespace or @samp{:} or @samp{.}.
940 An integer greater than 0 representing an hour.
943 An integer between 0 and 59 representing a minute within an hour.
946 Optionally, a time-delimiter followed by a real number representing a
950 An integer between 0 and 23 representing an hour within a day.
953 At least the first two characters of an English day word.
956 Any amount or no amount of whitespace.
959 An optional positive or negative sign.
962 All formats accept an optional whitespace trailer.
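
The year and julian elements defined above can be sketched in Python.
The function names are hypothetical. The sketch assumes that the day
number must lie between 1 and 366, which the definitions above do not
state, and it does not check the day number against the length of the
particular year.

```python
def normalize_year(year):
    """Apply the year rule: 1 through 199 have 1900 added."""
    if 1 <= year <= 199:
        return year + 1900
    if 1582 <= year <= 19999:
        return year
    raise ValueError("year out of range")


def parse_julian(text):
    """Split a julian element into (year, day of year)."""
    text = text.strip()  # optional whitespace leader and trailer
    if not text.isdecimal() or not 5 <= len(text) <= 7:
        raise ValueError("expected 2 to 4 year digits plus 3 day digits")
    year = normalize_year(int(text[:-3]))
    day = int(text[-3:])
    if not 1 <= day <= 366:  # assumption: range check on the day number
        raise ValueError("day of year out of range")
    return year, day
```

For example, @code{99123} is read as day 123 of 1999, and
@code{2004060} as day 60 of 2004.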
965 The date input formats are strung together from the above pieces. On
966 output, the date formats are always printed in a single canonical
967 manner, based on field width. The date input and output formats are
971 @item DATEw: 9 <= iw,ow <= 40
972 Date format. Input format: leader + day + date-delimiter +
973 month + date-delimiter + year + trailer. Output format: DD-MMM-YY for
974 @var{w} < 11, DD-MMM-YYYY otherwise.
976 @item EDATEw: 8 <= iw,ow <= 40
977 European date format. Input format same as DATE. Output format:
978 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
980 @item SDATEw: 8 <= iw,ow <= 40
981 Standard date format. Input format: leader + year + date-delimiter +
982 month + date-delimiter + day + trailer. Output format: YY/MM/DD for
983 @var{w} < 10, YYYY/MM/DD otherwise.
985 @item ADATEw: 8 <= iw,ow <= 40
986 American date format. Input format: leader + month + date-delimiter +
987 day + date-delimiter + year + trailer. Output format: MM/DD/YY for
988 @var{w} < 10, MM/DD/YYYY otherwise.
990 @item JDATEw: 5 <= iw,ow <= 40
991 Julian date format. Input format: leader + julian + trailer. Output
992 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
994 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
995 Quarter/year format. Input format: leader + quarter + q-delimiter +
996 year + trailer. Output format: @samp{Q Q YY}, where the first
997 @samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
1000 @item MOYRw: 6 <= iw,ow <= 40
1001 Month/year format. Input format: leader + month + date-delimiter + year
1002 + trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
1005 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
1006 Week/year format. Input format: leader + week + wk-delimiter + year +
1007 trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
1010 @item DATETIMEw.d: 17 <= iw,ow <= 40
1011 Date and time format. Input format: leader + day + date-delimiter +
month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
1013 + minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
1014 @var{w} > 19 then seconds @samp{:SS} is added. If @var{w} > 22 and
1015 @var{d} > 0 then fractional seconds @samp{.SS} are added.
1017 @item TIMEw.d: 5 <= iw,ow <= 40
1018 Time format. Input format: leader + sign + spaces + hour +
1019 time-delimiter + minute + opt-second. Output format: @samp{HH:MM}.
1020 Seconds and fractional seconds are available with @var{w} of at least 8
1021 and 10, respectively.
1023 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
1024 Time format with day count. Input format: leader + sign + spaces +
1025 day-count + time-delimiter + hour + time-delimiter + minute +
1026 opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional
1027 seconds are available with @var{w} of at least 8 and 10, respectively.
1029 @item WKDAYw: 2 <= iw,ow <= 40
1030 A weekday as a number between 1 and 7, where 1 is Sunday. Input format:
1031 leader + weekday + trailer. Output format: as many characters, in all
1032 capital letters, of the English name of the weekday as will fit in the
1035 @item MONTHw: 3 <= iw,ow <= 40
1036 A month as a number between 1 and 12, where 1 is January. Input format:
1037 leader + month + trailer. Output format: as many characters, in all
1038 capital letters, of the English name of the month as will fit in the
1042 There are only two formats that may be used with string variables:
1045 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
1046 The entire field is treated as a string value.
1048 @item AHEXw @result{} A: 2 <= iw <= 254, 2 <= ow <= 510
1049 The field is composed of characters in a string encoded as textual hex
1052 The default output @var{w} is half the input @var{w}.
1055 @node Scratch Variables, , Input/Output Formats, Variables
1056 @subsection Scratch Variables
1058 Most of the time, variables don't retain their values between cases.
1059 Instead, either they're being read from a data file or the active file,
1060 in which case they assume the value read, or, if created with
1062 another transformation, they're initialized to the system-missing value
1063 or to blanks, depending on type.
1065 However, sometimes it's useful to have a variable that keeps its value
1066 between cases. You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
1067 use a @dfn{scratch variable}. Scratch variables are variables whose
1068 names begin with an octothorpe (@samp{#}).
1070 Scratch variables have the same properties as variables left with
1072 they retain their values between cases, and for the first case they are
1073 initialized to 0 or blanks. They have the additional property that they
1074 are deleted before the execution of any procedure. For this reason,
1075 scratch variables can't be used for analysis. To obtain the same
1076 effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
1077 value into an ordinary variable, then analyze that variable.
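
As a minimal sketch of this idiom, the following syntax keeps a running
total in a scratch variable and copies it into an ordinary variable.
The variable names @code{x}, @code{#sum}, and @code{total} and the
inline data are made up for illustration, and the use of list-style
@cmd{DATA LIST} and of @cmd{LIST} to display the result is an assumption
about the commands available:

@example
DATA LIST LIST /x.
BEGIN DATA.
1
2
3
END DATA.
* Scratch variable: starts at 0 and keeps its value between cases.
COMPUTE #sum = #sum + x.
* Copy the scratch value into an ordinary variable for analysis.
COMPUTE total = #sum.
LIST.
@end example

Under the rules described above, @code{total} would hold the running
totals 1, 3, and 6, while @code{#sum} itself is not available to
@cmd{LIST} or to any other procedure.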
1079 @node Files, BNF, Variables, Language
1080 @section Files Used by PSPP
1082 PSPP uses many files each time it runs: some it
1083 reads, some it writes, and some it creates. The following table lists the
1084 most important of these files:
1087 @cindex file, command
1088 @cindex file, syntax file
1089 @cindex command file
1093 These synonymous names refer to the file that contains the instructions
1094 telling PSPP what to do. The syntax file's name is specified on
1095 the PSPP command line. Syntax files can also be pulled in with
1096 @cmd{INCLUDE} (@pxref{INCLUDE}).
1101 Data files contain raw data in ASCII format suitable for being read in
1102 by @cmd{DATA LIST}. Data can be embedded in the syntax
1103 file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
1104 syntax file a data file too.
1106 @cindex file, output
1109 One or more output files are created by PSPP each time it is
1110 run. The output files receive the tables and charts produced by
1111 statistical procedures. The output files may be in any number of formats,
1112 depending on how PSPP is configured.
1115 @cindex file, active
1117 The active file is the ``file'' on which all PSPP procedures
1118 are performed. The active file contains variable definitions and
1119 cases. The active file is not necessarily a disk file: it is stored
1120 in memory if there is room.
1123 @node BNF, , Files, Language
1124 @section Backus-Naur Form
1126 @cindex Backus-Naur Form
1127 @cindex command syntax, description of
1128 @cindex description of command syntax
1130 The syntax of some parts of the PSPP language is presented in this
1131 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
1132 following table describes BNF:
1138 Words in all-uppercase are PSPP keyword tokens. In BNF, these are
1139 often called @dfn{terminals}. There are some special terminals, which
1140 are actually written in lowercase for clarity:
1143 @cindex @code{number}
1147 @cindex @code{integer}
1148 @item @code{integer}
1151 @cindex @code{string}
1155 @cindex @code{var-name}
1156 @item @code{var-name}
1157 A single variable name.
1161 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
1162 Operators and punctuators.
1165 @cindex terminal dot
1166 @cindex dot, terminal
1168 The terminal dot. This is not necessarily an actual dot in the syntax
1169 file. @xref{Commands}, for more details.
1174 @cindex nonterminals
1175 Other words in all lowercase are names defined by BNF definitions, called
1176 @dfn{productions}. These names are known as
1177 @dfn{nonterminals}. Some nonterminals are very common, so they are
1178 defined here in English for clarity:
1181 @cindex @code{var-list}
1183 A list of one or more variable names or the keyword @code{ALL}.
1185 @cindex @code{expression}
1187 An expression. @xref{Expressions}, for details.
1192 @cindex ``is defined as''
1194 @samp{::=} means ``is defined as''. The left side of @samp{::=} gives
1195 the name of the nonterminal being defined. The right side of @samp{::=}
1196 gives the definition of that nonterminal. If the right side is empty,
1197 then one possible expansion of that nonterminal is nothing. A BNF
1198 definition is called a @dfn{production}.
1201 @cindex terminals and nonterminals, differences
1202 So, the key difference between a terminal and a nonterminal is that a
1203 terminal cannot be broken into smaller parts---in fact, every terminal
1204 is a single token (@pxref{Tokens}). On the other hand, nonterminals are
1205 composed of a (possibly empty) sequence of terminals and nonterminals.
1206 Thus, terminals indicate the deepest level of syntax description. (In
1207 parsing theory, terminals are the leaves of the parse tree; nonterminals
1211 @cindex start symbol
1212 @cindex symbol, start
1213 The first nonterminal defined in a set of productions is called the
1214 @dfn{start symbol}. The start symbol defines the entire syntax for
1217 @setfilename ignored
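
As an illustration of the notation only, here is how a made-up command
might be described. @code{TALLY} and @code{LIMIT} are hypothetical
keywords, not real PSPP syntax, and @code{tally-cmd} and
@code{tally-opts} are hypothetical nonterminals. Writing two
productions for the same nonterminal to express alternatives is also an
assumption of this sketch:

@example
tally-cmd  ::= TALLY var-list tally-opts .
tally-opts ::=
tally-opts ::= / LIMIT = integer
@end example

Here @code{tally-cmd} is the start symbol. @code{TALLY}, @code{LIMIT},
@code{/}, @code{=}, @code{integer}, and the terminal dot are terminals,
while @code{var-list} and @code{tally-opts} are nonterminals. The
empty right side of the first @code{tally-opts} production means the
option may be omitted, so both @samp{TALLY ALL.} and
@samp{TALLY ALL /LIMIT=10.} would match this description.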