1 \input texinfo @c -*- texinfo -*-
5 @set TIMESTAMP Time-stamp: Sat Dec 20 20:25:33 WST 2003 jmd
8 @c For double-sided printing, uncomment:
9 @c @setchapternewpage odd
18 * PSPP: (pspp). Statistical analysis package.
22 PSPP, for statistical analysis of sampled data, by Ben Pfaff.
24 This file documents PSPP, a statistical package for analysis of
25 sampled data that uses a command language compatible with SPSS.
27 Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.
29 This version of the PSPP documentation is consistent with version 2 of
32 Permission is granted to make and distribute verbatim copies of this
33 manual provided the copyright notice and this permission notice are
34 preserved on all copies.
37 Permission is granted to process this file through TeX and print the
38 results, provided the printed document carries copying permission notice
39 identical to this one except for the removal of this paragraph (this
40 paragraph not being relevant to the printed manual).
43 Permission is granted to copy and distribute modified versions of this
44 manual under the conditions for verbatim copying, provided that the
45 entire resulting derived work is distributed under the terms of a
46 permission notice identical to this one.
48 Permission is granted to copy and distribute translations of this
49 manual into another language, under the above condition for modified
50 versions, except that this permission notice may be stated in a
51 translation approved by the Free Software Foundation.
56 @subtitle A System for Statistical Analysis
57 @subtitle Edition @value{EDITION}, for PSPP version @value{VERSION}
61 @vskip 0pt plus 1filll
63 PSPP Copyright @copyright{} 1997, 1998 Free Software Foundation, Inc.
65 Permission is granted to make and distribute verbatim copies of this
66 manual provided the copyright notice and this permission notice are
67 preserved on all copies.
69 Permission is granted to copy and distribute modified versions of this
70 manual under the conditions for verbatim copying, provided that the
71 entire derived work is distributed under the terms of a permission
72 notice identical to this one.
74 Permission is granted to copy and distribute translations of this manual
75 into another language, under the above conditions for modified versions,
76 except that this permission notice may be stated in a translation
77 approved by the Foundation.
80 @node Top, Introduction, (dir), (dir)
84 This file documents the PSPP package for statistical analysis of sampled
85 data. This is edition @value{EDITION}, for PSPP version
86 @value{VERSION}, last modified at @value{TIMESTAMP}.
91 * Introduction:: Description of the package.
92 * License:: Your rights and obligations.
93 * Credits:: Acknowledgement of authors.
95 * Installation:: How to compile and install PSPP.
96 * Configuration:: Configuring PSPP.
97 * Invocation:: Starting and running PSPP.
99 * Language:: Basics of the PSPP command language.
100 * Expressions:: Numeric and string expression syntax.
102 * Data Input and Output:: Reading data from user files.
103 * System and Portable Files:: Dealing with system & portable files.
104 * Variable Attributes:: Adjusting and examining variables.
105 * Data Manipulation:: Simple operations on data.
106 * Data Selection:: Select certain cases for analysis.
107 * Conditionals and Looping:: Doing things many times or not at all.
108 * Statistics:: Basic statistical procedures.
109 * Utilities:: Other commands.
110 * Not Implemented:: What's not here yet
112 * Data File Format:: Format of PSPP system files.
113 * Portable File Format:: Format of PSPP portable files.
114 * q2c Input Format:: Format of syntax accepted by q2c.
116 * Bugs:: Known problems; submitting bug reports.
118 * Function Index:: Index of PSPP functions for expressions.
119 * Concept Index:: Index of concepts.
120 * Command Index:: Index of PSPP procedures.
124 @node Introduction, License, Top, Top
125 @chapter Introduction
128 @cindex PSPP language
129 @cindex language, PSPP
130 PSPP is a tool for statistical analysis of sampled data. It reads a
131 syntax file and a data file, analyzes the data, and writes the results
132 to a listing file or to standard output.
134 The language accepted by PSPP is similar to those accepted by SPSS
135 statistical products. The details of PSPP's language are given
136 later in this manual.
143 @cindex Free Software Foundation
144 PSPP produces output in two forms: tables and charts. Both of these can
145 be written in several formats; currently, ASCII, PostScript, and HTML
146 are supported. In the future, more drivers, such as PCL and X Window
147 System drivers, may be developed. For now, Ghostscript, available from
148 the Free Software Foundation, may be used to convert PostScript chart
149 output to other formats.
151 The current version of PSPP, @value{VERSION}, is woefully incomplete in
152 terms of its statistical procedure support. PSPP is a work in progress.
153 The author hopes to fully support all features in the products
154 that PSPP replaces, eventually. The author welcomes questions,
155 comments, donations, and code submissions. @xref{Bugs,,Submitting Bug
156 Reports}, for instructions on contacting the author.
158 @node License, Credits, Introduction, Top
159 @chapter Your rights and obligations
161 @cindex your rights and obligations
163 @cindex obligations, your
165 @cindex Free Software Foundation
166 @cindex GNU General Public License
167 @cindex General Public License
170 @cindex redistribution
171 Most of PSPP is distributed under the GNU General Public
172 License. The General Public License says, in effect, that you may
173 modify and distribute PSPP as you like, as long as you grant the
174 same rights to others. It also states that you must provide source code
175 when you distribute PSPP, or, if you obtained PSPP
176 source code from an anonymous ftp site, give out the name of that site.
178 The General Public License is given in full in the source distribution
179 as file @file{COPYING}. In Debian GNU/Linux, this file is also
180 available as file @file{/usr/share/common-licenses/GPL-2}.
182 To quote the GPL itself:
185 This program is free software; you can redistribute it and/or modify it
186 under the terms of the GNU General Public License as published by the
187 Free Software Foundation; either version 2 of the License, or (at your
188 option) any later version.
190 This program is distributed in the hope that it will be useful, but
191 WITHOUT ANY WARRANTY; without even the implied warranty of
192 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
193 General Public License for more details.
195 You should have received a copy of the GNU General Public License along
196 with this program; if not, write to the Free Software Foundation, Inc.,
197 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
200 @node Credits, Installation, License, Top
205 @cindex Minton, Claire
206 @cindex @cite{Cat's Cradle}
207 @cindex Vonnegut, Kurt, Jr.
210 I'm always embarrassed when I see an index an author has made of his own
211 work. It's a shameless exhibition---to the @i{trained} eye. Never
214 ---Claire Minton, @cite{Cat's Cradle}, Kurt Vonnegut, Jr.
218 Most of PSPP, as well as this manual (including the indices),
219 was written by Ben Pfaff. @xref{Contacting the Author}, for
220 instructions on contacting the author.
222 @cindex Covington, Michael A.
223 @cindex Van Zandt, James
224 @cindex @file{ftp.cdrom.com}
225 @cindex @file{/pub/algorithms/c/julcal10}
226 @cindex @file{julcal.c}
227 @cindex @file{julcal.h}
228 The PSPP source code incorporates @code{julcal10} originally
229 written by Michael A. Covington and translated into C by Jim Van Zandt.
230 The original package can be found in directory
231 @url{ftp://ftp.cdrom.com/pub/algorithms/c/julcal10}. The entire
232 contents of that directory constitute the package. The files actually
233 used in PSPP are @code{julcal.c} and @code{julcal.h}.
235 @node Installation, Configuration, Credits, Top
236 @chapter Installing PSPP
238 @cindex PSPP, installing
240 @cindex GNU C compiler
242 @cindex compiler, recommended
243 @cindex compiler, gcc
244 PSPP conforms to the GNU Coding Standards. PSPP is written in, and
245 requires for proper operation, ANSI/ISO C. You might want to
246 additionally note the following points:
250 The compiler and linker must allow for significance of several
251 characters in external identifiers. The exact number is unknown but at
252 least 31 is recommended.
255 The @code{int} type must be 32 bits or wider.
258 The recommended compiler is gcc 2.7.2.1 or later, but any ANSI compiler
259 will do if it fits the above criteria.
262 Many UNIX variants should work out-of-the-box, as PSPP uses GNU
263 autoconf to detect differences between environments. Please report any
264 problems with compilation of PSPP under UNIX and UNIX-like operating
265 systems---portability is a major concern of the author.
267 The pages below give specific instructions for installing PSPP
268 on each type of system mentioned above.
271 * UNIX installation:: Installing on UNIX-like environments.
274 @node UNIX installation, , Installation, Installation
275 @section UNIX installation
276 @cindex UNIX, installing PSPP under
277 @cindex installation, under UNIX
279 To install PSPP under a UNIX-like operating system, follow the steps
280 below in order. Some of the text below was taken directly from various
281 Free Software Foundation sources.
285 @code{cd} to the directory containing the PSPP source.
287 @cindex configure, GNU
288 @cindex GNU configure
290 Type @samp{./configure} to configure for your particular operating
291 system and compiler. Running @code{configure} takes a while. While
292 running, it displays some messages telling which features it is checking
295 You can optionally supply some options to @code{configure} in order to
296 give it hints about how to do its job. Type @code{./configure --help}
297 to see a list of options. One of the most useful options is
298 @samp{--with-checker}, which enables the use of the Checker memory
299 debugger under supported operating systems. Checker must already be
300 installed to use this option. Do not use @samp{--with-checker} if you
301 are not debugging PSPP itself.
303 @cindex @file{Makefile}
304 @cindex @file{config.h}
305 @cindex @file{pref.h}
308 (optional) Edit @file{Makefile}, @file{config.h}, and @file{pref.h}.
309 These files are produced by @code{configure}. Note that most PSPP
310 settings can be changed at runtime.
312 @file{pref.h} is only generated by @code{configure} if it does not
313 already exist. (It's copied from @file{prefh.orig}.)
317 Type @samp{make} to compile the package. If there are any errors during
318 compilation, try to fix them. If modifications are necessary to compile
319 correctly under your configuration, contact the author.
320 @xref{Bugs,,Submitting Bug Reports}, for details.
322 @cindex self-tests, running
324 Type @samp{make check} to run self-tests on the compiled PSPP package.
327 @cindex PSPP, installing
328 @cindex @file{/usr/local/share/pspp/}
329 @cindex @file{/usr/local/bin/}
330 @cindex @file{/usr/local/info/}
331 @cindex documentation, installing
333 Become the superuser and type @samp{make install} to install the
334 PSPP binaries, by default in @file{/usr/local/bin/}. The
335 directory @file{/usr/local/share/pspp/} is created and populated with
336 files needed by PSPP at runtime. This step will also cause the
337 PSPP documentation to be installed in @file{/usr/local/info/},
338 but only if that directory already exists.
341 (optional) Type @samp{make clean} to delete the PSPP binaries
342 from the source tree.
345 @node Configuration, Invocation, Installation, Top
346 @chapter Configuring PSPP
347 @cindex configuration
348 @cindex PSPP, configuring
350 PSPP has dozens of configuration possibilities and hundreds of
351 settings. This is both a bane and a blessing. On one hand, it's
352 possible to easily accommodate diverse ranges of setups. But, on the
353 other, the multitude of possibilities can overwhelm the casual user.
354 Fortunately, the configuration mechanisms are profusely described in the
355 sections below@enddots{}
358 * File locations:: How PSPP finds config files.
359 * Configuration techniques:: Many different methods of configuration@enddots{}
360 * Configuration files:: How configuration files are read.
361 * Environment variables:: All about environment variables.
362 * Output devices:: Describing your terminal(s) and printer(s).
363 * PostScript driver class:: Configuration of PostScript devices.
364 * ASCII driver class:: Configuration of character-code devices.
365 * HTML driver class:: Configuration for HTML output.
366 * Miscellaneous configuring:: Even more configuration variables.
367 * Improving output quality:: Hints for producing ever-more-lovely output.
370 @node File locations, Configuration techniques, Configuration, Configuration
371 @section Locating configuration files
373 PSPP uses the same method to find most of its configuration files:
377 The @dfn{base name} of the file being sought is determined.
380 The path to search is determined.
383 Each directory in the search path, from left to right, is searched for a
384 file with the name of the base name. The first occurrence is read
385 as the configuration file.
388 The first two steps are elaborated below for the sake of our pedantic
393 A @dfn{base name} is a file name lacking an absolute directory
394 reference. Some examples of base names are: @file{ps-encodings},
395 @file{devices}, @file{devps/DESC} (under UNIX), @file{devps\DESC} (under DOS).
398 Determining the base name is a two-step process:
402 If the appropriate environment variable is defined, the value of that
403 variable is used (@pxref{Environment variables}). For instance, when
404 searching for the output driver initialization file, the variable
405 examined is @code{STAT_OUTPUT_INIT_FILE}.
408 Otherwise, the compiled-in default is used. For example, when searching
409 for the output driver initialization file, the default base name is @file{devices}.
413 @strong{Please note:} If a user-specified base name does contain an
414 absolute directory reference, as in a file name like
415 @file{/home/pfaff/fonts/TR}, no path is searched---the file name is used
416 exactly as given---and the algorithm terminates.
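The lookup described above can be sketched in a few lines of Python.
This is an illustration only, not PSPP's own code: the function name
@code{find_config_file}, its parameters, and the injectable
@code{exists} test are hypothetical.

```python
import os


def find_config_file(base_name, search_path, sep=os.pathsep,
                     exists=os.path.isfile):
    """Return the first match for base_name along search_path, or None."""
    # A name with an absolute directory reference is used exactly as given;
    # no path is searched.
    if os.path.isabs(base_name):
        return base_name
    # Otherwise try each directory from left to right; the first hit wins.
    for directory in search_path.split(sep):
        candidate = os.path.join(directory, base_name)
        if exists(candidate):
            return candidate
    return None
```

Passing @code{exists} as a parameter is only a convenience for trying
the sketch without touching the file system.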
419 The path is the first of the following that is defined:
423 A variable definition for the path given in the user environment. This
424 is a PSPP-specific environment variable name; for instance,
425 @code{STAT_OUTPUT_INIT_PATH}.
428 In some cases, another, less-specific environment variable is checked.
429 For instance, when searching for font files, the PostScript driver first
430 checks for a variable with name @code{STAT_GROFF_FONT_PATH}, then for
431 one with name @code{GROFF_FONT_PATH}. (However, font searching has its
432 own list of esoteric search rules.)
435 The configuration file path, which is itself determined by the
440 If the command line contains an option of the form @samp{-B @var{path}}
441 or @samp{--config-dir=@var{path}}, then the value given on the
442 rightmost occurrence of such an option is used.
445 Otherwise, if the environment variable @code{STAT_CONFIG_PATH} is
446 defined, the value of that variable is used.
449 Otherwise, the compiled-in fallback default is used. On UNIX machines,
450 the default fallback path is
457 @file{/usr/local/lib/pspp}
463 On DOS machines, the default fallback path is:
467 All the paths from the DOS search path in the @samp{PATH} environment
468 variable, in left-to-right order.
471 @file{C:\PSPP}, as a last resort.
474 Note that the installer of PSPP can easily change this default
475 fallback path; thus the above should not be taken as gospel.
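As a rough sketch of the precedence just described, the following
Python function picks the configuration file path. The function name
and the @code{fallback} parameter are hypothetical, and the sketch
assumes that @samp{-B} always takes its value as a separate argument.

```python
def config_path(argv, environ, fallback):
    """Choose the configuration file path from options, environment, default."""
    chosen = None
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-B" and i + 1 < len(argv):
            chosen = argv[i + 1]          # a later occurrence replaces an earlier one
            i += 2
            continue
        if arg.startswith("--config-dir="):
            chosen = arg[len("--config-dir="):]
        i += 1
    if chosen is not None:
        return chosen                      # rightmost command-line option wins
    if "STAT_CONFIG_PATH" in environ:
        return environ["STAT_CONFIG_PATH"]
    return fallback                        # compiled-in default in the real program
```

For example, @code{config_path(sys.argv[1:], os.environ,
"/usr/local/lib/pspp")} would model a UNIX build whose fallback path
holds that single directory.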
480 As a final note: Under DOS, directories given in paths are delimited by
481 semicolons (@samp{;}); under UNIX, directories are delimited by colons
482 (@samp{:}). This corresponds with the standard path delimiter under
485 @node Configuration techniques, Configuration files, File locations, Configuration
486 @section Configuration techniques
488 There are many ways that PSPP can be configured. These are
489 described in the list below. Values given by earlier items take
490 precedence over those given by later items.
494 Syntax commands that modify settings, such as @code{SET}. @xref{SET}.
497 Command-line options. @xref{Invocation}.
500 PSPP-specific environment variable contents. @xref{Environment variables}.
504 General environment variable contents. @xref{Environment variables}.
507 Configuration file contents. @xref{Configuration files}.
513 Some of the above may not apply to a particular setting. For instance,
514 the current pager (such as @samp{more}, @samp{most}, or @samp{less})
515 cannot be determined by configuration file contents because there is no
516 appropriate configuration file.
518 @node Configuration files, Environment variables, Configuration techniques, Configuration
519 @section Configuration files
521 Most configuration files have a common form:
525 Each line forms a separate command or directive. This means that lines
526 cannot be broken up, unless they are spliced together with a trailing
527 backslash, as described below.
530 Before anything else is done, trailing whitespace is removed.
533 When a line ends in a backslash (@samp{\}), the backslash is removed,
534 and the next line is read and appended to the current line.
538 Whitespace preceding the backslash is retained.
541 This rule continues to be applied until the line read does not end in a backslash.
545 It is an error if the last line in the file ends in a backslash.
549 Comments are introduced by an octothorpe (@samp{#}), and continue until the end of the line.
554 An octothorpe inside balanced pairs of double quotation marks (@samp{"})
555 or single quotation marks (@samp{'}) does not introduce a comment.
558 The backslash character can be used inside balanced quotes of either
559 type to escape the following character as a literal character.
561 (This is distinct from the use of a backslash as a line-splicing character.)
565 Line splicing takes place before comment removal.
569 Blank lines, and lines that contain only whitespace, are ignored.
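The rules above can be modelled with a short Python sketch. It is not
PSPP's parser. The function names are hypothetical, and two behaviors
are assumptions: trailing whitespace is stripped from every physical
line before the backslash test, and an unclosed quotation mark is
treated as running to the end of the line.

```python
def strip_comment(line):
    """Cut line at the first '#' that is not inside quotation marks."""
    quote = None
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":
                i += 1                      # the escaped character is literal
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "#":
            return line[:i]
        i += 1
    return line


def read_config_lines(text):
    """Return the logical, non-blank lines of a configuration file."""
    physical = [p.rstrip() for p in text.splitlines()]
    logical = []
    i = 0
    while i < len(physical):
        parts = []
        while True:
            if i >= len(physical):
                raise ValueError("last line ends in a backslash")
            current = physical[i]
            i += 1
            if current.endswith("\\"):
                parts.append(current[:-1])  # drop the backslash, keep splicing
            else:
                parts.append(current)
                break
        # Splicing happens before comment removal.
        line = strip_comment("".join(parts))
        if line.strip():
            logical.append(line)
    return logical
```

Because splicing is done first, a comment that ends in a backslash
absorbs the following line, which matches the ordering stated above.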
572 @node Environment variables, Output devices, Configuration files, Configuration
573 @section Environment variables
575 You may think the concept of environment variables is a fairly simple
576 one. However, the author of PSPP has found a way to complicate
577 even something so simple. Environment variables are further described
578 in the sections below:
581 * Variable values:: Values of variables are determined this way.
582 * Environment substitutions:: How environment substitutions are made.
583 * Predefined variables:: A few variables are automatically defined.
586 @node Variable values, Environment substitutions, Environment variables, Environment variables
587 @subsection Values of environment variables
589 Values for environment variables are obtained by the following means,
590 which are arranged in order of decreasing precedence:
594 Command-line options. @xref{Invocation}.
597 The @file{environment} configuration file---more on this below.
600 Actual environment variables (defined in the shell or other parent process).
604 The @file{environment} configuration file is located through application
605 of the usual algorithm for configuration files (@pxref{File locations}),
606 except that its contents do not affect the search path used to find
607 @file{environment} itself. Use of @file{environment} is discouraged on
608 systems that allow an arbitrarily large environment; it is supported for
609 use on systems like MS-DOS that limit environment size.
611 @file{environment} is composed of lines having the form
612 @samp{@var{key}=@var{value}}, where @var{key} and the equals sign
613 (@samp{=}) are required, and @var{value} is optional. If @var{value} is
614 given, variable @var{key} is given that value; if @var{value} is absent,
615 variable @var{key} is undefined (deleted). Variables may not be defined
618 Environment substitutions are performed on each line in the file
619 (@pxref{Environment substitutions}).
621 See @ref{Configuration files}, for more details on formatting of the
622 environment configuration file.
625 @strong{Please note:} Support for @file{environment} is not yet implemented.
629 @node Environment substitutions, Predefined variables, Variable values, Environment variables
630 @subsection Environment substitutions
632 Much of the power of environment variables lies in the way that they may
633 be substituted into configuration files. Variable substitutions are
636 The line is scanned from left to right. In this scan, all characters
637 other than dollar signs (@samp{$}) are retained unmolested. Dollar
638 signs, however, introduce an environment variable reference. References
643 Replaced by the value of environment variable @var{var}, determined as
644 specified in @ref{Variable values}. @var{var} must be one of the
652 Exactly one nonalphabetic character. This may not be a left brace
657 Same as above, but @var{var} may contain any character (except
661 Replaced by a single dollar sign.
664 Undefined variables expand to an empty value.
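A left-to-right scan of this kind might look like the Python sketch
below. It assumes the three reference forms are written
@samp{$@var{var}}, @samp{$@{@var{var}@}}, and @samp{$$}, and that an
unbraced name is either a run of letters or one nonalphabetic
character. Those spellings, the function name, and the handling of a
lone trailing dollar sign are assumptions, not statements about PSPP.

```python
def substitute(line, env):
    """Expand $var, ${var}, and $$ references in line using the dict env."""
    out = []
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if ch != "$":
            out.append(ch)                  # ordinary characters pass through
            i += 1
            continue
        i += 1
        if i >= n:
            out.append("$")                 # assumption: a lone trailing $ is kept
            break
        nxt = line[i]
        if nxt == "$":
            out.append("$")                 # $$ yields a single dollar sign
            i += 1
        elif nxt == "{":
            end = line.find("}", i)
            if end == -1:
                raise ValueError("unterminated ${ reference")
            out.append(env.get(line[i + 1:end], ""))
            i = end + 1
        elif nxt.isalpha():
            j = i
            while j < n and line[j].isalpha():
                j += 1
            out.append(env.get(line[i:j], ""))
            i = j
        else:
            out.append(env.get(nxt, ""))    # one nonalphabetic character
            i += 1
    return "".join(out)
```

Undefined names fall through @code{env.get} with an empty default,
which gives the empty expansion described above.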
666 @node Predefined variables, , Environment substitutions, Environment variables
667 @subsection Predefined environment variables
669 There are two environment variables predefined for use in environment
674 Defined as the version number of PSPP, as a string, in a format
675 something like @samp{0.9.4}.
678 Defined as the host architecture of PSPP, as a string, in standard
679 cpu-manufacturer-OS format. For instance, Debian GNU/Linux 1.1 on an
680 Intel machine defines this as @samp{i586-unknown-linux}. This is
681 somewhat dependent on the system used to compile PSPP.
684 Nothing prevents these values from being overridden, although it's a
685 good idea not to do so.
687 @node Output devices, PostScript driver class, Environment variables, Configuration
688 @section Output devices
690 Configuring output devices is the most complicated aspect of configuring
691 PSPP. The output device configuration file is named
692 @file{devices}. It is searched for using the usual algorithm for
693 finding configuration files (@pxref{File locations}). Each line in the
694 file is read in the usual manner for configuration files
695 (@pxref{Configuration files}).
697 Lines in @file{devices} are divided into three categories, described
698 briefly in the table below:
701 @item driver category definitions
702 Define a driver in terms of other drivers.
704 @item macro definitions
705 Define environment variables local to the output driver
708 @item device definitions
709 Describe the configuration of an output device.
712 The following sections further elaborate the contents of the
716 * Driver categories:: How to organize the driver namespace.
717 * Macro definitions:: Environment variables local to @file{devices}.
718 * Device definitions:: Output device descriptions.
719 * Dimensions:: Lengths, widths, sizes, @enddots{}
720 * papersize:: Letter, legal, A4, envelope, @enddots{}
721 * Distinguishing line types:: Details on @file{devices} parsing.
722 * Tokenizing lines:: Dividing @file{devices} lines into tokens.
725 @node Driver categories, Macro definitions, Output devices, Output devices
726 @subsection Driver categories
728 Drivers can be divided into categories. Drivers are specified by their
729 names, or by the names of the categories that they are contained in.
730 Only certain drivers are enabled each time PSPP is run; by
731 default, these are the drivers in the category `default'. To enable a
732 different set of drivers, use the @samp{-o @var{device}} command-line
733 option (@pxref{Invocation}).
735 Categories are specified with a line of the form
736 @samp{@var{category}=@var{driver1} @var{driver2} @var{driver3} @var{@dots{}}
737 @var{driver@var{n}}}. This line specifies that the category
738 @var{category} is composed of drivers named @var{driver1},
739 @var{driver2}, and so on. There may be any number of drivers in the
740 category, from zero on up.
742 Categories may also be specified on the command line
743 (@pxref{Invocation}).
745 This is all you need to know about categories. If you're still curious,
748 First of all, the term `categories' is a bit of a misnomer. In fact,
749 the internal representation is nothing like the hierarchy that the term
750 seems to imply: a linear list is used to keep track of the enabled drivers.
753 When PSPP first begins reading @file{devices}, this list contains
754 the name of any drivers or categories specified on the command line, or
755 the single item `default' if none were specified.
757 Each time a category definition is specified, the list is searched for
758 an item with the value of @var{category}. If a matching item is found,
759 it is deleted. If there was a match, the list of drivers (@var{driver1}
760 through @var{driver@var{n}}) is then appended to the list.
762 Each time a driver definition line is encountered, the list is searched.
763 If the list contains an item with that driver's name, the driver is
764 enabled and the item is deleted from the list. Otherwise, the driver is ignored.
767 It is an error if the list is not empty when the end of @file{devices} is reached.
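The list handling just described can be illustrated with a small
Python model. It is a sketch under stated assumptions, not PSPP's
code: the function name is hypothetical, and the lines of
@file{devices} are assumed to be already parsed into
@code{(kind, name, members)} tuples.

```python
def enabled_drivers(lines, requested=None):
    """Return the names of the drivers that end up enabled, in file order."""
    # The list starts with the command-line names, or just 'default'.
    pending = list(requested) if requested else ["default"]
    enabled = []
    for kind, name, members in lines:
        if kind == "category":
            if name in pending:
                pending.remove(name)        # the category name is deleted ...
                pending.extend(members)     # ... and its drivers are appended
        elif kind == "driver":
            if name in pending:
                pending.remove(name)
                enabled.append(name)
            # a driver that is not on the list is not enabled
    if pending:
        raise ValueError("unresolved names: " + ", ".join(pending))
    return enabled
```

One consequence of the single pass is that a category must be defined
before the drivers it names, since a driver line is only matched
against names already on the list.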
770 @node Macro definitions, Device definitions, Driver categories, Output devices
771 @subsection Macro definitions
773 Macro definitions take the form @samp{define @var{macroname}
774 @var{definition}}. In such a macro definition, the environment variable
775 @var{macroname} is defined to expand to the value @var{definition}.
776 Before the definition is made, however, any macros used in
777 @var{definition} are expanded.
779 Please note the following nuances of macro usage:
783 For the purposes of this section, @dfn{macro} and @dfn{environment
784 variable} are synonyms.
787 Macros may not take arguments.
790 Macros may not recurse.
793 Macros are just environment variable definitions like other environment
794 variable definitions, with the exception that they are limited in scope
795 to the @file{devices} configuration file.
798 Macros override all other environment variables of the same name (within
799 the scope of @file{devices}).
802 Earlier macro definitions for a particular @var{key} override later
803 ones. In particular, macro definitions on the command line override
804 those in the device definition file. @xref{Non-option Arguments}.
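Two of these points, expansion before definition and the precedence of
earlier definitions, can be shown with a short Python sketch. The
function names are hypothetical, and for brevity only references of
the form @samp{$@var{name}} with a purely alphabetic name are expanded.

```python
import re


def expand(text, macros):
    """Replace each $name reference with its macro value ('' if undefined)."""
    return re.sub(r"\$([A-Za-z]+)",
                  lambda m: macros.get(m.group(1), ""), text)


def define(macros, line):
    """Process one 'define NAME DEFINITION' line against the macros dict."""
    parts = line.split(None, 2)
    if len(parts) < 2 or parts[0] != "define":
        raise ValueError("not a macro definition: %r" % line)
    name = parts[1]
    definition = parts[2] if len(parts) > 2 else ""
    # Earlier definitions take precedence, so an existing entry is kept.
    if name not in macros:
        # Macros used in the definition are expanded before it is stored.
        macros[name] = expand(definition, macros)
```

Seeding @code{macros} with the command-line definitions before reading
the file gives them priority over the definitions in @file{devices}.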
807 There are two predefined macros, whose values are determined at runtime:
811 Defined as the width of the console screen, in columns of text.
814 Defined as the length of the console screen, in lines of text.
818 @node Device definitions, Dimensions, Macro definitions, Output devices
819 @subsection Driver definitions
821 Driver definitions are the ultimate purpose of the @file{devices}
822 configuration file. These are where the real action is. Driver
823 definitions tell PSPP where it should send its output.
825 Each driver definition line is divided into four fields. These fields
826 are delimited by colons (@samp{:}). Each line is subjected to
827 environment variable interpolation before it is processed further
828 (@pxref{Environment substitutions}). From left to right, the four
829 fields are, in brief:
833 A unique identifier, used to determine whether to enable the driver.
836 One of the predefined driver classes supported by PSPP. The
837 currently supported driver classes include `postscript' and `ascii'.
840 Zero or more of the following keywords, delimited by spaces:
845 Indicates that the device is a screen display. This may reduce the
846 amount of buffering done by the driver, to make interactive use more
851 Indicates that the device is a printer.
855 Indicates that the device is a listing file.
858 These options are just hints to PSPP and do not cause the output to be
859 directed to the screen, or to the printer, or to a listing file---those
860 must be set elsewhere in the options. They are used primarily to decide
861 which devices should be enabled at any given time. @xref{SET}, for more
865 An optional set of options to pass to the driver itself. The exact
866 format for the options varies among drivers.
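Splitting a driver definition line into its four fields might look
like the following Python sketch. The function name, the dictionary
keys, and the sample line in the note below are hypothetical. Padding
missing trailing fields with empty strings is an assumption, and the
line is taken to have already undergone environment variable
interpolation.

```python
def parse_driver_line(line):
    """Split one driver definition line into its four colon-delimited fields."""
    # Split on the first three colons only, so that any colon inside the
    # options field is left alone.
    fields = line.split(":", 3)
    fields += [""] * (4 - len(fields))      # assumption: missing fields are empty
    name, driver_class, device_types, options = fields
    return {
        "name": name.strip(),
        "class": driver_class.strip(),
        "types": device_types.split(),      # zero or more space-delimited keywords
        "options": options.strip(),         # driver-specific, left unparsed
    }
```

With a made-up line such as @samp{list:ascii:listing:length=66 width=79},
the sketch yields the name @samp{list}, the class @samp{ascii}, one
device type, and the option text unchanged.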
869 The driver is enabled if:
873 Its driver name is specified on the command line, or
876 It's in a category specified on the command line, or
879 If no categories or driver names are specified on the command line, it
880 is in category @code{default}.
883 For more information on driver names, see @ref{Driver categories}.
885 The class name must be one of those supported by PSPP. The
886 classes supported depend on the options with which PSPP was
887 compiled. See later sections in this chapter for descriptions of the
888 available driver classes.
890 Options are dependent on the driver. See the driver descriptions for
893 @node Dimensions, papersize, Device definitions, Output devices
894 @subsection Dimensions
896 Quite often in configuration it is necessary to specify a length or a
897 size. PSPP uses a common syntax for all such, calling them
898 collectively by the name @dfn{dimensions}.
902 You can specify dimensions in decimal form (@samp{12.5}) or as
903 fractions, either as mixed numbers (@samp{12-1/2}) or raw fractions
907 A number of different units are available. These are suffixed to the
908 numeric part of the dimension. There must be no spaces between the
909 number and the unit. The available units are identical to those offered
910 by the popular typesetting system @TeX{}:
914 inch (1 @code{in} = 2.54 @code{cm})
917 inch (1 @code{in} = 2.54 @code{cm})
920 printer's point (1 @code{in} = 72.27 @code{pt})
923 pica (12 @code{pt} = 1 @code{pc})
926 PostScript point (1 @code{in} = 72 @code{bp})
932 millimeter (10 @code{mm} = 1 @code{cm})
935 didot point (1157 @code{dd} = 1238 @code{pt})
938 cicero (1 @code{cc} = 12 @code{dd})
941 scaled point (65536 @code{sp} = 1 @code{pt})
945 If no explicit unit is given, a DWIM@footnote{Do What I Mean}
946 ``feature'' attempts to guess the best unit:
950 Numbers less than 50 are assumed to be in inches.
953 Numbers 50 or greater are assumed to be in millimeters.
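The syntax and the unit guessing can be checked with a small Python
sketch that converts a dimension to inches. It is an illustration, not
PSPP's parser: the names are hypothetical, only the unit abbreviations
that appear in the equivalences above are recognized, and the accepted
number formats are limited to the three forms described.

```python
import re
from fractions import Fraction

PT_PER_IN = Fraction("72.27")
DD_IN_PT = Fraction(1238, 1157)             # 1157 dd = 1238 pt

# Length of one unit in inches, from the equivalences listed above.
INCHES_PER_UNIT = {
    "in": Fraction(1),
    "cm": 1 / Fraction("2.54"),
    "mm": 1 / Fraction("25.4"),
    "pt": 1 / PT_PER_IN,
    "pc": 12 / PT_PER_IN,
    "bp": Fraction(1, 72),
    "dd": DD_IN_PT / PT_PER_IN,
    "cc": 12 * DD_IN_PT / PT_PER_IN,
    "sp": 1 / (65536 * PT_PER_IN),
}

# Mixed number, raw fraction, or decimal, followed directly by the unit.
_DIMENSION = re.compile(
    r"(?:(\d+)-(\d+)/(\d+)|(\d+)/(\d+)|(\d+(?:\.\d+)?))([a-z]*)")


def parse_dimension(text):
    """Return the length that text denotes, in inches, as a float."""
    m = _DIMENSION.fullmatch(text)
    if m is None:
        raise ValueError("bad dimension: %r" % text)
    whole, num, den, raw_num, raw_den, decimal, unit = m.groups()
    if whole is not None:
        value = int(whole) + Fraction(int(num), int(den))
    elif raw_num is not None:
        value = Fraction(int(raw_num), int(raw_den))
    else:
        value = Fraction(decimal)
    if not unit:
        # No unit given: guess inches below 50, millimeters otherwise.
        unit = "in" if value < 50 else "mm"
    if unit not in INCHES_PER_UNIT:
        raise ValueError("unknown unit: %r" % unit)
    return float(value * INCHES_PER_UNIT[unit])
```

Exact rational arithmetic is used until the final conversion so that
equivalences such as 72.27 @code{pt} to the inch come out without
rounding drift.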
957 @node papersize, Distinguishing line types, Dimensions, Output devices
958 @subsection Paper sizes
960 Output drivers usually deal with some sort of hardcopy media. This
961 media is called @dfn{paper} by the drivers, though in reality it could
962 be a transparency or film or thinly veiled sarcasm. To make it easier
963 for you to deal with paper, PSPP allows you to have (of course!) a
964 configuration file that gives symbolic names, like ``letter'' or
965 ``legal'' or ``a4'', to paper sizes, rather than forcing you to use
966 cryptic numbers like ``8-1/2 x 11'' or ``210 by 297''. Surprisingly
967 enough, this configuration file is named @file{papersize}.
968 @xref{Configuration files}.
970 When PSPP tries to connect a symbolic paper name to a paper size, it
971 reads and parses each non-comment line in the file, in order. The first
972 field on each line must be a symbolic paper name in double quotes.
973 Paper names may not contain double quotes. Paper names are not
974 case-sensitive: @samp{legal} and @samp{Legal} are equivalent.
976 If a match is found for the paper name, the rest of the line is parsed.
977 If it is found to be a pair of dimensions (@pxref{Dimensions}) separated
978 by either @samp{x} or @samp{by}, then those are taken to be the paper
979 size, in order of width followed by length. There @emph{must} be at
980 least one space on each side of @samp{x} or @samp{by}.
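A line that gives a size directly could be recognized as in the Python
sketch below. The function name and the sample entries in the note are
hypothetical. The sketch handles only lines of this one form, returns
the two dimensions as unparsed text, and folds the paper name to lower
case so that comparisons are not case-sensitive.

```python
import re

# "name" in double quotes, then two dimensions separated by x or by,
# with at least one space on each side of the separator.
_SIZE_LINE = re.compile(r'\s*"([^"]*)"\s*(\S+)\s+(?:x|by)\s+(\S+)\s*$')


def parse_size_line(line):
    """Return (name, width, length) for a size line, or None otherwise."""
    m = _SIZE_LINE.match(line)
    if m is None:
        return None
    return m.group(1).lower(), m.group(2), m.group(3)
```

For instance, a made-up entry such as @samp{"Letter" 8-1/2in x 11in}
gives the name @samp{letter} with width @samp{8-1/2in} and length
@samp{11in}, while an entry lacking the spaces around @samp{x} is
rejected.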
982 Otherwise the line must be of the form
983 @samp{"@var{paper-1}"="@var{paper-2}"}. In this case the target of the
984 search becomes paper name @var{paper-2} and the search through the file
987 @node Distinguishing line types, Tokenizing lines, papersize, Output devices
988 @subsection How lines are divided into types
990 The lines in @file{devices} are distinguished in the following manner:
994 Leading whitespace is removed.
997 If the resulting line begins with the exact string @code{define},
998 followed by one or more whitespace characters, the line is processed as
1002 Otherwise, the line is scanned for the first instance of a colon
1003 (@samp{:}) or an equals sign (@samp{=}).
1006 If a colon is encountered first, the line is processed as a driver
1010 Otherwise, if an equals sign is encountered, the line is processed as a
1014 Otherwise, the line is ill-formed.
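
The classification steps above can be written down as a short Python
sketch.  It is an illustration only; the function name
@code{classify_line} and the returned labels are hypothetical and
simply name which rule matched.

@verbatim
```python
def classify_line(line):
    """Classify one line of the devices file by the rules in the text."""
    text = line.lstrip()  # leading whitespace is removed first
    # The exact string "define" followed by at least one whitespace character.
    if text[:6] == "define" and len(text) > 6 and text[6].isspace():
        return "define"
    colon = text.find(":")
    equals = text.find("=")
    # Whichever of ':' and '=' appears first decides the line type.
    if colon != -1 and (equals == -1 or colon < equals):
        return "colon-first"
    if equals != -1:
        return "equals-first"
    return "ill-formed"
```
@end verbatim

Note that a line such as @samp{defined=x} is not treated as a
@code{define} line in this sketch, because no whitespace follows the
word @code{define}.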
1017 @node Tokenizing lines, , Distinguishing line types, Output devices
1018 @subsection How lines are divided into tokens
1020 Each driver definition line is run through a simple tokenizer. This
1021 tokenizer recognizes two basic types of tokens.
1023 The first type is an equals sign (@samp{=}). Equals signs are both
1024 delimiters between tokens and tokens in themselves.
1026 The second type is an identifier or string token. Identifiers and
1027 strings are equivalent after tokenization, though they are written
1028 differently. An identifier is any string of characters other than
1029 whitespace or equals sign.
1031 A string is introduced by a single- or double-quote character (@samp{'}
1032 or @samp{"}) and, in general, continues until the next occurrence of
1033 that same character. The following standard C escapes can also be
1034 embedded within strings:
1038 A single-quote (@samp{'}).
1041 A double-quote (@samp{"}).
1044 A question mark (@samp{?}). Included for hysterical raisins.
1047 A backslash (@samp{\}).
1050 Audio bell (ASCII 7).
1053 Backspace (ASCII 8).
1056 Formfeed (ASCII 12).
1062 Carriage return (ASCII 13).
1068 Vertical tab (ASCII 11).
1070 @item \@var{o}@var{o}@var{o}
1071 Each @samp{o} must be an octal digit. The character is the one having
1072 the octal value specified. Any number of octal digits is read and
1073 interpreted; only the lower 8 bits are used.
1075 @item \x@var{h}@var{h}
1076 Each @samp{h} must be a hex digit. The character is the one having the
1077 hexadecimal value specified. Any number of hex digits is read and
1078 interpreted; only the lower 8 bits are used.
1081 Tokens, outside of quoted strings, are delimited by whitespace or equals
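
A tokenizer along these lines might look like the Python sketch below.
It is not PSPP's implementation.  The names @code{tokenize} and
@code{read_string} are hypothetical, the @samp{\n} and @samp{\t}
escapes are included on the assumption that they behave as in C, and
raising an error for an unterminated string or unknown escape is also an
assumption.

@verbatim
```python
SIMPLE_ESCAPES = {
    "'": "'", '"': '"', "?": "?", "\\": "\\",
    "a": "\a", "b": "\b", "f": "\f", "r": "\r", "v": "\v",
    "n": "\n", "t": "\t",  # assumed, by analogy with C
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def read_string(line, start):
    """Decode the quoted string opening at line[start]; return (text, next index)."""
    quote = line[start]
    out = []
    i = start + 1
    while i < len(line) and line[i] != quote:
        if line[i] != "\\":
            out.append(line[i])
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(line):
            break
        esc = line[i]
        if esc in OCTAL_DIGITS:
            j = i
            while j < len(line) and line[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(line[i:j], 8) & 0xFF))  # keep the lower 8 bits
            i = j
        elif esc == "x":
            j = i + 1
            while j < len(line) and line[j] in HEX_DIGITS:
                j += 1
            if j == i + 1:
                raise ValueError("\\x with no hex digits")
            out.append(chr(int(line[i + 1:j], 16) & 0xFF))
            i = j
        elif esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 1
        else:
            raise ValueError("unknown escape: \\" + esc)
    if i >= len(line):
        raise ValueError("unterminated string")
    return "".join(out), i + 1


def tokenize(line):
    """Split a line into ("=", "=") and ("text", value) tokens."""
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == "=":
            tokens.append(("=", "="))
            i += 1
        elif ch in "'\"":
            text, i = read_string(line, i)
            tokens.append(("text", text))
        else:
            j = i
            while j < len(line) and not line[j].isspace() and line[j] != "=":
                j += 1
            tokens.append(("text", line[i:j]))
            i = j
    return tokens
```
@end verbatim

Identifiers and strings share the @code{"text"} kind because they are
equivalent after tokenization; the separate @code{"="} kind keeps a
quoted equals sign distinct from the delimiter.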
1084 @node PostScript driver class, ASCII driver class, Output devices, Configuration
1085 @section The PostScript driver class
1087 The @code{postscript} driver class is used to produce output that is
1088 acceptable to PostScript printers and to PC-based PostScript
1089 interpreters such as Ghostscript. Continuing a long tradition,
1090 PSPP's PostScript driver is configurable to the point of
1093 There are actually two PostScript drivers. The first one,
1094 @samp{postscript}, produces ordinary DSC-compliant PostScript output.
1095 The second one, @samp{epsf}, produces an Encapsulated PostScript file.
1096 The two drivers are otherwise identical in configuration and in
1099 The PostScript driver is described in further detail below.
1102 * PS output options:: Output file options.
1103 * PS page options:: Paper, margins, scaling & rotation, more!
1104 * PS file options:: Configuration files.
1105 * PS font options:: Default fonts, font options.
1106 * PS line options:: Line widths, options.
1107 * Prologue:: Details on the PostScript prologue.
1108 * Encodings:: Details on PostScript font encodings.
1111 @node PS output options, PS page options, PostScript driver class, PostScript driver class
1112 @subsection PostScript output options
1114 These options deal with the form of the output and the output file
1118 @item output-file=@var{filename}
1120 File to which output should be sent. This can be an ordinary filename
1121 (e.g., @code{"pspp.ps"}), a pipe filename (e.g., @code{"|lpr"}), or
1122 stdout (@code{"-"}). Default: @code{"pspp.ps"}.
1124 @item color=@var{boolean}
1126 Most of the time black-and-white PostScript devices are smart enough to
1127 map colors to shades themselves. However, you can cause the PSPP
1128 output driver to do an ugly simulation of this in its own driver by
1129 turning @code{color} off. Default: @code{on}.
1131 This is a boolean setting, as are many settings in the PostScript
1132 driver. Valid positive boolean values are @samp{on}, @samp{true},
1133 @samp{yes}, and nonzero integers. Negative boolean values are
1134 @samp{off}, @samp{false}, @samp{no}, and zero.
1136 @item data=@var{data-type}
1138 One of @code{clean7bit}, @code{clean8bit}, or @code{binary}. This
1139 controls what characters will be written to the output file. PostScript
1140 produced with @code{clean7bit} can be transmitted over 7-bit
1141 transmission channels that use ASCII control characters for line
1142 control. @code{clean8bit} is similar but allows characters above 127 to
1143 be written to the output file. @code{binary} allows any character in
1144 the output file. Default: @code{clean7bit}.
1146 @item line-ends=@var{line-end-type}
1148 One of @code{cr}, @code{lf}, or @code{crlf}. This controls what is used
1149 for newline in the output file. Default: @code{cr}.
1151 @item optimize-line-size=@var{level}
1153 Either @code{0} or @code{1}. If @var{level} is @code{1}, then short
1154 line segments will be collected and merged into longer ones. This
1155 reduces output file size but requires more time and memory. A
1156 @var{level} of @code{0} has the advantage of being better for
1157 interactive environments. @code{1} is the default unless the
1158 @code{screen} flag is set; in that case, the default is @code{0}.
1160 @item optimize-text-size=@var{level}
1162 One of @code{0}, @code{1}, or @code{2}, each higher level representing
1163 correspondingly more aggressive space savings for text in the output
1164 file and requiring correspondingly more time and memory. Unfortunately
1165 the levels presently are all the same. @code{1} is the default unless
1166 the @code{screen} flag is set; in that case, the default is @code{0}.
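
The boolean values listed under @code{color} above can be recognized
with a few lines of code.  The Python sketch below is illustrative only:
the name @code{parse_boolean} is hypothetical, and treating the keywords
as case-insensitive is an assumption that the text does not confirm.

@verbatim
```python
def parse_boolean(text):
    """Map a boolean option value to True or False."""
    word = text.strip().lower()  # assumption: keywords are case-insensitive
    if word in ("on", "true", "yes"):
        return True
    if word in ("off", "false", "no"):
        return False
    try:
        return int(word) != 0  # nonzero integers are true, zero is false
    except ValueError:
        raise ValueError(f"not a boolean value: {text!r}") from None
```
@end verbatim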
1169 @node PS page options, PS file options, PS output options, PostScript driver class
1170 @subsection PostScript page options
1172 These options affect page setup:
1175 @item headers=@var{boolean}
1177 Controls whether the standard headers showing the time and date and
1178 title and subtitle are printed at the top of each page. Default:
1181 @item paper-size=@var{paper-size}
1183 Paper size, either as a symbolic name (e.g., @code{letter} or @code{a4})
1184 or specific measurements (e.g., @code{8-1/2x11} or @code{"210 x 297"}).
1185 @xref{papersize, , Paper sizes}. Default: @code{letter}.
1187 @item orientation=@var{orientation}
1189 Either @code{portrait} or @code{landscape}. Default: @code{portrait}.
1191 @item left-margin=@var{dimension}
1192 @itemx right-margin=@var{dimension}
1193 @itemx top-margin=@var{dimension}
1194 @itemx bottom-margin=@var{dimension}
1196 Sets the margins around the page. The headers, if enabled, are not
1197 included in the margins; they are in addition to the margins. For a
1198 description of dimensions, see @ref{Dimensions}. Default: @code{0.5in}.
1202 @node PS file options, PS font options, PS page options, PostScript driver class
1203 @subsection PostScript file options
1205 Oh, my. You don't really want to know about the way that the PostScript
1206 driver deals with files, do you? Well I suppose you're entitled, but I
1207 warn you right now: it's not pretty. Here goes@enddots{}
1209 First let's look at the options that are available:
1213 @item font-dir=@var{font-directory}
1215 Sets the font directory. Default: @code{devps}.
1217 @item prologue-file=@var{prologue-file-name}
1219 Sets the name of the PostScript prologue file. You can write your own
1220 prologue, though I have no idea why you'd want to: see @ref{Prologue}.
1221 Default: @code{ps-prologue}.
1223 @item device-file=@var{device-file-name}
1225 Sets the name of the Groff-format device description file. The
1226 PostScript driver reads this in order to know about the scaling of fonts
1227 and so on. The format of such files is described in groff_font(5),
1228 included with Groff. Default: @code{DESC}.
1230 @item encoding-file=@var{encoding-file-name}
1232 Sets the name of the encoding file. This file contains a list of all
1233 font encodings that will be needed so that the driver can put all of
1234 them at the top of the prologue. @xref{Encodings}. Default:
1235 @code{ps-encodings}.
1237 If the specified encoding file cannot be found, this error will be
1238 silently ignored, since most people do not need any encodings besides
1239 the ones that can be found using @code{auto-encode}, described below.
1241 @item auto-encode=@var{boolean}
1243 When enabled, the font encodings needed by the default proportional- and
1244 fixed-pitch fonts will automatically be dumped to the PostScript
1245 output. Otherwise, it is assumed that the user has an encoding file
1246 and knows how to use it (@pxref{Encodings}). There is probably no good
1247 reason to turn off this convenient feature. Default: @code{on}.
1251 Next I suppose it's time to describe the search algorithm. When the
1252 PostScript driver needs a file, whether that file be a font, a
1253 PostScript prologue, or what you will, it searches in this manner:
1258 Constructs a path by taking the first of the following that is defined:
1263 Environment variable @code{STAT_GROFF_FONT_PATH}. @xref{Environment
1267 Environment variable @code{GROFF_FONT_PATH}.
1270 The compiled-in fallback default.
1274 Constructs a base name from concatenating, in order, the font directory,
1275 a path separator (@samp{/} or @samp{\}), and the file to be found. A
1276 typical base name would be something like @code{devps/ps-encodings}.
1279 Searches for the base name in the path constructed above. If the file
1280 is found, the algorithm terminates.
1283 Searches for the base name in the standard configuration path. See
1284 @ref{File locations}, for more details. If the file is found, the
1285 algorithm terminates.
1288 At this point we remove the font directory and path separator from the
1289 base name. Now the base name is simply the file to be found, e.g.,
1290 @code{ps-encodings}.
1293 Searches for the base name in the path constructed in the first step.
1294 If the file is found, the algorithm terminates.
1297 Searches for the base name in the standard configuration path. If the
1298 file is found, the algorithm terminates.
1301 The algorithm terminates unsuccessfully.
1304 So, as you see, there are several ways to configure the PostScript
1305 drivers. Careful selection of techniques can make the configuration
1306 very flexible indeed.
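
The search order described above can be summarized in a short Python
sketch.  It is an approximation for illustration, not the driver's
actual code.  The function names are hypothetical, the parameters
@code{config_path} and @code{fallback_path} stand in for the standard
configuration path and the compiled-in default (neither value is given
here), and splitting the environment variables on the platform's path
list separator is an assumption.

@verbatim
```python
import os


def find_in_path(name, directories):
    """Return the first existing file called name in directories, else None."""
    for directory in directories:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def find_ps_file(filename, font_dir="devps", config_path=(),
                 fallback_path=(), environ=None):
    """Locate a PostScript driver file using the documented search order."""
    env = os.environ if environ is None else environ
    # Step 1: the font path is the first of these that is defined.
    for variable in ("STAT_GROFF_FONT_PATH", "GROFF_FONT_PATH"):
        if variable in env:
            font_path = env[variable].split(os.pathsep)
            break
    else:
        font_path = list(fallback_path)
    # First try with the font directory prefixed, then the bare file name.
    for base_name in (os.path.join(font_dir, filename), filename):
        # Each base name is tried in the font path, then the configuration path.
        for directories in (font_path, config_path):
            found = find_in_path(base_name, directories)
            if found is not None:
                return found
    return None  # the algorithm terminates unsuccessfully
```
@end verbatim

Passing @code{environ} explicitly, as a plain dictionary, makes the
lookup easy to try out without changing the real environment.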
1308 @node PS font options, PS line options, PS file options, PostScript driver class
1309 @subsection PostScript font options
1311 The list of available font options is short and sweet:
1314 @item prop-font=@var{font-name}
1316 Sets the default proportional font. The name should be that of a
1317 PostScript font. Default: @code{"Helvetica"}.
1319 @item fixed-font=@var{font-name}
1321 Sets the default fixed-pitch font. The name should be that of a
1322 PostScript font. Default: @code{"Courier"}.
1324 @item font-size=@var{font-size}
1326 Sets the size of the default fonts, in thousandths of a point. Default:
1331 @node PS line options, Prologue, PS font options, PostScript driver class
1332 @subsection PostScript line options
1334 Most tables contain lines, or rules, between cells. Some features of
1335 the way that lines are drawn in PostScript tables are user-definable:
1339 @item line-style=@var{style}
1341 Sets the style used for lines used to divide tables into sections.
1342 @var{style} must be either @code{thick}, in which case thick lines are
1343 used, or @code{double}, in which case double lines are used. Default:
1346 @item line-gutter=@var{dimension}
1348 Sets the line gutter, which is the amount of whitespace on either side
1349 of lines that border text or graphics objects. @xref{Dimensions}.
1350 Default: @code{0.5pt}.
1352 @item line-spacing=@var{dimension}
1354 Sets the line spacing, which is the amount of whitespace that separates
1355 lines that are side by side, as in a double line. Default:
1358 @item line-width=@var{dimension}
1360 Sets the width of a typical line used in tables. Default: @code{0.5pt}.
1362 @item line-width-thick=@var{dimension}
1364 Sets the width of a thick line used in tables. Not used if
1365 @code{line-style} is set to @code{double}. Default: @code{1.5pt}.
1369 @node Prologue, Encodings, PS line options, PostScript driver class
1370 @subsection The PostScript prologue
1372 Most PostScript files that are generated mechanically by programs
1373 consist of two parts: a prologue and a body. The prologue is generally
1374 a collection of boilerplate. Only the body differs greatly between
1375 two outputs from the same program.
1377 This is also the strategy used in the PSPP PostScript driver. In
1378 general, the prologue supplied with PSPP will be more than sufficient.
1379 In this case, you will not need to read the rest of this section.
1380 However, hackers might want to know more. Read on, if you fall into
1383 The prologue is dumped into the output stream essentially unmodified.
1384 However, two actions are performed on its lines. First, certain lines
1385 may be omitted as specified in the prologue file itself. Second,
1386 variables are substituted.
1388 The following lines are omitted:
1392 All lines that contain three bangs in a row (@code{!!!}).
1395 Lines that contain @code{!eps}, if the PostScript driver is producing
1396 ordinary PostScript output. Otherwise an EPS file is being produced,
1397 and the line is included in the output, although everything following
1398 @code{!eps} is deleted.
1401 Lines that contain @code{!ps}, if the PostScript driver is producing EPS
1402 output. Otherwise, ordinary PostScript is being produced, and the line
1403 is included in the output, although everything following @code{!ps} is
1407 The following are the variables that are substituted. Only the
1408 variables listed are substituted; environment variables are not.
1409 @xref{Environment substitutions}.
1414 The page bounding box, in points, as four space-separated numbers. For
1415 U.S. letter size paper, this is @samp{0 0 612 792}.
1419 PSPP version as a string: @samp{GNU PSPP 0.1b}, for example.
1423 Date the file was created. Example: @samp{Tue May 21 13:46:22 1991}.
1427 Value of the @code{data} PostScript driver option, as one of the strings
1428 @samp{Clean7Bit}, @samp{Clean8Bit}, or @samp{Binary}.
1432 Page orientation, as one of the strings @code{Portrait} or
1437 Under multiuser OSes, the user's login name, taken either from the
1438 environment variable @code{LOGNAME} or, if that fails, the result of the
1439 C library function @code{getlogin()}. Defaults to @samp{nobody}.
1443 System hostname as reported by @code{gethostname()}. Defaults to
1448 Name of the default proportional font, prefixed by the word
1449 @samp{font} and a space. Example: @samp{font Times-Roman}.
1453 Name of the default fixed-pitch font, prefixed by the word @samp{font}
1458 The page scaling factor as a floating-point number. Example:
1459 @code{1.0}. Note that this is also passed as an argument to the BP
1465 The paper length and paper width, respectively, in thousandths of a
1466 point. Note that these are also passed as arguments to the BP macro.
1471 The left margin and top margin, respectively, in thousandths of a
1472 point. Note that these are also passed as arguments to the BP macro.
1476 Document title as a string. This is not the title specified in the
1477 PSPP syntax file. A typical title is the word @samp{PSPP} followed
1478 by the syntax file name in parentheses. Example: @samp{PSPP
1483 PSPP syntax file name. Example: @samp{mary96/first.stat}.
1487 Any other questions about the PostScript prologue can best be answered
1488 by examining the default prologue or the PSPP source.
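
As an illustration of the line-omission rules given earlier in this
section, here is a small Python sketch.  It is not the driver's code.
The name @code{filter_ps_prologue} is hypothetical, and two details are
assumptions: the marker itself is deleted along with the text that
follows it, and a line carrying both markers is dropped.  Variable
substitution is left out.

@verbatim
```python
def filter_ps_prologue(lines, eps):
    """Apply the !!!, !eps and !ps rules; eps is True for EPS output."""
    # The marker for the format being produced keeps the line;
    # the marker for the other format drops it.
    keep_marker, drop_marker = ("!eps", "!ps") if eps else ("!ps", "!eps")
    result = []
    for line in lines:
        if "!!!" in line:
            continue  # three bangs in a row: always omitted
        if drop_marker in line:
            continue
        if keep_marker in line:
            # Keep the line but delete the marker and everything after it.
            line = line[:line.index(keep_marker)].rstrip()
        result.append(line)
    return result
```
@end verbatim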
1490 @node Encodings, , Prologue, PostScript driver class
1491 @subsection PostScript encodings
1493 PostScript fonts often contain many more than 256 characters, in order
1494 to accommodate foreign language characters and special symbols.
1495 PostScript uses @dfn{encodings} to map these onto single-byte symbol
1496 sets. Each font can have many different encodings applied to it.
1498 PSPP's PostScript driver needs to know which encoding to apply to each
1499 font. It can determine this from the information encapsulated in the
1500 Groff font description that it reads. However, there is an additional
1501 problem---for efficiency, the PostScript driver needs to have a complete
1502 list of all encodings that will be used in the entire session @emph{when
1503 it opens the output file}. For this reason, it can't use the
1504 information built into the fonts because it doesn't know which fonts
1507 As a stopgap solution, there are two mechanisms for specifying which
1508 encodings will be used. The first mechanism is automatic and it is the
1509 only one that most PSPP users will ever need. The second mechanism is
1510 manual, but it is more flexible. Either mechanism or both may be used
1513 The first mechanism is activated by the @samp{auto-encode} driver option
1514 (@pxref{PS file options}). When enabled, @samp{auto-encode} causes the
1515 PostScript driver to include the encodings used by the default
1516 proportional and fixed-pitch fonts (@pxref{PS font options}). Many
1517 PSPP output files will only need these encodings.
1519 The second mechanism is the file specified by the @samp{encoding-file}
1520 option (@pxref{PS file options}). If it exists, this file must consist
1521 of lines in PSPP configuration-file format (@pxref{Configuration
1522 files}). Each line that is not a comment should name a PostScript
1523 encoding to include in the output.
1525 It is not an error if an encoding is included more than once, by either
1526 mechanism. It will appear only once in the output. It is also not an
1527 error if an encoding is included in the output but never used. It
1528 @emph{is} an error if an encoding is used but not included by one of
1529 these mechanisms. In this case, the built-in PostScript encoding
1530 @samp{ISOLatin1Encoding} is substituted.
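
The rules in the preceding paragraph amount to an ordered set with a
fallback, as in the following Python sketch.  The function names and the
encoding names used when trying it out are hypothetical; only the
fallback name @samp{ISOLatin1Encoding} comes from the text.

@verbatim
```python
FALLBACK_ENCODING = "ISOLatin1Encoding"


def collect_encodings(auto_encodings, file_encodings):
    """Merge both mechanisms; each encoding appears once, in first-seen order."""
    included = []
    for name in list(auto_encodings) + list(file_encodings):
        if name not in included:
            included.append(name)
    return included


def resolve_encoding(name, included):
    """Return the encoding to use for a font, substituting the fallback."""
    return name if name in included else FALLBACK_ENCODING
```
@end verbatim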
1532 @node ASCII driver class, HTML driver class, PostScript driver class, Configuration
1533 @section The ASCII driver class
1535 The ASCII driver class produces output that can be displayed on a
1536 terminal or output to printers. All of its options are highly
1537 configurable. The ASCII driver has class name @samp{ascii}.
1539 The ASCII driver is described in further detail below.
1542 * ASCII output options:: Output file options.
1543 * ASCII page options:: Page size, margins, more.
1544 * ASCII font options:: Box character, bold & italics.
1547 @node ASCII output options, ASCII page options, ASCII driver class, ASCII driver class
1548 @subsection ASCII output options
1551 @item output-file=@var{filename}
1553 File to which output should be sent. This can be an ordinary filename
1554 (e.g., @code{"pspp.txt"}), a pipe filename (e.g., @code{"|lpr"}), or
1555 stdout (@code{"-"}). Default: @code{"pspp.list"}.
1557 @item char-set=@var{char-set-type}
1559 One of @samp{ascii} or @samp{latin1}. This has no effect on output at
1560 the present time. Default: @code{ascii}.
1562 @item form-feed-string=@var{form-feed-value}
1564 The string written to the output to cause a formfeed. See also
1565 @code{paginate}, described below, for a related setting. Default:
1568 @item newline-string=@var{newline-value}
1570 The string written to the output to cause a newline (carriage return
1571 plus linefeed). The default, which can be specified explicitly with
1572 @code{newline-string=default}, is to use the system-dependent newline
1573 sequence by opening the output file in text mode. This is usually the
1576 However, @code{newline-string} can be set to any string. When this is
1577 done, the output file is opened in binary mode.
1579 @item paginate=@var{boolean}
1581 If set, a formfeed (as set in @code{form-feed-string}, described above)
1582 will be written to the device after every page. Default: @code{on}.
1584 @item tab-width=@var{tab-width-value}
1586 The distance between tab stops for this device. If set to 0, tabs will
1587 not be used in the output. Default: @code{8}.
1589 @item init=@var{initialization-string}
1591 String written to the device before anything else, at the beginning of
1592 the output. Default: @code{""} (the empty string).
1594 @item done=@var{finalization-string}
1596 String written to the device after everything else, at the end of the
1597 output. Default: @code{""} (the empty string).
1600 @node ASCII page options, ASCII font options, ASCII output options, ASCII driver class
1601 @subsection ASCII page options
1603 These options affect page setup:
1606 @item headers=@var{boolean}
1608 If enabled, two lines of header information giving title and subtitle,
1609 page number, date and time, and PSPP version are printed at the top of
1610 every page. These two lines are in addition to any top margin
1611 requested. Default: @code{on}.
1613 @item length=@var{line-count}
1615 Physical length of a page, in lines. Headers and margins are subtracted
1616 from this value. Default: @code{66}.
1618 @item width=@var{character-count}
1620 Physical width of a page, in characters. Margins are subtracted from
1621 this value. Default: @code{130}.
1623 @item lpi=@var{lines-per-inch}
1625 Number of lines per vertical inch. Not currently used. Default: @code{6}.
1627 @item cpi=@var{characters-per-inch}
1629 Number of characters per horizontal inch. Not currently used. Default:
1632 @item left-margin=@var{left-margin-width}
1634 Width of the left margin, in characters. PSPP subtracts this value
1635 from the page width. Default: @code{0}.
1637 @item right-margin=@var{right-margin-width}
1639 Width of the right margin, in characters. PSPP subtracts this value
1640 from the page width. Default: @code{0}.
1642 @item top-margin=@var{top-margin-lines}
1644 Length of the top margin, in lines. PSPP subtracts this value from
1645 the page length. Default: @code{2}.
1647 @item bottom-margin=@var{bottom-margin-lines}
1649 Length of the bottom margin, in lines. PSPP subtracts this value from
1650 the page length. Default: @code{2}.
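
The subtractions described in this section can be checked with a small
calculation.  The Python sketch below uses the documented defaults as
parameter defaults; the function name @code{usable_page} is
hypothetical, and rejecting a page with no room left is an assumption.

@verbatim
```python
def usable_page(length=66, width=130, top_margin=2, bottom_margin=2,
                left_margin=0, right_margin=0, headers=True):
    """Return (lines, columns) left for output after headers and margins."""
    header_lines = 2 if headers else 0  # headers take two lines when enabled
    lines = length - top_margin - bottom_margin - header_lines
    columns = width - left_margin - right_margin
    if lines <= 0 or columns <= 0:
        raise ValueError("margins and headers leave no room for output")
    return lines, columns
```
@end verbatim

With every option left at its default, 66 lines less two margins of 2
lines and 2 header lines leaves 60 lines of 130 columns.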
1654 @node ASCII font options, , ASCII page options, ASCII driver class
1655 @subsection ASCII font options
1657 These are the ASCII font options:
1660 @item box[@var{line-type}]=@var{box-chars}
1662 The characters used for lines in tables produced by the ASCII driver can
1663 be changed using this option. @var{line-type} is used to indicate which
1664 type of line to change; @var{box-chars} is the character or string of
1665 characters to use for this type of line.
1667 @var{line-type} must be a 4-digit number in base 4. The digits are in
1668 the order `right', `bottom', `left', `top'. The four possibilities for
1682 Special device-defined line, if one is available; otherwise, a double
1691 Sets @samp{|} as the character to use for a single-width line with
1692 bottom and top components.
1696 Sets @samp{#} as the character to use for the intersection of four
1697 double-width lines, one each from the top, bottom, left and right.
1699 @item box[1100]="\xda"
1701 Sets @samp{"\xda"}, which under MS-DOG is a box character suitable for
1702 the top-left corner of a box, as the character for the intersection of
1703 two single-width lines, one each from the right and bottom.
1711 @code{box[0000]=" "}
1714 @code{box[1000]="-"}
1715 @*@code{box[0010]="-"}
1716 @*@code{box[1010]="-"}
1719 @code{box[0100]="|"}
1720 @*@code{box[0001]="|"}
1721 @*@code{box[0101]="|"}
1724 @code{box[2000]="="}
1725 @*@code{box[0020]="="}
1726 @*@code{box[2020]="="}
1729 @code{box[0200]="#"}
1730 @*@code{box[0002]="#"}
1731 @*@code{box[0202]="#"}
1734 @code{box[3000]="="}
1735 @*@code{box[0030]="="}
1736 @*@code{box[3030]="="}
1739 @code{box[0300]="#"}
1740 @*@code{box[0003]="#"}
1741 @*@code{box[0303]="#"}
1744 For all others, @samp{+} is used unless there are double lines or
1745 special lines, in which case @samp{#} is used.
1748 @item italic-on=@var{italic-on-string}
1750 Character sequence written to turn on italics or underline printing. If
1751 this is set to @code{overstrike}, then the driver will simulate
1752 underlining by overstriking with underscore characters (@samp{_}) in the
1753 manner described by @code{overstrike-style} and
1754 @code{carriage-return-style}. Default: @code{overstrike}.
1756 @item italic-off=@var{italic-off-string}
1758 Character sequence to turn off italics or underline printing. Default:
1759 @code{""} (the empty string).
1761 @item bold-on=@var{bold-on-string}
1763 Character sequence written to turn on bold or emphasized printing. If
1764 set to @code{overstrike}, then the driver will simulate bold printing
1765 by overstriking characters in the manner described by
1766 @code{overstrike-style} and @code{carriage-return-style}. Default:
1769 @item bold-off=@var{bold-off-string}
1771 Character sequence to turn off bold or emphasized printing. Default:
1772 @code{""} (the empty string).
1774 @item bold-italic-on=@var{bold-italic-on-string}
1776 Character sequence written to turn on bold-italic printing. If set to
1777 @code{overstrike}, then the driver will simulate bold-italics by
1778 overstriking twice, once with the character, a second time with an
1779 underscore (@samp{_}) character, in the manner described by
1780 @code{overstrike-style} and @code{carriage-return-style}. Default:
1783 @item bold-italic-off=@var{bold-italic-off-string}
1785 Character sequence to turn off bold-italic printing. Default: @code{""}
1788 @item overstrike-style=@var{overstrike-option}
1790 Either @code{single} or @code{line}:
1794 If @code{single} is selected, then, to overstrike a line of text, the
1795 output driver will output a character, backspace, overstrike, output a
1796 character, backspace, overstrike, and so on along a line.
1799 If @code{line} is selected then the output driver will output an entire
1800 line, then backspace or emit a carriage return (as indicated by
1801 @code{carriage-return-style}), then overstrike the entire line at once.
1804 @code{single} is recommended for use with ttys and programs that
1805 understand overstriking in text files, such as the pager @code{less}.
1806 @code{single} will also work with printer devices but results in rapid
1807 back-and-forth motions of the printhead that can cause the printer to
1808 physically overheat!
1810 @code{line} is recommended for use with printer devices. Most programs
1811 that understand overstriking in text files will not properly deal with
1814 Default: @code{single}.
1816 @item carriage-return-style=@var{carriage-return-type}
1818 Either @code{bs} or @code{cr}. This option applies only when one or
1819 more of the font commands is set to @code{overstrike} and, at the same
1820 time, @code{overstrike-style} is set to @code{line}.
1824 If @code{bs} is selected then the driver will return to the beginning of
1825 a line by emitting a sequence of backspace characters (ASCII 8).
1828 If @code{cr} is selected then the driver will return to the beginning of
1829 a line by emitting a single carriage-return character (ASCII 13).
1832 Although @code{cr} is preferred as being more compact, @code{bs} is more
1833 general since some devices do not interpret carriage returns in the
1834 desired manner. Default: @code{bs}.
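
To make the interaction of @code{overstrike-style} and
@code{carriage-return-style} concrete, the Python sketch below builds
the character sequence for one piece of text.  It is an illustration
under assumptions, not the driver's code: the function name is
hypothetical, the text is assumed to start at the left edge of the line
(so that a carriage return lands in the right place), and bold-italic
double overstriking is left out.

@verbatim
```python
BACKSPACE = "\b"        # ASCII 8
CARRIAGE_RETURN = "\r"  # ASCII 13


def overstrike(text, style="single", carriage_return="bs", underline=False):
    """Return text followed by the characters that overstrike it."""
    # Bold restrikes the same characters; underlining strikes underscores.
    strike = "_" * len(text) if underline else text
    if style == "single":
        # character, backspace, overstrike, and so on along the line
        return "".join(c + BACKSPACE + s for c, s in zip(text, strike))
    if style == "line":
        # whole line, return to its start, then overstrike it all at once
        if carriage_return == "bs":
            back = BACKSPACE * len(text)
        else:
            back = CARRIAGE_RETURN
        return text + back + strike
    raise ValueError("overstrike-style must be 'single' or 'line'")
```
@end verbatim

The @code{single} form keeps every overstrike next to its character,
which is what pagers expect; the @code{line} form moves the print head
back only once per line.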
1837 @node HTML driver class, Miscellaneous configuring, ASCII driver class, Configuration
1838 @section The HTML driver class
1840 The @code{html} driver class is used to produce output for viewing in
1841 tables-capable web browsers such as Emacs' w3-mode. Its configuration
1842 is very simple. Currently, the output has a very plain format. In the
1843 future, further work may be done on improving the output appearance.
1845 There are few options for use with the @code{html} driver class:
1848 @item output-file=@var{filename}
1850 File to which output should be sent. This can be an ordinary filename
1851 (e.g., @code{"pspp.html"}), a pipe filename (e.g., @code{"|lpr"}), or
1852 stdout (@code{"-"}). Default: @code{"pspp.html"}.
1854 @item prologue-file=@var{prologue-file-name}
1856 Sets the name of the HTML prologue file. You can write your own
1857 prologue if you want to customize colors or other settings: see
1858 @ref{HTML Prologue}. Default: @code{html-prologue}.
1862 * HTML Prologue:: Format of the HTML prologue file.
1865 @node HTML Prologue, , HTML driver class, HTML driver class
1866 @subsection The HTML prologue
1868 HTML files that are generated by PSPP consist of two parts: a prologue
1869 and a body. The prologue is a collection of boilerplate. Only the body
1870 differs greatly between two outputs. You can tune the colors and other
1871 attributes of the output by editing the prologue.
1873 The prologue is dumped into the output stream essentially unmodified.
1874 However, two actions are performed on its lines. First, certain lines
1875 may be omitted as specified in the prologue file itself. Second,
1876 variables are substituted.
1878 The following lines are omitted:
1882 All lines that contain three bangs in a row (@code{!!!}).
1885 Lines that contain @code{!title}, if no title is set for the output. If
1886 a title is set, then the characters @code{!title} are removed before the
1890 Lines that contain @code{!subtitle}, if no subtitle is set for the
1891 output. If a subtitle is set, then the characters @code{!subtitle} are
1892 removed before the line is output.
1895 The following are the variables that are substituted. Only the
1896 variables listed are substituted; environment variables are not.
1897 @xref{Environment substitutions}.
1902 PSPP version as a string: @samp{GNU PSPP 0.1b}, for example.
1906 Date the file was created. Example: @samp{Tue May 21 13:46:22 1991}.
1910 Under multiuser OSes, the user's login name, taken either from the
1911 environment variable @code{LOGNAME} or, if that fails, the result of the
1912 C library function @code{getlogin()}. Defaults to @samp{nobody}.
1916 System hostname as reported by @code{gethostname()}. Defaults to
1921 Document title as a string. This is the title specified in the PSPP
1926 Document subtitle as a string.
1930 PSPP syntax file name. Example: @samp{mary96/first.stat}.
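
The line-omission rules for the HTML prologue can be sketched in Python
as follows.  This is an illustration only: the function name
@code{filter_html_prologue} is hypothetical, an empty title or subtitle
is treated the same as an unset one (an assumption), and variable
substitution is not shown.

@verbatim
```python
def filter_html_prologue(lines, title=None, subtitle=None):
    """Apply the !!!, !title and !subtitle rules to prologue lines."""
    result = []
    for line in lines:
        if "!!!" in line:
            continue  # three bangs in a row: always omitted
        if "!title" in line:
            if not title:
                continue  # no title set: drop the line
            line = line.replace("!title", "")  # otherwise remove the marker
        if "!subtitle" in line:
            if not subtitle:
                continue  # no subtitle set: drop the line
            line = line.replace("!subtitle", "")
        result.append(line)
    return result
```
@end verbatim

Unlike the PostScript prologue markers, these markers are removed on
their own and the rest of the line is kept.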
1933 @node Miscellaneous configuring, Improving output quality, HTML driver class, Configuration
1934 @section Miscellaneous configuration
1936 The following environment variables can be used to further configure
1942 Used to determine the user's home directory. No default value.
1944 @item STAT_INCLUDE_PATH
1946 Path used to find include files in PSPP syntax files. Defaults vary
1947 across operating systems:
1957 @file{~/.pspp/include}
1960 @file{/usr/local/lib/pspp/include}
1963 @file{/usr/lib/pspp/include}
1966 @file{/usr/local/share/pspp/include}
1969 @file{/usr/share/pspp/include}
1979 @file{C:\PSPP\INCLUDE}
1992 When PSPP invokes an external pager, it uses the first of these that
1993 is defined. There is a default pager only if the person who compiled
1998 The terminal type @code{termcap} or @code{ncurses} will use, if such
1999 support was compiled into PSPP.
2001 @item STAT_OUTPUT_INIT_FILE
2003 The basename used to search for the driver definition file.
2004 @xref{Output devices}. @xref{File locations}. Default: @code{devices}.
2006 @item STAT_OUTPUT_PAPERSIZE_FILE
2008 The basename used to search for the papersize file. @xref{papersize}.
2009 @xref{File locations}. Default: @code{papersize}.
2011 @item STAT_OUTPUT_INIT_PATH
2013 The path used to search for the driver definition file and the papersize
2014 file. @xref{File locations}. Default: the standard configuration path.
2018 The @code{sort} procedure stores its temporary files in this directory.
2019 Default: (UNIX) @file{/tmp}, (MS-DOS) @file{\}, (other OSes) empty string.
2024 Under MS-DOS only, these variables are consulted after TMPDIR, in this
2028 @node Improving output quality, , Miscellaneous configuring, Configuration
2029 @section Improving output quality
2031 When its drivers are set up properly, PSPP can produce output that
2032 looks very good indeed. The PostScript driver, suitably configured, can
2033 produce presentation-quality output. Here are a few guidelines for
2034 producing better-looking output, regardless of output driver. Your
2035 mileage may vary, of course, and everyone has different esthetic
2040 Width is important in PSPP output. Greater output width leads to more
2041 readable output, to a point. Try the following to increase the output
2046 If you're using the ASCII driver with a dot-matrix printer, figure out
2047 what you need to do to put the printer into compressed mode. Put that
2048 string into the @code{init-string} setting. Try to get 132 columns; 160
2049 might be better, but you might find that print that tiny is difficult to
2053 With the PostScript driver, try these ideas:
2060 Legal-size (8.5" x 14") paper in landscape mode.
2063 Reducing font sizes. If you're using 12-point fonts, try 10 point; if
2064 you're using 10-point fonts, try 8 point. Some fonts are more readable
2065 than others at small sizes.
2069 Try to strike a balance between character size and page width.
2072 Use high-quality fonts. Many public domain fonts are poor in quality.
2073 Recently, URW made some high-quality fonts available under the GPL.
2074 These are probably suitable.
2077 Be sure you're using the proper font metrics. The font metrics provided
2078 with PSPP may not correspond to the fonts actually being printed.
2079 This can cause bizarre-looking output.
2082 Make sure that you're using good ink/ribbon/toner. Darker print is
2086 Use plain fonts with serifs, such as Times-Roman or Palatino. Avoid
2087 choosing italic or bold fonts as document base fonts.
2090 @node Invocation, Language, Configuration, Top
2091 @chapter Invoking PSPP
2093 @cindex PSPP, invoking
2095 @cindex command line, options
2096 @cindex options, command-line
2098 pspp [ -B @var{dir} | --config-dir=@var{dir} ] [ -o @var{device} | --device=@var{device} ]
2099 [ -d @var{var}[=@var{value}] | --define=@var{var}[=@var{value}] ] [-u @var{var} | --undef=@var{var} ]
2100 [ -f @var{file} | --out-file=@var{file} ] [ -p | --pipe ] [ -I- | --no-include ]
2101 [ -I @var{dir} | --include=@var{dir} ] [ -i | --interactive ]
2102 [ -n | --edit | --dry-run | --just-print | --recon ]
2103 [ -r | --no-statrc ] [ -h | --help ] [ -l | --list ]
2104 [ -c @var{command} | --command=@var{command} ] [ -s | --safer ]
2105 [ --testing-mode ] [ -V | --version ] [ -v | --verbose ]
2106 [ @var{key}=@var{value} ] @var{file}@enddots{}
2110 * Non-option Arguments:: Specifying syntax files and output devices.
2111 * Configuration Options:: Change the configuration for the current run.
2112 * Input and output options:: Controlling input and output files.
2113 * Language control options:: Language variants.
2114 * Informational options:: Helpful information about PSPP.
2117 @node Non-option Arguments, Configuration Options, Invocation, Invocation
2118 @section Non-option Arguments
2120 Syntax files and output device substitutions can be specified on
2121 PSPP's command line:
2126 A file by itself on the command line will be executed as a syntax file.
2127 PSPP terminates after the syntax file runs, unless the @code{-i} or
2128 @code{--interactive} option is given (@pxref{Language control options}).
2130 @item @var{file1} @var{file2}
2132 When two or more filenames are given on the command line, the first
2133 syntax file is executed, then PSPP's dictionary is cleared, then the second
2134 syntax file is executed.
2136 @item @var{file1} + @var{file2}
2138 If syntax files' names are delimited by a plus sign (@samp{+}), then the
2139 dictionary is not cleared between their executions, as if they were
2140 concatenated together into a single file.
2142 @item @var{key}=@var{value}
2144 Defines an output device macro @var{key} to expand to @var{value},
2145 overriding any macro having the same @var{key} defined in the device
2146 configuration file. @xref{Macro definitions}.
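
As a sketch of the forms above, the following invocations could be used.
The file names @file{first.stat} and @file{second.stat} and the macro
name @samp{viewer} are hypothetical, chosen only for illustration:

@example
pspp first.stat
pspp first.stat second.stat
pspp first.stat + second.stat
pspp viewer=xterm first.stat
@end example

The second invocation clears the dictionary between the two files, while
the third runs them as if they were a single file. The last one defines
the output device macro @samp{viewer} before running @file{first.stat}.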
2150 There is one other way to specify a syntax file, if your operating
2151 system supports it. If you have a syntax file @file{foobar.stat}, put
2155 #! /usr/local/bin/pspp
2158 at the top, and mark the file as executable with @code{chmod +x
2159 foobar.stat}. (If PSPP is not installed in @file{/usr/local/bin},
2160 then insert its actual installation directory into the syntax file
2161 instead.) Now you should be able to invoke the syntax file just by
2162 typing its name. You can include any options on the command line as
2163 usual. PSPP entirely ignores any lines beginning with @samp{#!}.
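
For instance, assuming a Unix-like shell in which the current directory
is not on the search path (an assumption, not a PSPP requirement), the
steps might look like this:

@example
chmod +x foobar.stat
./foobar.stat
./foobar.stat -i
@end example

The last line passes the @code{-i} option just as it would be passed to
@code{pspp} itself.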
2165 @node Configuration Options, Input and output options, Non-option Arguments, Invocation
2166 @section Configuration Options
2168 Configuration options are used to change PSPP's configuration for the
2169 current run. The configuration options are:
2173 @itemx --config-dir=@var{dir}
2175 Sets the configuration directory to @var{dir}. @xref{File locations}.
2177 @item -o @var{device}
2178 @itemx --device=@var{device}
2180 Selects the output device with name @var{device}. If this option is
2181 given more than once, then all devices mentioned are selected. This
2182 option disables all devices besides those mentioned on the command line.
2184 @item -d @var{var}[=@var{value}]
2185 @itemx --define=@var{var}[=@var{value}]
2187 Defines an `environment variable' named @var{var} having the optional
2188 value @var{value} specified. @xref{Variable values}.
2191 @itemx --undef=@var{var}
2193 Undefines the `environment variable' named @var{var}. @xref{Variable values}.
2197 @node Input and output options, Language control options, Configuration Options, Invocation
2198 @section Input and output options
2200 Input and output options affect how PSPP reads input and writes
2201 output. These are the input and output options:
2205 @itemx --out-file=@var{file}
2207 This overrides the output file name for devices designated as listing
2208 devices. If a file named @var{file} already exists, it is overwritten.
2213 Allows PSPP to be used as a filter by causing the syntax file to be
2214 read from stdin and output to be written to stdout. Conflicts with the
2215 @code{-f @var{file}} and @code{--out-file=@var{file}} options.
2220 Clears all directories from the include path. This includes all
2221 directories put in the include path by default. @xref{Miscellaneous configuring}.
2225 @itemx --include=@var{dir}
2227 Appends directory @var{dir} to the path that is searched for include
2228 files in PSPP syntax files.
2230 @item -c @var{command}
2231 @itemx --command=@var{command}
2233 Execute literal command @var{command}. The command is executed before
2234 startup syntax files, if any.
2236 @item --testing-mode
2238 Invoke heuristics to assist with testing PSPP. For use by @code{make
2239 check} and similar scripts.
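
The following invocations sketch how some of these options might be
combined. The file and directory names are hypothetical, and the last
line assumes that options take effect in the order given:

@example
pspp -f results.lst analysis.stat
pspp -p < analysis.stat > results.lst
pspp -I- -I /home/mary/include analysis.stat
@end example

The first writes listing output to @file{results.lst}, overwriting any
existing file of that name. The second uses PSPP as a filter. The third
empties the include path and then adds a single directory to it.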
2242 @node Language control options, Informational options, Input and output options, Invocation
2243 @section Language control options
2245 Language control options control how PSPP syntax files are parsed and
2246 interpreted. The available language control options are:
2250 @itemx --interactive
2252 When a syntax file is specified on the command line, PSPP normally
2253 terminates after processing it. Giving this option will cause PSPP to
2254 bring up a command prompt after processing the syntax file.
2256 In addition, this forces syntax files to be interpreted in interactive
2257 mode, rather than the default batch mode. @xref{Tokenizing lines}, for
2258 information on the differences between batch mode and interactive mode
2259 command interpretation.
2267 Only the syntax of any syntax file specified or of commands entered at
2268 the command line is checked. Transformations are not performed and
2269 procedures are not executed. Not yet implemented.
2274 Prevents the execution of the PSPP startup syntax file. Not yet
2275 implemented, as startup syntax files aren't, either.
2280 Disables certain unsafe operations. This includes the @code{ERASE} and
2281 @code{HOST} commands, as well as use of pipes as input and output files.
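
As an illustration, with hypothetical file names:

@example
pspp -i session.log
pspp -s downloaded.stat
@end example

The first interprets @file{session.log} in interactive mode and then
brings up a command prompt. The second runs @file{downloaded.stat} with
the @code{ERASE} and @code{HOST} commands, and pipes as input and output
files, disabled.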
2284 @node Informational options, , Language control options, Invocation
2285 @section Informational options
2287 Informational options cause information about PSPP to be written to
2288 the terminal. Here are the available options:
2294 Prints a message describing PSPP command-line syntax and the available
2295 device driver classes, then terminates.
2300 Lists the available device driver classes, then terminates.
2305 Prints a brief message listing PSPP's version, warranties you don't
2306 have, copying conditions and copyright, and e-mail address for bug
2307 reports, then terminates.
2312 Increments PSPP's verbosity level. Higher verbosity levels cause
2313 PSPP to display greater amounts of information about what it is
2314 doing. Often useful for debugging PSPP's configuration.
2316 This option can be given multiple times; each occurrence raises the
2317 verbosity level by one. The default verbosity level is 0, at which no
2318 informational messages are displayed.
2320 Higher verbosity levels cause messages to be displayed when the
2321 corresponding events take place.
2326 Driver and subsystem initializations.
2330 Completion of driver initializations. Beginning of driver closings.
2334 Completion of driver closings.
2338 Files searched for; success of searches.
2342 Individual directories included in file searches.
2345 Each verbosity level also includes messages from lower verbosity levels.
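
For example, with a hypothetical syntax file @file{analysis.stat}:

@example
pspp -v analysis.stat
pspp -v -v -v analysis.stat
@end example

The first invocation runs at verbosity level 1 and the second at level
3, which also shows the messages of levels 1 and 2.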
2349 @node Language, Expressions, Invocation, Top
2350 @chapter The PSPP language
2351 @cindex language, PSPP
2352 @cindex PSPP, language
2355 @strong{Please note:} PSPP is not even close to completion.
2356 Only a few actual statistical procedures are implemented. PSPP
2357 is a work in progress.
2360 This chapter discusses elements common to many PSPP commands.
2361 Later chapters will describe individual commands in detail.
2364 * Tokens:: Characters combine to form tokens.
2365 * Commands:: Tokens combine to form commands.
2366 * Types of Commands:: Commands come in several flavors.
2367 * Order of Commands:: Commands combine to form syntax files.
2368 * Missing Observations:: Handling missing observations.
2369 * Variables:: The unit of data storage.
2370 * Files:: Files used by PSPP.
2371 * BNF:: How command syntax is described.
2374 @node Tokens, Commands, Language, Language
2376 @cindex language, lexical analysis
2377 @cindex language, tokens
2379 @cindex lexical analysis
2382 PSPP divides most syntax file lines into series of short chunks
2383 called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}. These
2384 tokens are then grouped to form commands, each of which tells
2385 PSPP to take some action---read in data, write out data, perform
2386 a statistical procedure, etc. The process of dividing input into tokens
2387 is @dfn{tokenization}, or @dfn{lexical analysis}. Each type of token is
2392 Tokens must be separated from each other by @dfn{delimiters}.
2393 Delimiters include whitespace (spaces, tabs, carriage returns, line
2394 feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
2395 operators (plus, minus, times, divide, etc.). Note that while whitespace
2396 only separates tokens, other delimiters are tokens in themselves.
2401 Identifiers are names that specify variable names, commands, or command
2406 The first character in an identifier must be a letter, @samp{#}, or
2407 @samp{@@}. Some system identifiers begin with @samp{$}, but
2408 user-defined variables' names may not begin with @samp{$}.
2411 The remaining characters in the identifier must be letters, digits, or
2412 one of the following special characters:
2419 @cindex variable names
2420 @cindex names, variable
2421 Variable names may be any length, but only the first 8 characters are
2425 @cindex case-sensitivity
2426 Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
2427 @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
2428 representations of the same identifier.
2432 Identifiers other than variable names may be abbreviated to their first
2433 3 characters if this abbreviation is unambiguous. These identifiers are
2434 often called @dfn{keywords}. (Unique abbreviations of 3 or more
2435 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
2436 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
2439 Whether an identifier is a keyword depends on the context.
2442 @cindex keywords, reserved
2443 @cindex reserved keywords
2444 Some keywords are reserved. These keywords may not be used in any
2445 context besides those explicitly described in this manual. The reserved
2449 ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
2453 Since keywords are identifiers, all the rules for identifiers apply.
2454 Specifically, they must be delimited as are other identifiers:
2455 @code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
2461 @cindex variable names, ending with period
2462 @strong{Caution:} It is legal to end a variable name with a period, but
2463 @emph{don't do it!} The variable name will be misinterpreted when it is
2464 the final token on a line: @code{FOO.} will be divided into two separate
2465 tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
2466 @xref{Commands, , Forming commands of tokens}.
2472 Numbers may be specified as integers or reals. Integers are internally
2473 converted into reals. Scientific notation is not supported. Here are
2474 some examples of valid numbers:
2477 1234 3.14159265359 .707106781185 8945.
2480 @strong{Caution:} The last example will be interpreted as two tokens,
2481 @samp{8945} and @samp{.}, if it is the last token on a line.
2487 @cindex case-sensitivity
2488 Strings are literal sequences of characters enclosed in pairs of single
2489 quotes (@samp{'}) or double quotes (@samp{"}).
2493 Whitespace and case of letters @emph{are} significant inside strings.
2495 Whitespace characters inside a string are not delimiters.
2497 To include single-quote characters in a string, enclose the string in
2500 To include double-quote characters in a string, enclose the string in
2503 It is not possible to put both single- and double-quote characters
2509 Hexstrings are string variants that use hex digits to specify
2514 A hexstring may be used anywhere that an ordinary string is allowed.
2519 A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
2523 No whitespace is allowed between the initial @samp{X} and @samp{'}.
2526 Double quotes @samp{"} may be used in place of single quotes @samp{'} if
2527 done in both places.
2530 Each pair of hex digits is internally changed into a single character
2531 with the given value.
2534 If there is an odd number of hex digits, the missing last digit is
2535 assumed to be @samp{0}.
2539 @strong{Please note:} Use of hexstrings is nonportable because the same
2540 numeric values are associated with different glyphs by different
2541 operating systems. Therefore, their use should be confined to syntax
2542 files that will not be widely distributed.
2545 @cindex characters, reserved
2548 @strong{Please note also:} The character with value 00 is reserved for
2549 internal use by PSPP. Its use in strings causes an error and
2550 replacement with a blank space (in ASCII, hex 20, decimal 32).
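
As a small illustration of the hexstring rules, the lines below assume a
system that uses ASCII, where hex 41, 42, and 43 are the letters
@samp{A}, @samp{B}, and @samp{C}. The @code{TITLE} command is used here
only as a convenient place to put a string:

@example
TITLE 'ABC'.
TITLE X'414243'.
TITLE x"414243".
TITLE X'41424'.
@end example

Under that assumption the first three lines specify the same string. In
the last line the missing final digit is taken to be @samp{0}, so the
digits are read as 41, 42, 40, which in ASCII is @samp{AB@@}.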
2555 Punctuation separates tokens; punctuators are delimiters. These are the
2556 punctuation characters:
2564 Operators describe mathematical operations. Some operators are delimiters:
2570 Many of the above operators are also punctuators. Punctuators are
2571 distinguished from operators by context.
2573 The other operators are all reserved keywords. None of these are
2577 AND EQ GE GT LE LT NE OR
2581 @cindex terminal dot
2582 @cindex dot, terminal
2585 A period (@samp{.}) at the end of a line (except for whitespace) is one
2586 type of @dfn{terminal dot}, although not every terminal dot is a
2587 period at the end of a line. @xref{Commands, , Forming commands of
2588 tokens}. A period is a terminal dot @emph{only}
2589 when it is at the end of a line; otherwise it is part of a
2590 floating-point number. (A period outside a number in the middle of a
2594 @cindex terminal dot, changing
2595 @cindex dot, terminal, changing
2596 @strong{Please note:} The character used for the @dfn{terminal dot} can
2597 be changed with the SET command. This is strongly discouraged, and
2598 throughout the remainder of this manual it is assumed that the
2599 default setting is in effect.
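
To make the rules of this section concrete, consider the following
line, in which the variable names @code{X} and @code{Y} are
hypothetical:

@example
COMPUTE X = 3.5 + Y.
@end example

Under the rules above, this line divides into the tokens
@samp{COMPUTE}, @samp{X}, @samp{=}, @samp{3.5}, @samp{+}, @samp{Y}, and
a terminal dot. The period inside @samp{3.5} is not at the end of the
line, so it is part of the number, while the final period ends the
line and is therefore a terminal dot. Because identifiers are not
case-sensitive, writing the same line as @code{compute x = 3.5 + y.}
should make no difference.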
2604 @node Commands, Types of Commands, Tokens, Language
2605 @section Forming commands of tokens
2607 @cindex PSPP, command structure
2608 @cindex language, command structure
2609 @cindex commands, structure
2611 Most PSPP commands share a common structure, diagrammed below:
2614 @var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
2618 In the above, rather daunting, expression, pairs of square brackets
2619 (@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
2620 indicate parts of the syntax that vary from command to command.
2621 Ellipses (@samp{...}) indicate that the preceding part may be repeated
2622 an arbitrary number of times. Let's pick apart what it says above:
2625 @cindex commands, names
2627 A command begins with a command name of one or more keywords, such as
2628 @code{FREQUENCIES}, @code{DATA LIST}, or @code{N OF CASES}. @var{cmd}
2629 may be abbreviated to its first word if that is unambiguous; each word
2630 in @var{cmd} may be abbreviated to a unique prefix of three or more
2631 characters as described above.
2635 The command name may be followed by one or more @dfn{subcommands}:
2639 Each subcommand begins with a unique keyword, indicated by @var{sbc}
2640 above. This is analogous to the command name.
2643 The subcommand name is optionally followed by an equals sign (@samp{=}).
2646 Some subcommands accept a series of one or more specifications
2647 (@var{spec}), optionally separated by commas.
2650 Each subcommand must be separated from the next (if any) by a forward
2654 @cindex dot, terminal
2655 @cindex terminal dot
2657 Each command must be terminated with a @dfn{terminal dot}.
2658 The terminal dot may be given one of three ways:
2662 (most commonly) A period character at the very end of a line, as
2666 (only if NULLINE is on; @pxref{SET, , Setting user preferences}, for more
2667 details.) A completely blank line.
2670 (in batch mode only) Any line that is not indented from the left side of
2671 the page causes a terminal dot to be inserted before that line.
2672 Therefore, each command begins with a line that is flush left, followed
2673 by zero or more lines that are indented one or more characters from the
2676 In batch mode, PSPP will ignore a plus sign, minus sign, or period
2677 (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
2678 line. Any of these characters as the first character on a line will
2679 begin a new command. This allows for visual indentation of a command
2680 without that command being considered part of the previous command.
2682 PSPP is in batch mode when it is reading input from a file, rather
2683 than from an interactive user. Note that the other forms of the
2684 terminal dot may also be used in batch mode.
2686 Sometimes, one encounters syntax files that are intended to be
2687 interpreted in interactive mode rather than batch mode (for instance,
2688 this can happen if a session log file is used directly as a syntax
2689 file). When this occurs, use the @samp{-i} command line option to force
2690 interpretation in interactive mode (@pxref{Language control options}).
2694 PSPP ignores empty commands when they are generated by the above
2695 rules. Note that, as a consequence of these rules, each command must
2696 begin on a new line.
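
The sketch below relates the diagram to an actual command. The variable
names are hypothetical, and the subcommands shown are meant only as an
illustration; see the description of each command for the subcommands
it really accepts:

@example
FREQUENCIES VARIABLES=X Y /STATISTICS=MEAN.
@end example

Here @var{cmd} is @code{FREQUENCIES}, the first @var{sbc} is
@code{VARIABLES} with the two specifications @code{X} and @code{Y}, and
a forward slash separates it from the second subcommand,
@code{STATISTICS}. The period at the end of the line is the terminal
dot.

In batch mode the same command could instead be ended by the start of
the next command, since a line that is flush left causes a terminal dot
to be inserted before it:

@example
FREQUENCIES VARIABLES=X Y
    /STATISTICS=MEAN
LIST.
@end example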
2698 @node Types of Commands, Order of Commands, Commands, Language
2699 @section Types of Commands
2701 Commands in PSPP are divided roughly into six categories:
2704 @item Utility commands
2705 @cindex utility commands
2706 Set or display various global options that affect PSPP operations.
2707 May appear anywhere in a syntax file. @xref{Utilities, , Utility
2710 @item File definition commands
2711 @cindex file definition commands
2712 Give instructions for reading data from text files or from special
2713 binary ``system files''. Most of these commands discard any previous
2714 data or variables in order to replace it with the new data and
2715 variables. At least one must appear before the first command in any of
2716 the categories below. @xref{Data Input and Output}.
2718 @item Input program commands
2719 @cindex input program commands
2720 Though rarely used, these provide powerful tools for reading data files
2721 in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
2723 @item Transformations
2724 @cindex transformations
2725 Perform operations on data and write data to output files. Transformations
2726 are not carried out until a procedure is executed.
2728 @item Restricted transformations
2729 @cindex restricted transformations
2730 Same as transformations for most purposes. @xref{Order of Commands}, for a
2731 detailed description of the differences.
2735 Analyze data, writing results of analyses to the listing file. Cause
2736 transformations specified earlier in the file to be performed. In a
2737 more general sense, a @dfn{procedure} is any command that causes the
2738 active file (the data) to be read.
2741 @node Order of Commands, Missing Observations, Types of Commands, Language
2742 @section Order of Commands
2743 @cindex commands, ordering
2744 @cindex order of commands
2746 PSPP does not place many restrictions on ordering of commands.
2747 The main restriction is that variables must be defined with one of the
2748 file-definition commands before they are otherwise referred to.
2750 Of course, there are specific rules, for those who are interested.
2751 PSPP possesses five internal states, called initial, INPUT
2752 PROGRAM, FILE TYPE, transformation, and procedure states. (Please note
2753 the distinction between the INPUT PROGRAM and FILE TYPE @emph{commands}
2754 and the INPUT PROGRAM and FILE TYPE @emph{states}.)
2756 PSPP starts up in the initial state. Each successful completion
2757 of a command may cause a state transition. Each type of command has its
2758 own rules for state transitions:
2761 @item Utility commands
2764 Legal in all states.
2766 Do not cause state transitions. Exception: when the N OF CASES command
2767 is executed in the procedure state, it causes a transition to the
2768 transformation state.
2774 Legal in all states.
2776 When executed in the initial or procedure state, causes a transition to
2777 the transformation state.
2779 Clears the active file if executed in the procedure or transformation
2786 Invalid in INPUT PROGRAM and FILE TYPE states.
2788 Causes a transition to the INPUT PROGRAM state.
2790 Clears the active file.
2796 Invalid in INPUT PROGRAM and FILE TYPE states.
2798 Causes a transition to the FILE TYPE state.
2800 Clears the active file.
2803 @item Other file definition commands
2806 Invalid in INPUT PROGRAM and FILE TYPE states.
2808 Cause a transition to the transformation state.
2810 Clear the active file, except for ADD FILES, MATCH FILES, and UPDATE.
2813 @item Transformations
2816 Invalid in initial and FILE TYPE states.
2818 Cause a transition to the transformation state.
2821 @item Restricted transformations
2824 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
2826 Cause a transition to the transformation state.
2832 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
2834 Cause a transition to the procedure state.
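
The rules above can be traced through a short syntax file. This is only
a sketch: the data file @file{scores.dat} and the variable names are
hypothetical, and the classification of each command is given in the
text that follows the example.

@example
TITLE 'State example'.
DATA LIST FILE='scores.dat' /SCORE 1-3.
COMPUTE DOUBLED = SCORE * 2.
FREQUENCIES VARIABLES=DOUBLED.
N OF CASES 10.
@end example

PSPP begins in the initial state. @code{TITLE}, taken here to be a
utility command, is legal there and causes no transition. @code{DATA
LIST} is a file definition command and leaves PSPP in the
transformation state. @code{COMPUTE}, a transformation, keeps it there.
@code{FREQUENCIES}, a procedure, moves PSPP to the procedure state.
Finally, @code{N OF CASES}, executed in the procedure state, is the
exception noted above and returns PSPP to the transformation state.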
2838 @node Missing Observations, Variables, Order of Commands, Language
2839 @section Handling missing observations
2840 @cindex missing values
2841 @cindex values, missing
2843 PSPP includes special support for unknown numeric data values.
2844 Missing observations are assigned a special value, called the
2845 @dfn{system-missing value}. This ``value'' actually indicates the
2846 absence of value; it means that the actual value is unknown. Procedures
2847 automatically exclude from analyses those observations or cases that
2848 have missing values. Whether single observations or entire cases are
2849 excluded depends on the procedure.
2851 The system-missing value exists only for numeric variables. String
2852 variables always have a defined value, even if it is only a string of
2855 Variables, whether numeric or string, can have designated
2856 @dfn{user-missing values}. Every user-missing value is an actual value
2857 for that variable. However, most of the time user-missing values are
2858 treated in the same way as the system-missing value. String variables
2859 that are wider than a certain width, usually 8 characters (depending on
2860 computer architecture), cannot have user-missing values.
2862 For more information on missing values, see the following sections:
2863 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
2864 documentation on individual procedures for information on how they
2865 handle missing values.
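
As a brief sketch, assuming a numeric variable named @code{SCORE}
(hypothetical) and the syntax described in @ref{MISSING VALUES}:

@example
MISSING VALUES SCORE (999).
@end example

After this command, 999 is still an actual value of @code{SCORE}, but
most of the time it is treated in the same way as the system-missing
value.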
2867 @node Variables, Files, Missing Observations, Language
2872 Variables are the basic unit of data storage in PSPP. All the
2873 variables in a file taken together, apart from any associated data, are
2874 said to form a @dfn{dictionary}.
2875 Some details of variables are described in the sections below.
2878 * Attributes:: Attributes of variables.
2879 * System Variables:: Variables automatically defined by PSPP.
2880 * Sets of Variables:: Lists of variable names.
2881 * Input/Output Formats:: Input and output formats.
2882 * Scratch Variables:: Variables deleted by procedures.
2885 @node Attributes, System Variables, Variables, Variables
2886 @subsection Attributes of Variables
2887 @cindex variables, attributes of
2888 @cindex attributes of variables
2889 Each variable has a number of attributes, including:
2893 This is an identifier. Each variable must have a different name.
2896 @cindex variables, type
2897 @cindex type of variables
2901 @cindex variables, width
2902 @cindex width of variables
2904 (string variables only) String variables with a width of 8 characters or
2905 fewer are called @dfn{short string variables}. Short string variables
2906 can be used in many procedures where @dfn{long string variables} (those
2907 with widths greater than 8) are not allowed.
2910 @strong{Please note:} Certain systems may consider strings longer than 8
2911 characters to be short strings. Eight characters represents a minimum
2912 figure for the maximum length of a short string.
2916 Variables in the dictionary are arranged in a specific order. The
2917 DISPLAY command can be used to show this order: see @ref{DISPLAY}.
2920 Dexter or sinister. @xref{LEAVE}.
2922 @cindex missing values
2923 @cindex values, missing
2924 @item Missing values
2925 Optionally, up to three values, or a range of values, or a specific
2926 value plus a range, can be specified as @dfn{user-missing values}.
2927 There is also a @dfn{system-missing value} that is assigned to an
2928 observation when there is no other obvious value for that observation.
2929 Observations with missing values are automatically excluded from
2930 analyses. User-missing values are actual data values, while the
2931 system-missing value is not a value at all. @xref{Missing Observations}.
2933 @cindex variable labels
2934 @cindex labels, variable
2935 @item Variable label
2936 A string that describes the variable. @xref{VARIABLE LABELS}.
2938 @cindex value labels
2939 @cindex labels, value
2941 Optionally, these associate each possible value of the variable with a
2942 string. @xref{VALUE LABELS}.
2944 @cindex print format
2946 Display width, format, and (for numeric variables) number of decimal
2947 places. This attribute does not affect how data are stored, just how
2948 they are displayed. Example: a width of 8, with 2 decimal places.
2949 @xref{PRINT FORMATS}.
2951 @cindex write format
2953 Similar to print format, but used by certain commands that are
2954 designed to write to binary files. @xref{WRITE FORMATS}.
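
Some of these attributes can be set with the commands referenced above.
The lines below are a sketch only; the variable name @code{SCORE} and
the label text are hypothetical, and the exact syntax is given in the
sections on each command:

@example
VARIABLE LABELS SCORE 'Test score out of 100'.
PRINT FORMATS SCORE (F8.2).
@end example

The second line corresponds to the print format example given above: a
width of 8, with 2 decimal places.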
2957 @node System Variables, Sets of Variables, Attributes, Variables
2958 @subsection Variables Automatically Defined by PSPP
2959 @cindex system variables
2960 @cindex variables, system
2962 There are seven system variables. These are not like ordinary
2963 variables, as they are not stored in each case. They can only be used
2964 in expressions. These system variables, whose values and output formats
2965 cannot be modified, are described below.
2968 @cindex @code{$CASENUM}
2970 Case number of the current case. This changes as cases are
2973 @cindex @code{$DATE}
2975 Date the PSPP process was started, in format A9, following the
2976 pattern @code{DD MMM YY}.
2978 @cindex @code{$JDATE}
2980 Number of days between 15 Oct 1582 and the time the PSPP process
2983 @cindex @code{$LENGTH}
2985 Page length, in lines, in format F11.
2987 @cindex @code{$SYSMIS}
2989 System missing value, in format F1.
2991 @cindex @code{$TIME}
2993 Number of seconds between midnight 14 Oct 1582 and the time the active file
2994 was read, in format F20.
2996 @cindex @code{$WIDTH}
2998 Page width, in characters, in format F3.
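
Since system variables can be used only in expressions, a typical use
is on the right-hand side of a transformation. In this sketch the
variable name @code{CASEID} is hypothetical:

@example
COMPUTE CASEID = $CASENUM.
@end example

This copies the current case number into an ordinary variable, which,
unlike @code{$CASENUM} itself, is stored in each case.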
3001 @node Sets of Variables, Input/Output Formats, System Variables, Variables
3002 @subsection Lists of variable names
3003 @cindex TO convention
3004 @cindex convention, TO
3006 There are several ways to specify a set of variables:
3010 (Most commonly.) List the variable names one after another, optionally
3011 separating them by commas.
3015 (This method cannot be used on commands that define the dictionary, such
3016 as @code{DATA LIST}.) The syntax is the names of two existing variables,
3017 separated by the reserved keyword @code{TO}. The meaning is to include
3018 every variable in the dictionary between and including the variables
3019 specified. For instance, if the dictionary contains six variables with
3020 the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
3021 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
3022 variables @code{X2}, @code{GOAL}, and @code{MET}.
3025 (This method can be used only on commands that define the dictionary,
3026 such as @code{DATA LIST}.) It is used to define sequences of variables
3027 that end in consecutive integers. The syntax is two identifiers that
3028 end in numbers. This method is best illustrated with examples:
3032 The syntax @code{X1 TO X5} defines 5 variables: @code{X1}, @code{X2}, @code{X3}, @code{X4}, and @code{X5}.
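3048 The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables: @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010}, @code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.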
3066 The syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3}
3067 are both invalid, although for different reasons, which should be evident.
3070 Note that after a set of variables has been defined with @code{DATA LIST}
3071 or another command with this method, the same set can be referenced on
3072 later commands using the same syntax.
3075 The above methods can be combined, either one after another or delimited
3076 by commas. For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
3077 is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
3078 is individually legal.
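
The two uses of @code{TO} can be seen side by side in the following
sketch. The data file name, variable names, and column numbers are
hypothetical:

@example
DATA LIST FILE='survey.dat' /ID 1-4 Q1 TO Q5 6-10.
FREQUENCIES VARIABLES=Q2 TO Q4.
@end example

On @code{DATA LIST}, which defines the dictionary, @code{Q1 TO Q5}
creates the five variables @code{Q1} through @code{Q5}. On the later
command, @code{Q2 TO Q4} refers to the existing variables @code{Q2},
@code{Q3}, and @code{Q4}, which lie between and including the two
variables named.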
3081 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
3082 @subsection Input and Output Formats
3084 Data that PSPP inputs and outputs must have one of a number of formats.
3085 These formats are described, in general, by a format specification of
3086 the form @code{NAMEw.d}, where @var{name} is the
3087 format name and @var{w} is a field width. @var{d} is the optional
3088 desired number of decimal places, if appropriate. If @var{d} is not
3089 included then it is assumed to be 0. Some formats do not allow @var{d}
3092 When an input format is specified on DATA LIST or another command, then
3093 it is converted to an output format for the purposes of PRINT and other
3094 data output commands. For most purposes, input and output formats are
3095 the same; the salient differences are described below.
3097 Below are listed the input and output formats supported by PSPP. If an
3098 input format is mapped to a different output format by default, then
3099 that mapping is indicated with @result{}. Each format has the listed
3100 bounds on input width (iw) and output width (ow).
3102 The standard numeric input and output formats are given in the following
3106 @item Fw.d: 1 <= iw,ow <= 40
3107 Standard decimal format with @var{d} decimal places. If the number is
3108 too large to fit within the field width, it is expressed in scientific
3109 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
3110 the exponent. When used as an input format, scientific notation is
3111 allowed but an E or an F must be used to introduce the exponent.
3113 The default output format is the same as the input format, except if
3114 @var{d} > 1. In that case the output @var{w} is always made to be at
3117 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
3118 For input this is equivalent to F format except that no E or F is
3119 required to introduce the exponent. For output, produces scientific
3120 notation in the form @code{1.2+34}. There are always at least two
3121 digits given in the exponent.
3123 The default output @var{w} is the largest of the input @var{w}, the
3124 input @var{d} + 7, and 10. The default output @var{d} is the input
3125 @var{d}, but at least 3.
3127 @item COMMAw.d: 1 <= iw,ow <= 40
3128 Equivalent to F format, except that groups of three digits are
3129 comma-separated on output. If the number is too large to express in the
3130 field width, then first commas are eliminated, then if there is still
3131 not enough space the number is expressed in scientific notation given
3132 that w >= 6. Commas are allowed and ignored when this is used as an
3135 @item DOTw.d: 1 <= iw,ow <= 40
3136 Equivalent to COMMA format except that the roles of comma and decimal
3137 point are interchanged. However, if SET /DECIMAL=DOT is in effect, then
3138 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
3141 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
3142 Equivalent to COMMA format, except that the number is prefixed by a
3143 dollar sign (@samp{$}) if there is room. On input the value is allowed
3144 to be prefixed by a dollar sign, which is ignored.
3146 The default output @var{w} is the input @var{w}, but at least 2.
3148 @item PCTw.d: 2 <= iw,ow <= 40
3149 Equivalent to F format, except that the number is suffixed by a percent
3150 sign (@samp{%}) if there is room. On input the value is allowed to be
3151 suffixed by a percent sign, which is ignored.
3153 The default output @var{w} is the input @var{w}, but at least 2.
3155 @item Nw.d: 1 <= iw,ow <= 40
3156 Only digits are allowed within the field width. The decimal point is
3157 assumed to be @var{d} digits from the right margin.
3159 The default output format is F with the same @var{w} and @var{d}, except
3160 if @var{d} > 1. In that case the output @var{w} is always made to be at
3163 @item Zw.d @result{} F: 1 <= iw,ow <= 40
3164 Zoned decimal input. If you need to use this then you know how.
3166 @item IBw.d @result{} F: 1 <= iw,ow <= 8
3167 Integer binary format. The field is interpreted as a fixed-point
3168 positive or negative binary number in two's-complement notation. The
3169 location of the decimal point is implied. Endianness is the same as the
3172 The default output format is F8.2 if @var{d} is 0. Otherwise it is F,
3173 with output @var{w} as 9 + input @var{d} and output @var{d} as input
3176 @item PIB @result{} F: 1 <= iw,ow <= 8
3177 Positive integer binary format. The field is interpreted as a
3178 fixed-point positive binary number. The location of the decimal point
3179 is implied. Endianness is the same as the host machine.
3181 The default output format follows the rules for IB format.
3183 @item Pw.d @result{} F: 1 <= iw,ow <= 16
3184 Binary coded decimal format. Each byte from left to right, except the
3185 rightmost, represents two digits. The upper nibble of each byte is more
3186 significant. The upper nibble of the final byte is the least
3187 significant digit. The lower nibble of the final byte is the sign; a
3188 value of D represents a negative sign and all other values are
3189 considered positive. The decimal point is implied.
3191 The default output format follows the rules for IB format.
3193 @item PKw.d @result{} F: 1 <= iw,ow <= 16
3194 Positive binary coded decimal format. Same as P but the last byte is the
3197 The default output format follows the rules for IB format.
3199 @item RBw @result{} F: 2 <= iw,ow <= 8
3201 Binary C architecture-dependent ``double'' format. For a standard
3202 IEEE754 implementation @var{w} should be 8.
3204 The default output format follows the rules for IB format.
3206 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
3207 PIB format encoded as textual hex digit pairs. @var{w} must be even.
3209 The input width is mapped to a default output width as follows:
3210 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
3211 12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for
3214 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
3216 RB format encoded as textual hex digit pairs. @var{w} must be even.
3218 The default output format is F8.2.
3220 @item CCAw.d: 1 <= ow <= 40
3221 @itemx CCBw.d: 1 <= ow <= 40
3222 @itemx CCCw.d: 1 <= ow <= 40
3223 @itemx CCDw.d: 1 <= ow <= 40
3224 @itemx CCEw.d: 1 <= ow <= 40
3226 User-defined custom currency formats. May not be used as an input
3227 format. @xref{SET}, for more details.
3230 The date and time numeric input and output formats accept a number of
3231 possible formats. Before describing the formats themselves, some
3232 definitions of the elements that make up their formats will be helpful:
3236 All formats accept an optional whitespace leader.
3239 An integer between 1 and 31 representing the day of month.
3242 An integer representing a number of days.
3244 @item date-delimiter
3245 One or more characters of whitespace or the following characters:
3249 A month name in one of the following forms:
3252 An integer between 1 and 12.
3254 Roman numerals representing an integer between 1 and 12.
3256 At least the first three characters of an English month name (January,
3261 An integer year number between 1582 and 19999, or between 1 and 199.
3262 Years between 1 and 199 will have 1900 added.
3265 A single number with a year number in the first 2, 3, or 4 digits (as
3266 above) and the day number within the year in the last 3 digits.
3269 An integer between 1 and 4 representing a quarter.
3272 The letter @samp{Q} or @samp{q}.
3275 An integer between 1 and 53 representing a week within a year.
3278 The letters @samp{wk} in any case.
3280 @item time-delimiter
3281 At least one character of whitespace or @samp{:} or @samp{.}.
3284 An integer greater than 0 representing an hour.
3287 An integer between 0 and 59 representing a minute within an hour.
3290 Optionally, a time-delimiter followed by a real number representing a
3294 An integer between 0 and 23 representing an hour within a day.
3297 At least the first two characters of an English day word.
3300 Any amount or no amount of whitespace.
3303 An optional positive or negative sign.
3306 All formats accept an optional whitespace trailer.
3309 The date input formats are strung together from the above pieces. On
3310 output, the date formats are always printed in a single canonical
3311 manner, based on field width. The date input and output formats are
3315 @item DATEw: 9 <= iw,ow <= 40
3316 Date format. Input format: leader + day + date-delimiter +
3317 month + date-delimiter + year + trailer. Output format: DD-MMM-YY for
3318 @var{w} < 11, DD-MMM-YYYY otherwise.
3320 @item EDATEw: 8 <= iw,ow <= 40
3321 European date format. Input format same as DATE. Output format:
3322 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
3324 @item SDATEw: 8 <= iw,ow <= 40
3325 Standard date format. Input format: leader + year + date-delimiter +
3326 month + date-delimiter + day + trailer. Output format: YY/MM/DD for
3327 @var{w} < 10, YYYY/MM/DD otherwise.
3329 @item ADATEw: 8 <= iw,ow <= 40
3330 American date format. Input format: leader + month + date-delimiter +
3331 day + date-delimiter + year + trailer. Output format: MM/DD/YY for
3332 @var{w} < 10, MM/DD/YYYY otherwise.
3334 @item JDATEw: 5 <= iw,ow <= 40
3335 Julian date format. Input format: leader + julian + trailer. Output
3336 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
3338 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
3339 Quarter/year format. Input format: leader + quarter + q-delimiter +
3340 year + trailer. Output format: @samp{Q Q YY}, where the first
3341 @samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
3344 @item MOYRw: 6 <= iw,ow <= 40
3345 Month/year format. Input format: leader + month + date-delimiter + year
3346 + trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
3349 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
3350 Week/year format. Input format: leader + week + wk-delimiter + year +
3351 trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
3354 @item DATETIMEw.d: 17 <= iw,ow <= 40
3355 Date and time format. Input format: leader + day + date-delimiter +
3356 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
3357 + minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
3358 @var{w} > 19 then seconds @samp{:SS} is added. If @var{w} > 22 and
3359 @var{d} > 0 then fractional seconds @samp{.SS} are added.
3361 @item TIMEw.d: 5 <= iw,ow <= 40
3362 Time format. Input format: leader + sign + spaces + hour +
3363 time-delimiter + minute + opt-second. Output format: @samp{HH:MM}.
3364 Seconds and fractional seconds are available with @var{w} of at least 8
3365 and 10, respectively.
3367 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
3368 Time format with day count. Input format: leader + sign + spaces +
3369 day-count + time-delimiter + hour + time-delimiter + minute +
3370 opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional
3371 seconds are available with @var{w} of at least 8 and 10, respectively.
3373 @item WKDAYw: 2 <= iw,ow <= 40
3374 A weekday as a number between 1 and 7, where 1 is Sunday. Input format:
3375 leader + weekday + trailer. Output format: as many characters, in all
3376 capital letters, of the English name of the weekday as will fit in the
3379 @item MONTHw: 3 <= iw,ow <= 40
3380 A month as a number between 1 and 12, where 1 is January. Input format:
3381 leader + month + trailer. Output format: as many characters, in all
3382 capital letters, of the English name of the month as will fit in the
3386 There are only two formats that may be used with string variables:
3389 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
3390 The entire field is treated as a string value.
3392 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
3393 The field is composed of characters in a string encoded as textual hex
3396 The default output @var{w} is half the input @var{w}.
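
The default mappings above involve a little arithmetic. As a worked
illustration derived only from the rule given for E format, an input
format of E8.2 would default to an output @var{w} of
max(8, 2 + 7, 10) = 10 and an output @var{d} of max(2, 3) = 3, that is,
E10.3. The sketch below shows how a default output format might be
overridden explicitly. The variable name, columns, and data are
assumptions, and @code{FORMATS} is assumed to behave as it does in other
SPSS-compatible syntax.

@example
* Read AMOUNT from columns 1-8 with the default numeric format.
DATA LIST /AMOUNT 1-8.
BEGIN DATA.
 1234.50
END DATA.
* Ask for comma-grouped output with two decimal places.
FORMATS AMOUNT (COMMA10.2).
LIST.
@end example
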
3399 @node Scratch Variables, , Input/Output Formats, Variables
3400 @subsection Scratch Variables
3402 Most of the time, variables don't retain their values between cases.
3403 Instead, either they're being read from a data file or the active file,
3404 in which case they assume the value read, or, if created with COMPUTE or
3405 another transformation, they're initialized to the system-missing value
3406 or to blanks, depending on type.
3408 However, sometimes it's useful to have a variable that keeps its value
3409 between cases. You can do this with LEAVE (@pxref{LEAVE}), or you can
3410 use a @dfn{scratch variable}. Scratch variables are variables whose
3411 names begin with an octothorpe (@samp{#}).
3413 Scratch variables have the same properties as variables left with LEAVE:
3414 they retain their values between cases, and for the first case they are
3415 initialized to 0 or blanks. They have the additional property that they
3416 are deleted before the execution of any procedure. For this reason,
3417 scratch variables can't be used for analysis. To obtain the same
3418 effect, use COMPUTE (@pxref{COMPUTE}) to copy the scratch variable's
3419 value into an ordinary variable, then analyze that variable.
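
A common use of this pattern is a running total. The following is a
minimal sketch, assuming a single numeric variable @code{X} read from
inline data; the names @code{#TOTAL} and @code{RUNTOTAL} are made up for
the example. Given the rules above, @code{RUNTOTAL} should come out as
10, 30, and 60 for the three cases.

@example
DATA LIST /X 1-3.
BEGIN DATA.
 10
 20
 30
END DATA.
* #TOTAL starts at 0 and keeps its value from case to case.
COMPUTE #TOTAL = #TOTAL + X.
* Copy it to an ordinary variable so that procedures can see it.
COMPUTE RUNTOTAL = #TOTAL.
LIST.
@end example
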
3421 @node Files, BNF, Variables, Language
3422 @section Files Used by PSPP
3424 PSPP makes use of many files each time it runs. Some of these it
3425 reads, some it writes, some it creates. Here is a table listing the
3426 most important of these files:
3429 @cindex file, command
3430 @cindex file, syntax file
3431 @cindex command file
3435 These names (synonyms) refer to the file that contains instructions to
3436 PSPP that tell it what to do. The syntax file's name is specified on
3437 the PSPP command line. Syntax files can also be pulled in with the
3438 @code{INCLUDE} command.
3443 Data files contain raw data in ASCII format suitable for being read in
3444 by the @code{DATA LIST} command. Data can be embedded in the syntax
3445 file with @code{BEGIN DATA} and @code{END DATA} commands: this makes the
3446 syntax file a data file too.
3448 @cindex file, output
3451 One or more output files are created by PSPP each time it is
3452 run. The output files receive the tables and charts produced by
3453 statistical procedures. The output files may be in any number of formats,
3454 depending on how PSPP is configured.
3457 @cindex file, active
3459 The active file is the ``file'' on which all PSPP procedures
3460 are performed. The active file contains variable definitions and
3461 cases. The active file is not necessarily a disk file: it is stored
3462 in memory if there is room.
3465 @node BNF, , Files, Language
3466 @section Backus-Naur Form
3468 @cindex Backus-Naur Form
3469 @cindex command syntax, description of
3470 @cindex description of command syntax
3472 The syntax of some parts of the PSPP language is presented in this
3473 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
3474 following table describes BNF:
3480 Words in all-uppercase are PSPP keyword tokens. In BNF, these are
3481 often called @dfn{terminals}. There are some special terminals, which
3482 are actually written in lowercase for clarity:
3485 @cindex @code{number}
3489 @cindex @code{integer}
3490 @item @code{integer}
3493 @cindex @code{string}
3497 @cindex @code{var-name}
3498 @item @code{var-name}
3499 A single variable name.
3503 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
3504 Operators and punctuators.
3507 @cindex terminal dot
3508 @cindex dot, terminal
3510 The terminal dot. This is not necessarily an actual dot in the syntax
3511 file: @xref{Commands}, for more details.
3516 @cindex nonterminals
3517 Other words in all lowercase refer to BNF definitions, called
3518 @dfn{productions}. These productions are also known as
3519 @dfn{nonterminals}. Some nonterminals are very common, so they are
3520 defined here in English for clarity:
3523 @cindex @code{var-list}
3525 A list of one or more variable names or the keyword @code{ALL}.
3527 @cindex @code{expression}
3529 An expression. @xref{Expressions}, for details.
3534 @cindex ``is defined as''
3536 @samp{::=} means ``is defined as''. The left side of @samp{::=} gives
3537 the name of the nonterminal being defined. The right side of @samp{::=}
3538 gives the definition of that nonterminal. If the right side is empty,
3539 then one possible expansion of that nonterminal is nothing. A BNF
3540 definition is called a @dfn{production}.
3543 @cindex terminals and nonterminals, differences
3544 So, the key difference between a terminal and a nonterminal is that a
3545 terminal cannot be broken into smaller parts---in fact, every terminal
3546 is a single token (@pxref{Tokens}). On the other hand, nonterminals are
3547 composed of a (possibly empty) sequence of terminals and nonterminals.
3548 Thus, terminals indicate the deepest level of syntax description. (In
3549 parsing theory, terminals are the leaves of the parse tree; nonterminals
3553 @cindex start symbol
3554 @cindex symbol, start
3555 The first nonterminal defined in a set of productions is called the
3556 @dfn{start symbol}. The start symbol defines the entire syntax for
3560 @node Expressions, Data Input and Output, Language, Top
3561 @chapter Mathematical Expressions
3562 @cindex expressions, mathematical
3563 @cindex mathematical expressions
3565 Some PSPP commands use expressions, which share a common syntax
3566 among all PSPP commands. Expressions are made up of
3567 @dfn{operands}, which can be numbers, strings, or variable names,
3568 separated by @dfn{operators}. There are five types of operators:
3569 grouping, arithmetic, logical, relational, and functions.
3571 Every operator takes one or more @dfn{arguments} as input and produces
3572 or @dfn{returns} exactly one result as output. Both strings and numeric
3573 values can be used as arguments and are produced as results, but each
3574 operator accepts only specific combinations of numeric and string values
3575 as arguments. With few exceptions, operator arguments may be
3576 full-fledged expressions in themselves.
3579 * Booleans:: Boolean values.
3580 * Missing Values in Expressions:: Using missing values in expressions.
3581 * Grouping Operators:: ( )
3582 * Arithmetic Operators:: + - * / **
3583 * Logical Operators:: AND NOT OR
3584 * Relational Operators:: EQ GE GT LE LT NE
3585 * Functions:: More-sophisticated operators.
3586 * Order of Operations:: Operator precedence.
3589 @node Booleans, Missing Values in Expressions, Expressions, Expressions
3590 @section Boolean values
3592 @cindex values, Boolean
3594 There is a third type for arguments and results, the @dfn{Boolean} type,
3595 which is used to represent true/false conditions. Booleans have only
3596 three possible values: 0 (false), 1 (true), and system-missing.
3597 System-missing is neither true nor false.
3601 A numeric expression that has value 0, 1, or system-missing may be used
3602 in place of a Boolean. Thus, the expression @code{0 AND 1} is valid
3603 (although it is always false).
3606 A numeric expression with any other value will cause an error if it is
3607 used as a Boolean. So, @code{2 OR 3} is invalid.
3610 A Boolean expression may not be used in place of a numeric expression.
3611 Thus, @code{(1>2) + (3<4)} is invalid.
3614 Strings and Booleans are not compatible, and neither may be used in
3618 @node Missing Values in Expressions, Grouping Operators, Booleans, Expressions
3619 @section Missing Values in Expressions
3621 String missing values are not treated specially in expressions. Most
3622 numeric operators return system-missing when given system-missing
3623 arguments. Exceptions are listed under particular operator
3626 User-missing values for numeric variables are always transformed into
3627 the system-missing value, except inside the arguments to the
3628 @code{VALUE}, @code{SYSMIS}, and @code{MISSING} functions.
3630 The missing-value functions can be used to precisely control how missing
3631 values are treated in expressions. @xref{Missing Value Functions}, for
3634 @node Grouping Operators, Arithmetic Operators, Missing Values in Expressions, Expressions
3635 @section Grouping Operators
3638 @cindex grouping operators
3639 @cindex operators, grouping
3641 Parentheses (@samp{()}) are the grouping operators. Surround an
3642 expression with parentheses to force early evaluation.
3644 Parentheses also surround the arguments to functions, but in that
3645 situation they act as punctuators, not as operators.
3647 @node Arithmetic Operators, Logical Operators, Grouping Operators, Expressions
3648 @section Arithmetic Operators
3649 @cindex operators, arithmetic
3650 @cindex arithmetic operators
3652 The arithmetic operators take numeric arguments and produce numeric
3658 @item @var{a} + @var{b}
3659 Adds @var{a} and @var{b}, returning the sum.
3663 @item @var{a} - @var{b}
3664 Subtracts @var{b} from @var{a}, returning the difference.
3667 @cindex multiplication
3668 @item @var{a} * @var{b}
3669 Multiplies @var{a} and @var{b}, returning the product.
3673 @item @var{a} / @var{b}
3674 Divides @var{a} by @var{b}, returning the quotient. If @var{b} is
3675 zero, the result is system-missing.
3678 @cindex exponentiation
3679 @item @var{a} ** @var{b}
3680 Returns the result of raising @var{a} to the power @var{b}. If
3681 @var{a} is negative and @var{b} is not an integer, the result is
3682 system-missing. The result of @code{0**0} is system-missing as well.
3687 Reverses the sign of @var{a}.
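
The system-missing rules for division and exponentiation can be seen in
a short sketch. The variables @code{A} and @code{B}, the column
positions, and the data values are assumptions chosen to trigger each
rule.

@example
DATA LIST /A 1-3 B 5-7.
BEGIN DATA.
  8   2
  8   0
 -8 0.5
END DATA.
* RATIO is system-missing for the second case, where B is zero.
COMPUTE RATIO = A / B.
* POWER is system-missing for the third case (negative A, fractional B).
COMPUTE POWER = A ** B.
LIST.
@end example
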
3690 @node Logical Operators, Relational Operators, Arithmetic Operators, Expressions
3691 @section Logical Operators
3692 @cindex logical operators
3693 @cindex operators, logical
3698 @cindex values, system-missing
3699 @cindex system-missing
3700 The logical operators take logical arguments and produce logical
3701 results, meaning ``true or false''. PSPP logical operators are
3702 not true Boolean operators because they may also result in a
3703 system-missing value.
3708 @cindex intersection, logical
3709 @cindex logical intersection
3710 @item @var{a} AND @var{b}
3711 @itemx @var{a} & @var{b}
3712 True if both @var{a} and @var{b} are true. However, if one argument is
3713 false and the other is missing, the result is false, not missing. If
3714 both arguments are missing, the result is missing.
3718 @cindex union, logical
3719 @cindex logical union
3720 @item @var{a} OR @var{b}
3721 @itemx @var{a} | @var{b}
3722 True if at least one of @var{a} and @var{b} is true. If one argument is
3723 true and the other is missing, the result is true, not missing. If both
3724 arguments are missing, the result is missing.
3728 @cindex inversion, logical
3729 @cindex logical inversion
3732 True if @var{a} is false.
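
To make the treatment of missing values concrete, here is a hedged
sketch using @code{IF}. The variable names and cutoffs are invented for
the example, and the comments restate only the rules given above.

@example
* False if either comparison is false, even when the other is missing.
IF (AGE GE 18 AND INCOME GT 0) ADULT = 1.
* True if either comparison is true, even when the other is missing.
IF (AGE LT 18 OR INCOME EQ 0) REVIEW = 1.
@end example
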
3735 @node Relational Operators, Functions, Logical Operators, Expressions
3736 @section Relational Operators
3738 The relational operators take numeric or string arguments and produce Boolean
3741 Note that, with numeric arguments, PSPP does not make exact
3742 relational tests. Instead, two numbers are considered to be equal even
3743 if they differ by a small amount. This amount, @dfn{epsilon}, is
3744 dependent on the PSPP configuration and determined at compile
3745 time. (The default value is 0.000000001, or
3752 Use of epsilon allows for round-off errors. Use of epsilon is also
3753 idiotic, but the author is not a numeric analyst.
3755 Strings cannot be compared to numbers. When strings of different
3756 lengths are compared, the shorter string is right-padded with spaces
3757 to match the length of the longer string.
3759 The results of string comparisons, other than tests for equality or
3760 inequality, are dependent on the character set in use. String
3761 comparisons are case-sensitive.
3764 @cindex equality, testing
3765 @cindex testing for equality
3768 @item @var{a} EQ @var{b}
3769 @itemx @var{a} = @var{b}
3770 True if @var{a} is equal to @var{b}.
3772 @cindex less than or equal to
3775 @item @var{a} LE @var{b}
3776 @itemx @var{a} <= @var{b}
3777 True if @var{a} is less than or equal to @var{b}.
3782 @item @var{a} LT @var{b}
3783 @itemx @var{a} < @var{b}
3784 True if @var{a} is less than @var{b}.
3786 @cindex greater than or equal to
3789 @item @var{a} GE @var{b}
3790 @itemx @var{a} >= @var{b}
3791 True if @var{a} is greater than or equal to @var{b}.
3793 @cindex greater than
3796 @item @var{a} GT @var{b}
3797 @itemx @var{a} > @var{b}
3798 True if @var{a} is greater than @var{b}.
3800 @cindex inequality, testing
3801 @cindex testing for inequality
3805 @item @var{a} NE @var{b}
3806 @itemx @var{a} ~= @var{b}
3807 @itemx @var{a} <> @var{b}
3808 True if @var{a} is not equal to @var{b}.
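
The following sketch shows one numeric and one string comparison. The
variable names are hypothetical; @code{SURNAME} is assumed to be a
string variable and the others numeric.

@example
* Numeric equality is tested to within epsilon, not exactly.
IF (TOTAL EQ PART1 + PART2) BALANCED = 1.
* The shorter operand is padded with spaces and case is significant.
IF (SURNAME EQ 'Smith') FOUND = 1.
@end example
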
3811 @node Functions, Order of Operations, Relational Operators, Expressions
3820 @cindex names, of functions
3821 PSPP functions provide mathematical abilities above and beyond
3822 those possible using simple operators. Functions have a common
3823 syntax: each is composed of a function name followed by a left
3824 parenthesis, one or more arguments, and a right parenthesis. Function
3825 names are @strong{not} reserved; their names are specially treated
3826 only when followed by a left parenthesis: @code{EXP(10)} refers to the
3827 constant value @code{e} raised to the 10th power, but @code{EXP} by
3828 itself refers to the value of variable EXP.
3830 The sections below describe each function in detail.
3833 * Advanced Mathematics:: EXP LG10 LN SQRT
3834 * Miscellaneous Mathematics:: ABS MOD MOD10 RND TRUNC
3835 * Trigonometry:: ACOS ARCOS ARSIN ARTAN ASIN ATAN COS SIN TAN
3836 * Missing Value Functions:: MISSING NMISS NVALID SYSMIS VALUE
3837 * Pseudo-Random Numbers:: NORMAL UNIFORM
3838 * Set Membership:: ANY RANGE
3839 * Statistical Functions:: CFVAR MAX MEAN MIN SD SUM VARIANCE
3840 * String Functions:: CONCAT INDEX LENGTH LOWER LPAD LTRIM NUMBER
3841 RINDEX RPAD RTRIM STRING SUBSTR UPCASE
3842 * Time & Date:: CTIME.xxx DATE.xxx TIME.xxx XDATE.xxx
3843 * Miscellaneous Functions:: LAG YRMODA
3844 * Functions Not Implemented:: CDF.xxx CDFNORM IDF.xxx NCDF.xxx PROBIT RV.xxx
3847 @node Advanced Mathematics, Miscellaneous Mathematics, Functions, Functions
3848 @subsection Advanced Mathematical Functions
3849 @cindex mathematics, advanced
3851 Advanced mathematical functions take numeric arguments and produce
3854 @deftypefn {Function} {} EXP (@var{exponent})
3855 Returns @i{e} (approximately 2.71828) raised to power @var{exponent}.
3859 @deftypefn {Function} {} LG10 (@var{number})
3860 Takes the base-10 logarithm of @var{number}. If @var{number} is
3861 not positive, the result is system-missing.
3864 @deftypefn {Function} {} LN (@var{number})
3865 Takes the base-@i{e} logarithm of @var{number}. If @var{number} is
3866 not positive, the result is system-missing.
3869 @cindex square roots
3870 @deftypefn {Function} {} SQRT (@var{number})
3871 Takes the square root of @var{number}. If @var{number} is negative,
3872 the result is system-missing.
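
As a brief illustration, the functions in this section might be applied
to a numeric variable as follows. The variable @code{X} and the target
names are assumptions made for the example.

@example
* Logarithms are system-missing unless X is positive.
COMPUTE LOGX = LN(X).
COMPUTE LOG10X = LG10(X).
* The square root is system-missing if X is negative.
COMPUTE ROOTX = SQRT(X).
* EXP undoes LN, so BACK should match a positive X up to rounding.
COMPUTE BACK = EXP(LOGX).
@end example
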
3875 @node Miscellaneous Mathematics, Trigonometry, Advanced Mathematics, Functions
3876 @subsection Miscellaneous Mathematical Functions
3877 @cindex mathematics, miscellaneous
3879 Miscellaneous mathematical functions take numeric arguments and produce
3882 @cindex absolute value
3883 @deftypefn {Function} {} ABS (@var{number})
3884 Results in the absolute value of @var{number}.
3888 @deftypefn {Function} {} MOD (@var{numerator}, @var{denominator})
3889 Returns the remainder (modulus) of @var{numerator} divided by
3890 @var{denominator}. If @var{denominator} is 0, the result is
3891 system-missing. However, if @var{numerator} is 0 and
3892 @var{denominator} is system-missing, the result is 0.
3895 @cindex modulus, by 10
3896 @deftypefn {Function} {} MOD10 (@var{number})
3897 Returns the remainder when @var{number} is divided by 10. If
3898 @var{number} is negative, MOD10(@var{number}) is negative or zero.
3902 @deftypefn {Function} {} RND (@var{number})
3903 Takes the absolute value of @var{number} and rounds it to an integer.
3904 Then, if @var{number} was negative originally, negates the result.
3908 @deftypefn {Function} {} TRUNC (@var{number})
3909 Discards the fractional part of @var{number}; that is, rounds
3910 @var{number} towards zero.
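
The differences among these functions are easiest to see with negative
arguments. In the sketch below the target variable names are arbitrary,
and the values in the comments follow from the descriptions above rather
than from a recorded run.

@example
* RND rounds the magnitude and restores the sign, giving -3.
COMPUTE R = RND(-2.7).
* TRUNC rounds towards zero, giving -2.
COMPUTE T = TRUNC(-2.7).
* MOD10 of a negative number is negative or zero, here -7.
COMPUTE M10 = MOD10(-27).
* MOD returns the remainder of 7 divided by 3, which is 1.
COMPUTE M = MOD(7, 3).
@end example
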
3913 @node Trigonometry, Missing Value Functions, Miscellaneous Mathematics, Functions
3914 @subsection Trigonometric Functions
3915 @cindex trigonometry
3917 Trigonometric functions take numeric arguments and produce numeric
3921 @cindex inverse cosine
3922 @deftypefn {Function} {} ACOS (@var{number})
3923 @deftypefnx {Function} {} ARCOS (@var{number})
3924 Takes the arccosine, in radians, of @var{number}. Results in
3925 system-missing if @var{number} is not between -1 and 1. Portability:
3930 @cindex inverse sine
3931 @deftypefn {Function} {} ARSIN (@var{number})
3932 Takes the arcsine, in radians, of @var{number}. Results in
3933 system-missing if @var{number} is not between -1 and 1 inclusive.
3937 @cindex inverse tangent
3938 @deftypefn {Function} {} ARTAN (@var{number})
3939 Takes the arctangent, in radians, of @var{number}.
3943 @cindex inverse sine
3944 @deftypefn {Function} {} ASIN (@var{number})
3945 Takes the arcsine, in radians, of @var{number}. Results in
3946 system-missing if @var{number} is not between -1 and 1 inclusive.
3951 @cindex inverse tangent
3952 @deftypefn {Function} {} ATAN (@var{number})
3953 Takes the arctangent, in radians, of @var{number}.
3957 @strong{Please note:} Use of the AR* group of inverse trigonometric
3958 functions is recommended over the A* group because they are more
3963 @deftypefn {Function} {} COS (@var{angle})
3964 Takes the cosine of @var{angle}, which should be in radians.
3968 @deftypefn {Function} {} SIN (@var{angle})
3969 Takes the sine of @var{angle}, which should be in radians.
3973 @deftypefn {Function} {} TAN (@var{angle})
3974 Takes the tangent of @var{angle}, which should be in radians.
3975 Results in system-missing at values
3976 of @var{angle} that are too close to odd multiples of pi/2.
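
Because these functions work in radians, data recorded in degrees has to
be converted first. The sketch below assumes a numeric variable
@code{DEG} holding an angle in degrees; the literal approximation of pi
and the target names are choices made for the example.

@example
* Convert degrees to radians before calling SIN, COS, or TAN.
COMPUTE RAD = DEG * 3.14159265358979 / 180.
COMPUTE S = SIN(RAD).
COMPUTE C = COS(RAD).
* ARSIN returns radians, so convert back to degrees if needed.
COMPUTE ANGLE = ARSIN(S) * 180 / 3.14159265358979.
@end example
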
3980 @node Missing Value Functions, Pseudo-Random Numbers, Trigonometry, Functions
3981 @subsection Missing-Value Functions
3982 @cindex missing values
3983 @cindex values, missing
3984 @cindex functions, missing-value
3986 Missing-value functions take various types as arguments, returning
3987 various types of results.
3989 @deftypefn {Function} {} MISSING (@var{variable or expression})
3990 The argument may be a single variable name or an expression. If it is a
3991 variable name, results in 1 if the variable has a user-missing or
3992 system-missing value for the current case, 0 otherwise. If it is an
3993 expression, results in 1 if the expression has the system-missing value,
3997 @strong{Please note:} If the argument is a string expression other than
3998 a variable name, MISSING is guaranteed to return 0, because strings do
3999 not have a system-missing value. Also, when using a numeric expression
4000 argument, remember that user-missing values are converted to the
4001 system-missing value in most contexts. Thus, the expressions
4002 @code{MISSING(VAR1 @var{op} VAR2)} and @code{MISSING(VAR1) OR
4003 MISSING(VAR2)} are often equivalent, depending on the specific operator
4008 @deftypefn {Function} {} NMISS (@var{expr} [, @var{expr}]@dots{})
4009 Each argument must be a numeric expression. Returns the number of
4010 user- or system-missing values in the list. As a special extension,
4011 the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a
4012 range of variables; see @ref{Sets of Variables}, for more details.
4015 @deftypefn {Function} {} NVALID (@var{expr} [, @var{expr}]@dots{})
4016 Each argument must be a numeric expression. Returns the number of
4017 values in the list that are not user- or system-missing. As a special extension,
4018 the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a
4019 range of variables; see @ref{Sets of Variables}, for more details.
4022 @deftypefn {Function} {} SYSMIS (@var{variable or expression})
4023 When given the name of a numeric variable, returns 1 if the value of
4024 that variable is system-missing. Otherwise, if the value is not
4025 missing or if it is user-missing, returns 0. If given the name of a
4026 string variable, always returns 1. If given an expression other than
4027 a single variable name, results in 1 if the value is system- or
4028 user-missing, 0 otherwise.
4031 @deftypefn {Function} {} VALUE (@var{variable})
4032 Prevents the user-missing values of @var{variable} from being
4033 transformed into system-missing values: If @var{variable} is not
4034 system- or user-missing, results in the value of @var{variable}. If
4035 @var{variable} is user-missing, results in the value of @var{variable}
4036 anyway. If @var{variable} is system-missing, results in system-missing.
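The difference between these functions can be sketched as follows,
assuming a hypothetical numeric variable @code{age} for which 99 has
been declared user-missing. The values in the comments follow from the
descriptions above and are not program output:

@example
MISSING VALUES age (99).
* For a case where age is 99, is_sys is 0 and is_miss is 1.
COMPUTE is_sys = SYSMIS(age).
COMPUTE is_miss = MISSING(age).
* raw_age is 99, because VALUE keeps the user-missing value as it is.
COMPUTE raw_age = VALUE(age).
@end example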
4039 @node Pseudo-Random Numbers, Set Membership, Missing Value Functions, Functions
4040 @subsection Pseudo-Random Number Generation Functions
4041 @cindex random numbers
4042 @cindex pseudo-random numbers (see random numbers)
4044 Pseudo-random number generation functions take numeric arguments and
4045 produce numeric results.
4048 The system's C library random generator is used as a basis for
4049 generating random numbers, since random number generation is a
4050 system-dependent task. However, Knuth's Algorithm B is used to
4051 shuffle the resultant values, which is enough to make even a stream of
4052 consecutive integers random enough for most applications.
4054 (If you're worried about the quality of the random number generator,
4055 well, you're using a statistical processing package---analyze it!)
4057 @cindex random numbers, normally-distributed
4058 @deftypefn {Function} {} NORMAL (@var{number})
4059 Results in a random number. Results from @code{NORMAL} are normally
4060 distributed with a mean of 0 and a standard deviation of @var{number}.
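For instance, because the mean of @code{NORMAL} is always 0, a
different mean can be obtained by adding a constant. The following
sketch assumes hypothetical target variables @code{noise} and
@code{score}:

@example
* Deviates with mean 0 and standard deviation 1.
COMPUTE noise = NORMAL(1).
* Deviates with mean 100 and standard deviation 15.
COMPUTE score = 100 + NORMAL(15).
@end example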
4063 @cindex random numbers, uniformly-distributed
4064 @deftypefn {Function} {} UNIFORM (@var{number})
4065 Results in a random number between 0 and @var{number}. Results from
4066 @code{UNIFORM} are evenly distributed across its entire range. There
4067 may be a maximum on the largest random number ever generated---this is
4075 (2,147,483,647), but it may be orders of magnitude
4079 @node Set Membership, Statistical Functions, Pseudo-Random Numbers, Functions
4080 @subsection Set-Membership Functions
4081 @cindex set membership
4082 @cindex membership, of set
4084 Set membership functions determine whether a value is a member of a set.
4085 They take a set of numeric arguments or a set of string arguments, and
4086 produce Boolean results.
4088 String comparisons are performed according to the rules given in
4089 @ref{Relational Operators}.
4091 @deftypefn {Function} {} ANY (@var{value}, @var{set} [, @var{set}]@dots{})
4092 Results in true if @var{value} is equal to any of the @var{set}
4093 values. Otherwise, results in false. If @var{value} is
4094 system-missing, returns system-missing. System-missing values in
4095 @var{set} do not cause ANY to return system-missing.
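As an illustration, assuming a hypothetical numeric variable
@code{region} and a hypothetical string variable @code{state},
@code{ANY} can stand in for a chain of equality tests:

@example
* Much like region = 1 OR region = 3 OR region = 5.
COMPUTE coastal = ANY(region, 1, 3, 5).
* String values are compared by the rules for relational operators.
COMPUTE nearby = ANY(state, "WA", "ID", "MT").
@end example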
4098 @deftypefn {Function} {} RANGE (@var{value}, @var{low}, @var{high} [, @var{low}, @var{high}]@dots{})
4099 Results in true if @var{value} is in any of the intervals bounded by
4100 @var{low} and @var{high} inclusive. Otherwise, results in false.
4101 Each @var{low} must be less than or equal to its corresponding
4102 @var{high} value. @var{low} and @var{high} must be given in pairs.
4103 If @var{value} is system-missing, returns system-missing.
4104 System-missing values among @var{low} and @var{high} do not cause RANGE to return
4108 @node Statistical Functions, String Functions, Set Membership, Functions
4109 @subsection Statistical Functions
4110 @cindex functions, statistical
4113 Statistical functions compute descriptive statistics on a list of
4114 values. Some statistics can be computed on numeric or string values;
4115 others can only be computed on numeric values. They result in the same
4116 type as their arguments.
4118 @cindex arguments, minimum valid
4119 @cindex minimum valid number of arguments
4120 With statistical functions it is possible to specify a minimum number of
4121 non-missing arguments for the function to be evaluated. To do so,
4122 append a dot and the number to the function name. For instance, to
4123 specify a minimum of three valid arguments to the MEAN function, use the
4126 @cindex coefficient of variation
4127 @cindex variation, coefficient of
4128 @deftypefn {Function} {} CFVAR (@var{number}, @var{number}[, @dots{}])
4129 Results in the coefficient of variation of the values of @var{number}.
4130 This function requires at least two valid arguments to give a
4131 non-missing result. (The coefficient of variation is the standard
4132 deviation divided by the mean.)
4136 @deftypefn {Function} {} MAX (@var{value}, @var{value}[, @dots{}])
4137 Results in the value of the greatest @var{value}. The @var{value}s may
4138 be numeric or string. Although at least two arguments must be given,
4139 only one need be valid for MAX to give a non-missing result.
4143 @deftypefn {Function} {} MEAN (@var{number}, @var{number}[, @dots{}])
4144 Results in the mean of the values of @var{number}. Although at least
4145 two arguments must be given, only one need be valid for MEAN to give a
4150 @deftypefn {Function} {} MIN (@var{value}, @var{value}[, @dots{}])
4151 Results in the value of the least @var{value}. The @var{value}s may
4152 be numeric or string. Although at least two arguments must be given,
4153 only one need be valid for MIN to give a non-missing result.
4156 @cindex standard deviation
4157 @cindex deviation, standard
4158 @deftypefn {Function} {} SD (@var{number}, @var{number}[, @dots{}])
4159 Results in the standard deviation of the values of @var{number}.
4160 This function requires at least two valid arguments to give a
4165 @deftypefn {Function} {} SUM (@var{number}, @var{number}[, @dots{}])
4166 Results in the sum of the values of @var{number}. Although at least two
4167 arguments must be given, only one need be valid for SUM to give a
4172 @deftypefn {Function} {} VAR (@var{number}, @var{number}[, @dots{}])
4173 Results in the variance of the values of @var{number}. This function
4174 requires at least two valid arguments to give a non-missing result.
4177 @deftypefn {Function} {} VARIANCE (@var{number}, @var{number}[, @dots{}])
4178 Results in the variance of the values of @var{number}. This function
4179 requires at least two valid arguments to give a non-missing result.
4180 (Use VAR in preference to VARIANCE for reasons of portability.)
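The minimum-valid-arguments suffix described at the start of this
section can be sketched as follows, assuming four hypothetical numeric
variables @code{t1} through @code{t4}:

@example
* avg is evaluated only if at least 3 of the 4 values are valid.
COMPUTE avg = MEAN.3(t1, t2, t3, t4).
* With no suffix, a single valid value is enough for MEAN.
COMPUTE avg_any = MEAN(t1, t2, t3, t4).
* SD needs at least two valid values to give a non-missing result.
COMPUTE spread = SD(t1, t2, t3, t4).
@end example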
4183 @node String Functions, Time & Date, Statistical Functions, Functions
4184 @subsection String Functions
4185 @cindex functions, string
4186 @cindex string functions
4188 String functions take various arguments and return various results.
4190 @cindex concatenation
4191 @cindex strings, concatenation of
4192 @deftypefn {Function} {} CONCAT (@var{string}, @var{string}[, @dots{}])
4193 Returns a string consisting of each @var{string} in sequence.
4194 @code{CONCAT("abc", "def", "ghi")} has a value of @code{"abcdefghi"}.
4195 The resultant string is truncated to a maximum of 255 characters.
4198 @cindex searching strings
4199 @deftypefn {Function} {} INDEX (@var{haystack}, @var{needle})
4200 Returns a positive integer indicating the position of the first
4201 occurrence of @var{needle} in @var{haystack}. Returns 0 if @var{haystack}
4202 does not contain @var{needle}. Returns system-missing if @var{needle}
4206 @deftypefn {Function} {} INDEX (@var{haystack}, @var{needle}, @var{divisor})
4207 Divides @var{needle} into parts, each with length @var{divisor}.
4208 Searches @var{haystack} for the first occurrence of each part, and
4209 returns the smallest value. Returns 0 if @var{haystack} does not
4210 contain any part in @var{needle}. It is an error if @var{divisor}
4211 cannot be evenly divided into the length of @var{needle}. Returns
4212 system-missing if @var{needle} is an empty string.
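The two forms of @code{INDEX} can be compared with a small sketch. The
target variable names are hypothetical, and the values in the comments
follow from the descriptions above:

@example
* The whole needle "lo" first occurs at position 4.
COMPUTE pos1 = INDEX("hello world", "lo").
* Parts are "l" (position 3) and "o" (position 5), so the result is 3.
COMPUTE pos2 = INDEX("hello world", "lo", 1).
@end example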
4215 @cindex strings, finding length of
4216 @deftypefn {Function} {} LENGTH (@var{string})
4217 Returns the number of characters in @var{string}.
4220 @cindex strings, case of
4221 @deftypefn {Function} {} LOWER (@var{string})
4222 Returns a string identical to @var{string} except that all uppercase
4223 letters are changed to lowercase letters. The definitions of
4224 ``uppercase'' and ``lowercase'' are system-dependent.
4227 @cindex strings, padding
4228 @deftypefn {Function} {} LPAD (@var{string}, @var{length})
4229 If @var{string} is at least @var{length} characters in length, returns
4230 @var{string} unchanged. Otherwise, returns @var{string} padded with
4231 spaces on the left side to length @var{length}. Returns an empty string
4232 if @var{length} is system-missing, negative, or greater than 255.
4235 @deftypefn {Function} {} LPAD (@var{string}, @var{length}, @var{padding})
4236 If @var{string} is at least @var{length} characters in length, returns
4237 @var{string} unchanged. Otherwise, returns @var{string} padded with
4238 @var{padding} on the left side to length @var{length}. Returns an empty
4239 string if @var{length} is system-missing, negative, or greater than 255, or
4240 if @var{padding} does not contain exactly one character.
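A few expressions illustrate both forms of @code{LPAD}. The values shown
are worked out from the descriptions above, as a sketch rather than as
recorded output:

@example
LPAD("42", 5)        @result{} "   42"
LPAD("42", 5, "0")   @result{} "00042"
LPAD("12345", 3)     @result{} "12345"
@end example

The last expression returns its argument unchanged because the string is
already at least 3 characters long.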
4243 @cindex strings, trimming
4244 @cindex whitespace, trimming
4245 @deftypefn {Function} {} LTRIM (@var{string})
4246 Returns @var{string}, after removing leading spaces. Other whitespace,
4247 such as tabs, carriage returns, line feeds, and vertical tabs, is not
4251 @deftypefn {Function} {} LTRIM (@var{string}, @var{padding})
4252 Returns @var{string}, after removing leading @var{padding} characters.
4253 If @var{padding} does not contain exactly one character, returns an
4257 @cindex numbers, converting from strings
4258 @cindex strings, converting to numbers
4259 @deftypefn {Function} {} NUMBER (@var{string})
4260 Returns the number produced when @var{string} is interpreted according
4261 to format F@var{x}.0, where @var{x} is the number of characters in
4262 @var{string}. If @var{string} does not form a proper number,
4263 system-missing is returned without an error message. Portability: none.
4266 @deftypefn {Function} {} NUMBER (@var{string}, @var{format})
4267 Returns the number produced when @var{string} is interpreted according
4268 to format specifier @var{format}. Only the number of characters in
4269 @var{string} specified by @var{format} are examined. For example,
4270 @code{NUMBER("123", F3.0)} and @code{NUMBER("1234", F3.0)} both have
4271 value 123. If @var{string} does not form a proper number,
4272 system-missing is returned without an error message.
4275 @cindex strings, searching backwards
4276 @deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle})
4277 Returns a positive integer indicating the position of the last
4278 occurrence of @var{needle} in @var{haystack}. Returns 0 if
4279 @var{haystack} does not contain @var{needle}. Returns system-missing if
4280 @var{needle} is an empty string.
4283 @deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle}, @var{divisor})
4284 Divides @var{needle} into parts, each with length @var{divisor}.
4285 Searches @var{haystack} for the last occurrence of each part, and
4286 returns the largest value. Returns 0 if @var{haystack} does not contain
4287 any part in @var{needle}. It is an error if @var{divisor} cannot be
4288 evenly divided into the length of @var{needle}. Returns system-missing
4289 if @var{needle} is an empty string.
4292 @cindex padding strings
4293 @cindex strings, padding
4294 @deftypefn {Function} {} RPAD (@var{string}, @var{length})
4295 If @var{string} is at least @var{length} characters in length, returns
4296 @var{string} unchanged. Otherwise, returns @var{string} padded with
4297 spaces on the right to length @var{length}. Returns an empty string if
4298 @var{length} is system-missing, negative, or greater than 255.
4301 @deftypefn {Function} {} RPAD (@var{string}, @var{length}, @var{padding})
4302 If @var{string} is at least @var{length} characters in length, returns
4303 @var{string} unchanged. Otherwise, returns @var{string} padded with
4304 @var{padding} on the right to length @var{length}. Returns an empty
4305 string if @var{length} is system-missing, negative, or greater than 255,
4306 or if @var{padding} does not contain exactly one character.
4309 @cindex strings, trimming
4310 @cindex whitespace, trimming
4311 @deftypefn {Function} {} RTRIM (@var{string})
4312 Returns @var{string}, after removing trailing spaces. Other types of
4313 whitespace are not removed.
4316 @deftypefn {Function} {} RTRIM (@var{string}, @var{padding})
4317 Returns @var{string}, after removing trailing @var{padding} characters.
4318 If @var{padding} does not contain exactly one character, returns an
4322 @cindex strings, converting from numbers
4323 @cindex numbers, converting to strings
4324 @deftypefn {Function} {} STRING (@var{number}, @var{format})
4325 Returns a string corresponding to @var{number} in the format given by
4326 format specifier @var{format}. For example, @code{STRING(123.56, F5.1)}
4327 has the value @code{"123.6"}.
4331 @cindex strings, taking substrings of
4332 @deftypefn {Function} {} SUBSTR (@var{string}, @var{start})
4333 Returns a string consisting of the value of @var{string} from position
4334 @var{start} onward. Returns an empty string if @var{start} is system-missing
4335 or has a value less than 1 or greater than the number of characters in
4339 @deftypefn {Function} {} SUBSTR (@var{string}, @var{start}, @var{count})
4340 Returns a string consisting of the first @var{count} characters from
4341 @var{string} beginning at position @var{start}. Returns an empty string
4342 if @var{start} or @var{count} is system-missing, if @var{start} is less
4343 than 1 or greater than the number of characters in @var{string}, or if
4344 @var{count} is less than 1. Returns a string shorter than @var{count}
4345 characters if @var{start} + @var{count} - 1 is greater than the number
4346 of characters in @var{string}. Examples: @code{SUBSTR("abcdefg", 3, 2)}
4347 has value @code{"cd"}; @code{SUBSTR("Ben Pfaff", 5, 10)} has the value
4351 @cindex case conversion
4352 @cindex strings, case of
4353 @deftypefn {Function} {} UPCASE (@var{string})
4354 Returns @var{string}, changing lowercase letters to uppercase letters.
4357 @node Time & Date, Miscellaneous Functions, String Functions, Functions
4358 @subsection Time & Date Functions
4359 @cindex functions, time & date
4363 @cindex dates, legal range of
4364 The legal range of dates for use in PSPP is 15 Oct 1582
4365 through 31 Dec 19999.
4367 @cindex arguments, invalid
4368 @cindex invalid arguments
4370 @strong{Please note:} Most time & date extraction functions will accept
4375 Negative numbers in PSPP time format.
4377 Numbers less than 86,400 in PSPP date format.
4380 However, sensible results are not guaranteed for these invalid values.
4381 The given equivalents for these functions are definitely not guaranteed
4386 @strong{Please note also:} The time & date construction
4387 functions @strong{do} produce reasonable and useful results for
4388 out-of-range values; these are not considered invalid.
4392 * Time & Date Concepts:: How times & dates are defined and represented
4393 * Time Construction:: TIME.@{DAYS HMS@}
4394 * Time Extraction:: CTIME.@{DAYS HOURS MINUTES SECONDS@}
4395 * Date Construction:: DATE.@{DMY MDY MOYR QYR WKYR YRDAY@}
4396 * Date Extraction:: XDATE.@{DATE HOUR JDAY MDAY MINUTE MONTH
4397 QUARTER SECOND TDAY TIME WEEK
4401 @node Time & Date Concepts, Time Construction, Time & Date, Time & Date
4402 @subsubsection How times & dates are defined and represented
4404 @cindex time, concepts
4405 @cindex time, intervals
4406 Times and dates are handled by PSPP as single numbers. A
4407 @dfn{time} is an interval. PSPP measures times in seconds.
4408 Thus, the following intervals correspond with the numeric values given:
4413 1 day, 3 hours, 10 seconds 97,210
4415 10010 d, 14 min, 24 s 864,864,864
4418 @cindex dates, concepts
4419 @cindex time, instants of
4420 A @dfn{date}, on the other hand, is a particular instant in the past or
4421 the future. PSPP represents a date as a number of seconds after the
4422 midnight that began 14 Oct 1582 in the proleptic Gregorian calendar, one day before the earliest legal date. (Historically, 15
4423 Oct 1582 immediately followed 4 Oct 1582.) Thus, the midnights before
4424 the dates given below correspond with the numeric PSPP dates given:
4428 4 Jul 1776 6,113,318,400
4429 1 Jan 1900 10,010,390,400
4430 1 Oct 1978 12,495,427,200
4431 24 Aug 1995 13,028,601,600
4434 @cindex time, mathematical properties of
4435 @cindex mathematics, applied to times & dates
4436 @cindex dates, mathematical properties of
4442 A time may be added to, or subtracted from, a date, resulting in a date.
4445 The difference of two dates may be taken, resulting in a time.
4448 Two times may be added to, or subtracted from, each other, resulting in
4452 (Adding two dates does not produce a useful result.)
4454 Since times and dates are merely numbers, the ordinary addition and
4455 subtraction operators are employed for these purposes.
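A brief sketch of this arithmetic, assuming hypothetical numeric
variables @code{start} and @code{finish} that hold values in PSPP date
format:

@example
* Adding a time of 30 days to a date gives another date.
COMPUTE due = start + TIME.DAYS(30).
* Subtracting two dates gives a time, expressed here in days.
COMPUTE elapsed = CTIME.DAYS(finish - start).
@end example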
4458 @strong{Please note:} Many dates and times have extremely large
4459 values---just look at the values above. Thus, it is not a good idea to
4460 take powers of these values; also, the accuracy of some procedures may
4461 be affected. If necessary, convert times or dates in seconds to some
4462 other unit, like days or years, before performing analysis.
4465 @node Time Construction, Time Extraction, Time & Date Concepts, Time & Date
4466 @subsubsection Functions that Produce Times
4467 @cindex times, constructing
4468 @cindex constructing times
4470 These functions take numeric arguments and produce numeric results in
4474 @cindex time, in days
4475 @deftypefn {Function} {} TIME.DAYS (@var{ndays})
4476 Results in a time value corresponding to @var{ndays} days.
4477 (@code{TIME.DAYS(@var{x})} is equivalent to @code{@var{x} * 60 * 60 *
4481 @cindex hours-minutes-seconds
4482 @cindex time, in hours-minutes-seconds
4483 @deftypefn {Function} {} TIME.HMS (@var{nhours}, @var{nmins}, @var{nsecs})
4484 Results in a time value corresponding to @var{nhours} hours, @var{nmins}
4485 minutes, and @var{nsecs} seconds. (@code{TIME.HMS(@var{h}, @var{m},
4486 @var{s})} is equivalent to @code{@var{h}*60*60 + @var{m}*60 +
4490 @node Time Extraction, Date Construction, Time Construction, Time & Date
4491 @subsubsection Functions that Examine Times
4492 @cindex extraction, of time
4493 @cindex time examination
4494 @cindex examination, of times
4495 @cindex time, lengths of
4497 These functions take numeric arguments in PSPP time format and
4498 give numeric results.
4501 @cindex time, in days
4502 @deftypefn {Function} {} CTIME.DAYS (@var{time})
4503 Results in the number of days and fractional days in @var{time}.
4504 (@code{CTIME.DAYS(@var{x})} is equivalent to @code{@var{x}/60/60/24}.)
4508 @cindex time, in hours
4509 @deftypefn {Function} {} CTIME.HOURS (@var{time})
4510 Results in the number of hours and fractional hours in @var{time}.
4511 (@code{CTIME.HOURS(@var{x})} is equivalent to @code{@var{x}/60/60}.)
4515 @cindex time, in minutes
4516 @deftypefn {Function} {} CTIME.MINUTES (@var{time})
4517 Results in the number of minutes and fractional minutes in @var{time}.
4518 (@code{CTIME.MINUTES(@var{x})} is equivalent to @code{@var{x}/60}.)
4522 @cindex time, in seconds
4523 @deftypefn {Function} {} CTIME.SECONDS (@var{time})
4524 Results in the number of seconds and fractional seconds in @var{time}.
4525 (@code{CTIME.SECONDS} does nothing; @code{CTIME.SECONDS(@var{x})} is
4526 equivalent to @code{@var{x}}.)
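The time construction and extraction functions can be combined, as in
this sketch with hypothetical target variables. The values in the
comments follow from the equivalences given for each function:

@example
* 1 hour and 30 minutes is 1*60*60 + 30*60 + 0 = 5400 seconds.
COMPUTE t = TIME.HMS(1, 30, 0).
* The same interval is 1.5 hours, or 90 minutes.
COMPUTE hrs = CTIME.HOURS(t).
COMPUTE mins = CTIME.MINUTES(t).
@end example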
4529 @node Date Construction, Date Extraction, Time Extraction, Time & Date
4530 @subsubsection Functions that Produce Dates
4531 @cindex dates, constructing
4532 @cindex constructing dates
4534 @cindex arguments, of date construction functions
4535 These functions take numeric arguments and give numeric results in the
4536 PSPP date format. Arguments taken by these functions are:
4540 Refers to a day of the month between 1 and 31.
4543 Refers to a month of the year between 1 and 12.
4546 Refers to a quarter of the year between 1 and 4. The quarters of the
4547 year begin on the first days of months 1, 4, 7, and 10.
4550 Refers to a week of the year between 1 and 53.
4553 Refers to a day of the year between 1 and 366.
4556 Refers to a year between 1582 and 19999.
4559 @cindex arguments, invalid
4560 If these functions' arguments are out-of-range, they are correctly
4561 normalized before conversion to date format. Non-integers are rounded
4564 @cindex day-month-year
4565 @cindex dates, day-month-year
4566 @deftypefn {Function} {} DATE.DMY (@var{day}, @var{month}, @var{year})
4567 @deftypefnx {Function} {} DATE.MDY (@var{month}, @var{day}, @var{year})
4568 Results in a date value corresponding to the midnight before day
4569 @var{day} of month @var{month} of year @var{year}.
4573 @cindex dates, month-year
4574 @deftypefn {Function} {} DATE.MOYR (@var{month}, @var{year})
4575 Results in a date value corresponding to the midnight before the first
4576 day of month @var{month} of year @var{year}.
4579 @cindex quarter-year
4580 @cindex dates, quarter-year
4581 @deftypefn {Function} {} DATE.QYR (@var{quarter}, @var{year})
4582 Results in a date value corresponding to the midnight before the first
4583 day of quarter @var{quarter} of year @var{year}.
4587 @cindex dates, week-year
4588 @deftypefn {Function} {} DATE.WKYR (@var{week}, @var{year})
4589 Results in a date value corresponding to the midnight before the first
4590 day of week @var{week} of year @var{year}.
4594 @cindex dates, year-day
4595 @deftypefn {Function} {} DATE.YRDAY (@var{year}, @var{yday})
4596 Results in a date value corresponding to the midnight before day
4597 @var{yday} of year @var{year}.
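As a sketch, each of the following expressions should produce the same
date value, 10,010,390,400, which is the value listed earlier for the
midnight before 1 Jan 1900. The target variable names are hypothetical:

@example
COMPUTE d1 = DATE.DMY(1, 1, 1900).
COMPUTE d2 = DATE.MDY(1, 1, 1900).
COMPUTE d3 = DATE.MOYR(1, 1900).
COMPUTE d4 = DATE.QYR(1, 1900).
COMPUTE d5 = DATE.YRDAY(1900, 1).
@end example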
4600 @node Date Extraction, , Date Construction, Time & Date
4601 @subsubsection Functions that Examine Dates
4602 @cindex extraction, of dates
4603 @cindex date examination
4605 @cindex arguments, of date extraction functions
4606 These functions take numeric arguments in PSPP date or time
4607 format and give numeric results. These names are used for arguments:
4611 A numeric value in PSPP date format.
4614 A numeric value in PSPP time format.
4617 A numeric value in PSPP time or date format.
4621 @cindex dates, in days
4622 @cindex time, in days
4623 @deftypefn {Function} {} XDATE.DATE (@var{time-or-date})
4624 For a time, results in the time corresponding to the number of whole
4625 days @var{time-or-date} includes. For a date, results in the date
4626 corresponding to the latest midnight at or before @var{time-or-date};
4627 that is, gives the date that @var{time-or-date} is in.
4628 (XDATE.DATE(@var{x}) is equivalent to TRUNC(@var{x}/86400)*86400.)
4629 Applying this function to a time is a non-portable feature.
4633 @cindex dates, in hours
4634 @cindex time, in hours
4635 @deftypefn {Function} {} XDATE.HOUR (@var{time-or-date})
4636 For a time, results in the number of whole hours beyond the number of
4637 whole days represented by @var{time-or-date}. For a date, results in
4638 the hour (as an integer between 0 and 23) corresponding to
4639 @var{time-or-date}. (XDATE.HOUR(@var{x}) is equivalent to
4640 MOD(TRUNC(@var{x}/3600),24).) Applying this function to a time is a
4641 non-portable feature.
4644 @cindex day of the year
4645 @cindex dates, day of the year
4646 @deftypefn {Function} {} XDATE.JDAY (@var{date})
4647 Results in the day of the year (as an integer between 1 and 366)
4648 corresponding to @var{date}.
4651 @cindex day of the month
4652 @cindex dates, day of the month
4653 @deftypefn {Function} {} XDATE.MDAY (@var{date})
4654 Results in the day of the month (as an integer between 1 and 31)
4655 corresponding to @var{date}.
4659 @cindex dates, in minutes
4660 @cindex time, in minutes
4661 @deftypefn {Function} {} XDATE.MINUTE (@var{time-or-date})
4662 Results in the number of minutes (as an integer between 0 and 59) after
4663 the last hour in @var{time-or-date}. (XDATE.MINUTE(@var{x}) is
4664 equivalent to MOD(TRUNC(@var{x}/60),60).) Applying this function to a
4665 time is a non-portable feature.
4669 @cindex dates, in months
4670 @deftypefn {Function} {} XDATE.MONTH (@var{date})
4671 Results in the month of the year (as an integer between 1 and 12)
4672 corresponding to @var{date}.
4676 @cindex dates, in quarters
4677 @deftypefn {Function} {} XDATE.QUARTER (@var{date})
4678 Results in the quarter of the year (as an integer between 1 and 4)
4679 corresponding to @var{date}.
4683 @cindex dates, in seconds
4684 @cindex time, in seconds
4685 @deftypefn {Function} {} XDATE.SECOND (@var{time-or-date})
4686 Results in the number of whole seconds after the last whole minute (as
4687 an integer between 0 and 59) in @var{time-or-date}.
4688 (XDATE.SECOND(@var{x}) is equivalent to MOD(@var{x}, 60).) Applying
4689 this function to a time is a non-portable feature.
4693 @cindex times, in days
4694 @deftypefn {Function} {} XDATE.TDAY (@var{time})
4695 Results in the number of whole days (as an integer) in @var{time}.
4696 (XDATE.TDAY(@var{x}) is equivalent to TRUNC(@var{x}/86400).)
4700 @cindex dates, time of day
4701 @deftypefn {Function} {} XDATE.TIME (@var{date})
4702 Results in the time of day at the instant corresponding to @var{date},
4703 in PSPP time format. This is the number of seconds since
4704 midnight on the day corresponding to @var{date}. (XDATE.TIME(@var{x}) is
4705 equivalent to MOD(@var{x}, 86400).)
4709 @cindex dates, in weeks
4710 @deftypefn {Function} {} XDATE.WEEK (@var{date})
4711 Results in the week of the year (as an integer between 1 and 53)
4712 corresponding to @var{date}.
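To illustrate several of these functions together, the sketch below
builds a date value for 13:45:10 on 24 Aug 1995 and takes it apart
again. The variable names are hypothetical, and the values in the
comments are derived from the definitions in this section rather than
from program output:

@example
* 13,028,601,600 + 49,510 = 13,028,651,110.
COMPUTE stamp = DATE.DMY(24, 8, 1995) + TIME.HMS(13, 45, 10).
* day is 13,028,601,600, the midnight that starts 24 Aug 1995.
COMPUTE day = XDATE.DATE(stamp).
* tod is 49,510, the number of seconds since that midnight.
COMPUTE tod = XDATE.TIME(stamp).
* hh is 13, mm is 45, and ss is 10.
COMPUTE hh = XDATE.HOUR(stamp).
COMPUTE mm = XDATE.MINUTE(stamp).
COMPUTE ss = XDATE.SECOND(stamp).
@end example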
4715 @cindex day of the week
4717 @cindex dates, day of the week
4718 @cindex dates, in weekdays
4719 @deftypefn {Function} {} XDATE.WKDAY (@var{date})
4720 Results in the day of week (as an integer between 1 and 7) corresponding
4721 to @var{date}. The days of the week are:
4742 @cindex dates, in years
4743 @deftypefn {Function} {} XDATE.YEAR (@var{date})
4744 Returns the year (as an integer between 1582 and 19999) corresponding to
4748 @node Miscellaneous Functions, Functions Not Implemented, Time & Date, Functions
4749 @subsection Miscellaneous Functions
4750 @cindex functions, miscellaneous
4752 Miscellaneous functions take various arguments and produce various
4755 @cindex cross-case function
4756 @cindex function, cross-case
4757 @deftypefn {Function} {} LAG (@var{variable})
4758 @var{variable} must be a numeric or string variable name. @code{LAG}
4759 results in the value of that variable for the case before the current
4760 one. In case-selection procedures, @code{LAG} results in the value of
4761 the variable for the last case selected. Results in system-missing (for
4762 numeric variables) or blanks (for string variables) for the first case
4763 or before any cases are selected.
4766 @deftypefn {Function} {} LAG (@var{variable}, @var{ncases})
4767 @var{variable} must be a numeric or string variable name. @var{ncases}
4768 must be a small positive constant integer, although there is no explicit
4769 limit. (Use of a large value for @var{ncases} will increase memory
4770 consumption, since PSPP must keep @var{ncases} cases in memory.)
4771 @code{LAG (@var{variable}, @var{ncases})} results in the value of
4772 @var{variable} that is @var{ncases} before the case currently being
4773 processed. See @code{LAG (@var{variable})} above for more details.
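For instance, assuming a hypothetical numeric variable @code{price}
recorded once per case, differences between neighboring cases can be
sketched as:

@example
* Change since the previous case, system-missing for the first case.
COMPUTE change = price - LAG(price).
* Change relative to the case two positions back.
COMPUTE change2 = price - LAG(price, 2).
@end example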
4776 @cindex date, Julian
4778 @deftypefn {Function} {} YRMODA (@var{year}, @var{month}, @var{day})
4779 @var{year} is a year between 0 and 199 or 1582 and 19999. @var{month} is
4780 a month between 1 and 12. @var{day} is a day between 1 and 31. If
4781 @var{month} or @var{day} is out-of-range, it changes the next higher
4782 unit. For instance, a @var{day} of 0 refers to the last day of the
4783 previous month, and a @var{month} of 13 refers to the first month of the
4784 next year. @var{year} must be in range. If @var{year} is between 0 and
4785 199, 1900 is added. @var{year}, @var{month}, and @var{day} must all be
4788 @code{YRMODA} results in the number of days between 15 Oct 1582 and
4789 the date specified, plus one. The date passed to @code{YRMODA} must be
4790 on or after 15 Oct 1582. 15 Oct 1582 has a value of 1.
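The following sketch, with hypothetical target variables, shows a few
results that follow from this definition:

@example
* The earliest legal date has a value of 1.
COMPUTE day1 = YRMODA(1582, 10, 15).
* Day 0 of March is the last day of February, so these two agree.
COMPUTE leap1 = YRMODA(1996, 3, 0).
COMPUTE leap2 = YRMODA(1996, 2, 29).
* A year between 0 and 199 has 1900 added, so these two agree as well.
COMPUTE y1 = YRMODA(95, 8, 24).
COMPUTE y2 = YRMODA(1995, 8, 24).
@end example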
4793 @node Functions Not Implemented, , Miscellaneous Functions, Functions
4794 @subsection Functions Not Implemented
4795 @cindex functions, not implemented
4796 @cindex not implemented
4797 @cindex features, not implemented
4799 These functions are not yet implemented and thus not yet documented,
4800 since it's a hassle.
4824 @node Order of Operations, , Functions, Expressions
4825 @section Operator Precedence
4826 @cindex operator precedence
4827 @cindex precedence, operator
4828 @cindex order of operations
4829 @cindex operations, order of
4831 The following table describes operator precedence. Smaller-numbered
4832 levels in the table have higher precedence. Within a level, operations
4833 are performed from left to right, except for level 2 (exponentiation),
4834 where operations are performed from right to left. If an operator
4835 appears in the table in two places (@code{-}), the first occurrence is
4836 unary, the second is binary.
4850 @code{EQ GE GT LE LT NE}
4855 @node Data Input and Output, System and Portable Files, Expressions, Top
4856 @chapter Data Input and Output
4861 @cindex observations
4863 Data are the focus of the PSPP language.
4864 Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
4865 Each case represents an individual or `experimental unit'.
4866 For example, in the results of a survey, the names of the respondents,
4867 their sex, age, @i{etc}., and their responses are all data, and the data
4868 pertaining to a single respondent is a case.
4869 This chapter examines
4870 the PSPP commands for defining variables and reading and writing data.
4873 @strong{Please note:} Data are not actually read until a procedure is
4874 executed. These commands tell PSPP how to read data, but they
4875 do not @emph{cause} PSPP to read data.
4879 * BEGIN DATA:: Embed data within a syntax file.
4880 * CLEAR TRANSFORMATIONS:: Clear pending transformations.
4881 * DATA LIST:: Fundamental data reading command.
4882 * END CASE:: Output the current case.
4883 * END FILE:: Terminate the current input program.
4884 * FILE HANDLE:: Support for fixed-length records.
4885 * INPUT PROGRAM:: Support for complex input programs.
4886 * LIST:: List cases in the active file.
4887 * MATRIX DATA:: Read matrices in text format.
4888 * NEW FILE:: Clear the active file and dictionary.
4889 * PRINT:: Display values in print formats.
4890 * PRINT EJECT:: Eject the current page then print.
4891 * PRINT SPACE:: Print blank lines.
4892 * REREAD:: Take another look at the previous input line.
4893 * REPEATING DATA:: Multiple cases on a single line.
4894 * WRITE:: Display values in write formats.
4897 @node BEGIN DATA, CLEAR TRANSFORMATIONS, Data Input and Output, Data Input and Output
4901 @cindex Embedding data in syntax files
4902 @cindex Data, embedding in syntax files
4910 BEGIN DATA and END DATA can be used to embed raw ASCII data in a PSPP
4911 syntax file. DATA LIST or another input procedure must be used before
4912 BEGIN DATA (@pxref{DATA LIST}). BEGIN DATA and END DATA must be used
4913 together. The END DATA command must appear by itself on a single line,
4914 with no leading whitespace and exactly one space between the words
4915 @code{END} and @code{DATA}, followed immediately by the terminal dot,
4922 @node CLEAR TRANSFORMATIONS, DATA LIST, BEGIN DATA, Data Input and Output
4923 @section CLEAR TRANSFORMATIONS
4924 @vindex CLEAR TRANSFORMATIONS
4927 CLEAR TRANSFORMATIONS.
4930 The CLEAR TRANSFORMATIONS command clears out all pending
4931 transformations. It does not cancel the current input program. It is
4932 valid only when PSPP is interactive, not in syntax files.
4934 @node DATA LIST, END CASE, CLEAR TRANSFORMATIONS, Data Input and Output
4937 @cindex reading data from a file
4938 @cindex data, reading from a file
4939 @cindex data, embedding in syntax files
4940 @cindex embedding data in syntax files
4942 Used to read text or binary data, DATA LIST is the most
4943 fundamental data-reading command. Even the more sophisticated input
4944 methods use DATA LIST commands as a building block.
4945 Understanding DATA LIST is important to understanding how to use
4946 PSPP to read your data files.
4948 There are two major variants of DATA LIST, which are fixed
4949 format and free format. In addition, free format has a minor variant,
4950 list format, which is discussed in terms of its differences from vanilla
4953 Each form of DATA LIST is described in detail below.
4956 * DATA LIST FIXED:: Fixed columnar locations for data.
4957 * DATA LIST FREE:: Any spacing you like.
4958 * DATA LIST LIST:: Each case must be on a single line.
4961 @node DATA LIST FIXED, DATA LIST FREE, DATA LIST, DATA LIST
4962 @subsection DATA LIST FIXED
4963 @vindex DATA LIST FIXED
4964 @cindex reading fixed-format data
4965 @cindex fixed-format data, reading
4966 @cindex data, fixed-format, reading
4967 @cindex embedding fixed-format data
4973 RECORDS=record_count
4975 /[line_no] var_spec@dots{}
4977 where each var_spec takes one of the forms
4978 var_list start-end [type_spec]
4979 var_list (fortran_spec)
4982 DATA LIST FIXED is used to read data files that have values at fixed
4983 positions on each line of single-line or multiline records. The
4984 keyword FIXED is optional.
4986 The FILE subcommand must be used if input is to be taken from an
4987 external file. It may be used to specify a filename as a string or a
4988 file handle (@pxref{FILE HANDLE}). If the FILE subcommand is not used,
4989 then input is assumed to be specified within the command file using
4990 BEGIN DATA@dots{}END DATA (@pxref{BEGIN DATA}).
4992 The optional RECORDS subcommand, which takes a single integer as an
4993 argument, is used to specify the number of lines per record. If RECORDS
4994 is not specified, then the number of lines per record is calculated from
4995 the list of variable specifications later in the DATA LIST command.
4997 The END subcommand is only useful in conjunction with the INPUT PROGRAM
4998 input procedure, and for that reason it is not discussed here
4999 (@pxref{INPUT PROGRAM}).
5001 DATA LIST can optionally output a table describing how the data file
5002 will be read. The TABLE subcommand enables this output, and NOTABLE
5003 disables it. The default is to output the table.
5005 The list of variables to be read from the data list must come last in
5006 the DATA LIST command. Each line in the data record is introduced by a
5007 slash (@samp{/}). Optionally, a line number may follow the slash.
5008 After that, any number of variable specifications may be present.
5010 Each variable specification consists of a list of variable names
5011 followed by a description of their location on the input line. Sets of
5012 variables may be specified using DATA LIST's TO convention (@pxref{Sets of
5013 Variables}). There are two ways to specify the location of the variable
5014 on the line: SPSS style and FORTRAN style.
5016 With SPSS style, the starting column and ending column for the field
5017 are specified after the variable name, separated by a dash (@samp{-}).
5018 For instance, the third through fifth columns on a line would be
5019 specified @samp{3-5}. By default, variables are considered to be in
5020 @samp{F} format (@pxref{Input/Output Formats}). (This default can be
5021 changed; see @ref{SET} for more information.)
5023 When using SPSS style, to use a variable format other than the default,
5024 specify the format type in parentheses after the column numbers. For
5025 instance, for alphanumeric @samp{A} format, use @samp{(A)}.
5027 In addition, implied decimal places can be specified in parentheses
5028 after the column numbers. As an example, suppose that a data file has a
5029 field in which the characters @samp{1234} should be interpreted as
5030 having the value 12.34. Then this field has two implied decimal places,
5031 and the corresponding specification would be @samp{(2)}. If a field
5032 that has implied decimal places contains a decimal point, then the
5033 implied decimal places are not applied.
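As a minimal sketch of the implied-decimal rule described above, the
following reads two cases.  The variable name @code{PRICE}, the column
range, and the data values are illustrative assumptions only.

@example
DATA LIST FIXED /PRICE 1-6 (2).
BEGIN DATA.
  1234
  12.5
END DATA.
LIST.
@end example

The first field contains no decimal point, so it should be read as
12.34.  The second field contains an explicit decimal point, so the
implied decimal places are not applied and it should be read as 12.5.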
5035 Changing the variable format and adding implied decimal places can be
5036 done together; for instance, @samp{(N,5)}.
5038 When using SPSS style, the input and output width of each variable is
5039 computed from the field width. The field width must be evenly divisible
5040 by the number of variables specified.
5042 FORTRAN style is an altogether different approach to specifying field
5043 locations. With this approach, a list of variable input format
5044 specifications, separated by commas, is placed after the variable names
5045 inside parentheses. Each format specifier advances as many characters
5046 into the input line as it uses.
5048 In addition to the standard format specifiers (@pxref{Input/Output
5049 Formats}), FORTRAN style defines some extensions:
5053 Advance the current column on this line by one character position.
5055 @item @code{T}@var{x}
5056 Set the current column on this line to column @var{x}, with column
5057 numbers considered to begin with 1 at the left margin.
5059 @item @code{NEWREC}@var{x}
5060 Skip forward @var{x} lines in the current record, resetting the active
5061 column to the left margin.
5064 Any format specifier may be preceded by a number. This causes the
5065 action of that format specifier to be repeated the specified number of
5068 @item (@var{spec1}, @dots{}, @var{specN})
5069 Group the given specifiers together. This is most useful when preceded
5070 by a repeat count. Groups may be nested arbitrarily.
5073 FORTRAN and SPSS styles may be freely intermixed. SPSS style leaves the
5074 active column immediately after the ending column specified. Record
5075 motion using @code{NEWREC} in FORTRAN style also applies to later
5076 FORTRAN and SPSS specifiers.
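To illustrate the two styles side by side, the following sketch
describes one layout twice, first in FORTRAN style and then in SPSS
style.  The variable names are hypothetical, and the two commands are
intended to be equivalent under the rules given above.

@example
DATA LIST FIXED /ID SCORE1 TO SCORE3 (F4.0, X, 3F2.0).
DATA LIST FIXED /ID 1-4 SCORE1 TO SCORE3 6-11.
@end example

In the FORTRAN form, @code{F4.0} reads @code{ID} from columns 1
through 4, @code{X} skips column 5, and the repeat count in
@code{3F2.0} reads three two-column fields from columns 6 through 11.
In the SPSS form, the six columns 6 through 11 are divided evenly among
the three variables.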
5079 * DATA LIST FIXED Examples:: Examples of DATA LIST FIXED.
5082 @node DATA LIST FIXED Examples, , DATA LIST FIXED, DATA LIST FIXED
5083 @unnumberedsubsubsec Examples
5088 DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
5097 Defines the following variables:
5101 @code{NAME}, a 10-character-wide long string variable, in columns 1
5105 @code{INFO1}, a numeric variable, in columns 12 through 13.
5108 @code{INFO2}, a numeric variable, in columns 14 through 15.
5111 @code{INFO3}, a numeric variable, in columns 16 through 17.
5114 The @code{BEGIN DATA}/@code{END DATA} commands cause three cases to be
5118 Case NAME INFO1 INFO2 INFO3
5119 1 John Smith 10 23 11
5120 2 Bob Arnold 12 20 15
5124 The @code{TABLE} keyword causes PSPP to print out a table
5125 describing the four variables defined.
5129 DAT LIS FIL="survey.dat"
5130 /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
5135 Defines the following variables:
5139 @code{ID}, a numeric variable, in columns 1-5 of the first record.
5142 @code{NAME}, a 30-character long string variable, in columns 7-36 of the
5146 @code{SURNAME}, a 30-character long string variable, in columns 38-67 of
5150 @code{MINITIAL}, a 1-character short string variable, in column 69 of
5154 Fifty variables @code{Q01}, @code{Q02}, @code{Q03}, @dots{}, @code{Q49},
5155 @code{Q50}, all numeric, @code{Q01} in column 7, @code{Q02} in column 8,
5156 @dots{}, @code{Q49} in column 55, @code{Q50} in column 56, all in the second
5160 Cases are separated by a blank record.
5162 Data is read from file @file{survey.dat} in the current directory.
5164 This example shows keywords abbreviated to their first 3 letters.
5168 @node DATA LIST FREE, DATA LIST LIST, DATA LIST FIXED, DATA LIST
5169 @subsection DATA LIST FREE
5170 @vindex DATA LIST FREE
5179 where each var_spec takes one of the forms
5180 var_list [(type_spec)]
5184 In free format, the input data is structured as a series of comma- or
5185 whitespace-delimited fields (end of line is one form of whitespace; it
5186 is not treated specially). Field contents may be surrounded by matched
5187 pairs of apostrophes (@samp{'}) or quotes (@samp{"}), or they may be
5188 unenclosed. For any type of field, leading white space (up to the
5189 apostrophe or quote, if any) is not included in the field.
5191 Multiple consecutive delimiters are equivalent to a single delimiter.
5192 To specify an empty field, write an empty set of single or double
5193 quotes; for instance, @samp{""}.
5195 The NOTABLE and TABLE subcommands are as in DATA LIST FIXED above.
5196 NOTABLE is the default.
5198 The FILE and END subcommands are as in DATA LIST FIXED above.
5200 The variables to be parsed are given as a single list of variable names.
5201 This list must be introduced by a single slash (@samp{/}). The set of
5202 variable names may contain format specifications in parentheses
5203 (@pxref{Input/Output Formats}). Format specifications apply to all
5204 variables back to the previous parenthesized format specification.
5206 In addition, an asterisk may be used to indicate that all variables
5207 preceding it are to have input/output format @samp{F8.0}.
5209 Specified field widths are ignored on input, although all normal limits
5210 on field width apply, but they are honored on output.
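The following sketch shows the free-format rules described above in
use.  The variable names and data values are illustrative assumptions.

@example
DATA LIST FREE /NAME (A10) AGE SCORE (F4.1).
BEGIN DATA.
'John Smith' 34 7.5
Ann,29,9.0
'Bob Arnold'
41 6.25
END DATA.
LIST.
@end example

Three cases should result.  The names containing a space are quoted so
that each is read as a single field, the second case uses commas as
delimiters, and the third case spans two lines because end of line is
treated as ordinary whitespace.  The @code{(F4.1)} specification
applies to both @code{AGE} and @code{SCORE}, since it reaches back to
the previous parenthesized specification.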
5212 @node DATA LIST LIST, , DATA LIST FREE, DATA LIST
5213 @subsection DATA LIST LIST
5214 @vindex DATA LIST LIST
5223 where each var_spec takes one of the forms
5224 var_list [(type_spec)]
5228 Syntactically and semantically, DATA LIST LIST is equivalent to DATA
5229 LIST FREE, with one exception: each input line is expected to correspond
5230 to exactly one input record. If more or fewer fields are found on an
5231 input line than expected, an appropriate diagnostic is issued.
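As a small sketch of the difference from free format (the variable
names and data are assumptions made for illustration):

@example
DATA LIST LIST /X Y Z.
BEGIN DATA.
1 2 3
4 5
6 7 8
END DATA.
LIST.
@end example

The second data line supplies only two of the three expected fields, so
a diagnostic should be issued for it.  With DATA LIST FREE, the same
input would instead be treated as one continuous stream of fields, with
line breaks having no special meaning.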
5233 @node END CASE, END FILE, DATA LIST, Data Input and Output
5241 END CASE is used within INPUT PROGRAM to output the current case.
5242 @xref{INPUT PROGRAM}.
5244 @node END FILE, FILE HANDLE, END CASE, Data Input and Output
5252 END FILE is used within INPUT PROGRAM to terminate the current input
5253 program. @xref{INPUT PROGRAM}.
5255 @node FILE HANDLE, INPUT PROGRAM, END FILE, Data Input and Output
5256 @section FILE HANDLE
5260 FILE HANDLE handle_name
5262 /RECFORM=@{VARIABLE,FIXED,SPANNED@}
5264 /MODE=@{CHARACTER,IMAGE,BINARY,MULTIPUNCH,360@}
5267 Use the FILE HANDLE command to define the attributes of a file that does
5268 not use conventional variable-length records terminated by newline
5271 Specify the file handle name as an identifier. Any given identifier may
5272 only appear once in a PSPP run. File handles may not be reassigned to a
5273 different file. The file handle name must immediately follow the FILE
5274 HANDLE command name.
5276 The NAME subcommand specifies the name of the file associated with the
5277 handle. It is the only required subcommand.
5279 The RECFORM subcommand specifies how the file is laid out. VARIABLE
5280 specifies variable-length lines terminated with newlines, and it is the
5281 default. FIXED specifies fixed-length records. SPANNED is not
5284 LRECL specifies the length of fixed-length records. It is required if
5285 @code{/RECFORM FIXED} is specified.
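For instance, a file of fixed-length 80-byte records might be declared
and then read as sketched below.  The handle name @code{SURVEY}, the
file name, and the record length are hypothetical, and the exact
spelling of the NAME and LRECL subcommands is assumed to follow the
pattern of the other subcommands.

@example
FILE HANDLE SURVEY /NAME='survey.bin' /RECFORM=FIXED /LRECL=80.
DATA LIST FILE=SURVEY /ID 1-5 NAME 7-36 (A).
@end example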
5287 MODE specifies a file mode. CHARACTER, the default, causes the data
5288 file to be opened in ANSI C text mode. BINARY causes the data file to
5289 be opened in ANSI C binary mode. The other possibilities are not
5292 @node INPUT PROGRAM, LIST, FILE HANDLE, Data Input and Output
5293 @section INPUT PROGRAM
5294 @vindex INPUT PROGRAM
5298 @dots{} input commands @dots{}
5302 The INPUT PROGRAM@dots{}END INPUT PROGRAM construct is used to specify a
5303 complex input program. By placing data input commands within INPUT
5304 PROGRAM, PSPP programs can take advantage of more complex file
5305 structures than available by using DATA LIST by itself.
5307 The first sort of extended input program is to simply put multiple DATA
5308 LIST commands within the INPUT PROGRAM. This will cause all of the data
5309 files to be read in parallel. Input will stop when end of file is
5310 reached on any of the data files.
5312 Transformations, such as conditional and looping constructs, can also be
5313 included within an INPUT PROGRAM. These can be used to combine input
5314 from several data files in more complex ways. However, input will still
5315 stop when end of file is reached on any of the data files.
5317 To prevent INPUT PROGRAM from terminating at the first end of file, use
5318 the END subcommand on DATA LIST. This subcommand takes a variable name,
5319 which should be a numeric scratch variable (@pxref{Scratch Variables}).
5320 (It need not be a scratch variable but otherwise the results can be
5321 surprising.) The value of this variable is set to 0 when reading the
5322 data file, or 1 when end of file is encountered.
5324 Some additional commands are useful in conjunction with INPUT PROGRAM.
5325 END CASE is the first one. Normally each loop through the INPUT PROGRAM
5326 structure produces one case. But with END CASE you can control exactly
5327 when cases are output. When END CASE is used, looping from the end of
5328 INPUT PROGRAM to the beginning does not cause a case to be output.
5330 END FILE is the other command. When the END subcommand is used on DATA
5331 LIST, there is no way for the INPUT PROGRAM construct to stop looping,
5332 so an infinite loop results. The END FILE command, when executed,
5333 stops the flow of input data and passes out of the INPUT PROGRAM
5336 All this is very confusing. A few examples should help to clarify.
5340 DATA LIST NOTABLE FILE='a.data'/X 1-10.
5341 DATA LIST NOTABLE FILE='b.data'/Y 1-10.
5346 The example above reads variable X from file @file{a.data} and variable
5347 Y from file @file{b.data}. If one file is shorter than the other then
5348 the extra data in the longer file is ignored.
5355 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
5358 DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
5368 This example reads variable X from @file{a.data} and variable Y from
5369 @file{b.data}. If one file is shorter than the other then the missing
5370 field is set to the system-missing value alongside the present value for
5371 the remaining length of the longer file.
5378 DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
5385 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
5394 The above example reads data from file @file{a.data}, then from
5395 @file{b.data}, and concatenates them into a single active file.
5402 DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
5410 DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
5421 The above example does the same thing as the previous example, in a
5427 COMPUTE X=UNIFORM(10).
5432 LIST/FORMAT=NUMBERED.
5435 The above example causes an active file to be created consisting of 50
5436 random variates between 0 and 10.
5438 @node LIST, MATRIX DATA, INPUT PROGRAM, Data Input and Output
5445 /CASES=FROM start_index TO end_index BY incr_index
5446 /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@}
5450 The LIST procedure prints the values of specified variables to the
5453 The VARIABLES subcommand specifies the variables whose values are to be
5454 printed. Keyword VARIABLES is optional. If the VARIABLES subcommand is not
5455 specified then all variables in the active file are printed.
5457 The CASES subcommand can be used to specify a subset of cases to be
5458 printed. Specify FROM and the case number of the first case to print,
5459 TO and the case number of the last case to print, and BY and the number
5460 of cases to advance between printing cases, or any subset of those
5461 settings. If CASES is not specified then all cases are printed.
5463 The FORMAT subcommand can be used to change the output format. NUMBERED
5464 will print case numbers along with each case; UNNUMBERED, the default,
5465 causes the case numbers to be omitted. The WRAP and SINGLE settings are
5466 currently not used. WEIGHT will cause case weights to be printed along
5467 with variable values; NOWEIGHT, the default, causes case weights to be
5468 omitted from the output.
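As a brief sketch combining these subcommands (the variable names are
borrowed from the DATA LIST examples and are otherwise arbitrary):

@example
LIST /VARIABLES=NAME INFO1 /CASES=FROM 10 TO 50 BY 5 /FORMAT=NUMBERED.
@end example

This should print @code{NAME} and @code{INFO1} for cases 10, 15, 20,
and so on up to case 50, with the case number shown alongside each
case.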
5470 Case numbers start from 1. They are counted after all transformations
5471 have been considered.
5473 LIST will attempt to fit all the values on a single line. If necessary,
5474 variable names will be displayed vertically in order to fit. If values
5475 cannot fit on a single line, then a multi-line format will be used.
5477 LIST is a procedure. It causes the data to be read.
5479 @node MATRIX DATA, NEW FILE, LIST, Data Input and Output
5480 @section MATRIX DATA
5487 /FORMAT=@{LIST,FREE@} @{LOWER,UPPER,FULL@} @{DIAGONAL,NODIAGONAL@}
5488 /SPLIT=@{new_var,var_list@}
5492 /CONTENTS=@{N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
5493 DFE,MAT,COV,CORR,PROX@}
5496 The MATRIX DATA command reads square matrices in one of several textual
5497 formats. MATRIX DATA clears the dictionary and replaces it and reads a
5500 Use VARIABLES to specify the variables that form the rows and columns of
5501 the matrices. You may not specify a variable named VARNAME_. You
5502 should specify VARIABLES first.
5504 Specify the file to read on FILE, either as a file name string or a file
5505 handle (@pxref{FILE HANDLE}). If FILE is not specified then matrix data
5506 must immediately follow MATRIX DATA with a BEGIN DATA@dots{}END DATA
5507 construct (@pxref{BEGIN DATA}).
5509 The FORMAT subcommand specifies how the matrices are formatted. LIST,
5510 the default, indicates that there is one line per row of matrix data;
5511 FREE allows single matrix rows to be broken across multiple lines. This
5512 is analogous to the difference between DATA LIST FREE and DATA LIST LIST
5513 (@pxref{DATA LIST}). LOWER, the default, indicates that the lower
5514 triangle of the matrix is given; UPPER indicates the upper triangle; and
5515 FULL indicates that the entire matrix is given. DIAGONAL, the default,
5516 indicates that the diagonal is part of the data; NODIAGONAL indicates
5517 that it is omitted. DIAGONAL/NODIAGONAL have no effect when FULL is
5520 The SPLIT subcommand is used to specify SPLIT FILE variables for the
5521 input matrices (@pxref{SPLIT FILE}). Specify either a single variable
5522 not specified on VARIABLES, or one or more variables that are specified
5523 on VARIABLES. In the former case, the SPLIT values are not present in
5524 the data and ROWTYPE_ may not be specified on VARIABLES. In the latter
5525 case, the SPLIT values are present in the data.
5527 Specify a list of factor variables on FACTORS. Factor variables must
5528 also be listed on VARIABLES. Factor variables are used when there are
5529 some variables where, for each possible combination of their values,
5530 statistics on the matrix variables are included in the data.
5532 If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the
5533 CELLS subcommand is required. Specify the number of factor variable
5534 combinations that are given. For instance, if factor variable A has 2
5535 values and factor variable B has 3 values, specify 6.
5537 The N subcommand specifies a population number of observations. When N
5538 is specified, one N record is output for each SPLIT FILE.
5540 Use CONTENTS to specify what sort of information the matrices include.
5541 Each possible option is described in more detail below. When ROWTYPE_
5542 is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS
5543 is not specified then /CONTENTS=CORR is assumed.
5548 Number of observations as a vector, one value for each variable.
5550 Number of observations as a single value.
5556 Vector of standard deviations.
5560 Vector of mean squared errors.
5562 Vector of degrees of freedom.
5573 The exact semantics of the matrices read by MATRIX DATA are complex.
5574 Right now MATRIX DATA isn't too useful due to a lack of procedures
5575 accepting or producing related data, so these semantics aren't
5576 documented. Later, they'll be described here in detail.
5578 @node NEW FILE, PRINT, MATRIX DATA, Data Input and Output
5586 The NEW FILE command clears the current active file.
5588 @node PRINT, PRINT EJECT, NEW FILE, Data Input and Output
5597 /[line_no] arg@dots{}
5599 arg takes one of the following forms:
5600 'string' [start-end]
5601 var_list start-end [type_spec]
5602 var_list (fortran_spec)
5606 The PRINT transformation writes variable data to an output file. PRINT
5607 is executed when a procedure causes the data to be read. In order to
5608 execute the PRINT transformation without invoking a procedure, use the
5609 EXECUTE command (@pxref{EXECUTE}).
5611 All PRINT subcommands are optional.
5613 The OUTFILE subcommand specifies the file to receive the output. The
5614 file may be a file name as a string or a file handle (@pxref{FILE
5615 HANDLE}). If OUTFILE is not present then output will be sent to PSPP's
5616 output listing file.
5618 The RECORDS subcommand specifies the number of lines to be output. The
5619 number of lines may optionally be surrounded by parentheses.
5621 TABLE will cause the PRINT command to output a table to the listing file
5622 that describes what it will print to the output file. NOTABLE, the
5623 default, suppresses this output table.
5625 Introduce the strings and variables to be printed with a slash
5626 (@samp{/}). Optionally, the slash may be followed by a number
5627 indicating which output line will be specified. In the absence of this
5628 line number, the next line number will be specified. Multiple lines may
5629 be specified using multiple slashes with the intended output for a line
5630 following its respective slash.
5632 Literal strings may be printed. Specify the string itself. Optionally
5633 the string may be followed by a column number or range of column
5634 numbers, specifying the location on the line for the string to be
5635 printed. Otherwise, the string will be printed at the current position
5638 Variables to be printed can be specified in the same ways as available
5639 for DATA LIST FIXED (@pxref{DATA LIST FIXED}). In addition, a variable
5640 list may be followed by an asterisk (@samp{*}), which indicates that the
5641 variables should be printed in their dictionary print formats, separated
5642 by spaces. A variable list followed by a slash or the end of command
5643 will be interpreted the same way.
5645 If a FORTRAN type specification is used to move backwards on the current
5646 line and text is then written at that point, the line is truncated to
5647 that length, although any additional text written afterwards extends the
5648 line again.
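The following sketch writes two output lines per case, each beginning
with a literal string placed at explicit columns.  The output file
name and the variable names are assumptions made for illustration.

@example
PRINT OUTFILE='report.txt'
    /'Name:' 1-5 NAME 7-16 (A)
    /'Score:' 1-6 SCORE 8-12 (1).
EXECUTE.
@end example

Because PRINT is a transformation, nothing is written until the data is
read; EXECUTE is used here for that purpose.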
5650 @node PRINT EJECT, PRINT SPACE, PRINT, Data Input and Output
5651 @section PRINT EJECT
5659 /[line_no] arg@dots{}
5661 arg takes one of the following forms:
5662 'string' [start-end]
5663 var_list start-end [type_spec]
5664 var_list (fortran_spec)
5668 PRINT EJECT is used to write data to an output file. Before the data is
5669 written, the current page in the listing file is ejected.
5671 @xref{PRINT}, for more information on syntax and usage.
5673 @node PRINT SPACE, REREAD, PRINT EJECT, Data Input and Output
5674 @section PRINT SPACE
5678 PRINT SPACE OUTFILE='filename' n_lines.
5681 The PRINT SPACE command prints one or more blank lines to an output file.
5683 The OUTFILE subcommand is optional. It may be used to direct output to
5684 a file specified by file name as a string or file handle (@pxref{FILE
5685 HANDLE}). If OUTFILE is not specified then output will be directed to
5688 n_lines is also optional. If present, it is an expression
5689 (@pxref{Expressions}) specifying the number of blank lines to be
5690 printed. The expression must evaluate to a nonnegative value.
5692 @node REREAD, REPEATING DATA, PRINT SPACE, Data Input and Output
5697 REREAD FILE=handle COLUMN=column.
5700 The REREAD transformation allows the previous input line in a data file
5701 already processed by DATA LIST or another input command to be re-read
5702 for further processing.
5704 The FILE subcommand, which is optional, is used to specify the file to
5705 have its line re-read. The file must be specified in the form of a file
5706 handle (@pxref{FILE HANDLE}). If FILE is not specified then the last
5707 file specified on DATA LIST will be assumed (last file specified
5708 lexically, not in terms of flow-of-control).
5710 By default, the line is re-read in its entirety. With the
5711 COLUMN subcommand, a prefix of the line can be exempted from
5712 re-reading. Specify an expression (@pxref{Expressions}) evaluating to
5713 the first column that should be included in the re-read line. Columns
5714 are numbered from 1 at the left margin.
5716 Multiple REREAD commands will not back up in the data file. Instead,
5717 they will re-read the same line multiple times.
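One common use, sketched below under stated assumptions, is a file
whose first column holds a record-type code that decides how the rest
of the line should be parsed.  The handle name @code{RECS}, the file
name, the type code @samp{P}, and the column layout are all
hypothetical.

@example
FILE HANDLE RECS /NAME='records.dat'.
INPUT PROGRAM.
DATA LIST NOTABLE FILE=RECS /KIND 1 (A).
DO IF KIND = 'P'.
REREAD FILE=RECS.
DATA LIST NOTABLE FILE=RECS /NAME 3-12 (A) AGE 14-16.
END IF.
END INPUT PROGRAM.
LIST.
@end example

The first DATA LIST reads only the type code.  For lines of type
@samp{P}, REREAD causes the second DATA LIST to parse that same line
again rather than advancing to the next one.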
5719 @node REPEATING DATA, WRITE, REREAD, Data Input and Output
5720 @section REPEATING DATA
5721 @vindex REPEATING DATA
5729 /CONTINUED[=cont_start-cont_end]
5730 /ID=id_start-id_end=id_var
5732 /DATA=var_spec@dots{}
5734 where each var_spec takes one of the forms
5735 var_list start-end [type_spec]
5736 var_list (fortran_spec)
5739 The REPEATING DATA command is used to parse groups of data repeating in
5740 a uniform format, possibly with several groups on a single line. Each
5741 group of data corresponds with one case. REPEATING DATA may only be
5742 used within an INPUT PROGRAM structure. When used with DATA LIST, it
5743 can be used to parse groups of cases that share a subset of variables
5744 but differ in their other data.
5746 The STARTS subcommand is required. Specify a range of columns, using
5747 literal numbers or numeric variable names. This range specifies the
5748 columns on the first line that are used to contain groups of data. The
5749 ending column is optional. If it is not specified, then the record
5750 width of the input file is used. For the inline file (@pxref{BEGIN
5751 DATA}) this is 80 columns; for a file with fixed record widths it is the
5752 record width; for other files it is 1024 characters by default.
5754 The OCCURS subcommand is required. It must be a number or the name of a
5755 numeric variable. Its value is the number of groups present in the
5758 The DATA subcommand is required. It must be the last subcommand
5759 specified. It is used to specify the data present within each repeating
5760 group. Column numbers are specified relative to the beginning of a
5761 group at column 1. Data is specified in the same way as with DATA LIST
5762 FIXED (@pxref{DATA LIST FIXED}).
5764 All other subcommands are optional.
5766 FILE specifies the file to read, either a file name as a string or a
5767 file handle (@pxref{FILE HANDLE}). If FILE is not present then the
5768 default is the last file handle used on DATA LIST (lexically, not in
5769 terms of flow of control).
5771 By default REPEATING DATA will output a table describing how it will
5772 parse the input data. Specifying NOTABLE will disable this behavior;
5773 specifying TABLE will explicitly enable it.
5775 The LENGTH subcommand specifies the length in characters of each group.
5776 If it is not present then length is inferred from the DATA subcommand.
5777 LENGTH can be a number or a variable name.
5779 Normally all the data groups are expected to be present on a single
5780 line. Use the CONTINUED subcommand to indicate that data can be continued
5781 onto additional lines. If data on continuation lines starts at the left
5782 margin and continues through the entire field width, no column
5783 specifications are necessary on CONTINUED. Otherwise, specify the
5784 possible range of columns in the same way as on STARTS.
5786 When data groups are continued from line to line, it's easily possible
5787 for cases to get out of sync if hand editing is not done carefully. The
5788 ID subcommand allows a case identifier to be present on each line of
5789 repeating data groups. REPEATING DATA will check for the same
5790 identifier on each line and report mismatches. Specify the range of
5791 columns that the identifier will occupy, followed by an equals sign
5792 (@samp{=}) and the identifier variable name. The variable must already
5793 have been declared with NUMERIC or another command.
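As a minimal sketch of the required subcommands together with LENGTH,
the following reads order records in which each line carries a variable
number of item groups.  The variable names, column positions, and data
are assumptions chosen for illustration.

@example
INPUT PROGRAM.
DATA LIST NOTABLE /ORDER 1-5 NITEMS 7-8.
REPEATING DATA /STARTS=10 /OCCURS=NITEMS /LENGTH=8
    /DATA=ITEM 1-4 (A) QTY 6-7.
END INPUT PROGRAM.
BEGIN DATA.
10001  2 A100 03 B200 01
10002  1 C300 12
END DATA.
LIST.
@end example

Groups begin at column 10 and are 8 columns long, so the second group
on the first line begins at column 18.  Within each group,
@code{ITEM} occupies relative columns 1 through 4 and @code{QTY}
relative columns 6 through 7.  Three cases should result: two for order
10001 and one for order 10002.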
5795 @node WRITE, , REPEATING DATA, Data Input and Output
5804 /[line_no] arg@dots{}
5806 arg takes one of the following forms:
5807 'string' [start-end]
5808 var_list start-end [type_spec]
5809 var_list (fortran_spec)
5813 WRITE is used to write text or binary data to an output file.
5815 @xref{PRINT}, for more information on syntax and usage. The main
5816 difference between PRINT and WRITE is that whereas by default PRINT uses
5817 variables' print formats, WRITE uses write formats.
5819 The sole additional difference is that if WRITE is used to send output
5820 to a binary file, carriage control characters will not be output.
5821 @xref{FILE HANDLE}, for information on how to declare a file as binary.
5823 @node System and Portable Files, Variable Attributes, Data Input and Output, Top
5824 @chapter System Files and Portable Files
5826 The commands in this chapter read, write, and examine system files and
5830 * APPLY DICTIONARY:: Apply system file dictionary to active file.
5831 * EXPORT:: Write to a portable file.
5832 * GET:: Read from a system file.
5833 * IMPORT:: Read from a portable file.
5834 * MATCH FILES:: Merge system files.
5835 * SAVE:: Write to a system file.
5836 * SYSFILE INFO:: Display system file dictionary.
5837 * XSAVE:: Write to a system file, as a transform.
5840 @node APPLY DICTIONARY, EXPORT, System and Portable Files, System and Portable Files
5841 @section APPLY DICTIONARY
5842 @vindex APPLY DICTIONARY
5845 APPLY DICTIONARY FROM='filename'.
5848 The APPLY DICTIONARY command applies the variable labels, value labels,
5849 and missing values from variables in a system file to corresponding
5850 variables in the active file. In some cases it also updates the
5853 Specify a system file with a file name string or as a file handle
5854 (@pxref{FILE HANDLE}). The dictionary in the system file will be read,
5855 but it will not replace the active file dictionary. The system file's
5856 data will not be read.
5858 Only variables with names that exist in both the active file and the
5859 system file are considered. Variables with the same name but different
5860 types (numeric, string) will cause an error message. Otherwise, the
5861 system file variables' attributes will replace those in their matching
5862 active file variables, as described below.
5864 If a system file variable has a variable label, then it will replace the
5865 active file variable's variable label. If the system file variable does
5866 not have a variable label, then the active file variable's variable
5867 label, if any, will be retained.
5869 If the active file variable is numeric or short string, then value
5870 labels and missing values, if any, will be copied to the active file
5871 variable. If the system file variable does not have value labels or
5872 missing values, then those in the active file variable, if any, will not
5875 Finally, weighting of the active file is updated (@pxref{WEIGHT}). If
5876 the active file has a weighting variable, and the system file does not,
5877 or if the weighting variable in the system file does not exist in the
5878 active file, then the active file weighting variable, if any, is
5879 retained. Otherwise, the weighting variable in the system file becomes
5880 the active file weighting variable.
5882 APPLY DICTIONARY takes effect immediately. It does not read the active
5883 file. The system file is not modified.
5885 @node EXPORT, GET, APPLY DICTIONARY, System and Portable Files
5894 /RENAME=(src_names=target_names)@dots{}
5897 The EXPORT procedure writes the active file dictionary and data to a
5898 specified portable file.
5900 The OUTFILE subcommand, which is the only required subcommand, specifies
5901 the portable file to be written as a file name string or a file handle
5902 (@pxref{FILE HANDLE}).
5904 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
5907 EXPORT is a procedure. It causes the active file to be read.
5909 @node GET, IMPORT, EXPORT, System and Portable Files
5918 /RENAME=(src_names=target_names)@dots{}
5921 The GET transformation clears the current dictionary and active file and
5922 replaces them with the dictionary and data from a specified system file.
5924 The FILE subcommand is the only required subcommand. Specify the system
5925 file to be read as a string file name or a file handle (@pxref{FILE
5928 By default, all the variables in a system file are read. The DROP
5929 subcommand can be used to specify a list of variables that are not to be
5930 read. By contrast, the KEEP subcommand can be used to specify variables
5931 that are to be read, with all other variables not read.
5933 Normally variables in a system file retain the names that they were
5934 saved under. Use the RENAME subcommand to change these names. Specify,
5935 within parentheses, a list of variable names followed by an equals sign
5936 (@samp{=}) and the names that they should be renamed to. Multiple
5937 parenthesized groups of variable names can be included on a single
5938 RENAME subcommand. Variables' names may be swapped using a RENAME
5939 subcommand of the form @samp{/RENAME=(A B=B A)}.
5941 Alternate syntax for the RENAME subcommand allows the parentheses to be
5942 eliminated. When this is done, only a single variable may be renamed at
5943 once. For instance, @samp{/RENAME=A=B}. This alternate syntax is
5946 DROP, KEEP, and RENAME are performed in left-to-right order. They each
5947 may be present any number of times.
5949 Please note that DROP, KEEP, and RENAME do not cause the system file on
5950 disk to be modified. Only the active file read from the system file is
5953 GET does not cause the data to be read, only the dictionary. The data
5954 is read later, when a procedure is executed.
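
As a rough illustration of the subcommands described above, the
following sketch reads a system file while dropping two variables and
renaming two others. The file name and all variable names are
hypothetical, chosen only for the example.

@example
GET /FILE='survey.sav'
    /DROP=tmp1 tmp2
    /RENAME=(v1 v2=height weight).
@end example

Because the subcommands are performed in left-to-right order, RENAME
here sees the dictionary as it stands after DROP has been applied.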
5956 @node IMPORT, MATCH FILES, GET, System and Portable Files
5966 /RENAME=(src_names=target_names)@dots{}
5969 The IMPORT transformation clears the active file dictionary and data and
5970 replaces them with a dictionary and data from a portable file on disk.
5972 The FILE subcommand, which is the only required subcommand, specifies
5973 the portable file to be read as a file name string or a file handle
5974 (@pxref{FILE HANDLE}).
5976 The TYPE subcommand is currently not used.
5978 DROP, KEEP, and RENAME follow the syntax used by GET (@pxref{GET}).
5980 IMPORT does not cause the data to be read, only the dictionary. The
5981 data is read later, when a procedure is executed.
5983 @node MATCH FILES, SAVE, IMPORT, System and Portable Files
5984 @section MATCH FILES
5990 /@{FILE,TABLE@}=@{*,'filename'@}
5993 /RENAME=(src_names=target_names)@dots{}
6000 The MATCH FILES command merges one or more system files, optionally
6001 including the active file. Records with the same values for BY
6002 variables are combined into a single record. Records with different
6003 values are output in order. Thus, multiple sorted system files are
6004 combined into a single sorted system file based on the value of the BY
6007 The BY subcommand specifies a list of variables that are used to match
6008 records from each of the system files. Variables specified must exist
6009 in all the files specified on FILE and TABLE. BY should usually be
6010 specified. If TABLE is used then BY is required.
6012 Specify FILE with a system file as a file name string or file handle
6013 (@pxref{FILE HANDLE}). An asterisk (@samp{*}) may also be specified to
6014 indicate the current active file. The files specified on FILE are
6015 merged together based on the BY variables, or combined case-by-case if
6016 BY is not specified. Normally at least two FILE subcommands should be
6019 Specify TABLE with a system file in order to use it as a @dfn{table
6020 lookup file}. Records in table lookup files are not used up after
6021 they've been used once. This means that data in table lookup files can
6022 correspond to any number of records in FILE files. Table lookup files
6023 correspond to lookup tables in traditional relational database systems.
6024 It is incorrect to have records with duplicate BY values in table lookup
6027 Any number of FILE and TABLE subcommands may be specified. Each
6028 instance of FILE or TABLE can be followed by DROP, KEEP, and/or RENAME
6029 subcommands. These take the same form as the corresponding subcommands
6030 of GET (@pxref{GET}), and perform the same functions.
6032 Variables belonging to files that are not present for the current case
6033 are set to the system-missing value for numeric variables or spaces for
6036 IN, FIRST, LAST, and MAP are currently not used.
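
The following sketch shows one way the FILE, TABLE, and BY subcommands
might be combined. The file names and the variable @code{clinic} are
hypothetical, and the placement and spelling of BY relative to the
other subcommands is an assumption; consult the syntax summary at the
top of this section for the accepted form.

@example
MATCH FILES /BY clinic
            /FILE=*
            /FILE='visits.sav'
            /TABLE='clinics.sav'.
@end example

Here the active file and @file{visits.sav} are assumed to be sorted on
@code{clinic}, and @file{clinics.sav} is assumed to hold exactly one
record per clinic, so that it can serve as a table lookup file.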
6038 @node SAVE, SYSFILE INFO, MATCH FILES, System and Portable Files
6045 /@{COMPRESSED,UNCOMPRESSED@}
6048 /RENAME=(src_names=target_names)@dots{}
6051 The SAVE procedure causes the dictionary and data in the active file to
6052 be written to a system file.
6054 The FILE subcommand is the only required subcommand. Specify the system
6055 file to be written as a string file name or a file handle (@pxref{FILE
6058 The COMPRESSED and UNCOMPRESSED subcommands determine whether the saved
6059 system file is compressed. By default, system files are compressed.
6060 This default can be changed with the SET command (@pxref{SET}).
6062 By default, all the variables in the active file dictionary are written
6063 to the system file. The DROP subcommand can be used to specify a list
6064 of variables not to be written. In contrast, KEEP specifies variables
6065 to be written; all variables not listed are omitted.
6067 Normally variables are saved to a system file under the same names they
6068 have in the active file. Use the RENAME subcommand to change these names.
6069 Specify, within parentheses, a list of variable names followed by an
6070 equals sign (@samp{=}) and the names that they should be renamed to.
6071 Multiple parenthesized groups of variable names can be included on a
6072 single RENAME subcommand. Variables' names may be swapped using a
6073 RENAME subcommand of the form @samp{/RENAME=(A B=B A)}.
6075 Alternate syntax for the RENAME subcommand allows the parentheses to be
6076 eliminated. When this is done, only a single variable may be renamed at
6077 once. For instance, @samp{/RENAME=A=B}. This alternate syntax is
6080 DROP, KEEP, and RENAME are performed in left-to-right order. They each
6081 may be present any number of times.
6083 Please note that DROP, KEEP, and RENAME do not cause the active file to
6084 be modified. Only the system file written to disk is changed.
6086 SAVE causes the data to be read. It is a procedure.
6088 @node SYSFILE INFO, XSAVE, SAVE, System and Portable Files
6089 @section SYSFILE INFO
6090 @vindex SYSFILE INFO
6093 SYSFILE INFO FILE='filename'.
6096 The SYSFILE INFO command reads the dictionary of a system file and
6097 displays the information it contains.
6099 Specify a file name or file handle. SYSFILE INFO will read that file as
6100 a system file and display information on its dictionary.
6102 The file does not replace the current active file.
6104 @node XSAVE, , SYSFILE INFO, System and Portable Files
6111 /@{COMPRESSED,UNCOMPRESSED@}
6114 /RENAME=(src_names=target_names)@dots{}
6117 The XSAVE transformation writes the active file dictionary and data to a
6118 system file stored on disk.
6120 XSAVE is a transformation, not a procedure. It is executed when the
6121 data is read by a procedure or procedure-like command. In all other
6122 respects, XSAVE is identical to SAVE. @xref{SAVE}, for more information
6123 on syntax and usage.
6125 @node Variable Attributes, Data Manipulation, System and Portable Files, Top
6126 @chapter Manipulating variables
6128 The variables in the active file dictionary are important. There are
6129 several utility functions for examining and adjusting them.
6132 * ADD VALUE LABELS:: Add value labels to variables.
6133 * DISPLAY:: Display variable names & descriptions.
6134 * DISPLAY VECTORS:: Display a list of vectors.
6135 * FORMATS:: Set print and write formats.
6136 * LEAVE:: Don't clear variables between cases.
6137 * MISSING VALUES:: Set missing values for variables.
6138 * MODIFY VARS:: Rename, reorder, and drop variables.
6139 * NUMERIC:: Create new numeric variables.
6140 * PRINT FORMATS:: Set variable print formats.
6141 * RENAME VARIABLES:: Rename variables.
6142 * VALUE LABELS:: Set value labels for variables.
6143 * STRING:: Create new string variables.
6144 * VARIABLE LABELS:: Set variable labels for variables.
6145 * VECTOR:: Declare an array of variables.
6146 * WRITE FORMATS:: Set variable write formats.
6149 @node ADD VALUE LABELS, DISPLAY, Variable Attributes, Variable Attributes
6150 @section ADD VALUE LABELS
6151 @vindex ADD VALUE LABELS
6155 /var_list value 'label' [value 'label']@dots{}
6158 ADD VALUE LABELS has the same syntax and purpose as VALUE LABELS
6159 (@pxref{VALUE LABELS}), but it does not clear away value labels from the
6160 variables before adding the ones specified.
6162 @node DISPLAY, DISPLAY VECTORS, ADD VALUE LABELS, Variable Attributes
6167 DISPLAY @{NAMES,INDEX,LABELS,VARIABLES,DICTIONARY,SCRATCH@}
6171 DISPLAY displays requested information on variables. Variables can
6172 optionally be sorted alphabetically. The entire dictionary or just
6173 specified variables can be described.
6175 One of the following keywords can be present:
6179 The variables' names are displayed.
6182 The variables' names are displayed along with a value describing their
6183 position within the active file dictionary.
6186 Variable names, positions, and variable labels are displayed.
6189 Variable names, positions, print and write formats, and missing values
6193 Variable names, positions, print and write formats, missing values,
6194 variable labels, and value labels are displayed.
6197 Variable names are displayed, for scratch variables only (@pxref{Scratch
6201 If SORTED is specified, then the variables are displayed in ascending
6202 order based on their names; otherwise, they are displayed in the order
6203 that they occur in the active file dictionary.
6205 @node DISPLAY VECTORS, FORMATS, DISPLAY, Variable Attributes
6206 @section DISPLAY VECTORS
6207 @vindex DISPLAY VECTORS
6213 The DISPLAY VECTORS command causes a list of the currently declared
6214 vectors to be displayed.
6216 @node FORMATS, LEAVE, DISPLAY VECTORS, Variable Attributes
6221 FORMATS var_list (fmt_spec).
6224 The FORMATS command sets the print and write formats for the specified
6225 variables to the specified format specification. @xref{Input/Output
6228 Specify a list of variables followed by a format specification in
6229 parentheses. The print and write formats of the specified variables
6232 Additional lists of variables and formats may be included if they are
6233 delimited by a slash (@samp{/}).
6235 The FORMATS command takes effect immediately. It is not affected by
6236 conditional and looping structures such as DO IF or LOOP.
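
For instance, the sketch below sets both the print and write formats of
two groups of variables in one command. The variable names are
hypothetical.

@example
FORMATS height weight (F6.2) /age (F3.0).
@end example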
6238 @node LEAVE, MISSING VALUES, FORMATS, Variable Attributes
6246 The LEAVE command prevents the specified variables from being
6247 reinitialized whenever a new case is processed.
6249 Normally, when a data file is processed, every variable in the active
6250 file is initialized to the system-missing value or spaces at the
6251 beginning of processing for each case. When a variable has been
6252 specified on LEAVE, this is not the case. Instead, that variable is
6253 initialized to 0 (not system-missing) or spaces for the first case.
6254 After that, it retains its value between cases.
6256 This becomes useful for counters. For instance, in the example below
6257 the variable SUM maintains a running total of the values in the ITEM
6261 DATA LIST /ITEM 1-3.
6262 COMPUTE SUM=SUM+ITEM.
6273 @noindent Partial output from this example:
6282 It is best to use the LEAVE command immediately before invoking a
6283 procedure command, because it is reset by certain transformations---for
6284 instance, COMPUTE and IF. LEAVE is also reset by all procedure
6287 @node MISSING VALUES, MODIFY VARS, LEAVE, Variable Attributes
6288 @section MISSING VALUES
6289 @vindex MISSING VALUES
6292 MISSING VALUES var_list (missing_values).
6294 missing_values takes one of the following forms:
6299 num1 THRU num2, num3
6302 string1, string2, string3
6303 As part of a range, LO or LOWEST may take the place of num1;
6304 HI or HIGHEST may take the place of num2.
6307 The MISSING VALUES command sets user-missing values for numeric and
6308 short string variables. Long string variables may not have missing
6311 Specify a list of variables, followed by a list of their user-missing
6312 values in parentheses. Up to three discrete values may be given, or,
6313 for numeric variables only, a range of values optionally accompanied by
6314 a single discrete value. Ranges may be open-ended on one end, indicated
6315 through the use of the keyword LO or LOWEST or HI or HIGHEST.
6317 The MISSING VALUES command takes effect immediately. It is not affected
6318 by conditional and looping constructs such as DO IF or LOOP.
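
The two forms described above might be used as in the following sketch.
The variable names are hypothetical; @code{age} is assumed to be numeric
and @code{region} a short string variable.

@example
MISSING VALUES age (LO THRU 0, 999).
MISSING VALUES region ('XX', 'ZZ').
@end example

The first command combines an open-ended range with a single discrete
value, which is allowed for numeric variables only. The second gives
two discrete string values.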
6320 @node MODIFY VARS, NUMERIC, MISSING VALUES, Variable Attributes
6321 @section MODIFY VARS
6326 /REORDER=@{FORWARD,BACKWARD@} @{POSITIONAL,ALPHA@} (var_list)@dots{}
6327 /RENAME=(old_names=new_names)@dots{}
6328 /@{DROP,KEEP@}=var_list
6332 The MODIFY VARS command allows variables in the active file to be
6333 reordered, renamed, or deleted from the active file.
6335 At least one subcommand must be specified, and no subcommand may be
6336 specified more than once. DROP and KEEP may not both be specified.
6338 The REORDER subcommand changes the order of variables in the active
6339 file. Specify one or more lists of variable names in parentheses. By
6340 default, each list of variables is rearranged into the specified order.
6341 To put the variables into the reverse of the specified order, put
6342 keyword BACKWARD before the parentheses. To put them into alphabetical
6343 order in the dictionary, specify keyword ALPHA before the parentheses.
6344 BACKWARD and ALPHA may also be combined.
6346 To rename variables in the active file, specify RENAME, an equals sign
6347 (@samp{=}), and lists of the old variable names and new variable names
6348 separated by another equals sign within parentheses. There must be the
6349 same number of old and new variable names. Each old variable is renamed to
6350 the corresponding new variable name. Multiple parenthesized groups of
6351 variables may be specified.
6353 The DROP subcommand deletes a specified list of variables from the
6356 The KEEP subcommand keeps the specified list of variables in the active
6357 file. Any unlisted variables are deleted from the active file.
6359 MAP is currently ignored.
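
A single MODIFY VARS command can combine several of these subcommands,
as in the sketch below. All variable names are hypothetical, and each
subcommand appears only once, as the rules above require.

@example
MODIFY VARS
    /REORDER=ALPHA (id age sex)
    /RENAME=(v1 v2=height weight)
    /DROP=tmp1 tmp2.
@end example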
6361 MODIFY VARS takes effect immediately. It does not cause the data to be
6364 @node NUMERIC, PRINT FORMATS, MODIFY VARS, Variable Attributes
6369 NUMERIC /var_list [(fmt_spec)].
6372 The NUMERIC command explicitly declares new numeric variables,
6373 optionally setting their output formats.
6375 Specify a slash (@samp{/}), followed by the names of the new numeric
6376 variables. If you wish to set their output formats, follow their names
6377 by an output format specification in parentheses (@pxref{Input/Output
6378 Formats}). If no output format specification is given then the
6379 variables will default to F8.2.
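
As a brief sketch with hypothetical variable names, the first command
below relies on the default format and the second supplies one
explicitly:

@example
NUMERIC /total n_items.
NUMERIC /ratio (F6.3).
@end example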
6381 Variables created with NUMERIC will be initialized to the system-missing
6384 @node PRINT FORMATS, RENAME VARIABLES, NUMERIC, Variable Attributes
6385 @section PRINT FORMATS
6386 @vindex PRINT FORMATS
6389 PRINT FORMATS var_list (fmt_spec).
6392 The PRINT FORMATS command sets the print formats for the specified
6393 variables to the specified format specification.
6395 Syntax is identical to that of FORMATS (@pxref{FORMATS}), but the PRINT
6396 FORMATS command sets only print formats, not write formats.
6398 @node RENAME VARIABLES, VALUE LABELS, PRINT FORMATS, Variable Attributes
6399 @section RENAME VARIABLES
6400 @vindex RENAME VARIABLES
6403 RENAME VARIABLES (old_names=new_names)@dots{} .
6406 The RENAME VARIABLES command allows the names of variables in the active
6409 To rename variables, specify lists of the old variable names and new
6410 variable names, separated by an equals sign (@samp{=}), within
6411 parentheses. There must be the same number of old and new variable
6412 names. Each old variable is renamed to the corresponding new variable
6413 name. Multiple parenthesized groups of variables may be specified.
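
For example, assuming hypothetical variables @code{v1}, @code{v2}, and
@code{v3} exist in the active file, two parenthesized groups can be
given in one command:

@example
RENAME VARIABLES (v1 v2=height weight) (v3=age).
@end example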
6415 RENAME VARIABLES takes effect immediately. It does not cause the data
6418 @node VALUE LABELS, STRING, RENAME VARIABLES, Variable Attributes
6419 @section VALUE LABELS
6420 @vindex VALUE LABELS
6424 /var_list value 'label' [value 'label']@dots{}
6427 The VALUE LABELS command allows values of numeric and short string
6428 variables to be associated with labels. In this way, a short value can
6429 stand for a long value.
6431 In order to set up value labels for a set of variables, specify the
6432 variable names after a slash (@samp{/}), followed by a list of values
6433 and their associated labels, separated by spaces.
6435 Before the VALUE LABELS command is executed, any existing value labels
6436 are cleared from the variables specified.
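
The sketch below, which assumes a hypothetical numeric variable
@code{sex}, contrasts VALUE LABELS with ADD VALUE LABELS
(@pxref{ADD VALUE LABELS}):

@example
VALUE LABELS /sex 1 'Male' 2 'Female'.
ADD VALUE LABELS /sex 9 'Not stated'.
@end example

The first command replaces whatever labels @code{sex} already had. The
second keeps the labels for 1 and 2 and adds one for 9.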
6438 @node STRING, VARIABLE LABELS, VALUE LABELS, Variable Attributes
6443 STRING /var_list (fmt_spec).
6446 The STRING command creates new string variables for use in
6449 Specify a slash (@samp{/}), followed by the names of the string
6450 variables to create and the desired output format specification in
6451 parentheses (@pxref{Input/Output Formats}). Variable widths are
6452 implicitly derived from the specified output formats.
6454 Created variables are initialized to spaces.
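
A minimal sketch, using a hypothetical variable name, that creates a
string variable 20 characters wide:

@example
STRING /name (A20).
@end example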
6456 @node VARIABLE LABELS, VECTOR, STRING, Variable Attributes
6457 @section VARIABLE LABELS
6458 @vindex VARIABLE LABELS
6462 /var_list 'var_label'.
6465 The VARIABLE LABELS command is used to associate an explanatory name
6466 with a group of variables. This name (a variable label) is displayed by
6467 statistical procedures.
6469 To assign a variable label to a group of variables, specify a slash
6470 (@samp{/}), followed by the list of variable names and the variable
6473 @node VECTOR, WRITE FORMATS, VARIABLE LABELS, Variable Attributes
6478 Two possible syntaxes:
6479 VECTOR vec_name=var_list.
6480 VECTOR vec_name_list(count).
6483 The VECTOR command allows a group of variables to be accessed as if they
6484 were consecutive members of an array with a vector(index) notation.
6486 To make a vector out of a set of existing variables, specify a name for
6487 the vector followed by an equals sign (@samp{=}) and the variables that
6488 belong in the vector.
6490 To make a vector and create variables at the same time, specify one or
6491 more vector names followed by a count in parentheses. This will cause
6492 variables named @code{@var{vec}1} through @code{@var{vec}@var{count}} to
6493 be created as numeric variables. Variable names including numeric
6494 suffixes may not exceed 8 characters in length, and none of the
6495 variables may exist prior to the VECTOR command.
6497 All the variables in a vector must be the same type.
6499 Vectors created with VECTOR disappear after any procedure or
6500 procedure-like command is executed. The variables contained in the
6501 vectors remain, unless they are scratch variables (@pxref{Scratch
6504 Variables within a vector may be referenced in expressions using
6505 vector(index) syntax.
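
The following sketch illustrates both forms of VECTOR. It assumes that
hypothetical numeric variables @code{q1} through @code{q5} already exist
and are adjacent in the dictionary, and that no variables named
@code{v1}, @code{v2}, or @code{v3} exist yet.

@example
VECTOR q=q1 TO q5.
VECTOR v(3).
COMPUTE total=q(1)+q(5).
@end example

The first command groups existing variables, the second creates
@code{v1} through @code{v3}, and the COMPUTE refers to vector elements
by index.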
6507 @node WRITE FORMATS, , VECTOR, Variable Attributes
6508 @section WRITE FORMATS
6509 @vindex WRITE FORMATS
6512 WRITE FORMATS var_list (fmt_spec).
6515 The WRITE FORMATS command sets the write formats for the specified
6516 variables to the specified format specification.
6518 Syntax is identical to that of FORMATS (@pxref{FORMATS}), but the WRITE
6519 FORMATS command sets only write formats, not print formats.
6521 @node Data Manipulation, Data Selection, Variable Attributes, Top
6522 @chapter Data transformations
6523 @cindex transformations
6525 The PSPP procedures examined in this chapter manipulate data and
6526 prepare the active file for later analyses. They do not produce output,
6530 * AGGREGATE:: Summarize multiple cases into a single case.
6531 * AUTORECODE:: Automatic recoding of variables.
6532 * COMPUTE:: Assigning a variable a calculated value.
6533 * COUNT:: Counting variables with particular values.
6534 * FLIP:: Exchange variables with cases.
6535 * IF:: Conditionally assigning a calculated value.
6536 * RECODE:: Mapping values from one set to another.
6537 * SORT CASES:: Sort the active file.
6540 @node AGGREGATE, AUTORECODE, Data Manipulation, Data Manipulation
6548 /OUTFILE=@{*,'filename'@}
6551 /dest_vars=agr_func(src_vars, args@dots{})@dots{}
6554 The AGGREGATE command summarizes groups of cases into single cases.
6555 Cases are divided into groups that have the same values for one or more
6556 variables called @dfn{break variables}. Several functions are available
6557 for summarizing case contents.
6559 BREAK is the only required subcommand (in addition, at least one
6560 aggregation variable must be specified). Specify a list of variable
6561 names. The values of these variables are used to divide the active file
6562 into groups to be summarized.
6564 By default, the active file is sorted based on the break variables
6565 before aggregation takes place. If the active file is already sorted,
6566 specify PRESORTED to save time.
6568 The OUTFILE subcommand specifies a system file by file name string or
6569 file handle (@pxref{FILE HANDLE}). The aggregated cases are sent to
6570 this file. If OUTFILE is not specified, or if @samp{*} is specified,
6571 then the aggregated cases replace the active file.
6573 Normally the aggregate file does not receive the documents from the
6574 active file, even if the aggregate file replaces the active file.
6575 Specify DOCUMENT to have the documents from the active file copied to
6578 At least one aggregation variable must be specified. Specify a list of
6579 aggregation variables, an equals sign (@samp{=}), an aggregation
6580 function name (see the list below), and a list of source variables in
6581 parentheses. In addition, some aggregation functions expect additional
6582 arguments in the parentheses following the source variable names.
6584 There must be exactly as many source variables as aggregation variables.
6585 Each aggregation variable receives the results of applying the specified
6586 aggregation function to the corresponding source variable. Most
6587 aggregation functions may be applied to numeric and short and long
6588 string variables. Others are restricted to numeric values; these are
6589 marked as such in the list below.
6591 Any number of sets of aggregation variables may be specified.
6593 The available aggregation functions are as follows:
6597 Sum. Limited to numeric values.
6598 @item MEAN(var_name)
6599 Arithmetic mean. Limited to numeric values.
6601 Standard deviation of the mean. Limited to numeric values.
6606 @item FGT(var_name, value)
6607 @itemx PGT(var_name, value)
6608 Fraction between 0 and 1, or percentage between 0 and 100, respectively,
6609 of values greater than the specified constant.
6610 @item FLT(var_name, value)
6611 @itemx PLT(var_name, value)
6612 Fraction or percentage, respectively, of values less than the specified
6614 @item FIN(var_name, low, high)
6615 @itemx PIN(var_name, low, high)
6616 Fraction or percentage, respectively, of values within the specified
6617 inclusive range of constants.
6618 @item FOUT(var_name, low, high)
6619 @itemx POUT(var_name, low, high)
6620 Fraction or percentage, respectively, of values strictly outside the
6621 specified range of constants.
6623 Number of non-missing values.
6625 Number of cases aggregated to form this group. Don't supply a source
6626 variable for this aggregation function.
6628 Number of non-missing values. Each case is considered to have a weight
6629 of 1, regardless of the current weighting variable (@pxref{WEIGHT}).
6631 Number of cases aggregated to form this group. Each case is considered
6632 to have a weight of 1, regardless of the current weighting variable.
6633 @item NMISS(var_name)
6634 Number of missing values.
6635 @item NUMISS(var_name)
6636 Number of missing values. Each case is considered to have a weight of
6637 1, regardless of the current weighting variable.
6638 @item FIRST(var_name)
6639 First value in this group.
6640 @item LAST(var_name)
6641 Last value in this group.
6644 Aggregation functions that compare string values do so
6645 in terms of internal character codes. On most modern computers, this is
6648 In addition, there is a parallel set of aggregation functions having the
6649 same names as those above, but with a dot after the last character (for
6650 instance, @samp{SUM.}). These functions are the same as the above,
6651 except that they cause user-missing values, which are normally excluded
6652 from calculations, to be included.
6654 Normally, only a single case (2 for SD and SD.) need be non-missing in
6655 each group in order for the aggregate variable to be non-missing. If
6656 /MISSING=COLUMNWISE is specified, the behavior reverses: that is, a
6657 single missing value is enough to make the aggregate variable become a
6660 AGGREGATE ignores the current SPLIT FILE settings and causes them to be
6661 canceled (@pxref{SPLIT FILE}).
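
As a sketch of how these pieces fit together, the command below
summarizes a hypothetical numeric variable @code{income} within groups
defined by a hypothetical break variable @code{region}. The order of
the BREAK and OUTFILE subcommands shown here is an assumption; check it
against the syntax summary at the top of this section.

@example
AGGREGATE /BREAK=region
          /OUTFILE=*
          /mean_inc=MEAN(income)
          /miss_inc=NMISS(income)
          /pct_high=PGT(income, 50000).
@end example

Because @samp{*} is given on OUTFILE, the aggregated cases, one per
distinct value of @code{region}, replace the active file.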
6663 @node AUTORECODE, COMPUTE, AGGREGATE, Data Manipulation
6668 AUTORECODE VARIABLES=src_vars INTO dest_vars
6673 The AUTORECODE procedure considers the @var{n} values that a variable
6674 takes on and maps them onto values 1@dots{}@var{n} on a new numeric
6677 Subcommand VARIABLES is the only required subcommand and must come
6678 first. Specify VARIABLES, an equals sign (@samp{=}), a list of source
6679 variables, INTO, and a list of target variables. There must be the same
6680 number of source and target variables. The target variables must not
6683 By default, increasing values of a source variable (for a string, this
6684 is based on character code comparisons) are recoded to increasing values
6685 of its target variable. To cause increasing values of a source variable
6686 to be recoded to decreasing values of its target variable (@var{n} down
6687 to 1), specify DESCENDING.
6689 PRINT is currently ignored.
6691 AUTORECODE is a procedure. It causes the data to be read.
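
For example, assuming a hypothetical string variable @code{city}, the
following sketch creates a new numeric variable @code{city_num} in
which the highest value of @code{city} is recoded to 1:

@example
AUTORECODE VARIABLES=city INTO city_num /DESCENDING.
@end example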
6693 @node COMPUTE, COUNT, AUTORECODE, Data Manipulation
6699 COMPUTE var_name = expression.
6702 @code{COMPUTE} creates a variable with the name specified (if
6703 necessary), then evaluates the given expression for every case and
6704 assigns the result to the variable. @xref{Expressions}.
6706 Numeric variables created or computed by @code{COMPUTE} are assigned an
6707 output width of 8 characters with two decimal places (@code{F8.2}).
6708 String variables created or computed by @code{COMPUTE} have the same
6709 width as the existing variable or constant.
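
A minimal sketch, assuming hypothetical numeric variables @code{weight}
and @code{height}:

@example
COMPUTE bmi=weight/height**2.
@end example

If @code{bmi} does not already exist, it is created as a numeric
variable.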
6711 COMPUTE is a transformation. It does not cause the active file to be
6714 @node COUNT, FLIP, COMPUTE, Data Manipulation
6719 COUNT var_name = var@dots{} (value@dots{}).
6721 Each value takes one of the following forms:
6727 In addition, num1 and num2 can be LO or LOWEST, or HI or HIGHEST,
6731 @code{COUNT} creates or replaces a numeric @dfn{target} variable that
6732 counts the occurrence of a @dfn{criterion} value or set of values over
6733 one or more @dfn{test} variables for each case.
6735 The target variable values are always nonnegative integers. They are
6736 never missing. The target variable is assigned an F8.2 output format.
6737 @xref{Input/Output Formats}. Any variables, including long and short
6738 string variables, may be test variables.
6740 User-missing values of test variables are treated just like any other
6741 values. They are @strong{not} treated as system-missing values.
6742 User-missing values that are criterion values or inside ranges of
6743 criterion values are counted as any other values. However (for numeric
6744 variables), keyword @code{MISSING} may be used to refer to all system-
6745 and user-missing values.
6748 @code{COUNT} target variables are assigned values in the order
6749 specified. In the command @code{COUNT A=A B(1) /B=A B(2).}, the
6750 following actions occur:
6754 The number of occurrences of 1 between @code{A} and @code{B} is counted.
6757 @code{A} is assigned this value.
6760 The number of occurrences of 1 between @code{B} and the @strong{new}
6761 value of @code{A} is counted.
6764 @code{B} is assigned this value.
6767 Despite this ordering, all @code{COUNT} criterion variables must exist
6768 before the procedure is executed---they may not be created as target
6769 variables earlier in the command! Break such a command into two
6772 The examples below may help to clarify.
6776 Assuming @code{Q0}, @code{Q1}, @dots{}, @code{Q9} are numeric variables,
6777 the following commands:
6781 Count the number of times the value 1 occurs among these variables
6782 for each case and assign the count to variable @code{QCOUNT}.
6785 Print out the total number of times the value 1 occurs throughout
6786 @emph{all} cases using @code{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for
6791 COUNT QCOUNT=Q0 TO Q9(1).
6792 DESCRIPTIVES QCOUNT /STATISTICS=SUM.
6796 Given these same variables, the following commands:
6800 Count the number of valid values of these variables for each case and
6801 assign the count to variable @code{QVALID}.
6804 Multiply each value of @code{QVALID} by 10 to obtain a percentage of
6805 valid values, using @code{COMPUTE}. @xref{COMPUTE}, for details.
6808 Print out the percentage of valid values across all cases, using
6809 @code{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for details.
6813 COUNT QVALID=Q0 TO Q9 (LO THRU HI).
6814 COMPUTE QVALID=QVALID*10.
6815 DESCRIPTIVES QVALID /STATISTICS=MEAN.
6819 @node FLIP, IF, COUNT, Data Manipulation
6824 FLIP /VARIABLES=var_list /NEWNAMES=var_name.
6827 The FLIP command transposes rows and columns in the active file. It
6828 causes cases to be swapped with variables, and vice versa.
6830 There are no required subcommands. The VARIABLES subcommand specifies
6831 variables that will be transformed into cases. Variables not specified
6832 are discarded. By default, all variables are selected for
6835 The variable specified by NEWNAMES, which must be a string variable, is
6836 used to give names to the variables created by FLIP. If NEWNAMES is not
6837 specified then the default is a variable named CASE_LBL, if it exists.
6838 If it does not then the variables created by FLIP are named VAR000
6839 through VAR999, then VAR1000, VAR1001, and so on.
6841 When a NEWNAMES variable is available, the names must be canonicalized
6842 before becoming variable names. Invalid characters are replaced by
6843 letter @samp{V} in the first position, or by @samp{_} in subsequent
6844 positions. If the name thus generated is not unique, then numeric
6845 extensions are added, starting with 1, until a unique name is found or
6846 there are no remaining possibilities. If the latter occurs then the
6847 FLIP operation aborts.
6849 The resultant dictionary contains a CASE_LBL variable, which stores the
6850 names of the variables in the dictionary before the transposition. If
6851 the active file is subsequently transposed using FLIP, this variable can
6852 be used to recreate the original variable names.
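
As an illustration, the sketch below transposes three hypothetical
numeric variables and takes the new variable names from a hypothetical
string variable @code{id}:

@example
FLIP /VARIABLES=q1 q2 q3 /NEWNAMES=id.
@end example

Under these assumptions the resulting active file would have three
cases, one for each of @code{q1}, @code{q2}, and @code{q3}.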
6854 @node IF, RECODE, FLIP, Data Manipulation
6859 Two possible syntaxes:
6860 IF test_expr target_var=target_expr.
6861 IF test_expr target_vec(target_index)=target_expr.
6864 The IF transformation conditionally assigns the value of a target
6865 expression to a target variable, based on the truth of a test
6868 Specify a boolean-valued expression (@pxref{Expressions}) to be tested
6869 following the IF keyword. This expression is calculated for each case.
6870 If the value is true, then the value of target_expr is computed and
6871 assigned to target_var. If the value is false or missing, nothing is
6872 done. Numeric and short and long string variables may be used. The
6873 type of target_expr must match the type of target_var.
6875 For numeric variables only, target_var need not exist before the IF
6876 transformation is executed. In this case, target_var is assigned the
6877 system-missing value if the IF condition is not true. String variables
6878 must be declared before they can be used as targets for IF.
6880 In addition to ordinary variables, the target variable may be an element
6881 of a vector. In this case, the vector index must be specified in
6882 parentheses following the vector name.
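
For instance, assuming hypothetical numeric variables @code{age} and
@code{income}, the first command below creates @code{senior} on the fly
and sets it to 1 only for cases where the test is true; other cases are
left with the system-missing value.  The second IF assigns to an element
of a vector, here a hypothetical vector @code{adj} declared with VECTOR
beforehand:

@example
IF (age >= 65) senior = 1.

VECTOR adj(3).
IF (income > 0) adj(2) = income * 1.05.
@end example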
6884 @node RECODE, SORT CASES, IF, Data Manipulation
6889 RECODE var_list (src_value@dots{}=dest_value)@dots{} [INTO var_list].
6891 src_value may take the following forms:
6898 Open-ended ranges may be specified using LO or LOWEST for num1
6899 or HI or HIGHEST for num2.
6901 dest_value may take the following forms:
6908 The RECODE command is used to translate data from one range of values to
6909 another, using flexible user-specified mappings. Data may be remapped
6910 in-place or copied to new variables. Numeric, short string, and long
6911 string data can be recoded.
6913 Specify the list of source variables, followed by one or more mapping
6914 specifications each enclosed in parentheses. If the data is to be
6915 copied to new variables, specify INTO, then the list of target
6916 variables. String target variables must already have been declared
6917 using STRING or another transformation, but numeric target variables can
6918 be created on the fly. There must be exactly as many target variables
6919 as source variables. Each source variable is remapped into its
6920 corresponding target variable.
6922 When INTO is not used, the input and output variables must be of the
6923 same type. Otherwise, string values can be recoded into numeric values,
6924 and vice versa. When this is done and there is no mapping for a
6925 particular value, either a value consisting of all spaces or the
6926 system-missing value is assigned, depending on variable type.
6928 Mappings are considered from left to right. The first src_value that
6929 matches the value of the source variable causes the target variable to
6930 receive the value indicated by the dest_value. Literal number, string,
6931 and range src_value's should be self-explanatory. MISSING as a
6932 src_value matches any user- or system-missing value. SYSMIS matches the
6933 system missing value only. ELSE is a catch-all that matches anything.
6934 It should be the last src_value specified.
6936 Numeric and string dest_value's should also be self-explanatory. COPY
causes the input values to be copied to the output. This is only valid
6938 if the source and target variables are of the same type. SYSMIS
6939 indicates the system-missing value.
6941 If the source variables are strings and the target variables are
6942 numeric, then there is one additional mapping available: (CONVERT),
6943 which must be the last specified mapping. CONVERT causes a number
6944 specified as a string to be converted to a numeric value. If the string
6945 cannot be parsed as a number, then the system-missing value is assigned.
6947 Multiple recodings can be specified on the same RECODE command.
6948 Introduce additional recodings with a slash (@samp{/}) in order to
6949 separate them from the previous recodings.
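
A few sketches of these rules follow.  The variable names (@code{age},
@code{agegrp}, @code{txt}, @code{num}, @code{a}, and @code{b}) are
illustrative only.  The first command maps ranges of a numeric variable
into a new numeric variable, the second converts numbers stored as
strings, and the third performs two in-place recodings in one command:

@example
RECODE age (LO THRU 17=1) (18 THRU 64=2) (65 THRU HI=3)
           (ELSE=SYSMIS) INTO agegrp.
RECODE txt (CONVERT) INTO num.
RECODE a (1=2) (2=1) / b (MISSING=0).
@end example

Because mappings are considered from left to right, ELSE is placed last
in the first command so that it catches only values not matched by an
earlier mapping.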
6951 @node SORT CASES, , RECODE, Data Manipulation
6956 SORT CASES BY var_list.
6959 SORT CASES sorts the active file by the values of one or more
6962 Specify BY and a list of variables to sort by. By default, variables
6963 are sorted in ascending order. To override sort order, specify (D) or
6964 (DOWN) after a list of variables to get descending order, or (A) or (UP)
6965 for ascending order. These apply to the entire list of variables
6968 SORT CASES is a procedure. It causes the data to be read.
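
For example, assuming hypothetical variables @code{region} and
@code{sales}, the following sorts the active file by @code{region} in
ascending order and, within each region, by @code{sales} in descending
order:

@example
SORT CASES BY region (A) sales (D).
@end example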
6970 SORT CASES will attempt to sort the entire active file in main memory.
6971 If main memory is exhausted then it will use a merge sort algorithm that
6972 involves writing and reading numerous temporary files. Environment
6973 variables determine the temporary files' location. The first of
6974 SPSSTMPDIR, SPSSXTMPDIR, or TMPDIR that is set determines the location.
6975 Otherwise, if the compiler environment defined P_tmpdir, that is used.
6976 Otherwise, under Unix-like OSes /tmp is used; under MS-DOS, the first of
6977 TEMP, TMP, or root on the current drive is used; under other OSes, the
6980 @node Data Selection, Conditionals and Looping, Data Manipulation, Top
6981 @chapter Selecting data for analysis
6983 This chapter documents PSPP commands that temporarily or permanently
6984 select data records from the active file for analysis.
6987 * FILTER:: Exclude cases based on a variable.
6988 * N OF CASES:: Limit the size of the active file.
6989 * PROCESS IF:: Temporarily excluding cases.
6990 * SAMPLE:: Select a specified proportion of cases.
6991 * SELECT IF:: Permanently delete selected cases.
6992 * SPLIT FILE:: Do multiple analyses with one command.
6993 * TEMPORARY:: Make transformations' effects temporary.
6994 * WEIGHT:: Weight cases by a variable.
6997 @node FILTER, N OF CASES, Data Selection, Data Selection
7006 The FILTER command allows a boolean-valued variable to be used to select
7007 cases from the data stream for processing.
7009 In order to set up filtering, specify BY and a variable name. Keyword
7010 BY is optional but recommended. Cases which have a zero or system- or
7011 user-missing value are excluded from analysis, but not deleted from the
7012 data stream. Cases with other values are analyzed.
7014 Use FILTER OFF to turn off case filtering.
7016 Filtering takes place immediately before cases pass to a procedure for
7017 analysis. Only one filter variable may be active at once. Normally,
7018 case filtering continues until it is explicitly turned off with FILTER
7019 OFF. However, if FILTER is placed after TEMPORARY, then filtering stops
7020 after execution of the next procedure or procedure-like command.
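
As a sketch, assume a hypothetical numeric variable @code{valid} that is
1 for cases to analyze and 0 otherwise, and a variable @code{score} to
be analyzed.  Filtering stays in effect for both procedures below until
it is turned off:

@example
FILTER BY valid.
DESCRIPTIVES /VARIABLES=score.
FREQUENCIES /VARIABLES=score.
FILTER OFF.
@end example

To limit filtering to a single procedure, FILTER can be placed after
TEMPORARY:

@example
TEMPORARY.
FILTER BY valid.
DESCRIPTIVES /VARIABLES=score.
@end example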
7022 @node N OF CASES, PROCESS IF, FILTER, Data Selection
7027 N [OF CASES] num_of_cases [ESTIMATED].
7030 Sometimes you may want to disregard cases of your input. The @code{N}
7031 command can be used to do this. @code{N 100} tells PSPP to
7032 disregard all cases after the first 100.
7034 If the value specified for @code{N} is greater than the number of cases
7035 read in, the value is ignored.
7037 @code{N} does not discard cases or cause them not to be read in. It
7038 just causes cases beyond the last one specified to be ignored by data
7041 A later @code{N} command can increase or decrease the number of cases
7042 selected. (To select all the cases without knowing how many there are,
7043 specify a very high number: 100000 or whatever you think is large enough.)
7045 Transformation procedures performed after @code{N} is executed
7046 @emph{do} cause cases to be discarded.
7048 The @code{SAMPLE}, @code{PROCESS IF}, and @code{SELECT IF} commands have
7049 precedence over @code{N}---the same results are obtained by both of the
7050 following fragments, given the same random number seeds:
7053 @i{@dots{}set up, read in data@dots{}}
7056 @i{@dots{}analyze data@dots{}}
7058 @i{@dots{}set up, read in data@dots{}}
7061 @i{@dots{}analyze data@dots{}}
7064 Both fragments above first randomly sample approximately half of the
7065 cases, then select the first 100 of those sampled.
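
A minimal sketch of two such fragments, differing only in the order of
the two commands, is given here.  The proportion and case count are
chosen to match the description above.  One ordering is:

@example
N OF CASES 100.
SAMPLE .5.
@end example

The other ordering, which is expected to select the same cases given the
same random number seed, is:

@example
SAMPLE .5.
N OF CASES 100.
@end example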
7067 @code{N} with the @code{ESTIMATED} keyword can be used to give an
7068 estimated number of cases before DATA LIST or another command to
7069 read in data. (@code{ESTIMATED} never limits the number of cases
7070 processed by procedures.)
7072 @node PROCESS IF, SAMPLE, N OF CASES, Data Selection
7077 PROCESS IF expression.
7080 The PROCESS IF command is used to temporarily eliminate cases from the
7081 data stream. Its effects are active only through the execution of the
7082 next procedure or procedure-like command.
7084 Specify a boolean expression (@pxref{Expressions}). If the value of the
7085 expression is true for a particular case, the case will be analyzed. If
7086 the expression has a false or missing value, then the case will be
7087 deleted from the data stream for this procedure only.
7089 Regardless of its placement relative to other commands, PROCESS IF
7090 always takes effect immediately before data passes to the procedure.
7091 Only one PROCESS IF command may be in effect at any given time.
The effects of PROCESS IF are similar, but not identical, to the effects of
7094 executing TEMPORARY then SELECT IF (@pxref{SELECT IF}).
7096 Use of PROCESS IF is deprecated. It is included for compatibility with
7097 old command files. New syntax files should use SELECT IF or FILTER
7100 @node SAMPLE, SELECT IF, PROCESS IF, Data Selection
7105 SAMPLE num1 [FROM num2].
7108 @code{SAMPLE} is used to randomly sample a proportion of the cases in
7109 the active file. @code{SAMPLE} is temporary, affecting only the next
7110 procedure, unless that is a data transformation, such as @code{SELECT IF}
7113 The proportion to sample can be expressed as a single number between 0
7114 and 1. If @code{k} is the number specified, and @code{N} is the number
7115 of currently-selected cases in the active file, then after
7116 @code{SAMPLE @var{k}.}, approximately @code{k*N} cases will be
7119 The proportion to sample can also be specified in the style @code{SAMPLE
7120 @var{m} FROM @var{N}}. With this style, cases are selected as follows:
7124 If @var{N} is equal to the number of currently-selected cases in the
7125 active file, exactly @var{m} cases will be selected.
7128 If @var{N} is greater than the number of currently-selected cases in the
7129 active file, an equivalent proportion of cases will be selected.
7132 If @var{N} is less than the number of currently-selected cases in the
active file, exactly @var{m} cases will be selected @emph{from the first
7134 @var{N} cases in the active file.}
7137 @code{SAMPLE}, @code{SELECT IF}, and @code{PROCESS IF} are performed in
7138 the order specified by the syntax file.
7140 @code{SAMPLE} is ignored before @code{SORT CASES}.
7142 @code{SAMPLE} is always performed before @code{N OF CASES}, regardless
7143 of ordering in the syntax file. @xref{N OF CASES}.
7145 The same values for @code{SAMPLE} may result in different samples. To
7146 obtain the same sample, use the @code{SET} command to set the random
7147 number seed to the same value before each @code{SAMPLE}. By default,
7148 the random number seed is based on the system time.
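
For example, the first fragment below keeps roughly a quarter of the
currently-selected cases.  The @code{SET SEED} line is an assumption
about how the seed is spelled on @code{SET}.  @xref{SET}, for the
authoritative syntax.

@example
SET SEED=54321.
SAMPLE .25.
@end example

The second style selects exactly 50 cases when the active file holds 200
currently-selected cases:

@example
SAMPLE 50 FROM 200.
@end example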
7150 @node SELECT IF, SPLIT FILE, SAMPLE, Data Selection
7155 SELECT IF expression.
7158 The SELECT IF command is used to select particular cases for analysis
7159 based on the value of a boolean expression. Cases not selected are
7160 permanently eliminated, unless TEMPORARY is in effect
7161 (@pxref{TEMPORARY}).
7163 Specify a boolean expression (@pxref{Expressions}). If the value of the
7164 expression is true for a particular case, the case will be analyzed. If
7165 the expression has a false or missing value, then the case will be
7166 deleted from the data stream.
7168 Always place SELECT IF commands as early in the command file as
7169 possible. Cases that are deleted early can be processed more
7170 efficiently in time and space.
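
For instance, assuming a hypothetical numeric variable @code{age}, the
following permanently removes every case in which @code{age} is below 18
or missing, since a missing test result also deletes the case:

@example
SELECT IF (age >= 18).
@end example

To restrict the selection to the next procedure only, precede it with
TEMPORARY:

@example
TEMPORARY.
SELECT IF (age >= 18).
DESCRIPTIVES /VARIABLES=age.
@end example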
7172 @node SPLIT FILE, TEMPORARY, SELECT IF, Data Selection
7177 Two possible syntaxes:
7178 SPLIT FILE BY var_list.
7182 The SPLIT FILE command allows multiple sets of data present in one data
7183 file to be analyzed separately using single statistical procedure
7186 Specify a list of variable names in order to analyze multiple sets of
7187 data separately. Groups of cases having the same values for these
7188 variables are analyzed by statistical procedure commands as one group.
7189 An independent analysis is carried out for each group of cases, and the
7190 variable values for the group are printed along with the analysis.
7192 Specify OFF in order to disable SPLIT FILE and resume analysis of the
7193 entire active file as a single group of data.
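
As a sketch with hypothetical variables @code{sex} and @code{income}:
the SORT CASES step reflects an assumption that cases with equal values
should be adjacent in the active file so that each group is analyzed
once.

@example
SORT CASES BY sex.
SPLIT FILE BY sex.
DESCRIPTIVES /VARIABLES=income.
SPLIT FILE OFF.
@end example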
7195 @node TEMPORARY, WEIGHT, SPLIT FILE, Data Selection
7203 The TEMPORARY command is used to make the effects of transformations
7204 following its execution temporary. These transformations will
7205 affect only the execution of the next procedure or procedure-like
7206 command. Their effects will not be saved to the active file.
7208 The only specification is the command name.
7210 TEMPORARY may not appear within a DO IF or LOOP construct. It may
7211 appear only once between procedures and procedure-like commands.
7213 An example may help to clarify:
7232 The data read by the first DESCRIPTIVES command are 4, 5, 8,
10.5, 13, 15. The data read by the second DESCRIPTIVES command are 1, 2,
7236 @node WEIGHT, , TEMPORARY, Data Selection
7245 WEIGHT can be used to assign cases varying weights in order to
7246 change the frequency distribution of the active file. Execution of
7247 WEIGHT is delayed until data have been read in.
7249 If a variable name is specified, WEIGHT causes the values of that
7250 variable to be used as weighting factors for subsequent statistical
7251 procedures. Use of keyword BY is optional but recommended. Weighting
7252 variables must be numeric. Scratch variables may not be used for
7253 weighting (@pxref{Scratch Variables}).
7255 When OFF is specified, subsequent statistical procedures will weight all
7258 Weighting values do not need to be integers. However, negative and
7259 system- and user-missing values for the weighting variable are
7260 interpreted as weighting factors of 0.
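
For example, assuming a hypothetical numeric variable @code{freq} that
holds a count for each case and a variable @code{outcome} to be
tabulated, the fragment below weights one procedure and then restores
equal weighting:

@example
WEIGHT BY freq.
FREQUENCIES /VARIABLES=outcome.
WEIGHT OFF.
@end example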
7262 WEIGHT does not cause cases in the active file to be replicated in
7265 @node Conditionals and Looping, Statistics, Data Selection, Top
7266 @chapter Conditional and Looping Constructs
7267 @cindex conditionals
7269 @cindex flow of control
7270 @cindex control flow
7272 This chapter documents PSPP commands used for conditional execution,
7273 looping, and flow of control.
7276 * BREAK:: Exit a loop.
7277 * DO IF:: Conditionally execute a block of code.
7278 * DO REPEAT:: Textually repeat a code block.
7279 * LOOP:: Repeat a block of code.
7282 @node BREAK, DO IF, Conditionals and Looping, Conditionals and Looping
7290 BREAK terminates execution of the innermost currently executing LOOP
7293 BREAK is allowed only inside a LOOP construct. @xref{LOOP}, for more
7296 @node DO IF, DO REPEAT, BREAK, Conditionals and Looping
7311 The DO IF command allows one of several sets of transformations to be
7312 executed, depending on user-specified conditions.
7314 Specify a boolean expression. If the condition is true, then the block
7315 of code following DO IF is executed. If the condition is missing, then
7316 none of the code blocks is executed. If the condition is false, then
the boolean expression on each ELSE IF, if present, is tested in
7318 turn, with the same rules applied. If all expressions evaluate to
7319 false, then the ELSE code block is executed, if it is present.
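
A short sketch, assuming a hypothetical numeric variable @code{score}
and a target variable @code{grade}:

@example
DO IF (score >= 90).
COMPUTE grade = 1.
ELSE IF (score >= 70).
COMPUTE grade = 2.
ELSE.
COMPUTE grade = 3.
END IF.
@end example

For a case in which @code{score} is missing, the first condition is
missing, so none of the three COMPUTE commands is executed for that
case.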
7321 @node DO REPEAT, LOOP, DO IF, Conditionals and Looping
7326 DO REPEAT repvar_name=expansion@dots{}.
7330 expansion takes one of the following forms:
7335 num_or_range takes one of the following forms:
7340 The DO REPEAT command causes a block of code to be repeated a number of
7341 times with different variables, numbers, or strings textually
7342 substituted into the block with each repetition.
7344 Specify a repeat variable name followed by an equals sign (@samp{=}) and
7345 the list of replacements. Replacements can be a list of variables
7346 (which may be existing variables or new variables or a combination
7347 thereof), of numbers, or of strings. When new variable names are
7348 specified, DO REPEAT creates them as numeric variables. When numbers
7349 are specified, runs of integers may be indicated with TO notation, for
7350 instance @samp{1 TO 5} and @samp{1 2 3 4 5} would be equivalent. There
7351 is no equivalent notation for string values.
7353 Multiple repeat variables can be specified. When this is done, each
7354 variable must have the same number of replacements.
7356 The code within DO REPEAT is repeated as many times as there are
7357 replacements for each variable. The first time, the first value for
7358 each repeat variable is substituted; the second time, the second value
7359 for each repeat variable is substituted; and so on.
7361 Repeat variable substitutions work like macros. They take place
7362 anywhere in a line that the repeat variable name occurs as a token,
7363 including command and subcommand names. For this reason it is not a
7364 good idea to select words commonly used in command and subcommand names
7365 as repeat variable identifiers.
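
As an illustration, the fragment below uses two hypothetical repeat
variables, @code{v} and @code{k}, each with three replacements.  The
slash separating the two repeat variables is an assumption about the
accepted separator.

@example
DO REPEAT v=q1 q2 q3 / k=1 TO 3.
COMPUTE v = k * 10.
END REPEAT.
@end example

After textual substitution this is expected to behave like the following
three commands:

@example
COMPUTE q1 = 1 * 10.
COMPUTE q2 = 2 * 10.
COMPUTE q3 = 3 * 10.
@end example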
7367 If PRINT is specified on END REPEAT, the commands after substitutions
7368 are made are printed to the listing file, prefixed by a plus sign
7371 @node LOOP, , DO REPEAT, Conditionals and Looping
7376 LOOP [index_var=start TO end [BY incr]] [IF condition].
7378 END LOOP [IF condition].
7381 The LOOP command allows a group of commands to be iterated. A number of
7382 termination options are offered.
7384 Specify index_var in order to make that variable count from one value to
7385 another by a particular increment. index_var must be a pre-existing
7386 numeric variable. start, end, and incr are numeric expressions
7387 (@pxref{Expressions}.)
7389 During the first iteration, index_var is set to the value of start.
7390 During each successive iteration, index_var is increased by the value of
7391 incr. If end > start, then the loop terminates when index_var > end;
7392 otherwise it terminates when index_var < end. If incr is not specified
7393 then it defaults to +1 or -1 as appropriate.
7395 If end > start and incr < 0, or if end < start and incr > 0, then the
7396 loop is never executed. index_var is nevertheless set to the value of
7399 Modifying index_var within the loop is allowed, but it has no effect on
7400 the value of index_var in the next iteration.
7402 Specify a boolean expression for the condition on the LOOP command to
7403 cause the loop to be executed only if the condition is true. If the
7404 condition is false or missing before the loop contents are executed the
7405 first time, the loop contents are not executed at all.
7407 If index and condition clauses are both present on LOOP, the index
7408 clause is always evaluated first.
7410 Specify a boolean expression for the condition on the END LOOP to cause
the loop to terminate if the condition is true after the enclosed
7412 code block is executed. The condition is evaluated at the end of the
7413 loop, not at the beginning.
If neither the index clause nor either condition clause is present, then the
7416 loop is executed MXLOOPS (@pxref{SET}) times or until BREAK
7417 (@pxref{BREAK}) is executed.
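
A minimal sketch of an indexed loop follows.  The variables @code{i} and
@code{total} are hypothetical; both are created with COMPUTE first so
that @code{i} exists before LOOP refers to it.

@example
COMPUTE i = 0.
COMPUTE total = 0.
LOOP i=1 TO 9 BY 2.
COMPUTE total = total + i.
END LOOP.
@end example

Here @code{i} takes the values 1, 3, 5, 7, and 9, and the loop stops
once @code{i} would exceed 9, so @code{total} is 25 for each case.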
7419 The BREAK command provides another way to terminate execution of a LOOP
7422 @node Statistics, Utilities, Conditionals and Looping, Top
7425 This chapter documents the statistical procedures that PSPP supports so
7429 * DESCRIPTIVES:: Descriptive statistics.
7430 * FREQUENCIES:: Frequency tables.
7431 * CROSSTABS:: Crosstabulation tables.
7434 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
7435 @section DESCRIPTIVES
7440 /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
7441 /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
7443 /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
7444 SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
7445 SESKEWNESS,SEKURTOSIS@}
7446 /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
7447 RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
7451 The DESCRIPTIVES procedure reads the active file and outputs descriptive
7452 statistics requested by the user. In addition, it can optionally
7455 The VARIABLES subcommand, which is required, specifies the list of
7456 variables to be analyzed. Keyword VARIABLES is optional.
7458 All other subcommands are optional:
The MISSING subcommand determines the handling of missing values. If
7461 INCLUDE is set, then user-missing values are included in the
7462 calculations. If NOINCLUDE is set, which is the default, user-missing
7463 values are excluded. If VARIABLE is set, then missing values are
7464 excluded on a variable by variable basis; if LISTWISE is set, then
7465 the entire case is excluded whenever any value in that case has a
system-missing or, if INCLUDE is not set, user-missing value.
7468 The FORMAT subcommand affects the output format. Currently the
LABELS/NOLABELS and NOINDEX/INDEX settings are not used. When SERIAL is
7470 set, both valid and missing number of cases are listed in the output;
7471 when NOSERIAL is set, only valid cases are listed.
7473 The SAVE subcommand causes DESCRIPTIVES to calculate Z scores for all
7474 the specified variables. The Z scores are saved to new variables.
7475 Variable names are generated by trying first the original variable name
7476 with Z prepended and truncated to a maximum of 8 characters, then the
7477 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
7478 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence. In addition, Z score
7479 variable names can be specified explicitly on VARIABLES in the variable
7480 list by enclosing them in parentheses after each variable.
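
For example, assuming hypothetical numeric variables @code{height} and
@code{weight}, the following command drops any case with a missing value
in either variable and saves Z scores:

@example
DESCRIPTIVES /VARIABLES=height weight
             /MISSING=LISTWISE
             /SAVE.
@end example

Following the naming rule above, the Z scores would be expected in new
variables named ZHEIGHT and ZWEIGHT, provided those names are not
already in use.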
7482 The STATISTICS subcommand specifies the statistics to be displayed:
7486 All of the statistics below.
7490 Standard error of the mean.
7496 Kurtosis and standard error of the kurtosis.
7498 Skewness and standard error of the skewness.
7508 Mean, standard deviation of the mean, minimum, maximum.
7510 Standard error of the kurtosis.
7512 Standard error of the skewness.
7515 The SORT subcommand specifies how the statistics should be sorted. Most
7516 of the possible values should be self-explanatory. NAME causes the
7517 statistics to be sorted by name. By default, the statistics are listed
7518 in the order that they are specified on the VARIABLES subcommand. The A
7519 and D settings request an ascending or descending sort order,
7522 @node FREQUENCIES, CROSSTABS, DESCRIPTIVES, Statistics
7523 @section FREQUENCIES
7528 /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
7529 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
7531 @{AVALUE,DVALUE,AFREQ,DFREQ@}
7534 /MISSING=@{EXCLUDE,INCLUDE@}
7535 /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
7536 KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
7537 SESKEWNESS,SEKURTOSIS,ALL,NONE@}
7539 /PERCENTILES=percent@dots{}
7541 (These options are not currently implemented.)
7548 /VARIABLES=var_list (low,high)@dots{}
7551 FREQUENCIES causes the data to be read and frequency tables to be built
7552 and output for specified variables. FREQUENCIES can also calculate and
7553 display descriptive statistics (including median and mode) and
7556 In the future, FREQUENCIES will also support graphical output in the
7557 form of bar charts and histograms. In addition, it will be able to
7558 support percentiles for grouped data. (As a historical note, these
7559 options were supported in a version of PSPP written years ago, but the
7560 code has not survived.)
7562 The VARIABLES subcommand is the only required subcommand. Specify the
7563 variables to be analyzed. In most cases, this is all that is required.
7564 This is known as @dfn{general mode}.
7566 Occasionally, one may want to invoke a special mode called @dfn{integer
7567 mode}. Normally, in general mode, PSPP will automatically determine
7568 what values occur in the data. In integer mode, the user specifies the
7569 range of values that the data assumes. To invoke this mode, specify a
7570 range of data values in parentheses, separated by a comma. Data values
7571 inside the range are truncated to the nearest integer, then assigned to
7572 that value. If values occur outside this range, they are discarded.
7574 The FORMAT subcommand controls the output format. It has several
7579 TABLE, the default, causes a frequency table to be output for every
7580 variable specified. NOTABLE prevents them from being output. LIMIT
7581 with a numeric argument causes them to be output except when there are
7582 more than the specified number of values in the table.
STANDARD frequency tables contain more complete information, but also
7586 take up more space on the printed page. CONDENSE frequency tables are
7587 less informative but take up less space. ONEPAGE with a numeric
7588 argument will output standard frequency tables if there are the
specified number of values or fewer, condensed tables otherwise. ONEPAGE
7590 without an argument defaults to a threshold of 50 values.
7593 LABELS causes value labels to be displayed in STANDARD frequency
tables. NOLABELS prevents this.
7597 Normally frequency tables are sorted in ascending order by value. This
7598 is AVALUE. DVALUE tables are sorted in descending order by value.
7599 AFREQ and DFREQ tables are sorted in ascending and descending order,
7600 respectively, by frequency count.
7603 SINGLE spaced frequency tables are closely spaced. DOUBLE spaced
7604 frequency tables have wider spacing.
7607 OLDPAGE and NEWPAGE are not currently used.
7610 The MISSING subcommand controls the handling of user-missing values.
7611 When EXCLUDE, the default, is set, user-missing values are not included
7612 in frequency tables or statistics. When INCLUDE is set, user-missing
values are included. System-missing values are never included in statistics,
7614 but are listed in frequency tables.
7616 The available STATISTICS are the same as available in DESCRIPTIVES
7617 (@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
7618 value, and MODE, the mode. (If there are multiple modes, the smallest
7619 value is reported.) By default, the mean, standard deviation of the
7620 mean, minimum, and maximum are reported for each variable.
NTILES causes the specified quantiles to be reported. For instance,
7623 @code{/NTILES=4} would cause quartiles to be reported. In addition,
7624 particular percentiles can be requested with the PERCENTILES subcommand.
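
As a sketch in general mode, assuming a hypothetical variable
@code{grade}, the following command sorts the frequency table by
descending frequency and reports the median, the mode, and quartiles:

@example
FREQUENCIES /VARIABLES=grade
            /FORMAT=DFREQ
            /STATISTICS=MEDIAN MODE
            /NTILES=4.
@end example

Integer mode is requested by giving a range after the variable, here for
a hypothetical variable @code{rating} coded from 1 to 5:

@example
FREQUENCIES /VARIABLES=rating (1,5).
@end example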
7626 @node CROSSTABS, , FREQUENCIES, Statistics
7631 /TABLES=var_list BY var_list [BY var_list]@dots{}
7632 /MISSING=@{TABLE,INCLUDE,REPORT@}
7633 /WRITE=@{NONE,CELLS,ALL@}
7634 /FORMAT=@{TABLES,NOTABLES@}
7635 @{LABELS,NOLABELS,NOVALLABS@}
7640 /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
7641 ASRESIDUAL,ALL,NONE@}
7642 /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
7643 KAPPA,ETA,CORR,ALL,NONE@}
7646 /VARIABLES=var_list (low,high)@dots{}
7649 CROSSTABS reads the active file and builds and displays crosstabulation
7650 tables requested by the user. It can calculate several statistics for
7651 each cell in the crosstabulation tables. In addition, a number of
7652 statistics can be calculated for each table itself.
7654 The TABLES subcommand is used to specify the tables to be reported. Any
7655 number of dimensions is permitted, and any number of variables per
7656 dimension is allowed. The TABLES subcommand may be repeated as many
7657 times as needed. This is the only required subcommand in @dfn{general
7660 Occasionally, one may want to invoke a special mode called @dfn{integer
7661 mode}. Normally, in general mode, PSPP will automatically determine
7662 what values occur in the data. In integer mode, the user specifies the
7663 range of values that the data assumes. To invoke this mode, specify the
7664 VARIABLES subcommand, giving a range of data values in parentheses for
7665 each variable to be used on the TABLES subcommand. Data values inside
7666 the range are truncated to the nearest integer, then assigned to that
7667 value. If values occur outside this range, they are discarded. When it
7668 is present, the VARIABLES subcommand must precede the TABLES subcommand.
7670 The MISSING subcommand determines the handling of user-missing values.
7671 When set to TABLE, the default, missing values are dropped on a table by
7672 table basis. When set to INCLUDE, user-missing values are included in
7673 tables and statistics. When set to REPORT, which is allowed only in
7674 integer mode, user-missing values are included in tables but marked with
7675 an @samp{M} (for ``missing'') and excluded from statistical
7678 Currently the WRITE subcommand is not used.
7680 The FORMAT subcommand controls the characteristics of the
7681 crosstabulation tables to be displayed. It has a number of possible
7686 TABLES, the default, causes crosstabulation tables to be output.
7687 NOTABLES suppresses them.
7690 LABELS, the default, allows variable labels and value labels to appear
7691 in the output. NOLABELS suppresses them. NOVALLABS displays variable
7692 labels but suppresses value labels.
7695 PIVOT, the default, causes each TABLES subcommand to be displayed in a
7696 pivot table format. NOPIVOT causes the old-style crosstabulation format
7700 AVALUE, the default, causes values to be sorted in ascending order.
7701 DVALUE asserts a descending sort order.
7704 INDEX/NOINDEX is currently ignored.
7707 BOX/NOBOX is currently ignored.
7710 The CELLS subcommand controls the contents of each cell in the displayed
7711 crosstabulation table. The possible settings are:
7727 Standardized residual.
7729 Adjusted standardized residual.
7733 Suppress cells entirely.
7736 @samp{/CELLS} without any settings specified requests COUNT, ROW,
7737 COLUMN, and TOTAL. If CELLS is not specified at all then only COUNT
7740 The STATISTICS subcommand selects statistics for computation:
7744 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
7745 correction, linear-by-linear association.
7749 Contingency coefficient.
7753 Uncertainty coefficient.
7769 Spearman correlation, Pearson's r.
7776 Selected statistics are only calculated when appropriate for the
7777 statistic. Certain statistics require tables of a particular size, and
7778 some statistics are calculated only in integer mode.
7780 @samp{/STATISTICS} without any settings selects CHISQ. If the
7781 STATISTICS subcommand is not given, no statistics are calculated.
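
For example, assuming hypothetical variables @code{sex} and
@code{region}, the following general mode command displays counts with
row and column percentages and requests the chi-square statistics:

@example
CROSSTABS /TABLES=sex BY region
          /CELLS=COUNT ROW COLUMN
          /STATISTICS=CHISQ.
@end example

The same table in integer mode, with value ranges that are assumptions
about how the hypothetical variables are coded:

@example
CROSSTABS /VARIABLES=sex (1,2) region (1,4)
          /TABLES=sex BY region
          /MISSING=REPORT.
@end example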
7783 @strong{Please note:} Currently the implementation of CROSSTABS has the
7788 Pearson's R (but not Spearman!) is off a little.
7790 T values for Spearman's R and Pearson's R are wrong.
7792 How to calculate significance of symmetric and directional measures?
7794 Asymmetric ASEs and T values for lambda are wrong.
7796 ASE of Goodman and Kruskal's tau is not calculated.
ASE of symmetric Somers' d is wrong.
7800 Approx. T of uncertainty coefficient is wrong.
Fixes for any of these deficiencies would be welcome.
7805 @node Utilities, Not Implemented, Statistics, Top
7808 Commands that don't fit any other category are placed here.
7810 Most of these commands are not affected by commands like IF and LOOP:
7811 they take effect only once, unconditionally, at the time that they are
7812 encountered in the input.
7815 * COMMENT:: Document your syntax file.
7816 * DOCUMENT:: Document the active file.
7817 * DISPLAY DOCUMENTS:: Display active file documents.
7818 * DISPLAY FILE LABEL:: Display the active file label.
7819 * DROP DOCUMENTS:: Remove documents from the active file.
7820 * EXECUTE:: Execute pending transformations.
7821 * FILE LABEL:: Set the active file's label.
7822 * INCLUDE:: Include a file within the current one.
7823 * QUIT:: Terminate the PSPP session.
7824 * SET:: Adjust PSPP runtime parameters.
7825 * SUBTITLE:: Provide a document subtitle.
7826 * SYSFILE INFO:: Display the dictionary in a system file.
7827 * TITLE:: Provide a document title.
7830 @node COMMENT, DOCUMENT, Utilities, Utilities
Two possible syntaxes:
7837 COMMENT comment text @dots{} .
7838 *comment text @dots{} .
7841 The COMMENT command is ignored. It is used to provide information to
7842 the author and other readers of the PSPP syntax file.
7844 A COMMENT command can extend over any number of lines. Don't forget to
7845 terminate it with a dot or a blank line!
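
Both forms are sketched below with made-up comment text.  The first
comment extends over two lines and ends at the dot:

@example
COMMENT Prepare the survey data
   before running the regional analysis.
* Cases without an identifier are dropped below.
@end example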
7847 @node DOCUMENT, DISPLAY DOCUMENTS, COMMENT, Utilities
7852 DOCUMENT documentary_text.
7855 The DOCUMENT command adds one or more lines of descriptive commentary to
7856 the active file. Documents added in this way are saved to system files.
7857 They can be viewed using SYSFILE INFO or DISPLAY DOCUMENTS. They can be
7858 removed from the active file with DROP DOCUMENTS.
7860 Specify the documentary text following the DOCUMENT keyword. You can
7861 extend the documentary text over as many lines as necessary. Lines are
7862 truncated to 80 characters in width. Don't forget to terminate the
7863 DOCUMENT command with a dot or a blank line.
7865 @node DISPLAY DOCUMENTS, DISPLAY FILE LABEL, DOCUMENT, Utilities
7866 @section DISPLAY DOCUMENTS
7867 @vindex DISPLAY DOCUMENTS
7873 DISPLAY DOCUMENTS displays the documents in the active file. Each
7874 document is preceded by a line giving the time and date that it was
7875 added. @xref{DOCUMENT}.
7877 @node DISPLAY FILE LABEL, DROP DOCUMENTS, DISPLAY DOCUMENTS, Utilities
7878 @section DISPLAY FILE LABEL
7879 @vindex DISPLAY FILE LABEL
7885 DISPLAY FILE LABEL displays the file label contained in the active file,
7886 if any. @xref{FILE LABEL}.
7888 @node DROP DOCUMENTS, EXECUTE, DISPLAY FILE LABEL, Utilities
7889 @section DROP DOCUMENTS
7890 @vindex DROP DOCUMENTS
7896 The DROP DOCUMENTS command removes all documents from the active file.
7897 New documents can be added with the DOCUMENT utility (@pxref{DOCUMENT}).
7899 DROP DOCUMENTS only changes the active file. It does not modify any
7900 system files stored on disk.
7902 @node EXECUTE, FILE LABEL, DROP DOCUMENTS, Utilities
7910 The EXECUTE utility causes the active file to be read and all pending
7911 transformations to be executed.
7913 @node FILE LABEL, FINISH, EXECUTE, Utilities
7918 FILE LABEL file_label.
7921 Use the FILE LABEL command to provide a title for the active file. This
7922 title will be saved into system files and portable files that are
7923 created during this PSPP run.
7925 It is not necessary to include quotes around file_label. If they are
7926 included then they become part of the file label.
7930 @node FINISH, INCLUDE, FILE LABEL, Utilities
7938 The FINISH command terminates the current PSPP session and returns
7939 control to the operating system.
7941 This command is not valid in interactive mode.
7944 @node INCLUDE, QUIT, FINISH, Utilities
7950 Two possible syntaxes:
7955 The INCLUDE command causes the PSPP command processor to read an
7956 additional command file as if it were included bodily in the current
7959 INCLUDE files may be nested to any depth, up to the limit of available
7962 @node QUIT, SET, INCLUDE, Utilities
7967 Two possible syntaxes:
7972 The QUIT command terminates the current PSPP session and returns control
7973 to the operating system.
7975 This command is not valid within a command file.
7977 @node SET, SUBTITLE, QUIT, Utilities
7985 /BLANKS=@{SYSMIS,'.',number@}
7986 /DECIMAL=@{DOT,COMMA@}
7994 /CPROMPT='cprompt_string'
7995 /DPROMPT='dprompt_string'
7996 /ERRORBREAK=@{OFF,ON@}
7998 /MXWARNS=max_warnings
8000 /VIEWLENGTH=@{MINIMUM,MEDIAN,MAXIMUM,n_lines@}
8001 /VIEWWIDTH=n_characters
8005 /MITERATE=max_iterations
8009 /SEED=@{RANDOM,seed_value@}
8010 /UNDEFINED=@{WARN,NOWARN@}
8013 /CC@{A,B,C,D,E@}=@{'npre,pre,suf,nsuf','npre.pre.suf.nsuf'@}
8014 /DECIMAL=@{DOT,COMMA@}
8019 /ERRORS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8021 /MESSAGES=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8022 /PRINTBACK=@{ON,OFF@}
8023 /RESULTS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8030 (output driver options)
8031 /HEADERS=@{NO,YES,BLANK@}
8032 /LENGTH=@{NONE,length_in_lines@}
8035 /PAGER=@{OFF,"pager_name"@}
8036 /WIDTH=@{NARROW,WIDE,n_characters@}
8039 /JOURNAL=@{ON,OFF@} [filename]
8040 /LOG=@{ON,OFF@} [filename]
8043 /COMPRESSION=@{ON,OFF@}
8044 /SCOMPRESSION=@{ON,OFF@}
8049 (obsolete settings accepted for compatibility, but ignored)
8050 /AUTOMENU=@{ON,OFF@}
8053 /BOXSTRING=@{'xxx','xxxxxxxxxxx'@}
8054 /CASE=@{UPPER,UPLOW@}
8059 /HELPWINDOWS=@{ON,OFF@}
8062 /LOWRES=@{AUTO,ON,OFF@}
8064 /MENUS=@{STANDARD,EXTENDED@}
8065 /MXMEMORY=max_memory
8066 /PTRANSLATE=@{ON,OFF@}
8068 /RUNREVIEW=@{AUTO,MANUAL@}
8070 /TB1=@{'xxx','xxxxxxxxxxx'@}
8072 /WORKDEV=drive_letter
8073 /WORKSPACE=workspace_size
8077 The SET command allows the user to adjust several parameters relating to
8078 PSPP's execution. Since there are many subcommands to this command, its
8079 subcommands will be examined in groups.
8081 As a general comment, ON and YES are considered synonymous, and
8082 so are OFF and NO, when used as subcommand values.
8084 The data input subcommands affect the way that data is read from data
8085 files. The data input subcommands are
8089 This is the value assigned to a data item that is empty or
8090 contains only whitespace. An argument of SYSMIS or '.' will cause the
8091 system-missing value to be assigned to null items. This is the
8092 default. Any real value may be assigned.
8095 The default DOT setting causes the decimal point character to be
8096 @samp{.}. A setting of COMMA causes the decimal point character to be
8100 Allows the default numeric input/output format to be specified. The
8101 default is F8.2. @xref{Input/Output Formats}.
8104 Program input subcommands affect the way that programs are parsed when
8105 they are typed interactively or run from a script. They are
8109 This is a single character indicating the end of a command. The default
8110 is @samp{.}. Don't change this.
8113 Whether a blank line is interpreted as ending the current command. The
8117 Interaction subcommands affect the way that PSPP interacts with an
8118 online user. The interaction subcommands are
8122 The command continuation prompt. The default is @samp{ > }.
8125 Prompt used when expecting data input within BEGIN DATA (@pxref{BEGIN
8126 DATA}). The default is @samp{data> }.
8129 Whether an error causes PSPP to stop processing the current command
8130 file after finishing the current command. The default is OFF.
8133 The maximum number of errors before PSPP halts processing of the current
8134 command file. The default is 50.
8137 The maximum number of warnings and errors combined before PSPP halts processing the
8138 current command file. The default is 100.
8141 The command prompt. The default is @samp{PSPP> }.
8144 The length of the screen in lines. MINIMUM means 25 lines, MEDIAN and
8145 MAXIMUM mean 43 lines. Otherwise specify the number of lines. Normally
8146 PSPP should auto-detect your screen size so this shouldn't have to be
8150 The width of the screen in characters. Normally 80 or 132.
8153 Program execution subcommands control the way that PSPP commands
8154 execute. The program execution subcommands are
8164 The maximum number of iterations for an uncontrolled loop.
8167 The initial pseudo-random number seed. Set to a real number or to
8168 RANDOM, which will obtain an initial seed from the current time of day.
8174 Data output subcommands affect the format of output data. These
8183 Set up custom currency formats. The argument is a string which must
8184 contain exactly three commas or exactly three periods. If commas, then
8185 the grouping character for the currency format is @samp{,}, and the
8186 decimal point character is @samp{.}; if periods, then the situation is
8189 The commas or periods divide the string into four fields, which are, in
8190 order, the negative prefix, prefix, suffix, and negative suffix. When a
8191 value is formatted using the custom currency format, the prefix precedes
8192 the value formatted and the suffix follows it. In addition, if the
8193 value is negative, the negative prefix precedes the prefix and the
8194 negative suffix follows the suffix.
8197 The default DOT setting causes the decimal point character to be
8198 @samp{.}. A setting of COMMA causes the decimal point character to be
8202 Allows the default numeric input/output format to be specified. The
8203 default is F8.2. @xref{Input/Output Formats}.
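The custom currency rules described for the CC subcommands above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not PSPP's implementation: the names @code{parse_cc} and @code{format_cc} are hypothetical, and the number formatting is deliberately simplified (two decimal places, no digit grouping).

```python
def parse_cc(spec):
    """Split a custom currency string into its four fields.

    Returns (neg_prefix, prefix, suffix, neg_suffix, grouping, decimal).
    """
    if spec.count(",") == 3 and spec.count(".") != 3:
        grouping, decimal = ",", "."      # commas separate the fields
    elif spec.count(".") == 3 and spec.count(",") != 3:
        grouping, decimal = ".", ","      # periods separate the fields
    else:
        raise ValueError("need exactly three commas or exactly three periods")
    neg_prefix, prefix, suffix, neg_suffix = spec.split(grouping)
    return neg_prefix, prefix, suffix, neg_suffix, grouping, decimal


def format_cc(value, spec):
    """Format value with the custom currency described by spec (simplified)."""
    neg_prefix, prefix, suffix, neg_suffix, _, decimal = parse_cc(spec)
    # Simplification: fixed two decimals, no grouping characters inserted.
    body = ("%.2f" % abs(value)).replace(".", decimal)
    text = prefix + body + suffix
    if value < 0:
        # The negative prefix goes before the prefix, and the negative
        # suffix goes after the suffix.
        text = neg_prefix + text + neg_suffix
    return text
```

For instance, with the string @code{(,$,,)} a negative value is wrapped in parentheses around the dollar-prefixed number, while a positive value receives only the @code{$} prefix.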
8206 Output routing subcommands affect where the output of transformations
8207 and procedures is sent. These subcommands are
8212 If turned on, commands are written to the listing file as they are read
8213 from command files. The default is OFF.
8223 Output activation subcommands affect whether output devices of
8224 particular types are enabled. These subcommands are
8228 Enable or disable listing devices.
8231 Enable or disable printer devices.
8234 Enable or disable screen devices.
8237 Output driver option subcommands affect output drivers' settings. These
8250 Logging subcommands affect logging of commands executed to external
8251 files. These subcommands are
8259 System file subcommands affect the default format of system files
8260 produced by PSPP. These subcommands are
8267 Whether system files created by SAVE or XSAVE are compressed by default.
8271 Security subcommands affect the operations that commands are allowed to
8272 perform. The security subcommands are
8276 When set, this setting cannot ever be reset, for obvious security
8277 reasons. Setting this option disables the following operations:
8285 Pipe filenames (filenames beginning or ending with @samp{|}).
8288 Be aware that this setting does not guarantee safety (commands can still
8289 overwrite files, for instance) but it is an improvement.
8292 @node SUBTITLE, TITLE, SET, Utilities
8297 Two possible syntaxes:
8298 SUBTITLE 'subtitle_string'.
8299 SUBTITLE subtitle_string.
8302 The SUBTITLE command is used to provide a subtitle to a particular PSPP
8303 run. This subtitle appears at the top of each output page below the
8304 title, if headers are enabled on the output device.
8306 Specify a subtitle as a string in quotes. The alternate syntax that did
8307 not require quotes is now obsolete. If it is used then the subtitle is
8308 converted to all uppercase.
8310 @node TITLE, , SUBTITLE, Utilities
8315 Two possible syntaxes:
8316 TITLE 'title_string'.
8320 The TITLE command is used to provide a title to a particular PSPP run.
8321 This title appears at the top of each output page, if headers are enabled
8322 on the output device.
8324 Specify a title as a string in quotes. The alternate syntax that did
8325 not require quotes is now obsolete. If it is used then the title is
8326 converted to all uppercase.
8328 @node Not Implemented, Data File Format, Utilities, Top
8329 @chapter Not Implemented
8331 This chapter lists parts of the PSPP language that are not yet
8334 The following transformations and utilities are not yet implemented, but
8335 they will be supported in a later release.
8368 The following transformations and utilities are not implemented. There
8369 are no plans to support them in future releases. Contributions to
8370 implement them will still be accepted.
8392 NUMBERED and UNNUMBERED
8405 @node Data File Format, Portable File Format, Not Implemented, Top
8406 @chapter Data File Format
8408 PSPP necessarily uses the same format for system files as do the
8409 products with which it is compatible. This chapter is a description of
8412 There are three data types used in system files: 32-bit integers, 64-bit
8413 floating-point numbers, and 1-byte characters. In this document these will
8414 simply be referred to as @code{int32}, @code{flt64}, and @code{char},
8415 the names that are used in the PSPP source code. Every field of type
8416 @code{int32} or @code{flt64} is aligned on a 32-bit boundary.
8418 The endianness of data in PSPP system files is not specified. System
8419 files output on a computer of a particular endianness will have the
8420 endianness of that computer. However, PSPP can read files of either
8421 endianness, regardless of its host computer's endianness. PSPP
8422 translates endianness for both integer and floating point numbers.
8424 Floating point formats are also not specified. PSPP does not
8425 translate between floating point formats. This is unlikely to be a
8426 problem as all modern computer architectures use IEEE 754 format for
8427 floating point representation.
8429 The PSPP system-missing value is represented by the negative number of
8430 largest magnitude in the floating point format; in C, this is most likely
8431 @code{-DBL_MAX}. There are two other important values used in missing
8432 values: @code{HIGHEST} and @code{LOWEST}. These are represented by the
8433 largest possible positive number (probably @code{DBL_MAX}) and the
8434 negative number of second-largest magnitude. The latter must be determined in a
8435 system-dependent manner; in IEEE 754 format it is represented by value
8436 @code{0xffeffffffffffffe}.
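The three special values can be inspected with a small Python sketch, assuming an IEEE 754 host. The names @code{SYSMIS}, @code{HIGHEST}, @code{LOWEST}, and @code{bits} are chosen here for illustration only.

```python
import struct
import sys

# System-missing: -DBL_MAX on an IEEE 754 host.
SYSMIS = -sys.float_info.max
# HIGHEST: DBL_MAX.
HIGHEST = sys.float_info.max
# LOWEST: built directly from the bit pattern quoted in the text.
LOWEST = struct.unpack(">d", bytes.fromhex("ffeffffffffffffe"))[0]


def bits(x):
    """Return the big-endian IEEE 754 bit pattern of x as hex text."""
    return struct.pack(">d", x).hex()
```

Calling @code{bits(SYSMIS)} yields @code{ffefffffffffffff}, which differs from the @code{LOWEST} pattern only in its last bit, so @code{LOWEST} is the finite value immediately above the system-missing value.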
8438 System files are divided into records. Each record begins with an
8439 @code{int32} giving a numeric record type. Individual record types are
8443 * File Header Record::
8445 * Value Label Record::
8446 * Value Label Variable Record::
8448 * Machine int32 Info Record::
8449 * Machine flt64 Info Record::
8450 * Miscellaneous Informational Records::
8451 * Dictionary Termination Record::
8455 @node File Header Record, Variable Record, Data File Format, Data File Format
8456 @section File Header Record
8458 The file header is always the first record in the file.
8461 struct sysfile_header
8471 char creation_date[9];
8472 char creation_time[8];
8473 char file_label[64];
8479 @item char rec_type[4];
8480 Record type code. Always set to @samp{$FL2}. This is the only record
8481 for which the record type is not of type @code{int32}.
8483 @item char prod_name[60];
8484 Product identification string. This always begins with the characters
8485 @samp{@@(#) SPSS DATA FILE}. PSPP uses the remaining characters to
8486 give its version and the operating system name; for example, @samp{GNU
8487 pspp 0.1.4 - sparc-sun-solaris2.5.2}. The string is truncated if it
8488 would be longer than 60 characters; otherwise it is padded on the right
8491 @item int32 layout_code;
8492 Always set to 2. PSPP reads this value in order to determine the
8495 @item int32 case_size;
8496 Number of data elements per case. This is the number of variables,
8497 except that long string variables add extra data elements (one for every
8498 8 characters after the first 8).
8500 @item int32 compressed;
8501 Set to 1 if the data in the file is compressed, 0 otherwise.
8503 @item int32 weight_index;
8504 If one of the variables in the data set is used as a weighting variable,
8505 set to the index of that variable. Otherwise, set to 0.
8508 Set to the number of cases in the file if it is known, or -1 otherwise.
8510 In the general case it is not possible to determine the number of cases
8511 that will be output to a system file at the time that the header is
8512 written. The way that this is dealt with is by writing the entire
8513 system file, including the header, then seeking back to the beginning of
8514 the file and writing just the @code{ncases} field. For `files' in which
8515 this is not valid, the seek operation fails. In this case,
8516 @code{ncases} remains -1.
8519 Compression bias. Always set to 100. The significance of this value is
8520 that only numbers between @code{(1 - bias)} and @code{(251 - bias)} can
8523 @item char creation_date[9];
8524 Set to the date of creation of the system file, in @samp{dd mmm yy}
8525 format, with the month as standard English abbreviations, using an
8526 initial capital letter and following with lowercase. If the date is not
8527 available then this field is arbitrarily set to @samp{01 Jan 70}.
8529 @item char creation_time[8];
8530 Set to the time of creation of the system file, in @samp{hh:mm:ss}
8531 format and using 24-hour time. If the time is not available then this
8532 field is arbitrarily set to @samp{00:00:00}.
8534 @item char file_label[64];
8535 Set to the file label declared by the user, if any. Padded on the
8538 @item char padding[3];
8539 Ignored padding bytes to make the structure a multiple of 32 bits in
8540 length. Set to zeros.
8543 @node Variable Record, Value Label Record, File Header Record, Data File Format
8544 @section Variable Record
8546 Immediately following the header must come the variable records. There
8547 must be one variable record for every variable and every 8 characters in
8548 a long string beyond the first 8; i.e., there must be exactly as many
8549 variable records as the value specified for @code{case_size} in the file
8553 struct sysfile_variable
8557 int32 has_var_label;
8558 int32 n_missing_values;
8563 /* The following two fields are present
8564 only if has_var_label is 1. */
8566 char label[/* variable length */];
8568 /* The following field is present only
8569 if n_missing_values is not 0. */
8570 flt64 missing_values[/* variable length*/];
8575 @item int32 rec_type;
8576 Record type code. Always set to 2.
8579 Variable type code. Set to 0 for a numeric variable. For a short
8580 string variable or the first part of a long string variable, this is set
8581 to the width of the string. For the second and subsequent parts of a
8582 long string variable, set to -1, and the remaining fields in the
8583 structure are ignored.
8585 @item int32 has_var_label;
8586 If this variable has a variable label, set to 1; otherwise, set to 0.
8588 @item int32 n_missing_values;
8589 If the variable has no missing values, set to 0. If the variable has
8590 one, two, or three discrete missing values, set to 1, 2, or 3,
8591 respectively. If the variable has a range of missing values, set to
8592 -2; if the variable has a range of missing values plus a single
8593 discrete value, set to -3.
8596 Print format for this variable. See below.
8599 Write format for this variable. See below.
8602 Variable name. The variable name must begin with a capital letter or
8603 the at-sign (@samp{@@}). Subsequent characters may also be octothorpes
8604 (@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full
8605 stops (@samp{.}). The variable name is padded on the right with spaces.
8607 @item int32 label_len;
8608 This field is present only if @code{has_var_label} is set to 1. It is
8609 set to the length, in characters, of the variable label, which must be a
8610 number between 0 and 120.
8612 @item char label[/* variable length */];
8613 This field is present only if @code{has_var_label} is set to 1. It has
8614 length @code{label_len}, rounded up to the nearest multiple of 32 bits.
8615 The first @code{label_len} characters are the variable's variable label.
8617 @item flt64 missing_values[/* variable length */];
8618 This field is present only if @code{n_missing_values} is not 0. It has
8619 the same number of elements as the absolute value of
8620 @code{n_missing_values}. For discrete missing values, each element
8621 represents one missing value. When a range is present, the first
8622 element denotes the minimum value in the range, and the second element
8623 denotes the maximum value in the range. When a range plus a value are
8624 present, the third element denotes the additional discrete missing
8625 value. HIGHEST and LOWEST are indicated as described in the chapter
8629 The @code{print} and @code{write} members of sysfile_variable are output
8630 formats coded into @code{int32} types. The LSB (least-significant byte)
8631 of the @code{int32} represents the number of decimal places, and the
8632 next two bytes in order of increasing significance represent field width
8633 and format type, respectively. The MSB (most-significant byte) is not
8634 used and should be set to zero.
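The byte layout just described can be sketched in Python as follows. The function names are hypothetical, and the format type code 5 used in the usage note is only sample input, not a statement about which format it denotes.

```python
def decode_format(spec):
    """Split a print/write format int32 into (fmt_type, width, decimals)."""
    decimals = spec & 0xFF            # least-significant byte
    width = (spec >> 8) & 0xFF        # next byte in increasing significance
    fmt_type = (spec >> 16) & 0xFF    # byte after that
    return fmt_type, width, decimals


def encode_format(fmt_type, width, decimals):
    """Pack the three fields into one int32; the top byte stays zero."""
    return (fmt_type << 16) | (width << 8) | decimals
```

For example, @code{encode_format(5, 8, 2)} produces @code{0x050802}, and @code{decode_format(0x050802)} recovers the three fields.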
8636 Format types are defined as follows:
8720 @node Value Label Record, Value Label Variable Record, Variable Record, Data File Format
8721 @section Value Label Record
8723 Value label records must follow the variable records and must precede
8724 the dictionary termination record. Other than this, they may appear
8725 anywhere in the system file. Every value label record must be
8726 immediately followed by a value label variable record, described below.
8728 Value label records begin with @code{rec_type}, an @code{int32} value
8729 set to the record type of 3. This is followed by @code{count}, an
8730 @code{int32} value set to the number of value labels present in this
8733 These two fields are followed by a series of @code{count} tuples. Each
8734 tuple is divided into two fields, the value and the label. The first of
8735 these, the value, is composed of a 64-bit value, which is either a
8736 @code{flt64} value or up to 8 characters (padded on the right to 8
8737 bytes) denoting a short string value. Whether the value is a
8738 @code{flt64} or a character string is not defined inside the value label
8741 The second field in the tuple, the label, has variable length. The
8742 first @code{char} is a count of the number of characters in the value
8743 label. The remainder of the field is the label itself. The field is
8744 padded on the right to a multiple of 64 bits in length.
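As a rough sketch of the tuple layout described above, the following Python function walks a buffer that starts at the first tuple. It rests on several assumptions not stated in the text: the caller already knows the byte order and whether the values are numeric (the record itself does not say), labels are decoded as ASCII, and the name @code{parse_value_labels} is hypothetical.

```python
import struct


def parse_value_labels(buf, count, numeric=True, byte_order="<"):
    """Parse `count` (value, label) tuples from the bytes in `buf`.

    Returns (labels, n_bytes_consumed).
    """
    labels = []
    pos = 0
    for _ in range(count):
        raw = buf[pos:pos + 8]            # the 64-bit value
        pos += 8
        if numeric:
            value = struct.unpack(byte_order + "d", raw)[0]
        else:
            value = raw.decode("ascii").rstrip(" ")
        n = buf[pos]                      # first char: label length
        label = buf[pos + 1:pos + 1 + n].decode("ascii")
        # Length byte plus label, rounded up to a multiple of 8 bytes.
        field_len = (1 + n + 7) // 8 * 8
        pos += field_len
        labels.append((value, label))
    return labels, pos
```

The second return value is the number of bytes consumed, which a caller could use to locate the value label variable record that follows.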
8746 @node Value Label Variable Record, Document Record, Value Label Record, Data File Format
8747 @section Value Label Variable Record
8749 Every value label variable record must be immediately preceded by a
8750 value label record, described above.
8753 struct sysfile_value_label_variable
8757 int32 vars[/* variable length */];
8762 @item int32 rec_type;
8763 Record type. Always set to 4.
8766 Number of variables to which the associated value labels from the value
8767 label record are to be applied.
8769 @item int32 vars[/* variable length */];
8770 A list of variables to which to apply the value labels. There are
8771 @code{count} elements.
8774 @node Document Record, Machine int32 Info Record, Value Label Variable Record, Data File Format
8775 @section Document Record
8777 There must be no more than one document record per system file.
8778 Document records must follow the variable records and precede the
8779 dictionary termination record.
8782 struct sysfile_document
8786 char lines[/* variable length */][80];
8791 @item int32 rec_type;
8792 Record type. Always set to 6.
8794 @item int32 n_lines;
8795 Number of lines of documents present.
8797 @item char lines[/* variable length */][80];
8798 Document lines. The number of elements is defined by @code{n_lines}.
8799 Lines shorter than 80 characters are padded on the right with spaces.
8802 @node Machine int32 Info Record, Machine flt64 Info Record, Document Record, Data File Format
8803 @section Machine @code{int32} Info Record
8805 There must be no more than one machine @code{int32} info record per
8806 system file. Machine @code{int32} info records must follow the variable
8807 records and precede the dictionary termination record.
8810 struct sysfile_machine_int32_info
8819 int32 version_major;
8820 int32 version_minor;
8821 int32 version_revision;
8823 int32 floating_point_rep;
8824 int32 compression_code;
8826 int32 character_code;
8831 @item int32 rec_type;
8832 Record type. Always set to 7.
8834 @item int32 subtype;
8835 Record subtype. Always set to 3.
8838 Size of each piece of data in the data part, in bytes. Always set to 4.
8841 Number of pieces of data in the data part. Always set to 8.
8843 @item int32 version_major;
8844 PSPP major version number. In version @var{x}.@var{y}.@var{z}, this
8847 @item int32 version_minor;
8848 PSPP minor version number. In version @var{x}.@var{y}.@var{z}, this
8851 @item int32 version_revision;
8852 PSPP version revision number. In version @var{x}.@var{y}.@var{z},
8855 @item int32 machine_code;
8856 Machine code. PSPP always sets this field to -1, but other
8859 @item int32 floating_point_rep;
8860 Floating point representation code. For IEEE 754 systems this is 1.
8861 IBM 370 sets this to 2, and DEC VAX E to 3.
8863 @item int32 compression_code;
8864 Compression code. Always set to 1.
8866 @item int32 endianness;
8867 Machine endianness. 1 indicates big-endian, 2 indicates little-endian.
8869 @item int32 character_code;
8870 Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3
8871 indicates 8-bit ASCII, 4 indicates DEC Kanji.
8874 @node Machine flt64 Info Record, Miscellaneous Informational Records, Machine int32 Info Record, Data File Format
8875 @section Machine @code{flt64} Info Record
8877 There must be no more than one machine @code{flt64} info record per
8878 system file. Machine @code{flt64} info records must follow the variable
8879 records and precede the dictionary termination record.
8882 struct sysfile_machine_flt64_info
8898 @item int32 rec_type;
8899 Record type. Always set to 7.
8901 @item int32 subtype;
8902 Record subtype. Always set to 4.
8905 Size of each piece of data in the data part, in bytes. Always set to 8.
8908 Number of pieces of data in the data part. Always set to 3.
8911 The system missing value.
8913 @item flt64 highest;
8914 The value used for HIGHEST in missing values.
8917 The value used for LOWEST in missing values.
8920 @node Miscellaneous Informational Records, Dictionary Termination Record, Machine flt64 Info Record, Data File Format
8921 @section Miscellaneous Informational Records
8923 Miscellaneous informational records must follow the variable records and
8924 precede the dictionary termination record.
8926 Miscellaneous informational records are ignored by PSPP when reading
8927 system files. They are not written by PSPP when writing system files.
8930 struct sysfile_misc_info
8939 char data[/* variable length */];
8944 @item int32 rec_type;
8945 Record type. Always set to 7.
8947 @item int32 subtype;
8948 Record subtype. May take any value.
8951 Size of each piece of data in the data part. Should have the value 4 or
8952 8, for @code{int32} and @code{flt64}, respectively.
8955 Number of pieces of data in the data part.
8957 @item char data[/* variable length */];
8958 Arbitrary data. There must be @code{size} times @code{count} bytes of
8962 @node Dictionary Termination Record, Data Record, Miscellaneous Informational Records, Data File Format
8963 @section Dictionary Termination Record
8965 The dictionary termination record must follow all other records, except
8966 for the actual cases, which it must precede. There must be exactly one
8967 dictionary termination record in every system file.
8970 struct sysfile_dict_term
8978 @item int32 rec_type;
8979 Record type. Always set to 999.
8982 Ignored padding. Should be set to 0.
8985 @node Data Record, , Dictionary Termination Record, Data File Format
8986 @section Data Record
8988 Data records must follow all other records in the data file. There must
8989 be at least one data record in every system file.
8991 The format of data records varies depending on whether the data is
8992 compressed. Regardless, the data is arranged in a series of 8-byte
8995 When data is not compressed, every case is composed of @code{case_size}
8996 of these 8-byte elements, where @code{case_size} comes from the file
8997 header record (@pxref{File Header Record}). Each element corresponds to
8998 the variable declared in the respective variable record (@pxref{Variable
8999 Record}). Numeric values are given in @code{flt64} format; string
9000 values are literal character strings, padded on the right when
9003 Compressed data is arranged in the following manner: the first 8-byte
9004 element in the data section is divided into a series of 1-byte command
9005 codes. These codes have meanings as described below:
9009 Ignored. If the program writing the system file accumulates compressed
9010 data in blocks of fixed length, 0 bytes can be used to pad out extra
9011 bytes remaining at the end of a fixed-size block.
9014 These values indicate that the corresponding numeric variable has the
9015 value @code{(@var{code} - @var{bias})} for the case being read, where
9016 @var{code} is the value of the compression code and @var{bias} is the
9017 variable @code{compression_bias} from the file header. For example,
9018 code 105 with bias 100.0 (the normal value) indicates a numeric variable
9022 End of file. This code may or may not appear at the end of the data
9023 stream. PSPP always outputs this code but its use is not required.
9026 This value indicates that the numeric or string value is not
9027 compressible. The value is stored in the 8-byte element following the
9028 current block of command bytes. If this value appears twice in a block
9029 of command bytes, then it indicates the second element following the
9030 command bytes, and so on.
9033 Used to indicate a string value that is all spaces.
9036 Used to indicate the system-missing value.
9039 When the end of the first 8-byte element of command bytes is reached,
9040 any blocks of non-compressible values are skipped, and the next element
9041 of command bytes is read and interpreted, until the end of the file is
9044 @node Portable File Format, q2c Input Format, Data File Format, Top
9045 @chapter Portable File Format
9047 These days, most computers use the same internal data formats for
9048 integer and floating-point data, if one ignores little differences like
9049 big- versus little-endian byte ordering. However, occasionally it is
9050 necessary to exchange data between systems with incompatible data
9051 formats. This is what portable files are designed to do.
9053 @strong{Please note:} Although all of the following information is
9054 correct, as far as the author has been able to ascertain, it is gleaned
9055 from examination of ASCII-formatted portable files only, so some of it
9056 may be incorrect in the general case.
9059 * Portable File Characters::
9060 * Portable File Structure::
9061 * Portable File Header::
9062 * Version and Date Info Record::
9063 * Identification Records::
9064 * Variable Count Record::
9065 * Variable Records::
9066 * Value Label Records::
9067 * Portable File Data::
9070 @node Portable File Characters, Portable File Structure, Portable File Format, Portable File Format
9071 @section Portable File Characters
9073 Portable files are arranged as a series of lines of exactly 80
9074 characters each. Each line is terminated by a carriage-return,
9075 line-feed sequence (henceforth, ``newline''). Newlines are not
9076 delimiters: they are only used to avoid line-length limitations existing
9077 on some operating systems.
9079 The file must be terminated with a @samp{Z} character. In addition, if
9080 the final line in the file does not have exactly 80 characters, then it
9081 is padded on the right with @samp{Z} characters. (The file contents may
9082 be in any character set; the file contains a description of its own
9083 character set, as explained in the next section. Therefore, the
9084 @samp{Z} character is not necessarily an ASCII @samp{Z}.)
9086 For the rest of the description of the portable file format, newlines
9087 and the trailing @samp{Z}s will be ignored, as if they did not exist,
9088 because they are not an important part of understanding the file
9091 @node Portable File Structure, Portable File Header, Portable File Characters, Portable File Format
9092 @section Portable File Structure
9094 Every portable file consists of the following records, in sequence:
9102 Version and date info.
9105 Product identification.
9108 Subproduct identification (optional).
9114 Variables. Each variable record may optionally be followed by a
9115 missing value record and a variable label record.
9118 Value labels (optional).
9124 Most records are identified by a single-character tag code. The file
9125 header and version info record do not have a tag.
9127 Other than these single-character codes, there are three types of fields
9128 in a portable file: floating-point, integer, and string. Floating-point
9129 fields have the following format:
9134 Zero or more leading spaces.
9137 Optional asterisk (@samp{*}), which indicates a missing value. The
9138 asterisk must be followed by a single character, generally a period
9139 (@samp{.}), but it appears that other characters may also be possible.
9140 This completes the specification of a missing value.
9143 Optional minus sign (@samp{-}) to indicate a negative number.
9146 A whole number, consisting of one or more base-30 digits: @samp{0}
9147 through @samp{9} plus capital letters @samp{A} through @samp{T}.
9150 A fraction, consisting of a radix point (@samp{.}) followed by one or
9151 more base-30 digits (optional).
9154 An exponent, consisting of a plus or minus sign (@samp{+} or @samp{-})
9155 followed by one or more base-30 digits (optional).
9158 A forward slash (@samp{/}).
Integer fields take a form identical to floating-point fields, but they
9162 may not contain a fraction.
String fields take the form of an integer field having value @var{n},
9165 followed by exactly @var{n} characters, which are the string content.
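The field formats described above can be illustrated with a short
Python sketch. It is not taken from the PSPP sources: the names
@code{parse_digits}, @code{parse_number}, and @code{parse_string} are
made up for this example. It assumes that the exponent is a power of
30, which the description above does not state explicitly, and that
newlines have already been removed from the input.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRST"  # the 30 digits, 0-9 then A-T


def parse_digits(text, pos):
    """Read a run of base-30 digits; return (value, digit_count, next_pos)."""
    value = 0
    count = 0
    while pos < len(text) and text[pos] in DIGITS:
        value = value * 30 + DIGITS.index(text[pos])
        count += 1
        pos += 1
    return value, count, pos


def parse_number(text, pos=0):
    """Parse one floating-point field; return (value, next_pos).

    The value is None when the field holds a missing value.
    """
    while text[pos] == " ":                  # zero or more leading spaces
        pos += 1
    if text[pos] == "*":                     # '*' plus one more character
        return None, pos + 2
    negative = text[pos] == "-"
    if negative:
        pos += 1
    whole, count, pos = parse_digits(text, pos)
    if count == 0:
        raise ValueError("expected at least one base-30 digit")
    value = float(whole)
    if text[pos] == ".":                     # optional fraction
        numerator, count, pos = parse_digits(text, pos + 1)
        value += numerator / 30.0 ** count
    if text[pos] in ("+", "-"):              # optional exponent
        sign = -1 if text[pos] == "-" else 1
        exponent, count, pos = parse_digits(text, pos + 1)
        value *= 30.0 ** (sign * exponent)   # assumption: a power of 30
    if text[pos] != "/":
        raise ValueError("field must end with a forward slash")
    if negative:
        value = -value
    return value, pos + 1


def parse_string(text, pos=0):
    """Parse a string field: an integer n, then exactly n characters."""
    length, pos = parse_number(text, pos)
    if length is None or length != int(length) or length < 0:
        raise ValueError("string length must be a non-negative integer")
    n = int(length)
    return text[pos:pos + n], pos + n
```

With these definitions, the field @code{1A/} parses as 1 * 30 + 10 = 40,
and the string field @code{5/HELLO} yields the five characters
@code{HELLO}.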
9167 @node Portable File Header, Version and Date Info Record, Portable File Structure, Portable File Format
9168 @section Portable File Header
9170 Every portable file begins with a 464-byte header, consisting of a
9171 200-byte collection of vanity splash strings, followed by a 256-byte
9172 character set translation table, followed by an 8-byte tag string.
9174 The 200-byte segment is divided into five 40-byte sections, each of
9175 which represents the string @code{ASCII SPSS PORT FILE} in a different
9176 character set encoding. (If the file is encoded in EBCDIC then the
9177 string is actually @code{EBCDIC SPSS PORT FILE}, and so on.) These
9178 strings are padded on the right with spaces in their own character set.
9180 It appears that these strings exist only to inform those who might view
9181 the file on a screen, and that they are not parsed by SPSS products.
9182 Thus, they can be safely ignored. For those interested, the strings are
9183 supposed to be in the following character sets, in the specified order:
9184 EBCDIC, 7-bit ASCII, CDC 6-bit ASCII, 6-bit ASCII, Honeywell 6-bit
9187 The 256-byte segment describes a mapping from the character set used in
9188 the portable file to an arbitrary character set having characters at the
9189 following positions:
9194 Control characters. Not important enough to describe in full here.
9202 Digits @samp{0} through @samp{9}.
9206 Capital letters @samp{A} through @samp{Z}.
9210 Lowercase letters @samp{a} through @samp{z}.
9222 Solid vertical pipe.
9226 Symbols @code{&[]!$*);^-/}
9230 Broken vertical pipe.
9234 Symbols @code{,%_>}?@code{`:} @c @code{?} is an inverted question mark
9238 British pound symbol.
9242 Symbols @code{@@'="}.
9246 Less than or equal symbol.
9278 Lower left corner box draw.
9282 Upper left corner box draw.
9286 Greater than or equal symbol.
9290 Superscript @samp{0} through @samp{9}.
9294 Lower right corner box draw.
9298 Upper right corner box draw.
9310 Superscript @samp{(}.
9314 Superscript @samp{)}.
9318 Horizontal dagger (?).
9322 Symbols @samp{@{@}\}.
9329 Centered dot, or bullet.
9336 Symbols that are not defined in a particular character set are set to
9337 the same value as symbol 64; i.e., to @samp{0}.
9339 The 8-byte tag string consists of the exact characters @code{SPSSPORT}
9340 in the portable file's character set, which can be used to verify that
9341 the file is indeed a portable file.
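As a rough illustration of the header layout, the following Python
sketch splits the 464 bytes into their three parts and checks the tag
string. The function names are invented for this example. The sketch
assumes that entry @var{p} of the translation table holds the file's
own byte for the character at position @var{p}, and that capital
letters follow the digits directly, starting at position 74. Both
points should be checked against real portable files before relying on
them.

```python
HEADER_SIZE = 200 + 256 + 8    # 464 bytes in total


def split_header(data):
    """Split the header into (splash strings, translation table, tag)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("portable file header is truncated")
    splash = [data[i:i + 40] for i in range(0, 200, 40)]
    table = data[200:456]
    tag = data[456:464]
    return splash, table, tag


def tag_is_valid(table, tag):
    """Check the tag against SPSSPORT in the file's own character set.

    Assumptions: table[p] is the file's byte for the character at
    position p, and capital letters start at position 74.
    """
    expected = bytes(table[74 + ord(c) - ord("A")] for c in "SPSSPORT")
    return tag == expected
```

The splash strings are returned only for completeness. As noted above,
a reader can ignore them.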
9343 @node Version and Date Info Record, Identification Records, Portable File Header, Portable File Format
9344 @section Version and Date Info Record
9346 This record does not have a tag code. It has the following structure:
9350 A single character identifying the file format version. The letter A
9351 represents version 0, and so on.
9354 An 8-character string field giving the file creation date in the format
9358 A 6-character string field giving the file creation time in the format
9362 @node Identification Records, Variable Count Record, Version and Date Info Record, Portable File Format
9363 @section Identification Records
9365 The product identification record has tag code @samp{1}. It consists of
9366 a single string field giving the name of the product that wrote the
9369 The subproduct identification record has tag code @samp{3}. It
9370 consists of a single string field giving additional information on the
9371 product that wrote the portable file.
9373 @node Variable Count Record, Variable Records, Identification Records, Portable File Format
9374 @section Variable Count Record
9376 The variable count record has tag code @samp{4}. It consists of two
9377 integer fields. The first contains the number of variables in the file
9378 dictionary. The purpose of the second is unknown; it contains the value
9379 161 in all portable files examined so far.
9381 @node Variable Records, Value Label Records, Variable Count Record, Portable File Format
9382 @section Variable Records
9384 Each variable record represents a single variable. Variable records
9385 have tag code @samp{7}. They have the following structure:
9390 Width (integer). This is 0 for a numeric variable, and a number between 1
9391 and 255 for a string variable.
9394 Name (string). 1--8 characters long. Must be in all capitals.
9397 Print format. This is a set of three integer fields:
9402 Format type (@pxref{Variable Record}).
9405 Format width. 1--40.
9408 Number of decimal places. 1--40.
9412 Write format. Same structure as the print format described above.
9415 Each variable record can optionally be followed by a missing value
9416 record, which has tag code @samp{8}. A missing value record has one
field, the missing value itself (a floating-point or string field, as
appropriate). Up to three of these missing value records can be used.
9420 There is also a record for missing value ranges, which has tag code
9421 @samp{B}. It is followed by two fields representing the range, which
9422 are floating-point or string as appropriate. If a missing value range
9423 is present, it may be followed by a single missing value record.
9425 Tag codes @samp{9} and @samp{A} represent @code{LO THRU @var{x}} and
9426 @code{@var{x} THRU HI} ranges, respectively. Each is followed by a
9427 single field representing @var{x}. If one of the ranges is present, it
9428 may be followed by a single missing value record.
9430 In addition, each variable record can optionally be followed by a
9431 variable label record, which has tag code @samp{C}. A variable label
9432 record has one field, the variable label itself (string).
9434 @node Value Label Records, Portable File Data, Variable Records, Portable File Format
9435 @section Value Label Records
9437 Value label records have tag code @samp{D}. They have the following
9442 Variable count (integer).
9445 List of variables (strings). The variable count specifies the number in
9446 the list. Variables are specified by their names. All variables must
9447 be of the same type (numeric or string).
9450 Label count (integer).
9453 List of (value, label) tuples. The label count specifies the number of
9454 tuples. Each tuple consists of a value, which is numeric or string as
9455 appropriate to the variables, followed by a label (string).
9458 @node Portable File Data, , Value Label Records, Portable File Format
9459 @section Portable File Data
9461 The data record has tag code @samp{F}. There is only one tag for all
9462 the data; thus, all the data must follow the dictionary. The data is
9463 terminated by the end-of-file marker @samp{Z}, which is not valid as the
9464 beginning of a data element.
9466 Data elements are output in the same order as the variable records
9467 describing them. String variables are output as string fields, and
9468 numeric variables are output as floating-point fields.
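For reference, the tag codes named in this chapter can be collected
into a small lookup table. The Python sketch below is only a summary
aid: the names @code{RECORD_TAGS} and @code{describe_tag} are invented
here, and the table lists only the codes that this chapter mentions.

```python
RECORD_TAGS = (
    ("1", "product identification"),
    ("3", "subproduct identification"),
    ("4", "variable count"),
    ("7", "variable"),
    ("8", "missing value"),
    ("9", "LO THRU x missing value range"),
    ("A", "x THRU HI missing value range"),
    ("B", "missing value range"),
    ("C", "variable label"),
    ("D", "value labels"),
    ("F", "data"),
)


def describe_tag(tag):
    """Return the record type for a tag code, or None if it is not listed."""
    for code, name in RECORD_TAGS:
        if code == tag:
            return name
    return None
```

A reader built this way can report an unrecognized tag code instead of
silently misreading the rest of the file.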
9470 @node q2c Input Format, Bugs, Portable File Format, Top
9471 @chapter @code{q2c} Input Format
9473 PSPP statistical procedures have a bizarre and somewhat irregular
9474 syntax. Despite this, a parser generator has been written that
9475 adequately addresses many of the possibilities and tries to provide
9476 hooks for the exceptional cases. This parser generator is named
9480 * Invoking q2c:: q2c command-line syntax.
9481 * q2c Input Structure:: High-level layout of the input file.
9482 * Grammar Rules:: Syntax of the grammar rules.
9485 @node Invoking q2c, q2c Input Structure, q2c Input Format, q2c Input Format
9486 @section Invoking q2c
9489 q2c @var{input.q} @var{output.c}
9492 @code{q2c} translates a @samp{.q} file into a @samp{.c} file. It takes
9493 exactly two command-line arguments, which are the input file name and
9494 output file name, respectively. @code{q2c} does not accept any
9495 command-line options.
9497 @node q2c Input Structure, Grammar Rules, Invoking q2c, q2c Input Format
9498 @section @code{q2c} Input Structure
9500 @code{q2c} input files are divided into two sections: the grammar rules
9501 and the supporting code. The @dfn{grammar rules}, which make up the
9502 first part of the input, are used to define the syntax of the
9503 statistical procedure to be parsed. The @dfn{supporting code},
following the grammar rules, is copied largely unchanged to the output
9505 file, except for certain escapes.
9507 The most important lines in the grammar rules are used for defining
9508 procedure syntax. These lines can be prefixed with a dollar sign
9509 (@samp{$}), which prevents Emacs' CC-mode from munging them. Besides
9510 this, a bang (@samp{!}) at the beginning of a line causes the line,
9511 minus the bang, to be written verbatim to the output file (useful for
9512 comments). As a third special case, any line that begins with the exact
9513 characters @code{/* *INDENT} is ignored and not written to the output.
9514 This allows @code{.q} files to be processed through @code{indent}
9515 without being munged.
9517 The syntax of the grammar rules themselves is given in the following
9520 The supporting code is passed into the output file largely unchanged.
9521 However, the following escapes are supported. Each escape must appear
9522 on a line by itself.
9525 @item /* (header) */
9527 Expands to a series of C @code{#include} directives which include the
9528 headers that are required for the parser generated by @code{q2c}.
9530 @item /* (decls @var{scope}) */
9532 Expands to C variable and data type declarations for the variables and
9533 @code{enum}s input and output by the @code{q2c} parser. @var{scope}
9534 must be either @code{local} or @code{global}. @code{local} causes the
9535 declarations to be output as function locals. @code{global} causes them
9536 to be declared as @code{static} module variables; thus, @code{global} is
9537 a bit of a misnomer.
9539 @item /* (parser) */
9541 Expands to the entire parser. Must be enclosed within a C function.
9545 Expands to a set of calls to the @code{free} function for variables
9546 declared by the parser. Only needs to be invoked if subcommands of type
9547 @code{string} are used in the grammar rules.
9550 @node Grammar Rules, , q2c Input Structure, q2c Input Format
9551 @section Grammar Rules
9553 The grammar rules describe the format of the syntax that the parser
generated by @code{q2c} will understand. The way that the grammar rules
are included in a @code{q2c} input file is described above.
9557 The grammar rules are divided into tokens of the following types:
9560 @item Identifier (@code{ID})
9562 An identifier token is a sequence of letters, digits, and underscores
9563 (@samp{_}). Identifiers are @emph{not} case-sensitive.
9565 @item String (@code{STRING})
9567 String tokens are initiated by a double-quote character (@samp{"}) and
9568 consist of all the characters between that double quote and the next
9569 double quote, which must be on the same line as the first. Within a
9570 string, a backslash can be used as a ``literal escape''. The only
9571 reasons to use a literal escape are to include a double quote or a
9572 backslash within a string.
9574 @item Special character
Other characters, other than whitespace, constitute tokens in themselves.
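The three token types can be illustrated with a small Python
tokenizer. This is a sketch of the rules as stated here, not the code
that @code{q2c} itself uses. The function name @code{tokenize} is
invented, and folding identifiers to upper case is just one assumed way
of making them case-insensitive.

```python
def tokenize(line):
    """Split one line of grammar rules into (type, text) pairs."""
    tokens = []
    i = 0
    while i < len(line):
        c = line[i]
        if c.isspace():
            i += 1
        elif c.isalnum() or c == "_":
            start = i
            while i < len(line) and (line[i].isalnum() or line[i] == "_"):
                i += 1
            # Assumption: fold case so that identifiers compare equal.
            tokens.append(("ID", line[start:i].upper()))
        elif c == '"':
            i += 1
            chars = []
            while i < len(line) and line[i] != '"':
                if line[i] == "\\" and i + 1 < len(line):
                    i += 1               # literal escape: keep next character
                chars.append(line[i])
                i += 1
            if i >= len(line):
                raise ValueError("string must end on the line where it starts")
            i += 1                       # skip the closing double quote
            tokens.append(("STRING", "".join(chars)))
        else:
            tokens.append((c, c))        # special character
            i += 1
    return tokens
```

Each special character is reported with itself as its token type, so a
parser can compare it directly against @samp{:}, @samp{;}, and so on.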
9581 The syntax of the grammar rules is as follows:
9584 grammar-rules ::= ID : subcommands .
9585 subcommands ::= subcommand
9586 ::= subcommands ; subcommand
9589 The syntax begins with an ID or STRING token that gives the name of the
9590 procedure to be parsed. The rest of the syntax consists of subcommands
9591 separated by semicolons (@samp{;}) and terminated with a full stop
9595 subcommand ::= sbc-options ID sbc-defn
9598 ::= sbc-options sbc-options
9601 sbc-defn ::= opt-prefix = specifiers
9602 ::= [ ID ] = array-sbc
9603 ::= opt-prefix = sbc-special-form
9608 Each subcommand can be prefixed with one or more option characters. An
9609 asterisk (@samp{*}) is used to indicate the default subcommand; the
9610 keyword used for the default subcommand can be omitted in the PSPP
9611 syntax file. A plus sign (@samp{+}) is used to indicate that a
9612 subcommand can appear more than once; if it is not present then that
9613 subcommand can appear no more than once.
9615 The subcommand name appears after the option characters.
9617 There are three forms of subcommands. The first and most common form
9618 simply gives an equals sign (@samp{=}) and a list of specifiers, which
9619 can each be set to a single setting. The second form declares an array,
9620 which is a set of flags that can be individually turned on by the user.
9621 There are also several special forms that do not take a list of
9624 Arrays require an additional @code{ID} argument. This is used as a
9625 prefix, prepended to the variable names constructed from the
9626 specifiers. The other forms also allow an optional prefix to be
9630 array-sbc ::= alternatives
9631 ::= array-sbc , alternatives
9633 ::= alternatives | ID
An array subcommand is a set of Boolean values that can independently be
turned on by the user. The values are separated by commas (@samp{,}). If a
value has more than one name then these names are separated by pipes
(@samp{|}).
9641 specifiers ::= specifier
9642 ::= specifiers , specifier
9643 specifier ::= opt-id : settings
9648 Ordinary subcommands (other than arrays and special forms) require a
9649 list of specifiers. Each specifier has an optional name and a list of
9650 settings. If the name is given then a correspondingly named variable
9651 will be used to store the user's choice of setting. If no name is given
9652 then there is no way to tell which setting the user picked; in this case
9653 the settings should probably have values attached.
9656 settings ::= setting
9657 ::= settings / setting
9658 setting ::= setting-options ID setting-value
9665 Individual settings are separated by forward slashes (@samp{/}). Each
9666 setting can be as little as an @code{ID} token, but options and values
9667 can optionally be included. The @samp{*} option means that, for this
9668 setting, the @code{ID} can be omitted. The @samp{!} option means that
this setting is the default for its specifier.
9673 ::= ( setting-value-2 )
9675 setting-value-2 ::= setting-value-options setting-value-type : ID
9676 setting-value-restriction
9677 setting-value-options ::=
9679 setting-value-type ::= N
9681 setting-value-restriction ::=
9685 Settings may have values. If the value must be enclosed in parentheses,
9686 then enclose the value declaration in parentheses. Declare the setting
9687 type as @samp{n} or @samp{d} for integer or floating point type,
9688 respectively. The given @code{ID} is used to construct a variable name.
9689 If option @samp{*} is given, then the value is optional; otherwise it
9690 must be specified whenever the corresponding setting is specified. A
9691 ``restriction'' can also be specified which is a string giving a C
9692 expression limiting the valid range of the value. The special escape
9693 @code{%s} should be used within the restriction to refer to the
9694 setting's value variable.
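To make the @code{%s} escape concrete, the following Python sketch
shows one plausible way in which a restriction string could be turned
into a C expression. How @code{q2c} actually performs the substitution
is not described here. The function name @code{expand_restriction} and
the variable name @code{n_width} are hypothetical.

```python
def expand_restriction(restriction, value_variable):
    """Replace every %s in a restriction with the value variable's name."""
    return restriction.replace("%s", value_variable)


# A restriction that limits a hypothetical integer value to 1 through 10.
expanded = expand_restriction("%s >= 1 && %s <= 10", "n_width")
```

Here @code{expanded} holds the C expression
@code{n_width >= 1 && n_width <= 10}, which a generated parser could
test after reading the value.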
9697 sbc-special-form ::= VAR
9698 ::= VARLIST varlist-options
9699 ::= INTEGER opt-list
9702 ::= STRING @r{(the literal word STRING)} string-options
9709 ::= ( STRING STRING )
9712 The special forms are of the following types:
9717 A single variable name.
9721 A list of variables. If given, the string can be used to provide
9722 @code{PV_@var{*}} options to the call to @code{parse_variables}.
9726 A single integer value.
9730 A list of integers separated by spaces or commas.
9734 A single floating-point value.
9738 A list of floating-point values.
9742 A single positive integer value.
9746 A string value. If the options are given then the first string is an
9747 expression giving a restriction on the value of the string; the second
9748 string is an error message to display when the restriction is violated.
9752 A custom function is used to parse this subcommand. The function must
9753 have prototype @code{int custom_@var{name} (void)}. It should return 0
9754 on failure (when it has already issued an appropriate diagnostic), 1 on
9755 success, or 2 if it fails and the calling function should issue a syntax
9756 error on behalf of the custom handler.
9760 @node Bugs, Function Index, q2c Input Format, Top
9764 As of fvwm 0.99 there were exactly 39.342 unidentified bugs. Identified
9765 bugs have mostly been fixed, though. Since then 9.34 bugs have been
9766 fixed. Assuming that there are at least 10 unidentified bugs for every
9767 identified one, that leaves us with 39.342 - 9.34 + 10 * 9.34 = 123.422
9768 unidentified bugs. If we follow this to its logical conclusion we
9769 will have an infinite number of unidentified bugs before the number of
9770 bugs can start to diminish, at which point the program will be
9771 bug-free. Since this is a computer program infinity = 3.4028e+38 if you
9772 don't insist on double-precision. At the current rate of bug discovery
9773 we should expect to achieve this point in 3.37e+27 years. I guess I
9774 better plan on passing this thing on to my children@enddots{}
9776 ---Robert Nation, @cite{fvwm manpage}.
9780 * Known bugs:: Pointers to other files.
9781 * Contacting the Author:: Where to send the bug reports.
9784 @node Known bugs, Contacting the Author, Bugs, Bugs
This is the list of known bugs in PSPP. In addition, see @ref{Not
Implemented}, and @ref{Functions Not Implemented}, for lists of bugs
9789 due to features not implemented. For known bugs in individual language
9790 features, see the documentation for that feature.
9794 Nothing has yet been tested exhaustively. Be cautious using PSPP to
9795 make important decisions.
9798 @code{make check} fails on some systems that don't like the syntax. I'm
9799 not sure why. If someone could make an attempt to track this down, it
9800 would be appreciated.
9803 PostScript driver bugs:
9807 Does not support driver arguments `max-fonts-simult' or
9808 `optimize-text-size'.
9811 Minor problems with font-encodings.
9814 Fails to align fonts along their baselines.
9817 Does not support certain bizarre line intersections--should
9818 never crop up in practice.
9821 Does not gracefully substitute for existing fonts whose
9822 encodings are missing.
9825 Does not perform italic correction or left italic correction
9829 Encapsulated PostScript is unimplemented.
9836 Does not support `infinite length' or `infinite width' paper.
9840 See below for information on reporting bugs not listed here.
9842 @node Contacting the Author, , Known bugs, Bugs
9843 @section Contacting the Author
9845 The author can be contacted at e-mail address
9850 @code{<blp@@gnu.org>}.
9853 PSPP bug reports should be sent to
9858 @code{<bug-gnu-pspp@@gnu.org>}.
9861 @node Function Index, Concept Index, Bugs, Top
9862 @chapter Function Index
9865 @node Concept Index, Command Index, Function Index, Top
9866 @chapter Concept Index
9869 @node Command Index, , Concept Index, Top
9870 @chapter Command Index
9877 @c compile-command: "makeinfo pspp.texi"