1 \input texinfo @c -*- texinfo -*-
5 @set TIMESTAMP Time-stamp: Sat Dec 20 20:25:33 WST 2003 jmd
8 @c For double-sided printing, uncomment:
9 @c @setchapternewpage odd
22 * PSPP: (pspp). Statistical analysis package.
26 PSPP, for statistical analysis of sampled data, by Ben Pfaff.
28 This file documents PSPP, a statistical package for analysis of
29 sampled data that uses a command language compatible with SPSS.
31 Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.
33 This version of the PSPP documentation is consistent with version 2 of
36 Permission is granted to make and distribute verbatim copies of this
37 manual provided the copyright notice and this permission notice are
38 preserved on all copies.
41 Permission is granted to process this file through TeX and print the
42 results, provided the printed document carries copying permission notice
43 identical to this one except for the removal of this paragraph (this
44 paragraph not being relevant to the printed manual).
47 Permission is granted to copy and distribute modified versions of this
48 manual under the conditions for verbatim copying, provided that the
49 entire resulting derived work is distributed under the terms of a
50 permission notice identical to this one.
52 Permission is granted to copy and distribute translations of this
53 manual into another language, under the above condition for modified
54 versions, except that this permission notice may be stated in a
55 translation approved by the Free Software Foundation.
60 @subtitle A System for Statistical Analysis
61 @subtitle Edition @value{EDITION}, for PSPP version @value{VERSION}
65 @vskip 0pt plus 1filll
67 PSPP Copyright @copyright{} 1997, 1998 Free Software Foundation, Inc.
69 Permission is granted to make and distribute verbatim copies of this
70 manual provided the copyright notice and this permission notice are
71 preserved on all copies.
73 Permission is granted to copy and distribute modified versions of this
74 manual under the conditions for verbatim copying, provided that the
75 entire derived work is distributed under the terms of a permission
76 notice identical to this one.
78 Permission is granted to copy and distribute translations of this manual
79 into another language, under the above conditions for modified versions,
80 except that this permission notice may be stated in a translation
81 approved by the Foundation.
84 @node Top, Introduction, (dir), (dir)
88 This file documents the PSPP package for statistical analysis of sampled
89 data. This is edition @value{EDITION}, for PSPP version
90 @value{VERSION}, last modified at @value{TIMESTAMP}.
95 * Introduction:: Description of the package.
96 * License:: Your rights and obligations.
97 * Credits:: Acknowledgement of authors.
99 * Installation:: How to compile and install PSPP.
100 * Configuration:: Configuring PSPP.
101 * Invocation:: Starting and running PSPP.
103 * Language:: Basics of the PSPP command language.
104 * Expressions:: Numeric and string expression syntax.
106 * Data Input and Output:: Reading data from user files.
107 * System and Portable Files:: Dealing with system & portable files.
108 * Variable Attributes:: Adjusting and examining variables.
109 * Data Manipulation:: Simple operations on data.
110 * Data Selection:: Select certain cases for analysis.
111 * Conditionals and Looping:: Doing things many times or not at all.
112 * Statistics:: Basic statistical procedures.
113 * Utilities:: Other commands.
114 * Not Implemented:: What's not here yet
116 * Data File Format:: Format of PSPP system files.
117 * Portable File Format:: Format of PSPP portable files.
118 * q2c Input Format:: Format of syntax accepted by q2c.
120 * Bugs:: Known problems; submitting bug reports.
122 * Function Index:: Index of PSPP functions for expressions.
123 * Concept Index:: Index of concepts.
124 * Command Index:: Index of PSPP procedures.
128 @node Introduction, License, Top, Top
129 @chapter Introduction
132 @cindex PSPP language
133 @cindex language, PSPP
134 PSPP is a tool for statistical analysis of sampled data. It reads a
135 syntax file and a data file, analyzes the data, and writes the results
136 to a listing file or to standard output.
138 The language accepted by PSPP is similar to those accepted by SPSS
139 statistical products. The details of PSPP's language are given
140 later in this manual.
147 @cindex Free Software Foundation
148 PSPP produces output in two forms: tables and charts. Both of these can
149 be written in several formats; currently, ASCII, PostScript, and HTML
150 are supported. In the future, more drivers, such as PCL and X Window
151 System drivers, may be developed. For now, Ghostscript, available from
152 the Free Software Foundation, may be used to convert PostScript chart
153 output to other formats.
155 The current version of PSPP, @value{VERSION}, is woefully incomplete in
156 terms of its statistical procedure support. PSPP is a work in progress.
157 The author hopes to fully support all features in the products
158 that PSPP replaces, eventually. The author welcomes questions,
159 comments, donations, and code submissions. @xref{Bugs,,Submitting Bug
160 Reports}, for instructions on contacting the author.
162 @node License, Credits, Introduction, Top
163 @chapter Your rights and obligations
165 @cindex your rights and obligations
167 @cindex obligations, your
169 @cindex Free Software Foundation
170 @cindex GNU General Public License
171 @cindex General Public License
174 @cindex redistribution
175 Most of PSPP is distributed under the GNU General Public
176 License. The General Public License says, in effect, that you may
177 modify and distribute PSPP as you like, as long as you grant the
178 same rights to others. It also states that you must provide source code
179 when you distribute PSPP, or, if you obtained PSPP
180 source code from an anonymous ftp site, give out the name of that site.
182 The General Public License is given in full in the source distribution
183 as file @file{COPYING}. In Debian GNU/Linux, this file is also
184 available as file @file{/usr/share/common-licenses/GPL-2}.
186 To quote the GPL itself:
189 This program is free software; you can redistribute it and/or modify it
190 under the terms of the GNU General Public License as published by the
191 Free Software Foundation; either version 2 of the License, or (at your
192 option) any later version.
194 This program is distributed in the hope that it will be useful, but
195 WITHOUT ANY WARRANTY; without even the implied warranty of
196 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
197 General Public License for more details.
199 You should have received a copy of the GNU General Public License along
200 with this program; if not, write to the Free Software Foundation, Inc.,
201 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
204 @node Credits, Installation, License, Top
210 Most of PSPP, as well as this manual,
211 was written by Ben Pfaff. @xref{Contacting the Author}, for
212 instructions on contacting the author.
214 @cindex Covington, Michael A.
215 @cindex Van Zandt, James
216 @cindex @file{ftp.cdrom.com}
217 @cindex @file{/pub/algorithms/c/julcal10}
218 @cindex @file{julcal.c}
219 @cindex @file{julcal.h}
220 The PSPP source code incorporates @code{julcal10} originally
221 written by Michael A. Covington and translated into C by Jim Van Zandt.
222 The original package can be found in directory
223 @url{ftp://ftp.cdrom.com/pub/algorithms/c/julcal10}. The entire
224 contents of that directory constitute the package. The files actually
225 used in PSPP are @code{julcal.c} and @code{julcal.h}.
227 @node Installation, Configuration, Credits, Top
228 @chapter Installing PSPP
230 @cindex PSPP, installing
232 @cindex GNU C compiler
234 @cindex compiler, recommended
235 @cindex compiler, gcc
236 PSPP conforms to the GNU Coding Standards. PSPP is written in, and
237 requires for proper operation, ANSI/ISO C. You might want to
238 additionally note the following points:
242 The compiler and linker must allow for significance of several
243 characters in external identifiers. The exact number is unknown but at
244 least 31 is recommended.
247 The @code{int} type must be 32 bits or wider.
250 The recommended compiler is gcc 2.7.2.1 or later, but any ANSI compiler
251 will do if it fits the above criteria.
254 Many UNIX variants should work out-of-the-box, as PSPP uses GNU
255 autoconf to detect differences between environments. Please report any
256 problems with compilation of PSPP under UNIX and UNIX-like operating
257 systems---portability is a major concern of the author.
259 The pages below give specific instructions for installing PSPP
260 on each type of system mentioned above.
263 * UNIX installation:: Installing on UNIX-like environments.
266 @node UNIX installation, , Installation, Installation
267 @section UNIX installation
268 @cindex UNIX, installing PSPP under
269 @cindex installation, under UNIX
271 To install PSPP under a UNIX-like operating system, follow the steps
272 below in order. Some of the text below was taken directly from various
273 Free Software Foundation sources.
277 @code{cd} to the directory containing the PSPP source.
279 @cindex configure, GNU
280 @cindex GNU configure
282 Type @samp{./configure} to configure for your particular operating
283 system and compiler. Running @code{configure} takes a while. While
284 running, it displays some messages telling which features it is checking for.
287 You can optionally supply some options to @code{configure} to
288 give it hints about how to do its job. Type @code{./configure --help}
289 to see a list of options. One of the most useful options is
290 @samp{--with-checker}, which enables the use of the Checker memory
291 debugger under supported operating systems. Checker must already be
292 installed to use this option. Do not use @samp{--with-checker} if you
293 are not debugging PSPP itself.
295 @cindex @file{Makefile}
296 @cindex @file{config.h}
297 @cindex @file{pref.h}
300 (optional) Edit @file{Makefile}, @file{config.h}, and @file{pref.h}.
301 These files are produced by @code{configure}. Note that most PSPP
302 settings can be changed at runtime.
304 @file{pref.h} is only generated by @code{configure} if it does not
305 already exist. (It's copied from @file{prefh.orig}.)
309 Type @samp{make} to compile the package. If there are any errors during
310 compilation, try to fix them. If modifications are necessary to compile
311 correctly under your configuration, contact the author.
312 @xref{Bugs,,Submitting Bug Reports}, for details.
314 @cindex self-tests, running
316 Type @samp{make check} to run self-tests on the compiled PSPP package.
319 @cindex PSPP, installing
320 @cindex @file{/usr/local/share/pspp/}
321 @cindex @file{/usr/local/bin/}
322 @cindex @file{/usr/local/info/}
323 @cindex documentation, installing
325 Become the superuser and type @samp{make install} to install the
326 PSPP binaries, by default in @file{/usr/local/bin/}. The
327 directory @file{/usr/local/share/pspp/} is created and populated with
328 files needed by PSPP at runtime. This step will also cause the
329 PSPP documentation to be installed in @file{/usr/local/info/},
330 but only if that directory already exists.
333 (optional) Type @samp{make clean} to delete the PSPP binaries
334 from the source tree.
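
As a rough sketch, the whole sequence might look like the following at
a shell prompt. The directory name @file{pspp-src} is only a
placeholder for wherever the source was unpacked, and @code{su} is just
one common way to become the superuser:

@example
cd pspp-src
./configure
make
make check
su
make install
exit
make clean
@end example
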
337 @node Configuration, Invocation, Installation, Top
338 @chapter Configuring PSPP
339 @cindex configuration
340 @cindex PSPP, configuring
342 PSPP has dozens of configuration possibilities and hundreds of
343 settings. This is both a bane and a blessing. On one hand, it's
344 possible to easily accommodate diverse ranges of setups. But, on the
345 other, the multitude of possibilities can overwhelm the casual user.
346 Fortunately, the configuration mechanisms are profusely described in the
347 sections below@enddots{}
350 * File locations:: How PSPP finds config files.
351 * Configuration techniques:: Many different methods of configuration@enddots{}
352 * Configuration files:: How configuration files are read.
353 * Environment variables:: All about environment variables.
354 * Output devices:: Describing your terminal(s) and printer(s).
355 * PostScript driver class:: Configuration of PostScript devices.
356 * ASCII driver class:: Configuration of character-code devices.
357 * HTML driver class:: Configuration for HTML output.
358 * Miscellaneous configuring:: Even more configuration variables.
359 * Improving output quality:: Hints for producing ever-more-lovely output.
362 @node File locations, Configuration techniques, Configuration, Configuration
363 @section Locating configuration files
365 PSPP uses the same method to find most of its configuration files:
369 The @dfn{base name} of the file being sought is determined.
372 The path to search is determined.
375 Each directory in the search path, from left to right, is searched for a
376 file with the name of the base name. The first occurrence is read
377 as the configuration file.
380 The first two steps are elaborated below for the sake of our pedantic
385 A @dfn{base name} is a file name lacking an absolute directory
386 reference. Some examples of base names are: @file{ps-encodings},
387 @file{devices}, @file{devps/DESC} (under UNIX), @file{devps\DESC} (under DOS).
390 Determining the base name is a two-step process:
394 If the appropriate environment variable is defined, the value of that
395 variable is used (@pxref{Environment variables}). For instance, when
396 searching for the output driver initialization file, the variable
397 examined is @code{STAT_OUTPUT_INIT_FILE}.
400 Otherwise, the compiled-in default is used. For example, when searching
401 for the output driver initialization file, the default base name is @file{devices}.
405 @strong{Please note:} If a user-specified base name does contain an
406 absolute directory reference, as in a file name like
407 @file{/home/pfaff/fonts/TR}, no path is searched---the file name is used
408 exactly as given---and the algorithm terminates.
411 The path is the first of the following that is defined:
415 A variable definition for the path given in the user environment. This
416 is a PSPP-specific environment variable name; for instance,
417 @code{STAT_OUTPUT_INIT_PATH}.
420 In some cases, another, less-specific environment variable is checked.
421 For instance, when searching for font files, the PostScript driver first
422 checks for a variable with name @code{STAT_GROFF_FONT_PATH}, then for
423 one with name @code{GROFF_FONT_PATH}. (However, font searching has its
424 own list of esoteric search rules.)
427 The configuration file path, which is itself determined by the following rules:
432 If the command line contains an option of the form @samp{-B @var{path}}
433 or @samp{--config-dir=@var{path}}, then the value given on the
434 rightmost occurrence of such an option is used.
437 Otherwise, if the environment variable @code{STAT_CONFIG_PATH} is
438 defined, the value of that variable is used.
441 Otherwise, the compiled-in fallback default is used. On UNIX machines,
442 the default fallback path is
449 @file{/usr/local/lib/pspp}
455 On DOS machines, the default fallback path is:
459 All the paths from the DOS search path in the @samp{PATH} environment
460 variable, in left-to-right order.
463 @file{C:\PSPP}, as a last resort.
466 Note that the installer of PSPP can easily change this default
467 fallback path; thus the above should not be taken as gospel.
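
For example, either of the following should make PSPP search a single
directory for its configuration files, assuming a Bourne-style shell.
The directory @file{/home/pfaff/pspp-config} and the syntax file name
@file{myfile.stat} are purely illustrative:

@example
pspp -B /home/pfaff/pspp-config myfile.stat

STAT_CONFIG_PATH=/home/pfaff/pspp-config
export STAT_CONFIG_PATH
pspp myfile.stat
@end example

If both are given, the command-line option takes effect, following the
order of the rules above.
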
472 As a final note: Under DOS, directories given in paths are delimited by
473 semicolons (@samp{;}); under UNIX, directories are delimited by colons
474 (@samp{:}). This corresponds with the standard path delimiter under
477 @node Configuration techniques, Configuration files, File locations, Configuration
478 @section Configuration techniques
480 There are many ways that PSPP can be configured. These are
481 described in the list below. Values given by earlier items take
482 precedence over those given by later items.
486 Syntax commands that modify settings, such as @cmd{SET}. @xref{SET}.
489 Command-line options. @xref{Invocation}.
492 PSPP-specific environment variable contents. @xref{Environment variables}.
496 General environment variable contents. @xref{Environment variables}.
499 Configuration file contents. @xref{Configuration files}.
505 Some of the above may not apply to a particular setting. For instance,
506 the current pager (such as @samp{more}, @samp{most}, or @samp{less})
507 cannot be determined by configuration file contents because there is no
508 appropriate configuration file.
510 @node Configuration files, Environment variables, Configuration techniques, Configuration
511 @section Configuration files
513 Most configuration files have a common form:
517 Each line forms a separate command or directive. This means that lines
518 cannot be broken up, unless they are spliced together with a trailing
519 backslash, as described below.
522 Before anything else is done, trailing whitespace is removed.
525 When a line ends in a backslash (@samp{\}), the backslash is removed,
526 and the next line is read and appended to the current line.
530 Whitespace preceding the backslash is retained.
533 This rule continues to be applied until the line read does not end in a backslash.
537 It is an error if the last line in the file ends in a backslash.
541 Comments are introduced by an octothorpe (@samp{#}), and continue until the end of the line.
546 An octothorpe inside balanced pairs of double quotation marks (@samp{"})
547 or single quotation marks (@samp{'}) does not introduce a comment.
550 The backslash character can be used inside balanced quotes of either
551 type to escape the following character as a literal character.
553 (This is distinct from the use of a backslash as a line-splicing character.)
557 Line splicing takes place before comment removal.
561 Blank lines, and lines that contain only whitespace, are ignored.
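
The fragment below illustrates these rules together. It is only a
sketch: the directive names are invented for the example and do not
belong to any real PSPP configuration file.

@example
# This whole line is a comment.
first-directive "a # inside quotes is literal"   # trailing comment
second-directive one two \
    three four
# This comment ends in a backslash, \
so this line is spliced onto it and is also part of the comment.
@end example

After processing, two directives remain. The second one is read as a
single line containing @samp{one two} followed by @samp{three four}.
Because splicing happens before comment removal, the last two physical
lines form one comment.
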
564 @node Environment variables, Output devices, Configuration files, Configuration
565 @section Environment variables
567 You may think the concept of environment variables is a fairly simple
568 one. However, the author of PSPP has found a way to complicate
569 even something so simple. Environment variables are further described
570 in the sections below:
573 * Variable values:: Values of variables are determined this way.
574 * Environment substitutions:: How environment substitutions are made.
575 * Predefined variables:: A few variables are automatically defined.
578 @node Variable values, Environment substitutions, Environment variables, Environment variables
579 @subsection Values of environment variables
581 Values for environment variables are obtained by the following means,
582 which are arranged in order of decreasing precedence:
586 Command-line options. @xref{Invocation}.
589 The @file{environment} configuration file---more on this below.
592 Actual environment variables (defined in the shell or other parent process).
596 The @file{environment} configuration file is located through application
597 of the usual algorithm for configuration files (@pxref{File locations}),
598 except that its contents do not affect the search path used to find
599 @file{environment} itself. Use of @file{environment} is discouraged on
600 systems that allow an arbitrarily large environment; it is supported for
601 use on systems like MS-DOS that limit environment size.
603 @file{environment} is composed of lines having the form
604 @samp{@var{key}=@var{value}}, where @var{key} and the equals sign
605 (@samp{=}) are required, and @var{value} is optional. If @var{value} is
606 given, variable @var{key} is given that value; if @var{value} is absent,
607 variable @var{key} is undefined (deleted). Variables may not be defined
610 Environment substitutions are performed on each line in the file
611 (@pxref{Environment substitutions}).
613 See @ref{Configuration files}, for more details on formatting of the
614 environment configuration file.
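
A minimal @file{environment} file might therefore look like the sketch
below. The variable names are ones mentioned elsewhere in this chapter;
the value @file{my-devices} is a made-up file name.

@example
STAT_OUTPUT_INIT_FILE=my-devices
GROFF_FONT_PATH=
@end example

The first line defines @code{STAT_OUTPUT_INIT_FILE}. The second line,
having no value, undefines @code{GROFF_FONT_PATH}.
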
617 @strong{Please note:} Support for @file{environment} is not yet implemented.
621 @node Environment substitutions, Predefined variables, Variable values, Environment variables
622 @subsection Environment substitutions
624 Much of the power of environment variables lies in the way that they may
625 be substituted into configuration files. Variable substitutions are
628 The line is scanned from left to right. In this scan, all characters
629 other than dollar signs (@samp{$}) are retained unmolested. Dollar
630 signs, however, introduce an environment variable reference. References
635 Replaced by the value of environment variable @var{var}, determined as
636 specified in @ref{Variable values}. @var{var} must be one of the
644 Exactly one nonalphabetic character. This may not be a left brace
649 Same as above, but @var{var} may contain any character (except
653 Replaced by a single dollar sign.
656 Undefined variables expand to an empty value.
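
The following sketch shows how a few references might expand. It
assumes that the three reference forms are written in the usual
shell-like notation (a bare name after the dollar sign, a name in
braces, and a doubled dollar sign), that @code{HOME} is set to
@file{/home/pfaff}, and that @code{NO_SUCH_VARIABLE} is not defined
anywhere:

@example
$HOME/pspp                @result{} /home/pfaff/pspp
$@{HOME@}/pspp              @result{} /home/pfaff/pspp
price: $$5                @result{} price: $5
[$@{NO_SUCH_VARIABLE@}]     @result{} []
@end example
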
658 @node Predefined variables, , Environment substitutions, Environment variables
659 @subsection Predefined environment variables
661 There are two environment variables predefined for use in environment substitutions:
666 Defined as the version number of PSPP, as a string, in a format
667 something like @samp{0.9.4}.
670 Defined as the host architecture of PSPP, as a string, in standard
671 cpu-manufacturer-OS format. For instance, Debian GNU/Linux 1.1 on an
672 Intel machine defines this as @samp{i586-unknown-linux}. This is
673 somewhat dependent on the system used to compile PSPP.
676 Nothing prevents these values from being overridden, although it's a
677 good idea not to do so.
679 @node Output devices, PostScript driver class, Environment variables, Configuration
680 @section Output devices
682 Configuring output devices is the most complicated aspect of configuring
683 PSPP. The output device configuration file is named
684 @file{devices}. It is searched for using the usual algorithm for
685 finding configuration files (@pxref{File locations}). Each line in the
686 file is read in the usual manner for configuration files
687 (@pxref{Configuration files}).
689 Lines in @file{devices} are divided into three categories, described
690 briefly in the table below:
693 @item driver category definitions
694 Define a driver in terms of other drivers.
696 @item macro definitions
697 Define environment variables local to the output driver configuration file.
700 @item device definitions
701 Describe the configuration of an output device.
704 The following sections further elaborate the contents of the @file{devices} file.
708 * Driver categories:: How to organize the driver namespace.
709 * Macro definitions:: Environment variables local to @file{devices}.
710 * Device definitions:: Output device descriptions.
711 * Dimensions:: Lengths, widths, sizes, @enddots{}
712 * papersize:: Letter, legal, A4, envelope, @enddots{}
713 * Distinguishing line types:: Details on @file{devices} parsing.
714 * Tokenizing lines:: Dividing @file{devices} lines into tokens.
717 @node Driver categories, Macro definitions, Output devices, Output devices
718 @subsection Driver categories
720 Drivers can be divided into categories. Drivers are specified by their
721 names, or by the names of the categories that they are contained in.
722 Only certain drivers are enabled each time PSPP is run; by
723 default, these are the drivers in the category `default'. To enable a
724 different set of drivers, use the @samp{-o @var{device}} command-line
725 option (@pxref{Invocation}).
727 Categories are specified with a line of the form
728 @samp{@var{category}=@var{driver1} @var{driver2} @var{driver3} @var{@dots{}}
729 @var{driver@var{n}}}. This line specifies that the category
730 @var{category} is composed of drivers named @var{driver1},
731 @var{driver2}, and so on. There may be any number of drivers in the
732 category, from zero on up.
734 Categories may also be specified on the command line
735 (@pxref{Invocation}).
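
As an illustration, the category lines in @file{devices} might look
like the sketch below. All of the category and driver names other than
@samp{default} are invented for the example.

@example
default=tty list
tty=tty-ascii
list=list-ascii
@end example

With these lines, running PSPP without @samp{-o} enables the drivers
@samp{tty-ascii} and @samp{list-ascii} through the @samp{default}
category, whereas @samp{-o list} would enable only @samp{list-ascii}.
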
737 This is all you need to know about categories. If you're still curious,
740 First of all, the term `categories' is a bit of a misnomer. In fact,
741 the internal representation is nothing like the hierarchy that the term
742 seems to imply: a linear list is used to keep track of the enabled drivers.
745 When PSPP first begins reading @file{devices}, this list contains
746 the name of any drivers or categories specified on the command line, or
747 the single item `default' if none were specified.
749 Each time a category definition is specified, the list is searched for
750 an item with the value of @var{category}. If a matching item is found,
751 it is deleted and the list of drivers (@var{driver1} through
752 @var{driver@var{n}}) is appended to the list; otherwise the list is unchanged.
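
To make this concrete, the trace below shows how the list would evolve
for a few category lines, assuming nothing was specified on the command
line. The names other than @samp{default} are hypothetical. The left
column is the line just read and the right column is the list
afterward.

@example
(start)               default
default=tty list      tty list
tty=tty-ascii         list tty-ascii
list=list-ascii       tty-ascii list-ascii
extra=other-driver    tty-ascii list-ascii
@end example

The last line changes nothing because @samp{extra} is not in the list.
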
754 Each time a driver definition line is encountered, the list is searched.
755 If the list contains an item with that driver's name, the driver is
756 enabled and the item is deleted from the list. Otherwise, the driver is ignored.
759 It is an error if the list is not empty when the end of @file{devices} is reached.
762 @node Macro definitions, Device definitions, Driver categories, Output devices
763 @subsection Macro definitions
765 Macro definitions take the form @samp{define @var{macroname}
766 @var{definition}}. In such a macro definition, the environment variable
767 @var{macroname} is defined to expand to the value @var{definition}.
768 Before the definition is made, however, any macros used in
769 @var{definition} are expanded.
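
For instance, the two lines below sketch one macro being used in the
definition of another. The macro names and the directory are invented
for the example, and the reference is assumed to use the plain
dollar-sign form of environment substitution.

@example
define fontdir /usr/local/share/fonts
define encodings $fontdir/encodings
@end example

Because @samp{fontdir} is expanded when the second line is read,
@samp{encodings} is defined as
@file{/usr/local/share/fonts/encodings}.
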
771 Please note the following nuances of macro usage:
775 For the purposes of this section, @dfn{macro} and @dfn{environment
776 variable} are synonyms.
779 Macros may not take arguments.
782 Macros may not recurse.
785 Macros are just environment variable definitions like other environment
786 variable definitions, with the exception that they are limited in scope
787 to the @file{devices} configuration file.
790 Macros override all other environment variables of the same name (within
791 the scope of @file{devices}).
794 Earlier macro definitions for a particular @var{key} override later
795 ones. In particular, macro definitions on the command line override
796 those in the device definition file. @xref{Non-option Arguments}.
799 There are two predefined macros, whose values are determined at runtime:
803 Defined as the width of the console screen, in columns of text.
806 Defined as the length of the console screen, in lines of text.
810 @node Device definitions, Dimensions, Macro definitions, Output devices
811 @subsection Driver definitions
813 Driver definitions are the ultimate purpose of the @file{devices}
814 configuration file. These are where the real action is. Driver
815 definitions tell PSPP where it should send its output.
817 Each driver definition line is divided into four fields. These fields
818 are delimited by colons (@samp{:}). Each line is subjected to
819 environment variable interpolation before it is processed further
820 (@pxref{Environment substitutions}). From left to right, the four
821 fields are, in brief:
825 A unique identifier, used to determine whether to enable the driver.
828 One of the predefined driver classes supported by PSPP. The
829 currently supported driver classes include `postscript' and `ascii'.
832 Zero or more of the following keywords, delimited by spaces:
837 Indicates that the device is a screen display. This may reduce the
838 amount of buffering done by the driver, to make interactive use more
843 Indicates that the device is a printer.
847 Indicates that the device is a listing file.
850 These options are just hints to PSPP and do not cause the output to be
851 directed to the screen, or to the printer, or to a listing file---those
852 must be set elsewhere in the options. They are used primarily to decide
853 which devices should be enabled at any given time. @xref{SET}, for more information.
857 An optional set of options to pass to the driver itself. The exact
858 format for the options varies among drivers.
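
Put together, driver definition lines might look like the sketch below.
The driver names are invented, the device-type keywords are assumed to
be spelled @samp{screen}, @samp{listing}, and @samp{printer}, and the
options field is left empty because its contents depend on the driver
class.

@example
tty-ascii:ascii:screen:
list-ascii:ascii:listing:
print-ps:postscript:printer:
@end example
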
861 The driver is enabled if:
865 Its driver name is specified on the command line, or
868 It's in a category specified on the command line, or
871 If no categories or driver names are specified on the command line, it
872 is in category @code{default}.
875 For more information on driver names, see @ref{Driver categories}.
877 The class name must be one of those supported by PSPP. The
878 classes supported depend on the options with which PSPP was
879 compiled. See later sections in this chapter for descriptions of the
880 available driver classes.
882 Options are dependent on the driver. See the driver descriptions for details.
885 @node Dimensions, papersize, Device definitions, Output devices
886 @subsection Dimensions
888 Quite often in configuration it is necessary to specify a length or a
889 size. PSPP uses a common syntax for all such, calling them
890 collectively by the name @dfn{dimensions}.
894 You can specify dimensions in decimal form (@samp{12.5}) or as
895 fractions, either as mixed numbers (@samp{12-1/2}) or raw fractions
899 A number of different units are available. These are suffixed to the
900 numeric part of the dimension. There must be no spaces between the
901 number and the unit. The available units are identical to those offered
902 by the popular typesetting system @TeX{}:
906 inch (1 @code{in} = 2.54 @code{cm})
909 inch (1 @code{in} = 2.54 @code{cm})
912 printer's point (1 @code{in} = 72.27 @code{pt})
915 pica (12 @code{pt} = 1 @code{pc})
918 PostScript point (1 @code{in} = 72 @code{bp})
924 millimeter (10 @code{mm} = 1 @code{cm})
927 didot point (1157 @code{dd} = 1238 @code{pt})
930 cicero (1 @code{cc} = 12 @code{dd})
933 scaled point (65536 @code{sp} = 1 @code{pt})
937 If no explicit unit is given, PSPP attempts to guess the best unit:
941 Numbers less than 50 are assumed to be in inches.
944 Numbers 50 or greater are assumed to be in millimeters.
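
A few sample dimensions, with the way each would be interpreted under
the rules above, are sketched here. The right-hand column is
commentary, not part of the dimension.

@example
12.5cm      12.5 centimeters
8-1/2in     8.5 inches, that is, 21.59 cm
72bp        72 PostScript points, that is, 1 inch
6pc         6 picas, that is, 72 printer's points
11          no unit and less than 50, so 11 inches
210         no unit and 50 or greater, so 210 millimeters
@end example
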
948 @node papersize, Distinguishing line types, Dimensions, Output devices
949 @subsection Paper sizes
951 Output drivers usually deal with some sort of hardcopy media. This
952 media is called @dfn{paper} by the drivers, though in reality it could
953 be a transparency or film or thinly veiled sarcasm. To make it easier
954 for you to deal with paper, PSPP allows you to have (of course!) a
955 configuration file that gives symbolic names, like ``letter'' or
956 ``legal'' or ``a4'', to paper sizes, rather than forcing you to use
957 cryptic numbers like ``8-1/2 x 11'' or ``210 by 297''. Surprisingly
958 enough, this configuration file is named @file{papersize}.
959 @xref{Configuration files}.
961 When PSPP tries to connect a symbolic paper name to a paper size, it
962 reads and parses each non-comment line in the file, in order. The first
963 field on each line must be a symbolic paper name in double quotes.
964 Paper names may not contain double quotes. Paper names are not
965 case-sensitive: @samp{legal} and @samp{Legal} are equivalent.
967 If a match is found for the paper name, the rest of the line is parsed.
968 If it is found to be a pair of dimensions (@pxref{Dimensions}) separated
969 by either @samp{x} or @samp{by}, then those are taken to be the paper
970 size, in order of width followed by length. There @emph{must} be at
971 least one space on each side of @samp{x} or @samp{by}.
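
As a sketch, size lines in @file{papersize} might look like this:

@example
"letter" 8-1/2 x 11
"legal" 8-1/2 x 14
"a4" 210 by 297
@end example

No units are given here, so the guessing rule for dimensions applies:
the first two sizes are taken as inches and the third as millimeters.
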
973 Otherwise the line must be of the form
974 @samp{"@var{paper-1}"="@var{paper-2}"}. In this case the target of the
975 search becomes paper name @var{paper-2} and the search through the file
978 @node Distinguishing line types, Tokenizing lines, papersize, Output devices
979 @subsection How lines are divided into types
981 The lines in @file{devices} are distinguished in the following manner:
985 Leading whitespace is removed.
988 If the resulting line begins with the exact string @code{define},
989 followed by one or more whitespace characters, the line is processed as a macro definition.
993 Otherwise, the line is scanned for the first instance of a colon
994 (@samp{:}) or an equals sign (@samp{=}).
997 If a colon is encountered first, the line is processed as a driver definition.
1001 Otherwise, if an equals sign is encountered, the line is processed as a category definition.
1005 Otherwise, the line is ill-formed.
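The decision procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name @code{classify_line} and the labels it returns are invented here, and comment or blank lines are assumed to have been removed already.

```python
def classify_line(line):
    """Classify one line of the devices file by the rules listed above."""
    stripped = line.lstrip()  # leading whitespace is removed first
    # `define' must be followed by at least one whitespace character.
    if stripped.startswith("define") and stripped[6:7].isspace():
        return "define"
    colon = stripped.find(":")
    equals = stripped.find("=")
    if colon != -1 and (equals == -1 or colon < equals):
        return "colon-first"
    if equals != -1:
        return "equals-first"
    return "ill-formed"

for sample in ("  define a b", "name:class:x=y", "a=b:c", "nothing here"):
    print(repr(sample), "->", classify_line(sample))
```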
1008 @node Tokenizing lines, , Distinguishing line types, Output devices
1009 @subsection How lines are divided into tokens
1011 Each driver definition line is run through a simple tokenizer. This
1012 tokenizer recognizes two basic types of tokens.
1014 The first type is an equals sign (@samp{=}). Equals signs are both
1015 delimiters between tokens and tokens in themselves.
1017 The second type is an identifier or string token. Identifiers and
1018 strings are equivalent after tokenization, though they are written
1019 differently. An identifier is any string of characters other than
1020 whitespace or equals sign.
1022 A string is introduced by a single- or double-quote character (@samp{'}
1023 or @samp{"}) and, in general, continues until the next occurrence of
1024 that same character. The following standard C escapes can also be
1025 embedded within strings:
1029 A single-quote (@samp{'}).
1032 A double-quote (@samp{"}).
1035 A question mark (@samp{?}). Included for hysterical raisins.
1038 A backslash (@samp{\}).
1041 Audio bell (ASCII 7).
1044 Backspace (ASCII 8).
1047 Formfeed (ASCII 12).
1053 Carriage return (ASCII 13).
1059 Vertical tab (ASCII 11).
1061 @item \@var{o}@var{o}@var{o}
1062 Each @samp{o} must be an octal digit. The character is the one having
1063 the octal value specified. Any number of octal digits is read and
1064 interpreted; only the lower 8 bits are used.
1066 @item \x@var{h}@var{h}
1067 Each @samp{h} must be a hex digit. The character is the one having the
1068 hexadecimal value specified. Any number of hex digits is read and
1069 interpreted; only the lower 8 bits are used.
Tokens, outside of quoted strings, are delimited by whitespace or equals signs.
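The tokenizing rules can be summarized by the following Python sketch. It is a simplified model, not PSPP's tokenizer. Several points are assumptions made for illustration: the names @code{tokenize} and @code{read_string}, the inclusion of the standard C escapes @code{\n} and @code{\t}, the handling of unknown escapes (the escaped character is kept), and the treatment of a quote character in the middle of an identifier as ordinary text.

```python
SIMPLE = {"'": "'", '"': '"', "?": "?", "\\": "\\", "a": "\a", "b": "\b",
          "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
OCT = "01234567"
HEX = "0123456789abcdefABCDEF"

def read_string(line, i, quote):
    """Read a quoted string whose body starts at index i; return (next index, text)."""
    out, n = [], len(line)
    while i < n and line[i] != quote:
        if line[i] != "\\" or i + 1 == n:
            out.append(line[i])
            i += 1
            continue
        e = line[i + 1]
        i += 2
        if e in SIMPLE:
            out.append(SIMPLE[e])
        elif e in OCT or e == "x":
            digits, base = (HEX, 16) if e == "x" else (OCT, 8)
            start = i if e == "x" else i - 1
            j = start
            while j < n and line[j] in digits:  # any number of digits is read
                j += 1
            if j == start:
                out.append(e)  # `\x' without digits: keep the `x' (assumption)
            else:
                out.append(chr(int(line[start:j], base) & 0xFF))  # lower 8 bits
            i = j
        else:
            out.append(e)  # unknown escape (assumption)
    return i + 1, "".join(out)

def tokenize(line):
    """Split a line into ("equals", "=") and ("text", value) tokens."""
    tokens, i, n = [], 0, len(line)
    while i < n:
        c = line[i]
        if c.isspace():
            i += 1
        elif c == "=":
            tokens.append(("equals", "="))
            i += 1
        elif c in "'\"":
            i, text = read_string(line, i + 1, c)
            tokens.append(("text", text))
        else:
            start = i
            while i < n and not line[i].isspace() and line[i] != "=":
                i += 1
            tokens.append(("text", line[start:i]))
    return tokens

print(tokenize('output-file="pspp.ps" tab-width = 8'))
```

Identifiers and strings both come out as @code{"text"} tokens, which reflects the statement that the two are equivalent after tokenization.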
1075 @node PostScript driver class, ASCII driver class, Output devices, Configuration
1076 @section The PostScript driver class
1078 The @code{postscript} driver class is used to produce output that is
1079 acceptable to PostScript printers and to PC-based PostScript
1080 interpreters such as Ghostscript. Continuing a long tradition,
1081 PSPP's PostScript driver is configurable to the point of
1084 There are actually two PostScript drivers. The first one,
1085 @samp{postscript}, produces ordinary DSC-compliant PostScript output.
The second one, @samp{epsf}, produces an Encapsulated PostScript file.
1087 The two drivers are otherwise identical in configuration and in
1090 The PostScript driver is described in further detail below.
1093 * PS output options:: Output file options.
1094 * PS page options:: Paper, margins, scaling & rotation, more!
1095 * PS file options:: Configuration files.
1096 * PS font options:: Default fonts, font options.
1097 * PS line options:: Line widths, options.
1098 * Prologue:: Details on the PostScript prologue.
1099 * Encodings:: Details on PostScript font encodings.
1102 @node PS output options, PS page options, PostScript driver class, PostScript driver class
1103 @subsection PostScript output options
1105 These options deal with the form of the output and the output file
1109 @item output-file=@var{filename}
File to which output should be sent. This can be an ordinary filename
(e.g., @code{"pspp.ps"}), a pipe filename (e.g., @code{"|lpr"}), or
stdout (@code{"-"}). Default: @code{"pspp.ps"}.
1115 @item color=@var{boolean}
Most of the time black-and-white PostScript devices are smart enough to
map colors to shades themselves. However, you can make PSPP's
PostScript driver do its own (ugly) simulation of this mapping by
turning @code{color} off. Default: @code{on}.
1122 This is a boolean setting, as are many settings in the PostScript
1123 driver. Valid positive boolean values are @samp{on}, @samp{true},
1124 @samp{yes}, and nonzero integers. Negative boolean values are
1125 @samp{off}, @samp{false}, @samp{no}, and zero.
1127 @item data=@var{data-type}
1129 One of @code{clean7bit}, @code{clean8bit}, or @code{binary}. This
1130 controls what characters will be written to the output file. PostScript
1131 produced with @code{clean7bit} can be transmitted over 7-bit
1132 transmission channels that use ASCII control characters for line
1133 control. @code{clean8bit} is similar but allows characters above 127 to
1134 be written to the output file. @code{binary} allows any character in
1135 the output file. Default: @code{clean7bit}.
1137 @item line-ends=@var{line-end-type}
1139 One of @code{cr}, @code{lf}, or @code{crlf}. This controls what is used
1140 for newline in the output file. Default: @code{cr}.
1142 @item optimize-line-size=@var{level}
1144 Either @code{0} or @code{1}. If @var{level} is @code{1}, then short
1145 line segments will be collected and merged into longer ones. This
1146 reduces output file size but requires more time and memory. A
1147 @var{level} of @code{0} has the advantage of being better for
1148 interactive environments. @code{1} is the default unless the
1149 @code{screen} flag is set; in that case, the default is @code{0}.
1151 @item optimize-text-size=@var{level}
1153 One of @code{0}, @code{1}, or @code{2}, each higher level representing
1154 correspondingly more aggressive space savings for text in the output
1155 file and requiring correspondingly more time and memory. Unfortunately
1156 the levels presently are all the same. @code{1} is the default unless
1157 the @code{screen} flag is set; in that case, the default is @code{0}.
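The boolean values accepted by settings such as @code{color} can be recognized as in the following Python sketch. The function name @code{parse_boolean} is hypothetical, and the sketch assumes that the keywords are matched exactly as written above; whether PSPP accepts other spellings or letter cases is not stated here.

```python
def parse_boolean(value):
    """Map a boolean setting string to True or False."""
    if value in ("on", "true", "yes"):
        return True
    if value in ("off", "false", "no"):
        return False
    try:
        number = int(value)  # nonzero integers are positive, zero is negative
    except ValueError:
        raise ValueError("not a boolean setting: " + repr(value)) from None
    return number != 0

print(parse_boolean("on"), parse_boolean("0"), parse_boolean("2"))
```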
1160 @node PS page options, PS file options, PS output options, PostScript driver class
1161 @subsection PostScript page options
1163 These options affect page setup:
1166 @item headers=@var{boolean}
1168 Controls whether the standard headers showing the time and date and
1169 title and subtitle are printed at the top of each page. Default:
1172 @item paper-size=@var{paper-size}
Paper size, either as a symbolic name (e.g., @code{letter} or @code{a4})
or specific measurements (e.g., @code{8-1/2x11} or @code{"210 x 297"}).
@xref{papersize, , Paper sizes}. Default: @code{letter}.
1178 @item orientation=@var{orientation}
1180 Either @code{portrait} or @code{landscape}. Default: @code{portrait}.
1182 @item left-margin=@var{dimension}
1183 @itemx right-margin=@var{dimension}
1184 @itemx top-margin=@var{dimension}
1185 @itemx bottom-margin=@var{dimension}
1187 Sets the margins around the page. The headers, if enabled, are not
1188 included in the margins; they are in addition to the margins. For a
1189 description of dimensions, see @ref{Dimensions}. Default: @code{0.5in}.
1193 @node PS file options, PS font options, PS page options, PostScript driver class
1194 @subsection PostScript file options
1196 Oh, my. You don't really want to know about the way that the PostScript
1197 driver deals with files, do you? Well I suppose you're entitled, but I
1198 warn you right now: it's not pretty. Here goes@enddots{}
1200 First let's look at the options that are available:
1204 @item font-dir=@var{font-directory}
1206 Sets the font directory. Default: @code{devps}.
1208 @item prologue-file=@var{prologue-file-name}
1210 Sets the name of the PostScript prologue file. You can write your own
1211 prologue, though I have no idea why you'd want to: see @ref{Prologue}.
1212 Default: @code{ps-prologue}.
1214 @item device-file=@var{device-file-name}
1216 Sets the name of the Groff-format device description file. The
1217 PostScript driver reads this to know about the scaling of fonts
1218 and so on. The format of such files is described in groff_font(5),
1219 included with Groff. Default: @code{DESC}.
1221 @item encoding-file=@var{encoding-file-name}
1223 Sets the name of the encoding file. This file contains a list of all
1224 font encodings that will be needed so that the driver can put all of
1225 them at the top of the prologue. @xref{Encodings}. Default:
1226 @code{ps-encodings}.
If the specified encoding file cannot be found, this error will be
silently ignored, since most people do not need any encodings besides
the ones that can be found using @code{auto-encode}, described below.
1232 @item auto-encode=@var{boolean}
1234 When enabled, the font encodings needed by the default proportional- and
1235 fixed-pitch fonts will automatically be dumped to the PostScript
1236 output. Otherwise, it is assumed that the user has an encoding file
1237 and knows how to use it (@pxref{Encodings}). There is probably no good
1238 reason to turn off this convenient feature. Default: @code{on}.
1242 Next I suppose it's time to describe the search algorithm. When the
1243 PostScript driver needs a file, whether that file be a font, a
1244 PostScript prologue, or what you will, it searches in this manner:
1249 Constructs a path by taking the first of the following that is defined:
1254 Environment variable @code{STAT_GROFF_FONT_PATH}. @xref{Environment
1258 Environment variable @code{GROFF_FONT_PATH}.
1261 The compiled-in fallback default.
1265 Constructs a base name from concatenating, in order, the font directory,
1266 a path separator (@samp{/} or @samp{\}), and the file to be found. A
1267 typical base name would be something like @code{devps/ps-encodings}.
1270 Searches for the base name in the path constructed above. If the file
1271 is found, the algorithm terminates.
1274 Searches for the base name in the standard configuration path. See
1275 @ref{File locations}, for more details. If the file is found, the
1276 algorithm terminates.
1279 At this point we remove the font directory and path separator from the
1280 base name. Now the base name is simply the file to be found, i.e.,
1281 @code{ps-encodings}.
1284 Searches for the base name in the path constructed in the first step.
1285 If the file is found, the algorithm terminates.
1288 Searches for the base name in the standard configuration path. If the
1289 file is found, the algorithm terminates.
1292 The algorithm terminates unsuccessfully.
1295 So, as you see, there are several ways to configure the PostScript
1296 drivers. Careful selection of techniques can make the configuration
1297 very flexible indeed.
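The search order described above can be written out as a short Python sketch. It is an illustration only, with several assumptions: the function and parameter names are hypothetical, search paths are taken to be strings of directories joined by the platform's path-list separator, empty path entries are skipped, and the @code{exists} parameter is there only so that the sketch can be tried without touching the file system.

```python
import os

def find_ps_file(name, font_dir, config_path, fallback_path,
                 environ=os.environ, exists=os.path.isfile):
    """Return the first matching file name, or None if the search fails."""
    # Step 1: the font path is the first of these that is defined.
    for var in ("STAT_GROFF_FONT_PATH", "GROFF_FONT_PATH"):
        if var in environ:
            font_path = environ[var]
            break
    else:
        font_path = fallback_path
    # First try "font-dir/name", then the bare name.
    for base in (os.path.join(font_dir, name), name):
        # Each base name is tried in the font path, then the configuration path.
        for search_path in (font_path, config_path):
            for directory in search_path.split(os.pathsep):
                if not directory:
                    continue
                candidate = os.path.join(directory, base)
                if exists(candidate):
                    return candidate
    return None  # the algorithm terminates unsuccessfully

present = {os.path.join("/cfg", "devps", "DESC")}
print(find_ps_file("DESC", "devps", "/cfg", "/fallback",
                   environ={"GROFF_FONT_PATH": "/fonts"},
                   exists=present.__contains__))
```

Note that the font directory form of the base name is tried in both paths before the bare file name is tried in either.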
1299 @node PS font options, PS line options, PS file options, PostScript driver class
1300 @subsection PostScript font options
1302 The list of available font options is short and sweet:
1305 @item prop-font=@var{font-name}
1307 Sets the default proportional font. The name should be that of a
1308 PostScript font. Default: @code{"Helvetica"}.
1310 @item fixed-font=@var{font-name}
1312 Sets the default fixed-pitch font. The name should be that of a
1313 PostScript font. Default: @code{"Courier"}.
1315 @item font-size=@var{font-size}
1317 Sets the size of the default fonts, in thousandths of a point. Default:
1322 @node PS line options, Prologue, PS font options, PostScript driver class
1323 @subsection PostScript line options
1325 Most tables contain lines, or rules, between cells. Some features of
1326 the way that lines are drawn in PostScript tables are user-definable:
1330 @item line-style=@var{style}
1332 Sets the style used for lines used to divide tables into sections.
1333 @var{style} must be either @code{thick}, in which case thick lines are
used, or @code{double}, in which case double lines are used. Default:
1337 @item line-gutter=@var{dimension}
1339 Sets the line gutter, which is the amount of whitespace on either side
1340 of lines that border text or graphics objects. @xref{Dimensions}.
1341 Default: @code{0.5pt}.
1343 @item line-spacing=@var{dimension}
1345 Sets the line spacing, which is the amount of whitespace that separates
1346 lines that are side by side, as in a double line. Default:
1349 @item line-width=@var{dimension}
1351 Sets the width of a typical line used in tables. Default: @code{0.5pt}.
1353 @item line-width-thick=@var{dimension}
1355 Sets the width of a thick line used in tables. Not used if
1356 @code{line-style} is set to @code{thick}. Default: @code{1.5pt}.
1360 @node Prologue, Encodings, PS line options, PostScript driver class
1361 @subsection The PostScript prologue
1363 Most PostScript files that are generated mechanically by programs
1364 consist of two parts: a prologue and a body. The prologue is generally
1365 a collection of boilerplate. Only the body differs greatly between
1366 two outputs from the same program.
1368 This is also the strategy used in the PSPP PostScript driver. In
1369 general, the prologue supplied with PSPP will be more than sufficient.
1370 In this case, you will not need to read the rest of this section.
1371 However, hackers might want to know more. Read on, if you fall into
1374 The prologue is dumped into the output stream essentially unmodified.
1375 However, two actions are performed on its lines. First, certain lines
1376 may be omitted as specified in the prologue file itself. Second,
1377 variables are substituted.
1379 The following lines are omitted:
1383 All lines that contain three bangs in a row (@code{!!!}).
1386 Lines that contain @code{!eps}, if the PostScript driver is producing
1387 ordinary PostScript output. Otherwise an EPS file is being produced,
1388 and the line is included in the output, although everything following
1389 @code{!eps} is deleted.
1392 Lines that contain @code{!ps}, if the PostScript driver is producing EPS
1393 output. Otherwise, ordinary PostScript is being produced, and the line
1394 is included in the output, although everything following @code{!ps} is
1398 The following are the variables that are substituted. Only the
1399 variables listed are substituted; environment variables are not.
1400 @xref{Environment substitutions}.
1405 The page bounding box, in points, as four space-separated numbers. For
1406 U.S. letter size paper, this is @samp{0 0 612 792}.
1410 PSPP version as a string: @samp{GNU PSPP 0.1b}, for example.
1414 Date the file was created. Example: @samp{Tue May 21 13:46:22 1991}.
1418 Value of the @code{data} PostScript driver option, as one of the strings
1419 @samp{Clean7Bit}, @samp{Clean8Bit}, or @samp{Binary}.
1423 Page orientation, as one of the strings @code{Portrait} or
1428 Under multiuser OSes, the user's login name, taken either from the
1429 environment variable @code{LOGNAME} or, if that fails, the result of the
1430 C library function @code{getlogin()}. Defaults to @samp{nobody}.
1434 System hostname as reported by @code{gethostname()}. Defaults to
1439 Name of the default proportional font, prefixed by the word
1440 @samp{font} and a space. Example: @samp{font Times-Roman}.
1444 Name of the default fixed-pitch font, prefixed by the word @samp{font}
1449 The page scaling factor as a floating-point number. Example:
1450 @code{1.0}. Note that this is also passed as an argument to the BP
1456 The paper length and paper width, respectively, in thousandths of a
1457 point. Note that these are also passed as arguments to the BP macro.
1462 The left margin and top margin, respectively, in thousandths of a
1463 point. Note that these are also passed as arguments to the BP macro.
1467 Document title as a string. This is not the title specified in the
1468 PSPP syntax file. A typical title is the word @samp{PSPP} followed
1469 by the syntax file name in parentheses. Example: @samp{PSPP
1474 PSPP syntax file name. Example: @samp{mary96/first.stat}.
1478 Any other questions about the PostScript prologue can best be answered
1479 by examining the default prologue or the PSPP source.
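For readers who prefer code to prose, the line-omission rules for the prologue can be modeled by this Python sketch. It covers only the @code{!!!}, @code{!eps}, and @code{!ps} markers, not variable substitution. The function name @code{filter_prologue} and the sample lines are hypothetical, and the sketch assumes that the marker itself is deleted along with everything that follows it.

```python
def filter_prologue(lines, eps):
    """Apply the omission rules; `eps` is True when producing EPS output."""
    keep_marker, drop_marker = ("!eps", "!ps") if eps else ("!ps", "!eps")
    out = []
    for line in lines:
        # Lines with three bangs, or with the other format's marker, are omitted.
        if "!!!" in line or drop_marker in line:
            continue
        cut = line.find(keep_marker)
        if cut != -1:
            line = line[:cut]  # drop the marker and the rest of the line
        out.append(line)
    return out

sample = ["(ordinary) show !ps", "(encapsulated) show !eps",
          "scratch note !!!", "showpage"]
print(filter_prologue(sample, eps=False))
print(filter_prologue(sample, eps=True))
```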
1481 @node Encodings, , Prologue, PostScript driver class
1482 @subsection PostScript encodings
1484 PostScript fonts often contain many more than 256 characters, in order
1485 to accommodate foreign language characters and special symbols.
1486 PostScript uses @dfn{encodings} to map these onto single-byte symbol
1487 sets. Each font can have many different encodings applied to it.
1489 PSPP's PostScript driver needs to know which encoding to apply to each
1490 font. It can determine this from the information encapsulated in the
1491 Groff font description that it reads. However, there is an additional
1492 problem---for efficiency, the PostScript driver needs to have a complete
1493 list of all encodings that will be used in the entire session @emph{when
1494 it opens the output file}. For this reason, it can't use the
1495 information built into the fonts because it doesn't know which fonts
1498 As a stopgap solution, there are two mechanisms for specifying which
1499 encodings will be used. The first mechanism is automatic and it is the
1500 only one that most PSPP users will ever need. The second mechanism is
1501 manual, but it is more flexible. Either mechanism or both may be used
1504 The first mechanism is activated by the @samp{auto-encode} driver option
1505 (@pxref{PS file options}). When enabled, @samp{auto-encode} causes the
1506 PostScript driver to include the encodings used by the default
1507 proportional and fixed-pitch fonts (@pxref{PS font options}). Many
1508 PSPP output files will only need these encodings.
1510 The second mechanism is the file specified by the @samp{encoding-file}
1511 option (@pxref{PS file options}). If it exists, this file must consist
1512 of lines in PSPP configuration-file format (@pxref{Configuration
1513 files}). Each line that is not a comment should name a PostScript
1514 encoding to include in the output.
1516 It is not an error if an encoding is included more than once, by either
1517 mechanism. It will appear only once in the output. It is also not an
1518 error if an encoding is included in the output but never used. It
1519 @emph{is} an error if an encoding is used but not included by one of
1520 these mechanisms. In this case, the built-in PostScript encoding
1521 @samp{ISOLatin1Encoding} is substituted.
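The way the two mechanisms combine can be sketched in Python as follows. This is a model of the rules stated above, not the driver's code. All names are hypothetical, and the sketch assumes that comment lines in the encoding file begin with @samp{#} and that blank lines are ignored.

```python
def encodings_to_emit(auto_encode, default_font_encodings, encoding_file_lines):
    """Return the encodings to include in the output, each one only once."""
    candidates = list(default_font_encodings) if auto_encode else []
    for line in encoding_file_lines:
        name = line.strip()
        if name and not name.startswith("#"):  # assumed comment syntax
            candidates.append(name)
    emitted = []
    for name in candidates:
        if name not in emitted:  # duplicates are not an error
            emitted.append(name)
    return emitted

def encoding_for_font(wanted, emitted):
    """An encoding that was used but never included falls back to ISOLatin1Encoding."""
    return wanted if wanted in emitted else "ISOLatin1Encoding"

emitted = encodings_to_emit(True, ["A", "B"], ["# comment", "B", "C", ""])
print(emitted, encoding_for_font("Z", emitted))
```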
1523 @node ASCII driver class, HTML driver class, PostScript driver class, Configuration
1524 @section The ASCII driver class
The ASCII driver class produces output that can be displayed on a
terminal or output to printers. It is highly configurable.
The ASCII driver has class name @samp{ascii}.
1530 The ASCII driver is described in further detail below.
1533 * ASCII output options:: Output file options.
1534 * ASCII page options:: Page size, margins, more.
1535 * ASCII font options:: Box character, bold & italics.
1538 @node ASCII output options, ASCII page options, ASCII driver class, ASCII driver class
1539 @subsection ASCII output options
1542 @item output-file=@var{filename}
1544 File to which output should be sent. This can be an ordinary filename
1545 (e.g., @code{"pspp.txt"}), a pipe filename (e.g., @code{"|lpr"}), or
1546 stdout (@code{"-"}). Default: @code{"pspp.list"}.
1548 @item char-set=@var{char-set-type}
1550 One of @samp{ascii} or @samp{latin1}. This has no effect on output at
1551 the present time. Default: @code{ascii}.
1553 @item form-feed-string=@var{form-feed-value}
1555 The string written to the output to cause a formfeed. See also
1556 @code{paginate}, described below, for a related setting. Default:
1559 @item newline-string=@var{newline-value}
1561 The string written to the output to cause a newline (carriage return
1562 plus linefeed). The default, which can be specified explicitly with
1563 @code{newline-string=default}, is to use the system-dependent newline
1564 sequence by opening the output file in text mode. This is usually the
1567 However, @code{newline-string} can be set to any string. When this is
1568 done, the output file is opened in binary mode.
1570 @item paginate=@var{boolean}
1572 If set, a formfeed (as set in @code{form-feed-string}, described above)
1573 will be written to the device after every page. Default: @code{on}.
1575 @item tab-width=@var{tab-width-value}
1577 The distance between tab stops for this device. If set to 0, tabs will
1578 not be used in the output. Default: @code{8}.
@item init=@var{initialization-string}
1582 String written to the device before anything else, at the beginning of
1583 the output. Default: @code{""} (the empty string).
@item done=@var{finalization-string}
1587 String written to the device after everything else, at the end of the
1588 output. Default: @code{""} (the empty string).
1591 @node ASCII page options, ASCII font options, ASCII output options, ASCII driver class
1592 @subsection ASCII page options
1594 These options affect page setup:
1597 @item headers=@var{boolean}
1599 If enabled, two lines of header information giving title and subtitle,
1600 page number, date and time, and PSPP version are printed at the top of
1601 every page. These two lines are in addition to any top margin
1602 requested. Default: @code{on}.
1604 @item length=@var{line-count}
1606 Physical length of a page, in lines. Headers and margins are subtracted
1607 from this value. Default: @code{66}.
1609 @item width=@var{character-count}
1611 Physical width of a page, in characters. Margins are subtracted from
1612 this value. Default: @code{130}.
1614 @item lpi=@var{lines-per-inch}
1616 Number of lines per vertical inch. Not currently used. Default: @code{6}.
1618 @item cpi=@var{characters-per-inch}
1620 Number of characters per horizontal inch. Not currently used. Default:
1623 @item left-margin=@var{left-margin-width}
1625 Width of the left margin, in characters. PSPP subtracts this value
1626 from the page width. Default: @code{0}.
1628 @item right-margin=@var{right-margin-width}
1630 Width of the right margin, in characters. PSPP subtracts this value
1631 from the page width. Default: @code{0}.
1633 @item top-margin=@var{top-margin-lines}
1635 Length of the top margin, in lines. PSPP subtracts this value from
1636 the page length. Default: @code{2}.
1638 @item bottom-margin=@var{bottom-margin-lines}
1640 Length of the bottom margin, in lines. PSPP subtracts this value from
1641 the page length. Default: @code{2}.
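The arithmetic implied by these options is simple enough to show directly. The following Python sketch, with the hypothetical name @code{usable_page}, computes the area left for output after the headers and margins have been subtracted, using the defaults listed above.

```python
def usable_page(length=66, width=130, headers=True,
                left=0, right=0, top=2, bottom=2):
    """Return (columns, lines) available for output on one page."""
    header_lines = 2 if headers else 0  # the headers take two lines
    columns = width - left - right
    lines = length - header_lines - top - bottom
    return columns, lines

print(usable_page())               # defaults
print(usable_page(headers=False))  # headers turned off
```

With the default settings this leaves 130 columns by 60 lines; turning headers off gives back two lines.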
1645 @node ASCII font options, , ASCII page options, ASCII driver class
1646 @subsection ASCII font options
1648 These are the ASCII font options:
1651 @item box[@var{line-type}]=@var{box-chars}
1653 The characters used for lines in tables produced by the ASCII driver can
1654 be changed using this option. @var{line-type} is used to indicate which
1655 type of line to change; @var{box-chars} is the character or string of
1656 characters to use for this type of line.
1658 @var{line-type} must be a 4-digit number in base 4. The digits are in
1659 the order `right', `bottom', `left', `top'. The four possibilities for
1673 Special device-defined line, if one is available; otherwise, a double
1682 Sets @samp{|} as the character to use for a single-width line with
1683 bottom and top components.
1687 Sets @samp{#} as the character to use for the intersection of four
1688 double-width lines, one each from the top, bottom, left and right.
1690 @item box[1100]="\xda"
1692 Sets @samp{"\xda"}, which under MS-DOS is a box character suitable for
1693 the top-left corner of a box, as the character for the intersection of
1694 two single-width lines, one each from the right and bottom.
1702 @code{box[0000]=" "}
1705 @code{box[1000]="-"}
1706 @*@code{box[0010]="-"}
1707 @*@code{box[1010]="-"}
1710 @code{box[0100]="|"}
1711 @*@code{box[0001]="|"}
1712 @*@code{box[0101]="|"}
1715 @code{box[2000]="="}
1716 @*@code{box[0020]="="}
1717 @*@code{box[2020]="="}
1720 @code{box[0200]="#"}
1721 @*@code{box[0002]="#"}
1722 @*@code{box[0202]="#"}
1725 @code{box[3000]="="}
1726 @*@code{box[0030]="="}
1727 @*@code{box[3030]="="}
1730 @code{box[0300]="#"}
1731 @*@code{box[0003]="#"}
1732 @*@code{box[0303]="#"}
1735 For all others, @samp{+} is used unless there are double lines or
1736 special lines, in which case @samp{#} is used.
1739 @item italic-on=@var{italic-on-string}
1741 Character sequence written to turn on italics or underline printing. If
1742 this is set to @code{overstrike}, then the driver will simulate
1743 underlining by overstriking with underscore characters (@samp{_}) in the
1744 manner described by @code{overstrike-style} and
1745 @code{carriage-return-style}. Default: @code{overstrike}.
1747 @item italic-off=@var{italic-off-string}
1749 Character sequence to turn off italics or underline printing. Default:
1750 @code{""} (the empty string).
1752 @item bold-on=@var{bold-on-string}
1754 Character sequence written to turn on bold or emphasized printing. If
set to @code{overstrike}, then the driver will simulate bold printing
1756 by overstriking characters in the manner described by
1757 @code{overstrike-style} and @code{carriage-return-style}. Default:
1760 @item bold-off=@var{bold-off-string}
1762 Character sequence to turn off bold or emphasized printing. Default:
1763 @code{""} (the empty string).
1765 @item bold-italic-on=@var{bold-italic-on-string}
1767 Character sequence written to turn on bold-italic printing. If set to
1768 @code{overstrike}, then the driver will simulate bold-italics by
1769 overstriking twice, once with the character, a second time with an
1770 underscore (@samp{_}) character, in the manner described by
1771 @code{overstrike-style} and @code{carriage-return-style}. Default:
1774 @item bold-italic-off=@var{bold-italic-off-string}
1776 Character sequence to turn off bold-italic printing. Default: @code{""}
1779 @item overstrike-style=@var{overstrike-option}
1781 Either @code{single} or @code{line}:
1785 If @code{single} is selected, then, to overstrike a line of text, the
1786 output driver will output a character, backspace, overstrike, output a
1787 character, backspace, overstrike, and so on along a line.
1790 If @code{line} is selected then the output driver will output an entire
1791 line, then backspace or emit a carriage return (as indicated by
1792 @code{carriage-return-style}), then overstrike the entire line at once.
1795 @code{single} is recommended for use with ttys and programs that
1796 understand overstriking in text files, such as the pager @code{less}.
1797 @code{single} will also work with printer devices but results in rapid
1798 back-and-forth motions of the printhead that can cause the printer to
1799 physically overheat!
1801 @code{line} is recommended for use with printer devices. Most programs
1802 that understand overstriking in text files will not properly deal with
1805 Default: @code{single}.
1807 @item carriage-return-style=@var{carriage-return-type}
1809 Either @code{bs} or @code{cr}. This option applies only when one or
1810 more of the font commands is set to @code{overstrike} and, at the same
1811 time, @code{overstrike-style} is set to @code{line}.
1815 If @code{bs} is selected then the driver will return to the beginning of
1816 a line by emitting a sequence of backspace characters (ASCII 8).
1819 If @code{cr} is selected then the driver will return to the beginning of
1820 a line by emitting a single carriage-return character (ASCII 13).
1823 Although @code{cr} is preferred as being more compact, @code{bs} is more
1824 general since some devices do not interpret carriage returns in the
1825 desired manner. Default: @code{bs}.
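The difference between the overstrike styles is easiest to see in the character sequences they produce. The Python sketch below is an illustration, not the driver's code. The function name @code{overstrike} and its parameters are hypothetical, and it assumes that the text starts at the left edge of the line, so that a carriage return brings the output position back to the start of the text.

```python
BS, CR = "\b", "\r"  # ASCII 8 and ASCII 13

def overstrike(text, mark=None, style="single", carriage_return="bs"):
    """Return the characters written to print `text` with overstriking.

    mark=None restrikes each character (bold); mark="_" underlines.
    """
    over = text if mark is None else mark * len(text)
    if style == "single":
        # character, backspace, overstrike, repeated along the line
        return "".join(c + BS + o for c, o in zip(text, over))
    # style "line": the whole line, a return to its start, then the overstrike
    back = BS * len(text) if carriage_return == "bs" else CR
    return text + back + over

print(repr(overstrike("ab", "_")))
print(repr(overstrike("ab", "_", style="line")))
print(repr(overstrike("ab", None, style="line", carriage_return="cr")))
```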
1828 @node HTML driver class, Miscellaneous configuring, ASCII driver class, Configuration
1829 @section The HTML driver class
1831 The @code{html} driver class is used to produce output for viewing in
1832 tables-capable web browsers such as Emacs' w3-mode. Its configuration
1833 is very simple. Currently, the output has a very plain format. In the
1834 future, further work may be done on improving the output appearance.
1836 There are few options for use with the @code{html} driver class:
1839 @item output-file=@var{filename}
File to which output should be sent. This can be an ordinary filename
(e.g., @code{"pspp.html"}), a pipe filename (e.g., @code{"|lpr"}), or
stdout (@code{"-"}). Default: @code{"pspp.html"}.
1845 @item prologue-file=@var{prologue-file-name}
Sets the name of the HTML prologue file. You can write your own
prologue if you want to customize colors or other settings: see
@ref{HTML Prologue}. Default: @code{html-prologue}.
1853 * HTML Prologue:: Format of the HTML prologue file.
1856 @node HTML Prologue, , HTML driver class, HTML driver class
1857 @subsection The HTML prologue
1859 HTML files that are generated by PSPP consist of two parts: a prologue
1860 and a body. The prologue is a collection of boilerplate. Only the body
1861 differs greatly between two outputs. You can tune the colors and other
1862 attributes of the output by editing the prologue.
1864 The prologue is dumped into the output stream essentially unmodified.
1865 However, two actions are performed on its lines. First, certain lines
1866 may be omitted as specified in the prologue file itself. Second,
1867 variables are substituted.
1869 The following lines are omitted:
1873 All lines that contain three bangs in a row (@code{!!!}).
1876 Lines that contain @code{!title}, if no title is set for the output. If
1877 a title is set, then the characters @code{!title} are removed before the
1881 Lines that contain @code{!subtitle}, if no subtitle is set for the
1882 output. If a subtitle is set, then the characters @code{!subtitle} are
1883 removed before the line is output.
1886 The following are the variables that are substituted. Only the
1887 variables listed are substituted; environment variables are not.
1888 @xref{Environment substitutions}.
1893 PSPP version as a string: @samp{GNU PSPP 0.1b}, for example.
1897 Date the file was created. Example: @samp{Tue May 21 13:46:22 1991}.
1901 Under multiuser OSes, the user's login name, taken either from the
1902 environment variable @code{LOGNAME} or, if that fails, the result of the
1903 C library function @code{getlogin()}. Defaults to @samp{nobody}.
1907 System hostname as reported by @code{gethostname()}. Defaults to
1912 Document title as a string. This is the title specified in the PSPP
1917 Document subtitle as a string.
1921 PSPP syntax file name. Example: @samp{mary96/first.stat}.
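The line-omission rules for the HTML prologue can be modeled by the following Python sketch. It handles only the @code{!!!}, @code{!title}, and @code{!subtitle} markers, not variable substitution. The function name @code{filter_html_prologue} and the sample lines are hypothetical, and an empty title or subtitle is assumed to count as ``not set''.

```python
def filter_html_prologue(lines, title=None, subtitle=None):
    """Apply the omission rules to the prologue lines."""
    out = []
    for line in lines:
        if "!!!" in line:
            continue  # always omitted
        omit = False
        for marker, value in (("!title", title), ("!subtitle", subtitle)):
            if marker in line:
                if not value:
                    omit = True  # nothing set, so the line is dropped
                else:
                    line = line.replace(marker, "")  # only the marker is removed
        if not omit:
            out.append(line)
    return out

sample = ["<H1>heading !title</H1>", "<H2>sub !subtitle</H2>",
          "scratch note !!!", "<BODY>"]
print(filter_html_prologue(sample, title="Run 1"))
```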
1924 @node Miscellaneous configuring, Improving output quality, HTML driver class, Configuration
1925 @section Miscellaneous configuration
1927 The following environment variables can be used to further configure
1933 Used to determine the user's home directory. No default value.
1935 @item STAT_INCLUDE_PATH
1937 Path used to find include files in PSPP syntax files. Defaults vary
1938 across operating systems:
1948 @file{~/.pspp/include}
1951 @file{/usr/local/lib/pspp/include}
1954 @file{/usr/lib/pspp/include}
1957 @file{/usr/local/share/pspp/include}
1960 @file{/usr/share/pspp/include}
1970 @file{C:\PSPP\INCLUDE}
1983 When PSPP invokes an external pager, it uses the first of these that
1984 is defined. There is a default pager only if the person who compiled
1989 The terminal type @code{termcap} or @code{ncurses} will use, if such
1990 support was compiled into PSPP.
1992 @item STAT_OUTPUT_INIT_FILE
1994 The basename used to search for the driver definition file.
1995 @xref{Output devices}. @xref{File locations}. Default: @code{devices}.
1997 @item STAT_OUTPUT_PAPERSIZE_FILE
1999 The basename used to search for the papersize file. @xref{papersize}.
2000 @xref{File locations}. Default: @code{papersize}.
2002 @item STAT_OUTPUT_INIT_PATH
2004 The path used to search for the driver definition file and the papersize
2005 file. @xref{File locations}. Default: the standard configuration path.
2009 The @code{sort} procedure stores its temporary files in this directory.
2010 Default: (UNIX) @file{/tmp}, (MS-DOS) @file{\}, (other OSes) empty string.
2015 Under MS-DOS only, these variables are consulted after TMPDIR, in this
2019 @node Improving output quality, , Miscellaneous configuring, Configuration
2020 @section Improving output quality
2022 When its drivers are set up properly, PSPP can produce output that
2023 looks very good indeed. The PostScript driver, suitably configured, can
2024 produce presentation-quality output. Here are a few guidelines for
2025 producing better-looking output, regardless of output driver. Your
2026 mileage may vary, of course, and everyone has different esthetic
2031 Width is important in PSPP output. Greater output width leads to more
2032 readable output, to a point. Try the following to increase the output
2037 If you're using the ASCII driver with a dot-matrix printer, figure out
2038 what you need to do to put the printer into compressed mode. Put that
2039 string into the @code{init-string} setting. Try to get 132 columns; 160
2040 might be better, but you might find that print that tiny is difficult to
2044 With the PostScript driver, try these ideas:
2051 Legal-size (8.5" x 14") paper in landscape mode.
2054 Reducing font sizes. If you're using 12-point fonts, try 10 point; if
2055 you're using 10-point fonts, try 8 point. Some fonts are more readable
2056 than others at small sizes.
2060 Try to strike a balance between character size and page width.
2063 Use high-quality fonts. Many public domain fonts are poor in quality.
2064 Recently, URW made some high-quality fonts available under the GPL.
2065 These are probably suitable.
2068 Be sure you're using the proper font metrics. The font metrics provided
2069 with PSPP may not correspond to the fonts actually being printed.
2070 This can cause bizarre-looking output.
2073 Make sure that you're using good ink/ribbon/toner. Darker print is
2077 Use plain fonts with serifs, such as Times-Roman or Palatino. Avoid
2078 choosing italic or bold fonts as document base fonts.
2081 @node Invocation, Language, Configuration, Top
2082 @chapter Invoking PSPP
2084 @cindex PSPP, invoking
2086 @cindex command line, options
2087 @cindex options, command-line
2089 pspp [ -B @var{dir} | --config-dir=@var{dir} ] [ -o @var{device} | --device=@var{device} ]
2090 [ -d @var{var}[=@var{value}] | --define=@var{var}[=@var{value}] ] [-u @var{var} | --undef=@var{var} ]
2091 [ -f @var{file} | --out-file=@var{file} ] [ -p | --pipe ] [ -I- | --no-include ]
2092 [ -I @var{dir} | --include=@var{dir} ] [ -i | --interactive ]
2093 [ -n | --edit | --dry-run | --just-print | --recon ]
2094 [ -r | --no-statrc ] [ -h | --help ] [ -l | --list ]
2095 [ -c @var{command} | --command=@var{command} ] [ -s | --safer ]
2096 [ --testing-mode ] [ -V | --version ] [ -v | --verbose ]
2097 [ @var{key}=@var{value} ] @var{file}@enddots{}
2101 * Non-option Arguments:: Specifying syntax files and output devices.
2102 * Configuration Options:: Change the configuration for the current run.
2103 * Input and output options:: Controlling input and output files.
2104 * Language control options:: Language variants.
2105 * Informational options:: Helpful information about PSPP.
2108 @node Non-option Arguments, Configuration Options, Invocation, Invocation
2109 @section Non-option Arguments
2111 Syntax files and output device substitutions can be specified on
2112 PSPP's command line:
2117 A file by itself on the command line will be executed as a syntax file.
2118 PSPP terminates after the syntax file runs, unless the @code{-i} or
2119 @code{--interactive} option is given (@pxref{Language control options}).
2121 @item @var{file1} @var{file2}
2123 When two or more filenames are given on the command line, the first
2124 syntax file is executed, then PSPP's dictionary is cleared, then the second
2125 syntax file is executed.
2127 @item @var{file1} + @var{file2}
2129 If syntax files' names are delimited by a plus sign (@samp{+}), then the
2130 dictionary is not cleared between their executions, as if they were
2131 concatenated together into a single file.
2133 @item @var{key}=@var{value}
2135 Defines an output device macro @var{key} to expand to @var{value},
2136 overriding any macro having the same @var{key} defined in the device
2137 configuration file. @xref{Macro definitions}.
2141 There is one other way to specify a syntax file, if your operating
2142 system supports it. If you have a syntax file @file{foobar.stat}, put
2146 #! /usr/local/bin/pspp
2149 at the top, and mark the file as executable with @code{chmod +x
2150 foobar.stat}. (If PSPP is not installed in @file{/usr/local/bin},
2151 then insert its actual installation directory into the syntax file
2152 instead.) Now you should be able to invoke the syntax file just by
2153 typing its name. You can include any options on the command line as
2154 usual. PSPP entirely ignores any lines beginning with @samp{#!}.
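
As a rough sketch, a complete self-running syntax file might look like
the following.  The data file @file{foobar.dat}, the variable names, and
the column positions are hypothetical and serve only as illustration;
the first line assumes that PSPP is installed in @file{/usr/local/bin}.

@example
#! /usr/local/bin/pspp
TITLE 'Run directly from the shell'.
DATA LIST FILE='foobar.dat' /ID 1-3 SCORE 5-7.
LIST.
@end example

Because PSPP ignores the @samp{#!} line, the same file can still be
run by naming it on the PSPP command line.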
2156 @node Configuration Options, Input and output options, Non-option Arguments, Invocation
2157 @section Configuration Options
2159 Configuration options are used to change PSPP's configuration for the
2160 current run. The configuration options are:
2164 @itemx --config-dir=@var{dir}
2166 Sets the configuration directory to @var{dir}. @xref{File locations}.
2168 @item -o @var{device}
2169 @itemx --device=@var{device}
2171 Selects the output device with name @var{device}. If this option is
2172 given more than once, then all devices mentioned are selected. This
2173 option disables all devices besides those mentioned on the command line.
2175 @item -d @var{var}[=@var{value}]
2176 @itemx --define=@var{var}[=@var{value}]
2178 Defines an ``environment variable'' named @var{var} having the optional
2179 value @var{value} specified. @xref{Variable values}.
2182 @itemx --undef=@var{var}
2184 Undefines the ``environment variable'' named @var{var}. @xref{Variable
2188 @node Input and output options, Language control options, Configuration Options, Invocation
2189 @section Input and output options
2191 Input and output options affect how PSPP reads input and writes
2192 output. These are the input and output options:
2196 @itemx --out-file=@var{file}
2198 This overrides the output file name for devices designated as listing
2199 devices. If a file named @var{file} already exists, it is overwritten.
2204 Allows PSPP to be used as a filter by causing the syntax file to be
2205 read from stdin and output to be written to stdout. Conflicts with the
2206 @code{-f @var{file}} and @code{--out-file=@var{file}} options.
2211 Clears all directories from the include path. This includes all
2212 directories put in the include path by default. @xref{Miscellaneous
2216 @itemx --include=@var{dir}
2218 Appends directory @var{dir} to the path that is searched for include
2219 files in PSPP syntax files.
2221 @item -c @var{command}
2222 @itemx --command=@var{command}
2224 Execute literal command @var{command}. The command is executed before
2225 startup syntax files, if any.
2227 @item --testing-mode
2229 Invoke heuristics to assist with testing PSPP. For use by @code{make
2230 check} and similar scripts.
2233 @node Language control options, Informational options, Input and output options, Invocation
2234 @section Language control options
2236 Language control options control how PSPP syntax files are parsed and
2237 interpreted. The available language control options are:
2241 @itemx --interactive
2243 When a syntax file is specified on the command line, PSPP normally
2244 terminates after processing it. Giving this option will cause PSPP to
2245 bring up a command prompt after processing the syntax file.
2247 In addition, this forces syntax files to be interpreted in interactive
2248 mode, rather than the default batch mode. @xref{Tokenizing lines}, for
2249 information on the differences between batch mode and interactive mode
2250 command interpretation.
2258 Only the syntax of any syntax file specified or of commands entered at
2259 the command line is checked. Transformations are not performed and
2260 procedures are not executed. Not yet implemented.
2265 Prevents the execution of the PSPP startup syntax file. Not yet
2266 implemented, as startup syntax files aren't, either.
2271 Disables certain unsafe operations. This includes the ERASE and
2272 HOST commands, as well as use of pipes as input and output files.
2275 @node Informational options, , Language control options, Invocation
2276 @section Informational options
2278 Informational options cause information about PSPP to be written to
2279 the terminal. Here are the available options:
2285 Prints a message describing PSPP command-line syntax and the available
2286 device driver classes, then terminates.
2291 Lists the available device driver classes, then terminates.
2296 Prints a brief message listing PSPP's version, warranties you don't
2297 have, copying conditions and copyright, and e-mail address for bug
2298 reports, then terminates.
2303 Increments PSPP's verbosity level. Higher verbosity levels cause
2304 PSPP to display greater amounts of information about what it is
2305 doing. Often useful for debugging PSPP's configuration.
2307 This option can be given multiple times; each occurrence raises the
2308 verbosity level by one. The default verbosity level is 0, at which no
2309 informational messages are displayed.
2311 Higher verbosity levels cause messages to be displayed when the
2312 corresponding events take place.
2317 Driver and subsystem initializations.
2321 Completion of driver initializations. Beginning of driver closings.
2325 Completion of driver closings.
2329 Files searched for; success of searches.
2333 Individual directories included in file searches.
2336 Each verbosity level also includes messages from lower verbosity levels.
2340 @node Language, Expressions, Invocation, Top
2341 @chapter The PSPP language
2342 @cindex language, PSPP
2343 @cindex PSPP, language
2346 @strong{Please note:} PSPP is not even close to completion.
2347 Only a few actual statistical procedures are implemented. PSPP
2348 is a work in progress.
2351 This chapter discusses elements common to many PSPP commands.
2352 Later chapters will describe individual commands in detail.
2355 * Tokens:: Characters combine to form tokens.
2356 * Commands:: Tokens combine to form commands.
2357 * Types of Commands:: Commands come in several flavors.
2358 * Order of Commands:: Commands combine to form syntax files.
2359 * Missing Observations:: Handling missing observations.
2360 * Variables:: The unit of data storage.
2361 * Files:: Files used by PSPP.
2362 * BNF:: How command syntax is described.
2365 @node Tokens, Commands, Language, Language
2367 @cindex language, lexical analysis
2368 @cindex language, tokens
2370 @cindex lexical analysis
2373 PSPP divides most syntax file lines into series of short chunks
2374 called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}. These
2375 tokens are then grouped to form commands, each of which tells
2376 PSPP to take some action---read in data, write out data, perform
2377 a statistical procedure, etc. The process of dividing input into tokens
2378 is @dfn{tokenization}, or @dfn{lexical analysis}. Each type of token is
2383 Tokens must be separated from each other by @dfn{delimiters}.
2384 Delimiters include whitespace (spaces, tabs, carriage returns, line
2385 feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
2386 operators (plus, minus, times, divide, etc.). Note that while whitespace
2387 only separates tokens, other delimiters are tokens in themselves.
2392 Identifiers are names that specify variable names, commands, or command
2397 The first character in an identifier must be a letter, @samp{#}, or
2398 @samp{@@}. Some system identifiers begin with @samp{$}, but
2399 user-defined variables' names may not begin with @samp{$}.
2402 The remaining characters in the identifier must be letters, digits, or
2403 one of the following special characters:
2410 @cindex variable names
2411 @cindex names, variable
2412 Variable names may be any length, but only the first 8 characters are
2416 @cindex case-sensitivity
2417 Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
2418 @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
2419 representations of the same identifier.
2423 Identifiers other than variable names may be abbreviated to their first
2424 3 characters if this abbreviation is unambiguous. These identifiers are
2425 often called @dfn{keywords}. (Unique abbreviations of 3 or more
2426 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
2427 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
2430 Whether an identifier is a keyword depends on the context.
2433 @cindex keywords, reserved
2434 @cindex reserved keywords
2435 Some keywords are reserved. These keywords may not be used in any
2436 context besides those explicitly described in this manual. The reserved
2440 ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
2444 Since keywords are identifiers, all the rules for identifiers apply.
2445 Specifically, they must be delimited as are other identifiers:
2446 @code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
2452 @cindex variable names, ending with period
2453 @strong{Caution:} It is legal to end a variable name with a period, but
2454 @emph{don't do it!} The variable name will be misinterpreted when it is
2455 the final token on a line: @code{FOO.} will be divided into two separate
2456 tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
2457 @xref{Commands, , Forming commands of tokens}.
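
As an illustration of the case and abbreviation rules for identifiers,
the two commands below should be read identically.  The variable name
@code{SCORE} is hypothetical, and the sketch assumes that no other
command or subcommand shares the abbreviated prefixes.

@example
FREQUENCIES /VARIABLES=SCORE.
freq /var=score.
@end example

The variable name itself is written out in full in both commands,
since only identifiers other than variable names may be abbreviated.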
2463 Numbers may be specified as integers or reals. Integers are internally
2464 converted into reals. Scientific notation is not supported. Here are
2465 some examples of valid numbers:
2468 1234 3.14159265359 .707106781185 8945.
2471 @strong{Caution:} The last example will be interpreted as two tokens,
2472 @samp{8945} and @samp{.}, if it is the last token on a line.
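
The following sketch shows how the position of a period changes its
meaning.  The variable names @code{A} and @code{B} are hypothetical,
and the lines are meant to illustrate tokenization only.

@example
COMPUTE A = 8945.
COMPUTE B = 8945. + 1.
@end example

On the first line, the period ends the line, so it is read as the
terminal dot following the number @samp{8945}.  On the second line, the
first period falls in the middle of the line and so belongs to the number
@samp{8945.}, while the final period again ends the command.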
2478 @cindex case-sensitivity
2479 Strings are literal sequences of characters enclosed in pairs of single
2480 quotes (@samp{'}) or double quotes (@samp{"}).
2484 Whitespace and case of letters @emph{are} significant inside strings.
2486 Whitespace characters inside a string are not delimiters.
2488 To include single-quote characters in a string, enclose the string in double quotes.
2491 To include double-quote characters in a string, enclose the string in single quotes.
2494 It is not possible to put both single- and double-quote characters in the same string.
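
For instance, the quoting rules for strings might be applied as in the
sketch below.  The title text is invented for illustration.

@example
TITLE "Don't panic".
TITLE 'The "final" run'.
@end example

The first string contains a single quote and so is enclosed in double
quotes; the second contains double quotes and so is enclosed in single
quotes.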
2500 Hexstrings are string variants that use hex digits to specify
2505 A hexstring may be used anywhere that an ordinary string is allowed.
2510 A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
2514 No whitespace is allowed between the initial @samp{X} and @samp{'}.
2517 Double quotes @samp{"} may be used in place of single quotes @samp{'} if
2518 done in both places.
2521 Each pair of hex digits is internally changed into a single character
2522 with the given value.
2525 If there is an odd number of hex digits, the missing last digit is
2526 assumed to be @samp{0}.
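
A short sketch of the hexstring rules follows.  It assumes an ASCII
system, where hex 50 is @samp{P} and hex 53 is @samp{S}, and it assumes
that @cmd{TITLE} accepts a hexstring wherever it accepts an ordinary
string.

@example
TITLE X'5053'.
TITLE x"5053".
TITLE X'505'.
@end example

Under those assumptions the first two commands both specify the string
@samp{PS}.  The third has an odd number of hex digits, so it is read as
@code{X'5050'}, that is, @samp{PP}.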
2530 @strong{Please note:} Use of hexstrings is nonportable because the same
2531 numeric values are associated with different glyphs by different
2532 operating systems. Therefore, their use should be confined to syntax
2533 files that will not be widely distributed.
2536 @cindex characters, reserved
2539 @strong{Please note also:} The character with value 00 is reserved for
2540 internal use by PSPP. Its use in strings causes an error and
2541 replacement with a blank space (in ASCII, hex 20, decimal 32).
2546 Punctuation separates tokens; punctuators are delimiters. These are the
2547 punctuation characters:
2555 Operators describe mathematical operations. Some operators are delimiters:
2561 Many of the above operators are also punctuators. Punctuators are
2562 distinguished from operators by context.
2564 The other operators are all reserved keywords. None of these are
2568 AND EQ GE GT LE LT NE OR
2572 @cindex terminal dot
2573 @cindex dot, terminal
2576 A period (@samp{.}) at the end of a line (except for whitespace) is one
2577 type of @dfn{terminal dot}, although not every terminal dot is a
2578 period at the end of a line. @xref{Commands, , Forming commands of
2579 tokens}. A period is a terminal dot @emph{only}
2580 when it is at the end of a line; otherwise it is part of a
2581 floating-point number. (A period outside a number in the middle of a
2585 @cindex terminal dot, changing
2586 @cindex dot, terminal, changing
2587 @strong{Please note:} The character used for the @dfn{terminal dot}
2588 can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}). This
2589 is strongly discouraged, and throughout all the remainder of this
2590 manual it will be assumed that the default setting is in effect.
2595 @node Commands, Types of Commands, Tokens, Language
2596 @section Forming commands of tokens
2598 @cindex PSPP, command structure
2599 @cindex language, command structure
2600 @cindex commands, structure
2602 Most PSPP commands share a common structure, diagrammed below:
2605 @var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
2609 In the above, rather daunting, expression, pairs of square brackets
2610 (@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
2611 indicate parts of the syntax that vary from command to command.
2612 Ellipses (@samp{...}) indicate that the preceding part may be repeated
2613 an arbitrary number of times. Let's pick apart what it says above:
2616 @cindex commands, names
2618 A command begins with a command name of one or more keywords, such as
2619 @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}. @var{cmd}
2620 may be abbreviated to its first word if that is unambiguous; each word
2621 in @var{cmd} may be abbreviated to a unique prefix of three or more
2622 characters as described above.
2626 The command name may be followed by one or more @dfn{subcommands}:
2630 Each subcommand begins with a unique keyword, indicated by @var{sbc}
2631 above. This is analogous to the command name.
2634 The subcommand name is optionally followed by an equals sign (@samp{=}).
2637 Some subcommands accept a series of one or more specifications
2638 (@var{spec}), optionally separated by commas.
2641 Each subcommand must be separated from the next (if any) by a forward
2645 @cindex dot, terminal
2646 @cindex terminal dot
2648 Each command must be terminated with a @dfn{terminal dot}.
2649 The terminal dot may be given one of three ways:
2653 (most commonly) A period character at the very end of a line, as
2657 (only if NULLINE is on: @xref{SET, , Setting user preferences}, for more
2658 details.) A completely blank line.
2661 (in batch mode only) Any line that is not indented from the left side of
2662 the page causes a terminal dot to be inserted before that line.
2663 Therefore, each command begins with a line that is flush left, followed
2664 by zero or more lines that are indented one or more characters from the
2667 In batch mode, PSPP will ignore a plus sign, minus sign, or period
2668 (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
2669 line. Any of these characters as the first character on a line will
2670 begin a new command. This allows for visual indentation of a command
2671 without that command being considered part of the previous command.
2673 PSPP is in batch mode when it is reading input from a file, rather
2674 than from an interactive user. Note that the other forms of the
2675 terminal dot may also be used in batch mode.
2677 Sometimes, one encounters syntax files that are intended to be
2678 interpreted in interactive mode rather than batch mode (for instance,
2679 this can happen if a session log file is used directly as a syntax
2680 file). When this occurs, use the @samp{-i} command line option to force
2681 interpretation in interactive mode (@pxref{Language control options}).
2685 PSPP ignores empty commands when they are generated by the above
2686 rules. Note that, as a consequence of these rules, each command must
2687 begin on a new line.
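
To make the command structure concrete, here is a sketch of one command
with two subcommands.  The variable names @code{AGE} and @code{SEX} are
hypothetical.

@example
FREQUENCIES /VARIABLES=AGE SEX
            /STATISTICS=MEAN MEDIAN.
@end example

Here @cmd{FREQUENCIES} is the command name, @code{VARIABLES} and
@code{STATISTICS} are subcommand names, each followed by an equals sign
and a series of specifications, and the period at the end of the last
line is the terminal dot.

In batch mode the flush-left rule can end a command as well.  In the
following sketch, @cmd{TITLE} has no period of its own; it ends because
the next line is not indented.

@example
TITLE 'Batch-mode layout'
FREQUENCIES /VARIABLES=AGE SEX
    /STATISTICS=MEAN MEDIAN.
@end example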
2689 @node Types of Commands, Order of Commands, Commands, Language
2690 @section Types of Commands
2692 Commands in PSPP are divided roughly into six categories:
2695 @item Utility commands
2696 @cindex utility commands
2697 Set or display various global options that affect PSPP operations.
2698 May appear anywhere in a syntax file. @xref{Utilities, , Utility
2701 @item File definition commands
2702 @cindex file definition commands
2703 Give instructions for reading data from text files or from special
2704 binary ``system files''. Most of these commands discard any previous
2705 data or variables and replace them with the new data and
2706 variables. At least one must appear before the first command in any of
2707 the categories below. @xref{Data Input and Output}.
2709 @item Input program commands
2710 @cindex input program commands
2711 Though rarely used, these provide powerful tools for reading data files
2712 in arbitrary textual or binary formats. @xref{INPUT PROGRAM}.
2714 @item Transformations
2715 @cindex transformations
2716 Perform operations on data and write data to output files. Transformations
2717 are not carried out until a procedure is executed.
2719 @item Restricted transformations
2720 @cindex restricted transformations
2721 Same as transformations for most purposes. @xref{Order of Commands}, for a
2722 detailed description of the differences.
2726 Analyze data, writing results of analyses to the listing file. Cause
2727 transformations specified earlier in the file to be performed. In a
2728 more general sense, a @dfn{procedure} is any command that causes the
2729 active file (the data) to be read.
2732 @node Order of Commands, Missing Observations, Types of Commands, Language
2733 @section Order of Commands
2734 @cindex commands, ordering
2735 @cindex order of commands
2737 PSPP does not place many restrictions on ordering of commands.
2738 The main restriction is that variables must be defined with one of the
2739 file-definition commands before they are otherwise referred to.
2741 Of course, there are specific rules, for those who are interested.
2742 PSPP possesses five internal states, called initial, INPUT PROGRAM,
2743 FILE TYPE, transformation, and procedure states. (Please note the
2744 distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
2745 @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
2747 PSPP starts up in the initial state. Each successful completion
2748 of a command may cause a state transition. Each type of command has its
2749 own rules for state transitions:
2752 @item Utility commands
2755 Legal in all states.
2757 Do not cause state transitions. Exception: when @cmd{N OF CASES}
2758 is executed in the procedure state, it causes a transition to the
2759 transformation state.
2762 @item @cmd{DATA LIST}
2765 Legal in all states.
2767 When executed in the initial or procedure state, causes a transition to
2768 the transformation state.
2770 Clears the active file if executed in the procedure or transformation
2774 @item @cmd{INPUT PROGRAM}
2777 Invalid in INPUT PROGRAM and FILE TYPE states.
2779 Causes a transition to the INPUT PROGRAM state.
2781 Clears the active file.
2784 @item @cmd{FILE TYPE}
2787 Invalid in INPUT PROGRAM and FILE TYPE states.
2789 Causes a transition to the FILE TYPE state.
2791 Clears the active file.
2794 @item Other file definition commands
2797 Invalid in INPUT PROGRAM and FILE TYPE states.
2799 Cause a transition to the transformation state.
2801 Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
2805 @item Transformations
2808 Invalid in initial and FILE TYPE states.
2810 Cause a transition to the transformation state.
2813 @item Restricted transformations
2816 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
2818 Cause a transition to the transformation state.
2824 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
2826 Cause a transition to the procedure state.
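
The state rules above can be traced through a short syntax file.  This
is only a sketch: the data file @file{scores.dat}, the variable names,
and the column positions are hypothetical.

@example
* PSPP starts in the initial state.
DATA LIST FILE='scores.dat' /ID 1-3 SCORE 5-7.
* DATA LIST has moved PSPP to the transformation state.
COMPUTE BONUS = SCORE + 10.
* COMPUTE is a transformation, so the state is unchanged.
LIST.
* LIST is a procedure, so PSPP is now in the procedure state.
@end example

Placing @cmd{COMPUTE} before @cmd{DATA LIST} would be an error, because
transformations are invalid in the initial state.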
2830 @node Missing Observations, Variables, Order of Commands, Language
2831 @section Handling missing observations
2832 @cindex missing values
2833 @cindex values, missing
2835 PSPP includes special support for unknown numeric data values.
2836 Missing observations are assigned a special value, called the
2837 @dfn{system-missing value}. This ``value'' actually indicates the
2838 absence of value; it means that the actual value is unknown. Procedures
2839 automatically exclude from analyses those observations or cases that
2840 have missing values. Whether single observations or entire cases are
2841 excluded depends on the procedure.
2843 The system-missing value exists only for numeric variables. String
2844 variables always have a defined value, even if it is only a string of
2847 Variables, whether numeric or string, can have designated
2848 @dfn{user-missing values}. Every user-missing value is an actual value
2849 for that variable. However, most of the time user-missing values are
2850 treated in the same way as the system-missing value. String variables
2851 that are wider than a certain width, usually 8 characters (depending on
2852 computer architecture), cannot have user-missing values.
2854 For more information on missing values, see the following sections:
2855 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the
2856 documentation on individual procedures for information on how they
2857 handle missing values.
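
As a brief sketch, a user-missing value might be declared as follows.
The data file @file{survey.dat}, the variable @code{AGE}, and the code
999 are hypothetical.

@example
DATA LIST FILE='survey.dat' /AGE 1-3.
MISSING VALUES AGE (999).
DESCRIPTIVES /VARIABLES=AGE.
@end example

With this declaration, cases in which @code{AGE} is 999 would normally
be excluded by the procedure, just as cases holding the system-missing
value are.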
2859 @node Variables, Files, Missing Observations, Language
2864 Variables are the basic unit of data storage in PSPP. All the
2865 variables in a file taken together, apart from any associated data, are
2866 said to form a @dfn{dictionary}.
2867 Some details of variables are described in the sections below.
2870 * Attributes:: Attributes of variables.
2871 * System Variables:: Variables automatically defined by PSPP.
2872 * Sets of Variables:: Lists of variable names.
2873 * Input/Output Formats:: Input and output formats.
2874 * Scratch Variables:: Variables deleted by procedures.
2877 @node Attributes, System Variables, Variables, Variables
2878 @subsection Attributes of Variables
2879 @cindex variables, attributes of
2880 @cindex attributes of variables
2881 Each variable has a number of attributes, including:
2885 This is an identifier. Each variable must have a different name.
2888 @cindex variables, type
2889 @cindex type of variables
2893 @cindex variables, width
2894 @cindex width of variables
2896 (string variables only) String variables with a width of 8 characters or
2897 fewer are called @dfn{short string variables}. Short string variables
2898 can be used in many procedures where @dfn{long string variables} (those
2899 with widths greater than 8) are not allowed.
2902 @strong{Please note:} Certain systems may consider strings longer than 8
2903 characters to be short strings. Eight characters represents a minimum
2904 figure for the maximum length of a short string.
2908 Variables in the dictionary are arranged in a specific order.
2909 @cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.
2911 @item Initialization
2912 Either reinitialized to 0 or spaces for each case, or left at its
2913 existing value. @xref{LEAVE}.
2915 @cindex missing values
2916 @cindex values, missing
2917 @item Missing values
2918 Optionally, up to three values, or a range of values, or a specific
2919 value plus a range, can be specified as @dfn{user-missing values}.
2920 There is also a @dfn{system-missing value} that is assigned to an
2921 observation when there is no other obvious value for that observation.
2922 Observations with missing values are automatically excluded from
2923 analyses. User-missing values are actual data values, while the
2924 system-missing value is not a value at all. @xref{Missing Observations}.
2926 @cindex variable labels
2927 @cindex labels, variable
2928 @item Variable label
2929 A string that describes the variable. @xref{VARIABLE LABELS}.
2931 @cindex value labels
2932 @cindex labels, value
2934 Optionally, these associate each possible value of the variable with a
2935 string. @xref{VALUE LABELS}.
2937 @cindex print format
2939 Display width, format, and (for numeric variables) number of decimal
2940 places. This attribute does not affect how data are stored, just how
2941 they are displayed. Example: a width of 8, with 2 decimal places.
2942 @xref{PRINT FORMATS}.
2944 @cindex write format
2946 Similar to print format, but used by certain commands that are
2947 designed to write to binary files. @xref{WRITE FORMATS}.
2950 @node System Variables, Sets of Variables, Attributes, Variables
2951 @subsection Variables Automatically Defined by PSPP
2952 @cindex system variables
2953 @cindex variables, system
2955 There are seven system variables. These are not like ordinary
2956 variables, as they are not stored in each case. They can only be used
2957 in expressions. These system variables, whose values and output formats
2958 cannot be modified, are described below.
2961 @cindex @code{$CASENUM}
2963 Case number of the case at the moment. This changes as cases are
2966 @cindex @code{$DATE}
2968 Date the PSPP process was started, in format A9, following the
2969 pattern @code{DD MMM YY}.
2971 @cindex @code{$JDATE}
2973 Number of days between 15 Oct 1582 and the time the PSPP process
2976 @cindex @code{$LENGTH}
2978 Page length, in lines, in format F11.
2980 @cindex @code{$SYSMIS}
2982 System missing value, in format F1.
2984 @cindex @code{$TIME}
2986 Number of seconds between midnight 14 Oct 1582 and the time the active file
2987 was read, in format F20.
2989 @cindex @code{$WIDTH}
2991 Page width, in characters, in format F3.
2994 @node Sets of Variables, Input/Output Formats, System Variables, Variables
2995 @subsection Lists of variable names
2996 @cindex TO convention
2997 @cindex convention, TO
2999 There are several ways to specify a set of variables:
3003 (Most commonly.) List the variable names one after another, optionally
3004 separating them by commas.
3008 (This method cannot be used on commands that define the dictionary, such
3009 as @cmd{DATA LIST}.) The syntax is the names of two existing variables,
3010 separated by the reserved keyword @code{TO}. The meaning is to include
3011 every variable in the dictionary between and including the variables
3012 specified. For instance, if the dictionary contains six variables with
3013 the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
3014 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
3015 variables @code{X2}, @code{GOAL}, and @code{MET}.
3018 (This method can be used only on commands that define the dictionary,
3019 such as @cmd{DATA LIST}.) It is used to define sequences of variables
3020 that end in consecutive integers. The syntax is two identifiers that
3021 end in numbers. This method is best illustrated with examples:
3025 The syntax @code{X1 TO X5} defines 5 variables:
3041 The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
3059 The syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3}
3060 are both invalid, although for different reasons, which should be evident.
3063 Note that after a set of variables has been defined with @cmd{DATA LIST}
3064 or another command with this method, the same set can be referenced on
3065 later commands using the same syntax.
3068 The above methods can be combined, either one after another or delimited
3069 by commas. For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
3070 is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
3071 is individually legal.
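
The two uses of @code{TO} can be seen side by side in the sketch below.
The data file @file{survey.dat}, the variable names, and the column
positions are hypothetical, and the sketch assumes that the five columns
5 through 9 are divided evenly among the five variables.

@example
DATA LIST FILE='survey.dat' /ID 1-3 Q1 TO Q5 5-9.
DESCRIPTIVES /VARIABLES=Q2 TO Q4.
@end example

On @cmd{DATA LIST}, @code{Q1 TO Q5} defines the variables @code{Q1},
@code{Q2}, @code{Q3}, @code{Q4}, and @code{Q5}.  On @cmd{DESCRIPTIVES},
@code{Q2 TO Q4} selects the existing variables from @code{Q2} through
@code{Q4} in dictionary order.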
3074 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
3075 @subsection Input and Output Formats
3077 Data that PSPP inputs and outputs must have one of a number of formats.
3078 These formats are described, in general, by a format specification of
3079 the form @code{NAMEw.d}, where @var{name} is the
3080 format name and @var{w} is a field width. @var{d} is the optional
3081 desired number of decimal places, if appropriate. If @var{d} is not
3082 included then it is assumed to be 0. Some formats do not allow @var{d}
3085 When an input format is specified on @cmd{DATA LIST} or another
3087 it is converted to an output format for the purposes of @cmd{PRINT}
3089 data output commands. For most purposes, input and output formats are
3090 the same; the salient differences are described below.
3092 Below are listed the input and output formats supported by PSPP. If an
3093 input format is mapped to a different output format by default, then
3094 that mapping is indicated with @result{}. Each format has the listed
3095 bounds on input width (iw) and output width (ow).
3097 The standard numeric input and output formats are given in the following table:
3101 @item Fw.d: 1 <= iw,ow <= 40
3102 Standard decimal format with @var{d} decimal places. If the number is
3103 too large to fit within the field width, it is expressed in scientific
3104 notation (@code{1.2+34}) if w >= 6, with always at least two digits in
3105 the exponent. When used as an input format, scientific notation is
3106 allowed but an E or an F must be used to introduce the exponent.
3108 The default output format is the same as the input format, except if
3109 @var{d} > 1. In that case the output @var{w} is always made to be at
3112 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
3113 For input this is equivalent to F format except that no E or F is
3114 required to introduce the exponent. For output, produces scientific
3115 notation in the form @code{1.2+34}. There are always at least two
3116 digits given in the exponent.
3118 The default output @var{w} is the largest of the input @var{w}, the
3119 input @var{d} + 7, and 10. The default output @var{d} is the input
3120 @var{d}, but at least 3.
3122 @item COMMAw.d: 1 <= iw,ow <= 40
3123 Equivalent to F format, except that groups of three digits are
3124 comma-separated on output. If the number is too large to express in the
3125 field width, then first commas are eliminated, then if there is still
3126 not enough space the number is expressed in scientific notation given
3127 that w >= 6. Commas are allowed and ignored when this is used as an
3130 @item DOTw.d: 1 <= iw,ow <= 40
3131 Equivalent to COMMA format except that the roles of comma and decimal
3132 point are interchanged. However, if SET /DECIMAL=DOT is in effect, then
3133 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
3136 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
3137 Equivalent to COMMA format, except that the number is prefixed by a
3138 dollar sign (@samp{$}) if there is room. On input the value is allowed
3139 to be prefixed by a dollar sign, which is ignored.
3141 The default output @var{w} is the input @var{w}, but at least 2.
3143 @item PCTw.d: 2 <= iw,ow <= 40
3144 Equivalent to F format, except that the number is suffixed by a percent
3145 sign (@samp{%}) if there is room. On input the value is allowed to be
3146 suffixed by a percent sign, which is ignored.
3148 The default output @var{w} is the input @var{w}, but at least 2.
3150 @item Nw.d: 1 <= iw,ow <= 40
3151 Only digits are allowed within the field width. The decimal point is
3152 assumed to be @var{d} digits from the right margin.
3154 The default output format is F with the same @var{w} and @var{d}, except
3155 if @var{d} > 1. In that case the output @var{w} is always made to be at
3158 @item Zw.d @result{} F: 1 <= iw,ow <= 40
3159 Zoned decimal input. If you need to use this then you know how.
3161 @item IBw.d @result{} F: 1 <= iw,ow <= 8
3162 Integer binary format. The field is interpreted as a fixed-point
3163 positive or negative binary number in two's-complement notation. The
3164 location of the decimal point is implied. Endianness is the same as the host machine.
3167 The default output format is F8.2 if @var{d} is 0. Otherwise it is F,
3168 with output @var{w} as 9 + input @var{d} and output @var{d} as input
3171 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
3172 Positive integer binary format. The field is interpreted as a
3173 fixed-point positive binary number. The location of the decimal point
3174 is implied. Endianness is the same as the host machine.
3176 The default output format follows the rules for IB format.
3178 @item Pw.d @result{} F: 1 <= iw,ow <= 16
3179 Binary coded decimal format. Each byte from left to right, except the
3180 rightmost, represents two digits. The upper nibble of each byte is more
3181 significant. The upper nibble of the final byte is the least
3182 significant digit. The lower nibble of the final byte is the sign; a
3183 value of D represents a negative sign and all other values are
3184 considered positive. The decimal point is implied.
3186 The default output format follows the rules for IB format.
3188 @item PKw.d @result{} F: 1 <= iw,ow <= 16
3189 Positive binary coded decimal format. Same as P but the last byte is the
3192 The default output format follows the rules for IB format.
3194 @item RBw @result{} F: 2 <= iw,ow <= 8
3196 Binary C architecture-dependent ``double'' format. For a standard
3197 IEEE754 implementation @var{w} should be 8.
3199 The default output format follows the rules for IB format.
3201 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
3202 PIB format encoded as textual hex digit pairs. @var{w} must be even.
3204 The input width is mapped to a default output width as follows:
3205 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
3206 12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for
3209 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
3211 RB format encoded as textual hex digit pairs. @var{w} must be even.
3213 The default output format is F8.2.
3215 @item CCAw.d: 1 <= ow <= 40
3216 @itemx CCBw.d: 1 <= ow <= 40
3217 @itemx CCCw.d: 1 <= ow <= 40
3218 @itemx CCDw.d: 1 <= ow <= 40
3219 @itemx CCEw.d: 1 <= ow <= 40
3221 User-defined custom currency formats. May not be used as an input
3222 format. @xref{SET}, for more details.
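
The byte layout given above for the P format can be made concrete with a short Python sketch. It is illustrative only and is not taken from the PSPP sources: the function name @code{decode_packed} and its parameter @code{d} (the implied number of decimal places) are hypothetical, and rejecting digit nibbles above 9 is an assumption of the sketch.

```python
def decode_packed(field, d=0):
    """Decode a P-format (packed BCD) field given as bytes.

    d is the implied number of decimal places.
    """
    digits = []
    # Every byte except the last holds two digits, upper nibble first.
    for byte in field[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    last = field[-1]
    # The final byte holds the least significant digit and the sign.
    digits.append(last >> 4)
    sign_nibble = last & 0x0F
    value = 0
    for digit in digits:
        if digit > 9:
            # Assumption of this sketch: non-decimal nibbles are an error.
            raise ValueError("invalid BCD digit")
        value = value * 10 + digit
    # A sign nibble of D means negative; anything else is positive.
    if sign_nibble == 0xD:
        value = -value
    return value / 10 ** d

print(decode_packed(bytes([0x12, 0x34, 0x5D]), 2))
```

Here the three bytes 12 34 5D carry the digits 1, 2, 3, 4, 5 and a negative sign, so with two implied decimal places the decoded value is -123.45.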
3225 The date and time numeric input and output formats accept a number of
3226 possible formats. Before describing the formats themselves, some
3227 definitions of the elements that make up their formats will be helpful:
3231 All formats accept an optional whitespace leader.
3234 An integer between 1 and 31 representing the day of month.
3237 An integer representing a number of days.
3239 @item date-delimiter
3240 One or more characters of whitespace or the following characters:
3244 A month name in one of the following forms:
3247 An integer between 1 and 12.
3249 Roman numerals representing an integer between 1 and 12.
3251 At least the first three characters of an English month name (January,
3256 An integer year number between 1582 and 19999, or between 1 and 199.
3257 Years between 1 and 199 will have 1900 added.
3260 A single number with a year number in the first 2, 3, or 4 digits (as
3261 above) and the day number within the year in the last 3 digits.
3264 An integer between 1 and 4 representing a quarter.
3267 The letter @samp{Q} or @samp{q}.
3270 An integer between 1 and 53 representing a week within a year.
3273 The letters @samp{wk} in any case.
3275 @item time-delimiter
3276 At least one character of whitespace or @samp{:} or @samp{.}.
3279 An integer greater than 0 representing an hour.
3282 An integer between 0 and 59 representing a minute within an hour.
3285 Optionally, a time-delimiter followed by a real number representing a
3289 An integer between 0 and 23 representing an hour within a day.
3292 At least the first two characters of an English day word.
3295 Any amount or no amount of whitespace.
3298 An optional positive or negative sign.
3301 All formats accept an optional whitespace trailer.
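
The year and julian elements defined above can be sketched in Python as follows. The function names @code{normalize_year} and @code{split_julian} are hypothetical and the code is a minimal reading of the definitions, not PSPP's parser. In particular, the sketch does not check that the day number is valid for the year.

```python
def normalize_year(year):
    """Apply the year rule: 1 to 199 have 1900 added; 1582 to 19999 are kept."""
    if 1 <= year <= 199:
        return year + 1900
    if 1582 <= year <= 19999:
        return year
    raise ValueError("year out of range")

def split_julian(number):
    """Split a julian value into (year, day of year).

    The last three digits are the day number and the leading digits
    are the year, which is normalized as above.
    """
    year, day = divmod(number, 1000)
    return normalize_year(year), day

print(normalize_year(99))
print(split_julian(98032))
print(split_julian(2003354))
```

Under these rules the two-digit year 99 becomes 1999, and the julian value 98032 is read as day 32 of 1998.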
3304 The date input formats are strung together from the above pieces. On
3305 output, the date formats are always printed in a single canonical
3306 manner, based on field width. The date input and output formats are
3310 @item DATEw: 9 <= iw,ow <= 40
3311 Date format. Input format: leader + day + date-delimiter +
3312 month + date-delimiter + year + trailer. Output format: DD-MMM-YY for
3313 @var{w} < 11, DD-MMM-YYYY otherwise.
3315 @item EDATEw: 8 <= iw,ow <= 40
3316 European date format. Input format same as DATE. Output format:
3317 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
3319 @item SDATEw: 8 <= iw,ow <= 40
3320 Standard date format. Input format: leader + year + date-delimiter +
3321 month + date-delimiter + day + trailer. Output format: YY/MM/DD for
3322 @var{w} < 10, YYYY/MM/DD otherwise.
3324 @item ADATEw: 8 <= iw,ow <= 40
3325 American date format. Input format: leader + month + date-delimiter +
3326 day + date-delimiter + year + trailer. Output format: MM/DD/YY for
3327 @var{w} < 10, MM/DD/YYYY otherwise.
3329 @item JDATEw: 5 <= iw,ow <= 40
3330 Julian date format. Input format: leader + julian + trailer. Output
3331 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
3333 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
3334 Quarter/year format. Input format: leader + quarter + q-delimiter +
3335 year + trailer. Output format: @samp{Q Q YY}, where the first
3336 @samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q
3339 @item MOYRw: 6 <= iw,ow <= 40
3340 Month/year format. Input format: leader + month + date-delimiter + year
3341 + trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
3344 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
3345 Week/year format. Input format: leader + week + wk-delimiter + year +
3346 trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
3349 @item DATETIMEw.d: 17 <= iw,ow <= 40
3350 Date and time format. Input format: leader + day + date-delimiter +
3351 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
3352 + minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If
3353 @var{w} > 19 then seconds @samp{:SS} is added. If @var{w} > 22 and
3354 @var{d} > 0 then fractional seconds @samp{.SS} are added.
3356 @item TIMEw.d: 5 <= iw,ow <= 40
3357 Time format. Input format: leader + sign + spaces + hour +
3358 time-delimiter + minute + opt-second. Output format: @samp{HH:MM}.
3359 Seconds and fractional seconds are available with @var{w} of at least 8
3360 and 10, respectively.
3362 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
3363 Time format with day count. Input format: leader + sign + spaces +
3364 day-count + time-delimiter + hour + time-delimiter + minute +
3365 opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional
3366 seconds are available with @var{w} of at least 8 and 10, respectively.
3368 @item WKDAYw: 2 <= iw,ow <= 40
3369 A weekday as a number between 1 and 7, where 1 is Sunday. Input format:
3370 leader + weekday + trailer. Output format: as many characters, in all
3371 capital letters, of the English name of the weekday as will fit in the
3374 @item MONTHw: 3 <= iw,ow <= 40
3375 A month as a number between 1 and 12, where 1 is January. Input format:
3376 leader + month + trailer. Output format: as many characters, in all
3377 capital letters, of the English name of the month as will fit in the
3381 There are only two formats that may be used with string variables:
3384 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
3385 The entire field is treated as a string value.
3387 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
3388 The field is composed of characters in a string encoded as textual hex
3391 The default output @var{w} is half the input @var{w}.
3394 @node Scratch Variables, , Input/Output Formats, Variables
3395 @subsection Scratch Variables
3397 Most of the time, variables don't retain their values between cases.
3398 Instead, either they're being read from a data file or the active file,
3399 in which case they assume the value read, or, if created with
3401 another transformation, they're initialized to the system-missing value
3402 or to blanks, depending on type.
3404 However, sometimes it's useful to have a variable that keeps its value
3405 between cases. You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
3406 use a @dfn{scratch variable}. Scratch variables are variables whose
3407 names begin with an octothorpe (@samp{#}).
3409 Scratch variables have the same properties as variables left with @cmd{LEAVE}:
3411 they retain their values between cases, and for the first case they are
3412 initialized to 0 or blanks. They have the additional property that they
3413 are deleted before the execution of any procedure. For this reason,
3414 scratch variables can't be used for analysis. To obtain the same
3415 effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
3416 value into an ordinary variable, then analyze that variable.
3418 @node Files, BNF, Variables, Language
3419 @section Files Used by PSPP
3421 PSPP makes use of many files each time it runs. Some of these it
3422 reads, some it writes, some it creates. Here is a table listing the
3423 most important of these files:
3426 @cindex file, command
3427 @cindex file, syntax file
3428 @cindex command file
3432 These names (synonyms) refer to the file that contains instructions to
3433 PSPP that tell it what to do. The syntax file's name is specified on
3434 the PSPP command line. Syntax files can also be pulled in with
3435 @cmd{INCLUDE} (@pxref{INCLUDE}).
3440 Data files contain raw data in ASCII format suitable for being read in
3441 by @cmd{DATA LIST}. Data can be embedded in the syntax
3442 file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
3443 syntax file a data file too.
3445 @cindex file, output
3448 One or more output files are created by PSPP each time it is
3449 run. The output files receive the tables and charts produced by
3450 statistical procedures. The output files may be in any number of formats,
3451 depending on how PSPP is configured.
3454 @cindex file, active
3456 The active file is the ``file'' on which all PSPP procedures
3457 are performed. The active file contains variable definitions and
3458 cases. The active file is not necessarily a disk file: it is stored
3459 in memory if there is room.
3462 @node BNF, , Files, Language
3463 @section Backus-Naur Form
3465 @cindex Backus-Naur Form
3466 @cindex command syntax, description of
3467 @cindex description of command syntax
3469 The syntax of some parts of the PSPP language is presented in this
3470 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
3471 following table describes BNF:
3477 Words in all-uppercase are PSPP keyword tokens. In BNF, these are
3478 often called @dfn{terminals}. There are some special terminals, which
3479 are actually written in lowercase for clarity:
3482 @cindex @code{number}
3486 @cindex @code{integer}
3487 @item @code{integer}
3490 @cindex @code{string}
3494 @cindex @code{var-name}
3495 @item @code{var-name}
3496 A single variable name.
3500 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
3501 Operators and punctuators.
3504 @cindex terminal dot
3505 @cindex dot, terminal
3507 The terminal dot. This is not necessarily an actual dot in the syntax
3508 file: @xref{Commands}, for more details.
3513 @cindex nonterminals
3514 Other words in all lowercase refer to BNF definitions, called
3515 @dfn{productions}. These productions are also known as
3516 @dfn{nonterminals}. Some nonterminals are very common, so they are
3517 defined here in English for clarity:
3520 @cindex @code{var-list}
3522 A list of one or more variable names or the keyword @code{ALL}.
3524 @cindex @code{expression}
3526 An expression. @xref{Expressions}, for details.
3531 @cindex ``is defined as''
3533 @samp{::=} means ``is defined as''. The left side of @samp{::=} gives
3534 the name of the nonterminal being defined. The right side of @samp{::=}
3535 gives the definition of that nonterminal. If the right side is empty,
3536 then one possible expansion of that nonterminal is nothing. A BNF
3537 definition is called a @dfn{production}.
3540 @cindex terminals and nonterminals, differences
3541 So, the key difference between a terminal and a nonterminal is that a
3542 terminal cannot be broken into smaller parts---in fact, every terminal
3543 is a single token (@pxref{Tokens}). On the other hand, nonterminals are
3544 composed of a (possibly empty) sequence of terminals and nonterminals.
3545 Thus, terminals indicate the deepest level of syntax description. (In
3546 parsing theory, terminals are the leaves of the parse tree; nonterminals
3550 @cindex start symbol
3551 @cindex symbol, start
3552 The first nonterminal defined in a set of productions is called the
3553 @dfn{start symbol}. The start symbol defines the entire syntax for
3557 @node Expressions, Data Input and Output, Language, Top
3558 @chapter Mathematical Expressions
3559 @cindex expressions, mathematical
3560 @cindex mathematical expressions
3562 Some PSPP commands use expressions, which share a common syntax
3563 among all PSPP commands. Expressions are made up of
3564 @dfn{operands}, which can be numbers, strings, or variable names,
3565 separated by @dfn{operators}. There are five types of operators:
3566 grouping, arithmetic, logical, relational, and functions.
3568 Every operator takes one or more @dfn{arguments} as input and produces
3569 or @dfn{returns} exactly one result as output. Both strings and numeric
3570 values can be used as arguments and are produced as results, but each
3571 operator accepts only specific combinations of numeric and string values
3572 as arguments. With few exceptions, operator arguments may be
3573 full-fledged expressions in themselves.
3576 * Boolean Values:: Boolean values.
3577 * Missing Values in Expressions:: Using missing values in expressions.
3578 * Grouping Operators:: ( )
3579 * Arithmetic Operators:: + - * / **
3580 * Logical Operators:: AND NOT OR
3581 * Relational Operators:: EQ GE GT LE LT NE
3582 * Functions:: More-sophisticated operators.
3583 * Order of Operations:: Operator precedence.
3586 @node Boolean Values, Missing Values in Expressions, Expressions, Expressions
3587 @section Boolean Values
3589 @cindex values, Boolean
3591 Some PSPP operators and expressions work with Boolean values, which
3592 represent true/false conditions. Booleans have only three possible
3593 values: 0 (false), 1 (true), and system-missing (unknown).
3594 System-missing is neither true nor false and indicates that the true
3597 Boolean-typed operands or function arguments must take on one of these
3598 three values. Other values are considered false, but cause an error
3599 when the expression is evaluated.
3601 Strings and Booleans are not compatible, and neither may be used in
3604 @node Missing Values in Expressions, Grouping Operators, Boolean Values, Expressions
3605 @section Missing Values in Expressions
3607 String missing values are not treated specially in expressions. Most
3608 numeric operators return system-missing when given system-missing
3609 arguments. Exceptions are listed under particular operator
3612 User-missing values for numeric variables are always transformed into
3613 the system-missing value, except inside the arguments to the
3614 @code{VALUE} and @code{SYSMIS} functions.
3616 The missing-value functions can be used to precisely control how missing
3617 values are treated in expressions. @xref{Missing Value Functions}, for
3620 @node Grouping Operators, Arithmetic Operators, Missing Values in Expressions, Expressions
3621 @section Grouping Operators
3624 @cindex grouping operators
3625 @cindex operators, grouping
3627 Parentheses (@samp{()}) are the grouping operators. Surround an
3628 expression with parentheses to force early evaluation.
3630 Parentheses also surround the arguments to functions, but in that
3631 situation they act as punctuators, not as operators.
3633 @node Arithmetic Operators, Logical Operators, Grouping Operators, Expressions
3634 @section Arithmetic Operators
3635 @cindex operators, arithmetic
3636 @cindex arithmetic operators
3638 The arithmetic operators take numeric arguments and produce numeric results.
3644 @item @var{a} + @var{b}
3645 Adds @var{a} and @var{b}, returning the sum.
3649 @item @var{a} - @var{b}
3650 Subtracts @var{b} from @var{a}, returning the difference.
3653 @cindex multiplication
3654 @item @var{a} * @var{b}
3655 Multiplies @var{a} and @var{b}, returning the product.
3659 @item @var{a} / @var{b}
3660 Divides @var{a} by @var{b}, returning the quotient. If @var{b} is
3661 zero, the result is system-missing.
3664 @cindex exponentiation
3665 @item @var{a} ** @var{b}
3666 Returns the result of raising @var{a} to the power @var{b}. If
3667 @var{a} is negative and @var{b} is not an integer, the result is
3668 system-missing. The result of @code{0**0} is system-missing as well.
3673 Reverses the sign of @var{a}.
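
The special cases of division and exponentiation described above can be sketched in Python. This is a rough model only: @code{SYSMIS}, @code{divide}, and @code{power} are hypothetical names, @code{None} stands in for the system-missing value, and propagating system-missing arguments follows the general rule for numeric operators rather than any operator-specific exception. The treatment of zero raised to a negative power is an assumption of the sketch.

```python
SYSMIS = None  # stand-in for the system-missing value (assumption of this sketch)

def divide(a, b):
    """a / b, returning SYSMIS for a missing argument or a zero divisor."""
    if a is SYSMIS or b is SYSMIS or b == 0:
        return SYSMIS
    return a / b

def power(a, b):
    """a ** b with the special cases described in the text."""
    if a is SYSMIS or b is SYSMIS:
        return SYSMIS
    if a == 0 and b == 0:
        return SYSMIS
    if a == 0 and b < 0:
        # Not covered by the text; treated as missing in this sketch.
        return SYSMIS
    if a < 0 and not float(b).is_integer():
        return SYSMIS
    return a ** b

print(divide(1, 0))
print(power(0, 0))
print(power(-8, 0.5))
print(power(2, 10))
```

Only the last call produces a number. The first three return the stand-in for system-missing.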
3676 @node Logical Operators, Relational Operators, Arithmetic Operators, Expressions
3677 @section Logical Operators
3678 @cindex logical operators
3679 @cindex operators, logical
3684 @cindex values, system-missing
3685 @cindex system-missing
3686 The logical operators take logical arguments and produce logical
3687 results, meaning ``true or false''. PSPP logical operators are
3688 not true Boolean operators because they may also result in a
3689 system-missing value.
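
The three-valued behavior of the logical operators described in this section can be sketched in Python. The names @code{SYSMIS}, @code{logical_and}, @code{logical_or}, and @code{logical_not} are hypothetical, and @code{None} stands in for the system-missing value. The sketch also assumes that a true argument combined with a missing one under AND, or a false argument combined with a missing one under OR, yields missing. That is the usual three-valued convention, but the text here does not spell it out.

```python
SYSMIS = None  # stand-in for the system-missing value (assumption of this sketch)

def logical_and(a, b):
    # A false argument decides the result even if the other is missing.
    if a is False or b is False:
        return False
    if a is SYSMIS or b is SYSMIS:
        return SYSMIS
    return True

def logical_or(a, b):
    # A true argument decides the result even if the other is missing.
    if a is True or b is True:
        return True
    if a is SYSMIS or b is SYSMIS:
        return SYSMIS
    return False

def logical_not(a):
    if a is SYSMIS:
        return SYSMIS
    return not a

for a in (True, False, SYSMIS):
    for b in (True, False, SYSMIS):
        print(a, b, logical_and(a, b), logical_or(a, b))
```

The loop prints the full nine-row truth table, which makes it easy to compare against the operator descriptions.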
3694 @cindex intersection, logical
3695 @cindex logical intersection
3696 @item @var{a} AND @var{b}
3697 @itemx @var{a} & @var{b}
3698 True if both @var{a} and @var{b} are true, false otherwise. If one
3699 argument is false, the result is false even if the other is missing. If
3700 both arguments are missing, the result is missing.
3704 @cindex union, logical
3705 @cindex logical union
3706 @item @var{a} OR @var{b}
3707 @itemx @var{a} | @var{b}
3708 True if at least one of @var{a} and @var{b} is true. If one argument is
3709 true, the result is true even if the other argument is missing. If both
3710 arguments are missing, the result is missing.
3714 @cindex inversion, logical
3715 @cindex logical inversion
3718 True if @var{a} is false. If the argument is missing, then the result
3722 @node Relational Operators, Functions, Logical Operators, Expressions
3723 @section Relational Operators
3725 The relational operators take numeric or string arguments and produce Boolean results.
3728 Strings cannot be compared to numbers. When strings of different
3729 lengths are compared, the shorter string is right-padded with spaces
3730 to match the length of the longer string.
3732 The results of string comparisons, other than tests for equality or
3733 inequality, are dependent on the character set in use. String
3734 comparisons are case-sensitive.
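
The padding rule for string comparisons can be sketched in Python. The helper names below are hypothetical. Python orders strings by code point, which is only one possible character set ordering, so the result of @code{str_lt} is an example rather than a guarantee about PSPP on every system.

```python
def pad_pair(a, b):
    """Right-pad the shorter string with spaces to the longer one's length."""
    width = max(len(a), len(b))
    return a.ljust(width), b.ljust(width)

def str_eq(a, b):
    left, right = pad_pair(a, b)
    return left == right

def str_lt(a, b):
    left, right = pad_pair(a, b)
    return left < right

print(str_eq("abc", "abc  "))
print(str_eq("abc", "ABC"))
print(str_lt("ab", "abc"))
```

Trailing spaces therefore never affect equality, while a difference in case always does.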
3737 @cindex equality, testing
3738 @cindex testing for equality
3741 @item @var{a} EQ @var{b}
3742 @itemx @var{a} = @var{b}
3743 True if @var{a} is equal to @var{b}.
3745 @cindex less than or equal to
3748 @item @var{a} LE @var{b}
3749 @itemx @var{a} <= @var{b}
3750 True if @var{a} is less than or equal to @var{b}.
3755 @item @var{a} LT @var{b}
3756 @itemx @var{a} < @var{b}
3757 True if @var{a} is less than @var{b}.
3759 @cindex greater than or equal to
3762 @item @var{a} GE @var{b}
3763 @itemx @var{a} >= @var{b}
3764 True if @var{a} is greater than or equal to @var{b}.
3766 @cindex greater than
3769 @item @var{a} GT @var{b}
3770 @itemx @var{a} > @var{b}
3771 True if @var{a} is greater than @var{b}.
3773 @cindex inequality, testing
3774 @cindex testing for inequality
3778 @item @var{a} NE @var{b}
3779 @itemx @var{a} ~= @var{b}
3780 @itemx @var{a} <> @var{b}
3781 True if @var{a} is not equal to @var{b}.
3784 @node Functions, Order of Operations, Relational Operators, Expressions
3793 @cindex names, of functions
3794 PSPP functions provide mathematical abilities above and beyond
3795 those possible using simple operators. Functions have a common
3796 syntax: each is composed of a function name followed by a left
3797 parenthesis, one or more arguments, and a right parenthesis. Function
3798 names are @strong{not} reserved; their names are specially treated
3799 only when followed by a left parenthesis: @code{EXP(10)} refers to the
3800 constant value @code{e} raised to the 10th power, but @code{EXP} by
3801 itself refers to the value of variable EXP.
3803 The sections below describe each function in detail.
3806 * Advanced Mathematics:: EXP LG10 LN SQRT
3807 * Miscellaneous Mathematics:: ABS MOD MOD10 RND TRUNC
3808 * Trigonometry:: ACOS ARCOS ARSIN ARTAN ASIN ATAN COS SIN TAN
3809 * Missing Value Functions:: MISSING NMISS NVALID SYSMIS VALUE
3810 * Pseudo-Random Numbers:: NORMAL UNIFORM
3811 * Set Membership:: ANY RANGE
3812 * Statistical Functions:: CFVAR MAX MEAN MIN SD SUM VARIANCE
3813 * String Functions:: CONCAT INDEX LENGTH LOWER LPAD LTRIM NUMBER
3814 RINDEX RPAD RTRIM STRING SUBSTR UPCASE
3815 * Time & Date:: CTIME.xxx DATE.xxx TIME.xxx XDATE.xxx
3816 * Miscellaneous Functions:: LAG YRMODA
3817 * Functions Not Implemented:: CDF.xxx CDFNORM IDF.xxx NCDF.xxx PROBIT RV.xxx
3820 @node Advanced Mathematics, Miscellaneous Mathematics, Functions, Functions
3821 @subsection Advanced Mathematical Functions
3822 @cindex mathematics, advanced
3824 Advanced mathematical functions take numeric arguments and produce numeric results.
3827 @deftypefn {Function} {} EXP (@var{exponent})
3828 Returns @i{e} (approximately 2.71828) raised to power @var{exponent}.
3832 @deftypefn {Function} {} LG10 (@var{number})
3833 Takes the base-10 logarithm of @var{number}. If @var{number} is
3834 not positive, the result is system-missing.
3837 @deftypefn {Function} {} LN (@var{number})
3838 Takes the base-@i{e} logarithm of @var{number}. If @var{number} is
3839 not positive, the result is system-missing.
3842 @cindex square roots
3843 @deftypefn {Function} {} SQRT (@var{number})
3844 Takes the square root of @var{number}. If @var{number} is negative,
3845 the result is system-missing.
3848 @node Miscellaneous Mathematics, Trigonometry, Advanced Mathematics, Functions
3849 @subsection Miscellaneous Mathematical Functions
3850 @cindex mathematics, miscellaneous
3852 Miscellaneous mathematical functions take numeric arguments and produce numeric results.
3855 @cindex absolute value
3856 @deftypefn {Function} {} ABS (@var{number})
3857 Results in the absolute value of @var{number}.
3861 @deftypefn {Function} {} MOD (@var{numerator}, @var{denominator})
3862 Returns the remainder (modulus) of @var{numerator} divided by
3863 @var{denominator}. If @var{denominator} is 0, the result is
3864 system-missing. However, if @var{numerator} is 0 and
3865 @var{denominator} is system-missing, the result is 0.
3868 @cindex modulus, by 10
3869 @deftypefn {Function} {} MOD10 (@var{number})
3870 Returns the remainder when @var{number} is divided by 10. If
3871 @var{number} is negative, MOD10(@var{number}) is negative or zero.
3875 @deftypefn {Function} {} RND (@var{number})
3876 Takes the absolute value of @var{number} and rounds it to an integer.
3877 Then, if @var{number} was negative originally, negates the result.
3881 @deftypefn {Function} {} TRUNC (@var{number})
3882 Discards the fractional part of @var{number}; that is, rounds
3883 @var{number} towards zero.
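
The sign handling of @code{MOD10}, @code{RND}, and @code{TRUNC} is easy to get wrong, so here is a small Python sketch of the descriptions above. The lowercase function names are hypothetical. The text does not say how @code{RND} treats exact halves, so rounding them away from zero is an assumption of this sketch.

```python
import math

def mod10(number):
    """Remainder of division by 10; the sign follows the argument."""
    return math.fmod(number, 10)

def rnd(number):
    """Round the absolute value to an integer, then restore the sign."""
    # Assumption: halves round up in absolute value (away from zero).
    rounded = math.floor(abs(number) + 0.5)
    return -rounded if number < 0 else rounded

def trunc(number):
    """Discard the fractional part, rounding towards zero."""
    return math.trunc(number)

print(mod10(-13), mod10(27))
print(rnd(2.5), rnd(-2.5), rnd(-2.4))
print(trunc(2.7), trunc(-2.7))
```

Note that Python's own @code{%} operator and @code{round} function follow different conventions, which is why the sketch uses @code{math.fmod} and an explicit floor.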
3886 @node Trigonometry, Missing Value Functions, Miscellaneous Mathematics, Functions
3887 @subsection Trigonometric Functions
3888 @cindex trigonometry
3890 Trigonometric functions take numeric arguments and produce numeric results.
3894 @cindex inverse cosine
3895 @deftypefn {Function} {} ARCOS (@var{number})
3896 Takes the arccosine, in radians, of @var{number}. Results in
3897 system-missing if @var{number} is not between -1 and 1 inclusive.
3901 @cindex inverse sine
3902 @deftypefn {Function} {} ARSIN (@var{number})
3903 Takes the arcsine, in radians, of @var{number}. Results in
3904 system-missing if @var{number} is not between -1 and 1 inclusive.
3908 @cindex inverse tangent
3909 @deftypefn {Function} {} ARTAN (@var{number})
3910 Takes the arctangent, in radians, of @var{number}.
3914 @deftypefn {Function} {} COS (@var{angle})
3915 Takes the cosine of @var{angle} which should be in radians.
3919 @deftypefn {Function} {} SIN (@var{angle})
3920 Takes the sine of @var{angle} which should be in radians.
3924 @deftypefn {Function} {} TAN (@var{angle})
3925 Takes the tangent of @var{angle} which should be in radians.
3926 Results in system-missing at values
3927 of @var{angle} that are too close to odd multiples of pi/2.
3931 @node Missing Value Functions, Pseudo-Random Numbers, Trigonometry, Functions
3932 @subsection Missing-Value Functions
3933 @cindex missing values
3934 @cindex values, missing
3935 @cindex functions, missing-value
3937 Missing-value functions take various numeric arguments and yield
3938 various types of results. Note that the normal rules of evaluation
3939 apply within expression arguments to these functions. In particular,
3940 user-missing values for numeric variables are converted to
3941 system-missing values.
3943 @deftypefn {Function} {} MISSING (@var{expr})
3944 Returns 1 if @var{expr} has the system-missing value, 0 otherwise.
3947 @deftypefn {Function} {} NMISS (@var{expr} [, @var{expr}]@dots{})
3948 Each argument must be a numeric expression. Returns the number of
3949 system-missing values in the list. As a special extension,
3950 the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a
3951 range of variables; see @ref{Sets of Variables}, for more details.
3954 @deftypefn {Function} {} NVALID (@var{expr} [, @var{expr}]@dots{})
3955 Each argument must be a numeric expression. Returns the number of
3956 values in the list that are not system-missing. As a special extension,
3957 the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a
3958 range of variables; see @ref{Sets of Variables}, for more details.
3961 @deftypefn {Function} {} SYSMIS (@var{expr})
3962 When @var{expr} is simply the name of a numeric variable, returns 1 if
3963 the variable has the system-missing value, 0 if it is user-missing or
3964 not missing. If given @var{expr} takes another form, results in 1 if
3965 the value is system-missing, 0 otherwise.
3968 @deftypefn {Function} {} VALUE (@var{variable})
3969 Prevents the user-missing values of @var{variable} from being
3970 transformed into system-missing values, and always results in the
3971 actual value of @var{variable}, whether it is user-missing,
3972 system-missing or not missing at all.
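
The counting functions above can be modeled in a few lines of Python. The names are hypothetical and @code{None} stands in for the system-missing value. The sketch assumes its arguments have already been evaluated, so any user-missing values have already become system-missing, as described at the start of this section.

```python
SYSMIS = None  # stand-in for the system-missing value (assumption of this sketch)

def missing(value):
    """1 if the value is system-missing, 0 otherwise."""
    return 1 if value is SYSMIS else 0

def nmiss(*values):
    """Number of system-missing values in the argument list."""
    return sum(1 for v in values if v is SYSMIS)

def nvalid(*values):
    """Number of values in the argument list that are not system-missing."""
    return sum(1 for v in values if v is not SYSMIS)

case = (4.0, SYSMIS, 7.5, SYSMIS, 0.0)
print(nmiss(*case), nvalid(*case))
```

For any argument list, the two counts add up to the number of arguments.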
3975 @node Pseudo-Random Numbers, Set Membership, Missing Value Functions, Functions
3976 @subsection Pseudo-Random Number Generation Functions
3977 @cindex random numbers
3978 @cindex pseudo-random numbers (see random numbers)
3980 Pseudo-random number generation functions take numeric arguments and
3981 produce numeric results.
3983 PSPP uses the alleged RC4 cipher as a pseudo-random number generator
3984 (PRNG). The bytes output by this PRNG are system-independent for a
3985 given random seed, but differences in endianness and floating-point
3986 formats will make PRNG results differ from system to system. RC4
3987 should produce high-quality random numbers for simulation purposes.
3988 (If you're concerned about the quality of the random number generator,
3989 well, you're using a statistical processing package---analyze it!)
3991 PSPP's implementation of RC4 has not undergone any security auditing.
3992 Furthermore, various precautions that would be necessary for secure
3993 operation, such as secure seeding and discarding the first several
3994 bytes of output, have not been taken. Therefore, PSPP's
3995 implementation of RC4 should not be used for security purposes.
3997 @cindex random numbers, normally-distributed
3998 @deftypefn {Function} {} NORMAL (@var{number})
3999 Results in a random number. Results from @code{NORMAL} are normally
4000 distributed with a mean of 0 and a standard deviation of @var{number}.
4003 @cindex random numbers, uniformly-distributed
4004 @deftypefn {Function} {} UNIFORM (@var{number})
4005 Results in a random number between 0 and @var{number}. Results from
4006 @code{UNIFORM} are evenly distributed across its entire range. There
4007 may be a maximum on the largest random number ever generated---this is
4015 (2,147,483,647), but it may be orders of magnitude
4019 @node Set Membership, Statistical Functions, Pseudo-Random Numbers, Functions
4020 @subsection Set-Membership Functions
4021 @cindex set membership
4022 @cindex membership, of set
4024 Set membership functions determine whether a value is a member of a set.
4025 They take a set of numeric arguments or a set of string arguments, and
4026 produce Boolean results.
4028 String comparisons are performed according to the rules given in
4029 @ref{Relational Operators}.
4031 @deftypefn {Function} {} ANY (@var{value}, @var{set} [, @var{set}]@dots{})
4032 Results in true if @var{value} is equal to any of the @var{set}
4033 values. Otherwise, results in false. If @var{value} is
4034 system-missing, returns system-missing. System-missing values in
4035 @var{set} do not cause ANY to return system-missing.
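The missing-value rules for @code{ANY} can be modelled with a short
Python sketch. It only illustrates the rules stated above and is not
PSPP's implementation; the name @code{any_of} and the use of
@code{None} to stand for the system-missing value are assumptions made
for the example.

```python
def any_of(value, *candidates):
    """Model of ANY; None stands in for the system-missing value."""
    if value is None:
        # A system-missing value gives a system-missing result.
        return None
    # System-missing candidates are skipped rather than propagated.
    return any(c is not None and c == value for c in candidates)
```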
4038 @deftypefn {Function} {} RANGE (@var{value}, @var{low}, @var{high} [, @var{low}, @var{high}]@dots{})
4039 Results in true if @var{value} is in any of the intervals bounded by
4040 @var{low} and @var{high} inclusive. Otherwise, results in false.
4041 Each @var{low} must be less than or equal to its corresponding
4042 @var{high} value. @var{low} and @var{high} must be given in pairs.
4043 If @var{value} is system-missing, returns system-missing.
4044 System-missing values in @var{set} do not cause RANGE to return
4048 @node Statistical Functions, String Functions, Set Membership, Functions
4049 @subsection Statistical Functions
4050 @cindex functions, statistical
4053 Statistical functions compute descriptive statistics on a list of
4054 values. Some statistics can be computed on numeric or string values;
others can only be computed on numeric values. Their results have the
4056 same type as their arguments. The current case's weighting factor
4057 (@pxref{WEIGHT}) has no effect on statistical functions.
4059 @cindex arguments, minimum valid
4060 @cindex minimum valid number of arguments
4061 With statistical functions it is possible to specify a minimum number of
4062 non-missing arguments for the function to be evaluated. To do so,
4063 append a dot and the number to the function name. For instance, to
4064 specify a minimum of three valid arguments to the MEAN function, use the
4067 @cindex coefficient of variation
4068 @cindex variation, coefficient of
4069 @deftypefn {Function} {} CFVAR (@var{number}, @var{number}[, @dots{}])
4070 Results in the coefficient of variation of the values of @var{number}.
4071 This function requires at least two valid arguments to give a
4072 non-missing result. (The coefficient of variation is the standard
4073 deviation divided by the mean.)
4077 @deftypefn {Function} {} MAX (@var{value}, @var{value}[, @dots{}])
4078 Results in the value of the greatest @var{value}. The @var{value}s may
4079 be numeric or string. Although at least two arguments must be given,
4080 only one need be valid for MAX to give a non-missing result.
4084 @deftypefn {Function} {} MEAN (@var{number}, @var{number}[, @dots{}])
4085 Results in the mean of the values of @var{number}. Although at least
4086 two arguments must be given, only one need be valid for MEAN to give a
@deftypefn {Function} {} MIN (@var{value}, @var{value}[, @dots{}])
Results in the value of the least @var{value}. The @var{value}s may
be numeric or string. Although at least two arguments must be given,
only one need be valid for MIN to give a non-missing result.
4097 @cindex standard deviation
4098 @cindex deviation, standard
4099 @deftypefn {Function} {} SD (@var{number}, @var{number}[, @dots{}])
4100 Results in the standard deviation of the values of @var{number}.
4101 This function requires at least two valid arguments to give a
4106 @deftypefn {Function} {} SUM (@var{number}, @var{number}[, @dots{}])
4107 Results in the sum of the values of @var{number}. Although at least two
arguments must be given, only one need be valid for SUM to give a
4113 @deftypefn {Function} {} VARIANCE (@var{number}, @var{number}[, @dots{}])
4114 Results in the variance of the values of @var{number}. This function
4115 requires at least two valid arguments to give a non-missing result.
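The way valid-argument counts govern these functions can be sketched
in Python. This is an illustrative model only: @code{None} stands for
a missing argument, the keyword @code{min_valid} plays the role of the
number appended after the dot, the sample (n - 1) standard deviation
is assumed, and returning a missing result for a zero mean in
@code{cfvar} is likewise an assumption.

```python
import statistics


def _valid(args):
    """Keep only the non-missing arguments (None models a missing value)."""
    return [a for a in args if a is not None]


def mean(*args, min_valid=1):
    valid = _valid(args)
    return statistics.mean(valid) if len(valid) >= max(min_valid, 1) else None


def sd(*args, min_valid=2):
    # At least two valid values are always required.
    valid = _valid(args)
    return statistics.stdev(valid) if len(valid) >= max(min_valid, 2) else None


def cfvar(*args, min_valid=2):
    # Coefficient of variation: standard deviation divided by the mean.
    s = sd(*args, min_valid=min_valid)
    m = mean(*args)
    if s is None or not m:
        return None
    return s / m
```

Here @code{mean(1, None, 3)} gives 2, while
@code{mean(1, None, 3, min_valid=3)} gives a missing result because
only two arguments are valid.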
4118 @node String Functions, Time & Date, Statistical Functions, Functions
4119 @subsection String Functions
4120 @cindex functions, string
4121 @cindex string functions
4123 String functions take various arguments and return various results.
4125 @cindex concatenation
4126 @cindex strings, concatenation of
4127 @deftypefn {Function} {} CONCAT (@var{string}, @var{string}[, @dots{}])
4128 Returns a string consisting of each @var{string} in sequence.
4129 @code{CONCAT("abc", "def", "ghi")} has a value of @code{"abcdefghi"}.
4130 The resultant string is truncated to a maximum of 255 characters.
4133 @cindex searching strings
4134 @deftypefn {Function} {} INDEX (@var{haystack}, @var{needle})
4135 Returns a positive integer indicating the position of the first
occurrence of @var{needle} in @var{haystack}. Returns 0 if @var{haystack}
4137 does not contain @var{needle}. Returns system-missing if @var{needle}
4141 @deftypefn {Function} {} INDEX (@var{haystack}, @var{needle}, @var{divisor})
4142 Divides @var{needle} into parts, each with length @var{divisor}.
4143 Searches @var{haystack} for the first occurrence of each part, and
4144 returns the smallest value. Returns 0 if @var{haystack} does not
contain any part in @var{needle}. It is an error if the length of
@var{needle} is not evenly divisible by @var{divisor}. Returns
4147 system-missing if @var{needle} is an empty string.
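The two forms of @code{INDEX} can be modelled in Python as follows.
This is a sketch of the behavior described above, not PSPP's code;
@code{None} stands for system-missing, and raising @code{ValueError}
is merely one way to represent ``it is an error''.

```python
def index(haystack, needle, divisor=None):
    """1-based position of the first match, 0 if none, None if needle is empty."""
    if needle == "":
        return None
    if divisor is None:
        divisor = len(needle)          # two-argument form: one part only
    if divisor <= 0 or len(needle) % divisor != 0:
        raise ValueError("length of needle must be a multiple of divisor")
    parts = [needle[i:i + divisor] for i in range(0, len(needle), divisor)]
    # str.find returns -1 when the part is absent, so +1 maps that to 0.
    hits = [haystack.find(part) + 1 for part in parts]
    hits = [h for h in hits if h > 0]
    return min(hits) if hits else 0
```

For instance, @code{index("abcdef", "efbc", 2)} searches for
@code{"ef"} (position 5) and @code{"bc"} (position 2) and returns 2.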
4150 @cindex strings, finding length of
4151 @deftypefn {Function} {} LENGTH (@var{string})
4152 Returns the number of characters in @var{string}.
4155 @cindex strings, case of
4156 @deftypefn {Function} {} LOWER (@var{string})
4157 Returns a string identical to @var{string} except that all uppercase
4158 letters are changed to lowercase letters. The definitions of
4159 ``uppercase'' and ``lowercase'' are system-dependent.
4162 @cindex strings, padding
4163 @deftypefn {Function} {} LPAD (@var{string}, @var{length})
4164 If @var{string} is at least @var{length} characters in length, returns
4165 @var{string} unchanged. Otherwise, returns @var{string} padded with
4166 spaces on the left side to length @var{length}. Returns an empty string
4167 if @var{length} is system-missing, negative, or greater than 255.
4170 @deftypefn {Function} {} LPAD (@var{string}, @var{length}, @var{padding})
4171 If @var{string} is at least @var{length} characters in length, returns
4172 @var{string} unchanged. Otherwise, returns @var{string} padded with
4173 @var{padding} on the left side to length @var{length}. Returns an empty
4174 string if @var{length} is system-missing, negative, or greater than 255, or
4175 if @var{padding} does not contain exactly one character.
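A Python sketch of the @code{LPAD} rules follows. It is a model for
illustration rather than PSPP's implementation; @code{None} stands for
a system-missing @var{length}, and checking for an invalid
@var{length} or @var{padding} before anything else is an assumption
about the order in which the rules apply.

```python
def lpad(string, length, padding=" "):
    """Left-pad string to length characters using a one-character padding."""
    if length is None or length < 0 or length > 255 or len(padding) != 1:
        return ""                      # invalid length or padding
    if len(string) >= length:
        return string                  # already long enough: unchanged
    return string.rjust(length, padding)
```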
4178 @cindex strings, trimming
4179 @cindex whitespace, trimming
4180 @deftypefn {Function} {} LTRIM (@var{string})
4181 Returns @var{string}, after removing leading spaces. Other whitespace,
4182 such as tabs, carriage returns, line feeds, and vertical tabs, is not
4186 @deftypefn {Function} {} LTRIM (@var{string}, @var{padding})
4187 Returns @var{string}, after removing leading @var{padding} characters.
4188 If @var{padding} does not contain exactly one character, returns an
4192 @cindex numbers, converting from strings
4193 @cindex strings, converting to numbers
4194 @deftypefn {Function} {} NUMBER (@var{string}, @var{format})
4195 Returns the number produced when @var{string} is interpreted according
4196 to format specifier @var{format}. If the format width @var{w} is less
4197 than the length of @var{string}, then only the first @var{w}
4198 characters in @var{string} are used, e.g.@: @code{NUMBER("123", F3.0)}
4199 and @code{NUMBER("1234", F3.0)} both have value 123. If @var{w} is
4200 greater than @var{string}'s length, then it is treated as if it were
4201 right-padded with spaces. If @var{string} is not in the correct
4202 format for @var{format}, system-missing is returned.
4205 @cindex strings, searching backwards
@deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle})
4207 Returns a positive integer indicating the position of the last
4208 occurrence of @var{needle} in @var{haystack}. Returns 0 if
4209 @var{haystack} does not contain @var{needle}. Returns system-missing if
4210 @var{needle} is an empty string.
4213 @deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle}, @var{divisor})
4214 Divides @var{needle} into parts, each with length @var{divisor}.
4215 Searches @var{haystack} for the last occurrence of each part, and
4216 returns the largest value. Returns 0 if @var{haystack} does not contain
any part in @var{needle}. It is an error if the length of @var{needle}
is not evenly divisible by @var{divisor}. Returns system-missing
4219 if @var{needle} is an empty string.
4222 @cindex padding strings
4223 @cindex strings, padding
4224 @deftypefn {Function} {} RPAD (@var{string}, @var{length})
4225 If @var{string} is at least @var{length} characters in length, returns
4226 @var{string} unchanged. Otherwise, returns @var{string} padded with
4227 spaces on the right to length @var{length}. Returns an empty string if
4228 @var{length} is system-missing, negative, or greater than 255.
4231 @deftypefn {Function} {} RPAD (@var{string}, @var{length}, @var{padding})
4232 If @var{string} is at least @var{length} characters in length, returns
4233 @var{string} unchanged. Otherwise, returns @var{string} padded with
4234 @var{padding} on the right to length @var{length}. Returns an empty
4235 string if @var{length} is system-missing, negative, or greater than 255,
4236 or if @var{padding} does not contain exactly one character.
4239 @cindex strings, trimming
4240 @cindex whitespace, trimming
4241 @deftypefn {Function} {} RTRIM (@var{string})
4242 Returns @var{string}, after removing trailing spaces. Other types of
4243 whitespace are not removed.
4246 @deftypefn {Function} {} RTRIM (@var{string}, @var{padding})
4247 Returns @var{string}, after removing trailing @var{padding} characters.
4248 If @var{padding} does not contain exactly one character, returns an
4252 @cindex strings, converting from numbers
4253 @cindex numbers, converting to strings
4254 @deftypefn {Function} {} STRING (@var{number}, @var{format})
4255 Returns a string corresponding to @var{number} in the format given by
4256 format specifier @var{format}. For example, @code{STRING(123.56, F5.1)}
4257 has the value @code{"123.6"}.
4261 @cindex strings, taking substrings of
4262 @deftypefn {Function} {} SUBSTR (@var{string}, @var{start})
4263 Returns a string consisting of the value of @var{string} from position
4264 @var{start} onward. Returns an empty string if @var{start} is system-missing
4265 or has a value less than 1 or greater than the number of characters in
4269 @deftypefn {Function} {} SUBSTR (@var{string}, @var{start}, @var{count})
4270 Returns a string consisting of the first @var{count} characters from
4271 @var{string} beginning at position @var{start}. Returns an empty string
4272 if @var{start} or @var{count} is system-missing, if @var{start} is less
4273 than 1 or greater than the number of characters in @var{string}, or if
4274 @var{count} is less than 1. Returns a string shorter than @var{count}
4275 characters if @var{start} + @var{count} - 1 is greater than the number
4276 of characters in @var{string}. Examples: @code{SUBSTR("abcdefg", 3, 2)}
4277 has value @code{"cd"}; @code{SUBSTR("Ben Pfaff", 5, 10)} has the value
4281 @cindex case conversion
4282 @cindex strings, case of
4283 @deftypefn {Function} {} UPCASE (@var{string})
4284 Returns @var{string}, changing lowercase letters to uppercase letters.
4287 @node Time & Date, Miscellaneous Functions, String Functions, Functions
4288 @subsection Time & Date Functions
4289 @cindex functions, time & date
4293 @cindex dates, legal range of
4294 The legal range of dates for use in PSPP is 15 Oct 1582
4295 through 31 Dec 19999.
4297 @cindex arguments, invalid
4298 @cindex invalid arguments
4300 @strong{Please note:} Most time & date extraction functions will accept
4305 Negative numbers in PSPP time format.
4307 Numbers less than 86,400 in PSPP date format.
4310 However, sensible results are not guaranteed for these invalid values.
4311 The given equivalents for these functions are definitely not guaranteed
4316 @strong{Please note also:} The time & date construction
4317 functions @strong{do} produce reasonable and useful results for
4318 out-of-range values; these are not considered invalid.
4322 * Time & Date Concepts:: How times & dates are defined and represented
4323 * Time Construction:: TIME.@{DAYS HMS@}
4324 * Time Extraction:: CTIME.@{DAYS HOURS MINUTES SECONDS@}
4325 * Date Construction:: DATE.@{DMY MDY MOYR QYR WKYR YRDAY@}
4326 * Date Extraction:: XDATE.@{DATE HOUR JDAY MDAY MINUTE MONTH
4327 QUARTER SECOND TDAY TIME WEEK
4331 @node Time & Date Concepts, Time Construction, Time & Date, Time & Date
4332 @subsubsection How times & dates are defined and represented
4334 @cindex time, concepts
4335 @cindex time, intervals
4336 Times and dates are handled by PSPP as single numbers. A
4337 @dfn{time} is an interval. PSPP measures times in seconds.
4338 Thus, the following intervals correspond with the numeric values given:
4343 1 day, 3 hours, 10 seconds 97,210
4345 10010 d, 14 min, 24 s 864,864,864
4348 @cindex dates, concepts
4349 @cindex time, instants of
4350 A @dfn{date}, on the other hand, is a particular instant in the past or
4351 the future. PSPP represents a date as a number of seconds after the
midnight that separated 3 Oct 1582 and 4 Oct 1582. (Please note that 15
Oct 1582 immediately followed 4 Oct 1582.) Thus, the midnights before
4354 the dates given below correspond with the numeric PSPP dates given:
4358 4 Jul 1776 6,113,318,400
4359 1 Jan 1900 10,010,390,400
4360 1 Oct 1978 12,495,427,200
4361 24 Aug 1995 13,028,601,600
4364 @cindex time, mathematical properties of
4365 @cindex mathematics, applied to times & dates
4366 @cindex dates, mathematical properties of
4372 A time may be added to, or subtracted from, a date, resulting in a date.
4375 The difference of two dates may be taken, resulting in a time.
4378 Two times may be added to, or subtracted from, each other, resulting in
4382 (Adding two dates does not produce a useful result.)
4384 Since times and dates are merely numbers, the ordinary addition and
4385 subtraction operators are employed for these purposes.
4388 @strong{Please note:} Many dates and times have extremely large
4389 values---just look at the values above. Thus, it is not a good idea to
4390 take powers of these values; also, the accuracy of some procedures may
4391 be affected. If necessary, convert times or dates in seconds to some
4392 other unit, like days or years, before performing analysis.
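The date representation can be checked against the table above with a
few lines of Python. This is only a sketch: it assumes the epoch is
the midnight that begins the day before 15 Oct 1582, which in Python's
proleptic Gregorian calendar is 14 Oct 1582, and the function names
are invented for the example.

```python
from datetime import datetime, timedelta

# Assumed epoch: the midnight beginning the day before 15 Oct 1582.
EPOCH = datetime(1582, 10, 14)


def to_pspp_date(moment):
    """Seconds from the epoch to the given datetime."""
    delta = moment - EPOCH
    return delta.days * 86400 + delta.seconds


def from_pspp_date(seconds):
    """Inverse conversion, from a number of seconds to a datetime."""
    return EPOCH + timedelta(seconds=seconds)
```

Subtracting two such dates yields a time in seconds; for example, the
values for 2 Jan 1900 and 1 Jan 1900 differ by 86,400.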
4395 @node Time Construction, Time Extraction, Time & Date Concepts, Time & Date
4396 @subsubsection Functions that Produce Times
4397 @cindex times, constructing
4398 @cindex constructing times
4400 These functions take numeric arguments and produce numeric results in
4404 @cindex time, in days
4405 @deftypefn {Function} {} TIME.DAYS (@var{ndays})
4406 Results in a time value corresponding to @var{ndays} days.
4407 (@code{TIME.DAYS(@var{x})} is equivalent to @code{@var{x} * 60 * 60 *
4411 @cindex hours-minutes-seconds
4412 @cindex time, in hours-minutes-seconds
4413 @deftypefn {Function} {} TIME.HMS (@var{nhours}, @var{nmins}, @var{nsecs})
4414 Results in a time value corresponding to @var{nhours} hours, @var{nmins}
4415 minutes, and @var{nsecs} seconds. (@code{TIME.HMS(@var{h}, @var{m},
4416 @var{s})} is equivalent to @code{@var{h}*60*60 + @var{m}*60 +
4420 @node Time Extraction, Date Construction, Time Construction, Time & Date
4421 @subsubsection Functions that Examine Times
4422 @cindex extraction, of time
4423 @cindex time examination
4424 @cindex examination, of times
4425 @cindex time, lengths of
4427 These functions take numeric arguments in PSPP time format and
4428 give numeric results.
4431 @cindex time, in days
4432 @deftypefn {Function} {} CTIME.DAYS (@var{time})
4433 Results in the number of days and fractional days in @var{time}.
4434 (@code{CTIME.DAYS(@var{x})} is equivalent to @code{@var{x}/60/60/24}.)
4438 @cindex time, in hours
4439 @deftypefn {Function} {} CTIME.HOURS (@var{time})
4440 Results in the number of hours and fractional hours in @var{time}.
4441 (@code{CTIME.HOURS(@var{x})} is equivalent to @code{@var{x}/60/60}.)
4445 @cindex time, in minutes
4446 @deftypefn {Function} {} CTIME.MINUTES (@var{time})
4447 Results in the number of minutes and fractional minutes in @var{time}.
4448 (@code{CTIME.MINUTES(@var{x})} is equivalent to @code{@var{x}/60}.)
4452 @cindex time, in seconds
4453 @deftypefn {Function} {} CTIME.SECONDS (@var{time})
4454 Results in the number of seconds and fractional seconds in @var{time}.
4455 (@code{CTIME.SECONDS} does nothing; @code{CTIME.SECONDS(@var{x})} is
4456 equivalent to @code{@var{x}}.)
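Because a time is simply a number of seconds, the construction and
extraction functions reduce to arithmetic. The Python functions below
mirror that arithmetic; their names are invented for this sketch and
they are not PSPP's implementation.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def time_days(ndays):
    return ndays * SECONDS_PER_DAY


def time_hms(nhours, nmins, nsecs):
    return nhours * 60 * 60 + nmins * 60 + nsecs


def ctime_days(time):
    return time / 60 / 60 / 24


def ctime_hours(time):
    return time / 60 / 60


def ctime_minutes(time):
    return time / 60


def ctime_seconds(time):
    return time
```

With these definitions, @code{time_days(1) + time_hms(3, 0, 10)}
reproduces the value 97,210 given earlier for 1 day, 3 hours, 10
seconds.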
4459 @node Date Construction, Date Extraction, Time Extraction, Time & Date
4460 @subsubsection Functions that Produce Dates
4461 @cindex dates, constructing
4462 @cindex constructing dates
4464 @cindex arguments, of date construction functions
4465 These functions take numeric arguments and give numeric results in the
4466 PSPP date format. Arguments taken by these functions are:
4470 Refers to a day of the month between 1 and 31.
4473 Refers to a month of the year between 1 and 12.
4476 Refers to a quarter of the year between 1 and 4. The quarters of the
4477 year begin on the first days of months 1, 4, 7, and 10.
4480 Refers to a week of the year between 1 and 53.
4483 Refers to a day of the year between 1 and 366.
4486 Refers to a year between 1582 and 19999.
4489 @cindex arguments, invalid
4490 If these functions' arguments are out-of-range, they are correctly
4491 normalized before conversion to date format. Non-integers are rounded
4494 @cindex day-month-year
4495 @cindex dates, day-month-year
4496 @deftypefn {Function} {} DATE.DMY (@var{day}, @var{month}, @var{year})
4497 @deftypefnx {Function} {} DATE.MDY (@var{month}, @var{day}, @var{year})
4498 Results in a date value corresponding to the midnight before day
4499 @var{day} of month @var{month} of year @var{year}.
4503 @cindex dates, month-year
4504 @deftypefn {Function} {} DATE.MOYR (@var{month}, @var{year})
4505 Results in a date value corresponding to the midnight before the first
4506 day of month @var{month} of year @var{year}.
4509 @cindex quarter-year
4510 @cindex dates, quarter-year
4511 @deftypefn {Function} {} DATE.QYR (@var{quarter}, @var{year})
4512 Results in a date value corresponding to the midnight before the first
4513 day of quarter @var{quarter} of year @var{year}.
4517 @cindex dates, week-year
4518 @deftypefn {Function} {} DATE.WKYR (@var{week}, @var{year})
4519 Results in a date value corresponding to the midnight before the first
4520 day of week @var{week} of year @var{year}.
4524 @cindex dates, year-day
4525 @deftypefn {Function} {} DATE.YRDAY (@var{year}, @var{yday})
4526 Results in a date value corresponding to the midnight before day
4527 @var{yday} of year @var{year}.
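The date construction functions can be modelled on top of Python's
@code{datetime} module. The sketch below assumes in-range integer
arguments (it does not model normalization or rounding), assumes an
epoch at the midnight beginning the day before 15 Oct 1582, and omits
@code{DATE.WKYR} because the week-numbering rule is not spelled out
here. All function names are invented for the example.

```python
from datetime import date, timedelta

EPOCH = date(1582, 10, 14)   # assumed epoch, proleptic Gregorian


def _to_pspp(day):
    """Seconds from the epoch to the midnight before the given day."""
    return (day - EPOCH).days * 86400


def date_dmy(day, month, year):
    return _to_pspp(date(year, month, day))


def date_mdy(month, day, year):
    return date_dmy(day, month, year)


def date_moyr(month, year):
    return date_dmy(1, month, year)


def date_qyr(quarter, year):
    # Quarters begin on the first days of months 1, 4, 7, and 10.
    return date_dmy(1, 3 * (quarter - 1) + 1, year)


def date_yrday(year, yday):
    return _to_pspp(date(year, 1, 1) + timedelta(days=yday - 1))
```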
4530 @node Date Extraction, , Date Construction, Time & Date
4531 @subsubsection Functions that Examine Dates
4532 @cindex extraction, of dates
4533 @cindex date examination
4535 @cindex arguments, of date extraction functions
4536 These functions take numeric arguments in PSPP date or time
4537 format and give numeric results. These names are used for arguments:
4541 A numeric value in PSPP date format.
4544 A numeric value in PSPP time format.
4547 A numeric value in PSPP time or date format.
4551 @cindex dates, in days
4552 @cindex time, in days
4553 @deftypefn {Function} {} XDATE.DATE (@var{time-or-date})
For a time, results in the time corresponding to the number of whole
days @var{time-or-date} includes. For a date, results in the date
corresponding to the latest midnight at or before @var{time-or-date};
that is, gives the date that @var{time-or-date} is in.
4558 (XDATE.DATE(@var{x}) is equivalent to TRUNC(@var{x}/86400)*86400.)
4559 Applying this function to a time is a non-portable feature.
4563 @cindex dates, in hours
4564 @cindex time, in hours
4565 @deftypefn {Function} {} XDATE.HOUR (@var{time-or-date})
For a time, results in the number of whole hours beyond the number of
whole days represented by @var{time-or-date}. For a date, results in
the hour (as an integer between 0 and 23) corresponding to
@var{time-or-date}. (XDATE.HOUR(@var{x}) is equivalent to
MOD(TRUNC(@var{x}/3600),24).) Applying this function to a time is a
4571 non-portable feature.
4574 @cindex day of the year
4575 @cindex dates, day of the year
4576 @deftypefn {Function} {} XDATE.JDAY (@var{date})
4577 Results in the day of the year (as an integer between 1 and 366)
4578 corresponding to @var{date}.
4581 @cindex day of the month
4582 @cindex dates, day of the month
4583 @deftypefn {Function} {} XDATE.MDAY (@var{date})
4584 Results in the day of the month (as an integer between 1 and 31)
4585 corresponding to @var{date}.
4589 @cindex dates, in minutes
4590 @cindex time, in minutes
4591 @deftypefn {Function} {} XDATE.MINUTE (@var{time-or-date})
4592 Results in the number of minutes (as an integer between 0 and 59) after
4593 the last hour in @var{time-or-date}. (XDATE.MINUTE(@var{x}) is
equivalent to MOD(TRUNC(@var{x}/60),60).) Applying this function to a
4595 time is a non-portable feature.
4599 @cindex dates, in months
4600 @deftypefn {Function} {} XDATE.MONTH (@var{date})
4601 Results in the month of the year (as an integer between 1 and 12)
4602 corresponding to @var{date}.
4606 @cindex dates, in quarters
4607 @deftypefn {Function} {} XDATE.QUARTER (@var{date})
4608 Results in the quarter of the year (as an integer between 1 and 4)
4609 corresponding to @var{date}.
4613 @cindex dates, in seconds
4614 @cindex time, in seconds
4615 @deftypefn {Function} {} XDATE.SECOND (@var{time-or-date})
4616 Results in the number of whole seconds after the last whole minute (as
4617 an integer between 0 and 59) in @var{time-or-date}.
4618 (XDATE.SECOND(@var{x}) is equivalent to MOD(@var{x}, 60).) Applying
4619 this function to a time is a non-portable feature.
4623 @cindex times, in days
4624 @deftypefn {Function} {} XDATE.TDAY (@var{time})
4625 Results in the number of whole days (as an integer) in @var{time}.
4626 (XDATE.TDAY(@var{x}) is equivalent to TRUNC(@var{x}/86400).)
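The arithmetic equivalents quoted for several of the functions above
translate directly into Python. The sketch assumes non-negative
arguments (Python's @code{%} and PSPP's @code{MOD} need not agree for
negative values), and the function names are invented for the example.

```python
import math


def xdate_date(x):
    return math.trunc(x / 86400) * 86400


def xdate_hour(x):
    return math.trunc(x / 3600) % 24


def xdate_minute(x):
    return math.trunc(x / 60) % 60


def xdate_second(x):
    return x % 60


def xdate_tday(x):
    return math.trunc(x / 86400)
```

Taking 13:45:30 on 24 Aug 1995, which is 13,028,651,130 in date
format, the functions give the date 13,028,601,600, hour 13, minute
45, and second 30.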
4630 @cindex dates, time of day
4631 @deftypefn {Function} {} XDATE.TIME (@var{date})
4632 Results in the time of day at the instant corresponding to @var{date},
4633 in PSPP time format. This is the number of seconds since
midnight on the day corresponding to @var{date}. (XDATE.TIME(@var{x}) is
equivalent to MOD(@var{x}, 86400).)
4639 @cindex dates, in weeks
4640 @deftypefn {Function} {} XDATE.WEEK (@var{date})
4641 Results in the week of the year (as an integer between 1 and 53)
4642 corresponding to @var{date}.
4645 @cindex day of the week
4647 @cindex dates, day of the week
4648 @cindex dates, in weekdays
4649 @deftypefn {Function} {} XDATE.WKDAY (@var{date})
4650 Results in the day of week (as an integer between 1 and 7) corresponding
4651 to @var{date}. The days of the week are:
4672 @cindex dates, in years
4673 @deftypefn {Function} {} XDATE.YEAR (@var{date})
4674 Returns the year (as an integer between 1582 and 19999) corresponding to
4678 @node Miscellaneous Functions, Functions Not Implemented, Time & Date, Functions
4679 @subsection Miscellaneous Functions
4680 @cindex functions, miscellaneous
4682 Miscellaneous functions take various arguments and produce various
4685 @cindex cross-case function
4686 @cindex function, cross-case
4687 @deftypefn {Function} {} LAG (@var{variable})
4689 @var{variable} must be a numeric or string variable name. @code{LAG}
4690 results in the value of that variable for the case before the current
4691 one. In case-selection procedures, @code{LAG} results in the value of
4692 the variable for the last case selected. Results in system-missing (for
4693 numeric variables) or blanks (for string variables) for the first case
4694 or before any cases are selected.
4697 @deftypefn {Function} {} LAG (@var{variable}, @var{ncases})
4698 @var{variable} must be a numeric or string variable name. @var{ncases}
4699 must be a small positive constant integer, although there is no explicit
4700 limit. (Use of a large value for @var{ncases} will increase memory
4701 consumption, since PSPP must keep @var{ncases} cases in memory.)
@code{LAG (@var{variable}, @var{ncases})} results in the value of
4703 @var{variable} that is @var{ncases} before the case currently being
4704 processed. See @code{LAG (@var{variable})} above for more details.
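The memory remark can be made concrete with a Python sketch that keeps
a window of @var{ncases} earlier values. It models a plain pass over
the cases (no case selection) and uses @code{None} where PSPP would
give system-missing or blanks; the name @code{lagged} is invented for
the example.

```python
from collections import deque


def lagged(values, ncases=1):
    """Yield, for each case, the value from ncases cases earlier."""
    window = deque(maxlen=ncases)      # holds at most ncases earlier values
    for value in values:
        # Until ncases earlier cases have been seen there is nothing to return.
        yield window[0] if len(window) == ncases else None
        window.append(value)
```

For example, @code{list(lagged([10, 20, 30, 40], 2))} is
@code{[None, None, 10, 20]}.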
4707 @cindex date, Julian
4709 @deftypefn {Function} {} YRMODA (@var{year}, @var{month}, @var{day})
4710 @var{year} is a year between 0 and 199 or 1582 and 19999. @var{month} is
4711 a month between 1 and 12. @var{day} is a day between 1 and 31. If
4712 @var{month} or @var{day} is out-of-range, it changes the next higher
4713 unit. For instance, a @var{day} of 0 refers to the last day of the
4714 previous month, and a @var{month} of 13 refers to the first month of the
4715 next year. @var{year} must be in range. If @var{year} is between 0 and
4716 199, 1900 is added. @var{year}, @var{month}, and @var{day} must all be
4719 @code{YRMODA} results in the number of days between 15 Oct 1582 and
4720 the date specified, plus one. The date passed to @code{YRMODA} must be
4721 on or after 15 Oct 1582. 15 Oct 1582 has a value of 1.
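The carrying rules for @code{YRMODA} can be sketched in Python. The
model below performs no range checking, treats any out-of-range
@var{month} or @var{day} by carrying into the next higher unit (an
assumption that generalizes the two examples given), and relies on
Python's proleptic Gregorian calendar.

```python
from datetime import date, timedelta


def yrmoda(year, month, day):
    """Days from 15 Oct 1582 to the given date, plus one."""
    if 0 <= year <= 199:
        year += 1900
    # Carry an out-of-range month into the year.
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    # Carry an out-of-range day by offsetting from the first of the month.
    target = date(year, month, 1) + timedelta(days=day - 1)
    return (target - date(1582, 10, 15)).days + 1
```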
4724 @node Functions Not Implemented, , Miscellaneous Functions, Functions
4725 @subsection Functions Not Implemented
4726 @cindex functions, not implemented
4727 @cindex not implemented
4728 @cindex features, not implemented
4730 These functions are not yet implemented and thus not yet documented,
4731 since it's a hassle.
4755 @node Order of Operations, , Functions, Expressions
4756 @section Operator Precedence
4757 @cindex operator precedence
4758 @cindex precedence, operator
4759 @cindex order of operations
4760 @cindex operations, order of
4762 The following table describes operator precedence. Smaller-numbered
4763 levels in the table have higher precedence. Within a level, operations
4764 are performed from left to right, except for level 2 (exponentiation),
4765 where operations are performed from right to left. If an operator
4766 appears in the table in two places (@code{-}), the first occurrence is
4767 unary, the second is binary.
4781 @code{EQ GE GT LE LT NE}
4786 @node Data Input and Output, System and Portable Files, Expressions, Top
4787 @chapter Data Input and Output
4792 @cindex observations
4794 Data are the focus of the PSPP language.
4795 Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
4796 Each case represents an individual or `experimental unit'.
4797 For example, in the results of a survey, the names of the respondents,
their sex, age @i{etc}. and their responses are all data, and the data
pertaining to a single respondent is a case.
4800 This chapter examines
4801 the PSPP commands for defining variables and reading and writing data.
4804 @strong{Please note:} Data is not actually read until a procedure is
4805 executed. These commands tell PSPP how to read data, but they
4806 do not @emph{cause} PSPP to read data.
4810 * BEGIN DATA:: Embed data within a syntax file.
4811 * CLEAR TRANSFORMATIONS:: Clear pending transformations.
4812 * DATA LIST:: Fundamental data reading command.
4813 * END CASE:: Output the current case.
4814 * END FILE:: Terminate the current input program.
4815 * FILE HANDLE:: Support for fixed-length records.
4816 * INPUT PROGRAM:: Support for complex input programs.
4817 * LIST:: List cases in the active file.
4818 * MATRIX DATA:: Read matrices in text format.
4819 * NEW FILE:: Clear the active file and dictionary.
4820 * PRINT:: Display values in print formats.
4821 * PRINT EJECT:: Eject the current page then print.
4822 * PRINT SPACE:: Print blank lines.
4823 * REREAD:: Take another look at the previous input line.
4824 * REPEATING DATA:: Multiple cases on a single line.
4825 * WRITE:: Display values in write formats.
4828 @node BEGIN DATA, CLEAR TRANSFORMATIONS, Data Input and Output, Data Input and Output
4832 @cindex Embedding data in syntax files
4833 @cindex Data, embedding in syntax files
4841 @cmd{BEGIN DATA} and @cmd{END DATA} can be used to embed raw ASCII
4842 data in a PSPP syntax file. @cmd{DATA LIST} or another input
4843 procedure must be used before @cmd{BEGIN DATA} (@pxref{DATA LIST}).
4844 @cmd{BEGIN DATA} and @cmd{END DATA} must be used together. @cmd{END
4845 DATA} must appear by itself on a single line, with no leading
4846 whitespace and exactly one space between the words @code{END} and
4847 @code{DATA}, followed immediately by the terminal dot, like this:
4853 @node CLEAR TRANSFORMATIONS, DATA LIST, BEGIN DATA, Data Input and Output
4854 @section CLEAR TRANSFORMATIONS
4855 @vindex CLEAR TRANSFORMATIONS
4858 CLEAR TRANSFORMATIONS.
4861 @cmd{CLEAR TRANSFORMATIONS} clears out all pending
4862 transformations. It does not cancel the current input program. It is
4863 valid only when PSPP is interactive, not in syntax files.
4865 @node DATA LIST, END CASE, CLEAR TRANSFORMATIONS, Data Input and Output
4868 @cindex reading data from a file
4869 @cindex data, reading from a file
4870 @cindex data, embedding in syntax files
4871 @cindex embedding data in syntax files
4873 Used to read text or binary data, @cmd{DATA LIST} is the most
4874 fundamental data-reading command. Even the more sophisticated input
4875 methods use @cmd{DATA LIST} commands as a building block.
4876 Understanding @cmd{DATA LIST} is important to understanding how to use
4877 PSPP to read your data files.
4879 There are two major variants of @cmd{DATA LIST}, which are fixed
4880 format and free format. In addition, free format has a minor variant,
4881 list format, which is discussed in terms of its differences from vanilla
4884 Each form of @cmd{DATA LIST} is described in detail below.
4887 * DATA LIST FIXED:: Fixed columnar locations for data.
4888 * DATA LIST FREE:: Any spacing you like.
4889 * DATA LIST LIST:: Each case must be on a single line.
4892 @node DATA LIST FIXED, DATA LIST FREE, DATA LIST, DATA LIST
4893 @subsection DATA LIST FIXED
4894 @vindex DATA LIST FIXED
4895 @cindex reading fixed-format data
4896 @cindex fixed-format data, reading
4897 @cindex data, fixed-format, reading
4898 @cindex embedding fixed-format data
4904 RECORDS=record_count
4906 /[line_no] var_spec@dots{}
4908 where each var_spec takes one of the forms
4909 var_list start-end [type_spec]
4910 var_list (fortran_spec)
4913 @cmd{DATA LIST FIXED} is used to read data files that have values at fixed
4914 positions on each line of single-line or multiline records. The
4915 keyword FIXED is optional.
4917 The FILE subcommand must be used if input is to be taken from an
4918 external file. It may be used to specify a filename as a string or a
4919 file handle (@pxref{FILE HANDLE}). If the FILE subcommand is not used,
4920 then input is assumed to be specified within the command file using
4921 @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}).
4923 The optional RECORDS subcommand, which takes a single integer as an
4924 argument, is used to specify the number of lines per record. If RECORDS
4925 is not specified, then the number of lines per record is calculated from
4926 the list of variable specifications later in @cmd{DATA LIST}.
4928 The END subcommand is only useful in conjunction with @cmd{INPUT
4929 PROGRAM}. @xref{INPUT PROGRAM}, for details.
4931 @cmd{DATA LIST} can optionally output a table describing how the data file
4932 will be read. The TABLE subcommand enables this output, and NOTABLE
4933 disables it. The default is to output the table.
4935 The list of variables to be read from the data list must come last.
4936 Each line in the data record is introduced by a slash (@samp{/}).
4937 Optionally, a line number may follow the slash. After that, any number
4938 of variable specifications may be present.
4940 Each variable specification consists of a list of variable names
4941 followed by a description of their location on the input line. Sets of
4942 variables may be specified using the @code{DATA LIST} TO convention
4944 Variables}). There are two ways to specify the location of the variable
4945 on the line: PSPP style and FORTRAN style.
4947 With PSPP style, the starting column and ending column for the field
4948 are specified after the variable name, separated by a dash (@samp{-}).
4949 For instance, the third through fifth columns on a line would be
4950 specified @samp{3-5}. By default, variables are considered to be in
4951 @samp{F} format (@pxref{Input/Output Formats}). (This default can be
4952 changed; see @ref{SET} for more information.)
4954 When using PSPP style, to use a variable format other than the default,
4955 specify the format type in parentheses after the column numbers. For
4956 instance, for alphanumeric @samp{A} format, use @samp{(A)}.
4958 In addition, implied decimal places can be specified in parentheses
4959 after the column numbers. As an example, suppose that a data file has a
4960 field in which the characters @samp{1234} should be interpreted as
4961 having the value 12.34. Then this field has two implied decimal places,
4962 and the corresponding specification would be @samp{(2)}. If a field
4963 that has implied decimal places contains a decimal point, then the
4964 implied decimal places are not applied.
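
As a minimal sketch of implied decimal places (the variable name
@code{PRICE} and the data values are illustrative only), the following
should read the first data line as 12.34 and, because the field
contains an explicit decimal point, the second as 12.5:

@example
DATA LIST FIXED NOTABLE /PRICE 1-4 (2).
BEGIN DATA.
1234
12.5
END DATA.
LIST.
@end example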
4966 Changing the variable format and adding implied decimal places can be
4967 done together; for instance, @samp{(N,5)}.
4969 When using PSPP style, the input and output width of each variable is
4970 computed from the field width. The field width must be evenly divisible
4971 by the number of variables specified.
4973 FORTRAN style is an altogether different approach to specifying field
4974 locations. With this approach, a list of variable input format
4975 specifications, separated by commas, is placed after the variable names
4976 inside parentheses. Each format specifier advances as many characters
4977 into the input line as it uses.
4979 In addition to the standard format specifiers (@pxref{Input/Output
4980 Formats}), FORTRAN style defines some extensions:
4984 Advance the current column on this line by one character position.
4986 @item @code{T}@var{x}
4987 Set the current column on this line to column @var{x}, with column
4988 numbers considered to begin with 1 at the left margin.
4990 @item @code{NEWREC}@var{x}
4991 Skip forward @var{x} lines in the current record, resetting the active
4992 column to the left margin.
4995 Any format specifier may be preceded by a number. This causes the
4996 action of that format specifier to be repeated the specified number of
4999 @item (@var{spec1}, @dots{}, @var{specN})
5000 Group the given specifiers together. This is most useful when preceded
5001 by a repeat count. Groups may be nested arbitrarily.
5004 FORTRAN and PSPP styles may be freely intermixed. PSPP style leaves the
5005 active column immediately after the ending column specified. Record
5006 motion using @code{NEWREC} in FORTRAN style also applies to later
5007 FORTRAN and PSPP specifiers.
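
The following sketch mixes the two styles, assuming the rules described
above; the variable names and data are hypothetical. @code{ID} is given
in PSPP style in columns 1 through 3, which leaves the active column at
column 4. The @code{X} specifier then skips column 4, and the repeat
count in @code{3F2.0} places @code{SCORE1}, @code{SCORE2}, and
@code{SCORE3} in columns 5-6, 7-8, and 9-10.

@example
DATA LIST FIXED NOTABLE /ID 1-3 SCORE1 TO SCORE3 (X,3F2.0).
BEGIN DATA.
001 102311
002 122015
END DATA.
@end example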
5010 * DATA LIST FIXED Examples:: Examples of DATA LIST FIXED.
5013 @node DATA LIST FIXED Examples, , DATA LIST FIXED, DATA LIST FIXED
5014 @unnumberedsubsubsec Examples
5019 DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
5028 Defines the following variables:
5032 @code{NAME}, a 10-character-wide long string variable, in columns 1
5036 @code{INFO1}, a numeric variable, in columns 12 through 13.
5039 @code{INFO2}, a numeric variable, in columns 14 through 15.
5042 @code{INFO3}, a numeric variable, in columns 16 through 17.
5045 The @code{BEGIN DATA}/@code{END DATA} commands cause three cases to be
5049 Case NAME INFO1 INFO2 INFO3
5050 1 John Smith 10 23 11
5051 2 Bob Arnold 12 20 15
5055 The @code{TABLE} keyword causes PSPP to print out a table
5056 describing the four variables defined.
5060 DAT LIS FIL="survey.dat"
5061 /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
5066 Defines the following variables:
5070 @code{ID}, a numeric variable, in columns 1-5 of the first record.
5073 @code{NAME}, a 30-character long string variable, in columns 7-36 of the
5077 @code{SURNAME}, a 30-character long string variable, in columns 38-67 of
5081 @code{MINITIAL}, a 1-character short string variable, in column 69 of
5085 Fifty variables @code{Q01}, @code{Q02}, @code{Q03}, @dots{}, @code{Q49},
5086 @code{Q50}, all numeric, @code{Q01} in column 7, @code{Q02} in column 8,
5087 @dots{}, @code{Q49} in column 55, @code{Q50} in column 56, all in the second
5091 Cases are separated by a blank record.
5093 Data is read from file @file{survey.dat} in the current directory.
5095 This example shows keywords abbreviated to their first 3 letters.
5099 @node DATA LIST FREE, DATA LIST LIST, DATA LIST FIXED, DATA LIST
5100 @subsection DATA LIST FREE
5101 @vindex DATA LIST FREE
5110 where each var_spec takes one of the forms
5111 var_list [(type_spec)]
5115 In free format, the input data is structured as a series of comma- or
5116 whitespace-delimited fields (end of line is one form of whitespace; it
5117 is not treated specially). Field contents may be surrounded by matched
5118 pairs of apostrophes (@samp{'}) or quotes (@samp{"}), or they may be
5119 unenclosed. For any type of field, leading white space (up to the
5120 apostrophe or quote, if any) is not included in the field.
5122 Multiple consecutive delimiters are equivalent to a single delimiter.
5123 To specify an empty field, write an empty set of single or double
5124 quotes; for instance, @samp{""}.
5126 The NOTABLE and TABLE subcommands are as in @cmd{DATA LIST FIXED} above.
5127 NOTABLE is the default.
5129 The FILE and END subcommands are as in @cmd{DATA LIST FIXED} above.
5131 The variables to be parsed are given as a single list of variable names.
5132 This list must be introduced by a single slash (@samp{/}). The set of
5133 variable names may contain format specifications in parentheses
5134 (@pxref{Input/Output Formats}). Format specifications apply to all
5135 variables back to the previous parenthesized format specification.
5137 In addition, an asterisk may be used to indicate that all variables
5138 preceding it are to have input/output format @samp{F8.0}.
5140 Field widths in format specifications are ignored on input, although all
5141 normal limits on field width still apply; they are honored on output.
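
A short sketch of free-format input follows; the variable names and
data are made up for illustration. The asterisk gives @code{AGE} and
@code{HEIGHT} format @samp{F8.0}, and @samp{(A10)} applies to
@code{NAME} only. Because end of line is treated as ordinary
whitespace, the second case is allowed to start on the first data line
and finish on the next.

@example
DATA LIST FREE /AGE HEIGHT * NAME (A10).
BEGIN DATA.
34 180 'John Smith' 41
175 "Bob Arnold"
END DATA.
@end example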
5143 @node DATA LIST LIST, , DATA LIST FREE, DATA LIST
5144 @subsection DATA LIST LIST
5145 @vindex DATA LIST LIST
5154 where each var_spec takes one of the forms
5155 var_list [(type_spec)]
5159 With one exception, @cmd{DATA LIST LIST} is syntactically and
5160 semantically equivalent to @cmd{DATA LIST FREE}. The exception is
5161 that each input line is expected to correspond to exactly one input
5162 record. If more or fewer fields are found on an input line than
5163 expected, an appropriate diagnostic is issued.
5165 @node END CASE, END FILE, DATA LIST, Data Input and Output
5173 @cmd{END CASE} is used only within @cmd{INPUT PROGRAM} to output the
5174 current case. @xref{INPUT PROGRAM}, for details.
5176 @node END FILE, FILE HANDLE, END CASE, Data Input and Output
5184 @cmd{END FILE} is used only within @cmd{INPUT PROGRAM} to terminate
5185 the current input program. @xref{INPUT PROGRAM}.
5187 @node FILE HANDLE, INPUT PROGRAM, END FILE, Data Input and Output
5188 @section FILE HANDLE
5192 FILE HANDLE handle_name
5194 /RECFORM=@{VARIABLE,FIXED,SPANNED@}
5196 /MODE=@{CHARACTER,IMAGE,BINARY,MULTIPUNCH,360@}
5199 Use @cmd{FILE HANDLE} to define the attributes of a file that does
5200 not use conventional variable-length records terminated by newline
5203 Specify the file handle name as an identifier. Any given identifier may
5204 only appear once in a PSPP run. File handles may not be reassigned to a
5205 different file. The file handle name must immediately follow the @cmd{FILE
5206 HANDLE} command name.
5208 The NAME subcommand specifies the name of the file associated with the
5209 handle. It is the only required subcommand.
5211 The RECFORM subcommand specifies how the file is laid out. VARIABLE
5212 specifies variable-length lines terminated with newlines, and it is the
5213 default. FIXED specifies fixed-length records. SPANNED is not
5216 LRECL specifies the length of fixed-length records. It is required if
5217 @code{/RECFORM FIXED} is specified.
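
For instance, a file of 80-character fixed-length records might be
declared and then read as sketched below. The handle name @code{CARDS},
the file name, and the variable layout are assumptions chosen for the
example.

@example
FILE HANDLE CARDS /NAME='cards.dat' /RECFORM=FIXED /LRECL=80.
DATA LIST FILE=CARDS NOTABLE /ID 1-5 SCORE 6-8.
@end example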
5219 MODE specifies a file mode. CHARACTER, the default, causes the data
5220 file to be opened in ANSI C text mode. BINARY causes the data file to
5221 be opened in ANSI C binary mode. The other possibilities are not
5224 @node INPUT PROGRAM, LIST, FILE HANDLE, Data Input and Output
5225 @section INPUT PROGRAM
5226 @vindex INPUT PROGRAM
5230 @dots{} input commands @dots{}
5234 @cmd{INPUT PROGRAM}@dots{}@cmd{END INPUT PROGRAM} specifies a
5235 complex input program. By placing data input commands within @cmd{INPUT
5236 PROGRAM}, PSPP programs can take advantage of more complex file
5237 structures than available with only @cmd{DATA LIST}.
5239 The first sort of extended input program is to simply put multiple @cmd{DATA
5240 LIST} commands within the @cmd{INPUT PROGRAM}. This will cause all of
5242 files to be read in parallel. Input will stop when end of file is
5243 reached on any of the data files.
5245 Transformations, such as conditional and looping constructs, can also be
5246 included within @cmd{INPUT PROGRAM}. These can be used to combine input
5247 from several data files in more complex ways. However, input will still
5248 stop when end of file is reached on any of the data files.
5250 To prevent @cmd{INPUT PROGRAM} from terminating at the first end of
5252 the END subcommand on @cmd{DATA LIST}. This subcommand takes a
5254 which should be a numeric scratch variable (@pxref{Scratch Variables}).
5255 (It need not be a scratch variable but otherwise the results can be
5256 surprising.) The value of this variable is set to 0 when reading the
5257 data file, or 1 when end of file is encountered.
5259 Two additional commands are useful in conjunction with @cmd{INPUT PROGRAM}.
5260 @cmd{END CASE} is the first. Normally each loop through the
5262 structure produces one case. @cmd{END CASE} controls exactly
5263 when cases are output. When @cmd{END CASE} is used, looping from the end of
5264 @cmd{INPUT PROGRAM} to the beginning does not cause a case to be output.
5266 @cmd{END FILE} is the second. When the END subcommand is used on @cmd{DATA
5267 LIST}, there is no way for the @cmd{INPUT PROGRAM} construct to stop
5269 so an infinite loop results. @cmd{END FILE}, when executed,
5270 stops the flow of input data and passes out of the @cmd{INPUT PROGRAM}
5273 All this is very confusing. A few examples should help to clarify.
5277 DATA LIST NOTABLE FILE='a.data'/X 1-10.
5278 DATA LIST NOTABLE FILE='b.data'/Y 1-10.
5283 The example above reads variable X from file @file{a.data} and variable
5284 Y from file @file{b.data}. If one file is shorter than the other then
5285 the extra data in the longer file is ignored.
5292 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
5295 DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
5305 The above example reads variable X from @file{a.data} and variable Y from
5306 @file{b.data}. If one file is shorter than the other then the missing
5307 field is set to the system-missing value alongside the present value for
5308 the remaining length of the longer file.
5315 DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
5322 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
5331 The above example reads data from file @file{a.data}, then from
5332 @file{b.data}, and concatenates them into a single active file.
5339 DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
5347 DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
5358 The above example does the same thing as the previous example, in a
5364 COMPUTE X=UNIFORM(10).
5369 LIST/FORMAT=NUMBERED.
5372 The above example causes an active file to be created consisting of 50
5373 random variates between 0 and 10.
5375 @node LIST, MATRIX DATA, INPUT PROGRAM, Data Input and Output
5382 /CASES=FROM start_index TO end_index BY incr_index
5383 /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@}
5387 The @cmd{LIST} procedure prints the values of specified variables to the
5390 The VARIABLES subcommand specifies the variables whose values are to be
5391 printed. Keyword VARIABLES is optional. If the VARIABLES subcommand is not
5392 specified then all variables in the active file are printed.
5394 The CASES subcommand can be used to specify a subset of cases to be
5395 printed. Specify FROM and the case number of the first case to print,
5396 TO and the case number of the last case to print, and BY and the number
5397 of cases to advance between printing cases, or any subset of those
5398 settings. If CASES is not specified then all cases are printed.
5400 The FORMAT subcommand can be used to change the output format. NUMBERED
5401 will print case numbers along with each case; UNNUMBERED, the default,
5402 causes the case numbers to be omitted. The WRAP and SINGLE settings are
5403 currently not used. WEIGHT will cause case weights to be printed along
5404 with variable values; NOWEIGHT, the default, causes case weights to be
5405 omitted from the output.
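
As an illustration, assuming an active file that contains variables
named @code{ID} and @code{NAME}, the following would print those two
variables for every fifth case from case 10 through case 50, with case
numbers:

@example
LIST /VARIABLES=ID NAME /CASES=FROM 10 TO 50 BY 5 /FORMAT=NUMBERED.
@end example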
5407 Case numbers start from 1. They are counted after all transformations
5408 have been considered.
5410 @cmd{LIST} attempts to fit all the values on a single line. If needed
5411 to make them fit, variable names are displayed vertically. If values
5412 cannot fit on a single line, then a multi-line format will be used.
5414 @cmd{LIST} is a procedure. It causes the data to be read.
5416 @node MATRIX DATA, NEW FILE, LIST, Data Input and Output
5417 @section MATRIX DATA
5424 /FORMAT=@{LIST,FREE@} @{LOWER,UPPER,FULL@} @{DIAGONAL,NODIAGONAL@}
5425 /SPLIT=@{new_var,var_list@}
5429 /CONTENTS=@{N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
5430 DFE,MAT,COV,CORR,PROX@}
5433 The @cmd{MATRIX DATA} command reads square matrices in one of several textual
5434 formats. @cmd{MATRIX DATA} clears the dictionary and replaces it and
5438 Use VARIABLES to specify the variables that form the rows and columns of
5439 the matrices. You may not specify a variable named @code{VARNAME_}. You
5440 should specify VARIABLES first.
5442 Specify the file to read on FILE, either as a file name string or a file
5443 handle (@pxref{FILE HANDLE}). If FILE is not specified then matrix data
5444 must immediately follow @cmd{MATRIX DATA} with a @cmd{BEGIN
5445 DATA}@dots{}@cmd{END DATA}
5446 construct (@pxref{BEGIN DATA}).
5448 The FORMAT subcommand specifies how the matrices are formatted. LIST,
5449 the default, indicates that there is one line per row of matrix data;
5450 FREE allows single matrix rows to be broken across multiple lines. This
5451 is analogous to the difference between @cmd{DATA LIST FREE} and
5452 @cmd{DATA LIST LIST}
5453 (@pxref{DATA LIST}). LOWER, the default, indicates that the lower
5454 triangle of the matrix is given; UPPER indicates the upper triangle; and
5455 FULL indicates that the entire matrix is given. DIAGONAL, the default,
5456 indicates that the diagonal is part of the data; NODIAGONAL indicates
5457 that it is omitted. DIAGONAL/NODIAGONAL have no effect when FULL is
5460 The SPLIT subcommand is used to specify @cmd{SPLIT FILE} variables for the
5461 input matrices (@pxref{SPLIT FILE}). Specify either a single variable
5462 not specified on VARIABLES, or one or more variables that are specified
5463 on VARIABLES. In the former case, the SPLIT values are not present in
5464 the data and ROWTYPE_ may not be specified on VARIABLES. In the latter
5465 case, the SPLIT values are present in the data.
5467 Specify a list of factor variables on FACTORS. Factor variables must
5468 also be listed on VARIABLES. Factor variables are used when there are
5469 some variables where, for each possible combination of their values,
5470 statistics on the matrix variables are included in the data.
5472 If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the
5473 CELLS subcommand is required. Specify the number of factor variable
5474 combinations that are given. For instance, if factor variable A has 2
5475 values and factor variable B has 3 values, specify 6.
5477 The N subcommand specifies a population number of observations. When N
5478 is specified, one N record is output for each @cmd{SPLIT FILE}.
5480 Use CONTENTS to specify what sort of information the matrices include.
5481 Each possible option is described in more detail below. When ROWTYPE_
5482 is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS
5483 is not specified then /CONTENTS=CORR is assumed.
5488 Number of observations as a vector, one value for each variable.
5490 Number of observations as a single value.
5496 Vector of standard deviations.
5500 Vector of mean squared errors.
5502 Vector of degrees of freedom.
5513 The exact semantics of the matrices read by @cmd{MATRIX DATA} are complex.
5514 Right now @cmd{MATRIX DATA} isn't too useful due to a lack of procedures
5515 accepting or producing related data, so these semantics aren't
5516 documented. Later, they'll be described here in detail.
5518 @node NEW FILE, PRINT, MATRIX DATA, Data Input and Output
5526 The @cmd{NEW FILE} command clears the current active file.
5528 @node PRINT, PRINT EJECT, NEW FILE, Data Input and Output
5537 /[line_no] arg@dots{}
5539 arg takes one of the following forms:
5540 'string' [start-end]
5541 var_list start-end [type_spec]
5542 var_list (fortran_spec)
5546 The @cmd{PRINT} transformation writes variable data to an output file.
5547 @cmd{PRINT} is executed when a procedure causes the data to be read.
5548 Follow @cmd{PRINT} by @cmd{EXECUTE} to print variable data without
5549 invoking a procedure (@pxref{EXECUTE}).
5551 All @cmd{PRINT} subcommands are optional.
5553 The OUTFILE subcommand specifies the file to receive the output. The
5554 file may be a file name as a string or a file handle (@pxref{FILE
5555 HANDLE}). If OUTFILE is not present then output will be sent to PSPP's
5556 output listing file.
5558 The RECORDS subcommand specifies the number of lines to be output. The
5559 number of lines may optionally be surrounded by parentheses.
5561 TABLE will cause the @cmd{PRINT} command to output a table to the listing file
5562 that describes what it will print to the output file. NOTABLE, the
5563 default, suppresses this output table.
5565 Introduce the strings and variables to be printed with a slash
5566 (@samp{/}). Optionally, the slash may be followed by a number
5567 indicating which output line will be specified. In the absence of this
5568 line number, the next line number will be specified. Multiple lines may
5569 be specified using multiple slashes with the intended output for a line
5570 following its respective slash.
5572 Literal strings may be printed. Specify the string itself. Optionally
5573 the string may be followed by a column number or range of column
5574 numbers, specifying the location on the line for the string to be
5575 printed. Otherwise, the string will be printed at the current position
5578 Variables to be printed can be specified in the same ways as available
5579 for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}). In addition, a
5581 list may be followed by an asterisk (@samp{*}), which indicates that the
5582 variables should be printed in their dictionary print formats, separated
5583 by spaces. A variable list followed by a slash or the end of command
5584 will be interpreted the same way.
5586 If a FORTRAN type specification is used to move backwards on the current
5587 line and text is then written at that point, the line is truncated at
5588 the end of that text, although any additional text written afterward
5589 will extend the line again.
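
A brief sketch of @cmd{PRINT} with two output lines per case follows.
The output file name is hypothetical, and the active file is assumed to
contain a numeric variable @code{ID} and a string variable @code{NAME}.
The first line holds a literal string in columns 1 through 3 followed
by @code{ID}; the second holds @code{NAME}. @cmd{EXECUTE} is included
so that the data is read and the lines are actually written.

@example
PRINT OUTFILE='report.txt' RECORDS=2
  /'ID:' 1-3 ID 5-9
  /NAME 1-30 (A).
EXECUTE.
@end example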
5591 @node PRINT EJECT, PRINT SPACE, PRINT, Data Input and Output
5592 @section PRINT EJECT
5600 /[line_no] arg@dots{}
5602 arg takes one of the following forms:
5603 'string' [start-end]
5604 var_list start-end [type_spec]
5605 var_list (fortran_spec)
5609 @cmd{PRINT EJECT} writes data to an output file. Before the data is
5610 written, the current page in the listing file is ejected.
5612 @xref{PRINT}, for more information on syntax and usage.
5614 @node PRINT SPACE, REREAD, PRINT EJECT, Data Input and Output
5615 @section PRINT SPACE
5619 PRINT SPACE OUTFILE='filename' n_lines.
5622 @cmd{PRINT SPACE} prints one or more blank lines to an output file.
5624 The OUTFILE subcommand is optional. It may be used to direct output to
5625 a file specified by file name as a string or file handle (@pxref{FILE
5626 HANDLE}). If OUTFILE is not specified then output will be directed to
5629 n_lines is also optional. If present, it is an expression
5630 (@pxref{Expressions}) specifying the number of blank lines to be
5631 printed. The expression must evaluate to a nonnegative value.
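
For example, the following should write two blank lines to a
hypothetical output file @file{report.txt} for each case:

@example
PRINT SPACE OUTFILE='report.txt' 2.
@end example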
5633 @node REREAD, REPEATING DATA, PRINT SPACE, Data Input and Output
5638 REREAD FILE=handle COLUMN=column.
5641 The @cmd{REREAD} transformation allows the previous input line in a
5643 already processed by @cmd{DATA LIST} or another input command to be re-read
5644 for further processing.
5646 The FILE subcommand, which is optional, is used to specify the file to
5647 have its line re-read. The file must be specified in the form of a file
5648 handle (@pxref{FILE HANDLE}). If FILE is not specified then the last
5649 file specified on @cmd{DATA LIST} will be assumed (last file specified
5650 lexically, not in terms of flow-of-control).
5652 By default, the line is re-read in its entirety. With the
5653 COLUMN subcommand, a prefix of the line can be exempted from
5654 re-reading. Specify an expression (@pxref{Expressions}) evaluating to
5655 the first column that should be included in the re-read line. Columns
5656 are numbered from 1 at the left margin.
5658 Issuing @cmd{REREAD} multiple times will not back up in the data
5659 file. Instead, it will re-read the same line multiple times.
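
A common use of @cmd{REREAD} is to read a record-type code first and
then re-read the same line with a layout that depends on the code. The
sketch below assumes inline data in which column 1 holds a
hypothetical type code, @samp{P} or @samp{I}; the variable names and
column positions are likewise illustrative.

@example
INPUT PROGRAM.
DATA LIST NOTABLE /KIND 1 (A).
DO IF (KIND = 'P').
REREAD.
DATA LIST NOTABLE /AGE 3-5.
ELSE.
REREAD.
DATA LIST NOTABLE /PRICE 3-8.
END IF.
END INPUT PROGRAM.
BEGIN DATA.
P  34
I  12.50
END DATA.
LIST.
@end example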
5661 @node REPEATING DATA, WRITE, REREAD, Data Input and Output
5662 @section REPEATING DATA
5663 @vindex REPEATING DATA
5671 /CONTINUED[=cont_start-cont_end]
5672 /ID=id_start-id_end=id_var
5674 /DATA=var_spec@dots{}
5676 where each var_spec takes one of the forms
5677 var_list start-end [type_spec]
5678 var_list (fortran_spec)
5681 @cmd{REPEATING DATA} parses groups of data repeating in
5682 a uniform format, possibly with several groups on a single line. Each
5683 group of data corresponds with one case. @cmd{REPEATING DATA} may only be
5684 used within an @cmd{INPUT PROGRAM} structure (@pxref{INPUT PROGRAM}).
5685 When used with @cmd{DATA LIST}, it
5686 can be used to parse groups of cases that share a subset of variables
5687 but differ in their other data.
5689 The STARTS subcommand is required. Specify a range of columns, using
5690 literal numbers or numeric variable names. This range specifies the
5691 columns on the first line that are used to contain groups of data. The
5692 ending column is optional. If it is not specified, then the record
5693 width of the input file is used. For the inline file (@pxref{BEGIN
5694 DATA}) this is 80 columns; for a file with fixed record widths it is the
5695 record width; for other files it is 1024 characters by default.
5697 The OCCURS subcommand is required. It must be a number or the name of a
5698 numeric variable. Its value is the number of groups present in the
5701 The DATA subcommand is required. It must be the last subcommand
5702 specified. It is used to specify the data present within each repeating
5703 group. Column numbers are specified relative to the beginning of a
5704 group at column 1. Data is specified in the same way as with @cmd{DATA LIST
5705 FIXED} (@pxref{DATA LIST FIXED}).
5707 All other subcommands are optional.
5709 FILE specifies the file to read, either a file name as a string or a
5710 file handle (@pxref{FILE HANDLE}). If FILE is not present then the
5711 default is the last file handle used on @cmd{DATA LIST} (lexically, not in
5712 terms of flow of control).
5714 By default @cmd{REPEATING DATA} will output a table describing how it will
5715 parse the input data. Specifying NOTABLE will disable this behavior;
5716 specifying TABLE will explicitly enable it.
5718 The LENGTH subcommand specifies the length in characters of each group.
5719 If it is not present then length is inferred from the DATA subcommand.
5720 LENGTH can be a number or a variable name.
5722 Normally all the data groups are expected to be present on a single
5723 line. Use the CONTINUED subcommand to indicate that data can be continued
5724 onto additional lines. If data on continuation lines starts at the left
5725 margin and continues through the entire field width, no column
5726 specifications are necessary on CONTINUED. Otherwise, specify the
5727 possible range of columns in the same way as on STARTS.
5729 When data groups are continued from line to line, it is easy
5730 for cases to get out of sync through careless hand editing. The
5731 ID subcommand allows a case identifier to be present on each line of
5732 repeating data groups. @cmd{REPEATING DATA} will check for the same
5733 identifier on each line and report mismatches. Specify the range of
5734 columns that the identifier will occupy, followed by an equals sign
5735 (@samp{=}) and the identifier variable name. The variable must already
5736 have been declared with @cmd{NUMERIC} or another command.
5738 @cmd{REPEATING DATA} should be the last command given within an
5739 @cmd{INPUT PROGRAM}. It should not be enclosed within a @cmd{LOOP}
5740 structure (@pxref{LOOP}). Use @cmd{DATA LIST} before, not after,
5741 @cmd{REPEATING DATA}.
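
The following sketch shows one way these pieces might fit together. It
assumes a hypothetical file @file{orders.dat} in which each line holds
a customer number, a count of items, and then that many 8-character
groups starting in column 10. Columns on DATA are relative to the start
of each group.

@example
INPUT PROGRAM.
DATA LIST NOTABLE FILE='orders.dat' /CUSTOMER 1-5 NITEMS 7-8.
REPEATING DATA /STARTS=10 /OCCURS=NITEMS /LENGTH=8 /NOTABLE
  /DATA=ITEM 1-4 QTY 6-7.
END INPUT PROGRAM.
@end example

Each group should produce one case, with @code{CUSTOMER} and
@code{NITEMS} shared by all the cases read from the same line.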
5743 @node WRITE, , REPEATING DATA, Data Input and Output
5752 /[line_no] arg@dots{}
5754 arg takes one of the following forms:
5755 'string' [start-end]
5756 var_list start-end [type_spec]
5757 var_list (fortran_spec)
5761 @cmd{WRITE} writes text or binary data to an output file.
5763 @xref{PRINT}, for more information on syntax and usage. The main
5764 difference between @cmd{PRINT} and @cmd{WRITE} is that @cmd{WRITE}
5765 uses write formats by default, whereas @cmd{PRINT} uses print formats.
5767 The sole additional difference is that if @cmd{WRITE} is used to send output
5768 to a binary file, carriage control characters will not be output.
5769 @xref{FILE HANDLE}, for information on how to declare a file as binary.
5771 @node System and Portable Files, Variable Attributes, Data Input and Output, Top
5772 @chapter System Files and Portable Files
5774 The commands in this chapter read, write, and examine system files and
5778 * APPLY DICTIONARY:: Apply system file dictionary to active file.
5779 * EXPORT:: Write to a portable file.
5780 * GET:: Read from a system file.
5781 * IMPORT:: Read from a portable file.
5782 * MATCH FILES:: Merge system files.
5783 * SAVE:: Write to a system file.
5784 * SYSFILE INFO:: Display system file dictionary.
5785 * XSAVE:: Write to a system file, as a transform.
5788 @node APPLY DICTIONARY, EXPORT, System and Portable Files, System and Portable Files
5789 @section APPLY DICTIONARY
5790 @vindex APPLY DICTIONARY
5793 APPLY DICTIONARY FROM='filename'.
5796 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
5797 and missing values from variables in a system file to corresponding
5798 variables in the active file. In some cases it also updates the
5801 Specify a system file with a file name string or as a file handle
5802 (@pxref{FILE HANDLE}). The dictionary in the system file will be read,
5803 but it will not replace the active file dictionary. The system file's
5804 data will not be read.
5806 Only variables with names that exist in both the active file and the
5807 system file are considered. Variables with the same name but different
5808 types (numeric, string) will cause an error message. Otherwise, the
5809 system file variables' attributes will replace those in their matching
5810 active file variables, as described below.
5812 If a system file variable has a variable label, then it will replace the
5813 active file variable's variable label. If the system file variable does
5814 not have a variable label, then the active file variable's variable
5815 label, if any, will be retained.
5817 If the active file variable is numeric or short string, then value
5818 labels and missing values, if any, will be copied to the active file
5819 variable. If the system file variable does not have value labels or
5820 missing values, then those in the active file variable, if any, will not
5823 Finally, weighting of the active file is updated (@pxref{WEIGHT}). If
5824 the active file has a weighting variable, and the system file does not,
5825 or if the weighting variable in the system file does not exist in the
5826 active file, then the active file weighting variable, if any, is
5827 retained. Otherwise, the weighting variable in the system file becomes
5828 the active file weighting variable.
5830 @cmd{APPLY DICTIONARY} takes effect immediately. It does not read the
5832 file. The system file is not modified.
5834 @node EXPORT, GET, APPLY DICTIONARY, System and Portable Files
5843 /RENAME=(src_names=target_names)@dots{}
5846 The @cmd{EXPORT} procedure writes the active file dictionary and data to a
5847 specified portable file.
5849 The OUTFILE subcommand, which is the only required subcommand, specifies
5850 the portable file to be written as a file name string or a file handle
5851 (@pxref{FILE HANDLE}).
5853 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
5856 @cmd{EXPORT} is a procedure. It causes the active file to be read.
5858 @node GET, IMPORT, EXPORT, System and Portable Files
5867 /RENAME=(src_names=target_names)@dots{}
5870 @cmd{GET} clears the current dictionary and active file and
5871 replaces them with the dictionary and data from a specified system file.
5873 The FILE subcommand is the only required subcommand. Specify the system
5874 file to be read as a string file name or a file handle (@pxref{FILE
5877 By default, all the variables in a system file are read. The DROP
5878 subcommand can be used to specify a list of variables that are not to be
5879 read. By contrast, the KEEP subcommand can be used to specify variables
5880 that are to be read, with all other variables not read.
5882 Normally variables in a system file retain the names that they were
5883 saved under. Use the RENAME subcommand to change these names. Specify,
5884 within parentheses, a list of variable names followed by an equals sign
5885 (@samp{=}) and the names that they should be renamed to. Multiple
5886 parenthesized groups of variable names can be included on a single
5887 RENAME subcommand. Variables' names may be swapped using a RENAME
5888 subcommand of the form @samp{/RENAME=(A B=B A)}.
5890 Alternate syntax for the RENAME subcommand allows the parentheses to be
5891 eliminated. When this is done, only a single variable may be renamed at
5892 once. For instance, @samp{/RENAME=A=B}. This alternate syntax is
5895 DROP, KEEP, and RENAME are performed in left-to-right order. They
5896 each may be present any number of times. @cmd{GET} never modifies a
5897 system file on disk. Only the active file read from the system file
5898 is affected by these subcommands.
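
As a sketch of left-to-right processing, with a hypothetical system
file and variable names, the command below first drops @code{Q41}
through @code{Q50}, then renames @code{ID} and @code{SURNAME}, and
finally keeps three variables. On the assumption that each subcommand
sees the result of the ones before it, the KEEP subcommand refers to
the renamed variables by their new names.

@example
GET /FILE='survey.sav'
    /DROP=Q41 TO Q50
    /RENAME=(ID SURNAME=RESPID LASTNAME)
    /KEEP=RESPID NAME LASTNAME.
@end example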
5900 @cmd{GET} does not cause the data to be read, only the dictionary. The data
5901 is read later, when a procedure is executed.
5903 @node IMPORT, MATCH FILES, GET, System and Portable Files
5913 /RENAME=(src_names=target_names)@dots{}
5916 The @cmd{IMPORT} transformation clears the active file dictionary and
5918 replaces them with a dictionary and data from a portable file on disk.
5920 The FILE subcommand, which is the only required subcommand, specifies
5921 the portable file to be read as a file name string or a file handle
5922 (@pxref{FILE HANDLE}).
5924 The TYPE subcommand is currently not used.
5926 DROP, KEEP, and RENAME follow the syntax used by @cmd{GET} (@pxref{GET}).
5928 @cmd{IMPORT} does not cause the data to be read, only the dictionary. The
5929 data is read later, when a procedure is executed.
5931 @node MATCH FILES, SAVE, IMPORT, System and Portable Files
5932 @section MATCH FILES
5938 /@{FILE,TABLE@}=@{*,'filename'@}
5941 /RENAME=(src_names=target_names)@dots{}
5948 @cmd{MATCH FILES} merges one or more system files, optionally
5949 including the active file. Records with the same values for BY
5950 variables are combined into a single record. Records with different
5951 values are output in order. Thus, multiple sorted system files are
5952 combined into a single sorted system file based on the value of the BY
5953 variables. The results of the merge become the new active file.
5955 The BY subcommand specifies a list of variables that are used to match
5956 records from each of the system files. Variables specified must exist
5957 in all the files specified on FILE and TABLE. BY should usually be
5958 specified. If TABLE is used then BY is required.
5960 Specify FILE with a system file as a file name string or file handle
5961 (@pxref{FILE HANDLE}), or with an asterisk (@samp{*}) to
5962 indicate the current active file. The files specified on FILE are
5963 merged together based on the BY variables, or combined case-by-case if
5964 BY is not specified. Normally at least two FILE subcommands should be
5967 Specify TABLE with a system file to use it as a @dfn{table
5968 lookup file}. Records in table lookup files are not used up after
5969 they've been used once. This means that data in table lookup files can
5970 correspond to any number of records in FILE files. Table lookup files
5971 correspond to lookup tables in traditional relational database systems.
5972 It is incorrect to have records with duplicate BY values in table lookup
5975 Any number of FILE and TABLE subcommands may be specified. Each
5976 instance of FILE or TABLE can be followed by DROP, KEEP, and/or RENAME
5977 subcommands. These take the same form as the corresponding subcommands
5978 of @cmd{GET} (@pxref{GET}), and perform the same functions.
5980 Variables belonging to files that are not present for the current case
5981 are set to the system-missing value for numeric variables or spaces for
5984 IN, FIRST, LAST, and MAP are currently not used.
5986 @cmd{MATCH FILES} may not be specified following @cmd{TEMPORARY}
5987 (@pxref{TEMPORARY}) if the active file is used as an input source.
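
The two common uses can be sketched as follows. The system files
@file{scores.sav} and @file{regions.sav} and the BY variables ID and
REGION are hypothetical, and each input is assumed to be sorted on its
BY variable before the merge. The first command combines records from
the active file and @file{scores.sav} that share an ID. The second
treats @file{regions.sav} as a table lookup file, so each of its records
may be matched against any number of active file records with the same
REGION.

@example
SORT CASES BY ID.
MATCH FILES /FILE=* /FILE='scores.sav' /BY ID.

SORT CASES BY REGION.
MATCH FILES /FILE=* /TABLE='regions.sav' /BY REGION.
@end example
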
5989 @node SAVE, SYSFILE INFO, MATCH FILES, System and Portable Files
5996 /@{COMPRESSED,UNCOMPRESSED@}
5999 /RENAME=(src_names=target_names)@dots{}
6002 The @cmd{SAVE} procedure causes the dictionary and data in the active
6004 be written to a system file.
6006 FILE is the only required subcommand. Specify the system
6007 file to be written as a string file name or a file handle (@pxref{FILE
6010 The COMPRESSED and UNCOMPRESSED subcommands determine whether the saved
6011 system file is compressed. By default, system files are compressed.
6012 This default can be changed with the SET command (@pxref{SET}).
6014 By default, all the variables in the active file dictionary are written
6015 to the system file. The DROP subcommand can be used to specify a list
6016 of variables not to be written. In contrast, KEEP specifies the variables
6017 to be written; all other variables are omitted.
6019 Normally variables are saved to a system file under the same names they
6020 have in the active file. Use the RENAME subcommand to change these names.
6021 Specify, within parentheses, a list of variable names followed by an
6022 equals sign (@samp{=}) and the names that they should be renamed to.
6023 Multiple parenthesized groups of variable names can be included on a
6024 single RENAME subcommand. Variables' names may be swapped using a
6025 RENAME subcommand of the form @samp{/RENAME=(A B=B A)}.
6027 Alternate syntax for the RENAME subcommand allows the parentheses to be
6028 eliminated. When this is done, only a single variable may be renamed at
6029 once. For instance, @samp{/RENAME=A=B}. This alternate syntax is
6032 DROP, KEEP, and RENAME are performed in left-to-right order. They
6033 each may be present any number of times. @cmd{SAVE} never modifies
6034 the active file. DROP, KEEP, and RENAME only affect the system file
6037 @cmd{SAVE} causes the data to be read. It is a procedure.
6039 @node SYSFILE INFO, XSAVE, SAVE, System and Portable Files
6040 @section SYSFILE INFO
6041 @vindex SYSFILE INFO
6044 SYSFILE INFO FILE='filename'.
6047 @cmd{SYSFILE INFO} reads the dictionary in a system file and
6048 displays the information that it contains.
6050 Specify a file name or file handle. @cmd{SYSFILE INFO} reads that file as
6051 a system file and displays information on its dictionary.
6053 @cmd{SYSFILE INFO} does not affect the current active file.
6055 @node XSAVE, , SYSFILE INFO, System and Portable Files
6062 /@{COMPRESSED,UNCOMPRESSED@}
6065 /RENAME=(src_names=target_names)@dots{}
6068 The @cmd{XSAVE} transformation writes the active file dictionary and
6070 system file stored on disk.
6072 @cmd{XSAVE} is a transformation, not a procedure. It is executed when the
6073 data is read by a procedure or procedure-like command. In all other
6074 respects, @cmd{XSAVE} is identical to @cmd{SAVE}. @xref{SAVE}, for
6076 on syntax and usage.
6078 @node Variable Attributes, Data Manipulation, System and Portable Files, Top
6079 @chapter Manipulating variables
6081 The variables in the active file dictionary are important. There are
6082 several utility functions for examining and adjusting them.
6085 * ADD VALUE LABELS:: Add value labels to variables.
6086 * DISPLAY:: Display variable names & descriptions.
6087 * DISPLAY VECTORS:: Display a list of vectors.
6088 * FORMATS:: Set print and write formats.
6089 * LEAVE:: Don't clear variables between cases.
6090 * MISSING VALUES:: Set missing values for variables.
6091 * MODIFY VARS:: Rename, reorder, and drop variables.
6092 * NUMERIC:: Create new numeric variables.
6093 * PRINT FORMATS:: Set variable print formats.
6094 * RENAME VARIABLES:: Rename variables.
6095 * VALUE LABELS:: Set value labels for variables.
6096 * STRING:: Create new string variables.
6097 * VARIABLE LABELS:: Set variable labels for variables.
6098 * VECTOR:: Declare an array of variables.
6099 * WRITE FORMATS:: Set variable write formats.
6102 @node ADD VALUE LABELS, DISPLAY, Variable Attributes, Variable Attributes
6103 @section ADD VALUE LABELS
6104 @vindex ADD VALUE LABELS
6108 /var_list value 'label' [value 'label']@dots{}
6111 @cmd{ADD VALUE LABELS} has the same syntax and purpose as @cmd{VALUE
6112 LABELS} (@pxref{VALUE LABELS}), but it does not clear value
6113 labels from the variables before adding the ones specified.
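
For illustration, assume a numeric variable SEX (a hypothetical name).
After the two commands below, SEX carries all three labels. Had the
second command been @cmd{VALUE LABELS} instead, only the label for 9
would remain.

@example
VALUE LABELS /SEX 1 'Male' 2 'Female'.
ADD VALUE LABELS /SEX 9 'Not stated'.
@end example
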
6115 @node DISPLAY, DISPLAY VECTORS, ADD VALUE LABELS, Variable Attributes
6120 DISPLAY @{NAMES,INDEX,LABELS,VARIABLES,DICTIONARY,SCRATCH@}
6124 @cmd{DISPLAY} displays requested information on variables. Variables can
6125 optionally be sorted alphabetically. The entire dictionary or just
6126 specified variables can be described.
6128 One of the following keywords can be present:
6132 The variables' names are displayed.
6135 The variables' names are displayed along with a value describing their
6136 position within the active file dictionary.
6139 Variable names, positions, and variable labels are displayed.
6142 Variable names, positions, print and write formats, and missing values
6146 Variable names, positions, print and write formats, missing values,
6147 variable labels, and value labels are displayed.
6150 Variable names are displayed, for scratch variables only (@pxref{Scratch
6154 If SORTED is specified, then the variables are displayed in ascending
6155 order based on their names; otherwise, they are displayed in the order
6156 that they occur in the active file dictionary.
6158 @node DISPLAY VECTORS, FORMATS, DISPLAY, Variable Attributes
6159 @section DISPLAY VECTORS
6160 @vindex DISPLAY VECTORS
6166 @cmd{DISPLAY VECTORS} lists all the currently declared vectors.
6168 @node FORMATS, LEAVE, DISPLAY VECTORS, Variable Attributes
6173 FORMATS var_list (fmt_spec).
6176 @cmd{FORMATS} sets both print and write formats for the specified
6177 variables to the specified format specification. @xref{Input/Output
6180 Specify a list of variables followed by a format specification in
6181 parentheses. The print and write formats of the specified variables
6184 Additional lists of variables and formats may be included if they are
6185 delimited by a slash (@samp{/}).
6187 @cmd{FORMATS} takes effect immediately. It is not affected by
6188 conditional and looping structures such as @cmd{DO IF} or @cmd{LOOP}.
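
A short sketch, using hypothetical numeric variables PRICE, COST, and
QTY: the command below gives PRICE and COST the format F8.2 and, after
the slash, gives QTY the format F4.0, in each case for both printing and
writing.

@example
FORMATS PRICE COST (F8.2) /QTY (F4.0).
@end example
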
6190 @node LEAVE, MISSING VALUES, FORMATS, Variable Attributes
6198 @cmd{LEAVE} prevents the specified variables from being
6199 reinitialized whenever a new case is processed.
6201 Normally, when a data file is processed, every variable in the active
6202 file is initialized to the system-missing value or spaces at the
6203 beginning of processing for each case. When a variable has been
6204 specified on @cmd{LEAVE}, this is not the case. Instead, that variable is
6205 initialized to 0 (not system-missing) or spaces for the first case.
6206 After that, it retains its value between cases.
6208 This becomes useful for counters. For instance, in the example below
6209 the variable SUM maintains a running total of the values in the ITEM
6213 DATA LIST /ITEM 1-3.
6214 COMPUTE SUM=SUM+ITEM.
6225 @noindent Partial output from this example:
6234 It is best to use the @cmd{LEAVE} command immediately before invoking a
6235 procedure command, because the left status of variables is reset by
6236 certain transformations---for instance, @cmd{COMPUTE} and @cmd{IF}.
6237 Left status is also reset by all procedure invocations.
6239 @node MISSING VALUES, MODIFY VARS, LEAVE, Variable Attributes
6240 @section MISSING VALUES
6241 @vindex MISSING VALUES
6244 MISSING VALUES var_list (missing_values).
6246 missing_values takes one of the following forms:
6251 num1 THRU num2, num3
6254 string1, string2, string3
6255 As part of a range, LO or LOWEST may take the place of num1;
6256 HI or HIGHEST may take the place of num2.
6259 @cmd{MISSING VALUES} sets user-missing values for numeric and
6260 short string variables. Long string variables may not have missing
6263 Specify a list of variables, followed by a list of their user-missing
6264 values in parentheses. Up to three discrete values may be given, or,
6265 for numeric variables only, a range of values optionally accompanied by
6266 a single discrete value. Ranges may be open-ended on one end, indicated
6267 through the use of the keyword LO or LOWEST or HI or HIGHEST.
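
The forms described above might be used as follows. The variable names
are hypothetical: Q1 through Q3 and AGE are assumed to be numeric, and
SEX is assumed to be a short string variable. The first command declares
three discrete values, the second an open-ended range plus one discrete
value, and the third a single string value.

@example
MISSING VALUES Q1 Q2 Q3 (7, 8, 9).
MISSING VALUES AGE (LO THRU 0, 999).
MISSING VALUES SEX ('X').
@end example
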
6269 The @cmd{MISSING VALUES} command takes effect immediately. It is not
6270 affected by conditional and looping constructs such as @cmd{DO IF} or
6273 @node MODIFY VARS, NUMERIC, MISSING VALUES, Variable Attributes
6274 @section MODIFY VARS
6279 /REORDER=@{FORWARD,BACKWARD@} @{POSITIONAL,ALPHA@} (var_list)@dots{}
6280 /RENAME=(old_names=new_names)@dots{}
6281 /@{DROP,KEEP@}=var_list
6285 @cmd{MODIFY VARS} reorders, renames, and deletes variables in the
6288 At least one subcommand must be specified, and no subcommand may be
6289 specified more than once. DROP and KEEP may not both be specified.
6291 The REORDER subcommand changes the order of variables in the active
6292 file. Specify one or more lists of variable names in parentheses. By
6293 default, each list of variables is rearranged into the specified order.
6294 To put the variables into the reverse of the specified order, put
6295 keyword BACKWARD before the parentheses. To put them into alphabetical
6296 order in the dictionary, specify keyword ALPHA before the parentheses.
6297 BACKWARD and ALPHA may also be combined.
6299 To rename variables in the active file, specify RENAME, an equals sign
6300 (@samp{=}), and lists of the old variable names and new variable names
6301 separated by another equals sign within parentheses. There must be the
6302 same number of old and new variable names. Each old variable is renamed to
6303 the corresponding new variable name. Multiple parenthesized groups of
6304 variables may be specified.
6306 The DROP subcommand deletes a specified list of variables from the
6309 The KEEP subcommand keeps the specified list of variables in the active
6310 file. Any unlisted variables are deleted from the active file.
6312 MAP is currently ignored.
6314 If either DROP or KEEP is specified, the data is read; otherwise it is
6317 @cmd{MODIFY VARS} may not be specified following @cmd{TEMPORARY}
6318 (@pxref{TEMPORARY}).
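
The sketch below combines three subcommands, each used once. All
variable names are hypothetical. Q3, Q1, and Q2 are rearranged into
alphabetical order, OLDID is renamed to ID, and TEMP1 and TEMP2 are
deleted. Because DROP is present, the data is read.

@example
MODIFY VARS
    /REORDER=ALPHA (Q3 Q1 Q2)
    /RENAME=(OLDID=ID)
    /DROP=TEMP1 TEMP2.
@end example
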
6320 @node NUMERIC, PRINT FORMATS, MODIFY VARS, Variable Attributes
6325 NUMERIC /var_list [(fmt_spec)].
6328 @cmd{NUMERIC} explicitly declares new numeric variables, optionally
6329 setting their output formats.
6331 Specify a slash (@samp{/}), followed by the names of the new numeric
6332 variables. If you wish to set their output formats, follow their names
6333 by an output format specification in parentheses (@pxref{Input/Output
6334 Formats}); otherwise, the default is F8.2.
6336 Variables created with @cmd{NUMERIC} are initialized to the
6337 system-missing value.
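
For example, with hypothetical variable names, the first command below
creates TOTAL and AVERAGE with format F6.1, and the second creates FLAG
with the default format F8.2.

@example
NUMERIC /TOTAL AVERAGE (F6.1).
NUMERIC /FLAG.
@end example
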
6339 @node PRINT FORMATS, RENAME VARIABLES, NUMERIC, Variable Attributes
6340 @section PRINT FORMATS
6341 @vindex PRINT FORMATS
6344 PRINT FORMATS var_list (fmt_spec).
6347 @cmd{PRINT FORMATS} sets the print formats for the specified
6348 variables to the specified format specification.
6350 Its syntax is identical to that of @cmd{FORMATS} (@pxref{FORMATS}),
6351 but @cmd{PRINT FORMATS} sets only print formats, not write formats.
6353 @node RENAME VARIABLES, VALUE LABELS, PRINT FORMATS, Variable Attributes
6354 @section RENAME VARIABLES
6355 @vindex RENAME VARIABLES
6358 RENAME VARIABLES (old_names=new_names)@dots{} .
6361 @cmd{RENAME VARIABLES} changes the names of variables in the active
6362 file. Specify lists of the old variable names and new
6363 variable names, separated by an equals sign (@samp{=}), within
6364 parentheses. There must be the same number of old and new variable
6365 names. Each old variable is renamed to the corresponding new variable
6366 name. Multiple parenthesized groups of variables may be specified.
6368 @cmd{RENAME VARIABLES} takes effect immediately. It does not cause the data
6371 @cmd{RENAME VARIABLES} may not be specified following @cmd{TEMPORARY}
6372 (@pxref{TEMPORARY}).
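
A brief sketch with hypothetical names: the first group renames V1 to
AGE and V2 to INCOME, pairing old and new names by position, and the
second group renames OLDID to ID.

@example
RENAME VARIABLES (V1 V2=AGE INCOME) (OLDID=ID).
@end example
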
6374 @node VALUE LABELS, STRING, RENAME VARIABLES, Variable Attributes
6375 @section VALUE LABELS
6376 @vindex VALUE LABELS
6380 /var_list value 'label' [value 'label']@dots{}
6383 @cmd{VALUE LABELS} allows values of numeric and short string
6384 variables to be associated with labels. In this way, a short value can
6385 stand for a long value.
6387 To set up value labels for a set of variables, specify the
6388 variable names after a slash (@samp{/}), followed by a list of values
6389 and their associated labels, separated by spaces. Long string
6390 variables may not be specified.
6392 Before @cmd{VALUE LABELS} is executed, any existing value labels
6393 are cleared from the variables specified. Use @cmd{ADD VALUE LABELS}
6394 (@pxref{ADD VALUE LABELS}) to add value labels without clearing those
6397 @node STRING, VARIABLE LABELS, VALUE LABELS, Variable Attributes
6402 STRING /var_list (fmt_spec).
6405 @cmd{STRING} creates new string variables for use in
6408 Specify a slash (@samp{/}), followed by the names of the string
6409 variables to create and the desired output format specification in
6410 parentheses (@pxref{Input/Output Formats}). Variable widths are
6411 implicitly derived from the specified output formats.
6413 Created variables are initialized to spaces.
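
For instance, using hypothetical names, the commands below create a
20-character string variable NAME and two 2-character string variables
CODE1 and CODE2. The widths follow from the A20 and A2 formats.

@example
STRING /NAME (A20).
STRING /CODE1 CODE2 (A2).
@end example
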
6415 @node VARIABLE LABELS, VECTOR, STRING, Variable Attributes
6416 @section VARIABLE LABELS
6417 @vindex VARIABLE LABELS
6421 /var_list 'var_label'.
6424 @cmd{VARIABLE LABELS} associates explanatory names
6425 with variables. This name, called a @dfn{variable label}, is displayed by
6426 statistical procedures.
6428 To assign a variable label to a group of variables, specify a slash
6429 (@samp{/}), followed by the list of variable names and the variable
6432 @node VECTOR, WRITE FORMATS, VARIABLE LABELS, Variable Attributes
6437 Two possible syntaxes:
6438 VECTOR vec_name=var_list.
6439 VECTOR vec_name_list(count).
6442 @cmd{VECTOR} allows a group of variables to be accessed as if they
6443 were consecutive members of an array with a vector(index) notation.
6445 To make a vector out of a set of existing variables, specify a name for
6446 the vector followed by an equals sign (@samp{=}) and the variables that
6447 belong in the vector.
6449 To make a vector and create variables at the same time, specify one or
6450 more vector names followed by a count in parentheses. This will cause
6451 variables named @code{@var{vec}1} through @code{@var{vec}@var{count}}
6452 to be created as numeric variables with print and write format F8.2.
6453 Variable names including numeric suffixes may not exceed 8 characters
6454 in length, and none of the variables may exist prior to @cmd{VECTOR}.
6456 All the variables in a vector must be the same type.
6458 Vectors created with @cmd{VECTOR} disappear after any procedure or
6459 procedure-like command is executed. The variables contained in the
6460 vectors remain, unless they are scratch variables (@pxref{Scratch
6463 Variables within a vector may be referenced in expressions using
6464 @code{vector(index)} syntax.
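
Both syntaxes are sketched below under the assumption that numeric
variables Q1 through Q5 already exist, adjacent in the dictionary, and
that no variables named SCORE1 through SCORE3 exist yet. The first
command groups the existing variables into vector Q. The second creates
SCORE1, SCORE2, and SCORE3 and groups them into vector SCORE. The third
assigns the sum of the first two elements of Q to SCORE1.

@example
VECTOR Q=Q1 TO Q5.
VECTOR SCORE(3).
COMPUTE SCORE(1)=Q(1)+Q(2).
@end example
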
6466 @node WRITE FORMATS, , VECTOR, Variable Attributes
6467 @section WRITE FORMATS
6468 @vindex WRITE FORMATS
6471 WRITE FORMATS var_list (fmt_spec).
6474 @cmd{WRITE FORMATS} sets the write formats for the specified variables
6475 to the specified format specification. Its syntax is identical to
6476 that of @cmd{FORMATS} (@pxref{FORMATS}), but @cmd{WRITE FORMATS} sets only
6477 write formats, not print formats.
6479 @node Data Manipulation, Data Selection, Variable Attributes, Top
6480 @chapter Data transformations
6481 @cindex transformations
6483 The PSPP procedures examined in this chapter manipulate data and
6484 prepare the active file for later analyses. They do not produce output,
6488 * AGGREGATE:: Summarize multiple cases into a single case.
6489 * AUTORECODE:: Automatic recoding of variables.
6490 * COMPUTE:: Assigning a variable a calculated value.
6491 * COUNT:: Counting variables with particular values.
6492 * FLIP:: Exchange variables with cases.
6493 * IF:: Conditionally assigning a calculated value.
6494 * RECODE:: Mapping values from one set to another.
6495 * SORT CASES:: Sort the active file.
6498 @node AGGREGATE, AUTORECODE, Data Manipulation, Data Manipulation
6506 /OUTFILE=@{*,'filename'@}
6509 /dest_vars=agr_func(src_vars, args@dots{})@dots{}
6512 @cmd{AGGREGATE} summarizes groups of cases into single cases.
6513 Cases are divided into groups that have the same values for one or more
6514 variables called @dfn{break variables}. Several functions are available
6515 for summarizing case contents.
6517 At least one break variable must be specified on BREAK, the only
6518 required subcommand. The values of these variables are used to divide
6519 the active file into groups to be summarized. In addition, at least
6520 one @var{dest_var} must be specified.
6522 By default, the active file is sorted based on the break variables
6523 before aggregation takes place. If the active file is already sorted
6524 or otherwise grouped in terms of the break variables, specify
6525 PRESORTED to save time.
6527 The OUTFILE subcommand specifies a system file by file name string or
6528 file handle (@pxref{FILE HANDLE}). The aggregated cases are written to
6529 this file. If OUTFILE is not specified, or if @samp{*} is specified,
6530 then the aggregated cases replace the active file.
6532 Specify DOCUMENT to copy the documents from the active file into the
6533 aggregate file (@pxref{DOCUMENT}). Otherwise, the aggregate file will
6534 not contain any documents, even if the aggregate file replaces the
6537 One or more sets of aggregation variables must be specified. Each set
6538 comprises a list of aggregation variables, an equals sign (@samp{=}),
6539 the name of an aggregation function (see the list below), and a list
6540 of source variables in parentheses. Some aggregation functions expect
6541 additional arguments following the source variable names.
6543 Each set must have exactly as many source variables as aggregation
6544 variables. Each aggregation variable receives the results of applying
6545 the specified aggregation function to the corresponding source
6546 variable. Most aggregation functions may be applied to numeric and
6547 short and long string variables. Others, marked below, are restricted
6550 The available aggregation functions are as follows:
6554 Sum. Limited to numeric values.
6555 @item MEAN(var_name)
6556 Arithmetic mean. Limited to numeric values.
6558 Standard deviation of the mean. Limited to numeric values.
6563 @item FGT(var_name, value)
6564 @itemx PGT(var_name, value)
6565 Fraction between 0 and 1, or percentage between 0 and 100, respectively,
6566 of values greater than the specified constant.
6567 @item FLT(var_name, value)
6568 @itemx PLT(var_name, value)
6569 Fraction or percentage, respectively, of values less than the specified
6571 @item FIN(var_name, low, high)
6572 @itemx PIN(var_name, low, high)
6573 Fraction or percentage, respectively, of values within the specified
6574 inclusive range of constants.
6575 @item FOUT(var_name, low, high)
6576 @itemx POUT(var_name, low, high)
6577 Fraction or percentage, respectively, of values strictly outside the
6578 specified range of constants.
6580 Number of non-missing values.
6582 Number of cases aggregated to form this group. Don't supply a source
6583 variable for this aggregation function.
6585 Number of non-missing values. Each case is considered to have a weight
6586 of 1, regardless of the current weighting variable (@pxref{WEIGHT}).
6588 Number of cases aggregated to form this group. Each case is considered
6589 to have a weight of 1, regardless of the current weighting variable.
6590 @item NMISS(var_name)
6591 Number of missing values.
6592 @item NUMISS(var_name)
6593 Number of missing values. Each case is considered to have a weight of
6594 1, regardless of the current weighting variable.
6595 @item FIRST(var_name)
6596 First value in this group.
6597 @item LAST(var_name)
6598 Last value in this group.
6601 Aggregation functions compare string values in terms of internal
6602 character codes. On most modern computers, this is a form of ASCII.
6604 The aggregation functions listed above exclude all user-missing values
6605 from calculations. To include user-missing values, insert a period
6606 (@samp{.}) between the function name and left parenthesis
6609 Normally, only a single case (for SD and SD., two cases) need be
6610 non-missing in each group for the aggregate variable to be
6611 non-missing. Specifying /MISSING=COLUMNWISE inverts this behavior, so
6612 that the aggregate variable becomes missing if any aggregated value is
6615 @cmd{AGGREGATE} both ignores and cancels the current @cmd{SPLIT FILE}
6616 settings (@pxref{SPLIT FILE}).
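
As a rough sketch, suppose the active file holds a break variable REGION
and a numeric variable INCOME (both names, and the constant 50000, are
hypothetical). The command below replaces the active file with one case
per region. Each case holds the mean income, the percentage of incomes
above 50000, and the number of cases in the group.

@example
AGGREGATE /OUTFILE=*
    /BREAK=REGION
    /AVGINC=MEAN(INCOME)
    /HIGHPCT=PGT(INCOME, 50000)
    /NCASES=N.
@end example
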
6618 @node AUTORECODE, COMPUTE, AGGREGATE, Data Manipulation
6623 AUTORECODE VARIABLES=src_vars INTO dest_vars
6628 The @cmd{AUTORECODE} procedure considers the @var{n} values that a variable
6629 takes on and maps them onto values 1@dots{}@var{n} on a new numeric
6632 Subcommand VARIABLES is the only required subcommand and must come
6633 first. Specify VARIABLES, an equals sign (@samp{=}), a list of source
6634 variables, INTO, and a list of target variables. There must be the same
6635 number of source and target variables. The target variables must not
6638 By default, increasing values of a source variable (for a string, this
6639 is based on character code comparisons) are recoded to increasing values
6640 of its target variable. To cause increasing values of a source variable
6641 to be recoded to decreasing values of its target variable (@var{n} down
6642 to 1), specify DESCENDING.
6644 PRINT is currently ignored.
6646 @cmd{AUTORECODE} is a procedure. It causes the data to be read.
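
For illustration, assume source variables CITY and GRADE exist and that
CITYNUM and GRADENUM do not (all four names are hypothetical). The first
command numbers the distinct values of CITY from 1 upward in increasing
order. The second numbers the values of GRADE in the opposite direction,
so that the largest value of GRADE is recoded to 1.

@example
AUTORECODE VARIABLES=CITY INTO CITYNUM.
AUTORECODE VARIABLES=GRADE INTO GRADENUM /DESCENDING.
@end example
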
6648 @node COMPUTE, COUNT, AUTORECODE, Data Manipulation
6653 COMPUTE variable = expression.
6655 COMPUTE vector(index) = expression.
6658 @cmd{COMPUTE} assigns the value of an expression to a target
6659 variable. For each case, the expression is evaluated and its value
6660 assigned to the target variable. Numeric and short and long string
6661 variables may be assigned. When a string expression's width differs
6662 from the target variable's width, the string result of the expression
6663 is truncated or padded with spaces on the right as necessary. The
6664 expression and variable types must match.
6666 For numeric variables only, the target variable need not already
6667 exist. Numeric variables created by @cmd{COMPUTE} are assigned an
6668 @code{F8.2} output format. String variables must be declared before
6669 they can be used as targets for @cmd{COMPUTE}.
6671 The target variable may be specified as an element of a vector
6672 (@pxref{VECTOR}). In this case, a vector index expression must be
6673 specified in parentheses following the vector name. The index
6674 expression must evaluate to a numeric value that, after rounding down
6675 to the nearest integer, is a valid index for the named vector.
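
The cases described so far can be sketched as follows, assuming numeric
variables Q1, Q2, and Q3 exist and that TOTAL, GRADE, and X1 through X3
do not (all of these names are hypothetical). TOTAL is created on the
fly because it is numeric. GRADE must be declared with @cmd{STRING}
first. The last command assigns to the second element of vector X, that
is, to X2.

@example
COMPUTE TOTAL=Q1+Q2+Q3.
STRING /GRADE (A4).
COMPUTE GRADE='PASS'.
VECTOR X(3).
COMPUTE X(2)=TOTAL/3.
@end example
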
6677 Using @cmd{COMPUTE} to assign to a variable specified on @cmd{LEAVE}
6678 (@pxref{LEAVE}) resets the variable's left state. Therefore,
6679 @cmd{LEAVE} should be specified following @cmd{COMPUTE}, not before.
6681 @cmd{COMPUTE} is a transformation. It does not cause the active file to be
6684 When @cmd{COMPUTE} is specified following @cmd{TEMPORARY}
6685 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
6688 @node COUNT, FLIP, COMPUTE, Data Manipulation
6693 COUNT var_name = var@dots{} (value@dots{}).
6695 Each value takes one of the following forms:
6701 In addition, num1 and num2 can be LO or LOWEST, or HI or HIGHEST,
6705 @cmd{COUNT} creates or replaces a numeric @dfn{target} variable that
6706 counts the occurrence of a @dfn{criterion} value or set of values over
6707 one or more @dfn{test} variables for each case.
6709 The target variable values are always nonnegative integers. They are
6710 never missing. The target variable is assigned an F8.2 output format.
6711 @xref{Input/Output Formats}. Any variables, including long and short
6712 string variables, may be test variables.
6714 User-missing values of test variables are treated just like any other
6715 values. They are @strong{not} treated as system-missing values.
6716 User-missing values that are criterion values or inside ranges of
6717 criterion values are counted as any other values. However (for numeric
6718 variables), keyword MISSING may be used to refer to all system-
6719 and user-missing values.
6721 @cmd{COUNT} target variables are assigned values in the order
6722 specified. In the command @code{COUNT A=A B(1) /B=A B(2).}, the
6723 following actions occur:
6727 The number of occurrences of 1 between @code{A} and @code{B} is counted.
6730 @code{A} is assigned this value.
6733 The number of occurrences of 2 between @code{B} and the @strong{new}
6734 value of @code{A} is counted.
6737 @code{B} is assigned this value.
6740 Despite this ordering, all @cmd{COUNT} criterion variables must exist
6741 before the procedure is executed---they may not be created as target
6742 variables earlier in the command! Break such a command into two
6745 The examples below may help to clarify.
6749 Assuming @code{Q0}, @code{Q1}, @dots{}, @code{Q9} are numeric variables,
6750 the following commands:
6754 Count the number of times the value 1 occurs through these variables
6755 for each case and assign the count to variable @code{QCOUNT}.
6758 Print out the total number of times the value 1 occurs throughout
6759 @emph{all} cases using @cmd{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for
6764 COUNT QCOUNT=Q0 TO Q9(1).
6765 DESCRIPTIVES QCOUNT /STATISTICS=SUM.
6769 Given these same variables, the following commands:
6773 Count the number of valid values of these variables for each case and
6774 assign the count to variable @code{QVALID}.
6777 Multiply each value of @code{QVALID} by 10 to obtain a percentage of
6778 valid values, using @cmd{COMPUTE}. @xref{COMPUTE}, for details.
6781 Print out the percentage of valid values across all cases, using
6782 @cmd{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for details.
6786 COUNT QVALID=Q0 TO Q9 (LO THRU HI).
6787 COMPUTE QVALID=QVALID*10.
6788 DESCRIPTIVES QVALID /STATISTICS=MEAN.
6792 @node FLIP, IF, COUNT, Data Manipulation
6797 FLIP /VARIABLES=var_list /NEWNAMES=var_name.
6800 @cmd{FLIP} transposes rows and columns in the active file. It
6801 causes cases to be swapped with variables, and vice versa.
6803 All variables in the transposed active file are numeric. String
6804 variables take on the system-missing value in the transposed file.
6806 No subcommands are required. The VARIABLES subcommand specifies
6807 variables that will be transformed into cases. Variables not specified
6808 are discarded. By default, all variables are selected for
6811 The variable specified by NEWNAMES, which must be a string variable, is
6812 used to give names to the variables created by @cmd{FLIP}. If
6814 specified then the default is a variable named CASE_LBL, if it exists.
6815 If it does not then the variables created by FLIP are named VAR000
6816 through VAR999, then VAR1000, VAR1001, and so on.
6818 When a NEWNAMES variable is available, the names must be canonicalized
6819 before becoming variable names. Invalid characters are replaced by
6820 letter @samp{V} in the first position, or by @samp{_} in subsequent
6821 positions. If the name thus generated is not unique, then numeric
6822 extensions are added, starting with 1, until a unique name is found or
6823 there are no remaining possibilities. If the latter occurs then the
6824 FLIP operation aborts.
6826 The resultant dictionary contains a CASE_LBL variable, which stores the
6827 names of the variables in the dictionary before the transposition. If
6828 the active file is subsequently transposed using @cmd{FLIP}, this
6830 be used to recreate the original variable names.
6832 @cmd{FLIP} honors @cmd{N OF CASES}. It ignores @cmd{TEMPORARY}, so that ``temporary''
6833 transformations become permanent.
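
A minimal sketch, assuming numeric variables Q1 through Q5 and a string
variable NAME (hypothetical names): the command below turns Q1 through
Q5 into cases and names the new variables after the values of NAME, once
those values have been canonicalized as described above.

@example
FLIP /VARIABLES=Q1 TO Q5 /NEWNAMES=NAME.
@end example
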
6835 @node IF, RECODE, FLIP, Data Manipulation
6840 IF condition variable=expression.
6842 IF condition vector(index)=expression.
6845 The @cmd{IF} transformation conditionally assigns the value of a target
6846 expression to a target variable, based on the truth of a test
6849 Specify a boolean-valued expression (@pxref{Expressions}) to be tested
6850 following the IF keyword. This expression is evaluated for each case.
6851 If the value is true, then the value of the target expression is computed and
6852 assigned to the specified variable. If the value is false or missing,
6853 nothing is done. Numeric and short and long string variables may be
6854 assigned. When a string expression's width differs from the target
6855 variable's width, the string result of the expression is truncated or
6856 padded with spaces on the right as necessary. The expression and
6857 variable types must match.
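
As an illustration, assume a numeric variable AGE and a short string
variable SEX (hypothetical names). The targets SENIOR and GROUP are
declared first, so they start out system-missing. Each @cmd{IF} then
assigns a value only for cases where its test expression is true, and
leaves the target untouched otherwise.

@example
NUMERIC /SENIOR GROUP.
IF (AGE >= 65) SENIOR=1.
IF (SEX = 'F' AND AGE < 18) GROUP=2.
@end example
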
6859 The target variable may be specified as an element of a vector
6860 (@pxref{VECTOR}). In this case, a vector index expression must be
6861 specified in parentheses following the vector name. The index
6862 expression must evaluate to a numeric value that, after rounding down
6863 to the nearest integer, is a valid index for the named vector.
6865 Using @cmd{IF} to assign to a variable specified on @cmd{LEAVE}
6866 (@pxref{LEAVE}) resets the variable's left state. Therefore,
6867 @code{LEAVE} should be specified following @cmd{IF}, not before.
6869 When @cmd{IF} is specified following @cmd{TEMPORARY}
6870 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
6873 @node RECODE, SORT CASES, IF, Data Manipulation
6878 RECODE var_list (src_value@dots{}=dest_value)@dots{} [INTO var_list].
6880 src_value may take the following forms:
6887 Open-ended ranges may be specified using LO or LOWEST for num1
6888 or HI or HIGHEST for num2.
6890 dest_value may take the following forms:
6897 @cmd{RECODE} translates data from one range of values to
6898 another, via flexible user-specified mappings. Data may be remapped
6899 in-place or copied to new variables. Numeric, short string, and long
6900 string data can be recoded.
6902 Specify the list of source variables, followed by one or more mapping
6903 specifications each enclosed in parentheses. If the data is to be
6904 copied to new variables, specify INTO, then the list of target
6905 variables. String target variables must already have been declared
6906 using @cmd{STRING} or another transformation, but numeric target
6908 variables can be created on the fly. There must be exactly as many target variables
6909 as source variables. Each source variable is remapped into its
6910 corresponding target variable.
6912 When INTO is not used, the input and output variables must be of the
6913 same type. Otherwise, string values can be recoded into numeric values,
6914 and vice versa. When this is done and there is no mapping for a
6915 particular value, either a value consisting of all spaces or the
6916 system-missing value is assigned, depending on variable type.
6918 Mappings are considered from left to right. The first src_value that
6919 matches the value of the source variable causes the target variable to
6920 receive the value indicated by the dest_value. Literal number, string,
6921 and range src_value's should be self-explanatory. MISSING as a
6922 src_value matches any user- or system-missing value. SYSMIS matches the
6923 system missing value only. ELSE is a catch-all that matches anything.
6924 It should be the last src_value specified.
6926 Numeric and string dest_value's should also be self-explanatory. COPY
6927 causes the input values to be copied to the output. This is only valid
6928 if the source and target variables are of the same type. SYSMIS
6929 indicates the system-missing value.
6931 If the source variables are strings and the target variables are
6932 numeric, then there is one additional mapping available: (CONVERT),
6933 which must be the last specified mapping. CONVERT causes a number
6934 specified as a string to be converted to a numeric value. If the string
6935 cannot be parsed as a number, then the system-missing value is assigned.
6937 Multiple recodings can be specified on a single @cmd{RECODE} invocation.
6938 Introduce additional recodings with a slash (@samp{/}) to
6939 separate them from the previous recodings.
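
The following sketch pulls several of these rules together. The
variable names (@code{score}, @code{grade}, @code{answer},
@code{ansnum}, @code{a}, @code{b}) are hypothetical. @code{answer} is
assumed to be a string variable, the others numeric, and @code{score} is
assumed to hold integer values:

@example
RECODE score (LO THRU 59=1) (60 THRU 79=2) (80 THRU HI=3) INTO grade.
RECODE answer ('Y'=1) ('N'=0) (CONVERT) INTO ansnum.
RECODE a (MISSING=0) / b (1=2) (2=1).
@end example

In the second command, @samp{Y} and @samp{N} are matched first because
mappings are considered from left to right. Any other string is then
parsed as a number by CONVERT, and becomes system-missing if it cannot
be parsed. The third command recodes two variables in place, with a
slash separating the two recodings.
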
6941 @node SORT CASES, , RECODE, Data Manipulation
6946 SORT CASES BY var_list.
6949 @cmd{SORT CASES} sorts the active file by the values of one or more
6952 Specify BY and a list of variables to sort by. By default, variables
6953 are sorted in ascending order. To override sort order, specify (D) or
6954 (DOWN) after a list of variables to get descending order, or (A) or (UP)
6955 for ascending order. These apply to the entire list of variables
6958 @cmd{SORT CASES} is a procedure. It causes the data to be read.
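
For instance, assuming hypothetical variables @code{region} and
@code{sales}, the following command would sort by ascending
@code{region} and, within each region, by descending @code{sales}:

@example
SORT CASES BY region (A) sales (D).
@end example
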
6960 @cmd{SORT CASES} attempts to sort the entire active file in main memory.
6961 If main memory is exhausted, it falls back to a merge sort algorithm that
6962 involves writing and reading numerous temporary files.
6964 @cmd{SORT CASES} may not be specified following TEMPORARY.
6966 @node Data Selection, Conditionals and Looping, Data Manipulation, Top
6967 @chapter Selecting data for analysis
6969 This chapter documents PSPP commands that temporarily or permanently
6970 select data records from the active file for analysis.
6973 * FILTER:: Exclude cases based on a variable.
6974 * N OF CASES:: Limit the size of the active file.
6975 * PROCESS IF:: Temporarily excluding cases.
6976 * SAMPLE:: Select a specified proportion of cases.
6977 * SELECT IF:: Permanently delete selected cases.
6978 * SPLIT FILE:: Do multiple analyses with one command.
6979 * TEMPORARY:: Make transformations' effects temporary.
6980 * WEIGHT:: Weight cases by a variable.
6983 @node FILTER, N OF CASES, Data Selection, Data Selection
6992 @cmd{FILTER} allows a boolean-valued variable to be used to select
6993 cases from the data stream for processing.
6995 To set up filtering, specify BY and a variable name. Keyword
6996 BY is optional but recommended. Cases which have a zero or system- or
6997 user-missing value are excluded from analysis, but not deleted from the
6998 data stream. Cases with other values are analyzed.
6999 To filter based on a different condition, use
7000 transformations such as @cmd{COMPUTE} or @cmd{RECODE} to compute a
7001 filter variable of the required form, then specify that variable on
7004 @code{FILTER OFF} turns off case filtering.
7006 Filtering takes place immediately before cases pass to a procedure for
7007 analysis. Only one filter variable may be active at a time. Normally,
7008 case filtering continues until it is explicitly turned off with @code{FILTER
7009 OFF}. However, if @cmd{FILTER} is placed after TEMPORARY, it filters only
7010 the next procedure or procedure-like command.
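
A minimal sketch of this pattern, assuming hypothetical numeric
variables @code{age} and @code{income}:

@example
COMPUTE adult = (age >= 18).
FILTER BY adult.
DESCRIPTIVES /VARIABLES=income.
FILTER OFF.
@end example

Here @code{adult} is nonzero only for cases in which @code{age} is at
least 18, so @cmd{DESCRIPTIVES} analyzes only those cases. The excluded
cases remain in the active file and are analyzed again by procedures
that follow @code{FILTER OFF}.
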
7012 @node N OF CASES, PROCESS IF, FILTER, Data Selection
7017 N [OF CASES] num_of_cases [ESTIMATED].
7020 Sometimes you may want to disregard cases of your input. @cmd{N} can
7021 do this. @code{N 100} tells PSPP to disregard all cases after the first 100.
7024 If the value specified for @cmd{N} is greater than the number of cases
7025 read in, the value is ignored.
7027 @cmd{N} does not discard cases or prevent them from being read. It
7028 just causes cases beyond the last one specified to be ignored by data
7031 A later @cmd{N} command can increase or decrease the number of cases
7032 selected. (To select all the cases without knowing how many there are,
7033 specify a very high number: 100000 or whatever you think is large enough.)
7035 Transformation procedures performed after @cmd{N} is executed
7036 @emph{do} cause cases to be discarded.
7038 @cmd{SAMPLE}, @cmd{PROCESS IF}, and @cmd{SELECT IF} have
7039 precedence over @cmd{N}---the same results are obtained by both of the
7040 following fragments, given the same random number seeds:
7043 @i{@dots{}set up, read in data@dots{}}
7046 @i{@dots{}analyze data@dots{}}
7048 @i{@dots{}set up, read in data@dots{}}
7051 @i{@dots{}analyze data@dots{}}
7054 Both fragments above first randomly sample approximately half of the
7055 cases, then select the first 100 of those sampled.
7057 @cmd{N} with the @code{ESTIMATED} keyword gives an
7058 estimated number of cases before @cmd{DATA LIST} or another command to
7059 read in data. @code{ESTIMATED} never limits the number of cases
7060 processed by procedures. PSPP currently does not make use of
7061 case count estimates.
7063 When @cmd{N} is specified after @cmd{TEMPORARY}, it affects only
7064 the next procedure (@pxref{TEMPORARY}).
7066 @node PROCESS IF, SAMPLE, N OF CASES, Data Selection
7071 PROCESS IF expression.
7074 @cmd{PROCESS IF} temporarily eliminates cases from the
7075 data stream. Its effects are active only through the execution of the
7076 next procedure or procedure-like command.
7078 Specify a boolean expression (@pxref{Expressions}). If the value of the
7079 expression is true for a particular case, the case will be analyzed. If
7080 the expression has a false or missing value, then the case will be
7081 deleted from the data stream for this procedure only.
7083 Regardless of its placement relative to other commands, @cmd{PROCESS IF}
7084 always takes effect immediately before data passes to the procedure.
7085 Only one @cmd{PROCESS IF} command may be in effect at any given time.
7087 The effects of @cmd{PROCESS IF} are similar, but not identical, to the
7088 effects of executing @cmd{TEMPORARY}, then @cmd{SELECT IF}
7089 (@pxref{SELECT IF}).
7091 The filtering performed by @cmd{PROCESS IF} takes place immediately
7092 before cases pass to a procedure for analysis. Because @cmd{PROCESS
7093 IF} affects only a single procedure, its placement relative to
7094 @cmd{TEMPORARY} is unimportant.
7096 @cmd{PROCESS IF} is deprecated. It is included for compatibility with
7097 old command files. New syntax files should use @cmd{SELECT IF} or
7098 @cmd{FILTER} instead.
7100 @node SAMPLE, SELECT IF, PROCESS IF, Data Selection
7105 SAMPLE num1 [FROM num2].
7108 @cmd{SAMPLE} randomly samples a proportion of the cases in the active
7109 file. Unless it follows @cmd{TEMPORARY}, it operates as a
7110 transformation, permanently removing cases from the active file.
7112 The proportion to sample can be expressed as a single number between 0
7113 and 1. If @code{k} is the number specified, and @code{N} is the number
7114 of currently-selected cases in the active file, then after
7115 @code{SAMPLE @var{k}.}, approximately @code{k*N} cases will be selected.
7118 The proportion to sample can also be specified in the style @code{SAMPLE
7119 @var{m} FROM @var{N}}. With this style, cases are selected as follows:
7123 If @var{N} is equal to the number of currently-selected cases in the
7124 active file, exactly @var{m} cases will be selected.
7127 If @var{N} is greater than the number of currently-selected cases in the
7128 active file, an equivalent proportion of cases will be selected.
7131 If @var{N} is less than the number of currently-selected cases in the
7132 active file, exactly @var{m} cases will be selected @emph{from the first
7133 @var{N} cases in the active file.}
7136 @cmd{SAMPLE} and @cmd{SELECT IF} are performed in
7137 the order specified by the syntax file.
7139 @cmd{SAMPLE} is always performed before @code{N OF CASES}, regardless
7140 of ordering in the syntax file (@pxref{N OF CASES}).
7142 The same values for @cmd{SAMPLE} may result in different samples. To
7143 obtain the same sample, use the @code{SET} command to set the random
7144 number seed to the same value before each @cmd{SAMPLE}. Different
7145 samples may still result when the file is processed on systems with
7146 differing endianness or floating-point formats. By default, the
7147 random number seed is based on the system time.
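
A minimal sketch of a reproducible sample follows. The seed value is
arbitrary, and the @code{SEED} setting of @cmd{SET} is assumed to be the
one that controls the random number seed (@pxref{SET}):

@example
SET SEED=54321.
SAMPLE .25.
@end example

With 1000 currently-selected cases, this would retain approximately 250
of them. Writing @code{SAMPLE 50 FROM 200.} instead would select
exactly 50 cases from the first 200 cases in the active file.
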
7149 @node SELECT IF, SPLIT FILE, SAMPLE, Data Selection
7154 SELECT IF expression.
7157 @cmd{SELECT IF} selects cases for analysis based on the value of a
7158 boolean expression. Cases not selected are permanently eliminated
7159 from the active file, unless @cmd{TEMPORARY} is in effect
7160 (@pxref{TEMPORARY}).
7162 Specify a boolean expression (@pxref{Expressions}). If the value of the
7163 expression is true for a particular case, the case will be analyzed. If
7164 the expression has a false or missing value, then the case will be
7165 deleted from the data stream.
7167 Place @cmd{SELECT IF} as early in the command file as
7168 possible. Cases that are deleted early can be processed more
7169 efficiently in time and space.
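
For example, assuming hypothetical numeric variables @code{age} and
@code{income}:

@example
SELECT IF (age >= 18 AND income > 0).
@end example

Cases for which the expression is false or missing are dropped from the
active file, so later transformations and procedures never see them.
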
7171 When @cmd{SELECT IF} is specified following @cmd{TEMPORARY}
7172 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
7175 @node SPLIT FILE, TEMPORARY, SELECT IF, Data Selection
7180 Two possible syntaxes:
7181 SPLIT FILE BY var_list.
7185 @cmd{SPLIT FILE} allows multiple sets of data present in one data
7186 file to be analyzed separately using single statistical procedure
7189 Specify a list of variable names to analyze multiple sets of
7190 data separately. Groups of cases having the same values for these
7191 variables are analyzed by statistical procedure commands as one group.
7192 An independent analysis is carried out for each group of cases, and the
7193 variable values for the group are printed along with the analysis.
7195 Specify OFF to disable @cmd{SPLIT FILE} and resume analysis of the
7196 entire active file as a single group of data.
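
A short sketch, assuming hypothetical variables @code{region} and
@code{sales}. Sorting first is an assumption made here so that cases
with equal values of @code{region} are adjacent in the active file:

@example
SORT CASES BY region.
SPLIT FILE BY region.
DESCRIPTIVES /VARIABLES=sales.
SPLIT FILE OFF.
@end example

@cmd{DESCRIPTIVES} then reports one set of statistics for each distinct
value of @code{region}.
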
7198 When @cmd{SPLIT FILE} is specified after @cmd{TEMPORARY}, it affects only
7199 the next procedure (@pxref{TEMPORARY}).
7201 @node TEMPORARY, WEIGHT, SPLIT FILE, Data Selection
7209 @cmd{TEMPORARY} is used to make the effects of transformations
7210 following its execution temporary. These transformations will
7211 affect only the execution of the next procedure or procedure-like
7212 command. Their effects will not be saved to the active file.
7214 The only specification on @cmd{TEMPORARY} is the command name.
7216 @cmd{TEMPORARY} may not appear within a @cmd{DO IF} or @cmd{LOOP}
7217 construct. It may appear only once between procedures and
7218 procedure-like commands.
7220 Scratch variables cannot be used following @cmd{TEMPORARY}.
7222 An example may help to clarify:
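
The sketch below is one syntax file consistent with the values quoted
afterwards. It assumes a single numeric variable @code{X}, read from
columns 1 and 2, with the six input values shown:

@example
DATA LIST /X 1-2.
BEGIN DATA.
 2
 4
10
15
20
24
END DATA.
COMPUTE X=X/2.
TEMPORARY.
COMPUTE X=X+3.
DESCRIPTIVES X.
DESCRIPTIVES X.
@end example

The first @cmd{COMPUTE} is permanent. The second follows
@cmd{TEMPORARY}, so it affects only the first @cmd{DESCRIPTIVES}.
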
7241 The data read by the first @cmd{DESCRIPTIVES} are 4, 5, 8,
7242 10.5, 13, 15. The data read by the second @cmd{DESCRIPTIVES} are 1, 2,
7245 @node WEIGHT, , TEMPORARY, Data Selection
7254 @cmd{WEIGHT} assigns cases varying weights,
7255 changing the frequency distribution of the active file. Execution of
7256 @cmd{WEIGHT} is delayed until data have been read.
7258 If a variable name is specified, @cmd{WEIGHT} causes the values of that
7259 variable to be used as weighting factors for subsequent statistical
7260 procedures. Use of keyword BY is optional but recommended. Weighting
7261 variables must be numeric. Scratch variables may not be used for
7262 weighting (@pxref{Scratch Variables}).
7264 When OFF is specified, subsequent statistical procedures will weight all
7267 A positive integer weighting factor @var{w} on a case will yield the
7268 same statistical output as would replicating the case @var{w} times.
7269 A weighting factor of 0 is treated for statistical purposes as if the
7270 case did not exist in the input. Weighting values need not be
7271 integers, but negative and system-missing values for the weighting
7272 variable are interpreted as weighting factors of 0. User-missing
7273 values are not treated specially.
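
As an illustration, assume a hypothetical numeric variable @code{freq}
that records how many respondents gave each @code{answer}:

@example
WEIGHT BY freq.
FREQUENCIES /VARIABLES=answer.
WEIGHT OFF.
@end example

A case with @code{freq} equal to 3 contributes to the frequency table
as if it appeared three times. A case with a negative or
system-missing @code{freq} contributes nothing.
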
7275 When @cmd{WEIGHT} is specified after @cmd{TEMPORARY}, it affects only
7276 the next procedure (@pxref{TEMPORARY}).
7278 @cmd{WEIGHT} does not cause cases in the active file to be replicated in
7281 @node Conditionals and Looping, Statistics, Data Selection, Top
7282 @chapter Conditional and Looping Constructs
7283 @cindex conditionals
7285 @cindex flow of control
7286 @cindex control flow
7288 This chapter documents PSPP commands used for conditional execution,
7289 looping, and flow of control.
7292 * BREAK:: Exit a loop.
7293 * DO IF:: Conditionally execute a block of code.
7294 * DO REPEAT:: Textually repeat a code block.
7295 * LOOP:: Repeat a block of code.
7298 @node BREAK, DO IF, Conditionals and Looping, Conditionals and Looping
7306 @cmd{BREAK} terminates execution of the innermost currently executing
7307 @cmd{LOOP} construct.
7309 @cmd{BREAK} is allowed only inside @cmd{LOOP}@dots{}@cmd{END LOOP}.
7310 @xref{LOOP}, for more details.
7312 @node DO IF, DO REPEAT, BREAK, Conditionals and Looping
7327 @cmd{DO IF} allows one of several sets of transformations to be
7328 executed, depending on user-specified conditions.
7330 If the specified boolean expression evaluates as true, then the block
7331 of code following @cmd{DO IF} is executed. If it evaluates as
7333 missing, then none of the code blocks is executed. If it is false, then
7334 the boolean expression on the first @cmd{ELSE IF}, if present, is tested in
7335 turn, with the same rules applied. If all expressions evaluate to
7336 false, then the @cmd{ELSE} code block is executed, if it is present.
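
The following sketch shows the overall shape of the construct, assuming
hypothetical numeric variables @code{score} and @code{grade}:

@example
DO IF (score >= 80).
COMPUTE grade = 3.
ELSE IF (score >= 60).
COMPUTE grade = 2.
ELSE.
COMPUTE grade = 1.
END IF.
@end example

A case with a @code{score} of 70 fails the first test, passes the
second, and receives a @code{grade} of 2.
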
7338 When @cmd{DO IF} or @cmd{ELSE IF} is specified following @cmd{TEMPORARY}
7339 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
7342 @node DO REPEAT, LOOP, DO IF, Conditionals and Looping
7347 DO REPEAT repvar_name=expansion@dots{}.
7351 expansion takes one of the following forms:
7356 num_or_range takes one of the following forms:
7361 @cmd{DO REPEAT} repeats a block of code, textually substituting
7362 different variables, numbers, or strings into the block with each
7365 Specify a repeat variable name followed by an equals sign (@samp{=}) and
7366 the list of replacements. Replacements can be a list of variables
7367 (which may be existing variables or new variables or a combination
7368 thereof), of numbers, or of strings. When new variable names are
7369 specified, @cmd{DO REPEAT} creates them as numeric variables. When numbers
7370 are specified, runs of integers may be indicated with TO notation, for
7371 instance @samp{1 TO 5} and @samp{1 2 3 4 5} would be equivalent. There
7372 is no equivalent notation for string values.
7374 Multiple repeat variables can be specified. When this is done, each
7375 variable must have the same number of replacements.
7377 The code within @cmd{DO REPEAT} is repeated as many times as there are
7378 replacements for each variable. The first time, the first value for
7379 each repeat variable is substituted; the second time, the second value
7380 for each repeat variable is substituted; and so on.
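
As a sketch, assume existing numeric variables @code{q1}, @code{q2},
and @code{q3}. The repeat variable names @code{v} and @code{k} are
arbitrary, and the two specifications are separated here by a slash, as
in SPSS-compatible syntax:

@example
DO REPEAT v=q1 q2 q3 / k=10 20 30.
COMPUTE v = v * k.
END REPEAT.
@end example

After substitution, this is equivalent to writing:

@example
COMPUTE q1 = q1 * 10.
COMPUTE q2 = q2 * 20.
COMPUTE q3 = q3 * 30.
@end example
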
7382 Repeat variable substitutions work like macros. They take place
7383 anywhere in a line that the repeat variable name occurs as a token,
7384 including command and subcommand names. For this reason it is not a
7385 good idea to select words commonly used in command and subcommand names
7386 as repeat variable identifiers.
7388 If PRINT is specified on @cmd{END REPEAT}, the commands after substitutions
7389 are made are printed to the listing file, prefixed by a plus sign
7392 @node LOOP, , DO REPEAT, Conditionals and Looping
7397 LOOP [index_var=start TO end [BY incr]] [IF condition].
7399 END LOOP [IF condition].
7402 @cmd{LOOP} iterates a group of commands. A number of
7403 termination options are offered.
7405 Specify index_var to make that variable count from one value to
7406 another by a particular increment. index_var must be a pre-existing
7407 numeric variable. start, end, and incr are numeric expressions
7408 (@pxref{Expressions}).
7410 During the first iteration, index_var is set to the value of start.
7411 During each successive iteration, index_var is increased by the value of
7412 incr. If end > start, then the loop terminates when index_var > end;
7413 otherwise it terminates when index_var < end. If incr is not specified
7414 then it defaults to +1 or -1 as appropriate.
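
A minimal sketch of an indexed loop follows. The variables @code{i}
and @code{total} are hypothetical, and both are created with
@cmd{COMPUTE} beforehand because the index variable must already exist:

@example
COMPUTE i = 0.
COMPUTE total = 0.
LOOP i = 1 TO 5.
COMPUTE total = total + i.
END LOOP.
@end example

The loop body runs with @code{i} set to 1, 2, 3, 4, and 5 in turn, so
@code{total} ends up as 15 for every case.
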
7416 If end > start and incr < 0, or if end < start and incr > 0, then the
7417 loop is never executed. index_var is nevertheless set to the value of start.
7420 Modifying index_var within the loop is allowed, but it has no effect on
7421 the value of index_var in the next iteration.
7423 Specify a boolean expression for the condition on @cmd{LOOP} to
7424 cause the loop to be executed only if the condition is true. If the
7425 condition is false or missing before the loop contents are executed the
7426 first time, the loop contents are not executed at all.
7428 If index and condition clauses are both present on @cmd{LOOP}, the index
7429 clause is always evaluated first.
7431 Specify a boolean expression for the condition on @cmd{END LOOP} to cause
7432 the loop to terminate if the condition is true after the enclosed
7433 code block is executed. The condition is evaluated at the end of the
7434 loop, not at the beginning.
7436 If neither the index clause nor either condition clause is present, then the
7437 loop is executed MXLOOPS (@pxref{SET}) times.
7439 @cmd{BREAK} also terminates @cmd{LOOP} execution (@pxref{BREAK}).
7441 When @cmd{LOOP} or @cmd{END LOOP} is specified following @cmd{TEMPORARY}
7442 (@pxref{TEMPORARY}), the @cmd{LAG} function may not be used
7445 @node Statistics, Utilities, Conditionals and Looping, Top
7448 This chapter documents the statistical procedures that PSPP supports so
7452 * DESCRIPTIVES:: Descriptive statistics.
7453 * FREQUENCIES:: Frequency tables.
7454 * CROSSTABS:: Crosstabulation tables.
7455 * T-TEST:: Test Hypotheses about means.
7458 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
7459 @section DESCRIPTIVES
7461 @vindex DESCRIPTIVES
7465 /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
7466 /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
7468 /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
7469 SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
7470 SESKEWNESS,SEKURTOSIS@}
7471 /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
7472 RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
7476 The @cmd{DESCRIPTIVES} procedure reads the active file and outputs
7478 statistics requested by the user. In addition, it can optionally
7481 The VARIABLES subcommand, which is required, specifies the list of
7482 variables to be analyzed. Keyword VARIABLES is optional.
7484 All other subcommands are optional:
7486 The MISSING subcommand determines the handling of missing values. If
7487 INCLUDE is set, then user-missing values are included in the
7488 calculations. If NOINCLUDE is set, which is the default, user-missing
7489 values are excluded. If VARIABLE is set, then missing values are
7490 excluded on a variable by variable basis; if LISTWISE is set, then
7491 the entire case is excluded whenever any value in that case has a
7492 system-missing or, unless INCLUDE is set, user-missing value.
7494 The FORMAT subcommand affects the output format. Currently the
7495 LABELS/NOLABELS and NOINDEX/INDEX settings are not used. When SERIAL is
7496 set, both valid and missing number of cases are listed in the output;
7497 when LINE is set, only valid cases are listed.
7499 The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all
7500 the specified variables. The Z scores are saved to new variables.
7501 Variable names are generated by trying first the original variable name
7502 with Z prepended and truncated to a maximum of 8 characters, then the
7503 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
7504 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence. In addition, Z score
7505 variable names can be specified explicitly on VARIABLES in the variable
7506 list by enclosing them in parentheses after each variable.
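
For instance, assuming hypothetical numeric variables @code{height} and
@code{weight}, and an arbitrary new variable name @code{zwt}:

@example
DESCRIPTIVES /VARIABLES=height weight (zwt)
  /SAVE.
@end example

Under the naming rule above, the Z scores for @code{height} would be
stored in @code{ZHEIGHT}, provided that name is not already in use,
while those for @code{weight} go to the explicitly named @code{zwt}.
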
7508 The STATISTICS subcommand specifies the statistics to be displayed:
7512 All of the statistics below.
7516 Standard error of the mean.
7522 Kurtosis and standard error of the kurtosis.
7524 Skewness and standard error of the skewness.
7534 Mean, standard deviation, minimum, maximum.
7536 Standard error of the kurtosis.
7538 Standard error of the skewness.
7541 The SORT subcommand specifies how the statistics should be sorted. Most
7542 of the possible values should be self-explanatory. NAME causes the
7543 statistics to be sorted by name. By default, the statistics are listed
7544 in the order that they are specified on the VARIABLES subcommand. The A
7545 and D settings request an ascending or descending sort order,
7548 @node FREQUENCIES, CROSSTABS, DESCRIPTIVES, Statistics
7549 @section FREQUENCIES
7555 /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
7556 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
7558 @{AVALUE,DVALUE,AFREQ,DFREQ@}
7561 /MISSING=@{EXCLUDE,INCLUDE@}
7562 /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
7563 KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
7564 SESKEWNESS,SEKURTOSIS,ALL,NONE@}
7566 /PERCENTILES=percent@dots{}
7568 (These options are not currently implemented.)
7575 /VARIABLES=var_list (low,high)@dots{}
7578 The @cmd{FREQUENCIES} procedure outputs frequency tables for specified
7580 @cmd{FREQUENCIES} can also calculate and display descriptive statistics
7581 (including median and mode) and percentiles.
7583 In the future, @cmd{FREQUENCIES} will also support graphical output in the
7584 form of bar charts and histograms. In addition, it will be able to
7585 support percentiles for grouped data.
7587 The VARIABLES subcommand is the only required subcommand. Specify the
7588 variables to be analyzed. In most cases, this is all that is required.
7589 This is known as @dfn{general mode}.
7591 Occasionally, one may want to invoke a special mode called @dfn{integer
7592 mode}. Normally, in general mode, PSPP will automatically determine
7593 what values occur in the data. In integer mode, the user specifies the
7594 range of values that the data assumes. To invoke this mode, specify a
7595 range of data values in parentheses, separated by a comma. Data values
7596 inside the range are truncated to the nearest integer, then assigned to
7597 that value. If values occur outside this range, they are discarded.
7599 The FORMAT subcommand controls the output format. It has several
7604 TABLE, the default, causes a frequency table to be output for every
7605 variable specified. NOTABLE prevents them from being output. LIMIT
7606 with a numeric argument causes them to be output except when there are
7607 more than the specified number of values in the table.
7610 STANDARD frequency tables contain more complete information, but also
7611 take up more space on the printed page. CONDENSE frequency tables are
7612 less informative but take up less space. ONEPAGE with a numeric
7613 argument will output standard frequency tables if there are the
7614 specified number of values or fewer, condensed tables otherwise. ONEPAGE
7615 without an argument defaults to a threshold of 50 values.
7618 LABELS causes value labels to be displayed in STANDARD frequency
7619 tables. NOLABELS prevents this.
7622 Normally frequency tables are sorted in ascending order by value. This
7623 is AVALUE. DVALUE tables are sorted in descending order by value.
7624 AFREQ and DFREQ tables are sorted in ascending and descending order,
7625 respectively, by frequency count.
7628 SINGLE spaced frequency tables are closely spaced. DOUBLE spaced
7629 frequency tables have wider spacing.
7632 OLDPAGE and NEWPAGE are not currently used.
7635 The MISSING subcommand controls the handling of user-missing values.
7636 When EXCLUDE, the default, is set, user-missing values are not included
7637 in frequency tables or statistics. When INCLUDE is set, user-missing
7638 values are included. System-missing values are never included in statistics,
7639 but are listed in frequency tables.
7641 The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
7642 (@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
7643 value, and MODE, the mode. (If there are multiple modes, the smallest
7644 value is reported.) By default, the mean, standard deviation,
7645 minimum, and maximum are reported for each variable.
7647 NTILES causes the specified quantiles to be reported. For instance,
7648 @code{/NTILES=4} would cause quartiles to be reported. In addition,
7649 particular percentiles can be requested with the PERCENTILES subcommand.
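
The two sketches below illustrate general mode and integer mode in
turn. The variable names @code{age}, @code{income}, and @code{rating}
are hypothetical:

@example
FREQUENCIES /VARIABLES=age income
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN MODE
  /NTILES=4.
FREQUENCIES /VARIABLES=rating (1,5).
@end example

The first command suppresses the frequency tables and reports only the
requested statistics and quartiles. The second tabulates @code{rating}
in integer mode over the range 1 to 5 and discards values outside that
range.
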
7651 @node CROSSTABS, T-TEST, FREQUENCIES, Statistics
7657 /TABLES=var_list BY var_list [BY var_list]@dots{}
7658 /MISSING=@{TABLE,INCLUDE,REPORT@}
7659 /WRITE=@{NONE,CELLS,ALL@}
7660 /FORMAT=@{TABLES,NOTABLES@}
7661 @{LABELS,NOLABELS,NOVALLABS@}
7666 /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
7667 ASRESIDUAL,ALL,NONE@}
7668 /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
7669 KAPPA,ETA,CORR,ALL,NONE@}
7672 /VARIABLES=var_list (low,high)@dots{}
7675 The @cmd{CROSSTABS} procedure displays crosstabulation
7676 tables requested by the user. It can calculate several statistics for
7677 each cell in the crosstabulation tables. In addition, a number of
7678 statistics can be calculated for each table itself.
7680 The TABLES subcommand is used to specify the tables to be reported. Any
7681 number of dimensions is permitted, and any number of variables per
7682 dimension is allowed. The TABLES subcommand may be repeated as many
7683 times as needed. This is the only required subcommand in @dfn{general
7686 Occasionally, one may want to invoke a special mode called @dfn{integer
7687 mode}. Normally, in general mode, PSPP automatically determines
7688 what values occur in the data. In integer mode, the user specifies the
7689 range of values that the data assumes. To invoke this mode, specify the
7690 VARIABLES subcommand, giving a range of data values in parentheses for
7691 each variable to be used on the TABLES subcommand. Data values inside
7692 the range are truncated to the nearest integer, then assigned to that
7693 value. If values occur outside this range, they are discarded. When it
7694 is present, the VARIABLES subcommand must precede the TABLES
7697 In general mode, numeric and string variables may be specified on
7698 TABLES. Although long string variables are allowed, only their
7699 initial short-string parts are used. In integer mode, only numeric
7700 variables are allowed.
7702 The MISSING subcommand determines the handling of user-missing values.
7703 When set to TABLE, the default, missing values are dropped on a table by
7704 table basis. When set to INCLUDE, user-missing values are included in
7705 tables and statistics. When set to REPORT, which is allowed only in
7706 integer mode, user-missing values are included in tables but marked with
7707 an @samp{M} (for ``missing'') and excluded from statistical
7710 Currently the WRITE subcommand is ignored.
7712 The FORMAT subcommand controls the characteristics of the
7713 crosstabulation tables to be displayed. It has a number of possible
7718 TABLES, the default, causes crosstabulation tables to be output.
7719 NOTABLES suppresses them.
7722 LABELS, the default, allows variable labels and value labels to appear
7723 in the output. NOLABELS suppresses them. NOVALLABS displays variable
7724 labels but suppresses value labels.
7727 PIVOT, the default, causes each TABLES subcommand to be displayed in a
7728 pivot table format. NOPIVOT causes the old-style crosstabulation format
7732 AVALUE, the default, causes values to be sorted in ascending order.
7733 DVALUE asserts a descending sort order.
7736 INDEX/NOINDEX is currently ignored.
7739 BOX/NOBOX is currently ignored.
7742 The CELLS subcommand controls the contents of each cell in the displayed
7743 crosstabulation table. The possible settings are:
7759 Standardized residual.
7761 Adjusted standardized residual.
7765 Suppress cells entirely.
7768 @samp{/CELLS} without any settings specified requests COUNT, ROW,
7769 COLUMN, and TOTAL. If CELLS is not specified at all then only COUNT
7772 The STATISTICS subcommand selects statistics for computation:
7776 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
7777 correction, linear-by-linear association.
7781 Contingency coefficient.
7785 Uncertainty coefficient.
7801 Spearman correlation, Pearson's r.
7808 Selected statistics are only calculated when appropriate for the
7809 statistic. Certain statistics require tables of a particular size, and
7810 some statistics are calculated only in integer mode.
7812 @samp{/STATISTICS} without any settings selects CHISQ. If the
7813 STATISTICS subcommand is not given, no statistics are calculated.
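
The following sketches show one command in general mode and one in
integer mode. The variables @code{sex} and @code{region} and their
value ranges are hypothetical:

@example
CROSSTABS /TABLES=sex BY region
  /CELLS=COUNT ROW COLUMN
  /STATISTICS=CHISQ.
CROSSTABS /VARIABLES=sex (1,2) region (1,4)
  /TABLES=sex BY region
  /MISSING=REPORT.
@end example

In the second command, VARIABLES precedes TABLES as required, and
@code{MISSING=REPORT} is permitted because the command runs in integer
mode.
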
7815 @strong{Please note:} Currently the implementation of CROSSTABS has the
7820 Pearson's R (but not Spearman) is off a little.
7822 T values for Spearman's R and Pearson's R are wrong.
7824 Significance of symmetric and directional measures is not calculated.
7826 Asymmetric ASEs and T values for lambda are wrong.
7828 ASE of Goodman and Kruskal's tau is not calculated.
7830 ASE of symmetric Somers' d is wrong.
7832 Approximate T of uncertainty coefficient is wrong.
7835 Fixes for any of these deficiencies would be welcomed.
7837 @node T-TEST, , CROSSTABS, Statistics
7838 @comment node-name, next, previous, up
7844 /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
7845 /CRITERIA=CIN(confidence)
7853 (Independent Samples mode.)
7854 GROUPS=var(value1 [, value2])
7858 (Paired Samples mode.)
7859 PAIRS=var_list [WITH var_list [(PAIRED)] ]
7864 The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
7866 It operates in one of three modes:
7868 @item One Sample mode.
7869 @item Independent Groups mode.
7874 Each of these modes is described in more detail below.
7875 There are two optional subcommands which are common to all modes.
7877 The @cmd{/CRITERIA} subcommand tells PSPP the confidence level to use
7878 for the confidence intervals in the tests. The default value is 0.95.
7881 The @cmd{MISSING} subcommand determines the handling of missing
7883 If INCLUDE is set, then user-missing values are included in the
7884 calculations, but system-missing values are not.
7885 If EXCLUDE is set, which is the default, user-missing
7886 values are excluded as well as system-missing values.
7889 If LISTWISE is set, then the entire case is excluded from analysis
7890 whenever any variable specified in the @cmd{/VARIABLES}, @cmd{/PAIRS} or
7891 @cmd{/GROUPS} subcommands contains a missing value.
7892 If ANALYSIS is set, then missing values are excluded only in the analysis for
7893 which they would be needed. This is the default.
7897 * One Sample Mode:: Testing against a hypothesised mean
7898 * Independent Samples Mode:: Testing two independent groups for the same mean
7899 * Paired Samples Mode:: Testing two interdependent groups for the same mean
7902 @node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
7903 @comment node-name, next, previous, up
7905 @subsection One Sample Mode
7907 The @cmd{TESTVAL} subcommand invokes the One Sample mode.
7908 This mode is used to test a population mean against a hypothesised
7910 The value given to the @cmd{TESTVAL} subcommand is the value against
7911 which you wish to test.
7912 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
7913 tell PSPP which variables you wish to test.
7915 @node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
7916 @comment node-name, next, previous, up
7917 @subsection Independent Samples Mode
7919 The @cmd{GROUPS} subcommand invokes Independent Samples mode or
7921 This mode is used to test whether two groups of values have the
7922 same population mean.
7923 In this mode, you must also use the @cmd{/VARIABLES} subcommand to
7924 tell PSPP the dependent variables you wish to test.
7926 The variable given in the @cmd{GROUPS} subcommand is the independent
7927 variable which determines to which group the samples belong.
7928 The values in parentheses are the specific values of the independent
7929 variable for each group.
7930 If the parentheses are omitted and no values are given, the default values
7931 of 1.0 and 2.0 are assumed.
7933 If the independent variable is numeric,
7934 it is acceptable to specify only one value inside the parentheses.
7935 If you do this, cases where the independent variable is
7936 less than or equal to this value belong to the first group, and cases
7937 greater than this value belong to the second group.
7938 When using this form of the @cmd{GROUPS} subcommand, missing values in
7939 the independent variable are excluded on a listwise basis, regardless
7940 of whether @cmd{/MISSING=LISTWISE} was specified.
7943 @node Paired Samples Mode, , Independent Samples Mode, T-TEST
7944 @comment node-name, next, previous, up
7945 @subsection Paired Samples Mode
7947 The @cmd{PAIRS} subcommand introduces Paired Samples mode.
7948 Use this mode when repeated measures have been taken from the same
7950 If the @code{WITH} keyword is omitted, then tables for all
7951 combinations of variables given in the @cmd{PAIRS} subcommand are
7953 If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
7954 is also given, then the number of variables preceding @code{WITH}
7955 must be the same as the number following it.
7956 In this case, tables for each respective pair of variables are
7958 In the event that the @code{WITH} keyword is given, but the
7959 @code{(PAIRED)} keyword is omitted, then tables for each combination
7960 of a variable preceding @code{WITH} against a variable following
7961 @code{WITH} are generated.
7964 @node Utilities, Not Implemented, Statistics, Top
7967 Commands that don't fit any other category are placed here.
7969 Most of these commands are not affected by commands like @cmd{IF} and
7971 they take effect only once, unconditionally, at the time that they are
7972 encountered in the input.
7975 * COMMENT:: Document your syntax file.
7976 * DOCUMENT:: Document the active file.
7977 * DISPLAY DOCUMENTS:: Display active file documents.
7978 * DISPLAY FILE LABEL:: Display the active file label.
7979 * DROP DOCUMENTS:: Remove documents from the active file.
7980 * ERASE:: Erase a file.
7981 * EXECUTE:: Execute pending transformations.
7982 * FILE LABEL:: Set the active file's label.
7983 * FINISH:: Terminate the PSPP session.
7984 * HOST:: Temporarily return to the operating system.
7985 * INCLUDE:: Include a file within the current one.
7986 * QUIT:: Terminate the PSPP session.
7987 * SET:: Adjust PSPP runtime parameters.
7988 * SHOW:: Display runtime parameters.
7989 * SUBTITLE:: Provide a document subtitle.
7990 * TITLE:: Provide a document title.
7993 @node COMMENT, DOCUMENT, Utilities, Utilities
7999 Two possible syntaxes:
8000 COMMENT comment text @dots{} .
8001 *comment text @dots{} .
8004 @cmd{COMMENT} is ignored. It is used to provide information to
8005 the author and other readers of the PSPP syntax file.
8007 @cmd{COMMENT} can extend over any number of lines. Don't forget to
8008 terminate it with a dot or a blank line.
8010 @node DOCUMENT, DISPLAY DOCUMENTS, COMMENT, Utilities
8015 DOCUMENT documentary_text.
8018 @cmd{DOCUMENT} adds one or more lines of descriptive commentary to the
8019 active file. Documents added in this way are saved to system files.
8020 They can be viewed using @cmd{SYSFILE INFO} or @cmd{DISPLAY
8021 DOCUMENTS}. They can be removed from the active file with @cmd{DROP
8024 Specify the documentary text following the DOCUMENT keyword. You can
8025 extend the documentary text over as many lines as necessary. Lines are
8026 truncated at 80 characters width. Don't forget to terminate
8027 the command with a dot or a blank line.
8029 @node DISPLAY DOCUMENTS, DISPLAY FILE LABEL, DOCUMENT, Utilities
8030 @section DISPLAY DOCUMENTS
8031 @vindex DISPLAY DOCUMENTS
8037 @cmd{DISPLAY DOCUMENTS} displays the documents in the active file. Each
8038 document is preceded by a line giving the time and date that it was
8039 added. @xref{DOCUMENT}.
8041 @node DISPLAY FILE LABEL, DROP DOCUMENTS, DISPLAY DOCUMENTS, Utilities
8042 @section DISPLAY FILE LABEL
8043 @vindex DISPLAY FILE LABEL
8049 @cmd{DISPLAY FILE LABEL} displays the file label contained in the
8051 if any. @xref{FILE LABEL}.
8053 @node DROP DOCUMENTS, ERASE, DISPLAY FILE LABEL, Utilities
8054 @section DROP DOCUMENTS
8055 @vindex DROP DOCUMENTS
8061 @cmd{DROP DOCUMENTS} removes all documents from the active file.
8062 New documents can be added with @cmd{DOCUMENT} (@pxref{DOCUMENT}).
8064 @cmd{DROP DOCUMENTS} changes only the active file. It does not modify any
8065 system files stored on disk.
8068 @node ERASE, EXECUTE, DROP DOCUMENTS, Utilities
8069 @comment node-name, next, previous, up
8074 ERASE FILE file_name.
8077 @cmd{ERASE FILE} deletes a file from the local filesystem.
8078 file_name must be quoted.
8079 This command cannot be used if the SAFER setting is active.
8082 @node EXECUTE, FILE LABEL, ERASE, Utilities
8090 @cmd{EXECUTE} causes the active file to be read and all pending
8091 transformations to be executed.
8093 @node FILE LABEL, FINISH, EXECUTE, Utilities
8098 FILE LABEL file_label.
8101 @cmd{FILE LABEL} provides a title for the active file. This
8102 title will be saved into system files and portable files that are
8103 created during this PSPP run.
8105 file_label need not be quoted. If quotes are
8106 included, they become part of the file label.
8108 @node FINISH, HOST, FILE LABEL, Utilities
8116 @cmd{FINISH} terminates the current PSPP session and returns
8117 control to the operating system.
8119 This command is not valid in interactive mode.
8121 @node HOST, INCLUDE, FINISH, Utilities
8122 @comment node-name, next, previous, up
8130 @cmd{HOST} suspends the current PSPP session and temporarily returns control
8131 to the operating system.
8132 This command cannot be used if the SAFER setting is active.
8135 @node INCLUDE, QUIT, HOST, Utilities
8141 Two possible syntaxes:
8146 @cmd{INCLUDE} causes the PSPP command processor to read an
8147 additional command file as if it were included bodily in the current
8150 Include files may be nested to any depth, up to the limit of available
8153 @node QUIT, SET, INCLUDE, Utilities
8158 Two possible syntaxes:
8163 @cmd{QUIT} terminates the current PSPP session and returns control
8164 to the operating system.
8166 This command is not valid within a command file.
8168 @node SET, SHOW, QUIT, Utilities
8176 /BLANKS=@{SYSMIS,'.',number@}
8177 /DECIMAL=@{DOT,COMMA@}
8185 /CPROMPT='cprompt_string'
8186 /DPROMPT='dprompt_string'
8187 /ERRORBREAK=@{OFF,ON@}
8189 /MXWARNS=max_warnings
8191 /VIEWLENGTH=@{MINIMUM,MEDIAN,MAXIMUM,n_lines@}
8192 /VIEWWIDTH=n_characters
8196 /MITERATE=max_iterations
8200 /SEED=@{RANDOM,seed_value@}
8201 /UNDEFINED=@{WARN,NOWARN@}
8204 /CC@{A,B,C,D,E@}=@{'npre,pre,suf,nsuf','npre.pre.suf.nsuf'@}
8205 /DECIMAL=@{DOT,COMMA@}
8210 /ERRORS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8212 /MESSAGES=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8213 /PRINTBACK=@{ON,OFF@}
8214 /RESULTS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8221 (output driver options)
8222 /HEADERS=@{NO,YES,BLANK@}
8223 /LENGTH=@{NONE,length_in_lines@}
8226 /PAGER=@{OFF,"pager_name"@}
8227 /WIDTH=@{NARROW,WIDE,n_characters@}
8230 /JOURNAL=@{ON,OFF@} [filename]
8231 /LOG=@{ON,OFF@} [filename]
8234 /COMPRESSION=@{ON,OFF@}
8235 /SCOMPRESSION=@{ON,OFF@}
8240 (obsolete settings accepted for compatibility, but ignored)
8241 /AUTOMENU=@{ON,OFF@}
8244 /BOXSTRING=@{'xxx','xxxxxxxxxxx'@}
8245 /CASE=@{UPPER,UPLOW@}
8250 /HELPWINDOWS=@{ON,OFF@}
8253 /LOWRES=@{AUTO,ON,OFF@}
8255 /MENUS=@{STANDARD,EXTENDED@}
8256 /MXMEMORY=max_memory
8257 /PTRANSLATE=@{ON,OFF@}
8259 /RUNREVIEW=@{AUTO,MANUAL@}
8261 /TB1=@{'xxx','xxxxxxxxxxx'@}
8263 /WORKDEV=drive_letter
8264 /WORKSPACE=workspace_size
8268 @cmd{SET} allows the user to adjust several parameters relating to
8269 PSPP's execution. Since there are many subcommands to this command, its
8270 subcommands will be examined in groups.
8272 On subcommands that take boolean values, ON and YES are synonyms,
8273 as are OFF and NO.
8275 The data input subcommands affect the way that data is read from data
8276 files. The data input subcommands are
8280 This is the value assigned to a data item that is empty or
8281 contains only whitespace. An argument of SYSMIS or '.' will cause the
8282 system-missing value to be assigned to null items. This is the
8283 default. Any real value may be assigned.
8286 The default DOT setting causes the decimal point character to be
8287 @samp{.}. A setting of COMMA causes the decimal point character to be
8291 Allows the default numeric input/output format to be specified. The
8292 default is F8.2. @xref{Input/Output Formats}.
8295 Program input subcommands affect the way that programs are parsed when
8296 they are typed interactively or run from a script. They are
8300 This is a single character indicating the end of a command. The default
8301 is @samp{.}. Don't change this.
8304 Whether a blank line is interpreted as ending the current command. The
8308 Interaction subcommands affect the way that PSPP interacts with an
8309 online user. The interaction subcommands are
8313 The command continuation prompt. The default is @samp{ > }.
8316 Prompt used when expecting data input within @cmd{BEGIN DATA} (@pxref{BEGIN
8317 DATA}). The default is @samp{data> }.
8320 Whether an error causes PSPP to stop processing the current command
8321 file after finishing the current command. The default is OFF.
8324 The maximum number of errors before PSPP halts processing of the current
8325 command file. The default is 50.
8328 The maximum number of warnings + errors before PSPP halts processing the
8329 current command file. The default is 100.
8332 The command prompt. The default is @samp{PSPP> }.
8335 The length of the screen in lines. MINIMUM means 25 lines, MEDIAN and
8336 MAXIMUM mean 43 lines. Otherwise specify the number of lines. Normally
8337 PSPP should auto-detect your screen size so this shouldn't have to be
8341 The width of the screen in characters. Normally 80 or 132.
8344 Program execution subcommands control the way that PSPP commands
8345 execute. The program execution subcommands are
8355 The maximum number of iterations for an uncontrolled loop (@pxref{LOOP}).
8358 The initial pseudo-random number seed. Set to a real number or to
8359 RANDOM, which will obtain an initial seed from the current time of day.
8365 Data output subcommands affect the format of output data. These
8374 Set up custom currency formats. The argument is a string which must
8375 contain exactly three commas or exactly three periods. If commas, then
8376 the grouping character for the currency format is @samp{,}, and the
8377 decimal point character is @samp{.}; if periods, then the situation is
8380 The commas or periods divide the string into four fields, which are, in
8381 order, the negative prefix, prefix, suffix, and negative suffix. When a
8382 value is formatted using the custom currency format, the prefix precedes
8383 the value formatted and the suffix follows it. In addition, if the
8384 value is negative, the negative prefix precedes the prefix and the
8385 negative suffix follows the suffix.
8388 The default DOT setting causes the decimal point character to be
8389 @samp{.}. A setting of COMMA causes the decimal point character to be
8393 Allows the default numeric input/output format to be specified. The
8394 default is F8.2. @xref{Input/Output Formats}.
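
As a rough illustration of how the custom currency string described
above for the CC subcommands divides into four fields, the following C
sketch splits a specification on its delimiter. The names
@code{cc_format}, @code{parse_cc}, and @code{CC_FIELD_MAX}, as well as
the 16-character field limit, are assumptions made for this example and
are not taken from the PSPP source. The sketch also assumes that a
string containing both exactly three commas and exactly three periods is
rejected as ambiguous, and that with periods the roles of the grouping
and decimal point characters are swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CC_FIELD_MAX 16         /* Arbitrary limit chosen for this sketch. */

/* Hypothetical container for one parsed custom currency format. */
struct cc_format
  {
    /* Negative prefix, prefix, suffix, negative suffix, in that order. */
    char field[4][CC_FIELD_MAX + 1];
    char grouping;              /* Grouping character. */
    char decimal;               /* Decimal point character. */
  };

/* Counts the occurrences of C in S. */
static size_t
count_char (const char *s, char c)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (*s == c)
      n++;
  return n;
}

/* Splits SPEC into its four fields.  Returns 1 on success, 0 if SPEC
   does not contain exactly three commas or exactly three periods, or
   if a field is longer than CC_FIELD_MAX characters. */
int
parse_cc (const char *spec, struct cc_format *cc)
{
  size_t n_commas, n_periods;
  char delim;
  int i;

  assert (spec != NULL && cc != NULL);
  n_commas = count_char (spec, ',');
  n_periods = count_char (spec, '.');
  if (n_commas == 3 && n_periods != 3)
    delim = ',';
  else if (n_periods == 3 && n_commas != 3)
    delim = '.';
  else
    return 0;

  cc->grouping = delim;
  cc->decimal = delim == ',' ? '.' : ',';

  for (i = 0; i < 4; i++)
    {
      /* The last field has no trailing delimiter. */
      const char *end = strchr (spec, delim);
      size_t len = end != NULL ? (size_t) (end - spec) : strlen (spec);

      if (len > CC_FIELD_MAX)
        return 0;
      memcpy (cc->field[i], spec, len);
      cc->field[i][len] = '\0';
      spec = end != NULL ? end + 1 : spec + len;
    }
  return 1;
}
```

With the specification @samp{-,$,,} this sketch yields a negative prefix
of @samp{-}, a prefix of @samp{$}, and empty suffixes.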
8397 Output routing subcommands affect where the output of transformations
8398 and procedures is sent. These subcommands are
8403 If turned on, commands are written to the listing file as they are read
8404 from command files. The default is OFF.
8414 Output activation subcommands affect whether output devices of
8415 particular types are enabled. These subcommands are
8419 Enable or disable listing devices.
8422 Enable or disable printer devices.
8425 Enable or disable screen devices.
8428 Output driver option subcommands affect output drivers' settings. These
8441 Logging subcommands affect logging of commands executed to external
8442 files. These subcommands are
8450 System file subcommands affect the default format of system files
8451 produced by PSPP. These subcommands are
8458 Whether system files created by @cmd{SAVE} or @cmd{XSAVE} are
8459 compressed by default. The default is ON.
8462 Security subcommands affect the operations that commands are allowed to
8463 perform. The security subcommands are
8467 When set, this setting cannot ever be reset, for obvious security
8468 reasons. Setting this option disables the following operations:
8476 Pipe filenames (filenames beginning or ending with @samp{|}).
8479 Be aware that this setting does not guarantee safety (commands can still
8480 overwrite files, for instance) but it is an improvement.
8483 @node SHOW, SUBTITLE, SET, Utilities
8484 @comment node-name, next, previous, up
8494 @cmd{SHOW} can be used to display the current state of PSPP's
8495 execution parameters. All of the parameters which can be changed
8496 using @cmd{SET} (@pxref{SET}) can be examined using @cmd{SHOW}, by
8497 using a subcommand with the same name.
8498 In addition, @code{SHOW} supports the following subcommands:
8502 Show details of the lack of warranty for PSPP.
8504 Display the terms of PSPP's copyright licence (@pxref{License}).
8509 @node SUBTITLE, TITLE, SHOW, Utilities
8514 SUBTITLE 'subtitle_string'.
8516 SUBTITLE subtitle_string.
8519 @cmd{SUBTITLE} provides a subtitle to a particular PSPP
8520 run. This subtitle appears at the top of each output page below the
8521 title, if headers are enabled on the output device.
8523 Specify a subtitle as a string in quotes. The alternate syntax that did
8524 not require quotes is now obsolete. If it is used then the subtitle is
8525 converted to all uppercase.
8527 @node TITLE, , SUBTITLE, Utilities
8532 TITLE 'title_string'.
8537 @cmd{TITLE} provides a title to a particular PSPP run.
8538 This title appears at the top of each output page, if headers are enabled
8539 on the output device.
8541 Specify a title as a string in quotes. The alternate syntax that did
8542 not require quotes is now obsolete. If it is used then the title is
8543 converted to all uppercase.
8545 @node Not Implemented, Data File Format, Utilities, Top
8546 @chapter Not Implemented
8548 This chapter lists parts of the PSPP language that are not yet
8551 The following transformations and utilities are not yet implemented, but
8552 they will be supported in a later release.
8585 The following transformations and utilities are not implemented. There
8586 are no plans to support them in future releases. Contributions to
8587 implement them will still be accepted.
8609 NUMBERED and UNNUMBERED
8622 @node Data File Format, Portable File Format, Not Implemented, Top
8623 @chapter Data File Format
8625 PSPP necessarily uses the same format for system files as do the
8626 products with which it is compatible. This chapter is a description of
8629 There are three data types used in system files: 32-bit integers, 64-bit
8630 floating points, and 1-byte characters. In this document these will
8631 simply be referred to as @code{int32}, @code{flt64}, and @code{char},
8632 the names that are used in the PSPP source code. Every field of type
8633 @code{int32} or @code{flt64} is aligned on a 32-bit boundary.
8635 The endianness of data in PSPP system files is not specified. System
8636 files output on a computer of a particular endianness will have the
8637 endianness of that computer. However, PSPP can read files of either
8638 endianness, regardless of its host computer's endianness. PSPP
8639 translates endianness for both integer and floating point numbers.
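
A reader that accepts files of either endianness has to assemble
integers byte by byte rather than copy them directly into memory. The
following C sketch shows one way to do this; @code{read_int32} is a
hypothetical helper introduced for illustration, not a function from the
PSPP source, and how the caller learns the file's byte order is left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assembles a 32-bit integer from four raw bytes read from a system
   file.  BIG_ENDIAN_FILE is nonzero if the file was written on a
   big-endian machine.  The result does not depend on the endianness
   of the host running this code. */
int32_t
read_int32 (const unsigned char bytes[4], int big_endian_file)
{
  uint32_t u;

  assert (bytes != NULL);
  if (big_endian_file)
    u = ((uint32_t) bytes[0] << 24) | ((uint32_t) bytes[1] << 16)
        | ((uint32_t) bytes[2] << 8) | (uint32_t) bytes[3];
  else
    u = ((uint32_t) bytes[3] << 24) | ((uint32_t) bytes[2] << 16)
        | ((uint32_t) bytes[1] << 8) | (uint32_t) bytes[0];
  return (int32_t) u;
}
```

Because the bytes are combined with shifts, the same code works
unchanged on big-endian and little-endian hosts.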
8641 Floating point formats are also not specified. PSPP does not
8642 translate between floating point formats. This is unlikely to be a
8643 problem as all modern computer architectures use IEEE 754 format for
8644 floating point representation.
8646 The PSPP system-missing value is represented by the largest possible
8647 negative number in the floating point format; in C, this is most likely
8648 @code{-DBL_MAX}. There are two other important values used in missing
8649 values: @code{HIGHEST} and @code{LOWEST}. These are represented by the
8650 largest possible positive number (probably @code{DBL_MAX}) and the
8651 second-largest negative number. The latter must be determined in a
8652 system-dependent manner; in IEEE 754 format it is represented by value
8653 @code{0xffeffffffffffffe}.
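
The three special values can be sketched in C as follows. This is only
an illustration that assumes a 64-bit IEEE 754 @code{double} whose byte
order matches that of 64-bit integers on the host; the function names
are hypothetical and do not come from the PSPP source.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* System-missing value: the negative number of largest magnitude. */
double
sysmis_value (void)
{
  return -DBL_MAX;
}

/* HIGHEST: the largest positive number. */
double
highest_value (void)
{
  return DBL_MAX;
}

/* LOWEST: the second-largest negative number, built here from the
   IEEE 754 bit pattern quoted in the text above. */
double
lowest_value (void)
{
  const uint64_t bits = UINT64_C (0xffeffffffffffffe);
  double d;

  assert (sizeof d == sizeof bits);
  memcpy (&d, &bits, sizeof d);
  return d;
}
```

On such a system @code{lowest_value ()} is the representable number
immediately above @code{sysmis_value ()}, so the two never compare
equal.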
8655 System files are divided into records. Each record begins with an
8656 @code{int32} giving a numeric record type. Individual record types are
8660 * File Header Record::
8662 * Value Label Record::
8663 * Value Label Variable Record::
8665 * Machine int32 Info Record::
8666 * Machine flt64 Info Record::
8667 * Miscellaneous Informational Records::
8668 * Dictionary Termination Record::
8672 @node File Header Record, Variable Record, Data File Format, Data File Format
8673 @section File Header Record
8675 The file header is always the first record in the file.
8678 struct sysfile_header
8688 char creation_date[9];
8689 char creation_time[8];
8690 char file_label[64];
8696 @item char rec_type[4];
8697 Record type code. Always set to @samp{$FL2}. This is the only record
8698 for which the record type is not of type @code{int32}.
8700 @item char prod_name[60];
8701 Product identification string. This always begins with the characters
8702 @samp{@@(#) SPSS DATA FILE}. PSPP uses the remaining characters to
8703 give its version and the operating system name; for example, @samp{GNU
8704 pspp 0.1.4 - sparc-sun-solaris2.5.2}. The string is truncated if it
8705 would be longer than 60 characters; otherwise it is padded on the right
8708 @item int32 layout_code;
8709 Always set to 2. PSPP reads this value to determine the
8712 @item int32 case_size;
8713 Number of data elements per case. This is the number of variables,
8714 except that long string variables add extra data elements (one for every
8715 8 characters after the first 8).
8717 @item int32 compressed;
8718 Set to 1 if the data in the file is compressed, 0 otherwise.
8720 @item int32 weight_index;
8721 If one of the variables in the data set is used as a weighting variable,
8722 set to the index of that variable. Otherwise, set to 0.
8725 Set to the number of cases in the file if it is known, or -1 otherwise.
8727 In the general case it is not possible to determine the number of cases
8728 that will be output to a system file at the time that the header is
8729 written. The way that this is dealt with is by writing the entire
8730 system file, including the header, then seeking back to the beginning of
8731 the file and writing just the @code{ncases} field. For `files' in which
8732 this is not valid, the seek operation fails. In this case,
8733 @code{ncases} remains -1.
8736 Compression bias. Always set to 100. The significance of this value is
8737 that only numbers between @code{(1 - bias)} and @code{(251 - bias)} can
8740 @item char creation_date[9];
8741 Set to the date of creation of the system file, in @samp{dd mmm yy}
8742 format, with the month as standard English abbreviations, using an
8743 initial capital letter and following with lowercase. If the date is not
8744 available then this field is arbitrarily set to @samp{01 Jan 70}.
8746 @item char creation_time[8];
8747 Set to the time of creation of the system file, in @samp{hh:mm:ss}
8748 format and using 24-hour time. If the time is not available then this
8749 field is arbitrarily set to @samp{00:00:00}.
8751 @item char file_label[64];
8752 Set to the file label declared by the user, if any. Padded on the
8755 @item char padding[3];
8756 Ignored padding bytes to make the structure a multiple of 32 bits in
8757 length. Set to zeros.
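
The rule for @code{case_size} given above can be sketched in C as
follows. The function names are hypothetical, and the sketch assumes
that a trailing group of fewer than 8 characters in a long string still
occupies a full data element.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of 8-byte data elements occupied by one variable.
   WIDTH is 0 for a numeric variable, or the width in characters of a
   string variable. */
int
elements_for_width (int width)
{
  assert (width >= 0);
  if (width == 0)
    return 1;
  /* One element for the first 8 characters, plus one for every
     further group of up to 8 characters. */
  return (width + 7) / 8;
}

/* Returns the case_size for a dictionary whose N_VARS variables have
   the given WIDTHS. */
int
compute_case_size (const int *widths, size_t n_vars)
{
  int total = 0;
  size_t i;

  assert (widths != NULL || n_vars == 0);
  for (i = 0; i < n_vars; i++)
    total += elements_for_width (widths[i]);
  return total;
}
```

For example, a dictionary with one numeric variable and string variables
of widths 8, 9, and 20 gives 1 + 1 + 2 + 3 = 7 data elements per case.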
8760 @node Variable Record, Value Label Record, File Header Record, Data File Format
8761 @section Variable Record
8763 Immediately following the header must come the variable records. There
8764 must be one variable record for every variable and every 8 characters in
8765 a long string beyond the first 8; i.e., there must be exactly as many
8766 variable records as the value specified for @code{case_size} in the file
8770 struct sysfile_variable
8774 int32 has_var_label;
8775 int32 n_missing_values;
8780 /* The following two fields are present
8781 only if has_var_label is 1. */
8783 char label[/* variable length */];
8785 /* The following field is present only
8786 if n_missing_values is not 0. */
8787 flt64 missing_values[/* variable length*/];
8792 @item int32 rec_type;
8793 Record type code. Always set to 2.
8796 Variable type code. Set to 0 for a numeric variable. For a short
8797 string variable or the first part of a long string variable, this is set
8798 to the width of the string. For the second and subsequent parts of a
8799 long string variable, set to -1, and the remaining fields in the
8800 structure are ignored.
8802 @item int32 has_var_label;
8803 If this variable has a variable label, set to 1; otherwise, set to 0.
8805 @item int32 n_missing_values;
8806 If the variable has no missing values, set to 0. If the variable has
8807 one, two, or three discrete missing values, set to 1, 2, or 3,
8808 respectively. If the variable has a range of missing values, set to
8809 -2; if the variable has a range of missing values plus a single
8810 discrete value, set to -3.
8813 Print format for this variable. See below.
8816 Write format for this variable. See below.
8819 Variable name. The variable name must begin with a capital letter or
8820 the at-sign (@samp{@@}). Subsequent characters may also be octothorpes
8821 (@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full
8822 stops (@samp{.}). The variable name is padded on the right with spaces.
8824 @item int32 label_len;
8825 This field is present only if @code{has_var_label} is set to 1. It is
8826 set to the length, in characters, of the variable label, which must be a
8827 number between 0 and 120.
8829 @item char label[/* variable length */];
8830 This field is present only if @code{has_var_label} is set to 1. It has
8831 length @code{label_len}, rounded up to the nearest multiple of 32 bits.
8832 The first @code{label_len} characters are the variable's variable label.
8834 @item flt64 missing_values[/* variable length */];
8835 This field is present only if @code{n_missing_values} is not 0. It has
8836 the same number of elements as the absolute value of
8837 @code{n_missing_values}. For discrete missing values, each element
8838 represents one missing value. When a range is present, the first
8839 element denotes the minimum value in the range, and the second element
8840 denotes the maximum value in the range. When a range plus a value are
8841 present, the third element denotes the additional discrete missing
8842 value. HIGHEST and LOWEST are indicated as described in the chapter
8846 The @code{print} and @code{write} members of sysfile_variable are output
8847 formats coded into @code{int32} types. The LSB (least-significant byte)
8848 of the @code{int32} represents the number of decimal places, and the
8849 next two bytes in order of increasing significance represent field width
8850 and format type, respectively. The MSB (most-significant byte) is not
8851 used and should be set to zero.
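
The byte layout just described can be unpacked with shifts and masks.
The following C sketch is illustrative only; @code{format_spec},
@code{decode_format}, and @code{encode_format} are names invented for
this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked form of a print or write format. */
struct format_spec
  {
    int type;                   /* Format type code. */
    int width;                  /* Field width. */
    int decimals;               /* Number of decimal places. */
  };

/* Unpacks RAW, an int32 already converted to host byte order. */
struct format_spec
decode_format (int32_t raw)
{
  struct format_spec f;
  uint32_t u = (uint32_t) raw;

  f.decimals = (int) (u & 0xff);        /* Least-significant byte. */
  f.width = (int) ((u >> 8) & 0xff);    /* Next byte. */
  f.type = (int) ((u >> 16) & 0xff);    /* Next byte; the MSB is unused. */
  return f;
}

/* Packs a format into an int32, leaving the MSB set to zero. */
int32_t
encode_format (int type, int width, int decimals)
{
  assert (type >= 0 && type <= 255);
  assert (width >= 0 && width <= 255);
  assert (decimals >= 0 && decimals <= 255);
  return (int32_t) (((uint32_t) type << 16) | ((uint32_t) width << 8)
                    | (uint32_t) decimals);
}
```

For instance, a format with type code 5, width 8, and 2 decimal places
is stored as the value @code{0x00050802}.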
8853 Format types are defined as follows:
8937 @node Value Label Record, Value Label Variable Record, Variable Record, Data File Format
8938 @section Value Label Record
8940 Value label records must follow the variable records and must precede
8941 the dictionary termination record. Other than this, they may appear
8942 anywhere in the system file. Every value label record must be
8943 immediately followed by a value label variable record, described below.
8945 Value label records begin with @code{rec_type}, an @code{int32} value
8946 set to the record type of 3. This is followed by @code{count}, an
8947 @code{int32} value set to the number of value labels present in this
8950 These two fields are followed by a series of @code{count} tuples. Each
8951 tuple is divided into two fields, the value and the label. The first of
8952 these, the value, is composed of a 64-bit value, which is either a
8953 @code{flt64} value or up to 8 characters (padded on the right to 8
8954 bytes) denoting a short string value. Whether the value is a
8955 @code{flt64} or a character string is not defined inside the value label
8958 The second field in the tuple, the label, has variable length. The
8959 first @code{char} is a count of the number of characters in the value
8960 label. The remainder of the field is the label itself. The field is
8961 padded on the right to a multiple of 64 bits in length.
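
Since the label field has variable length, a reader must compute its
padded size in order to find the start of the next tuple. A minimal C
sketch of that arithmetic follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of bytes occupied by the label field of one
   tuple: one count byte plus LABEL_LEN characters, padded on the
   right to a multiple of 8 bytes (64 bits). */
size_t
value_label_field_size (size_t label_len)
{
  assert (label_len <= 255);    /* The count is stored in one char. */
  return (1 + label_len + 7) / 8 * 8;
}

/* Returns the number of bytes occupied by one whole tuple: the
   8-byte value followed by the padded label field. */
size_t
value_label_tuple_size (size_t label_len)
{
  return 8 + value_label_field_size (label_len);
}
```

A 7-character label therefore fits in a single 8-byte label field, while
an 8-character label needs 16 bytes because of the count byte.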
8963 @node Value Label Variable Record, Document Record, Value Label Record, Data File Format
8964 @section Value Label Variable Record
8966 Every value label variable record must be immediately preceded by a
8967 value label record, described above.
8970 struct sysfile_value_label_variable
8974 int32 vars[/* variable length */];
8979 @item int32 rec_type;
8980 Record type. Always set to 4.
8983 Number of variables to which the associated value labels from the value
8984 label record are to be applied.
8986 @item int32 vars[/* variable length */];
8987 A list of variables to which to apply the value labels. There are
8988 @code{count} elements.
8991 @node Document Record, Machine int32 Info Record, Value Label Variable Record, Data File Format
8992 @section Document Record
8994 There must be no more than one document record per system file.
8995 Document records must follow the variable records and precede the
8996 dictionary termination record.
8999 struct sysfile_document
9003 char lines[/* variable length */][80];
9008 @item int32 rec_type;
9009 Record type. Always set to 6.
9011 @item int32 n_lines;
9012 Number of lines of documents present.
9014 @item char lines[/* variable length */][80];
9015 Document lines. The number of elements is defined by @code{n_lines}.
9016 Lines shorter than 80 characters are padded on the right with spaces.
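
Preparing one document line for output might look like the following C
sketch. The helper name is hypothetical, and truncating overlong input
at 80 characters is an assumption based on the behavior described for
@cmd{DOCUMENT}.

```c
#include <assert.h>
#include <string.h>

#define DOC_LINE_LEN 80

/* Copies TEXT into OUT as one document line of exactly DOC_LINE_LEN
   characters.  Shorter text is padded on the right with spaces and
   longer text is truncated.  OUT is not null-terminated. */
void
pad_document_line (const char *text, char out[DOC_LINE_LEN])
{
  size_t len;

  assert (text != NULL && out != NULL);
  len = strlen (text);
  if (len > DOC_LINE_LEN)
    len = DOC_LINE_LEN;
  memcpy (out, text, len);
  memset (out + len, ' ', DOC_LINE_LEN - len);
}
```

The record as a whole then occupies 8 bytes for @code{rec_type} and
@code{n_lines} plus 80 bytes for each line.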
9019 @node Machine int32 Info Record, Machine flt64 Info Record, Document Record, Data File Format
9020 @section Machine @code{int32} Info Record
9022 There must be no more than one machine @code{int32} info record per
9023 system file. Machine @code{int32} info records must follow the variable
9024 records and precede the dictionary termination record.
9027 struct sysfile_machine_int32_info
9036 int32 version_major;
9037 int32 version_minor;
9038 int32 version_revision;
9040 int32 floating_point_rep;
9041 int32 compression_code;
9043 int32 character_code;
9048 @item int32 rec_type;
9049 Record type. Always set to 7.
9051 @item int32 subtype;
9052 Record subtype. Always set to 3.
9055 Size of each piece of data in the data part, in bytes. Always set to 4.
9058 Number of pieces of data in the data part. Always set to 8.
9060 @item int32 version_major;
9061 PSPP major version number. In version @var{x}.@var{y}.@var{z}, this
9064 @item int32 version_minor;
9065 PSPP minor version number. In version @var{x}.@var{y}.@var{z}, this
9068 @item int32 version_revision;
9069 PSPP version revision number. In version @var{x}.@var{y}.@var{z},
9072 @item int32 machine_code;
9073 Machine code. PSPP always sets this field to -1, but other
9076 @item int32 floating_point_rep;
9077 Floating point representation code. For IEEE 754 systems this is 1.
9078 IBM 370 sets this to 2, and DEC VAX E to 3.
9080 @item int32 compression_code;
9081 Compression code. Always set to 1.
9083 @item int32 endianness;
9084 Machine endianness. 1 indicates big-endian, 2 indicates little-endian.
9086 @item int32 character_code;
9087 Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3
9088 indicates 8-bit ASCII, 4 indicates DEC Kanji.
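
A writer has to decide which @code{endianness} code to store. One
common way to determine the host's byte order at run time is sketched
below in C; the enumeration and function names are hypothetical and this
is not necessarily how PSPP itself does it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Codes for the endianness field, as listed above. */
enum
  {
    ENDIAN_BIG = 1,
    ENDIAN_LITTLE = 2
  };

/* Returns the endianness code describing the host machine. */
int
host_endianness_code (void)
{
  const uint32_t probe = 1;
  unsigned char bytes[sizeof probe];

  /* Inspect how the value 1 is laid out in memory. */
  memcpy (bytes, &probe, sizeof probe);

  /* Rule out mixed-endian hosts, which the two codes do not cover. */
  assert (bytes[0] == 1 || bytes[sizeof probe - 1] == 1);

  return bytes[0] == 1 ? ENDIAN_LITTLE : ENDIAN_BIG;
}
```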
9091 @node Machine flt64 Info Record, Miscellaneous Informational Records, Machine int32 Info Record, Data File Format
9092 @section Machine @code{flt64} Info Record
9094 There must be no more than one machine @code{flt64} info record per
9095 system file. Machine @code{flt64} info records must follow the variable
9096 records and precede the dictionary termination record.
9099 struct sysfile_machine_flt64_info
9115 @item int32 rec_type;
9116 Record type. Always set to 7.
9118 @item int32 subtype;
9119 Record subtype. Always set to 4.
9122 Size of each piece of data in the data part, in bytes. Always set to 8.
9125 Number of pieces of data in the data part. Always set to 3.
9128 The system missing value.
9130 @item flt64 highest;
9131 The value used for HIGHEST in missing values.
9134 The value used for LOWEST in missing values.
9137 @node Miscellaneous Informational Records, Dictionary Termination Record, Machine flt64 Info Record, Data File Format
9138 @section Miscellaneous Informational Records
9140 Miscellaneous informational records must follow the variable records and
9141 precede the dictionary termination record.
9143 Miscellaneous informational records are ignored by PSPP when reading
9144 system files. They are not written by PSPP when writing system files.
9147 struct sysfile_misc_info
9156 char data[/* variable length */];
9161 @item int32 rec_type;
9162 Record type. Always set to 7.
9164 @item int32 subtype;
9165 Record subtype. May take any value.
9168 Size of each piece of data in the data part. Should have the value 4 or
9169 8, for @code{int32} and @code{flt64}, respectively.
9172 Number of pieces of data in the data part.
9174 @item char data[/* variable length */];
9175 Arbitrary data. There must be @code{size} times @code{count} bytes of data.
9179 @node Dictionary Termination Record, Data Record, Miscellaneous Informational Records, Data File Format
9180 @section Dictionary Termination Record
9182 The dictionary termination record must follow all other records, except
9183 for the actual cases, which it must precede. There must be exactly one
9184 dictionary termination record in every system file.
9187 struct sysfile_dict_term
9195 @item int32 rec_type;
9196 Record type. Always set to 999.
9199 Ignored padding. Should be set to 0.
9202 @node Data Record, , Dictionary Termination Record, Data File Format
9203 @section Data Record
9205 Data records must follow all other records in the data file. There must
9206 be at least one data record in every system file.
9208 The format of data records varies depending on whether the data is
9209 compressed. Regardless, the data is arranged in a series of 8-byte elements.
9212 When data is not compressed, every case is composed of @code{case_size}
9213 of these 8-byte elements, where @code{case_size} comes from the file
9214 header record (@pxref{File Header Record}). Each element corresponds to
9215 the variable declared in the respective variable record (@pxref{Variable
9216 Record}). Numeric values are given in @code{flt64} format; string
9217 values are literal character strings, padded on the right when necessary.
9220 Compressed data is arranged in the following manner: the first 8-byte
9221 element in the data section is divided into a series of 1-byte command
9222 codes. These codes have meanings as described below:
9226 Ignored. If the program writing the system file accumulates compressed
9227 data in blocks of fixed length, 0 bytes can be used to pad out extra
9228 bytes remaining at the end of a fixed-size block.
9231 These values indicate that the corresponding numeric variable has the
9232 value @code{(@var{code} - @var{bias})} for the case being read, where
9233 @var{code} is the value of the compression code and @var{bias} is the
9234 variable @code{compression_bias} from the file header. For example,
9235 code 105 with bias 100.0 (the normal value) indicates a numeric variable of value 5.
9239 End of file. This code may or may not appear at the end of the data
9240 stream. PSPP always outputs this code but its use is not required.
9243 This value indicates that the numeric or string value is not
9244 compressible. The value is stored in the 8-byte element following the
9245 current block of command bytes. If this value appears twice in a block
9246 of command bytes, then it indicates the second element following the
9247 command bytes, and so on.
9250 Used to indicate a string value that is all spaces.
9253 Used to indicate the system-missing value.
9256 When the end of the first 8-byte element of command bytes is reached,
9257 any blocks of non-compressible values are skipped, and the next element
9258 of command bytes is read and interpreted, until the end of the file is reached.
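The command-byte scheme can be illustrated with a small decoder. The C
fragment below is a sketch under stated assumptions rather than PSPP's
implementation: it handles numeric variables only, it assumes that the
file's @code{flt64} layout matches the host's @code{double}, and the
numeric code values in the @code{enum} (0, 252, 253, 254, 255, with 1
through 251 as biased values) are assumed here, since they are not
spelled out in the text above. The name @code{decode_numeric} is
hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Command codes.  These numeric values are an assumption of this sketch. */
enum
  {
    CODE_PAD = 0,        /* ignored padding */
    CODE_EOF = 252,      /* end of file */
    CODE_RAW = 253,      /* value stored in a following 8-byte element */
    CODE_SPACES = 254,   /* string value that is all spaces */
    CODE_SYSMIS = 255    /* system-missing value */
  };

/* Decodes compressed numeric data in BUF (LEN bytes) into OUT, which has
   room for MAX values.  Returns the number of values decoded, or -1 on
   truncated input, a string code, or a full output buffer. */
long
decode_numeric (const unsigned char *buf, size_t len, double bias,
                double sysmis, double *out, size_t max)
{
  size_t pos = 0;   /* offset of the next unread 8-byte element */
  size_t n = 0;     /* number of values stored so far */

  assert (sizeof (double) == 8);
  while (pos + 8 <= len)
    {
      const unsigned char *cmd = buf + pos;
      int i;

      pos += 8;     /* raw values, if any, follow the command block */
      for (i = 0; i < 8; i++)
        {
          double value;

          if (cmd[i] == CODE_PAD)
            continue;
          if (cmd[i] == CODE_EOF)
            return (long) n;
          if (cmd[i] == CODE_SPACES)
            return -1;          /* strings are outside this sketch */
          if (cmd[i] == CODE_RAW)
            {
              if (pos + 8 > len)
                return -1;
              memcpy (&value, buf + pos, 8);
              pos += 8;         /* skip the element just consumed */
            }
          else if (cmd[i] == CODE_SYSMIS)
            value = sysmis;
          else
            value = cmd[i] - bias;

          if (n >= max)
            return -1;
          out[n++] = value;
        }
    }
  return (long) n;
}
```
@end verbatim

Because @code{pos} is advanced past each raw element as it is consumed,
the next pass of the outer loop automatically lands on the following
element of command bytes.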
9261 @node Portable File Format, q2c Input Format, Data File Format, Top
9262 @chapter Portable File Format
9264 These days, most computers use the same internal data formats for
9265 integer and floating-point data, if one ignores little differences like
9266 big- versus little-endian byte ordering. However, occasionally it is
9267 necessary to exchange data between systems with incompatible data
9268 formats. This is what portable files are designed to do.
9270 @strong{Please note:} Although all of the following information is
9271 correct, as far as the author has been able to ascertain, it is gleaned
9272 from examination of ASCII-formatted portable files only, so some of it
9273 may be incorrect in the general case.
9276 * Portable File Characters::
9277 * Portable File Structure::
9278 * Portable File Header::
9279 * Version and Date Info Record::
9280 * Identification Records::
9281 * Variable Count Record::
9282 * Variable Records::
9283 * Value Label Records::
9284 * Portable File Data::
9287 @node Portable File Characters, Portable File Structure, Portable File Format, Portable File Format
9288 @section Portable File Characters
9290 Portable files are arranged as a series of lines of exactly 80
9291 characters each. Each line is terminated by a carriage-return,
9292 line-feed sequence (henceforth, ``newline''). Newlines are not
9293 delimiters: they are only used to avoid line-length limitations existing
9294 on some operating systems.
9296 The file must be terminated with a @samp{Z} character. In addition, if
9297 the final line in the file does not have exactly 80 characters, then it
9298 is padded on the right with @samp{Z} characters. (The file contents may
9299 be in any character set; the file contains a description of its own
9300 character set, as explained in the next section. Therefore, the
9301 @samp{Z} character is not necessarily an ASCII @samp{Z}.)
9303 For the rest of the description of the portable file format, newlines
9304 and the trailing @samp{Z}s will be ignored, as if they did not exist,
9305 because they are not an important part of understanding the file contents.
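The line discipline described above can be sketched as a small
wrapping routine. This is only an illustration: the function name
@code{wrap_portable} is hypothetical, and it writes an ASCII @samp{Z},
which is an assumption that holds only when the file itself is in an
ASCII-compatible character set.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Writes the LEN bytes in DATA to OUT as 80-character lines, each
   followed by a carriage return and line feed.  A terminating `Z' is
   appended and the final line is padded with `Z' to 80 characters.
   Returns the number of bytes written, or 0 if OUT (CAP bytes) is too
   small. */
size_t
wrap_portable (const char *data, size_t len, char *out, size_t cap)
{
  size_t total = len + 1;               /* content plus terminating Z */
  size_t lines = (total + 79) / 80;     /* round up to whole lines */
  size_t i, o = 0;

  assert (out != NULL && (data != NULL || len == 0));
  if (lines * 82 > cap)
    return 0;
  for (i = 0; i < lines * 80; i++)
    {
      out[o++] = i < len ? data[i] : 'Z';
      if (i % 80 == 79)
        {
          out[o++] = '\r';
          out[o++] = '\n';
        }
    }
  return o;
}
```
@end verbatim

Note that content of exactly 80 characters produces two lines, because
the mandatory terminating @samp{Z} no longer fits on the first.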
9308 @node Portable File Structure, Portable File Header, Portable File Characters, Portable File Format
9309 @section Portable File Structure
9311 Every portable file consists of the following records, in sequence:
9319 Version and date info.
9322 Product identification.
9325 Subproduct identification (optional).
9331 Variables. Each variable record may optionally be followed by a
9332 missing value record and a variable label record.
9335 Value labels (optional).
9341 Most records are identified by a single-character tag code. The file
9342 header and version info record do not have a tag.
9344 Other than these single-character codes, there are three types of fields
9345 in a portable file: floating-point, integer, and string. Floating-point
9346 fields have the following format:
9351 Zero or more leading spaces.
9354 Optional asterisk (@samp{*}), which indicates a missing value. The
9355 asterisk must be followed by a single character, generally a period
9356 (@samp{.}), but it appears that other characters may also be possible.
9357 This completes the specification of a missing value.
9360 Optional minus sign (@samp{-}) to indicate a negative number.
9363 A whole number, consisting of one or more base-30 digits: @samp{0}
9364 through @samp{9} plus capital letters @samp{A} through @samp{T}.
9367 A fraction, consisting of a radix point (@samp{.}) followed by one or
9368 more base-30 digits (optional).
9371 An exponent, consisting of a plus or minus sign (@samp{+} or @samp{-})
9372 followed by one or more base-30 digits (optional).
9375 A forward slash (@samp{/}).
9378 Integer fields take a form identical to floating-point fields, but they
9379 may not contain a fraction.
9381 String fields take the form of an integer field having value @var{n},
9382 followed by exactly @var{n} characters, which are the string content.
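The floating-point field syntax above can be turned into a short parser.
The following C fragment is a sketch, not the reader used by PSPP. It
assumes that the exponent denotes a power of 30 (the text above does not
say so explicitly), that the host character set is ASCII-compatible, and
that the field has already been translated to that character set. The
name @code{parse_pfm_number} is hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of base-30 digit C, or -1 if C is not one. */
static int
digit30 (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'A' && c <= 'T')
    return c - 'A' + 10;
  return -1;
}

/* Parses one floating-point field at S.  On success stores the number in
   *VALUE (or sets *MISSING for a missing value) and returns a pointer
   just past the field; returns NULL on a syntax error. */
const char *
parse_pfm_number (const char *s, double *value, int *missing)
{
  double num = 0.0;
  long exp30 = 0;       /* power of 30 still to be applied to NUM */
  int negative = 0;
  int d;

  assert (s != NULL && value != NULL && missing != NULL);
  *value = 0.0;
  *missing = 0;
  while (*s == ' ')
    s++;
  if (*s == '*')
    {
      /* Missing value: asterisk plus one more character, no slash. */
      if (s[1] == '\0')
        return NULL;
      *missing = 1;
      return s + 2;
    }
  if (*s == '-')
    {
      negative = 1;
      s++;
    }
  if (digit30 (*s) < 0)
    return NULL;
  for (; (d = digit30 (*s)) >= 0; s++)
    num = num * 30.0 + d;
  if (*s == '.')
    {
      s++;
      if (digit30 (*s) < 0)
        return NULL;
      /* Fold fraction digits into NUM and remember the scaling. */
      for (; (d = digit30 (*s)) >= 0; s++)
        {
          num = num * 30.0 + d;
          exp30--;
        }
    }
  if (*s == '+' || *s == '-')
    {
      int exp_negative = *s == '-';
      long e = 0;

      s++;
      if (digit30 (*s) < 0)
        return NULL;
      for (; (d = digit30 (*s)) >= 0; s++)
        if (e < 10000)          /* keep absurd exponents bounded */
          e = e * 30 + d;
      exp30 += exp_negative ? -e : e;
    }
  if (*s != '/')
    return NULL;
  for (; exp30 > 0; exp30--)
    num *= 30.0;
  for (; exp30 < 0; exp30++)
    num /= 30.0;
  *value = negative ? -num : num;
  return s + 1;
}
```
@end verbatim

An integer field could be read with the same routine by rejecting input
that contains a radix point, and a string field by reading an integer
@var{n} this way and then taking the next @var{n} characters.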
9384 @node Portable File Header, Version and Date Info Record, Portable File Structure, Portable File Format
9385 @section Portable File Header
9387 Every portable file begins with a 464-byte header, consisting of a
9388 200-byte collection of vanity splash strings, followed by a 256-byte
9389 character set translation table, followed by an 8-byte tag string.
9391 The 200-byte segment is divided into five 40-byte sections, each of
9392 which represents the string @code{ASCII SPSS PORT FILE} in a different
9393 character set encoding. (If the file is encoded in EBCDIC then the
9394 string is actually @code{EBCDIC SPSS PORT FILE}, and so on.) These
9395 strings are padded on the right with spaces in their own character set.
9397 It appears that these strings exist only to inform those who might view
9398 the file on a screen, and that they are not parsed by SPSS products.
9399 Thus, they can be safely ignored. For those interested, the strings are
9400 supposed to be in the following character sets, in the specified order:
9401 EBCDIC, 7-bit ASCII, CDC 6-bit ASCII, 6-bit ASCII, Honeywell 6-bit
9404 The 256-byte segment describes a mapping from the character set used in
9405 the portable file to an arbitrary character set having characters at the
9406 following positions:
9411 Control characters. Not important enough to describe in full here.
9419 Digits @samp{0} through @samp{9}.
9423 Capital letters @samp{A} through @samp{Z}.
9427 Lowercase letters @samp{a} through @samp{z}.
9439 Solid vertical pipe.
9443 Symbols @code{&[]!$*);^-/}
9447 Broken vertical pipe.
9451 Symbols @code{,%_>}?@code{`:} @c @code{?} is an inverted question mark
9455 British pound symbol.
9459 Symbols @code{@@'="}.
9463 Less than or equal symbol.
9495 Lower left corner box draw.
9499 Upper left corner box draw.
9503 Greater than or equal symbol.
9507 Superscript @samp{0} through @samp{9}.
9511 Lower right corner box draw.
9515 Upper right corner box draw.
9527 Superscript @samp{(}.
9531 Superscript @samp{)}.
9535 Horizontal dagger (?).
9539 Symbols @samp{@{@}\}.
9546 Centered dot, or bullet.
9553 Symbols that are not defined in a particular character set are set to
9554 the same value as symbol 64; i.e., to @samp{0}.
9556 The 8-byte tag string consists of the exact characters @code{SPSSPORT}
9557 in the portable file's character set, which can be used to verify that
9558 the file is indeed a portable file.
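The header layout lends itself to a simple check of the tag string
through the translation table. The C fragment below is a sketch only.
It reads the table as giving, for each position, the byte that the file
uses for that character, and it assumes that capital letters occupy
positions 74 through 99, immediately after the digits that begin at
position 64. The name @code{portable_tag_ok} is hypothetical, and the
host character set is assumed to have contiguous capital letters.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

enum
  {
    SPLASH_LEN = 200,     /* five 40-byte splash strings */
    TABLE_LEN = 256,      /* character set translation table */
    TAG_LEN = 8,          /* the SPSSPORT tag */
    HEADER_LEN = SPLASH_LEN + TABLE_LEN + TAG_LEN,  /* 464 bytes */
    FIRST_CAPITAL = 74    /* assumed table position of `A' */
  };

/* Returns 1 if the LEN bytes at HEADER form a complete header whose tag
   spells SPSSPORT in the file's own character set, 0 otherwise. */
int
portable_tag_ok (const unsigned char *header, size_t len)
{
  static const char expected[TAG_LEN + 1] = "SPSSPORT";
  const unsigned char *table, *tag;
  int i;

  assert (header != NULL);
  if (len < (size_t) HEADER_LEN)
    return 0;
  table = header + SPLASH_LEN;
  tag = table + TABLE_LEN;
  for (i = 0; i < TAG_LEN; i++)
    {
      /* table[p] is the file's own byte for the character at position p. */
      int position = FIRST_CAPITAL + (expected[i] - 'A');
      if (tag[i] != table[position])
        return 0;
    }
  return 1;
}
```
@end verbatim

Checking the tag through the table, rather than against a literal
string, lets the same test work for files written in EBCDIC or another
character set.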
9560 @node Version and Date Info Record, Identification Records, Portable File Header, Portable File Format
9561 @section Version and Date Info Record
9563 This record does not have a tag code. It has the following structure:
9567 A single character identifying the file format version. The letter A
9568 represents version 0, and so on.
9571 An 8-character string field giving the file creation date in the format
9575 A 6-character string field giving the file creation time in the format
9579 @node Identification Records, Variable Count Record, Version and Date Info Record, Portable File Format
9580 @section Identification Records
9582 The product identification record has tag code @samp{1}. It consists of
9583 a single string field giving the name of the product that wrote the portable file.
9586 The subproduct identification record has tag code @samp{3}. It
9587 consists of a single string field giving additional information on the
9588 product that wrote the portable file.
9590 @node Variable Count Record, Variable Records, Identification Records, Portable File Format
9591 @section Variable Count Record
9593 The variable count record has tag code @samp{4}. It consists of two
9594 integer fields. The first contains the number of variables in the file
9595 dictionary. The purpose of the second is unknown; it contains the value
9596 161 in all portable files examined so far.
9598 @node Variable Records, Value Label Records, Variable Count Record, Portable File Format
9599 @section Variable Records
9601 Each variable record represents a single variable. Variable records
9602 have tag code @samp{7}. They have the following structure:
9607 Width (integer). This is 0 for a numeric variable, and a number between 1
9608 and 255 for a string variable.
9611 Name (string). 1--8 characters long. Must be in all capitals.
9614 Print format. This is a set of three integer fields:
9619 Format type (@pxref{Variable Record}).
9622 Format width. 1--40.
9625 Number of decimal places. 1--40.
9629 Write format. Same structure as the print format described above.
9632 Each variable record can optionally be followed by a missing value
9633 record, which has tag code @samp{8}. A missing value record has one
9634 field, the missing value itself (a floating-point or string, as
9635 appropriate). Up to three of these missing value records can be used.
9637 There is also a record for missing value ranges, which has tag code
9638 @samp{B}. It is followed by two fields representing the range, which
9639 are floating-point or string as appropriate. If a missing value range
9640 is present, it may be followed by a single missing value record.
9642 Tag codes @samp{9} and @samp{A} represent @code{LO THRU @var{x}} and
9643 @code{@var{x} THRU HI} ranges, respectively. Each is followed by a
9644 single field representing @var{x}. If one of the ranges is present, it
9645 may be followed by a single missing value record.
9647 In addition, each variable record can optionally be followed by a
9648 variable label record, which has tag code @samp{C}. A variable label
9649 record has one field, the variable label itself (string).
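The constraints on the width and name fields of a variable record can be
checked mechanically. The following C fragment is a small sketch of such
checks. The function names are hypothetical, and ``all capitals'' is
interpreted here as ``contains no lowercase letters'', which is an
assumption.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if WIDTH is valid: 0 for a numeric variable, or 1 through
   255 for a string variable. */
int
valid_var_width (long width)
{
  return width >= 0 && width <= 255;
}

/* Returns 1 if NAME is 1 to 8 characters long and contains no
   lowercase letters, 0 otherwise. */
int
valid_var_name (const char *name)
{
  size_t len, i;

  assert (name != NULL);
  len = strlen (name);
  if (len < 1 || len > 8)
    return 0;
  for (i = 0; i < len; i++)
    if (name[i] >= 'a' && name[i] <= 'z')
      return 0;
  return 1;
}
```
@end verbatim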
9651 @node Value Label Records, Portable File Data, Variable Records, Portable File Format
9652 @section Value Label Records
9654 Value label records have tag code @samp{D}. They have the following structure:
9659 Variable count (integer).
9662 List of variables (strings). The variable count specifies the number in
9663 the list. Variables are specified by their names. All variables must
9664 be of the same type (numeric or string).
9667 Label count (integer).
9670 List of (value, label) tuples. The label count specifies the number of
9671 tuples. Each tuple consists of a value, which is numeric or string as
9672 appropriate to the variables, followed by a label (string).
9675 @node Portable File Data, , Value Label Records, Portable File Format
9676 @section Portable File Data
9678 The data record has tag code @samp{F}. There is only one tag for all
9679 the data; thus, all the data must follow the dictionary. The data is
9680 terminated by the end-of-file marker @samp{Z}, which is not valid as the
9681 beginning of a data element.
9683 Data elements are output in the same order as the variable records
9684 describing them. String variables are output as string fields, and
9685 numeric variables are output as floating-point fields.
9687 @node q2c Input Format, Bugs, Portable File Format, Top
9688 @chapter @code{q2c} Input Format
9690 PSPP statistical procedures have a bizarre and somewhat irregular
9691 syntax. Despite this, a parser generator has been written that
9692 adequately addresses many of the possibilities and tries to provide
9693 hooks for the exceptional cases. This parser generator is named @code{q2c}.
9697 * Invoking q2c:: q2c command-line syntax.
9698 * q2c Input Structure:: High-level layout of the input file.
9699 * Grammar Rules:: Syntax of the grammar rules.
9702 @node Invoking q2c, q2c Input Structure, q2c Input Format, q2c Input Format
9703 @section Invoking q2c
9706 q2c @var{input.q} @var{output.c}
9709 @code{q2c} translates a @samp{.q} file into a @samp{.c} file. It takes
9710 exactly two command-line arguments, which are the input file name and
9711 output file name, respectively. @code{q2c} does not accept any
9712 command-line options.
9714 @node q2c Input Structure, Grammar Rules, Invoking q2c, q2c Input Format
9715 @section @code{q2c} Input Structure
9717 @code{q2c} input files are divided into two sections: the grammar rules
9718 and the supporting code. The @dfn{grammar rules}, which make up the
9719 first part of the input, are used to define the syntax of the
9720 statistical procedure to be parsed. The @dfn{supporting code},
9721 following the grammar rules, is copied largely unchanged to the output
9722 file, except for certain escapes.
9724 The most important lines in the grammar rules are used for defining
9725 procedure syntax. These lines can be prefixed with a dollar sign
9726 (@samp{$}), which prevents Emacs' CC-mode from munging them. Besides
9727 this, a bang (@samp{!}) at the beginning of a line causes the line,
9728 minus the bang, to be written verbatim to the output file (useful for
9729 comments). As a third special case, any line that begins with the exact
9730 characters @code{/* *INDENT} is ignored and not written to the output.
9731 This allows @code{.q} files to be processed through @code{indent}
9732 without being munged.
9734 The syntax of the grammar rules themselves is given in the following
9737 The supporting code is passed into the output file largely unchanged.
9738 However, the following escapes are supported. Each escape must appear
9739 on a line by itself.
9742 @item /* (header) */
9744 Expands to a series of C @code{#include} directives which include the
9745 headers that are required for the parser generated by @code{q2c}.
9747 @item /* (decls @var{scope}) */
9749 Expands to C variable and data type declarations for the variables and
9750 @code{enum}s input and output by the @code{q2c} parser. @var{scope}
9751 must be either @code{local} or @code{global}. @code{local} causes the
9752 declarations to be output as function locals. @code{global} causes them
9753 to be declared as @code{static} module variables; thus, @code{global} is
9754 a bit of a misnomer.
9756 @item /* (parser) */
9758 Expands to the entire parser. Must be enclosed within a C function.
9762 Expands to a set of calls to the @code{free} function for variables
9763 declared by the parser. Only needs to be invoked if subcommands of type
9764 @code{string} are used in the grammar rules.
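The placement of the escapes can be sketched as a skeleton for the
supporting code. This is an illustration, not an excerpt from PSPP: the
function name @code{cmd_example} and its return value are hypothetical,
and the spelling @code{/* (free) */} for the last escape is assumed by
analogy with the others. Whether leading indentation is tolerated is not
stated above, so each escape starts in the first column here.

@verbatim
```c
/* (header) */

/* (decls global) */

/* Hypothetical entry point for the procedure being parsed. */
int
cmd_example (void)
{
/* (parser) */

  /* ... act on the parsed settings here ... */

/* (free) */
  return 1;
}
```
@end verbatim

Before @code{q2c} runs, the escapes are ordinary C comments, so the
skeleton remains valid C and editors can treat it as such.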
9767 @node Grammar Rules, , q2c Input Structure, q2c Input Format
9768 @section Grammar Rules
9770 The grammar rules describe the format of the syntax that the parser
9771 generated by @code{q2c} will understand. The way that the grammar rules
9772 are included in a @code{q2c} input file is described above.
9774 The grammar rules are divided into tokens of the following types:
9777 @item Identifier (@code{ID})
9779 An identifier token is a sequence of letters, digits, and underscores
9780 (@samp{_}). Identifiers are @emph{not} case-sensitive.
9782 @item String (@code{STRING})
9784 String tokens are initiated by a double-quote character (@samp{"}) and
9785 consist of all the characters between that double quote and the next
9786 double quote, which must be on the same line as the first. Within a
9787 string, a backslash can be used as a ``literal escape''. The only
9788 reasons to use a literal escape are to include a double quote or a
9789 backslash within a string.
9791 @item Special character
9793 Other characters, other than whitespace, constitute tokens in themselves.
9798 The syntax of the grammar rules is as follows:
9801 grammar-rules ::= ID : subcommands .
9802 subcommands ::= subcommand
9803 ::= subcommands ; subcommand
9806 The syntax begins with an ID token that gives the name of the
9807 procedure to be parsed. The rest of the syntax consists of subcommands
9808 separated by semicolons (@samp{;}) and terminated with a full stop (@samp{.}).
9812 subcommand ::= sbc-options ID sbc-defn
9815 ::= sbc-options sbc-options
9818 sbc-defn ::= opt-prefix = specifiers
9819 ::= [ ID ] = array-sbc
9820 ::= opt-prefix = sbc-special-form
9825 Each subcommand can be prefixed with one or more option characters. An
9826 asterisk (@samp{*}) is used to indicate the default subcommand; the
9827 keyword used for the default subcommand can be omitted in the PSPP
9828 syntax file. A plus sign (@samp{+}) is used to indicate that a
9829 subcommand can appear more than once; if it is not present then that
9830 subcommand can appear no more than once.
9832 The subcommand name appears after the option characters.
9834 There are three forms of subcommands. The first and most common form
9835 simply gives an equals sign (@samp{=}) and a list of specifiers, which
9836 can each be set to a single setting. The second form declares an array,
9837 which is a set of flags that can be individually turned on by the user.
9838 There are also several special forms that do not take a list of specifiers.
9841 Arrays require an additional @code{ID} argument. This is used as a
9842 prefix, prepended to the variable names constructed from the
9843 specifiers. The other forms also allow an optional prefix to be specified.
9847 array-sbc ::= alternatives
9848 ::= array-sbc , alternatives
9850 ::= alternatives | ID
9853 An array subcommand is a set of Boolean values, separated by commas
9854 (@samp{,}), that the user can turn on independently. If a value has more
9855 than one name then these names are separated by pipes (@samp{|}).
9858 specifiers ::= specifier
9859 ::= specifiers , specifier
9860 specifier ::= opt-id : settings
9865 Ordinary subcommands (other than arrays and special forms) require a
9866 list of specifiers. Each specifier has an optional name and a list of
9867 settings. If the name is given then a correspondingly named variable
9868 will be used to store the user's choice of setting. If no name is given
9869 then there is no way to tell which setting the user picked; in this case
9870 the settings should probably have values attached.
9873 settings ::= setting
9874 ::= settings / setting
9875 setting ::= setting-options ID setting-value
9882 Individual settings are separated by forward slashes (@samp{/}). Each
9883 setting can be as little as an @code{ID} token, but options and values
9884 can optionally be included. The @samp{*} option means that, for this
9885 setting, the @code{ID} can be omitted. The @samp{!} option means that
9886 this setting is the default for its specifier.
9890 ::= ( setting-value-2 )
9892 setting-value-2 ::= setting-value-options setting-value-type : ID
9893 setting-value-restriction
9894 setting-value-options ::=
9896 setting-value-type ::= N
9898 setting-value-restriction ::=
9902 Settings may have values. If the value must be enclosed in parentheses,
9903 then enclose the value declaration in parentheses. Declare the setting
9904 type as @samp{n} or @samp{d} for integer or floating point type,
9905 respectively. The given @code{ID} is used to construct a variable name.
9906 If option @samp{*} is given, then the value is optional; otherwise it
9907 must be specified whenever the corresponding setting is specified. A
9908 ``restriction'' can also be specified which is a string giving a C
9909 expression limiting the valid range of the value. The special escape
9910 @code{%s} should be used within the restriction to refer to the
9911 setting's value variable.
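To make the role of the @code{%s} escape concrete, the following C
fragment sketches how a restriction string might be expanded into a C
expression once the value variable's name is known. It is not taken from
@code{q2c}, whose actual expansion may differ. The function name
@code{expand_restriction} and the variable name used in the note that
follows are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies RESTRICTION into OUT, replacing each "%s" with VAR.
   Returns 1 on success, 0 if OUT (of CAP bytes) is too small. */
int
expand_restriction (const char *restriction, const char *var,
                    char *out, size_t cap)
{
  size_t var_len, o = 0;

  assert (restriction != NULL && var != NULL && out != NULL);
  var_len = strlen (var);
  while (*restriction != '\0')
    {
      if (restriction[0] == '%' && restriction[1] == 's')
        {
          /* Leave room for the terminating null byte. */
          if (o + var_len >= cap)
            return 0;
          memcpy (out + o, var, var_len);
          o += var_len;
          restriction += 2;
        }
      else
        {
          if (o + 1 >= cap)
            return 0;
          out[o++] = *restriction++;
        }
    }
  if (o >= cap)
    return 0;
  out[o] = '\0';
  return 1;
}
```
@end verbatim

For instance, the restriction @code{"%s >= 1 && %s <= 10"} with a
variable named @code{n} expands to @code{n >= 1 && n <= 10}.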
9914 sbc-special-form ::= VAR
9915 ::= VARLIST varlist-options
9916 ::= INTEGER opt-list
9919 ::= STRING @r{(the literal word STRING)} string-options
9926 ::= ( STRING STRING )
9929 The special forms are of the following types:
9934 A single variable name.
9938 A list of variables. If given, the string can be used to provide
9939 @code{PV_@var{*}} options to the call to @code{parse_variables}.
9943 A single integer value.
9947 A list of integers separated by spaces or commas.
9951 A single floating-point value.
9955 A list of floating-point values.
9959 A single positive integer value.
9963 A string value. If the options are given then the first string is an
9964 expression giving a restriction on the value of the string; the second
9965 string is an error message to display when the restriction is violated.
9969 A custom function is used to parse this subcommand. The function must
9970 have prototype @code{int custom_@var{name} (void)}. It should return 0
9971 on failure (when it has already issued an appropriate diagnostic), 1 on
9972 success, or 2 if it fails and the calling function should issue a syntax
9973 error on behalf of the custom handler.
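The three return values of a custom function can be illustrated as
follows. This C fragment is a sketch under stated assumptions: the
subcommand name @code{LEVEL}, the variable @code{current_token} standing
in for the lexer state, and the caller @code{call_custom} are all
hypothetical and are not part of PSPP.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the lexer's current token. */
static const char *current_token;

/* Custom parser for a hypothetical LEVEL subcommand. */
int
custom_level (void)
{
  if (current_token == NULL)
    return 2;   /* nothing to parse: let the caller report a syntax error */
  if (!strcmp (current_token, "HIGH") || !strcmp (current_token, "LOW"))
    return 1;   /* success */
  fprintf (stderr, "LEVEL must be HIGH or LOW.\n");
  return 0;     /* failure, diagnostic already issued */
}

/* Shows how a caller might act on the three return values.
   Returns 1 if parsing may continue, 0 otherwise. */
int
call_custom (int (*custom) (void))
{
  assert (custom != NULL);
  switch (custom ())
    {
    case 1:
      return 1;
    case 2:
      fprintf (stderr, "Syntax error.\n");
      return 0;
    default:
      return 0;
    }
}
```
@end verbatim

Returning 2 keeps generic syntax-error reporting in one place, while
returning 0 lets the custom function give a more specific message.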
9977 @node Bugs, Function Index, q2c Input Format, Top
9981 * Known bugs:: Pointers to other files.
9982 * Contacting the Author:: Where to send the bug reports.
9985 @node Known bugs, Contacting the Author, Bugs, Bugs
9988 This is the list of known bugs in PSPP. In addition, see @ref{Not
9989 Implemented}, and @ref{Functions Not Implemented}, for lists of bugs
9990 due to features not implemented. For known bugs in individual language
9991 features, see the documentation for that feature.
9995 Nothing has yet been tested exhaustively. Be cautious using PSPP to
9996 make important decisions.
9999 @code{make check} fails on some systems that don't like the syntax. I'm
10000 not sure why. If someone could make an attempt to track this down, it
10001 would be appreciated.
10004 PostScript driver bugs:
10008 Does not support driver arguments `max-fonts-simult' or
10009 `optimize-text-size'.
10012 Minor problems with font-encodings.
10015 Fails to align fonts along their baselines.
10018 Does not support certain bizarre line intersections--should
10019 never crop up in practice.
10022 Does not gracefully substitute for existing fonts whose
10023 encodings are missing.
10026 Does not perform italic correction or left italic correction
10030 Encapsulated PostScript is unimplemented.
10037 Does not support `infinite length' or `infinite width' paper.
10041 See below for information on reporting bugs not listed here.
10043 @node Contacting the Author, , Known bugs, Bugs
10044 @section Contacting the Author
10046 The author can be contacted at e-mail address
10051 @code{<blp@@gnu.org>}.
10054 PSPP bug reports should be sent to
10059 @code{<bug-gnu-pspp@@gnu.org>}.
10062 @node Function Index, Concept Index, Bugs, Top
10063 @chapter Function Index
10066 @node Concept Index, Command Index, Function Index, Top
10067 @chapter Concept Index
10070 @node Command Index, , Concept Index, Top
10071 @chapter Command Index
10077 @c Local Variables:
10078 @c compile-command: "makeinfo pspp.texi"