X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fpspp.texi;h=d399cb7e69c3e407c3077a124431d017e044f9a3;hb=0ec7c606844768cfc501d6213ffa17ebdfda1bab;hp=6d325c51d1364b0f5b13d3c16fec9bfa899e10b4;hpb=0280721550011ce14d0632f547746ea03e966bf7;p=pspp diff --git a/doc/pspp.texi b/doc/pspp.texi index 6d325c51d1..d399cb7e69 100644 --- a/doc/pspp.texi +++ b/doc/pspp.texi @@ -1,17 +1,53 @@ \input texinfo @c -*- texinfo -*- +@c PSPP - a program for statistical analysis. +@c Copyright (C) 2017 Free Software Foundation, Inc. +@c Permission is granted to copy, distribute and/or modify this document +@c under the terms of the GNU Free Documentation License, Version 1.3 +@c or any later version published by the Free Software Foundation; +@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. +@c A copy of the license is included in the section entitled "GNU +@c Free Documentation License". + @c %**start of header @setfilename pspp.info @settitle PSPP -@set TIMESTAMP Time-stamp: Sat Dec 20 20:25:33 WST 2003 jmd -@set EDITION 0.2 -@set VERSION 0.3 @c For double-sided printing, uncomment: @c @setchapternewpage odd @c %**end of header + +@macro note{param1} +@quotation +@strong{Please note:} \param1\ +@end quotation +@end macro + +@include version.texi + +@c This macro should be used for marking command names. For the purposes of markup, +@c no distinction is made between ``commands'' and ``procedures''. @macro cmd{CMDNAME} -\CMDNAME\ +@code{\CMDNAME\} +@end macro + +@c This macro is used for fragments of command syntax that are not in themselves command names. +@c It does not necessarily have to be a subcommand. +@macro subcmd{CMDNAME} +@code{\CMDNAME\} +@end macro + +@c Use this macro to refer to PSPP itself . Not when giving a shell command line example. 
+@macro pspp +@sc{pspp} +@end macro + + +@ifset MISSING_CLICKSEQUENCE +@alias clicksequence = asis +@macro click {} +-> @end macro +@end ifset @iftex @finalout @@ -20,10033 +56,116 @@ @dircategory Math @direntry * PSPP: (pspp). Statistical analysis package. +* PSPPIRE: (pspp). Graphical user interface to @pspp{}. @end direntry -@ifinfo -PSPP, for statistical analysis of sampled data, by Ben Pfaff. - -This file documents PSPP, a statistical package for analysis of -sampled data that uses a command language compatible with SPSS. - -Copyright (C) 1996-9, 2000 Free Software Foundation, Inc. +@copying +This manual is for GNU PSPP version @value{VERSION}, +software for statistical analysis. -This version of the PSPP documentation is consistent with version 2 of -``texinfo.tex''. +Copyright @copyright{} 1997, 1998, 2004, 2005, 2009, 2012, 2013, 2014, 2016 Free Software Foundation, Inc. -Permission is granted to make and distribute verbatim copies of this -manual provided the copyright notice and this permission notice are -preserved on all copies. - -@ignore -Permission is granted to process this file through TeX and print the -results, provided the printed document carries copying permission notice -identical to this one except for the removal of this paragraph (this -paragraph not being relevant to the printed manual). - -@end ignore -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided that the -entire resulting derived work is distributed under the terms of a -permission notice identical to this one. - -Permission is granted to copy and distribute translations of this -manual into another language, under the above condition for modified -versions, except that this permission notice may be stated in a -translation approved by the Free Software Foundation. 
-@end ifinfo +@quotation +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 +or any later version published by the Free Software Foundation; +with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. +A copy of the license is included in the section entitled "GNU +Free Documentation License". +@end quotation +@end copying @titlepage -@title PSPP -@subtitle A System for Statistical Analysis -@subtitle Edition @value{EDITION}, for PSPP version @value{VERSION} -@author by Ben Pfaff - +@title PSPP Users' Guide +@subtitle GNU PSPP Statistical Analysis Software +@subtitle Release @value{VERSION} @page @vskip 0pt plus 1filll +@insertcopying +@end titlepage -PSPP Copyright @copyright{} 1997, 1998 Free Software Foundation, Inc. - -Permission is granted to make and distribute verbatim copies of this -manual provided the copyright notice and this permission notice are -preserved on all copies. +@c @chapheading Acknowledgements +The authors wish to thank Network Theory Ltd +@url{http://www.network-theory.co.uk} +for their financial support +in the production of this manual. -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided that the -entire derived work is distributed under the terms of a permission -notice identical to this one. -Permission is granted to copy and distribute translations of this manual -into another language, under the above conditions for modified versions, -except that this permission notice may be stated in a translation -approved by the Foundation. -@end titlepage +@contents -@node Top, Introduction, (dir), (dir) -@ifinfo -@top PSPP -This file documents the PSPP package for statistical analysis of sampled -data. This is edition @value{EDITION}, for PSPP version -@value{VERSION}, last modified at @value{TIMESTAMP}. 
+@ifnottex +@node Top +@top GNU PSPP -@end ifinfo +@insertcopying +@end ifnottex @menu * Introduction:: Description of the package. * License:: Your rights and obligations. -* Credits:: Acknowledgement of authors. - -* Installation:: How to compile and install PSPP. -* Configuration:: Configuring PSPP. -* Invocation:: Starting and running PSPP. +* Invoking PSPP:: Starting the PSPP text-based interface. +* Invoking PSPPIRE:: Starting the PSPP graphical user interface. +* Using PSPP:: How to use PSPP --- A brief tutorial. * Language:: Basics of the PSPP command language. * Expressions:: Numeric and string expression syntax. * Data Input and Output:: Reading data from user files. -* System and Portable Files:: Dealing with system & portable files. +* System and Portable File IO:: Reading and writing system & portable files. +* Combining Data Files:: Combining data from multiple files. * Variable Attributes:: Adjusting and examining variables. * Data Manipulation:: Simple operations on data. * Data Selection:: Select certain cases for analysis. * Conditionals and Looping:: Doing things many times or not at all. * Statistics:: Basic statistical procedures. * Utilities:: Other commands. -* Not Implemented:: What's not here yet - -* Data File Format:: Format of PSPP system files. -* Portable File Format:: Format of PSPP portable files. -* q2c Input Format:: Format of syntax accepted by q2c. +* Invoking pspp-convert:: Utility for converting among file formats. +* Invoking pspp-dump-sav:: Utility for examining raw .sav files. +* Not Implemented:: What's not here yet * Bugs:: Known problems; submitting bug reports. * Function Index:: Index of PSPP functions for expressions. -* Concept Index:: Index of concepts. * Command Index:: Index of PSPP procedures. +* Concept Index:: Index of concepts. +* Installation:: Installing pspp +* GNU Free Documentation License:: License for copying this manual. 
@end menu -@node Introduction, License, Top, Top -@chapter Introduction -@cindex introduction - -@cindex PSPP language -@cindex language, PSPP -PSPP is a tool for statistical analysis of sampled data. It reads a -syntax file and a data file, analyzes the data, and writes the results -to a listing file or to standard output. - -The language accepted by PSPP is similar to those accepted by SPSS -statistical products. The details of PSPP's language are given -later in this manual. - -@cindex files, PSPP -@cindex output, PSPP -@cindex PostScript -@cindex graphics -@cindex Ghostscript -@cindex Free Software Foundation -PSPP produces output in two forms: tables and charts. Both of these can -be written in several formats; currently, ASCII, PostScript, and HTML -are supported. In the future, more drivers, such as PCL and X Window -System drivers, may be developed. For now, Ghostscript, available from -the Free Software Foundation, may be used to convert PostScript chart -output to other formats. - -The current version of PSPP, @value{VERSION}, is woefully incomplete in -terms of its statistical procedure support. PSPP is a work in progress. -The author hopes to fully support all features in the products -that PSPP replaces, eventually. The author welcomes questions, -comments, donations, and code submissions. @xref{Bugs,,Submitting Bug -Reports}, for instructions on contacting the author. - -@node License, Credits, Introduction, Top -@chapter Your rights and obligations -@cindex license -@cindex your rights and obligations -@cindex rights, your -@cindex obligations, your - -@cindex Free Software Foundation -@cindex GNU General Public License -@cindex General Public License -@cindex GPL -@cindex distribution -@cindex redistribution -Most of PSPP is distributed under the GNU General Public -License. The General Public License says, in effect, that you may -modify and distribute PSPP as you like, as long as you grant the -same rights to others. 
It also states that you must provide source code -when you distribute PSPP, or, if you obtained PSPP -source code from an anonymous ftp site, give out the name of that site. - -The General Public License is given in full in the source distribution -as file @file{COPYING}. In Debian GNU/Linux, this file is also -available as file @file{/usr/share/common-licenses/GPL-2}. - -To quote the GPL itself: - -@quotation -This program is free software; you can redistribute it and/or modify it -under the terms of the GNU General Public License as published by the -Free Software Foundation; either version 2 of the License, or (at your -option) any later version. - -This program is distributed in the hope that it will be useful, but -WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU -General Public License for more details. - -You should have received a copy of the GNU General Public License along -with this program; if not, write to the Free Software Foundation, Inc., -59 Temple Place, Suite 330, Boston, MA 02111-1307 USA -@end quotation - -@node Credits, Installation, License, Top -@chapter Credits -@cindex credits -@cindex authors - -@cindex Pfaff, Ben -Most of PSPP, as well as this manual, -was written by Ben Pfaff. @xref{Contacting the Author}, for -instructions on contacting the author. - -@cindex Covington, Michael A. -@cindex Van Zandt, James -@cindex @file{ftp.cdrom.com} -@cindex @file{/pub/algorithms/c/julcal10} -@cindex @file{julcal.c} -@cindex @file{julcal.h} -The PSPP source code incorporates @code{julcal10} originally -written by Michael A. Covington and translated into C by Jim Van Zandt. -The original package can be found in directory -@url{ftp://ftp.cdrom.com/pub/algorithms/c/julcal10}. The entire -contents of that directory constitute the package. The files actually -used in PSPP are @code{julcal.c} and @code{julcal.h}. 
- -@node Installation, Configuration, Credits, Top -@chapter Installing PSPP -@cindex installation -@cindex PSPP, installing - -@cindex GNU C compiler -@cindex gcc -@cindex compiler, recommended -@cindex compiler, gcc -PSPP conforms to the GNU Coding Standards. PSPP is written in, and -requires for proper operation, ANSI/ISO C. You might want to -additionally note the following points: - -@itemize @bullet -@item -The compiler and linker must allow for significance of several -characters in external identifiers. The exact number is unknown but at -least 31 is recommended. - -@item -The @code{int} type must be 32 bits or wider. - -@item -The recommended compiler is gcc 2.7.2.1 or later, but any ANSI compiler -will do if it fits the above criteria. -@end itemize - -Many UNIX variants should work out-of-the-box, as PSPP uses GNU -autoconf to detect differences between environments. Please report any -problems with compilation of PSPP under UNIX and UNIX-like operating -systems---portability is a major concern of the author. - -The pages below give specific instructions for installing PSPP -on each type of system mentioned above. - -@menu -* UNIX installation:: Installing on UNIX-like environments. -@end menu - -@node UNIX installation, , Installation, Installation -@section UNIX installation -@cindex UNIX, installing PSPP under -@cindex installation, under UNIX -@noindent -To install PSPP under a UNIX-like operating system, follow the steps -below in order. Some of the text below was taken directly from various -Free Software Foundation sources. - -@enumerate -@item -@code{cd} to the directory containing the PSPP source. - -@cindex configure, GNU -@cindex GNU configure -@item -Type @samp{./configure} to configure for your particular operating -system and compiler. Running @code{configure} takes a while. While -running, it displays some messages telling which features it is checking -for. 
- -You can optionally supply some options to @code{configure} to -give it hints about how to do its job. Type @code{./configure --help} -to see a list of options. One of the most useful options is -@samp{--with-checker}, which enables the use of the Checker memory -debugger under supported operating systems. Checker must already be -installed to use this option. Do not use @samp{--with-checker} if you -are not debugging PSPP itself. - -@cindex @file{Makefile} -@cindex @file{config.h} -@cindex @file{pref.h} -@cindex makefile -@item -(optional) Edit @file{Makefile}, @file{config.h}, and @file{pref.h}. -These files are produced by @code{configure}. Note that most PSPP -settings can be changed at runtime. - -@file{pref.h} is only generated by @code{configure} if it does not -already exist. (It's copied from @file{prefh.orig}.) - -@cindex compiling -@item -Type @samp{make} to compile the package. If there are any errors during -compilation, try to fix them. If modifications are necessary to compile -correctly under your configuration, contact the author. -@xref{Bugs,,Submitting Bug Reports}, for details. - -@cindex self-tests, running -@item -Type @samp{make check} to run self-tests on the compiled PSPP package. - -@cindex installation -@cindex PSPP, installing -@cindex @file{/usr/local/share/pspp/} -@cindex @file{/usr/local/bin/} -@cindex @file{/usr/local/info/} -@cindex documentation, installing -@item -Become the superuser and type @samp{make install} to install the -PSPP binaries, by default in @file{/usr/local/bin/}. The -directory @file{/usr/local/share/pspp/} is created and populated with -files needed by PSPP at runtime. This step will also cause the -PSPP documentation to be installed in @file{/usr/local/info/}, -but only if that directory already exists. - -@item -(optional) Type @samp{make clean} to delete the PSPP binaries -from the source tree. 
-@end enumerate - -@node Configuration, Invocation, Installation, Top -@chapter Configuring PSPP -@cindex configuration -@cindex PSPP, configuring - -PSPP has dozens of configuration possibilities and hundreds of -settings. This is both a bane and a blessing. On one hand, it's -possible to easily accommodate diverse ranges of setups. But, on the -other, the multitude of possibilities can overwhelm the casual user. -Fortunately, the configuration mechanisms are profusely described in the -sections below@enddots{} - -@menu -* File locations:: How PSPP finds config files. -* Configuration techniques:: Many different methods of configuration@enddots{} -* Configuration files:: How configuration files are read. -* Environment variables:: All about environment variables. -* Output devices:: Describing your terminal(s) and printer(s). -* PostScript driver class:: Configuration of PostScript devices. -* ASCII driver class:: Configuration of character-code devices. -* HTML driver class:: Configuration for HTML output. -* Miscellaneous configuring:: Even more configuration variables. -* Improving output quality:: Hints for producing ever-more-lovely output. -@end menu - -@node File locations, Configuration techniques, Configuration, Configuration -@section Locating configuration files - -PSPP uses the same method to find most of its configuration files: - -@enumerate -@item -The @dfn{base name} of the file being sought is determined. - -@item -The path to search is determined. - -@item -Each directory in the search path, from left to right, is searched for a -file with the name of the base name. The first occurrence is read -as the configuration file. -@end enumerate - -The first two steps are elaborated below for the sake of our pedantic -friends. - -@enumerate -@item -A @dfn{base name} is a file name lacking an absolute directory -reference. 
Some examples of base names are: @file{ps-encodings}, -@file{devices}, @file{devps/DESC} (under UNIX), @file{devps\DESC} (under -M$ environments). - -Determining the base name is a two-step process: - -@enumerate a -@item -If the appropriate environment variable is defined, the value of that -variable is used (@pxref{Environment variables}). For instance, when -searching for the output driver initialization file, the variable -examined is @code{STAT_OUTPUT_INIT_FILE}. - -@item -Otherwise, the compiled-in default is used. For example, when searching -for the output driver initialization file, the default base name is -@file{devices}. -@end enumerate - -@strong{Please note:} If a user-specified base name does contain an -absolute directory reference, as in a file name like -@file{/home/pfaff/fonts/TR}, no path is searched---the file name is used -exactly as given---and the algorithm terminates. - -@item -The path is the first of the following that is defined: - -@itemize @bullet -@item -A variable definition for the path given in the user environment. This -is a PSPP-specific environment variable name; for instance, -@code{STAT_OUTPUT_INIT_PATH}. - -@item -In some cases, another, less-specific environment variable is checked. -For instance, when searching for font files, the PostScript driver first -checks for a variable with name @code{STAT_GROFF_FONT_PATH}, then for -one with name @code{GROFF_FONT_PATH}. (However, font searching has its -own list of esoteric search rules.) - -@item -The configuration file path, which is itself determined by the -following rules: - -@enumerate a -@item -If the command line contains an option of the form @samp{-B @var{path}} -or @samp{--config-dir=@var{path}}, then the value given on the -rightmost occurrence of such an option is used. - -@item -Otherwise, if the environment variable @code{STAT_CONFIG_PATH} is -defined, the value of that variable is used. - -@item -Otherwise, the compiled-in fallback default is used. 
On UNIX machines, -the default fallback path is - -@enumerate 1 -@item -@file{~/.pspp} - -@item -@file{/usr/local/lib/pspp} - -@item -@file{/usr/lib/pspp} -@end enumerate - -On DOS machines, the default fallback path is: - -@enumerate 1 -@item -All the paths from the DOS search path in the @samp{PATH} environment -variable, in left-to-right order. - -@item -@file{C:\PSPP}, as a last resort. -@end enumerate - -Note that the installer of PSPP can easily change this default -fallback path; thus the above should not be taken as gospel. -@end enumerate -@end itemize -@end enumerate - -As a final note: Under DOS, directories given in paths are delimited by -semicolons (@samp{;}); under UNIX, directories are delimited by colons -(@samp{:}). This corresponds with the standard path delimiter under -these OSes. - -@node Configuration techniques, Configuration files, File locations, Configuration -@section Configuration techniques - -There are many ways that PSPP can be configured. These are -described in the list below. Values given by earlier items take -precedence over those given by later items. - -@enumerate -@item -Syntax commands that modify settings, such as @cmd{SET}. @xref{SET}. - -@item -Command-line options. @xref{Invocation}. - -@item -PSPP-specific environment variable contents. @xref{Environment -variables}. - -@item -General environment variable contents. @xref{Environment variables}. - -@item -Configuration file contents. @xref{Configuration files}. - -@item -Fallback defaults. -@end enumerate - -Some of the above may not apply to a particular setting. For instance, -the current pager (such as @samp{more}, @samp{most}, or @samp{less}) -cannot be determined by configuration file contents because there is no -appropriate configuration file. 
- -@node Configuration files, Environment variables, Configuration techniques, Configuration -@section Configuration files - -Most configuration files have a common form: - -@itemize @bullet -@item -Each line forms a separate command or directive. This means that lines -cannot be broken up, unless they are spliced together with a trailing -backslash, as described below. - -@item -Before anything else is done, trailing whitespace is removed. - -@item -When a line ends in a backslash (@samp{\}), the backslash is removed, -and the next line is read and appended to the current line. - -@itemize @minus -@item -Whitespace preceding the backslash is retained. - -@item -This rule continues to be applied until the line read does not end in a -backslash. - -@item -It is an error if the last line in the file ends in a backslash. -@end itemize - -@item -Comments are introduced by an octothorpe (@samp{#}), and continue until the -end of the line. - -@itemize @minus -@item -An octothorpe inside balanced pairs of double quotation marks (@samp{"}) -or single quotation marks (@samp{'}) does not introduce a comment. - -@item -The backslash character can be used inside balanced quotes of either -type to escape the following character as a literal character. - -(This is distinct from the use of a backslash as a line-splicing -character.) - -@item -Line splicing takes place before comment removal. -@end itemize - -@item -Blank lines, and lines that contain only whitespace, are ignored. -@end itemize - -@node Environment variables, Output devices, Configuration files, Configuration -@section Environment variables - -You may think the concept of environment variables is a fairly simple -one. However, the author of PSPP has found a way to complicate -even something so simple. Environment variables are further described -in the sections below: - -@menu -* Variable values:: Values of variables are determined this way. -* Environment substitutions:: How environment substitutions are made. 
-* Predefined variables:: A few variables are automatically defined. -@end menu - -@node Variable values, Environment substitutions, Environment variables, Environment variables -@subsection Values of environment variables - -Values for environment variables are obtained by the following means, -which are arranged in order of decreasing precedence: - -@enumerate -@item -Command-line options. @xref{Invocation}. - -@item -The @file{environment} configuration file---more on this below. - -@item -Actual environment variables (defined in the shell or other parent -process). -@end enumerate - -The @file{environment} configuration file is located through application -of the usual algorithm for configuration files (@pxref{File locations}), -except that its contents do not affect the search path used to find -@file{environment} itself. Use of @file{environment} is discouraged on -systems that allow an arbitrarily large environment; it is supported for -use on systems like MS-DOS that limit environment size. - -@file{environment} is composed of lines having the form -@samp{@var{key}=@var{value}}, where @var{key} and the equals sign -(@samp{=}) are required, and @var{value} is optional. If @var{value} is -given, variable @var{key} is given that value; if @var{value} is absent, -variable @var{key} is undefined (deleted). Variables may not be defined -with a null value. - -Environment substitutions are performed on each line in the file -(@pxref{Environment substitutions}). - -See @ref{Configuration files}, for more details on formatting of the -environment configuration file. - -@quotation -@strong{Please note:} Support for @file{environment} is not yet -implemented. -@end quotation - -@node Environment substitutions, Predefined variables, Variable values, Environment variables -@subsection Environment substitutions - -Much of the power of environment variables lies in the way that they may -be substituted into configuration files. Variable substitutions are -described below. 
- -The line is scanned from left to right. In this scan, all characters -other than dollar signs (@samp{$}) are retained unmolested. Dollar -signs, however, introduce an environment variable reference. References -take three forms: - -@table @code -@item $@var{var} -Replaced by the value of environment variable @var{var}, determined as -specified in @ref{Variable values}. @var{var} must be one of the -following: - -@itemize @bullet -@item -One or more letters. - -@item -Exactly one nonalphabetic character. This may not be a left brace -(@samp{@{}). -@end itemize - -@item $@{@var{var}@} -Same as above, but @var{var} may contain any character (except -@samp{@}}). - -@item $$ -Replaced by a single dollar sign. -@end table - -Undefined variables expand to an empty value. - -@node Predefined variables, , Environment substitutions, Environment variables -@subsection Predefined environment variables - -There are two environment variables predefined for use in environment -substitutions: - -@table @samp -@item VER -Defined as the version number of PSPP, as a string, in a format -something like @samp{0.9.4}. - -@item ARCH -Defined as the host architecture of PSPP, as a string, in standard -cpu-manufacturer-OS format. For instance, Debian GNU/Linux 1.1 on an -Intel machine defines this as @samp{i586-unknown-linux}. This is -somewhat dependent on the system used to compile PSPP. -@end table - -Nothing prevents these values from being overridden, although it's a -good idea not to do so. - -@node Output devices, PostScript driver class, Environment variables, Configuration -@section Output devices - -Configuring output devices is the most complicated aspect of configuring -PSPP. The output device configuration file is named -@file{devices}. It is searched for using the usual algorithm for -finding configuration files (@pxref{File locations}). Each line in the -file is read in the usual manner for configuration files -(@pxref{Configuration files}). 
- -Lines in @file{devices} are divided into three categories, described -briefly in the table below: - -@table @i -@item driver category definitions -Define a driver in terms of other drivers. - -@item macro definitions -Define environment variables local to the output driver -configuration file. - -@item device definitions -Describe the configuration of an output device. -@end table - -The following sections further elaborate the contents of the -@file{devices} file. - -@menu -* Driver categories:: How to organize the driver namespace. -* Macro definitions:: Environment variables local to @file{devices}. -* Device definitions:: Output device descriptions. -* Dimensions:: Lengths, widths, sizes, @enddots{} -* papersize:: Letter, legal, A4, envelope, @enddots{} -* Distinguishing line types:: Details on @file{devices} parsing. -* Tokenizing lines:: Dividing @file{devices} lines into tokens. -@end menu - -@node Driver categories, Macro definitions, Output devices, Output devices -@subsection Driver categories - -Drivers can be divided into categories. Drivers are specified by their -names, or by the names of the categories that they are contained in. -Only certain drivers are enabled each time PSPP is run; by -default, these are the drivers in the category `default'. To enable a -different set of drivers, use the @samp{-o @var{device}} command-line -option (@pxref{Invocation}). - -Categories are specified with a line of the form -@samp{@var{category}=@var{driver1} @var{driver2} @var{driver3} @var{@dots{}} -@var{driver@var{n}}}. This line specifies that the category -@var{category} is composed of drivers named @var{driver1}, -@var{driver2}, and so on. There may be any number of drivers in the -category, from zero on up. - -Categories may also be specified on the command line -(@pxref{Invocation}). - -This is all you need to know about categories. If you're still curious, -read on. - -First of all, the term `categories' is a bit of a misnomer. 
In fact, -the internal representation is nothing like the hierarchy that the term -seems to imply: a linear list is used to keep track of the enabled -drivers. - -When PSPP first begins reading @file{devices}, this list contains -the name of any drivers or categories specified on the command line, or -the single item `default' if none were specified. - -Each time a category definition is specified, the list is searched for -an item with the value of @var{category}. If a matching item is found, -it is deleted. If there was a match, the list of drivers (@var{driver1} -through @var{driver@var{n}}) is then appended to the list. - -Each time a driver definition line is encountered, the list is searched. -If the list contains an item with that driver's name, the driver is -enabled and the item is deleted from the list. Otherwise, the driver -is not enabled. - -It is an error if the list is not empty when the end of @file{devices} -is reached. - -@node Macro definitions, Device definitions, Driver categories, Output devices -@subsection Macro definitions - -Macro definitions take the form @samp{define @var{macroname} -@var{definition}}. In such a macro definition, the environment variable -@var{macroname} is defined to expand to the value @var{definition}. -Before the definition is made, however, any macros used in -@var{definition} are expanded. - -Please note the following nuances of macro usage: - -@itemize @bullet -@item -For the purposes of this section, @dfn{macro} and @dfn{environment -variable} are synonyms. - -@item -Macros may not take arguments. - -@item -Macros may not recurse. - -@item -Macros are just environment variable definitions like other environment -variable definitions, with the exception that they are limited in scope -to the @file{devices} configuration file. - -@item -Macros override all other environment variables of the same name (within -the scope of @file{devices}). 
- -@item -Earlier macro definitions for a particular @var{key} override later -ones. In particular, macro definitions on the command line override -those in the device definition file. @xref{Non-option Arguments}. - -@item -There are two predefined macros, whose values are determined at runtime: - -@table @samp -@item viewwidth -Defined as the width of the console screen, in columns of text. - -@item viewlength -Defined as the length of the console screen, in lines of text. -@end table -@end itemize - -@node Device definitions, Dimensions, Macro definitions, Output devices -@subsection Driver definitions - -Driver definitions are the ultimate purpose of the @file{devices} -configuration file. These are where the real action is. Driver -definitions tell PSPP where it should send its output. - -Each driver definition line is divided into four fields. These fields -are delimited by colons (@samp{:}). Each line is subjected to -environment variable interpolation before it is processed further -(@pxref{Environment substitutions}). From left to right, the four -fields are, in brief: - -@table @i -@item driver name -A unique identifier, used to determine whether to enable the driver. - -@item class name -One of the predefined driver classes supported by PSPP. The -currently supported driver classes include `postscript' and `ascii'. - -@item device type(s) -Zero or more of the following keywords, delimited by spaces: - -@table @code -@item screen +@include introduction.texi +@include license.texi -Indicates that the device is a screen display. This may reduce the -amount of buffering done by the driver, to make interactive use more -convenient. 
+@include invoking.texi +@include tutorial.texi +@include language.texi +@include expressions.texi +@include data-io.texi +@include files.texi +@include combining.texi +@include variables.texi +@include transformation.texi +@include data-selection.texi +@include flow-control.texi +@include statistics.texi +@include utilities.texi -@item printer +@include pspp-convert.texi +@include pspp-dump-sav.texi +@include not-implemented.texi +@include bugs.texi -Indicates that the device is a printer. +@include function-index.texi +@include command-index.texi +@include concept-index.texi -@item listing +@include installing.texi +@include fdl.texi -Indicates that the device is a listing file. -@end table - -These options are just hints to PSPP and do not cause the output to be -directed to the screen, or to the printer, or to a listing file---those -must be set elsewhere in the options. They are used primarily to decide -which devices should be enabled at any given time. @xref{SET}, for more -information. - -@item options -An optional set of options to pass to the driver itself. The exact -format for the options varies among drivers. -@end table - -The driver is enabled if: - -@enumerate -@item -Its driver name is specified on the command line, or - -@item -It's in a category specified on the command line, or - -@item -If no categories or driver names are specified on the command line, it -is in category @code{default}. -@end enumerate - -For more information on driver names, see @ref{Driver categories}. - -The class name must be one of those supported by PSPP. The -classes supported depend on the options with which PSPP was -compiled. See later sections in this chapter for descriptions of the -available driver classes. - -Options are dependent on the driver. See the driver descriptions for -details. - -@node Dimensions, papersize, Device definitions, Output devices -@subsection Dimensions - -Quite often in configuration it is necessary to specify a length or a -size. 
PSPP uses a common syntax for all such, calling them -collectively by the name @dfn{dimensions}. - -@itemize @bullet -@item -You can specify dimensions in decimal form (@samp{12.5}) or as -fractions, either as mixed numbers (@samp{12-1/2}) or raw fractions -(@samp{25/2}). - -@item -A number of different units are available. These are suffixed to the -numeric part of the dimension. There must be no spaces between the -number and the unit. The available units are identical to those offered -by the popular typesetting system @TeX{}: - -@table @code -@item in -inch (1 @code{in} = 2.54 @code{cm}) - -@item " -inch (1 @code{in} = 2.54 @code{cm}) - -@item pt -printer's point (1 @code{in} = 72.27 @code{pt}) - -@item pc -pica (12 @code{pt} = 1 @code{pc}) - -@item bp -PostScript point (1 @code{in} = 72 @code{bp}) - -@item cm -centimeter - -@item mm -millimeter (10 @code{mm} = 1 @code{cm}) - -@item dd -didot point (1157 @code{dd} = 1238 @code{pt}) - -@item cc -cicero (1 @code{cc} = 12 @code{dd}) - -@item sp -scaled point (65536 @code{sp} = 1 @code{pt}) -@end table - -@item -If no explicit unit is given, PSPP attempts to guess the best unit: - -@itemize @minus -@item -Numbers less than 50 are assumed to be in inches. - -@item -Numbers 50 or greater are assumed to be in millimeters. -@end itemize -@end itemize - -@node papersize, Distinguishing line types, Dimensions, Output devices -@subsection Paper sizes - -Output drivers usually deal with some sort of hardcopy media. This -media is called @dfn{paper} by the drivers, though in reality it could -be a transparency or film or thinly veiled sarcasm. To make it easier -for you to deal with paper, PSPP allows you to have (of course!) a -configuration file that gives symbolic names, like ``letter'' or -``legal'' or ``a4'', to paper sizes, rather than forcing you to use -cryptic numbers like ``8-1/2 x 11'' or ``210 by 297''. Surprisingly -enough, this configuration file is named @file{papersize}. -@xref{Configuration files}. 
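The dimension syntax described under ``Dimensions'' above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PSPP's actual parser: the function name parse_dimension, the choice to return a length in printer's points, and the exact set of accepted spellings are all hypothetical. The conversion factors and the unit-guessing rule are the ones listed in the text.

```python
from fractions import Fraction
import re

# Printer's points per unit, from the conversion table in the text.
PT_PER_IN = Fraction(7227, 100)           # 1 in = 72.27 pt
UNITS_IN_PT = {
    "in": PT_PER_IN,
    '"': PT_PER_IN,
    "pt": Fraction(1),
    "pc": Fraction(12),                    # 1 pc = 12 pt
    "bp": PT_PER_IN / 72,                  # 1 in = 72 bp
    "cm": PT_PER_IN / Fraction(254, 100),  # 1 in = 2.54 cm
    "mm": PT_PER_IN / Fraction(254, 10),   # 10 mm = 1 cm
    "dd": Fraction(1238, 1157),            # 1157 dd = 1238 pt
    "cc": 12 * Fraction(1238, 1157),       # 1 cc = 12 dd
    "sp": Fraction(1, 65536),              # 65536 sp = 1 pt
}

# number, then an optional "-n/d" (mixed number) or "/d" (raw fraction),
# then an optional unit with no intervening space.
_DIM = re.compile(
    r'^(\d+(?:\.\d+)?)(?:-(\d+)/(\d+)|/(\d+))?(in|"|pt|pc|bp|cm|mm|dd|cc|sp)?$'
)

def parse_dimension(text):
    """Return the length described by text, in printer's points (hypothetical helper)."""
    m = _DIM.match(text)
    if m is None:
        raise ValueError(f"not a dimension: {text!r}")
    whole, num, den, raw_den, unit = m.groups()
    value = Fraction(whole)
    if num is not None:            # mixed number such as 12-1/2
        value += Fraction(int(num), int(den))
    elif raw_den is not None:      # raw fraction such as 25/2
        value /= int(raw_den)
    if unit is None:               # guess: below 50 means inches, else millimeters
        unit = "in" if value < 50 else "mm"
    return value * UNITS_IN_PT[unit]
```

Exact rational arithmetic is used here so that equivalent spellings such as 12.5, 12-1/2, and 25/2 compare equal; a real implementation might well use floating point instead.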
- -When PSPP tries to connect a symbolic paper name to a paper size, it -reads and parses each non-comment line in the file, in order. The first -field on each line must be a symbolic paper name in double quotes. -Paper names may not contain double quotes. Paper names are not -case-sensitive: @samp{legal} and @samp{Legal} are equivalent. - -If a match is found for the paper name, the rest of the line is parsed. -If it is found to be a pair of dimensions (@pxref{Dimensions}) separated -by either @samp{x} or @samp{by}, then those are taken to be the paper -size, in order of width followed by length. There @emph{must} be at -least one space on each side of @samp{x} or @samp{by}. - -Otherwise the line must be of the form -@samp{"@var{paper-1}"="@var{paper-2}"}. In this case the target of the -search becomes paper name @var{paper-2} and the search through the file -continues. - -@node Distinguishing line types, Tokenizing lines, papersize, Output devices -@subsection How lines are divided into types - -The lines in @file{devices} are distinguished in the following manner: - -@enumerate -@item -Leading whitespace is removed. - -@item -If the resulting line begins with the exact string @code{define}, -followed by one or more whitespace characters, the line is processed as -a macro definition. - -@item -Otherwise, the line is scanned for the first instance of a colon -(@samp{:}) or an equals sign (@samp{=}). - -@item -If a colon is encountered first, the line is processed as a driver -definition. - -@item -Otherwise, if an equals sign is encountered, the line is processed as a -macro definition. - -@item -Otherwise, the line is ill-formed. -@end enumerate - -@node Tokenizing lines, , Distinguishing line types, Output devices -@subsection How lines are divided into tokens - -Each driver definition line is run through a simple tokenizer. This -tokenizer recognizes two basic types of tokens. - -The first type is an equals sign (@samp{=}). 
Equals signs are both -delimiters between tokens and tokens in themselves. - -The second type is an identifier or string token. Identifiers and -strings are equivalent after tokenization, though they are written -differently. An identifier is any string of characters other than -whitespace or equals sign. - -A string is introduced by a single- or double-quote character (@samp{'} -or @samp{"}) and, in general, continues until the next occurrence of -that same character. The following standard C escapes can also be -embedded within strings: - -@table @code -@item \' -A single-quote (@samp{'}). - -@item \" -A double-quote (@samp{"}). - -@item \? -A question mark (@samp{?}). Included for hysterical raisins. - -@item \\ -A backslash (@samp{\}). - -@item \a -Audio bell (ASCII 7). - -@item \b -Backspace (ASCII 8). - -@item \f -Formfeed (ASCII 12). - -@item \n -Newline (ASCII 10) - -@item \r -Carriage return (ASCII 13). - -@item \t -Tab (ASCII 9). - -@item \v -Vertical tab (ASCII 11). - -@item \@var{o}@var{o}@var{o} -Each @samp{o} must be an octal digit. The character is the one having -the octal value specified. Any number of octal digits is read and -interpreted; only the lower 8 bits are used. - -@item \x@var{h}@var{h} -Each @samp{h} must be a hex digit. The character is the one having the -hexadecimal value specified. Any number of hex digits is read and -interpreted; only the lower 8 bits are used. -@end table - -Tokens, outside of quoted strings, are delimited by whitespace or equals -signs. - -@node PostScript driver class, ASCII driver class, Output devices, Configuration -@section The PostScript driver class - -The @code{postscript} driver class is used to produce output that is -acceptable to PostScript printers and to PC-based PostScript -interpreters such as Ghostscript. Continuing a long tradition, -PSPP's PostScript driver is configurable to the point of -absurdity. - -There are actually two PostScript drivers. 
The first one, -@samp{postscript}, produces ordinary DSC-compliant PostScript output. -The second one, @samp{epsf}, produces an Encapsulated PostScript file. -The two drivers are otherwise identical in configuration and in -operation. - -The PostScript driver is described in further detail below. - -@menu -* PS output options:: Output file options. -* PS page options:: Paper, margins, scaling & rotation, more! -* PS file options:: Configuration files. -* PS font options:: Default fonts, font options. -* PS line options:: Line widths, options. -* Prologue:: Details on the PostScript prologue. -* Encodings:: Details on PostScript font encodings. -@end menu - -@node PS output options, PS page options, PostScript driver class, PostScript driver class -@subsection PostScript output options - -These options deal with the form of the output and the output file -itself: - -@table @code -@item output-file=@var{filename} - -File to which output should be sent. This can be an ordinary filename -(e.g., @code{"pspp.ps"}), a pipe filename (e.g., @code{"|lpr"}), or -stdout (@code{"-"}). Default: @code{"pspp.ps"}. - -@item color=@var{boolean} - -Most of the time black-and-white PostScript devices are smart enough to -map colors to shades themselves. However, you can cause the PSPP -output driver to do an ugly simulation of this in its own driver by -turning @code{color} off. Default: @code{on}. - -This is a boolean setting, as are many settings in the PostScript -driver. Valid positive boolean values are @samp{on}, @samp{true}, -@samp{yes}, and nonzero integers. Negative boolean values are -@samp{off}, @samp{false}, @samp{no}, and zero. - -@item data=@var{data-type} - -One of @code{clean7bit}, @code{clean8bit}, or @code{binary}. This -controls what characters will be written to the output file. PostScript -produced with @code{clean7bit} can be transmitted over 7-bit -transmission channels that use ASCII control characters for line -control.
@code{clean8bit} is similar but allows characters above 127 to -be written to the output file. @code{binary} allows any character in -the output file. Default: @code{clean7bit}. - -@item line-ends=@var{line-end-type} - -One of @code{cr}, @code{lf}, or @code{crlf}. This controls what is used -for newline in the output file. Default: @code{cr}. - -@item optimize-line-size=@var{level} - -Either @code{0} or @code{1}. If @var{level} is @code{1}, then short -line segments will be collected and merged into longer ones. This -reduces output file size but requires more time and memory. A -@var{level} of @code{0} has the advantage of being better for -interactive environments. @code{1} is the default unless the -@code{screen} flag is set; in that case, the default is @code{0}. - -@item optimize-text-size=@var{level} - -One of @code{0}, @code{1}, or @code{2}, each higher level representing -correspondingly more aggressive space savings for text in the output -file and requiring correspondingly more time and memory. Unfortunately -the levels presently are all the same. @code{1} is the default unless -the @code{screen} flag is set; in that case, the default is @code{0}. -@end table - -@node PS page options, PS file options, PS output options, PostScript driver class -@subsection PostScript page options - -These options affect page setup: - -@table @code -@item headers=@var{boolean} - -Controls whether the standard headers showing the time and date and -title and subtitle are printed at the top of each page. Default: -@code{on}. - -@item paper-size=@var{paper-size} - -Paper size, either as a symbolic name (e.g., @code{letter} or @code{a4}) -or specific measurements (e.g., @code{8-1/2x11} or @code{"210 x 297"}). -@xref{papersize, , Paper sizes}. Default: @code{letter}. - -@item orientation=@var{orientation} - -Either @code{portrait} or @code{landscape}. Default: @code{portrait}.
- -@item left-margin=@var{dimension} -@itemx right-margin=@var{dimension} -@itemx top-margin=@var{dimension} -@itemx bottom-margin=@var{dimension} - -Sets the margins around the page. The headers, if enabled, are not -included in the margins; they are in addition to the margins. For a -description of dimensions, see @ref{Dimensions}. Default: @code{0.5in}. - -@end table - -@node PS file options, PS font options, PS page options, PostScript driver class -@subsection PostScript file options - -Oh, my. You don't really want to know about the way that the PostScript -driver deals with files, do you? Well I suppose you're entitled, but I -warn you right now: it's not pretty. Here goes@enddots{} - -First let's look at the options that are available: - -@table @code - -@item font-dir=@var{font-directory} - -Sets the font directory. Default: @code{devps}. - -@item prologue-file=@var{prologue-file-name} - -Sets the name of the PostScript prologue file. You can write your own -prologue, though I have no idea why you'd want to: see @ref{Prologue}. -Default: @code{ps-prologue}. - -@item device-file=@var{device-file-name} - -Sets the name of the Groff-format device description file. The -PostScript driver reads this to know about the scaling of fonts -and so on. The format of such files is described in groff_font(5), -included with Groff. Default: @code{DESC}. - -@item encoding-file=@var{encoding-file-name} - -Sets the name of the encoding file. This file contains a list of all -font encodings that will be needed so that the driver can put all of -them at the top of the prologue. @xref{Encodings}. Default: -@code{ps-encodings}. - -If the specified encoding file cannot be found, this error will be -silently ignored, since most people do not need any encodings besides -the ones that can be found using @code{auto-encode}, described below.
- -@item auto-encode=@var{boolean} - -When enabled, the font encodings needed by the default proportional- and -fixed-pitch fonts will automatically be dumped to the PostScript -output. Otherwise, it is assumed that the user has an encoding file -and knows how to use it (@pxref{Encodings}). There is probably no good -reason to turn off this convenient feature. Default: @code{on}. - -@end table - -Next I suppose it's time to describe the search algorithm. When the -PostScript driver needs a file, whether that file be a font, a -PostScript prologue, or what you will, it searches in this manner: - -@enumerate - -@item -Constructs a path by taking the first of the following that is defined: - -@enumerate a - -@item -Environment variable @code{STAT_GROFF_FONT_PATH}. @xref{Environment -variables}. - -@item -Environment variable @code{GROFF_FONT_PATH}. - -@item -The compiled-in fallback default. -@end enumerate - -@item -Constructs a base name from concatenating, in order, the font directory, -a path separator (@samp{/} or @samp{\}), and the file to be found. A -typical base name would be something like @code{devps/ps-encodings}. - -@item -Searches for the base name in the path constructed above. If the file -is found, the algorithm terminates. - -@item -Searches for the base name in the standard configuration path. See -@ref{File locations}, for more details. If the file is found, the -algorithm terminates. - -@item -At this point we remove the font directory and path separator from the -base name. Now the base name is simply the file to be found, i.e., -@code{ps-encodings}. - -@item -Searches for the base name in the path constructed in the first step. -If the file is found, the algorithm terminates. - -@item -Searches for the base name in the standard configuration path. If the -file is found, the algorithm terminates. - -@item -The algorithm terminates unsuccessfully. -@end enumerate - -So, as you see, there are several ways to configure the PostScript -drivers. 
Careful selection of techniques can make the configuration -very flexible indeed. - -@node PS font options, PS line options, PS file options, PostScript driver class -@subsection PostScript font options - -The list of available font options is short and sweet: - -@table @code -@item prop-font=@var{font-name} - -Sets the default proportional font. The name should be that of a -PostScript font. Default: @code{"Helvetica"}. - -@item fixed-font=@var{font-name} - -Sets the default fixed-pitch font. The name should be that of a -PostScript font. Default: @code{"Courier"}. - -@item font-size=@var{font-size} - -Sets the size of the default fonts, in thousandths of a point. Default: -@code{10000}. - -@end table - -@node PS line options, Prologue, PS font options, PostScript driver class -@subsection PostScript line options - -Most tables contain lines, or rules, between cells. Some features of -the way that lines are drawn in PostScript tables are user-definable: - -@table @code - -@item line-style=@var{style} - -Sets the style used for lines used to divide tables into sections. -@var{style} must be either @code{thick}, in which case thick lines are -used, or @code{double}, in which case double lines are used. Default: -@code{thick}. - -@item line-gutter=@var{dimension} - -Sets the line gutter, which is the amount of whitespace on either side -of lines that border text or graphics objects. @xref{Dimensions}. -Default: @code{0.5pt}. - -@item line-spacing=@var{dimension} - -Sets the line spacing, which is the amount of whitespace that separates -lines that are side by side, as in a double line. Default: -@code{0.5pt}. - -@item line-width=@var{dimension} - -Sets the width of a typical line used in tables. Default: @code{0.5pt}. - -@item line-width-thick=@var{dimension} - -Sets the width of a thick line used in tables. Not used if -@code{line-style} is set to @code{thick}. Default: @code{1.5pt}.
- -@end table - -@node Prologue, Encodings, PS line options, PostScript driver class -@subsection The PostScript prologue - -Most PostScript files that are generated mechanically by programs -consist of two parts: a prologue and a body. The prologue is generally -a collection of boilerplate. Only the body differs greatly between -two outputs from the same program. - -This is also the strategy used in the PSPP PostScript driver. In -general, the prologue supplied with PSPP will be more than sufficient. -In this case, you will not need to read the rest of this section. -However, hackers might want to know more. Read on, if you fall into -this category. - -The prologue is dumped into the output stream essentially unmodified. -However, two actions are performed on its lines. First, certain lines -may be omitted as specified in the prologue file itself. Second, -variables are substituted. - -The following lines are omitted: - -@enumerate -@item -All lines that contain three bangs in a row (@code{!!!}). - -@item -Lines that contain @code{!eps}, if the PostScript driver is producing -ordinary PostScript output. Otherwise an EPS file is being produced, -and the line is included in the output, although everything following -@code{!eps} is deleted. - -@item -Lines that contain @code{!ps}, if the PostScript driver is producing EPS -output. Otherwise, ordinary PostScript is being produced, and the line -is included in the output, although everything following @code{!ps} is -deleted. -@end enumerate - -The following are the variables that are substituted. Only the -variables listed are substituted; environment variables are not. -@xref{Environment substitutions}. - -@table @code -@item bounding-box - -The page bounding box, in points, as four space-separated numbers. For -U.S. letter size paper, this is @samp{0 0 612 792}. - -@item creator - -PSPP version as a string: @samp{GNU PSPP 0.1b}, for example. - -@item date - -Date the file was created. 
Example: @samp{Tue May 21 13:46:22 1991}. - -@item data - -Value of the @code{data} PostScript driver option, as one of the strings -@samp{Clean7Bit}, @samp{Clean8Bit}, or @samp{Binary}. - -@item orientation - -Page orientation, as one of the strings @code{Portrait} or -@code{Landscape}. - -@item user - -Under multiuser OSes, the user's login name, taken either from the -environment variable @code{LOGNAME} or, if that fails, the result of the -C library function @code{getlogin()}. Defaults to @samp{nobody}. - -@item host - -System hostname as reported by @code{gethostname()}. Defaults to -@samp{nowhere}. - -@item prop-font - -Name of the default proportional font, prefixed by the word -@samp{font} and a space. Example: @samp{font Times-Roman}. - -@item fixed-font - -Name of the default fixed-pitch font, prefixed by the word @samp{font} -and a space. - -@item scale-factor - -The page scaling factor as a floating-point number. Example: -@code{1.0}. Note that this is also passed as an argument to the BP -macro. - -@item paper-length -@item paper-width - -The paper length and paper width, respectively, in thousandths of a -point. Note that these are also passed as arguments to the BP macro. - -@item left-margin -@item top-margin - -The left margin and top margin, respectively, in thousandths of a -point. Note that these are also passed as arguments to the BP macro. - -@item title - -Document title as a string. This is not the title specified in the -PSPP syntax file. A typical title is the word @samp{PSPP} followed -by the syntax file name in parentheses. Example: @samp{PSPP -()}. - -@item source-file - -PSPP syntax file name. Example: @samp{mary96/first.stat}. - -@end table - -Any other questions about the PostScript prologue can best be answered -by examining the default prologue or the PSPP source. 
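The line-omission rules for the prologue described above can be sketched as follows. This is a minimal illustration in Python, not the driver's actual code: the function name filter_prologue and the eps flag are hypothetical, lines are assumed to be passed without their newline terminators, and the text does not say what happens when a line carries both @code{!eps} and @code{!ps}, so the order of the checks here is an assumption. Variable substitution is left out because the text does not specify how variables are written in the prologue file.

```python
def filter_prologue(lines, eps):
    """Apply the prologue omission rules (hypothetical helper).

    lines: prologue lines without newline terminators.
    eps:   True when producing Encapsulated PostScript output.
    """
    # The marker for the *other* output kind drops the line; the marker
    # for the current output kind keeps the line but truncates it.
    drop, keep = ("!ps", "!eps") if eps else ("!eps", "!ps")
    out = []
    for line in lines:
        if "!!!" in line:          # three bangs in a row: always omitted
            continue
        if drop in line:           # line belongs to the other output kind
            continue
        pos = line.find(keep)
        if pos != -1:              # keep the line, delete from the marker onward
            line = line[:pos]
        out.append(line)
    return out
```

For instance, filter_prologue(["a", "b !!! c", "x !eps y", "z !ps w"], eps=False) keeps the first line unchanged, drops the second and third, and truncates the fourth at its marker.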
- -@node Encodings, , Prologue, PostScript driver class -@subsection PostScript encodings - -PostScript fonts often contain many more than 256 characters, in order -to accommodate foreign language characters and special symbols. -PostScript uses @dfn{encodings} to map these onto single-byte symbol -sets. Each font can have many different encodings applied to it. - -PSPP's PostScript driver needs to know which encoding to apply to each -font. It can determine this from the information encapsulated in the -Groff font description that it reads. However, there is an additional -problem---for efficiency, the PostScript driver needs to have a complete -list of all encodings that will be used in the entire session @emph{when -it opens the output file}. For this reason, it can't use the -information built into the fonts because it doesn't know which fonts -will be used. - -As a stopgap solution, there are two mechanisms for specifying which -encodings will be used. The first mechanism is automatic and it is the -only one that most PSPP users will ever need. The second mechanism is -manual, but it is more flexible. Either mechanism or both may be used -at one time. - -The first mechanism is activated by the @samp{auto-encode} driver option -(@pxref{PS file options}). When enabled, @samp{auto-encode} causes the -PostScript driver to include the encodings used by the default -proportional and fixed-pitch fonts (@pxref{PS font options}). Many -PSPP output files will only need these encodings. - -The second mechanism is the file specified by the @samp{encoding-file} -option (@pxref{PS file options}). If it exists, this file must consist -of lines in PSPP configuration-file format (@pxref{Configuration -files}). Each line that is not a comment should name a PostScript -encoding to include in the output. - -It is not an error if an encoding is included more than once, by either -mechanism. It will appear only once in the output. 
It is also not an -error if an encoding is included in the output but never used. It -@emph{is} an error if an encoding is used but not included by one of -these mechanisms. In this case, the built-in PostScript encoding -@samp{ISOLatin1Encoding} is substituted. - -@node ASCII driver class, HTML driver class, PostScript driver class, Configuration -@section The ASCII driver class - -The ASCII driver class produces output that can be displayed on a -terminal or output to printers. All of its options are highly -configurable. The ASCII driver has class name @samp{ascii}. - -The ASCII driver is described in further detail below. - -@menu -* ASCII output options:: Output file options. -* ASCII page options:: Page size, margins, more. -* ASCII font options:: Box character, bold & italics. -@end menu - -@node ASCII output options, ASCII page options, ASCII driver class, ASCII driver class -@subsection ASCII output options - -@table @code -@item output-file=@var{filename} - -File to which output should be sent. This can be an ordinary filename -(e.g., @code{"pspp.txt"}), a pipe filename (e.g., @code{"|lpr"}), or -stdout (@code{"-"}). Default: @code{"pspp.list"}. - -@item char-set=@var{char-set-type} - -One of @samp{ascii} or @samp{latin1}. This has no effect on output at -the present time. Default: @code{ascii}. - -@item form-feed-string=@var{form-feed-value} - -The string written to the output to cause a formfeed. See also -@code{paginate}, described below, for a related setting. Default: -@code{"\f"}. - -@item newline-string=@var{newline-value} - -The string written to the output to cause a newline (carriage return -plus linefeed). The default, which can be specified explicitly with -@code{newline-string=default}, is to use the system-dependent newline -sequence by opening the output file in text mode. This is usually the -right choice. - -However, @code{newline-string} can be set to any string. When this is -done, the output file is opened in binary mode. 
- -@item paginate=@var{boolean} - -If set, a formfeed (as set in @code{form-feed-string}, described above) -will be written to the device after every page. Default: @code{on}. - -@item tab-width=@var{tab-width-value} - -The distance between tab stops for this device. If set to 0, tabs will -not be used in the output. Default: @code{8}. - -@item init=@var{initialization-string}. - -String written to the device before anything else, at the beginning of -the output. Default: @code{""} (the empty string). - -@item done=@var{finalization-string}. - -String written to the device after everything else, at the end of the -output. Default: @code{""} (the empty string). -@end table - -@node ASCII page options, ASCII font options, ASCII output options, ASCII driver class -@subsection ASCII page options - -These options affect page setup: - -@table @code -@item headers=@var{boolean} - -If enabled, two lines of header information giving title and subtitle, -page number, date and time, and PSPP version are printed at the top of -every page. These two lines are in addition to any top margin -requested. Default: @code{on}. - -@item length=@var{line-count} - -Physical length of a page, in lines. Headers and margins are subtracted -from this value. Default: @code{66}. - -@item width=@var{character-count} - -Physical width of a page, in characters. Margins are subtracted from -this value. Default: @code{130}. - -@item lpi=@var{lines-per-inch} - -Number of lines per vertical inch. Not currently used. Default: @code{6}. - -@item cpi=@var{characters-per-inch} - -Number of characters per horizontal inch. Not currently used. Default: -@code{10}. - -@item left-margin=@var{left-margin-width} - -Width of the left margin, in characters. PSPP subtracts this value -from the page width. Default: @code{0}. - -@item right-margin=@var{right-margin-width} - -Width of the right margin, in characters. PSPP subtracts this value -from the page width. Default: @code{0}. 
- -@item top-margin=@var{top-margin-lines} - -Length of the top margin, in lines. PSPP subtracts this value from -the page length. Default: @code{2}. - -@item bottom-margin=@var{bottom-margin-lines} - -Length of the bottom margin, in lines. PSPP subtracts this value from -the page length. Default: @code{2}. - -@end table - -@node ASCII font options, , ASCII page options, ASCII driver class -@subsection ASCII font options - -These are the ASCII font options: - -@table @code -@item box[@var{line-type}]=@var{box-chars} - -The characters used for lines in tables produced by the ASCII driver can -be changed using this option. @var{line-type} is used to indicate which -type of line to change; @var{box-chars} is the character or string of -characters to use for this type of line. - -@var{line-type} must be a 4-digit number in base 4. The digits are in -the order `right', `bottom', `left', `top'. The four possibilities for -each digit are: - -@table @asis -@item 0 -No line. - -@item 1 -Single line. - -@item 2 -Double line. - -@item 3 -Special device-defined line, if one is available; otherwise, a double -line. -@end table - -Examples: - -@table @code -@item box[0101]="|" - -Sets @samp{|} as the character to use for a single-width line with -bottom and top components. - -@item box[2222]="#" - -Sets @samp{#} as the character to use for the intersection of four -double-width lines, one each from the top, bottom, left and right. - -@item box[1100]="\xda" - -Sets @samp{"\xda"}, which under MS-DOS is a box character suitable for -the top-left corner of a box, as the character for the intersection of -two single-width lines, one each from the right and bottom. 
- -@end table - -Defaults: - -@itemize @bullet -@item -@code{box[0000]=" "} - -@item -@code{box[1000]="-"} -@*@code{box[0010]="-"} -@*@code{box[1010]="-"} - -@item -@code{box[0100]="|"} -@*@code{box[0001]="|"} -@*@code{box[0101]="|"} - -@item -@code{box[2000]="="} -@*@code{box[0020]="="} -@*@code{box[2020]="="} - -@item -@code{box[0200]="#"} -@*@code{box[0002]="#"} -@*@code{box[0202]="#"} - -@item -@code{box[3000]="="} -@*@code{box[0030]="="} -@*@code{box[3030]="="} - -@item -@code{box[0300]="#"} -@*@code{box[0003]="#"} -@*@code{box[0303]="#"} - -@item -For all others, @samp{+} is used unless there are double lines or -special lines, in which case @samp{#} is used. -@end itemize - -@item italic-on=@var{italic-on-string} - -Character sequence written to turn on italics or underline printing. If -this is set to @code{overstrike}, then the driver will simulate -underlining by overstriking with underscore characters (@samp{_}) in the -manner described by @code{overstrike-style} and -@code{carriage-return-style}. Default: @code{overstrike}. - -@item italic-off=@var{italic-off-string} - -Character sequence to turn off italics or underline printing. Default: -@code{""} (the empty string). - -@item bold-on=@var{bold-on-string} - -Character sequence written to turn on bold or emphasized printing. If -set to @code{overstrike}, then the driver will simulate bold printing -by overstriking characters in the manner described by -@code{overstrike-style} and @code{carriage-return-style}. Default: -@code{overstrike}. - -@item bold-off=@var{bold-off-string} - -Character sequence to turn off bold or emphasized printing. Default: -@code{""} (the empty string). - -@item bold-italic-on=@var{bold-italic-on-string} - -Character sequence written to turn on bold-italic printing.
If set to -@code{overstrike}, then the driver will simulate bold-italics by -overstriking twice, once with the character, a second time with an -underscore (@samp{_}) character, in the manner described by -@code{overstrike-style} and @code{carriage-return-style}. Default: -@code{overstrike}. - -@item bold-italic-off=@var{bold-italic-off-string} - -Character sequence to turn off bold-italic printing. Default: @code{""} -(the empty string). - -@item overstrike-style=@var{overstrike-option} - -Either @code{single} or @code{line}: - -@itemize @bullet -@item -If @code{single} is selected, then, to overstrike a line of text, the -output driver will output a character, backspace, overstrike, output a -character, backspace, overstrike, and so on along a line. - -@item -If @code{line} is selected then the output driver will output an entire -line, then backspace or emit a carriage return (as indicated by -@code{carriage-return-style}), then overstrike the entire line at once. -@end itemize - -@code{single} is recommended for use with ttys and programs that -understand overstriking in text files, such as the pager @code{less}. -@code{single} will also work with printer devices but results in rapid -back-and-forth motions of the printhead that can cause the printer to -physically overheat! - -@code{line} is recommended for use with printer devices. Most programs -that understand overstriking in text files will not properly deal with -@code{line} mode. - -Default: @code{single}. - -@item carriage-return-style=@var{carriage-return-type} - -Either @code{bs} or @code{cr}. This option applies only when one or -more of the font commands is set to @code{overstrike} and, at the same -time, @code{overstrike-style} is set to @code{line}. - -@itemize @bullet -@item -If @code{bs} is selected then the driver will return to the beginning of -a line by emitting a sequence of backspace characters (ASCII 8). 
- -@item -If @code{cr} is selected then the driver will return to the beginning of -a line by emitting a single carriage-return character (ASCII 13). -@end itemize - -Although @code{cr} is preferred as being more compact, @code{bs} is more -general since some devices do not interpret carriage returns in the -desired manner. Default: @code{bs}. -@end table - -@node HTML driver class, Miscellaneous configuring, ASCII driver class, Configuration -@section The HTML driver class - -The @code{html} driver class is used to produce output for viewing in -tables-capable web browsers such as Emacs' w3-mode. Its configuration -is very simple. Currently, the output has a very plain format. In the -future, further work may be done on improving the output appearance. - -There are few options for use with the @code{html} driver class: - -@table @code -@item output-file=@var{filename} - -File to which output should be sent. This can be an ordinary filename -(e.g., @code{"pspp.ps"}), a pipe filename (e.g., @code{"|lpr"}), or -stdout (@code{"-"}). Default: @code{"pspp.html"}. - -@item prologue-file=@var{prologue-file-name} - -Sets the name of the HTML prologue file. You can write your own -prologue if you want to customize colors or other settings: see -@ref{HTML Prologue}. Default: @code{html-prologue}. -@end table - -@menu -* HTML Prologue:: Format of the HTML prologue file. -@end menu - -@node HTML Prologue, , HTML driver class, HTML driver class -@subsection The HTML prologue - -HTML files that are generated by PSPP consist of two parts: a prologue -and a body. The prologue is a collection of boilerplate. Only the body -differs greatly between two outputs. You can tune the colors and other -attributes of the output by editing the prologue. - -The prologue is dumped into the output stream essentially unmodified. -However, two actions are performed on its lines. First, certain lines -may be omitted as specified in the prologue file itself. Second, -variables are substituted.
- -The following lines are omitted: - -@enumerate -@item -All lines that contain three bangs in a row (@code{!!!}). - -@item -Lines that contain @code{!title}, if no title is set for the output. If -a title is set, then the characters @code{!title} are removed before the -line is output. - -@item -Lines that contain @code{!subtitle}, if no subtitle is set for the -output. If a subtitle is set, then the characters @code{!subtitle} are -removed before the line is output. -@end enumerate - -The following are the variables that are substituted. Only the -variables listed are substituted; environment variables are not. -@xref{Environment substitutions}. - -@table @code -@item generator - -PSPP version as a string: @samp{GNU PSPP 0.1b}, for example. - -@item date - -Date the file was created. Example: @samp{Tue May 21 13:46:22 1991}. - -@item user - -Under multiuser OSes, the user's login name, taken either from the -environment variable @code{LOGNAME} or, if that fails, the result of the -C library function @code{getlogin()}. Defaults to @samp{nobody}. - -@item host - -System hostname as reported by @code{gethostname()}. Defaults to -@samp{nowhere}. - -@item title - -Document title as a string. This is the title specified in the PSPP -syntax file. - -@item subtitle - -Document subtitle as a string. - -@item source-file - -PSPP syntax file name. Example: @samp{mary96/first.stat}. -@end table - -@node Miscellaneous configuring, Improving output quality, HTML driver class, Configuration -@section Miscellaneous configuration - -The following environment variables can be used to further configure -PSPP: - -@table @code -@item HOME - -Used to determine the user's home directory. No default value. - -@item STAT_INCLUDE_PATH - -Path used to find include files in PSPP syntax files. 
Defaults vary -across operating systems: - -@table @asis -@item UNIX - -@itemize @bullet -@item -@file{.} - -@item -@file{~/.pspp/include} - -@item -@file{/usr/local/lib/pspp/include} - -@item -@file{/usr/lib/pspp/include} - -@item -@file{/usr/local/share/pspp/include} - -@item -@file{/usr/share/pspp/include} -@end itemize - -@item MS-DOS - -@itemize @bullet -@item -@file{.} - -@item -@file{C:\PSPP\INCLUDE} - -@item -@file{$PATH} -@end itemize - -@item Other OSes -No default path. -@end table - -@item STAT_PAGER -@itemx PAGER - -When PSPP invokes an external pager, it uses the first of these that -is defined. There is a default pager only if the person who compiled -PSPP defined one. - -@item TERM - -The terminal type @code{termcap} or @code{ncurses} will use, if such -support was compiled into PSPP. - -@item STAT_OUTPUT_INIT_FILE - -The basename used to search for the driver definition file. -@xref{Output devices}. @xref{File locations}. Default: @code{devices}. - -@item STAT_OUTPUT_PAPERSIZE_FILE - -The basename used to search for the papersize file. @xref{papersize}. -@xref{File locations}. Default: @code{papersize}. - -@item STAT_OUTPUT_INIT_PATH - -The path used to search for the driver definition file and the papersize -file. @xref{File locations}. Default: the standard configuration path. - -@item TMPDIR - -The @code{sort} procedure stores its temporary files in this directory. -Default: (UNIX) @file{/tmp}, (MS-DOS) @file{\}, (other OSes) empty string. - -@item TEMP -@itemx TMP - -Under MS-DOS only, these variables are consulted after TMPDIR, in this -order. -@end table - -@node Improving output quality, , Miscellaneous configuring, Configuration -@section Improving output quality - -When its drivers are set up properly, PSPP can produce output that -looks very good indeed. The PostScript driver, suitably configured, can -produce presentation-quality output. Here are a few guidelines for -producing better-looking output, regardless of output driver.
Your -mileage may vary, of course, and everyone has different esthetic -preferences. - -@itemize @bullet -@item -Width is important in PSPP output. Greater output width leads to more -readable output, to a point. Try the following to increase the output -width: - -@itemize @minus -@item -If you're using the ASCII driver with a dot-matrix printer, figure out -what you need to do to put the printer into compressed mode. Put that -string into the @code{init-string} setting. Try to get 132 columns; 160 -might be better, but you might find that print that tiny is difficult to -read. - -@item -With the PostScript driver, try these ideas: - -@itemize + -@item -Landscape mode. - -@item -Legal-size (8.5" x 14") paper in landscape mode. - -@item -Reducing font sizes. If you're using 12-point fonts, try 10 point; if -you're using 10-point fonts, try 8 point. Some fonts are more readable -than others at small sizes. -@end itemize -@end itemize - -Try to strike a balance between character size and page width. - -@item -Use high-quality fonts. Many public domain fonts are poor in quality. -Recently, URW made some high-quality fonts available under the GPL. -These are probably suitable. - -@item -Be sure you're using the proper font metrics. The font metrics provided -with PSPP may not correspond to the fonts actually being printed. -This can cause bizarre-looking output. - -@item -Make sure that you're using good ink/ribbon/toner. Darker print is -easier to read. - -@item -Use plain fonts with serifs, such as Times-Roman or Palatino. Avoid -choosing italic or bold fonts as document base fonts. 
-@end itemize - -@node Invocation, Language, Configuration, Top -@chapter Invoking PSPP -@cindex invocation -@cindex PSPP, invoking - -@cindex command line, options -@cindex options, command-line -@example -pspp [ -B @var{dir} | --config-dir=@var{dir} ] [ -o @var{device} | --device=@var{device} ] - [ -d @var{var}[=@var{value}] | --define=@var{var}[=@var{value}] ] [-u @var{var} | --undef=@var{var} ] - [ -f @var{file} | --out-file=@var{file} ] [ -p | --pipe ] [ -I- | --no-include ] - [ -I @var{dir} | --include=@var{dir} ] [ -i | --interactive ] - [ -n | --edit | --dry-run | --just-print | --recon ] - [ -r | --no-statrc ] [ -h | --help ] [ -l | --list ] - [ -c @var{command} | --command @var{command} ] [ -s | --safer ] - [ --testing-mode ] [ -V | --version ] [ -v | --verbose ] - [ @var{key}=@var{value} ] @var{file}@enddots{} -@end example - -@menu -* Non-option Arguments:: Specifying syntax files and output devices. -* Configuration Options:: Change the configuration for the current run. -* Input and output options:: Controlling input and output files. -* Language control options:: Language variants. -* Informational options:: Helpful information about PSPP. -@end menu - -@node Non-option Arguments, Configuration Options, Invocation, Invocation -@section Non-option Arguments - -Syntax files and output device substitutions can be specified on -PSPP's command line: - -@table @code -@item @var{file} - -A file by itself on the command line will be executed as a syntax file. -PSPP terminates after the syntax file runs, unless the @code{-i} or -@code{--interactive} option is given (@pxref{Language control options}). - -@item @var{file1} @var{file2} - -When two or more filenames are given on the command line, the first -syntax file is executed, then PSPP's dictionary is cleared, then the second -syntax file is executed. 
- -@item @var{file1} + @var{file2} - -If syntax files' names are delimited by a plus sign (@samp{+}), then the -dictionary is not cleared between their executions, as if they were -concatenated together into a single file. - -@item @var{key}=@var{value} - -Defines an output device macro @var{key} to expand to @var{value}, -overriding any macro having the same @var{key} defined in the device -configuration file. @xref{Macro definitions}. - -@end table - -There is one other way to specify a syntax file, if your operating -system supports it. If you have a syntax file @file{foobar.stat}, put -the notation - -@example -#! /usr/local/bin/pspp -@end example - -at the top, and mark the file as executable with @code{chmod +x -foobar.stat}. (If PSPP is not installed in @file{/usr/local/bin}, -then insert its actual installation directory into the syntax file -instead.) Now you should be able to invoke the syntax file just by -typing its name. You can include any options on the command line as -usual. PSPP entirely ignores any lines beginning with @samp{#!}. - -@node Configuration Options, Input and output options, Non-option Arguments, Invocation -@section Configuration Options - -Configuration options are used to change PSPP's configuration for the -current run. The configuration options are: - -@table @code -@item -B @var{dir} -@itemx --config-dir=@var{dir} - -Sets the configuration directory to @var{dir}. @xref{File locations}. - -@item -o @var{device} -@itemx --device=@var{device} - -Selects the output device with name @var{device}. If this option is -given more than once, then all devices mentioned are selected. This -option disables all devices besides those mentioned on the command line. - -@item -d @var{var}[=@var{value}] -@itemx --define=@var{var}[=@var{value}] - -Defines an `environment variable' named @var{var} having the optional -value @var{value} specified. @xref{Variable values}. 
- -@item -u @var{var} -@itemx --undef=@var{var} - -Undefines the `environment variable' named @var{var}. @xref{Variable -values}. -@end table - -@node Input and output options, Language control options, Configuration Options, Invocation -@section Input and output options - -Input and output options affect how PSPP reads input and writes -output. These are the input and output options: - -@table @code -@item -f @var{file} -@itemx --out-file=@var{file} - -This overrides the output file name for devices designated as listing -devices. If a file named @var{file} already exists, it is overwritten. - -@item -p -@itemx --pipe - -Allows PSPP to be used as a filter by causing the syntax file to be -read from stdin and output to be written to stdout. Conflicts with the -@code{-f @var{file}} and @code{--out-file=@var{file}} options. - -@item -I- -@itemx --no-include - -Clears all directories from the include path. This includes all -directories put in the include path by default. @xref{Miscellaneous -configuring}. - -@item -I @var{dir} -@itemx --include=@var{dir} - -Appends directory @var{dir} to the path that is searched for include -files in PSPP syntax files. - -@item -c @var{command} -@itemx --command=@var{command} - -Execute literal command @var{command}. The command is executed before -startup syntax files, if any. - -@item --testing-mode - -Invoke heuristics to assist with testing PSPP. For use by @code{make -check} and similar scripts. -@end table - -@node Language control options, Informational options, Input and output options, Invocation -@section Language control options - -Language control options control how PSPP syntax files are parsed and -interpreted. The available language control options are: - -@table @code -@item -i -@itemx --interactive - -When a syntax file is specified on the command line, PSPP normally -terminates after processing it. Giving this option will cause PSPP to -bring up a command prompt after processing the syntax file.
- -In addition, this forces syntax files to be interpreted in interactive -mode, rather than the default batch mode. @xref{Tokenizing lines}, for -information on the differences between batch mode and interactive mode -command interpretation. - -@item -n -@itemx --edit -@itemx --dry-run -@itemx --just-print -@itemx --recon - -Only the syntax of any syntax file specified or of commands entered at -the command line is checked. Transformations are not performed and -procedures are not executed. Not yet implemented. - -@item -r -@itemx --no-statrc - -Prevents the execution of the PSPP startup syntax file. Not yet -implemented, as startup syntax files aren't, either. - -@item -s -@itemx --safer - -Disables certain unsafe operations. This includes the ERASE and -HOST commands, as well as use of pipes as input and output files. -@end table - -@node Informational options, , Language control options, Invocation -@section Informational options - -Informational options cause information about PSPP to be written to -the terminal. Here are the available options: - -@table @code -@item -h -@itemx --help - -Prints a message describing PSPP command-line syntax and the available -device driver classes, then terminates. - -@item -l -@itemx --list - -Lists the available device driver classes, then terminates. - -@item -V -@itemx --version - -Prints a brief message listing PSPP's version, warranties you don't -have, copying conditions and copyright, and e-mail address for bug -reports, then terminates. - -@item -v -@itemx --verbose - -Increments PSPP's verbosity level. Higher verbosity levels cause -PSPP to display greater amounts of information about what it is -doing. Often useful for debugging PSPP's configuration. - -This option can be given multiple times; the verbosity level is set to -the number of times it is given. The default verbosity level is 0, in which no informational -messages will be displayed. - -Higher verbosity levels cause messages to be displayed when the -corresponding events take place.
- -@table @asis -@item 1 - -Driver and subsystem initializations. - -@item 2 - -Completion of driver initializations. Beginning of driver closings. - -@item 3 - -Completion of driver closings. - -@item 4 - -Files searched for; success of searches. - -@item 5 - -Individual directories included in file searches. -@end table - -Each verbosity level also includes messages from lower verbosity levels. - -@end table - -@node Language, Expressions, Invocation, Top -@chapter The PSPP language -@cindex language, PSPP -@cindex PSPP, language - -@quotation -@strong{Please note:} PSPP is not even close to completion. -Only a few actual statistical procedures are implemented. PSPP -is a work in progress. -@end quotation - -This chapter discusses elements common to many PSPP commands. -Later chapters will describe individual commands in detail. - -@menu -* Tokens:: Characters combine to form tokens. -* Commands:: Tokens combine to form commands. -* Types of Commands:: Commands come in several flavors. -* Order of Commands:: Commands combine to form syntax files. -* Missing Observations:: Handling missing observations. -* Variables:: The unit of data storage. -* Files:: Files used by PSPP. -* BNF:: How command syntax is described. -@end menu - -@node Tokens, Commands, Language, Language -@section Tokens -@cindex language, lexical analysis -@cindex language, tokens -@cindex tokens -@cindex lexical analysis -@cindex lexemes - -PSPP divides most syntax file lines into series of short chunks -called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}. These -tokens are then grouped to form commands, each of which tells -PSPP to take some action---read in data, write out data, perform -a statistical procedure, etc. The process of dividing input into tokens -is @dfn{tokenization}, or @dfn{lexical analysis}. Each type of token is -described below. - -@cindex delimiters -@cindex whitespace -Tokens must be separated from each other by @dfn{delimiters}. 
-Delimiters include whitespace (spaces, tabs, carriage returns, line -feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and -operators (plus, minus, times, divide, etc.). Note that while whitespace -only separates tokens, other delimiters are tokens in themselves. - -@table @strong -@cindex identifiers -@item Identifiers -Identifiers are names that specify variable names, commands, or command -details. - -@itemize @bullet -@item -The first character in an identifier must be a letter, @samp{#}, or -@samp{@@}. Some system identifiers begin with @samp{$}, but -user-defined variables' names may not begin with @samp{$}. - -@item -The remaining characters in the identifier must be letters, digits, or -one of the following special characters: - -@example -. _ $ # @@ -@end example - -@item -@cindex variable names -@cindex names, variable -Variable names may be any length, but only the first 8 characters are -significant. - -@item -@cindex case-sensitivity -Identifiers are not case-sensitive: @code{foobar}, @code{Foobar}, -@code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different -representations of the same identifier. - -@item -@cindex keywords -Identifiers other than variable names may be abbreviated to their first -3 characters if this abbreviation is unambiguous. These identifiers are -often called @dfn{keywords}. (Unique abbreviations of 3 or more -characters are also accepted: @samp{FRE}, @samp{FREQ}, and -@samp{FREQUENCIES} are equivalent when the last is a keyword.) - -@item -Whether an identifier is a keyword depends on the context. - -@item -@cindex keywords, reserved -@cindex reserved keywords -Some keywords are reserved. These keywords may not be used in any -context besides those explicitly described in this manual. The reserved -keywords are: - -@example -ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH -@end example - -@item -Since keywords are identifiers, all the rules for identifiers apply.
-Specifically, they must be delimited as are other identifiers: -@code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid -variable name. -@end itemize - -@cindex @samp{.} -@cindex period -@cindex variable names, ending with period -@strong{Caution:} It is legal to end a variable name with a period, but -@emph{don't do it!} The variable name will be misinterpreted when it is -the final token on a line: @code{FOO.} will be divided into two separate -tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}. -@xref{Commands, , Forming commands of tokens}. - -@item Numbers -@cindex numbers -@cindex integers -@cindex reals -Numbers may be specified as integers or reals. Integers are internally -converted into reals. Scientific notation is not supported. Here are -some examples of valid numbers: - -@example -1234 3.14159265359 .707106781185 8945. -@end example - -@strong{Caution:} The last example will be interpreted as two tokens, -@samp{8945} and @samp{.}, if it is the last token on a line. - -@item Strings -@cindex strings -@cindex @samp{'} -@cindex @samp{"} -@cindex case-sensitivity -Strings are literal sequences of characters enclosed in pairs of single -quotes (@samp{'}) or double quotes (@samp{"}). - -@itemize @bullet -@item -Whitespace and case of letters @emph{are} significant inside strings. -@item -Whitespace characters inside a string are not delimiters. -@item -To include single-quote characters in a string, enclose the string in -double quotes. -@item -To include double-quote characters in a string, enclose the string in -single quotes. -@item -It is not possible to put both single- and double-quote characters -inside one string. -@end itemize - -@item Hexstrings -@cindex hexstrings -Hexstrings are string variants that use hex digits to specify -characters. - -@itemize @bullet -@item -A hexstring may be used anywhere that an ordinary string is allowed. 
- -@item -@cindex @samp{X'} -@cindex @samp{'} -A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}. - -@cindex whitespace -@item -No whitespace is allowed between the initial @samp{X} and @samp{'}. - -@item -Double quotes @samp{"} may be used in place of single quotes @samp{'} if -done in both places. - -@item -Each pair of hex digits is internally changed into a single character -with the given value. - -@item -If there is an odd number of hex digits, the missing last digit is -assumed to be @samp{0}. - -@item -@cindex portability -@strong{Please note:} Use of hexstrings is nonportable because the same -numeric values are associated with different glyphs by different -operating systems. Therefore, their use should be confined to syntax -files that will not be widely distributed. - -@item -@cindex characters, reserved -@cindex 0 -@cindex whitespace -@strong{Please note also:} The character with value 00 is reserved for -internal use by PSPP. Its use in strings causes an error and -replacement with a blank space (in ASCII, hex 20, decimal 32). -@end itemize - -@item Punctuation -@cindex punctuation -Punctuation separates tokens; punctuators are delimiters. These are the -punctuation characters: - -@example -, / = ( ) -@end example - -@item Operators -@cindex operators -Operators describe mathematical operations. Some operators are delimiters: - -@example -( ) + - * / ** -@end example - -Many of the above operators are also punctuators. Punctuators are -distinguished from operators by context. - -The other operators are all reserved keywords. None of these are -delimiters: - -@example -AND EQ GE GT LE LT NE OR -@end example - -@item Terminal Dot -@cindex terminal dot -@cindex dot, terminal -@cindex period -@cindex @samp{.} -A period (@samp{.}) at the end of a line (except for whitespace) is one -type of a @dfn{terminal dot}, although not every terminal dot is a -period at the end of a line. @xref{Commands, , Forming commands of -tokens}. 
A period is a terminal dot @emph{only} -when it is at the end of a line; otherwise it is part of a -floating-point number. (A period outside a number in the middle of a -line is an error.) - -@quotation -@cindex terminal dot, changing -@cindex dot, terminal, changing -@strong{Please note:} The character used for the @dfn{terminal dot} -can be changed with @cmd{SET}'s ENDCMD subcommand (@pxref{SET}). This -is strongly discouraged, and throughout all the remainder of this -manual it will be assumed that the default setting is in effect. -@end quotation - -@end table - -@node Commands, Types of Commands, Tokens, Language -@section Forming commands of tokens - -@cindex PSPP, command structure -@cindex language, command structure -@cindex commands, structure - -Most PSPP commands share a common structure, diagrammed below: - -@example -@var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}]. -@end example - -@cindex @samp{[ ]} -In the above, rather daunting, expression, pairs of square brackets -(@samp{[ ]}) indicate optional elements, and names such as @var{cmd} -indicate parts of the syntax that vary from command to command. -Ellipses (@samp{...}) indicate that the preceding part may be repeated -an arbitrary number of times. Let's pick apart what it says above: - -@itemize @bullet -@cindex commands, names -@item -A command begins with a command name of one or more keywords, such as -@cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF CASES}. @var{cmd} -may be abbreviated to its first word if that is unambiguous; each word -in @var{cmd} may be abbreviated to a unique prefix of three or more -characters as described above. - -@cindex subcommands -@item -The command name may be followed by one or more @dfn{subcommands}: - -@itemize @minus -@item -Each subcommand begins with a unique keyword, indicated by @var{sbc} -above. This is analogous to the command name. 
- -@item -The subcommand name is optionally followed by an equals sign (@samp{=}). - -@item -Some subcommands accept a series of one or more specifications -(@var{spec}), optionally separated by commas. - -@item -Each subcommand must be separated from the next (if any) by a forward -slash (@samp{/}). -@end itemize - -@cindex dot, terminal -@cindex terminal dot -@item -Each command must be terminated with a @dfn{terminal dot}. -The terminal dot may be given one of three ways: - -@itemize @minus -@item -(most commonly) A period character at the very end of a line, as -described above. - -@item -(only if NULLINE is on: @xref{SET, , Setting user preferences}, for more -details.) A completely blank line. - -@item -(in batch mode only) Any line that is not indented from the left side of -the page causes a terminal dot to be inserted before that line. -Therefore, each command begins with a line that is flush left, followed -by zero or more lines that are indented one or more characters from the -left margin. - -In batch mode, PSPP will ignore a plus sign, minus sign, or period -(@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a -line. Any of these characters as the first character on a line will -begin a new command. This allows for visual indentation of a command -without that command being considered part of the previous command. - -PSPP is in batch mode when it is reading input from a file, rather -than from an interactive user. Note that the other forms of the -terminal dot may also be used in batch mode. - -Sometimes, one encounters syntax files that are intended to be -interpreted in interactive mode rather than batch mode (for instance, -this can happen if a session log file is used directly as a syntax -file). When this occurs, use the @samp{-i} command line option to force -interpretation in interactive mode (@pxref{Language control options}). -@end itemize -@end itemize - -PSPP ignores empty commands when they are generated by the above -rules. 
Note that, as a consequence of these rules, each command must -begin on a new line. - -@node Types of Commands, Order of Commands, Commands, Language -@section Types of Commands - -Commands in PSPP are divided roughly into six categories: - -@table @strong -@item Utility commands -@cindex utility commands -Set or display various global options that affect PSPP operations. -May appear anywhere in a syntax file. @xref{Utilities, , Utility -commands}. - -@item File definition commands -@cindex file definition commands -Give instructions for reading data from text files or from special -binary ``system files''. Most of these commands discard any previous -data or variables to replace it with the new data and -variables. At least one must appear before the first command in any of -the categories below. @xref{Data Input and Output}. - -@item Input program commands -@cindex input program commands -Though rarely used, these provide powerful tools for reading data files -in arbitrary textual or binary formats. @xref{INPUT PROGRAM}. - -@item Transformations -@cindex transformations -Perform operations on data and write data to output files. Transformations -are not carried out until a procedure is executed. - -@item Restricted transformations -@cindex restricted transformations -Same as transformations for most purposes. @xref{Order of Commands}, for a -detailed description of the differences. - -@item Procedures -@cindex procedures -Analyze data, writing results of analyses to the listing file. Cause -transformations specified earlier in the file to be performed. In a -more general sense, a @dfn{procedure} is any command that causes the -active file (the data) to be read. -@end table - -@node Order of Commands, Missing Observations, Types of Commands, Language -@section Order of Commands -@cindex commands, ordering -@cindex order of commands - -PSPP does not place many restrictions on ordering of commands. 
-The main restriction is that variables must be defined with one of the -file-definition commands before they are otherwise referred to. - -Of course, there are specific rules, for those who are interested. -PSPP possesses five internal states, called initial, INPUT PROGRAM, -FILE TYPE, transformation, and procedure states. (Please note the -distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE} -@emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.) - -PSPP starts up in the initial state. Each successful completion -of a command may cause a state transition. Each type of command has its -own rules for state transitions: - -@table @strong -@item Utility commands -@itemize @bullet -@item -Legal in all states. -@item -Do not cause state transitions. Exception: when @cmd{N OF CASES} -is executed in the procedure state, it causes a transition to the -transformation state. -@end itemize - -@item @cmd{DATA LIST} -@itemize @bullet -@item -Legal in all states. -@item -When executed in the initial or procedure state, causes a transition to -the transformation state. -@item -Clears the active file if executed in the procedure or transformation -state. -@end itemize - -@item @cmd{INPUT PROGRAM} -@itemize @bullet -@item -Invalid in INPUT PROGRAM and FILE TYPE states. -@item -Causes a transition to the INPUT PROGRAM state. -@item -Clears the active file. -@end itemize - -@item @cmd{FILE TYPE} -@itemize @bullet -@item -Invalid in INPUT PROGRAM and FILE TYPE states. -@item -Causes a transition to the FILE TYPE state. -@item -Clears the active file. -@end itemize - -@item Other file definition commands -@itemize @bullet -@item -Invalid in INPUT PROGRAM and FILE TYPE states. -@item -Cause a transition to the transformation state. -@item -Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES}, -and @cmd{UPDATE}. -@end itemize - -@item Transformations -@itemize @bullet -@item -Invalid in initial and FILE TYPE states. 
-@item -Cause a transition to the transformation state. -@end itemize - -@item Restricted transformations -@itemize @bullet -@item -Invalid in initial, INPUT PROGRAM, and FILE TYPE states. -@item -Cause a transition to the transformation state. -@end itemize - -@item Procedures -@itemize @bullet -@item -Invalid in initial, INPUT PROGRAM, and FILE TYPE states. -@item -Cause a transition to the procedure state. -@end itemize -@end table - -@node Missing Observations, Variables, Order of Commands, Language -@section Handling missing observations -@cindex missing values -@cindex values, missing - -PSPP includes special support for unknown numeric data values. -Missing observations are assigned a special value, called the -@dfn{system-missing value}. This ``value'' actually indicates the -absence of value; it means that the actual value is unknown. Procedures -automatically exclude from analyses those observations or cases that -have missing values. Whether single observations or entire cases are -excluded depends on the procedure. - -The system-missing value exists only for numeric variables. String -variables always have a defined value, even if it is only a string of -spaces. - -Variables, whether numeric or string, can have designated -@dfn{user-missing values}. Every user-missing value is an actual value -for that variable. However, most of the time user-missing values are -treated in the same way as the system-missing value. String variables -that are wider than a certain width, usually 8 characters (depending on -computer architecture), cannot have user-missing values. - -For more information on missing values, see the following sections: -@ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}. See also the -documentation on individual procedures for information on how they -handle missing values. 
- -@node Variables, Files, Missing Observations, Language -@section Variables -@cindex variables -@cindex dictionary - -Variables are the basic unit of data storage in PSPP. All the -variables in a file taken together, apart from any associated data, are -said to form a @dfn{dictionary}. -Some details of variables are described in the sections below. - -@menu -* Attributes:: Attributes of variables. -* System Variables:: Variables automatically defined by PSPP. -* Sets of Variables:: Lists of variable names. -* Input/Output Formats:: Input and output formats. -* Scratch Variables:: Variables deleted by procedures. -@end menu - -@node Attributes, System Variables, Variables, Variables -@subsection Attributes of Variables -@cindex variables, attributes of -@cindex attributes of variables -Each variable has a number of attributes, including: - -@table @strong -@item Name -This is an identifier. Each variable must have a different name. -@xref{Tokens}. - -@cindex variables, type -@cindex type of variables -@item Type -Numeric or string. - -@cindex variables, width -@cindex width of variables -@item Width -(string variables only) String variables with a width of 8 characters or -fewer are called @dfn{short string variables}. Short string variables -can be used in many procedures where @dfn{long string variables} (those -with widths greater than 8) are not allowed. - -@quotation -@strong{Please note:} Certain systems may consider strings longer than 8 -characters to be short strings. Eight characters represents a minimum -figure for the maximum length of a short string. -@end quotation - -@item Position -Variables in the dictionary are arranged in a specific order. -@cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}. - -@item Initialization -Either reinitialized to 0 or spaces for each case, or left at its -existing value. @xref{LEAVE}. 
- -@cindex missing values -@cindex values, missing -@item Missing values -Optionally, up to three values, or a range of values, or a specific -value plus a range, can be specified as @dfn{user-missing values}. -There is also a @dfn{system-missing value} that is assigned to an -observation when there is no other obvious value for that observation. -Observations with missing values are automatically excluded from -analyses. User-missing values are actual data values, while the -system-missing value is not a value at all. @xref{Missing Observations}. - -@cindex variable labels -@cindex labels, variable -@item Variable label -A string that describes the variable. @xref{VARIABLE LABELS}. - -@cindex value labels -@cindex labels, value -@item Value label -Optionally, these associate each possible value of the variable with a -string. @xref{VALUE LABELS}. - -@cindex print format -@item Print format -Display width, format, and (for numeric variables) number of decimal -places. This attribute does not affect how data are stored, just how -they are displayed. Example: a width of 8, with 2 decimal places. -@xref{PRINT FORMATS}. - -@cindex write format -@item Write format -Similar to print format, but used by certain commands that are -designed to write to binary files. @xref{WRITE FORMATS}. -@end table - -@node System Variables, Sets of Variables, Attributes, Variables -@subsection Variables Automatically Defined by PSPP -@cindex system variables -@cindex variables, system - -There are seven system variables. These are not like ordinary -variables, as they are not stored in each case. They can only be used -in expressions. These system variables, whose values and output formats -cannot be modified, are described below. - -@table @code -@cindex @code{$CASENUM} -@item $CASENUM -Case number of the case at the moment. This changes as cases are -shuffled around. 
- -@cindex @code{$DATE} -@item $DATE -Date the PSPP process was started, in format A9, following the -pattern @code{DD MMM YY}. - -@cindex @code{$JDATE} -@item $JDATE -Number of days between 15 Oct 1582 and the time the PSPP process -was started. - -@cindex @code{$LENGTH} -@item $LENGTH -Page length, in lines, in format F11. - -@cindex @code{$SYSMIS} -@item $SYSMIS -System missing value, in format F1. - -@cindex @code{$TIME} -@item $TIME -Number of seconds between midnight 14 Oct 1582 and the time the active file -was read, in format F20. - -@cindex @code{$WIDTH} -@item $WIDTH -Page width, in characters, in format F3. -@end table - -@node Sets of Variables, Input/Output Formats, System Variables, Variables -@subsection Lists of variable names -@cindex TO convention -@cindex convention, TO - -There are several ways to specify a set of variables: - -@enumerate -@item -(Most commonly.) List the variable names one after another, optionally -separating them by commas. - -@cindex @code{TO} -@item -(This method cannot be used on commands that define the dictionary, such -as @cmd{DATA LIST}.) The syntax is the names of two existing variables, -separated by the reserved keyword @code{TO}. The meaning is to include -every variable in the dictionary between and including the variables -specified. For instance, if the dictionary contains six variables with -the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and -@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include -variables @code{X2}, @code{GOAL}, and @code{MET}. - -@item -(This method can be used only on commands that define the dictionary, -such as @cmd{DATA LIST}.) It is used to define sequences of variables -that end in consecutive integers. The syntax is two identifiers that -end in numbers. 
This method is best illustrated with examples: - -@itemize @bullet -@item -The syntax @code{X1 TO X5} defines 5 variables: - -@itemize @minus -@item -X1 -@item -X2 -@item -X3 -@item -X4 -@item -X5 -@end itemize - -@item -The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables: - -@itemize @minus -@item -ITEM0008 -@item -ITEM0009 -@item -ITEM0010 -@item -ITEM0011 -@item -ITEM0012 -@item -ITEM0013 -@end itemize - -@item -Each of the syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} -are invalid, although for different reasons, which should be evident. -@end itemize - -Note that after a set of variables has been defined with @cmd{DATA LIST} -or another command with this method, the same set can be referenced on -later commands using the same syntax. - -@item -The above methods can be combined, either one after another or delimited -by commas. For instance, the combined syntax @code{A Q5 TO Q8 X TO Z} -is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z} -is individually legal. -@end enumerate - -@node Input/Output Formats, Scratch Variables, Sets of Variables, Variables -@subsection Input and Output Formats - -Data that PSPP inputs and outputs must have one of a number of formats. -These formats are described, in general, by a format specification of -the form @code{NAMEw.d}, where @var{name} is the -format name and @var{w} is a field width. @var{d} is the optional -desired number of decimal places, if appropriate. If @var{d} is not -included then it is assumed to be 0. Some formats do not allow @var{d} -to be specified. - -When an input format is specified on @cmd{DATA LIST} or another -command, then -it is converted to an output format for the purposes of @cmd{PRINT} -and other -data output commands. For most purposes, input and output formats are -the same; the salient differences are described below. - -Below are listed the input and output formats supported by PSPP. 
If an -input format is mapped to a different output format by default, then -that mapping is indicated with @result{}. Each format has the listed -bounds on input width (iw) and output width (ow). - -The standard numeric input and output formats are given in the following -table: - -@table @asis -@item Fw.d: 1 <= iw,ow <= 40 -Standard decimal format with @var{d} decimal places. If the number is -too large to fit within the field width, it is expressed in scientific -notation (@code{1.2+34}) if w >= 6, with always at least two digits in -the exponent. When used as an input format, scientific notation is -allowed but an E or an F must be used to introduce the exponent. - -The default output format is the same as the input format, except if -@var{d} > 1. In that case the output @var{w} is always made to be at -least 2 + @var{d}. - -@item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40 -For input this is equivalent to F format except that no E or F is -required to introduce the exponent. For output, produces scientific -notation in the form @code{1.2+34}. There are always at least two -digits given in the exponent. - -The default output @var{w} is the largest of the input @var{w}, the -input @var{d} + 7, and 10. The default output @var{d} is the input -@var{d}, but at least 3. - -@item COMMAw.d: 1 <= iw,ow <= 40 -Equivalent to F format, except that groups of three digits are -comma-separated on output. If the number is too large to express in the -field width, then first commas are eliminated, then if there is still -not enough space the number is expressed in scientific notation given -that w >= 6. Commas are allowed and ignored when this is used as an -input format. - -@item DOTw.d: 1 <= iw,ow <= 40 -Equivalent to COMMA format except that the roles of comma and decimal -point are interchanged. However: If SET /DECIMAL=DOT is in effect, then -COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a -decimal point.
- -@item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40 -Equivalent to COMMA format, except that the number is prefixed by a -dollar sign (@samp{$}) if there is room. On input the value is allowed -to be prefixed by a dollar sign, which is ignored. - -The default output @var{w} is the input @var{w}, but at least 2. - -@item PCTw.d: 2 <= iw,ow <= 40 -Equivalent to F format, except that the number is suffixed by a percent -sign (@samp{%}) if there is room. On input the value is allowed to be -suffixed by a percent sign, which is ignored. - -The default output @var{w} is the input @var{w}, but at least 2. - -@item Nw.d: 1 <= iw,ow <= 40 -Only digits are allowed within the field width. The decimal point is -assumed to be @var{d} digits from the right margin. - -The default output format is F with the same @var{w} and @var{d}, except -if @var{d} > 1. In that case the output @var{w} is always made to be at -least 2 + @var{d}. - -@item Zw.d @result{} F: 1 <= iw,ow <= 40 -Zoned decimal input. If you need to use this then you know how. - -@item IBw.d @result{} F: 1 <= iw,ow <= 8 -Integer binary format. The field is interpreted as a fixed-point -positive or negative binary number in two's-complement notation. The -location of the decimal point is implied. Endianness is the same as the -host machine. - -The default output format is F8.2 if @var{d} is 0. Otherwise it is F, -with output @var{w} as 9 + input @var{d} and output @var{d} as input -@var{d}. - -@item PIB @result{} F: 1 <= iw,ow <= 8 -Positive integer binary format. The field is interpreted as a -fixed-point positive binary number. The location of the decimal point -is implied. Endianness is the same as the host machine. - -The default output format follows the rules for IB format. - -@item Pw.d @result{} F: 1 <= iw,ow <= 16 -Binary coded decimal format. Each byte from left to right, except the -rightmost, represents two digits. The upper nibble of each byte is more -significant.
The upper nibble of the final byte is the least -significant digit. The lower nibble of the final byte is the sign; a -value of D represents a negative sign and all other values are -considered positive. The decimal point is implied. - -The default output format follows the rules for IB format. - -@item PKw.d @result{} F: 1 <= iw,ow <= 16 -Positive binary coded decimal format. Same as P but the last byte is the -same as the others. - -The default output format follows the rules for IB format. - -@item RBw @result{} F: 2 <= iw,ow <= 8 - -Binary C architecture-dependent ``double'' format. For a standard -IEEE754 implementation @var{w} should be 8. - -The default output format follows the rules for IB format. - -@item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16 -PIB format encoded as textual hex digit pairs. @var{w} must be even. - -The input width is mapped to a default output width as follows: -2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14, -12@result{}16, 14@result{}18, 16@result{}21. No allowances are made for -decimal places. - -@item RBHEXw @result{} F: 4 <= iw,ow <= 16 - -RB format encoded as textual hex digit pairs. @var{w} must be even. - -The default output format is F8.2. - -@item CCAw.d: 1 <= ow <= 40 -@itemx CCBw.d: 1 <= ow <= 40 -@itemx CCCw.d: 1 <= ow <= 40 -@itemx CCDw.d: 1 <= ow <= 40 -@itemx CCEw.d: 1 <= ow <= 40 - -User-defined custom currency formats. May not be used as an input -format. @xref{SET}, for more details. -@end table - -The date and time numeric input and output formats accept a number of -possible formats. Before describing the formats themselves, some -definitions of the elements that make up their formats will be helpful: - -@table @dfn -@item leader -All formats accept an optional whitespace leader. - -@item day -An integer between 1 and 31 representing the day of month. - -@item day-count -An integer representing a number of days.
- -@item date-delimiter -One or more characters of whitespace or the following characters: -@code{- / . ,} - -@item month -A month name in one of the following forms: -@itemize @bullet -@item -An integer between 1 and 12. -@item -Roman numerals representing an integer between 1 and 12. -@item -At least the first three characters of an English month name (January, -February, @dots{}). -@end itemize - -@item year -An integer year number between 1582 and 19999, or between 1 and 199. -Years between 1 and 199 will have 1900 added. - -@item julian -A single number with a year number in the first 2, 3, or 4 digits (as -above) and the day number within the year in the last 3 digits. - -@item quarter -An integer between 1 and 4 representing a quarter. - -@item q-delimiter -The letter @samp{Q} or @samp{q}. - -@item week -An integer between 1 and 53 representing a week within a year. - -@item wk-delimiter -The letters @samp{wk} in any case. - -@item time-delimiter -At least one character of whitespace or @samp{:} or @samp{.}. - -@item hour -An integer greater than 0 representing an hour. - -@item minute -An integer between 0 and 59 representing a minute within an hour. - -@item opt-second -Optionally, a time-delimiter followed by a real number representing a -number of seconds. - -@item hour24 -An integer between 0 and 23 representing an hour within a day. - -@item weekday -At least the first two characters of an English day word. - -@item spaces -Any amount or no amount of whitespace. - -@item sign -An optional positive or negative sign. - -@item trailer -All formats accept an optional whitespace trailer. -@end table - -The date input formats are strung together from the above pieces. On -output, the date formats are always printed in a single canonical -manner, based on field width. The date input and output formats are -described below: - -@table @asis -@item DATEw: 9 <= iw,ow <= 40 -Date format.
Input format: leader + day + date-delimiter + -month + date-delimiter + year + trailer. Output format: DD-MMM-YY for -@var{w} < 11, DD-MMM-YYYY otherwise. - -@item EDATEw: 8 <= iw,ow <= 40 -European date format. Input format same as DATE. Output format: -DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise. - -@item SDATEw: 8 <= iw,ow <= 40 -Standard date format. Input format: leader + year + date-delimiter + -month + date-delimiter + day + trailer. Output format: YY/MM/DD for -@var{w} < 10, YYYY/MM/DD otherwise. - -@item ADATEw: 8 <= iw,ow <= 40 -American date format. Input format: leader + month + date-delimiter + -day + date-delimiter + year + trailer. Output format: MM/DD/YY for -@var{w} < 10, MM/DD/YYYY otherwise. - -@item JDATEw: 5 <= iw,ow <= 40 -Julian date format. Input format: leader + julian + trailer. Output -format: YYDDD for @var{w} < 7, YYYYDDD otherwise. - -@item QYRw: 4 <= iw <= 40, 6 <= ow <= 40 -Quarter/year format. Input format: leader + quarter + q-delimiter + -year + trailer. Output format: @samp{Q Q YY}, where the first -@samp{Q} is one of the digits 1, 2, 3, 4, if @var{w} < 8, @code{Q Q -YYYY} otherwise. - -@item MOYRw: 6 <= iw,ow <= 40 -Month/year format. Input format: leader + month + date-delimiter + year -+ trailer. Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM -YYYY} otherwise. - -@item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40 -Week/year format. Input format: leader + week + wk-delimiter + year + -trailer. Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK -YYYY} otherwise. - -@item DATETIMEw.d: 17 <= iw,ow <= 40 -Date and time format. Input format: leader + day + date-delimiter + -month + date-delimiter + year + time-delimiter + hour24 + time-delimiter -+ minute + opt-second. Output format: @samp{DD-MMM-YYYY HH:MM}. If -@var{w} > 19 then seconds @samp{:SS} is added. If @var{w} > 22 and -@var{d} > 0 then fractional seconds @samp{.SS} are added. - -@item TIMEw.d: 5 <= iw,ow <= 40 -Time format.
Input format: leader + sign + spaces + hour + -time-delimiter + minute + opt-second. Output format: @samp{HH:MM}. -Seconds and fractional seconds are available with @var{w} of at least 8 -and 10, respectively. - -@item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40 -Time format with day count. Input format: leader + sign + spaces + -day-count + time-delimiter + hour + time-delimiter + minute + -opt-second. Output format: @samp{DD HH:MM}. Seconds and fractional -seconds are available with @var{w} of at least 8 and 10, respectively. - -@item WKDAYw: 2 <= iw,ow <= 40 -A weekday as a number between 1 and 7, where 1 is Sunday. Input format: -leader + weekday + trailer. Output format: as many characters, in all -capital letters, of the English name of the weekday as will fit in the -field width. - -@item MONTHw: 3 <= iw,ow <= 40 -A month as a number between 1 and 12, where 1 is January. Input format: -leader + month + trailer. Output format: as many characters, in all -capital letters, of the English name of the month as will fit in the -field width. -@end table - -There are only two formats that may be used with string variables: - -@table @asis -@item Aw: 1 <= iw <= 255, 1 <= ow <= 254 -The entire field is treated as a string value. - -@item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510 -The field is composed of characters in a string encoded as textual hex -digit pairs. - -The default output @var{w} is half the input @var{w}. -@end table - -@node Scratch Variables, , Input/Output Formats, Variables -@subsection Scratch Variables - -Most of the time, variables don't retain their values between cases. -Instead, either they're being read from a data file or the active file, -in which case they assume the value read, or, if created with -@cmd{COMPUTE} or -another transformation, they're initialized to the system-missing value -or to blanks, depending on type. - -However, sometimes it's useful to have a variable that keeps its value -between cases.
You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can -use a @dfn{scratch variable}. Scratch variables are variables whose -names begin with an octothorpe (@samp{#}). - -Scratch variables have the same properties as variables left with -@cmd{LEAVE}: -they retain their values between cases, and for the first case they are -initialized to 0 or blanks. They have the additional property that they -are deleted before the execution of any procedure. For this reason, -scratch variables can't be used for analysis. To obtain the same -effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's -value into an ordinary variable, then analyze that variable. - -@node Files, BNF, Variables, Language -@section Files Used by PSPP - -PSPP makes use of many files each time it runs. Some of these it -reads, some it writes, some it creates. Here is a table listing the -most important of these files: - -@table @strong -@cindex file, command -@cindex file, syntax file -@cindex command file -@cindex syntax file -@item command file -@itemx syntax file -These names (synonyms) refer to the file that contains instructions to -PSPP that tell it what to do. The syntax file's name is specified on -the PSPP command line. Syntax files can also be pulled in with -@cmd{INCLUDE} (@pxref{INCLUDE}). - -@cindex file, data -@cindex data file -@item data file -Data files contain raw data in ASCII format suitable for being read in -by @cmd{DATA LIST}. Data can be embedded in the syntax -file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the -syntax file a data file too. - -@cindex file, output -@cindex output file -@item listing file -One or more output files are created by PSPP each time it is -run. The output files receive the tables and charts produced by -statistical procedures. The output files may be in any number of formats, -depending on how PSPP is configured.
- -@cindex active file -@cindex file, active -@item active file -The active file is the ``file'' on which all PSPP procedures -are performed. The active file contains variable definitions and -cases. The active file is not necessarily a disk file: it is stored -in memory if there is room. -@end table - -@node BNF, , Files, Language -@section Backus-Naur Form -@cindex BNF -@cindex Backus-Naur Form -@cindex command syntax, description of -@cindex description of command syntax - -The syntax of some parts of the PSPP language is presented in this -manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The -following table describes BNF: - -@itemize @bullet -@cindex keywords -@cindex terminals -@item -Words in all-uppercase are PSPP keyword tokens. In BNF, these are -often called @dfn{terminals}. There are some special terminals, which -are actually written in lowercase for clarity: - -@table @asis -@cindex @code{number} -@item @code{number} -A real number. - -@cindex @code{integer} -@item @code{integer} -An integer number. - -@cindex @code{string} -@item @code{string} -A string. - -@cindex @code{var-name} -@item @code{var-name} -A single variable name. - -@cindex operators -@cindex punctuators -@item @code{=}, @code{/}, @code{+}, @code{-}, etc. -Operators and punctuators. - -@cindex @code{.} -@cindex terminal dot -@cindex dot, terminal -@item @code{.} -The terminal dot. This is not necessarily an actual dot in the syntax -file: @xref{Commands}, for more details. -@end table - -@item -@cindex productions -@cindex nonterminals -Other words in all lowercase refer to BNF definitions, called -@dfn{productions}. These productions are also known as -@dfn{nonterminals}. Some nonterminals are very common, so they are -defined here in English for clarity: - -@table @code -@cindex @code{var-list} -@item var-list -A list of one or more variable names or the keyword @code{ALL}. - -@cindex @code{expression} -@item expression -An expression. 
@xref{Expressions}, for details. -@end table - -@item -@cindex @code{::=} -@cindex ``is defined as'' -@cindex productions -@samp{::=} means ``is defined as''. The left side of @samp{::=} gives -the name of the nonterminal being defined. The right side of @samp{::=} -gives the definition of that nonterminal. If the right side is empty, -then one possible expansion of that nonterminal is nothing. A BNF -definition is called a @dfn{production}. - -@item -@cindex terminals and nonterminals, differences -So, the key difference between a terminal and a nonterminal is that a -terminal cannot be broken into smaller parts---in fact, every terminal -is a single token (@pxref{Tokens}). On the other hand, nonterminals are -composed of a (possibly empty) sequence of terminals and nonterminals. -Thus, terminals indicate the deepest level of syntax description. (In -parsing theory, terminals are the leaves of the parse tree; nonterminals -form the branches.) - -@item -@cindex start symbol -@cindex symbol, start -The first nonterminal defined in a set of productions is called the -@dfn{start symbol}. The start symbol defines the entire syntax for -that command. -@end itemize - -@node Expressions, Data Input and Output, Language, Top -@chapter Mathematical Expressions -@cindex expressions, mathematical -@cindex mathematical expressions - -Some PSPP commands use expressions, which share a common syntax -among all PSPP commands. Expressions are made up of -@dfn{operands}, which can be numbers, strings, or variable names, -separated by @dfn{operators}. There are five types of operators: -grouping, arithmetic, logical, relational, and functions. - -Every operator takes one or more @dfn{arguments} as input and produces -or @dfn{returns} exactly one result as output. Both strings and numeric -values can be used as arguments and are produced as results, but each -operator accepts only specific combinations of numeric and string values -as arguments. 
With few exceptions, operator arguments may be -full-fledged expressions in themselves. - -@menu -* Booleans:: Boolean values. -* Missing Values in Expressions:: Using missing values in expressions. -* Grouping Operators:: ( ) -* Arithmetic Operators:: + - * / ** -* Logical Operators:: AND NOT OR -* Relational Operators:: EQ GE GT LE LT NE -* Functions:: More-sophisticated operators. -* Order of Operations:: Operator precedence. -@end menu - -@node Booleans, Missing Values in Expressions, Expressions, Expressions -@section Boolean values -@cindex Boolean -@cindex values, Boolean - -There is a third type for arguments and results, the @dfn{Boolean} type, -which is used to represent true/false conditions. Booleans have only -three possible values: 0 (false), 1 (true), and system-missing. -System-missing is neither true nor false. - -@itemize @bullet -@item -A numeric expression that has value 0, 1, or system-missing may be used -in place of a Boolean. Thus, the expression @code{0 AND 1} is valid -(although it is always false). - -@item -A numeric expression with any other value will cause an error if it is -used as a Boolean. So, @code{2 OR 3} is invalid. - -@item -A Boolean expression may not be used in place of a numeric expression. -Thus, @code{(1>2) + (3<4)} is invalid. - -@item -Strings and Booleans are not compatible, and neither may be used in -place of the other. -@end itemize - -@node Missing Values in Expressions, Grouping Operators, Booleans, Expressions -@section Missing Values in Expressions - -String missing values are not treated specially in expressions. Most -numeric operators return system-missing when given system-missing -arguments. Exceptions are listed under particular operator -descriptions. - -User-missing values for numeric variables are always transformed into -the system-missing value, except inside the arguments to the -@code{VALUE}, @code{SYSMIS}, and @code{MISSING} functions. 
- -The missing-value functions can be used to precisely control how missing -values are treated in expressions. @xref{Missing Value Functions}, for -more details. - -@node Grouping Operators, Arithmetic Operators, Missing Values in Expressions, Expressions -@section Grouping Operators -@cindex parentheses -@cindex @samp{( )} -@cindex grouping operators -@cindex operators, grouping - -Parentheses (@samp{()}) are the grouping operators. Surround an -expression with parentheses to force early evaluation. - -Parentheses also surround the arguments to functions, but in that -situation they act as punctuators, not as operators. - -@node Arithmetic Operators, Logical Operators, Grouping Operators, Expressions -@section Arithmetic Operators -@cindex operators, arithmetic -@cindex arithmetic operators - -The arithmetic operators take numeric arguments and produce numeric -results. - -@table @code -@cindex @samp{+} -@cindex addition -@item @var{a} + @var{b} -Adds @var{a} and @var{b}, returning the sum. - -@cindex @samp{-} -@cindex subtraction -@item @var{a} - @var{b} -Subtracts @var{b} from @var{a}, returning the difference. - -@cindex @samp{*} -@cindex multiplication -@item @var{a} * @var{b} -Multiplies @var{a} and @var{b}, returning the product. - -@cindex @samp{/} -@cindex division -@item @var{a} / @var{b} -Divides @var{a} by @var{b}, returning the quotient. If @var{b} is -zero, the result is system-missing. - -@cindex @samp{**} -@cindex exponentiation -@item @var{a} ** @var{b} -Returns the result of raising @var{a} to the power @var{b}. If -@var{a} is negative and @var{b} is not an integer, the result is -system-missing. The result of @code{0**0} is system-missing as well. - -@cindex @samp{-} -@cindex negation -@item - @var{a} -Reverses the sign of @var{a}. 
-@end table - -@node Logical Operators, Relational Operators, Arithmetic Operators, Expressions -@section Logical Operators -@cindex logical operators -@cindex operators, logical - -@cindex true -@cindex false -@cindex Boolean -@cindex values, system-missing -@cindex system-missing -The logical operators take logical arguments and produce logical -results, meaning ``true or false''. PSPP logical operators are -not true Boolean operators because they may also result in a -system-missing value. - -@table @code -@cindex @code{AND} -@cindex @samp{&} -@cindex intersection, logical -@cindex logical intersection -@item @var{a} AND @var{b} -@itemx @var{a} & @var{b} -True if both @var{a} and @var{b} are true. However, if one argument is -false and the other is missing, the result is false, not missing. If -both arguments are missing, the result is missing. - -@cindex @code{OR} -@cindex @samp{|} -@cindex union, logical -@cindex logical union -@item @var{a} OR @var{b} -@itemx @var{a} | @var{b} -True if at least one of @var{a} and @var{b} is true. If one argument is -true and the other is missing, the result is true, not missing. If both -arguments are missing, the result is missing. - -@cindex @code{NOT} -@cindex @samp{~} -@cindex inversion, logical -@cindex logical inversion -@item NOT @var{a} -@itemx ~ @var{a} -True if @var{a} is false. -@end table - -@node Relational Operators, Functions, Logical Operators, Expressions -@section Relational Operators - -The relational operators take numeric or string arguments and produce Boolean -results. - -Note that, with numeric arguments, PSPP does not make exact -relational tests. Instead, two numbers are considered to be equal even -if they differ by a small amount. This amount, @dfn{epsilon}, is -dependent on the PSPP configuration and determined at compile -time. (The default value is 0.000000001, or -@ifinfo -@code{10**(-9)}.) -@end ifinfo -@tex -$10 ^{-9}$.) -@end tex -Use of epsilon allows for round-off errors. 
Use of epsilon is also -idiotic, but the author is not a numeric analyst. - -Strings cannot be compared to numbers. When strings of different -lengths are compared, the shorter string is right-padded with spaces -to match the length of the longer string. - -The results of string comparisons, other than tests for equality or -inequality, are dependent on the character set in use. String -comparisons are case-sensitive. - -@table @code -@cindex equality, testing -@cindex testing for equality -@cindex @code{EQ} -@cindex @samp{=} -@item @var{a} EQ @var{b} -@itemx @var{a} = @var{b} -True if @var{a} is equal to @var{b}. - -@cindex less than or equal to -@cindex @code{LE} -@cindex @code{<=} -@item @var{a} LE @var{b} -@itemx @var{a} <= @var{b} -True if @var{a} is less than or equal to @var{b}. - -@cindex less than -@cindex @code{LT} -@cindex @code{<} -@item @var{a} LT @var{b} -@itemx @var{a} < @var{b} -True if @var{a} is less than @var{b}. - -@cindex greater than or equal to -@cindex @code{GE} -@cindex @code{>=} -@item @var{a} GE @var{b} -@itemx @var{a} >= @var{b} -True if @var{a} is greater than or equal to @var{b}. - -@cindex greater than -@cindex @code{GT} -@cindex @samp{>} -@item @var{a} GT @var{b} -@itemx @var{a} > @var{b} -True if @var{a} is greater than @var{b}. - -@cindex inequality, testing -@cindex testing for inequality -@cindex @code{NE} -@cindex @code{~=} -@cindex @code{<>} -@item @var{a} NE @var{b} -@itemx @var{a} ~= @var{b} -@itemx @var{a} <> @var{b} -True if @var{a} is not equal to @var{b}. -@end table - -@node Functions, Order of Operations, Relational Operators, Expressions -@section Functions -@cindex functions - -@cindex mathematics -@cindex operators -@cindex parentheses -@cindex @code{(} -@cindex @code{)} -@cindex names, of functions -PSPP functions provide mathematical abilities above and beyond -those possible using simple operators.
Functions have a common -syntax: each is composed of a function name followed by a left -parenthesis, one or more arguments, and a right parenthesis. Function -names are @strong{not} reserved; their names are specially treated -only when followed by a left parenthesis: @code{EXP(10)} refers to the -constant value @code{e} raised to the 10th power, but @code{EXP} by -itself refers to the value of variable EXP. - -The sections below describe each function in detail. - -@menu -* Advanced Mathematics:: EXP LG10 LN SQRT -* Miscellaneous Mathematics:: ABS MOD MOD10 RND TRUNC -* Trigonometry:: ACOS ARCOS ARSIN ARTAN ASIN ATAN COS SIN TAN -* Missing Value Functions:: MISSING NMISS NVALID SYSMIS VALUE -* Pseudo-Random Numbers:: NORMAL UNIFORM -* Set Membership:: ANY RANGE -* Statistical Functions:: CFVAR MAX MEAN MIN SD SUM VARIANCE -* String Functions:: CONCAT INDEX LENGTH LOWER LPAD LTRIM NUMBER - RINDEX RPAD RTRIM STRING SUBSTR UPCASE -* Time & Date:: CTIME.xxx DATE.xxx TIME.xxx XDATE.xxx -* Miscellaneous Functions:: LAG YRMODA -* Functions Not Implemented:: CDF.xxx CDFNORM IDF.xxx NCDF.xxx PROBIT RV.xxx -@end menu - -@node Advanced Mathematics, Miscellaneous Mathematics, Functions, Functions -@subsection Advanced Mathematical Functions -@cindex mathematics, advanced - -Advanced mathematical functions take numeric arguments and produce -numeric results. - -@deftypefn {Function} {} EXP (@var{exponent}) -Returns @i{e} (approximately 2.71828) raised to power @var{exponent}. -@end deftypefn - -@cindex logarithms -@deftypefn {Function} {} LG10 (@var{number}) -Takes the base-10 logarithm of @var{number}. If @var{number} is -not positive, the result is system-missing. -@end deftypefn - -@deftypefn {Function} {} LN (@var{number}) -Takes the base-@i{e} logarithm of @var{number}. If @var{number} is -not positive, the result is system-missing. -@end deftypefn - -@cindex square roots -@deftypefn {Function} {} SQRT (@var{number}) -Takes the square root of @var{number}. 
If @var{number} is negative, -the result is system-missing. -@end deftypefn - -@node Miscellaneous Mathematics, Trigonometry, Advanced Mathematics, Functions -@subsection Miscellaneous Mathematical Functions -@cindex mathematics, miscellaneous - -Miscellaneous mathematical functions take numeric arguments and produce -numeric results. - -@cindex absolute value -@deftypefn {Function} {} ABS (@var{number}) -Results in the absolute value of @var{number}. -@end deftypefn - -@cindex modulus -@deftypefn {Function} {} MOD (@var{numerator}, @var{denominator}) -Returns the remainder (modulus) of @var{numerator} divided by -@var{denominator}. If @var{denominator} is 0, the result is -system-missing. However, if @var{numerator} is 0 and -@var{denominator} is system-missing, the result is 0. -@end deftypefn - -@cindex modulus, by 10 -@deftypefn {Function} {} MOD10 (@var{number}) -Returns the remainder when @var{number} is divided by 10. If -@var{number} is negative, MOD10(@var{number}) is negative or zero. -@end deftypefn - -@cindex rounding -@deftypefn {Function} {} RND (@var{number}) -Takes the absolute value of @var{number} and rounds it to an integer. -Then, if @var{number} was negative originally, negates the result. -@end deftypefn - -@cindex truncation -@deftypefn {Function} {} TRUNC (@var{number}) -Discards the fractional part of @var{number}; that is, rounds -@var{number} towards zero. -@end deftypefn - -@node Trigonometry, Missing Value Functions, Miscellaneous Mathematics, Functions -@subsection Trigonometric Functions -@cindex trigonometry - -Trigonometric functions take numeric arguments and produce numeric -results. - -@cindex arccosine -@cindex inverse cosine -@deftypefn {Function} {} ACOS (@var{number}) -@deftypefnx {Function} {} ARCOS (@var{number}) -Takes the arccosine, in radians, of @var{number}. Results in -system-missing if @var{number} is not between -1 and 1. Portability: -none. 
-@end deftypefn - -@cindex arcsine -@cindex inverse sine -@deftypefn {Function} {} ARSIN (@var{number}) -Takes the arcsine, in radians, of @var{number}. Results in -system-missing if @var{number} is not between -1 and 1 inclusive. -@end deftypefn - -@cindex arctangent -@cindex inverse tangent -@deftypefn {Function} {} ARTAN (@var{number}) -Takes the arctangent, in radians, of @var{number}. -@end deftypefn - -@cindex arcsine -@cindex inverse sine -@deftypefn {Function} {} ASIN (@var{number}) -Takes the arcsine, in radians, of @var{number}. Results in -system-missing if @var{number} is not between -1 and 1 inclusive. -Portability: none. -@end deftypefn - -@cindex arctangent -@cindex inverse tangent -@deftypefn {Function} {} ATAN (@var{number}) -Takes the arctangent, in radians, of @var{number}. -@end deftypefn - -@quotation -@strong{Please note:} Use of the AR* group of inverse trigonometric -functions is recommended over the A* group because they are more -portable. -@end quotation - -@cindex cosine -@deftypefn {Function} {} COS (@var{angle}) -Takes the cosine of @var{angle}, which should be in radians. -@end deftypefn - -@cindex sine -@deftypefn {Function} {} SIN (@var{angle}) -Takes the sine of @var{angle}, which should be in radians. -@end deftypefn - -@cindex tangent -@deftypefn {Function} {} TAN (@var{angle}) -Takes the tangent of @var{angle}, which should be in radians. -Results in system-missing at values -of @var{angle} that are too close to odd multiples of pi/2. -Portability: none. -@end deftypefn - -@node Missing Value Functions, Pseudo-Random Numbers, Trigonometry, Functions -@subsection Missing-Value Functions -@cindex missing values -@cindex values, missing -@cindex functions, missing-value - -Missing-value functions take various types as arguments, returning -various types of results. - -@deftypefn {Function} {} MISSING (@var{variable or expression}) -The argument may be a single variable name or an expression.
If it is a -variable name, results in 1 if the variable has a user-missing or -system-missing value for the current case, 0 otherwise. If it is an -expression, results in 1 if the expression has the system-missing value, -0 otherwise. - -@quotation -@strong{Please note:} If the argument is a string expression other than -a variable name, MISSING is guaranteed to return 0, because strings do -not have a system-missing value. Also, when using a numeric expression -argument, remember that user-missing values are converted to the -system-missing value in most contexts. Thus, the expressions -@code{MISSING(VAR1 @var{op} VAR2)} and @code{MISSING(VAR1) OR -MISSING(VAR2)} are often equivalent, depending on the specific operator -@var{op} used. -@end quotation -@end deftypefn - -@deftypefn {Function} {} NMISS (@var{expr} [, @var{expr}]@dots{}) -Each argument must be a numeric expression. Returns the number of -user- or system-missing values in the list. As a special extension, -the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a -range of variables; see @ref{Sets of Variables}, for more details. -@end deftypefn - -@deftypefn {Function} {} NVALID (@var{expr} [, @var{expr}]@dots{}) -Each argument must be a numeric expression. Returns the number of -values in the list that are not user- or system-missing. As a special extension, -the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a -range of variables; see @ref{Sets of Variables}, for more details. -@end deftypefn - -@deftypefn {Function} {} SYSMIS (@var{variable or expression}) -When given the name of a numeric variable, returns 1 if the value of -that variable is system-missing. Otherwise, if the value is not -missing or if it is user-missing, returns 0. If given the name of a -string variable, always returns 1. If given an expression other than -a single variable name, results in 1 if the value is system- or -user-missing, 0 otherwise. 
-@end deftypefn - -@deftypefn {Function} {} VALUE (@var{variable}) -Prevents the user-missing values of @var{variable} from being -transformed into system-missing values: If @var{variable} is not -system- or user-missing, results in the value of @var{variable}. If -@var{variable} is user-missing, results in the value of @var{variable} -anyway. If @var{variable} is system-missing, results in system-missing. -@end deftypefn - -@node Pseudo-Random Numbers, Set Membership, Missing Value Functions, Functions -@subsection Pseudo-Random Number Generation Functions -@cindex random numbers -@cindex pseudo-random numbers (see random numbers) - -Pseudo-random number generation functions take numeric arguments and -produce numeric results. - -PSPP uses the alleged RC4 cipher as a pseudo-random number generator -(PRNG). The bytes output by this PRNG are system-independent for a -given random seed, but differences in endianness and floating-point -formats will make PRNG results differ from system to system. RC4 -should produce high-quality random numbers for simulation purposes. -(If you're concerned about the quality of the random number generator, -well, you're using a statistical processing package---analyze it!) - -PSPP's implementation of RC4 has not undergone any security auditing. -Furthermore, various precautions that would be necessary for secure -operation, such as secure seeding and discarding the first several -bytes of output, have not been taken. Therefore, PSPP's -implementation of RC4 should not be used for security purposes. - -@cindex random numbers, normally-distributed -@deftypefn {Function} {} NORMAL (@var{number}) -Results in a random number. Results from @code{NORMAL} are normally -distributed with a mean of 0 and a standard deviation of @var{number}. -@end deftypefn - -@cindex random numbers, uniformly-distributed -@deftypefn {Function} {} UNIFORM (@var{number}) -Results in a random number between 0 and @var{number}. 
Results from -@code{UNIFORM} are evenly distributed across its entire range. There -may be a maximum on the largest random number ever generated---this is -often -@ifinfo -2**31-1 -@end ifinfo -@tex -$2^{31}-1$ -@end tex -(2,147,483,647), but it may be orders of magnitude -higher or lower. -@end deftypefn - -@node Set Membership, Statistical Functions, Pseudo-Random Numbers, Functions -@subsection Set-Membership Functions -@cindex set membership -@cindex membership, of set - -Set membership functions determine whether a value is a member of a set. -They take a set of numeric arguments or a set of string arguments, and -produce Boolean results. - -String comparisons are performed according to the rules given in -@ref{Relational Operators}. - -@deftypefn {Function} {} ANY (@var{value}, @var{set} [, @var{set}]@dots{}) -Results in true if @var{value} is equal to any of the @var{set} -values. Otherwise, results in false. If @var{value} is -system-missing, returns system-missing. System-missing values in -@var{set} do not cause ANY to return system-missing. -@end deftypefn - -@deftypefn {Function} {} RANGE (@var{value}, @var{low}, @var{high} [, @var{low}, @var{high}]@dots{}) -Results in true if @var{value} is in any of the intervals bounded by -@var{low} and @var{high} inclusive. Otherwise, results in false. -Each @var{low} must be less than or equal to its corresponding -@var{high} value. @var{low} and @var{high} must be given in pairs. -If @var{value} is system-missing, returns system-missing. -System-missing values in @var{low} or @var{high} do not cause RANGE to return -system-missing. -@end deftypefn - -@node Statistical Functions, String Functions, Set Membership, Functions -@subsection Statistical Functions -@cindex functions, statistical -@cindex statistics - -Statistical functions compute descriptive statistics on a list of -values. Some statistics can be computed on numeric or string values; -others can only be computed on numeric values.
Their results have the -same type as their arguments. The current case's weighting factor -(@pxref{WEIGHT}) has no effect on statistical functions. - -@cindex arguments, minimum valid -@cindex minimum valid number of arguments -With statistical functions it is possible to specify a minimum number of -non-missing arguments for the function to be evaluated. To do so, -append a dot and the number to the function name. For instance, to -specify a minimum of three valid arguments to the MEAN function, use the -name @code{MEAN.3}. - -@cindex coefficient of variation -@cindex variation, coefficient of -@deftypefn {Function} {} CFVAR (@var{number}, @var{number}[, @dots{}]) -Results in the coefficient of variation of the values of @var{number}. -This function requires at least two valid arguments to give a -non-missing result. (The coefficient of variation is the standard -deviation divided by the mean.) -@end deftypefn - -@cindex maximum -@deftypefn {Function} {} MAX (@var{value}, @var{value}[, @dots{}]) -Results in the value of the greatest @var{value}. The @var{value}s may -be numeric or string. Although at least two arguments must be given, -only one need be valid for MAX to give a non-missing result. -@end deftypefn - -@cindex mean -@deftypefn {Function} {} MEAN (@var{number}, @var{number}[, @dots{}]) -Results in the mean of the values of @var{number}. Although at least -two arguments must be given, only one need be valid for MEAN to give a -non-missing result. -@end deftypefn - -@cindex minimum -@deftypefn {Function} {} MIN (@var{value}, @var{value}[, @dots{}]) -Results in the value of the least @var{value}. The @var{value}s may -be numeric or string. Although at least two arguments must be given, -only one need be valid for MIN to give a non-missing result. -@end deftypefn - -@cindex standard deviation -@cindex deviation, standard -@deftypefn {Function} {} SD (@var{number}, @var{number}[, @dots{}]) -Results in the standard deviation of the values of @var{number}.
-This function requires at least two valid arguments to give a -non-missing result. -@end deftypefn - -@cindex sum -@deftypefn {Function} {} SUM (@var{number}, @var{number}[, @dots{}]) -Results in the sum of the values of @var{number}. Although at least two -arguments must be given, only one need be valid for SUM to give a -non-missing result. -@end deftypefn - -@cindex variance -@deftypefn {Function} {} VAR (@var{number}, @var{number}[, @dots{}]) -Results in the variance of the values of @var{number}. This function -requires at least two valid arguments to give a non-missing result. -@end deftypefn - -@deftypefn {Function} {} VARIANCE (@var{number}, @var{number}[, @dots{}]) -Results in the variance of the values of @var{number}. This function -requires at least two valid arguments to give a non-missing result. -(Use VAR in preference to VARIANCE for reasons of portability.) -@end deftypefn - -@node String Functions, Time & Date, Statistical Functions, Functions -@subsection String Functions -@cindex functions, string -@cindex string functions - -String functions take various arguments and return various results. - -@cindex concatenation -@cindex strings, concatenation of -@deftypefn {Function} {} CONCAT (@var{string}, @var{string}[, @dots{}]) -Returns a string consisting of each @var{string} in sequence. -@code{CONCAT("abc", "def", "ghi")} has a value of @code{"abcdefghi"}. -The resultant string is truncated to a maximum of 255 characters. -@end deftypefn - -@cindex searching strings -@deftypefn {Function} {} INDEX (@var{haystack}, @var{needle}) -Returns a positive integer indicating the position of the first -occurrence of @var{needle} in @var{haystack}. Returns 0 if @var{haystack} -does not contain @var{needle}. Returns system-missing if @var{needle} -is an empty string. -@end deftypefn - -@deftypefn {Function} {} INDEX (@var{haystack}, @var{needle}, @var{divisor}) -Divides @var{needle} into parts, each with length @var{divisor}.
-Searches @var{haystack} for the first occurrence of each part, and -returns the smallest value. Returns 0 if @var{haystack} does not -contain any part in @var{needle}. It is an error if @var{divisor} -cannot be evenly divided into the length of @var{needle}. Returns -system-missing if @var{needle} is an empty string. -@end deftypefn - -@cindex strings, finding length of -@deftypefn {Function} {} LENGTH (@var{string}) -Returns the number of characters in @var{string}. -@end deftypefn - -@cindex strings, case of -@deftypefn {Function} {} LOWER (@var{string}) -Returns a string identical to @var{string} except that all uppercase -letters are changed to lowercase letters. The definitions of -``uppercase'' and ``lowercase'' are system-dependent. -@end deftypefn - -@cindex strings, padding -@deftypefn {Function} {} LPAD (@var{string}, @var{length}) -If @var{string} is at least @var{length} characters in length, returns -@var{string} unchanged. Otherwise, returns @var{string} padded with -spaces on the left side to length @var{length}. Returns an empty string -if @var{length} is system-missing, negative, or greater than 255. -@end deftypefn - -@deftypefn {Function} {} LPAD (@var{string}, @var{length}, @var{padding}) -If @var{string} is at least @var{length} characters in length, returns -@var{string} unchanged. Otherwise, returns @var{string} padded with -@var{padding} on the left side to length @var{length}. Returns an empty -string if @var{length} is system-missing, negative, or greater than 255, or -if @var{padding} does not contain exactly one character. -@end deftypefn - -@cindex strings, trimming -@cindex whitespace, trimming -@deftypefn {Function} {} LTRIM (@var{string}) -Returns @var{string}, after removing leading spaces. Other whitespace, -such as tabs, carriage returns, line feeds, and vertical tabs, is not -removed. 
-@end deftypefn - -@deftypefn {Function} {} LTRIM (@var{string}, @var{padding}) -Returns @var{string}, after removing leading @var{padding} characters. -If @var{padding} does not contain exactly one character, returns an -empty string. -@end deftypefn - -@cindex numbers, converting from strings -@cindex strings, converting to numbers -@deftypefn {Function} {} NUMBER (@var{string}) -Returns the number produced when @var{string} is interpreted according -to format F@var{x}.0, where @var{x} is the number of characters in -@var{string}. If @var{string} does not form a proper number, -system-missing is returned without an error message. Portability: none. -@end deftypefn - -@deftypefn {Function} {} NUMBER (@var{string}, @var{format}) -Returns the number produced when @var{string} is interpreted according -to format specifier @var{format}. Only the number of characters in -@var{string} specified by @var{format} are examined. For example, -@code{NUMBER("123", F3.0)} and @code{NUMBER("1234", F3.0)} both have -value 123. If @var{string} does not form a proper number, -system-missing is returned without an error message. -@end deftypefn - -@cindex strings, searching backwards -@deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle}) -Returns a positive integer indicating the position of the last -occurrence of @var{needle} in @var{haystack}. Returns 0 if -@var{haystack} does not contain @var{needle}. Returns system-missing if -@var{needle} is an empty string. -@end deftypefn - -@deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle}, @var{divisor}) -Divides @var{needle} into parts, each with length @var{divisor}. -Searches @var{haystack} for the last occurrence of each part, and -returns the largest value. Returns 0 if @var{haystack} does not contain -any part in @var{needle}. It is an error if @var{divisor} cannot be -evenly divided into the length of @var{needle}. Returns system-missing -if @var{needle} is an empty string.
-@end deftypefn - -@cindex padding strings -@cindex strings, padding -@deftypefn {Function} {} RPAD (@var{string}, @var{length}) -If @var{string} is at least @var{length} characters in length, returns -@var{string} unchanged. Otherwise, returns @var{string} padded with -spaces on the right to length @var{length}. Returns an empty string if -@var{length} is system-missing, negative, or greater than 255. -@end deftypefn - -@deftypefn {Function} {} RPAD (@var{string}, @var{length}, @var{padding}) -If @var{string} is at least @var{length} characters in length, returns -@var{string} unchanged. Otherwise, returns @var{string} padded with -@var{padding} on the right to length @var{length}. Returns an empty -string if @var{length} is system-missing, negative, or greater than 255, -or if @var{padding} does not contain exactly one character. -@end deftypefn - -@cindex strings, trimming -@cindex whitespace, trimming -@deftypefn {Function} {} RTRIM (@var{string}) -Returns @var{string}, after removing trailing spaces. Other types of -whitespace are not removed. -@end deftypefn - -@deftypefn {Function} {} RTRIM (@var{string}, @var{padding}) -Returns @var{string}, after removing trailing @var{padding} characters. -If @var{padding} does not contain exactly one character, returns an -empty string. -@end deftypefn - -@cindex strings, converting from numbers -@cindex numbers, converting to strings -@deftypefn {Function} {} STRING (@var{number}, @var{format}) -Returns a string corresponding to @var{number} in the format given by -format specifier @var{format}. For example, @code{STRING(123.56, F5.1)} -has the value @code{"123.6"}. -@end deftypefn - -@cindex substrings -@cindex strings, taking substrings of -@deftypefn {Function} {} SUBSTR (@var{string}, @var{start}) -Returns a string consisting of the value of @var{string} from position -@var{start} onward. 
Returns an empty string if @var{start} is system-missing -or has a value less than 1 or greater than the number of characters in -@var{string}. -@end deftypefn - -@deftypefn {Function} {} SUBSTR (@var{string}, @var{start}, @var{count}) -Returns a string consisting of the first @var{count} characters from -@var{string} beginning at position @var{start}. Returns an empty string -if @var{start} or @var{count} is system-missing, if @var{start} is less -than 1 or greater than the number of characters in @var{string}, or if -@var{count} is less than 1. Returns a string shorter than @var{count} -characters if @var{start} + @var{count} - 1 is greater than the number -of characters in @var{string}. Examples: @code{SUBSTR("abcdefg", 3, 2)} -has value @code{"cd"}; @code{SUBSTR("Ben Pfaff", 5, 10)} has the value -@code{"Pfaff"}. -@end deftypefn - -@cindex case conversion -@cindex strings, case of -@deftypefn {Function} {} UPCASE (@var{string}) -Returns @var{string}, changing lowercase letters to uppercase letters. -@end deftypefn - -@node Time & Date, Miscellaneous Functions, String Functions, Functions -@subsection Time & Date Functions -@cindex functions, time & date -@cindex times -@cindex dates - -@cindex dates, legal range of -The legal range of dates for use in PSPP is 15 Oct 1582 -through 31 Dec 19999. - -@cindex arguments, invalid -@cindex invalid arguments -@quotation -@strong{Please note:} Most time & date extraction functions will accept -invalid arguments: - -@itemize @bullet -@item -Negative numbers in PSPP time format. -@item -Numbers less than 86,400 in PSPP date format. -@end itemize - -However, sensible results are not guaranteed for these invalid values. -The given equivalents for these functions are definitely not guaranteed -for invalid values. -@end quotation - -@quotation -@strong{Please note also:} The time & date construction -functions @strong{do} produce reasonable and useful results for -out-of-range values; these are not considered invalid. 
-@end quotation - -@menu -* Time & Date Concepts:: How times & dates are defined and represented -* Time Construction:: TIME.@{DAYS HMS@} -* Time Extraction:: CTIME.@{DAYS HOURS MINUTES SECONDS@} -* Date Construction:: DATE.@{DMY MDY MOYR QYR WKYR YRDAY@} -* Date Extraction:: XDATE.@{DATE HOUR JDAY MDAY MINUTE MONTH - QUARTER SECOND TDAY TIME WEEK - WKDAY YEAR@} -@end menu - -@node Time & Date Concepts, Time Construction, Time & Date, Time & Date -@subsubsection How times & dates are defined and represented - -@cindex time, concepts -@cindex time, intervals -Times and dates are handled by PSPP as single numbers. A -@dfn{time} is an interval. PSPP measures times in seconds. -Thus, the following intervals correspond with the numeric values given: - -@example - 10 minutes 600 - 1 hour 3,600 - 1 day, 3 hours, 10 seconds 97,210 - 40 days 3,456,000 - 10010 d, 14 min, 24 s 864,864,864 -@end example - -@cindex dates, concepts -@cindex time, instants of -A @dfn{date}, on the other hand, is a particular instant in the past or -the future. PSPP represents a date as a number of seconds after the -midnight that separated 8 Oct 1582 and 9 Oct 1582. (Please note that 15 -Oct 1582 immediately followed 9 Oct 1582.) Thus, the midnights before -the dates given below correspond with the numeric PSPP dates given: - -@example - 15 Oct 1582 86,400 - 4 Jul 1776 6,113,318,400 - 1 Jan 1900 10,010,390,400 - 1 Oct 1978 12,495,427,200 - 24 Aug 1995 13,028,601,600 -@end example - -@cindex time, mathematical properties of -@cindex mathematics, applied to times & dates -@cindex dates, mathematical properties of -@noindent -Please note: - -@itemize @bullet -@item -A time may be added to, or subtracted from, a date, resulting in a date. - -@item -The difference of two dates may be taken, resulting in a time. - -@item -Two times may be added to, or subtracted from, each other, resulting in -a time. -@end itemize - -(Adding two dates does not produce a useful result.) 
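The two tables above can be checked with a short sketch. This is only an illustration in Python, not part of PSPP: the helper names @code{interval_seconds} and @code{date_seconds} are hypothetical, and the epoch is an assumption, taken here as the midnight that begins 14 Oct 1582 in Python's proleptic Gregorian calendar, because that choice reproduces the listed values.

```python
from datetime import date

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day

# Assumption: epoch chosen as the start of 14 Oct 1582 (proleptic Gregorian),
# which makes 15 Oct 1582 come out as 86,400, matching the table above.
ASSUMED_EPOCH = date(1582, 10, 14)


def interval_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Hypothetical helper: length of an interval (a 'time') in seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def date_seconds(year, month, day):
    """Hypothetical helper: seconds from the assumed epoch to the midnight
    before the given day (a 'date')."""
    return (date(year, month, day) - ASSUMED_EPOCH).days * SECONDS_PER_DAY


# A time: 1 day, 3 hours, 10 seconds.
print(interval_seconds(days=1, hours=3, seconds=10))

# A date: the midnight before 1 Jan 1900.
print(date_seconds(1900, 1, 1))

# The difference of two dates is a time (an interval in seconds).
print(date_seconds(1995, 8, 24) - date_seconds(1978, 10, 1))
```

Because both kinds of value are plain counts of seconds, subtracting one date from another yields an interval, and adding an interval to a date yields another date.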
- -Since times and dates are merely numbers, the ordinary addition and -subtraction operators are employed for these purposes. - -@quotation -@strong{Please note:} Many dates and times have extremely large -values---just look at the values above. Thus, it is not a good idea to -take powers of these values; also, the accuracy of some procedures may -be affected. If necessary, convert times or dates in seconds to some -other unit, like days or years, before performing analysis. -@end quotation - -@node Time Construction, Time Extraction, Time & Date Concepts, Time & Date -@subsubsection Functions that Produce Times -@cindex times, constructing -@cindex constructing times - -These functions take numeric arguments and produce numeric results in -PSPP time format. - -@cindex days -@cindex time, in days -@deftypefn {Function} {} TIME.DAYS (@var{ndays}) -Results in a time value corresponding to @var{ndays} days. -(@code{TIME.DAYS(@var{x})} is equivalent to @code{@var{x} * 60 * 60 * -24}.) -@end deftypefn - -@cindex hours-minutes-seconds -@cindex time, in hours-minutes-seconds -@deftypefn {Function} {} TIME.HMS (@var{nhours}, @var{nmins}, @var{nsecs}) -Results in a time value corresponding to @var{nhours} hours, @var{nmins} -minutes, and @var{nsecs} seconds. (@code{TIME.HMS(@var{h}, @var{m}, -@var{s})} is equivalent to @code{@var{h}*60*60 + @var{m}*60 + -@var{s}}.) -@end deftypefn - -@node Time Extraction, Date Construction, Time Construction, Time & Date -@subsubsection Functions that Examine Times -@cindex extraction, of time -@cindex time examination -@cindex examination, of times -@cindex time, lengths of - -These functions take numeric arguments in PSPP time format and -give numeric results. - -@cindex days -@cindex time, in days -@deftypefn {Function} {} CTIME.DAYS (@var{time}) -Results in the number of days and fractional days in @var{time}. -(@code{CTIME.DAYS(@var{x})} is equivalent to @code{@var{x}/60/60/24}.) 
-@end deftypefn - -@cindex hours -@cindex time, in hours -@deftypefn {Function} {} CTIME.HOURS (@var{time}) -Results in the number of hours and fractional hours in @var{time}. -(@code{CTIME.HOURS(@var{x})} is equivalent to @code{@var{x}/60/60}.) -@end deftypefn - -@cindex minutes -@cindex time, in minutes -@deftypefn {Function} {} CTIME.MINUTES (@var{time}) -Results in the number of minutes and fractional minutes in @var{time}. -(@code{CTIME.MINUTES(@var{x})} is equivalent to @code{@var{x}/60}.) -@end deftypefn - -@cindex seconds -@cindex time, in seconds -@deftypefn {Function} {} CTIME.SECONDS (@var{time}) -Results in the number of seconds and fractional seconds in @var{time}. -(@code{CTIME.SECONDS} does nothing; @code{CTIME.SECONDS(@var{x})} is -equivalent to @code{@var{x}}.) -@end deftypefn - -@node Date Construction, Date Extraction, Time Extraction, Time & Date -@subsubsection Functions that Produce Dates -@cindex dates, constructing -@cindex constructing dates - -@cindex arguments, of date construction functions -These functions take numeric arguments and give numeric results in the -PSPP date format. Arguments taken by these functions are: - -@table @var -@item day -Refers to a day of the month between 1 and 31. - -@item month -Refers to a month of the year between 1 and 12. - -@item quarter -Refers to a quarter of the year between 1 and 4. The quarters of the -year begin on the first days of months 1, 4, 7, and 10. - -@item week -Refers to a week of the year between 1 and 53. - -@item yday -Refers to a day of the year between 1 and 366. - -@item year -Refers to a year between 1582 and 19999. -@end table - -@cindex arguments, invalid -If these functions' arguments are out-of-range, they are correctly -normalized before conversion to date format. Non-integers are rounded -toward zero. 
- -@cindex day-month-year -@cindex dates, day-month-year -@deftypefn {Function} {} DATE.DMY (@var{day}, @var{month}, @var{year}) -@deftypefnx {Function} {} DATE.MDY (@var{month}, @var{day}, @var{year}) -Results in a date value corresponding to the midnight before day -@var{day} of month @var{month} of year @var{year}. -@end deftypefn - -@cindex month-year -@cindex dates, month-year -@deftypefn {Function} {} DATE.MOYR (@var{month}, @var{year}) -Results in a date value corresponding to the midnight before the first -day of month @var{month} of year @var{year}. -@end deftypefn - -@cindex quarter-year -@cindex dates, quarter-year -@deftypefn {Function} {} DATE.QYR (@var{quarter}, @var{year}) -Results in a date value corresponding to the midnight before the first -day of quarter @var{quarter} of year @var{year}. -@end deftypefn - -@cindex week-year -@cindex dates, week-year -@deftypefn {Function} {} DATE.WKYR (@var{week}, @var{year}) -Results in a date value corresponding to the midnight before the first -day of week @var{week} of year @var{year}. -@end deftypefn - -@cindex year-day -@cindex dates, year-day -@deftypefn {Function} {} DATE.YRDAY (@var{year}, @var{yday}) -Results in a date value corresponding to the midnight before day -@var{yday} of year @var{year}. -@end deftypefn - -@node Date Extraction, , Date Construction, Time & Date -@subsubsection Functions that Examine Dates -@cindex extraction, of dates -@cindex date examination - -@cindex arguments, of date extraction functions -These functions take numeric arguments in PSPP date or time -format and give numeric results. These names are used for arguments: - -@table @var -@item date -A numeric value in PSPP date format. - -@item time -A numeric value in PSPP time format. - -@item time-or-date -A numeric value in PSPP time or date format. 
-@end table - -@cindex days -@cindex dates, in days -@cindex time, in days -@deftypefn {Function} {} XDATE.DATE (@var{time-or-date}) -For a time, results in the time corresponding to the number of whole -days @var{time-or-date} includes. For a date, results in the date -corresponding to the latest midnight at or before @var{time-or-date}; -that is, gives the date that @var{time-or-date} is in. -(XDATE.DATE(@var{x}) is equivalent to TRUNC(@var{x}/86400)*86400.) -Applying this function to a time is a non-portable feature. -@end deftypefn - -@cindex hours -@cindex dates, in hours -@cindex time, in hours -@deftypefn {Function} {} XDATE.HOUR (@var{time-or-date}) -For a time, results in the number of whole hours beyond the number of -whole days represented by @var{time-or-date}. For a date, results in -the hour (as an integer between 0 and 23) corresponding to -@var{time-or-date}. (XDATE.HOUR(@var{x}) is equivalent to -MOD(TRUNC(@var{x}/3600),24).) Applying this function to a time is a -non-portable feature. -@end deftypefn - -@cindex day of the year -@cindex dates, day of the year -@deftypefn {Function} {} XDATE.JDAY (@var{date}) -Results in the day of the year (as an integer between 1 and 366) -corresponding to @var{date}. -@end deftypefn - -@cindex day of the month -@cindex dates, day of the month -@deftypefn {Function} {} XDATE.MDAY (@var{date}) -Results in the day of the month (as an integer between 1 and 31) -corresponding to @var{date}. -@end deftypefn - -@cindex minutes -@cindex dates, in minutes -@cindex time, in minutes -@deftypefn {Function} {} XDATE.MINUTE (@var{time-or-date}) -Results in the number of minutes (as an integer between 0 and 59) after -the last hour in @var{time-or-date}. (XDATE.MINUTE(@var{x}) is -equivalent to MOD(TRUNC(@var{x}/60),60).) Applying this function to a -time is a non-portable feature.
-@end deftypefn - -@cindex months -@cindex dates, in months -@deftypefn {Function} {} XDATE.MONTH (@var{date}) -Results in the month of the year (as an integer between 1 and 12) -corresponding to @var{date}. -@end deftypefn - -@cindex quarters -@cindex dates, in quarters -@deftypefn {Function} {} XDATE.QUARTER (@var{date}) -Results in the quarter of the year (as an integer between 1 and 4) -corresponding to @var{date}. -@end deftypefn - -@cindex seconds -@cindex dates, in seconds -@cindex time, in seconds -@deftypefn {Function} {} XDATE.SECOND (@var{time-or-date}) -Results in the number of whole seconds after the last whole minute (as -an integer between 0 and 59) in @var{time-or-date}. -(XDATE.SECOND(@var{x}) is equivalent to MOD(@var{x}, 60).) Applying -this function to a time is a non-portable feature. -@end deftypefn - -@cindex days -@cindex times, in days -@deftypefn {Function} {} XDATE.TDAY (@var{time}) -Results in the number of whole days (as an integer) in @var{time}. -(XDATE.TDAY(@var{x}) is equivalent to TRUNC(@var{x}/86400).) -@end deftypefn - -@cindex time -@cindex dates, time of day -@deftypefn {Function} {} XDATE.TIME (@var{date}) -Results in the time of day at the instant corresponding to @var{date}, -in PSPP time format. This is the number of seconds since -midnight on the day corresponding to @var{date}. (XDATE.TIME(@var{x}) is -equivalent to MOD(@var{x}, 86400).) -@end deftypefn - -@cindex week -@cindex dates, in weeks -@deftypefn {Function} {} XDATE.WEEK (@var{date}) -Results in the week of the year (as an integer between 1 and 53) -corresponding to @var{date}. -@end deftypefn - -@cindex day of the week -@cindex weekday -@cindex dates, day of the week -@cindex dates, in weekdays -@deftypefn {Function} {} XDATE.WKDAY (@var{date}) -Results in the day of week (as an integer between 1 and 7) corresponding -to @var{date}.
 The days of the week are: - -@table @asis -@item 1 -Sunday -@item 2 -Monday -@item 3 -Tuesday -@item 4 -Wednesday -@item 5 -Thursday -@item 6 -Friday -@item 7 -Saturday -@end table -@end deftypefn - -@cindex years -@cindex dates, in years -@deftypefn {Function} {} XDATE.YEAR (@var{date}) -Results in the year (as an integer between 1582 and 19999) corresponding to -@var{date}. -@end deftypefn - -@node Miscellaneous Functions, Functions Not Implemented, Time & Date, Functions -@subsection Miscellaneous Functions -@cindex functions, miscellaneous - -Miscellaneous functions take various arguments and produce various -results. - -@cindex cross-case function -@cindex function, cross-case -@deftypefn {Function} {} LAG (@var{variable}) -@var{variable} must be a numeric or string variable name. @code{LAG} -results in the value of that variable for the case before the current -one. In case-selection procedures, @code{LAG} results in the value of -the variable for the last case selected. It results in system-missing (for -numeric variables) or blanks (for string variables) for the first case -or before any cases are selected. -@end deftypefn - -@deftypefn {Function} {} LAG (@var{variable}, @var{ncases}) -@var{variable} must be a numeric or string variable name. @var{ncases} -must be a small positive constant integer, although there is no explicit -limit. (Use of a large value for @var{ncases} will increase memory -consumption, since PSPP must keep @var{ncases} cases in memory.) -@code{LAG (@var{variable}, @var{ncases})} results in the value of -@var{variable} that is @var{ncases} cases before the case currently being -processed. See @code{LAG (@var{variable})} above for more details. -@end deftypefn - -@cindex date, Julian -@cindex Julian date -@deftypefn {Function} {} YRMODA (@var{year}, @var{month}, @var{day}) -@var{year} is a year between 0 and 199 or 1582 and 19999. @var{month} is -a month between 1 and 12. @var{day} is a day between 1 and 31.
If -@var{month} or @var{day} is out-of-range, it changes the next higher -unit. For instance, a @var{day} of 0 refers to the last day of the -previous month, and a @var{month} of 13 refers to the first month of the -next year. @var{year} must be in range. If @var{year} is between 0 and -199, 1900 is added. @var{year}, @var{month}, and @var{day} must all be -integers. - -@code{YRMODA} results in the number of days between 15 Oct 1582 and -the date specified, plus one. The date passed to @code{YRMODA} must be -on or after 15 Oct 1582. 15 Oct 1582 has a value of 1. -@end deftypefn - -@node Functions Not Implemented, , Miscellaneous Functions, Functions -@subsection Functions Not Implemented -@cindex functions, not implemented -@cindex not implemented -@cindex features, not implemented - -These functions are not yet implemented and thus not yet documented, -since it's a hassle. - -@findex CDF.xxx -@findex CDFNORM -@findex IDF.xxx -@findex NCDF.xxx -@findex PROBIT -@findex RV.xxx - -@itemize @bullet -@item -@code{CDF.xxx} -@item -@code{CDFNORM} -@item -@code{IDF.xxx} -@item -@code{NCDF.xxx} -@item -@code{PROBIT} -@item -@code{RV.xxx} -@end itemize - -@node Order of Operations, , Functions, Expressions -@section Operator Precedence -@cindex operator precedence -@cindex precedence, operator -@cindex order of operations -@cindex operations, order of - -The following table describes operator precedence. Smaller-numbered -levels in the table have higher precedence. Within a level, operations -are performed from left to right, except for level 2 (exponentiation), -where operations are performed from right to left. If an operator -appears in the table in two places (@code{-}), the first occurrence is -unary, the second is binary. 
 - -@enumerate -@item -@code{( )} -@item -@code{**} -@item -@code{-} -@item -@code{* /} -@item -@code{+ -} -@item -@code{EQ GE GT LE LT NE} -@item -@code{AND NOT OR} -@end enumerate - -@node Data Input and Output, System and Portable Files, Expressions, Top -@chapter Data Input and Output -@cindex input -@cindex output -@cindex data -@cindex cases -@cindex observations - -Data are the focus of the PSPP language. -Each datum belongs to a @dfn{case} (also called an @dfn{observation}). -Each case represents an individual or `experimental unit'. -For example, in the results of a survey, the names of the respondents, -their sex, age, @i{etc}., and their responses are all data, and the data -pertaining to a single respondent is a case. -This chapter examines -the PSPP commands for defining variables and reading and writing data. - -@quotation -@strong{Please note:} Data is not actually read until a procedure is -executed. These commands tell PSPP how to read data, but they -do not @emph{cause} PSPP to read data. -@end quotation - -@menu -* BEGIN DATA:: Embed data within a syntax file. -* CLEAR TRANSFORMATIONS:: Clear pending transformations. -* DATA LIST:: Fundamental data reading command. -* END CASE:: Output the current case. -* END FILE:: Terminate the current input program. -* FILE HANDLE:: Support for fixed-length records. -* INPUT PROGRAM:: Support for complex input programs. -* LIST:: List cases in the active file. -* MATRIX DATA:: Read matrices in text format. -* NEW FILE:: Clear the active file and dictionary. -* PRINT:: Display values in print formats. -* PRINT EJECT:: Eject the current page then print. -* PRINT SPACE:: Print blank lines. -* REREAD:: Take another look at the previous input line. -* REPEATING DATA:: Multiple cases on a single line. -* WRITE:: Display values in write formats.
-@end menu - -@node BEGIN DATA, CLEAR TRANSFORMATIONS, Data Input and Output, Data Input and Output -@section BEGIN DATA -@vindex BEGIN DATA -@vindex END DATA -@cindex Embedding data in syntax files -@cindex Data, embedding in syntax files - -@display -BEGIN DATA. -@dots{} -END DATA. -@end display - -@cmd{BEGIN DATA} and @cmd{END DATA} can be used to embed raw ASCII -data in a PSPP syntax file. @cmd{DATA LIST} or another input -procedure must be used before @cmd{BEGIN DATA} (@pxref{DATA LIST}). -@cmd{BEGIN DATA} and @cmd{END DATA} must be used together. @cmd{END -DATA} must appear by itself on a single line, with no leading -whitespace and exactly one space between the words @code{END} and -@code{DATA}, followed immediately by the terminal dot, like this: - -@example -END DATA. -@end example - -@node CLEAR TRANSFORMATIONS, DATA LIST, BEGIN DATA, Data Input and Output -@section CLEAR TRANSFORMATIONS -@vindex CLEAR TRANSFORMATIONS - -@display -CLEAR TRANSFORMATIONS. -@end display - -@cmd{CLEAR TRANSFORMATIONS} clears out all pending -transformations. It does not cancel the current input program. It is -valid only when PSPP is interactive, not in syntax files. - -@node DATA LIST, END CASE, CLEAR TRANSFORMATIONS, Data Input and Output -@section DATA LIST -@vindex DATA LIST -@cindex reading data from a file -@cindex data, reading from a file -@cindex data, embedding in syntax files -@cindex embedding data in syntax files - -Used to read text or binary data, @cmd{DATA LIST} is the most -fundamental data-reading command. Even the more sophisticated input -methods use @cmd{DATA LIST} commands as a building block. -Understanding @cmd{DATA LIST} is important to understanding how to use -PSPP to read your data files. - -There are two major variants of @cmd{DATA LIST}, which are fixed -format and free format. In addition, free format has a minor variant, -list format, which is discussed in terms of its differences from vanilla -free format. 
- -Each form of @cmd{DATA LIST} is described in detail below. - -@menu -* DATA LIST FIXED:: Fixed columnar locations for data. -* DATA LIST FREE:: Any spacing you like. -* DATA LIST LIST:: Each case must be on a single line. -@end menu - -@node DATA LIST FIXED, DATA LIST FREE, DATA LIST, DATA LIST -@subsection DATA LIST FIXED -@vindex DATA LIST FIXED -@cindex reading fixed-format data -@cindex fixed-format data, reading -@cindex data, fixed-format, reading -@cindex embedding fixed-format data - -@display -DATA LIST [FIXED] - @{TABLE,NOTABLE@} - FILE='filename' - RECORDS=record_count - END=end_var - /[line_no] var_spec@dots{} - -where each var_spec takes one of the forms - var_list start-end [type_spec] - var_list (fortran_spec) -@end display - -@cmd{DATA LIST FIXED} is used to read data files that have values at fixed -positions on each line of single-line or multiline records. The -keyword FIXED is optional. - -The FILE subcommand must be used if input is to be taken from an -external file. It may be used to specify a filename as a string or a -file handle (@pxref{FILE HANDLE}). If the FILE subcommand is not used, -then input is assumed to be specified within the command file using -@cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}). - -The optional RECORDS subcommand, which takes a single integer as an -argument, is used to specify the number of lines per record. If RECORDS -is not specified, then the number of lines per record is calculated from -the list of variable specifications later in @cmd{DATA LIST}. - -The END subcommand is only useful in conjunction with @cmd{INPUT -PROGRAM}. @xref{INPUT PROGRAM}, for details. - -@cmd{DATA LIST} can optionally output a table describing how the data file -will be read. The TABLE subcommand enables this output, and NOTABLE -disables it. The default is to output the table. - -The list of variables to be read from the data list must come last. -Each line in the data record is introduced by a slash (@samp{/}). 
-Optionally, a line number may follow the slash. Any number -of variable specifications may follow. - -Each variable specification consists of a list of variable names -followed by a description of their location on the input line. Sets of -variables may be specified using the @code{DATA LIST} TO convention -(@pxref{Sets of -Variables}). There are two ways to specify the location of the variable -on the line: PSPP style and FORTRAN style. - -With PSPP style, the starting column and ending column for the field -are specified after the variable name, separated by a dash (@samp{-}). -For instance, the third through fifth columns on a line would be -specified @samp{3-5}. By default, variables are considered to be in -@samp{F} format (@pxref{Input/Output Formats}). (This default can be -changed; see @ref{SET} for more information.) - -When using PSPP style, to use a variable format other than the default, -specify the format type in parentheses after the column numbers. For -instance, for alphanumeric @samp{A} format, use @samp{(A)}. - -In addition, implied decimal places can be specified in parentheses -after the column numbers. As an example, suppose that a data file has a -field in which the characters @samp{1234} should be interpreted as -having the value 12.34. Then this field has two implied decimal places, -and the corresponding specification would be @samp{(2)}. If a field -that has implied decimal places contains a decimal point, then the -implied decimal places are not applied. - -Changing the variable format and adding implied decimal places can be -done together; for instance, @samp{(N,5)}. - -When using PSPP style, the input and output width of each variable is -computed from the field width. The field width must be evenly divisible -by the number of variables specified. - -FORTRAN style is an altogether different approach to specifying field -locations.
With this approach, a list of variable input format -specifications, separated by commas, are placed after the variable names -inside parentheses. Each format specifier advances as many characters -into the input line as it uses. - -In addition to the standard format specifiers (@pxref{Input/Output -Formats}), FORTRAN style defines some extensions: - -@table @asis -@item @code{X} -Advance the current column on this line by one character position. - -@item @code{T}@var{x} -Set the current column on this line to column @var{x}, with column -numbers considered to begin with 1 at the left margin. - -@item @code{NEWREC}@var{x} -Skip forward @var{x} lines in the current record, resetting the active -column to the left margin. - -@item Repeat count -Any format specifier may be preceded by a number. This causes the -action of that format specifier to be repeated the specified number of -times. - -@item (@var{spec1}, @dots{}, @var{specN}) -Group the given specifiers together. This is most useful when preceded -by a repeat count. Groups may be nested arbitrarily. -@end table - -FORTRAN and PSPP styles may be freely intermixed. PSPP style leaves the -active column immediately after the ending column specified. Record -motion using @code{NEWREC} in FORTRAN style also applies to later -FORTRAN and PSPP specifiers. - -@menu -* DATA LIST FIXED Examples:: Examples of DATA LIST FIXED. -@end menu - -@node DATA LIST FIXED Examples, , DATA LIST FIXED, DATA LIST FIXED -@unnumberedsubsubsec Examples - -@enumerate -@item -@example -DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1). - -BEGIN DATA. -John Smith 102311 -Bob Arnold 122015 -Bill Yates 918 6 -END DATA. -@end example - -Defines the following variables: - -@itemize @bullet -@item -@code{NAME}, a 10-character-wide long string variable, in columns 1 -through 10. - -@item -@code{INFO1}, a numeric variable, in columns 12 through 13. - -@item -@code{INFO2}, a numeric variable, in columns 14 through 15. 
- -@item -@code{INFO3}, a numeric variable, in columns 16 through 17. -@end itemize - -The @code{BEGIN DATA}/@code{END DATA} commands cause three cases to be -defined: - -@example -Case NAME INFO1 INFO2 INFO3 - 1 John Smith 10 23 11 - 2 Bob Arnold 12 20 15 - 3 Bill Yates 9 18 6 -@end example - -The @code{TABLE} keyword causes PSPP to print out a table -describing the four variables defined. - -@item -@example -DAT LIS FIL="survey.dat" - /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A) - /Q01 TO Q50 7-56 - /. -@end example - -Defines the following variables: - -@itemize @bullet -@item -@code{ID}, a numeric variable, in columns 1-5 of the first record. - -@item -@code{NAME}, a 30-character long string variable, in columns 7-36 of the -first record. - -@item -@code{SURNAME}, a 30-character long string variable, in columns 38-67 of -the first record. - -@item -@code{MINITIAL}, a 1-character short string variable, in column 69 of -the first record. - -@item -Fifty variables @code{Q01}, @code{Q02}, @code{Q03}, @dots{}, @code{Q49}, -@code{Q50}, all numeric, @code{Q01} in column 7, @code{Q02} in column 8, -@dots{}, @code{Q49} in column 55, @code{Q50} in column 56, all in the second -record. -@end itemize - -Cases are separated by a blank record. - -Data is read from file @file{survey.dat} in the current directory. - -This example shows keywords abbreviated to their first 3 letters. - -@end enumerate - -@node DATA LIST FREE, DATA LIST LIST, DATA LIST FIXED, DATA LIST -@subsection DATA LIST FREE -@vindex DATA LIST FREE - -@display -DATA LIST FREE - [@{NOTABLE,TABLE@}] - FILE='filename' - END=end_var - /var_spec@dots{} - -where each var_spec takes one of the forms - var_list [(type_spec)] - var_list * -@end display - -In free format, the input data is structured as a series of comma- or -whitespace-delimited fields (end of line is one form of whitespace; it -is not treated specially). 
Field contents may be surrounded by matched -pairs of apostrophes (@samp{'}) or quotes (@samp{"}), or they may be -unenclosed. For any type of field leading white space (up to the -apostrophe or quote, if any) is not included in the field. - -Multiple consecutive delimiters are equivalent to a single delimiter. -To specify an empty field, write an empty set of single or double -quotes; for instance, @samp{""}. - -The NOTABLE and TABLE subcommands are as in @cmd{DATA LIST FIXED} above. -NOTABLE is the default. - -The FILE and END subcommands are as in @cmd{DATA LIST FIXED} above. - -The variables to be parsed are given as a single list of variable names. -This list must be introduced by a single slash (@samp{/}). The set of -variable names may contain format specifications in parentheses -(@pxref{Input/Output Formats}). Format specifications apply to all -variables back to the previous parenthesized format specification. - -In addition, an asterisk may be used to indicate that all variables -preceding it are to have input/output format @samp{F8.0}. - -Specified field widths are ignored on input, although all normal limits -on field width apply, but they are honored on output. - -@node DATA LIST LIST, , DATA LIST FREE, DATA LIST -@subsection DATA LIST LIST -@vindex DATA LIST LIST - -@display -DATA LIST LIST - [@{NOTABLE,TABLE@}] - FILE='filename' - END=end_var - /var_spec@dots{} - -where each var_spec takes one of the forms - var_list [(type_spec)] - var_list * -@end display - -With one exception, @cmd{DATA LIST LIST} is syntactically and -semantically equivalent to @cmd{DATA LIST FREE}. The exception is -that each input line is expected to correspond to exactly one input -record. If more or fewer fields are found on an input line than -expected, an appropriate diagnostic is issued. - -@node END CASE, END FILE, DATA LIST, Data Input and Output -@section END CASE -@vindex END CASE - -@display -END CASE. 
-@end display - -@cmd{END CASE} is used only within @cmd{INPUT PROGRAM} to output the -current case. @xref{INPUT PROGRAM}, for details. - -@node END FILE, FILE HANDLE, END CASE, Data Input and Output -@section END FILE -@vindex END FILE - -@display -END FILE. -@end display - -@cmd{END FILE} is used only within @cmd{INPUT PROGRAM} to terminate -the current input program. @xref{INPUT PROGRAM}. - -@node FILE HANDLE, INPUT PROGRAM, END FILE, Data Input and Output -@section FILE HANDLE -@vindex FILE HANDLE - -@display -FILE HANDLE handle_name - /NAME='filename' - /RECFORM=@{VARIABLE,FIXED,SPANNED@} - /LRECL=rec_len - /MODE=@{CHARACTER,IMAGE,BINARY,MULTIPUNCH,360@} -@end display - -Use @cmd{FILE HANDLE} to define the attributes of a file that does -not use conventional variable-length records terminated by newline -characters. - -Specify the file handle name as an identifier. Any given identifier may -only appear once in a PSPP run. File handles may not be reassigned to a -different file. The file handle name must immediately follow the @cmd{FILE -HANDLE} command name. - -The NAME subcommand specifies the name of the file associated with the -handle. It is the only required subcommand. - -The RECFORM subcommand specifies how the file is laid out. VARIABLE -specifies variable-length lines terminated with newlines, and it is the -default. FIXED specifies fixed-length records. SPANNED is not -supported. - -LRECL specifies the length of fixed-length records. It is required if -@code{/RECFORM FIXED} is specified. - -MODE specifies a file mode. CHARACTER, the default, causes the data -file to be opened in ANSI C text mode. BINARY causes the data file to -be opened in ANSI C binary mode. The other possibilities are not -supported. - -@node INPUT PROGRAM, LIST, FILE HANDLE, Data Input and Output -@section INPUT PROGRAM -@vindex INPUT PROGRAM - -@display -INPUT PROGRAM. -@dots{} input commands @dots{} -END INPUT PROGRAM. 
-@end display - -@cmd{INPUT PROGRAM}@dots{}@cmd{END INPUT PROGRAM} specifies a -complex input program. By placing data input commands within @cmd{INPUT -PROGRAM}, PSPP programs can take advantage of more complex file -structures than available with only @cmd{DATA LIST}. - -The first sort of extended input program is to simply put multiple @cmd{DATA -LIST} commands within the @cmd{INPUT PROGRAM}. This will cause all of -the data -files to be read in parallel. Input will stop when end of file is -reached on any of the data files. - -Transformations, such as conditional and looping constructs, can also be -included within @cmd{INPUT PROGRAM}. These can be used to combine input -from several data files in more complex ways. However, input will still -stop when end of file is reached on any of the data files. - -To prevent @cmd{INPUT PROGRAM} from terminating at the first end of -file, use -the END subcommand on @cmd{DATA LIST}. This subcommand takes a -variable name, -which should be a numeric scratch variable (@pxref{Scratch Variables}). -(It need not be a scratch variable but otherwise the results can be -surprising.) The value of this variable is set to 0 when reading the -data file, or 1 when end of file is encountered. - -Two additional commands are useful in conjunction with @cmd{INPUT PROGRAM}. -@cmd{END CASE} is the first. Normally each loop through the -@cmd{INPUT PROGRAM} -structure produces one case. @cmd{END CASE} controls exactly -when cases are output. When @cmd{END CASE} is used, looping from the end of -@cmd{INPUT PROGRAM} to the beginning does not cause a case to be output. - -@cmd{END FILE} is the second. When the END subcommand is used on @cmd{DATA -LIST}, there is no way for the @cmd{INPUT PROGRAM} construct to stop -looping, -so an infinite loop results. @cmd{END FILE}, when executed, -stops the flow of input data and passes out of the @cmd{INPUT PROGRAM} -structure. - -All this is very confusing. A few examples should help to clarify. 
- -@example -INPUT PROGRAM. - DATA LIST NOTABLE FILE='a.data'/X 1-10. - DATA LIST NOTABLE FILE='b.data'/Y 1-10. -END INPUT PROGRAM. -LIST. -@end example - -The example above reads variable X from file @file{a.data} and variable -Y from file @file{b.data}. If one file is shorter than the other then -the extra data in the longer file is ignored. - -@example -INPUT PROGRAM. - NUMERIC #A #B. - - DO IF NOT #A. - DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10. - END IF. - DO IF NOT #B. - DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10. - END IF. - DO IF #A AND #B. - END FILE. - END IF. - END CASE. -END INPUT PROGRAM. -LIST. -@end example - -The above example reads variable X from @file{a.data} and variable Y from -@file{b.data}. If one file is shorter than the other then the missing -field is set to the system-missing value alongside the present value for -the remaining length of the longer file. - -@example -INPUT PROGRAM. - NUMERIC #A #B. - - DO IF #A. - DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10. - DO IF #B. - END FILE. - ELSE. - END CASE. - END IF. - ELSE. - DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10. - DO IF NOT #A. - END CASE. - END IF. - END IF. -END INPUT PROGRAM. -LIST. -@end example - -The above example reads data from file @file{a.data}, then from -@file{b.data}, and concatenates them into a single active file. - -@example -INPUT PROGRAM. - NUMERIC #EOF. - - LOOP IF NOT #EOF. - DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10. - DO IF NOT #EOF. - END CASE. - END IF. - END LOOP. - - COMPUTE #EOF = 0. - LOOP IF NOT #EOF. - DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10. - DO IF NOT #EOF. - END CASE. - END IF. - END LOOP. - - END FILE. -END INPUT PROGRAM. -LIST. -@end example - -The above example does the same thing as the previous example, in a -different way. - -@example -INPUT PROGRAM. - LOOP #I=1 TO 50. - COMPUTE X=UNIFORM(10). - END CASE. - END LOOP. - END FILE. -END INPUT PROGRAM. -LIST/FORMAT=NUMBERED. 
-@end example - -The above example causes an active file to be created consisting of 50 -random variates between 0 and 10. - -@node LIST, MATRIX DATA, INPUT PROGRAM, Data Input and Output -@section LIST -@vindex LIST - -@display -LIST - /VARIABLES=var_list - /CASES=FROM start_index TO end_index BY incr_index - /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@} - @{NOWEIGHT,WEIGHT@} -@end display - -The @cmd{LIST} procedure prints the values of specified variables to the -listing file. - -The VARIABLES subcommand specifies the variables whose values are to be -printed. Keyword VARIABLES is optional. If VARIABLES subcommand is not -specified then all variables in the active file are printed. - -The CASES subcommand can be used to specify a subset of cases to be -printed. Specify FROM and the case number of the first case to print, -TO and the case number of the last case to print, and BY and the number -of cases to advance between printing cases, or any subset of those -settings. If CASES is not specified then all cases are printed. - -The FORMAT subcommand can be used to change the output format. NUMBERED -will print case numbers along with each case; UNNUMBERED, the default, -causes the case numbers to be omitted. The WRAP and SINGLE settings are -currently not used. WEIGHT will cause case weights to be printed along -with variable values; NOWEIGHT, the default, causes case weights to be -omitted from the output. - -Case numbers start from 1. They are counted after all transformations -have been considered. - -@cmd{LIST} attempts to fit all the values on a single line. If needed -to make them fit, variable names are displayed vertically. If values -cannot fit on a single line, then a multi-line format will be used. - -@cmd{LIST} is a procedure. It causes the data to be read. 
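-
-As a minimal sketch of the subcommands described above, the following
-syntax should list every second case among the first ten, with case
-numbers. The file @file{a.data} and the variable @code{X} are
-assumptions borrowed from the @cmd{INPUT PROGRAM} examples; they are
-illustrative, not required names.
-
-@example
-DATA LIST NOTABLE FILE='a.data'/X 1-10.
-LIST /VARIABLES=X
-     /CASES=FROM 1 TO 10 BY 2
-     /FORMAT=NUMBERED.
-@end example
-
-Because @cmd{LIST} is a procedure, no separate @cmd{EXECUTE} should be
-needed to make PSPP read the data.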
 - -@node MATRIX DATA, NEW FILE, LIST, Data Input and Output -@section MATRIX DATA -@vindex MATRIX DATA - -@display -MATRIX DATA - /VARIABLES=var_list - /FILE='filename' - /FORMAT=@{LIST,FREE@} @{LOWER,UPPER,FULL@} @{DIAGONAL,NODIAGONAL@} - /SPLIT=@{new_var,var_list@} - /FACTORS=var_list - /CELLS=n_cells - /N=n - /CONTENTS=@{N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE, - DFE,MAT,COV,CORR,PROX@} -@end display - -The @cmd{MATRIX DATA} command reads square matrices in one of several textual -formats. @cmd{MATRIX DATA} clears the dictionary, replaces it, and -reads a -data file. - -Use VARIABLES to specify the variables that form the rows and columns of -the matrices. You may not specify a variable named @code{VARNAME_}. You -should specify VARIABLES first. - -Specify the file to read on FILE, either as a file name string or a file -handle (@pxref{FILE HANDLE}). If FILE is not specified then matrix data -must immediately follow @cmd{MATRIX DATA} with a @cmd{BEGIN -DATA}@dots{}@cmd{END DATA} -construct (@pxref{BEGIN DATA}). - -The FORMAT subcommand specifies how the matrices are formatted. LIST, -the default, indicates that there is one line per row of matrix data; -FREE allows single matrix rows to be broken across multiple lines. This -is analogous to the difference between @cmd{DATA LIST FREE} and -@cmd{DATA LIST LIST} -(@pxref{DATA LIST}). LOWER, the default, indicates that the lower -triangle of the matrix is given; UPPER indicates the upper triangle; and -FULL indicates that the entire matrix is given. DIAGONAL, the default, -indicates that the diagonal is part of the data; NODIAGONAL indicates -that it is omitted. DIAGONAL/NODIAGONAL have no effect when FULL is -specified. - -The SPLIT subcommand is used to specify @cmd{SPLIT FILE} variables for the -input matrices (@pxref{SPLIT FILE}). Specify either a single variable -not specified on VARIABLES, or one or more variables that are specified -on VARIABLES.
In the former case, the SPLIT values are not present in -the data and ROWTYPE_ may not be specified on VARIABLES. In the latter -case, the SPLIT values are present in the data. - -Specify a list of factor variables on FACTORS. Factor variables must -also be listed on VARIABLES. Factor variables are used when there are -some variables where, for each possible combination of their values, -statistics on the matrix variables are included in the data. - -If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the -CELLS subcommand is required. Specify the number of factor variable -combinations that are given. For instance, if factor variable A has 2 -values and factor variable B has 3 values, specify 6. - -The N subcommand specifies a population number of observations. When N -is specified, one N record is output for each @cmd{SPLIT FILE}. - -Use CONTENTS to specify what sort of information the matrices include. -Each possible option is described in more detail below. When ROWTYPE_ -is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS -is not specified then /CONTENTS=CORR is assumed. - -@table @asis -@item N -@item N_VECTOR -Number of observations as a vector, one value for each variable. -@item N_SCALAR -Number of observations as a single value. -@item N_MATRIX -Matrix of counts. -@item MEAN -Vector of means. -@item STDDEV -Vector of standard deviations. -@item COUNT -Vector of counts. -@item MSE -Vector of mean squared errors. -@item DFE -Vector of degrees of freedom. -@item MAT -Generic matrix. -@item COV -Covariance matrix. -@item CORR -Correlation matrix. -@item PROX -Proximities matrix. -@end table - -The exact semantics of the matrices read by @cmd{MATRIX DATA} are complex. -Right now @cmd{MATRIX DATA} isn't too useful due to a lack of procedures -accepting or producing related data, so these semantics aren't -documented. Later, they'll be described here in detail. 
 - -@node NEW FILE, PRINT, MATRIX DATA, Data Input and Output -@section NEW FILE -@vindex NEW FILE - -@display -NEW FILE. -@end display - -The @cmd{NEW FILE} command clears the current active file. - -@node PRINT, PRINT EJECT, NEW FILE, Data Input and Output -@section PRINT -@vindex PRINT - -@display -PRINT - OUTFILE='filename' - RECORDS=n_lines - @{NOTABLE,TABLE@} - /[line_no] arg@dots{} - -arg takes one of the following forms: - 'string' [start-end] - var_list start-end [type_spec] - var_list (fortran_spec) - var_list * -@end display - -The @cmd{PRINT} transformation writes variable data to an output file. -@cmd{PRINT} is executed when a procedure causes the data to be read. -Follow @cmd{PRINT} by @cmd{EXECUTE} to print variable data without -invoking a procedure (@pxref{EXECUTE}). - -All @cmd{PRINT} subcommands are optional. - -The OUTFILE subcommand specifies the file to receive the output. The -file may be a file name as a string or a file handle (@pxref{FILE -HANDLE}). If OUTFILE is not present then output will be sent to PSPP's -output listing file. - -The RECORDS subcommand specifies the number of lines to be output. The -number of lines may optionally be surrounded by parentheses. - -TABLE will cause the @cmd{PRINT} command to output a table to the listing file -that describes what it will print to the output file. NOTABLE, the -default, suppresses this output table. - -Introduce the strings and variables to be printed with a slash -(@samp{/}). Optionally, the slash may be followed by a number -indicating which output line will be specified. In the absence of this -line number, the next line number will be specified. Multiple lines may -be specified using multiple slashes with the intended output for a line -following its respective slash. - -Literal strings may be printed. Specify the string itself. Optionally -the string may be followed by a column number or range of column -numbers, specifying the location on the line for the string to be -printed.
Otherwise, the string will be printed at the current position -on the line. - -Variables to be printed can be specified in the same ways as -for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}). In addition, a -variable -list may be followed by an asterisk (@samp{*}), which indicates that the -variables should be printed in their dictionary print formats, separated -by spaces. A variable list followed by a slash or the end of command -will be interpreted the same way. - -If a FORTRAN type specification is used to move backwards on the current -line, then text is written at that point on the line and the line is -truncated to that length, although additional text will -extend the line again. - -@node PRINT EJECT, PRINT SPACE, PRINT, Data Input and Output -@section PRINT EJECT -@vindex PRINT EJECT - -@display -PRINT EJECT - OUTFILE='filename' - RECORDS=n_lines - @{NOTABLE,TABLE@} - /[line_no] arg@dots{} - -arg takes one of the following forms: - 'string' [start-end] - var_list start-end [type_spec] - var_list (fortran_spec) - var_list * -@end display - -@cmd{PRINT EJECT} writes data to an output file. Before the data is -written, the current page in the listing file is ejected. - -@xref{PRINT}, for more information on syntax and usage. - -@node PRINT SPACE, REREAD, PRINT EJECT, Data Input and Output -@section PRINT SPACE -@vindex PRINT SPACE - -@display -PRINT SPACE OUTFILE='filename' n_lines. -@end display - -@cmd{PRINT SPACE} prints one or more blank lines to an output file. - -The OUTFILE subcommand is optional. It may be used to direct output to -a file specified as a file name string or a file handle (@pxref{FILE -HANDLE}). If OUTFILE is not specified then output will be directed to -the listing file. - -n_lines is also optional. If present, it is an expression -(@pxref{Expressions}) specifying the number of blank lines to be -printed. The expression must evaluate to a nonnegative value.
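-
-As a minimal sketch of @cmd{PRINT} and @cmd{PRINT SPACE} used together,
-the fragment below prints a literal string and two variables for each
-case, followed by one blank line. The variable names and column
-positions are assumptions made for illustration. Because OUTFILE is
-omitted, the output goes to the listing file.
-
-@example
-* ID and SCORE are hypothetical variables.
-DATA LIST /ID 1-3 SCORE 5-7.
-PRINT /'ID:' 1-3 ID 5-7 SCORE 9-11.
-PRINT SPACE 1.
-BEGIN DATA.
-101  87
-102  93
-END DATA.
-@end example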
- -@node REREAD, REPEATING DATA, PRINT SPACE, Data Input and Output -@section REREAD -@vindex REREAD - -@display -REREAD FILE=handle COLUMN=column. -@end display - -The @cmd{REREAD} transformation allows the previous input line in a -data file -already processed by @cmd{DATA LIST} or another input command to be re-read -for further processing. - -The FILE subcommand, which is optional, is used to specify the file to -have its line re-read. The file must be specified in the form of a file -handle (@pxref{FILE HANDLE}). If FILE is not specified then the last -file specified on @cmd{DATA LIST} will be assumed (last file specified -lexically, not in terms of flow-of-control). - -By default, the line re-read is re-read in its entirety. With the -COLUMN subcommand, a prefix of the line can be exempted from -re-reading. Specify an expression (@pxref{Expressions}) evaluating to -the first column that should be included in the re-read line. Columns -are numbered from 1 at the left margin. - -Issuing @code{REREAD} multiple times will not back up in the data -file. Instead, it will re-read the same line multiple times. - -@node REPEATING DATA, WRITE, REREAD, Data Input and Output -@section REPEATING DATA -@vindex REPEATING DATA - -@display -REPEATING DATA - /STARTS=start-end - /OCCURS=n_occurs - /FILE='filename' - /LENGTH=length - /CONTINUED[=cont_start-cont_end] - /ID=id_start-id_end=id_var - /@{TABLE,NOTABLE@} - /DATA=var_spec@dots{} - -where each var_spec takes one of the forms - var_list start-end [type_spec] - var_list (fortran_spec) -@end display - -@cmd{REPEATING DATA} parses groups of data repeating in -a uniform format, possibly with several groups on a single line. Each -group of data corresponds with one case. @cmd{REPEATING DATA} may only be -used within an @cmd{INPUT PROGRAM} structure (@pxref{INPUT PROGRAM}). -When used with @cmd{DATA LIST}, it -can be used to parse groups of cases that share a subset of variables -but differ in their other data. 
- -The STARTS subcommand is required. Specify a range of columns, using -literal numbers or numeric variable names. This range specifies the -columns on the first line that are used to contain groups of data. The -ending column is optional. If it is not specified, then the record -width of the input file is used. For the inline file (@pxref{BEGIN -DATA}) this is 80 columns; for a file with fixed record widths it is the -record width; for other files it is 1024 characters by default. - -The OCCURS subcommand is required. It must be a number or the name of a -numeric variable. Its value is the number of groups present in the -current record. - -The DATA subcommand is required. It must be the last subcommand -specified. It is used to specify the data present within each repeating -group. Column numbers are specified relative to the beginning of a -group at column 1. Data is specified in the same way as with @cmd{DATA LIST -FIXED} (@pxref{DATA LIST FIXED}). - -All other subcommands are optional. - -FILE specifies the file to read, either a file name as a string or a -file handle (@pxref{FILE HANDLE}). If FILE is not present then the -default is the last file handle used on @cmd{DATA LIST} (lexically, not in -terms of flow of control). - -By default @cmd{REPEATING DATA} will output a table describing how it will -parse the input data. Specifying NOTABLE will disable this behavior; -specifying TABLE will explicitly enable it. - -The LENGTH subcommand specifies the length in characters of each group. -If it is not present then length is inferred from the DATA subcommand. -LENGTH can be a number or a variable name. - -Normally all the data groups are expected to be present on a single -line. Use the CONTINUED subcommand to indicate that data can be continued -onto additional lines. If data on continuation lines starts at the left -margin and continues through the entire field width, no column -specifications are necessary on CONTINUED.
Otherwise, specify the -possible range of columns in the same way as on STARTS. - -When data groups are continued from line to line, it is easy -for cases to get out of sync through careless hand editing. The -ID subcommand allows a case identifier to be present on each line of -repeating data groups. @cmd{REPEATING DATA} will check for the same -identifier on each line and report mismatches. Specify the range of -columns that the identifier will occupy, followed by an equals sign -(@samp{=}) and the identifier variable name. The variable must already -have been declared with @cmd{NUMERIC} or another command. - -@node WRITE, , REPEATING DATA, Data Input and Output -@section WRITE -@vindex WRITE - -@display -WRITE - OUTFILE='filename' - RECORDS=n_lines - @{NOTABLE,TABLE@} - /[line_no] arg@dots{} - -arg takes one of the following forms: - 'string' [start-end] - var_list start-end [type_spec] - var_list (fortran_spec) - var_list * -@end display - -@code{WRITE} writes text or binary data to an output file. - -@xref{PRINT}, for more information on syntax and usage. The main -difference between @code{PRINT} and @code{WRITE} is that @cmd{WRITE} -uses write formats by default, where PRINT uses print formats. - -The sole additional difference is that if @cmd{WRITE} is used to send output -to a binary file, carriage control characters will not be output. -@xref{FILE HANDLE}, for information on how to declare a file as binary. - -@node System and Portable Files, Variable Attributes, Data Input and Output, Top -@chapter System Files and Portable Files - -The commands in this chapter read, write, and examine system files and -portable files. - -@menu -* APPLY DICTIONARY:: Apply system file dictionary to active file. -* EXPORT:: Write to a portable file. -* GET:: Read from a system file. -* IMPORT:: Read from a portable file. -* MATCH FILES:: Merge system files. -* SAVE:: Write to a system file. -* SYSFILE INFO:: Display system file dictionary. 
-* XSAVE:: Write to a system file, as a transform. -@end menu - -@node APPLY DICTIONARY, EXPORT, System and Portable Files, System and Portable Files -@section APPLY DICTIONARY -@vindex APPLY DICTIONARY - -@display -APPLY DICTIONARY FROM='filename'. -@end display - -@cmd{APPLY DICTIONARY} applies the variable labels, value labels, -and missing values from variables in a system file to corresponding -variables in the active file. In some cases it also updates the -weighting variable. - -Specify a system file with a file name string or as a file handle -(@pxref{FILE HANDLE}). The dictionary in the system file will be read, -but it will not replace the active file dictionary. The system file's -data will not be read. - -Only variables with names that exist in both the active file and the -system file are considered. Variables with the same name but different -types (numeric, string) will cause an error message. Otherwise, the -system file variables' attributes will replace those in their matching -active file variables, as described below. - -If a system file variable has a variable label, then it will replace the -active file variable's variable label. If the system file variable does -not have a variable label, then the active file variable's variable -label, if any, will be retained. - -If the active file variable is numeric or short string, then value -labels and missing values, if any, will be copied to the active file -variable. If the system file variable does not have value labels or -missing values, then those in the active file variable, if any, will not -be disturbed. - -Finally, weighting of the active file is updated (@pxref{WEIGHT}). If -the active file has a weighting variable, and the system file does not, -or if the weighting variable in the system file does not exist in the -active file, then the active file weighting variable, if any, is -retained. Otherwise, the weighting variable in the system file becomes -the active file weighting variable. 
- -@cmd{APPLY DICTIONARY} takes effect immediately. It does not read the -active -file. The system file is not modified. - -@node EXPORT, GET, APPLY DICTIONARY, System and Portable Files -@section EXPORT -@vindex EXPORT - -@display -EXPORT - /OUTFILE='filename' - /DROP=var_list - /KEEP=var_list - /RENAME=(src_names=target_names)@dots{} -@end display - -The @cmd{EXPORT} procedure writes the active file dictionary and data to a -specified portable file. - -The OUTFILE subcommand, which is the only required subcommand, specifies -the portable file to be written as a file name string or a file handle -(@pxref{FILE HANDLE}). - -DROP, KEEP, and RENAME follow the same format as in the @cmd{SAVE} procedure -(@pxref{SAVE}). - -@cmd{EXPORT} is a procedure. It causes the active file to be read. - -@node GET, IMPORT, EXPORT, System and Portable Files -@section GET -@vindex GET - -@display -GET - /FILE='filename' - /DROP=var_list - /KEEP=var_list - /RENAME=(src_names=target_names)@dots{} -@end display - -@cmd{GET} clears the current dictionary and active file and -replaces them with the dictionary and data from a specified system file. - -The FILE subcommand is the only required subcommand. Specify the system -file to be read as a file name string or a file handle (@pxref{FILE -HANDLE}). - -By default, all the variables in a system file are read. The DROP -subcommand can be used to specify a list of variables that are not to be -read. By contrast, the KEEP subcommand can be used to specify the variables -that are to be read, with all other variables not read. - -Normally variables in a system file retain the names that they were -saved under. Use the RENAME subcommand to change these names. Specify, -within parentheses, a list of variable names followed by an equals sign -(@samp{=}) and the names that they should be renamed to. Multiple -parenthesized groups of variable names can be included on a single -RENAME subcommand.
Variables' names may be swapped using a RENAME -subcommand of the form @samp{/RENAME=(A B=B A)}. - -Alternate syntax for the RENAME subcommand allows the parentheses to be -eliminated. When this is done, only a single variable may be renamed at -once. For instance, @samp{/RENAME=A=B}. This alternate syntax is -deprecated. - -DROP, KEEP, and RENAME are performed in left-to-right order. They -each may be present any number of times. @cmd{GET} never modifies a -system file on disk. Only the active file read from the system file -is affected by these subcommands. - -@cmd{GET} does not cause the data to be read, only the dictionary. The data -is read later, when a procedure is executed. - -@node IMPORT, MATCH FILES, GET, System and Portable Files -@section IMPORT -@vindex IMPORT - -@display -IMPORT - /FILE='filename' - /TYPE=@{COMM,TAPE@} - /DROP=var_list - /KEEP=var_list - /RENAME=(src_names=target_names)@dots{} -@end display - -The @cmd{IMPORT} transformation clears the active file dictionary and -data and -replaces them with a dictionary and data from a portable file on disk. - -The FILE subcommand, which is the only required subcommand, specifies -the portable file to be read as a file name string or a file handle -(@pxref{FILE HANDLE}). - -The TYPE subcommand is currently not used. - -DROP, KEEP, and RENAME follow the syntax used by @cmd{GET} (@pxref{GET}). - -@cmd{IMPORT} does not cause the data to be read, only the dictionary. The -data is read later, when a procedure is executed. - -@node MATCH FILES, SAVE, IMPORT, System and Portable Files -@section MATCH FILES -@vindex MATCH FILES - -@display -MATCH FILES - /BY var_list - /@{FILE,TABLE@}=@{*,'filename'@} - /DROP=var_list - /KEEP=var_list - /RENAME=(src_names=target_names)@dots{} - /IN=var_name - /FIRST=var_name - /LAST=var_name - /MAP -@end display - -@cmd{MATCH FILES} merges one or more system files, optionally -including the active file. 
Records with the same values for BY -variables are combined into a single record. Records with different -values are output in order. Thus, multiple sorted system files are -combined into a single sorted system file based on the value of the BY -variables. - -The BY subcommand specifies a list of variables that are used to match -records from each of the system files. Variables specified must exist -in all the files specified on FILE and TABLE. BY should usually be -specified. If TABLE is used then BY is required. - -Specify FILE with a system file as a file name string or file handle -(@pxref{FILE HANDLE}), or with an asterisk (@samp{*}) to -indicate the current active file. The files specified on FILE are -merged together based on the BY variables, or combined case-by-case if -BY is not specified. Normally at least two FILE subcommands should be -specified. - -Specify TABLE with a system file to use it as a @dfn{table -lookup file}. Records in table lookup files are not used up after -they've been used once. This means that data in table lookup files can -correspond to any number of records in FILE files. Table lookup files -correspond to lookup tables in traditional relational database systems. -It is incorrect to have records with duplicate BY values in table lookup -files. - -Any number of FILE and TABLE subcommands may be specified. Each -instance of FILE or TABLE can be followed by DROP, KEEP, and/or RENAME -subcommands. These take the same form as the corresponding subcommands -of @cmd{GET} (@pxref{GET}), and perform the same functions. - -Variables belonging to files that are not present for the current case -are set to the system-missing value for numeric variables or spaces for -string variables. - -IN, FIRST, LAST, and MAP are currently not used. 
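-
-The following sketch shows one way FILE, TABLE, and BY might be
-combined. The file name and the REGION variable are hypothetical, and
-both the active file and the lookup file are assumed to be sorted by
-REGION already. Each case in the active file picks up the matching
-record from the table lookup file.
-
-@example
-* regions.sav is a hypothetical table lookup file keyed on REGION.
-MATCH FILES
-  /FILE=*
-  /TABLE='regions.sav'
-  /BY REGION.
-@end example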
- -@node SAVE, SYSFILE INFO, MATCH FILES, System and Portable Files -@section SAVE -@vindex SAVE - -@display -SAVE - /OUTFILE='filename' - /@{COMPRESSED,UNCOMPRESSED@} - /DROP=var_list - /KEEP=var_list - /RENAME=(src_names=target_names)@dots{} -@end display - -The @cmd{SAVE} procedure causes the dictionary and data in the active -file to -be written to a system file. - -OUTFILE is the only required subcommand. Specify the system -file to be written as a file name string or a file handle (@pxref{FILE -HANDLE}). - -The COMPRESSED and UNCOMPRESSED subcommands determine whether the saved -system file is compressed. By default, system files are compressed. -This default can be changed with the SET command (@pxref{SET}). - -By default, all the variables in the active file dictionary are written -to the system file. The DROP subcommand can be used to specify a list -of variables not to be written. In contrast, KEEP specifies the variables -to be written; all other variables are not written. - -Normally variables are saved to a system file under the same names they -have in the active file. Use the RENAME subcommand to change these names. -Specify, within parentheses, a list of variable names followed by an -equals sign (@samp{=}) and the names that they should be renamed to. -Multiple parenthesized groups of variable names can be included on a -single RENAME subcommand. Variables' names may be swapped using a -RENAME subcommand of the form @samp{/RENAME=(A B=B A)}. - -Alternate syntax for the RENAME subcommand allows the parentheses to be -eliminated. When this is done, only a single variable may be renamed at -once. For instance, @samp{/RENAME=A=B}. This alternate syntax is -deprecated. - -DROP, KEEP, and RENAME are performed in left-to-right order. They -each may be present any number of times. @cmd{SAVE} never modifies -the active file. DROP, KEEP, and RENAME only affect the system file -written to disk. - -@cmd{SAVE} causes the data to be read. It is a procedure.
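-
-For illustration, the sketch below writes the active file to a system
-file while dropping two variables and renaming two others. The file
-name and all variable names are hypothetical. Since the subcommands
-are applied left to right, the dropped variables are gone before
-RENAME is processed.
-
-@example
-* survey.sav, TEMP1, TEMP2, Q1, and Q2 are hypothetical names.
-SAVE /OUTFILE='survey.sav'
-     /DROP=TEMP1 TEMP2
-     /RENAME=(Q1 Q2=AGE INCOME).
-@end example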
- -@node SYSFILE INFO, XSAVE, SAVE, System and Portable Files -@section SYSFILE INFO -@vindex SYSFILE INFO - -@display -SYSFILE INFO FILE='filename'. -@end display - -@cmd{SYSFILE INFO} reads the dictionary in a system file and -displays the information in its dictionary. - -Specify a file name or file handle. @cmd{SYSFILE INFO} reads that file as -a system file and displays information on its dictionary. - -@cmd{SYSFILE INFO} does not affect the current active file. - -@node XSAVE, , SYSFILE INFO, System and Portable Files -@section XSAVE -@vindex XSAVE - -@display -XSAVE - /FILE='filename' - /@{COMPRESSED,UNCOMPRESSED@} - /DROP=var_list - /KEEP=var_list - /RENAME=(src_names=target_names)@dots{} -@end display - -The @cmd{XSAVE} transformation writes the active file dictionary and -data to a -system file stored on disk. - -@cmd{XSAVE} is a transformation, not a procedure. It is executed when the -data is read by a procedure or procedure-like command. In all other -respects, @cmd{XSAVE} is identical to @cmd{SAVE}. @xref{SAVE}, for -more information -on syntax and usage. - -@node Variable Attributes, Data Manipulation, System and Portable Files, Top -@chapter Manipulating variables - -The variables in the active file dictionary are important. There are -several utility functions for examining and adjusting them. - -@menu -* ADD VALUE LABELS:: Add value labels to variables. -* DISPLAY:: Display variable names & descriptions. -* DISPLAY VECTORS:: Display a list of vectors. -* FORMATS:: Set print and write formats. -* LEAVE:: Don't clear variables between cases. -* MISSING VALUES:: Set missing values for variables. -* MODIFY VARS:: Rename, reorder, and drop variables. -* NUMERIC:: Create new numeric variables. -* PRINT FORMATS:: Set variable print formats. -* RENAME VARIABLES:: Rename variables. -* VALUE LABELS:: Set value labels for variables. -* STRING:: Create new string variables. -* VARIABLE LABELS:: Set variable labels for variables. 
-* VECTOR:: Declare an array of variables. -* WRITE FORMATS:: Set variable write formats. -@end menu - -@node ADD VALUE LABELS, DISPLAY, Variable Attributes, Variable Attributes -@section ADD VALUE LABELS -@vindex ADD VALUE LABELS - -@display -ADD VALUE LABELS - /var_list value 'label' [value 'label']@dots{} -@end display - -@cmd{ADD VALUE LABELS} has the same syntax and purpose as @cmd{VALUE -LABELS} (@pxref{VALUE LABELS}), but it does not clear value -labels from the variables before adding the ones specified. - -@node DISPLAY, DISPLAY VECTORS, ADD VALUE LABELS, Variable Attributes -@section DISPLAY -@vindex DISPLAY - -@display -DISPLAY @{NAMES,INDEX,LABELS,VARIABLES,DICTIONARY,SCRATCH@} - [SORTED] [var_list] -@end display - -@cmd{DISPLAY} displays requested information on variables. Variables can -optionally be sorted alphabetically. The entire dictionary or just -specified variables can be described. - -One of the following keywords can be present: - -@table @asis -@item NAMES -The variables' names are displayed. - -@item INDEX -The variables' names are displayed along with a value describing their -position within the active file dictionary. - -@item LABELS -Variable names, positions, and variable labels are displayed. - -@item VARIABLES -Variable names, positions, print and write formats, and missing values -are displayed. - -@item DICTIONARY -Variable names, positions, print and write formats, missing values, -variable labels, and value labels are displayed. - -@item SCRATCH -Variable names are displayed, for scratch variables only (@pxref{Scratch -Variables}). -@end table - -If SORTED is specified, then the variables are displayed in ascending -order based on their names; otherwise, they are displayed in the order -that they occur in the active file dictionary. - -@node DISPLAY VECTORS, FORMATS, DISPLAY, Variable Attributes -@section DISPLAY VECTORS -@vindex DISPLAY VECTORS - -@display -DISPLAY VECTORS.
-@end display - -@cmd{DISPLAY VECTORS} lists all the currently declared vectors. - -@node FORMATS, LEAVE, DISPLAY VECTORS, Variable Attributes -@section FORMATS -@vindex FORMATS - -@display -FORMATS var_list (fmt_spec). -@end display - -@cmd{FORMATS} sets both print and write formats for the specified -variables to the specified format specification. @xref{Input/Output -Formats}. - -Specify a list of variables followed by a format specification in -parentheses. The print and write formats of the specified variables -will be changed. - -Additional lists of variables and formats may be included if they are -delimited by a slash (@samp{/}). - -@cmd{FORMATS} takes effect immediately. It is not affected by -conditional and looping structures such as @cmd{DO IF} or @cmd{LOOP}. - -@node LEAVE, MISSING VALUES, FORMATS, Variable Attributes -@section LEAVE -@vindex LEAVE - -@display -LEAVE var_list. -@end display - -@cmd{LEAVE} prevents the specified variables from being -reinitialized whenever a new case is processed. - -Normally, when a data file is processed, every variable in the active -file is initialized to the system-missing value or spaces at the -beginning of processing for each case. When a variable has been -specified on @cmd{LEAVE}, this is not the case. Instead, that variable is -initialized to 0 (not system-missing) or spaces for the first case. -After that, it retains its value between cases. - -This becomes useful for counters. For instance, in the example below -the variable SUM maintains a running total of the values in the ITEM -variable. - -@example -DATA LIST /ITEM 1-3. -COMPUTE SUM=SUM+ITEM. -PRINT /ITEM SUM. -LEAVE SUM. -BEGIN DATA. -123 -404 -555 -999 -END DATA.
-@end example - -@noindent Partial output from this example: - -@example -123 123.00 -404 527.00 -555 1082.00 -999 2081.00 -@end example - -It is best to use the @cmd{LEAVE} command immediately before invoking a -procedure command, because the left status of variables is reset by -certain transformations---for instance, @cmd{COMPUTE} and @cmd{IF}. -Left status is also reset by all procedure invocations. - -@node MISSING VALUES, MODIFY VARS, LEAVE, Variable Attributes -@section MISSING VALUES -@vindex MISSING VALUES - -@display -MISSING VALUES var_list (missing_values). - -missing_values takes one of the following forms: - num1 - num1, num2 - num1, num2, num3 - num1 THRU num2 - num1 THRU num2, num3 - string1 - string1, string2 - string1, string2, string3 -As part of a range, LO or LOWEST may take the place of num1; -HI or HIGHEST may take the place of num2. -@end display - -@cmd{MISSING VALUES} sets user-missing values for numeric and -short string variables. Long string variables may not have missing -values. - -Specify a list of variables, followed by a list of their user-missing -values in parentheses. Up to three discrete values may be given, or, -for numeric variables only, a range of values optionally accompanied by -a single discrete value. Ranges may be open-ended on one end, indicated -through the use of the keyword LO or LOWEST or HI or HIGHEST. - -The @cmd{MISSING VALUES} command takes effect immediately. It is not -affected by conditional and looping constructs such as @cmd{DO IF} or -@cmd{LOOP}. - -@node MODIFY VARS, NUMERIC, MISSING VALUES, Variable Attributes -@section MODIFY VARS -@vindex MODIFY VARS - -@display -MODIFY VARS - /REORDER=@{FORWARD,BACKWARD@} @{POSITIONAL,ALPHA@} (var_list)@dots{} - /RENAME=(old_names=new_names)@dots{} - /@{DROP,KEEP@}=var_list - /MAP -@end display - -@cmd{MODIFY VARS} reorders, renames, and deletes variables in the -active file.
- -At least one subcommand must be specified, and no subcommand may be -specified more than once. DROP and KEEP may not both be specified. - -The REORDER subcommand changes the order of variables in the active -file. Specify one or more lists of variable names in parentheses. By -default, each list of variables is rearranged into the specified order. -To put the variables into the reverse of the specified order, put -keyword BACKWARD before the parentheses. To put them into alphabetical -order in the dictionary, specify keyword ALPHA before the parentheses. -BACKWARD and ALPHA may also be combined. - -To rename variables in the active file, specify RENAME, an equals sign -(@samp{=}), and lists of the old variable names and new variable names -separated by another equals sign within parentheses. There must be the -same number of old and new variable names. Each old variable is renamed to -the corresponding new variable name. Multiple parenthesized groups of -variables may be specified. - -The DROP subcommand deletes a specified list of variables from the -active file. - -The KEEP subcommand keeps the specified list of variables in the active -file. Any unlisted variables are deleted from the active file. - -MAP is currently ignored. - -If either DROP or KEEP is specified, the data is read; otherwise it is -not. - -@node NUMERIC, PRINT FORMATS, MODIFY VARS, Variable Attributes -@section NUMERIC -@vindex NUMERIC - -@display -NUMERIC /var_list [(fmt_spec)]. -@end display - -@cmd{NUMERIC} explicitly declares new numeric variables, optionally -setting their output formats. - -Specify a slash (@samp{/}), followed by the names of the new numeric -variables. If you wish to set their output formats, follow their names -by an output format specification in parentheses (@pxref{Input/Output -Formats}); otherwise, the default is F8.2. - -Variables created with @cmd{NUMERIC} are initialized to the -system-missing value. 
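-
-As a brief sketch with hypothetical variable names, the first command
-below creates one variable with the default format, and the second
-creates two variables with an explicit output format:
-
-@example
-* RATE, TOTAL, and NITEMS are hypothetical variable names.
-NUMERIC /RATE.
-NUMERIC /TOTAL NITEMS (F6.0).
-@end example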
- -@node PRINT FORMATS, RENAME VARIABLES, NUMERIC, Variable Attributes -@section PRINT FORMATS -@vindex PRINT FORMATS - -@display -PRINT FORMATS var_list (fmt_spec). -@end display - -@cmd{PRINT FORMATS} sets the print formats for the specified -variables to the specified format specification. - -Its syntax is identical to that of @cmd{FORMATS} (@pxref{FORMATS}), -but @cmd{PRINT FORMATS} sets only print formats, not write formats. - -@node RENAME VARIABLES, VALUE LABELS, PRINT FORMATS, Variable Attributes -@section RENAME VARIABLES -@vindex RENAME VARIABLES - -@display -RENAME VARIABLES (old_names=new_names)@dots{} . -@end display - -@cmd{RENAME VARIABLES} changes the names of variables in the active -file. Specify lists of the old variable names and new -variable names, separated by an equals sign (@samp{=}), within -parentheses. There must be the same number of old and new variable -names. Each old variable is renamed to the corresponding new variable -name. Multiple parenthesized groups of variables may be specified. - -@cmd{RENAME VARIABLES} takes effect immediately. It does not cause the data -to be read. - -@node VALUE LABELS, STRING, RENAME VARIABLES, Variable Attributes -@section VALUE LABELS -@vindex VALUE LABELS - -@display -VALUE LABELS - /var_list value 'label' [value 'label']@dots{} -@end display - -@cmd{VALUE LABELS} allows values of numeric and short string -variables to be associated with labels. In this way, a short value can -stand for a long value. - -To set up value labels for a set of variables, specify the -variable names after a slash (@samp{/}), followed by a list of values -and their associated labels, separated by spaces. Long string -variables may not be specified. - -Before @cmd{VALUE LABELS} is executed, any existing value labels -are cleared from the variables specified. Use @cmd{ADD VALUE LABELS} -(@pxref{ADD VALUE LABELS}) to add value labels without clearing those -already present. 
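-
-A short sketch, using a hypothetical numeric variable SEX, of how the
-two commands differ: the first command replaces any labels already
-attached to SEX, and the second adds a label without clearing them.
-
-@example
-* SEX and its codes are hypothetical.
-VALUE LABELS /SEX 1 'Male' 2 'Female'.
-ADD VALUE LABELS /SEX 9 'Not stated'.
-@end example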
- -@node STRING, VARIABLE LABELS, VALUE LABELS, Variable Attributes -@section STRING -@vindex STRING - -@display -STRING /var_list (fmt_spec). -@end display - -@cmd{STRING} creates new string variables for use in -transformations. - -Specify a slash (@samp{/}), followed by the names of the string -variables to create and the desired output format specification in -parentheses (@pxref{Input/Output Formats}). Variable widths are -implicitly derived from the specified output formats. - -Created variables are initialized to spaces. - -@node VARIABLE LABELS, VECTOR, STRING, Variable Attributes -@section VARIABLE LABELS -@vindex VARIABLE LABELS - -@display -VARIABLE LABELS - /var_list 'var_label'. -@end display - -@cmd{VARIABLE LABELS} associates explanatory names -with variables. This name, called a @dfn{variable label}, is displayed by -statistical procedures. - -To assign a variable label to a group of variables, specify a slash -(@samp{/}), followed by the list of variable names and the variable -label as a string. - -@node VECTOR, WRITE FORMATS, VARIABLE LABELS, Variable Attributes -@section VECTOR -@vindex VECTOR - -@display -Two possible syntaxes: - VECTOR vec_name=var_list. - VECTOR vec_name_list(count). -@end display - -@cmd{VECTOR} allows a group of variables to be accessed as if they -were consecutive members of an array with a vector(index) notation. - -To make a vector out of a set of existing variables, specify a name for -the vector followed by an equals sign (@samp{=}) and the variables that -belong in the vector. - -To make a vector and create variables at the same time, specify one or -more vector names followed by a count in parentheses. This will cause -variables named @code{@var{vec}1} through @code{@var{vec}@var{count}} -to be created as numeric variables with print and write format F8.2. -Variable names including numeric suffixes may not exceed 8 characters -in length, and none of the variables may exist prior to @cmd{VECTOR}. 
- -All the variables in a vector must be the same type. - -Vectors created with @cmd{VECTOR} disappear after any procedure or -procedure-like command is executed. The variables contained in the -vectors remain, unless they are scratch variables (@pxref{Scratch -Variables}). - -Variables within a vector may be referenced in expressions using -@code{vector(index)} syntax. - -@node WRITE FORMATS, , VECTOR, Variable Attributes -@section WRITE FORMATS -@vindex WRITE FORMATS - -@display -WRITE FORMATS var_list (fmt_spec). -@end display - -@cmd{WRITE FORMATS} sets the write formats for the specified variables -to the specified format specification. Its syntax is identical to -that of @cmd{FORMATS} (@pxref{FORMATS}), but @cmd{WRITE FORMATS} sets only -write formats, not print formats. - -@node Data Manipulation, Data Selection, Variable Attributes, Top -@chapter Data transformations -@cindex transformations - -The PSPP procedures examined in this chapter manipulate data and -prepare the active file for later analyses. They do not produce output, -as a rule. - -@menu -* AGGREGATE:: Summarize multiple cases into a single case. -* AUTORECODE:: Automatic recoding of variables. -* COMPUTE:: Assigning a variable a calculated value. -* COUNT:: Counting variables with particular values. -* FLIP:: Exchange variables with cases. -* IF:: Conditionally assigning a calculated value. -* RECODE:: Mapping values from one set to another. -* SORT CASES:: Sort the active file. -@end menu - -@node AGGREGATE, AUTORECODE, Data Manipulation, Data Manipulation -@section AGGREGATE -@vindex AGGREGATE - -@display -AGGREGATE - /BREAK=var_list - /PRESORTED - /OUTFILE=@{*,'filename'@} - /DOCUMENT - /MISSING=COLUMNWISE - /dest_vars=agr_func(src_vars, args@dots{})@dots{} -@end display - -@cmd{AGGREGATE} summarizes groups of cases into single cases. -Cases are divided into groups that have the same values for one or more -variables called @dfn{break variables}.
Several functions are available -for summarizing case contents. - -At least one break variable must be specified on BREAK, the only -required subcommand. The values of these variables are used to divide -the active file into groups to be summarized. In addition, at least -one @var{dest_var} must be specified. - -By default, the active file is sorted based on the break variables -before aggregation takes place. If the active file is already sorted -or otherwise grouped in terms of the break variables, specify -PRESORTED to save time. - -The OUTFILE subcommand specifies a system file by file name string or -file handle (@pxref{FILE HANDLE}). The aggregated cases are written to -this file. If OUTFILE is not specified, or if @samp{*} is specified, -then the aggregated cases replace the active file. - -Specify DOCUMENT to copy the documents from the active file into the -aggregate file (@pxref{DOCUMENT}). Otherwise, the aggregate file will -not contain any documents, even if the aggregate file replaces the -active file. - -One or more sets of aggregation variables must be specified. Each set -comprises a list of aggregation variables, an equals sign (@samp{=}), -the name of an aggregation function (see the list below), and a list -of source variables in parentheses. Some aggregation functions expect -additional arguments following the source variable names. - -Each set must have exactly as many source variables as aggregation -variables. Each aggregation variable receives the results of applying -the specified aggregation function to the corresponding source -variable. Most aggregation functions may be applied to numeric and -short and long string variables. Others, marked below, are restricted -to numeric values. - -The available aggregation functions are as follows: - -@table @asis -@item SUM(var_name) -Sum. Limited to numeric values. -@item MEAN(var_name) -Arithmetic mean. Limited to numeric values. -@item SD(var_name) -Standard deviation of the mean. 
Limited to numeric values. -@item MAX(var_name) -Maximum value. -@item MIN(var_name) -Minimum value. -@item FGT(var_name, value) -@itemx PGT(var_name, value) -Fraction between 0 and 1, or percentage between 0 and 100, respectively, -of values greater than the specified constant. -@item FLT(var_name, value) -@itemx PLT(var_name, value) -Fraction or percentage, respectively, of values less than the specified -constant. -@item FIN(var_name, low, high) -@itemx PIN(var_name, low, high) -Fraction or percentage, respectively, of values within the specified -inclusive range of constants. -@item FOUT(var_name, low, high) -@itemx POUT(var_name, low, high) -Fraction or percentage, respectively, of values strictly outside the -specified range of constants. -@item N(var_name) -Number of non-missing values. -@item N -Number of cases aggregated to form this group. Don't supply a source -variable for this aggregation function. -@item NU(var_name) -Number of non-missing values. Each case is considered to have a weight -of 1, regardless of the current weighting variable (@pxref{WEIGHT}). -@item NU -Number of cases aggregated to form this group. Each case is considered -to have a weight of 1, regardless of the current weighting variable. -@item NMISS(var_name) -Number of missing values. -@item NUMISS(var_name) -Number of missing values. Each case is considered to have a weight of -1, regardless of the current weighting variable. -@item FIRST(var_name) -First value in this group. -@item LAST(var_name) -Last value in this group. -@end table - -Aggregation functions compare string values in terms of internal -character codes. On most modern computers, this is a form of ASCII. - -The aggregation functions listed above exclude all user-missing values -from calculations. To include user-missing values, insert a period -(@samp{.}) between the function name and left parenthesis -(e.g.~@samp{SUM.}). 
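The following is a minimal sketch of how these pieces might fit
together. It assumes hypothetical variables @code{region} and
@code{sales} in the active file; the destination variable names are
likewise made up for illustration.

@example
AGGREGATE
  /BREAK=region
  /total=SUM(sales)
  /total_um=SUM.(sales)
  /pct_big=PGT(sales, 1000)
  /ncases=N.
@end example

Because OUTFILE is omitted, the aggregated cases would replace the
active file, with one case per distinct value of @code{region}.
@code{total} excludes user-missing values of @code{sales}, whereas
@code{total_um}, written with the period form, includes them.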
- -Normally, only a single case (for SD and SD., two cases) need be -non-missing in each group for the aggregate variable to be -non-missing. Specifying /MISSING=COLUMNWISE inverts this behavior, so -that the aggregate variable becomes missing if any aggregated value is -missing. - -@cmd{AGGREGATE} both ignores and cancels the current @cmd{SPLIT FILE} -settings (@pxref{SPLIT FILE}). - -@node AUTORECODE, COMPUTE, AGGREGATE, Data Manipulation -@section AUTORECODE -@vindex AUTORECODE - -@display -AUTORECODE VARIABLES=src_vars INTO dest_vars - /DESCENDING - /PRINT -@end display - -The @cmd{AUTORECODE} procedure considers the @var{n} values that a variable -takes on and maps them onto values 1@dots{}@var{n} on a new numeric -variable. - -Subcommand VARIABLES is the only required subcommand and must come -first. Specify VARIABLES, an equals sign (@samp{=}), a list of source -variables, INTO, and a list of target variables. There must be the same -number of source and target variables. The target variables must not -already exist. - -By default, increasing values of a source variable (for a string, this -is based on character code comparisons) are recoded to increasing values -of its target variable. To cause increasing values of a source variable -to be recoded to decreasing values of its target variable (@var{n} down -to 1), specify DESCENDING. - -PRINT is currently ignored. - -@cmd{AUTORECODE} is a procedure. It causes the data to be read. - -@node COMPUTE, COUNT, AUTORECODE, Data Manipulation -@section COMPUTE -@vindex COMPUTE - -@display -COMPUTE variable = expression. - or -COMPUTE vector(index) = expression. -@end display - -@cmd{COMPUTE} assigns the value of an expression to a target -variable. For each case, the expression is evaluated and its value -assigned to the target variable. Numeric and short and long string -variables may be assigned.
When a string expression's width differs -from the target variable's width, the string result of the expression -is truncated or padded with spaces on the right as necessary. The -expression and variable types must match. - -For numeric variables only, the target variable need not already -exist. Numeric variables created by @cmd{COMPUTE} are assigned an -@code{F8.2} output format. String variables must be declared before -they can be used as targets for @cmd{COMPUTE}. - -The target variable may be specified as an element of a vector -(@pxref{VECTOR}). In this case, a vector index expression must be -specified in parentheses following the vector name. The index -expression must evaluate to a numeric value that, after rounding down -to the nearest integer, is a valid index for the named vector. - -Using @cmd{COMPUTE} to assign to a variable specified on @cmd{LEAVE} -(@pxref{LEAVE}) resets the variable's left state. Therefore, -@code{LEAVE} should be specified following @cmd{COMPUTE}, not before. - -COMPUTE is a transformation. It does not cause the active file to be -read. - -@node COUNT, FLIP, COMPUTE, Data Manipulation -@section COUNT -@vindex COUNT - -@display -COUNT var_name = var@dots{} (value@dots{}). - -Each value takes one of the following forms: - number - string - num1 THRU num2 - MISSING - SYSMIS -In addition, num1 and num2 can be LO or LOWEST, or HI or HIGHEST, -respectively. -@end display - -@cmd{COUNT} creates or replaces a numeric @dfn{target} variable that -counts the occurrence of a @dfn{criterion} value or set of values over -one or more @dfn{test} variables for each case. - -The target variable values are always nonnegative integers. They are -never missing. The target variable is assigned an F8.2 output format. -@xref{Input/Output Formats}. Any variables, including long and short -string variables, may be test variables. - -User-missing values of test variables are treated just like any other -values. 
They are @strong{not} treated as system-missing values. -User-missing values that are criterion values or inside ranges of -criterion values are counted as any other values. However (for numeric -variables), keyword MISSING may be used to refer to all system- -and user-missing values. - -@cmd{COUNT} target variables are assigned values in the order -specified. In the command @code{COUNT A=A B(1) /B=A B(2).}, the -following actions occur: - -@itemize @minus -@item -The number of occurrences of 1 between @code{A} and @code{B} is counted. - -@item -@code{A} is assigned this value. - -@item -The number of occurrences of 2 between @code{B} and the @strong{new} -value of @code{A} is counted. - -@item -@code{B} is assigned this value. -@end itemize - -Despite this ordering, all @cmd{COUNT} criterion variables must exist -before the procedure is executed---they may not be created as target -variables earlier in the command! Break such a command into two -separate commands. - -The examples below may help to clarify. - -@enumerate A -@item -Assuming @code{Q0}, @code{Q1}, @dots{}, @code{Q9} are numeric variables, -the following commands: - -@enumerate -@item -Count the number of times the value 1 occurs through these variables -for each case and assign the count to variable @code{QCOUNT}. - -@item -Print out the total number of times the value 1 occurs throughout -@emph{all} cases using @cmd{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for -details. -@end enumerate - -@example -COUNT QCOUNT=Q0 TO Q9(1). -DESCRIPTIVES QCOUNT /STATISTICS=SUM. -@end example - -@item -Given these same variables, the following commands: - -@enumerate -@item -Count the number of valid values of these variables for each case and -assign the count to variable @code{QVALID}. - -@item -Multiply each value of @code{QVALID} by 10 to obtain a percentage of -valid values, using @cmd{COMPUTE}. @xref{COMPUTE}, for details.
- -@item -Print out the percentage of valid values across all cases, using -@cmd{DESCRIPTIVES}. @xref{DESCRIPTIVES}, for details. -@end enumerate - -@example -COUNT QVALID=Q0 TO Q9 (LO THRU HI). -COMPUTE QVALID=QVALID*10. -DESCRIPTIVES QVALID /STATISTICS=MEAN. -@end example -@end enumerate - -@node FLIP, IF, COUNT, Data Manipulation -@section FLIP -@vindex FLIP - -@display -FLIP /VARIABLES=var_list /NEWNAMES=var_name. -@end display - -@cmd{FLIP} transposes rows and columns in the active file. It -causes cases to be swapped with variables, and vice versa. - -No subcommands are required. The VARIABLES subcommand specifies -variables that will be transformed into cases. Variables not specified -are discarded. By default, all variables are selected for -transposition. - -The variable specified by NEWNAMES, which must be a string variable, is -used to give names to the variables created by @cmd{FLIP}. If -NEWNAMES is not -specified then the default is a variable named CASE_LBL, if it exists. -If it does not then the variables created by FLIP are named VAR000 -through VAR999, then VAR1000, VAR1001, and so on. - -When a NEWNAMES variable is available, the names must be canonicalized -before becoming variable names. Invalid characters are replaced by -letter @samp{V} in the first position, or by @samp{_} in subsequent -positions. If the name thus generated is not unique, then numeric -extensions are added, starting with 1, until a unique name is found or -there are no remaining possibilities. If the latter occurs then the -FLIP operation aborts. - -The resultant dictionary contains a CASE_LBL variable, which stores the -names of the variables in the dictionary before the transposition. If -the active file is subsequently transposed using @cmd{FLIP}, this -variable can -be used to recreate the original variable names. - -@node IF, RECODE, FLIP, Data Manipulation -@section IF -@vindex IF - -@display -IF condition variable=expression.
- or -IF condition vector(index)=expression. -@end display - -The @cmd{IF} transformation conditionally assigns the value of a target -expression to a target variable, based on the truth of a test -expression. - -Specify a boolean-valued expression (@pxref{Expressions}) to be tested -following the IF keyword. This expression is evaluated for each case. -If the value is true, then the value of the expression is computed and -assigned to the specified variable. If the value is false or missing, -nothing is done. Numeric and short and long string variables may be -assigned. When a string expression's width differs from the target -variable's width, the string result of the expression is truncated or -padded with spaces on the right as necessary. The expression and -variable types must match. - -The target variable may be specified as an element of a vector -(@pxref{VECTOR}). In this case, a vector index expression must be -specified in parentheses following the vector name. The index -expression must evaluate to a numeric value that, after rounding down -to the nearest integer, is a valid index for the named vector. - -Using @cmd{IF} to assign to a variable specified on @cmd{LEAVE} -(@pxref{LEAVE}) resets the variable's left state. Therefore, -@code{LEAVE} should be specified following @cmd{IF}, not before. - -@node RECODE, SORT CASES, IF, Data Manipulation -@section RECODE -@vindex RECODE - -@display -RECODE var_list (src_value@dots{}=dest_value)@dots{} [INTO var_list]. - -src_value may take the following forms: - number - string - num1 THRU num2 - MISSING - SYSMIS - ELSE -Open-ended ranges may be specified using LO or LOWEST for num1 -or HI or HIGHEST for num2. - -dest_value may take the following forms: - num - string - SYSMIS - COPY -@end display - -@cmd{RECODE} translates data from one range of values to -another, via flexible user-specified mappings. Data may be remapped -in-place or copied to new variables. 
Numeric, short string, and long -string data can be recoded. - -Specify the list of source variables, followed by one or more mapping -specifications each enclosed in parentheses. If the data is to be -copied to new variables, specify INTO, then the list of target -variables. String target variables must already have been declared -using @cmd{STRING} or another transformation, but numeric target -variables can -be created on the fly. There must be exactly as many target variables -as source variables. Each source variable is remapped into its -corresponding target variable. - -When INTO is not used, the input and output variables must be of the -same type. Otherwise, string values can be recoded into numeric values, -and vice versa. When this is done and there is no mapping for a -particular value, either a value consisting of all spaces or the -system-missing value is assigned, depending on variable type. - -Mappings are considered from left to right. The first src_value that -matches the value of the source variable causes the target variable to -receive the value indicated by the dest_value. Literal number, string, -and range src_value's should be self-explanatory. MISSING as a -src_value matches any user- or system-missing value. SYSMIS matches the -system missing value only. ELSE is a catch-all that matches anything. -It should be the last src_value specified. - -Numeric and string dest_value's should also be self-explanatory. COPY -causes the input values to be copied to the output. This is only valid -if the source and target variables are of the same type. SYSMIS -indicates the system-missing value. - -If the source variables are strings and the target variables are -numeric, then there is one additional mapping available: (CONVERT), -which must be the last specified mapping. CONVERT causes a number -specified as a string to be converted to a numeric value. If the string -cannot be parsed as a number, then the system-missing value is assigned.
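The mapping rules above might be applied as in the following sketch.
The variables @code{age}, @code{agegrp}, @code{numstr}, and
@code{numval} are hypothetical; @code{age} is assumed to be numeric and
@code{numstr} to be a string variable.

@example
RECODE age (MISSING=SYSMIS) (LO THRU 17=1) (18 THRU 64=2) (ELSE=3)
  INTO agegrp.
RECODE numstr (CONVERT) INTO numval.
@end example

In the first command, MISSING is listed first so that missing values
are caught before the range mappings are considered, and ELSE comes
last as the catch-all. The second command relies on CONVERT alone, so
any string in @code{numstr} that cannot be parsed as a number would
yield the system-missing value in @code{numval}.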
- -Multiple recodings can be specified on a single @cmd{RECODE} invocation. -Introduce additional recodings with a slash (@samp{/}) to -separate them from the previous recodings. - -@node SORT CASES, , RECODE, Data Manipulation -@section SORT CASES -@vindex SORT CASES - -@display -SORT CASES BY var_list. -@end display - -@cmd{SORT CASES} sorts the active file by the values of one or more -variables. - -Specify BY and a list of variables to sort by. By default, variables -are sorted in ascending order. To override sort order, specify (D) or -(DOWN) after a list of variables to get descending order, or (A) or (UP) -for ascending order. These apply to the entire list of variables -preceding them. - -@cmd{SORT CASES} is a procedure. It causes the data to be read. - -@cmd{SORT CASES} attempts to sort the entire active file in main memory. -If main memory is exhausted, it falls back to a merge sort algorithm that -involves writing and reading numerous temporary files. Environment -variables determine the temporary files' location. The first of -SPSSTMPDIR, SPSSXTMPDIR, or TMPDIR that is set determines the location. -Otherwise, if the compiler environment defined P_tmpdir, that is used. -Otherwise, under Unix-like OSes /tmp is used; under MS-DOS, the first of -TEMP, TMP, or root on the current drive is used; under other OSes, the -current directory. - -@node Data Selection, Conditionals and Looping, Data Manipulation, Top -@chapter Selecting data for analysis - -This chapter documents PSPP commands that temporarily or permanently -select data records from the active file for analysis. - -@menu -* FILTER:: Exclude cases based on a variable. -* N OF CASES:: Limit the size of the active file. -* PROCESS IF:: Temporarily excluding cases. -* SAMPLE:: Select a specified proportion of cases. -* SELECT IF:: Permanently delete selected cases. -* SPLIT FILE:: Do multiple analyses with one command. -* TEMPORARY:: Make transformations' effects temporary. 
-* WEIGHT:: Weight cases by a variable. -@end menu - -@node FILTER, N OF CASES, Data Selection, Data Selection -@section FILTER -@vindex FILTER - -@display -FILTER BY var_name. -FILTER OFF. -@end display - -@cmd{FILTER} allows a boolean-valued variable to be used to select -cases from the data stream for processing. - -To set up filtering, specify BY and a variable name. Keyword -BY is optional but recommended. Cases which have a zero or system- or -user-missing value are excluded from analysis, but not deleted from the -data stream. Cases with other values are analyzed. - -@code{FILTER OFF} turns off case filtering. - -Filtering takes place immediately before cases pass to a procedure for -analysis. Only one filter variable may be active at a time. Normally, -case filtering continues until it is explicitly turned off with @code{FILTER -OFF}. However, if @cmd{FILTER} is placed after TEMPORARY, filtering stops -after execution of the next procedure or procedure-like command. - -@node N OF CASES, PROCESS IF, FILTER, Data Selection -@section N OF CASES -@vindex N OF CASES - -@display -N [OF CASES] num_of_cases [ESTIMATED]. -@end display - -Sometimes you may want to disregard cases of your input. @cmd{N} can -do this. @code{N 100} tells PSPP to disregard all cases after the -first 100. - -If the value specified for @cmd{N} is greater than the number of cases -read in, the value is ignored. - -@cmd{N} does not discard cases or prevent them from being read. It -just causes cases beyond the last one specified to be ignored by data -analysis commands. - -A later @cmd{N} command can increase or decrease the number of cases -selected. (To select all the cases without knowing how many there are, -specify a very high number: 100000 or whatever you think is large enough.) - -Transformation procedures performed after @cmd{N} is executed -@emph{do} cause cases to be discarded. 
- -@cmd{SAMPLE}, @cmd{PROCESS IF}, and @cmd{SELECT IF} have -precedence over @cmd{N}---the same results are obtained by both of the -following fragments, given the same random number seeds: - -@example -@i{@dots{}set up, read in data@dots{}} -N 100. -SAMPLE .5. -@i{@dots{}analyze data@dots{}} - -@i{@dots{}set up, read in data@dots{}} -SAMPLE .5. -N 100. -@i{@dots{}analyze data@dots{}} -@end example - -Both fragments above first randomly sample approximately half of the -cases, then select the first 100 of those sampled. - -@cmd{N} with the @code{ESTIMATED} keyword gives an -estimated number of cases before @cmd{DATA LIST} or another command to -read in data. @code{ESTIMATED} never limits the number of cases -processed by procedures. PSPP currently does not make use of -case count estimates. - -@node PROCESS IF, SAMPLE, N OF CASES, Data Selection -@section PROCESS IF -@vindex PROCESS IF - -@example -PROCESS IF expression. -@end example - -@cmd{PROCESS IF} temporarily eliminates cases from the -data stream. Its effects are active only through the execution of the -next procedure or procedure-like command. - -Specify a boolean expression (@pxref{Expressions}). If the value of the -expression is true for a particular case, the case will be analyzed. If -the expression has a false or missing value, then the case will be -deleted from the data stream for this procedure only. - -Regardless of its placement relative to other commands, @cmd{PROCESS IF} -always takes effect immediately before data passes to the procedure. -Only one @cmd{PROCESS IF} command may be in effect at any given time. - -The effects of @cmd{PROCESS IF} are similar, but not identical, to the -effects of executing @cmd{TEMPORARY}, then @cmd{SELECT IF} -(@pxref{SELECT IF}). - -@cmd{PROCESS IF} is deprecated. It is included for compatibility with -old command files. New syntax files should use @cmd{SELECT IF} or -@cmd{FILTER} instead. 
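As a sketch of the recommended replacement, assuming hypothetical
numeric variables @code{age} and @code{income}, a command file that
once used @code{PROCESS IF (age >= 18).} could be written with
@cmd{TEMPORARY} and @cmd{SELECT IF} instead:

@example
TEMPORARY.
SELECT IF (age >= 18).
DESCRIPTIVES income.
@end example

Here the selection would apply only to the @cmd{DESCRIPTIVES}
procedure. As noted above, the effects are similar to, but not
identical with, those of @cmd{PROCESS IF}, so results should be checked
when converting old command files.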
- -@node SAMPLE, SELECT IF, PROCESS IF, Data Selection -@section SAMPLE -@vindex SAMPLE - -@display -SAMPLE num1 [FROM num2]. -@end display - -@cmd{SAMPLE} is used to randomly sample a proportion of the cases in -the active file. @cmd{SAMPLE} is temporary, affecting only the next -procedure, unless that is a data transformation, such as @cmd{SELECT IF} -or @cmd{RECODE}. - -The proportion to sample can be expressed as a single number between 0 -and 1. If @code{k} is the number specified, and @code{N} is the number -of currently-selected cases in the active file, then after -@code{SAMPLE @var{k}.}, approximately @code{k*N} cases will be -selected. - -The proportion to sample can also be specified in the style @code{SAMPLE -@var{m} FROM @var{N}}. With this style, cases are selected as follows: - -@enumerate -@item -If @var{N} is equal to the number of currently-selected cases in the -active file, exactly @var{m} cases will be selected. - -@item -If @var{N} is greater than the number of currently-selected cases in the -active file, an equivalent proportion of cases will be selected. - -@item -If @var{N} is less than the number of currently-selected cases in the -active file, exactly @var{m} cases will be selected @emph{from the first -@var{N} cases in the active file.} -@end enumerate - -@cmd{SAMPLE}, @cmd{SELECT IF}, and @code{PROCESS IF} are performed in -the order specified by the syntax file. - -@cmd{SAMPLE} is ignored before @code{SORT CASES}. - -@cmd{SAMPLE} is always performed before @code{N OF CASES}, regardless -of ordering in the syntax file. @xref{N OF CASES}. - -The same values for @cmd{SAMPLE} may result in different samples. To -obtain the same sample, use the @code{SET} command to set the random -number seed to the same value before each @cmd{SAMPLE}. Different -samples may still result when the file is processed on systems with -differing endianness or floating-point formats. By default, the -random number seed is based on the system time.
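A reproducible sample might be requested as sketched below. The
variable @code{income} is hypothetical, and the @code{SEED=} spelling of
the @cmd{SET} subcommand is an assumption here; consult the description
of @cmd{SET} for the exact form.

@example
SET SEED=12345.
SAMPLE .25.
DESCRIPTIVES income.
@end example

With the seed fixed, rerunning the same commands on the same system
should select the same cases, approximately a quarter of those
currently selected.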
- -@node SELECT IF, SPLIT FILE, SAMPLE, Data Selection -@section SELECT IF -@vindex SELECT IF - -@display -SELECT IF expression. -@end display - -@cmd{SELECT IF} selects cases for analysis based on the value of a -boolean expression. Cases not selected are permanently eliminated -from the active file, unless @cmd{TEMPORARY} is in effect -(@pxref{TEMPORARY}). - -Specify a boolean expression (@pxref{Expressions}). If the value of the -expression is true for a particular case, the case will be analyzed. If -the expression has a false or missing value, then the case will be -deleted from the data stream. - -Place @cmd{SELECT IF} as early in the command file as -possible. Cases that are deleted early can be processed more -efficiently in time and space. - -@node SPLIT FILE, TEMPORARY, SELECT IF, Data Selection -@section SPLIT FILE -@vindex SPLIT FILE - -@display -Two possible syntaxes: - SPLIT FILE BY var_list. - SPLIT FILE OFF. -@end display - -@cmd{SPLIT FILE} allows multiple sets of data present in one data -file to be analyzed separately using single statistical procedure -commands. - -Specify a list of variable names to analyze multiple sets of -data separately. Groups of cases having the same values for these -variables are analyzed by statistical procedure commands as one group. -An independent analysis is carried out for each group of cases, and the -variable values for the group are printed along with the analysis. - -Specify OFF to disable @cmd{SPLIT FILE} and resume analysis of the -entire active file as a single group of data. - -@node TEMPORARY, WEIGHT, SPLIT FILE, Data Selection -@section TEMPORARY -@vindex TEMPORARY - -@display -TEMPORARY. -@end display - -@cmd{TEMPORARY} is used to make the effects of transformations -following its execution temporary. These transformations will -affect only the execution of the next procedure or procedure-like -command. Their effects will not be saved to the active file. - -The only specification is the command name. 
- -@cmd{TEMPORARY} may not appear within a @cmd{DO IF} or @cmd{LOOP} -construct. It may -appear only once between procedures and procedure-like commands. - -An example may help to clarify: - -@example -DATA LIST /X 1-2. -BEGIN DATA. - 2 - 4 -10 -15 -20 -24 -END DATA. -COMPUTE X=X/2. -TEMPORARY. -COMPUTE X=X+3. -DESCRIPTIVES X. -DESCRIPTIVES X. -@end example - -The data read by the first @cmd{DESCRIPTIVES} are 4, 5, 8, -10.5, 13, 15. The data read by the second @cmd{DESCRIPTIVES} are 1, 2, -5, 7.5, 10, 12. - -@node WEIGHT, , TEMPORARY, Data Selection -@section WEIGHT -@vindex WEIGHT - -@display -WEIGHT BY var_name. -WEIGHT OFF. -@end display - -@cmd{WEIGHT} assigns cases varying weights, -changing the frequency distribution of the active file. Execution of -@cmd{WEIGHT} is delayed until data have been read. - -If a variable name is specified, @cmd{WEIGHT} causes the values of that -variable to be used as weighting factors for subsequent statistical -procedures. Use of keyword BY is optional but recommended. Weighting -variables must be numeric. Scratch variables may not be used for -weighting (@pxref{Scratch Variables}). - -When OFF is specified, subsequent statistical procedures will weight all -cases equally. - -A positive integer weighting factor @var{w} on a case will yield the -same statistical output as would replicating the case @var{w} times. -A weighting factor of 0 is treated for statistical purposes as if the -case did not exist in the input. Weighting values need not be -integers, but negative and system-missing values for the weighting -variable are interpreted as weighting factors of 0. User-missing -values are not treated specially. - -@cmd{WEIGHT} does not cause cases in the active file to be replicated in -memory.
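The effect of weighting can be sketched with a small, made-up data set
in which each case carries a frequency count. The variable names
@code{GRADE} and @code{FREQ} are illustrative only.

@example
DATA LIST /GRADE 1 FREQ 3-4.
BEGIN DATA.
1 10
2 25
3  5
END DATA.
WEIGHT BY FREQ.
DESCRIPTIVES GRADE /STATISTICS=MEAN.
@end example

Following the replication rule above, the three cases would be treated
as 10, 25, and 5 cases respectively, so the mean of @code{GRADE} would
be computed as (1*10 + 2*25 + 3*5) / 40 = 1.875 rather than the
unweighted mean of 2.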
- -@node Conditionals and Looping, Statistics, Data Selection, Top -@chapter Conditional and Looping Constructs -@cindex conditionals -@cindex loops -@cindex flow of control -@cindex control flow - -This chapter documents PSPP commands used for conditional execution, -looping, and flow of control. - -@menu -* BREAK:: Exit a loop. -* DO IF:: Conditionally execute a block of code. -* DO REPEAT:: Textually repeat a code block. -* LOOP:: Repeat a block of code. -@end menu - -@node BREAK, DO IF, Conditionals and Looping, Conditionals and Looping -@section BREAK -@vindex BREAK - -@display -BREAK. -@end display - -@cmd{BREAK} terminates execution of the innermost currently executing -@cmd{LOOP} construct. - -@cmd{BREAK} is allowed only inside @cmd{LOOP}@dots{}@cmd{END LOOP}. -@xref{LOOP}, for more details. - -@node DO IF, DO REPEAT, BREAK, Conditionals and Looping -@section DO IF -@vindex DO IF - -@display -DO IF condition. - @dots{} -[ELSE IF condition. - @dots{} -]@dots{} -[ELSE. - @dots{}] -END IF. -@end display - -@cmd{DO IF} allows one of several sets of transformations to be -executed, depending on user-specified conditions. - -If the specified boolean expression evaluates as true, then the block -of code following @cmd{DO IF} is executed. If it evaluates as -missing, then -none of the code blocks is executed. If it is false, then -the boolean expression on the first @cmd{ELSE IF}, if present, is tested in -turn, with the same rules applied. If all expressions evaluate to -false, then the @cmd{ELSE} code block is executed, if it is present. - -@node DO REPEAT, LOOP, DO IF, Conditionals and Looping -@section DO REPEAT -@vindex DO REPEAT - -@display -DO REPEAT repvar_name=expansion@dots{}. - @dots{} -END REPEAT [PRINT]. 
- -expansion takes one of the following forms: - var_list - num_or_range@dots{} - 'string'@dots{} - -num_or_range takes one of the following forms: - number - num1 TO num2 -@end display - -@cmd{DO REPEAT} repeats a block of code, textually substituting -different variables, numbers, or strings into the block with each -repetition. - -Specify a repeat variable name followed by an equals sign (@samp{=}) and -the list of replacements. Replacements can be a list of variables -(which may be existing variables or new variables or a combination -thereof), of numbers, or of strings. When new variable names are -specified, @cmd{DO REPEAT} creates them as numeric variables. When numbers -are specified, runs of integers may be indicated with TO notation, for -instance @samp{1 TO 5} and @samp{1 2 3 4 5} would be equivalent. There -is no equivalent notation for string values. - -Multiple repeat variables can be specified. When this is done, each -variable must have the same number of replacements. - -The code within @cmd{DO REPEAT} is repeated as many times as there are -replacements for each variable. The first time, the first value for -each repeat variable is substituted; the second time, the second value -for each repeat variable is substituted; and so on. - -Repeat variable substitutions work like macros. They take place -anywhere in a line that the repeat variable name occurs as a token, -including command and subcommand names. For this reason it is not a -good idea to select words commonly used in command and subcommand names -as repeat variable identifiers. - -If PRINT is specified on @cmd{END REPEAT}, the commands after substitutions -are made are printed to the listing file, prefixed by a plus sign -(@samp{+}). - -@node LOOP, , DO REPEAT, Conditionals and Looping -@section LOOP -@vindex LOOP - -@display -LOOP [index_var=start TO end [BY incr]] [IF condition]. - @dots{} -END LOOP [IF condition]. -@end display - -@cmd{LOOP} iterates a group of commands. 
A number of -termination options are offered. - -Specify index_var to make that variable count from one value to -another by a particular increment. index_var must be a pre-existing -numeric variable. start, end, and incr are numeric expressions -(@pxref{Expressions}.) - -During the first iteration, index_var is set to the value of start. -During each successive iteration, index_var is increased by the value of -incr. If end > start, then the loop terminates when index_var > end; -otherwise it terminates when index_var < end. If incr is not specified -then it defaults to +1 or -1 as appropriate. - -If end > start and incr < 0, or if end < start and incr > 0, then the -loop is never executed. index_var is nevertheless set to the value of -start. - -Modifying index_var within the loop is allowed, but it has no effect on -the value of index_var in the next iteration. - -Specify a boolean expression for the condition on @cmd{LOOP} to -cause the loop to be executed only if the condition is true. If the -condition is false or missing before the loop contents are executed the -first time, the loop contents are not executed at all. - -If index and condition clauses are both present on @cmd{LOOP}, the index -clause is always evaluated first. - -Specify a boolean expression for the condition on @cmd{END LOOP} to cause -the loop to terminate if the condition is not true after the enclosed -code block is executed. The condition is evaluated at the end of the -loop, not at the beginning. - -If the index clause and both condition clauses are not present, then the -loop is executed MXLOOPS (@pxref{SET}) times. - -@cmd{BREAK} also terminates @cmd{LOOP} execution (@pxref{BREAK}). - -@node Statistics, Utilities, Conditionals and Looping, Top -@chapter Statistics - -This chapter documents the statistical procedures that PSPP supports so -far. - -@menu -* DESCRIPTIVES:: Descriptive statistics. -* FREQUENCIES:: Frequency tables. -* CROSSTABS:: Crosstabulation tables. 
-* T-TEST:: Test Hypotheses about means. -@end menu - -@node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics -@section DESCRIPTIVES - -@display -DESCRIPTIVES - /VARIABLES=var_list - /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@} - /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@} - /SAVE - /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS, - SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT, - SESKEWNESS,SEKURTOSIS@} - /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS, - RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@} - @{A,D@} -@end display - -The @cmd{DESCRIPTIVES} procedure reads the active file and outputs -descriptive -statistics requested by the user. In addition, it can optionally -compute Z-scores. - -The VARIABLES subcommand, which is required, specifies the list of -variables to be analyzed. Keyword VARIABLES is optional. - -All other subcommands are optional: - -The MISSING subcommand determines the handling of missing variables. If -INCLUDE is set, then user-missing values are included in the -calculations. If NOINCLUDE is set, which is the default, user-missing -values are excluded. If VARIABLE is set, then missing values are -excluded on a variable by variable basis; if LISTWISE is set, then -the entire case is excluded whenever any value in that case has a -system-missing or, if INCLUDE is set, user-missing value. - -The FORMAT subcommand affects the output format. Currently the -LABELS/NOLABELS and NOINDEX/INDEX settings are not used. When SERIAL is -set, both valid and missing number of cases are listed in the output; -when NOSERIAL is set, only valid cases are listed. - -The SAVE subcommand causes @cmd{DESCRIPTIVES} to calculate Z scores for all -the specified variables. The Z scores are saved to new variables. 
-Variable names are generated by trying first the original variable name -with Z prepended and truncated to a maximum of 8 characters, then the -names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through -ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence. In addition, Z score -variable names can be specified explicitly on VARIABLES in the variable -list by enclosing them in parentheses after each variable. - -The STATISTICS subcommand specifies the statistics to be displayed: - -@table @code -@item ALL -All of the statistics below. -@item MEAN -Arithmetic mean. -@item SEMEAN -Standard error of the mean. -@item STDDEV -Standard deviation. -@item VARIANCE -Variance. -@item KURTOSIS -Kurtosis and standard error of the kurtosis. -@item SKEWNESS -Skewness and standard error of the skewness. -@item RANGE -Range. -@item MINIMUM -Minimum value. -@item MAXIMUM -Maximum value. -@item SUM -Sum. -@item DEFAULT -Mean, standard deviation of the mean, minimum, maximum. -@item SEKURTOSIS -Standard error of the kurtosis. -@item SESKEWNESS -Standard error of the skewness. -@end table - -The SORT subcommand specifies how the statistics should be sorted. Most -of the possible values should be self-explanatory. NAME causes the -statistics to be sorted by name. By default, the statistics are listed -in the order that they are specified on the VARIABLES subcommand. The A -and D settings request an ascending or descending sort order, -respectively. 
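The first naming rule that SAVE applies (prepend Z to the original name and truncate to 8 characters) can be sketched in C as follows. This is only an illustration of the rule as described; the function name @code{z_score_name} is hypothetical, it is not taken from the PSPP source, and the fallback names (ZSC000 and so on) are not modeled.

```c
#include <assert.h>
#include <stdio.h>

/* Maximum variable name length, as stated in the text above. */
#define MAX_NAME_LEN 8

/* Hypothetical helper: writes the first candidate Z score name for NAME
   into OUT, which must have room for MAX_NAME_LEN + 1 bytes.
   Fallback names such as ZSC000 are not handled here. */
void
z_score_name (const char *name, char out[MAX_NAME_LEN + 1])
{
  assert (name != NULL && name[0] != '\0');

  /* snprintf truncates the result to MAX_NAME_LEN characters and
     always null-terminates it. */
  snprintf (out, MAX_NAME_LEN + 1, "Z%s", name);
}
```

Under this sketch a variable named HEIGHT would yield ZHEIGHT, and an 8-character name such as LONGNAME would lose its last character.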
- -@node FREQUENCIES, CROSSTABS, DESCRIPTIVES, Statistics -@section FREQUENCIES - -@display -FREQUENCIES - /VARIABLES=var_list - /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@} - @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@} - @{LABELS,NOLABELS@} - @{AVALUE,DVALUE,AFREQ,DFREQ@} - @{SINGLE,DOUBLE@} - @{OLDPAGE,NEWPAGE@} - /MISSING=@{EXCLUDE,INCLUDE@} - /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE, - KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM, - SESKEWNESS,SEKURTOSIS,ALL,NONE@} - /NTILES=ntiles - /PERCENTILES=percent@dots{} - -(These options are not currently implemented.) - /BARCHART=@dots{} - /HISTOGRAM=@dots{} - /HBAR=@dots{} - /GROUPED=@dots{} - -(Integer mode.) - /VARIABLES=var_list (low,high)@dots{} -@end display - -The @cmd{FREQUENCIES} procedure outputs frequency tables for specified -variables. -@cmd{FREQUENCIES} can also calculate and display descriptive statistics -(including median and mode) and percentiles. - -In the future, @cmd{FREQUENCIES} will also support graphical output in the -form of bar charts and histograms. In addition, it will be able to -support percentiles for grouped data. - -The VARIABLES subcommand is the only required subcommand. Specify the -variables to be analyzed. In most cases, this is all that is required. -This is known as @dfn{general mode}. - -Occasionally, one may want to invoke a special mode called @dfn{integer -mode}. Normally, in general mode, PSPP will automatically determine -what values occur in the data. In integer mode, the user specifies the -range of values that the data assumes. To invoke this mode, specify a -range of data values in parentheses, separated by a comma. Data values -inside the range are truncated to the nearest integer, then assigned to -that value. If values occur outside this range, they are discarded. - -The FORMAT subcommand controls the output format. 
It has several
-possible settings:
-
-@itemize @bullet
-@item
-TABLE, the default, causes a frequency table to be output for every
-variable specified. NOTABLE prevents them from being output. LIMIT
-with a numeric argument causes them to be output except when there are
-more than the specified number of values in the table.
-
-@item
-STANDARD frequency tables contain more complete information, but also
-take up more space on the printed page. CONDENSE frequency tables are
-less informative but take up less space. ONEPAGE with a numeric
-argument will output standard frequency tables if there are the
-specified number of values or fewer, condensed tables otherwise. ONEPAGE
-without an argument defaults to a threshold of 50 values.
-
-@item
-LABELS causes value labels to be displayed in STANDARD frequency
-tables. NOLABELS prevents this.
-
-@item
-Normally frequency tables are sorted in ascending order by value. This
-is AVALUE. DVALUE tables are sorted in descending order by value.
-AFREQ and DFREQ tables are sorted in ascending and descending order,
-respectively, by frequency count.
-
-@item
-SINGLE spaced frequency tables are closely spaced. DOUBLE spaced
-frequency tables have wider spacing.
-
-@item
-OLDPAGE and NEWPAGE are not currently used.
-@end itemize
-
-The MISSING subcommand controls the handling of user-missing values.
-When EXCLUDE, the default, is set, user-missing values are not included
-in frequency tables or statistics. When INCLUDE is set, user-missing
-values are included. System-missing values are never included in statistics,
-but are listed in frequency tables.
-
-The available STATISTICS are the same as available in @cmd{DESCRIPTIVES}
-(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
-value, and MODE, the mode. (If there are multiple modes, the smallest
-value is reported.) By default, the mean, standard deviation of the
-mean, minimum, and maximum are reported for each variable.
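The tie-breaking rule for MODE (the smallest of several equally frequent values is reported) can be illustrated with a short C sketch. The function name @code{smallest_mode} is hypothetical, and the quadratic scan is chosen for clarity; it is not a description of how PSPP computes the mode internally.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the mode of VALUES[0..N-1].  When several values share the
   highest count, the smallest of them is returned, as described above.
   Simple O(n^2) scan for illustration only. */
double
smallest_mode (const double *values, size_t n)
{
  assert (values != NULL && n > 0);

  double best = values[0];
  size_t best_count = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Count how often values[i] occurs in the whole array. */
      size_t count = 0;
      for (size_t j = 0; j < n; j++)
        if (values[j] == values[i])
          count++;

      /* Prefer a higher count; on a tie, prefer the smaller value. */
      if (count > best_count
          || (count == best_count && values[i] < best))
        {
          best = values[i];
          best_count = count;
        }
    }
  return best;
}
```

For the data 3, 1, 3, 1, 2 both 1 and 3 occur twice, so this sketch reports 1.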
-
-NTILES causes the specified quantiles to be reported. For instance,
-@code{/NTILES=4} would cause quartiles to be reported. In addition,
-particular percentiles can be requested with the PERCENTILES subcommand.
-
-@node CROSSTABS, T-TEST, FREQUENCIES, Statistics
-@section CROSSTABS
-
-@display
-CROSSTABS
- /TABLES=var_list BY var_list [BY var_list]@dots{}
- /MISSING=@{TABLE,INCLUDE,REPORT@}
- /WRITE=@{NONE,CELLS,ALL@}
- /FORMAT=@{TABLES,NOTABLES@}
- @{LABELS,NOLABELS,NOVALLABS@}
- @{PIVOT,NOPIVOT@}
- @{AVALUE,DVALUE@}
- @{NOINDEX,INDEX@}
- @{BOX,NOBOX@}
- /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
- ASRESIDUAL,ALL,NONE@}
- /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
- KAPPA,ETA,CORR,ALL,NONE@}
-
-(Integer mode.)
- /VARIABLES=var_list (low,high)@dots{}
-@end display
-
-The @cmd{CROSSTABS} procedure displays crosstabulation
-tables requested by the user. It can calculate several statistics for
-each cell in the crosstabulation tables. In addition, a number of
-statistics can be calculated for each table itself.
-
-The TABLES subcommand is used to specify the tables to be reported. Any
-number of dimensions is permitted, and any number of variables per
-dimension is allowed. The TABLES subcommand may be repeated as many
-times as needed. This is the only required subcommand in @dfn{general
-mode}.
-
-Occasionally, one may want to invoke a special mode called @dfn{integer
-mode}. Normally, in general mode, PSPP automatically determines
-what values occur in the data. In integer mode, the user specifies the
-range of values that the data assumes. To invoke this mode, specify the
-VARIABLES subcommand, giving a range of data values in parentheses for
-each variable to be used on the TABLES subcommand. Data values inside
-the range are truncated to the nearest integer, then assigned to that
-value. If values occur outside this range, they are discarded. When it
-is present, the VARIABLES subcommand must precede the TABLES
-subcommand.
- -In general mode, numeric and string variables may be specified on -TABLES. Although long string variables are allowed, only their -initial short-string parts are used. In integer mode, only numeric -variables are allowed. - -The MISSING subcommand determines the handling of user-missing values. -When set to TABLE, the default, missing values are dropped on a table by -table basis. When set to INCLUDE, user-missing values are included in -tables and statistics. When set to REPORT, which is allowed only in -integer mode, user-missing values are included in tables but marked with -an @samp{M} (for ``missing'') and excluded from statistical -calculations. - -Currently the WRITE subcommand is ignored. - -The FORMAT subcommand controls the characteristics of the -crosstabulation tables to be displayed. It has a number of possible -settings: - -@itemize @bullet -@item -TABLES, the default, causes crosstabulation tables to be output. -NOTABLES suppresses them. - -@item -LABELS, the default, allows variable labels and value labels to appear -in the output. NOLABELS suppresses them. NOVALLABS displays variable -labels but suppresses value labels. - -@item -PIVOT, the default, causes each TABLES subcommand to be displayed in a -pivot table format. NOPIVOT causes the old-style crosstabulation format -to be used. - -@item -AVALUE, the default, causes values to be sorted in ascending order. -DVALUE asserts a descending sort order. - -@item -INDEX/NOINDEX is currently ignored. - -@item -BOX/NOBOX is currently ignored. -@end itemize - -The CELLS subcommand controls the contents of each cell in the displayed -crosstabulation table. The possible settings are: - -@table @asis -@item COUNT -Frequency count. -@item ROW -Row percent. -@item COLUMN -Column percent. -@item TOTAL -Table percent. -@item EXPECTED -Expected value. -@item RESIDUAL -Residual. -@item SRESIDUAL -Standardized residual. -@item ASRESIDUAL -Adjusted standardized residual. -@item ALL -All of the above. 
-@item NONE
-Suppress cells entirely.
-@end table
-
-@samp{/CELLS} without any settings specified requests COUNT, ROW,
-COLUMN, and TOTAL. If CELLS is not specified at all then only COUNT
-will be selected.
-
-The STATISTICS subcommand selects statistics for computation:
-
-@table @asis
-@item CHISQ
-Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
-correction, linear-by-linear association.
-@item PHI
-Phi.
-@item CC
-Contingency coefficient.
-@item LAMBDA
-Lambda.
-@item UC
-Uncertainty coefficient.
-@item BTAU
-Tau-b.
-@item CTAU
-Tau-c.
-@item RISK
-Risk estimate.
-@item GAMMA
-Gamma.
-@item D
-Somers' D.
-@item KAPPA
-Cohen's Kappa.
-@item ETA
-Eta.
-@item CORR
-Spearman correlation, Pearson's r.
-@item ALL
-All of the above.
-@item NONE
-No statistics.
-@end table
-
-Selected statistics are only calculated when appropriate for the
-statistic. Certain statistics require tables of a particular size, and
-some statistics are calculated only in integer mode.
-
-@samp{/STATISTICS} without any settings selects CHISQ. If the
-STATISTICS subcommand is not given, no statistics are calculated.
-
-@strong{Please note:} Currently the implementation of CROSSTABS has the
-following bugs:
-
-@itemize @bullet
-@item
-Pearson's R (but not Spearman) is off a little.
-@item
-T values for Spearman's R and Pearson's R are wrong.
-@item
-Significance of symmetric and directional measures is not calculated.
-@item
-Asymmetric ASEs and T values for lambda are wrong.
-@item
-ASE of Goodman and Kruskal's tau is not calculated.
-@item
-ASE of symmetric Somers' d is wrong.
-@item
-Approximate T of uncertainty coefficient is wrong.
-@end itemize
-
-Fixes for any of these deficiencies would be welcomed.
-
-@node T-TEST, , CROSSTABS, Statistics
-@comment node-name, next, previous, up
-
-@section T-TEST
-
-@display
-T-TEST
- /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
- /CRITERIA=CIN(confidence)
-
-
-(One Sample mode.)
- TESTVAL=test_value
- /VARIABLES=var_list
-
-
-(Independent Samples mode.)
- GROUPS=var(value1 [, value2])
- /VARIABLES=var_list
-
-
-(Paired Samples mode.)
- PAIRS=var_list [WITH var_list [(PAIRED)] ]
-
-@end display
-
-
-The @cmd{T-TEST} procedure outputs tables used in testing hypotheses about
-means.
-It operates in one of three modes:
-@itemize
-@item One Sample mode.
-@item Independent Samples mode.
-@item Paired Samples mode.
-@end itemize
-
-@noindent
-Each of these modes is described in more detail below.
-There are two optional subcommands which are common to all modes.
-
-The @cmd{/CRITERIA} subcommand tells PSPP the confidence interval used
-in the tests. The default value is 0.95.
-
-
-The @cmd{MISSING} subcommand determines the handling of missing
-values.
-If INCLUDE is set, then user-missing values are included in the
-calculations.
-If EXCLUDE is set, which is the default, user-missing
-values are excluded.
-If LISTWISE is set, then
-the entire case is excluded whenever any value in that case has a
-system-missing or, if INCLUDE is set, user-missing value.
-If ANALYSIS is set, then cases are excluded only where a value used in
-the analysis has a system-missing or, if INCLUDE is set, user-missing value.
-
-
-@menu
-* One Sample Mode:: Testing against a hypothesised mean
-* Independent Samples Mode:: Testing two independent groups for the same mean
-* Paired Samples Mode:: Testing two interdependent groups for the same mean
-@end menu
-
-@node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
-@comment node-name, next, previous, up
-
-@subsection One Sample Mode
-
-The @cmd{TESTVAL} subcommand invokes the One Sample mode.
-This mode is used to test a population mean against a hypothesised
-mean.
-The value given to the @cmd{TESTVAL} subcommand is the value against
-which you wish to test.
-In this mode, you must also use the @cmd{/VARIABLES} subcommand to
-tell PSPP which variables you wish to test.
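For reference, the textbook one-sample t statistic is given below. The manual does not state the formula, so this assumes that PSPP follows the standard definition; the symbols are introduced here for illustration only.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mathit{df} = n - 1
```

Here $\bar{x}$ is the sample mean of the tested variable, $\mu_0$ is the value given on TESTVAL, $s$ is the sample standard deviation, and $n$ is the number of valid cases.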
-
-@node Independent Samples Mode, Paired Samples Mode, One Sample Mode, T-TEST
-@comment node-name, next, previous, up
-@subsection Independent Samples Mode
-
-The @cmd{GROUPS} subcommand invokes Independent Samples mode or
-`Groups' mode.
-This mode is used to test whether two groups of values have the
-same population mean.
-The variable given in the @cmd{GROUPS} subcommand is the independent
-variable which determines to which group the samples belong.
-The values in parentheses are the specific values of the independent
-variable for each group.
-In this mode, you must also use the @cmd{/VARIABLES} subcommand to
-tell PSPP the dependent variables you wish to test.
-
-@node Paired Samples Mode, , Independent Samples Mode, T-TEST
-@comment node-name, next, previous, up
-@subsection Paired Samples Mode
-
-The @cmd{PAIRS} subcommand introduces Paired Samples mode.
-Use this mode when repeated measures have been taken from the same
-samples.
-If the @code{WITH} keyword is omitted, then tables for all
-combinations of variables given in the @cmd{PAIRS} subcommand are
-generated.
-If the @code{WITH} keyword is given, and the @code{(PAIRED)} keyword
-is also given, then the number of variables preceding @code{WITH}
-must be the same as the number following it.
-In this case, tables for each respective pair of variables are
-generated.
-In the event that the @code{WITH} keyword is given, but the
-@code{(PAIRED)} keyword is omitted, then tables for each combination
-of variable preceding @code{WITH} against variable following
-@code{WITH} are generated.
-
-
-@node Utilities, Not Implemented, Statistics, Top
-@chapter Utilities
-
-Commands that don't fit any other category are placed here.
-
-Most of these commands are not affected by commands like @cmd{IF} and
-@cmd{LOOP}:
-they take effect only once, unconditionally, at the time that they are
-encountered in the input.
-
-@menu
-* COMMENT:: Document your syntax file.
-* DOCUMENT:: Document the active file.
-* DISPLAY DOCUMENTS:: Display active file documents.
-* DISPLAY FILE LABEL:: Display the active file label.
-* DROP DOCUMENTS:: Remove documents from the active file.
-* ERASE:: Erase a file.
-* EXECUTE:: Execute pending transformations.
-* FILE LABEL:: Set the active file's label.
-* FINISH:: Terminate the PSPP session.
-* HOST:: Temporarily return to the operating system.
-* INCLUDE:: Include a file within the current one.
-* QUIT:: Terminate the PSPP session.
-* SET:: Adjust PSPP runtime parameters.
-* SUBTITLE:: Provide a document subtitle.
-* TITLE:: Provide a document title.
-@end menu
-
-@node COMMENT, DOCUMENT, Utilities, Utilities
-@section COMMENT
-@vindex COMMENT
-@vindex *
-
-@display
-Two possible syntaxes:
- COMMENT comment text @dots{} .
- *comment text @dots{} .
-@end display
-
-@cmd{COMMENT} is ignored. It is used to provide information to
-the author and other readers of the PSPP syntax file.
-
-@cmd{COMMENT} can extend over any number of lines. Don't forget to
-terminate it with a dot or a blank line.
-
-@node DOCUMENT, DISPLAY DOCUMENTS, COMMENT, Utilities
-@section DOCUMENT
-@vindex DOCUMENT
-
-@display
-DOCUMENT documentary_text.
-@end display
-
-@cmd{DOCUMENT} adds one or more lines of descriptive commentary to the
-active file. Documents added in this way are saved to system files.
-They can be viewed using @cmd{SYSFILE INFO} or @cmd{DISPLAY
-DOCUMENTS}. They can be removed from the active file with @cmd{DROP
-DOCUMENTS}.
-
-Specify the documentary text following the DOCUMENT keyword. You can
-extend the documentary text over as many lines as necessary. Lines are
-truncated to a width of 80 characters. Don't forget to terminate
-the command with a dot or a blank line.
-
-@node DISPLAY DOCUMENTS, DISPLAY FILE LABEL, DOCUMENT, Utilities
-@section DISPLAY DOCUMENTS
-@vindex DISPLAY DOCUMENTS
-
-@display
-DISPLAY DOCUMENTS.
-@end display
-
-@cmd{DISPLAY DOCUMENTS} displays the documents in the active file.
Each -document is preceded by a line giving the time and date that it was -added. @xref{DOCUMENT}. - -@node DISPLAY FILE LABEL, DROP DOCUMENTS, DISPLAY DOCUMENTS, Utilities -@section DISPLAY FILE LABEL -@vindex DISPLAY FILE LABEL - -@display -DISPLAY FILE LABEL. -@end display - -@cmd{DISPLAY FILE LABEL} displays the file label contained in the -active file, -if any. @xref{FILE LABEL}. - -@node DROP DOCUMENTS, ERASE, DISPLAY FILE LABEL, Utilities -@section DROP DOCUMENTS -@vindex DROP DOCUMENTS - -@display -DROP DOCUMENTS. -@end display - -@cmd{DROP DOCUMENTS} removes all documents from the active file. -New documents can be added with @cmd{DOCUMENT} (@pxref{DOCUMENT}). - -@cmd{DROP DOCUMENTS} changes only the active file. It does not modify any -system files stored on disk. - - -@node ERASE, EXECUTE, DROP DOCUMENTS, Utilities -@comment node-name, next, previous, up -@section ERASE -@vindex ERASE - -@display -ERASE FILE file_name. -@end display - -@cmd{ERASE FILE} deletes a file from the local filesystem. -file_name must be quoted. -This command cannot be used if the SAFER setting is active. - - -@node EXECUTE, FILE LABEL, ERASE, Utilities -@section EXECUTE -@vindex EXECUTE - -@display -EXECUTE. -@end display - -@cmd{EXECUTE} causes the active file to be read and all pending -transformations to be executed. - -@node FILE LABEL, FINISH, EXECUTE, Utilities -@section FILE LABEL -@vindex FILE LABEL - -@display -FILE LABEL file_label. -@end display - -@cmd{FILE LABEL} provides a title for the active file. This -title will be saved into system files and portable files that are -created during this PSPP run. - -file_label need not be quoted. If quotes are -included, they become part of the file label. - -@node FINISH, HOST, FILE LABEL, Utilities -@section FINISH -@vindex FINISH - -@display -FINISH. -@end display - -@cmd{FINISH} terminates the current PSPP session and returns -control to the operating system. - -This command is not valid in interactive mode. 
- -@node HOST, INCLUDE, FINISH, Utilities -@comment node-name, next, previous, up -@section HOST -@vindex HOST - -@display -HOST. -@end display - -@cmd{HOST} suspends the current PSPP session and temporarily returns control -to the operating system. -This command cannot be used if the SAFER setting is active. - - -@node INCLUDE, QUIT, HOST, Utilities -@section INCLUDE -@vindex INCLUDE -@vindex @@ - -@display -Two possible syntaxes: - INCLUDE 'filename'. - @@filename. -@end display - -@cmd{INCLUDE} causes the PSPP command processor to read an -additional command file as if it were included bodily in the current -command file. - -Include files may be nested to any depth, up to the limit of available -memory. - -@node QUIT, SET, INCLUDE, Utilities -@section QUIT -@vindex QUIT - -@display -Two possible syntaxes: - QUIT. - EXIT. -@end display - -@cmd{QUIT} terminates the current PSPP session and returns control -to the operating system. - -This command is not valid within a command file. - -@node SET, SUBTITLE, QUIT, Utilities -@section SET -@vindex SET - -@display -SET - -(data input) - /BLANKS=@{SYSMIS,'.',number@} - /DECIMAL=@{DOT,COMMA@} - /FORMAT=fmt_spec - -(program input) - /ENDCMD='.' 
- /NULLINE=@{ON,OFF@} - -(interaction) - /CPROMPT='cprompt_string' - /DPROMPT='dprompt_string' - /ERRORBREAK=@{OFF,ON@} - /MXERRS=max_errs - /MXWARNS=max_warnings - /PROMPT='prompt' - /VIEWLENGTH=@{MINIMUM,MEDIAN,MAXIMUM,n_lines@} - /VIEWWIDTH=n_characters - -(program execution) - /MEXPAND=@{ON,OFF@} - /MITERATE=max_iterations - /MNEST=max_nest - /MPRINT=@{ON,OFF@} - /MXLOOPS=max_loops - /SEED=@{RANDOM,seed_value@} - /UNDEFINED=@{WARN,NOWARN@} - -(data output) - /CC@{A,B,C,D,E@}=@{'npre,pre,suf,nsuf','npre.pre.suf.nsuf'@} - /DECIMAL=@{DOT,COMMA@} - /FORMAT=fmt_spec - -(output routing) - /ECHO=@{ON,OFF@} - /ERRORS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@} - /INCLUDE=@{ON,OFF@} - /MESSAGES=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@} - /PRINTBACK=@{ON,OFF@} - /RESULTS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@} - -(output activation) - /LISTING=@{ON,OFF@} - /PRINTER=@{ON,OFF@} - /SCREEN=@{ON,OFF@} - -(output driver options) - /HEADERS=@{NO,YES,BLANK@} - /LENGTH=@{NONE,length_in_lines@} - /LISTING=filename - /MORE=@{ON,OFF@} - /PAGER=@{OFF,"pager_name"@} - /WIDTH=@{NARROW,WIDTH,n_characters@} - -(logging) - /JOURNAL=@{ON,OFF@} [filename] - /LOG=@{ON,OFF@} [filename] - -(system files) - /COMPRESSION=@{ON,OFF@} - /SCOMPRESSION=@{ON,OFF@} - -(security) - /SAFER=ON - -(obsolete settings accepted for compatibility, but ignored) - /AUTOMENU=@{ON,OFF@} - /BEEP=@{ON,OFF@} - /BLOCK='c' - /BOXSTRING=@{'xxx','xxxxxxxxxxx'@} - /CASE=@{UPPER,UPLOW@} - /COLOR=@dots{} - /CPI=cpi_value - /DISK=@{ON,OFF@} - /EJECT=@{ON,OFF@} - /HELPWINDOWS=@{ON,OFF@} - /HIGHRES=@{ON,OFF@} - /HISTOGRAM='c' - /LOWRES=@{AUTO,ON,OFF@} - /LPI=lpi_value - /MENUS=@{STANDARD,EXTENDED@} - /MXMEMORY=max_memory - /PTRANSLATE=@{ON,OFF@} - /RCOLORS=@dots{} - /RUNREVIEW=@{AUTO,MANUAL@} - /SCRIPTTAB='c' - /TB1=@{'xxx','xxxxxxxxxxx'@} - /TBFONTS='string' - /WORKDEV=drive_letter - /WORKSPACE=workspace_size - /XSORT=@{YES,NO@} -@end display - -@cmd{SET} allows the user to adjust several parameters relating to -PSPP's execution. 
Since there are many subcommands to this command, its
-subcommands will be examined in groups.
-
-On subcommands that take boolean values, ON and YES are synonyms,
-as are OFF and NO.
-
-The data input subcommands affect the way that data is read from data
-files. The data input subcommands are
-
-@table @asis
-@item BLANKS
-This is the value assigned to a data item that is empty or
-contains only whitespace. An argument of SYSMIS or '.' will cause the
-system-missing value to be assigned to null items. This is the
-default. Any real value may be assigned.
-
-@item DECIMAL
-The default DOT setting causes the decimal point character to be
-@samp{.}. A setting of COMMA causes the decimal point character to be
-@samp{,}.
-
-@item FORMAT
-Allows the default numeric input/output format to be specified. The
-default is F8.2. @xref{Input/Output Formats}.
-@end table
-
-Program input subcommands affect the way that programs are parsed when
-they are typed interactively or run from a script. They are
-
-@table @asis
-@item ENDCMD
-This is a single character indicating the end of a command. The default
-is @samp{.}. Don't change this.
-
-@item NULLINE
-Whether a blank line is interpreted as ending the current command. The
-default is ON.
-@end table
-
-Interaction subcommands affect the way that PSPP interacts with an
-online user. The interaction subcommands are
-
-@table @asis
-@item CPROMPT
-The command continuation prompt. The default is @samp{ > }.
-
-@item DPROMPT
-Prompt used when expecting data input within @cmd{BEGIN DATA} (@pxref{BEGIN
-DATA}). The default is @samp{data> }.
-
-@item ERRORBREAK
-Whether an error causes PSPP to stop processing the current command
-file after finishing the current command. The default is OFF.
-
-@item MXERRS
-The maximum number of errors before PSPP halts processing of the current
-command file. The default is 50.
- -@item MXWARNS -The maximum number of warnings + errors before PSPP halts processing the -current command file. The default is 100. - -@item PROMPT -The command prompt. The default is @samp{PSPP> }. - -@item VIEWLENGTH -The length of the screen in lines. MINIMUM means 25 lines, MEDIAN and -MAXIMUM mean 43 lines. Otherwise specify the number of lines. Normally -PSPP should auto-detect your screen size so this shouldn't have to be -used. - -@item VIEWWIDTH -The width of the screen in characters. Normally 80 or 132. -@end table - -Program execution subcommands control the way that PSPP commands -execute. The program execution subcommands are - -@table @asis -@item MEXPAND -@itemx MITERATE -@itemx MNEST -@itemx MPRINT -Currently not used. - -@item MXLOOPS -The maximum number of iterations for an uncontrolled loop (@pxref{LOOP}). - -@item SEED -The initial pseudo-random number seed. Set to a real number or to -RANDOM, which will obtain an initial seed from the current time of day. - -@item UNDEFINED -Currently not used. -@end table - -Data output subcommands affect the format of output data. These -subcommands are - -@table @asis -@item CCA -@itemx CCB -@itemx CCC -@itemx CCD -@itemx CCE -Set up custom currency formats. The argument is a string which must -contain exactly three commas or exactly three periods. If commas, then -the grouping character for the currency format is @samp{,}, and the -decimal point character is @samp{.}; if periods, then the situation is -reversed. - -The commas or periods divide the string into four fields, which are, in -order, the negative prefix, prefix, suffix, and negative suffix. When a -value is formatted using the custom currency format, the prefix precedes -the value formatted and the suffix follows it. In addition, if the -value is negative, the negative prefix precedes the prefix and the -negative suffix follows the suffix. - -@item DECIMAL -The default DOT setting causes the decimal point character to be -@samp{.}. 
A setting of COMMA causes the decimal point character to be -@samp{,}. - -@item FORMAT -Allows the default numeric input/output format to be specified. The -default is F8.2. @xref{Input/Output Formats}. -@end table - -Output routing subcommands affect where the output of transformations -and procedures is sent. These subcommands are - -@table @asis -@item ECHO - -If turned on, commands are written to the listing file as they are read -from command files. The default is OFF. - -@itemx ERRORS -@itemx INCLUDE -@itemx MESSAGES -@item PRINTBACK -@item RESULTS -Currently not used. -@end table - -Output activation subcommands affect whether output devices of -particular types are enabled. These subcommands are - -@table @asis -@item LISTING -Enable or disable listing devices. - -@item PRINTER -Enable or disable printer devices. - -@item SCREEN -Enable or disable screen devices. -@end table - -Output driver option subcommands affect output drivers' settings. These -subcommands are - -@table @asis -@item HEADERS -@itemx LENGTH -@itemx LISTING -@itemx MORE -@itemx PAGER -@itemx WIDTH -Currently not used. -@end table - -Logging subcommands affect logging of commands executed to external -files. These subcommands are - -@table @asis -@item JOURNAL -@item LOG -Not currently used. -@end table - -System file subcommands affect the default format of system files -produced by PSPP. These subcommands are - -@table @asis -@item COMPRESSION -Not currently used. - -@item SCOMPRESSION -Whether system files created by @cmd{SAVE} or @cmd{XSAVE} are -compressed by default. The default is ON. -@end table - -Security subcommands affect the operations that commands are allowed to -perform. The security subcommands are - -@table @asis -@item SAFER -When set, this setting cannot ever be reset, for obvious security -reasons. Setting this option disables the following operations: - -@itemize @bullet -@item -The ERASE command. -@item -The HOST command. 
-@item -Pipe filenames (filenames beginning or ending with @samp{|}). -@end itemize - -Be aware that this setting does not guarantee safety (commands can still -overwrite files, for instance) but it is an improvement. -@end table - -@node SUBTITLE, TITLE, SET, Utilities -@section SUBTITLE -@vindex SUBTITLE - -@display -SUBTITLE 'subtitle_string'. - or -SUBTITLE subtitle_string. -@end display - -@cmd{SUBTITLE} provides a subtitle to a particular PSPP -run. This subtitle appears at the top of each output page below the -title, if headers are enabled on the output device. - -Specify a subtitle as a string in quotes. The alternate syntax that did -not require quotes is now obsolete. If it is used then the subtitle is -converted to all uppercase. - -@node TITLE, , SUBTITLE, Utilities -@section TITLE -@vindex TITLE - -@display -TITLE 'title_string'. - or -TITLE title_string. -@end display - -@cmd{TITLE} provides a title to a particular PSPP run. -This title appears at the top of each output page, if headers are enabled -on the output device. - -Specify a title as a string in quotes. The alternate syntax that did -not require quotes is now obsolete. If it is used then the title is -converted to all uppercase. - -@node Not Implemented, Data File Format, Utilities, Top -@chapter Not Implemented - -This chapter lists parts of the PSPP language that are not yet -implemented. - -The following transformations and utilities are not yet implemented, but -they will be supported in a later release. - -@itemize @bullet -@item -ADD FILES -@item -ANOVA -@item -DEFINE -@item -FILE TYPE -@item -GET SAS -@item -GET TRANSLATE -@item -MCONVERT -@item -PLOT -@item -PRESERVE -@item -PROCEDURE OUTPUT -@item -RESTORE -@item -SAVE TRANSLATE -@item -SHOW -@item -UPDATE -@end itemize - -The following transformations and utilities are not implemented. There -are no plans to support them in future releases. Contributions to -implement them will still be accepted. 
- -@itemize @bullet -@item -EDIT -@item -GET DATABASE -@item -GET OSIRIS -@item -GET SCSS -@item -GSET -@item -HELP -@item -INFO -@item -INPUT MATRIX -@item -KEYED DATA LIST -@item -NUMBERED and UNNUMBERED -@item -OPTIONS -@item -REVIEW -@item -SAVE SCSS -@item -SPSS MANAGER -@item -STATISTICS -@end itemize - -@node Data File Format, Portable File Format, Not Implemented, Top -@chapter Data File Format - -PSPP necessarily uses the same format for system files as do the -products with which it is compatible. This chapter is a description of -that format. - -There are three data types used in system files: 32-bit integers, 64-bit -floating points, and 1-byte characters. In this document these will -simply be referred to as @code{int32}, @code{flt64}, and @code{char}, -the names that are used in the PSPP source code. Every field of type -@code{int32} or @code{flt64} is aligned on a 32-bit boundary. - -The endianness of data in PSPP system files is not specified. System -files output on a computer of a particular endianness will have the -endianness of that computer. However, PSPP can read files of either -endianness, regardless of its host computer's endianness. PSPP -translates endianness for both integer and floating point numbers. - -Floating point formats are also not specified. PSPP does not -translate between floating point formats. This is unlikely to be a -problem as all modern computer architectures use IEEE 754 format for -floating point representation. - -The PSPP system-missing value is represented by the largest possible -negative number in the floating point format; in C, this is most likely -@code{-DBL_MAX}. There are two other important values used in missing -values: @code{HIGHEST} and @code{LOWEST}. These are represented by the -largest possible positive number (probably @code{DBL_MAX}) and the -second-largest negative number. 
The latter must be determined in a -system-dependent manner; in IEEE 754 format it is represented by value -@code{0xffeffffffffffffe}. - -System files are divided into records. Each record begins with an -@code{int32} giving a numeric record type. Individual record types are -described below: - -@menu -* File Header Record:: -* Variable Record:: -* Value Label Record:: -* Value Label Variable Record:: -* Document Record:: -* Machine int32 Info Record:: -* Machine flt64 Info Record:: -* Miscellaneous Informational Records:: -* Dictionary Termination Record:: -* Data Record:: -@end menu - -@node File Header Record, Variable Record, Data File Format, Data File Format -@section File Header Record - -The file header is always the first record in the file. - -@example -struct sysfile_header - @{ - char rec_type[4]; - char prod_name[60]; - int32 layout_code; - int32 case_size; - int32 compressed; - int32 weight_index; - int32 ncases; - flt64 bias; - char creation_date[9]; - char creation_time[8]; - char file_label[64]; - char padding[3]; - @}; -@end example - -@table @code -@item char rec_type[4]; -Record type code. Always set to @samp{$FL2}. This is the only record -for which the record type is not of type @code{int32}. - -@item char prod_name[60]; -Product identification string. This always begins with the characters -@samp{@@(#) SPSS DATA FILE}. PSPP uses the remaining characters to -give its version and the operating system name; for example, @samp{GNU -pspp 0.1.4 - sparc-sun-solaris2.5.2}. The string is truncated if it -would be longer than 60 characters; otherwise it is padded on the right -with spaces. - -@item int32 layout_code; -Always set to 2. PSPP reads this value to determine the -file's endianness. - -@item int32 case_size; -Number of data elements per case. This is the number of variables, -except that long string variables add extra data elements (one for every -8 characters after the first 8). 
- -@item int32 compressed; -Set to 1 if the data in the file is compressed, 0 otherwise. - -@item int32 weight_index; -If one of the variables in the data set is used as a weighting variable, -set to the index of that variable. Otherwise, set to 0. - -@item int32 ncases; -Set to the number of cases in the file if it is known, or -1 otherwise. - -In the general case it is not possible to determine the number of cases -that will be output to a system file at the time that the header is -written. The way that this is dealt with is by writing the entire -system file, including the header, then seeking back to the beginning of -the file and writing just the @code{ncases} field. For `files' in which -this is not valid, the seek operation fails. In this case, -@code{ncases} remains -1. - -@item flt64 bias; -Compression bias. Always set to 100. The significance of this value is -that only numbers between @code{(1 - bias)} and @code{(251 - bias)} can -be compressed. - -@item char creation_date[9]; -Set to the date of creation of the system file, in @samp{dd mmm yy} -format, with the month as standard English abbreviations, using an -initial capital letter and following with lowercase. If the date is not -available then this field is arbitrarily set to @samp{01 Jan 70}. - -@item char creation_time[8]; -Set to the time of creation of the system file, in @samp{hh:mm:ss} -format and using 24-hour time. If the time is not available then this -field is arbitrarily set to @samp{00:00:00}. - -@item char file_label[64]; -Set to the file label declared by the user, if any. Padded on the -right with spaces. - -@item char padding[3]; -Ignored padding bytes to make the structure a multiple of 32 bits in -length. Set to zeros. -@end table - -@node Variable Record, Value Label Record, File Header Record, Data File Format -@section Variable Record - -Immediately following the header must come the variable records.
There -must be one variable record for every variable and every 8 characters in -a long string beyond the first 8; i.e., there must be exactly as many -variable records as the value specified for @code{case_size} in the file -header record. - -@example -struct sysfile_variable - @{ - int32 rec_type; - int32 type; - int32 has_var_label; - int32 n_missing_values; - int32 print; - int32 write; - char name[8]; - - /* The following two fields are present - only if has_var_label is 1. */ - int32 label_len; - char label[/* variable length */]; - - /* The following field is present only - if n_missing_values is not 0. */ - flt64 missing_values[/* variable length */]; - @}; -@end example - -@table @code -@item int32 rec_type; -Record type code. Always set to 2. - -@item int32 type; -Variable type code. Set to 0 for a numeric variable. For a short -string variable or the first part of a long string variable, this is set -to the width of the string. For the second and subsequent parts of a -long string variable, set to -1, and the remaining fields in the -structure are ignored. - -@item int32 has_var_label; -If this variable has a variable label, set to 1; otherwise, set to 0. - -@item int32 n_missing_values; -If the variable has no missing values, set to 0. If the variable has -one, two, or three discrete missing values, set to 1, 2, or 3, -respectively. If the variable has a range of missing values, set to --2; if the variable has a range of missing values plus a single -discrete value, set to -3. - -@item int32 print; -Print format for this variable. See below. - -@item int32 write; -Write format for this variable. See below. - -@item char name[8]; -Variable name. The variable name must begin with a capital letter or -the at-sign (@samp{@@}). Subsequent characters may also be octothorpes -(@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}), or full -stops (@samp{.}). The variable name is padded on the right with spaces.
- -@item int32 label_len; -This field is present only if @code{has_var_label} is set to 1. It is -set to the length, in characters, of the variable label, which must be a -number between 0 and 120. - -@item char label[/* variable length */]; -This field is present only if @code{has_var_label} is set to 1. It has -length @code{label_len}, rounded up to the nearest multiple of 32 bits. -The first @code{label_len} characters are the variable's variable label. - -@item flt64 missing_values[/* variable length */]; -This field is present only if @code{n_missing_values} is not 0. It has -the same number of elements as the absolute value of -@code{n_missing_values}. For discrete missing values, each element -represents one missing value. When a range is present, the first -element denotes the minimum value in the range, and the second element -denotes the maximum value in the range. When a range plus a value are -present, the third element denotes the additional discrete missing -value. HIGHEST and LOWEST are indicated as described in the chapter -introduction. -@end table - -The @code{print} and @code{write} members of sysfile_variable are output -formats coded into @code{int32} types. The LSB (least-significant byte) -of the @code{int32} represents the number of decimal places, and the -next two bytes in order of increasing significance represent field width -and format type, respectively. The MSB (most-significant byte) is not -used and should be set to zero. - -Format types are defined as follows: -@table @asis -@item 0 -Not used. -@item 1 -@code{A} -@item 2 -@code{AHEX} -@item 3 -@code{COMMA} -@item 4 -@code{DOLLAR} -@item 5 -@code{F} -@item 6 -@code{IB} -@item 7 -@code{PIBHEX} -@item 8 -@code{P} -@item 9 -@code{PIB} -@item 10 -@code{PK} -@item 11 -@code{RB} -@item 12 -@code{RBHEX} -@item 13 -Not used. -@item 14 -Not used. -@item 15 -@code{Z} -@item 16 -@code{N} -@item 17 -@code{E} -@item 18 -Not used. -@item 19 -Not used. 
-@item 20 -@code{DATE} -@item 21 -@code{TIME} -@item 22 -@code{DATETIME} -@item 23 -@code{ADATE} -@item 24 -@code{JDATE} -@item 25 -@code{DTIME} -@item 26 -@code{WKDAY} -@item 27 -@code{MONTH} -@item 28 -@code{MOYR} -@item 29 -@code{QYR} -@item 30 -@code{WKYR} -@item 31 -@code{PCT} -@item 32 -@code{DOT} -@item 33 -@code{CCA} -@item 34 -@code{CCB} -@item 35 -@code{CCC} -@item 36 -@code{CCD} -@item 37 -@code{CCE} -@item 38 -@code{EDATE} -@item 39 -@code{SDATE} -@end table - -@node Value Label Record, Value Label Variable Record, Variable Record, Data File Format -@section Value Label Record - -Value label records must follow the variable records and must precede -the dictionary termination record. Other than this, they may appear -anywhere in the system file. Every value label record must be -immediately followed by a value label variable record, described below. - -Value label records begin with @code{rec_type}, an @code{int32} value -set to the record type of 3. This is followed by @code{count}, an -@code{int32} value set to the number of value labels present in this -record. - -These two fields are followed by a series of @code{count} tuples. Each -tuple is divided into two fields, the value and the label. The first of -these, the value, is composed of a 64-bit value, which is either a -@code{flt64} value or up to 8 characters (padded on the right to 8 -bytes) denoting a short string value. Whether the value is a -@code{flt64} or a character string is not defined inside the value label -record. - -The second field in the tuple, the label, has variable length. The -first @code{char} is a count of the number of characters in the value -label. The remainder of the field is the label itself. The field is -padded on the right to a multiple of 64 bits in length.
- -@node Value Label Variable Record, Document Record, Value Label Record, Data File Format -@section Value Label Variable Record - -Every value label variable record must be immediately preceded by a -value label record, described above. - -@example -struct sysfile_value_label_variable - @{ - int32 rec_type; - int32 count; - int32 vars[/* variable length */]; - @}; -@end example - -@table @code -@item int32 rec_type; -Record type. Always set to 4. - -@item int32 count; -Number of variables to which the associated value labels from the value -label record are to be applied. - -@item int32 vars[/* variable length */]; -A list of variables to which to apply the value labels. There are -@code{count} elements. -@end table - -@node Document Record, Machine int32 Info Record, Value Label Variable Record, Data File Format -@section Document Record - -There must be no more than one document record per system file. -Document records must follow the variable records and precede the -dictionary termination record. - -@example -struct sysfile_document - @{ - int32 rec_type; - int32 n_lines; - char lines[/* variable length */][80]; - @}; -@end example - -@table @code -@item int32 rec_type; -Record type. Always set to 6. - -@item int32 n_lines; -Number of lines of documents present. - -@item char lines[/* variable length */][80]; -Document lines. The number of elements is defined by @code{n_lines}. -Lines shorter than 80 characters are padded on the right with spaces. -@end table - -@node Machine int32 Info Record, Machine flt64 Info Record, Document Record, Data File Format -@section Machine @code{int32} Info Record - -There must be no more than one machine @code{int32} info record per -system file. Machine @code{int32} info records must follow the variable -records and precede the dictionary termination record. - -@example -struct sysfile_machine_int32_info - @{ - /* Header. */ - int32 rec_type; - int32 subtype; - int32 size; - int32 count; - - /* Data.
*/ - int32 version_major; - int32 version_minor; - int32 version_revision; - int32 machine_code; - int32 floating_point_rep; - int32 compression_code; - int32 endianness; - int32 character_code; - @}; -@end example - -@table @code -@item int32 rec_type; -Record type. Always set to 7. - -@item int32 subtype; -Record subtype. Always set to 3. - -@item int32 size; -Size of each piece of data in the data part, in bytes. Always set to 4. - -@item int32 count; -Number of pieces of data in the data part. Always set to 8. - -@item int32 version_major; -PSPP major version number. In version @var{x}.@var{y}.@var{z}, this -is @var{x}. - -@item int32 version_minor; -PSPP minor version number. In version @var{x}.@var{y}.@var{z}, this -is @var{y}. - -@item int32 version_revision; -PSPP version revision number. In version @var{x}.@var{y}.@var{z}, -this is @var{z}. - -@item int32 machine_code; -Machine code. PSPP always sets this field to -1, but other -values may appear. - -@item int32 floating_point_rep; -Floating point representation code. For IEEE 754 systems this is 1. -IBM 370 sets this to 2, and DEC VAX E to 3. - -@item int32 compression_code; -Compression code. Always set to 1. - -@item int32 endianness; -Machine endianness. 1 indicates big-endian, 2 indicates little-endian. - -@item int32 character_code; -Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3 -indicates 8-bit ASCII, 4 indicates DEC Kanji. -@end table - -@node Machine flt64 Info Record, Miscellaneous Informational Records, Machine int32 Info Record, Data File Format -@section Machine @code{flt64} Info Record - -There must be no more than one machine @code{flt64} info record per -system file. Machine @code{flt64} info records must follow the variable -records and precede the dictionary termination record. - -@example -struct sysfile_machine_flt64_info - @{ - /* Header. */ - int32 rec_type; - int32 subtype; - int32 size; - int32 count; - - /* Data.
*/ - flt64 sysmis; - flt64 highest; - flt64 lowest; - @}; -@end example - -@table @code -@item int32 rec_type; -Record type. Always set to 7. - -@item int32 subtype; -Record subtype. Always set to 4. - -@item int32 size; -Size of each piece of data in the data part, in bytes. Always set to 8. - -@item int32 count; -Number of pieces of data in the data part. Always set to 3. - -@item flt64 sysmis; -The system missing value. - -@item flt64 highest; -The value used for HIGHEST in missing values. - -@item flt64 lowest; -The value used for LOWEST in missing values. -@end table - -@node Miscellaneous Informational Records, Dictionary Termination Record, Machine flt64 Info Record, Data File Format -@section Miscellaneous Informational Records - -Miscellaneous informational records must follow the variable records and -precede the dictionary termination record. - -Miscellaneous informational records are ignored by PSPP when reading -system files. They are not written by PSPP when writing system files. - -@example -struct sysfile_misc_info - @{ - /* Header. */ - int32 rec_type; - int32 subtype; - int32 size; - int32 count; - - /* Data. */ - char data[/* variable length */]; - @}; -@end example - -@table @code -@item int32 rec_type; -Record type. Always set to 7. - -@item int32 subtype; -Record subtype. May take any value. - -@item int32 size; -Size of each piece of data in the data part. Should have the value 4 or -8, for @code{int32} and @code{flt64}, respectively. - -@item int32 count; -Number of pieces of data in the data part. - -@item char data[/* variable length */]; -Arbitrary data. There must be @code{size} times @code{count} bytes of -data. -@end table - -@node Dictionary Termination Record, Data Record, Miscellaneous Informational Records, Data File Format -@section Dictionary Termination Record - -The dictionary termination record must follow all other records, except -for the actual cases, which it must precede.
There must be exactly one -dictionary termination record in every system file. - -@example -struct sysfile_dict_term - @{ - int32 rec_type; - int32 filler; - @}; -@end example - -@table @code -@item int32 rec_type; -Record type. Always set to 999. - -@item int32 filler; -Ignored padding. Should be set to 0. -@end table - -@node Data Record, , Dictionary Termination Record, Data File Format -@section Data Record - -Data records must follow all other records in the data file. There must -be at least one data record in every system file. - -The format of data records varies depending on whether the data is -compressed. Regardless, the data is arranged in a series of 8-byte -elements. - -When data is not compressed, every case is composed of @code{case_size} -of these 8-byte elements, where @code{case_size} comes from the file -header record (@pxref{File Header Record}). Each element corresponds to -the variable declared in the respective variable record (@pxref{Variable -Record}). Numeric values are given in @code{flt64} format; string -values are literal character strings, padded on the right when -necessary. - -Compressed data is arranged in the following manner: the first 8-byte -element in the data section is divided into a series of 1-byte command -codes. These codes have meanings as described below: - -@table @asis -@item 0 -Ignored. If the program writing the system file accumulates compressed -data in blocks of fixed length, 0 bytes can be used to pad out extra -bytes remaining at the end of a fixed-size block. - -@item 1 through 251 -These values indicate that the corresponding numeric variable has the -value @code{(@var{code} - @var{bias})} for the case being read, where -@var{code} is the value of the compression code and @var{bias} is the -field @code{bias} from the file header. For example, -code 105 with bias 100.0 (the normal value) indicates a numeric variable -of value 5. - -@item 252 -End of file.
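The layout of the label field in each tuple can be sketched in C as follows. This is only an illustrative sketch, not code taken from the PSPP sources: the function names @code{value_label_field_size} and @code{read_value_label} are hypothetical, and the sketch assumes that the field has already been read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the size in bytes of one label field,
   that is, a 1-byte length followed by the label text, padded on the
   right to a multiple of 8 bytes (64 bits). */
size_t
value_label_field_size (unsigned char label_len)
{
  size_t unpadded = 1 + (size_t) label_len;
  return (unpadded + 7) / 8 * 8;
}

/* Hypothetical helper: copies the label stored at FIELD into OUT as a
   null-terminated string.  AVAIL is the number of bytes readable at
   FIELD.  Returns the number of bytes that the field occupies, padding
   included, or 0 if fewer than that many bytes are available. */
size_t
read_value_label (const unsigned char *field, size_t avail, char out[256])
{
  size_t size;

  assert (field != NULL && out != NULL);
  if (avail < 1)
    return 0;
  size = value_label_field_size (field[0]);
  if (avail < size)
    return 0;
  memcpy (out, field + 1, field[0]);
  out[field[0]] = '\0';
  return size;
}
```

Under this reading, a 3-character label occupies 8 bytes in the record and an 8-character label occupies 16, because the length byte counts toward the padded size.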
This code may or may not appear at the end of the data -stream. PSPP always outputs this code but its use is not required. - -@item 253 -This value indicates that the numeric or string value is not -compressible. The value is stored in the 8-byte element following the -current block of command bytes. If this value appears twice in a block -of command bytes, then it indicates the second element following the -command bytes, and so on. - -@item 254 -Used to indicate a string value that is all spaces. - -@item 255 -Used to indicate the system-missing value. -@end table - -When the end of the first 8-byte element of command bytes is reached, -any blocks of non-compressible values are skipped, and the next element -of command bytes is read and interpreted, until the end of the file is -reached. - -@node Portable File Format, q2c Input Format, Data File Format, Top -@chapter Portable File Format - -These days, most computers use the same internal data formats for -integer and floating-point data, if one ignores little differences like -big- versus little-endian byte ordering. However, occasionally it is -necessary to exchange data between systems with incompatible data -formats. This is what portable files are designed to do. - -@strong{Please note:} Although all of the following information is -correct, as far as the author has been able to ascertain, it is gleaned -from examination of ASCII-formatted portable files only, so some of it -may be incorrect in the general case. - -@menu -* Portable File Characters:: -* Portable File Structure:: -* Portable File Header:: -* Version and Date Info Record:: -* Identification Records:: -* Variable Count Record:: -* Variable Records:: -* Value Label Records:: -* Portable File Data:: -@end menu - -@node Portable File Characters, Portable File Structure, Portable File Format, Portable File Format -@section Portable File Characters - -Portable files are arranged as a series of lines of exactly 80 -characters each. 
Each line is terminated by a carriage-return, -line-feed sequence (henceforth, ``newline''). Newlines are not -delimiters: they are only used to avoid line-length limitations existing -on some operating systems. - -The file must be terminated with a @samp{Z} character. In addition, if -the final line in the file does not have exactly 80 characters, then it -is padded on the right with @samp{Z} characters. (The file contents may -be in any character set; the file contains a description of its own -character set, as explained in the next section. Therefore, the -@samp{Z} character is not necessarily an ASCII @samp{Z}.) - -For the rest of the description of the portable file format, newlines -and the trailing @samp{Z}s will be ignored, as if they did not exist, -because they are not an important part of understanding the file -contents. - -@node Portable File Structure, Portable File Header, Portable File Characters, Portable File Format -@section Portable File Structure - -Every portable file consists of the following records, in sequence: - -@itemize @bullet - -@item -File header. - -@item -Version and date info. - -@item -Product identification. - -@item -Subproduct identification (optional). - -@item -Variable count. - -@item -Variables. Each variable record may optionally be followed by a -missing value record and a variable label record. - -@item -Value labels (optional). - -@item -Data. -@end itemize - -Most records are identified by a single-character tag code. The file -header and version info record do not have a tag. - -Other than these single-character codes, there are three types of fields -in a portable file: floating-point, integer, and string. Floating-point -fields have the following format: - -@itemize @bullet - -@item -Zero or more leading spaces. - -@item -Optional asterisk (@samp{*}), which indicates a missing value. 
The -asterisk must be followed by a single character, generally a period -(@samp{.}), but it appears that other characters may also be possible. -This completes the specification of a missing value. - -@item -Optional minus sign (@samp{-}) to indicate a negative number. - -@item -A whole number, consisting of one or more base-30 digits: @samp{0} -through @samp{9} plus capital letters @samp{A} through @samp{T}. - -@item -A fraction, consisting of a radix point (@samp{.}) followed by one or -more base-30 digits (optional). - -@item -An exponent, consisting of a plus or minus sign (@samp{+} or @samp{-}) -followed by one or more base-30 digits (optional). - -@item -A forward slash (@samp{/}). -@end itemize - -Integer fields take a form identical to floating-point fields, but they -may not contain a fraction. - -String fields take the form of an integer field having value @var{n}, -followed by exactly @var{n} characters, which are the string content. - -@node Portable File Header, Version and Date Info Record, Portable File Structure, Portable File Format -@section Portable File Header - -Every portable file begins with a 464-byte header, consisting of a -200-byte collection of vanity splash strings, followed by a 256-byte -character set translation table, followed by an 8-byte tag string. - -The 200-byte segment is divided into five 40-byte sections, each of -which represents the string @code{ASCII SPSS PORT FILE} in a different -character set encoding. (If the file is encoded in EBCDIC then the -string is actually @code{EBCDIC SPSS PORT FILE}, and so on.) These -strings are padded on the right with spaces in their own character set. - -It appears that these strings exist only to inform those who might view -the file on a screen, and that they are not parsed by SPSS products. -Thus, they can be safely ignored.
For those interested, the strings are -supposed to be in the following character sets, in the specified order: -EBCDIC, 7-bit ASCII, CDC 6-bit ASCII, 6-bit ASCII, Honeywell 6-bit -ASCII. - -The 256-byte segment describes a mapping from the character set used in -the portable file to an arbitrary character set having characters at the -following positions: - -@table @asis -@item 0--60 - -Control characters. Not important enough to describe in full here. - -@item 61--63 - -Reserved. - -@item 64--73 - -Digits @samp{0} through @samp{9}. - -@item 74--99 - -Capital letters @samp{A} through @samp{Z}. - -@item 100--125 - -Lowercase letters @samp{a} through @samp{z}. - -@item 126 - -Space. - -@item 127--130 - -Symbols @code{.<(+} - -@item 131 - -Solid vertical pipe. - -@item 132--142 - -Symbols @code{&[]!$*);^-/} - -@item 143 - -Broken vertical pipe. - -@item 144--150 - -Symbols @code{,%_>}?@code{`:} @c @code{?} is an inverted question mark - -@item 151 - -British pound symbol. - -@item 152--155 - -Symbols @code{@@'="}. - -@item 156 - -Less than or equal symbol. - -@item 157 - -Empty box. - -@item 158 - -Plus or minus. - -@item 159 - -Filled box. - -@item 160 - -Degree symbol. - -@item 161 - -Dagger. - -@item 162 - -Symbol @samp{~}. - -@item 163 - -En dash. - -@item 164 - -Lower left corner box draw. - -@item 165 - -Upper left corner box draw. - -@item 166 - -Greater than or equal symbol. - -@item 167--176 - -Superscript @samp{0} through @samp{9}. - -@item 177 - -Lower right corner box draw. - -@item 178 - -Upper right corner box draw. - -@item 179 - -Not equal symbol. - -@item 180 - -Em dash. - -@item 181 - -Superscript @samp{(}. - -@item 182 - -Superscript @samp{)}. - -@item 183 - -Horizontal dagger (?). - -@item 184--186 - -Symbols @samp{@{@}\}. -@item 187 - -Cents symbol. - -@item 188 - -Centered dot, or bullet. - -@item 189--255 - -Reserved. 
-@end table - -Symbols that are not defined in a particular character set are set to -the same value as symbol 64; i.e., to @samp{0}. - -The 8-byte tag string consists of the exact characters @code{SPSSPORT} -in the portable file's character set, which can be used to verify that -the file is indeed a portable file. - -@node Version and Date Info Record, Identification Records, Portable File Header, Portable File Format -@section Version and Date Info Record - -This record does not have a tag code. It has the following structure: - -@itemize @bullet -@item -A single character identifying the file format version. The letter A -represents version 0, and so on. - -@item -An 8-character string field giving the file creation date in the format -YYYYMMDD. - -@item -A 6-character string field giving the file creation time in the format -HHMMSS. -@end itemize - -@node Identification Records, Variable Count Record, Version and Date Info Record, Portable File Format -@section Identification Records - -The product identification record has tag code @samp{1}. It consists of -a single string field giving the name of the product that wrote the -portable file. - -The subproduct identification record has tag code @samp{3}. It -consists of a single string field giving additional information on the -product that wrote the portable file. - -@node Variable Count Record, Variable Records, Identification Records, Portable File Format -@section Variable Count Record - -The variable count record has tag code @samp{4}. It consists of two -integer fields. The first contains the number of variables in the file -dictionary. The purpose of the second is unknown; it contains the value -161 in all portable files examined so far. - -@node Variable Records, Value Label Records, Variable Count Record, Portable File Format -@section Variable Records - -Each variable record represents a single variable. Variable records -have tag code @samp{7}. 
They have the following structure: - -@itemize @bullet - -@item -Width (integer). This is 0 for a numeric variable, and a number between 1 -and 255 for a string variable. - -@item -Name (string). 1--8 characters long. Must be in all capitals. - -@item -Print format. This is a set of three integer fields: - -@itemize @minus - -@item -Format type (@pxref{Variable Record}). - -@item -Format width. 1--40. - -@item -Number of decimal places. 1--40. -@end itemize - -@item -Write format. Same structure as the print format described above. -@end itemize - -Each variable record can optionally be followed by a missing value -record, which has tag code @samp{8}. A missing value record has one -field, the missing value itself (a floating-point or string, as -appropriate). Up to three of these missing value records can be used. - -There is also a record for missing value ranges, which has tag code -@samp{B}. It is followed by two fields representing the range, which -are floating-point or string as appropriate. If a missing value range -is present, it may be followed by a single missing value record. - -Tag codes @samp{9} and @samp{A} represent @code{LO THRU @var{x}} and -@code{@var{x} THRU HI} ranges, respectively. Each is followed by a -single field representing @var{x}. If one of the ranges is present, it -may be followed by a single missing value record. - -In addition, each variable record can optionally be followed by a -variable label record, which has tag code @samp{C}. A variable label -record has one field, the variable label itself (string). - -@node Value Label Records, Portable File Data, Variable Records, Portable File Format -@section Value Label Records - -Value label records have tag code @samp{D}. They have the following -format: - -@itemize @bullet -@item -Variable count (integer). - -@item -List of variables (strings). The variable count specifies the number in -the list. Variables are specified by their names. 
All variables must -be of the same type (numeric or string). - -@item -Label count (integer). - -@item -List of (value, label) tuples. The label count specifies the number of -tuples. Each tuple consists of a value, which is numeric or string as -appropriate to the variables, followed by a label (string). -@end itemize - -@node Portable File Data, , Value Label Records, Portable File Format -@section Portable File Data - -The data record has tag code @samp{F}. There is only one tag for all -the data; thus, all the data must follow the dictionary. The data is -terminated by the end-of-file marker @samp{Z}, which is not valid as the -beginning of a data element. - -Data elements are output in the same order as the variable records -describing them. String variables are output as string fields, and -numeric variables are output as floating-point fields. - -@node q2c Input Format, Bugs, Portable File Format, Top -@chapter @code{q2c} Input Format - -PSPP statistical procedures have a bizarre and somewhat irregular -syntax. Despite this, a parser generator has been written that -adequately addresses many of the possibilities and tries to provide -hooks for the exceptional cases. This parser generator is named -@code{q2c}. - -@menu -* Invoking q2c:: q2c command-line syntax. -* q2c Input Structure:: High-level layout of the input file. -* Grammar Rules:: Syntax of the grammar rules. -@end menu - -@node Invoking q2c, q2c Input Structure, q2c Input Format, q2c Input Format -@section Invoking q2c - -@example -q2c @var{input.q} @var{output.c} -@end example - -@code{q2c} translates a @samp{.q} file into a @samp{.c} file. It takes -exactly two command-line arguments, which are the input file name and -output file name, respectively. @code{q2c} does not accept any -command-line options. 
- -@node q2c Input Structure, Grammar Rules, Invoking q2c, q2c Input Format -@section @code{q2c} Input Structure - -@code{q2c} input files are divided into two sections: the grammar rules -and the supporting code. The @dfn{grammar rules}, which make up the -first part of the input, are used to define the syntax of the -statistical procedure to be parsed. The @dfn{supporting code}, -following the grammar rules, are copied largely unchanged to the output -file, except for certain escapes. - -The most important lines in the grammar rules are used for defining -procedure syntax. These lines can be prefixed with a dollar sign -(@samp{$}), which prevents Emacs' CC-mode from munging them. Besides -this, a bang (@samp{!}) at the beginning of a line causes the line, -minus the bang, to be written verbatim to the output file (useful for -comments). As a third special case, any line that begins with the exact -characters @code{/* *INDENT} is ignored and not written to the output. -This allows @code{.q} files to be processed through @code{indent} -without being munged. - -The syntax of the grammar rules themselves is given in the following -sections. - -The supporting code is passed into the output file largely unchanged. -However, the following escapes are supported. Each escape must appear -on a line by itself. - -@table @code -@item /* (header) */ - -Expands to a series of C @code{#include} directives which include the -headers that are required for the parser generated by @code{q2c}. - -@item /* (decls @var{scope}) */ - -Expands to C variable and data type declarations for the variables and -@code{enum}s input and output by the @code{q2c} parser. @var{scope} -must be either @code{local} or @code{global}. @code{local} causes the -declarations to be output as function locals. @code{global} causes them -to be declared as @code{static} module variables; thus, @code{global} is -a bit of a misnomer. - -@item /* (parser) */ - -Expands to the entire parser. 
Must be enclosed within a C function. - -@item /* (free) */ - -Expands to a set of calls to the @code{free} function for variables -declared by the parser. Only needs to be invoked if subcommands of type -@code{string} are used in the grammar rules. -@end table - -@node Grammar Rules, , q2c Input Structure, q2c Input Format -@section Grammar Rules - -The grammar rules describe the format of the syntax that the parser -generated by @code{q2c} will understand. The way that the grammar rules -are included in a @code{q2c} input file is described above. - -The grammar rules are divided into tokens of the following types: - -@table @asis -@item Identifier (@code{ID}) - -An identifier token is a sequence of letters, digits, and underscores -(@samp{_}). Identifiers are @emph{not} case-sensitive. - -@item String (@code{STRING}) - -String tokens are initiated by a double-quote character (@samp{"}) and -consist of all the characters between that double quote and the next -double quote, which must be on the same line as the first. Within a -string, a backslash can be used as a ``literal escape''. The only -reasons to use a literal escape are to include a double quote or a -backslash within a string. - -@item Special character - -All other characters, except whitespace, constitute tokens in -themselves. - -@end table - -The syntax of the grammar rules is as follows: - -@example -grammar-rules ::= ID : subcommands . -subcommands ::= subcommand - ::= subcommands ; subcommand -@end example - -The syntax begins with an ID or STRING token that gives the name of the -procedure to be parsed. The rest of the syntax consists of subcommands -separated by semicolons (@samp{;}) and terminated with a full stop -(@samp{.}).
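The three token types described above can be illustrated with a small tokenizer. This is only a sketch written in Python for readability; it is not the tokenizer that q2c itself uses, and the function name `tokenize`, the token kind labels, and the sample rule are assumptions made for the example.

```python
import string

# Characters allowed in an identifier: letters, digits, and underscores.
ID_CHARS = set(string.ascii_letters + string.digits + "_")


def tokenize(line):
    """Split one grammar-rule line into (kind, text) pairs."""
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens and is otherwise ignored
        elif ch in ID_CHARS:
            start = i
            while i < len(line) and line[i] in ID_CHARS:
                i += 1
            # Identifiers are not case-sensitive, so fold them to one case.
            tokens.append(("ID", line[start:i].upper()))
        elif ch == '"':
            i += 1  # skip the opening double quote
            chars = []
            while i < len(line) and line[i] != '"':
                if line[i] == "\\" and i + 1 < len(line):
                    i += 1  # literal escape: keep the next character as-is
                chars.append(line[i])
                i += 1
            if i >= len(line):
                # The closing quote must be on the same line as the opening one.
                raise ValueError("unterminated string")
            i += 1  # skip the closing double quote
            tokens.append(("STRING", "".join(chars)))
        else:
            # Any other non-whitespace character is a token in itself.
            tokens.append(("CHAR", ch))
            i += 1
    return tokens


if __name__ == "__main__":
    # A made-up rule: a procedure name, a colon, one subcommand, a full stop.
    for token in tokenize("demo: *variables=varlist."):
        print(token)
```

Folding identifiers to upper case is one way to honor the rule that identifiers are not case-sensitive; a real implementation might instead compare them without regard to case.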
- -@example -subcommand ::= sbc-options ID sbc-defn -sbc-options ::= - ::= sbc-option - ::= sbc-options sbc-options -sbc-option ::= * - ::= + -sbc-defn ::= opt-prefix = specifiers - ::= [ ID ] = array-sbc - ::= opt-prefix = sbc-special-form -opt-prefix ::= - ::= ( ID ) -@end example - -Each subcommand can be prefixed with one or more option characters. An -asterisk (@samp{*}) is used to indicate the default subcommand; the -keyword used for the default subcommand can be omitted in the PSPP -syntax file. A plus sign (@samp{+}) is used to indicate that a -subcommand can appear more than once; if it is not present then that -subcommand can appear no more than once. - -The subcommand name appears after the option characters. - -There are three forms of subcommands. The first and most common form -simply gives an equals sign (@samp{=}) and a list of specifiers, which -can each be set to a single setting. The second form declares an array, -which is a set of flags that can be individually turned on by the user. -There are also several special forms that do not take a list of -specifiers. - -Arrays require an additional @code{ID} argument. This is used as a -prefix, prepended to the variable names constructed from the -specifiers. The other forms also allow an optional prefix to be -specified. - -@example -array-sbc ::= alternatives - ::= array-sbc , alternatives -alternatives ::= ID - ::= alternatives | ID -@end example - -An array subcommand is a set of Boolean values that can independently be -turned on by the user, given as a list separated by commas (@samp{,}). If a value has more -than one name then these names are separated by pipes (@samp{|}). - -@example -specifiers ::= specifier - ::= specifiers , specifier -specifier ::= opt-id : settings -opt-id ::= - ::= ID -@end example - -Ordinary subcommands (other than arrays and special forms) require a -list of specifiers. Each specifier has an optional name and a list of -settings.
If the name is given then a correspondingly named variable -will be used to store the user's choice of setting. If no name is given -then there is no way to tell which setting the user picked; in this case -the settings should probably have values attached. - -@example -settings ::= setting - ::= settings / setting -setting ::= setting-options ID setting-value -setting-options ::= - ::= * - ::= ! - ::= * ! -@end example - -Individual settings are separated by forward slashes (@samp{/}). Each -setting can be as little as an @code{ID} token, but options and values -can optionally be included. The @samp{*} option means that, for this -setting, the @code{ID} can be omitted. The @samp{!} option means that -this option is the default for its specifier. - -@example -setting-value ::= - ::= ( setting-value-2 ) - ::= setting-value-2 -setting-value-2 ::= setting-value-options setting-value-type : ID - setting-value-restriction -setting-value-options ::= - ::= * -setting-value-type ::= N - ::= D -setting-value-restriction ::= - ::= , STRING -@end example - -Settings may have values. If the value must be enclosed in parentheses, -then enclose the value declaration in parentheses. Declare the setting -type as @samp{n} or @samp{d} for integer or floating point type, -respectively. The given @code{ID} is used to construct a variable name. -If option @samp{*} is given, then the value is optional; otherwise it -must be specified whenever the corresponding setting is specified. A -``restriction'' can also be specified which is a string giving a C -expression limiting the valid range of the value. The special escape -@code{%s} should be used within the restriction to refer to the -setting's value variable. 
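The role of the @code{%s} escape in a restriction can be illustrated as follows. This is a sketch only, written in Python for readability. The text above does not say what C code q2c emits for a restriction, so the shape of the generated @code{if} statement, the function names, and the variable name @code{cmd_width} are all assumptions. A setting such as @code{WIDTH(n:width,"%s >= 1 && %s <= 255")} is likewise a made-up example that follows the grammar given above.

```python
def expand_restriction(restriction, value_var):
    """Replace every %s escape with the C name of the value variable."""
    return restriction.replace("%s", value_var)


def emit_check(restriction, value_var):
    """Build a C-like range check from a restriction (hypothetical output shape)."""
    condition = expand_restriction(restriction, value_var)
    return "if (!(" + condition + ")) { /* report an out-of-range value */ }"


if __name__ == "__main__":
    # cmd_width is a hypothetical name for the setting's value variable.
    print(expand_restriction("%s >= 1 && %s <= 255", "cmd_width"))
    print(emit_check("%s >= 1 && %s <= 255", "cmd_width"))
```

Because the restriction is an ordinary C expression, the escape may appear more than once, as in the range test shown here.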
- -@example -sbc-special-form ::= VAR - ::= VARLIST varlist-options - ::= INTEGER opt-list - ::= DOUBLE opt-list - ::= PINT - ::= STRING @r{(the literal word STRING)} string-options - ::= CUSTOM -varlist-options ::= - ::= ( STRING ) -opt-list ::= - ::= LIST -string-options ::= - ::= ( STRING STRING ) -@end example - -The special forms are of the following types: - -@table @code -@item VAR - -A single variable name. - -@item VARLIST - -A list of variables. If given, the string can be used to provide -@code{PV_@var{*}} options to the call to @code{parse_variables}. - -@item INTEGER - -A single integer value. - -@item INTEGER LIST - -A list of integers separated by spaces or commas. - -@item DOUBLE - -A single floating-point value. - -@item DOUBLE LIST - -A list of floating-point values. - -@item PINT - -A single positive integer value. - -@item STRING - -A string value. If the options are given then the first string is an -expression giving a restriction on the value of the string; the second -string is an error message to display when the restriction is violated. - -@item CUSTOM - -A custom function is used to parse this subcommand. The function must -have prototype @code{int custom_@var{name} (void)}. It should return 0 -on failure (when it has already issued an appropriate diagnostic), 1 on -success, or 2 if it fails and the calling function should issue a syntax -error on behalf of the custom handler. - -@end table - -@node Bugs, Function Index, q2c Input Format, Top -@chapter Bugs - -@menu -* Known bugs:: Pointers to other files. -* Contacting the Author:: Where to send the bug reports. -@end menu - -@node Known bugs, Contacting the Author, Bugs, Bugs -@section Known bugs - -This is the list of known bugs in PSPP. In addition, see @ref{Not -Implemented}, and @ref{Functions Not Implemented}, for lists of bugs -due to features not implemented. For known bugs in individual language -features, see the documentation for that feature.
- -@itemize @bullet -@item -Nothing has yet been tested exhaustively. Be cautious using PSPP to -make important decisions. - -@item -@code{make check} fails on some systems that don't like the syntax. I'm -not sure why. If someone could make an attempt to track this down, it -would be appreciated. - -@item -PostScript driver bugs: - -@itemize @minus -@item -Does not support driver arguments `max-fonts-simult' or -`optimize-text-size'. - -@item -Minor problems with font-encodings. - -@item -Fails to align fonts along their baselines. - -@item -Does not support certain bizarre line intersections--should -never crop up in practice. - -@item -Does not gracefully substitute for existing fonts whose -encodings are missing. - -@item -Does not perform italic correction or left italic correction -on font changes. - -@item -Encapsulated PostScript is unimplemented. -@end itemize - -@item -ASCII driver bugs: - -@itemize @minus -@item -Does not support `infinite length' or `infinite width' paper. -@end itemize -@end itemize - -See below for information on reporting bugs not listed here. - -@node Contacting the Author, , Known bugs, Bugs -@section Contacting the Author - -The author can be contacted at e-mail address -@ifinfo -. -@end ifinfo -@iftex -@code{}. -@end iftex - -PSPP bug reports should be sent to -@ifinfo -. -@end ifinfo -@iftex -@code{}. -@end iftex - -@node Function Index, Concept Index, Bugs, Top -@chapter Function Index -@printindex fn - -@node Concept Index, Command Index, Function Index, Top -@chapter Concept Index -@printindex cp - -@node Command Index, , Concept Index, Top -@chapter Command Index -@printindex vr - -@contents @bye @c Local Variables: +@c use (texinfo-multiple-files-update "pspp.texi") in emacs to keep these files consistent @c compile-command: "makeinfo pspp.texi" @c End: