   1 \input texinfo @c -*- texinfo -*-
   2 @c %**start of header
   3 @setfilename pspp.info
   4 @settitle PSPP
   5 @set TIMESTAMP Time-stamp:  Sat Dec 20 20:25:33 WST 2003 jmd
   6 @set EDITION 0.2
   7 @set VERSION 0.3
   8 @c For double-sided printing, uncomment:
   9 @c @setchapternewpage odd
  10 @c %**end of header
  11
  12 @iftex
  13 @finalout
  14 @end iftex
  15
  16 @dircategory Math
  17 @direntry
  18 * PSPP: (pspp).             Statistical analysis package.
  19 @end direntry
  20
  21 @ifinfo
  22 PSPP, for statistical analysis of sampled data, by Ben Pfaff.
  23
  24 This file documents PSPP, a statistical package for analysis of
  25 sampled data that uses a command language compatible with SPSS.
  26
  27 Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.
  28
  29 This version of the PSPP documentation is consistent with version 2 of
  30 ``texinfo.tex''.
  31
  32 Permission is granted to make and distribute verbatim copies of this
  33 manual provided the copyright notice and this permission notice are
  34 preserved on all copies.
  35
  36 @ignore
  37 Permission is granted to process this file through TeX and print the
  38 results, provided the printed document carries copying permission notice
  39 identical to this one except for the removal of this paragraph (this
  40 paragraph not being relevant to the printed manual).
  41
  42 @end ignore
  43 Permission is granted to copy and distribute modified versions of this
  44 manual under the conditions for verbatim copying, provided that the
  45 entire resulting derived work is distributed under the terms of a
  46 permission notice identical to this one.
  47
  48 Permission is granted to copy and distribute translations of this
  49 manual into another language, under the above condition for modified
  50 versions, except that this permission notice may be stated in a
  51 translation approved by the Free Software Foundation.
  52 @end ifinfo
  53
  54 @titlepage
  55 @title PSPP
  56 @subtitle A System for Statistical Analysis
  57 @subtitle Edition @value{EDITION}, for PSPP version @value{VERSION}
  58 @author by Ben Pfaff
  59
  60 @page
  61 @vskip 0pt plus 1filll
  62
  63 PSPP Copyright @copyright{} 1997, 1998 Free Software Foundation, Inc.
  64
  65 Permission is granted to make and distribute verbatim copies of this
  66 manual provided the copyright notice and this permission notice are
  67 preserved on all copies.
  68
  69 Permission is granted to copy and distribute modified versions of this
  70 manual under the conditions for verbatim copying, provided that the
  71 entire derived work is distributed under the terms of a permission
  72 notice identical to this one.
  73
  74 Permission is granted to copy and distribute translations of this manual
  75 into another language, under the above conditions for modified versions,
  76 except that this permission notice may be stated in a translation
  77 approved by the Foundation.
  78 @end titlepage
  79
  80 @node Top, Introduction, (dir), (dir)
  81 @ifinfo
  82 @top PSPP
  83
  84 This file documents the PSPP package for statistical analysis of sampled
  85 data.  This is edition @value{EDITION}, for PSPP version
  86 @value{VERSION}, last modified at @value{TIMESTAMP}.
  87
  88 @end ifinfo
  89
  90 @menu
  91 * Introduction::                Description of the package.
  92 * License::                     Your rights and obligations.
  93 * Credits::                     Acknowledgement of authors.
  94
  95 * Installation::                How to compile and install PSPP.
  96 * Configuration::               Configuring PSPP.
  97 * Invocation::                  Starting and running PSPP.
  98
  99 * Language::                    Basics of the PSPP command language.
 100 * Expressions::                 Numeric and string expression syntax.
 101
 102 * Data Input and Output::       Reading data from user files.
 103 * System and Portable Files::   Dealing with system & portable files.
 104 * Variable Attributes::         Adjusting and examining variables.
 105 * Data Manipulation::           Simple operations on data.
 106 * Data Selection::              Select certain cases for analysis.
 107 * Conditionals and Looping::    Doing things many times or not at all.
 108 * Statistics::                  Basic statistical procedures.
 109 * Utilities::                   Other commands.
 110 * Not Implemented::             What's not here yet
 111
 112 * Data File Format::            Format of PSPP system files.
 113 * Portable File Format::        Format of PSPP portable files.
 114 * q2c Input Format::            Format of syntax accepted by q2c.
 115
 116 * Bugs::                        Known problems; submitting bug reports.
 117
 118 * Function Index::              Index of PSPP functions for expressions.
 119 * Concept Index::               Index of concepts.
 120 * Command Index::               Index of PSPP procedures.
 121
 122 @end menu
 123
 124 @node Introduction, License, Top, Top
 125 @chapter Introduction
 126 @cindex introduction
 127
 128 @cindex PSPP language
 129 @cindex language, PSPP
 130 PSPP is a tool for statistical analysis of sampled data.  It reads a
 131 syntax file and a data file, analyzes the data, and writes the results
 132 to a listing file or to standard output.
 133
 134 The language accepted by PSPP is similar to those accepted by SPSS
 135 statistical products.  The details of PSPP's language are given
 136 later in this manual.
 137
 138 @cindex files, PSPP
 139 @cindex output, PSPP
 140 @cindex PostScript
 141 @cindex graphics
 142 @cindex Ghostscript
 143 @cindex Free Software Foundation
 144 PSPP produces output in two forms: tables and charts.  Both of these can
 145 be written in several formats; currently, ASCII, PostScript, and HTML
 146 are supported.  In the future, more drivers, such as PCL and X Window
 147 System drivers, may be developed.  For now, Ghostscript, available from
 148 the Free Software Foundation, may be used to convert PostScript chart
 149 output to other formats.
 150
 151 The current version of PSPP, @value{VERSION}, is woefully incomplete in
 152 terms of its statistical procedure support.  PSPP is a work in progress.
 153 The author hopes eventually to provide full support for all features of
 154 the products that PSPP replaces.  The author welcomes questions,
 155 comments, donations, and code submissions.  @xref{Bugs,,Submitting Bug
 156 Reports}, for instructions on contacting the author.
 157
 158 @node License, Credits, Introduction, Top
 159 @chapter Your rights and obligations
 160 @cindex license
 161 @cindex your rights and obligations
 162 @cindex rights, your
 163 @cindex obligations, your
 164
 165 @cindex Free Software Foundation
 166 @cindex GNU General Public License
 167 @cindex General Public License
 168 @cindex GPL
 169 @cindex distribution
 170 @cindex redistribution
 171 Most of PSPP is distributed under the GNU General Public
 172 License.  The General Public License says, in effect, that you may
 173 modify and distribute PSPP as you like, as long as you grant the
 174 same rights to others.  It also states that you must provide source code
 175 when you distribute PSPP, or, if you obtained PSPP
 176 source code from an anonymous ftp site, give out the name of that site.
 177
 178 The General Public License is given in full in the source distribution
 179 as file @file{COPYING}.  In Debian GNU/Linux, this file is also
 180 available as file @file{/usr/share/common-licenses/GPL-2}.
 181
 182 To quote the GPL itself:
 183
 184 @quotation
 185 This program is free software; you can redistribute it and/or modify it
 186 under the terms of the GNU General Public License as published by the
 187 Free Software Foundation; either version 2 of the License, or (at your
 188 option) any later version.
 189
 190 This program is distributed in the hope that it will be useful, but
 191 WITHOUT ANY WARRANTY; without even the implied warranty of
 192 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 193 General Public License for more details.
 194
 195 You should have received a copy of the GNU General Public License along
 196 with this program; if not, write to the Free Software Foundation, Inc.,
 197 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 198 @end quotation
 199
 200 @node Credits, Installation, License, Top
 201 @chapter Credits
 202 @cindex credits
 203 @cindex authors
 204
 205 @cindex Minton, Claire
 206 @cindex @cite{Cat's Cradle}
 207 @cindex Vonnegut, Kurt, Jr.
 208 @cindex quotations
 209 @quotation
 210 I'm always embarrassed when I see an index an author has made of his own
 211 work.  It's a shameless exhibition---to the @i{trained} eye.  Never
 212 index your own book.
 213
 214 ---Claire Minton, @cite{Cat's Cradle}, Kurt Vonnegut, Jr.
 215 @end quotation
 216
 217 @cindex Pfaff, Ben
 218 Most of PSPP, as well as this manual (including the indices),
 219 was written by Ben Pfaff.  @xref{Contacting the Author}, for
 220 instructions on contacting the author.
 221
 222 @cindex Covington, Michael A.
 223 @cindex Van Zandt, James
 224 @cindex @file{ftp.cdrom.com}
 225 @cindex @file{/pub/algorithms/c/julcal10}
 226 @cindex @file{julcal.c}
 227 @cindex @file{julcal.h}
 228 The PSPP source code incorporates @code{julcal10}, originally
 229 written by Michael A. Covington and translated into C by Jim Van Zandt.
 230 The original package can be found in directory
 231 @url{ftp://ftp.cdrom.com/pub/algorithms/c/julcal10}.  The entire
 232 contents of that directory constitute the package.  The files actually
 233 used in PSPP are @code{julcal.c} and @code{julcal.h}.
 234
 235 @node Installation, Configuration, Credits, Top
 236 @chapter Installing PSPP
 237 @cindex installation
 238 @cindex PSPP, installing
 239
 240 @cindex GNU C compiler
 241 @cindex gcc
 242 @cindex compiler, recommended
 243 @cindex compiler, gcc
 244 PSPP conforms to the GNU Coding Standards.  PSPP is written in ANSI/ISO
 245 C and requires an ANSI/ISO C compiler to build.  Note also the
 246 following requirements:
 247
 248 @itemize @bullet
 249 @item
 250 The compiler and linker must treat a reasonably large number of leading
 251 characters in external identifiers as significant.  The exact minimum is
 252 unknown, but at least 31 significant characters is recommended.
 253
 254 @item
 255 The @code{int} type must be 32 bits or wider.
 256
 257 @item
 258 The recommended compiler is gcc 2.7.2.1 or later, but any ANSI C compiler
 259 will do if it meets the above criteria.
 260 @end itemize
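
If you are unsure whether your compiler meets the @code{int} width
requirement, a small C fragment like the one below can check it.  This is
a sketch for illustration only and is not part of the PSPP sources; the
function name @code{int_is_wide_enough} is hypothetical.

@verbatim
#include <limits.h>

/* Compile-time check: stop the build if int cannot hold 2147483647,
   that is, if int is narrower than 32 bits. */
#if INT_MAX < 2147483647
#error "int is narrower than 32 bits"
#endif

/* Hypothetical run-time form of the same check: returns 1 when int
   is at least 32 bits wide, 0 otherwise. */
int
int_is_wide_enough (void)
{
  return INT_MAX >= 2147483647L;
}
@end verbatim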
 261
 262 Many UNIX variants should work out-of-the-box, as PSPP uses GNU
 263 autoconf to detect differences between environments.  Please report any
 264 problems with compilation of PSPP under UNIX and UNIX-like operating
 265 systems---portability is a major concern of the author.
 266
 267 The section below gives specific instructions for installing PSPP
 268 on UNIX-like systems.
 269
 270 @menu
 271 * UNIX installation::           Installing on UNIX-like environments.
 272 @end menu
 273
 274 @node UNIX installation,  , Installation, Installation
 275 @section UNIX installation
 276 @cindex UNIX, installing PSPP under
 277 @cindex installation, under UNIX
 278 @noindent
 279 To install PSPP under a UNIX-like operating system, follow the steps
 280 below in order.  Some of the text below was taken directly from various
 281 Free Software Foundation sources.
 282
 283 @enumerate
 284 @item
 285 @code{cd} to the directory containing the PSPP source.
 286
 287 @cindex configure, GNU
 288 @cindex GNU configure
 289 @item
 290 Type @samp{./configure} to configure for your particular operating
 291 system and compiler.  Running @code{configure} takes a while.  While
 292 running, it displays some messages telling which features it is checking
 293 for.
 294
 295 You can optionally supply some options to @code{configure} in order to
 296 give it hints about how to do its job.  Type @code{./configure --help}
 297 to see a list of options.  One of the most useful options is
 298 @samp{--with-checker}, which enables the use of the Checker memory
 299 debugger under supported operating systems.  Checker must already be
 300 installed to use this option.  Do not use @samp{--with-checker} if you
 301 are not debugging PSPP itself.
 302
 303 @cindex @file{Makefile}
 304 @cindex @file{config.h}
 305 @cindex @file{pref.h}
 306 @cindex makefile
 307 @item
 308 (optional) Edit @file{Makefile}, @file{config.h}, and @file{pref.h}.
 309 These files are produced by @code{configure}.  Note that most PSPP
 310 settings can be changed at runtime.
 311
 312 @file{pref.h} is only generated by @code{configure} if it does not
 313 already exist.  (It's copied from @file{prefh.orig}.)
 314
 315 @cindex compiling
 316 @item
 317 Type @samp{make} to compile the package.  If there are any errors during
 318 compilation, try to fix them.  If modifications are necessary to compile
 319 correctly under your configuration, contact the author.
 320 @xref{Bugs,,Submitting Bug Reports}, for details.
 321
 322 @cindex self-tests, running
 323 @item
 324 Type @samp{make check} to run self-tests on the compiled PSPP package.
 325
 326 @cindex installation
 327 @cindex PSPP, installing
 328 @cindex @file{/usr/local/share/pspp/}
 329 @cindex @file{/usr/local/bin/}
 330 @cindex @file{/usr/local/info/}
 331 @cindex documentation, installing
 332 @item
 333 Become the superuser and type @samp{make install} to install the
 334 PSPP binaries, by default in @file{/usr/local/bin/}.  The
 335 directory @file{/usr/local/share/pspp/} is created and populated with
 336 files needed by PSPP at runtime.  This step will also cause the
 337 PSPP documentation to be installed in @file{/usr/local/info/},
 338 but only if that directory already exists.
 339
 340 @item
 341 (optional) Type @samp{make clean} to delete the PSPP binaries
 342 from the source tree.
 343 @end enumerate
 344
 345 @node Configuration, Invocation, Installation, Top
 346 @chapter Configuring PSPP
 347 @cindex configuration
 348 @cindex PSPP, configuring
 349
 350 PSPP has dozens of configuration possibilities and hundreds of
 351 settings.  This is both a bane and a blessing.  On one hand, a diverse
 352 range of setups can easily be accommodated.  On the other hand, the
 353 multitude of possibilities can overwhelm the casual user.
 354 Fortunately, the configuration mechanisms are profusely described in the
 355 sections below@enddots{}
 356
 357 @menu
 358 * File locations::              How PSPP finds config files.
 359 * Configuration techniques::    Many different methods of configuration@enddots{}
 360 * Configuration files::         How configuration files are read.
 361 * Environment variables::       All about environment variables.
 362 * Output devices::              Describing your terminal(s) and printer(s).
 363 * PostScript driver class::     Configuration of PostScript devices.
 364 * ASCII driver class::          Configuration of character-code devices.
 365 * HTML driver class::           Configuration for HTML output.
 366 * Miscellaneous configuring::   Even more configuration variables.
 367 * Improving output quality::    Hints for producing ever-more-lovely output.
 368 @end menu
 369
 370 @node File locations, Configuration techniques, Configuration, Configuration
 371 @section Locating configuration files
 372
 373 PSPP uses the same method to find most of its configuration files:
 374
 375 @enumerate
 376 @item
 377 The @dfn{base name} of the file being sought is determined.
 378
 379 @item
 380 The path to search is determined.
 381
 382 @item
 383 Each directory in the search path, from left to right, is searched for a
 384 file with the name of the base name.  The first occurrence is read
 385 as the configuration file.
 386 @end enumerate
 387
 388 The first two steps are elaborated below for the sake of our pedantic
 389 friends.
 390
 391 @enumerate
 392 @item
 393 A @dfn{base name} is a file name lacking an absolute directory
 394 reference.  Some examples of base names are: @file{ps-encodings},
 395 @file{devices}, @file{devps/DESC} (under UNIX), and @file{devps\DESC}
 396 (under DOS).
 397
 398 Determining the base name is a two-step process:
 399
 400 @enumerate a
 401 @item
 402 If the appropriate environment variable is defined, the value of that
 403 variable is used (@pxref{Environment variables}).  For instance, when
 404 searching for the output driver initialization file, the variable
 405 examined is @code{STAT_OUTPUT_INIT_FILE}.
 406
 407 @item
 408 Otherwise, the compiled-in default is used.  For example, when searching
 409 for the output driver initialization file, the default base name is
 410 @file{devices}.
 411 @end enumerate
 412
 413 @strong{Please note:} If a user-specified base name does contain an
 414 absolute directory reference, as in a file name like
 415 @file{/home/pfaff/fonts/TR}, no path is searched---the file name is used
 416 exactly as given---and the algorithm terminates.
 417
 418 @item
 419 The path is the first of the following that is defined:
 420
 421 @itemize @bullet
 422 @item
 423 A variable definition for the path given in the user environment.  This
 424 is a PSPP-specific environment variable name; for instance,
 425 @code{STAT_OUTPUT_INIT_PATH}.
 426
 427 @item
 428 In some cases, another, less-specific environment variable is checked.
 429 For instance, when searching for font files, the PostScript driver first
 430 checks for a variable with name @code{STAT_GROFF_FONT_PATH}, then for
 431 one with name @code{GROFF_FONT_PATH}.  (However, font searching has its
 432 own list of esoteric search rules.)
 433
 434 @item
 435 The configuration file path, which is itself determined by the
 436 following rules:
 437
 438 @enumerate a
 439 @item
 440 If the command line contains an option of the form @samp{-B @var{path}}
 441 or @samp{--config-dir=@var{path}}, then the value given on the
 442 rightmost occurrence of such an option is used.
 443
 444 @item
 445 Otherwise, if the environment variable @code{STAT_CONFIG_PATH} is
 446 defined, the value of that variable is used.
 447
 448 @item
 449 Otherwise, the compiled-in fallback default is used.  On UNIX machines,
 450 the default fallback path is:
 451
 452 @enumerate 1
 453 @item
 454 @file{~/.pspp}
 455
 456 @item
 457 @file{/usr/local/lib/pspp}
 458
 459 @item
 460 @file{/usr/lib/pspp}
 461 @end enumerate
 462
 463 On DOS machines, the default fallback path is:
 464
 465 @enumerate 1
 466 @item
 467 All the paths from the DOS search path in the @samp{PATH} environment
 468 variable, in left-to-right order.
 469
 470 @item
 471 @file{C:\PSPP}, as a last resort.
 472 @end enumerate
 473
 474 Note that the installer of PSPP can easily change this default
 475 fallback path; thus the above should not be taken as gospel.
 476 @end enumerate
 477 @end itemize
 478 @end enumerate
 479
 480 As a final note: under DOS, directories given in paths are delimited by
 481 semicolons (@samp{;}); under UNIX, directories are delimited by colons
 482 (@samp{:}).  This matches the standard path delimiter on each of these
 483 systems.
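
The directory search described above can be sketched in C as follows.
This is an illustration under stated assumptions, not PSPP's actual code:
it assumes UNIX conventions (@samp{:} between path entries and @samp{/} as
the directory separator), treats only names beginning with @samp{/} as
absolute, does not expand @samp{~}, and uses the hypothetical function name
@code{find_config_file}.  The path passed in is assumed to have already
been chosen by the rules above.

@verbatim
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* UNIX conventions; under DOS these would be ';' and '\\'. */
#define PATH_DELIMITER ':'
#define DIR_SEPARATOR '/'

/* Returns a newly allocated copy of S, or NULL if out of memory. */
static char *
copy_string (const char *s)
{
  char *copy = malloc (strlen (s) + 1);
  if (copy != NULL)
    strcpy (copy, s);
  return copy;
}

/* Returns nonzero if FILE_NAME can be opened for reading. */
static int
is_readable (const char *file_name)
{
  FILE *f = fopen (file_name, "r");
  if (f == NULL)
    return 0;
  fclose (f);
  return 1;
}

/* Searches the directories in PATH, left to right, for BASE_NAME and
   returns the first match as a newly allocated string, or NULL if
   there is none.  An absolute BASE_NAME is returned unchanged and no
   path is searched. */
char *
find_config_file (const char *base_name, const char *path)
{
  const char *dir = path;

  if (base_name[0] == DIR_SEPARATOR)
    return copy_string (base_name);

  for (;;)
    {
      const char *end = strchr (dir, PATH_DELIMITER);
      size_t dir_len = end != NULL ? (size_t) (end - dir) : strlen (dir);

      if (dir_len > 0)          /* skip empty path entries */
        {
          char *candidate = malloc (dir_len + strlen (base_name) + 2);
          if (candidate == NULL)
            return NULL;
          memcpy (candidate, dir, dir_len);
          candidate[dir_len] = DIR_SEPARATOR;
          strcpy (candidate + dir_len + 1, base_name);
          if (is_readable (candidate))
            return candidate;   /* first occurrence wins */
          free (candidate);
        }

      if (end == NULL)
        return NULL;
      dir = end + 1;
    }
}
@end verbatim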
 484
 485 @node Configuration techniques, Configuration files, File locations, Configuration
 486 @section Configuration techniques
 487
 488 There are many ways that PSPP can be configured.  These are
 489 described in the list below.  Values given by earlier items take
 490 precedence over those given by later items.
 491
 492 @enumerate
 493 @item
 494 Syntax commands that modify settings, such as @code{SET}. @xref{SET}.
 495
 496 @item
 497 Command-line options.  @xref{Invocation}.
 498
 499 @item
 500 PSPP-specific environment variable contents.  @xref{Environment
 501 variables}.
 502
 503 @item
 504 General environment variable contents.  @xref{Environment variables}.
 505
 506 @item
 507 Configuration file contents.  @xref{Configuration files}.
 508
 509 @item
 510 Fallback defaults.
 511 @end enumerate
 512
 513 Some of the above may not apply to a particular setting.  For instance,
 514 the current pager (such as @samp{more}, @samp{most}, or @samp{less})
 515 cannot be determined by configuration file contents because there is no
 516 appropriate configuration file.
 517
 518 @node Configuration files, Environment variables, Configuration techniques, Configuration
 519 @section Configuration files
 520
 521 Most configuration files have a common form:
 522
 523 @itemize @bullet
 524 @item
 525 Each line forms a separate command or directive.  This means that lines
 526 cannot be broken up, unless they are spliced together with a trailing
 527 backslash, as described below.
 528
 529 @item
 530 Before anything else is done, trailing whitespace is removed.
 531
 532 @item
 533 When a line ends in a backslash (@samp{\}), the backslash is removed,
 534 and the next line is read and appended to the current line.
 535
 536 @itemize @minus
 537 @item
 538 Whitespace preceding the backslash is retained.
 539
 540 @item
 541 This rule continues to be applied until the line read does not end in a
 542 backslash.
 543
 544 @item
 545 It is an error if the last line in the file ends in a backslash.
 546 @end itemize
 547
 548 @item
 549 Comments are introduced by an octothorpe (@samp{#}), and continue until the
 550 end of the line.
 551
 552 @itemize @minus
 553 @item
 554 An octothorpe inside balanced pairs of double quotation marks (@samp{"})
 555 or single quotation marks (@samp{'}) does not introduce a comment.
 556
 557 @item
 558 The backslash character can be used inside balanced quotes of either
 559 type to escape the following character as a literal character.
 560
 561 (This is distinct from the use of a backslash as a line-splicing
 562 character.)
 563
 564 @item
 565 Line splicing takes place before comment removal.
 566 @end itemize
 567
 568 @item
 569 Blank lines, and lines that contain only whitespace, are ignored.
 570 @end itemize
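
The rules above can be sketched as a C function that returns one logical
line at a time.  This is a hypothetical illustration, not PSPP's own
reader: @code{read_config_line} is an invented name, physical lines are
assumed to fit in a fixed 512-byte buffer, and backslash escapes inside
quotes are honored only while looking for comments (the characters
themselves are left in place for later processing).

@verbatim
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Removes trailing whitespace, including the newline, from S. */
static void
strip_trailing_space (char *s)
{
  size_t len = strlen (s);
  while (len > 0 && isspace ((unsigned char) s[len - 1]))
    s[--len] = '\0';
}

/* Truncates S at the first '#' that is not inside quotes.  Inside
   quotes, a backslash makes the following character literal. */
static void
remove_comment (char *s)
{
  char quote = '\0';            /* current quote character, if any */

  for (; *s != '\0'; s++)
    {
      if (quote != '\0')
        {
          if (*s == '\\' && s[1] != '\0')
            s++;                /* skip the escaped character */
          else if (*s == quote)
            quote = '\0';
        }
      else if (*s == '"' || *s == '\'')
        quote = *s;
      else if (*s == '#')
        {
          *s = '\0';
          return;
        }
    }
}

/* Reads the next non-blank logical line from STREAM into BUF.
   Returns 1 on success, 0 at end of file, or -1 if a line is too long
   or the last line in the file ends in a backslash. */
int
read_config_line (FILE *stream, char *buf, size_t buf_size)
{
  char piece[512];

  for (;;)
    {
      size_t len = 0;
      int splicing = 0;

      buf[0] = '\0';
      for (;;)
        {
          size_t piece_len;

          if (fgets (piece, sizeof piece, stream) == NULL)
            return splicing ? -1 : 0;
          if (strchr (piece, '\n') == NULL && !feof (stream))
            return -1;          /* physical line too long */

          strip_trailing_space (piece);   /* done before anything else */
          piece_len = strlen (piece);
          splicing = piece_len > 0 && piece[piece_len - 1] == '\\';
          if (splicing)
            piece[--piece_len] = '\0';    /* space before it is kept */

          if (len + piece_len + 1 > buf_size)
            return -1;          /* logical line too long */
          memcpy (buf + len, piece, piece_len + 1);
          len += piece_len;
          if (!splicing)
            break;
        }

      remove_comment (buf);     /* splicing happens before this */

      {
        /* Skip lines that are empty or contain only whitespace. */
        const char *p = buf;
        while (isspace ((unsigned char) *p))
          p++;
        if (*p != '\0')
          return 1;
      }
    }
}
@end verbatim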
 571
 572 @node Environment variables, Output devices, Configuration files, Configuration
 573 @section Environment variables
 574
 575 You may think the concept of environment variables is a fairly simple
 576 one.  However, the author of PSPP has found a way to complicate
 577 even something so simple.  Environment variables are further described
 578 in the sections below:
 579
 580 @menu
 581 * Variable values::             Values of variables are determined this way.
 582 * Environment substitutions::   How environment substitutions are made.
 583 * Predefined variables::        A few variables are automatically defined.
 584 @end menu
 585
 586 @node Variable values, Environment substitutions, Environment variables, Environment variables
 587 @subsection Values of environment variables
 588
 589 Values for environment variables are obtained by the following means,
 590 which are arranged in order of decreasing precedence:
 591
 592 @enumerate
 593 @item
 594 Command-line options.  @xref{Invocation}.
 595
 596 @item
 597 The @file{environment} configuration file---more on this below.
 598
 599 @item
 600 Actual environment variables (defined in the shell or other parent
 601 process).
 602 @end enumerate
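
A lookup that honors this order of precedence might look like the
following sketch.  The table type @code{struct var_def} and the functions
@code{find_def} and @code{lookup_variable} are hypothetical.  The sketch
assumes that a null value in a table means the variable was explicitly
undefined, so that it also hides any value in the real environment.

@verbatim
#include <stdlib.h>
#include <string.h>

/* One NAME=VALUE definition.  A null VALUE means "explicitly
   undefined". */
struct var_def
  {
    const char *name;
    const char *value;
  };

/* If NAME is one of the N entries in DEFS, stores its value (possibly
   NULL) in *VALUE and returns 1; otherwise returns 0. */
static int
find_def (const struct var_def *defs, size_t n, const char *name,
          const char **value)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (strcmp (defs[i].name, name) == 0)
      {
        *value = defs[i].value;
        return 1;
      }
  return 0;
}

/* Returns the value of NAME, consulting command-line definitions,
   then the `environment' file, then the real process environment. */
const char *
lookup_variable (const char *name,
                 const struct var_def *cmdline, size_t n_cmdline,
                 const struct var_def *env_file, size_t n_env_file)
{
  const char *value;

  if (find_def (cmdline, n_cmdline, name, &value))
    return value;
  if (find_def (env_file, n_env_file, name, &value))
    return value;
  return getenv (name);
}
@end verbatim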
 603
 604 The @file{environment} configuration file is located through application
 605 of the usual algorithm for configuration files (@pxref{File locations}),
 606 except that its contents do not affect the search path used to find
 607 @file{environment} itself.  Use of @file{environment} is discouraged on
 608 systems that allow an arbitrarily large environment; it is supported for
 609 use on systems like MS-DOS that limit environment size.
 610
 611 @file{environment} is composed of lines having the form
 612 @samp{@var{key}=@var{value}}, where @var{key} and the equals sign
 613 (@samp{=}) are required, and @var{value} is optional.  If @var{value} is
 614 given, variable @var{key} is given that value; if @var{value} is absent,
 615 variable @var{key} is undefined (deleted).  Variables may not be defined
 616 with a null value.
 617
 618 Environment substitutions are performed on each line in the file
 619 (@pxref{Environment substitutions}).
 620
 621 See @ref{Configuration files}, for more details on formatting of the
 622 environment configuration file.
 623
 624 @quotation
 625 @strong{Please note:} Support for @file{environment} is not yet
 626 implemented.
 627 @end quotation
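
Although @file{environment} is not yet supported, the line format
described above could be parsed as in the following sketch.  The function
name @code{parse_env_line} is hypothetical.  The sketch assumes that
whitespace around the key is significant and that everything after the
first equals sign is the value.

@verbatim
#include <string.h>

/* Splits LINE, of the form KEY=VALUE or KEY=, in place.  On success,
   sets *KEY, sets *VALUE to the value or to NULL when the value is
   absent (meaning "undefine KEY"), and returns 1.  Returns 0 if LINE
   has no equals sign or the key is empty. */
int
parse_env_line (char *line, char **key, char **value)
{
  char *equals = strchr (line, '=');

  if (equals == NULL || equals == line)
    return 0;
  *equals = '\0';
  *key = line;
  *value = equals[1] != '\0' ? equals + 1 : NULL;
  return 1;
}
@end verbatim

The caller would then define @var{key} with the returned value, or
undefine it when the value is null.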
 628
 629 @node Environment substitutions, Predefined variables, Variable values, Environment variables
 630 @subsection Environment substitutions
 631
 632 Much of the power of environment variables lies in the way that they may
 633 be substituted into configuration files.  Variable substitutions are
 634 described below.
 635
 636 The line is scanned from left to right.  In this scan, all characters
 637 other than dollar signs (@samp{$}) are retained unmolested.  Dollar
 638 signs, however, introduce an environment variable reference.  References
 639 take three forms:
 640
 641 @table @code
 642 @item $@var{var}
 643 Replaced by the value of environment variable @var{var}, determined as
 644 specified in @ref{Variable values}.  @var{var} must be one of the
 645 following:
 646
 647 @itemize @bullet
 648 @item
 649 One or more letters.
 650
 651 @item
 652 Exactly one nonalphabetic character.  This may not be a left brace
 653 (@samp{@{}).
 654 @end itemize
 655
 656 @item $@{@var{var}@}
 657 Same as above, but @var{var} may contain any character (except
 658 @samp{@}}).
 659
 660 @item $$
 661 Replaced by a single dollar sign.
 662 @end table
 663
 664 Undefined variables expand to an empty value.
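
The substitution rules can be sketched in C as follows.  This is an
illustration rather than PSPP's own code: @code{expand_vars} and
@code{struct env_var} are hypothetical names, variable values come from a
simple table instead of the precedence rules in @ref{Variable values}, and
an unterminated brace reference or a lone trailing dollar sign is assumed
to be an error, since the rules above do not say how to treat them.

@verbatim
#include <ctype.h>
#include <string.h>

struct env_var
  {
    const char *name;
    const char *value;
  };

/* Returns the value of the LEN-byte name NAME in VARS, or NULL. */
static const char *
find_var (const struct env_var *vars, size_t n_vars,
          const char *name, size_t len)
{
  size_t i;

  for (i = 0; i < n_vars; i++)
    if (strlen (vars[i].name) == len && memcmp (vars[i].name, name, len) == 0)
      return vars[i].value;
  return NULL;
}

/* Appends LEN bytes of S to OUT, which holds *POS bytes and has room
   for OUT_SIZE bytes including the terminator.  Returns 0 on overflow. */
static int
append (char *out, size_t out_size, size_t *pos, const char *s, size_t len)
{
  if (*pos + len + 1 > out_size)
    return 0;
  memcpy (out + *pos, s, len);
  *pos += len;
  out[*pos] = '\0';
  return 1;
}

/* Copies IN to OUT, expanding dollar-sign references.  Undefined
   variables expand to nothing.  Returns 1 on success, 0 on error. */
int
expand_vars (const char *in, char *out, size_t out_size,
             const struct env_var *vars, size_t n_vars)
{
  size_t pos = 0;

  if (out_size == 0)
    return 0;
  out[0] = '\0';
  while (*in != '\0')
    {
      const char *name, *value;
      size_t len;

      if (*in != '$')
        {
          if (!append (out, out_size, &pos, in++, 1))
            return 0;
          continue;
        }
      in++;                     /* skip the dollar sign */
      if (*in == '$')           /* "$$" becomes a single '$' */
        {
          if (!append (out, out_size, &pos, "$", 1))
            return 0;
          in++;
          continue;
        }
      if (*in == '{')           /* braced form: any chars but '}' */
        {
          const char *close = strchr (in + 1, '}');
          if (close == NULL)
            return 0;           /* assumed error: unterminated */
          name = in + 1;
          len = (size_t) (close - name);
          in = close + 1;
        }
      else if (isalpha ((unsigned char) *in))   /* one or more letters */
        {
          name = in;
          while (isalpha ((unsigned char) *in))
            in++;
          len = (size_t) (in - name);
        }
      else if (*in != '\0')     /* exactly one nonalphabetic char */
        {
          name = in++;
          len = 1;
        }
      else
        return 0;               /* assumed error: lone trailing '$' */

      value = find_var (vars, n_vars, name, len);
      if (value != NULL && !append (out, out_size, &pos, value, strlen (value)))
        return 0;
    }
  return 1;
}
@end verbatim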
 665
 666 @node Predefined variables,  , Environment substitutions, Environment variables
 667 @subsection Predefined environment variables
 668
 669 There are two environment variables predefined for use in environment
 670 substitutions:
 671
 672 @table @samp
 673 @item VER
 674 Defined as the version number of PSPP, as a string, in a format
 675 something like @samp{0.9.4}.
 676
 677 @item ARCH
 678 Defined as the host architecture of PSPP, as a string, in standard
 679 cpu-manufacturer-OS format.  For instance, Debian GNU/Linux 1.1 on an
 680 Intel machine defines this as @samp{i586-unknown-linux}.  This is
 681 somewhat dependent on the system used to compile PSPP.
 682 @end table
 683
 684 Nothing prevents these values from being overridden, although it's a
 685 good idea not to do so.
 686
 687 @node Output devices, PostScript driver class, Environment variables, Configuration
 688 @section Output devices
 689
 690 Configuring output devices is the most complicated aspect of configuring
 691 PSPP.  The output device configuration file is named
 692 @file{devices}.  It is searched for using the usual algorithm for
 693 finding configuration files (@pxref{File locations}).  Each line in the
 694 file is read in the usual manner for configuration files
 695 (@pxref{Configuration files}).
 696
 697 Lines in @file{devices} are divided into three categories, described
 698 briefly in the table below:
 699
 700 @table @i
 701 @item driver category definitions
 702 Define a driver in terms of other drivers.
 703
 704 @item macro definitions
 705 Define environment variables local to the output driver
 706 configuration file.
 707
 708 @item device definitions
 709 Describe the configuration of an output device.
 710 @end table
 711
 712 The following sections further elaborate the contents of the
 713 @file{devices} file.
 714
 715 @menu
 716 * Driver categories::           How to organize the driver namespace.
 717 * Macro definitions::           Environment variables local to @file{devices}.
 718 * Device definitions::          Output device descriptions.
 719 * Dimensions::                  Lengths, widths, sizes, @enddots{}
 720 * papersize::                   Letter, legal, A4, envelope, @enddots{}
 721 * Distinguishing line types::   Details on @file{devices} parsing.
 722 * Tokenizing lines::            Dividing @file{devices} lines into tokens.
 723 @end menu
 724
 725 @node Driver categories, Macro definitions, Output devices, Output devices
 726 @subsection Driver categories
 727
 728 Drivers can be divided into categories.  Drivers are specified by their
 729 names, or by the names of the categories that they are contained in.
 730 Only certain drivers are enabled each time PSPP is run; by
 731 default, these are the drivers in the category `default'.  To enable a
 732 different set of drivers, use the @samp{-o @var{device}} command-line
 733 option (@pxref{Invocation}).
 734
 735 Categories are specified with a line of the form
 736 @samp{@var{category}=@var{driver1} @var{driver2} @var{driver3} @var{@dots{}}
 737 @var{driver@var{n}}}.  This line specifies that the category
 738 @var{category} is composed of drivers named @var{driver1},
 739 @var{driver2}, and so on.  There may be any number of drivers in the
 740 category, from zero on up.
 741
 742 Categories may also be specified on the command line
 743 (@pxref{Invocation}).
 744
 745 This is all you need to know about categories.  If you're still curious,
 746 read on.
 747
 748 First of all, the term `categories' is a bit of a misnomer.  In fact,
 749 the internal representation is nothing like the hierarchy that the term
 750 seems to imply: a linear list is used to keep track of the enabled
 751 drivers.
 752
 753 When PSPP first begins reading @file{devices}, this list contains
 754 the name of any drivers or categories specified on the command line, or
 755 the single item `default' if none were specified.
 756
 757 Each time a category definition is encountered, the list is searched for
 758 an item with the value of @var{category}.  If a matching item is found,
 759 it is deleted and the list of drivers (@var{driver1} through
 760 @var{driver@var{n}}) is appended to the list; otherwise, nothing happens.
 761
 762 Each time a driver definition line is encountered, the list is searched.
 763 If the list contains an item with that driver's name, the driver is
 764 enabled and the item is deleted from the list.  Otherwise, the driver
 765 is not enabled.
 766
 767 It is an error if the list is not empty when the end of @file{devices}
 768 is reached.
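
The list manipulation just described can be sketched in C.  The sketch is
illustrative only: the names @code{struct name_list},
@code{define_category}, and @code{driver_is_enabled} are hypothetical, the
list has an assumed fixed capacity, and parsing of the @file{devices} lines
themselves is omitted.

@verbatim
#include <string.h>

#define MAX_ITEMS 64            /* assumed fixed capacity */

/* The linear list of requested driver and category names. */
struct name_list
  {
    const char *items[MAX_ITEMS];
    size_t n;
  };

/* Deletes the first item equal to NAME.  Returns 1 if one was found. */
static int
list_remove (struct name_list *list, const char *name)
{
  size_t i;

  for (i = 0; i < list->n; i++)
    if (strcmp (list->items[i], name) == 0)
      {
        memmove (&list->items[i], &list->items[i + 1],
                 (list->n - i - 1) * sizeof list->items[0]);
        list->n--;
        return 1;
      }
  return 0;
}

/* Handles a category definition: if CATEGORY is in the list, deletes
   it and appends the N_DRIVERS names in DRIVERS.  Returns 0 only if
   the list would overflow. */
int
define_category (struct name_list *list, const char *category,
                 const char *const *drivers, size_t n_drivers)
{
  size_t i;

  if (!list_remove (list, category))
    return 1;                   /* category not requested */
  if (list->n + n_drivers > MAX_ITEMS)
    return 0;
  for (i = 0; i < n_drivers; i++)
    list->items[list->n++] = drivers[i];
  return 1;
}

/* Handles a driver definition: returns 1, and deletes the item, if
   DRIVER_NAME is in the list; otherwise returns 0 (driver disabled). */
int
driver_is_enabled (struct name_list *list, const char *driver_name)
{
  return list_remove (list, driver_name);
}
@end verbatim

After the last line of @file{devices} has been processed, a nonzero
@code{n} member would indicate the error condition described above.  For
example, starting from the single item @samp{default} and reading a line
that defines @samp{default} as the hypothetical drivers @samp{tty} and
@samp{ps}, the list becomes @samp{tty ps}; each item is then removed as the
matching driver definition is read.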
 769
 770 @node Macro definitions, Device definitions, Driver categories, Output devices
 771 @subsection Macro definitions
 772
 773 Macro definitions take the form @samp{define @var{macroname}
 774 @var{definition}}.  In such a macro definition, the environment variable
 775 @var{macroname} is defined to expand to the value @var{definition}.
 776 Before the definition is made, however, any macros used in
 777 @var{definition} are expanded.
 778
 779 Please note the following nuances of macro usage:
 780
 781 @itemize @bullet
 782 @item
 783 For the purposes of this section, @dfn{macro} and @dfn{environment
 784 variable} are synonyms.
 785
 786 @item
 787 Macros may not take arguments.
 788
 789 @item
 790 Macros may not recurse.
 791
 792 @item
 793 Macros are just environment variable definitions like other environment
 794 variable definitions, with the exception that they are limited in scope
 795 to the @file{devices} configuration file.
 796
 797 @item
 798 Macros override all other environment variables of the same name (within
 799 the scope of @file{devices}).
 800
 801 @item
 802 Earlier macro definitions for a particular @var{macroname} override later
 803 ones.  In particular, macro definitions on the command line override
 804 those in the device definition file.  @xref{Non-option Arguments}.
 805
 806 @item
 807 There are two predefined macros, whose values are determined at runtime:
 808
 809 @table @samp
 810 @item viewwidth
 811 Defined as the width of the console screen, in columns of text.
 812
 813 @item viewlength
 814 Defined as the length of the console screen, in lines of text.
 815 @end table
 816 @end itemize
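
The following C sketch shows how a @samp{define} line might be split and
how the precedence rule for repeated definitions works: the first
definition of a name is kept and later ones are ignored.  All names and
fixed sizes are hypothetical, the predefined @samp{viewwidth} and
@samp{viewlength} macros are not modeled, and the definition is assumed to
have had its own macro references expanded before @code{define_macro} is
called.

@verbatim
#include <ctype.h>
#include <string.h>

#define MAX_MACROS 32

struct macro
  {
    char name[64];
    char value[256];
  };

struct macro_table
  {
    struct macro macros[MAX_MACROS];
    size_t n;
  };

/* Splits LINE, of the form "define NAME DEFINITION", in place.
   Returns 1 with *NAME and *DEFINITION set, or 0 if malformed. */
int
parse_define (char *line, char **name, char **definition)
{
  char *p;

  if (strncmp (line, "define", 6) != 0 || !isspace ((unsigned char) line[6]))
    return 0;
  p = line + 6;
  while (isspace ((unsigned char) *p))
    p++;
  if (*p == '\0')
    return 0;                   /* no macro name */
  *name = p;
  while (*p != '\0' && !isspace ((unsigned char) *p))
    p++;
  if (*p != '\0')
    *p++ = '\0';                /* terminate the name */
  while (isspace ((unsigned char) *p))
    p++;
  *definition = p;
  return 1;
}

/* Records NAME = DEFINITION unless NAME is already defined, because
   earlier definitions take precedence.  Returns 0 if the table is
   full or a string is too long. */
int
define_macro (struct macro_table *t, const char *name, const char *definition)
{
  size_t i;

  for (i = 0; i < t->n; i++)
    if (strcmp (t->macros[i].name, name) == 0)
      return 1;                 /* earlier definition wins */
  if (t->n >= MAX_MACROS
      || strlen (name) >= sizeof t->macros[0].name
      || strlen (definition) >= sizeof t->macros[0].value)
    return 0;
  strcpy (t->macros[t->n].name, name);
  strcpy (t->macros[t->n].value, definition);
  t->n++;
  return 1;
}

/* Returns the value of macro NAME, or NULL if it is not defined. */
const char *
find_macro (const struct macro_table *t, const char *name)
{
  size_t i;

  for (i = 0; i < t->n; i++)
    if (strcmp (t->macros[i].name, name) == 0)
      return t->macros[i].value;
  return NULL;
}
@end verbatim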
 817
 818 @node Device definitions, Dimensions, Macro definitions, Output devices
 819 @subsection Driver definitions
 820
 821 Driver definitions are the ultimate purpose of the @file{devices}
 822 configuration file.  These are where the real action is.  Driver
 823 definitions tell PSPP where it should send its output.
 824
 825 Each driver definition line is divided into four fields.  These fields
 826 are delimited by colons (@samp{:}).  Each line is subjected to
 827 environment variable interpolation before it is processed further
 828 (@pxref{Environment substitutions}).  From left to right, the four
 829 fields are, in brief:
 830
 831 @table @i
 832 @item driver name
 833 A unique identifier, used to determine whether to enable the driver.
 834
 835 @item class name
One of the predefined driver classes supported by PSPP.  The
currently supported driver classes include @samp{postscript} and
@samp{ascii}.
 838
 839 @item device type(s)
 840 Zero or more of the following keywords, delimited by spaces:
 841
 842 @table @code
 843 @item screen
 844
 845 Indicates that the device is a screen display.  This may reduce the
 846 amount of buffering done by the driver, to make interactive use more
 847 convenient.
 848
 849 @item printer
 850
 851 Indicates that the device is a printer.
 852
 853 @item listing
 854
 855 Indicates that the device is a listing file.
 856 @end table
 857
 858 These options are just hints to PSPP and do not cause the output to be
 859 directed to the screen, or to the printer, or to a listing file---those
 860 must be set elsewhere in the options.  They are used primarily to decide
 861 which devices should be enabled at any given time.  @xref{SET}, for more
 862 information.
 863
 864 @item options
 865 An optional set of options to pass to the driver itself.  The exact
 866 format for the options varies among drivers.
 867 @end table
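
The field split can be sketched as follows.  This is a hypothetical helper, not PSPP's parser; it assumes that all three colons are present and that any further colons belong to the options field, neither of which is stated above, and it does not trim whitespace.

```c
#include <stdbool.h>
#include <string.h>

/* The four fields of a driver definition line.  Each field points
   into the caller's buffer, which is modified in place. */
struct driver_line
  {
    char *driver_name;
    char *class_name;
    char *device_types;   /* Space-separated keywords; may be empty. */
    char *options;        /* Driver-specific; may be empty. */
  };

/* Splits LINE, which must already have had environment variables
   substituted, at its first three colons.  Everything after the third
   colon (assumed) is the options field.  Returns false if fewer than
   three colons are present. */
bool
split_driver_line (char *line, struct driver_line *out)
{
  char *fields[3];
  char *p = line;

  for (int i = 0; i < 3; i++)
    {
      char *colon = strchr (p, ':');
      if (colon == NULL)
        return false;
      *colon = '\0';
      fields[i] = p;
      p = colon + 1;
    }
  out->driver_name = fields[0];
  out->class_name = fields[1];
  out->device_types = fields[2];
  out->options = p;
  return true;
}
```

For example, @samp{tty:ascii:screen:length=24 width=80} splits into the driver name @samp{tty}, the class @samp{ascii}, the device type @samp{screen}, and the options @samp{length=24 width=80}.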
 868
 869 The driver is enabled if:
 870
 871 @enumerate
 872 @item
Its driver name is specified on the command line, or

@item
It is in a category specified on the command line, or

@item
No categories or driver names are specified on the command line, and
it is in category @code{default}.
 881 @end enumerate
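
Taken together, the three rules amount to a short decision procedure.  The sketch below assumes that driver names and category names from the command line arrive in a single list; @code{driver_is_enabled} and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if NAME is one of the N strings in LIST. */
static bool
list_contains (const char *const *list, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (list[i], name) == 0)
      return true;
  return false;
}

/* Decides whether to enable a driver named DRIVER_NAME that belongs to
   the N_CATEGORIES categories in CATEGORIES.  REQUESTED holds the
   N_REQUESTED driver and category names given on the command line. */
bool
driver_is_enabled (const char *driver_name,
                   const char *const *categories, size_t n_categories,
                   const char *const *requested, size_t n_requested)
{
  /* Rule 3: nothing was requested, so use category `default'. */
  if (n_requested == 0)
    return list_contains (categories, n_categories, "default");

  /* Rule 1: the driver itself was named. */
  if (list_contains (requested, n_requested, driver_name))
    return true;

  /* Rule 2: one of the driver's categories was named. */
  for (size_t i = 0; i < n_categories; i++)
    if (list_contains (requested, n_requested, categories[i]))
      return true;
  return false;
}
```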
 882
 883 For more information on driver names, see @ref{Driver categories}.
 884
 885 The class name must be one of those supported by PSPP.  The
 886 classes supported depend on the options with which PSPP was
 887 compiled.  See later sections in this chapter for descriptions of the
 888 available driver classes.
 889
 890 Options are dependent on the driver.  See the driver descriptions for
 891 details.
 892
 893 @node Dimensions, papersize, Device definitions, Output devices
 894 @subsection Dimensions
 895
Quite often in configuration it is necessary to specify a length or a
size.  PSPP uses a common syntax for all such values, calling them
collectively @dfn{dimensions}.
 899
 900 @itemize @bullet
 901 @item
 902 You can specify dimensions in decimal form (@samp{12.5}) or as
 903 fractions, either as mixed numbers (@samp{12-1/2}) or raw fractions
 904 (@samp{25/2}).
 905
 906 @item
A number of different units are available.  These are suffixed to the
numeric part of the dimension, with no spaces between the number and
the unit.  Apart from @samp{"}, a synonym for @code{in}, the available
units are taken from the popular typesetting system @TeX{}:
 911
 912 @table @code
 913 @item in
 914 inch (1 @code{in} = 2.54 @code{cm})
 915
 916 @item "
inch (synonym for @code{in})
 918
 919 @item pt
 920 printer's point (1 @code{in} = 72.27 @code{pt})
 921
 922 @item pc
 923 pica (12 @code{pt} = 1 @code{pc})
 924
 925 @item bp
 926 PostScript point (1 @code{in} = 72 @code{bp})
 927
 928 @item cm
 929 centimeter
 930
 931 @item mm
 932 millimeter (10 @code{mm} = 1 @code{cm})
 933
 934 @item dd
 935 didot point (1157 @code{dd} = 1238 @code{pt})
 936
 937 @item cc
 938 cicero (1 @code{cc} = 12 @code{dd})
 939
 940 @item sp
 941 scaled point (65536 @code{sp} = 1 @code{pt})
 942 @end table
 943
 944 @item
 945 If no explicit unit is given, a DWIM@footnote{Do What I Mean}
 946 ``feature'' attempts to guess the best unit:
 947
 948 @itemize @minus
 949 @item
 950 Numbers less than 50 are assumed to be in inches.
 951
 952 @item
 953 Numbers 50 or greater are assumed to be in millimeters.
 954 @end itemize
 955 @end itemize
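
A sketch of a dimension parser covering the three number forms, the unit table, and the DWIM rule.  It converts everything to PostScript points (@code{bp}), an arbitrary choice for this illustration; @code{parse_dimension} is a hypothetical name, and PSPP's own parser may accept or reject some inputs differently (for example, this version relies on @code{strtod} and so also accepts exponents).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Each unit's size in PostScript points (1 bp = 1/72 in), derived
   from the relations in the table above. */
struct unit
  {
    const char *suffix;
    double bp;
  };

static const struct unit units[] =
  {
    { "in", 72.0 },
    { "\"", 72.0 },
    { "pt", 72.0 / 72.27 },
    { "pc", 12.0 * 72.0 / 72.27 },
    { "bp", 1.0 },
    { "cm", 72.0 / 2.54 },
    { "mm", 72.0 / 25.4 },
    { "dd", 1238.0 / 1157.0 * 72.0 / 72.27 },
    { "cc", 12.0 * 1238.0 / 1157.0 * 72.0 / 72.27 },
    { "sp", 72.0 / 72.27 / 65536.0 },
  };

/* Parses a dimension such as "12.5", "12-1/2in", "25/2cm", or "0.5pt"
   and stores its value, in PostScript points, in *BP.  Returns false
   if S is malformed or has an unknown unit. */
bool
parse_dimension (const char *s, double *bp)
{
  char *end;
  double value = strtod (s, &end);
  if (end == s)
    return false;

  if (*end == '-' || *end == '/')
    {
      /* Mixed number "W-N/D" or raw fraction "N/D". */
      double whole = 0.0, num = value, den;
      const char *p = end;
      if (*p == '-')
        {
          whole = value;
          num = strtod (p + 1, &end);
          if (end == p + 1 || *end != '/')
            return false;
          p = end;
        }
      den = strtod (p + 1, &end);
      if (end == p + 1 || den == 0.0)
        return false;
      value = whole + num / den;
    }

  if (*end == '\0')
    {
      /* No unit: DWIM guesses inches below 50, millimeters otherwise. */
      *bp = value < 50.0 ? value * 72.0 : value * 72.0 / 25.4;
      return true;
    }
  for (size_t i = 0; i < sizeof units / sizeof units[0]; i++)
    if (strcmp (end, units[i].suffix) == 0)
      {
        *bp = value * units[i].bp;
        return true;
      }
  return false;
}
```

For example, @samp{8-1/2} has no unit and is less than 50, so it is taken as inches and yields 612 points; @samp{210} is taken as millimeters.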
 956
 957 @node papersize, Distinguishing line types, Dimensions, Output devices
 958 @subsection Paper sizes
 959
Output drivers usually deal with some sort of hardcopy media.  This
medium is called @dfn{paper} by the drivers, though in reality it could
 962 be a transparency or film or thinly veiled sarcasm.  To make it easier
 963 for you to deal with paper, PSPP allows you to have (of course!) a
 964 configuration file that gives symbolic names, like ``letter'' or
 965 ``legal'' or ``a4'', to paper sizes, rather than forcing you to use
 966 cryptic numbers like ``8-1/2 x 11'' or ``210 by 297''.  Surprisingly
 967 enough, this configuration file is named @file{papersize}.
 968 @xref{Configuration files}.
 969
 970 When PSPP tries to connect a symbolic paper name to a paper size, it
 971 reads and parses each non-comment line in the file, in order.  The first
 972 field on each line must be a symbolic paper name in double quotes.
 973 Paper names may not contain double quotes.  Paper names are not
 974 case-sensitive: @samp{legal} and @samp{Legal} are equivalent.
 975
 976 If a match is found for the paper name, the rest of the line is parsed.
 977 If it is found to be a pair of dimensions (@pxref{Dimensions}) separated
 978 by either @samp{x} or @samp{by}, then those are taken to be the paper
 979 size, in order of width followed by length.  There @emph{must} be at
 980 least one space on each side of @samp{x} or @samp{by}.
 981
 982 Otherwise the line must be of the form
 983 @samp{"@var{paper-1}"="@var{paper-2}"}.  In this case the target of the
 984 search becomes paper name @var{paper-2} and the search through the file
 985 continues.
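
The search with aliases can be sketched as below, assuming the caller has already read the non-comment lines into an array.  Two details are assumptions of this sketch rather than statements of the text above: after an alias, the search resumes at the following line instead of restarting, and the dimensions are returned as unparsed strings for a dimension parser (@pxref{Dimensions}).  @code{lookup_paper} is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Compares the LEN bytes at A with string B, ignoring case. */
static bool
name_matches (const char *a, size_t len, const char *b)
{
  if (strlen (b) != len)
    return false;
  for (size_t i = 0; i < len; i++)
    if (tolower ((unsigned char) a[i]) != tolower ((unsigned char) b[i]))
      return false;
  return true;
}

/* Searches the N non-comment LINES of a `papersize' file for paper
   NAME, following aliases.  On success, copies the width and length
   strings into WIDTH and LENGTH and returns true. */
bool
lookup_paper (const char *const *lines, size_t n, const char *name,
              char width[32], char length[32])
{
  char target[64];
  snprintf (target, sizeof target, "%s", name);

  for (size_t i = 0; i < n; i++)
    {
      const char *line = lines[i];
      const char *close;

      /* The first field must be a paper name in double quotes; lines
         not in that form are skipped in this sketch. */
      if (line[0] != '"' || (close = strchr (line + 1, '"')) == NULL)
        continue;
      if (!name_matches (line + 1, (size_t) (close - line - 1), target))
        continue;

      const char *rest = close + 1;
      if (rest[0] == '=' && rest[1] == '"')
        {
          /* "paper-1"="paper-2": search on for paper-2. */
          const char *end = strchr (rest + 2, '"');
          if (end == NULL)
            return false;
          snprintf (target, sizeof target, "%.*s",
                    (int) (end - rest - 2), rest + 2);
          continue;
        }

      /* Otherwise expect "WIDTH x LENGTH" or "WIDTH by LENGTH". */
      char sep[4];
      return (sscanf (rest, "%31s %3s %31s", width, sep, length) == 3
              && (strcmp (sep, "x") == 0 || strcmp (sep, "by") == 0));
    }
  return false;
}
```

Because @code{sscanf}'s @code{%s} stops at whitespace, a value such as @samp{8-1/2x11} is read as a single token and rejected, which matches the rule that @samp{x} and @samp{by} must have a space on each side.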
 986
 987 @node Distinguishing line types, Tokenizing lines, papersize, Output devices
 988 @subsection How lines are divided into types
 989
 990 The lines in @file{devices} are distinguished in the following manner:
 991
 992 @enumerate
 993 @item
 994 Leading whitespace is removed.
 995
 996 @item
 997 If the resulting line begins with the exact string @code{define},
 998 followed by one or more whitespace characters, the line is processed as
 999 a macro definition.
1000
1001 @item
1002 Otherwise, the line is scanned for the first instance of a colon
1003 (@samp{:}) or an equals sign (@samp{=}).
1004
1005 @item
1006 If a colon is encountered first, the line is processed as a driver
1007 definition.
1008
1009 @item
1010 Otherwise, if an equals sign is encountered, the line is processed as a
1011 macro definition.
1012
1013 @item
1014 Otherwise, the line is ill-formed.
1015 @end enumerate
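
The classification steps above translate almost directly into C.  @code{classify_devices_line} and the @code{line_type} values are hypothetical names; comment and blank lines, which the steps above do not cover, are assumed to have been removed beforehand.

```c
#include <ctype.h>
#include <string.h>

enum line_type { LINE_MACRO, LINE_DRIVER, LINE_ILL_FORMED };

/* Classifies one line of `devices', following the steps above. */
enum line_type
classify_devices_line (const char *line)
{
  /* Step 1: skip leading whitespace. */
  while (isspace ((unsigned char) *line))
    line++;

  /* Step 2: "define" followed by at least one whitespace character. */
  if (strncmp (line, "define", 6) == 0 && isspace ((unsigned char) line[6]))
    return LINE_MACRO;

  /* Steps 3-6: whichever of `:' and `=' comes first decides. */
  const char *p = strpbrk (line, ":=");
  if (p == NULL)
    return LINE_ILL_FORMED;
  return *p == ':' ? LINE_DRIVER : LINE_MACRO;
}
```

Note that a line such as @samp{x=a:b} is a macro definition, because its equals sign comes before its colon.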
1016
1017 @node Tokenizing lines,  , Distinguishing line types, Output devices
1018 @subsection How lines are divided into tokens
1019
1020 Each driver definition line is run through a simple tokenizer.  This
1021 tokenizer recognizes two basic types of tokens.
1022
1023 The first type is an equals sign (@samp{=}).  Equals signs are both
1024 delimiters between tokens and tokens in themselves.
1025
1026 The second type is an identifier or string token.  Identifiers and
1027 strings are equivalent after tokenization, though they are written
differently.  An identifier is any sequence of characters other than
whitespace and equals signs.
1030
1031 A string is introduced by a single- or double-quote character (@samp{'}
1032 or @samp{"}) and, in general, continues until the next occurrence of
1033 that same character.  The following standard C escapes can also be
1034 embedded within strings:
1035
1036 @table @code
1037 @item \'
1038 A single-quote (@samp{'}).
1039
1040 @item \"
1041 A double-quote (@samp{"}).
1042
1043 @item \?
1044 A question mark (@samp{?}).  Included for hysterical raisins.
1045
1046 @item \\
1047 A backslash (@samp{\}).
1048
1049 @item \a
1050 Audio bell (ASCII 7).
1051
1052 @item \b
1053 Backspace (ASCII 8).
1054
1055 @item \f
1056 Formfeed (ASCII 12).
1057
1058 @item \n
Newline (ASCII 10).
1060
1061 @item \r
1062 Carriage return (ASCII 13).
1063
1064 @item \t
1065 Tab (ASCII 9).
1066
1067 @item \v
1068 Vertical tab (ASCII 11).
1069
1070 @item \@var{o}@var{o}@var{o}
1071 Each @samp{o} must be an octal digit.  The character is the one having
1072 the octal value specified.  Any number of octal digits is read and
1073 interpreted; only the lower 8 bits are used.
1074
1075 @item \x@var{h}@var{h}
1076 Each @samp{h} must be a hex digit.  The character is the one having the
1077 hexadecimal value specified.  Any number of hex digits is read and
1078 interpreted; only the lower 8 bits are used.
1079 @end table
1080
1081 Tokens, outside of quoted strings, are delimited by whitespace or equals
1082 signs.
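
A sketch of reading one quoted string token, including the escapes in the table above.  @code{read_string_token} is a hypothetical helper.  Treating an unlisted escape such as @samp{\q} as the literal character, and @samp{\x} with no digits as byte 0, are assumptions; the text above does not say what happens in those cases.

```c
#include <stddef.h>
#include <string.h>

/* Returns the value of hex digit C, or -1 if C is not a hex digit. */
static int
hex_value (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Reads the quoted string token at *SP, which must begin with ' or ".
   Stores the unescaped contents, null-terminated, in OUT (OUT_SIZE
   bytes), advances *SP past the closing quote, and returns the number
   of bytes stored.  Returns -1 if the string is unterminated or does
   not fit. */
int
read_string_token (const char **sp, char *out, size_t out_size)
{
  static const char simple_from[] = "'\"?\\abfnrtv";
  static const char simple_to[] = "'\"?\\\a\b\f\n\r\t\v";
  const char *s = *sp;
  const char quote = *s++;
  size_t n = 0;

  if (out_size == 0)
    return -1;
  while (*s != quote)
    {
      const char *m;
      int c;

      if (*s == '\0')
        return -1;                      /* Unterminated string. */
      if (*s != '\\')
        c = (unsigned char) *s++;       /* Ordinary character. */
      else if (*++s == '\0')
        return -1;                      /* Backslash at end of input. */
      else if ((m = strchr (simple_from, *s)) != NULL)
        {
          c = (unsigned char) simple_to[m - simple_from];
          s++;
        }
      else if (*s == 'x')
        {
          /* Any number of hex digits; keep only the low 8 bits. */
          for (c = 0, s++; hex_value (*s) >= 0; s++)
            c = ((c << 4) | hex_value (*s)) & 0xff;
        }
      else if (*s >= '0' && *s <= '7')
        {
          /* Any number of octal digits; keep only the low 8 bits. */
          for (c = 0; *s >= '0' && *s <= '7'; s++)
            c = ((c << 3) | (*s - '0')) & 0xff;
        }
      else
        c = (unsigned char) *s++;       /* Unlisted escape: assumed literal. */

      if (n + 1 >= out_size)
        return -1;                      /* No room for C plus terminator. */
      out[n++] = (char) c;
    }
  out[n] = '\0';
  *sp = s + 1;                          /* Skip the closing quote. */
  return (int) n;
}
```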
1083
1084 @node PostScript driver class, ASCII driver class, Output devices, Configuration
1085 @section The PostScript driver class
1086
1087 The @code{postscript} driver class is used to produce output that is
1088 acceptable to PostScript printers and to PC-based PostScript
1089 interpreters such as Ghostscript.  Continuing a long tradition,
1090 PSPP's PostScript driver is configurable to the point of
1091 absurdity.
1092
1093 There are actually two PostScript drivers.  The first one,
1094 @samp{postscript}, produces ordinary DSC-compliant PostScript output.
The second one, @samp{epsf}, produces an Encapsulated PostScript file.
1096 The two drivers are otherwise identical in configuration and in
1097 operation.
1098
1099 The PostScript driver is described in further detail below.
1100
1101 @menu
1102 * PS output options::           Output file options.
1103 * PS page options::             Paper, margins, scaling & rotation, more!
1104 * PS file options::             Configuration files.
1105 * PS font options::             Default fonts, font options.
1106 * PS line options::             Line widths, options.
1107 * Prologue::                    Details on the PostScript prologue.
1108 * Encodings::                   Details on PostScript font encodings.
1109 @end menu
1110
1111 @node PS output options, PS page options, PostScript driver class, PostScript driver class
1112 @subsection PostScript output options
1113
1114 These options deal with the form of the output and the output file
1115 itself:
1116
1117 @table @code
1118 @item output-file=@var{filename}
1119
File to which output should be sent.  This can be an ordinary filename
(e.g., @code{"pspp.ps"}), a pipe filename (e.g., @code{"|lpr"}), or
stdout (@code{"-"}).  Default: @code{"pspp.ps"}.
1123
1124 @item color=@var{boolean}
1125
Most of the time, black-and-white PostScript devices are smart enough to
map colors to shades themselves.  However, you can make the PSPP
driver perform its own, uglier simulation of this by turning
@code{color} off.  Default: @code{on}.
1130
1131 This is a boolean setting, as are many settings in the PostScript
1132 driver.  Valid positive boolean values are @samp{on}, @samp{true},
1133 @samp{yes}, and nonzero integers.  Negative boolean values are
1134 @samp{off}, @samp{false}, @samp{no}, and zero.
1135
1136 @item data=@var{data-type}
1137
1138 One of @code{clean7bit}, @code{clean8bit}, or @code{binary}.  This
1139 controls what characters will be written to the output file.  PostScript
1140 produced with @code{clean7bit} can be transmitted over 7-bit
1141 transmission channels that use ASCII control characters for line
1142 control.  @code{clean8bit} is similar but allows characters above 127 to
1143 be written to the output file.  @code{binary} allows any character in
1144 the output file.  Default: @code{clean7bit}.
1145
1146 @item line-ends=@var{line-end-type}
1147
1148 One of @code{cr}, @code{lf}, or @code{crlf}.  This controls what is used
1149 for newline in the output file.  Default: @code{cr}.
1150
1151 @item optimize-line-size=@var{level}
1152
1153 Either @code{0} or @code{1}.  If @var{level} is @code{1}, then short
1154 line segments will be collected and merged into longer ones.  This
1155 reduces output file size but requires more time and memory.  A
1156 @var{level} of @code{0} has the advantage of being better for
1157 interactive environments.  @code{1} is the default unless the
1158 @code{screen} flag is set; in that case, the default is @code{0}.
1159
1160 @item optimize-text-size=@var{level}
1161
One of @code{0}, @code{1}, or @code{2}, each higher level representing
correspondingly more aggressive space savings for text in the output
file and requiring correspondingly more time and memory.  Unfortunately,
at present all three levels behave identically.  @code{1} is the default
unless the @code{screen} flag is set; in that case, the default is
@code{0}.
1167 @end table
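
The boolean rules described under @code{color} can be sketched as follows.  @code{parse_boolean} is a hypothetical name, and the sketch assumes the keywords are matched case-sensitively in lowercase, which the text does not specify.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parses a boolean option value.  Returns 1 for a positive value,
   0 for a negative value, or -1 if S is not a valid boolean. */
int
parse_boolean (const char *s)
{
  static const char *const positive[] = { "on", "true", "yes" };
  static const char *const negative[] = { "off", "false", "no" };

  for (size_t i = 0; i < 3; i++)
    {
      if (strcmp (s, positive[i]) == 0)
        return 1;
      if (strcmp (s, negative[i]) == 0)
        return 0;
    }

  /* Otherwise the value must be an integer: zero is negative, any
     other integer is positive. */
  char *end;
  long value = strtol (s, &end, 10);
  if (end == s || *end != '\0')
    return -1;
  return value != 0;
}
```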
1168
1169 @node PS page options, PS file options, PS output options, PostScript driver class
1170 @subsection PostScript page options
1171
1172 These options affect page setup:
1173
1174 @table @code
1175 @item headers=@var{boolean}
1176
1177 Controls whether the standard headers showing the time and date and
1178 title and subtitle are printed at the top of each page.  Default:
1179 @code{on}.
1180
1181 @item paper-size=@var{paper-size}
1182
Paper size, either as a symbolic name (e.g., @code{letter} or @code{a4})
or as specific measurements (e.g., @code{"8-1/2 x 11"} or
@code{"210 x 297"}).  @xref{papersize, , Paper sizes}.  Default:
@code{letter}.
1186
1187 @item orientation=@var{orientation}
1188
1189 Either @code{portrait} or @code{landscape}.  Default: @code{portrait}.
1190
1191 @item left-margin=@var{dimension}
1192 @itemx right-margin=@var{dimension}
1193 @itemx top-margin=@var{dimension}
1194 @itemx bottom-margin=@var{dimension}
1195
1196 Sets the margins around the page.  The headers, if enabled, are not
1197 included in the margins; they are in addition to the margins.  For a
1198 description of dimensions, see @ref{Dimensions}.  Default: @code{0.5in}.
1199
1200 @end table
1201
1202 @node PS file options, PS font options, PS page options, PostScript driver class
1203 @subsection PostScript file options
1204
1205 Oh, my.  You don't really want to know about the way that the PostScript
1206 driver deals with files, do you?  Well I suppose you're entitled, but I
1207 warn you right now: it's not pretty.  Here goes@enddots{}
1208
1209 First let's look at the options that are available:
1210
1211 @table @code
1212
1213 @item font-dir=@var{font-directory}
1214
1215 Sets the font directory.  Default: @code{devps}.
1216
1217 @item prologue-file=@var{prologue-file-name}
1218
1219 Sets the name of the PostScript prologue file.  You can write your own
1220 prologue, though I have no idea why you'd want to: see @ref{Prologue}.
1221 Default: @code{ps-prologue}.
1222
1223 @item device-file=@var{device-file-name}
1224
1225 Sets the name of the Groff-format device description file.  The
1226 PostScript driver reads this in order to know about the scaling of fonts
1227 and so on.  The format of such files is described in groff_font(5),
1228 included with Groff.  Default: @code{DESC}.
1229
1230 @item encoding-file=@var{encoding-file-name}
1231
1232 Sets the name of the encoding file.  This file contains a list of all
1233 font encodings that will be needed so that the driver can put all of
1234 them at the top of the prologue.  @xref{Encodings}.  Default:
1235 @code{ps-encodings}.
1236
1237 If the specified encoding file cannot be found, this error will be
1238 silently ignored, since most people do not need any encodings besides
the ones that can be found using @code{auto-encode}, described below.
1240
1241 @item auto-encode=@var{boolean}
1242
1243 When enabled, the font encodings needed by the default proportional- and
1244 fixed-pitch fonts will automatically be dumped to the PostScript
1245 output.  Otherwise, it is assumed that the user has an encoding file
1246 and knows how to use it (@pxref{Encodings}).  There is probably no good
1247 reason to turn off this convenient feature.  Default: @code{on}.
1248
1249 @end table
1250
1251 Next I suppose it's time to describe the search algorithm.  When the
1252 PostScript driver needs a file, whether that file be a font, a
1253 PostScript prologue, or what you will, it searches in this manner:
1254
1255 @enumerate
1256
1257 @item
1258 Constructs a path by taking the first of the following that is defined:
1259
1260 @enumerate a
1261
1262 @item
1263 Environment variable @code{STAT_GROFF_FONT_PATH}.  @xref{Environment
1264 variables}.
1265
1266 @item
1267 Environment variable @code{GROFF_FONT_PATH}.
1268
1269 @item
1270 The compiled-in fallback default.
1271 @end enumerate
1272
1273 @item
Constructs a base name by concatenating, in order, the font directory,
a path separator (@samp{/} or @samp{\}), and the file to be found.  A
typical base name would be something like @code{devps/ps-encodings}.
1277
1278 @item
1279 Searches for the base name in the path constructed above.  If the file
1280 is found, the algorithm terminates.
1281
1282 @item
1283 Searches for the base name in the standard configuration path.  See
1284 @ref{File locations}, for more details.  If the file is found, the
1285 algorithm terminates.
1286
1287 @item
Removes the font directory and path separator from the base name, so
that the base name is simply the file to be found, e.g.,
@code{ps-encodings}.
1291
1292 @item
1293 Searches for the base name in the path constructed in the first step.
1294 If the file is found, the algorithm terminates.
1295
1296 @item
1297 Searches for the base name in the standard configuration path.  If the
1298 file is found, the algorithm terminates.
1299
1300 @item
1301 The algorithm terminates unsuccessfully.
1302 @end enumerate
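
To make the search order concrete, the sketch below lists every candidate file name in the order the steps above would try them.  It assumes the two paths have already been split into arrays of directories and uses @samp{/} as the separator; @code{choose_font_path} and @code{ps_search_candidates} are hypothetical, and @var{fallback} stands in for the compiled-in default.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CANDIDATE_LEN 256

/* Step 1: picks the font path.  FALLBACK stands in for the compiled-in
   default. */
const char *
choose_font_path (const char *fallback)
{
  const char *path = getenv ("STAT_GROFF_FONT_PATH");
  if (path == NULL)
    path = getenv ("GROFF_FONT_PATH");
  return path != NULL ? path : fallback;
}

/* Steps 2-7: stores in OUT, in search order, each file name that would
   be tried, and returns how many were stored (at most MAX_OUT).
   FONT_PATH and CONFIG_PATH are the two search paths, already split
   into directories. */
size_t
ps_search_candidates (const char *font_dir, const char *file,
                      const char *const *font_path, size_t n_font_path,
                      const char *const *config_path, size_t n_config_path,
                      char out[][CANDIDATE_LEN], size_t max_out)
{
  const char *const *paths[2] = { font_path, config_path };
  const size_t n_paths[2] = { n_font_path, n_config_path };
  char with_dir[CANDIDATE_LEN];
  const char *bases[2];
  size_t n = 0;

  /* Step 2 builds e.g. "devps/ps-encodings"; step 5 drops the font
     directory, leaving just FILE. */
  snprintf (with_dir, sizeof with_dir, "%s/%s", font_dir, file);
  bases[0] = with_dir;
  bases[1] = file;

  /* Steps 3 and 6 search the font path; steps 4 and 7 search the
     standard configuration path. */
  for (int b = 0; b < 2; b++)
    for (int p = 0; p < 2; p++)
      for (size_t i = 0; i < n_paths[p] && n < max_out; i++)
        snprintf (out[n++], CANDIDATE_LEN, "%s/%s", paths[p][i], bases[b]);
  return n;
}
```

A caller would try each candidate in turn, for example with @code{fopen}, and stop at the first one that opens; if none does, the search fails, as in the last step.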
1303
1304 So, as you see, there are several ways to configure the PostScript
1305 drivers.  Careful selection of techniques can make the configuration
1306 very flexible indeed.
1307
1308 @node PS font options, PS line options, PS file options, PostScript driver class
1309 @subsection PostScript font options
1310
1311 The list of available font options is short and sweet:
1312
1313 @table @code
1314 @item prop-font=@var{font-name}
1315
1316 Sets the default proportional font.  The name should be that of a
1317 PostScript font.  Default: @code{"Helvetica"}.
1318
1319 @item fixed-font=@var{font-name}
1320
1321 Sets the default fixed-pitch font.  The name should be that of a
1322 PostScript font.  Default: @code{"Courier"}.
1323
1324 @item font-size=@var{font-size}
1325
1326 Sets the size of the default fonts, in thousandths of a point.  Default:
1327 @code{10000}.
1328
1329 @end table
1330
1331 @node PS line options, Prologue, PS font options, PostScript driver class
1332 @subsection PostScript line options
1333
1334 Most tables contain lines, or rules, between cells.  Some features of
1335 the way that lines are drawn in PostScript tables are user-definable:
1336
1337 @table @code
1338
1339 @item line-style=@var{style}
1340
1341 Sets the style used for lines used to divide tables into sections.
@var{style} must be either @code{thick}, in which case thick lines are
used, or @code{double}, in which case double lines are used.  Default:
@code{thick}.
1345
1346 @item line-gutter=@var{dimension}
1347
1348 Sets the line gutter, which is the amount of whitespace on either side
1349 of lines that border text or graphics objects.  @xref{Dimensions}.
1350 Default: @code{0.5pt}.
1351
1352 @item line-spacing=@var{dimension}
1353
1354 Sets the line spacing, which is the amount of whitespace that separates
1355 lines that are side by side, as in a double line.  Default:
1356 @code{0.5pt}.
1357
1358 @item line-width=@var{dimension}
1359
1360 Sets the width of a typical line used in tables.  Default: @code{0.5pt}.
1361
1362 @item line-width-thick=@var{dimension}
1363
Sets the width of a thick line used in tables.  Not used if
@code{line-style} is set to @code{double}.  Default: @code{1.5pt}.
1366
1367 @end table
1368
1369 @node Prologue, Encodings, PS line options, PostScript driver class
1370 @subsection The PostScript prologue
1371
1372 Most PostScript files that are generated mechanically by programs
1373 consist of two parts: a prologue and a body.  The prologue is generally
1374 a collection of boilerplate.  Only the body differs greatly between
1375 two outputs from the same program.
1376
This is also the strategy used in the PSPP PostScript driver.  In
general, the prologue supplied with PSPP will be more than sufficient,
and you will not need to read the rest of this section.  However,
hackers might want to know more.  Read on, if you fall into this
category.
1382
1383 The prologue is dumped into the output stream essentially unmodified.
1384 However, two actions are performed on its lines.  First, certain lines
1385 may be omitted as specified in the prologue file itself.  Second,
1386 variables are substituted.
1387
1388 The following lines are omitted:
1389
1390 @enumerate
1391 @item
1392 All lines that contain three bangs in a row (@code{!!!}).
1393
1394 @item
1395 Lines that contain @code{!eps}, if the PostScript driver is producing
1396 ordinary PostScript output.  Otherwise an EPS file is being produced,
1397 and the line is included in the output, although everything following
1398 @code{!eps} is deleted.
1399
1400 @item
1401 Lines that contain @code{!ps}, if the PostScript driver is producing EPS
1402 output.  Otherwise, ordinary PostScript is being produced, and the line
1403 is included in the output, although everything following @code{!ps} is
1404 deleted.
1405 @end enumerate
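
The omission rules can be sketched as a per-line filter.  @code{filter_prologue_line} is a hypothetical name, and the sketch assumes each line arrives without its trailing newline, since truncating at a marker would otherwise also remove the newline.

```c
#include <stdbool.h>
#include <string.h>

/* Applies the omission rules to one prologue line, in place.  EPS is
   true when Encapsulated PostScript is being produced.  Returns false
   if the line should be dropped; otherwise returns true, possibly
   after truncating LINE at a `!eps' or `!ps' marker. */
bool
filter_prologue_line (char *line, bool eps)
{
  char *mark;

  /* Rule 1: lines containing `!!!' are always omitted. */
  if (strstr (line, "!!!") != NULL)
    return false;

  /* Rule 2: `!eps' lines are kept, truncated, only for EPS output. */
  if ((mark = strstr (line, "!eps")) != NULL)
    {
      if (!eps)
        return false;
      *mark = '\0';
    }

  /* Rule 3: `!ps' lines are kept, truncated, only for ordinary
     PostScript output. */
  if ((mark = strstr (line, "!ps")) != NULL)
    {
      if (eps)
        return false;
      *mark = '\0';
    }
  return true;
}
```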
1406
1407 The following are the variables that are substituted.  Only the
1408 variables listed are substituted; environment variables are not.
1409 @xref{Environment substitutions}.
1410
1411 @table @code
1412 @item bounding-box
1413
1414 The page bounding box, in points, as four space-separated numbers.  For
1415 U.S. letter size paper, this is @samp{0 0 612 792}.
1416
1417 @item creator
1418
1419 PSPP version as a string: @samp{GNU PSPP 0.1b}, for example.
1420
1421 @item date
1422
1423 Date the file was created.  Example: @samp{Tue May 21 13:46:22 1991}.
1424
1425 @item data
1426
1427 Value of the @code{data} PostScript driver option, as one of the strings
1428 @samp{Clean7Bit}, @samp{Clean8Bit}, or @samp{Binary}.
1429
1430 @item orientation
1431
1432 Page orientation, as one of the strings @code{Portrait} or
1433 @code{Landscape}.
1434
1435 @item user
1436
1437 Under multiuser OSes, the user's login name, taken either from the
1438 environment variable @code{LOGNAME} or, if that fails, the result of the
1439 C library function @code{getlogin()}.  Defaults to @samp{nobody}.
1440
1441 @item host
1442
1443 System hostname as reported by @code{gethostname()}.  Defaults to
1444 @samp{nowhere}.
1445
1446 @item prop-font
1447
1448 Name of the default proportional font, prefixed by the word
1449 @samp{font} and a space.  Example: @samp{font Times-Roman}.
1450
1451 @item fixed-font
1452
1453 Name of the default fixed-pitch font, prefixed by the word @samp{font}
1454 and a space.
1455
1456 @item scale-factor
1457
1458 The page scaling factor as a floating-point number.  Example:
1459 @code{1.0}.  Note that this is also passed as an argument to the BP
1460 macro.
1461
1462 @item paper-length
@itemx paper-width
1464
1465 The paper length and paper width, respectively, in thousandths of a
1466 point.  Note that these are also passed as arguments to the BP macro.
1467
1468 @item left-margin
@itemx top-margin
1470
1471 The left margin and top margin, respectively, in thousandths of a
1472 point.  Note that these are also passed as arguments to the BP macro.
1473
1474 @item title
1475
1476 Document title as a string.  This is not the title specified in the
1477 PSPP syntax file.  A typical title is the word @samp{PSPP} followed
1478 by the syntax file name in parentheses.  Example: @samp{PSPP
1479 (<stdin>)}.
1480
1481 @item source-file
1482
1483 PSPP syntax file name.  Example: @samp{mary96/first.stat}.
1484
1485 @end table
1486
1487 Any other questions about the PostScript prologue can best be answered
1488 by examining the default prologue or the PSPP source.
1489
1490 @node Encodings,  , Prologue, PostScript driver class
1491 @subsection PostScript encodings
1492
1493 PostScript fonts often contain many more than 256 characters, in order
1494 to accommodate foreign language characters and special symbols.
1495 PostScript uses @dfn{encodings} to map these onto single-byte symbol
1496 sets.  Each font can have many different encodings applied to it.
1497
1498 PSPP's PostScript driver needs to know which encoding to apply to each
1499 font.  It can determine this from the information encapsulated in the
1500 Groff font description that it reads.  However, there is an additional
1501 problem---for efficiency, the PostScript driver needs to have a complete
1502 list of all encodings that will be used in the entire session @emph{when
it opens the output file}.  At that point it does not yet know which
fonts will be used, so it cannot rely on the encoding information in
the font descriptions.
1506
1507 As a stopgap solution, there are two mechanisms for specifying which
1508 encodings will be used.  The first mechanism is automatic and it is the
1509 only one that most PSPP users will ever need.  The second mechanism is
1510 manual, but it is more flexible.  Either mechanism or both may be used
1511 at one time.
1512
1513 The first mechanism is activated by the @samp{auto-encode} driver option
1514 (@pxref{PS file options}).  When enabled, @samp{auto-encode} causes the
1515 PostScript driver to include the encodings used by the default
1516 proportional and fixed-pitch fonts (@pxref{PS font options}).  Many
1517 PSPP output files will only need these encodings.
1518
1519 The second mechanism is the file specified by the @samp{encoding-file}
1520 option (@pxref{PS file options}).  If it exists, this file must consist
1521 of lines in PSPP configuration-file format (@pxref{Configuration
1522 files}).  Each line that is not a comment should name a PostScript
1523 encoding to include in the output.
1524
1525 It is not an error if an encoding is included more than once, by either
1526 mechanism.  It will appear only once in the output.  It is also not an
1527 error if an encoding is included in the output but never used.  It
1528 @emph{is} an error if an encoding is used but not included by one of
1529 these mechanisms.  In this case, the built-in PostScript encoding
1530 @samp{ISOLatin1Encoding} is substituted.
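
A minimal sketch of the bookkeeping described above, assuming encodings are tracked by name in a small fixed-size set.  @code{encoding_set}, @code{encoding_set_add}, and @code{encoding_for_font} are hypothetical names, not part of PSPP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENCODINGS 32

/* Hypothetical set of encoding names collected before the output file
   is opened, from `auto-encode' and/or the encoding file. */
struct encoding_set
  {
    const char *names[MAX_ENCODINGS];
    size_t count;
  };

static bool
encoding_set_contains (const struct encoding_set *set, const char *name)
{
  for (size_t i = 0; i < set->count; i++)
    if (strcmp (set->names[i], name) == 0)
      return true;
  return false;
}

/* Adds NAME unless it is already present, so that an encoding named
   by both mechanisms, or twice by one, is emitted only once. */
void
encoding_set_add (struct encoding_set *set, const char *name)
{
  if (!encoding_set_contains (set, name) && set->count < MAX_ENCODINGS)
    set->names[set->count++] = name;
}

/* Returns the encoding to apply to a font that wants WANTED: WANTED
   itself if it was included, otherwise the built-in fallback. */
const char *
encoding_for_font (const struct encoding_set *set, const char *wanted)
{
  return encoding_set_contains (set, wanted) ? wanted : "ISOLatin1Encoding";
}
```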
1531
1532 @node ASCII driver class, HTML driver class, PostScript driver class, Configuration
1533 @section The ASCII driver class
1534
The ASCII driver class produces output that can be displayed on a
terminal or sent to a printer.  It is highly configurable.  The ASCII
driver has class name @samp{ascii}.
1538
1539 The ASCII driver is described in further detail below.
1540
1541 @menu
1542 * ASCII output options::        Output file options.
1543 * ASCII page options::          Page size, margins, more.
1544 * ASCII font options::          Box character, bold & italics.
1545 @end menu
1546
1547 @node ASCII output options, ASCII page options, ASCII driver class, ASCII driver class
1548 @subsection ASCII output options
1549
1550 @table @code
1551 @item output-file=@var{filename}
1552
1553 File to which output should be sent.  This can be an ordinary filename
1554 (e.g., @code{"pspp.txt"}), a pipe filename (e.g., @code{"|lpr"}), or
1555 stdout (@code{"-"}).  Default: @code{"pspp.list"}.
1556
1557 @item char-set=@var{char-set-type}
1558
1559 One of @samp{ascii} or @samp{latin1}.  This has no effect on output at
1560 the present time.  Default: @code{ascii}.
1561
1562 @item form-feed-string=@var{form-feed-value}
1563
1564 The string written to the output to cause a formfeed.  See also
1565 @code{paginate}, described below, for a related setting.  Default:
1566 @code{"\f"}.
1567
1568 @item newline-string=@var{newline-value}
1569
The string written to the output to end each line.  The default, which
can be specified explicitly with
1572 @code{newline-string=default}, is to use the system-dependent newline
1573 sequence by opening the output file in text mode.  This is usually the
1574 right choice.
1575
1576 However, @code{newline-string} can be set to any string.  When this is
1577 done, the output file is opened in binary mode.
1578
1579 @item paginate=@var{boolean}
1580
1581 If set, a formfeed (as set in @code{form-feed-string}, described above)
1582 will be written to the device after every page.  Default: @code{on}.
1583
1584 @item tab-width=@var{tab-width-value}
1585
1586 The distance between tab stops for this device.  If set to 0, tabs will
1587 not be used in the output.  Default: @code{8}.
1588
@item init=@var{initialization-string}
1590
1591 String written to the device before anything else, at the beginning of
1592 the output.  Default: @code{""} (the empty string).
1593
@item done=@var{finalization-string}
1595
1596 String written to the device after everything else, at the end of the
1597 output.  Default: @code{""} (the empty string).
1598 @end table
1599
1600 @node ASCII page options, ASCII font options, ASCII output options, ASCII driver class
1601 @subsection ASCII page options
1602
1603 These options affect page setup:
1604
1605 @table @code
1606 @item headers=@var{boolean}
1607
1608 If enabled, two lines of header information giving title and subtitle,
1609 page number, date and time, and PSPP version are printed at the top of
1610 every page.  These two lines are in addition to any top margin
1611 requested.  Default: @code{on}.
1612
1613 @item length=@var{line-count}
1614
1615 Physical length of a page, in lines.  Headers and margins are subtracted
1616 from this value.  Default: @code{66}.
1617
1618 @item width=@var{character-count}
1619
1620 Physical width of a page, in characters.  Margins are subtracted from
1621 this value.  Default: @code{130}.
1622
1623 @item lpi=@var{lines-per-inch}
1624
1625 Number of lines per vertical inch.  Not currently used.  Default: @code{6}.
1626
1627 @item cpi=@var{characters-per-inch}
1628
1629 Number of characters per horizontal inch.  Not currently used.  Default:
1630 @code{10}.
1631
1632 @item left-margin=@var{left-margin-width}
1633
1634 Width of the left margin, in characters.  PSPP subtracts this value
1635 from the page width.  Default: @code{0}.
1636
1637 @item right-margin=@var{right-margin-width}
1638
1639 Width of the right margin, in characters.  PSPP subtracts this value
1640 from the page width.  Default: @code{0}.
1641
1642 @item top-margin=@var{top-margin-lines}
1643
1644 Length of the top margin, in lines.  PSPP subtracts this value from
1645 the page length.  Default: @code{2}.
1646
1647 @item bottom-margin=@var{bottom-margin-lines}
1648
1649 Length of the bottom margin, in lines.  PSPP subtracts this value from
1650 the page length.  Default: @code{2}.
1651
1652 @end table
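
The interaction between these page options can be expressed as two small functions.  This is a sketch with hypothetical names; it assumes the two header lines are subtracted only when @code{headers} is enabled, as described above.

```c
/* ASCII page geometry, using the option names above.  Documented
   defaults: length=66, width=130, top and bottom margins of 2, left
   and right margins of 0, headers on. */
struct ascii_page
  {
    int length, width;
    int top_margin, bottom_margin, left_margin, right_margin;
    int headers;                /* Nonzero if headers are enabled. */
  };

/* Lines available for output on each page. */
int
ascii_body_lines (const struct ascii_page *p)
{
  return p->length - p->top_margin - p->bottom_margin
         - (p->headers ? 2 : 0);
}

/* Columns available for output on each line. */
int
ascii_body_columns (const struct ascii_page *p)
{
  return p->width - p->left_margin - p->right_margin;
}
```

With the defaults, 66 - 2 - 2 - 2 = 60 lines and all 130 columns remain for output on each page.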
1653
1654 @node ASCII font options,  , ASCII page options, ASCII driver class
1655 @subsection ASCII font options
1656
1657 These are the ASCII font options:
1658
1659 @table @code
1660 @item box[@var{line-type}]=@var{box-chars}
1661
1662 The characters used for lines in tables produced by the ASCII driver can
1663 be changed using this option.  @var{line-type} is used to indicate which
1664 type of line to change; @var{box-chars} is the character or string of
1665 characters to use for this type of line.
1666
1667 @var{line-type} must be a 4-digit number in base 4.  The digits are in
1668 the order `right', `bottom', `left', `top'.  The four possibilities for
1669 each digit are:
1670
1671 @table @asis
1672 @item 0
1673 No line.
1674
1675 @item 1
1676 Single line.
1677
1678 @item 2
1679 Double line.
1680
1681 @item 3
1682 Special device-defined line, if one is available; otherwise, a double
1683 line.
1684 @end table
1685
1686 Examples:
1687
1688 @table @code
1689 @item box[0101]="|"
1690
1691 Sets @samp{|} as the character to use for a single-width line with
1692 bottom and top components.
1693
1694 @item box[2222]="#"
1695
1696 Sets @samp{#} as the character to use for the intersection of four
1697 double-width lines, one each from the top, bottom, left and right.
1698
1699 @item box[1100]="\xda"
1700
1701 Sets @samp{"\xda"}, which under MS-DOS is a box character suitable for
1702 the top-left corner of a box, as the character for the intersection of
1703 two single-width lines, one each from the right and bottom.
1704
1705 @end table
1706
1707 Defaults:
1708
1709 @itemize @bullet
1710 @item
1711 @code{box[0000]=" "}
1712
1713 @item
1714 @code{box[1000]="-"}
1715 @*@code{box[0010]="-"}
1716 @*@code{box[1010]="-"}
1717
1718 @item
1719 @code{box[0100]="|"}
1720 @*@code{box[0001]="|"}
1721 @*@code{box[0101]="|"}
1722
1723 @item
1724 @code{box[2000]="="}
1725 @*@code{box[0020]="="}
1726 @*@code{box[2020]="="}
1727
1728 @item
1729 @code{box[0200]="#"}
1730 @*@code{box[0002]="#"}
1731 @*@code{box[0202]="#"}
1732
1733 @item
1734 @code{box[3000]="="}
1735 @*@code{box[0030]="="}
1736 @*@code{box[3030]="="}
1737
1738 @item
1739 @code{box[0300]="#"}
1740 @*@code{box[0003]="#"}
1741 @*@code{box[0303]="#"}
1742
1743 @item
1744 For all others, @samp{+} is used unless there are double lines or
1745 special lines, in which case @samp{#} is used.
1746 @end itemize
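The following Python sketch makes the digit encoding and the default table concrete.  It decodes a @var{line-type} code and looks up the default character listed above.  The names are hypothetical, and the fallback rule is taken directly from the last bullet.  The sketch models the documented defaults, not the driver's actual source.

@example
SIDES = ("right", "bottom", "left", "top")       # digit order in line-type
KINDS = ("none", "single", "double", "special")  # meaning of digits 0-3

def decode_line_type(code):
    """Map a 4-digit base-4 line-type code to the line kind on each side."""
    if len(code) != 4 or any(c not in "0123" for c in code):
        raise ValueError("line-type must be 4 base-4 digits: %r" % code)
    return dict(zip(SIDES, (KINDS[int(c)] for c in code)))

# Explicit defaults, copied from the list above.
DEFAULT_BOX = dict()
DEFAULT_BOX["0000"] = " "
for code in ("1000", "0010", "1010"):
    DEFAULT_BOX[code] = "-"
for code in ("0100", "0001", "0101"):
    DEFAULT_BOX[code] = "|"
for d in "23":  # double (2) and special (3) lines share the same defaults
    for code in (d + "000", "00" + d + "0", d + "0" + d + "0"):
        DEFAULT_BOX[code] = "="
    for code in ("0" + d + "00", "000" + d, "0" + d + "0" + d):
        DEFAULT_BOX[code] = "#"

def default_box_char(code):
    """Return the default box character for a line-type code."""
    decode_line_type(code)  # validate the code
    if code in DEFAULT_BOX:
        return DEFAULT_BOX[code]
    # All other combinations: "#" if any line is double or special, else "+".
    return "#" if ("2" in code or "3" in code) else "+"
@end example

For example, @code{decode_line_type("1100")} reports single lines on the right and bottom, which is the top-left corner from the examples above.  @code{default_box_char("1100")} is @samp{+}.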
1747
1748 @item italic-on=@var{italic-on-string}
1749
1750 Character sequence written to turn on italics or underline printing.  If
1751 this is set to @code{overstrike}, then the driver will simulate
1752 underlining by overstriking with underscore characters (@samp{_}) in the
1753 manner described by @code{overstrike-style} and
1754 @code{carriage-return-style}.  Default: @code{overstrike}.
1755
1756 @item italic-off=@var{italic-off-string}
1757
1758 Character sequence to turn off italics or underline printing.  Default:
1759 @code{""} (the empty string).
1760
1761 @item bold-on=@var{bold-on-string}
1762
1763 Character sequence written to turn on bold or emphasized printing.  If
1764 set to @code{overstrike}, then the driver will simulate bold printing
1765 by overstriking characters in the manner described by
1766 @code{overstrike-style} and @code{carriage-return-style}.  Default:
1767 @code{overstrike}.
1768
1769 @item bold-off=@var{bold-off-string}
1770
1771 Character sequence to turn off bold or emphasized printing.  Default:
1772 @code{""} (the empty string).
1773
1774 @item bold-italic-on=@var{bold-italic-on-string}
1775
1776 Character sequence written to turn on bold-italic printing.  If set to
1777 @code{overstrike}, then the driver will simulate bold-italics by
1778 overstriking twice, once with the character itself and once with an
1779 underscore (@samp{_}), in the manner described by
1780 @code{overstrike-style} and @code{carriage-return-style}.  Default:
1781 @code{overstrike}.
1782
1783 @item bold-italic-off=@var{bold-italic-off-string}
1784
1785 Character sequence to turn off bold-italic printing.  Default: @code{""}
1786 (the empty string).
1787
1788 @item overstrike-style=@var{overstrike-option}
1789
1790 Either @code{single} or @code{line}:
1791
1792 @itemize @bullet
1793 @item
1794 If @code{single} is selected, then the output driver overstrikes one
1795 character at a time.  It outputs a character, a backspace, and the
1796 overstriking character, then moves on to the next character in the line.
1797
1798 @item
1799 If @code{line} is selected then the output driver will output an entire
1800 line, then backspace or emit a carriage return (as indicated by
1801 @code{carriage-return-style}), then overstrike the entire line at once.
1802 @end itemize
1803
1804 @code{single} is recommended for use with ttys and programs that
1805 understand overstriking in text files, such as the pager @code{less}.
1806 @code{single} will also work with printer devices but results in rapid
1807 back-and-forth motions of the printhead that can cause the printer to
1808 physically overheat!
1809
1810 @code{line} is recommended for use with printer devices.  Most programs
1811 that understand overstriking in text files will not properly deal with
1812 @code{line} mode.
1813
1814 Default: @code{single}.
1815
1816 @item carriage-return-style=@var{carriage-return-type}
1817
1818 Either @code{bs} or @code{cr}.  This option applies only when one or
1819 more of the font commands is set to @code{overstrike} and, at the same
1820 time, @code{overstrike-style} is set to @code{line}.
1821
1822 @itemize @bullet
1823 @item
1824 If @code{bs} is selected then the driver will return to the beginning of
1825 a line by emitting a sequence of backspace characters (ASCII 8).
1826
1827 @item
1828 If @code{cr} is selected then the driver will return to the beginning of
1829 a line by emitting a single carriage-return character (ASCII 13).
1830 @end itemize
1831
1832 Although @code{cr} is preferred as being more compact, @code{bs} is more
1833 general since some devices do not interpret carriage returns in the
1834 desired manner.  Default: @code{bs}.
1835 @end table
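The difference between the two @code{overstrike-style} settings, and the role of @code{carriage-return-style}, can be shown by building the output for one line of text.  The following Python sketch is only an illustration.  The function name is hypothetical, and the sketch handles a single line.  It assumes that the overstriking character is @samp{_} for underlining, or the character itself for bold.

@example
def overstrike_line(text, over, style="single", cr_style="bs"):
    """Return the characters a driver would emit to overstrike TEXT.

    OVER holds the overstriking character for each position of TEXT:
    "_" * len(TEXT) to underline, or TEXT itself to embolden.
    """
    if len(over) != len(text):
        raise ValueError("OVER must be as long as TEXT")
    if style == "single":
        # character, backspace, overstrike; then on to the next character
        return "".join(c + "\b" + o for c, o in zip(text, over))
    if style == "line":
        # whole line, return to its start, then the whole overstrike line
        back = "\r" if cr_style == "cr" else "\b" * len(text)
        return text + back + over
    raise ValueError("unknown overstrike-style: " + style)
@end example

With @code{single}, underlining @samp{ab} yields a, backspace, underscore, b, backspace, underscore.  With @code{line} and @code{bs}, it yields @samp{ab}, two backspaces, and then two underscores.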
1836
1837 @node HTML driver class, Miscellaneous configuring, ASCII driver class, Configuration
1838 @section The HTML driver class
1839
1840 The @code{html} driver class is used to produce output for viewing in
1841 tables-capable web browsers such as Emacs' w3-mode.  Its configuration
1842 is very simple.  Currently, the output has a very plain format.  In the
1843 future, further work may be done on improving the output appearance.
1844
1845 There are few options for use with the @code{html} driver class:
1846
1847 @table @code
1848 @item output-file=@var{filename}
1849
1850 File to which output should be sent.  This can be an ordinary filename
1851 (e.g., @code{"output.html"}), a pipe filename (e.g., @code{"|lpr"}), or
1852 stdout (@code{"-"}).  Default: @code{"pspp.html"}.
1853
1854 @item prologue-file=@var{prologue-file-name}
1855
1856 Sets the name of the HTML prologue file.  You can write your own
1857 prologue if you want to customize colors or other settings: see
1858 @ref{HTML Prologue}.  Default: @code{html-prologue}.
1859 @end table
1860
1861 @menu
1862 * HTML Prologue::               Format of the HTML prologue file.
1863 @end menu
1864
1865 @node HTML Prologue,  , HTML driver class, HTML driver class
1866 @subsection The HTML prologue
1867
1868 HTML files that are generated by PSPP consist of two parts: a prologue
1869 and a body.  The prologue is a collection of boilerplate; only the body
1870 changes significantly from one output to the next.  You can tune the colors and other
1871 attributes of the output by editing the prologue.
1872
1873 The prologue is dumped into the output stream essentially unmodified.
1874 However, two actions are performed on its lines.  First, certain lines
1875 may be omitted as specified in the prologue file itself.  Second,
1876 variables are substituted.
1877
1878 The following lines are omitted:
1879
1880 @enumerate
1881 @item
1882 All lines that contain three bangs in a row (@code{!!!}).
1883
1884 @item
1885 Lines that contain @code{!title}, if no title is set for the output.  If
1886 a title is set, then the characters @code{!title} are removed before the
1887 line is output.
1888
1889 @item
1890 Lines that contain @code{!subtitle}, if no subtitle is set for the
1891 output.  If a subtitle is set, then the characters @code{!subtitle} are
1892 removed before the line is output.
1893 @end enumerate
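The following Python sketch implements the three omission rules.  The function name is hypothetical.  The sketch assumes that each line is processed independently and that a line containing both markers must satisfy both conditions.  Variable substitution is left out.

@example
def filter_prologue(lines, title=None, subtitle=None):
    """Apply the HTML prologue line-omission rules to LINES."""
    kept = []
    for line in lines:
        if "!!!" in line:
            continue  # rule 1: always omitted
        if "!subtitle" in line:
            if not subtitle:
                continue  # rule 3: no subtitle set
            line = line.replace("!subtitle", "")
        if "!title" in line:
            if not title:
                continue  # rule 2: no title set
            line = line.replace("!title", "")
        kept.append(line)
    return kept
@end example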
1894
1895 The following are the variables that are substituted.  Only the
1896 variables listed are substituted; environment variables are not.
1897 @xref{Environment substitutions}.
1898
1899 @table @code
1900 @item generator
1901
1902 PSPP version as a string: @samp{GNU PSPP 0.1b}, for example.
1903
1904 @item date
1905
1906 Date the file was created.  Example: @samp{Tue May 21 13:46:22 1991}.
1907
1908 @item user
1909
1910 Under multiuser OSes, the user's login name, taken either from the
1911 environment variable @code{LOGNAME} or, if that fails, the result of the
1912 C library function @code{getlogin()}.  Defaults to @samp{nobody}.
1913
1914 @item host
1915
1916 System hostname as reported by @code{gethostname()}.  Defaults to
1917 @samp{nowhere}.
1918
1919 @item title
1920
1921 Document title as a string.  This is the title specified in the PSPP
1922 syntax file.
1923
1924 @item subtitle
1925
1926 Document subtitle as a string.
1927
1928 @item source-file
1929
1930 PSPP syntax file name.  Example: @samp{mary96/first.stat}.
1931 @end table
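The fallbacks for @code{user} and @code{host} can be sketched in Python as shown below.  Here @code{os.getlogin} and @code{socket.gethostname} stand in for the C library calls named above.  The sketch makes two assumptions: an empty @code{LOGNAME} counts as unset, and the defaults apply whenever every source fails.

@example
import os
import socket

def prologue_user():
    """Login name for the user variable, with the documented fallbacks."""
    name = os.environ.get("LOGNAME")
    if not name:
        try:
            name = os.getlogin()  # may fail, e.g. without a controlling tty
        except OSError:
            name = None
    return name or "nobody"

def prologue_host():
    """Host name for the host variable."""
    try:
        return socket.gethostname() or "nowhere"
    except OSError:
        return "nowhere"
@end example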
1932
1933 @node Miscellaneous configuring, Improving output quality, HTML driver class, Configuration
1934 @section Miscellaneous configuration
1935
1936 The following environment variables can be used to further configure
1937 PSPP:
1938
1939 @table @code
1940 @item HOME
1941
1942 Used to determine the user's home directory.  No default value.
1943
1944 @item STAT_INCLUDE_PATH
1945
1946 Path used to find include files in PSPP syntax files.  Defaults vary
1947 across operating systems:
1948
1949 @table @asis
1950 @item UNIX
1951
1952 @itemize @bullet
1953 @item
1954 @file{.}
1955
1956 @item
1957 @file{~/.pspp/include}
1958
1959 @item
1960 @file{/usr/local/lib/pspp/include}
1961
1962 @item
1963 @file{/usr/lib/pspp/include}
1964
1965 @item
1966 @file{/usr/local/share/pspp/include}
1967
1968 @item
1969 @file{/usr/share/pspp/include}
1970 @end itemize
1971
1972 @item MS-DOS
1973
1974 @itemize @bullet
1975 @item
1976 @file{.}
1977
1978 @item
1979 @file{C:\PSPP\INCLUDE}
1980
1981 @item
1982 @file{$PATH}
1983 @end itemize
1984
1985 @item Other OSes
1986 No default path.
1987 @end table
1988
1989 @item STAT_PAGER
1990 @itemx PAGER
1991
1992 When PSPP invokes an external pager, it uses the first of these that
1993 is defined.  There is a default pager only if the person who compiled
1994 PSPP defined one.
1995
1996 @item TERM
1997
1998 The terminal type that @code{termcap} or @code{ncurses} uses, if support
1999 for either was compiled into PSPP.
2000
2001 @item STAT_OUTPUT_INIT_FILE
2002
2003 The basename used to search for the driver definition file.
2004 @xref{Output devices}.  @xref{File locations}.  Default: @code{devices}.
2005
2006 @item STAT_OUTPUT_PAPERSIZE_FILE
2007
2008 The basename used to search for the papersize file.  @xref{papersize}.
2009 @xref{File locations}.  Default: @code{papersize}.
2010
2011 @item STAT_OUTPUT_INIT_PATH
2012
2013 The path used to search for the driver definition file and the papersize
2014 file.  @xref{File locations}.  Default: the standard configuration path.
2015
2016 @item TMPDIR
2017
2018 The @code{sort} procedure stores its temporary files in this directory.
2019 Default: (UNIX) @file{/tmp}, (MS-DOS) @file{\}, (other OSes) empty string.
2020
2021 @item TEMP
2022 @itemx TMP
2023
2024 Under MS-DOS only, these variables are consulted after TMPDIR, in this
2025 order.
2026 @end table
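Several of these variables follow the same rule: use the first one that is defined, or else fall back to a default.  The following Python sketch shows this for the pager and for the @code{sort} temporary directory.  It assumes that ``defined'' means set to a non-empty value.  The function names and the @code{system} parameter are hypothetical.

@example
import os

def first_defined(names, default):
    """Return the value of the first set, non-empty variable in NAMES."""
    for name in names:
        value = os.environ.get(name)
        if value:
            return value
    return default

def pager(compiled_default=None):
    # STAT_PAGER wins over PAGER; otherwise use any compile-time default.
    return first_defined(["STAT_PAGER", "PAGER"], compiled_default)

def sort_temp_dir(system="unix"):
    # TEMP and TMP are consulted only under MS-DOS, after TMPDIR.
    names = ["TMPDIR", "TEMP", "TMP"] if system == "msdos" else ["TMPDIR"]
    defaults = dict(unix="/tmp", msdos="\\")
    return first_defined(names, defaults.get(system, ""))
@end example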
2027
2028 @node Improving output quality,  , Miscellaneous configuring, Configuration
2029 @section Improving output quality
2030
2031 When its drivers are set up properly, PSPP can produce output that
2032 looks very good indeed.  The PostScript driver, suitably configured, can
2033 produce presentation-quality output.  Here are a few guidelines for
2034 producing better-looking output, regardless of output driver.  Your
2035 mileage may vary, of course, and everyone has different esthetic
2036 preferences.
2037
2038 @itemize @bullet
2039 @item
2040 Width is important in PSPP output.  Greater output width leads to more
2041 readable output, to a point.  Try the following to increase the output
2042 width:
2043
2044 @itemize @minus
2045 @item
2046 If you're using the ASCII driver with a dot-matrix printer, find the
2047 control string that puts the printer into compressed mode and put it
2048 into the @code{init-string} setting.  Try to get 132 columns; 160
2049 might be better, but print that small can be difficult to
2050 read.
2051
2052 @item
2053 With the PostScript driver, try these ideas:
2054
2055 @itemize +
2056 @item
2057 Landscape mode.
2058
2059 @item
2060 Legal-size (8.5" x 14") paper in landscape mode.
2061
2062 @item
2063 Reducing font sizes.  If you're using 12-point fonts, try 10 point; if
2064 you're using 10-point fonts, try 8 point.  Some fonts are more readable
2065 than others at small sizes.
2066 @end itemize
2067 @end itemize
2068
2069 Try to strike a balance between character size and page width.
2070
2071 @item
2072 Use high-quality fonts.  Many public domain fonts are poor in quality.
2073 Recently, URW made some high-quality fonts available under the GPL.
2074 These are probably suitable.
2075
2076 @item
2077 Be sure you're using the proper font metrics.  The font metrics provided
2078 with PSPP may not correspond to the fonts actually being printed.
2079 This can cause bizarre-looking output.
2080
2081 @item
2082 Make sure that you're using good ink/ribbon/toner.  Darker print is
2083 easier to read.
2084
2085 @item
2086 Use plain fonts with serifs, such as Times-Roman or Palatino.  Avoid
2087 choosing italic or bold fonts as document base fonts.
2088 @end itemize
2089
2090 @node Invocation, Language, Configuration, Top
2091 @chapter Invoking PSPP
2092 @cindex invocation
2093 @cindex PSPP, invoking
2094
2095 @cindex command line, options
2096 @cindex options, command-line
2097 @example
2098 pspp [ -B @var{dir} | --config-dir=@var{dir} ] [ -o @var{device} | --device=@var{device} ]
2099        [ -d @var{var}[=@var{value}] | --define=@var{var}[=@var{value}] ] [-u @var{var} | --undef=@var{var} ]
2100        [ -f @var{file} | --out-file=@var{file} ] [ -p | --pipe ] [ -I- | --no-include ]
2101        [ -I @var{dir} | --include=@var{dir} ] [ -i | --interactive ]
2102        [ -n | --edit | --dry-run | --just-print | --recon ]
2103        [ -r | --no-statrc ] [ -h | --help ] [ -l | --list ]
2104        [ -c @var{command} | --command=@var{command} ] [ -s | --safer ]
2105        [ --testing-mode ] [ -V | --version ] [ -v | --verbose ]
2106        [ @var{key}=@var{value} ] @var{file}@enddots{}
2107 @end example
2108
2109 @menu
2110 * Non-option Arguments::        Specifying syntax files and output devices.
2111 * Configuration Options::       Change the configuration for the current run.
2112 * Input and output options::    Controlling input and output files.
2113 * Language control options::    Language variants.
2114 * Informational options::       Helpful information about PSPP.
2115 @end menu
2116
2117 @node Non-option Arguments, Configuration Options, Invocation, Invocation
2118 @section Non-option Arguments
2119
2120 Syntax files and output device substitutions can be specified on
2121 PSPP's command line:
2122
2123 @table @code
2124 @item @var{file}
2125
2126 A file by itself on the command line will be executed as a syntax file.
2127 PSPP terminates after the syntax file runs, unless the @code{-i} or
2128 @code{--interactive} option is given (@pxref{Language control options}).
2129
2130 @item @var{file1} @var{file2}
2131
2132 When two or more filenames are given on the command line, PSPP
2133 executes the syntax files in order and clears its dictionary before
2134 each file after the first.
2135
2136 @item @var{file1} + @var{file2}
2137
2138 If syntax files' names are delimited by a plus sign (@samp{+}), then the
2139 dictionary is not cleared between their executions, as if they were
2140 concatenated together into a single file.
2141
2142 @item @var{key}=@var{value}
2143
2144 Defines an output device macro @var{key} to expand to @var{value},
2145 overriding any macro having the same @var{key} defined in the device
2146 configuration file.  @xref{Macro definitions}.
2147
2148 @end table
2149
2150 There is one other way to specify a syntax file, if your operating
2151 system supports it.  If you have a syntax file @file{foobar.stat}, put
2152 the notation
2153
2154 @example
2155 #! /usr/local/bin/pspp
2156 @end example
2157
2158 at the top, and mark the file as executable with @code{chmod +x
2159 foobar.stat}.  (If PSPP is not installed in @file{/usr/local/bin},
2160 then insert its actual installation directory into the syntax file
2161 instead.)  Now you should be able to invoke the syntax file just by
2162 typing its name.  You can include any options on the command line as
2163 usual.  PSPP entirely ignores any lines beginning with @samp{#!}.
2164
2165 @node Configuration Options, Input and output options, Non-option Arguments, Invocation
2166 @section Configuration Options
2167
2168 Configuration options are used to change PSPP's configuration for the
2169 current run.  The configuration options are:
2170
2171 @table @code
2172 @item -B @var{dir}
2173 @itemx --config-dir=@var{dir}
2174
2175 Sets the configuration directory to @var{dir}.  @xref{File locations}.
2176
2177 @item -o @var{device}
2178 @itemx --device=@var{device}
2179
2180 Selects the output device with name @var{device}.  If this option is
2181 given more than once, then all devices mentioned are selected.  This
2182 option disables all devices besides those mentioned on the command line.
2183
2184 @item -d @var{var}[=@var{value}]
2185 @itemx --define=@var{var}[=@var{value}]
2186
2187 Defines an `environment variable' named @var{var}, optionally giving it
2188 the value @var{value}.  @xref{Variable values}.
2189
2190 @item -u @var{var}
2191 @itemx --undef=@var{var}
2192
2193 Undefines the `environment variable' named @var{var}.  @xref{Variable
2194 values}.
2195 @end table
2196
2197 @node Input and output options, Language control options, Configuration Options, Invocation
2198 @section Input and output options
2199
2200 Input and output options affect how PSPP reads input and writes
2201 output.  These are the input and output options:
2202
2203 @table @code
2204 @item -f @var{file}
2205 @itemx --out-file=@var{file}
2206
2207 This overrides the output file name for devices designated as listing
2208 devices.  If a file named @var{file} already exists, it is overwritten.
2209
2210 @item -p
2211 @itemx --pipe
2212
2213 Allows PSPP to be used as a filter by causing the syntax file to be
2214 read from stdin and output to be written to stdout.  Conflicts with the
2215 @code{-f @var{file}} and @code{--out-file=@var{file}} options.
2216
2217 @item -I-
2218 @itemx --no-include
2219
2220 Clears all directories from the include path.  This includes all
2221 directories put in the include path by default.  @xref{Miscellaneous
2222 configuring}.
2223
2224 @item -I @var{dir}
2225 @itemx --include=@var{dir}
2226
2227 Appends directory @var{dir} to the path that is searched for include
2228 files in PSPP syntax files.
2229
2230 @item -c @var{command}
2231 @itemx --command=@var{command}
2232
2233 Executes the literal command @var{command}.  The command is executed before
2234 startup syntax files, if any.
2235
2236 @item --testing-mode
2237
2238 Invokes heuristics to assist with testing PSPP.  For use by @code{make
2239 check} and similar scripts.
2240 @end table
2241
2242 @node Language control options, Informational options, Input and output options, Invocation
2243 @section Language control options
2244
2245 Language control options control how PSPP syntax files are parsed and
2246 interpreted.  The available language control options are:
2247
2248 @table @code
2249 @item -i
2250 @itemx --interactive
2251
2252 When a syntax file is specified on the command line, PSPP normally
2253 terminates after processing it.  Giving this option will cause PSPP to
2254 bring up a command prompt after processing the syntax file.
2255
2256 In addition, this forces syntax files to be interpreted in interactive
2257 mode, rather than the default batch mode.  @xref{Tokenizing lines}, for
2258 information on the differences between batch mode and interactive mode
2259 command interpretation.
2260
2261 @item -n
2262 @itemx --edit
2263 @itemx --dry-run
2264 @itemx --just-print
2265 @itemx --recon
2266
2267 Only the syntax of any syntax file specified or of commands entered at
2268 the command line is checked.  Transformations are not performed and
2269 procedures are not executed.  Not yet implemented.
2270
2271 @item -r
2272 @itemx --no-statrc
2273
2274 Prevents the execution of the PSPP startup syntax file.  Not yet
2275 implemented, as startup syntax files aren't, either.
2276
2277 @item -s
2278 @itemx --safer
2279
2280 Disables certain unsafe operations.  This includes the @code{ERASE} and
2281 @code{HOST} commands, as well as use of pipes as input and output files.
2282 @end table
2283
2284 @node Informational options,  , Language control options, Invocation
2285 @section Informational options
2286
2287 Informational options cause information about PSPP to be written to
2288 the terminal.  Here are the available options:
2289
2290 @table @code
2291 @item -h
2292 @itemx --help
2293
2294 Prints a message describing PSPP command-line syntax and the available
2295 device driver classes, then terminates.
2296
2297 @item -l
2298 @itemx --list
2299
2300 Lists the available device driver classes, then terminates.
2301
2302 @item -V
2303 @item --version
2304
2305 Prints a brief message listing PSPP's version, warranties you don't
2306 have, copying conditions and copyright, and e-mail address for bug
2307 reports, then terminates.
2308
2309 @item -v
2310 @itemx --verbose
2311
2312 Increments PSPP's verbosity level.  Higher verbosity levels cause
2313 PSPP to display greater amounts of information about what it is
2314 doing.  Often useful for debugging PSPP's configuration.
2315
2316 This option may be given multiple times; giving it @var{n} times sets
2317 the verbosity level to @var{n}.  The default verbosity level is 0, at which no
2318 informational messages are displayed.
2319
2320 Higher verbosity levels cause messages to be displayed when the
2321 corresponding events take place.
2322
2323 @table @asis
2324 @item 1
2325
2326 Driver and subsystem initializations.
2327
2328 @item 2
2329
2330 Completion of driver initializations.  Beginning of driver closings.
2331
2332 @item 3
2333
2334 Completion of driver closings.
2335
2336 @item 4
2337
2338 Files searched for; success of searches.
2339
2340 @item 5
2341
2342 Individual directories included in file searches.
2343 @end table
2344
2345 Each verbosity level also includes messages from lower verbosity levels.
2346
2347 @end table
2348
2349 @node Language, Expressions, Invocation, Top
2350 @chapter The PSPP language
2351 @cindex language, PSPP
2352 @cindex PSPP, language
2353
2354 @quotation
2355 @strong{Please note:} PSPP is not even close to completion.
2356 Only a few actual statistical procedures are implemented.  PSPP
2357 is a work in progress.
2358 @end quotation
2359
2360 This chapter discusses elements common to many PSPP commands.
2361 Later chapters will describe individual commands in detail.
2362
2363 @menu
2364 * Tokens::                      Characters combine to form tokens.
2365 * Commands::                    Tokens combine to form commands.
2366 * Types of Commands::           Commands come in several flavors.
2367 * Order of Commands::           Commands combine to form syntax files.
2368 * Missing Observations::        Handling missing observations.
2369 * Variables::                   The unit of data storage.
2370 * Files::                       Files used by PSPP.
2371 * BNF::                         How command syntax is described.
2372 @end menu
2373
2374 @node Tokens, Commands, Language, Language
2375 @section Tokens
2376 @cindex language, lexical analysis
2377 @cindex language, tokens
2378 @cindex tokens
2379 @cindex lexical analysis
2380 @cindex lexemes
2381
2382 PSPP divides most syntax file lines into series of short chunks
2383 called @dfn{tokens}, @dfn{lexical elements}, or @dfn{lexemes}.  These
2384 tokens are then grouped to form commands, each of which tells
2385 PSPP to take some action---read in data, write out data, perform
2386 a statistical procedure, etc.  The process of dividing input into tokens
2387 is @dfn{tokenization}, or @dfn{lexical analysis}.  Each type of token is
2388 described below.
2389
2390 @cindex delimiters
2391 @cindex whitespace
2392 Tokens must be separated from each other by @dfn{delimiters}.
2393 Delimiters include whitespace (spaces, tabs, carriage returns, line
2394 feeds, vertical tabs), punctuation (commas, forward slashes, etc.), and
2395 operators (plus, minus, times, divide, etc.).  Note that while whitespace
2396 only separates tokens, other delimiters are tokens in themselves.
2397
2398 @table @strong
2399 @cindex identifiers
2400 @item Identifiers
2401 Identifiers are names that specify variable names, commands, or command
2402 details.
2403
2404 @itemize @bullet
2405 @item
2406 The first character in an identifier must be a letter, @samp{#}, or
2407 @samp{@@}.  Some system identifiers begin with @samp{$}, but
2408 user-defined variables' names may not begin with @samp{$}.
2409
2410 @item
2411 The remaining characters in the identifier must be letters, digits, or
2412 one of the following special characters:
2413
2414 @example
2415 .  _  $  #  @@
2416 @end example
2417
2418 @item
2419 @cindex variable names
2420 @cindex names, variable
2421 Variable names may be any length, but only the first 8 characters are
2422 significant.
2423
2424 @item
2425 @cindex case-sensitivity
2426 Identifiers are not case-sensitive: @code{foobar}, @code{Foobar},
2427 @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are different
2428 representations of the same identifier.
2429
2430 @item
2431 @cindex keywords
2432 Identifiers other than variable names may be abbreviated to their first
2433 3 characters if this abbreviation is unambiguous.  These identifiers are
2434 often called @dfn{keywords}.  (Unique abbreviations of 3 or more
2435 characters are also accepted: @samp{FRE}, @samp{FREQ}, and
2436 @samp{FREQUENCIES} are equivalent when the last is a keyword.)
2437
2438 @item
2439 Whether an identifier is a keyword depends on the context.
2440
2441 @item
2442 @cindex keywords, reserved
2443 @cindex reserved keywords
2444 Some keywords are reserved.  These keywords may not be used in any
2445 context besides those explicitly described in this manual.  The reserved
2446 keywords are:
2447
2448 @example
2449 ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
2450 @end example
2451
2452 @item
2453 Since keywords are identifiers, all the rules for identifiers apply.
2454 Specifically, they must be delimited as are other identifiers:
2455 @code{WITH} is a reserved keyword, but @code{WITHOUT} is a valid
2456 variable name.
2457 @end itemize
2458
2459 @cindex @samp{.}
2460 @cindex period
2461 @cindex variable names, ending with period
2462 @strong{Caution:} It is legal to end a variable name with a period, but
2463 @emph{don't do it!}  The variable name will be misinterpreted when it is
2464 the final token on a line: @code{FOO.} will be divided into two separate
2465 tokens, @samp{FOO} and @samp{.}, the @dfn{terminal dot}.
2466 @xref{Commands, , Forming commands of tokens}.
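The following Python sketch shows the keyword abbreviation rule and the 8-character significance of variable names.  The keyword lists are supplied by the caller because, as noted above, which identifiers are keywords depends on context.  Preferring an exact match over a longer keyword is an assumption, and all names are hypothetical.

@example
def keyword_matches(token, keyword):
    """True if TOKEN is KEYWORD or a prefix of it at least 3 long."""
    token, keyword = token.upper(), keyword.upper()  # not case-sensitive
    return token == keyword or (len(token) >= 3 and keyword.startswith(token))

def resolve_keyword(token, keywords):
    """Return the keyword TOKEN names, or None if unknown or ambiguous."""
    matches = [k for k in keywords if keyword_matches(token, k)]
    exact = [k for k in matches if k.upper() == token.upper()]
    if exact:
        return exact[0]
    return matches[0] if len(matches) == 1 else None

def same_variable(a, b):
    """Variable names compare on their first 8 characters, ignoring case."""
    return a.upper()[:8] == b.upper()[:8]
@end example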
2467
2468 @item Numbers
2469 @cindex numbers
2470 @cindex integers
2471 @cindex reals
2472 Numbers may be specified as integers or reals.  Integers are internally
2473 converted into reals.  Scientific notation is not supported.  Here are
2474 some examples of valid numbers:
2475
2476 @example
2477 1234  3.14159265359  .707106781185  8945.
2478 @end example
2479
2480 @strong{Caution:} The last example will be interpreted as two tokens,
2481 @samp{8945} and @samp{.}, if it is the last token on a line.
2482
2483 @item Strings
2484 @cindex strings
2485 @cindex @samp{'}
2486 @cindex @samp{"}
2487 @cindex case-sensitivity
2488 Strings are literal sequences of characters enclosed in pairs of single
2489 quotes (@samp{'}) or double quotes (@samp{"}).
2490
2491 @itemize @bullet
2492 @item
2493 Whitespace and case of letters @emph{are} significant inside strings.
2494 @item
2495 Whitespace characters inside a string are not delimiters.
2496 @item
2497 To include single-quote characters in a string, enclose the string in
2498 double quotes.
2499 @item
2500 To include double-quote characters in a string, enclose the string in
2501 single quotes.
2502 @item
2503 It is not possible to put both single- and double-quote characters
2504 inside one string.
2505 @end itemize
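The quoting rules above can be applied mechanically when generating syntax.  The following Python sketch does this.  It uses single quotes whenever either kind would do, which is an arbitrary choice, and the function name is hypothetical.

@example
def quote_string(text):
    """Quote TEXT as a PSPP string literal, following the rules above."""
    if "'" in text and '"' in text:
        raise ValueError("a string cannot contain both quote characters")
    quote = '"' if "'" in text else "'"
    return quote + text + quote
@end example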
2506
2507 @item Hexstrings
2508 @cindex hexstrings
2509 Hexstrings are string variants that use hex digits to specify
2510 characters.
2511
2512 @itemize @bullet
2513 @item
2514 A hexstring may be used anywhere that an ordinary string is allowed.
2515
2516 @item
2517 @cindex @samp{X'}
2518 @cindex @samp{'}
2519 A hexstring begins with @samp{X'} or @samp{x'}, and ends with @samp{'}.
2520
2521 @cindex whitespace
2522 @item
2523 No whitespace is allowed between the initial @samp{X} and @samp{'}.
2524
2525 @item
2526 Double quotes @samp{"} may be used in place of single quotes @samp{'} if
2527 done in both places.
2528
2529 @item
2530 Each pair of hex digits is internally changed into a single character
2531 with the given value.
2532
2533 @item
2534 If there is an odd number of hex digits, the missing last digit is
2535 assumed to be @samp{0}.
2536
2537 @item
2538 @cindex portability
2539 @strong{Please note:} Use of hexstrings is nonportable because the same
2540 numeric values are associated with different glyphs by different
2541 operating systems.  Therefore, their use should be confined to syntax
2542 files that will not be widely distributed.
2543
2544 @item
2545 @cindex characters, reserved
2546 @cindex 0
2547 @cindex whitespace
2548 @strong{Please note also:} The character with value 00 is reserved for
2549 internal use by PSPP.  Using it in a string causes an error, and the
2550 character is replaced by a blank space (in ASCII, hex 20, decimal 32).
2551 @end itemize
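The following Python sketch applies the hexstring rules to a single token.  Python's @code{chr} maps byte values to Unicode code points, so values above 7F only illustrate what the text calls encoding-dependent behavior.  Reporting the 00 error as a Python warning is a stand-in for PSPP's own error message.  The function name is hypothetical.

@example
import string
import warnings

def decode_hexstring(token):
    """Decode a hexstring token such as X'4142' into its characters."""
    if (len(token) < 3 or token[0] not in "xX" or token[1] not in "'\""
            or token[-1] != token[1]):
        raise ValueError("not a hexstring: %r" % token)
    digits = token[2:-1]
    if any(c not in string.hexdigits for c in digits):
        raise ValueError("invalid hex digit in %r" % token)
    if len(digits) % 2:
        digits += "0"  # odd count: the missing last digit is taken as 0
    chars = []
    for i in range(0, len(digits), 2):
        value = int(digits[i:i + 2], 16)
        if value == 0:
            warnings.warn("character 00 is reserved; using a blank")
            value = 0x20
        chars.append(chr(value))
    return "".join(chars)
@end example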
2552
2553 @item Punctuation
2554 @cindex punctuation
2555 Punctuation separates tokens; punctuators are delimiters.  These are the
2556 punctuation characters:
2557
2558 @example
2559 ,  /  =  (  )
2560 @end example
2561
2562 @item Operators
2563 @cindex operators
2564 Operators describe mathematical operations.  Some operators are delimiters:
2565
2566 @example
2567 (  )  +  -  *  /  **
2568 @end example
2569
2570 Many of the above operators are also punctuators.  Punctuators are
2571 distinguished from operators by context.
2572
2573 The other operators are all reserved keywords.  None of these are
2574 delimiters:
2575
2576 @example
2577 AND  EQ  GE  GT  LE  LT  NE  NOT  OR
2578 @end example
2579
2580 @item Terminal Dot
2581 @cindex terminal dot
2582 @cindex dot, terminal
2583 @cindex period
2584 @cindex @samp{.}
2585 A period (@samp{.}) at the end of a line (ignoring trailing whitespace)
2586 is one kind of @dfn{terminal dot}, although not every terminal dot is a
2587 period at the end of a line.  @xref{Commands, , Forming commands of
2588 tokens}.  A period is a terminal dot @emph{only}
2589 when it is at the end of a line; otherwise it is part of a
2590 floating-point number.  (A period outside a number in the middle of a
2591 line is an error.)
2592
2593 @quotation
2594 @cindex terminal dot, changing
2595 @cindex dot, terminal, changing
2596 @strong{Please note:} The character used for the @dfn{terminal dot} can
2597 be changed with the SET command.  This is strongly discouraged, and
2598 throughout the remainder of this manual it is assumed that the
2599 default setting is in effect.
2600 @end quotation
2601
2602 @end table
2603
2604 @node Commands, Types of Commands, Tokens, Language
2605 @section Forming commands of tokens
2606
2607 @cindex PSPP, command structure
2608 @cindex language, command structure
2609 @cindex commands, structure
2610
2611 Most PSPP commands share a common structure, diagrammed below:
2612
2613 @example
2614 @var{cmd}@dots{} [@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]] [[/@var{sbc}[=][@var{spec} [[,]@var{spec}]@dots{}]]@dots{}].
2615 @end example
2616
2617 @cindex @samp{[  ]}
2618 In the above, rather daunting, expression, pairs of square brackets
2619 (@samp{[ ]}) indicate optional elements, and names such as @var{cmd}
2620 indicate parts of the syntax that vary from command to command.
2621 Ellipses (@samp{...}) indicate that the preceding part may be repeated
2622 an arbitrary number of times.  Let's pick apart what it says above:
2623
2624 @itemize @bullet
2625 @cindex commands, names
2626 @item
2627 A command begins with a command name of one or more keywords, such as
2628 @code{FREQUENCIES}, @code{DATA LIST}, or @code{N OF CASES}.  @var{cmd}
2629 may be abbreviated to its first word if that is unambiguous; each word
2630 in @var{cmd} may be abbreviated to a unique prefix of three or more
2631 characters as described above.
2632
2633 @cindex subcommands
2634 @item
2635 The command name may be followed by one or more @dfn{subcommands}:
2636
2637 @itemize @minus
2638 @item
2639 Each subcommand begins with a unique keyword, indicated by @var{sbc}
2640 above.  This is analogous to the command name.
2641
2642 @item
2643 The subcommand name is optionally followed by an equals sign (@samp{=}).
2644
2645 @item
2646 Some subcommands accept a series of one or more specifications
2647 (@var{spec}), optionally separated by commas.
2648
2649 @item
2650 Each subcommand must be separated from the next (if any) by a forward
2651 slash (@samp{/}).
2652 @end itemize
2653
2654 @cindex dot, terminal
2655 @cindex terminal dot
2656 @item
2657 Each command must be terminated with a @dfn{terminal dot}.
2658 The terminal dot may be given in one of three ways:
2659
2660 @itemize @minus
2661 @item
2662 (most commonly) A period character at the very end of a line, as
2663 described above.
2664
2665 @item
2666 (only if NULLINE is on; @pxref{SET, , Setting user preferences})
2667 A completely blank line.
2668
2669 @item
2670 (in batch mode only) Any line that is not indented from the left side of
2671 the page causes a terminal dot to be inserted before that line.
2672 Therefore, each command begins with a line that is flush left, followed
2673 by zero or more lines that are indented one or more characters from the
2674 left margin.
2675
2676 In batch mode, PSPP will ignore a plus sign, minus sign, or period
2677 (@samp{+}, @samp{@minus{}}, or @samp{.}) as the first character in a
2678 line.  Any of these characters as the first character on a line will
2679 begin a new command.  This allows for visual indentation of a command
2680 without that command being considered part of the previous command.
2681
2682 PSPP is in batch mode when it is reading input from a file, rather
2683 than from an interactive user.  Note that the other forms of the
2684 terminal dot may also be used in batch mode.
2685
2686 Sometimes, one encounters syntax files that are intended to be
2687 interpreted in interactive mode rather than batch mode (for instance,
2688 this can happen if a session log file is used directly as a syntax
2689 file).  When this occurs, use the @samp{-i} command line option to force
2690 interpretation in interactive mode (@pxref{Language control options}).
2691 @end itemize
2692 @end itemize
2693
2694 PSPP ignores empty commands when they are generated by the above
2695 rules.  Note that, as a consequence of these rules, each command must
2696 begin on a new line.
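
In batch mode, the rules above decide where a command starts by examining
only the first character of each line.  A minimal C sketch of that decision
follows.  The function @code{batch_line_starts_command} is hypothetical.
The sketch assumes that an empty line never starts a command, and it does
not model NULLINE handling or interactive mode.

@verbatim
#include <stdbool.h>

/* Hypothetical helper modeling the batch-mode rule: a line that is not
   indented starts a new command, and so does a line whose first
   character is '+', '-', or '.'.  On return, *BODY points past any such
   leading character, which PSPP ignores. */
bool
batch_line_starts_command (const char *line, const char **body)
{
  *body = line;

  /* Assumption: an empty line never starts a command here. */
  if (line[0] == '\0' || line[0] == '\n')
    return false;

  if (line[0] == '+' || line[0] == '-' || line[0] == '.')
    {
      *body = line + 1;         /* The leading character is ignored. */
      return true;
    }

  /* Flush-left text starts a command; indented text continues one. */
  return line[0] != ' ' && line[0] != '\t';
}
@end verbatim

For a line such as @samp{+  FREQUENCIES X.}, the function reports a new
command.  The remaining text keeps its visual indentation.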
2697
2698 @node Types of Commands, Order of Commands, Commands, Language
2699 @section Types of Commands
2700
2701 Commands in PSPP are divided roughly into six categories:
2702
2703 @table @strong
2704 @item Utility commands
2705 @cindex utility commands
2706 Set or display various global options that affect PSPP operations.
2707 May appear anywhere in a syntax file.  @xref{Utilities, , Utility
2708 commands}.
2709
2710 @item File definition commands
2711 @cindex file definition commands
2712 Give instructions for reading data from text files or from special
2713 binary ``system files''.  Most of these commands discard any previous
2714 data or variables in order to replace them with the new data and
2715 variables.  At least one must appear before the first command in any of
2716 the categories below.  @xref{Data Input and Output}.
2717
2718 @item Input program commands
2719 @cindex input program commands
2720 Though rarely used, these provide powerful tools for reading data files
2721 in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
2722
2723 @item Transformations
2724 @cindex transformations
2725 Perform operations on data and write data to output files.  Transformations
2726 are not carried out until a procedure is executed.
2727
2728 @item Restricted transformations
2729 @cindex restricted transformations
2730 Same as transformations for most purposes.  @xref{Order of Commands}, for a
2731 detailed description of the differences.
2732
2733 @item Procedures
2734 @cindex procedures
2735 Analyze data, writing results of analyses to the listing file.  Cause
2736 transformations specified earlier in the file to be performed.  In a
2737 more general sense, a @dfn{procedure} is any command that causes the
2738 active file (the data) to be read.
2739 @end table
2740
2741 @node Order of Commands, Missing Observations, Types of Commands, Language
2742 @section Order of Commands
2743 @cindex commands, ordering
2744 @cindex order of commands
2745
2746 PSPP does not place many restrictions on the ordering of commands.
2747 The main restriction is that variables must be defined with one of the
2748 file-definition commands before they are otherwise referred to.
2749
2750 Of course, there are specific rules, for those who are interested.
2751 PSPP possesses five internal states, called initial, INPUT
2752 PROGRAM, FILE TYPE, transformation, and procedure states.  (Please note
2753 the distinction between the INPUT PROGRAM and FILE TYPE @emph{commands}
2754 and the INPUT PROGRAM and FILE TYPE @emph{states}.)
2755
2756 PSPP starts up in the initial state.  Each successful completion
2757 of a command may cause a state transition.  Each type of command has its
2758 own rules for state transitions:
2759
2760 @table @strong
2761 @item Utility commands
2762 @itemize @bullet
2763 @item
2764 Legal in all states.
2765 @item
2766 Do not cause state transitions.  Exception: when the N OF CASES command
2767 is executed in the procedure state, it causes a transition to the
2768 transformation state.
2769 @end itemize
2770
2771 @item DATA LIST
2772 @itemize @bullet
2773 @item
2774 Legal in all states.
2775 @item
2776 When executed in the initial or procedure state, causes a transition to
2777 the transformation state.
2778 @item
2779 Clears the active file if executed in the procedure or transformation
2780 state.
2781 @end itemize
2782
2783 @item INPUT PROGRAM
2784 @itemize @bullet
2785 @item
2786 Invalid in INPUT PROGRAM and FILE TYPE states.
2787 @item
2788 Causes a transition to the INPUT PROGRAM state.
2789 @item
2790 Clears the active file.
2791 @end itemize
2792
2793 @item FILE TYPE
2794 @itemize @bullet
2795 @item
2796 Invalid in INPUT PROGRAM and FILE TYPE states.
2797 @item
2798 Causes a transition to the FILE TYPE state.
2799 @item
2800 Clears the active file.
2801 @end itemize
2802
2803 @item Other file definition commands
2804 @itemize @bullet
2805 @item
2806 Invalid in INPUT PROGRAM and FILE TYPE states.
2807 @item
2808 Cause a transition to the transformation state.
2809 @item
2810 Clear the active file, except for ADD FILES, MATCH FILES, and UPDATE.
2811 @end itemize
2812
2813 @item Transformations
2814 @itemize @bullet
2815 @item
2816 Invalid in initial and FILE TYPE states.
2817 @item
2818 Cause a transition to the transformation state.
2819 @end itemize
2820
2821 @item Restricted transformations
2822 @itemize @bullet
2823 @item
2824 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
2825 @item
2826 Cause a transition to the transformation state.
2827 @end itemize
2828
2829 @item Procedures
2830 @itemize @bullet
2831 @item
2832 Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
2833 @item
2834 Cause a transition to the procedure state.
2835 @end itemize
2836 @end table
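
The transition rules in the table above amount to a small state machine.
The following C sketch transcribes the table literally.  The enumerations
and @code{next_state} are hypothetical names, not PSPP's internal interface.
The sketch does not model clearing the active file.  It also omits the
commands that end an input program or file type, because the table does not
list them.

@verbatim
/* States named in the text above. */
enum state
  {
    STATE_INITIAL,
    STATE_INPUT_PROGRAM,
    STATE_FILE_TYPE,
    STATE_TRANSFORMATION,
    STATE_PROCEDURE,
    STATE_ERROR                 /* Hypothetical: command illegal here. */
  };

/* Command categories from the table.  N OF CASES is listed separately
   because it is the one utility command that can change state. */
enum command_kind
  {
    CMD_UTILITY,
    CMD_N_OF_CASES,
    CMD_DATA_LIST,
    CMD_INPUT_PROGRAM,
    CMD_FILE_TYPE,
    CMD_OTHER_FILE_DEF,
    CMD_TRANSFORMATION,
    CMD_RESTRICTED_TRANSFORMATION,
    CMD_PROCEDURE
  };

/* Returns the state after executing a command of kind K in state S, or
   STATE_ERROR if the table says K is invalid in S. */
enum state
next_state (enum state s, enum command_kind k)
{
  /* States in which most file definition commands are invalid. */
  int in_definition = (s == STATE_INPUT_PROGRAM || s == STATE_FILE_TYPE);

  switch (k)
    {
    case CMD_UTILITY:
      return s;
    case CMD_N_OF_CASES:
      return s == STATE_PROCEDURE ? STATE_TRANSFORMATION : s;
    case CMD_DATA_LIST:
      return (s == STATE_INITIAL || s == STATE_PROCEDURE
              ? STATE_TRANSFORMATION : s);
    case CMD_INPUT_PROGRAM:
      return in_definition ? STATE_ERROR : STATE_INPUT_PROGRAM;
    case CMD_FILE_TYPE:
      return in_definition ? STATE_ERROR : STATE_FILE_TYPE;
    case CMD_OTHER_FILE_DEF:
      return in_definition ? STATE_ERROR : STATE_TRANSFORMATION;
    case CMD_TRANSFORMATION:
      return (s == STATE_INITIAL || s == STATE_FILE_TYPE
              ? STATE_ERROR : STATE_TRANSFORMATION);
    case CMD_RESTRICTED_TRANSFORMATION:
      return (s == STATE_INITIAL || in_definition
              ? STATE_ERROR : STATE_TRANSFORMATION);
    case CMD_PROCEDURE:
      return (s == STATE_INITIAL || in_definition
              ? STATE_ERROR : STATE_PROCEDURE);
    }
  return STATE_ERROR;
}
@end verbatim

For instance, a procedure issued before any file definition command yields
@code{STATE_ERROR}, which matches the first rule listed for procedures.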
2837
2838 @node Missing Observations, Variables, Order of Commands, Language
2839 @section Handling missing observations
2840 @cindex missing values
2841 @cindex values, missing
2842
2843 PSPP includes special support for unknown numeric data values.
2844 Missing observations are assigned a special value, called the
2845 @dfn{system-missing value}.  This ``value'' actually indicates the
2846 absence of value; it means that the actual value is unknown.  Procedures
2847 automatically exclude from analyses those observations or cases that
2848 have missing values.  Whether single observations or entire cases are
2849 excluded depends on the procedure.
2850
2851 The system-missing value exists only for numeric variables.  String
2852 variables always have a defined value, even if it is only a string of
2853 spaces.
2854
2855 Variables, whether numeric or string, can have designated
2856 @dfn{user-missing values}.  Every user-missing value is an actual value
2857 for that variable.  However, most of the time user-missing values are
2858 treated in the same way as the system-missing value.  String variables
2859 that are wider than a certain width, usually 8 characters (depending on
2860 computer architecture), cannot have user-missing values.
2861
2862 For more information on missing values, see the following sections:
2863 @ref{Variables}, @ref{MISSING VALUES}, @ref{Expressions}.  See also the
2864 documentation on individual procedures for information on how they
2865 handle missing values.
2866
2867 @node Variables, Files, Missing Observations, Language
2868 @section Variables
2869 @cindex variables
2870 @cindex dictionary
2871
2872 Variables are the basic unit of data storage in PSPP.  All the
2873 variables in a file taken together, apart from any associated data, are
2874 said to form a @dfn{dictionary}.
2875 Some details of variables are described in the sections below.
2876
2877 @menu
2878 * Attributes::                  Attributes of variables.
2879 * System Variables::            Variables automatically defined by PSPP.
2880 * Sets of Variables::           Lists of variable names.
2881 * Input/Output Formats::        Input and output formats.
2882 * Scratch Variables::           Variables deleted by procedures.
2883 @end menu
2884
2885 @node Attributes, System Variables, Variables, Variables
2886 @subsection Attributes of Variables
2887 @cindex variables, attributes of
2888 @cindex attributes of variables
2889 Each variable has a number of attributes, including:
2890
2891 @table @strong
2892 @item Name
2893 This is an identifier.  Each variable must have a different name.
2894 @xref{Tokens}.
2895
2896 @cindex variables, type
2897 @cindex type of variables
2898 @item Type
2899 Numeric or string.
2900
2901 @cindex variables, width
2902 @cindex width of variables
2903 @item Width
2904 (string variables only) String variables with a width of 8 characters or
2905 fewer are called @dfn{short string variables}.  Short string variables
2906 can be used in many procedures where @dfn{long string variables} (those
2907 with widths greater than 8) are not allowed.
2908
2909 @quotation
2910 @strong{Please note:} Certain systems may consider strings longer than 8
2911 characters to be short strings.  Eight characters is therefore a lower
2912 bound on the maximum length of a short string.
2913 @end quotation
2914
2915 @item Position
2916 Variables in the dictionary are arranged in a specific order.  The
2917 DISPLAY command can be used to show this order (@pxref{DISPLAY}).
2918
2919 @item Orientation
2920 Dexter or sinister.  @xref{LEAVE}.
2921
2922 @cindex missing values
2923 @cindex values, missing
2924 @item Missing values
2925 Optionally, up to three values, or a range of values, or a specific
2926 value plus a range, can be specified as @dfn{user-missing values}.
2927 There is also a @dfn{system-missing value} that is assigned to an
2928 observation when there is no other obvious value for that observation.
2929 Observations with missing values are automatically excluded from
2930 analyses.  User-missing values are actual data values, while the
2931 system-missing value is not a value at all.  @xref{Missing Observations}.
2932
2933 @cindex variable labels
2934 @cindex labels, variable
2935 @item Variable label
2936 A string that describes the variable.  @xref{VARIABLE LABELS}.
2937
2938 @cindex value labels
2939 @cindex labels, value
2940 @item Value label
2941 Optionally, these associate each possible value of the variable with a
2942 string.  @xref{VALUE LABELS}.
2943
2944 @cindex print format
2945 @item Print format
2946 Display width, format, and (for numeric variables) number of decimal
2947 places.  This attribute does not affect how data are stored, just how
2948 they are displayed.  Example: a width of 8, with 2 decimal places.
2949 @xref{PRINT FORMATS}.
2950
2951 @cindex write format
2952 @item Write format
2953 Similar to print format, but used by certain commands that are
2954 designed to write to binary files.  @xref{WRITE FORMATS}.
2955 @end table
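
A variable's missing-value attribute can hold up to three discrete
user-missing values, a range, or a range plus one discrete value.  The C
sketch below models that rule.  It is hypothetical in three respects: the
structure and its field names are invented for illustration, NaN stands in
for the system-missing value (this is not PSPP's actual representation), and
the range is assumed to be inclusive at both ends.

@verbatim
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-in for the system-missing value. */
#define SYSMIS_SKETCH NAN

/* Hypothetical representation of a numeric variable's user-missing
   values: up to three discrete values, a range, or a range plus one
   discrete value. */
struct missing_values
  {
    int n_discrete;             /* 0 to 3 (at most 1 if has_range). */
    double discrete[3];
    bool has_range;
    double low, high;           /* Inclusive bounds when has_range. */
  };

/* Returns true if X is system-missing or one of MV's user-missing
   values, that is, a value most procedures would exclude. */
bool
is_missing (const struct missing_values *mv, double x)
{
  if (isnan (x))                /* System-missing in this sketch. */
    return true;
  if (mv->has_range && x >= mv->low && x <= mv->high)
    return true;
  for (int i = 0; i < mv->n_discrete; i++)
    if (x == mv->discrete[i])
      return true;
  return false;
}
@end verbatim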
2956
2957 @node System Variables, Sets of Variables, Attributes, Variables
2958 @subsection Variables Automatically Defined by PSPP
2959 @cindex system variables
2960 @cindex variables, system
2961
2962 There are seven system variables.  These are not like ordinary
2963 variables, as they are not stored in each case.  They can only be used
2964 in expressions.  These system variables, whose values and output formats
2965 cannot be modified, are described below.
2966
2967 @table @code
2968 @cindex @code{$CASENUM}
2969 @item $CASENUM
2970 Case number of the current case.  This changes as cases are
2971 shuffled around.
2972
2973 @cindex @code{$DATE}
2974 @item $DATE
2975 Date the PSPP process was started, in format A9, following the
2976 pattern @code{DD MMM YY}.
2977
2978 @cindex @code{$JDATE}
2979 @item $JDATE
2980 Number of days between 15 Oct 1582 and the time the PSPP process
2981 was started.
2982
2983 @cindex @code{$LENGTH}
2984 @item $LENGTH
2985 Page length, in lines, in format F11.
2986
2987 @cindex @code{$SYSMIS}
2988 @item $SYSMIS
2989 System missing value, in format F1.
2990
2991 @cindex @code{$TIME}
2992 @item $TIME
2993 Number of seconds between midnight 14 Oct 1582 and the time the active file
2994 was read, in format F20.
2995
2996 @cindex @code{$WIDTH}
2997 @item $WIDTH
2998 Page width, in characters, in format F3.
2999 @end table
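
The @code{$TIME} entry above counts seconds from midnight, 14 October 1582.
The C sketch below computes such a count for a given date and time of day.
The function names are hypothetical.  The day-count helper uses the
well-known ``days from civil'' algorithm for the proleptic Gregorian
calendar, which is not necessarily the method PSPP itself uses.

@verbatim
/* Days from 1970-01-01 to the Gregorian date Y-M-D (proleptic Gregorian
   calendar, "days from civil" algorithm). */
static long
days_from_civil (long y, int m, int d)
{
  y -= m <= 2;                          /* Jan/Feb count with prior year. */
  long era = (y >= 0 ? y : y - 399) / 400;
  long yoe = y - era * 400;             /* Year of era, 0..399. */
  long mp = m > 2 ? m - 3 : m + 9;      /* March = 0, ..., February = 11. */
  long doy = (153 * mp + 2) / 5 + d - 1;            /* Day of year. */
  long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; /* Day of era. */
  return era * 146097 + doe - 719468;
}

/* Hypothetical: seconds from midnight, 14 Oct 1582, to the given date
   and time of day, following the $TIME description above. */
long long
seconds_since_1582 (long y, int m, int d, int hour, int min, int sec)
{
  long days = days_from_civil (y, m, d) - days_from_civil (1582, 10, 14);
  return days * 86400LL + hour * 3600LL + min * 60LL + sec;
}
@end verbatim

For example, midnight on 1 January 1970 is 141428 days after the epoch, or
12219379200 seconds.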
3000
3001 @node Sets of Variables, Input/Output Formats, System Variables, Variables
3002 @subsection Lists of variable names
3003 @cindex TO convention
3004 @cindex convention, TO
3005
3006 There are several ways to specify a set of variables:
3007
3008 @enumerate
3009 @item
3010 (Most commonly.)  List the variable names one after another, optionally
3011 separating them by commas.
3012
3013 @cindex @code{TO}
3014 @item
3015 (This method cannot be used on commands that define the dictionary, such
3016 as @code{DATA LIST}.)  The syntax is the names of two existing variables,
3017 separated by the reserved keyword @code{TO}.  The meaning is to include
3018 every variable in the dictionary between and including the variables
3019 specified.  For instance, if the dictionary contains six variables with
3020 the names @code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
3021 @code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
3022 variables @code{X2}, @code{GOAL}, and @code{MET}.
3023
3024 @item
3025 (This method can be used only on commands that define the dictionary,
3026 such as @code{DATA LIST}.)  It is used to define sequences of variables
3027 that end in consecutive integers.  The syntax is two identifiers that
3028 end in numbers, separated by @code{TO}, as these examples illustrate:
3029
3030 @itemize @bullet
3031 @item
3032 The syntax @code{X1 TO X5} defines 5 variables:
3033
3034 @itemize @minus
3035 @item
3036 X1
3037 @item
3038 X2
3039 @item
3040 X3
3041 @item
3042 X4
3043 @item
3044 X5
3045 @end itemize
3046
3047 @item
3048 The syntax @code{ITEM0008 TO ITEM0013} defines 6 variables:
3049
3050 @itemize @minus
3051 @item
3052 ITEM0008
3053 @item
3054 ITEM0009
3055 @item
3056 ITEM0010
3057 @item
3058 ITEM0011
3059 @item
3060 ITEM0012
3061 @item
3062 ITEM0013
3063 @end itemize
3064
3065 @item
3066 The syntaxes @code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are both
3067 invalid, although for different reasons, which should be evident.
3068 @end itemize
3069
3070 Note that after a set of variables has been defined with @code{DATA LIST}
3071 or another command with this method, the same set can be referenced on
3072 later commands using the same syntax.
3073
3074 @item
3075 The above methods can be combined, either one after another or delimited
3076 by commas.  For instance, the combined syntax @code{A Q5 TO Q8 X TO Z}
3077 is legal as long as each part @code{A}, @code{Q5 TO Q8}, @code{X TO Z}
3078 is individually legal.
3079 @end enumerate
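
Method 3 above can be sketched in C as follows.  The function
@code{expand_to} and its per-name buffer size are hypothetical.  The sketch
accepts a pair only when the last name is exactly the name the expansion
would end with.  That validity rule rejects both invalid examples above, but
it is an assumption chosen to agree with those examples, not a statement of
PSPP's exact rule.  Names are compared case-sensitively for simplicity.

@verbatim
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TO_NAME_LEN 32          /* Hypothetical buffer size per name. */

/* Splits NAME into a prefix and a trailing run of digits.  Returns 0 if
   NAME does not end in a digit or consists only of digits. */
static int
split_name (const char *name, size_t *prefix_len, long *number,
            int *n_digits)
{
  size_t len = strlen (name);
  size_t i = len;

  while (i > 0 && isdigit ((unsigned char) name[i - 1]))
    i--;
  if (i == len || i == 0)
    return 0;

  *prefix_len = i;
  *n_digits = (int) (len - i);
  *number = strtol (name + i, NULL, 10);
  return 1;
}

/* Hypothetical: expands FIRST TO LAST into NAMES (at most MAX entries),
   zero-padding numbers to the width used in FIRST.  Returns the number
   of names generated, or -1 if the pair is invalid or too long. */
int
expand_to (const char *first, const char *last,
           char names[][TO_NAME_LEN], int max)
{
  size_t p1, p2;
  long n1, n2;
  int d1, d2;
  char check[TO_NAME_LEN];

  if (!split_name (first, &p1, &n1, &d1)
      || !split_name (last, &p2, &n2, &d2))
    return -1;                  /* Both names must end in numbers. */
  if (n1 > n2 || n2 - n1 + 1 > max)
    return -1;                  /* E.g. QUES6 TO QUES3. */

  /* Assumed rule: LAST must equal the name the expansion ends with.
     This enforces a common prefix and consistent zero padding, so
     QUES001 TO QUES9 (which would end at QUES009) is rejected. */
  snprintf (check, sizeof check, "%.*s%0*ld", (int) p1, first, d1, n2);
  if (strcmp (check, last) != 0)
    return -1;

  for (long n = n1; n <= n2; n++)
    snprintf (names[n - n1], TO_NAME_LEN, "%.*s%0*ld",
              (int) p1, first, d1, n);
  return (int) (n2 - n1 + 1);
}
@end verbatim

With this rule, @code{X1 TO X10} is still accepted, because numbers without
leading zeros are not padded.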
3080
3081 @node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
3082 @subsection Input and Output Formats
3083
3084 Data that PSPP inputs and outputs must have one of a number of formats.
3085 These formats are described, in general, by a format specification of
3086 the form @code{@var{name}@var{w}.@var{d}}, where @var{name} is the
3087 format name and @var{w} is a field width.  @var{d} is the optional
3088 desired number of decimal places, if appropriate.  If @var{d} is not
3089 included then it is assumed to be 0.  Some formats do not allow @var{d}
3090 to be specified.
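
Splitting a format specification into its name, width, and decimal places
is mechanical.  The C sketch below is illustrative only.  The structure
@code{fmt_spec_sketch} and the function @code{parse_format} are
hypothetical, and the sketch does not check the per-format width bounds
listed in the tables below.

@verbatim
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical parsed form of a format specification such as F8.2. */
struct fmt_spec_sketch
  {
    char name[9];               /* Format name, e.g. "F" or "DOLLAR". */
    int w;                      /* Field width. */
    int d;                      /* Decimal places; 0 if omitted. */
  };

/* Parses S (e.g. "F8.2", "A10", "DATE11") into *SPEC.  Returns 1 on
   success, 0 if S is not of the form NAMEw or NAMEw.d. */
int
parse_format (const char *s, struct fmt_spec_sketch *spec)
{
  int n = 0;
  char *end;

  /* Format name: a run of up to 8 letters, folded to upper case. */
  while (n < 8 && isalpha ((unsigned char) s[n]))
    {
      spec->name[n] = (char) toupper ((unsigned char) s[n]);
      n++;
    }
  spec->name[n] = '\0';
  if (n == 0 || !isdigit ((unsigned char) s[n]))
    return 0;

  spec->w = (int) strtol (s + n, &end, 10);
  spec->d = 0;                  /* D is assumed to be 0 when omitted. */
  if (*end == '.')
    {
      if (!isdigit ((unsigned char) end[1]))
        return 0;
      spec->d = (int) strtol (end + 1, &end, 10);
    }
  return *end == '\0';
}
@end verbatim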
3091
3092 When an input format is specified on DATA LIST or another command, then
3093 it is converted to an output format for the purposes of PRINT and other
3094 data output commands.  For most purposes, input and output formats are
3095 the same; the salient differences are described below.
3096
3097 Below are listed the input and output formats supported by PSPP.  If an
3098 input format is mapped to a different output format by default, then
3099 that mapping is indicated with @result{}.  Each format has the listed
3100 bounds on input width (iw) and output width (ow).
3101
3102 The standard numeric input and output formats are given in the following
3103 table:
3104
3105 @table @asis
3106 @item Fw.d: 1 <= iw,ow <= 40
3107 Standard decimal format with @var{d} decimal places.  If the number is
3108 too large to fit within the field width, it is expressed in scientific
3109 notation (@code{1.2+34}) if @var{w} >= 6, always with at least two digits in
3110 the exponent.  When used as an input format, scientific notation is
3111 allowed but an E or an F must be used to introduce the exponent.
3112
3113 The default output format is the same as the input format, except if
3114 @var{d} > 1.  In that case the output @var{w} is always made to be at
3115 least 2 + @var{d}.
3116
3117 @item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
3118 For input this is equivalent to F format except that no E or F is
3119 required to introduce the exponent.  For output, it produces scientific
3120 notation in the form @code{1.2+34}.  There are always at least two
3121 digits given in the exponent.
3122
3123 The default output @var{w} is the largest of the input @var{w}, the
3124 input @var{d} + 7, and 10.  The default output @var{d} is the input
3125 @var{d}, but at least 3.
3126
3127 @item COMMAw.d: 1 <= iw,ow <= 40
3128 Equivalent to F format, except that groups of three digits are
3129 comma-separated on output.  If the number is too large to express in the
3130 field width, then commas are eliminated first; if there is still not
3131 enough space, the number is expressed in scientific notation, provided
3132 that @var{w} >= 6.  Commas are allowed and ignored when this is used as an
3133 input format.
3134
3135 @item DOTw.d: 1 <= iw,ow <= 40
3136 Equivalent to COMMA format except that the roles of comma and decimal
3137 point are interchanged.  However: If SET /DECIMAL=DOT is in effect, then
3138 COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for a
3139 decimal point.
3140
3141 @item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
3142 Equivalent to COMMA format, except that the number is prefixed by a
3143 dollar sign (@samp{$}) if there is room.  On input the value is allowed
3144 to be prefixed by a dollar sign, which is ignored.
3145
3146 The default output @var{w} is the input @var{w}, but at least 2.
3147
3148 @item PCTw.d: 2 <= iw,ow <= 40
3149 Equivalent to F format, except that the number is suffixed by a percent
3150 sign (@samp{%}) if there is room.  On input the value is allowed to be
3151 suffixed by a percent sign, which is ignored.
3152
3153 The default output @var{w} is the input @var{w}, but at least 2.
3154
3155 @item Nw.d: 1 <= iw,ow <= 40
3156 Only digits are allowed within the field width.  The decimal point is
3157 assumed to be @var{d} digits from the right margin.
3158
3159 The default output format is F with the same @var{w} and @var{d}, except
3160 if @var{d} > 1.  In that case the output @var{w} is always made to be at
3161 least 2 + @var{d}.
3162
3163 @item Zw.d @result{} F: 1 <= iw,ow <= 40
3164 Zoned decimal input.  If you need to use this then you know how.
3165
3166 @item IBw.d @result{} F: 1 <= iw,ow <= 8
3167 Integer binary format.  The field is interpreted as a fixed-point
3168 positive or negative binary number in two's-complement notation.  The
3169 location of the decimal point is implied.  Endianness is the same as that
3170 of the host machine.
3171
3172 The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
3173 with output @var{w} as 9 + input @var{d} and output @var{d} as input
3174 @var{d}.
3175
3176 @item PIBw.d @result{} F: 1 <= iw,ow <= 8
3177 Positive integer binary format.  The field is interpreted as a
3178 fixed-point positive binary number.  The location of the decimal point
3179 is implied.  Endianness is the same as that of the host machine.
3180
3181 The default output format follows the rules for IB format.
3182
3183 @item Pw.d @result{} F: 1 <= iw,ow <= 16
3184 Binary coded decimal format.  Each byte from left to right, except the
3185 rightmost, represents two digits.  The upper nibble of each byte is more
3186 significant.  The upper nibble of the final byte is the least
3187 significant digit.  The lower nibble of the final byte is the sign; a
3188 value of D represents a negative sign and all other values are
3189 considered positive.  The decimal point is implied.
3190
3191 The default output format follows the rules for IB format.
3192
3193 @item PKw.d @result{} F: 1 <= iw,ow <= 16
3194 Positive binary coded decimal format.  Same as P, except that the last
3195 byte also holds two digits, like the others, so there is no sign.
3196
3197 The default output format follows the rules for IB format.
3198
3199 @item RBw @result{} F: 2 <= iw,ow <= 8
3200
3201 Binary C architecture-dependent ``double'' format.  For a standard
3202 IEEE754 implementation @var{w} should be 8.
3203
3204 The default output format follows the rules for IB format.
3205
3206 @item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
3207 PIB format encoded as textual hex digit pairs.  @var{w} must be even.
3208
3209 The input width is mapped to a default output width as follows:
3210 2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
3211 12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made for
3212 decimal places.
3213
3214 @item RBHEXw @result{} F: 4 <= iw,ow <= 16
3215
3216 RB format encoded as textual hex digit pairs.  @var{w} must be even.
3217
3218 The default output format is F8.2.
3219
3220 @item CCAw.d: 1 <= ow <= 40
3221 @itemx CCBw.d: 1 <= ow <= 40
3222 @itemx CCCw.d: 1 <= ow <= 40
3223 @itemx CCDw.d: 1 <= ow <= 40
3224 @itemx CCEw.d: 1 <= ow <= 40
3225
3226 User-defined custom currency formats.  May not be used as an input
3227 format.  @xref{SET}, for more details.
3228 @end table
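
The nibble layouts described above for P and PK can be decoded as follows.
This C sketch is illustrative only.  The function names are hypothetical,
nibbles above 9 are not rejected, and the field must be at least one byte
long.  The result is returned as a @code{double} after dividing by 10 to the
power @var{d} for the implied decimal places.

@verbatim
#include <stddef.h>

/* Returns 10 to the power D as a double (exact for small D). */
static double
power_of_ten (int d)
{
  double scale = 1.0;
  while (d-- > 0)
    scale *= 10.0;
  return scale;
}

/* Hypothetical: decodes an N-byte P (packed decimal) field, N >= 1, with
   D implied decimal places.  Every byte but the last holds two digits,
   upper nibble first; the last byte holds one digit in its upper nibble
   and the sign in its lower nibble (0xD means negative). */
double
decode_p (const unsigned char *field, size_t n, int d)
{
  double value = 0.0;
  for (size_t i = 0; i + 1 < n; i++)
    value = (value * 10 + (field[i] >> 4)) * 10 + (field[i] & 0x0f);
  value = value * 10 + (field[n - 1] >> 4);
  if ((field[n - 1] & 0x0f) == 0x0d)
    value = -value;
  return value / power_of_ten (d);
}

/* Hypothetical: decodes an N-byte PK field: like P, but every byte,
   including the last, holds two digits and there is no sign. */
double
decode_pk (const unsigned char *field, size_t n, int d)
{
  double value = 0.0;
  for (size_t i = 0; i < n; i++)
    value = (value * 10 + (field[i] >> 4)) * 10 + (field[i] & 0x0f);
  return value / power_of_ten (d);
}
@end verbatim

For example, the three bytes @code{0x12 0x34 0x5D} read as P3.2 give
@minus{}123.45.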
3229
3230 The date and time numeric input and output formats accept a number of
3231 possible formats.  Before describing the formats themselves, some
3232 definitions of the elements that make up their formats will be helpful:
3233
3234 @table @dfn
3235 @item leader
3236 All formats accept an optional whitespace leader.
3237
3238 @item day
3239 An integer between 1 and 31 representing the day of month.
3240
3241 @item day-count
3242 An integer representing a number of days.
3243
3244 @item date-delimiter
3245 One or more characters of whitespace or the following characters:
3246 @code{- / . ,}
3247
3248 @item month
3249 A month name in one of the following forms:
3250 @itemize @bullet
3251 @item
3252 An integer between 1 and 12.
3253 @item
3254 Roman numerals representing an integer between 1 and 12.
3255 @item
3256 At least the first three characters of an English month name (January,
3257 February, @dots{}).
3258 @end itemize
3259
3260 @item year
3261 An integer year number between 1582 and 19999, or between 1 and 199.
3262 Years between 1 and 199 will have 1900 added.
3263
3264 @item julian
3265 A single number with a year number in the first 2, 3, or 4 digits (as
3266 above) and the day number within the year in the last 3 digits.
3267
3268 @item quarter
3269 An integer between 1 and 4 representing a quarter.
3270
3271 @item q-delimiter
3272 The letter @samp{Q} or @samp{q}.
3273
3274 @item week
3275 An integer between 1 and 53 representing a week within a year.
3276
3277 @item wk-delimiter
3278 The letters @samp{wk} in any case.
3279
3280 @item time-delimiter
3281 At least one character of whitespace or @samp{:} or @samp{.}.
3282
3283 @item hour
3284 An integer greater than 0 representing an hour.
3285
3286 @item minute
3287 An integer between 0 and 59 representing a minute within an hour.
3288
3289 @item opt-second
3290 Optionally, a time-delimiter followed by a real number representing a
3291 number of seconds.
3292
3293 @item hour24
3294 An integer between 0 and 23 representing an hour within a day.
3295
3296 @item weekday
3297 At least the first two characters of an English day name.
3298
3299 @item spaces
3300 Any amount or no amount of whitespace.
3301
3302 @item sign
3303 An optional positive or negative sign.
3304
3305 @item trailer
3306 All formats accept an optional whitespace trailer.
3307 @end table
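
The three accepted spellings of the @dfn{month} element can be recognized as
in the C sketch below.  The function name is hypothetical.  The sketch
ignores letter case, which is an assumption because the text does not say.
It handles only the month element itself, not the surrounding delimiters.

@verbatim
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first N characters of A and B;
   callers ensure both strings have at least N characters. */
static int
prefix_equal (const char *a, const char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (toupper ((unsigned char) a[i]) != toupper ((unsigned char) b[i]))
      return 0;
  return 1;
}

/* Hypothetical: parses S as a month (1..12): an integer, a Roman
   numeral, or at least the first three letters of an English month
   name.  Returns 0 if S is not a month. */
int
parse_month (const char *s)
{
  static const char *const names[12] = {
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
    "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"
  };
  static const char *const roman[12] = {
    "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII"
  };
  size_t len = strlen (s);
  int i;

  if (len == 0)
    return 0;

  /* An integer between 1 and 12. */
  if (isdigit ((unsigned char) s[0]))
    {
      int m = 0;
      for (size_t k = 0; k < len; k++)
        {
          if (!isdigit ((unsigned char) s[k]) || m > 12)
            return 0;
          m = m * 10 + (s[k] - '0');
        }
      return m >= 1 && m <= 12 ? m : 0;
    }

  /* A Roman numeral must match one of the twelve exactly. */
  for (i = 0; i < 12; i++)
    if (strlen (roman[i]) == len && prefix_equal (s, roman[i], len))
      return i + 1;

  /* Three or more leading letters of an English month name. */
  if (len >= 3)
    for (i = 0; i < 12; i++)
      if (len <= strlen (names[i]) && prefix_equal (s, names[i], len))
        return i + 1;

  return 0;
}
@end verbatim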
3308
3309 The date input formats are strung together from the above pieces.  On
3310 output, the date formats are always printed in a single canonical
3311 manner, based on field width.  The date input and output formats are
3312 described below:
3313
3314 @table @asis
3315 @item DATEw: 9 <= iw,ow <= 40
3316 Date format. Input format: leader + day + date-delimiter +
3317 month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
3318 @var{w} < 11, DD-MMM-YYYY otherwise.
3319
3320 @item EDATEw: 8 <= iw,ow <= 40
3321 European date format.  Input format same as DATE.  Output format:
3322 DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.
3323
3324 @item SDATEw: 8 <= iw,ow <= 40
3325 Standard date format. Input format: leader + year + date-delimiter +
3326 month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
3327 @var{w} < 10, YYYY/MM/DD otherwise.
3328
3329 @item ADATEw: 8 <= iw,ow <= 40
3330 American date format.  Input format: leader + month + date-delimiter +
3331 day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
3332 @var{w} < 10, MM/DD/YYYY otherwise.
3333
3334 @item JDATEw: 5 <= iw,ow <= 40
3335 Julian date format.  Input format: leader + julian + trailer.  Output
3336 format: YYDDD for @var{w} < 7, YYYYDDD otherwise.
3337
3338 @item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
3339 Quarter/year format.  Input format: leader + quarter + q-delimiter +
3340 year + trailer.  Output format: @samp{Q Q YY} for @var{w} < 8,
3341 @samp{Q Q YYYY} otherwise, where the first @samp{Q} is one of the
3342 digits 1, 2, 3, or 4.
3343
3344 @item MOYRw: 6 <= iw,ow <= 40
3345 Month/year format.  Input format: leader + month + date-delimiter + year
3346 + trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
3347 YYYY} otherwise.
3348
3349 @item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
3350 Week/year format.  Input format: leader + week + wk-delimiter + year +
3351 trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
3352 YYYY} otherwise.
3353
3354 @item DATETIMEw.d: 17 <= iw,ow <= 40
3355 Date and time format.  Input format: leader + day + date-delimiter +
3356 month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
3357 + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
3358 @var{w} > 19 then seconds @samp{:SS} are added.  If @var{w} > 22 and
3359 @var{d} > 0 then fractional seconds @samp{.SS} are added.
3360
3361 @item TIMEw.d: 5 <= iw,ow <= 40
3362 Time format.  Input format: leader + sign + spaces + hour +
3363 time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
3364 Seconds and fractional seconds are available with @var{w} of at least 8
3365 and 10, respectively.
3366
3367 @item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
3368 Time format with day count.  Input format: leader + sign + spaces +
3369 day-count + time-delimiter + hour + time-delimiter + minute +
3370 opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
3371 seconds are available with @var{w} of at least 8 and 10, respectively.
3372
3373 @item WKDAYw: 2 <= iw,ow <= 40
3374 A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
3375 leader + weekday + trailer.  Output format: as many characters, in all
3376 capital letters, of the English name of the weekday as will fit in the
3377 field width.
3378
3379 @item MONTHw: 3 <= iw,ow <= 40
3380 A month as a number between 1 and 12, where 1 is January.  Input format:
3381 leader + month + trailer.  Output format: as many characters, in all
3382 capital letters, of the English name of the month as will fit in the
3383 field width.
3384 @end table
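
The julian element and the JDATE output format described above can be
sketched in C as below.  The function names are hypothetical.  The sketch
applies the year rule literally and treats the julian value as an integer,
so leading zeros are not distinguished.  It does not check a day number of
366 against leap years.

@verbatim
#include <stdio.h>

/* Applies the "year" rule: 1..199 gets 1900 added; 1582..19999 is kept
   as is; anything else is rejected with -1. */
static long
normalize_year (long y)
{
  if (y >= 1 && y <= 199)
    return y + 1900;
  if (y >= 1582 && y <= 19999)
    return y;
  return -1;
}

/* Hypothetical: splits a julian value such as 99123 or 2003354 into a
   year and a day of the year (the last three digits).  Returns 1 on
   success, 0 if the year or day is out of range. */
int
split_julian (long julian, long *year, int *yday)
{
  long y = normalize_year (julian / 1000);
  int day = (int) (julian % 1000);

  if (y < 0 || day < 1 || day > 366)
    return 0;
  *year = y;
  *yday = day;
  return 1;
}

/* Hypothetical: formats YEAR and YDAY as JDATE output of width W:
   YYDDD if W < 7, YYYYDDD otherwise.  Returns the length written. */
int
format_jdate (long year, int yday, int w, char *buf, size_t size)
{
  if (w < 7)
    return snprintf (buf, size, "%02ld%03d", year % 100, yday);
  return snprintf (buf, size, "%04ld%03d", year, yday);
}
@end verbatim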
3385
3386 There are only two formats that may be used with string variables:
3387
3388 @table @asis
3389 @item Aw: 1 <= iw <= 255, 1 <= ow <= 254
3390 The entire field is treated as a string value.
3391
3392 @item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
3393 The field is composed of characters in a string encoded as textual hex
3394 digit pairs.
3395
3396 The default output @var{w} is half the input @var{w}.
3397 @end table
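
The default input-to-output conversions stated in the two tables above can
be collected into one function.  The following C sketch is hypothetical.
The @code{struct fmt} type and @code{default_output_format} are illustrative
names, and formats are identified by name strings for brevity.  Formats
whose default output rule is not spelled out above (for example Z, COMMA,
and the date formats) are returned unchanged.

@verbatim
#include <string.h>

/* Hypothetical format specification: name, width, decimals. */
struct fmt
  {
    const char *name;
    int w, d;
  };

static int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* Hypothetical: returns the default output format for input format IN,
   following the rules stated in the tables above. */
struct fmt
default_output_format (struct fmt in)
{
  struct fmt out = in;
  const char *n = in.name;

  if (!strcmp (n, "F") || !strcmp (n, "N"))
    {
      out.name = "F";
      if (in.d > 1)
        out.w = max_int (in.w, 2 + in.d);
    }
  else if (!strcmp (n, "E"))
    {
      out.w = max_int (max_int (in.w, in.d + 7), 10);
      out.d = max_int (in.d, 3);
    }
  else if (!strcmp (n, "DOLLAR") || !strcmp (n, "PCT"))
    out.w = max_int (in.w, 2);
  else if (!strcmp (n, "IB") || !strcmp (n, "PIB") || !strcmp (n, "P")
           || !strcmp (n, "PK") || !strcmp (n, "RB"))
    {
      out.name = "F";
      if (in.d == 0)
        {
          out.w = 8;
          out.d = 2;
        }
      else
        out.w = 9 + in.d;
    }
  else if (!strcmp (n, "PIBHEX"))
    {
      /* Input widths 2, 4, ..., 16; decimals do not affect the width. */
      static const int widths[8] = { 4, 6, 9, 11, 14, 16, 18, 21 };
      out.name = "F";
      if (in.w >= 2 && in.w <= 16 && in.w % 2 == 0)
        out.w = widths[in.w / 2 - 1];
    }
  else if (!strcmp (n, "RBHEX"))
    {
      out.name = "F";
      out.w = 8;
      out.d = 2;
    }
  else if (!strcmp (n, "AHEX"))
    {
      out.name = "A";
      out.w = in.w / 2;
    }
  return out;
}
@end verbatim

For example, an input format of E8.1 maps to an output format of E10.3,
and IB4 maps to F8.2.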
3398
3399 @node Scratch Variables,  , Input/Output Formats, Variables
3400 @subsection Scratch Variables
3401
3402 Most of the time, variables don't retain their values between cases.
3403 Instead, either they're being read from a data file or the active file,
3404 in which case they assume the value read, or, if created with COMPUTE or
3405 another transformation, they're initialized to the system-missing value
3406 or to blanks, depending on type.
3407
3408 However, sometimes it's useful to have a variable that keeps its value
3409 between cases.  You can do this with LEAVE (@pxref{LEAVE}), or you can
3410 use a @dfn{scratch variable}.  Scratch variables are variables whose
3411 names begin with an octothorpe (@samp{#}).
3412
3413 Scratch variables have the same properties as variables left with LEAVE:
3414 they retain their values between cases, and for the first case they are
3415 initialized to 0 or blanks.  They have the additional property that they
3416 are deleted before the execution of any procedure.  For this reason,
3417 scratch variables can't be used for analysis.  To obtain the same
3418 effect, use COMPUTE (@pxref{COMPUTE}) to copy the scratch variable's
3419 value into an ordinary variable, then analyze that variable.
3420
3421 @node Files, BNF, Variables, Language
3422 @section Files Used by PSPP
3423
3424 PSPP makes use of many files each time it runs.  Some of these it
3425 reads, some it writes, some it creates.  Here is a table listing the
3426 most important of these files:
3427
3428 @table @strong
3429 @cindex file, command
3430 @cindex file, syntax file
3431 @cindex command file
3432 @cindex syntax file
3433 @item command file
3434 @itemx syntax file
3435 These names (synonyms) refer to the file that contains instructions to
3436 PSPP that tell it what to do.  The syntax file's name is specified on
3437 the PSPP command line.  Syntax files can also be pulled in with the
3438 @code{INCLUDE} command.
3439
3440 @cindex file, data
3441 @cindex data file
3442 @item data file
3443 Data files contain raw data in ASCII format suitable for being read in
3444 by the @code{DATA LIST} command.  Data can be embedded in the syntax
3445 file with @code{BEGIN DATA} and @code{END DATA} commands: this makes the
3446 syntax file a data file too.
3447
3448 @cindex file, output
3449 @cindex output file
3450 @item listing file
3451 One or more output files are created by PSPP each time it is
3452 run.  The output files receive the tables and charts produced by
3453 statistical procedures.  The output files may be in any number of formats,
3454 depending on how PSPP is configured.
3455
3456 @cindex active file
3457 @cindex file, active
3458 @item active file
3459 The active file is the ``file'' on which all PSPP procedures
3460 are performed.  The active file contains variable definitions and
3461 cases.  The active file is not necessarily a disk file: it is stored
3462 in memory if there is room.
3463 @end table
3464
3465 @node BNF,  , Files, Language
3466 @section Backus-Naur Form
3467 @cindex BNF
3468 @cindex Backus-Naur Form
3469 @cindex command syntax, description of
3470 @cindex description of command syntax
3471
3472 The syntax of some parts of the PSPP language is presented in this
3473 manual using the formalism known as @dfn{Backus-Naur Form}, or BNF. The
following list describes BNF:
3475
3476 @itemize @bullet
3477 @cindex keywords
3478 @cindex terminals
3479 @item
3480 Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
3481 often called @dfn{terminals}.  There are some special terminals, which
3482 are actually written in lowercase for clarity:
3483
3484 @table @asis
3485 @cindex @code{number}
3486 @item @code{number}
3487 A real number.
3488
3489 @cindex @code{integer}
3490 @item @code{integer}
3491 An integer number.
3492
3493 @cindex @code{string}
3494 @item @code{string}
3495 A string.
3496
3497 @cindex @code{var-name}
3498 @item @code{var-name}
3499 A single variable name.
3500
3501 @cindex operators
3502 @cindex punctuators
3503 @item @code{=}, @code{/}, @code{+}, @code{-}, etc.
3504 Operators and punctuators.
3505
3506 @cindex @code{.}
3507 @cindex terminal dot
3508 @cindex dot, terminal
3509 @item @code{.}
3510 The terminal dot.  This is not necessarily an actual dot in the syntax
3511 file: @xref{Commands}, for more details.
3512 @end table
3513
3514 @item
3515 @cindex productions
3516 @cindex nonterminals
3517 Other words in all lowercase refer to BNF definitions, called
3518 @dfn{productions}.  These productions are also known as
3519 @dfn{nonterminals}.  Some nonterminals are very common, so they are
3520 defined here in English for clarity:
3521
3522 @table @code
3523 @cindex @code{var-list}
3524 @item var-list
3525 A list of one or more variable names or the keyword @code{ALL}.
3526
3527 @cindex @code{expression}
3528 @item expression
3529 An expression.  @xref{Expressions}, for details.
3530 @end table
3531
3532 @item
3533 @cindex @code{::=}
3534 @cindex ``is defined as''
3535 @cindex productions
3536 @samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
3537 the name of the nonterminal being defined.  The right side of @samp{::=}
3538 gives the definition of that nonterminal.  If the right side is empty,
3539 then one possible expansion of that nonterminal is nothing.  A BNF
3540 definition is called a @dfn{production}.
3541
3542 @item
3543 @cindex terminals and nonterminals, differences
3544 So, the key difference between a terminal and a nonterminal is that a
3545 terminal cannot be broken into smaller parts---in fact, every terminal
3546 is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
3547 composed of a (possibly empty) sequence of terminals and nonterminals.
3548 Thus, terminals indicate the deepest level of syntax description.  (In
3549 parsing theory, terminals are the leaves of the parse tree; nonterminals
3550 form the branches.)
3551
3552 @item
3553 @cindex start symbol
3554 @cindex symbol, start
3555 The first nonterminal defined in a set of productions is called the
3556 @dfn{start symbol}.  The start symbol defines the entire syntax for
3557 that command.
3558 @end itemize
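
As an illustration of the notation, here is a small, hypothetical set of
productions (not taken from any actual PSPP command) for a number with an
optional sign.  @code{signed-number} is the start symbol,
@code{opt-sign} is a nonterminal whose first production has an empty
right side, and @code{number}, @samp{+}, and @samp{-} are terminals:

@example
signed-number ::= opt-sign number
opt-sign ::=
opt-sign ::= +
opt-sign ::= -
@end example

Under these productions, the token sequences @samp{5}, @samp{+ 5}, and
@samp{- 5} are all valid @code{signed-number}s.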
3559
3560 @node Expressions, Data Input and Output, Language, Top
3561 @chapter Mathematical Expressions
3562 @cindex expressions, mathematical
3563 @cindex mathematical expressions
3564
3565 Some PSPP commands use expressions, which share a common syntax
3566 among all PSPP commands.  Expressions are made up of
3567 @dfn{operands}, which can be numbers, strings, or variable names,
3568 separated by @dfn{operators}.  There are five types of operators:
3569 grouping, arithmetic, logical, relational, and functions.
3570
3571 Every operator takes one or more @dfn{arguments} as input and produces
3572 or @dfn{returns} exactly one result as output.  Both strings and numeric
3573 values can be used as arguments and are produced as results, but each
3574 operator accepts only specific combinations of numeric and string values
3575 as arguments.  With few exceptions, operator arguments may be
3576 full-fledged expressions in themselves.
3577
3578 @menu
3579 * Booleans::                       Boolean values.
3580 * Missing Values in Expressions::  Using missing values in expressions.
3581 * Grouping Operators::             ( )
3582 * Arithmetic Operators::           + - * / **
3583 * Logical Operators::              AND NOT OR
3584 * Relational Operators::           EQ GE GT LE LT NE
3585 * Functions::                      More-sophisticated operators.
3586 * Order of Operations::            Operator precedence.
3587 @end menu
3588
3589 @node Booleans, Missing Values in Expressions, Expressions, Expressions
3590 @section Boolean values
3591 @cindex Boolean
3592 @cindex values, Boolean
3593
3594 There is a third type for arguments and results, the @dfn{Boolean} type,
3595 which is used to represent true/false conditions.  Booleans have only
3596 three possible values: 0 (false), 1 (true), and system-missing.
3597 System-missing is neither true nor false.
3598
3599 @itemize @bullet
3600 @item
3601 A numeric expression that has value 0, 1, or system-missing may be used
3602 in place of a Boolean.  Thus, the expression @code{0 AND 1} is valid
3603 (although it is always false).
3604
3605 @item
3606 A numeric expression with any other value will cause an error if it is
3607 used as a Boolean.  So, @code{2 OR 3} is invalid.
3608
3609 @item
3610 A Boolean expression may not be used in place of a numeric expression.
3611 Thus, @code{(1>2) + (3<4)} is invalid.
3612
3613 @item
3614 Strings and Booleans are not compatible, and neither may be used in
3615 place of the other.
3616 @end itemize
3617
3618 @node Missing Values in Expressions, Grouping Operators, Booleans, Expressions
3619 @section Missing Values in Expressions
3620
3621 String missing values are not treated specially in expressions.  Most
3622 numeric operators return system-missing when given system-missing
3623 arguments.  Exceptions are listed under particular operator
3624 descriptions.
3625
3626 User-missing values for numeric variables are always transformed into
3627 the system-missing value, except inside the arguments to the
3628 @code{VALUE}, @code{SYSMIS}, and @code{MISSING} functions.
3629
3630 The missing-value functions can be used to precisely control how missing
3631 values are treated in expressions.  @xref{Missing Value Functions}, for
3632 more details.
3633
3634 @node Grouping Operators, Arithmetic Operators, Missing Values in Expressions, Expressions
3635 @section Grouping Operators
3636 @cindex parentheses
3637 @cindex @samp{(  )}
3638 @cindex grouping operators
3639 @cindex operators, grouping
3640
3641 Parentheses (@samp{()}) are the grouping operators.  Surround an
3642 expression with parentheses to force early evaluation.
3643
3644 Parentheses also surround the arguments to functions, but in that
3645 situation they act as punctuators, not as operators.
3646
3647 @node Arithmetic Operators, Logical Operators, Grouping Operators, Expressions
3648 @section Arithmetic Operators
3649 @cindex operators, arithmetic
3650 @cindex arithmetic operators
3651
3652 The arithmetic operators take numeric arguments and produce numeric
3653 results.
3654
3655 @table @code
3656 @cindex @samp{+}
3657 @cindex addition
3658 @item @var{a} + @var{b}
3659 Adds @var{a} and @var{b}, returning the sum.
3660
3661 @cindex @samp{-}
3662 @cindex subtraction
3663 @item @var{a} - @var{b}
3664 Subtracts @var{b} from @var{a}, returning the difference.
3665
3666 @cindex @samp{*}
3667 @cindex multiplication
3668 @item @var{a} * @var{b}
3669 Multiplies @var{a} and @var{b}, returning the product.
3670
3671 @cindex @samp{/}
3672 @cindex division
3673 @item @var{a} / @var{b}
3674 Divides @var{a} by @var{b}, returning the quotient.  If @var{b} is
3675 zero, the result is system-missing.
3676
3677 @cindex @samp{**}
3678 @cindex exponentiation
3679 @item @var{a} ** @var{b}
3680 Returns the result of raising @var{a} to the power @var{b}.  If
3681 @var{a} is negative and @var{b} is not an integer, the result is
3682 system-missing.  The result of @code{0**0} is system-missing as well.
3683
3684 @cindex @samp{-}
3685 @cindex negation
3686 @item - @var{a}
3687 Reverses the sign of @var{a}.
3688 @end table
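
A brief sketch of the special cases above, using hypothetical numeric
variables @code{a} and @code{b}:

@example
* q is system-missing for any case where b is 0.
COMPUTE q = a / b.
* p is system-missing if a is negative and b is not an integer,
* or if both a and b are 0.
COMPUTE p = a ** b.
@end example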
3689
3690 @node Logical Operators, Relational Operators, Arithmetic Operators, Expressions
3691 @section Logical Operators
3692 @cindex logical operators
3693 @cindex operators, logical
3694
3695 @cindex true
3696 @cindex false
3697 @cindex Boolean
3698 @cindex values, system-missing
3699 @cindex system-missing
3700 The logical operators take logical arguments and produce logical
3701 results, meaning ``true or false''.  PSPP logical operators are
3702 not true Boolean operators because they may also result in a
3703 system-missing value.
3704
3705 @table @code
3706 @cindex @code{AND}
3707 @cindex @samp{&}
3708 @cindex intersection, logical
3709 @cindex logical intersection
3710 @item @var{a} AND @var{b}
3711 @itemx @var{a} & @var{b}
3712 True if both @var{a} and @var{b} are true.  However, if one argument is
3713 false and the other is missing, the result is false, not missing.  If
3714 both arguments are missing, the result is missing.
3715
3716 @cindex @code{OR}
3717 @cindex @samp{|}
3718 @cindex union, logical
3719 @cindex logical union
3720 @item @var{a} OR @var{b}
3721 @itemx @var{a} | @var{b}
3722 True if at least one of @var{a} and @var{b} is true.  If one argument is
3723 true and the other is missing, the result is true, not missing.  If both
3724 arguments are missing, the result is missing.
3725
3726 @cindex @code{NOT}
3727 @cindex @samp{~}
3728 @cindex inversion, logical
3729 @cindex logical inversion
3730 @item NOT @var{a}
3731 @itemx ~ @var{a}
3732 True if @var{a} is false.
3733 @end table
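
The following sketch, using hypothetical numeric variables @code{a} and
@code{b}, shows how these rules interact with the @code{IF} command:

@example
IF (a > 0 OR b > 0) any_pos = 1.
IF (a > 0 AND b > 0) all_pos = 1.
@end example

For a case where @code{a} is 5 and @code{b} is system-missing,
@samp{a > 0} is true and @samp{b > 0} is missing.  The @code{OR}
condition is therefore true, and @code{any_pos} is set to 1.  Neither
argument of @code{AND} is false and one is missing, so that condition
should be missing, and @code{all_pos} is not assigned; it remains
system-missing.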
3734
3735 @node Relational Operators, Functions, Logical Operators, Expressions
3736 @section Relational Operators
3737
3738 The relational operators take numeric or string arguments and produce Boolean
3739 results.
3740
3741 Note that, with numeric arguments, PSPP does not make exact
3742 relational tests.  Instead, two numbers are considered to be equal even
3743 if they differ by a small amount.  This amount, @dfn{epsilon}, is
3744 dependent on the PSPP configuration and determined at compile
3745 time.  (The default value is 0.000000001, or
3746 @ifinfo
3747 @code{10**(-9)}.)
3748 @end ifinfo
3749 @tex
3750 $10 ^{-9}$.)
3751 @end tex
3752 Use of epsilon allows for round-off errors.  Use of epsilon is also
3753 idiotic, but the author is not a numeric analyst.
3754
3755 Strings cannot be compared to numbers.  When strings of different
3756 lengths are compared, the shorter string is right-padded with spaces
3757 to match the length of the longer string.
3758
3759 The results of string comparisons, other than tests for equality or
3760 inequality, are dependent on the character set in use.  String
3761 comparisons are case-sensitive.
3762
3763 @table @code
3764 @cindex equality, testing
3765 @cindex testing for equality
3766 @cindex @code{EQ}
3767 @cindex @samp{=}
3768 @item @var{a} EQ @var{b}
3769 @itemx @var{a} = @var{b}
3770 True if @var{a} is equal to @var{b}.
3771
3772 @cindex less than or equal to
3773 @cindex @code{LE}
3774 @cindex @code{<=}
3775 @item @var{a} LE @var{b}
3776 @itemx @var{a} <= @var{b}
3777 True if @var{a} is less than or equal to @var{b}.
3778
3779 @cindex less than
3780 @cindex @code{LT}
3781 @cindex @code{<}
3782 @item @var{a} LT @var{b}
3783 @itemx @var{a} < @var{b}
3784 True if @var{a} is less than @var{b}.
3785
3786 @cindex greater than or equal to
3787 @cindex @code{GE}
3788 @cindex @code{>=}
3789 @item @var{a} GE @var{b}
3790 @itemx @var{a} >= @var{b}
3791 True if @var{a} is greater than or equal to @var{b}.
3792
3793 @cindex greater than
3794 @cindex @code{GT}
3795 @cindex @samp{>}
3796 @item @var{a} GT @var{b}
3797 @itemx @var{a} > @var{b}
3798 True if @var{a} is greater than @var{b}.
3799
3800 @cindex inequality, testing
3801 @cindex testing for inequality
3802 @cindex @code{NE}
3803 @cindex @code{~=}
3804 @cindex @code{<>}
3805 @item @var{a} NE @var{b}
3806 @itemx @var{a} ~= @var{b}
3807 @itemx @var{a} <> @var{b}
True if @var{a} is not equal to @var{b}.
3809 @end table
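
For instance, assuming a hypothetical string variable @code{name}, the
first command below matches @samp{abc} with or without trailing spaces,
because the shorter operand is padded with spaces before the comparison.
Since comparisons are case-sensitive, it does not match @samp{ABC}; the
second command sketches one way to compare without regard to case, using
the @code{LOWER} function (@pxref{String Functions}):

@example
* Matches "abc" and "abc" followed by spaces, but not "ABC".
IF (name = "abc") exact = 1.
* Matches "abc", "ABC", "Abc", and so on.
IF (LOWER(name) = "abc") any_case = 1.
@end example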
3810
3811 @node Functions, Order of Operations, Relational Operators, Expressions
3812 @section Functions
3813 @cindex functions
3814
3815 @cindex mathematics
3816 @cindex operators
3817 @cindex parentheses
3818 @cindex @code{(}
3819 @cindex @code{)}
3820 @cindex names, of functions
3821 PSPP functions provide mathematical abilities above and beyond
3822 those possible using simple operators.  Functions have a common
3823 syntax: each is composed of a function name followed by a left
3824 parenthesis, one or more arguments, and a right parenthesis.  Function
3825 names are @strong{not} reserved; their names are specially treated
3826 only when followed by a left parenthesis: @code{EXP(10)} refers to the
3827 constant value @code{e} raised to the 10th power, but @code{EXP} by
3828 itself refers to the value of variable EXP.
3829
3830 The sections below describe each function in detail.
3831
3832 @menu
3833 * Advanced Mathematics::        EXP LG10 LN SQRT
3834 * Miscellaneous Mathematics::   ABS MOD MOD10 RND TRUNC
3835 * Trigonometry::                ACOS ARCOS ARSIN ARTAN ASIN ATAN COS SIN TAN
3836 * Missing Value Functions::     MISSING NMISS NVALID SYSMIS VALUE
3837 * Pseudo-Random Numbers::       NORMAL UNIFORM
3838 * Set Membership::              ANY RANGE
3839 * Statistical Functions::       CFVAR MAX MEAN MIN SD SUM VARIANCE
3840 * String Functions::            CONCAT INDEX LENGTH LOWER LPAD LTRIM NUMBER
3841                                 RINDEX RPAD RTRIM STRING SUBSTR UPCASE
3842 * Time & Date::                 CTIME.xxx DATE.xxx TIME.xxx XDATE.xxx
3843 * Miscellaneous Functions::     LAG YRMODA
3844 * Functions Not Implemented::   CDF.xxx CDFNORM IDF.xxx NCDF.xxx PROBIT RV.xxx
3845 @end menu
3846
3847 @node Advanced Mathematics, Miscellaneous Mathematics, Functions, Functions
3848 @subsection Advanced Mathematical Functions
3849 @cindex mathematics, advanced
3850
3851 Advanced mathematical functions take numeric arguments and produce
3852 numeric results.
3853
3854 @deftypefn {Function} {} EXP (@var{exponent})
3855 Returns @i{e} (approximately 2.71828) raised to power @var{exponent}.
3856 @end deftypefn
3857
3858 @cindex logarithms
3859 @deftypefn {Function} {} LG10 (@var{number})
3860 Takes the base-10 logarithm of @var{number}.  If @var{number} is
3861 not positive, the result is system-missing.
3862 @end deftypefn
3863
3864 @deftypefn {Function} {} LN (@var{number})
3865 Takes the base-@i{e} logarithm of @var{number}.  If @var{number} is
3866 not positive, the result is system-missing.
3867 @end deftypefn
3868
3869 @cindex square roots
3870 @deftypefn {Function} {} SQRT (@var{number})
3871 Takes the square root of @var{number}.  If @var{number} is negative,
3872 the result is system-missing.
3873 @end deftypefn
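
A minimal sketch using a hypothetical variable @code{income}.  Note that
@code{LG10} and @code{LN} yield system-missing, rather than an error,
for cases where @code{income} is zero or negative:

@example
COMPUTE lg_inc = LG10(income).
COMPUTE ln_inc = LN(income).
* For positive income, EXP undoes LN, up to rounding error.
COMPUTE check = EXP(ln_inc).
@end example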
3874
3875 @node Miscellaneous Mathematics, Trigonometry, Advanced Mathematics, Functions
3876 @subsection Miscellaneous Mathematical Functions
3877 @cindex mathematics, miscellaneous
3878
3879 Miscellaneous mathematical functions take numeric arguments and produce
3880 numeric results.
3881
3882 @cindex absolute value
3883 @deftypefn {Function} {} ABS (@var{number})
3884 Results in the absolute value of @var{number}.
3885 @end deftypefn
3886
3887 @cindex modulus
3888 @deftypefn {Function} {} MOD (@var{numerator}, @var{denominator})
3889 Returns the remainder (modulus) of @var{numerator} divided by
3890 @var{denominator}.  If @var{denominator} is 0, the result is
3891 system-missing.  However, if @var{numerator} is 0 and
3892 @var{denominator} is system-missing, the result is 0.
3893 @end deftypefn
3894
3895 @cindex modulus, by 10
3896 @deftypefn {Function} {} MOD10 (@var{number})
3897 Returns the remainder when @var{number} is divided by 10.  If
3898 @var{number} is negative, MOD10(@var{number}) is negative or zero.
3899 @end deftypefn
3900
3901 @cindex rounding
3902 @deftypefn {Function} {} RND (@var{number})
3903 Takes the absolute value of @var{number} and rounds it to an integer.
3904 Then, if @var{number} was negative originally, negates the result.
3905 @end deftypefn
3906
3907 @cindex truncation
3908 @deftypefn {Function} {} TRUNC (@var{number})
3909 Discards the fractional part of @var{number}; that is, rounds
3910 @var{number} towards zero.
3911 @end deftypefn
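
The following commands illustrate these functions on negative arguments.
The results given in the comments follow from the descriptions above:

@example
* Result: 3.
COMPUTE r1 = MOD10(23).
* Result: -3, since MOD10 of a negative number is negative or zero.
COMPUTE r2 = MOD10(-23).
* Result: -3 (the absolute value 2.7 rounds to 3, then is negated).
COMPUTE r3 = RND(-2.7).
* Result: -2 (rounded toward zero).
COMPUTE r4 = TRUNC(-2.7).
@end example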
3912
3913 @node Trigonometry, Missing Value Functions, Miscellaneous Mathematics, Functions
3914 @subsection Trigonometric Functions
3915 @cindex trigonometry
3916
3917 Trigonometric functions take numeric arguments and produce numeric
3918 results.
3919
3920 @cindex arccosine
3921 @cindex inverse cosine
3922 @deftypefn {Function} {} ACOS (@var{number})
3923 @deftypefnx {Function} {} ARCOS (@var{number})
3924 Takes the arccosine, in radians, of @var{number}.  Results in
system-missing if @var{number} is not between -1 and 1 inclusive.  Portability:
3926 none.
3927 @end deftypefn
3928
3929 @cindex arcsine
3930 @cindex inverse sine
3931 @deftypefn {Function} {} ARSIN (@var{number})
3932 Takes the arcsine, in radians, of @var{number}.  Results in
3933 system-missing if @var{number} is not between -1 and 1 inclusive.
3934 @end deftypefn
3935
3936 @cindex arctangent
3937 @cindex inverse tangent
3938 @deftypefn {Function} {} ARTAN (@var{number})
3939 Takes the arctangent, in radians, of @var{number}.
3940 @end deftypefn
3941
3942 @cindex arcsine
3943 @cindex inverse sine
3944 @deftypefn {Function} {} ASIN (@var{number})
3945 Takes the arcsine, in radians, of @var{number}.  Results in
3946 system-missing if @var{number} is not between -1 and 1 inclusive.
3947 Portability: none.
3948 @end deftypefn
3949
3950 @cindex arctangent
3951 @cindex inverse tangent
3952 @deftypefn {Function} {} ATAN (@var{number})
3953 Takes the arctangent, in radians, of @var{number}.
3954 @end deftypefn
3955
3956 @quotation
3957 @strong{Please note:} Use of the AR* group of inverse trigonometric
3958 functions is recommended over the A* group because they are more
3959 portable.
3960 @end quotation
3961
3962 @cindex cosine
3963 @deftypefn {Function} {} COS (@var{angle})
Takes the cosine of @var{angle}, which should be in radians.
3965 @end deftypefn
3966
3967 @cindex sine
3968 @deftypefn {Function} {} SIN (@var{angle})
Takes the sine of @var{angle}, which should be in radians.
3970 @end deftypefn
3971
3972 @cindex tangent
3973 @deftypefn {Function} {} TAN (@var{angle})
Takes the tangent of @var{angle}, which should be in radians.
3975 Results in system-missing at values
3976 of @var{angle} that are too close to odd multiples of pi/2.
3977 Portability: none.
3978 @end deftypefn
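
Because all of these functions work in radians, angles measured in
degrees must be converted first.  The following minimal sketch assumes a
hypothetical variable @code{deg} holding an angle in degrees, and obtains
pi from the identity @code{ARTAN(1)} = pi/4:

@example
COMPUTE #pi = 4 * ARTAN(1).
COMPUTE s = SIN(deg * #pi / 180).
* Converts back to degrees; equals deg only for -90 <= deg <= 90.
COMPUTE back = ARSIN(s) * 180 / #pi.
@end example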
3979
3980 @node Missing Value Functions, Pseudo-Random Numbers, Trigonometry, Functions
3981 @subsection Missing-Value Functions
3982 @cindex missing values
3983 @cindex values, missing
3984 @cindex functions, missing-value
3985
3986 Missing-value functions take various types as arguments, returning
3987 various types of results.
3988
3989 @deftypefn {Function} {} MISSING (@var{variable or expression})
The argument may be a single variable name or an expression.  If it is a
3991 variable name, results in 1 if the variable has a user-missing or
3992 system-missing value for the current case, 0 otherwise.  If it is an
3993 expression, results in 1 if the expression has the system-missing value,
3994 0 otherwise.
3995
3996 @quotation
3997 @strong{Please note:} If the argument is a string expression other than
3998 a variable name, MISSING is guaranteed to return 0, because strings do
3999 not have a system-missing value.  Also, when using a numeric expression
4000 argument, remember that user-missing values are converted to the
4001 system-missing value in most contexts.  Thus, the expressions
4002 @code{MISSING(VAR1 @var{op} VAR2)} and @code{MISSING(VAR1) OR
4003 MISSING(VAR2)} are often equivalent, depending on the specific operator
4004 @var{op} used.
4005 @end quotation
4006 @end deftypefn
4007
4008 @deftypefn {Function} {} NMISS (@var{expr} [, @var{expr}]@dots{})
4009 Each argument must be a numeric expression.  Returns the number of
4010 user- or system-missing values in the list.  As a special extension,
4011 the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a
4012 range of variables; see @ref{Sets of Variables}, for more details.
4013 @end deftypefn
4014
4015 @deftypefn {Function} {} NVALID (@var{expr} [, @var{expr}]@dots{})
4016 Each argument must be a numeric expression.  Returns the number of
4017 values in the list that are not user- or system-missing.  As a special extension,
4018 the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a
4019 range of variables; see @ref{Sets of Variables}, for more details.
4020 @end deftypefn
4021
4022 @deftypefn {Function} {} SYSMIS (@var{variable or expression})
4023 When given the name of a numeric variable, returns 1 if the value of
4024 that variable is system-missing.  Otherwise, if the value is not
4025 missing or if it is user-missing, returns 0.  If given the name of a
4026 string variable, always returns 1.  If given an expression other than
4027 a single variable name, results in 1 if the value is system- or
4028 user-missing, 0 otherwise.
4029 @end deftypefn
4030
4031 @deftypefn {Function} {} VALUE (@var{variable})
4032 Prevents the user-missing values of @var{variable} from being
4033 transformed into system-missing values: If @var{variable} is not
4034 system- or user-missing, results in the value of @var{variable}.  If
4035 @var{variable} is user-missing, results in the value of @var{variable}
4036 anyway.  If @var{variable} is system-missing, results in system-missing.
4037 @end deftypefn
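
The following sketch contrasts these functions for a hypothetical numeric
variable @code{x} with 99 declared user-missing, and hypothetical
variables @code{a}, @code{b}, and @code{c}.  The values in the comments
follow from the descriptions above for a case where @code{x} is 99:

@example
MISSING VALUES x (99).
* 1: 99 is user-missing.
COMPUTE m = MISSING(x).
* 0: 99 is user-missing, not system-missing.
COMPUTE s = SYSMIS(x).
* 99: VALUE bypasses the user-missing conversion.
COMPUTE v = VALUE(x).
COMPUTE nm = NMISS(a, b, c).
COMPUTE nv = NVALID(a, b, c).
@end example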
4038
4039 @node Pseudo-Random Numbers, Set Membership, Missing Value Functions, Functions
4040 @subsection Pseudo-Random Number Generation Functions
4041 @cindex random numbers
4042 @cindex pseudo-random numbers (see random numbers)
4043
4044 Pseudo-random number generation functions take numeric arguments and
4045 produce numeric results.
4046
4047 @cindex Knuth
4048 The system's C library random generator is used as a basis for
4049 generating random numbers, since random number generation is a
4050 system-dependent task.  However, Knuth's Algorithm B is used to
4051 shuffle the resultant values, which is enough to make even a stream of
4052 consecutive integers random enough for most applications.
4053
4054 (If you're worried about the quality of the random number generator,
4055 well, you're using a statistical processing package---analyze it!)
4056
4057 @cindex random numbers, normally-distributed
4058 @deftypefn {Function} {} NORMAL (@var{number})
4059 Results in a random number.  Results from @code{NORMAL} are normally
4060 distributed with a mean of 0 and a standard deviation of @var{number}.
4061 @end deftypefn
4062
4063 @cindex random numbers, uniformly-distributed
4064 @deftypefn {Function} {} UNIFORM (@var{number})
4065 Results in a random number between 0 and @var{number}.  Results from
4066 @code{UNIFORM} are evenly distributed across its entire range.  There
4067 may be a maximum on the largest random number ever generated---this is
4068 often
4069 @ifinfo
4070 2**31-1
4071 @end ifinfo
4072 @tex
4073 $2^{31}-1$
4074 @end tex
4075 (2,147,483,647), but it may be orders of magnitude
4076 higher or lower.
4077 @end deftypefn
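
Because @code{UNIFORM} takes only an upper bound and @code{NORMAL} only a
standard deviation, other ranges and means can be obtained by shifting
the result.  For example, with hypothetical target variables:

@example
* Uniformly distributed between 10 and 20.
COMPUTE u = 10 + UNIFORM(10).
* Normally distributed with mean 50 and standard deviation 10.
COMPUTE z = 50 + NORMAL(10).
@end example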
4078
4079 @node Set Membership, Statistical Functions, Pseudo-Random Numbers, Functions
4080 @subsection Set-Membership Functions
4081 @cindex set membership
4082 @cindex membership, of set
4083
4084 Set membership functions determine whether a value is a member of a set.
4085 They take a set of numeric arguments or a set of string arguments, and
4086 produce Boolean results.
4087
4088 String comparisons are performed according to the rules given in
4089 @ref{Relational Operators}.
4090
4091 @deftypefn {Function} {} ANY (@var{value}, @var{set} [, @var{set}]@dots{})
4092 Results in true if @var{value} is equal to any of the @var{set}
4093 values.  Otherwise, results in false.  If @var{value} is
4094 system-missing, returns system-missing.  System-missing values in
4095 @var{set} do not cause ANY to return system-missing.
4096 @end deftypefn
4097
4098 @deftypefn {Function} {} RANGE (@var{value}, @var{low}, @var{high} [, @var{low}, @var{high}]@dots{})
4099 Results in true if @var{value} is in any of the intervals bounded by
4100 @var{low} and @var{high} inclusive.  Otherwise, results in false.
4101 Each @var{low} must be less than or equal to its corresponding
4102 @var{high} value.  @var{low} and @var{high} must be given in pairs.
4103 If @var{value} is system-missing, returns system-missing.
System-missing values among the @var{low} and @var{high} arguments do
not cause RANGE to return system-missing.
4106 @end deftypefn
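
A short sketch, assuming hypothetical numeric variables @code{region}
and @code{age}:

@example
* Keep only cases whose region is 1, 3, or 5.
SELECT IF (ANY(region, 1, 3, 5)).
* Flag ages from 18 through 24 or from 65 through 99.
IF (RANGE(age, 18, 24, 65, 99)) flag = 1.
@end example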
4107
4108 @node Statistical Functions, String Functions, Set Membership, Functions
4109 @subsection Statistical Functions
4110 @cindex functions, statistical
4111 @cindex statistics
4112
4113 Statistical functions compute descriptive statistics on a list of
4114 values.  Some statistics can be computed on numeric or string values;
others can only be computed on numeric values.  They result in the same
4116 type as their arguments.
4117
4118 @cindex arguments, minimum valid
4119 @cindex minimum valid number of arguments
4120 With statistical functions it is possible to specify a minimum number of
4121 non-missing arguments for the function to be evaluated.  To do so,
4122 append a dot and the number to the function name.  For instance, to
4123 specify a minimum of three valid arguments to the MEAN function, use the
4124 name @code{MEAN.3}.
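
For example, assuming hypothetical numeric variables @code{q1} through
@code{q4}:

@example
* avg is the mean of the valid values among q1 to q4, or
* system-missing if fewer than three of them are valid.
COMPUTE avg = MEAN.3(q1, q2, q3, q4).
* Without a suffix, one valid argument is enough for MEAN.
COMPUTE avg_any = MEAN(q1, q2, q3, q4).
@end example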
4125
4126 @cindex coefficient of variation
4127 @cindex variation, coefficient of
4128 @deftypefn {Function} {} CFVAR (@var{number}, @var{number}[, @dots{}])
4129 Results in the coefficient of variation of the values of @var{number}.
4130 This function requires at least two valid arguments to give a
4131 non-missing result.  (The coefficient of variation is the standard
4132 deviation divided by the mean.)
4133 @end deftypefn
4134
4135 @cindex maximum
4136 @deftypefn {Function} {} MAX (@var{value}, @var{value}[, @dots{}])
4137 Results in the value of the greatest @var{value}.  The @var{value}s may
4138 be numeric or string.  Although at least two arguments must be given,
4139 only one need be valid for MAX to give a non-missing result.
4140 @end deftypefn
4141
4142 @cindex mean
4143 @deftypefn {Function} {} MEAN (@var{number}, @var{number}[, @dots{}])
4144 Results in the mean of the values of @var{number}.  Although at least
4145 two arguments must be given, only one need be valid for MEAN to give a
4146 non-missing result.
4147 @end deftypefn
4148
4149 @cindex minimum
@deftypefn {Function} {} MIN (@var{value}, @var{value}[, @dots{}])
Results in the value of the least @var{value}.  The @var{value}s may
be numeric or string.  Although at least two arguments must be given,
only one need be valid for MIN to give a non-missing result.
4154 @end deftypefn
4155
4156 @cindex standard deviation
4157 @cindex deviation, standard
4158 @deftypefn {Function} {} SD (@var{number}, @var{number}[, @dots{}])
4159 Results in the standard deviation of the values of @var{number}.
4160 This function requires at least two valid arguments to give a
4161 non-missing result.
4162 @end deftypefn
4163
4164 @cindex sum
4165 @deftypefn {Function} {} SUM (@var{number}, @var{number}[, @dots{}])
4166 Results in the sum of the values of @var{number}.  Although at least two
arguments must be given, only one need be valid for SUM to give a
4168 non-missing result.
4169 @end deftypefn
4170
4171 @cindex variance
4172 @deftypefn {Function} {} VAR (@var{number}, @var{number}[, @dots{}])
4173 Results in the variance of the values of @var{number}.  This function
4174 requires at least two valid arguments to give a non-missing result.
4175 @end deftypefn
4176
4177 @deftypefn {Function} {} VARIANCE (@var{number}, @var{number}[, @dots{}])
4178 Results in the variance of the values of @var{number}.  This function
4179 requires at least two valid arguments to give a non-missing result.
4180 (Use VAR in preference to VARIANCE for reasons of portability.)
4181 @end deftypefn
4182
4183 @node String Functions, Time & Date, Statistical Functions, Functions
4184 @subsection String Functions
4185 @cindex functions, string
4186 @cindex string functions
4187
4188 String functions take various arguments and return various results.
4189
4190 @cindex concatenation
4191 @cindex strings, concatenation of
4192 @deftypefn {Function} {} CONCAT (@var{string}, @var{string}[, @dots{}])
4193 Returns a string consisting of each @var{string} in sequence.
4194 @code{CONCAT("abc", "def", "ghi")} has a value of @code{"abcdefghi"}.
4195 The resultant string is truncated to a maximum of 255 characters.
4196 @end deftypefn
4197
4198 @cindex searching strings
4199 @deftypefn {Function} {} INDEX (@var{haystack}, @var{needle})
4200 Returns a positive integer indicating the position of the first
occurrence of @var{needle} in @var{haystack}.  Returns 0 if @var{haystack}
4202 does not contain @var{needle}.  Returns system-missing if @var{needle}
4203 is an empty string.
4204 @end deftypefn
4205
4206 @deftypefn {Function} {} INDEX (@var{haystack}, @var{needle}, @var{divisor})
4207 Divides @var{needle} into parts, each with length @var{divisor}.
4208 Searches @var{haystack} for the first occurrence of each part, and
4209 returns the smallest value.  Returns 0 if @var{haystack} does not
contain any part of @var{needle}.  It is an error if the length of
@var{needle} is not evenly divisible by @var{divisor}.  Returns
4212 system-missing if @var{needle} is an empty string.
4213 @end deftypefn

@cindex strings, finding length of
@deftypefn {Function} {} LENGTH (@var{string})
Returns the number of characters in @var{string}.
@end deftypefn

@cindex strings, case of
@deftypefn {Function} {} LOWER (@var{string})
Returns a string identical to @var{string} except that all uppercase
letters are changed to lowercase letters.  The definitions of
``uppercase'' and ``lowercase'' are system-dependent.
@end deftypefn

@cindex strings, padding
@deftypefn {Function} {} LPAD (@var{string}, @var{length})
If @var{string} is at least @var{length} characters in length, returns
@var{string} unchanged.  Otherwise, returns @var{string} padded with
spaces on the left side to length @var{length}.  Returns an empty string
if @var{length} is system-missing, negative, or greater than 255.
@end deftypefn

@deftypefn {Function} {} LPAD (@var{string}, @var{length}, @var{padding})
If @var{string} is at least @var{length} characters in length, returns
@var{string} unchanged.  Otherwise, returns @var{string} padded with
@var{padding} on the left side to length @var{length}.  Returns an empty
string if @var{length} is system-missing, negative, or greater than 255, or
if @var{padding} does not contain exactly one character.
@end deftypefn

@cindex strings, trimming
@cindex whitespace, trimming
@deftypefn {Function} {} LTRIM (@var{string})
Returns @var{string}, after removing leading spaces.  Other whitespace,
such as tabs, carriage returns, line feeds, and vertical tabs, is not
removed.
@end deftypefn

@deftypefn {Function} {} LTRIM (@var{string}, @var{padding})
Returns @var{string}, after removing leading @var{padding} characters.
If @var{padding} does not contain exactly one character, returns an
empty string.
@end deftypefn

@cindex numbers, converting from strings
@cindex strings, converting to numbers
@deftypefn {Function} {} NUMBER (@var{string})
Returns the number produced when @var{string} is interpreted according
to format F@var{x}.0, where @var{x} is the number of characters in
@var{string}.  If @var{string} does not form a proper number,
system-missing is returned without an error message.  Portability: none.
@end deftypefn

@deftypefn {Function} {} NUMBER (@var{string}, @var{format})
Returns the number produced when @var{string} is interpreted according
to format specifier @var{format}.  Only as many leading characters of
@var{string} as the width of @var{format} are examined.  For example,
@code{NUMBER("123", F3.0)} and @code{NUMBER("1234", F3.0)} both have
value 123.  If @var{string} does not form a proper number,
system-missing is returned without an error message.
@end deftypefn

@cindex strings, searching backwards
@deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle})
Returns a positive integer indicating the position of the last
occurrence of @var{needle} in @var{haystack}.  Returns 0 if
@var{haystack} does not contain @var{needle}.  Returns system-missing if
@var{needle} is an empty string.
@end deftypefn

@deftypefn {Function} {} RINDEX (@var{haystack}, @var{needle}, @var{divisor})
Divides @var{needle} into parts, each with length @var{divisor}.
Searches @var{haystack} for the last occurrence of each part, and
returns the largest value.  Returns 0 if @var{haystack} does not contain
any part in @var{needle}.  It is an error if the length of @var{needle}
is not evenly divisible by @var{divisor}.  Returns system-missing
if @var{needle} is an empty string.
@end deftypefn

@cindex padding strings
@cindex strings, padding
@deftypefn {Function} {} RPAD (@var{string}, @var{length})
If @var{string} is at least @var{length} characters in length, returns
@var{string} unchanged.  Otherwise, returns @var{string} padded with
spaces on the right to length @var{length}.  Returns an empty string if
@var{length} is system-missing, negative, or greater than 255.
@end deftypefn

@deftypefn {Function} {} RPAD (@var{string}, @var{length}, @var{padding})
If @var{string} is at least @var{length} characters in length, returns
@var{string} unchanged.  Otherwise, returns @var{string} padded with
@var{padding} on the right to length @var{length}.  Returns an empty
string if @var{length} is system-missing, negative, or greater than 255,
or if @var{padding} does not contain exactly one character.
@end deftypefn

@cindex strings, trimming
@cindex whitespace, trimming
@deftypefn {Function} {} RTRIM (@var{string})
Returns @var{string}, after removing trailing spaces.  Other types of
whitespace are not removed.
@end deftypefn

@deftypefn {Function} {} RTRIM (@var{string}, @var{padding})
Returns @var{string}, after removing trailing @var{padding} characters.
If @var{padding} does not contain exactly one character, returns an
empty string.
@end deftypefn
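
The following sketch combines padding and trimming; it assumes an
existing active file, and the variable names are hypothetical.
@code{LPAD} with a padding character of @samp{0} stores @samp{007} in
@code{id}.  Because PSPP string variables are padded with spaces to
their declared width, the effect of trimming is measured with
@code{LENGTH} instead of being stored: @code{n1} is 5 and @code{n2} is
3.

@example
STRING id (A3).
COMPUTE id = LPAD("7", 3, "0").
COMPUTE n1 = LENGTH("abc  ").
COMPUTE n2 = LENGTH(RTRIM("abc  ")).
@end example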

@cindex strings, converting from numbers
@cindex numbers, converting to strings
@deftypefn {Function} {} STRING (@var{number}, @var{format})
Returns a string corresponding to @var{number} in the format given by
format specifier @var{format}.  For example, @code{STRING(123.56, F5.1)}
has the value @code{"123.6"}.
@end deftypefn
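
As a sketch of converting between strings and numbers (an existing
active file is assumed and the variable names are hypothetical), the
commands below give @code{n} the value 42, give @code{bad} the
system-missing value because @samp{12x} is not a proper number, and
store @samp{123.6} in @code{s}, as in the example above.

@example
COMPUTE n = NUMBER("0042", F4.0).
COMPUTE bad = NUMBER("12x", F3.0).
STRING s (A5).
COMPUTE s = STRING(123.56, F5.1).
@end example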

@cindex substrings
@cindex strings, taking substrings of
@deftypefn {Function} {} SUBSTR (@var{string}, @var{start})
Returns a string consisting of the value of @var{string} from position
@var{start} onward.  Returns an empty string if @var{start} is system-missing
or has a value less than 1 or greater than the number of characters in
@var{string}.
@end deftypefn

@deftypefn {Function} {} SUBSTR (@var{string}, @var{start}, @var{count})
Returns a string consisting of the first @var{count} characters from
@var{string} beginning at position @var{start}.  Returns an empty string
if @var{start} or @var{count} is system-missing, if @var{start} is less
than 1 or greater than the number of characters in @var{string}, or if
@var{count} is less than 1.  Returns a string shorter than @var{count}
characters if @var{start} + @var{count} - 1 is greater than the number
of characters in @var{string}.  Examples: @code{SUBSTR("abcdefg", 3, 2)}
has value @code{"cd"}; @code{SUBSTR("Ben Pfaff", 5, 10)} has the value
@code{"Pfaff"}.
@end deftypefn
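
As a sketch of how @code{INDEX} and @code{SUBSTR} work together, the
following commands split a name field at its first space.  The variable
names are hypothetical.  For the input shown, @code{sp} is 4,
@code{given} becomes @samp{Ben}, and @code{family} becomes @samp{Pfaff}
followed by trailing spaces, since @code{name} is 20 columns wide.

@example
DATA LIST /name 1-20 (A).
BEGIN DATA.
Ben Pfaff
END DATA.
STRING given family (A20).
COMPUTE sp = INDEX(name, " ").
COMPUTE given = SUBSTR(name, 1, sp - 1).
COMPUTE family = SUBSTR(name, sp + 1).
LIST.
@end example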

@cindex case conversion
@cindex strings, case of
@deftypefn {Function} {} UPCASE (@var{string})
Returns @var{string}, changing lowercase letters to uppercase letters.
@end deftypefn

@node Time & Date, Miscellaneous Functions, String Functions, Functions
@subsection Time & Date Functions
@cindex functions, time & date
@cindex times
@cindex dates

@cindex dates, legal range of
The legal range of dates for use in PSPP is 15 Oct 1582
through 31 Dec 19999.

@cindex arguments, invalid
@cindex invalid arguments
@quotation
@strong{Please note:} Most time & date extraction functions will accept
invalid arguments:

@itemize @bullet
@item
Negative numbers in PSPP time format.
@item
Numbers less than 86,400 in PSPP date format.
@end itemize

However, sensible results are not guaranteed for these invalid values.
In particular, the equivalent expressions given for these functions do
not necessarily hold for invalid values.
@end quotation

@quotation
@strong{Please note also:} The time & date construction
functions @strong{do} produce reasonable and useful results for
out-of-range values; these are not considered invalid.
@end quotation

@menu
* Time & Date Concepts::        How times & dates are defined and represented
* Time Construction::           TIME.@{DAYS HMS@}
* Time Extraction::             CTIME.@{DAYS HOURS MINUTES SECONDS@}
* Date Construction::           DATE.@{DMY MDY MOYR QYR WKYR YRDAY@}
* Date Extraction::             XDATE.@{DATE HOUR JDAY MDAY MINUTE MONTH
                                        QUARTER SECOND TDAY TIME WEEK
                                        WKDAY YEAR@}
@end menu

@node Time & Date Concepts, Time Construction, Time & Date, Time & Date
@subsubsection How times & dates are defined and represented

@cindex time, concepts
@cindex time, intervals
Times and dates are handled by PSPP as single numbers.  A
@dfn{time} is an interval.  PSPP measures times in seconds.
Thus, the following intervals correspond with the numeric values given:

@example
          10 minutes                        600
          1 hour                          3,600
          1 day, 3 hours, 10 seconds     97,210
          40 days                     3,456,000
          10010 d, 14 min, 24 s     864,864,864
@end example

@cindex dates, concepts
@cindex time, instants of
A @dfn{date}, on the other hand, is a particular instant in the past or
the future.  PSPP represents a date as a number of seconds after the
midnight at the start of 14 Oct 1582.  (Historically, 15 Oct 1582, the
first day of the Gregorian calendar, immediately followed 4 Oct 1582,
so 14 Oct 1582 is a day of the proleptic Gregorian calendar.)  Thus,
the midnights before the dates given below correspond with the numeric
PSPP dates given:

@example
              15 Oct 1582                86,400
               4 Jul 1776         6,113,318,400
               1 Jan 1900        10,010,390,400
               1 Oct 1978        12,495,427,200
              24 Aug 1995        13,028,601,600
@end example
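
As a worked check of the table: in the Gregorian calendar, 4 Jul 1776
falls 70,755 days after 15 Oct 1582, and 15 Oct 1582 itself has the
value of one day.  Converting the total number of days to seconds gives
the value shown:

@example
(70,755 + 1) days * 86,400 seconds/day = 6,113,318,400 seconds
@end example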

@cindex time, mathematical properties of
@cindex mathematics, applied to times & dates
@cindex dates, mathematical properties of
@noindent
Please note:

@itemize @bullet
@item
A time may be added to, or subtracted from, a date, resulting in a date.

@item
The difference of two dates may be taken, resulting in a time.

@item
Two times may be added to, or subtracted from, each other, resulting in
a time.
@end itemize

(Adding two dates does not produce a useful result.)

Since times and dates are merely numbers, the ordinary addition and
subtraction operators are employed for these purposes.
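
For instance, elapsed time between two dates is computed with ordinary
subtraction.  In the following sketch (variable names hypothetical; it
uses @code{DATE.DMY}, @code{TIME.DAYS}, and @code{CTIME.DAYS}, which
are described below), @code{span} is 533,174,400, the difference of the
values listed above for 24 Aug 1995 and 1 Oct 1978, and @code{days} is
6,171.  @code{due} is the date 30 days after 1 Jan 1900.

@example
COMPUTE span = DATE.DMY(24, 8, 1995) - DATE.DMY(1, 10, 1978).
COMPUTE days = CTIME.DAYS(span).
COMPUTE due = DATE.DMY(1, 1, 1900) + TIME.DAYS(30).
@end example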

@quotation
@strong{Please note:} Many dates and times have extremely large
values, as the examples above show.  Thus, it is not a good idea to
take powers of these values, and the accuracy of some procedures may
also be affected.  If necessary, convert times or dates in seconds to
some other unit, like days or years, before performing analysis.
@end quotation

@node Time Construction, Time Extraction, Time & Date Concepts, Time & Date
@subsubsection Functions that Produce Times
@cindex times, constructing
@cindex constructing times

These functions take numeric arguments and produce numeric results in
PSPP time format.

@cindex days
@cindex time, in days
@deftypefn {Function} {} TIME.DAYS (@var{ndays})
Results in a time value corresponding to @var{ndays} days.
(@code{TIME.DAYS(@var{x})} is equivalent to @code{@var{x} * 60 * 60 *
24}.)
@end deftypefn

@cindex hours-minutes-seconds
@cindex time, in hours-minutes-seconds
@deftypefn {Function} {} TIME.HMS (@var{nhours}, @var{nmins}, @var{nsecs})
Results in a time value corresponding to @var{nhours} hours, @var{nmins}
minutes, and @var{nsecs} seconds.  (@code{TIME.HMS(@var{h}, @var{m},
@var{s})} is equivalent to @code{@var{h}*60*60 + @var{m}*60 +
@var{s}}.)
@end deftypefn
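
For example, by the equivalences above, the following sketch (variable
names hypothetical) sets @code{t1} to 5,415, the number of seconds in 1
hour, 30 minutes, and 15 seconds, and @code{t2} to 129,600, the number
of seconds in one and a half days.

@example
COMPUTE t1 = TIME.HMS(1, 30, 15).
COMPUTE t2 = TIME.DAYS(1.5).
@end example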

@node Time Extraction, Date Construction, Time Construction, Time & Date
@subsubsection Functions that Examine Times
@cindex extraction, of time
@cindex time examination
@cindex examination, of times
@cindex time, lengths of

These functions take numeric arguments in PSPP time format and
give numeric results.

@cindex days
@cindex time, in days
@deftypefn {Function} {} CTIME.DAYS (@var{time})
Results in the number of days and fractional days in @var{time}.
(@code{CTIME.DAYS(@var{x})} is equivalent to @code{@var{x}/60/60/24}.)
@end deftypefn

@cindex hours
@cindex time, in hours
@deftypefn {Function} {} CTIME.HOURS (@var{time})
Results in the number of hours and fractional hours in @var{time}.
(@code{CTIME.HOURS(@var{x})} is equivalent to @code{@var{x}/60/60}.)
@end deftypefn

@cindex minutes
@cindex time, in minutes
@deftypefn {Function} {} CTIME.MINUTES (@var{time})
Results in the number of minutes and fractional minutes in @var{time}.
(@code{CTIME.MINUTES(@var{x})} is equivalent to @code{@var{x}/60}.)
@end deftypefn

@cindex seconds
@cindex time, in seconds
@deftypefn {Function} {} CTIME.SECONDS (@var{time})
Results in the number of seconds and fractional seconds in @var{time}.
(@code{CTIME.SECONDS} does nothing; @code{CTIME.SECONDS(@var{x})} is
equivalent to @code{@var{x}}.)
@end deftypefn
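
Each of these functions undoes the corresponding multiplication in the
time construction functions, so that, apart from rounding,
@code{CTIME.DAYS(TIME.DAYS(@var{x}))} equals @var{x}.  In this sketch
(variable names hypothetical), @code{d} is 1.5, @code{h} is 36, and
@code{m} is 90.

@example
COMPUTE d = CTIME.DAYS(129600).
COMPUTE h = CTIME.HOURS(129600).
COMPUTE m = CTIME.MINUTES(5400).
@end example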

@node Date Construction, Date Extraction, Time Extraction, Time & Date
@subsubsection Functions that Produce Dates
@cindex dates, constructing
@cindex constructing dates

@cindex arguments, of date construction functions
These functions take numeric arguments and give numeric results in the
PSPP date format.  Arguments taken by these functions are:

@table @var
@item day
Refers to a day of the month between 1 and 31.

@item month
Refers to a month of the year between 1 and 12.

@item quarter
Refers to a quarter of the year between 1 and 4.  The quarters of the
year begin on the first days of months 1, 4, 7, and 10.

@item week
Refers to a week of the year between 1 and 53.

@item yday
Refers to a day of the year between 1 and 366.

@item year
Refers to a year between 1582 and 19999.
@end table

@cindex arguments, invalid
If these functions' arguments are out-of-range, they are correctly
normalized before conversion to date format.  Non-integers are rounded
toward zero.

@cindex day-month-year
@cindex dates, day-month-year
@deftypefn {Function} {} DATE.DMY (@var{day}, @var{month}, @var{year})
@deftypefnx {Function} {} DATE.MDY (@var{month}, @var{day}, @var{year})
Results in a date value corresponding to the midnight before day
@var{day} of month @var{month} of year @var{year}.
@end deftypefn

@cindex month-year
@cindex dates, month-year
@deftypefn {Function} {} DATE.MOYR (@var{month}, @var{year})
Results in a date value corresponding to the midnight before the first
day of month @var{month} of year @var{year}.
@end deftypefn

@cindex quarter-year
@cindex dates, quarter-year
@deftypefn {Function} {} DATE.QYR (@var{quarter}, @var{year})
Results in a date value corresponding to the midnight before the first
day of quarter @var{quarter} of year @var{year}.
@end deftypefn

@cindex week-year
@cindex dates, week-year
@deftypefn {Function} {} DATE.WKYR (@var{week}, @var{year})
Results in a date value corresponding to the midnight before the first
day of week @var{week} of year @var{year}.
@end deftypefn

@cindex year-day
@cindex dates, year-day
@deftypefn {Function} {} DATE.YRDAY (@var{year}, @var{yday})
Results in a date value corresponding to the midnight before day
@var{yday} of year @var{year}.
@end deftypefn
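
Several of these functions can name the same instant.  In the sketch
below (variable names hypothetical), @code{a}, @code{b}, and @code{c}
all equal 6,113,318,400, the value given earlier for 4 Jul 1776.  The
@var{yday} of 186 follows because 1776 is a leap year, so 31 + 29 + 31 +
30 + 31 + 30 = 182 days precede July.  @code{q} and @code{m} both
represent 1 Jul 1776, the first day of the third quarter, three days
earlier: 6,113,059,200.

@example
COMPUTE a = DATE.DMY(4, 7, 1776).
COMPUTE b = DATE.MDY(7, 4, 1776).
COMPUTE c = DATE.YRDAY(1776, 186).
COMPUTE q = DATE.QYR(3, 1776).
COMPUTE m = DATE.MOYR(7, 1776).
@end example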

@node Date Extraction,  , Date Construction, Time & Date
@subsubsection Functions that Examine Dates
@cindex extraction, of dates
@cindex date examination

@cindex arguments, of date extraction functions
These functions take numeric arguments in PSPP date or time
format and give numeric results.  These names are used for arguments:

@table @var
@item date
A numeric value in PSPP date format.

@item time
A numeric value in PSPP time format.

@item time-or-date
A numeric value in PSPP time or date format.
@end table

@cindex days
@cindex dates, in days
@cindex time, in days
@deftypefn {Function} {} XDATE.DATE (@var{time-or-date})
For a time, results in the time corresponding to the number of whole
days @var{time-or-date} includes.  For a date, results in the date
corresponding to the latest midnight at or before @var{time-or-date};
that is, gives the date that @var{time-or-date} is in.
(@code{XDATE.DATE(@var{x})} is equivalent to
@code{TRUNC(@var{x}/86400)*86400}.)  Applying this function to a time
is a non-portable feature.
@end deftypefn

@cindex hours
@cindex dates, in hours
@cindex time, in hours
@deftypefn {Function} {} XDATE.HOUR (@var{time-or-date})
For a time, results in the number of whole hours beyond the number of
whole days represented by @var{time-or-date}.  For a date, results in
the hour (as an integer between 0 and 23) corresponding to
@var{time-or-date}.  (@code{XDATE.HOUR(@var{x})} is equivalent to
@code{MOD(TRUNC(@var{x}/3600),24)}.)  Applying this function to a time
is a non-portable feature.
@end deftypefn

@cindex day of the year
@cindex dates, day of the year
@deftypefn {Function} {} XDATE.JDAY (@var{date})
Results in the day of the year (as an integer between 1 and 366)
corresponding to @var{date}.
@end deftypefn

@cindex day of the month
@cindex dates, day of the month
@deftypefn {Function} {} XDATE.MDAY (@var{date})
Results in the day of the month (as an integer between 1 and 31)
corresponding to @var{date}.
@end deftypefn

@cindex minutes
@cindex dates, in minutes
@cindex time, in minutes
@deftypefn {Function} {} XDATE.MINUTE (@var{time-or-date})
Results in the number of minutes (as an integer between 0 and 59) after
the last hour in @var{time-or-date}.  (@code{XDATE.MINUTE(@var{x})} is
equivalent to @code{MOD(TRUNC(@var{x}/60),60)}.)  Applying this
function to a time is a non-portable feature.
@end deftypefn

@cindex months
@cindex dates, in months
@deftypefn {Function} {} XDATE.MONTH (@var{date})
Results in the month of the year (as an integer between 1 and 12)
corresponding to @var{date}.
@end deftypefn

@cindex quarters
@cindex dates, in quarters
@deftypefn {Function} {} XDATE.QUARTER (@var{date})
Results in the quarter of the year (as an integer between 1 and 4)
corresponding to @var{date}.
@end deftypefn

@cindex seconds
@cindex dates, in seconds
@cindex time, in seconds
@deftypefn {Function} {} XDATE.SECOND (@var{time-or-date})
Results in the number of whole seconds after the last whole minute (as
an integer between 0 and 59) in @var{time-or-date}.
(@code{XDATE.SECOND(@var{x})} is equivalent to @code{MOD(@var{x}, 60)}.)
Applying this function to a time is a non-portable feature.
@end deftypefn

@cindex days
@cindex times, in days
@deftypefn {Function} {} XDATE.TDAY (@var{time})
Results in the number of whole days (as an integer) in @var{time}.
(@code{XDATE.TDAY(@var{x})} is equivalent to @code{TRUNC(@var{x}/86400)}.)
@end deftypefn

@cindex time
@cindex dates, time of day
@deftypefn {Function} {} XDATE.TIME (@var{date})
Results in the time of day at the instant corresponding to @var{date},
in PSPP time format.  This is the number of seconds since
midnight on the day corresponding to @var{date}.
(@code{XDATE.TIME(@var{x})} is equivalent to @code{MOD(@var{x}, 86400)}.)
@end deftypefn

@cindex week
@cindex dates, in weeks
@deftypefn {Function} {} XDATE.WEEK (@var{date})
Results in the week of the year (as an integer between 1 and 53)
corresponding to @var{date}.
@end deftypefn

@cindex day of the week
@cindex weekday
@cindex dates, day of the week
@cindex dates, in weekdays
@deftypefn {Function} {} XDATE.WKDAY (@var{date})
Results in the day of the week (as an integer between 1 and 7)
corresponding to @var{date}.  The days of the week are:

@table @asis
@item 1
Sunday
@item 2
Monday
@item 3
Tuesday
@item 4
Wednesday
@item 5
Thursday
@item 6
Friday
@item 7
Saturday
@end table
@end deftypefn

@cindex years
@cindex dates, in years
@deftypefn {Function} {} XDATE.YEAR (@var{date})
Returns the year (as an integer between 1582 and 19999) corresponding to
@var{date}.
@end deftypefn
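
As an illustration, the sketch below builds 1:45:30 PM on 24 Aug 1995
and takes it apart again (variable names hypothetical).  By the
definitions above, @code{yr} is 1995, @code{mo} is 8, @code{md} is 24,
@code{jd} is 236, @code{qt} is 3, @code{wd} is 5 (Thursday), @code{hr}
is 13, @code{mi} is 45, @code{se} is 30, and @code{tod} is 49,530
seconds.

@example
COMPUTE when = DATE.DMY(24, 8, 1995) + TIME.HMS(13, 45, 30).
COMPUTE yr = XDATE.YEAR(when).
COMPUTE mo = XDATE.MONTH(when).
COMPUTE md = XDATE.MDAY(when).
COMPUTE jd = XDATE.JDAY(when).
COMPUTE qt = XDATE.QUARTER(when).
COMPUTE wd = XDATE.WKDAY(when).
COMPUTE hr = XDATE.HOUR(when).
COMPUTE mi = XDATE.MINUTE(when).
COMPUTE se = XDATE.SECOND(when).
COMPUTE tod = XDATE.TIME(when).
@end example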

@node Miscellaneous Functions, Functions Not Implemented, Time & Date, Functions
@subsection Miscellaneous Functions
@cindex functions, miscellaneous

Miscellaneous functions take various arguments and produce various
results.

@cindex cross-case function
@cindex function, cross-case
@deftypefn {Function} {} LAG (@var{variable})
@var{variable} must be a numeric or string variable name.  @code{LAG}
results in the value of that variable for the case before the current
one.  In case-selection procedures, @code{LAG} results in the value of
the variable for the last case selected.  Results in system-missing (for
numeric variables) or blanks (for string variables) for the first case
or before any cases are selected.
@end deftypefn

@deftypefn {Function} {} LAG (@var{variable}, @var{ncases})
@var{variable} must be a numeric or string variable name.  @var{ncases}
must be a small positive constant integer, although there is no explicit
limit.  (Use of a large value for @var{ncases} will increase memory
consumption, since PSPP must keep @var{ncases} cases in memory.)
@code{LAG (@var{variable}, @var{ncases})} results in the value of
@var{variable} in the case @var{ncases} cases before the case currently
being processed.  See @code{LAG (@var{variable})} above for more details.
@end deftypefn
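
A common use of @code{LAG} is to compute the change from one case to
the next.  In this sketch (variable names hypothetical), the three data
values are read as three cases.  @code{diff} is system-missing for the
first case, because @code{LAG(x)} is system-missing there, and is 2 and
4 for the second and third cases.

@example
DATA LIST FREE /x.
BEGIN DATA.
3 5 9
END DATA.
COMPUTE diff = x - LAG(x).
LIST.
@end example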

@cindex date, Julian
@cindex Julian date
@deftypefn {Function} {} YRMODA (@var{year}, @var{month}, @var{day})
@var{year} is a year between 0 and 199 or 1582 and 19999.  @var{month} is
a month between 1 and 12.  @var{day} is a day between 1 and 31.  If
@var{month} or @var{day} is out-of-range, the next higher unit is
adjusted accordingly.  For instance, a @var{day} of 0 refers to the last
day of the previous month, and a @var{month} of 13 refers to the first
month of the next year.  @var{year} must be in range.  If @var{year} is
between 0 and 199, 1900 is added.  @var{year}, @var{month}, and
@var{day} must all be integers.

@code{YRMODA} results in the number of days between 15 Oct 1582 and
the date specified, plus one.  The date passed to @code{YRMODA} must be
on or after 15 Oct 1582.  15 Oct 1582 has a value of 1.
@end deftypefn
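
For example, @code{YRMODA(1582, 10, 15)} is 1 and
@code{YRMODA(1900, 1, 1)} is 115,861.  Because a PSPP date for a
midnight is a whole number of days in seconds, multiplying a
@code{YRMODA} result by 86,400 gives the corresponding PSPP date:
115,861 times 86,400 is 10,010,390,400, the value listed earlier for
1 Jan 1900.  A sketch (variable names hypothetical):

@example
COMPUTE jul = YRMODA(1900, 1, 1).
COMPUTE asdate = YRMODA(1900, 1, 1) * 86400.
@end example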

@node Functions Not Implemented,  , Miscellaneous Functions, Functions
@subsection Functions Not Implemented
@cindex functions, not implemented
@cindex not implemented
@cindex features, not implemented

These functions are not yet implemented, so they are not yet
documented.

@findex CDF.xxx
@findex CDFNORM
@findex IDF.xxx
@findex NCDF.xxx
@findex PROBIT
@findex RV.xxx

@itemize @bullet
@item
@code{CDF.xxx}
@item
@code{CDFNORM}
@item
@code{IDF.xxx}
@item
@code{NCDF.xxx}
@item
@code{PROBIT}
@item
@code{RV.xxx}
@end itemize

@node Order of Operations,  , Functions, Expressions
@section Operator Precedence
@cindex operator precedence
@cindex precedence, operator
@cindex order of operations
@cindex operations, order of

The following table describes operator precedence.  Smaller-numbered
levels in the table have higher precedence.  Within a level, operations
are performed from left to right, except for level 2 (exponentiation),
where operations are performed from right to left.  If an operator
appears in the table in two places (@code{-}), the first occurrence is
unary, the second is binary.

@enumerate
@item
@code{(  )}
@item
@code{**}
@item
@code{-}
@item
@code{*  /}
@item
@code{+  -}
@item
@code{EQ  GE  GT  LE  LT  NE}
@item
@code{AND  NOT  OR}
@end enumerate
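
The following expressions illustrate the table.  Each is shown with the
value it has under the rules above; this is a worked sketch, not
captured PSPP output.  Multiplication binds more tightly than addition,
operators on the same level group from left to right, exponentiation
binds more tightly than unary minus, and exponentiation groups from
right to left.

@example
1 + 2 * 3      @result{} 7
10 - 4 - 3     @result{} 3
-2 ** 2        @result{} -4
2 ** 3 ** 2    @result{} 512
@end example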

@node Data Input and Output, System and Portable Files, Expressions, Top
@chapter Data Input and Output
@cindex input
@cindex output
@cindex data
@cindex cases
@cindex observations

Data are the focus of the PSPP language.
Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
Each case represents an individual or `experimental unit'.
For example, in the results of a survey, the names of the respondents,
their sex, age, @i{etc}., and their responses are all data, and the data
pertaining to a single respondent is a case.
This chapter examines
the PSPP commands for defining variables and reading and writing data.

@quotation
@strong{Please note:} Data are not actually read until a procedure is
executed.  These commands tell PSPP how to read data, but they
do not @emph{cause} PSPP to read data.
@end quotation

@menu
* BEGIN DATA::                  Embed data within a syntax file.
* CLEAR TRANSFORMATIONS::       Clear pending transformations.
* DATA LIST::                   Fundamental data reading command.
* END CASE::                    Output the current case.
* END FILE::                    Terminate the current input program.
* FILE HANDLE::                 Support for fixed-length records.
* INPUT PROGRAM::               Support for complex input programs.
* LIST::                        List cases in the active file.
* MATRIX DATA::                 Read matrices in text format.
* NEW FILE::                    Clear the active file and dictionary.
* PRINT::                       Display values in print formats.
* PRINT EJECT::                 Eject the current page then print.
* PRINT SPACE::                 Print blank lines.
* REREAD::                      Take another look at the previous input line.
* REPEATING DATA::              Multiple cases on a single line.
* WRITE::                       Display values in write formats.
@end menu

@node BEGIN DATA, CLEAR TRANSFORMATIONS, Data Input and Output, Data Input and Output
@section BEGIN DATA
@vindex BEGIN DATA
@vindex END DATA
@cindex Embedding data in syntax files
@cindex Data, embedding in syntax files

@display
BEGIN DATA.
@dots{}
END DATA.
@end display

BEGIN DATA and END DATA can be used to embed raw ASCII data in a PSPP
syntax file.  DATA LIST or another input procedure must be used before
BEGIN DATA (@pxref{DATA LIST}).  BEGIN DATA and END DATA must be used
together.  The END DATA command must appear by itself on a single line,
with no leading whitespace and exactly one space between the words
@code{END} and @code{DATA}, followed immediately by the terminal dot,
like this:

@example
END DATA.
@end example
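
A minimal sketch of embedded data, with hypothetical variable names:
DATA LIST describes two fixed-format fields (@code{name} in columns 1
through 10 and @code{age} in columns 12 and 13), the data lines follow
BEGIN DATA, and LIST displays the two cases that result.

@example
DATA LIST /name 1-10 (A) age 12-13.
BEGIN DATA.
Smith      42
Jones      37
END DATA.
LIST.
@end example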

@node CLEAR TRANSFORMATIONS, DATA LIST, BEGIN DATA, Data Input and Output
@section CLEAR TRANSFORMATIONS
@vindex CLEAR TRANSFORMATIONS

@display
CLEAR TRANSFORMATIONS.
@end display

The CLEAR TRANSFORMATIONS command clears out all pending
transformations.  It does not cancel the current input program.  It is
valid only when PSPP is interactive, not in syntax files.

@node DATA LIST, END CASE, CLEAR TRANSFORMATIONS, Data Input and Output
@section DATA LIST
@vindex DATA LIST
@cindex reading data from a file
@cindex data, reading from a file
@cindex data, embedding in syntax files
@cindex embedding data in syntax files

Used to read text or binary data, DATA LIST is the most
fundamental data-reading command.  Even the more sophisticated input
methods use DATA LIST commands as a building block.
Understanding DATA LIST is important to understanding how to use
PSPP to read your data files.

There are two major variants of DATA LIST: fixed format and free
format.  In addition, free format has a minor variant, list format,
which is discussed in terms of its differences from plain free format.

Each form of DATA LIST is described in detail below.

@menu
* DATA LIST FIXED::             Fixed columnar locations for data.
* DATA LIST FREE::              Any spacing you like.
* DATA LIST LIST::              Each case must be on a single line.
@end menu

@node DATA LIST FIXED, DATA LIST FREE, DATA LIST, DATA LIST
@subsection DATA LIST FIXED
@vindex DATA LIST FIXED
@cindex reading fixed-format data
@cindex fixed-format data, reading
@cindex data, fixed-format, reading
@cindex embedding fixed-format data

@display
DATA LIST [FIXED]
        @{TABLE,NOTABLE@}
        FILE='filename'
        RECORDS=record_count
        END=end_var
        /[line_no] var_spec@dots{}

where each var_spec takes one of the forms
        var_list start-end [type_spec]
        var_list (fortran_spec)
@end display

DATA LIST FIXED is used to read data files that have values at fixed
positions on each line of single-line or multiline records.  The
keyword FIXED is optional.

The FILE subcommand must be used if input is to be taken from an
external file.  It may be used to specify a filename as a string or a
file handle (@pxref{FILE HANDLE}).  If the FILE subcommand is not used,
then input is assumed to be specified within the command file using
BEGIN DATA@dots{}END DATA (@pxref{BEGIN DATA}).

The optional RECORDS subcommand, which takes a single integer as an
argument, is used to specify the number of lines per record.  If RECORDS
is not specified, then the number of lines per record is calculated from
the list of variable specifications later in the DATA LIST command.

The END subcommand is only useful in conjunction with the INPUT PROGRAM
input procedure, and for that reason it is not discussed here
(@pxref{INPUT PROGRAM}).

DATA LIST can optionally output a table describing how the data file
will be read.  The TABLE subcommand enables this output, and NOTABLE
disables it.  The default is to output the table.

5005 The list of variables to be read from the data list must come last in
5006 the DATA LIST command.  Each line in the data record is introduced by a
5007 slash (@samp{/}).  Optionally, a line number may follow the slash.
5008 Following, any number of variable specifications may be present.
5009
Each variable specification consists of a list of variable names
followed by a description of their location on the input line.  Sets of
variables may be specified using DATA LIST's TO convention (@pxref{Sets of
Variables}).  There are two ways to specify the location of the variable
on the line: SPSS style and FORTRAN style.
5015
With SPSS style, the starting column and ending column for the field
are specified after the variable name, separated by a dash (@samp{-}).
For instance, the third through fifth columns on a line would be
specified as @samp{3-5}.  By default, variables are considered to be in
@samp{F} format (@pxref{Input/Output Formats}).  (This default can be
changed; see @ref{SET} for more information.)
5022
5023 When using SPSS style, to use a variable format other than the default,
5024 specify the format type in parentheses after the column numbers.  For
5025 instance, for alphanumeric @samp{A} format, use @samp{(A)}.
5026
5027 In addition, implied decimal places can be specified in parentheses
5028 after the column numbers.  As an example, suppose that a data file has a
5029 field in which the characters @samp{1234} should be interpreted as
5030 having the value 12.34.  Then this field has two implied decimal places,
5031 and the corresponding specification would be @samp{(2)}.  If a field
5032 that has implied decimal places contains a decimal point, then the
5033 implied decimal places are not applied.
5034
5035 Changing the variable format and adding implied decimal places can be
5036 done together; for instance, @samp{(N,5)}.
5037
When using SPSS style, the input and output width of each variable is
computed from the field width.  The field width must be evenly divisible
by the number of variables specified.
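
The following sketch (with hypothetical variable names) combines these
rules.  @code{PRICE} occupies columns 1 through 4 with two implied
decimal places, so a field containing @samp{1234} is read as 12.34,
whereas @samp{12.5} is read as 12.5 because it contains a decimal point.
@code{V1}, @code{V2}, and @code{V3} share the six columns 6 through 11,
so each is read from a field two columns wide.

@example
DATA LIST /PRICE 1-4 (2) V1 V2 V3 6-11.
@end example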
5041
FORTRAN style is an altogether different approach to specifying field
locations.  With this approach, a list of variable input format
specifications, separated by commas, is placed after the variable names
inside parentheses.  Each format specifier advances as many characters
into the input line as it uses.
5047
5048 In addition to the standard format specifiers (@pxref{Input/Output
5049 Formats}), FORTRAN style defines some extensions:
5050
5051 @table @asis
5052 @item @code{X}
5053 Advance the current column on this line by one character position.
5054
5055 @item @code{T}@var{x}
5056 Set the current column on this line to column @var{x}, with column
5057 numbers considered to begin with 1 at the left margin.
5058
5059 @item @code{NEWREC}@var{x}
5060 Skip forward @var{x} lines in the current record, resetting the active
5061 column to the left margin.
5062
5063 @item Repeat count
5064 Any format specifier may be preceded by a number.  This causes the
5065 action of that format specifier to be repeated the specified number of
5066 times.
5067
5068 @item (@var{spec1}, @dots{}, @var{specN})
5069 Group the given specifiers together.  This is most useful when preceded
5070 by a repeat count.  Groups may be nested arbitrarily.
5071 @end table
5072
5073 FORTRAN and SPSS styles may be freely intermixed.  SPSS style leaves the
5074 active column immediately after the ending column specified.  Record
5075 motion using @code{NEWREC} in FORTRAN style also applies to later
5076 FORTRAN and SPSS specifiers.
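
As a sketch of mixing the two styles (the variable names and column
layout are hypothetical), the command below reads @code{ID} SPSS-style
from columns 1 through 4.  The FORTRAN-style list then sets the current
column to 6 with @code{T6} and repeats the group @samp{(F3.0, X)} three
times, so @code{SCORE1}, @code{SCORE2}, and @code{SCORE3} are read from
columns 6 through 8, 10 through 12, and 14 through 16, with one column
skipped after each.

@example
DATA LIST /ID 1-4 SCORE1 TO SCORE3 (T6, 3(F3.0, X)).
@end example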
5077
5078 @menu
5079 * DATA LIST FIXED Examples::    Examples of DATA LIST FIXED.
5080 @end menu
5081
5082 @node DATA LIST FIXED Examples,  , DATA LIST FIXED, DATA LIST FIXED
5083 @unnumberedsubsubsec Examples
5084
5085 @enumerate
5086 @item
5087 @example
5088 DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
5089
5090 BEGIN DATA.
5091 John Smith 102311
5092 Bob Arnold 122015
5093 Bill Yates  918 6
5094 END DATA.
5095 @end example
5096
5097 Defines the following variables:
5098
5099 @itemize @bullet
5100 @item
5101 @code{NAME}, a 10-character-wide long string variable, in columns 1
5102 through 10.
5103
5104 @item
@code{INFO1}, a numeric variable with one implied decimal place, in
columns 12 through 13.

@item
@code{INFO2}, a numeric variable with one implied decimal place, in
columns 14 through 15.

@item
@code{INFO3}, a numeric variable with one implied decimal place, in
columns 16 through 17.
@end itemize

The @code{BEGIN DATA}/@code{END DATA} commands cause three cases to be
defined:

@example
Case   NAME         INFO1   INFO2   INFO3
   1   John Smith     1.0     2.3     1.1
   2   Bob Arnold     1.2     2.0     1.5
   3   Bill Yates     0.9     1.8     0.6
5122 @end example
5123
5124 The @code{TABLE} keyword causes PSPP to print out a table
5125 describing the four variables defined.
5126
5127 @item
5128 @example
5129 DAT LIS FIL="survey.dat"
5130         /ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
5131         /Q01 TO Q50 7-56
5132         /.
5133 @end example
5134
5135 Defines the following variables:
5136
5137 @itemize @bullet
5138 @item
5139 @code{ID}, a numeric variable, in columns 1-5 of the first record.
5140
5141 @item
5142 @code{NAME}, a 30-character long string variable, in columns 7-36 of the
5143 first record.
5144
5145 @item
5146 @code{SURNAME}, a 30-character long string variable, in columns 38-67 of
5147 the first record.
5148
5149 @item
5150 @code{MINITIAL}, a 1-character short string variable, in column 69 of
5151 the first record.
5152
5153 @item
5154 Fifty variables @code{Q01}, @code{Q02}, @code{Q03}, @dots{}, @code{Q49},
5155 @code{Q50}, all numeric, @code{Q01} in column 7, @code{Q02} in column 8,
5156 @dots{}, @code{Q49} in column 55, @code{Q50} in column 56, all in the second
5157 record.
5158 @end itemize
5159
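Each case occupies three lines in the data file.  The final slash is
followed by no variables, so the third line of each case is not read;
in @file{survey.dat} it is a blank line that separates cases.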
5161
5162 Data is read from file @file{survey.dat} in the current directory.
5163
5164 This example shows keywords abbreviated to their first 3 letters.
5165
5166 @end enumerate
5167
5168 @node DATA LIST FREE, DATA LIST LIST, DATA LIST FIXED, DATA LIST
5169 @subsection DATA LIST FREE
5170 @vindex DATA LIST FREE
5171
5172 @display
5173 DATA LIST FREE
5174         [@{NOTABLE,TABLE@}]
5175         FILE='filename'
5176         END=end_var
5177         /var_spec@dots{}
5178
5179 where each var_spec takes one of the forms
5180         var_list [(type_spec)]
5181         var_list *
5182 @end display
5183
In free format, the input data is structured as a series of comma- or
whitespace-delimited fields (end of line is one form of whitespace; it
is not treated specially).  Field contents may be surrounded by matched
pairs of apostrophes (@samp{'}) or quotes (@samp{"}), or they may be
unenclosed.  For any type of field, leading white space (up to the
apostrophe or quote, if any) is not included in the field.
5190
5191 Multiple consecutive delimiters are equivalent to a single delimiter.
5192 To specify an empty field, write an empty set of single or double
5193 quotes; for instance, @samp{""}.
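
For example, assuming the made-up data below, this command should read
three cases: (1, @samp{Ann Lee}, 90), (2, an empty string, 85), and
(3, @samp{Cy}, 75).  The quotes let @code{NAME} contain a space, the
empty quotes supply an empty @code{NAME} for the second case, and the
comma followed by the end of line in the third case counts as a single
delimiter.

@example
DATA LIST FREE /ID (F8.0) NAME (A10) SCORE (F8.0).
BEGIN DATA.
1 'Ann Lee' 90, 2 "" 85
3,Cy,
75
END DATA.
@end example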
5194
5195 The NOTABLE and TABLE subcommands are as in DATA LIST FIXED above.
5196 NOTABLE is the default.
5197
5198 The FILE and END subcommands are as in DATA LIST FIXED above.
5199
5200 The variables to be parsed are given as a single list of variable names.
5201 This list must be introduced by a single slash (@samp{/}).  The set of
5202 variable names may contain format specifications in parentheses
5203 (@pxref{Input/Output Formats}).  Format specifications apply to all
5204 variables back to the previous parenthesized format specification.
5205
5206 In addition, an asterisk may be used to indicate that all variables
5207 preceding it are to have input/output format @samp{F8.0}.
5208
Specified field widths are ignored on input, although all normal limits
on field width still apply.  They are honored on output.
5211
5212 @node DATA LIST LIST,  , DATA LIST FREE, DATA LIST
5213 @subsection DATA LIST LIST
5214 @vindex DATA LIST LIST
5215
5216 @display
5217 DATA LIST LIST
5218         [@{NOTABLE,TABLE@}]
5219         FILE='filename'
5220         END=end_var
5221         /var_spec@dots{}
5222
5223 where each var_spec takes one of the forms
5224         var_list [(type_spec)]
5225         var_list *
5226 @end display
5227
5228 Syntactically and semantically, DATA LIST LIST is equivalent to DATA
5229 LIST FREE, with one exception: each input line is expected to correspond
5230 to exactly one input record.  If more or fewer fields are found on an
5231 input line than expected, an appropriate diagnostic is issued.
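
As a minimal sketch with made-up data, the following reads two cases.
Each line must supply exactly the three values @code{A}, @code{B}, and
@code{C}; if a line held only two values, a diagnostic would be issued,
whereas DATA LIST FREE would simply continue reading on the next line.

@example
DATA LIST LIST /A B C *.
BEGIN DATA.
1 2 3
4 5 6
END DATA.
@end example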
5232
5233 @node END CASE, END FILE, DATA LIST, Data Input and Output
5234 @section END CASE
5235 @vindex END CASE
5236
5237 @display
5238 END CASE.
5239 @end display
5240
5241 END CASE is used within INPUT PROGRAM to output the current case.
5242 @xref{INPUT PROGRAM}.
5243
5244 @node END FILE, FILE HANDLE, END CASE, Data Input and Output
5245 @section END FILE
5246 @vindex END FILE
5247
5248 @display
5249 END FILE.
5250 @end display
5251
5252 END FILE is used within INPUT PROGRAM to terminate the current input
5253 program.  @xref{INPUT PROGRAM}.
5254
5255 @node FILE HANDLE, INPUT PROGRAM, END FILE, Data Input and Output
5256 @section FILE HANDLE
5257 @vindex FILE HANDLE
5258
5259 @display
5260 FILE HANDLE handle_name
5261         /NAME='filename'
5262         /RECFORM=@{VARIABLE,FIXED,SPANNED@}
5263         /LRECL=rec_len
5264         /MODE=@{CHARACTER,IMAGE,BINARY,MULTIPUNCH,360@}
5265 @end display
5266
5267 Use the FILE HANDLE command to define the attributes of a file that does
5268 not use conventional variable-length records terminated by newline
5269 characters.
5270
5271 Specify the file handle name as an identifier.  Any given identifier may
5272 only appear once in a PSPP run.  File handles may not be reassigned to a
5273 different file.  The file handle name must immediately follow the FILE
5274 HANDLE command name.
5275
5276 The NAME subcommand specifies the name of the file associated with the
5277 handle.  It is the only required subcommand.
5278
5279 The RECFORM subcommand specifies how the file is laid out.  VARIABLE
5280 specifies variable-length lines terminated with newlines, and it is the
5281 default.  FIXED specifies fixed-length records.  SPANNED is not
5282 supported.
5283
LRECL specifies the length of fixed-length records.  It is required if
@code{/RECFORM=FIXED} is specified.
5286
5287 MODE specifies a file mode.  CHARACTER, the default, causes the data
5288 file to be opened in ANSI C text mode.  BINARY causes the data file to
5289 be opened in ANSI C binary mode.  The other possibilities are not
5290 supported.
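
As a sketch (the handle name @code{inventory}, the file
@file{inventory.dat}, and the variables are hypothetical), the commands
below declare a file made up of 80-character fixed-length records and
then read it with DATA LIST through the handle:

@example
FILE HANDLE inventory
        /NAME='inventory.dat'
        /RECFORM=FIXED
        /LRECL=80.
DATA LIST FILE=inventory /ITEM 1-6 QTY 8-12.
@end example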
5291
5292 @node INPUT PROGRAM, LIST, FILE HANDLE, Data Input and Output
5293 @section INPUT PROGRAM
5294 @vindex INPUT PROGRAM
5295
5296 @display
5297 INPUT PROGRAM.
5298 @dots{} input commands @dots{}
5299 END INPUT PROGRAM.
5300 @end display
5301
The INPUT PROGRAM@dots{}END INPUT PROGRAM construct is used to specify a
complex input program.  By placing data input commands within INPUT
PROGRAM, PSPP programs can take advantage of more complex file
structures than are available with DATA LIST alone.
5306
The simplest kind of extended input program places multiple DATA LIST
commands within the INPUT PROGRAM.  This causes all of the data files
to be read in parallel.  Input stops when end of file is reached on any
of the data files.
5311
5312 Transformations, such as conditional and looping constructs, can also be
5313 included within an INPUT PROGRAM.  These can be used to combine input
5314 from several data files in more complex ways.  However, input will still
5315 stop when end of file is reached on any of the data files.
5316
5317 To prevent INPUT PROGRAM from terminating at the first end of file, use
5318 the END subcommand on DATA LIST.  This subcommand takes a variable name,
5319 which should be a numeric scratch variable (@pxref{Scratch Variables}).
(It need not be a scratch variable, but otherwise the results can be
5321 surprising.)  The value of this variable is set to 0 when reading the
5322 data file, or 1 when end of file is encountered.
5323
5324 Some additional commands are useful in conjunction with INPUT PROGRAM.
5325 END CASE is the first one.  Normally each loop through the INPUT PROGRAM
5326 structure produces one case.  But with END CASE you can control exactly
5327 when cases are output.  When END CASE is used, looping from the end of
5328 INPUT PROGRAM to the beginning does not cause a case to be output.
5329
5330 END FILE is the other command.  When the END subcommand is used on DATA
5331 LIST, there is no way for the INPUT PROGRAM construct to stop looping,
5332 so an infinite loop results.  The END FILE command, when executed,
5333 stops the flow of input data and passes out of the INPUT PROGRAM
5334 structure.
5335
These interactions can be confusing.  A few examples should help to
clarify them.
5337
5338 @example
5339 INPUT PROGRAM.
5340         DATA LIST NOTABLE FILE='a.data'/X 1-10.
5341         DATA LIST NOTABLE FILE='b.data'/Y 1-10.
5342 END INPUT PROGRAM.
5343 LIST.
5344 @end example
5345
5346 The example above reads variable X from file @file{a.data} and variable
5347 Y from file @file{b.data}.  If one file is shorter than the other then
5348 the extra data in the longer file is ignored.
5349
5350 @example
5351 INPUT PROGRAM.
5352         NUMERIC #A #B.
5353
5354         DO IF NOT #A.
5355                 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
5356         END IF.
5357         DO IF NOT #B.
5358                 DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
5359         END IF.
5360         DO IF #A AND #B.
5361                 END FILE.
5362         END IF.
5363         END CASE.
5364 END INPUT PROGRAM.
5365 LIST.
5366 @end example
5367
This example reads variable X from @file{a.data} and variable Y from
@file{b.data}.  If one file is shorter than the other, then for the
remaining cases read from the longer file, the variable from the
shorter file is set to the system-missing value.
5372
5373 @example
5374 INPUT PROGRAM.
5375         NUMERIC #A #B.
5376
5377         DO IF #A.
5378                 DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
5379                 DO IF #B.
5380                         END FILE.
5381                 ELSE.
5382                         END CASE.
5383                 END IF.
5384         ELSE.
5385                 DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
5386                 DO IF NOT #A.
5387                         END CASE.
5388                 END IF.
5389         END IF.
5390 END INPUT PROGRAM.
5391 LIST.
5392 @end example
5393
5394 The above example reads data from file @file{a.data}, then from
5395 @file{b.data}, and concatenates them into a single active file.
5396
5397 @example
5398 INPUT PROGRAM.
5399         NUMERIC #EOF.
5400
5401         LOOP IF NOT #EOF.
5402                 DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
5403                 DO IF NOT #EOF.
5404                         END CASE.
5405                 END IF.
5406         END LOOP.
5407
5408         COMPUTE #EOF = 0.
5409         LOOP IF NOT #EOF.
5410                 DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
5411                 DO IF NOT #EOF.
5412                         END CASE.
5413                 END IF.
5414         END LOOP.
5415
5416         END FILE.
5417 END INPUT PROGRAM.
5418 LIST.
5419 @end example
5420
5421 The above example does the same thing as the previous example, in a
5422 different way.
5423
5424 @example
5425 INPUT PROGRAM.
5426         LOOP #I=1 TO 50.
5427                 COMPUTE X=UNIFORM(10).
5428                 END CASE.
5429         END LOOP.
5430         END FILE.
5431 END INPUT PROGRAM.
5432 LIST/FORMAT=NUMBERED.
5433 @end example
5434
5435 The above example causes an active file to be created consisting of 50
5436 random variates between 0 and 10.
5437
5438 @node LIST, MATRIX DATA, INPUT PROGRAM, Data Input and Output
5439 @section LIST
5440 @vindex LIST
5441
5442 @display
5443 LIST
5444         /VARIABLES=var_list
5445         /CASES=FROM start_index TO end_index BY incr_index
5446         /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@}
5447                 @{NOWEIGHT,WEIGHT@}
5448 @end display
5449
5450 The LIST procedure prints the values of specified variables to the
5451 listing file.
5452
The VARIABLES subcommand specifies the variables whose values are to be
printed.  The keyword VARIABLES is optional.  If the VARIABLES
subcommand is not specified, then all variables in the active file are
printed.
5456
5457 The CASES subcommand can be used to specify a subset of cases to be
5458 printed.  Specify FROM and the case number of the first case to print,
5459 TO and the case number of the last case to print, and BY and the number
5460 of cases to advance between printing cases, or any subset of those
5461 settings.  If CASES is not specified then all cases are printed.
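
For instance, assuming the active file contains variables @code{X} and
@code{Y}, the following sketch prints those two variables for cases 10,
12, 14, 16, 18, and 20, along with their case numbers:

@example
LIST /VARIABLES=X Y /CASES=FROM 10 TO 20 BY 2 /FORMAT=NUMBERED.
@end example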
5462
5463 The FORMAT subcommand can be used to change the output format.  NUMBERED
5464 will print case numbers along with each case; UNNUMBERED, the default,
5465 causes the case numbers to be omitted.  The WRAP and SINGLE settings are
5466 currently not used.  WEIGHT will cause case weights to be printed along
5467 with variable values; NOWEIGHT, the default, causes case weights to be
5468 omitted from the output.
5469
5470 Case numbers start from 1.  They are counted after all transformations
5471 have been considered.
5472
LIST will attempt to fit all the values on a single line.  If necessary,
variable names will be displayed vertically in order to fit.  If values
cannot fit on a single line, then a multi-line format will be used.
5476
5477 LIST is a procedure.  It causes the data to be read.
5478
5479 @node MATRIX DATA, NEW FILE, LIST, Data Input and Output
5480 @section MATRIX DATA
5481 @vindex MATRIX DATA
5482
5483 @display
5484 MATRIX DATA
5485         /VARIABLES=var_list
5486         /FILE='filename'
5487         /FORMAT=@{LIST,FREE@} @{LOWER,UPPER,FULL@} @{DIAGONAL,NODIAGONAL@}
5488         /SPLIT=@{new_var,var_list@}
5489         /FACTORS=var_list
5490         /CELLS=n_cells
5491         /N=n
5492         /CONTENTS=@{N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
5493                    DFE,MAT,COV,CORR,PROX@}
5494 @end display
5495
The MATRIX DATA command reads square matrices in one of several textual
formats.  MATRIX DATA clears the dictionary, replaces it with a new
dictionary, and reads a data file.
5499
5500 Use VARIABLES to specify the variables that form the rows and columns of
5501 the matrices.  You may not specify a variable named VARNAME_.  You
5502 should specify VARIABLES first.
5503
5504 Specify the file to read on FILE, either as a file name string or a file
5505 handle (@pxref{FILE HANDLE}).  If FILE is not specified then matrix data
5506 must immediately follow MATRIX DATA with a BEGIN DATA@dots{}END DATA
5507 construct (@pxref{BEGIN DATA}).
5508
5509 The FORMAT subcommand specifies how the matrices are formatted.  LIST,
5510 the default, indicates that there is one line per row of matrix data;
5511 FREE allows single matrix rows to be broken across multiple lines.  This
5512 is analogous to the difference between DATA LIST FREE and DATA LIST LIST
5513 (@pxref{DATA LIST}).  LOWER, the default, indicates that the lower
5514 triangle of the matrix is given; UPPER indicates the upper triangle; and
5515 FULL indicates that the entire matrix is given.  DIAGONAL, the default,
5516 indicates that the diagonal is part of the data; NODIAGONAL indicates
5517 that it is omitted.  DIAGONAL/NODIAGONAL have no effect when FULL is
5518 specified.
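
As a sketch of the default layout (the variable names and correlations
are invented for illustration), with @samp{FORMAT=LIST LOWER DIAGONAL}
each line holds one row of the lower triangle including the diagonal,
so a 3 by 3 correlation matrix is given as lines of one, two, and three
values:

@example
MATRIX DATA /VARIABLES=X Y Z
        /FORMAT=LIST LOWER DIAGONAL
        /CONTENTS=CORR.
BEGIN DATA.
1
.45 1
.30 .62 1
END DATA.
@end example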
5519
5520 The SPLIT subcommand is used to specify SPLIT FILE variables for the
5521 input matrices (@pxref{SPLIT FILE}).  Specify either a single variable
5522 not specified on VARIABLES, or one or more variables that are specified
5523 on VARIABLES.  In the former case, the SPLIT values are not present in
5524 the data and ROWTYPE_ may not be specified on VARIABLES.  In the latter
5525 case, the SPLIT values are present in the data.
5526
5527 Specify a list of factor variables on FACTORS.  Factor variables must
5528 also be listed on VARIABLES.  Factor variables are used when there are
5529 some variables where, for each possible combination of their values,
5530 statistics on the matrix variables are included in the data.
5531
5532 If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the
5533 CELLS subcommand is required.  Specify the number of factor variable
5534 combinations that are given.  For instance, if factor variable A has 2
5535 values and factor variable B has 3 values, specify 6.
5536
5537 The N subcommand specifies a population number of observations.  When N
5538 is specified, one N record is output for each SPLIT FILE.
5539
5540 Use CONTENTS to specify what sort of information the matrices include.
5541 Each possible option is described in more detail below.  When ROWTYPE_
5542 is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS
5543 is not specified then /CONTENTS=CORR is assumed.
5544
5545 @table @asis
5546 @item N
@itemx N_VECTOR
5548 Number of observations as a vector, one value for each variable.
5549 @item N_SCALAR
5550 Number of observations as a single value.
5551 @item N_MATRIX
5552 Matrix of counts.
5553 @item MEAN
5554 Vector of means.
5555 @item STDDEV
5556 Vector of standard deviations.
5557 @item COUNT
5558 Vector of counts.
5559 @item MSE
5560 Vector of mean squared errors.
5561 @item DFE
5562 Vector of degrees of freedom.
5563 @item MAT
5564 Generic matrix.
5565 @item COV
5566 Covariance matrix.
5567 @item CORR
5568 Correlation matrix.
5569 @item PROX
5570 Proximities matrix.
5571 @end table
5572
The exact semantics of the matrices read by MATRIX DATA are complex.
Because few procedures currently accept or produce matrix data, MATRIX
DATA is of limited use at present, and these semantics are not yet
documented.  They will be described here in detail later.
5577
5578 @node NEW FILE, PRINT, MATRIX DATA, Data Input and Output
5579 @section NEW FILE
5580 @vindex NEW FILE
5581
5582 @display
5583 NEW FILE.
5584 @end display
5585
5586 The NEW FILE command clears the current active file.
5587
5588 @node PRINT, PRINT EJECT, NEW FILE, Data Input and Output
5589 @section PRINT
5590 @vindex PRINT
5591
5592 @display
5593 PRINT
5594         OUTFILE='filename'
5595         RECORDS=n_lines
5596         @{NOTABLE,TABLE@}
5597         /[line_no] arg@dots{}
5598
5599 arg takes one of the following forms:
5600         'string' [start-end]
5601         var_list start-end [type_spec]
5602         var_list (fortran_spec)
5603         var_list *
5604 @end display
5605
5606 The PRINT transformation writes variable data to an output file.  PRINT
5607 is executed when a procedure causes the data to be read.  In order to
5608 execute the PRINT transformation without invoking a procedure, use the
5609 EXECUTE command (@pxref{EXECUTE}).
5610
5611 All PRINT subcommands are optional.
5612
5613 The OUTFILE subcommand specifies the file to receive the output.  The
5614 file may be a file name as a string or a file handle (@pxref{FILE
5615 HANDLE}).  If OUTFILE is not present then output will be sent to PSPP's
5616 output listing file.
5617
5618 The RECORDS subcommand specifies the number of lines to be output.  The
5619 number of lines may optionally be surrounded by parentheses.
5620
5621 TABLE will cause the PRINT command to output a table to the listing file
5622 that describes what it will print to the output file.  NOTABLE, the
5623 default, suppresses this output table.
5624
5625 Introduce the strings and variables to be printed with a slash
5626 (@samp{/}).  Optionally, the slash may be followed by a number
5627 indicating which output line will be specified.  In the absence of this
5628 line number, the next line number will be specified.  Multiple lines may
5629 be specified using multiple slashes with the intended output for a line
5630 following its respective slash.
5631
5632 Literal strings may be printed.  Specify the string itself.  Optionally
5633 the string may be followed by a column number or range of column
5634 numbers, specifying the location on the line for the string to be
5635 printed.  Otherwise, the string will be printed at the current position
5636 on the line.
5637
5638 Variables to be printed can be specified in the same ways as available
5639 for DATA LIST FIXED (@pxref{DATA LIST FIXED}).  In addition, a variable
5640 list may be followed by an asterisk (@samp{*}), which indicates that the
5641 variables should be printed in their dictionary print formats, separated
5642 by spaces.  A variable list followed by a slash or the end of command
5643 will be interpreted the same way.
5644
If a FORTRAN type specification is used to move backward on the current
line and text is then written at that point, the line is truncated at
that point, although text added afterward extends the line again.
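
As a sketch (the output file @file{people.txt}, the data, and the
variable names are hypothetical), the following writes one line per
case containing @samp{Name: } in columns 1 through 6, @code{NAME} in
columns 7 through 16, @samp{Age: } in columns 18 through 22, and then
@code{AGE} in its print format.  EXECUTE is included because PRINT is a
transformation and no procedure follows it.

@example
DATA LIST FREE /NAME (A10) AGE (F3.0).
BEGIN DATA.
Alice 34 Bob 27
END DATA.
PRINT OUTFILE='people.txt'
        /'Name: ' 1-6 NAME 7-16 (A) 'Age: ' 18-22 AGE *.
EXECUTE.
@end example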
5649
5650 @node PRINT EJECT, PRINT SPACE, PRINT, Data Input and Output
5651 @section PRINT EJECT
5652 @vindex PRINT EJECT
5653
5654 @display
5655 PRINT EJECT
5656         OUTFILE='filename'
5657         RECORDS=n_lines
5658         @{NOTABLE,TABLE@}
5659         /[line_no] arg@dots{}
5660
5661 arg takes one of the following forms:
5662         'string' [start-end]
5663         var_list start-end [type_spec]
5664         var_list (fortran_spec)
5665         var_list *
5666 @end display
5667
5668 PRINT EJECT is used to write data to an output file.  Before the data is
5669 written, the current page in the listing file is ejected.
5670
5671 @xref{PRINT}, for more information on syntax and usage.
5672
5673 @node PRINT SPACE, REREAD, PRINT EJECT, Data Input and Output
5674 @section PRINT SPACE
5675 @vindex PRINT SPACE
5676
5677 @display
5678 PRINT SPACE OUTFILE='filename' n_lines.
5679 @end display
5680
PRINT SPACE prints one or more blank lines to an output file.
5682
5683 The OUTFILE subcommand is optional.  It may be used to direct output to
5684 a file specified by file name as a string or file handle (@pxref{FILE
5685 HANDLE}).  If OUTFILE is not specified then output will be directed to
5686 the listing file.
5687
5688 n_lines is also optional.  If present, it is an expression
5689 (@pxref{Expressions}) specifying the number of blank lines to be
5690 printed.  The expression must evaluate to a nonnegative value.
5691
5692 @node REREAD, REPEATING DATA, PRINT SPACE, Data Input and Output
5693 @section REREAD
5694 @vindex REREAD
5695
5696 @display
5697 REREAD FILE=handle COLUMN=column.
5698 @end display
5699
5700 The REREAD transformation allows the previous input line in a data file
5701 already processed by DATA LIST or another input command to be re-read
5702 for further processing.
5703
5704 The FILE subcommand, which is optional, is used to specify the file to
5705 have its line re-read.  The file must be specified in the form of a file
5706 handle (@pxref{FILE HANDLE}).  If FILE is not specified then the last
5707 file specified on DATA LIST will be assumed (last file specified
5708 lexically, not in terms of flow-of-control).
5709
By default, the entire line is re-read.  With the COLUMN subcommand, a
prefix of the line can be exempted from re-reading.  Specify an
expression (@pxref{Expressions}) evaluating to the first column that
should be included in the re-read line.  Columns are numbered from 1 at
the left margin.
5715
5716 Multiple REREAD commands will not back up in the data file.  Instead,
5717 they will re-read the same line multiple times.
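
The following sketch assumes a hypothetical file @file{mixed.dat} whose
layout depends on a record type in column 1.  For type 1 records,
REREAD makes the whole line available again so that @code{X} can be
read from columns 3 through 5.  For other records, @samp{COLUMN=3}
re-reads the line starting at column 3, so @code{Y 1-4} refers to
columns 3 through 6 of the original line.

@example
INPUT PROGRAM.
        DATA LIST NOTABLE FILE='mixed.dat' /TYPE 1.
        DO IF TYPE = 1.
                REREAD.
                DATA LIST NOTABLE FILE='mixed.dat' /X 3-5.
        ELSE.
                REREAD COLUMN=3.
                DATA LIST NOTABLE FILE='mixed.dat' /Y 1-4.
        END IF.
END INPUT PROGRAM.
LIST.
@end example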
5718
5719 @node REPEATING DATA, WRITE, REREAD, Data Input and Output
5720 @section REPEATING DATA
5721 @vindex REPEATING DATA
5722
5723 @display
5724 REPEATING DATA
5725         /STARTS=start-end
5726         /OCCURS=n_occurs
5727         /FILE='filename'
5728         /LENGTH=length
5729         /CONTINUED[=cont_start-cont_end]
5730         /ID=id_start-id_end=id_var
5731         /@{TABLE,NOTABLE@}
5732         /DATA=var_spec@dots{}
5733
5734 where each var_spec takes one of the forms
5735         var_list start-end [type_spec]
5736         var_list (fortran_spec)
5737 @end display
5738
The REPEATING DATA command is used to parse groups of data repeating in
a uniform format, possibly with several groups on a single line.  Each
group of data corresponds to one case.  REPEATING DATA may only be
used within an INPUT PROGRAM structure.  When used with DATA LIST, it
can be used to parse groups of cases that share a subset of variables
but differ in their other data.
5745
5746 The STARTS subcommand is required.  Specify a range of columns, using
5747 literal numbers or numeric variable names.  This range specifies the
5748 columns on the first line that are used to contain groups of data.  The
5749 ending column is optional.  If it is not specified, then the record
5750 width of the input file is used.  For the inline file (@pxref{BEGIN
5751 DATA}) this is 80 columns; for a file with fixed record widths it is the
5752 record width; for other files it is 1024 characters by default.
5753
5754 The OCCURS subcommand is required.  It must be a number or the name of a
5755 numeric variable.  Its value is the number of groups present in the
5756 current record.
5757
5758 The DATA subcommand is required.  It must be the last subcommand
5759 specified.  It is used to specify the data present within each repeating
5760 group.  Column numbers are specified relative to the beginning of a
5761 group at column 1.  Data is specified in the same way as with DATA LIST
5762 FIXED (@pxref{DATA LIST FIXED}).
5763
5764 All other subcommands are optional.
5765
5766 FILE specifies the file to read, either a file name as a string or a
5767 file handle (@pxref{FILE HANDLE}).  If FILE is not present then the
5768 default is the last file handle used on DATA LIST (lexically, not in
5769 terms of flow of control).
5770
5771 By default REPEATING DATA will output a table describing how it will
5772 parse the input data.  Specifying NOTABLE will disable this behavior;
5773 specifying TABLE will explicitly enable it.
5774
5775 The LENGTH subcommand specifies the length in characters of each group.
5776 If it is not present then length is inferred from the DATA subcommand.
5777 LENGTH can be a number or a variable name.
5778
Normally all the data groups are expected to be present on a single
line.  Use the CONTINUED subcommand to indicate that data can be
continued onto additional lines.  If data on continuation lines starts
at the left margin and continues through the entire field width, no
column specifications are necessary on CONTINUED.  Otherwise, specify
the possible range of columns in the same way as on STARTS.
5785
When data groups are continued from line to line, cases can easily get
out of sync if the file is edited by hand without care.  The ID
subcommand allows a case identifier to be present on each line of
repeating data groups.  REPEATING DATA will check for the same
identifier on each line and report mismatches.  Specify the range of
columns that the identifier will occupy, followed by an equals sign
(@samp{=}) and the identifier variable name.  The variable must already
have been declared with NUMERIC or another command.
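
As a sketch (the file @file{trips.dat}, its layout, and the variable
names are hypothetical), suppose each line holds an identifier in
columns 1 through 3 and a group count in columns 5 and 6, followed,
starting in column 8, by that many groups of six columns, each with
@code{X} in its columns 1 and 2 and @code{Y} in its columns 4 and 5.
A line with @code{ID} 101, @code{N} 2, and the groups @samp{10 20} and
@samp{30 40} should then yield two cases, both with @code{ID} 101, one
with @code{X} 10 and @code{Y} 20 and the other with @code{X} 30 and
@code{Y} 40.

@example
INPUT PROGRAM.
        DATA LIST NOTABLE FILE='trips.dat' /ID 1-3 N 5-6.
        REPEATING DATA /STARTS=8 /OCCURS=N /LENGTH=6
                /DATA=X 1-2 Y 4-5.
END INPUT PROGRAM.
LIST.
@end example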
5794
5795 @node WRITE,  , REPEATING DATA, Data Input and Output
5796 @section WRITE
5797 @vindex WRITE
5798
5799 @display
5800 WRITE
5801         OUTFILE='filename'
5802         RECORDS=n_lines
5803         @{NOTABLE,TABLE@}
5804         /[line_no] arg@dots{}
5805
5806 arg takes one of the following forms:
5807         'string' [start-end]
5808         var_list start-end [type_spec]
5809         var_list (fortran_spec)
5810         var_list *
5811 @end display
5812
5813 WRITE is used to write text or binary data to an output file.
5814
5815 @xref{PRINT}, for more information on syntax and usage.  The main
5816 difference between PRINT and WRITE is that whereas by default PRINT uses
5817 variables' print formats, WRITE uses write formats.
5818
5819 The sole additional difference is that if WRITE is used to send output
5820 to a binary file, carriage control characters will not be output.
5821 @xref{FILE HANDLE}, for information on how to declare a file as binary.
5822
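For example, the following sketch copies two numeric variables from a
system file into fixed columns of a text file.  The file names, variable
names, and column positions are hypothetical.  Because WRITE is a
transformation, nothing is written until a procedure, here EXECUTE,
causes the data to be read.

@example
* File names, variable names, and columns are hypothetical.
GET /FILE='survey.sav'.
WRITE OUTFILE='scores.txt' /ID 1-5 SCORE 7-12.
EXECUTE.
@end example
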
5823 @node System and Portable Files, Variable Attributes, Data Input and Output, Top
5824 @chapter System Files and Portable Files
5825
5826 The commands in this chapter read, write, and examine system files and
5827 portable files.
5828
5829 @menu
5830 * APPLY DICTIONARY::            Apply system file dictionary to active file.
5831 * EXPORT::                      Write to a portable file.
5832 * GET::                         Read from a system file.
5833 * IMPORT::                      Read from a portable file.
5834 * MATCH FILES::                 Merge system files.
5835 * SAVE::                        Write to a system file.
5836 * SYSFILE INFO::                Display system file dictionary.
5837 * XSAVE::                       Write to a system file, as a transform.
5838 @end menu
5839
5840 @node APPLY DICTIONARY, EXPORT, System and Portable Files, System and Portable Files
5841 @section APPLY DICTIONARY
5842 @vindex APPLY DICTIONARY
5843
5844 @display
5845 APPLY DICTIONARY FROM='filename'.
5846 @end display
5847
5848 The APPLY DICTIONARY command applies the variable labels, value labels,
5849 and missing values from variables in a system file to corresponding
5850 variables in the active file.  In some cases it also updates the
5851 weighting variable.
5852
5853 Specify a system file with a file name string or as a file handle
5854 (@pxref{FILE HANDLE}).  The dictionary in the system file will be read,
5855 but it will not replace the active file dictionary.  The system file's
5856 data will not be read.
5857
5858 Only variables with names that exist in both the active file and the
5859 system file are considered.  Variables with the same name but different
5860 types (numeric, string) will cause an error message.  Otherwise, the
5861 system file variables' attributes will replace those in their matching
5862 active file variables, as described below.
5863
5864 If a system file variable has a variable label, then it will replace the
5865 active file variable's variable label.  If the system file variable does
5866 not have a variable label, then the active file variable's variable
5867 label, if any, will be retained.
5868
If the active file variable is numeric or short string, then the system
file variable's value labels and missing values, if any, are copied to
it.  If the system file variable does not have value labels or missing
values, then those in the active file variable, if any, are not
disturbed.
5874
5875 Finally, weighting of the active file is updated (@pxref{WEIGHT}).  If
5876 the active file has a weighting variable, and the system file does not,
5877 or if the weighting variable in the system file does not exist in the
5878 active file, then the active file weighting variable, if any, is
5879 retained.  Otherwise, the weighting variable in the system file becomes
5880 the active file weighting variable.
5881
5882 APPLY DICTIONARY takes effect immediately.  It does not read the active
5883 file.  The system file is not modified.
5884
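As a sketch, suppose a second wave of a survey was saved without labels.
The commands below, whose file names are hypothetical, read the new data
and then apply the first wave's variable labels, value labels, and
missing values to matching variables, following the rules described
above:

@example
* File names are hypothetical.
GET /FILE='wave2.sav'.
APPLY DICTIONARY FROM='wave1.sav'.
@end example
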
5885 @node EXPORT, GET, APPLY DICTIONARY, System and Portable Files
5886 @section EXPORT
5887 @vindex EXPORT
5888
5889 @display
5890 EXPORT
5891         /OUTFILE='filename'
5892         /DROP=var_list
5893         /KEEP=var_list
5894         /RENAME=(src_names=target_names)@dots{}
5895 @end display
5896
5897 The EXPORT procedure writes the active file dictionary and data to a
5898 specified portable file.
5899
5900 The OUTFILE subcommand, which is the only required subcommand, specifies
5901 the portable file to be written as a file name string or a file handle
5902 (@pxref{FILE HANDLE}).
5903
5904 DROP, KEEP, and RENAME follow the same format as the SAVE procedure
5905 (@pxref{SAVE}).
5906
5907 EXPORT is a procedure.  It causes the active file to be read.
5908
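For instance, the following sketch converts a system file into a
portable file, leaving out one variable.  The file names and the
variable NOTES are hypothetical:

@example
* File names and the NOTES variable are hypothetical.
GET /FILE='survey.sav'.
EXPORT /OUTFILE='survey.por' /DROP=NOTES.
@end example
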
5909 @node GET, IMPORT, EXPORT, System and Portable Files
5910 @section GET
5911 @vindex GET
5912
5913 @display
5914 GET
5915         /FILE='filename'
5916         /DROP=var_list
5917         /KEEP=var_list
5918         /RENAME=(src_names=target_names)@dots{}
5919 @end display
5920
5921 The GET transformation clears the current dictionary and active file and
5922 replaces them with the dictionary and data from a specified system file.
5923
5924 The FILE subcommand is the only required subcommand.  Specify the system
5925 file to be read as a string file name or a file handle (@pxref{FILE
5926 HANDLE}).
5927
By default, all the variables in a system file are read.  The DROP
subcommand can be used to specify a list of variables that are not to be
read.  By contrast, the KEEP subcommand can be used to specify the
variables that are to be read; all other variables are not read.
5932
5933 Normally variables in a system file retain the names that they were
5934 saved under.  Use the RENAME subcommand to change these names.  Specify,
5935 within parentheses, a list of variable names followed by an equals sign
5936 (@samp{=}) and the names that they should be renamed to.  Multiple
5937 parenthesized groups of variable names can be included on a single
5938 RENAME subcommand.  Variables' names may be swapped using a RENAME
5939 subcommand of the form @samp{/RENAME=(A B=B A)}.
5940
5941 Alternate syntax for the RENAME subcommand allows the parentheses to be
5942 eliminated.  When this is done, only a single variable may be renamed at
5943 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
5944 deprecated.
5945
5946 DROP, KEEP, and RENAME are performed in left-to-right order.  They each
5947 may be present any number of times.
5948
5949 Please note that DROP, KEEP, and RENAME do not cause the system file on
5950 disk to be modified.  Only the active file read from the system file is
5951 changed.
5952
5953 GET does not cause the data to be read, only the dictionary.  The data
5954 is read later, when a procedure is executed.
5955
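As an illustration, the command below reads only three variables from a
hypothetical system file and then renames two of them.  Because DROP,
KEEP, and RENAME are processed from left to right, RENAME refers to
names that remain after KEEP.  All file and variable names are
assumptions:

@example
* File and variable names are hypothetical.
GET /FILE='survey.sav'
        /KEEP=ID AGE INCOME
        /RENAME=(AGE INCOME=YEARS SALARY).
@end example
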
5956 @node IMPORT, MATCH FILES, GET, System and Portable Files
5957 @section IMPORT
5958 @vindex IMPORT
5959
5960 @display
5961 IMPORT
5962         /FILE='filename'
5963         /TYPE=@{COMM,TAPE@}
5964         /DROP=var_list
5965         /KEEP=var_list
5966         /RENAME=(src_names=target_names)@dots{}
5967 @end display
5968
5969 The IMPORT transformation clears the active file dictionary and data and
5970 replaces them with a dictionary and data from a portable file on disk.
5971
5972 The FILE subcommand, which is the only required subcommand, specifies
5973 the portable file to be read as a file name string or a file handle
5974 (@pxref{FILE HANDLE}).
5975
5976 The TYPE subcommand is currently not used.
5977
5978 DROP, KEEP, and RENAME follow the syntax used by GET (@pxref{GET}).
5979
5980 IMPORT does not cause the data to be read, only the dictionary.  The
5981 data is read later, when a procedure is executed.
5982
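A common use of IMPORT is converting a portable file into a system file.
In this minimal sketch, with hypothetical file names, IMPORT reads only
the dictionary, and the data are read when SAVE, a procedure, executes:

@example
* File names are hypothetical.
IMPORT /FILE='survey.por'.
SAVE /OUTFILE='survey.sav'.
@end example
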
5983 @node MATCH FILES, SAVE, IMPORT, System and Portable Files
5984 @section MATCH FILES
5985 @vindex MATCH FILES
5986
5987 @display
5988 MATCH FILES
5989         /BY var_list
5990         /@{FILE,TABLE@}=@{*,'filename'@}
5991         /DROP=var_list
5992         /KEEP=var_list
5993         /RENAME=(src_names=target_names)@dots{}
5994         /IN=var_name
5995         /FIRST=var_name
5996         /LAST=var_name
5997         /MAP
5998 @end display
5999
6000 The MATCH FILES command merges one or more system files, optionally
6001 including the active file.  Records with the same values for BY
6002 variables are combined into a single record.  Records with different
6003 values are output in order.  Thus, multiple sorted system files are
6004 combined into a single sorted system file based on the value of the BY
6005 variables.
6006
6007 The BY subcommand specifies a list of variables that are used to match
6008 records from each of the system files.  Variables specified must exist
6009 in all the files specified on FILE and TABLE.  BY should usually be
6010 specified.  If TABLE is used then BY is required.
6011
6012 Specify FILE with a system file as a file name string or file handle
6013 (@pxref{FILE HANDLE}).  An asterisk (@samp{*}) may also be specified to
6014 indicate the current active file.  The files specified on FILE are
6015 merged together based on the BY variables, or combined case-by-case if
6016 BY is not specified.  Normally at least two FILE subcommands should be
6017 specified.
6018
6019 Specify TABLE with a system file in order to use it as a @dfn{table
6020 lookup file}.  Records in table lookup files are not used up after
6021 they've been used once.  This means that data in table lookup files can
6022 correspond to any number of records in FILE files.  Table lookup files
6023 correspond to lookup tables in traditional relational database systems.
6024 It is incorrect to have records with duplicate BY values in table lookup
6025 files.
6026
6027 Any number of FILE and TABLE subcommands may be specified.  Each
6028 instance of FILE or TABLE can be followed by DROP, KEEP, and/or RENAME
6029 subcommands.  These take the same form as the corresponding subcommands
6030 of GET (@pxref{GET}), and perform the same functions.
6031
6032 Variables belonging to files that are not present for the current case
6033 are set to the system-missing value for numeric variables or spaces for
6034 string variables.
6035
6036 IN, FIRST, LAST, and MAP are currently not used.
6037
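As a sketch of a table lookup, suppose hypothetical files of orders and
of customers share an identifier CUSTID, and that @file{customers.sav}
is already sorted by CUSTID with one record per customer.  The commands
below sort the orders and attach each customer's record to every
matching order:

@example
* File names and the CUSTID variable are hypothetical.
GET /FILE='orders.sav'.
SORT CASES BY CUSTID.
MATCH FILES /FILE=*
        /TABLE='customers.sav'
        /BY CUSTID.
@end example
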
6038 @node SAVE, SYSFILE INFO, MATCH FILES, System and Portable Files
6039 @section SAVE
6040 @vindex SAVE
6041
6042 @display
6043 SAVE
6044         /OUTFILE='filename'
6045         /@{COMPRESSED,UNCOMPRESSED@}
6046         /DROP=var_list
6047         /KEEP=var_list
6048         /RENAME=(src_names=target_names)@dots{}
6049 @end display
6050
6051 The SAVE procedure causes the dictionary and data in the active file to
6052 be written to a system file.
6053
The OUTFILE subcommand is the only required subcommand.  Specify the
system file to be written as a string file name or a file handle
(@pxref{FILE HANDLE}).
6057
The COMPRESSED and UNCOMPRESSED subcommands determine whether the saved
system file is compressed.  By default, system files are compressed.
This default can be changed with the SET command (@pxref{SET}).
6061
By default, all the variables in the active file dictionary are written
to the system file.  The DROP subcommand can be used to specify a list
of variables not to be written.  In contrast, KEEP specifies the
variables to be written; all other variables are omitted.
6066
6067 Normally variables are saved to a system file under the same names they
have in the active file.  Use the RENAME subcommand to change these names.
6069 Specify, within parentheses, a list of variable names followed by an
6070 equals sign (@samp{=}) and the names that they should be renamed to.
6071 Multiple parenthesized groups of variable names can be included on a
6072 single RENAME subcommand.  Variables' names may be swapped using a
6073 RENAME subcommand of the form @samp{/RENAME=(A B=B A)}.
6074
6075 Alternate syntax for the RENAME subcommand allows the parentheses to be
6076 eliminated.  When this is done, only a single variable may be renamed at
6077 once.  For instance, @samp{/RENAME=A=B}.  This alternate syntax is
6078 deprecated.
6079
6080 DROP, KEEP, and RENAME are performed in left-to-right order.  They each
6081 may be present any number of times.
6082
6083 Please note that DROP, KEEP, and RENAME do not cause the active file to
6084 be modified.  Only the system file written to disk is changed.
6085
6086 SAVE causes the data to be read.  It is a procedure.
6087
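For example, the following sketch writes the active file to an
uncompressed system file, leaving out two working variables and renaming
a third in the saved copy only; the active file itself keeps its
original variables and names.  All names are hypothetical:

@example
* File and variable names are hypothetical.
SAVE /OUTFILE='clean.sav'
        /UNCOMPRESSED
        /DROP=TEMP1 TEMP2
        /RENAME=(Q1=AGE).
@end example
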
6088 @node SYSFILE INFO, XSAVE, SAVE, System and Portable Files
6089 @section SYSFILE INFO
6090 @vindex SYSFILE INFO
6091
6092 @display
6093 SYSFILE INFO FILE='filename'.
6094 @end display
6095
The SYSFILE INFO command reads the dictionary in a system file and
displays the information it contains.

Specify the system file as a file name string or a file handle
(@pxref{FILE HANDLE}).

Reading the system file does not replace the active file.
6103
6104 @node XSAVE,  , SYSFILE INFO, System and Portable Files
6105 @section XSAVE
6106 @vindex XSAVE
6107
6108 @display
6109 XSAVE
        /OUTFILE='filename'
6111         /@{COMPRESSED,UNCOMPRESSED@}
6112         /DROP=var_list
6113         /KEEP=var_list
6114         /RENAME=(src_names=target_names)@dots{}
6115 @end display
6116
6117 The XSAVE transformation writes the active file dictionary and data to a
6118 system file stored on disk.
6119
6120 XSAVE is a transformation, not a procedure.  It is executed when the
6121 data is read by a procedure or procedure-like command.  In all other
6122 respects, XSAVE is identical to SAVE.  @xref{SAVE}, for more information
6123 on syntax and usage.
6124
6125 @node Variable Attributes, Data Manipulation, System and Portable Files, Top
6126 @chapter Manipulating variables
6127
The commands in this chapter examine and adjust the variables in the
active file dictionary.
6130
6131 @menu
6132 * ADD VALUE LABELS::            Add value labels to variables.
6133 * DISPLAY::                     Display variable names & descriptions.
6134 * DISPLAY VECTORS::             Display a list of vectors.
6135 * FORMATS::                     Set print and write formats.
6136 * LEAVE::                       Don't clear variables between cases.
6137 * MISSING VALUES::              Set missing values for variables.
6138 * MODIFY VARS::                 Rename, reorder, and drop variables.
6139 * NUMERIC::                     Create new numeric variables.
6140 * PRINT FORMATS::               Set variable print formats.
6141 * RENAME VARIABLES::            Rename variables.
6142 * VALUE LABELS::                Set value labels for variables.
6143 * STRING::                      Create new string variables.
6144 * VARIABLE LABELS::             Set variable labels for variables.
6145 * VECTOR::                      Declare an array of variables.
6146 * WRITE FORMATS::               Set variable write formats.
6147 @end menu
6148
6149 @node ADD VALUE LABELS, DISPLAY, Variable Attributes, Variable Attributes
6150 @section ADD VALUE LABELS
6151 @vindex ADD VALUE LABELS
6152
6153 @display
6154 ADD VALUE LABELS
6155         /var_list value 'label' [value 'label']@dots{}
6156 @end display
6157
ADD VALUE LABELS has the same syntax and purpose as VALUE LABELS
(@pxref{VALUE LABELS}), but it does not clear away value labels from the
variables before adding the ones specified.
6161
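For instance, assuming a numeric variable SEX (a hypothetical name), the
second command below adds a third label while keeping the two set by the
first.  Had VALUE LABELS been used instead of ADD VALUE LABELS, the
first two labels would have been discarded:

@example
* SEX is a hypothetical numeric variable.
VALUE LABELS /SEX 1 'Male' 2 'Female'.
ADD VALUE LABELS /SEX 9 'Not answered'.
@end example
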
6162 @node DISPLAY, DISPLAY VECTORS, ADD VALUE LABELS, Variable Attributes
6163 @section DISPLAY
6164 @vindex DISPLAY
6165
6166 @display
6167 DISPLAY @{NAMES,INDEX,LABELS,VARIABLES,DICTIONARY,SCRATCH@}
6168         [SORTED] [var_list]
6169 @end display
6170
6171 DISPLAY displays requested information on variables.  Variables can
6172 optionally be sorted alphabetically.  The entire dictionary or just
6173 specified variables can be described.
6174
6175 One of the following keywords can be present:
6176
6177 @table @asis
6178 @item NAMES
6179 The variables' names are displayed.
6180
6181 @item INDEX
6182 The variables' names are displayed along with a value describing their
6183 position within the active file dictionary.
6184
6185 @item LABELS
6186 Variable names, positions, and variable labels are displayed.
6187
6188 @item VARIABLES
6189 Variable names, positions, print and write formats, and missing values
6190 are displayed.
6191
6192 @item DICTIONARY
6193 Variable names, positions, print and write formats, missing values,
6194 variable labels, and value labels are displayed.
6195
6196 @item SCRATCH
Variable names are displayed, for scratch variables only (@pxref{Scratch
Variables}).
6199 @end table
6200
6201 If SORTED is specified, then the variables are displayed in ascending
6202 order based on their names; otherwise, they are displayed in the order
6203 that they occur in the active file dictionary.
6204
6205 @node DISPLAY VECTORS, FORMATS, DISPLAY, Variable Attributes
6206 @section DISPLAY VECTORS
6207 @vindex DISPLAY VECTORS
6208
6209 @display
6210 DISPLAY VECTORS.
6211 @end display
6212
6213 The DISPLAY VECTORS command causes a list of the currently declared
6214 vectors to be displayed.
6215
6216 @node FORMATS, LEAVE, DISPLAY VECTORS, Variable Attributes
6217 @section FORMATS
6218 @vindex FORMATS
6219
6220 @display
6221 FORMATS var_list (fmt_spec).
6222 @end display
6223
The FORMATS command sets the print and write formats for the specified
variables to the specified format specification.  @xref{Input/Output
Formats}.
6227
6228 Specify a list of variables followed by a format specification in
6229 parentheses.  The print and write formats of the specified variables
6230 will be changed.
6231
6232 Additional lists of variables and formats may be included if they are
6233 delimited by a slash (@samp{/}).
6234
6235 The FORMATS command takes effect immediately.  It is not affected by
6236 conditional and looping structures such as DO IF or LOOP.
6237
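For example, assuming numeric variables SALARY and AGE (hypothetical
names), the command below sets both the print and write formats of
SALARY to a dollar format with two decimal places and those of AGE to a
three-digit integer format:

@example
* SALARY and AGE are hypothetical numeric variables.
FORMATS SALARY (DOLLAR10.2) /AGE (F3.0).
@end example
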
6238 @node LEAVE, MISSING VALUES, FORMATS, Variable Attributes
6239 @section LEAVE
6240 @vindex LEAVE
6241
6242 @display
6243 LEAVE var_list.
6244 @end display
6245
6246 The LEAVE command prevents the specified variables from being
6247 reinitialized whenever a new case is processed.
6248
6249 Normally, when a data file is processed, every variable in the active
6250 file is initialized to the system-missing value or spaces at the
6251 beginning of processing for each case.  When a variable has been
6252 specified on LEAVE, this is not the case.  Instead, that variable is
6253 initialized to 0 (not system-missing) or spaces for the first case.
6254 After that, it retains its value between cases.
6255
This is useful for counters and running totals.  For instance, in the
example below the variable SUM maintains a running total of the values
in the ITEM variable.
6259
6260 @example
6261 DATA LIST /ITEM 1-3.
6262 COMPUTE SUM=SUM+ITEM.
6263 PRINT /ITEM SUM.
LEAVE SUM.
6265 BEGIN DATA.
6266 123
6267 404
6268 555
6269 999
6270 END DATA.
6271 @end example
6272
6273 @noindent Partial output from this example:
6274
6275 @example
6276 123   123.00
6277 404   527.00
6278 555  1082.00
6279 999  2081.00
6280 @end example
6281
6282 It is best to use the LEAVE command immediately before invoking a
6283 procedure command, because it is reset by certain transformations---for
6284 instance, COMPUTE and IF.  LEAVE is also reset by all procedure
6285 invocations.
6286
6287 @node MISSING VALUES, MODIFY VARS, LEAVE, Variable Attributes
6288 @section MISSING VALUES
6289 @vindex MISSING VALUES
6290
6291 @display
6292 MISSING VALUES var_list (missing_values).
6293
6294 missing_values takes one of the following forms:
6295         num1
6296         num1, num2
6297         num1, num2, num3
6298         num1 THRU num2
6299         num1 THRU num2, num3
6300         string1
6301         string1, string2
6302         string1, string2, string3
6303 As part of a range, LO or LOWEST may take the place of num1;
6304 HI or HIGHEST may take the place of num2.
6305 @end display
6306
6307 The MISSING VALUES command sets user-missing values for numeric and
6308 short string variables.  Long string variables may not have missing
6309 values.
6310
Specify a list of variables, followed by a list of their user-missing
values in parentheses.  Up to three discrete values may be given, or,
for numeric variables only, a range of values optionally accompanied by
a single discrete value.  A range may be left open at one end by writing
LO or LOWEST in place of its lower bound, or HI or HIGHEST in place of
its upper bound.
6316
6317 The MISSING VALUES command takes effect immediately.  It is not affected
6318 by conditional and looping constructs such as DO IF or LOOP.
6319
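For example, assuming a numeric variable INCOME and a short string
variable REGION (both hypothetical), the first command below marks every
value of -1 or less, plus the code 99999, as user-missing for INCOME by
combining an open-ended range with one discrete value.  The second
declares a single string missing value for REGION:

@example
* INCOME and REGION are hypothetical variables.
MISSING VALUES INCOME (LO THRU -1, 99999).
MISSING VALUES REGION ('XX').
@end example
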
6320 @node MODIFY VARS, NUMERIC, MISSING VALUES, Variable Attributes
6321 @section MODIFY VARS
6322 @vindex MODIFY VARS
6323
6324 @display
6325 MODIFY VARS
6326         /REORDER=@{FORWARD,BACKWARD@} @{POSITIONAL,ALPHA@} (var_list)@dots{}
6327         /RENAME=(old_names=new_names)@dots{}
6328         /@{DROP,KEEP@}=var_list
6329         /MAP
6330 @end display
6331
The MODIFY VARS command allows variables in the active file to be
reordered, renamed, or deleted.
6334
6335 At least one subcommand must be specified, and no subcommand may be
6336 specified more than once.  DROP and KEEP may not both be specified.
6337
6338 The REORDER subcommand changes the order of variables in the active
6339 file.  Specify one or more lists of variable names in parentheses.  By
6340 default, each list of variables is rearranged into the specified order.
6341 To put the variables into the reverse of the specified order, put
6342 keyword BACKWARD before the parentheses.  To put them into alphabetical
6343 order in the dictionary, specify keyword ALPHA before the parentheses.
6344 BACKWARD and ALPHA may also be combined.
6345
6346 To rename variables in the active file, specify RENAME, an equals sign
6347 (@samp{=}), and lists of the old variable names and new variable names
6348 separated by another equals sign within parentheses.  There must be the
6349 same number of old and new variable names.  Each old variable is renamed to
6350 the corresponding new variable name.  Multiple parenthesized groups of
6351 variables may be specified.
6352
6353 The DROP subcommand deletes a specified list of variables from the
6354 active file.
6355
6356 The KEEP subcommand keeps the specified list of variables in the active
6357 file.  Any unlisted variables are deleted from the active file.
6358
6359 MAP is currently ignored.
6360
6361 MODIFY VARS takes effect immediately.  It does not cause the data to be
6362 read.
6363
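As a sketch, with hypothetical variable names, the command below places
Q1, Q2, and Q3 in reverse order (Q3, Q2, Q1), renames OLDID to ID, and
deletes TEMP1 from the active file, using each subcommand only once:

@example
* All variable names are hypothetical.
MODIFY VARS /REORDER=BACKWARD (Q1 Q2 Q3)
        /RENAME=(OLDID=ID)
        /DROP=TEMP1.
@end example
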
6364 @node NUMERIC, PRINT FORMATS, MODIFY VARS, Variable Attributes
6365 @section NUMERIC
6366 @vindex NUMERIC
6367
6368 @display
6369 NUMERIC /var_list [(fmt_spec)].
6370 @end display
6371
6372 The NUMERIC command explicitly declares new numeric variables,
6373 optionally setting their output formats.
6374
6375 Specify a slash (@samp{/}), followed by the names of the new numeric
6376 variables.  If you wish to set their output formats, follow their names
6377 by an output format specification in parentheses (@pxref{Input/Output
6378 Formats}).  If no output format specification is given then the
6379 variables will default to F8.2.
6380
6381 Variables created with NUMERIC will be initialized to the system-missing
6382 value.
6383
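For example, the command below declares two new numeric variables,
TOTAL and AVERAGE (hypothetical names), with output format F8.3 instead
of the default F8.2.  Both start out system-missing:

@example
* TOTAL and AVERAGE are hypothetical new variables.
NUMERIC /TOTAL AVERAGE (F8.3).
@end example
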
6384 @node PRINT FORMATS, RENAME VARIABLES, NUMERIC, Variable Attributes
6385 @section PRINT FORMATS
6386 @vindex PRINT FORMATS
6387
6388 @display
6389 PRINT FORMATS var_list (fmt_spec).
6390 @end display
6391
6392 The PRINT FORMATS command sets the print formats for the specified
6393 variables to the specified format specification.
6394
6395 Syntax is identical to that of FORMATS (@pxref{FORMATS}), but the PRINT
6396 FORMATS command sets only print formats, not write formats.
6397
6398 @node RENAME VARIABLES, VALUE LABELS, PRINT FORMATS, Variable Attributes
6399 @section RENAME VARIABLES
6400 @vindex RENAME VARIABLES
6401
6402 @display
6403 RENAME VARIABLES (old_names=new_names)@dots{} .
6404 @end display
6405
6406 The RENAME VARIABLES command allows the names of variables in the active
6407 file to be changed.
6408
6409 To rename variables, specify lists of the old variable names and new
6410 variable names, separated by an equals sign (@samp{=}), within
6411 parentheses.  There must be the same number of old and new variable
6412 names.  Each old variable is renamed to the corresponding new variable
6413 name.  Multiple parenthesized groups of variables may be specified.
6414
6415 RENAME VARIABLES takes effect immediately.  It does not cause the data
6416 to be read.
6417
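For instance, the command below uses two parenthesized groups to rename
hypothetical variables V1, V2, and V3 to AGE, INCOME, and SEX,
respectively:

@example
* All variable names are hypothetical.
RENAME VARIABLES (V1 V2=AGE INCOME) (V3=SEX).
@end example
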
6418 @node VALUE LABELS, STRING, RENAME VARIABLES, Variable Attributes
6419 @section VALUE LABELS
6420 @vindex VALUE LABELS
6421
6422 @display
6423 VALUE LABELS
6424         /var_list value 'label' [value 'label']@dots{}
6425 @end display
6426
6427 The VALUE LABELS command allows values of numeric and short string
6428 variables to be associated with labels.  In this way, a short value can
6429 stand for a long value.
6430
6431 In order to set up value labels for a set of variables, specify the
6432 variable names after a slash (@samp{/}), followed by a list of values
6433 and their associated labels, separated by spaces.
6434
6435 Before the VALUE LABELS command is executed, any existing value labels
6436 are cleared from the variables specified.
6437
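As an example, assuming a numeric variable EDUC and a short string
variable REGION (both hypothetical), the commands below attach labels to
their coded values, replacing any labels the variables already had:

@example
* EDUC and REGION are hypothetical variables.
VALUE LABELS /EDUC 1 'Primary' 2 'Secondary' 3 'Tertiary'.
VALUE LABELS /REGION 'N' 'North' 'S' 'South'.
@end example
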
6438 @node STRING, VARIABLE LABELS, VALUE LABELS, Variable Attributes
6439 @section STRING
6440 @vindex STRING
6441
6442 @display
6443 STRING /var_list (fmt_spec).
6444 @end display
6445
6446 The STRING command creates new string variables for use in
6447 transformations.
6448
6449 Specify a slash (@samp{/}), followed by the names of the string
6450 variables to create and the desired output format specification in
6451 parentheses (@pxref{Input/Output Formats}).  Variable widths are
6452 implicitly derived from the specified output formats.
6453
6454 Created variables are initialized to spaces.
6455
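For example, the command below creates two hypothetical string
variables, FNAME and LNAME, each 20 characters wide because of the A20
format, and initializes them to spaces:

@example
* FNAME and LNAME are hypothetical new variables.
STRING /FNAME LNAME (A20).
@end example
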
6456 @node VARIABLE LABELS, VECTOR, STRING, Variable Attributes
6457 @section VARIABLE LABELS
6458 @vindex VARIABLE LABELS
6459
6460 @display
6461 VARIABLE LABELS
6462         /var_list 'var_label'.
6463 @end display
6464
6465 The VARIABLE LABELS command is used to associate an explanatory name
6466 with a group of variables.  This name (a variable label) is displayed by
6467 statistical procedures.
6468
6469 To assign a variable label to a group of variables, specify a slash
6470 (@samp{/}), followed by the list of variable names and the variable
6471 label as a string.
6472
6473 @node VECTOR, WRITE FORMATS, VARIABLE LABELS, Variable Attributes
6474 @section VECTOR
6475 @vindex VECTOR
6476
6477 @display
6478 Two possible syntaxes:
6479         VECTOR vec_name=var_list.
6480         VECTOR vec_name_list(count).
6481 @end display
6482
6483 The VECTOR command allows a group of variables to be accessed as if they
6484 were consecutive members of an array with a vector(index) notation.
6485
6486 To make a vector out of a set of existing variables, specify a name for
6487 the vector followed by an equals sign (@samp{=}) and the variables that
6488 belong in the vector.
6489
6490 To make a vector and create variables at the same time, specify one or
6491 more vector names followed by a count in parentheses.  This will cause
6492 variables named @code{@var{vec}1} through @code{@var{vec}@var{count}} to
6493 be created as numeric variables.  Variable names including numeric
6494 suffixes may not exceed 8 characters in length, and none of the
6495 variables may exist prior to the VECTOR command.
6496
6497 All the variables in a vector must be the same type.
6498
6499 Vectors created with VECTOR disappear after any procedure or
6500 procedure-like command is executed.  The variables contained in the
6501 vectors remain, unless they are scratch variables (@pxref{Scratch
6502 Variables}).
6503
6504 Variables within a vector may be references in expressions using
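The sketch below shows both forms of VECTOR together with vector(index)
references.  It assumes existing numeric variables Q1 through Q5 that
are adjacent in the dictionary (all names are hypothetical), creates new
variables SCORE1 through SCORE5, and sets each one to ten times the
corresponding Q variable:

@example
* Q1 through Q5 are hypothetical existing numeric variables.
VECTOR Q=Q1 TO Q5.
VECTOR SCORE(5).
LOOP #I=1 TO 5.
COMPUTE SCORE(#I)=Q(#I)*10.
END LOOP.
@end example
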
6505 vector(index) syntax.
6506
6507 @node WRITE FORMATS,  , VECTOR, Variable Attributes
6508 @section WRITE FORMATS
6509 @vindex WRITE FORMATS
6510
6511 @display
6512 WRITE FORMATS var_list (fmt_spec).
6513 @end display
6514
6515 The WRITE FORMATS command sets the write formats for the specified
6516 variables to the specified format specification.
6517
6518 Syntax is identical to that of FORMATS (@pxref{FORMATS}), but the WRITE
6519 FORMATS command sets only write formats, not print formats.
6520
6521 @node Data Manipulation, Data Selection, Variable Attributes, Top
6522 @chapter Data transformations
6523 @cindex transformations
6524
The PSPP commands described in this chapter manipulate data and prepare
the active file for later analyses.  As a rule, they do not produce
output.
6528
6529 @menu
6530 * AGGREGATE::                   Summarize multiple cases into a single case.
6531 * AUTORECODE::                  Automatic recoding of variables.
6532 * COMPUTE::                     Assigning a variable a calculated value.
6533 * COUNT::                       Counting variables with particular values.
6534 * FLIP::                        Exchange variables with cases.
6535 * IF::                          Conditionally assigning a calculated value.
6536 * RECODE::                      Mapping values from one set to another.
6537 * SORT CASES::                  Sort the active file.
6538 @end menu
6539
6540 @node AGGREGATE, AUTORECODE, Data Manipulation, Data Manipulation
6541 @section AGGREGATE
6542 @vindex AGGREGATE
6543
6544 @display
6545 AGGREGATE
6546         /BREAK=var_list
6547         /PRESORTED
6548         /OUTFILE=@{*,'filename'@}
6549         /DOCUMENT
6550         /MISSING=COLUMNWISE
6551         /dest_vars=agr_func(src_vars, args@dots{})@dots{}
6552 @end display
6553
6554 The AGGREGATE command summarizes groups of cases into single cases.
6555 Cases are divided into groups that have the same values for one or more
6556 variables called @dfn{break variables}.  Several functions are available
6557 for summarizing case contents.
6558
6559 BREAK is the only required subcommand (in addition, at least one
6560 aggregation variable must be specified).  Specify a list of variable
6561 names.  The values of these variables are used to divide the active file
6562 into groups to be summarized.
6563
6564 By default, the active file is sorted based on the break variables
6565 before aggregation takes place.  If the active file is already sorted,
6566 specify PRESORTED to save time.
6567
6568 The OUTFILE subcommand specifies a system file by file name string or
6569 file handle (@pxref{FILE HANDLE}).  The aggregated cases are sent to
6570 this file.  If OUTFILE is not specified, or if @samp{*} is specified,
6571 then the aggregated cases replace the active file.
6572
6573 Normally the aggregate file does not receive the documents from the
6574 active file, even if the aggregate file replaces the active file.
6575 Specify DOCUMENT to have the documents from the active file copied to
6576 the aggregate file.
6577
6578 At least one aggregation variable must be specified.  Specify a list of
6579 aggregation variables, an equals sign (@samp{=}), an aggregation
6580 function name (see the list below), and a list of source variables in
6581 parentheses.  In addition, some aggregation functions expect additional
6582 arguments in the parentheses following the source variable names.
6583
There must be exactly as many source variables as aggregation variables.
Each aggregation variable receives the results of applying the specified
aggregation function to the corresponding source variable.  Most
aggregation functions may be applied to numeric, short string, and long
string variables.  Others are restricted to numeric values; these are
marked as such in the list below.
6590
6591 Any number of sets of aggregation variables may be specified.
6592
6593 The available aggregation functions are as follows:
6594
6595 @table @asis
6596 @item SUM(var_name)
6597 Sum.  Limited to numeric values.
6598 @item MEAN(var_name)
6599 Arithmetic mean.  Limited to numeric values.
6600 @item SD(var_name)
Standard deviation.  Limited to numeric values.
@item MAX(var_name)
Maximum value.
@item MIN(var_name)
Minimum value.
@item FGT(var_name, value)
@itemx PGT(var_name, value)
Fraction between 0 and 1, or percentage between 0 and 100, respectively,
of values greater than the specified constant.
@item FLT(var_name, value)
@itemx PLT(var_name, value)
Fraction or percentage, respectively, of values less than the specified
constant.
@item FIN(var_name, low, high)
@itemx PIN(var_name, low, high)
Fraction or percentage, respectively, of values within the specified
inclusive range of constants.
@item FOUT(var_name, low, high)
@itemx POUT(var_name, low, high)
Fraction or percentage, respectively, of values strictly outside the
specified range of constants.
@item N(var_name)
Number of non-missing values.
@item N
Number of cases aggregated to form this group.  Don't supply a source
variable for this aggregation function.
@item NU(var_name)
Number of non-missing values.  Each case is considered to have a weight
of 1, regardless of the current weighting variable (@pxref{WEIGHT}).
@item NU
Number of cases aggregated to form this group.  Each case is considered
to have a weight of 1, regardless of the current weighting variable.
@item NMISS(var_name)
Number of missing values.
@item NUMISS(var_name)
Number of missing values.  Each case is considered to have a weight of
1, regardless of the current weighting variable.
@item FIRST(var_name)
First value in this group.
@item LAST(var_name)
Last value in this group.
@end table

When aggregation functions compare string values, the comparisons are
done in terms of internal character codes.  On most modern computers,
this is a form of ASCII.

In addition, there is a parallel set of aggregation functions having the
same names as those above, but with a dot after the last character (for
instance, @samp{SUM.}).  These functions are the same as the above,
except that they cause user-missing values, which are normally excluded
from calculations, to be included.

Normally, only a single case (two for SD and SD.) needs to be
non-missing in each group for the aggregate variable to be non-missing.
If /MISSING=COLUMNWISE is specified, the behavior reverses: a single
missing value is enough to make the aggregate variable missing.

AGGREGATE ignores the current SPLIT FILE settings and causes them to be
canceled (@pxref{SPLIT FILE}).
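
As an illustration only, the following sketch assumes an active file
with hypothetical numeric variables @code{REGION}, @code{AGE}, and
@code{INCOME}, with @code{REGION} named on the BREAK subcommand as the
grouping variable.  It would replace the active file with one case per
value of @code{REGION}, holding the mean age and mean income, the number
of cases in the group, and the percentage of incomes greater than 50000:

@example
* REGION, AGE, and INCOME are hypothetical variables.
AGGREGATE OUTFILE=*
        /BREAK=REGION
        /MAGE MINC=MEAN(AGE INCOME)
        /NCASES=N
        /PRICH=PGT(INCOME, 50000).
@end example

Because aggregation variables are paired with source variables in order,
@code{MAGE} receives the mean of @code{AGE} and @code{MINC} the mean of
@code{INCOME}.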

@node AUTORECODE, COMPUTE, AGGREGATE, Data Manipulation
@section AUTORECODE
@vindex AUTORECODE

@display
AUTORECODE VARIABLES=src_vars INTO dest_vars
        /DESCENDING
        /PRINT
@end display

The AUTORECODE procedure considers the @var{n} values that a variable
takes on and maps them onto values 1@dots{}@var{n} on a new numeric
variable.

Subcommand VARIABLES is the only required subcommand and must come
first.  Specify VARIABLES, an equals sign (@samp{=}), a list of source
variables, INTO, and a list of target variables.  There must be the same
number of source and target variables.  The target variables must not
already exist.

By default, increasing values of a source variable (for a string, this
is based on character code comparisons) are recoded to increasing values
of its target variable.  To cause increasing values of a source variable
to be recoded to decreasing values of its target variable (@var{n} down
to 1), specify DESCENDING.

PRINT is currently ignored.

AUTORECODE is a procedure.  It causes the data to be read.
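
For example, the following sketch assumes a hypothetical string variable
@code{CITY} whose only values are @samp{Austin}, @samp{Boston}, and
@samp{Denver}, and a target name @code{CITYNUM} that is not yet in use:

@example
* CITY is a hypothetical string variable.
AUTORECODE VARIABLES=CITY INTO CITYNUM
        /DESCENDING.
@end example

With DESCENDING, @samp{Austin}, @samp{Boston}, and @samp{Denver} would be
recoded to 3, 2, and 1 in @code{CITYNUM}; without it, they would be
recoded to 1, 2, and 3, following character code order.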

@node COMPUTE, COUNT, AUTORECODE, Data Manipulation
@section COMPUTE
@vindex COMPUTE

@display
COMPUTE var_name = expression.
@end display

@code{COMPUTE} creates a variable with the name specified (if
necessary), then evaluates the given expression for every case and
assigns the result to the variable.  @xref{Expressions}.

Numeric variables created or computed by @code{COMPUTE} are assigned an
output width of 8 characters with two decimal places (@code{F8.2}).
String variables created or computed by @code{COMPUTE} have the same
width as the existing variable or constant.

COMPUTE is a transformation.  It does not cause the active file to be
read.
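
As a minimal sketch, assuming hypothetical numeric variables @code{Q1},
@code{Q2}, and @code{Q3}, the following creates @code{TOTAL} if it does
not already exist and stores the sum of the three variables in it for
each case:

@example
* Q1, Q2, and Q3 are hypothetical numeric variables.
COMPUTE TOTAL = Q1 + Q2 + Q3.
@end example

Under the usual expression rules, a case in which any of the three
operands is missing would be expected to receive the system-missing
value (@pxref{Expressions}).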

@node COUNT, FLIP, COMPUTE, Data Manipulation
@section COUNT
@vindex COUNT

@display
COUNT var_name = var@dots{} (value@dots{}).

Each value takes one of the following forms:
        number
        string
        num1 THRU num2
        MISSING
        SYSMIS
In addition, num1 and num2 can be LO or LOWEST, or HI or HIGHEST,
respectively.
@end display

@code{COUNT} creates or replaces a numeric @dfn{target} variable that
counts the occurrence of a @dfn{criterion} value or set of values over
one or more @dfn{test} variables for each case.

The target variable values are always nonnegative integers.  They are
never missing.  The target variable is assigned an F8.2 output format.
@xref{Input/Output Formats}.  Any variables, including long and short
string variables, may be test variables.

User-missing values of test variables are treated just like any other
values.  They are @strong{not} treated as system-missing values.
User-missing values that are criterion values or inside ranges of
criterion values are counted as any other values.  However (for numeric
variables), keyword @code{MISSING} may be used to refer to all system-
and user-missing values.

@code{COUNT} target variables are assigned values in the order
specified.  In the command @code{COUNT A=A B(1) /B=A B(2).}, the
following actions occur:

@itemize @minus
@item
The number of occurrences of 1 among @code{A} and @code{B} is counted.

@item
@code{A} is assigned this value.

@item
The number of occurrences of 2 among @code{B} and the @strong{new}
value of @code{A} is counted.

@item
@code{B} is assigned this value.
@end itemize

Despite this ordering, all @code{COUNT} test variables must exist
before the command is executed; they may not be created as target
variables earlier in the same command.  Break such a command into two
separate commands.

The examples below may help to clarify.

@enumerate A
@item
Assuming @code{Q0}, @code{Q1}, @dots{}, @code{Q9} are numeric variables,
the following commands:

@enumerate
@item
Count the number of times the value 1 occurs across these variables
for each case and assign the count to variable @code{QCOUNT}.

@item
Print out the total number of times the value 1 occurs throughout
@emph{all} cases using @code{DESCRIPTIVES}.  @xref{DESCRIPTIVES}, for
details.
@end enumerate

@example
COUNT QCOUNT=Q0 TO Q9(1).
DESCRIPTIVES QCOUNT /STATISTICS=SUM.
@end example

@item
Given these same variables, the following commands:

@enumerate
@item
Count the number of valid values of these variables for each case and
assign the count to variable @code{QVALID}.

@item
Multiply each value of @code{QVALID} by 10 to obtain a percentage of
valid values, using @code{COMPUTE}.  @xref{COMPUTE}, for details.

@item
Print out the percentage of valid values across all cases, using
@code{DESCRIPTIVES}.  @xref{DESCRIPTIVES}, for details.
@end enumerate

@example
COUNT QVALID=Q0 TO Q9 (LO THRU HI).
COMPUTE QVALID=QVALID*10.
DESCRIPTIVES QVALID /STATISTICS=MEAN.
@end example
@end enumerate

@node FLIP, IF, COUNT, Data Manipulation
@section FLIP
@vindex FLIP

@display
FLIP /VARIABLES=var_list /NEWNAMES=var_name.
@end display

The FLIP command transposes rows and columns in the active file.  It
causes cases to be swapped with variables, and vice versa.

There are no required subcommands.  The VARIABLES subcommand specifies
variables that will be transformed into cases.  Variables not specified
are discarded.  By default, all variables are selected for
transposition.

The variable specified by NEWNAMES, which must be a string variable, is
used to give names to the variables created by FLIP.  If NEWNAMES is not
specified, then the default is a variable named CASE_LBL, if it exists.
If it does not, then the variables created by FLIP are named VAR000
through VAR999, then VAR1000, VAR1001, and so on.

When a NEWNAMES variable is available, the names must be canonicalized
before becoming variable names.  Invalid characters are replaced by
letter @samp{V} in the first position, or by @samp{_} in subsequent
positions.  If the name thus generated is not unique, then numeric
extensions are added, starting with 1, until a unique name is found or
there are no remaining possibilities.  If the latter occurs then the
FLIP operation aborts.

The resultant dictionary contains a CASE_LBL variable, which stores the
names of the variables in the dictionary before the transposition.  If
the active file is subsequently transposed using FLIP, this variable can
be used to recreate the original variable names.
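
As a sketch, suppose the active file has a hypothetical string variable
@code{NAME}, whose values are valid variable names, and hypothetical
numeric variables @code{X}, @code{Y}, and @code{Z}:

@example
* NAME, X, Y, and Z are hypothetical variables.
FLIP /VARIABLES=X Y Z /NEWNAMES=NAME.
@end example

The transposed active file would contain three cases, one each for
@code{X}, @code{Y}, and @code{Z}, with @code{CASE_LBL} holding those
names, plus one new variable per original case, named from that case's
value of @code{NAME}.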

@node IF, RECODE, FLIP, Data Manipulation
@section IF
@vindex IF

@display
Two possible syntaxes:
        IF test_expr target_var=target_expr.
        IF test_expr target_vec(target_index)=target_expr.
@end display

The IF transformation conditionally assigns the value of a target
expression to a target variable, based on the truth of a test
expression.

Specify a boolean-valued expression (@pxref{Expressions}) to be tested
following the IF keyword.  This expression is calculated for each case.
If the value is true, then the value of target_expr is computed and
assigned to target_var.  If the value is false or missing, nothing is
done.  Numeric and short and long string variables may be used.  The
type of target_expr must match the type of target_var.

For numeric variables only, target_var need not exist before the IF
transformation is executed.  In this case, target_var is assigned the
system-missing value if the IF condition is not true.  String variables
must be declared before they can be used as targets for IF.

In addition to ordinary variables, the target variable may be an element
of a vector.  In this case, the vector index must be specified in
parentheses following the vector name.
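
For example, assuming a hypothetical numeric variable @code{AGE}, the
following pair of IF commands creates a numeric flag variable
@code{SENIOR}:

@example
* AGE is a hypothetical numeric variable.
IF (AGE >= 65) SENIOR = 1.
IF (AGE < 65) SENIOR = 0.
@end example

Because @code{SENIOR} does not exist beforehand, a case for which
neither test is true, such as one in which @code{AGE} is missing, keeps
the system-missing value.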

@node RECODE, SORT CASES, IF, Data Manipulation
@section RECODE
@vindex RECODE

@display
RECODE var_list (src_value@dots{}=dest_value)@dots{} [INTO var_list].

src_value may take the following forms:
        number
        string
        num1 THRU num2
        MISSING
        SYSMIS
        ELSE
Open-ended ranges may be specified using LO or LOWEST for num1
or HI or HIGHEST for num2.

dest_value may take the following forms:
        num
        string
        SYSMIS
        COPY
@end display

The RECODE command is used to translate data from one range of values to
another, using flexible user-specified mappings.  Data may be remapped
in-place or copied to new variables.  Numeric, short string, and long
string data can be recoded.

Specify the list of source variables, followed by one or more mapping
specifications each enclosed in parentheses.  If the data is to be
copied to new variables, specify INTO, then the list of target
variables.  String target variables must already have been declared
using STRING or another transformation, but numeric target variables can
be created on the fly.  There must be exactly as many target variables
as source variables.  Each source variable is remapped into its
corresponding target variable.

When INTO is not used, the input and output variables must be of the
same type.  Otherwise, string values can be recoded into numeric values,
and vice versa.  When this is done and there is no mapping for a
particular value, either a value consisting of all spaces or the
system-missing value is assigned, depending on variable type.

Mappings are considered from left to right.  The first src_value that
matches the value of the source variable causes the target variable to
receive the value indicated by the dest_value.  Literal number, string,
and range src_values should be self-explanatory.  MISSING as a
src_value matches any user- or system-missing value.  SYSMIS matches the
system-missing value only.  ELSE is a catch-all that matches anything.
It should be the last src_value specified.

Numeric and string dest_values should also be self-explanatory.  COPY
causes the input values to be copied to the output.  This is only valid
if the source and target variables are of the same type.  SYSMIS
indicates the system-missing value.

If the source variables are strings and the target variables are
numeric, then there is one additional mapping available: (CONVERT),
which must be the last specified mapping.  CONVERT causes a number
specified as a string to be converted to a numeric value.  If the string
cannot be parsed as a number, then the system-missing value is assigned.

Multiple recodings can be specified on the same RECODE command.
Introduce additional recodings with a slash (@samp{/}) in order to
separate them from the previous recodings.
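
The following sketch combines two recodings on one command, separated
by a slash.  It assumes a hypothetical numeric variable @code{AGE} and a
hypothetical string variable @code{ZIPSTR} that holds digits; the numeric
targets @code{AGEGRP} and @code{ZIPNUM} are created on the fly:

@example
* AGE and ZIPSTR are hypothetical source variables.
RECODE AGE (LO THRU 17=1) (18 THRU 64=2) (65 THRU HI=3) (ELSE=SYSMIS)
        INTO AGEGRP
       /ZIPSTR (CONVERT) INTO ZIPNUM.
@end example

Because mappings are tried from left to right, a value such as 17.5,
which lies in none of the three ranges, falls through to ELSE and
becomes system-missing.  A @code{ZIPSTR} value that cannot be parsed as
a number also yields the system-missing value.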

@node SORT CASES,  , RECODE, Data Manipulation
@section SORT CASES
@vindex SORT CASES

@display
SORT CASES BY var_list.
@end display

SORT CASES sorts the active file by the values of one or more
variables.

Specify BY and a list of variables to sort by.  By default, variables
are sorted in ascending order.  To override sort order, specify (D) or
(DOWN) after a list of variables to get descending order, or (A) or (UP)
for ascending order.  These apply to the entire list of variables
preceding them.
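
For instance, with hypothetical variables @code{REGION} and
@code{INCOME}, the following sorts cases into ascending order of
@code{REGION} and, within each region, into descending order of
@code{INCOME}:

@example
* REGION and INCOME are hypothetical variables.
SORT CASES BY REGION (A) INCOME (D).
@end example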

SORT CASES is a procedure.  It causes the data to be read.

SORT CASES will attempt to sort the entire active file in main memory.
If main memory is exhausted, then it will use a merge sort algorithm
that involves writing and reading numerous temporary files.  Environment
variables determine the temporary files' location.  The first of
SPSSTMPDIR, SPSSXTMPDIR, or TMPDIR that is set determines the location.
Otherwise, if the compiler environment defined P_tmpdir, that is used.
Otherwise, under Unix-like OSes /tmp is used; under MS-DOS, the first of
TEMP, TMP, or root on the current drive is used; under other OSes, the
current directory is used.

@node Data Selection, Conditionals and Looping, Data Manipulation, Top
@chapter Selecting data for analysis

This chapter documents PSPP commands that temporarily or permanently
select data records from the active file for analysis.

@menu
* FILTER::                      Exclude cases based on a variable.
* N OF CASES::                  Limit the size of the active file.
* PROCESS IF::                  Temporarily exclude cases.
* SAMPLE::                      Select a specified proportion of cases.
* SELECT IF::                   Permanently delete unselected cases.
* SPLIT FILE::                  Do multiple analyses with one command.
* TEMPORARY::                   Make transformations' effects temporary.
* WEIGHT::                      Weight cases by a variable.
@end menu

@node FILTER, N OF CASES, Data Selection, Data Selection
@section FILTER
@vindex FILTER

@display
FILTER BY var_name.
FILTER OFF.
@end display

The FILTER command allows a boolean-valued variable to be used to select
cases from the data stream for processing.

In order to set up filtering, specify BY and a variable name.  Keyword
BY is optional but recommended.  Cases which have a zero or system- or
user-missing value are excluded from analysis, but not deleted from the
data stream.  Cases with other values are analyzed.

Use FILTER OFF to turn off case filtering.

Filtering takes place immediately before cases pass to a procedure for
analysis.  Only one filter variable may be active at once.  Normally,
case filtering continues until it is explicitly turned off with FILTER
OFF.  However, if FILTER is placed after TEMPORARY, then filtering stops
after execution of the next procedure or procedure-like command.
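
A minimal sketch, assuming hypothetical numeric variables @code{AGE} and
@code{INCOME}.  The relational expression stores 1 in @code{ADULT} for
cases with @code{AGE} of 18 or more and 0 for the others (or a missing
value if @code{AGE} is missing), so the DESCRIPTIVES command analyzes
only adults while the other cases remain in the active file:

@example
* AGE and INCOME are hypothetical numeric variables.
COMPUTE ADULT = (AGE >= 18).
FILTER BY ADULT.
DESCRIPTIVES INCOME.
FILTER OFF.
@end example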

@node N OF CASES, PROCESS IF, FILTER, Data Selection
@section N OF CASES
@vindex N OF CASES

@display
N [OF CASES] num_of_cases [ESTIMATED].
@end display

Sometimes you may want to disregard cases of your input.  The @code{N}
command can be used to do this.  @code{N 100} tells PSPP to
disregard all cases after the first 100.

If the value specified for @code{N} is greater than the number of cases
read in, the value is ignored.

@code{N} does not discard cases or cause them not to be read in.  It
just causes cases beyond the last one specified to be ignored by data
analysis commands.

A later @code{N} command can increase or decrease the number of cases
selected.  (To select all the cases without knowing how many there are,
specify a very high number: 100000 or whatever you think is large enough.)

Transformation procedures performed after @code{N} is executed
@emph{do} cause cases to be discarded.

The @code{SAMPLE}, @code{PROCESS IF}, and @code{SELECT IF} commands have
precedence over @code{N}; the same results are obtained by both of the
following fragments, given the same random number seeds:

@example
@i{@dots{}set up, read in data@dots{}}
N 100.
SAMPLE .5.
@i{@dots{}analyze data@dots{}}

@i{@dots{}set up, read in data@dots{}}
SAMPLE .5.
N 100.
@i{@dots{}analyze data@dots{}}
@end example

Both fragments above first randomly sample approximately half of the
cases, then select the first 100 of those sampled.

@code{N} with the @code{ESTIMATED} keyword can be used to give an
estimated number of cases before DATA LIST or another command to
read in data.  (@code{ESTIMATED} never limits the number of cases
processed by procedures.)

@node PROCESS IF, SAMPLE, N OF CASES, Data Selection
@section PROCESS IF
@vindex PROCESS IF

@display
PROCESS IF expression.
@end display

The PROCESS IF command is used to temporarily eliminate cases from the
data stream.  Its effects are active only through the execution of the
next procedure or procedure-like command.

Specify a boolean expression (@pxref{Expressions}).  If the value of the
expression is true for a particular case, the case will be analyzed.  If
the expression has a false or missing value, then the case will be
deleted from the data stream for this procedure only.

Regardless of its placement relative to other commands, PROCESS IF
always takes effect immediately before data passes to the procedure.
Only one PROCESS IF command may be in effect at any given time.

The effects of PROCESS IF are similar, but not identical, to the effects
of executing TEMPORARY then SELECT IF (@pxref{SELECT IF}).

Use of PROCESS IF is deprecated.  It is included for compatibility with
old command files.  New syntax files should use SELECT IF or FILTER
instead.

@node SAMPLE, SELECT IF, PROCESS IF, Data Selection
@section SAMPLE
@vindex SAMPLE

@display
SAMPLE num1 [FROM num2].
@end display

@code{SAMPLE} is used to randomly sample a proportion of the cases in
the active file.  @code{SAMPLE} is temporary, affecting only the next
procedure, unless that is a data transformation, such as @code{SELECT IF}
or @code{RECODE}.

The proportion to sample can be expressed as a single number between 0
and 1.  If @code{k} is the number specified, and @code{N} is the number
of currently-selected cases in the active file, then after
@code{SAMPLE @var{k}.}, approximately @code{k*N} cases will be
selected.

The proportion to sample can also be specified in the style @code{SAMPLE
@var{m} FROM @var{N}}.  With this style, cases are selected as follows:

@enumerate
@item
If @var{N} is equal to the number of currently-selected cases in the
active file, exactly @var{m} cases will be selected.

@item
If @var{N} is greater than the number of currently-selected cases in the
active file, an equivalent proportion of cases will be selected.

@item
If @var{N} is less than the number of currently-selected cases in the
active file, exactly @var{m} cases will be selected @emph{from the first
@var{N} cases in the active file.}
@end enumerate

@code{SAMPLE}, @code{SELECT IF}, and @code{PROCESS IF} are performed in
the order specified by the syntax file.

@code{SAMPLE} is ignored before @code{SORT CASES}.

@code{SAMPLE} is always performed before @code{N OF CASES}, regardless
of ordering in the syntax file.  @xref{N OF CASES}.

The same values for @code{SAMPLE} may result in different samples.  To
obtain the same sample, use the @code{SET} command to set the random
number seed to the same value before each @code{SAMPLE}.  By default,
the random number seed is based on the system time.
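
For example, this sketch assumes the SEED setting of the SET command
(@pxref{SET}) and uses it with an arbitrary value so that the sample can
be reproduced, then analyzes roughly a quarter of the cases, chosen at
random, in the next procedure only.  @code{X} is a hypothetical numeric
variable:

@example
* X is a hypothetical numeric variable; 12345 is an arbitrary seed.
SET SEED=12345.
SAMPLE .25.
DESCRIPTIVES X.
@end example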

@node SELECT IF, SPLIT FILE, SAMPLE, Data Selection
@section SELECT IF
@vindex SELECT IF

@display
SELECT IF expression.
@end display

The SELECT IF command is used to select particular cases for analysis
based on the value of a boolean expression.  Cases not selected are
permanently eliminated, unless TEMPORARY is in effect
(@pxref{TEMPORARY}).

Specify a boolean expression (@pxref{Expressions}).  If the value of the
expression is true for a particular case, the case will be analyzed.  If
the expression has a false or missing value, then the case will be
deleted from the data stream.

Always place SELECT IF commands as early in the command file as
possible.  Cases that are deleted early can be processed more
efficiently in time and space.
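
As a sketch with hypothetical numeric variables @code{AGE} and
@code{INCOME}, placing TEMPORARY before SELECT IF limits the selection
to the next procedure, so the first DESCRIPTIVES command sees only cases
with @code{AGE} of 65 or more, while the second sees every case again:

@example
* AGE and INCOME are hypothetical numeric variables.
TEMPORARY.
SELECT IF (AGE >= 65).
DESCRIPTIVES INCOME.
DESCRIPTIVES INCOME.
@end example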

@node SPLIT FILE, TEMPORARY, SELECT IF, Data Selection
@section SPLIT FILE
@vindex SPLIT FILE

@display
Two possible syntaxes:
        SPLIT FILE BY var_list.
        SPLIT FILE OFF.
@end display

The SPLIT FILE command allows multiple sets of data present in one data
file to be analyzed separately using a single statistical procedure
command.

Specify a list of variable names in order to analyze multiple sets of
data separately.  Groups of cases having the same values for these
variables are analyzed by statistical procedure commands as one group.
An independent analysis is carried out for each group of cases, and the
variable values for the group are printed along with the analysis.

Specify OFF in order to disable SPLIT FILE and resume analysis of the
entire active file as a single group of data.
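
For example, with hypothetical variables @code{SEX} and @code{INCOME},
the following produces one set of descriptive statistics for each value
of @code{SEX}.  The sketch sorts the active file first, on the
assumption (as in SPSS) that a group is formed from consecutive cases
with equal split values:

@example
* SEX and INCOME are hypothetical variables.
SORT CASES BY SEX.
SPLIT FILE BY SEX.
DESCRIPTIVES INCOME.
SPLIT FILE OFF.
@end example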

@node TEMPORARY, WEIGHT, SPLIT FILE, Data Selection
@section TEMPORARY
@vindex TEMPORARY

@display
TEMPORARY.
@end display

The TEMPORARY command is used to make the effects of transformations
following its execution temporary.  These transformations will
affect only the execution of the next procedure or procedure-like
command.  Their effects will not be saved to the active file.

The only specification is the command name.

TEMPORARY may not appear within a DO IF or LOOP construct.  It may
appear only once between procedures and procedure-like commands.

An example may help to clarify:

@example
DATA LIST /X 1-2.
BEGIN DATA.
 2
 4
10
15
20
24
END DATA.
COMPUTE X=X/2.
TEMPORARY.
COMPUTE X=X+3.
DESCRIPTIVES X.
DESCRIPTIVES X.
@end example

The data read by the first DESCRIPTIVES command are 4, 5, 8, 10.5, 13,
15.  The data read by the second DESCRIPTIVES command are 1, 2, 5, 7.5,
10, 12.

@node WEIGHT,  , TEMPORARY, Data Selection
@section WEIGHT
@vindex WEIGHT

@display
WEIGHT BY var_name.
WEIGHT OFF.
@end display

WEIGHT can be used to assign cases varying weights in order to
change the frequency distribution of the active file.  Execution of
WEIGHT is delayed until data have been read in.

If a variable name is specified, WEIGHT causes the values of that
variable to be used as weighting factors for subsequent statistical
procedures.  Use of keyword BY is optional but recommended.  Weighting
variables must be numeric.  Scratch variables may not be used for
weighting (@pxref{Scratch Variables}).

When OFF is specified, subsequent statistical procedures will weight all
cases equally.

Weighting values do not need to be integers.  However, negative and
system- and user-missing values for the weighting variable are
interpreted as weighting factors of 0.

WEIGHT does not cause cases in the active file to be replicated in
memory.
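
For instance, if each case summarizes several identical respondents and
a hypothetical numeric variable @code{NRESP} records how many, weighting
by @code{NRESP} makes the statistics for a hypothetical variable
@code{SCORE} reflect every respondent rather than every case:

@example
* NRESP and SCORE are hypothetical numeric variables.
WEIGHT BY NRESP.
DESCRIPTIVES SCORE.
WEIGHT OFF.
@end example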

@node Conditionals and Looping, Statistics, Data Selection, Top
@chapter Conditional and Looping Constructs
@cindex conditionals
@cindex loops
@cindex flow of control
@cindex control flow

This chapter documents PSPP commands used for conditional execution,
looping, and flow of control.

@menu
* BREAK::                       Exit a loop.
* DO IF::                       Conditionally execute a block of code.
* DO REPEAT::                   Textually repeat a code block.
* LOOP::                        Repeat a block of code.
@end menu

@node BREAK, DO IF, Conditionals and Looping, Conditionals and Looping
@section BREAK
@vindex BREAK

@display
BREAK.
@end display

BREAK terminates execution of the innermost currently executing LOOP
construct.

BREAK is allowed only inside a LOOP construct.  @xref{LOOP}, for more
details.
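
As a minimal sketch, the loop below has no index or condition clauses,
so it relies on BREAK to stop.  @code{VAL} is a hypothetical new numeric
variable; doubling it from 1 reaches 128 on the seventh iteration, at
which point BREAK ends the loop, assuming MXLOOPS (@pxref{SET}) is at
least 7:

@example
* VAL is a hypothetical new numeric variable.
COMPUTE VAL = 1.
LOOP.
COMPUTE VAL = VAL * 2.
DO IF (VAL >= 100).
BREAK.
END IF.
END LOOP.
@end example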

@node DO IF, DO REPEAT, BREAK, Conditionals and Looping
@section DO IF
@vindex DO IF

@display
DO IF condition.
        @dots{}
[ELSE IF condition.
        @dots{}
]@dots{}
[ELSE.
        @dots{}]
END IF.
@end display

The DO IF command allows one of several sets of transformations to be
executed, depending on user-specified conditions.

Specify a boolean expression.  If the condition is true, then the block
of code following DO IF is executed.  If the condition is missing, then
none of the code blocks is executed.  If the condition is false, then
the boolean expression on the first ELSE IF, if present, is tested in
turn, with the same rules applied.  If all expressions evaluate to
false, then the ELSE code block is executed, if it is present.
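
For example, with a hypothetical numeric variable @code{AGE}, the
following assigns one of three codes to a new variable @code{AGECAT}.
A case whose @code{AGE} is missing executes none of the blocks, so its
@code{AGECAT} is left system-missing:

@example
* AGE is a hypothetical numeric variable.
DO IF (AGE < 18).
COMPUTE AGECAT = 1.
ELSE IF (AGE < 65).
COMPUTE AGECAT = 2.
ELSE.
COMPUTE AGECAT = 3.
END IF.
@end example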

@node DO REPEAT, LOOP, DO IF, Conditionals and Looping
@section DO REPEAT
@vindex DO REPEAT

@display
DO REPEAT repvar_name=expansion@dots{}.
        @dots{}
END REPEAT [PRINT].

expansion takes one of the following forms:
        var_list
        num_or_range@dots{}
        'string'@dots{}

num_or_range takes one of the following forms:
        number
        num1 TO num2
@end display

The DO REPEAT command causes a block of code to be repeated a number of
times with different variables, numbers, or strings textually
substituted into the block with each repetition.

Specify a repeat variable name followed by an equals sign (@samp{=}) and
the list of replacements.  Replacements can be a list of variables
(which may be existing variables or new variables or a combination
thereof), of numbers, or of strings.  When new variable names are
specified, DO REPEAT creates them as numeric variables.  When numbers
are specified, runs of integers may be indicated with TO notation, for
instance @samp{1 TO 5} and @samp{1 2 3 4 5} would be equivalent.  There
is no equivalent notation for string values.

Multiple repeat variables can be specified.  When this is done, each
variable must have the same number of replacements.

The code within DO REPEAT is repeated as many times as there are
replacements for each variable.  The first time, the first value for
each repeat variable is substituted; the second time, the second value
for each repeat variable is substituted; and so on.

Repeat variable substitutions work like macros.  They take place
anywhere in a line that the repeat variable name occurs as a token,
including command and subcommand names.  For this reason it is not a
good idea to select words commonly used in command and subcommand names
as repeat variable identifiers.

If PRINT is specified on END REPEAT, the commands are printed to the
listing file after substitutions are made, each prefixed by a plus sign
(@samp{+}).
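
As a sketch, assuming hypothetical existing numeric variables @code{Q1},
@code{Q2}, and @code{Q3}, the following uses two repeat variables,
@code{SRC} and @code{DST}, separated by a slash as in SPSS syntax, to
create new numeric variables @code{HI1}, @code{HI2}, and @code{HI3}:

@example
* Q1, Q2, and Q3 are hypothetical existing numeric variables.
DO REPEAT SRC=Q1 Q2 Q3 / DST=HI1 HI2 HI3.
COMPUTE DST = (SRC > 3).
END REPEAT PRINT.
@end example

The block is expanded three times, so it behaves like writing
@code{COMPUTE HI1 = (Q1 > 3).} followed by the corresponding commands for
@code{Q2} and @code{Q3}; because of PRINT, the expanded commands are also
written to the listing file.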

@node LOOP,  , DO REPEAT, Conditionals and Looping
@section LOOP
@vindex LOOP

@display
LOOP [index_var=start TO end [BY incr]] [IF condition].
        @dots{}
END LOOP [IF condition].
@end display

The LOOP command allows a group of commands to be iterated.  A number of
termination options are offered.

Specify index_var in order to make that variable count from one value to
another by a particular increment.  index_var must be a pre-existing
numeric variable.  start, end, and incr are numeric expressions
(@pxref{Expressions}).
7388
7389 During the first iteration, index_var is set to the value of start.
7390 During each successive iteration, index_var is increased by the value of
7391 incr.  If end > start, then the loop terminates when index_var > end;
7392 otherwise it terminates when index_var < end.  If incr is not specified
7393 then it defaults to +1 or -1 as appropriate.
7394
7395 If end > start and incr < 0, or if end < start and incr > 0, then the
7396 loop is never executed.  index_var is nevertheless set to the value of
7397 start.
7398
7399 Modifying index_var within the loop is allowed, but it has no effect on
7400 the value of index_var in the next iteration.
7401
7402 Specify a boolean expression for the condition on the LOOP command to
7403 cause the loop to be executed only if the condition is true.  If the
7404 condition is false or missing before the loop contents are executed the
7405 first time, the loop contents are not executed at all.
7406
7407 If index and condition clauses are both present on LOOP, the index
7408 clause is always evaluated first.
7409
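As an illustration, the sketch below repeats its body while a condition
on LOOP holds.  It assumes that an active file has already been defined
(for example, with DATA LIST), and the variable name @code{n} is
hypothetical.  Because the condition is tested before each pass through
the loop, @code{n} should end at 10.

@example
COMPUTE n = 0.
LOOP IF n < 10.
COMPUTE n = n + 1.
END LOOP.
LIST.
@end example
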
Specify a boolean expression for the condition on END LOOP to cause
the loop to terminate if the condition is true after the enclosed code
block is executed.  The condition is evaluated at the end of the loop,
not at the beginning, so a loop whose only condition is on END LOOP
always executes its body at least once.
7414
If none of the index clause, the LOOP condition, and the END LOOP
condition is present, then the loop is executed MXLOOPS (@pxref{SET})
times or until BREAK (@pxref{BREAK}) is executed.
7418
7419 The BREAK command provides another way to terminate execution of a LOOP
7420 construct.
7421
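The following sketch combines an uncontrolled loop with BREAK.  SET
MXLOOPS (@pxref{SET}) sets the iteration limit to 100 as a safety net,
and the DO IF block executes BREAK once @code{n}, a hypothetical
variable, reaches 10.  It assumes that an active file already exists.

@example
SET MXLOOPS=100.
COMPUTE n = 0.
LOOP.
COMPUTE n = n + 1.
DO IF n >= 10.
BREAK.
END IF.
END LOOP.
LIST.
@end example
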
7422 @node Statistics, Utilities, Conditionals and Looping, Top
7423 @chapter Statistics
7424
7425 This chapter documents the statistical procedures that PSPP supports so
7426 far.
7427
7428 @menu
7429 * DESCRIPTIVES::                Descriptive statistics.
7430 * FREQUENCIES::                 Frequency tables.
7431 * CROSSTABS::                   Crosstabulation tables.
7432 @end menu
7433
7434 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
7435 @section DESCRIPTIVES
7436
7437 @display
7438 DESCRIPTIVES
7439         /VARIABLES=var_list
7440         /MISSING=@{VARIABLE,LISTWISE@} @{INCLUDE,NOINCLUDE@}
7441         /FORMAT=@{LABELS,NOLABELS@} @{NOINDEX,INDEX@} @{LINE,SERIAL@}
7442         /SAVE
7443         /STATISTICS=@{ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
7444                      SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
7445                      SESKEWNESS,SEKURTOSIS@}
7446         /SORT=@{NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
7447                RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME@}
7448               @{A,D@}
7449 @end display
7450
7451 The DESCRIPTIVES procedure reads the active file and outputs descriptive
7452 statistics requested by the user.  In addition, it can optionally
7453 compute Z-scores.
7454
The VARIABLES subcommand, which is required, specifies the list of
variables to be analyzed.  The keyword VARIABLES itself may be omitted.
7457
7458 All other subcommands are optional:
7459
The MISSING subcommand determines the handling of missing values.  If
INCLUDE is set, then user-missing values are included in the
calculations.  If NOINCLUDE is set, which is the default, user-missing
values are excluded.  If VARIABLE is set, then missing values are
excluded on a variable-by-variable basis; if LISTWISE is set, then
the entire case is excluded whenever any value in that case has a
system-missing or, unless INCLUDE is set, user-missing value.
7467
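For instance, the following command, which uses the hypothetical
variables @code{age} and @code{income}, excludes a case from the
calculations for both variables whenever either of them is missing in
that case, rather than excluding missing values one variable at a time:

@example
DESCRIPTIVES /VARIABLES=age income
        /MISSING=LISTWISE.
@end example
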
The FORMAT subcommand affects the output format.  Currently the
LABELS/NOLABELS and NOINDEX/INDEX settings are not used.  When SERIAL is
set, both valid and missing numbers of cases are listed in the output;
when LINE is set, only valid cases are listed.
7472
The SAVE subcommand causes DESCRIPTIVES to calculate Z-scores for all
the specified variables.  The Z-scores are saved to new variables.
Variable names are generated by trying first the original variable name
with Z prepended and truncated to a maximum of 8 characters, then the
names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence.  In addition, Z-score
variable names can be specified explicitly on VARIABLES by enclosing
each one in parentheses after the corresponding variable.
7481
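The sketch below requests a few statistics and saves Z-scores.  The data
and the variable names are hypothetical.  Following the naming rules
above, the Z-scores for @code{height} should be stored in a new variable
named @code{ZHEIGHT}, while those for @code{weight} go to the explicitly
named @code{zweight}.

@example
DATA LIST FREE /height weight.
BEGIN DATA.
170 65
180 80
165 55
END DATA.
DESCRIPTIVES /VARIABLES=height weight (zweight)
        /SAVE
        /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM.
LIST.
@end example
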
7482 The STATISTICS subcommand specifies the statistics to be displayed:
7483
7484 @table @code
7485 @item ALL
7486 All of the statistics below.
7487 @item MEAN
7488 Arithmetic mean.
7489 @item SEMEAN
7490 Standard error of the mean.
7491 @item STDDEV
7492 Standard deviation.
7493 @item VARIANCE
7494 Variance.
7495 @item KURTOSIS
7496 Kurtosis and standard error of the kurtosis.
7497 @item SKEWNESS
7498 Skewness and standard error of the skewness.
7499 @item RANGE
7500 Range.
7501 @item MINIMUM
7502 Minimum value.
7503 @item MAXIMUM
7504 Maximum value.
7505 @item SUM
7506 Sum.
7507 @item DEFAULT
Mean, standard deviation, minimum, maximum.
7509 @item SEKURTOSIS
7510 Standard error of the kurtosis.
7511 @item SESKEWNESS
7512 Standard error of the skewness.
7513 @end table
7514
The SORT subcommand specifies the order in which variables are listed
in the output.  Specifying a statistic sorts the variables by their
values of that statistic; NAME sorts them by variable name.  By default,
variables are listed in the order that they are specified on the
VARIABLES subcommand.  The A and D settings request an ascending or
descending sort order, respectively.
7521
7522 @node FREQUENCIES, CROSSTABS, DESCRIPTIVES, Statistics
7523 @section FREQUENCIES
7524
7525 @display
7526 FREQUENCIES
7527         /VARIABLES=var_list
7528         /FORMAT=@{TABLE,NOTABLE,LIMIT(limit)@}
7529                 @{STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]@}
7530                 @{LABELS,NOLABELS@}
7531                 @{AVALUE,DVALUE,AFREQ,DFREQ@}
7532                 @{SINGLE,DOUBLE@}
7533                 @{OLDPAGE,NEWPAGE@}
7534         /MISSING=@{EXCLUDE,INCLUDE@}
7535         /STATISTICS=@{DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
7536                      KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
7537                      SESKEWNESS,SEKURTOSIS,ALL,NONE@}
7538         /NTILES=ntiles
7539         /PERCENTILES=percent@dots{}
7540
7541 (These options are not currently implemented.)
7542         /BARCHART=@dots{}
7543         /HISTOGRAM=@dots{}
7544         /HBAR=@dots{}
7545         /GROUPED=@dots{}
7546
7547 (Integer mode.)
7548         /VARIABLES=var_list (low,high)@dots{}
7549 @end display
7550
7551 FREQUENCIES causes the data to be read and frequency tables to be built
7552 and output for specified variables.  FREQUENCIES can also calculate and
7553 display descriptive statistics (including median and mode) and
7554 percentiles.
7555
7556 In the future, FREQUENCIES will also support graphical output in the
7557 form of bar charts and histograms.  In addition, it will be able to
7558 support percentiles for grouped data.  (As a historical note, these
7559 options were supported in a version of PSPP written years ago, but the
7560 code has not survived.)
7561
7562 The VARIABLES subcommand is the only required subcommand.  Specify the
7563 variables to be analyzed.  In most cases, this is all that is required.
7564 This is known as @dfn{general mode}.
7565
7566 Occasionally, one may want to invoke a special mode called @dfn{integer
7567 mode}.  Normally, in general mode, PSPP will automatically determine
7568 what values occur in the data.  In integer mode, the user specifies the
7569 range of values that the data assumes.  To invoke this mode, specify a
7570 range of data values in parentheses, separated by a comma.  Data values
7571 inside the range are truncated to the nearest integer, then assigned to
7572 that value.  If values occur outside this range, they are discarded.
7573
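As an example of integer mode, the sketch below (with hypothetical data)
declares that @code{rating} takes values from 1 to 5.  The value 7 lies
outside that range, so according to the rules above it should be
discarded rather than tabulated.

@example
DATA LIST FREE /rating.
BEGIN DATA.
1 2 2 3 3 3 4 5 7
END DATA.
FREQUENCIES /VARIABLES=rating (1,5).
@end example
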
7574 The FORMAT subcommand controls the output format.  It has several
7575 possible settings:
7576
7577 @itemize @bullet
7578 @item
7579 TABLE, the default, causes a frequency table to be output for every
7580 variable specified.  NOTABLE prevents them from being output.  LIMIT
7581 with a numeric argument causes them to be output except when there are
7582 more than the specified number of values in the table.
7583
7584 @item
STANDARD frequency tables contain more complete information, but also
take up more space on the printed page.  CONDENSE frequency tables are
less informative but take up less space.  ONEPAGE with a numeric
argument outputs standard frequency tables if there are no more than the
specified number of values, and condensed tables otherwise.  ONEPAGE
without an argument defaults to a threshold of 50 values.
7591
7592 @item
7593 LABELS causes value labels to be displayed in STANDARD frequency
tables.  NOLABELS prevents this.
7595
7596 @item
7597 Normally frequency tables are sorted in ascending order by value.  This
7598 is AVALUE.  DVALUE tables are sorted in descending order by value.
7599 AFREQ and DFREQ tables are sorted in ascending and descending order,
7600 respectively, by frequency count.
7601
7602 @item
7603 SINGLE spaced frequency tables are closely spaced.  DOUBLE spaced
7604 frequency tables have wider spacing.
7605
7606 @item
7607 OLDPAGE and NEWPAGE are not currently used.
7608 @end itemize
7609
7610 The MISSING subcommand controls the handling of user-missing values.
7611 When EXCLUDE, the default, is set, user-missing values are not included
in frequency tables or statistics.  When INCLUDE is set, user-missing
values are included.  System-missing values are never included in
statistics,
7614 but are listed in frequency tables.
7615
The available STATISTICS are the same as those available in DESCRIPTIVES
(@pxref{DESCRIPTIVES}), with the addition of MEDIAN, the data's median
value, and MODE, the mode.  (If there are multiple modes, the smallest
value is reported.)  By default, the mean, standard deviation, minimum,
and maximum are reported for each variable.
7621
NTILES causes the cut points that divide the data into the specified
number of equal-sized groups to be reported.  For instance,
@code{/NTILES=4} would cause quartiles to be reported.  In addition,
particular percentiles can be requested with the PERCENTILES subcommand.
7625
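The following sketch, which assumes a numeric variable @code{rating} (a
hypothetical name) in the active file, sorts the frequency table by
descending frequency and requests several statistics, the quartiles, and
two additional percentiles:

@example
FREQUENCIES /VARIABLES=rating
        /FORMAT=DFREQ
        /STATISTICS=MEAN MEDIAN MODE
        /NTILES=4
        /PERCENTILES=10 90.
@end example
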
7626 @node CROSSTABS,  , FREQUENCIES, Statistics
7627 @section CROSSTABS
7628
7629 @display
7630 CROSSTABS
7631         /TABLES=var_list BY var_list [BY var_list]@dots{}
7632         /MISSING=@{TABLE,INCLUDE,REPORT@}
7633         /WRITE=@{NONE,CELLS,ALL@}
7634         /FORMAT=@{TABLES,NOTABLES@}
7635                 @{LABELS,NOLABELS,NOVALLABS@}
7636                 @{PIVOT,NOPIVOT@}
7637                 @{AVALUE,DVALUE@}
7638                 @{NOINDEX,INDEX@}
7639                 @{BOX,NOBOX@}
7640         /CELLS=@{COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
7641                 ASRESIDUAL,ALL,NONE@}
7642         /STATISTICS=@{CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
7643                      KAPPA,ETA,CORR,ALL,NONE@}
7644
7645 (Integer mode.)
7646         /VARIABLES=var_list (low,high)@dots{}
7647 @end display
7648
7649 CROSSTABS reads the active file and builds and displays crosstabulation
7650 tables requested by the user.  It can calculate several statistics for
7651 each cell in the crosstabulation tables.  In addition, a number of
7652 statistics can be calculated for each table itself.
7653
7654 The TABLES subcommand is used to specify the tables to be reported.  Any
7655 number of dimensions is permitted, and any number of variables per
7656 dimension is allowed.  The TABLES subcommand may be repeated as many
7657 times as needed.  This is the only required subcommand in @dfn{general
7658 mode}.
7659
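For example, the sketch below (with hypothetical data and variable
names) requests a two-way table of @code{sex} by @code{smoker} and, with
a second TABLES subcommand, a three-way table that adds @code{agegrp} as
a third dimension.

@example
DATA LIST FREE /sex smoker agegrp.
BEGIN DATA.
1 1 1
1 2 2
2 1 1
2 2 2
2 2 1
END DATA.
CROSSTABS /TABLES=sex BY smoker
        /TABLES=sex BY smoker BY agegrp.
@end example
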
7660 Occasionally, one may want to invoke a special mode called @dfn{integer
7661 mode}.  Normally, in general mode, PSPP will automatically determine
7662 what values occur in the data.  In integer mode, the user specifies the
7663 range of values that the data assumes.  To invoke this mode, specify the
7664 VARIABLES subcommand, giving a range of data values in parentheses for
7665 each variable to be used on the TABLES subcommand.  Data values inside
7666 the range are truncated to the nearest integer, then assigned to that
7667 value.  If values occur outside this range, they are discarded.  When it
7668 is present, the VARIABLES subcommand must precede the TABLES subcommand.
7669
7670 The MISSING subcommand determines the handling of user-missing values.
7671 When set to TABLE, the default, missing values are dropped on a table by
7672 table basis.  When set to INCLUDE, user-missing values are included in
7673 tables and statistics.  When set to REPORT, which is allowed only in
7674 integer mode, user-missing values are included in tables but marked with
7675 an @samp{M} (for ``missing'') and excluded from statistical
7676 calculations.
7677
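The sketch below combines integer mode with MISSING=REPORT.  It assumes
the hypothetical variables @code{sex}, coded 1 or 2, and @code{smoker},
coded 1 or 2 with 3 declared user-missing.  Cases whose @code{smoker}
value is 3 should then appear in the table marked with @samp{M}, but be
excluded from the chi-square calculation.

@example
MISSING VALUES smoker (3).
CROSSTABS /VARIABLES=sex (1,2) smoker (1,3)
        /TABLES=sex BY smoker
        /MISSING=REPORT
        /STATISTICS=CHISQ.
@end example
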
7678 Currently the WRITE subcommand is not used.
7679
7680 The FORMAT subcommand controls the characteristics of the
7681 crosstabulation tables to be displayed.  It has a number of possible
7682 settings:
7683
7684 @itemize @bullet
7685 @item
7686 TABLES, the default, causes crosstabulation tables to be output.
7687 NOTABLES suppresses them.
7688
7689 @item
7690 LABELS, the default, allows variable labels and value labels to appear
7691 in the output.  NOLABELS suppresses them.  NOVALLABS displays variable
7692 labels but suppresses value labels.
7693
7694 @item
7695 PIVOT, the default, causes each TABLES subcommand to be displayed in a
7696 pivot table format.  NOPIVOT causes the old-style crosstabulation format
7697 to be used.
7698
7699 @item
7700 AVALUE, the default, causes values to be sorted in ascending order.
DVALUE requests a descending sort order.
7702
7703 @item
7704 INDEX/NOINDEX is currently ignored.
7705
7706 @item
7707 BOX/NOBOX is currently ignored.
7708 @end itemize
7709
7710 The CELLS subcommand controls the contents of each cell in the displayed
7711 crosstabulation table.  The possible settings are:
7712
7713 @table @asis
7714 @item COUNT
7715 Frequency count.
7716 @item ROW
7717 Row percent.
7718 @item COLUMN
7719 Column percent.
7720 @item TOTAL
7721 Table percent.
7722 @item EXPECTED
7723 Expected value.
7724 @item RESIDUAL
7725 Residual.
7726 @item SRESIDUAL
7727 Standardized residual.
7728 @item ASRESIDUAL
7729 Adjusted standardized residual.
7730 @item ALL
7731 All of the above.
7732 @item NONE
7733 Suppress cells entirely.
7734 @end table
7735
7736 @samp{/CELLS} without any settings specified requests COUNT, ROW,
7737 COLUMN, and TOTAL.  If CELLS is not specified at all then only COUNT
7738 will be selected.
7739
7740 The STATISTICS subcommand selects statistics for computation:
7741
7742 @table @asis
7743 @item CHISQ
7744 Pearson chi-square, likelihood ratio, Fisher's exact test, continuity
7745 correction, linear-by-linear association.
7746 @item PHI
7747 Phi.
7748 @item CC
7749 Contingency coefficient.
7750 @item LAMBDA
7751 Lambda.
7752 @item UC
7753 Uncertainty coefficient.
7754 @item BTAU
7755 Tau-b.
7756 @item CTAU
7757 Tau-c.
7758 @item RISK
7759 Risk estimate.
7760 @item GAMMA
7761 Gamma.
7762 @item D
7763 Somers' D.
7764 @item KAPPA
7765 Cohen's Kappa.
7766 @item ETA
7767 Eta.
7768 @item CORR
7769 Spearman correlation, Pearson's r.
7770 @item ALL
7771 All of the above.
7772 @item NONE
7773 No statistics.
7774 @end table
7775
7776 Selected statistics are only calculated when appropriate for the
7777 statistic.  Certain statistics require tables of a particular size, and
7778 some statistics are calculated only in integer mode.
7779
7780 @samp{/STATISTICS} without any settings selects CHISQ.  If the
7781 STATISTICS subcommand is not given, no statistics are calculated.
7782
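For example, the following sketch, which assumes the hypothetical
variables @code{sex} and @code{smoker} in the active file, shows counts,
row percentages, and expected values in each cell, and requests the
chi-square statistics and phi:

@example
CROSSTABS /TABLES=sex BY smoker
        /CELLS=COUNT ROW EXPECTED
        /STATISTICS=CHISQ PHI.
@end example
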
@strong{Please note:} Currently the implementation of CROSSTABS has the
following bugs:
7785
7786 @itemize @bullet
7787 @item
Pearson's R (but not Spearman's R) is slightly inaccurate.
7789 @item
7790 T values for Spearman's R and Pearson's R are wrong.
7791 @item
The correct way to calculate the significance of symmetric and
directional measures has not been determined.
7793 @item
7794 Asymmetric ASEs and T values for lambda are wrong.
7795 @item
7796 ASE of Goodman and Kruskal's tau is not calculated.
7797 @item
ASE of symmetric Somers' d is wrong.
@item
The approximate T of the uncertainty coefficient is wrong.
7801 @end itemize
7802
Fixes for any of these deficiencies would be welcome.
7804
7805 @node Utilities, Not Implemented, Statistics, Top
7806 @chapter Utilities
7807
7808 Commands that don't fit any other category are placed here.
7809
7810 Most of these commands are not affected by commands like IF and LOOP:
7811 they take effect only once, unconditionally, at the time that they are
7812 encountered in the input.
7813
7814 @menu
7815 * COMMENT::                     Document your syntax file.
7816 * DOCUMENT::                    Document the active file.
7817 * DISPLAY DOCUMENTS::           Display active file documents.
7818 * DISPLAY FILE LABEL::          Display the active file label.
7819 * DROP DOCUMENTS::              Remove documents from the active file.
7820 * EXECUTE::                     Execute pending transformations.
* FILE LABEL::                  Set the active file's label.
* FINISH::                      Terminate a batch PSPP session.
7822 * INCLUDE::                     Include a file within the current one.
7823 * QUIT::                        Terminate the PSPP session.
7824 * SET::                         Adjust PSPP runtime parameters.
7825 * SUBTITLE::                    Provide a document subtitle.
7826 * SYSFILE INFO::                Display the dictionary in a system file.
7827 * TITLE::                       Provide a document title.
7828 @end menu
7829
7830 @node COMMENT, DOCUMENT, Utilities, Utilities
7831 @section COMMENT
7832 @vindex COMMENT
7833 @vindex *
7834
7835 @display
Two possible syntaxes:
7837         COMMENT comment text @dots{} .
7838         *comment text @dots{} .
7839 @end display
7840
7841 The COMMENT command is ignored.  It is used to provide information to
7842 the author and other readers of the PSPP syntax file.
7843
7844 A COMMENT command can extend over any number of lines.  Don't forget to
7845 terminate it with a dot or a blank line!
7846
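Both forms appear in the short sketch below; the variable @code{height}
is hypothetical.  Each comment ends at the dot that terminates it, so the
dot is placed only at the end of the comment text.

@example
* Convert height from centimeters to meters.
COMPUTE height = height / 100.
COMMENT This comment continues
        onto a second line.
@end example
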
7847 @node DOCUMENT, DISPLAY DOCUMENTS, COMMENT, Utilities
7848 @section DOCUMENT
7849 @vindex DOCUMENT
7850
7851 @display
7852 DOCUMENT documentary_text.
7853 @end display
7854
7855 The DOCUMENT command adds one or more lines of descriptive commentary to
7856 the active file.  Documents added in this way are saved to system files.
7857 They can be viewed using SYSFILE INFO or DISPLAY DOCUMENTS.  They can be
7858 removed from the active file with DROP DOCUMENTS.
7859
7860 Specify the documentary text following the DOCUMENT keyword.  You can
extend the documentary text over as many lines as necessary.  Lines
longer than 80 characters are truncated.  Don't forget to terminate the
DOCUMENT command with a dot or a blank line.
7864
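For example, the sketch below attaches a two-line document to the active
file and then displays it; the text itself is only illustrative.  Note
that the first line does not end in a dot, which would terminate the
command early.

@example
DOCUMENT This file holds the cleaned pilot survey data
        after cases with invalid ages were removed.
DISPLAY DOCUMENTS.
@end example
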
7865 @node DISPLAY DOCUMENTS, DISPLAY FILE LABEL, DOCUMENT, Utilities
7866 @section DISPLAY DOCUMENTS
7867 @vindex DISPLAY DOCUMENTS
7868
7869 @display
7870 DISPLAY DOCUMENTS.
7871 @end display
7872
7873 DISPLAY DOCUMENTS displays the documents in the active file.  Each
7874 document is preceded by a line giving the time and date that it was
7875 added.  @xref{DOCUMENT}.
7876
7877 @node DISPLAY FILE LABEL, DROP DOCUMENTS, DISPLAY DOCUMENTS, Utilities
7878 @section DISPLAY FILE LABEL
7879 @vindex DISPLAY FILE LABEL
7880
7881 @display
7882 DISPLAY FILE LABEL.
7883 @end display
7884
7885 DISPLAY FILE LABEL displays the file label contained in the active file,
7886 if any.  @xref{FILE LABEL}.
7887
7888 @node DROP DOCUMENTS, EXECUTE, DISPLAY FILE LABEL, Utilities
7889 @section DROP DOCUMENTS
7890 @vindex DROP DOCUMENTS
7891
7892 @display
7893 DROP DOCUMENTS.
7894 @end display
7895
7896 The DROP DOCUMENTS command removes all documents from the active file.
7897 New documents can be added with the DOCUMENT utility (@pxref{DOCUMENT}).
7898
7899 DROP DOCUMENTS only changes the active file.  It does not modify any
7900 system files stored on disk.
7901
7902 @node EXECUTE, FILE LABEL, DROP DOCUMENTS, Utilities
7903 @section EXECUTE
7904 @vindex EXECUTE
7905
7906 @display
7907 EXECUTE.
7908 @end display
7909
7910 The EXECUTE utility causes the active file to be read and all pending
7911 transformations to be executed.
7912
7913 @node FILE LABEL, FINISH, EXECUTE, Utilities
7914 @section FILE LABEL
7915 @vindex FILE LABEL
7916
7917 @display
7918 FILE LABEL file_label.
7919 @end display
7920
7921 Use the FILE LABEL command to provide a title for the active file.  This
7922 title will be saved into system files and portable files that are
7923 created during this PSPP run.
7924
7925 It is not necessary to include quotes around file_label.  If they are
7926 included then they become part of the file label.
7927
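For example, the following sketch sets a label and then displays it.
Because no quotes are used, none become part of the label; the label
text is only illustrative.

@example
FILE LABEL Cleaned pilot survey data.
DISPLAY FILE LABEL.
@end example
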
7930 @node FINISH, INCLUDE, FILE LABEL, Utilities
7931 @section FINISH
7932 @vindex FINISH
7933
7934 @display
7935 FINISH.
7936 @end display
7937
7938 The FINISH command terminates the current PSPP session and returns
7939 control to the operating system.
7940
7941 This command is not valid in interactive mode.
7942
7944 @node INCLUDE, QUIT, FINISH, Utilities
7945 @section INCLUDE
7946 @vindex INCLUDE
7947 @vindex @@
7948
7949 @display
7950 Two possible syntaxes:
7951         INCLUDE 'filename'.
7952         @@filename.
7953 @end display
7954
7955 The INCLUDE command causes the PSPP command processor to read an
7956 additional command file as if it were included bodily in the current
7957 command file.
7958
7959 INCLUDE files may be nested to any depth, up to the limit of available
7960 memory.
7961
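For example, either of the following lines reads the commands in
@file{recode.sps}, a hypothetical file name, at that point in the current
command file.  Both forms are shown here for comparison; a real command
file would normally use only one of them.

@example
INCLUDE 'recode.sps'.
@@recode.sps.
@end example
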
7962 @node QUIT, SET, INCLUDE, Utilities
7963 @section QUIT
7964 @vindex QUIT
7965
7966 @display
7967 Two possible syntaxes:
7968         QUIT.
7969         EXIT.
7970 @end display
7971
7972 The QUIT command terminates the current PSPP session and returns control
7973 to the operating system.
7974
7975 This command is not valid within a command file.
7976
7977 @node SET, SUBTITLE, QUIT, Utilities
7978 @section SET
7979 @vindex SET
7980
7981 @display
7982 SET
7983
7984 (data input)
7985         /BLANKS=@{SYSMIS,'.',number@}
7986         /DECIMAL=@{DOT,COMMA@}
7987         /FORMAT=fmt_spec
7988
7989 (program input)
7990         /ENDCMD='.'
7991         /NULLINE=@{ON,OFF@}
7992
7993 (interaction)
7994         /CPROMPT='cprompt_string'
7995         /DPROMPT='dprompt_string'
7996         /ERRORBREAK=@{OFF,ON@}
7997         /MXERRS=max_errs
7998         /MXWARNS=max_warnings
7999         /PROMPT='prompt'
8000         /VIEWLENGTH=@{MINIMUM,MEDIAN,MAXIMUM,n_lines@}
8001         /VIEWWIDTH=n_characters
8002
8003 (program execution)
8004         /MEXPAND=@{ON,OFF@}
8005         /MITERATE=max_iterations
8006         /MNEST=max_nest
8007         /MPRINT=@{ON,OFF@}
8008         /MXLOOPS=max_loops
8009         /SEED=@{RANDOM,seed_value@}
8010         /UNDEFINED=@{WARN,NOWARN@}
8011
8012 (data output)
8013         /CC@{A,B,C,D,E@}=@{'npre,pre,suf,nsuf','npre.pre.suf.nsuf'@}
8014         /DECIMAL=@{DOT,COMMA@}
8015         /FORMAT=fmt_spec
8016
8017 (output routing)
8018         /ECHO=@{ON,OFF@}
8019         /ERRORS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8020         /INCLUDE=@{ON,OFF@}
8021         /MESSAGES=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8022         /PRINTBACK=@{ON,OFF@}
8023         /RESULTS=@{ON,OFF,TERMINAL,LISTING,BOTH,NONE@}
8024
8025 (output activation)
8026         /LISTING=@{ON,OFF@}
8027         /PRINTER=@{ON,OFF@}
8028         /SCREEN=@{ON,OFF@}
8029
8030 (output driver options)
8031         /HEADERS=@{NO,YES,BLANK@}
8032         /LENGTH=@{NONE,length_in_lines@}
8033         /LISTING=filename
8034         /MORE=@{ON,OFF@}
8035         /PAGER=@{OFF,"pager_name"@}
        /WIDTH=@{NARROW,WIDE,n_characters@}
8037
8038 (logging)
8039         /JOURNAL=@{ON,OFF@} [filename]
8040         /LOG=@{ON,OFF@} [filename]
8041
8042 (system files)
8043         /COMPRESSION=@{ON,OFF@}
8044         /SCOMPRESSION=@{ON,OFF@}
8045
8046 (security)
8047         /SAFER=ON
8048
8049 (obsolete settings accepted for compatibility, but ignored)
8050         /AUTOMENU=@{ON,OFF@}
8051         /BEEP=@{ON,OFF@}
8052         /BLOCK='c'
8053         /BOXSTRING=@{'xxx','xxxxxxxxxxx'@}
8054         /CASE=@{UPPER,UPLOW@}
8055         /COLOR=@dots{}
8056         /CPI=cpi_value
8057         /DISK=@{ON,OFF@}
8058         /EJECT=@{ON,OFF@}
8059         /HELPWINDOWS=@{ON,OFF@}
8060         /HIGHRES=@{ON,OFF@}
8061         /HISTOGRAM='c'
8062         /LOWRES=@{AUTO,ON,OFF@}
8063         /LPI=lpi_value
8064         /MENUS=@{STANDARD,EXTENDED@}
8065         /MXMEMORY=max_memory
8066         /PTRANSLATE=@{ON,OFF@}
8067         /RCOLORS=@dots{}
8068         /RUNREVIEW=@{AUTO,MANUAL@}
8069         /SCRIPTTAB='c'
8070         /TB1=@{'xxx','xxxxxxxxxxx'@}
8071         /TBFONTS='string'
8072         /WORKDEV=drive_letter
8073         /WORKSPACE=workspace_size
8074         /XSORT=@{YES,NO@}
8075 @end display
8076
The SET command allows the user to adjust several parameters relating to
PSPP's execution.  Because SET has many subcommands, they are described
below in groups.
8080
8081 As a general comment, ON and YES are considered synonymous, and
8082 so are OFF and NO, when used as subcommand values.
8083
8084 The data input subcommands affect the way that data is read from data
8085 files.  The data input subcommands are
8086
8087 @table @asis
8088 @item BLANKS
This is the value assigned to a data item that is empty or contains
only whitespace.  An argument of SYSMIS or '.' causes the
system-missing value to be assigned to null items; this is the
default.  Any real value may be assigned.
8093
8094 @item DECIMAL
8095 The default DOT setting causes the decimal point character to be
8096 @samp{.}.  A setting of COMMA causes the decimal point character to be
8097 @samp{,}.
8098
8099 @item FORMAT
8100 Allows the default numeric input/output format to be specified.  The
8101 default is F8.2.  @xref{Input/Output Formats}.
8102 @end table
8103
8104 Program input subcommands affect the way that programs are parsed when
8105 they are typed interactively or run from a script.  They are
8106
8107 @table @asis
8108 @item ENDCMD
8109 This is a single character indicating the end of a command.  The default
8110 is @samp{.}.  Don't change this.
8111
8112 @item NULLINE
8113 Whether a blank line is interpreted as ending the current command.  The
8114 default is ON.
8115 @end table
8116
8117 Interaction subcommands affect the way that PSPP interacts with an
8118 online user.  The interaction subcommands are
8119
8120 @table @asis
8121 @item CPROMPT
8122 The command continuation prompt.  The default is @samp{    > }.
8123
8124 @item DPROMPT
8125 Prompt used when expecting data input within BEGIN DATA (@pxref{BEGIN
8126 DATA}).  The default is @samp{data> }.
8127
8128 @item ERRORBREAK
8129 Whether an error causes PSPP to stop processing the current command
8130 file after finishing the current command.  The default is OFF.
8131
8132 @item MXERRS
8133 The maximum number of errors before PSPP halts processing of the current
8134 command file.  The default is 50.
8135
8136 @item MXWARNS
The maximum number of warnings plus errors before PSPP halts processing
of the current command file.  The default is 100.
8139
8140 @item PROMPT
8141 The command prompt.  The default is @samp{PSPP> }.
8142
8143 @item VIEWLENGTH
The length of the screen in lines.  MINIMUM means 25 lines, MEDIAN and
MAXIMUM mean 43 lines.  Otherwise, specify the number of lines.  PSPP
should normally detect your screen size automatically, so this setting
should rarely be needed.
8148
8149 @item VIEWWIDTH
8150 The width of the screen in characters.  Normally 80 or 132.
8151 @end table
8152
8153 Program execution subcommands control the way that PSPP commands
8154 execute.  The program execution subcommands are
8155
8156 @table @asis
8157 @item MEXPAND
8158 @itemx MITERATE
8159 @itemx MNEST
8160 @itemx MPRINT
8161 Currently not used.
8162
8163 @item MXLOOPS
The maximum number of iterations for an uncontrolled loop, that is, a
LOOP with no index clause and no IF clauses (@pxref{LOOP}).
8165
8166 @item SEED
8167 The initial pseudo-random number seed.  Set to a real number or to
8168 RANDOM, which will obtain an initial seed from the current time of day.
8169
8170 @item UNDEFINED
8171 Currently not used.
8172 @end table
8173
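As a sketch, assuming that an active file already exists, the following
commands fix the random number seed before drawing uniform random
numbers into a hypothetical variable @code{u}.  Running the same
commands again with the same seed should reproduce the same values.  The
use of the UNIFORM function here is an assumption (@pxref{Expressions}).

@example
SET SEED=12345.
COMPUTE u = UNIFORM(1).
LIST.
@end example
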
8174 Data output subcommands affect the format of output data.  These
8175 subcommands are
8176
8177 @table @asis
8178 @item CCA
8179 @itemx CCB
8180 @itemx CCC
8181 @itemx CCD
8182 @itemx CCE
8183 Set up custom currency formats.  The argument is a string which must
8184 contain exactly three commas or exactly three periods.  If commas, then
8185 the grouping character for the currency format is @samp{,}, and the
8186 decimal point character is @samp{.}; if periods, then the situation is
8187 reversed.
8188
8189 The commas or periods divide the string into four fields, which are, in
8190 order, the negative prefix, prefix, suffix, and negative suffix.  When a
8191 value is formatted using the custom currency format, the prefix precedes
8192 the value formatted and the suffix follows it.  In addition, if the
8193 value is negative, the negative prefix precedes the prefix and the
8194 negative suffix follows the suffix.
8195
8196 @item DECIMAL
8197 The default DOT setting causes the decimal point character to be
8198 @samp{.}.  A setting of COMMA causes the decimal point character to be
8199 @samp{,}.
8200
8201 @item FORMAT
8202 Allows the default numeric input/output format to be specified.  The
8203 default is F8.2.  @xref{Input/Output Formats}.
8204 @end table
8205
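For example, the sketch below defines CCA with a negative prefix of
@samp{-}, a prefix of @samp{$}, and empty suffixes, then displays a
hypothetical variable @code{amount} in that format.  Following the rules
for custom currency formats given above, the negative value should be
shown with @samp{-} before the @samp{$}.

@example
SET CCA='-,$,,'.
DATA LIST FREE /amount.
BEGIN DATA.
1234.5 -20
END DATA.
FORMATS amount (CCA12.2).
LIST.
@end example
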
8206 Output routing subcommands affect where the output of transformations
8207 and procedures is sent.  These subcommands are
8208
8209 @table @asis
@item ECHO
If turned on, commands are written to the listing file as they are read
from command files.  The default is OFF.

@item ERRORS
@itemx INCLUDE
@itemx MESSAGES
@itemx PRINTBACK
@itemx RESULTS
Currently not used.
8221 @end table
8222
8223 Output activation subcommands affect whether output devices of
8224 particular types are enabled.  These subcommands are
8225
8226 @table @asis
8227 @item LISTING
8228 Enable or disable listing devices.
8229
8230 @item PRINTER
8231 Enable or disable printer devices.
8232
8233 @item SCREEN
8234 Enable or disable screen devices.
8235 @end table
8236
8237 Output driver option subcommands affect output drivers' settings.  These
8238 subcommands are
8239
8240 @table @asis
8241 @item HEADERS
8242 @itemx LENGTH
8243 @itemx LISTING
8244 @itemx MORE
8245 @itemx PAGER
8246 @itemx WIDTH
8247 Currently not used.
8248 @end table
8249
8250 Logging subcommands affect logging of commands executed to external
8251 files.  These subcommands are
8252
8253 @table @asis
8254 @item JOURNAL
@itemx LOG
8256 Not currently used.
8257 @end table
8258
8259 System file subcommands affect the default format of system files
8260 produced by PSPP.  These subcommands are
8261
8262 @table @asis
8263 @item COMPRESSION
8264 Not currently used.
8265
8266 @item SCOMPRESSION
8267 Whether system files created by SAVE or XSAVE are compressed by default.
8268 The default is ON.
8269 @end table
8270
8271 Security subcommands affect the operations that commands are allowed to
8272 perform.  The security subcommands are
8273
8274 @table @asis
8275 @item SAFER
Once set, this setting cannot be reset, for obvious security
reasons.  Setting this option disables the following operations:
8278
8279 @itemize @bullet
8280 @item
8281 The ERASE command.
8282 @item
8283 The HOST command.
8284 @item
8285 Pipe filenames (filenames beginning or ending with @samp{|}).
8286 @end itemize
8287
8288 Be aware that this setting does not guarantee safety (commands can still
8289 overwrite files, for instance) but it is an improvement.
8290 @end table
8291
8292 @node SUBTITLE, TITLE, SET, Utilities
8293 @section SUBTITLE
8294 @vindex SUBTITLE
8295
8296 @display
8297 Two possible syntaxes:
8298         SUBTITLE 'subtitle_string'.
8299         SUBTITLE subtitle_string.
8300 @end display
8301
The SUBTITLE command is used to provide a subtitle for a particular PSPP
8303 run.  This subtitle appears at the top of each output page below the
8304 title, if headers are enabled on the output device.
8305
8306 Specify a subtitle as a string in quotes.  The alternate syntax that did
8307 not require quotes is now obsolete.  If it is used then the subtitle is
8308 converted to all uppercase.
8309
8310 @node TITLE,  , SUBTITLE, Utilities
8311 @section TITLE
8312 @vindex TITLE
8313
8314 @display
8315 Two possible syntaxes:
8316         TITLE 'title_string'.
8317         TITLE title_string.
8318 @end display
8319
The TITLE command provides a title for a particular PSPP run.
This title appears at the top of each output page, if headers are enabled
on the output device.

Specify a title as a string in quotes.  The alternate syntax that did
not require quotes is now obsolete.  If it is used, the title is
converted to all uppercase.
8327
8328 @node Not Implemented, Data File Format, Utilities, Top
8329 @chapter Not Implemented
8330
8331 This chapter lists parts of the PSPP language that are not yet
8332 implemented.
8333
8334 The following transformations and utilities are not yet implemented, but
8335 they will be supported in a later release.
8336
8337 @itemize @bullet
8338 @item
8339 ADD FILES
8340 @item
8341 ANOVA
8342 @item
8343 DEFINE
8344 @item
8345 FILE TYPE
8346 @item
8347 GET SAS
8348 @item
8349 GET TRANSLATE
8350 @item
8351 MCONVERT
8352 @item
8353 PLOT
8354 @item
8355 PRESERVE
8356 @item
8357 PROCEDURE OUTPUT
8358 @item
8359 RESTORE
8360 @item
8361 SAVE TRANSLATE
8362 @item
8363 SHOW
8364 @item
8365 UPDATE
8366 @end itemize
8367
8368 The following transformations and utilities are not implemented.  There
8369 are no plans to support them in future releases.  Contributions to
8370 implement them will still be accepted.
8371
8372 @itemize @bullet
8373 @item
8374 EDIT
8375 @item
8376 GET DATABASE
8377 @item
8378 GET OSIRIS
8379 @item
8380 GET SCSS
8381 @item
8382 GSET
8383 @item
8384 HELP
8385 @item
8386 INFO
8387 @item
8388 INPUT MATRIX
8389 @item
8390 KEYED DATA LIST
8391 @item
8392 NUMBERED and UNNUMBERED
8393 @item
8394 OPTIONS
8395 @item
8396 REVIEW
8397 @item
8398 SAVE SCSS
8399 @item
8400 SPSS MANAGER
8401 @item
8402 STATISTICS
8403 @end itemize
8404
8405 @node Data File Format, Portable File Format, Not Implemented, Top
8406 @chapter Data File Format
8407
8408 PSPP necessarily uses the same format for system files as do the
8409 products with which it is compatible.  This chapter is a description of
8410 that format.
8411
There are three data types used in system files: 32-bit integers, 64-bit
floating-point numbers, and 1-byte characters.  In this document these will
8414 simply be referred to as @code{int32}, @code{flt64}, and @code{char},
8415 the names that are used in the PSPP source code.  Every field of type
8416 @code{int32} or @code{flt64} is aligned on a 32-bit boundary.
8417
8418 The endianness of data in PSPP system files is not specified.  System
8419 files output on a computer of a particular endianness will have the
8420 endianness of that computer.  However, PSPP can read files of either
8421 endianness, regardless of its host computer's endianness.  PSPP
8422 translates endianness for both integer and floating point numbers.
8423
Floating-point formats are also not specified.  PSPP does not
translate between floating-point formats.  This is unlikely to be a
problem, since nearly all modern computer architectures use IEEE 754
format for floating-point representation.
8428
The PSPP system-missing value is represented by the most negative finite
number in the floating-point format; in C, this is most likely
@code{-DBL_MAX}.  There are two other important values used in missing
values: @code{HIGHEST} and @code{LOWEST}.  These are represented by the
largest possible positive number (probably @code{DBL_MAX}) and the
second most negative number, respectively.  The latter must be determined
in a system-dependent manner; in IEEE 754 format it is represented by the
bit pattern @code{0xffeffffffffffffe}.
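The following sketch shows one way to obtain these three values in C, assuming that @code{flt64} is an IEEE 754 @code{double}.  The function names are hypothetical, not taken from the PSPP sources.  @code{LOWEST} is derived by decrementing the bit pattern of @code{-DBL_MAX}, which, for a negative IEEE 754 number, moves one step toward zero.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Sketch assuming flt64 is an IEEE 754 double; names are hypothetical. */

/* System-missing value: the most negative finite double. */
static double
sysmis_value (void)
{
  return -DBL_MAX;
}

/* HIGHEST: the largest finite double. */
static double
highest_value (void)
{
  return DBL_MAX;
}

/* Raw 64-bit pattern of a double. */
static uint64_t
double_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* LOWEST: the second most negative finite double.  For a negative
   IEEE 754 value, subtracting 1 from the bit pattern reduces the
   magnitude by one unit in the last place. */
static double
lowest_value (void)
{
  uint64_t u = double_bits (-DBL_MAX) - 1;
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}
```

Note that @code{LOWEST} obtained this way compares greater than the system-missing value.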
8437
System files are divided into records.  Except for the file header
record (whose type is a 4-character code) and the data records, each
record begins with an @code{int32} giving a numeric record type.
Individual record types are described below:
8441
8442 @menu
8443 * File Header Record::
8444 * Variable Record::
8445 * Value Label Record::
8446 * Value Label Variable Record::
8447 * Document Record::
8448 * Machine int32 Info Record::
8449 * Machine flt64 Info Record::
8450 * Miscellaneous Informational Records::
8451 * Dictionary Termination Record::
8452 * Data Record::
8453 @end menu
8454
8455 @node File Header Record, Variable Record, Data File Format, Data File Format
8456 @section File Header Record
8457
8458 The file header is always the first record in the file.
8459
8460 @example
8461 struct sysfile_header
8462   @{
8463     char                rec_type[4];
8464     char                prod_name[60];
8465     int32               layout_code;
8466     int32               case_size;
8467     int32               compressed;
8468     int32               weight_index;
8469     int32               ncases;
8470     flt64               bias;
8471     char                creation_date[9];
8472     char                creation_time[8];
8473     char                file_label[64];
8474     char                padding[3];
8475   @};
8476 @end example
8477
8478 @table @code
8479 @item char rec_type[4];
8480 Record type code.  Always set to @samp{$FL2}.  This is the only record
8481 for which the record type is not of type @code{int32}.
8482
8483 @item char prod_name[60];
8484 Product identification string.  This always begins with the characters
8485 @samp{@@(#) SPSS DATA FILE}.  PSPP uses the remaining characters to
8486 give its version and the operating system name; for example, @samp{GNU
8487 pspp 0.1.4 - sparc-sun-solaris2.5.2}.  The string is truncated if it
8488 would be longer than 60 characters; otherwise it is padded on the right
8489 with spaces.
8490
8491 @item int32 layout_code;
8492 Always set to 2.  PSPP reads this value in order to determine the
8493 file's endianness.
8494
8495 @item int32 case_size;
8496 Number of data elements per case.  This is the number of variables,
8497 except that long string variables add extra data elements (one for every
8498 8 characters after the first 8).
8499
8500 @item int32 compressed;
8501 Set to 1 if the data in the file is compressed, 0 otherwise.
8502
8503 @item int32 weight_index;
8504 If one of the variables in the data set is used as a weighting variable,
set to the index of that variable, counting from 1.  Otherwise, set to 0.
8506
8507 @item int32 ncases;
8508 Set to the number of cases in the file if it is known, or -1 otherwise.
8509
In the general case it is not possible to determine the number of cases
that will be output to a system file at the time that the header is
written.  PSPP deals with this by writing the entire system file,
including the header, and then seeking back to the beginning of the file
to rewrite just the @code{ncases} field.  For `files' that do not
support seeking, such as pipes, the seek operation fails, and
@code{ncases} remains -1.
8517
8518 @item flt64 bias;
Compression bias.  Always set to 100.  The significance of this value is
that only whole numbers between @code{(1 - bias)} and @code{(251 - bias)},
inclusive, can be compressed.
8522
8523 @item char creation_date[9];
Set to the date of creation of the system file, in @samp{dd mmm yy}
format, with the month given as a standard three-letter English
abbreviation with an initial capital letter followed by lowercase letters
(for example, @samp{Jan}).  If the date is not available, then this field
is arbitrarily set to @samp{01 Jan 70}.
8528
8529 @item char creation_time[8];
8530 Set to the time of creation of the system file, in @samp{hh:mm:ss}
8531 format and using 24-hour time.  If the time is not available then this
8532 field is arbitrarily set to @samp{00:00:00}.
8533
8534 @item char file_label[64];
Set to the file label declared by the user, if any.  Padded on the
8536 right with spaces.
8537
8538 @item char padding[3];
8539 Ignored padding bytes to make the structure a multiple of 32 bits in
8540 length.  Set to zeros.
8541 @end table
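Two header fields need special handling by readers and writers.  The sketch below, with hypothetical helper names, shows how a reader might infer the byte order from the raw bytes of @code{layout_code}, and how a writer might patch @code{ncases} once the case count is known.  The offset 80 is computed from the structure above (4 + 60 + 4 @times{} 4 bytes).  The sketch writes @code{ncases} in host byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte offset of ncases within sysfile_header:
   rec_type (4) + prod_name (60) + four int32 fields (16) = 80. */
#define NCASES_OFFSET 80L

enum byte_order { ORDER_UNKNOWN = -1, ORDER_BIG = 0, ORDER_LITTLE = 1 };

/* Decide the file's byte order from the four raw bytes of layout_code,
   which must equal 2 in the file's own byte order. */
static enum byte_order
layout_code_order (const unsigned char raw[4])
{
  uint32_t be = ((uint32_t) raw[0] << 24) | ((uint32_t) raw[1] << 16)
                | ((uint32_t) raw[2] << 8) | (uint32_t) raw[3];
  uint32_t le = ((uint32_t) raw[3] << 24) | ((uint32_t) raw[2] << 16)
                | ((uint32_t) raw[1] << 8) | (uint32_t) raw[0];
  if (be == 2)
    return ORDER_BIG;
  if (le == 2)
    return ORDER_LITTLE;
  return ORDER_UNKNOWN;
}

/* After the whole file has been written, try to seek back and
   overwrite ncases.  Returns -1 if the stream cannot seek, in which
   case the -1 written originally stays in place. */
static int
patch_ncases (FILE *f, int32_t ncases)
{
  if (fseek (f, NCASES_OFFSET, SEEK_SET) != 0)
    return -1;
  if (fwrite (&ncases, sizeof ncases, 1, f) != 1)
    return -1;
  return fflush (f) == 0 ? 0 : -1;
}
```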
8542
8543 @node Variable Record, Value Label Record, File Header Record, Data File Format
8544 @section Variable Record
8545
The variable records must immediately follow the header.  There
must be one variable record for every variable and every 8 characters in
a long string beyond the first 8; i.e., there must be exactly as many
variable records as the value specified for @code{case_size} in the file
header record.
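As a sketch (the helper names are hypothetical), the number of variable records, and hence @code{case_size}, can be computed from each variable's @code{type} value as described below: 0 for a numeric variable, otherwise the string width.

```c
/* Number of variable records (8-byte case elements) needed for one
   variable.  type is 0 for numeric, otherwise the string width. */
static int
elements_for_variable (int type)
{
  return type == 0 ? 1 : (type + 7) / 8;
}

/* Summing over all variables gives the header's case_size. */
static int
compute_case_size (const int *types, int n_vars)
{
  int case_size = 0;
  for (int i = 0; i < n_vars; i++)
    case_size += elements_for_variable (types[i]);
  return case_size;
}
```

A writer would then emit the first record for each variable with its real @code{type} and the remaining @code{elements_for_variable (type) - 1} records with @code{type} set to -1.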
8551
8552 @example
8553 struct sysfile_variable
8554   @{
8555     int32               rec_type;
8556     int32               type;
8557     int32               has_var_label;
8558     int32               n_missing_values;
8559     int32               print;
8560     int32               write;
8561     char                name[8];
8562
8563     /* The following two fields are present
8564        only if has_var_label is 1. */
8565     int32               label_len;
8566     char                label[/* variable length */];
8567
8568     /* The following field is present only
8569        if n_missing_values is not 0. */
8570     flt64               missing_values[/* variable length*/];
8571   @};
8572 @end example
8573
8574 @table @code
8575 @item int32 rec_type;
8576 Record type code.  Always set to 2.
8577
8578 @item int32 type;
8579 Variable type code.  Set to 0 for a numeric variable.  For a short
8580 string variable or the first part of a long string variable, this is set
8581 to the width of the string.  For the second and subsequent parts of a
8582 long string variable, set to -1, and the remaining fields in the
8583 structure are ignored.
8584
8585 @item int32 has_var_label;
8586 If this variable has a variable label, set to 1; otherwise, set to 0.
8587
8588 @item int32 n_missing_values;
8589 If the variable has no missing values, set to 0.  If the variable has
8590 one, two, or three discrete missing values, set to 1, 2, or 3,
respectively.  If the variable has a range of missing values, set to
-2; if the variable has a range of missing values plus a single
8593 discrete value, set to -3.
8594
8595 @item int32 print;
8596 Print format for this variable.  See below.
8597
8598 @item int32 write;
8599 Write format for this variable.  See below.
8600
8601 @item char name[8];
Variable name.  The variable name must begin with a capital letter or
the at-sign (@samp{@@}).  Subsequent characters may also be digits,
octothorpes (@samp{#}), dollar signs (@samp{$}), underscores (@samp{_}),
or full stops (@samp{.}).  The variable name is padded on the right with
spaces.
8606
8607 @item int32 label_len;
8608 This field is present only if @code{has_var_label} is set to 1.  It is
8609 set to the length, in characters, of the variable label, which must be a
8610 number between 0 and 120.
8611
8612 @item char label[/* variable length */];
8613 This field is present only if @code{has_var_label} is set to 1.  It has
8614 length @code{label_len}, rounded up to the nearest multiple of 32 bits.
8615 The first @code{label_len} characters are the variable's variable label.
8616
8617 @item flt64 missing_values[/* variable length */];
8618 This field is present only if @code{n_missing_values} is not 0.  It has
8619 the same number of elements as the absolute value of
8620 @code{n_missing_values}.  For discrete missing values, each element
8621 represents one missing value.  When a range is present, the first
8622 element denotes the minimum value in the range, and the second element
8623 denotes the maximum value in the range.  When a range plus a value are
8624 present, the third element denotes the additional discrete missing
8625 value.  HIGHEST and LOWEST are indicated as described in the chapter
8626 introduction.
8627 @end table
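A minimal sketch, for numeric variables only, of how a reader might test a value against the missing-value specification decoded from @code{n_missing_values} and @code{missing_values}.  The function name is hypothetical.  Because @code{HIGHEST} and @code{LOWEST} are stored as ordinary finite values, ranges that use them need no special case here.

```c
#include <stdbool.h>

/* Sketch for numeric variables only; the name is hypothetical.
   n is n_missing_values and mv holds its abs(n) flt64 elements. */
static bool
is_user_missing (double x, int n, const double mv[])
{
  switch (n)
    {
    case 1:
    case 2:
    case 3:
      /* Discrete missing values. */
      for (int i = 0; i < n; i++)
        if (x == mv[i])
          return true;
      return false;

    case -2:
      /* Range: mv[0] is the minimum, mv[1] the maximum. */
      return x >= mv[0] && x <= mv[1];

    case -3:
      /* Range plus one discrete value in mv[2]. */
      return (x >= mv[0] && x <= mv[1]) || x == mv[2];

    default:
      /* 0 (no missing values) or an invalid code. */
      return false;
    }
}
```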
8628
The @code{print} and @code{write} members of @code{sysfile_variable} are output
8630 formats coded into @code{int32} types.  The LSB (least-significant byte)
8631 of the @code{int32} represents the number of decimal places, and the
8632 next two bytes in order of increasing significance represent field width
8633 and format type, respectively.  The MSB (most-significant byte) is not
8634 used and should be set to zero.
8635
8636 Format types are defined as follows:
8637 @table @asis
8638 @item 0
8639 Not used.
8640 @item 1
8641 @code{A}
8642 @item 2
8643 @code{AHEX}
8644 @item 3
8645 @code{COMMA}
8646 @item 4
8647 @code{DOLLAR}
8648 @item 5
8649 @code{F}
8650 @item 6
8651 @code{IB}
8652 @item 7
8653 @code{PIBHEX}
8654 @item 8
8655 @code{P}
8656 @item 9
8657 @code{PIB}
8658 @item 10
8659 @code{PK}
8660 @item 11
8661 @code{RB}
8662 @item 12
8663 @code{RBHEX}
8664 @item 13
8665 Not used.
8666 @item 14
8667 Not used.
8668 @item 15
8669 @code{Z}
8670 @item 16
8671 @code{N}
8672 @item 17
8673 @code{E}
8674 @item 18
8675 Not used.
8676 @item 19
8677 Not used.
8678 @item 20
8679 @code{DATE}
8680 @item 21
8681 @code{TIME}
8682 @item 22
8683 @code{DATETIME}
8684 @item 23
8685 @code{ADATE}
8686 @item 24
8687 @code{JDATE}
8688 @item 25
8689 @code{DTIME}
8690 @item 26
8691 @code{WKDAY}
8692 @item 27
8693 @code{MONTH}
8694 @item 28
8695 @code{MOYR}
8696 @item 29
8697 @code{QYR}
8698 @item 30
8699 @code{WKYR}
8700 @item 31
8701 @code{PCT}
8702 @item 32
8703 @code{DOT}
8704 @item 33
8705 @code{CCA}
8706 @item 34
8707 @code{CCB}
8708 @item 35
8709 @code{CCC}
8710 @item 36
8711 @code{CCD}
8712 @item 37
8713 @code{CCE}
8714 @item 38
8715 @code{EDATE}
8716 @item 39
8717 @code{SDATE}
8718 @end table
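The packing of @code{print} and @code{write} can be sketched as follows; the helper names are hypothetical.  For example, an @code{F8.2} format has type 5 (from the table above), width 8, and 2 decimal places, so it is stored as @code{0x00050802}.

```c
#include <stdint.h>

/* Pack a format into an int32: decimals in the least-significant byte,
   width in the next byte, type in the byte after that; the MSB is 0. */
static int32_t
encode_format (int type, int width, int decimals)
{
  return (int32_t) (((uint32_t) (type & 0xff) << 16)
                    | ((uint32_t) (width & 0xff) << 8)
                    | (uint32_t) (decimals & 0xff));
}

/* Unpack a print or write format. */
static void
decode_format (int32_t format, int *type, int *width, int *decimals)
{
  uint32_t u = (uint32_t) format;
  *decimals = (int) (u & 0xff);
  *width = (int) ((u >> 8) & 0xff);
  *type = (int) ((u >> 16) & 0xff);
}
```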
8719
8720 @node Value Label Record, Value Label Variable Record, Variable Record, Data File Format
8721 @section Value Label Record
8722
Value label records must follow the variable records and must precede
the dictionary termination record.  Other than this, they may appear
anywhere in the system file.  Every value label record must be
immediately followed by a value label variable record, described below.
8727
8728 Value label records begin with @code{rec_type}, an @code{int32} value
8729 set to the record type of 3.  This is followed by @code{count}, an
8730 @code{int32} value set to the number of value labels present in this
8731 record.
8732
8733 These two fields are followed by a series of @code{count} tuples.  Each
8734 tuple is divided into two fields, the value and the label.  The first of
8735 these, the value, is composed of a 64-bit value, which is either a
8736 @code{flt64} value or up to 8 characters (padded on the right to 8
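bytes) denoting a short string value.  Whether the value is a
@code{flt64} or a character string is not defined inside the value label
record; it follows from the type of the variables listed in the value
label variable record that comes immediately after it.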
8740
8741 The second field in the tuple, the label, has variable length.  The
8742 first @code{char} is a count of the number of characters in the value
label.  The remainder of the field is the label itself.  The field,
including the count byte, is padded on the right to a multiple of 64
bits in length.
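Reading the description above as saying that the count byte is included in the padded length, the size of a label field can be computed with a sketch like this (the helper name is hypothetical):

```c
/* Bytes occupied by one label field in a value label record: one count
   byte plus label_len characters, rounded up to a multiple of 8. */
static int
value_label_field_size (int label_len)
{
  return (1 + label_len + 7) / 8 * 8;
}
```

Each tuple therefore occupies 8 bytes for the value plus @code{value_label_field_size (label_len)} bytes for the label.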
8745
8746 @node Value Label Variable Record, Document Record, Value Label Record, Data File Format
8747 @section Value Label Variable Record
8748
8749 Every value label variable record must be immediately preceded by a
8750 value label record, described above.
8751
8752 @example
8753 struct sysfile_value_label_variable
8754   @{
8755      int32              rec_type;
8756      int32              count;
8757      int32              vars[/* variable length */];
8758   @};
8759 @end example
8760
8761 @table @code
8762 @item int32 rec_type;
8763 Record type.  Always set to 4.
8764
8765 @item int32 count;
Number of variables to which the value labels from the associated value
label record are to be applied.

@item int32 vars[/* variable length */];
8770 A list of variables to which to apply the value labels.  There are
8771 @code{count} elements.
8772 @end table
8773
8774 @node Document Record, Machine int32 Info Record, Value Label Variable Record, Data File Format
8775 @section Document Record
8776
8777 There must be no more than one document record per system file.
8778 Document records must follow the variable records and precede the
8779 dictionary termination record.
8780
8781 @example
8782 struct sysfile_document
8783   @{
8784     int32               rec_type;
8785     int32               n_lines;
8786     char                lines[/* variable length */][80];
8787   @};
8788 @end example
8789
8790 @table @code
8791 @item int32 rec_type;
8792 Record type.  Always set to 6.
8793
8794 @item int32 n_lines;
8795 Number of lines of documents present.
8796
8797 @item char lines[/* variable length */][80];
8798 Document lines.  The number of elements is defined by @code{n_lines}.
8799 Lines shorter than 80 characters are padded on the right with spaces.
8800 @end table
8801
8802 @node Machine int32 Info Record, Machine flt64 Info Record, Document Record, Data File Format
8803 @section Machine @code{int32} Info Record
8804
8805 There must be no more than one machine @code{int32} info record per
8806 system file.  Machine @code{int32} info records must follow the variable
8807 records and precede the dictionary termination record.
8808
8809 @example
8810 struct sysfile_machine_int32_info
8811   @{
8812     /* Header. */
8813     int32               rec_type;
8814     int32               subtype;
8815     int32               size;
8816     int32               count;
8817
8818     /* Data. */
8819     int32               version_major;
8820     int32               version_minor;
8821     int32               version_revision;
8822     int32               machine_code;
8823     int32               floating_point_rep;
8824     int32               compression_code;
8825     int32               endianness;
8826     int32               character_code;
8827   @};
8828 @end example
8829
8830 @table @code
8831 @item int32 rec_type;
8832 Record type.  Always set to 7.
8833
8834 @item int32 subtype;
8835 Record subtype.  Always set to 3.
8836
8837 @item int32 size;
8838 Size of each piece of data in the data part, in bytes.  Always set to 4.
8839
8840 @item int32 count;
8841 Number of pieces of data in the data part.  Always set to 8.
8842
8843 @item int32 version_major;
8844 PSPP major version number.  In version @var{x}.@var{y}.@var{z}, this
8845 is @var{x}.
8846
8847 @item int32 version_minor;
8848 PSPP minor version number.  In version @var{x}.@var{y}.@var{z}, this
8849 is @var{y}.
8850
8851 @item int32 version_revision;
8852 PSPP version revision number.  In version @var{x}.@var{y}.@var{z},
8853 this is @var{z}.
8854
8855 @item int32 machine_code;
Machine code.  PSPP always sets this field to -1, but other
values may appear.
8858
8859 @item int32 floating_point_rep;
8860 Floating point representation code.  For IEEE 754 systems this is 1.
8861 IBM 370 sets this to 2, and DEC VAX E to 3.
8862
8863 @item int32 compression_code;
8864 Compression code.  Always set to 1.
8865
8866 @item int32 endianness;
8867 Machine endianness.  1 indicates big-endian, 2 indicates little-endian.
8868
8869 @item int32 character_code;
8870 Character code.  1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3
8871 indicates 8-bit ASCII, 4 indicates DEC Kanji.
8872 @end table
8873
8874 @node Machine flt64 Info Record, Miscellaneous Informational Records, Machine int32 Info Record, Data File Format
8875 @section Machine @code{flt64} Info Record
8876
8877 There must be no more than one machine @code{flt64} info record per
8878 system file.  Machine @code{flt64} info records must follow the variable
8879 records and precede the dictionary termination record.
8880
8881 @example
8882 struct sysfile_machine_flt64_info
8883   @{
8884     /* Header. */
8885     int32               rec_type;
8886     int32               subtype;
8887     int32               size;
8888     int32               count;
8889
8890     /* Data. */
8891     flt64               sysmis;
8892     flt64               highest;
8893     flt64               lowest;
8894   @};
8895 @end example
8896
8897 @table @code
8898 @item int32 rec_type;
Record type.  Always set to 7.
8900
8901 @item int32 subtype;
8902 Record subtype.  Always set to 4.
8903
8904 @item int32 size;
Size of each piece of data in the data part, in bytes.  Always set to 8.
8906
8907 @item int32 count;
8908 Number of pieces of data in the data part.  Always set to 3.
8909
8910 @item flt64 sysmis;
8911 The system missing value.
8912
8913 @item flt64 highest;
8914 The value used for HIGHEST in missing values.
8915
8916 @item flt64 lowest;
8917 The value used for LOWEST in missing values.
8918 @end table
8919
8920 @node Miscellaneous Informational Records, Dictionary Termination Record, Machine flt64 Info Record, Data File Format
8921 @section Miscellaneous Informational Records
8922
8923 Miscellaneous informational records must follow the variable records and
8924 precede the dictionary termination record.
8925
8926 Miscellaneous informational records are ignored by PSPP when reading
8927 system files.  They are not written by PSPP when writing system files.
8928
8929 @example
8930 struct sysfile_misc_info
8931   @{
8932     /* Header. */
8933     int32               rec_type;
8934     int32               subtype;
8935     int32               size;
8936     int32               count;
8937
8938     /* Data. */
8939     char                data[/* variable length */];
8940   @};
8941 @end example
8942
8943 @table @code
8944 @item int32 rec_type;
Record type.  Always set to 7.
8946
8947 @item int32 subtype;
8948 Record subtype.  May take any value.
8949
8950 @item int32 size;
8951 Size of each piece of data in the data part.  Should have the value 4 or
8952 8, for @code{int32} and @code{flt64}, respectively.
8953
8954 @item int32 count;
8955 Number of pieces of data in the data part.
8956
8957 @item char data[/* variable length */];
8958 Arbitrary data.  There must be @code{size} times @code{count} bytes of
8959 data.
8960 @end table
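Since PSPP ignores these records, a reader only needs their length to skip over them.  A minimal sketch, with a hypothetical helper name, assuming the fields are already in host byte order and that @code{rec_type} has just been read:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* p points just past rec_type, at subtype, size, and count.
   Returns the number of bytes from p to the end of the record
   (12 header bytes plus size * count data bytes), or 0 if the
   header is implausible.  Assumes host byte order. */
static size_t
misc_record_length (const unsigned char *p)
{
  int32_t size, count;

  /* The subtype at p[0..3] does not affect the length. */
  memcpy (&size, p + 4, sizeof size);
  memcpy (&count, p + 8, sizeof count);
  if (size <= 0 || count < 0)
    return 0;
  return 3 * sizeof (int32_t) + (size_t) size * (size_t) count;
}
```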
8961
8962 @node Dictionary Termination Record, Data Record, Miscellaneous Informational Records, Data File Format
8963 @section Dictionary Termination Record
8964
8965 The dictionary termination record must follow all other records, except
8966 for the actual cases, which it must precede.  There must be exactly one
8967 dictionary termination record in every system file.
8968
8969 @example
8970 struct sysfile_dict_term
8971   @{
8972     int32               rec_type;
8973     int32               filler;
8974   @};
8975 @end example
8976
8977 @table @code
8978 @item int32 rec_type;
8979 Record type.  Always set to 999.
8980
8981 @item int32 filler;
8982 Ignored padding.  Should be set to 0.
8983 @end table
8984
8985 @node Data Record,  , Dictionary Termination Record, Data File Format
8986 @section Data Record
8987
Data records must follow all other records in the system file.  There must
8989 be at least one data record in every system file.
8990
8991 The format of data records varies depending on whether the data is
8992 compressed.  Regardless, the data is arranged in a series of 8-byte
8993 elements.
8994
When data is not compressed, every case is composed of @code{case_size}
of these 8-byte elements, where @code{case_size} comes from the file
header record (@pxref{File Header Record}).  Each element corresponds to
the variable declared in the respective variable record (@pxref{Variable
Record}).  Numeric values are given in @code{flt64} format; string
values are literal character strings, padded on the right with spaces
when necessary.
9002
9003 Compressed data is arranged in the following manner: the first 8-byte
9004 element in the data section is divided into a series of 1-byte command
9005 codes.  These codes have meanings as described below:
9006
9007 @table @asis
9008 @item 0
9009 Ignored.  If the program writing the system file accumulates compressed
9010 data in blocks of fixed length, 0 bytes can be used to pad out extra
9011 bytes remaining at the end of a fixed-size block.
9012
9013 @item 1 through 251
9014 These values indicate that the corresponding numeric variable has the
9015 value @code{(@var{code} - @var{bias})} for the case being read, where
@var{code} is the value of the compression code and @var{bias} is the
@code{bias} field from the file header.  For example,
9018 code 105 with bias 100.0 (the normal value) indicates a numeric variable
9019 of value 5.
9020
9021 @item 252
9022 End of file.  This code may or may not appear at the end of the data
stream.  PSPP always outputs this code, but its use is not required.
9024
9025 @item 253
This value indicates that the numeric or string value is not
compressible.  The value is stored literally in an 8-byte element
following the current block of command bytes: the first 253 code in a
block refers to the first element after the block, the second 253 code
to the second element, and so on.
9031
9032 @item 254
9033 Used to indicate a string value that is all spaces.
9034
9035 @item 255
9036 Used to indicate the system-missing value.
9037 @end table
9038
When all eight command bytes in a block have been interpreted, the
non-compressible values stored after the block are skipped, and the next
8-byte block of command bytes is read and interpreted.  This continues
until the end of the file is reached.
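The following sketch, with hypothetical function names, shows both directions for numeric data.  @code{compression_code} picks the code a writer could use for a value, and @code{decode_numeric} expands a compressed data section into values.  It assumes that every variable is numeric (so code 254 is treated as an error), that the file's byte order and floating-point format match the host's, and that the whole data section is in memory.

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Compression code (1..251) for x, or -1 if x must be stored
   literally (code 253).  Only whole numbers in
   [1 - bias, 251 - bias] qualify. */
static int
compression_code (double x, double bias)
{
  if (!(x >= 1.0 - bias && x <= 251.0 - bias))
    return -1;                         /* also rejects NaN and SYSMIS */
  double shifted = x + bias;
  int code = (int) shifted;
  return (double) code == shifted ? code : -1;
}

/* Decode up to max_out numeric values from buf[0..len).  Returns the
   number of values decoded, or -1 on malformed input. */
static long
decode_numeric (const unsigned char *buf, size_t len, double bias,
                double *out, size_t max_out)
{
  size_t pos = 0, n = 0;

  while (pos + 8 <= len)
    {
      const unsigned char *cmd = buf + pos;  /* 8 command bytes */
      pos += 8;                              /* literal elements follow */
      for (int i = 0; i < 8; i++)
        {
          unsigned char code = cmd[i];
          if (code == 0)
            continue;                        /* padding */
          if (code == 252)
            return (long) n;                 /* end of data */
          if (n >= max_out)
            return -1;                       /* output buffer full */
          if (code <= 251)
            out[n++] = code - bias;          /* compressed value */
          else if (code == 253)
            {
              if (pos + 8 > len)
                return -1;                   /* truncated input */
              memcpy (&out[n++], buf + pos, sizeof out[0]);
              pos += 8;
            }
          else if (code == 255)
            out[n++] = -DBL_MAX;             /* system-missing */
          else
            return -1;                       /* 254: all-spaces string */
        }
    }
  return (long) n;
}
```

A writer would check for the system-missing value first (code 255) and fall back to code 253 whenever @code{compression_code} returns -1.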
9043
9044 @node Portable File Format, q2c Input Format, Data File Format, Top
9045 @chapter Portable File Format
9046
9047 These days, most computers use the same internal data formats for
9048 integer and floating-point data, if one ignores little differences like
9049 big- versus little-endian byte ordering.  However, occasionally it is
9050 necessary to exchange data between systems with incompatible data
9051 formats.  This is what portable files are designed to do.
9052
9053 @strong{Please note:} Although all of the following information is
9054 correct, as far as the author has been able to ascertain, it is gleaned
9055 from examination of ASCII-formatted portable files only, so some of it
9056 may be incorrect in the general case.
9057
9058 @menu
9059 * Portable File Characters::
9060 * Portable File Structure::
9061 * Portable File Header::
9062 * Version and Date Info Record::
9063 * Identification Records::
9064 * Variable Count Record::
9065 * Variable Records::
9066 * Value Label Records::
9067 * Portable File Data::
9068 @end menu
9069
9070 @node Portable File Characters, Portable File Structure, Portable File Format, Portable File Format
9071 @section Portable File Characters
9072
9073 Portable files are arranged as a series of lines of exactly 80
9074 characters each.  Each line is terminated by a carriage-return,
9075 line-feed sequence (henceforth, ``newline'').  Newlines are not
9076 delimiters: they are only used to avoid line-length limitations existing
9077 on some operating systems.
9078
9079 The file must be terminated with a @samp{Z} character.  In addition, if
9080 the final line in the file does not have exactly 80 characters, then it
9081 is padded on the right with @samp{Z} characters.  (The file contents may
9082 be in any character set; the file contains a description of its own
9083 character set, as explained in the next section.  Therefore, the
9084 @samp{Z} character is not necessarily an ASCII @samp{Z}.)
9085
9086 For the rest of the description of the portable file format, newlines
9087 and the trailing @samp{Z}s will be ignored, as if they did not exist,
9088 because they are not an important part of understanding the file
9089 contents.
9090
9091 @node Portable File Structure, Portable File Header, Portable File Characters, Portable File Format
9092 @section Portable File Structure
9093
9094 Every portable file consists of the following records, in sequence:
9095
9096 @itemize @bullet
9097
9098 @item
9099 File header.
9100
9101 @item
9102 Version and date info.
9103
9104 @item
9105 Product identification.
9106
9107 @item
9108 Subproduct identification (optional).
9109
9110 @item
9111 Variable count.
9112
9113 @item
9114 Variables.  Each variable record may optionally be followed by a
9115 missing value record and a variable label record.
9116
9117 @item
9118 Value labels (optional).
9119
9120 @item
9121 Data.
9122 @end itemize
9123
9124 Most records are identified by a single-character tag code.  The file
9125 header and version info record do not have a tag.
9126
9127 Other than these single-character codes, there are three types of fields
9128 in a portable file: floating-point, integer, and string.  Floating-point
9129 fields have the following format:
9130
9131 @itemize @bullet
9132
9133 @item
9134 Zero or more leading spaces.
9135
9136 @item
9137 Optional asterisk (@samp{*}), which indicates a missing value.  The
9138 asterisk must be followed by a single character, generally a period
9139 (@samp{.}), but it appears that other characters may also be possible.
9140 This completes the specification of a missing value.
9141
9142 @item
9143 Optional minus sign (@samp{-}) to indicate a negative number.
9144
9145 @item
9146 A whole number, consisting of one or more base-30 digits: @samp{0}
9147 through @samp{9} plus capital letters @samp{A} through @samp{T}.
9148
9149 @item
9150 A fraction, consisting of a radix point (@samp{.}) followed by one or
9151 more base-30 digits (optional).
9152
9153 @item
9154 An exponent, consisting of a plus or minus sign (@samp{+} or @samp{-})
9155 followed by one or more base-30 digits (optional).
9156
9157 @item
9158 A forward slash (@samp{/}).
9159 @end itemize
9160
9161 Integer fields take form identical to floating-point fields, but they
9162 may not contain a fraction.
9163
String fields take the form of an integer field having value @var{n},
9165 followed by exactly @var{n} characters, which are the string content.
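A minimal parser for numeric fields might look like the sketch below.  It assumes that the field has already been translated to ASCII using the character set table described in the next section, and it treats the exponent as a power of 30; the text above does not state the exponent's base, so that part is an assumption.  Integer fields can be parsed with the same function.  The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of a base-30 digit ('0'-'9', 'A'-'T'), or -1 if c is not one. */
static int
base30_digit (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'A' && c <= 'T')
    return c - 'A' + 10;
  return -1;
}

/* Parse one numeric field from ASCII text S.  Returns the number of
   characters consumed, or 0 on a syntax error.  ASSUMPTION: the
   exponent multiplies the value by a power of 30. */
static size_t
parse_pfm_number (const char *s, double *value, bool *missing)
{
  const char *p = s;
  bool negative = false;
  double num = 0.0;
  int ndigits = 0;

  *missing = false;
  while (*p == ' ')
    p++;
  if (*p == '*')                       /* missing value: '*' plus one char */
    {
      if (p[1] == '\0')
        return 0;
      *missing = true;
      *value = 0.0;
      return (size_t) (p + 2 - s);
    }
  if (*p == '-')
    {
      negative = true;
      p++;
    }
  for (; base30_digit (*p) >= 0; p++, ndigits++)   /* whole part */
    num = num * 30.0 + base30_digit (*p);
  if (*p == '.')                                   /* fraction */
    {
      double frac = 0.0, denom = 1.0;
      for (p++; base30_digit (*p) >= 0; p++, ndigits++)
        {
          frac = frac * 30.0 + base30_digit (*p);
          denom *= 30.0;
        }
      num += frac / denom;
    }
  if (ndigits == 0)
    return 0;
  if (*p == '+' || *p == '-')                      /* exponent */
    {
      bool exp_negative = *p == '-';
      int exponent = 0, exp_digits = 0;
      for (p++; base30_digit (*p) >= 0; p++, exp_digits++)
        if (exponent < 1000)                       /* avoid int overflow */
          exponent = exponent * 30 + base30_digit (*p);
      if (exp_digits == 0)
        return 0;
      for (int i = 0; i < exponent; i++)
        num = exp_negative ? num / 30.0 : num * 30.0;
    }
  if (*p != '/')
    return 0;
  *value = negative ? -num : num;
  return (size_t) (p + 1 - s);
}
```

For example, @samp{A/} parses as 10 and @samp{1.F/} as 1.5.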
9166
9167 @node Portable File Header, Version and Date Info Record, Portable File Structure, Portable File Format
9168 @section Portable File Header
9169
9170 Every portable file begins with a 464-byte header, consisting of a
9171 200-byte collection of vanity splash strings, followed by a 256-byte
9172 character set translation table, followed by an 8-byte tag string.
9173
9174 The 200-byte segment is divided into five 40-byte sections, each of
9175 which represents the string @code{ASCII SPSS PORT FILE} in a different
9176 character set encoding.  (If the file is encoded in EBCDIC then the
9177 string is actually @code{EBCDIC SPSS PORT FILE}, and so on.)  These
9178 strings are padded on the right with spaces in their own character set.
9179
9180 It appears that these strings exist only to inform those who might view
9181 the file on a screen, and that they are not parsed by SPSS products.
9182 Thus, they can be safely ignored.  For those interested, the strings are
9183 supposed to be in the following character sets, in the specified order:
9184 EBCDIC, 7-bit ASCII, CDC 6-bit ASCII, 6-bit ASCII, Honeywell 6-bit
9185 ASCII.
9186
9187 The 256-byte segment describes a mapping from the character set used in
9188 the portable file to an arbitrary character set having characters at the
9189 following positions:
9190
9191 @table @asis
9192 @item 0--60
9193
9194 Control characters.  Not important enough to describe in full here.
9195
9196 @item 61--63
9197
9198 Reserved.
9199
9200 @item 64--73
9201
9202 Digits @samp{0} through @samp{9}.
9203
9204 @item 74--99
9205
9206 Capital letters @samp{A} through @samp{Z}.
9207
9208 @item 100--125
9209
9210 Lowercase letters @samp{a} through @samp{z}.
9211
9212 @item 126
9213
9214 Space.
9215
9216 @item 127--130
9217
9218 Symbols @code{.<(+}
9219
9220 @item 131
9221
9222 Solid vertical pipe.
9223
9224 @item 132--142
9225
9226 Symbols @code{&[]!$*);^-/}
9227
9228 @item 143
9229
9230 Broken vertical pipe.
9231
9232 @item 144--150
9233
9234 Symbols @code{,%_>}?@code{`:}   @c @code{?} is an inverted question mark
9235
9236 @item 151
9237
9238 British pound symbol.
9239
9240 @item 152--155
9241
9242 Symbols @code{@@'="}.
9243
9244 @item 156
9245
9246 Less than or equal symbol.
9247
9248 @item 157
9249
9250 Empty box.
9251
9252 @item 158
9253
9254 Plus or minus.
9255
9256 @item 159
9257
9258 Filled box.
9259
9260 @item 160
9261
9262 Degree symbol.
9263
9264 @item 161
9265
9266 Dagger.
9267
9268 @item 162
9269
9270 Symbol @samp{~}.
9271
9272 @item 163
9273
9274 En dash.
9275
9276 @item 164
9277
9278 Lower left corner box draw.
9279
9280 @item 165
9281
9282 Upper left corner box draw.
9283
9284 @item 166
9285
9286 Greater than or equal symbol.
9287
9288 @item 167--176
9289
9290 Superscript @samp{0} through @samp{9}.
9291
9292 @item 177
9293
9294 Lower right corner box draw.
9295
9296 @item 178
9297
9298 Upper right corner box draw.
9299
9300 @item 179
9301
9302 Not equal symbol.
9303
9304 @item 180
9305
9306 Em dash.
9307
9308 @item 181
9309
9310 Superscript @samp{(}.
9311
9312 @item 182
9313
9314 Superscript @samp{)}.
9315
9316 @item 183
9317
9318 Horizontal dagger (?).
9319
9320 @item 184--186
9321
Symbols @samp{@{@}\}.

@item 187

Cent symbol.
9326
9327 @item 188
9328
9329 Centered dot, or bullet.
9330
9331 @item 189--255
9332
9333 Reserved.
9334 @end table
9335
9336 Symbols that are not defined in a particular character set are set to
9337 the same value as symbol 64; i.e., to @samp{0}.
9338
9339 The 8-byte tag string consists of the exact characters @code{SPSSPORT}
9340 in the portable file's character set, which can be used to verify that
9341 the file is indeed a portable file.
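
As an illustration of how a reader might use the 256-byte segment together with the tag string, here is a minimal C sketch.  It handles only positions 64--126 (digits, letters, and space) and assumes the local character set is ASCII.  Following the rule above, it treats any position whose byte equals that of position 64 as undefined.  The names @code{build_translation} and @code{is_portable_tag} are hypothetical, not part of PSPP.

@verbatim
```c
#include <stdbool.h>
#include <string.h>

/* Local (ASCII) character for portable-file position POS.  Only the
   positions needed here (digits, letters, space) are handled; any other
   position yields 0. */
static char position_to_ascii(int pos)
{
    if (pos >= 64 && pos <= 73)
        return (char) ('0' + (pos - 64));
    if (pos >= 74 && pos <= 99)
        return (char) ('A' + (pos - 74));
    if (pos >= 100 && pos <= 125)
        return (char) ('a' + (pos - 100));
    if (pos == 126)
        return ' ';
    return 0;
}

/* Fills TRANS so that TRANS[b] is the local character for file byte b,
   using TAB, the 256-byte character-set segment of the header.  Bytes
   with no known meaning map to 0. */
static void build_translation(const unsigned char tab[256], char trans[256])
{
    memset(trans, 0, 256);
    for (int pos = 64; pos <= 126; pos++) {
        /* Positions that the file's character set lacks repeat the byte
           of position 64 ('0'); skip them so they do not overwrite it. */
        if (pos != 64 && tab[pos] == tab[64])
            continue;
        trans[tab[pos]] = position_to_ascii(pos);
    }
}

/* Checks the 8-byte tag string: it must read SPSSPORT once translated. */
static bool is_portable_tag(const unsigned char tag[8], const char trans[256])
{
    static const char expected[] = "SPSSPORT";
    for (int i = 0; i < 8; i++)
        if (trans[tag[i]] != expected[i])
            return false;
    return true;
}
```
@end verbatim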
9342
9343 @node Version and Date Info Record, Identification Records, Portable File Header, Portable File Format
9344 @section Version and Date Info Record
9345
9346 This record does not have a tag code.  It has the following structure:
9347
9348 @itemize @bullet
9349 @item
9350 A single character identifying the file format version.  The letter A
9351 represents version 0, and so on.
9352
9353 @item
9354 An 8-character string field giving the file creation date in the format
9355 YYYYMMDD.
9356
9357 @item
9358 A 6-character string field giving the file creation time in the format
9359 HHMMSS.
9360 @end itemize
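
The following C sketch decodes the three fields of this record.  It assumes the fields have already been read and translated to local ASCII characters.  The @code{pf_version_info} structure and the @code{parse_version_info} function are hypothetical.  Accepting only @samp{A} through @samp{Z} as version letters is also an assumption.

@verbatim
```c
#include <ctype.h>
#include <stdbool.h>

struct pf_version_info {
    int version;                /* 'A' = 0, 'B' = 1, ... */
    int year, month, day;
    int hour, minute, second;
};

/* Parses N decimal digits at S into *OUT; false if any is not a digit. */
static bool parse_digits(const char *s, int n, int *out)
{
    int value = 0;
    for (int i = 0; i < n; i++) {
        if (!isdigit((unsigned char) s[i]))
            return false;
        value = value * 10 + (s[i] - '0');
    }
    *out = value;
    return true;
}

/* VERSION_CHAR, DATE (YYYYMMDD) and TIME (HHMMSS) are the record's three
   fields, already translated to the local character set. */
static bool parse_version_info(char version_char, const char date[8],
                               const char time[6],
                               struct pf_version_info *info)
{
    if (version_char < 'A' || version_char > 'Z')
        return false;
    info->version = version_char - 'A';
    return parse_digits(date, 4, &info->year)
        && parse_digits(date + 4, 2, &info->month)
        && parse_digits(date + 6, 2, &info->day)
        && parse_digits(time, 2, &info->hour)
        && parse_digits(time + 2, 2, &info->minute)
        && parse_digits(time + 4, 2, &info->second);
}
```
@end verbatim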
9361
9362 @node Identification Records, Variable Count Record, Version and Date Info Record, Portable File Format
9363 @section Identification Records
9364
9365 The product identification record has tag code @samp{1}.  It consists of
9366 a single string field giving the name of the product that wrote the
9367 portable file.
9368
9369 The subproduct identification record has tag code @samp{3}.  It
9370 consists of a single string field giving additional information on the
9371 product that wrote the portable file.
9372
9373 @node Variable Count Record, Variable Records, Identification Records, Portable File Format
9374 @section Variable Count Record
9375
9376 The variable count record has tag code @samp{4}.  It consists of two
9377 integer fields.  The first contains the number of variables in the file
9378 dictionary.  The purpose of the second is unknown; it contains the value
9379 161 in all portable files examined so far.
9380
9381 @node Variable Records, Value Label Records, Variable Count Record, Portable File Format
9382 @section Variable Records
9383
9384 Each variable record represents a single variable.  Variable records
9385 have tag code @samp{7}.  They have the following structure:
9386
9387 @itemize @bullet
9388
9389 @item
9390 Width (integer).  This is 0 for a numeric variable, and a number between 1
9391 and 255 for a string variable.
9392
9393 @item
9394 Name (string).  1--8 characters long.  Must be in all capitals.
9395
9396 @item
9397 Print format.  This is a set of three integer fields:
9398
9399 @itemize @minus
9400
9401 @item
9402 Format type (@pxref{Variable Record}).
9403
9404 @item
9405 Format width.  1--40.
9406
9407 @item
9408 Number of decimal places.  1--40.
9409 @end itemize
9410
9411 @item
9412 Write format.  Same structure as the print format described above.
9413 @end itemize
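
Below is a C sketch of one possible in-memory form of a decoded variable record, with a check of the ranges listed above.  The structure and function names are hypothetical.  The sketch does not check decimal places.  It approximates ``all capitals'' by rejecting lowercase letters.

@verbatim
```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* A print or write format: three integer fields. */
struct pf_format {
    int type;       /* format type code */
    int width;      /* 1--40 */
    int decimals;   /* number of decimal places */
};

/* One variable record (tag code 7), after decoding. */
struct pf_variable {
    int width;              /* 0 = numeric, 1--255 = string */
    char name[9];           /* 1--8 characters plus terminating null */
    struct pf_format print;
    struct pf_format write;
};

/* Checks the ranges described above.  Decimal places and the format
   type code are not checked in this sketch. */
static bool pf_variable_is_valid(const struct pf_variable *v)
{
    size_t len = strlen(v->name);

    if (v->width < 0 || v->width > 255)
        return false;
    if (len < 1 || len > 8)
        return false;
    for (size_t i = 0; i < len; i++)
        if (islower((unsigned char) v->name[i]))   /* names are all capitals */
            return false;
    return v->print.width >= 1 && v->print.width <= 40
        && v->write.width >= 1 && v->write.width <= 40;
}
```
@end verbatim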
9414
9415 Each variable record can optionally be followed by a missing value
9416 record, which has tag code @samp{8}.  A missing value record has one
field, the missing value itself (a floating-point number or a string, as
appropriate).  Up to three of these missing value records can be used.
9419
9420 There is also a record for missing value ranges, which has tag code
9421 @samp{B}.  It is followed by two fields representing the range, which
9422 are floating-point or string as appropriate.  If a missing value range
9423 is present, it may be followed by a single missing value record.
9424
9425 Tag codes @samp{9} and @samp{A} represent @code{LO THRU @var{x}} and
9426 @code{@var{x} THRU HI} ranges, respectively.  Each is followed by a
9427 single field representing @var{x}.  If one of the ranges is present, it
9428 may be followed by a single missing value record.
9429
9430 In addition, each variable record can optionally be followed by a
9431 variable label record, which has tag code @samp{C}.  A variable label
9432 record has one field, the variable label itself (string).
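
The rules above limit which optional records may follow a variable record.  This C sketch checks a list of the tag codes of those records, given as a string such as @samp{B8C}.  It assumes at most one variable label record per variable, and it does not check the order of the records.  The name @code{optional_records_valid} is hypothetical.

@verbatim
```c
#include <stdbool.h>

/* TAGS holds the tag codes of the optional records that follow one
   variable record, e.g. "B8C".  Returns true if the combination is
   allowed by the rules above. */
static bool optional_records_valid(const char *tags)
{
    int discrete = 0, ranges = 0, labels = 0;

    for (const char *p = tags; *p != '\0'; p++) {
        switch (*p) {
        case '8':                   /* single missing value */
            discrete++;
            break;
        case '9':                   /* LO THRU x */
        case 'A':                   /* x THRU HI */
        case 'B':                   /* range with two fields */
            ranges++;
            break;
        case 'C':                   /* variable label */
            labels++;
            break;
        default:                    /* not an optional variable record */
            return false;
        }
    }
    if (labels > 1)
        return false;
    /* Up to three discrete values, or one range plus at most one value. */
    return ranges == 0 ? discrete <= 3 : ranges == 1 && discrete <= 1;
}
```
@end verbatim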
9433
9434 @node Value Label Records, Portable File Data, Variable Records, Portable File Format
9435 @section Value Label Records
9436
9437 Value label records have tag code @samp{D}.  They have the following
9438 format:
9439
9440 @itemize @bullet
9441 @item
9442 Variable count (integer).
9443
9444 @item
9445 List of variables (strings).  The variable count specifies the number in
9446 the list.  Variables are specified by their names.  All variables must
9447 be of the same type (numeric or string).
9448
9449 @item
9450 Label count (integer).
9451
9452 @item
9453 List of (value, label) tuples.  The label count specifies the number of
9454 tuples.  Each tuple consists of a value, which is numeric or string as
9455 appropriate to the variables, followed by a label (string).
9456 @end itemize
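
A value label record might be held in memory as sketched below in C.  The sketch also shows a lookup of the label for a numeric value.  The type and function names are hypothetical.  The sketch does not check that all listed variables share one type.

@verbatim
```c
#include <stdbool.h>
#include <stddef.h>

/* A value is numeric or string, matching the type of the variables. */
struct pf_value {
    bool is_string;
    double number;          /* meaningful when !is_string */
    const char *string;     /* meaningful when is_string */
};

/* One (value, label) tuple. */
struct pf_value_label {
    struct pf_value value;
    const char *label;
};

/* A value label record (tag code D). */
struct pf_value_label_record {
    size_t n_vars;                  /* variable count */
    const char **var_names;         /* names of the labeled variables */
    size_t n_labels;                /* label count */
    struct pf_value_label *labels;  /* n_labels tuples */
};

/* Returns the label attached to numeric VALUE, or NULL if there is none. */
static const char *find_numeric_label(const struct pf_value_label_record *r,
                                      double value)
{
    for (size_t i = 0; i < r->n_labels; i++)
        if (!r->labels[i].value.is_string && r->labels[i].value.number == value)
            return r->labels[i].label;
    return NULL;
}
```
@end verbatim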
9457
9458 @node Portable File Data,  , Value Label Records, Portable File Format
9459 @section Portable File Data
9460
9461 The data record has tag code @samp{F}.  There is only one tag for all
9462 the data; thus, all the data must follow the dictionary.  The data is
9463 terminated by the end-of-file marker @samp{Z}, which is not valid as the
9464 beginning of a data element.
9465
9466 Data elements are output in the same order as the variable records
9467 describing them.  String variables are output as string fields, and
9468 numeric variables are output as floating-point fields.
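
Because of this ordering rule, a reader can split the data into cases by counting elements.  The C sketch below assumes the elements have already been decoded into a hypothetical @code{pf_element} structure.  Decoding of the floating-point and string fields is not shown.  The @samp{Z} marker is represented by the @code{is_end} member.

@verbatim
```c
#include <stdbool.h>
#include <stddef.h>

/* One data element after decoding, or the end-of-data marker Z. */
struct pf_element {
    bool is_end;            /* true for the Z marker */
    bool is_string;         /* string field vs. floating-point field */
    double number;
    const char *string;
};

/* Splits ELEMENTS into cases of N_VARS elements, one per variable in
   variable-record order, and returns the number of complete cases.
   Returns -1 if a case is cut short, an element's kind does not match
   its variable, or the Z marker is missing. */
static long count_cases(const struct pf_element *elements, size_t n_elements,
                        const bool *var_is_string, size_t n_vars)
{
    long cases = 0;
    size_t i = 0;

    if (n_vars == 0)
        return -1;
    while (i < n_elements) {
        if (elements[i].is_end)
            return cases;
        for (size_t v = 0; v < n_vars; v++, i++)
            if (i >= n_elements || elements[i].is_end
                || elements[i].is_string != var_is_string[v])
                return -1;
        cases++;
    }
    return -1;
}
```
@end verbatim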
9469
9470 @node q2c Input Format, Bugs, Portable File Format, Top
9471 @chapter @code{q2c} Input Format
9472
9473 PSPP statistical procedures have a bizarre and somewhat irregular
9474 syntax.  Despite this, a parser generator has been written that
9475 adequately addresses many of the possibilities and tries to provide
9476 hooks for the exceptional cases.  This parser generator is named
9477 @code{q2c}.
9478
9479 @menu
9480 * Invoking q2c::                q2c command-line syntax.
9481 * q2c Input Structure::         High-level layout of the input file.
9482 * Grammar Rules::               Syntax of the grammar rules.
9483 @end menu
9484
9485 @node Invoking q2c, q2c Input Structure, q2c Input Format, q2c Input Format
9486 @section Invoking q2c
9487
9488 @example
9489 q2c @var{input.q} @var{output.c}
9490 @end example
9491
9492 @code{q2c} translates a @samp{.q} file into a @samp{.c} file.  It takes
9493 exactly two command-line arguments, which are the input file name and
9494 output file name, respectively.  @code{q2c} does not accept any
9495 command-line options.
9496
9497 @node q2c Input Structure, Grammar Rules, Invoking q2c, q2c Input Format
9498 @section @code{q2c} Input Structure
9499
9500 @code{q2c} input files are divided into two sections: the grammar rules
9501 and the supporting code.  The @dfn{grammar rules}, which make up the
9502 first part of the input, are used to define the syntax of the
9503 statistical procedure to be parsed.  The @dfn{supporting code},
following the grammar rules, is copied largely unchanged to the output
9505 file, except for certain escapes.
9506
9507 The most important lines in the grammar rules are used for defining
9508 procedure syntax.  These lines can be prefixed with a dollar sign
9509 (@samp{$}), which prevents Emacs' CC-mode from munging them.  Besides
9510 this, a bang (@samp{!}) at the beginning of a line causes the line,
9511 minus the bang, to be written verbatim to the output file (useful for
9512 comments).  As a third special case, any line that begins with the exact
9513 characters @code{/* *INDENT} is ignored and not written to the output.
9514 This allows @code{.q} files to be processed through @code{indent}
9515 without being munged.
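
The three kinds of special lines can be told apart by their first characters.  This C sketch is only an illustration, not the code that @code{q2c} itself uses.  The names @code{classify_line} and @code{enum line_kind} are hypothetical.

@verbatim
```c
#include <string.h>

enum line_kind { LINE_GRAMMAR, LINE_VERBATIM, LINE_IGNORED };

/* Classifies one line of the grammar-rules section.  *BODY is set to the
   text that is processed further: the grammar text for LINE_GRAMMAR, the
   text to write to the output for LINE_VERBATIM, or NULL. */
static enum line_kind classify_line(const char *line, const char **body)
{
    if (strncmp(line, "/* *INDENT", 10) == 0) {
        *body = NULL;           /* ignored, not written to the output */
        return LINE_IGNORED;
    }
    if (line[0] == '!') {
        *body = line + 1;       /* written verbatim, minus the bang */
        return LINE_VERBATIM;
    }
    /* A leading '$' only shields the line from CC-mode; drop it. */
    *body = line[0] == '$' ? line + 1 : line;
    return LINE_GRAMMAR;
}
```
@end verbatim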
9516
@xref{Grammar Rules}, for the syntax of the grammar rules
themselves.
9519
9520 The supporting code is passed into the output file largely unchanged.
9521 However, the following escapes are supported.  Each escape must appear
9522 on a line by itself.
9523
9524 @table @code
9525 @item /* (header) */
9526
9527 Expands to a series of C @code{#include} directives which include the
9528 headers that are required for the parser generated by @code{q2c}.
9529
9530 @item /* (decls @var{scope}) */
9531
9532 Expands to C variable and data type declarations for the variables and
9533 @code{enum}s input and output by the @code{q2c} parser.  @var{scope}
9534 must be either @code{local} or @code{global}.  @code{local} causes the
9535 declarations to be output as function locals.  @code{global} causes them
9536 to be declared as @code{static} module variables; thus, @code{global} is
9537 a bit of a misnomer.
9538
9539 @item /* (parser) */
9540
9541 Expands to the entire parser.  Must be enclosed within a C function.
9542
9543 @item /* (free) */
9544
9545 Expands to a set of calls to the @code{free} function for variables
9546 declared by the parser.  Only needs to be invoked if subcommands of type
9547 @code{string} are used in the grammar rules.
9548 @end table
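
A generator that implements these escapes must recognize them in the supporting code.  The C sketch below matches a line against the five escape spellings listed above.  It assumes the line has no leading or trailing whitespace and no trailing newline.  It is not taken from @code{q2c}, and its names are hypothetical.

@verbatim
```c
#include <string.h>

enum q2c_escape {
    ESC_NONE, ESC_HEADER, ESC_DECLS_LOCAL, ESC_DECLS_GLOBAL,
    ESC_PARSER, ESC_FREE
};

/* Recognizes a supporting-code line that consists of exactly one escape. */
static enum q2c_escape match_escape(const char *line)
{
    static const struct {
        const char *text;
        enum q2c_escape kind;
    } escapes[] = {
        { "/* (header) */", ESC_HEADER },
        { "/* (decls local) */", ESC_DECLS_LOCAL },
        { "/* (decls global) */", ESC_DECLS_GLOBAL },
        { "/* (parser) */", ESC_PARSER },
        { "/* (free) */", ESC_FREE },
    };

    for (size_t i = 0; i < sizeof escapes / sizeof escapes[0]; i++)
        if (strcmp(line, escapes[i].text) == 0)
            return escapes[i].kind;
    return ESC_NONE;
}
```
@end verbatim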
9549
9550 @node Grammar Rules,  , q2c Input Structure, q2c Input Format
9551 @section Grammar Rules
9552
9553 The grammar rules describe the format of the syntax that the parser
generated by @code{q2c} will understand.  How the grammar rules are
included in a @code{q2c} input file is described above.
9556
9557 The grammar rules are divided into tokens of the following types:
9558
9559 @table @asis
9560 @item Identifier (@code{ID})
9561
9562 An identifier token is a sequence of letters, digits, and underscores
9563 (@samp{_}).  Identifiers are @emph{not} case-sensitive.
9564
9565 @item String (@code{STRING})
9566
9567 String tokens are initiated by a double-quote character (@samp{"}) and
9568 consist of all the characters between that double quote and the next
9569 double quote, which must be on the same line as the first.  Within a
9570 string, a backslash can be used as a ``literal escape''.  The only
9571 reasons to use a literal escape are to include a double quote or a
9572 backslash within a string.
9573
9574 @item Special character
9575
Every other character, except whitespace, is a token by itself.
9578
9579 @end table
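
A tokenizer for these three token types could look like the following C sketch.  It assumes its input is a single line.  It folds identifiers to uppercase because they are not case-sensitive, and it truncates over-long tokens.  The names @code{next_token} and @code{struct token} are hypothetical.

@verbatim
```c
#include <ctype.h>
#include <stddef.h>

enum token_type { TOK_ID, TOK_STRING, TOK_SPECIAL, TOK_EOL, TOK_ERROR };

struct token {
    enum token_type type;
    char text[64];      /* identifier, string contents, or the character */
};

/* Reads one token from the single line at *P and advances *P past it. */
static void next_token(const char **p, struct token *tok)
{
    const char *s = *p;
    size_t n = 0;

    while (isspace((unsigned char) *s))
        s++;
    if (*s == '\0') {
        tok->type = TOK_EOL;
    } else if (isalnum((unsigned char) *s) || *s == '_') {
        tok->type = TOK_ID;
        for (; isalnum((unsigned char) *s) || *s == '_'; s++)
            if (n < sizeof tok->text - 1)
                tok->text[n++] = (char) toupper((unsigned char) *s);
    } else if (*s == '"') {
        tok->type = TOK_STRING;
        for (s++; *s != '"'; s++) {
            if (*s == '\\' && s[1] != '\0')
                s++;                        /* literal escape: take next char */
            if (*s == '\0') {               /* no closing quote on this line */
                tok->type = TOK_ERROR;
                break;
            }
            if (n < sizeof tok->text - 1)
                tok->text[n++] = *s;
        }
        if (*s == '"')
            s++;                            /* skip the closing quote */
    } else {
        tok->type = TOK_SPECIAL;            /* any other character */
        tok->text[n++] = *s++;
    }
    tok->text[n] = '\0';
    *p = s;
}
```
@end verbatim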
9580
9581 The syntax of the grammar rules is as follows:
9582
9583 @example
grammar-rules ::= command-name : subcommands .
command-name ::= ID
             ::= STRING
9585 subcommands ::= subcommand
9586             ::= subcommands ; subcommand
9587 @end example
9588
9589 The syntax begins with an ID or STRING token that gives the name of the
9590 procedure to be parsed.  The rest of the syntax consists of subcommands
9591 separated by semicolons (@samp{;}) and terminated with a full stop
9592 (@samp{.}).
9593
9594 @example
9595 subcommand ::= sbc-options ID sbc-defn
9596 sbc-options ::=
9597             ::= sbc-option
            ::= sbc-options sbc-option
9599 sbc-option ::= *
9600            ::= +
9601 sbc-defn ::= opt-prefix = specifiers
9602          ::= [ ID ] = array-sbc
9603          ::= opt-prefix = sbc-special-form
9604 opt-prefix ::=
9605            ::= ( ID )
9606 @end example
9607
9608 Each subcommand can be prefixed with one or more option characters.  An
9609 asterisk (@samp{*}) is used to indicate the default subcommand; the
9610 keyword used for the default subcommand can be omitted in the PSPP
9611 syntax file.  A plus sign (@samp{+}) is used to indicate that a
9612 subcommand can appear more than once; if it is not present then that
9613 subcommand can appear no more than once.
9614
9615 The subcommand name appears after the option characters.
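
The effect of the two option characters can be sketched in C as follows.  As a simplification, the sketch scans raw characters rather than tokens.  The names @code{parse_sbc_options} and @code{may_appear_again} are hypothetical.

@verbatim
```c
#include <stdbool.h>
#include <stddef.h>

/* Options given in front of a subcommand name. */
struct sbc_options {
    bool is_default;    /* '*': the keyword may be omitted */
    bool repeatable;    /* '+': may appear more than once */
};

/* Reads the option characters at the start of S, such as "*+", into *OPT
   and returns how many characters were consumed. */
static size_t parse_sbc_options(const char *s, struct sbc_options *opt)
{
    size_t n = 0;

    opt->is_default = false;
    opt->repeatable = false;
    for (; s[n] == '*' || s[n] == '+'; n++) {
        if (s[n] == '*')
            opt->is_default = true;
        else
            opt->repeatable = true;
    }
    return n;
}

/* Whether a subcommand that has already appeared COUNT times may appear
   again. */
static bool may_appear_again(const struct sbc_options *opt, int count)
{
    return opt->repeatable || count == 0;
}
```
@end verbatim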
9616
9617 There are three forms of subcommands.  The first and most common form
9618 simply gives an equals sign (@samp{=}) and a list of specifiers, which
9619 can each be set to a single setting.  The second form declares an array,
9620 which is a set of flags that can be individually turned on by the user.
9621 There are also several special forms that do not take a list of
9622 specifiers.
9623
9624 Arrays require an additional @code{ID} argument.  This is used as a
9625 prefix, prepended to the variable names constructed from the
9626 specifiers.  The other forms also allow an optional prefix to be
9627 specified.
9628
9629 @example
9630 array-sbc ::= alternatives
9631           ::= array-sbc , alternatives
9632 alternatives ::= ID
9633              ::= alternatives | ID
9634 @end example
9635
An array subcommand is a set of Boolean values that can independently be
turned on by the user.  The values are listed separated by commas
(@samp{,}).  If a value has more than one name, then its names are
separated by pipes (@samp{|}).
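
To illustrate the semantics of an array subcommand, suppose one were declared with the alternatives @code{mean, median|med, mode}.  The C sketch below keeps one flag per value, so that both @samp{MEDIAN} and @samp{MED} set the same flag.  It is not the code that @code{q2c} generates.  The enumeration constants and function names are hypothetical.

@verbatim
```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* One flag per value of the hypothetical array subcommand. */
enum { STAT_MEAN, STAT_MEDIAN, STAT_MODE, STAT_COUNT };

/* Every name of every value; MEDIAN and MED share one flag. */
static const struct {
    const char *name;
    int flag;
} stat_names[] = {
    { "MEAN", STAT_MEAN },
    { "MEDIAN", STAT_MEDIAN },
    { "MED", STAT_MEDIAN },
    { "MODE", STAT_MODE },
};

/* Case-insensitive comparison of two whole words. */
static bool same_word(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (toupper((unsigned char) *a) != toupper((unsigned char) *b))
            return false;
    return *a == *b;
}

/* Turns on the flag named WORD.  Returns false if WORD names no value. */
static bool set_array_flag(bool flags[STAT_COUNT], const char *word)
{
    for (size_t i = 0; i < sizeof stat_names / sizeof stat_names[0]; i++)
        if (same_word(word, stat_names[i].name)) {
            flags[stat_names[i].flag] = true;
            return true;
        }
    return false;
}
```
@end verbatim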
9639
9640 @example
9641 specifiers ::= specifier
9642            ::= specifiers , specifier
9643 specifier ::= opt-id : settings
9644 opt-id ::=
9645        ::= ID
9646 @end example
9647
9648 Ordinary subcommands (other than arrays and special forms) require a
9649 list of specifiers.  Each specifier has an optional name and a list of
9650 settings.  If the name is given then a correspondingly named variable
9651 will be used to store the user's choice of setting.  If no name is given
9652 then there is no way to tell which setting the user picked; in this case
9653 the settings should probably have values attached.
9654
9655 @example
9656 settings ::= setting
9657          ::= settings / setting
9658 setting ::= setting-options ID setting-value
9659 setting-options ::=
9660                 ::= *
9661                 ::= !
9662                 ::= * !
9663 @end example
9664
9665 Individual settings are separated by forward slashes (@samp{/}).  Each
9666 setting can be as little as an @code{ID} token, but options and values
9667 can optionally be included.  The @samp{*} option means that, for this
setting, the @code{ID} can be omitted.  The @samp{!} option means that
this setting is the default for its specifier.
9670
9671 @example
9672 setting-value ::=
9673               ::= ( setting-value-2 )
9674               ::= setting-value-2
9675 setting-value-2 ::= setting-value-options setting-value-type : ID
9676                     setting-value-restriction
9677 setting-value-options ::=
9678                       ::= *
9679 setting-value-type ::= N
9680                    ::= D
9681 setting-value-restriction ::=
9682                           ::= , STRING
9683 @end example
9684
9685 Settings may have values.  If the value must be enclosed in parentheses,
then enclose the value declaration in parentheses.  Declare the value
type as @samp{n} or @samp{d} for an integer or floating-point value,
respectively.  The given @code{ID} is used to construct a variable name.
9689 If option @samp{*} is given, then the value is optional; otherwise it
9690 must be specified whenever the corresponding setting is specified.  A
9691 ``restriction'' can also be specified which is a string giving a C
9692 expression limiting the valid range of the value.  The special escape
9693 @code{%s} should be used within the restriction to refer to the
9694 setting's value variable.
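
The substitution of @code{%s} in a restriction can be sketched in C as a simple string expansion.  The function @code{expand_restriction} is hypothetical, and @code{q2c}'s own handling may differ.

@verbatim
```c
#include <stdbool.h>
#include <string.h>

/* Copies RESTRICTION to OUT (SIZE bytes, including the null terminator),
   replacing each "%s" with VAR_NAME.  Returns false if OUT is too small. */
static bool expand_restriction(const char *restriction, const char *var_name,
                               char *out, size_t size)
{
    size_t n = 0;
    size_t var_len = strlen(var_name);

    if (size == 0)
        return false;
    for (const char *p = restriction; *p != '\0'; p++) {
        const char *piece = p;
        size_t len = 1;

        if (p[0] == '%' && p[1] == 's') {
            piece = var_name;
            len = var_len;
            p++;                    /* skip the 's' as well */
        }
        if (n + len >= size)        /* keep room for the terminator */
            return false;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return true;
}
```
@end verbatim

For example, expanding @samp{%s > 0} for a variable named @code{frq_ntiles} (a hypothetical name) yields @samp{frq_ntiles > 0}.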
9695
9696 @example
9697 sbc-special-form ::= VAR
9698                  ::= VARLIST varlist-options
9699                  ::= INTEGER opt-list
9700                  ::= DOUBLE opt-list
9701                  ::= PINT
9702                  ::= STRING @r{(the literal word STRING)} string-options
9703                  ::= CUSTOM
9704 varlist-options ::=
9705                 ::= ( STRING )
9706 opt-list ::=
9707          ::= LIST
9708 string-options ::=
9709                ::= ( STRING STRING )
9710 @end example
9711
9712 The special forms are of the following types:
9713
9714 @table @code
9715 @item VAR
9716
9717 A single variable name.
9718
9719 @item VARLIST
9720
9721 A list of variables.  If given, the string can be used to provide
9722 @code{PV_@var{*}} options to the call to @code{parse_variables}.
9723
9724 @item INTEGER
9725
9726 A single integer value.
9727
9728 @item INTEGER LIST
9729
9730 A list of integers separated by spaces or commas.
9731
9732 @item DOUBLE
9733
9734 A single floating-point value.
9735
9736 @item DOUBLE LIST
9737
9738 A list of floating-point values.
9739
9740 @item PINT
9741
9742 A single positive integer value.
9743
9744 @item STRING
9745
9746 A string value.  If the options are given then the first string is an
9747 expression giving a restriction on the value of the string; the second
9748 string is an error message to display when the restriction is violated.
9749
9750 @item CUSTOM
9751
9752 A custom function is used to parse this subcommand.  The function must
9753 have prototype @code{int custom_@var{name} (void)}.  It should return 0
9754 on failure (when it has already issued an appropriate diagnostic), 1 on
9755 success, or 2 if it fails and the calling function should issue a syntax
9756 error on behalf of the custom handler.
9757
9758 @end table
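
The return-value convention for @code{CUSTOM} handlers is sketched below in C for a hypothetical subcommand @code{WIDTH}.  Two parts are assumptions: the global @code{cur_token}, which stands in for PSPP's lexer, and the caller @code{run_custom}.  Only the prototype and the meanings of 0, 1, and 2 come from the description above.

@verbatim
```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the lexer: text of the current token, or NULL. */
static const char *cur_token;
static long width;              /* value parsed by the handler */

/* Custom handler for the hypothetical WIDTH subcommand. */
static int custom_width(void)
{
    char *end;
    long value;

    if (cur_token == NULL)
        return 2;               /* nothing usable: caller reports the error */
    value = strtol(cur_token, &end, 10);
    if (end == cur_token || *end != '\0')
        return 2;               /* not an integer: caller reports the error */
    if (value < 1) {
        fprintf(stderr, "WIDTH must be positive.\n");
        return 0;               /* diagnostic already issued */
    }
    width = value;
    return 1;                   /* success */
}

/* How a caller might act on the handler's result.  Returns 1 on success,
   0 on failure. */
static int run_custom(int (*handler)(void), const char *sbc_name)
{
    switch (handler()) {
    case 1:
        return 1;
    case 2:
        fprintf(stderr, "Syntax error in %s subcommand.\n", sbc_name);
        return 0;
    default:
        return 0;
    }
}
```
@end verbatim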
9759
9760 @node Bugs, Function Index, q2c Input Format, Top
9761 @chapter Bugs
9762
9763 @quotation
9764 As of fvwm 0.99 there were exactly 39.342 unidentified bugs.  Identified
9765 bugs have mostly been fixed, though.  Since then 9.34 bugs have been
9766 fixed.  Assuming that there are at least 10 unidentified bugs for every
9767 identified one, that leaves us with 39.342 - 9.34 + 10 * 9.34 = 123.422
9768 unidentified bugs.  If we follow this to its logical conclusion we
9769 will have an infinite number of unidentified bugs before the number of
9770 bugs can start to diminish, at which point the program will be
9771 bug-free.  Since this is a computer program infinity = 3.4028e+38 if you
9772 don't insist on double-precision.  At the current rate of bug discovery
9773 we should expect to achieve this point in 3.37e+27 years.  I guess I
9774 better plan on passing this thing on to my children@enddots{}
9775
9776 ---Robert Nation, @cite{fvwm manpage}.
9777 @end quotation
9778
9779 @menu
9780 * Known bugs::                  Pointers to other files.
9781 * Contacting the Author::       Where to send the bug reports.
9782 @end menu
9783
9784 @node Known bugs, Contacting the Author, Bugs, Bugs
9785 @section Known bugs
9786
This is the list of known bugs in PSPP.  In addition, see
@ref{Not Implemented}, and @ref{Functions Not Implemented}, for lists of bugs
9789 due to features not implemented.  For known bugs in individual language
9790 features, see the documentation for that feature.
9791
9792 @itemize @bullet
9793 @item
9794 Nothing has yet been tested exhaustively. Be cautious using PSPP to
9795 make important decisions.
9796
9797 @item
9798 @code{make check} fails on some systems that don't like the syntax.  I'm
9799 not sure why.  If someone could make an attempt to track this down, it
9800 would be appreciated.
9801
9802 @item
9803 PostScript driver bugs:
9804
9805 @itemize @minus
9806 @item
9807 Does not support driver arguments `max-fonts-simult' or
9808 `optimize-text-size'.
9809
9810 @item
9811 Minor problems with font-encodings.
9812
9813 @item
9814 Fails to align fonts along their baselines.
9815
9816 @item
9817 Does not support certain bizarre line intersections--should
9818 never crop up in practice.
9819
9820 @item
9821 Does not gracefully substitute for existing fonts whose
9822 encodings are missing.
9823
9824 @item
9825 Does not perform italic correction or left italic correction
9826 on font changes.
9827
9828 @item
9829 Encapsulated PostScript is unimplemented.
9830 @end itemize
9831
9832 @item
9833 ASCII driver bugs:
9834
9835 @itemize @minus
@item
Does not support `infinite length' or `infinite width' paper.
9837 @end itemize
9838 @end itemize
9839
9840 See below for information on reporting bugs not listed here.
9841
9842 @node Contacting the Author,  , Known bugs, Bugs
9843 @section Contacting the Author
9844
9845 The author can be contacted at e-mail address
9846 @ifinfo
9847 <blp@@gnu.org>.
9848 @end ifinfo
9849 @iftex
9850 @code{<blp@@gnu.org>}.
9851 @end iftex
9852
9853 PSPP bug reports should be sent to
9854 @ifinfo
9855 <bug-gnu-pspp@@gnu.org>.
9856 @end ifinfo
9857 @iftex
9858 @code{<bug-gnu-pspp@@gnu.org>}.
9859 @end iftex
9860
9861 @node Function Index, Concept Index, Bugs, Top
9862 @chapter Function Index
9863 @printindex fn
9864
9865 @node Concept Index, Command Index, Function Index, Top
9866 @chapter Concept Index
9867 @printindex cp
9868
9869 @node Command Index,  , Concept Index, Top
9870 @chapter Command Index
9871 @printindex vr
9872
9873 @contents
9874 @bye
9875
9876 @c Local Variables:
9877 @c compile-command: "makeinfo pspp.texi"
9878 @c End: