1 This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.
4 * PSPP: (pspp). Statistical analysis package.
7 PSPP, for statistical analysis of sampled data, by Ben Pfaff.
9 This file documents PSPP, a statistical package for analysis of
10 sampled data that uses a command language compatible with SPSS.
12 Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.
14 This version of the PSPP documentation is consistent with version 2
17 Permission is granted to make and distribute verbatim copies of this
18 manual provided the copyright notice and this permission notice are
19 preserved on all copies.
21 Permission is granted to copy and distribute modified versions of
22 this manual under the conditions for verbatim copying, provided that the
23 entire resulting derived work is distributed under the terms of a
24 permission notice identical to this one.
26 Permission is granted to copy and distribute translations of this
27 manual into another language, under the above condition for modified
28 versions, except that this permission notice may be stated in a
29 translation approved by the Free Software Foundation.
32 File: pspp.info, Node: Portable File Format, Next: q2c Input Format, Prev: Data File Format, Up: Top
37 These days, most computers use the same internal data formats for
38 integer and floating-point data, if one ignores little differences like
39 big- versus little-endian byte ordering. However, occasionally it is
40 necessary to exchange data between systems with incompatible data
41 formats. This is what portable files are designed to do.
43 *Please note:* Although all of the following information is correct,
44 as far as the author has been able to ascertain, it is gleaned from
45 examination of ASCII-formatted portable files only, so some of it may
46 be incorrect in the general case.
50 * Portable File Characters::
51 * Portable File Structure::
52 * Portable File Header::
53 * Version and Date Info Record::
54 * Identification Records::
55 * Variable Count Record::
56 * Variable Records::
57 * Value Label Records::
58 * Portable File Data::
61 File: pspp.info, Node: Portable File Characters, Next: Portable File Structure, Prev: Portable File Format, Up: Portable File Format
63 Portable File Characters
64 ========================
66 Portable files are arranged as a series of lines of exactly 80
67 characters each. Each line is terminated by a carriage-return,
68 line-feed sequence (henceforth, "newline"). Newlines are not
69 delimiters: they are only used to avoid line-length limitations existing
70 on some operating systems.
72 The file must be terminated with a `Z' character. In addition, if
73 the final line in the file does not have exactly 80 characters, then it
74 is padded on the right with `Z' characters. (The file contents may be
75 in any character set; the file contains a description of its own
76 character set, as explained in the next section. Therefore, the `Z'
77 character is not necessarily an ASCII `Z'.)
79 For the rest of the description of the portable file format, newlines
80 and the trailing `Z's will be ignored, as if they did not exist,
81 because they are not an important part of understanding the file
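The line layout described above can be undone mechanically before any
further parsing.  The following is a minimal sketch in Python, not code
taken from PSPP: the name `unwrap_portable' is hypothetical, and the
sketch assumes an ASCII-encoded file, so that the padding character is
an ASCII `Z' and the newline is the two bytes CR LF.

```python
LINE_WIDTH = 80          # every line holds exactly 80 characters
NEWLINE = b"\r\n"        # carriage-return, line-feed (ASCII assumed)


def unwrap_portable(raw: bytes, z: bytes = b"Z") -> bytes:
    """Join the fixed-width lines and drop the trailing `Z' run.

    Hypothetical helper: lines are cut by position rather than by
    searching for newlines, because newlines are not delimiters.
    """
    step = LINE_WIDTH + len(NEWLINE)
    chunks = []
    for start in range(0, len(raw), step):
        line = raw[start:start + step]
        # Drop the newline that follows a full line, if it is present.
        if len(line) == step and line.endswith(NEWLINE):
            line = line[:LINE_WIDTH]
        if len(line) != LINE_WIDTH:
            raise ValueError(f"line at offset {start} is not 80 characters")
        chunks.append(line)
    # Remove the terminating `Z' and any `Z' padding on the last line.
    return b"".join(chunks).rstrip(z)
```

Stripping every trailing `Z' is a simplification: it would also remove a
genuine `Z' that happened to end the last string in the data, so a
careful reader would instead stop at the end-of-file marker while
parsing.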
85 File: pspp.info, Node: Portable File Structure, Next: Portable File Header, Prev: Portable File Characters, Up: Portable File Format
87 Portable File Structure
88 =======================
90 Every portable file consists of the following records, in sequence:
94 * Version and date info.
96 * Product identification.
98 * Subproduct identification (optional).
102 * Variables. Each variable record may optionally be followed by a
103 missing value record and a variable label record.
105 * Value labels (optional).
109 Most records are identified by a single-character tag code. The file
110 header and version info record do not have a tag.
112 Other than these single-character codes, there are three types of
113 fields in a portable file: floating-point, integer, and string.
114 Floating-point fields have the following format:
116 * Zero or more leading spaces.
118 * Optional asterisk (`*'), which indicates a missing value. The
119 asterisk must be followed by a single character, generally a period
120 (`.'), but it appears that other characters may also be possible.
121 This completes the specification of a missing value.
123 * Optional minus sign (`-') to indicate a negative number.
125 * A whole number, consisting of one or more base-30 digits: `0'
126 through `9' plus capital letters `A' through `T'.
128 * A fraction, consisting of a radix point (`.') followed by one or
129 more base-30 digits (optional).
131 * An exponent, consisting of a plus or minus sign (`+' or `-')
132 followed by one or more base-30 digits (optional).
134 * A forward slash (`/').
136 Integer fields take a form identical to floating-point fields, but they
137 may not contain a fraction.
139 String fields take the form of an integer field having value N,
140 followed by exactly N characters, which are the string content.
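The three field types can be read with a small hand-written scanner.
The sketch below is an illustration in Python under stated assumptions,
not PSPP's own reader: the function names are hypothetical, the text is
assumed to be already translated to ASCII, and the exponent is assumed
to be a power of 30 (the description above does not say which base the
exponent scales by).

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRST"   # base-30 digits, `0'..`9', `A'..`T'


def _digit_run(text, pos):
    """Return (digit string, new position); at least one digit is required."""
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start:
        raise ValueError(f"expected base-30 digit at offset {pos}")
    return text[start:pos], pos


def _to_int(digits):
    value = 0
    for ch in digits:
        value = value * 30 + DIGITS.index(ch)
    return value


def _read_number(text, pos, allow_fraction):
    while text.startswith(" ", pos):          # zero or more leading spaces
        pos += 1
    if text.startswith("*", pos):             # missing value: `*' plus one char
        if pos + 2 > len(text):
            raise ValueError("truncated missing value")
        return None, pos + 2
    negative = text.startswith("-", pos)
    if negative:
        pos += 1
    whole, pos = _digit_run(text, pos)
    frac = ""
    if text.startswith(".", pos):
        if not allow_fraction:
            raise ValueError("integer field may not contain a fraction")
        frac, pos = _digit_run(text, pos + 1)
    # Scale the combined digits down by one power of 30 per fraction digit.
    value = _to_int(whole + frac) / 30 ** len(frac)
    if text.startswith(("+", "-"), pos):
        sign = -1 if text[pos] == "-" else 1
        digits, pos = _digit_run(text, pos + 1)
        value *= 30.0 ** (sign * _to_int(digits))   # assumption: base-30 exponent
    if not text.startswith("/", pos):
        raise ValueError(f"expected `/' at offset {pos}")
    return (-value if negative else value), pos + 1


def read_float(text, pos=0):
    """Return (float, or None for a missing value, new position)."""
    return _read_number(text, pos, allow_fraction=True)


def read_int(text, pos=0):
    value, pos = _read_number(text, pos, allow_fraction=False)
    if value is None:
        raise ValueError("missing value where an integer was expected")
    return int(value), pos


def read_string(text, pos=0):
    """An integer field N followed by exactly N characters of content."""
    length, pos = read_int(text, pos)
    if pos + length > len(text):
        raise ValueError("string field runs past the end of the input")
    return text[pos:pos + length], pos + length
```

Each reader returns the parsed value together with the position just
past the field, so calls can be chained.  Going through a float in
`read_int' keeps the sketch short but would lose precision for very
large integers.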
143 File: pspp.info, Node: Portable File Header, Next: Version and Date Info Record, Prev: Portable File Structure, Up: Portable File Format
148 Every portable file begins with a 464-byte header, consisting of a
149 200-byte collection of vanity splash strings, followed by a 256-byte
150 character set translation table, followed by an 8-byte tag string.
152 The 200-byte segment is divided into five 40-byte sections, each of
153 which represents the string `ASCII SPSS PORT FILE' in a different
154 character set encoding. (If the file is encoded in EBCDIC then the
155 string is actually `EBCDIC SPSS PORT FILE', and so on.) These strings
156 are padded on the right with spaces in their own character set.
158 It appears that these strings exist only to inform those who might
159 view the file on a screen, and that they are not parsed by SPSS
160 products. Thus, they can be safely ignored. For those interested, the
161 strings are supposed to be in the following character sets, in the
162 specified order: EBCDIC, 7-bit ASCII, CDC 6-bit ASCII, 6-bit ASCII,
163 Honeywell 6-bit ASCII.
165 The 256-byte segment describes a mapping from the character set used
166 in the portable file to an arbitrary character set having characters at
167 the following positions:
170 Control characters. Not important enough to describe in full here.
176 Digits `0' through `9'.
179 Capital letters `A' through `Z'.
182 Lowercase letters `a' through `z'.
194 Symbols `&[]!$*);^-/'
197 Broken vertical pipe.
203 British pound symbol.
209 Less than or equal symbol.
233 Lower left corner box draw.
236 Upper left corner box draw.
239 Greater than or equal symbol.
242 Superscript `0' through `9'.
245 Lower right corner box draw.
248 Upper right corner box draw.
263 Horizontal dagger (?).
272 Centered dot, or bullet.
277 Symbols that are not defined in a particular character set are set to
278 the same value as symbol 64; i.e., to `0'.
280 The 8-byte tag string consists of the exact characters `SPSSPORT' in
281 the portable file's character set, which can be used to verify that the
282 file is indeed a portable file.
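The header layout lends itself to plain slicing.  The following Python
sketch is only an illustration: the function names are hypothetical,
the input is assumed to have had its newlines removed already, and the
ASCII check relies on nothing beyond the two facts stated above (the
tag reads `SPSSPORT' and symbol 64 is `0').

```python
SPLASH_LEN, TABLE_LEN, TAG_LEN = 200, 256, 8
HEADER_LEN = SPLASH_LEN + TABLE_LEN + TAG_LEN    # 464 bytes in total


def split_header(body: bytes):
    """Return (splash strings, translation table, tag, rest of file)."""
    if len(body) < HEADER_LEN:
        raise ValueError("too short to contain a portable file header")
    splash = body[:SPLASH_LEN]
    table = body[SPLASH_LEN:SPLASH_LEN + TABLE_LEN]
    tag = body[SPLASH_LEN + TABLE_LEN:HEADER_LEN]
    return splash, table, tag, body[HEADER_LEN:]


def looks_ascii(table: bytes, tag: bytes) -> bool:
    """True if the tag and symbol 64 match what an ASCII file would hold."""
    return tag == b"SPSSPORT" and table[64] == ord("0")


def defined_positions(table: bytes):
    """Positions whose entry differs from the filler value at symbol 64."""
    filler = table[64]
    return [i for i, byte in enumerate(table) if i == 64 or byte != filler]
```

For a file in another character set the tag comparison would have to go
through the translation table instead of comparing against ASCII bytes.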
285 File: pspp.info, Node: Version and Date Info Record, Next: Identification Records, Prev: Portable File Header, Up: Portable File Format
287 Version and Date Info Record
288 ============================
290 This record does not have a tag code. It has the following
293 * A single character identifying the file format version. The
294 letter A represents version 0, and so on.
296 * An 8-character string field giving the file creation date in the
299 * A 6-character string field giving the file creation time in the
303 File: pspp.info, Node: Identification Records, Next: Variable Count Record, Prev: Version and Date Info Record, Up: Portable File Format
305 Identification Records
306 ======================
308 The product identification record has tag code `1'. It consists of
309 a single string field giving the name of the product that wrote the
312 The subproduct identification record has tag code `3'. It consists
313 of a single string field giving additional information on the product
314 that wrote the portable file.
317 File: pspp.info, Node: Variable Count Record, Next: Variable Records, Prev: Identification Records, Up: Portable File Format
319 Variable Count Record
320 =====================
322 The variable count record has tag code `4'. It consists of two
323 integer fields. The first contains the number of variables in the file
324 dictionary. The purpose of the second is unknown; it contains the value
325 161 in all portable files examined so far.
328 File: pspp.info, Node: Variable Records, Next: Value Label Records, Prev: Variable Count Record, Up: Portable File Format
333 Each variable record represents a single variable. Variable records
334 have tag code `7'. They have the following structure:
336 * Width (integer). This is 0 for a numeric variable, and a number
337 between 1 and 255 for a string variable.
339 * Name (string). 1-8 characters long. Must be in all capitals.
341 * Print format. This is a set of three integer fields:
343 - Format type (*note Variable Record::).
345 - Format width. 1-40.
347 - Number of decimal places. 1-40.
349 * Write format. Same structure as the print format described above.
351 Each variable record can optionally be followed by a missing value
352 record, which has tag code `8'. A missing value record has one field,
353 the missing value itself (a floating-point or string field, as appropriate).
354 Up to three of these missing value records can be used.
356 There is also a record for missing value ranges, which has tag code
357 `B'. It is followed by two fields representing the range, which are
358 floating-point or string as appropriate. If a missing value range is
359 present, it may be followed by a single missing value record.
361 Tag codes `9' and `A' represent `LO THRU X' and `X THRU HI' ranges,
362 respectively. Each is followed by a single field representing X. If
363 one of the ranges is present, it may be followed by a single missing
366 In addition, each variable record can optionally be followed by a
367 variable label record, which has tag code `C'. A variable label record
368 has one field, the variable label itself (string).
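As an illustration of how these records chain together, here is a
minimal Python sketch that reads one variable record and, if it follows
immediately, a variable label record.  It is not PSPP's reader: the
names are hypothetical, the text is assumed to be ASCII, integer fields
with exponents are not accepted, and the missing value records (tags
`8', `9', `A' and `B') are deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Simplified integer field: spaces, optional sign, base-30 digits, `/'.
INT_FIELD = re.compile(r" *(-?[0-9A-T]+)/")


def read_int(text, pos):
    match = INT_FIELD.match(text, pos)
    if match is None:
        raise ValueError(f"expected integer field at offset {pos}")
    return int(match.group(1), 30), match.end()


def read_string(text, pos):
    length, pos = read_int(text, pos)
    if pos + length > len(text):
        raise ValueError("string field runs past the end of the input")
    return text[pos:pos + length], pos + length


@dataclass
class Variable:
    width: int                            # 0 for numeric, 1-255 for string
    name: str
    print_format: Tuple[int, int, int]    # (type, width, decimals)
    write_format: Tuple[int, int, int]
    label: Optional[str] = None


def read_variable(text, pos=0):
    """Parse a tag `7' record plus an immediately following tag `C' record."""
    if not text.startswith("7", pos):
        raise ValueError(f"expected tag `7' at offset {pos}")
    width, pos = read_int(text, pos + 1)
    name, pos = read_string(text, pos)
    formats = []
    for _ in range(2):                    # print format, then write format
        triple = []
        for _ in range(3):
            value, pos = read_int(text, pos)
            triple.append(value)
        formats.append(tuple(triple))
    variable = Variable(width, name, formats[0], formats[1])
    if text.startswith("C", pos):         # optional variable label record
        variable.label, pos = read_string(text, pos + 1)
    return variable, pos
```

A fuller reader would loop over the optional tags after each variable
record, choosing floating-point or string fields for missing values
according to the variable's width.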
371 File: pspp.info, Node: Value Label Records, Next: Portable File Data, Prev: Variable Records, Up: Portable File Format
376 Value label records have tag code `D'. They have the following
379 * Variable count (integer).
381 * List of variables (strings). The variable count specifies the
382 number in the list. Variables are specified by their names. All
383 variables must be of the same type (numeric or string).
385 * Label count (integer).
387 * List of (value, label) tuples. The label count specifies the
388 number of tuples. Each tuple consists of a value, which is
389 numeric or string as appropriate to the variables, followed by a
393 File: pspp.info, Node: Portable File Data, Prev: Value Label Records, Up: Portable File Format
398 The data record has tag code `F'. There is only one tag for all the
399 data; thus, all the data must follow the dictionary. The data is
400 terminated by the end-of-file marker `Z', which is not valid as the
401 beginning of a data element.
403 Data elements are output in the same order as the variable records
404 describing them. String variables are output as string fields, and
405 numeric variables are output as floating-point fields.
408 File: pspp.info, Node: q2c Input Format, Next: Bugs, Prev: Portable File Format, Up: Top
413 PSPP statistical procedures have a bizarre and somewhat irregular
414 syntax. Despite this, a parser generator has been written that
415 adequately addresses many of the possibilities and tries to provide
416 hooks for the exceptional cases. This parser generator is named `q2c'.
420 * Invoking q2c:: q2c command-line syntax.
421 * q2c Input Structure:: High-level layout of the input file.
422 * Grammar Rules:: Syntax of the grammar rules.
425 File: pspp.info, Node: Invoking q2c, Next: q2c Input Structure, Prev: q2c Input Format, Up: q2c Input Format
432 `q2c' translates a `.q' file into a `.c' file. It takes exactly two
433 command-line arguments, which are the input file name and output file
434 name, respectively. `q2c' does not accept any command-line options.
437 File: pspp.info, Node: q2c Input Structure, Next: Grammar Rules, Prev: Invoking q2c, Up: q2c Input Format
439 `q2c' Input Structure
440 =====================
442 `q2c' input files are divided into two sections: the grammar rules
443 and the supporting code. The "grammar rules", which make up the first
444 part of the input, are used to define the syntax of the statistical
445 procedure to be parsed. The "supporting code", following the grammar
446 rules, is copied largely unchanged to the output file, except for
449 The most important lines in the grammar rules are used for defining
450 procedure syntax. These lines can be prefixed with a dollar sign
451 (`$'), which prevents Emacs' CC-mode from munging them. Besides this,
452 a bang (`!') at the beginning of a line causes the line, minus the
453 bang, to be written verbatim to the output file (useful for comments).
454 As a third special case, any line that begins with the exact characters
455 `/* *INDENT' is ignored and not written to the output. This allows
456 `.q' files to be processed through `indent' without being munged.
458 The syntax of the grammar rules themselves is given in the following
461 The supporting code is passed into the output file largely unchanged.
462 However, the following escapes are supported. Each escape must appear
466 Expands to a series of C `#include' directives which include the
467 headers that are required for the parser generated by `q2c'.
469 `/* (decls SCOPE) */'
470 Expands to C variable and data type declarations for the variables
471 and `enum's input and output by the `q2c' parser. SCOPE must be
472 either `local' or `global'. `local' causes the declarations to be
473 output as function locals. `global' causes them to be declared as
474 `static' module variables; thus, `global' is a bit of a misnomer.
477 Expands to the entire parser. Must be enclosed within a C
481 Expands to a set of calls to the `free' function for variables
482 declared by the parser. Only needs to be invoked if subcommands
483 of type `string' are used in the grammar rules.
486 File: pspp.info, Node: Grammar Rules, Prev: q2c Input Structure, Up: q2c Input Format
491 The grammar rules describe the format of the syntax that the parser
492 generated by `q2c' will understand. The way that the grammar rules are
493 included in a `q2c' input file is described above.
495 The grammar rules are divided into tokens of the following types:
498 An identifier token is a sequence of letters, digits, and
499 underscores (`_'). Identifiers are _not_ case-sensitive.
502 String tokens are initiated by a double-quote character (`"') and
503 consist of all the characters between that double quote and the
504 next double quote, which must be on the same line as the first.
505 Within a string, a backslash can be used as a "literal escape".
506 The only reasons to use a literal escape are to include a double
507 quote or a backslash within a string.
510 Other characters, other than whitespace, constitute tokens in
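The token rules above can be sketched as a small scanner.  The Python
below is an illustration only and is not how `q2c' itself (a C program)
is written: the function name and the token labels are hypothetical,
identifiers are folded to upper case as one way of making them
case-insensitive, and every other non-whitespace character is assumed
to form a token on its own.

```python
def tokenize(line):
    """Split one line of grammar rules into (kind, text) pairs."""
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            # Identifier: letters, digits, and underscores.
            start = i
            while i < len(line) and (line[i].isalnum() or line[i] == "_"):
                i += 1
            tokens.append(("ID", line[start:i].upper()))
        elif ch == '"':
            # String: runs to the next unescaped double quote on this line.
            i += 1
            chars = []
            while i < len(line) and line[i] != '"':
                if line[i] == "\\":       # literal escape: take next char as-is
                    i += 1
                    if i >= len(line):
                        raise ValueError("backslash at end of line")
                chars.append(line[i])
                i += 1
            if i >= len(line):
                raise ValueError("string not closed on the same line")
            i += 1                        # skip the closing quote
            tokens.append(("STRING", "".join(chars)))
        else:
            tokens.append(("PUNCT", ch))  # assumption: one character per token
            i += 1
    return tokens
```

Working one line at a time matches the rule that a string must close on
the line where it opens.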
513 The syntax of the grammar rules is as follows:
515 grammar-rules ::= ID : subcommands .
516 subcommands ::= subcommand
517 ::= subcommands ; subcommand
519 The syntax begins with an ID or STRING token that gives the name of
520 the procedure to be parsed. The rest of the syntax consists of
521 subcommands separated by semicolons (`;') and terminated with a full
524 subcommand ::= sbc-options ID sbc-defn
527 ::= sbc-options sbc-options
530 sbc-defn ::= opt-prefix = specifiers
531 ::= [ ID ] = array-sbc
532 ::= opt-prefix = sbc-special-form
536 Each subcommand can be prefixed with one or more option characters.
537 An asterisk (`*') is used to indicate the default subcommand; the
538 keyword used for the default subcommand can be omitted in the PSPP
539 syntax file. A plus sign (`+') is used to indicate that a subcommand
540 can appear more than once; if it is not present then that subcommand
541 can appear no more than once.
543 The subcommand name appears after the option characters.
545 There are three forms of subcommands. The first and most common form
546 simply gives an equals sign (`=') and a list of specifiers, which can
547 each be set to a single setting. The second form declares an array,
548 which is a set of flags that can be individually turned on by the user.
549 There are also several special forms that do not take a list of
552 Arrays require an additional `ID' argument. This is used as a
553 prefix, prepended to the variable names constructed from the
554 specifiers. The other forms also allow an optional prefix to be
557 array-sbc ::= alternatives
558 ::= array-sbc , alternatives
560 ::= alternatives | ID
562 An array subcommand is a set of Boolean values that can
563 independently be turned on by the user, listed separated by commas
564 (`,'). If a value has more than one name then these names are
565 separated by pipes (`|').
567 specifiers ::= specifier
568 ::= specifiers , specifier
569 specifier ::= opt-id : settings
573 Ordinary subcommands (other than arrays and special forms) require a
574 list of specifiers. Each specifier has an optional name and a list of
575 settings. If the name is given then a correspondingly named variable
576 will be used to store the user's choice of setting. If no name is given
577 then there is no way to tell which setting the user picked; in this case
578 the settings should probably have values attached.
581 ::= settings / setting
582 setting ::= setting-options ID setting-value
588 Individual settings are separated by forward slashes (`/'). Each
589 setting can be as little as an `ID' token, but options and values can
590 optionally be included. The `*' option means that, for this setting,
591 the `ID' can be omitted. The `!' option means that this setting is the
592 default for its specifier.
595 ::= ( setting-value-2 )
597 setting-value-2 ::= setting-value-options setting-value-type : ID
598 setting-value-restriction
599 setting-value-options ::=
601 setting-value-type ::= N
603 setting-value-restriction ::=
606 Settings may have values. If the value must be enclosed in
607 parentheses, then enclose the value declaration in parentheses.
608 Declare the setting type as `n' or `d' for integer or floating point
609 type, respectively. The given `ID' is used to construct a variable
610 name. If option `*' is given, then the value is optional; otherwise it
611 must be specified whenever the corresponding setting is specified. A
612 "restriction" can also be specified which is a string giving a C
613 expression limiting the valid range of the value. The special escape
614 `%s' should be used within the restriction to refer to the setting's
617 sbc-special-form ::= VAR
618 ::= VARLIST varlist-options
622 ::= STRING (the literal word STRING) string-options
629 ::= ( STRING STRING )
631 The special forms are of the following types:
634 A single variable name.
637 A list of variables. If given, the string can be used to provide
638 `PV_*' options to the call to `parse_variables'.
641 A single integer value.
644 A list of integers separated by spaces or commas.
647 A single floating-point value.
650 A list of floating-point values.
653 A single positive integer value.
656 A string value. If the options are given then the first string is
657 an expression giving a restriction on the value of the string; the
658 second string is an error message to display when the restriction
662 A custom function is used to parse this subcommand. The function
663 must have prototype `int custom_NAME (void)'. It should return 0
664 on failure (when it has already issued an appropriate diagnostic),
665 1 on success, or 2 if it fails and the calling function should
666 issue a syntax error on behalf of the custom handler.
669 File: pspp.info, Node: Bugs, Next: Function Index, Prev: q2c Input Format, Up: Top
674 As of fvwm 0.99 there were exactly 39.342 unidentified bugs.
675 Identified bugs have mostly been fixed, though. Since then 9.34
676 bugs have been fixed. Assuming that there are at least 10
677 unidentified bugs for every identified one, that leaves us with
678 39.342 - 9.34 + 10 * 9.34 = 123.422 unidentified bugs. If we
679 follow this to its logical conclusion we will have an infinite
680 number of unidentified bugs before the number of bugs can start to
681 diminish, at which point the program will be bug-free. Since this
682 is a computer program infinity = 3.4028e+38 if you don't insist on
683 double-precision. At the current rate of bug discovery we should
684 expect to achieve this point in 3.37e+27 years. I guess I better
685 plan on passing this thing on to my children....
687 --Robert Nation, `fvwm manpage'.
691 * Known bugs:: Pointers to other files.
692 * Contacting the Author:: Where to send the bug reports.
695 File: pspp.info, Node: Known bugs, Next: Contacting the Author, Prev: Bugs, Up: Bugs
700 This is the list of known bugs in PSPP. In addition, *Note Not
701 Implemented::, and *Note Functions Not Implemented::, for lists of bugs
702 due to features not implemented. For known bugs in individual language
703 features, see the documentation for that feature.
705 * Nothing has yet been tested exhaustively. Be cautious using PSPP to
706 make important decisions.
708 * `make check' fails on some systems that don't like the syntax. I'm
709 not sure why. If someone could make an attempt to track this
710 down, it would be appreciated.
712 * PostScript driver bugs:
714 - Does not support driver arguments `max-fonts-simult' or
715 `optimize-text-size'.
717 - Minor problems with font-encodings.
719 - Fails to align fonts along their baselines.
721 - Does not support certain bizarre line intersections; these should
722 never crop up in practice.
724 - Does not gracefully substitute for existing fonts whose
725 encodings are missing.
727 - Does not perform italic correction or left italic correction
730 - Encapsulated PostScript is unimplemented.
734 Does not support `infinite length' or `infinite width' paper.
736 See below for information on reporting bugs not listed here.
739 File: pspp.info, Node: Contacting the Author, Prev: Known bugs, Up: Bugs
741 Contacting the Author
742 =====================
744 The author can be contacted at e-mail address <blp@gnu.org>.
746 PSPP bug reports should be sent to <bug-gnu-pspp@gnu.org>.
749 File: pspp.info, Node: Function Index, Next: Concept Index, Prev: Bugs, Up: Top
756 * ABS: Miscellaneous Mathematics.
757 * ACOS: Trigonometry.
758 * ANY: Set Membership.
759 * ARCOS: Trigonometry.
760 * ARSIN: Trigonometry.
761 * ARTAN: Trigonometry.
762 * ASIN: Trigonometry.
763 * ATAN: Trigonometry.
764 * CDF.xxx: Functions Not Implemented.
765 * CDFNORM: Functions Not Implemented.
766 * CFVAR: Statistical Functions.
767 * CONCAT: String Functions.
769 * CTIME.DAYS: Time Extraction.
770 * CTIME.HOURS: Time Extraction.
771 * CTIME.MINUTES: Time Extraction.
772 * CTIME.SECONDS: Time Extraction.
773 * DATE.DMY: Date Construction.
774 * DATE.MDY: Date Construction.
775 * DATE.MOYR: Date Construction.
776 * DATE.QYR: Date Construction.
777 * DATE.WKYR: Date Construction.
778 * DATE.YRDAY: Date Construction.
779 * EXP: Advanced Mathematics.
780 * IDF.xxx: Functions Not Implemented.
781 * INDEX: String Functions.
782 * LAG: Miscellaneous Functions.
783 * LENGTH: String Functions.
784 * LG10: Advanced Mathematics.
785 * LN: Advanced Mathematics.
786 * LOWER: String Functions.
787 * LPAD: String Functions.
788 * LTRIM: String Functions.
789 * MAX: Statistical Functions.
790 * MEAN: Statistical Functions.
791 * MIN: Statistical Functions.
792 * MISSING: Missing Value Functions.
793 * MOD: Miscellaneous Mathematics.
794 * MOD10: Miscellaneous Mathematics.
795 * NCDF.xxx: Functions Not Implemented.
796 * NMISS: Missing Value Functions.
797 * NORMAL: Pseudo-Random Numbers.
798 * NUMBER: String Functions.
799 * NVALID: Missing Value Functions.
800 * PROBIT: Functions Not Implemented.
801 * RANGE: Set Membership.
802 * RINDEX: String Functions.
803 * RND: Miscellaneous Mathematics.
804 * RPAD: String Functions.
805 * RTRIM: String Functions.
806 * RV.xxx: Functions Not Implemented.
807 * SD: Statistical Functions.
809 * SQRT: Advanced Mathematics.
810 * STRING: String Functions.
811 * SUBSTR: String Functions.
812 * SUM: Statistical Functions.
813 * SYSMIS: Missing Value Functions.
815 * TIME.DAYS: Time Construction.
816 * TIME.HMS: Time Construction.
817 * TRUNC: Miscellaneous Mathematics.
818 * UNIFORM: Pseudo-Random Numbers.
819 * UPCASE: String Functions.
820 * VALUE: Missing Value Functions.
821 * VAR: Statistical Functions.
822 * VARIANCE: Statistical Functions.
823 * XDATE.DATE: Date Extraction.
824 * XDATE.HOUR: Date Extraction.
825 * XDATE.JDAY: Date Extraction.
826 * XDATE.MDAY: Date Extraction.
827 * XDATE.MINUTE: Date Extraction.
828 * XDATE.MONTH: Date Extraction.
829 * XDATE.QUARTER: Date Extraction.
830 * XDATE.SECOND: Date Extraction.
831 * XDATE.TDAY: Date Extraction.
832 * XDATE.TIME: Date Extraction.
833 * XDATE.WEEK: Date Extraction.
834 * XDATE.WKDAY: Date Extraction.
835 * XDATE.YEAR: Date Extraction.
836 * YRMODA: Miscellaneous Functions.