1 This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.
4 * PSPP: (pspp). Statistical analysis package.
7 PSPP, for statistical analysis of sampled data, by Ben Pfaff.
9 This file documents PSPP, a statistical package for analysis of
10 sampled data that uses a command language compatible with SPSS.
12 Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.
14 This version of the PSPP documentation is consistent with version 2
17 Permission is granted to make and distribute verbatim copies of this
18 manual provided the copyright notice and this permission notice are
19 preserved on all copies.
21 Permission is granted to copy and distribute modified versions of
22 this manual under the conditions for verbatim copying, provided that the
23 entire resulting derived work is distributed under the terms of a
24 permission notice identical to this one.
26 Permission is granted to copy and distribute translations of this
27 manual into another language, under the above condition for modified
28 versions, except that this permission notice may be stated in a
29 translation approved by the Free Software Foundation.
32 File: pspp.info, Node: DESCRIPTIVES, Next: FREQUENCIES, Prev: Statistics, Up: Statistics
39 /MISSING={VARIABLE,LISTWISE} {INCLUDE,NOINCLUDE}
40 /FORMAT={LABELS,NOLABELS} {NOINDEX,INDEX} {LINE,SERIAL}
42 /STATISTICS={ALL,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,
43 SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,DEFAULT,
44 SESKEWNESS,SEKURTOSIS}
45 /SORT={NONE,MEAN,SEMEAN,STDDEV,VARIANCE,KURTOSIS,SKEWNESS,
46 RANGE,MINIMUM,MAXIMUM,SUM,SESKEWNESS,SEKURTOSIS,NAME}
49 The DESCRIPTIVES procedure reads the active file and outputs
50 descriptive statistics requested by the user. In addition, it can
51 optionally compute Z-scores.
53 The VARIABLES subcommand, which is required, specifies the list of
54 variables to be analyzed. Keyword VARIABLES is optional.
56 All other subcommands are optional:
58 The MISSING subcommand determines the handling of missing values.
59 If INCLUDE is set, then user-missing values are included in the
60 calculations. If NOINCLUDE is set, which is the default, user-missing
61 values are excluded. If VARIABLE is set, then missing values are
62 excluded on a variable by variable basis; if LISTWISE is set, then the
63 entire case is excluded whenever any value in that case has a
64 system-missing or, if INCLUDE is set, user-missing value.
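The difference between VARIABLE and LISTWISE can be made concrete with a minimal C sketch (not PSPP's code) that computes one column's mean under each rule. It assumes a row-major array of doubles and models only system-missing values, using the `-DBL_MAX' representation described in the system file chapter. The name `column_mean' is hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the system-missing value (-DBL_MAX, per the system file
   chapter).  User-missing values are not modeled in this sketch. */
#define SYSMIS (-DBL_MAX)

/* Hypothetical helper: mean of column `col` in a row-major table of
   n_cases x n_vars values.  With listwise == true a case is skipped if
   ANY of its values is missing (MISSING=LISTWISE); otherwise only cases
   missing in `col` itself are skipped (MISSING=VARIABLE).
   Returns SYSMIS when no valid cases remain. */
double column_mean(const double *data, size_t n_cases, size_t n_vars,
                   size_t col, bool listwise)
{
    assert(col < n_vars);
    double sum = 0.0;
    size_t n_valid = 0;

    for (size_t i = 0; i < n_cases; i++) {
        const double *row = data + i * n_vars;
        bool skip = (row[col] == SYSMIS);
        /* Listwise: any missing value in the case drops the whole case. */
        for (size_t j = 0; listwise && !skip && j < n_vars; j++)
            skip = (row[j] == SYSMIS);
        if (!skip) {
            sum += row[col];
            n_valid++;
        }
    }
    return n_valid > 0 ? sum / (double) n_valid : SYSMIS;
}
```

With a case that is missing only in a second variable, the two rules give different means for the first variable: VARIABLE keeps that case, LISTWISE drops it.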
66 The FORMAT subcommand affects the output format. Currently the
67 LABELS/NOLABELS and NOINDEX/INDEX settings are not used. When SERIAL
68 is set, both the valid and the missing numbers of cases are listed in
69 the output; when LINE is set, only valid cases are listed.
71 The SAVE subcommand causes DESCRIPTIVES to calculate Z scores for all
72 the specified variables. The Z scores are saved to new variables.
73 Variable names are generated by trying first the original variable name
74 with Z prepended and truncated to a maximum of 8 characters, then the
75 names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through
76 ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence. In addition, Z score
77 variable names can be specified explicitly on VARIABLES in the variable
78 list by enclosing them in parentheses after each variable.
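The name-generation order above can be sketched in C. This is an illustration only: the callback type `name_in_use_fn', the helper `name_in_list', and `generate_z_name' are hypothetical, and the sketch ignores names given explicitly in parentheses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface: returns true if `name` is already in use. */
typedef bool name_in_use_fn(const char *name, void *aux);

/* Example predicate: `aux` points to a NULL-terminated array of names. */
static bool name_in_list(const char *name, void *aux)
{
    for (const char *const *p = aux; *p != NULL; p++)
        if (strcmp(*p, name) == 0)
            return true;
    return false;
}

/* Writes the first unused candidate into `out` (9 bytes): "Z" + name
   truncated to 8 characters, then ZSC000..ZSC999, STDZ00..STDZ09,
   ZZZZ00..ZZZZ09, ZQZQ00..ZQZQ09.  Returns false if all are taken. */
bool generate_z_name(const char *var_name, char out[9],
                     name_in_use_fn *in_use, void *aux)
{
    static const char *const prefixes[] = { "STDZ", "ZZZZ", "ZQZQ" };
    assert(var_name != NULL);

    snprintf(out, 9, "Z%s", var_name);    /* snprintf truncates to 8 chars */
    if (!in_use(out, aux))
        return true;

    for (int i = 0; i <= 999; i++) {
        snprintf(out, 9, "ZSC%03d", i);
        if (!in_use(out, aux))
            return true;
    }
    for (size_t p = 0; p < sizeof prefixes / sizeof *prefixes; p++)
        for (int i = 0; i <= 9; i++) {
            snprintf(out, 9, "%s%02d", prefixes[p], i);
            if (!in_use(out, aux))
                return true;
        }
    return false;
}
```

For example, a variable named LONGNAME would first try ZLONGNAM; if that name were taken, the next candidate would be ZSC000.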
80 The STATISTICS subcommand specifies the statistics to be displayed:
83 All of the statistics below.
89 Standard error of the mean.
98 Kurtosis and standard error of the kurtosis.
101 Skewness and standard error of the skewness.
116 Mean, standard deviation, minimum, maximum.
119 Standard error of the kurtosis.
122 Standard error of the skewness.
124 The SORT subcommand specifies how the variables should be sorted in
125 the output. Most of the possible values should be self-explanatory.
126 NAME causes the variables to be sorted by name. By default, the
127 variables are listed in the order that they are specified on the
128 VARIABLES subcommand. The A and D settings request an ascending or descending sort order, respectively.
132 File: pspp.info, Node: FREQUENCIES, Next: CROSSTABS, Prev: DESCRIPTIVES, Up: Statistics
139 /FORMAT={TABLE,NOTABLE,LIMIT(limit)}
140 {STANDARD,CONDENSE,ONEPAGE[(onepage_limit)]}
142 {AVALUE,DVALUE,AFREQ,DFREQ}
145 /MISSING={EXCLUDE,INCLUDE}
146 /STATISTICS={DEFAULT,MEAN,SEMEAN,MEDIAN,MODE,STDDEV,VARIANCE,
147 KURTOSIS,SKEWNESS,RANGE,MINIMUM,MAXIMUM,SUM,
148 SESKEWNESS,SEKURTOSIS,ALL,NONE}
150 /PERCENTILES=percent...
152 (These options are not currently implemented.)
159 /VARIABLES=var_list (low,high)...
161 FREQUENCIES causes the data to be read and frequency tables to be
162 built and output for specified variables. FREQUENCIES can also
163 calculate and display descriptive statistics (including median and
164 mode) and percentiles.
166 In the future, FREQUENCIES will also support graphical output in the
167 form of bar charts and histograms. In addition, it will be able to
168 support percentiles for grouped data. (As a historical note, these
169 options were supported in a version of PSPP written years ago, but the
170 code has not survived.)
172 The VARIABLES subcommand is the only required subcommand. Specify
173 the variables to be analyzed. In most cases, this is all that is
174 required. This is known as "general mode".
176 Occasionally, one may want to invoke a special mode called "integer
177 mode". Normally, in general mode, PSPP will automatically determine
178 what values occur in the data. In integer mode, the user specifies the
179 range of values that the data assumes. To invoke this mode, specify a
180 range of data values in parentheses, separated by a comma. Data values
181 inside the range are truncated to the nearest integer, then assigned to
182 that value. If values occur outside this range, they are discarded.
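Integer-mode tabulation amounts to a direct count into a fixed array of bins. A minimal C sketch, assuming that "truncated" means truncation toward zero (as a C cast to `long' does) and that the range test is applied to the original value before truncation; `tabulate_integer_mode' is a hypothetical name, not part of PSPP.

```c
#include <assert.h>
#include <stddef.h>

/* counts[k] receives the number of values that truncate to low + k.
   `counts` must hold high - low + 1 zeroed entries.  Values outside
   [low, high] are discarded; the number discarded is returned. */
size_t tabulate_integer_mode(const double *values, size_t n,
                             long low, long high, size_t *counts)
{
    assert(low <= high);
    size_t n_discarded = 0;
    for (size_t i = 0; i < n; i++) {
        double v = values[i];
        if (!(v >= low && v <= high)) {   /* also rejects NaN */
            n_discarded++;
            continue;
        }
        /* Casting to long truncates toward zero. */
        counts[(long) v - low]++;
    }
    return n_discarded;
}
```

Because the range is known in advance, no sorting or searching of distinct values is needed, which is the practical appeal of integer mode.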
184 The FORMAT subcommand controls the output format. It has several possible settings:
187 * TABLE, the default, causes a frequency table to be output for every
188 variable specified. NOTABLE prevents them from being output.
189 LIMIT with a numeric argument causes them to be output except when
190 there are more than the specified number of values in the table.
192 * STANDARD frequency tables contain more complete information, but
193 also take up more space on the printed page. CONDENSE
194 frequency tables are less informative but take up less space.
195 ONEPAGE with a numeric argument will output standard frequency
196 tables if there are the specified number of values or fewer,
197 condensed tables otherwise. ONEPAGE without an argument defaults
198 to a threshold of 50 values.
200 * LABELS causes value labels to be displayed in STANDARD frequency
201 tables. NOLABELS prevents this.
203 * Normally frequency tables are sorted in ascending order by value.
204 This is AVALUE. DVALUE tables are sorted in descending order by
205 value. AFREQ and DFREQ tables are sorted in ascending and
206 descending order, respectively, by frequency count.
208 * SINGLE spaced frequency tables are closely spaced. DOUBLE spaced
209 frequency tables have wider spacing.
211 * OLDPAGE and NEWPAGE are not currently used.
213 The MISSING subcommand controls the handling of user-missing values.
214 When EXCLUDE, the default, is set, user-missing values are not included
215 in frequency tables or statistics. When INCLUDE is set, user-missing
216 values are included. System-missing values are never included in statistics,
217 but are listed in frequency tables.
219 The available STATISTICS are the same as those available in
220 DESCRIPTIVES (*note DESCRIPTIVES::), with the addition of MEDIAN, the
221 data's median value, and MODE, the mode. (If there are multiple modes,
222 the smallest value is reported.) By default, the mean, standard
223 deviation, minimum, and maximum are reported for each variable.
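The tie-breaking rule for MODE can be sketched in C: sort the values, count runs of equal values, and keep the first (smallest) value whose run is longest. `mode_smallest' is a hypothetical helper written for illustration, not PSPP's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int compare_doubles(const void *a_, const void *b_)
{
    double a = *(const double *) a_, b = *(const double *) b_;
    return (a > b) - (a < b);
}

/* Returns the mode of values[0..n-1], sorting the array in place.
   When several values tie for the highest frequency, the smallest is
   returned.  n must be positive. */
double mode_smallest(double *values, size_t n)
{
    assert(n > 0);
    qsort(values, n, sizeof *values, compare_doubles);

    double best = values[0];
    size_t best_count = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && values[j] == values[i])
            j++;
        /* Strict '>' keeps the earlier (smaller) value on ties. */
        if (j - i > best_count) {
            best = values[i];
            best_count = j - i;
        }
        i = j;
    }
    return best;
}
```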
225 NTILES causes the specified quantiles to be reported. For instance,
226 `/NTILES=4' would cause quartiles to be reported. In addition,
227 particular percentiles can be requested with the PERCENTILES subcommand.
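An NTILES=n request corresponds to the percentiles 100k/n for k = 1, ..., n-1, so `/NTILES=4' yields the 25th, 50th, and 75th percentiles. A small C sketch of that conversion (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Writes the n-1 percentile cut points implied by NTILES=n into `out`
   and returns how many were written. */
size_t ntiles_to_percentiles(int n, double *out)
{
    assert(n >= 2);
    for (int k = 1; k < n; k++)
        out[k - 1] = 100.0 * k / n;
    return (size_t) (n - 1);
}
```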
230 File: pspp.info, Node: CROSSTABS, Prev: FREQUENCIES, Up: Statistics
236 /TABLES=var_list BY var_list [BY var_list]...
237 /MISSING={TABLE,INCLUDE,REPORT}
238 /WRITE={NONE,CELLS,ALL}
239 /FORMAT={TABLES,NOTABLES}
240 {LABELS,NOLABELS,NOVALLABS}
245 /CELLS={COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
247 /STATISTICS={CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
248 KAPPA,ETA,CORR,ALL,NONE}
251 /VARIABLES=var_list (low,high)...
253 CROSSTABS reads the active file and builds and displays
254 crosstabulation tables requested by the user. It can calculate several
255 statistics for each cell in the crosstabulation tables. In addition, a
256 number of statistics can be calculated for each table itself.
258 The TABLES subcommand is used to specify the tables to be reported.
259 Any number of dimensions is permitted, and any number of variables per
260 dimension is allowed. The TABLES subcommand may be repeated as many
261 times as needed. This is the only required subcommand in "general mode".
264 Occasionally, one may want to invoke a special mode called "integer
265 mode". Normally, in general mode, PSPP will automatically determine
266 what values occur in the data. In integer mode, the user specifies the
267 range of values that the data assumes. To invoke this mode, specify the
268 VARIABLES subcommand, giving a range of data values in parentheses for
269 each variable to be used on the TABLES subcommand. Data values inside
270 the range are truncated to the nearest integer, then assigned to that
271 value. If values occur outside this range, they are discarded. When it
272 is present, the VARIABLES subcommand must precede the TABLES subcommand.
274 The MISSING subcommand determines the handling of user-missing
275 values. When set to TABLE, the default, missing values are dropped on
276 a table by table basis. When set to INCLUDE, user-missing values are
277 included in tables and statistics. When set to REPORT, which is
278 allowed only in integer mode, user-missing values are included in
279 tables but marked with an `M' (for "missing") and excluded from
280 statistical calculations.
282 Currently the WRITE subcommand is not used.
284 The FORMAT subcommand controls the characteristics of the
285 crosstabulation tables to be displayed. It has a number of possible settings:
288 * TABLES, the default, causes crosstabulation tables to be output.
289 NOTABLES suppresses them.
291 * LABELS, the default, allows variable labels and value labels to
292 appear in the output. NOLABELS suppresses them. NOVALLABS
293 displays variable labels but suppresses value labels.
295 * PIVOT, the default, causes each TABLES subcommand to be displayed
296 in a pivot table format. NOPIVOT causes the old-style
297 crosstabulation format to be used.
299 * AVALUE, the default, causes values to be sorted in ascending order.
300 DVALUE requests a descending sort order.
302 * INDEX/NOINDEX is currently ignored.
304 * BOX/NOBOX is currently ignored.
306 The CELLS subcommand controls the contents of each cell in the
307 displayed crosstabulation table. The possible settings are:
328 Standardized residual.
331 Adjusted standardized residual.
337 Suppress cells entirely.
339 `/CELLS' without any settings specified requests COUNT, ROW, COLUMN,
340 and TOTAL. If CELLS is not specified at all then only COUNT will be selected.
343 The STATISTICS subcommand selects statistics for computation:
346 Pearson chi-square, likelihood ratio, Fisher's exact test,
347 continuity correction, linear-by-linear association.
353 Contingency coefficient.
359 Uncertainty coefficient.
383 Spearman correlation, Pearson's r.
391 Selected statistics are only calculated when appropriate for the
392 statistic. Certain statistics require tables of a particular size, and
393 some statistics are calculated only in integer mode.
395 `/STATISTICS' without any settings selects CHISQ. If the STATISTICS
396 subcommand is not given, no statistics are calculated.
398 *Please note:* Currently the implementation of CROSSTABS has the following bugs:
401 * Pearson's R (but not Spearman!) is off a little.
403 * T values for Spearman's R and Pearson's R are wrong.
405 * How to calculate significance of symmetric and directional
408 * Asymmetric ASEs and T values for lambda are wrong.
410 * ASE of Goodman and Kruskal's tau is not calculated.
412 * ASE of symmetric Somers' d is wrong.
414 * Approx. T of uncertainty coefficient is wrong.
416 Fixes for any of these deficiencies would be welcome.
419 File: pspp.info, Node: Utilities, Next: Not Implemented, Prev: Statistics, Up: Top
424 Commands that don't fit any other category are placed here.
426 Most of these commands are not affected by commands like IF and LOOP:
427 they take effect only once, unconditionally, at the time that they are
428 encountered in the input.
432 * COMMENT:: Document your syntax file.
433 * DOCUMENT:: Document the active file.
434 * DISPLAY DOCUMENTS:: Display active file documents.
435 * DISPLAY FILE LABEL:: Display the active file label.
436 * DROP DOCUMENTS:: Remove documents from the active file.
437 * EXECUTE:: Execute pending transformations.
438 * FILE LABEL:: Set the active file's label.
439 * INCLUDE:: Include a file within the current one.
440 * QUIT:: Terminate the PSPP session.
441 * SET:: Adjust PSPP runtime parameters.
442 * SUBTITLE:: Provide a document subtitle.
443 * SYSFILE INFO:: Display the dictionary in a system file.
444 * TITLE:: Provide a document title.
447 File: pspp.info, Node: COMMENT, Next: DOCUMENT, Prev: Utilities, Up: Utilities
452 Two possible syntaxes:
453 COMMENT comment text ... .
456 The COMMENT command is ignored. It is used to provide information to
457 the author and other readers of the PSPP syntax file.
459 A COMMENT command can extend over any number of lines. Don't forget
460 to terminate it with a dot or a blank line!
463 File: pspp.info, Node: DOCUMENT, Next: DISPLAY DOCUMENTS, Prev: COMMENT, Up: Utilities
468 DOCUMENT documentary_text.
470 The DOCUMENT command adds one or more lines of descriptive
471 commentary to the active file. Documents added in this way are saved
472 to system files. They can be viewed using SYSFILE INFO or DISPLAY
473 DOCUMENTS. They can be removed from the active file with DROP DOCUMENTS.
476 Specify the documentary text following the DOCUMENT keyword. You can
477 extend the documentary text over as many lines as necessary. Lines are
478 truncated to a width of 80 characters. Don't forget to terminate the
479 DOCUMENT command with a dot or a blank line.
482 File: pspp.info, Node: DISPLAY DOCUMENTS, Next: DISPLAY FILE LABEL, Prev: DOCUMENT, Up: Utilities
489 DISPLAY DOCUMENTS displays the documents in the active file. Each
490 document is preceded by a line giving the time and date that it was
491 added. *Note DOCUMENT::.
494 File: pspp.info, Node: DISPLAY FILE LABEL, Next: DROP DOCUMENTS, Prev: DISPLAY DOCUMENTS, Up: Utilities
501 DISPLAY FILE LABEL displays the file label contained in the active
502 file, if any. *Note FILE LABEL::.
505 File: pspp.info, Node: DROP DOCUMENTS, Next: EXECUTE, Prev: DISPLAY FILE LABEL, Up: Utilities
512 The DROP DOCUMENTS command removes all documents from the active
513 file. New documents can be added with the DOCUMENT utility (*note DOCUMENT::).
516 DROP DOCUMENTS only changes the active file. It does not modify any
517 system files stored on disk.
520 File: pspp.info, Node: EXECUTE, Next: FILE LABEL, Prev: DROP DOCUMENTS, Up: Utilities
527 The EXECUTE utility causes the active file to be read and all pending
528 transformations to be executed.
531 File: pspp.info, Node: FILE LABEL, Next: INCLUDE, Prev: EXECUTE, Up: Utilities
536 FILE LABEL file_label.
538 Use the FILE LABEL command to provide a title for the active file.
539 This title will be saved into system files and portable files that are
540 created during this PSPP run.
542 It is not necessary to include quotes around file_label. If they are
543 included then they become part of the file label.
546 File: pspp.info, Node: INCLUDE, Next: QUIT, Prev: FILE LABEL, Up: Utilities
551 Two possible syntaxes:
555 The INCLUDE command causes the PSPP command processor to read an
556 additional command file as if it were included bodily in the current command file.
559 INCLUDE files may be nested to any depth, up to the limit of
563 File: pspp.info, Node: QUIT, Next: SET, Prev: INCLUDE, Up: Utilities
568 Two possible syntaxes:
572 The QUIT command terminates the current PSPP session and returns
573 control to the operating system.
575 This command is not valid within a command file.
578 File: pspp.info, Node: SET, Next: SUBTITLE, Prev: QUIT, Up: Utilities
586 /BLANKS={SYSMIS,'.',number}
595 /CPROMPT='cprompt_string'
596 /DPROMPT='dprompt_string'
599 /MXWARNS=max_warnings
601 /VIEWLENGTH={MINIMUM,MEDIAN,MAXIMUM,n_lines}
602 /VIEWWIDTH=n_characters
606 /MITERATE=max_iterations
610 /SEED={RANDOM,seed_value}
611 /UNDEFINED={WARN,NOWARN}
614 /CC{A,B,C,D,E}={'npre,pre,suf,nsuf','npre.pre.suf.nsuf'}
620 /ERRORS={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
622 /MESSAGES={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
624 /RESULTS={ON,OFF,TERMINAL,LISTING,BOTH,NONE}
631 (output driver options)
632 /HEADERS={NO,YES,BLANK}
633 /LENGTH={NONE,length_in_lines}
636 /PAGER={OFF,"pager_name"}
637 /WIDTH={NARROW,WIDE,n_characters}
640 /JOURNAL={ON,OFF} [filename]
641 /LOG={ON,OFF} [filename]
644 /COMPRESSION={ON,OFF}
645 /SCOMPRESSION={ON,OFF}
650 (obsolete settings accepted for compatibility, but ignored)
654 /BOXSTRING={'xxx','xxxxxxxxxxx'}
660 /HELPWINDOWS={ON,OFF}
663 /LOWRES={AUTO,ON,OFF}
665 /MENUS={STANDARD,EXTENDED}
669 /RUNREVIEW={AUTO,MANUAL}
671 /TB1={'xxx','xxxxxxxxxxx'}
673 /WORKDEV=drive_letter
674 /WORKSPACE=workspace_size
677 The SET command allows the user to adjust several parameters
678 relating to PSPP's execution. Since there are many subcommands to this
679 command, its subcommands will be examined in groups.
681 As a general comment, ON and YES are considered synonymous, and so
682 are OFF and NO, when used as subcommand values.
684 The data input subcommands affect the way that data is read from data
685 files. The data input subcommands are
688 This is the value assigned to a data item that is empty or
689 contains only whitespace. An argument of SYSMIS or '.' will cause
690 the system-missing value to be assigned to null items. This is the
691 default. Any real value may be assigned.
694 The default DOT setting causes the decimal point character to be
695 `.'. A setting of COMMA causes the decimal point character to be
699 Allows the default numeric input/output format to be specified.
700 The default is F8.2. *Note Input/Output Formats::.
702 Program input subcommands affect the way that programs are parsed
703 when they are typed interactively or run from a script. They are
706 This is a single character indicating the end of a command. The
707 default is `.'. Don't change this.
710 Whether a blank line is interpreted as ending the current command.
713 Interaction subcommands affect the way that PSPP interacts with an
714 online user. The interaction subcommands are
717 The command continuation prompt. The default is ` > '.
720 Prompt used when expecting data input within BEGIN DATA (*note
721 BEGIN DATA::). The default is `data> '.
724 Whether an error causes PSPP to stop processing the current command
725 file after finishing the current command. The default is OFF.
728 The maximum number of errors before PSPP halts processing of the
729 current command file. The default is 50.
732 The maximum number of warnings plus errors before PSPP halts
733 processing the current command file. The default is 100.
736 The command prompt. The default is `PSPP> '.
739 The length of the screen in lines. MINIMUM means 25 lines, MEDIAN
740 and MAXIMUM mean 43 lines. Otherwise specify the number of lines.
741 Normally PSPP should auto-detect your screen size so this
742 shouldn't have to be used.
745 The width of the screen in characters. Normally 80 or 132.
747 Program execution subcommands control the way that PSPP commands
748 execute. The program execution subcommands are
757 The maximum number of iterations for an uncontrolled loop.
760 The initial pseudo-random number seed. Set to a real number or to
761 RANDOM, which will obtain an initial seed from the current time of
767 Data output subcommands affect the format of output data. These subcommands are
775 Set up custom currency formats. The argument is a string which
776 must contain exactly three commas or exactly three periods. If
777 commas, then the grouping character for the currency format is
778 `,', and the decimal point character is `.'; if periods, then the
779 situation is reversed.
781 The commas or periods divide the string into four fields, which
782 are, in order, the negative prefix, prefix, suffix, and negative
783 suffix. When a value is formatted using the custom currency
784 format, the prefix precedes the value formatted and the suffix
785 follows it. In addition, if the value is negative, the negative
786 prefix precedes the prefix and the negative suffix follows the suffix.
790 The default DOT setting causes the decimal point character to be
791 `.'. A setting of COMMA causes the decimal point character to be
795 Allows the default numeric input/output format to be specified.
796 The default is F8.2. *Note Input/Output Formats::.
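The CC parsing and formatting rules described above can be sketched in C. This is a hypothetical illustration, not PSPP's implementation: the struct layout, the 15-character field limit, and both function names are assumptions, and the caller is assumed to have already formatted the number itself with the matching grouping and decimal characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one custom currency (CCA..CCE). */
struct custom_currency {
    char neg_prefix[16], prefix[16], suffix[16], neg_suffix[16];
    char grouping;   /* grouping character */
    char decimal;    /* decimal point character */
};

/* Splits a CCx string into its four fields.  Returns false unless the
   string has exactly three commas or exactly three periods, or if a
   field exceeds this sketch's 15-character buffers. */
bool parse_custom_currency(const char *s, struct custom_currency *cc)
{
    size_t commas = 0, periods = 0;
    for (const char *p = s; *p != '\0'; p++) {
        commas += (*p == ',');
        periods += (*p == '.');
    }

    char delims[2] = { '\0', '\0' };
    if (commas == 3) {
        delims[0] = ',';  cc->grouping = ',';  cc->decimal = '.';
    } else if (periods == 3) {
        delims[0] = '.';  cc->grouping = '.';  cc->decimal = ',';
    } else {
        return false;
    }

    /* Field order: negative prefix, prefix, suffix, negative suffix. */
    char *fields[4] = { cc->neg_prefix, cc->prefix, cc->suffix, cc->neg_suffix };
    for (int f = 0; f < 4; f++) {
        size_t len = strcspn(s, delims);
        if (len >= sizeof cc->prefix)
            return false;
        memcpy(fields[f], s, len);
        fields[f][len] = '\0';
        s += len;
        if (*s != '\0')
            s++;                       /* skip the delimiter */
    }
    return true;
}

/* Wraps an already formatted magnitude (no sign) in the affixes; the
   negative prefix and suffix go outside the regular ones. */
void format_custom_currency(const struct custom_currency *cc,
                            const char *magnitude, bool negative,
                            char *out, size_t out_size)
{
    assert(out_size > 0);
    snprintf(out, out_size, "%s%s%s%s%s",
             negative ? cc->neg_prefix : "", cc->prefix, magnitude,
             cc->suffix, negative ? cc->neg_suffix : "");
}
```

For instance, the setting `'(,$,,)'' gives a negative prefix of `(', a prefix of `$', an empty suffix, and a negative suffix of `)', so a negative amount is shown as `($1,234.50)'.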
798 Output routing subcommands affect where the output of transformations
799 and procedures is sent. These subcommands are
802 If turned on, commands are written to the listing file as they are
803 read from command files. The default is OFF.
814 Output activation subcommands affect whether output devices of
815 particular types are enabled. These subcommands are
818 Enable or disable listing devices.
821 Enable or disable printer devices.
824 Enable or disable screen devices.
826 Output driver option subcommands affect output drivers' settings.
827 These subcommands are
837 Logging subcommands affect logging of commands executed to external
838 files. These subcommands are
845 System file subcommands affect the default format of system files
846 produced by PSPP. These subcommands are
852 Whether system files created by SAVE or XSAVE are compressed by
853 default. The default is ON.
855 Security subcommands affect the operations that commands are allowed
856 to perform. The security subcommands are
859 When set, this setting cannot ever be reset, for obvious security
860 reasons. Setting this option disables the following operations:
866 * Pipe filenames (filenames beginning or ending with `|').
869 Be aware that this setting does not guarantee safety (commands can
870 still overwrite files, for instance) but it is an improvement.
873 File: pspp.info, Node: SUBTITLE, Next: TITLE, Prev: SET, Up: Utilities
878 Two possible syntaxes:
879 SUBTITLE 'subtitle_string'.
880 SUBTITLE subtitle_string.
882 The SUBTITLE command is used to provide a subtitle to a particular
883 PSPP run. This subtitle appears at the top of each output page below
884 the title, if titles are enabled on the output device.
886 Specify a subtitle as a string in quotes. The alternate syntax that
887 did not require quotes is now obsolete. If it is used then the
888 subtitle is converted to all uppercase.
891 File: pspp.info, Node: TITLE, Prev: SUBTITLE, Up: Utilities
896 Two possible syntaxes:
897 TITLE 'title_string'.
900 The TITLE command is used to provide a title to a particular PSPP
901 run. This title appears at the top of each output page, if titles are
902 enabled on the output device.
904 Specify a title as a string in quotes. The alternate syntax that did
905 not require quotes is now obsolete. If it is used then the title is
906 converted to all uppercase.
909 File: pspp.info, Node: Not Implemented, Next: Data File Format, Prev: Utilities, Up: Top
914 This chapter lists parts of the PSPP language that are not yet implemented.
917 The following transformations and utilities are not yet implemented,
918 but they will be supported in a later release.
944 The following transformations and utilities are not implemented.
945 There are no plans to support them in future releases. Contributions to
946 implement them will still be accepted.
966 * NUMBERED and UNNUMBERED
979 File: pspp.info, Node: Data File Format, Next: Portable File Format, Prev: Not Implemented, Up: Top
984 PSPP necessarily uses the same format for system files as do the
985 products with which it is compatible. This chapter is a description of
988 There are three data types used in system files: 32-bit integers,
989 64-bit floating point numbers, and 1-byte characters. In this document these
990 will simply be referred to as `int32', `flt64', and `char', the names
991 that are used in the PSPP source code. Every field of type `int32' or
992 `flt64' is aligned on a 32-bit boundary.
994 The endianness of data in PSPP system files is not specified. System
995 files output on a computer of a particular endianness will have the
996 endianness of that computer. However, PSPP can read files of either
997 endianness, regardless of its host computer's endianness. PSPP
998 translates endianness for both integer and floating point numbers.
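A reader can detect a foreign byte order by checking a field whose value is known in advance. The sketch below assumes the `layout_code' field of the file header (always 2) is used for that purpose and swaps `int32' and `flt64' fields byte by byte. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reverses the byte order of an n-byte field in place. */
static void swap_bytes(void *p, size_t n)
{
    unsigned char *b = p;
    for (size_t i = 0; i < n / 2; i++) {
        unsigned char t = b[i];
        b[i] = b[n - 1 - i];
        b[n - 1 - i] = t;
    }
}

/* Given the 4 raw bytes of layout_code as stored in the file, returns
   0 if the file matches the host byte order, 1 if fields must be
   swapped, or -1 if neither byte order yields 2. */
int needs_swap(const unsigned char raw[4])
{
    int32_t v;
    memcpy(&v, raw, sizeof v);
    if (v == 2)
        return 0;
    swap_bytes(&v, sizeof v);
    return v == 2 ? 1 : -1;
}

/* Reads an int32 field from raw file bytes, swapping if needed. */
int32_t read_int32(const unsigned char *raw, bool swap)
{
    int32_t v;
    memcpy(&v, raw, sizeof v);
    if (swap)
        swap_bytes(&v, sizeof v);
    return v;
}

/* Reads a flt64 field from raw file bytes, swapping if needed. */
double read_flt64(const unsigned char *raw, bool swap)
{
    double v;
    memcpy(&v, raw, sizeof v);
    if (swap)
        swap_bytes(&v, sizeof v);
    return v;
}
```

Copying through `memcpy' rather than casting the byte pointer avoids alignment and aliasing problems when fields are read from an arbitrary buffer.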
1000 Floating point formats are also not specified. PSPP does not
1001 translate between floating point formats. This is unlikely to be a
1002 problem as all modern computer architectures use IEEE 754 format for
1003 floating point representation.
1005 The PSPP system-missing value is represented by the negative number of
1006 greatest magnitude in the floating point format; in C, this is most likely
1007 `-DBL_MAX'. There are two other important values used in missing
1008 values: `HIGHEST' and `LOWEST'. These are represented by the largest
1009 possible positive number (probably `DBL_MAX') and the negative number
1010 of second-greatest magnitude, respectively. The latter must be
1011 determined in a system-dependent manner; in IEEE 754 format it is
1012 represented by the bit pattern `0xffeffffffffffffe'.
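On an IEEE 754 machine the three special values can be computed as follows. Deriving LOWEST by decrementing the bit pattern of `-DBL_MAX' is an illustrative choice (C99 `nextafter(-DBL_MAX, 0.0)' gives the same result but needs the math library); the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Returns the raw IEEE 754 bit pattern of a double. */
static uint64_t bits_of(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

static double sysmis(void)  { return -DBL_MAX; }  /* 0xffefffffffffffff */
static double highest(void) { return DBL_MAX; }   /* 0x7fefffffffffffff */

/* LOWEST: for a negative IEEE 754 double, decrementing the bit pattern
   moves one step toward zero, giving 0xffeffffffffffffe. */
static double lowest(void)
{
    uint64_t bits = bits_of(-DBL_MAX) - 1;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```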
1014 System files are divided into records. Each record begins with an
1015 `int32' giving a numeric record type. Individual record types are
1020 * File Header Record::
1022 * Value Label Record::
1023 * Value Label Variable Record::
1025 * Machine int32 Info Record::
1026 * Machine flt64 Info Record::
1027 * Miscellaneous Informational Records::
1028 * Dictionary Termination Record::
1032 File: pspp.info, Node: File Header Record, Next: Variable Record, Prev: Data File Format, Up: Data File Format
1037 The file header is always the first record in the file.
1039 struct sysfile_header
1049 char creation_date[9];
1050 char creation_time[8];
1051 char file_label[64];
1056 Record type code. Always set to `$FL2'. This is the only record
1057 for which the record type is not of type `int32'.
1059 `char prod_name[60];'
1060 Product identification string. This always begins with the
1061 characters `@(#) SPSS DATA FILE'. PSPP uses the remaining
1062 characters to give its version and the operating system name; for
1063 example, `GNU pspp 0.1.4 - sparc-sun-solaris2.5.2'. The string is
1064 truncated if it would be longer than 60 characters; otherwise it
1065 is padded on the right with spaces.
1067 `int32 layout_code;'
1068 Always set to 2. PSPP reads this value in order to determine the file's endianness.
1072 Number of data elements per case. This is the number of variables,
1073 except that long string variables add extra data elements (one for
1074 every 8 characters after the first 8).
1077 Set to 1 if the data in the file is compressed, 0 otherwise.
1079 `int32 weight_index;'
1080 If one of the variables in the data set is used as a weighting
1081 variable, set to the index of that variable. Otherwise, set to 0.
1084 Set to the number of cases in the file if it is known, or -1 otherwise.
1087 In the general case it is not possible to determine the number of
1088 cases that will be output to a system file at the time that the
1089 header is written. The way that this is dealt with is by writing
1090 the entire system file, including the header, then seeking back to
1091 the beginning of the file and writing just the `ncases' field.
1092 For files on which seeking is not possible, the seek operation fails.
1093 In this case, `ncases' remains -1.
1096 Compression bias. Always set to 100. The significance of this
1097 value is that only numbers between `(1 - bias)' and `(251 - bias)'
1100 `char creation_date[9];'
1101 Set to the date of creation of the system file, in `dd mmm yy'
1102 format, with the month given as a standard English abbreviation: an
1103 initial capital letter followed by lowercase letters. If the date
1104 is not available then this field is arbitrarily set to `01 Jan 70'.
1106 `char creation_time[8];'
1107 Set to the time of creation of the system file, in `hh:mm:ss'
1108 format and using 24-hour time. If the time is not available then
1109 this field is arbitrarily set to `00:00:00'.
1111 `char file_label[64];'
1112 Set to the file label declared by the user, if any. Padded on the right with spaces.
1116 Ignored padding bytes to make the structure a multiple of 32 bits
1117 in length. Set to zeros.
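The effect of the compression bias can be illustrated with a short C check. Beyond what is stated above, the sketch assumes that a compressible value must be a whole number and that it is stored as the single byte `value + bias', so with the bias of 100 the codes run from 1 to 251. `compress_code' is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true and stores the one-byte code if `value` can be
   compressed under `bias` (assumed encoding: code = value + bias). */
bool compress_code(double value, double bias, unsigned char *code)
{
    assert(code != NULL);
    if (!(value >= 1 - bias && value <= 251 - bias))
        return false;                    /* out of range, or NaN */
    if (value != (double) (long) value)
        return false;                    /* has a fractional part */
    *code = (unsigned char) (value + bias);
    return true;
}
```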
1120 File: pspp.info, Node: Variable Record, Next: Value Label Record, Prev: File Header Record, Up: Data File Format
1125 Immediately following the header must come the variable records.
1126 There must be one variable record for every variable and every 8
1127 characters in a long string beyond the first 8; i.e., there must be
1128 exactly as many variable records as the value specified for `case_size'
1129 in the file header record.
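The relationship between variable widths and `case_size' can be computed directly. The sketch assumes that width 0 denotes a numeric variable and that a trailing partial group of string characters still needs its own record (the count is rounded up); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Variable records (and case elements) needed for one variable:
   1 for a numeric variable, ceil(width / 8) for a string. */
int variable_record_count(int width)
{
    assert(width >= 0);
    return width == 0 ? 1 : (width + 7) / 8;
}

/* case_size for a dictionary whose variables have the given widths. */
int case_size(const int *widths, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += variable_record_count(widths[i]);
    return total;
}
```

A dictionary with one numeric variable and strings of widths 8, 9, and 20 therefore needs 1 + 1 + 2 + 3 = 7 variable records.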
1131 struct sysfile_variable
1135 int32 has_var_label;
1136 int32 n_missing_values;
1141 /* The following two fields are present
1142 only if has_var_label is 1. */
1144 char label[/* variable length */];
1146 /* The following field is present only
1147 if n_missing_values is not 0. */
1148 flt64 missing_values[/* variable length*/];
1152 Record type code. Always set to 2.
1155 Variable type code. Set to 0 for a numeric variable. For a short
1156 string variable or the first part of a long string variable, this
1157 is set to the width of the string. For the second and subsequent
1158 parts of a long string variable, set to -1, and the remaining
1159 fields in the structure are ignored.
1161 `int32 has_var_label;'
1162 If this variable has a variable label, set to 1; otherwise, set to 0.
1165 `int32 n_missing_values;'
1166 If the variable has no missing values, set to 0. If the variable
1167 has one, two, or three discrete missing values, set to 1, 2, or 3,
1168 respectively. If the variable has a range for missing values,
1169 set to -2; if the variable has a range for missing values plus
1170 a single discrete value, set to -3.
1173 Print format for this variable. See below.
1176 Write format for this variable. See below.
1179 Variable name. The variable name must begin with a capital letter
1180 or the at-sign (`@'). Subsequent characters may also be
1181 octothorpes (`#'), dollar signs (`$'), underscores (`_'), or full
1182 stops (`.'). The variable name is padded on the right with spaces.
1185 This field is present only if `has_var_label' is set to 1. It is
1186 set to the length, in characters, of the variable label, which
1187 must be a number between 0 and 120.
1189 `char label[/* variable length */];'
1190 This field is present only if `has_var_label' is set to 1. It has
1191 length `label_len', rounded up to the nearest multiple of 32 bits.
1192 The first `label_len' characters are the variable's variable label.
1194 `flt64 missing_values[/* variable length */];'
1195 This field is present only if `n_missing_values' is not 0. It has
1196 the same number of elements as the absolute value of
1197 `n_missing_values'. For discrete missing values, each element
1198 represents one missing value. When a range is present, the first
1199 element denotes the minimum value in the range, and the second
1200 element denotes the maximum value in the range. When a range plus
1201 a value are present, the third element denotes the additional
1202 discrete missing value. HIGHEST and LOWEST are indicated as
1203 described in the chapter introduction.
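The on-disk size of one variable record follows from the optional fields above. This sketch assumes that the fixed part (partly elided from the structure shown) consists of six `int32' fields followed by an 8-byte name, matching the 8-character limit on variable names; treat those sizes and the function name as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bytes in one variable record, assuming the fixed part is six int32
   fields (rec_type, type, has_var_label, n_missing_values, print,
   write) plus an 8-byte name. */
size_t variable_record_size(int has_var_label, int label_len,
                            int n_missing_values)
{
    size_t size = 6 * 4 + 8;
    if (has_var_label == 1) {
        assert(label_len >= 0 && label_len <= 120);
        /* label_len field, then the label rounded up to 32 bits. */
        size += 4 + ((size_t) label_len + 3) / 4 * 4;
    }
    size += 8 * (size_t) abs(n_missing_values);   /* one flt64 each */
    return size;
}
```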
1205 The `print' and `write' members of sysfile_variable are output
1206 formats coded into `int32' types. The LSB (least-significant byte) of
1207 the `int32' represents the number of decimal places, and the next two
1208 bytes in order of increasing significance represent field width and
1209 format type, respectively. The MSB (most-significant byte) is not used
1210 and should be set to zero.
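The byte layout above can be packed and unpacked with shifts and masks. This sketch operates on the `int32' value after it has been read in the file's byte order; the helper names are hypothetical, and the numeric type codes are whatever the format type table assigns:

```c
#include <assert.h>
#include <stdint.h>

/* Packs a print or write format: decimals in the least-significant
   byte, width in the next byte, type in the byte above that, and the
   most-significant byte left as zero. */
static int32_t pack_format(int type, int width, int decimals)
{
    assert(type >= 0 && type <= 255);
    assert(width >= 0 && width <= 255);
    assert(decimals >= 0 && decimals <= 255);
    return (int32_t) (((uint32_t) type << 16)
                      | ((uint32_t) width << 8)
                      | (uint32_t) decimals);
}

/* Unpack the individual bytes of a packed format. */
static int format_decimals(int32_t fmt) { return fmt & 0xff; }
static int format_width(int32_t fmt)    { return (fmt >> 8) & 0xff; }
static int format_type(int32_t fmt)     { return (fmt >> 16) & 0xff; }
```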
1212 Format types are defined as follows:
1334 File: pspp.info, Node: Value Label Record, Next: Value Label Variable Record, Prev: Variable Record, Up: Data File Format
1339 Value label records must follow the variable records and must precede
1340 the dictionary termination record. Other than this, they may appear
1341 anywhere in the system file. Every value label record must be
1342 immediately followed by a value label variable record, described below.
1344 Value label records begin with `rec_type', an `int32' value set to
1345 the record type of 3. This is followed by `count', an `int32' value
1346 set to the number of value labels present in this record.
1348 These two fields are followed by a series of `count' tuples. Each
1349 tuple is divided into two fields, the value and the label. The first of
1350 these, the value, is composed of a 64-bit value, which is either a
1351 `flt64' value or up to 8 characters (padded on the right to 8 bytes)
1352 denoting a short string value. Whether the value is a `flt64' or a
1353 character string is not defined inside the value label record.
1355 The second field in the tuple, the label, has variable length. The
1356 first `char' is a count of the number of characters in the value label.
1357 The remainder of the field is the label itself. The field is padded
1358 on the right to a multiple of 64 bits in length.
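Because the label field's length depends on its first byte, a reader must compute the padded size to find the next tuple. A minimal sketch with hypothetical helper names, working on a buffer positioned at the start of a label field (just after the tuple's 8-byte value):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes occupied by one label field: the 1-byte count plus the label
   text, rounded up to a multiple of 8 bytes (64 bits). */
static size_t label_field_size(unsigned char label_len)
{
    return ((size_t) label_len + 1 + 7) / 8 * 8;
}

/* Copies the label starting at `p' into `out' (at least 256 bytes)
   as a NUL-terminated string and returns the bytes consumed, so the
   caller can advance to the next tuple's 8-byte value. */
static size_t read_label_field(const unsigned char *p, char *out)
{
    unsigned char len;

    assert(p != NULL && out != NULL);
    len = p[0];
    memcpy(out, p + 1, len);
    out[len] = '\0';
    return label_field_size(len);
}
```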
1361 File: pspp.info, Node: Value Label Variable Record, Next: Document Record, Prev: Value Label Record, Up: Data File Format
1363 Value Label Variable Record
1364 ===========================
1366 Every value label variable record must be immediately preceded by a
1367 value label record, described above.
1369 struct sysfile_value_label_variable
1373 int32 vars[/* variable length */];
1377 Record type. Always set to 4.
1380 Number of variables to which the associated value labels from the
1381 value label record are to be applied.
1383 `int32 vars[/* variable length */];'
1384 A list of variables to which to apply the value labels. There are
1388 File: pspp.info, Node: Document Record, Next: Machine int32 Info Record, Prev: Value Label Variable Record, Up: Data File Format
1393 There must be no more than one document record per system file.
1394 Document records must follow the variable records and precede the
1395 dictionary termination record.
1397 struct sysfile_document
1401 char lines[/* variable length */][80];
1405 Record type. Always set to 6.
1408 Number of lines of documents present.
1410 `char lines[/* variable length */][80];'
1411 Document lines. The number of elements is defined by `n_lines'.
1412 Lines shorter than 80 characters are padded on the right with
1416 File: pspp.info, Node: Machine int32 Info Record, Next: Machine flt64 Info Record, Prev: Document Record, Up: Data File Format
1418 Machine `int32' Info Record
1419 ===========================
1421 There must be no more than one machine `int32' info record per
1422 system file. Machine `int32' info records must follow the variable
1423 records and precede the dictionary termination record.
1425 struct sysfile_machine_int32_info
1434 int32 version_major;
1435 int32 version_minor;
1436 int32 version_revision;
1438 int32 floating_point_rep;
1439 int32 compression_code;
1441 int32 character_code;
1445 Record type. Always set to 7.
1448 Record subtype. Always set to 3.
1451 Size of each piece of data in the data part, in bytes. Always set
1455 Number of pieces of data in the data part. Always set to 8.
1457 `int32 version_major;'
1458 PSPP major version number. In version X.Y.Z, this is X.
1460 `int32 version_minor;'
1461 PSPP minor version number. In version X.Y.Z, this is Y.
1463 `int32 version_revision;'
1464 PSPP version revision number. In version X.Y.Z, this is Z.
1466 `int32 machine_code;'
1467 Machine code. PSPP always sets this field to -1, but other
1470 `int32 floating_point_rep;'
1471 Floating point representation code. For IEEE 754 systems this is
1472 1. IBM 370 sets this to 2, and DEC VAX E to 3.
1474 `int32 compression_code;'
1475 Compression code. Always set to 1.
1478 Machine endianness. 1 indicates big-endian, 2 indicates
1481 `int32 character_code;'
1482 Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3
1483 indicates 8-bit ASCII, 4 indicates DEC Kanji.
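Of the fields above, `floating_point_rep' and `character_code' decide whether a reader can use the data without conversion. A minimal sketch, assuming a hypothetical reader that handles only IEEE 754 doubles and ASCII-based text; the enumerator names and `natively_readable' are illustrative, not part of PSPP:

```c
#include <stdbool.h>
#include <stdint.h>

/* Codes listed above for the machine int32 info record. */
enum { FP_IEEE754 = 1, FP_IBM370 = 2, FP_VAX_E = 3 };
enum { CHAR_EBCDIC = 1, CHAR_ASCII7 = 2, CHAR_ASCII8 = 3, CHAR_DEC_KANJI = 4 };

/* True if the file uses IEEE 754 floating point and 7-bit or 8-bit
   ASCII, so this hypothetical reader needs no conversion. */
static bool natively_readable(int32_t floating_point_rep,
                              int32_t character_code)
{
    return floating_point_rep == FP_IEEE754
           && (character_code == CHAR_ASCII7
               || character_code == CHAR_ASCII8);
}
```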
1486 File: pspp.info, Node: Machine flt64 Info Record, Next: Miscellaneous Informational Records, Prev: Machine int32 Info Record, Up: Data File Format
1488 Machine `flt64' Info Record
1489 ===========================
1491 There must be no more than one machine `flt64' info record per
1492 system file. Machine `flt64' info records must follow the variable
1493 records and precede the dictionary termination record.
1495 struct sysfile_machine_flt64_info
1510 Record type. Always set to 7.
1513 Record subtype. Always set to 4.
1516 Size of each piece of data in the data part, in bytes. Always set
1520 Number of pieces of data in the data part. Always set to 3.
1523 The system missing value.
1526 The value used for HIGHEST in missing values.
1529 The value used for LOWEST in missing values.
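A reader typically keeps the three values from this record for later comparisons. A minimal sketch; `struct flt64_info' and `is_sysmis' are hypothetical names, and comparing bit patterns with `memcmp' rather than `==' is a defensive choice of this sketch, so the test still behaves if a writer stored a NaN as the system-missing value:

```c
#include <stdbool.h>
#include <string.h>

typedef double flt64;  /* assumption: host double is IEEE 754 binary64 */

/* In-memory copy of the machine flt64 info record's data part. */
struct flt64_info {
    flt64 sysmis;   /* system-missing value */
    flt64 highest;  /* value used for HIGHEST */
    flt64 lowest;   /* value used for LOWEST */
};

/* True if `x' has exactly the bit pattern of the system-missing value. */
static bool is_sysmis(const struct flt64_info *info, flt64 x)
{
    return memcmp(&x, &info->sysmis, sizeof x) == 0;
}
```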
1532 File: pspp.info, Node: Miscellaneous Informational Records, Next: Dictionary Termination Record, Prev: Machine flt64 Info Record, Up: Data File Format
1534 Miscellaneous Informational Records
1535 ===================================
1537 Miscellaneous informational records must follow the variable records
1538 and precede the dictionary termination record.
1540 Miscellaneous informational records are ignored by PSPP when reading
1541 system files. They are not written by PSPP when writing system files.
1543 struct sysfile_misc_info
1552 char data[/* variable length */];
1556 Record type. Always set to 7.
1559 Record subtype. May take any value.
1562 Size of each piece of data in the data part. Should have the
1563 value 4 or 8, for `int32' and `flt64', respectively.
1566 Number of pieces of data in the data part.
1568 `char data[/* variable length */];'
1569 Arbitrary data. There must be `size' times `count' bytes of data.
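Since PSPP ignores these records, a reader only needs to skip `size' times `count' bytes. A sketch assuming `size' and `count' have already been read and converted to host byte order, with the stream positioned just after `count'; `skip_misc_info' is a hypothetical name, and rejecting sizes other than 4 or 8 is a strict choice of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Skips the data part of a miscellaneous informational record.
   Seeks one piece at a time to avoid overflowing `long'.  Note that
   fseek may succeed past end of file, so truncation is only detected
   by the next read. */
static bool skip_misc_info(FILE *f, int32_t size, int32_t count)
{
    int32_t i;

    assert(f != NULL);
    if ((size != 4 && size != 8) || count < 0)
        return false;
    for (i = 0; i < count; i++)
        if (fseek(f, size, SEEK_CUR) != 0)
            return false;
    return true;
}
```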
1572 File: pspp.info, Node: Dictionary Termination Record, Next: Data Record, Prev: Miscellaneous Informational Records, Up: Data File Format
1574 Dictionary Termination Record
1575 =============================
1577 The dictionary termination record must follow all other records,
1578 except for the actual cases, which it must precede. There must be
1579 exactly one dictionary termination record in every system file.
1581 struct sysfile_dict_term
1588 Record type. Always set to 999.
1591 Ignored padding. Should be set to 0.
1594 File: pspp.info, Node: Data Record, Prev: Dictionary Termination Record, Up: Data File Format
1599 Data records must follow all other records in the system file. There
1600 must be at least one data record in every system file.
1602 The format of data records varies depending on whether the data is
1603 compressed. Regardless, the data is arranged in a series of 8-byte
1606 When data is not compressed, every case is composed of `case_size'
1607 of these 8-byte elements, where `case_size' comes from the file header
1608 record (*note File Header Record::). Each element corresponds to the
1609 variable declared in the respective variable record (*note Variable
1610 Record::). Numeric values are given in `flt64' format; string values
1611 are literal character strings, padded on the right when necessary.
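With uncompressed data, locating a variable's value in a case is plain offset arithmetic: element I starts at byte 8 * I of the case. A sketch assuming host doubles are IEEE 754 in the file's byte order and that a short string's width (1 to 8 characters) is taken from its variable record; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef double flt64;  /* assumption: IEEE 754 doubles, file byte order */

/* Returns element `i' of an uncompressed case as a numeric value.
   `case_buf' holds one whole case: case_size * 8 bytes. */
static flt64 case_numeric(const unsigned char *case_buf, int32_t i)
{
    flt64 x;
    memcpy(&x, case_buf + (size_t) i * 8, sizeof x);
    return x;
}

/* Copies the `width' characters (1 to 8) of the short string in
   element `i' into `out' and NUL-terminates it.  Bytes past `width'
   are right-padding and are not copied. */
static void case_short_string(const unsigned char *case_buf, int32_t i,
                              int width, char *out)
{
    assert(width >= 1 && width <= 8);
    memcpy(out, case_buf + (size_t) i * 8, (size_t) width);
    out[width] = '\0';
}
```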
1613 Compressed data is arranged in the following manner: the first 8-byte
1614 element in the data section is divided into a series of 1-byte command
1615 codes. These codes have meanings as described below:
1618 Ignored. If the program writing the system file accumulates
1619 compressed data in blocks of fixed length, 0 bytes can be used to
1620 pad out extra bytes remaining at the end of a fixed-size block.
1623 These values indicate that the corresponding numeric variable has
1624 the value `(CODE - BIAS)' for the case being read, where CODE is
1625 the value of the compression code and BIAS is the variable
1626 `compression_bias' from the file header. For example, code 105
1627 with bias 100.0 (the normal value) indicates a numeric variable of
1631 End of file. This code may or may not appear at the end of the
1632 data stream. PSPP always outputs this code but its use is not
1636 This value indicates that the numeric or string value is not
1637 compressible. The value is stored in the 8-byte element following
1638 the current block of command bytes. If this value appears twice
1639 in a block of command bytes, then it indicates the second element
1640 following the command bytes, and so on.
1643 Used to indicate a string value that is all spaces.
1646 Used to indicate the system-missing value.
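The command bytes can be decoded with a small state machine. This sketch is hedged in several ways: the item headings that give each code's number are not shown in this excerpt, so it assumes the usual assignments (0 padding, 1 through 251 biased numeric values, 252 end of file, 253 uncompressible value, 254 all-spaces string, 255 system-missing); it assumes host doubles are IEEE 754 in the file's byte order; and `struct decompressor' and `next_element' are hypothetical names. Rather than skipping the uncompressible values once a block of command bytes is used up, it reads each one when its 253 code is reached, which leaves the read position at the next block of command bytes, with the same effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double flt64;  /* assumption: IEEE 754 doubles, file byte order */

/* Decompressor state over an in-memory copy of the data records. */
struct decompressor {
    const unsigned char *data;  /* first byte of the data records */
    size_t size;                /* length of `data', a multiple of 8 */
    size_t pos;                 /* offset of the next unread 8-byte element */
    unsigned char cmd[8];       /* current block of command bytes */
    int cmd_idx;                /* next command byte; 8 means read a new block */
    flt64 bias;                 /* `compression_bias' from the file header */
    flt64 sysmis;               /* from the machine flt64 info record */
};

/* Stores the next decompressed 8-byte element in `out'.  Returns 1 on
   success, 0 at end of data (code 252 or no bytes left). */
static int next_element(struct decompressor *d, unsigned char out[8])
{
    assert(d->size % 8 == 0);
    for (;;) {
        unsigned char code;

        if (d->cmd_idx == 8) {
            /* Raw values of the previous block were consumed as their
               253 codes were reached, so `pos' is at the next block. */
            if (d->pos + 8 > d->size)
                return 0;
            memcpy(d->cmd, d->data + d->pos, 8);
            d->pos += 8;
            d->cmd_idx = 0;
        }

        code = d->cmd[d->cmd_idx++];
        if (code == 0)
            continue;                     /* padding: ignored */
        if (code <= 251) {
            flt64 x = code - d->bias;     /* biased small integer */
            memcpy(out, &x, 8);
            return 1;
        }
        if (code == 252)
            return 0;                     /* end of file */
        if (code == 253) {                /* value stored verbatim */
            if (d->pos + 8 > d->size)
                return 0;
            memcpy(out, d->data + d->pos, 8);
            d->pos += 8;
            return 1;
        }
        if (code == 254) {
            memset(out, ' ', 8);          /* string of all spaces */
            return 1;
        }
        memcpy(out, &d->sysmis, 8);       /* 255: system-missing */
        return 1;
    }
}
```

A caller would start with `cmd_idx' set to 8 and request `case_size' elements per case.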
1648 When the end of the first 8-byte element of command bytes is reached,
1649 any blocks of non-compressible values are skipped, and the next element
1650 of command bytes is read and interpreted, until the end of the file is