This is pspp.info, produced by makeinfo version 4.0 from pspp.texi.

* PSPP: (pspp). Statistical analysis package.

PSPP, for statistical analysis of sampled data, by Ben Pfaff.

This file documents PSPP, a statistical package for analysis of
sampled data that uses a command language compatible with SPSS.

Copyright (C) 1996-9, 2000 Free Software Foundation, Inc.

This version of the PSPP documentation is consistent with version 2

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above condition for modified
versions, except that this permission notice may be stated in a
translation approved by the Free Software Foundation.

File: pspp.info, Node: SAVE, Next: SYSFILE INFO, Prev: MATCH FILES, Up: System and Portable Files

         /{COMPRESSED,UNCOMPRESSED}
         /RENAME=(src_names=target_names)...

The SAVE procedure writes the dictionary and data in the active file
to a system file.

The FILE subcommand is the only required subcommand. Specify the
system file to be written as a string file name or a file handle (*note
FILE HANDLE::).

The COMPRESSED and UNCOMPRESSED subcommands determine whether the
saved system file is compressed. By default, system files are
compressed. This default can be changed with the SET command (*note
SET::).

By default, all the variables in the active file dictionary are
written to the system file. The DROP subcommand specifies a list of
variables not to be written. In contrast, KEEP specifies the variables
to be written; all variables not listed are omitted.

Normally variables are saved to a system file under the same names
they have in the active file. Use the RENAME subcommand to change these
names. Specify, within parentheses, a list of variable names followed
by an equals sign (`=') and the names that they should be renamed to.
Multiple parenthesized groups of variable names can be included on a
single RENAME subcommand. Variables' names may be swapped using a
RENAME subcommand of the form `/RENAME=(A B=B A)'.

Alternate syntax for the RENAME subcommand allows the parentheses to
be omitted. When this is done, only a single variable may be renamed
at once, for instance, `/RENAME=A=B'. This alternate syntax is
deprecated.

DROP, KEEP, and RENAME are performed in left-to-right order. Each may
be present any number of times.

Note that DROP, KEEP, and RENAME do not modify the active file. Only
the system file written to disk is affected.

SAVE causes the data to be read. It is a procedure.
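
The left-to-right rule for DROP, KEEP, and RENAME can be modeled on a
plain list of variable names. The Python sketch below is illustrative
only: the function `apply_save_subcommands` and the tuple encoding of
subcommands are hypothetical, and it models name bookkeeping, not
PSPP's implementation or its error messages.

```python
from typing import List, Tuple

# Hypothetical encoding of subcommands, in the order they appear:
#   ("DROP", [names]), ("KEEP", [names]), ("RENAME", ([old], [new]))
Subcommand = Tuple[str, object]


def apply_save_subcommands(variables: List[str],
                           subcommands: List[Subcommand]) -> List[str]:
    """Return the variable names that would be written to the system file.

    The input list stands for the active file dictionary and is never
    modified, mirroring the rule that SAVE leaves the active file alone.
    """
    result = list(variables)  # work on a copy
    for kind, arg in subcommands:  # strictly left to right
        if kind == "DROP":
            dropped = set(arg)
            result = [v for v in result if v not in dropped]
        elif kind == "KEEP":
            kept = set(arg)
            # Dictionary order is kept here; whether KEEP also reorders
            # variables is not stated above (assumption).
            result = [v for v in result if v in kept]
        elif kind == "RENAME":
            old, new = arg
            if len(old) != len(new):
                raise ValueError("RENAME needs as many new names as old ones")
            # Build the whole mapping before applying it, so that a swap
            # such as (A B=B A) works.
            mapping = dict(zip(old, new))
            result = [mapping.get(v, v) for v in result]
        else:
            raise ValueError(f"unknown subcommand {kind!r}")
    return result


active = ["A", "B", "C"]
print(apply_save_subcommands(active, [("RENAME", (["A", "B"], ["B", "A"])),
                                      ("DROP", ["C"])]))  # ['B', 'A']
print(active)  # ['A', 'B', 'C']: the active file is unchanged
```

Because the subcommands run in order, a DROP that follows a RENAME must
use the new names.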

File: pspp.info, Node: SYSFILE INFO, Next: XSAVE, Prev: SAVE, Up: System and Portable Files

     SYSFILE INFO FILE='filename'.

The SYSFILE INFO command reads the dictionary in a system file and
displays the information in its dictionary.

Specify a file name or file handle. SYSFILE INFO will read that
file as a system file and display information on its dictionary.

The file does not replace the current active file.

File: pspp.info, Node: XSAVE, Prev: SYSFILE INFO, Up: System and Portable Files

         /{COMPRESSED,UNCOMPRESSED}
         /RENAME=(src_names=target_names)...

The XSAVE transformation writes the active file dictionary and data
to a system file stored on disk.

XSAVE is a transformation, not a procedure. It is executed when the
data is read by a procedure or procedure-like command. In all other
respects, XSAVE is identical to SAVE. *Note SAVE::, for more
information on syntax and usage.

File: pspp.info, Node: Variable Attributes, Next: Data Manipulation, Prev: System and Portable Files, Up: Top

Manipulating variables
**********************

The variables in the active file dictionary are important. PSPP
provides several commands for examining and adjusting them.

* ADD VALUE LABELS:: Add value labels to variables.
* DISPLAY:: Display variable names & descriptions.
* DISPLAY VECTORS:: Display a list of vectors.
* FORMATS:: Set print and write formats.
* LEAVE:: Don't clear variables between cases.
* MISSING VALUES:: Set missing values for variables.
* MODIFY VARS:: Rename, reorder, and drop variables.
* NUMERIC:: Create new numeric variables.
* PRINT FORMATS:: Set variable print formats.
* RENAME VARIABLES:: Rename variables.
* VALUE LABELS:: Set value labels for variables.
* STRING:: Create new string variables.
* VARIABLE LABELS:: Set variable labels for variables.
* VECTOR:: Declare an array of variables.
* WRITE FORMATS:: Set variable write formats.

File: pspp.info, Node: ADD VALUE LABELS, Next: DISPLAY, Prev: Variable Attributes, Up: Variable Attributes

         /var_list value 'label' [value 'label']...

ADD VALUE LABELS has the same syntax and purpose as VALUE LABELS
(*note VALUE LABELS::), but it does not clear away existing value
labels from the variables before adding the ones specified.

File: pspp.info, Node: DISPLAY, Next: DISPLAY VECTORS, Prev: ADD VALUE LABELS, Up: Variable Attributes

     DISPLAY {NAMES,INDEX,LABELS,VARIABLES,DICTIONARY,SCRATCH}

DISPLAY displays requested information on variables. Variables can
optionally be sorted alphabetically. The entire dictionary or just
specified variables can be described.

One of the following keywords can be present:

NAMES
     The variables' names are displayed.

INDEX
     The variables' names are displayed along with a value describing
     their position within the active file dictionary.

LABELS
     Variable names, positions, and variable labels are displayed.

VARIABLES
     Variable names, positions, print and write formats, and missing
     values are displayed.

DICTIONARY
     Variable names, positions, print and write formats, missing values,
     variable labels, and value labels are displayed.

SCRATCH
     Variable names are displayed, for scratch variables only (*note
     Scratch Variables::).

If SORTED is specified, then the variables are displayed in ascending
order based on their names; otherwise, they are displayed in the order
that they occur in the active file dictionary.

File: pspp.info, Node: DISPLAY VECTORS, Next: FORMATS, Prev: DISPLAY, Up: Variable Attributes

The DISPLAY VECTORS command causes a list of the currently declared
vectors to be displayed.

File: pspp.info, Node: FORMATS, Next: LEAVE, Prev: DISPLAY VECTORS, Up: Variable Attributes

     FORMATS var_list (fmt_spec).

The FORMATS command sets the print and write formats for the specified
variables to the specified format specification. *Note Input/Output
Formats::.

Specify a list of variables followed by a format specification in
parentheses. The print and write formats of the specified variables
are changed.

Additional lists of variables and formats may be included if they are
delimited by a slash (`/').

The FORMATS command takes effect immediately. It is not affected by
conditional and looping structures such as DO IF or LOOP.

File: pspp.info, Node: LEAVE, Next: MISSING VALUES, Prev: FORMATS, Up: Variable Attributes

The LEAVE command prevents the specified variables from being
reinitialized whenever a new case is processed.

Normally, when a data file is processed, every variable in the active
file is initialized to the system-missing value or spaces at the
beginning of processing for each case. When a variable has been
specified on LEAVE, this is not the case. Instead, that variable is
initialized to 0 (not system-missing) or spaces for the first case.
After that, it retains its value between cases.

This is useful for counters. For instance, in the example below the
variable SUM maintains a running total of the values in the ITEM
variable.

     COMPUTE SUM=SUM+ITEM.

Partial output from this example:

It is best to use the LEAVE command immediately before invoking a
procedure command, because it is reset by certain transformations, for
instance, COMPUTE and IF. LEAVE is also reset by all procedure
invocations.
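
The difference LEAVE makes for a running total can be sketched in
Python. This is a model, not PSPP code: `run_sum` is a hypothetical
name, a NaN stands in for the system-missing value, and the sketch
relies on arithmetic with a missing operand yielding a missing result.

```python
import math

SYSMIS = math.nan  # stand-in for the system-missing value (assumption)


def run_sum(items, leave=True):
    """Model COMPUTE SUM=SUM+ITEM. over a sequence of cases.

    With leave=True, SUM starts at 0 for the first case and keeps its
    value between cases, as LEAVE SUM does. With leave=False, SUM is
    reset to system-missing before every case, so SUM+ITEM is missing.
    """
    results = []
    total = 0.0  # LEAVE initializes a numeric variable to 0, not SYSMIS
    for item in items:
        if not leave:
            total = SYSMIS  # the normal per-case reinitialization
        total = total + item  # NaN propagates, like a missing operand
        results.append(total)
    return results


print(run_sum([123, 404, 555, 999]))  # [123.0, 527.0, 1082.0, 2081.0]
```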

File: pspp.info, Node: MISSING VALUES, Next: MODIFY VARS, Prev: LEAVE, Up: Variable Attributes

     MISSING VALUES var_list (missing_values).

     missing_values takes one of the following forms:
             string1, string2, string3
     As part of a range, LO or LOWEST may take the place of num1;
     HI or HIGHEST may take the place of num2.

The MISSING VALUES command sets user-missing values for numeric and
short string variables. Long string variables may not have missing
values.

Specify a list of variables, followed by a list of their user-missing
values in parentheses. Up to three discrete values may be given, or,
for numeric variables only, a range of values optionally accompanied by
a single discrete value. Ranges may be open-ended on one end, indicated
through the use of the keyword LO or LOWEST or HI or HIGHEST.

The MISSING VALUES command takes effect immediately. It is not
affected by conditional and looping constructs such as DO IF or LOOP.
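
The limits on user-missing specifications, and the test a numeric
value goes through, can be summarized in a short Python sketch. The
class `MissingSpec` is hypothetical, LO and HI are modeled as negative
and positive infinity, and ranges are assumed to be inclusive at both
ends.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

LO = -math.inf  # stands in for LO / LOWEST
HI = math.inf   # stands in for HI / HIGHEST


@dataclass
class MissingSpec:
    """User-missing values for one numeric variable (hypothetical model)."""
    discrete: Tuple[float, ...] = ()
    value_range: Optional[Tuple[float, float]] = None

    def __post_init__(self):
        # Up to three discrete values, or one range plus at most one value.
        limit = 1 if self.value_range is not None else 3
        if len(self.discrete) > limit:
            raise ValueError("too many discrete missing values")

    def is_user_missing(self, x: float) -> bool:
        if x in self.discrete:
            return True
        if self.value_range is not None:
            low, high = self.value_range
            return low <= x <= high  # inclusive range (assumption)
        return False


# Roughly MISSING VALUES X (LO THRU 0, 99).
spec = MissingSpec(discrete=(99,), value_range=(LO, 0))
print([spec.is_user_missing(x) for x in (-5, 0, 1, 99)])
# [True, True, False, True]
```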

File: pspp.info, Node: MODIFY VARS, Next: NUMERIC, Prev: MISSING VALUES, Up: Variable Attributes

         /REORDER={FORWARD,BACKWARD} {POSITIONAL,ALPHA} (var_list)...
         /RENAME=(old_names=new_names)...
         /{DROP,KEEP}=var_list

The MODIFY VARS command allows variables in the active file to be
reordered, renamed, or deleted.

At least one subcommand must be specified, and no subcommand may be
specified more than once. DROP and KEEP may not both be specified.

The REORDER subcommand changes the order of variables in the active
file. Specify one or more lists of variable names in parentheses. By
default, each list of variables is rearranged into the specified order.
To put the variables into the reverse of the specified order, put the
keyword BACKWARD before the parentheses. To put them into alphabetical
order in the dictionary, specify the keyword ALPHA before the
parentheses. BACKWARD and ALPHA may also be combined.

To rename variables in the active file, specify RENAME, an equals
sign (`='), and lists of the old variable names and new variable names
separated by another equals sign within parentheses. There must be the
same number of old and new variable names. Each old variable is
renamed to the corresponding new variable name. Multiple parenthesized
groups of variables may be specified.

The DROP subcommand deletes a specified list of variables from the
active file.

The KEEP subcommand keeps the specified list of variables in the
active file. Any unlisted variables are deleted from the active file.

MAP is currently ignored.

MODIFY VARS takes effect immediately. It does not cause the data to
be read.

File: pspp.info, Node: NUMERIC, Next: PRINT FORMATS, Prev: MODIFY VARS, Up: Variable Attributes

     NUMERIC /var_list [(fmt_spec)].

The NUMERIC command explicitly declares new numeric variables,
optionally setting their output formats.

Specify a slash (`/'), followed by the names of the new numeric
variables. If you wish to set their output formats, follow their names
by an output format specification in parentheses (*note Input/Output
Formats::). If no output format specification is given then the
variables will default to F8.2.

Variables created with NUMERIC will be initialized to the
system-missing value.

File: pspp.info, Node: PRINT FORMATS, Next: RENAME VARIABLES, Prev: NUMERIC, Up: Variable Attributes

     PRINT FORMATS var_list (fmt_spec).

The PRINT FORMATS command sets the print formats for the specified
variables to the specified format specification.

Syntax is identical to that of FORMATS (*note FORMATS::), but the
PRINT FORMATS command sets only print formats, not write formats.

File: pspp.info, Node: RENAME VARIABLES, Next: VALUE LABELS, Prev: PRINT FORMATS, Up: Variable Attributes

     RENAME VARIABLES (old_names=new_names)... .

The RENAME VARIABLES command allows the names of variables in the
active file to be changed.

To rename variables, specify lists of the old variable names and new
variable names, separated by an equals sign (`='), within parentheses.
There must be the same number of old and new variable names. Each old
variable is renamed to the corresponding new variable name. Multiple
parenthesized groups of variables may be specified.

RENAME VARIABLES takes effect immediately. It does not cause the
data to be read.

File: pspp.info, Node: VALUE LABELS, Next: STRING, Prev: RENAME VARIABLES, Up: Variable Attributes

         /var_list value 'label' [value 'label']...

The VALUE LABELS command allows values of numeric and short string
variables to be associated with labels. In this way, a short value can
stand for a long value.

In order to set up value labels for a set of variables, specify the
variable names after a slash (`/'), followed by a list of values and
their associated labels, separated by spaces.

Before the VALUE LABELS command is executed, any existing value
labels are cleared from the variables specified.

File: pspp.info, Node: STRING, Next: VARIABLE LABELS, Prev: VALUE LABELS, Up: Variable Attributes

     STRING /var_list (fmt_spec).

The STRING command creates new string variables for use in
transformations.

Specify a slash (`/'), followed by the names of the string variables
to create and the desired output format specification in parentheses
(*note Input/Output Formats::). Variable widths are implicitly derived
from the specified output formats.

Created variables are initialized to spaces.

File: pspp.info, Node: VARIABLE LABELS, Next: VECTOR, Prev: STRING, Up: Variable Attributes

         /var_list 'var_label'.

The VARIABLE LABELS command is used to associate an explanatory name
with a group of variables. This name (a variable label) is displayed by
statistical procedures.

To assign a variable label to a group of variables, specify a slash
(`/'), followed by the list of variable names and the variable label as
a string.

File: pspp.info, Node: VECTOR, Next: WRITE FORMATS, Prev: VARIABLE LABELS, Up: Variable Attributes

     Two possible syntaxes:
             VECTOR vec_name=var_list.
             VECTOR vec_name_list(count).

The VECTOR command allows a group of variables to be accessed as if
they were consecutive members of an array with a vector(index) notation.

To make a vector out of a set of existing variables, specify a name
for the vector followed by an equals sign (`=') and the variables that
belong in the vector.

To make a vector and create variables at the same time, specify one
or more vector names followed by a count in parentheses. This will
cause variables named `VEC1' through `VECcount' to be created as
numeric variables, where VEC is the vector name and count is the number
given. Variable names including numeric suffixes may not exceed 8
characters in length, and none of the variables may exist prior to the
VECTOR command.

All the variables in a vector must be the same type.

Vectors created with VECTOR disappear after any procedure or
procedure-like command is executed. The variables contained in the
vectors remain, unless they are scratch variables (*note Scratch
Variables::).

Variables within a vector may be referenced in expressions using
vector(index) syntax.
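
The naming rule for `VECTOR vec_name(count)' can be checked with a
few lines of Python. The function `vector_variable_names' is
hypothetical; it applies only the two rules stated above (8-character
limit, no pre-existing variables) and compares names exactly as given,
ignoring PSPP's case-insensitive name matching.

```python
from typing import List, Set


def vector_variable_names(vec_name: str, count: int,
                          existing: Set[str]) -> List[str]:
    """Names created by VECTOR vec_name(count): vec_name1 ... vec_namecount.

    Raises ValueError if a name would exceed 8 characters or if any of
    the variables already exists.
    """
    names = [f"{vec_name}{i}" for i in range(1, count + 1)]
    for name in names:
        if len(name) > 8:
            raise ValueError(f"{name} is longer than 8 characters")
        if name in existing:
            raise ValueError(f"{name} already exists")
    return names


print(vector_variable_names("X", 3, set()))  # ['X1', 'X2', 'X3']
```

For example, `VECTOR ABCDEFG(10)' would fail under these rules,
because `ABCDEFG10' has nine characters.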

File: pspp.info, Node: WRITE FORMATS, Prev: VECTOR, Up: Variable Attributes

     WRITE FORMATS var_list (fmt_spec).

The WRITE FORMATS command sets the write formats for the specified
variables to the specified format specification.

Syntax is identical to that of FORMATS (*note FORMATS::), but the
WRITE FORMATS command sets only write formats, not print formats.

File: pspp.info, Node: Data Manipulation, Next: Data Selection, Prev: Variable Attributes, Up: Top

The PSPP procedures examined in this chapter manipulate data and
prepare the active file for later analyses. They do not, as a rule,
produce output.

* AGGREGATE:: Summarize multiple cases into a single case.
* AUTORECODE:: Automatic recoding of variables.
* COMPUTE:: Assigning a variable a calculated value.
* COUNT:: Counting variables with particular values.
* FLIP:: Exchange variables with cases.
* IF:: Conditionally assigning a calculated value.
* RECODE:: Mapping values from one set to another.
* SORT CASES:: Sort the active file.

File: pspp.info, Node: AGGREGATE, Next: AUTORECODE, Prev: Data Manipulation, Up: Data Manipulation

         /OUTFILE={*,'filename'}
         /dest_vars=agr_func(src_vars, args...)...

The AGGREGATE command summarizes groups of cases into single cases.
Cases are divided into groups that have the same values for one or more
variables called "break variables". Several functions are available
for summarizing case contents.

BREAK is the only required subcommand (in addition, at least one
aggregation variable must be specified). Specify a list of variable
names. The values of these variables are used to divide the active file
into groups to be summarized.

By default, the active file is sorted based on the break variables
before aggregation takes place. If the active file is already sorted,
specify PRESORTED to save time.

The OUTFILE subcommand specifies a system file by file name string or
file handle (*note FILE HANDLE::). The aggregated cases are sent to
this file. If OUTFILE is not specified, or if `*' is specified, then
the aggregated cases replace the active file.

Normally the aggregate file does not receive the documents from the
active file, even if the aggregate file replaces the active file.
Specify DOCUMENT to have the documents from the active file copied to
the aggregate file.

At least one aggregation variable must be specified. Specify a list
of aggregation variables, an equals sign (`='), an aggregation function
name (see the list below), and a list of source variables in
parentheses. In addition, some aggregation functions expect additional
arguments in the parentheses following the source variable names.

There must be exactly as many source variables as aggregation
variables. Each aggregation variable receives the results of applying
the specified aggregation function to the corresponding source
variable. Most aggregation functions may be applied to numeric and
short and long string variables. Others are restricted to numeric
values; these are marked as such in the list below.

Any number of sets of aggregation variables may be specified.

The available aggregation functions are as follows:

SUM(var_name)
     Sum. Limited to numeric values.

MEAN(var_name)
     Arithmetic mean. Limited to numeric values.

SD(var_name)
     Standard deviation. Limited to numeric values.

FGT(var_name, value)
PGT(var_name, value)
     Fraction between 0 and 1, or percentage between 0 and 100,
     respectively, of values greater than the specified constant.

FLT(var_name, value)
PLT(var_name, value)
     Fraction or percentage, respectively, of values less than the
     specified constant.

FIN(var_name, low, high)
PIN(var_name, low, high)
     Fraction or percentage, respectively, of values within the
     specified inclusive range of constants.

FOUT(var_name, low, high)
POUT(var_name, low, high)
     Fraction or percentage, respectively, of values strictly outside
     the specified range of constants.

N(var_name)
     Number of non-missing values.

N
     Number of cases aggregated to form this group. Don't supply a
     source variable for this aggregation function.

NU(var_name)
     Number of non-missing values. Each case is considered to have a
     weight of 1, regardless of the current weighting variable (*note
     WEIGHT::).

NU
     Number of cases aggregated to form this group. Each case is
     considered to have a weight of 1, regardless of the current
     weighting variable.

NMISS(var_name)
     Number of missing values.

NUMISS(var_name)
     Number of missing values. Each case is considered to have a
     weight of 1, regardless of the current weighting variable.

FIRST(var_name)
     First value in this group.

LAST(var_name)
     Last value in this group.
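
To make the grouping and the function definitions above concrete,
here is a small Python model. It is a sketch under stated assumptions,
not PSPP's implementation: cases are dicts, `None` stands in for a
missing value, the names `aggregate` and `_summarize` are hypothetical,
and only some of the non-dotted functions are covered (no weighting, no
dotted variants, no /MISSING=COLUMNWISE). Fractions are computed over
the non-missing values, which is also an assumption.

```python
from itertools import groupby
from operator import itemgetter


def aggregate(cases, break_var, func, src_var=None, *args):
    """Return [(break_value, result), ...], one entry per break group."""
    key = itemgetter(break_var)
    results = []
    # Without PRESORTED, cases are sorted on the break variable first.
    # sorted() is stable, so FIRST and LAST still see file order.
    for break_value, group_iter in groupby(sorted(cases, key=key), key=key):
        group = list(group_iter)
        if src_var is None:
            values = []
        else:
            values = [c[src_var] for c in group if c[src_var] is not None]
        results.append((break_value,
                        _summarize(func, src_var, group, values, args)))
    return results


def _summarize(func, src_var, group, values, args):
    if func == "N":
        # N(var) counts non-missing values; bare N counts cases.
        return len(group) if src_var is None else len(values)
    if func == "NMISS":
        return len(group) - len(values)
    if not values:
        return None  # every value missing: the result is missing too
    if func == "SUM":
        return sum(values)
    if func == "MEAN":
        return sum(values) / len(values)
    if func == "FIRST":
        return values[0]
    if func == "LAST":
        return values[-1]
    if func[1:] in ("GT", "LT", "IN", "OUT") and func[0] in "FP":
        op = func[1:]
        if op == "GT":
            hits = sum(v > args[0] for v in values)
        elif op == "LT":
            hits = sum(v < args[0] for v in values)
        elif op == "IN":  # inclusive range
            hits = sum(args[0] <= v <= args[1] for v in values)
        else:  # OUT: strictly outside the range
            hits = sum(v < args[0] or v > args[1] for v in values)
        fraction = hits / len(values)
        return fraction if func[0] == "F" else fraction * 100
    raise ValueError(f"{func} is not modeled in this sketch")


cases = [{"G": 2, "X": 5.0}, {"G": 1, "X": 1.0},
         {"G": 1, "X": None}, {"G": 2, "X": 7.0}]
print(aggregate(cases, "G", "MEAN", "X"))    # [(1, 1.0), (2, 6.0)]
print(aggregate(cases, "G", "PGT", "X", 6))  # [(1, 0.0), (2, 50.0)]
```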

When string values are compared by aggregation functions, the
comparisons are done in terms of internal character codes. On most
modern computers, this is a form of ASCII.

In addition, there is a parallel set of aggregation functions having
the same names as those above, but with a dot after the last character
(for instance, `SUM.'). These functions are the same as the above,
except that they cause user-missing values, which are normally excluded
from calculations, to be included.

Normally, only a single case (two for SD and SD.) needs to be
non-missing in each group for the aggregate variable to be non-missing.
If /MISSING=COLUMNWISE is specified, the behavior reverses: a single
missing value is enough to make the aggregate variable become a missing
value.
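
The two missing-value policies can be contrasted for SD in a short
Python sketch. The function `sd_aggregate` is hypothetical, `None`
marks a missing value, and `statistics.stdev` (the sample standard
deviation with an n-1 divisor) is assumed to match PSPP's formula.

```python
import statistics
from typing import List, Optional


def sd_aggregate(values: List[Optional[float]],
                 columnwise: bool = False) -> Optional[float]:
    """SD of one break group under the two /MISSING policies above."""
    if columnwise and any(v is None for v in values):
        return None  # COLUMNWISE: one missing value makes the result missing
    valid = [v for v in values if v is not None]
    if len(valid) < 2:
        return None  # SD needs at least two non-missing values
    return statistics.stdev(valid)


group = [2.0, 4.0, None, 6.0]
print(sd_aggregate(group))                   # 2.0
print(sd_aggregate(group, columnwise=True))  # None
```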

AGGREGATE ignores the current SPLIT FILE settings and causes them to
be canceled (*note SPLIT FILE::).

File: pspp.info, Node: AUTORECODE, Next: COMPUTE, Prev: AGGREGATE, Up: Data Manipulation

     AUTORECODE VARIABLES=src_vars INTO dest_vars

The AUTORECODE procedure considers the N values that a variable
takes on and maps them onto values 1...N on a new numeric variable.

Subcommand VARIABLES is the only required subcommand and must come
first. Specify VARIABLES, an equals sign (`='), a list of source
variables, INTO, and a list of target variables. There must be the same
number of source and target variables. The target variables must not
already exist.

By default, increasing values of a source variable (for a string,
this is based on character code comparisons) are recoded to increasing
values of its target variable. To cause increasing values of a source
variable to be recoded to decreasing values of its target variable (N
down to 1), specify DESCENDING.

PRINT is currently ignored.

AUTORECODE is a procedure. It causes the data to be read.
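
The mapping AUTORECODE builds can be sketched in Python as follows.
The function name `autorecode` is hypothetical, missing values are not
modeled, and Python's default string ordering (by code point) stands in
for character code comparison.

```python
def autorecode(values, descending=False):
    """Map the N distinct values of a variable onto 1..N.

    Ascending: the smallest value becomes 1. With descending=True the
    smallest value becomes N, as DESCENDING requests.
    """
    distinct = sorted(set(values), reverse=descending)
    code = {value: i for i, value in enumerate(distinct, start=1)}
    return [code[v] for v in values]


print(autorecode(["b", "a", "c", "a"]))                   # [2, 1, 3, 1]
print(autorecode(["b", "a", "c", "a"], descending=True))  # [2, 3, 1, 3]
```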

File: pspp.info, Node: COMPUTE, Next: COUNT, Prev: AUTORECODE, Up: Data Manipulation

     COMPUTE var_name = expression.

`COMPUTE' creates a variable with the name specified (if necessary),
then evaluates the given expression for every case and assigns the
result to the variable. *Note Expressions::.

Numeric variables created or computed by `COMPUTE' are assigned an
output width of 8 characters with two decimal places (`F8.2'). String
variables created or computed by `COMPUTE' have the same width as the
existing variable or constant.

COMPUTE is a transformation. It does not cause the active file to be
read.

File: pspp.info, Node: COUNT, Next: FLIP, Prev: COMPUTE, Up: Data Manipulation

     COUNT var_name = var... (value...).

     Each value takes one of the following forms:
     In addition, num1 and num2 can be LO or LOWEST, or HI or HIGHEST,
     respectively.

`COUNT' creates or replaces a numeric "target" variable that counts
the occurrence of a "criterion" value or set of values over one or more
"test" variables for each case.

The target variable values are always nonnegative integers. They are
never missing. The target variable is assigned an F8.2 output format.
*Note Input/Output Formats::. Any variables, including long and short
string variables, may be test variables.

User-missing values of test variables are treated just like any other
values. They are *not* treated as system-missing values. User-missing
values that are criterion values or inside ranges of criterion values
are counted as any other values. However (for numeric variables),
keyword `MISSING' may be used to refer to all system- and user-missing
values.

`COUNT' target variables are assigned values in the order specified.
In the command `COUNT A=A B(1) /B=A B(2).', the following actions
occur:

   - The number of occurrences of 1 between `A' and `B' is counted.

   - `A' is assigned this value.

   - The number of occurrences of 2 between `B' and the *new* value of
     `A' is counted.

   - `B' is assigned this value.

Despite this ordering, all `COUNT' test variables must exist before
the command is executed; they may not be created as target variables
earlier in the command. Break such a command into two separate
commands.
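
The in-order assignment of targets can be replayed with a small
Python model of COUNT. The functions `matches` and `count` are
hypothetical; criteria are encoded as a number, an inclusive
(low, high) range, or the string "MISSING", `None` stands in for
system-missing, and counting each test variable at most once per case
(even if it matches several criteria) is an assumption.

```python
import math


def matches(value, criterion, user_missing=()):
    """Does one test-variable value satisfy one criterion?"""
    if criterion == "MISSING":
        return value is None or value in user_missing
    if value is None:
        return False  # system-missing matches only MISSING here
    if isinstance(criterion, tuple):  # inclusive range; LO/HI as +-inf
        low, high = criterion
        return low <= value <= high
    return value == criterion  # user-missing values match like any other


def count(case, target, test_vars, criteria, user_missing=()):
    """Set case[target] to the number of test variables matching."""
    n = sum(1 for var in test_vars
            if any(matches(case[var], c, user_missing) for c in criteria))
    case[target] = float(n)  # never missing


# COUNT A=A B(1) /B=A B(2). applied to one case:
case = {"A": 2.0, "B": 1.0}
count(case, "A", ["A", "B"], [1.0])  # one 1 among A and B: A becomes 1.0
count(case, "B", ["A", "B"], [2.0])  # new A is 1.0, so no 2s: B becomes 0.0
print(case)  # {'A': 1.0, 'B': 0.0}

count(case, "VALID", ["A", "B"], [(-math.inf, math.inf)])  # LO THRU HI
print(case["VALID"])  # 2.0
```

With the old value of `A' (2.0), the second count would have been 1;
using the new value is what makes the order of targets matter.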

The examples below may help to clarify.

  A. Assuming `Q0', `Q1', ..., `Q9' are numeric variables, the
     following commands:

       1. Count the number of times the value 1 occurs across these
          variables for each case and assign the count to variable
          `QCOUNT'.

       2. Print out the total number of times the value 1 occurs
          throughout _all_ cases using `DESCRIPTIVES'. *Note
          DESCRIPTIVES::, for details.

          COUNT QCOUNT=Q0 TO Q9(1).
          DESCRIPTIVES QCOUNT /STATISTICS=SUM.

  B. Given these same variables, the following commands:

       1. Count the number of valid values of these variables for each
          case and assign the count to variable `QVALID'.

       2. Multiply each value of `QVALID' by 10 to obtain a
          percentage of valid values, using `COMPUTE'. *Note
          COMPUTE::, for details.

       3. Print out the percentage of valid values across all cases,
          using `DESCRIPTIVES'. *Note DESCRIPTIVES::, for details.

          COUNT QVALID=Q0 TO Q9 (LO THRU HI).
          COMPUTE QVALID=QVALID*10.
          DESCRIPTIVES QVALID /STATISTICS=MEAN.

File: pspp.info, Node: FLIP, Next: IF, Prev: COUNT, Up: Data Manipulation

     FLIP /VARIABLES=var_list /NEWNAMES=var_name.

The FLIP command transposes rows and columns in the active file. It
causes cases to be swapped with variables, and vice versa.

There are no required subcommands. The VARIABLES subcommand
specifies variables that will be transformed into cases. Variables not
specified are discarded. By default, all variables are selected for
transposition.

The variable specified by NEWNAMES, which must be a string variable,
is used to give names to the variables created by FLIP. If NEWNAMES is
not specified, then the default is a variable named CASE_LBL, if it
exists. If it does not, then the variables created by FLIP are named
VAR000 through VAR999, then VAR1000, VAR1001, and so on.

When a NEWNAMES variable is available, the names must be
canonicalized before becoming variable names. Invalid characters are
replaced by the letter `V' in the first position, or by `_' in
subsequent positions. If the name thus generated is not unique, then
numeric extensions are added, starting with 1, until a unique name is
found or there are no remaining possibilities. If the latter occurs,
then the FLIP operation aborts.

The resultant dictionary contains a CASE_LBL variable, which stores
the names of the variables in the dictionary before the transposition.
If the active file is subsequently transposed using FLIP, this variable
can be used to recreate the original variable names.
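
The canonicalization rule for NEWNAMES values can be sketched in
Python. Several details here are assumptions, not taken from the text
above: which characters count as valid (letters first; letters, digits,
and `_' afterward), truncation to 8 characters, and making room for the
numeric extension by shortening the base name. The names
`canonicalize' and `unique_name' are hypothetical.

```python
import string
from typing import Set

FIRST_OK = set(string.ascii_letters)             # assumption
REST_OK = FIRST_OK | set(string.digits) | {"_"}  # assumption
MAX_LEN = 8                                      # assumption


def canonicalize(raw: str) -> str:
    """Replace invalid characters: 'V' in first position, '_' after it."""
    chars = list(raw[:MAX_LEN]) or ["V"]
    if chars[0] not in FIRST_OK:
        chars[0] = "V"
    for i in range(1, len(chars)):
        if chars[i] not in REST_OK:
            chars[i] = "_"
    return "".join(chars)


def unique_name(raw: str, used: Set[str]) -> str:
    """Add numeric extensions 1, 2, ... until the name is unique."""
    name = canonicalize(raw)
    if name not in used:
        used.add(name)
        return name
    n = 1
    while len(str(n)) < MAX_LEN:
        suffix = str(n)
        candidate = name[:MAX_LEN - len(suffix)] + suffix
        if candidate not in used:
            used.add(candidate)
            return candidate
        n += 1
    raise RuntimeError("no unique name available; FLIP would abort")


used: Set[str] = set()
print([unique_name(s, used) for s in ["1st", "a b", "a b", "AB"]])
# ['Vst', 'a_b', 'a_b1', 'AB']
```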

File: pspp.info, Node: IF, Next: RECODE, Prev: FLIP, Up: Data Manipulation

     Two possible syntaxes:
             IF test_expr target_var=target_expr.
             IF test_expr target_vec(target_index)=target_expr.

The IF transformation conditionally assigns the value of a target
expression to a target variable, based on the truth of a test
expression.

Specify a boolean-valued expression (*note Expressions::) to be
tested following the IF keyword. This expression is calculated for
each case. If the value is true, then the value of target_expr is
computed and assigned to target_var. If the value is false or missing,
nothing is done. Numeric and short and long string variables may be
used. The type of target_expr must match the type of target_var.

For numeric variables only, target_var need not exist before the IF
transformation is executed. In this case, target_var is assigned the
system-missing value if the IF condition is not true. String variables
must be declared before they can be used as targets for IF.

In addition to ordinary variables, the target variable may be an
element of a vector. In this case, the vector index must be specified
in parentheses following the vector name.
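
The three outcomes of the test (true, false, missing) for a numeric
target can be modeled in Python. This is a sketch only: `if_assign` is
a hypothetical name, cases are dicts, a test result of `None` stands
for a missing value, and NaN stands in for system-missing.

```python
import math

SYSMIS = math.nan  # stand-in for the system-missing value (assumption)


def if_assign(cases, test, target, expr):
    """Model IF test target=expr. for a numeric target.

    Only a test result of True triggers the assignment; False or None
    (missing) leaves the target alone. A target that did not exist
    before starts out system-missing.
    """
    for case in cases:
        case.setdefault(target, SYSMIS)
        if test(case) is True:
            case[target] = expr(case)
    return cases


def adult(case):
    # Missing AGE gives a missing test result.
    return None if case["AGE"] is None else case["AGE"] >= 18


result = if_assign([{"AGE": 15}, {"AGE": 30}, {"AGE": None}],
                   adult, "ADULT", lambda c: 1.0)
print([c["ADULT"] for c in result])  # [nan, 1.0, nan]
```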

File: pspp.info, Node: RECODE, Next: SORT CASES, Prev: IF, Up: Data Manipulation

     RECODE var_list (src_value...=dest_value)... [INTO var_list].

     src_value may take the following forms:
     Open-ended ranges may be specified using LO or LOWEST for num1
     or HI or HIGHEST for num2.

     dest_value may take the following forms:

The RECODE command is used to translate data from one range of
values to another, using flexible user-specified mappings. Data may be
remapped in place or copied to new variables. Numeric, short string,
and long string data can be recoded.

Specify the list of source variables, followed by one or more mapping
specifications, each enclosed in parentheses. If the data is to be
copied to new variables, specify INTO, then the list of target
variables. String target variables must already have been declared
using STRING or another transformation, but numeric target variables can
be created on the fly. There must be exactly as many target variables
as source variables. Each source variable is remapped into its
corresponding target variable.

When INTO is not used, the input and output variables must be of the
same type. Otherwise, string values can be recoded into numeric values,
and vice versa. When this is done and there is no mapping for a
particular value, either a value consisting of all spaces or the
system-missing value is assigned, depending on variable type.

Mappings are considered from left to right. The first src_value that
matches the value of the source variable causes the target variable to
receive the value indicated by the dest_value. Literal number, string,
and range src_values should be self-explanatory. MISSING as a
src_value matches any user- or system-missing value. SYSMIS matches the
system-missing value only. ELSE is a catch-all that matches anything.
It should be the last src_value specified.

Numeric and string dest_values should also be self-explanatory.
COPY causes the input values to be copied to the output. This is only
valid if the source and target variables are of the same type. SYSMIS
indicates the system-missing value.

If the source variables are strings and the target variables are
numeric, then there is one additional mapping available: (CONVERT),
which must be the last specified mapping. CONVERT causes a number
specified as a string to be converted to a numeric value. If the string
cannot be parsed as a number, then the system-missing value is assigned.

Multiple recodings can be specified on the same RECODE command.
Introduce additional recodings with a slash (`/') to separate them from
the previous recodings.
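
The first-match rule and the special keywords can be modeled in
Python. This is a sketch, not PSPP's implementation: the sentinel
objects and the functions `_matches` and `recode_value` are
hypothetical, `None` stands for system-missing, ranges are treated as
inclusive, and Python's `float()` stands in for PSPP's number parser
under CONVERT (it accepts some inputs PSPP may not, such as "nan").
What an unmatched value becomes is left to the caller through the
`unmatched` parameter.

```python
class _Keyword:
    """Sentinel so RECODE keywords never collide with data values."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


MISSING = _Keyword("MISSING")
SYSMIS = _Keyword("SYSMIS")
ELSE = _Keyword("ELSE")
COPY = _Keyword("COPY")
CONVERT = _Keyword("CONVERT")


def _matches(src, value, user_missing):
    if src is ELSE:
        return True
    if src is SYSMIS:
        return value is None
    if src is MISSING:
        return value is None or value in user_missing
    if value is None:
        return False
    if isinstance(src, tuple):  # numeric range; LO/HI as +-inf
        low, high = src
        return not isinstance(value, str) and low <= value <= high
    return value == src


def recode_value(value, mappings, user_missing=(), unmatched=None):
    """Try (src_value, dest_value) pairs left to right; first match wins."""
    for src, dest in mappings:
        if src is CONVERT:
            try:
                return float(value)
            except (TypeError, ValueError):
                return None  # unparseable string: system-missing
        if _matches(src, value, user_missing):
            if dest is COPY:
                return value
            if dest is SYSMIS:
                return None
            return dest
    return unmatched


# Roughly RECODE X (1 THRU 3=1) (4 THRU 5=2) (MISSING=9) (ELSE=SYSMIS).
rules = [((1, 3), 1), ((4, 5), 2), (MISSING, 9), (ELSE, SYSMIS)]
print([recode_value(v, rules, user_missing=(99,)) for v in (2, 5, 99, None, 7)])
# [1, 2, 9, 9, None]
print(recode_value(" 42", [(CONVERT, None)]))  # 42.0
```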
936 File: pspp.info, Node: SORT CASES, Prev: RECODE, Up: Data Manipulation
941 SORT CASES BY var_list.
943 SORT CASES sorts the active file by the values of one or more
946 Specify BY and a list of variables to sort by. By default, variables
947 are sorted in ascending order. To override sort order, specify (D) or
948 (DOWN) after a list of variables to get descending order, or (A) or (UP)
949 for ascending order. These apply to the entire list of variables
952 SORT CASES is a procedure. It causes the data to be read.
954 SORT CASES will attempt to sort the entire active file in main
955 memory. If main memory is exhausted then it will use a merge sort
956 algorithm that involves writing and reading numerous temporary files.
957 Environment variables determine the temporary files' location. The
958 first of SPSSTMPDIR, SPSSXTMPDIR, or TMPDIR that is set determines the
959 location. Otherwise, if the compiler environment defined P_tmpdir,
960 that is used. Otherwise, under Unix-like OSes /tmp is used; under
961 MS-DOS, the first of TEMP or TMP that is set is used, or else the root
962 directory of the current drive; under other OSes, the current directory is used.
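The lookup order above can be written as a small Python function. This is a sketch of the documented rules, not PSPP code. The function name `sort_temp_dir`, the `os_family` tags and the idea of passing the compiler's P_tmpdir as an argument are all hypothetical. "Is set" is taken to mean that the variable is present in the environment.

```python
import os


def sort_temp_dir(environ, os_family="unix", p_tmpdir=None):
    """Pick the directory for SORT CASES's temporary files (a sketch).

    environ: a mapping such as os.environ.
    os_family: "unix", "msdos" or anything else (hypothetical tags).
    p_tmpdir: the compiler environment's P_tmpdir, if it defines one.
    """
    for name in ("SPSSTMPDIR", "SPSSXTMPDIR", "TMPDIR"):
        if name in environ:
            return environ[name]
    if p_tmpdir is not None:
        return p_tmpdir
    if os_family == "unix":
        return "/tmp"
    if os_family == "msdos":
        for name in ("TEMP", "TMP"):
            if name in environ:
                return environ[name]
        return "\\"  # root directory of the current drive
    return os.curdir  # the current directory
```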
965 File: pspp.info, Node: Data Selection, Next: Conditionals and Looping, Prev: Data Manipulation, Up: Top
967 Selecting data for analysis
968 ***************************
970 This chapter documents PSPP commands that temporarily or permanently
971 select data records from the active file for analysis.
975 * FILTER:: Exclude cases based on a variable.
976 * N OF CASES:: Limit the size of the active file.
977 * PROCESS IF:: Temporarily exclude cases.
978 * SAMPLE:: Select a specified proportion of cases.
979 * SELECT IF:: Permanently delete selected cases.
980 * SPLIT FILE:: Do multiple analyses with one command.
981 * TEMPORARY:: Make transformations' effects temporary.
982 * WEIGHT:: Weight cases by a variable.
985 File: pspp.info, Node: FILTER, Next: N OF CASES, Prev: Data Selection, Up: Data Selection
993 The FILTER command allows a boolean-valued variable to be used to
994 select cases from the data stream for processing.
996 In order to set up filtering, specify BY and a variable name.
997 Keyword BY is optional but recommended. Cases which have a zero or
998 system- or user-missing value are excluded from analysis, but not
999 deleted from the data stream. Cases with other values are analyzed.
1001 Use FILTER OFF to turn off case filtering.
1003 Filtering takes place immediately before cases pass to a procedure
1004 for analysis. Only one filter variable may be active at once.
1005 Normally, case filtering continues until it is explicitly turned off
1006 with FILTER OFF. However, if FILTER is placed after TEMPORARY, then
1007 filtering stops after execution of the next procedure or procedure-like command.
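The filtering rule can be sketched in Python. This model assumes cases are dicts, `None` represents the system-missing value, and user-missing values are supplied as a set. The name `filtered` is hypothetical. Because it is a generator that skips cases rather than removing them, the underlying data are left intact, which mirrors the point that filtered cases are not deleted from the data stream.

```python
SYSMIS = None  # assumption: the system-missing value is modelled as None


def filtered(cases, filter_var, user_missing=frozenset()):
    """Yield the cases a procedure analyzes under FILTER BY filter_var.

    Excluded cases are skipped, not deleted: the input list is untouched.
    """
    for case in cases:
        value = case[filter_var]
        if value is SYSMIS or value in user_missing or value == 0:
            continue  # zero or missing: excluded from this analysis
        yield case
```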
1011 File: pspp.info, Node: N OF CASES, Next: PROCESS IF, Prev: FILTER, Up: Data Selection
1016 N [OF CASES] num_of_cases [ESTIMATED].
1018 Sometimes you may want to disregard cases of your input. The `N'
1019 command can be used to do this. `N 100' tells PSPP to disregard all
1020 cases after the first 100.
1022 If the value specified for `N' is greater than the number of cases
1023 read in, the value is ignored.
1025 `N' does not discard cases or cause them not to be read in. It just
1026 causes cases beyond the last one specified to be ignored by data analysis procedures.
1029 A later `N' command can increase or decrease the number of cases
1030 selected. (To select all the cases without knowing how many there are,
1031 specify a very high number: 100000 or whatever you think is large enough.)
1034 Transformation procedures performed after `N' is executed _do_ cause
1035 cases to be discarded.
1037 The `SAMPLE', `PROCESS IF', and `SELECT IF' commands have precedence
1038 over `N'--the same results are obtained by both of the following
1039 fragments, given the same random number seeds:
1041 ...set up, read in data...
1042 N 100.
1043 SAMPLE .5.
1046 ...set up, read in data...
1047 SAMPLE .5.
1048 N 100.
1051 Both fragments above first randomly sample approximately half of the
1052 cases, then select the first 100 of those sampled.
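The precedence rule can be sketched in Python. In this model, N is only recorded when it is encountered and is applied after all sampling, so both command orders give identical results for the same seed. The function `selected_cases` is hypothetical. Modelling SAMPLE with a proportion as an independent random draw for each case is an assumption.

```python
import random


def selected_cases(cases, commands, seed):
    """commands: ("SAMPLE", proportion) or ("N", count) tuples in syntax order."""
    rng = random.Random(seed)  # same seed, same sequence of random draws
    limit = None
    selected = list(cases)
    for name, arg in commands:
        if name == "N":
            limit = arg  # deferred; a later N replaces an earlier one
        elif name == "SAMPLE":
            selected = [c for c in selected if rng.random() < arg]
        else:
            raise ValueError(f"unsupported command {name!r}")
    # N is applied last, to whatever the sampling left selected.
    return selected if limit is None else selected[:limit]
```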
1054 `N' with the `ESTIMATED' keyword can be used to give an estimated
1055 number of cases before DATA LIST or another command to read in data.
1056 (`ESTIMATED' never limits the number of cases processed by procedures.)
1059 File: pspp.info, Node: PROCESS IF, Next: SAMPLE, Prev: N OF CASES, Up: Data Selection
1064 PROCESS IF expression.
1066 The PROCESS IF command is used to temporarily eliminate cases from
1067 the data stream. Its effects are active only through the execution of
1068 the next procedure or procedure-like command.
1070 Specify a boolean expression (*note Expressions::). If the value of
1071 the expression is true for a particular case, the case will be
1072 analyzed. If the expression has a false or missing value, then the
1073 case will be deleted from the data stream for this procedure only.
1075 Regardless of its placement relative to other commands, PROCESS IF
1076 always takes effect immediately before data passes to the procedure.
1077 Only one PROCESS IF command may be in effect at any given time.
1079 The effects of PROCESS IF are similar, but not identical, to the effects of
1080 executing TEMPORARY then SELECT IF (*note SELECT IF::).
1082 Use of PROCESS IF is deprecated. It is included for compatibility
1083 with old command files. New syntax files should use SELECT IF or FILTER instead.
1087 File: pspp.info, Node: SAMPLE, Next: SELECT IF, Prev: PROCESS IF, Up: Data Selection
1092 SAMPLE num1 [FROM num2].
1094 `SAMPLE' is used to randomly sample a proportion of the cases in the
1095 active file. `SAMPLE' is temporary, affecting only the next procedure,
1096 unless that is a data transformation, such as `SELECT IF' or `RECODE'.
1098 The proportion to sample can be expressed as a single number between
1099 0 and 1. If `k' is the number specified, and `N' is the number of
1100 currently-selected cases in the active file, then after `SAMPLE k.',
1101 there will be `k*N', plus or minus one, cases selected.
1103 The proportion to sample can also be specified in the style `SAMPLE
1104 M FROM N'. With this style, cases are selected as follows:
1106 1. If N is equal to the number of currently-selected cases in the
1107 active file, exactly M cases will be selected.
1109 2. If N is greater than the number of currently-selected cases in the
1110 active file, an equivalent proportion of cases will be selected.
1112 3. If N is less than the number of currently-selected cases in the
1113 active file, exactly M cases will be selected _from the first N cases
1114 in the active file._
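The three rules can be sketched in Python. For rules 1 and 3, this sketch uses selection sampling (Knuth's Algorithm S), which picks exactly M cases from the first N in a single pass. The text does not say which method PSPP uses, so treat that choice as an assumption. The function name `sample_from` and the per-case draws for rule 2 are also assumptions.

```python
import random


def sample_from(n_cases, m, n, rng):
    """Indices selected by SAMPLE m FROM n among n_cases currently-selected cases."""
    if not 0 <= m <= n:
        raise ValueError("this sketch assumes 0 <= m <= n")
    if n > n_cases:
        # Rule 2: an equivalent proportion m/n of all cases, drawn independently
        # for each case, so the count is only approximately proportional.
        return [i for i in range(n_cases) if rng.random() < m / n]
    # Rules 1 and 3: exactly m of the first n cases, by selection sampling.
    # Cases after the first n are never selected.
    chosen = []
    for i in range(n):
        still_needed = m - len(chosen)
        still_available = n - i
        # Select with probability still_needed / still_available.
        if rng.random() * still_available < still_needed:
            chosen.append(i)
    return chosen
```

Passing a `random.Random` created with a fixed seed reproduces the same sample, which is the role the `SET' random number seed plays in PSPP.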
1116 `SAMPLE', `SELECT IF', and `PROCESS IF' are performed in the order
1117 specified by the syntax file.
1119 `SAMPLE' is ignored before `SORT CASES'.
1121 `SAMPLE' is always performed before `N OF CASES', regardless of
1122 ordering in the syntax file. *Note N OF CASES::.
1124 The same values for `SAMPLE' may result in different samples. To
1125 obtain the same sample, use the `SET' command to set the random number
1126 seed to the same value before each `SAMPLE'. By default, the random
1127 number seed is based on the system time.
1130 File: pspp.info, Node: SELECT IF, Next: SPLIT FILE, Prev: SAMPLE, Up: Data Selection
1135 SELECT IF expression.
1137 The SELECT IF command is used to select particular cases for analysis
1138 based on the value of a boolean expression. Cases not selected are
1139 permanently eliminated, unless TEMPORARY is in effect (*note TEMPORARY::).
1142 Specify a boolean expression (*note Expressions::). If the value of
1143 the expression is true for a particular case, the case will be
1144 analyzed. If the expression has a false or missing value, then the
1145 case will be deleted from the data stream.
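Here is a minimal Python sketch of the selection rule. It assumes `None` stands for a missing value and that an expression returns `True`, `False` or `None`. The names `select_if` and `adult` are hypothetical. Testing `is True`, rather than truthiness, is what makes a missing result drop the case.

```python
SYSMIS = None  # assumption: missing values are modelled as None


def adult(case):
    """The expression `age >= 18`, with a missing operand giving a missing result."""
    if case["age"] is SYSMIS:
        return None
    return case["age"] >= 18


def select_if(cases, expression):
    """Keep cases whose expression is true; false or missing deletes the case."""
    return [case for case in cases if expression(case) is True]
```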
1147 Always place SELECT IF commands as early in the command file as
1148 possible. Cases that are deleted early can be processed more
1149 efficiently in time and space.
1152 File: pspp.info, Node: SPLIT FILE, Next: TEMPORARY, Prev: SELECT IF, Up: Data Selection
1157 Two possible syntaxes:
1158 SPLIT FILE BY var_list.
1159 SPLIT FILE OFF.
1161 The SPLIT FILE command allows multiple sets of data present in one
1162 data file to be analyzed separately using single statistical procedure commands.
1165 Specify a list of variable names in order to analyze multiple sets of
1166 data separately. Groups of cases having the same values for these
1167 variables are analyzed by statistical procedure commands as one group.
1168 An independent analysis is carried out for each group of cases, and the
1169 variable values for the group are printed along with the analysis.
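The following Python sketch shows per-group analysis using `itertools.groupby`. `groupby` only groups adjacent cases, so this sketch assumes the active file has already been sorted by the split variables, for example with SORT CASES. The names `split_file` and `group_means` are hypothetical, and a mean stands in for a real statistical procedure.

```python
from itertools import groupby
from operator import itemgetter


def split_file(cases, split_vars):
    """Yield (split values, cases) for each run of cases sharing those values."""
    key = itemgetter(*split_vars)  # a scalar for one variable, a tuple for several
    for values, group in groupby(cases, key=key):
        yield values, list(group)


def group_means(cases, split_vars, var):
    """Run an independent 'analysis' (here, a mean) for each split group."""
    return [(values, sum(c[var] for c in group) / len(group))
            for values, group in split_file(cases, split_vars)]
```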
1171 Specify OFF in order to disable SPLIT FILE and resume analysis of the
1172 entire active file as a single group of data.
1175 File: pspp.info, Node: TEMPORARY, Next: WEIGHT, Prev: SPLIT FILE, Up: Data Selection
1182 The TEMPORARY command is used to make the effects of transformations
1183 following its execution temporary. These transformations will affect
1184 only the execution of the next procedure or procedure-like command.
1185 Their effects will not be saved to the active file.
1187 The only specification is the command name.
1189 TEMPORARY may not appear within a DO IF or LOOP construct. It may
1190 appear only once between procedures and procedure-like commands.
1192 An example may help to clarify:
1209 The data read by the first DESCRIPTIVES command are 4, 5, 8, 10.5,
1210 13, 15. The data read by the second DESCRIPTIVES command are 1, 2, 5, 7.5, 10, 12.
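The toy Python model below makes the difference between permanent and temporary transformations concrete. The class `ActiveFile` and its methods are hypothetical. Transformations issued before `temporary()` are saved to the data when the next procedure runs. Transformations issued after it are applied only to the copy that procedure reads. The sample values are illustrative.

```python
class ActiveFile:
    """Toy model of TEMPORARY (hypothetical; not PSPP's data structures)."""

    def __init__(self, values):
        self.values = list(values)
        self.permanent_fns = []   # results are saved to the active file
        self.temporary_fns = []   # seen by the next procedure only
        self.after_temporary = False

    def temporary(self):
        self.after_temporary = True

    def compute(self, fn):
        target = self.temporary_fns if self.after_temporary else self.permanent_fns
        target.append(fn)

    def procedure(self):
        """Run a procedure and return the values it reads."""
        for fn in self.permanent_fns:
            self.values = [fn(v) for v in self.values]  # kept afterwards
        seen = list(self.values)
        for fn in self.temporary_fns:
            seen = [fn(v) for v in seen]                # discarded afterwards
        self.permanent_fns, self.temporary_fns = [], []
        self.after_temporary = False
        return seen


af = ActiveFile([2, 4, 10])
af.compute(lambda x: x / 2)   # permanent transformation
af.temporary()
af.compute(lambda x: x + 3)   # temporary transformation
first = af.procedure()        # reads the halved values plus 3
second = af.procedure()       # reads only the halved values
```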
1214 File: pspp.info, Node: WEIGHT, Prev: TEMPORARY, Up: Data Selection
1222 WEIGHT can be used to assign cases varying weights in order to
1223 change the frequency distribution of the active file. Execution of
1224 WEIGHT is delayed until data have been read in.
1226 If a variable name is specified, WEIGHT causes the values of that
1227 variable to be used as weighting factors for subsequent statistical
1228 procedures. Use of keyword BY is optional but recommended. Weighting
1229 variables must be numeric. Scratch variables may not be used for
1230 weighting (*note Scratch Variables::).
1232 When OFF is specified, subsequent statistical procedures will weight all cases equally.
1235 Weighting values do not need to be integers. However, negative and
1236 system- and user-missing values for the weighting variable are
1237 interpreted as weighting factors of 0.
1239 WEIGHT does not cause cases in the active file to be replicated in memory.
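The following Python sketch shows how weights might enter a frequency count. It assumes dict cases and `None` for system-missing, and `weighted_frequencies` is a hypothetical name. Each case adds its weight to a running total rather than being copied, which is the sense in which WEIGHT does not replicate cases.

```python
SYSMIS = None  # assumption: the system-missing value is modelled as None


def weighted_frequencies(cases, var, weight_var=None, user_missing=frozenset()):
    """Frequency table of var, weighting each case instead of copying it."""
    table = {}
    for case in cases:
        if weight_var is None:
            weight = 1.0  # WEIGHT OFF: each case counts once
        else:
            weight = case[weight_var]
            if weight is SYSMIS or weight in user_missing or weight < 0:
                weight = 0.0  # negative or missing weights count as 0
        table[case[var]] = table.get(case[var], 0.0) + weight
    return table
```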
1243 File: pspp.info, Node: Conditionals and Looping, Next: Statistics, Prev: Data Selection, Up: Top
1245 Conditional and Looping Constructs
1246 **********************************
1248 This chapter documents PSPP commands used for conditional execution,
1249 looping, and flow of control.
1253 * BREAK:: Exit a loop.
1254 * DO IF:: Conditionally execute a block of code.
1255 * DO REPEAT:: Textually repeat a code block.
1256 * LOOP:: Repeat a block of code.
1259 File: pspp.info, Node: BREAK, Next: DO IF, Prev: Conditionals and Looping, Up: Conditionals and Looping
1266 BREAK terminates execution of the innermost currently executing LOOP construct.
1269 BREAK is allowed only inside a LOOP construct. *Note LOOP::, for more details.
1273 File: pspp.info, Node: DO IF, Next: DO REPEAT, Prev: BREAK, Up: Conditionals and Looping
1287 The DO IF command allows one of several sets of transformations to be
1288 executed, depending on user-specified conditions.
1290 Specify a boolean expression. If the condition is true, then the
1291 block of code following DO IF is executed. If the condition is
1292 missing, then none of the code blocks is executed. If the condition is
1293 false, then the boolean expression on the first ELSE IF, if present,
1294 is tested in turn, with the same rules applied. If all expressions
1295 evaluate to false, then the ELSE code block is executed, if it is present.
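The evaluation order can be modelled in Python. The names `do_if` and `classify` are hypothetical, conditions return `True`, `False` or `None` (missing), and code blocks are plain functions. A missing result at any step ends the construct without running any block, including ELSE.

```python
SYSMIS = None  # assumption: missing values are modelled as None


def do_if(case, branches, else_block=None):
    """branches: (condition, block) pairs for DO IF and each ELSE IF, in order."""
    for condition, block in branches:
        result = condition(case)
        if result is None:
            return  # missing condition: no block runs, not even ELSE
        if result:
            block(case)
            return  # the first true condition wins
    if else_block is not None:
        else_block(case)  # every condition was false


def classify(x):
    """DO IF x > 10 / ELSE IF x > 0 / ELSE, recording which block ran."""
    case = {"x": x, "label": None}

    def greater_than(limit):
        return lambda c: None if c["x"] is SYSMIS else c["x"] > limit

    def set_label(text):
        return lambda c: c.update(label=text)

    do_if(case, [(greater_than(10), set_label("big")),
                 (greater_than(0), set_label("positive"))],
          set_label("other"))
    return case["label"]
```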
1299 File: pspp.info, Node: DO REPEAT, Next: LOOP, Prev: DO IF, Up: Conditionals and Looping
1304 DO REPEAT repvar_name=expansion....
1308 expansion takes one of the following forms:
1313 num_or_range takes one of the following forms:
1317 The DO REPEAT command causes a block of code to be repeated a number
1318 of times with different variables, numbers, or strings textually
1319 substituted into the block with each repetition.
1321 Specify a repeat variable name followed by an equals sign (`=') and
1322 the list of replacements. Replacements can be a list of variables
1323 (which may be existing variables or new variables or a combination
1324 thereof), of numbers, or of strings. When new variable names are
1325 specified, DO REPEAT creates them as numeric variables. When numbers
1326 are specified, runs of integers may be indicated with TO notation, for
1327 instance `1 TO 5' and `1 2 3 4 5' would be equivalent. There is no
1328 equivalent notation for string values.
1330 Multiple repeat variables can be specified. When this is done, each
1331 variable must have the same number of replacements.
1333 The code within DO REPEAT is repeated as many times as there are
1334 replacements for each variable. The first time, the first value for
1335 each repeat variable is substituted; the second time, the second value
1336 for each repeat variable is substituted; and so on.
1338 Repeat variable substitutions work like macros. They take place
1339 wherever the repeat variable name occurs as a token in a line,
1340 including command and subcommand names. For this reason it is not a
1341 good idea to select words commonly used in command and subcommand names
1342 as repeat variable identifiers.
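The following rough Python model shows the textual substitution, including TO expansion for numbers. It makes three simplifying assumptions: tokens are runs of word characters, repeat variable names are matched case-insensitively, and the names `expand_numbers` and `do_repeat` are hypothetical. The final line of the example shows the hazard described above. A repeat variable named like a command replaces the command name itself.

```python
import re


def expand_numbers(spec):
    """Expand integer runs such as "1 TO 3 7" into ["1", "2", "3", "7"]."""
    tokens, out, i = spec.split(), [], 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1].upper() == "TO":
            out += [str(k) for k in range(int(tokens[i]), int(tokens[i + 2]) + 1)]
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def do_repeat(body, repeat_vars):
    """Return the lines produced by substituting each repeat variable in turn."""
    counts = {len(values) for values in repeat_vars.values()}
    if len(counts) != 1:
        raise ValueError("every repeat variable needs the same number of replacements")
    lowered = {name.upper(): values for name, values in repeat_vars.items()}
    lines = []
    for i in range(counts.pop()):
        def substitute(match):
            values = lowered.get(match.group(0).upper())
            return match.group(0) if values is None else values[i]
        # Substitution is purely textual, token by token, like a macro.
        lines.append(re.sub(r"\w+", substitute, body))
    return lines


hazard = do_repeat("COMPUTE y = 1.", {"compute": ["LIST"]})  # ["LIST y = 1."]
```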
1344 If PRINT is specified on END REPEAT, the commands are printed to the listing
1345 file after substitutions are made, each prefixed by a plus sign (`+').
1348 File: pspp.info, Node: LOOP, Prev: DO REPEAT, Up: Conditionals and Looping
1353 LOOP [index_var=start TO end [BY incr]] [IF condition].
1355 END LOOP [IF condition].
1357 The LOOP command allows a group of commands to be iterated. A
1358 number of termination options are offered.
1360 Specify index_var in order to make that variable count from one
1361 value to another by a particular increment. index_var must be a
1362 pre-existing numeric variable. start, end, and incr are numeric
1363 expressions (*note Expressions::.)
1365 During the first iteration, index_var is set to the value of start.
1366 During each successive iteration, index_var is increased by the value of
1367 incr. If end > start, then the loop terminates when index_var > end;
1368 otherwise it terminates when index_var < end. If incr is not specified
1369 then it defaults to +1 or -1 as appropriate.
1371 If end > start and incr < 0, or if end < start and incr > 0, then the
1372 loop is never executed. index_var is nevertheless set to the value of start.
1375 Modifying index_var within the loop is allowed, but it has no effect
1376 on the value of index_var in the next iteration.
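The following Python sketch models the index clause rules. One assumption goes beyond the text: the termination test follows the sign of incr. The loop stops when the index exceeds end for a positive increment, or falls below end for a negative one. This agrees with the rules above and also covers start equal to end. The function name `loop_index_values` is hypothetical, and a zero increment is rejected.

```python
def loop_index_values(start, end, incr=None):
    """Return the index_var value seen by each iteration of LOOP ... TO ... BY."""
    if incr is None:
        incr = 1 if end >= start else -1  # "+1 or -1 as appropriate"
    if incr == 0:
        raise ValueError("this sketch rejects a zero increment")
    values = []
    counter = start  # index_var is set to start even if the body never runs
    # The loop advances its own counter, so an assignment to index_var inside
    # the body (not modelled here) would not change the next iteration's value.
    while (counter <= end) if incr > 0 else (counter >= end):
        values.append(counter)
        counter += incr
    return values
```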
1378 Specify a boolean expression for the condition on the LOOP command to
1379 cause the loop to be executed only if the condition is true. If the
1380 condition is false or missing before the loop contents are executed the
1381 first time, the loop contents are not executed at all.
1383 If index and condition clauses are both present on LOOP, the index
1384 clause is always evaluated first.
1386 Specify a boolean expression for the condition on the END LOOP to
1387 cause the loop to terminate if the condition is true after the
1388 enclosed code block is executed. The condition is evaluated at the end
1389 of the loop, not at the beginning.
1391 If there is no index clause and neither condition clause is present, then
1392 the loop is executed MXLOOPS (*note SET::) times or until BREAK (*note
1393 BREAK::) is executed.
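The condition clauses, MXLOOPS and BREAK can be combined into one toy Python model. The names `run_loop`, `BREAK` and `count_to` are hypothetical. The sketch makes three assumptions: the LOOP IF test is repeated before every pass, a missing END LOOP IF result does not end the loop, and MXLOOPS is passed explicitly rather than read from SET (40 below is only an example value).

```python
BREAK = "BREAK"  # hypothetical value a body returns to model the BREAK command


def run_loop(state, body, mxloops, loop_if=None, end_loop_if=None):
    """Toy model of LOOP IF, END LOOP IF, MXLOOPS and BREAK; returns pass count."""
    passes = 0
    unbounded = loop_if is None and end_loop_if is None
    while True:
        if unbounded and passes >= mxloops:
            break  # no index or condition clause: stop after MXLOOPS passes
        if loop_if is not None and loop_if(state) is not True:
            break  # LOOP IF: false or missing before a pass ends the loop
        passes += 1
        if body(state) == BREAK:
            break  # BREAK leaves the innermost loop immediately
        if end_loop_if is not None and end_loop_if(state) is True:
            break  # END LOOP IF: tested after the body has run
    return passes


def count_to(limit, mxloops=40):
    """A loop with no index or condition clause that BREAKs when x reaches limit."""
    state = {"x": 0}

    def body(st):
        st["x"] += 1
        return BREAK if st["x"] == limit else None

    return run_loop(state, body, mxloops)
```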
1395 The BREAK command provides another way to terminate execution of a LOOP construct.
1399 File: pspp.info, Node: Statistics, Next: Utilities, Prev: Conditionals and Looping, Up: Top
1404 This chapter documents the statistical procedures that PSPP supports.
1409 * DESCRIPTIVES:: Descriptive statistics.
1410 * FREQUENCIES:: Frequency tables.
1411 * CROSSTABS:: Crosstabulation tables.