+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
@c (modify-syntax-entry ?_ "w")
@c (modify-syntax-entry ?' "'")
@c (modify-syntax-entry ?@ "'")
@cindex cases
@cindex observations
-Data are the focus of the @pspp{} language.
+Data are the focus of the @pspp{} language.
Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
Each case represents an individual or ``experimental unit''.
For example, in the results of a survey, the names of the respondents,
* INPUT PROGRAM:: Support for complex input programs.
* LIST:: List cases in the active dataset.
* NEW FILE:: Clear the active dataset.
-* MATRIX DATA:: Defining matrix material for procedures.
* PRINT:: Display values in print formats.
* PRINT EJECT:: Eject the current page then print.
* PRINT SPACE:: Print blank lines.
``empty,'' that is, it has no dictionary or data. If a dataset with
the given name already exists, this has no effect. The new dataset
can be used with commands that support output to a dataset,
-e.g. AGGREGATE (@pxref{AGGREGATE}).
+@i{e.g.}@: @cmd{AGGREGATE} (@pxref{AGGREGATE}).
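+
+For instance, the following sketch declares a dataset, directs
+@cmd{AGGREGATE} output to it, and then makes it active. The dataset
+name @samp{summary} and the break variable @samp{GROUP} are
+hypothetical and are used only for illustration.
+
+@example
+DATASET DECLARE summary.
+AGGREGATE OUTFILE=summary /BREAK=GROUP /COUNT=N.
+DATASET ACTIVATE summary.
+LIST.
+@end example
+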
@vindex DATASET CLOSE
The DATASET CLOSE command deletes a dataset. If the active dataset is
that contains variable names, for example.
@cmd{DATA LIST} can optionally output a table describing how the data file
-will be read. The @subcmd{TABLE} subcommand enables this output, and
+is read. The @subcmd{TABLE} subcommand enables this output, and
@subcmd{NOTABLE} disables it. The default is to output the table.
The list of variables to be read from the data list must come last.
In columnar style, to use a variable format other than the default,
specify the format type in parentheses after the column numbers. For
-instance, for alphanumeric @samp{A} format, use @samp{(A)}.
+instance, for alphanumeric @samp{A} format, use @samp{(A)}.
In addition, implied decimal places can be specified in parentheses
after the column numbers. As an example, suppose that a data file has a
leaves the active column immediately after the ending column
specified. Record motion using @code{NEWREC} in FORTRAN style also
applies to later FORTRAN and columnar specifiers.
-
+
@menu
* DATA LIST FIXED Examples:: Examples of DATA LIST FIXED.
@end menu
[(@{TAB,'@var{c}'@}, @dots{})]
[@{NOTABLE,TABLE@}]
[FILE='@var{file_name}' [ENCODING='@var{encoding}']]
- [SKIP=@var{record_cnt}]
+ [SKIP=@var{n_records}]
/@var{var_spec}@dots{}
where each @var{var_spec} takes one of the forms
This list must be introduced by a single slash (@samp{/}). The set of
variable names may contain format specifications in parentheses
(@pxref{Input and Output Formats}). Format specifications apply to all
-variables back to the previous parenthesized format specification.
+variables back to the previous parenthesized format specification.
In addition, an asterisk may be used to indicate that all variables
preceding it are to have input/output format @samp{F8.0}.
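+
+As a sketch of these rules, using hypothetical variable names, the
+following command reads @samp{X} and @samp{Y} in the default
+@samp{F8.0} format (because of the asterisk), @samp{NAME} as a
+10-character string, and both @samp{HEIGHT} and @samp{WEIGHT} in
+@samp{F5.1} format:
+
+@example
+DATA LIST FREE NOTABLE /X Y * NAME (A10) HEIGHT WEIGHT (F5.1).
+BEGIN DATA.
+1 2 alice 160.5 55.0
+3 4 bob 175.0 70.2
+END DATA.
+LIST.
+@end example
+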
for ENCODING on the INSERT command are supported (@pxref{INSERT}).
For reading in other file-based modes, encoding autodetection is not
supported; if the specified encoding requests autodetection then the
-default encoding will be used. This is also true when a file handle
+default encoding is used. This is also true when a file handle
is used for writing a file in any mode.
@node INPUT PROGRAM
@cmd{INPUT PROGRAM} must contain at least one @cmd{DATA LIST} or
@cmd{END FILE} command.
-All this is very confusing. A few examples should help to clarify.
+@subheading Example 1: Read two files in parallel to the end of the shorter
+
+The following example reads variable X from file @file{a.txt} and
+variable Y from file @file{b.txt}. If one file is shorter than the
+other then the extra data in the longer file is ignored.
-@c If you change this example, change the regression test1 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- DATA LIST NOTABLE FILE='a.data'/X 1-10.
- DATA LIST NOTABLE FILE='b.data'/Y 1-10.
+ DATA LIST NOTABLE FILE='a.txt'/X 1-10.
+ DATA LIST NOTABLE FILE='b.txt'/Y 1-10.
END INPUT PROGRAM.
LIST.
@end example
-The example above reads variable X from file @file{a.data} and variable
-Y from file @file{b.data}. If one file is shorter than the other then
-the extra data in the longer file is ignored.
+@subheading Example 2: Read two files in parallel, supplementing the shorter
+
+The following example also reads variable X from @file{a.txt} and
+variable Y from @file{b.txt}. If one file is shorter than the other
+then reading continues to the end of the longer file, with the
+variable from the shorter file set to system-missing.
-@c If you change this example, change the regression test2 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- NUMERIC #A #B.
-
- DO IF NOT #A.
- DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
- END IF.
- DO IF NOT #B.
- DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
- END IF.
- DO IF #A AND #B.
- END FILE.
- END IF.
- END CASE.
+ NUMERIC #A #B.
+
+ DO IF NOT #A.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ END IF.
+ DO IF NOT #B.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10.
+ END IF.
+ DO IF #A AND #B.
+ END FILE.
+ END IF.
+ END CASE.
END INPUT PROGRAM.
LIST.
@end example
-The above example reads variable X from @file{a.data} and variable Y from
-@file{b.data}. If one file is shorter than the other then the missing
-field is set to the system-missing value alongside the present value for
-the remaining length of the longer file.
+@subheading Example 3: Concatenate two files (version 1)
+
+The following example reads data from file @file{a.txt}, then from
+@file{b.txt}, and concatenates them into a single active dataset.
-@c If you change this example, change the regression test3 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- NUMERIC #A #B.
-
- DO IF #A.
- DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
- DO IF #B.
- END FILE.
- ELSE.
- END CASE.
- END IF.
+ NUMERIC #A #B.
+
+ DO IF #A.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10.
+ DO IF #B.
+ END FILE.
ELSE.
- DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
- DO IF NOT #A.
- END CASE.
- END IF.
+ END CASE.
+ END IF.
+ ELSE.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ DO IF NOT #A.
+ END CASE.
END IF.
+ END IF.
END INPUT PROGRAM.
LIST.
@end example
-The above example reads data from file @file{a.data}, then from
-@file{b.data}, and concatenates them into a single active dataset.
+@subheading Example 4: Concatenate two files (version 2)
+
+This is another way to do the same thing as Example 3.
-@c If you change this example, change the regression test4 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- NUMERIC #EOF.
-
- LOOP IF NOT #EOF.
- DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
- DO IF NOT #EOF.
- END CASE.
- END IF.
- END LOOP.
-
- COMPUTE #EOF = 0.
- LOOP IF NOT #EOF.
- DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
- DO IF NOT #EOF.
- END CASE.
- END IF.
- END LOOP.
+ NUMERIC #EOF.
- END FILE.
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ COMPUTE #EOF = 0.
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ END FILE.
END INPUT PROGRAM.
LIST.
@end example
-The above example does the same thing as the previous example, in a
-different way.
+@subheading Example 5: Generate random variates
+
+The following example creates a dataset that consists of 50 random
+variates between 0 and 10.
-@c If you change this example, make similar changes to the regression
-@c test5 in tests/command/input-program.sh.
@example
INPUT PROGRAM.
- LOOP #I=1 TO 50.
- COMPUTE X=UNIFORM(10).
- END CASE.
- END LOOP.
- END FILE.
+ LOOP #I=1 TO 50.
+ COMPUTE X=UNIFORM(10).
+ END CASE.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.
-LIST/FORMAT=NUMBERED.
+LIST /FORMAT=NUMBERED.
@end example
-The above example causes an active dataset to be created consisting of 50
-random variates between 0 and 10.
-
@node LIST
@section LIST
@vindex LIST
@cmd{NEW FILE} command clears the dictionary and data from the current
active dataset.
-@node MATRIX DATA
-@section MATRIX DATA
-@vindex MATRIX DATA
-
-@display
-MATRIX DATA
- VARIABLES = @var{columns}
- [FILE='@var{file_name}'| INLINE @}
- [/FORMAT= [@{LIST | FREE@}]
- [@{UPPER | LOWER | FULL@}]
- [@{DIAGONAL | NODIAGONAL@}]]
- [/N= @var{n}]
- [/SPLIT= @var{split_variables}].
-@end display
-
-The @cmd{MATRIX DATA} command is used to input data in the form of matrices
-which can subsequently be used by other commands. If the
-@subcmd{FILE} is omitted or takes the value @samp{INLINE} then the command
-should immediately followed by @cmd{BEGIN DATA}, @xref{BEGIN DATA}.
-
-There is one mandatory subcommand, @i{viz:} @subcmd{VARIABLES}, which defines
-the @var{columns} of the matrix.
-Normally, the @var{columns} should include an item called @samp{ROWTYPE_}.
-The @samp{ROWTYPE_} column is used to specify the purpose of a row in the
-matrix.
-
-@example
-matrix data
- variables = rowtype_ var01 TO var08.
-
-begin data.
-mean 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
-sd 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
-n 92 92 92 92 92 92 92 92
-corr 1.00
-corr .18 1.00
-corr -.22 -.17 1.00
-corr .36 .31 -.14 1.00
-corr .27 .16 -.12 .22 1.00
-corr .33 .15 -.17 .24 .21 1.00
-corr .50 .29 -.20 .32 .12 .38 1.00
-corr .17 .29 -.05 .20 .27 .20 .04 1.00
-end data.
-@end example
-
-In the above example, the first three rows have ROWTYPE_ values of
-@samp{mean}, @samp{sd}, and @samp{n}. These indicate that the rows
-contain mean values, standard deviations and counts, respectively.
-All subsequent rows have a ROWTYPE_ of @samp{corr} which indicates
-that the values are correlation coefficients.
-
-Note that in this example, the upper right values of the @samp{corr}
-values are blank, and in each case, the rightmost value is unity.
-This is because, the
-@subcmd{FORMAT} subcommand defaults to @samp{LOWER DIAGONAL},
-which indicates that only the lower triangle is provided in the data.
-The opposite triangle is automatically inferred. One could instead
-specify the upper triangle as follows:
-
-
-@example
-matrix data
- variables = rowtype_ var01 TO var08
- /format = upper nodiagonal.
-
-begin data.
-mean 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
-sd 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
-n 92 92 92 92 92 92 92 92
-corr .17 .50 -.33 .27 .36 -.22 .18
-corr .29 .29 -.20 .32 .12 .38
-corr .05 .20 -.15 .16 .21
-corr .20 .32 -.17 .12
-corr .27 .12 -.24
-corr -.20 -.38
-corr .04
-end data.
-@end example
-
-In this example the @samp{NODIAGONAL} keyword is used. Accordingly
-the diagonal values of the matrix are omitted. This implies that
-there is one less @samp{corr} line than there are variables.
-If the @samp{FULL} option is passed to the @subcmd{FORMAT} subcommand,
-then all the matrix elements must be provided, including the diagonal
-elements.
-
-In the preceding examples, each matrix row has been specified on a
-single line. If you pass the keyword @var{FREE} to @subcmd{FORMAT}
-then the data may be data for several matrix rows may be specified on
-the same line, or a single row may be split across lines.
-
-The @subcmd{N} subcommand may be used to specify the number
-of valid cases for each variable. It should not be used if the
-data contains a record whose ROWTYPE_ column is @samp{N} or @samp{N_VECTOR}.
-It implies a @samp{N} record whose values are all @var{n}.
-That is to say,
-@example
-matrix data
- variables = rowtype_ var01 TO var04
- /format = upper nodiagonal
- /n = 99.
-begin data
-mean 34 35 36 37
-sd 22 11 55 66
-corr 9 8 7
-corr 6 5
-corr 4
-end data.
-@end example
-produces an effect identical to
-@example
-matrix data
- variables = rowtype_ var01 TO var04
- /format = upper nodiagonal
-begin data
-n 99 99 99 99
-mean 34 35 36 37
-sd 22 11 55 66
-corr 9 8 7
-corr 6 5
-corr 4
-end data.
-@end example
-
-
-The @subcmd{SPLIT} is used to indicate that variables are to be
-considered as split variables. For example, the following
-defines two matrices using the variable @samp{S1} to distinguish
-between them.
-
-@example
-matrix data
- variables = s1 rowtype_ var01 TO var04
- /split = s1
- /format = full diagonal.
-
-begin data
-0 mean 34 35 36 37
-0 sd 22 11 55 66
-0 n 99 98 99 92
-0 corr 1 9 8 7
-0 corr 9 1 6 5
-0 corr 8 6 1 4
-0 corr 7 5 4 1
-1 mean 44 45 34 39
-1 sd 23 15 51 46
-1 n 98 34 87 23
-1 corr 1 2 3 4
-1 corr 2 1 5 6
-1 corr 3 5 1 7
-1 corr 4 6 7 1
-end data.
-@end example
-
@node PRINT
@section PRINT
@vindex PRINT
@display
-PRINT
+PRINT
[OUTFILE='@var{file_name}']
[RECORDS=@var{n_lines}]
[@{NOTABLE,TABLE@}]
The @subcmd{OUTFILE} subcommand specifies the file to receive the output. The
file may be a file name as a string or a file handle (@pxref{File
-Handles}). If @subcmd{OUTFILE} is not present then output will be sent to
-@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, a space is
-inserted at beginning of each output line, even lines that otherwise
-would be blank.
+Handles}). If @subcmd{OUTFILE} is not present then output is sent to
+@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, the
+output is written to @var{file_name} in a plain text format, with a
+space inserted at the beginning of each output line, even lines that
+otherwise would be blank.
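+
+For example, the following sketch uses the hypothetical file name
+@file{print.txt} and variables @samp{X} and @samp{Y}. It writes one
+line per case to @file{print.txt} instead of the listing file.
+Because @cmd{PRINT} is a transformation, a procedure or
+@cmd{EXECUTE} is needed to make it run.
+
+@example
+DATA LIST LIST NOTABLE /X Y.
+BEGIN DATA.
+1 2
+3 4
+END DATA.
+PRINT OUTFILE='print.txt' /X Y.
+EXECUTE.
+@end example
+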
The @subcmd{ENCODING} subcommand may only be used if the
@subcmd{OUTFILE} subcommand is also used. It specifies the character
Introduce the strings and variables to be printed with a slash
(@samp{/}). Optionally, the slash may be followed by a number
-indicating which output line will be specified. In the absence of this
-line number, the next line number will be specified. Multiple lines may
+indicating which output line is specified. In the absence of this
+line number, the next line number is specified. Multiple lines may
be specified using multiple slashes with the intended output for a line
following its respective slash.
Literal strings may be printed. Specify the string itself.
Optionally the string may be followed by a column number, specifying
the column on the line where the string should start. Otherwise, the
-string will be printed at the current position on the line.
+string is printed at the current position on the line.
Variables to be printed can be specified in the same ways as available
for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}). In addition, a
list may be followed by an asterisk (@samp{*}), which indicates that the
variables should be printed in their dictionary print formats, separated
by spaces. A variable list followed by a slash or the end of command
-will be interpreted the same way.
+is interpreted in the same way.
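+
+The following sketch combines these forms, again with hypothetical
+variables @samp{X} and @samp{Y}. It prints two output lines per case.
+Line 1 holds a literal string that starts in column 1, followed by
+@samp{X} in columns 10 through 15. Line 2 does the same for
+@samp{Y}.
+
+@example
+PRINT RECORDS=2
+  /1 'X is' 1 X 10-15
+  /2 'Y is' 1 Y 10-15.
+EXECUTE.
+@end example
+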
If a FORTRAN type specification is used to move backwards on the current
-line, then text is written at that point on the line, the line will be
+line, then text is written at that point on the line and the line is
truncated to that length, although additional text being added will
again extend the line to that length.
@vindex PRINT EJECT
@display
-PRINT EJECT
+PRINT EJECT
OUTFILE='@var{file_name}'
RECORDS=@var{n_lines}
@{NOTABLE,TABLE@}
The @subcmd{OUTFILE} subcommand is optional. It may be used to direct output to
a file specified by file name as a string or file handle (@pxref{File
-Handles}). If OUTFILE is not specified then output will be directed to
+Handles}). If @subcmd{OUTFILE} is not specified then output is directed to
the listing file.
The @subcmd{ENCODING} subcommand may only be used if @subcmd{OUTFILE}
The @subcmd{FILE} subcommand, which is optional, is used to specify the file to
have its line re-read. The file must be specified as the name of a file
handle (@pxref{File Handles}). If FILE is not specified then the last
-file specified on @cmd{DATA LIST} will be assumed (last file specified
+file specified on @cmd{DATA LIST} is assumed (last file specified
lexically, not in terms of flow-of-control).
By default, the line re-read is re-read in its entirety. With the
@vindex WRITE
@display
-WRITE
+WRITE
OUTFILE='@var{file_name}'
RECORDS=@var{n_lines}
@{NOTABLE,TABLE@}
@var{var_list} *
@end display
-@code{WRITE} writes text or binary data to an output file.
+@cmd{WRITE} writes text or binary data to an output file.
@xref{PRINT}, for more information on syntax and usage. @cmd{PRINT}
and @cmd{WRITE} differ in only a few ways: