+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
@c (modify-syntax-entry ?_ "w")
@c (modify-syntax-entry ?' "'")
@c (modify-syntax-entry ?@ "'")
@cindex cases
@cindex observations
-Data are the focus of the @pspp{} language.
+Data are the focus of the @pspp{} language.
Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
Each case represents an individual or ``experimental unit''.
For example, in the results of a survey, the names of the respondents,
``empty,'' that is, it has no dictionary or data. If a dataset with
the given name already exists, this has no effect. The new dataset
can be used with commands that support output to a dataset,
-e.g. AGGREGATE (@pxref{AGGREGATE}).
+@i{e.g.} AGGREGATE (@pxref{AGGREGATE}).
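+
+As a minimal sketch of this use, the following declares a dataset and
+then directs @cmd{AGGREGATE} output to it. The dataset name
+@code{summary} and the variables @code{grp} and @code{x} are
+hypothetical, chosen only for illustration.
+
+@example
+DATA LIST FREE NOTABLE /grp x.
+BEGIN DATA.
+1 10
+1 20
+2 30
+END DATA.
+
+DATASET DECLARE summary.
+AGGREGATE OUTFILE=summary /BREAK=grp /mean_x=MEAN(x).
+DATASET ACTIVATE summary.
+LIST.
+@end example
+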
@vindex DATASET CLOSE
The DATASET CLOSE command deletes a dataset. If the active dataset is
that contains variable names, for example.
@cmd{DATA LIST} can optionally output a table describing how the data file
-will be read. The @subcmd{TABLE} subcommand enables this output, and
+is read. The @subcmd{TABLE} subcommand enables this output, and
@subcmd{NOTABLE} disables it. The default is to output the table.
The list of variables to be read from the data list must come last.
In columnar style, to use a variable format other than the default,
specify the format type in parentheses after the column numbers. For
-instance, for alphanumeric @samp{A} format, use @samp{(A)}.
+instance, for alphanumeric @samp{A} format, use @samp{(A)}.
In addition, implied decimal places can be specified in parentheses
after the column numbers. As an example, suppose that a data file has a
leaves the active column immediately after the ending column
specified. Record motion using @code{NEWREC} in FORTRAN style also
applies to later FORTRAN and columnar specifiers.
-
+
@menu
* DATA LIST FIXED Examples:: Examples of DATA LIST FIXED.
@end menu
[(@{TAB,'@var{c}'@}, @dots{})]
[@{NOTABLE,TABLE@}]
[FILE='@var{file_name}' [ENCODING='@var{encoding}']]
- [SKIP=@var{record_cnt}]
+ [SKIP=@var{n_records}]
/@var{var_spec}@dots{}
where each @var{var_spec} takes one of the forms
@end display
In free format, the input data is, by default, structured as a series
-of fields separated by spaces, tabs, commas, or line breaks. Each
+of fields separated by spaces, tabs, or line breaks.
+If the current @subcmd{DECIMAL} separator is @subcmd{DOT} (@pxref{SET}),
+then commas are also treated as field separators.
+Each
field's content may be unquoted, or it may be quoted with a pair of
apostrophes (@samp{'}) or double quotes (@samp{"}). Unquoted white
space separates fields but is not part of any field. Any mix of
This list must be introduced by a single slash (@samp{/}). The set of
variable names may contain format specifications in parentheses
(@pxref{Input and Output Formats}). Format specifications apply to all
-variables back to the previous parenthesized format specification.
+variables back to the previous parenthesized format specification.
In addition, an asterisk may be used to indicate that all variables
preceding it are to have input/output format @samp{F8.0}.
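+
+The following sketch illustrates these rules with inline data, assuming
+the @subcmd{DECIMAL} separator is @subcmd{DOT} so that commas also
+separate fields. The variable names are hypothetical. Here
+@code{name} is read as @samp{A10}, @code{x} and @code{y} as
+@samp{F8.2}, and @code{z} as @samp{F8.0}.
+
+@example
+DATA LIST FREE NOTABLE /name (A10) x y (F8.2) z *.
+BEGIN DATA.
+'Ann Lee' 1.5 2.25 3
+Bob, 4, 5.5, 6
+END DATA.
+LIST.
+@end example
+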
FILE HANDLE @var{handle_name}
/NAME='@var{file_name}'
[/MODE=CHARACTER]
+ [/ENDS=@{CR,CRLF@}]
/TABWIDTH=@var{tab_width}
[ENCODING='@var{encoding}']
@itemize
@item
-In CHARACTER mode, the default, the data file is read as a text file,
-according to the local system's conventions, and each text line is
-read as one record.
+In CHARACTER mode, the default, the data file is read as a text file.
+Each text line is read as one record.
In CHARACTER mode only, tabs are expanded to spaces by input programs,
except by @cmd{DATA LIST FREE} with explicitly specified delimiters.
extension) may be used to specify an alternate width. Use a TABWIDTH
of 0 to suppress tab expansion.
+A file written in CHARACTER mode by default uses the line ends of the
+system on which @pspp{} is running: on Windows, the default is
+CR LF line ends, and on other systems the default is LF only. Specify
+ENDS as CR or CRLF to override the default. @pspp{} reads files using
+either convention on any kind of system, regardless of ENDS.
+
@item
In IMAGE mode, the data file is treated as a series of fixed-length
binary records. LRECL should be used to specify the record length in
for ENCODING on the INSERT command are supported (@pxref{INSERT}).
For reading in other file-based modes, encoding autodetection is not
supported; if the specified encoding requests autodetection then the
-default encoding will be used. This is also true when a file handle
+default encoding is used. This is also true when a file handle
is used for writing a file in any mode.
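+
+As a sketch of the @subcmd{ENDS} setting described above, the following
+writes a text file with CR LF line ends regardless of the host system.
+The handle name @code{out}, the file name @file{out.txt}, and the
+variables are hypothetical.
+
+@example
+FILE HANDLE out /NAME='out.txt' /MODE=CHARACTER /ENDS=CRLF.
+
+DATA LIST FREE NOTABLE /x y.
+BEGIN DATA.
+1 2
+3 4
+END DATA.
+WRITE OUTFILE=out /x y.
+EXECUTE.
+@end example
+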
@node INPUT PROGRAM
stops the flow of input data and passes out of the @cmd{INPUT PROGRAM}
structure.
-All this is very confusing. A few examples should help to clarify.
+@cmd{INPUT PROGRAM} must contain at least one @cmd{DATA LIST} or
+@cmd{END FILE} command.
+
+@subheading Example 1: Read two files in parallel to the end of the shorter
+
+The following example reads variable X from file @file{a.txt} and
+variable Y from file @file{b.txt}. If one file is shorter than the
+other then the extra data in the longer file is ignored.
-@c If you change this example, change the regression test1 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- DATA LIST NOTABLE FILE='a.data'/X 1-10.
- DATA LIST NOTABLE FILE='b.data'/Y 1-10.
+ DATA LIST NOTABLE FILE='a.txt'/X 1-10.
+ DATA LIST NOTABLE FILE='b.txt'/Y 1-10.
END INPUT PROGRAM.
LIST.
@end example
-The example above reads variable X from file @file{a.data} and variable
-Y from file @file{b.data}. If one file is shorter than the other then
-the extra data in the longer file is ignored.
+@subheading Example 2: Read two files in parallel, supplementing the shorter
+
+The following example also reads variable X from @file{a.txt} and
+variable Y from @file{b.txt}. If one file is shorter than the other
+then the program continues reading the longer file to its end, setting
+the variable from the shorter file to system-missing.
-@c If you change this example, change the regression test2 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- NUMERIC #A #B.
-
- DO IF NOT #A.
- DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
- END IF.
- DO IF NOT #B.
- DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10.
- END IF.
- DO IF #A AND #B.
- END FILE.
- END IF.
- END CASE.
+ NUMERIC #A #B.
+
+ DO IF NOT #A.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ END IF.
+ DO IF NOT #B.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10.
+ END IF.
+ DO IF #A AND #B.
+ END FILE.
+ END IF.
+ END CASE.
END INPUT PROGRAM.
LIST.
@end example
-The above example reads variable X from @file{a.data} and variable Y from
-@file{b.data}. If one file is shorter than the other then the missing
-field is set to the system-missing value alongside the present value for
-the remaining length of the longer file.
+@subheading Example 3: Concatenate two files (version 1)
+
+The following example reads data from file @file{a.txt}, then from
+@file{b.txt}, and concatenates them into a single active dataset.
-@c If you change this example, change the regression test3 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- NUMERIC #A #B.
-
- DO IF #A.
- DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10.
- DO IF #B.
- END FILE.
- ELSE.
- END CASE.
- END IF.
+ NUMERIC #A #B.
+
+ DO IF #A.
+ DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10.
+ DO IF #B.
+ END FILE.
ELSE.
- DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
- DO IF NOT #A.
- END CASE.
- END IF.
+ END CASE.
END IF.
+ ELSE.
+ DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10.
+ DO IF NOT #A.
+ END CASE.
+ END IF.
+ END IF.
END INPUT PROGRAM.
LIST.
@end example
-The above example reads data from file @file{a.data}, then from
-@file{b.data}, and concatenates them into a single active dataset.
+@subheading Example 4: Concatenate two files (version 2)
+
+This is another way to do the same thing as Example 3.
-@c If you change this example, change the regression test4 in
-@c tests/command/input-program.sh to match.
@example
INPUT PROGRAM.
- NUMERIC #EOF.
-
- LOOP IF NOT #EOF.
- DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10.
- DO IF NOT #EOF.
- END CASE.
- END IF.
- END LOOP.
-
- COMPUTE #EOF = 0.
- LOOP IF NOT #EOF.
- DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10.
- DO IF NOT #EOF.
- END CASE.
- END IF.
- END LOOP.
+ NUMERIC #EOF.
- END FILE.
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ COMPUTE #EOF = 0.
+ LOOP IF NOT #EOF.
+ DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10.
+ DO IF NOT #EOF.
+ END CASE.
+ END IF.
+ END LOOP.
+
+ END FILE.
END INPUT PROGRAM.
LIST.
@end example
-The above example does the same thing as the previous example, in a
-different way.
+@subheading Example 5: Generate random variates
+
+The following example creates a dataset that consists of 50 random
+variates between 0 and 10.
-@c If you change this example, make similar changes to the regression
-@c test5 in tests/command/input-program.sh.
@example
INPUT PROGRAM.
- LOOP #I=1 TO 50.
- COMPUTE X=UNIFORM(10).
- END CASE.
- END LOOP.
- END FILE.
+ LOOP #I=1 TO 50.
+ COMPUTE X=UNIFORM(10).
+ END CASE.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.
-LIST/FORMAT=NUMBERED.
+LIST /FORMAT=NUMBERED.
@end example
-The above example causes an active dataset to be created consisting of 50
-random variates between 0 and 10.
-
@node LIST
@section LIST
@vindex LIST
Case numbers start from 1. They are counted after all transformations
have been considered.
-@cmd{LIST} attempts to fit all the values on a single line. If needed
-to make them fit, variable names are displayed vertically. If values
-cannot fit on a single line, then a multi-line format will be used.
-
@cmd{LIST} is a procedure. It causes the data to be read.
@node NEW FILE
@vindex PRINT
@display
-PRINT
+PRINT
[OUTFILE='@var{file_name}']
[RECORDS=@var{n_lines}]
[@{NOTABLE,TABLE@}]
The @subcmd{OUTFILE} subcommand specifies the file to receive the output. The
file may be a file name as a string or a file handle (@pxref{File
-Handles}). If @subcmd{OUTFILE} is not present then output will be sent to
-@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, a space is
-inserted at beginning of each output line, even lines that otherwise
-would be blank.
+Handles}). If @subcmd{OUTFILE} is not present then output is sent to
+@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, the
+output is written to @var{file_name} in a plain text format, with a
+space inserted at the beginning of each output line, even lines that
+otherwise would be blank.
The @subcmd{ENCODING} subcommand may only be used if the
@subcmd{OUTFILE} subcommand is also used. It specifies the character
Introduce the strings and variables to be printed with a slash
(@samp{/}). Optionally, the slash may be followed by a number
-indicating which output line will be specified. In the absence of this
-line number, the next line number will be specified. Multiple lines may
+indicating which output line is specified. In the absence of this
+line number, the next line number is specified. Multiple lines may
be specified using multiple slashes with the intended output for a line
following its respective slash.
Literal strings may be printed. Specify the string itself.
Optionally the string may be followed by a column number, specifying
the column on the line where the string should start. Otherwise, the
-string will be printed at the current position on the line.
+string is printed at the current position on the line.
Variables to be printed can be specified in the same ways as available
for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}). In addition, a
list may be followed by an asterisk (@samp{*}), which indicates that the
variables should be printed in their dictionary print formats, separated
by spaces. A variable list followed by a slash or the end of command
-will be interpreted the same way.
+is interpreted in the same way.
If a FORTRAN type specification is used to move backwards on the current
-line, then text is written at that point on the line, the line will be
+line and text is then written at that point on the line, the line is
truncated to that length, although additional text being added will
again extend the line to that length.
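+
+As a brief sketch of these rules, the following prints two lines per
+case: a literal string starting at column 1 followed by two variables
+in their dictionary print formats, then a second line with one
+variable. The variable names are hypothetical.
+
+@example
+DATA LIST FREE NOTABLE /x y.
+BEGIN DATA.
+1 2
+3 4
+END DATA.
+PRINT /'Values:' 1 x y * /x.
+EXECUTE.
+@end example
+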
@vindex PRINT EJECT
@display
-PRINT EJECT
+PRINT EJECT
OUTFILE='@var{file_name}'
RECORDS=@var{n_lines}
@{NOTABLE,TABLE@}
The @subcmd{OUTFILE} subcommand is optional. It may be used to direct output to
a file specified by file name as a string or file handle (@pxref{File
-Handles}). If OUTFILE is not specified then output will be directed to
+Handles}). If OUTFILE is not specified then output is directed to
the listing file.
The @subcmd{ENCODING} subcommand may only be used if @subcmd{OUTFILE}
The @subcmd{FILE} subcommand, which is optional, is used to specify the file to
have its line re-read. The file must be specified as the name of a file
handle (@pxref{File Handles}). If FILE is not specified then the last
-file specified on @cmd{DATA LIST} will be assumed (last file specified
+file specified on @cmd{DATA LIST} is assumed (last file specified
lexically, not in terms of flow-of-control).
By default, the line re-read is re-read in its entirety. With the
@vindex WRITE
@display
-WRITE
+WRITE
OUTFILE='@var{file_name}'
RECORDS=@var{n_lines}
@{NOTABLE,TABLE@}
@var{var_list} *
@end display
-@code{WRITE} writes text or binary data to an output file.
+@code{WRITE} writes text or binary data to an output file.
@xref{PRINT}, for more information on syntax and usage. @cmd{PRINT}
and @cmd{WRITE} differ in only a few ways: