X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-io.texi;h=9f5b8dfbeec3303c19ffe29b89249d96e8c2a67f;hb=refs%2Fheads%2Fctables7;hp=313e543205a41f430ac864534ce22a132bfabecd;hpb=0b0ca44889e637251cb5f2dbf3c7fdc4ec8b9bd7;p=pspp diff --git a/doc/data-io.texi b/doc/data-io.texi index 313e543205..9f5b8dfbee 100644 --- a/doc/data-io.texi +++ b/doc/data-io.texi @@ -1,3 +1,12 @@ +@c PSPP - a program for statistical analysis. +@c Copyright (C) 2017, 2020 Free Software Foundation, Inc. +@c Permission is granted to copy, distribute and/or modify this document +@c under the terms of the GNU Free Documentation License, Version 1.3 +@c or any later version published by the Free Software Foundation; +@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. +@c A copy of the license is included in the section entitled "GNU +@c Free Documentation License". +@c @c (modify-syntax-entry ?_ "w") @c (modify-syntax-entry ?' "'") @c (modify-syntax-entry ?@ "'") @@ -11,7 +20,7 @@ @cindex cases @cindex observations -Data are the focus of the @pspp{} language. +Data are the focus of the @pspp{} language. Each datum belongs to a @dfn{case} (also called an @dfn{observation}). Each case represents an individual or ``experimental unit''. For example, in the results of a survey, the names of the respondents, @@ -180,7 +189,7 @@ The DATASET DECLARE command creates a new dataset that is initially ``empty,'' that is, it has no dictionary or data. If a dataset with the given name already exists, this has no effect. The new dataset can be used with commands that support output to a dataset, -e.g. AGGREGATE (@pxref{AGGREGATE}). +@i{e.g.} AGGREGATE (@pxref{AGGREGATE}). @vindex DATASET CLOSE The DATASET CLOSE command deletes a dataset. If the active dataset is @@ -295,7 +304,7 @@ the beginning of an input file. It can be used to skip over a row that contains variable names, for example. 
@cmd{DATA LIST} can optionally output a table describing how the data file -will be read. The @subcmd{TABLE} subcommand enables this output, and +is read. The @subcmd{TABLE} subcommand enables this output, and @subcmd{NOTABLE} disables it. The default is to output the table. The list of variables to be read from the data list must come last. @@ -319,7 +328,7 @@ changed; see @ref{SET} for more information.) In columnar style, to use a variable format other than the default, specify the format type in parentheses after the column numbers. For -instance, for alphanumeric @samp{A} format, use @samp{(A)}. +instance, for alphanumeric @samp{A} format, use @samp{(A)}. In addition, implied decimal places can be specified in parentheses after the column numbers. As an example, suppose that a data file has a @@ -375,7 +384,7 @@ FORTRAN and columnar styles may be freely intermixed. Columnar style leaves the active column immediately after the ending column specified. Record motion using @code{NEWREC} in FORTRAN style also applies to later FORTRAN and columnar specifiers. - + @menu * DATA LIST FIXED Examples:: Examples of DATA LIST FIXED. @end menu @@ -475,7 +484,7 @@ DATA LIST FREE [(@{TAB,'@var{c}'@}, @dots{})] [@{NOTABLE,TABLE@}] [FILE='@var{file_name}' [ENCODING='@var{encoding}']] - [SKIP=@var{record_cnt}] + [SKIP=@var{n_records}] /@var{var_spec}@dots{} where each @var{var_spec} takes one of the forms @@ -484,7 +493,10 @@ where each @var{var_spec} takes one of the forms @end display In free format, the input data is, by default, structured as a series -of fields separated by spaces, tabs, commas, or line breaks. Each +of fields separated by spaces, tabs, or line breaks. +If the current @subcmd{DECIMAL} separator is @subcmd{DOT} (@pxref{SET}), +then commas are also treated as field separators. +Each field's content may be unquoted, or it may be quoted with a pair of apostrophes (@samp{'}) or double quotes (@samp{"}).
Unquoted white space separates fields but is not part of any field. Any mix of @@ -511,7 +523,7 @@ The variables to be parsed are given as a single list of variable names. This list must be introduced by a single slash (@samp{/}). The set of variable names may contain format specifications in parentheses (@pxref{Input and Output Formats}). Format specifications apply to all -variables back to the previous parenthesized format specification. +variables back to the previous parenthesized format specification. In addition, an asterisk may be used to indicate that all variables preceding it are to have input/output format @samp{F8.0}. @@ -743,7 +755,7 @@ For reading text files in CHARACTER mode, all of the forms described for ENCODING on the INSERT command are supported (@pxref{INSERT}). For reading in other file-based modes, encoding autodetection is not supported; if the specified encoding requests autodetection then the -default encoding will be used. This is also true when a file handle +default encoding is used. This is also true when a file handle is used for writing a file in any mode. @node INPUT PROGRAM @@ -795,118 +807,118 @@ so an infinite loop results. @cmd{END FILE}, when executed, stops the flow of input data and passes out of the @cmd{INPUT PROGRAM} structure. -All this is very confusing. A few examples should help to clarify. +@cmd{INPUT PROGRAM} must contain at least one @cmd{DATA LIST} or +@cmd{END FILE} command. + +@subheading Example 1: Read two files in parallel to the end of the shorter + +The following example reads variable X from file @file{a.txt} and +variable Y from file @file{b.txt}. If one file is shorter than the +other then the extra data in the longer file is ignored. -@c If you change this example, change the regression test1 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - DATA LIST NOTABLE FILE='a.data'/X 1-10. - DATA LIST NOTABLE FILE='b.data'/Y 1-10. + DATA LIST NOTABLE FILE='a.txt'/X 1-10. 
+ DATA LIST NOTABLE FILE='b.txt'/Y 1-10. END INPUT PROGRAM. LIST. @end example -The example above reads variable X from file @file{a.data} and variable -Y from file @file{b.data}. If one file is shorter than the other then -the extra data in the longer file is ignored. +@subheading Example 2: Read two files in parallel, supplementing the shorter + +The following example also reads variable X from @file{a.txt} and +variable Y from @file{b.txt}. If one file is shorter than the other +then it continues reading the longer to its end, setting the other +variable to system-missing. -@c If you change this example, change the regression test2 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - NUMERIC #A #B. - - DO IF NOT #A. - DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10. - END IF. - DO IF NOT #B. - DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10. - END IF. - DO IF #A AND #B. - END FILE. - END IF. - END CASE. + NUMERIC #A #B. + + DO IF NOT #A. + DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10. + END IF. + DO IF NOT #B. + DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10. + END IF. + DO IF #A AND #B. + END FILE. + END IF. + END CASE. END INPUT PROGRAM. LIST. @end example -The above example reads variable X from @file{a.data} and variable Y from -@file{b.data}. If one file is shorter than the other then the missing -field is set to the system-missing value alongside the present value for -the remaining length of the longer file. +@subheading Example 3: Concatenate two files (version 1) + +The following example reads data from file @file{a.txt}, then from +@file{b.txt}, and concatenates them into a single active dataset. -@c If you change this example, change the regression test3 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - NUMERIC #A #B. - - DO IF #A. - DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10. - DO IF #B. - END FILE. - ELSE. - END CASE. - END IF. + NUMERIC #A #B. + + DO IF #A. + DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10. 
+ DO IF #B. + END FILE. ELSE. - DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10. - DO IF NOT #A. - END CASE. - END IF. + END CASE. END IF. + ELSE. + DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10. + DO IF NOT #A. + END CASE. + END IF. + END IF. END INPUT PROGRAM. LIST. @end example -The above example reads data from file @file{a.data}, then from -@file{b.data}, and concatenates them into a single active dataset. +@subheading Example 4: Concatenate two files (version 2) + +This is another way to do the same thing as Example 3. -@c If you change this example, change the regression test4 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - NUMERIC #EOF. - - LOOP IF NOT #EOF. - DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10. - DO IF NOT #EOF. - END CASE. - END IF. - END LOOP. - - COMPUTE #EOF = 0. - LOOP IF NOT #EOF. - DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10. - DO IF NOT #EOF. - END CASE. - END IF. - END LOOP. + NUMERIC #EOF. - END FILE. + LOOP IF NOT #EOF. + DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10. + DO IF NOT #EOF. + END CASE. + END IF. + END LOOP. + + COMPUTE #EOF = 0. + LOOP IF NOT #EOF. + DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10. + DO IF NOT #EOF. + END CASE. + END IF. + END LOOP. + + END FILE. END INPUT PROGRAM. LIST. @end example -The above example does the same thing as the previous example, in a -different way. +@subheading Example 5: Generate random variates + +The following example creates a dataset that consists of 50 random +variates between 0 and 10. -@c If you change this example, make similar changes to the regression -@c test5 in tests/command/input-program.sh. @example INPUT PROGRAM. - LOOP #I=1 TO 50. - COMPUTE X=UNIFORM(10). - END CASE. - END LOOP. - END FILE. + LOOP #I=1 TO 50. + COMPUTE X=UNIFORM(10). + END CASE. + END LOOP. + END FILE. END INPUT PROGRAM. -LIST/FORMAT=NUMBERED. +LIST /FORMAT=NUMBERED.
@end example -The above example causes an active dataset to be created consisting of 50 -random variates between 0 and 10. - @node LIST @section LIST @vindex LIST @@ -939,10 +951,6 @@ currently not used. Case numbers start from 1. They are counted after all transformations have been considered. -@cmd{LIST} attempts to fit all the values on a single line. If needed -to make them fit, variable names are displayed vertically. If values -cannot fit on a single line, then a multi-line format will be used. - @cmd{LIST} is a procedure. It causes the data to be read. @node NEW FILE @@ -961,7 +969,7 @@ active dataset. @vindex PRINT @display -PRINT +PRINT [OUTFILE='@var{file_name}'] [RECORDS=@var{n_lines}] [@{NOTABLE,TABLE@}] @@ -985,10 +993,11 @@ are specified, @cmd{PRINT} outputs a single blank line. The @subcmd{OUTFILE} subcommand specifies the file to receive the output. The file may be a file name as a string or a file handle (@pxref{File -Handles}). If @subcmd{OUTFILE} is not present then output will be sent to -@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, a space is -inserted at beginning of each output line, even lines that otherwise -would be blank. +Handles}). If @subcmd{OUTFILE} is not present then output is sent to +@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, the +output is written to @var{file_name} in a plain text format, with a +space inserted at the beginning of each output line, even lines that +otherwise would be blank. The @subcmd{ENCODING} subcommand may only be used if the @subcmd{OUTFILE} subcommand is also used. It specifies the character @@ -1004,15 +1013,15 @@ default, suppresses this output table. Introduce the strings and variables to be printed with a slash (@samp{/}). Optionally, the slash may be followed by a number -indicating which output line will be specified. In the absence of this -line number, the next line number will be specified. Multiple lines may +indicating which output line is specified.
In the absence of this +line number, the next line number is specified. Multiple lines may be specified using multiple slashes with the intended output for a line following its respective slash. Literal strings may be printed. Specify the string itself. Optionally the string may be followed by a column number, specifying the column on the line where the string should start. Otherwise, the -string will be printed at the current position on the line. +string is printed at the current position on the line. Variables to be printed can be specified in the same ways as available for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}). In addition, a @@ -1020,10 +1029,10 @@ variable list may be followed by an asterisk (@samp{*}), which indicates that the variables should be printed in their dictionary print formats, separated by spaces. A variable list followed by a slash or the end of command -will be interpreted the same way. +is interpreted in the same way. If a FORTRAN type specification is used to move backwards on the current -line, then text is written at that point on the line, the line will be +line, then text is written at that point on the line, the line is truncated to that length, although additional text being added will again extend the line to that length. @@ -1032,7 +1041,7 @@ again extend the line to that length. @vindex PRINT EJECT @display -PRINT EJECT +PRINT EJECT OUTFILE='@var{file_name}' RECORDS=@var{n_lines} @{NOTABLE,TABLE@} @@ -1074,7 +1083,7 @@ PRINT SPACE [OUTFILE='file_name'] [ENCODING='@var{encoding}'] [n_lines]. The @subcmd{OUTFILE} subcommand is optional. It may be used to direct output to a file specified by file name as a string or file handle (@pxref{File -Handles}). If OUTFILE is not specified then output will be directed to +Handles}). If OUTFILE is not specified then output is directed to the listing file. The @subcmd{ENCODING} subcommand may only be used if @subcmd{OUTFILE} @@ -1101,7 +1110,7 @@ for further processing. 
The @subcmd{FILE} subcommand, which is optional, is used to specify the file to have its line re-read. The file must be specified as the name of a file handle (@pxref{File Handles}). If FILE is not specified then the last -file specified on @cmd{DATA LIST} will be assumed (last file specified +file specified on @cmd{DATA LIST} is assumed (last file specified lexically, not in terms of flow-of-control). By default, the line re-read is re-read in its entirety. With the @@ -1204,7 +1213,7 @@ structure (@pxref{LOOP}). Use @cmd{DATA LIST} before, not after, @vindex WRITE @display -WRITE +WRITE OUTFILE='@var{file_name}' RECORDS=@var{n_lines} @{NOTABLE,TABLE@} @@ -1217,7 +1226,7 @@ WRITE @var{var_list} * @end display -@code{WRITE} writes text or binary data to an output file. +@code{WRITE} writes text or binary data to an output file. @xref{PRINT}, for more information on syntax and usage. @cmd{PRINT} and @cmd{WRITE} differ in only a few ways: