X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-io.texi;h=9f5b8dfbeec3303c19ffe29b89249d96e8c2a67f;hb=refs%2Fheads%2Fctables10;hp=747e32d9b503d86f89bd212f5eaff8738e977ac7;hpb=4056e461fd8f8d9ba7feca63c73d2f50a2048b63;p=pspp diff --git a/doc/data-io.texi b/doc/data-io.texi index 747e32d9b5..9f5b8dfbee 100644 --- a/doc/data-io.texi +++ b/doc/data-io.texi @@ -1,5 +1,5 @@ @c PSPP - a program for statistical analysis. -@c Copyright (C) 2017 Free Software Foundation, Inc. +@c Copyright (C) 2017, 2020 Free Software Foundation, Inc. @c Permission is granted to copy, distribute and/or modify this document @c under the terms of the GNU Free Documentation License, Version 1.3 @c or any later version published by the Free Software Foundation; @@ -48,7 +48,6 @@ actually be read until a procedure is executed. * INPUT PROGRAM:: Support for complex input programs. * LIST:: List cases in the active dataset. * NEW FILE:: Clear the active dataset. -* MATRIX DATA:: Defining matrix material for procedures. * PRINT:: Display values in print formats. * PRINT EJECT:: Eject the current page then print. * PRINT SPACE:: Print blank lines. @@ -305,7 +304,7 @@ the beginning of an input file. It can be used to skip over a row that contains variable names, for example. @cmd{DATA LIST} can optionally output a table describing how the data file -will be read. The @subcmd{TABLE} subcommand enables this output, and +is read. The @subcmd{TABLE} subcommand enables this output, and @subcmd{NOTABLE} disables it. The default is to output the table. The list of variables to be read from the data list must come last. 
@@ -485,7 +484,7 @@ DATA LIST FREE [(@{TAB,'@var{c}'@}, @dots{})] [@{NOTABLE,TABLE@}] [FILE='@var{file_name}' [ENCODING='@var{encoding}']] - [SKIP=@var{record_cnt}] + [SKIP=@var{n_records}] /@var{var_spec}@dots{} where each @var{var_spec} takes one of the forms @@ -756,7 +755,7 @@ For reading text files in CHARACTER mode, all of the forms described for ENCODING on the INSERT command are supported (@pxref{INSERT}). For reading in other file-based modes, encoding autodetection is not supported; if the specified encoding requests autodetection then the -default encoding will be used. This is also true when a file handle +default encoding is used. This is also true when a file handle is used for writing a file in any mode. @node INPUT PROGRAM @@ -811,118 +810,115 @@ structure. @cmd{INPUT PROGRAM} must contain at least one @cmd{DATA LIST} or @cmd{END FILE} command. -All this is very confusing. A few examples should help to clarify. +@subheading Example 1: Read two files in parallel to the end of the shorter + +The following example reads variable X from file @file{a.txt} and +variable Y from file @file{b.txt}. If one file is shorter than the +other then the extra data in the longer file is ignored. -@c If you change this example, change the regression test1 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - DATA LIST NOTABLE FILE='a.data'/X 1-10. - DATA LIST NOTABLE FILE='b.data'/Y 1-10. + DATA LIST NOTABLE FILE='a.txt'/X 1-10. + DATA LIST NOTABLE FILE='b.txt'/Y 1-10. END INPUT PROGRAM. LIST. @end example -The example above reads variable X from file @file{a.data} and variable -Y from file @file{b.data}. If one file is shorter than the other then -the extra data in the longer file is ignored. +@subheading Example 2: Read two files in parallel, supplementing the shorter + +The following example also reads variable X from @file{a.txt} and +variable Y from @file{b.txt}. 
If one file is shorter than the other +then it continues reading the longer to its end, setting the other +variable to system-missing. -@c If you change this example, change the regression test2 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - NUMERIC #A #B. - - DO IF NOT #A. - DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10. - END IF. - DO IF NOT #B. - DATA LIST NOTABLE END=#B FILE='b.data'/Y 1-10. - END IF. - DO IF #A AND #B. - END FILE. - END IF. - END CASE. + NUMERIC #A #B. + + DO IF NOT #A. + DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10. + END IF. + DO IF NOT #B. + DATA LIST NOTABLE END=#B FILE='b.txt'/Y 1-10. + END IF. + DO IF #A AND #B. + END FILE. + END IF. + END CASE. END INPUT PROGRAM. LIST. @end example -The above example reads variable X from @file{a.data} and variable Y from -@file{b.data}. If one file is shorter than the other then the missing -field is set to the system-missing value alongside the present value for -the remaining length of the longer file. +@subheading Example 3: Concatenate two files (version 1) + +The following example reads data from file @file{a.txt}, then from +@file{b.txt}, and concatenates them into a single active dataset. -@c If you change this example, change the regression test3 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - NUMERIC #A #B. - - DO IF #A. - DATA LIST NOTABLE END=#B FILE='b.data'/X 1-10. - DO IF #B. - END FILE. - ELSE. - END CASE. - END IF. + NUMERIC #A #B. + + DO IF #A. + DATA LIST NOTABLE END=#B FILE='b.txt'/X 1-10. + DO IF #B. + END FILE. ELSE. - DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10. - DO IF NOT #A. - END CASE. - END IF. + END CASE. + END IF. + ELSE. + DATA LIST NOTABLE END=#A FILE='a.txt'/X 1-10. + DO IF NOT #A. + END CASE. END IF. + END IF. END INPUT PROGRAM. LIST. @end example -The above example reads data from file @file{a.data}, then from -@file{b.data}, and concatenates them into a single active dataset. 
+@subheading Example 4: Concatenate two files (version 2) + +This is another way to do the same thing as Example 3. -@c If you change this example, change the regression test4 in -@c tests/command/input-program.sh to match. @example INPUT PROGRAM. - NUMERIC #EOF. - - LOOP IF NOT #EOF. - DATA LIST NOTABLE END=#EOF FILE='a.data'/X 1-10. - DO IF NOT #EOF. - END CASE. - END IF. - END LOOP. - - COMPUTE #EOF = 0. - LOOP IF NOT #EOF. - DATA LIST NOTABLE END=#EOF FILE='b.data'/X 1-10. - DO IF NOT #EOF. - END CASE. - END IF. - END LOOP. + NUMERIC #EOF. - END FILE. + LOOP IF NOT #EOF. + DATA LIST NOTABLE END=#EOF FILE='a.txt'/X 1-10. + DO IF NOT #EOF. + END CASE. + END IF. + END LOOP. + + COMPUTE #EOF = 0. + LOOP IF NOT #EOF. + DATA LIST NOTABLE END=#EOF FILE='b.txt'/X 1-10. + DO IF NOT #EOF. + END CASE. + END IF. + END LOOP. + + END FILE. END INPUT PROGRAM. LIST. @end example -The above example does the same thing as the previous example, in a -different way. +@subheading Example 5: Generate random variates + +The following example creates a dataset that consists of 50 random +variates between 0 and 10. -@c If you change this example, make similar changes to the regression -@c test5 in tests/command/input-program.sh. @example INPUT PROGRAM. - LOOP #I=1 TO 50. - COMPUTE X=UNIFORM(10). - END CASE. - END LOOP. - END FILE. + LOOP #I=1 TO 50. + COMPUTE X=UNIFORM(10). + END CASE. + END LOOP. + END FILE. END INPUT PROGRAM. -LIST/FORMAT=NUMBERED. +LIST /FORMAT=NUMBERED. @end example -The above example causes an active dataset to be created consisting of 50 -random variates between 0 and 10. - @node LIST @section LIST @vindex LIST @@ -968,160 +964,6 @@ NEW FILE. @cmd{NEW FILE} command clears the dictionary and data from the current active dataset. 
-@node MATRIX DATA -@section MATRIX DATA -@vindex MATRIX DATA - -@display -MATRIX DATA - VARIABLES = @var{columns} - [FILE='@var{file_name}'| INLINE @} - [/FORMAT= [@{LIST | FREE@}] - [@{UPPER | LOWER | FULL@}] - [@{DIAGONAL | NODIAGONAL@}]] - [/N= @var{n}] - [/SPLIT= @var{split_variables}]. -@end display - -The @cmd{MATRIX DATA} command is used to input data in the form of matrices -which can subsequently be used by other commands. If the -@subcmd{FILE} is omitted or takes the value @samp{INLINE} then the command -should immediately followed by @cmd{BEGIN DATA} (@pxref{BEGIN DATA}). - -There is one mandatory subcommand, @i{viz:} @subcmd{VARIABLES}, which defines -the @var{columns} of the matrix. -Normally, the @var{columns} should include an item called @samp{ROWTYPE_}. -The @samp{ROWTYPE_} column is used to specify the purpose of a row in the -matrix. - -@example -matrix data - variables = rowtype_ var01 TO var08. - -begin data. -mean 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7 -sd 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5 -n 92 92 92 92 92 92 92 92 -corr 1.00 -corr .18 1.00 -corr -.22 -.17 1.00 -corr .36 .31 -.14 1.00 -corr .27 .16 -.12 .22 1.00 -corr .33 .15 -.17 .24 .21 1.00 -corr .50 .29 -.20 .32 .12 .38 1.00 -corr .17 .29 -.05 .20 .27 .20 .04 1.00 -end data. -@end example - -In the above example, the first three rows have ROWTYPE_ values of -@samp{mean}, @samp{sd}, and @samp{n}. These indicate that the rows -contain mean values, standard deviations and counts, respectively. -All subsequent rows have a ROWTYPE_ of @samp{corr} which indicates -that the values are correlation coefficients. - -Note that in this example, the upper right values of the @samp{corr} -values are blank, and in each case, the rightmost value is unity. -This is because, the -@subcmd{FORMAT} subcommand defaults to @samp{LOWER DIAGONAL}, -which indicates that only the lower triangle is provided in the data. -The opposite triangle is automatically inferred. 
One could instead -specify the upper triangle as follows: - - -@example -matrix data - variables = rowtype_ var01 TO var08 - /format = upper nodiagonal. - -begin data. -mean 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7 -sd 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5 -n 92 92 92 92 92 92 92 92 -corr .17 .50 -.33 .27 .36 -.22 .18 -corr .29 .29 -.20 .32 .12 .38 -corr .05 .20 -.15 .16 .21 -corr .20 .32 -.17 .12 -corr .27 .12 -.24 -corr -.20 -.38 -corr .04 -end data. -@end example - -In this example the @samp{NODIAGONAL} keyword is used. Accordingly -the diagonal values of the matrix are omitted. This implies that -there is one less @samp{corr} line than there are variables. -If the @samp{FULL} option is passed to the @subcmd{FORMAT} subcommand, -then all the matrix elements must be provided, including the diagonal -elements. - -In the preceding examples, each matrix row has been specified on a -single line. If you pass the keyword @var{FREE} to @subcmd{FORMAT} -then the data may be data for several matrix rows may be specified on -the same line, or a single row may be split across lines. - -The @subcmd{N} subcommand may be used to specify the number -of valid cases for each variable. It should not be used if the -data contains a record whose ROWTYPE_ column is @samp{N} or @samp{N_VECTOR}. -It implies a @samp{N} record whose values are all @var{n}. -That is to say, -@example -matrix data - variables = rowtype_ var01 TO var04 - /format = upper nodiagonal - /n = 99. -begin data -mean 34 35 36 37 -sd 22 11 55 66 -corr 9 8 7 -corr 6 5 -corr 4 -end data. -@end example -produces an effect identical to -@example -matrix data - variables = rowtype_ var01 TO var04 - /format = upper nodiagonal -begin data -n 99 99 99 99 -mean 34 35 36 37 -sd 22 11 55 66 -corr 9 8 7 -corr 6 5 -corr 4 -end data. -@end example - - -The @subcmd{SPLIT} is used to indicate that variables are to be -considered as split variables. 
For example, the following -defines two matrices using the variable @samp{S1} to distinguish -between them. - -@example -matrix data - variables = s1 rowtype_ var01 TO var04 - /split = s1 - /format = full diagonal. - -begin data -0 mean 34 35 36 37 -0 sd 22 11 55 66 -0 n 99 98 99 92 -0 corr 1 9 8 7 -0 corr 9 1 6 5 -0 corr 8 6 1 4 -0 corr 7 5 4 1 -1 mean 44 45 34 39 -1 sd 23 15 51 46 -1 n 98 34 87 23 -1 corr 1 2 3 4 -1 corr 2 1 5 6 -1 corr 3 5 1 7 -1 corr 4 6 7 1 -end data. -@end example - @node PRINT @section PRINT @vindex PRINT @@ -1151,7 +993,7 @@ are specified, @cmd{PRINT} outputs a single blank line. The @subcmd{OUTFILE} subcommand specifies the file to receive the output. The file may be a file name as a string or a file handle (@pxref{File -Handles}). If @subcmd{OUTFILE} is not present then output will be sent to +Handles}). If @subcmd{OUTFILE} is not present then output is sent to @pspp{}'s output listing file. When @subcmd{OUTFILE} is present, the output is written to @var{file_name} in a plain text format, with a space inserted at beginning of each output line, even lines that @@ -1171,15 +1013,15 @@ default, suppresses this output table. Introduce the strings and variables to be printed with a slash (@samp{/}). Optionally, the slash may be followed by a number -indicating which output line will be specified. In the absence of this -line number, the next line number will be specified. Multiple lines may +indicating which output line is specified. In the absence of this +line number, the next line number is specified. Multiple lines may be specified using multiple slashes with the intended output for a line following its respective slash. Literal strings may be printed. Specify the string itself. Optionally the string may be followed by a column number, specifying the column on the line where the string should start. Otherwise, the -string will be printed at the current position on the line. +string is printed at the current position on the line. 
Variables to be printed can be specified in the same ways as available for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}). In addition, a @@ -1187,10 +1029,10 @@ variable list may be followed by an asterisk (@samp{*}), which indicates that the variables should be printed in their dictionary print formats, separated by spaces. A variable list followed by a slash or the end of command -will be interpreted the same way. +is interpreted in the same way. If a FORTRAN type specification is used to move backwards on the current -line, then text is written at that point on the line, the line will be +line, then text is written at that point on the line, the line is truncated to that length, although additional text being added will again extend the line to that length. @@ -1241,7 +1083,7 @@ PRINT SPACE [OUTFILE='file_name'] [ENCODING='@var{encoding}'] [n_lines]. The @subcmd{OUTFILE} subcommand is optional. It may be used to direct output to a file specified by file name as a string or file handle (@pxref{File -Handles}). If OUTFILE is not specified then output will be directed to +Handles}). If OUTFILE is not specified then output is directed to the listing file. The @subcmd{ENCODING} subcommand may only be used if @subcmd{OUTFILE} @@ -1268,7 +1110,7 @@ for further processing. The @subcmd{FILE} subcommand, which is optional, is used to specify the file to have its line re-read. The file must be specified as the name of a file handle (@pxref{File Handles}). If FILE is not specified then the last -file specified on @cmd{DATA LIST} will be assumed (last file specified +file specified on @cmd{DATA LIST} is assumed (last file specified lexically, not in terms of flow-of-control). By default, the line re-read is re-read in its entirety. With the