X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-io.texi;h=8f727e50540f0e305c57195881a1cb886b359bce;hb=871f4456a207925fdce3df3150af3f3b263b2776;hp=07e01e605d7093fbe09914b02c76481fb0dc62cf;hpb=72435b7315ecb4943d0c8cf4ffb04fbba695ca1d;p=pspp-builds.git diff --git a/doc/data-io.texi b/doc/data-io.texi index 07e01e60..8f727e50 100644 --- a/doc/data-io.texi +++ b/doc/data-io.texi @@ -30,7 +30,6 @@ actually be read until a procedure is executed. * FILE HANDLE:: Support for special file formats. * INPUT PROGRAM:: Support for complex input programs. * LIST:: List cases in the active file. -* MATRIX DATA:: Read matrices in text format. * NEW FILE:: Clear the active file and dictionary. * PRINT:: Display values in print formats. * PRINT EJECT:: Eject the current page then print. @@ -138,9 +137,10 @@ Each form of @cmd{DATA LIST} is described in detail below. @display DATA LIST [FIXED] @{TABLE,NOTABLE@} - FILE='file-name' - RECORDS=record_count - END=end_var + [FILE='file-name'] + [RECORDS=record_count] + [END=end_var] + [SKIP=record_count] /[line_no] var_spec@dots{} where each var_spec takes one of the forms @@ -166,6 +166,10 @@ the list of variable specifications later in @cmd{DATA LIST}. The END subcommand is only useful in conjunction with @cmd{INPUT PROGRAM}. @xref{INPUT PROGRAM}, for details. +The optional SKIP subcommand specifies a number of records to skip at +the beginning of an input file. It can be used to skip over a row +that contains variable names, for example. + @cmd{DATA LIST} can optionally output a table describing how the data file will be read. The TABLE subcommand enables this output, and NOTABLE disables it. The default is to output the table. @@ -186,7 +190,7 @@ In columnar style, the starting column and ending column for the field are specified after the variable name, separated by a dash (@samp{-}). For instance, the third through fifth columns on a line would be specified @samp{3-5}. By default, variables are considered to be in -@samp{F} format (@pxref{Input/Output Formats}). (This default can be +@samp{F} format (@pxref{Input and Output Formats}). (This default can be changed; see @ref{SET} for more information.) In columnar style, to use a variable format other than the default, @@ -218,7 +222,7 @@ Implied decimal places also exist in FORTRAN style. A format specification with @var{d} decimal places also has @var{d} implied decimal places. -In addition to the standard format specifiers (@pxref{Input/Output +In addition to the standard format specifiers (@pxref{Input and Output Formats}), FORTRAN style defines some extensions: @table @asis @@ -346,8 +350,9 @@ This example shows keywords abbreviated to their first 3 letters. DATA LIST FREE [(@{TAB,'c'@}, @dots{})] [@{NOTABLE,TABLE@}] - FILE='file-name' - END=end_var + [FILE='file-name'] + [END=end_var] + [SKIP=record_cnt] /var_spec@dots{} where each var_spec takes one of the forms @@ -376,12 +381,12 @@ of quoting is allowed. The NOTABLE and TABLE subcommands are as in @cmd{DATA LIST FIXED} above. NOTABLE is the default. -The FILE and END subcommands are as in @cmd{DATA LIST FIXED} above. +The FILE, END, and SKIP subcommands are as in @cmd{DATA LIST FIXED} above. The variables to be parsed are given as a single list of variable names. This list must be introduced by a single slash (@samp{/}). The set of variable names may contain format specifications in parentheses -(@pxref{Input/Output Formats}). Format specifications apply to all +(@pxref{Input and Output Formats}). Format specifications apply to all variables back to the previous parenthesized format specification. In addition, an asterisk may be used to indicate that all variables @@ -398,8 +403,9 @@ on field width apply, but they are honored on output. DATA LIST LIST [(@{TAB,'c'@}, @dots{})] [@{NOTABLE,TABLE@}] - FILE='file-name' - END=end_var + [FILE='file-name'] + [END=end_var] + [SKIP=record_count] /var_spec@dots{} where each var_spec takes one of the forms @@ -482,10 +488,11 @@ exception). By default, each tab is 4 characters wide, but an alternate width may be specified on TABWIDTH. A tab width of 0 suppresses tab expansion entirely. -In IMAGE mode, the data file is opened in ANSI C binary mode and records -are fixed in length. In IMAGE mode, LRECL specifies the record length in -bytes, with a default of 1024. Tab characters are never expanded to -spaces in binary mode. +In IMAGE mode, the data file is opened in ANSI C binary mode. Record +length is fixed, with output data truncated or padded with spaces to +the record length. LRECL specifies the record length in bytes, with a +default of 1024. Tab characters are never expanded to spaces in +binary mode. Records The NAME subcommand specifies the name of the file associated with the handle. It is required in CHARACTER and IMAGE modes. @@ -687,108 +694,6 @@ cannot fit on a single line, then a multi-line format will be used. @cmd{LIST} is a procedure. It causes the data to be read. -@node MATRIX DATA -@section MATRIX DATA -@vindex MATRIX DATA - -@display -MATRIX DATA - /VARIABLES=var_list - /FILE='file-name' - /FORMAT=@{LIST,FREE@} @{LOWER,UPPER,FULL@} @{DIAGONAL,NODIAGONAL@} - /SPLIT=@{new_var,var_list@} - /FACTORS=var_list - /CELLS=n_cells - /N=n - /CONTENTS=@{N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE, - DFE,MAT,COV,CORR,PROX@} -@end display - -@cmd{MATRIX DATA} command reads square matrices in one of several textual -formats. @cmd{MATRIX DATA} clears the dictionary and replaces it and -reads a -data file. - -Use VARIABLES to specify the variables that form the rows and columns of -the matrices. You may not specify a variable named @code{VARNAME_}. You -should specify VARIABLES first. - -Specify the file to read on FILE, either as a file name string or a file -handle (@pxref{File Handles}). If FILE is not specified then matrix data -must immediately follow @cmd{MATRIX DATA} with a @cmd{BEGIN -DATA}@dots{}@cmd{END DATA} -construct (@pxref{BEGIN DATA}). - -The FORMAT subcommand specifies how the matrices are formatted. LIST, -the default, indicates that there is one line per row of matrix data; -FREE allows single matrix rows to be broken across multiple lines. This -is analogous to the difference between @cmd{DATA LIST FREE} and -@cmd{DATA LIST LIST} -(@pxref{DATA LIST}). LOWER, the default, indicates that the lower -triangle of the matrix is given; UPPER indicates the upper triangle; and -FULL indicates that the entire matrix is given. DIAGONAL, the default, -indicates that the diagonal is part of the data; NODIAGONAL indicates -that it is omitted. DIAGONAL/NODIAGONAL have no effect when FULL is -specified. - -The SPLIT subcommand is used to specify @cmd{SPLIT FILE} variables for the -input matrices (@pxref{SPLIT FILE}). Specify either a single variable -not specified on VARIABLES, or one or more variables that are specified -on VARIABLES. In the former case, the SPLIT values are not present in -the data and ROWTYPE_ may not be specified on VARIABLES. In the latter -case, the SPLIT values are present in the data. - -Specify a list of factor variables on FACTORS. Factor variables must -also be listed on VARIABLES. Factor variables are used when there are -some variables where, for each possible combination of their values, -statistics on the matrix variables are included in the data. - -If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the -CELLS subcommand is required. Specify the number of factor variable -combinations that are given. For instance, if factor variable A has 2 -values and factor variable B has 3 values, specify 6. - -The N subcommand specifies a population number of observations. When N -is specified, one N record is output for each @cmd{SPLIT FILE}. - -Use CONTENTS to specify what sort of information the matrices include. -Each possible option is described in more detail below. When ROWTYPE_ -is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS -is not specified then /CONTENTS=CORR is assumed. - -@table @asis -@item N -@item N_VECTOR -Number of observations as a vector, one value for each variable. -@item N_SCALAR -Number of observations as a single value. -@item N_MATRIX -Matrix of counts. -@item MEAN -Vector of means. -@item STDDEV -Vector of standard deviations. -@item COUNT -Vector of counts. -@item MSE -Vector of mean squared errors. -@item DFE -Vector of degrees of freedom. -@item MAT -Generic matrix. -@item COV -Covariance matrix. -@item CORR -Correlation matrix. -@item PROX -Proximities matrix. -@end table - -The exact semantics of the matrices read by @cmd{MATRIX DATA} are complex. -Right now @cmd{MATRIX DATA} isn't too useful due to a lack of procedures -accepting or producing related data, so these semantics aren't -documented. Later, they'll be described here in detail. - @node NEW FILE @section NEW FILE @vindex NEW FILE @@ -808,7 +713,7 @@ PRINT OUTFILE='file-name' RECORDS=n_lines @{NOTABLE,TABLE@} - /[line_no] arg@dots{} + [/[line_no] arg@dots{}] arg takes one of the following forms: 'string' [start-end] @@ -817,17 +722,20 @@ arg takes one of the following forms: var_list * @end display -The @cmd{PRINT} transformation writes variable data to an output file. -@cmd{PRINT} is executed when a procedure causes the data to be read. -Follow @cmd{PRINT} by @cmd{EXECUTE} to print variable data without -invoking a procedure (@pxref{EXECUTE}). +The @cmd{PRINT} transformation writes variable data to the listing +file or an output file. @cmd{PRINT} is executed when a procedure +causes the data to be read. Follow @cmd{PRINT} by @cmd{EXECUTE} to +print variable data without invoking a procedure (@pxref{EXECUTE}). -All @cmd{PRINT} subcommands are optional. +All @cmd{PRINT} subcommands are optional. If no strings or variables +are specified, PRINT outputs a single blank line. The OUTFILE subcommand specifies the file to receive the output. The file may be a file name as a string or a file handle (@pxref{File -Handles}). If OUTFILE is not present then output will be sent to PSPP's -output listing file. +Handles}). If OUTFILE is not present then output will be sent to +PSPP's output listing file. When OUTFILE is present, a space is +inserted at beginning of each output line, even lines that otherwise +would be blank. The RECORDS subcommand specifies the number of lines to be output. The number of lines may optionally be surrounded by parentheses. @@ -880,8 +788,20 @@ arg takes one of the following forms: var_list * @end display -@cmd{PRINT EJECT} writes data to an output file. Before the data is -written, the current page in the listing file is ejected. +@cmd{PRINT EJECT} advances to the beginning of a new output page in +the listing file or output file. It can also output data in the same +way as @cmd{PRINT}. + +All @cmd{PRINT EJECT} subcommands are optional. + +Without OUTFILE, PRINT EJECT ejects the current page in +the listing file, then it produces other output, if any is specified. + +With OUTFILE, PRINT EJECT writes its output to the specified file. +The first line of output is written with @samp{1} inserted in the +first column. Commonly, this is the only line of output. If +additional lines of output are specified, these additional lines are +written with a space inserted in the first column, as with PRINT. @xref{PRINT}, for more information on syntax and usage. @@ -1034,11 +954,29 @@ arg takes one of the following forms: @code{WRITE} writes text or binary data to an output file. -@xref{PRINT}, for more information on syntax and usage. The main -difference between @code{PRINT} and @code{WRITE} is that @cmd{WRITE} -uses write formats by default, where PRINT uses print formats. +@xref{PRINT}, for more information on syntax and usage. @cmd{PRINT} +and @cmd{WRITE} differ in only a few ways: + +@itemize @bullet +@item +@cmd{WRITE} uses write formats by default, whereas @cmd{PRINT} uses +print formats. + +@item +@cmd{PRINT} inserts a space between variables unless a format is +explicitly specified, but @cmd{WRITE} never inserts space between +variables in output. + +@item +@cmd{PRINT} inserts a space at the beginning of each line that it +writes to an output file (and @cmd{PRINT EJECT} inserts @samp{1} at +the beginning of each line that should begin a new page), but +@cmd{WRITE} does not. -The sole additional difference is that if @cmd{WRITE} is used to send output -to a binary file, carriage control characters will not be output. -@xref{FILE HANDLE}, for information on how to declare a file as binary. +@item +@cmd{PRINT} outputs the system-missing value according to its +specified output format, whereas @cmd{WRITE} outputs the +system-missing value as a field filled with spaces. Binary formats +are an exception. +@end itemize @setfilename ignored