X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-io.texi;h=8f727e50540f0e305c57195881a1cb886b359bce;hb=871f4456a207925fdce3df3150af3f3b263b2776;hp=d7e18ae8dec3109b72a3110099cfd371de0ac047;hpb=cbdfa35f7fb46948d1ee8aee7b7438cf1a5fd44c;p=pspp-builds.git

diff --git a/doc/data-io.texi b/doc/data-io.texi
index d7e18ae8..8f727e50 100644
--- a/doc/data-io.texi
+++ b/doc/data-io.texi
@@ -30,7 +30,6 @@ actually be read until a procedure is executed.
 * FILE HANDLE:: Support for special file formats.
 * INPUT PROGRAM:: Support for complex input programs.
 * LIST:: List cases in the active file.
-* MATRIX DATA:: Read matrices in text format.
 * NEW FILE:: Clear the active file and dictionary.
 * PRINT:: Display values in print formats.
 * PRINT EJECT:: Eject the current page then print.
@@ -138,9 +137,10 @@ Each form of @cmd{DATA LIST} is described in detail below.
 @display
 DATA LIST [FIXED]
 @{TABLE,NOTABLE@}
- FILE='file-name'
- RECORDS=record_count
- END=end_var
+ [FILE='file-name']
+ [RECORDS=record_count]
+ [END=end_var]
+ [SKIP=record_count]
 /[line_no] var_spec@dots{}

 where each var_spec takes one of the forms
@@ -166,6 +166,10 @@ the list of variable specifications later in @cmd{DATA LIST}.

 The END subcommand is only useful in conjunction with @cmd{INPUT
 PROGRAM}. @xref{INPUT PROGRAM}, for details.
+The optional SKIP subcommand specifies a number of records to skip at
+the beginning of an input file. It can be used to skip over a row
+that contains variable names, for example.
+
 @cmd{DATA LIST} can optionally output a table describing how the data
 file will be read. The TABLE subcommand enables this output, and
 NOTABLE disables it. The default is to output the table.
@@ -186,7 +190,7 @@ In columnar style, the starting column and ending column for the field
 are specified after the variable name, separated by a dash (@samp{-}).
 For instance, the third through fifth columns on a line would be
 specified @samp{3-5}.
By default, variables are considered to be in
-@samp{F} format (@pxref{Input/Output Formats}). (This default can be
+@samp{F} format (@pxref{Input and Output Formats}). (This default can be
 changed; see @ref{SET} for more information.)

 In columnar style, to use a variable format other than the default,
@@ -218,7 +222,7 @@ Implied decimal places also exist in FORTRAN style. A format
 specification with @var{d} decimal places also has @var{d} implied
 decimal places.

-In addition to the standard format specifiers (@pxref{Input/Output
+In addition to the standard format specifiers (@pxref{Input and Output
 Formats}), FORTRAN style defines some extensions:

 @table @asis
@@ -346,8 +350,9 @@ This example shows keywords abbreviated to their first 3 letters.
 DATA LIST FREE
 [(@{TAB,'c'@}, @dots{})]
 [@{NOTABLE,TABLE@}]
- FILE='file-name'
- END=end_var
+ [FILE='file-name']
+ [END=end_var]
+ [SKIP=record_cnt]
 /var_spec@dots{}

 where each var_spec takes one of the forms
@@ -376,12 +381,12 @@ of quoting is allowed.
 The NOTABLE and TABLE subcommands are as in @cmd{DATA LIST FIXED} above.
 NOTABLE is the default.

-The FILE and END subcommands are as in @cmd{DATA LIST FIXED} above.
+The FILE, END, and SKIP subcommands are as in @cmd{DATA LIST FIXED} above.

 The variables to be parsed are given as a single list of variable names.
 This list must be introduced by a single slash (@samp{/}). The set of
 variable names may contain format specifications in parentheses
-(@pxref{Input/Output Formats}). Format specifications apply to all
+(@pxref{Input and Output Formats}). Format specifications apply to all
 variables back to the previous parenthesized format specification.

 In addition, an asterisk may be used to indicate that all variables
@@ -398,8 +403,9 @@ on field width apply, but they are honored on output.
 DATA LIST LIST
 [(@{TAB,'c'@}, @dots{})]
 [@{NOTABLE,TABLE@}]
- FILE='file-name'
- END=end_var
+ [FILE='file-name']
+ [END=end_var]
+ [SKIP=record_count]
 /var_spec@dots{}

 where each var_spec takes one of the forms
@@ -688,108 +694,6 @@ cannot fit on a single line, then a multi-line format will be used.

 @cmd{LIST} is a procedure. It causes the data to be read.

-@node MATRIX DATA
-@section MATRIX DATA
-@vindex MATRIX DATA
-
-@display
-MATRIX DATA
- /VARIABLES=var_list
- /FILE='file-name'
- /FORMAT=@{LIST,FREE@} @{LOWER,UPPER,FULL@} @{DIAGONAL,NODIAGONAL@}
- /SPLIT=@{new_var,var_list@}
- /FACTORS=var_list
- /CELLS=n_cells
- /N=n
- /CONTENTS=@{N_VECTOR,N_SCALAR,N_MATRIX,MEAN,STDDEV,COUNT,MSE,
- DFE,MAT,COV,CORR,PROX@}
-@end display
-
-@cmd{MATRIX DATA} command reads square matrices in one of several textual
-formats. @cmd{MATRIX DATA} clears the dictionary and replaces it and
-reads a
-data file.
-
-Use VARIABLES to specify the variables that form the rows and columns of
-the matrices. You may not specify a variable named @code{VARNAME_}. You
-should specify VARIABLES first.
-
-Specify the file to read on FILE, either as a file name string or a file
-handle (@pxref{File Handles}). If FILE is not specified then matrix data
-must immediately follow @cmd{MATRIX DATA} with a @cmd{BEGIN
-DATA}@dots{}@cmd{END DATA}
-construct (@pxref{BEGIN DATA}).
-
-The FORMAT subcommand specifies how the matrices are formatted. LIST,
-the default, indicates that there is one line per row of matrix data;
-FREE allows single matrix rows to be broken across multiple lines. This
-is analogous to the difference between @cmd{DATA LIST FREE} and
-@cmd{DATA LIST LIST}
-(@pxref{DATA LIST}). LOWER, the default, indicates that the lower
-triangle of the matrix is given; UPPER indicates the upper triangle; and
-FULL indicates that the entire matrix is given. DIAGONAL, the default,
-indicates that the diagonal is part of the data; NODIAGONAL indicates
-that it is omitted.
DIAGONAL/NODIAGONAL have no effect when FULL is
-specified.
-
-The SPLIT subcommand is used to specify @cmd{SPLIT FILE} variables for the
-input matrices (@pxref{SPLIT FILE}). Specify either a single variable
-not specified on VARIABLES, or one or more variables that are specified
-on VARIABLES. In the former case, the SPLIT values are not present in
-the data and ROWTYPE_ may not be specified on VARIABLES. In the latter
-case, the SPLIT values are present in the data.
-
-Specify a list of factor variables on FACTORS. Factor variables must
-also be listed on VARIABLES. Factor variables are used when there are
-some variables where, for each possible combination of their values,
-statistics on the matrix variables are included in the data.
-
-If FACTORS is specified and ROWTYPE_ is not specified on VARIABLES, the
-CELLS subcommand is required. Specify the number of factor variable
-combinations that are given. For instance, if factor variable A has 2
-values and factor variable B has 3 values, specify 6.
-
-The N subcommand specifies a population number of observations. When N
-is specified, one N record is output for each @cmd{SPLIT FILE}.
-
-Use CONTENTS to specify what sort of information the matrices include.
-Each possible option is described in more detail below. When ROWTYPE_
-is specified on VARIABLES, CONTENTS is optional; otherwise, if CONTENTS
-is not specified then /CONTENTS=CORR is assumed.
-
-@table @asis
-@item N
-@item N_VECTOR
-Number of observations as a vector, one value for each variable.
-@item N_SCALAR
-Number of observations as a single value.
-@item N_MATRIX
-Matrix of counts.
-@item MEAN
-Vector of means.
-@item STDDEV
-Vector of standard deviations.
-@item COUNT
-Vector of counts.
-@item MSE
-Vector of mean squared errors.
-@item DFE
-Vector of degrees of freedom.
-@item MAT
-Generic matrix.
-@item COV
-Covariance matrix.
-@item CORR
-Correlation matrix.
-@item PROX
-Proximities matrix.
-@end table
-
-The exact semantics of the matrices read by @cmd{MATRIX DATA} are complex.
-Right now @cmd{MATRIX DATA} isn't too useful due to a lack of procedures
-accepting or producing related data, so these semantics aren't
-documented. Later, they'll be described here in detail.
-
 @node NEW FILE
 @section NEW FILE
 @vindex NEW FILE