X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-io.texi;h=1316f603a6aa9d8c0ee832d460817ba645531570;hb=refs%2Fheads%2Fpivot-table2;hp=4862ccc5964e4d2e9783e5900b1f28901b542caf;hpb=e8b26fb0d765310d4c7400c39465008f1bb8601d;p=pspp

diff --git a/doc/data-io.texi b/doc/data-io.texi
index 4862ccc596..1316f603a6 100644
--- a/doc/data-io.texi
+++ b/doc/data-io.texi
@@ -277,8 +277,9 @@ external file. It may be used to specify a file name as a string or a
 file handle (@pxref{File Handles}). If the @subcmd{FILE} subcommand is not used,
 then input is assumed to be specified within the command file using
 @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}).
-The @subcmd{ENCODING} subcommand may only be used if the @subcmd{FILE} subcommand is also used.
-It specifies the character encoding of the file.
+The @subcmd{ENCODING} subcommand may only be used if the @subcmd{FILE}
+subcommand is also used. It specifies the character encoding of the
+file. @xref{INSERT}, for information on supported encodings.
 
 The optional @subcmd{RECORDS} subcommand, which takes a single integer as an
 argument, is used to specify the number of lines per record.
@@ -503,7 +504,8 @@ of quoting is allowed.
 The @subcmd{NOTABLE} and @subcmd{TABLE} subcommands are as in @cmd{DATA LIST FIXED} above.
 @subcmd{NOTABLE} is the default.
 
-The @subcmd{FILE} and @subcmd{SKIP} subcommands are as in @cmd{DATA LIST FIXED} above.
+The @subcmd{FILE}, @subcmd{SKIP}, and @subcmd{ENCODING} subcommands
+are as in @cmd{DATA LIST FIXED} above.
 
 The variables to be parsed are given as a single list of variable names.
 This list must be introduced by a single slash (@samp{/}). The set of
@@ -525,7 +527,7 @@ on field width apply, but they are honored on output.
 DATA LIST LIST
         [(@{TAB,'@var{c}'@}, @dots{})]
         [@{NOTABLE,TABLE@}]
-        [FILE='@var{file_name'} [ENCODING='@var{encoding}']]
+        [FILE='@var{file_name}' [ENCODING='@var{encoding}']]
         [SKIP=@var{record_count}]
         /@var{var_spec}@dots{}
 
@@ -571,19 +573,23 @@ For text files:
         FILE HANDLE @var{handle_name}
                 /NAME='@var{file_name}
                 [/MODE=CHARACTER]
+                [/ENDS=@{CR,CRLF@}]
                 /TABWIDTH=@var{tab_width}
+                [ENCODING='@var{encoding}']
 
 For binary files in native encoding with fixed-length records:
         FILE HANDLE @var{handle_name}
                 /NAME='@var{file_name}'
                 /MODE=IMAGE
                 [/LRECL=@var{rec_len}]
+                [ENCODING='@var{encoding}']
 
 For binary files in native encoding with variable-length records:
         FILE HANDLE @var{handle_name}
                 /NAME='@var{file_name}'
                 /MODE=BINARY
                 [/LRECL=@var{rec_len}]
+                [ENCODING='@var{encoding}']
 
 For binary files encoded in EBCDIC:
         FILE HANDLE @var{handle_name}
@@ -591,6 +597,7 @@ For binary files encoded in EBCDIC:
                 /MODE=360
                 /RECFORM=@{FIXED,VARIABLE,SPANNED@}
                 [/LRECL=@var{rec_len}]
+                [ENCODING='@var{encoding}']
 @end display
 
 Use @cmd{FILE HANDLE} to associate a file handle name with a file and
@@ -613,9 +620,8 @@ The effect and syntax of @cmd{FILE HANDLE} depends on the selected MODE:
 
 @itemize
 @item
-In CHARACTER mode, the default, the data file is read as a text file,
-according to the local system's conventions, and each text line is
-read as one record.
+In CHARACTER mode, the default, the data file is read as a text file.
+Each text line is read as one record.
 
 In CHARACTER mode only, tabs are expanded to spaces by input programs,
 except by @cmd{DATA LIST FREE} with explicitly specified delimiters.
@@ -623,6 +629,12 @@ Each tab is 4 characters wide by default, but TABWIDTH (a @pspp{}
 extension) may be used to specify an alternate width. Use a TABWIDTH of
 0 to suppress tab expansion.
 
+A file written in CHARACTER mode by default uses the line ends of the
+system on which PSPP is running, that is, on Windows, the default is
+CR LF line ends, and on other systems the default is LF only. Specify
+ENDS as CR or CRLF to override the default. PSPP reads files using
+either convention on any kind of system, regardless of ENDS.
+
 @item
 In IMAGE mode, the data file is treated as a series of fixed-length
 binary records. LRECL should be used to specify the record length in
@@ -726,6 +738,14 @@ The @subcmd{NAME} subcommand specifies the name of the file associated with the
 handle. It is required in all modes but SCRATCH mode, in which its use
 is forbidden.
 
+The ENCODING subcommand specifies the encoding of text in the file.
+For reading text files in CHARACTER mode, all of the forms described
+for ENCODING on the INSERT command are supported (@pxref{INSERT}).
+For reading in other file-based modes, encoding autodetection is not
+supported; if the specified encoding requests autodetection then the
+default encoding will be used. This is also true when a file handle
+is used for writing a file in any mode.
+
 @node INPUT PROGRAM
 @section INPUT PROGRAM
 @vindex INPUT PROGRAM
@@ -775,6 +795,9 @@ so an infinite loop results.
 @cmd{END FILE}, when executed, stops the flow of input data and passes
 out of the @cmd{INPUT PROGRAM} structure.
 
+@cmd{INPUT PROGRAM} must contain at least one @cmd{DATA LIST} or
+@cmd{END FILE} command.
+
 All this is very confusing. A few examples should help to clarify.
 
 @c If you change this example, change the regression test1 in
@@ -942,13 +965,14 @@ active dataset.
 
 @display
 PRINT
-        OUTFILE='@var{file_name}'
-        RECORDS=@var{n_lines}
-        @{NOTABLE,TABLE@}
+        [OUTFILE='@var{file_name}']
+        [RECORDS=@var{n_lines}]
+        [@{NOTABLE,TABLE@}]
+        [ENCODING='@var{encoding}']
         [/[@var{line_no}] @var{arg}@dots{}]
 
 @var{arg} takes one of the following forms:
-        '@var{string}' [@var{start}-@var{end}]
+        '@var{string}' [@var{start}]
         @var{var_list} @var{start}-@var{end} [@var{type_spec}]
         @var{var_list} (@var{fortran_spec})
         @var{var_list} *
@@ -969,6 +993,11 @@ Handles}). If @subcmd{OUTFILE} is not present then output will be sent to
 inserted at beginning of each output line, even lines that otherwise
 would be blank.
 
+The @subcmd{ENCODING} subcommand may only be used if the
+@subcmd{OUTFILE} subcommand is also used. It specifies the character
+encoding of the file. @xref{INSERT}, for information on supported
+encodings.
+
 The @subcmd{RECORDS} subcommand specifies the number of lines to be output. The
 number of lines may optionally be surrounded by parentheses.
 
@@ -983,12 +1012,10 @@ line number, the next line number will be specified. Multiple lines may
 be specified using multiple slashes with the intended output for
 a line following its respective slash.
 
-
-Literal strings may be printed. Specify the string itself. Optionally
-the string may be followed by a column number or range of column
-numbers, specifying the location on the line for the string to be
-printed. Otherwise, the string will be printed at the current position
-on the line.
+Literal strings may be printed. Specify the string itself.
+Optionally the string may be followed by a column number, specifying
+the column on the line where the string should start. Otherwise, the
+string will be printed at the current position on the line.
 
 Variables to be printed can be specified in the same ways as available
 for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}). In addition, a
@@ -1043,7 +1070,7 @@ written with a space inserted in the first column, as with @subcmd{PRINT}.
 @vindex PRINT SPACE
 
 @display
-PRINT SPACE OUTFILE='file_name' n_lines.
+PRINT SPACE [OUTFILE='file_name'] [ENCODING='@var{encoding}'] [n_lines].
 @end display
 
 @cmd{PRINT SPACE} prints one or more blank lines to an output file.
@@ -1053,6 +1080,10 @@ a file specified by file name as a string or file handle (@pxref{File
 Handles}). If OUTFILE is not specified then output will be directed to
 the listing file.
 
+The @subcmd{ENCODING} subcommand may only be used if @subcmd{OUTFILE}
+is also used. It specifies the character encoding of the file.
+@xref{INSERT}, for information on supported encodings.
+
 n_lines is also optional. If present, it is an expression
 (@pxref{Expressions}) specifying the number of blank lines to be
 printed. The expression must evaluate to a nonnegative value.
@@ -1062,7 +1093,7 @@ printed. The expression must evaluate to a nonnegative value.
 @vindex REREAD
 
 @display
-REREAD FILE=handle COLUMN=column.
+REREAD [FILE=handle] [COLUMN=column] [ENCODING='@var{encoding}'].
 @end display
 
 The @cmd{REREAD} transformation allows the previous input line in a
@@ -1082,6 +1113,10 @@ re-reading. Specify an expression (@pxref{Expressions}) evaluating to
 the first column that should be included in the re-read line. Columns
 are numbered from 1 at the left margin.
 
+The @subcmd{ENCODING} subcommand may only be used if the @subcmd{FILE}
+subcommand is also used. It specifies the character encoding of the
+file. @xref{INSERT}, for information on supported encodings.
+
 Issuing @code{REREAD} multiple times will not back up in the data file.
 Instead, it will re-read the same line multiple times.
 
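
Note on the CHARACTER-mode line-end text added in the FILE HANDLE hunk: the documented behavior (write with the platform default unless ENDS overrides it, read either CR LF or LF regardless) can be illustrated with a small sketch. This is not PSPP's implementation. It is written in Python purely for illustration, and the names `split_records` and `join_records` are hypothetical.

```python
import os
from typing import List, Optional


def split_records(data: bytes) -> List[bytes]:
    """Split text-file bytes into records, accepting LF or CR LF line ends."""
    records = data.split(b"\n")
    # A final line end leaves one empty trailing piece; drop it.
    if records and records[-1] == b"":
        records.pop()
    # Remove the single CR that remains when a line ended in CR LF.
    return [r[:-1] if r.endswith(b"\r") else r for r in records]


def join_records(records: List[bytes], ends: Optional[bytes] = None) -> bytes:
    """Write records with an explicit line end, else the platform default."""
    if ends is None:
        # Assumption: the platform default stands in for "no ENDS given".
        ends = os.linesep.encode("ascii")
    return b"".join(r + ends for r in records)
```

In this sketch the reader never consults the writer's setting, which mirrors the statement that files using either convention are read on any kind of system.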