X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-io.texi;h=cfb99eb7927d8b2f7a7940e32c81d9f2a4ef4e4b;hb=f3dc389d453a81ffd3957662e287c96b3a73c4bb;hp=3439ca3a8a0631c8dc0f777ee3e0e38f0029367d;hpb=a1d8ce0ad176357cf250ec065e0bf1a4520a19e3;p=pspp-builds.git

diff --git a/doc/data-io.texi b/doc/data-io.texi
index 3439ca3a..cfb99eb7 100644
--- a/doc/data-io.texi
+++ b/doc/data-io.texi
@@ -14,6 +14,8 @@ their sex, age, etc.@: and their responses are all data and the data
 pertaining to single respondent is a case.
 This chapter examines
 the PSPP commands for defining variables and reading and writing data.
+There are alternative commands to  read data from predefined sources
+such as system files or databases (@xref{GET, GET DATA}.)
 
 @quotation Note
 These commands tell PSPP how to read data, but the data will not
@@ -23,13 +25,15 @@ actually be read until a procedure is executed.
 @menu
 * BEGIN DATA::                  Embed data within a syntax file.
 * CLOSE FILE HANDLE::           Close a file handle.
+* DATAFILE ATTRIBUTE::          Set custom attributes on data files.
+* DATASET::                     Manage multiple datasets.
 * DATA LIST::                   Fundamental data reading command.
 * END CASE::                    Output the current case.
 * END FILE::                    Terminate the current input program.
 * FILE HANDLE::                 Support for special file formats.
 * INPUT PROGRAM::               Support for complex input programs.
-* LIST::                        List cases in the active file.
-* NEW FILE::                    Clear the active file and dictionary.
+* LIST::                        List cases in the active dataset.
+* NEW FILE::                    Clear the active dataset.
 * PRINT::                       Display values in print formats.
 * PRINT EJECT::                 Eject the current page then print.
 * PRINT SPACE::                 Print blank lines.
@@ -75,18 +79,137 @@ given file.  The only specification is the name of the handle to close.
 Afterward
 @cmd{FILE HANDLE}.
 
-If the file handle name refers to a scratch file, then the storage
-associated with the scratch file in memory or on disk will be freed.
-If the scratch file is in use, e.g.@: it has been specified on a
-@cmd{GET} command whose execution has not completed, then freeing is
-delayed until it is no longer in use.
-
 The file named INLINE, which represents data entered between @cmd{BEGIN
 DATA} and @cmd{END DATA}, cannot be closed.  Attempts to close it with
 @cmd{CLOSE FILE HANDLE} have no effect.
 
 @cmd{CLOSE FILE HANDLE} is a PSPP extension.
 
+@node DATAFILE ATTRIBUTE
+@section DATAFILE ATTRIBUTE
+@vindex DATAFILE ATTRIBUTE
+
+@display
+DATAFILE ATTRIBUTE
+         ATTRIBUTE=name('value') [name('value')]@dots{}
+         ATTRIBUTE=name@b{[}index@b{]}('value') [name@b{[}index@b{]}('value')]@dots{}
+         DELETE=name [name]@dots{}
+         DELETE=name@b{[}index@b{]} [name@b{[}index@b{]}]@dots{}
+@end display
+
+@cmd{DATAFILE ATTRIBUTE} adds, modifies, or removes user-defined
+attributes associated with the active dataset.  Custom data file
+attributes are not interpreted by PSPP, but they are saved as part of
+system files and may be used by other software that reads them.
+
+Use the ATTRIBUTE subcommand to add or modify a custom data file
+attribute.  Specify the name of the attribute as an identifier
+(@pxref{Tokens}), followed by the desired value, in parentheses, as a
+quoted string.  Attribute names that begin with @code{$} are reserved
+for PSPP's internal use, and attribute names that begin with @code{@@}
+or @code{$@@} are not displayed by most PSPP commands that display
+other attributes.  Other attribute names are not treated specially.
+
+Attributes may also be organized into arrays.  To assign to an array
+element, add an integer array index enclosed in square brackets
+(@code{[} and @code{]}) between the attribute name and value.  Array
+indexes start at 1, not 0.  An attribute array that has a single
+element (number 1) is not distinguished from a non-array attribute.
+
+Use the DELETE subcommand to delete an attribute.  Specify an
+attribute name by itself to delete an entire attribute, including all
+array elements for attribute arrays.  Specify an attribute name
+followed by an array index in square brackets to delete a single
+element of an attribute array.  In the latter case, all the array
+elements numbered higher than the deleted element are shifted down,
+filling the vacated position.
+
+To associate custom attributes with particular variables, instead of
+with the entire active dataset, use @cmd{VARIABLE ATTRIBUTE}
+(@pxref{VARIABLE ATTRIBUTE}) instead.
+
+@cmd{DATAFILE ATTRIBUTE} takes effect immediately.  It is not affected
+by conditional and looping structures such as @cmd{DO IF} or
+@cmd{LOOP}.
+
+@node DATASET
+@section DATASET commands
+@vindex DATASET
+
+@display
+DATASET NAME name [WINDOW=@{ASIS,FRONT@}].
+DATASET ACTIVATE name [WINDOW=@{ASIS,FRONT@}].
+DATASET COPY name [WINDOW=@{MINIMIZED,HIDDEN,FRONT@}].
+DATASET DECLARE name [WINDOW=@{MINIMIZED,HIDDEN,FRONT@}].
+DATASET CLOSE @{name,*,ALL@}.
+DATASET DISPLAY.
+@end display
+
+The @cmd{DATASET} commands simplify use of multiple datasets within a
+PSPP session.  They allow datasets to be created and destroyed.  At
+any given time, most PSPP commands work with a single dataset, called
+the active dataset.
+
+@vindex DATASET NAME
+The DATASET NAME command gives the active dataset the specified name, or
+if it already had a name, it renames it.  If another dataset already
+had the given name, that dataset is deleted.
+
+@vindex DATASET ACTIVATE
+The DATASET ACTIVATE command selects the named dataset, which must
+already exist, as the active dataset.  Before switching the active
+dataset, any pending transformations are executed, as if @cmd{EXECUTE}
+had been specified.  If the active dataset is unnamed before
+switching, then it is deleted and becomes unavailable after switching.
+
+@vindex DATASET COPY
+The DATASET COPY command creates a new dataset with the specified
+name, whose contents are a copy of the active dataset.  Any pending
+transformations are executed, as if @cmd{EXECUTE} had been specified,
+before making the copy.  If a dataset with the given name already
+exists, it is replaced.  If the name is the name of the active
+dataset, then the active dataset becomes unnamed.
+
+@vindex DATASET DECLARE
+The DATASET DECLARE command creates a new dataset that is initially
+``empty,'' that is, it has no dictionary or data.  If a dataset with
+the given name already exists, this has no effect.  The new dataset
+can be used with commands that support output to a dataset,
+e.g. AGGREGATE (@pxref{AGGREGATE}).
+
+@vindex DATASET CLOSE
+The DATASET CLOSE command deletes a dataset.  If the active dataset is
+specified by name, or if @samp{*} is specified, then the active
+dataset becomes unnamed.  If a different dataset is specified by name,
+then it is deleted and becomes unavailable.  Specifying ALL deletes
+all datasets except for the active dataset, which becomes unnamed.
+
+@vindex DATASET DISPLAY
+The DATASET DISPLAY command lists all the currently defined datasets.
+
+Many DATASET commands accept an optional WINDOW subcommand.  In the
+PSPPIRE GUI, the value given for this subcommand influences how the
+dataset's window is displayed.  Outside the GUI, the WINDOW subcommand
+has no effect.  The valid values are:
+
+@table @asis
+@item ASIS
+Do not change how the window is displayed.  This is the default for
+DATASET NAME and DATASET ACTIVATE.
+
+@item FRONT
+Raise the dataset's window to the top.  Make it the default dataset
+for running syntax.
+
+@item MINIMIZED
+Display the window ``minimized'' to an icon.  Prefer other datasets
+for running syntax.  This is the default for DATASET COPY and DATASET
+DECLARE.
+
+@item HIDDEN
+Hide the dataset's window.  Prefer other datasets for running syntax.
+@end table
+
 @node DATA LIST
 @section DATA LIST
 @vindex DATA LIST
@@ -108,6 +231,10 @@ free format.
 
 Each form of @cmd{DATA LIST} is described in detail below.
 
+@xref{GET DATA}, for a command that offers a few enhancements over
+DATA LIST and that may be substituted for DATA LIST in many
+situations.
+
 @menu
 * DATA LIST FIXED::             Fixed columnar locations for data.
 * DATA LIST FREE::              Any spacing you like.
@@ -125,7 +252,7 @@ Each form of @cmd{DATA LIST} is described in detail below.
 @display
 DATA LIST [FIXED]
         @{TABLE,NOTABLE@}
-        [FILE='file-name']
+        [FILE='file-name' [ENCODING='encoding']]
         [RECORDS=record_count]
         [END=end_var]
         [SKIP=record_count]
@@ -145,6 +272,8 @@ external file.  It may be used to specify a file name as a string or a
 file handle (@pxref{File Handles}).  If the FILE subcommand is not used,
 then input is assumed to be specified within the command file using
 @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}).
+The ENCODING subcommand may only be used if the FILE subcommand is also used.
+It specifies the character encoding of the file.
 
 The optional RECORDS subcommand, which takes a single integer as an
 argument, is used to specify the number of lines per record.  If RECORDS
@@ -263,7 +392,7 @@ Defines the following variables:
 
 @itemize @bullet
 @item
-@code{NAME}, a 10-character-wide long string variable, in columns 1
+@code{NAME}, a 10-character-wide string variable, in columns 1
 through 10.
 
 @item
@@ -304,15 +433,15 @@ Defines the following variables:
 @code{ID}, a numeric variable, in columns 1-5 of the first record.
 
 @item
-@code{NAME}, a 30-character long string variable, in columns 7-36 of the
+@code{NAME}, a 30-character string variable, in columns 7-36 of the
 first record.
 
 @item
-@code{SURNAME}, a 30-character long string variable, in columns 38-67 of
+@code{SURNAME}, a 30-character string variable, in columns 38-67 of
 the first record.
 
 @item
-@code{MINITIAL}, a 1-character short string variable, in column 69 of
+@code{MINITIAL}, a 1-character string variable, in column 69 of
 the first record.
 
 @item
@@ -338,7 +467,7 @@ This example shows keywords abbreviated to their first 3 letters.
 DATA LIST FREE
         [(@{TAB,'c'@}, @dots{})]
         [@{NOTABLE,TABLE@}]
-        [FILE='file-name']
+        [FILE='file-name' [ENCODING='encoding']]
         [SKIP=record_cnt]
         /var_spec@dots{}
 
@@ -390,7 +519,7 @@ on field width apply, but they are honored on output.
 DATA LIST LIST
         [(@{TAB,'c'@}, @dots{})]
         [@{NOTABLE,TABLE@}]
-        [FILE='file-name']
+        [FILE='file-name' [ENCODING='encoding']]
         [SKIP=record_count]
         /var_spec@dots{}
 
@@ -438,15 +567,24 @@ For text files:
                 [/MODE=CHARACTER]
                 /TABWIDTH=tab_width
 
-For binary files with fixed-length records:
+For binary files in native encoding with fixed-length records:
         FILE HANDLE handle_name
                 /NAME='file-name'
                 /MODE=IMAGE
                 [/LRECL=rec_len]
 
-To explicitly declare a scratch handle:
+For binary files in native encoding with variable-length records:
+        FILE HANDLE handle_name
+                /NAME='file-name'
+                /MODE=BINARY
+                [/LRECL=rec_len]
+
+For binary files encoded in EBCDIC:
         FILE HANDLE handle_name
-                /MODE=SCRATCH
+                /NAME='file-name'
+                /MODE=360
+                /RECFORM=@{FIXED,VARIABLE,SPANNED@}
+                [/LRECL=rec_len]
 @end display
 
 Use @cmd{FILE HANDLE} to associate a file handle name with a file and
@@ -465,28 +603,122 @@ file handle name must not already have been used in a previous
 invocation of @cmd{FILE HANDLE}, unless it has been closed by an
 intervening command (@pxref{CLOSE FILE HANDLE}).
 
-MODE specifies a file mode.  In CHARACTER mode, the default, the data
-file is read as a text file, according to the local system's 
-conventions, and each text line is read as one record.
-In CHARACTER mode, most input programs will expand tabs to spaces
-(@cmd{DATA LIST FREE} with explicitly specified delimiters is an
-exception).  By default, each tab is 4 characters wide, but an
-alternate width may be specified on TABWIDTH.  A tab width of 0
-suppresses tab expansion entirely.
-
-In IMAGE mode, the data file is opened in ANSI C binary mode.  Record
-length is fixed, with output data truncated or padded with spaces to
-the record length.  LRECL specifies the record length in bytes, with a
-default of 1024.  Tab characters are never expanded to spaces in
-binary mode.  Records
+The effect and syntax of FILE HANDLE depends on the selected MODE:
 
-The NAME subcommand specifies the name of the file associated with the
-handle.  It is required in CHARACTER and IMAGE modes.
+@itemize
+@item
+In CHARACTER mode, the default, the data file is read as a text file,
+according to the local system's conventions, and each text line is
+read as one record.
+
+In CHARACTER mode only, tabs are expanded to spaces by input programs,
+except by @cmd{DATA LIST FREE} with explicitly specified delimiters.
+Each tab is 4 characters wide by default, but TABWIDTH (a PSPP
+extension) may be used to specify an alternate width.  Use a TABWIDTH
+of 0 to suppress tab expansion.
+
+@item
+In IMAGE mode, the data file is treated as a series of fixed-length
+binary records.  LRECL should be used to specify the record length in
+bytes, with a default of 1024.  On input, it is an error if an IMAGE
+file's length is not a integer multiple of the record length.  On
+output, each record is padded with spaces or truncated, if necessary,
+to make it exactly the correct length.
+
+@item
+In BINARY mode, the data file is treated as a series of
+variable-length binary records.  LRECL may be specified, but its value
+is ignored.  The data for each record is both preceded and followed by
+a 32-bit signed integer in little-endian byte order that specifies the
+length of the record.  (This redundancy permits records in these
+files to be efficiently read in reverse order, although PSPP always
+reads them in forward order.)  The length does not include either
+integer.
 
-The SCRATCH mode designates the file handle as a scratch file handle.
-Its use is usually unnecessary because file handle names that begin with
-@samp{#} are assumed to refer to scratch files.  @pxref{File Handles},
-for more information.
+@item
+Mode 360 reads and writes files in formats first used for tapes in the
+1960s on IBM mainframe operating systems and still supported today by
+the modern successors of those operating systems.  For more
+information, see @cite{OS/400 Tape and Diskette Device Programming},
+available on IBM's website.
+
+Alphanumeric data in mode 360 files are encoded in EBCDIC.  PSPP
+translates EBCDIC to or from the host's native format as necessary on
+input or output, using an ASCII/EBCDIC translation that is one-to-one,
+so that a ``round trip'' from ASCII to EBCDIC back to ASCII, or vice
+versa, always yields exactly the original data.
+
+The RECFORM subcommand is required in mode 360.  The precise file
+format depends on its setting:
+
+@table @asis
+@item F
+@itemx FIXED
+This record format is equivalent to IMAGE mode, except for EBCDIC
+translation.
+
+IBM documentation calls this @code{*F} (fixed-length, deblocked)
+format.
+
+@item V
+@itemx VARIABLE
+The file comprises a sequence of zero or more variable-length blocks.
+Each block begins with a 4-byte @dfn{block descriptor word} (BDW).
+The first two bytes of the BDW are an unsigned integer in big-endian
+byte order that specifies the length of the block, including the BDW
+itself.  The other two bytes of the BDW are ignored on input and
+written as zeros on output.
+
+Following the BDW, the remainder of each block is a sequence of one or
+more variable-length records, each of which in turn begins with a
+4-byte @dfn{record descriptor word} (RDW) that has the same format as
+the BDW.  Following the RDW, the remainder of each record is the
+record data.
+
+The maximum length of a record in VARIABLE mode is 65,527 bytes:
+65,535 bytes (the maximum value of a 16-bit unsigned integer), minus 4
+bytes for the BDW, minus 4 bytes for the RDW.
+
+In mode VARIABLE, LRECL specifies a maximum, not a fixed, record
+length, in bytes.  The default is 8,192.
+
+IBM documentation calls this @code{*VB} (variable-length, blocked,
+unspanned) format.
+
+@item VS
+@itemx SPANNED
+The file format is like that of VARIABLE mode, except that logical
+records may be split among multiple physical records (called
+@dfn{segments}) or blocks.  In SPANNED mode, the third byte of each
+RDW is called the segment control character (SCC).  Odd SCC values
+cause the segment to be appended to a record buffer maintained in
+memory; even values also append the segment and then flush its
+contents to the input procedure.  Canonically, SCC value 0 designates
+a record not spanned among multiple segments, and values 1 through 3
+designate the first segment, the last segment, or an intermediate
+segment, respectively, within a multi-segment record.  The record
+buffer is also flushed at end of file regardless of the final record's
+SCC.
+
+The maximum length of a logical record in VARIABLE mode is limited
+only by memory available to PSPP.  Segments are limited to 65,527
+bytes, as in VARIABLE mode.
+
+This format is similar to what IBM documentation call @code{*VS}
+(variable-length, deblocked, spanned) format.
+@end table
+
+In mode 360, fields of type A that extend beyond the end of a record
+read from disk are padded with spaces in the host's native character
+set, which are then translated from EBCDIC to the native character
+set.  Thus, when the host's native character set is based on ASCII,
+these fields are effectively padded with character @code{X'80'}.  This
+wart is implemented for compatibility.
+@end itemize
+
+The NAME subcommand specifies the name of the file associated with the
+handle.  It is required in all modes but SCRATCH mode, in which its
+use is forbidden.
 
 @node INPUT PROGRAM
 @section INPUT PROGRAM
@@ -602,7 +834,7 @@ LIST.
 @end example
 
 The above example reads data from file @file{a.data}, then from
-@file{b.data}, and concatenates them into a single active file.
+@file{b.data}, and concatenates them into a single active dataset.
 
 @c If you change this example, change the regression test4 in
 @c tests/command/input-program.sh to match.
@@ -646,7 +878,7 @@ END INPUT PROGRAM.
 LIST/FORMAT=NUMBERED.
 @end example
 
-The above example causes an active file to be created consisting of 50
+The above example causes an active dataset to be created consisting of 50
 random variates between 0 and 10.
 
 @node LIST
@@ -657,8 +889,7 @@ random variates between 0 and 10.
 LIST
         /VARIABLES=var_list
         /CASES=FROM start_index TO end_index BY incr_index
-        /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@} 
-                @{NOWEIGHT,WEIGHT@}
+        /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@}
 @end display
 
 The @cmd{LIST} procedure prints the values of specified variables to the
@@ -666,7 +897,7 @@ listing file.
 
 The VARIABLES subcommand specifies the variables whose values are to be
 printed.  Keyword VARIABLES is optional.  If VARIABLES subcommand is not
-specified then all variables in the active file are printed.
+specified then all variables in the active dataset are printed.
 
 The CASES subcommand can be used to specify a subset of cases to be
 printed.  Specify FROM and the case number of the first case to print,
@@ -677,9 +908,7 @@ settings.  If CASES is not specified then all cases are printed.
 The FORMAT subcommand can be used to change the output format.  NUMBERED
 will print case numbers along with each case; UNNUMBERED, the default,
 causes the case numbers to be omitted.  The WRAP and SINGLE settings are
-currently not used.  WEIGHT will cause case weights to be printed along
-with variable values; NOWEIGHT, the default, causes case weights to be
-omitted from the output.
+currently not used.
 
 Case numbers start from 1.  They are counted after all transformations
 have been considered.
@@ -698,7 +927,8 @@ cannot fit on a single line, then a multi-line format will be used.
 NEW FILE.
 @end display
 
-@cmd{NEW FILE} command clears the current active file.
+@cmd{NEW FILE} command clears the dictionary and data from the current
+active dataset.
 
 @node PRINT
 @section PRINT
@@ -975,4 +1205,3 @@ specified output format, whereas @cmd{WRITE} outputs the
 system-missing value as a field filled with spaces.  Binary formats
 are an exception.
 @end itemize
-@setfilename ignored