Change terminology from "active file" to "active dataset".

diff --git a/doc/data-io.texi b/doc/data-io.texi
index 8ca251085df691f1cd7bef535e0c6f615f4f28b6..81bdd4770e44ea6f8ed8cec0afc7f1e6025b5ee2 100644
--- a/doc/data-io.texi
+++ b/doc/data-io.texi
@@ -14,6 +14,8 @@ their sex, age, etc.@: and their responses are all data and the data
  pertaining to single respondent is a case.
  This chapter examines
  the PSPP commands for defining variables and reading and writing data.
+There are alternative commands to read data from predefined sources
+such as system files or databases (@xref{GET, GET DATA}.)
  
  @quotation Note
  These commands tell PSPP how to read data, but the data will not
@@ -23,13 +25,14 @@ actually be read until a procedure is executed.
  @menu
  * BEGIN DATA::                  Embed data within a syntax file.
  * CLOSE FILE HANDLE::           Close a file handle.
+* DATAFILE ATTRIBUTE::          Set custom attributes on data files.
  * DATA LIST::                   Fundamental data reading command.
  * END CASE::                    Output the current case.
  * END FILE::                    Terminate the current input program.
  * FILE HANDLE::                 Support for special file formats.
  * INPUT PROGRAM::               Support for complex input programs.
-* LIST::                        List cases in the active file.
-* NEW FILE::                    Clear the active file and dictionary.
+* LIST::                        List cases in the active dataset.
+* NEW FILE::                    Clear the active dataset.
  * PRINT::                       Display values in print formats.
  * PRINT EJECT::                 Eject the current page then print.
  * PRINT SPACE::                 Print blank lines.
@@ -87,6 +90,53 @@ DATA} and @cmd{END DATA}, cannot be closed.  Attempts to close it with
  
  @cmd{CLOSE FILE HANDLE} is a PSPP extension.
  
+@node DATAFILE ATTRIBUTE
+@section DATAFILE ATTRIBUTE
+@vindex DATAFILE ATTRIBUTE
+
+@display
+DATAFILE ATTRIBUTE
+         ATTRIBUTE=name('value') [name('value')]@dots{}
+         ATTRIBUTE=name@b{[}index@b{]}('value') [name@b{[}index@b{]}('value')]@dots{}
+         DELETE=name [name]@dots{}
+         DELETE=name@b{[}index@b{]} [name@b{[}index@b{]}]@dots{}
+@end display
+
+@cmd{DATAFILE ATTRIBUTE} adds, modifies, or removes user-defined
+attributes associated with the active dataset.  Custom data file
+attributes are not interpreted by PSPP, but they are saved as part of
+system files and may be used by other software that reads them.
+
+Use the ATTRIBUTE subcommand to add or modify a custom data file
+attribute.  Specify the name of the attribute as an identifier
+(@pxref{Tokens}), followed by the desired value, in parentheses, as a
+quoted string.  Attribute names that begin with @code{$} are reserved
+for PSPP's internal use, and attribute names that begin with @code{@@}
+or @code{$@@} are not displayed by most PSPP commands that display
+other attributes.  Other attribute names are not treated specially.
+
+Attributes may also be organized into arrays.  To assign to an array
+element, add an integer array index enclosed in square brackets
+(@code{[} and @code{]}) between the attribute name and value.  Array
+indexes start at 1, not 0.  An attribute array that has a single
+element (number 1) is not distinguished from a non-array attribute.
+
+Use the DELETE subcommand to delete an attribute.  Specify an
+attribute name by itself to delete an entire attribute, including all
+array elements for attribute arrays.  Specify an attribute name
+followed by an array index in square brackets to delete a single
+element of an attribute array.  In the latter case, all the array
+elements numbered higher than the deleted element are shifted down,
+filling the vacated position.
+
+To associate custom attributes with particular variables, instead of
+with the entire active dataset, use @cmd{VARIABLE ATTRIBUTE}
+(@pxref{VARIABLE ATTRIBUTE}).
+
+@cmd{DATAFILE ATTRIBUTE} takes effect immediately.  It is not affected
+by conditional and looping structures such as @cmd{DO IF} or
+@cmd{LOOP}.
+
  @node DATA LIST
  @section DATA LIST
  @vindex DATA LIST
@@ -129,7 +179,7 @@ situations.
  @display
  DATA LIST [FIXED]
          @{TABLE,NOTABLE@}
-        [FILE='file-name']
+        [FILE='file-name' [ENCODING='encoding']]
          [RECORDS=record_count]
          [END=end_var]
          [SKIP=record_count]
@@ -149,6 +199,8 @@ external file.  It may be used to specify a file name as a string or a
  file handle (@pxref{File Handles}).  If the FILE subcommand is not used,
  then input is assumed to be specified within the command file using
  @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}).
+The ENCODING subcommand may only be used if the FILE subcommand is also used.
+It specifies the character encoding of the file.
  
  The optional RECORDS subcommand, which takes a single integer as an
  argument, is used to specify the number of lines per record.  If RECORDS
@@ -267,7 +319,7 @@ Defines the following variables:
  
  @itemize @bullet
  @item
-@code{NAME}, a 10-character-wide long string variable, in columns 1
+@code{NAME}, a 10-character-wide string variable, in columns 1
  through 10.
  
  @item
@@ -308,15 +360,15 @@ Defines the following variables:
  @code{ID}, a numeric variable, in columns 1-5 of the first record.
  
  @item
-@code{NAME}, a 30-character long string variable, in columns 7-36 of the
+@code{NAME}, a 30-character string variable, in columns 7-36 of the
  first record.
  
  @item
-@code{SURNAME}, a 30-character long string variable, in columns 38-67 of
+@code{SURNAME}, a 30-character string variable, in columns 38-67 of
  the first record.
  
  @item
-@code{MINITIAL}, a 1-character short string variable, in column 69 of
+@code{MINITIAL}, a 1-character string variable, in column 69 of
  the first record.
  
  @item
@@ -342,7 +394,7 @@ This example shows keywords abbreviated to their first 3 letters.
  DATA LIST FREE
          [(@{TAB,'c'@}, @dots{})]
          [@{NOTABLE,TABLE@}]
-        [FILE='file-name']
+        [FILE='file-name' [ENCODING='encoding']]
          [SKIP=record_cnt]
          /var_spec@dots{}
  
@@ -394,7 +446,7 @@ on field width apply, but they are honored on output.
  DATA LIST LIST
          [(@{TAB,'c'@}, @dots{})]
          [@{NOTABLE,TABLE@}]
-        [FILE='file-name']
+        [FILE='file-name' [ENCODING='encoding']]
          [SKIP=record_count]
          /var_spec@dots{}
  
@@ -720,7 +772,7 @@ LIST.
  @end example
  
  The above example reads data from file @file{a.data}, then from
-@file{b.data}, and concatenates them into a single active file.
+@file{b.data}, and concatenates them into a single active dataset.
  
  @c If you change this example, change the regression test4 in
  @c tests/command/input-program.sh to match.
@@ -764,7 +816,7 @@ END INPUT PROGRAM.
  LIST/FORMAT=NUMBERED.
  @end example
  
-The above example causes an active file to be created consisting of 50
+The above example causes an active dataset to be created consisting of 50
  random variates between 0 and 10.
  
  @node LIST
@@ -775,8 +827,7 @@ random variates between 0 and 10.
  LIST
          /VARIABLES=var_list
          /CASES=FROM start_index TO end_index BY incr_index
-        /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@} 
-                @{NOWEIGHT,WEIGHT@}
+        /FORMAT=@{UNNUMBERED,NUMBERED@} @{WRAP,SINGLE@}
  @end display
  
  The @cmd{LIST} procedure prints the values of specified variables to the
@@ -784,7 +835,7 @@ listing file.
  
  The VARIABLES subcommand specifies the variables whose values are to be
  printed.  Keyword VARIABLES is optional.  If VARIABLES subcommand is not
-specified then all variables in the active file are printed.
+specified then all variables in the active dataset are printed.
  
  The CASES subcommand can be used to specify a subset of cases to be
  printed.  Specify FROM and the case number of the first case to print,
@@ -795,9 +846,7 @@ settings.  If CASES is not specified then all cases are printed.
  The FORMAT subcommand can be used to change the output format.  NUMBERED
  will print case numbers along with each case; UNNUMBERED, the default,
  causes the case numbers to be omitted.  The WRAP and SINGLE settings are
-currently not used.  WEIGHT will cause case weights to be printed along
-with variable values; NOWEIGHT, the default, causes case weights to be
-omitted from the output.
+currently not used.
  
  Case numbers start from 1.  They are counted after all transformations
  have been considered.
@@ -816,7 +865,8 @@ cannot fit on a single line, then a multi-line format will be used.
  NEW FILE.
  @end display
  
-@cmd{NEW FILE} command clears the current active file.
+The @cmd{NEW FILE} command clears the dictionary and data from the current
+active dataset.
  
  @node PRINT
  @section PRINT
@@ -1093,4 +1143,3 @@ specified output format, whereas @cmd{WRITE} outputs the
  system-missing value as a field filled with spaces.  Binary formats
  are an exception.
  @end itemize
-@setfilename ignored