Add documentation for the Kruskal-Wallis subcommand

diff --git a/doc/files.texi b/doc/files.texi

index ded40e7b9222aea362f9ee5b32f01ab4ec6cd24c..5f0c527c4738c90ffaeab349c357dba46c73e4ca 100644
--- a/doc/files.texi
+++ b/doc/files.texi
@@ -1,5 +1,5 @@
-@node System and Portable Files
-@chapter System Files and Portable Files
+@node System and Portable File IO
+@chapter System and Portable File I/O
  
  The commands in this chapter read, write, and examine system files and
  portable files.
@@ -10,8 +10,8 @@ portable files.
  * GET::                         Read from a system file.
  * GET DATA::                    Read from foreign files.
  * IMPORT::                      Read from a portable file.
-* MATCH FILES::                 Merge system files.
  * SAVE::                        Write to a system file.
+* SAVE TRANSLATE::              Write data in foreign file formats.
  * SYSFILE INFO::                Display system file dictionary.
  * XEXPORT::                     Write to a portable file, as a transformation.
  * XSAVE::                       Write to a system file, as a transformation.
@@ -39,25 +39,46 @@ Only variables with names that exist in both the active file and the
  system file are considered.  Variables with the same name but different
  types (numeric, string) will cause an error message.  Otherwise, the
  system file variables' attributes will replace those in their matching
-active file variables, as described below.
+active file variables:
  
+@itemize @bullet
+@item
  If a system file variable has a variable label, then it will replace the
  active file variable's variable label.  If the system file variable does
  not have a variable label, then the active file variable's variable
-label, if any, will be retained.
+label, if any, will be retained.
+
+@item
+If the system file variable has custom attributes (@pxref{VARIABLE
+ATTRIBUTE}), then those attributes replace the active file variable's
+custom attributes.  If the system file variable does not have custom
+attributes, then the active file variable's custom attributes, if any,
+will be retained.
  
+@item
  If the active file variable is numeric or short string, then value
  labels and missing values, if any, will be copied to the active file
  variable.  If the system file variable does not have value labels or
  missing values, then those in the active file variable, if any, will not
  be disturbed.
+@end itemize
+
+In addition to properties of variables, some properties of the active
+file dictionary as a whole are updated:
  
-Finally, weighting of the active file is updated (@pxref{WEIGHT}).  If
-the active file has a weighting variable, and the system file does not,
-or if the weighting variable in the system file does not exist in the
-active file, then the active file weighting variable, if any, is
-retained.  Otherwise, the weighting variable in the system file becomes
-the active file weighting variable.
+@itemize @bullet
+@item
+If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
+then those attributes replace the active file dictionary's custom
+attributes.
+
+@item
+If the active file has a weighting variable (@pxref{WEIGHT}), and the
+system file does not, or if the weighting variable in the system file
+does not exist in the active file, then the active file weighting
+variable, if any, is retained.  Otherwise, the weighting variable in
+the system file becomes the active file weighting variable.
+@end itemize
  
  @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  active
@@ -385,7 +406,7 @@ GET DATA /TYPE=TXT
          [/IMPORTCASE=@{ALL,FIRST max_cases,PERCENT percent@}]
  
          /DELIMITERS="delimiters"
-        [/QUALIFIER="quote"]
+        [/QUALIFIER="quotes" [/ESCAPE]]
          [/DELCASE=@{LINE,VARIABLES n_variables@}]
          /VARIABLES=del_var [del_var]@dots{}
  where each del_var takes the form:
@@ -417,11 +438,22 @@ delimiter, immediately following @samp{\t}.  To read a data file in
  which each field appears on a separate line, specify the empty string
  for DELIMITERS.
  
-The optional QUALIFIER subcommand names a character that can be used
-to quote values within fields in the input.  A field that begins with
-the specified quote character ends at the next match quote.
-Intervening delimiters become part of the field, instead of
-terminating it.
+The optional QUALIFIER subcommand names one or more characters that
+can be used to quote values within fields in the input.  A field that
+begins with one of the specified quote characters ends at the next
+matching quote.  Intervening delimiters become part of the field,
+instead of terminating it.  The ability to specify more than one quote
+character is a PSPP extension.
+
+By default, a character specified on QUALIFIER cannot itself be
+embedded within a field that it quotes, because the quote character
+always terminates the quoted field.  With ESCAPE, however, a doubled
+quote character within a quoted field inserts a single instance of the
+quote into the field.  For example, if @samp{'} is specified on
+QUALIFIER, then without ESCAPE @code{'a''b'} specifies a pair of
+fields that contain @samp{a} and @samp{b}, but with ESCAPE it
+specifies a single field that contains @samp{a'b}.  ESCAPE is a PSPP
+extension.
  
  The DELCASE subcommand controls how data may be broken across lines in
  the data file.  With LINE, the default setting, each line must contain
@@ -453,7 +485,7 @@ The following syntax reads a file in the format used by
  @samp{/etc/passwd}:
  
  @c If you change this example, change the regression test in
-@c tests/command/get-data-txt-examples.sh to match.
+@c tests/language/data-io/get-data.at to match.
  @example
  GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
          /VARIABLES=username A20
@@ -480,7 +512,7 @@ Accord  2002    26613   17900   EX      1
  The following syntax can be used to read the used car data:
  
  @c If you change this example, change the regression test in
-@c tests/command/get-data-txt-examples.sh to match.
+@c tests/language/data-io/get-data.at to match.
  @example
  GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
          /VARIABLES=model A8
@@ -495,29 +527,29 @@ GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
  Consider the following information on animals in a pet store:
  
  @example
-"Pet Name", "Age", "Color", "Date Received", "Price", "Needs Walking", "Type"
+'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
  , (Years), , , (Dollars), ,
-"Rover", 4.5, Brown, "12 Feb 2004", 80, True, "Dog"
-"Charlie", , Gold, "5 Apr 2007", 12.3, False, "Fish"
-"Molly", 2, Black, "12 Dec 2006", 25, False, "Cat"
-"Gilly", , White, "10 Apr 2007", 10, False, "Guinea Pig"
+"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
+"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
+"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
+"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
  @end example
  
  @noindent
  The following syntax can be used to read the pet store data:
  
  @c If you change this example, change the regression test in
-@c tests/command/get-data-txt-examples.sh to match.
+@c tests/language/data-io/get-data.at to match.
  @example
-GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='"'
+GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
          /FIRSTCASE=3
          /VARIABLES=name A10
                     age F3.1
                     color A5
                     received EDATE10
                     price F5.2
-                   needs_walking A5
-                   type A10.
+                   height a5
+                   type a10.
  @end example
  
  @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
@@ -576,7 +608,7 @@ Accord  2002    26613   17900   EX      1
  The following syntax can be used to read the used car data:
  
  @c If you change this example, change the regression test in
-@c tests/command/get-data-txt-examples.sh to match.
+@c tests/language/data-io/get-data.at to match.
  @example
  GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
          /VARIABLES=model 0-7 A
@@ -619,99 +651,6 @@ data is read later, when a procedure is executed.
  Use of @cmd{IMPORT} to read a system file or scratch file is a PSPP
  extension.
  
-@node MATCH FILES
-@section MATCH FILES
-@vindex MATCH FILES
-
-@display
-MATCH FILES
-        /@{FILE,TABLE@}=@{*,'file-name'@}
-        /RENAME=(src_names=target_names)@dots{}
-        /IN=var_name
-
-        /BY=var_list
-        /DROP=var_list
-        /KEEP=var_list
-        /FIRST=var_name
-        /LAST=var_name
-        /MAP
-@end display
-
-@cmd{MATCH FILES} merges one or more system, portable, or scratch files,
-optionally
-including the active file.  Cases with the same values for BY
-variables are combined into a single case.  Cases with different
-values are output in order.  Thus, multiple sorted files are
-combined into a single sorted file based on the value of the BY
-variables.  The results of the merge become the new active file.
-
-Specify FILE with a system, portable, or scratch file as a file name
-string or file handle
-(@pxref{File Handles}), or with an asterisk (@samp{*}) to
-indicate the current active file.  The files specified on FILE are
-merged together based on the BY variables, or combined case-by-case if
-BY is not specified.
-
-Specify TABLE with a file to use it as a @dfn{table
-lookup file}.  Cases in table lookup files are not used up after
-they've been used once.  This means that data in table lookup files can
-correspond to any number of cases in FILE files.  Table lookup files
-correspond to lookup tables in traditional relational database systems.
-If a table lookup file contains more than one case with a given set of
-BY variables, only the first case is used.
-
-Any number of FILE and TABLE subcommands may be specified.
-Ordinarily, at least two FILE subcommands, or one FILE and at least
-one TABLE, should be specified.  Each instance of FILE or TABLE can be
-followed by any sequence of RENAME subcommands.  These have the same
-form and meaning as the corresponding subcommands of @cmd{GET}
-(@pxref{GET}), but apply only to variables in the given file.
-
-Each FILE or TABLE may optionally be followed by an IN subcommand,
-which creates a numeric variable with the specified name and format
-F1.0.  The IN variable takes value 1 in a case if the given file
-contributed a row to the merged file, 0 otherwise.  The DROP, KEEP,
-and RENAME subcommands do not affect IN variables.
-
-When more than one FILE or TABLE contains a variable with a given
-name, those variables must all have the same type (numeric or string)
-and, for string variables, the same width.  This rules applies to
-variable names after renaming with RENAME; thus, RENAME can be used to
-resolve conflicts.
-
-FILE and TABLE must be specified at the beginning of the command, with
-any RENAME or IN specifications immediately after the corresponding
-FILE or TABLE.  These subcommands are followed by BY, DROP, KEEP,
-FIRST, LAST, and MAP.
-
-The BY subcommand specifies a list of variables that are used to match
-cases from each of the files.  When TABLE or IN is used, BY is
-required; otherwise, it is optional.  When BY is specified, all the
-files named on FILE and TABLE subcommands must be sorted in ascending
-order of the BY variables.  Variables belonging to files that are not
-present for the current case are set to the system-missing value for
-numeric variables or spaces for string variables.
-
-The DROP and KEEP subcommands allow variables to be dropped from or
-reordered within the new active file.  These subcommands have the same
-form and meaning as the corresponding subcommands of @cmd{GET}
-(@pxref{GET}).  They apply to the new active file as a whole, not to
-individual input files.  The variable names specified on DROP and KEEP
-are those after any renaming with RENAME.
-
-The optional FIRST and LAST subcommands name variables that @cmd{MATCH
-FILES} adds to the active file.  The new variables are numeric with
-print and write format F1.0.  The value of the FIRST variable is 1 in
-the first case with a given set of values for the BY variables, and 0
-in other cases.  Similarly, the LAST variable is 1 in the last case
-with a given of BY values, and 0 in other cases.
-
-@cmd{MATCH FILES} may not be specified following @cmd{TEMPORARY}
-(@pxref{TEMPORARY}) if the active file is used as an input source.
-
-Use of portable or scratch files on @cmd{MATCH FILES} is a PSPP
-extension.
-
  @node SAVE
  @section SAVE
  @vindex SAVE
@@ -782,6 +721,140 @@ The NAMES and MAP subcommands are currently ignored.
  
  @cmd{SAVE} causes the data to be read.  It is a procedure.
  
+@node SAVE TRANSLATE
+@section SAVE TRANSLATE
+@vindex SAVE TRANSLATE
+
+@display
+SAVE TRANSLATE
+        /OUTFILE=@{'file-name',file_handle@}
+        /TYPE=@{CSV,TAB@}
+        [/REPLACE]
+        [/MISSING=@{IGNORE,RECODE@}]
+
+        [/DROP=var_list]
+        [/KEEP=var_list]
+        [/RENAME=(src_names=target_names)@dots{}]
+        [/UNSELECTED=@{RETAIN,DELETE@}]
+        [/MAP]
+
+        @dots{}additional subcommands depending on TYPE@dots{}
+@end display
+
+The @cmd{SAVE TRANSLATE} command is used to save data into various
+formats understood by other applications.
+
+The OUTFILE and TYPE subcommands are mandatory.  OUTFILE specifies the
+file to be written, as a string file name or a file handle
+(@pxref{File Handles}).  TYPE determines the format of the file to
+write.  It must be one of the following:
+
+@table @asis
+@item CSV
+Comma-separated value format.
+
+@item TAB
+Tab-delimited format.
+@end table
+
+By default, SAVE TRANSLATE will not overwrite an existing file.  Use
+REPLACE to force an existing file to be overwritten.
+
+With MISSING=IGNORE, the default, SAVE TRANSLATE treats user-missing
+values as if they were not missing.  Specify MISSING=RECODE to output
+numeric user-missing values like system-missing values and string
+user-missing values as all spaces.
+
+By default, all the variables in the active file dictionary are saved
+to the output file, but DROP or KEEP can select a subset of variables
+to save.  The RENAME subcommand can also be used to change the names
+under which variables are saved.  UNSELECTED determines whether cases
+filtered out by the FILTER command are written to the output file.
+These subcommands have the same syntax and meaning as on the
+@cmd{SAVE} command (@pxref{SAVE}).
+
+Each supported file type has additional subcommands, explained in
+separate sections below.
+
+@cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
+
+@menu
+* SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
+@end menu
+
+@node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
+@subsection Writing Comma- and Tab-Separated Data Files
+
+@display
+SAVE TRANSLATE
+        /OUTFILE=@{'file-name',file_handle@}
+        /TYPE=@{CSV,TAB@}
+        [/REPLACE]
+        [/MISSING=@{IGNORE,RECODE@}]
+
+        [/DROP=var_list]
+        [/KEEP=var_list]
+        [/RENAME=(src_names=target_names)@dots{}]
+        [/UNSELECTED=@{RETAIN,DELETE@}]
+
+        [/FIELDNAMES]
+        [/CELLS=@{VALUES,LABELS@}]
+        [/TEXTOPTIONS DELIMITER='delimiter']
+        [/TEXTOPTIONS QUALIFIER='qualifier']
+        [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
+        [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
+@end display
+
+The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
+comma- or tab-separated value format similar to that described by
+RFC@tie{}4180.  Each variable becomes one output column, and each case
+becomes one line of output.  If FIELDNAMES is specified, an additional
+line at the top of the output file lists variable names.
+
+The CELLS and TEXTOPTIONS FORMAT settings determine how values are
+written to the output file:
+
+@table @asis
+@item CELLS=VALUES FORMAT=PLAIN (the default settings)
+Writes variables to the output in ``plain'' formats that ignore the
+details of variable formats.  Numeric values are written as plain
+decimal numbers with enough digits to indicate their exact values in
+machine representation.  Numeric values include @samp{e} followed by
+an exponent if the exponent value would be less than -4 or greater
+than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
+format.  WKDAY and MONTH values are written as decimal numbers.
+
+Numeric values use, by default, the decimal point character set with
+SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
+to force a particular decimal point character.
+
+@item CELLS=VALUES FORMAT=VARIABLE
+Writes variables using their print formats.  Leading and trailing
+spaces are removed from numeric values, and trailing spaces are
+removed from string values.
+
+@item CELLS=LABELS FORMAT=PLAIN
+@itemx CELLS=LABELS FORMAT=VARIABLE
+Writes value labels where they exist, and otherwise writes the values
+themselves as described above.
+@end table
+
+Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
+values are output as a single space.
+
+For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
+TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
+that separates values within a line.  If DELIMITER is specified, then
+the specified string separates values.  If DELIMITER is not specified,
+then the default is a comma with DECIMAL=DOT or a semicolon with
+DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
+decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
+
+The TEXTOPTIONS QUALIFIER setting specifies a character that is output
+before and after a value that contains the delimiter character or the
+qualifier character.  The default is a double quote (@samp{"}).  A
+qualifier character that appears within a value is doubled.
+
  @node SYSFILE INFO
  @section SYSFILE INFO
  @vindex SYSFILE INFO
@@ -862,4 +935,3 @@ the data is read by a procedure or procedure-like command.
  @end itemize
  
  @xref{SAVE}, for more information.
-@setfilename ignored
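
Illustration of the quoting rules documented in this patch. The sketch below is not PSPP code. It is a simplified Python model, written under stated assumptions, of two behaviors the new text describes: how GET DATA /TYPE=TXT splits a line when QUALIFIER and ESCAPE are given, and how SAVE TRANSLATE wraps a value in the TEXTOPTIONS QUALIFIER character. The function names split_fields and quote_value are hypothetical, and the model assumes that every delimiter character ends exactly one field, which may differ from how PSPP treats runs of spaces.

```python
def split_fields(line, delimiters, qualifiers, escape=False):
    """Split one input line into fields, honoring quote characters.

    Simplified model of the documented QUALIFIER/ESCAPE rules, not
    PSPP's parser: each delimiter character ends exactly one field.
    """
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i] in qualifiers:
            # Quoted field: runs to the next matching quote character.
            quote = line[i]
            i += 1
            chars = []
            while i < n:
                if line[i] == quote:
                    if escape and i + 1 < n and line[i + 1] == quote:
                        # With ESCAPE, a doubled quote is one literal quote.
                        chars.append(quote)
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(line[i])  # delimiters here are field content
                i += 1
            fields.append("".join(chars))
            # Consume one delimiter directly after the closing quote, if any.
            if i < n and line[i] in delimiters:
                i += 1
        else:
            # Unquoted field: runs to the next delimiter.
            start = i
            while i < n and line[i] not in delimiters:
                i += 1
            fields.append(line[start:i])
            if i < n:
                i += 1  # skip the delimiter
    return fields


def quote_value(value, delimiter=",", qualifier='"'):
    """Quote a value the way the SAVE TRANSLATE text describes.

    The qualifier is written before and after a value that contains the
    delimiter or the qualifier; embedded qualifiers are doubled.
    """
    if delimiter in value or qualifier in value:
        return qualifier + value.replace(qualifier, qualifier * 2) + qualifier
    return value


if __name__ == "__main__":
    # The example from the documentation: QUALIFIER is a single quote.
    print(split_fields("'a''b'", ",", "'"))               # without ESCAPE
    print(split_fields("'a''b'", ",", "'", escape=True))  # with ESCAPE
    # A height value from the pet store data, written and read back.
    written = quote_value('3"')
    print(written, split_fields(written, ",", '"', escape=True))
```

Under these assumptions, a value written by quote_value reads back unchanged through split_fields only when escape=True, which matches the documented pairing of doubled qualifiers on output with the ESCAPE subcommand on input.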