Add a callback when variables change

diff --git a/doc/data-io.texi b/doc/data-io.texi

index 7d11cc6f05ad75dd8236f6694c960f64d3f1ecb3..a5ba26f0186eace4fed20f708ae4f9b802916c71 100644
--- a/doc/data-io.texi
+++ b/doc/data-io.texi
@@ -39,6 +39,7 @@ actually be read until a procedure is executed.
  * INPUT PROGRAM::               Support for complex input programs.
  * LIST::                        List cases in the active dataset.
  * NEW FILE::                    Clear the active dataset.
+* MATRIX DATA::                 Defining matrix material for procedures.
  * PRINT::                       Display values in print formats.
  * PRINT EJECT::                 Eject the current page then print.
  * PRINT SPACE::                 Print blank lines.
@@ -277,8 +278,9 @@ external file.  It may be used to specify a file name as a string or a
  file handle (@pxref{File Handles}).  If the @subcmd{FILE} subcommand is not used,
  then input is assumed to be specified within the command file using
  @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}).
-The @subcmd{ENCODING} subcommand may only be used if the @subcmd{FILE} subcommand is also used.
-It specifies the character encoding of the file.
+The @subcmd{ENCODING} subcommand may only be used if the @subcmd{FILE}
+subcommand is also used.  It specifies the character encoding of the
+file.  @xref{INSERT}, for information on supported encodings.
  
  The optional @subcmd{RECORDS} subcommand, which takes a single integer as an
  argument, is used to specify the number of lines per record.
@@ -304,7 +306,7 @@ of variable specifications may be present.
  
  Each variable specification consists of a list of variable names
  followed by a description of their location on the input line.  Sets of
-variables may be specified using the @code{DATA LIST} TO convention
+variables may be specified using the @cmd{DATA LIST} @subcmd{TO} convention
  (@pxref{Sets of
  Variables}).  There are two ways to specify the location of the variable
  on the line: columnar style and FORTRAN style.
@@ -483,7 +485,10 @@ where each @var{var_spec} takes one of the forms
  @end display
  
  In free format, the input data is, by default, structured as a series
-of fields separated by spaces, tabs, commas, or line breaks.  Each
+of fields separated by spaces, tabs, or line breaks.
+If the current @subcmd{DECIMAL} separator is @subcmd{DOT} (@pxref{SET}),
+then commas are also treated as field separators.
+Each
  field's content may be unquoted, or it may be quoted with a pair of
  apostrophes (@samp{'}) or double quotes (@samp{"}).  Unquoted white
  space separates fields but is not part of any field.  Any mix of
@@ -503,7 +508,8 @@ of quoting is allowed.
  The @subcmd{NOTABLE} and @subcmd{TABLE} subcommands are as in @cmd{DATA LIST FIXED} above.
  @subcmd{NOTABLE} is the default.
  
-The @subcmd{FILE} and @subcmd{SKIP} subcommands are as in @cmd{DATA LIST FIXED} above.
+The @subcmd{FILE}, @subcmd{SKIP}, and @subcmd{ENCODING} subcommands
+are as in @cmd{DATA LIST FIXED} above.
  
  The variables to be parsed are given as a single list of variable names.
  This list must be introduced by a single slash (@samp{/}).  The set of
@@ -525,7 +531,7 @@ on field width apply, but they are honored on output.
  DATA LIST LIST
          [(@{TAB,'@var{c}'@}, @dots{})]
          [@{NOTABLE,TABLE@}]
-        [FILE='@var{file_name'} [ENCODING='@var{encoding}']]
+        [FILE='@var{file_name}' [ENCODING='@var{encoding}']]
          [SKIP=@var{record_count}]
          /@var{var_spec}@dots{}
  
@@ -571,19 +577,23 @@ For text files:
          FILE HANDLE @var{handle_name}
                  /NAME='@var{file_name}'
                  [/MODE=CHARACTER]
+                [/ENDS=@{LF,CRLF@}]
                  /TABWIDTH=@var{tab_width}
+                [ENCODING='@var{encoding}']
  
  For binary files in native encoding with fixed-length records:
          FILE HANDLE @var{handle_name}
                  /NAME='@var{file_name}'
                  /MODE=IMAGE
                  [/LRECL=@var{rec_len}]
+                [ENCODING='@var{encoding}']
  
  For binary files in native encoding with variable-length records:
          FILE HANDLE @var{handle_name}
                  /NAME='@var{file_name}'
                  /MODE=BINARY
                  [/LRECL=@var{rec_len}]
+                [ENCODING='@var{encoding}']
  
  For binary files encoded in EBCDIC:
          FILE HANDLE @var{handle_name}
@@ -591,6 +601,7 @@ For binary files encoded in EBCDIC:
                  /MODE=360
                  /RECFORM=@{FIXED,VARIABLE,SPANNED@}
                  [/LRECL=@var{rec_len}]
+                [ENCODING='@var{encoding}']
  @end display
  
  Use @cmd{FILE HANDLE} to associate a file handle name with a file and
@@ -613,9 +624,8 @@ The effect and syntax of @cmd{FILE HANDLE} depends on the selected MODE:
  
  @itemize
  @item
-In CHARACTER mode, the default, the data file is read as a text file,
-according to the local system's conventions, and each text line is
-read as one record.
+In CHARACTER mode, the default, the data file is read as a text file.
+Each text line is read as one record.
  
  In CHARACTER mode only, tabs are expanded to spaces by input programs,
  except by @cmd{DATA LIST FREE} with explicitly specified delimiters.
@@ -623,6 +633,12 @@ Each tab is 4 characters wide by default, but TABWIDTH (a @pspp{}
  extension) may be used to specify an alternate width.  Use a TABWIDTH
  of 0 to suppress tab expansion.
  
+A file written in CHARACTER mode by default uses the line ends of the
+system on which PSPP is running, that is, on Windows, the default is
+CR LF line ends, and on other systems the default is LF only.  Specify
+ENDS as LF or CRLF to override the default.  PSPP reads files using
+either convention on any kind of system, regardless of ENDS.
+
  @item
  In IMAGE mode, the data file is treated as a series of fixed-length
  binary records.  LRECL should be used to specify the record length in
@@ -726,6 +742,14 @@ The @subcmd{NAME} subcommand specifies the name of the file associated with the
  handle.  It is required in all modes but SCRATCH mode, in which its
  use is forbidden.
  
+The ENCODING subcommand specifies the encoding of text in the file.
+For reading text files in CHARACTER mode, all of the forms described
+for ENCODING on the INSERT command are supported (@pxref{INSERT}).
+For reading in other file-based modes, encoding autodetection is not
+supported; if the specified encoding requests autodetection then the
+default encoding will be used.  This is also true when a file handle
+is used for writing a file in any mode.
+
  @node INPUT PROGRAM
  @section INPUT PROGRAM
  @vindex INPUT PROGRAM
@@ -775,6 +799,9 @@ so an infinite loop results.  @cmd{END FILE}, when executed,
  stops the flow of input data and passes out of the @cmd{INPUT PROGRAM}
  structure.
  
+@cmd{INPUT PROGRAM} must contain at least one @cmd{DATA LIST} or
+@cmd{END FILE} command.
+
  All this is very confusing.  A few examples should help to clarify.
  
  @c If you change this example, change the regression test1 in
@@ -906,23 +933,19 @@ printed.  Keyword VARIABLES is optional.  If @subcmd{VARIABLES} subcommand is no
  specified then all variables in the active dataset are printed.
  
  The @subcmd{CASES} subcommand can be used to specify a subset of cases to be
-printed.  Specify FROM and the case number of the first case to print,
-TO and the case number of the last case to print, and BY and the number
+printed.  Specify @subcmd{FROM} and the case number of the first case to print,
+@subcmd{TO} and the case number of the last case to print, and @subcmd{BY} and the number
  of cases to advance between printing cases, or any subset of those
-settings.  If CASES is not specified then all cases are printed.
+settings.  If @subcmd{CASES} is not specified then all cases are printed.
  
-The @subcmd{FORMAT} subcommand can be used to change the output format.  NUMBERED
-will print case numbers along with each case; UNNUMBERED, the default,
-causes the case numbers to be omitted.  The WRAP and SINGLE settings are
+The @subcmd{FORMAT} subcommand can be used to change the output format.  @subcmd{NUMBERED}
+will print case numbers along with each case; @subcmd{UNNUMBERED}, the default,
+causes the case numbers to be omitted.  The @subcmd{WRAP} and @subcmd{SINGLE} settings are
  currently not used.
  
  Case numbers start from 1.  They are counted after all transformations
  have been considered.
  
-@cmd{LIST} attempts to fit all the values on a single line.  If needed
-to make them fit, variable names are displayed vertically.  If values
-cannot fit on a single line, then a multi-line format will be used.
-
  @cmd{LIST} is a procedure.  It causes the data to be read.
  
  @node NEW FILE
@@ -936,19 +959,139 @@ NEW FILE.
  @cmd{NEW FILE} command clears the dictionary and data from the current
  active dataset.
  
+@node MATRIX DATA
+@section MATRIX DATA
+@vindex MATRIX DATA
+
+@display
+MATRIX DATA
+        VARIABLES = @var{columns}
+        [FILE=@{'@var{file_name}' | INLINE@}]
+        [/FORMAT= [@{LIST | FREE@}]
+                  [@{UPPER | LOWER | FULL@}]
+                  [@{DIAGONAL | NODIAGONAL@}]]
+        [/SPLIT= @var{split_variables}].
+@end display
+
+The @cmd{MATRIX DATA} command is used to input data in the form of matrices
+which can subsequently be used by other commands.  If the
+@subcmd{FILE} subcommand is omitted or takes the value @samp{INLINE}, then the command
+should be immediately followed by @cmd{BEGIN DATA} (@pxref{BEGIN DATA}).
+
+There is one mandatory subcommand, @i{viz:} @subcmd{VARIABLES}, which defines
+the @var{columns} of the matrix.
+Normally, the @var{columns} should include an item called @samp{ROWTYPE_}.
+The @samp{ROWTYPE_} column is used to specify the purpose of a row in the
+matrix.
+
+@example
+matrix data
+    variables = rowtype_ var01 TO var08.
+
+begin data.
+mean  24.3  5.4  69.7  20.1  13.4  2.7  27.9  3.7
+sd    5.7   1.5  23.5  5.8   2.8   4.5  5.4   1.5
+n     92    92   92    92    92    92   92    92
+corr 1.00
+corr .18  1.00
+corr -.22  -.17  1.00
+corr .36  .31  -.14  1.00
+corr .27  .16  -.12  .22  1.00
+corr .33  .15  -.17  .24  .21  1.00
+corr .50  .29  -.20  .32  .12  .38  1.00
+corr .17  .29  -.05  .20  .27  .20  .04  1.00
+end data.
+@end example
+
+In the above example, the first three rows have ROWTYPE_ values of
+@samp{mean}, @samp{sd}, and @samp{n}.  These indicate that the rows
+contain mean values, standard deviations and counts, respectively.
+All subsequent rows have a ROWTYPE_ of @samp{corr} which indicates
+that the values are correlation coefficients.
+
+Note that in this example, the upper right values of the @samp{corr}
+rows are absent, and in each case, the rightmost value is unity.
+This is because the
+@subcmd{FORMAT} subcommand defaults to @samp{LOWER DIAGONAL},
+which indicates that only the lower triangle is provided in the data.
+The opposite triangle is automatically inferred.  One could instead
+specify the upper triangle as follows:
+
+
+@example
+matrix data
+    variables = rowtype_ var01 TO var08
+    /format = upper nodiagonal.
+
+begin data.
+mean  24.3 5.4  69.7  20.1  13.4  2.7  27.9  3.7
+sd    5.7  1.5  23.5  5.8   2.8   4.5  5.4   1.5
+n     92    92   92    92    92    92   92    92
+corr         .17  .50  -.33  .27  .36  -.22  .18
+corr               .29  .29  -.20  .32  .12  .38
+corr                    .05  .20  -.15  .16  .21
+corr                         .20  .32  -.17  .12
+corr                              .27  .12  -.24
+corr                                  -.20  -.38
+corr                                         .04
+end data.
+@end example
+
+In this example the @samp{NODIAGONAL} keyword is used.  Accordingly
+the diagonal values of the matrix are omitted.  This implies that
+there is one less @samp{corr} line than there are variables.
+If the @samp{FULL} option is passed to the @subcmd{FORMAT} subcommand,
+then all the matrix elements must be provided, including the diagonal
+elements.
+
+In the preceding examples, each matrix row has been specified on a
+single line.  If you pass the keyword @samp{FREE} to @subcmd{FORMAT},
+then the data for several matrix rows may be specified on
+the same line, or a single row may be split across lines.
+
+The @subcmd{SPLIT} subcommand is used to indicate that variables are to be
+considered as split variables.  For example, the following
+defines two matrices using the variable @samp{S1} to distinguish
+between them.
+
+@example
+matrix data
+    variables = s1 rowtype_  var01 TO var04
+    /split = s1
+    /format = full diagonal.
+
+begin data
+0 mean 34 35 36 37
+0 sd   22 11 55 66
+0 n    99 98 99 92
+0 corr 1 9 8 7
+0 corr 9 1 6 5
+0 corr 8 6 1 4
+0 corr 7 5 4 1
+1 mean 44 45 34 39
+1 sd   23 15 51 46
+1 n    98 34 87 23
+1 corr 1 2 3 4
+1 corr 2 1 5 6
+1 corr 3 5 1 7
+1 corr 4 6 7 1
+end data.
+@end example
+
  @node PRINT
  @section PRINT
  @vindex PRINT
  
  @display
  PRINT 
-        OUTFILE='@var{file_name}'
-        RECORDS=@var{n_lines}
-        @{NOTABLE,TABLE@}
+        [OUTFILE='@var{file_name}']
+        [RECORDS=@var{n_lines}]
+        [@{NOTABLE,TABLE@}]
+        [ENCODING='@var{encoding}']
          [/[@var{line_no}] @var{arg}@dots{}]
  
  @var{arg} takes one of the following forms:
-        '@var{string}' [@var{start}-@var{end}]
+        '@var{string}' [@var{start}]
          @var{var_list} @var{start}-@var{end} [@var{type_spec}]
          @var{var_list} (@var{fortran_spec})
          @var{var_list} *
@@ -969,6 +1112,11 @@ Handles}).  If @subcmd{OUTFILE} is not present then output will be sent to
  inserted at beginning of each output line, even lines that otherwise
  would be blank.
  
+The @subcmd{ENCODING} subcommand may only be used if the
+@subcmd{OUTFILE} subcommand is also used.  It specifies the character
+encoding of the file.  @xref{INSERT}, for information on supported
+encodings.
+
  The @subcmd{RECORDS} subcommand specifies the number of lines to be output.  The
  number of lines may optionally be surrounded by parentheses.
  
@@ -983,12 +1131,10 @@ line number, the next line number will be specified.  Multiple lines may
  be specified using multiple slashes with the intended output for a line
  following its respective slash.
  
-
-Literal strings may be printed.  Specify the string itself.  Optionally
-the string may be followed by a column number or range of column
-numbers, specifying the location on the line for the string to be
-printed.  Otherwise, the string will be printed at the current position
-on the line.
+Literal strings may be printed.  Specify the string itself.
+Optionally the string may be followed by a column number, specifying
+the column on the line where the string should start.  Otherwise, the
+string will be printed at the current position on the line.
  
  Variables to be printed can be specified in the same ways as available
  for @cmd{DATA LIST FIXED} (@pxref{DATA LIST FIXED}).  In addition, a
@@ -1034,7 +1180,7 @@ With @subcmd{OUTFILE}, @cmd{PRINT EJECT} writes its output to the specified file
  The first line of output is written with @samp{1} inserted in the
  first column.  Commonly, this is the only line of output.  If
  additional lines of output are specified, these additional lines are
-written with a space inserted in the first column, as with PRINT.
+written with a space inserted in the first column, as with @cmd{PRINT}.
  
  @xref{PRINT}, for more information on syntax and usage.
  
@@ -1043,7 +1189,7 @@ written with a space inserted in the first column, as with PRINT.
  @vindex PRINT SPACE
  
  @display
-PRINT SPACE OUTFILE='file_name' n_lines.
+PRINT SPACE [OUTFILE='file_name'] [ENCODING='@var{encoding}'] [n_lines].
  @end display
  
  @cmd{PRINT SPACE} prints one or more blank lines to an output file.
@@ -1053,6 +1199,10 @@ a file specified by file name as a string or file handle (@pxref{File
  Handles}).  If OUTFILE is not specified then output will be directed to
  the listing file.
  
+The @subcmd{ENCODING} subcommand may only be used if @subcmd{OUTFILE}
+is also used.  It specifies the character encoding of the file.
+@xref{INSERT}, for information on supported encodings.
+
  n_lines is also optional.  If present, it is an expression
  (@pxref{Expressions}) specifying the number of blank lines to be
  printed.  The expression must evaluate to a nonnegative value.
@@ -1062,7 +1212,7 @@ printed.  The expression must evaluate to a nonnegative value.
  @vindex REREAD
  
  @display
-REREAD FILE=handle COLUMN=column.
+REREAD [FILE=handle] [COLUMN=column] [ENCODING='@var{encoding}'].
  @end display
  
  The @cmd{REREAD} transformation allows the previous input line in a
@@ -1082,6 +1232,10 @@ re-reading.  Specify an expression (@pxref{Expressions}) evaluating to
  the first column that should be included in the re-read line.  Columns
  are numbered from 1 at the left margin.
  
+The @subcmd{ENCODING} subcommand may only be used if the @subcmd{FILE}
+subcommand is also used.  It specifies the character encoding of the
+file.  @xref{INSERT}, for information on supported encodings.
+
  Issuing @code{REREAD} multiple times will not back up in the data
  file.  Instead, it will re-read the same line multiple times.