MATRIX DATA: Fully implement.

author Ben Pfaff <blp@cs.stanford.edu>

Fri, 3 Sep 2021 05:15:53 +0000 (22:15 -0700)

committer Ben Pfaff <blp@cs.stanford.edu>

Fri, 3 Sep 2021 05:15:53 +0000 (22:15 -0700)
diff --git a/NEWS b/NEWS

index 6392be56c9e5e1d39a7eadfbb1a674487b05b071..9277e682eb41c026ebf1f57204c12aae265ade3e 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,8 @@ Changes from 1.4.1 to 1.5.3:
  
   * The DEFINE command is now supported.
  
+ * The MATRIX DATA command is now fully implemented.
+
   * An error in the displayed signficance of oneway anova
     contrasts tests has been corrected.
  
diff --git a/doc/automake.mk b/doc/automake.mk

index 3206beb25e28ad9d953a109ec0dd76987db22892..cb277a2e8897e330c654dd9d28b0180d5ec1136b 100644
--- a/doc/automake.mk
+++ b/doc/automake.mk
@@ -46,6 +46,7 @@ doc_pspp_TEXINFOS = doc/version.texi \
         doc/regression.texi \
         doc/utilities.texi \
         doc/variables.texi \
+       doc/matrices.texi \
         doc/fdl.texi
  
  doc_pspp_dev_TEXINFOS = doc/version-dev.texi \
diff --git a/doc/data-io.texi b/doc/data-io.texi

index f7d3e7ff336726413ffd6105a4c8c9426655434c..1d4b0d6b0ba3f255d907eb78a2c5ff8fa131c794 100644
--- a/doc/data-io.texi
+++ b/doc/data-io.texi
@@ -48,7 +48,6 @@ actually be read until a procedure is executed.
  * INPUT PROGRAM::               Support for complex input programs.
  * LIST::                        List cases in the active dataset.
  * NEW FILE::                    Clear the active dataset.
-* MATRIX DATA::                 Defining matrix material for procedures.
  * PRINT::                       Display values in print formats.
  * PRINT EJECT::                 Eject the current page then print.
  * PRINT SPACE::                 Print blank lines.
@@ -968,160 +967,6 @@ NEW FILE.
  @cmd{NEW FILE} command clears the dictionary and data from the current
  active dataset.
  
-@node MATRIX DATA
-@section MATRIX DATA
-@vindex MATRIX DATA
-
-@display
-MATRIX DATA
-        VARIABLES = @var{columns}
-        [FILE='@var{file_name}'| INLINE @}
-        [/FORMAT= [@{LIST | FREE@}]
-                  [@{UPPER | LOWER | FULL@}]
-                  [@{DIAGONAL | NODIAGONAL@}]]
-        [/N= @var{n}]
-        [/SPLIT= @var{split_variables}].
-@end display
-
-The @cmd{MATRIX DATA} command is used to input data in the form of matrices
-which can subsequently be used by other commands.  If the
-@subcmd{FILE} is omitted or takes the value @samp{INLINE} then the command
-should immediately followed by @cmd{BEGIN DATA} (@pxref{BEGIN DATA}).
-
-There is one mandatory subcommand, @i{viz:} @subcmd{VARIABLES}, which defines
-the @var{columns} of the matrix.
-Normally, the @var{columns} should include an item called @samp{ROWTYPE_}.
-The @samp{ROWTYPE_} column is used to specify the purpose of a row in the
-matrix.
-
-@example
-matrix data
-    variables = rowtype_ var01 TO var08.
-
-begin data.
-mean  24.3  5.4  69.7  20.1  13.4  2.7  27.9  3.7
-sd    5.7   1.5  23.5  5.8   2.8   4.5  5.4   1.5
-n     92    92   92    92    92    92   92    92
-corr 1.00
-corr .18  1.00
-corr -.22  -.17  1.00
-corr .36  .31  -.14  1.00
-corr .27  .16  -.12  .22  1.00
-corr .33  .15  -.17  .24  .21  1.00
-corr .50  .29  -.20  .32  .12  .38  1.00
-corr .17  .29  -.05  .20  .27  .20  .04  1.00
-end data.
-@end example
-
-In the above example, the first three rows have ROWTYPE_ values of
-@samp{mean}, @samp{sd}, and @samp{n}.  These indicate that the rows
-contain mean values, standard deviations and counts, respectively.
-All subsequent rows have a ROWTYPE_ of @samp{corr} which indicates
-that the values are correlation coefficients.
-
-Note that in this example, the upper right values of the @samp{corr}
-values are blank, and in each case, the rightmost value is unity.
-This is because, the
-@subcmd{FORMAT} subcommand defaults to @samp{LOWER DIAGONAL},
-which indicates that only the lower triangle is provided in the data.
-The opposite triangle is automatically inferred.  One could instead
-specify the upper triangle as follows:
-
-
-@example
-matrix data
-    variables = rowtype_ var01 TO var08
-    /format = upper nodiagonal.
-
-begin data.
-mean  24.3 5.4  69.7  20.1  13.4  2.7  27.9  3.7
-sd    5.7  1.5  23.5  5.8   2.8   4.5  5.4   1.5
-n     92    92   92    92    92    92   92    92
-corr         .17  .50  -.33  .27  .36  -.22  .18
-corr               .29  .29  -.20  .32  .12  .38
-corr                    .05  .20  -.15  .16  .21
-corr                         .20  .32  -.17  .12
-corr                              .27  .12  -.24
-corr                                  -.20  -.38
-corr                                         .04
-end data.
-@end example
-
-In this example the @samp{NODIAGONAL} keyword is used.  Accordingly
-the diagonal values of the matrix are omitted.  This implies that
-there is one less @samp{corr} line than there are variables.
-If the @samp{FULL} option is passed to the @subcmd{FORMAT} subcommand,
-then all the matrix elements must be provided, including the diagonal
-elements.
-
-In the preceding examples, each matrix row has been specified on a
-single line.  If you pass the keyword @var{FREE} to @subcmd{FORMAT}
-then the data may be data for several matrix rows may be specified on
-the same line, or a single row may be split across lines.
-
-The @subcmd{N} subcommand may be used to specify the number
-of valid cases for each variable.  It should not be used if the
-data contains a record whose ROWTYPE_ column is @samp{N} or @samp{N_VECTOR}.
-It implies a @samp{N} record whose values are all @var{n}.
-That is to say,
-@example
-matrix data
-    variables = rowtype_  var01 TO var04
-    /format = upper nodiagonal
-    /n = 99.
-begin data
-mean 34 35 36 37
-sd   22 11 55 66
-corr 9 8 7
-corr 6 5
-corr 4
-end data.
-@end example
-produces an effect identical to
-@example
-matrix data
-    variables = rowtype_  var01 TO var04
-    /format = upper nodiagonal
-begin data
-n    99 99 99 99
-mean 34 35 36 37
-sd   22 11 55 66
-corr 9 8 7
-corr 6 5
-corr 4
-end data.
-@end example
-
-
-The @subcmd{SPLIT} is used to indicate that variables are to be
-considered as split variables.  For example, the following
-defines two matrices using the variable @samp{S1} to distinguish
-between them.
-
-@example
-matrix data
-    variables = s1 rowtype_  var01 TO var04
-    /split = s1
-    /format = full diagonal.
-
-begin data
-0 mean 34 35 36 37
-0 sd   22 11 55 66
-0 n    99 98 99 92
-0 corr 1 9 8 7
-0 corr 9 1 6 5
-0 corr 8 6 1 4
-0 corr 7 5 4 1
-1 mean 44 45 34 39
-1 sd   23 15 51 46
-1 n    98 34 87 23
-1 corr 1 2 3 4
-1 corr 2 1 5 6
-1 corr 3 5 1 7
-1 corr 4 6 7 1
-end data.
-@end example
-
  @node PRINT
  @section PRINT
  @vindex PRINT
diff --git a/doc/matrices.texi b/doc/matrices.texi

new file mode 100644

index 0000000..26a29db
--- /dev/null
+++ b/doc/matrices.texi
@@ -0,0 +1,574 @@
+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017, 2020, 2021 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
+@node Matrices
+@chapter Matrices
+
+Some @pspp{} procedures work with matrices by producing numeric
+matrices that report results of data analysis, or by consuming
+matrices as a basis for further analysis.  This chapter documents the
+format of data files that store these matrices and commands for
+working with them.
+
+@node Matrix Files
+@section Matrix Files
+@vindex Matrix file
+
+A matrix file is an SPSS system file that conforms to the dictionary
+and case structure described in this section.  Procedures that read
+matrices from files expect them to be in the matrix file format.
+Procedures that write matrices also use this format.
+
+Text files that contain matrices can be converted to matrix file
+format.  @xref{MATRIX DATA}, for a command to read a text file as a
+matrix file.
+
+A matrix file's dictionary must have the following variables in the
+specified order:
+
+@enumerate
+@item
+Zero or more numeric split variables.  These are included by
+procedures when @cmd{SPLIT FILE} is active.  @cmd{MATRIX DATA} assigns
+split variables format F4.0.
+
+@item
+@code{ROWTYPE_}, a string variable with width 8.  This variable
+indicates the kind of matrix or vector that a given case represents.
+The supported row types are listed below.
+
+@item
+Zero or more numeric factor variables.  These are included by
+procedures that divide data into cells.  For within-cell data, factor
+variables are filled with non-missing values; for pooled data, they
+are missing.  @cmd{MATRIX DATA} assigns factor variables format F4.0.
+
+@item
+@code{VARNAME_}, a string variable.  Matrix data includes one row per
+continuous variable (see below), naming each continuous variable in
+order.  This column is blank for vector data.  @cmd{MATRIX DATA} makes
+@code{VARNAME_} wide enough for the name of any of the continuous
+variables, but at least 8 bytes.
+
+@item
+One or more continuous variables.  These are the variables whose data
+was analyzed to produce the matrices.  @cmd{MATRIX DATA} assigns
+continuous variables format F10.4.
+@end enumerate
+
+Case weights are ignored in matrix files. 
+
+@subheading Row Types
+@anchor{Matrix File Row Types}
+
+Matrix files support a fixed set of types of matrix and vector data.
+The @code{ROWTYPE_} variable in each case of a matrix file indicates
+its row type.  
+
+The supported matrix row types are listed below.  Each type is listed
+with the keyword that identifies it in @code{ROWTYPE_}.  All supported
+types of matrices are square, meaning that each matrix must include
+one row per continuous variable, with the @code{VARNAME_} variable
+indicating each continuous variable in turn in the same order as the
+dictionary.
+
+@table @code
+@item CORR
+Correlation coefficients.
+
+@item COV
+Covariance coefficients.
+
+@item MAT
+General-purpose matrix.
+
+@item N_MATRIX
+Counts.
+
+@item PROX
+Proximities matrix.
+@end table
+
+The supported vector row types are listed below, along with their
+associated keyword.  Vector row types only require a single row, whose
+@code{VARNAME_} is blank:
+
+@table @code
+@item COUNT
+Unweighted counts.
+
+@item DFE
+Degrees of freedom.
+
+@item MEAN
+Means.
+
+@item MSE
+Mean squared errors.
+
+@item N
+Counts.
+
+@item STDDEV
+Standard deviations.
+@end table
+
+Only the row types listed above may appear in matrix files.  The
+@cmd{MATRIX DATA} command, however, accepts the additional row types
+listed below, which it changes into matrix file row types as part of
+its conversion process:
+
+@table @code
+@item N_VECTOR
+Synonym for @code{N}.
+
+@item SD
+Synonym for @code{STDDEV}.
+
+@item N_SCALAR
+Accepts a single number from the @code{MATRIX DATA} input and writes
+it as an @code{N} row with the number replicated across all the
+continuous variables.
+@end table
+
+@node MATRIX DATA
+@section MATRIX DATA
+@vindex MATRIX DATA
+
+@display
+MATRIX DATA
+        VARIABLES=@var{variables}
+        [FILE=@{'@var{file_name}' | INLINE@}]
+        [/FORMAT=[@{LIST | FREE@}]
+                 [@{UPPER | LOWER | FULL@}]
+                 [@{DIAGONAL | NODIAGONAL@}]]
+        [/SPLIT=@var{split_vars}]
+        [/FACTORS=@var{factor_vars}]
+        [/N=@var{n}]
+
+The following subcommands are only needed when ROWTYPE_ is not
+specified on the VARIABLES subcommand:
+        [/CONTENTS=@{CORR,COUNT,COV,DFE,MAT,MEAN,MSE,
+                    N_MATRIX,N|N_VECTOR,N_SCALAR,PROX,SD|STDDEV@}]
+        [/CELLS=@var{n_cells}]
+@end display
+
+The @cmd{MATRIX DATA} command converts matrices and vectors from text
+format into the matrix file format (@pxref{Matrix Files}) for use by
+procedures that read matrices.  It reads a text file or inline data
+and outputs to the active file, replacing any data already in the
+active dataset.  The matrix file may then be used by other commands
+directly from the active file, or it may be written to a @file{.sav}
+file using the @cmd{SAVE} command.
+
+The text data read by @cmd{MATRIX DATA} can be delimited by spaces or
+commas.  A plus or minus sign, except immediately following a @samp{d}
+or @samp{e}, also begins a new value.  Optionally, values may be
+enclosed in single or double quotes.
+
+@cmd{MATRIX DATA} can read the types of matrix and vector data
+supported in matrix files (@pxref{Matrix File Row Types}).
+
+The @subcmd{FILE} subcommand specifies the source of the command's
+input.  To read input from a text file, specify its name in quotes.
+To supply input inline, omit @subcmd{FILE} or specify @code{INLINE}.
+Inline data must directly follow @code{MATRIX DATA}, inside @cmd{BEGIN
+DATA} (@pxref{BEGIN DATA}).
+
+@subcmd{VARIABLES} is the only required subcommand.  It names the
+variables present in each input record in the order that they appear.
+(@cmd{MATRIX DATA} reorders the variables in the matrix file it
+produces, if needed to fit the matrix file format.)  The variable list
+must include split variables and factor variables, if they are present
+in the data, in addition to the continuous variables that form matrix
+rows and columns.  It may also include a special variable named
+@code{ROWTYPE_}.
+
+Matrix data may include split variables or factor variables or both.
+List split variables, if any, on the @subcmd{SPLIT} subcommand and
+factor variables, if any, on the @subcmd{FACTORS} subcommand.  Split
+and factor variables must be numeric.  Split and factor variables must
+also be listed on @subcmd{VARIABLES}, with one exception: if
+@subcmd{VARIABLES} does not include @code{ROWTYPE_}, then
+@subcmd{SPLIT} may name a single variable that is not in
+@subcmd{VARIABLES} (@pxref{MATRIX DATA Example 8}).
+
+The @subcmd{FORMAT} subcommand accepts settings to describe the format
+of the input data:
+
+@table @asis
+@item @code{LIST} (default)
+@itemx @code{FREE}
+LIST requires each row to begin at the start of a new input line.
+FREE allows rows to begin in the middle of a line.  Either setting
+allows a single row to continue across multiple input lines.
+
+@item @code{LOWER} (default)
+@itemx @code{UPPER}
+@itemx @code{FULL}
+With LOWER, only the lower triangle is read from the input data and
+the upper triangle is mirrored across the main diagonal.  UPPER
+behaves similarly for the upper triangle.  FULL reads the entire
+matrix.
+
+@item @code{DIAGONAL} (default)
+@itemx @code{NODIAGONAL}
+With DIAGONAL, the main diagonal is read from the input data.  With
+NODIAGONAL, which is incompatible with FULL, the main diagonal is not
+read from the input data but instead set to 1 for correlation matrices
+and system-missing for others.
+@end table
+
+The @subcmd{N} subcommand is a way to specify the size of the
+population.  It is equivalent to specifying an @code{N} vector with
+the specified value for each split file.
+
+@cmd{MATRIX DATA} supports two different ways to indicate the kinds of
+matrices and vectors present in the data, depending on whether a
+variable with the special name @code{ROWTYPE_} is present in
+@code{VARIABLES}.  The following subsections explain @cmd{MATRIX DATA}
+syntax and behavior in each case.
+
+@node MATRIX DATA with ROWTYPE_
+@subsection With @code{ROWTYPE_}
+
+If @code{VARIABLES} includes @code{ROWTYPE_}, each case's
+@code{ROWTYPE_} indicates the type of data contained in the row.
+@xref{Matrix File Row Types}, for a list of supported row types.
+
+@subsubheading Example 1: Defaults with @code{ROWTYPE_}
+@anchor{MATRIX DATA Example 1}
+
+This example shows a simple use of @cmd{MATRIX DATA} with
+@code{ROWTYPE_} plus 8 variables named @code{var01} through
+@code{var08}.
+
+Because @code{ROWTYPE_} is the first variable in @subcmd{VARIABLES},
+it appears first on each line. The first three lines in the example
+data have @code{ROWTYPE_} values of @samp{MEAN}, @samp{SD}, and
+@samp{N}.  These indicate that these lines contain vectors of means,
+standard deviations, and counts, respectively, for @code{var01}
+through @code{var08} in order.
+
+The remaining 8 lines have a @code{ROWTYPE_} of @samp{CORR}, which indicates
+that the values are correlation coefficients.  Each of the lines
+corresponds to a row in the correlation matrix: the first line is for
+@code{var01}, the next line for @code{var02}, and so on.  The input
+only contains values for the lower triangle, including the diagonal,
+since @code{FORMAT=LOWER DIAGONAL} is the default.
+
+With @code{ROWTYPE_}, the @code{CONTENTS} subcommand is optional and
+the @code{CELLS} subcommand may not be used.
+
+@example
+MATRIX DATA
+    VARIABLES=ROWTYPE_ var01 TO var08.
+BEGIN DATA.
+MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+N       92    92    92    92    92    92    92    92
+CORR  1.00
+CORR   .18  1.00
+CORR  -.22  -.17  1.00
+CORR   .36   .31  -.14  1.00
+CORR   .27   .16  -.12   .22  1.00
+CORR   .33   .15  -.17   .24   .21  1.00
+CORR   .50   .29  -.20   .32   .12   .38  1.00
+CORR   .17   .29  -.05   .20   .27   .20   .04  1.00
+END DATA.
+@end example
+
+@subsubheading Example 2: @code{FORMAT=UPPER NODIAGONAL}
+
+This syntax produces the same matrix file as example 1, but it uses
+@code{FORMAT=UPPER NODIAGONAL} to specify the upper triangle and omit
+the diagonal.  Because the matrix's @code{ROWTYPE_} is @code{CORR},
+@pspp{} automatically fills in the diagonal with 1.
+
+@example
+MATRIX DATA
+    VARIABLES=ROWTYPE_ var01 TO var08
+    /FORMAT=UPPER NODIAGONAL.
+BEGIN DATA.
+MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+N       92    92    92    92    92    92    92    92
+CORR         .17   .50  -.33   .27   .36  -.22   .18
+CORR               .29   .29  -.20   .32   .12   .38
+CORR                     .05   .20  -.15   .16   .21
+CORR                           .20   .32  -.17   .12
+CORR                                 .27   .12  -.24
+CORR                                      -.20  -.38
+CORR                                             .04
+END DATA.
+@end example
+
+@subsubheading Example 3: @subcmd{N} subcommand
+
+This syntax uses the @subcmd{N} subcommand in place of an @code{N}
+vector.  It produces the same matrix file as examples 1 and 2.
+
+@example
+MATRIX DATA
+    VARIABLES=ROWTYPE_ var01 TO var08
+    /FORMAT=UPPER NODIAGONAL
+    /N 92.
+BEGIN DATA.
+MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+CORR         .17   .50  -.33   .27   .36  -.22   .18
+CORR               .29   .29  -.20   .32   .12   .38
+CORR                     .05   .20  -.15   .16   .21
+CORR                           .20   .32  -.17   .12
+CORR                                 .27   .12  -.24
+CORR                                      -.20  -.38
+CORR                                             .04
+END DATA.
+@end example
+
+@subsubheading Example 4: Split variables
+@anchor{MATRIX DATA Example 4}
+
+This syntax defines two matrices, using the variable @samp{s1} to
+distinguish between them.  Notice how the order of variables in the
+input matches their order on @subcmd{VARIABLES}.  This example also
+uses @code{FORMAT=FULL}.
+
+@example
+MATRIX DATA
+    VARIABLES=s1 ROWTYPE_  var01 TO var04
+    /SPLIT=s1
+    /FORMAT=FULL.
+BEGIN DATA.
+0 MEAN 34 35 36 37
+0 SD   22 11 55 66
+0 N    99 98 99 92
+0 CORR  1 .9 .8 .7
+0 CORR .9  1 .6 .5
+0 CORR .8 .6  1 .4
+0 CORR .7 .5 .4  1
+1 MEAN 44 45 34 39
+1 SD   23 15 51 46
+1 N    98 34 87 23
+1 CORR  1 .2 .3 .4
+1 CORR .2  1 .5 .6
+1 CORR .3 .5  1 .7
+1 CORR .4 .6 .7  1
+END DATA.
+@end example
+
+@subsubheading Example 5: Factor variables
+@anchor{MATRIX DATA Example 5}
+
+This syntax defines a matrix file that includes a factor variable
+@samp{f1}.  The data includes mean, standard deviation, and count
+vectors for two values of the factor variable, plus a correlation
+matrix for pooled data.
+
+@example
+MATRIX DATA
+    VARIABLES=ROWTYPE_ f1 var01 TO var04
+    /FACTOR=f1.
+BEGIN DATA.
+MEAN 0 34 35 36 37
+SD   0 22 11 55 66
+N    0 99 98 99 92
+MEAN 1 44 45 34 39
+SD   1 23 15 51 46
+N    1 98 34 87 23
+CORR .  1
+CORR . .9  1
+CORR . .8 .6  1
+CORR . .7 .5 .4  1
+END DATA.
+@end example
+
+@node MATRIX DATA without ROWTYPE_
+@subsection Without @code{ROWTYPE_}
+
+If @code{VARIABLES} does not contain @code{ROWTYPE_}, the
+@subcmd{CONTENTS} subcommand defines the row types that appear in the
+file and their order.  If @subcmd{CONTENTS} is omitted,
+@code{CONTENTS=CORR} is assumed.
+
+Factor variables without @code{ROWTYPE_} introduce special
+requirements, illustrated below in Examples 9 and 10.
+
+@subsubheading Example 6: Defaults without @code{ROWTYPE_}
+
+This example shows a simple use of @cmd{MATRIX DATA} with 8 variables
+named @code{var01} through @code{var08}, without @code{ROWTYPE_}.
+This yields the same matrix file as Example 1 (@pxref{MATRIX DATA
+Example 1}).
+
+@example
+MATRIX DATA
+    VARIABLES=var01 TO var08
+    /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+ 5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+  92    92    92    92    92    92    92    92
+1.00
+ .18  1.00
+-.22  -.17  1.00
+ .36   .31  -.14  1.00
+ .27   .16  -.12   .22  1.00
+ .33   .15  -.17   .24   .21  1.00
+ .50   .29  -.20   .32   .12   .38  1.00
+ .17   .29  -.05   .20   .27   .20   .04  1.00
+END DATA.
+@end example
+
+@subsubheading Example 7: Split variables with explicit values
+
+This syntax defines two matrices, using the variable @code{s1} to
+distinguish between them.  Each line of data begins with @code{s1}.
+This yields the same matrix file as Example 4 (@pxref{MATRIX DATA
+Example 4}).
+
+@example
+MATRIX DATA
+    VARIABLES=s1 var01 TO var04
+    /SPLIT=s1
+    /FORMAT=FULL
+    /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+0  1 .9 .8 .7
+0 .9  1 .6 .5
+0 .8 .6  1 .4
+0 .7 .5 .4  1
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+1  1 .2 .3 .4
+1 .2  1 .5 .6
+1 .3 .5  1 .7
+1 .4 .6 .7  1
+END DATA.
+@end example
+
+@subsubheading Example 8: Split variable with sequential values
+@anchor{MATRIX DATA Example 8}
+
+Like the previous example, this syntax defines two matrices with
+split variable @code{s1}.  In this case, though, @code{s1} is not
+listed in @subcmd{VARIABLES}, which means that its value does not
+appear in the data.  Instead, @cmd{MATRIX DATA} reads matrix data
+until the input is exhausted, supplying 1 for the first split, 2 for
+the second, and so on.
+
+@example
+MATRIX DATA
+    VARIABLES=var01 TO var04
+    /SPLIT=s1
+    /FORMAT=FULL
+    /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+34 35 36 37
+22 11 55 66
+99 98 99 92
+ 1 .9 .8 .7
+.9  1 .6 .5
+.8 .6  1 .4
+.7 .5 .4  1
+44 45 34 39
+23 15 51 46
+98 34 87 23
+ 1 .2 .3 .4
+.2  1 .5 .6
+.3 .5  1 .7
+.4 .6 .7  1
+END DATA.
+@end example
+
+@subsubsection Factor variables without @code{ROWTYPE_}
+
+Without @code{ROWTYPE_}, factor variables introduce two new wrinkles
+to @cmd{MATRIX DATA} syntax.  First, the @subcmd{CELLS} subcommand
+must declare the number of combinations of factor variables present in
+the data.  If there is, for example, one factor variable for which the
+data contains three values, one would write @code{CELLS=3}; if there
+are two (or more) factor variables for which the data contains five
+combinations, one would use @code{CELLS=5}; and so on.
+
+Second, the @subcmd{CONTENTS} subcommand must distinguish within-cell
+data from pooled data by enclosing within-cell row types in
+parentheses.  When different within-cell row types for a single factor
+appear in subsequent lines, enclose the row types in a single set of
+parentheses; when different factors' values for a given within-cell
+row type appear in subsequent lines, enclose each row type in
+individual parentheses.
+
+Without @code{ROWTYPE_}, input lines for pooled data do not include
+factor values, not even as missing values, but input lines for
+within-cell data do.
+
+The following examples aim to clarify this syntax.
+
+@subsubheading Example 9: Factor variables, grouping within-cell records by factor
+
+This syntax defines the same matrix file as Example 5 (@pxref{MATRIX
+DATA Example 5}), without using @code{ROWTYPE_}.  It declares
+@code{CELLS=2} because the data contains two values (0 and 1) for
+factor variable @code{f1}.  Within-cell vector row types @code{MEAN},
+@code{SD}, and @code{N} are in a single set of parentheses on
+@subcmd{CONTENTS} because they are grouped together in subsequent
+lines for a single factor value.  The data lines with the pooled
+correlation matrix do not have any factor values.
+
+@example
+MATRIX DATA
+    VARIABLES=f1 var01 TO var04
+    /FACTOR=f1
+    /CELLS=2
+    /CONTENTS=(MEAN SD N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+   1
+  .9  1
+  .8 .6  1
+  .7 .5 .4  1
+END DATA.
+@end example
+
+@subsubheading Example 10: Factor variables, grouping within-cell records by row type
+
+This syntax defines the same matrix file as the previous example.  The
+only difference is that the within-cell vector rows are grouped
+differently: two rows of means (one for each factor), followed by two
+rows of standard deviations, followed by two rows of counts.
+
+@example
+MATRIX DATA
+    VARIABLES=f1 var01 TO var04
+    /FACTOR=f1
+    /CELLS=2
+    /CONTENTS=(MEAN) (SD) (N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+1 44 45 34 39
+0 22 11 55 66
+1 23 15 51 46
+0 99 98 99 92
+1 98 34 87 23
+   1
+  .9  1
+  .8 .6  1
+  .7 .5 .4  1
+END DATA.
+@end example
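The FORMAT settings documented in doc/matrices.texi above determine which cells of a square matrix each input row supplies. The following is a minimal sketch of that bookkeeping in C, not the code from this commit: `plan_rows` is a hypothetical helper, and its `row_sched` struct only borrows the field names (`y`, `x0`, `x1`) used in the matrix-data.c part of this diff. It assumes that a row left empty by NODIAGONAL is simply skipped.

```c
#include <assert.h>
#include <stddef.h>

enum triangle { LOWER, UPPER, FULL };
enum diagonal { DIAGONAL, NO_DIAGONAL };

/* One input row: matrix row Y supplies columns X0 (inclusive) through
   X1 (exclusive). */
struct row_sched
  {
    int y;
    int x0, x1;
  };

/* Hypothetical helper: fills RP, which must have room for N entries, with
   the rows to read for an N x N matrix.  Returns the number of rows
   planned, which is N - 1 when the diagonal is omitted. */
size_t
plan_rows (int n, enum triangle t, enum diagonal d, struct row_sched *rp)
{
  assert (n > 0);
  assert (t != FULL || d == DIAGONAL);  /* NODIAGONAL excludes FULL. */

  size_t n_rp = 0;
  for (int y = 0; y < n; y++)
    {
      int skip = d == NO_DIAGONAL;          /* 1 if cell (y,y) is omitted. */
      int x0 = t == UPPER ? y + skip : 0;
      int x1 = t == LOWER ? y + 1 - skip : n;
      if (x0 < x1)                          /* Skip rows with no cells. */
        rp[n_rp++] = (struct row_sched) { .y = y, .x0 = x0, .x1 = x1 };
    }
  return n_rp;
}
```

For the 8-variable matrices in Examples 1 and 2, this plan yields 8 rows for LOWER DIAGONAL and 7 rows for UPPER NODIAGONAL, which matches the number of CORR lines in each example.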
diff --git a/doc/pspp.texi b/doc/pspp.texi

index e107be4aa12ae8fc79fa3e05a65698391e2ba5a6..6c4771e944da8ab3dcecb0150056b926e48d2fdf 100644
--- a/doc/pspp.texi
+++ b/doc/pspp.texi
@@ -178,6 +178,7 @@ Free Documentation License".
  * Data Selection::              Select certain cases for analysis.
  * Conditionals and Looping::    Doing things many times or not at all.
  * Statistics::                  Basic statistical procedures.
+* Matrices::                    Matrix operations and transformations.
  * Utilities::                   Other commands.
  
  * Invoking pspp-convert::       Utility for converting among file formats.
@@ -208,6 +209,7 @@ Free Documentation License".
  @include data-selection.texi
  @include flow-control.texi
  @include statistics.texi
+@include matrices.texi
  @include utilities.texi
  
  @include pspp-convert.texi
diff --git a/src/language/command.def b/src/language/command.def

index 63df224bde598e249819da9ad500f81207e2fbb0..f47c0f586630d51119cd4824e996b2d843be0280 100644
--- a/src/language/command.def
+++ b/src/language/command.def
@@ -53,7 +53,7 @@ DEF_CMD (S_INITIAL | S_DATA, 0, "GET DATA", cmd_get_data)
  DEF_CMD (S_INITIAL | S_DATA, 0, "IMPORT", cmd_import)
  DEF_CMD (S_INITIAL | S_DATA, 0, "INPUT PROGRAM", cmd_input_program)
  DEF_CMD (S_INITIAL | S_DATA, 0, "MATCH FILES", cmd_match_files)
-DEF_CMD (S_INITIAL | S_DATA | S_INPUT_PROGRAM | S_FILE_TYPE, 0, "MATRIX DATA", cmd_matrix)
+DEF_CMD (S_INITIAL | S_DATA, 0, "MATRIX DATA", cmd_matrix_data)
  DEF_CMD (S_INITIAL | S_DATA, 0, "UPDATE", cmd_update)
  DEF_CMD (S_INITIAL | S_DATA, 0, "DATASET ACTIVATE", cmd_dataset_activate)
  DEF_CMD (S_INITIAL | S_DATA, 0, "DATASET DECLARE", cmd_dataset_declare)
@@ -163,6 +163,7 @@ DEF_CMD (S_ANY, F_TESTING, "DEBUG PAPER SIZE", cmd_debug_paper_size)
  DEF_CMD (S_ANY, F_TESTING, "DEBUG POOL", cmd_debug_pool)
  DEF_CMD (S_ANY, F_TESTING, "DEBUG FLOAT FORMAT", cmd_debug_float_format)
  DEF_CMD (S_ANY, F_TESTING, "DEBUG XFORM FAIL", cmd_debug_xform_fail)
+DEF_CMD (S_DATA, F_TESTING, "DEBUG MATRIX READ", cmd_debug_matrix_read)
  
  /* Unimplemented commands. */
  UNIMPL_CMD ("2SLS", "Two stage least squares regression")
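The matrix-data.c diff that follows declares its row types once, in a ROWTYPES list macro, and expands that list several times to build an enum, a count, and lookup tables. A reduced, self-contained sketch of the same X-macro pattern is given here. The SHAPES list and the `shape_*` names are invented for illustration and do not appear in PSPP.

```c
#include <assert.h>

/* The single source of truth: each entry is (name, number of dimensions). */
#define SHAPES                  \
    SHAPE(MATRIX, 2)            \
    SHAPE(VECTOR, 1)            \
    SHAPE(SCALAR, 0)

/* Expansion 1: one enumerator per entry. */
enum shape
  {
#define SHAPE(NAME, DIMS) S_##NAME,
    SHAPES
#undef SHAPE
  };

/* Expansion 2: each entry contributes "+1", so the sum is the count. */
enum
  {
#define SHAPE(NAME, DIMS) +1
    N_SHAPES = SHAPES
#undef SHAPE
  };

/* Expansion 3: a table of dimensions indexed by enumerator. */
int
shape_dimensions (enum shape s)
{
  static const int dims[N_SHAPES] = {
#define SHAPE(NAME, DIMS) [S_##NAME] = DIMS,
    SHAPES
#undef SHAPE
  };
  assert ((int) s >= 0 && (int) s < N_SHAPES);
  return dims[s];
}

/* Expansion 4: a table of names, produced by stringizing NAME. */
const char *
shape_name (enum shape s)
{
  static const char *const names[N_SHAPES] = {
#define SHAPE(NAME, DIMS) [S_##NAME] = #NAME,
    SHAPES
#undef SHAPE
  };
  assert ((int) s >= 0 && (int) s < N_SHAPES);
  return names[s];
}
```

Adding an entry to the list updates the enum, the count, and both tables together, which is presumably why the patch keeps all row types in one macro.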
diff --git a/src/language/data-io/matrix-data.c b/src/language/data-io/matrix-data.c

index b41f47216b32a2d86e46345aaca5e3d757be74f3..c0f5979bcd268e215b2930b7e30a60bbfceff332 100644
--- a/src/language/data-io/matrix-data.c
+++ b/src/language/data-io/matrix-data.c
@@ -16,12 +16,17 @@
  
  #include <config.h>
  
+#include <gsl/gsl_matrix.h>
+#include <gsl/gsl_vector.h>
+
  #include "data/case.h"
  #include "data/casereader.h"
  #include "data/casewriter.h"
+#include "data/data-in.h"
  #include "data/dataset.h"
  #include "data/dictionary.h"
  #include "data/format.h"
+#include "data/short-names.h"
  #include "data/transformations.h"
  #include "data/variable.h"
  #include "language/command.h"
@@ -32,493 +37,949 @@
  #include "language/data-io/placement-parser.h"
  #include "language/lexer/lexer.h"
  #include "language/lexer/variable-parser.h"
+#include "libpspp/assertion.h"
  #include "libpspp/i18n.h"
  #include "libpspp/message.h"
-#include "libpspp/misc.h"
+#include "libpspp/str.h"
  
+#include "gl/c-ctype.h"
+#include "gl/minmax.h"
  #include "gl/xsize.h"
  #include "gl/xalloc.h"
  
  #include "gettext.h"
  #define _(msgid) gettext (msgid)
  \f
-/* DATA LIST transformation data. */
-struct data_list_trns
+#define ROWTYPES                                \
+    /* Matrix row types. */                     \
+    RT(CORR,     2)                             \
+    RT(COV,      2)                             \
+    RT(MAT,      2)                             \
+    RT(N_MATRIX, 2)                             \
+    RT(PROX,     2)                             \
+                                                \
+    /* Vector row types. */                     \
+    RT(COUNT,    1)                             \
+    RT(DFE,      1)                             \
+    RT(MEAN,     1)                             \
+    RT(MSE,      1)                             \
+    RT(STDDEV,   1)                             \
+    RT(N, 1)                                    \
+                                                \
+    /* Scalar row types. */                     \
+    RT(N_SCALAR, 0)
+
+enum rowtype
    {
-    struct data_parser *parser; /* Parser. */
-    struct dfm_reader *reader;  /* Data file reader. */
-    struct variable *end;      /* Variable specified on END subcommand. */
+#define RT(NAME, DIMS) C_##NAME,
+    ROWTYPES
+#undef RT
    };
  
-static trns_free_func data_list_trns_free;
-static trns_proc_func data_list_trns_proc;
-
-enum diagonal
+enum
    {
-    DIAGONAL,
-    NO_DIAGONAL
+#define RT(NAME, DIMS) +1
+    N_ROWTYPES = ROWTYPES
+#undef RT
    };
+verify (N_ROWTYPES < 32);
  
-enum triangle
-  {
-    LOWER,
-    UPPER,
-    FULL
+/* Returns the number of dimensions in the indexes for row type RT.  A matrix
+   has 2 dimensions, a vector has 1, a scalar has 0. */
+static int
+rowtype_dimensions (enum rowtype rt)
+{
+  static const int rowtype_dims[N_ROWTYPES] = {
+#define RT(NAME, DIMS) [C_##NAME] = DIMS,
+    ROWTYPES
+#undef RT
    };
+  return rowtype_dims[rt];
+}
  
-static const int ROWTYPE_WIDTH = 8;
+static struct substring
+rowtype_name (enum rowtype rt)
+{
+  static const struct substring rowtype_names[N_ROWTYPES] = {
+#define RT(NAME, DIMS) [C_##NAME] = SS_LITERAL_INITIALIZER (#NAME),
+    ROWTYPES
+#undef RT
+  };
  
-struct matrix_format
+  return rowtype_names[rt];
+}
+
+static bool
+rowtype_from_string (struct substring token, enum rowtype *rt)
+{
+  ss_trim (&token, ss_cstr (CC_SPACES));
+  for (size_t i = 0; i < N_ROWTYPES; i++)
+    if (lex_id_match (rowtype_name (i), token))
+      {
+        *rt = i;
+        return true;
+      }
+
+  if (lex_id_match (ss_cstr ("N_VECTOR"), token))
+    {
+      *rt = C_N;
+      return true;
+    }
+  else if (lex_id_match (ss_cstr ("SD"), token))
+    {
+      *rt = C_STDDEV;
+      return true;
+    }
+
+  return false;
+}
+
+static bool
+rowtype_parse (struct lexer *lexer, enum rowtype *rt)
  {
-  enum triangle triangle;
-  enum diagonal diagonal;
-  const struct variable *rowtype;
-  const struct variable *varname;
-  int n_continuous_vars;
-  struct variable **split_vars;
-  size_t n_split_vars;
-  long n;
-};
-
-/*
-valid rowtype_ values:
-  CORR,
-  COV,
-  MAT,
-
-
-  MSE,
-  DFE,
-  MEAN,
-  STDDEV (or SD),
-  N_VECTOR (or N),
-  N_SCALAR,
-  N_MATRIX,
-  COUNT,
-  PROX.
-*/
-
-/* Sets the value of OUTCASE which corresponds to VNAME
-   to the value STR.  VNAME must be of type string.
- */
+  bool parsed = (lex_token (lexer) == T_ID
+                 && rowtype_from_string (lex_tokss (lexer), rt));
+  if (parsed)
+    lex_get (lexer);
+  return parsed;
+}
+\f
+struct matrix_format
+  {
+    bool span;
+    enum triangle
+      {
+        LOWER,
+        UPPER,
+        FULL
+      }
+    triangle;
+    enum diagonal
+      {
+        DIAGONAL,
+        NO_DIAGONAL
+      }
+    diagonal;
+
+    bool input_rowtype;
+    struct variable **input_vars;
+    size_t n_input_vars;
+
+    /* How to read matrices with each possible number of dimensions (0=scalar,
+       1=vector, 2=matrix). */
+    struct matrix_sched
+      {
+        /* Number of rows and columns in the matrix: (1,1) for a scalar, (1,n) for
+           a vector, (n,n) for a matrix. */
+        int nr, nc;
+
+        /* Rows of data to read and the number of columns in each.  Because we
+           often read just a triangle and sometimes omit the diagonal, 'n_rp' can
+           be less than 'nr' and 'rp[i].y' isn't always 'i'. */
+        struct row_sched
+          {
+            /* The y-value of the row inside the matrix. */
+            int y;
+
+            /* First (inclusive) and last (exclusive) columns to read in this
+               row. */
+            int x0, x1;
+          }
+          *rp;
+        size_t n_rp;
+      }
+    ms[3];
+
+    struct variable *rowtype;
+    struct variable *varname;
+    struct variable **cvars;
+    int n_cvars;
+    struct variable **svars;
+    size_t *svar_indexes;
+    size_t n_svars;
+    struct variable **fvars;
+    size_t *fvar_indexes;
+    size_t n_fvars;
+    int cells;
+    int n;
+
+    unsigned int pooled_rowtype_mask;
+    unsigned int factor_rowtype_mask;
+
+    struct content
+      {
+        bool open;
+        enum rowtype rowtype;
+        bool close;
+      }
+    *contents;
+    size_t n_contents;
+  };
+
  static void
-set_varname_column (struct ccase *outcase, const struct variable *vname,
-     const char *str)
+matrix_format_uninit (struct matrix_format *mf)
  {
-  int len = var_get_width (vname);
-  uint8_t *s = case_str_rw (outcase, vname);
+  free (mf->input_vars);
+  for (int i = 0; i < 3; i++)
+    free (mf->ms[i].rp);
+  free (mf->cvars);
+  free (mf->svars);
+  free (mf->svar_indexes);
+  free (mf->fvars);
+  free (mf->fvar_indexes);
+  free (mf->contents);
+}
  
-  strncpy (CHAR_CAST (char *, s), str, len);
+static void
+set_string (struct ccase *outcase, const struct variable *var,
+            struct substring src)
+{
+  struct substring dst = case_ss (outcase, var);
+  for (size_t i = 0; i < dst.length; i++)
+    dst.string[i] = i < src.length ? src.string[i] : ' ';
  }
  
  static void
-blank_varname_column (struct ccase *outcase, const struct variable *vname)
+parse_msg (struct dfm_reader *reader, const struct substring *token,
+           char *text, enum msg_severity severity)
+{
+  int first_column = 0;
+  if (token)
+    {
+      struct substring line = dfm_get_record (reader);
+      if (token->string >= line.string && token->string < ss_end (line))
+        first_column = ss_pointer_to_position (line, token->string) + 1;
+    }
+
+  int line_number = dfm_get_line_number (reader);
+  struct msg_location *location = xmalloc (sizeof *location);
+  *location = (struct msg_location) {
+    .file_name = xstrdup (dfm_get_file_name (reader)),
+    .first_line = line_number,
+    .last_line = line_number + 1,
+    .first_column = first_column,
+    .last_column = first_column ? first_column + token->length : 0,
+  };
+  struct msg *m = xmalloc (sizeof *m);
+  *m = (struct msg) {
+    .category = MSG_C_DATA,
+    .severity = severity,
+    .location = location,
+    .text = text,
+  };
+  msg_emit (m);
+}
+
+static void PRINTF_FORMAT (3, 4)
+parse_warning (struct dfm_reader *reader, const struct substring *token,
+               const char *format, ...)
  {
-  int len = var_get_width (vname);
-  uint8_t *s = case_str_rw (outcase, vname);
+  va_list args;
+  va_start (args, format);
+  parse_msg (reader, token, xvasprintf (format, args), MSG_S_WARNING);
+  va_end (args);
+}
  
-  memset (s, ' ', len);
+static void PRINTF_FORMAT (3, 4)
+parse_error (struct dfm_reader *reader, const struct substring *token,
+             const char *format, ...)
+{
+  va_list args;
+  va_start (args, format);
+  parse_msg (reader, token, xvasprintf (format, args), MSG_S_ERROR);
+  va_end (args);
  }
  
-static struct casereader *
-preprocess (struct casereader *casereader0, const struct dictionary *dict, void *aux)
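+/* Advances P to the beginning of the next token, reading further records
+   from R as needed.  Returns true if a token is available, false at end of
+   file. */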
+static bool
+more_tokens (struct substring *p, struct dfm_reader *r)
  {
-  struct matrix_format *mformat = aux;
-  const struct caseproto *proto = casereader_get_proto (casereader0);
-  struct casewriter *writer = autopaging_writer_create (proto);
-  struct ccase *prev_case = NULL;
-  double **matrices = NULL;
-  size_t n_splits = 0;
-
-  const size_t sizeof_matrix =
-    sizeof (double) * mformat->n_continuous_vars * mformat->n_continuous_vars;
-
-
-  /* Make an initial pass to populate our temporary matrix */
-  struct casereader *pass0 = casereader_clone (casereader0);
-  struct ccase *c;
-  union value *prev_values = XCALLOC (mformat->n_split_vars,  union value);
-  int row = (mformat->triangle == LOWER && mformat->diagonal == NO_DIAGONAL) ? 1 : 0;
-  bool first_case = true;
-  for (; (c = casereader_read (pass0)) != NULL; case_unref (c))
+  for (;;)
      {
-      int s;
-      bool match = false;
-      if (!first_case)
-       {
-         match = true;
-         for (s = 0; s < mformat->n_split_vars; ++s)
-           {
-             const struct variable *svar = mformat->split_vars[s];
-             const union value *sv = case_data (c, svar);
-             if (! value_equal (prev_values + s, sv, var_get_width (svar)))
-               {
-                 match = false;
-                 break;
-               }
-           }
-       }
-      first_case = false;
+      ss_ltrim (p, ss_cstr (CC_SPACES ","));
+      if (p->length)
+        return true;
+
+      dfm_forward_record (r);
+      if (dfm_eof (r))
+        return false;
+      *p = dfm_get_record (r);
+    }
+}
  
-      if (matrices == NULL || ! match)
-       {
-         row = (mformat->triangle == LOWER && mformat->diagonal == NO_DIAGONAL) ?
-           1 : 0;
+static bool
+next_token (struct substring *p, struct dfm_reader *r, struct substring *token)
+{
+  if (!more_tokens (p, r))
+    return false;
  
-         n_splits++;
-         matrices = xrealloc (matrices, sizeof (double*)  * n_splits);
-         matrices[n_splits - 1] = xmalloc (sizeof_matrix);
-       }
+  /* Collect token. */
+  int c = ss_first (*p);
+  if (c == '\'' || c == '"')
+    {
+      ss_advance (p, 1);
+      ss_get_until (p, c, token);
+    }
+  else
+    {
+      size_t n = 1;
+      for (;;)
+        {
+          c = ss_at (*p, n);
+          if (c == EOF
+              || ss_find_byte (ss_cstr (CC_SPACES ","), c) != SIZE_MAX
+              || ((c == '+' || c == '-')
+                  && ss_find_byte (ss_cstr ("dDeE"),
+                                   ss_at (*p, n - 1)) == SIZE_MAX))
+            break;
+          n++;
+        }
+      ss_get_bytes (p, n, token);
+    }
+  return true;
+}
  
-      for (s = 0; s < mformat->n_split_vars; ++s)
-       {
-         const struct variable *svar = mformat->split_vars[s];
-         const union value *sv = case_data (c, svar);
-         value_clone (prev_values + s, sv, var_get_width (svar));
-       }
+static bool
+next_number (struct substring *p, struct dfm_reader *r, double *d)
+{
+  struct substring token;
+  if (!next_token (p, r, &token))
+    return false;
+
+  union value v;
+  char *error = data_in (token, dfm_reader_get_encoding (r), FMT_F,
+                         settings_get_fmt_settings (), &v, 0, NULL);
+  if (error)
+    {
+      parse_error (r, &token, "%s", error);
+      free (error);
+    }
+  *d = v.f;
+  return true;
+}
  
-      int c_offset = (mformat->triangle == UPPER) ? row : 0;
-      if (mformat->triangle == UPPER && mformat->diagonal == NO_DIAGONAL)
-       c_offset++;
-      const union value *v = case_data (c, mformat->rowtype);
-      const char *val = CHAR_CAST (const char *, v->s);
-      if (0 == strncasecmp (val, "corr    ", ROWTYPE_WIDTH) ||
-         0 == strncasecmp (val, "cov     ", ROWTYPE_WIDTH))
-       {
-         if (row >= mformat->n_continuous_vars)
-           {
-             msg (SE,
-                  _("There are %d variable declared but the data has at least %d matrix rows."),
-                  mformat->n_continuous_vars, row + 1);
-             case_unref (c);
-             casereader_destroy (pass0);
-             free (prev_values);
-             goto error;
-           }
-         int col;
-         for (col = c_offset; col < mformat->n_continuous_vars; ++col)
-           {
-             const struct variable *var =
-               dict_get_var (dict,
-                             1 + col - c_offset +
-                             var_get_dict_index (mformat->varname));
+static bool
+next_rowtype (struct substring *p, struct dfm_reader *r, enum rowtype *rt)
+{
+  struct substring token;
+  if (!next_token (p, r, &token))
+    return false;
  
-             double e = case_data (c, var)->f;
-             if (e == SYSMIS)
-               continue;
+  if (rowtype_from_string (token, rt))
+    return true;
  
-             /* Fill in the lower triangle */
-             (matrices[n_splits-1])[col + mformat->n_continuous_vars * row] = e;
+  parse_error (r, &token, _("Unknown row type \"%.*s\"."),
+               (int) token.length, token.string);
+  return false;
+}
  
-             if (mformat->triangle != FULL)
-               /* Fill in the upper triangle */
-               (matrices[n_splits-1]) [row + mformat->n_continuous_vars * col] = e;
-           }
-         row++;
-       }
-    }
-  casereader_destroy (pass0);
-  free (prev_values);
+struct read_matrix_params
+  {
+    /* Adjustments to first and last row to read. */
+    int dy0, dy1;
  
-  if (!matrices)
-    goto error;
+    /* Left and right columns to read in first row, inclusive.
+       For x1, INT_MAX is the rightmost column. */
+    int x0, x1;
  
-  /* Now make a second pass to fill in the other triangle from our
-     temporary matrix */
-  const int idx = var_get_dict_index (mformat->varname);
-  row = 0;
+    /* Adjustment to x0 and x1 for each subsequent row we read.  Each of these
+       is 0 to keep it the same or -1 or +1 to adjust it by that much. */
+    int dx0, dx1;
+  };
  
-  if (mformat->n >= 0)
+static const struct read_matrix_params *
+get_read_matrix_params (const struct matrix_format *mf)
+{
+  if (mf->triangle == FULL)
      {
-      int col;
-      struct ccase *outcase = case_create (proto);
-      union value *v = case_data_rw (outcase, mformat->rowtype);
-      memcpy (v->s, "N       ", ROWTYPE_WIDTH);
-      blank_varname_column (outcase, mformat->varname);
-      for (col = 0; col < mformat->n_continuous_vars; ++col)
-       {
-         union value *dest_val =
-           case_data_rw_idx (outcase,
-                             1 + col + var_get_dict_index (mformat->varname));
-         dest_val->f = mformat->n;
-       }
-      casewriter_write (writer, outcase);
+      /* 1 2 3 4
+         2 1 5 6
+         3 5 1 7
+         4 6 7 1 */
+      static const struct read_matrix_params rmp = { 0, 0, 0, INT_MAX, 0, 0 };
+      return &rmp;
+    }
+  else if (mf->triangle == LOWER)
+    {
+      if (mf->diagonal == DIAGONAL)
+        {
+          /* 1 . . .
+             2 1 . .
+             3 5 1 .
+             4 6 7 1 */
+          static const struct read_matrix_params rmp = { 0, 0, 0, 0, 0, 1 };
+          return &rmp;
+        }
+      else
+        {
+          /* . . . .
+             2 . . .
+             3 5 . .
+             4 6 7 . */
+          static const struct read_matrix_params rmp = { 1, 0, 0, 0, 0, 1 };
+          return &rmp;
+        }
      }
+  else if (mf->triangle == UPPER)
+    {
+      if (mf->diagonal == DIAGONAL)
+        {
+          /* 1 2 3 4
+             . 1 5 6
+             . . 1 7
+             . . . 1 */
+          static const struct read_matrix_params rmp = { 0, 0, 0, INT_MAX, 1, 0 };
+          return &rmp;
+        }
+      else
+        {
+          /* . 2 3 4
+             . . 5 6
+             . . . 7
+             . . . . */
+          static const struct read_matrix_params rmp = { 0, -1, 1, INT_MAX, 1, 0 };
+          return &rmp;
+        }
+    }
+  else
+    NOT_REACHED ();
+}
  
-  n_splits = 0;
-  prev_values = xcalloc (mformat->n_split_vars, sizeof *prev_values);
-  first_case = true;
-  for (; (c = casereader_read (casereader0)) != NULL; prev_case = c)
+static void
+schedule_matrices (struct matrix_format *mf)
+{
+  struct matrix_sched *ms0 = &mf->ms[0];
+  ms0->nr = 1;
+  ms0->nc = 1;
+  ms0->rp = xmalloc (sizeof *ms0->rp);
+  ms0->rp[0] = (struct row_sched) { .y = 0, .x0 = 0, .x1 = 1 };
+  ms0->n_rp = 1;
+
+  struct matrix_sched *ms1 = &mf->ms[1];
+  ms1->nr = 1;
+  ms1->nc = mf->n_cvars;
+  ms1->rp = xmalloc (sizeof *ms1->rp);
+  ms1->rp[0] = (struct row_sched) { .y = 0, .x0 = 0, .x1 = mf->n_cvars };
+  ms1->n_rp = 1;
+
+  struct matrix_sched *ms2 = &mf->ms[2];
+  ms2->nr = mf->n_cvars;
+  ms2->nc = mf->n_cvars;
+  ms2->rp = xmalloc (mf->n_cvars * sizeof *ms2->rp);
+  ms2->n_rp = 0;
+
+  const struct read_matrix_params *rmp = get_read_matrix_params (mf);
+  int x0 = rmp->x0;
+  int x1 = rmp->x1 < mf->n_cvars ? rmp->x1 : mf->n_cvars - 1;
+  int y0 = rmp->dy0;
+  int y1 = (int) mf->n_cvars + rmp->dy1;
+  for (int y = y0; y < y1; y++)
      {
-      int s;
-      bool match = false;
-      if (!first_case)
-       {
-         match = true;
-         for (s = 0; s < mformat->n_split_vars; ++s)
-           {
-             const struct variable *svar = mformat->split_vars[s];
-             const union value *sv = case_data (c, svar);
-             if (! value_equal (prev_values + s, sv, var_get_width (svar)))
-               {
-                 match = false;
-                 break;
-               }
-           }
-       }
-      first_case = false;
-      if (! match)
-       {
-         n_splits++;
-         row = 0;
-       }
+      assert (x0 >= 0 && x0 < mf->n_cvars);
+      assert (x1 >= 0 && x1 < mf->n_cvars);
+      assert (x1 >= x0);
  
-      for (s = 0; s < mformat->n_split_vars; ++s)
-       {
-         const struct variable *svar = mformat->split_vars[s];
-         const union value *sv = case_data (c, svar);
-         value_clone (prev_values + s, sv, var_get_width (svar));
-       }
+      ms2->rp[ms2->n_rp++] = (struct row_sched) {
+        .y = y, .x0 = x0, .x1 = x1 + 1
+      };
  
-      case_unref (prev_case);
-      const union value *v = case_data (c, mformat->rowtype);
-      const char *val = CHAR_CAST (const char *, v->s);
-      if (mformat->n >= 0)
-       {
-         if (0 == strncasecmp (val, "n       ", ROWTYPE_WIDTH) ||
-             0 == strncasecmp (val, "n_vector", ROWTYPE_WIDTH))
-           {
-             msg (SW,
-                  _("The N subcommand was specified, but a N record was also found in the data.  The N record will be ignored."));
-             continue;
-           }
-       }
+      x0 += rmp->dx0;
+      x1 += rmp->dx1;
+    }
+}
+
+static bool
+read_id_columns (const struct matrix_format *mf,
+                 struct substring *p, struct dfm_reader *r,
+                 double *d, enum rowtype *rt)
+{
+  for (size_t i = 0; mf->input_vars[i] != mf->cvars[0]; i++)
+    if (!(mf->input_vars[i] == mf->rowtype
+          ? next_rowtype (p, r, rt)
+          : next_number (p, r, &d[i])))
+      return false;
+  return true;
+}
  
-      struct ccase *outcase = case_create (proto);
-      case_copy (outcase, 0, c, 0, caseproto_get_n_widths (proto));
+static bool
+equal_id_columns (const struct matrix_format *mf,
+                  const double *a, const double *b)
+{
+  for (size_t i = 0; mf->input_vars[i] != mf->cvars[0]; i++)
+    if (mf->input_vars[i] != mf->rowtype && a[i] != b[i])
+      return false;
+  return true;
+}
  
-      if (0 == strncasecmp (val, "corr    ", ROWTYPE_WIDTH) ||
-         0 == strncasecmp (val, "cov     ", ROWTYPE_WIDTH))
-       {
-         int col;
-         const struct variable *var = dict_get_var (dict, idx + 1 + row);
-         set_varname_column (outcase, mformat->varname, var_get_name (var));
-         value_copy (case_data_rw (outcase, mformat->rowtype), v, ROWTYPE_WIDTH);
+static bool
+equal_split_columns (const struct matrix_format *mf,
+                     const double *a, const double *b)
+{
+  for (size_t i = 0; i < mf->n_svars; i++)
+    {
+      size_t idx = mf->svar_indexes[i];
+      if (a[idx] != b[idx])
+        return false;
+    }
+  return true;
+}
  
-         for (col = 0; col < mformat->n_continuous_vars; ++col)
-           {
-             union value *dest_val =
-               case_data_rw_idx (outcase,
-                                 1 + col + var_get_dict_index (mformat->varname));
-             dest_val->f = (matrices[n_splits - 1])[col + mformat->n_continuous_vars * row];
-             if (col == row && mformat->diagonal == NO_DIAGONAL)
-               dest_val->f = 1.0;
-           }
-         row++;
-       }
-      else
-       {
-         blank_varname_column (outcase, mformat->varname);
-       }
+static bool
+is_pooled (const struct matrix_format *mf, const double *d)
+{
+  for (size_t i = 0; i < mf->n_fvars; i++)
+    if (d[mf->fvar_indexes[i]] != SYSMIS)
+      return false;
+  return true;
+}
  
-      /* Special case for SD and N_VECTOR: Rewrite as STDDEV and N respectively */
-      if (0 == strncasecmp (val, "sd      ", ROWTYPE_WIDTH))
-       {
-         value_copy_buf_rpad (case_data_rw (outcase, mformat->rowtype), ROWTYPE_WIDTH,
-                              (uint8_t *) "STDDEV", 6, ' ');
-       }
-      else if (0 == strncasecmp (val, "n_vector", ROWTYPE_WIDTH))
-       {
-         value_copy_buf_rpad (case_data_rw (outcase, mformat->rowtype), ROWTYPE_WIDTH,
-                              (uint8_t *) "N", 1, ' ');
-       }
+static void
+matrix_sched_init (const struct matrix_format *mf, enum rowtype rt,
+                   gsl_matrix *m)
+{
+  int n_dims = rowtype_dimensions (rt);
+  const struct matrix_sched *ms = &mf->ms[n_dims];
+  double diagonal = n_dims < 2 || rt != C_CORR ? SYSMIS : 1.0;
+  for (size_t y = 0; y < ms->nr; y++)
+    for (size_t x = 0; x < ms->nc; x++)
+      gsl_matrix_set (m, y, x, y == x ? diagonal : SYSMIS);
+}
+
+static void
+matrix_sched_output (const struct matrix_format *mf, enum rowtype rt,
+                     gsl_matrix *m, const double *d, int split_num,
+                     struct casewriter *w)
+{
+  int n_dims = rowtype_dimensions (rt);
+  const struct matrix_sched *ms = &mf->ms[n_dims];
  
-      casewriter_write (writer, outcase);
+  if (rt == C_N_SCALAR)
+    {
+      for (size_t x = 1; x < mf->n_cvars; x++)
+        gsl_matrix_set (m, 0, x, gsl_matrix_get (m, 0, 0));
+      rt = C_N;
      }
  
-  /* If NODIAGONAL is specified, then a final case must be written */
-  if (mformat->diagonal == NO_DIAGONAL)
+  for (int y = 0; y < ms->nr; y++)
      {
-      int col;
-      struct ccase *outcase = case_create (proto);
+      struct ccase *c = case_create (casewriter_get_proto (w));
+      for (size_t i = 0; mf->input_vars[i] != mf->cvars[0]; i++)
+        if (mf->input_vars[i] != mf->rowtype)
+          *case_num_rw (c, mf->input_vars[i]) = d[i];
+      if (mf->n_svars && !mf->svar_indexes)
+        *case_num_rw (c, mf->svars[0]) = split_num;
+      set_string (c, mf->rowtype, rowtype_name (rt));
+      const char *varname = n_dims == 2 ? var_get_name (mf->cvars[y]) : "";
+      set_string (c, mf->varname, ss_cstr (varname));
+      for (int x = 0; x < mf->n_cvars; x++)
+        *case_num_rw (c, mf->cvars[x]) = gsl_matrix_get (m, y, x);
+      casewriter_write (w, c);
+    }
+}
  
-      if (prev_case)
-       case_copy (outcase, 0, prev_case, 0, caseproto_get_n_widths (proto));
+static void
+matrix_sched_output_n (const struct matrix_format *mf, double n,
+                       gsl_matrix *m, const double *d, int split_num,
+                       struct casewriter *w)
+{
+  gsl_matrix_set (m, 0, 0, n);
+  matrix_sched_output (mf, C_N_SCALAR, m, d, split_num, w);
+}
  
-      const struct variable *var = dict_get_var (dict, idx + 1 + row);
-      set_varname_column (outcase, mformat->varname, var_get_name (var));
+static void
+check_eol (const struct matrix_format *mf, struct substring *p,
+           struct dfm_reader *r)
+{
+  if (!mf->span)
+    {
+      ss_ltrim (p, ss_cstr (CC_SPACES ","));
+      if (p->length)
+        {
+          parse_error (r, p, _("Extraneous data expecting end of line."));
+          p->length = 0;
+        }
+    }
+}
  
-      for (col = 0; col < mformat->n_continuous_vars; ++col)
-       {
-         union value *dest_val =
-           case_data_rw_idx (outcase, 1 + col +
-                             var_get_dict_index (mformat->varname));
-         dest_val->f = (matrices[n_splits - 1]) [col + mformat->n_continuous_vars * row];
-         if (col == row && mformat->diagonal == NO_DIAGONAL)
-           dest_val->f = 1.0;
-       }
+static void
+parse_data_with_rowtype (const struct matrix_format *mf,
+                         struct dfm_reader *r, struct casewriter *w)
+{
+  if (dfm_eof (r))
+    return;
+  struct substring p = dfm_get_record (r);
  
-      casewriter_write (writer, outcase);
-    }
-  free (prev_values);
+  double *prev = NULL;
+  gsl_matrix *m = gsl_matrix_alloc (mf->n_cvars, mf->n_cvars);
  
-  if (prev_case)
-    case_unref (prev_case);
+  double *d = xnmalloc (mf->n_input_vars, sizeof *d);
+  enum rowtype rt;
  
-  int i;
-  for (i = 0 ; i < n_splits; ++i)
-    free (matrices[i]);
-  free (matrices);
-  struct casereader *reader1 = casewriter_make_reader (writer);
-  casereader_destroy (casereader0);
-  return reader1;
+  double *d_next = xnmalloc (mf->n_input_vars, sizeof *d_next);
  
+  if (!read_id_columns (mf, &p, r, d, &rt))
+    goto exit;
+  for (;;)
+    {
+      /* If this has rowtype N but there was an N subcommand, then the
+         subcommand takes precedence, so we will suppress outputting this
+         record.  We still need to parse it, though, so we can't skip other
+         work. */
+      bool suppress_output = mf->n >= 0 && (rt == C_N || rt == C_N_SCALAR);
+      if (suppress_output)
+        parse_error (r, NULL, _("N record is not allowed with N subcommand.  "
+                                "Ignoring N record."));
+
+      /* If there's an N subcommand, and this is a new split, then output an N
+         record. */
+      if (mf->n >= 0 && (!prev || !equal_split_columns (mf, prev, d)))
+        {
+          matrix_sched_output_n (mf, mf->n, m, d, 0, w);
  
-error:
-  if (prev_case)
-    case_unref (prev_case);
-
-  if (matrices)
-    for (i = 0 ; i < n_splits; ++i)
-      free (matrices[i]);
-  free (matrices);
-  casereader_destroy (casereader0);
-  casewriter_destroy (writer);
-  return NULL;
+          if (!prev)
+            prev = xnmalloc (mf->n_input_vars, sizeof *prev);
+          memcpy (prev, d, mf->n_input_vars * sizeof *prev);
+        }
+
+      /* Usually users don't provide the CONTENTS subcommand with ROWTYPE_, but
+         if they did then warn if ROWTYPE_ is an unexpected type. */
+      if (mf->factor_rowtype_mask || mf->pooled_rowtype_mask)
+        {
+          const char *name = rowtype_name (rt).string;
+          if (is_pooled (mf, d))
+            {
+              if (!((1u << rt) & mf->pooled_rowtype_mask))
+                parse_warning (r, NULL, _("Data contains pooled row type %s not "
+                                          "included in CONTENTS."), name);
+            }
+          else
+            {
+              if (!((1u << rt) & mf->factor_rowtype_mask))
+                parse_warning (r, NULL, _("Data contains with-factors row type "
+                                          "%s not included in CONTENTS."), name);
+            }
+        }
+
+      /* Initialize the matrix to be filled-in. */
+      int n_dims = rowtype_dimensions (rt);
+      const struct matrix_sched *ms = &mf->ms[n_dims];
+      matrix_sched_init (mf, rt, m);
+
+      enum rowtype rt_next;
+      bool eof;
+
+      size_t n_rows;
+      for (n_rows = 1; ; n_rows++)
+        {
+          if (n_rows <= ms->n_rp)
+            {
+              const struct row_sched *rs = &ms->rp[n_rows - 1];
+              size_t y = rs->y;
+              for (size_t x = rs->x0; x < rs->x1; x++)
+                {
+                  double e;
+                  if (!next_number (&p, r, &e))
+                    goto exit;
+                  gsl_matrix_set (m, y, x, e);
+                  if (n_dims == 2 && mf->triangle != FULL)
+                    gsl_matrix_set (m, x, y, e);
+                }
+              check_eol (mf, &p, r);
+            }
+          else
+            {
+              /* Suppress bad input data.  We'll issue an error later. */
+              p.length = 0;
+            }
+
+          eof = (!more_tokens (&p, r)
+                 || !read_id_columns (mf, &p, r, d_next, &rt_next));
+          if (eof)
+            break;
+
+          if (!equal_id_columns (mf, d, d_next) || rt_next != rt)
+            break;
+        }
+      if (!suppress_output)
+        matrix_sched_output (mf, rt, m, d, 0, w);
+
+      if (n_rows != ms->n_rp)
+        parse_error (r, NULL,
+                     _("Matrix %s had %zu rows but %zu rows were expected."),
+                     rowtype_name (rt).string, n_rows, ms->n_rp);
+      if (eof)
+        break;
+
+      double *d_tmp = d;
+      d = d_next;
+      d_next = d_tmp;
+
+      rt = rt_next;
+    }
+
+exit:
+  free (prev);
+  gsl_matrix_free (m);
+  free (d);
+  free (d_next);
  }
  
-int
-cmd_matrix (struct lexer *lexer, struct dataset *ds)
+static void
+parse_matrix_without_rowtype (const struct matrix_format *mf,
+                              struct substring *p, struct dfm_reader *r,
+                              gsl_matrix *m, enum rowtype rowtype, bool pooled,
+                              int split_num, struct casewriter *w)
  {
-  struct dictionary *dict;
-  struct data_parser *parser;
-  struct dfm_reader *reader;
-  struct file_handle *fh = NULL;
-  char *encoding = NULL;
-  struct matrix_format mformat;
-  int i;
-  size_t n_names;
-  char **names = NULL;
+  int n_dims = rowtype_dimensions (rowtype);
+  const struct matrix_sched *ms = &mf->ms[n_dims];
  
-  mformat.triangle = LOWER;
-  mformat.diagonal = DIAGONAL;
-  mformat.n_split_vars = 0;
-  mformat.split_vars = NULL;
-  mformat.n = -1;
+  double *d = xnmalloc (mf->n_input_vars, sizeof *d);
+  matrix_sched_init (mf, rowtype, m);
+  for (size_t i = 0; i < ms->n_rp; i++)
+    {
+      int y = ms->rp[i].y;
+      int k = 0;
+      int h = 0;
+      for (size_t j = 0; j < mf->n_input_vars; j++)
+        {
+          const struct variable *iv = mf->input_vars[j];
+          if (k < mf->n_cvars && iv == mf->cvars[k])
+            {
+              if (k < ms->rp[i].x1 - ms->rp[i].x0)
+                {
+                  double e;
+                  if (!next_number (p, r, &e))
+                    goto exit;
+
+                  int x = k + ms->rp[i].x0;
+                  gsl_matrix_set (m, y, x, e);
+                  if (n_dims == 2 && mf->triangle != FULL)
+                    gsl_matrix_set (m, x, y, e);
+                }
+              k++;
+              continue;
+            }
+          if (h < mf->n_fvars && iv == mf->fvars[h])
+            {
+              h++;
+              if (pooled)
+                {
+                  d[j] = SYSMIS;
+                  continue;
+                }
+            }
+
+          double e;
+          if (!next_number (p, r, &e))
+            goto exit;
+          d[j] = e;
+        }
+      check_eol (mf, p, r);
+    }
+
+  matrix_sched_output (mf, rowtype, m, d, split_num, w);
+exit:
+  free (d);
+}
  
-  dict = (in_input_program ()
-          ? dataset_dict (ds)
-          : dict_create (get_default_encoding ()));
-  parser = data_parser_create (dict);
-  reader = NULL;
+static void
+parse_data_without_rowtype (const struct matrix_format *mf,
+                            struct dfm_reader *r, struct casewriter *w)
+{
+  if (dfm_eof (r))
+    return;
+  struct substring p = dfm_get_record (r);
  
-  data_parser_set_type (parser, DP_DELIMITED);
-  data_parser_set_warn_missing_fields (parser, false);
-  data_parser_set_span (parser, false);
+  gsl_matrix *m = gsl_matrix_alloc (mf->n_cvars, mf->n_cvars);
  
-  mformat.rowtype = dict_create_var (dict, "ROWTYPE_", ROWTYPE_WIDTH);
+  int split_num = 1;
+  do
+    {
+      for (size_t i = 0; i < mf->n_contents; )
+        {
+          size_t j = i;
+          if (mf->contents[i].open)
+            while (!mf->contents[j].close)
+              j++;
+
+          if (mf->contents[i].open)
+            {
+              for (size_t k = 0; k < mf->cells; k++)
+                for (size_t h = i; h <= j; h++)
+                  parse_matrix_without_rowtype (mf, &p, r, m,
+                                                mf->contents[h].rowtype, false,
+                                                split_num, w);
+            }
+          else
+            parse_matrix_without_rowtype (mf, &p, r, m, mf->contents[i].rowtype,
+                                          true, split_num, w);
+          i = j + 1;
+        }
  
-  mformat.n_continuous_vars = 0;
-  mformat.n_split_vars = 0;
+      split_num++;
+    }
+  while (more_tokens (&p, r));
  
-  if (! lex_force_match_id (lexer, "VARIABLES"))
-    goto error;
+  gsl_matrix_free (m);
+}
  
+/* Parses VARIABLES=varnames for MATRIX DATA and returns a dictionary with the
+   named variables in it. */
+static struct dictionary *
+parse_matrix_data_variables (struct lexer *lexer)
+{
+  if (!lex_force_match_id (lexer, "VARIABLES"))
+    return NULL;
    lex_match (lexer, T_EQUALS);
  
-  if (! parse_mixed_vars (lexer, dict, &names, &n_names, PV_NO_DUPLICATE))
+  struct dictionary *dict = dict_create (get_default_encoding ());
+
+  size_t n_names = 0;
+  char **names = NULL;
+  if (!parse_DATA_LIST_vars (lexer, dict, &names, &n_names, PV_NO_DUPLICATE))
      {
-      int i;
-      for (i = 0; i < n_names; ++i)
-       free (names[i]);
-      free (names);
-      goto error;
+      dict_unref (dict);
+      return NULL;
      }
  
-  int longest_name = 0;
-  for (i = 0; i < n_names; ++i)
+  for (size_t i = 0; i < n_names; i++)
+    if (!strcasecmp (names[i], "ROWTYPE_"))
+      dict_create_var_assert (dict, "ROWTYPE_", 8);
+    else
+      dict_create_var_assert (dict, names[i], 0);
+
+  for (size_t i = 0; i < n_names; ++i)
+    free (names[i]);
+  free (names);
+
+  if (dict_lookup_var (dict, "VARNAME_"))
      {
-      maximize_int (&longest_name, strlen (names[i]));
+      msg (SE, _("VARIABLES may not include VARNAME_."));
+      dict_unref (dict);
+      return NULL;
      }
+  return dict;
+}
  
-  mformat.varname = dict_create_var (dict, "VARNAME_",
-                                    8 * DIV_RND_UP (longest_name, 8));
+static bool
+parse_matrix_data_subvars (struct lexer *lexer, struct dictionary *dict,
+                           bool *taken_vars,
+                           struct variable ***vars, size_t **indexes,
+                           size_t *n_vars)
+{
+  if (!parse_variables (lexer, dict, vars, n_vars, 0))
+    return false;
  
-  for (i = 0; i < n_names; ++i)
+  *indexes = xnmalloc (*n_vars, sizeof **indexes);
+  for (size_t i = 0; i < *n_vars; i++)
      {
-      if (0 == strcasecmp (names[i], "ROWTYPE_"))
-       {
-         const struct fmt_spec fmt = fmt_for_input (FMT_A, 8, 0);
-         data_parser_add_delimited_field (parser,
-                                          &fmt,
-                                          var_get_case_index (mformat.rowtype),
-                                          "ROWTYPE_");
-       }
-      else
-       {
-         const struct fmt_spec fmt = fmt_for_input (FMT_F, 10, 4);
-         struct variable *v = dict_create_var (dict, names[i], 0);
-         var_set_both_formats (v, &fmt);
-         data_parser_add_delimited_field (parser,
-                                          &fmt,
-                                          var_get_case_index (mformat.varname) +
-                                          ++mformat.n_continuous_vars,
-                                          names[i]);
-       }
+      struct variable *v = (*vars)[i];
+      if (!strcasecmp (var_get_name (v), "ROWTYPE_"))
+        {
+          msg (SE, _("ROWTYPE_ is not allowed on SPLIT or FACTORS."));
+          goto error;
+        }
+      (*indexes)[i] = var_get_dict_index (v);
+
+      bool *tv = &taken_vars[var_get_dict_index (v)];
+      if (*tv)
+        {
+          msg (SE, _("%s may not appear on both SPLIT and FACTORS."),
+               var_get_name (v));
+          goto error;
+        }
+      *tv = true;
+
+      var_set_both_formats (v, &(struct fmt_spec) { .type = FMT_F, .w = 4 });
      }
-  for (i = 0; i < n_names; ++i)
-    free (names[i]);
-  free (names);
+  return true;
+
+error:
+  free (*vars);
+  *vars = NULL;
+  *n_vars = 0;
+  free (*indexes);
+  *indexes = NULL;
+  return false;
+}
+
+int
+cmd_matrix_data (struct lexer *lexer, struct dataset *ds)
+{
+  struct dictionary *dict = parse_matrix_data_variables (lexer);
+  if (!dict)
+    return CMD_FAILURE;
+
+  size_t n_input_vars = dict_get_var_cnt (dict);
+  struct variable **input_vars = xnmalloc (n_input_vars, sizeof *input_vars);
+  for (size_t i = 0; i < n_input_vars; i++)
+    input_vars[i] = dict_get_var (dict, i);
+
+  int varname_width = 8;
+  for (size_t i = 0; i < n_input_vars; i++)
+    {
+      int w = strlen (var_get_name (input_vars[i]));
+      varname_width = MAX (w, varname_width);
+    }
+
+  struct variable *rowtype = dict_lookup_var (dict, "ROWTYPE_");
+  bool input_rowtype = rowtype != NULL;
+  if (!rowtype)
+    rowtype = dict_create_var_assert (dict, "ROWTYPE_", 8);
+
+  struct matrix_format mf = {
+    .input_rowtype = input_rowtype,
+    .input_vars = input_vars,
+    .n_input_vars = n_input_vars,
  
+    .rowtype = rowtype,
+    .varname = dict_create_var_assert (dict, "VARNAME_", varname_width),
+
+    .triangle = LOWER,
+    .diagonal = DIAGONAL,
+    .n = -1,
+    .cells = -1,
+  };
+
+  bool *taken_vars = xzalloc (n_input_vars);
+  if (input_rowtype)
+    taken_vars[var_get_dict_index (rowtype)] = true;
+
+  struct file_handle *fh = NULL;
    while (lex_token (lexer) != T_ENDCMD)
      {
-      if (! lex_force_match (lexer, T_SLASH))
+      if (!lex_force_match (lexer, T_SLASH))
         goto error;
  
        if (lex_match_id (lexer, "N"))
         {
           lex_match (lexer, T_EQUALS);
  
-         if (! lex_force_int_range (lexer, "N", 0, INT_MAX))
+         if (!lex_force_int_range (lexer, "N", 0, INT_MAX))
             goto error;
  
-         mformat.n = lex_integer (lexer);
+         mf.n = lex_integer (lexer);
           lex_get (lexer);
         }
        else if (lex_match_id (lexer, "FORMAT"))
         {
           lex_match (lexer, T_EQUALS);
  
-         while (lex_token (lexer) != T_SLASH && (lex_token (lexer) != T_ENDCMD))
+         while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD)
             {
               if (lex_match_id (lexer, "LIST"))
-               {
-                 data_parser_set_span (parser, false);
-               }
+                mf.span = false;
               else if (lex_match_id (lexer, "FREE"))
-               {
-                 data_parser_set_span (parser, true);
-               }
+                mf.span = true;
               else if (lex_match_id (lexer, "UPPER"))
-               {
-                 mformat.triangle = UPPER;
-               }
+                mf.triangle = UPPER;
               else if (lex_match_id (lexer, "LOWER"))
-               {
-                 mformat.triangle = LOWER;
-               }
+                mf.triangle = LOWER;
               else if (lex_match_id (lexer, "FULL"))
-               {
-                 mformat.triangle = FULL;
-               }
+                mf.triangle = FULL;
               else if (lex_match_id (lexer, "DIAGONAL"))
-               {
-                 mformat.diagonal = DIAGONAL;
-               }
+                mf.diagonal = DIAGONAL;
               else if (lex_match_id (lexer, "NODIAGONAL"))
-               {
-                 mformat.diagonal = NO_DIAGONAL;
-               }
+                mf.diagonal = NO_DIAGONAL;
               else
                 {
                   lex_error (lexer, NULL);
@@ -531,132 +992,204 @@ cmd_matrix (struct lexer *lexer, struct dataset *ds)
           lex_match (lexer, T_EQUALS);
            fh_unref (fh);
           fh = fh_parse (lexer, FH_REF_FILE | FH_REF_INLINE, NULL);
-         if (fh == NULL)
+         if (!fh)
             goto error;
         }
-      else if (lex_match_id (lexer, "SPLIT"))
+      else if (!mf.n_svars && lex_match_id (lexer, "SPLIT"))
+        {
+          lex_match (lexer, T_EQUALS);
+          if (!mf.input_rowtype
+              && lex_token (lexer) == T_ID
+              && !dict_lookup_var (dict, lex_tokcstr (lexer)))
+            {
+              mf.svars = xmalloc (sizeof *mf.svars);
+              mf.svars[0] = dict_create_var_assert (dict, lex_tokcstr (lexer),
+                                                    0);
+              var_set_both_formats (
+                mf.svars[0], &(struct fmt_spec) { .type = FMT_F, .w = 4 });
+              mf.n_svars = 1;
+              lex_get (lexer);
+            }
+          else if (!parse_matrix_data_subvars (lexer, dict, taken_vars,
+                                               &mf.svars, &mf.svar_indexes,
+                                               &mf.n_svars))
+            goto error;
+        }
+      else if (!mf.n_fvars && lex_match_id (lexer, "FACTORS"))
+        {
+          lex_match (lexer, T_EQUALS);
+          if (!parse_matrix_data_subvars (lexer, dict, taken_vars,
+                                          &mf.fvars, &mf.fvar_indexes,
+                                          &mf.n_fvars))
+            goto error;
+        }
+      else if (lex_match_id (lexer, "CELLS"))
         {
+          if (mf.input_rowtype)
+            msg (SW, _("CELLS is ignored when VARIABLES includes ROWTYPE_"));
+
           lex_match (lexer, T_EQUALS);
-         if (! parse_variables (lexer, dict, &mformat.split_vars, &mformat.n_split_vars, 0))
-           {
-             free (mformat.split_vars);
-             goto error;
-           }
-         int i;
-         for (i = 0; i < mformat.n_split_vars; ++i)
-           {
-             const struct fmt_spec fmt = fmt_for_input (FMT_F, 4, 0);
-             var_set_both_formats (mformat.split_vars[i], &fmt);
-           }
-         dict_reorder_vars (dict, mformat.split_vars, mformat.n_split_vars);
-         mformat.n_continuous_vars -= mformat.n_split_vars;
+
+         if (!lex_force_int_range (lexer, "CELLS", 0, INT_MAX))
+           goto error;
+
+         mf.cells = lex_integer (lexer);
+         lex_get (lexer);
         }
+      else if (lex_match_id (lexer, "CONTENTS"))
+        {
+          lex_match (lexer, T_EQUALS);
+
+          size_t allocated_contents = mf.n_contents;
+          bool in_parens = false;
+          for (;;)
+            {
+              bool open = !in_parens && lex_match (lexer, T_LPAREN);
+              enum rowtype rt;
+              if (!rowtype_parse (lexer, &rt))
+                {
+                  if (open || in_parens || (lex_token (lexer) != T_ENDCMD
+                                            && lex_token (lexer) != T_SLASH))
+                    {
+                      lex_error (lexer, _("Row type keyword expected."));
+                      goto error;
+                    }
+                  break;
+                }
+
+              if (open)
+                in_parens = true;
+
+              if (in_parens)
+                mf.factor_rowtype_mask |= 1u << rt;
+              else
+                mf.pooled_rowtype_mask |= 1u << rt;
+
+              bool close = in_parens && lex_match (lexer, T_RPAREN);
+              if (close)
+                in_parens = false;
+
+              if (mf.n_contents >= allocated_contents)
+                mf.contents = x2nrealloc (mf.contents, &allocated_contents,
+                                          sizeof *mf.contents);
+              mf.contents[mf.n_contents++] = (struct content) {
+                .open = open, .rowtype = rt, .close = close
+              };
+            }
+        }
        else
         {
           lex_error (lexer, NULL);
           goto error;
         }
      }
-
-  if (mformat.diagonal == NO_DIAGONAL && mformat.triangle == FULL)
+  if (mf.diagonal == NO_DIAGONAL && mf.triangle == FULL)
      {
-      msg (SE, _("FORMAT = FULL and FORMAT = NODIAGONAL are mutually exclusive."));
+      msg (SE, _("FORMAT=FULL and FORMAT=NODIAGONAL are mutually exclusive."));
        goto error;
      }
+  if (!mf.input_rowtype)
+    {
+      if (mf.cells < 0)
+        {
+          if (mf.n_fvars)
+            {
+              msg (SE, _("CELLS is required when factor variables are specified "
+                         "and VARIABLES does not include ROWTYPE_."));
+              goto error;
+            }
+          mf.cells = 1;
+        }
  
-  if (fh == NULL)
-    fh = fh_inline_file ();
-  fh_set_default_handle (fh);
+      if (!mf.n_contents)
+        {
+          msg (SW, _("CONTENTS was not specified and VARIABLES does not "
+                     "include ROWTYPE_.  Assuming CONTENTS=CORR."));
  
-  if (!data_parser_any_fields (parser))
+          mf.n_contents = 1;
+          mf.contents = xmalloc (sizeof *mf.contents);
+          *mf.contents = (struct content) { .rowtype = C_CORR };
+        }
+    }
+  mf.cvars = xmalloc (mf.n_input_vars * sizeof *mf.cvars);
+  for (size_t i = 0; i < mf.n_input_vars; i++)
+    if (!taken_vars[i])
+      {
+        struct variable *v = input_vars[i];
+        mf.cvars[mf.n_cvars++] = v;
+        var_set_both_formats (v, &(struct fmt_spec) { .type = FMT_F, .w = 10,
+                                                      .d = 4 });
+      }
+  if (!mf.n_cvars)
      {
-      msg (SE, _("At least one variable must be specified."));
+      msg (SE, _("At least one continuous variable is required."));
        goto error;
      }
-
-  if (lex_end_of_command (lexer) != CMD_SUCCESS)
-    goto error;
-
-  reader = dfm_open_reader (fh, lexer, encoding);
-  if (reader == NULL)
-    goto error;
-
-  if (in_input_program ())
+  if (mf.input_rowtype)
      {
-      struct data_list_trns *trns = xmalloc (sizeof *trns);
-      trns->parser = parser;
-      trns->reader = reader;
-      trns->end = NULL;
-      add_transformation (ds, data_list_trns_proc, data_list_trns_free, trns);
+      for (size_t i = 0; i < mf.n_cvars; i++)
+        if (mf.cvars[i] != input_vars[n_input_vars - mf.n_cvars + i])
+          {
+            msg (SE, _("VARIABLES includes ROWTYPE_ but the continuous "
+                       "variables are not the last ones on VARIABLES."));
+            goto error;
+          }
      }
-  else
+  unsigned int rowtype_mask = mf.pooled_rowtype_mask | mf.factor_rowtype_mask;
+  if (rowtype_mask & (1u << C_N) && mf.n >= 0)
      {
-      data_parser_make_active_file (parser, ds, reader, dict, preprocess,
-                                   &mformat);
+      msg (SE, _("Cannot specify N on CONTENTS along with the N subcommand."));
+      goto error;
      }
  
-  fh_unref (fh);
-  free (encoding);
-  free (mformat.split_vars);
+  struct variable **order = xnmalloc (dict_get_var_cnt (dict), sizeof *order);
+  size_t n_order = 0;
+  for (size_t i = 0; i < mf.n_svars; i++)
+    order[n_order++] = mf.svars[i];
+  order[n_order++] = mf.rowtype;
+  for (size_t i = 0; i < mf.n_fvars; i++)
+    order[n_order++] = mf.fvars[i];
+  order[n_order++] = mf.varname;
+  for (size_t i = 0; i < mf.n_cvars; i++)
+    order[n_order++] = mf.cvars[i];
+  assert (n_order == dict_get_var_cnt (dict));
+  dict_reorder_vars (dict, order, n_order);
+  free (order);
  
-  return CMD_DATA_LIST;
+  dict_set_split_vars (dict, mf.svars, mf.n_svars);
  
- error:
-  data_parser_destroy (parser);
-  if (!in_input_program ())
-    dict_unref (dict);
-  fh_unref (fh);
-  free (encoding);
-  free (mformat.split_vars);
-  return CMD_CASCADING_FAILURE;
-}
+  schedule_matrices (&mf);
  
-\f
-/* Input procedure. */
+  if (fh == NULL)
+    fh = fh_inline_file ();
  
-/* Destroys DATA LIST transformation TRNS.
-   Returns true if successful, false if an I/O error occurred. */
-static bool
-data_list_trns_free (void *trns_)
-{
-  struct data_list_trns *trns = trns_;
-  data_parser_destroy (trns->parser);
-  dfm_close_reader (trns->reader);
-  free (trns);
-  return true;
-}
+  if (lex_end_of_command (lexer) != CMD_SUCCESS)
+    goto error;
  
-/* Handle DATA LIST transformation TRNS, parsing data into *C. */
-static int
-data_list_trns_proc (void *trns_, struct ccase **c, casenumber case_num UNUSED)
-{
-  struct data_list_trns *trns = trns_;
-  int retval;
+  struct dfm_reader *reader = dfm_open_reader (fh, lexer, NULL);
+  if (reader == NULL)
+    goto error;
  
-  *c = case_unshare (*c);
-  if (data_parser_parse (trns->parser, trns->reader, *c))
-    retval = TRNS_CONTINUE;
-  else if (dfm_reader_error (trns->reader) || dfm_eof (trns->reader) > 1)
-    {
-      /* An I/O error, or encountering end of file for a second
-         time, should be escalated into a more serious error. */
-      retval = TRNS_ERROR;
-    }
+  struct casewriter *writer = autopaging_writer_create (dict_get_proto (dict));
+  if (mf.input_rowtype)
+    parse_data_with_rowtype (&mf, reader, writer);
    else
-    retval = TRNS_END_FILE;
+    parse_data_without_rowtype (&mf, reader, writer);
+  dfm_close_reader (reader);
  
-  /* If there was an END subcommand handle it. */
-  if (trns->end != NULL)
-    {
-      double *end = &case_data_rw (*c, trns->end)->f;
-      if (retval == TRNS_END_FILE)
-        {
-          *end = 1.0;
-          retval = TRNS_CONTINUE;
-        }
-      else
-        *end = 0.0;
-    }
+  dataset_set_dict (ds, dict);
+  dataset_set_source (ds, casewriter_make_reader (writer));
  
-  return retval;
+  matrix_format_uninit (&mf);
+  free (taken_vars);
+  fh_unref (fh);
+
+  return CMD_SUCCESS;
+
+ error:
+  matrix_format_uninit (&mf);
+  free (taken_vars);
+  dict_unref (dict);
+  fh_unref (fh);
+  return CMD_FAILURE;
  }
diff --git a/src/language/data-io/matrix-reader.c b/src/language/data-io/matrix-reader.c
index a7db15e029d8b9feac1d9111a8297edb87c6540b..cf7b524480334fa8a6e5568f5edb8470961b8c99 100644
--- a/src/language/data-io/matrix-reader.c
+++ b/src/language/data-io/matrix-reader.c
@@ -19,20 +19,26 @@
  #include "matrix-reader.h"
  
  #include <stdbool.h>
-
-#include <libpspp/message.h>
-#include <libpspp/str.h>
-#include <data/casegrouper.h>
-#include <data/casereader.h>
-#include <data/dictionary.h>
-#include <data/variable.h>
-#include <data/data-out.h>
-#include <data/format.h>
+#include <math.h>
+
+#include "data/casegrouper.h"
+#include "data/casereader.h"
+#include "data/data-out.h"
+#include "data/dataset.h"
+#include "data/dictionary.h"
+#include "data/format.h"
+#include "data/variable.h"
+#include "language/command.h"
+#include "libpspp/i18n.h"
+#include "libpspp/message.h"
+#include "libpspp/str.h"
+#include "output/pivot-table.h"
  
  #include "gettext.h"
  #define _(msgid) gettext (msgid)
  #define N_(msgid) msgid
  
+struct lexer;
  
  /*
  This module interprets a "data matrix", typically generated by the command
@@ -77,16 +83,22 @@ s_0 ROWTYPE_   VARNAME_   v_0         v_1         v_2
  
  */
  
+void
+matrix_material_uninit (struct matrix_material *mm)
+{
+  gsl_matrix_free (mm->corr);
+  gsl_matrix_free (mm->cov);
+  gsl_matrix_free (mm->n);
+  gsl_matrix_free (mm->mean_matrix);
+  gsl_matrix_free (mm->var_matrix);
+}
+\f
  struct matrix_reader
  {
    const struct dictionary *dict;
    const struct variable *varname;
    const struct variable *rowtype;
    struct casegrouper *grouper;
-
-  gsl_matrix *n_vectors;
-  gsl_matrix *mean_vectors;
-  gsl_matrix *var_vectors;
  };
  
  struct matrix_reader *
@@ -177,7 +189,10 @@ matrix_fill_row (gsl_matrix **matrix,
  {
    int col;
    if (*matrix == NULL)
-    *matrix = gsl_matrix_alloc (n_vars, n_vars);
+    {
+      *matrix = gsl_matrix_alloc (n_vars, n_vars);
+      gsl_matrix_set_all (*matrix, SYSMIS);
+    }
  
    for (col = 0; col < n_vars; ++col)
      {
@@ -189,6 +204,16 @@ matrix_fill_row (gsl_matrix **matrix,
      }
  }
  
+static int
+find_varname (const struct variable **vars, int n_vars,
+              const char *varname)
+{
+  for (int i = 0; i < n_vars; i++)
+    if (!strcasecmp (var_get_name (vars[i]), varname))
+      return i;
+  return -1;
+}
+
  bool
  next_matrix_from_reader (struct matrix_material *mm,
                          struct matrix_reader *mr,
@@ -198,87 +223,191 @@ next_matrix_from_reader (struct matrix_material *mm,
  
    assert (vars);
  
-  gsl_matrix_free (mr->n_vectors);
-  gsl_matrix_free (mr->mean_vectors);
-  gsl_matrix_free (mr->var_vectors);
-
    if (!casegrouper_get_next_group (mr->grouper, &group))
-    return false;
-
-  mr->n_vectors    = gsl_matrix_alloc (n_vars, n_vars);
-  mr->mean_vectors = gsl_matrix_alloc (n_vars, n_vars);
-  mr->var_vectors  = gsl_matrix_alloc (n_vars, n_vars);
+    {
+      *mm = (struct matrix_material) MATRIX_MATERIAL_INIT;
+      return false;
+    }
  
-  mm->n = mr->n_vectors;
-  mm->mean_matrix = mr->mean_vectors;
-  mm->var_matrix = mr->var_vectors;
+  *mm = (struct matrix_material) {
+    .n = gsl_matrix_calloc (n_vars, n_vars),
+    .mean_matrix = gsl_matrix_calloc (n_vars, n_vars),
+    .var_matrix = gsl_matrix_calloc (n_vars, n_vars),
+  };
  
-  struct substring *var_names = XCALLOC (n_vars,  struct substring);
-  for (int i = 0; i < n_vars; ++i)
+  struct matrix
      {
-      ss_alloc_substring (var_names + i, ss_cstr (var_get_name (vars[i])));
-    }
+      const char *name;
+      gsl_matrix **m;
+      size_t good_rows;
+      size_t bad_rows;
+    };
+  struct matrix matrices[] = {
+    { .name = "CORR", .m = &mm->corr },
+    { .name = "COV", .m = &mm->cov },
+  };
+  enum { N_MATRICES = 2 };
  
    struct ccase *c;
    for (; (c = casereader_read (group)); case_unref (c))
      {
-      const union value *uv = case_data (c, mr->rowtype);
-      const char *row_type = CHAR_CAST (const char *, uv->s);
-      int col, row;
-      for (col = 0; col < n_vars; ++col)
-       {
-         const struct variable *cv = vars[col];
-         double x = case_data (c, cv)->f;
-         if (0 == strncasecmp (row_type, "N       ", 8))
-           for (row = 0; row < n_vars; ++row)
-             gsl_matrix_set (mr->n_vectors, row, col, x);
-         else if (0 == strncasecmp (row_type, "MEAN    ", 8))
-           for (row = 0; row < n_vars; ++row)
-             gsl_matrix_set (mr->mean_vectors, row, col, x);
-         else if (0 == strncasecmp (row_type, "STDDEV  ", 8))
-           for (row = 0; row < n_vars; ++row)
-             gsl_matrix_set (mr->var_vectors, row, col, x * x);
-       }
-
-      const char *enc = dict_get_encoding (mr->dict);
+      struct substring rowtype = case_ss (c, mr->rowtype);
+      ss_rtrim (&rowtype, ss_cstr (CC_SPACES));
+
+      gsl_matrix *v
+        = (ss_equals_case (rowtype, ss_cstr ("N")) ? mm->n
+           : ss_equals_case (rowtype, ss_cstr ("MEAN")) ? mm->mean_matrix
+           : ss_equals_case (rowtype, ss_cstr ("STDDEV")) ? mm->var_matrix
+           : NULL);
+      if (v)
+        {
+          for (int x = 0; x < n_vars; ++x)
+            {
+              double n = case_num (c, vars[x]);
+              if (v == mm->var_matrix)
+                n *= n;
+              for (int y = 0; y < n_vars; ++y)
+                gsl_matrix_set (v, y, x, n);
+            }
+          continue;
+        }
+
+      struct matrix *m = NULL;
+      for (size_t i = 0; i < N_MATRICES; i++)
+        if (ss_equals_case (rowtype, ss_cstr (matrices[i].name)))
+          {
+            m = &matrices[i];
+            break;
+          }
+      if (m)
+        {
+          struct substring varname_raw = case_ss (c, mr->varname);
+          struct substring varname = ss_cstr (
+            recode_string (UTF8, dict_get_encoding (mr->dict),
+                           varname_raw.string, varname_raw.length));
+          ss_rtrim (&varname, ss_cstr (CC_SPACES));
+          varname.string[varname.length] = '\0';
+
+          int y = find_varname (vars, n_vars, varname.string);
+          if (y >= 0)
+            {
+              m->good_rows++;
+              matrix_fill_row (m->m, c, y, vars, n_vars);
+            }
+          else
+            m->bad_rows++;
+          ss_dealloc (&varname);
+        }
+    }
+  casereader_destroy (group);
  
-      const union value *uvv  = case_data (c, mr->varname);
-      int w = var_get_width (mr->varname);
+  for (size_t i = 0; i < N_MATRICES; i++)
+    if (matrices[i].good_rows && matrices[i].good_rows != n_vars)
+      msg (SW, _("%s matrix has %d columns but %zu rows named variables "
+                 "to be analyzed (and %zu rows named unknown variables)."),
+           matrices[i].name, n_vars, matrices[i].good_rows,
+           matrices[i].bad_rows);
  
-      struct fmt_spec fmt = { .type = FMT_A };
-      fmt.w = w;
-      char *vname = data_out (uvv, enc, &fmt, settings_get_fmt_settings ());
-      struct substring the_name = ss_cstr (vname);
+  return true;
+}
  
-      int mrow = -1;
-      for (int i = 0; i < n_vars; ++i)
-       {
-         if (ss_equals (var_names[i], the_name))
-           {
-             mrow = i;
-             break;
-           }
-       }
-      free (vname);
+int
+cmd_debug_matrix_read (struct lexer *lexer UNUSED, struct dataset *ds)
+{
+  const struct variable **vars;
+  size_t n_vars;
+  struct matrix_reader *mr = create_matrix_reader_from_case_reader (
+    dataset_dict (ds), proc_open (ds), &vars, &n_vars);
+  if (!mr)
+    return CMD_FAILURE;
  
-      if (mrow == -1)
-       continue;
+  struct pivot_table *pt = pivot_table_create ("Debug Matrix Reader");
  
-      if (0 == strncasecmp (row_type, "CORR    ", 8))
-       {
-         matrix_fill_row (&mm->corr, c, mrow, vars, n_vars);
-       }
-      else if (0 == strncasecmp (row_type, "COV     ", 8))
-       {
-         matrix_fill_row (&mm->cov, c, mrow, vars, n_vars);
-       }
+  enum mm_stat
+    {
+      MM_CORR,
+      MM_COV,
+      MM_N,
+      MM_MEAN,
+      MM_STDDEV,
+    };
+  const char *mm_stat_names[] = {
+    [MM_CORR] = "Correlation",
+    [MM_COV] = "Covariance",
+    [MM_N] = "N",
+    [MM_MEAN] = "Mean",
+    [MM_STDDEV] = "Standard Deviation",
+  };
+  enum { N_STATS = sizeof mm_stat_names / sizeof *mm_stat_names };
+  for (size_t i = 0; i < 2; i++)
+    {
+      struct pivot_dimension *d = pivot_dimension_create (
+        pt,
+        i ? PIVOT_AXIS_COLUMN : PIVOT_AXIS_ROW,
+        i ? "Column" : "Row");
+      if (!i)
+        pivot_category_create_leaf_rc (d->root, pivot_value_new_text ("Value"),
+                                       PIVOT_RC_CORRELATION);
+      for (size_t j = 0; j < n_vars; j++)
+        pivot_category_create_leaf_rc (
+          d->root, pivot_value_new_variable (vars[j]), PIVOT_RC_CORRELATION);
      }
  
-  casereader_destroy (group);
+  struct pivot_dimension *stat = pivot_dimension_create (pt, PIVOT_AXIS_ROW,
+                                                         "Statistic");
+  for (size_t i = 0; i < N_STATS; i++)
+    pivot_category_create_leaf (stat->root,
+                                pivot_value_new_text (mm_stat_names[i]));
  
-  for (int i = 0; i < n_vars; ++i)
-    ss_dealloc (var_names + i);
-  free (var_names);
+  struct pivot_dimension *split = pivot_dimension_create (
+    pt, PIVOT_AXIS_ROW, "Split");
  
-  return true;
+  int split_num = 0;
+
+  struct matrix_material mm = MATRIX_MATERIAL_INIT;
+  while (next_matrix_from_reader (&mm, mr, vars, n_vars))
+    {
+      pivot_category_create_leaf (split->root,
+                                  pivot_value_new_integer (split_num + 1));
+
+      const gsl_matrix *m[N_STATS] = {
+        [MM_CORR] = mm.corr,
+        [MM_COV] = mm.cov,
+        [MM_N] = mm.n,
+        [MM_MEAN] = mm.mean_matrix,
+        [MM_STDDEV] = mm.var_matrix,
+      };
+
+      for (size_t i = 0; i < N_STATS; i++)
+        if (m[i])
+          {
+            if (i == MM_COV || i == MM_CORR)
+              {
+                for (size_t y = 0; y < n_vars; y++)
+                  for (size_t x = 0; x < n_vars; x++)
+                    pivot_table_put4 (
+                      pt, y + 1, x, i, split_num,
+                      pivot_value_new_number (gsl_matrix_get (m[i], y, x)));
+              }
+            else
+              for (size_t x = 0; x < n_vars; x++)
+                {
+                  double n = gsl_matrix_get (m[i], 0, x);
+                  if (i == MM_STDDEV)
+                    n = sqrt (n);
+                  pivot_table_put4 (pt, 0, x, i, split_num,
+                                    pivot_value_new_number (n));
+                }
+          }
+
+      split_num++;
+      matrix_material_uninit (&mm);
+    }
+  pivot_table_submit (pt);
+
+  proc_commit (ds);
+
+  destroy_matrix_reader (mr);
+  free (vars);
+  return CMD_SUCCESS;
  }
diff --git a/src/language/data-io/matrix-reader.h b/src/language/data-io/matrix-reader.h
index 7a651d7866e2a320ee74a1203d9e5bef88687cf4..f3ac7000d57cef196f5db9976142caf4483312c8 100644
--- a/src/language/data-io/matrix-reader.h
+++ b/src/language/data-io/matrix-reader.h
@@ -22,15 +22,18 @@
  
  struct matrix_material
  {
-  gsl_matrix *corr ;  /* The correlation matrix */
-  gsl_matrix *cov ;   /* The covariance matrix */
+  gsl_matrix *corr;             /* The correlation matrix */
+  gsl_matrix *cov;              /* The covariance matrix */
  
    /* Moment matrices */
-  const gsl_matrix *n ;           /* MOMENT 0 */
-  const gsl_matrix *mean_matrix;  /* MOMENT 1 */
-  const gsl_matrix *var_matrix;   /* MOMENT 2 */
+  gsl_matrix *n;                /* MOMENT 0 */
+  gsl_matrix *mean_matrix;      /* MOMENT 1 */
+  gsl_matrix *var_matrix;       /* MOMENT 2 */
  };
  
+#define MATRIX_MATERIAL_INIT { .corr = NULL }
+void matrix_material_uninit (struct matrix_material *);
+
  struct dictionary;
  struct variable;
  struct casereader;
diff --git a/src/language/stats/factor.c b/src/language/stats/factor.c
index 4ec49010c9dcf6fc44a2ca934ef6ec4e942ce644..9fa0a8ad258c3198a5db9e903daf2249d4a29054 100644
--- a/src/language/stats/factor.c
+++ b/src/language/stats/factor.c
@@ -1536,10 +1536,8 @@ cmd_factor (struct lexer *lexer, struct dataset *ds)
            id->ai_cov = NULL;
            gsl_matrix_free (id->ai_cor);
            id->ai_cor = NULL;
-         gsl_matrix_free (id->mm.corr);
-         id->mm.corr = NULL;
-         gsl_matrix_free (id->mm.cov);
-         id->mm.cov = NULL;
+
+          matrix_material_uninit (&id->mm);
         }
  
        idata_free (id);
diff --git a/src/math/covariance.c b/src/math/covariance.c
index 44787c679a2a107acd5ef4eb1f7888c12ffbf366..dec5ee87ccb86e1db1b507847c64bc4f6f2aa820 100644
--- a/src/math/covariance.c
+++ b/src/math/covariance.c
@@ -129,7 +129,7 @@ struct covariance
     be identical.  If missing values are involved, then element (i,j)
     is the moment of the i th variable, when paired with the j th variable.
   */
-const gsl_matrix *
+gsl_matrix *
  covariance_moments (const struct covariance *cov, int m)
  {
    return cov->moments[m];
diff --git a/src/math/covariance.h b/src/math/covariance.h
index eef6019ddfe4259b58f9f049ba6bd40bcd153bc9..5a7be0d68958e0bc2986f332284a4274ad0a3f1b 100644
--- a/src/math/covariance.h
+++ b/src/math/covariance.h
@@ -44,7 +44,7 @@ const gsl_matrix * covariance_calculate_unnormalized (struct covariance *);
  
  void covariance_destroy (struct covariance *cov);
  
-const gsl_matrix *covariance_moments (const struct covariance *cov, int m);
+gsl_matrix *covariance_moments (const struct covariance *cov, int m);
  
  const struct categoricals * covariance_get_categoricals (const struct covariance *cov);
  size_t covariance_dim (const struct covariance * cov);
diff --git a/tests/language/data-io/matrix-data.at b/tests/language/data-io/matrix-data.at
index c949617cf9268b728ce90cd5d418b9a3833da246..e66d1aaa16aa22189810f129ddc9b1d6d70f51e7 100644
--- a/tests/language/data-io/matrix-data.at
+++ b/tests/language/data-io/matrix-data.at
@@ -16,56 +16,48 @@ dnl along with this program.  If not, see <http://www.gnu.org/licenses/>.
  dnl
  AT_BANNER([MATRIX DATA])
  
-AT_SETUP([Matrix data (lower file)])
-
-AT_DATA([matrix-data.pspp], [dnl
-matrix data
-    variables = rowtype_
-    var01 TO var08
-    /format = lower diagonal
-    /file = 'matrix.dat'
-    .
-
-list.
+dnl Keep this test in sync with Example 1 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - LOWER DIAGONAL with ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=ROWTYPE_ var01 TO var08
+    /FILE='matrix-data.txt'.
+FORMATS var01 TO var08(F5.2).
+LIST.
  ])
-
-AT_DATA([matrix.dat], [dnl
-mean  24.3  5.4  69.7  20.1  13.4  2.7  27.9  3.7
-sd    5.7   1.5  23.5  5.8   2.8   4.5  5.4   1.5
-n     92    92   92    92    92    92   92    92
-corr 1.00
-corr .18  1.00
-corr -.22  -.17  1.00
-corr .36  .31  -.14  1.00
-corr .27  .16  -.12  .22  1.00
-corr .33  .15  -.17  .24  .21  1.00
-corr .50  .29  -.20  .32  .12  .38  1.00
-corr .17  .29  -.05  .20  .27  .20  .04  1.00
+AT_DATA([matrix-data.txt], [dnl
+MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+N       92    92    92    92    92    92    92    92
+'CORR'  1.00
+CORR   .18  1.00
+CORR  -.22  -.17  1.00
+"CORR"   .36   .31  -.14  1.00
+COR   .27   .16  -.12   .22  1.00
+CORR   .33   .15  -.17   .24   .21  1.00
+CORR   .50   .29  -.20   .32   .12   .38  1.00
+CORR   .17   .29  -.05   .20   .27   .20   .04  1.00
  ])
  
-
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
  Table: Data List
  ROWTYPE_,VARNAME_,var01,var02,var03,var04,var05,var06,var07,var08
-mean,,24.3000,5.4000,69.7000,20.1000,13.4000,2.7000,27.9000,3.7000
-STDDEV,,5.7000,1.5000,23.5000,5.8000,2.8000,4.5000,5.4000,1.5000
-n,,92.0000,92.0000,92.0000,92.0000,92.0000,92.0000,92.0000,92.0000
-corr,var01,1.0000,.1800,-.2200,.3600,.2700,.3300,.5000,.1700
-corr,var02,.1800,1.0000,-.1700,.3100,.1600,.1500,.2900,.2900
-corr,var03,-.2200,-.1700,1.0000,-.1400,-.1200,-.1700,-.2000,-.0500
-corr,var04,.3600,.3100,-.1400,1.0000,.2200,.2400,.3200,.2000
-corr,var05,.2700,.1600,-.1200,.2200,1.0000,.2100,.1200,.2700
-corr,var06,.3300,.1500,-.1700,.2400,.2100,1.0000,.3800,.2000
-corr,var07,.5000,.2900,-.2000,.3200,.1200,.3800,1.0000,.0400
-corr,var08,.1700,.2900,-.0500,.2000,.2700,.2000,.0400,1.0000
+MEAN,,24.30,5.40,69.70,20.10,13.40,2.70,27.90,3.70
+STDDEV,,5.70,1.50,23.50,5.80,2.80,4.50,5.40,1.50
+N,,92.00,92.00,92.00,92.00,92.00,92.00,92.00,92.00
+CORR,var01,1.00,.18,-.22,.36,.27,.33,.50,.17
+CORR,var02,.18,1.00,-.17,.31,.16,.15,.29,.29
+CORR,var03,-.22,-.17,1.00,-.14,-.12,-.17,-.20,-.05
+CORR,var04,.36,.31,-.14,1.00,.22,.24,.32,.20
+CORR,var05,.27,.16,-.12,.22,1.00,.21,.12,.27
+CORR,var06,.33,.15,-.17,.24,.21,1.00,.38,.20
+CORR,var07,.50,.29,-.20,.32,.12,.38,1.00,.04
+CORR,var08,.17,.29,-.05,.20,.27,.20,.04,1.00
  ])
  AT_CLEANUP
  
-
-
-AT_SETUP([Matrix data (upper)])
-
-AT_DATA([matrix-data.pspp], [dnl
+AT_SETUP([MATRIX DATA - UPPER DIAGONAL with ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = rowtype_  var01 var02 var03 var04
      /format = upper diagonal.
@@ -73,7 +65,7 @@ matrix data
  begin data
  mean        34 35 36 37
  sd          22 11 55 66
-n_vector    100 101 102 103
+n_ve    100 101 102 103
  corr        1 9 8 7
  corr        1 6 5
  corr        1 4
@@ -83,24 +75,22 @@ end data.
  list.
  ])
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
  Table: Data List
  ROWTYPE_,VARNAME_,var01,var02,var03,var04
-mean,,34.0000,35.0000,36.0000,37.0000
+MEAN,,34.0000,35.0000,36.0000,37.0000
  STDDEV,,22.0000,11.0000,55.0000,66.0000
  N,,100.0000,101.0000,102.0000,103.0000
-corr,var01,1.0000,9.0000,8.0000,7.0000
-corr,var02,9.0000,1.0000,6.0000,5.0000
-corr,var03,8.0000,6.0000,1.0000,4.0000
-corr,var04,7.0000,5.0000,4.0000,1.0000
+CORR,var01,1.0000,9.0000,8.0000,7.0000
+CORR,var02,9.0000,1.0000,6.0000,5.0000
+CORR,var03,8.0000,6.0000,1.0000,4.0000
+CORR,var04,7.0000,5.0000,4.0000,1.0000
  ])
-
  AT_CLEANUP
  
-AT_SETUP([Matrix data (full)])
-
+AT_SETUP([MATRIX DATA - FULL with ROWTYPE_])
  dnl Just for fun, this one is in a different case.
-AT_DATA([matrix-data.pspp], [dnl
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = ROWTYPE_  var01 var02 var03 var04
      /format = full diagonal.
@@ -118,8 +108,7 @@ end data.
  list.
  ])
  
-
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
  Table: Data List
  ROWTYPE_,VARNAME_,var01,var02,var03,var04
  MEAN,,34.0000,35.0000,36.0000,37.0000
@@ -130,13 +119,11 @@ CORR,var02,9.0000,1.0000,6.0000,5.0000
  CORR,var03,8.0000,6.0000,1.0000,4.0000
  CORR,var04,7.0000,5.0000,4.0000,1.0000
  ])
-
  AT_CLEANUP
  
  
-AT_SETUP([Matrix data (upper nodiagonal)])
-
-AT_DATA([matrix-data.pspp], [dnl
+AT_SETUP([MATRIX DATA - UPPER NODIAGONAL with ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = rowtype_  var01 var02 var03 var04
      /format = upper nodiagonal.
@@ -153,27 +140,63 @@ end data.
  list.
  ])
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
  Table: Data List
  ROWTYPE_,VARNAME_,var01,var02,var03,var04
-mean,,34.0000,35.0000,36.0000,37.0000
+MEAN,,34.0000,35.0000,36.0000,37.0000
  STDDEV,,22.0000,11.0000,55.0000,66.0000
-n,,100.0000,101.0000,102.0000,103.0000
-corr,var01,1.0000,9.0000,8.0000,7.0000
-corr,var02,9.0000,1.0000,6.0000,5.0000
-corr,var03,8.0000,6.0000,1.0000,4.0000
-corr,var04,7.0000,5.0000,4.0000,1.0000
+N,,100.0000,101.0000,102.0000,103.0000
+CORR,var01,1.0000,9.0000,8.0000,7.0000
+CORR,var02,9.0000,1.0000,6.0000,5.0000
+CORR,var03,8.0000,6.0000,1.0000,4.0000
+CORR,var04,7.0000,5.0000,4.0000,1.0000
  ])
-
  AT_CLEANUP
  
+dnl Keep this test in sync with Example 2 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - UPPER NODIAGONAL with ROWTYPE_ - 2])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=ROWTYPE_ var01 TO var08
+    /FORMAT=UPPER NODIAGONAL.
+BEGIN DATA.
+MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+N       92    92    92    92    92    92    92    92
+CORR         .17   .50  -.33   .27   .36  -.22   .18
+CORR               .29   .29  -.20   .32   .12   .38
+CORR                     .05   .20  -.15   .16   .21
+CORR                           .20   .32  -.17   .12
+CORR                                 .27   .12  -.24
+CORR                                      -.20  -.38
+CORR                                             .04
+END DATA.
+FORMATS var01 TO var08(F6.2).
+LIST.
+])
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04,var05,var06,var07,var08
+MEAN,,24.30,5.40,69.70,20.10,13.40,2.70,27.90,3.70
+STDDEV,,5.70,1.50,23.50,5.80,2.80,4.50,5.40,1.50
+N,,92.00,92.00,92.00,92.00,92.00,92.00,92.00,92.00
+CORR,var01,1.00,.17,.50,-.33,.27,.36,-.22,.18
+CORR,var02,.17,1.00,.29,.29,-.20,.32,.12,.38
+CORR,var03,.50,.29,1.00,.05,.20,-.15,.16,.21
+CORR,var04,-.33,.29,.05,1.00,.20,.32,-.17,.12
+CORR,var05,.27,-.20,.20,.20,1.00,.27,.12,-.24
+CORR,var06,.36,.32,-.15,.32,.27,1.00,-.20,-.38
+CORR,var07,-.22,.12,.16,-.17,.12,-.20,1.00,.04
+CORR,var08,.18,.38,.21,.12,-.24,-.38,.04,1.00
+])
+AT_CLEANUP
  
-AT_SETUP([Matrix data (lower nodiagonal)])
-
-AT_DATA([matrix-data.pspp], [dnl
+AT_SETUP([MATRIX DATA - LOWER NODIAGONAL with ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = rowtype_  var01 var02 var03 var04
-    /format = lower nodiagonal.
+    /format = lower nodiagonal
+    /cells = 2.
  
  begin data
  mean 34 35 36 37
@@ -187,25 +210,23 @@ end data.
  list.
  ])
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
+matrix-data.sps:4: warning: MATRIX DATA: CELLS is ignored when VARIABLES includes ROWTYPE_
+
  Table: Data List
  ROWTYPE_,VARNAME_,var01,var02,var03,var04
-mean,,34.0000,35.0000,36.0000,37.0000
+MEAN,,34.0000,35.0000,36.0000,37.0000
  STDDEV,,22.0000,11.0000,55.0000,66.0000
-n,,100.0000,101.0000,102.0000,103.0000
-corr,var01,1.0000,9.0000,8.0000,7.0000
-corr,var02,9.0000,1.0000,6.0000,5.0000
-corr,var03,8.0000,6.0000,1.0000,4.0000
-corr,var04,7.0000,5.0000,4.0000,1.0000
+N,,100.0000,101.0000,102.0000,103.0000
+CORR,var01,1.0000,9.0000,8.0000,7.0000
+CORR,var02,9.0000,1.0000,6.0000,5.0000
+CORR,var03,8.0000,6.0000,1.0000,4.0000
+CORR,var04,7.0000,5.0000,4.0000,1.0000
  ])
-
  AT_CLEANUP
  
-
-
-AT_SETUP([Matrix data split data])
-
-AT_DATA([matrix-data.pspp], [dnl
+AT_SETUP([MATRIX DATA - split data])
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = s1 s2 rowtype_  var01 var02 var03
      /split=s1 s2.
@@ -230,8 +251,7 @@ display dictionary.
  list.
  ])
  
-
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
  Table: Variables
  Name,Position,Measurement Level,Role,Width,Alignment,Print Format,Write Format
  s1,1,Scale,Input,8,Right,F4.0,F4.0
@@ -242,62 +262,229 @@ var01,5,Scale,Input,8,Right,F10.4,F10.4
  var02,6,Scale,Input,8,Right,F10.4,F10.4
  var03,7,Scale,Input,8,Right,F10.4,F10.4
  
+Table: Split Values
+Variable,Value
+s1,8
+s2,0
+
  Table: Data List
  s1,s2,ROWTYPE_,VARNAME_,var01,var02,var03
-8,0,mean,,21.4000,5.0000,72.9000
+8,0,MEAN,,21.4000,5.0000,72.9000
  8,0,STDDEV,,6.5000,1.6000,22.8000
-8,0,n,,106.0000,106.0000,106.0000
-8,0,corr,var01,1.0000,.4100,-.1600
-8,0,corr,var02,.4100,1.0000,-.2200
-8,0,corr,var03,-.1600,-.2200,1.0000
-8,1,mean,,11.4000,1.0000,52.9000
+8,0,N,,106.0000,106.0000,106.0000
+8,0,CORR,var01,1.0000,.4100,-.1600
+8,0,CORR,var02,.4100,1.0000,-.2200
+8,0,CORR,var03,-.1600,-.2200,1.0000
+
+Table: Split Values
+Variable,Value
+s1,8
+s2,1
+
+Table: Data List
+s1,s2,ROWTYPE_,VARNAME_,var01,var02,var03
+8,1,MEAN,,11.4000,1.0000,52.9000
  8,1,STDDEV,,9.5000,8.6000,12.8000
-8,1,n,,10.0000,11.0000,12.0000
-8,1,corr,var01,1.0000,.5100,.3600
-8,1,corr,var02,.5100,1.0000,-.4100
-8,1,corr,var03,.3600,-.4100,1.0000
+8,1,N,,10.0000,11.0000,12.0000
+8,1,CORR,var01,1.0000,.5100,.3600
+8,1,CORR,var02,.5100,1.0000,-.4100
+8,1,CORR,var03,.3600,-.4100,1.0000
  ])
  
  AT_CLEANUP
  
+dnl Keep this test in sync with Example 4 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - split data - 2])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=s1 ROWTYPE_  var01 TO var04
+    /SPLIT=s1
+    /FORMAT=FULL.
+BEGIN DATA.
+0 MEAN 34 35 36 37
+0 SD   22 11 55 66
+0 N    99 98 99 92
+0 CORR  1 .9 .8 .7
+0 CORR .9  1 .6 .5
+0 CORR .8 .6  1 .4
+0 CORR .7 .5 .4  1
+1 MEAN 44 45 34 39
+1 SD   23 15 51 46
+1 N    98 34 87 23
+1 CORR  1 .2 .3 .4
+1 CORR .2  1 .5 .6
+1 CORR .3 .5  1 .7
+1 CORR .4 .6 .7  1
+END DATA.
+FORMATS var01 TO var04(F5.1).
+LIST.
+])
+
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
+Table: Split Values
+Variable,Value
+s1,0
+
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+0,MEAN,,34.0,35.0,36.0,37.0
+0,STDDEV,,22.0,11.0,55.0,66.0
+0,N,,99.0,98.0,99.0,92.0
+0,CORR,var01,1.0,.9,.8,.7
+0,CORR,var02,.9,1.0,.6,.5
+0,CORR,var03,.8,.6,1.0,.4
+0,CORR,var04,.7,.5,.4,1.0
+
+Table: Split Values
+Variable,Value
+s1,1
  
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+1,MEAN,,44.0,45.0,34.0,39.0
+1,STDDEV,,23.0,15.0,51.0,46.0
+1,N,,98.0,34.0,87.0,23.0
+1,CORR,var01,1.0,.2,.3,.4
+1,CORR,var02,.2,1.0,.5,.6
+1,CORR,var03,.3,.5,1.0,.7
+1,CORR,var04,.4,.6,.7,1.0
+])
+AT_CLEANUP
  
+dnl Keep this test in sync with Example 5 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - factor variables])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=ROWTYPE_ f1 var01 TO var04
+    /FACTOR=f1.
+BEGIN DATA.
+MEAN 0 34 35 36 37
+SD   0 22 11 55 66
+N    0 99 98 99 92
+MEAN 1 44 45 34 39
+SD   1 23 15 51 46
+N    1 98 34 87 23
+CORR .  1
+CORR . .9  1
+CORR . .8 .6  1
+CORR . .7 .5 .4  1
+END DATA.
+FORMATS var01 TO var04(F5.1).
+LIST.
+])
  
-AT_SETUP([Matrix data duplicate variable])
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
+Table: Data List
+ROWTYPE_,f1,VARNAME_,var01,var02,var03,var04
+MEAN,0,,34.0,35.0,36.0,37.0
+STDDEV,0,,22.0,11.0,55.0,66.0
+N,0,,99.0,98.0,99.0,92.0
+MEAN,1,,44.0,45.0,34.0,39.0
+STDDEV,1,,23.0,15.0,51.0,46.0
+N,1,,98.0,34.0,87.0,23.0
+CORR,.,var01,1.0,.9,.8,.7
+CORR,.,var02,.9,1.0,.6,.5
+CORR,.,var03,.8,.6,1.0,.4
+CORR,.,var04,.7,.5,.4,1.0
+])
+AT_CLEANUP
  
-dnl Negative test to check for sane behaviour in the face of bad syntax
-AT_DATA([matrix-data.pspp], [dnl
-set decimal = dot .
+AT_SETUP([MATRIX DATA - bad ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
  matrix data
-    variables = s1 s1 rowtype_  var01 var02 var03
-    /split=s1.
+    variables = rowtype_  var01 var02 var03 var04
+    /format = upper diagonal.
  
  begin data
-0   mean     21.4  5.0  72.9
-0   sd       6.5   1.6  22.8
-0   n        106   106  106
-0   corr     1
-0   corr    .41  1
-0   corr    -.16  -.22  1
-end data .
+cork        1 9 8 7
+corr        1 6 5
+corr        1 4
+corr        1
+end data.
  
  list.
  ])
  
+AT_CHECK([pspp -O format=csv matrix-data.sps], [1], [dnl
+"matrix-data.sps:6.1-6.4: error: Unknown row type ""cork""."
+])
+AT_CLEANUP
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [1], [dnl
-matrix-data.pspp:3: error: MATRIX DATA: Variable s1 appears twice in variable list.
+AT_SETUP([MATRIX DATA - unexpected ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+matrix data
+    variables = rowtype_ f1 var01 var02 var03 var04
+    /content = corr (sd)
+    /factor = f1
+    /format = upper diagonal.
  
-matrix-data.pspp:6: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
+begin data
+corr . 1 9 8 7
+corr . 1 6 5
+corr . 1 4
+corr . 1
+sd   . 1 2 3 4
+
+corr 0 1 9 8 7
+corr 0 1 6 5
+corr 0 1 4
+corr 0 1
+sd   0 1 2 3 4
+end data.
+
+list.
  ])
  
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
+matrix-data.sps:12: warning: Data contains pooled row type STDDEV not included in CONTENTS.
+
+matrix-data.sps:14: warning: Data contains with-factors row type CORR not included in CONTENTS.
+
+Table: Data List
+ROWTYPE_,f1,VARNAME_,var01,var02,var03,var04
+CORR,.,var01,1.0000,9.0000,8.0000,7.0000
+CORR,.,var02,9.0000,1.0000,6.0000,5.0000
+CORR,.,var03,8.0000,6.0000,1.0000,4.0000
+CORR,.,var04,7.0000,5.0000,4.0000,1.0000
+STDDEV,.,,1.0000,2.0000,3.0000,4.0000
+CORR,0,var01,1.0000,9.0000,8.0000,7.0000
+CORR,0,var02,9.0000,1.0000,6.0000,5.0000
+CORR,0,var03,8.0000,6.0000,1.0000,4.0000
+CORR,0,var04,7.0000,5.0000,4.0000,1.0000
+STDDEV,0,,1.0000,2.0000,3.0000,4.0000
+])
  AT_CLEANUP
  
+AT_SETUP([MATRIX DATA - bad number])
+AT_DATA([matrix-data.sps], [dnl
+matrix data
+    variables = rowtype_  var01 var02 var03 var04
+    /format = upper diagonal.
+
+begin data
+corr        1 9 8 7
+corr        1 x 5
+corr        1 4
+corr        1
+end data.
  
+list.
+])
  
-AT_SETUP([Matrix data - long variable names])
+AT_CHECK([pspp -O format=csv matrix-data.sps], [1], [dnl
+matrix-data.sps:7.15: error: Field contents are not numeric.
  
-AT_DATA([matrix-data.pspp], [dnl
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04
+CORR,var01,1.0000,9.0000,8.0000,7.0000
+CORR,var02,9.0000,1.0000,.    ,5.0000
+CORR,var03,8.0000,.    ,1.0000,4.0000
+CORR,var04,7.0000,5.0000,4.0000,1.0000
+])
+AT_CLEANUP
+
+AT_SETUP([MATRIX DATA - long variable names])
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = rowtype_  var01 var_two variable_number_three variableFour
      /format = upper diagonal.
@@ -315,29 +502,25 @@ end data.
  list.
  ])
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
  Table: Data List
  ROWTYPE_,VARNAME_,var01,var_two,variable_number_three,variableFour
-mean,,34.0000,35.0000,36.0000,37.0000
+MEAN,,34.0000,35.0000,36.0000,37.0000
  STDDEV,,22.0000,11.0000,55.0000,66.0000
  N,,100.0000,101.0000,102.0000,103.0000
-corr,var01,1.0000,9.0000,8.0000,7.0000
-corr,var_two,9.0000,1.0000,6.0000,5.0000
-corr,variable_number_three,8.0000,6.0000,1.0000,4.0000
-corr,variableFour,7.0000,5.0000,4.0000,1.0000
+CORR,var01,1.0000,9.0000,8.0000,7.0000
+CORR,var_two,9.0000,1.0000,6.0000,5.0000
+CORR,variable_number_three,8.0000,6.0000,1.0000,4.0000
+CORR,variableFour,7.0000,5.0000,4.0000,1.0000
  ])
-
  AT_CLEANUP
  
-
-
-AT_SETUP([Matrix reader - read integrity])
-
+AT_SETUP([MATRIX DATA - read integrity])
  dnl Check that matrices presented are read correctly.
  dnl The example below is an unlikely one since all
  dnl covariance/correlation matrices must be symmetrical
  dnl but it serves a purpose for this test.
-AT_DATA([matrix-reader.pspp], [dnl
+AT_DATA([matrix-reader.sps], [dnl
  matrix data
      variables = rowtype_  var01 to var9
      /format = full.
@@ -355,7 +538,9 @@ corr 71 72 73 74 75 76 77 78 79
  corr 81 82 83 84 85 86 87 88 89
  corr 91 92 93 94 95 96 97 98 99
  end data.
-
+DEBUG MATRIX READ.
+FORMATS var01 to var09(F3.0).
+list.
  factor  /matrix = in (corr = *)
         /analysis var02 var04 var06
         /method = correlation
@@ -363,7 +548,36 @@ factor  /matrix = in (corr = *)
         /print correlation.
  ])
  
-AT_CHECK([pspp -O format=csv matrix-reader.pspp], [0], [dnl
+AT_CHECK([pspp --testing-mode -O format=csv matrix-reader.sps], [0], [dnl
+Table: Debug Matrix Reader
+,,,var01,var02,var03,var04,var05,var06,var07,var08,var09
+1,Correlation,var01,11.000,12.000,13.000,14.000,15.000,16.000,17.000,18.000,19.000
+,,var02,21.000,22.000,23.000,24.000,25.000,26.000,27.000,28.000,29.000
+,,var03,31.000,32.000,33.000,34.000,35.000,36.000,37.000,38.000,39.000
+,,var04,41.000,42.000,43.000,44.000,45.000,46.000,47.000,48.000,49.000
+,,var05,51.000,52.000,53.000,54.000,55.000,56.000,57.000,58.000,59.000
+,,var06,61.000,62.000,63.000,64.000,65.000,66.000,67.000,68.000,69.000
+,,var07,71.000,72.000,73.000,74.000,75.000,76.000,77.000,78.000,79.000
+,,var08,81.000,82.000,83.000,84.000,85.000,86.000,87.000,88.000,89.000
+,,var09,91.000,92.000,93.000,94.000,95.000,96.000,97.000,98.000,99.000
+,N,Value,1.000,2.000,3.000,4.000,5.000,6.000,7.000,8.000,9.000
+,Mean,Value,.000,.000,.000,.000,.000,.000,.000,.000,.000
+,Standard Deviation,Value,100.000,200.000,300.000,400.000,500.000,600.000,700.000,800.000,900.000
+
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04,var05,var06,var07,var08,var09
+N,,1,2,3,4,5,6,7,8,9
+STDDEV,,100,200,300,400,500,600,700,800,900
+CORR,var01,11,12,13,14,15,16,17,18,19
+CORR,var02,21,22,23,24,25,26,27,28,29
+CORR,var03,31,32,33,34,35,36,37,38,39
+CORR,var04,41,42,43,44,45,46,47,48,49
+CORR,var05,51,52,53,54,55,56,57,58,59
+CORR,var06,61,62,63,64,65,66,67,68,69
+CORR,var07,71,72,73,74,75,76,77,78,79
+CORR,var08,81,82,83,84,85,86,87,88,89
+CORR,var09,91,92,93,94,95,96,97,98,99
+
  Table: Correlation Matrix
  ,,var02,var04,var06
  Correlation,var02,22.000,24.000,26.000
@@ -377,15 +591,12 @@ var02,6.73,-2.23
  var04,6.95,2.15
  var06,9.22,.01
  ])
-
  AT_CLEANUP
  
-
-AT_SETUP([Matrix data - too many rows])
-
+AT_SETUP([MATRIX DATA - too many rows])
  dnl Test for a crash which occurred when the matrix had more rows declared
  dnl than variables to hold them.
-AT_DATA([matrix-data.pspp], [dnl
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = rowtype_
      var01 var02 var03 var04
@@ -404,27 +615,85 @@ begin data
      corr    1.00  .70
      corr    1.00
  end data .
-
-execute.
+FORMATS var01 TO var04 (F6.2).
+LIST.
  ])
  
+AT_CHECK([pspp -O format=csv matrix-data.sps], [1], [dnl
+matrix-data.sps:10.29-10.31: error: Extraneous data expecting end of line.
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [1], [dnl
-matrix-data.pspp:13: error: MATRIX DATA: There are 4 variable declared but the data has at least 5 matrix rows.
+matrix-data.sps:11.24-11.31: error: Extraneous data expecting end of line.
  
-matrix-data.pspp:20: error: EXECUTE: EXECUTE is allowed only after the active dataset has been defined.
-])
+matrix-data.sps:12.19-12.32: error: Extraneous data expecting end of line.
  
+matrix-data.sps:18: error: Matrix CORR had 9 rows but 4 rows were expected.
  
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04
+MEAN,,21.40,5.00,72.90,17.40
+STDDEV,,6.50,1.60,22.80,5.70
+N,,106.00,106.00,106.00,106.00
+CORR,var01,1.00,.32,.48,.28
+CORR,var02,.32,1.00,.72,.54
+CORR,var03,.48,.72,1.00,.50
+CORR,var04,.28,.54,.50,1.00
+])
  AT_CLEANUP
  
+AT_SETUP([MATRIX DATA - too few rows])
+AT_DATA([matrix-data.sps], [dnl
+matrix data
+    variables = rowtype_ s1 var01 var02 var03 var04
+    /split s1
+    /format = upper diagonal
+    /file='matrix-data.txt'.
+FORMATS var01 TO var04 (F6.2).
+LIST.
+])
+AT_DATA([matrix-data.txt], [dnl
+mean 1    21.4  5.0  72.9  17.4
+sd   1    6.5  1.6  22.8  5.7
+n    1   106  106  106  106
+corr 1   1.00  .32  .48  .28
+corr 2   1.00  .32  .48  .28
+corr 2        2.00  .72  .54
+])
  
+AT_CHECK([pspp -O format=csv matrix-data.sps], [1], [dnl
+matrix-data.txt:5: error: Matrix CORR had 1 rows but 4 rows were expected.
+ 
+matrix-data.txt:6: error: Matrix CORR had 2 rows but 4 rows were expected.
  
+Table: Split Values
+Variable,Value
+s1,1
  
-AT_SETUP([Matrix data (badly formed)])
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+1,MEAN,,21.40,5.00,72.90,17.40
+1,STDDEV,,6.50,1.60,22.80,5.70
+1,N,,106.00,106.00,106.00,106.00
+1,CORR,var01,1.00,.32,.48,.28
+1,CORR,var02,.32,1.00,.  ,.  @&t@
+1,CORR,var03,.48,.  ,1.00,.  @&t@
+1,CORR,var04,.28,.  ,.  ,1.00
+
+Table: Split Values
+Variable,Value
+s1,2
  
-AT_DATA([data.pspp], [dnl
-data list list /ROWTYPE_ (a8) VARNAME_(a4) v1 v2 v3 v4xxxxxxxxxxxxxxxxxxxxxzzzzzzzzzzzzzxxxxxxxxx.
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+2,CORR,var01,1.00,.32,.48,.28
+2,CORR,var02,.32,2.00,.72,.54
+2,CORR,var03,.48,.72,1.00,.  @&t@
+2,CORR,var04,.28,.54,.  ,1.00
+])
+AT_CLEANUP
+
+AT_SETUP([MATRIX DATA - badly formed])
+AT_DATA([data.sps], [dnl
+data list list NOTABLE /ROWTYPE_ (a8) VARNAME_(a4) v1 v2 v3 v4xxxxxxxxxxxxxxxxxxxxxzzzzzzzzzzzzzxxxxxxxxx.
  begin data
  mean ""                          1 2 3 4
  sd   ""                          5 6 7 8
@@ -435,22 +704,26 @@ corr v3                          111 222 333 444
  corr v4                           4 3 21 1
  end data.
  
-list.
-
-factor matrix=in(corr = *)
-       .
+DEBUG MATRIX READ.
  ])
  
-AT_CHECK([pspp -O format=csv data.pspp], [1], [ignore])
-
+AT_CHECK([pspp --testing-mode -O format=csv data.sps], [0], [dnl
+data.sps:12: warning: DEBUG MATRIX READ: CORR matrix has 4 columns but 3 rows named variables to be analyzed (and 1 rows named unknown variables).
+
+Table: Debug Matrix Reader
+,,,v1,v2,v3,v4xxxxxxxxxxxxxxxxxxxxxzzzzzzzzzzzzzxxxxxxxxx
+1,Correlation,v1,11.000,22.000,33.000,44.000
+,,v2,55.000,66.000,77.000,88.000
+,,v3,111.000,222.000,333.000,444.000
+,,v4xxxxxxxxxxxxxxxxxxxxxzzzzzzzzzzzzzxxxxxxxxx,.   ,.   ,.   ,.   @&t@
+,N,Value,2.000,3.000,4.000,5.000
+,Mean,Value,1.000,2.000,3.000,4.000
+,Standard Deviation,Value,.000,.000,.000,.000
+])
  AT_CLEANUP
  
-
-
-
-AT_SETUP([Matrix data (N subcommand)])
-
-AT_DATA([matrix-data.pspp], [dnl
+AT_SETUP([MATRIX DATA - N subcommand])
+AT_DATA([matrix-data.sps], [dnl
  matrix data
      variables = rowtype_  var01 var02 var03 var04
      /n = 99
@@ -467,29 +740,65 @@ end data.
  list.
  ])
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [0], [dnl
-"matrix-data.pspp:12: warning: MATRIX DATA: The N subcommand was specified, but a N record was also found in the data.  The N record will be ignored."
+AT_CHECK([pspp -O format=csv matrix-data.sps], [1], [dnl
+matrix-data.sps:8: error: N record is not allowed with N subcommand.  Ignoring N record.
  
  Table: Data List
  ROWTYPE_,VARNAME_,var01,var02,var03,var04
  N,,99.0000,99.0000,99.0000,99.0000
-mean,,34.0000,35.0000,36.0000,37.0000
+MEAN,,34.0000,35.0000,36.0000,37.0000
  STDDEV,,22.0000,11.0000,55.0000,66.0000
-corr,var01,1.0000,9.0000,8.0000,7.0000
-corr,var02,9.0000,1.0000,6.0000,5.0000
-corr,var03,8.0000,6.0000,1.0000,4.0000
-corr,var04,7.0000,5.0000,4.0000,1.0000
+CORR,var01,1.0000,9.0000,8.0000,7.0000
+CORR,var02,9.0000,1.0000,6.0000,5.0000
+CORR,var03,8.0000,6.0000,1.0000,4.0000
+CORR,var04,7.0000,5.0000,4.0000,1.0000
  ])
-
  AT_CLEANUP
  
+dnl Keep this test in sync with Example 3 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - N subcommand - 2])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=ROWTYPE_ var01 TO var08
+    /FORMAT=UPPER NODIAGONAL
+    /N 92.
+BEGIN DATA.
+MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+CORR         .17   .50  -.33   .27   .36  -.22   .18
+CORR               .29   .29  -.20   .32   .12   .38
+CORR                     .05   .20  -.15   .16   .21
+CORR                           .20   .32  -.17   .12
+CORR                                 .27   .12  -.24
+CORR                                      -.20  -.38
+CORR                                             .04
+END DATA.
+FORMATS var01 TO var08(F6.2).
+LIST.
+])
  
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04,var05,var06,var07,var08
+N,,92.00,92.00,92.00,92.00,92.00,92.00,92.00,92.00
+MEAN,,24.30,5.40,69.70,20.10,13.40,2.70,27.90,3.70
+STDDEV,,5.70,1.50,23.50,5.80,2.80,4.50,5.40,1.50
+CORR,var01,1.00,.17,.50,-.33,.27,.36,-.22,.18
+CORR,var02,.17,1.00,.29,.29,-.20,.32,.12,.38
+CORR,var03,.50,.29,1.00,.05,.20,-.15,.16,.21
+CORR,var04,-.33,.29,.05,1.00,.20,.32,-.17,.12
+CORR,var05,.27,-.20,.20,.20,1.00,.27,.12,-.24
+CORR,var06,.36,.32,-.15,.32,.27,1.00,-.20,-.38
+CORR,var07,-.22,.12,.16,-.17,.12,-.20,1.00,.04
+CORR,var08,.18,.38,.21,.12,-.24,-.38,.04,1.00
+])
+AT_CLEANUP
  
  dnl A "no-crash" test.  This was observed to cause problems.
  dnl See bug #58596
-AT_SETUP([Matrix data crash])
+AT_SETUP([MATRIX DATA - crash])
  
-AT_DATA([matrix-data.pspp], [dnl
+AT_DATA([matrix-data.sps], [dnl
  begin data
  corr 31
  
@@ -505,6 +814,396 @@ matrix data
  begin data
  ])
  
-AT_CHECK([pspp -O format=csv matrix-data.pspp], [1], [ignore])
+AT_CHECK([pspp -O format=csv matrix-data.sps], [1], [ignore])
+AT_CLEANUP
+\f
+dnl Keep this test in sync with Example 6 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - LOWER DIAGONAL without ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=var01 TO var08
+   /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+ 5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+  92    92    92    92    92    92    92    92
+1.00
+ .18  1.00
+-.22  -.17  1.00
+ .36   .31  -.14  1.00
+ .27   .16  -.12   .22  1.00
+ .33   .15  -.17   .24   .21  1.00
+ .50   .29  -.20   .32   .12   .38  1.00
+ .17   .29  -.05   .20   .27   .20   .04  1.00
+END DATA.
+FORMATS var01 TO var08(F5.2).
+LIST.
+])
+AT_CHECK([pspp matrix-data.sps -O format=csv], [0], [dnl
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04,var05,var06,var07,var08
+MEAN,,24.30,5.40,69.70,20.10,13.40,2.70,27.90,3.70
+STDDEV,,5.70,1.50,23.50,5.80,2.80,4.50,5.40,1.50
+N,,92.00,92.00,92.00,92.00,92.00,92.00,92.00,92.00
+CORR,var01,1.00,.18,-.22,.36,.27,.33,.50,.17
+CORR,var02,.18,1.00,-.17,.31,.16,.15,.29,.29
+CORR,var03,-.22,-.17,1.00,-.14,-.12,-.17,-.20,-.05
+CORR,var04,.36,.31,-.14,1.00,.22,.24,.32,.20
+CORR,var05,.27,.16,-.12,.22,1.00,.21,.12,.27
+CORR,var06,.33,.15,-.17,.24,.21,1.00,.38,.20
+CORR,var07,.50,.29,-.20,.32,.12,.38,1.00,.04
+CORR,var08,.17,.29,-.05,.20,.27,.20,.04,1.00
+])
+AT_CLEANUP
+
+AT_SETUP([MATRIX DATA - extraneous data without ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=var01 TO var08
+   /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
+ 5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
+  92    92    92    92    92    92    92    92
+1.00   .18
+ .18  1.00
+-.22  -.17  1.00
+ .36   .31  -.14  1.00
+ .27   .16  -.12   .22  1.00
+ .33   .15  -.17   .24   .21  1.00
+ .50   .29  -.20   .32   .12   .38  1.00
+ .17   .29  -.05   .20   .27   .20   .04  1.00
+END DATA.
+FORMATS var01 TO var08(F5.2).
+LIST.
+])
+AT_CHECK([pspp matrix-data.sps -O format=csv], [1], [dnl
+matrix-data.sps:8.8-8.10: error: Extraneous data expecting end of line.
+
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04,var05,var06,var07,var08
+MEAN,,24.30,5.40,69.70,20.10,13.40,2.70,27.90,3.70
+STDDEV,,5.70,1.50,23.50,5.80,2.80,4.50,5.40,1.50
+N,,92.00,92.00,92.00,92.00,92.00,92.00,92.00,92.00
+CORR,var01,1.00,.18,-.22,.36,.27,.33,.50,.17
+CORR,var02,.18,1.00,-.17,.31,.16,.15,.29,.29
+CORR,var03,-.22,-.17,1.00,-.14,-.12,-.17,-.20,-.05
+CORR,var04,.36,.31,-.14,1.00,.22,.24,.32,.20
+CORR,var05,.27,.16,-.12,.22,1.00,.21,.12,.27
+CORR,var06,.33,.15,-.17,.24,.21,1.00,.38,.20
+CORR,var07,.50,.29,-.20,.32,.12,.38,1.00,.04
+CORR,var08,.17,.29,-.05,.20,.27,.20,.04,1.00
+])
+AT_CLEANUP
+
+dnl Keep this test in sync with Example 7 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - Split variables with explicit values without ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=s1 var01 TO var04
+    /SPLIT=s1
+    /FORMAT=FULL
+    /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+0  1 .9 .8 .7
+0 .9  1 .6 .5
+0 .8 .6  1 .4
+0 .7 .5 .4  1
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+1  1 .2 .3 .4
+1 .2  1 .5 .6
+1 .3 .5  1 .7
+1 .4 .6 .7  1
+END DATA.
+FORMATS var01 TO var04(F5.2).
+LIST.
+])
+AT_CHECK([pspp matrix-data.sps -O format=csv], [0], [dnl
+Table: Split Values
+Variable,Value
+s1,0
+
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+0,MEAN,,34.00,35.00,36.00,37.00
+0,STDDEV,,22.00,11.00,55.00,66.00
+0,N,,99.00,98.00,99.00,92.00
+0,CORR,var01,1.00,.90,.80,.70
+0,CORR,var02,.90,1.00,.60,.50
+0,CORR,var03,.80,.60,1.00,.40
+0,CORR,var04,.70,.50,.40,1.00
+
+Table: Split Values
+Variable,Value
+s1,1
  
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+1,MEAN,,44.00,45.00,34.00,39.00
+1,STDDEV,,23.00,15.00,51.00,46.00
+1,N,,98.00,34.00,87.00,23.00
+1,CORR,var01,1.00,.20,.30,.40
+1,CORR,var02,.20,1.00,.50,.60
+1,CORR,var03,.30,.50,1.00,.70
+1,CORR,var04,.40,.60,.70,1.00
+])
+AT_CLEANUP
+
+dnl Keep this test in sync with Example 8 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - Split variable with sequential values without ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=var01 TO var04
+    /SPLIT=s1
+    /FORMAT=FULL
+    /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+34 35 36 37
+22 11 55 66
+99 98 99 92
+ 1 .9 .8 .7
+.9  1 .6 .5
+.8 .6  1 .4
+.7 .5 .4  1
+44 45 34 39
+23 15 51 46
+98 34 87 23
+ 1 .2 .3 .4
+.2  1 .5 .6
+.3 .5  1 .7
+.4 .6 .7  1
+END DATA.
+FORMATS var01 TO var04(F5.2).
+LIST.
+])
+AT_CHECK([pspp matrix-data.sps -O format=csv], [0], [dnl
+Table: Split Values
+Variable,Value
+s1,1
+
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+1,MEAN,,34.00,35.00,36.00,37.00
+1,STDDEV,,22.00,11.00,55.00,66.00
+1,N,,99.00,98.00,99.00,92.00
+1,CORR,var01,1.00,.90,.80,.70
+1,CORR,var02,.90,1.00,.60,.50
+1,CORR,var03,.80,.60,1.00,.40
+1,CORR,var04,.70,.50,.40,1.00
+
+Table: Split Values
+Variable,Value
+s1,2
+
+Table: Data List
+s1,ROWTYPE_,VARNAME_,var01,var02,var03,var04
+2,MEAN,,44.00,45.00,34.00,39.00
+2,STDDEV,,23.00,15.00,51.00,46.00
+2,N,,98.00,34.00,87.00,23.00
+2,CORR,var01,1.00,.20,.30,.40
+2,CORR,var02,.20,1.00,.50,.60
+2,CORR,var03,.30,.50,1.00,.70
+2,CORR,var04,.40,.60,.70,1.00
+])
+AT_CLEANUP
+
+dnl Keep this test in sync with Example 9 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - Factor variables grouping within-cell records by factor without ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=f1 var01 TO var04
+    /FACTOR=f1
+    /CELLS=2
+    /CONTENTS=(MEAN SD N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+   1
+  .9  1
+  .8 .6  1
+  .7 .5 .4  1
+END DATA.
+FORMATS var01 TO var04(F5.1).
+LIST.
+])
+AT_CHECK([pspp matrix-data.sps -O format=csv], [0], [dnl
+Table: Data List
+ROWTYPE_,f1,VARNAME_,var01,var02,var03,var04
+MEAN,0,,34.0,35.0,36.0,37.0
+STDDEV,0,,22.0,11.0,55.0,66.0
+N,0,,99.0,98.0,99.0,92.0
+MEAN,1,,44.0,45.0,34.0,39.0
+STDDEV,1,,23.0,15.0,51.0,46.0
+N,1,,98.0,34.0,87.0,23.0
+CORR,.,var01,1.0,.9,.8,.7
+CORR,.,var02,.9,1.0,.6,.5
+CORR,.,var03,.8,.6,1.0,.4
+CORR,.,var04,.7,.5,.4,1.0
+])
+AT_CLEANUP
+
+dnl Keep this test in sync with Example 10 in doc/matrices.texi.
+AT_SETUP([MATRIX DATA - Factor variables grouping within-cell records by row type without ROWTYPE_])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=f1 var01 TO var04
+    /FACTOR=f1
+    /CELLS=2
+    /CONTENTS=(MEAN) (SD) (N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+1 44 45 34 39
+0 22 11 55 66
+1 23 15 51 46
+0 99 98 99 92
+1 98 34 87 23
+   1
+  .9  1
+  .8 .6  1
+  .7 .5 .4  1
+END DATA.
+FORMATS var01 TO var04(F5.1).
+LIST.
+])
+AT_CHECK([pspp matrix-data.sps -O format=csv], [0], [dnl
+Table: Data List
+ROWTYPE_,f1,VARNAME_,var01,var02,var03,var04
+MEAN,0,,34.0,35.0,36.0,37.0
+MEAN,1,,44.0,45.0,34.0,39.0
+STDDEV,0,,22.0,11.0,55.0,66.0
+STDDEV,1,,23.0,15.0,51.0,46.0
+N,0,,99.0,98.0,99.0,92.0
+N,1,,98.0,34.0,87.0,23.0
+CORR,.,var01,1.0,.9,.8,.7
+CORR,.,var02,.9,1.0,.6,.5
+CORR,.,var03,.8,.6,1.0,.4
+CORR,.,var04,.7,.5,.4,1.0
+])
+AT_CLEANUP
+
+AT_SETUP([MATRIX DATA - syntax errors])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA VARIABLES=var01 varname_.
+MATRIX DATA VARIABLES=v v v.
+MATRIX DATA VARIABLES=rowtype_ v1 v2 v3/SPLIT=rowtype_.
+MATRIX DATA VARIABLES=rowtype_ v1 v2 v3/FACTORS=rowtype_.
+MATRIX DATA VARIABLES=rowtype_ s1 v1 v2 v3/SPLIT=v1/FACTORS=v1.
+
+MATRIX DATA VARIABLES=v1 v2 v3/FORMAT=FULL NODIAGONAL.
+MATRIX DATA VARIABLES=v1 v2 v3/FACTORS=v1.
+MATRIX DATA VARIABLES=v1 v2 v3.
+BEGIN DATA.
+END DATA.
+MATRIX DATA VARIABLES=v1/FACTORS=v1.
+MATRIX DATA VARIABLES=v1 v2 v3 ROWTYPE_.
+MATRIX DATA VARIABLES=v1 v2 v3/CONTENTS=N/N=5.
+MATRIX DATA VARIABLES=v1/CONTENTS=XYZZY.
+MATRIX DATA VARIABLES=v1/CONTENTS=(.
+MATRIX DATA VARIABLES=v1/CONTENTS=(CORR.
+MATRIX DATA VARIABLES=v1/CONTENTS=).
+MATRIX DATA.
+MATRIX DATA VARIABLES=v*.
+MATRIX DATA VARIABLES=v/N=-1.
+MATRIX DATA VARIABLES=v/FORMAT=XYZZY.
+MATRIX DATA VARIABLES=v/FILE=123.
+MATRIX DATA VARIABLES=v/SPLIT=123.
+MATRIX DATA VARIABLES=v/CELLS=-1.
+MATRIX DATA VARIABLES=v/XYZZY.
+])
+AT_CHECK([pspp matrix-data.sps -O format=csv], [1], [dnl
+matrix-data.sps:1: error: MATRIX DATA: VARIABLES may not include VARNAME_.
+
+matrix-data.sps:2: error: MATRIX DATA: Variable v appears twice in variable list.
+
+matrix-data.sps:3: error: MATRIX DATA: ROWTYPE_ is not allowed on SPLIT or FACTORS.
+
+matrix-data.sps:4: error: MATRIX DATA: ROWTYPE_ is not allowed on SPLIT or FACTORS.
+
+matrix-data.sps:5: error: MATRIX DATA: v1 may not appear on both SPLIT and FACTORS.
+
+matrix-data.sps:7: error: MATRIX DATA: FORMAT=FULL and FORMAT=NODIAGONAL are mutually exclusive.
+
+matrix-data.sps:8: error: MATRIX DATA: CELLS is required when factor variables are specified and VARIABLES does not include ROWTYPE_.
+
+matrix-data.sps:9: warning: MATRIX DATA: CONTENTS was not specified and VARIABLES does not include ROWTYPE_.  Assuming CONTENTS=CORR.
+
+matrix-data.sps:12: error: MATRIX DATA: CELLS is required when factor variables are specified and VARIABLES does not include ROWTYPE_.
+
+matrix-data.sps:13: error: MATRIX DATA: VARIABLES includes ROWTYPE_ but the continuous variables are not the last ones on VARIABLES.
+
+matrix-data.sps:14: error: MATRIX DATA: Cannot specify N on CONTENTS along with the N subcommand.
+
+matrix-data.sps:15.35-15.39: error: MATRIX DATA: Syntax error at `XYZZY': Row type keyword expected.
+
+matrix-data.sps:16.36: error: MATRIX DATA: Syntax error at end of command: Row type keyword expected.
+
+matrix-data.sps:17.40: error: MATRIX DATA: Syntax error at end of command: Row type keyword expected.
+
+matrix-data.sps:18.35: error: MATRIX DATA: Syntax error at `)': Row type keyword expected.
+
+matrix-data.sps:19.12: error: MATRIX DATA: Syntax error at end of command: expecting VARIABLES.
+
+matrix-data.sps:20.24: error: MATRIX DATA: Syntax error at `*': expecting `/'.
+
+matrix-data.sps:21.27-21.28: error: MATRIX DATA: Syntax error at `-1': Expected non-negative integer for N.
+
+matrix-data.sps:22.32-22.36: error: MATRIX DATA: Syntax error at `XYZZY'.
+
+matrix-data.sps:23.30-23.32: error: MATRIX DATA: Syntax error at `123': expecting a file name or handle name.
+
+matrix-data.sps:24.31-24.33: error: MATRIX DATA: Syntax error at `123': expecting variable name.
+
+matrix-data.sps:25.31-25.32: error: MATRIX DATA: Syntax error at `-1': Expected non-negative integer for CELLS.
+
+matrix-data.sps:26.25-26.29: error: MATRIX DATA: Syntax error at `XYZZY'.
+])
+AT_CLEANUP
+
+dnl I don't know what lunatic thought this was OK, but we strive to be
+dnl compatible.
+AT_SETUP([MATRIX DATA - plus and minus as delimiters])
+AT_DATA([matrix-data.sps], [dnl
+MATRIX DATA
+    VARIABLES=ROWTYPE_ var01 TO var08.
+BEGIN DATA.
+MEAN+24.3+5.4+69.7+20.1+13.4+2.7+27.9+3.7
+SD +5.7+1.5+23.5+5.8+2.8+4.5+5.4+1.5
+N+92+92+92+92+92+92+92+92
+CORR+1.00
+CORR+.18+1.00
+CORR-.22e+0-.17+1.00
+CORR+.36d-0+.31-.14+1.00
+CORR+.27+.16-.12+.22+1.00
+CORR+.33+.15-.17+.24+.21+1.00
+CORR+.50+.29-.20+.32+.12+.38+1.00
+CORR+.17+.29-.05+.20+.27+.20+.04+1.00
+END DATA.
+FORMATS var01 TO var08(F5.2).
+LIST.
+])
+
+AT_CHECK([pspp -O format=csv matrix-data.sps], [0], [dnl
+Table: Data List
+ROWTYPE_,VARNAME_,var01,var02,var03,var04,var05,var06,var07,var08
+MEAN,,24.30,5.40,69.70,20.10,13.40,2.70,27.90,3.70
+STDDEV,,5.70,1.50,23.50,5.80,2.80,4.50,5.40,1.50
+N,,92.00,92.00,92.00,92.00,92.00,92.00,92.00,92.00
+CORR,var01,1.00,.18,-.22,.36,.27,.33,.50,.17
+CORR,var02,.18,1.00,-.17,.31,.16,.15,.29,.29
+CORR,var03,-.22,-.17,1.00,-.14,-.12,-.17,-.20,-.05
+CORR,var04,.36,.31,-.14,1.00,.22,.24,.32,.20
+CORR,var05,.27,.16,-.12,.22,1.00,.21,.12,.27
+CORR,var06,.33,.15,-.17,.24,.21,1.00,.38,.20
+CORR,var07,.50,.29,-.20,.32,.12,.38,1.00,.04
+CORR,var08,.17,.29,-.05,.20,.27,.20,.04,1.00
+])
  AT_CLEANUP
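
The "plus and minus as delimiters" test above expects a sign to end the previous field and start a new number, except where the sign belongs to an exponent (`e` or `d`). It also expects lower-triangular CORR rows to fill a symmetric matrix. The sketch below reproduces those expectations in Python for illustration only. It is not taken from matrix-data.c; the names `TOKEN`, `split_record`, and `expand_lower` are hypothetical, and the regular expression is an assumption about the field syntax that covers only the inputs in this test.

```python
import re

# Hypothetical token pattern (an assumption, not PSPP's lexer): either a word
# such as a ROWTYPE_ keyword, or a number whose optional leading sign also
# acts as the delimiter from the previous field.  "d" is accepted as an
# exponent marker alongside "e", as in the test data.
TOKEN = re.compile(
    r"[A-Za-z_][A-Za-z_0-9]*"
    r"|[+-]?(?:\d+\.?\d*|\.\d+)(?:[eEdD][+-]?\d+)?"
)


def split_record(line):
    """Split one record into its leading row-type word and a list of floats."""
    tokens = TOKEN.findall(line)
    rowtype, fields = tokens[0], tokens[1:]
    # Python's float() does not understand "d" exponents, so rewrite them.
    values = [float(re.sub(r"[dD]", "e", field)) for field in fields]
    return rowtype, values


def expand_lower(rows):
    """Mirror lower-triangular rows (diagonal included) into a full matrix."""
    n = len(rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value
    return full


# The CORR records from the test input above.
records = [
    "CORR+1.00",
    "CORR+.18+1.00",
    "CORR-.22e+0-.17+1.00",
    "CORR+.36d-0+.31-.14+1.00",
    "CORR+.27+.16-.12+.22+1.00",
    "CORR+.33+.15-.17+.24+.21+1.00",
    "CORR+.50+.29-.20+.32+.12+.38+1.00",
    "CORR+.17+.29-.05+.20+.27+.20+.04+1.00",
]
corr = expand_lower([split_record(r)[1] for r in records])
```

Here `split_record("CORR-.22e+0-.17+1.00")` yields `('CORR', [-0.22, -0.17, 1.0])`, and `corr[0]` holds the same values as the var01 row of the expected output (1.00, .18, -.22, .36, .27, .33, .50, .17).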