+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017, 2020, 2021 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
+@node Matrices
+@chapter Matrices
+
+Some @pspp{} procedures work with matrices by producing numeric
+matrices that report results of data analysis, or by consuming
+matrices as a basis for further analysis. This chapter documents the
+format of data files that store these matrices and commands for
+working with them.
+
+@node Matrix Files
+@section Matrix Files
+@vindex Matrix file
+
+A matrix file is an SPSS system file that conforms to the dictionary
+and case structure described in this section. Procedures that read
+matrices from files expect them to be in the matrix file format.
+Procedures that write matrices also use this format.
+
+Text files that contain matrices can be converted to matrix file
+format. @xref{MATRIX DATA}, for a command to read a text file as a
+matrix file.
+
+A matrix file's dictionary must have the following variables in the
+specified order:
+
+@enumerate
+@item
+Zero or more numeric split variables. These are included by
+procedures when @cmd{SPLIT FILE} is active. @cmd{MATRIX DATA} assigns
+split variables format F4.0.
+
+@item
+@code{ROWTYPE_}, a string variable with width 8. This variable
+indicates the kind of matrix or vector that a given case represents.
+The supported row types are listed below.
+
+@item
+Zero or more numeric factor variables. These are included by
+procedures that divide data into cells. For within-cell data, factor
+variables are filled with non-missing values; for pooled data, they
+are missing. @cmd{MATRIX DATA} assigns factor variables format F4.0.
+
+@item
+@code{VARNAME_}, a string variable. Matrix data includes one row per
+continuous variable (see below), naming each continuous variable in
+order. This column is blank for vector data. @cmd{MATRIX DATA} makes
+@code{VARNAME_} wide enough for the name of any of the continuous
+variables, but at least 8 bytes.
+
+@item
+One or more continuous variables. These are the variables whose data
+was analyzed to produce the matrices. @cmd{MATRIX DATA} assigns
+continuous variables format F10.4.
+@end enumerate
+
+Case weights are ignored in matrix files.
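+
+As a schematic illustration of this layout (not literal @pspp{}
+output), a matrix file with no split or factor variables and two
+continuous variables @code{var01} and @code{var02}, holding a vector
+of means and a correlation matrix, would contain cases along these
+lines. The values are made up for illustration:
+
+@example
+ROWTYPE_  VARNAME_  var01  var02
+MEAN                 24.3    5.4
+CORR      var01      1.00    .18
+CORR      var02       .18   1.00
+@end example
+
+The @code{MEAN} row is a vector, so its @code{VARNAME_} is blank,
+whereas each @code{CORR} row names the continuous variable it
+describes.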
+
+@subheading Row Types
+@anchor{Matrix File Row Types}
+
+Matrix files support a fixed set of types of matrix and vector data.
+The @code{ROWTYPE_} variable in each case of a matrix file indicates
+its row type.
+
+The supported matrix row types are listed below. Each type is listed
+with the keyword that identifies it in @code{ROWTYPE_}. All supported
+types of matrices are square, meaning that each matrix must include
+one row per continuous variable, with the @code{VARNAME_} variable
+indicating each continuous variable in turn in the same order as the
+dictionary.
+
+@table @code
+@item CORR
+Correlation coefficients.
+
+@item COV
+Covariance coefficients.
+
+@item MAT
+General-purpose matrix.
+
+@item N_MATRIX
+Counts.
+
+@item PROX
+Proximities matrix.
+@end table
+
+The supported vector row types are listed below, along with their
+associated keyword. Vector row types only require a single row, whose
+@code{VARNAME_} is blank:
+
+@table @code
+@item COUNT
+Unweighted counts.
+
+@item DFE
+Degrees of freedom.
+
+@item MEAN
+Means.
+
+@item MSE
+Mean squared errors.
+
+@item N
+Counts.
+
+@item STDDEV
+Standard deviations.
+@end table
+
+Only the row types listed above may appear in matrix files. The
+@cmd{MATRIX DATA} command, however, accepts the additional row types
+listed below, which it changes into matrix file row types as part of
+its conversion process:
+
+@table @code
+@item N_VECTOR
+Synonym for @code{N}.
+
+@item SD
+Synonym for @code{STDDEV}.
+
+@item N_SCALAR
+Accepts a single number from the @cmd{MATRIX DATA} input and writes
+it as an @code{N} row with the number replicated across all the
+continuous variables.
+@end table
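+
+As an illustration of these extra row types, the following sketch
+(with made-up values) passes @code{SD} and @code{N_SCALAR} rows to
+@cmd{MATRIX DATA}. Going by the descriptions above, the resulting
+matrix file should contain a @code{STDDEV} row in place of @code{SD},
+and an @code{N} row that holds 92 for each of the three continuous
+variables:
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var03.
+BEGIN DATA.
+SD 5.7 1.5 23.5
+N_SCALAR 92
+CORR 1.00
+CORR .18 1.00
+CORR -.22 -.17 1.00
+END DATA.
+@end example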
+
+@node MATRIX DATA
+@section MATRIX DATA
+@vindex MATRIX DATA
+
+@display
+MATRIX DATA
+ VARIABLES=@var{variables}
+ [/FILE=@{'@var{file_name}' | INLINE@}]
+ [/FORMAT=[@{LIST | FREE@}]
+ [@{UPPER | LOWER | FULL@}]
+ [@{DIAGONAL | NODIAGONAL@}]]
+ [/SPLIT=@var{split_vars}]
+ [/FACTORS=@var{factor_vars}]
+ [/N=@var{n}]
+
+The following subcommands are only needed when ROWTYPE_ is not
+specified on the VARIABLES subcommand:
+ [/CONTENTS=@{CORR,COUNT,COV,DFE,MAT,MEAN,MSE,
+ N_MATRIX,N|N_VECTOR,N_SCALAR,PROX,SD|STDDEV@}]
+ [/CELLS=@var{n_cells}]
+@end display
+
+The @cmd{MATRIX DATA} command converts matrices and vectors from text
+format into the matrix file format (@pxref{Matrix Files}) for use by
+procedures that read matrices. It reads a text file or inline data
+and outputs to the active file, replacing any data already in the
+active dataset. The matrix file may then be used by other commands
+directly from the active file, or it may be written to a @file{.sav}
+file using the @cmd{SAVE} command.
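+
+For instance, a minimal sketch that converts inline data and then
+saves the resulting matrix file might look like the following. The
+data values and the output file name @file{corr.sav} are only
+illustrations:
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var03.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7
+SD 5.7 1.5 23.5
+N 92 92 92
+CORR 1.00
+CORR .18 1.00
+CORR -.22 -.17 1.00
+END DATA.
+
+SAVE OUTFILE='corr.sav'.
+@end example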
+
+The text data read by @cmd{MATRIX DATA} can be delimited by spaces or
+commas. A plus or minus sign, except immediately following a @samp{d}
+or @samp{e}, also begins a new value. Optionally, values may be
+enclosed in single or double quotes.
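+
+As a sketch of these delimiter rules, the following input (with
+made-up values) should be read the same way as if every value were
+separated by a single space. The first data line uses commas, the
+second quotes one value, and in the last line @samp{-.22-.17} is read
+as two values because the second minus sign begins a new value:
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var03.
+BEGIN DATA.
+MEAN,24.3,5.4,69.7
+SD '5.7' 1.5 23.5
+CORR 1.00
+CORR .18 1.00
+CORR -.22-.17 1.00
+END DATA.
+@end example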
+
+@cmd{MATRIX DATA} can read the types of matrix and vector data
+supported in matrix files (@pxref{Matrix File Row Types}).
+
+The @subcmd{FILE} subcommand specifies the source of the command's
+input. To read input from a text file, specify its name in quotes.
+To supply input inline, omit @subcmd{FILE} or specify @code{INLINE}.
+Inline data must directly follow @cmd{MATRIX DATA}, inside @cmd{BEGIN
+DATA} (@pxref{BEGIN DATA}).
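+
+As a sketch, assuming that the matrix text is stored in a file named
+@file{corr.txt} (a hypothetical name), the command reads that file
+and needs no @cmd{BEGIN DATA} block:
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var08
+ /FILE='corr.txt'.
+@end example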
+
+@subcmd{VARIABLES} is the only required subcommand. It names the
+variables present in each input record in the order that they appear.
+(@cmd{MATRIX DATA} reorders the variables in the matrix file it
+produces, if needed to fit the matrix file format.) The variable list
+must include split variables and factor variables, if they are present
+in the data, in addition to the continuous variables that form matrix
+rows and columns. It may also include a special variable named
+@code{ROWTYPE_}.
+
+Matrix data may include split variables or factor variables or both.
+List split variables, if any, on the @subcmd{SPLIT} subcommand and
+factor variables, if any, on the @subcmd{FACTORS} subcommand. Split
+and factor variables must be numeric. Split and factor variables must
+also be listed on @subcmd{VARIABLES}, with one exception: if
+@subcmd{VARIABLES} does not include @code{ROWTYPE_}, then
+@subcmd{SPLIT} may name a single variable that is not in
+@subcmd{VARIABLES} (@pxref{MATRIX DATA Example 8}).
+
+The @subcmd{FORMAT} subcommand accepts settings to describe the format
+of the input data:
+
+@table @asis
+@item @code{LIST} (default)
+@itemx @code{FREE}
+LIST requires each row to begin at the start of a new input line.
+FREE allows rows to begin in the middle of a line. Either setting
+allows a single row to continue across multiple input lines.
+
+@item @code{LOWER} (default)
+@itemx @code{UPPER}
+@itemx @code{FULL}
+With LOWER, only the lower triangle is read from the input data and
+the upper triangle is mirrored across the main diagonal. UPPER
+behaves similarly for the upper triangle. FULL reads the entire
+matrix.
+
+@item @code{DIAGONAL} (default)
+@itemx @code{NODIAGONAL}
+With DIAGONAL, the main diagonal is read from the input data. With
+NODIAGONAL, which is incompatible with FULL, the main diagonal is not
+read from the input data but instead set to 1 for correlation matrices
+and system-missing for others.
+@end table
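+
+As a sketch of the difference between LIST and FREE, the following
+input (with made-up values) starts several rows in the middle of a
+line. Going by the description above, it should be accepted with
+@code{FORMAT=FREE} but not with the default @code{FORMAT=LIST}, which
+requires every row to start on a new line:
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var03
+ /FORMAT=FREE.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7 SD 5.7 1.5 23.5
+N 92 92 92 CORR 1.00 CORR .18 1.00
+CORR -.22 -.17 1.00
+END DATA.
+@end example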
+
+The @subcmd{N} subcommand is a way to specify the size of the
+population. It is equivalent to specifying an @code{N} vector with
+the specified value for each split file.
+
+@cmd{MATRIX DATA} supports two different ways to indicate the kinds of
+matrices and vectors present in the data, depending on whether a
+variable with the special name @code{ROWTYPE_} is present in
+@subcmd{VARIABLES}. The following subsections explain @cmd{MATRIX DATA}
+syntax and behavior in each case.
+
+@node MATRIX DATA with ROWTYPE_
+@subsection With @code{ROWTYPE_}
+
+If @subcmd{VARIABLES} includes @code{ROWTYPE_}, each case's
+@code{ROWTYPE_} indicates the type of data contained in the row.
+@xref{Matrix File Row Types}, for a list of supported row types.
+
+@subsubheading Example 1: Defaults with @code{ROWTYPE_}
+@anchor{MATRIX DATA Example 1}
+
+This example shows a simple use of @cmd{MATRIX DATA} with
+@code{ROWTYPE_} plus 8 variables named @code{var01} through
+@code{var08}.
+
+Because @code{ROWTYPE_} is the first variable in @subcmd{VARIABLES},
+it appears first on each line. The first three lines in the example
+data have @code{ROWTYPE_} values of @samp{MEAN}, @samp{SD}, and
+@samp{N}. These indicate that these lines contain vectors of means,
+standard deviations, and counts, respectively, for @code{var01}
+through @code{var08} in order.
+
+The remaining 8 lines have a @code{ROWTYPE_} of @samp{CORR}, which indicates
+that the values are correlation coefficients. Each of the lines
+corresponds to a row in the correlation matrix: the first line is for
+@code{var01}, the next line for @code{var02}, and so on. The input
+only contains values for the lower triangle, including the diagonal,
+since @code{FORMAT=LOWER DIAGONAL} is the default.
+
+With @code{ROWTYPE_}, the @subcmd{CONTENTS} subcommand is optional and
+the @subcmd{CELLS} subcommand may not be used.
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var08.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+N 92 92 92 92 92 92 92 92
+CORR 1.00
+CORR .18 1.00
+CORR -.22 -.17 1.00
+CORR .36 .31 -.14 1.00
+CORR .27 .16 -.12 .22 1.00
+CORR .33 .15 -.17 .24 .21 1.00
+CORR .50 .29 -.20 .32 .12 .38 1.00
+CORR .17 .29 -.05 .20 .27 .20 .04 1.00
+END DATA.
+@end example
+
+@subsubheading Example 2: @code{FORMAT=UPPER NODIAGONAL}
+
+This syntax produces the same matrix file as Example 1, but it uses
+@code{FORMAT=UPPER NODIAGONAL} to specify the upper triangle and omit
+the diagonal. Because the matrix's @code{ROWTYPE_} is @code{CORR},
+@pspp{} automatically fills in the diagonal with 1.
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var08
+ /FORMAT=UPPER NODIAGONAL.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+N 92 92 92 92 92 92 92 92
+CORR .18 -.22 .36 .27 .33 .50 .17
+CORR -.17 .31 .16 .15 .29 .29
+CORR -.14 -.12 -.17 -.20 -.05
+CORR .22 .24 .32 .20
+CORR .21 .12 .27
+CORR .38 .20
+CORR .04
+END DATA.
+@end example
+
+@subsubheading Example 3: @subcmd{N} subcommand
+
+This syntax uses the @subcmd{N} subcommand in place of an @code{N}
+vector. It produces the same matrix file as Examples 1 and 2.
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ var01 TO var08
+ /FORMAT=UPPER NODIAGONAL
+ /N=92.
+BEGIN DATA.
+MEAN 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+SD 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+CORR .18 -.22 .36 .27 .33 .50 .17
+CORR -.17 .31 .16 .15 .29 .29
+CORR -.14 -.12 -.17 -.20 -.05
+CORR .22 .24 .32 .20
+CORR .21 .12 .27
+CORR .38 .20
+CORR .04
+END DATA.
+@end example
+
+@subsubheading Example 4: Split variables
+@anchor{MATRIX DATA Example 4}
+
+This syntax defines two matrices, using the variable @samp{s1} to
+distinguish between them. Notice how the order of variables in the
+input matches their order on @subcmd{VARIABLES}. This example also
+uses @code{FORMAT=FULL}.
+
+@example
+MATRIX DATA
+ VARIABLES=s1 ROWTYPE_ var01 TO var04
+ /SPLIT=s1
+ /FORMAT=FULL.
+BEGIN DATA.
+0 MEAN 34 35 36 37
+0 SD 22 11 55 66
+0 N 99 98 99 92
+0 CORR 1 .9 .8 .7
+0 CORR .9 1 .6 .5
+0 CORR .8 .6 1 .4
+0 CORR .7 .5 .4 1
+1 MEAN 44 45 34 39
+1 SD 23 15 51 46
+1 N 98 34 87 23
+1 CORR 1 .2 .3 .4
+1 CORR .2 1 .5 .6
+1 CORR .3 .5 1 .7
+1 CORR .4 .6 .7 1
+END DATA.
+@end example
+
+@subsubheading Example 5: Factor variables
+@anchor{MATRIX DATA Example 5}
+
+This syntax defines a matrix file that includes a factor variable
+@samp{f1}. The data includes mean, standard deviation, and count
+vectors for two values of the factor variable, plus a correlation
+matrix for pooled data.
+
+@example
+MATRIX DATA
+ VARIABLES=ROWTYPE_ f1 var01 TO var04
+ /FACTORS=f1.
+BEGIN DATA.
+MEAN 0 34 35 36 37
+SD 0 22 11 55 66
+N 0 99 98 99 92
+MEAN 1 44 45 34 39
+SD 1 23 15 51 46
+N 1 98 34 87 23
+CORR . 1
+CORR . .9 1
+CORR . .8 .6 1
+CORR . .7 .5 .4 1
+END DATA.
+@end example
+
+@node MATRIX DATA without ROWTYPE_
+@subsection Without @code{ROWTYPE_}
+
+If @subcmd{VARIABLES} does not contain @code{ROWTYPE_}, the
+@subcmd{CONTENTS} subcommand defines the row types that appear in the
+file and their order. If @subcmd{CONTENTS} is omitted,
+@code{CONTENTS=CORR} is assumed.
+
+Factor variables without @code{ROWTYPE_} introduce special
+requirements, illustrated below in Examples 9 and 10.
+
+@subsubheading Example 6: Defaults without @code{ROWTYPE_}
+
+This example shows a simple use of @cmd{MATRIX DATA} with 8 variables
+named @code{var01} through @code{var08}, without @code{ROWTYPE_}.
+This yields the same matrix file as Example 1 (@pxref{MATRIX DATA
+Example 1}).
+
+@example
+MATRIX DATA
+ VARIABLES=var01 TO var08
+ /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+ 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+ 92 92 92 92 92 92 92 92
+1.00
+ .18 1.00
+-.22 -.17 1.00
+ .36 .31 -.14 1.00
+ .27 .16 -.12 .22 1.00
+ .33 .15 -.17 .24 .21 1.00
+ .50 .29 -.20 .32 .12 .38 1.00
+ .17 .29 -.05 .20 .27 .20 .04 1.00
+END DATA.
+@end example
+
+@subsubheading Example 7: Split variables with explicit values
+
+This syntax defines two matrices, using the variable @code{s1} to
+distinguish between them. Each line of data begins with @code{s1}.
+This yields the same matrix file as Example 4 (@pxref{MATRIX DATA
+Example 4}).
+
+@example
+MATRIX DATA
+ VARIABLES=s1 var01 TO var04
+ /SPLIT=s1
+ /FORMAT=FULL
+ /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+0 1 .9 .8 .7
+0 .9 1 .6 .5
+0 .8 .6 1 .4
+0 .7 .5 .4 1
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+1 1 .2 .3 .4
+1 .2 1 .5 .6
+1 .3 .5 1 .7
+1 .4 .6 .7 1
+END DATA.
+@end example
+
+@subsubheading Example 8: Split variable with sequential values
+@anchor{MATRIX DATA Example 8}
+
+Like the previous example, this syntax defines two matrices with
+split variable @code{s1}. In this case, though, @code{s1} is not
+listed in @subcmd{VARIABLES}, which means that its value does not
+appear in the data. Instead, @cmd{MATRIX DATA} reads matrix data
+until the input is exhausted, supplying 1 for the first split, 2 for
+the second, and so on.
+
+@example
+MATRIX DATA
+ VARIABLES=var01 TO var04
+ /SPLIT=s1
+ /FORMAT=FULL
+ /CONTENTS=MEAN SD N CORR.
+BEGIN DATA.
+34 35 36 37
+22 11 55 66
+99 98 99 92
+ 1 .9 .8 .7
+.9 1 .6 .5
+.8 .6 1 .4
+.7 .5 .4 1
+44 45 34 39
+23 15 51 46
+98 34 87 23
+ 1 .2 .3 .4
+.2 1 .5 .6
+.3 .5 1 .7
+.4 .6 .7 1
+END DATA.
+@end example
+
+@subsubsection Factor variables without @code{ROWTYPE_}
+
+Without @code{ROWTYPE_}, factor variables introduce two new wrinkles
+to @cmd{MATRIX DATA} syntax. First, the @subcmd{CELLS} subcommand
+must declare the number of combinations of factor variables present in
+the data. If there is, for example, one factor variable for which the
+data contains three values, one would write @code{CELLS=3}; if there
+are two (or more) factor variables for which the data contains five
+combinations, one would use @code{CELLS=5}; and so on.
+
+Second, the @subcmd{CONTENTS} subcommand must distinguish within-cell
+data from pooled data by enclosing within-cell row types in
+parentheses. When the different within-cell row types for a single
+factor value appear on consecutive lines, enclose those row types in a
+single set of parentheses; when the different factor values for a
+given within-cell row type appear on consecutive lines, enclose each
+row type in its own parentheses.
+
+Without @code{ROWTYPE_}, input lines for pooled data do not include
+factor values, not even as missing values, but input lines for
+within-cell data do.
+
+The following examples aim to clarify this syntax.
+
+@subsubheading Example 9: Factor variables, grouping within-cell records by factor
+
+This syntax defines the same matrix file as Example 5 (@pxref{MATRIX
+DATA Example 5}), without using @code{ROWTYPE_}. It declares
+@code{CELLS=2} because the data contains two values (0 and 1) for
+factor variable @code{f1}. Within-cell vector row types @code{MEAN},
+@code{SD}, and @code{N} are in a single set of parentheses on
+@subcmd{CONTENTS} because they are grouped together on consecutive
+lines for a single factor value. The data lines with the pooled
+correlation matrix do not have any factor values.
+
+@example
+MATRIX DATA
+ VARIABLES=f1 var01 TO var04
+ /FACTORS=f1
+ /CELLS=2
+ /CONTENTS=(MEAN SD N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+0 22 11 55 66
+0 99 98 99 92
+1 44 45 34 39
+1 23 15 51 46
+1 98 34 87 23
+ 1
+ .9 1
+ .8 .6 1
+ .7 .5 .4 1
+END DATA.
+@end example
+
+@subsubheading Example 10: Factor variables, grouping within-cell records by row type
+
+This syntax defines the same matrix file as the previous example. The
+only difference is that the within-cell vector rows are grouped
+differently: two rows of means (one for each factor value), followed by two
+rows of standard deviations, followed by two rows of counts.
+
+@example
+MATRIX DATA
+ VARIABLES=f1 var01 TO var04
+ /FACTORS=f1
+ /CELLS=2
+ /CONTENTS=(MEAN) (SD) (N) CORR.
+BEGIN DATA.
+0 34 35 36 37
+1 44 45 34 39
+0 22 11 55 66
+1 23 15 51 46
+0 99 98 99 92
+1 98 34 87 23
+ 1
+ .9 1
+ .8 .6 1
+ .7 .5 .4 1
+END DATA.
+@end example