   1 @c PSPP - a program for statistical analysis.
   2 @c Copyright (C) 2017, 2020, 2021 Free Software Foundation, Inc.
   3 @c Permission is granted to copy, distribute and/or modify this document
   4 @c under the terms of the GNU Free Documentation License, Version 1.3
   5 @c or any later version published by the Free Software Foundation;
   6 @c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
   7 @c A copy of the license is included in the section entitled "GNU
   8 @c Free Documentation License".
   9 @c
  10 @node Matrices
  11 @chapter Matrices
  12
  13 Some @pspp{} procedures work with matrices by producing numeric
  14 matrices that report results of data analysis, or by consuming
  15 matrices as a basis for further analysis.  This chapter documents the
  16 format of data files that store these matrices and commands for
  17 working with them, as well as @pspp{}'s general-purpose facility for
  18 matrix operations.
  19
  20 @node Matrix Files
  21 @section Matrix Files
  22 @vindex Matrix file
  23
  24 A matrix file is an SPSS system file that conforms to the dictionary
  25 and case structure described in this section.  Procedures that read
  26 matrices from files expect them to be in the matrix file format.
  27 Procedures that write matrices also use this format.
  28
  29 Text files that contain matrices can be converted to matrix file
  30 format.  @xref{MATRIX DATA}, for a command to read a text file as a
  31 matrix file.
  32
  33 A matrix file's dictionary must have the following variables in the
  34 specified order:
  35
  36 @enumerate
  37 @item
  38 Zero or more numeric split variables.  These are included by
  39 procedures when @cmd{SPLIT FILE} is active.  @cmd{MATRIX DATA} assigns
  40 split variables format F4.0.
  41
  42 @item
  43 @code{ROWTYPE_}, a string variable with width 8.  This variable
  44 indicates the kind of matrix or vector that a given case represents.
  45 The supported row types are listed below.
  46
  47 @item
  48 Zero or more numeric factor variables.  These are included by
  49 procedures that divide data into cells.  For within-cell data, factor
  50 variables are filled with non-missing values; for pooled data, they
  51 are missing.  @cmd{MATRIX DATA} assigns factor variables format F4.0.
  52
  53 @item
  54 @code{VARNAME_}, a string variable.  Matrix data includes one row per
  55 continuous variable (see below), naming each continuous variable in
  56 order.  This column is blank for vector data.  @cmd{MATRIX DATA} makes
  57 @code{VARNAME_} wide enough for the name of any of the continuous
  58 variables, but at least 8 bytes.
  59
  60 @item
  61 One or more numeric continuous variables.  These are the variables
  62 whose data was analyzed to produce the matrices.  @cmd{MATRIX DATA}
  63 assigns continuous variables format F10.4.
  64 @end enumerate
  65
  66 Case weights are ignored in matrix files.
  67
  68 @subheading Row Types
  69 @anchor{Matrix File Row Types}
  70
  71 Matrix files support a fixed set of types of matrix and vector data.
  72 The @code{ROWTYPE_} variable in each case of a matrix file indicates
  73 its row type.
  74
  75 The supported matrix row types are listed below.  Each type is listed
  76 with the keyword that identifies it in @code{ROWTYPE_}.  All supported
  77 types of matrices are square, meaning that each matrix must include
  78 one row per continuous variable, with the @code{VARNAME_} variable
  79 indicating each continuous variable in turn in the same order as the
  80 dictionary.
  81
  82 @table @code
  83 @item CORR
  84 Correlation coefficients.
  85
  86 @item COV
  87 Covariance coefficients.
  88
  89 @item MAT
  90 General-purpose matrix.
  91
  92 @item N_MATRIX
  93 Counts.
  94
  95 @item PROX
  96 Proximities matrix.
  97 @end table
  98
  99 The supported vector row types are listed below, along with their
 100 associated keyword.  Vector row types only require a single row, whose
 101 @code{VARNAME_} is blank:
 102
 103 @table @code
 104 @item COUNT
 105 Unweighted counts.
 106
 107 @item DFE
 108 Degrees of freedom.
 109
 110 @item MEAN
 111 Means.
 112
 113 @item MSE
 114 Mean squared errors.
 115
 116 @item N
 117 Counts.
 118
 119 @item STDDEV
 120 Standard deviations.
 121 @end table
 122
 123 Only the row types listed above may appear in matrix files.  The
 124 @cmd{MATRIX DATA} command, however, accepts the additional row types
 125 listed below, which it changes into matrix file row types as part of
 126 its conversion process:
 127
 128 @table @code
 129 @item N_VECTOR
 130 Synonym for @code{N}.
 131
 132 @item SD
 133 Synonym for @code{STDDEV}.
 134
 135 @item N_SCALAR
 136 Accepts a single number from the @cmd{MATRIX DATA} input and writes
 137 it as an @code{N} row with the number replicated across all the
 138 continuous variables.
 139 @end table
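
As a rough sketch of how @code{N_SCALAR} might be used, the following
input gives a single count instead of a full @code{N} vector.  The
variable names and all of the values are invented for illustration,
and the sketch assumes that @code{N_SCALAR} is supplied through
@code{ROWTYPE_} (@pxref{MATRIX DATA}).  Going by the description
above, the resulting matrix file should contain an @code{N} row with
92 under each continuous variable.

@example
MATRIX DATA
    VARIABLES=ROWTYPE_ var01 TO var03.
BEGIN DATA.
N_SCALAR 92
CORR  1.00
CORR   .18  1.00
CORR  -.22  -.17  1.00
END DATA.
LIST.
@end example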
 140
 141 @node MATRIX DATA
 142 @section MATRIX DATA
 143 @vindex MATRIX DATA
 144
 145 @display
 146 MATRIX DATA
 147         VARIABLES=@var{variables}
 148         [/FILE=@{'@var{file_name}' | INLINE@}]
 149         [/FORMAT=[@{LIST | FREE@}]
 150                  [@{UPPER | LOWER | FULL@}]
 151                  [@{DIAGONAL | NODIAGONAL@}]]
 152         [/SPLIT=@var{split_vars}]
 153         [/FACTORS=@var{factor_vars}]
 154         [/N=@var{n}]
 155
 156 The following subcommands are only needed when ROWTYPE_ is not
 157 specified on the VARIABLES subcommand:
 158         [/CONTENTS=@{CORR,COUNT,COV,DFE,MAT,MEAN,MSE,
 159                     N_MATRIX,N|N_VECTOR,N_SCALAR,PROX,SD|STDDEV@}]
 160         [/CELLS=@var{n_cells}]
 161 @end display
 162
 163 The @cmd{MATRIX DATA} command converts matrices and vectors from text
 164 format into the matrix file format (@pxref{Matrix Files}) for use by
 165 procedures that read matrices.  It reads a text file or inline data
 166 and outputs to the active file, replacing any data already in the
 167 active dataset.  The matrix file may then be used by other commands
 168 directly from the active file, or it may be written to a @file{.sav}
 169 file using the @cmd{SAVE} command.
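
The workflow just described can be sketched as follows.  The variable
names, the data values, and the file name @file{corr.sav} are
placeholders chosen for this illustration only.

@example
MATRIX DATA
    VARIABLES=ROWTYPE_ var01 TO var03.
BEGIN DATA.
MEAN  24.3   5.4  69.7
SD     5.7   1.5  23.5
N       92    92    92
CORR  1.00
CORR   .18  1.00
CORR  -.22  -.17  1.00
END DATA.

* Show the cases of the matrix file now in the active dataset.
LIST.

* Keep a copy as a system file for later use.
SAVE OUTFILE='corr.sav'.
@end example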
 170
 171 The text data read by @cmd{MATRIX DATA} can be delimited by spaces or
 172 commas.  A plus or minus sign, except immediately following a @samp{d}
 173 or @samp{e}, also begins a new value.  Optionally, values may be
 174 enclosed in single or double quotes.
 175
 176 @cmd{MATRIX DATA} can read the types of matrix and vector data
 177 supported in matrix files (@pxref{Matrix File Row Types}).
 178
 179 The @subcmd{FILE} subcommand specifies the source of the command's
 180 input.  To read input from a text file, specify its name in quotes.
 181 To supply input inline, omit @subcmd{FILE} or specify @code{INLINE}.
 182 Inline data must directly follow @cmd{MATRIX DATA}, inside @cmd{BEGIN
 183 DATA} (@pxref{BEGIN DATA}).
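
For instance, reading from an external file might look like the
sketch below.  The file name @file{corr.txt} is hypothetical; it
stands for a text file whose lines have the same layout as inline
data would.

@example
* corr.txt is a hypothetical text file of rows tagged with ROWTYPE_.
MATRIX DATA
    VARIABLES=ROWTYPE_ var01 TO var03
    /FILE='corr.txt'.
LIST.
@end example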
 184
 185 @subcmd{VARIABLES} is the only required subcommand.  It names the
 186 variables present in each input record in the order that they appear.
 187 (@cmd{MATRIX DATA} reorders the variables in the matrix file it
 188 produces, if needed to fit the matrix file format.)  The variable list
 189 must include split variables and factor variables, if they are present
 190 in the data, in addition to the continuous variables that form matrix
 191 rows and columns.  It may also include a special variable named
 192 @code{ROWTYPE_}.
 193
 194 Matrix data may include split variables or factor variables or both.
 195 List split variables, if any, on the @subcmd{SPLIT} subcommand and
 196 factor variables, if any, on the @subcmd{FACTORS} subcommand.  Split
 197 and factor variables must be numeric.  Split and factor variables must
 198 also be listed on @subcmd{VARIABLES}, with one exception: if
 199 @subcmd{VARIABLES} does not include @code{ROWTYPE_}, then
 200 @subcmd{SPLIT} may name a single variable that is not in
 201 @subcmd{VARIABLES} (@pxref{MATRIX DATA Example 8}).
 202
 203 The @subcmd{FORMAT} subcommand accepts settings to describe the format
 204 of the input data:
 205
 206 @table @asis
 207 @item @code{LIST} (default)
 208 @itemx @code{FREE}
 209 LIST requires each row to begin at the start of a new input line.
 210 FREE allows rows to begin in the middle of a line.  Either setting
 211 allows a single row to continue across multiple input lines.
 212
 213 @item @code{LOWER} (default)
 214 @itemx @code{UPPER}
 215 @itemx @code{FULL}
 216 With LOWER, only the lower triangle is read from the input data and
 217 the upper triangle is mirrored across the main diagonal.  UPPER
 218 behaves similarly for the upper triangle.  FULL reads the entire
 219 matrix.
 220
 221 @item @code{DIAGONAL} (default)
 222 @itemx @code{NODIAGONAL}
 223 With DIAGONAL, the main diagonal is read from the input data.  With
 224 NODIAGONAL, which is incompatible with FULL, the main diagonal is not
 225 read from the input data but instead set to 1 for correlation matrices
 226 and system-missing for others.
 227 @end table
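
As an illustration of @code{FREE}, the sketch below packs several rows
onto each input line.  It is an assumption-laden example rather than
tested input: the variable names and values are invented, and it
relies on the description above that @code{FREE} lets a row begin in
the middle of a line.  The same data under the default @code{LIST}
setting would need each row to start on its own line.

@example
MATRIX DATA
    VARIABLES=ROWTYPE_ var01 TO var03
    /FORMAT=FREE LOWER DIAGONAL.
BEGIN DATA.
MEAN 24.3 5.4 69.7 SD 5.7 1.5 23.5
CORR 1.00 CORR .18 1.00
CORR -.22 -.17 1.00
END DATA.
@end example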
 228
 229 The @subcmd{N} subcommand is a way to specify the size of the
 230 population.  It is equivalent to specifying an @code{N} vector with
 231 the specified value for each split file.
 232
 233 @cmd{MATRIX DATA} supports two different ways to indicate the kinds of
 234 matrices and vectors present in the data, depending on whether a
 235 variable with the special name @code{ROWTYPE_} is present in
 236 @subcmd{VARIABLES}.  The following subsections explain @cmd{MATRIX DATA}
 237 syntax and behavior in each case.
 238
 239 @node MATRIX DATA with ROWTYPE_
 240 @subsection With @code{ROWTYPE_}
 241
 242 If @subcmd{VARIABLES} includes @code{ROWTYPE_}, each case's
 243 @code{ROWTYPE_} indicates the type of data contained in the row.
 244 @xref{Matrix File Row Types}, for a list of supported row types.
 245
 246 @subsubheading Example 1: Defaults with @code{ROWTYPE_}
 247 @anchor{MATRIX DATA Example 1}
 248
 249 This example shows a simple use of @cmd{MATRIX DATA} with
 250 @code{ROWTYPE_} plus 8 variables named @code{var01} through
 251 @code{var08}.
 252
 253 Because @code{ROWTYPE_} is the first variable in @subcmd{VARIABLES},
 254 it appears first on each line. The first three lines in the example
 255 data have @code{ROWTYPE_} values of @samp{MEAN}, @samp{SD}, and
 256 @samp{N}.  These indicate that these lines contain vectors of means,
 257 standard deviations, and counts, respectively, for @code{var01}
 258 through @code{var08} in order.
 259
 260 The remaining 8 lines have a @code{ROWTYPE_} of @samp{CORR}, which indicates
 261 that the values are correlation coefficients.  Each of the lines
 262 corresponds to a row in the correlation matrix: the first line is for
 263 @code{var01}, the next line for @code{var02}, and so on.  The input
 264 only contains values for the lower triangle, including the diagonal,
 265 since @code{FORMAT=LOWER DIAGONAL} is the default.
 266
 267 With @code{ROWTYPE_}, the @subcmd{CONTENTS} subcommand is optional and
 268 the @subcmd{CELLS} subcommand may not be used.
 269
 270 @example
 271 MATRIX DATA
 272     VARIABLES=ROWTYPE_ var01 TO var08.
 273 BEGIN DATA.
 274 MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
 275 SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
 276 N       92    92    92    92    92    92    92    92
 277 CORR  1.00
 278 CORR   .18  1.00
 279 CORR  -.22  -.17  1.00
 280 CORR   .36   .31  -.14  1.00
 281 CORR   .27   .16  -.12   .22  1.00
 282 CORR   .33   .15  -.17   .24   .21  1.00
 283 CORR   .50   .29  -.20   .32   .12   .38  1.00
 284 CORR   .17   .29  -.05   .20   .27   .20   .04  1.00
 285 END DATA.
 286 @end example
 287
 288 @subsubheading Example 2: @code{FORMAT=UPPER NODIAGONAL}
 289
 290 This syntax produces the same matrix file as example 1, but it uses
 291 @code{FORMAT=UPPER NODIAGONAL} to specify the upper triangle and omit
 292 the diagonal.  Because the matrix's @code{ROWTYPE_} is @code{CORR},
 293 @pspp{} automatically fills in the diagonal with 1.
 294
 295 @example
 296 MATRIX DATA
 297     VARIABLES=ROWTYPE_ var01 TO var08
 298     /FORMAT=UPPER NODIAGONAL.
 299 BEGIN DATA.
 300 MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
 301 SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
 302 N       92    92    92    92    92    92    92    92
 303 CORR         .18  -.22   .36   .27   .33   .50   .17
 304 CORR              -.17   .31   .16   .15   .29   .29
 305 CORR                    -.14  -.12  -.17  -.20  -.05
 306 CORR                           .22   .24   .32   .20
 307 CORR                                 .21   .12   .27
 308 CORR                                       .38   .20
 309 CORR                                             .04
 310 END DATA.
 311 @end example
 312
 313 @subsubheading Example 3: @subcmd{N} subcommand
 314
 315 This syntax uses the @subcmd{N} subcommand in place of an @code{N}
 316 vector.  It produces the same matrix file as examples 1 and 2.
 317
 318 @example
 319 MATRIX DATA
 320     VARIABLES=ROWTYPE_ var01 TO var08
 321     /FORMAT=UPPER NODIAGONAL
 322     /N 92.
 323 BEGIN DATA.
 324 MEAN  24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
 325 SD     5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
 326 CORR         .18  -.22   .36   .27   .33   .50   .17
 327 CORR              -.17   .31   .16   .15   .29   .29
 328 CORR                    -.14  -.12  -.17  -.20  -.05
 329 CORR                           .22   .24   .32   .20
 330 CORR                                 .21   .12   .27
 331 CORR                                       .38   .20
 332 CORR                                             .04
 333 END DATA.
 334 @end example
 335
 336 @subsubheading Example 4: Split variables
 337 @anchor{MATRIX DATA Example 4}
 338
 339 This syntax defines two matrices, using the variable @samp{s1} to
 340 distinguish between them.  Notice how the order of variables in the
 341 input matches their order on @subcmd{VARIABLES}.  This example also
 342 uses @code{FORMAT=FULL}.
 343
 344 @example
 345 MATRIX DATA
 346     VARIABLES=s1 ROWTYPE_  var01 TO var04
 347     /SPLIT=s1
 348     /FORMAT=FULL.
 349 BEGIN DATA.
 350 0 MEAN 34 35 36 37
 351 0 SD   22 11 55 66
 352 0 N    99 98 99 92
 353 0 CORR  1 .9 .8 .7
 354 0 CORR .9  1 .6 .5
 355 0 CORR .8 .6  1 .4
 356 0 CORR .7 .5 .4  1
 357 1 MEAN 44 45 34 39
 358 1 SD   23 15 51 46
 359 1 N    98 34 87 23
 360 1 CORR  1 .2 .3 .4
 361 1 CORR .2  1 .5 .6
 362 1 CORR .3 .5  1 .7
 363 1 CORR .4 .6 .7  1
 364 END DATA.
 365 @end example
 366
 367 @subsubheading Example 5: Factor variables
 368 @anchor{MATRIX DATA Example 5}
 369
 370 This syntax defines a matrix file that includes a factor variable
 371 @samp{f1}.  The data includes mean, standard deviation, and count
 372 vectors for two values of the factor variable, plus a correlation
 373 matrix for pooled data.
 374
 375 @example
 376 MATRIX DATA
 377     VARIABLES=ROWTYPE_ f1 var01 TO var04
 378     /FACTORS=f1.
 379 BEGIN DATA.
 380 MEAN 0 34 35 36 37
 381 SD   0 22 11 55 66
 382 N    0 99 98 99 92
 383 MEAN 1 44 45 34 39
 384 SD   1 23 15 51 46
 385 N    1 98 34 87 23
 386 CORR .  1
 387 CORR . .9  1
 388 CORR . .8 .6  1
 389 CORR . .7 .5 .4  1
 390 END DATA.
 391 @end example
 392
 393 @node MATRIX DATA without ROWTYPE_
 394 @subsection Without @code{ROWTYPE_}
 395
 396 If @subcmd{VARIABLES} does not contain @code{ROWTYPE_}, the
 397 @subcmd{CONTENTS} subcommand defines the row types that appear in the
 398 file and their order.  If @subcmd{CONTENTS} is omitted,
 399 @code{CONTENTS=CORR} is assumed.
 400
 401 Factor variables without @code{ROWTYPE_} introduce special
 402 requirements, illustrated below in Examples 9 and 10.
 403
 404 @subsubheading Example 6: Defaults without @code{ROWTYPE_}
 405
 406 This example shows a simple use of @cmd{MATRIX DATA} with 8 variables
 407 named @code{var01} through @code{var08}, without @code{ROWTYPE_}.
 408 This yields the same matrix file as Example 1 (@pxref{MATRIX DATA
 409 Example 1}).
 410
 411 @example
 412 MATRIX DATA
 413     VARIABLES=var01 TO var08
 414     /CONTENTS=MEAN SD N CORR.
 415 BEGIN DATA.
 416 24.3   5.4  69.7  20.1  13.4   2.7  27.9   3.7
 417  5.7   1.5  23.5   5.8   2.8   4.5   5.4   1.5
 418   92    92    92    92    92    92    92    92
 419 1.00
 420  .18  1.00
 421 -.22  -.17  1.00
 422  .36   .31  -.14  1.00
 423  .27   .16  -.12   .22  1.00
 424  .33   .15  -.17   .24   .21  1.00
 425  .50   .29  -.20   .32   .12   .38  1.00
 426  .17   .29  -.05   .20   .27   .20   .04  1.00
 427 END DATA.
 428 @end example
 429
 430 @subsubheading Example 7: Split variables with explicit values
 431
 432 This syntax defines two matrices, using the variable @code{s1} to
 433 distinguish between them.  Each line of data begins with @code{s1}.
 434 This yields the same matrix file as Example 4 (@pxref{MATRIX DATA
 435 Example 4}).
 436
 437 @example
 438 MATRIX DATA
 439     VARIABLES=s1 var01 TO var04
 440     /SPLIT=s1
 441     /FORMAT=FULL
 442     /CONTENTS=MEAN SD N CORR.
 443 BEGIN DATA.
 444 0 34 35 36 37
 445 0 22 11 55 66
 446 0 99 98 99 92
 447 0  1 .9 .8 .7
 448 0 .9  1 .6 .5
 449 0 .8 .6  1 .4
 450 0 .7 .5 .4  1
 451 1 44 45 34 39
 452 1 23 15 51 46
 453 1 98 34 87 23
 454 1  1 .2 .3 .4
 455 1 .2  1 .5 .6
 456 1 .3 .5  1 .7
 457 1 .4 .6 .7  1
 458 END DATA.
 459 @end example
 460
 461 @subsubheading Example 8: Split variable with sequential values
 462 @anchor{MATRIX DATA Example 8}
 463
 464 Like the previous example, this syntax defines two matrices with
 465 split variable @code{s1}.  In this case, though, @code{s1} is not
 466 listed in @subcmd{VARIABLES}, which means that its value does not
 467 appear in the data.  Instead, @cmd{MATRIX DATA} reads matrix data
 468 until the input is exhausted, supplying 1 for the first split, 2 for
 469 the second, and so on.
 470
 471 @example
 472 MATRIX DATA
 473     VARIABLES=var01 TO var04
 474     /SPLIT=s1
 475     /FORMAT=FULL
 476     /CONTENTS=MEAN SD N CORR.
 477 BEGIN DATA.
 478 34 35 36 37
 479 22 11 55 66
 480 99 98 99 92
 481  1 .9 .8 .7
 482 .9  1 .6 .5
 483 .8 .6  1 .4
 484 .7 .5 .4  1
 485 44 45 34 39
 486 23 15 51 46
 487 98 34 87 23
 488  1 .2 .3 .4
 489 .2  1 .5 .6
 490 .3 .5  1 .7
 491 .4 .6 .7  1
 492 END DATA.
 493 @end example
 494
 495 @subsubsection Factor variables without @code{ROWTYPE_}
 496
 497 Without @code{ROWTYPE_}, factor variables introduce two new wrinkles
 498 to @cmd{MATRIX DATA} syntax.  First, the @subcmd{CELLS} subcommand
 499 must declare the number of combinations of factor variables present in
 500 the data.  If there is, for example, one factor variable for which the
 501 data contains three values, one would write @code{CELLS=3}; if there
 502 are two (or more) factor variables for which the data contains five
 503 combinations, one would use @code{CELLS=5}; and so on.
 504
 505 Second, the @subcmd{CONTENTS} subcommand must distinguish within-cell
 506 data from pooled data by enclosing within-cell row types in
 507 parentheses.  When the different within-cell row types for a single
 508 factor value appear on consecutive lines, enclose those row types in a
 509 single set of parentheses; when the different factor values for a
 510 given within-cell row type appear on consecutive lines, enclose each
 511 row type in its own set of parentheses.
 512
 513 Without @code{ROWTYPE_}, input lines for pooled data do not include
 514 factor values, not even as missing values, but input lines for
 515 within-cell data do.
 516
 517 The following examples aim to clarify this syntax.
 518
 519 @subsubheading Example 9: Factor variables, grouping within-cell records by factor
 520
 521 This syntax defines the same matrix file as Example 5 (@pxref{MATRIX
 522 DATA Example 5}), without using @code{ROWTYPE_}.  It declares
 523 @code{CELLS=2} because the data contains two values (0 and 1) for
 524 factor variable @code{f1}.  Within-cell vector row types @code{MEAN},
 525 @code{SD}, and @code{N} are in a single set of parentheses on
 526 @subcmd{CONTENTS} because they are grouped together in subsequent
 527 lines for a single factor value.  The data lines with the pooled
 528 correlation matrix do not have any factor values.
 529
 530 @example
 531 MATRIX DATA
 532     VARIABLES=f1 var01 TO var04
 533     /FACTORS=f1
 534     /CELLS=2
 535     /CONTENTS=(MEAN SD N) CORR.
 536 BEGIN DATA.
 537 0 34 35 36 37
 538 0 22 11 55 66
 539 0 99 98 99 92
 540 1 44 45 34 39
 541 1 23 15 51 46
 542 1 98 34 87 23
 543    1
 544   .9  1
 545   .8 .6  1
 546   .7 .5 .4  1
 547 END DATA.
 548 @end example
 549
 550 @subsubheading Example 10: Factor variables, grouping within-cell records by row type
 551
 552 This syntax defines the same matrix file as the previous example.  The
 553 only difference is that the within-cell vector rows are grouped
 554 differently: two rows of means (one for each factor), followed by two
 555 rows of standard deviations, followed by two rows of counts.
 556
 557 @example
 558 MATRIX DATA
 559     VARIABLES=f1 var01 TO var04
 560     /FACTORS=f1
 561     /CELLS=2
 562     /CONTENTS=(MEAN) (SD) (N) CORR.
 563 BEGIN DATA.
 564 0 34 35 36 37
 565 1 44 45 34 39
 566 0 22 11 55 66
 567 1 23 15 51 46
 568 0 99 98 99 92
 569 1 98 34 87 23
 570    1
 571   .9  1
 572   .8 .6  1
 573   .7 .5 .4  1
 574 END DATA.
 575 @end example
 576
 577 @node MCONVERT
 578 @section MCONVERT
 579 @vindex MCONVERT
 580
 581 @display
 582 MCONVERT
 583     [[MATRIX=]
 584      [IN(@{@samp{*}|'@var{file}'@})]
 585      [OUT(@{@samp{*}|'@var{file}'@})]]
 586     [/@{REPLACE,APPEND@}].
 587 @end display
 588
 589 The @cmd{MCONVERT} command converts matrix data from a correlation
 590 matrix and a vector of standard deviations into a covariance matrix,
 591 or vice versa.
 592
 593 By default, @cmd{MCONVERT} both reads and writes the active file.  Use
 594 the @subcmd{MATRIX} subcommand to specify other files.  To read a matrix
 595 file, specify its name inside parentheses following @code{IN}.  To
 596 write a matrix file, specify its name inside parentheses following
 597 @code{OUT}.  Use @samp{*} to explicitly specify the active file for
 598 input or output.
 599
 600 When @cmd{MCONVERT} reads the input, by default it substitutes a
 601 correlation matrix and a vector of standard deviations each time it
 602 encounters a covariance matrix, and vice versa.  Specify
 603 @code{/APPEND} to instead have @cmd{MCONVERT} add the other form of
 604 data without removing the existing data.  Use @code{/REPLACE} to
 605 explicitly request removing the existing data.
 606
 607 The @cmd{MCONVERT} command requires its input to be a matrix file.
 608 Use @cmd{MATRIX DATA} to convert text input into matrix file format.
 609 @xref{MATRIX DATA}, for details.
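
A minimal sketch of the default conversion follows.  The variable
names and values are invented for illustration.  Under the usual
identity, in which the covariance of variables @var{i} and @var{j} is
their correlation times the product of their standard deviations, the
covariance matrix for this input would have 4, 9, and 16 on the
diagonal and 3, 2, and 6 below it.

@example
MATRIX DATA
    VARIABLES=ROWTYPE_ var01 TO var03.
BEGIN DATA.
SD      2     3     4
CORR    1
CORR   .5     1
CORR  .25    .5     1
END DATA.

* Replace the correlation matrix and SD vector by a covariance matrix.
MCONVERT.
LIST.
@end example

Following the syntax summary above, a variant such as @code{MCONVERT
MATRIX=IN(*) OUT('cov.sav') /APPEND.} would presumably keep the
original rows and write the result to @file{cov.sav}, a placeholder
file name, instead of the active file.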
 610
 611 @node MATRIX
 612 @section MATRIX
 613 @vindex MATRIX
 614 @vindex END MATRIX
 615
 616 @node Matrix Overview
 617 @subsection Overview
 618
 619 @display
 620 @t{MATRIX.}
 621 @dots{}@i{matrix commands}@dots{}
 622 @t{END MATRIX.}
 623 @end display
 624
 625 @noindent
 626 The following basic matrix commands are supported:
 627
 628 @display
 629 @t{COMPUTE} @i{variable}[(@i{index}[,@i{index}])]=@i{expression}.
 630 @t{CALL} @i{procedure}(@i{argument}, @dots{}).
 631 @t{PRINT} [@i{expression}]
 632       [/@t{FORMAT}=@i{format}]
 633       [/@t{TITLE}=@i{title}]
 634       [/@t{SPACE}=@{@t{NEWPAGE} @math{|} @i{n}@}]
 635       [@{/@t{RLABELS}=@i{string}@dots{} @math{|} /@t{RNAMES}=@i{expression}@}]
 636       [@{/@t{CLABELS}=@i{string}@dots{} @math{|} /@t{CNAMES}=@i{expression}@}].
 637 @end display
 638
 639 @noindent
 640 The following matrix commands offer support for flow control:
 641
 642 @display
 643 @t{DO IF} @i{expression}.
 644   @dots{}@i{matrix commands}@dots{}
 645 [@t{ELSE IF} @i{expression}.
 646   @dots{}@i{matrix commands}@dots{}]@dots{}
 647 [@t{ELSE}.
 648   @dots{}@i{matrix commands}@dots{}]
 649 @t{END IF}.
 650
 651 @t{LOOP} [@i{var}=@i{first} @t{TO} @i{last} [@t{BY} @i{step}]] [@t{IF} @i{expression}].
 652   @dots{}@i{matrix commands}@dots{}
 653 @t{END LOOP} [@t{IF} @i{expression}].
 654
 655 @t{BREAK}.
 656 @end display
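
As a small illustration of the flow-control commands, the sketch below
adds up the integers from 1 to 5.  The variable names @code{total} and
@code{i} are arbitrary.

@example
* Sum the integers 1 through 5 with a counted loop.
MATRIX.
COMPUTE total = 0.
LOOP i = 1 TO 5.
COMPUTE total = total + i.
END LOOP.
PRINT total.
END MATRIX.
@end example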
 657
 658 @noindent
 659 The following matrix commands support matrix input and output:
 660
 661 @display
 662 @t{READ} @i{variable}[(@i{index}[,@i{index}])]
 663      [/@t{FILE}=@i{file}]
 664      /@t{FIELD}=@i{first} @t{TO} @i{last} [@t{BY} @i{width}]
 665      [/@t{SIZE}=@i{expression}]
 666      [/@t{MODE}=@{@t{RECTANGULAR} @math{|} @t{SYMMETRIC}@}]
 667      [/@t{REREAD}]
 668      [/@t{FORMAT}=@i{format}].
 669 @t{WRITE} @i{expression}
 670       [/@t{OUTFILE}=@i{file}]
 671       /@t{FIELD}=@i{first} @t{TO} @i{last} [@t{BY} @i{width}]
 672       [/@t{MODE}=@{@t{RECTANGULAR} @math{|} @t{TRIANGULAR}@}]
 673       [/@t{HOLD}]
 674       [/@t{FORMAT}=@i{format}].
 675 @t{GET} @i{variable}[(@i{index}[,@i{index}])]
 676     [/@t{FILE}=@{@i{file} @math{|} @t{*}@}]
 677     [/@t{VARIABLES}=@i{variable}@dots{}]
 678     [/@t{NAMES}=@i{expression}]
 679     [/@t{MISSING}=@{@t{ACCEPT} @math{|} @t{OMIT} @math{|} @i{number}@}]
 680     [/@t{SYSMIS}=@{@t{OMIT} @math{|} @i{number}@}].
 681 @t{SAVE} @i{expression}
 682      [/@t{OUTFILE}=@{@i{file} @math{|} @t{*}@}]
 683      [/@t{VARIABLES}=@i{variable}@dots{}]
 684      [/@t{NAMES}=@i{expression}]
 685      [/@t{STRINGS}=@i{variable}@dots{}].
 686 @t{MGET} [/@t{FILE}=@i{file}]
 687      [/@t{TYPE}=@{@t{COV} @math{|} @t{CORR} @math{|} @t{MEAN} @math{|} @t{STDDEV} @math{|} @t{N} @math{|} @t{COUNT}@}].
 688 @t{MSAVE} @i{expression}
 689       /@t{TYPE}=@{@t{COV} @math{|} @t{CORR} @math{|} @t{MEAN} @math{|} @t{STDDEV} @math{|} @t{N} @math{|} @t{COUNT}@}
 690       [/@t{OUTFILE}=@i{file}]
 691       [/@t{VARIABLES}=@i{variable}@dots{}]
 692       [/@t{SNAMES}=@i{variable}@dots{}]
 693       [/@t{SPLIT}=@i{expression}]
 694       [/@t{FNAMES}=@i{variable}@dots{}]
 695       [/@t{FACTOR}=@i{expression}].
 696 @end display
 697
 698 @noindent
 699 The following matrix commands provide additional support:
 700
 701 @display
 702 @t{DISPLAY} [@{@t{DICTIONARY} @math{|} @t{STATUS}@}].
 703 @t{RELEASE} @i{variable}@dots{}.
 704 @end display
 705
 706 @node Matrix Introduction
 707 @subsection Introduction
 708
 709 @code{MATRIX} and @code{END MATRIX} enclose a special @pspp{}
 710 sub-language, called the matrix language.  The matrix language does
 711 not require an active dataset to be defined and only a few of the
 712 matrix language commands work with any datasets that are defined.
 713 Each instance of @code{MATRIX}@dots{}@code{END MATRIX} is a separate
 714 program whose state is independent of any other instance, so that variables
 715 declared within a matrix program are forgotten at its end.
 716
 717 The matrix language works with matrices, where a @dfn{matrix} is a
 718 rectangular array of real numbers.  An @math{@var{n}@times{}@var{m}}
 719 matrix has @var{n} rows and @var{m} columns.  Some special cases are
 720 important: an @math{@var{n}@times{}1} matrix is a @dfn{column vector},
 721 a @math{1@times{}@var{n}} matrix is a @dfn{row vector}, and a
 722 @math{1@times{}1} matrix is a @dfn{scalar}.
 723
 724 The matrix language also has limited support for matrices that contain
 725 8-byte strings instead of numbers.  Strings longer than 8 bytes are
 726 truncated, and shorter strings are padded with spaces.  String
 727 matrices are mainly useful for labeling rows and columns when printing
 728 numerical matrices with the @code{MATRIX PRINT} command.  Arithmetic
 729 operations on string matrices will not produce useful results.  The
 730 user should not mix strings and numbers within a matrix.
 731
 732 The matrix language does not work with cases.  A variable in the
 733 matrix language represents a single matrix.
 734
 735 The matrix language does not support missing values.
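
The points above can be seen in a short sketch.  The variable names
are arbitrary, and no data needs to be loaded first.

@example
* A self-contained matrix program; no active dataset is needed.
MATRIX.
COMPUTE a = @{1, 2; 3, 4@}.
COMPUTE b = a * a.
PRINT b.
END MATRIX.

* A second program starts fresh, so the earlier a and b are gone.
MATRIX.
COMPUTE a = @{5, 6@}.
PRINT a.
END MATRIX.
@end example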
 736
 737 @node Matrix Operators
 738 @subsection Matrix Operators
 739
 740 Many matrix commands use expressions.  A matrix expression may use the
 741 following operators, listed in descending order of operator
 742 precedence:
 743
 744 @itemize @bullet
 745 @item @t{(@dots{})}  @t{@{@dots{}@}}
 746
 747 @item @t{(}@i{index}[@t{,} @i{index}]@t{)}
 748
 749 @item @t{+  -} (unary)
 750
 751 @item @t{:}
 752
 753 @item @t{**  &**}
 754
 755 @item @t{*  /  &*  &/}
 756
 757 @item @t{+  -} (binary)
 758
 759 @item @t{< <= = >= > <>}
 760
 761 @item @t{NOT}
 762
 763 @item @t{AND}
 764
 765 @item @t{OR  XOR}
 766 @end itemize
 767
 768 Each of these operators is described in more detail below.
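
To see how precedence affects a result, consider the sketch below.
Read literally, the list places @samp{:} above @samp{*}, so the first
expression should be treated as @code{(1:3) * 2}, that is, the row
vector @code{@{2, 4, 6@}}, while the parenthesized second expression
should yield the integers from 1 to 6.

@example
MATRIX.
COMPUTE a = 1:3 * 2.
COMPUTE b = 1:(3 * 2).
PRINT a.
PRINT b.
END MATRIX.
@end example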
 769
 770 @node Matrix Construction Operator
 771 @subsubsection Matrix Construction Operator @t{@{@}}
 772
 773 Use the @t{@{}@t{@}} operator to construct matrices.  Within
 774 the curly braces, commas separate elements within a row and semicolons
 775 separate rows.  The following examples show a @math{2@times{}3}
 776 matrix, a @math{1@times{}4} row vector, a @math{3@times{}1} column
 777 vector, and a scalar.
 778
 779 @multitable @columnfractions .4 .05 .4
 780 @item @t{@{1, 2, 3; 4, 5, 6@}}
 781 @tab @result{}
 782 @tab
 783 @ifnottex
 784 @t{[1  2  3] @* [4  5  6]}
 785 @end ifnottex
 786 @iftex
 787 @math{\left(\matrix{1 & 2 & 3 \cr 4 & 5 & 6}\right)}
 788 @end iftex
 789 @
 790 @item @t{@{3.14, 6.28, 9.42, 12.57@}}
 791 @tab @result{}
 792 @tab
 793 @ifnottex
 794 @t{[3.14  6.28  9.42  12.57]}
 795 @end ifnottex
 796 @iftex
 797 @math{(\matrix{3.14 & 6.28 & 9.42 & 12.57})}
 798 @end iftex
 799 @
 800 @item @t{@{1.41; 1.73; 2@}}
 801 @tab @result{}
 802 @tab
 803 @ifnottex
 804 @t{[1.41] @* [1.73] @* [2.00]}
 805 @end ifnottex
 806 @iftex
 807 @math{\left(\matrix{1.41 \cr 1.73 \cr 2.00}\right)}
 808 @end iftex
 809 @
 810 @item @t{@{5@}}
 811 @tab @result{}
 812 @tab 5
 813 @end multitable
 814
 815 Curly braces are not limited to holding numeric literals.  They can
 816 contain calculations, and they can paste together matrices and vectors
 817 in any way as long as the result is rectangular.  For example, if
 818 @samp{m} is matrix @code{@{1, 2; 3, 4@}}, @samp{r} is row vector
 819 @code{@{5, 6@}}, and @samp{c} is column vector @code{@{7; 8@}}, then
 820 curly braces can be used as follows:
 821
 822 @multitable @columnfractions .4 .05 .4
 823 @item @t{@{m, c; r, 10@}}
 824 @tab @result{}
 825 @tab
 826 @ifnottex
 827 @t{[1 2  7] @* [3 4  8] @* [5 6 10]}
 828 @end ifnottex
 829 @iftex
 830 @math{\left(\matrix{1 & 2 & 7 \cr 3 & 4 & 8 \cr 5 & 6 & 10}\right)}
 831 @end iftex
 832 @
 833 @item @t{@{c, 2 * c, T(r)@}}
 834 @tab @result{}
 835 @tab
 836 @ifnottex
 837 @t{[7 14 5] @* [8 16 6]}
 838 @end ifnottex
 839 @iftex
 840 @math{\left(\matrix{7 & 14 & 5 \cr 8 & 16 & 6}\right)}
 841 @end iftex
 842 @end multitable
 843
 844 The final example above uses the transposition function @code{T}.
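
The pasting examples above can be tried as a complete matrix program,
sketched below.  Note that the column vector @samp{c} is written with
a semicolon between its elements.

@example
* Paste a matrix, a row vector, and a column vector together.
MATRIX.
COMPUTE m = @{1, 2; 3, 4@}.
COMPUTE r = @{5, 6@}.
COMPUTE c = @{7; 8@}.
PRINT @{m, c; r, 10@}.
PRINT @{c, 2 * c, T(r)@}.
END MATRIX.
@end example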
 845
 846 @node Matrix Sequence Operator
 847 @subsubsection Integer Sequence Operator @samp{:}
 848
 849 The syntax @code{@var{first}:@var{last}:@var{step}} yields a row
 850 vector of integers from @var{first} to @var{last}, counting
 851 by @var{step}.  The final @code{:@var{step}} is optional and
 852 defaults to 1 when omitted.
 853
 854 Each of @var{first}, @var{last}, and @var{step} must be a scalar and
 855 should be an integer (any fractional part is discarded).  Because
 856 @samp{:} has a high precedence, operands other than numeric literals
 857 must usually be parenthesized.
 858
 859 When @var{step} is positive (or omitted) and @math{@var{last} <
 860 @var{first}}, or if @var{step} is negative and @math{@var{last} >
 861 @var{first}}, then the result is an empty matrix.  If @var{step} is 0,
 862 then @pspp{} reports an error.
 863
 864 Here are some examples:
 865
 866 @multitable @columnfractions .4 .05 .4
 867 @item @t{1:6}      @tab @result{} @tab @t{@{1, 2, 3, 4, 5, 6@}}
 868 @item @t{1:6:2}    @tab @result{} @tab @t{@{1, 3, 5@}}
 869 @item @t{-1:-5:-1} @tab @result{} @tab @t{@{-1, -2, -3, -4, -5@}}
 870 @item @t{-1:-5}    @tab @result{} @tab @t{@{@}}
 871 @item @t{2:1:0}    @tab @result{} @tab (error)
 872 @end multitable
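
When an operand is a calculation rather than a numeric literal,
parentheses keep the sequence operator from binding too early.  The
sketch below uses an arbitrary variable @code{n}.  By the rules above,
the three sequences should run from 1 to 4, from 1 to 6, and from 8
down to 2 in steps of 2.

@example
MATRIX.
COMPUTE n = 4.
PRINT (1:n).
PRINT (1:(n + 2)).
PRINT ((2 * n):1:-2).
END MATRIX.
@end example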
 873
 874 @node Matrix Index Operator
 875 @subsubsection Index Operator @code{()}
 876
 877 The result of the submatrix or indexing operator, written
 878 @code{@var{m}(@var{rindex}, @var{cindex})}, contains the rows of
 879 @var{m} whose indexes are given in vector @var{rindex} and the columns
 880 whose indexes are given in vector @var{cindex}.
 881
 882 In the simplest case, if @var{rindex} and @var{cindex} are both
 883 scalars, the result is also a scalar:
 884
 885 @multitable @columnfractions .4 .05 .4
 886 @item @t{@{10, 20; 30, 40@}(1, 1)} @tab @result{} @tab @t{10}
 887 @item @t{@{10, 20; 30, 40@}(1, 2)} @tab @result{} @tab @t{20}
 888 @item @t{@{10, 20; 30, 40@}(2, 1)} @tab @result{} @tab @t{30}
 889 @item @t{@{10, 20; 30, 40@}(2, 2)} @tab @result{} @tab @t{40}
 890 @end multitable
 891
 892 If the index arguments have multiple elements, then the result
 893 includes multiple rows or columns:
 894
 895 @multitable @columnfractions .4 .05 .4
 896 @item @t{@{10, 20; 30, 40@}(1:2, 1)} @tab @result{} @tab @t{@{10; 30@}}
 897 @item @t{@{10, 20; 30, 40@}(2, 1:2)} @tab @result{} @tab @t{@{30, 40@}}
 898 @item @t{@{10, 20; 30, 40@}(1:2, 1:2)} @tab @result{} @tab @t{@{10, 20; 30, 40@}}
 899 @end multitable
 900
 901 The special argument @samp{:} may stand in for all the rows or columns
 902 in the matrix being indexed, like this:
 903
 904 @multitable @columnfractions .4 .05 .4
 905 @item @t{@{10, 20; 30, 40@}(:, 1)} @tab @result{} @tab @t{@{10; 30@}}
 906 @item @t{@{10, 20; 30, 40@}(2, :)} @tab @result{} @tab @t{@{30, 40@}}
 907 @item @t{@{10, 20; 30, 40@}(:, :)} @tab @result{} @tab @t{@{10, 20; 30, 40@}}
 908 @end multitable
 909
 910 The index arguments do not have to be in order, and they may contain
 911 repeated values, like this:
 912
 913 @multitable @columnfractions .4 .05 .4
 914 @item @t{@{10, 20; 30, 40@}(@{2, 1@}, 1)} @tab @result{} @tab @t{@{30; 10@}}
 915 @item @t{@{10, 20; 30, 40@}(2, @{2; 2; 1@})} @tab @result{} @tab @t{@{40, 40, 30@}}
 916 @item @t{@{10, 20; 30, 40@}(2:1:-1, :)} @tab @result{} @tab @t{@{30, 40; 10, 20@}}
 917 @end multitable
 918
 919 When the matrix being indexed is a row or column vector, only a single
 920 index argument is needed, like this:
 921
 922 @multitable @columnfractions .4 .05 .4
 923 @item @t{@{11, 12, 13, 14, 15@}(2:4)} @tab @result{} @tab @t{@{12, 13, 14@}}
 924 @item @t{@{11; 12; 13; 14; 15@}(2:4)} @tab @result{} @tab @t{@{12; 13; 14@}}
 925 @end multitable
 926
 927 When an index is not an integer, @pspp{} discards the fractional part.
 928 It is an error for an index to be less than 1 or greater than the
 929 number of rows or columns:
 930
 931 @multitable @columnfractions .4 .05 .4
 932 @item @t{@{11, 12, 13, 14@}(@{2.5, 4.6@})} @tab @result{} @tab @t{@{12, 14@}}
 933 @item @t{@{11; 12; 13; 14@}(0)} @tab @result{} @tab (error)
 934 @end multitable
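
As a rough model of the two-argument indexing rules in this section,
the following Python sketch treats a matrix as a list of rows.  The
helper name @code{submatrix}, the use of the string @code{":"} for the
special @samp{:} argument, and truncation toward zero for non-integer
indexes are assumptions made for the illustration; it does not describe
how @pspp{} is implemented, and it does not model the single-argument
form for vectors.

```python
import math


def submatrix(m, rindex, cindex):
    """Return the rows and columns of m selected by 1-based index lists."""
    def resolve(index, limit):
        # The special argument ":" selects every row or column.
        if index == ":":
            return list(range(limit))
        resolved = []
        for i in index:
            i = math.trunc(i)  # assumption: fractional part is truncated
            if i < 1 or i > limit:
                raise IndexError("index out of range: " + str(i))
            resolved.append(i - 1)  # convert to Python's 0-based indexing
        return resolved

    rows = resolve(rindex, len(m))
    cols = resolve(cindex, len(m[0]))
    return [[m[i][j] for j in cols] for i in rows]


m = [[10, 20], [30, 40]]
print(submatrix(m, [2, 1], [1]))     # [[30], [10]]
print(submatrix(m, [2], [2, 2, 1]))  # [[40, 40, 30]]
print(submatrix(m, ":", [1]))        # [[10], [30]]
```

Because the index lists are used in the order given, reordered and
repeated indexes fall out of the same loop with no special handling.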
 935
 936 @node Matrix Unary Operators
 937 @subsubsection Unary Operators
 938
 939 The unary operators take a single operand of any dimensions and
 940 operate on each of its elements independently.  The unary operators
 941 are:
 942
 943 @table @code
 944 @item -
 945 Inverts the sign of each element.
 946
 947 @item +
 948 No change.
 949
 950 @item NOT
 951 Logical inversion: each positive value becomes 0 and each zero or
 952 negative value becomes 1.
 953 @end table
 954
 955 Examples:
 956
 957 @multitable @columnfractions .4 .05 .4
 958 @item @t{-@{1, -2; 3, -4@}} @tab @result{} @tab @t{@{-1, 2; -3, 4@}}
 959 @item @t{+@{1, -2; 3, -4@}} @tab @result{} @tab @t{@{1, -2; 3, -4@}}
 960 @item @t{NOT @{1, 0; -1, 1@}} @tab @result{} @tab @t{@{0, 1; 1, 0@}}
 961 @end multitable
 962
 963 @node Matrix Elementwise Binary Operators
 964 @subsubsection Elementwise Binary Operators
 965
 966 The elementwise binary operators require their operands to be matrices
 967 with the same dimensions.  Alternatively, if one operand is a scalar,
 968 then its value is treated as if it were duplicated to the dimensions
 969 of the other operand.  The result is a matrix of the same size as the
 970 operands, in which each element is the result of applying the
 971 operator to the corresponding elements of the operands.
 972
 973 The elementwise binary operators are listed below.
 974
 975 @itemize @bullet
 976 @item
 977 The arithmetic operators, for familiar arithmetic operations:
 978
 979 @table @asis
 980 @item @code{+}
 981 Addition.
 982
 983 @item @code{-}
 984 Subtraction.
 985
 986 @item @code{*}
 987 Multiplication, if one operand is a scalar.  (Otherwise this is matrix
 988 multiplication, described below.)
 989
 990 @item @code{/} or @code{&/}
 991 Division.
 992
 993 @item @code{&*}
 994 Multiplication.
 995
 996 @item @code{&**}
 997 Exponentiation.
 998 @end table
 999
1000 @item
1001 The relational operators, whose results are 1 when a comparison is
1002 true and 0 when it is false:
1003
1004 @table @asis
1005 @item @code{<} or @code{LT}
1006 Less than.
1007
1008 @item @code{<=} or @code{LE}
1009 Less than or equal.
1010
1011 @item @code{=} or @code{EQ}
1012 Equal.
1013
1014 @item @code{>} or @code{GT}
1015 Greater than.
1016
1017 @item @code{>=} or @code{GE}
1018 Greater than or equal.
1019
1020 @item @code{<>} or @code{~=} or @code{NE}
1021 Not equal.
1022 @end table
1023
1024 @item
1025 The logical operators, which treat positive operands as true and
1026 nonpositive operands as false.  They yield 0 for false and 1 for true:
1027
1028 @table @code
1029 @item AND
1030 True if both operands are true.
1031
1032 @item OR
1033 True if at least one operand is true.
1034
1035 @item XOR
1036 True if exactly one operand is true.
1037 @end table
1038 @end itemize
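
The scalar-duplication rule and the 0-or-1 results of the relational
and logical operators can be sketched in Python as follows.  The names
@code{elementwise}, @code{less_than}, @code{logical_and}, and
@code{logical_xor} are hypothetical helpers chosen for this
illustration, with matrices modeled as lists of rows; this is not
@pspp{}'s implementation.

```python
def elementwise(op, a, b):
    """Apply op to corresponding elements, duplicating a scalar operand."""
    a_is_matrix = isinstance(a, list)
    b_is_matrix = isinstance(b, list)
    if not a_is_matrix and not b_is_matrix:
        return op(a, b)
    if a_is_matrix and b_is_matrix:
        if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
            raise ValueError("operands must have the same dimensions")
    shape_source = a if a_is_matrix else b
    n_rows, n_cols = len(shape_source), len(shape_source[0])

    def at(operand, is_matrix, i, j):
        # A scalar behaves as if it filled a matrix of the other's size.
        return operand[i][j] if is_matrix else operand

    return [[op(at(a, a_is_matrix, i, j), at(b, b_is_matrix, i, j))
             for j in range(n_cols)]
            for i in range(n_rows)]


def less_than(x, y):
    return 1 if x < y else 0


def logical_and(x, y):
    # Positive operands are true; zero and negative operands are false.
    return 1 if x > 0 and y > 0 else 0


def logical_xor(x, y):
    return 1 if (x > 0) != (y > 0) else 0


print(elementwise(less_than, [[1, 5], [3, 2]], 3))              # [[1, 0], [0, 1]]
print(elementwise(logical_and, [[1, 0], [-1, 2]], [[1, 1], [1, 1]]))  # [[1, 0], [0, 1]]
print(elementwise(logical_xor, [[1, 0], [-1, 2]], 1))           # [[0, 1], [1, 0]]
```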
1039
1040 @node Matrix Multiplication Operator
1041 @subsubsection Matrix Multiplication Operator @samp{*}
1042
1043 If @code{A} is an @math{@var{m}@times{}@var{n}} matrix and @code{B} is
1044 an @math{@var{n}@times{}@var{p}} matrix, then @code{A*B} is their
1045 @math{@var{m}@times{}@var{p}} matrix product.
1046 @pspp{} reports an error if the number of columns in @code{A} differs
1047 from the number of rows in @code{B}.
1048
1049 The @code{*} operator performs elementwise multiplication (see above)
1050 if one of its operands is a scalar.
1051
1052 No built-in operator yields the inverse of matrix multiplication.
1053 Instead, multiply by the result of @code{INV} or @code{GINV}.
1054
1055 Some examples:
1056
1057 @multitable @columnfractions .4 .05 .4
1058 @item @t{@{1, 2, 3@} * @{4; 5; 6@}} @tab @result{} @tab @t{32}
1059 @item @t{@{4; 5; 6@} * @{1, 2, 3@}} @tab @result{} @tab @t{@{4,@w{ } 8, 12; @*@w{ }5, 10, 15; @*@w{ }6, 12, 18@}}
1060 @end multitable
1061
1062 @node Matrix Exponentiation Operator
1063 @subsubsection Matrix Exponentiation Operator @code{**}
1064
1065 The result of @code{A**B} is defined as follows when @code{A} is a
1066 square matrix and @code{B} is an integer scalar:
1067
1068 @itemize @bullet
1069 @item
1070 For @code{B > 0}, @code{A**B} is @code{A*@dots{}*A}, where there are
1071 @code{B} @samp{A}s.  (@pspp{} implements this efficiently for large
1072 @code{B}, using exponentiation by squaring.)
1073
1074 @item
1075 For @code{B < 0}, @code{A**B} is @code{INV(A**(-B))}.
1076
1077 @item
1078 For @code{B = 0}, @code{A**B} is the identity matrix.
1079 @end itemize
1080
1081 @noindent
1082 @pspp{} reports an error if @code{A} is not square or @code{B} is not
1083 an integer.
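
The three cases above, together with exponentiation by squaring, can
be sketched in Python.  This is only an illustration under stated
assumptions: the helper names (@code{identity}, @code{matmul},
@code{inverse}, @code{matrix_power}) are hypothetical, matrices are
lists of rows, and the inverse is computed by Gauss-Jordan elimination
over exact fractions, which is a choice made for the sketch rather than
a description of how @pspp{} implements @code{INV}.

```python
from fractions import Fraction


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    if len(a[0]) != len(b):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a):
    """Gauss-Jordan elimination on the augmented matrix [A | I]."""
    n = len(a)
    aug = [[Fraction(x) for x in row]
           + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matrix_power(a, b):
    if any(len(row) != len(a) for row in a):
        raise ValueError("A must be square")
    if b != int(b):
        raise ValueError("B must be an integer")
    b = int(b)
    if b < 0:
        return inverse(matrix_power(a, -b))
    result = identity(len(a))  # covers B = 0
    base = a
    while b > 0:
        # Multiply in the current power of A whenever its bit of B is set,
        # then square the base: O(log B) multiplications in total.
        if b & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        b >>= 1
    return result


print(matrix_power([[1, 1], [0, 1]], 5))  # [[1, 5], [0, 1]]
print(matrix_power([[1, 1], [0, 1]], 0))  # [[1, 0], [0, 1]]
```

For negative exponents the sketch returns @code{Fraction} values; for
example, @code{matrix_power([[1, 1], [0, 1]], -2)} compares equal to
@code{[[1, -2], [0, 1]]}.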