+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
@c (modify-syntax-entry ?_ "w")
@c (modify-syntax-entry ?' "'")
@c (modify-syntax-entry ?@ "'")
@cindex cases
@cindex observations
-Data are the focus of the @pspp{} language.
+Data are the focus of the @pspp{} language.
Each datum belongs to a @dfn{case} (also called an @dfn{observation}).
Each case represents an individual or ``experimental unit''.
For example, in the results of a survey, the names of the respondents,
* INPUT PROGRAM:: Support for complex input programs.
* LIST:: List cases in the active dataset.
* NEW FILE:: Clear the active dataset.
+* MATRIX DATA:: Defining matrix material for procedures.
* PRINT:: Display values in print formats.
* PRINT EJECT:: Eject the current page then print.
* PRINT SPACE:: Print blank lines.
``empty,'' that is, it has no dictionary or data. If a dataset with
the given name already exists, this has no effect. The new dataset
can be used with commands that support output to a dataset,
-e.g. AGGREGATE (@pxref{AGGREGATE}).
+@i{e.g.} AGGREGATE (@pxref{AGGREGATE}).
@vindex DATASET CLOSE
The DATASET CLOSE command deletes a dataset. If the active dataset is
In columnar style, to use a variable format other than the default,
specify the format type in parentheses after the column numbers. For
-instance, for alphanumeric @samp{A} format, use @samp{(A)}.
+instance, for alphanumeric @samp{A} format, use @samp{(A)}.
In addition, implied decimal places can be specified in parentheses
after the column numbers. As an example, suppose that a data file has a
leaves the active column immediately after the ending column
specified. Record motion using @code{NEWREC} in FORTRAN style also
applies to later FORTRAN and columnar specifiers.
-
+
@menu
* DATA LIST FIXED Examples:: Examples of DATA LIST FIXED.
@end menu
@end display
In free format, the input data is, by default, structured as a series
-of fields separated by spaces, tabs, commas, or line breaks. Each
+of fields separated by spaces, tabs, or line breaks.
+If the current @subcmd{DECIMAL} separator is @subcmd{DOT} (@pxref{SET}),
+then commas are also treated as field separators.
+Each
field's content may be unquoted, or it may be quoted with a pairs of
apostrophes (@samp{'}) or double quotes (@samp{"}). Unquoted white
space separates fields but is not part of any field. Any mix of
This list must be introduced by a single slash (@samp{/}). The set of
variable names may contain format specifications in parentheses
(@pxref{Input and Output Formats}). Format specifications apply to all
-variables back to the previous parenthesized format specification.
+variables back to the previous parenthesized format specification.
In addition, an asterisk may be used to indicate that all variables
preceding it are to have input/output format @samp{F8.0}.
stops the flow of input data and passes out of the @cmd{INPUT PROGRAM}
structure.
+@cmd{INPUT PROGRAM} must contain at least one @cmd{DATA LIST} or
+@cmd{END FILE} command.
+
All this is very confusing. A few examples should help to clarify.
@c If you change this example, change the regression test1 in
@example
INPUT PROGRAM.
NUMERIC #A #B.
-
+
DO IF NOT #A.
DATA LIST NOTABLE END=#A FILE='a.data'/X 1-10.
END IF.
Case numbers start from 1. They are counted after all transformations
have been considered.
-@cmd{LIST} attempts to fit all the values on a single line. If needed
-to make them fit, variable names are displayed vertically. If values
-cannot fit on a single line, then a multi-line format will be used.
-
@cmd{LIST} is a procedure. It causes the data to be read.
@node NEW FILE
@cmd{NEW FILE} command clears the dictionary and data from the current
active dataset.
+@node MATRIX DATA
+@section MATRIX DATA
+@vindex MATRIX DATA
+
+@display
+MATRIX DATA
+ VARIABLES = @var{columns}
+ [FILE=@{'@var{file_name}' | INLINE@}]
+ [/FORMAT= [@{LIST | FREE@}]
+ [@{UPPER | LOWER | FULL@}]
+ [@{DIAGONAL | NODIAGONAL@}]]
+ [/N= @var{n}]
+ [/SPLIT= @var{split_variables}].
+@end display
+
+The @cmd{MATRIX DATA} command is used to input data in the form of matrices
+which can subsequently be used by other commands.  If the
+@subcmd{FILE} subcommand is omitted or takes the value @samp{INLINE}, then the
+command should be immediately followed by @cmd{BEGIN DATA} (@pxref{BEGIN DATA}).
+
+There is one mandatory subcommand, @i{viz:} @subcmd{VARIABLES}, which defines
+the @var{columns} of the matrix.
+Normally, the @var{columns} should include an item called @samp{ROWTYPE_}.
+The @samp{ROWTYPE_} column is used to specify the purpose of a row in the
+matrix.
+
+@example
+matrix data
+ variables = rowtype_ var01 TO var08.
+
+begin data.
+mean 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+sd 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+n 92 92 92 92 92 92 92 92
+corr 1.00
+corr .18 1.00
+corr -.22 -.17 1.00
+corr .36 .31 -.14 1.00
+corr .27 .16 -.12 .22 1.00
+corr .33 .15 -.17 .24 .21 1.00
+corr .50 .29 -.20 .32 .12 .38 1.00
+corr .17 .29 -.05 .20 .27 .20 .04 1.00
+end data.
+@end example
+
+In the above example, the first three rows have ROWTYPE_ values of
+@samp{mean}, @samp{sd}, and @samp{n}. These indicate that the rows
+contain mean values, standard deviations and counts, respectively.
+All subsequent rows have a ROWTYPE_ of @samp{corr} which indicates
+that the values are correlation coefficients.
+
+Note that in this example the upper right portion of the @samp{corr}
+rows is blank, and in each row the rightmost value is unity.
+This is because the
+@subcmd{FORMAT} subcommand defaults to @samp{LOWER DIAGONAL},
+which indicates that only the lower triangle, including the diagonal,
+is provided in the data.
+The opposite triangle is automatically inferred.  One could instead
+specify the upper triangle as follows:
+
+
+@example
+matrix data
+ variables = rowtype_ var01 TO var08
+ /format = upper nodiagonal.
+
+begin data.
+mean 24.3 5.4 69.7 20.1 13.4 2.7 27.9 3.7
+sd 5.7 1.5 23.5 5.8 2.8 4.5 5.4 1.5
+n 92 92 92 92 92 92 92 92
+corr .17 .50 -.33 .27 .36 -.22 .18
+corr .29 .29 -.20 .32 .12 .38
+corr .05 .20 -.15 .16 .21
+corr .20 .32 -.17 .12
+corr .27 .12 -.24
+corr -.20 -.38
+corr .04
+end data.
+@end example
+
+In this example the @samp{NODIAGONAL} keyword is used.  Accordingly,
+the diagonal values of the matrix are omitted.  This implies that
+there is one fewer @samp{corr} line than there are variables.
+If the @samp{FULL} option is passed to the @subcmd{FORMAT} subcommand,
+then all the matrix elements must be provided, including the diagonal
+elements.
+
+In the preceding examples, each matrix row has been specified on a
+single line.  If you pass the keyword @samp{FREE} to @subcmd{FORMAT},
+then the data for several matrix rows may be specified on
+the same line, or a single row may be split across lines.
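+
+As a sketch of the @samp{FREE} layout (an illustration only, which
+assumes that each matrix row still begins with its ROWTYPE_ value),
+the following places the @samp{mean} and @samp{sd} rows on one line
+and splits the first @samp{corr} row across two lines:
+
+@example
+* Illustrative sketch only; the values are arbitrary.
+matrix data
+    variables = rowtype_  var01 TO var04
+    /format = free upper nodiagonal.
+
+begin data.
+mean 34 35 36 37 sd 22 11 55 66
+n 99 99 99 99
+corr 9 8
+7
+corr 6 5
+corr 4
+end data.
+@end example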
+
+The @subcmd{N} subcommand may be used to specify the number
+of valid cases for each variable. It should not be used if the
+data contains a record whose ROWTYPE_ column is @samp{N} or @samp{N_VECTOR}.
+It implies an @samp{N} record whose values are all @var{n}.
+That is to say,
+@example
+matrix data
+ variables = rowtype_ var01 TO var04
+ /format = upper nodiagonal
+ /n = 99.
+begin data.
+mean 34 35 36 37
+sd 22 11 55 66
+corr 9 8 7
+corr 6 5
+corr 4
+end data.
+@end example
+produces an effect identical to
+@example
+matrix data
+ variables = rowtype_ var01 TO var04
+ /format = upper nodiagonal.
+begin data.
+n 99 99 99 99
+mean 34 35 36 37
+sd 22 11 55 66
+corr 9 8 7
+corr 6 5
+corr 4
+end data.
+@end example
+
+
+The @subcmd{SPLIT} subcommand is used to indicate that variables are to be
+considered as split variables.  For example, the following
+defines two matrices using the variable @samp{S1} to distinguish
+between them.
+
+@example
+matrix data
+ variables = s1 rowtype_ var01 TO var04
+ /split = s1
+ /format = full diagonal.
+
+begin data.
+0 mean 34 35 36 37
+0 sd 22 11 55 66
+0 n 99 98 99 92
+0 corr 1 9 8 7
+0 corr 9 1 6 5
+0 corr 8 6 1 4
+0 corr 7 5 4 1
+1 mean 44 45 34 39
+1 sd 23 15 51 46
+1 n 98 34 87 23
+1 corr 1 2 3 4
+1 corr 2 1 5 6
+1 corr 3 5 1 7
+1 corr 4 6 7 1
+end data.
+@end example
+
@node PRINT
@section PRINT
@vindex PRINT
@display
-PRINT
+PRINT
[OUTFILE='@var{file_name}']
[RECORDS=@var{n_lines}]
[@{NOTABLE,TABLE@}]
The @subcmd{OUTFILE} subcommand specifies the file to receive the output. The
file may be a file name as a string or a file handle (@pxref{File
Handles}). If @subcmd{OUTFILE} is not present then output will be sent to
-@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, a space is
-inserted at beginning of each output line, even lines that otherwise
-would be blank.
+@pspp{}'s output listing file. When @subcmd{OUTFILE} is present, the
+output is written to @var{file_name} in a plain text format, with a
+space inserted at the beginning of each output line, even lines that
+otherwise would be blank.
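+
+For instance, the following sketch (the file name @file{print.txt}
+and the variables @samp{x} and @samp{y} are hypothetical, chosen only
+for illustration) writes two variables to a text file, in which each
+line begins with a space:
+
+@example
+* Illustrative sketch only; print.txt is a hypothetical file name.
+data list list notable /x y.
+begin data.
+1 2
+3 4
+end data.
+
+* PRINT is a transformation, so EXECUTE causes the data to be read.
+print outfile='print.txt' /x y.
+execute.
+@end example
+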
The @subcmd{ENCODING} subcommand may only be used if the
@subcmd{OUTFILE} subcommand is also used. It specifies the character
@vindex PRINT EJECT
@display
-PRINT EJECT
+PRINT EJECT
OUTFILE='@var{file_name}'
RECORDS=@var{n_lines}
@{NOTABLE,TABLE@}
@vindex WRITE
@display
-WRITE
+WRITE
OUTFILE='@var{file_name}'
RECORDS=@var{n_lines}
@{NOTABLE,TABLE@}
@var{var_list} *
@end display
-@code{WRITE} writes text or binary data to an output file.
+@code{WRITE} writes text or binary data to an output file.
@xref{PRINT}, for more information on syntax and usage. @cmd{PRINT}
and @cmd{WRITE} differ in only a few ways: