Fixed behaviour of oneway when presented with missing values

diff --git a/doc/pspp.texi b/doc/pspp.texi

index a39c7dc0d14a287d626755374ee7c99e9beac3a7..0ad81eb183f2a45d36742ba5adab37af4f1efe4f 100644
--- a/doc/pspp.texi
+++ b/doc/pspp.texi
@@ -1047,7 +1047,7 @@ Backspace (ASCII 8).
  Formfeed (ASCII 12).
  
  @item \n
-Newline (ASCII 10)
+New-line (ASCII 10).
  
  @item \r
  Carriage return (ASCII 13).
@@ -1137,7 +1137,7 @@ the output file.  Default: @code{clean7bit}.
  @item line-ends=@var{line-end-type}
  
  One of @code{cr}, @code{lf}, or @code{crlf}.  This controls what is used
-for newline in the output file.  Default: @code{cr}.
+for new-line in the output file.  Default: @code{cr}.
  
  @item optimize-line-size=@var{level}
  
@@ -1556,11 +1556,11 @@ The string written to the output to cause a formfeed.  See also
  @code{paginate}, described below, for a related setting.  Default:
  @code{"\f"}.
  
-@item newline-string=@var{newline-value}
+@item newline-string=@var{new-line-value}
  
-The string written to the output to cause a newline (carriage return
+The string written to the output to cause a new-line (carriage return
  plus linefeed).  The default, which can be specified explicitly with
-@code{newline-string=default}, is to use the system-dependent newline
+@code{newline-string=default}, is to use the system-dependent new-line
  sequence by opening the output file in text mode.  This is usually the
  right choice.
  
@@ -3593,14 +3593,14 @@ as arguments.  With few exceptions, operator arguments may be
  full-fledged expressions in themselves.
  
  @menu
-* Boolean Values::                 Boolean values.
+* Boolean Values::              Boolean values.
  * Missing Values in Expressions::  Using missing values in expressions.
-* Grouping Operators::             ( )
-* Arithmetic Operators::           + - * / **
-* Logical Operators::              AND NOT OR
-* Relational Operators::           EQ GE GT LE LT NE
-* Functions::                      More-sophisticated operators.
-* Order of Operations::            Operator precedence.
+* Grouping Operators::          parentheses
+* Arithmetic Operators::        add sub mul div pow
+* Logical Operators::           AND NOT OR
+* Relational Operators::        EQ GE GT LE LT NE
+* Functions::                   More-sophisticated operators.
+* Order of Operations::         Operator precedence.
  @end menu
  
  @node Boolean Values, Missing Values in Expressions, Expressions, Expressions
@@ -5122,6 +5122,7 @@ This example shows keywords abbreviated to their first 3 letters.
  
  @display
  DATA LIST FREE
+        [(@{TAB,'c'@}, @dots{})]
          [@{NOTABLE,TABLE@}]
          FILE='filename'
          END=end_var
@@ -5132,16 +5133,23 @@ where each var_spec takes one of the forms
          var_list *
  @end display
  
-In free format, the input data is structured as a series of comma- or
-whitespace-delimited fields (end of line is one form of whitespace; it
-is not treated specially).  Field contents may be surrounded by matched
-pairs of apostrophes (@samp{'}) or quotes (@samp{"}), or they may be
-unenclosed.  For any type of field leading white space (up to the
-apostrophe or quote, if any) is not included in the field.
-
-Multiple consecutive delimiters are equivalent to a single delimiter.
-To specify an empty field, write an empty set of single or double
-quotes; for instance, @samp{""}.
+In free format, the input data is, by default, structured as a series
+of fields separated by spaces, tabs, commas, or line breaks.  Each
+field's content may be unquoted, or it may be quoted with a pair of
+apostrophes (@samp{'}) or double quotes (@samp{"}).  Unquoted white
+space separates fields but is not part of any field.  Any mix of
+spaces, tabs, and line breaks is equivalent to a single space for the
+purpose of separating fields, but consecutive commas will skip a
+field.
+
+Alternatively, delimiters can be specified explicitly, as a
+parenthesized, comma-separated list of single-character strings
+immediately following FREE.  The word TAB may also be used to specify
+a tab character as a delimiter.  When delimiters are specified
+explicitly, only the given characters, plus line breaks, separate
+fields.  Furthermore, leading spaces at the beginnings of fields are
+not trimmed, consecutive delimiters define empty fields, and no form
+of quoting is allowed.
  
  The NOTABLE and TABLE subcommands are as in @cmd{DATA LIST FIXED} above.
  NOTABLE is the default.
@@ -5166,6 +5174,7 @@ on field width apply, but they are honored on output.
  
  @display
  DATA LIST LIST
+        [(@{TAB,'c'@}, @dots{})]
          [@{NOTABLE,TABLE@}]
          FILE='filename'
          END=end_var
@@ -5211,14 +5220,19 @@ the current input program.  @xref{INPUT PROGRAM}.
  @display
  FILE HANDLE handle_name
          /NAME='filename'
-        /RECFORM=@{VARIABLE,FIXED,SPANNED@}
+        /MODE=@{CHARACTER,IMAGE@}
          /LRECL=rec_len
-        /MODE=@{CHARACTER,IMAGE,BINARY,MULTIPUNCH,360@}
+        /TABWIDTH=tab_width
  @end display
  
-Use @cmd{FILE HANDLE} to define the attributes of a file that does
-not use conventional variable-length records terminated by newline
-characters.
+Use @cmd{FILE HANDLE} to associate a file handle name with a file and
+its attributes, so that later commands can refer to the file by its
+handle name.  Because names of text files can be specified directly on
+commands that access files, @cmd{FILE HANDLE} is only needed when a
+file is not an ordinary file containing lines of text.  However,
+@cmd{FILE HANDLE} may be used even for text files, and it may be
+easier to specify a file's name once and later refer to it by an
+abstract handle.
  
  Specify the file handle name as an identifier.  Any given identifier may
  only appear once in a PSPP run.  File handles may not be reassigned to a
@@ -5228,18 +5242,19 @@ HANDLE} command name.
  The NAME subcommand specifies the name of the file associated with the
  handle.  It is the only required subcommand.
  
-The RECFORM subcommand specifies how the file is laid out.  VARIABLE
-specifies variable-length lines terminated with newlines, and it is the
-default.  FIXED specifies fixed-length records.  SPANNED is not
-supported.
+MODE specifies a file mode.  In CHARACTER mode, the default, the data
+file is opened in ANSI C text mode, so that local end of line
+conventions are followed, and each text line is read as one record.
+In CHARACTER mode, most input programs will expand tabs to spaces
+(@cmd{DATA LIST FREE} with explicitly specified delimiters is an
+exception).  By default, each tab is 4 characters wide, but an
+alternate width may be specified on TABWIDTH.  A tab width of 0
+suppresses tab expansion entirely.
  
-LRECL specifies the length of fixed-length records.  It is required if
-@code{/RECFORM FIXED} is specified.  
-
-MODE specifies a file mode.  CHARACTER, the default, causes the data
-file to be opened in ANSI C text mode.  BINARY causes the data file to
-be opened in ANSI C binary mode.  The other possibilities are not
-supported.
+By contrast, in IMAGE mode, the data file is opened in ANSI C binary
+mode and records are a fixed length.  In IMAGE mode, LRECL specifies
+the record length in bytes, with a default of 1024.  Tab characters
+are never expanded to spaces in IMAGE mode.
  
  @node INPUT PROGRAM, LIST, FILE HANDLE, Data Input and Output
  @section INPUT PROGRAM
@@ -6624,7 +6639,7 @@ character codes.  On most modern computers, this is a form of ASCII.
  The aggregation functions listed above exclude all user-missing values
  from calculations.  To include user-missing values, insert a period
  (@samp{.}) between the function name and left parenthesis
-(e.g.~@samp{SUM.}).
+(e.g.@: @samp{SUM.}).
  
  Normally, only a single case (for SD and SD., two cases) need be
  non-missing in each group for the aggregate variable to be
@@ -7472,7 +7487,8 @@ far.
  * DESCRIPTIVES::                Descriptive statistics.
  * FREQUENCIES::                 Frequency tables.
  * CROSSTABS::                   Crosstabulation tables.
-* T-TEST::                      Test Hypotheses about means.
+* T-TEST::                      Test hypotheses about means.
+* ONEWAY::                      One-way analysis of variance.
  @end menu
  
  @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
@@ -7858,7 +7874,7 @@ Approximate T of uncertainty coefficient is wrong.
  
  Fixes for any of these deficiencies would be welcomed.
  
-@node T-TEST, , CROSSTABS, Statistics
+@node T-TEST, ONEWAY, CROSSTABS, Statistics
  @comment  node-name,  next,  previous,  up
  @section T-TEST
  
@@ -7918,14 +7934,12 @@ which they would be needed. This is the default.
  
  
  @menu
-* One Sample Mode::              Testing against a hypothesised mean
-* Independent Samples Mode::     Testing two independent groups for the same mean
-* Paired Samples Mode::          Testing two interdependet groups for the same mean
+* One Sample Mode::             Testing against a hypothesised mean
+* Independent Samples Mode::    Testing two independent groups for equal mean
+* Paired Samples Mode::         Testing two interdependent groups for equal mean
  @end menu
  
  @node One Sample Mode, Independent Samples Mode, T-TEST, T-TEST
-@comment  node-name,  next,  previous,  up
-
  @subsection One Sample Mode
  
  The @cmd{TESTVAL} subcommand invokes the One Sample mode.
@@ -7964,7 +7978,7 @@ the independent variable are excluded on a listwise basis, regardless
  of whether @cmd{/MISSING=LISTWISE} was specified.
  
  
-@node Paired Samples Mode, , Independent Samples Mode, T-TEST
+@node Paired Samples Mode,  , Independent Samples Mode, T-TEST
  @comment  node-name,  next,  previous,  up
  @subsection Paired Samples Mode
  
@@ -7985,6 +7999,57 @@ of variable preceding @code{WITH} against variable following
  @code{WITH} are generated.
  
  
+@node ONEWAY, , T-TEST, Statistics
+@comment  node-name,  next,  previous,  up
+@section ONEWAY
+
+@vindex ONEWAY
+@cindex analysis of variance
+@cindex ANOVA
+
+@display
+ONEWAY
+        [/VARIABLES = ] var_list BY var
+        /MISSING=@{ANALYSIS,LISTWISE@} @{EXCLUDE,INCLUDE@}
+        /CONTRASTS= value1 [, value2] ... [,valueN]
+        /STATISTICS=@{DESCRIPTIVES,HOMOGENEITY@}
+
+@end display
+
+The @cmd{ONEWAY} procedure performs a one-way analysis of variance of
+variables factored by a single independent variable.
+It is used to compare the means of a population
+divided into more than two groups. 
+
+The variables to be analysed should be given in the @code{VARIABLES}
+subcommand.  
+The list of variables must be followed by the @code{BY} keyword and
+the name of the independent (or factor) variable.
+
+You can use the @code{STATISTICS} subcommand to tell PSPP to display
+ancillary information.  The options accepted are:
+@itemize
+@item DESCRIPTIVES
+Displays descriptive statistics about the groups factored by the independent
+variable.
+@item HOMOGENEITY
+Displays the Levene test of Homogeneity of Variance for the
+variables and their groups.
+@end itemize
+
+The @code{CONTRASTS} subcommand is used when you anticipate certain
+differences between the groups.
+The subcommand must be followed by a list of numerals which are the
+coefficients of the groups to be tested.
+The number of coefficients must correspond to the number of distinct
+groups (or values of the independent variable).
+If the sum of the coefficients is not zero, then PSPP will
+display a warning, but will proceed with the analysis.
+The @code{CONTRASTS} subcommand may be given up to 10 times in order
+to specify different contrast tests.
+
+
+
  @node Utilities, Not Implemented, Statistics, Top
  @chapter Utilities
  
@@ -9137,7 +9202,7 @@ struct sysfile_machine_flt64_info
  
  @table @code
  @item int32 rec_type;
-Record type.  Always set to 3.
+Record type.  Always set to 7.
  
  @item int32 subtype;
  Record subtype.  Always set to 4.
@@ -9183,10 +9248,12 @@ struct sysfile_misc_info
  
  @table @code
  @item int32 rec_type;
-Record type.  Always set to 3.
+Record type.  Always set to 7.
  
  @item int32 subtype;
-Record subtype.  May take any value.
+Record subtype.  May take any value.  According to Aapi
+H@"am@"al@"ainen, value 5 indicates a set of grouped variables and 6
+indicates date info (probably related to USE).
  
  @item int32 size;
  Size of each piece of data in the data part.  Should have the value 4 or
@@ -9303,6 +9370,7 @@ may be incorrect in the general case.
  * Version and Date Info Record::  
  * Identification Records::      
  * Variable Count Record::       
+* Case Weight Variable Record::  
  * Variable Records::            
  * Value Label Records::         
  * Portable File Data::          
@@ -9313,9 +9381,8 @@ may be incorrect in the general case.
  
  Portable files are arranged as a series of lines of exactly 80
  characters each.  Each line is terminated by a carriage-return,
-line-feed sequence (henceforth, ``newline'').  Newlines are not
-delimiters: they are only used to avoid line-length limitations existing
-on some operating systems.
+line-feed sequence (``new-line'').  New-lines are only used to avoid
+line length limits imposed by some OSes; they are not meaningful.
  
  The file must be terminated with a @samp{Z} character.  In addition, if
  the final line in the file does not have exactly 80 characters, then it
@@ -9324,7 +9391,7 @@ be in any character set; the file contains a description of its own
  character set, as explained in the next section.  Therefore, the
  @samp{Z} character is not necessarily an ASCII @samp{Z}.)
  
-For the rest of the description of the portable file format, newlines
+For the rest of the description of the portable file format, new-lines
  and the trailing @samp{Z}s will be ignored, as if they did not exist,
  because they are not an important part of understanding the file
  contents.
@@ -9351,6 +9418,9 @@ Subproduct identification (optional).
  @item
  Variable count.
  
+@item
+Case weight variable (optional).
+
  @item
  Variables.  Each variable record may optionally be followed by a
  missing value record and a variable label record.
@@ -9388,18 +9458,18 @@ A whole number, consisting of one or more base-30 digits: @samp{0}
  through @samp{9} plus capital letters @samp{A} through @samp{T}.
  
  @item
-A fraction, consisting of a radix point (@samp{.}) followed by one or
-more base-30 digits (optional).
+Optional fraction, consisting of a radix point (@samp{.}) followed by
+one or more base-30 digits.
  
  @item
-An exponent, consisting of a plus or minus sign (@samp{+} or @samp{-})
-followed by one or more base-30 digits (optional).
+Optional exponent, consisting of a plus or minus sign (@samp{+} or
+@samp{-}) followed by one or more base-30 digits.
  
  @item
  A forward slash (@samp{/}).
  @end itemize
  
-Integer fields take form identical to floating-point fields, but they
+Integer fields take a form identical to floating-point fields, but they
  may not contain a fraction.
  
  String fields take the form of a integer field having value @var{n},
@@ -9413,10 +9483,11 @@ Every portable file begins with a 464-byte header, consisting of a
  character set translation table, followed by an 8-byte tag string.
  
  The 200-byte segment is divided into five 40-byte sections, each of
-which represents the string @code{ASCII SPSS PORT FILE} in a different
-character set encoding.  (If the file is encoded in EBCDIC then the
-string is actually @code{EBCDIC SPSS PORT FILE}, and so on.)  These
-strings are padded on the right with spaces in their own character set.
+which represents the string @code{@var{charset} SPSS PORT FILE} in a
+different character set encoding, where @var{charset} is the name of
+the character set used in the file, e.g.@: @code{ASCII} or
+@code{EBCDIC}.  Each string is padded on the right with spaces in its
+respective character set.
  
  It appears that these strings exist only to inform those who might view
  the file on a screen, and that they are not parsed by SPSS products.
@@ -9611,7 +9682,7 @@ The subproduct identification record has tag code @samp{3}.  It
  consists of a single string field giving additional information on the
  product that wrote the portable file.
  
-@node Variable Count Record, Variable Records, Identification Records, Portable File Format
+@node Variable Count Record, Case Weight Variable Record, Identification Records, Portable File Format
  @section Variable Count Record
  
  The variable count record has tag code @samp{4}.  It consists of two
@@ -9619,7 +9690,15 @@ integer fields.  The first contains the number of variables in the file
  dictionary.  The purpose of the second is unknown; it contains the value
  161 in all portable files examined so far.
  
-@node Variable Records, Value Label Records, Variable Count Record, Portable File Format
+@node Case Weight Variable Record, Variable Records, Variable Count Record, Portable File Format
+@section Case Weight Variable Record
+
+The case weight variable record is optional.  If it is present, it
+indicates the variable used for weighting cases; if it is absent,
+cases are unweighted.  It has tag code @samp{6}.  It consists of a
+single string field that names the weighting variable.
+
+@node Variable Records, Value Label Records, Case Weight Variable Record, Portable File Format
  @section Variable Records
  
  Each variable record represents a single variable.  Variable records