+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017, 2020 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
@node REGRESSION
-@comment node-name, next, previous, up
@section REGRESSION
@cindex regression
@cindex linear regression
-The REGRESSION procedure fits linear models to data via least-squares
+The @cmd{REGRESSION} procedure fits linear models to data via least-squares
estimation. The procedure is appropriate for data which satisfy those
assumptions typical in linear regression:
@itemize @bullet
@item The data set contains @math{n} observations of a dependent variable, say
@math{Y_1,@dots{},Y_n}, and @math{n} observations of one or more explanatory
-variables. Let @math{X_{11}, X_{12}, @dots{}, X_{1n}} denote the @math{n} observations of the
-first explanatory variable; @math{X_{21},@dots{},X_{2n}} denote the @math{n} observations of the
-second explanatory variable; @math{X_{k1},@dots{},X_{kn}} denote the @math{n} observations of the kth
-explanatory variable.
-
-@item The dependent variable @math{Y} has the following relationship to the
+variables.
+Let @math{X_{11}, X_{12}}, @dots{}, @math{X_{1n}} denote the @math{n} observations
+of the first explanatory variable;
+@math{X_{21}},@dots{},@math{X_{2n}} denote the @math{n} observations of the second
+explanatory variable;
+@math{X_{k1}},@dots{},@math{X_{kn}} denote the @math{n} observations of
+the @math{k}th explanatory variable.
+
+@item The dependent variable @math{Y} has the following relationship to the
explanatory variables:
-@math{Y_i = b_0 + b_1 X_{1i} + ... + b_k X_{ki} + Z_i}
+@math{Y_i = b_0 + b_1 X_{1i} + ... + b_k X_{ki} + Z_i}
where @math{b_0, b_1, @dots{}, b_k} are unknown
coefficients, and @math{Z_1,@dots{},Z_n} are independent, normally
-distributed ``noise'' terms with mean zero and common variance. The noise, or
-``error'' terms are unobserved. This relationship is called the
-``linear model.''
+distributed @dfn{noise} terms with mean zero and common variance.
+The noise, or @dfn{error} terms are unobserved.
+This relationship is called the @dfn{linear model}.
@end itemize
-The REGRESSION procedure estimates the coefficients
+The @cmd{REGRESSION} procedure estimates the coefficients
@math{b_0,@dots{},b_k} and produces output relevant to inferences for the
-linear model.
-
-@c If you add any new commands, then don't forget to remove the entry in
-@c not-implemented.texi
+linear model.
@menu
* Syntax:: Syntax definition.
@vindex REGRESSION
@display
REGRESSION
- /VARIABLES=var_list
- /DEPENDENT=var_list
- /STATISTICS=@{ALL, DEFAULTS, R, COEFF, ANOVA, BCOV@}
+ /VARIABLES=@var{var_list}
+ /DEPENDENT=@var{var_list}
+ /STATISTICS=@{ALL, DEFAULTS, R, COEFF, ANOVA, BCOV, CI[@var{conf}, TOL]@}
+ @{ /ORIGIN | /NOORIGIN @}
/SAVE=@{PRED, RESID@}
@end display
-The @cmd{REGRESSION} procedure reads the active file and outputs
+The @cmd{REGRESSION} procedure reads the active dataset and outputs
statistics relevant to the linear model specified by the user.
-The VARIABLES subcommand, which is required, specifies the list of
-variables to be analyzed. Keyword VARIABLES is required. The
-DEPENDENT subcommand specifies the dependent variable of the linear
-model. The DEPENDENT subcommand is required. All variables listed in
-the VARIABLES subcommand, but not listed in the DEPENDENT subcommand,
+The @subcmd{VARIABLES} subcommand, which is required, specifies the list of
+variables to be analyzed. Keyword @subcmd{VARIABLES} is required. The
+@subcmd{DEPENDENT} subcommand specifies the dependent variable of the linear
+model. The @subcmd{DEPENDENT} subcommand is required. All variables listed in
+the @subcmd{VARIABLES} subcommand, but not listed in the @subcmd{DEPENDENT} subcommand,
are treated as explanatory variables in the linear model.
All other subcommands are optional:
-The STATISTICS subcommand specifies the statistics to be displayed:
+The @subcmd{STATISTICS} subcommand specifies which statistics are to be displayed.
+The following keywords are accepted:
-@table @code
+@table @subcmd
@item ALL
All of the statistics below.
@item R
squares for the dependent variable.
@item COEFF
A table containing the estimated model coefficients and their standard errors.
+@item CI (@var{conf})
+This item is only relevant if COEFF has also been selected. It specifies that the
+confidence interval for the coefficients should be printed. The optional value @var{conf},
+which must be in parentheses, is the desired confidence level expressed as a percentage.
@item ANOVA
Analysis of variance table for the model.
@item BCOV
The covariance matrix for the estimated model coefficients.
+@item TOL
+The variance inflation factor and its reciprocal. This has no effect unless COEFF is also given.
+@item DEFAULT
+The same as if R, COEFF, and ANOVA had been selected.
+This is what you get if the /STATISTICS command is not specified,
+or if it is specified without any parameters.
@end table
-The SAVE subcommand causes PSPP to save the residuals or predicted
+The @subcmd{ORIGIN} and @subcmd{NOORIGIN} subcommands are mutually
+exclusive. @subcmd{ORIGIN} indicates that the regression should be
+performed through the origin. You should use this option if, and
+only if you have reason to believe that the regression does indeed
+pass through the origin --- that is to say, the value @math{b_0} above,
+is zero. The default is @subcmd{NOORIGIN}.
+
+The @subcmd{SAVE} subcommand causes @pspp{} to save the residuals or predicted
values from the fitted
-model to the active file. PSPP will store the residuals in a variable
-called RES1 if no such variable exists, RES2 if RES1 already exists,
-RES3 if RES1 and RES2 already exist, etc. It will choose the name of
-the variable for the predicted values similarly, but with PRED as a
+model to the active dataset. @pspp{} will store the residuals in a variable
+called @samp{RES1} if no such variable exists, @samp{RES2} if @samp{RES1}
+already exists,
+@samp{RES3} if @samp{RES1} and @samp{RES2} already exist, etc. It will
+choose the name of
+the variable for the predicted values similarly, but with @samp{PRED} as a
prefix.
+When @subcmd{SAVE} is used, @pspp{} ignores @cmd{TEMPORARY}, treating
+temporary transformations as permanent.
@node Examples
@subsection Examples
-The following PSPP syntax will generate the default output and save the
-predicted values and residuals to the active file.
+The following @pspp{} syntax will generate the default output and save the
+predicted values and residuals to the active dataset.
@example
title 'Demonstrate REGRESSION procedure'.
b 6.200189 -18.58219
end data.
list.
-regression /variables=v0 v1 v2 /statistics defaults /dependent=v2
+regression /variables=v0 v1 v2 /statistics defaults /dependent=v2
/save pred resid /method=enter.
@end example
-@setfilename ignored