@cindex linear regression
The @cmd{REGRESSION} procedure fits linear models to data via least-squares
estimation. The procedure is appropriate for data that satisfy the
assumptions typical of linear regression:

@itemize @bullet
@item The data set contains @math{n} observations of a dependent variable, say
@math{Y_1,@dots{},Y_n}, and @math{n} observations of one or more explanatory
variables.
Let @math{X_{11},@dots{},X_{1n}} denote the @math{n} observations
of the first explanatory variable,
@math{X_{21},@dots{},X_{2n}} the @math{n} observations of the second
explanatory variable, and so on, up to
@math{X_{k1},@dots{},X_{kn}}, the @math{n} observations of
the @math{k}th explanatory variable.

@item The dependent variable @math{Y} has the following relationship to the
explanatory variables:
@math{Y_i = b_0 + b_1 X_{1i} + @dots{} + b_k X_{ki} + Z_i},
where @math{b_0, b_1, @dots{}, b_k} are unknown
coefficients, and @math{Z_1,@dots{},Z_n} are independent, normally
distributed @dfn{noise} terms with mean zero and common variance.
The noise, or @dfn{error}, terms are unobserved.
This relationship is called the @dfn{linear model}.
@end itemize
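
The coefficient estimates are the values of @math{b_0,@dots{},b_k} that minimize the sum, over @math{i}, of the squared residuals @math{(Y_i - b_0 - b_1 X_{1i} - @dots{} - b_k X_{ki})^2}. The Python sketch below computes them by solving the normal equations @math{X^T X b = X^T Y}, where @math{X} is the data matrix with a leading column of ones for @math{b_0}. It is an illustration only, not necessarily how @pspp{} computes its estimates, and the functions @code{solve} and @code{fit_least_squares} are hypothetical names.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists are untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        # A crude absolute tolerance for treating the pivot as zero.
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(xs, y):
    """Return [b_0, b_1, ..., b_k] minimizing the sum of squared residuals.

    xs holds k lists, one per explanatory variable, each with n values.
    """
    n = len(y)
    # One row per observation: 1 for the intercept, then X_1i, ..., X_ki.
    design = [[1.0] + [col[i] for col in xs] for i in range(n)]
    p = len(design[0])
    # Normal equations (X^T X) b = X^T y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


coefficients = fit_least_squares([[1.0, 2.0, 3.0, 4.0]], [1.0, 3.0, 2.0, 4.0])
print(coefficients)  # approximately [0.5, 0.8]
```

Forming @math{X^T X} squares the condition number of the problem, so numerically careful software usually prefers an orthogonal decomposition such as QR; the normal equations are used here only because they make the least-squares conditions explicit.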

The @cmd{REGRESSION} procedure estimates the coefficients
@math{b_0,@dots{},b_k} and produces output relevant to inferences for the
linear model.

* Syntax:: Syntax definition.
* Examples:: Using the REGRESSION procedure.

@display
REGRESSION
        /VARIABLES=@var{var_list}
        /DEPENDENT=@var{var_list}
        /STATISTICS=@{ALL, DEFAULTS, R, COEFF, ANOVA, BCOV, CI[(@var{conf})]@}
        /SAVE=@{PRED, RESID@}
@end display

The @cmd{REGRESSION} procedure reads the active dataset and outputs
statistics relevant to the linear model specified by the user.

The @subcmd{VARIABLES} subcommand, which is required, specifies the list of
variables to be analyzed. The @subcmd{DEPENDENT} subcommand, which is also
required, specifies the dependent variable of the linear model. All variables
listed in the @subcmd{VARIABLES} subcommand but not in the @subcmd{DEPENDENT}
subcommand are treated as explanatory variables in the linear model.
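
In other words, the explanatory variables are the @subcmd{VARIABLES} list with the @subcmd{DEPENDENT} variables removed. The Python sketch below expresses that rule. The helper @code{explanatory_variables} is hypothetical, keeping the listed order is an assumption of the sketch, and names are compared case-insensitively on the assumption that @pspp{} variable names are not case-sensitive.

```python
def explanatory_variables(variables, dependent):
    """Names from VARIABLES that are not named in DEPENDENT, in listed order."""
    # Compare upper-cased names: variable names are assumed case-insensitive.
    dependent_names = set(name.upper() for name in dependent)
    return [name for name in variables if name.upper() not in dependent_names]


print(explanatory_variables(["x1", "x2", "y"], ["y"]))  # ['x1', 'x2']
```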

All other subcommands are optional.

The @subcmd{STATISTICS} subcommand specifies which statistics are to be displayed.
The following keywords are accepted:

@table @asis
@item @subcmd{ALL}
All of the statistics below.
@item @subcmd{R}
The ratio of the sum of squares due to the model to the total sum of
squares for the dependent variable, that is, the coefficient of
determination @math{R^2}.
@item @subcmd{COEFF}
A table containing the estimated model coefficients and their standard errors.
@item @subcmd{CI} (@var{conf})
This item is only relevant if @subcmd{COEFF} has also been selected. It specifies that
confidence intervals for the coefficients should be printed. The optional value @var{conf},
which must be in parentheses, is the desired confidence level expressed as a percentage.
@item @subcmd{ANOVA}
Analysis of variance table for the model.
@item @subcmd{BCOV}
The covariance matrix for the estimated model coefficients.
@item @subcmd{DEFAULTS}
The same as if @subcmd{R}, @subcmd{COEFF}, and @subcmd{ANOVA} had been selected.
This is what you get if the @subcmd{STATISTICS} subcommand is not specified,
or if it is specified without any parameters.
@end table
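
The @subcmd{R} and @subcmd{ANOVA} statistics rest on the same three sums of squares. The Python sketch below, whose function names are hypothetical, computes them from the observed values of the dependent variable and the fitted values of a model. For a least-squares fit that includes the intercept @math{b_0}, the total sum of squares equals the model sum of squares plus the residual sum of squares, which is the partition an analysis of variance table reports.

```python
def sums_of_squares(y, fitted):
    """Return (model SS, residual SS, total SS) for observed y and fitted values."""
    mean_y = sum(y) / len(y)
    # Variation of the fitted values around the mean of y.
    ss_model = sum((f - mean_y) ** 2 for f in fitted)
    # Variation of y left unexplained by the fit.
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    # Total variation of y around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return ss_model, ss_resid, ss_total


def r_squared(y, fitted):
    """Ratio of the model sum of squares to the total sum of squares."""
    ss_model, _, ss_total = sums_of_squares(y, fitted)
    return ss_model / ss_total


y = [1.0, 3.0, 2.0, 4.0]
fitted = [1.3, 2.1, 2.9, 3.7]  # least-squares line 0.5 + 0.8 x for x = 1, 2, 3, 4
print(sums_of_squares(y, fitted))  # approximately (3.2, 1.8, 5.0)
print(r_squared(y, fitted))        # approximately 0.64
```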

The @subcmd{SAVE} subcommand causes @pspp{} to save the residuals
(keyword @subcmd{RESID}) or predicted values (keyword @subcmd{PRED}) from
the fitted model to the active dataset. @pspp{} stores the residuals in a
variable called @samp{RES1} if no such variable exists, @samp{RES2} if
@samp{RES1} already exists, @samp{RES3} if @samp{RES1} and @samp{RES2}
already exist, and so on. It names the variable for the predicted values
similarly, but with @samp{PRED} as the prefix.
When @subcmd{SAVE} is used, @pspp{} ignores @cmd{TEMPORARY}, treating
temporary transformations as permanent.
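
The naming rule for saved variables can be read as: append the smallest positive integer that gives a name not already in the active dataset. The Python sketch below implements that reading with a hypothetical helper, @code{next_free_name}. Two details are assumptions of the sketch: that a lower free suffix is reused even when a higher one is taken, and that names are compared case-insensitively.

```python
def next_free_name(prefix, existing):
    """Return prefix + "1", prefix + "2", ..., whichever is first not in existing."""
    # Assumption: names are compared case-insensitively.
    taken = set(name.upper() for name in existing)
    suffix = 1
    while (prefix + str(suffix)).upper() in taken:
        suffix += 1
    return prefix + str(suffix)


dataset = ["v0", "v1", "v2", "RES1"]
print(next_free_name("RES", dataset))   # RES2
print(next_free_name("PRED", dataset))  # PRED1
```

Under this reading, running a command with @samp{/SAVE PRED RESID} twice would create @samp{RES1} and @samp{PRED1} the first time and @samp{RES2} and @samp{PRED2} the second.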

The following @pspp{} syntax generates the default output and saves the
predicted values and residuals to the active dataset.

@example
title 'Demonstrate REGRESSION procedure'.
data list / v0 1-2 (A) v1 v2 3-22 (10).
regression /variables=v0 v1 v2 /statistics defaults /dependent=v2
            /save pred resid /method=enter.
@end example