X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fregression.texi;h=d3d5a198f79cd92d80dc09d1a4d69e7cb7f4bf50;hb=refs%2Fheads%2Fctables7;hp=b47416b70191c1a1af0129ca073a36fa7a390056;hpb=b5c82cc9aabe7e641011130240ae1b2e84348e23;p=pspp diff --git a/doc/regression.texi b/doc/regression.texi index b47416b701..d3d5a198f7 100644 --- a/doc/regression.texi +++ b/doc/regression.texi @@ -1,37 +1,45 @@ +@c PSPP - a program for statistical analysis. +@c Copyright (C) 2017, 2020 Free Software Foundation, Inc. +@c Permission is granted to copy, distribute and/or modify this document +@c under the terms of the GNU Free Documentation License, Version 1.3 +@c or any later version published by the Free Software Foundation; +@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. +@c A copy of the license is included in the section entitled "GNU +@c Free Documentation License". +@c @node REGRESSION -@comment node-name, next, previous, up @section REGRESSION @cindex regression @cindex linear regression -The REGRESSION procedure fits linear models to data via least-squares +The @cmd{REGRESSION} procedure fits linear models to data via least-squares estimation. The procedure is appropriate for data which satisfy those assumptions typical in linear regression: @itemize @bullet @item The data set contains @math{n} observations of a dependent variable, say @math{Y_1,@dots{},Y_n}, and @math{n} observations of one or more explanatory -variables. Let @math{X_{11}, X_{12}}, @dots{}, @math{X_{1n}} denote the @math{n} observations of the -first explanatory variable; @math{X_{21}},@dots{},@math{X_{2n}} denote the @math{n} observations of the -second explanatory variable; @math{X_{k1}},@dots{},@math{X_{kn}} denote the @math{n} observations of the kth -explanatory variable. - -@item The dependent variable @math{Y} has the following relationship to the +variables. +Let @math{X_{11}, X_{12}}, @dots{}, @math{X_{1n}} denote the @math{n} observations +of the first explanatory variable; +@math{X_{21}},@dots{},@math{X_{2n}} denote the @math{n} observations of the second +explanatory variable; +@math{X_{k1}},@dots{},@math{X_{kn}} denote the @math{n} observations of +the @math{k}th explanatory variable. + +@item The dependent variable @math{Y} has the following relationship to the explanatory variables: -@math{Y_i = b_0 + b_1 X_{1i} + ... + b_k X_{ki} + Z_i} +@math{Y_i = b_0 + b_1 X_{1i} + ... + b_k X_{ki} + Z_i} where @math{b_0, b_1, @dots{}, b_k} are unknown coefficients, and @math{Z_1,@dots{},Z_n} are independent, normally -distributed ``noise'' terms with mean zero and common variance. The noise, or -``error'' terms are unobserved. This relationship is called the -``linear model.'' +distributed @dfn{noise} terms with mean zero and common variance. +The noise, or @dfn{error} terms are unobserved. +This relationship is called the @dfn{linear model}. @end itemize -The REGRESSION procedure estimates the coefficients +The @cmd{REGRESSION} procedure estimates the coefficients @math{b_0,@dots{},b_k} and produces output relevant to inferences for the -linear model. - -@c If you add any new commands, then don't forget to remove the entry in -@c not-implemented.texi +linear model. @menu * Syntax:: Syntax definition. @@ -44,27 +52,29 @@ linear model. @vindex REGRESSION @display REGRESSION - /VARIABLES=var_list - /DEPENDENT=var_list - /STATISTICS=@{ALL, DEFAULTS, R, COEFF, ANOVA, BCOV@} + /VARIABLES=@var{var_list} + /DEPENDENT=@var{var_list} + /STATISTICS=@{ALL, DEFAULTS, R, COEFF, ANOVA, BCOV, CI[@var{conf}, TOL]@} + @{ /ORIGIN | /NOORIGIN @} /SAVE=@{PRED, RESID@} @end display -The @cmd{REGRESSION} procedure reads the active file and outputs +The @cmd{REGRESSION} procedure reads the active dataset and outputs statistics relevant to the linear model specified by the user. -The VARIABLES subcommand, which is required, specifies the list of -variables to be analyzed. Keyword VARIABLES is required. The -DEPENDENT subcommand specifies the dependent variable of the linear -model. The DEPENDENT subcommand is required. All variables listed in -the VARIABLES subcommand, but not listed in the DEPENDENT subcommand, +The @subcmd{VARIABLES} subcommand, which is required, specifies the list of +variables to be analyzed. Keyword @subcmd{VARIABLES} is required. The +@subcmd{DEPENDENT} subcommand specifies the dependent variable of the linear +model. The @subcmd{DEPENDENT} subcommand is required. All variables listed in +the @subcmd{VARIABLES} subcommand, but not listed in the @subcmd{DEPENDENT} subcommand, are treated as explanatory variables in the linear model. All other subcommands are optional: -The STATISTICS subcommand specifies the statistics to be displayed: +The @subcmd{STATISTICS} subcommand specifies which statistics are to be displayed. +The following keywords are accepted: -@table @code +@table @subcmd @item ALL All of the statistics below. @item R @@ -72,24 +82,45 @@ The ratio of the sums of squares due to the model to the total sums of squares for the dependent variable. @item COEFF A table containing the estimated model coefficients and their standard errors. +@item CI (@var{conf}) +This item is only relevant if COEFF has also been selected. It specifies that the +confidence interval for the coefficients should be printed. The optional value @var{conf}, +which must be in parentheses, is the desired confidence level expressed as a percentage. @item ANOVA Analysis of variance table for the model. @item BCOV The covariance matrix for the estimated model coefficients. +@item TOL +The variance inflation factor and its reciprocal. This has no effect unless COEFF is also given. +@item DEFAULT +The same as if R, COEFF, and ANOVA had been selected. +This is what you get if the /STATISTICS command is not specified, +or if it is specified without any parameters. @end table -The SAVE subcommand causes PSPP to save the residuals or predicted +The @subcmd{ORIGIN} and @subcmd{NOORIGIN} subcommands are mutually +exclusive. @subcmd{ORIGIN} indicates that the regression should be +performed through the origin. You should use this option if, and +only if you have reason to believe that the regression does indeed +pass through the origin --- that is to say, the value @math{b_0} above, +is zero. The default is @subcmd{NOORIGIN}. + +The @subcmd{SAVE} subcommand causes @pspp{} to save the residuals or predicted values from the fitted -model to the active file. PSPP will store the residuals in a variable -called RES1 if no such variable exists, RES2 if RES1 already exists, -RES3 if RES1 and RES2 already exist, etc. It will choose the name of -the variable for the predicted values similarly, but with PRED as a +model to the active dataset. @pspp{} will store the residuals in a variable +called @samp{RES1} if no such variable exists, @samp{RES2} if @samp{RES1} +already exists, +@samp{RES3} if @samp{RES1} and @samp{RES2} already exist, etc. It will +choose the name of +the variable for the predicted values similarly, but with @samp{PRED} as a prefix. +When @subcmd{SAVE} is used, @pspp{} ignores @cmd{TEMPORARY}, treating +temporary transformations as permanent. @node Examples @subsection Examples -The following PSPP syntax will generate the default output and save the -predicted values and residuals to the active file. +The following @pspp{} syntax will generate the default output and save the +predicted values and residuals to the active dataset. @example title 'Demonstrate REGRESSION procedure'. @@ -107,6 +138,6 @@ a 8.838262 -29.25689 b 6.200189 -18.58219 end data. list. -regression /variables=v0 v1 v2 /statistics defaults /dependent=v2 +regression /variables=v0 v1 v2 /statistics defaults /dependent=v2 /save pred resid /method=enter. @end example