X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fregression.texi;h=127ec7164567bce7b2f2d2b6233d5e97ca526c03;hb=refs%2Fheads%2Flexer;hp=2a3368539f4dd46895addb832842232ce9e07ff5;hpb=b648b71a8e7180c976aa66dfe05afcf152a1e19d;p=pspp

diff --git a/doc/regression.texi b/doc/regression.texi
index 2a3368539f..127ec71645 100644
--- a/doc/regression.texi
+++ b/doc/regression.texi
@@ -1,38 +1,37 @@
 @node REGRESSION
-@comment node-name, next, previous, up
 @section REGRESSION
 
 @cindex regression
 @cindex linear regression
-The REGRESSION procedure fits linear models to data via least-squares
+The @cmd{REGRESSION} procedure fits linear models to data via least-squares
 estimation. The procedure is appropriate for data which satisfy those
 assumptions typical in linear regression:
 
 @itemize @bullet
-@item The data set contains n observations of a dependent variable, say
-Y_1,@dots{},Y_n, and n observations of one or more explanatory
-variables. Let X_11, X_12, @dots{}, X_1n denote the n observations of the
-first explanatory variable; X_21,@dots{},X_2n denote the n observations of the
-second explanatory variable; X_k1,@dots{},X_kn denote the n observations of the kth
-explanatory variable.
-
-@item The dependent variable Y has the following relationship to the
+@item The data set contains @math{n} observations of a dependent variable, say
+@math{Y_1,@dots{},Y_n}, and @math{n} observations of one or more explanatory
+variables.
+Let @math{X_{11}, X_{12}}, @dots{}, @math{X_{1n}} denote the @math{n} observations
+of the first explanatory variable;
+@math{X_{21}},@dots{},@math{X_{2n}} denote the @math{n} observations of the second
+explanatory variable;
+@math{X_{k1}},@dots{},@math{X_{kn}} denote the @math{n} observations of
+the @math{k}th explanatory variable.
+
+@item The dependent variable @math{Y} has the following relationship to the
 explanatory variables:
 @math{Y_i = b_0 + b_1 X_{1i} + ... + b_k X_{ki} + Z_i}
 where @math{b_0, b_1, @dots{}, b_k} are unknown
 coefficients, and @math{Z_1,@dots{},Z_n} are independent, normally
-distributed ``noise'' terms with common variance. The noise, or
-``error'' terms are unobserved. This relationship is called the
-``linear model.''
+distributed @dfn{noise} terms with mean zero and common variance.
+The noise, or @dfn{error} terms are unobserved.
+This relationship is called the @dfn{linear model}.
 @end itemize
 
-The REGRESSION procedure estimates the coefficients
+The @cmd{REGRESSION} procedure estimates the coefficients
 @math{b_0,@dots{},b_k} and produces output relevant to inferences for the
 linear model.
 
-@c If you add any new commands, then don't forget to remove the entry in
-@c not-implemented.texi
-
 @menu
 * Syntax:: Syntax definition.
 * Examples:: Using the REGRESSION procedure.
@@ -44,27 +43,27 @@ linear model.
 @vindex REGRESSION
 @display
 REGRESSION
- /VARIABLES=var_list
- /DEPENDENT=var_list
+ /VARIABLES=@var{var_list}
+ /DEPENDENT=@var{var_list}
  /STATISTICS=@{ALL, DEFAULTS, R, COEFF, ANOVA, BCOV@}
  /SAVE=@{PRED, RESID@}
 @end display
 
-The @cmd{REGRESSION} procedure reads the active file and outputs
+The @cmd{REGRESSION} procedure reads the active dataset and outputs
 statistics relevant to the linear model specified by the user.
 
-The VARIABLES subcommand, which is required, specifies the list of
-variables to be analyzed. Keyword VARIABLES is required. The
-DEPENDENT subcommand specifies the dependent variable of the linear
-model. The DEPENDENT subcommand is required. All variables listed in
-the VARIABLES subcommand, but not listed in the DEPENDENT subcommand,
+The @subcmd{VARIABLES} subcommand, which is required, specifies the list of
+variables to be analyzed. Keyword @subcmd{VARIABLES} is required. The
+@subcmd{DEPENDENT} subcommand specifies the dependent variable of the linear
+model. The @subcmd{DEPENDENT} subcommand is required. All variables listed in
+the @subcmd{VARIABLES} subcommand, but not listed in the @subcmd{DEPENDENT} subcommand,
 are treated as explanatory variables in the linear model.
 
 All other subcommands are optional:
 
-The STATISTICS subcommand specifies the statistics to be displayed:
+The @subcmd{STATISTICS} subcommand specifies the statistics to be displayed:
 
-@table @code
+@table @subcmd
 @item ALL
 All of the statistics below.
 @item R
@@ -78,18 +77,20 @@ Analysis of variance table for the model.
 The covariance matrix for the estimated model coefficients.
 @end table
 
-The SAVE subcommand causes PSPP to save the residuals or predicted
+The @subcmd{SAVE} subcommand causes @pspp{} to save the residuals or predicted
 values from the fitted
-model to the active file. PSPP will store the residuals in a variable
-called RES1 if no such variable exists, RES2 if RES1 already exists,
-RES3 if RES1 and RES2 already exist, etc. It will choose the name of
-the variable for the predicted values similarly, but with PRED as a
+model to the active dataset. @pspp{} will store the residuals in a variable
+called @samp{RES1} if no such variable exists, @samp{RES2} if @samp{RES1}
+already exists,
+@samp{RES3} if @samp{RES1} and @samp{RES2} already exist, etc. It will
+choose the name of
+the variable for the predicted values similarly, but with @samp{PRED} as a
 prefix.
 
 @node Examples
 @subsection Examples
-The following PSPP syntax will generate the default output and save the
-predicted values and residuals to the active file.
+The following @pspp{} syntax will generate the default output and save the
+predicted values and residuals to the active dataset.
 
 @example
 title 'Demonstrate REGRESSION procedure'.
@@ -110,4 +111,3 @@ list.
 regression /variables=v0 v1 v2 /statistics defaults /dependent=v2
  /save pred resid /method=enter.
 @end example
-@setfilename ignored
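
Note on the /SAVE naming rule described in the patched text (RES1 if no such variable exists, otherwise RES2, RES3, and so on; likewise with the PRED prefix). The rule can be sketched as below. This is an illustrative Python sketch only, not PSPP's implementation: the function name next_free_name is hypothetical, and the case-insensitive comparison of variable names is an assumption.

```python
def next_free_name(prefix, existing_names):
    """Return prefix + smallest positive integer not already used as a name.

    Hypothetical helper illustrating the rule in the documentation:
    RES1 if free, else RES2, else RES3, and so on.
    Assumption: variable names are compared case-insensitively.
    """
    taken = {name.upper() for name in existing_names}
    n = 1
    # Advance the numeric suffix until the candidate name is unused.
    while f"{prefix}{n}".upper() in taken:
        n += 1
    return f"{prefix}{n}"


if __name__ == "__main__":
    variables = ["v0", "v1", "v2", "RES1"]
    print(next_free_name("RES", variables))   # residuals
    print(next_free_name("PRED", variables))  # predicted values
```

Under these assumptions, a dataset that already holds RES1 would get RES2 for new residuals, while the first predicted-value variable would still be PRED1, since the two prefixes are numbered independently.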