X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fregression.texi;h=127ec7164567bce7b2f2d2b6233d5e97ca526c03;hb=refs%2Fheads%2Flexer;hp=22b9f58ff664c196a574565aeac548983f452f1f;hpb=54bc9183a551c4249fb9eabc008beade0e751b78;p=pspp

diff --git a/doc/regression.texi b/doc/regression.texi
index 22b9f58ff6..127ec71645 100644
--- a/doc/regression.texi
+++ b/doc/regression.texi
@@ -1,68 +1,69 @@
-@node REGRESSION, , ONEWAY, Statistics
+@node REGRESSION
 @section REGRESSION
 
-The REGRESSION procedure fits linear models to data via least-squares
+@cindex regression
+@cindex linear regression
+The @cmd{REGRESSION} procedure fits linear models to data via least-squares
 estimation. The procedure is appropriate for data which satisfy those
 assumptions typical in linear regression:
 
 @itemize @bullet
-@item The data set contains n observations of a dependent variable, say
-Y_1,...,Y_n, and n observations of one or more explanatory
-variables. Let X_11, X_12, ..., X_1n denote the n observations of the
-first explanatory variable; X_21,...,X_2n denote the n observations of the
-second explanatory variable; X_k1,...,X_kn denote the n observations of the kth
-explanatory variable.
-
-@item The dependent variable Y has the following relationship to the
+@item The data set contains @math{n} observations of a dependent variable, say
+@math{Y_1,@dots{},Y_n}, and @math{n} observations of one or more explanatory
+variables.
+Let @math{X_{11}, X_{12}}, @dots{}, @math{X_{1n}} denote the @math{n} observations
+of the first explanatory variable;
+@math{X_{21}},@dots{},@math{X_{2n}} denote the @math{n} observations of the second
+explanatory variable;
+@math{X_{k1}},@dots{},@math{X_{kn}} denote the @math{n} observations of
+the @math{k}th explanatory variable.
+
+@item The dependent variable @math{Y} has the following relationship to the
 explanatory variables:
-@math{Y_i = b_0 + b_1 X_1i + ... + b_k X_ki + Z_i}
-where @math{b_0, b_1, ..., b_k} are unknown
-coefficients, and @math{Z_1,...,Z_n} are independent, normally
-distributed ``noise'' terms with common variance. The noise, or
-``error'' terms are unobserved. This relationship is called the
-``linear model.''
+@math{Y_i = b_0 + b_1 X_{1i} + ... + b_k X_{ki} + Z_i}
+where @math{b_0, b_1, @dots{}, b_k} are unknown
+coefficients, and @math{Z_1,@dots{},Z_n} are independent, normally
+distributed @dfn{noise} terms with mean zero and common variance.
+The noise, or @dfn{error} terms are unobserved.
+This relationship is called the @dfn{linear model}.
 @end itemize
 
-The REGRESSION procedure estimates the coefficients
-@math{b_0,...,b_k} and produces output relevant to inferences for the
+The @cmd{REGRESSION} procedure estimates the coefficients
+@math{b_0,@dots{},b_k} and produces output relevant to inferences for the
 linear model.
 
-@c If you add any new commands, then don't forget to remove the entry in
-@c not-implemented.texi
-
 @menu
 * Syntax::                      Syntax definition.
 * Examples::                    Using the REGRESSION procedure.
 @end menu
 
-@node Syntax, Examples, , REGRESSION
+@node Syntax
 @subsection Syntax
 
 @vindex REGRESSION
 @display
 REGRESSION
-        /VARIABLES=var_list
-        /DEPENDENT=var_list
+        /VARIABLES=@var{var_list}
+        /DEPENDENT=@var{var_list}
         /STATISTICS=@{ALL, DEFAULTS, R, COEFF, ANOVA, BCOV@}
-        /EXPORT ('file-name')
-        /SAVE
+        /SAVE=@{PRED, RESID@}
 @end display
 
-The @cmd{REGRESSION} procedure reads the active file and outputs
+The @cmd{REGRESSION} procedure reads the active dataset and outputs
 statistics relevant to the linear model specified by the user.
 
-The VARIABLES subcommand, which is required, specifies the list of
-variables to be analyzed. Keyword VARIABLES is required. The
-DEPENDENT subcommand specifies the dependent variable of the linear
-model. The DEPENDENT subcommond is required. All variables listed in
-the VARIABLES subcommand, but not listed in the DEPENDENT subcommand,
+The @subcmd{VARIABLES} subcommand, which is required, specifies the list of
+variables to be analyzed. Keyword @subcmd{VARIABLES} is required. The
+@subcmd{DEPENDENT} subcommand specifies the dependent variable of the linear
+model. The @subcmd{DEPENDENT} subcommand is required. All variables listed in
+the @subcmd{VARIABLES} subcommand, but not listed in the @subcmd{DEPENDENT} subcommand,
 are treated as explanatory variables in the linear model.
 
 All other subcommands are optional:
 
-The STATISTICS subcommand specifies the statistics to be displayed:
+The @subcmd{STATISTICS} subcommand specifies the statistics to be displayed:
 
-@table @code
+@table @subcmd
 @item ALL
 All of the statistics below.
 @item R
@@ -76,27 +77,20 @@ Analysis of variance table for the model.
 The covariance matrix for the estimated model coefficients.
 @end table
 
-The SAVE subcommand causes PSPP to save the residuals from the fitted
-model to the active file. PSPP will store the residuals in a variable
-called RES1 if no such variable exists, RES2 if RES1 already exists,
-RES3 if RES1 and RES2 already exist, etc.
-
-The EXPORT subcommand causes PSPP to write a C program containing
-functions related to the model. One such function accepts values of
-explanatory variables as arguments, and returns an estimate of the
-corresponding new
-value of the dependent variable. The generated program will also contain
-functions that return prediction and confidence intervals related to
-those new estimates. PSPP will write the program to the
-'file-name' given by the user, and write declarations of functions
-to a file called pspp_model_reg.h. The user can then compile the C
-program and use it as part of another program. This subcommand is a
-PSPP extension.
-
-@node Examples, , Syntax, REGRESSION
+The @subcmd{SAVE} subcommand causes @pspp{} to save the residuals or predicted
+values from the fitted
+model to the active dataset. @pspp{} will store the residuals in a variable
+called @samp{RES1} if no such variable exists, @samp{RES2} if @samp{RES1}
+already exists,
+@samp{RES3} if @samp{RES1} and @samp{RES2} already exist, etc. It will
+choose the name of
+the variable for the predicted values similarly, but with @samp{PRED} as a
+prefix.
+
+@node Examples
 @subsection Examples
-The following PSPP code will generate the default output, and save the
-linear model in a program called ``model.c.''
+The following @pspp{} syntax will generate the default output and save the
+predicted values and residuals to the active dataset.
 
 @example
 title 'Demonstrate REGRESSION procedure'.
@@ -114,19 +108,6 @@ a 8.838262 -29.25689
 b 6.200189 -18.58219
 end data.
 list.
-regression /variables=v0 v1 v2 /statistics defaults /dependent=v2 /export (model.c) /method=enter.
+regression /variables=v0 v1 v2 /statistics defaults /dependent=v2
+        /save pred resid /method=enter.
 @end example
-
-The file pspp_model_reg.h contains these declarations:
-
-@example
-double pspp_reg_estimate (const double *, const char *[]);
-double pspp_reg_variance (const double *var_vals, const char *[]);
-double pspp_reg_confidence_interval_U (const double *var_vals, const char *var_names[], double p);
-double pspp_reg_confidence_interval_L (const double *var_vals, const char *var_names[], double p);
-double pspp_reg_prediction_interval_U (const double *var_vals, const char *var_names[], double p);
-double pspp_reg_prediction_interval_L (const double *var_vals, const char *var_names[], double p);
-@end example
-
-The file model.c contains the definitions of the functions.
-@setfilename ignored
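
Note on the documented model: the linear model in the patched text, Y_i = b_0 + b_1 X_1i + ... + b_k X_ki + Z_i, can be illustrated outside PSPP. The following is a minimal Python sketch, not PSPP's implementation. The function name fit_linear_model and the sample data are hypothetical, and solving the normal equations by Gaussian elimination is an assumption made for brevity rather than the method PSPP is known to use. It returns the estimated coefficients along with the predicted values and residuals, which are the two quantities the SAVE subcommand is documented to store.

```python
def fit_linear_model(xs, y):
    """Least-squares fit of y_i = b_0 + b_1*x_1i + ... + b_k*x_ki + z_i.

    xs: list of k explanatory variables, each a list of n observations.
    y:  list of n observations of the dependent variable.
    Returns (coefficients, predicted, residuals).
    Hypothetical helper for illustration only; not part of PSPP.
    """
    n, k = len(y), len(xs)
    p = k + 1
    # Design-matrix rows [1, x_1i, ..., x_ki]; the leading 1 carries b_0.
    rows = [[1.0] + [float(x[i]) for x in xs] for i in range(n)]
    # Normal equations (X'X) b = X'y, stored as an augmented p x (p+1) matrix.
    a = [[sum(r[j] * r[m] for r in rows) for m in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))]
         for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution yields the coefficient estimates b_0..b_k.
    b = [0.0] * p
    for j in reversed(range(p)):
        s = sum(a[j][c] * b[c] for c in range(j + 1, p))
        b[j] = (a[j][p] - s) / a[j][j]
    predicted = [sum(bj * rj for bj, rj in zip(b, r)) for r in rows]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return b, predicted, residuals


# Made-up, noise-free data generated from y = 1 + 2*x1 - 3*x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * u - 3 * v for u, v in zip(x1, x2)]
b, pred, resid = fit_linear_model([x1, x2], y)
```

Because the sample data contain no noise term, the fitted coefficients should match the generating values up to rounding and the residuals should be essentially zero. With real data the residuals estimate the unobserved Z_i.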