X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=03e4a44e4c387f25196b6da63b187c3a9570d7d4;hb=3eb4b3858bc911dd79e3da5fdcb96151eeaecd4d;hp=35c47eea2c9b1a5152678c4b4c15638c7c655d28;hpb=01056c26493652935e9ad48f6c5da5d84473d885;p=pspp diff --git a/doc/statistics.texi b/doc/statistics.texi index 35c47eea2c..03e4a44e4c 100644 --- a/doc/statistics.texi +++ b/doc/statistics.texi @@ -10,7 +10,8 @@ far. * EXAMINE:: Testing data for normality. * CORRELATIONS:: Correlation tables. * CROSSTABS:: Crosstabulation tables. -* FACTOR:: Factor analysis and Principal Components analysis +* FACTOR:: Factor analysis and Principal Components analysis. +* LOGISTIC REGRESSION:: Bivariate Logistic Regression. * MEANS:: Average values and other statistics. * NPAR TESTS:: Nonparametric tests. * T-TEST:: Test hypotheses about means. @@ -72,6 +73,8 @@ names ZSC000 through ZSC999, STDZ00 through STDZ09, ZZZZ00 through ZZZZ09, ZQZQ00 through ZQZQ09, in that sequence. In addition, Z score variable names can be specified explicitly on @subcmd{VARIABLES} in the variable list by enclosing them in parentheses after each variable. +When Z scores are calculated, @pspp{} ignores @cmd{TEMPORARY}, +treating temporary transformations as permanent. The @subcmd{STATISTICS} subcommand specifies the statistics to be displayed: @@ -702,13 +705,23 @@ performed, and all coefficients will be printed. The @subcmd{/CRITERIA} subcommand is used to specify how the number of extracted factors (components) are chosen. If @subcmd{FACTORS(@var{n})} is specified, where @var{n} is an integer, then @var{n} factors will be extracted. Otherwise, the @subcmd{MINEIGEN} setting will -be used. @subcmd{MINEIGEN(@var{l})} requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted. -The default value of @var{l} is 1. 
The @subcmd{ECONVERGE} and @subcmd{ITERATE} settings have effect only when iterative algorithms for factor -extraction (such as Principal Axis Factoring) are used. @subcmd{ECONVERGE(@var{delta})} specifies that +be used. +@subcmd{MINEIGEN(@var{l})} requests that all factors whose eigenvalues are greater than or equal to @var{l} are extracted. +The default value of @var{l} is 1. +The @subcmd{ECONVERGE} setting has effect only when iterative algorithms for factor +extraction (such as Principal Axis Factoring) are used. +@subcmd{ECONVERGE(@var{delta})} specifies that iteration should cease when the maximum absolute value of the communality estimate between one iteration and the previous is less than @var{delta}. The default value of @var{delta} is 0.001. -The @subcmd{ITERATE(@var{m})} setting sets the maximum number of iterations to @var{m}. The default value of @var{m} is 25. +The @subcmd{ITERATE(@var{m})} setting may appear any number of times and serves two different purposes. +It sets the maximum number of iterations (@var{m}) for convergence and also the maximum number of iterations +for rotation. +Whether it affects convergence or rotation depends upon which subcommand follows the @subcmd{ITERATE} subcommand. +If @subcmd{EXTRACTION} follows, it affects convergence. +If @subcmd{ROTATION} follows, it affects rotation. +If neither @subcmd{ROTATION} nor @subcmd{EXTRACTION} follows an @subcmd{ITERATE} subcommand, it is ignored. +The default value of @var{m} is 25. The @cmd{MISSING} subcommand determines the handling of missing variables. If @subcmd{INCLUDE} is set, then user-missing values are included in the @@ -723,6 +736,92 @@ If @subcmd{PAIRWISE} is set, then a case is considered missing only if either of values for the particular coefficient are missing. The default is @subcmd{LISTWISE}. 
+@node LOGISTIC REGRESSION +@section LOGISTIC REGRESSION + +@vindex LOGISTIC REGRESSION +@cindex logistic regression +@cindex bivariate logistic regression + +@display +LOGISTIC REGRESSION [VARIABLES =] @var{dependent_var} WITH @var{predictors} + + [/CATEGORICAL = @var{categorical_predictors}] + + [@{/NOCONST | /ORIGIN | /NOORIGIN @}] + + [/PRINT = [SUMMARY] [DEFAULT] [CI(@var{confidence})] [ALL]] + + [/CRITERIA = [BCON(@var{min_delta})] [ITERATE(@var{max_iterations})] + [LCON(@var{min_likelihood_delta})] [EPS(@var{min_epsilon})] + [CUT(@var{cut_point})]] + + [/MISSING = @{INCLUDE|EXCLUDE@}] +@end display + +Bivariate Logistic Regression is used when you want to explain a dichotomous dependent +variable in terms of one or more predictor variables. + +The minimum command is +@example +LOGISTIC REGRESSION @var{y} WITH @var{x1} @var{x2} @dots{} @var{xn}. +@end example +Here, @var{y} is the dependent variable, which must be dichotomous, and @var{x1} @dots{} @var{xn} +are the predictor variables whose coefficients the procedure estimates. + +By default, a constant term is included in the model. +Hence, the full model is +@math{ +{\bf y} += b_0 + b_1 {\bf x_1} ++ b_2 {\bf x_2} ++ \dots ++ b_n {\bf x_n} +} + +Predictor variables which are categorical in nature should be listed on the @subcmd{/CATEGORICAL} subcommand. +Simple variables as well as interactions between variables may be listed here. + +If you want a model without the constant term @math{b_0}, use the keyword @subcmd{/ORIGIN}. +@subcmd{/NOCONST} is a synonym for @subcmd{/ORIGIN}. + +An iterative Newton-Raphson procedure is used to fit the model. +The @subcmd{/CRITERIA} subcommand is used to specify the stopping criteria of the procedure, +and other parameters. +The value of @var{cut_point} is used in the classification table. It is the +threshold above which predicted values are considered to be 1. Values +of @var{cut_point} must lie in the range [0,1]. 
+During iterations, if any one of the stopping criteria is satisfied, the procedure is +considered complete. +The stopping criteria are: +@itemize +@item The number of iterations exceeds @var{max_iterations}. + The default value of @var{max_iterations} is 20. +@item The change in all coefficient estimates is less than @var{min_delta}. +The default value of @var{min_delta} is 0.001. +@item The magnitude of change in the likelihood estimate is less than @var{min_likelihood_delta}. +The default value of @var{min_likelihood_delta} is zero. +This means that this criterion is disabled. +@item The differential of the estimated probability for all cases is less than @var{min_epsilon}. +In other words, the probabilities are close to zero or one. +The default value of @var{min_epsilon} is 0.00000001. +@end itemize + + +The @subcmd{PRINT} subcommand controls the display of optional statistics. +Currently there is one such option, @subcmd{CI}, which indicates that the +confidence interval of the odds ratio should be displayed as well as its value. +@subcmd{CI} should be followed by an integer in parentheses to indicate the +confidence level of the desired confidence interval. + +The @subcmd{MISSING} subcommand determines the handling of missing +values. +If @subcmd{INCLUDE} is set, then user-missing values are included in the +calculations, but system-missing values are not. +If @subcmd{EXCLUDE} is set, user-missing +values are excluded as well as system-missing values. +This is the default. + @node MEANS @section MEANS