X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=9f1923b8e82ddad89e8a4c9183cf9bff3c345e5a;hb=5d9d2b2ae7d49240f3438e4f3c40ac1f276e31fb;hp=6014e670e4ed14c818c152c0589817443648b876;hpb=f500c9c2989d63465b9a93fe6f7e1600207681af;p=pspp
diff --git a/doc/statistics.texi b/doc/statistics.texi
index 6014e670e4..9f1923b8e8 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -10,9 +10,12 @@ far.
 @menu
 * DESCRIPTIVES:: Descriptive statistics.
 * FREQUENCIES:: Frequency tables.
+* EXAMINE:: Testing data for normality.
 * CROSSTABS:: Crosstabulation tables.
 * T-TEST:: Test hypotheses about means.
 * ONEWAY:: One way analysis of variance.
+* RANK:: Compute rank scores.
+* REGRESSION:: Linear regression.
 @end menu
 
 @node DESCRIPTIVES, FREQUENCIES, Statistics, Statistics
@@ -105,7 +108,7 @@ in the order that they are specified on the VARIABLES subcommand.
 The A and D settings request an ascending or descending sort order,
 respectively.
 
-@node FREQUENCIES, CROSSTABS, DESCRIPTIVES, Statistics
+@node FREQUENCIES, EXAMINE, DESCRIPTIVES, Statistics
 @section FREQUENCIES
 
 @vindex FREQUENCIES
@@ -212,7 +215,82 @@ boundaries of the data set divided into the specified number of ranges.
 For instance, @code{/NTILES=4} would cause quartiles to be reported.
 
 
-@node CROSSTABS, T-TEST, FREQUENCIES, Statistics
+@node EXAMINE, CROSSTABS, FREQUENCIES, Statistics
+@comment node-name, next, previous, up
+@section EXAMINE
+@vindex EXAMINE
+
+@cindex Normality, testing for
+
+@display
+EXAMINE
+        VARIABLES=var_list [BY factor_list]
+        /STATISTICS=@{DESCRIPTIVES, EXTREME[(n)], ALL, NONE@}
+        /PLOT=@{STEMLEAF, BOXPLOT, NPPLOT, SPREADLEVEL(n), HISTOGRAM,
+                ALL, NONE@}
+        /CINTERVAL n
+        /COMPARE=@{GROUPS,VARIABLES@}
+        /ID=@{case_number, var_name@}
+        /@{TOTAL,NOTOTAL@}
+        /PERCENTILES=[value_list]=@{HAVERAGE, WAVERAGE, ROUND, AEMPIRICAL, EMPIRICAL@}
+        /MISSING=@{LISTWISE, PAIRWISE@} [@{EXCLUDE, INCLUDE@}]
+                [@{NOREPORT,REPORT@}]
+
+@end display
+
+The @cmd{EXAMINE} command is used to test how close a distribution is to a
+normal distribution.  It also shows outliers and extreme values.
+
+The VARIABLES subcommand specifies the dependent variables and the
+independent variables to use as factors for the analysis.  Variables
+listed before the first BY keyword are the dependent variables.
+The dependent variables may optionally be followed by a list of
+factors which tell PSPP how to break down the analysis for each
+dependent variable.  The format for each factor is
+@display
+var [BY var]
+@end display
+
+
+The STATISTICS subcommand specifies the analysis to be done.
+DESCRIPTIVES will produce a table showing some parametric and
+non-parametric statistics.  EXTREME produces a table showing extreme
+values of the dependent variable.  A number in parentheses determines
+how many upper and lower extremes to show.  The default number is 5.
+
+
+The PLOT subcommand specifies which plots, if any, are to be produced.
+
+The COMPARE subcommand is only relevant if producing boxplots, and it is only
+useful if there is more than one dependent variable and at least one factor.  If
+/COMPARE=GROUPS is specified, then one plot per dependent variable is produced,
+containing boxplots for all the factors.
+If /COMPARE=VARIABLES is specified, then one plot per factor is produced,
+each containing one boxplot per dependent variable.
+If the /COMPARE subcommand is omitted, then PSPP uses the default value of
+/COMPARE=GROUPS.
+
+The CINTERVAL subcommand specifies the confidence level, as a percentage,
+of the confidence interval computed for DESCRIPTIVES.  The default is 95%.
+
+The PERCENTILES subcommand specifies which percentiles are to be calculated,
+and which algorithm to use for calculating them.  The default is to
+calculate the 5, 10, 25, 50, 75, 90, 95 percentiles using the
+HAVERAGE algorithm.
+
+The TOTAL and NOTOTAL subcommands are mutually exclusive.  If NOTOTAL
+is given and factors have been specified in the VARIABLES subcommand,
+then statistics for the unfactored dependent variables are produced
+in addition to those for the factored variables.  If no factors are
+specified, then TOTAL and NOTOTAL have no effect.
+
+@strong{Warning!}
+If many dependent variables are given, or factors are given for which
+there are many distinct values, then @cmd{EXAMINE} will produce a very
+large quantity of output.
+
+
+@node CROSSTABS, T-TEST, EXAMINE, Statistics
 @section CROSSTABS
 
 @vindex CROSSTABS
@@ -523,9 +601,9 @@ of variable preceding @code{WITH} against variable following @code{WITH}
 are generated.
 
 
-@node ONEWAY, , T-TEST, Statistics
+@node ONEWAY, RANK, T-TEST, Statistics
 @comment node-name, next, previous, up
-@section Oneway
+@section ONEWAY
 
 @vindex ONEWAY
 @cindex analysis of variance
@@ -572,3 +650,69 @@ display a warning, but will proceed with the analysis.
 The @code{CONTRASTS} subcommand may be given up to 10 times in order
 to specify different contrast tests.
 @setfilename ignored
+
+@node RANK, REGRESSION, ONEWAY, Statistics
+@comment node-name, next, previous, up
+@section RANK
+
+@vindex RANK
+@cindex RANK
+
+@display
+RANK
+        [VARIABLES=] var_list [@{A,D@}] [BY var_list]
+        /TIES=@{MEAN,LOW,HIGH,CONDENSE@}
+        /FRACTION=@{BLOM,TUKEY,VW,RANKIT@}
+        /PRINT[=@{YES,NO@}]
+        /MISSING=@{EXCLUDE,INCLUDE@}
+
+        /RANK [INTO var_list]
+        /NTILES(k) [INTO var_list]
+        /NORMAL [INTO var_list]
+        /PERCENT [INTO var_list]
+        /RFRACTION [INTO var_list]
+        /PROPORTION [INTO var_list]
+        /N [INTO var_list]
+        /SAVAGE [INTO var_list]
+@end display
+
+The @cmd{RANK} command ranks the values of variables and stores the
+results in new variables.
+
+The VARIABLES subcommand, which is mandatory, specifies one or
+more variables whose values are to be ranked.
+After each variable, @samp{A} or @samp{D} may appear, indicating that
+the variable is to be ranked in ascending or descending order.
+Ascending is the default.
+If a BY keyword appears, it should be followed by a list of variables
+which are to serve as grouping variables.
+In this case, the cases are gathered into groups, and ranks calculated
+for each group.
+
+The TIES subcommand specifies how tied values are to be treated.  The
+default, MEAN, assigns each tied case the mean of the ranks the tied cases occupy.
+
+The FRACTION subcommand specifies how proportional ranks are to be
+calculated.  This only has any effect if NORMAL or PROPORTION rank
+functions are requested.
+
+The PRINT subcommand may be used to specify that a summary of the rank
+variables created should appear in the output.
+
+The function subcommands are RANK, NTILES, NORMAL, PERCENT, RFRACTION,
+PROPORTION, N and SAVAGE.  Any number of function subcommands may appear.
+If none are given, then the default is RANK.
+The NTILES subcommand must take an integer specifying the number of
+partitions into which values should be ranked.
+Each function subcommand may be followed by the INTO keyword and a list
+of new variables to be created to receive the rank scores.  At most as
+many variables may be specified as there are variables named on the
+VARIABLES subcommand.  If fewer are specified, then names for the
+remaining variables are created automatically.
+
+The MISSING subcommand determines how user-missing values are to be
+treated.  A setting of EXCLUDE means that cases whose values are
+user-missing are to be excluded from the rank scores.  A setting of
+INCLUDE means they are to be included.  The default is EXCLUDE.
+
+@include regression.texi
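
The RANK TIES and FRACTION settings described above can be illustrated with a short Python sketch. It is not PSPP's implementation. The text above documents only the MEAN default for TIES. The meanings assumed here for LOW, HIGH and CONDENSE, and the BLOM, TUKEY, RANKIT and VW proportion formulas, follow the usual SPSS-compatible definitions and should be checked against PSPP's actual behavior. The helper names rank_with_ties, proportion and normal_scores are hypothetical.

```python
from statistics import NormalDist


def rank_with_ties(values, ties="MEAN"):
    """Return one ascending rank per value, resolving ties per the TIES setting.

    Assumed meanings: MEAN = average of the ranks a tied group spans,
    LOW = lowest of those ranks, HIGH = highest, CONDENSE = dense ranking
    (distinct values receive consecutive ranks 1, 2, 3, ...).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0    # position in `order` of the first case in the current tie group
    dense = 0  # running rank used by CONDENSE
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value equals this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        low, high = pos + 1, end + 1  # 1-based ranks spanned by the group
        dense += 1
        if ties == "MEAN":
            r = (low + high) / 2
        elif ties == "LOW":
            r = low
        elif ties == "HIGH":
            r = high
        elif ties == "CONDENSE":
            r = dense
        else:
            raise ValueError(f"unknown TIES setting: {ties}")
        for i in order[pos:end + 1]:
            ranks[i] = float(r)
        pos = end + 1
    return ranks


# Assumed proportion estimates for FRACTION: r is a rank, w the number of cases.
FRACTIONS = {
    "BLOM": lambda r, w: (r - 3 / 8) / (w + 1 / 4),
    "TUKEY": lambda r, w: (r - 1 / 3) / (w + 1 / 3),
    "RANKIT": lambda r, w: (r - 1 / 2) / w,
    "VW": lambda r, w: r / (w + 1),
}


def proportion(values, fraction="BLOM", ties="MEAN"):
    """PROPORTION-style scores: the FRACTION formula applied to each rank."""
    w = len(values)
    f = FRACTIONS[fraction]
    return [f(r, w) for r in rank_with_ties(values, ties)]


def normal_scores(values, fraction="BLOM", ties="MEAN"):
    """NORMAL-style scores: inverse standard normal CDF of each proportion."""
    std = NormalDist()
    return [std.inv_cdf(p) for p in proportion(values, fraction, ties)]
```

For example, the values 10, 20, 20, 30 receive ranks 1, 2.5, 2.5 and 4 under TIES=MEAN, and 1, 2, 2 and 3 under TIES=CONDENSE. With FRACTION=VW, the MEAN-based proportions are 0.2, 0.5, 0.5 and 0.8.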