X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fstatistics.texi;h=9cc49557ea2ff288b4abd32e57580bc1f68ae115;hb=47f9412c378ef8e0bcce43566f72caa3b856580b;hp=82d853568f6df985434f45c135dbedfaed3129b7;hpb=a5d6129e0fa3ae298b5f71613509c68441b7d04b;p=pspp diff --git a/doc/statistics.texi b/doc/statistics.texi index 82d853568f..9cc49557ea 100644 --- a/doc/statistics.texi +++ b/doc/statistics.texi @@ -8,6 +8,7 @@ far. * DESCRIPTIVES:: Descriptive statistics. * FREQUENCIES:: Frequency tables. * EXAMINE:: Testing data for normality. +* GRAPH:: Plot data. * CORRELATIONS:: Correlation tables. * CROSSTABS:: Crosstabulation tables. * FACTOR:: Factor analysis and Principal Components analysis. @@ -194,9 +195,13 @@ For instance, @subcmd{/NTILES=4} would cause quartiles to be reported. The @subcmd{HISTOGRAM} subcommand causes the output to include a histogram for each specified numeric variable. The X axis by default ranges from the minimum to the maximum value observed in the data, but the @subcmd{MINIMUM} -and @subcmd{MAXIMUM} keywords can set an explicit range. Specify @subcmd{NORMAL} to -superimpose a normal curve on the histogram. Histograms are not -created for string variables. +and @subcmd{MAXIMUM} keywords can set an explicit range. The width of +each bin is 2IQR(x)n^-1/3, according to the Freedman-Diaconis rule. (Note that +@cmd{EXAMINE} uses a different algorithm to determine bin sizes.) +Histograms are not created for string variables. + +Specify @subcmd{NORMAL} to superimpose a normal curve on the +histogram. @cindex piechart The @subcmd{PIECHART} subcommand adds a pie chart for each variable to the data. Each @@ -288,6 +293,10 @@ normal distribution, whilst the spread vs.@: level plot can be useful to visuali how the variance of differs between factors. Boxplots will also show you the outliers and extreme values. +@subcmd{HISTOGRAM} uses Sturges' rule to determine the number of +bins, as approximately 1 + log2(n). 
(Note that @cmd{FREQUENCIES} uses a +different algorithm to find the bin size.) + The @subcmd{SPREADLEVEL} plot displays the interquartile range versus the median. It takes an optional parameter @var{t}, which specifies how the data should be transformed prior to plotting. @@ -375,6 +384,52 @@ specified for which there are many distinct values, then @cmd{EXAMINE} will produce a very large quantity of output. +@node GRAPH +@section GRAPH + +@vindex GRAPH +@cindex Exploratory data analysis +@cindex normality, testing + +@display +GRAPH + /HISTOGRAM = @var{var} + /SCATTERPLOT [(BIVARIATE)] = @var{var1} WITH @var{var2} [BY @var{var3}] + [ /MISSING=@{LISTWISE, VARIABLE@} [@{EXCLUDE, INCLUDE@}] ] + [@{NOREPORT,REPORT@}] + +@end display + +The @cmd{GRAPH} command produces graphical plots of data. Only one of the subcommands +@subcmd{HISTOGRAM} or @subcmd{SCATTERPLOT} can be specified, i.e.@: only one plot +can be produced per call of @cmd{GRAPH}. The @subcmd{MISSING} subcommand is optional. + +@cindex scatterplot + +The subcommand @subcmd{SCATTERPLOT} produces an xy plot of the data. The different +values of the optional third variable @var{var3} will result in different colours and/or +markers for the plot. The following example produces a scatterplot. + +@example +GRAPH + /SCATTERPLOT = @var{height} WITH @var{weight} BY @var{gender}. +@end example + +This example will produce a scatterplot where height is plotted versus weight. The colour of each +data point depends on the value of the gender variable. With +this plot it is possible to analyze gender differences in the relation between height and weight. + +@cindex histogram + +The subcommand @subcmd{HISTOGRAM} produces a histogram. Only one variable is allowed for +the histogram plot. @xref{EXAMINE}, for an alternative method of producing histograms. The +following example produces a histogram plot for the variable weight. + +@example +GRAPH + /HISTOGRAM = @var{weight}. 
+@end example + @node CORRELATIONS @section CORRELATIONS @@ -500,7 +555,7 @@ The @subcmd{FORMAT} subcommand controls the characteristics of the crosstabulation tables to be displayed. It has a number of possible settings: -@itemize @asis +@itemize @w{} @item @subcmd{TABLES}, the default, causes crosstabulation tables to be output. @subcmd{NOTABLES} suppresses them. @@ -603,7 +658,7 @@ following bugs: @itemize @bullet @item -Significance of symmetric and directional measures is not calculated. +Significance of some symmetric and directional measures is not calculated. @item Asymptotic standard error is not calculated for Goodman and Kruskal's tau or symmetric Somers' d.