From: Daniel Schlieper
Date: Sun, 7 Dec 2014 17:31:34 +0000 (-0800)
Subject: doc: Describe how bin sizes are chosen for histograms.
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?p=pspp;a=commitdiff_plain;h=af3d2be249020c55e0e90925ff3b21fc70501bac

doc: Describe how bin sizes are chosen for histograms.
---

diff --git a/doc/statistics.texi b/doc/statistics.texi
index 7a880d15ab..9cc49557ea 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -195,9 +195,13 @@ For instance, @subcmd{/NTILES=4} would cause quartiles to be reported.
 The @subcmd{HISTOGRAM} subcommand causes the output to include a histogram for
 each specified numeric variable. The X axis by default ranges from the
 minimum to the maximum value observed in the data, but the @subcmd{MINIMUM}
-and @subcmd{MAXIMUM} keywords can set an explicit range. Specify @subcmd{NORMAL} to
-superimpose a normal curve on the histogram. Histograms are not
-created for string variables.
+and @subcmd{MAXIMUM} keywords can set an explicit range. The number of
+bins are 2IQR(x)n^-1/3 according to the Freedman-Diaconis rule. (Note that
+@cmd{EXAMINE} uses a different algorithm to determine bin sizes.)
+Histograms are not created for string variables.
+
+Specify @subcmd{NORMAL} to superimpose a normal curve on the
+histogram.
 
 @cindex piechart
 The @subcmd{PIECHART} subcommand adds a pie chart for each variable to the data. Each
@@ -289,6 +293,10 @@ normal distribution, whilst the spread vs.@: level plot can be useful to visuali
 how the variance of differs between factors. Boxplots will also show you the
 outliers and extreme values.
 
+@subcmd{HISTOGRAM} uses Sturges' rule to determine the number of
+bins, as approximately 1 + log2(n). (Note that @cmd{FREQUENCIES} uses a
+different algorithm to find the bin size.)
+
 The @subcmd{SPREADLEVEL} plot displays the interquartile range versus the
 median. It takes an optional parameter @var{t}, which specifies how the data
 should be transformed prior to plotting.
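
Note on the two rules named in this patch: the expression 2 IQR(x) n^(-1/3) is conventionally the Freedman-Diaconis bin *width*, and a bin count follows by dividing the data range by that width. Sturges' rule gives a bin count directly, as 1 + log2(n). The sketch below illustrates both in Python. It is not taken from the PSPP sources: the function names are hypothetical, and the quartile method and the rounding (ceiling) are assumptions that PSPP may handle differently.

```python
import math
from statistics import quantiles


def freedman_diaconis_width(data):
    """Bin width h = 2 * IQR(x) * n^(-1/3) (Freedman-Diaconis rule).

    Assumption: quartiles come from statistics.quantiles with its default
    'exclusive' method; other quartile definitions give slightly
    different widths.
    """
    n = len(data)
    q1, _median, q3 = quantiles(data, n=4)
    return 2.0 * (q3 - q1) * n ** (-1.0 / 3.0)


def freedman_diaconis_bins(data):
    """Bin count obtained by dividing the data range by the bin width."""
    h = freedman_diaconis_width(data)
    if h <= 0:
        # Degenerate case (IQR of zero): fall back to a single bin.
        return 1
    return max(1, math.ceil((max(data) - min(data)) / h))


def sturges_bins(n):
    """Bin count k = 1 + ceil(log2(n)) (Sturges' rule); rounding is assumed."""
    return 1 + math.ceil(math.log2(n))


if __name__ == "__main__":
    sample = list(range(1, 101))
    print(freedman_diaconis_width(sample))
    print(freedman_diaconis_bins(sample))
    print(sturges_bins(len(sample)))
```

Because the Freedman-Diaconis width depends on the spread of the data while Sturges' count depends only on n, the two rules can yield different bin counts for the same variable, which is consistent with the cross-references between @cmd{FREQUENCIES} and @cmd{EXAMINE} in the patch.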