From a5d5cfde1dec0777a4d5c8f554027324be9dba57 Mon Sep 17 00:00:00 2001
From: John Darrington
Date: Mon, 9 Feb 2015 07:37:36 +0100
Subject: [PATCH] Documentation: Tidy up notes about histogram bin widths

---
 doc/statistics.texi | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/doc/statistics.texi b/doc/statistics.texi
index 7977e3387c..796fde91d3 100644
--- a/doc/statistics.texi
+++ b/doc/statistics.texi
@@ -197,9 +197,12 @@ For instance, @subcmd{/NTILES=4} would cause quartiles to be reported.
 The @subcmd{HISTOGRAM} subcommand causes the output to include a histogram for
 each specified numeric variable. The X axis by default ranges from the
 minimum to the maximum value observed in the data, but the @subcmd{MINIMUM}
-and @subcmd{MAXIMUM} keywords can set an explicit range. The number of
-bins are 2IQR(x)n^-1/3 according to the Freedman-Diaconis rule. (Note that
-@cmd{EXAMINE} uses a different algorithm to determine bin sizes.)
+and @subcmd{MAXIMUM} keywords can set an explicit range.
+@footnote{The number of
+bins is chosen according to the Freedman-Diaconis rule:
+@math{2 \times IQR(x)n^{-1/3}}, where @math{IQR(x)} is the interquartile range of @math{x}
+and @math{n} is the number of samples. Note that
+@cmd{EXAMINE} uses a different algorithm to determine bin sizes.}
 Histograms are not created for string variables.
 
 Specify @subcmd{NORMAL} to superimpose a normal curve on the
@@ -306,10 +309,9 @@ The first three can be used to visualise how closely each cell conforms to a
 normal distribution, whilst the spread vs.@: level plot can be useful to visualise
 how the variance of differs between factors.
 Boxplots will also show you the outliers and extreme values.
-
-@subcmd{HISTOGRAM} uses Sturges' rule to determine the number of
-bins, as approximately 1 + log2(n). (Note that @cmd{FREQUENCIES} uses a
-different algorithm to find the bin size.)
+@footnote{@subcmd{HISTOGRAM} uses Sturges' rule to determine the number of
+bins, as approximately @math{1 + \log2(n)}, where @math{n} is the number of samples.
+Note that @cmd{FREQUENCIES} uses a different algorithm to find the bin size.}
 
 The @subcmd{SPREADLEVEL} plot displays the interquartile range versus the
 median. It takes an optional parameter @var{t}, which specifies how the data
-- 
2.30.2
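
For reference, the two rules named in the footnotes above can be sketched in Python. This is only an illustration of the textbook formulas, not the code PSPP runs. The function names are hypothetical, the quartiles come from Python's `statistics.quantiles` (PSPP may compute them differently), and the step that turns the Freedman-Diaconis value into a bin count is an assumption: the expression 2 IQR(x) n^(-1/3) is conventionally read as a bin width, which is then divided into the data range.

```python
import math
import statistics


def freedman_diaconis_width(xs):
    """Textbook Freedman-Diaconis value: 2 * IQR(x) * n^(-1/3)."""
    n = len(xs)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return 2 * (q3 - q1) * n ** (-1 / 3)


def freedman_diaconis_bins(xs):
    """Assumed conversion from width to bin count: ceil(range / width)."""
    width = freedman_diaconis_width(xs)
    if width == 0:
        return 1  # degenerate case: the interquartile range is zero
    return max(1, math.ceil((max(xs) - min(xs)) / width))


def sturges_bins(n):
    """Sturges' rule, 1 + log2(n), rounded up to a whole number of bins."""
    return math.ceil(1 + math.log2(n))


if __name__ == "__main__":
    data = [float(i) for i in range(1, 101)]
    print(freedman_diaconis_width(data))
    print(freedman_diaconis_bins(data))
    print(sturges_bins(len(data)))
```

The two rules generally give different bin counts for the same data, which is consistent with the footnotes' remark that `FREQUENCIES` and `EXAMINE` do not use the same algorithm.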