X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdata-selection.texi;h=fbec76af7a3589f75a53300c171de985a24c8de7;hb=3fa740d165a0c2ef7c03b11633f65c421f07f0a2;hp=a8cb12d95364d95c0525ec06974cac9c7981fdfc;hpb=1b1837591924226078c96db15888b68beec2ef6d;p=pspp
diff --git a/doc/data-selection.texi b/doc/data-selection.texi
index a8cb12d953..fbec76af7a 100644
--- a/doc/data-selection.texi
+++ b/doc/data-selection.texi
@@ -1,3 +1,12 @@
+@c PSPP - a program for statistical analysis.
+@c Copyright (C) 2017 Free Software Foundation, Inc.
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3
+@c or any later version published by the Free Software Foundation;
+@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+@c A copy of the license is included in the section entitled "GNU
+@c Free Documentation License".
+@c
 @node Data Selection
 @chapter Selecting data for analysis
 
@@ -175,7 +184,7 @@ When a list of variable names is specified, one of the keywords
 @subcmd{LAYERED} or @subcmd{SEPARATE} may also be specified. If provided,
 either keyword are ignored.
 
-Groups are formed only by @emph{adjacent} cases. To create a split
+Groups are formed only by @emph{adjacent} cases. To create a split
 using a variable where like values are not adjacent in the working file,
 you should first sort the data by that variable (@pxref{SORT CASES}).
 
@@ -185,6 +194,40 @@ entire active dataset as a single group of data.
 When @cmd{SPLIT FILE} is specified after @cmd{TEMPORARY}, it affects only
 the next procedure (@pxref{TEMPORARY}).
 
+@subsection Example Split
+
+The file @file{horticulture.sav} contains data describing the @exvar{yield}
+of a number of horticultural specimens which have been subjected to
+various @exvar{treatment}s.
+To investigate summary statistics of the @exvar{yield}, we could use the
+@cmd{DESCRIPTIVES} command (@pxref{DESCRIPTIVES}). However, it is reasonable
+to expect the mean to differ depending on the @exvar{treatment}, so we might
+want to perform three separate procedures, one for each treatment.
+@footnote{There are other, possibly better, ways to achieve a similar result
+using the @cmd{MEANS} or @cmd{EXAMINE} commands.}
+@ref{split:ex} shows how this can be done automatically using
+the @cmd{SPLIT FILE} command.
+
+@float Example, split:ex
+@psppsyntax {split.sps}
+@caption {Running @cmd{DESCRIPTIVES} on each value of @exvar{treatment}}
+@end float
+
+In @ref{split:res} you can see that the table of descriptive statistics
+appears 3 times, once for each value of @exvar{treatment}.
+In this example @samp{N}, the number of observations, is identical in
+all splits. This is because the experiment was deliberately designed
+that way. In general, however, @samp{N} can be expected to differ from
+one split to another.
+
+@float Example, split:res
+@psppoutput {split}
+@caption {The results of running @cmd{DESCRIPTIVES} with an active split}
+@end float
+
+Unless @cmd{TEMPORARY} was used, after a split has been defined for
+a dataset it remains active until explicitly disabled with @cmd{SPLIT FILE OFF}.
+
 @node TEMPORARY
 @section TEMPORARY
 @vindex TEMPORARY
@@ -267,3 +310,27 @@ the next procedure (@pxref{TEMPORARY}).
 
 @cmd{WEIGHT} does not cause cases in the active dataset to be
 replicated in memory.
+
+
+@subsection Example Weights
+
+Consider a dataset containing an inventory of stock items.
+A natural layout uses a string variable for the description of each
+item and a numeric variable for the number in stock, as in @ref{weight:ex}.
+
+@float Example, weight:ex
+@psppsyntax {weight.sps}
+@caption {Setting the weight on the variable @exvar{quantity}}
+@end float
+
+One analysis that would surely be of interest is
+the relative amount of each item in stock.
+However, without a weight variable, @cmd{FREQUENCIES} (@pxref{FREQUENCIES}) counts
+every item exactly once, because there is only one case for each stock item.
+@ref{weight:res} shows the difference between the weighted and unweighted
+frequency tables.
+
+@float Example, weight:res
+@psppoutput {weight}
+@caption {Weighted and unweighted frequency tables of @exvar{items}}
+@end float
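The two behaviors this patch documents can be modeled in a few lines of Python. First, @cmd{SPLIT FILE} groups only adjacent cases, so the data should be sorted first. Second, @cmd{WEIGHT} makes each case count by its weight. The sketch below is a hypothetical toy model, not PSPP's implementation. The helper names `split_groups`, `frequencies` and `treatment`, the one-dict-per-case representation, and the stock values are all made up for illustration. The model also ignores details such as missing or non-positive weights.

```python
from collections import Counter
from itertools import groupby


def split_groups(cases, key):
    """Hypothetical model of SPLIT FILE grouping: consecutive cases with an
    equal split value form one group; equal values that are not adjacent
    start a new group."""
    return [(value, list(group)) for value, group in groupby(cases, key=key)]


def frequencies(cases, var, weight=None):
    """Hypothetical model of a frequency count: each case adds 1 to the
    count for its value of `var`, or adds the value of the `weight`
    variable if one is given (missing/invalid weights are not handled)."""
    counts = Counter()
    for case in cases:
        counts[case[var]] += case[weight] if weight is not None else 1
    return dict(counts)


def treatment(case):
    return case["treatment"]


# Unsorted data: the two "a" cases are not adjacent, so they form two groups.
cases = [{"treatment": "a"}, {"treatment": "b"}, {"treatment": "a"}]
print([value for value, _ in split_groups(cases, treatment)])    # ['a', 'b', 'a']

# Sorting first (the role SORT CASES plays) gives one group per value.
ordered = sorted(cases, key=treatment)
print([value for value, _ in split_groups(ordered, treatment)])  # ['a', 'b']

# Made-up inventory with one case per item.
stock = [{"item": "washers", "quantity": 3}, {"item": "bolts", "quantity": 5}]
print(frequencies(stock, "item"))              # {'washers': 1, 'bolts': 1}
print(frequencies(stock, "item", "quantity"))  # {'washers': 3, 'bolts': 5}
```

The first two results correspond to the advice to sort with @cmd{SORT CASES} before splitting. The last two correspond to the stock example: an unweighted count sees each item exactly once, while a count weighted by quantity reflects the amount in stock.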