@node Basic Concepts
@chapter Basic Concepts

This chapter introduces basic data structures and other concepts
needed for developing in PSPP.

@menu
* Values::
* Input and Output Formats::
* User-Missing Values::
* Coding Conventions::
@end menu
@node Values
@section Values

The unit of data in PSPP is a @dfn{value}.

Values are classified by @dfn{type} and @dfn{width}. The
type of a value is either @dfn{numeric} or @dfn{string} (sometimes
called alphanumeric). The width of a string value ranges from 1 to
@code{MAX_STRING} bytes. The width of a numeric value is artificially
defined to be 0; thus, the type of a value can be inferred from its
width.

Some support is provided for working with value types and widths, in
@file{data/val-type.h}:

@deftypefn Macro int MAX_STRING
Maximum width of a string value, in bytes, currently 32,767.
@end deftypefn

@deftypefun bool val_type_is_valid (enum val_type @var{val_type})
Returns true if @var{val_type} is a valid value type, that is,
either @code{VAL_NUMERIC} or @code{VAL_STRING}. Useful for
assertions.
@end deftypefun

@deftypefun {enum val_type} val_type_from_width (int @var{width})
Returns @code{VAL_NUMERIC} if @var{width} is 0 and thus represents the
width of a numeric value, otherwise @code{VAL_STRING} to indicate that
@var{width} is the width of a string value.
@end deftypefun
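
Because the type can be recovered from the width alone, the rule fits in a few lines of C. The following is a minimal sketch, not PSPP's code. The local @code{enum val_type} and @code{MAX_STRING} stand in for the declarations in @file{data/val-type.h}, and @code{sketch_val_type_from_width} and @code{sketch_val_type_is_valid} are hypothetical names.

@verbatim
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the declarations in data/val-type.h.  They are
   assumptions for illustration, not copies of PSPP's header. */
enum val_type { VAL_NUMERIC, VAL_STRING };
#define MAX_STRING 32767

/* Hypothetical: width 0 means numeric, 1 to MAX_STRING means string. */
static enum val_type
sketch_val_type_from_width (int width)
{
  assert (width >= 0 && width <= MAX_STRING);
  return width == 0 ? VAL_NUMERIC : VAL_STRING;
}

/* Hypothetical: the kind of check one would place in an assertion. */
static bool
sketch_val_type_is_valid (enum val_type type)
{
  return type == VAL_NUMERIC || type == VAL_STRING;
}
@end verbatim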
The following subsections describe how values of each type are
represented.

@menu
* Numeric Values::
* String Values::
* Runtime Typed Values::
@end menu
@node Numeric Values
@subsection Numeric Values

A value known to be numeric at compile time is represented as a
@code{double}. PSPP provides three values of @code{double} for
special purposes, defined in @file{data/val-type.h}:

@deftypefn Macro double SYSMIS
The @dfn{system-missing value}, used to represent a datum whose true
value is unknown, such as a survey question that was not answered by
the respondent, or undefined, such as the result of division by zero.
PSPP propagates the system-missing value through calculations and
compensates for missing values in statistical analyses. @xref{Missing
Observations,,,pspp, PSPP Users Guide}, for a PSPP user's view of
missing values.

PSPP currently defines @code{SYSMIS} as @code{-DBL_MAX}, that is, the
most negative finite value of @code{double}. It is best not to
depend on this definition, because PSPP may transition to using an
IEEE NaN (not a number) instead at some point in the future.
@end deftypefn
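
The following minimal sketch illustrates the propagation rule for a single operation, division. @code{SKETCH_SYSMIS} and @code{sketch_divide} are hypothetical. @code{SKETCH_SYSMIS} restates the current definition only so that the example is self-contained. Real code should always use @code{SYSMIS} itself.

@verbatim
#include <assert.h>
#include <float.h>

/* Stand-in for SYSMIS.  PSPP currently defines SYSMIS as -DBL_MAX, but
   real code should use the macro, since the definition may change. */
#define SKETCH_SYSMIS (-DBL_MAX)

/* Hypothetical helper: a missing operand yields a missing result, and
   division by zero, being undefined, also yields system-missing. */
static double
sketch_divide (double a, double b)
{
  if (a == SKETCH_SYSMIS || b == SKETCH_SYSMIS || b == 0.0)
    return SKETCH_SYSMIS;
  return a / b;
}
@end verbatim

The @code{==} tests work only because @code{SYSMIS} is currently an ordinary finite value. If PSPP switched to a NaN, such comparisons would always be false. This is one more reason not to rely on the raw definition.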
@deftypefn Macro double LOWEST
@deftypefnx Macro double HIGHEST
The most negative finite value (other than @code{SYSMIS}) and the
greatest finite positive value of @code{double}, respectively. These
values do not ordinarily appear in user data files. Instead, they are
used to implement endpoints of open-ended ranges that are occasionally
permitted in PSPP syntax, e.g.@: @code{5 THRU HI} as a range of missing
values (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}).
@end deftypefn
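
The following minimal sketch shows why @code{HIGHEST} is convenient. Substituting it for @code{HI} turns an open-ended range such as @code{5 THRU HI} into an ordinary closed interval. The structure and function names are hypothetical, and the local macros only restate the values described above.

@verbatim
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Stand-ins for the macros described above (assumed definitions). */
#define SKETCH_SYSMIS (-DBL_MAX)
#define SKETCH_HIGHEST DBL_MAX

/* Hypothetical inclusive range; "5 THRU HI" is stored as
   [5, SKETCH_HIGHEST]. */
struct sketch_range
  {
    double low, high;
  };

static bool
sketch_in_range (const struct sketch_range *r, double x)
{
  return x >= r->low && x <= r->high;
}
@end verbatim

Because @code{SYSMIS} lies below every value in such a range, it is never treated as falling inside it.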
@node String Values
@subsection String Values

A value known at compile time to have string type is represented as an
array of @code{char}. String values do not necessarily represent
readable text strings and may contain arbitrary 8-bit data, including
null bytes, control codes, and bytes with the high bit set. Thus,
string values are not null-terminated strings, but rather opaque
arrays of bytes.

@code{SYSMIS}, @code{LOWEST}, and @code{HIGHEST} have no equivalents
as string values. Usually, PSPP fills an unknown or undefined string
value with spaces, but PSPP does not treat such a string as a special
case when it processes it later.

@code{MAX_STRING}, the maximum width of a string value, is defined in
@file{data/val-type.h}.
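
The following minimal sketch shows a practical consequence of string values being byte arrays rather than C strings. Comparisons must be bounded by the width, not by a terminating null byte. The helper names are hypothetical.

@verbatim
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helpers.  A string value is exactly WIDTH bytes with no
   terminator, so the width must always travel with the data. */
static bool
sketch_str_equal (const char *a, const char *b, int width)
{
  return memcmp (a, b, width) == 0;   /* Compares all WIDTH bytes. */
}

/* Fills S the way PSPP usually fills an unknown string value.  The
   result is an ordinary value, not a special marker. */
static void
sketch_str_fill_unknown (char *s, int width)
{
  memset (s, ' ', width);
}
@end verbatim

Consider two 4-byte values: @samp{x}, null, @samp{y}, space and @samp{x}, null, @samp{z}, space. @code{strcmp} would report them as equal because it stops at the first null byte. @code{sketch_str_equal}, by contrast, sees the difference in the third byte.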
@node Runtime Typed Values
@subsection Runtime Typed Values

When a value's type is only known at runtime, it is often represented
as a @union{value}, defined in @file{data/value.h}. A @union{value}
does not identify the type or width of the data it contains. Code
that works with @union{value}s must therefore have external knowledge
of its content, often through the type and width of a
@struct{variable} (@pxref{Variables}).

@union{value} has one member that clients are permitted to access
directly, a @code{double} named @samp{f} that stores the content of a
numeric @union{value}. It has other members that store the content of
a string @union{value}, but client code should use accessor functions
instead of referring to these directly.
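
The following sketch illustrates these access rules with a simplified stand-in. @code{union sketch_value} and @code{sketch_value_str} are hypothetical. The @code{uint8_t *} string member is an assumption about the layout, made only for illustration. The point is that the width always comes from outside the union.

@verbatim
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for union value.  The real
   definition is in data/value.h; the string member shown here is an
   assumption, and clients should not rely on its name. */
union sketch_value
  {
    double f;      /* Numeric content, the one member used directly. */
    uint8_t *s;    /* String content (assumed layout). */
  };

/* Hypothetical accessor.  Nothing in the union records the width, so
   the caller must supply it, typically from a struct variable. */
static const uint8_t *
sketch_value_str (const union sketch_value *v, int width)
{
  assert (width > 0);   /* Only string values have string content. */
  return v->s;
}
@end verbatim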
PSPP provides some functions for working with @union{value}s. The
most useful are described below. To use these functions, recall that
a numeric value has a width of 0.

@deftypefun void value_init (union value *@var{value}, int @var{width})
Initializes @var{value} as a value of the given @var{width}. After
initialization, the data in @var{value} are indeterminate; the caller
is responsible for storing initial data in it.
@end deftypefun

@deftypefun void value_destroy (union value *@var{value}, int @var{width})
Frees auxiliary storage associated with @var{value}, which must have
the given @var{width}.
@end deftypefun

@deftypefun bool value_needs_init (int @var{width})
For some widths, @func{value_init} and @func{value_destroy} do not
actually do anything, because no additional storage is needed beyond
the size of @union{value}. This function returns false if @var{width}
is such a width, in which case there is no actual need to call those
functions. This can be a useful optimization if a large number of
@union{value}s of such a width are to be initialized or destroyed.

This function returns true if @func{value_init} and
@func{value_destroy} are actually required for the given @var{width}.
@end deftypefun
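
The usual life cycle of a @union{value} is to initialize it, store data in it, use it, and finally destroy it. The following minimal sketch models that contract with hypothetical @code{sketch_value_*} functions. It assumes, as a simplification, that only string widths need auxiliary storage. Under that assumption, @code{sketch_value_needs_init} returns true exactly when the init and destroy calls do real work.

@verbatim
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins following the documented contract.  The rule
   "only strings need extra storage" is an assumption for illustration. */
union sketch_value { double f; uint8_t *s; };

static bool
sketch_value_needs_init (int width)
{
  return width > 0;
}

static void
sketch_value_init (union sketch_value *v, int width)
{
  if (sketch_value_needs_init (width))
    v->s = malloc (width);       /* Contents are indeterminate. */
}

static void
sketch_value_destroy (union sketch_value *v, int width)
{
  if (sketch_value_needs_init (width))
    free (v->s);
}
@end verbatim

A loop over many numeric values can call @code{sketch_value_needs_init (0)} once and then skip both calls for every element.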
@deftypefun void value_copy (union value *@var{dst}, @
const union value *@var{src}, @
int @var{width})
Copies the contents of @union{value} @var{src} to @var{dst}. Both
@var{dst} and @var{src} must have been initialized with the specified
@var{width}.
@end deftypefun

@deftypefun void value_set_missing (union value *@var{value}, int @var{width})
Sets @var{value} to @code{SYSMIS} if it is numeric or to all spaces if
it is alphanumeric, according to @var{width}. @var{value} must have
been initialized with the specified @var{width}.
@end deftypefun

@anchor{value_is_resizable}
@deftypefun bool value_is_resizable (const union value *@var{value}, int @var{old_width}, int @var{new_width})
Determines whether @var{value}, which must have been initialized with
the specified @var{old_width}, may be resized to @var{new_width}.
Resizing is possible if the following criteria are met. First,
@var{old_width} and @var{new_width} must be both numeric or both
string widths. Second, if @var{new_width} is a short string width and
less than @var{old_width}, resizing is allowed only if the bytes of
@var{value} from offset @var{new_width} up to (but not including)
@var{old_width} contain only spaces.

These rules are part of those used by @func{mv_is_resizable} and
@func{val_labs_can_set_width}.
@end deftypefun

@deftypefun void value_resize (union value *@var{value}, int @var{old_width}, int @var{new_width})
Resizes @var{value} from @var{old_width} to @var{new_width}, which
must be allowed by the rules stated above. @var{value} must have been
initialized with the specified @var{old_width} before calling this
function. After resizing, @var{value} has width @var{new_width}.

If @var{new_width} is greater than @var{old_width}, @var{value} is
padded on the right with spaces to the new width. If
@var{new_width} is less than @var{old_width}, the rightmost bytes of
@var{value} are truncated.
@end deftypefun
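
The following minimal sketch restates the resizing rules on a plain byte buffer rather than a @union{value}. The names are hypothetical, and width 0 stands for numeric as elsewhere in this chapter. As a simplification, the sketch applies the trailing-spaces check to every string width, not only to short strings.

@verbatim
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of value_is_resizable's rules on a byte buffer. */
static bool
sketch_is_resizable (const char *s, int old_width, int new_width)
{
  if ((old_width == 0) != (new_width == 0))
    return false;               /* Numeric <-> string is never allowed. */
  for (int i = new_width; i < old_width; i++)
    if (s[i] != ' ')
      return false;             /* Truncation would lose data. */
  return true;
}

/* Hypothetical sketch of value_resize.  BUF must have room for the
   larger of OLD_WIDTH and NEW_WIDTH. */
static void
sketch_resize (char *buf, int old_width, int new_width)
{
  assert (sketch_is_resizable (buf, old_width, new_width));
  if (new_width > old_width)
    memset (buf + old_width, ' ', new_width - old_width);
  /* Shrinking simply drops the trailing (all-space) bytes. */
}
@end verbatim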
@deftypefun bool value_equal (const union value *@var{a}, const union value *@var{b}, int @var{width})
Compares @var{a} and @var{b}, which must both have width
@var{width}. Returns true if their contents are the same, false
otherwise.
@end deftypefun

@deftypefun int value_compare_3way (const union value *@var{a}, const union value *@var{b}, int @var{width})
Compares @var{a} and @var{b}, which must both have width
@var{width}. Returns -1 if @var{a} is less than @var{b}, 0 if they
are equal, or 1 if @var{a} is greater than @var{b}.

Numeric values are compared numerically, with @code{SYSMIS} comparing
less than any real number. String values are compared
lexicographically byte-by-byte.
@end deftypefun
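
The following minimal sketch restates the comparison rules for raw data rather than @union{value}s. A width of 0 selects numeric comparison, and any other width compares that many bytes. The function name is hypothetical, and @code{SKETCH_SYSMIS} stands in for @code{SYSMIS}.

@verbatim
#include <assert.h>
#include <float.h>
#include <string.h>

#define SKETCH_SYSMIS (-DBL_MAX)   /* Stand-in for SYSMIS. */

/* Hypothetical three-way comparison following the rules above.  A and
   B point to doubles when WIDTH is 0, otherwise to WIDTH bytes. */
static int
sketch_compare_3way (const void *a, const void *b, int width)
{
  if (width == 0)
    {
      double x = *(const double *) a;
      double y = *(const double *) b;
      /* Spell out the SYSMIS rule; with SYSMIS == -DBL_MAX the plain
         comparison below would give the same answer. */
      if (x == SKETCH_SYSMIS || y == SKETCH_SYSMIS)
        return (x != SKETCH_SYSMIS) - (y != SKETCH_SYSMIS);
      return x < y ? -1 : x > y;
    }
  int cmp = memcmp (a, b, width);   /* Bytes compared as unsigned char. */
  return cmp < 0 ? -1 : cmp > 0;
}
@end verbatim

@code{memcmp} compares bytes as @code{unsigned char}, so a byte with the high bit set sorts after any ASCII byte. This matches byte-by-byte comparison of arbitrary 8-bit data.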
@deftypefun size_t value_hash (const union value *@var{value}, int @var{width}, unsigned int @var{basis})
Computes and returns a hash of @var{value}, which must have the
specified @var{width}. The value in @var{basis} is folded into the
hash.
@end deftypefun
@node Input and Output Formats
@section Input and Output Formats

Input and output formats specify how to convert data fields to and
from data values (@pxref{Input and Output Formats,,,pspp, PSPP Users
Guide}). PSPP uses @struct{fmt_spec} to represent input and output
formats.

Function prototypes and other declarations related to formats are in
the @file{data/format.h} header.

@deftp {Structure} {struct fmt_spec}
An input or output format, with the following members:

@table @code
@item enum fmt_type type
The format type (see below).

@item int w
Field width, in bytes. The width of numeric fields is always between
1 and 40 bytes, and the width of string fields is always between 1 and
65534 bytes. However, many individual types of formats place stricter
limits on field width (see @ref{fmt_max_input_width},
@ref{fmt_max_output_width}).

@item int d
Number of decimal places, in character positions. For format types
that do not allow decimal places to be specified, this value must be
0. Format types that do allow decimal places have type-specific and
often width-specific restrictions on @code{d} (see
@ref{fmt_max_input_decimals}, @ref{fmt_max_output_decimals}).
@end table
@end deftp

@deftp {Enumeration} {enum fmt_type}
An enumerated type representing an input or output format type. Each
PSPP input and output format has a corresponding enumeration constant
prefixed by @samp{FMT_}: @code{FMT_F}, @code{FMT_COMMA},
@code{FMT_DOT}, and so on.
@end deftp

The following sections describe functions for manipulating formats and
the data in fields represented by formats.

@menu
* Constructing and Verifying Formats::
* Format Utility Functions::
* Obtaining Properties of Format Types::
* Numeric Formatting Styles::
* Formatted Data Input and Output::
@end menu
@node Constructing and Verifying Formats
@subsection Constructing and Verifying Formats

These functions construct @struct{fmt_spec}s and verify that they are
valid.

@deftypefun {struct fmt_spec} fmt_for_input (enum fmt_type @var{type}, int @var{w}, int @var{d})
@deftypefunx {struct fmt_spec} fmt_for_output (enum fmt_type @var{type}, int @var{w}, int @var{d})
Constructs a @struct{fmt_spec} with the given @var{type}, @var{w}, and
@var{d}, asserts that the result is a valid input (or output) format,
and returns it.
@end deftypefun

@anchor{fmt_for_output_from_input}
@deftypefun {struct fmt_spec} fmt_for_output_from_input (const struct fmt_spec *@var{input})
Given @var{input}, which must be a valid input format, returns the
equivalent output format. @xref{Input and Output Formats,,,pspp, PSPP
Users Guide}, for the rules for converting input formats into output
formats.
@end deftypefun

@deftypefun {struct fmt_spec} fmt_default_for_width (int @var{width})
Returns the default output format for a variable of the given
@var{width}. For a numeric variable, this is F8.2 format; for a
string variable, it is the A format of the given @var{width}.
@end deftypefun
The following functions check whether a @struct{fmt_spec} is valid for
various uses and return true if so, false otherwise. When any of them
returns false, it also outputs an explanatory error message using
@func{msg}. To suppress error output, enclose a call to one of these
functions in a @func{msg_disable}/@func{msg_enable} pair.

@deftypefun bool fmt_check (const struct fmt_spec *@var{format}, bool @var{for_input})
@deftypefunx bool fmt_check_input (const struct fmt_spec *@var{format})
@deftypefunx bool fmt_check_output (const struct fmt_spec *@var{format})
Checks whether @var{format} is a valid input format (for
@func{fmt_check_input}, or @func{fmt_check} if @var{for_input}) or
output format (for @func{fmt_check_output}, or @func{fmt_check} if not
@var{for_input}).
@end deftypefun

@deftypefun bool fmt_check_type_compat (const struct fmt_spec *@var{format}, enum val_type @var{type})
Checks whether @var{format} matches the value type @var{type}, that
is, if @var{type} is @code{VAL_NUMERIC} and @var{format} is a numeric
format or @var{type} is @code{VAL_STRING} and @var{format} is a string
format.
@end deftypefun

@deftypefun bool fmt_check_width_compat (const struct fmt_spec *@var{format}, int @var{width})
Checks whether @var{format} may be used as an output format for a
value of the given @var{width}.

@func{fmt_var_width}, described in the following section, can also be
used to determine the value width needed by a format.
@end deftypefun
@node Format Utility Functions
@subsection Format Utility Functions

These functions work with @struct{fmt_spec}s.

@deftypefun int fmt_var_width (const struct fmt_spec *@var{format})
Returns the width for values associated with @var{format}. If
@var{format} is a numeric format, the width is 0; if @var{format} is
an A format, the width is @code{@var{format}->w}; otherwise,
@var{format} is an AHEX format and its width is @code{@var{format}->w
/ 2}.
@end deftypefun
348 Converts @var{format} to a human-readable format specifier in @var{s}
349 and returns @var{s}. @var{format} need not be a valid input or output
350 format specifier, e.g.@: it is allowed to have an excess width or
351 decimal places. In particular, if @var{format} has decimals, they are
352 included in the output string, even if @var{format}'s type does not
353 allow decimals, to allow accurately presenting incorrect formats to
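
The following sketch shows the kind of output @func{fmt_to_string} is described as producing, including for an invalid format. Everything in it is hypothetical: the buffer size, the two-entry name table, and in particular the choice to print @samp{.0} for types that take decimals even when @code{d} is 0. That last choice is an assumption, not documented behavior.

@verbatim
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; the real FMT_STRING_LEN_MAX, fmt_name() and
   fmt_takes_decimals() live in data/format.h.  32 is an assumed size. */
#define SKETCH_FMT_STRING_LEN_MAX 32

enum sketch_fmt_type { SK_FMT_F, SK_FMT_A };
struct sketch_fmt_spec { enum sketch_fmt_type type; int w, d; };

static const char *const sketch_names[] = { "F", "A" };
static const bool sketch_takes_decimals[] = { true, false };

/* Writes e.g. "F8.2" or "A10" into S and returns S.  Decimals are shown
   whenever the type allows them or D is nonzero, so an invalid format
   such as A10.3 is reported exactly as written. */
static char *
sketch_fmt_to_string (const struct sketch_fmt_spec *f,
                      char s[SKETCH_FMT_STRING_LEN_MAX + 1])
{
  if (sketch_takes_decimals[f->type] || f->d != 0)
    snprintf (s, SKETCH_FMT_STRING_LEN_MAX + 1, "%s%d.%d",
              sketch_names[f->type], f->w, f->d);
  else
    snprintf (s, SKETCH_FMT_STRING_LEN_MAX + 1, "%s%d",
              sketch_names[f->type], f->w);
  return s;
}
@end verbatim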
@deftypefun bool fmt_equal (const struct fmt_spec *@var{a}, const struct fmt_spec *@var{b})
Compares @var{a} and @var{b} memberwise and returns true if they are
identical, false otherwise. Neither @var{a} nor @var{b} need be a
valid input or output format specifier.
@end deftypefun

@deftypefun void fmt_resize (struct fmt_spec *@var{fmt}, int @var{width})
Adjusts the width of @var{fmt} so that it is a valid format for a
@union{value} of width @var{width}.
@end deftypefun
@node Obtaining Properties of Format Types
@subsection Obtaining Properties of Format Types

These functions work with @enum{fmt_type}s instead of the higher-level
@struct{fmt_spec}s. Their primary purpose is to report properties of
each possible format type, which in turn allows clients to abstract
away many of the details of the very heterogeneous requirements of
the individual format types.

The first group of functions works with format type names.

@deftypefun const char *fmt_name (enum fmt_type @var{type})
Returns the name for the given @var{type}, e.g.@: @code{"COMMA"} for
@code{FMT_COMMA}.
@end deftypefun

@deftypefun bool fmt_from_name (const char *@var{name}, enum fmt_type *@var{type})
Tries to find the @enum{fmt_type} associated with @var{name}. If
successful, sets @code{*@var{type}} to the type and returns true;
otherwise, returns false without modifying @code{*@var{type}}.
@end deftypefun
The functions below query basic limits on width and decimal places for
each format type.

@deftypefun bool fmt_takes_decimals (enum fmt_type @var{type})
Returns true if a format of the given @var{type} is allowed to have a
nonzero number of decimal places (the @code{d} member of
@struct{fmt_spec}), false if not.
@end deftypefun

@anchor{fmt_min_input_width}
@anchor{fmt_max_input_width}
@anchor{fmt_min_output_width}
@anchor{fmt_max_output_width}
@deftypefun int fmt_min_input_width (enum fmt_type @var{type})
@deftypefunx int fmt_max_input_width (enum fmt_type @var{type})
@deftypefunx int fmt_min_output_width (enum fmt_type @var{type})
@deftypefunx int fmt_max_output_width (enum fmt_type @var{type})
Returns the minimum or maximum width (the @code{w} member of
@struct{fmt_spec}) allowed for an input or output format of the
specified @var{type}.
@end deftypefun

@anchor{fmt_max_input_decimals}
@anchor{fmt_max_output_decimals}
@deftypefun int fmt_max_input_decimals (enum fmt_type @var{type}, int @var{width})
@deftypefunx int fmt_max_output_decimals (enum fmt_type @var{type}, int @var{width})
Returns the maximum number of decimal places allowed for an input or
output format, respectively, of the given @var{type} and @var{width}.
Returns 0 if the specified @var{type} does not allow any decimal
places or if @var{width} is too narrow to allow decimal places.
@end deftypefun

@deftypefun int fmt_step_width (enum fmt_type @var{type})
Returns the ``width step'' for a @struct{fmt_spec} of the given
@var{type}. A @struct{fmt_spec}'s width must be a multiple of its
type's width step. Most format types have a width step of 1, so that
their formats' widths may be any integer within the valid range, but
hexadecimal numeric formats and AHEX string formats have a width step
of 2.
@end deftypefun
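
The following minimal sketch combines the range limits with the width step into a single validity test. The structure and function are hypothetical. Any limits plugged into them are placeholders rather than PSPP's actual values.

@verbatim
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-type limits.  In PSPP these would come from
   fmt_min_input_width(), fmt_max_input_width() and fmt_step_width(). */
struct sketch_limits
  {
    int min_w, max_w, step;
  };

/* A width is acceptable if it is in range and a multiple of the
   type's width step. */
static bool
sketch_width_ok (const struct sketch_limits *t, int w)
{
  assert (t->step > 0);
  return w >= t->min_w && w <= t->max_w && w % t->step == 0;
}
@end verbatim

For a type with a width step of 2, a width of 4 passes and a width of 3 fails, even when both lie within the type's range.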
These functions allow clients to broadly determine how each kind of
input or output format behaves.

@deftypefun bool fmt_is_string (enum fmt_type @var{type})
@deftypefunx bool fmt_is_numeric (enum fmt_type @var{type})
Returns true if @var{type} is a format for string or numeric values,
respectively, false otherwise.
@end deftypefun

@deftypefun {enum fmt_category} fmt_get_category (enum fmt_type @var{type})
Returns the category within which @var{type} falls.

@deftp {Enumeration} {enum fmt_category}
A group of format types. Format type categories correspond to the
input and output categories described in the PSPP user documentation
(@pxref{Input and Output Formats,,,pspp, PSPP Users Guide}).

Each format is in exactly one category. The categories have bitwise
disjoint values to make it easy to test whether a format type is in
one of multiple categories, e.g.@:

@example
if (fmt_get_category (type) & (FMT_CAT_DATE | FMT_CAT_TIME))
  @{
    /* @dots{}@r{@code{type} is a date or time format}@dots{} */
  @}
@end example

The format categories are:
@table @code
@item FMT_CAT_BASIC
Basic numeric formats.

@item FMT_CAT_CUSTOM
Custom currency formats.

@item FMT_CAT_LEGACY
Legacy numeric formats.

@item FMT_CAT_BINARY
Binary formats.

@item FMT_CAT_HEXADECIMAL
Hexadecimal formats.

@item FMT_CAT_DATE
Date formats.

@item FMT_CAT_TIME
Time formats.

@item FMT_CAT_DATE_COMPONENT
Date component formats.

@item FMT_CAT_STRING
String formats.
@end table
@end deftp
@end deftypefun
The PSPP input and output routines use the following pair of functions
to convert @enum{fmt_type}s to and from the separate set of codes used
in system and portable files:

@deftypefun int fmt_to_io (enum fmt_type @var{type})
Returns the format code used in system and portable files that
corresponds to @var{type}.
@end deftypefun

@deftypefun bool fmt_from_io (int @var{io}, enum fmt_type *@var{type})
Converts @var{io}, a format code used in system and portable files,
into a @enum{fmt_type} in @code{*@var{type}}. Returns true if
successful, false if @var{io} is not valid.
@end deftypefun
These functions reflect the relationship between input and output
formats.

@deftypefun {enum fmt_type} fmt_input_to_output (enum fmt_type @var{type})
Returns the output format type that is used by default by DATA LIST
and other input procedures when @var{type} is specified as an input
format. The conversion from input format to output format is more
complicated than simply changing the format type.
@xref{fmt_for_output_from_input}, for a function that performs the
complete conversion.
@end deftypefun

@deftypefun bool fmt_usable_for_input (enum fmt_type @var{type})
Returns true if @var{type} may be used as an input format type, false
otherwise. The custom currency formats, in particular, may be used
for output but not for input.

All format types are valid for output.
@end deftypefun
The final group of format type property functions obtains
human-readable templates that illustrate the formats graphically.

@deftypefun const char *fmt_date_template (enum fmt_type @var{type})
Returns a formatting template for @var{type}, which must be a date or
time format type. These templates are used by @func{data_in} and
@func{data_out} to guide parsing and formatting date and time data.
@end deftypefun

@deftypefun char *fmt_dollar_template (const struct fmt_spec *@var{format})
Returns a string of the form @code{$#,###.##} according to
@var{format}, which must be of type @code{FMT_DOLLAR}. The caller
must free the string with @code{free}.
@end deftypefun
@node Numeric Formatting Styles
@subsection Numeric Formatting Styles

Each of the basic numeric formats (F, E, COMMA, DOT, DOLLAR, PCT) and
custom currency formats (CCA, CCB, CCC, CCD, CCE) has an associated
numeric formatting style, represented by @struct{fmt_number_style}.
Input and output conversion of formats that have numeric styles is
determined mainly by the style, although the formatting rules have
special cases that are not represented within the style.

@deftp {Structure} {struct fmt_number_style}
A structure type with the following members:

@table @code
@item struct substring neg_prefix
@itemx struct substring prefix
@itemx struct substring suffix
@itemx struct substring neg_suffix
A set of strings used as a prefix to negative numbers, a prefix to
every number, a suffix to every number, and a suffix to negative
numbers, respectively. Each of these strings is no more than
@code{FMT_STYLE_AFFIX_MAX} bytes (currently 16) in length.
These strings must be freed with @func{ss_dealloc} when no longer
needed.

@item char decimal
The character used as a decimal point. It must be either @samp{.} or
@samp{,}.

@item char grouping
The character used for grouping digits to the left of the decimal
point. It may be @samp{.} or @samp{,}, in which case it must not be
equal to @code{decimal}, or it may be set to 0 to disable grouping.
@end table
@end deftp
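
To show how the members combine, here is a minimal sketch that formats a number with a style-like structure. It is an illustration built on assumptions, not PSPP's @func{data_out}. Plain C strings replace the @code{struct substring} members. Negative numbers are assumed to be written as @code{neg_prefix}, @code{prefix}, digits, @code{suffix}, @code{neg_suffix}, in that order. Rounding is left to @code{snprintf}. All names are hypothetical.

@verbatim
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified style; plain C strings stand in for the
   struct substring members of struct fmt_number_style. */
struct sketch_style
  {
    const char *neg_prefix, *prefix, *suffix, *neg_suffix;
    char decimal;   /* '.' or ','. */
    char grouping;  /* '.', ',' or 0 for no grouping. */
  };

/* Formats X with D decimals into OUT, which has room for N bytes. */
static void
sketch_format (double x, int d, const struct sketch_style *st,
               char *out, size_t n)
{
  char digits[64];
  int neg = x < 0;
  snprintf (digits, sizeof digits, "%.*f", d, neg ? -x : x);

  /* In the default "C" locale, %f always uses '.' as its point. */
  const char *point = strchr (digits, '.');
  size_t int_len = point ? (size_t) (point - digits) : strlen (digits);

  char body[96];
  size_t k = 0;
  for (size_t i = 0; i < int_len; i++)
    {
      /* Insert the grouping character before each group of three. */
      if (st->grouping && i > 0 && (int_len - i) % 3 == 0)
        body[k++] = st->grouping;
      body[k++] = digits[i];
    }
  if (point)
    {
      body[k++] = st->decimal;
      strcpy (body + k, point + 1);
    }
  else
    body[k] = '\0';

  snprintf (out, n, "%s%s%s%s%s", neg ? st->neg_prefix : "", st->prefix,
            body, st->suffix, neg ? st->neg_suffix : "");
}
@end verbatim

With a DOLLAR-like style (@code{prefix} @samp{$}, @code{decimal} @samp{.}, @code{grouping} @samp{,}), the value 1234567.891 with 2 decimals comes out as @samp{$1,234,567.89}.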
The following functions are provided for working with numeric
styles:

@deftypefun void fmt_number_style_init (struct fmt_number_style *@var{style})
Initializes a @struct{fmt_number_style} with all of the
prefixes and suffixes set to the empty string, @samp{.} as the decimal
point character, and grouping disabled.
@end deftypefun

@deftypefun void fmt_number_style_destroy (struct fmt_number_style *@var{style})
Destroys @var{style}, freeing its storage.
@end deftypefun

@deftypefun {struct fmt_number_style *} fmt_create (void)
Creates an array of all the styles used by PSPP and calls
@func{fmt_number_style_init} on each of them.
@end deftypefun

@deftypefun void fmt_done (struct fmt_number_style *@var{styles})
Takes an array of @struct{fmt_number_style}s, calls
@func{fmt_number_style_destroy} on each of them, and then frees the
array.
@end deftypefun

@deftypefun int fmt_affix_width (const struct fmt_number_style *@var{style})
Returns the total length of @var{style}'s @code{prefix} and @code{suffix}.
@end deftypefun

@deftypefun int fmt_neg_affix_width (const struct fmt_number_style *@var{style})
Returns the total length of @var{style}'s @code{neg_prefix} and
@code{neg_suffix}.
@end deftypefun

PSPP maintains a global set of number styles for each of the basic
numeric formats and custom currency formats. The following functions
work with these global styles:

@deftypefun {const struct fmt_number_style *} fmt_get_style (enum fmt_type @var{type})
Returns the numeric style for the given format @var{type}.
@end deftypefun

@deftypefun {const char *} fmt_name (enum fmt_type @var{type})
Returns the name of the given format @var{type}.
@end deftypefun
@node Formatted Data Input and Output
@subsection Formatted Data Input and Output

These functions provide the ability to convert data fields into
@union{value}s and vice versa.

@deftypefun bool data_in (struct substring @var{input}, const char *@var{encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, const struct dictionary *@var{dict}, union value *@var{output}, int @var{width})
Parses @var{input} as a field containing data in the given format
@var{type}. The resulting value is stored in @var{output}, which the
caller must have initialized with the given @var{width}. For
consistency, @var{width} must be 0 if @var{type} is a numeric format
type and greater than 0 if @var{type} is a string format type.
@var{encoding} should be set to indicate the character encoding of
@var{input}. @var{dict} must be a pointer to the dictionary with
which @var{output} is associated.

If @var{input} is the empty string (with length 0), @var{output} is
set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP
Users Guide}) for a numeric value, or to all spaces for a string
value. This applies regardless of the usual parsing requirements for
@var{type}.

If @var{implied_decimals} is greater than zero, then the numeric
result is shifted right by @var{implied_decimals} decimal places if
@var{input} does not contain a decimal point character or an exponent.
Only certain numeric format types support implied decimal places; for
string formats and other numeric formats, @var{implied_decimals} has
no effect. DATA LIST FIXED is the primary user of this feature
(@pxref{DATA LIST FIXED,,,pspp, PSPP Users Guide}). Other callers
should generally specify 0 for @var{implied_decimals}, to disable this
feature.

When @var{input} contains invalid input data, @func{data_in} outputs a
message using @func{msg}.

If @var{first_column} is nonzero, it is included in any such error
message as the 1-based column number of the start of the field. The
last column in the field is calculated as @var{first_column} plus the
length of @var{input}, minus 1. To suppress error output, enclose the
call to @func{data_in} between calls to @func{msg_disable} and
@func{msg_enable}.

This function returns true on success, false if a message was output
(even if suppressed). Overflow and underflow provoke warnings but are
not propagated to the caller as errors.

This function is declared in @file{data/data-in.h}.
@end deftypefun
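
The following minimal sketch isolates the implied-decimals rule. It is hypothetical and much narrower than @func{data_in}. It ignores format types, blank fields, grouping characters, and error reporting, and it recognizes only @samp{.} as the decimal point.

@verbatim
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the implied-decimals rule alone. */
static double
sketch_implied_decimals (const char *field, int implied_decimals)
{
  double x = strtod (field, NULL);
  /* Shift only when the field has no decimal point and no exponent. */
  if (implied_decimals > 0 && strpbrk (field, ".eE") == NULL)
    for (int i = 0; i < implied_decimals; i++)
      x /= 10.0;
  return x;
}
@end verbatim

For example, the field @samp{12345} with 2 implied decimals yields 123.45. The fields @samp{123.4} and @samp{5e2} are left unshifted, because they contain a decimal point and an exponent, respectively.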
@deftypefun {char *} data_out (const union value *@var{input}, const struct fmt_spec *@var{format})
@deftypefunx {char *} data_out_legacy (const union value *@var{input}, const char *@var{encoding}, const struct fmt_spec *@var{format})
Converts the data pointed to by @var{input} into a string value, which
will be encoded in UTF-8, according to output format specifier
@var{format}. @var{format} must be a valid output format. The width
of @var{input} is inferred from @var{format} using an algorithm
equivalent to @func{fmt_var_width}.

When @var{input} contains data that cannot be represented in the given
@var{format}, @func{data_out} may output a message using @func{msg},
although the current implementation does not consistently do so. To
suppress error output, enclose the call to @func{data_out} between
calls to @func{msg_disable} and @func{msg_enable}.

This function is declared in @file{data/data-out.h}.
@end deftypefun
@node User-Missing Values
@section User-Missing Values

In addition to the system-missing value for numeric values, each
variable has a set of user-missing values (@pxref{MISSING
VALUES,,,pspp, PSPP Users Guide}). A set of user-missing values is
represented by @struct{missing_values}.

It is rarely necessary to interact directly with a
@struct{missing_values} object. Instead, the most common operation,
querying whether a particular value is a missing value for a given
variable, is most conveniently performed through functions on
@struct{variable}. @xref{Variable Missing Values}, for details.

A @struct{missing_values} is essentially a set of @union{value}s that
have a common value width (@pxref{Values}). For a set of
missing values associated with a variable (the common case), the set's
width is the same as the variable's width.

Function prototypes and other declarations related to missing values
are declared in @file{data/missing-values.h}.

@deftp {Structure} {struct missing_values}
Opaque type that represents a set of missing values.
@end deftp
The contents of a set of missing values are subject to some
restrictions. Regardless of width, a set of missing values is allowed
to be empty. A set of numeric missing values may contain up to three
discrete numeric values, or a range of numeric values (which includes
both ends of the range), or a range plus one discrete numeric value.
A set of string missing values may contain up to three discrete string
values (with the same width as the set), but ranges are not supported.

In addition, values in string missing values wider than
@code{MV_MAX_STRING} bytes may contain non-space characters only in
their first @code{MV_MAX_STRING} bytes; all the bytes after the first
@code{MV_MAX_STRING} must be spaces. @xref{mv_is_acceptable}, for a
function that tests a value against these constraints.

@deftypefn Macro int MV_MAX_STRING
Number of bytes in a string missing value that are not required to be
spaces. The current value is 8, a value which is fixed by the system
file format. In PSPP we could easily eliminate this restriction, but
doing so would also require us to extend the system file format in an
incompatible way, which we consider a bad tradeoff.
@end deftypefn
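
The following minimal sketch encodes the restrictions above as two predicates. Because @struct{missing_values} is opaque, the structure shown is a hypothetical model. It records only how many discrete values and ranges a numeric set holds. @code{SKETCH_MV_MAX_STRING} restates the current value of @code{MV_MAX_STRING}.

@verbatim
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a numeric user-missing set's shape. */
struct sketch_num_mv
  {
    int n_discrete;   /* Number of discrete values. */
    bool has_range;   /* Whether a LOW THRU HIGH range is present. */
  };

/* Up to three discrete values, or a range plus at most one value. */
static bool
sketch_num_mv_ok (const struct sketch_num_mv *mv)
{
  if (mv->n_discrete < 0)
    return false;
  return mv->has_range ? mv->n_discrete <= 1 : mv->n_discrete <= 3;
}

#define SKETCH_MV_MAX_STRING 8   /* Mirrors MV_MAX_STRING. */

/* Every byte of S beyond the first MV_MAX_STRING must be a space. */
static bool
sketch_str_mv_ok (const char *s, int width)
{
  for (int i = SKETCH_MV_MAX_STRING; i < width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}
@end verbatim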
The most often useful functions for missing values are those for
testing whether a given value is missing, described in the following
section. Several other functions for creating, inspecting, and
modifying @struct{missing_values} objects are described afterward, but
these functions are much more rarely useful.

@menu
* Testing for Missing Values::
* Creating and Destroying User-Missing Values::
* Changing User-Missing Value Set Width::
* Inspecting User-Missing Value Sets::
* Modifying User-Missing Value Sets::
@end menu
@node Testing for Missing Values
@subsection Testing for Missing Values

The most often useful functions for missing values are those for
testing whether a given value is missing, described here. However,
using one of the corresponding missing value testing functions for
variables can be even easier (@pxref{Variable Missing Values}).

@deftypefun bool mv_is_value_missing (const struct missing_values *@var{mv}, const union value *@var{value}, enum mv_class @var{class})
@deftypefunx bool mv_is_num_missing (const struct missing_values *@var{mv}, double @var{value}, enum mv_class @var{class})
@deftypefunx bool mv_is_str_missing (const struct missing_values *@var{mv}, const char @var{value}[], enum mv_class @var{class})
Tests whether @var{value} is in one of the categories of missing
values given by @var{class}. Returns true if so, false otherwise.

@var{mv} determines the width of @var{value} and provides the set of
user-missing values to test.

The only difference among these functions is the form in which
@var{value} is provided, so you may use whichever function is most
convenient.

The @var{class} argument determines the exact kinds of missing values
that the functions test for:

@deftp {Enumeration} {enum mv_class}
@table @code
@item MV_USER
Returns true if @var{value} is in the set of user-missing values given
by @var{mv}.

@item MV_SYSTEM
Returns true if @var{value} is system-missing. (If @var{mv}
represents a set of string values, then @var{value} is never
system-missing.)

@item MV_ANY
@itemx MV_USER | MV_SYSTEM
Returns true if @var{value} is user-missing or system-missing.

@item MV_NONE
Always returns false, that is, @var{value} is never considered
missing.
@end table
@end deftp
@end deftypefun
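
The following minimal sketch shows how @code{MV_ANY} follows from combining the other classes with bitwise OR. The enumeration, macro, and function are hypothetical stand-ins. Only the relationship that @code{MV_ANY} equals @code{MV_USER | MV_SYSTEM} comes from the table above; the individual bit values are assumptions. The sketch handles only numeric sets, for which system-missing is meaningful.

@verbatim
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum mv_class; bit values are assumed. */
enum sketch_mv_class
  {
    SK_MV_NONE = 0,
    SK_MV_USER = 1,
    SK_MV_SYSTEM = 2,
    SK_MV_ANY = SK_MV_USER | SK_MV_SYSTEM
  };

#define SKETCH_SYSMIS (-DBL_MAX)   /* Stand-in for SYSMIS. */

/* Tests X against up to three discrete user-missing values in USER,
   honoring only the classes selected in MV_CLASS. */
static bool
sketch_is_num_missing (const double *user, int n_user, double x,
                       enum sketch_mv_class mv_class)
{
  if ((mv_class & SK_MV_SYSTEM) && x == SKETCH_SYSMIS)
    return true;
  if (mv_class & SK_MV_USER)
    for (int i = 0; i < n_user; i++)
      if (x == user[i])
        return true;
  return false;
}
@end verbatim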
@node Creating and Destroying User-Missing Values
@subsection Creation and Destruction

These functions create and destroy @struct{missing_values} objects.

@deftypefun void mv_init (struct missing_values *@var{mv}, int @var{width})
Initializes @var{mv} as a set of user-missing values. The set is
initially empty. Any values added to it must have the specified
@var{width}.
@end deftypefun

@deftypefun void mv_destroy (struct missing_values *@var{mv})
Destroys @var{mv}, which must not be referred to again.
@end deftypefun

@deftypefun void mv_copy (struct missing_values *@var{mv}, const struct missing_values *@var{old})
Initializes @var{mv} as a copy of the existing set of user-missing
values @var{old}.
@end deftypefun

@deftypefun void mv_clear (struct missing_values *@var{mv})
Empties the user-missing value set @var{mv}, retaining its existing
width.
@end deftypefun
828 @node Changing User-Missing Value Set Width
829 @subsection Changing User-Missing Value Set Width
831 A few PSPP language constructs copy sets of user-missing values from
832 one variable to another. When the source and target variables have
833 the same width, this is simple. But when the target variable's width
834 might be different from the source variable's, it takes a little more
835 work. The functions described here can help.
837 In fact, it is usually unnecessary to call these functions directly.
838 Most of the time @func{var_set_missing_values}, which uses
839 @func{mv_resize} internally to resize the new set of missing values to
840 the required width, may be used instead.
841 @xref{var_set_missing_values}, for more information.
843 @deftypefun bool mv_is_resizable (const struct missing_values *@var{mv}, int @var{new_width})
844 Tests whether @var{mv}'s width may be changed to @var{new_width} using
845 @func{mv_resize}. Returns true if it is allowed, false otherwise.
847 If @var{mv} contains any missing values, then it may be resized only
848 if each missing value may be resized, as determined by
849 @func{value_is_resizable} (@pxref{value_is_resizable}).
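For string sets, the per-value test reduces to a check on the bytes that would be dropped. The following is a minimal sketch for one fixed-width string value (not null-terminated), assuming that @func{value_is_resizable} permits any widening of a string and permits narrowing only when every dropped byte is a space; changes between numeric and string widths are outside the sketch. @code{toy_str_is_resizable} and @code{toy_str_resize} are hypothetical names. Widening pads on the right with spaces, as @func{mv_resize} does for each value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the OLD_WIDTH-byte string S can be made NEW_WIDTH bytes wide
   without losing information: widening always works, narrowing only if
   every byte that would be dropped is a space. */
bool
toy_str_is_resizable (const char *s, int old_width, int new_width)
{
  for (int i = new_width; i < old_width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}

/* Copies the OLD_WIDTH-byte string S into the NEW_WIDTH-byte buffer DST,
   truncating when narrowing and padding on the right with spaces when
   widening. */
void
toy_str_resize (char *dst, const char *s, int old_width, int new_width)
{
  assert (toy_str_is_resizable (s, old_width, new_width));
  if (new_width <= old_width)
    memcpy (dst, s, (size_t) new_width);
  else
    {
      memcpy (dst, s, (size_t) old_width);
      memset (dst + old_width, ' ', (size_t) (new_width - old_width));
    }
}
```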
853 @deftypefun void mv_resize (struct missing_values *@var{mv}, int @var{width})
854 Changes @var{mv}'s width to @var{width}. @var{mv} and @var{width}
855 must satisfy the constraints explained above.
857 When a string missing value set's width is increased, each
858 user-missing value is padded on the right with spaces to the new
862 @node Inspecting User-Missing Value Sets
863 @subsection Inspecting User-Missing Value Sets
865 These functions inspect the properties and contents of
866 @struct{missing_values} objects.
868 The first set of functions inspects the discrete values that sets of
869 user-missing values may contain:
871 @deftypefun bool mv_is_empty (const struct missing_values *@var{mv})
872 Returns true if @var{mv} contains no user-missing values, false if it
873 contains at least one user-missing value (either a discrete value or a
877 @deftypefun int mv_get_width (const struct missing_values *@var{mv})
878 Returns the width of the user-missing values that @var{mv} represents.
881 @deftypefun int mv_n_values (const struct missing_values *@var{mv})
882 Returns the number of discrete user-missing values included in
883 @var{mv}. The return value will be between 0 and 3. For sets of
884 numeric user-missing values that include a range, the return value
888 @deftypefun bool mv_has_value (const struct missing_values *@var{mv})
889 Returns true if @var{mv} has at least one discrete user-missing
890 value, that is, if @func{mv_n_values} would return nonzero for
894 @deftypefun {const union value *} mv_get_value (const struct missing_values *@var{mv}, int @var{index})
895 Returns the discrete user-missing value in @var{mv} with the given
896 @var{index}. The caller must not modify or free the returned value or
897 refer to it after modifying or freeing @var{mv}. The index must be
898 less than the number of discrete user-missing values in @var{mv}, as
899 reported by @func{mv_n_values}.
902 The second set of functions inspects the single range of values that
903 numeric sets of user-missing values may contain:
905 @deftypefun bool mv_has_range (const struct missing_values *@var{mv})
906 Returns true if @var{mv} includes a range, false otherwise.
909 @deftypefun void mv_get_range (const struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
910 Stores the low endpoint of @var{mv}'s range in @code{*@var{low}} and
911 the high endpoint of the range in @code{*@var{high}}. @var{mv} must
915 @node Modifying User-Missing Value Sets
916 @subsection Modifying User-Missing Value Sets
918 These functions modify the contents of @struct{missing_values}
921 The first set of functions applies to all sets of user-missing values:
923 @deftypefun bool mv_add_value (struct missing_values *@var{mv}, const union value *@var{value})
924 @deftypefunx bool mv_add_str (struct missing_values *@var{mv}, const char @var{value}[])
925 @deftypefunx bool mv_add_num (struct missing_values *@var{mv}, double @var{value})
926 Attempts to add the given discrete @var{value} to the set of user-missing
927 values @var{mv}. @var{value} must have the same width as @var{mv}.
928 Returns true if @var{value} was successfully added, false if the set
929 could not accept any more discrete values or if @var{value} is not an
930 acceptable user-missing value (see @func{mv_is_acceptable} below).
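The capacity rule behind the first failure case can be sketched for numeric sets. This is a hypothetical model, not PSPP's code: it assumes a numeric set holds at most three discrete values, or at most one when it also contains a range (the limit implied by the failure conditions of @func{mv_add_range}). Because every numeric value is acceptable, only capacity can make the hypothetical @code{toy_add_num} fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy numeric set (hypothetical): up to 3 discrete values, or only 1
   discrete value when a range is also present. */
struct toy_mv
  {
    int n_values;
    double values[3];
    bool has_range;
    double low, high;
  };

/* Adds VALUE as a discrete user-missing value.  Returns false, leaving
   MV unchanged, if the set cannot hold another discrete value. */
bool
toy_add_num (struct toy_mv *mv, double value)
{
  int max_values = mv->has_range ? 1 : 3;
  if (mv->n_values >= max_values)
    return false;
  mv->values[mv->n_values++] = value;
  return true;
}
```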
932 These functions are equivalent, except for the form in which
933 @var{value} is provided, so you may use whichever function is most
937 @deftypefun void mv_pop_value (struct missing_values *@var{mv}, union value *@var{value})
938 Removes a discrete value from @var{mv} (which must contain at least
939 one discrete value) and stores it in @var{value}.
942 @deftypefun bool mv_replace_value (struct missing_values *@var{mv}, const union value *@var{value}, int @var{index})
943 Attempts to replace the discrete value with the given @var{index} in
944 @var{mv} (which must contain at least @var{index} + 1 discrete values)
945 by @var{value}. Returns true if successful, false if @var{value} is
946 not an acceptable user-missing value (see @func{mv_is_acceptable}
950 @deftypefun bool mv_is_acceptable (const union value *@var{value}, int @var{width})
951 @anchor{mv_is_acceptable}
952 Returns true if @var{value}, which must have the specified
953 @var{width}, may be added to a missing value set of the same
954 @var{width}, false if it cannot. As described above, all numeric
955 values and string values of width @code{MV_MAX_STRING} or less may be
956 added, but string values of greater width may be added only if bytes
957 beyond the first @code{MV_MAX_STRING} are all spaces.
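A minimal sketch of this acceptance rule follows. The value 8 for @code{TOY_MV_MAX_STRING} is an assumption made for illustration, as is the name @code{toy_mv_is_acceptable}. A width of 0 denotes a numeric value, as elsewhere in PSPP, so the string argument is not examined in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for illustration only. */
#define TOY_MV_MAX_STRING 8

/* True if the WIDTH-byte value S may be added to a missing value set of
   the same width.  WIDTH 0 means numeric, and S is then ignored. */
bool
toy_mv_is_acceptable (const char *s, int width)
{
  if (width <= TOY_MV_MAX_STRING)   /* Numeric or short string. */
    return true;
  for (int i = TOY_MV_MAX_STRING; i < width; i++)
    if (s[i] != ' ')                /* Significant byte past the limit. */
      return false;
  return true;
}
```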
960 The second set of functions applies only to numeric sets of
963 @deftypefun bool mv_add_range (struct missing_values *@var{mv}, double @var{low}, double @var{high})
964 Attempts to add a numeric range covering @var{low}@dots{}@var{high}
965 (inclusive on both ends) to @var{mv}, which must be a numeric set of
966 user-missing values. Returns true if the range is successfully added,
967 false on failure. Fails if @var{mv} already contains a range, or if
968 @var{mv} contains more than one discrete value, or if @var{low} >
972 @deftypefun void mv_pop_range (struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
973 Given @var{mv}, which must be a numeric set of user-missing values
974 that contains a range, removes that range from @var{mv} and stores its
975 low endpoint in @code{*@var{low}} and its high endpoint in
980 @section Value Labels
982 Each variable has a set of value labels (@pxref{VALUE LABELS,,,pspp,
983 PSPP Users Guide}), represented as @struct{val_labs}. A
984 @struct{val_labs} is essentially a map from @union{value}s to strings.
985 All of the values in a set of value labels have the same width, which
986 for a set of value labels owned by a variable (the common case) is the
987 same as the variable's width.
989 Sets of value labels may contain any number of entries.
991 It is rarely necessary to interact directly with a @struct{val_labs}
992 object. Instead, the most common operation, looking up the label for
993 a value of a given variable, can be conveniently executed through
994 functions on @struct{variable}. @xref{Variable Value Labels}, for
997 Function prototypes and other declarations related to value labels
998 are declared in @file{data/value-labels.h}.
1000 @deftp {Structure} {struct val_labs}
1001 Opaque type that represents a set of value labels.
1004 The most commonly useful function for value labels is
1005 @func{val_labs_find}, for looking up the label associated with a
1008 @deftypefun {char *} val_labs_find (const struct val_labs *@var{val_labs}, union value @var{value})
1009 Looks in @var{val_labs} for a label for the given @var{value}.
1010 Returns the label, if one is found, or a null pointer otherwise.
1013 Several other functions for working with value labels are described in
1014 the following sections, but these are less commonly useful.
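As a rough model of what @func{val_labs_find} does, the sketch below looks up a numeric value in a small array of value/label pairs and returns a null pointer when there is no label. The names are hypothetical, and the linear scan is chosen only for brevity; a real @struct{val_labs} is opaque and may well use a hash table instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numeric value label: one value and its label. */
struct toy_val_lab
  {
    double value;
    const char *label;
  };

/* Returns the label for VALUE among the N entries of LABS, or a null
   pointer if VALUE has no label. */
const char *
toy_val_labs_find (const struct toy_val_lab *labs, size_t n, double value)
{
  for (size_t i = 0; i < n; i++)
    if (labs[i].value == value)
      return labs[i].label;
  return NULL;
}
```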
1017 * Value Labels Creation and Destruction::
1018 * Value Labels Properties::
1019 * Value Labels Adding and Removing Labels::
1020 * Value Labels Iteration::
1023 @node Value Labels Creation and Destruction
1024 @subsection Creation and Destruction
1026 These functions create and destroy @struct{val_labs} objects.
1028 @deftypefun {struct val_labs *} val_labs_create (int @var{width})
1029 Creates and returns an initially empty set of value labels with the
1033 @deftypefun {struct val_labs *} val_labs_clone (const struct val_labs *@var{val_labs})
1034 Creates and returns a set of value labels whose width and contents are
1035 the same as those of @var{val_labs}.
1038 @deftypefun void val_labs_clear (struct val_labs *@var{val_labs})
1039 Deletes all value labels from @var{val_labs}.
1042 @deftypefun void val_labs_destroy (struct val_labs *@var{val_labs})
1043 Destroys @var{val_labs}, which must not be referenced again.
1046 @node Value Labels Properties
1047 @subsection Value Labels Properties
1049 These functions inspect and manipulate basic properties of
1050 @struct{val_labs} objects.
1052 @deftypefun size_t val_labs_count (const struct val_labs *@var{val_labs})
1053 Returns the number of value labels in @var{val_labs}.
1056 @deftypefun bool val_labs_can_set_width (const struct val_labs *@var{val_labs}, int @var{new_width})
1057 Tests whether @var{val_labs}'s width may be changed to @var{new_width}
1058 using @func{val_labs_set_width}. Returns true if it is allowed, false
1061 A set of value labels may be resized to a given width only if each
1062 value in it may be resized to that width, as determined by
1063 @func{value_is_resizable} (@pxref{value_is_resizable}).
1066 @deftypefun void val_labs_set_width (struct val_labs *@var{val_labs}, int @var{new_width})
1067 Changes the width of @var{val_labs}'s values to @var{new_width}, which
1068 must be a valid new width as determined by
1069 @func{val_labs_can_set_width}.
1072 @node Value Labels Adding and Removing Labels
1073 @subsection Adding and Removing Labels
1075 These functions add and remove value labels from a @struct{val_labs}
1078 @deftypefun bool val_labs_add (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1079 Adds @var{label} to @var{val_labs} as a label for @var{value},
1080 which must have the same width as the set of value labels. Returns
1081 true if successful, false if @var{value} already has a label.
1084 @deftypefun void val_labs_replace (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1085 Adds @var{label} to @var{val_labs} as a label for @var{value},
1086 which must have the same width as the set of value labels. If
1087 @var{value} already has a label in @var{val_labs}, it is replaced.
1090 @deftypefun bool val_labs_remove (struct val_labs *@var{val_labs}, union value @var{value})
1091 Removes from @var{val_labs} any label for @var{value}, which must have
1092 the same width as the set of value labels. Returns true if a label
1093 was removed, false otherwise.
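The differences among the contracts of @func{val_labs_add}, @func{val_labs_replace}, and @func{val_labs_remove} can be illustrated with a hypothetical fixed-capacity set of numeric value labels. The sketch stores label pointers without copying them and keeps entries unordered; neither detail is meant to describe PSPP's actual storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LABELS 16

/* Hypothetical set of numeric value labels. */
struct toy_val_labs
  {
    size_t n;
    double values[TOY_MAX_LABELS];
    const char *labels[TOY_MAX_LABELS];
  };

/* Returns the index of VALUE in VLS, or -1 if it has no label. */
int
toy_find_index (const struct toy_val_labs *vls, double value)
{
  for (size_t i = 0; i < vls->n; i++)
    if (vls->values[i] == value)
      return (int) i;
  return -1;
}

/* Like val_labs_add: refuses to overwrite an existing label. */
bool
toy_val_labs_add (struct toy_val_labs *vls, double value, const char *label)
{
  if (toy_find_index (vls, value) >= 0)
    return false;
  assert (vls->n < TOY_MAX_LABELS);
  vls->values[vls->n] = value;
  vls->labels[vls->n] = label;
  vls->n++;
  return true;
}

/* Like val_labs_replace: adds a label or overwrites the existing one. */
void
toy_val_labs_replace (struct toy_val_labs *vls, double value,
                      const char *label)
{
  int i = toy_find_index (vls, value);
  if (i >= 0)
    vls->labels[i] = label;
  else
    (void) toy_val_labs_add (vls, value, label);
}

/* Like val_labs_remove: returns true if a label was removed. */
bool
toy_val_labs_remove (struct toy_val_labs *vls, double value)
{
  int i = toy_find_index (vls, value);
  if (i < 0)
    return false;
  vls->n--;
  vls->values[i] = vls->values[vls->n];   /* Move last entry into hole. */
  vls->labels[i] = vls->labels[vls->n];
  return true;
}
```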
1096 @node Value Labels Iteration
1097 @subsection Iterating through Value Labels
1099 These functions allow iteration through the set of value labels
1100 represented by a @struct{val_labs} object. They may be used in the
1101 context of a @code{for} loop:
1104 struct val_labs *val_labs;
1105 const struct val_lab *vl;
1109 for (vl = val_labs_first (val_labs); vl != NULL;
1110 vl = val_labs_next (val_labs, vl))
1112 @dots{}@r{do something with @code{vl}}@dots{}
1116 Value labels must not be added to or deleted from a @struct{val_labs}
1117 while it is undergoing iteration.
1119 @deftypefun {const struct val_lab *} val_labs_first (const struct val_labs *@var{val_labs})
1120 Returns the first value label in @var{val_labs}, if it contains at
1121 least one value label, or a null pointer if it does not contain any
1125 @deftypefun {const struct val_lab *} val_labs_next (const struct val_labs *@var{val_labs}, const struct val_lab *@var{vl})
1126 Returns the value label in @var{val_labs} following @var{vl}, if
1127 @var{vl} is not the last value label in @var{val_labs}, or a null
1128 pointer if there are no value labels following @var{vl}.
1131 @deftypefun {const struct val_lab **} val_labs_sorted (const struct val_labs *@var{val_labs})
1132 Allocates and returns an array of pointers to value labels, which are
1133 sorted in increasing order by value. The array has
1134 @code{val_labs_count (@var{val_labs})} elements. The caller is
1135 responsible for freeing the array with @func{free} (but must not free
1136 any of the @struct{val_lab} elements that the array points to).
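The ownership rule for @func{val_labs_sorted} (free the array, never the entries) is easiest to see in a sketch that builds an array of pointers into existing storage and sorts it with the standard @code{qsort} function. The @code{toy_*} names are hypothetical; this is an illustration of the contract, not PSPP's code.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_val_lab
  {
    double value;
    const char *label;
  };

/* qsort comparator: orders pointers to labels by increasing value. */
static int
compare_by_value (const void *a_, const void *b_)
{
  const struct toy_val_lab *a = *(const struct toy_val_lab *const *) a_;
  const struct toy_val_lab *b = *(const struct toy_val_lab *const *) b_;
  return (a->value > b->value) - (a->value < b->value);
}

/* Returns a newly allocated array of N pointers into LABS, sorted by
   value, or a null pointer if allocation fails.  The caller frees the
   array with free(), but never the entries it points to. */
const struct toy_val_lab **
toy_val_labs_sorted (const struct toy_val_lab *labs, size_t n)
{
  const struct toy_val_lab **sorted = malloc (n * sizeof *sorted);
  if (sorted == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    sorted[i] = &labs[i];
  qsort (sorted, n, sizeof *sorted, compare_by_value);
  return sorted;
}
```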
1139 The iteration functions above work with pointers to @struct{val_lab},
1140 which is an opaque data structure that users of @struct{val_labs} must
1141 not modify or free directly. The following functions work with
1142 objects of this type:
1144 @deftypefun {const union value *} val_lab_get_value (const struct val_lab *@var{vl})
1145 Returns the value of value label @var{vl}. The caller must not modify
1146 or free the returned value. (To change the value of a label, remove the
1147 value label with @func{val_labs_remove}, then add the new value with
1148 @func{val_labs_add}.)
1150 The width of the returned value cannot be determined directly from
1151 @var{vl}. It may be obtained by calling @func{val_labs_get_width} on
1152 the @struct{val_labs} that @var{vl} is in.
1155 @deftypefun {const char *} val_lab_get_label (const struct val_lab *@var{vl})
1156 Returns the label in @var{vl} as a null-terminated string. The caller
1157 must not modify or free the returned string. (Use
1158 @func{val_labs_replace} to change a value label.)
1164 A PSPP variable is represented by @struct{variable}, an opaque type
1165 declared in @file{data/variable.h} along with related declarations.
1166 @xref{Variables,,,pspp, PSPP Users Guide}, for a description of PSPP
1167 variables from a user perspective.
1169 PSPP is unusual among computer languages in that, by itself, a PSPP
1170 variable does not have a value. Instead, a variable in PSPP takes on
1171 a value only in the context of a case, which supplies one value for
1172 each variable in a set of variables (@pxref{Cases}). The set of
1173 variables in a case, in turn, is ordinarily part of a dictionary
1174 (@pxref{Dictionaries}).
1176 Every variable has several attributes, most of which correspond
1177 directly to one of the variable attributes visible to PSPP users
1178 (@pxref{Attributes,,,pspp, PSPP Users Guide}).
1180 The following sections describe variable-related functions and macros.
1184 * Variable Type and Width::
1185 * Variable Missing Values::
1186 * Variable Value Labels::
1187 * Variable Print and Write Formats::
1189 * Variable GUI Attributes::
1190 * Variable Leave Status::
1191 * Dictionary Class::
1192 * Variable Creation and Destruction::
1193 * Variable Short Names::
1194 * Variable Relationships::
1195 * Variable Auxiliary Data::
1196 * Variable Categorical Values::
1200 @subsection Variable Name
1202 A variable name is a string between 1 and @code{ID_MAX_LEN} bytes
1203 long that satisfies the rules for PSPP identifiers
1204 (@pxref{Tokens,,,pspp, PSPP Users Guide}). Variable names are
1205 mixed-case and treated case-insensitively.
1207 @deftypefn Macro int ID_MAX_LEN
1208 Maximum length of a variable name, in bytes, currently 64.
1211 Only one commonly useful function relates to variable names:
1213 @deftypefun {const char *} var_get_name (const struct variable *@var{var})
1214 Returns @var{var}'s variable name as a C string.
1217 A few other functions are much more rarely used. Some of these
1218 functions are used internally by the dictionary implementation:
1220 @anchor{var_set_name}
1221 @deftypefun {void} var_set_name (struct variable *@var{var}, const char *@var{new_name})
1222 Changes the name of @var{var} to @var{new_name}, which must be a
1223 ``plausible'' name as defined below.
1225 This function cannot be applied to a variable that is part of a
1226 dictionary. Use @func{dict_rename_var} instead (@pxref{Dictionary
1227 Renaming Variables}).
1230 @deftypefun {enum dict_class} var_get_dict_class (const struct variable *@var{var})
1231 Returns the dictionary class of @var{var}'s name (@pxref{Dictionary
1235 @node Variable Type and Width
1236 @subsection Variable Type and Width
1238 A variable's type and width are the type and width of its values
1241 @deftypefun {enum val_type} var_get_type (const struct variable *@var{var})
1242 Returns the type of variable @var{var}.
1245 @deftypefun int var_get_width (const struct variable *@var{var})
1246 Returns the width of variable @var{var}.
1249 @deftypefun void var_set_width (struct variable *@var{var}, int @var{width})
1250 Sets the width of variable @var{var} to @var{width}. The width of a
1251 variable should not normally be changed after the variable is created,
1252 so this function is rarely used. This function cannot be applied to a
1253 variable that is part of a dictionary.
1256 @deftypefun bool var_is_numeric (const struct variable *@var{var})
1257 Returns true if @var{var} is a numeric variable, false otherwise.
1260 @deftypefun bool var_is_alpha (const struct variable *@var{var})
1261 Returns true if @var{var} is an alphanumeric (string) variable, false
1265 @node Variable Missing Values
1266 @subsection Variable Missing Values
1268 A numeric or short string variable may have a set of user-missing
1269 values (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}), represented
1270 as a @struct{missing_values} (@pxref{User-Missing Values}).
1272 The most frequent operation on a variable's missing values is to query
1273 whether a value is user- or system-missing:
1275 @deftypefun bool var_is_value_missing (const struct variable *@var{var}, const union value *@var{value}, enum mv_class @var{class})
1276 @deftypefunx bool var_is_num_missing (const struct variable *@var{var}, double @var{value}, enum mv_class @var{class})
1277 @deftypefunx bool var_is_str_missing (const struct variable *@var{var}, const char @var{value}[], enum mv_class @var{class})
1278 Tests whether @var{value} is a missing value of the given @var{class}
1279 for variable @var{var} and returns true if so, false otherwise.
1280 @func{var_is_num_missing} may only be applied to numeric variables;
1281 @func{var_is_str_missing} may only be applied to string variables.
1282 @var{value} must have been initialized with the same width as
1285 @code{var_is_@var{type}_missing (@var{var}, @var{value}, @var{class})}
1286 is equivalent to @code{mv_is_@var{type}_missing
1287 (var_get_missing_values (@var{var}), @var{value}, @var{class})}.
1290 In addition, a few functions are provided to work more directly with a
1291 variable's @struct{missing_values}:
1293 @deftypefun {const struct missing_values *} var_get_missing_values (const struct variable *@var{var})
1294 Returns the @struct{missing_values} associated with @var{var}. The
1295 caller must not modify the returned structure. The return value is
1299 @anchor{var_set_missing_values}
1300 @deftypefun {void} var_set_missing_values (struct variable *@var{var}, const struct missing_values *@var{miss})
1301 Changes @var{var}'s missing values to a copy of @var{miss}, or if
1302 @var{miss} is a null pointer, clears @var{var}'s missing values. If
1303 @var{miss} is non-null, it must have the same width as @var{var} or be
1304 resizable to @var{var}'s width (@pxref{mv_resize}). The caller
1305 retains ownership of @var{miss}.
1308 @deftypefun void var_clear_missing_values (struct variable *@var{var})
1309 Clears @var{var}'s missing values. Equivalent to
1310 @code{var_set_missing_values (@var{var}, NULL)}.
1313 @deftypefun bool var_has_missing_values (const struct variable *@var{var})
1314 Returns true if @var{var} has any missing values, false if it has
1315 none. Equivalent to @code{mv_is_empty (var_get_missing_values (@var{var}))}.
1318 @node Variable Value Labels
1319 @subsection Variable Value Labels
1321 A numeric or short string variable may have a set of value labels
1322 (@pxref{VALUE LABELS,,,pspp, PSPP Users Guide}), represented as a
1323 @struct{val_labs} (@pxref{Value Labels}). The most commonly useful
1324 functions for value labels return the value label associated with a
1327 @deftypefun {const char *} var_lookup_value_label (const struct variable *@var{var}, const union value *@var{value})
1328 Looks for a label for @var{value} in @var{var}'s set of value labels.
1329 @var{value} must have the same width as @var{var}. Returns the label
1330 if one exists, otherwise a null pointer.
1333 @deftypefun void var_append_value_name (const struct variable *@var{var}, const union value *@var{value}, struct string *@var{str})
1334 Looks for a label for @var{value} in @var{var}'s set of value labels.
1335 @var{value} must have the same width as @var{var}.
1336 If a label exists, it is appended to the string pointed to by @var{str}.
1337 Otherwise, @var{value} is formatted
1338 using @var{var}'s print format (@pxref{Input and Output Formats})
1339 and the formatted string is appended instead.
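The label-or-formatted-value fallback can be sketched as follows. This is a hypothetical approximation: @code{toy_append_value_name} is an invented name, @code{printf}-style @code{%.*f} formatting stands in for PSPP's print formats, and the result is appended to an ordinary C buffer rather than to a @struct{string}.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends to the null-terminated string in BUF (SIZE bytes in total)
   either LABEL, if it is non-null, or VALUE formatted with DECIMALS
   decimal places.  Output is truncated if BUF is too small. */
void
toy_append_value_name (const char *label, double value, int decimals,
                       char *buf, size_t size)
{
  size_t len = strlen (buf);
  assert (len < size);
  if (label != NULL)
    snprintf (buf + len, size - len, "%s", label);
  else
    snprintf (buf + len, size - len, "%.*f", decimals, value);
}
```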
1342 The underlying @struct{val_labs} structure may also be accessed
1343 directly using the functions described below.
1345 @deftypefun bool var_has_value_labels (const struct variable *@var{var})
1346 Returns true if @var{var} has at least one value label, false
1350 @deftypefun {const struct val_labs *} var_get_value_labels (const struct variable *@var{var})
1351 Returns the @struct{val_labs} associated with @var{var}. If @var{var}
1352 has no value labels, then the return value may or may not be a null
1355 The variable retains ownership of the returned @struct{val_labs},
1356 which the caller must not attempt to modify.
1359 @deftypefun void var_set_value_labels (struct variable *@var{var}, const struct val_labs *@var{val_labs})
1360 Replaces @var{var}'s value labels by a copy of @var{val_labs}. The
1361 caller retains ownership of @var{val_labs}. If @var{val_labs} is a
1362 null pointer, then @var{var}'s value labels, if any, are deleted.
1365 @deftypefun void var_clear_value_labels (struct variable *@var{var})
1366 Deletes @var{var}'s value labels. Equivalent to
1367 @code{var_set_value_labels (@var{var}, NULL)}.
1370 A final group of functions offers shorthands for operations that would
1371 otherwise require getting the value labels from a variable, copying
1372 them, modifying them, and then setting the modified value labels into
1373 the variable (making a second copy):
1375 @deftypefun bool var_add_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1376 Attempts to add a copy of @var{label} as a label for @var{value} for
1377 the given @var{var}. @var{value} must have the same width as
1378 @var{var}. If @var{value} already has a label, then the old label is
1379 retained. Returns true if a label is added, false if there was an
1380 existing label for @var{value}. Either way, the caller retains
1381 ownership of @var{value} and @var{label}.
1384 @deftypefun void var_replace_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1385 Adds a copy of @var{label} as a label for @var{value} for
1386 the given @var{var}. @var{value} must have the same width as
1387 @var{var}. If @var{value} already has a label, then
1388 @var{label} replaces the old label. Either way, the caller retains
1389 ownership of @var{value} and @var{label}.
1392 @node Variable Print and Write Formats
1393 @subsection Variable Print and Write Formats
1395 Each variable has an associated pair of output formats, called its
1396 @dfn{print format} and @dfn{write format}. @xref{Input and Output
1397 Formats,,,pspp, PSPP Users Guide}, for an introduction to formats.
1398 @xref{Input and Output Formats}, for a developer's description of
1399 format representation.
1401 The print format is used to convert a variable's data values to
1402 strings for human-readable output. The write format is used similarly
1403 for machine-readable output, primarily by the WRITE transformation
1404 (@pxref{WRITE,,,pspp, PSPP Users Guide}). Most often a variable's
1405 print and write formats are the same.
1407 A newly created variable by default has format F8.2 if it is numeric
1408 or an A format with the same width as the variable if it is string.
1409 Many creators of variables override these defaults.
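A sketch of the default choice, using a hypothetical @code{struct toy_fmt} (a type letter, a width, and a count of decimal places) in place of PSPP's @struct{fmt_spec}, whose definition is not shown here. As elsewhere in PSPP, a width of 0 denotes a numeric variable.

```c
#include <assert.h>

/* Hypothetical simplified format specification. */
struct toy_fmt
  {
    char type;   /* 'F' for numeric, 'A' for string. */
    int w;       /* Field width. */
    int d;       /* Decimal places. */
  };

/* Default format for a new variable of WIDTH: F8.2 if numeric,
   otherwise an A format as wide as the variable. */
struct toy_fmt
toy_default_format (int width)
{
  struct toy_fmt f;
  if (width == 0)
    {
      f.type = 'F';
      f.w = 8;
      f.d = 2;
    }
  else
    {
      f.type = 'A';
      f.w = width;
      f.d = 0;
    }
  return f;
}
```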
1411 Both the print format and write format are output formats. Input
1412 formats are not part of @struct{variable}. Instead, input programs
1413 and transformations keep track of variable input formats themselves.
1415 The following functions work with variable print and write formats.
1417 @deftypefun {const struct fmt_spec *} var_get_print_format (const struct variable *@var{var})
1418 @deftypefunx {const struct fmt_spec *} var_get_write_format (const struct variable *@var{var})
1419 Returns @var{var}'s print or write format, respectively.
1422 @deftypefun void var_set_print_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1423 @deftypefunx void var_set_write_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1424 @deftypefunx void var_set_both_formats (struct variable *@var{var}, const struct fmt_spec *@var{format})
1425 Sets @var{var}'s print format, write format, or both formats,
1426 respectively, to a copy of @var{format}.
1429 @node Variable Labels
1430 @subsection Variable Labels
1432 A variable label is a string that describes a variable. Variable
1433 labels may contain spaces and punctuation not allowed in variable
1434 names. @xref{VARIABLE LABELS,,,pspp, PSPP Users Guide}, for a
1435 user-level description of variable labels.
1437 The most commonly useful functions for variable labels are those to
1438 retrieve a variable's label:
1440 @deftypefun {const char *} var_to_string (const struct variable *@var{var})
1441 Returns @var{var}'s variable label, if it has one, otherwise
1442 @var{var}'s name. In either case the caller must not attempt to
1443 modify or free the returned string.
1445 This function is useful for user output.
1448 @deftypefun {const char *} var_get_label (const struct variable *@var{var})
1449 Returns @var{var}'s variable label, if it has one, or a null pointer
1453 A few other variable label functions are also provided:
1455 @deftypefun void var_set_label (struct variable *@var{var}, const char *@var{label})
1456 Sets @var{var}'s variable label to a copy of @var{label}, or removes
1457 any label from @var{var} if @var{label} is a null pointer or contains
1458 only spaces. Leading and trailing spaces are removed from the
1459 variable label and its remaining content is truncated at 255 bytes.
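The normalization that @func{var_set_label} performs can be sketched as a standalone function. @code{toy_normalize_label} is a hypothetical name. Like the description above, it truncates at a byte count, so it makes no attempt to avoid splitting a multibyte character.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_LABEL 255   /* Byte limit stated in the text. */

/* Strips leading and trailing spaces from LABEL and copies at most 255
   bytes of the rest into OUT.  Returns false, leaving OUT empty, if
   LABEL is a null pointer or contains only spaces. */
bool
toy_normalize_label (const char *label, char out[TOY_MAX_LABEL + 1])
{
  out[0] = '\0';
  if (label == NULL)
    return false;
  while (*label == ' ')
    label++;
  size_t len = strlen (label);
  while (len > 0 && label[len - 1] == ' ')
    len--;
  if (len == 0)
    return false;
  if (len > TOY_MAX_LABEL)
    len = TOY_MAX_LABEL;
  memcpy (out, label, len);
  out[len] = '\0';
  return true;
}
```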
1462 @deftypefun void var_clear_label (struct variable *@var{var})
1463 Removes any variable label from @var{var}.
1466 @deftypefun bool var_has_label (const struct variable *@var{var})
1467 Returns true if @var{var} has a variable label, false otherwise.
1470 @node Variable GUI Attributes
1471 @subsection GUI Attributes
1473 These functions and types access and set attributes that are mainly
1474 used by graphical user interfaces. Their values are also stored in
1475 and retrieved from system files (but not portable files).
1477 The first group of functions relates to the measurement level of
1478 numeric data. New variables are assigned a nominal level of
1479 measurement by default.
1481 @deftp {Enumeration} {enum measure}
1482 Measurement level. Available values are:
1485 @item MEASURE_NOMINAL
1486 Numeric data values are arbitrary. Arithmetic operations and
1487 numerical comparisons of such data are not meaningful.
1489 @item MEASURE_ORDINAL
1490 Numeric data values indicate progression along a rank order.
1491 Arbitrary arithmetic operations such as addition are not meaningful on
1492 such data, but inequality comparisons (less, greater, etc.) have
1493 straightforward interpretations.
1496 Ratios, sums, etc. of numeric data values have meaningful
1500 PSPP does not have a separate category for interval data, which would
1501 naturally fall between the ordinal and scale measurement levels.
1504 @deftypefun bool measure_is_valid (enum measure @var{measure})
1505 Returns true if @var{measure} is a valid level of measurement, that
1506 is, if it is one of the @code{enum measure} constants listed above,
1507 and false otherwise.
1510 @deftypefun {enum measure} var_get_measure (const struct variable *@var{var})
1511 @deftypefunx void var_set_measure (struct variable *@var{var}, enum measure @var{measure})
1512 Gets or sets @var{var}'s measurement level.
1515 The following set of functions relates to the width of on-screen
1516 columns used for displaying variable data in a graphical user
1517 interface environment. The unit of measurement is the width of a
1518 character. For proportionally spaced fonts, this is based on the
1519 average width of a character.
1521 @deftypefun int var_get_display_width (const struct variable *@var{var})
1522 @deftypefunx void var_set_display_width (struct variable *@var{var}, int @var{display_width})
1523 Gets or sets @var{var}'s display width.
1526 @anchor{var_default_display_width}
1527 @deftypefun int var_default_display_width (int @var{width})
1528 Returns the default display width for a variable with the given
1529 @var{width}. The default width of a numeric variable is 8. The
1530 default width of a string variable is @var{width} or 32, whichever is
1534 The final group of functions works with the justification of data when
1535 it is displayed in on-screen columns. New variables are by default
1538 @deftp {Enumeration} {enum alignment}
1539 Text justification. Possible values are @code{ALIGN_LEFT},
1540 @code{ALIGN_RIGHT}, and @code{ALIGN_CENTRE}.
1543 @deftypefun bool alignment_is_valid (enum alignment @var{alignment})
1544 Returns true if @var{alignment} is a valid alignment, that is, if it
1545 is one of the @code{enum alignment} constants listed above, and false
1549 @deftypefun {enum alignment} var_get_alignment (const struct variable *@var{var})
1550 @deftypefunx void var_set_alignment (struct variable *@var{var}, enum alignment @var{alignment})
1551 Gets or sets @var{var}'s alignment.
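The default display-width rule given for @func{var_default_display_width} amounts to a one-line function. @code{toy_default_display_width} is a hypothetical restatement of that documented rule, assuming the string case takes the smaller of the two widths; it is not PSPP's source.

```c
#include <assert.h>

/* Default on-screen display width, in characters, for a variable of
   WIDTH (0 = numeric): 8 for numeric, otherwise WIDTH capped at 32. */
int
toy_default_display_width (int width)
{
  if (width == 0)
    return 8;
  return width < 32 ? width : 32;
}
```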
1554 @node Variable Leave Status
1555 @subsection Variable Leave Status
1557 Commonly, most or all data in a case come from an input file, read
1558 with a command such as DATA LIST or GET, but data can also be
1559 generated with transformations such as COMPUTE. In the latter case
1560 the question of a datum's ``initial value'' can arise. For example,
1561 the value of a piece of generated data can recursively depend on its
1566 Another situation where the initial value of a variable arises is when
1567 its value is not set at all for some cases, e.g.@: below, @code{Y} is
1568 set only for the first 10 cases:
1570 DO IF $CASENUM <= 10.
1575 By default, the initial value of a datum in either of these situations
1576 is the system-missing value for numeric values and spaces for string
1577 values. This means that, above, X would be system-missing and that Y
1578 would be 1 for the first 10 cases and system-missing for the
1581 PSPP also supports retaining the value of a variable from one case to
1582 another, using the LEAVE command (@pxref{LEAVE,,,pspp, PSPP Users
1583 Guide}). The initial value of such a variable is 0 if it is numeric
1584 and spaces if it is a string. If the command @samp{LEAVE X Y} is
1585 appended to the above example, then X would have value 1 in the first
1586 case and increase by 1 in every succeeding case, and Y would have
1587 value 1 for the first 10 cases and retain that value for later cases.
1589 The LEAVE command has no effect on data that comes from an input file
1590 or whose values do not depend on a variable's initial value.
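As a rough illustration of these rules, the C sketch below simulates the
two examples for a run of cases, with and without @samp{LEAVE X Y},
assuming the X example is @samp{COMPUTE X = X + 1.}. It is not PSPP
code: @code{simulate_leave} and @code{TOY_SYSMIS} are hypothetical names,
and representing the system-missing value as @code{-DBL_MAX} is an
assumption of the sketch. With @samp{LEAVE X Y}, nothing resets Y after
the tenth case, so in the sketch Y stays 1 from then on.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical stand-in for SYSMIS; the value is an assumption. */
#define TOY_SYSMIS (-DBL_MAX)

/* Simulates, over N_CASES cases, "X = X + 1" in every case and "Y = 1"
   only in the first 10 cases, storing each case's X and Y in XS[] and
   YS[].  With LEAVE true, X and Y start at 0 and keep their values
   between cases; otherwise both are reset to system-missing before
   every case. */
static void
simulate_leave (bool leave, int n_cases, double xs[], double ys[])
{
  double x = 0.0, y = 0.0;          /* initial values under LEAVE */

  assert (n_cases >= 0);
  for (int casenum = 1; casenum <= n_cases; casenum++)
    {
      if (!leave)
        x = y = TOY_SYSMIS;         /* default: reinitialize each case */

      /* System-missing propagates through arithmetic. */
      x = (x == TOY_SYSMIS) ? TOY_SYSMIS : x + 1.0;
      if (casenum <= 10)
        y = 1.0;

      xs[casenum - 1] = x;
      ys[casenum - 1] = y;
    }
}
```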
The values of scratch variables (@pxref{Scratch Variables,,,pspp, PSPP
Users Guide}) are always left from one case to the next.
1595 The following functions work with a variable's leave status.
1597 @deftypefun bool var_get_leave (const struct variable *@var{var})
1598 Returns true if @var{var}'s value is to be retained from case to case,
1599 false if it is reinitialized to system-missing or spaces.
1602 @deftypefun void var_set_leave (struct variable *@var{var}, bool @var{leave})
1603 If @var{leave} is true, marks @var{var} to be left from case to case;
if @var{leave} is false, marks @var{var} to be reinitialized for each
case.
1607 If @var{var} is a scratch variable, @var{leave} must be true.
1610 @deftypefun bool var_must_leave (const struct variable *@var{var})
1611 Returns true if @var{var} must be left from case to case, that is, if
1612 @var{var} is a scratch variable.
1615 @node Dictionary Class
1616 @subsection Dictionary Class
1618 Occasionally it is useful to classify variables into @dfn{dictionary
1619 classes} based on their names. Dictionary classes are represented by
1620 @enum{dict_class}. This type and other declarations for dictionary
1621 classes are in the @file{<data/dict-class.h>} header.
1623 @deftp {Enumeration} {enum dict_class}
1624 The dictionary classes are:
An ordinary variable, one whose name does not begin with @samp{$} or
@samp{#}.
1632 A system variable, one whose name begins with @samp{$}. @xref{System
1633 Variables,,,pspp, PSPP Users Guide}.
1636 A scratch variable, one whose name begins with @samp{#}.
1637 @xref{Scratch Variables,,,pspp, PSPP Users Guide}.
1640 The values for dictionary classes are bitwise disjoint, which allows
1641 them to be used in bit-masks. An extra enumeration constant
1642 @code{DC_ALL}, whose value is the bitwise-@i{or} of all of the above
1643 constants, is provided to aid in this purpose.
1646 One example use of dictionary classes arises in connection with PSPP
1647 syntax that uses @code{@var{a} TO @var{b}} to name the variables in a
1648 dictionary from @var{a} to @var{b} (@pxref{Sets of Variables,,,pspp,
1649 PSPP Users Guide}). This syntax requires @var{a} and @var{b} to be in
1650 the same dictionary class. It limits the variables that it includes
1651 to those in that dictionary class.
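To make the bit-mask convention concrete, here is a small self-contained
C sketch that classifies names by their first character, as
@code{dict_class_from_id} is described as doing, and then filters a list
of names with a class mask. The @code{TOY_DC_*} constants and their
values are assumptions standing in for the real @code{enum dict_class};
only their bitwise disjointness matters here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of enum dict_class; the bit values are assumed. */
enum toy_dict_class
  {
    TOY_DC_ORDINARY = 1 << 0,
    TOY_DC_SYSTEM = 1 << 1,
    TOY_DC_SCRATCH = 1 << 2,
    TOY_DC_ALL = TOY_DC_ORDINARY | TOY_DC_SYSTEM | TOY_DC_SCRATCH
  };

/* Classifies a variable name by its first character. */
static enum toy_dict_class
toy_dict_class_from_id (const char *name)
{
  assert (name != NULL);
  switch (name[0])
    {
    case '$':
      return TOY_DC_SYSTEM;
    case '#':
      return TOY_DC_SCRATCH;
    default:
      return TOY_DC_ORDINARY;
    }
}

/* Counts the N names whose class is included in the bit-mask CLASSES. */
static size_t
toy_count_in_classes (const char *const names[], size_t n, unsigned classes)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (toy_dict_class_from_id (names[i]) & classes)
      count++;
  return count;
}
```

Because the classes are disjoint bits, a mask such as
@code{TOY_DC_ALL & ~TOY_DC_SCRATCH} selects every class except one.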
1653 The following functions relate to dictionary classes.
1655 @deftypefun {enum dict_class} dict_class_from_id (const char *@var{name})
Returns the dictionary class for the given variable @var{name}, by
looking at its first character.
1660 @deftypefun {const char *} dict_class_to_name (enum dict_class @var{dict_class})
Returns a name for the given @var{dict_class} as an adjective, e.g.@:
``scratch''.
1664 This function should probably not be used in new code as it can lead
1665 to difficulties for internationalization.
1668 @node Variable Creation and Destruction
1669 @subsection Variable Creation and Destruction
1671 Only rarely should PSPP code create or destroy variables directly.
Ordinarily, variables are created within a dictionary and destroyed
either individually, by deleting them from the dictionary, or all at
once, by destroying the entire dictionary. The functions here enable
the exceptional case of creating and destroying variables that are not
associated with any dictionary. These functions are used internally in
the dictionary implementation.
1680 @deftypefun {struct variable *} var_create (const char *@var{name}, int @var{width})
1681 Creates and returns a new variable with the given @var{name} and
1682 @var{width}. The new variable is not part of any dictionary. Use
1683 @func{dict_create_var}, instead, to create a variable in a dictionary
1684 (@pxref{Dictionary Creating Variables}).
1686 @var{name} should be a valid variable name and must be a ``plausible''
1687 variable name (@pxref{Variable Name}). @var{width} must be between 0
1688 and @code{MAX_STRING}, inclusive (@pxref{Values}).
1690 The new variable has no user-missing values, value labels, or variable
1691 label. Numeric variables initially have F8.2 print and write formats,
1692 right-justified display alignment, and scale level of measurement.
1693 String variables are created with A print and write formats,
1694 left-justified display alignment, and nominal level of measurement.
1695 The initial display width is determined by
1696 @func{var_default_display_width} (@pxref{var_default_display_width}).
1698 The new variable initially has no short name (@pxref{Variable Short
1699 Names}) and no auxiliary data (@pxref{Variable Auxiliary Data}).
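The default attributes listed above depend only on whether @var{width}
is 0. The following C sketch restates them as a hypothetical helper,
@code{toy_default_attributes}; it is not part of PSPP, it assumes that a
string variable's A format has the same width as the variable, and it
does not model the display width chosen by
@code{var_default_display_width}.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_alignment { TOY_ALIGN_LEFT, TOY_ALIGN_RIGHT };
enum toy_measure { TOY_MEASURE_NOMINAL, TOY_MEASURE_SCALE };

/* Hypothetical bundle of the defaults that var_create() is documented
   to assign. */
struct toy_var_defaults
  {
    char format[16];            /* print and write format, e.g. "F8.2" */
    enum toy_alignment align;
    enum toy_measure measure;
  };

static struct toy_var_defaults
toy_default_attributes (int width)
{
  struct toy_var_defaults d;

  assert (width >= 0 && width <= 32767);   /* 0 to MAX_STRING */
  if (width == 0)
    {
      /* Numeric: F8.2, right-justified, scale. */
      strcpy (d.format, "F8.2");
      d.align = TOY_ALIGN_RIGHT;
      d.measure = TOY_MEASURE_SCALE;
    }
  else
    {
      /* String: A format (width assumed to match), left-justified,
         nominal. */
      snprintf (d.format, sizeof d.format, "A%d", width);
      d.align = TOY_ALIGN_LEFT;
      d.measure = TOY_MEASURE_NOMINAL;
    }
  return d;
}
```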
1703 @deftypefun {struct variable *} var_clone (const struct variable *@var{old_var})
1704 Creates and returns a new variable with the same attributes as
1705 @var{old_var}, with a few exceptions. First, the new variable is not
1706 part of any dictionary, regardless of whether @var{old_var} was in a
1707 dictionary. Use @func{dict_clone_var}, instead, to add a clone of a
1708 variable to a dictionary.
1710 Second, the new variable is not given any short name, even if
1711 @var{old_var} had a short name. This is because the new variable is
1712 likely to be immediately renamed, in which case the short name would
1713 be incorrect (@pxref{Variable Short Names}).
1715 Finally, @var{old_var}'s auxiliary data, if any, is not copied to the
1716 new variable (@pxref{Variable Auxiliary Data}).
1719 @deftypefun {void} var_destroy (struct variable *@var{var})
1720 Destroys @var{var} and frees all associated storage, including its
1721 auxiliary data, if any. @var{var} must not be part of a dictionary.
1722 To delete a variable from a dictionary and destroy it, use
1723 @func{dict_delete_var} (@pxref{Dictionary Deleting Variables}).
1726 @node Variable Short Names
1727 @subsection Variable Short Names
1729 PSPP variable names may be up to 64 (@code{ID_MAX_LEN}) bytes long.
1730 The system and portable file formats, however, were designed when
1731 variable names were limited to 8 bytes in length. Since then, the
1732 system file format has been augmented with an extension record that
1733 explains how the 8-byte short names map to full-length names
1734 (@pxref{Long Variable Names Record}), but the short names are still
present. The short names are thus more or less invisible to PSPP
users, but every variable in a system file still has a short name,
which must be unique.
1739 PSPP can generate unique short names for variables based on their full
1740 names at the time it creates the data file. If all variables' full
1741 names are unique in their first 8 bytes, then the short names are
1742 simply prefixes of the full names; otherwise, PSPP changes them so
1743 that they are unique.
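The C sketch below illustrates one way to produce such names. It
truncates and uppercases each long name, as @code{var_set_short_name} is
described as doing, and resolves collisions with a @samp{_@var{k}}
suffix. This collision scheme and the @code{toy_} names are hypothetical,
invented for the illustration rather than taken from PSPP, and the sketch
ignores the priority rules for names read from data files that are
described below.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define TOY_SHORT_NAME_LEN 8   /* mirrors SHORT_NAME_LEN */

/* Copies at most TOY_SHORT_NAME_LEN bytes of NAME into OUT, converted
   to uppercase. */
static void
toy_normalize_short_name (const char *name, char out[TOY_SHORT_NAME_LEN + 1])
{
  size_t i;
  for (i = 0; i < TOY_SHORT_NAME_LEN && name[i] != '\0'; i++)
    out[i] = (char) toupper ((unsigned char) name[i]);
  out[i] = '\0';
}

/* Returns nonzero if S equals one of the first N_USED entries of USED. */
static int
toy_short_name_used (char used[][TOY_SHORT_NAME_LEN + 1], size_t n_used,
                     const char *s)
{
  for (size_t i = 0; i < n_used; i++)
    if (strcmp (used[i], s) == 0)
      return 1;
  return 0;
}

/* Assigns a unique short name to each of the N long names in NAMES.  A
   name whose 8-byte prefix is still free keeps that prefix; otherwise
   its tail is replaced by a hypothetical "_<k>" suffix. */
static void
toy_assign_short_names (const char *const names[], size_t n,
                        char short_names[][TOY_SHORT_NAME_LEN + 1])
{
  for (size_t i = 0; i < n; i++)
    {
      char base[TOY_SHORT_NAME_LEN + 1];
      char *s = short_names[i];

      toy_normalize_short_name (names[i], base);
      strcpy (s, base);
      for (int k = 1; toy_short_name_used (short_names, i, s); k++)
        {
          char suffix[TOY_SHORT_NAME_LEN + 1];
          int len = snprintf (suffix, sizeof suffix, "_%d", k);
          assert (len > 0 && len < TOY_SHORT_NAME_LEN);

          /* Keep as much of BASE as fits in front of the suffix. */
          char trimmed[TOY_SHORT_NAME_LEN + 1];
          strcpy (trimmed, base);
          trimmed[TOY_SHORT_NAME_LEN - len] = '\0';
          snprintf (s, TOY_SHORT_NAME_LEN + 1, "%s%s", trimmed, suffix);
        }
    }
}
```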
1745 By itself this algorithm interoperates well with other software that
1746 can read system files, as long as that software understands the
1747 extension record that maps short names to long names. When the other
1748 software does not understand the extension record, it can produce
1749 surprising results. Consider a situation where PSPP reads a system
file that contains a variable named RANKINGSCORE, then the user
1751 adds a new variable named RANKINGSTATUS, then saves the modified data
1752 as a new system file. A program that does not understand long names
1753 would then see one of these variables under the name RANKINGS---either
1754 one, depending on the algorithm's details---and the other under a
1755 different name. The effect could be very confusing: by adding a new
1756 and apparently unrelated variable in PSPP, the user effectively
1757 renamed the existing variable.
1759 To counteract this potential problem, every @struct{variable} may have
1760 a short name. A variable created by the system or portable file
1761 reader receives the short name from that data file. When a variable
with a short name is written to a system or portable file, that
variable receives priority over other variables whose long names begin
with the same 8 bytes but that were not read from a data file under
that short name.
1767 Variables not created by the system or portable file reader have no
1768 short name by default.
1770 A variable with a full name of 8 bytes or less in length has absolute
1771 priority for that name when the variable is written to a system file,
1772 even over a second variable with that assigned short name.
1774 PSPP does not enforce uniqueness of short names, although the short
1775 names read from any given data file will always be unique. If two
1776 variables with the same short name are written to a single data file,
1777 neither one receives priority.
1779 The following macros and functions relate to short names.
1781 @defmac SHORT_NAME_LEN
1782 Maximum length of a short name, in bytes. Its value is 8.
1785 @deftypefun {const char *} var_get_short_name (const struct variable *@var{var})
1786 Returns @var{var}'s short name, or a null pointer if @var{var} has not
1787 been assigned a short name.
1790 @deftypefun void var_set_short_name (struct variable *@var{var}, const char *@var{short_name})
1791 Sets @var{var}'s short name to @var{short_name}, or removes
1792 @var{var}'s short name if @var{short_name} is a null pointer. If it
1793 is non-null, then @var{short_name} must be a plausible name for a
variable. The name is truncated to 8 bytes and converted to
uppercase.
1798 @deftypefun void var_clear_short_name (struct variable *@var{var})
1799 Removes @var{var}'s short name.
1802 @node Variable Relationships
1803 @subsection Variable Relationships
1805 Variables have close relationships with dictionaries
1806 (@pxref{Dictionaries}) and cases (@pxref{Cases}). A variable is
1807 usually a member of some dictionary, and a case is often used to store
1808 data for the set of variables in a dictionary.
1810 These functions report on these relationships. They may be applied
1811 only to variables that are in a dictionary.
1813 @deftypefun size_t var_get_dict_index (const struct variable *@var{var})
1814 Returns @var{var}'s index within its dictionary. The first variable
1815 in a dictionary has index 0, the next variable index 1, and so on.
1817 The dictionary index can be influenced using dictionary functions such
as @func{dict_reorder_var} (@pxref{dict_reorder_var}).
1821 @deftypefun size_t var_get_case_index (const struct variable *@var{var})
1822 Returns @var{var}'s index within a case. The case index is an index
into an array of @union{value} large enough to contain all the data in
the dictionary.
1826 The returned case index can be used to access the value of @var{var}
within a case for its dictionary, as in @code{case_data_idx
(case, var_get_case_index (@var{var}))}, but ordinarily it is more
convenient to use the data access functions that do variable-to-index
translation internally, as in @code{case_data (case, @var{var})}.
1834 @node Variable Auxiliary Data
1835 @subsection Variable Auxiliary Data
1837 Each @struct{variable} can have a single pointer to auxiliary data of
type @code{void *}. These functions manipulate a variable's auxiliary
data.
1841 Use of auxiliary data is discouraged because of its lack of
1842 flexibility. Only one client can make use of auxiliary data on a
1843 given variable at any time, even though many clients could usefully
1844 associate data with a variable.
1846 To prevent multiple clients from attempting to use a variable's single
1847 auxiliary data field at the same time, we adopt the convention that
1848 use of auxiliary data in the active dataset dictionary is restricted to
1849 the currently executing command. In particular, transformations must
1850 not attach auxiliary data to a variable in the active dataset in the
1851 expectation that it can be used later when the active dataset is read and
1852 the transformation is executed. To help enforce this restriction,
1853 auxiliary data is deleted from all variables in the active dataset
1854 dictionary after the execution of each PSPP command.
1856 This convention for safe use of auxiliary data applies only to the
1857 active dataset dictionary. Rules for other dictionaries may be
1858 established separately.
1860 Auxiliary data should be replaced by a more flexible mechanism at some
point, but no replacement mechanism has been designed or implemented
yet.
1864 The following functions work with variable auxiliary data.
1866 @deftypefun {void *} var_get_aux (const struct variable *@var{var})
Returns @var{var}'s auxiliary data, or a null pointer if none has been
assigned.
1871 @deftypefun {void *} var_attach_aux (const struct variable *@var{var}, void *@var{aux}, void (*@var{aux_dtor}) (struct variable *))
1872 Sets @var{var}'s auxiliary data to @var{aux}, which must not be null.
1873 @var{var} must not already have auxiliary data.
1875 Before @var{var}'s auxiliary data is cleared by @code{var_clear_aux},
1876 @var{aux_dtor}, if non-null, will be called with @var{var} as its
1877 argument. It should free any storage associated with @var{aux}, if
necessary. @code{var_dtor_free} may be appropriate for use as
@var{aux_dtor}:
1881 @deffn {Function} void var_dtor_free (struct variable *@var{var})
1882 Frees @var{var}'s auxiliary data by calling @code{free}.
1886 @deftypefun void var_clear_aux (struct variable *@var{var})
1887 Removes auxiliary data, if any, from @var{var}, first calling the
1888 destructor passed to @code{var_attach_aux}, if one was provided.
1890 Use @code{dict_clear_aux} to remove auxiliary data from every variable
1891 in a dictionary. @c (@pxref{dict_clear_aux}).
1894 @deftypefun {void *} var_detach_aux (struct variable *@var{var})
1895 Removes auxiliary data, if any, from @var{var}, and returns it.
1896 Returns a null pointer if @var{var} had no auxiliary data.
1898 Any destructor passed to @code{var_attach_aux} is not called, so the
caller is responsible for freeing storage associated with the returned
pointer.
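The attach, clear, and detach rules for auxiliary data can be summarized
in a few lines of C. This is a hypothetical stand-in (@code{struct
toy_var} and the @code{toy_} functions are not PSPP's), and it assumes
that @code{var_attach_aux} returns its @var{aux} argument, which the
descriptions above do not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_var;
typedef void toy_aux_dtor (struct toy_var *);

/* Hypothetical stand-in for the auxiliary-data fields of a variable. */
struct toy_var
  {
    void *aux;                  /* auxiliary data, or NULL */
    toy_aux_dtor *aux_dtor;     /* destructor for AUX, or NULL */
  };

/* Attaches AUX (non-null) to V, which must not already have auxiliary
   data.  Returning AUX is an assumption of this sketch. */
static void *
toy_var_attach_aux (struct toy_var *v, void *aux, toy_aux_dtor *dtor)
{
  assert (v->aux == NULL && aux != NULL);
  v->aux = aux;
  v->aux_dtor = dtor;
  return aux;
}

/* Destructor that frees the auxiliary data with free(). */
static void
toy_var_dtor_free (struct toy_var *v)
{
  free (v->aux);
}

/* Calls the destructor, if any, then forgets the auxiliary data. */
static void
toy_var_clear_aux (struct toy_var *v)
{
  if (v->aux != NULL && v->aux_dtor != NULL)
    v->aux_dtor (v);
  v->aux = NULL;
  v->aux_dtor = NULL;
}

/* Forgets the auxiliary data without calling the destructor and
   returns it; the caller now owns it. */
static void *
toy_var_detach_aux (struct toy_var *v)
{
  void *aux = v->aux;
  v->aux = NULL;
  v->aux_dtor = NULL;
  return aux;
}
```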
1903 @node Variable Categorical Values
1904 @subsection Variable Categorical Values
1906 Some statistical procedures require a list of all the values that a
1907 categorical variable takes on. Arranging such a list requires making
a pass through the data, so PSPP caches categorical values in
@struct{variable}.
1911 When variable auxiliary data is revamped to support multiple clients
1912 as described in the previous section, categorical values are an
1913 obvious candidate. The form in which they are currently supported is
1916 Categorical values are not robust against changes in the data. That
1917 is, there is currently no way to detect that a transformation has
changed data values and that the categorical value lists for the
changed variables must therefore be recomputed. PSPP is in fact in need of a
1920 general-purpose caching and cache-invalidation mechanism, but none
1921 has yet been designed and built.
1923 The following functions work with cached categorical values.
1925 @deftypefun {struct cat_vals *} var_get_obs_vals (const struct variable *@var{var})
1926 Returns @var{var}'s set of categorical values. Yields undefined
1927 behavior if @var{var} does not have any categorical values.
1930 @deftypefun void var_set_obs_vals (const struct variable *@var{var}, struct cat_vals *@var{cat_vals})
1931 Destroys @var{var}'s categorical values, if any, and replaces them by
1932 @var{cat_vals}, ownership of which is transferred to @var{var}. If
@var{cat_vals} is a null pointer, then @var{var}'s categorical values
are cleared.
1937 @deftypefun bool var_has_obs_vals (const struct variable *@var{var})
Returns true if @var{var} has a set of categorical values, false
otherwise.
1943 @section Dictionaries
1945 Each data file in memory or on disk has an associated dictionary,
1946 whose primary purpose is to describe the data in the file.
@xref{Variables,,,pspp, PSPP Users Guide}, for a PSPP user's view of a
dictionary.
1950 A data file stored in a PSPP format, either as a system or portable
1951 file, has a representation of its dictionary embedded in it. Other
1952 kinds of data files are usually not self-describing enough to
1953 construct a dictionary unassisted, so the dictionaries for these files
must be specified explicitly with PSPP commands such as @cmd{DATA
LIST}.
1957 The most important content of a dictionary is an array of variables,
1958 which must have unique names. A dictionary also conceptually contains
1959 a mapping from each of its variables to a location within a case
1960 (@pxref{Cases}), although in fact these mappings are stored within
1961 individual variables.
1963 System variables are not members of any dictionary (@pxref{System
1964 Variables,,,pspp, PSPP Users Guide}).
1966 Dictionaries are represented by @struct{dictionary}. Declarations
1967 related to dictionaries are in the @file{<data/dictionary.h>} header.
1969 The following sections describe functions for use with dictionaries.
1972 * Dictionary Variable Access::
1973 * Dictionary Creating Variables::
1974 * Dictionary Deleting Variables::
1975 * Dictionary Reordering Variables::
1976 * Dictionary Renaming Variables::
1977 * Dictionary Weight Variable::
1978 * Dictionary Filter Variable::
1979 * Dictionary Case Limit::
1980 * Dictionary Split Variables::
1981 * Dictionary File Label::
1982 * Dictionary Documents::
1985 @node Dictionary Variable Access
1986 @subsection Accessing Variables
1988 The most common operations on a dictionary simply retrieve a
@code{struct variable *} of an individual variable based on its name
or its position.
1992 @deftypefun {struct variable *} dict_lookup_var (const struct dictionary *@var{dict}, const char *@var{name})
1993 @deftypefunx {struct variable *} dict_lookup_var_assert (const struct dictionary *@var{dict}, const char *@var{name})
1994 Looks up and returns the variable with the given @var{name} within
1995 @var{dict}. Name lookup is not case-sensitive.
1997 @code{dict_lookup_var} returns a null pointer if @var{dict} does not
1998 contain a variable named @var{name}. @code{dict_lookup_var_assert}
1999 asserts that such a variable exists.
2002 @deftypefun {struct variable *} dict_get_var (const struct dictionary *@var{dict}, size_t @var{position})
2003 Returns the variable at the given @var{position} in @var{dict}.
@var{position} must be less than the number of variables in @var{dict}.
2008 @deftypefun size_t dict_get_var_cnt (const struct dictionary *@var{dict})
2009 Returns the number of variables in @var{dict}.
2012 Another pair of functions allows retrieving a number of variables at
2013 once. These functions are more rarely useful.
2015 @deftypefun void dict_get_vars (const struct dictionary *@var{dict}, const struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2016 @deftypefunx void dict_get_vars_mutable (const struct dictionary *@var{dict}, struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2017 Retrieves all of the variables in @var{dict}, in their original order,
except that any variables in the dictionary classes specified by
@var{exclude}, if any, are excluded (@pxref{Dictionary Class}).
2020 Pointers to the variables are stored in an array allocated with
2021 @code{malloc}, and a pointer to the first element of this array is
2022 stored in @code{*@var{vars}}. The caller is responsible for freeing
2023 this memory when it is no longer needed. The number of variables
2024 retrieved is stored in @code{*@var{cnt}}.
2026 The presence or absence of @code{DC_SYSTEM} in @var{exclude} has no
2027 effect, because dictionaries never include system variables.
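A sketch of the filtering that @code{dict_get_vars} performs, under the
simplifying assumption that variables can be represented by their names
alone: the hypothetical @code{toy_get_vars} keeps the names whose class
is not in @var{exclude}, preserves their order, and returns a
@code{malloc}ed array that the caller must free. The class constants
are assumed values, not PSPP's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical class bits, standing in for enum dict_class. */
enum { TOY_DC_ORDINARY = 1, TOY_DC_SYSTEM = 2, TOY_DC_SCRATCH = 4 };

static int
toy_class_of (const char *name)
{
  if (name[0] == '#')
    return TOY_DC_SCRATCH;
  if (name[0] == '$')
    return TOY_DC_SYSTEM;
  return TOY_DC_ORDINARY;
}

/* Stores in *VARS a malloc'd array of those of the N NAMES whose class
   is not in EXCLUDE, in their original order, and the number kept in
   *CNT.  The caller must free *VARS.  On allocation failure, *VARS is
   NULL and *CNT is 0. */
static void
toy_get_vars (const char *const names[], size_t n,
              const char ***vars, size_t *cnt, int exclude)
{
  assert (vars != NULL && cnt != NULL);
  *cnt = 0;
  *vars = malloc ((n > 0 ? n : 1) * sizeof **vars);
  if (*vars == NULL)
    return;
  for (size_t i = 0; i < n; i++)
    if (!(toy_class_of (names[i]) & exclude))
      (*vars)[(*cnt)++] = names[i];
}
```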
2030 One additional function is available. This function is most often
2031 used in assertions, but it is not restricted to such use.
2033 @deftypefun bool dict_contains_var (const struct dictionary *@var{dict}, const struct variable *@var{var})
2034 Tests whether @var{var} is one of the variables in @var{dict}.
2035 Returns true if so, false otherwise.
2038 @node Dictionary Creating Variables
2039 @subsection Creating Variables
These functions create a new variable and insert it into a dictionary.
2044 There is no provision for inserting an already created variable into a
2045 dictionary. There is no reason that such a function could not be
2046 written, but so far there has been no need for one.
2048 The names provided to one of these functions should be valid variable
2049 names and must be plausible variable names. @c (@pxref{Variable Names}).
2051 If a variable with the same name already exists in the dictionary, the
2052 non-@code{assert} variants of these functions return a null pointer,
2053 without modifying the dictionary. The @code{assert} variants, on the
2054 other hand, assert that no duplicate name exists.
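The following C sketch, built on a hypothetical array-of-names
dictionary, shows two behaviors described in this chapter side by side:
a name lookup that ignores case, as for @code{dict_lookup_var}, and
creation that fails without modifying the dictionary when the name is
already present, as for the non-@code{assert} @code{dict_create_var}.
It only illustrates the semantics: @struct{dictionary} keeps an index by
name rather than searching linearly, and this sketch folds only ASCII
case.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature dictionary holding only variable names. */
struct toy_dict
  {
    const char *names[16];
    size_t n;
  };

/* Compares A and B for equality, ignoring ASCII case. */
static bool
toy_name_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return false;
  return *a == *b;
}

/* Like dict_lookup_var(): returns NAME's position in D, or -1. */
static long
toy_dict_lookup (const struct toy_dict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (toy_name_equal (d->names[i], name))
      return (long) i;
  return -1;
}

/* Like the non-assert dict_create_var(): appends NAME and returns true,
   or returns false without changing D if NAME is already present (or,
   in this toy, if D is full). */
static bool
toy_dict_create (struct toy_dict *d, const char *name)
{
  assert (name != NULL);
  if (toy_dict_lookup (d, name) >= 0
      || d->n >= sizeof d->names / sizeof d->names[0])
    return false;
  d->names[d->n++] = name;
  return true;
}
```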
2056 A variable may be in only one dictionary at any given time.
2058 @deftypefun {struct variable *} dict_create_var (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2059 @deftypefunx {struct variable *} dict_create_var_assert (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2060 Creates a new variable with the given @var{name} and @var{width}, as
2061 if through a call to @code{var_create} with those arguments
2062 (@pxref{var_create}), appends the new variable to @var{dict}'s array
2063 of variables, and returns the new variable.
2066 @deftypefun {struct variable *} dict_clone_var (struct dictionary *@var{dict}, const struct variable *@var{old_var})
2067 @deftypefunx {struct variable *} dict_clone_var_assert (struct dictionary *@var{dict}, const struct variable *@var{old_var})
Creates a new variable as a clone of @var{old_var}, inserts the new
variable into @var{dict}, and returns the new variable. The new
variable's properties are copied from @var{old_var}, except
2071 for those not copied by @code{var_clone} (@pxref{var_clone}).
@var{old_var} does not need to be a member of any dictionary.
2076 @deftypefun {struct variable *} dict_clone_var_as (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2077 @deftypefunx {struct variable *} dict_clone_var_as_assert (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2078 These functions are similar to @code{dict_clone_var} and
2079 @code{dict_clone_var_assert}, respectively, except that the new
2080 variable is named @var{name} instead of keeping @var{old_var}'s name.
2083 @node Dictionary Deleting Variables
2084 @subsection Deleting Variables
2086 These functions remove variables from a dictionary's array of
variables. They also destroy the removed variables and free their
storage.
2090 Deleting a variable to which there might be external pointers is a bad
2091 idea. In particular, deleting variables from the active dataset
2092 dictionary is a risky proposition, because transformations can retain
2093 references to arbitrary variables. Therefore, no variable should be
2094 deleted from the active dataset dictionary when any transformations are
2095 active, because those transformations might reference the variable to
2096 be deleted. The safest time to delete a variable is just after a
2097 procedure has been executed, as done by @cmd{DELETE VARIABLES}.
2099 Deleting a variable automatically removes references to that variable
2100 from elsewhere in the dictionary as a weighting variable, filter
2101 variable, @cmd{SPLIT FILE} variable, or member of a vector.
2103 No functions are provided for removing a variable from a dictionary
2104 without destroying that variable. As with insertion of an existing
2105 variable, there is no reason that this could not be implemented, but
2106 so far there has been no need.
2108 @deftypefun void dict_delete_var (struct dictionary *@var{dict}, struct variable *@var{var})
2109 Deletes @var{var} from @var{dict}, of which it must be a member.
2112 @deftypefun void dict_delete_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{count})
2113 Deletes the @var{count} variables in array @var{vars} from @var{dict}.
2114 All of the variables in @var{vars} must be members of @var{dict}. No
2115 variable may be included in @var{vars} more than once.
2118 @deftypefun void dict_delete_consecutive_vars (struct dictionary *@var{dict}, size_t @var{idx}, size_t @var{count})
2119 Deletes the variables in sequential positions
2120 @var{idx}@dots{}@var{idx} + @var{count} (exclusive) from @var{dict},
2121 which must contain at least @var{idx} + @var{count} variables.
2124 @deftypefun void dict_delete_scratch_vars (struct dictionary *@var{dict})
2125 Deletes all scratch variables from @var{dict}.
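The index arithmetic of @code{dict_delete_consecutive_vars} amounts to
closing a gap in an array. The C sketch below shows only that step, on a
hypothetical array of names; the real function also destroys the deleted
variables and removes references to them, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes elements IDX through IDX + COUNT - 1 from ARRAY, which has *N
   elements, shifting later elements down so that their order is
   preserved, and updates *N. */
static void
toy_delete_consecutive (const char *array[], size_t *n,
                        size_t idx, size_t count)
{
  assert (idx + count <= *n);
  memmove (&array[idx], &array[idx + count],
           (*n - idx - count) * sizeof *array);
  *n -= count;
}
```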
2128 @node Dictionary Reordering Variables
2129 @subsection Changing Variable Order
2131 The variables in a dictionary are stored in an array. These functions
2132 change the order of a dictionary's array of variables without changing
2133 which variables are in the dictionary.
2135 @anchor{dict_reorder_var}
2136 @deftypefun void dict_reorder_var (struct dictionary *@var{dict}, struct variable *@var{var}, size_t @var{new_index})
2137 Moves @var{var}, which must be in @var{dict}, so that it is at
2138 position @var{new_index} in @var{dict}'s array of variables. Other
2139 variables in @var{dict}, if any, retain their relative positions.
@var{new_index} must be less than the number of variables in
@var{dict}.
2144 @deftypefun void dict_reorder_vars (struct dictionary *@var{dict}, struct variable *const *@var{new_order}, size_t @var{count})
2145 Moves the @var{count} variables in @var{new_order} to the beginning of
2146 @var{dict}'s array of variables in the specified order. Other
2147 variables in @var{dict}, if any, retain their relative positions.
2149 All of the variables in @var{new_order} must be in @var{dict}. No
2150 duplicates are allowed within @var{new_order}, which means that
@var{count} must be no greater than the number of variables in
@var{dict}.
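Both reordering functions preserve the relative order of the variables
they do not move. The hypothetical C sketch below works on an array of
pointers: @code{toy_move_element} corresponds to
@code{dict_reorder_var}, and @code{toy_reorder} obtains the effect of
@code{dict_reorder_vars} by moving each listed element, in turn, to the
next position at the front. It is an illustration, not PSPP's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Moves ARRAY[OLD_INDEX] to NEW_INDEX in the N-element ARRAY; the other
   elements keep their relative order (cf. dict_reorder_var()). */
static void
toy_move_element (const char *array[], size_t n,
                  size_t old_index, size_t new_index)
{
  assert (old_index < n && new_index < n);
  const char *moved = array[old_index];
  if (new_index > old_index)
    memmove (&array[old_index], &array[old_index + 1],
             (new_index - old_index) * sizeof *array);
  else
    memmove (&array[new_index + 1], &array[new_index],
             (old_index - new_index) * sizeof *array);
  array[new_index] = moved;
}

/* Returns the position of ELEM (compared by pointer) in ARRAY. */
static size_t
toy_index_of (const char *const array[], size_t n, const char *elem)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (array[i] == elem)
      break;
  assert (i < n);
  return i;
}

/* Moves the COUNT distinct elements of NEW_ORDER to the front of ARRAY,
   in that order (cf. dict_reorder_vars()). */
static void
toy_reorder (const char *array[], size_t n,
             const char *const new_order[], size_t count)
{
  assert (count <= n);
  for (size_t i = 0; i < count; i++)
    toy_move_element (array, n, toy_index_of (array, n, new_order[i]), i);
}
```

Elements already placed at positions 0 through @var{i} @minus{} 1 are
never disturbed, because each later move only shifts elements at
position @var{i} or beyond.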
2155 @node Dictionary Renaming Variables
2156 @subsection Renaming Variables
2158 These functions change the names of variables within a dictionary.
2159 The @func{var_set_name} function (@pxref{var_set_name}) cannot be
2160 applied directly to a variable that is in a dictionary, because
2161 @struct{dictionary} contains an index by name that @func{var_set_name}
2162 would not update. The following functions take care to update the
2163 index as well. They also ensure that variable renaming does not cause
2164 a dictionary to contain a duplicate variable name.
2166 @deftypefun void dict_rename_var (struct dictionary *@var{dict}, struct variable *@var{var}, const char *@var{new_name})
2167 Changes the name of @var{var}, which must be in @var{dict}, to
2168 @var{new_name}. A variable named @var{new_name} must not already be
in @var{dict}, unless @var{new_name} is the same as @var{var}'s
current name.
@deftypefun bool dict_rename_vars (struct dictionary *@var{dict}, struct variable **@var{vars}, char **@var{new_names}, size_t @var{count}, char **@var{err_name})
2174 Renames each of the @var{count} variables in @var{vars} to the name in
2175 the corresponding position of @var{new_names}. If the renaming would
2176 result in a duplicate variable name, returns false and stores one of
the names that would be duplicated into @code{*@var{err_name}}, if
@var{err_name} is non-null. Otherwise, the renaming succeeds and the
function returns true.
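The duplicate check is easiest to reason about when all renamings are
applied together, which also lets two variables swap names. The
hypothetical C sketch below takes that approach; it is an assumption
about the semantics of @code{dict_rename_vars}, not a description of its
code. It builds the renamed list, compares every pair of names ignoring
ASCII case, and commits only if no duplicate is found.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VARS 64   /* toy limit on dictionary size */

/* Compares A and B for equality, ignoring ASCII case. */
static bool
toy_name_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return false;
  return *a == *b;
}

/* Renames NAMES[IDX[i]] to NEW_NAMES[i] for each i < COUNT, treating all
   renamings as simultaneous.  If the result would contain two equal
   names, leaves NAMES unchanged, stores one of them in *ERR_NAME if
   ERR_NAME is non-null, and returns false. */
static bool
toy_rename_vars (const char *names[], size_t n, const size_t idx[],
                 const char *const new_names[], size_t count,
                 const char **err_name)
{
  const char *result[TOY_MAX_VARS];

  assert (n <= TOY_MAX_VARS);
  for (size_t i = 0; i < n; i++)
    result[i] = names[i];
  for (size_t i = 0; i < count; i++)
    {
      assert (idx[i] < n);
      result[idx[i]] = new_names[i];
    }

  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (toy_name_equal (result[i], result[j]))
        {
          if (err_name != NULL)
            *err_name = result[i];
          return false;
        }

  for (size_t i = 0; i < n; i++)
    names[i] = result[i];
  return true;
}
```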
2182 @node Dictionary Weight Variable
2183 @subsection Weight Variable
2185 A data set's cases may optionally be weighted by the value of a
2186 numeric variable. @xref{WEIGHT,,,pspp, PSPP Users Guide}, for a user
2187 view of weight variables.
The weight variable is written to and read from system and portable
files.
2192 The most commonly useful function related to weighting is a
2193 convenience function to retrieve a weighting value from a case.
2195 @deftypefun double dict_get_case_weight (const struct dictionary *@var{dict}, const struct ccase *@var{case}, bool *@var{warn_on_invalid})
2196 Retrieves and returns the value of the weighting variable specified by
@var{dict} from @var{case}. Returns 1.0 if @var{dict} has no
weighting variable.
Returns 0.0 if @var{case}'s weight value is user- or system-missing,
2201 zero, or negative. In such a case, if @var{warn_on_invalid} is
2202 non-null and @code{*@var{warn_on_invalid}} is true,
2203 @func{dict_get_case_weight} also issues an error message and sets
2204 @code{*@var{warn_on_invalid}} to false. To disable error reporting,
2205 pass a null pointer or a pointer to false as @var{warn_on_invalid} or
2206 use a @func{msg_disable}/@func{msg_enable} pair.
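The weighting rules above can be expressed as a small pure function. In
this hypothetical C sketch, @code{toy_case_weight} receives the weight
value and its user-missing status directly instead of reading them from
a case, @code{-DBL_MAX} is assumed to stand for the system-missing value,
and the warning is a plain message on @code{stderr} rather than PSPP's
message machinery.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_SYSMIS (-DBL_MAX)   /* assumed representation of SYSMIS */

/* Returns the weight for a case whose weight-variable value is W.
   HAS_WEIGHT is false if the dictionary has no weight variable, and
   IS_USER_MISSING tells whether W is user-missing.  A missing, zero, or
   negative weight yields 0.0 and, if *WARN_ON_INVALID is true, one
   warning, after which *WARN_ON_INVALID is set to false. */
static double
toy_case_weight (bool has_weight, double w, bool is_user_missing,
                 bool *warn_on_invalid)
{
  if (!has_weight)
    return 1.0;
  if (w == TOY_SYSMIS || is_user_missing || w <= 0.0)
    {
      if (warn_on_invalid != NULL && *warn_on_invalid)
        {
          fprintf (stderr, "warning: invalid case weight\n");
          *warn_on_invalid = false;   /* warn only once */
        }
      return 0.0;
    }
  return w;
}
```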
2209 The dictionary also has a pair of functions for getting and setting
2210 the weight variable.
2212 @deftypefun {struct variable *} dict_get_weight (const struct dictionary *@var{dict})
2213 Returns @var{dict}'s current weighting variable, or a null pointer if
2214 the dictionary does not have a weighting variable.
2217 @deftypefun void dict_set_weight (struct dictionary *@var{dict}, struct variable *@var{var})
2218 Sets @var{dict}'s weighting variable to @var{var}. If @var{var} is
2219 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2220 is null, then @var{dict}'s weighting variable, if any, is cleared.
2223 @node Dictionary Filter Variable
2224 @subsection Filter Variable
2226 When the active dataset is read by a procedure, cases can be excluded
2227 from analysis based on the values of a @dfn{filter variable}.
2228 @xref{FILTER,,,pspp, PSPP Users Guide}, for a user view of filtering.
2230 These functions store and retrieve the filter variable. They are
2231 rarely useful, because the data analysis framework automatically
2232 excludes from analysis the cases that should be filtered.
2234 @deftypefun {struct variable *} dict_get_filter (const struct dictionary *@var{dict})
2235 Returns @var{dict}'s current filter variable, or a null pointer if the
2236 dictionary does not have a filter variable.
2239 @deftypefun void dict_set_filter (struct dictionary *@var{dict}, struct variable *@var{var})
2240 Sets @var{dict}'s filter variable to @var{var}. If @var{var} is
2241 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2242 is null, then @var{dict}'s filter variable, if any, is cleared.
2245 @node Dictionary Case Limit
2246 @subsection Case Limit
2248 The limit on cases analyzed by a procedure, set by the @cmd{N OF
2249 CASES} command (@pxref{N OF CASES,,,pspp, PSPP Users Guide}), is
2250 stored as part of the dictionary. The dictionary does not, on the
other hand, play any role in enforcing the case limit; that job is
done by the data analysis framework.
2254 A case limit of 0 means that the number of cases is not limited.
2256 These functions are rarely useful, because the data analysis framework
2257 automatically excludes from analysis any cases beyond the limit.
2259 @deftypefun casenumber dict_get_case_limit (const struct dictionary *@var{dict})
2260 Returns the current case limit for @var{dict}.
2263 @deftypefun void dict_set_case_limit (struct dictionary *@var{dict}, casenumber @var{limit})
2264 Sets @var{dict}'s case limit to @var{limit}.
2267 @node Dictionary Split Variables
2268 @subsection Split Variables
2270 The user may use the @cmd{SPLIT FILE} command (@pxref{SPLIT
2271 FILE,,,pspp, PSPP Users Guide}) to select a set of variables on which
2272 to split the active dataset into groups of cases to be analyzed
2273 independently in each statistical procedure. The set of split
2274 variables is stored as part of the dictionary, although the effect on
2275 data analysis is implemented by each individual statistical procedure.
2277 Split variables may be numeric or short or long string variables.
2279 The most useful functions for split variables are those to retrieve
2280 them. Even these functions are rarely useful directly: for the
2281 purpose of breaking cases into groups based on the values of the split
2282 variables, it is usually easier to use
2283 @func{casegrouper_create_splits}.
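Assuming, as with @cmd{SPLIT FILE}, that a group is a run of adjacent
cases with equal split values, the core of that grouping can be sketched
in C as follows. @code{toy_count_split_groups} is hypothetical, handles
a single numeric split variable, and only counts the groups; it is not
the implementation of @func{casegrouper_create_splits}.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the groups formed from N cases when each maximal run of
   adjacent cases with equal split values is one group.  SPLIT[i] is
   case i's value of a single numeric split variable. */
static size_t
toy_count_split_groups (const double split[], size_t n)
{
  size_t groups = 0;

  assert (split != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (i == 0 || split[i] != split[i - 1])
      groups++;
  return groups;
}
```

Under this assumption, unsorted data such as 1, 1, 2, 2, 1 yields three
groups, with two separate groups for the value 1, which is why data is
normally sorted by the split variables first.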
@deftypefun {const struct variable *const *} dict_get_split_vars (const struct dictionary *@var{dict})
Returns a pointer to an array of pointers to split variables. If and
only if there are no split variables, returns a null pointer. The
caller must not modify or free the returned array.
@end deftypefun

@deftypefun size_t dict_get_split_cnt (const struct dictionary *@var{dict})
Returns the number of split variables.
@end deftypefun

The following functions are also available for working with split
variables.

@deftypefun void dict_set_split_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{cnt})
Sets @var{dict}'s split variables to the @var{cnt} variables in
@var{vars}. If @var{cnt} is 0, then @var{dict} will not have any
split variables. The caller retains ownership of @var{vars}.
@end deftypefun

@deftypefun void dict_unset_split_var (struct dictionary *@var{dict}, struct variable *@var{var})
Removes @var{var}, which must be a variable in @var{dict}, from
@var{dict}'s set of split variables.
@end deftypefun

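Because the caller of @func{dict_set_split_vars} retains ownership of @var{vars}, the dictionary has to keep its own copy of the array. The following sketch models that, together with the removal performed by @func{dict_unset_split_var}, using hypothetical @code{struct toy_variable} and @code{struct toy_dict} types. It illustrates the documented behavior only; it is not PSPP's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for PSPP's variable and dictionary types. */
struct toy_variable
  {
    const char *name;
  };

struct toy_dict
  {
    struct toy_variable **split;  /* Dictionary-owned copy, or NULL. */
    size_t n_split;
  };

/* Hypothetical helper: makes D's split variables a copy of the CNT
   pointers in VARS, so the caller keeps ownership of VARS.  Returns 0
   on success, -1 if memory allocation fails (D is then unchanged). */
int
toy_set_split_vars (struct toy_dict *d, struct toy_variable *const *vars,
                    size_t cnt)
{
  struct toy_variable **copy = NULL;

  assert (cnt == 0 || vars != NULL);
  if (cnt > 0)
    {
      copy = malloc (cnt * sizeof *copy);
      if (copy == NULL)
        return -1;
      memcpy (copy, vars, cnt * sizeof *copy);
    }
  free (d->split);
  d->split = copy;
  d->n_split = cnt;
  return 0;
}

/* Hypothetical helper: removes VAR from D's split variables, keeping
   the rest in order.  The array is freed when it becomes empty, so an
   empty set is always represented by a null pointer. */
void
toy_unset_split_var (struct toy_dict *d, const struct toy_variable *var)
{
  size_t out = 0;

  for (size_t i = 0; i < d->n_split; i++)
    if (d->split[i] != var)
      d->split[out++] = d->split[i];
  d->n_split = out;
  if (out == 0)
    {
      free (d->split);
      d->split = NULL;
    }
}
```

Resetting the pointer to null when the last variable is removed mirrors the documented rule that @func{dict_get_split_vars} returns a null pointer if and only if there are no split variables.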
@node Dictionary File Label
@subsection File Label

A dictionary may optionally have an associated string that describes
its contents, called its file label. The user may set the file label
with the @cmd{FILE LABEL} command (@pxref{FILE LABEL,,,pspp, PSPP
Users Guide}).

These functions set and retrieve the file label.

@deftypefun {const char *} dict_get_label (const struct dictionary *@var{dict})
Returns @var{dict}'s file label. If @var{dict} does not have a label,
returns a null pointer.
@end deftypefun

@deftypefun void dict_set_label (struct dictionary *@var{dict}, const char *@var{label})
Sets @var{dict}'s label to @var{label}. If @var{label} is non-null,
then its content, truncated to at most 60 bytes, becomes the new file
label. If @var{label} is null, then @var{dict}'s label is removed.

The caller retains ownership of @var{label}.
@end deftypefun

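A minimal sketch of the documented @func{dict_set_label} semantics follows: a null argument removes the label, and otherwise the dictionary stores its own copy, truncated to at most 60 bytes. The function @code{toy_set_label} and its @var{slot} argument are hypothetical; @var{slot} stands in for the dictionary's internal storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Maximum file label length in bytes, per the description above. */
#define TOY_LABEL_MAX 60

/* Hypothetical model of dict_set_label: replaces *SLOT with a private
   copy of LABEL truncated to at most TOY_LABEL_MAX bytes, or with a
   null pointer if LABEL is null.  The caller keeps ownership of LABEL.
   Returns 0 on success, -1 if memory allocation fails. */
int
toy_set_label (char **slot, const char *label)
{
  char *copy = NULL;

  assert (slot != NULL);
  if (label != NULL)
    {
      size_t len = strlen (label);
      if (len > TOY_LABEL_MAX)
        len = TOY_LABEL_MAX;
      copy = malloc (len + 1);
      if (copy == NULL)
        return -1;
      memcpy (copy, label, len);
      copy[len] = '\0';
    }
  free (*slot);
  *slot = copy;
  return 0;
}
```

Because truncation counts bytes, a label in a multibyte encoding such as UTF-8 could be cut in the middle of a character; the sketch makes no attempt to avoid that.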
@node Dictionary Documents
@subsection Documents

A dictionary may include an arbitrary number of lines of explanatory
text, called the dictionary's documents. For compatibility, document
lines have a fixed width, and lines that are not exactly this width
are truncated or padded with spaces as necessary to bring them to the
fixed width.

PSPP users can use the @cmd{DOCUMENT} (@pxref{DOCUMENT,,,pspp, PSPP
Users Guide}), @cmd{ADD DOCUMENT} (@pxref{ADD DOCUMENT,,,pspp, PSPP
Users Guide}), and @cmd{DROP DOCUMENTS} (@pxref{DROP DOCUMENTS,,,pspp,
PSPP Users Guide}) commands to manipulate documents.

@deftypefn Macro int DOC_LINE_LENGTH
The fixed length of a document line, in bytes, defined as 80.
@end deftypefn

The following functions work with whole sets of documents. They
accept or return sets of documents formatted as null-terminated
strings that are an exact multiple of @code{DOC_LINE_LENGTH}
bytes long.

@deftypefun {const char *} dict_get_documents (const struct dictionary *@var{dict})
Returns the documents in @var{dict}, or a null pointer if @var{dict}
has no documents.
@end deftypefun

@deftypefun void dict_set_documents (struct dictionary *@var{dict}, const char *@var{new_documents})
Sets @var{dict}'s documents to @var{new_documents}. If
@var{new_documents} is a null pointer or an empty string, then
@var{dict}'s documents are cleared. The caller retains ownership of
@var{new_documents}.
@end deftypefun

@deftypefun void dict_clear_documents (struct dictionary *@var{dict})
Clears the documents from @var{dict}.
@end deftypefun

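The whole-set format can be checked and measured with simple arithmetic, as in the sketch below. @code{DOC_LINE_LENGTH} is redefined here with the value given above so that the example stands alone; @code{docs_well_formed} and @code{docs_line_count} are hypothetical helpers, not part of PSPP's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Redefined here, with the value given above, so the sketch stands
   alone. */
#define DOC_LINE_LENGTH 80

/* Hypothetical helper: returns true if DOCS is a null pointer (no
   documents) or a string whose length is an exact multiple of
   DOC_LINE_LENGTH. */
bool
docs_well_formed (const char *docs)
{
  return docs == NULL || strlen (docs) % DOC_LINE_LENGTH == 0;
}

/* Hypothetical helper: returns the number of lines in DOCS, which must
   be well formed. */
size_t
docs_line_count (const char *docs)
{
  assert (docs_well_formed (docs));
  return docs == NULL ? 0 : strlen (docs) / DOC_LINE_LENGTH;
}
```

For example, a 160-byte documents string holds two lines, and a null pointer or empty string holds none.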
The following functions work with individual lines in a dictionary's
documents.

@deftypefun void dict_add_document_line (struct dictionary *@var{dict}, const char *@var{content})
Appends @var{content} to the documents in @var{dict}. The text in
@var{content} will be truncated or padded with spaces as necessary to
make it exactly @code{DOC_LINE_LENGTH} bytes long. The caller retains
ownership of @var{content}.

If @var{content} is longer than @code{DOC_LINE_LENGTH} bytes, this
function also issues a warning using @func{msg}. To suppress the
warning, enclose a call to this function in a
@func{msg_disable}/@func{msg_enable} pair.
@end deftypefun

@deftypefun size_t dict_get_document_line_cnt (const struct dictionary *@var{dict})
Returns the number of lines of documents in @var{dict}. If the
dictionary contains no documents, returns 0.
@end deftypefun

@deftypefun void dict_get_document_line (const struct dictionary *@var{dict}, size_t @var{idx}, struct string *@var{content})
Replaces the text in @var{content} (which must already have been
initialized by the caller) with the document line in @var{dict}
numbered @var{idx}, which must be less than the number of lines of
documents in @var{dict}. Any trailing white space in the document
line is trimmed, so that @var{content} will have a length between 0
and @code{DOC_LINE_LENGTH}.
@end deftypefun

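The padding rule of @func{dict_add_document_line} and the trimming rule of @func{dict_get_document_line} are sketched below on plain C strings, assuming the whole-set format of fixed 80-byte lines packed end to end. The helpers @code{pad_doc_line} and @code{trimmed_doc_line} are hypothetical; the real functions operate on a @code{struct dictionary} and a @code{struct string}.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Redefined here, with the value given above, so the sketch stands
   alone. */
#define DOC_LINE_LENGTH 80

/* Hypothetical helper: copies CONTENT into LINE as exactly
   DOC_LINE_LENGTH bytes, truncating it or padding it with spaces, then
   adds a null terminator.  Returns true if CONTENT was truncated, the
   case in which dict_add_document_line is described as warning. */
bool
pad_doc_line (const char *content, char line[DOC_LINE_LENGTH + 1])
{
  size_t len = strlen (content);
  bool truncated = len > DOC_LINE_LENGTH;

  if (truncated)
    len = DOC_LINE_LENGTH;
  memcpy (line, content, len);
  memset (line + len, ' ', DOC_LINE_LENGTH - len);
  line[DOC_LINE_LENGTH] = '\0';
  return truncated;
}

/* Hypothetical helper: points *START at line IDX of DOCS, a string in
   the whole-set format, and returns that line's length with trailing
   white space trimmed. */
size_t
trimmed_doc_line (const char *docs, size_t idx, const char **start)
{
  assert (idx < strlen (docs) / DOC_LINE_LENGTH);

  const char *p = docs + idx * DOC_LINE_LENGTH;
  size_t len = DOC_LINE_LENGTH;
  while (len > 0 && isspace ((unsigned char) p[len - 1]))
    len--;
  *start = p;
  return len;
}
```

Padding on the way in and trimming on the way out means callers never see the padding spaces: in this sketch, a stored line of @samp{hello} comes back as a 5-byte string.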
@node Coding Conventions
@section Coding Conventions

Every @file{.c} file should have @samp{#include <config.h>} as its
first non-comment line. No @file{.h} file should include
@file{config.h}.

This section needs to be finished.

This section needs to be written.

This section needs to be written.

This section needs to be written.