2 @chapter Basic Concepts
4 This chapter introduces basic data structures and other concepts
5 needed for developing in PSPP.
9 * Input and Output Formats::
10 * User-Missing Values::
14 * Coding Conventions::
24 The unit of data in PSPP is a @dfn{value}.
30 Values are classified by @dfn{type} and @dfn{width}. The
31 type of a value is either @dfn{numeric} or @dfn{string} (sometimes
32 called alphanumeric). The width of a string value ranges from 1 to
33 @code{MAX_STRING} bytes. The width of a numeric value is artificially
34 defined to be 0; thus, the type of a value can be inferred from its
37 Some support is provided for working with value types and widths, in
38 @file{data/val-type.h}:
40 @deftypefn Macro int MAX_STRING
41 Maximum width of a string value, in bytes, currently 32,767.
44 @deftypefun bool val_type_is_valid (enum val_type @var{val_type})
45 Returns true if @var{val_type} is a valid value type, that is,
46 either @code{VAL_NUMERIC} or @code{VAL_STRING}. Useful for
50 @deftypefun {enum val_type} val_type_from_width (int @var{width})
51 Returns @code{VAL_NUMERIC} if @var{width} is 0 and thus represents the
52 width of a numeric value, otherwise @code{VAL_STRING} to indicate that
53 @var{width} is the width of a string value.
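The convention that width 0 means numeric can be sketched in a few lines.
This is a minimal illustration rather than PSPP's own code: the
@code{sketch_}-prefixed names are hypothetical stand-ins for the real
declarations in @file{data/val-type.h}, and the enumerator values are
arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PSPP's enum val_type. */
enum sketch_val_type
  {
    SKETCH_VAL_NUMERIC,
    SKETCH_VAL_STRING
  };

/* Width 0 denotes a numeric value; any positive width denotes a
   string value of that many bytes. */
enum sketch_val_type
sketch_val_type_from_width (int width)
{
  assert (width >= 0);
  return width == 0 ? SKETCH_VAL_NUMERIC : SKETCH_VAL_STRING;
}

/* True only for the two defined value types. */
bool
sketch_val_type_is_valid (enum sketch_val_type type)
{
  return type == SKETCH_VAL_NUMERIC || type == SKETCH_VAL_STRING;
}
```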
56 The following subsections describe how values of each type are
62 * Runtime Typed Values::
66 @subsection Numeric Values
68 A value known to be numeric at compile time is represented as a
69 @code{double}. PSPP provides three values of @code{double} for
70 special purposes, defined in @file{data/val-type.h}:
72 @deftypefn Macro double SYSMIS
73 The @dfn{system-missing value}, used to represent a datum whose true
74 value is unknown, such as a survey question that was not answered by
75 the respondent, or undefined, such as the result of division by zero.
76 PSPP propagates the system-missing value through calculations and
77 compensates for missing values in statistical analyses. @xref{Missing
78 Observations,,,pspp, PSPP Users Guide}, for a PSPP user's view of
81 PSPP currently defines @code{SYSMIS} as @code{-DBL_MAX}, that is, the
82 greatest finite negative value of @code{double}. It is best not to
83 depend on this definition, because PSPP may transition to using an
84 IEEE NaN (not a number) instead at some point in the future.
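As an illustration of how the system-missing value propagates through a
calculation, the sketch below adds two numbers and yields system-missing
if either operand is system-missing. @code{SKETCH_SYSMIS} and
@code{sketch_add} are hypothetical names; the macro merely mirrors the
current definition described above so that the example is
self-contained. Real code should use @code{SYSMIS} itself.

```c
#include <assert.h>
#include <float.h>

/* Assumption for this sketch only: mirrors the current definition of
   SYSMIS.  Real code should use PSPP's SYSMIS macro instead. */
#define SKETCH_SYSMIS (-DBL_MAX)

/* Adds A and B, yielding system-missing if either operand is
   system-missing. */
double
sketch_add (double a, double b)
{
  if (a == SKETCH_SYSMIS || b == SKETCH_SYSMIS)
    return SKETCH_SYSMIS;
  return a + b;
}
```

Note that a comparison with @code{==} only works while the
system-missing value is an ordinary finite number. A NaN never compares
equal to itself, so code written this way would need to change if PSPP
adopted a NaN.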
87 @deftypefn Macro double LOWEST
88 @deftypefnx Macro double HIGHEST
89 The greatest finite negative (except for @code{SYSMIS}) and positive
90 values of @code{double}, respectively. These values do not ordinarily
91 appear in user data files. Instead, they are used to implement
92 endpoints of open-ended ranges that are occasionally permitted in PSPP
93 syntax, e.g.@: @code{5 THRU HI} as a range of missing values
94 (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}).
98 @subsection String Values
100 A value known at compile time to have string type is represented as an
101 array of @code{char}. String values do not necessarily represent
102 readable text strings and may contain arbitrary 8-bit data, including
103 null bytes, control codes, and bytes with the high bit set. Thus,
104 string values are not null-terminated strings, but rather opaque
107 @code{SYSMIS}, @code{LOWEST}, and @code{HIGHEST} have no equivalents
as string values. Usually, PSPP fills an unknown or undefined string
value with spaces, but it does not treat such a string as a special
case when it processes it later.
113 @code{MAX_STRING}, the maximum length of a string value, is defined in
114 @file{data/val-type.h}.
116 @node Runtime Typed Values
117 @subsection Runtime Typed Values
119 When a value's type is only known at runtime, it is often represented
120 as a @union{value}, defined in @file{data/value.h}. @union{value} has
121 two members: a @code{double} named @samp{f} to store a numeric value
and an array of @code{char} named @samp{s} to store a string value.
123 A @union{value} does not identify the type or width of the data it
contains. Code that works with @union{value}s must therefore have
external knowledge of their content, often through the type and width of
126 a @struct{variable} (@pxref{Variables}).
128 @cindex MAX_SHORT_STRING
132 The array of @code{char} in @union{value} has only a small, fixed
133 capacity of @code{MAX_SHORT_STRING} bytes. A value that
134 fits within this capacity is called a @dfn{short string}. Any wider
135 string value, which must be represented by more than one
136 @union{value}, is called a @dfn{long string}.
138 @deftypefn Macro int MAX_SHORT_STRING
139 Maximum width of a short string value, never less than 8 bytes. It is
140 wider than 8 bytes on systems where @code{double} is either larger
141 than 8 bytes or has stricter alignment than 8 bytes.
144 @deftypefn Macro int MIN_LONG_STRING
145 Minimum width of a long string value, that is, @code{MAX_SHORT_STRING
Long string values are slightly harder to work with than short
string values, because they cannot be conveniently and efficiently
151 allocated as block scope variables or structure members. The PSPP
152 language exposes this inconvenience to the user: there are many
153 circumstances in PSPP syntax where short strings are allowed but not
154 long strings. Short string variables, for example, may have
155 user-missing values, but long string variables may not (@pxref{Missing
156 Observations,,,pspp, PSPP Users Guide}).
158 PSPP provides a few functions for working with @union{value}s. The
159 most useful are described below. To use these functions, recall that
160 a numeric value has a width of 0.
162 @deftypefun size_t value_cnt_from_width (int @var{width})
163 Returns the number of consecutive @union{value}s that must be
164 allocated to store a value of the given @var{width}. For a numeric or
short string value, the return value is 1; for a long string
value, it is greater than 1.
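The relationship between a width and the number of @union{value}s it
occupies can be sketched as follows. The capacity of 8 bytes and the
round-up division are assumptions of this sketch (PSPP only guarantees
that @code{MAX_SHORT_STRING} is at least 8), and the
@code{sketch_}-prefixed names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capacity of one union value, for this sketch only. */
#define SKETCH_MAX_SHORT_STRING 8

/* Returns the number of consecutive union values needed to hold a
   value of the given WIDTH.  Rounding up to whole units is an
   assumption of this sketch. */
size_t
sketch_value_cnt_from_width (int width)
{
  assert (width >= 0);
  if (width <= SKETCH_MAX_SHORT_STRING)
    return 1;                   /* Numeric or short string. */
  return ((size_t) width + SKETCH_MAX_SHORT_STRING - 1)
         / SKETCH_MAX_SHORT_STRING;
}
```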
169 @deftypefun void value_copy (union value *@var{dst}, @
170 const union value *@var{src}, @
172 Copies a value of the given @var{width} from the @union{value} array
173 starting at @var{src} to the one starting at @var{dst}. The two
174 arrays must not overlap.
177 @deftypefun void value_set_missing (union value *@var{value}, int @var{width})
178 Sets @var{value} to @code{SYSMIS} if it is numeric or to all spaces if
179 it is alphanumeric, according to @var{width}. @var{value} must point
180 to the start of a @union{value} array of the given @var{width}.
183 @anchor{value_is_resizable}
184 @deftypefun bool value_is_resizable (const union value *@var{value}, int @var{old_width}, int @var{new_width})
185 Determines whether @var{value} may be resized from @var{old_width} to
186 @var{new_width}. Resizing is possible if the following criteria are
187 met. First, @var{old_width} and @var{new_width} must be both numeric
188 or both string widths. Second, if @var{new_width} is a short string
189 width and less than @var{old_width}, resizing is allowed only if bytes
@var{new_width} through @var{old_width} in @var{value} contain only
spaces.
193 These rules are part of those used by @func{mv_is_resizable} and
194 @func{val_labs_can_set_width}.
197 @deftypefun void value_resize (union value *@var{value}, int @var{old_width}, int @var{new_width})
198 Resizes @var{value} from @var{old_width} to @var{new_width}, which
199 must be allowed by the rules stated above. This has an effect only if
200 @var{new_width} is greater than @var{old_width}, in which case the
201 bytes newly added to @var{value} are cleared to spaces.
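The resizing rules for string values can be illustrated on a flat byte
buffer. This sketch is deliberately simplified: it operates on a plain
@code{char} array instead of a @union{value} array, it ignores the
distinction between short and long strings, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the string value in S may be resized from OLD_WIDTH
   to NEW_WIDTH.  Narrowing is allowed only if every byte that would be
   dropped is a space. */
bool
sketch_str_is_resizable (const char *s, int old_width, int new_width)
{
  int i;

  assert (old_width > 0 && new_width > 0);
  for (i = new_width; i < old_width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}

/* Resizes the string value in S from OLD_WIDTH to NEW_WIDTH.  Only
   widening has an effect: the newly added bytes become spaces.  The
   caller must provide room for at least NEW_WIDTH bytes. */
void
sketch_str_resize (char *s, int old_width, int new_width)
{
  if (new_width > old_width)
    memset (s + old_width, ' ', (size_t) (new_width - old_width));
}
```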
204 @node Input and Output Formats
205 @section Input and Output Formats
207 Input and output formats specify how to convert data fields to and
208 from data values (@pxref{Input and Output Formats,,,pspp, PSPP Users
209 Guide}). PSPP uses @struct{fmt_spec} to represent input and output
212 Function prototypes and other declarations related to formats are in
213 the @file{<data/format.h>} header.
215 @deftp {Structure} {struct fmt_spec}
216 An input or output format, with the following members:
219 @item enum fmt_type type
220 The format type (see below).
223 Field width, in bytes. The width of numeric fields is always between
224 1 and 40 bytes, and the width of string fields is always between 1 and
225 65534 bytes. However, many individual types of formats place stricter
226 limits on field width (see @ref{fmt_max_input_width},
227 @ref{fmt_max_output_width}).
230 Number of decimal places, in character positions. For format types
231 that do not allow decimal places to be specified, this value must be
232 0. Format types that do allow decimal places have type-specific and
233 often width-specific restrictions on @code{d} (see
234 @ref{fmt_max_input_decimals}, @ref{fmt_max_output_decimals}).
238 @deftp {Enumeration} {enum fmt_type}
239 An enumerated type representing an input or output format type. Each
240 PSPP input and output format has a corresponding enumeration constant
241 prefixed by @samp{FMT}: @code{FMT_F}, @code{FMT_COMMA},
242 @code{FMT_DOT}, and so on.
245 The following sections describe functions for manipulating formats and
246 the data in fields represented by formats.
249 * Constructing and Verifying Formats::
250 * Format Utility Functions::
251 * Obtaining Properties of Format Types::
252 * Numeric Formatting Styles::
253 * Formatted Data Input and Output::
256 @node Constructing and Verifying Formats
257 @subsection Constructing and Verifying Formats
259 These functions construct @struct{fmt_spec}s and verify that they are
264 @deftypefun {struct fmt_spec} fmt_for_input (enum fmt_type @var{type}, int @var{w}, int @var{d})
265 @deftypefunx {struct fmt_spec} fmt_for_output (enum fmt_type @var{type}, int @var{w}, int @var{d})
266 Constructs a @struct{fmt_spec} with the given @var{type}, @var{w}, and
267 @var{d}, asserts that the result is a valid input (or output) format,
271 @anchor{fmt_for_output_from_input}
272 @deftypefun {struct fmt_spec} fmt_for_output_from_input (const struct fmt_spec *@var{input})
273 Given @var{input}, which must be a valid input format, returns the
274 equivalent output format. @xref{Input and Output Formats,,,pspp, PSPP
275 Users Guide}, for the rules for converting input formats into output
279 @deftypefun {struct fmt_spec} fmt_default_for_width (int @var{width})
280 Returns the default output format for a variable of the given
281 @var{width}. For a numeric variable, this is F8.2 format; for a
282 string variable, it is the A format of the given @var{width}.
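The default just described can be sketched as follows. The structure
and enumeration here are cut-down, hypothetical stand-ins for
@struct{fmt_spec} and @enum{fmt_type}, reduced to what the example
needs.

```c
#include <assert.h>

/* Hypothetical stand-ins for enum fmt_type and struct fmt_spec. */
enum sketch_fmt_type
  {
    SKETCH_FMT_F,
    SKETCH_FMT_A
  };

struct sketch_fmt_spec
  {
    enum sketch_fmt_type type;
    int w;                      /* Field width. */
    int d;                      /* Decimal places. */
  };

/* Returns F8.2 for a numeric width (0), otherwise an A format as wide
   as the string value. */
struct sketch_fmt_spec
sketch_default_for_width (int width)
{
  struct sketch_fmt_spec f;

  assert (width >= 0);
  if (width == 0)
    {
      f.type = SKETCH_FMT_F;
      f.w = 8;
      f.d = 2;
    }
  else
    {
      f.type = SKETCH_FMT_A;
      f.w = width;
      f.d = 0;
    }
  return f;
}
```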
285 The following functions check whether a @struct{fmt_spec} is valid for
286 various uses and return true if so, false otherwise. When any of them
287 returns false, it also outputs an explanatory error message using
@func{msg}. To suppress error output, place a call to one of these
functions between calls to @func{msg_disable} and @func{msg_enable}.
291 @deftypefun bool fmt_check (const struct fmt_spec *@var{format}, bool @var{for_input})
292 @deftypefunx bool fmt_check_input (const struct fmt_spec *@var{format})
293 @deftypefunx bool fmt_check_output (const struct fmt_spec *@var{format})
294 Checks whether @var{format} is a valid input format (for
295 @func{fmt_check_input}, or @func{fmt_check} if @var{for_input}) or
296 output format (for @func{fmt_check_output}, or @func{fmt_check} if not
300 @deftypefun bool fmt_check_type_compat (const struct fmt_spec *@var{format}, enum val_type @var{type})
301 Checks whether @var{format} matches the value type @var{type}, that
302 is, if @var{type} is @code{VAL_NUMERIC} and @var{format} is a numeric
303 format or @var{type} is @code{VAL_STRING} and @var{format} is a string
307 @deftypefun bool fmt_check_width_compat (const struct fmt_spec *@var{format}, int @var{width})
308 Checks whether @var{format} may be used as an output format for a
309 value of the given @var{width}.
311 @func{fmt_var_width}, described in
the following section, can also be used to determine the value
313 width needed by a format.
316 @node Format Utility Functions
317 @subsection Format Utility Functions
319 These functions work with @struct{fmt_spec}s.
321 @deftypefun int fmt_var_width (const struct fmt_spec *@var{format})
322 Returns the width for values associated with @var{format}. If
323 @var{format} is a numeric format, the width is 0; if @var{format} is
an A format, then the width is @code{@var{format}->w}; otherwise,
325 @var{format} is an AHEX format and its width is @code{@var{format}->w
329 @deftypefun char *fmt_to_string (const struct fmt_spec *@var{format}, char @var{s}[FMT_STRING_LEN_MAX + 1])
330 Converts @var{format} to a human-readable format specifier in @var{s}
331 and returns @var{s}. @var{format} need not be a valid input or output
332 format specifier, e.g.@: it is allowed to have an excess width or
333 decimal places. In particular, if @var{format} has decimals, they are
334 included in the output string, even if @var{format}'s type does not
335 allow decimals, to allow accurately presenting incorrect formats to
339 @deftypefun bool fmt_equal (const struct fmt_spec *@var{a}, const struct fmt_spec *@var{b})
340 Compares @var{a} and @var{b} memberwise and returns true if they are
identical, false otherwise. Neither @var{a} nor @var{b} needs to be a
valid input or output format specifier.
345 @deftypefun void fmt_resize (struct fmt_spec *@var{fmt}, int @var{width})
Adjusts the width of @var{fmt} so that it is a valid format for a
@union{value} of the given @var{width}.
349 @node Obtaining Properties of Format Types
350 @subsection Obtaining Properties of Format Types
352 These functions work with @enum{fmt_type}s instead of the higher-level
353 @struct{fmt_spec}s. Their primary purpose is to report properties of
354 each possible format type, which in turn allows clients to abstract
355 away many of the details of the very heterogeneous requirements of
358 The first group of functions works with format type names.
360 @deftypefun const char *fmt_name (enum fmt_type @var{type})
361 Returns the name for the given @var{type}, e.g.@: @code{"COMMA"} for
365 @deftypefun bool fmt_from_name (const char *@var{name}, enum fmt_type *@var{type})
366 Tries to find the @enum{fmt_type} associated with @var{name}. If
367 successful, sets @code{*@var{type}} to the type and returns true;
368 otherwise, returns false without modifying @code{*@var{type}}.
371 The functions below query basic limits on width and decimal places for
374 @deftypefun bool fmt_takes_decimals (enum fmt_type @var{type})
375 Returns true if a format of the given @var{type} is allowed to have a
376 nonzero number of decimal places (the @code{d} member of
377 @struct{fmt_spec}), false if not.
380 @anchor{fmt_min_input_width}
381 @anchor{fmt_max_input_width}
382 @anchor{fmt_min_output_width}
383 @anchor{fmt_max_output_width}
384 @deftypefun int fmt_min_input_width (enum fmt_type @var{type})
385 @deftypefunx int fmt_max_input_width (enum fmt_type @var{type})
386 @deftypefunx int fmt_min_output_width (enum fmt_type @var{type})
387 @deftypefunx int fmt_max_output_width (enum fmt_type @var{type})
388 Returns the minimum or maximum width (the @code{w} member of
389 @struct{fmt_spec}) allowed for an input or output format of the
390 specified @var{type}.
393 @anchor{fmt_max_input_decimals}
394 @anchor{fmt_max_output_decimals}
395 @deftypefun int fmt_max_input_decimals (enum fmt_type @var{type}, int @var{width})
396 @deftypefunx int fmt_max_output_decimals (enum fmt_type @var{type}, int @var{width})
397 Returns the maximum number of decimal places allowed for an input or
398 output format, respectively, of the given @var{type} and @var{width}.
399 Returns 0 if the specified @var{type} does not allow any decimal
400 places or if @var{width} is too narrow to allow decimal places.
403 @deftypefun int fmt_step_width (enum fmt_type @var{type})
404 Returns the ``width step'' for a @struct{fmt_spec} of the given
405 @var{type}. A @struct{fmt_spec}'s width must be a multiple of its
406 type's width step. Most format types have a width step of 1, so that
407 their formats' widths may be any integer within the valid range, but
408 hexadecimal numeric formats and AHEX string formats have a width step
412 These functions allow clients to broadly determine how each kind of
413 input or output format behaves.
415 @deftypefun bool fmt_is_string (enum fmt_type @var{type})
416 @deftypefunx bool fmt_is_numeric (enum fmt_type @var{type})
Returns true if @var{type} is a format for string or numeric values,
respectively, false otherwise.
421 @deftypefun enum fmt_category fmt_get_category (enum fmt_type @var{type})
422 Returns the category within which @var{type} falls.
424 @deftp {Enumeration} {enum fmt_category}
425 A group of format types. Format type categories correspond to the
426 input and output categories described in the PSPP user documentation
427 (@pxref{Input and Output Formats,,,pspp, PSPP Users Guide}).
429 Each format is in exactly one category. The categories have bitwise
430 disjoint values to make it easy to test whether a format type is in
431 one of multiple categories, e.g.@:
434 if (fmt_get_category (type) & (FMT_CAT_DATE | FMT_CAT_TIME))
436 /* @dots{}@r{@code{type} is a date or time format}@dots{} */
440 The format categories are:
443 Basic numeric formats.
446 Custom currency formats.
449 Legacy numeric formats.
454 @item FMT_CAT_HEXADECIMAL
463 @item FMT_CAT_DATE_COMPONENT
464 Date component formats.
472 The PSPP input and output routines use the following pair of functions
473 to convert @enum{fmt_type}s to and from the separate set of codes used
474 in system and portable files:
476 @deftypefun int fmt_to_io (enum fmt_type @var{type})
477 Returns the format code used in system and portable files that
478 corresponds to @var{type}.
481 @deftypefun bool fmt_from_io (int @var{io}, enum fmt_type *@var{type})
482 Converts @var{io}, a format code used in system and portable files,
483 into a @enum{fmt_type} in @code{*@var{type}}. Returns true if
484 successful, false if @var{io} is not valid.
487 These functions reflect the relationship between input and output
490 @deftypefun enum fmt_type fmt_input_to_output (enum fmt_type @var{type})
491 Returns the output format type that is used by default by DATA LIST
492 and other input procedures when @var{type} is specified as an input
493 format. The conversion from input format to output format is more
494 complicated than simply changing the format.
495 @xref{fmt_for_output_from_input}, for a function that performs the
499 @deftypefun bool fmt_usable_for_input (enum fmt_type @var{type})
500 Returns true if @var{type} may be used as an input format type, false
501 otherwise. The custom currency formats, in particular, may be used
502 for output but not for input.
504 All format types are valid for output.
507 The final group of format type property functions obtain
508 human-readable templates that illustrate the formats graphically.
510 @deftypefun const char *fmt_date_template (enum fmt_type @var{type})
511 Returns a formatting template for @var{type}, which must be a date or
512 time format type. These formats are used by @func{data_in} and
513 @func{data_out} to guide parsing and formatting date and time data.
516 @deftypefun char *fmt_dollar_template (const struct fmt_spec *@var{format})
517 Returns a string of the form @code{$#,###.##} according to
518 @var{format}, which must be of type @code{FMT_DOLLAR}. The caller
519 must free the string with @code{free}.
522 @node Numeric Formatting Styles
523 @subsection Numeric Formatting Styles
525 Each of the basic numeric formats (F, E, COMMA, DOT, DOLLAR, PCT) and
526 custom currency formats (CCA, CCB, CCC, CCD, CCE) has an associated
527 numeric formatting style, represented by @struct{fmt_number_style}.
528 Input and output conversion of formats that have numeric styles is
529 determined mainly by the style, although the formatting rules have
530 special cases that are not represented within the style.
532 @deftp {Structure} {struct fmt_number_style}
533 A structure type with the following members:
536 @item struct substring neg_prefix
537 @itemx struct substring prefix
538 @itemx struct substring suffix
539 @itemx struct substring neg_suffix
A set of strings used as a prefix to negative numbers, a prefix to every
number, a suffix to every number, and a suffix to negative numbers,
respectively. Each of these strings is no more than
@code{FMT_STYLE_AFFIX_MAX} bytes (currently 16) in length.
544 These strings must be freed with @func{ss_dealloc} when no longer
548 The character used as a decimal point. It must be either @samp{.} or
552 The character used for grouping digits to the left of the decimal
553 point. It may be @samp{.} or @samp{,}, in which case it must not be
554 equal to @code{decimal}, or it may be set to 0 to disable grouping.
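The constraints on the decimal point and grouping characters can be
expressed as a small consistency check. This is a sketch under stated
assumptions, not the library's code: the structure is a hypothetical
simplification that uses C strings in place of @struct{substring}, and
the second permitted decimal point character is assumed to be a comma,
by symmetry with the grouping character.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct fmt_number_style. */
struct sketch_number_style
  {
    const char *neg_prefix;     /* Prefix to negative numbers. */
    const char *prefix;         /* Prefix to every number. */
    const char *suffix;         /* Suffix to every number. */
    const char *neg_suffix;     /* Suffix to negative numbers. */
    char decimal;               /* '.' or (assumed) ','. */
    char grouping;              /* '.', ',', or 0 for no grouping. */
  };

/* Returns true if the decimal point and grouping characters in STYLE
   obey the constraints described above. */
bool
sketch_style_is_consistent (const struct sketch_number_style *style)
{
  if (style->decimal != '.' && style->decimal != ',')
    return false;
  if (style->grouping == 0)
    return true;                /* Grouping disabled. */
  return ((style->grouping == '.' || style->grouping == ',')
          && style->grouping != style->decimal);
}
```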
558 The following functions are provided for working with numeric
561 @deftypefun void fmt_number_style_init (struct fmt_number_style *@var{style})
Initializes a @struct{fmt_number_style} with all of the
prefixes and suffixes set to the empty string, @samp{.} as the decimal
point character, and grouping disabled.
568 @deftypefun void fmt_number_style_destroy (struct fmt_number_style *@var{style})
569 Destroys @var{style}, freeing its storage.
572 @deftypefun {struct fmt_number_style} *fmt_create (void)
Creates an array of all the styles used by PSPP, calling
@func{fmt_number_style_init} on each of them.
577 @deftypefun void fmt_done (struct fmt_number_style *@var{styles})
A wrapper function that takes @var{styles}, an array of
@struct{fmt_number_style}, calls @func{fmt_number_style_destroy} on each
element, and then frees the array.
584 @deftypefun int fmt_affix_width (const struct fmt_number_style *@var{style})
585 Returns the total length of @var{style}'s @code{prefix} and @code{suffix}.
588 @deftypefun int fmt_neg_affix_width (const struct fmt_number_style *@var{style})
589 Returns the total length of @var{style}'s @code{neg_prefix} and
593 PSPP maintains a global set of number styles for each of the basic
594 numeric formats and custom currency formats. The following functions
595 work with these global styles:
597 @deftypefun {const struct fmt_number_style *} fmt_get_style (enum fmt_type @var{type})
598 Returns the numeric style for the given format @var{type}.
601 @deftypefun void fmt_check_style (const struct fmt_number_style *@var{style})
Asserts that @var{style} is self-consistent.
606 @deftypefun {const char *} fmt_name (enum fmt_type @var{type})
607 Returns the name of the given format @var{type}.
612 @node Formatted Data Input and Output
613 @subsection Formatted Data Input and Output
615 These functions provide the ability to convert data fields into
616 @union{value}s and vice versa.
618 @deftypefun bool data_in (struct substring @var{input}, enum legacy_encoding @var{legacy_encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, union value *@var{output}, int @var{width})
619 Parses @var{input} as a field containing data in the given format
620 @var{type}. The resulting value is stored in @var{output}, which has
621 the given @var{width}. For consistency, @var{width} must be 0 if
622 @var{type} is a numeric format type and greater than 0 if @var{type}
623 is a string format type.
625 Ordinarily @var{legacy_encoding} should be @code{LEGACY_NATIVE},
626 indicating that @var{input} is encoded in the character set
627 conventionally used on the host machine. It may be set to
628 @code{LEGACY_EBCDIC} to cause @var{input} to be re-encoded from EBCDIC
631 If @var{input} is the empty string (with length 0), @var{output} is
632 set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP
633 Users Guide}) for a numeric value, or to all spaces for a string
634 value. This applies regardless of the usual parsing requirements for
637 If @var{implied_decimals} is greater than zero, then the numeric
638 result is shifted right by @var{implied_decimals} decimal places if
639 @var{input} does not contain a decimal point character or an exponent.
640 Only certain numeric format types support implied decimal places; for
641 string formats and other numeric formats, @var{implied_decimals} has
642 no effect. DATA LIST FIXED is the primary user of this feature
643 (@pxref{DATA LIST FIXED,,,pspp, PSPP Users Guide}). Other callers
644 should generally specify 0 for @var{implied_decimals}, to disable this
647 When @var{input} contains invalid input data, @func{data_in} outputs a
648 message using @func{msg}.
650 If @var{first_column} is
651 nonzero, it is included in any such error message as the 1-based
column number of the start of the field. The last column in the field
is calculated as @math{@var{first_column} + @var{n} - 1}, where @var{n}
is the length of @var{input}. To suppress error output, place the call
to @func{data_in} between calls to @func{msg_disable} and
@func{msg_enable}.
657 This function returns true on success, false if a message was output
658 (even if suppressed). Overflow and underflow provoke warnings but are
659 not propagated to the caller as errors.
661 This function is declared in @file{data/data-in.h}.
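The implied decimal places rule described above for @func{data_in} can
be sketched as follows. This is an illustration only:
@code{sketch_apply_implied_decimals} is a hypothetical helper that
parses a null-terminated C string with @code{strtod} rather than a
@struct{substring}, and it recognizes only @samp{.}, @samp{e}, and
@samp{E} as signs of an explicit decimal point or exponent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parses FIELD as a number.  If FIELD contains neither a decimal point
   nor an exponent, the result is shifted right by IMPLIED_DECIMALS
   decimal places, that is, divided by 10 to that power. */
double
sketch_apply_implied_decimals (const char *field, int implied_decimals)
{
  double x = strtod (field, NULL);
  double divisor = 1.0;
  int i;

  /* An explicit decimal point or exponent disables the adjustment. */
  if (implied_decimals <= 0 || strpbrk (field, ".eE") != NULL)
    return x;

  for (i = 0; i < implied_decimals; i++)
    divisor *= 10.0;
  return x / divisor;
}
```

With two implied decimal places, a field containing @samp{1234} yields
12.34, whereas a field containing @samp{12.5} is left as 12.5 because
it already has a decimal point.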
664 @deftypefun void data_out (const union value *@var{input}, const struct fmt_spec *@var{format}, char *@var{output})
665 @deftypefunx void data_out_legacy (const union value *@var{input}, enum legacy_encoding @var{legacy_encoding}, const struct fmt_spec *@var{format}, char *@var{output})
666 Converts the data pointed to by @var{input} into a data field in
667 @var{output} according to output format specifier @var{format}, which
668 must be a valid output format. Exactly @code{@var{format}->w} bytes
669 are written to @var{output}. The width of @var{input} is also
670 inferred from @var{format} using an algorithm equivalent to
671 @func{fmt_var_width}.
673 If @func{data_out} is called, or @func{data_out_legacy} is called with
674 @var{legacy_encoding} set to @code{LEGACY_NATIVE}, @var{output} will
675 be encoded in the character set conventionally used on the host
676 machine. If @var{legacy_encoding} is set to @code{LEGACY_EBCDIC},
@var{output} will be re-encoded to EBCDIC during data output.
679 When @var{input} contains data that cannot be represented in the given
680 @var{format}, @func{data_out} may output a message using @func{msg},
682 although the current implementation does not
consistently do so. To suppress error output, place the call to
@func{data_out} between calls to @func{msg_disable} and @func{msg_enable}.
686 This function is declared in @file{data/data-out.h}.
689 @node User-Missing Values
690 @section User-Missing Values
692 In addition to the system-missing value for numeric values, each
693 variable has a set of user-missing values (@pxref{MISSING
694 VALUES,,,pspp, PSPP Users Guide}). A set of user-missing values is
695 represented by @struct{missing_values}.
697 It is rarely necessary to interact directly with a
698 @struct{missing_values} object. Instead, the most common operation,
699 querying whether a particular value is a missing value for a given
700 variable, is most conveniently executed through functions on
701 @struct{variable}. @xref{Variable Missing Values}, for details.
703 A @struct{missing_values} is essentially a set of @union{value}s that
704 have a common value width (@pxref{Values}). For a set of
705 missing values associated with a variable (the common case), the set's
706 width is the same as the variable's width. The contents of a set of
missing values are subject to some restrictions. Regardless of width,
708 a set of missing values is allowed to be empty. Otherwise, its
709 possible contents depend on its width:
712 @item 0 (numeric values)
713 Up to three discrete numeric values, or a range of numeric values
714 (which includes both ends of the range), or a range plus one discrete
@item 1@dots{}@t{MAX_SHORT_STRING} (short string values)
Up to three discrete string values (with the same width as the set).
@item @t{MAX_SHORT_STRING} + 1@dots{}@t{MAX_STRING} (long string values)
724 These somewhat arbitrary restrictions are the same as those imposed by
725 SPSS. In PSPP we could easily eliminate these restrictions, but doing
726 so would also require us to extend the system file format in an
727 incompatible way, which we consider a bad tradeoff.
729 Function prototypes and other declarations related to missing values
are in @file{data/missing-values.h}.
732 @deftp {Structure} {struct missing_values}
733 Opaque type that represents a set of missing values.
736 The most often useful functions for missing values are those for
737 testing whether a given value is missing, described in the following
738 section. Several other functions for creating, inspecting, and
739 modifying @struct{missing_values} objects are described afterward, but
740 these functions are much more rarely useful. No function for
741 destroying a @struct{missing_values} is provided, because
742 @struct{missing_values} does not contain any pointers or other
743 references to resources that need deallocation.
746 * Testing for Missing Values::
747 * Initializing User-Missing Value Sets::
748 * Changing User-Missing Value Set Width::
749 * Inspecting User-Missing Value Sets::
750 * Modifying User-Missing Value Sets::
753 @node Testing for Missing Values
754 @subsection Testing for Missing Values
756 The most often useful functions for missing values are those for
757 testing whether a given value is missing, described here. However,
758 using one of the corresponding missing value testing functions for
759 variables can be even easier (@pxref{Variable Missing Values}).
761 @deftypefun bool mv_is_value_missing (const struct missing_values *@var{mv}, const union value *@var{value}, enum mv_class @var{class})
762 @deftypefunx bool mv_is_num_missing (const struct missing_values *@var{mv}, double @var{value}, enum mv_class @var{class})
763 @deftypefunx bool mv_is_str_missing (const struct missing_values *@var{mv}, const char @var{value}[], enum mv_class @var{class})
764 Tests whether @var{value} is in one of the categories of missing
765 values given by @var{class}. Returns true if so, false otherwise.
767 @var{mv} determines the width of @var{value} and provides the set of
768 user-missing values to test.
The only difference among these functions is the form in which
771 @var{value} is provided, so you may use whichever function is most
774 The @var{class} argument determines the exact kinds of missing values
775 that the functions test for:
777 @deftp Enumeration {enum mv_class}
780 Returns true if @var{value} is in the set of user-missing values given
784 Returns true if @var{value} is system-missing. (If @var{mv}
785 represents a set of string values, then @var{value} is never
789 @itemx MV_USER | MV_SYSTEM
790 Returns true if @var{value} is user-missing or system-missing.
Always returns false, that is, @var{value} is never considered
missing.
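The way the @var{class} argument selects which kinds of missing values
are tested can be sketched for the numeric case. Everything here is a
hypothetical stand-in: the structure is not the real (opaque)
@struct{missing_values}, the bit values assigned to the classes are
assumptions chosen so that the user and system classes can be combined
with @code{|}, and @code{SKETCH_SYSMIS} mirrors the current definition
of @code{SYSMIS} only to keep the example self-contained.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

#define SKETCH_SYSMIS (-DBL_MAX)   /* Assumption; see SYSMIS. */

/* Assumed bit values for the missing value classes. */
enum sketch_mv_class
  {
    SKETCH_MV_NEVER = 0,
    SKETCH_MV_USER = 1,
    SKETCH_MV_SYSTEM = 2,
    SKETCH_MV_ANY = SKETCH_MV_USER | SKETCH_MV_SYSTEM
  };

/* Numeric-only stand-in for a set of user-missing values. */
struct sketch_missing_values
  {
    int n_values;               /* Number of discrete values, 0 to 3. */
    double values[3];           /* Discrete user-missing values. */
    bool has_range;             /* Is there a range? */
    double low, high;           /* Range endpoints, both inclusive. */
  };

/* Returns true if X is one of MV's user-missing values. */
static bool
sketch_is_user_missing (const struct sketch_missing_values *mv, double x)
{
  int i;

  for (i = 0; i < mv->n_values; i++)
    if (mv->values[i] == x)
      return true;
  return mv->has_range && x >= mv->low && x <= mv->high;
}

/* Returns true if X is missing in any of the given CLASSES. */
bool
sketch_mv_is_num_missing (const struct sketch_missing_values *mv,
                          double x, int classes)
{
  if ((classes & SKETCH_MV_SYSTEM) && x == SKETCH_SYSMIS)
    return true;
  return (classes & SKETCH_MV_USER) && sketch_is_user_missing (mv, x);
}
```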
799 @node Initializing User-Missing Value Sets
800 @subsection Initializing User-Missing Value Sets
802 @deftypefun void mv_init (struct missing_values *@var{mv}, int @var{width})
803 Initializes @var{mv} as a set of user-missing values. The set is
804 initially empty. Any values added to it must have the specified
808 @deftypefun void mv_copy (struct missing_values *@var{mv}, const struct missing_values *@var{old})
809 Initializes @var{mv} as a copy of the existing set of user-missing
813 @deftypefun void mv_clear (struct missing_values *@var{mv})
814 Empties the user-missing value set @var{mv}, retaining its existing
818 @node Changing User-Missing Value Set Width
819 @subsection Changing User-Missing Value Set Width
821 A few PSPP language constructs copy sets of user-missing values from
822 one variable to another. When the source and target variables have
823 the same width, this is simple. But when the target variable's width
824 might be different from the source variable's, it takes a little more
825 work. The functions described here can help.
827 In fact, it is usually unnecessary to call these functions directly.
828 Most of the time @func{var_set_missing_values}, which uses
829 @func{mv_resize} internally to resize the new set of missing values to
830 the required width, may be used instead.
831 @xref{var_set_missing_values}, for more information.
833 @deftypefun bool mv_is_resizable (const struct missing_values *@var{mv}, int @var{new_width})
834 Tests whether @var{mv}'s width may be changed to @var{new_width} using
835 @func{mv_resize}. Returns true if it is allowed, false otherwise.
837 If @var{new_width} is a long string width, @var{mv} may be resized
838 only if it is empty. Otherwise, if @var{mv} contains any missing
839 values, then it may be resized only if each missing value may be
840 resized, as determined by @func{value_is_resizable}
841 (@pxref{value_is_resizable}).
845 @deftypefun void mv_resize (struct missing_values *@var{mv}, int @var{width})
846 Changes @var{mv}'s width to @var{width}. @var{mv} and @var{width}
847 must satisfy the constraints explained above.
849 When a string missing value set's width is increased, each
850 user-missing value is padded on the right with spaces to the new
854 @node Inspecting User-Missing Value Sets
855 @subsection Inspecting User-Missing Value Sets
857 These functions inspect the properties and contents of
858 @struct{missing_values} objects.
860 The first set of functions inspects the discrete values that numeric
861 and short string sets of user-missing values may contain:
863 @deftypefun bool mv_is_empty (const struct missing_values *@var{mv})
864 Returns true if @var{mv} contains no user-missing values, false if it
865 contains at least one user-missing value (either a discrete value or a
869 @deftypefun int mv_get_width (const struct missing_values *@var{mv})
870 Returns the width of the user-missing values that @var{mv} represents.
873 @deftypefun int mv_n_values (const struct missing_values *@var{mv})
874 Returns the number of discrete user-missing values included in
875 @var{mv}. The return value will be between 0 and 3. For sets of
876 numeric user-missing values that include a range, the return value
880 @deftypefun bool mv_has_value (const struct missing_values *@var{mv})
881 Returns true if @var{mv} has at least one discrete user-missing
882 value, that is, if @func{mv_n_values} would return nonzero for
886 @deftypefun void mv_get_value (const struct missing_values *@var{mv}, union value *@var{value}, int @var{index})
887 Copies the discrete user-missing value in @var{mv} with the given
888 @var{index} into @var{value}. The index must be less than the number
889 of discrete user-missing values in @var{mv}, as reported by
893 The second set of functions inspects the single range of values that
894 numeric sets of user-missing values may contain:
896 @deftypefun bool mv_has_range (const struct missing_values *@var{mv})
897 Returns true if @var{mv} includes a range, false otherwise.
900 @deftypefun void mv_get_range (const struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
901 Stores the low endpoint of @var{mv}'s range in @code{*@var{low}} and
902 the high endpoint of the range in @code{*@var{high}}. @var{mv} must
906 @node Modifying User-Missing Value Sets
907 @subsection Modifying User-Missing Value Sets
909 These functions modify the contents of @struct{missing_values}
912 The first set of functions applies to all sets of user-missing values:
914 @deftypefun bool mv_add_value (struct missing_values *@var{mv}, const union value *@var{value})
915 @deftypefunx bool mv_add_str (struct missing_values *@var{mv}, const char @var{value}[])
916 @deftypefunx bool mv_add_num (struct missing_values *@var{mv}, double @var{value})
917 Attempts to add the given discrete @var{value} to the set of user-missing
918 values @var{mv}. @var{value} must have the same width as @var{mv}.
919 Returns true if @var{value} was successfully added, false if the set
920 could not accept any more discrete values. (Always returns false if
921 @var{mv} is a set of long string user-missing values.)
923 These functions are equivalent, except for the form in which
924 @var{value} is provided, so you may use whichever function is most
928 @deftypefun void mv_pop_value (struct missing_values *@var{mv}, union value *@var{value})
929 Removes a discrete value from @var{mv} (which must contain at least
930 one discrete value) and stores it in @var{value}.
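
The capacity rules for discrete values and ranges can be illustrated with a small stand-alone model. This is only a sketch: @code{struct toy_mv} and the @code{toy_mv_*} functions are hypothetical stand-ins, not PSPP's @struct{missing_values} or its real functions. The limit of one discrete value once a range is present is an assumption inferred from the @func{mv_add_range} constraints, and popping the most recently added value is likewise an arbitrary choice made here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numeric-only stand-in for struct missing_values.
   This is not PSPP's real layout. */
struct toy_mv
  {
    double values[3];   /* Discrete user-missing values. */
    int n_values;       /* Number of discrete values in use. */
    bool has_range;     /* True if low...high is part of the set. */
    double low, high;   /* Range endpoints, inclusive. */
  };

/* Initializes MV as an empty set. */
void
toy_mv_init (struct toy_mv *mv)
{
  mv->n_values = 0;
  mv->has_range = false;
  mv->low = mv->high = 0.0;
}

/* Adds discrete VALUE.  Assumed rule: at most three discrete
   values, or at most one if the set also holds a range. */
bool
toy_mv_add_num (struct toy_mv *mv, double value)
{
  int max_values = mv->has_range ? 1 : 3;
  if (mv->n_values >= max_values)
    return false;
  mv->values[mv->n_values++] = value;
  return true;
}

/* Adds the range LOW...HIGH.  Fails if a range is already present,
   if more than one discrete value is present, or if LOW > HIGH. */
bool
toy_mv_add_range (struct toy_mv *mv, double low, double high)
{
  if (mv->has_range || mv->n_values > 1 || low > high)
    return false;
  mv->has_range = true;
  mv->low = low;
  mv->high = high;
  return true;
}

/* Removes and returns the most recently added discrete value.
   MV must contain at least one discrete value. */
double
toy_mv_pop_value (struct toy_mv *mv)
{
  assert (mv->n_values > 0);
  return mv->values[--mv->n_values];
}
```

In this model, a set holding two discrete values rejects a range until one of the values has been popped.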
933 @deftypefun void mv_replace_value (struct missing_values *@var{mv}, const union value *@var{value}, int @var{index})
934 Replaces the discrete value with the given @var{index} in @var{mv}
935 (which must contain at least @var{index} + 1 discrete values) with
939 The second set of functions applies only to numeric sets of
942 @deftypefun bool mv_add_range (struct missing_values *@var{mv}, double @var{low}, double @var{high})
943 Attempts to add a numeric range covering @var{low}@dots{}@var{high}
944 (inclusive on both ends) to @var{mv}, which must be a numeric set of
945 user-missing values. Returns true if the range is successfully added,
946 false on failure. Fails if @var{mv} already contains a range, or if
947 @var{mv} contains more than one discrete value, or if @var{low} >
951 @deftypefun void mv_pop_range (struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
952 Given @var{mv}, which must be a numeric set of user-missing values
953 that contains a range, removes that range from @var{mv} and stores its
954 low endpoint in @code{*@var{low}} and its high endpoint in
959 @section Value Labels
961 Each variable has a set of value labels (@pxref{VALUE LABELS,,,pspp,
962 PSPP Users Guide}), represented as @struct{val_labs}. A
963 @struct{val_labs} is essentially a map from @union{value}s to strings.
964 All of the values in a set of value labels have the same width, which
965 for a set of value labels owned by a variable (the common case) is the
966 same as the width of that variable.
968 Numeric and short string sets of value labels may contain any number
969 of entries. Long string sets of value labels may not contain any
970 value labels at all, due to a corresponding restriction in SPSS. In
971 PSPP we could easily eliminate this restriction, but doing so would
972 also require us to extend the system file format in an incompatible
973 way, which we consider a bad tradeoff.
975 It is rarely necessary to interact directly with a @struct{val_labs}
976 object. Instead, the most common operation, looking up the label for
977 a value of a given variable, can be conveniently executed through
978 functions on @struct{variable}. @xref{Variable Value Labels}, for
981 Function prototypes and other declarations related to value labels
982 are found in @file{data/value-labels.h}.
984 @deftp {Structure} {struct val_labs}
985 Opaque type that represents a set of value labels.
988 The most commonly useful function for value labels is
989 @func{val_labs_find}, for looking up the label associated with a
992 @deftypefun {char *} val_labs_find (const struct val_labs *@var{val_labs}, union value @var{value})
993 Looks in @var{val_labs} for a label for the given @var{value}.
994 Returns the label, if one is found, or a null pointer otherwise.
997 Several other functions for working with value labels are described in
998 the following section, but these are more rarely useful.
1001 * Value Labels Creation and Destruction::
1002 * Value Labels Properties::
1003 * Value Labels Adding and Removing Labels::
1004 * Value Labels Iteration::
1007 @node Value Labels Creation and Destruction
1008 @subsection Creation and Destruction
1010 These functions create and destroy @struct{val_labs} objects.
1012 @deftypefun {struct val_labs *} val_labs_create (int @var{width})
1013 Creates and returns an initially empty set of value labels with the
1017 @deftypefun {struct val_labs *} val_labs_clone (const struct val_labs *@var{val_labs})
1018 Creates and returns a set of value labels whose width and contents are
1019 the same as those of @var{val_labs}.
1022 @deftypefun void val_labs_clear (struct val_labs *@var{var_labs})
1023 Deletes all value labels from @var{var_labs}.
1026 @deftypefun void val_labs_destroy (struct val_labs *@var{var_labs})
1027 Destroys @var{var_labs}, which must not be referenced again.
1030 @node Value Labels Properties
1031 @subsection Value Labels Properties
1033 These functions inspect and manipulate basic properties of
1034 @struct{val_labs} objects.
1036 @deftypefun size_t val_labs_count (const struct val_labs *@var{val_labs})
1037 Returns the number of value labels in @var{val_labs}.
1040 @deftypefun bool val_labs_can_set_width (const struct val_labs *@var{val_labs}, int @var{new_width})
1041 Tests whether @var{val_labs}'s width may be changed to @var{new_width}
1042 using @func{val_labs_set_width}. Returns true if it is allowed, false
1045 A set of value labels may be resized to a given width only if each
1046 value in it may be resized to that width, as determined by
1047 @func{value_is_resizable} (@pxref{value_is_resizable}).
1050 @deftypefun void val_labs_set_width (struct val_labs *@var{val_labs}, int @var{new_width})
1051 Changes the width of @var{val_labs}'s values to @var{new_width}, which
1052 must be a valid new width as determined by
1053 @func{val_labs_can_set_width}.
1055 If @var{new_width} is a long string width, this function deletes all
1056 value labels from @var{val_labs}.
1059 @node Value Labels Adding and Removing Labels
1060 @subsection Adding and Removing Labels
1062 These functions add and remove value labels from a @struct{val_labs}
1063 object. These functions apply only to numeric and short string sets
1064 of value labels. They have no effect on long string sets of value
1065 labels, since these sets are always empty.
1067 @deftypefun bool val_labs_add (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1068 Adds @var{label} to @var{val_labs} as a label for @var{value},
1069 which must have the same width as the set of value labels. Returns
1070 true if successful, false if @var{value} already has a label or if
1071 @var{val_labs} has long string width.
1074 @deftypefun void val_labs_replace (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1075 Adds @var{label} to @var{val_labs} as a label for @var{value},
1076 which must have the same width as the set of value labels. If
1077 @var{value} already has a label in @var{val_labs}, it is replaced.
1078 Has no effect if @var{val_labs} has long string width.
1081 @deftypefun bool val_labs_remove (struct val_labs *@var{val_labs}, union value @var{value})
1082 Removes from @var{val_labs} any label for @var{value}, which must have
1083 the same width as the set of value labels. Returns true if a label
1084 was removed, false otherwise.
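
The difference between adding, replacing, and removing a label can be seen in a small stand-alone model. This is a sketch under stated assumptions: @code{struct toy_val_labs} and the @code{toy_*} functions are hypothetical, handle numeric values only, store label pointers without copying them, and use a fixed-size array rather than whatever data structure PSPP actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LABELS 16   /* Arbitrary capacity for this sketch. */

/* Hypothetical stand-in for struct val_labs, numeric values only. */
struct toy_val_labs
  {
    double values[TOY_MAX_LABELS];
    const char *labels[TOY_MAX_LABELS];  /* Pointers only, not copies. */
    size_t n;                            /* Number of labels in use. */
  };

/* Returns the index of VALUE in VLS, or VLS->n if it has no label. */
size_t
toy_index_of (const struct toy_val_labs *vls, double value)
{
  size_t i;
  for (i = 0; i < vls->n; i++)
    if (vls->values[i] == value)
      break;
  return i;
}

/* Adds LABEL for VALUE.  Returns false, keeping the old label,
   if VALUE already has one. */
bool
toy_val_labs_add (struct toy_val_labs *vls, double value, const char *label)
{
  if (toy_index_of (vls, value) < vls->n)
    return false;
  assert (vls->n < TOY_MAX_LABELS);
  vls->values[vls->n] = value;
  vls->labels[vls->n] = label;
  vls->n++;
  return true;
}

/* Sets LABEL for VALUE, overwriting any existing label. */
void
toy_val_labs_replace (struct toy_val_labs *vls, double value,
                      const char *label)
{
  size_t i = toy_index_of (vls, value);
  if (i < vls->n)
    vls->labels[i] = label;
  else
    toy_val_labs_add (vls, value, label);
}

/* Removes any label for VALUE.  Returns true if one was removed. */
bool
toy_val_labs_remove (struct toy_val_labs *vls, double value)
{
  size_t i = toy_index_of (vls, value);
  if (i == vls->n)
    return false;
  vls->n--;
  vls->values[i] = vls->values[vls->n];   /* Fill the hole with the last. */
  vls->labels[i] = vls->labels[vls->n];
  return true;
}

/* Returns the label for VALUE, or a null pointer if there is none. */
const char *
toy_val_labs_find (const struct toy_val_labs *vls, double value)
{
  size_t i = toy_index_of (vls, value);
  return i < vls->n ? vls->labels[i] : NULL;
}
```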
1087 @node Value Labels Iteration
1088 @subsection Iterating through Value Labels
1090 These functions allow iteration through the set of value labels
1091 represented by a @struct{val_labs} object. They are usually used in
1092 the context of a @code{for} loop:
1095 struct val_labs *val_labs;
1096 struct val_labs_iterator *i;
1101 for (vl = val_labs_first (val_labs, &i); vl != NULL;
1102 vl = val_labs_next (val_labs, &i))
1104 @dots{}@r{do something with @code{vl}}@dots{}
1108 The value labels in a @struct{val_labs} must not be modified while it is
1109 undergoing iteration.
1111 @deftp {Structure} {struct val_lab}
1112 Represents a value label for iteration purposes, with two
1113 client-visible members:
1116 @item union value value
1117 Value being labeled, of the same width as the @struct{val_labs} being
1120 @item const char *label
1121 The label, as a null-terminated string.
1125 @deftp {Structure} {struct val_labs_iterator}
1126 Opaque object that represents the current state of iteration through a
1127 set of value labels. Automatically destroyed by successful
1128 completion of iteration. Must be destroyed manually in other
1129 circumstances, by calling @func{val_labs_done}.
1132 @deftypefun {struct val_lab *} val_labs_first (const struct val_labs *@var{val_labs}, struct val_labs_iterator **@var{iterator})
1133 If @var{val_labs} contains at least one value label, starts an
1134 iteration through @var{val_labs}, initializes @code{*@var{iterator}}
1135 to point to a newly allocated iterator, and returns the first value
1136 label in @var{val_labs}. If @var{val_labs} is empty, sets
1137 @code{*@var{iterator}} to null and returns a null pointer.
1139 This function creates iterators that traverse sets of value labels in
1140 no particular order.
1143 @deftypefun {struct val_lab *} val_labs_first_sorted (const struct val_labs *@var{val_labs}, struct val_labs_iterator **@var{iterator})
1144 Same as @func{val_labs_first}, except that the created iterator
1145 traverses the set of value labels in ascending order of value.
1148 @deftypefun {struct val_lab *} val_labs_next (const struct val_labs *@var{val_labs}, struct val_labs_iterator **@var{iterator})
1149 Advances an iterator created with @func{val_labs_first} or
1150 @func{val_labs_first_sorted} to the next value label, which is
1151 returned. If the set of value labels is exhausted, returns a null
1152 pointer after freeing @code{*@var{iterator}} and setting it to a null
1156 @deftypefun void val_labs_done (struct val_labs_iterator **@var{iterator})
1157 Frees @code{*@var{iterator}} and sets it to a null pointer. Does
1158 not need to be called explicitly if @func{val_labs_next} returns a
1159 null pointer, indicating that all value labels have been visited.
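
The iterator ownership rule matters mostly when a loop exits early. The following stand-alone model sketches it. All of the @code{toy_*} names are hypothetical stand-ins that mimic the documented behavior of @func{val_labs_first}, @func{val_labs_next}, and @func{val_labs_done}; they are not PSPP's implementation. The @code{toy_live_iterators} counter exists only so that leaks are observable in the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the value label types. */
struct toy_val_lab { double value; const char *label; };
struct toy_val_labs { const struct toy_val_lab *labs; size_t n; };
struct toy_iterator { size_t next; };

int toy_live_iterators;   /* Iterators allocated but not yet freed. */

/* Frees *IP, if non-null, and sets it to a null pointer. */
void
toy_done (struct toy_iterator **ip)
{
  if (*ip != NULL)
    {
      free (*ip);
      *ip = NULL;
      toy_live_iterators--;
    }
}

/* Returns the next label, or frees *IP and returns a null pointer
   when the set is exhausted. */
const struct toy_val_lab *
toy_next (const struct toy_val_labs *vls, struct toy_iterator **ip)
{
  assert (*ip != NULL);
  if ((*ip)->next >= vls->n)
    {
      toy_done (ip);
      return NULL;
    }
  return &vls->labs[(*ip)->next++];
}

/* Starts iteration.  Allocates *IP only if VLS is nonempty. */
const struct toy_val_lab *
toy_first (const struct toy_val_labs *vls, struct toy_iterator **ip)
{
  if (vls->n == 0)
    {
      *ip = NULL;
      return NULL;
    }
  *ip = malloc (sizeof **ip);
  if (*ip == NULL)
    abort ();
  (*ip)->next = 0;
  toy_live_iterators++;
  return toy_next (vls, ip);
}

/* Returns the label for VALUE, or a null pointer if there is none.
   The early return must free the iterator explicitly. */
const char *
toy_lookup (const struct toy_val_labs *vls, double value)
{
  struct toy_iterator *i;
  const struct toy_val_lab *vl;

  for (vl = toy_first (vls, &i); vl != NULL; vl = toy_next (vls, &i))
    if (vl->value == value)
      {
        toy_done (&i);    /* Leaving early: free the iterator here. */
        return vl->label;
      }
  return NULL;            /* Exhausted: toy_next already freed it. */
}
```

In the sketch, no iterator remains allocated after @code{toy_lookup} returns, whether it found a match or ran to completion.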
1165 A PSPP variable is represented by @struct{variable}, an opaque type
1166 declared in @file{data/variable.h} along with related declarations.
1167 @xref{Variables,,,pspp, PSPP Users Guide}, for a description of PSPP
1168 variables from a user perspective.
1170 PSPP is unusual among computer languages in that, by itself, a PSPP
1171 variable does not have a value. Instead, a variable in PSPP takes on
1172 a value only in the context of a case, which supplies one value for
1173 each variable in a set of variables (@pxref{Cases}). The set of
1174 variables in a case, in turn, is ordinarily part of a dictionary
1175 (@pxref{Dictionaries}).
1177 Every variable has several attributes, most of which correspond
1178 directly to one of the variable attributes visible to PSPP users
1179 (@pxref{Attributes,,,pspp, PSPP Users Guide}).
1181 The following sections describe variable-related functions and macros.
1185 * Variable Type and Width::
1186 * Variable Missing Values::
1187 * Variable Value Labels::
1188 * Variable Print and Write Formats::
1190 * Variable GUI Attributes::
1191 * Variable Leave Status::
1192 * Dictionary Class::
1193 * Variable Creation and Destruction::
1194 * Variable Short Names::
1195 * Variable Relationships::
1196 * Variable Auxiliary Data::
1197 * Variable Categorical Values::
1201 @subsection Variable Name
1203 A variable name is a string between 1 and @code{VAR_NAME_LEN} bytes
1204 long that satisfies the rules for PSPP identifiers
1205 (@pxref{Tokens,,,pspp, PSPP Users Guide}). Variable names are
1206 mixed-case and treated case-insensitively.
1208 @deftypefn Macro int VAR_NAME_LEN
1209 Maximum length of a variable name, in bytes, currently 64.
1212 Only one commonly useful function relates to variable names:
1214 @deftypefun {const char *} var_get_name (const struct variable *@var{var})
1215 Returns @var{var}'s variable name as a C string.
1218 A few other functions are much more rarely used. Some of these
1219 functions are used internally by the dictionary implementation:
1221 @anchor{var_set_name}
1222 @deftypefun {void} var_set_name (struct variable *@var{var}, const char *@var{new_name})
1223 Changes the name of @var{var} to @var{new_name}, which must be a
1224 ``plausible'' name as defined below.
1226 This function cannot be applied to a variable that is part of a
1227 dictionary. Use @func{dict_rename_var} instead (@pxref{Dictionary
1228 Renaming Variables}).
1231 @anchor{var_is_plausible_name}
1232 @deftypefun {bool} var_is_valid_name (const char *@var{name}, bool @var{issue_error})
1233 @deftypefunx {bool} var_is_plausible_name (const char *@var{name}, bool @var{issue_error})
1234 Tests @var{name} for validity or ``plausibility.'' Returns true if
1235 the name is acceptable, false otherwise. If the name is not
1236 acceptable and @var{issue_error} is true, also issues an error message
1237 explaining the violation.
1239 A valid name is one that fully satisfies all of the requirements for
1240 variable names (@pxref{Tokens,,,pspp, PSPP Users Guide}). A
1241 ``plausible'' name is simply a string whose length is in the valid
1242 range and that is not a reserved word. PSPP accepts plausible but
1243 invalid names as variable names in some contexts where the character
1244 encoding scheme is ambiguous, as when reading variable names from
1248 @deftypefun {enum dict_class} var_get_dict_class (const struct variable *@var{var})
1249 Returns the dictionary class of @var{var}'s name (@pxref{Dictionary
1253 @node Variable Type and Width
1254 @subsection Variable Type and Width
1256 A variable's type and width are the type and width of its values
1259 @deftypefun {enum val_type} var_get_type (const struct variable *@var{var})
1260 Returns the type of variable @var{var}.
1263 @deftypefun int var_get_width (const struct variable *@var{var})
1264 Returns the width of variable @var{var}.
1267 @deftypefun void var_set_width (struct variable *@var{var}, int @var{width})
1268 Sets the width of variable @var{var} to @var{width}. The width of a
1269 variable should not normally be changed after the variable is created,
1270 so this function is rarely used. This function cannot be applied to a
1271 variable that is part of a dictionary.
1274 @deftypefun bool var_is_numeric (const struct variable *@var{var})
1275 Returns true if @var{var} is a numeric variable, false otherwise.
1278 @deftypefun bool var_is_alpha (const struct variable *@var{var})
1279 Returns true if @var{var} is an alphanumeric (string) variable, false
1283 @deftypefun bool var_is_short_string (const struct variable *@var{var})
1284 Returns true if @var{var} is a string variable of width
1285 @code{MAX_SHORT_STRING} or less, false otherwise.
1288 @deftypefun bool var_is_long_string (const struct variable *@var{var})
1289 Returns true if @var{var} is a string variable of width greater than
1290 @code{MAX_SHORT_STRING}, false otherwise.
1293 @deftypefun size_t var_get_value_cnt (const struct variable *@var{var})
1294 Returns the number of @union{value}s needed to hold an instance of
1295 variable @var{var}. @code{var_get_value_cnt (var)} is equivalent to
1296 @code{value_cnt_from_width (var_get_width (var))}.
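
The type and width predicates above reduce to comparisons on the width alone. The sketch below is stand-alone and uses hypothetical @code{toy_*} helpers that take a width instead of a @struct{variable}. The value 8 for @code{TOY_MAX_SHORT_STRING} is an assumption made for illustration; consult the definition of @code{MAX_SHORT_STRING} for the real limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in for MAX_SHORT_STRING; the real value may differ. */
#define TOY_MAX_SHORT_STRING 8

/* A width of 0 denotes a numeric variable. */
bool
toy_is_numeric (int width)
{
  assert (width >= 0);
  return width == 0;
}

/* Any nonzero width denotes a string variable. */
bool
toy_is_alpha (int width)
{
  assert (width >= 0);
  return width > 0;
}

/* Short strings have width 1...TOY_MAX_SHORT_STRING. */
bool
toy_is_short_string (int width)
{
  return toy_is_alpha (width) && width <= TOY_MAX_SHORT_STRING;
}

/* Long strings are wider than TOY_MAX_SHORT_STRING. */
bool
toy_is_long_string (int width)
{
  return toy_is_alpha (width) && width > TOY_MAX_SHORT_STRING;
}
```

Exactly one of the numeric, short string, and long string predicates holds for any valid width.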
1299 @node Variable Missing Values
1300 @subsection Variable Missing Values
1302 A numeric or short string variable may have a set of user-missing
1303 values (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}), represented
1304 as a @struct{missing_values} (@pxref{User-Missing Values}).
1306 The most frequent operation on a variable's missing values is to query
1307 whether a value is user- or system-missing:
1309 @deftypefun bool var_is_value_missing (const struct variable *@var{var}, const union value *@var{value}, enum mv_class @var{class})
1310 @deftypefunx bool var_is_num_missing (const struct variable *@var{var}, double @var{value}, enum mv_class @var{class})
1311 @deftypefunx bool var_is_str_missing (const struct variable *@var{var}, const char @var{value}[], enum mv_class @var{class})
1312 Tests whether @var{value} is a missing value of the given @var{class}
1313 for variable @var{var} and returns true if so, false otherwise.
1314 @func{var_is_num_missing} may only be applied to numeric variables;
1315 @func{var_is_str_missing} may only be applied to string variables.
1316 For string variables, @var{value} must contain exactly as many
1317 characters as @var{var}'s width.
1319 @code{var_is_@var{type}_missing (@var{var}, @var{value}, @var{class})}
1320 is equivalent to @code{mv_is_@var{type}_missing
1321 (var_get_missing_values (@var{var}), @var{value}, @var{class})}.
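
How the @var{class} argument selects the kinds of missing values to test can be sketched with a stand-alone model. Everything named @code{toy_*} or @code{TOY_*} below is hypothetical: the enumeration's numeric values, the use of @code{-DBL_MAX} as a stand-in for @code{SYSMIS}, and the plain array of discrete user-missing values are all assumptions made for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum mv_class, modeled as bit flags. */
enum toy_mv_class
  {
    TOY_MV_NEVER = 0,                           /* Never missing. */
    TOY_MV_USER = 1,                            /* User-missing. */
    TOY_MV_SYSTEM = 2,                          /* System-missing. */
    TOY_MV_ANY = TOY_MV_USER | TOY_MV_SYSTEM    /* Either kind. */
  };

/* Assumed sentinel standing in for SYSMIS. */
#define TOY_SYSMIS (-DBL_MAX)

/* Returns true if VALUE is missing in one of the categories in
   CLASSES, given N discrete user-missing values in USER_MISSING. */
bool
toy_is_num_missing (const double *user_missing, int n, double value,
                    enum toy_mv_class classes)
{
  int i;

  if ((classes & TOY_MV_SYSTEM) && value == TOY_SYSMIS)
    return true;
  if (classes & TOY_MV_USER)
    for (i = 0; i < n; i++)
      if (user_missing[i] == value)
        return true;
  return false;
}
```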
1324 In addition, a few functions are provided to work more directly with a
1325 variable's @struct{missing_values}:
1327 @deftypefun {const struct missing_values *} var_get_missing_values (const struct variable *@var{var})
1328 Returns the @struct{missing_values} associated with @var{var}. The
1329 caller must not modify the returned structure. The return value is
1333 @anchor{var_set_missing_values}
1334 @deftypefun {void} var_set_missing_values (struct variable *@var{var}, const struct missing_values *@var{miss})
1335 Changes @var{var}'s missing values to a copy of @var{miss}, or if
1336 @var{miss} is a null pointer, clears @var{var}'s missing values. If
1337 @var{miss} is non-null, it must have the same width as @var{var} or be
1338 resizable to @var{var}'s width (@pxref{mv_resize}). The caller
1339 retains ownership of @var{miss}.
1342 @deftypefun void var_clear_missing_values (struct variable *@var{var})
1343 Clears @var{var}'s missing values. Equivalent to
1344 @code{var_set_missing_values (@var{var}, NULL)}.
1347 @deftypefun bool var_has_missing_values (const struct variable *@var{var})
1348 Returns true if @var{var} has any missing values, false if it has
1349 none. Equivalent to @code{!mv_is_empty (var_get_missing_values (@var{var}))}.
1352 @node Variable Value Labels
1353 @subsection Variable Value Labels
1355 A numeric or short string variable may have a set of value labels
1356 (@pxref{VALUE LABELS,,,pspp, PSPP Users Guide}), represented as a
1357 @struct{val_labs} (@pxref{Value Labels}). The most commonly useful
1358 functions for value labels return the value label associated with a
1361 @deftypefun {const char *} var_lookup_value_label (const struct variable *@var{var}, const union value *@var{value})
1362 Looks for a label for @var{value} in @var{var}'s set of value labels.
1363 Returns the label if one exists, otherwise a null pointer.
1366 @deftypefun void var_append_value_name (const struct variable *@var{var}, const union value *@var{value}, struct string *@var{str})
1367 Looks for a label for @var{value} in @var{var}'s set of value labels.
1368 If a label exists, it will be appended to the string pointed to by @var{str}.
1369 Otherwise, it formats @var{value}
1370 using @var{var}'s print format (@pxref{Input and Output Formats})
1371 and appends the formatted string.
1374 The underlying @struct{val_labs} structure may also be accessed
1375 directly using the functions described below.
1377 @deftypefun bool var_has_value_labels (const struct variable *@var{var})
1378 Returns true if @var{var} has at least one value label, false
1382 @deftypefun {const struct val_labs *} var_get_value_labels (const struct variable *@var{var})
1383 Returns the @struct{val_labs} associated with @var{var}. If @var{var}
1384 has no value labels, then the return value may or may not be a null
1387 The variable retains ownership of the returned @struct{val_labs},
1388 which the caller must not attempt to modify.
1391 @deftypefun void var_set_value_labels (struct variable *@var{var}, const struct val_labs *@var{val_labs})
1392 Replaces @var{var}'s value labels by a copy of @var{val_labs}. The
1393 caller retains ownership of @var{val_labs}. If @var{val_labs} is a
1394 null pointer, then @var{var}'s value labels, if any, are deleted.
1397 @deftypefun void var_clear_value_labels (struct variable *@var{var})
1398 Deletes @var{var}'s value labels. Equivalent to
1399 @code{var_set_value_labels (@var{var}, NULL)}.
1402 A final group of functions offers shorthands for operations that would
1403 otherwise require getting the value labels from a variable, copying
1404 them, modifying them, and then setting the modified value labels into
1405 the variable (making a second copy):
1407 @deftypefun bool var_add_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1408 Attempts to add a copy of @var{label} as a label for @var{value} for
1409 the given @var{var}. If @var{value} already has a label, then the old
1410 label is retained. Returns true if a label is added, false if there
1411 was an existing label for @var{value} or if @var{var} is a long string
1412 variable. Either way, the caller retains ownership of @var{value} and
1416 @deftypefun void var_replace_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1417 Attempts to add a copy of @var{label} as a label for @var{value} for
1418 the given @var{var}. If @var{value} already has a label, then
1419 @var{label} replaces the old label. Either way, the caller retains
1420 ownership of @var{value} and @var{label}.
1422 If @var{var} is a long string variable, this function has no effect.
1425 @node Variable Print and Write Formats
1426 @subsection Variable Print and Write Formats
1428 Each variable has an associated pair of output formats, called its
1429 @dfn{print format} and @dfn{write format}. @xref{Input and Output
1430 Formats,,,pspp, PSPP Users Guide}, for an introduction to formats.
1431 @xref{Input and Output Formats}, for a developer's description of
1432 format representation.
1434 The print format is used to convert a variable's data values to
1435 strings for human-readable output. The write format is used similarly
1436 for machine-readable output, primarily by the WRITE transformation
1437 (@pxref{WRITE,,,pspp, PSPP Users Guide}). Most often a variable's
1438 print and write formats are the same.
1440 A newly created variable by default has format F8.2 if it is numeric
1441 or an A format with the same width as the variable if it is string.
1442 Many creators of variables override these defaults.
1444 Both the print format and write format are output formats. Input
1445 formats are not part of @struct{variable}. Instead, input programs
1446 and transformations keep track of variable input formats themselves.
1448 The following functions work with variable print and write formats.
1450 @deftypefun {const struct fmt_spec *} var_get_print_format (const struct variable *@var{var})
1451 @deftypefunx {const struct fmt_spec *} var_get_write_format (const struct variable *@var{var})
1452 Returns @var{var}'s print or write format, respectively.
1455 @deftypefun void var_set_print_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1456 @deftypefunx void var_set_write_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1457 @deftypefunx void var_set_both_formats (struct variable *@var{var}, const struct fmt_spec *@var{format})
1458 Sets @var{var}'s print format, write format, or both formats,
1459 respectively, to a copy of @var{format}.
1462 @node Variable Labels
1463 @subsection Variable Labels
1465 A variable label is a string that describes a variable. Variable
1466 labels may contain spaces and punctuation not allowed in variable
1467 names. @xref{VARIABLE LABELS,,,pspp, PSPP Users Guide}, for a
1468 user-level description of variable labels.
1470 The most commonly useful functions for variable labels are those to
1471 retrieve a variable's label:
1473 @deftypefun {const char *} var_to_string (const struct variable *@var{var})
1474 Returns @var{var}'s variable label, if it has one, otherwise
1475 @var{var}'s name. In either case the caller must not attempt to
1476 modify or free the returned string.
1478 This function is useful for user output.
1481 @deftypefun {const char *} var_get_label (const struct variable *@var{var})
1482 Returns @var{var}'s variable label, if it has one, or a null pointer
1486 A few other variable label functions are also provided:
1488 @deftypefun void var_set_label (struct variable *@var{var}, const char *@var{label})
1489 Sets @var{var}'s variable label to a copy of @var{label}, or removes
1490 any label from @var{var} if @var{label} is a null pointer or contains
1491 only spaces. Leading and trailing spaces are removed from the
1492 variable label and its remaining content is truncated at 255 bytes.
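
The normalization that @func{var_set_label} applies can be sketched as a stand-alone function. @code{toy_normalize_label} and @code{TOY_MAX_LABEL} are hypothetical names, and writing into a caller-supplied buffer is a simplification; the real function stores its own copy inside the variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LABEL 255   /* Maximum label length in bytes. */

/* Strips leading and trailing spaces from LABEL, truncates the rest
   to TOY_MAX_LABEL bytes, and stores the result in OUT.  Returns OUT,
   or a null pointer if LABEL is null or contains only spaces, which
   corresponds to removing the label. */
char *
toy_normalize_label (const char *label, char out[TOY_MAX_LABEL + 1])
{
  size_t start, end, len;

  if (label == NULL)
    return NULL;

  start = 0;
  end = strlen (label);
  while (start < end && label[start] == ' ')
    start++;
  while (end > start && label[end - 1] == ' ')
    end--;
  if (start == end)
    return NULL;

  len = end - start;
  if (len > TOY_MAX_LABEL)
    len = TOY_MAX_LABEL;
  memcpy (out, label + start, len);
  out[len] = '\0';
  return out;
}
```

Because truncation happens after trimming, a truncated label in this sketch may still end in a space that was interior to the original string.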
1495 @deftypefun void var_clear_label (struct variable *@var{var})
1496 Removes any variable label from @var{var}.
1499 @deftypefun bool var_has_label (const struct variable *@var{var})
1500 Returns true if @var{var} has a variable label, false otherwise.
1503 @node Variable GUI Attributes
1504 @subsection GUI Attributes
1506 These functions and types access and set attributes that are mainly
1507 used by graphical user interfaces. Their values are also stored in
1508 and retrieved from system files (but not portable files).
1510 The first group of functions relates to the measurement level of
1511 numeric data. New variables are assigned a nominal level of
1512 measurement by default.
1514 @deftp {Enumeration} {enum measure}
1515 Measurement level. Available values are:
1518 @item MEASURE_NOMINAL
1519 Numeric data values are arbitrary. Arithmetic operations and
1520 numerical comparisons of such data are not meaningful.
1522 @item MEASURE_ORDINAL
1523 Numeric data values indicate progression along a rank order.
1524 Arbitrary arithmetic operations such as addition are not meaningful on
1525 such data, but inequality comparisons (less, greater, etc.) have
1526 straightforward interpretations.
1529 Ratios, sums, etc. of numeric data values have meaningful
1533 PSPP does not have a separate category for interval data, which would
1534 naturally fall between the ordinal and scale measurement levels.
1537 @deftypefun bool measure_is_valid (enum measure @var{measure})
1538 Returns true if @var{measure} is a valid level of measurement, that
1539 is, if it is one of the @code{enum measure} constants listed above,
1540 and false otherwise.
1543 @deftypefun enum measure var_get_measure (const struct variable *@var{var})
1544 @deftypefunx void var_set_measure (struct variable *@var{var}, enum measure @var{measure})
1545 Gets or sets @var{var}'s measurement level.
1548 The following set of functions relates to the width of on-screen
1549 columns used for displaying variable data in a graphical user
1550 interface environment. The unit of measurement is the width of a
1551 character. For proportionally spaced fonts, this is based on the
1552 average width of a character.
1554 @deftypefun int var_get_display_width (const struct variable *@var{var})
1555 @deftypefunx void var_set_display_width (struct variable *@var{var}, int @var{display_width})
1556 Gets or sets @var{var}'s display width.
1559 @anchor{var_default_display_width}
1560 @deftypefun int var_default_display_width (int @var{width})
1561 Returns the default display width for a variable with the given
1562 @var{width}. The default width of a numeric variable is 8. The
1563 default width of a string variable is @var{width} or 32, whichever is
1567 The final group of functions works with the justification of data when
1568 it is displayed in on-screen columns. New variables are by default
1571 @deftp {Enumeration} {enum alignment}
1572 Text justification. Possible values are @code{ALIGN_LEFT},
1573 @code{ALIGN_RIGHT}, and @code{ALIGN_CENTRE}.
1576 @deftypefun bool alignment_is_valid (enum alignment @var{alignment})
1577 Returns true if @var{alignment} is a valid alignment, that is, if it
1578 is one of the @code{enum alignment} constants listed above, and false
1582 @deftypefun enum alignment var_get_alignment (const struct variable *@var{var})
1583 @deftypefunx void var_set_alignment (struct variable *@var{var}, enum alignment @var{alignment})
1584 Gets or sets @var{var}'s alignment.
1587 @node Variable Leave Status
1588 @subsection Variable Leave Status
1590 Commonly, most or all data in a case come from an input file, read
1591 with a command such as DATA LIST or GET, but data can also be
1592 generated with transformations such as COMPUTE. In the latter case
1593 the question of a datum's ``initial value'' can arise. For example,
1594 the value of a piece of generated data can recursively depend on its
1599 Another situation where the initial value of a variable arises is when
1600 its value is not set at all for some cases, e.g.@: below, @code{Y} is
1601 set only for the first 10 cases:
1603 DO IF #CASENUM <= 10.
1608 By default, the initial value of a datum in either of these situations
1609 is the system-missing value for numeric values and spaces for string
1610 values. This means that, above, X would be system-missing and that Y
1611 would be 1 for the first 10 cases and system-missing for the
1614 PSPP also supports retaining the value of a variable from one case to
1615 another, using the LEAVE command (@pxref{LEAVE,,,pspp, PSPP Users
1616 Guide}). The initial value of such a variable is 0 if it is numeric
1617 and spaces if it is a string. If the command @samp{LEAVE X Y} is
1618 appended to the above example, then X would have value 1 in the first
1619 case and increase by 1 in every succeeding case, and Y would have
1620 value 1 for the first 10 cases and 0 for later cases.
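The effect of leave status on a recursively computed value can be modeled with a short standalone C sketch. It does not use PSPP code. @code{SKETCH_SYSMIS}, @code{sketch_initial_value}, and @code{sketch_run_x} are hypothetical names, and the sketch assumes that the transformation on X adds 1 to X in every case.

```c
#include <float.h>
#include <stdbool.h>

/* Stand-in for the system-missing value (an assumption of this sketch). */
#define SKETCH_SYSMIS (-DBL_MAX)

/* Value of a numeric variable before any transformation runs on the
   first case: 0 if the variable is left, system-missing otherwise. */
double
sketch_initial_value (bool leave)
{
  return leave ? 0.0 : SKETCH_SYSMIS;
}

/* Models a transformation that adds 1 to X in each of N_CASES cases
   and returns the value of X in the last case.  The system-missing
   value propagates through the addition. */
double
sketch_run_x (bool leave, int n_cases)
{
  double x = sketch_initial_value (leave);
  for (int i = 0; i < n_cases; i++)
    {
      if (i > 0 && !leave)
        x = SKETCH_SYSMIS;      /* Reinitialized for each new case. */
      if (x != SKETCH_SYSMIS)
        x = x + 1.0;            /* Only a non-missing X can be incremented. */
    }
  return x;
}
```

Calling @code{sketch_run_x (true, 5)} models five cases with X left from case to case, and @code{sketch_run_x (false, 5)} models the default behavior, in which X never leaves the system-missing value.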
1622 The LEAVE command has no effect on data that comes from an input file
1623 or whose values do not depend on a variable's initial value.
1625 The values of scratch variables (@pxref{Scratch Variables,,,pspp, PSPP
1626 Users Guide}) are always left from one case to another.
1628 The following functions work with a variable's leave status.
1630 @deftypefun bool var_get_leave (const struct variable *@var{var})
1631 Returns true if @var{var}'s value is to be retained from case to case,
1632 false if it is reinitialized to system-missing or spaces.
1635 @deftypefun void var_set_leave (struct variable *@var{var}, bool @var{leave})
1636 If @var{leave} is true, marks @var{var} to be left from case to case;
1637 if @var{leave} is false, marks @var{var} to be reinitialized for each
1640 If @var{var} is a scratch variable, @var{leave} must be true.
1643 @deftypefun bool var_must_leave (const struct variable *@var{var})
1644 Returns true if @var{var} must be left from case to case, that is, if
1645 @var{var} is a scratch variable.
1648 @node Dictionary Class
1649 @subsection Dictionary Class
1651 Occasionally it is useful to classify variables into @dfn{dictionary
1652 classes} based on their names. Dictionary classes are represented by
1653 @enum{dict_class}. This type and other declarations for dictionary
1654 classes are in the @file{<data/dict-class.h>} header.
1656 @deftp {Enumeration} {enum dict_class}
1657 The dictionary classes are:
1661 An ordinary variable, one whose name does not begin with @samp{$} or
1665 A system variable, one whose name begins with @samp{$}. @xref{System
1666 Variables,,,pspp, PSPP Users Guide}.
1669 A scratch variable, one whose name begins with @samp{#}.
1670 @xref{Scratch Variables,,,pspp, PSPP Users Guide}.
1673 The values for dictionary classes are bitwise disjoint, which allows
1674 them to be used in bit-masks. An extra enumeration constant
1675 @code{DC_ALL}, whose value is the bitwise-@i{or} of all of the above
1676 constants, is provided to aid in this purpose.
1679 One example use of dictionary classes arises in connection with PSPP
1680 syntax that uses @code{@var{a} TO @var{b}} to name the variables in a
1681 dictionary from @var{a} to @var{b} (@pxref{Sets of Variables,,,pspp,
1682 PSPP Users Guide}). This syntax requires @var{a} and @var{b} to be in
1683 the same dictionary class. It limits the variables that it includes
1684 to those in that dictionary class.
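The following standalone sketch shows why bitwise-disjoint class values are convenient. It is not the code in @file{<data/dict-class.h>}: every identifier carries a @code{sketch_} or @code{SKETCH_} prefix to mark it as hypothetical, and the numeric values of the constants are assumptions.

```c
#include <stdbool.h>

/* Bitwise-disjoint class constants.  The numeric values are assumptions
   of this sketch; only their disjointness matters. */
enum sketch_dict_class
  {
    SKETCH_DC_ORDINARY = 0x1,   /* Name begins with neither '$' nor '#'. */
    SKETCH_DC_SYSTEM = 0x2,     /* Name begins with '$'. */
    SKETCH_DC_SCRATCH = 0x4,    /* Name begins with '#'. */
    SKETCH_DC_ALL = SKETCH_DC_ORDINARY | SKETCH_DC_SYSTEM | SKETCH_DC_SCRATCH
  };

/* Classifies NAME by looking at its first character. */
enum sketch_dict_class
sketch_class_from_id (const char *name)
{
  switch (name[0])
    {
    case '$':
      return SKETCH_DC_SYSTEM;
    case '#':
      return SKETCH_DC_SCRATCH;
    default:
      return SKETCH_DC_ORDINARY;
    }
}

/* Returns true if NAME's class is one of the classes in MASK. */
bool
sketch_class_in_mask (const char *name, unsigned int mask)
{
  return (sketch_class_from_id (name) & mask) != 0;
}

/* Returns true if A and B could serve as the endpoints of "A TO B",
   that is, if both names fall in the same dictionary class. */
bool
sketch_to_endpoints_ok (const char *a, const char *b)
{
  return sketch_class_from_id (a) == sketch_class_from_id (b);
}
```

Because each class occupies its own bit, a caller can combine classes with bitwise-@i{or} to build a mask and test membership with a single bitwise-@i{and}.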
1686 The following functions relate to dictionary classes.
1688 @deftypefun {enum dict_class} dict_class_from_id (const char *@var{name})
1689 Returns the ``dictionary class'' for the given variable @var{name}, by
1690 looking at its first character.
1693 @deftypefun {const char *} dict_class_to_name (enum dict_class @var{dict_class})
1694 Returns a name for the given @var{dict_class} as an adjective, e.g.@:
1697 This function should probably not be used in new code as it can lead
1698 to difficulties for internationalization.
1701 @node Variable Creation and Destruction
1702 @subsection Variable Creation and Destruction
1704 Only rarely should PSPP code create or destroy variables directly.
1705 Ordinarily, variables are created within a dictionary and destroyed
1706 by individual deletion from the dictionary or by destroying the entire
1707 dictionary at once. The functions here enable the exceptional case,
1708 of creation and destruction of variables that are not associated with
1709 any dictionary. These functions are used internally in the dictionary
1713 @deftypefun {struct variable *} var_create (const char *@var{name}, int @var{width})
1714 Creates and returns a new variable with the given @var{name} and
1715 @var{width}. The new variable is not part of any dictionary. Use
1716 @func{dict_create_var}, instead, to create a variable in a dictionary
1717 (@pxref{Dictionary Creating Variables}).
1719 @var{name} should be a valid variable name and must be a ``plausible''
1720 variable name (@pxref{Variable Name}). @var{width} must be between 0
1721 and @code{MAX_STRING}, inclusive (@pxref{Values}).
1723 The new variable has no user-missing values, value labels, or variable
1724 label. Numeric variables initially have F8.2 print and write formats,
1725 right-justified display alignment, and scale level of measurement.
1726 String variables are created with A print and write formats,
1727 left-justified display alignment, and nominal level of measurement.
1728 The initial display width is determined by
1729 @func{var_default_display_width} (@pxref{var_default_display_width}).
1731 The new variable initially has no short name (@pxref{Variable Short
1732 Names}) and no auxiliary data (@pxref{Variable Auxiliary Data}).
1736 @deftypefun {struct variable *} var_clone (const struct variable *@var{old_var})
1737 Creates and returns a new variable with the same attributes as
1738 @var{old_var}, with a few exceptions. First, the new variable is not
1739 part of any dictionary, regardless of whether @var{old_var} was in a
1740 dictionary. Use @func{dict_clone_var}, instead, to add a clone of a
1741 variable to a dictionary.
1743 Second, the new variable is not given any short name, even if
1744 @var{old_var} had a short name. This is because the new variable is
1745 likely to be immediately renamed, in which case the short name would
1746 be incorrect (@pxref{Variable Short Names}).
1748 Finally, @var{old_var}'s auxiliary data, if any, is not copied to the
1749 new variable (@pxref{Variable Auxiliary Data}).
1752 @deftypefun {void} var_destroy (struct variable *@var{var})
1753 Destroys @var{var} and frees all associated storage, including its
1754 auxiliary data, if any. @var{var} must not be part of a dictionary.
1755 To delete a variable from a dictionary and destroy it, use
1756 @func{dict_delete_var} (@pxref{Dictionary Deleting Variables}).
1759 @node Variable Short Names
1760 @subsection Variable Short Names
1762 PSPP variable names may be up to 64 (@code{VAR_NAME_LEN}) bytes long.
1763 The system and portable file formats, however, were designed when
1764 variable names were limited to 8 bytes in length. Since then, the
1765 system file format has been augmented with an extension record that
1766 explains how the 8-byte short names map to full-length names
1767 (@pxref{Long Variable Names Record}), but the short names are still
1768 present. Thus, the continued presence of the short names is more or
1769 less invisible to PSPP users, but every variable in a system file
1770 still has a short name that must be unique.
1772 PSPP can generate unique short names for variables based on their full
1773 names at the time it creates the data file. If all variables' full
1774 names are unique in their first 8 bytes, then the short names are
1775 simply prefixes of the full names; otherwise, PSPP changes them so
1776 that they are unique.
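One simple way to derive unique short names is sketched below. This is not PSPP's actual algorithm, whose details are not given here: the scheme of converting the prefix to uppercase and overwriting its tail with @samp{_1}, @samp{_2}, and so on is an assumption chosen for illustration, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_SHORT_LEN 8

/* Returns true if NAME appears among the first N entries of SHORT_NAMES. */
static bool
sketch_in_use (char short_names[][SKETCH_SHORT_LEN + 1], size_t n,
               const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (!strcmp (short_names[i], name))
      return true;
  return false;
}

/* Stores in SHORT_NAMES[i] a unique name of at most 8 bytes for each of
   the N full names in NAMES.  A name that is unique in its first 8
   bytes keeps that prefix, converted to uppercase. */
void
sketch_assign_short_names (const char *const names[], size_t n,
                           char short_names[][SKETCH_SHORT_LEN + 1])
{
  for (size_t i = 0; i < n; i++)
    {
      char candidate[SKETCH_SHORT_LEN + 1];
      size_t len = strlen (names[i]);
      if (len > SKETCH_SHORT_LEN)
        len = SKETCH_SHORT_LEN;
      for (size_t j = 0; j < len; j++)
        candidate[j] = toupper ((unsigned char) names[i][j]);
      candidate[len] = '\0';

      /* On a collision, overwrite the tail with "_1", "_2", and so on
         until the candidate is unused. */
      for (unsigned int trial = 1; sketch_in_use (short_names, i, candidate);
           trial++)
        {
          char suffix[16];
          int suffix_len = snprintf (suffix, sizeof suffix, "_%u", trial);
          assert (suffix_len > 0 && suffix_len <= SKETCH_SHORT_LEN);
          size_t keep = len;
          if (keep + suffix_len > SKETCH_SHORT_LEN)
            keep = SKETCH_SHORT_LEN - suffix_len;
          memcpy (candidate + keep, suffix, suffix_len + 1);
        }
      strcpy (short_names[i], candidate);
    }
}
```

With the full names RANKINGSCORE and RANKINGSTATUS, in that order, this scheme keeps RANKINGS for the first variable and derives a different name for the second.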
1778 By itself this algorithm interoperates well with other software that
1779 can read system files, as long as that software understands the
1780 extension record that maps short names to long names. When the other
1781 software does not understand the extension record, it can produce
1782 surprising results. Consider a situation where PSPP reads a system
1783 file that contains a variable named RANKINGSCORE, then the user
1784 adds a new variable named RANKINGSTATUS, then saves the modified data
1785 as a new system file. A program that does not understand long names
1786 would then see one of these variables under the name RANKINGS---either
1787 one, depending on the algorithm's details---and the other under a
1788 different name. The effect could be very confusing: by adding a new
1789 and apparently unrelated variable in PSPP, the user effectively
1790 renamed the existing variable.
1792 To counteract this potential problem, every @struct{variable} may have
1793 a short name. A variable created by the system or portable file
1794 reader receives the short name from that data file. When a variable
1795 with a short name is written to a system or portable file, that
1796 variable receives priority over other variables whose names begin
1797 with the same 8 bytes but which were not read from a data file under
1800 Variables not created by the system or portable file reader have no
1801 short name by default.
1803 A variable with a full name of 8 bytes or fewer in length has absolute
1804 priority for that name when the variable is written to a system file,
1805 even over a second variable with that assigned short name.
1807 PSPP does not enforce uniqueness of short names, although the short
1808 names read from any given data file will always be unique. If two
1809 variables with the same short name are written to a single data file,
1810 neither one receives priority.
1812 The following macros and functions relate to short names.
1814 @defmac SHORT_NAME_LEN
1815 Maximum length of a short name, in bytes. Its value is 8.
1818 @deftypefun {const char *} var_get_short_name (const struct variable *@var{var})
1819 Returns @var{var}'s short name, or a null pointer if @var{var} has not
1820 been assigned a short name.
1823 @deftypefun void var_set_short_name (struct variable *@var{var}, const char *@var{short_name})
1824 Sets @var{var}'s short name to @var{short_name}, or removes
1825 @var{var}'s short name if @var{short_name} is a null pointer. If it
1826 is non-null, then @var{short_name} must be a plausible name for a
1827 variable (@pxref{var_is_plausible_name}). The name will be truncated
1828 to 8 bytes in length and converted to all-uppercase.
1831 @deftypefun void var_clear_short_name (struct variable *@var{var})
1832 Removes @var{var}'s short name.
1835 @node Variable Relationships
1836 @subsection Variable Relationships
1838 Variables have close relationships with dictionaries
1839 (@pxref{Dictionaries}) and cases (@pxref{Cases}). A variable is
1840 usually a member of some dictionary, and a case is often used to store
1841 data for the set of variables in a dictionary.
1843 These functions report on these relationships. They may be applied
1844 only to variables that are in a dictionary.
1846 @deftypefun size_t var_get_dict_index (const struct variable *@var{var})
1847 Returns @var{var}'s index within its dictionary. The first variable
1848 in a dictionary has index 0, the next variable index 1, and so on.
1850 The dictionary index can be influenced using dictionary functions such
1851 as @func{dict_reorder_var} (@pxref{dict_reorder_var}).
1854 @deftypefun size_t var_get_case_index (const struct variable *@var{var})
1855 Returns @var{var}'s index within a case. The case index is an index
1856 into an array of @union{value} large enough to contain all the data in
1859 The returned case index can be used to access the value of @var{var}
1860 within a case for its dictionary, as in e.g.@: @code{case_data_idx
1861 (case, var_get_case_index (@var{var}))}, but ordinarily it is more
1862 convenient to use the data access functions that do variable-to-index
1863 translation internally, as in e.g.@: @code{case_data (case,
1867 @node Variable Auxiliary Data
1868 @subsection Variable Auxiliary Data
1870 Each @struct{variable} can have a single pointer to auxiliary data of
1871 type @code{void *}. These functions manipulate a variable's auxiliary
1874 Use of auxiliary data is discouraged because of its lack of
1875 flexibility. Only one client can make use of auxiliary data on a
1876 given variable at any time, even though many clients could usefully
1877 associate data with a variable.
1879 To prevent multiple clients from attempting to use a variable's single
1880 auxiliary data field at the same time, we adopt the convention that
1881 use of auxiliary data in the active file dictionary is restricted to
1882 the currently executing command. In particular, transformations must
1883 not attach auxiliary data to a variable in the active file in the
1884 expectation that it can be used later when the active file is read and
1885 the transformation is executed. To help enforce this restriction,
1886 auxiliary data is deleted from all variables in the active file
1887 dictionary after the execution of each PSPP command.
1889 This convention for safe use of auxiliary data applies only to the
1890 active file dictionary. Rules for other dictionaries may be
1891 established separately.
1893 Auxiliary data should be replaced by a more flexible mechanism at some
1894 point, but no replacement mechanism has been designed or implemented
1897 The following functions work with variable auxiliary data.
1899 @deftypefun {void *} var_get_aux (const struct variable *@var{var})
1900 Returns @var{var}'s auxiliary data, or a null pointer if none has been
1904 @deftypefun {void *} var_attach_aux (const struct variable *@var{var}, void *@var{aux}, void (*@var{aux_dtor}) (struct variable *))
1905 Sets @var{var}'s auxiliary data to @var{aux}, which must not be null.
1906 @var{var} must not already have auxiliary data.
1908 Before @var{var}'s auxiliary data is cleared by @code{var_clear_aux},
1909 @var{aux_dtor}, if non-null, will be called with @var{var} as its
1910 argument. It should free any storage associated with @var{aux}, if
1911 necessary. @code{var_dtor_free} may be appropriate for use as
1914 @deffn {Function} void var_dtor_free (struct variable *@var{var})
1915 Frees @var{var}'s auxiliary data by calling @code{free}.
1919 @deftypefun void var_clear_aux (struct variable *@var{var})
1920 Removes auxiliary data, if any, from @var{var}, first calling the
1921 destructor passed to @code{var_attach_aux}, if one was provided.
1923 Use @code{dict_clear_aux} to remove auxiliary data from every variable
1924 in a dictionary. @c (@pxref{dict_clear_aux}).
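The ownership rules for auxiliary data can be summarized in a standalone sketch. @code{struct sketch_aux_var} and the @code{sketch_} functions are hypothetical stand-ins rather than PSPP code, and having the attach function return @var{aux} is an assumption. The sketch also models detaching, which hands the data back to the caller without running the destructor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for the part of a variable that holds auxiliary data. */
struct sketch_aux_var
  {
    void *aux;                                    /* Auxiliary data, or NULL. */
    void (*aux_dtor) (struct sketch_aux_var *);   /* Destructor, or NULL. */
  };

/* Attaches AUX, which must be non-null, to V, which must not already
   have auxiliary data.  Returns AUX (an assumption of this sketch). */
void *
sketch_attach_aux (struct sketch_aux_var *v, void *aux,
                   void (*aux_dtor) (struct sketch_aux_var *))
{
  assert (aux != NULL);
  assert (v->aux == NULL);
  v->aux = aux;
  v->aux_dtor = aux_dtor;
  return aux;
}

/* Removes V's auxiliary data, if any, calling the destructor first. */
void
sketch_clear_aux (struct sketch_aux_var *v)
{
  if (v->aux != NULL)
    {
      if (v->aux_dtor != NULL)
        v->aux_dtor (v);
      v->aux = NULL;
      v->aux_dtor = NULL;
    }
}

/* Removes and returns V's auxiliary data without calling the
   destructor, so the caller becomes responsible for the data. */
void *
sketch_detach_aux (struct sketch_aux_var *v)
{
  void *aux = v->aux;
  v->aux = NULL;
  v->aux_dtor = NULL;
  return aux;
}

/* Destructor that frees auxiliary data allocated with malloc. */
void
sketch_dtor_free (struct sketch_aux_var *v)
{
  free (v->aux);
}
```

The design point is that exactly one party owns the data at any time: the variable while the data is attached, and the caller again after a detach.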
1927 @deftypefun {void *} var_detach_aux (struct variable *@var{var})
1928 Removes auxiliary data, if any, from @var{var}, and returns it.
1929 Returns a null pointer if @var{var} had no auxiliary data.
1931 Any destructor passed to @code{var_attach_aux} is not called, so the
1932 caller is responsible for freeing storage associated with the returned
1936 @node Variable Categorical Values
1937 @subsection Variable Categorical Values
1939 Some statistical procedures require a list of all the values that a
1940 categorical variable takes on. Arranging such a list requires making
1941 a pass through the data, so PSPP caches categorical values in
1944 When variable auxiliary data is revamped to support multiple clients
1945 as described in the previous section, categorical values are an
1946 obvious candidate. The form in which they are currently supported is
1949 Categorical values are not robust against changes in the data. That
1950 is, there is currently no way to detect that a transformation has
1951 changed data values, meaning that categorical values lists for the
1952 changed variables must be recomputed. PSPP is in fact in need of a
1953 general-purpose caching and cache-invalidation mechanism, but none
1954 has yet been designed and built.
1956 The following functions work with cached categorical values.
1958 @deftypefun {struct cat_vals *} var_get_obs_vals (const struct variable *@var{var})
1959 Returns @var{var}'s set of categorical values. Yields undefined
1960 behavior if @var{var} does not have any categorical values.
1963 @deftypefun void var_set_obs_vals (const struct variable *@var{var}, struct cat_vals *@var{cat_vals})
1964 Destroys @var{var}'s categorical values, if any, and replaces them by
1965 @var{cat_vals}, ownership of which is transferred to @var{var}. If
1966 @var{cat_vals} is a null pointer, then @var{var}'s categorical values
1970 @deftypefun bool var_has_obs_vals (const struct variable *@var{var})
1971 Returns true if @var{var} has a set of categorical values, false
1976 @section Dictionaries
1978 Each data file in memory or on disk has an associated dictionary,
1979 whose primary purpose is to describe the data in the file.
1980 @xref{Variables,,,pspp, PSPP Users Guide}, for a PSPP user's view of a
1983 A data file stored in a PSPP format, either as a system or portable
1984 file, has a representation of its dictionary embedded in it. Other
1985 kinds of data files are usually not self-describing enough to
1986 construct a dictionary unassisted, so the dictionaries for these files
1987 must be specified explicitly with PSPP commands such as @cmd{DATA
1990 The most important content of a dictionary is an array of variables,
1991 which must have unique names. A dictionary also conceptually contains
1992 a mapping from each of its variables to a location within a case
1993 (@pxref{Cases}), although in fact these mappings are stored within
1994 individual variables.
1996 System variables are not members of any dictionary (@pxref{System
1997 Variables,,,pspp, PSPP Users Guide}).
1999 Dictionaries are represented by @struct{dictionary}. Declarations
2000 related to dictionaries are in the @file{<data/dictionary.h>} header.
2002 The following sections describe functions for use with dictionaries.
2005 * Dictionary Variable Access::
2006 * Dictionary Creating Variables::
2007 * Dictionary Deleting Variables::
2008 * Dictionary Reordering Variables::
2009 * Dictionary Renaming Variables::
2010 * Dictionary Weight Variable::
2011 * Dictionary Filter Variable::
2012 * Dictionary Case Limit::
2013 * Dictionary Split Variables::
2014 * Dictionary File Label::
2015 * Dictionary Documents::
2018 @node Dictionary Variable Access
2019 @subsection Accessing Variables
2021 The most common operations on a dictionary simply retrieve a
2022 @code{struct variable *} of an individual variable based on its name
2025 @deftypefun {struct variable *} dict_lookup_var (const struct dictionary *@var{dict}, const char *@var{name})
2026 @deftypefunx {struct variable *} dict_lookup_var_assert (const struct dictionary *@var{dict}, const char *@var{name})
2027 Looks up and returns the variable with the given @var{name} within
2028 @var{dict}. Name lookup is not case-sensitive.
2030 @code{dict_lookup_var} returns a null pointer if @var{dict} does not
2031 contain a variable named @var{name}. @code{dict_lookup_var_assert}
2032 asserts that such a variable exists.
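Case-insensitive lookup with a null result for a missing name can be modeled as follows. This sketch is standalone and hypothetical: it searches a plain array linearly and compares ASCII letters only, which is an assumption and says nothing about how @struct{dictionary} indexes names internally.

```c
#include <ctype.h>
#include <stddef.h>

/* Minimal stand-in for a variable: only its name matters here. */
struct sketch_named_var
  {
    const char *name;
  };

/* Returns nonzero if A and B are equal, ignoring ASCII case. */
static int
sketch_names_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return 0;
  return *a == *b;              /* Equal only if both strings ended. */
}

/* Returns the variable named NAME among the N variables in VARS, or a
   null pointer if there is none. */
struct sketch_named_var *
sketch_lookup_var (struct sketch_named_var *vars, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (sketch_names_equal (vars[i].name, name))
      return &vars[i];
  return NULL;
}
```

A caller that cannot tolerate a missing variable would assert that the result is non-null, which is the distinction that the assert variant of the lookup function draws.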
2035 @deftypefun {struct variable *} dict_get_var (const struct dictionary *@var{dict}, size_t @var{position})
2036 Returns the variable at the given @var{position} in @var{dict}.
2037 @var{position} must be less than the number of variables in @var{dict}
2041 @deftypefun size_t dict_get_var_cnt (const struct dictionary *@var{dict})
2042 Returns the number of variables in @var{dict}.
2045 Another pair of functions allows retrieving a number of variables at
2046 once. These functions are more rarely useful.
2048 @deftypefun void dict_get_vars (const struct dictionary *@var{dict}, const struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2049 @deftypefunx void dict_get_vars_mutable (const struct dictionary *@var{dict}, struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2050 Retrieves all of the variables in @var{dict}, in their original order,
2051 except that any variables in the dictionary classes specified
2052 by @var{exclude}, if any, are excluded (@pxref{Dictionary Class}).
2053 Pointers to the variables are stored in an array allocated with
2054 @code{malloc}, and a pointer to the first element of this array is
2055 stored in @code{*@var{vars}}. The caller is responsible for freeing
2056 this memory when it is no longer needed. The number of variables
2057 retrieved is stored in @code{*@var{cnt}}.
2059 The presence or absence of @code{DC_SYSTEM} in @var{exclude} has no
2060 effect, because dictionaries never include system variables.
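The calling convention, in which the function allocates the result array and the caller frees it, can be sketched without PSPP. In this hypothetical model a variable is represented by its name alone, the class constants and their values are assumptions, and @code{sketch_get_vars} is not a PSPP function.

```c
#include <stdlib.h>

/* Hypothetical class bits; the values are assumptions of this sketch. */
enum
  {
    SKETCH_ORDINARY = 0x1,
    SKETCH_SYSTEM = 0x2,
    SKETCH_SCRATCH = 0x4
  };

/* Classifies NAME by its first character. */
static unsigned int
sketch_name_class (const char *name)
{
  return (name[0] == '$' ? SKETCH_SYSTEM
          : name[0] == '#' ? SKETCH_SCRATCH
          : SKETCH_ORDINARY);
}

/* Stores in *VARS a malloc'd array of the names in DICT_NAMES, in their
   original order, whose class is not in EXCLUDE, and stores the number
   of names retrieved in *CNT.  The caller must free *VARS. */
void
sketch_get_vars (const char *const *dict_names, size_t n,
                 const char ***vars, size_t *cnt, unsigned int exclude)
{
  /* Allocate at least one element so that a null result always means
     allocation failure. */
  const char **result = malloc ((n > 0 ? n : 1) * sizeof *result);
  size_t count = 0;

  if (result == NULL)
    abort ();
  for (size_t i = 0; i < n; i++)
    if (!(sketch_name_class (dict_names[i]) & exclude))
      result[count++] = dict_names[i];

  *vars = result;
  *cnt = count;
}
```

Only the array belongs to the caller. The elements still point at data owned elsewhere, so the caller frees the array and nothing else.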
2063 One additional function is available. This function is most often
2064 used in assertions, but it is not restricted to such use.
2066 @deftypefun bool dict_contains_var (const struct dictionary *@var{dict}, const struct variable *@var{var})
2067 Tests whether @var{var} is one of the variables in @var{dict}.
2068 Returns true if so, false otherwise.
2071 @node Dictionary Creating Variables
2072 @subsection Creating Variables
2074 These functions create a new variable and insert it into a dictionary
2077 There is no provision for inserting an already created variable into a
2078 dictionary. There is no reason that such a function could not be
2079 written, but so far there has been no need for one.
2081 The names provided to one of these functions should be valid variable
2082 names and must be plausible variable names. @c (@pxref{Variable Names}).
2084 If a variable with the same name already exists in the dictionary, the
2085 non-@code{assert} variants of these functions return a null pointer,
2086 without modifying the dictionary. The @code{assert} variants, on the
2087 other hand, assert that no duplicate name exists.
2089 A variable may be in only one dictionary at any given time.
2091 @deftypefun {struct variable *} dict_create_var (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2092 @deftypefunx {struct variable *} dict_create_var_assert (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2093 Creates a new variable with the given @var{name} and @var{width}, as
2094 if through a call to @code{var_create} with those arguments
2095 (@pxref{var_create}), appends the new variable to @var{dict}'s array
2096 of variables, and returns the new variable.
2099 @deftypefun {struct variable *} dict_clone_var (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2100 @deftypefunx {struct variable *} dict_clone_var_assert (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2101 Creates a new variable as a clone of @var{old_var}, inserts the new
2102 variable into @var{dict}, and returns the new variable. The new
2103 variable is named @var{name}. Other properties of the new variable
2104 are copied from @var{old_var}, except for those not copied by
2105 @code{var_clone} (@pxref{var_clone}).
2107 @var{old_var} does not need to be a member of any dictionary.
2110 @node Dictionary Deleting Variables
2111 @subsection Deleting Variables
2113 These functions remove variables from a dictionary's array of
2114 variables. They also destroy the removed variables and free their
2117 Deleting a variable to which there might be external pointers is a bad
2118 idea. In particular, deleting variables from the active file
2119 dictionary is a risky proposition, because transformations can retain
2120 references to arbitrary variables. Therefore, no variable should be
2121 deleted from the active file dictionary when any transformations are
2122 active, because those transformations might reference the variable to
2123 be deleted. The safest time to delete a variable is just after a
2124 procedure has been executed, as done by @cmd{DELETE VARIABLES}.
2126 Deleting a variable automatically removes references to that variable
2127 from elsewhere in the dictionary as a weighting variable, filter
2128 variable, @cmd{SPLIT FILE} variable, or member of a vector.
2130 No functions are provided for removing a variable from a dictionary
2131 without destroying that variable. As with insertion of an existing
2132 variable, there is no reason that this could not be implemented, but
2133 so far there has been no need.
2135 @deftypefun void dict_delete_var (struct dictionary *@var{dict}, struct variable *@var{var})
2136 Deletes @var{var} from @var{dict}, of which it must be a member.
2139 @deftypefun void dict_delete_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{count})
2140 Deletes the @var{count} variables in array @var{vars} from @var{dict}.
2141 All of the variables in @var{vars} must be members of @var{dict}. No
2142 variable may be included in @var{vars} more than once.
2145 @deftypefun void dict_delete_consecutive_vars (struct dictionary *@var{dict}, size_t @var{idx}, size_t @var{count})
2146 Deletes the variables in sequential positions
2147 @var{idx}@dots{}@var{idx} + @var{count} (exclusive) from @var{dict},
2148 which must contain at least @var{idx} + @var{count} variables.
2151 @deftypefun void dict_delete_scratch_vars (struct dictionary *@var{dict})
2152 Deletes all scratch variables from @var{dict}.
2155 @node Dictionary Reordering Variables
2156 @subsection Changing Variable Order
2158 The variables in a dictionary are stored in an array. These functions
2159 change the order of a dictionary's array of variables without changing
2160 which variables are in the dictionary.
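Moving one element of an array while the others keep their relative order amounts to shifting the elements in between by one position. The standalone sketch below illustrates the idea on an array of names. @code{sketch_reorder_var} is a hypothetical function, not the PSPP implementation, and it takes the element's current index instead of a variable pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Moves the element at OLD_INDEX to NEW_INDEX in the N-element array
   VARS.  The elements between the two positions shift by one place, so
   all other elements keep their relative order. */
void
sketch_reorder_var (const char **vars, size_t n,
                    size_t old_index, size_t new_index)
{
  assert (old_index < n && new_index < n);

  const char *moved = vars[old_index];
  if (old_index < new_index)
    /* Close the gap on the left: shift the intervening elements down. */
    memmove (&vars[old_index], &vars[old_index + 1],
             (new_index - old_index) * sizeof *vars);
  else
    /* Open a gap on the left: shift the intervening elements up. */
    memmove (&vars[new_index + 1], &vars[new_index],
             (old_index - new_index) * sizeof *vars);
  vars[new_index] = moved;
}
```

The sketch uses @code{memmove} rather than @code{memcpy} because the source and destination ranges overlap.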
2162 @anchor{dict_reorder_var}
2163 @deftypefun void dict_reorder_var (struct dictionary *@var{dict}, struct variable *@var{var}, size_t @var{new_index})
2164 Moves @var{var}, which must be in @var{dict}, so that it is at
2165 position @var{new_index} in @var{dict}'s array of variables. Other
2166 variables in @var{dict}, if any, retain their relative positions.
2167 @var{new_index} must be less than the number of variables in
2171 @deftypefun void dict_reorder_vars (struct dictionary *@var{dict}, struct variable *const *@var{new_order}, size_t @var{count})
2172 Moves the @var{count} variables in @var{new_order} to the beginning of
2173 @var{dict}'s array of variables in the specified order. Other
2174 variables in @var{dict}, if any, retain their relative positions.
2176 All of the variables in @var{new_order} must be in @var{dict}. No
2177 duplicates are allowed within @var{new_order}, which means that
2178 @var{count} must be no greater than the number of variables in
2182 @node Dictionary Renaming Variables
2183 @subsection Renaming Variables
2185 These functions change the names of variables within a dictionary.
2186 The @func{var_set_name} function (@pxref{var_set_name}) cannot be
2187 applied directly to a variable that is in a dictionary, because
2188 @struct{dictionary} contains an index by name that @func{var_set_name}
2189 would not update. The following functions take care to update the
2190 index as well. They also ensure that variable renaming does not cause
2191 a dictionary to contain a duplicate variable name.
2193 @deftypefun void dict_rename_var (struct dictionary *@var{dict}, struct variable *@var{var}, const char *@var{new_name})
2194 Changes the name of @var{var}, which must be in @var{dict}, to
2195 @var{new_name}. A variable named @var{new_name} must not already be
2196 in @var{dict}, unless @var{new_name} is the same as @var{var}'s
2200 @deftypefun bool dict_rename_vars (struct dictionary *@var{dict}, struct variable **@var{vars}, char **@var{new_names}, size_t @var{count}, char **@var{err_name})
2201 Renames each of the @var{count} variables in @var{vars} to the name in
2202 the corresponding position of @var{new_names}. If the renaming would
2203 result in a duplicate variable name, returns false and stores one of
2204 the names that would be duplicated into @code{*@var{err_name}}, if
2205 @var{err_name} is non-null. Otherwise, the renaming is successful,
2206 and true is returned.
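A batch rename has to be checked as a whole, because a set of renames such as swapping two names is valid even though each step taken alone would create a duplicate. The standalone sketch below shows one way to do this. It is hypothetical and simplified: variables are identified by index, names are compared case-sensitively, and leaving the names untouched on failure is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Renames the COUNT entries of the N-element array NAMES selected by
   the indexes in WHICH to the corresponding entries of NEW_NAMES.  If
   the result would contain a duplicate, leaves NAMES unchanged, stores
   a duplicated name in *ERR_NAME if ERR_NAME is non-null, and returns
   false.  Otherwise applies the renaming and returns true. */
bool
sketch_rename_vars (const char **names, size_t n,
                    const size_t *which, const char *const *new_names,
                    size_t count, const char **err_name)
{
  assert (n > 0);

  /* Apply the renames to a scratch copy so that the check sees the
     final state rather than any intermediate one. */
  const char **trial = malloc (n * sizeof *trial);
  bool ok = true;
  if (trial == NULL)
    abort ();
  memcpy (trial, names, n * sizeof *trial);
  for (size_t i = 0; i < count; i++)
    trial[which[i]] = new_names[i];

  /* Quadratic duplicate check, adequate for a sketch. */
  for (size_t i = 0; i < n && ok; i++)
    for (size_t j = i + 1; j < n; j++)
      if (!strcmp (trial[i], trial[j]))
        {
          if (err_name != NULL)
            *err_name = trial[i];
          ok = false;
          break;
        }

  if (ok)
    memcpy (names, trial, n * sizeof *trial);
  free (trial);
  return ok;
}
```

Checking the scratch copy before committing is what lets the swap succeed while a rename that really collides is rejected in its entirety.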
2209 @node Dictionary Weight Variable
2210 @subsection Weight Variable
2212 A data set's cases may optionally be weighted by the value of a
2213 numeric variable. @xref{WEIGHT,,,pspp, PSPP Users Guide}, for a user
2214 view of weight variables.
2216 The weight variable is written to and read from system and portable
2219 The most commonly useful function related to weighting is a
2220 convenience function to retrieve a weighting value from a case.
2222 @deftypefun double dict_get_case_weight (const struct dictionary *@var{dict}, const struct ccase *@var{case}, bool *@var{warn_on_invalid})
2223 Retrieves and returns the value of the weighting variable specified by
2224 @var{dict} from @var{case}. Returns 1.0 if @var{dict} has no
2227 Returns 0.0 if @var{case}'s weight value is user- or system-missing,
2228 zero, or negative. In such a case, if @var{warn_on_invalid} is
2229 non-null and @code{*@var{warn_on_invalid}} is true,
2230 @func{dict_get_case_weight} also issues an error message and sets
2231 @code{*@var{warn_on_invalid}} to false. To disable error reporting,
2232 pass a null pointer or a pointer to false as @var{warn_on_invalid} or
2233 use a @func{msg_disable}/@func{msg_enable} pair.
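The return-value and warn-once rules can be condensed into a standalone sketch. It is a hypothetical model rather than PSPP code: the weight is passed as a pointer that is null when there is no weighting variable, @code{SKETCH_SYSMIS} stands in for the system-missing value, user-missing values are not modeled, and the message text is invented.

```c
#include <float.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the system-missing value (an assumption of this sketch). */
#define SKETCH_SYSMIS (-DBL_MAX)

/* Returns the weight to use for a case.  WEIGHT points to the case's
   weight value, or is null if there is no weighting variable.  An
   invalid weight yields 0.0 and, if *WARN_ON_INVALID is true, one
   message, after which *WARN_ON_INVALID is set to false. */
double
sketch_case_weight (const double *weight, bool *warn_on_invalid)
{
  if (weight == NULL)
    return 1.0;                 /* Unweighted data: every case counts once. */

  if (*weight == SKETCH_SYSMIS || *weight <= 0.0)
    {
      if (warn_on_invalid != NULL && *warn_on_invalid)
        {
          fprintf (stderr, "invalid case weight treated as 0\n");
          *warn_on_invalid = false;     /* Report only the first one. */
        }
      return 0.0;
    }
  return *weight;
}
```

A procedure would typically keep one @code{bool} for the whole pass over the data and hand its address to every call, so that at most one message appears however many invalid weights the data contains.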
2236 The dictionary also has a pair of functions for getting and setting
2237 the weight variable.
2239 @deftypefun {struct variable *} dict_get_weight (const struct dictionary *@var{dict})
2240 Returns @var{dict}'s current weighting variable, or a null pointer if
2241 the dictionary does not have a weighting variable.
2244 @deftypefun void dict_set_weight (struct dictionary *@var{dict}, struct variable *@var{var})
2245 Sets @var{dict}'s weighting variable to @var{var}. If @var{var} is
2246 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2247 is null, then @var{dict}'s weighting variable, if any, is cleared.
2250 @node Dictionary Filter Variable
2251 @subsection Filter Variable
2253 When the active file is read by a procedure, cases can be excluded
2254 from analysis based on the values of a @dfn{filter variable}.
2255 @xref{FILTER,,,pspp, PSPP Users Guide}, for a user view of filtering.
2257 These functions store and retrieve the filter variable. They are
2258 rarely useful, because the data analysis framework automatically
2259 excludes from analysis the cases that should be filtered.
2261 @deftypefun {struct variable *} dict_get_filter (const struct dictionary *@var{dict})
2262 Returns @var{dict}'s current filter variable, or a null pointer if the
2263 dictionary does not have a filter variable.
2266 @deftypefun void dict_set_filter (struct dictionary *@var{dict}, struct variable *@var{var})
2267 Sets @var{dict}'s filter variable to @var{var}. If @var{var} is
2268 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2269 is null, then @var{dict}'s filter variable, if any, is cleared.
2272 @node Dictionary Case Limit
2273 @subsection Case Limit
2275 The limit on cases analyzed by a procedure, set by the @cmd{N OF
2276 CASES} command (@pxref{N OF CASES,,,pspp, PSPP Users Guide}), is
2277 stored as part of the dictionary. The dictionary does not, on the
2278 other hand, play any role in enforcing the case limit (a job done by
2279 data analysis framework code).
2281 A case limit of 0 means that the number of cases is not limited.
2283 These functions are rarely useful, because the data analysis framework
2284 automatically excludes from analysis any cases beyond the limit.
2286 @deftypefun casenumber dict_get_case_limit (const struct dictionary *@var{dict})
2287 Returns the current case limit for @var{dict}.
2290 @deftypefun void dict_set_case_limit (struct dictionary *@var{dict}, casenumber @var{limit})
2291 Sets @var{dict}'s case limit to @var{limit}.
2294 @node Dictionary Split Variables
2295 @subsection Split Variables
2297 The user may use the @cmd{SPLIT FILE} command (@pxref{SPLIT
2298 FILE,,,pspp, PSPP Users Guide}) to select a set of variables on which
2299 to split the active file into groups of cases to be analyzed
2300 independently in each statistical procedure. The set of split
2301 variables is stored as part of the dictionary, although the effect on
2302 data analysis is implemented by each individual statistical procedure.
2304 Split variables may be numeric, short string, or long string variables.
2306 The most useful functions for split variables are those to retrieve
2307 them. Even these functions are rarely useful directly: for the
2308 purpose of breaking cases into groups based on the values of the split
2309 variables, it is usually easier to use
2310 @func{casegrouper_create_splits}.
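To make the idea of breaking cases into groups concrete, here is a self-contained sketch that counts groups in an in-memory table of numeric values. It is not how @func{casegrouper_create_splits} is implemented. The function names, the flat array layout, and the assumption that a new group begins whenever a case differs from its predecessor on a split variable are all illustrative choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if cases A and B agree on every split variable.
   SPLIT_IDX holds N_SPLITS indexes into each case's value array. */
static bool
same_split_group (const double *a, const double *b,
                  const size_t *split_idx, size_t n_splits)
{
  for (size_t i = 0; i < n_splits; i++)
    if (a[split_idx[i]] != b[split_idx[i]])
      return false;
  return true;
}

/* Counts the groups among N_CASES cases stored consecutively in
   CASES, each N_VALUES doubles wide.  A new group starts whenever a
   case differs from the previous case on some split variable.  With
   no split variables, all of the cases form a single group. */
static size_t
count_split_groups (const double *cases, size_t n_cases, size_t n_values,
                    const size_t *split_idx, size_t n_splits)
{
  assert (n_values > 0);
  if (n_cases == 0)
    return 0;

  size_t n_groups = 1;
  for (size_t i = 1; i < n_cases; i++)
    if (!same_split_group (&cases[(i - 1) * n_values],
                           &cases[i * n_values], split_idx, n_splits))
      n_groups++;
  return n_groups;
}
```

Under this sketch's assumption, only adjacent cases are compared, so cases with equal split values that are separated by other cases fall into distinct groups.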
2312 @deftypefun {const struct variable *const *} dict_get_split_vars (const struct dictionary *@var{dict})
2313 Returns a pointer to an array of pointers to split variables. If and
2314 only if there are no split variables, returns a null pointer. The
2315 caller must not modify or free the returned array.
2318 @deftypefun size_t dict_get_split_cnt (const struct dictionary *@var{dict})
2319 Returns the number of split variables.
2322 The following functions are also available for working with split variables.
2325 @deftypefun void dict_set_split_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{cnt})
2326 Sets @var{dict}'s split variables to the @var{cnt} variables in
2327 @var{vars}. If @var{cnt} is 0, then @var{dict} will not have any
2328 split variables. The caller retains ownership of @var{vars}.
2331 @deftypefun void dict_unset_split_var (struct dictionary *@var{dict}, struct variable *@var{var})
2332 Removes @var{var}, which must be a variable in @var{dict}, from
2333 @var{dict}'s set of split variables.
2336 @node Dictionary File Label
2337 @subsection File Label
2339 A dictionary may optionally have an associated string that describes
2340 its contents, called its file label. The user may set the file label
2341 with the @cmd{FILE LABEL} command (@pxref{FILE LABEL,,,pspp, PSPP Users Guide}).
2344 These functions set and retrieve the file label.
2346 @deftypefun {const char *} dict_get_label (const struct dictionary *@var{dict})
2347 Returns @var{dict}'s file label. If @var{dict} does not have a label,
2348 returns a null pointer.
2351 @deftypefun void dict_set_label (struct dictionary *@var{dict}, const char *@var{label})
2352 Sets @var{dict}'s label to @var{label}. If @var{label} is non-null,
2353 then its content, truncated to at most 60 bytes, becomes the new file
2354 label. If @var{label} is null, then @var{dict}'s label is removed.
2356 The caller retains ownership of @var{label}.
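The truncate-and-copy behavior described for @func{dict_set_label} can be sketched without PSPP as follows. The helper @code{copy_label_truncated} and the macro @code{LABEL_MAX_SKETCH} are hypothetical names for this example only. The sketch truncates at a byte boundary, which is an assumption, and it does not try to avoid splitting a multibyte character.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Maximum label length in bytes, as stated in the text above.
   The macro name is hypothetical. */
#define LABEL_MAX_SKETCH 60

/* Returns a newly allocated copy of LABEL, truncated to at most
   LABEL_MAX_SKETCH bytes, or a null pointer if LABEL is null.
   The caller keeps ownership of LABEL and must free the copy. */
static char *
copy_label_truncated (const char *label)
{
  if (label == NULL)
    return NULL;

  size_t len = strlen (label);
  if (len > LABEL_MAX_SKETCH)
    len = LABEL_MAX_SKETCH;

  char *copy = malloc (len + 1);
  assert (copy != NULL);
  memcpy (copy, label, len);
  copy[len] = '\0';
  return copy;
}
```

Because the helper stores its own copy, the caller is free to modify or release the original string afterward, which matches the ownership rule stated above.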
2359 @node Dictionary Documents
2360 @subsection Documents
2362 A dictionary may include an arbitrary number of lines of explanatory
2363 text, called the dictionary's documents. For compatibility, document
2364 lines have a fixed width, and lines that are not exactly this width
2365 are truncated or padded with spaces as necessary to bring them to the correct width.
2368 PSPP users can use the @cmd{DOCUMENT} (@pxref{DOCUMENT,,,pspp, PSPP
2369 Users Guide}), @cmd{ADD DOCUMENT} (@pxref{ADD DOCUMENT,,,pspp, PSPP
2370 Users Guide}), and @cmd{DROP DOCUMENTS} (@pxref{DROP DOCUMENTS,,,pspp,
2371 PSPP Users Guide}) commands to manipulate documents.
2373 @deftypefn Macro int DOC_LINE_LENGTH
2374 The fixed length of a document line, in bytes, defined as 80.
2377 The following functions work with whole sets of documents. They
2378 accept or return sets of documents formatted as null-terminated
2379 strings that are an exact multiple of @code{DOC_LINE_LENGTH} bytes in length.
2382 @deftypefun {const char *} dict_get_documents (const struct dictionary *@var{dict})
2383 Returns the documents in @var{dict}, or a null pointer if @var{dict} has no documents.
2387 @deftypefun void dict_set_documents (struct dictionary *@var{dict}, const char *@var{new_documents})
2388 Sets @var{dict}'s documents to @var{new_documents}. If
2389 @var{new_documents} is a null pointer or an empty string, then
2390 @var{dict}'s documents are cleared. The caller retains ownership of
2391 @var{new_documents}.
2394 @deftypefun void dict_clear_documents (struct dictionary *@var{dict})
2395 Clears the documents from @var{dict}.
2398 The following functions work with individual lines in a dictionary's documents.
2401 @deftypefun void dict_add_document_line (struct dictionary *@var{dict}, const char *@var{content})
2402 Appends @var{content} to the documents in @var{dict}. The text in
2403 @var{content} will be truncated or padded with spaces as necessary to
2404 make it exactly @code{DOC_LINE_LENGTH} bytes long. The caller retains
2405 ownership of @var{content}.
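The truncation and padding rule can be illustrated with a small standalone function. This is only a sketch of the rule as described here, not PSPP's implementation: @code{format_document_line} and @code{DOC_LINE_LENGTH_SKETCH} are hypothetical names, and returning a flag in place of issuing a message is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors the documented value of DOC_LINE_LENGTH; the name is
   hypothetical. */
#define DOC_LINE_LENGTH_SKETCH 80

/* Stores in DST exactly DOC_LINE_LENGTH_SKETCH bytes: CONTENT,
   truncated if it is too long or padded on the right with spaces if
   it is too short.  DST is not null-terminated.  Returns true if
   CONTENT had to be truncated. */
static bool
format_document_line (const char *content,
                      char dst[DOC_LINE_LENGTH_SKETCH])
{
  assert (content != NULL && dst != NULL);

  size_t len = strlen (content);
  bool truncated = len > DOC_LINE_LENGTH_SKETCH;
  if (truncated)
    len = DOC_LINE_LENGTH_SKETCH;

  memcpy (dst, content, len);
  memset (dst + len, ' ', DOC_LINE_LENGTH_SKETCH - len);
  return truncated;
}
```

Fixed-width lines of this kind can be concatenated directly, which is why a whole set of documents has a length that is a multiple of the line length.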
2407 If @var{content} is longer than @code{DOC_LINE_LENGTH} bytes, this function also
2408 issues a warning using @func{msg}. To suppress the warning, enclose the
2409 call to this function in a @func{msg_disable}/@func{msg_enable} pair.
2413 @deftypefun size_t dict_get_document_line_cnt (const struct dictionary *@var{dict})
2414 Returns the number of lines of documents in @var{dict}. If the
2415 dictionary contains no documents, returns 0.
2418 @deftypefun void dict_get_document_line (const struct dictionary *@var{dict}, size_t @var{idx}, struct string *@var{content})
2419 Replaces the text in @var{content} (which must already have been
2420 initialized by the caller) by the document line in @var{dict} numbered
2421 @var{idx}, which must be less than the number of lines of documents in
2422 @var{dict}. Any trailing white space in the document line is trimmed,
2423 so that @var{content} will have a length between 0 and
2424 @code{DOC_LINE_LENGTH}.
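The relationship between a whole set of documents and its individual lines can be sketched as follows, assuming the set is a null-terminated string whose length is a multiple of the line length. The names @code{count_document_lines}, @code{get_document_line}, and @code{DOC_LINE_LENGTH_SKETCH} are hypothetical, and the sketch writes into a plain @code{char} buffer where the real function takes a @code{struct string}.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the documented value of DOC_LINE_LENGTH; the name is
   hypothetical. */
#define DOC_LINE_LENGTH_SKETCH 80

/* Returns the number of fixed-width lines in DOCUMENTS, or 0 if
   DOCUMENTS is a null pointer. */
static size_t
count_document_lines (const char *documents)
{
  size_t len = documents != NULL ? strlen (documents) : 0;
  assert (len % DOC_LINE_LENGTH_SKETCH == 0);
  return len / DOC_LINE_LENGTH_SKETCH;
}

/* Copies line IDX of DOCUMENTS into DST with trailing white space
   trimmed, adds a null terminator, and returns the resulting length,
   which is between 0 and DOC_LINE_LENGTH_SKETCH. */
static size_t
get_document_line (const char *documents, size_t idx,
                   char dst[DOC_LINE_LENGTH_SKETCH + 1])
{
  assert (idx < count_document_lines (documents));

  const char *line = documents + idx * DOC_LINE_LENGTH_SKETCH;
  size_t len = DOC_LINE_LENGTH_SKETCH;
  while (len > 0 && isspace ((unsigned char) line[len - 1]))
    len--;

  memcpy (dst, line, len);
  dst[len] = '\0';
  return len;
}
```

A caller would typically loop from 0 up to, but not including, the line count, fetching each line in turn.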
2427 @node Coding Conventions
2428 @section Coding Conventions
2430 Every @file{.c} file should have @samp{#include <config.h>} as its
2431 first non-comment line. No @file{.h} file should include @file{config.h}.
2434 This section needs to be finished.
2439 This section needs to be written.
2444 This section needs to be written.
2449 This section needs to be written.