2 @chapter Basic Concepts
4 This chapter introduces basic data structures and other concepts
5 needed for developing in PSPP.
9 * Input and Output Formats::
10 * User-Missing Values::
14 * Coding Conventions::
24 The unit of data in PSPP is a @dfn{value}.
30 Values are classified by @dfn{type} and @dfn{width}. The
31 type of a value is either @dfn{numeric} or @dfn{string} (sometimes
32 called alphanumeric). The width of a string value ranges from 1 to
33 @code{MAX_STRING} bytes. The width of a numeric value is artificially
34 defined to be 0; thus, the type of a value can be inferred from its width.
37 Some support is provided for working with value types and widths, in
38 @file{data/val-type.h}:
40 @deftypefn Macro int MAX_STRING
41 Maximum width of a string value, in bytes, currently 32,767.
44 @deftypefun bool val_type_is_valid (enum val_type @var{val_type})
45 Returns true if @var{val_type} is a valid value type, that is,
46 either @code{VAL_NUMERIC} or @code{VAL_STRING}. Useful for
50 @deftypefun {enum val_type} val_type_from_width (int @var{width})
51 Returns @code{VAL_NUMERIC} if @var{width} is 0 and thus represents the
52 width of a numeric value, otherwise @code{VAL_STRING} to indicate that
53 @var{width} is the width of a string value.
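The width-to-type rule can be modeled in a few lines of C. This is only a sketch. @code{demo_val_type}, @code{DEMO_MAX_STRING}, and the @code{demo_} functions are hypothetical stand-ins, not the declarations in @file{data/val-type.h}.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum val_type and MAX_STRING. */
enum demo_val_type { DEMO_VAL_NUMERIC, DEMO_VAL_STRING };
#define DEMO_MAX_STRING 32767

/* Width 0 denotes a numeric value; any positive width denotes a string. */
enum demo_val_type
demo_val_type_from_width (int width)
{
  return width == 0 ? DEMO_VAL_NUMERIC : DEMO_VAL_STRING;
}

/* A width is meaningful if it is 0 (numeric) or 1...MAX_STRING (string). */
bool
demo_width_is_valid (int width)
{
  return width >= 0 && width <= DEMO_MAX_STRING;
}
```

Because the width alone determines the type, code that already carries a width rarely needs to carry a separate type.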
56 The following subsections describe how values of each type are
62 * Runtime Typed Values::
66 @subsection Numeric Values
68 A value known to be numeric at compile time is represented as a
69 @code{double}. PSPP provides three values of @code{double} for
70 special purposes, defined in @file{data/val-type.h}:
72 @deftypefn Macro double SYSMIS
73 The @dfn{system-missing value}, used to represent a datum whose true
74 value is unknown, such as a survey question that was not answered by
75 the respondent, or undefined, such as the result of division by zero.
76 PSPP propagates the system-missing value through calculations and
77 compensates for missing values in statistical analyses. @xref{Missing
78 Observations,,,pspp, PSPP Users Guide}, for a PSPP user's view of
81 PSPP currently defines @code{SYSMIS} as @code{-DBL_MAX}, that is, the
82 most negative finite value of @code{double}. It is best not to
83 depend on this definition, because PSPP may transition to using an
84 IEEE NaN (not a number) instead at some point in the future.
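The propagation rule can be illustrated with a hypothetical helper, @code{demo_divide}. It is not a PSPP function, because PSPP's expression evaluator has its own code for this. Comparing with @code{==} works only because @code{SYSMIS} is currently an ordinary finite @code{double}. A NaN compares unequal even to itself, so such tests belong behind the @code{SYSMIS} macro and should not be spread through the code.

```c
#include <assert.h>
#include <float.h>

/* Stand-in mirroring the current definition of SYSMIS described above.
   Real code should use PSPP's SYSMIS macro, never -DBL_MAX directly. */
#define DEMO_SYSMIS (-DBL_MAX)

/* Hypothetical division following the propagation rule: a missing
   operand or a zero divisor yields the system-missing value. */
double
demo_divide (double a, double b)
{
  if (a == DEMO_SYSMIS || b == DEMO_SYSMIS || b == 0.0)
    return DEMO_SYSMIS;
  return a / b;
}
```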
87 @deftypefn Macro double LOWEST
88 @deftypefnx Macro double HIGHEST
89 The most negative finite value (other than @code{SYSMIS}) and the greatest
90 finite positive value of @code{double}, respectively. These values do not ordinarily
91 appear in user data files. Instead, they are used to implement
92 endpoints of open-ended ranges that are occasionally permitted in PSPP
93 syntax, e.g.@: @code{5 THRU HI} as a range of missing values
94 (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}).
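The following sketch shows one way such an endpoint might be used. The hypothetical @code{demo_range} represents @code{5 THRU HI} by storing @code{HIGHEST} as its upper bound, so a single pair of comparisons handles both open-ended and closed ranges. A @code{LO THRU @var{x}} range would use @code{LOWEST} as its lower bound in the same way. All names here are illustrative, not PSPP's.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Stand-ins for SYSMIS and HIGHEST; see data/val-type.h for the real ones. */
#define DEMO_SYSMIS  (-DBL_MAX)
#define DEMO_HIGHEST DBL_MAX

/* Hypothetical closed range; "5 THRU HI" is { 5.0, DEMO_HIGHEST }. */
struct demo_range
  {
    double low, high;
  };

/* True if X lies in R, endpoints included.  The system-missing value is
   excluded explicitly rather than relying on its numeric position. */
bool
demo_range_contains (const struct demo_range *r, double x)
{
  return x != DEMO_SYSMIS && x >= r->low && x <= r->high;
}
```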
98 @subsection String Values
100 A value known at compile time to have string type is represented as an
101 array of @code{char}. String values do not necessarily represent
102 readable text strings and may contain arbitrary 8-bit data, including
103 null bytes, control codes, and bytes with the high bit set. Thus,
104 string values are not null-terminated strings, but rather opaque
107 @code{SYSMIS}, @code{LOWEST}, and @code{HIGHEST} have no equivalents
108 as string values. Usually, PSPP fills unknown or undefined string
109 values with spaces, but it does not treat such a string as a special
110 case when it processes it later.
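String values are fixed-width byte arrays, so code must always keep the data together with its width. It must also use functions such as @code{memcpy}, @code{memcmp}, and @code{fwrite} in place of @code{strcpy}, @code{strcmp}, or @code{printf}'s @code{%s}. The helpers below are a hypothetical sketch of that approach, not PSPP functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores C string SRC into the WIDTH-byte string
   value DST, truncating or padding on the right with spaces.  DST is
   not null-terminated afterward. */
void
demo_str_set (char *dst, int width, const char *src)
{
  size_t n = strlen (src);
  if (n > (size_t) width)
    n = width;
  memcpy (dst, src, n);
  memset (dst + n, ' ', width - n);
}

/* Writes exactly WIDTH bytes of S to STREAM.  printf's "%.*s" would stop
   at an embedded null byte, so fwrite is used instead. */
void
demo_str_write (const char *s, int width, FILE *stream)
{
  fwrite (s, 1, width, stream);
}
```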
113 @code{MAX_STRING}, the maximum length of a string value, is defined in
114 @file{data/val-type.h}.
116 @node Runtime Typed Values
117 @subsection Runtime Typed Values
119 When a value's type is only known at runtime, it is often represented
120 as a @union{value}, defined in @file{data/value.h}. A @union{value}
121 does not identify the type or width of the data it contains. Code
122 that works with @union{value}s must therefore have external knowledge
123 of their content, often through the type and width of a
124 @struct{variable} (@pxref{Variables}).
126 @union{value} has one member that clients are permitted to access
127 directly, a @code{double} named @samp{f} that stores the content of a
128 numeric @union{value}. It has other members that store the content of a
129 string @union{value}, but client code should use accessor functions
130 instead of referring to these directly.
132 PSPP provides some functions for working with @union{value}s. The
133 most useful are described below. To use these functions, recall that
134 a numeric value has a width of 0.
136 @deftypefun void value_init (union value *@var{value}, int @var{width})
137 Initializes @var{value} as a value of the given @var{width}. After
138 initialization, the data in @var{value} are indeterminate; the caller
139 is responsible for storing initial data in it.
142 @deftypefun void value_destroy (union value *@var{value}, int @var{width})
143 Frees auxiliary storage associated with @var{value}, which must have
144 the given @var{width}.
147 @deftypefun bool value_needs_init (int @var{width})
148 For some widths, @func{value_init} and @func{value_destroy} do not
149 actually do anything, because no additional storage is needed beyond
150 the size of @union{value}. This function returns false if @var{width}
151 is such a width, in which case there is no actual need to call those
152 functions. This can be a useful optimization if a large number of
153 @union{value}s of such a width are to be initialized or destroyed.
155 This function returns true if @func{value_init} and
156 @func{value_destroy} are actually required for the given @var{width}.
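The following simplified model shows why @var{width} must be passed to every call. In it, short strings live inside the union and longer strings are allocated separately. The 8-byte inline limit and all @code{demo_} names are assumptions made for illustration. PSPP's actual layout of @union{value} may differ, which is why clients should use the accessor functions rather than touching string members directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a runtime-typed value.  Like union value, it does
   not record its own width; every function takes the width explicitly. */
#define DEMO_INLINE_BYTES 8

union demo_value
  {
    double f;                            /* Numeric content, width 0. */
    char short_str[DEMO_INLINE_BYTES];   /* Short strings stored inline. */
    char *long_str;                      /* Longer strings on the heap. */
  };

/* In this model, only strings too long to fit inline need extra storage. */
bool
demo_uses_heap (int width)
{
  return width > DEMO_INLINE_BYTES;
}

/* Contents are left indeterminate; allocation failure is not handled. */
void
demo_value_init (union demo_value *v, int width)
{
  if (demo_uses_heap (width))
    v->long_str = malloc (width);
}

void
demo_value_destroy (union demo_value *v, int width)
{
  if (demo_uses_heap (width))
    free (v->long_str);
}

/* Returns the WIDTH bytes of string data (not null-terminated). */
char *
demo_value_str_rw (union demo_value *v, int width)
{
  return demo_uses_heap (width) ? v->long_str : v->short_str;
}
```

In this model, passing the wrong width to @code{demo_value_str_rw} would treat a heap pointer as inline bytes, or the reverse. This mirrors the requirement that @func{value_str} receive the same width that was passed to @func{value_init}.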
159 @deftypefun double value_num (const union value *@var{value})
160 Returns the numeric value in @var{value}, which must have been
161 initialized as a numeric value. Equivalent to @code{@var{value}->f}.
164 @deftypefun {const char *} value_str (const union value *@var{value}, int @var{width})
165 @deftypefunx {char *} value_str_rw (union value *@var{value}, int @var{width})
166 Returns the string value in @var{value}, which must have been
167 initialized with positive width @var{width}. The string returned is
168 not null-terminated. Only @var{width} bytes of returned data may be accessed.
171 The two different functions exist only for @code{const}-correctness.
172 Otherwise they are identical.
174 It is important that @var{width} be the correct value that was passed
175 to @func{value_init}. Passing a smaller or larger value (e.g.@:
176 because that number of bytes will be accessed) will not always work
177 and should be avoided.
180 @deftypefun void value_copy (union value *@var{dst}, @
181 const union value *@var{src}, int @var{width})
183 Copies the contents of @union{value} @var{src} to @var{dst}. Both
184 @var{dst} and @var{src} must have been initialized with the specified @var{width}.
188 @deftypefun void value_set_missing (union value *@var{value}, int @var{width})
189 Sets @var{value} to @code{SYSMIS} if it is numeric or to all spaces if
190 it is alphanumeric, according to @var{width}. @var{value} must have
191 been initialized with the specified @var{width}.
194 @anchor{value_is_resizable}
195 @deftypefun bool value_is_resizable (const union value *@var{value}, int @var{old_width}, int @var{new_width})
196 Determines whether @var{value}, which must have been initialized with
197 the specified @var{old_width}, may be resized to @var{new_width}.
198 Resizing is possible if the following criteria are met. First,
199 @var{old_width} and @var{new_width} must be both numeric or both
200 string widths. Second, if @var{new_width} is a short string width and
201 less than @var{old_width}, resizing is allowed only if bytes
202 @var{new_width} through @var{old_width} in @var{value} contain only spaces.
205 These rules are part of those used by @func{mv_is_resizable} and
206 @func{val_labs_can_set_width}.
209 @deftypefun void value_resize (union value *@var{value}, int @var{old_width}, int @var{new_width})
210 Resizes @var{value} from @var{old_width} to @var{new_width}, which
211 must be allowed by the rules stated above. @var{value} must have been
212 initialized with the specified @var{old_width} before calling this
213 function. After resizing, @var{value} has width @var{new_width}.
215 If @var{new_width} is greater than @var{old_width}, @var{value} will
216 be padded on the right with spaces to the new width. If
217 @var{new_width} is less than @var{old_width}, the rightmost bytes of
218 @var{value} are truncated.
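Here is a rough sketch of these rules for string data, using plain @code{char} arrays in place of @union{value}. The function names are hypothetical. The model also simplifies the short-string condition, as noted in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified version of the resizing rules above.  For
   brevity it applies the trailing-space rule to every shrinking string,
   not only to short string widths. */
bool
demo_is_resizable (const char *s, int old_width, int new_width)
{
  if ((old_width == 0) != (new_width == 0))
    return false;              /* Numeric <-> string is never allowed. */
  for (int i = new_width; i < old_width; i++)
    if (s[i] != ' ')
      return false;            /* Truncation would discard non-spaces. */
  return true;
}

/* Resizes string S in place.  S must have room for the larger of the two
   widths.  Growing pads with spaces; shrinking simply drops the rightmost
   bytes, so nothing needs to be written. */
void
demo_resize (char *s, int old_width, int new_width)
{
  if (new_width > old_width)
    memset (s + old_width, ' ', new_width - old_width);
}
```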
221 @deftypefun bool value_equal (const union value *@var{a}, const union value *@var{b}, int @var{width})
222 Compares @var{a} and @var{b}, which must both have width
223 @var{width}. Returns true if their contents are the same, false if they differ.
227 @deftypefun int value_compare_3way (const union value *@var{a}, const union value *@var{b}, int @var{width})
228 Compares @var{a} and @var{b}, which must both have width
229 @var{width}. Returns -1 if @var{a} is less than @var{b}, 0 if they
230 are equal, or 1 if @var{a} is greater than @var{b}.
232 Numeric values are compared numerically, with @code{SYSMIS} comparing
233 less than any real number. String values are compared
234 lexicographically byte-by-byte.
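The comparison rules can be sketched as follows. The @code{demo_} names and the pointer-based string member are hypothetical simplifications. The numeric branch relies on the current definition of @code{SYSMIS}, which production code should not depend on.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

#define DEMO_SYSMIS (-DBL_MAX)   /* Stand-in for SYSMIS. */

/* Hypothetical minimal value: a number, or a pointer to WIDTH bytes. */
union demo_value
  {
    double f;
    const unsigned char *s;
  };

/* Three-way comparison following the rules above. */
int
demo_value_compare_3way (const union demo_value *a,
                         const union demo_value *b, int width)
{
  if (width == 0)
    {
      /* With SYSMIS defined as -DBL_MAX, it sorts before every other
         finite number without any special case. */
      return a->f < b->f ? -1 : a->f > b->f;
    }

  /* memcmp compares bytes as unsigned char, one at a time, so bytes with
     the high bit set sort after plain ASCII. */
  int cmp = memcmp (a->s, b->s, width);
  return cmp < 0 ? -1 : cmp > 0;
}
```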
237 @deftypefun size_t value_hash (const union value *@var{value}, int @var{width}, unsigned int @var{basis})
238 Computes and returns a hash of @var{value}, which must have the
239 specified @var{width}. The value in @var{basis} is folded into the
243 @node Input and Output Formats
244 @section Input and Output Formats
246 Input and output formats specify how to convert data fields to and
247 from data values (@pxref{Input and Output Formats,,,pspp, PSPP Users
248 Guide}). PSPP uses @struct{fmt_spec} to represent input and output
251 Function prototypes and other declarations related to formats are in
252 the @file{<data/format.h>} header.
254 @deftp {Structure} {struct fmt_spec}
255 An input or output format, with the following members:
258 @item enum fmt_type type
259 The format type (see below).
262 Field width, in bytes. The width of numeric fields is always between
263 1 and 40 bytes, and the width of string fields is always between 1 and
264 65534 bytes. However, many individual types of formats place stricter
265 limits on field width (see @ref{fmt_max_input_width},
266 @ref{fmt_max_output_width}).
269 Number of decimal places, in character positions. For format types
270 that do not allow decimal places to be specified, this value must be
271 0. Format types that do allow decimal places have type-specific and
272 often width-specific restrictions on @code{d} (see
273 @ref{fmt_max_input_decimals}, @ref{fmt_max_output_decimals}).
277 @deftp {Enumeration} {enum fmt_type}
278 An enumerated type representing an input or output format type. Each
279 PSPP input and output format has a corresponding enumeration constant
280 prefixed by @samp{FMT}: @code{FMT_F}, @code{FMT_COMMA},
281 @code{FMT_DOT}, and so on.
284 The following sections describe functions for manipulating formats and
285 the data in fields represented by formats.
288 * Constructing and Verifying Formats::
289 * Format Utility Functions::
290 * Obtaining Properties of Format Types::
291 * Numeric Formatting Styles::
292 * Formatted Data Input and Output::
295 @node Constructing and Verifying Formats
296 @subsection Constructing and Verifying Formats
298 These functions construct @struct{fmt_spec}s and verify that they are
303 @deftypefun {struct fmt_spec} fmt_for_input (enum fmt_type @var{type}, int @var{w}, int @var{d})
304 @deftypefunx {struct fmt_spec} fmt_for_output (enum fmt_type @var{type}, int @var{w}, int @var{d})
305 Constructs a @struct{fmt_spec} with the given @var{type}, @var{w}, and
306 @var{d}, asserts that the result is a valid input (or output) format,
310 @anchor{fmt_for_output_from_input}
311 @deftypefun {struct fmt_spec} fmt_for_output_from_input (const struct fmt_spec *@var{input})
312 Given @var{input}, which must be a valid input format, returns the
313 equivalent output format. @xref{Input and Output Formats,,,pspp, PSPP
314 Users Guide}, for the rules for converting input formats into output
318 @deftypefun {struct fmt_spec} fmt_default_for_width (int @var{width})
319 Returns the default output format for a variable of the given
320 @var{width}. For a numeric variable, this is F8.2 format; for a
321 string variable, it is the A format of the given @var{width}.
324 The following functions check whether a @struct{fmt_spec} is valid for
325 various uses and return true if so, false otherwise. When any of them
326 returns false, it also outputs an explanatory error message using
327 @func{msg}. To suppress error output, bracket a call to one of these
328 functions with a @func{msg_disable}/@func{msg_enable} pair.
330 @deftypefun bool fmt_check (const struct fmt_spec *@var{format}, bool @var{for_input})
331 @deftypefunx bool fmt_check_input (const struct fmt_spec *@var{format})
332 @deftypefunx bool fmt_check_output (const struct fmt_spec *@var{format})
333 Checks whether @var{format} is a valid input format (for
334 @func{fmt_check_input}, or @func{fmt_check} if @var{for_input}) or
335 output format (for @func{fmt_check_output}, or @func{fmt_check} if not
339 @deftypefun bool fmt_check_type_compat (const struct fmt_spec *@var{format}, enum val_type @var{type})
340 Checks whether @var{format} matches the value type @var{type}, that
341 is, if @var{type} is @code{VAL_NUMERIC} and @var{format} is a numeric
342 format or @var{type} is @code{VAL_STRING} and @var{format} is a string
346 @deftypefun bool fmt_check_width_compat (const struct fmt_spec *@var{format}, int @var{width})
347 Checks whether @var{format} may be used as an output format for a
348 value of the given @var{width}.
350 @func{fmt_var_width}, described in
351 the following section, can also be used to determine the value
352 width needed by a format.
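The kind of checking these functions perform can be sketched using per-type limits like those reported by the format type property functions (@pxref{Obtaining Properties of Format Types}). Everything below is hypothetical: the structure, the function, and the rule that @code{d} must be less than @code{w} are assumptions made for the sketch. The limits used in the example are illustrative, not PSPP's tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-type limits.  In PSPP these come from functions such
   as fmt_min_output_width, fmt_max_output_width, and fmt_step_width. */
struct demo_fmt_limits
  {
    const char *name;
    int min_w, max_w;      /* Allowed range for the width w. */
    int step;              /* w must be a multiple of this. */
    bool takes_decimals;   /* May d be nonzero? */
  };

/* Returns true if width W and decimals D suit LIM.  Otherwise reports the
   problem on stderr (standing in for msg) and returns false. */
bool
demo_fmt_check (const struct demo_fmt_limits *lim, int w, int d)
{
  if (w < lim->min_w || w > lim->max_w)
    fprintf (stderr, "%s: width %d not in %d..%d\n",
             lim->name, w, lim->min_w, lim->max_w);
  else if (w % lim->step != 0)
    fprintf (stderr, "%s: width %d not a multiple of %d\n",
             lim->name, w, lim->step);
  else if (d < 0 || (d > 0 && (!lim->takes_decimals || d >= w)))
    fprintf (stderr, "%s: %d decimal places not allowed\n", lim->name, d);
  else
    return true;
  return false;
}
```

For example, with limits @code{@{ "AHEX", 2, 65534, 2, false @}}, a width of 7 is rejected because it is not a multiple of the width step.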
355 @node Format Utility Functions
356 @subsection Format Utility Functions
358 These functions work with @struct{fmt_spec}s.
360 @deftypefun int fmt_var_width (const struct fmt_spec *@var{format})
361 Returns the width for values associated with @var{format}. If
362 @var{format} is a numeric format, the width is 0; if @var{format} is
363 an A format, then the width is @code{@var{format}->w}; otherwise,
364 @var{format} is an AHEX format and its width is @code{@var{format}->w / 2}.
368 @deftypefun char *fmt_to_string (const struct fmt_spec *@var{format}, char @var{s}[FMT_STRING_LEN_MAX + 1])
369 Converts @var{format} to a human-readable format specifier in @var{s}
370 and returns @var{s}. @var{format} need not be a valid input or output
371 format specifier, e.g.@: it is allowed to have an excess width or
372 decimal places. In particular, if @var{format} has decimals, they are
373 included in the output string, even if @var{format}'s type does not
374 allow decimals, to allow accurately presenting incorrect formats to
378 @deftypefun bool fmt_equal (const struct fmt_spec *@var{a}, const struct fmt_spec *@var{b})
379 Compares @var{a} and @var{b} memberwise and returns true if they are
380 identical, false otherwise. Neither @var{a} nor @var{b} need be a valid input or
381 output format specifier.
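The behavior of @func{fmt_var_width}, @func{fmt_to_string}, and @func{fmt_equal} can be sketched using a hypothetical, cut-down format structure with only three types. The @code{demo_} types and functions are illustrations, not PSPP's definitions, and the real @func{fmt_to_string} writes into a buffer of @code{FMT_STRING_LEN_MAX + 1} bytes rather than taking a size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down stand-ins for enum fmt_type and struct fmt_spec. */
enum demo_fmt_type { DEMO_FMT_F, DEMO_FMT_A, DEMO_FMT_AHEX };
struct demo_fmt_spec
  {
    enum demo_fmt_type type;
    int w, d;
  };

/* Value width implied by a format: 0 for numeric formats, w for A, and
   w / 2 for AHEX, which spends two hex digits on each byte. */
int
demo_fmt_var_width (const struct demo_fmt_spec *f)
{
  switch (f->type)
    {
    case DEMO_FMT_A:
      return f->w;
    case DEMO_FMT_AHEX:
      return f->w / 2;
    default:
      return 0;
    }
}

/* Writes a specifier such as "F8.2" or "A10" into S (SIZE bytes) and
   returns S.  Nonzero decimals are always shown, even when the type
   does not allow them. */
char *
demo_fmt_to_string (const struct demo_fmt_spec *f, char *s, size_t size)
{
  static const char *const names[] = { "F", "A", "AHEX" };
  if (f->d != 0)
    snprintf (s, size, "%s%d.%d", names[f->type], f->w, f->d);
  else
    snprintf (s, size, "%s%d", names[f->type], f->w);
  return s;
}

/* Memberwise comparison; validity is not checked. */
bool
demo_fmt_equal (const struct demo_fmt_spec *a, const struct demo_fmt_spec *b)
{
  return a->type == b->type && a->w == b->w && a->d == b->d;
}
```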
384 @deftypefun void fmt_resize (struct fmt_spec *@var{fmt}, int @var{width})
385 Adjusts @var{fmt} to be a valid format for a @union{value} of width @var{width}.
388 @node Obtaining Properties of Format Types
389 @subsection Obtaining Properties of Format Types
391 These functions work with @enum{fmt_type}s instead of the higher-level
392 @struct{fmt_spec}s. Their primary purpose is to report properties of
393 each possible format type, which in turn allows clients to abstract
394 away many of the details of the very heterogeneous requirements of
397 The first group of functions works with format type names.
399 @deftypefun const char *fmt_name (enum fmt_type @var{type})
400 Returns the name for the given @var{type}, e.g.@: @code{"COMMA"} for
404 @deftypefun bool fmt_from_name (const char *@var{name}, enum fmt_type *@var{type})
405 Tries to find the @enum{fmt_type} associated with @var{name}. If
406 successful, sets @code{*@var{type}} to the type and returns true;
407 otherwise, returns false without modifying @code{*@var{type}}.
410 The functions below query basic limits on width and decimal places for
413 @deftypefun bool fmt_takes_decimals (enum fmt_type @var{type})
414 Returns true if a format of the given @var{type} is allowed to have a
415 nonzero number of decimal places (the @code{d} member of
416 @struct{fmt_spec}), false if not.
419 @anchor{fmt_min_input_width}
420 @anchor{fmt_max_input_width}
421 @anchor{fmt_min_output_width}
422 @anchor{fmt_max_output_width}
423 @deftypefun int fmt_min_input_width (enum fmt_type @var{type})
424 @deftypefunx int fmt_max_input_width (enum fmt_type @var{type})
425 @deftypefunx int fmt_min_output_width (enum fmt_type @var{type})
426 @deftypefunx int fmt_max_output_width (enum fmt_type @var{type})
427 Returns the minimum or maximum width (the @code{w} member of
428 @struct{fmt_spec}) allowed for an input or output format of the
429 specified @var{type}.
432 @anchor{fmt_max_input_decimals}
433 @anchor{fmt_max_output_decimals}
434 @deftypefun int fmt_max_input_decimals (enum fmt_type @var{type}, int @var{width})
435 @deftypefunx int fmt_max_output_decimals (enum fmt_type @var{type}, int @var{width})
436 Returns the maximum number of decimal places allowed for an input or
437 output format, respectively, of the given @var{type} and @var{width}.
438 Returns 0 if the specified @var{type} does not allow any decimal
439 places or if @var{width} is too narrow to allow decimal places.
442 @deftypefun int fmt_step_width (enum fmt_type @var{type})
443 Returns the ``width step'' for a @struct{fmt_spec} of the given
444 @var{type}. A @struct{fmt_spec}'s width must be a multiple of its
445 type's width step. Most format types have a width step of 1, so that
446 their formats' widths may be any integer within the valid range, but
447 hexadecimal numeric formats and AHEX string formats have a width step of 2.
451 These functions allow clients to broadly determine how each kind of
452 input or output format behaves.
454 @deftypefun bool fmt_is_string (enum fmt_type @var{type})
455 @deftypefunx bool fmt_is_numeric (enum fmt_type @var{type})
456 Returns true if @var{type} is a format for string or numeric values,
457 respectively, false otherwise.
460 @deftypefun enum fmt_category fmt_get_category (enum fmt_type @var{type})
461 Returns the category within which @var{type} falls.
463 @deftp {Enumeration} {enum fmt_category}
464 A group of format types. Format type categories correspond to the
465 input and output categories described in the PSPP user documentation
466 (@pxref{Input and Output Formats,,,pspp, PSPP Users Guide}).
468 Each format is in exactly one category. The categories have bitwise
469 disjoint values to make it easy to test whether a format type is in
470 one of multiple categories, e.g.@:
473 if (fmt_get_category (type) & (FMT_CAT_DATE | FMT_CAT_TIME))
475 /* @dots{}@r{@code{type} is a date or time format}@dots{} */
479 The format categories are:
482 Basic numeric formats.
485 Custom currency formats.
488 Legacy numeric formats.
493 @item FMT_CAT_HEXADECIMAL
502 @item FMT_CAT_DATE_COMPONENT
503 Date component formats.
511 The PSPP input and output routines use the following pair of functions
512 to convert @enum{fmt_type}s to and from the separate set of codes used
513 in system and portable files:
515 @deftypefun int fmt_to_io (enum fmt_type @var{type})
516 Returns the format code used in system and portable files that
517 corresponds to @var{type}.
520 @deftypefun bool fmt_from_io (int @var{io}, enum fmt_type *@var{type})
521 Converts @var{io}, a format code used in system and portable files,
522 into a @enum{fmt_type} in @code{*@var{type}}. Returns true if
523 successful, false if @var{io} is not valid.
526 These functions reflect the relationship between input and output
529 @deftypefun enum fmt_type fmt_input_to_output (enum fmt_type @var{type})
530 Returns the output format type that is used by default by DATA LIST
531 and other input procedures when @var{type} is specified as an input
532 format. The conversion from input format to output format is more
533 complicated than simply changing the format type, because the width and decimal places may also change.
534 @xref{fmt_for_output_from_input}, for a function that performs the
538 @deftypefun bool fmt_usable_for_input (enum fmt_type @var{type})
539 Returns true if @var{type} may be used as an input format type, false
540 otherwise. The custom currency formats, in particular, may be used
541 for output but not for input.
543 All format types are valid for output.
546 The final group of format type property functions obtain
547 human-readable templates that illustrate the formats graphically.
549 @deftypefun const char *fmt_date_template (enum fmt_type @var{type})
550 Returns a formatting template for @var{type}, which must be a date or
551 time format type. These templates are used by @func{data_in} and
552 @func{data_out} to guide parsing and formatting date and time data.
555 @deftypefun char *fmt_dollar_template (const struct fmt_spec *@var{format})
556 Returns a string of the form @code{$#,###.##} according to
557 @var{format}, which must be of type @code{FMT_DOLLAR}. The caller
558 must free the string with @code{free}.
561 @node Numeric Formatting Styles
562 @subsection Numeric Formatting Styles
564 Each of the basic numeric formats (F, E, COMMA, DOT, DOLLAR, PCT) and
565 custom currency formats (CCA, CCB, CCC, CCD, CCE) has an associated
566 numeric formatting style, represented by @struct{fmt_number_style}.
567 Input and output conversion of formats that have numeric styles is
568 determined mainly by the style, although the formatting rules have
569 special cases that are not represented within the style.
571 @deftp {Structure} {struct fmt_number_style}
572 A structure type with the following members:
575 @item struct substring neg_prefix
576 @itemx struct substring prefix
577 @itemx struct substring suffix
578 @itemx struct substring neg_suffix
579 A set of strings used as a prefix to negative numbers, a prefix to every
580 number, a suffix to every number, and a suffix to negative numbers,
581 respectively. Each of these strings is no more than
582 @code{FMT_STYLE_AFFIX_MAX} bytes (currently 16) in length.
583 These strings must be freed with @func{ss_dealloc} when no longer
587 The character used as a decimal point. It must be either @samp{.} or @samp{,}.
591 The character used for grouping digits to the left of the decimal
592 point. It may be @samp{.} or @samp{,}, in which case it must not be
593 equal to @code{decimal}, or it may be set to 0 to disable grouping.
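The following simplified formatter shows the roles of these members. It applies a prefix, a suffix, a decimal point, and a grouping character to a nonnegative number. It is a hypothetical sketch. It uses plain C strings instead of @code{struct substring}, ignores @code{neg_prefix} and @code{neg_suffix}, and omits the special cases that PSPP's output routines handle outside the style. It also assumes the C locale, so that @code{snprintf} produces @samp{.} as its decimal point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced counterpart of struct fmt_number_style. */
struct demo_style
  {
    const char *prefix, *suffix;
    char decimal;      /* '.' or ','. */
    char grouping;     /* The other of '.' and ',', or 0 for no grouping. */
  };

/* Formats nonnegative X with D decimal places into BUF (SIZE bytes). */
void
demo_format (double x, int d, const struct demo_style *style,
             char *buf, size_t size)
{
  char digits[64];
  snprintf (digits, sizeof digits, "%.*f", d, x);   /* e.g. "1234567.50" */
  const char *point = strchr (digits, '.');
  size_t int_len = point ? (size_t) (point - digits) : strlen (digits);

  char body[128];
  size_t n = 0;
  for (size_t i = 0; i < int_len; i++)
    {
      /* Insert a grouping character before each group of three digits. */
      if (style->grouping && i > 0 && (int_len - i) % 3 == 0)
        body[n++] = style->grouping;
      body[n++] = digits[i];
    }
  if (point)
    {
      body[n++] = style->decimal;
      strcpy (body + n, point + 1);
    }
  else
    body[n] = '\0';

  snprintf (buf, size, "%s%s%s", style->prefix, body, style->suffix);
}
```

For example, a style with prefix @samp{$}, decimal @samp{.}, and grouping @samp{,} formats 1234567.5 with two decimals as @samp{$1,234,567.50}.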
597 The following functions are provided for working with numeric
600 @deftypefun void fmt_number_style_init (struct fmt_number_style *@var{style})
601 Initializes @var{style} with all of the
602 prefixes and suffixes set to the empty string, @samp{.} as the decimal
603 point character, and grouping disabled.
607 @deftypefun void fmt_number_style_destroy (struct fmt_number_style *@var{style})
608 Destroys @var{style}, freeing its storage.
611 @deftypefun {struct fmt_number_style} *fmt_create (void)
612 Creates an array of all the styles used by PSPP and
613 calls @func{fmt_number_style_init} on each of them.
616 @deftypefun void fmt_done (struct fmt_number_style *@var{styles})
617 Takes an array of @struct{fmt_number_style}, calls
618 @func{fmt_number_style_destroy} on each of them, and then frees the array.
623 @deftypefun int fmt_affix_width (const struct fmt_number_style *@var{style})
624 Returns the total length of @var{style}'s @code{prefix} and @code{suffix}.
627 @deftypefun int fmt_neg_affix_width (const struct fmt_number_style *@var{style})
628 Returns the total length of @var{style}'s @code{neg_prefix} and @code{neg_suffix}.
632 PSPP maintains a global set of number styles for each of the basic
633 numeric formats and custom currency formats. The following functions
634 work with these global styles:
636 @deftypefun {const struct fmt_number_style *} fmt_get_style (enum fmt_type @var{type})
637 Returns the numeric style for the given format @var{type}.
640 @deftypefun void fmt_check_style (const struct fmt_number_style *@var{style})
641 Asserts that @var{style} is self-consistent.
645 @deftypefun {const char *} fmt_name (enum fmt_type @var{type})
646 Returns the name of the given format @var{type}.
651 @node Formatted Data Input and Output
652 @subsection Formatted Data Input and Output
654 These functions provide the ability to convert data fields into
655 @union{value}s and vice versa.
657 @deftypefun bool data_in (struct substring @var{input}, const char *@var{encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, const struct dictionary *@var{dict}, union value *@var{output}, int @var{width})
658 Parses @var{input} as a field containing data in the given format
659 @var{type}. The resulting value is stored in @var{output}, which the
660 caller must have initialized with the given @var{width}. For
661 consistency, @var{width} must be 0 if
662 @var{type} is a numeric format type and greater than 0 if @var{type}
663 is a string format type.
664 @var{encoding} should be set to indicate the character
665 encoding of @var{input}.
666 @var{dict} must be a pointer to the dictionary with which @var{output}
669 If @var{input} is the empty string (with length 0), @var{output} is
670 set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP
671 Users Guide}) for a numeric value, or to all spaces for a string
672 value. This applies regardless of the usual parsing requirements for
675 If @var{implied_decimals} is greater than zero, then the numeric
676 result is shifted right by @var{implied_decimals} decimal places if
677 @var{input} does not contain a decimal point character or an exponent.
678 Only certain numeric format types support implied decimal places; for
679 string formats and other numeric formats, @var{implied_decimals} has
680 no effect. DATA LIST FIXED is the primary user of this feature
681 (@pxref{DATA LIST FIXED,,,pspp, PSPP Users Guide}). Other callers
682 should generally specify 0 for @var{implied_decimals}, to disable this
685 When @var{input} contains invalid input data, @func{data_in} outputs a
686 message using @func{msg}.
688 If @var{first_column} is
689 nonzero, it is included in any such error message as the 1-based
690 column number of the start of the field. The last column in the field
691 is calculated as @math{@var{first_column} + @var{n} - 1}, where @var{n} is the length of @var{input}. To
692 suppress error output, bracket the call to @func{data_in} with calls to
693 @func{msg_disable} and @func{msg_enable}.
695 This function returns true on success, false if a message was output
696 (even if suppressed). Overflow and underflow provoke warnings but are
697 not propagated to the caller as errors.
699 This function is declared in @file{data/data-in.h}.
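The two special rules above can be illustrated with a deliberately tiny parser for plain decimal numbers. The first rule covers blank input. The second covers implied decimal places. Everything here is hypothetical. @code{demo_parse_f} is not @func{data_in}, and it accepts whatever @code{strtod} accepts rather than PSPP's F-format syntax. The @var{blanks} parameter stands in for the SET BLANKS setting.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SYSMIS (-DBL_MAX)   /* Stand-in for SYSMIS. */

/* Hypothetical, heavily simplified numeric parser.  Empty input yields
   BLANKS.  Implied decimals apply only when the field has no decimal
   point and no exponent.  Returns false on invalid input. */
bool
demo_parse_f (const char *input, size_t len, int implied_decimals,
              double blanks, double *out)
{
  char buf[64];
  if (len == 0)
    {
      *out = blanks;
      return true;
    }
  if (len >= sizeof buf)
    return false;
  memcpy (buf, input, len);      /* INPUT need not be null-terminated. */
  buf[len] = '\0';

  char *end;
  double x = strtod (buf, &end);
  if (end == buf || *end != '\0')
    return false;                /* Not a valid number. */

  if (implied_decimals > 0 && !strpbrk (buf, ".eE"))
    for (int i = 0; i < implied_decimals; i++)
      x /= 10.0;
  *out = x;
  return true;
}
```

For example, with two implied decimal places the field @samp{12345} yields 123.45, while @samp{1.5} is taken as written.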
702 @deftypefun char * data_out (const union value *@var{input}, const struct fmt_spec *@var{format})
703 @deftypefunx char * data_out_legacy (const union value *@var{input}, const char *@var{encoding}, const struct fmt_spec *@var{format})
704 Converts the data pointed to by @var{input} into a string value, which
705 will be encoded in UTF-8, according to output format specifier @var{format}.
707 @var{format} must be a valid output format. The width of @var{input} is
708 inferred from @var{format} using an algorithm equivalent to
709 @func{fmt_var_width}.
711 When @var{input} contains data that cannot be represented in the given
712 @var{format}, @func{data_out} may output a message using @func{msg},
714 although the current implementation does not
715 consistently do so. To suppress error output, bracket the call to
716 @func{data_out} with calls to @func{msg_disable} and @func{msg_enable}.
718 This function is declared in @file{data/data-out.h}.
721 @node User-Missing Values
722 @section User-Missing Values
724 In addition to the system-missing value for numeric values, each
725 variable has a set of user-missing values (@pxref{MISSING
726 VALUES,,,pspp, PSPP Users Guide}). A set of user-missing values is
727 represented by @struct{missing_values}.
729 It is rarely necessary to interact directly with a
730 @struct{missing_values} object. Instead, the most common operation,
731 querying whether a particular value is a missing value for a given
732 variable, is most conveniently executed through functions on
733 @struct{variable}. @xref{Variable Missing Values}, for details.
735 A @struct{missing_values} is essentially a set of @union{value}s that
736 have a common value width (@pxref{Values}). For a set of
737 missing values associated with a variable (the common case), the set's
738 width is the same as the variable's width.
740 Function prototypes and other declarations related to missing values
741 are declared in @file{data/missing-values.h}.
743 @deftp {Structure} {struct missing_values}
744 Opaque type that represents a set of missing values.
747 The contents of a set of missing values are subject to some
748 restrictions. Regardless of width, a set of missing values is allowed
749 to be empty. A set of numeric missing values may contain up to three
750 discrete numeric values, or a range of numeric values (which includes
751 both ends of the range), or a range plus one discrete numeric value.
752 A set of string missing values may contain up to three discrete string
753 values (with the same width as the set), but ranges are not supported.
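The numeric restrictions and the idea of testing by class can be modeled with a small hypothetical structure. This is not PSPP's opaque @struct{missing_values}. The @code{DEMO_MV_USER} and @code{DEMO_MV_SYSTEM} bits only imitate the @code{MV_USER | MV_SYSTEM} combination described under testing for missing values.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

#define DEMO_SYSMIS (-DBL_MAX)                   /* Stand-in for SYSMIS. */
enum { DEMO_MV_USER = 1, DEMO_MV_SYSTEM = 2 };   /* Hypothetical class bits. */

/* Hypothetical numeric missing-value set obeying the limits above:
   up to three discrete values, a range, or a range plus one value. */
struct demo_num_mv
  {
    bool has_range;
    double low, high;        /* Meaningful only if has_range. */
    int n_values;
    double values[3];
  };

bool
demo_mv_add_value (struct demo_num_mv *mv, double v)
{
  if (mv->n_values >= (mv->has_range ? 1 : 3))
    return false;
  mv->values[mv->n_values++] = v;
  return true;
}

bool
demo_mv_add_range (struct demo_num_mv *mv, double low, double high)
{
  if (mv->has_range || mv->n_values > 1 || low > high)
    return false;
  mv->has_range = true;
  mv->low = low;
  mv->high = high;
  return true;
}

/* Tests V against the kinds of missing values selected by CLASS. */
bool
demo_mv_is_num_missing (const struct demo_num_mv *mv, double v, int class)
{
  if (v == DEMO_SYSMIS)
    return (class & DEMO_MV_SYSTEM) != 0;
  if (!(class & DEMO_MV_USER))
    return false;
  if (mv->has_range && v >= mv->low && v <= mv->high)
    return true;
  for (int i = 0; i < mv->n_values; i++)
    if (mv->values[i] == v)
      return true;
  return false;
}
```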
755 In addition, string missing values wider than
756 @code{MV_MAX_STRING} bytes may contain non-space characters only in
757 their first @code{MV_MAX_STRING} bytes; all the bytes after the first
758 @code{MV_MAX_STRING} must be spaces. @xref{mv_is_acceptable}, for a
759 function that tests a value against these constraints.
761 @deftypefn Macro int MV_MAX_STRING
762 Number of bytes in a string missing value that are not required to be
763 spaces. The current value is 8, a value which is fixed by the system
764 file format. In PSPP we could easily eliminate this restriction, but
765 doing so would also require us to extend the system file format in an
766 incompatible way, which we consider a bad tradeoff.
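The width constraint amounts to a simple scan, sketched below with a hypothetical function. @func{mv_is_acceptable} is the real entry point and may check more than this.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MV_MAX_STRING 8    /* Mirrors the current MV_MAX_STRING. */

/* Hypothetical check: a WIDTH-byte string missing value is acceptable
   only if every byte past the first MV_MAX_STRING bytes is a space. */
bool
demo_mv_str_acceptable (const char *s, int width)
{
  for (int i = DEMO_MV_MAX_STRING; i < width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}
```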
769 The most often useful functions for missing values are those for
770 testing whether a given value is missing, described in the following
771 section. Several other functions for creating, inspecting, and
772 modifying @struct{missing_values} objects are described afterward, but
773 these functions are much more rarely useful.
776 * Testing for Missing Values::
777 * Creating and Destroying User-Missing Values::
778 * Changing User-Missing Value Set Width::
779 * Inspecting User-Missing Value Sets::
780 * Modifying User-Missing Value Sets::
783 @node Testing for Missing Values
784 @subsection Testing for Missing Values
786 The most often useful functions for missing values are those for
787 testing whether a given value is missing, described here. However,
788 using one of the corresponding missing value testing functions for
789 variables can be even easier (@pxref{Variable Missing Values}).
791 @deftypefun bool mv_is_value_missing (const struct missing_values *@var{mv}, const union value *@var{value}, enum mv_class @var{class})
792 @deftypefunx bool mv_is_num_missing (const struct missing_values *@var{mv}, double @var{value}, enum mv_class @var{class})
793 @deftypefunx bool mv_is_str_missing (const struct missing_values *@var{mv}, const char @var{value}[], enum mv_class @var{class})
794 Tests whether @var{value} is in one of the categories of missing
795 values given by @var{class}. Returns true if so, false otherwise.
797 @var{mv} determines the width of @var{value} and provides the set of
798 user-missing values to test.
800 The only difference among these functions is the form in which
801 @var{value} is provided, so you may use whichever function is most
804 The @var{class} argument determines the exact kinds of missing values
805 that the functions test for:
807 @deftp Enumeration {enum mv_class}
810 Returns true if @var{value} is in the set of user-missing values given
814 Returns true if @var{value} is system-missing. (If @var{mv}
815 represents a set of string values, then @var{value} is never
819 @itemx MV_USER | MV_SYSTEM
820 Returns true if @var{value} is user-missing or system-missing.
823 Always returns false, that is, @var{value} is never considered
829 @node Creating and Destroying User-Missing Values
830 @subsection Creation and Destruction
832 These functions create and destroy @struct{missing_values} objects.
834 @deftypefun void mv_init (struct missing_values *@var{mv}, int @var{width})
835 Initializes @var{mv} as a set of user-missing values. The set is
836 initially empty. Any values added to it must have the specified
840 @deftypefun void mv_destroy (struct missing_values *@var{mv})
841 Destroys @var{mv}, which must not be referred to again.
844 @deftypefun void mv_copy (struct missing_values *@var{mv}, const struct missing_values *@var{old})
845 Initializes @var{mv} as a copy of the existing set of user-missing
849 @deftypefun void mv_clear (struct missing_values *@var{mv})
850 Empties the user-missing value set @var{mv}, retaining its existing
854 @node Changing User-Missing Value Set Width
855 @subsection Changing User-Missing Value Set Width
857 A few PSPP language constructs copy sets of user-missing values from
858 one variable to another. When the source and target variables have
859 the same width, this is simple. But when the target variable's width
860 might be different from the source variable's, it takes a little more
861 work. The functions described here can help.
863 In fact, it is usually unnecessary to call these functions directly.
864 Most of the time @func{var_set_missing_values}, which uses
865 @func{mv_resize} internally to resize the new set of missing values to
866 the required width, may be used instead.
867 @xref{var_set_missing_values}, for more information.
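The width rules applied to string missing values can be sketched in plain C. This sketch covers only string-to-string resizing. Widening pads on the right with spaces, as described for @func{mv_resize}; the rule that narrowing is allowed only when every dropped byte is a space is an assumption based on our reading of @func{value_is_resizable}, not something stated here. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the OLD_WIDTH-byte string S (not null-terminated) can be
   resized to NEW_WIDTH bytes: always when widening, and when narrowing only
   if every byte that would be dropped is a space (an assumed rule). */
static bool
toy_str_is_resizable (const char *s, int old_width, int new_width)
{
  assert (old_width > 0 && new_width > 0);
  for (int i = new_width; i < old_width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}

/* Returns true if each of the N values in VALUES can be resized, mirroring
   the rule stated for mv_is_resizable. */
static bool
toy_set_is_resizable (const char *values[], int n,
                      int old_width, int new_width)
{
  for (int i = 0; i < n; i++)
    if (!toy_str_is_resizable (values[i], old_width, new_width))
      return false;
  return true;
}

/* Copies S into DST (room for NEW_WIDTH bytes), truncating when narrowing
   and padding on the right with spaces when widening. */
static void
toy_str_resize (char *dst, const char *s, int old_width, int new_width)
{
  assert (toy_str_is_resizable (s, old_width, new_width));
  int n = old_width < new_width ? old_width : new_width;
  memcpy (dst, s, n);
  memset (dst + n, ' ', new_width - n);
}
```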
869 @deftypefun bool mv_is_resizable (const struct missing_values *@var{mv}, int @var{new_width})
870 Tests whether @var{mv}'s width may be changed to @var{new_width} using
871 @func{mv_resize}. Returns true if it is allowed, false otherwise.
873 If @var{mv} contains any missing values, then it may be resized only
874 if each missing value may be resized, as determined by
875 @func{value_is_resizable} (@pxref{value_is_resizable}).
879 @deftypefun void mv_resize (struct missing_values *@var{mv}, int @var{width})
880 Changes @var{mv}'s width to @var{width}. @var{mv} and @var{width}
881 must satisfy the constraints explained above.
883 When a string missing value set's width is increased, each
884 user-missing value is padded on the right with spaces to the new
888 @node Inspecting User-Missing Value Sets
889 @subsection Inspecting User-Missing Value Sets
891 These functions inspect the properties and contents of
892 @struct{missing_values} objects.
894 The first set of functions inspects the discrete values that sets of
895 user-missing values may contain:
897 @deftypefun bool mv_is_empty (const struct missing_values *@var{mv})
898 Returns true if @var{mv} contains no user-missing values, false if it
899 contains at least one user-missing value (either a discrete value or a
903 @deftypefun int mv_get_width (const struct missing_values *@var{mv})
904 Returns the width of the user-missing values that @var{mv} represents.
907 @deftypefun int mv_n_values (const struct missing_values *@var{mv})
908 Returns the number of discrete user-missing values included in
909 @var{mv}. The return value will be between 0 and 3. For sets of
910 numeric user-missing values that include a range, the return value
914 @deftypefun bool mv_has_value (const struct missing_values *@var{mv})
Returns true if @var{mv} has at least one discrete user-missing
value, that is, if @func{mv_n_values} would return nonzero for
920 @deftypefun {const union value *} mv_get_value (const struct missing_values *@var{mv}, int @var{index})
921 Returns the discrete user-missing value in @var{mv} with the given
922 @var{index}. The caller must not modify or free the returned value or
923 refer to it after modifying or freeing @var{mv}. The index must be
924 less than the number of discrete user-missing values in @var{mv}, as
925 reported by @func{mv_n_values}.
928 The second set of functions inspects the single range of values that
929 numeric sets of user-missing values may contain:
931 @deftypefun bool mv_has_range (const struct missing_values *@var{mv})
932 Returns true if @var{mv} includes a range, false otherwise.
935 @deftypefun void mv_get_range (const struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
936 Stores the low endpoint of @var{mv}'s range in @code{*@var{low}} and
937 the high endpoint of the range in @code{*@var{high}}. @var{mv} must
941 @node Modifying User-Missing Value Sets
942 @subsection Modifying User-Missing Value Sets
944 These functions modify the contents of @struct{missing_values}
The first set of functions applies to all sets of user-missing values:
949 @deftypefun bool mv_add_value (struct missing_values *@var{mv}, const union value *@var{value})
950 @deftypefunx bool mv_add_str (struct missing_values *@var{mv}, const char @var{value}[])
951 @deftypefunx bool mv_add_num (struct missing_values *@var{mv}, double @var{value})
Attempts to add the given discrete @var{value} to the set of
user-missing values @var{mv}. @var{value} must have the same width as @var{mv}.
954 Returns true if @var{value} was successfully added, false if the set
955 could not accept any more discrete values or if @var{value} is not an
956 acceptable user-missing value (see @func{mv_is_acceptable} below).
958 These functions are equivalent, except for the form in which
959 @var{value} is provided, so you may use whichever function is most
963 @deftypefun void mv_pop_value (struct missing_values *@var{mv}, union value *@var{value})
964 Removes a discrete value from @var{mv} (which must contain at least
965 one discrete value) and stores it in @var{value}.
968 @deftypefun bool mv_replace_value (struct missing_values *@var{mv}, const union value *@var{value}, int @var{index})
969 Attempts to replace the discrete value with the given @var{index} in
970 @var{mv} (which must contain at least @var{index} + 1 discrete values)
971 by @var{value}. Returns true if successful, false if @var{value} is
972 not an acceptable user-missing value (see @func{mv_is_acceptable}
976 @deftypefun bool mv_is_acceptable (const union value *@var{value}, int @var{width})
977 @anchor{mv_is_acceptable}
978 Returns true if @var{value}, which must have the specified
979 @var{width}, may be added to a missing value set of the same
980 @var{width}, false if it cannot. As described above, all numeric
981 values and string values of width @code{MV_MAX_STRING} or less may be
added, but a string value of greater width may be added only if the
bytes beyond the first @code{MV_MAX_STRING} are all spaces.
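A minimal sketch of this acceptability rule follows. It assumes @code{MV_MAX_STRING} is 8, the value given earlier in this section, and uses width 0 to denote a numeric value, as PSPP does elsewhere. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed current value of MV_MAX_STRING. */
#define TOY_MV_MAX_STRING 8

/* Returns true if a value of WIDTH may be added to a missing-value set of
   the same width.  WIDTH 0 means numeric, and S is then ignored; otherwise
   S points to WIDTH bytes, not necessarily null-terminated. */
static bool
toy_mv_is_acceptable (const char *s, int width)
{
  assert (width >= 0);
  if (width == 0)
    return true;                   /* Every numeric value is acceptable. */
  for (int i = TOY_MV_MAX_STRING; i < width; i++)
    if (s[i] != ' ')
      return false;                /* Non-space byte past the limit. */
  return true;
}
```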
986 The second set of functions applies only to numeric sets of
989 @deftypefun bool mv_add_range (struct missing_values *@var{mv}, double @var{low}, double @var{high})
990 Attempts to add a numeric range covering @var{low}@dots{}@var{high}
991 (inclusive on both ends) to @var{mv}, which must be a numeric set of
user-missing values. Returns true if the range is successfully added,
993 false on failure. Fails if @var{mv} already contains a range, or if
994 @var{mv} contains more than one discrete value, or if @var{low} >
998 @deftypefun void mv_pop_range (struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
999 Given @var{mv}, which must be a numeric set of user-missing values
1000 that contains a range, removes that range from @var{mv} and stores its
1001 low endpoint in @code{*@var{low}} and its high endpoint in
1006 @section Value Labels
1008 Each variable has a set of value labels (@pxref{VALUE LABELS,,,pspp,
1009 PSPP Users Guide}), represented as @struct{val_labs}. A
1010 @struct{val_labs} is essentially a map from @union{value}s to strings.
1011 All of the values in a set of value labels have the same width, which
for a set of value labels owned by a variable (the common case) is the
same as the variable's width.
1015 Sets of value labels may contain any number of entries.
1017 It is rarely necessary to interact directly with a @struct{val_labs}
1018 object. Instead, the most common operation, looking up the label for
1019 a value of a given variable, can be conveniently executed through
1020 functions on @struct{variable}. @xref{Variable Value Labels}, for
Function prototypes and other declarations related to value labels
are declared in @file{data/value-labels.h}.
1026 @deftp {Structure} {struct val_labs}
1027 Opaque type that represents a set of value labels.
1030 The most often useful function for value labels is
1031 @func{val_labs_find}, for looking up the label associated with a
1034 @deftypefun {char *} val_labs_find (const struct val_labs *@var{val_labs}, union value @var{value})
1035 Looks in @var{val_labs} for a label for the given @var{value}.
1036 Returns the label, if one is found, or a null pointer otherwise.
1039 Several other functions for working with value labels are described in
1040 the following section, but these are more rarely useful.
1043 * Value Labels Creation and Destruction::
1044 * Value Labels Properties::
1045 * Value Labels Adding and Removing Labels::
1046 * Value Labels Iteration::
1049 @node Value Labels Creation and Destruction
1050 @subsection Creation and Destruction
1052 These functions create and destroy @struct{val_labs} objects.
1054 @deftypefun {struct val_labs *} val_labs_create (int @var{width})
1055 Creates and returns an initially empty set of value labels with the
1059 @deftypefun {struct val_labs *} val_labs_clone (const struct val_labs *@var{val_labs})
1060 Creates and returns a set of value labels whose width and contents are
the same as those of @var{val_labs}.
@deftypefun void val_labs_clear (struct val_labs *@var{val_labs})
Deletes all value labels from @var{val_labs}.
@deftypefun void val_labs_destroy (struct val_labs *@var{val_labs})
Destroys @var{val_labs}, which must not be referenced again.
1072 @node Value Labels Properties
1073 @subsection Value Labels Properties
1075 These functions inspect and manipulate basic properties of
1076 @struct{val_labs} objects.
1078 @deftypefun size_t val_labs_count (const struct val_labs *@var{val_labs})
1079 Returns the number of value labels in @var{val_labs}.
1082 @deftypefun bool val_labs_can_set_width (const struct val_labs *@var{val_labs}, int @var{new_width})
1083 Tests whether @var{val_labs}'s width may be changed to @var{new_width}
1084 using @func{val_labs_set_width}. Returns true if it is allowed, false
1087 A set of value labels may be resized to a given width only if each
1088 value in it may be resized to that width, as determined by
1089 @func{value_is_resizable} (@pxref{value_is_resizable}).
1092 @deftypefun void val_labs_set_width (struct val_labs *@var{val_labs}, int @var{new_width})
1093 Changes the width of @var{val_labs}'s values to @var{new_width}, which
1094 must be a valid new width as determined by
1095 @func{val_labs_can_set_width}.
1098 @node Value Labels Adding and Removing Labels
1099 @subsection Adding and Removing Labels
1101 These functions add and remove value labels from a @struct{val_labs}
1104 @deftypefun bool val_labs_add (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
Adds @var{label} to @var{val_labs} as a label for @var{value},
1106 which must have the same width as the set of value labels. Returns
1107 true if successful, false if @var{value} already has a label.
1110 @deftypefun void val_labs_replace (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
Adds @var{label} to @var{val_labs} as a label for @var{value},
which must have the same width as the set of value labels. If
@var{value} already has a label in @var{val_labs}, it is replaced.
1116 @deftypefun bool val_labs_remove (struct val_labs *@var{val_labs}, union value @var{value})
1117 Removes from @var{val_labs} any label for @var{value}, which must have
1118 the same width as the set of value labels. Returns true if a label
1119 was removed, false otherwise.
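The add, replace, and remove semantics above can be modeled with a toy fixed-capacity table for numeric values. This is a hypothetical sketch, not @struct{val_labs}: it stores label pointers instead of copies and has a capacity limit, whereas real sets of value labels may contain any number of entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LABELS 16    /* Toy capacity limit; real sets are unbounded. */

/* Hypothetical numeric value-label table. */
struct toy_val_labs
  {
    size_t n;                               /* Number of entries in use. */
    double values[TOY_MAX_LABELS];
    const char *labels[TOY_MAX_LABELS];     /* Borrowed, not copied. */
  };

/* Returns the index of VALUE's entry, or -1 if it has no label. */
static long
toy_find_index (const struct toy_val_labs *vls, double value)
{
  for (size_t i = 0; i < vls->n; i++)
    if (vls->values[i] == value)
      return (long) i;
  return -1;
}

/* Like val_labs_find: returns VALUE's label or a null pointer. */
static const char *
toy_find (const struct toy_val_labs *vls, double value)
{
  long i = toy_find_index (vls, value);
  return i >= 0 ? vls->labels[i] : NULL;
}

/* Like val_labs_add: fails, keeping the old label, if VALUE has one. */
static bool
toy_add (struct toy_val_labs *vls, double value, const char *label)
{
  if (toy_find_index (vls, value) >= 0)
    return false;
  assert (vls->n < TOY_MAX_LABELS);
  vls->values[vls->n] = value;
  vls->labels[vls->n] = label;
  vls->n++;
  return true;
}

/* Like val_labs_replace: adds a label or overwrites an existing one. */
static void
toy_replace (struct toy_val_labs *vls, double value, const char *label)
{
  long i = toy_find_index (vls, value);
  if (i >= 0)
    vls->labels[i] = label;
  else
    toy_add (vls, value, label);
}

/* Like val_labs_remove: returns true if a label was removed. */
static bool
toy_remove (struct toy_val_labs *vls, double value)
{
  long i = toy_find_index (vls, value);
  if (i < 0)
    return false;
  vls->n--;                                 /* Move the last entry into */
  vls->values[i] = vls->values[vls->n];     /* the vacated slot. */
  vls->labels[i] = vls->labels[vls->n];
  return true;
}
```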
1122 @node Value Labels Iteration
1123 @subsection Iterating through Value Labels
1125 These functions allow iteration through the set of value labels
1126 represented by a @struct{val_labs} object. They may be used in the
1127 context of a @code{for} loop:
struct val_labs *val_labs;
1131 const struct val_lab *vl;
1135 for (vl = val_labs_first (val_labs); vl != NULL;
1136 vl = val_labs_next (val_labs, vl))
1138 @dots{}@r{do something with @code{vl}}@dots{}
Value labels should not be added to or deleted from a @struct{val_labs}
while it is undergoing iteration.
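The ownership contract described below for @func{val_labs_sorted}, an array of pointers that the caller frees while leaving the pointed-to elements alone, can be illustrated with @code{qsort}. The @code{toy_val_lab} type is a hypothetical, non-opaque stand-in for @struct{val_lab} holding a numeric value, and @code{toy_sorted} is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, non-opaque stand-in for struct val_lab. */
struct toy_val_lab
  {
    double value;
    const char *label;
  };

/* qsort callback: orders pointers to labels by increasing value. */
static int
compare_by_value (const void *a_, const void *b_)
{
  const struct toy_val_lab *const *a = a_;
  const struct toy_val_lab *const *b = b_;
  return ((*a)->value > (*b)->value) - ((*a)->value < (*b)->value);
}

/* Returns a newly allocated array of N pointers into LABS, sorted by value,
   or NULL if allocation fails.  The caller must free the array, but not
   the elements it points to. */
static const struct toy_val_lab **
toy_sorted (const struct toy_val_lab labs[], size_t n)
{
  assert (labs != NULL || n == 0);
  const struct toy_val_lab **array = malloc ((n > 0 ? n : 1) * sizeof *array);
  if (array == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    array[i] = &labs[i];
  qsort (array, n, sizeof *array, compare_by_value);
  return array;
}
```

The caller releases only the returned array with @func{free}; the labels themselves remain owned by their container.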
1145 @deftypefun {const struct val_lab *} val_labs_first (const struct val_labs *@var{val_labs})
Returns the first value label in @var{val_labs}, if it contains at
1147 least one value label, or a null pointer if it does not contain any
@deftypefun {const struct val_lab *} val_labs_next (const struct val_labs *@var{val_labs}, const struct val_lab *@var{vl})
Returns the value label in @var{val_labs} following @var{vl}, if
@var{vl} is not the last value label in @var{val_labs}, or a null
pointer if there are no value labels following @var{vl}.
1157 @deftypefun {const struct val_lab **} val_labs_sorted (const struct val_labs *@var{val_labs})
1158 Allocates and returns an array of pointers to value labels, which are
1159 sorted in increasing order by value. The array has
1160 @code{val_labs_count (@var{val_labs})} elements. The caller is
1161 responsible for freeing the array with @func{free} (but must not free
1162 any of the @struct{val_lab} elements that the array points to).
The iteration functions above work with pointers to @struct{val_lab},
which is an opaque data structure that users of @struct{val_labs} must
1167 not modify or free directly. The following functions work with
1168 objects of this type:
1170 @deftypefun {const union value *} val_lab_get_value (const struct val_lab *@var{vl})
1171 Returns the value of value label @var{vl}. The caller must not modify
or free the returned value. (To change a label's value, remove the
value label with @func{val_labs_remove}, then add the label under the
new value with @func{val_labs_add}.)
1176 The width of the returned value cannot be determined directly from
1177 @var{vl}. It may be obtained by calling @func{val_labs_get_width} on
1178 the @struct{val_labs} that @var{vl} is in.
1181 @deftypefun {const char *} val_lab_get_label (const struct val_lab *@var{vl})
1182 Returns the label in @var{vl} as a null-terminated string. The caller
1183 must not modify or free the returned string. (Use
1184 @func{val_labs_replace} to change a value label.)
1190 A PSPP variable is represented by @struct{variable}, an opaque type
1191 declared in @file{data/variable.h} along with related declarations.
1192 @xref{Variables,,,pspp, PSPP Users Guide}, for a description of PSPP
1193 variables from a user perspective.
1195 PSPP is unusual among computer languages in that, by itself, a PSPP
1196 variable does not have a value. Instead, a variable in PSPP takes on
1197 a value only in the context of a case, which supplies one value for
each variable in a set of variables (@pxref{Cases}). The variables
in a case are, in turn, ordinarily part of a dictionary
(@pxref{Dictionaries}).
1202 Every variable has several attributes, most of which correspond
1203 directly to one of the variable attributes visible to PSPP users
1204 (@pxref{Attributes,,,pspp, PSPP Users Guide}).
1206 The following sections describe variable-related functions and macros.
1210 * Variable Type and Width::
1211 * Variable Missing Values::
1212 * Variable Value Labels::
1213 * Variable Print and Write Formats::
1215 * Variable GUI Attributes::
1216 * Variable Leave Status::
1217 * Dictionary Class::
1218 * Variable Creation and Destruction::
1219 * Variable Short Names::
1220 * Variable Relationships::
1221 * Variable Auxiliary Data::
1222 * Variable Categorical Values::
1226 @subsection Variable Name
1228 A variable name is a string between 1 and @code{VAR_NAME_LEN} bytes
1229 long that satisfies the rules for PSPP identifiers
1230 (@pxref{Tokens,,,pspp, PSPP Users Guide}). Variable names are
1231 mixed-case and treated case-insensitively.
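The length limit and case-insensitive treatment of names can be sketched as follows. The sketch hard-codes the current @code{VAR_NAME_LEN} of 64, checks length only (not which characters are allowed), and folds only ASCII letters. The helper names are hypothetical, and PSPP's own name handling may differ, for example for non-ASCII names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumed current value of VAR_NAME_LEN. */
#define TOY_VAR_NAME_LEN 64

/* Checks only the length rule (1 to VAR_NAME_LEN bytes); the full rules
   for PSPP identifiers also constrain which characters may appear. */
static bool
toy_name_length_ok (const char *name)
{
  size_t len = strlen (name);
  return len >= 1 && len <= TOY_VAR_NAME_LEN;
}

/* Compares two names case-insensitively.  In the C locale, tolower folds
   only ASCII letters. */
static bool
toy_names_equal (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return false;
  return *a == *b;               /* Equal only if both ended together. */
}
```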
1233 @deftypefn Macro int VAR_NAME_LEN
1234 Maximum length of a variable name, in bytes, currently 64.
1237 Only one commonly useful function relates to variable names:
1239 @deftypefun {const char *} var_get_name (const struct variable *@var{var})
1240 Returns @var{var}'s variable name as a C string.
1243 A few other functions are much more rarely used. Some of these
1244 functions are used internally by the dictionary implementation:
1246 @anchor{var_set_name}
1247 @deftypefun {void} var_set_name (struct variable *@var{var}, const char *@var{new_name})
1248 Changes the name of @var{var} to @var{new_name}, which must be a
1249 ``plausible'' name as defined below.
1251 This function cannot be applied to a variable that is part of a
1252 dictionary. Use @func{dict_rename_var} instead (@pxref{Dictionary
1253 Renaming Variables}).
1256 @anchor{var_is_plausible_name}
1257 @deftypefun {bool} var_is_valid_name (const char *@var{name}, bool @var{issue_error})
1258 @deftypefunx {bool} var_is_plausible_name (const char *@var{name}, bool @var{issue_error})
1259 Tests @var{name} for validity or ``plausibility.'' Returns true if
1260 the name is acceptable, false otherwise. If the name is not
1261 acceptable and @var{issue_error} is true, also issues an error message
1262 explaining the violation.
1264 A valid name is one that fully satisfies all of the requirements for
1265 variable names (@pxref{Tokens,,,pspp, PSPP Users Guide}). A
1266 ``plausible'' name is simply a string whose length is in the valid
1267 range and that is not a reserved word. PSPP accepts plausible but
1268 invalid names as variable names in some contexts where the character
1269 encoding scheme is ambiguous, as when reading variable names from
1273 @deftypefun {enum dict_class} var_get_dict_class (const struct variable *@var{var})
1274 Returns the dictionary class of @var{var}'s name (@pxref{Dictionary
1278 @node Variable Type and Width
1279 @subsection Variable Type and Width
1281 A variable's type and width are the type and width of its values
1284 @deftypefun {enum val_type} var_get_type (const struct variable *@var{var})
1285 Returns the type of variable @var{var}.
1288 @deftypefun int var_get_width (const struct variable *@var{var})
1289 Returns the width of variable @var{var}.
1292 @deftypefun void var_set_width (struct variable *@var{var}, int @var{width})
1293 Sets the width of variable @var{var} to @var{width}. The width of a
1294 variable should not normally be changed after the variable is created,
1295 so this function is rarely used. This function cannot be applied to a
1296 variable that is part of a dictionary.
1299 @deftypefun bool var_is_numeric (const struct variable *@var{var})
1300 Returns true if @var{var} is a numeric variable, false otherwise.
1303 @deftypefun bool var_is_alpha (const struct variable *@var{var})
1304 Returns true if @var{var} is an alphanumeric (string) variable, false
1308 @node Variable Missing Values
1309 @subsection Variable Missing Values
1311 A numeric or short string variable may have a set of user-missing
1312 values (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}), represented
1313 as a @struct{missing_values} (@pxref{User-Missing Values}).
1315 The most frequent operation on a variable's missing values is to query
1316 whether a value is user- or system-missing:
1318 @deftypefun bool var_is_value_missing (const struct variable *@var{var}, const union value *@var{value}, enum mv_class @var{class})
1319 @deftypefunx bool var_is_num_missing (const struct variable *@var{var}, double @var{value}, enum mv_class @var{class})
1320 @deftypefunx bool var_is_str_missing (const struct variable *@var{var}, const char @var{value}[], enum mv_class @var{class})
1321 Tests whether @var{value} is a missing value of the given @var{class}
1322 for variable @var{var} and returns true if so, false otherwise.
1323 @func{var_is_num_missing} may only be applied to numeric variables;
1324 @func{var_is_str_missing} may only be applied to string variables.
1325 @var{value} must have been initialized with the same width as
1328 @code{var_is_@var{type}_missing (@var{var}, @var{value}, @var{class})}
1329 is equivalent to @code{mv_is_@var{type}_missing
1330 (var_get_missing_values (@var{var}), @var{value}, @var{class})}.
1333 In addition, a few functions are provided to work more directly with a
1334 variable's @struct{missing_values}:
1336 @deftypefun {const struct missing_values *} var_get_missing_values (const struct variable *@var{var})
1337 Returns the @struct{missing_values} associated with @var{var}. The
1338 caller must not modify the returned structure. The return value is
1342 @anchor{var_set_missing_values}
1343 @deftypefun {void} var_set_missing_values (struct variable *@var{var}, const struct missing_values *@var{miss})
1344 Changes @var{var}'s missing values to a copy of @var{miss}, or if
1345 @var{miss} is a null pointer, clears @var{var}'s missing values. If
1346 @var{miss} is non-null, it must have the same width as @var{var} or be
1347 resizable to @var{var}'s width (@pxref{mv_resize}). The caller
1348 retains ownership of @var{miss}.
1351 @deftypefun void var_clear_missing_values (struct variable *@var{var})
1352 Clears @var{var}'s missing values. Equivalent to
1353 @code{var_set_missing_values (@var{var}, NULL)}.
1356 @deftypefun bool var_has_missing_values (const struct variable *@var{var})
1357 Returns true if @var{var} has any missing values, false if it has
1358 none. Equivalent to @code{mv_is_empty (var_get_missing_values (@var{var}))}.
1361 @node Variable Value Labels
1362 @subsection Variable Value Labels
1364 A numeric or short string variable may have a set of value labels
1365 (@pxref{VALUE LABELS,,,pspp, PSPP Users Guide}), represented as a
1366 @struct{val_labs} (@pxref{Value Labels}). The most commonly useful
1367 functions for value labels return the value label associated with a
1370 @deftypefun {const char *} var_lookup_value_label (const struct variable *@var{var}, const union value *@var{value})
1371 Looks for a label for @var{value} in @var{var}'s set of value labels.
1372 @var{value} must have the same width as @var{var}. Returns the label
1373 if one exists, otherwise a null pointer.
1376 @deftypefun void var_append_value_name (const struct variable *@var{var}, const union value *@var{value}, struct string *@var{str})
1377 Looks for a label for @var{value} in @var{var}'s set of value labels.
1378 @var{value} must have the same width as @var{var}.
If a label exists, appends it to @var{str}. Otherwise, formats
@var{value} using @var{var}'s print format (@pxref{Input and Output
Formats}) and appends the formatted string to @var{str}.
1385 The underlying @struct{val_labs} structure may also be accessed
1386 directly using the functions described below.
1388 @deftypefun bool var_has_value_labels (const struct variable *@var{var})
1389 Returns true if @var{var} has at least one value label, false
1393 @deftypefun {const struct val_labs *} var_get_value_labels (const struct variable *@var{var})
1394 Returns the @struct{val_labs} associated with @var{var}. If @var{var}
1395 has no value labels, then the return value may or may not be a null
1398 The variable retains ownership of the returned @struct{val_labs},
1399 which the caller must not attempt to modify.
1402 @deftypefun void var_set_value_labels (struct variable *@var{var}, const struct val_labs *@var{val_labs})
1403 Replaces @var{var}'s value labels by a copy of @var{val_labs}. The
1404 caller retains ownership of @var{val_labs}. If @var{val_labs} is a
1405 null pointer, then @var{var}'s value labels, if any, are deleted.
1408 @deftypefun void var_clear_value_labels (struct variable *@var{var})
1409 Deletes @var{var}'s value labels. Equivalent to
1410 @code{var_set_value_labels (@var{var}, NULL)}.
1413 A final group of functions offers shorthands for operations that would
1414 otherwise require getting the value labels from a variable, copying
1415 them, modifying them, and then setting the modified value labels into
1416 the variable (making a second copy):
1418 @deftypefun bool var_add_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1419 Attempts to add a copy of @var{label} as a label for @var{value} for
1420 the given @var{var}. @var{value} must have the same width as
1421 @var{var}. If @var{value} already has a label, then the old label is
1422 retained. Returns true if a label is added, false if there was an
1423 existing label for @var{value}. Either way, the caller retains
1424 ownership of @var{value} and @var{label}.
1427 @deftypefun void var_replace_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1428 Attempts to add a copy of @var{label} as a label for @var{value} for
1429 the given @var{var}. @var{value} must have the same width as
1430 @var{var}. If @var{value} already has a label, then
1431 @var{label} replaces the old label. Either way, the caller retains
1432 ownership of @var{value} and @var{label}.
1435 @node Variable Print and Write Formats
1436 @subsection Variable Print and Write Formats
1438 Each variable has an associated pair of output formats, called its
1439 @dfn{print format} and @dfn{write format}. @xref{Input and Output
1440 Formats,,,pspp, PSPP Users Guide}, for an introduction to formats.
1441 @xref{Input and Output Formats}, for a developer's description of
1442 format representation.
1444 The print format is used to convert a variable's data values to
1445 strings for human-readable output. The write format is used similarly
1446 for machine-readable output, primarily by the WRITE transformation
1447 (@pxref{WRITE,,,pspp, PSPP Users Guide}). Most often a variable's
1448 print and write formats are the same.
1450 A newly created variable by default has format F8.2 if it is numeric
1451 or an A format with the same width as the variable if it is string.
1452 Many creators of variables override these defaults.
1454 Both the print format and write format are output formats. Input
1455 formats are not part of @struct{variable}. Instead, input programs
1456 and transformations keep track of variable input formats themselves.
1458 The following functions work with variable print and write formats.
1460 @deftypefun {const struct fmt_spec *} var_get_print_format (const struct variable *@var{var})
1461 @deftypefunx {const struct fmt_spec *} var_get_write_format (const struct variable *@var{var})
1462 Returns @var{var}'s print or write format, respectively.
1465 @deftypefun void var_set_print_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1466 @deftypefunx void var_set_write_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1467 @deftypefunx void var_set_both_formats (struct variable *@var{var}, const struct fmt_spec *@var{format})
1468 Sets @var{var}'s print format, write format, or both formats,
1469 respectively, to a copy of @var{format}.
1472 @node Variable Labels
1473 @subsection Variable Labels
1475 A variable label is a string that describes a variable. Variable
1476 labels may contain spaces and punctuation not allowed in variable
1477 names. @xref{VARIABLE LABELS,,,pspp, PSPP Users Guide}, for a
1478 user-level description of variable labels.
1480 The most commonly useful functions for variable labels are those to
1481 retrieve a variable's label:
1483 @deftypefun {const char *} var_to_string (const struct variable *@var{var})
1484 Returns @var{var}'s variable label, if it has one, otherwise
1485 @var{var}'s name. In either case the caller must not attempt to
1486 modify or free the returned string.
1488 This function is useful for user output.
1491 @deftypefun {const char *} var_get_label (const struct variable *@var{var})
1492 Returns @var{var}'s variable label, if it has one, or a null pointer
1496 A few other variable label functions are also provided:
1498 @deftypefun void var_set_label (struct variable *@var{var}, const char *@var{label})
1499 Sets @var{var}'s variable label to a copy of @var{label}, or removes
1500 any label from @var{var} if @var{label} is a null pointer or contains
1501 only spaces. Leading and trailing spaces are removed from the
1502 variable label and its remaining content is truncated at 255 bytes.
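The label normalization described above can be sketched as a standalone C function. This illustrates only the stated rules; the name @code{toy_normalize_label} is hypothetical, and the byte-based truncation may split a multibyte character, a case this sketch does not handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of LABEL with leading and trailing spaces
   removed and the rest truncated to 255 bytes, or NULL if LABEL is NULL,
   contains only spaces, or allocation fails. */
static char *
toy_normalize_label (const char *label)
{
  if (label == NULL)
    return NULL;
  while (*label == ' ')
    label++;                          /* Skip leading spaces. */
  size_t len = strlen (label);
  while (len > 0 && label[len - 1] == ' ')
    len--;                            /* Drop trailing spaces. */
  if (len == 0)
    return NULL;                      /* Only spaces: no label. */
  if (len > 255)
    len = 255;                        /* Byte-based truncation. */
  assert (len >= 1 && len <= 255);
  char *copy = malloc (len + 1);
  if (copy != NULL)
    {
      memcpy (copy, label, len);
      copy[len] = '\0';
    }
  return copy;
}
```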
1505 @deftypefun void var_clear_label (struct variable *@var{var})
1506 Removes any variable label from @var{var}.
1509 @deftypefun bool var_has_label (const struct variable *@var{var})
1510 Returns true if @var{var} has a variable label, false otherwise.
1513 @node Variable GUI Attributes
1514 @subsection GUI Attributes
1516 These functions and types access and set attributes that are mainly
1517 used by graphical user interfaces. Their values are also stored in
1518 and retrieved from system files (but not portable files).
The first group of functions relates to the measurement level of
1521 numeric data. New variables are assigned a nominal level of
1522 measurement by default.
1524 @deftp {Enumeration} {enum measure}
1525 Measurement level. Available values are:
1528 @item MEASURE_NOMINAL
1529 Numeric data values are arbitrary. Arithmetic operations and
1530 numerical comparisons of such data are not meaningful.
1532 @item MEASURE_ORDINAL
1533 Numeric data values indicate progression along a rank order.
1534 Arbitrary arithmetic operations such as addition are not meaningful on
1535 such data, but inequality comparisons (less, greater, etc.) have
1536 straightforward interpretations.
1539 Ratios, sums, etc. of numeric data values have meaningful
1543 PSPP does not have a separate category for interval data, which would
1544 naturally fall between the ordinal and scale measurement levels.
1547 @deftypefun bool measure_is_valid (enum measure @var{measure})
1548 Returns true if @var{measure} is a valid level of measurement, that
1549 is, if it is one of the @code{enum measure} constants listed above,
1550 and false otherwise.
@deftypefun {enum measure} var_get_measure (const struct variable *@var{var})
1554 @deftypefunx void var_set_measure (struct variable *@var{var}, enum measure @var{measure})
1555 Gets or sets @var{var}'s measurement level.
1558 The following set of functions relates to the width of on-screen
1559 columns used for displaying variable data in a graphical user
1560 interface environment. The unit of measurement is the width of a
1561 character. For proportionally spaced fonts, this is based on the
1562 average width of a character.
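The default documented for @func{var_default_display_width} reduces to a short rule. This sketch assumes the string case means @var{width} or 32, whichever is smaller; the function name is hypothetical.

```c
#include <assert.h>

/* WIDTH is a variable width: 0 for numeric, 1 or more for string. */
static int
toy_default_display_width (int width)
{
  assert (width >= 0);
  if (width == 0)
    return 8;                       /* Numeric variables. */
  return width < 32 ? width : 32;   /* String variables, capped at 32. */
}
```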
1564 @deftypefun int var_get_display_width (const struct variable *@var{var})
1565 @deftypefunx void var_set_display_width (struct variable *@var{var}, int @var{display_width})
1566 Gets or sets @var{var}'s display width.
1569 @anchor{var_default_display_width}
1570 @deftypefun int var_default_display_width (int @var{width})
1571 Returns the default display width for a variable with the given
1572 @var{width}. The default width of a numeric variable is 8. The
1573 default width of a string variable is @var{width} or 32, whichever is
The final group of functions works with the justification of data when
1578 it is displayed in on-screen columns. New variables are by default
1581 @deftp {Enumeration} {enum alignment}
1582 Text justification. Possible values are @code{ALIGN_LEFT},
1583 @code{ALIGN_RIGHT}, and @code{ALIGN_CENTRE}.
1586 @deftypefun bool alignment_is_valid (enum alignment @var{alignment})
1587 Returns true if @var{alignment} is a valid alignment, that is, if it
1588 is one of the @code{enum alignment} constants listed above, and false
1592 @deftypefun enum alignment var_get_alignment (const struct variable *@var{var})
1593 @deftypefunx void var_set_alignment (struct variable *@var{var}, enum alignment @var{alignment})
1594 Gets or sets @var{var}'s alignment.
1597 @node Variable Leave Status
1598 @subsection Variable Leave Status
1600 Commonly, most or all data in a case come from an input file, read
1601 with a command such as DATA LIST or GET, but data can also be
1602 generated with transformations such as COMPUTE. In the latter case,
1603 the question of a datum's ``initial value'' can arise. For example,
1604 the value of a piece of generated data can recursively depend on its
1609 The question of a variable's initial value also arises when its value
1610 is not set at all for some cases. In the following example, @code{Y}
1611 is set only for the first 10 cases:
1613 DO IF #CASENUM <= 10.
1618 By default, the initial value of a datum in either of these situations
1619 is the system-missing value for numeric values and spaces for string
1620 values. This means that, above, X would be system-missing and that Y
1621 would be 1 for the first 10 cases and system-missing for the
1624 PSPP also supports retaining the value of a variable from one case to
1625 another, using the LEAVE command (@pxref{LEAVE,,,pspp, PSPP Users
1626 Guide}). The initial value of such a variable is 0 if it is numeric
1627 and spaces if it is a string. If the command @samp{LEAVE X Y} is
1628 appended to the above example, then X would have value 1 in the first
1629 case and increase by 1 in every succeeding case, and Y would have
1630 value 1 for the first 10 cases and 0 for later cases.
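The effect of LEAVE on a generated numeric variable can be modeled in a few lines of C. This models the semantics described above and is not PSPP's implementation. It assumes that the generated value is computed as @code{X + 1} in every case, which is consistent with the results stated for X, and it uses a stand-in value for @code{SYSMIS}.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Stand-in for PSPP's SYSMIS macro; the value is an assumption for
   this sketch. */
#define SYSMIS (-DBL_MAX)

/* Fills XS[0..N_CASES-1] with the value X takes in each case when it
   is computed as X + 1, with system-missing propagating through the
   addition.  LEAVE selects whether X is retained between cases. */
void
simulate_x (bool leave, int n_cases, double xs[])
{
  double x = 0.0;               /* Initial value of a left numeric variable. */

  for (int i = 0; i < n_cases; i++)
    {
      if (!leave)
        x = SYSMIS;             /* Reinitialized at the start of every case. */
      x = (x == SYSMIS) ? SYSMIS : x + 1.0;
      xs[i] = x;
    }
}
```

When @var{leave} is true, X starts from 0 and takes the values 1, 2, 3, @dots{}. When @var{leave} is false, X is reset to system-missing in every case, and the missing value propagates through the addition.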
1632 The LEAVE command has no effect on data that comes from an input file
1633 or whose values do not depend on a variable's initial value.
1635 The values of scratch variables (@pxref{Scratch Variables,,,pspp, PSPP
1636 Users Guide}) are always left from one case to another.
1638 The following functions work with a variable's leave status.
1640 @deftypefun bool var_get_leave (const struct variable *@var{var})
1641 Returns true if @var{var}'s value is to be retained from case to case,
1642 false if it is reinitialized to system-missing or spaces.
1645 @deftypefun void var_set_leave (struct variable *@var{var}, bool @var{leave})
1646 If @var{leave} is true, marks @var{var} to be left from case to case;
1647 if @var{leave} is false, marks @var{var} to be reinitialized for each
1650 If @var{var} is a scratch variable, @var{leave} must be true.
1653 @deftypefun bool var_must_leave (const struct variable *@var{var})
1654 Returns true if @var{var} must be left from case to case, that is, if
1655 @var{var} is a scratch variable.
1658 @node Dictionary Class
1659 @subsection Dictionary Class
1661 Occasionally it is useful to classify variables into @dfn{dictionary
1662 classes} based on their names. Dictionary classes are represented by
1663 @enum{dict_class}. This type and other declarations for dictionary
1664 classes are in the @file{<data/dict-class.h>} header.
1666 @deftp {Enumeration} {enum dict_class}
1667 The dictionary classes are:
1671 An ordinary variable, one whose name does not begin with @samp{$} or
1675 A system variable, one whose name begins with @samp{$}. @xref{System
1676 Variables,,,pspp, PSPP Users Guide}.
1679 A scratch variable, one whose name begins with @samp{#}.
1680 @xref{Scratch Variables,,,pspp, PSPP Users Guide}.
1683 The values for dictionary classes are bitwise disjoint, which allows
1684 them to be used in bit-masks. An extra enumeration constant
1685 @code{DC_ALL}, whose value is the bitwise-@i{or} of all of the above
1686 constants, is provided to aid in this purpose.
1689 One example use of dictionary classes arises in connection with PSPP
1690 syntax that uses @code{@var{a} TO @var{b}} to name the variables in a
1691 dictionary from @var{a} to @var{b} (@pxref{Sets of Variables,,,pspp,
1692 PSPP Users Guide}). This syntax requires @var{a} and @var{b} to be in
1693 the same dictionary class. It limits the variables that it includes
1694 to those in that dictionary class.
1696 The following functions relate to dictionary classes.
1698 @deftypefun {enum dict_class} dict_class_from_id (const char *@var{name})
1699 Returns the dictionary class for the given variable @var{name},
1700 determined by its first character.
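The class values are disjoint bits, so a set of classes can be represented as a mask and tested with a bitwise @i{and}. The following sketch classifies a name by its first character and tests the result against such a mask. The numeric values of the constants and the helper @code{class_in_mask} are assumptions for illustration. They are not PSPP declarations.

```c
#include <assert.h>

/* Stand-in for enum dict_class.  The specific bit values are
   assumptions; the text only requires them to be bitwise disjoint. */
enum dict_class
  {
    DC_ORDINARY = 1 << 0,
    DC_SYSTEM = 1 << 1,
    DC_SCRATCH = 1 << 2,
    DC_ALL = DC_ORDINARY | DC_SYSTEM | DC_SCRATCH
  };

/* Classifies NAME by its first character. */
enum dict_class
dict_class_from_id (const char *name)
{
  switch (name[0])
    {
    case '$':
      return DC_SYSTEM;
    case '#':
      return DC_SCRATCH;
    default:
      return DC_ORDINARY;
    }
}

/* Hypothetical helper: nonzero if NAME's class is set in MASK. */
int
class_in_mask (const char *name, unsigned mask)
{
  return (dict_class_from_id (name) & mask) != 0;
}
```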
1703 @deftypefun {const char *} dict_class_to_name (enum dict_class @var{dict_class})
1704 Returns a name for the given @var{dict_class} as an adjective, e.g.@:
1707 This function should probably not be used in new code as it can lead
1708 to difficulties for internationalization.
1711 @node Variable Creation and Destruction
1712 @subsection Variable Creation and Destruction
1714 Only rarely should PSPP code create or destroy variables directly.
1715 Ordinarily, variables are created within a dictionary and destroyed
1716 by individual deletion from the dictionary or by destroying the entire
1717 dictionary at once. The functions here handle the exceptional case:
1718 creation and destruction of variables that are not associated with
1719 any dictionary. These functions are used internally in the dictionary
1723 @deftypefun {struct variable *} var_create (const char *@var{name}, int @var{width})
1724 Creates and returns a new variable with the given @var{name} and
1725 @var{width}. The new variable is not part of any dictionary. Use
1726 @func{dict_create_var}, instead, to create a variable in a dictionary
1727 (@pxref{Dictionary Creating Variables}).
1729 @var{name} should be a valid variable name and must be a ``plausible''
1730 variable name (@pxref{Variable Name}). @var{width} must be between 0
1731 and @code{MAX_STRING}, inclusive (@pxref{Values}).
1733 The new variable has no user-missing values, value labels, or variable
1734 label. Numeric variables initially have F8.2 print and write formats,
1735 right-justified display alignment, and scale level of measurement.
1736 String variables are created with A print and write formats,
1737 left-justified display alignment, and nominal level of measurement.
1738 The initial display width is determined by
1739 @func{var_default_display_width} (@pxref{var_default_display_width}).
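The per-type defaults listed above can be summarized as a single function of the width. This helper is hypothetical and is not part of PSPP's API. The names @code{struct var_defaults} and @code{var_defaults_for_width} are invented for illustration, and the enumerations are stand-ins. The string display width assumes that the default is the smaller of the width and 32.

```c
#include <assert.h>
#include <string.h>

/* Stand-in enumerations; values are not PSPP's. */
enum alignment { ALIGN_LEFT, ALIGN_RIGHT, ALIGN_CENTRE };
enum measure { MEASURE_NOMINAL, MEASURE_ORDINAL, MEASURE_SCALE };

/* Hypothetical summary of the defaults a new variable receives. */
struct var_defaults
  {
    const char *format;         /* Print and write format type. */
    enum alignment alignment;
    enum measure measure;
    int display_width;
  };

/* Numeric (width 0) defaults to 8; for strings this assumes the
   smaller of WIDTH and 32. */
static int
default_display_width (int width)
{
  return width == 0 ? 8 : (width < 32 ? width : 32);
}

struct var_defaults
var_defaults_for_width (int width)
{
  struct var_defaults d;
  if (width == 0)
    {
      d.format = "F8.2";
      d.alignment = ALIGN_RIGHT;
      d.measure = MEASURE_SCALE;
    }
  else
    {
      d.format = "A";
      d.alignment = ALIGN_LEFT;
      d.measure = MEASURE_NOMINAL;
    }
  d.display_width = default_display_width (width);
  return d;
}
```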
1741 The new variable initially has no short name (@pxref{Variable Short
1742 Names}) and no auxiliary data (@pxref{Variable Auxiliary Data}).
1746 @deftypefun {struct variable *} var_clone (const struct variable *@var{old_var})
1747 Creates and returns a new variable with the same attributes as
1748 @var{old_var}, with a few exceptions. First, the new variable is not
1749 part of any dictionary, regardless of whether @var{old_var} was in a
1750 dictionary. Use @func{dict_clone_var}, instead, to add a clone of a
1751 variable to a dictionary.
1753 Second, the new variable is not given any short name, even if
1754 @var{old_var} had a short name. This is because the new variable is
1755 likely to be immediately renamed, in which case the short name would
1756 be incorrect (@pxref{Variable Short Names}).
1758 Finally, @var{old_var}'s auxiliary data, if any, is not copied to the
1759 new variable (@pxref{Variable Auxiliary Data}).
1762 @deftypefun {void} var_destroy (struct variable *@var{var})
1763 Destroys @var{var} and frees all associated storage, including its
1764 auxiliary data, if any. @var{var} must not be part of a dictionary.
1765 To delete a variable from a dictionary and destroy it, use
1766 @func{dict_delete_var} (@pxref{Dictionary Deleting Variables}).
1769 @node Variable Short Names
1770 @subsection Variable Short Names
1772 PSPP variable names may be up to 64 (@code{VAR_NAME_LEN}) bytes long.
1773 The system and portable file formats, however, were designed when
1774 variable names were limited to 8 bytes in length. Since then, the
1775 system file format has been augmented with an extension record that
1776 explains how the 8-byte short names map to full-length names
1777 (@pxref{Long Variable Names Record}), but the short names are still
1778 present. Thus, the continued presence of the short names is more or
1779 less invisible to PSPP users, but every variable in a system file
1780 still has a short name that must be unique.
1782 PSPP can generate unique short names for variables based on their full
1783 names at the time it creates the data file. If all variables' full
1784 names are unique in their first 8 bytes, then the short names are
1785 simply prefixes of the full names; otherwise, PSPP changes them so
1786 that they are unique.
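One possible uniquification scheme is sketched below. The scheme is hypothetical: the text does not specify how PSPP alters colliding names. The @samp{_1}, @samp{_2}, @dots{} suffix rule is an assumption, chosen only to illustrate prefix truncation followed by collision resolution.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SHORT_NAME_LEN 8

/* Returns true if CAND equals any of the first N entries of TAKEN. */
static bool
is_taken (char taken[][SHORT_NAME_LEN + 1], size_t n, const char *cand)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (taken[i], cand) == 0)
      return true;
  return false;
}

/* Hypothetical scheme: each short name is the uppercased first 8 bytes
   of the full name; on a collision, the tail is replaced by "_1",
   "_2", ... until the name is unique.  OUT must have room for N names. */
void
make_short_names (const char *const names[], size_t n,
                  char out[][SHORT_NAME_LEN + 1])
{
  for (size_t i = 0; i < n; i++)
    {
      char prefix[SHORT_NAME_LEN + 1];
      size_t len = 0;
      for (; len < SHORT_NAME_LEN && names[i][len] != '\0'; len++)
        prefix[len] = toupper ((unsigned char) names[i][len]);
      prefix[len] = '\0';

      strcpy (out[i], prefix);
      for (int k = 1; is_taken (out, i, out[i]); k++)
        {
          char suffix[16];
          int slen = snprintf (suffix, sizeof suffix, "_%d", k);
          size_t keep = len + slen > SHORT_NAME_LEN ? SHORT_NAME_LEN - slen : len;
          snprintf (out[i], SHORT_NAME_LEN + 1, "%.*s%s", (int) keep, prefix, suffix);
        }
    }
}
```

If @samp{RankingScore} comes before @samp{RankingStatus}, the first receives @samp{RANKINGS} and the second @samp{RANKIN_1}. Reversing the input order swaps these results. This order dependence is the problem described in the following paragraph.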
1788 By itself, this algorithm interoperates well with other software that
1789 can read system files, as long as that software understands the
1790 extension record that maps short names to long names. When the other
1791 software does not understand the extension record, it can produce
1792 surprising results. Consider a situation where PSPP reads a system
1793 file that contains a variable named RANKINGSCORE, then the user
1794 adds a new variable named RANKINGSTATUS, then saves the modified data
1795 as a new system file. A program that does not understand long names
1796 would then see one of these variables under the name RANKINGS---either
1797 one, depending on the algorithm's details---and the other under a
1798 different name. The effect could be very confusing: by adding a new
1799 and apparently unrelated variable in PSPP, the user effectively
1800 renamed the existing variable.
1802 To counteract this potential problem, every @struct{variable} may have
1803 a short name. A variable created by the system or portable file
1804 reader receives the short name from that data file. When a variable
1805 with a short name is written to a system or portable file, that
1806 variable receives priority over other variables whose names begin
1807 with the same 8 bytes but which were not read from a data file under
1810 Variables not created by the system or portable file reader have no
1811 short name by default.
1813 A variable with a full name of 8 bytes or less in length has absolute
1814 priority for that name when the variable is written to a system file,
1815 even over a second variable with that assigned short name.
1817 PSPP does not enforce uniqueness of short names, although the short
1818 names read from any given data file will always be unique. If two
1819 variables with the same short name are written to a single data file,
1820 neither one receives priority.
1822 The following macros and functions relate to short names.
1824 @defmac SHORT_NAME_LEN
1825 Maximum length of a short name, in bytes. Its value is 8.
1828 @deftypefun {const char *} var_get_short_name (const struct variable *@var{var})
1829 Returns @var{var}'s short name, or a null pointer if @var{var} has not
1830 been assigned a short name.
1833 @deftypefun void var_set_short_name (struct variable *@var{var}, const char *@var{short_name})
1834 Sets @var{var}'s short name to @var{short_name}, or removes
1835 @var{var}'s short name if @var{short_name} is a null pointer. If it
1836 is non-null, then @var{short_name} must be a plausible name for a
1837 variable (@pxref{var_is_plausible_name}). The name will be truncated
1838 to 8 bytes in length and converted to all-uppercase.
1841 @deftypefun void var_clear_short_name (struct variable *@var{var})
1842 Removes @var{var}'s short name.
1845 @node Variable Relationships
1846 @subsection Variable Relationships
1848 Variables have close relationships with dictionaries
1849 (@pxref{Dictionaries}) and cases (@pxref{Cases}). A variable is
1850 usually a member of some dictionary, and a case is often used to store
1851 data for the set of variables in a dictionary.
1853 These functions report on these relationships. They may be applied
1854 only to variables that are in a dictionary.
1856 @deftypefun size_t var_get_dict_index (const struct variable *@var{var})
1857 Returns @var{var}'s index within its dictionary. The first variable
1858 in a dictionary has index 0, the next variable index 1, and so on.
1860 The dictionary index can be influenced using dictionary functions such
1861 as @func{dict_reorder_var} (@pxref{dict_reorder_var}).
1864 @deftypefun size_t var_get_case_index (const struct variable *@var{var})
1865 Returns @var{var}'s index within a case. The case index is an index
1866 into an array of @union{value} large enough to contain all the data in
1869 The returned case index can be used to access the value of @var{var}
1870 within a case for its dictionary, as in e.g.@: @code{case_data_idx
1871 (case, var_get_case_index (@var{var}))}, but ordinarily it is more
1872 convenient to use the data access functions that do variable-to-index
1873 translation internally, as in e.g.@: @code{case_data (case,
1877 @node Variable Auxiliary Data
1878 @subsection Variable Auxiliary Data
1880 Each @struct{variable} can have a single pointer to auxiliary data of
1881 type @code{void *}. These functions manipulate a variable's auxiliary
1884 Use of auxiliary data is discouraged because of its lack of
1885 flexibility. Only one client can make use of auxiliary data on a
1886 given variable at any time, even though many clients could usefully
1887 associate data with a variable.
1889 To prevent multiple clients from attempting to use a variable's single
1890 auxiliary data field at the same time, we adopt the convention that
1891 use of auxiliary data in the active file dictionary is restricted to
1892 the currently executing command. In particular, transformations must
1893 not attach auxiliary data to a variable in the active file in the
1894 expectation that it can be used later when the active file is read and
1895 the transformation is executed. To help enforce this restriction,
1896 auxiliary data is deleted from all variables in the active file
1897 dictionary after the execution of each PSPP command.
1899 This convention for safe use of auxiliary data applies only to the
1900 active file dictionary. Rules for other dictionaries may be
1901 established separately.
1903 Auxiliary data should be replaced by a more flexible mechanism at some
1904 point, but no replacement mechanism has been designed or implemented
1907 The following functions work with variable auxiliary data.
1909 @deftypefun {void *} var_get_aux (const struct variable *@var{var})
1910 Returns @var{var}'s auxiliary data, or a null pointer if none has been
1914 @deftypefun {void *} var_attach_aux (const struct variable *@var{var}, void *@var{aux}, void (*@var{aux_dtor}) (struct variable *))
1915 Sets @var{var}'s auxiliary data to @var{aux}, which must not be null.
1916 @var{var} must not already have auxiliary data.
1918 Before @var{var}'s auxiliary data is cleared by @code{var_clear_aux},
1919 @var{aux_dtor}, if non-null, will be called with @var{var} as its
1920 argument. It should free any storage associated with @var{aux}, if
1921 necessary. @code{var_dtor_free} may be appropriate for use as
1924 @deffn {Function} void var_dtor_free (struct variable *@var{var})
1925 Frees @var{var}'s auxiliary data by calling @code{free}.
1929 @deftypefun void var_clear_aux (struct variable *@var{var})
1930 Removes auxiliary data, if any, from @var{var}, first calling the
1931 destructor passed to @code{var_attach_aux}, if one was provided.
1933 Use @code{dict_clear_aux} to remove auxiliary data from every variable
1934 in a dictionary. @c (@pxref{dict_clear_aux}).
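A small stand-in structure can model the single-slot rule and the order in which the destructor is called. This is not PSPP's implementation. The two-field @code{struct variable} is hypothetical. To keep the example simple, this @code{var_attach_aux} takes a non-const pointer, whereas the declaration above takes a @code{const} one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct variable holding only the single
   auxiliary-data slot and its destructor. */
struct variable
  {
    void *aux;
    void (*aux_dtor) (struct variable *);
  };

void *
var_attach_aux (struct variable *v, void *aux,
                void (*aux_dtor) (struct variable *))
{
  assert (v->aux == NULL && aux != NULL);   /* Only one client at a time. */
  v->aux = aux;
  v->aux_dtor = aux_dtor;
  return aux;
}

/* Destructor that releases auxiliary data allocated with malloc. */
void
var_dtor_free (struct variable *v)
{
  free (v->aux);
}

void
var_clear_aux (struct variable *v)
{
  if (v->aux != NULL)
    {
      if (v->aux_dtor != NULL)
        v->aux_dtor (v);        /* Runs while v->aux is still set. */
      v->aux = NULL;
      v->aux_dtor = NULL;
    }
}
```

Clearing a variable that has no auxiliary data does nothing, so in this sketch @code{var_clear_aux} can be called unconditionally.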
1937 @deftypefun {void *} var_detach_aux (struct variable *@var{var})
1938 Removes auxiliary data, if any, from @var{var}, and returns it.
1939 Returns a null pointer if @var{var} had no auxiliary data.
1941 Any destructor passed to @code{var_attach_aux} is not called, so the
1942 caller is responsible for freeing storage associated with the returned
1946 @node Variable Categorical Values
1947 @subsection Variable Categorical Values
1949 Some statistical procedures require a list of all the values that a
1950 categorical variable takes on. Arranging such a list requires making
1951 a pass through the data, so PSPP caches categorical values in
1954 When variable auxiliary data is revamped to support multiple clients
1955 as described in the previous section, categorical values are an
1956 obvious candidate. The form in which they are currently supported is
1959 Categorical values are not robust against changes in the data. That
1960 is, there is currently no way to detect that a transformation has
1961 changed data values, and hence that the categorical value lists for the
1962 changed variables must be recomputed. PSPP is in fact in need of a
1963 general-purpose caching and cache-invalidation mechanism, but none
1964 has yet been designed and built.
1966 The following functions work with cached categorical values.
1968 @deftypefun {struct cat_vals *} var_get_obs_vals (const struct variable *@var{var})
1969 Returns @var{var}'s set of categorical values. Yields undefined
1970 behavior if @var{var} does not have any categorical values.
1973 @deftypefun void var_set_obs_vals (const struct variable *@var{var}, struct cat_vals *@var{cat_vals})
1974 Destroys @var{var}'s categorical values, if any, and replaces them by
1975 @var{cat_vals}, ownership of which is transferred to @var{var}. If
1976 @var{cat_vals} is a null pointer, then @var{var}'s categorical values
1980 @deftypefun bool var_has_obs_vals (const struct variable *@var{var})
1981 Returns true if @var{var} has a set of categorical values, false
1986 @section Dictionaries
1988 Each data file in memory or on disk has an associated dictionary,
1989 whose primary purpose is to describe the data in the file.
1990 @xref{Variables,,,pspp, PSPP Users Guide}, for a PSPP user's view of a
1993 A data file stored in a PSPP format, either as a system or portable
1994 file, has a representation of its dictionary embedded in it. Other
1995 kinds of data files are usually not self-describing enough to
1996 construct a dictionary unassisted, so the dictionaries for these files
1997 must be specified explicitly with PSPP commands such as @cmd{DATA
2000 The most important content of a dictionary is an array of variables,
2001 which must have unique names. A dictionary also conceptually contains
2002 a mapping from each of its variables to a location within a case
2003 (@pxref{Cases}), although in fact these mappings are stored within
2004 individual variables.
2006 System variables are not members of any dictionary (@pxref{System
2007 Variables,,,pspp, PSPP Users Guide}).
2009 Dictionaries are represented by @struct{dictionary}. Declarations
2010 related to dictionaries are in the @file{<data/dictionary.h>} header.
2012 The following sections describe functions for use with dictionaries.
2015 * Dictionary Variable Access::
2016 * Dictionary Creating Variables::
2017 * Dictionary Deleting Variables::
2018 * Dictionary Reordering Variables::
2019 * Dictionary Renaming Variables::
2020 * Dictionary Weight Variable::
2021 * Dictionary Filter Variable::
2022 * Dictionary Case Limit::
2023 * Dictionary Split Variables::
2024 * Dictionary File Label::
2025 * Dictionary Documents::
2028 @node Dictionary Variable Access
2029 @subsection Accessing Variables
2031 The most common operations on a dictionary simply retrieve a
2032 @code{struct variable *} of an individual variable based on its name
2035 @deftypefun {struct variable *} dict_lookup_var (const struct dictionary *@var{dict}, const char *@var{name})
2036 @deftypefunx {struct variable *} dict_lookup_var_assert (const struct dictionary *@var{dict}, const char *@var{name})
2037 Looks up and returns the variable with the given @var{name} within
2038 @var{dict}. Name lookup is not case-sensitive.
2040 @code{dict_lookup_var} returns a null pointer if @var{dict} does not
2041 contain a variable named @var{name}. @code{dict_lookup_var_assert}
2042 asserts that such a variable exists.
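A linear search over names illustrates the matching rule. The sketch shows case-insensitive comparison only. @code{lookup_name} is a hypothetical helper, and it folds only ASCII letters. The real dictionary keeps an index by name instead of scanning.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive string equality. */
static int
names_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (toupper ((unsigned char) *a) != toupper ((unsigned char) *b))
      return 0;
  return *a == *b;              /* Equal only if both ended together. */
}

/* Returns the index of NAME among NAMES[0..N-1], or -1 if absent. */
long
lookup_name (const char *const names[], size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (names_equal (names[i], name))
      return (long) i;
  return -1;
}
```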
2045 @deftypefun {struct variable *} dict_get_var (const struct dictionary *@var{dict}, size_t @var{position})
2046 Returns the variable at the given @var{position} in @var{dict}.
2047 @var{position} must be less than the number of variables in @var{dict}
2051 @deftypefun size_t dict_get_var_cnt (const struct dictionary *@var{dict})
2052 Returns the number of variables in @var{dict}.
2055 Another pair of functions allows retrieving a number of variables at
2056 once. These functions are more rarely useful.
2058 @deftypefun void dict_get_vars (const struct dictionary *@var{dict}, const struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2059 @deftypefunx void dict_get_vars_mutable (const struct dictionary *@var{dict}, struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2060 Retrieves all of the variables in @var{dict}, in their original order,
2061 except that any variables in the dictionary classes specified by
2062 @var{exclude}, if any, are excluded (@pxref{Dictionary Class}).
2063 Pointers to the variables are stored in an array allocated with
2064 @code{malloc}, and a pointer to the first element of this array is
2065 stored in @code{*@var{vars}}. The caller is responsible for freeing
2066 this memory when it is no longer needed. The number of variables
2067 retrieved is stored in @code{*@var{cnt}}.
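The following sketch shows the filtering and allocation behavior. Here, @code{struct variable} holds only a name, and @code{class_of} and @code{get_vars} are hypothetical helpers. @code{dict_get_vars} returns nothing. The sketch differs by reporting allocation failure through its return value.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in class bits; values are assumptions. */
enum dict_class { DC_ORDINARY = 1, DC_SYSTEM = 2, DC_SCRATCH = 4 };

/* Hypothetical variable record: just a name. */
struct variable { const char *name; };

static enum dict_class
class_of (const struct variable *v)
{
  return (v->name[0] == '#' ? DC_SCRATCH
          : v->name[0] == '$' ? DC_SYSTEM : DC_ORDINARY);
}

/* Stores pointers to the variables in VARS[0..N-1] whose class is not
   in EXCLUDE into a newly malloc'd array, preserving order.  The caller
   frees *OUT.  Returns 0 on success, -1 on allocation failure. */
int
get_vars (struct variable *const vars[], size_t n, unsigned exclude,
          struct variable ***out, size_t *cnt)
{
  *out = malloc ((n > 0 ? n : 1) * sizeof **out);
  if (*out == NULL)
    return -1;
  *cnt = 0;
  for (size_t i = 0; i < n; i++)
    if (!(class_of (vars[i]) & exclude))
      (*out)[(*cnt)++] = vars[i];
  return 0;
}
```

The caller owns the returned array and must release it with @code{free}, as the ownership rule above requires.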
2069 The presence or absence of @code{DC_SYSTEM} in @var{exclude} has no
2070 effect, because dictionaries never include system variables.
2073 One additional function is available. This function is most often
2074 used in assertions, but it is not restricted to such use.
2076 @deftypefun bool dict_contains_var (const struct dictionary *@var{dict}, const struct variable *@var{var})
2077 Tests whether @var{var} is one of the variables in @var{dict}.
2078 Returns true if so, false otherwise.
2081 @node Dictionary Creating Variables
2082 @subsection Creating Variables
2084 These functions create a new variable and insert it into a dictionary
2087 There is no provision for inserting an already created variable into a
2088 dictionary. There is no reason that such a function could not be
2089 written, but so far there has been no need for one.
2091 The names provided to one of these functions should be valid variable
2092 names and must be plausible variable names. @c (@pxref{Variable Names}).
2094 If a variable with the same name already exists in the dictionary, the
2095 non-@code{assert} variants of these functions return a null pointer,
2096 without modifying the dictionary. The @code{assert} variants, on the
2097 other hand, assert that no duplicate name exists.
2099 A variable may be in only one dictionary at any given time.
2101 @deftypefun {struct variable *} dict_create_var (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2102 @deftypefunx {struct variable *} dict_create_var_assert (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2103 Creates a new variable with the given @var{name} and @var{width}, as
2104 if through a call to @code{var_create} with those arguments
2105 (@pxref{var_create}), appends the new variable to @var{dict}'s array
2106 of variables, and returns the new variable.
2109 @deftypefun {struct variable *} dict_clone_var (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2110 @deftypefunx {struct variable *} dict_clone_var_assert (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2111 Creates a new variable as a clone of @var{old_var}, inserts the new
2112 variable into @var{dict}, and returns the new variable. The new
2113 variable is named @var{name}. Other properties of the new variable
2114 are copied from @var{old_var}, except for those not copied by
2115 @code{var_clone} (@pxref{var_clone}).
2117 @var{old_var} does not need to be a member of any dictionary.
2120 @node Dictionary Deleting Variables
2121 @subsection Deleting Variables
2123 These functions remove variables from a dictionary's array of
2124 variables. They also destroy the removed variables and free their
2127 Deleting a variable to which there might be external pointers is a bad
2128 idea. In particular, deleting variables from the active file
2129 dictionary is a risky proposition, because transformations can retain
2130 references to arbitrary variables. Therefore, no variable should be
2131 deleted from the active file dictionary when any transformations are
2132 active, because those transformations might reference the variable to
2133 be deleted. The safest time to delete a variable is just after a
2134 procedure has been executed, as done by @cmd{DELETE VARIABLES}.
2136 Deleting a variable automatically removes references to that variable
2137 from elsewhere in the dictionary as a weighting variable, filter
2138 variable, @cmd{SPLIT FILE} variable, or member of a vector.
2140 No functions are provided for removing a variable from a dictionary
2141 without destroying that variable. As with insertion of an existing
2142 variable, there is no reason that this could not be implemented, but
2143 so far there has been no need.
2145 @deftypefun void dict_delete_var (struct dictionary *@var{dict}, struct variable *@var{var})
2146 Deletes @var{var} from @var{dict}, of which it must be a member.
2149 @deftypefun void dict_delete_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{count})
2150 Deletes the @var{count} variables in array @var{vars} from @var{dict}.
2151 All of the variables in @var{vars} must be members of @var{dict}. No
2152 variable may be included in @var{vars} more than once.
2155 @deftypefun void dict_delete_consecutive_vars (struct dictionary *@var{dict}, size_t @var{idx}, size_t @var{count})
2156 Deletes the variables in sequential positions
2157 @var{idx}@dots{}@var{idx} + @var{count} (exclusive) from @var{dict},
2158 which must contain at least @var{idx} + @var{count} variables.
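On the array side, this operation is a single @code{memmove}. The sketch shows only that step, applied to a generic array of pointers. The real function also destroys the deleted variables and updates other dictionary state, and this sketch does neither. @code{delete_consecutive} is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes elements IDX .. IDX + COUNT - 1 from the N-element array
   ARRAY of pointers, shifting later elements down, and returns the new
   element count.  Requires IDX + COUNT <= N. */
size_t
delete_consecutive (void *array[], size_t n, size_t idx, size_t count)
{
  assert (idx + count <= n);
  memmove (&array[idx], &array[idx + count],
           (n - idx - count) * sizeof *array);
  return n - count;
}
```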
2161 @deftypefun void dict_delete_scratch_vars (struct dictionary *@var{dict})
2162 Deletes all scratch variables from @var{dict}.
2165 @node Dictionary Reordering Variables
2166 @subsection Changing Variable Order
2168 The variables in a dictionary are stored in an array. These functions
2169 change the order of a dictionary's array of variables without changing
2170 which variables are in the dictionary.
2172 @anchor{dict_reorder_var}
2173 @deftypefun void dict_reorder_var (struct dictionary *@var{dict}, struct variable *@var{var}, size_t @var{new_index})
2174 Moves @var{var}, which must be in @var{dict}, so that it is at
2175 position @var{new_index} in @var{dict}'s array of variables. Other
2176 variables in @var{dict}, if any, retain their relative positions.
2177 @var{new_index} must be less than the number of variables in
2181 @deftypefun void dict_reorder_vars (struct dictionary *@var{dict}, struct variable *const *@var{new_order}, size_t @var{count})
2182 Moves the @var{count} variables in @var{new_order} to the beginning of
2183 @var{dict}'s array of variables in the specified order. Other
2184 variables in @var{dict}, if any, retain their relative positions.
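One way to produce this order is to build the new array in a temporary buffer. The buffer holds the variables from @var{new_order} first, followed by every other variable in its original order. This sketch operates on a generic pointer array and is not PSPP's code. @code{reorder_to_front} is a hypothetical helper. Its assertions enforce the membership and no-duplicate rules that apply to @code{dict_reorder_vars}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Moves the COUNT pointers in NEW_ORDER, each of which must appear
   exactly once in ARRAY[0..N-1], to the front of ARRAY in the given
   order; the remaining elements keep their relative order.
   Returns false on allocation failure. */
bool
reorder_to_front (void *array[], size_t n,
                  void *const new_order[], size_t count)
{
  assert (count <= n);
  void **tmp = malloc ((n > 0 ? n : 1) * sizeof *tmp);
  if (tmp == NULL)
    return false;

  memcpy (tmp, new_order, count * sizeof *tmp);
  size_t k = count;
  for (size_t i = 0; i < n; i++)
    {
      bool moved = false;
      for (size_t j = 0; j < count; j++)
        if (array[i] == new_order[j])
          moved = true;
      if (!moved)
        {
          assert (k < n);       /* Fails on duplicates or non-members. */
          tmp[k++] = array[i];
        }
    }
  assert (k == n);
  memcpy (array, tmp, n * sizeof *tmp);
  free (tmp);
  return true;
}
```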
2186 All of the variables in @var{new_order} must be in @var{dict}. No
2187 duplicates are allowed within @var{new_order}, which means that
2188 @var{count} must be no greater than the number of variables in
2192 @node Dictionary Renaming Variables
2193 @subsection Renaming Variables
2195 These functions change the names of variables within a dictionary.
2196 The @func{var_set_name} function (@pxref{var_set_name}) cannot be
2197 applied directly to a variable that is in a dictionary, because
2198 @struct{dictionary} contains an index by name that @func{var_set_name}
2199 would not update. The following functions take care to update the
2200 index as well. They also ensure that variable renaming does not cause
2201 a dictionary to contain a duplicate variable name.
2203 @deftypefun void dict_rename_var (struct dictionary *@var{dict}, struct variable *@var{var}, const char *@var{new_name})
2204 Changes the name of @var{var}, which must be in @var{dict}, to
2205 @var{new_name}. A variable named @var{new_name} must not already be
2206 in @var{dict}, unless @var{new_name} is the same as @var{var}'s
2210 @deftypefun bool dict_rename_vars (struct dictionary *@var{dict}, struct variable **@var{vars}, char **@var{new_names}, size_t @var{count}, char **@var{err_name})
2211 Renames each of the @var{count} variables in @var{vars} to the name in
2212 the corresponding position of @var{new_names}. If the renaming would
2213 result in a duplicate variable name, returns false and stores one of
2214 the names that would be duplicated into @code{*@var{err_name}}, if
2215 @var{err_name} is non-null. Otherwise, the renaming is successful,
2216 and true is returned.
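Renaming all of the variables at once, instead of one at a time, matters when names are exchanged. For example, renaming @samp{a} to @samp{b} first would collide with the existing @samp{b}. The sketch below computes the complete set of resulting names before it commits any of them. It is not PSPP's implementation. @code{rename_all}, its index-based interface, and its fixed limit of 64 entries are assumptions that keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool
names_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (toupper ((unsigned char) *a) != toupper ((unsigned char) *b))
      return false;
  return *a == *b;
}

/* All-or-nothing rename over NAMES[0..N-1]: entry IDX[i] becomes
   NEW_NAMES[i] for i < COUNT.  If the result would contain a duplicate
   (compared case-insensitively), nothing changes, *ERR_NAME (if
   non-null) receives one offending name, and false is returned.
   Assumes N <= 64 and distinct indexes in IDX. */
bool
rename_all (const char *names[], size_t n, const size_t idx[],
            const char *const new_names[], size_t count,
            const char **err_name)
{
  const char *result[64];
  assert (n <= 64);
  memcpy (result, names, n * sizeof *result);
  for (size_t i = 0; i < count; i++)
    result[idx[i]] = new_names[i];

  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (names_equal (result[i], result[j]))
        {
          if (err_name != NULL)
            *err_name = result[i];
          return false;
        }

  memcpy (names, result, n * sizeof *result);   /* Commit. */
  return true;
}
```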
2219 @node Dictionary Weight Variable
2220 @subsection Weight Variable
2222 A data set's cases may optionally be weighted by the value of a
2223 numeric variable. @xref{WEIGHT,,,pspp, PSPP Users Guide}, for a user
2224 view of weight variables.
2226 The weight variable is written to and read from system and portable
2229 The most commonly useful function related to weighting is a
2230 convenience function to retrieve a weighting value from a case.
2232 @deftypefun double dict_get_case_weight (const struct dictionary *@var{dict}, const struct ccase *@var{case}, bool *@var{warn_on_invalid})
2233 Retrieves and returns the value of the weighting variable specified by
2234 @var{dict} from @var{case}. Returns 1.0 if @var{dict} has no
2237 Returns 0.0 if @var{case}'s weight value is user- or system-missing,
2238 zero, or negative. In such a case, if @var{warn_on_invalid} is
2239 non-null and @code{*@var{warn_on_invalid}} is true,
2240 @func{dict_get_case_weight} also issues an error message and sets
2241 @code{*@var{warn_on_invalid}} to false. To disable error reporting,
2242 pass a null pointer or a pointer to false as @var{warn_on_invalid} or
2243 use a @func{msg_disable}/@func{msg_enable} pair.
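The decision logic can be written as a pure function of the raw weight. This illustration is not PSPP's code. @code{case_weight} is hypothetical, @var{is_user_missing} stands in for the user-missing test, the message text is invented, and @code{SYSMIS} is given a stand-in value.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdio.h>

#define SYSMIS (-DBL_MAX)   /* Stand-in value for this sketch. */

/* Maps a raw weight W to the weight used in analysis.  HAS_WEIGHT is
   false when the dictionary has no weight variable. */
double
case_weight (bool has_weight, double w, bool is_user_missing,
             bool *warn_on_invalid)
{
  if (!has_weight)
    return 1.0;
  if (w == SYSMIS || is_user_missing || w <= 0.0)
    {
      if (warn_on_invalid != NULL && *warn_on_invalid)
        {
          fprintf (stderr, "error: at least one case has an invalid weight\n");
          *warn_on_invalid = false;   /* Report only once. */
        }
      return 0.0;
    }
  return w;
}
```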
2246 The dictionary also has a pair of functions for getting and setting
2247 the weight variable.
2249 @deftypefun {struct variable *} dict_get_weight (const struct dictionary *@var{dict})
2250 Returns @var{dict}'s current weighting variable, or a null pointer if
2251 the dictionary does not have a weighting variable.
2254 @deftypefun void dict_set_weight (struct dictionary *@var{dict}, struct variable *@var{var})
2255 Sets @var{dict}'s weighting variable to @var{var}. If @var{var} is
2256 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2257 is null, then @var{dict}'s weighting variable, if any, is cleared.
2260 @node Dictionary Filter Variable
2261 @subsection Filter Variable
2263 When the active file is read by a procedure, cases can be excluded
2264 from analysis based on the values of a @dfn{filter variable}.
2265 @xref{FILTER,,,pspp, PSPP Users Guide}, for a user view of filtering.
2267 These functions store and retrieve the filter variable. They are
2268 rarely useful, because the data analysis framework automatically
2269 excludes from analysis the cases that should be filtered.
2271 @deftypefun {struct variable *} dict_get_filter (const struct dictionary *@var{dict})
2272 Returns @var{dict}'s current filter variable, or a null pointer if the
2273 dictionary does not have a filter variable.
2276 @deftypefun void dict_set_filter (struct dictionary *@var{dict}, struct variable *@var{var})
2277 Sets @var{dict}'s filter variable to @var{var}. If @var{var} is
2278 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2279 is null, then @var{dict}'s filter variable, if any, is cleared.
@node Dictionary Case Limit
@subsection Case Limit
The limit on cases analyzed by a procedure, set by the @cmd{N OF
CASES} command (@pxref{N OF CASES,,,pspp, PSPP Users Guide}), is
stored as part of the dictionary. The dictionary does not, however,
play any role in enforcing the case limit; that job is done by the
data analysis framework.
A case limit of 0 means that the number of cases is not limited.
These functions are rarely useful, because the data analysis framework
automatically excludes from analysis any cases beyond the limit.
@deftypefun casenumber dict_get_case_limit (const struct dictionary *@var{dict})
Returns the current case limit for @var{dict}.
@deftypefun void dict_set_case_limit (struct dictionary *@var{dict}, casenumber @var{limit})
Sets @var{dict}'s case limit to @var{limit}.
@node Dictionary Split Variables
@subsection Split Variables
The user may use the @cmd{SPLIT FILE} command (@pxref{SPLIT
FILE,,,pspp, PSPP Users Guide}) to select a set of variables on which
to split the active file into groups of cases to be analyzed
independently in each statistical procedure. The set of split
variables is stored as part of the dictionary, although the effect on
data analysis is implemented by each individual statistical procedure.
Split variables may be numeric or short or long string variables.
The most useful functions for split variables are those to retrieve
them. Even these functions are rarely useful directly: for the
purpose of breaking cases into groups based on the values of the split
variables, it is usually easier to use
@func{casegrouper_create_splits}.
@deftypefun {const struct variable *const *} dict_get_split_vars (const struct dictionary *@var{dict})
Returns a pointer to an array of pointers to split variables. If and
only if there are no split variables, returns a null pointer. The
caller must not modify or free the returned array.
@deftypefun size_t dict_get_split_cnt (const struct dictionary *@var{dict})
Returns the number of split variables.
The following functions are also available for working with split
variables.
@deftypefun void dict_set_split_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{cnt})
Sets @var{dict}'s split variables to the @var{cnt} variables in
@var{vars}. If @var{cnt} is 0, then @var{dict} will not have any
split variables. The caller retains ownership of @var{vars}.
@deftypefun void dict_unset_split_var (struct dictionary *@var{dict}, struct variable *@var{var})
Removes @var{var}, which must be a variable in @var{dict}, from
@var{dict}'s set of split variables.
@node Dictionary File Label
@subsection File Label
A dictionary may optionally have an associated string that describes
its contents, called its file label. The user may set the file label
with the @cmd{FILE LABEL} command (@pxref{FILE LABEL,,,pspp, PSPP
Users Guide}).
These functions set and retrieve the file label.
@deftypefun {const char *} dict_get_label (const struct dictionary *@var{dict})
Returns @var{dict}'s file label. If @var{dict} does not have a label,
returns a null pointer.
@deftypefun void dict_set_label (struct dictionary *@var{dict}, const char *@var{label})
Sets @var{dict}'s label to @var{label}. If @var{label} is non-null,
then its content, truncated to at most 60 bytes, becomes the new file
label. If @var{label} is null, then @var{dict}'s label is removed.
The caller retains ownership of @var{label}.
@node Dictionary Documents
@subsection Documents
A dictionary may include an arbitrary number of lines of explanatory
text, called the dictionary's documents. For compatibility, document
lines have a fixed width, and lines that are not exactly this width
are truncated or padded with spaces as necessary to bring them to the
correct width.
PSPP users can use the @cmd{DOCUMENT} (@pxref{DOCUMENT,,,pspp, PSPP
Users Guide}), @cmd{ADD DOCUMENT} (@pxref{ADD DOCUMENT,,,pspp, PSPP
Users Guide}), and @cmd{DROP DOCUMENTS} (@pxref{DROP DOCUMENTS,,,pspp,
PSPP Users Guide}) commands to manipulate documents.
@deftypefn Macro int DOC_LINE_LENGTH
The fixed length of a document line, in bytes, defined as 80.
The following functions work with whole sets of documents. They
accept or return sets of documents formatted as null-terminated
strings that are an exact multiple of @code{DOC_LINE_LENGTH}
bytes in length.
@deftypefun {const char *} dict_get_documents (const struct dictionary *@var{dict})
Returns the documents in @var{dict}, or a null pointer if @var{dict}
has no documents.
@deftypefun void dict_set_documents (struct dictionary *@var{dict}, const char *@var{new_documents})
Sets @var{dict}'s documents to @var{new_documents}. If
@var{new_documents} is a null pointer or an empty string, then
@var{dict}'s documents are cleared. The caller retains ownership of
@var{new_documents}.
@deftypefun void dict_clear_documents (struct dictionary *@var{dict})
Clears the documents from @var{dict}.
The following functions work with individual lines in a dictionary's
documents.
@deftypefun void dict_add_document_line (struct dictionary *@var{dict}, const char *@var{content})
Appends @var{content} to the documents in @var{dict}. The text in
@var{content} will be truncated or padded with spaces as necessary to
make it exactly @code{DOC_LINE_LENGTH} bytes long. The caller retains
ownership of @var{content}.
If @var{content} is longer than @code{DOC_LINE_LENGTH} bytes, this
function also issues a warning using @func{msg}. To suppress the
warning, enclose a call to this function in a
@func{msg_disable}/@func{msg_enable} pair.
@deftypefun size_t dict_get_document_line_cnt (const struct dictionary *@var{dict})
Returns the number of lines of documents in @var{dict}. If the
dictionary contains no documents, returns 0.
@deftypefun void dict_get_document_line (const struct dictionary *@var{dict}, size_t @var{idx}, struct string *@var{content})
Replaces the text in @var{content} (which must already have been
initialized by the caller) with the document line in @var{dict}
numbered @var{idx}, which must be less than the number of lines of
documents in @var{dict}. Any trailing white space in the document
line is trimmed, so that @var{content} will have a length between 0
and @code{DOC_LINE_LENGTH}.
@node Coding Conventions
@section Coding Conventions
Every @file{.c} file should have @samp{#include <config.h>} as its
first non-comment line. No @file{.h} file should include
@file{config.h}.
This section needs to be finished.
This section needs to be written.
This section needs to be written.
This section needs to be written.