1 @c PSPP - a program for statistical analysis.
2 @c Copyright (C) 2019 Free Software Foundation, Inc.
3 @c Permission is granted to copy, distribute and/or modify this document
4 @c under the terms of the GNU Free Documentation License, Version 1.3
5 @c or any later version published by the Free Software Foundation;
6 @c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
7 @c A copy of the license is included in the section entitled "GNU
8 @c Free Documentation License".
12 @chapter Basic Concepts
14 This chapter introduces basic data structures and other concepts
15 needed for developing in PSPP.
19 * Input and Output Formats::
20 * User-Missing Values::
24 * Coding Conventions::
34 The unit of data in PSPP is a @dfn{value}.
40 Values are classified by @dfn{type} and @dfn{width}. The
41 type of a value is either @dfn{numeric} or @dfn{string} (sometimes
42 called alphanumeric). The width of a string value ranges from 1 to
43 @code{MAX_STRING} bytes. The width of a numeric value is artificially
defined to be 0; thus, the type of a value can be inferred from its
width.
47 Some support is provided for working with value types and widths, in
48 @file{data/val-type.h}:
50 @deftypefn Macro int MAX_STRING
51 Maximum width of a string value, in bytes, currently 32,767.
54 @deftypefun bool val_type_is_valid (enum val_type @var{val_type})
55 Returns true if @var{val_type} is a valid value type, that is,
56 either @code{VAL_NUMERIC} or @code{VAL_STRING}. Useful for
60 @deftypefun {enum val_type} val_type_from_width (int @var{width})
61 Returns @code{VAL_NUMERIC} if @var{width} is 0 and thus represents the
62 width of a numeric value, otherwise @code{VAL_STRING} to indicate that
63 @var{width} is the width of a string value.
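
The width convention above can be illustrated with a minimal, self-contained sketch.  The @code{sketch_}-prefixed names are hypothetical stand-ins written for this example, not the declarations in @file{data/val-type.h}; only the rule itself (width 0 means numeric, 1 through @code{MAX_STRING} means string) comes from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the declarations in data/val-type.h. */
#define SKETCH_MAX_STRING 32767

enum sketch_val_type
  {
    SKETCH_VAL_NUMERIC,
    SKETCH_VAL_STRING
  };

/* Returns true if WIDTH is a legal value width: 0 for a numeric
   value, 1 through SKETCH_MAX_STRING for a string value. */
static bool
sketch_width_is_valid (int width)
{
  return width >= 0 && width <= SKETCH_MAX_STRING;
}

/* Infers the value type from WIDTH alone. */
static enum sketch_val_type
sketch_val_type_from_width (int width)
{
  assert (sketch_width_is_valid (width));
  return width == 0 ? SKETCH_VAL_NUMERIC : SKETCH_VAL_STRING;
}
```

Because the type is implied by the width, most of the functions in this chapter take only a width argument rather than a type and a width.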
The following subsections describe how values of each type are
represented.
72 * Runtime Typed Values::
76 @subsection Numeric Values
78 A value known to be numeric at compile time is represented as a
79 @code{double}. PSPP provides three values of @code{double} for
80 special purposes, defined in @file{data/val-type.h}:
82 @deftypefn Macro double SYSMIS
83 The @dfn{system-missing value}, used to represent a datum whose true
84 value is unknown, such as a survey question that was not answered by
85 the respondent, or undefined, such as the result of division by zero.
86 PSPP propagates the system-missing value through calculations and
87 compensates for missing values in statistical analyses. @xref{Missing
Observations,,,pspp, PSPP Users Guide}, for a PSPP user's view of
missing values.
PSPP currently defines @code{SYSMIS} as @code{-DBL_MAX}, that is, the
most negative finite value of @code{double}.  It is best not to
93 depend on this definition, because PSPP may transition to using an
94 IEEE NaN (not a number) instead at some point in the future.
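
One way to avoid depending on the definition is to route every test for the system-missing value through a single helper.  The following is a minimal sketch under that assumption: @code{SKETCH_SYSMIS} and the @code{sketch_} functions are hypothetical stand-ins for this example, not PSPP functions.  If the representation changed to a NaN, only the helper would need to change, since a NaN never compares equal to anything.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in that mirrors the current definition. */
#define SKETCH_SYSMIS (-DBL_MAX)

/* The only place that knows how system-missing is represented. */
static bool
sketch_is_sysmis (double x)
{
  return x == SKETCH_SYSMIS;
}

/* Propagation: the quotient is system-missing if either operand
   is missing or the divisor is zero. */
static double
sketch_divide (double a, double b)
{
  if (sketch_is_sysmis (a) || sketch_is_sysmis (b) || b == 0.0)
    return SKETCH_SYSMIS;
  return a / b;
}

/* Compensation: the mean skips missing values, and is itself
   system-missing if no valid values remain. */
static double
sketch_mean (const double *x, size_t n)
{
  assert (x != NULL || n == 0);

  double sum = 0.0;
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (!sketch_is_sysmis (x[i]))
      {
        sum += x[i];
        count++;
      }
  return count > 0 ? sum / count : SKETCH_SYSMIS;
}
```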
97 @deftypefn Macro double LOWEST
98 @deftypefnx Macro double HIGHEST
The most negative finite value (other than @code{SYSMIS}) and the most
positive finite value of @code{double}, respectively.  These values do
not ordinarily
101 appear in user data files. Instead, they are used to implement
102 endpoints of open-ended ranges that are occasionally permitted in PSPP
103 syntax, e.g.@: @code{5 THRU HI} as a range of missing values
104 (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}).
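
As an illustration, an open-ended range such as @code{5 THRU HI} can be stored as an ordinary closed range whose upper endpoint is the sentinel.  This is a minimal sketch, not PSPP code: the @code{sketch_} names are hypothetical, @code{SKETCH_HIGHEST} assumes the largest finite @code{double}, and @code{SKETCH_SYSMIS} assumes the current @code{-DBL_MAX} definition.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the macros described above. */
#define SKETCH_SYSMIS (-DBL_MAX)
#define SKETCH_HIGHEST DBL_MAX

/* A closed range of numeric values. */
struct sketch_range
  {
    double low;
    double high;
  };

/* Returns true if X lies within R.  The system-missing value is
   never part of a range. */
static bool
sketch_in_range (const struct sketch_range *r, double x)
{
  assert (r->low <= r->high);
  return x != SKETCH_SYSMIS && x >= r->low && x <= r->high;
}

/* Models the syntax "5 THRU HI": the open upper end becomes the
   sentinel endpoint. */
static bool
sketch_is_5_thru_hi (double x)
{
  const struct sketch_range r = { 5.0, SKETCH_HIGHEST };
  return sketch_in_range (&r, x);
}
```

With this representation no separate flag is needed to mark a range as open-ended.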
108 @subsection String Values
110 A value known at compile time to have string type is represented as an
111 array of @code{char}. String values do not necessarily represent
112 readable text strings and may contain arbitrary 8-bit data, including
113 null bytes, control codes, and bytes with the high bit set. Thus,
string values are not null-terminated strings, but rather opaque
arrays of bytes.
117 @code{SYSMIS}, @code{LOWEST}, and @code{HIGHEST} have no equivalents
as string values.  Usually, PSPP fills an unknown or undefined string
value with spaces, but PSPP does not treat such a string as a special
case when it processes it later.
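
Since string values are fixed-width and not null-terminated, code that handles them works with explicit widths and uses @code{memcpy}, @code{memset}, and @code{memcmp} rather than the @code{str*} functions.  The helpers below are a hypothetical sketch of that style, written for this example and not taken from PSPP.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stores the SRC_LEN bytes at SRC into the WIDTH-byte string value
   at DST, truncating on the right or padding on the right with
   spaces as needed.  No null terminator is written.  Returns DST. */
static char *
sketch_set_string (char *dst, size_t width, const char *src, size_t src_len)
{
  assert (dst != NULL && src != NULL);

  size_t n = src_len < width ? src_len : width;
  memcpy (dst, src, n);
  memset (dst + n, ' ', width - n);
  return dst;
}

/* Returns true if the WIDTH-byte string value at S consists only
   of spaces, which is how an unknown string value is usually
   stored.  A null byte counts as data, not as a terminator. */
static bool
sketch_is_all_spaces (const char *s, size_t width)
{
  for (size_t i = 0; i < width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}
```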
123 @code{MAX_STRING}, the maximum length of a string value, is defined in
124 @file{data/val-type.h}.
126 @node Runtime Typed Values
127 @subsection Runtime Typed Values
129 When a value's type is only known at runtime, it is often represented
130 as a @union{value}, defined in @file{data/value.h}. A @union{value}
does not identify the type or width of the data it contains.  Code
that works with @union{value}s must therefore have external knowledge
of their content, often through the type and width of a
@struct{variable} (@pxref{Variables}).
136 @union{value} has one member that clients are permitted to access
137 directly, a @code{double} named @samp{f} that stores the content of a
numeric @union{value}.  It has other members that store the content of a
string @union{value}, but client code should use accessor functions
140 instead of referring to these directly.
142 PSPP provides some functions for working with @union{value}s. The
143 most useful are described below. To use these functions, recall that
144 a numeric value has a width of 0.
146 @deftypefun void value_init (union value *@var{value}, int @var{width})
147 Initializes @var{value} as a value of the given @var{width}. After
148 initialization, the data in @var{value} are indeterminate; the caller
149 is responsible for storing initial data in it.
152 @deftypefun void value_destroy (union value *@var{value}, int @var{width})
153 Frees auxiliary storage associated with @var{value}, which must have
154 the given @var{width}.
157 @deftypefun bool value_needs_init (int @var{width})
158 For some widths, @func{value_init} and @func{value_destroy} do not
159 actually do anything, because no additional storage is needed beyond
the size of @union{value}.  This function returns false if @var{width}
is such a width, in which case there is no actual need to call those
functions.  This can be a useful optimization if a large number of
@union{value}s of such a width are to be initialized or destroyed.

This function returns true if @func{value_init} and
@func{value_destroy} are actually required for the given @var{width}.
@deftypefun void value_copy (union value *@var{dst}, @
                             const union value *@var{src}, @
                             int @var{width})
Copies the contents of @union{value} @var{src} to @var{dst}.  Both
@var{dst} and @var{src} must have been initialized with the specified
@var{width}.
177 @deftypefun void value_set_missing (union value *@var{value}, int @var{width})
178 Sets @var{value} to @code{SYSMIS} if it is numeric or to all spaces if
179 it is alphanumeric, according to @var{width}. @var{value} must have
180 been initialized with the specified @var{width}.
183 @anchor{value_is_resizable}
184 @deftypefun bool value_is_resizable (const union value *@var{value}, int @var{old_width}, int @var{new_width})
185 Determines whether @var{value}, which must have been initialized with
186 the specified @var{old_width}, may be resized to @var{new_width}.
187 Resizing is possible if the following criteria are met. First,
188 @var{old_width} and @var{new_width} must be both numeric or both
189 string widths. Second, if @var{new_width} is a short string width and
190 less than @var{old_width}, resizing is allowed only if bytes
@var{new_width} through @var{old_width} in @var{value} contain only
spaces.
194 These rules are part of those used by @func{mv_is_resizable} and
195 @func{val_labs_can_set_width}.
198 @deftypefun void value_resize (union value *@var{value}, int @var{old_width}, int @var{new_width})
199 Resizes @var{value} from @var{old_width} to @var{new_width}, which
200 must be allowed by the rules stated above. @var{value} must have been
201 initialized with the specified @var{old_width} before calling this
202 function. After resizing, @var{value} has width @var{new_width}.
204 If @var{new_width} is greater than @var{old_width}, @var{value} will
205 be padded on the right with spaces to the new width. If
206 @var{new_width} is less than @var{old_width}, the rightmost bytes of
207 @var{value} are truncated.
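
The resizing rules can be modeled on a plain byte buffer.  The sketch below is a simplification under stated assumptions: it operates on a @code{char} array rather than a @union{value}, its @code{sketch_} names are hypothetical, and it applies the only-spaces-are-discarded check to every narrowing, without distinguishing short string widths.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the OLD_WIDTH-byte content at S may be resized
   to NEW_WIDTH.  Width 0 stands for a numeric value, for which S
   is not examined. */
static bool
sketch_is_resizable (const char *s, int old_width, int new_width)
{
  /* Numeric and string widths never convert into each other. */
  if ((old_width == 0) != (new_width == 0))
    return false;

  /* Narrowing may only discard spaces. */
  for (int i = new_width; i < old_width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}

/* Resizes the string at S in place and returns S.  The buffer must
   have room for the larger of the two widths.  Widening pads on
   the right with spaces; narrowing simply drops the tail. */
static char *
sketch_resize (char *s, int old_width, int new_width)
{
  assert (sketch_is_resizable (s, old_width, new_width));
  if (new_width > old_width)
    memset (s + old_width, ' ', new_width - old_width);
  return s;
}
```

Checking before resizing guarantees that a narrowing never silently loses data.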
210 @deftypefun bool value_equal (const union value *@var{a}, const union value *@var{b}, int @var{width})
Compares @var{a} and @var{b}, which must both have width
@var{width}.  Returns true if their contents are the same, false if
they differ.
216 @deftypefun int value_compare_3way (const union value *@var{a}, const union value *@var{b}, int @var{width})
Compares @var{a} and @var{b}, which must both have width
218 @var{width}. Returns -1 if @var{a} is less than @var{b}, 0 if they
219 are equal, or 1 if @var{a} is greater than @var{b}.
221 Numeric values are compared numerically, with @code{SYSMIS} comparing
222 less than any real number. String values are compared
223 lexicographically byte-by-byte.
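
The two comparison rules can be sketched separately for each type.  The functions below are hypothetical illustrations, not the PSPP implementation; they assume the current definition of @code{SYSMIS} as @code{-DBL_MAX}, under which an ordinary numeric comparison already orders the system-missing value below every other number.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

/* Hypothetical stand-in that mirrors the current definition. */
#define SKETCH_SYSMIS (-DBL_MAX)

/* Three-way numeric comparison: -1, 0, or 1. */
static int
sketch_compare_num (double a, double b)
{
  return a < b ? -1 : a > b;
}

/* Three-way comparison of two WIDTH-byte string values, byte by
   byte.  memcmp() compares bytes as unsigned char, so null bytes
   and bytes with the high bit set are ordered consistently. */
static int
sketch_compare_str (const char *a, const char *b, int width)
{
  assert (width > 0);

  int cmp = memcmp (a, b, width);
  return cmp < 0 ? -1 : cmp > 0;
}
```

The result of @code{memcmp} is normalized because the C standard only guarantees its sign, not its magnitude.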
226 @deftypefun size_t value_hash (const union value *@var{value}, int @var{width}, unsigned int @var{basis})
227 Computes and returns a hash of @var{value}, which must have the
specified @var{width}.  The value in @var{basis} is folded into the
hash.
232 @node Input and Output Formats
233 @section Input and Output Formats
235 Input and output formats specify how to convert data fields to and
236 from data values (@pxref{Input and Output Formats,,,pspp, PSPP Users
Guide}).  PSPP uses @struct{fmt_spec} to represent input and output
formats.
240 Function prototypes and other declarations related to formats are in
241 the @file{<data/format.h>} header.
243 @deftp {Structure} {struct fmt_spec}
244 An input or output format, with the following members:
247 @item enum fmt_type type
248 The format type (see below).
@item int w
Field width, in bytes.  The width of numeric fields is always between
252 1 and 40 bytes, and the width of string fields is always between 1 and
253 65534 bytes. However, many individual types of formats place stricter
254 limits on field width (see @ref{fmt_max_input_width},
255 @ref{fmt_max_output_width}).
@item int d
Number of decimal places, in character positions.  For format types
259 that do not allow decimal places to be specified, this value must be
260 0. Format types that do allow decimal places have type-specific and
261 often width-specific restrictions on @code{d} (see
262 @ref{fmt_max_input_decimals}, @ref{fmt_max_output_decimals}).
266 @deftp {Enumeration} {enum fmt_type}
267 An enumerated type representing an input or output format type. Each
268 PSPP input and output format has a corresponding enumeration constant
prefixed by @samp{FMT_}: @code{FMT_F}, @code{FMT_COMMA},
270 @code{FMT_DOT}, and so on.
273 The following sections describe functions for manipulating formats and
274 the data in fields represented by formats.
277 * Constructing and Verifying Formats::
278 * Format Utility Functions::
279 * Obtaining Properties of Format Types::
280 * Numeric Formatting Styles::
281 * Formatted Data Input and Output::
284 @node Constructing and Verifying Formats
285 @subsection Constructing and Verifying Formats
These functions construct @struct{fmt_spec}s and verify that they are
valid.
292 @deftypefun {struct fmt_spec} fmt_for_input (enum fmt_type @var{type}, int @var{w}, int @var{d})
293 @deftypefunx {struct fmt_spec} fmt_for_output (enum fmt_type @var{type}, int @var{w}, int @var{d})
294 Constructs a @struct{fmt_spec} with the given @var{type}, @var{w}, and
@var{d}, asserts that the result is a valid input (or output) format,
and returns it.
299 @anchor{fmt_for_output_from_input}
300 @deftypefun {struct fmt_spec} fmt_for_output_from_input (const struct fmt_spec *@var{input})
301 Given @var{input}, which must be a valid input format, returns the
302 equivalent output format. @xref{Input and Output Formats,,,pspp, PSPP
Users Guide}, for the rules for converting input formats into output
formats.
307 @deftypefun {struct fmt_spec} fmt_default_for_width (int @var{width})
308 Returns the default output format for a variable of the given
309 @var{width}. For a numeric variable, this is F8.2 format; for a
310 string variable, it is the A format of the given @var{width}.
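
The default rule is small enough to sketch in full.  The types below are hypothetical stand-ins for @struct{fmt_spec} and @enum{fmt_type}, reduced to the two format types the rule needs; this is an illustration, not the PSPP implementation.

```c
#include <assert.h>

/* Hypothetical, reduced stand-ins for enum fmt_type and
   struct fmt_spec. */
enum sketch_fmt_type
  {
    SKETCH_FMT_F,
    SKETCH_FMT_A
  };

struct sketch_fmt_spec
  {
    enum sketch_fmt_type type;
    int w;                      /* Field width. */
    int d;                      /* Decimal places. */
  };

/* Returns F8.2 for a numeric width (0), or an A format as wide as
   the value for a string width. */
static struct sketch_fmt_spec
sketch_default_for_width (int width)
{
  assert (width >= 0);

  struct sketch_fmt_spec f;
  if (width == 0)
    {
      f.type = SKETCH_FMT_F;
      f.w = 8;
      f.d = 2;
    }
  else
    {
      f.type = SKETCH_FMT_A;
      f.w = width;
      f.d = 0;
    }
  return f;
}
```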
313 The following functions check whether a @struct{fmt_spec} is valid for
314 various uses and return true if so, false otherwise. When any of them
315 returns false, it also outputs an explanatory error message using
@func{msg}.  To suppress error output, enclose a call to one of these
functions in a @func{msg_disable}/@func{msg_enable} pair.
319 @deftypefun bool fmt_check (const struct fmt_spec *@var{format}, bool @var{for_input})
320 @deftypefunx bool fmt_check_input (const struct fmt_spec *@var{format})
321 @deftypefunx bool fmt_check_output (const struct fmt_spec *@var{format})
322 Checks whether @var{format} is a valid input format (for
323 @func{fmt_check_input}, or @func{fmt_check} if @var{for_input}) or
output format (for @func{fmt_check_output}, or @func{fmt_check} if not
@var{for_input}).
328 @deftypefun bool fmt_check_type_compat (const struct fmt_spec *@var{format}, enum val_type @var{type})
329 Checks whether @var{format} matches the value type @var{type}, that
330 is, if @var{type} is @code{VAL_NUMERIC} and @var{format} is a numeric
format or @var{type} is @code{VAL_STRING} and @var{format} is a string
format.
335 @deftypefun bool fmt_check_width_compat (const struct fmt_spec *@var{format}, int @var{width})
336 Checks whether @var{format} may be used as an output format for a
337 value of the given @var{width}.
@func{fmt_var_width}, described in
the following section, can also be used to determine the value
width needed by a format.
344 @node Format Utility Functions
345 @subsection Format Utility Functions
347 These functions work with @struct{fmt_spec}s.
349 @deftypefun int fmt_var_width (const struct fmt_spec *@var{format})
350 Returns the width for values associated with @var{format}. If
351 @var{format} is a numeric format, the width is 0; if @var{format} is
an A format, then the width is @code{@var{format}->w}; otherwise,
@var{format} is an AHEX format and its width is @code{@var{format}->w
/ 2}.
357 @deftypefun char *fmt_to_string (const struct fmt_spec *@var{format}, char @var{s}[FMT_STRING_LEN_MAX + 1])
358 Converts @var{format} to a human-readable format specifier in @var{s}
359 and returns @var{s}. @var{format} need not be a valid input or output
360 format specifier, e.g.@: it is allowed to have an excess width or
361 decimal places. In particular, if @var{format} has decimals, they are
362 included in the output string, even if @var{format}'s type does not
allow decimals, to allow accurately presenting incorrect formats to
the user.
367 @deftypefun bool fmt_equal (const struct fmt_spec *@var{a}, const struct fmt_spec *@var{b})
368 Compares @var{a} and @var{b} memberwise and returns true if they are
identical, false otherwise.  @var{a} and @var{b} need not be valid input or
output format specifiers.
373 @deftypefun void fmt_resize (struct fmt_spec *@var{fmt}, int @var{width})
Sets the width of @var{fmt} so that it is a valid format for a @union{value} of width @var{width}.
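
The relationship between a format's field width and the width of the value it describes, as returned by @func{fmt_var_width}, can be sketched as follows.  The types are hypothetical stand-ins reduced to three format types.  The sketch assumes that an AHEX field uses two hexadecimal digits per byte of data, so its field width is even and twice the value width.

```c
#include <assert.h>

/* Hypothetical, reduced stand-ins for enum fmt_type and
   struct fmt_spec. */
enum sketch_fmt_type
  {
    SKETCH_FMT_F,
    SKETCH_FMT_A,
    SKETCH_FMT_AHEX
  };

struct sketch_fmt_spec
  {
    enum sketch_fmt_type type;
    int w;
    int d;
  };

/* Returns the width of values associated with FORMAT: 0 for a
   numeric format, w for A, and w / 2 for AHEX. */
static int
sketch_var_width (const struct sketch_fmt_spec *format)
{
  switch (format->type)
    {
    case SKETCH_FMT_A:
      return format->w;

    case SKETCH_FMT_AHEX:
      /* Two hex digits encode one byte. */
      assert (format->w % 2 == 0);
      return format->w / 2;

    default:
      return 0;
    }
}
```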
377 @node Obtaining Properties of Format Types
378 @subsection Obtaining Properties of Format Types
380 These functions work with @enum{fmt_type}s instead of the higher-level
381 @struct{fmt_spec}s. Their primary purpose is to report properties of
382 each possible format type, which in turn allows clients to abstract
away many of the details of the very heterogeneous requirements of
each format type.
386 The first group of functions works with format type names.
@deftypefun {const char *} fmt_name (enum fmt_type @var{type})
Returns the name for the given @var{type}, e.g.@: @code{"COMMA"} for
@code{FMT_COMMA}.
393 @deftypefun bool fmt_from_name (const char *@var{name}, enum fmt_type *@var{type})
394 Tries to find the @enum{fmt_type} associated with @var{name}. If
395 successful, sets @code{*@var{type}} to the type and returns true;
396 otherwise, returns false without modifying @code{*@var{type}}.
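
A table-driven sketch shows how a name lookup with this contract might look, and why leaving @code{*@var{type}} unmodified on failure is convenient.  Everything here is hypothetical: the three-entry table, the @code{sketch_} names, and the exact-match comparison are assumptions made for the example, and the sketch does not attempt to reproduce how PSPP matches names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for enum fmt_type. */
enum sketch_fmt_type
  {
    SKETCH_FMT_F,
    SKETCH_FMT_COMMA,
    SKETCH_FMT_DOT
  };

/* Names indexed by enum sketch_fmt_type. */
static const char *const sketch_names[] = { "F", "COMMA", "DOT" };

#define SKETCH_N_NAMES (sizeof sketch_names / sizeof *sketch_names)

/* Returns the name of TYPE. */
static const char *
sketch_fmt_name (enum sketch_fmt_type type)
{
  assert ((size_t) type < SKETCH_N_NAMES);
  return sketch_names[type];
}

/* Looks up NAME.  On success, stores the type in *TYPE and returns
   true; otherwise returns false and leaves *TYPE alone. */
static bool
sketch_fmt_from_name (const char *name, enum sketch_fmt_type *type)
{
  for (size_t i = 0; i < SKETCH_N_NAMES; i++)
    if (!strcmp (name, sketch_names[i]))
      {
        *type = (enum sketch_fmt_type) i;
        return true;
      }
  return false;
}

/* Because a failed lookup does not touch *TYPE, a caller can
   preload a fallback and ignore the return value. */
static enum sketch_fmt_type
sketch_lookup_or (const char *name, enum sketch_fmt_type fallback)
{
  enum sketch_fmt_type type = fallback;
  sketch_fmt_from_name (name, &type);
  return type;
}
```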
The functions below query basic limits on width and decimal places for
each format type.
402 @deftypefun bool fmt_takes_decimals (enum fmt_type @var{type})
403 Returns true if a format of the given @var{type} is allowed to have a
404 nonzero number of decimal places (the @code{d} member of
405 @struct{fmt_spec}), false if not.
408 @anchor{fmt_min_input_width}
409 @anchor{fmt_max_input_width}
410 @anchor{fmt_min_output_width}
411 @anchor{fmt_max_output_width}
412 @deftypefun int fmt_min_input_width (enum fmt_type @var{type})
413 @deftypefunx int fmt_max_input_width (enum fmt_type @var{type})
414 @deftypefunx int fmt_min_output_width (enum fmt_type @var{type})
415 @deftypefunx int fmt_max_output_width (enum fmt_type @var{type})
416 Returns the minimum or maximum width (the @code{w} member of
417 @struct{fmt_spec}) allowed for an input or output format of the
418 specified @var{type}.
421 @anchor{fmt_max_input_decimals}
422 @anchor{fmt_max_output_decimals}
423 @deftypefun int fmt_max_input_decimals (enum fmt_type @var{type}, int @var{width})
424 @deftypefunx int fmt_max_output_decimals (enum fmt_type @var{type}, int @var{width})
425 Returns the maximum number of decimal places allowed for an input or
426 output format, respectively, of the given @var{type} and @var{width}.
427 Returns 0 if the specified @var{type} does not allow any decimal
428 places or if @var{width} is too narrow to allow decimal places.
431 @deftypefun int fmt_step_width (enum fmt_type @var{type})
432 Returns the ``width step'' for a @struct{fmt_spec} of the given
433 @var{type}. A @struct{fmt_spec}'s width must be a multiple of its
434 type's width step. Most format types have a width step of 1, so that
435 their formats' widths may be any integer within the valid range, but
436 hexadecimal numeric formats and AHEX string formats have a width step
440 These functions allow clients to broadly determine how each kind of
441 input or output format behaves.
443 @deftypefun bool fmt_is_string (enum fmt_type @var{type})
444 @deftypefunx bool fmt_is_numeric (enum fmt_type @var{type})
Returns true if @var{type} is a format for string or numeric values,
respectively, false otherwise.
@deftypefun {enum fmt_category} fmt_get_category (enum fmt_type @var{type})
450 Returns the category within which @var{type} falls.
452 @deftp {Enumeration} {enum fmt_category}
453 A group of format types. Format type categories correspond to the
454 input and output categories described in the PSPP user documentation
455 (@pxref{Input and Output Formats,,,pspp, PSPP Users Guide}).
457 Each format is in exactly one category. The categories have bitwise
458 disjoint values to make it easy to test whether a format type is in
459 one of multiple categories, e.g.@:
@example
if (fmt_get_category (type) & (FMT_CAT_DATE | FMT_CAT_TIME))
  @{
    /* @dots{}@r{@code{type} is a date or time format}@dots{} */
  @}
@end example
468 The format categories are:
471 Basic numeric formats.
474 Custom currency formats.
477 Legacy numeric formats.
482 @item FMT_CAT_HEXADECIMAL
491 @item FMT_CAT_DATE_COMPONENT
492 Date component formats.
500 The PSPP input and output routines use the following pair of functions
501 to convert @enum{fmt_type}s to and from the separate set of codes used
502 in system and portable files:
504 @deftypefun int fmt_to_io (enum fmt_type @var{type})
505 Returns the format code used in system and portable files that
506 corresponds to @var{type}.
509 @deftypefun bool fmt_from_io (int @var{io}, enum fmt_type *@var{type})
510 Converts @var{io}, a format code used in system and portable files,
511 into a @enum{fmt_type} in @code{*@var{type}}. Returns true if
512 successful, false if @var{io} is not valid.
These functions reflect the relationship between input and output
formats.
@deftypefun {enum fmt_type} fmt_input_to_output (enum fmt_type @var{type})
519 Returns the output format type that is used by default by DATA LIST
520 and other input procedures when @var{type} is specified as an input
521 format. The conversion from input format to output format is more
522 complicated than simply changing the format.
@xref{fmt_for_output_from_input}, for a function that performs the
complete conversion.
527 @deftypefun bool fmt_usable_for_input (enum fmt_type @var{type})
528 Returns true if @var{type} may be used as an input format type, false
529 otherwise. The custom currency formats, in particular, may be used
530 for output but not for input.
532 All format types are valid for output.
The final group of format type property functions obtains
536 human-readable templates that illustrate the formats graphically.
@deftypefun {const char *} fmt_date_template (enum fmt_type @var{type})
539 Returns a formatting template for @var{type}, which must be a date or
time format type.  These templates are used by @func{data_in} and
541 @func{data_out} to guide parsing and formatting date and time data.
544 @deftypefun char *fmt_dollar_template (const struct fmt_spec *@var{format})
545 Returns a string of the form @code{$#,###.##} according to
546 @var{format}, which must be of type @code{FMT_DOLLAR}. The caller
547 must free the string with @code{free}.
550 @node Numeric Formatting Styles
551 @subsection Numeric Formatting Styles
553 Each of the basic numeric formats (F, E, COMMA, DOT, DOLLAR, PCT) and
554 custom currency formats (CCA, CCB, CCC, CCD, CCE) has an associated
555 numeric formatting style, represented by @struct{fmt_number_style}.
556 Input and output conversion of formats that have numeric styles is
557 determined mainly by the style, although the formatting rules have
558 special cases that are not represented within the style.
560 @deftp {Structure} {struct fmt_number_style}
561 A structure type with the following members:
564 @item struct substring neg_prefix
565 @itemx struct substring prefix
566 @itemx struct substring suffix
567 @itemx struct substring neg_suffix
Strings used as a prefix to negative numbers, a prefix to every
number, a suffix to every number, and a suffix to negative numbers,
respectively.  Each of these strings is no more than
@code{FMT_STYLE_AFFIX_MAX} bytes (currently 16) in length.
These strings must be freed with @func{ss_dealloc} when no longer
needed.
@item char decimal
The character used as a decimal point.  It must be either @samp{.} or
@samp{,}.
@item char grouping
The character used for grouping digits to the left of the decimal
581 point. It may be @samp{.} or @samp{,}, in which case it must not be
582 equal to @code{decimal}, or it may be set to 0 to disable grouping.
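
The roles of the members can be illustrated with a simplified model.  The structure below is a hypothetical stand-in that uses plain C strings instead of @struct{substring}; the example style and the order in which the affixes are emitted (negative prefix, prefix, digits, suffix, negative suffix) are assumptions made for this sketch, not a statement about the PSPP output routines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct fmt_number_style. */
struct sketch_number_style
  {
    const char *neg_prefix;
    const char *prefix;
    const char *suffix;
    const char *neg_suffix;
    char decimal;               /* '.' or ','. */
    char grouping;              /* '.', ',', or 0 for none. */
  };

/* An example style: leading minus sign and dollar sign. */
static const struct sketch_number_style sketch_dollar
  = { "-", "$", "", "", '.', ',' };

/* Wraps DIGITS, an unsigned number already rendered with STYLE's
   decimal and grouping characters, in STYLE's affixes.  Writes at
   most SIZE bytes to BUF and returns BUF. */
static char *
sketch_apply_affixes (const struct sketch_number_style *style,
                      const char *digits, int negative,
                      char *buf, size_t size)
{
  assert (style->decimal == '.' || style->decimal == ',');
  assert (style->grouping == 0
          || ((style->grouping == '.' || style->grouping == ',')
              && style->grouping != style->decimal));

  snprintf (buf, size, "%s%s%s%s%s",
            negative ? style->neg_prefix : "", style->prefix,
            digits,
            style->suffix, negative ? style->neg_suffix : "");
  return buf;
}

/* Total length of the affixes that apply to every number. */
static int
sketch_affix_width (const struct sketch_number_style *style)
{
  return (int) (strlen (style->prefix) + strlen (style->suffix));
}
```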
The following functions are provided for working with numeric
formatting styles.
589 @deftypefun void fmt_number_style_init (struct fmt_number_style *@var{style})
Initializes @var{style} with all of the
prefixes and suffixes set to the empty string, @samp{.} as the decimal
point character, and grouping disabled.
596 @deftypefun void fmt_number_style_destroy (struct fmt_number_style *@var{style})
597 Destroys @var{style}, freeing its storage.
600 @deftypefun {struct fmt_number_style} *fmt_create (void)
Creates an array of all the styles used by PSPP, calls
@func{fmt_number_style_init} on each of them, and returns the array.
605 @deftypefun void fmt_done (struct fmt_number_style *@var{styles})
Calls @func{fmt_number_style_destroy} on each element of @var{styles},
an array of @struct{fmt_number_style}, and then frees the array.
612 @deftypefun int fmt_affix_width (const struct fmt_number_style *@var{style})
613 Returns the total length of @var{style}'s @code{prefix} and @code{suffix}.
616 @deftypefun int fmt_neg_affix_width (const struct fmt_number_style *@var{style})
Returns the total length of @var{style}'s @code{neg_prefix} and
@code{neg_suffix}.
621 PSPP maintains a global set of number styles for each of the basic
622 numeric formats and custom currency formats. The following functions
623 work with these global styles:
625 @deftypefun {const struct fmt_number_style *} fmt_get_style (enum fmt_type @var{type})
626 Returns the numeric style for the given format @var{type}.
629 @deftypefun {const char *} fmt_name (enum fmt_type @var{type})
630 Returns the name of the given format @var{type}.
635 @node Formatted Data Input and Output
636 @subsection Formatted Data Input and Output
638 These functions provide the ability to convert data fields into
639 @union{value}s and vice versa.
641 @deftypefun bool data_in (struct substring @var{input}, const char *@var{encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, const struct dictionary *@var{dict}, union value *@var{output}, int @var{width})
642 Parses @var{input} as a field containing data in the given format
643 @var{type}. The resulting value is stored in @var{output}, which the
644 caller must have initialized with the given @var{width}. For
645 consistency, @var{width} must be 0 if
646 @var{type} is a numeric format type and greater than 0 if @var{type}
647 is a string format type.
648 @var{encoding} should be set to indicate the character
649 encoding of @var{input}.
@var{dict} must be a pointer to the dictionary with which @var{output}
is associated.
653 If @var{input} is the empty string (with length 0), @var{output} is
654 set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP
655 Users Guide}) for a numeric value, or to all spaces for a string
value.  This applies regardless of the usual parsing requirements for
@var{type}.
659 If @var{implied_decimals} is greater than zero, then the numeric
660 result is shifted right by @var{implied_decimals} decimal places if
661 @var{input} does not contain a decimal point character or an exponent.
662 Only certain numeric format types support implied decimal places; for
663 string formats and other numeric formats, @var{implied_decimals} has
664 no effect. DATA LIST FIXED is the primary user of this feature
665 (@pxref{DATA LIST FIXED,,,pspp, PSPP Users Guide}). Other callers
should generally specify 0 for @var{implied_decimals}, to disable this
feature.
669 When @var{input} contains invalid input data, @func{data_in} outputs a
670 message using @func{msg}.
672 If @var{first_column} is
673 nonzero, it is included in any such error message as the 1-based
column number of the start of the field.  The last column in the field
is calculated as @var{first_column} plus the length of @var{input},
minus 1.  To
676 suppress error output, enclose the call to @func{data_in} by calls to
677 @func{msg_disable} and @func{msg_enable}.
679 This function returns true on success, false if a message was output
680 (even if suppressed). Overflow and underflow provoke warnings but are
681 not propagated to the caller as errors.
683 This function is declared in @file{data/data-in.h}.
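
The implied-decimals rule and the column arithmetic described above can be sketched in isolation.  The functions below are hypothetical and much simpler than @func{data_in}: they assume a plain decimal field parsed with @code{strtod} in the C locale, and they ignore format types, error messages, and alternative decimal point characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parses INPUT as a number.  If IMPLIED_DECIMALS is positive and
   INPUT contains neither a decimal point nor an exponent, shifts
   the result right by that many decimal places. */
static double
sketch_parse_implied (const char *input, int implied_decimals)
{
  assert (implied_decimals >= 0);

  double x = strtod (input, NULL);
  if (implied_decimals > 0 && strpbrk (input, ".eE") == NULL)
    {
      double scale = 1.0;
      for (int i = 0; i < implied_decimals; i++)
        scale *= 10.0;
      x /= scale;
    }
  return x;
}

/* Returns the 1-based last column of a field that starts at
   FIRST_COLUMN and is LENGTH bytes long. */
static int
sketch_last_column (int first_column, int length)
{
  assert (first_column >= 1 && length >= 1);
  return first_column + length - 1;
}
```

For example, with two implied decimal places the field @samp{125} yields 1.25, while @samp{1.5} is taken as written because it already contains a decimal point.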
686 @deftypefun char * data_out (const union value *@var{input}, const struct fmt_spec *@var{format})
687 @deftypefunx char * data_out_legacy (const union value *@var{input}, const char *@var{encoding}, const struct fmt_spec *@var{format})
Converts the data pointed to by @var{input} into a string value, which
will be encoded in UTF-8, according to output format specifier @var{format}.
@var{format} must be a valid output format.  The width of @var{input} is
692 inferred from @var{format} using an algorithm equivalent to
693 @func{fmt_var_width}.
695 When @var{input} contains data that cannot be represented in the given
696 @var{format}, @func{data_out} may output a message using @func{msg},
698 although the current implementation does not
699 consistently do so. To suppress error output, enclose the call to
700 @func{data_out} by calls to @func{msg_disable} and @func{msg_enable}.
702 This function is declared in @file{data/data-out.h}.
705 @node User-Missing Values
706 @section User-Missing Values
708 In addition to the system-missing value for numeric values, each
709 variable has a set of user-missing values (@pxref{MISSING
710 VALUES,,,pspp, PSPP Users Guide}). A set of user-missing values is
711 represented by @struct{missing_values}.
713 It is rarely necessary to interact directly with a
714 @struct{missing_values} object. Instead, the most common operation,
715 querying whether a particular value is a missing value for a given
716 variable, is most conveniently executed through functions on
717 @struct{variable}. @xref{Variable Missing Values}, for details.
719 A @struct{missing_values} is essentially a set of @union{value}s that
720 have a common value width (@pxref{Values}). For a set of
721 missing values associated with a variable (the common case), the set's
722 width is the same as the variable's width.
724 Function prototypes and other declarations related to missing values
725 are declared in @file{data/missing-values.h}.
727 @deftp {Structure} {struct missing_values}
728 Opaque type that represents a set of missing values.
The contents of a set of missing values are subject to some
732 restrictions. Regardless of width, a set of missing values is allowed
733 to be empty. A set of numeric missing values may contain up to three
734 discrete numeric values, or a range of numeric values (which includes
735 both ends of the range), or a range plus one discrete numeric value.
736 A set of string missing values may contain up to three discrete string
737 values (with the same width as the set), but ranges are not supported.
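
These restrictions on the shape of a set can be written down as a small predicate.  The function below is a hypothetical sketch for illustration; it describes a set only by its width, its number of discrete values, and whether it has a range, and it is not part of the PSPP interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a set of missing values of the given WIDTH may
   hold N_DISCRETE discrete values, plus a range if HAS_RANGE.

   Allowed shapes:
     - any width:  0 to 3 discrete values, no range;
     - numeric:    a range plus 0 or 1 discrete values. */
static bool
sketch_mv_shape_is_valid (int width, int n_discrete, bool has_range)
{
  assert (width >= 0 && n_discrete >= 0);

  if (has_range)
    return width == 0 && n_discrete <= 1;
  return n_discrete <= 3;
}
```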
739 In addition, values in string missing values wider than
740 @code{MV_MAX_STRING} bytes may contain non-space characters only in
741 their first @code{MV_MAX_STRING} bytes; all the bytes after the first
742 @code{MV_MAX_STRING} must be spaces. @xref{mv_is_acceptable}, for a
743 function that tests a value against these constraints.
745 @deftypefn Macro int MV_MAX_STRING
746 Number of bytes in a string missing value that are not required to be
747 spaces. The current value is 8, a value which is fixed by the system
748 file format. In PSPP we could easily eliminate this restriction, but
749 doing so would also require us to extend the system file format in an
750 incompatible way, which we consider a bad tradeoff.
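
The constraint on wide string missing values amounts to a check on the bytes past the first @code{MV_MAX_STRING}.  The following is a minimal sketch with hypothetical names, assuming the current limit of 8; it is not the implementation of @func{mv_is_acceptable}.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MV_MAX_STRING. */
#define SKETCH_MV_MAX_STRING 8

/* Returns true if the WIDTH-byte string value at S may be used as
   a missing value, that is, if every byte after the first
   SKETCH_MV_MAX_STRING bytes is a space.  Values no wider than the
   limit are always acceptable. */
static bool
sketch_mv_str_is_acceptable (const char *s, int width)
{
  assert (width > 0);

  for (int i = SKETCH_MV_MAX_STRING; i < width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}
```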
The most frequently useful functions for missing values are those for
testing whether a given value is missing, described in the following
section.  Several other functions for creating, inspecting, and
modifying @struct{missing_values} objects are described afterward, but
these functions are needed much less often.
760 * Testing for Missing Values::
761 * Creating and Destroying User-Missing Values::
762 * Changing User-Missing Value Set Width::
763 * Inspecting User-Missing Value Sets::
764 * Modifying User-Missing Value Sets::
767 @node Testing for Missing Values
768 @subsection Testing for Missing Values
The most frequently useful functions for missing values are those for
771 testing whether a given value is missing, described here. However,
772 using one of the corresponding missing value testing functions for
773 variables can be even easier (@pxref{Variable Missing Values}).
775 @deftypefun bool mv_is_value_missing (const struct missing_values *@var{mv}, const union value *@var{value}, enum mv_class @var{class})
776 @deftypefunx bool mv_is_num_missing (const struct missing_values *@var{mv}, double @var{value}, enum mv_class @var{class})
777 @deftypefunx bool mv_is_str_missing (const struct missing_values *@var{mv}, const char @var{value}[], enum mv_class @var{class})
778 Tests whether @var{value} is in one of the categories of missing
779 values given by @var{class}. Returns true if so, false otherwise.
781 @var{mv} determines the width of @var{value} and provides the set of
782 user-missing values to test.
The only difference among these functions is the form in which
@var{value} is provided, so you may use whichever function is most
convenient.
788 The @var{class} argument determines the exact kinds of missing values
789 that the functions test for:
791 @deftp Enumeration {enum mv_class}
Returns true if @var{value} is in the set of user-missing values given
by @var{mv}.
Returns true if @var{value} is system-missing.  (If @var{mv}
represents a set of string values, then @var{value} is never
system-missing.)
803 @itemx MV_USER | MV_SYSTEM
804 Returns true if @var{value} is user-missing or system-missing.
Always returns false, that is, @var{value} is never considered
missing.
813 @node Creating and Destroying User-Missing Values
814 @subsection Creation and Destruction
816 These functions create and destroy @struct{missing_values} objects.
818 @deftypefun void mv_init (struct missing_values *@var{mv}, int @var{width})
819 Initializes @var{mv} as a set of user-missing values. The set is
820 initially empty. Any values added to it must have the specified
824 @deftypefun void mv_destroy (struct missing_values *@var{mv})
825 Destroys @var{mv}, which must not be referred to again.
828 @deftypefun void mv_copy (struct missing_values *@var{mv}, const struct missing_values *@var{old})
829 Initializes @var{mv} as a copy of the existing set of user-missing
833 @deftypefun void mv_clear (struct missing_values *@var{mv})
834 Empties the user-missing value set @var{mv}, retaining its existing
838 @node Changing User-Missing Value Set Width
839 @subsection Changing User-Missing Value Set Width
841 A few PSPP language constructs copy sets of user-missing values from
842 one variable to another. When the source and target variables have
843 the same width, this is simple. But when the target variable's width
844 might be different from the source variable's, it takes a little more
845 work. The functions described here can help.
847 In fact, it is usually unnecessary to call these functions directly.
848 Most of the time @func{var_set_missing_values}, which uses
849 @func{mv_resize} internally to resize the new set of missing values to
850 the required width, may be used instead.
851 @xref{var_set_missing_values}, for more information.
853 @deftypefun bool mv_is_resizable (const struct missing_values *@var{mv}, int @var{new_width})
854 Tests whether @var{mv}'s width may be changed to @var{new_width} using
855 @func{mv_resize}. Returns true if it is allowed, false otherwise.
857 If @var{mv} contains any missing values, then it may be resized only
858 if each missing value may be resized, as determined by
859 @func{value_is_resizable} (@pxref{value_is_resizable}).
863 @deftypefun void mv_resize (struct missing_values *@var{mv}, int @var{width})
864 Changes @var{mv}'s width to @var{width}. @var{mv} and @var{width}
865 must satisfy the constraints explained above.
867 When a string missing value set's width is increased, each
868 user-missing value is padded on the right with spaces to the new
872 @node Inspecting User-Missing Value Sets
873 @subsection Inspecting User-Missing Value Sets
875 These functions inspect the properties and contents of
876 @struct{missing_values} objects.
878 The first set of functions inspects the discrete values that sets of
879 user-missing values may contain:
881 @deftypefun bool mv_is_empty (const struct missing_values *@var{mv})
882 Returns true if @var{mv} contains no user-missing values, false if it
883 contains at least one user-missing value (either a discrete value or a
887 @deftypefun int mv_get_width (const struct missing_values *@var{mv})
888 Returns the width of the user-missing values that @var{mv} represents.
891 @deftypefun int mv_n_values (const struct missing_values *@var{mv})
892 Returns the number of discrete user-missing values included in
893 @var{mv}. The return value will be between 0 and 3. For sets of
894 numeric user-missing values that include a range, the return value
898 @deftypefun bool mv_has_value (const struct missing_values *@var{mv})
899 Returns true if @var{mv} has at least one discrete user-missing
900 value, that is, if @func{mv_n_values} would return nonzero for @var{mv}.
904 @deftypefun {const union value *} mv_get_value (const struct missing_values *@var{mv}, int @var{index})
905 Returns the discrete user-missing value in @var{mv} with the given
906 @var{index}. The caller must not modify or free the returned value or
907 refer to it after modifying or freeing @var{mv}. The index must be
908 less than the number of discrete user-missing values in @var{mv}, as
909 reported by @func{mv_n_values}.
912 The second set of functions inspects the single range of values that
913 numeric sets of user-missing values may contain:
915 @deftypefun bool mv_has_range (const struct missing_values *@var{mv})
916 Returns true if @var{mv} includes a range, false otherwise.
919 @deftypefun void mv_get_range (const struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
920 Stores the low endpoint of @var{mv}'s range in @code{*@var{low}} and
921 the high endpoint of the range in @code{*@var{high}}. @var{mv} must
925 @node Modifying User-Missing Value Sets
926 @subsection Modifying User-Missing Value Sets
928 These functions modify the contents of @struct{missing_values}
931 The first set of functions applies to all sets of user-missing values:
933 @deftypefun bool mv_add_value (struct missing_values *@var{mv}, const union value *@var{value})
934 @deftypefunx bool mv_add_str (struct missing_values *@var{mv}, const char @var{value}[])
935 @deftypefunx bool mv_add_num (struct missing_values *@var{mv}, double @var{value})
936 Attempts to add the given discrete @var{value} to the set of user-missing
937 values @var{mv}. @var{value} must have the same width as @var{mv}.
938 Returns true if @var{value} was successfully added, false if the set
939 could not accept any more discrete values or if @var{value} is not an
940 acceptable user-missing value (see @func{mv_is_acceptable} below).
942 These functions are equivalent, except for the form in which
943 @var{value} is provided, so you may use whichever function is most convenient.
947 @deftypefun void mv_pop_value (struct missing_values *@var{mv}, union value *@var{value})
948 Removes a discrete value from @var{mv} (which must contain at least
949 one discrete value) and stores it in @var{value}.
952 @deftypefun bool mv_replace_value (struct missing_values *@var{mv}, const union value *@var{value}, int @var{index})
953 Attempts to replace the discrete value with the given @var{index} in
954 @var{mv} (which must contain at least @var{index} + 1 discrete values)
955 by @var{value}. Returns true if successful, false if @var{value} is
956 not an acceptable user-missing value (see @func{mv_is_acceptable}
960 @deftypefun bool mv_is_acceptable (const union value *@var{value}, int @var{width})
961 @anchor{mv_is_acceptable}
962 Returns true if @var{value}, which must have the specified
963 @var{width}, may be added to a missing value set of the same
964 @var{width}, false if it cannot. As described above, all numeric
965 values and string values of width @code{MV_MAX_STRING} or less may be
966 added, but a string value of greater width may be added only if bytes
967 beyond the first @code{MV_MAX_STRING} are all spaces.
970 The second set of functions applies only to numeric sets of
973 @deftypefun bool mv_add_range (struct missing_values *@var{mv}, double @var{low}, double @var{high})
974 Attempts to add a numeric range covering @var{low}@dots{}@var{high}
975 (inclusive on both ends) to @var{mv}, which must be a numeric set of
976 user-missing values. Returns true if the range is successfully added,
977 false on failure. Fails if @var{mv} already contains a range, or if
978 @var{mv} contains more than one discrete value, or if @var{low} > @var{high}.
982 @deftypefun void mv_pop_range (struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
983 Given @var{mv}, which must be a numeric set of user-missing values
984 that contains a range, removes that range from @var{mv} and stores its
985 low endpoint in @code{*@var{low}} and its high endpoint in @code{*@var{high}}.
990 @section Value Labels
992 Each variable has a set of value labels (@pxref{VALUE LABELS,,,pspp,
993 PSPP Users Guide}), represented as @struct{val_labs}. A
994 @struct{val_labs} is essentially a map from @union{value}s to strings.
995 All of the values in a set of value labels have the same width, which
996 for a set of value labels owned by a variable (the common case) is the
997 same as the width of that variable.
999 Sets of value labels may contain any number of entries.
1001 It is rarely necessary to interact directly with a @struct{val_labs}
1002 object. Instead, the most common operation, looking up the label for
1003 a value of a given variable, can be conveniently executed through
1004 functions on @struct{variable}. @xref{Variable Value Labels}, for
1007 Function prototypes and other declarations related to value labels
1008 are in @file{data/value-labels.h}.
1010 @deftp {Structure} {struct val_labs}
1011 Opaque type that represents a set of value labels.
1014 The most often useful function for value labels is
1015 @func{val_labs_find}, for looking up the label associated with a
1018 @deftypefun {char *} val_labs_find (const struct val_labs *@var{val_labs}, union value @var{value})
1019 Looks in @var{val_labs} for a label for the given @var{value}.
1020 Returns the label, if one is found, or a null pointer otherwise.
1023 Several other functions for working with value labels are described in
1024 the following sections, but these are more rarely useful.
1027 * Value Labels Creation and Destruction::
1028 * Value Labels Properties::
1029 * Value Labels Adding and Removing Labels::
1030 * Value Labels Iteration::
1033 @node Value Labels Creation and Destruction
1034 @subsection Creation and Destruction
1036 These functions create and destroy @struct{val_labs} objects.
1038 @deftypefun {struct val_labs *} val_labs_create (int @var{width})
1039 Creates and returns an initially empty set of value labels with the
1043 @deftypefun {struct val_labs *} val_labs_clone (const struct val_labs *@var{val_labs})
1044 Creates and returns a set of value labels whose width and contents are
1045 the same as those of @var{val_labs}.
1048 @deftypefun void val_labs_clear (struct val_labs *@var{var_labs})
1049 Deletes all value labels from @var{var_labs}.
1052 @deftypefun void val_labs_destroy (struct val_labs *@var{var_labs})
1053 Destroys @var{var_labs}, which must not be referenced again.
1056 @node Value Labels Properties
1057 @subsection Value Labels Properties
1059 These functions inspect and manipulate basic properties of
1060 @struct{val_labs} objects.
1062 @deftypefun size_t val_labs_count (const struct val_labs *@var{val_labs})
1063 Returns the number of value labels in @var{val_labs}.
1066 @deftypefun bool val_labs_can_set_width (const struct val_labs *@var{val_labs}, int @var{new_width})
1067 Tests whether @var{val_labs}'s width may be changed to @var{new_width}
1068 using @func{val_labs_set_width}. Returns true if it is allowed, false
1071 A set of value labels may be resized to a given width only if each
1072 value in it may be resized to that width, as determined by
1073 @func{value_is_resizable} (@pxref{value_is_resizable}).
1076 @deftypefun void val_labs_set_width (struct val_labs *@var{val_labs}, int @var{new_width})
1077 Changes the width of @var{val_labs}'s values to @var{new_width}, which
1078 must be a valid new width as determined by
1079 @func{val_labs_can_set_width}.
1082 @node Value Labels Adding and Removing Labels
1083 @subsection Adding and Removing Labels
1085 These functions add and remove value labels from a @struct{val_labs}
1088 @deftypefun bool val_labs_add (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1089 Adds @var{label} to @var{val_labs} as a label for @var{value},
1090 which must have the same width as the set of value labels. Returns
1091 true if successful, false if @var{value} already has a label.
1094 @deftypefun void val_labs_replace (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1095 Adds @var{label} to @var{val_labs} as a label for @var{value},
1096 which must have the same width as the set of value labels. If
1097 @var{value} already has a label in @var{val_labs}, it is replaced.
1100 @deftypefun bool val_labs_remove (struct val_labs *@var{val_labs}, union value @var{value})
1101 Removes from @var{val_labs} any label for @var{value}, which must have
1102 the same width as the set of value labels. Returns true if a label
1103 was removed, false otherwise.
1106 @node Value Labels Iteration
1107 @subsection Iterating through Value Labels
1109 These functions allow iteration through the set of value labels
1110 represented by a @struct{val_labs} object. They may be used in the
1111 context of a @code{for} loop:
1114 const struct val_labs *val_labs;
1115 const struct val_lab *vl;
1119 for (vl = val_labs_first (val_labs); vl != NULL;
1120 vl = val_labs_next (val_labs, vl))
1122 @dots{}@r{do something with @code{vl}}@dots{}
1126 Value labels should not be added to or deleted from a @struct{val_labs}
1127 while it is being iterated.
1129 @deftypefun {const struct val_lab *} val_labs_first (const struct val_labs *@var{val_labs})
1130 Returns the first value label in @var{val_labs}, if it contains at
1131 least one value label, or a null pointer if it does not contain any
1135 @deftypefun {const struct val_lab *} val_labs_next (const struct val_labs *@var{val_labs}, const struct val_lab *@var{vl})
1136 Returns the value label in @var{val_labs} following @var{vl}, if
1137 @var{vl} is not the last value label in @var{val_labs}, or a null
1138 pointer if there are no value labels following @var{vl}.
1141 @deftypefun {const struct val_lab **} val_labs_sorted (const struct val_labs *@var{val_labs})
1142 Allocates and returns an array of pointers to value labels, which are
1143 sorted in increasing order by value. The array has
1144 @code{val_labs_count (@var{val_labs})} elements. The caller is
1145 responsible for freeing the array with @func{free} (but must not free
1146 any of the @struct{val_lab} elements that the array points to).
1149 The iteration functions above work with pointers to @struct{val_lab},
1150 which is an opaque data structure that users of @struct{val_labs} must
1151 not modify or free directly. The following functions work with
1152 objects of this type:
1154 @deftypefun {const union value *} val_lab_get_value (const struct val_lab *@var{vl})
1155 Returns the value of value label @var{vl}. The caller must not modify
1156 or free the returned value. (To change the value that a label is
1157 attached to, remove the value label with @func{val_labs_remove}, then
1158 add a label for the new value with @func{val_labs_add}.)
1160 The width of the returned value cannot be determined directly from
1161 @var{vl}. It may be obtained by calling @func{val_labs_get_width} on
1162 the @struct{val_labs} that @var{vl} is in.
1165 @deftypefun {const char *} val_lab_get_label (const struct val_lab *@var{vl})
1166 Returns the label in @var{vl} as a null-terminated string. The caller
1167 must not modify or free the returned string. (Use
1168 @func{val_labs_replace} to change a value label.)
1174 A PSPP variable is represented by @struct{variable}, an opaque type
1175 declared in @file{data/variable.h} along with related declarations.
1176 @xref{Variables,,,pspp, PSPP Users Guide}, for a description of PSPP
1177 variables from a user perspective.
1179 PSPP is unusual among computer languages in that, by itself, a PSPP
1180 variable does not have a value. Instead, a variable in PSPP takes on
1181 a value only in the context of a case, which supplies one value for
1182 each variable in a set of variables (@pxref{Cases}). The set of
1183 variables in a case, in turn, is ordinarily part of a dictionary
1184 (@pxref{Dictionaries}).
1186 Every variable has several attributes, most of which correspond
1187 directly to one of the variable attributes visible to PSPP users
1188 (@pxref{Attributes,,,pspp, PSPP Users Guide}).
1190 The following sections describe variable-related functions and macros.
1194 * Variable Type and Width::
1195 * Variable Missing Values::
1196 * Variable Value Labels::
1197 * Variable Print and Write Formats::
1199 * Variable GUI Attributes::
1200 * Variable Leave Status::
1201 * Dictionary Class::
1202 * Variable Creation and Destruction::
1203 * Variable Short Names::
1204 * Variable Relationships::
1205 * Variable Auxiliary Data::
1206 * Variable Categorical Values::
1210 @subsection Variable Name
1212 A variable name is a string between 1 and @code{ID_MAX_LEN} bytes
1213 long that satisfies the rules for PSPP identifiers
1214 (@pxref{Tokens,,,pspp, PSPP Users Guide}). Variable names are
1215 mixed-case and treated case-insensitively.
1217 @deftypefn Macro int ID_MAX_LEN
1218 Maximum length of a variable name, in bytes, currently 64.
1221 Only one commonly useful function relates to variable names:
1223 @deftypefun {const char *} var_get_name (const struct variable *@var{var})
1224 Returns @var{var}'s variable name as a C string.
1227 A few other functions are much more rarely used. Some of these
1228 functions are used internally by the dictionary implementation:
1230 @anchor{var_set_name}
1231 @deftypefun {void} var_set_name (struct variable *@var{var}, const char *@var{new_name})
1232 Changes the name of @var{var} to @var{new_name}, which must be a
1233 ``plausible'' name as defined below.
1235 This function cannot be applied to a variable that is part of a
1236 dictionary. Use @func{dict_rename_var} instead (@pxref{Dictionary
1237 Renaming Variables}).
1240 @deftypefun {enum dict_class} var_get_dict_class (const struct variable *@var{var})
1241 Returns the dictionary class of @var{var}'s name (@pxref{Dictionary
1245 @node Variable Type and Width
1246 @subsection Variable Type and Width
1248 A variable's type and width are the type and width of its values
1251 @deftypefun {enum val_type} var_get_type (const struct variable *@var{var})
1252 Returns the type of variable @var{var}.
1255 @deftypefun int var_get_width (const struct variable *@var{var})
1256 Returns the width of variable @var{var}.
1259 @deftypefun void var_set_width (struct variable *@var{var}, int @var{width})
1260 Sets the width of variable @var{var} to @var{width}. The width of a
1261 variable should not normally be changed after the variable is created,
1262 so this function is rarely used. This function cannot be applied to a
1263 variable that is part of a dictionary.
1266 @deftypefun bool var_is_numeric (const struct variable *@var{var})
1267 Returns true if @var{var} is a numeric variable, false otherwise.
1270 @deftypefun bool var_is_alpha (const struct variable *@var{var})
1271 Returns true if @var{var} is an alphanumeric (string) variable, false
1275 @node Variable Missing Values
1276 @subsection Variable Missing Values
1278 A numeric or short string variable may have a set of user-missing
1279 values (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}), represented
1280 as a @struct{missing_values} (@pxref{User-Missing Values}).
1282 The most frequent operation on a variable's missing values is to query
1283 whether a value is user- or system-missing:
1285 @deftypefun bool var_is_value_missing (const struct variable *@var{var}, const union value *@var{value}, enum mv_class @var{class})
1286 @deftypefunx bool var_is_num_missing (const struct variable *@var{var}, double @var{value}, enum mv_class @var{class})
1287 @deftypefunx bool var_is_str_missing (const struct variable *@var{var}, const char @var{value}[], enum mv_class @var{class})
1288 Tests whether @var{value} is a missing value of the given @var{class}
1289 for variable @var{var} and returns true if so, false otherwise.
1290 @func{var_is_num_missing} may only be applied to numeric variables;
1291 @func{var_is_str_missing} may only be applied to string variables.
1292 @var{value} must have been initialized with the same width as
1295 @code{var_is_@var{type}_missing (@var{var}, @var{value}, @var{class})}
1296 is equivalent to @code{mv_is_@var{type}_missing
1297 (var_get_missing_values (@var{var}), @var{value}, @var{class})}.
1300 In addition, a few functions are provided to work more directly with a
1301 variable's @struct{missing_values}:
1303 @deftypefun {const struct missing_values *} var_get_missing_values (const struct variable *@var{var})
1304 Returns the @struct{missing_values} associated with @var{var}. The
1305 caller must not modify the returned structure. The return value is
1309 @anchor{var_set_missing_values}
1310 @deftypefun {void} var_set_missing_values (struct variable *@var{var}, const struct missing_values *@var{miss})
1311 Changes @var{var}'s missing values to a copy of @var{miss}, or if
1312 @var{miss} is a null pointer, clears @var{var}'s missing values. If
1313 @var{miss} is non-null, it must have the same width as @var{var} or be
1314 resizable to @var{var}'s width (@pxref{mv_resize}). The caller
1315 retains ownership of @var{miss}.
1318 @deftypefun void var_clear_missing_values (struct variable *@var{var})
1319 Clears @var{var}'s missing values. Equivalent to
1320 @code{var_set_missing_values (@var{var}, NULL)}.
1323 @deftypefun bool var_has_missing_values (const struct variable *@var{var})
1324 Returns true if @var{var} has any missing values, false if it has
1325 none. Equivalent to @code{!mv_is_empty (var_get_missing_values (@var{var}))}.
1328 @node Variable Value Labels
1329 @subsection Variable Value Labels
1331 A numeric or short string variable may have a set of value labels
1332 (@pxref{VALUE LABELS,,,pspp, PSPP Users Guide}), represented as a
1333 @struct{val_labs} (@pxref{Value Labels}). The most commonly useful
1334 functions for value labels return the value label associated with a
1337 @deftypefun {const char *} var_lookup_value_label (const struct variable *@var{var}, const union value *@var{value})
1338 Looks for a label for @var{value} in @var{var}'s set of value labels.
1339 @var{value} must have the same width as @var{var}. Returns the label
1340 if one exists, otherwise a null pointer.
1343 @deftypefun void var_append_value_name (const struct variable *@var{var}, const union value *@var{value}, struct string *@var{str})
1344 Looks for a label for @var{value} in @var{var}'s set of value labels.
1345 @var{value} must have the same width as @var{var}.
1346 If a label exists, it will be appended to the string pointed to by @var{str}.
1347 Otherwise, it formats @var{value}
1348 using @var{var}'s print format (@pxref{Input and Output Formats})
1349 and appends the formatted string.
1352 The underlying @struct{val_labs} structure may also be accessed
1353 directly using the functions described below.
1355 @deftypefun bool var_has_value_labels (const struct variable *@var{var})
1356 Returns true if @var{var} has at least one value label, false
1360 @deftypefun {const struct val_labs *} var_get_value_labels (const struct variable *@var{var})
1361 Returns the @struct{val_labs} associated with @var{var}. If @var{var}
1362 has no value labels, then the return value may or may not be a null
1365 The variable retains ownership of the returned @struct{val_labs},
1366 which the caller must not attempt to modify.
1369 @deftypefun void var_set_value_labels (struct variable *@var{var}, const struct val_labs *@var{val_labs})
1370 Replaces @var{var}'s value labels by a copy of @var{val_labs}. The
1371 caller retains ownership of @var{val_labs}. If @var{val_labs} is a
1372 null pointer, then @var{var}'s value labels, if any, are deleted.
1375 @deftypefun void var_clear_value_labels (struct variable *@var{var})
1376 Deletes @var{var}'s value labels. Equivalent to
1377 @code{var_set_value_labels (@var{var}, NULL)}.
1380 A final group of functions offers shorthands for operations that would
1381 otherwise require getting the value labels from a variable, copying
1382 them, modifying them, and then setting the modified value labels into
1383 the variable (making a second copy):
1385 @deftypefun bool var_add_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1386 Attempts to add a copy of @var{label} as a label for @var{value} for
1387 the given @var{var}. @var{value} must have the same width as
1388 @var{var}. If @var{value} already has a label, then the old label is
1389 retained. Returns true if a label is added, false if there was an
1390 existing label for @var{value}. Either way, the caller retains
1391 ownership of @var{value} and @var{label}.
1394 @deftypefun void var_replace_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1395 Attempts to add a copy of @var{label} as a label for @var{value} for
1396 the given @var{var}. @var{value} must have the same width as
1397 @var{var}. If @var{value} already has a label, then
1398 @var{label} replaces the old label. Either way, the caller retains
1399 ownership of @var{value} and @var{label}.
1402 @node Variable Print and Write Formats
1403 @subsection Variable Print and Write Formats
1405 Each variable has an associated pair of output formats, called its
1406 @dfn{print format} and @dfn{write format}. @xref{Input and Output
1407 Formats,,,pspp, PSPP Users Guide}, for an introduction to formats.
1408 @xref{Input and Output Formats}, for a developer's description of
1409 format representation.
1411 The print format is used to convert a variable's data values to
1412 strings for human-readable output. The write format is used similarly
1413 for machine-readable output, primarily by the WRITE transformation
1414 (@pxref{WRITE,,,pspp, PSPP Users Guide}). Most often a variable's
1415 print and write formats are the same.
1417 A newly created variable by default has format F8.2 if it is numeric
1418 or an A format with the same width as the variable if it is string.
1419 Many creators of variables override these defaults.
1421 Both the print format and write format are output formats. Input
1422 formats are not part of @struct{variable}. Instead, input programs
1423 and transformations keep track of variable input formats themselves.
1425 The following functions work with variable print and write formats.
1427 @deftypefun {const struct fmt_spec *} var_get_print_format (const struct variable *@var{var})
1428 @deftypefunx {const struct fmt_spec *} var_get_write_format (const struct variable *@var{var})
1429 Returns @var{var}'s print or write format, respectively.
1432 @deftypefun void var_set_print_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1433 @deftypefunx void var_set_write_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1434 @deftypefunx void var_set_both_formats (struct variable *@var{var}, const struct fmt_spec *@var{format})
1435 Sets @var{var}'s print format, write format, or both formats,
1436 respectively, to a copy of @var{format}.
1439 @node Variable Labels
1440 @subsection Variable Labels
1442 A variable label is a string that describes a variable. Variable
1443 labels may contain spaces and punctuation not allowed in variable
1444 names. @xref{VARIABLE LABELS,,,pspp, PSPP Users Guide}, for a
1445 user-level description of variable labels.
1447 The most commonly useful functions for variable labels are those to
1448 retrieve a variable's label:
1450 @deftypefun {const char *} var_to_string (const struct variable *@var{var})
1451 Returns @var{var}'s variable label, if it has one, otherwise
1452 @var{var}'s name. In either case the caller must not attempt to
1453 modify or free the returned string.
1455 This function is useful for user output.
1458 @deftypefun {const char *} var_get_label (const struct variable *@var{var})
1459 Returns @var{var}'s variable label, if it has one, or a null pointer
1463 A few other variable label functions are also provided:
1465 @deftypefun void var_set_label (struct variable *@var{var}, const char *@var{label})
1466 Sets @var{var}'s variable label to a copy of @var{label}, or removes
1467 any label from @var{var} if @var{label} is a null pointer or contains
1468 only spaces. Leading and trailing spaces are removed from the
1469 variable label and its remaining content is truncated at 255 bytes.
1472 @deftypefun void var_clear_label (struct variable *@var{var})
1473 Removes any variable label from @var{var}.
1476 @deftypefun bool var_has_label (const struct variable *@var{var})
1477 Returns true if @var{var} has a variable label, false otherwise.
1480 @node Variable GUI Attributes
1481 @subsection GUI Attributes
1483 These functions and types access and set attributes that are mainly
1484 used by graphical user interfaces. Their values are also stored in
1485 and retrieved from system files (but not portable files).
1487 The first group of functions relates to the measurement level of
1488 numeric data. New variables are assigned a nominal level of
1489 measurement by default.
1491 @deftp {Enumeration} {enum measure}
1492 Measurement level. Available values are:
1495 @item MEASURE_NOMINAL
1496 Numeric data values are arbitrary. Arithmetic operations and
1497 numerical comparisons of such data are not meaningful.
1499 @item MEASURE_ORDINAL
1500 Numeric data values indicate progression along a rank order.
1501 Arbitrary arithmetic operations such as addition are not meaningful on
1502 such data, but inequality comparisons (less, greater, etc.) have
1503 straightforward interpretations.
1506 Ratios, sums, etc. of numeric data values have meaningful
1510 PSPP does not have a separate category for interval data, which would
1511 naturally fall between the ordinal and scale measurement levels.
1514 @deftypefun bool measure_is_valid (enum measure @var{measure})
1515 Returns true if @var{measure} is a valid level of measurement, that
1516 is, if it is one of the @code{enum measure} constants listed above,
1517 and false otherwise.
1520 @deftypefun enum measure var_get_measure (const struct variable *@var{var})
1521 @deftypefunx void var_set_measure (struct variable *@var{var}, enum measure @var{measure})
1522 Gets or sets @var{var}'s measurement level.
1525 The following set of functions relates to the width of on-screen
1526 columns used for displaying variable data in a graphical user
1527 interface environment. The unit of measurement is the width of a
1528 character. For proportionally spaced fonts, this is based on the
1529 average width of a character.
1531 @deftypefun int var_get_display_width (const struct variable *@var{var})
1532 @deftypefunx void var_set_display_width (struct variable *@var{var}, int @var{display_width})
1533 Gets or sets @var{var}'s display width.
1536 @anchor{var_default_display_width}
1537 @deftypefun int var_default_display_width (int @var{width})
1538 Returns the default display width for a variable with the given
1539 @var{width}. The default width of a numeric variable is 8. The
1540 default width of a string variable is @var{width} or 32, whichever is
1544 The final group of functions works with the justification of data when
1545 it is displayed in on-screen columns. New variables are by default
1548 @deftp {Enumeration} {enum alignment}
1549 Text justification. Possible values are @code{ALIGN_LEFT},
1550 @code{ALIGN_RIGHT}, and @code{ALIGN_CENTRE}.
1553 @deftypefun bool alignment_is_valid (enum alignment @var{alignment})
1554 Returns true if @var{alignment} is a valid alignment, that is, if it
1555 is one of the @code{enum alignment} constants listed above, and false
1559 @deftypefun enum alignment var_get_alignment (const struct variable *@var{var})
1560 @deftypefunx void var_set_alignment (struct variable *@var{var}, enum alignment @var{alignment})
1561 Gets or sets @var{var}'s alignment.
1564 @node Variable Leave Status
1565 @subsection Variable Leave Status
1567 Commonly, most or all data in a case come from an input file, read
1568 with a command such as DATA LIST or GET, but data can also be
1569 generated with transformations such as COMPUTE. In the latter case
1570 the question of a datum's ``initial value'' can arise. For example,
1571 the value of a piece of generated data can recursively depend on its
1576 Another situation where the initial value of a variable arises is when
1577 its value is not set at all for some cases, e.g.@: below, @code{Y} is
1578 set only for the first 10 cases:
1580 DO IF #CASENUM <= 10.
1585 By default, the initial value of a datum in either of these situations
1586 is the system-missing value for numeric values and spaces for string
1587 values. This means that, above, X would be system-missing and that Y
1588 would be 1 for the first 10 cases and system-missing for the
1591 PSPP also supports retaining the value of a variable from one case to
1592 another, using the LEAVE command (@pxref{LEAVE,,,pspp, PSPP Users
1593 Guide}). The initial value of such a variable is 0 if it is numeric
1594 and spaces if it is a string. If the command @samp{LEAVE X Y} is
1595 appended to the above example, then X would have value 1 in the first
1596 case and increase by 1 in every succeeding case, and Y would have
value 1 in every case, because the value assigned during the first 10
cases is retained afterward.
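
The per-case initialization rule can be modeled in a few lines of
standalone C.  This is only a sketch, not PSPP code:
@code{struct sketch_leave_var}, the @code{sketch_} functions, and the
@code{SKETCH_SYSMIS} stand-in for the system-missing value are all
assumptions made for illustration.  It models a transformation of the
form @code{COMPUTE X = X + 1}, which matches the behavior of X
described above.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Stand-in for the system-missing value (an assumption of this sketch). */
#define SKETCH_SYSMIS (-DBL_MAX)

/* Hypothetical model of one numeric variable's slot in the current case. */
struct sketch_leave_var
  {
    bool leave;      /* Retain the value from case to case? */
    double value;    /* Value in the current case. */
  };

/* Sets V's value before the first case: 0 for a left variable,
   system-missing otherwise. */
static void
sketch_init (struct sketch_leave_var *v, bool leave)
{
  v->leave = leave;
  v->value = leave ? 0.0 : SKETCH_SYSMIS;
}

/* Prepares V for the next case: a left variable keeps its value, any
   other variable is reinitialized to system-missing. */
static void
sketch_begin_case (struct sketch_leave_var *v)
{
  if (!v->leave)
    v->value = SKETCH_SYSMIS;
}

/* Models COMPUTE X = X + 1: arithmetic on system-missing yields
   system-missing. */
static void
sketch_increment (struct sketch_leave_var *v)
{
  if (v->value != SKETCH_SYSMIS)
    v->value += 1.0;
}

/* Runs N cases of COMPUTE X = X + 1 and returns X in the last case. */
static double
sketch_run (bool leave, int n)
{
  struct sketch_leave_var x;
  sketch_init (&x, leave);
  for (int i = 0; i < n; i++)
    {
      if (i > 0)
        sketch_begin_case (&x);
      sketch_increment (&x);
    }
  return x.value;
}
```

In this model, @code{sketch_run (true, 5)} counts up to 5, whereas
@code{sketch_run (false, 5)} stays system-missing, because each case
starts from a system-missing X.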
1599 The LEAVE command has no effect on data that comes from an input file
1600 or whose values do not depend on a variable's initial value.
The values of scratch variables (@pxref{Scratch Variables,,,pspp, PSPP
Users Guide}) are always left from one case to another.
1605 The following functions work with a variable's leave status.
1607 @deftypefun bool var_get_leave (const struct variable *@var{var})
1608 Returns true if @var{var}'s value is to be retained from case to case,
1609 false if it is reinitialized to system-missing or spaces.
1612 @deftypefun void var_set_leave (struct variable *@var{var}, bool @var{leave})
1613 If @var{leave} is true, marks @var{var} to be left from case to case;
1614 if @var{leave} is false, marks @var{var} to be reinitialized for each
1617 If @var{var} is a scratch variable, @var{leave} must be true.
1620 @deftypefun bool var_must_leave (const struct variable *@var{var})
1621 Returns true if @var{var} must be left from case to case, that is, if
1622 @var{var} is a scratch variable.
1625 @node Dictionary Class
1626 @subsection Dictionary Class
1628 Occasionally it is useful to classify variables into @dfn{dictionary
1629 classes} based on their names. Dictionary classes are represented by
1630 @enum{dict_class}. This type and other declarations for dictionary
1631 classes are in the @file{<data/dict-class.h>} header.
1633 @deftp {Enumeration} {enum dict_class}
1634 The dictionary classes are:
1638 An ordinary variable, one whose name does not begin with @samp{$} or
1642 A system variable, one whose name begins with @samp{$}. @xref{System
1643 Variables,,,pspp, PSPP Users Guide}.
1646 A scratch variable, one whose name begins with @samp{#}.
1647 @xref{Scratch Variables,,,pspp, PSPP Users Guide}.
1650 The values for dictionary classes are bitwise disjoint, which allows
1651 them to be used in bit-masks. An extra enumeration constant
1652 @code{DC_ALL}, whose value is the bitwise-@i{or} of all of the above
1653 constants, is provided to aid in this purpose.
1656 One example use of dictionary classes arises in connection with PSPP
1657 syntax that uses @code{@var{a} TO @var{b}} to name the variables in a
1658 dictionary from @var{a} to @var{b} (@pxref{Sets of Variables,,,pspp,
1659 PSPP Users Guide}). This syntax requires @var{a} and @var{b} to be in
1660 the same dictionary class. It limits the variables that it includes
1661 to those in that dictionary class.
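
Classification by first character and the use of classes as a bit-mask
can be sketched as follows.  This is an illustrative model rather than
the contents of @file{<data/dict-class.h>}: the @code{sketch_} and
@code{SKETCH_} names are hypothetical, and the numeric values chosen
for the constants are assumptions (the text above only requires that
they be bitwise disjoint).

```c
#include <assert.h>
#include <stdbool.h>

/* Bitwise-disjoint class values; the specific numbers are assumptions. */
enum sketch_dict_class
  {
    SKETCH_DC_ORDINARY = 0x1,   /* Name begins with neither '$' nor '#'. */
    SKETCH_DC_SYSTEM = 0x2,     /* Name begins with '$'. */
    SKETCH_DC_SCRATCH = 0x4,    /* Name begins with '#'. */
    SKETCH_DC_ALL = SKETCH_DC_ORDINARY | SKETCH_DC_SYSTEM | SKETCH_DC_SCRATCH
  };

/* Classifies NAME by its first character. */
static enum sketch_dict_class
sketch_class_from_id (const char *name)
{
  switch (name[0])
    {
    case '$':
      return SKETCH_DC_SYSTEM;
    case '#':
      return SKETCH_DC_SCRATCH;
    default:
      return SKETCH_DC_ORDINARY;
    }
}

/* Returns true if NAME's class is one of the classes in MASK. */
static bool
sketch_class_in_mask (const char *name, unsigned int mask)
{
  return (sketch_class_from_id (name) & mask) != 0;
}

/* Returns true if A and B are in the same class, as A TO B requires. */
static bool
sketch_same_class (const char *a, const char *b)
{
  return sketch_class_from_id (a) == sketch_class_from_id (b);
}
```

Because the values are disjoint bits, a caller can combine several
classes in one mask, for instance to test for ``system or scratch''
in a single call to @code{sketch_class_in_mask}.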
1663 The following functions relate to dictionary classes.
1665 @deftypefun {enum dict_class} dict_class_from_id (const char *@var{name})
1666 Returns the ``dictionary class'' for the given variable @var{name}, by
looking at its first character.
1670 @deftypefun {const char *} dict_class_to_name (enum dict_class @var{dict_class})
1671 Returns a name for the given @var{dict_class} as an adjective, e.g.@:
1674 This function should probably not be used in new code as it can lead
1675 to difficulties for internationalization.
1678 @node Variable Creation and Destruction
1679 @subsection Variable Creation and Destruction
1681 Only rarely should PSPP code create or destroy variables directly.
Ordinarily, variables are created within a dictionary and destroyed
1683 by individual deletion from the dictionary or by destroying the entire
1684 dictionary at once. The functions here enable the exceptional case,
1685 of creation and destruction of variables that are not associated with
1686 any dictionary. These functions are used internally in the dictionary
1690 @deftypefun {struct variable *} var_create (const char *@var{name}, int @var{width})
1691 Creates and returns a new variable with the given @var{name} and
1692 @var{width}. The new variable is not part of any dictionary. Use
1693 @func{dict_create_var}, instead, to create a variable in a dictionary
1694 (@pxref{Dictionary Creating Variables}).
1696 @var{name} should be a valid variable name and must be a ``plausible''
1697 variable name (@pxref{Variable Name}). @var{width} must be between 0
1698 and @code{MAX_STRING}, inclusive (@pxref{Values}).
1700 The new variable has no user-missing values, value labels, or variable
1701 label. Numeric variables initially have F8.2 print and write formats,
1702 right-justified display alignment, and scale level of measurement.
1703 String variables are created with A print and write formats,
1704 left-justified display alignment, and nominal level of measurement.
1705 The initial display width is determined by
1706 @func{var_default_display_width} (@pxref{var_default_display_width}).
1708 The new variable initially has no short name (@pxref{Variable Short
1709 Names}) and no auxiliary data (@pxref{Variable Auxiliary Data}).
1713 @deftypefun {struct variable *} var_clone (const struct variable *@var{old_var})
1714 Creates and returns a new variable with the same attributes as
1715 @var{old_var}, with a few exceptions. First, the new variable is not
1716 part of any dictionary, regardless of whether @var{old_var} was in a
1717 dictionary. Use @func{dict_clone_var}, instead, to add a clone of a
1718 variable to a dictionary.
1720 Second, the new variable is not given any short name, even if
1721 @var{old_var} had a short name. This is because the new variable is
1722 likely to be immediately renamed, in which case the short name would
1723 be incorrect (@pxref{Variable Short Names}).
1725 Finally, @var{old_var}'s auxiliary data, if any, is not copied to the
1726 new variable (@pxref{Variable Auxiliary Data}).
1729 @deftypefun {void} var_destroy (struct variable *@var{var})
1730 Destroys @var{var} and frees all associated storage, including its
1731 auxiliary data, if any. @var{var} must not be part of a dictionary.
1732 To delete a variable from a dictionary and destroy it, use
1733 @func{dict_delete_var} (@pxref{Dictionary Deleting Variables}).
1736 @node Variable Short Names
1737 @subsection Variable Short Names
1739 PSPP variable names may be up to 64 (@code{ID_MAX_LEN}) bytes long.
1740 The system and portable file formats, however, were designed when
1741 variable names were limited to 8 bytes in length. Since then, the
1742 system file format has been augmented with an extension record that
1743 explains how the 8-byte short names map to full-length names
1744 (@pxref{Long Variable Names Record}), but the short names are still
1745 present. Thus, the continued presence of the short names is more or
1746 less invisible to PSPP users, but every variable in a system file
1747 still has a short name that must be unique.
1749 PSPP can generate unique short names for variables based on their full
1750 names at the time it creates the data file. If all variables' full
1751 names are unique in their first 8 bytes, then the short names are
1752 simply prefixes of the full names; otherwise, PSPP changes them so
1753 that they are unique.
1755 By itself this algorithm interoperates well with other software that
1756 can read system files, as long as that software understands the
1757 extension record that maps short names to long names. When the other
1758 software does not understand the extension record, it can produce
1759 surprising results. Consider a situation where PSPP reads a system
file that contains a variable named RANKINGSCORE, then the user
1761 adds a new variable named RANKINGSTATUS, then saves the modified data
1762 as a new system file. A program that does not understand long names
1763 would then see one of these variables under the name RANKINGS---either
1764 one, depending on the algorithm's details---and the other under a
1765 different name. The effect could be very confusing: by adding a new
1766 and apparently unrelated variable in PSPP, the user effectively
1767 renamed the existing variable.
1769 To counteract this potential problem, every @struct{variable} may have
1770 a short name. A variable created by the system or portable file
1771 reader receives the short name from that data file. When a variable
1772 with a short name is written to a system or portable file, that
variable receives priority over other variables whose names begin
1774 with the same 8 bytes but which were not read from a data file under
1777 Variables not created by the system or portable file reader have no
1778 short name by default.
1780 A variable with a full name of 8 bytes or less in length has absolute
1781 priority for that name when the variable is written to a system file,
1782 even over a second variable with that assigned short name.
1784 PSPP does not enforce uniqueness of short names, although the short
1785 names read from any given data file will always be unique. If two
1786 variables with the same short name are written to a single data file,
1787 neither one receives priority.
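
The way two long names come to compete for one short name can be
illustrated with a small standalone sketch.  It is not PSPP's
short-name assignment algorithm: the @code{sketch_} functions are
hypothetical, and they only compute the 8-byte uppercase prefix of a
name, treating the name as ASCII (an assumption of the sketch).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define SKETCH_SHORT_NAME_LEN 8

/* Stores in SHORT_NAME the first 8 bytes of FULL_NAME, converted to
   uppercase.  Treats the name as ASCII, which is an assumption of this
   sketch. */
static void
sketch_short_name (const char *full_name,
                   char short_name[SKETCH_SHORT_NAME_LEN + 1])
{
  size_t i;
  for (i = 0; i < SKETCH_SHORT_NAME_LEN && full_name[i] != '\0'; i++)
    short_name[i] = toupper ((unsigned char) full_name[i]);
  short_name[i] = '\0';
}

/* Returns nonzero if A and B would compete for the same short name. */
static int
sketch_short_names_collide (const char *a, const char *b)
{
  char sa[SKETCH_SHORT_NAME_LEN + 1];
  char sb[SKETCH_SHORT_NAME_LEN + 1];
  sketch_short_name (a, sa);
  sketch_short_name (b, sb);
  return strcmp (sa, sb) == 0;
}
```

With these helpers, RANKINGSCORE and RANKINGSTATUS both reduce to
RANKINGS, which is exactly the situation in which a remembered short
name decides which variable keeps that prefix.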
1789 The following macros and functions relate to short names.
1791 @defmac SHORT_NAME_LEN
1792 Maximum length of a short name, in bytes. Its value is 8.
1795 @deftypefun {const char *} var_get_short_name (const struct variable *@var{var})
1796 Returns @var{var}'s short name, or a null pointer if @var{var} has not
1797 been assigned a short name.
1800 @deftypefun void var_set_short_name (struct variable *@var{var}, const char *@var{short_name})
1801 Sets @var{var}'s short name to @var{short_name}, or removes
1802 @var{var}'s short name if @var{short_name} is a null pointer. If it
1803 is non-null, then @var{short_name} must be a plausible name for a
1804 variable. The name will be truncated
1805 to 8 bytes in length and converted to all-uppercase.
1808 @deftypefun void var_clear_short_name (struct variable *@var{var})
1809 Removes @var{var}'s short name.
1812 @node Variable Relationships
1813 @subsection Variable Relationships
1815 Variables have close relationships with dictionaries
1816 (@pxref{Dictionaries}) and cases (@pxref{Cases}). A variable is
1817 usually a member of some dictionary, and a case is often used to store
1818 data for the set of variables in a dictionary.
1820 These functions report on these relationships. They may be applied
1821 only to variables that are in a dictionary.
1823 @deftypefun size_t var_get_dict_index (const struct variable *@var{var})
1824 Returns @var{var}'s index within its dictionary. The first variable
1825 in a dictionary has index 0, the next variable index 1, and so on.
1827 The dictionary index can be influenced using dictionary functions such
as @func{dict_reorder_var} (@pxref{dict_reorder_var}).
1831 @deftypefun size_t var_get_case_index (const struct variable *@var{var})
1832 Returns @var{var}'s index within a case. The case index is an index
1833 into an array of @union{value} large enough to contain all the data in
1836 The returned case index can be used to access the value of @var{var}
1837 within a case for its dictionary, as in e.g.@: @code{case_data_idx
1838 (case, var_get_case_index (@var{var}))}, but ordinarily it is more
1839 convenient to use the data access functions that do variable-to-index
1840 translation internally, as in e.g.@: @code{case_data (case,
1844 @node Variable Auxiliary Data
1845 @subsection Variable Auxiliary Data
1847 Each @struct{variable} can have a single pointer to auxiliary data of
1848 type @code{void *}. These functions manipulate a variable's auxiliary
1851 Use of auxiliary data is discouraged because of its lack of
1852 flexibility. Only one client can make use of auxiliary data on a
1853 given variable at any time, even though many clients could usefully
1854 associate data with a variable.
1856 To prevent multiple clients from attempting to use a variable's single
1857 auxiliary data field at the same time, we adopt the convention that
1858 use of auxiliary data in the active dataset dictionary is restricted to
1859 the currently executing command. In particular, transformations must
1860 not attach auxiliary data to a variable in the active dataset in the
1861 expectation that it can be used later when the active dataset is read and
1862 the transformation is executed. To help enforce this restriction,
1863 auxiliary data is deleted from all variables in the active dataset
1864 dictionary after the execution of each PSPP command.
1866 This convention for safe use of auxiliary data applies only to the
1867 active dataset dictionary. Rules for other dictionaries may be
1868 established separately.
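
The ownership rules for the single auxiliary pointer can be modeled
with a small standalone sketch.  It mirrors the attach, clear, and
detach operations documented in this section, but it is not PSPP's
implementation: @code{struct sketch_aux_var} and the @code{sketch_}
functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the auxiliary-data slot of a variable. */
struct sketch_aux_var
  {
    void *aux;                                    /* Data, or NULL. */
    void (*aux_dtor) (struct sketch_aux_var *);   /* Destructor, or NULL. */
  };

/* Attaches AUX, which must be non-null, to V, which must not already
   have auxiliary data.  Returns AUX. */
static void *
sketch_attach_aux (struct sketch_aux_var *v, void *aux,
                   void (*aux_dtor) (struct sketch_aux_var *))
{
  assert (aux != NULL);
  assert (v->aux == NULL);
  v->aux = aux;
  v->aux_dtor = aux_dtor;
  return aux;
}

/* Removes V's auxiliary data, if any, calling the destructor first. */
static void
sketch_clear_aux (struct sketch_aux_var *v)
{
  if (v->aux != NULL)
    {
      if (v->aux_dtor != NULL)
        v->aux_dtor (v);
      v->aux = NULL;
      v->aux_dtor = NULL;
    }
}

/* Removes and returns V's auxiliary data without calling the
   destructor; the caller takes over ownership. */
static void *
sketch_detach_aux (struct sketch_aux_var *v)
{
  void *aux = v->aux;
  v->aux = NULL;
  v->aux_dtor = NULL;
  return aux;
}

/* Destructor that frees auxiliary data allocated with malloc. */
static void
sketch_dtor_free (struct sketch_aux_var *v)
{
  free (v->aux);
}
```

The assertion in @code{sketch_attach_aux} is what makes the
one-client-at-a-time convention visible: a second client that tries to
attach data while the slot is occupied fails immediately instead of
silently overwriting the first client's pointer.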
1870 Auxiliary data should be replaced by a more flexible mechanism at some
1871 point, but no replacement mechanism has been designed or implemented
1874 The following functions work with variable auxiliary data.
1876 @deftypefun {void *} var_get_aux (const struct variable *@var{var})
1877 Returns @var{var}'s auxiliary data, or a null pointer if none has been
1881 @deftypefun {void *} var_attach_aux (const struct variable *@var{var}, void *@var{aux}, void (*@var{aux_dtor}) (struct variable *))
1882 Sets @var{var}'s auxiliary data to @var{aux}, which must not be null.
1883 @var{var} must not already have auxiliary data.
1885 Before @var{var}'s auxiliary data is cleared by @code{var_clear_aux},
1886 @var{aux_dtor}, if non-null, will be called with @var{var} as its
1887 argument. It should free any storage associated with @var{aux}, if
1888 necessary. @code{var_dtor_free} may be appropriate for use as
1891 @deffn {Function} void var_dtor_free (struct variable *@var{var})
1892 Frees @var{var}'s auxiliary data by calling @code{free}.
1896 @deftypefun void var_clear_aux (struct variable *@var{var})
1897 Removes auxiliary data, if any, from @var{var}, first calling the
1898 destructor passed to @code{var_attach_aux}, if one was provided.
1900 Use @code{dict_clear_aux} to remove auxiliary data from every variable
1901 in a dictionary. @c (@pxref{dict_clear_aux}).
1904 @deftypefun {void *} var_detach_aux (struct variable *@var{var})
1905 Removes auxiliary data, if any, from @var{var}, and returns it.
1906 Returns a null pointer if @var{var} had no auxiliary data.
1908 Any destructor passed to @code{var_attach_aux} is not called, so the
1909 caller is responsible for freeing storage associated with the returned
1913 @node Variable Categorical Values
1914 @subsection Variable Categorical Values
1916 Some statistical procedures require a list of all the values that a
1917 categorical variable takes on. Arranging such a list requires making
1918 a pass through the data, so PSPP caches categorical values in
1921 When variable auxiliary data is revamped to support multiple clients
1922 as described in the previous section, categorical values are an
1923 obvious candidate. The form in which they are currently supported is
1926 Categorical values are not robust against changes in the data. That
1927 is, there is currently no way to detect that a transformation has
1928 changed data values, meaning that categorical values lists for the
1929 changed variables must be recomputed. PSPP is in fact in need of a
1930 general-purpose caching and cache-invalidation mechanism, but none
1931 has yet been designed and built.
1933 The following functions work with cached categorical values.
1935 @deftypefun {struct cat_vals *} var_get_obs_vals (const struct variable *@var{var})
1936 Returns @var{var}'s set of categorical values. Yields undefined
1937 behavior if @var{var} does not have any categorical values.
1940 @deftypefun void var_set_obs_vals (const struct variable *@var{var}, struct cat_vals *@var{cat_vals})
1941 Destroys @var{var}'s categorical values, if any, and replaces them by
1942 @var{cat_vals}, ownership of which is transferred to @var{var}. If
1943 @var{cat_vals} is a null pointer, then @var{var}'s categorical values
1947 @deftypefun bool var_has_obs_vals (const struct variable *@var{var})
1948 Returns true if @var{var} has a set of categorical values, false
1953 @section Dictionaries
1955 Each data file in memory or on disk has an associated dictionary,
1956 whose primary purpose is to describe the data in the file.
1957 @xref{Variables,,,pspp, PSPP Users Guide}, for a PSPP user's view of a
1960 A data file stored in a PSPP format, either as a system or portable
1961 file, has a representation of its dictionary embedded in it. Other
1962 kinds of data files are usually not self-describing enough to
1963 construct a dictionary unassisted, so the dictionaries for these files
1964 must be specified explicitly with PSPP commands such as @cmd{DATA
1967 The most important content of a dictionary is an array of variables,
1968 which must have unique names. A dictionary also conceptually contains
1969 a mapping from each of its variables to a location within a case
1970 (@pxref{Cases}), although in fact these mappings are stored within
1971 individual variables.
1973 System variables are not members of any dictionary (@pxref{System
1974 Variables,,,pspp, PSPP Users Guide}).
1976 Dictionaries are represented by @struct{dictionary}. Declarations
1977 related to dictionaries are in the @file{<data/dictionary.h>} header.
1979 The following sections describe functions for use with dictionaries.
1982 * Dictionary Variable Access::
1983 * Dictionary Creating Variables::
1984 * Dictionary Deleting Variables::
1985 * Dictionary Reordering Variables::
1986 * Dictionary Renaming Variables::
1987 * Dictionary Weight Variable::
1988 * Dictionary Filter Variable::
1989 * Dictionary Case Limit::
1990 * Dictionary Split Variables::
1991 * Dictionary File Label::
1992 * Dictionary Documents::
1995 @node Dictionary Variable Access
1996 @subsection Accessing Variables
1998 The most common operations on a dictionary simply retrieve a
1999 @code{struct variable *} of an individual variable based on its name
2002 @deftypefun {struct variable *} dict_lookup_var (const struct dictionary *@var{dict}, const char *@var{name})
2003 @deftypefunx {struct variable *} dict_lookup_var_assert (const struct dictionary *@var{dict}, const char *@var{name})
2004 Looks up and returns the variable with the given @var{name} within
2005 @var{dict}. Name lookup is not case-sensitive.
2007 @code{dict_lookup_var} returns a null pointer if @var{dict} does not
2008 contain a variable named @var{name}. @code{dict_lookup_var_assert}
2009 asserts that such a variable exists.
2012 @deftypefun {struct variable *} dict_get_var (const struct dictionary *@var{dict}, size_t @var{position})
2013 Returns the variable at the given @var{position} in @var{dict}.
2014 @var{position} must be less than the number of variables in @var{dict}
2018 @deftypefun size_t dict_get_var_cnt (const struct dictionary *@var{dict})
2019 Returns the number of variables in @var{dict}.
2022 Another pair of functions allows retrieving a number of variables at
2023 once. These functions are more rarely useful.
2025 @deftypefun void dict_get_vars (const struct dictionary *@var{dict}, const struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2026 @deftypefunx void dict_get_vars_mutable (const struct dictionary *@var{dict}, struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2027 Retrieves all of the variables in @var{dict}, in their original order,
except that any variables in the dictionary classes specified by
2029 @var{exclude}, if any, are excluded (@pxref{Dictionary Class}).
2030 Pointers to the variables are stored in an array allocated with
2031 @code{malloc}, and a pointer to the first element of this array is
2032 stored in @code{*@var{vars}}. The caller is responsible for freeing
2033 this memory when it is no longer needed. The number of variables
2034 retrieved is stored in @code{*@var{cnt}}.
2036 The presence or absence of @code{DC_SYSTEM} in @var{exclude} has no
2037 effect, because dictionaries never include system variables.
2040 One additional function is available. This function is most often
2041 used in assertions, but it is not restricted to such use.
2043 @deftypefun bool dict_contains_var (const struct dictionary *@var{dict}, const struct variable *@var{var})
2044 Tests whether @var{var} is one of the variables in @var{dict}.
2045 Returns true if so, false otherwise.
2048 @node Dictionary Creating Variables
2049 @subsection Creating Variables
2051 These functions create a new variable and insert it into a dictionary
2054 There is no provision for inserting an already created variable into a
2055 dictionary. There is no reason that such a function could not be
2056 written, but so far there has been no need for one.
2058 The names provided to one of these functions should be valid variable
2059 names and must be plausible variable names. @c (@pxref{Variable Names}).
2061 If a variable with the same name already exists in the dictionary, the
2062 non-@code{assert} variants of these functions return a null pointer,
2063 without modifying the dictionary. The @code{assert} variants, on the
2064 other hand, assert that no duplicate name exists.
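
The duplicate-name rule for the non-@code{assert} variants can be
sketched with a miniature dictionary that stores only names.  This is
an illustration under stated assumptions, not PSPP code:
@code{struct sketch_dict} and the @code{sketch_} functions are
hypothetical, the fixed capacity is an artifact of the sketch, and
names are compared without regard to ASCII case because dictionary
name lookup is not case-sensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_VARS 16
#define SKETCH_NAME_MAX 64

/* Hypothetical miniature dictionary: just an array of names. */
struct sketch_dict
  {
    char names[SKETCH_MAX_VARS][SKETCH_NAME_MAX + 1];
    size_t n;
  };

/* Returns nonzero if A and B are equal without regard to ASCII case. */
static int
sketch_names_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (toupper ((unsigned char) *a) != toupper ((unsigned char) *b))
      return 0;
  return *a == *b;
}

/* Returns the stored name matching NAME, or a null pointer if none. */
static const char *
sketch_lookup (const struct sketch_dict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (sketch_names_equal (d->names[i], name))
      return d->names[i];
  return NULL;
}

/* Appends NAME to D and returns the stored copy, or returns a null
   pointer, leaving D unchanged, if NAME duplicates an existing name,
   is too long, or D is full. */
static const char *
sketch_create (struct sketch_dict *d, const char *name)
{
  if (d->n >= SKETCH_MAX_VARS || strlen (name) > SKETCH_NAME_MAX
      || sketch_lookup (d, name) != NULL)
    return NULL;
  strcpy (d->names[d->n], name);
  return d->names[d->n++];
}
```

A caller of the non-asserting form is expected to check the result,
for example by reporting an error to the user when a null pointer
comes back.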
2066 A variable may be in only one dictionary at any given time.
2068 @deftypefun {struct variable *} dict_create_var (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2069 @deftypefunx {struct variable *} dict_create_var_assert (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2070 Creates a new variable with the given @var{name} and @var{width}, as
2071 if through a call to @code{var_create} with those arguments
2072 (@pxref{var_create}), appends the new variable to @var{dict}'s array
2073 of variables, and returns the new variable.
2076 @deftypefun {struct variable *} dict_clone_var (struct dictionary *@var{dict}, const struct variable *@var{old_var})
2077 @deftypefunx {struct variable *} dict_clone_var_assert (struct dictionary *@var{dict}, const struct variable *@var{old_var})
Creates a new variable as a clone of @var{old_var}, inserts the new
2079 variable into @var{dict}, and returns the new variable. Other
2080 properties of the new variable are copied from @var{old_var}, except
2081 for those not copied by @code{var_clone} (@pxref{var_clone}).
@var{old_var} does not need to be a member of any dictionary.
2086 @deftypefun {struct variable *} dict_clone_var_as (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2087 @deftypefunx {struct variable *} dict_clone_var_as_assert (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2088 These functions are similar to @code{dict_clone_var} and
2089 @code{dict_clone_var_assert}, respectively, except that the new
2090 variable is named @var{name} instead of keeping @var{old_var}'s name.
2093 @node Dictionary Deleting Variables
2094 @subsection Deleting Variables
2096 These functions remove variables from a dictionary's array of
2097 variables. They also destroy the removed variables and free their
2100 Deleting a variable to which there might be external pointers is a bad
2101 idea. In particular, deleting variables from the active dataset
2102 dictionary is a risky proposition, because transformations can retain
2103 references to arbitrary variables. Therefore, no variable should be
2104 deleted from the active dataset dictionary when any transformations are
2105 active, because those transformations might reference the variable to
2106 be deleted. The safest time to delete a variable is just after a
2107 procedure has been executed, as done by @cmd{DELETE VARIABLES}.
2109 Deleting a variable automatically removes references to that variable
2110 from elsewhere in the dictionary as a weighting variable, filter
2111 variable, @cmd{SPLIT FILE} variable, or member of a vector.
2113 No functions are provided for removing a variable from a dictionary
2114 without destroying that variable. As with insertion of an existing
2115 variable, there is no reason that this could not be implemented, but
2116 so far there has been no need.
2118 @deftypefun void dict_delete_var (struct dictionary *@var{dict}, struct variable *@var{var})
2119 Deletes @var{var} from @var{dict}, of which it must be a member.
2122 @deftypefun void dict_delete_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{count})
2123 Deletes the @var{count} variables in array @var{vars} from @var{dict}.
2124 All of the variables in @var{vars} must be members of @var{dict}. No
2125 variable may be included in @var{vars} more than once.
2128 @deftypefun void dict_delete_consecutive_vars (struct dictionary *@var{dict}, size_t @var{idx}, size_t @var{count})
2129 Deletes the variables in sequential positions
2130 @var{idx}@dots{}@var{idx} + @var{count} (exclusive) from @var{dict},
2131 which must contain at least @var{idx} + @var{count} variables.
2134 @deftypefun void dict_delete_scratch_vars (struct dictionary *@var{dict})
2135 Deletes all scratch variables from @var{dict}.
2138 @node Dictionary Reordering Variables
2139 @subsection Changing Variable Order
2141 The variables in a dictionary are stored in an array. These functions
2142 change the order of a dictionary's array of variables without changing
2143 which variables are in the dictionary.
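
Moving one element of an array to a new position while every other
element keeps its relative order can be sketched as below.  The
function is a hypothetical model operating on an array of names; it
takes the element's current index directly, which is an assumption of
the sketch rather than a description of PSPP's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Moves the element at OLD_INDEX of the N-element array VARS to
   NEW_INDEX, shifting the elements in between by one position so that
   all other elements keep their relative order. */
static void
sketch_reorder (const char **vars, size_t n,
                size_t old_index, size_t new_index)
{
  assert (old_index < n && new_index < n);

  const char *moved = vars[old_index];
  if (new_index > old_index)
    memmove (&vars[old_index], &vars[old_index + 1],
             (new_index - old_index) * sizeof *vars);
  else
    memmove (&vars[new_index + 1], &vars[new_index],
             (old_index - new_index) * sizeof *vars);
  vars[new_index] = moved;
}
```

Only the elements between the old and new positions shift, so the cost
of a single move is proportional to the distance moved rather than to
the size of the whole array.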
2145 @anchor{dict_reorder_var}
2146 @deftypefun void dict_reorder_var (struct dictionary *@var{dict}, struct variable *@var{var}, size_t @var{new_index})
2147 Moves @var{var}, which must be in @var{dict}, so that it is at
2148 position @var{new_index} in @var{dict}'s array of variables. Other
2149 variables in @var{dict}, if any, retain their relative positions.
2150 @var{new_index} must be less than the number of variables in
2154 @deftypefun void dict_reorder_vars (struct dictionary *@var{dict}, struct variable *const *@var{new_order}, size_t @var{count})
2155 Moves the @var{count} variables in @var{new_order} to the beginning of
2156 @var{dict}'s array of variables in the specified order. Other
2157 variables in @var{dict}, if any, retain their relative positions.
2159 All of the variables in @var{new_order} must be in @var{dict}. No
2160 duplicates are allowed within @var{new_order}, which means that
2161 @var{count} must be no greater than the number of variables in
2165 @node Dictionary Renaming Variables
2166 @subsection Renaming Variables
2168 These functions change the names of variables within a dictionary.
2169 The @func{var_set_name} function (@pxref{var_set_name}) cannot be
2170 applied directly to a variable that is in a dictionary, because
2171 @struct{dictionary} contains an index by name that @func{var_set_name}
2172 would not update. The following functions take care to update the
2173 index as well. They also ensure that variable renaming does not cause
2174 a dictionary to contain a duplicate variable name.
2176 @deftypefun void dict_rename_var (struct dictionary *@var{dict}, struct variable *@var{var}, const char *@var{new_name})
2177 Changes the name of @var{var}, which must be in @var{dict}, to
2178 @var{new_name}. A variable named @var{new_name} must not already be
2179 in @var{dict}, unless @var{new_name} is the same as @var{var}'s
@deftypefun bool dict_rename_vars (struct dictionary *@var{dict}, struct variable **@var{vars}, char **@var{new_names}, size_t @var{count}, char **@var{err_name})
2184 Renames each of the @var{count} variables in @var{vars} to the name in
2185 the corresponding position of @var{new_names}. If the renaming would
2186 result in a duplicate variable name, returns false and stores one of
the names that would be duplicated into @code{*@var{err_name}}, if
2188 @var{err_name} is non-null. Otherwise, the renaming is successful,
2189 and true is returned.
2192 @node Dictionary Weight Variable
2193 @subsection Weight Variable
2195 A data set's cases may optionally be weighted by the value of a
2196 numeric variable. @xref{WEIGHT,,,pspp, PSPP Users Guide}, for a user
2197 view of weight variables.
2199 The weight variable is written to and read from system and portable
2202 The most commonly useful function related to weighting is a
2203 convenience function to retrieve a weighting value from a case.
2205 @deftypefun double dict_get_case_weight (const struct dictionary *@var{dict}, const struct ccase *@var{case}, bool *@var{warn_on_invalid})
2206 Retrieves and returns the value of the weighting variable specified by
2207 @var{dict} from @var{case}. Returns 1.0 if @var{dict} has no
Returns 0.0 if @var{case}'s weight value is user- or system-missing,
2211 zero, or negative. In such a case, if @var{warn_on_invalid} is
2212 non-null and @code{*@var{warn_on_invalid}} is true,
2213 @func{dict_get_case_weight} also issues an error message and sets
2214 @code{*@var{warn_on_invalid}} to false. To disable error reporting,
2215 pass a null pointer or a pointer to false as @var{warn_on_invalid} or
2216 use a @func{msg_disable}/@func{msg_enable} pair.
2219 The dictionary also has a pair of functions for getting and setting
2220 the weight variable.
2222 @deftypefun {struct variable *} dict_get_weight (const struct dictionary *@var{dict})
2223 Returns @var{dict}'s current weighting variable, or a null pointer if
2224 the dictionary does not have a weighting variable.
2227 @deftypefun void dict_set_weight (struct dictionary *@var{dict}, struct variable *@var{var})
2228 Sets @var{dict}'s weighting variable to @var{var}. If @var{var} is
2229 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2230 is null, then @var{dict}'s weighting variable, if any, is cleared.
2233 @node Dictionary Filter Variable
2234 @subsection Filter Variable
2236 When the active dataset is read by a procedure, cases can be excluded
2237 from analysis based on the values of a @dfn{filter variable}.
2238 @xref{FILTER,,,pspp, PSPP Users Guide}, for a user view of filtering.
2240 These functions store and retrieve the filter variable. They are
2241 rarely useful, because the data analysis framework automatically
2242 excludes from analysis the cases that should be filtered.
2244 @deftypefun {struct variable *} dict_get_filter (const struct dictionary *@var{dict})
2245 Returns @var{dict}'s current filter variable, or a null pointer if the
2246 dictionary does not have a filter variable.
2249 @deftypefun void dict_set_filter (struct dictionary *@var{dict}, struct variable *@var{var})
2250 Sets @var{dict}'s filter variable to @var{var}. If @var{var} is
2251 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2252 is null, then @var{dict}'s filter variable, if any, is cleared.
2255 @node Dictionary Case Limit
2256 @subsection Case Limit
2258 The limit on cases analyzed by a procedure, set by the @cmd{N OF
2259 CASES} command (@pxref{N OF CASES,,,pspp, PSPP Users Guide}), is
2260 stored as part of the dictionary. The dictionary does not, on the
2261 other hand, play any role in enforcing the case limit (a job done by
the data analysis framework code).
2264 A case limit of 0 means that the number of cases is not limited.
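
The meaning of a zero limit can be made concrete with a short sketch
of how framework code might apply it.  The function and the
@code{sketch_casenumber} type are hypothetical; in particular, the
underlying integer type chosen here is an assumption.

```c
#include <assert.h>

/* Stand-in for PSPP's casenumber type; the underlying type is an
   assumption of this sketch. */
typedef long sketch_casenumber;

/* Returns how many of N_CASES cases a procedure would analyze under
   LIMIT, where a LIMIT of 0 means "no limit". */
static sketch_casenumber
sketch_cases_to_analyze (sketch_casenumber n_cases, sketch_casenumber limit)
{
  assert (n_cases >= 0 && limit >= 0);
  return limit == 0 || n_cases < limit ? n_cases : limit;
}
```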
2266 These functions are rarely useful, because the data analysis framework
2267 automatically excludes from analysis any cases beyond the limit.
2269 @deftypefun casenumber dict_get_case_limit (const struct dictionary *@var{dict})
2270 Returns the current case limit for @var{dict}.
2273 @deftypefun void dict_set_case_limit (struct dictionary *@var{dict}, casenumber @var{limit})
2274 Sets @var{dict}'s case limit to @var{limit}.
2277 @node Dictionary Split Variables
2278 @subsection Split Variables
2280 The user may use the @cmd{SPLIT FILE} command (@pxref{SPLIT
2281 FILE,,,pspp, PSPP Users Guide}) to select a set of variables on which
2282 to split the active dataset into groups of cases to be analyzed
2283 independently in each statistical procedure. The set of split
2284 variables is stored as part of the dictionary, although the effect on
2285 data analysis is implemented by each individual statistical procedure.
2287 Split variables may be numeric, short string, or long string variables.
2289 The most useful functions for split variables are those to retrieve
2290 them. Even these functions are rarely useful directly: for the
2291 purpose of breaking cases into groups based on the values of the split
2292 variables, it is usually easier to use
2293 @func{casegrouper_create_splits}.
2295 @deftypefun {const struct variable *const *} dict_get_split_vars (const struct dictionary *@var{dict})
2296 Returns a pointer to an array of pointers to split variables. If and
2297 only if there are no split variables, returns a null pointer. The
2298 caller must not modify or free the returned array.
2301 @deftypefun size_t dict_get_split_cnt (const struct dictionary *@var{dict})
2302 Returns the number of split variables.
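The two retrieval functions are typically used together. The sketch below walks the array to test whether a given variable is a split variable. To keep it compilable on its own, @code{struct variable}, @code{struct dictionary}, and the bodies of @func{dict_get_split_vars} and @func{dict_get_split_cnt} are simplified stand-ins that only mimic the documented behavior; they are not PSPP's real definitions. The helper @code{is_split_var} is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins so that the sketch compiles without PSPP.  The real
   structures and functions are defined by PSPP itself. */
struct variable { const char *name; };
struct dictionary
  {
    const struct variable *const *split;
    size_t split_cnt;
  };

const struct variable *const *
dict_get_split_vars (const struct dictionary *dict)
{
  /* Null if and only if there are no split variables. */
  return dict->split_cnt > 0 ? dict->split : NULL;
}

size_t
dict_get_split_cnt (const struct dictionary *dict)
{
  return dict->split_cnt;
}

/* Hypothetical helper: returns true if VAR is one of DICT's split
   variables.  The loop body never runs when the count is 0, so the
   null array pointer is never dereferenced. */
bool
is_split_var (const struct dictionary *dict, const struct variable *var)
{
  const struct variable *const *vars = dict_get_split_vars (dict);
  size_t n = dict_get_split_cnt (dict);

  for (size_t i = 0; i < n; i++)
    if (vars[i] == var)
      return true;
  return false;
}
```

Bounding the loop by the count, rather than testing the array pointer, handles the empty case without a separate null check.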
2305 The following functions are also available for working with split variables.
2308 @deftypefun void dict_set_split_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{cnt})
2309 Sets @var{dict}'s split variables to the @var{cnt} variables in
2310 @var{vars}. If @var{cnt} is 0, then @var{dict} will not have any
2311 split variables. The caller retains ownership of @var{vars}.
2314 @deftypefun void dict_unset_split_var (struct dictionary *@var{dict}, struct variable *@var{var})
2315 Removes @var{var}, which must be a variable in @var{dict}, from
2316 @var{dict}'s set of split variables.
2319 @node Dictionary File Label
2320 @subsection File Label
2322 A dictionary may optionally have an associated string that describes
2323 its contents, called its file label. The user may set the file label
2324 with the @cmd{FILE LABEL} command (@pxref{FILE LABEL,,,pspp, PSPP Users Guide}).
2327 These functions set and retrieve the file label.
2329 @deftypefun {const char *} dict_get_label (const struct dictionary *@var{dict})
2330 Returns @var{dict}'s file label. If @var{dict} does not have a label,
2331 returns a null pointer.
2334 @deftypefun void dict_set_label (struct dictionary *@var{dict}, const char *@var{label})
2335 Sets @var{dict}'s label to @var{label}. If @var{label} is non-null,
2336 then its content, truncated to at most 60 bytes, becomes the new file
2337 label. If @var{label} is null, then @var{dict}'s label is removed.
2339 The caller retains ownership of @var{label}.
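The copy-and-truncate behavior described for @func{dict_set_label} can be modeled as follows. This is a minimal sketch, not PSPP's implementation: @code{copy_file_label} and @code{FILE_LABEL_MAX} are hypothetical names, and the sketch cuts at a plain byte boundary without regard to the character encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; the 60-byte limit itself comes from the text. */
enum { FILE_LABEL_MAX = 60 };

/* Hypothetical model of the copying done when a label is set.
   Returns a newly allocated copy of LABEL truncated to at most
   FILE_LABEL_MAX bytes, or a null pointer if LABEL is null.  The
   caller keeps ownership of LABEL and must free the result. */
char *
copy_file_label (const char *label)
{
  if (label == NULL)
    return NULL;

  size_t len = strlen (label);
  if (len > FILE_LABEL_MAX)
    len = FILE_LABEL_MAX;

  char *copy = malloc (len + 1);
  if (copy == NULL)
    abort ();
  memcpy (copy, label, len);
  copy[len] = '\0';
  return copy;
}
```

Because the label is copied, the caller may free or reuse its own string immediately after the call.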
2342 @node Dictionary Documents
2343 @subsection Documents
2345 A dictionary may include an arbitrary number of lines of explanatory
2346 text, called the dictionary's documents. For compatibility, document
2347 lines have a fixed width, and lines that are not exactly this width
2348 are truncated or padded with spaces as necessary to bring them to the
2351 PSPP users can use the @cmd{DOCUMENT} (@pxref{DOCUMENT,,,pspp, PSPP
2352 Users Guide}), @cmd{ADD DOCUMENT} (@pxref{ADD DOCUMENT,,,pspp, PSPP
2353 Users Guide}), and @cmd{DROP DOCUMENTS} (@pxref{DROP DOCUMENTS,,,pspp,
2354 PSPP Users Guide}) commands to manipulate documents.
2356 @deftypefn Macro int DOC_LINE_LENGTH
2357 The fixed length of a document line, in bytes, defined as 80.
2360 The following functions work with whole sets of documents. They
2361 accept or return sets of documents formatted as null-terminated
2362 strings that are an exact multiple of @code{DOC_LINE_LENGTH} bytes in length.
2365 @deftypefun {const char *} dict_get_documents (const struct dictionary *@var{dict})
2366 Returns the documents in @var{dict}, or a null pointer if @var{dict} has no documents.
2370 @deftypefun void dict_set_documents (struct dictionary *@var{dict}, const char *@var{new_documents})
2371 Sets @var{dict}'s documents to @var{new_documents}. If
2372 @var{new_documents} is a null pointer or an empty string, then
2373 @var{dict}'s documents are cleared. The caller retains ownership of
2374 @var{new_documents}.
2377 @deftypefun void dict_clear_documents (struct dictionary *@var{dict})
2378 Clears the documents from @var{dict}.
2381 The following functions work with individual lines in a dictionary's documents.
2384 @deftypefun void dict_add_document_line (struct dictionary *@var{dict}, const char *@var{content})
2385 Appends @var{content} to the documents in @var{dict}. The text in
2386 @var{content} will be truncated or padded with spaces as necessary to
2387 make it exactly @code{DOC_LINE_LENGTH} bytes long. The caller retains
2388 ownership of @var{content}.
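The truncate-or-pad step can be sketched as below. This only models the fixed-width rule and is not PSPP's implementation: @code{format_document_line} is a hypothetical helper, and @code{DOC_LINE_LENGTH} is redefined locally, with the value given above, so that the sketch compiles without PSPP's headers.

```c
#include <assert.h>
#include <string.h>

/* Redefined locally for the sketch; the value 80 comes from the text. */
#define DOC_LINE_LENGTH 80

/* Hypothetical helper.  Fills LINE with exactly DOC_LINE_LENGTH
   bytes taken from CONTENT, truncating it or padding it on the right
   with spaces as necessary.  LINE is not null-terminated.  Returns
   nonzero if CONTENT had to be truncated. */
int
format_document_line (const char *content, char line[DOC_LINE_LENGTH])
{
  assert (content != NULL && line != NULL);

  size_t len = strlen (content);
  size_t n = len < DOC_LINE_LENGTH ? len : DOC_LINE_LENGTH;

  memcpy (line, content, n);
  memset (line + n, ' ', DOC_LINE_LENGTH - n);
  return len > DOC_LINE_LENGTH;
}
```

The return value marks the over-length case, which a caller could use to decide whether to report truncation.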
2390 If @var{content} is longer than @code{DOC_LINE_LENGTH} bytes, this function also
2391 issues a warning using @func{msg}. To suppress the warning, enclose the
2392 call to this function in a @func{msg_disable}/@func{msg_enable} pair.
2396 @deftypefun size_t dict_get_document_line_cnt (const struct dictionary *@var{dict})
2397 Returns the number of lines of documents in @var{dict}. If the
2398 dictionary contains no documents, returns 0.
2401 @deftypefun void dict_get_document_line (const struct dictionary *@var{dict}, size_t @var{idx}, struct string *@var{content})
2402 Replaces the text in @var{content} (which must already have been
2403 initialized by the caller) by the document line in @var{dict} numbered
2404 @var{idx}, which must be less than the number of lines of documents in
2405 @var{dict}. Any trailing white space in the document line is trimmed,
2406 so that @var{content} will have a length between 0 and
2407 @code{DOC_LINE_LENGTH}.
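To make the fixed-width layout concrete, the following sketch counts and extracts lines from a documents string of the kind returned by @func{dict_get_documents}. It is a model under stated assumptions rather than PSPP's code: @code{count_document_lines} and @code{get_document_line} are hypothetical helpers, @code{DOC_LINE_LENGTH} is redefined locally, a plain @code{char} buffer replaces @code{struct string}, and only trailing space characters are trimmed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Redefined locally for the sketch; the value 80 comes from the text. */
#define DOC_LINE_LENGTH 80

/* Hypothetical helper.  Returns the number of lines in DOCUMENTS,
   a null-terminated string whose length is an exact multiple of
   DOC_LINE_LENGTH, or 0 if DOCUMENTS is null or empty. */
size_t
count_document_lines (const char *documents)
{
  size_t len = documents != NULL ? strlen (documents) : 0;
  assert (len % DOC_LINE_LENGTH == 0);
  return len / DOC_LINE_LENGTH;
}

/* Hypothetical helper.  Copies line IDX of DOCUMENTS into OUT as a
   null-terminated string with trailing spaces removed, and returns
   its length, which is between 0 and DOC_LINE_LENGTH. */
size_t
get_document_line (const char *documents, size_t idx,
                   char out[DOC_LINE_LENGTH + 1])
{
  assert (idx < count_document_lines (documents));

  const char *line = documents + idx * DOC_LINE_LENGTH;
  size_t len = DOC_LINE_LENGTH;
  while (len > 0 && line[len - 1] == ' ')
    len--;

  memcpy (out, line, len);
  out[len] = '\0';
  return len;
}
```

Since every line occupies exactly @code{DOC_LINE_LENGTH} bytes, line @var{idx} starts at byte offset @var{idx} times @code{DOC_LINE_LENGTH}, and no separator characters need to be scanned.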
2410 @node Coding Conventions
2411 @section Coding Conventions
2413 Every @file{.c} file should have @samp{#include <config.h>} as its
2414 first non-comment line. No @file{.h} file should include @file{config.h}.
2417 This section needs to be finished.
2422 This section needs to be written.
2427 This section needs to be written.
2432 This section needs to be written.