2 @chapter Basic Concepts
4 This chapter introduces basic data structures and other concepts
5 needed for developing in PSPP.
9 * Input and Output Formats::
10 * User-Missing Values::
14 * Coding Conventions::
24 The unit of data in PSPP is a @dfn{value}.
30 Values are classified by @dfn{type} and @dfn{width}. The
31 type of a value is either @dfn{numeric} or @dfn{string} (sometimes
32 called alphanumeric). The width of a string value ranges from 1 to
33 @code{MAX_STRING} bytes. The width of a numeric value is artificially
34 defined to be 0; thus, the type of a value can be inferred from its
37 Some support is provided for working with value types and widths, in
38 @file{data/val-type.h}:
40 @deftypefn Macro int MAX_STRING
41 Maximum width of a string value, in bytes, currently 32,767.
44 @deftypefun bool val_type_is_valid (enum val_type @var{val_type})
45 Returns true if @var{val_type} is a valid value type, that is,
46 either @code{VAL_NUMERIC} or @code{VAL_STRING}. Useful for
50 @deftypefun {enum val_type} val_type_from_width (int @var{width})
51 Returns @code{VAL_NUMERIC} if @var{width} is 0 and thus represents the
52 width of a numeric value, otherwise @code{VAL_STRING} to indicate that
53 @var{width} is the width of a string value.
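
As a rough illustration of how a width alone identifies a type, the two helpers above could be written along the following lines. This is a self-contained sketch, not the code in @file{data/val-type.h}: the enumerator values are assumptions, and only the names are taken from this manual.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for declarations in data/val-type.h.  The numeric
   values of the enumerators are assumptions made for this sketch. */
#define MAX_STRING 32767
enum val_type { VAL_NUMERIC, VAL_STRING };

/* Returns true if VAL_TYPE is one of the two known value types. */
static bool
val_type_is_valid (enum val_type val_type)
{
  return val_type == VAL_NUMERIC || val_type == VAL_STRING;
}

/* Width 0 means numeric; any width from 1 to MAX_STRING means string. */
static enum val_type
val_type_from_width (int width)
{
  assert (width >= 0 && width <= MAX_STRING);
  return width == 0 ? VAL_NUMERIC : VAL_STRING;
}
```

A caller that has only a width in hand can therefore branch on @code{val_type_from_width (width)} without storing the type separately.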
56 The following subsections describe how values of each type are
62 * Runtime Typed Values::
66 @subsection Numeric Values
68 A value known to be numeric at compile time is represented as a
69 @code{double}. PSPP provides three values of @code{double} for
70 special purposes, defined in @file{data/val-type.h}:
72 @deftypefn Macro double SYSMIS
73 The @dfn{system-missing value}, used to represent a datum whose true
74 value is unknown, such as a survey question that was not answered by
75 the respondent, or undefined, such as the result of division by zero.
76 PSPP propagates the system-missing value through calculations and
77 compensates for missing values in statistical analyses. @xref{Missing
78 Observations,,,pspp, PSPP Users Guide}, for a PSPP user's view of
81 PSPP currently defines @code{SYSMIS} as @code{-DBL_MAX}, that is, the
82 most negative finite value of @code{double}. It is best not to
83 depend on this definition, because PSPP may transition to using an
84 IEEE NaN (not a number) instead at some point in the future.
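
The following sketch shows one way a calculation might propagate the system-missing value. The helper @code{divide_or_sysmis} is hypothetical and is not part of PSPP; the definition of @code{SYSMIS} simply repeats the current one quoted above.

```c
#include <assert.h>
#include <float.h>

/* Repeats the current definition described above.  Code should refer
   to SYSMIS by name and not rely on this particular value. */
#define SYSMIS (-DBL_MAX)

/* Hypothetical helper, not a PSPP function: divides A by B, yielding
   SYSMIS if either operand is system-missing or if B is zero. */
static double
divide_or_sysmis (double a, double b)
{
  if (a == SYSMIS || b == SYSMIS || b == 0.0)
    return SYSMIS;
  return a / b;
}
```

Comparing against @code{SYSMIS} with @code{==} works only while it is an ordinary finite value; a NaN never compares equal to anything, which is one more reason not to depend on the current definition.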
87 @deftypefn Macro double LOWEST
88 @deftypefnx Macro double HIGHEST
89 The most negative (except for @code{SYSMIS}) and most positive finite
90 values of @code{double}, respectively. These values do not ordinarily
91 appear in user data files. Instead, they are used to implement
92 endpoints of open-ended ranges that are occasionally permitted in PSPP
93 syntax, e.g.@: @code{5 THRU HI} as a range of missing values
94 (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}).
98 @subsection String Values
100 A value known at compile time to have string type is represented as an
101 array of @code{char}. String values do not necessarily represent
102 readable text strings and may contain arbitrary 8-bit data, including
103 null bytes, control codes, and bytes with the high bit set. Thus,
104 string values are not null-terminated strings, but rather opaque
107 @code{SYSMIS}, @code{LOWEST}, and @code{HIGHEST} have no equivalents
108 as string values. Usually, PSPP fills an unknown or undefined string
109 value with spaces, but PSPP does not treat such a string as a special
110 case when it processes it later.
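
Because string values are fixed-width byte arrays and not C strings, code that handles them typically uses @code{memcpy} and @code{memcmp} with an explicit width. The helpers below are hypothetical (they are not PSPP functions) and only sketch the idea of space-padding and width-based comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores SRC_LEN bytes from SRC into the
   WIDTH-byte string value DST, truncating if SRC is too long and
   padding on the right with spaces if it is too short.  No null
   terminator is written. */
static void
str_value_set (char *dst, size_t width, const char *src, size_t src_len)
{
  size_t n = src_len < width ? src_len : width;
  memcpy (dst, src, n);
  memset (dst + n, ' ', width - n);
}

/* Hypothetical helper: compares two string values of the same WIDTH
   byte by byte, so embedded null bytes are handled correctly. */
static int
str_value_equal (const char *a, const char *b, size_t width)
{
  return memcmp (a, b, width) == 0;
}
```

Functions such as @code{strlen} or @code{strcmp} would stop at the first null byte and so are not suitable for this kind of data.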
113 @code{MAX_STRING}, the maximum length of a string value, is defined in
114 @file{data/val-type.h}.
116 @node Runtime Typed Values
117 @subsection Runtime Typed Values
119 When a value's type is only known at runtime, it is often represented
120 as a @union{value}, defined in @file{data/value.h}. @union{value} has
121 two members: a @code{double} named @samp{f} to store a numeric value
122 and an array of @code{char} named @samp{s} to store a string value.
123 A @union{value} does not identify the type or width of the data it
124 contains. Code that works with @union{value}s must therefore have
125 external knowledge of their content, often through the type and width of
126 a @struct{variable} (@pxref{Variables}).
128 @cindex MAX_SHORT_STRING
132 The array of @code{char} in @union{value} has only a small, fixed
133 capacity of @code{MAX_SHORT_STRING} bytes. A value that
134 fits within this capacity is called a @dfn{short string}. Any wider
135 string value, which must be represented by more than one
136 @union{value}, is called a @dfn{long string}.
138 @deftypefn Macro int MAX_SHORT_STRING
139 Maximum width of a short string value, never less than 8 bytes. It is
140 wider than 8 bytes on systems where @code{double} is either larger
141 than 8 bytes or has stricter alignment than 8 bytes.
144 @deftypefn Macro int MIN_LONG_STRING
145 Minimum width of a long string value, that is, @code{MAX_SHORT_STRING
149 Long string values are slightly harder to work with than short
150 string values, because they cannot be conveniently and efficiently
151 allocated as block scope variables or structure members. The PSPP
152 language exposes this inconvenience to the user: there are many
153 circumstances in PSPP syntax where short strings are allowed but not
154 long strings. Short string variables, for example, may have
155 user-missing values, but long string variables may not (@pxref{Missing
156 Observations,,,pspp, PSPP Users Guide}).
158 PSPP provides a few functions for working with @union{value}s. The
159 most useful are described below. To use these functions, recall that
160 a numeric value has a width of 0.
162 @deftypefun size_t value_cnt_from_width (int @var{width})
163 Returns the number of consecutive @union{value}s that must be
164 allocated to store a value of the given @var{width}. For a numeric or
165 short string value, the return value is 1; for long string
166 values, it is greater than 1.
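
To make the relationship between width and storage concrete, here is a minimal sketch of a @code{union value} and a count function. It assumes @code{MAX_SHORT_STRING} is 8 and that a long string is simply split into consecutive pieces of that size; the real definitions in @file{data/value.h} may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for this sketch.  The real value is platform-dependent but
   is never less than 8. */
#define MAX_SHORT_STRING 8

union value
  {
    double f;                   /* Numeric value. */
    char s[MAX_SHORT_STRING];   /* Short string, or one piece of a long one. */
  };

/* One plausible implementation: a numeric value needs one union value,
   and a string needs enough of them to hold WIDTH bytes. */
static size_t
value_cnt_from_width (int width)
{
  assert (width >= 0);
  return (width == 0
          ? 1
          : ((size_t) width + MAX_SHORT_STRING - 1) / MAX_SHORT_STRING);
}
```

With these assumptions, a width of 8 needs one @code{union value} and a width of 9 needs two.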
169 @deftypefun void value_copy (union value *@var{dst}, @
170 const union value *@var{src}, @
172 Copies a value of the given @var{width} from the @union{value} array
173 starting at @var{src} to the one starting at @var{dst}. The two
174 arrays must not overlap.
177 @deftypefun void value_set_missing (union value *@var{value}, int @var{width})
178 Sets @var{value} to @code{SYSMIS} if it is numeric or to all spaces if
179 it is alphanumeric, according to @var{width}. @var{value} must point
180 to the start of a @union{value} array of the given @var{width}.
183 @anchor{value_is_resizable}
184 @deftypefun bool value_is_resizable (const union value *@var{value}, int @var{old_width}, int @var{new_width})
185 Determines whether @var{value} may be resized from @var{old_width} to
186 @var{new_width}. Resizing is possible if the following criteria are
187 met. First, @var{old_width} and @var{new_width} must be both numeric
188 or both string widths. Second, if @var{new_width} is a short string
189 width and less than @var{old_width}, resizing is allowed only if bytes
190 @var{new_width} through @var{old_width} in @var{value} contain only
193 These rules are part of those used by @func{mv_is_resizable} and
194 @func{val_labs_can_set_width}.
197 @deftypefun void value_resize (union value *@var{value}, int @var{old_width}, int @var{new_width})
198 Resizes @var{value} from @var{old_width} to @var{new_width}, which
199 must be allowed by the rules stated above. This has an effect only if
200 @var{new_width} is greater than @var{old_width}, in which case the
201 bytes newly added to @var{value} are cleared to spaces.
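
The string half of the resizing rules can be sketched on a plain byte buffer. The helpers below are hypothetical and simplified: they ignore the numeric case and the distinction between short and long strings, and they assume that the bytes dropped when narrowing must all be spaces, which is what the space-padding behavior of @func{value_resize} suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper working on a byte buffer, not a union value.
   Both widths are assumed to be string widths (greater than 0).
   Assumption: narrowing is allowed only if every byte that would be
   dropped is a space. */
static bool
str_is_resizable (const char *s, int old_width, int new_width)
{
  assert (old_width > 0 && new_width > 0);
  for (int i = new_width; i < old_width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}

/* Widening fills the newly added bytes with spaces; narrowing needs no
   work.  S must have room for NEW_WIDTH bytes. */
static void
str_resize (char *s, int old_width, int new_width)
{
  if (new_width > old_width)
    memset (s + old_width, ' ', new_width - old_width);
}
```

Checking before resizing, as in @code{if (str_is_resizable (s, old, new)) str_resize (s, old, new);}, mirrors the requirement that @func{value_resize} be called only when the resize is allowed.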
204 @node Input and Output Formats
205 @section Input and Output Formats
207 Input and output formats specify how to convert data fields to and
208 from data values (@pxref{Input and Output Formats,,,pspp, PSPP Users
209 Guide}). PSPP uses @struct{fmt_spec} to represent input and output
212 Function prototypes and other declarations related to formats are in
213 the @file{<data/format.h>} header.
215 @deftp {Structure} {struct fmt_spec}
216 An input or output format, with the following members:
219 @item enum fmt_type type
220 The format type (see below).
223 Field width, in bytes. The width of numeric fields is always between
224 1 and 40 bytes, and the width of string fields is always between 1 and
225 65534 bytes. However, many individual types of formats place stricter
226 limits on field width (see @ref{fmt_max_input_width},
227 @ref{fmt_max_output_width}).
230 Number of decimal places, in character positions. For format types
231 that do not allow decimal places to be specified, this value must be
232 0. Format types that do allow decimal places have type-specific and
233 often width-specific restrictions on @code{d} (see
234 @ref{fmt_max_input_decimals}, @ref{fmt_max_output_decimals}).
238 @deftp {Enumeration} {enum fmt_type}
239 An enumerated type representing an input or output format type. Each
240 PSPP input and output format has a corresponding enumeration constant
241 prefixed by @samp{FMT_}: @code{FMT_F}, @code{FMT_COMMA},
242 @code{FMT_DOT}, and so on.
245 The following sections describe functions for manipulating formats and
246 the data in fields represented by formats.
249 * Constructing and Verifying Formats::
250 * Format Utility Functions::
251 * Obtaining Properties of Format Types::
252 * Numeric Formatting Styles::
253 * Formatted Data Input and Output::
256 @node Constructing and Verifying Formats
257 @subsection Constructing and Verifying Formats
259 These functions construct @struct{fmt_spec}s and verify that they are
262 @deftypefun {struct fmt_spec} fmt_for_input (enum fmt_type @var{type}, int @var{w}, int @var{d})
263 @deftypefunx {struct fmt_spec} fmt_for_output (enum fmt_type @var{type}, int @var{w}, int @var{d})
264 Constructs a @struct{fmt_spec} with the given @var{type}, @var{w}, and
265 @var{d}, asserts that the result is a valid input (or output) format,
269 @anchor{fmt_for_output_from_input}
270 @deftypefun {struct fmt_spec} fmt_for_output_from_input (const struct fmt_spec *@var{input})
271 Given @var{input}, which must be a valid input format, returns the
272 equivalent output format. @xref{Input and Output Formats,,,pspp, PSPP
273 Users Guide}, for the rules for converting input formats into output
277 @deftypefun {struct fmt_spec} fmt_default_for_width (int @var{width})
278 Returns the default output format for a variable of the given
279 @var{width}. For a numeric variable, this is F8.2 format; for a
280 string variable, it is the A format of the given @var{width}.
283 The following functions check whether a @struct{fmt_spec} is valid for
284 various uses and return true if so, false otherwise. When any of them
285 returns false, it also outputs an explanatory error message using
286 @func{msg}. To suppress error output, enclose a call to one of these
287 functions in a @func{msg_disable}/@func{msg_enable} pair.
289 @deftypefun bool fmt_check (const struct fmt_spec *@var{format}, bool @var{for_input})
290 @deftypefunx bool fmt_check_input (const struct fmt_spec *@var{format})
291 @deftypefunx bool fmt_check_output (const struct fmt_spec *@var{format})
292 Checks whether @var{format} is a valid input format (for
293 @func{fmt_check_input}, or @func{fmt_check} if @var{for_input}) or
294 output format (for @func{fmt_check_output}, or @func{fmt_check} if not
298 @deftypefun bool fmt_check_type_compat (const struct fmt_spec *@var{format}, enum val_type @var{type})
299 Checks whether @var{format} matches the value type @var{type}, that
300 is, if @var{type} is @code{VAL_NUMERIC} and @var{format} is a numeric
301 format or @var{type} is @code{VAL_STRING} and @var{format} is a string
305 @deftypefun bool fmt_check_width_compat (const struct fmt_spec *@var{format}, int @var{width})
306 Checks whether @var{format} may be used as an output format for a
307 value of the given @var{width}.
309 @func{fmt_var_width}, described in
310 the following section, can also be used to determine the value
311 width needed by a format.
314 @node Format Utility Functions
315 @subsection Format Utility Functions
317 These functions work with @struct{fmt_spec}s.
319 @deftypefun int fmt_var_width (const struct fmt_spec *@var{format})
320 Returns the width for values associated with @var{format}. If
321 @var{format} is a numeric format, the width is 0; if @var{format} is
322 an A format, then the width is @code{@var{format}->w}; otherwise,
323 @var{format} is an AHEX format and its width is @code{@var{format}->w
327 @deftypefun char *fmt_to_string (const struct fmt_spec *@var{format}, char @var{s}[FMT_STRING_LEN_MAX + 1])
328 Converts @var{format} to a human-readable format specifier in @var{s}
329 and returns @var{s}. @var{format} need not be a valid input or output
330 format specifier, e.g.@: it is allowed to have an excess width or
331 decimal places. In particular, if @var{format} has decimals, they are
332 included in the output string, even if @var{format}'s type does not
333 allow decimals, to allow accurately presenting incorrect formats to
337 @deftypefun bool fmt_equal (const struct fmt_spec *@var{a}, const struct fmt_spec *@var{b})
338 Compares @var{a} and @var{b} memberwise and returns true if they are
339 identical, false otherwise. Neither @var{a} nor @var{b} need be a valid input or
340 output format specifier.
343 @node Obtaining Properties of Format Types
344 @subsection Obtaining Properties of Format Types
346 These functions work with @enum{fmt_type}s instead of the higher-level
347 @struct{fmt_spec}s. Their primary purpose is to report properties of
348 each possible format type, which in turn allows clients to abstract
349 away many of the details of the very heterogeneous requirements of
352 The first group of functions works with format type names.
354 @deftypefun const char *fmt_name (enum fmt_type @var{type})
355 Returns the name for the given @var{type}, e.g.@: @code{"COMMA"} for
359 @deftypefun bool fmt_from_name (const char *@var{name}, enum fmt_type *@var{type})
360 Tries to find the @enum{fmt_type} associated with @var{name}. If
361 successful, sets @code{*@var{type}} to the type and returns true;
362 otherwise, returns false without modifying @code{*@var{type}}.
365 The functions below query basic limits on width and decimal places for
368 @deftypefun bool fmt_takes_decimals (enum fmt_type @var{type})
369 Returns true if a format of the given @var{type} is allowed to have a
370 nonzero number of decimal places (the @code{d} member of
371 @struct{fmt_spec}), false if not.
374 @anchor{fmt_min_input_width}
375 @anchor{fmt_max_input_width}
376 @anchor{fmt_min_output_width}
377 @anchor{fmt_max_output_width}
378 @deftypefun int fmt_min_input_width (enum fmt_type @var{type})
379 @deftypefunx int fmt_max_input_width (enum fmt_type @var{type})
380 @deftypefunx int fmt_min_output_width (enum fmt_type @var{type})
381 @deftypefunx int fmt_max_output_width (enum fmt_type @var{type})
382 Returns the minimum or maximum width (the @code{w} member of
383 @struct{fmt_spec}) allowed for an input or output format of the
384 specified @var{type}.
387 @anchor{fmt_max_input_decimals}
388 @anchor{fmt_max_output_decimals}
389 @deftypefun int fmt_max_input_decimals (enum fmt_type @var{type}, int @var{width})
390 @deftypefunx int fmt_max_output_decimals (enum fmt_type @var{type}, int @var{width})
391 Returns the maximum number of decimal places allowed for an input or
392 output format, respectively, of the given @var{type} and @var{width}.
393 Returns 0 if the specified @var{type} does not allow any decimal
394 places or if @var{width} is too narrow to allow decimal places.
397 @deftypefun int fmt_step_width (enum fmt_type @var{type})
398 Returns the ``width step'' for a @struct{fmt_spec} of the given
399 @var{type}. A @struct{fmt_spec}'s width must be a multiple of its
400 type's width step. Most format types have a width step of 1, so that
401 their formats' widths may be any integer within the valid range, but
402 hexadecimal numeric formats and AHEX string formats have a width step
406 These functions allow clients to broadly determine how each kind of
407 input or output format behaves.
409 @deftypefun bool fmt_is_string (enum fmt_type @var{type})
410 @deftypefunx bool fmt_is_numeric (enum fmt_type @var{type})
411 Returns true if @var{type} is a format for string or numeric values,
412 respectively, false otherwise.
415 @deftypefun enum fmt_category fmt_get_category (enum fmt_type @var{type})
416 Returns the category within which @var{type} falls.
418 @deftp {Enumeration} {enum fmt_category}
419 A group of format types. Format type categories correspond to the
420 input and output categories described in the PSPP user documentation
421 (@pxref{Input and Output Formats,,,pspp, PSPP Users Guide}).
423 Each format is in exactly one category. The categories have bitwise
424 disjoint values to make it easy to test whether a format type is in
425 one of multiple categories, e.g.@:
428 if (fmt_get_category (type) & (FMT_CAT_DATE | FMT_CAT_TIME))
430 /* @dots{}@r{@code{type} is a date or time format}@dots{} */
434 The format categories are:
437 Basic numeric formats.
440 Custom currency formats.
443 Legacy numeric formats.
448 @item FMT_CAT_HEXADECIMAL
457 @item FMT_CAT_DATE_COMPONENT
458 Date component formats.
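
The reason bitwise disjoint values make multi-category tests easy can be seen in a small stand-alone sketch. Only four of the category names from this manual are used, the numeric values are illustrative assumptions, and @code{category_in} is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only.  What matters is that each category
   occupies a distinct single bit. */
enum fmt_category
  {
    FMT_CAT_HEXADECIMAL = 0x010,
    FMT_CAT_DATE = 0x020,
    FMT_CAT_TIME = 0x040,
    FMT_CAT_DATE_COMPONENT = 0x080
  };

/* Hypothetical helper: returns true if CATEGORY is one of the
   categories whose bits are set in MASK. */
static bool
category_in (enum fmt_category category, unsigned int mask)
{
  return (category & mask) != 0;
}
```

Since a format type is in exactly one category, a single bitwise AND against an OR of categories is enough; no chain of equality comparisons is needed.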
466 The PSPP input and output routines use the following pair of functions
467 to convert @enum{fmt_type}s to and from the separate set of codes used
468 in system and portable files:
470 @deftypefun int fmt_to_io (enum fmt_type @var{type})
471 Returns the format code used in system and portable files that
472 corresponds to @var{type}.
475 @deftypefun bool fmt_from_io (int @var{io}, enum fmt_type *@var{type})
476 Converts @var{io}, a format code used in system and portable files,
477 into a @enum{fmt_type} in @code{*@var{type}}. Returns true if
478 successful, false if @var{io} is not valid.
481 These functions reflect the relationship between input and output
484 @deftypefun enum fmt_type fmt_input_to_output (enum fmt_type @var{type})
485 Returns the output format type that is used by default by DATA LIST
486 and other input procedures when @var{type} is specified as an input
487 format. The conversion from input format to output format is more
488 complicated than simply changing the format type.
489 @xref{fmt_for_output_from_input}, for a function that performs the
493 @deftypefun bool fmt_usable_for_input (enum fmt_type @var{type})
494 Returns true if @var{type} may be used as an input format type, false
495 otherwise. The custom currency formats, in particular, may be used
496 for output but not for input.
498 All format types are valid for output.
501 The final group of format type property functions obtains
502 human-readable templates that illustrate the formats graphically.
504 @deftypefun const char *fmt_date_template (enum fmt_type @var{type})
505 Returns a formatting template for @var{type}, which must be a date or
506 time format type. These templates are used by @func{data_in} and
507 @func{data_out} to guide parsing and formatting date and time data.
510 @deftypefun char *fmt_dollar_template (const struct fmt_spec *@var{format})
511 Returns a string of the form @code{$#,###.##} according to
512 @var{format}, which must be of type @code{FMT_DOLLAR}. The caller
513 must free the string with @code{free}.
516 @node Numeric Formatting Styles
517 @subsection Numeric Formatting Styles
519 Each of the basic numeric formats (F, E, COMMA, DOT, DOLLAR, PCT) and
520 custom currency formats (CCA, CCB, CCC, CCD, CCE) has an associated
521 numeric formatting style, represented by @struct{fmt_number_style}.
522 Input and output conversion of formats that have numeric styles is
523 determined mainly by the style, although the formatting rules have
524 special cases that are not represented within the style.
526 @deftp {Structure} {struct fmt_number_style}
527 A structure type with the following members:
530 @item struct substring neg_prefix
531 @itemx struct substring prefix
532 @itemx struct substring suffix
533 @itemx struct substring neg_suffix
534 A set of strings used as a prefix to negative numbers, a prefix to every
535 number, a suffix to every number, and a suffix to negative numbers,
536 respectively. Each of these strings is no more than
537 @code{FMT_STYLE_AFFIX_MAX} bytes (currently 16) in length.
538 These strings must be freed with @func{ss_dealloc} when no longer
542 The character used as a decimal point. It must be either @samp{.} or
546 The character used for grouping digits to the left of the decimal
547 point. It may be @samp{.} or @samp{,}, in which case it must not be
548 equal to @code{decimal}, or it may be set to 0 to disable grouping.
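
The constraints on the decimal point and grouping characters can be summarized in a short check. The function @code{style_chars_valid} is a hypothetical helper written for this illustration, not part of the PSPP API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if DECIMAL and GROUPING satisfy
   the rules described above.  DECIMAL must be '.' or ','.  GROUPING
   must be 0 (no grouping) or whichever of '.' and ',' is not used as
   the decimal point. */
static bool
style_chars_valid (char decimal, char grouping)
{
  if (decimal != '.' && decimal != ',')
    return false;
  if (grouping == 0)
    return true;
  return (grouping == '.' || grouping == ',') && grouping != decimal;
}
```

A check of this kind could be run on a style before handing it to a function that requires a valid style.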
552 The following functions are provided for working with numeric
555 @deftypefun {struct fmt_number_style *} fmt_number_style_create (void)
556 Creates and returns a new @struct{fmt_number_style} with all of the
557 prefixes and suffixes set to the empty string, @samp{.} as the decimal
558 point character, and grouping disabled.
561 @deftypefun void fmt_number_style_destroy (struct fmt_number_style *@var{style})
562 Destroys @var{style}, freeing its storage.
565 @deftypefun int fmt_affix_width (const struct fmt_number_style *@var{style})
566 Returns the total length of @var{style}'s @code{prefix} and @code{suffix}.
569 @deftypefun int fmt_neg_affix_width (const struct fmt_number_style *@var{style})
570 Returns the total length of @var{style}'s @code{neg_prefix} and
574 PSPP maintains a global set of number styles for each of the basic
575 numeric formats and custom currency formats. The following functions
576 work with these global styles:
578 @deftypefun {const struct fmt_number_style *} fmt_get_style (enum fmt_type @var{type})
579 Returns the numeric style for the given format @var{type}.
582 @deftypefun void fmt_set_style (enum fmt_type @var{type}, struct fmt_number_style *@var{style})
583 Replaces the current numeric style for format @var{type} by the given
584 @var{style}, which becomes owned by the callee. @var{type} must be a
585 custom currency format and @var{style} must follow all the rules for
586 numeric styles explained above.
589 @deftypefun int fmt_decimal_char (enum fmt_type @var{type})
590 Returns the decimal point character for the given format @var{type}.
591 Equivalent to @code{fmt_get_style (@var{type})->decimal}.
594 @deftypefun int fmt_grouping_char (enum fmt_type @var{type})
595 Returns the grouping character for the given format @var{type}, or 0
596 if @var{type} output should not be grouped. Equivalent to
597 @code{fmt_get_style (@var{type})->grouping}.
600 @deftypefun void fmt_set_decimal (char @var{decimal})
601 Changes the decimal point character for the basic numeric formats to
602 @var{decimal}, which must be @samp{.} or @samp{,}. The F, E, COMMA,
603 DOLLAR, and PCT formats use the specified decimal point character, and the
604 opposite character for grouping where appropriate. The DOT format
605 uses the reverse choices.
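
The effect of @func{fmt_set_decimal} on the two groups of formats can be sketched as follows. The structure and function names here are hypothetical; the sketch only computes which characters each group would end up with, and the grouping character matters only for formats that actually group digits.

```c
#include <assert.h>

/* Hypothetical pair of characters chosen for a format. */
struct point_chars
  {
    char decimal;
    char grouping;
  };

/* Returns ',' for '.' and '.' for ','. */
static char
other_point (char c)
{
  assert (c == '.' || c == ',');
  return c == '.' ? ',' : '.';
}

/* Characters for F, E, COMMA, DOLLAR, and PCT: DECIMAL as the decimal
   point and the opposite character for grouping. */
static struct point_chars
basic_point_chars (char decimal)
{
  struct point_chars pc = { decimal, other_point (decimal) };
  return pc;
}

/* Characters for DOT: the reverse of the choices above. */
static struct point_chars
dot_point_chars (char decimal)
{
  struct point_chars pc = { other_point (decimal), decimal };
  return pc;
}
```

So after a call equivalent to @code{fmt_set_decimal ('.')}, COMMA would use @samp{.} as its decimal point while DOT would use @samp{,}.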
608 @node Formatted Data Input and Output
609 @subsection Formatted Data Input and Output
611 These functions provide the ability to convert data fields into
612 @union{value}s and vice versa.
614 @deftypefun bool data_in (struct substring @var{input}, enum legacy_encoding @var{legacy_encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, union value *@var{output}, int @var{width})
615 Parses @var{input} as a field containing data in the given format
616 @var{type}. The resulting value is stored in @var{output}, which has
617 the given @var{width}. For consistency, @var{width} must be 0 if
618 @var{type} is a numeric format type and greater than 0 if @var{type}
619 is a string format type.
621 Ordinarily @var{legacy_encoding} should be @code{LEGACY_NATIVE},
622 indicating that @var{input} is encoded in the character set
623 conventionally used on the host machine. It may be set to
624 @code{LEGACY_EBCDIC} to cause @var{input} to be re-encoded from EBCDIC
627 If @var{input} is the empty string (with length 0), @var{output} is
628 set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP
629 Users Guide}) for a numeric value, or to all spaces for a string
630 value. This applies regardless of the usual parsing requirements for
633 If @var{implied_decimals} is greater than zero, then the numeric
634 result is shifted right by @var{implied_decimals} decimal places if
635 @var{input} does not contain a decimal point character or an exponent.
636 Only certain numeric format types support implied decimal places; for
637 string formats and other numeric formats, @var{implied_decimals} has
638 no effect. DATA LIST FIXED is the primary user of this feature
639 (@pxref{DATA LIST FIXED,,,pspp, PSPP Users Guide}). Other callers
640 should generally specify 0 for @var{implied_decimals}, to disable this
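
The implied-decimals rule amounts to dividing by a power of ten when the field has no explicit decimal point or exponent. The helper below is a hypothetical illustration of that arithmetic only; it is not how @func{data_in} is implemented, and it takes the already-parsed number as input.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: applies IMPLIED_DECIMALS to PARSED, the number
   read from a field.  HAS_POINT_OR_EXPONENT says whether the field
   text contained a decimal point character or an exponent, in which
   case the value is left alone. */
static double
apply_implied_decimals (double parsed, int implied_decimals,
                        bool has_point_or_exponent)
{
  assert (implied_decimals >= 0);
  if (implied_decimals == 0 || has_point_or_exponent)
    return parsed;

  /* Build 10**implied_decimals, then divide once. */
  double divisor = 1.0;
  for (int i = 0; i < implied_decimals; i++)
    divisor *= 10.0;
  return parsed / divisor;
}
```

For example, with two implied decimals a field containing @samp{12345} would yield approximately 123.45, while a field containing @samp{123.45} would be left unchanged.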
643 When @var{input} contains invalid input data, @func{data_in} outputs a
644 message using @func{msg}.
646 If @var{first_column} is
647 nonzero, it is included in any such error message as the 1-based
648 column number of the start of the field. The last column in the field
649 is calculated as @math{@var{first_column} + @var{length} - 1}, where @var{length} is the length of @var{input} in bytes. To
650 suppress error output, enclose the call to @func{data_in} by calls to
651 @func{msg_disable} and @func{msg_enable}.
653 This function returns true on success, false if a message was output
654 (even if suppressed). Overflow and underflow provoke warnings but are
655 not propagated to the caller as errors.
657 This function is declared in @file{data/data-in.h}.
660 @deftypefun void data_out (const union value *@var{input}, const struct fmt_spec *@var{format}, char *@var{output})
661 @deftypefunx void data_out_legacy (const union value *@var{input}, enum legacy_encoding @var{legacy_encoding}, const struct fmt_spec *@var{format}, char *@var{output})
662 Converts the data pointed to by @var{input} into a data field in
663 @var{output} according to output format specifier @var{format}, which
664 must be a valid output format. Exactly @code{@var{format}->w} bytes
665 are written to @var{output}. The width of @var{input} is also
666 inferred from @var{format} using an algorithm equivalent to
667 @func{fmt_var_width}.
669 If @func{data_out} is called, or @func{data_out_legacy} is called with
670 @var{legacy_encoding} set to @code{LEGACY_NATIVE}, @var{output} will
671 be encoded in the character set conventionally used on the host
672 machine. If @var{legacy_encoding} is set to @code{LEGACY_EBCDIC},
673 @var{output} will be re-encoded to EBCDIC during data output.
675 When @var{input} contains data that cannot be represented in the given
676 @var{format}, @func{data_out} may output a message using @func{msg},
678 although the current implementation does not
679 consistently do so. To suppress error output, enclose the call to
680 @func{data_out} by calls to @func{msg_disable} and @func{msg_enable}.
682 This function is declared in @file{data/data-out.h}.
685 @node User-Missing Values
686 @section User-Missing Values
688 In addition to the system-missing value for numeric values, each
689 variable has a set of user-missing values (@pxref{MISSING
690 VALUES,,,pspp, PSPP Users Guide}). A set of user-missing values is
691 represented by @struct{missing_values}.
693 It is rarely necessary to interact directly with a
694 @struct{missing_values} object. Instead, the most common operation,
695 querying whether a particular value is a missing value for a given
696 variable, is most conveniently executed through functions on
697 @struct{variable}. @xref{Variable Missing Values}, for details.
699 A @struct{missing_values} is essentially a set of @union{value}s that
700 have a common value width (@pxref{Values}). For a set of
701 missing values associated with a variable (the common case), the set's
702 width is the same as the variable's width. The contents of a set of
703 missing values are subject to some restrictions. Regardless of width,
704 a set of missing values is allowed to be empty. Otherwise, its
705 possible contents depend on its width:
708 @item 0 (numeric values)
709 Up to three discrete numeric values, or a range of numeric values
710 (which includes both ends of the range), or a range plus one discrete
713 @item 1@dots{}@t{MAX_SHORT_STRING} (short string values)
714 Up to three discrete string values (with the same width as the set).
716 @item @t{MIN_LONG_STRING}@dots{}@t{MAX_STRING} (long string values)
720 These somewhat arbitrary restrictions are the same as those imposed by
721 SPSS. In PSPP we could easily eliminate these restrictions, but doing
722 so would also require us to extend the system file format in an
723 incompatible way, which we consider a bad tradeoff.
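
The restrictions on set contents can be expressed as a small predicate. This is a hypothetical checker written for illustration, not a PSPP function. It assumes @code{MAX_SHORT_STRING} is 8 and treats widths up to @code{MAX_SHORT_STRING} as short strings, following the definition of a short string given earlier in this chapter.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for this sketch; the real value is platform-dependent. */
#define MAX_SHORT_STRING 8

/* Hypothetical checker: returns true if a missing value set of the
   given WIDTH may hold N_DISCRETE discrete values plus, if HAS_RANGE,
   one range. */
static bool
mv_contents_allowed (int width, int n_discrete, bool has_range)
{
  assert (width >= 0 && n_discrete >= 0);
  if (n_discrete == 0 && !has_range)
    return true;                /* An empty set is always allowed. */
  if (width == 0)               /* Numeric. */
    return has_range ? n_discrete <= 1 : n_discrete <= 3;
  if (width <= MAX_SHORT_STRING)        /* Short string: no ranges. */
    return !has_range && n_discrete <= 3;
  return false;                 /* Long string: must be empty. */
}
```

For instance, a numeric set may hold a range plus one discrete value, but not a range plus two.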
725 Function prototypes and other declarations related to missing values
726 are in @file{data/missing-values.h}.
728 @deftp {Structure} {struct missing_values}
729 Opaque type that represents a set of missing values.
732 The most often useful functions for missing values are those for
733 testing whether a given value is missing, described in the following
734 section. Several other functions for creating, inspecting, and
735 modifying @struct{missing_values} objects are described afterward, but
736 these functions are much more rarely useful. No function for
737 destroying a @struct{missing_values} is provided, because
738 @struct{missing_values} does not contain any pointers or other
739 references to resources that need deallocation.
742 * Testing for Missing Values::
743 * Initializing User-Missing Value Sets::
744 * Changing User-Missing Value Set Width::
745 * Inspecting User-Missing Value Sets::
746 * Modifying User-Missing Value Sets::
749 @node Testing for Missing Values
750 @subsection Testing for Missing Values
752 The most often useful functions for missing values are those for
753 testing whether a given value is missing, described here. However,
754 using one of the corresponding missing value testing functions for
755 variables can be even easier (@pxref{Variable Missing Values}).
757 @deftypefun bool mv_is_value_missing (const struct missing_values *@var{mv}, const union value *@var{value}, enum mv_class @var{class})
758 @deftypefunx bool mv_is_num_missing (const struct missing_values *@var{mv}, double @var{value}, enum mv_class @var{class})
759 @deftypefunx bool mv_is_str_missing (const struct missing_values *@var{mv}, const char @var{value}[], enum mv_class @var{class})
760 Tests whether @var{value} is in one of the categories of missing
761 values given by @var{class}. Returns true if so, false otherwise.
763 @var{mv} determines the width of @var{value} and provides the set of
764 user-missing values to test.
766 The only difference among these functions is in the form in which
767 @var{value} is provided, so you may use whichever function is most
770 The @var{class} argument determines the exact kinds of missing values
771 that the functions test for:
773 @deftp Enumeration {enum mv_class}
776 Returns true if @var{value} is in the set of user-missing values given
780 Returns true if @var{value} is system-missing. (If @var{mv}
781 represents a set of string values, then @var{value} is never
785 @itemx MV_USER | MV_SYSTEM
786 Returns true if @var{value} is user-missing or system-missing.
789 Always returns false, that is, @var{value} is never considered
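
How the @var{class} argument selects which kinds of missing values are tested can be sketched with a simplified numeric-only model. Everything here is an assumption made for illustration: the bit values of @code{MV_USER} and @code{MV_SYSTEM}, the stand-in structure @code{struct num_missing} (which holds discrete values only, with no range), and the function @code{num_is_missing}.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

#define SYSMIS (-DBL_MAX)       /* Current definition; see above. */

/* Bit values are assumptions made for this sketch. */
enum mv_class
  {
    MV_USER = 1,
    MV_SYSTEM = 2
  };

/* Hypothetical, simplified stand-in for struct missing_values:
   numeric, discrete values only. */
struct num_missing
  {
    double values[3];
    int n;
  };

/* Returns true if VALUE is missing in one of the ways selected by
   CLASS. */
static bool
num_is_missing (const struct num_missing *mv, double value,
                enum mv_class class)
{
  if ((class & MV_SYSTEM) && value == SYSMIS)
    return true;
  if (class & MV_USER)
    for (int i = 0; i < mv->n; i++)
      if (mv->values[i] == value)
        return true;
  return false;
}
```

Passing @code{MV_USER | MV_SYSTEM} tests for both kinds at once, and passing a class with no bits set never reports a value as missing.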
795 @node Initializing User-Missing Value Sets
796 @subsection Initializing User-Missing Value Sets
798 @deftypefun void mv_init (struct missing_values *@var{mv}, int @var{width})
799 Initializes @var{mv} as a set of user-missing values. The set is
800 initially empty. Any values added to it must have the specified
804 @deftypefun void mv_copy (struct missing_values *@var{mv}, const struct missing_values *@var{old})
805 Initializes @var{mv} as a copy of the existing set of user-missing
809 @deftypefun void mv_clear (struct missing_values *@var{mv})
810 Empties the user-missing value set @var{mv}, retaining its existing
814 @node Changing User-Missing Value Set Width
815 @subsection Changing User-Missing Value Set Width
817 A few PSPP language constructs copy sets of user-missing values from
818 one variable to another. When the source and target variables have
819 the same width, this is simple. But when the target variable's width
820 might be different from the source variable's, it takes a little more
821 work. The functions described here can help.
823 In fact, it is usually unnecessary to call these functions directly.
824 Most of the time @func{var_set_missing_values}, which uses
825 @func{mv_resize} internally to resize the new set of missing values to
826 the required width, may be used instead.
827 @xref{var_set_missing_values}, for more information.
829 @deftypefun bool mv_is_resizable (const struct missing_values *@var{mv}, int @var{new_width})
830 Tests whether @var{mv}'s width may be changed to @var{new_width} using
831 @func{mv_resize}. Returns true if it is allowed, false otherwise.
833 If @var{new_width} is a long string width, @var{mv} may be resized
834 only if it is empty. Otherwise, if @var{mv} contains any missing
835 values, then it may be resized only if each missing value may be
836 resized, as determined by @func{value_is_resizable}
837 (@pxref{value_is_resizable}).
841 @deftypefun void mv_resize (struct missing_values *@var{mv}, int @var{width})
842 Changes @var{mv}'s width to @var{width}. @var{mv} and @var{width}
843 must satisfy the constraints explained above.
845 When a string missing value set's width is increased, each
846 user-missing value is padded on the right with spaces to the new
850 @node Inspecting User-Missing Value Sets
851 @subsection Inspecting User-Missing Value Sets
853 These functions inspect the properties and contents of
854 @struct{missing_values} objects.
856 The first set of functions inspects the discrete values that numeric
857 and short string sets of user-missing values may contain:
859 @deftypefun bool mv_is_empty (const struct missing_values *@var{mv})
860 Returns true if @var{mv} contains no user-missing values, false if it
861 contains at least one user-missing value (either a discrete value or a
865 @deftypefun int mv_get_width (const struct missing_values *@var{mv})
866 Returns the width of the user-missing values that @var{mv} represents.
869 @deftypefun int mv_n_values (const struct missing_values *@var{mv})
870 Returns the number of discrete user-missing values included in
871 @var{mv}. The return value will be between 0 and 3. For sets of
872 numeric user-missing values that include a range, the return value
876 @deftypefun bool mv_has_value (const struct missing_values *@var{mv})
877 Returns true if @var{mv} has at least one discrete user-missing
878 value, that is, if @func{mv_n_values} would return nonzero for
882 @deftypefun void mv_get_value (const struct missing_values *@var{mv}, union value *@var{value}, int @var{index})
883 Copies the discrete user-missing value in @var{mv} with the given
884 @var{index} into @var{value}. The index must be less than the number
885 of discrete user-missing values in @var{mv}, as reported by
889 The second set of functions inspects the single range of values that
890 numeric sets of user-missing values may contain:
892 @deftypefun bool mv_has_range (const struct missing_values *@var{mv})
893 Returns true if @var{mv} includes a range, false otherwise.
896 @deftypefun void mv_get_range (const struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
897 Stores the low endpoint of @var{mv}'s range in @code{*@var{low}} and
898 the high endpoint of the range in @code{*@var{high}}. @var{mv} must
902 @node Modifying User-Missing Value Sets
903 @subsection Modifying User-Missing Value Sets
905 These functions modify the contents of @struct{missing_values}
908 The first set of functions applies to all sets of user-missing values:
910 @deftypefun bool mv_add_value (struct missing_values *@var{mv}, const union value *@var{value})
911 @deftypefunx bool mv_add_str (struct missing_values *@var{mv}, const char @var{value}[])
912 @deftypefunx bool mv_add_num (struct missing_values *@var{mv}, double @var{value})
913 Attempts to add the given discrete @var{value} to the set of user-missing
914 values @var{mv}. @var{value} must have the same width as @var{mv}.
915 Returns true if @var{value} was successfully added, false if the set
916 could not accept any more discrete values. (Always returns false if
917 @var{mv} is a set of long string user-missing values.)
919 These functions are equivalent, except for the form in which
920 @var{value} is provided, so you may use whichever function is most
924 @deftypefun void mv_pop_value (struct missing_values *@var{mv}, union value *@var{value})
925 Removes a discrete value from @var{mv} (which must contain at least
926 one discrete value) and stores it in @var{value}.
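
The capacity rules described in this section can be modeled in a few lines. The following is a stand-alone sketch, not PSPP's implementation: @code{struct mv_model} and the @code{model_*} functions are hypothetical names, only numeric sets are modeled, and two details are assumptions: that a set with a range accepts at most one discrete value, and that popping removes the most recently added value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a numeric set of user-missing values:
   up to three discrete values, or one range plus at most one
   discrete value (the latter limit is an assumption). */
struct mv_model
  {
    double values[3];
    int n_values;
    bool has_range;
    double low, high;
  };

void
model_init (struct mv_model *mv)
{
  mv->n_values = 0;
  mv->has_range = false;
  mv->low = mv->high = 0.0;
}

/* Adds a discrete value; fails when the set cannot hold any more. */
bool
model_add_num (struct mv_model *mv, double value)
{
  int capacity = mv->has_range ? 1 : 3;
  if (mv->n_values >= capacity)
    return false;
  mv->values[mv->n_values++] = value;
  return true;
}

/* Adds the inclusive range LOW...HIGH; fails if a range is already
   present, if more than one discrete value is present, or if
   LOW > HIGH. */
bool
model_add_range (struct mv_model *mv, double low, double high)
{
  if (mv->has_range || mv->n_values > 1 || low > high)
    return false;
  mv->has_range = true;
  mv->low = low;
  mv->high = high;
  return true;
}

/* Removes and returns the most recently added discrete value (an
   assumption of this model).  The set must contain at least one. */
double
model_pop_value (struct mv_model *mv)
{
  assert (mv->n_values > 0);
  return mv->values[--mv->n_values];
}
```

In this model, a set that already holds two discrete values rejects a range until one of the values is popped.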
929 @deftypefun void mv_replace_value (struct missing_values *@var{mv}, const union value *@var{value}, int @var{index})
930 Replaces the discrete value with the given @var{index} in @var{mv}
931 (which must contain at least @var{index} + 1 discrete values) with
935 The second set of functions applies only to numeric sets of
938 @deftypefun bool mv_add_range (struct missing_values *@var{mv}, double @var{low}, double @var{high})
939 Attempts to add a numeric range covering @var{low}@dots{}@var{high}
940 (inclusive on both ends) to @var{mv}, which must be a numeric set of
941 user-missing values. Returns true if the range is successfully added,
942 false on failure. Fails if @var{mv} already contains a range, or if
943 @var{mv} contains more than one discrete value, or if @var{low} >
947 @deftypefun void mv_pop_range (struct missing_values *@var{mv}, double *@var{low}, double *@var{high})
948 Given @var{mv}, which must be a numeric set of user-missing values
949 that contains a range, removes that range from @var{mv} and stores its
950 low endpoint in @code{*@var{low}} and its high endpoint in
955 @section Value Labels
957 Each variable has a set of value labels (@pxref{VALUE LABELS,,,pspp,
958 PSPP Users Guide}), represented as @struct{val_labs}. A
959 @struct{val_labs} is essentially a map from @union{value}s to strings.
960 All of the values in a set of value labels have the same width, which
961 for a set of value labels owned by a variable (the common case) is the
962 same as the width of that variable.
964 Numeric and short string sets of value labels may contain any number
965 of entries. Long string sets of value labels may not contain any
966 value labels at all, due to a corresponding restriction in SPSS. In
967 PSPP we could easily eliminate this restriction, but doing so would
968 also require us to extend the system file format in an incompatible
969 way, which we consider a bad tradeoff.
971 It is rarely necessary to interact directly with a @struct{val_labs}
972 object. Instead, the most common operation, looking up the label for
973 a value of a given variable, can be conveniently executed through
974 functions on @struct{variable}. @xref{Variable Value Labels}, for
977 Function prototypes and other declarations related to value labels
978 are found in @file{data/value-labels.h}.
980 @deftp {Structure} {struct val_labs}
981 Opaque type that represents a set of value labels.
984 The most often useful function for value labels is
985 @func{val_labs_find}, for looking up the label associated with a
988 @deftypefun {char *} val_labs_find (const struct val_labs *@var{val_labs}, union value @var{value})
989 Looks in @var{val_labs} for a label for the given @var{value}.
990 Returns the label, if one is found, or a null pointer otherwise.
993 Several other functions for working with value labels are described in
994 the following sections, but they are less often useful.
997 * Value Labels Creation and Destruction::
998 * Value Labels Properties::
999 * Value Labels Adding and Removing Labels::
1000 * Value Labels Iteration::
1003 @node Value Labels Creation and Destruction
1004 @subsection Creation and Destruction
1006 These functions create and destroy @struct{val_labs} objects.
1008 @deftypefun {struct val_labs *} val_labs_create (int @var{width})
1009 Creates and returns an initially empty set of value labels with the
1013 @deftypefun {struct val_labs *} val_labs_clone (const struct val_labs *@var{val_labs})
1014 Creates and returns a set of value labels whose width and contents are
1015 the same as those of @var{val_labs}.
1018 @deftypefun void val_labs_clear (struct val_labs *@var{val_labs})
1019 Deletes all value labels from @var{val_labs}.
1022 @deftypefun void val_labs_destroy (struct val_labs *@var{val_labs})
1023 Destroys @var{val_labs}, which must not be referenced again.
1026 @node Value Labels Properties
1027 @subsection Value Labels Properties
1029 These functions inspect and manipulate basic properties of
1030 @struct{val_labs} objects.
1032 @deftypefun size_t val_labs_count (const struct val_labs *@var{val_labs})
1033 Returns the number of value labels in @var{val_labs}.
1036 @deftypefun bool val_labs_can_set_width (const struct val_labs *@var{val_labs}, int @var{new_width})
1037 Tests whether @var{val_labs}'s width may be changed to @var{new_width}
1038 using @func{val_labs_set_width}. Returns true if it is allowed, false
1041 A set of value labels may be resized to a given width only if each
1042 value in it may be resized to that width, as determined by
1043 @func{value_is_resizable} (@pxref{value_is_resizable}).
1046 @deftypefun void val_labs_set_width (struct val_labs *@var{val_labs}, int @var{new_width})
1047 Changes the width of @var{val_labs}'s values to @var{new_width}, which
1048 must be a valid new width as determined by
1049 @func{val_labs_can_set_width}.
1051 If @var{new_width} is a long string width, this function deletes all
1052 value labels from @var{val_labs}.
1055 @node Value Labels Adding and Removing Labels
1056 @subsection Adding and Removing Labels
1058 These functions add and remove value labels from a @struct{val_labs}
1059 object. These functions apply only to numeric and short string sets
1060 of value labels. They have no effect on long string sets of value
1061 labels, since these sets are always empty.
1063 @deftypefun bool val_labs_add (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1064 Adds @var{label} to @var{val_labs} as a label for @var{value},
1065 which must have the same width as the set of value labels. Returns
1066 true if successful, false if @var{value} already has a label or if
1067 @var{val_labs} has long string width.
1070 @deftypefun void val_labs_replace (struct val_labs *@var{val_labs}, union value @var{value}, const char *@var{label})
1071 Adds @var{label} to @var{val_labs} as a label for @var{value},
1072 which must have the same width as the set of value labels. If
1073 @var{value} already has a label in @var{val_labs}, it is replaced.
1074 Has no effect if @var{val_labs} has long string width.
1077 @deftypefun bool val_labs_remove (struct val_labs *@var{val_labs}, union value @var{value})
1078 Removes from @var{val_labs} any label for @var{value}, which must have
1079 the same width as the set of value labels. Returns true if a label
1080 was removed, false otherwise.
1083 @node Value Labels Iteration
1084 @subsection Iterating through Value Labels
1086 These functions allow iteration through the set of value labels
1087 represented by a @struct{val_labs} object. They are usually used in
1088 the context of a @code{for} loop:
1091 struct val_labs *val_labs;
1092 struct val_labs_iterator *i;
1097 for (vl = val_labs_first (val_labs, &i); vl != NULL;
1098 vl = val_labs_next (val_labs, &i))
1100 @dots{}@r{do something with @code{vl}}@dots{}
1104 The value labels in a @struct{val_labs} must not be modified while it is
1105 undergoing iteration.
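
A loop that exits before the labels are exhausted must free its iterator itself. The sketch below illustrates that protocol with stand-alone stand-ins, since it does not link against PSPP: @code{labs_first}, @code{labs_next}, and @code{labs_done} are hypothetical models of @func{val_labs_first}, @func{val_labs_next}, and @func{val_labs_done}, and @code{live_iterators} exists only so that leaks can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins that mimic the documented iterator protocol. */
struct lab { double value; const char *label; };
struct labs { const struct lab *items; size_t n; };
struct labs_iterator { size_t next; };

int live_iterators;   /* Iterators allocated but not yet freed. */

/* Frees *IP, if any, and sets it to null. */
void
labs_done (struct labs_iterator **ip)
{
  if (*ip != NULL)
    {
      free (*ip);
      *ip = NULL;
      live_iterators--;
    }
}

/* Returns the next label, or frees the iterator and returns null. */
const struct lab *
labs_next (const struct labs *labs, struct labs_iterator **ip)
{
  assert (*ip != NULL);
  if ((*ip)->next >= labs->n)
    {
      labs_done (ip);          /* Exhausted: freed automatically. */
      return NULL;
    }
  return &labs->items[(*ip)->next++];
}

/* Starts an iteration; allocates no iterator for an empty set. */
const struct lab *
labs_first (const struct labs *labs, struct labs_iterator **ip)
{
  if (labs->n == 0)
    {
      *ip = NULL;
      return NULL;
    }
  *ip = malloc (sizeof **ip);
  if (*ip == NULL)
    abort ();
  (*ip)->next = 0;
  live_iterators++;
  return labs_next (labs, ip);
}

/* Returns the label for VALUE, or null, leaving the loop early on a
   match. */
const char *
find_label (const struct labs *labs, double value)
{
  struct labs_iterator *i;
  const struct lab *l;

  for (l = labs_first (labs, &i); l != NULL; l = labs_next (labs, &i))
    if (l->value == value)
      {
        labs_done (&i);        /* Early exit: free the iterator manually. */
        return l->label;
      }
  return NULL;
}
```

When the loop runs to completion, the final call to the "next" function has already freed the iterator, so no explicit cleanup is needed on that path.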
1107 @deftp {Structure} {struct val_lab}
1108 Represents a value label for iteration purposes, with two
1109 client-visible members:
1112 @item union value value
1113 Value being labeled, of the same width as the @struct{val_labs} being
1116 @item const char *label
1117 The label, as a null-terminated string.
1121 @deftp {Structure} {struct val_labs_iterator}
1122 Opaque object that represents the current state of iteration through a
1123 set of value labels. Automatically destroyed by successful
1124 completion of iteration. Must be destroyed manually in other
1125 circumstances, by calling @func{val_labs_done}.
1128 @deftypefun {struct val_lab *} val_labs_first (const struct val_labs *@var{val_labs}, struct val_labs_iterator **@var{iterator})
1129 If @var{val_labs} contains at least one value label, starts an
1130 iteration through @var{val_labs}, initializes @code{*@var{iterator}}
1131 to point to a newly allocated iterator, and returns the first value
1132 label in @var{val_labs}. If @var{val_labs} is empty, sets
1133 @code{*@var{iterator}} to null and returns a null pointer.
1135 This function creates iterators that traverse sets of value labels in
1136 no particular order.
1139 @deftypefun {struct val_lab *} val_labs_first_sorted (const struct val_labs *@var{val_labs}, struct val_labs_iterator **@var{iterator})
1140 Same as @func{val_labs_first}, except that the created iterator
1141 traverses the set of value labels in ascending order of value.
1144 @deftypefun {struct val_lab *} val_labs_next (const struct val_labs *@var{val_labs}, struct val_labs_iterator **@var{iterator})
1145 Advances an iterator created with @func{val_labs_first} or
1146 @func{val_labs_first_sorted} to the next value label, which is
1147 returned. If the set of value labels is exhausted, returns a null
1148 pointer after freeing @code{*@var{iterator}} and setting it to a null
1152 @deftypefun void val_labs_done (struct val_labs_iterator **@var{iterator})
1153 Frees @code{*@var{iterator}} and sets it to a null pointer. Does
1154 not need to be called explicitly if @func{val_labs_next} returns a
1155 null pointer, indicating that all value labels have been visited.
1161 A PSPP variable is represented by @struct{variable}, an opaque type
1162 declared in @file{data/variable.h} along with related declarations.
1163 @xref{Variables,,,pspp, PSPP Users Guide}, for a description of PSPP
1164 variables from a user perspective.
1166 PSPP is unusual among computer languages in that, by itself, a PSPP
1167 variable does not have a value. Instead, a variable in PSPP takes on
1168 a value only in the context of a case, which supplies one value for
1169 each variable in a set of variables (@pxref{Cases}). The set of
1170 variables in a case, in turn, is ordinarily part of a dictionary
1171 (@pxref{Dictionaries}).
1173 Every variable has several attributes, most of which correspond
1174 directly to one of the variable attributes visible to PSPP users
1175 (@pxref{Attributes,,,pspp, PSPP Users Guide}).
1177 The following sections describe variable-related functions and macros.
1181 * Variable Type and Width::
1182 * Variable Missing Values::
1183 * Variable Value Labels::
1184 * Variable Print and Write Formats::
1186 * Variable GUI Attributes::
1187 * Variable Leave Status::
1188 * Dictionary Class::
1189 * Variable Creation and Destruction::
1190 * Variable Short Names::
1191 * Variable Relationships::
1192 * Variable Auxiliary Data::
1193 * Variable Categorical Values::
1197 @subsection Variable Name
1199 A variable name is a string between 1 and @code{VAR_NAME_LEN} bytes
1200 long that satisfies the rules for PSPP identifiers
1201 (@pxref{Tokens,,,pspp, PSPP Users Guide}). Variable names are
1202 mixed-case and treated case-insensitively.
1204 @deftypefn Macro int VAR_NAME_LEN
1205 Maximum length of a variable name, in bytes, currently 64.
1208 Only one commonly useful function relates to variable names:
1210 @deftypefun {const char *} var_get_name (const struct variable *@var{var})
1211 Returns @var{var}'s variable name as a C string.
1214 A few other functions are much more rarely used. Some of these
1215 functions are used internally by the dictionary implementation:
1217 @anchor{var_set_name}
1218 @deftypefun {void} var_set_name (struct variable *@var{var}, const char *@var{new_name})
1219 Changes the name of @var{var} to @var{new_name}, which must be a
1220 ``plausible'' name as defined below.
1222 This function cannot be applied to a variable that is part of a
1223 dictionary. Use @func{dict_rename_var} instead (@pxref{Dictionary
1224 Renaming Variables}).
1227 @anchor{var_is_plausible_name}
1228 @deftypefun {bool} var_is_valid_name (const char *@var{name}, bool @var{issue_error})
1229 @deftypefunx {bool} var_is_plausible_name (const char *@var{name}, bool @var{issue_error})
1230 Tests @var{name} for validity or ``plausibility.'' Returns true if
1231 the name is acceptable, false otherwise. If the name is not
1232 acceptable and @var{issue_error} is true, also issues an error message
1233 explaining the violation.
1235 A valid name is one that fully satisfies all of the requirements for
1236 variable names (@pxref{Tokens,,,pspp, PSPP Users Guide}). A
1237 ``plausible'' name is simply a string whose length is in the valid
1238 range and that is not a reserved word. PSPP accepts plausible but
1239 invalid names as variable names in some contexts where the character
1240 encoding scheme is ambiguous, as when reading variable names from
1244 @deftypefun {enum dict_class} var_get_dict_class (const struct variable *@var{var})
1245 Returns the dictionary class of @var{var}'s name (@pxref{Dictionary
1249 @node Variable Type and Width
1250 @subsection Variable Type and Width
1252 A variable's type and width are the type and width of its values
1255 @deftypefun {enum val_type} var_get_type (const struct variable *@var{var})
1256 Returns the type of variable @var{var}.
1259 @deftypefun int var_get_width (const struct variable *@var{var})
1260 Returns the width of variable @var{var}.
1263 @deftypefun void var_set_width (struct variable *@var{var}, int @var{width})
1264 Sets the width of variable @var{var} to @var{width}. The width of a
1265 variable should not normally be changed after the variable is created,
1266 so this function is rarely used. This function cannot be applied to a
1267 variable that is part of a dictionary.
1270 @deftypefun bool var_is_numeric (const struct variable *@var{var})
1271 Returns true if @var{var} is a numeric variable, false otherwise.
1274 @deftypefun bool var_is_alpha (const struct variable *@var{var})
1275 Returns true if @var{var} is an alphanumeric (string) variable, false
1279 @deftypefun bool var_is_short_string (const struct variable *@var{var})
1280 Returns true if @var{var} is a string variable of width
1281 @code{MAX_SHORT_STRING} or less, false otherwise.
1284 @deftypefun bool var_is_long_string (const struct variable *@var{var})
1285 Returns true if @var{var} is a string variable of width greater than
1286 @code{MAX_SHORT_STRING}, false otherwise.
1289 @deftypefun size_t var_get_value_cnt (const struct variable *@var{var})
1290 Returns the number of @union{value}s needed to hold an instance of
1291 variable @var{var}. @code{var_get_value_cnt (var)} is equivalent to
1292 @code{value_cnt_from_width (var_get_width (var))}.
1295 @node Variable Missing Values
1296 @subsection Variable Missing Values
1298 A numeric or short string variable may have a set of user-missing
1299 values (@pxref{MISSING VALUES,,,pspp, PSPP Users Guide}), represented
1300 as a @struct{missing_values} (@pxref{User-Missing Values}).
1302 The most frequent operation on a variable's missing values is to query
1303 whether a value is user- or system-missing:
1305 @deftypefun bool var_is_value_missing (const struct variable *@var{var}, const union value *@var{value}, enum mv_class @var{class})
1306 @deftypefunx bool var_is_num_missing (const struct variable *@var{var}, double @var{value}, enum mv_class @var{class})
1307 @deftypefunx bool var_is_str_missing (const struct variable *@var{var}, const char @var{value}[], enum mv_class @var{class})
1308 Tests whether @var{value} is a missing value of the given @var{class}
1309 for variable @var{var} and returns true if so, false otherwise.
1310 @func{var_is_num_missing} may only be applied to numeric variables;
1311 @func{var_is_str_missing} may only be applied to string variables.
1312 For string variables, @var{value} must contain exactly as many
1313 characters as @var{var}'s width.
1315 @code{var_is_@var{type}_missing (@var{var}, @var{value}, @var{class})}
1316 is equivalent to @code{mv_is_@var{type}_missing
1317 (var_get_missing_values (@var{var}), @var{value}, @var{class})}.
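
How the @var{class} argument selects among kinds of missing values can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not PSPP code: the @code{MODEL_*} names are hypothetical, the numeric values of the class constants are assumed so that the classes combine with bitwise OR, @code{-DBL_MAX} is used as a stand-in for @code{SYSMIS}, and the user-missing set is reduced to a single inclusive range.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the numeric values are assumptions chosen so
   that the classes combine with bitwise OR. */
enum model_class
  {
    MODEL_NEVER = 0,
    MODEL_USER = 1,
    MODEL_SYSTEM = 2,
    MODEL_ANY = MODEL_USER | MODEL_SYSTEM
  };

#define MODEL_SYSMIS (-DBL_MAX)   /* Stand-in for SYSMIS. */

/* Numeric user-missing set reduced to one inclusive range. */
struct model_mv { double low, high; };

/* Returns true if VALUE is missing in one of the categories in CLS. */
bool
model_is_num_missing (const struct model_mv *mv, double value,
                      enum model_class cls)
{
  bool is_system, is_user;

  assert (mv != NULL);
  is_system = value == MODEL_SYSMIS;
  is_user = !is_system && value >= mv->low && value <= mv->high;
  return ((cls & MODEL_USER) && is_user)
         || ((cls & MODEL_SYSTEM) && is_system);
}
```

With the range 97 to 99 declared user-missing, the value 98 is missing for the "user" and "any" classes but not for the "system" class, and no value is missing for the "never" class.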
1320 In addition, a few functions are provided to work more directly with a
1321 variable's @struct{missing_values}:
1323 @deftypefun {const struct missing_values *} var_get_missing_values (const struct variable *@var{var})
1324 Returns the @struct{missing_values} associated with @var{var}. The
1325 caller must not modify the returned structure. The return value is
1329 @anchor{var_set_missing_values}
1330 @deftypefun {void} var_set_missing_values (struct variable *@var{var}, const struct missing_values *@var{miss})
1331 Changes @var{var}'s missing values to a copy of @var{miss}, or if
1332 @var{miss} is a null pointer, clears @var{var}'s missing values. If
1333 @var{miss} is non-null, it must have the same width as @var{var} or be
1334 resizable to @var{var}'s width (@pxref{mv_resize}). The caller
1335 retains ownership of @var{miss}.
1338 @deftypefun void var_clear_missing_values (struct variable *@var{var})
1339 Clears @var{var}'s missing values. Equivalent to
1340 @code{var_set_missing_values (@var{var}, NULL)}.
1343 @deftypefun bool var_has_missing_values (const struct variable *@var{var})
1344 Returns true if @var{var} has any missing values, false if it has
1345 none. Equivalent to @code{!mv_is_empty (var_get_missing_values (@var{var}))}.
1348 @node Variable Value Labels
1349 @subsection Variable Value Labels
1351 A numeric or short string variable may have a set of value labels
1352 (@pxref{VALUE LABELS,,,pspp, PSPP Users Guide}), represented as a
1353 @struct{val_labs} (@pxref{Value Labels}). The most commonly useful
1354 functions for value labels return the value label associated with a
1357 @deftypefun {const char *} var_lookup_value_label (const struct variable *@var{var}, const union value *@var{value})
1358 Looks for a label for @var{value} in @var{var}'s set of value labels.
1359 Returns the label if one exists, otherwise a null pointer.
1362 @deftypefun void var_append_value_name (const struct variable *@var{var}, const union value *@var{value}, struct string *@var{str})
1363 Looks for a label for @var{value} in @var{var}'s set of value labels.
1364 If a label exists, it will be appended to the string pointed to by @var{str}.
1365 Otherwise, it formats @var{value}
1366 using @var{var}'s print format (@pxref{Input and Output Formats})
1367 and appends the formatted string.
1370 The underlying @struct{val_labs} structure may also be accessed
1371 directly using the functions described below.
1373 @deftypefun bool var_has_value_labels (const struct variable *@var{var})
1374 Returns true if @var{var} has at least one value label, false
1378 @deftypefun {const struct val_labs *} var_get_value_labels (const struct variable *@var{var})
1379 Returns the @struct{val_labs} associated with @var{var}. If @var{var}
1380 has no value labels, then the return value may or may not be a null
1383 The variable retains ownership of the returned @struct{val_labs},
1384 which the caller must not attempt to modify.
1387 @deftypefun void var_set_value_labels (struct variable *@var{var}, const struct val_labs *@var{val_labs})
1388 Replaces @var{var}'s value labels by a copy of @var{val_labs}. The
1389 caller retains ownership of @var{val_labs}. If @var{val_labs} is a
1390 null pointer, then @var{var}'s value labels, if any, are deleted.
1393 @deftypefun void var_clear_value_labels (struct variable *@var{var})
1394 Deletes @var{var}'s value labels. Equivalent to
1395 @code{var_set_value_labels (@var{var}, NULL)}.
1398 A final group of functions offers shorthands for operations that would
1399 otherwise require getting the value labels from a variable, copying
1400 them, modifying them, and then setting the modified value labels into
1401 the variable (making a second copy):
1403 @deftypefun bool var_add_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1404 Attempts to add a copy of @var{label} as a label for @var{value} for
1405 the given @var{var}. If @var{value} already has a label, then the old
1406 label is retained. Returns true if a label is added, false if there
1407 was an existing label for @var{value} or if @var{var} is a long string
1408 variable. Either way, the caller retains ownership of @var{value} and
1412 @deftypefun void var_replace_value_label (struct variable *@var{var}, const union value *@var{value}, const char *@var{label})
1413 Attempts to add a copy of @var{label} as a label for @var{value} for
1414 the given @var{var}. If @var{value} already has a label, then
1415 @var{label} replaces the old label. Either way, the caller retains
1416 ownership of @var{value} and @var{label}.
1418 If @var{var} is a long string variable, this function has no effect.
1421 @node Variable Print and Write Formats
1422 @subsection Variable Print and Write Formats
1424 Each variable has an associated pair of output formats, called its
1425 @dfn{print format} and @dfn{write format}. @xref{Input and Output
1426 Formats,,,pspp, PSPP Users Guide}, for an introduction to formats.
1427 @xref{Input and Output Formats}, for a developer's description of
1428 format representation.
1430 The print format is used to convert a variable's data values to
1431 strings for human-readable output. The write format is used similarly
1432 for machine-readable output, primarily by the WRITE transformation
1433 (@pxref{WRITE,,,pspp, PSPP Users Guide}). Most often a variable's
1434 print and write formats are the same.
1436 A newly created variable by default has format F8.2 if it is numeric
1437 or an A format with the same width as the variable if it is string.
1438 Many creators of variables override these defaults.
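
The default rule can be written out as a short helper. This is only an illustration: @code{default_format_string} is a hypothetical function that renders the default as text, whereas PSPP itself represents formats as @struct{fmt_spec}.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the default format for a variable of the given WIDTH into BUF:
   "F8.2" for a numeric variable (width 0), otherwise an A format of the
   same width.  Hypothetical helper for illustration only. */
void
default_format_string (int width, char *buf, size_t size)
{
  assert (width >= 0);
  if (width == 0)
    snprintf (buf, size, "F8.2");
  else
    snprintf (buf, size, "A%d", width);
}
```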
1440 Both the print format and write format are output formats. Input
1441 formats are not part of @struct{variable}. Instead, input programs
1442 and transformations keep track of variable input formats themselves.
1444 The following functions work with variable print and write formats.
1446 @deftypefun {const struct fmt_spec *} var_get_print_format (const struct variable *@var{var})
1447 @deftypefunx {const struct fmt_spec *} var_get_write_format (const struct variable *@var{var})
1448 Returns @var{var}'s print or write format, respectively.
1451 @deftypefun void var_set_print_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1452 @deftypefunx void var_set_write_format (struct variable *@var{var}, const struct fmt_spec *@var{format})
1453 @deftypefunx void var_set_both_formats (struct variable *@var{var}, const struct fmt_spec *@var{format})
1454 Sets @var{var}'s print format, write format, or both formats,
1455 respectively, to a copy of @var{format}.
1458 @node Variable Labels
1459 @subsection Variable Labels
1461 A variable label is a string that describes a variable. Variable
1462 labels may contain spaces and punctuation not allowed in variable
1463 names. @xref{VARIABLE LABELS,,,pspp, PSPP Users Guide}, for a
1464 user-level description of variable labels.
1466 The most commonly useful functions for variable labels are those to
1467 retrieve a variable's label:
1469 @deftypefun {const char *} var_to_string (const struct variable *@var{var})
1470 Returns @var{var}'s variable label, if it has one, otherwise
1471 @var{var}'s name. In either case the caller must not attempt to
1472 modify or free the returned string.
1474 This function is useful for user output.
1477 @deftypefun {const char *} var_get_label (const struct variable *@var{var})
1478 Returns @var{var}'s variable label, if it has one, or a null pointer
1482 A few other variable label functions are also provided:
1484 @deftypefun void var_set_label (struct variable *@var{var}, const char *@var{label})
1485 Sets @var{var}'s variable label to a copy of @var{label}, or removes
1486 any label from @var{var} if @var{label} is a null pointer or contains
1487 only spaces. Leading and trailing spaces are removed from the
1488 variable label and its remaining content is truncated at 255 bytes.
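
The normalization just described can be sketched as a stand-alone function. @code{normalize_label} and @code{MODEL_MAX_LABEL_LEN} are hypothetical names, and the order of operations (strip spaces first, then truncate) is taken from the description above rather than from PSPP's source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_LABEL_LEN 255   /* Byte limit stated above. */

/* Returns a newly allocated, normalized copy of LABEL, or a null
   pointer if LABEL is null or contains only spaces.  The caller frees
   the result. */
char *
normalize_label (const char *label)
{
  size_t start, len;
  char *copy;

  if (label == NULL)
    return NULL;

  /* Strip leading and trailing spaces. */
  start = strspn (label, " ");
  len = strlen (label + start);
  while (len > 0 && label[start + len - 1] == ' ')
    len--;
  if (len == 0)
    return NULL;

  /* Truncate what remains at the byte limit. */
  if (len > MODEL_MAX_LABEL_LEN)
    len = MODEL_MAX_LABEL_LEN;
  assert (len <= MODEL_MAX_LABEL_LEN);

  copy = malloc (len + 1);
  if (copy == NULL)
    abort ();
  memcpy (copy, label + start, len);
  copy[len] = '\0';
  return copy;
}
```

A null result here corresponds to the case in which the variable ends up with no label at all.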
1491 @deftypefun void var_clear_label (struct variable *@var{var})
1492 Removes any variable label from @var{var}.
1495 @deftypefun bool var_has_label (const struct variable *@var{var})
1496 Returns true if @var{var} has a variable label, false otherwise.
1499 @node Variable GUI Attributes
1500 @subsection GUI Attributes
1502 These functions and types access and set attributes that are mainly
1503 used by graphical user interfaces. Their values are also stored in
1504 and retrieved from system files (but not portable files).
1506 The first group of functions relates to the measurement level of
1507 numeric data. New variables are assigned a nominal level of
1508 measurement by default.
1510 @deftp {Enumeration} {enum measure}
1511 Measurement level. Available values are:
1514 @item MEASURE_NOMINAL
1515 Numeric data values are arbitrary. Arithmetic operations and
1516 numerical comparisons of such data are not meaningful.
1518 @item MEASURE_ORDINAL
1519 Numeric data values indicate progression along a rank order.
1520 Arbitrary arithmetic operations such as addition are not meaningful on
1521 such data, but inequality comparisons (less, greater, etc.) have
1522 straightforward interpretations.
1525 Ratios, sums, etc. of numeric data values have meaningful
1529 PSPP does not have a separate category for interval data, which would
1530 naturally fall between the ordinal and scale measurement levels.
1533 @deftypefun bool measure_is_valid (enum measure @var{measure})
1534 Returns true if @var{measure} is a valid level of measurement, that
1535 is, if it is one of the @code{enum measure} constants listed above,
1536 and false otherwise.
1539 @deftypefun enum measure var_get_measure (const struct variable *@var{var})
1540 @deftypefunx void var_set_measure (struct variable *@var{var}, enum measure @var{measure})
1541 Gets or sets @var{var}'s measurement level.
1544 The following set of functions relates to the width of on-screen
1545 columns used for displaying variable data in a graphical user
1546 interface environment. The unit of measurement is the width of a
1547 character. For proportionally spaced fonts, this is based on the
1548 average width of a character.
1550 @deftypefun int var_get_display_width (const struct variable *@var{var})
1551 @deftypefunx void var_set_display_width (struct variable *@var{var}, int @var{display_width})
1552 Gets or sets @var{var}'s display width.
1555 @anchor{var_default_display_width}
1556 @deftypefun int var_default_display_width (int @var{width})
1557 Returns the default display width for a variable with the given
1558 @var{width}. The default width of a numeric variable is 8. The
1559 default width of a string variable is @var{width} or 32, whichever is
1563 The final group of functions works with the justification of data when
1564 it is displayed in on-screen columns. New variables are by default
1567 @deftp {Enumeration} {enum alignment}
1568 Text justification. Possible values are @code{ALIGN_LEFT},
1569 @code{ALIGN_RIGHT}, and @code{ALIGN_CENTRE}.
1572 @deftypefun bool alignment_is_valid (enum alignment @var{alignment})
1573 Returns true if @var{alignment} is a valid alignment, that is, if it
1574 is one of the @code{enum alignment} constants listed above, and false
1578 @deftypefun enum alignment var_get_alignment (const struct variable *@var{var})
1579 @deftypefunx void var_set_alignment (struct variable *@var{var}, enum alignment @var{alignment})
1580 Gets or sets @var{var}'s alignment.
1583 @node Variable Leave Status
1584 @subsection Variable Leave Status
1586 Commonly, most or all data in a case come from an input file, read
1587 with a command such as DATA LIST or GET, but data can also be
1588 generated with transformations such as COMPUTE. In the latter case
1589 the question of a datum's ``initial value'' can arise. For example,
1590 the value of a piece of generated data can recursively depend on its
1595 Another situation where the initial value of a variable arises is when
1596 its value is not set at all for some cases, e.g.@: below, @code{Y} is
1597 set only for the first 10 cases:
1599 DO IF #CASENUM <= 10.
1604 By default, the initial value of a datum in either of these situations
1605 is the system-missing value for numeric values and spaces for string
1606 values. This means that, above, X would be system-missing and that Y
1607 would be 1 for the first 10 cases and system-missing for the
1610 PSPP also supports retaining the value of a variable from one case to
1611 another, using the LEAVE command (@pxref{LEAVE,,,pspp, PSPP Users
1612 Guide}). The initial value of such a variable is 0 if it is numeric
1613 and spaces if it is a string. If the command @samp{LEAVE X Y} is
1614 appended to the above example, then X would have value 1 in the first
1615 case and increase by 1 in every succeeding case, and Y would have
1616 value 1 for the first 10 cases and 0 for later cases.
1618 The LEAVE command has no effect on data that comes from an input file
1619 or whose values do not depend on a variable's initial value.
1621 The values of scratch variables (@pxref{Scratch Variables,,,pspp, PSPP
1622 Users Guide}) are always left from one case to another.
1624 The following functions work with a variable's leave status.
1626 @deftypefun bool var_get_leave (const struct variable *@var{var})
1627 Returns true if @var{var}'s value is to be retained from case to case,
1628 false if it is reinitialized to system-missing or spaces.
1631 @deftypefun void var_set_leave (struct variable *@var{var}, bool @var{leave})
1632 If @var{leave} is true, marks @var{var} to be left from case to case;
1633 if @var{leave} is false, marks @var{var} to be reinitialized for each
1636 If @var{var} is a scratch variable, @var{leave} must be true.
1639 @deftypefun bool var_must_leave (const struct variable *@var{var})
1640 Returns true if @var{var} must be left from case to case, that is, if
1641 @var{var} is a scratch variable.
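The effect of leave status on a recurrence such as X = X + 1 can be
modeled in a few lines of standalone C. This is only a sketch: it does
not use PSPP's data structures, @code{toy_run_cases} and
@code{TOY_SYSMIS} are hypothetical names, and NaN stands in for the
system-missing value solely because it propagates through arithmetic.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the system-missing value (an assumption of this
   sketch, not PSPP's actual definition of SYSMIS). */
#define TOY_SYSMIS NAN

/* Simulates evaluating X = X + 1 once per case for N cases and stores
   X's value for each case in OUT.

   If LEAVE is false, X is reinitialized to TOY_SYSMIS before each
   case.  If LEAVE is true, X starts at 0 and is retained from one
   case to the next. */
void
toy_run_cases (bool leave, size_t n, double out[])
{
  double x = leave ? 0.0 : TOY_SYSMIS;
  size_t i;

  assert (out != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      if (!leave)
        x = TOY_SYSMIS;         /* Reinitialized for each case. */
      x = x + 1.0;              /* Missing + 1 is still missing. */
      out[i] = x;
    }
}
```

Without leave status every case yields the missing value. With leave
status the results are 1, 2, 3, and so on, matching the behavior
described above for @samp{LEAVE X}.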
1644 @node Dictionary Class
1645 @subsection Dictionary Class
1647 Occasionally it is useful to classify variables into @dfn{dictionary
1648 classes} based on their names. Dictionary classes are represented by
1649 @enum{dict_class}. This type and other declarations for dictionary
1650 classes are in the @file{<data/dict-class.h>} header.
1652 @deftp {Enumeration} {enum dict_class}
1653 The dictionary classes are:
1657 An ordinary variable, one whose name does not begin with @samp{$} or
1661 A system variable, one whose name begins with @samp{$}. @xref{System
1662 Variables,,,pspp, PSPP Users Guide}.
1665 A scratch variable, one whose name begins with @samp{#}.
1666 @xref{Scratch Variables,,,pspp, PSPP Users Guide}.
1669 The values for dictionary classes are bitwise disjoint, which allows
1670 them to be used in bit-masks. An extra enumeration constant
1671 @code{DC_ALL}, whose value is the bitwise-@i{or} of all of the above
1672 constants, is provided to aid in this purpose.
1675 One example use of dictionary classes arises in connection with PSPP
1676 syntax that uses @code{@var{a} TO @var{b}} to name the variables in a
1677 dictionary from @var{a} to @var{b} (@pxref{Sets of Variables,,,pspp,
1678 PSPP Users Guide}). This syntax requires @var{a} and @var{b} to be in
1679 the same dictionary class. It limits the variables that it includes
1680 to those in that dictionary class.
1682 The following functions relate to dictionary classes.
1684 @deftypefun {enum dict_class} dict_class_from_id (const char *@var{name})
1685 Returns the ``dictionary class'' for the given variable @var{name}, by
1686 looking at its first character.
1689 @deftypefun {const char *} dict_class_to_name (enum dict_class @var{dict_class})
1690 Returns a name for the given @var{dict_class} as an adjective, e.g.@:
1693 This function should probably not be used in new code as it can lead
1694 to difficulties for internationalization.
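As a rough illustration of classification by first character and of
the bit-mask usage described above, consider the following standalone
sketch. The @code{toy_} names and the particular bit values are
assumptions made for the example. They are not copied from
@file{data/dict-class.h}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dictionary class constants.  The
   only property relied upon is that the values are bitwise
   disjoint. */
enum toy_dict_class
  {
    TOY_DC_ORDINARY = 1 << 0,
    TOY_DC_SYSTEM = 1 << 1,
    TOY_DC_SCRATCH = 1 << 2,
    TOY_DC_ALL = TOY_DC_ORDINARY | TOY_DC_SYSTEM | TOY_DC_SCRATCH
  };

/* Classifies NAME by its first character. */
enum toy_dict_class
toy_dict_class_from_id (const char *name)
{
  assert (name != NULL);
  switch (name[0])
    {
    case '$':
      return TOY_DC_SYSTEM;
    case '#':
      return TOY_DC_SCRATCH;
    default:
      return TOY_DC_ORDINARY;
    }
}

/* Returns the number of names among the N in NAMES whose class is
   not included in the bit-mask EXCLUDE. */
size_t
toy_count_included (const char *const names[], size_t n,
                    unsigned int exclude)
{
  size_t count = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (!(toy_dict_class_from_id (names[i]) & exclude))
      count++;
  return count;
}
```

Because the constants are disjoint bits, a caller can exclude several
classes at once by combining them with bitwise-@i{or}, e.g.@:
@code{TOY_DC_SYSTEM | TOY_DC_SCRATCH}.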
1697 @node Variable Creation and Destruction
1698 @subsection Variable Creation and Destruction
1700 Only rarely should PSPP code create or destroy variables directly.
1701 Ordinarily, variables are created within a dictionary and destroyed
1702 by individual deletion from the dictionary or by destroying the entire
1703 dictionary at once. The functions here enable the exceptional case,
1704 of creation and destruction of variables that are not associated with
1705 any dictionary. These functions are used internally in the dictionary
1709 @deftypefun {struct variable *} var_create (const char *@var{name}, int @var{width})
1710 Creates and returns a new variable with the given @var{name} and
1711 @var{width}. The new variable is not part of any dictionary. Use
1712 @func{dict_create_var}, instead, to create a variable in a dictionary
1713 (@pxref{Dictionary Creating Variables}).
1715 @var{name} should be a valid variable name and must be a ``plausible''
1716 variable name (@pxref{Variable Name}). @var{width} must be between 0
1717 and @code{MAX_STRING}, inclusive (@pxref{Values}).
1719 The new variable has no user-missing values, value labels, or variable
1720 label. Numeric variables initially have F8.2 print and write formats,
1721 right-justified display alignment, and scale level of measurement.
1722 String variables are created with A print and write formats,
1723 left-justified display alignment, and nominal level of measurement.
1724 The initial display width is determined by
1725 @func{var_default_display_width} (@pxref{var_default_display_width}).
1727 The new variable initially has no short name (@pxref{Variable Short
1728 Names}) and no auxiliary data (@pxref{Variable Auxiliary Data}).
1732 @deftypefun {struct variable *} var_clone (const struct variable *@var{old_var})
1733 Creates and returns a new variable with the same attributes as
1734 @var{old_var}, with a few exceptions. First, the new variable is not
1735 part of any dictionary, regardless of whether @var{old_var} was in a
1736 dictionary. Use @func{dict_clone_var}, instead, to add a clone of a
1737 variable to a dictionary.
1739 Second, the new variable is not given any short name, even if
1740 @var{old_var} had a short name. This is because the new variable is
1741 likely to be immediately renamed, in which case the short name would
1742 be incorrect (@pxref{Variable Short Names}).
1744 Finally, @var{old_var}'s auxiliary data, if any, is not copied to the
1745 new variable (@pxref{Variable Auxiliary Data}).
1748 @deftypefun {void} var_destroy (struct variable *@var{var})
1749 Destroys @var{var} and frees all associated storage, including its
1750 auxiliary data, if any. @var{var} must not be part of a dictionary.
1751 To delete a variable from a dictionary and destroy it, use
1752 @func{dict_delete_var} (@pxref{Dictionary Deleting Variables}).
1755 @node Variable Short Names
1756 @subsection Variable Short Names
1758 PSPP variable names may be up to 64 (@code{VAR_NAME_LEN}) bytes long.
1759 The system and portable file formats, however, were designed when
1760 variable names were limited to 8 bytes in length. Since then, the
1761 system file format has been augmented with an extension record that
1762 explains how the 8-byte short names map to full-length names
1763 (@pxref{Long Variable Names Record}), but the short names are still
1764 present. Thus, the continued presence of the short names is more or
1765 less invisible to PSPP users, but every variable in a system file
1766 still has a short name that must be unique.
1768 PSPP can generate unique short names for variables based on their full
1769 names at the time it creates the data file. If all variables' full
1770 names are unique in their first 8 bytes, then the short names are
1771 simply prefixes of the full names; otherwise, PSPP changes them so
1772 that they are unique.
1774 By itself this algorithm interoperates well with other software that
1775 can read system files, as long as that software understands the
1776 extension record that maps short names to long names. When the other
1777 software does not understand the extension record, it can produce
1778 surprising results. Consider a situation where PSPP reads a system
1779 file that contains a variable named RANKINGSCORE, then the user
1780 adds a new variable named RANKINGSTATUS, then saves the modified data
1781 as a new system file. A program that does not understand long names
1782 would then see one of these variables under the name RANKINGS---either
1783 one, depending on the algorithm's details---and the other under a
1784 different name. The effect could be very confusing: by adding a new
1785 and apparently unrelated variable in PSPP, the user effectively
1786 renamed the existing variable.
1788 To counteract this potential problem, every @struct{variable} may have
1789 a short name. A variable created by the system or portable file
1790 reader receives the short name from that data file. When a variable
1791 with a short name is written to a system or portable file, that
1792 variable receives priority over other variables whose names begin
1793 with the same 8 bytes but which were not read from a data file under
1796 Variables not created by the system or portable file reader have no
1797 short name by default.
1799 A variable with a full name of 8 bytes or less in length has absolute
1800 priority for that name when the variable is written to a system file,
1801 even over a second variable with that assigned short name.
1803 PSPP does not enforce uniqueness of short names, although the short
1804 names read from any given data file will always be unique. If two
1805 variables with the same short name are written to a single data file,
1806 neither one receives priority.
1808 The following macros and functions relate to short names.
1810 @defmac SHORT_NAME_LEN
1811 Maximum length of a short name, in bytes. Its value is 8.
1814 @deftypefun {const char *} var_get_short_name (const struct variable *@var{var})
1815 Returns @var{var}'s short name, or a null pointer if @var{var} has not
1816 been assigned a short name.
1819 @deftypefun void var_set_short_name (struct variable *@var{var}, const char *@var{short_name})
1820 Sets @var{var}'s short name to @var{short_name}, or removes
1821 @var{var}'s short name if @var{short_name} is a null pointer. If it
1822 is non-null, then @var{short_name} must be a plausible name for a
1823 variable (@pxref{var_is_plausible_name}). The name will be truncated
1824 to 8 bytes in length and converted to all-uppercase.
1827 @deftypefun void var_clear_short_name (struct variable *@var{var})
1828 Removes @var{var}'s short name.
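The truncation and uppercasing applied to short names, and the way two
long names can collide on the same 8-byte prefix, can be sketched as
follows. The code is standalone and illustrative only. The
@code{toy_} names are hypothetical, and the sketch assumes
single-byte (ASCII) names, so it does not address multibyte
encodings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the documented value of SHORT_NAME_LEN. */
#define TOY_SHORT_NAME_LEN 8

/* Copies NAME into SHORT_NAME, truncating it to TOY_SHORT_NAME_LEN
   bytes and converting it to uppercase.  SHORT_NAME must have room
   for TOY_SHORT_NAME_LEN + 1 bytes. */
void
toy_make_short_name (const char *name,
                     char short_name[TOY_SHORT_NAME_LEN + 1])
{
  size_t i;

  assert (name != NULL && short_name != NULL);
  for (i = 0; i < TOY_SHORT_NAME_LEN && name[i] != '\0'; i++)
    short_name[i] = (char) toupper ((unsigned char) name[i]);
  short_name[i] = '\0';
}

/* Returns true if full names A and B would map to the same short
   name, so that one of them would need a different short name in a
   system file. */
bool
toy_short_names_collide (const char *a, const char *b)
{
  char sa[TOY_SHORT_NAME_LEN + 1];
  char sb[TOY_SHORT_NAME_LEN + 1];

  toy_make_short_name (a, sa);
  toy_make_short_name (b, sb);
  return strcmp (sa, sb) == 0;
}
```

With these definitions, RANKINGSCORE and RANKINGSTATUS both reduce to
RANKINGS, which is the collision that a stored short name is meant to
resolve predictably.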
1831 @node Variable Relationships
1832 @subsection Variable Relationships
1834 Variables have close relationships with dictionaries
1835 (@pxref{Dictionaries}) and cases (@pxref{Cases}). A variable is
1836 usually a member of some dictionary, and a case is often used to store
1837 data for the set of variables in a dictionary.
1839 These functions report on these relationships. They may be applied
1840 only to variables that are in a dictionary.
1842 @deftypefun size_t var_get_dict_index (const struct variable *@var{var})
1843 Returns @var{var}'s index within its dictionary. The first variable
1844 in a dictionary has index 0, the next variable index 1, and so on.
1846 The dictionary index can be influenced using dictionary functions such
1847 as @func{dict_reorder_var} (@pxref{dict_reorder_var}).
1850 @deftypefun size_t var_get_case_index (const struct variable *@var{var})
1851 Returns @var{var}'s index within a case. The case index is an index
1852 into an array of @union{value} large enough to contain all the data in
1855 The returned case index can be used to access the value of @var{var}
1856 within a case for its dictionary, as in e.g.@: @code{case_data_idx
1857 (case, var_get_case_index (@var{var}))}, but ordinarily it is more
1858 convenient to use the data access functions that do variable-to-index
1859 translation internally, as in e.g.@: @code{case_data (case,
1863 @node Variable Auxiliary Data
1864 @subsection Variable Auxiliary Data
1866 Each @struct{variable} can have a single pointer to auxiliary data of
1867 type @code{void *}. These functions manipulate a variable's auxiliary
1870 Use of auxiliary data is discouraged because of its lack of
1871 flexibility. Only one client can make use of auxiliary data on a
1872 given variable at any time, even though many clients could usefully
1873 associate data with a variable.
1875 To prevent multiple clients from attempting to use a variable's single
1876 auxiliary data field at the same time, we adopt the convention that
1877 use of auxiliary data in the active file dictionary is restricted to
1878 the currently executing command. In particular, transformations must
1879 not attach auxiliary data to a variable in the active file in the
1880 expectation that it can be used later when the active file is read and
1881 the transformation is executed. To help enforce this restriction,
1882 auxiliary data is deleted from all variables in the active file
1883 dictionary after the execution of each PSPP command.
1885 This convention for safe use of auxiliary data applies only to the
1886 active file dictionary. Rules for other dictionaries may be
1887 established separately.
1889 Auxiliary data should be replaced by a more flexible mechanism at some
1890 point, but no replacement mechanism has been designed or implemented
1893 The following functions work with variable auxiliary data.
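The ownership rules that these functions follow (attach with an
optional destructor, clear by calling the destructor, detach without
calling it) can be modeled with a toy structure. This is a sketch
under stated assumptions, not PSPP's implementation:
@code{struct toy_var} and the @code{toy_} functions are hypothetical,
and returning @var{aux} from the attach function is an assumption
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the auxiliary data fields of a variable. */
struct toy_var
  {
    void *aux;                            /* Auxiliary data, or null. */
    void (*aux_dtor) (struct toy_var *);  /* Destructor, or null. */
  };

/* Attaches AUX, which must be non-null, to V, which must not already
   have auxiliary data.  Returns AUX (assumed for this sketch). */
void *
toy_attach_aux (struct toy_var *v, void *aux,
                void (*aux_dtor) (struct toy_var *))
{
  assert (v->aux == NULL);
  assert (aux != NULL);
  v->aux = aux;
  v->aux_dtor = aux_dtor;
  return aux;
}

/* Removes V's auxiliary data, if any, calling its destructor
   first if one was provided. */
void
toy_clear_aux (struct toy_var *v)
{
  if (v->aux != NULL)
    {
      if (v->aux_dtor != NULL)
        v->aux_dtor (v);
      v->aux = NULL;
      v->aux_dtor = NULL;
    }
}

/* Removes and returns V's auxiliary data without calling the
   destructor.  The caller becomes responsible for the data. */
void *
toy_detach_aux (struct toy_var *v)
{
  void *aux = v->aux;
  v->aux = NULL;
  v->aux_dtor = NULL;
  return aux;
}

/* Destructor that frees V's auxiliary data with free. */
void
toy_dtor_free (struct toy_var *v)
{
  free (v->aux);
}
```

The distinction between clearing and detaching matters for memory
management: after a detach, the destructor will never run, so the
caller must free the returned pointer itself.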
1895 @deftypefun {void *} var_get_aux (const struct variable *@var{var})
1896 Returns @var{var}'s auxiliary data, or a null pointer if none has been
1900 @deftypefun {void *} var_attach_aux (const struct variable *@var{var}, void *@var{aux}, void (*@var{aux_dtor}) (struct variable *))
1901 Sets @var{var}'s auxiliary data to @var{aux}, which must not be null.
1902 @var{var} must not already have auxiliary data.
1904 Before @var{var}'s auxiliary data is cleared by @code{var_clear_aux},
1905 @var{aux_dtor}, if non-null, will be called with @var{var} as its
1906 argument. It should free any storage associated with @var{aux}, if
1907 necessary. @code{var_dtor_free} may be appropriate for use as
1910 @deffn {Function} void var_dtor_free (struct variable *@var{var})
1911 Frees @var{var}'s auxiliary data by calling @code{free}.
1915 @deftypefun void var_clear_aux (struct variable *@var{var})
1916 Removes auxiliary data, if any, from @var{var}, first calling the
1917 destructor passed to @code{var_attach_aux}, if one was provided.
1919 Use @code{dict_clear_aux} to remove auxiliary data from every variable
1920 in a dictionary. @c (@pxref{dict_clear_aux}).
1923 @deftypefun {void *} var_detach_aux (struct variable *@var{var})
1924 Removes auxiliary data, if any, from @var{var}, and returns it.
1925 Returns a null pointer if @var{var} had no auxiliary data.
1927 Any destructor passed to @code{var_attach_aux} is not called, so the
1928 caller is responsible for freeing storage associated with the returned
1932 @node Variable Categorical Values
1933 @subsection Variable Categorical Values
1935 Some statistical procedures require a list of all the values that a
1936 categorical variable takes on. Arranging such a list requires making
1937 a pass through the data, so PSPP caches categorical values in
1940 When variable auxiliary data is revamped to support multiple clients
1941 as described in the previous section, categorical values are an
1942 obvious candidate. The form in which they are currently supported is
1945 Categorical values are not robust against changes in the data. That
1946 is, there is currently no way to detect that a transformation has
1947 changed data values, meaning that categorical values lists for the
1948 changed variables must be recomputed. PSPP is in fact in need of a
1949 general-purpose caching and cache-invalidation mechanism, but none
1950 has yet been designed and built.
1952 The following functions work with cached categorical values.
1954 @deftypefun {struct cat_vals *} var_get_obs_vals (const struct variable *@var{var})
1955 Returns @var{var}'s set of categorical values. Yields undefined
1956 behavior if @var{var} does not have any categorical values.
1959 @deftypefun void var_set_obs_vals (const struct variable *@var{var}, struct cat_vals *@var{cat_vals})
1960 Destroys @var{var}'s categorical values, if any, and replaces them by
1961 @var{cat_vals}, ownership of which is transferred to @var{var}. If
1962 @var{cat_vals} is a null pointer, then @var{var}'s categorical values
1966 @deftypefun bool var_has_obs_vals (const struct variable *@var{var})
1967 Returns true if @var{var} has a set of categorical values, false
1972 @section Dictionaries
1974 Each data file in memory or on disk has an associated dictionary,
1975 whose primary purpose is to describe the data in the file.
1976 @xref{Variables,,,pspp, PSPP Users Guide}, for a PSPP user's view of a
1979 A data file stored in a PSPP format, either as a system or portable
1980 file, has a representation of its dictionary embedded in it. Other
1981 kinds of data files are usually not self-describing enough to
1982 construct a dictionary unassisted, so the dictionaries for these files
1983 must be specified explicitly with PSPP commands such as @cmd{DATA
1986 The most important content of a dictionary is an array of variables,
1987 which must have unique names. A dictionary also conceptually contains
1988 a mapping from each of its variables to a location within a case
1989 (@pxref{Cases}), although in fact these mappings are stored within
1990 individual variables.
1992 System variables are not members of any dictionary (@pxref{System
1993 Variables,,,pspp, PSPP Users Guide}).
1995 Dictionaries are represented by @struct{dictionary}. Declarations
1996 related to dictionaries are in the @file{<data/dictionary.h>} header.
1998 The following sections describe functions for use with dictionaries.
2001 * Dictionary Variable Access::
2002 * Dictionary Creating Variables::
2003 * Dictionary Deleting Variables::
2004 * Dictionary Reordering Variables::
2005 * Dictionary Renaming Variables::
2006 * Dictionary Weight Variable::
2007 * Dictionary Filter Variable::
2008 * Dictionary Case Limit::
2009 * Dictionary Split Variables::
2010 * Dictionary File Label::
2011 * Dictionary Documents::
2014 @node Dictionary Variable Access
2015 @subsection Accessing Variables
2017 The most common operations on a dictionary simply retrieve a
2018 @code{struct variable *} of an individual variable based on its name
2021 @deftypefun {struct variable *} dict_lookup_var (const struct dictionary *@var{dict}, const char *@var{name})
2022 @deftypefunx {struct variable *} dict_lookup_var_assert (const struct dictionary *@var{dict}, const char *@var{name})
2023 Looks up and returns the variable with the given @var{name} within
2024 @var{dict}. Name lookup is not case-sensitive.
2026 @code{dict_lookup_var} returns a null pointer if @var{dict} does not
2027 contain a variable named @var{name}. @code{dict_lookup_var_assert}
2028 asserts that such a variable exists.
2031 @deftypefun {struct variable *} dict_get_var (const struct dictionary *@var{dict}, size_t @var{position})
2032 Returns the variable at the given @var{position} in @var{dict}.
2033 @var{position} must be less than the number of variables in @var{dict}
2037 @deftypefun size_t dict_get_var_cnt (const struct dictionary *@var{dict})
2038 Returns the number of variables in @var{dict}.
2041 Another pair of functions allows retrieving a number of variables at
2042 once. These functions are more rarely useful.
2044 @deftypefun void dict_get_vars (const struct dictionary *@var{dict}, const struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2045 @deftypefunx void dict_get_vars_mutable (const struct dictionary *@var{dict}, struct variable ***@var{vars}, size_t *@var{cnt}, enum dict_class @var{exclude})
2046 Retrieves all of the variables in @var{dict}, in their original order,
2047 except that any variables in the dictionary classes specified by
2048 @var{exclude}, if any, are excluded (@pxref{Dictionary Class}).
2049 Pointers to the variables are stored in an array allocated with
2050 @code{malloc}, and a pointer to the first element of this array is
2051 stored in @code{*@var{vars}}. The caller is responsible for freeing
2052 this memory when it is no longer needed. The number of variables
2053 retrieved is stored in @code{*@var{cnt}}.
2055 The presence or absence of @code{DC_SYSTEM} in @var{exclude} has no
2056 effect, because dictionaries never include system variables.
2059 One additional function is available. This function is most often
2060 used in assertions, but it is not restricted to such use.
2062 @deftypefun bool dict_contains_var (const struct dictionary *@var{dict}, const struct variable *@var{var})
2063 Tests whether @var{var} is one of the variables in @var{dict}.
2064 Returns true if so, false otherwise.
2067 @node Dictionary Creating Variables
2068 @subsection Creating Variables
2070 These functions create a new variable and insert it into a dictionary
2073 There is no provision for inserting an already created variable into a
2074 dictionary. There is no reason that such a function could not be
2075 written, but so far there has been no need for one.
2077 The names provided to one of these functions should be valid variable
2078 names and must be plausible variable names. @c (@pxref{Variable Names}).
2080 If a variable with the same name already exists in the dictionary, the
2081 non-@code{assert} variants of these functions return a null pointer,
2082 without modifying the dictionary. The @code{assert} variants, on the
2083 other hand, assert that no duplicate name exists.
2085 A variable may be in only one dictionary at any given time.
2087 @deftypefun {struct variable *} dict_create_var (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2088 @deftypefunx {struct variable *} dict_create_var_assert (struct dictionary *@var{dict}, const char *@var{name}, int @var{width})
2089 Creates a new variable with the given @var{name} and @var{width}, as
2090 if through a call to @code{var_create} with those arguments
2091 (@pxref{var_create}), appends the new variable to @var{dict}'s array
2092 of variables, and returns the new variable.
2095 @deftypefun {struct variable *} dict_clone_var (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2096 @deftypefunx {struct variable *} dict_clone_var_assert (struct dictionary *@var{dict}, const struct variable *@var{old_var}, const char *@var{name})
2097 Creates a new variable as a clone of @var{old_var}, inserts the new
2098 variable into @var{dict}, and returns the new variable. The new
2099 variable is named @var{name}. Other properties of the new variable
2100 are copied from @var{old_var}, except for those not copied by
2101 @code{var_clone} (@pxref{var_clone}).
2103 @var{old_var} does not need to be a member of any dictionary.
2106 @node Dictionary Deleting Variables
2107 @subsection Deleting Variables
2109 These functions remove variables from a dictionary's array of
2110 variables. They also destroy the removed variables and free their
2113 Deleting a variable to which there might be external pointers is a bad
2114 idea. In particular, deleting variables from the active file
2115 dictionary is a risky proposition, because transformations can retain
2116 references to arbitrary variables. Therefore, no variable should be
2117 deleted from the active file dictionary when any transformations are
2118 active, because those transformations might reference the variable to
2119 be deleted. The safest time to delete a variable is just after a
2120 procedure has been executed, as done by @cmd{DELETE VARIABLES}.
2122 Deleting a variable automatically removes references to that variable
2123 from elsewhere in the dictionary as a weighting variable, filter
2124 variable, @cmd{SPLIT FILE} variable, or member of a vector.
2126 No functions are provided for removing a variable from a dictionary
2127 without destroying that variable. As with insertion of an existing
2128 variable, there is no reason that this could not be implemented, but
2129 so far there has been no need.
2131 @deftypefun void dict_delete_var (struct dictionary *@var{dict}, struct variable *@var{var})
2132 Deletes @var{var} from @var{dict}, of which it must be a member.
2135 @deftypefun void dict_delete_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{count})
2136 Deletes the @var{count} variables in array @var{vars} from @var{dict}.
2137 All of the variables in @var{vars} must be members of @var{dict}. No
2138 variable may be included in @var{vars} more than once.
2141 @deftypefun void dict_delete_consecutive_vars (struct dictionary *@var{dict}, size_t @var{idx}, size_t @var{count})
2142 Deletes the variables in sequential positions
2143 @var{idx}@dots{}@var{idx} + @var{count} (exclusive) from @var{dict},
2144 which must contain at least @var{idx} + @var{count} variables.
2147 @deftypefun void dict_delete_scratch_vars (struct dictionary *@var{dict})
2148 Deletes all scratch variables from @var{dict}.
2151 @node Dictionary Reordering Variables
2152 @subsection Changing Variable Order
2154 The variables in a dictionary are stored in an array. These functions
2155 change the order of a dictionary's array of variables without changing
2156 which variables are in the dictionary.
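The reordering performed when a single variable is moved, with all
other variables keeping their relative positions, amounts to rotating
part of an array. The following standalone sketch shows the idea on
an array of names. @code{toy_reorder} is a hypothetical helper, and
nothing here is taken from the dictionary implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Moves the element at OLD_INDEX in the N-element array VARS to
   NEW_INDEX, shifting the elements in between by one position so
   that all other elements retain their relative order. */
void
toy_reorder (const char *vars[], size_t n,
             size_t old_index, size_t new_index)
{
  const char *moved;

  assert (old_index < n && new_index < n);
  moved = vars[old_index];
  if (new_index < old_index)
    /* Shift the intervening elements one slot toward the end. */
    memmove (&vars[new_index + 1], &vars[new_index],
             (old_index - new_index) * sizeof *vars);
  else
    /* Shift the intervening elements one slot toward the start. */
    memmove (&vars[old_index], &vars[old_index + 1],
             (new_index - old_index) * sizeof *vars);
  vars[new_index] = moved;
}
```

For example, moving the element at index 2 of A, B, C, D to index 0
gives C, A, B, D.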
2158 @anchor{dict_reorder_var}
2159 @deftypefun void dict_reorder_var (struct dictionary *@var{dict}, struct variable *@var{var}, size_t @var{new_index})
2160 Moves @var{var}, which must be in @var{dict}, so that it is at
2161 position @var{new_index} in @var{dict}'s array of variables. Other
2162 variables in @var{dict}, if any, retain their relative positions.
2163 @var{new_index} must be less than the number of variables in
2167 @deftypefun void dict_reorder_vars (struct dictionary *@var{dict}, struct variable *const *@var{new_order}, size_t @var{count})
2168 Moves the @var{count} variables in @var{new_order} to the beginning of
2169 @var{dict}'s array of variables in the specified order. Other
2170 variables in @var{dict}, if any, retain their relative positions.
2172 All of the variables in @var{new_order} must be in @var{dict}. No
2173 duplicates are allowed within @var{new_order}, which means that
2174 @var{count} must be no greater than the number of variables in
2178 @node Dictionary Renaming Variables
2179 @subsection Renaming Variables
2181 These functions change the names of variables within a dictionary.
2182 The @func{var_set_name} function (@pxref{var_set_name}) cannot be
2183 applied directly to a variable that is in a dictionary, because
2184 @struct{dictionary} contains an index by name that @func{var_set_name}
2185 would not update. The following functions take care to update the
2186 index as well. They also ensure that variable renaming does not cause
2187 a dictionary to contain a duplicate variable name.
2189 @deftypefun void dict_rename_var (struct dictionary *@var{dict}, struct variable *@var{var}, const char *@var{new_name})
2190 Changes the name of @var{var}, which must be in @var{dict}, to
2191 @var{new_name}. A variable named @var{new_name} must not already be
2192 in @var{dict}, unless @var{new_name} is the same as @var{var}'s
2196 @deftypefun bool dict_rename_vars (struct dictionary *@var{dict}, struct variable **@var{vars}, char **@var{new_names}, size_t @var{count}, char **@var{err_name})
2197 Renames each of the @var{count} variables in @var{vars} to the name in
2198 the corresponding position of @var{new_names}. If the renaming would
2199 result in a duplicate variable name, returns false and stores one of
2200 the names that would be duplicated into @code{*@var{err_name}}, if
2201 @var{err_name} is non-null. Otherwise, the renaming is successful,
2202 and true is returned.
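One way to picture a batch rename with duplicate detection is to apply
all of the new names to a trial copy of the name list, check the
result for duplicates, and commit only if none is found. The sketch
below does this on a plain array of strings. It is an illustration,
not the dictionary's algorithm: the @code{toy_} names are
hypothetical, names are compared without regard to case because name
lookup is not case-sensitive, and leaving the list untouched on
failure is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if A and B are equal, ignoring case. */
bool
toy_names_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return false;
  return *a == *b;
}

/* Renames the COUNT entries of the N-element array NAMES selected by
   IDX to the corresponding entries of NEW_NAMES.  If the result
   would contain a duplicate, leaves NAMES unchanged, stores one
   offending name in *ERR_NAME if ERR_NAME is non-null, and returns
   false.  Otherwise commits the renaming and returns true. */
bool
toy_rename_vars (const char *names[], size_t n, const size_t idx[],
                 const char *const new_names[], size_t count,
                 const char **err_name)
{
  const char **trial;
  bool ok = true;
  size_t i, j;

  assert (n > 0);
  trial = malloc (n * sizeof *trial);
  if (trial == NULL)
    abort ();
  memcpy (trial, names, n * sizeof *trial);
  for (i = 0; i < count; i++)
    {
      assert (idx[i] < n);
      trial[idx[i]] = new_names[i];
    }

  /* Quadratic duplicate check, adequate for a sketch. */
  for (i = 0; i < n && ok; i++)
    for (j = i + 1; j < n; j++)
      if (toy_names_equal (trial[i], trial[j]))
        {
          if (err_name != NULL)
            *err_name = trial[i];
          ok = false;
          break;
        }

  if (ok)
    memcpy (names, trial, n * sizeof *trial);
  free (trial);
  return ok;
}
```

Checking the final state rather than each rename in turn is what lets
a batch swap two names, which could not be done by renaming one
variable at a time without a temporary name.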
2205 @node Dictionary Weight Variable
2206 @subsection Weight Variable
2208 A data set's cases may optionally be weighted by the value of a
2209 numeric variable. @xref{WEIGHT,,,pspp, PSPP Users Guide}, for a user
2210 view of weight variables.
2212 The weight variable is written to and read from system and portable
2215 The most commonly useful function related to weighting is a
2216 convenience function to retrieve a weighting value from a case.
2218 @deftypefun double dict_get_case_weight (const struct dictionary *@var{dict}, const struct ccase *@var{case}, bool *@var{warn_on_invalid})
2219 Retrieves and returns the value of the weighting variable specified by
2220 @var{dict} from @var{case}. Returns 1.0 if @var{dict} has no
2223 Returns 0.0 if @var{case}'s weight value is user- or system-missing,
2224 zero, or negative. In such a case, if @var{warn_on_invalid} is
2225 non-null and @code{*@var{warn_on_invalid}} is true,
2226 @func{dict_get_case_weight} also issues an error message and sets
2227 @code{*@var{warn_on_invalid}} to false. To disable error reporting,
2228 pass a null pointer or a pointer to false as @var{warn_on_invalid} or
2229 use a @func{msg_disable}/@func{msg_enable} pair.
2232 The dictionary also has a pair of functions for getting and setting
2233 the weight variable.
2235 @deftypefun {struct variable *} dict_get_weight (const struct dictionary *@var{dict})
2236 Returns @var{dict}'s current weighting variable, or a null pointer if
2237 the dictionary does not have a weighting variable.
2240 @deftypefun void dict_set_weight (struct dictionary *@var{dict}, struct variable *@var{var})
2241 Sets @var{dict}'s weighting variable to @var{var}. If @var{var} is
2242 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2243 is null, then @var{dict}'s weighting variable, if any, is cleared.
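The rules that @func{dict_get_case_weight} applies to a case's weight
can be summarized in a small standalone model. This sketch makes
several simplifying assumptions: @code{toy_case_weight} is a
hypothetical function, the weight is passed in directly instead of
being read from a case, NaN stands in for any missing value (user- or
system-missing), and the diagnostic goes to @code{stderr} instead of
PSPP's message subsystem.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns the effective weight for a case whose raw weight value is
   RAW.  HAS_WEIGHT_VAR says whether a weighting variable is set.

   If the weight is invalid and WARN_ON_INVALID is non-null and
   points to true, reports the problem once and sets
   *WARN_ON_INVALID to false so that later calls stay quiet. */
double
toy_case_weight (bool has_weight_var, double raw, bool *warn_on_invalid)
{
  if (!has_weight_var)
    return 1.0;                 /* Unweighted data. */

  if (isnan (raw) || raw <= 0.0)
    {
      if (warn_on_invalid != NULL && *warn_on_invalid)
        {
          *warn_on_invalid = false;
          fprintf (stderr, "warning: case has missing, zero, or "
                   "negative weight; treated as weight 0\n");
        }
      return 0.0;
    }
  return raw;
}
```

A procedure would typically keep one flag, initialized to true, for
its whole pass over the data, so that at most one message is issued
however many invalid weights are encountered.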
2246 @node Dictionary Filter Variable
2247 @subsection Filter Variable
2249 When the active file is read by a procedure, cases can be excluded
2250 from analysis based on the values of a @dfn{filter variable}.
2251 @xref{FILTER,,,pspp, PSPP Users Guide}, for a user view of filtering.
2253 These functions store and retrieve the filter variable. They are
2254 rarely useful, because the data analysis framework automatically
2255 excludes from analysis the cases that should be filtered.
2257 @deftypefun {struct variable *} dict_get_filter (const struct dictionary *@var{dict})
2258 Returns @var{dict}'s current filter variable, or a null pointer if the
2259 dictionary does not have a filter variable.
2262 @deftypefun void dict_set_filter (struct dictionary *@var{dict}, struct variable *@var{var})
2263 Sets @var{dict}'s filter variable to @var{var}. If @var{var} is
2264 non-null, it must be a numeric variable in @var{dict}. If @var{var}
2265 is null, then @var{dict}'s filter variable, if any, is cleared.
2268 @node Dictionary Case Limit
2269 @subsection Case Limit
2271 The limit on cases analyzed by a procedure, set by the @cmd{N OF
2272 CASES} command (@pxref{N OF CASES,,,pspp, PSPP Users Guide}), is
2273 stored as part of the dictionary. The dictionary does not, on the
2274 other hand, play any role in enforcing the case limit (a job done by
2275 data analysis framework code).
2277 A case limit of 0 means that the number of cases is not limited.
2279 These functions are rarely useful, because the data analysis framework
2280 automatically excludes from analysis any cases beyond the limit.
2282 @deftypefun casenumber dict_get_case_limit (const struct dictionary *@var{dict})
2283 Returns the current case limit for @var{dict}.
2286 @deftypefun void dict_set_case_limit (struct dictionary *@var{dict}, casenumber @var{limit})
2287 Sets @var{dict}'s case limit to @var{limit}.
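The convention that a case limit of 0 means ``no limit'' can be sketched as follows. This is only an illustration, not PSPP code: the helper @code{case_is_within_limit} is hypothetical, and the local @code{casenumber} typedef is an assumed stand-in for the real type.

```c
#include <stdbool.h>

/* Stand-in for PSPP's casenumber type (assumed here to be a signed
   integer type). */
typedef long int casenumber;

/* Hypothetical helper, not part of PSPP: returns true if the case with
   zero-based index IDX should be analyzed under case limit LIMIT.
   A LIMIT of 0 means that the number of cases is not limited. */
bool
case_is_within_limit (casenumber limit, casenumber idx)
{
  return limit == 0 || idx < limit;
}
```

Code in the data analysis framework, rather than the dictionary, would apply a test of this kind to the value returned by @func{dict_get_case_limit}.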
2290 @node Dictionary Split Variables
2291 @subsection Split Variables
2293 The user may use the @cmd{SPLIT FILE} command (@pxref{SPLIT
2294 FILE,,,pspp, PSPP Users Guide}) to select a set of variables on which
2295 to split the active file into groups of cases to be analyzed
2296 independently in each statistical procedure. The set of split
2297 variables is stored as part of the dictionary, although the effect on
2298 data analysis is implemented by each individual statistical procedure.
2300 Split variables may be numeric, short string, or long string variables.
2302 The most useful functions for split variables are those to retrieve
2303 them. Even these functions are rarely useful directly: for the
2304 purpose of breaking cases into groups based on the values of the split
2305 variables, it is usually easier to use
2306 @func{casegrouper_create_splits}.
2308 @deftypefun {const struct variable *const *} dict_get_split_vars (const struct dictionary *@var{dict})
2309 Returns a pointer to an array of pointers to split variables. If and
2310 only if there are no split variables, returns a null pointer. The
2311 caller must not modify or free the returned array.
2314 @deftypefun size_t dict_get_split_cnt (const struct dictionary *@var{dict})
2315 Returns the number of split variables.
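Because @func{dict_get_split_vars} returns a null pointer when there are no split variables, a loop bounded by the count handles both cases without a special check. The sketch below illustrates this with a hypothetical helper, @code{is_split_var}, that takes the array and count as a caller would obtain them from the two functions above. The @code{struct variable} definition is a stand-in for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PSPP's struct variable, which is opaque to clients. */
struct variable
  {
    const char *name;
  };

/* Hypothetical helper, not part of PSPP: returns true if VAR is one of
   the N_SPLITS variables in SPLITS.  SPLITS may be a null pointer if
   and only if N_SPLITS is 0, matching the contract described above. */
bool
is_split_var (const struct variable *const *splits, size_t n_splits,
              const struct variable *var)
{
  size_t i;

  for (i = 0; i < n_splits; i++)
    if (splits[i] == var)
      return true;
  return false;
}
```

The comparison is by pointer identity, on the assumption that each variable in a dictionary is represented by a single object.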
2318 The following functions are also available for working with split
2321 @deftypefun void dict_set_split_vars (struct dictionary *@var{dict}, struct variable *const *@var{vars}, size_t @var{cnt})
2322 Sets @var{dict}'s split variables to the @var{cnt} variables in
2323 @var{vars}. If @var{cnt} is 0, then @var{dict} will not have any
2324 split variables. The caller retains ownership of @var{vars}.
2327 @deftypefun void dict_unset_split_var (struct dictionary *@var{dict}, struct variable *@var{var})
2328 Removes @var{var}, which must be a variable in @var{dict}, from
2329 @var{dict}'s set of split variables.
2332 @node Dictionary File Label
2333 @subsection File Label
2335 A dictionary may optionally have an associated string that describes
2336 its contents, called its file label. The user may set the file label
2337 with the @cmd{FILE LABEL} command (@pxref{FILE LABEL,,,pspp, PSPP
2340 These functions set and retrieve the file label.
2342 @deftypefun {const char *} dict_get_label (const struct dictionary *@var{dict})
2343 Returns @var{dict}'s file label. If @var{dict} does not have a label,
2344 returns a null pointer.
2347 @deftypefun void dict_set_label (struct dictionary *@var{dict}, const char *@var{label})
2348 Sets @var{dict}'s label to @var{label}. If @var{label} is non-null,
2349 then its content, truncated to at most 60 bytes, becomes the new file
2350 label. If @var{label} is null, then @var{dict}'s label is removed.
2352 The caller retains ownership of @var{label}.
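The copy-and-truncate behavior described for @func{dict_set_label} can be modeled as below. This is a minimal sketch under stated assumptions, not the PSPP implementation: @code{copy_file_label} and @code{MAX_LABEL_BYTES} are hypothetical names, and the sketch truncates at a byte boundary without regard to character encoding.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name for the 60-byte limit mentioned in the text. */
#define MAX_LABEL_BYTES 60

/* Hypothetical helper, not part of PSPP: returns a newly allocated copy
   of LABEL truncated to at most MAX_LABEL_BYTES bytes, or a null
   pointer if LABEL is null or allocation fails.  The caller retains
   ownership of LABEL and must free the returned copy. */
char *
copy_file_label (const char *label)
{
  size_t len;
  char *copy;

  if (label == NULL)
    return NULL;

  len = strlen (label);
  if (len > MAX_LABEL_BYTES)
    len = MAX_LABEL_BYTES;

  copy = malloc (len + 1);
  if (copy == NULL)
    return NULL;
  memcpy (copy, label, len);
  copy[len] = '\0';
  return copy;
}
```

Because the function stores its own copy, the caller remains free to modify or free the original string afterward, which is what ``the caller retains ownership'' implies.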
2355 @node Dictionary Documents
2356 @subsection Documents
2358 A dictionary may include an arbitrary number of lines of explanatory
2359 text, called the dictionary's documents. For compatibility, document
2360 lines have a fixed width, and lines that are not exactly this width
2361 are truncated or padded with spaces as necessary to bring them to the
2364 PSPP users can use the @cmd{DOCUMENT} (@pxref{DOCUMENT,,,pspp, PSPP
2365 Users Guide}), @cmd{ADD DOCUMENT} (@pxref{ADD DOCUMENT,,,pspp, PSPP
2366 Users Guide}), and @cmd{DROP DOCUMENTS} (@pxref{DROP DOCUMENTS,,,pspp,
2367 PSPP Users Guide}) commands to manipulate documents.
2369 @deftypefn Macro int DOC_LINE_LENGTH
2370 The fixed length of a document line, in bytes, defined as 80.
2373 The following functions work with whole sets of documents. They
2374 accept or return sets of documents formatted as null-terminated
2375 strings that are an exact multiple of @code{DOC_LINE_LENGTH}
2378 @deftypefun {const char *} dict_get_documents (const struct dictionary *@var{dict})
2379 Returns the documents in @var{dict}, or a null pointer if @var{dict}
2383 @deftypefun void dict_set_documents (struct dictionary *@var{dict}, const char *@var{new_documents})
2384 Sets @var{dict}'s documents to @var{new_documents}. If
2385 @var{new_documents} is a null pointer or an empty string, then
2386 @var{dict}'s documents are cleared. The caller retains ownership of
2387 @var{new_documents}.
2390 @deftypefun void dict_clear_documents (struct dictionary *@var{dict})
2391 Clears the documents from @var{dict}.
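Since a set of documents is a single string made of fixed-width lines, the number of lines follows directly from the string's length. The following sketch assumes that layout. @code{count_document_lines} is a hypothetical helper, and @code{DOC_LINE_LENGTH} is redefined locally only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Redefined here for illustration; PSPP provides its own definition. */
#define DOC_LINE_LENGTH 80

/* Hypothetical helper, not part of PSPP: returns the number of lines in
   DOCUMENTS, a null-terminated string whose length must be an exact
   multiple of DOC_LINE_LENGTH.  A null pointer or an empty string
   means that there are no documents. */
size_t
count_document_lines (const char *documents)
{
  size_t len;

  if (documents == NULL)
    return 0;

  len = strlen (documents);
  assert (len % DOC_LINE_LENGTH == 0);
  return len / DOC_LINE_LENGTH;
}
```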
2394 The following functions work with individual lines in a dictionary's
2397 @deftypefun void dict_add_document_line (struct dictionary *@var{dict}, const char *@var{content})
2398 Appends @var{content} to the documents in @var{dict}. The text in
2399 @var{content} will be truncated or padded with spaces as necessary to
2400 make it exactly @code{DOC_LINE_LENGTH} bytes long. The caller retains
2401 ownership of @var{content}.
2403 If @var{content} is longer than @code{DOC_LINE_LENGTH} bytes, this function also
2404 issues a warning using @func{msg}. To suppress the warning, enclose a
2405 call to this function in a @func{msg_disable}/@func{msg_enable}
2409 @deftypefun size_t dict_get_document_line_cnt (const struct dictionary *@var{dict})
2410 Returns the number of lines of documents in @var{dict}. If the
2411 dictionary contains no documents, returns 0.
2414 @deftypefun void dict_get_document_line (const struct dictionary *@var{dict}, size_t @var{idx}, struct string *@var{content})
2415 Replaces the text in @var{content} (which must already have been
2416 initialized by the caller) by the document line in @var{dict} numbered
2417 @var{idx}, which must be less than the number of lines of documents in
2418 @var{dict}. Any trailing white space in the document line is trimmed,
2419 so that @var{content} will have a length between 0 and
2420 @code{DOC_LINE_LENGTH}.
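The padding on storage and trimming on retrieval described above can be modeled with two small functions. This is a sketch under stated assumptions rather than PSPP's implementation: both helper names are hypothetical, @code{DOC_LINE_LENGTH} is redefined locally, and only space characters are treated as trailing white space.

```c
#include <stddef.h>
#include <string.h>

/* Redefined here for illustration; PSPP provides its own definition. */
#define DOC_LINE_LENGTH 80

/* Hypothetical helper, not part of PSPP: copies CONTENT into LINE,
   truncating it or padding it with spaces so that exactly
   DOC_LINE_LENGTH bytes are written.  LINE is not null-terminated. */
void
pad_document_line (const char *content, char line[DOC_LINE_LENGTH])
{
  size_t len = strlen (content);

  if (len > DOC_LINE_LENGTH)
    len = DOC_LINE_LENGTH;
  memcpy (line, content, len);
  memset (line + len, ' ', DOC_LINE_LENGTH - len);
}

/* Hypothetical helper, not part of PSPP: returns the length of the
   fixed-width LINE once trailing spaces are ignored.  The result is
   between 0 and DOC_LINE_LENGTH, inclusive. */
size_t
trimmed_document_line_length (const char line[DOC_LINE_LENGTH])
{
  size_t len = DOC_LINE_LENGTH;

  while (len > 0 && line[len - 1] == ' ')
    len--;
  return len;
}
```

A consequence of this round trip is that trailing spaces in the original text are not preserved: a line stored with trailing spaces is indistinguishable from one stored without them.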
2423 @node Coding Conventions
2424 @section Coding Conventions
2426 Every @file{.c} file should have @samp{#include <config.h>} as its
2427 first non-comment line. No @file{.h} file should include
2430 This section needs to be finished.
2435 This section needs to be written.
2440 This section needs to be written.
2445 This section needs to be written.