X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fconcepts.texi;h=06652d62653b1ec21ccdbc387f8993ae8f762b1f;hb=dc4da1f8120bddad12c1326714438f05b594e6e1;hp=ac45416fc088371a4392a637d88b0ee2687f66d6;hpb=6b562f8a8263930b8d1ed1862efec76f2511ed08;p=pspp-builds.git diff --git a/doc/dev/concepts.texi b/doc/dev/concepts.texi index ac45416f..06652d62 100644 --- a/doc/dev/concepts.texi +++ b/doc/dev/concepts.texi @@ -654,19 +654,17 @@ Returns the name of the given format @var{type}. These functions provide the ability to convert data fields into @union{value}s and vice versa. -@deftypefun bool data_in (struct substring @var{input}, enum legacy_encoding @var{legacy_encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, union value *@var{output}, int @var{width}) +@deftypefun bool data_in (struct substring @var{input}, const char *@var{encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, const struct dictionary *@var{dict}, union value *@var{output}, int @var{width}) Parses @var{input} as a field containing data in the given format @var{type}. The resulting value is stored in @var{output}, which the caller must have initialized with the given @var{width}. For consistency, @var{width} must be 0 if @var{type} is a numeric format type and greater than 0 if @var{type} is a string format type. - -Ordinarily @var{legacy_encoding} should be @code{LEGACY_NATIVE}, -indicating that @var{input} is encoded in the character set -conventionally used on the host machine. It may be set to -@code{LEGACY_EBCDIC} to cause @var{input} to be re-encoded from EBCDIC -during data parsing. +@var{encoding} should be set to indicate the character +encoding of @var{input}. +@var{dict} must be a pointer to the dictionary with which @var{output} +is associated. 
If @var{input} is the empty string (with length 0), @var{output} is set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP @@ -701,21 +699,15 @@ not propagated to the caller as errors. This function is declared in @file{data/data-in.h}. @end deftypefun -@deftypefun void data_out (const union value *@var{input}, const struct fmt_spec *@var{format}, char *@var{output}) -@deftypefunx void data_out_legacy (const union value *@var{input}, enum legacy_encoding @var{legacy_encoding}, const struct fmt_spec *@var{format}, char *@var{output}) -Converts the data pointed to by @var{input} into a data field in -@var{output} according to output format specifier @var{format}, which -must be a valid output format. Exactly @code{@var{format}->w} bytes -are written to @var{output}. The width of @var{input} is also +@deftypefun char * data_out (const union value *@var{input}, const struct fmt_spec *@var{format}) +@deftypefunx char * data_out_legacy (const union value *@var{input}, const char *@var{encoding}, const struct fmt_spec *@var{format}) +Converts the data pointed to by @var{input} into a string value, which +will be encoded in UTF-8, according to output format specifier @var{format}. +@var{format} +must be a valid output format. The width of @var{input} is inferred from @var{format} using an algorithm equivalent to @func{fmt_var_width}. -If @func{data_out} is called, or @func{data_out_legacy} is called with -@var{legacy_encoding} set to @code{LEGACY_NATIVE}, @var{output} will -be encoded in the character set conventionally used on the host -machine. If @var{legacy_encoding} is set to @code{LEGACY_EBCDIC}, -@var{output} will be re-encoded from EBCDIC during data output. 
- When @var{input} contains data that cannot be represented in the given @var{format}, @func{data_out} may output a message using @func{msg}, @c (@pxref{msg}), @@ -743,28 +735,7 @@ variable, is most conveniently executed through functions on A @struct{missing_values} is essentially a set of @union{value}s that have a common value width (@pxref{Values}). For a set of missing values associated with a variable (the common case), the set's -width is the same as the variable's width. The contents of a set of -missing values is subject to some restrictions. Regardless of width, -a set of missing values is allowed to be empty. Otherwise, its -possible contents depend on its width: - -@table @asis -@item 0 (numeric values) -Up to three discrete numeric values, or a range of numeric values -(which includes both ends of the range), or a range plus one discrete -numeric value. - -@item 1@dots{}@t{MAX_SHORT_STRING} - 1 (short string values) -Up to three discrete string values (with the same width as the set). - -@item @t{MAX_SHORT_STRING}@dots{}@t{MAX_STRING} (long string values) -Always empty. -@end table - -These somewhat arbitrary restrictions are the same as those imposed by -SPSS. In PSPP we could easily eliminate these restrictions, but doing -so would also require us to extend the system file format in an -incompatible way, which we consider a bad tradeoff. +width is the same as the variable's width. Function prototypes and other declarations related to missing values are declared in @file{data/missing-values.h}. @@ -773,18 +744,37 @@ are declared in @file{data/missing-values.h}. Opaque type that represents a set of missing values. @end deftp +The contents of a set of missing values are subject to some +restrictions. Regardless of width, a set of missing values is allowed +to be empty. 
A set of numeric missing values may contain up to three +discrete numeric values, or a range of numeric values (which includes +both ends of the range), or a range plus one discrete numeric value. +A set of string missing values may contain up to three discrete string +values (with the same width as the set), but ranges are not supported. + +In addition, values in string missing values wider than +@code{MV_MAX_STRING} bytes may contain non-space characters only in +their first @code{MV_MAX_STRING} bytes; all the bytes after the first +@code{MV_MAX_STRING} must be spaces. @xref{mv_is_acceptable}, for a +function that tests a value against these constraints. + +@deftypefn Macro int MV_MAX_STRING +Number of bytes in a string missing value that are not required to be +spaces. The current value is 8, a value which is fixed by the system +file format. In PSPP we could easily eliminate this restriction, but +doing so would also require us to extend the system file format in an +incompatible way, which we consider a bad tradeoff. +@end deftypefn + The most often useful functions for missing values are those for testing whether a given value is missing, described in the following section. Several other functions for creating, inspecting, and modifying @struct{missing_values} objects are described afterward, but -these functions are much more rarely useful. No function for -destroying a @struct{missing_values} is provided, because -@struct{missing_values} does not contain any pointers or other -references to resources that need deallocation. +these functions are much more rarely useful. @menu * Testing for Missing Values:: -* Initializing User-Missing Value Sets:: +* Creating and Destroying User-Missing Values:: * Changing User-Missing Value Set Width:: * Inspecting User-Missing Value Sets:: * Modifying User-Missing Value Sets:: @@ -836,8 +826,10 @@ missing. 
@end deftp @end deftypefun -@node Initializing User-Missing Value Sets -@subsection Initializing User-Missing Value Sets +@node Creating and Destroying User-Missing Values +@subsection Creation and Destruction + +These functions create and destroy @struct{missing_values} objects. @deftypefun void mv_init (struct missing_values *@var{mv}, int @var{width}) Initializes @var{mv} as a set of user-missing values. The set is @@ -845,6 +837,10 @@ initially empty. Any values added to it must have the specified @var{width}. @end deftypefun +@deftypefun void mv_destroy (struct missing_values *@var{mv}) +Destroys @var{mv}, which must not be referred to again. +@end deftypefun + @deftypefun void mv_copy (struct missing_values *@var{mv}, const struct missing_values *@var{old}) Initializes @var{mv} as a copy of the existing set of user-missing values @var{old}. @@ -874,11 +870,9 @@ the required width, may be used instead. Tests whether @var{mv}'s width may be changed to @var{new_width} using @func{mv_resize}. Returns true if it is allowed, false otherwise. -If @var{new_width} is a long string width, @var{mv} may be resized -only if it is empty. Otherwise, if @var{mv} contains any missing -values, then it may be resized only if each missing value may be -resized, as determined by @func{value_is_resizable} -(@pxref{value_is_resizable}). +If @var{mv} contains any missing values, then it may be resized only +if each missing value may be resized, as determined by +@func{value_is_resizable} (@pxref{value_is_resizable}). @end deftypefun @anchor{mv_resize} @@ -897,8 +891,8 @@ width. These functions inspect the properties and contents of @struct{missing_values} objects. 
-The first set of functions inspects the discrete values that numeric -and short string sets of user-missing values may contain: +The first set of functions inspects the discrete values that sets of +user-missing values may contain: @deftypefun bool mv_is_empty (const struct missing_values *@var{mv}) Returns true if @var{mv} contains no user-missing values, false if it @@ -923,11 +917,12 @@ values, that is, if @func{mv_n_values} would return nonzero for @var{mv}. @end deftypefun -@deftypefun void mv_get_value (const struct missing_values *@var{mv}, union value *@var{value}, int @var{index}) -Copies the discrete user-missing value in @var{mv} with the given -@var{index} into @var{value}. The index must be less than the number -of discrete user-missing values in @var{mv}, as reported by -@func{mv_n_values}. +@deftypefun {const union value *} mv_get_value (const struct missing_values *@var{mv}, int @var{index}) +Returns the discrete user-missing value in @var{mv} with the given +@var{index}. The caller must not modify or free the returned value or +refer to it after modifying or freeing @var{mv}. The index must be +less than the number of discrete user-missing values in @var{mv}, as +reported by @func{mv_n_values}. @end deftypefun The second set of functions inspects the single range of values that @@ -949,7 +944,7 @@ include a range. These functions modify the contents of @struct{missing_values} objects. -The first set of functions applies to all sets of user-missing values: +The next set of functions applies to all sets of user-missing values: @deftypefun bool mv_add_value (struct missing_values *@var{mv}, const union value *@var{value}) @deftypefunx bool mv_add_str (struct missing_values *@var{mv}, const char @var{value}[]) @@ -957,8 +952,8 @@ The first set of functions applies to all sets of user-missing values: Attempts to add the given discrete @var{value} to set of user-missing values @var{mv}. @var{value} must have the same width as @var{mv}. 
Returns true if @var{value} was successfully added, false if the set -could not accept any more discrete values. (Always returns false if -@var{mv} is a set of long string user-missing values.) +could not accept any more discrete values or if @var{value} is not an +acceptable user-missing value (see @func{mv_is_acceptable} below). These functions are equivalent, except for the form in which @var{value} is provided, so you may use whichever function is most @@ -970,10 +965,22 @@ Removes a discrete value from @var{mv} (which must contain at least one discrete value) and stores it in @var{value}. @end deftypefun -@deftypefun void mv_replace_value (struct missing_values *@var{mv}, const union value *@var{value}, int @var{index}) -Replaces the discrete value with the given @var{index} in @var{mv} -(which must contain at least @var{index} + 1 discrete values) with -@var{value}. +@deftypefun bool mv_replace_value (struct missing_values *@var{mv}, const union value *@var{value}, int @var{index}) +Attempts to replace the discrete value with the given @var{index} in +@var{mv} (which must contain at least @var{index} + 1 discrete values) +by @var{value}. Returns true if successful, false if @var{value} is +not an acceptable user-missing value (see @func{mv_is_acceptable} +below). +@end deftypefun + +@deftypefun bool mv_is_acceptable (const union value *@var{value}, int @var{width}) +@anchor{mv_is_acceptable} +Returns true if @var{value}, which must have the specified +@var{width}, may be added to a missing value set of the same +@var{width}, false if it cannot. As described above, all numeric +values and string values of width @code{MV_MAX_STRING} or less may be +added, but string values of greater width may be added only if bytes +beyond the first @code{MV_MAX_STRING} are all spaces. @end deftypefun The second set of functions applies only to numeric sets of @@ -1298,16 +1305,6 @@ Returns true if @var{var} is an alphanumeric (string) variable, false otherwise. 
@end deftypefun -@deftypefun bool var_is_short_string (const struct variable *@var{var}) -Returns true if @var{var} is a string variable of width -@code{MAX_SHORT_STRING} or less, false otherwise. -@end deftypefun - -@deftypefun bool var_is_long_string (const struct variable *@var{var}) -Returns true if @var{var} is a string variable of width greater than -@code{MAX_SHORT_STRING}, false otherwise. -@end deftypefun - @node Variable Missing Values @subsection Variable Missing Values
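Note on the string rule that this patch documents for mv_is_acceptable: a string value wider than MV_MAX_STRING bytes is acceptable only if every byte past the first MV_MAX_STRING is a space. The following is a minimal, self-contained sketch of that rule only. It is not PSPP's implementation and does not use PSPP's union value. The names SKETCH_MV_MAX_STRING and sketch_string_is_acceptable are hypothetical, and the constant 8 is taken from the documented current value of MV_MAX_STRING.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors the documented current value of MV_MAX_STRING (8).
   This is a local stand-in, not PSPP's own macro. */
#define SKETCH_MV_MAX_STRING 8

/* Hypothetical helper, not PSPP's mv_is_acceptable().
   S is a string value of WIDTH bytes (not necessarily null-terminated).
   Returns true if S could be a user-missing value under the documented
   rule: bytes beyond the first SKETCH_MV_MAX_STRING must all be spaces.
   Strings of width SKETCH_MV_MAX_STRING or less are always acceptable. */
bool
sketch_string_is_acceptable (const char *s, size_t width)
{
  size_t i;

  /* The loop body never runs when WIDTH <= SKETCH_MV_MAX_STRING. */
  for (i = SKETCH_MV_MAX_STRING; i < width; i++)
    if (s[i] != ' ')
      return false;
  return true;
}
```

Under this sketch, a 12-byte value such as "ABCDEFGH" followed by four spaces would be accepted, while a 12-byte value with a non-space ninth byte would be rejected. Numeric values (width 0) are always acceptable per the text above and are not modeled here.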