These functions provide the ability to convert data fields into
@union{value}s and vice versa.
-@deftypefun bool data_in (struct substring @var{input}, enum legacy_encoding @var{legacy_encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, union value *@var{output}, int @var{width})
+@deftypefun bool data_in (struct substring @var{input}, const char *@var{encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, const struct dictionary *@var{dict}, union value *@var{output}, int @var{width})
Parses @var{input} as a field containing data in the given format
@var{type}. The resulting value is stored in @var{output}, which the
caller must have initialized with the given @var{width}. For
consistency, @var{width} must be 0 if
@var{type} is a numeric format type and greater than 0 if @var{type}
is a string format type.
-
-Ordinarily @var{legacy_encoding} should be @code{LEGACY_NATIVE},
-indicating that @var{input} is encoded in the character set
-conventionally used on the host machine. It may be set to
-@code{LEGACY_EBCDIC} to cause @var{input} to be re-encoded from EBCDIC
-during data parsing.
+@var{encoding} should be set to indicate the character
+encoding of @var{input}.
+@var{dict} must be a pointer to the dictionary with which @var{output}
+is associated.
If @var{input} is the empty string (with length 0), @var{output} is
set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP
This function is declared in @file{data/data-in.h}.
@end deftypefun
-@deftypefun void data_out (const union value *@var{input}, const struct fmt_spec *@var{format}, char *@var{output})
-@deftypefunx void data_out_legacy (const union value *@var{input}, enum legacy_encoding @var{legacy_encoding}, const struct fmt_spec *@var{format}, char *@var{output})
-Converts the data pointed to by @var{input} into a data field in
-@var{output} according to output format specifier @var{format}, which
-must be a valid output format. Exactly @code{@var{format}->w} bytes
-are written to @var{output}. The width of @var{input} is also
+@deftypefun char * data_out (const union value *@var{input}, const struct fmt_spec *@var{format})
+@deftypefunx char * data_out_legacy (const union value *@var{input}, const char *@var{encoding}, const struct fmt_spec *@var{format})
+Converts the data pointed to by @var{input} into a string value, which
+will be encoded in UTF-8, according to output format specifier @var{format}.
+Format
+must be a valid output format. The width of @var{input} is
inferred from @var{format} using an algorithm equivalent to
@func{fmt_var_width}.
-If @func{data_out} is called, or @func{data_out_legacy} is called with
-@var{legacy_encoding} set to @code{LEGACY_NATIVE}, @var{output} will
-be encoded in the character set conventionally used on the host
-machine. If @var{legacy_encoding} is set to @code{LEGACY_EBCDIC},
-@var{output} will be re-encoded from EBCDIC during data output.
-
When @var{input} contains data that cannot be represented in the given
@var{format}, @func{data_out} may output a message using @func{msg},
@c (@pxref{msg}),
dictionary's character set.
-
@section System files
@file{*.sav} files contain a field which is supposed to identify the encoding
of the data they contain (@pxref{Machine Integer Info Record}).
longer required.
@end deftypefun
+In order to minimise the number of conversions required, and to simplify
+design, PSPP attempts to store all internal strings in UTF8 encoding.
+Thus, when reading system and portable files (or any other data source),
+the following items are immediately converted to UTF8 encoding:
+@itemize
+@item Variable names
+@item Variable labels
+@item Value labels
+@end itemize
+Conversely, when writing system files, these are converted back to the
+encoding of that system file.
-For example, in order to display a string variable's value in a label widget in the psppire gui one would use code similar to
-@example
-
-struct variable *var = /* assigned from somewhere */
-struct case c = /* from somewhere else */
-
-const union value *val = case_data (&c, var);
-
-char *utf8string = recode_string (UTF8, dict_get_encoding (dict), val->s,
- var_get_width (var));
-
-GtkWidget *entry = gtk_entry_new();
-gtk_entry_set_text (entry, utf8string);
-gtk_widget_show (entry);
-
-free (utf8string);
-
-@end example
+String data stored in union values are left in their original encoding.
+These will be converted by the data_in/data_out functions.