From dc4da1f8120bddad12c1326714438f05b594e6e1 Mon Sep 17 00:00:00 2001
From: John Darrington
Date: Sun, 12 Jul 2009 22:50:11 +0800
Subject: [PATCH] Updated the developers' manual to reflect the new situation

---
 doc/dev/concepts.texi | 30 +++++++++++------------------
 doc/dev/i18n.texi     | 32 +++++++++++++-------------------
 2 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/doc/dev/concepts.texi b/doc/dev/concepts.texi
index cc6e7522..06652d62 100644
--- a/doc/dev/concepts.texi
+++ b/doc/dev/concepts.texi
@@ -654,19 +654,17 @@ Returns the name of the given format @var{type}.
 These functions provide the ability to convert data fields into
 @union{value}s and vice versa.
 
-@deftypefun bool data_in (struct substring @var{input}, enum legacy_encoding @var{legacy_encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, union value *@var{output}, int @var{width})
+@deftypefun bool data_in (struct substring @var{input}, const char *@var{encoding}, enum fmt_type @var{type}, int @var{implied_decimals}, int @var{first_column}, const struct dictionary *@var{dict}, union value *@var{output}, int @var{width})
 Parses @var{input} as a field containing data in the given format
 @var{type}. The resulting value is stored in @var{output}, which
 the caller must have initialized with the given @var{width}. For
 consistency, @var{width} must be 0 if @var{type} is a numeric
 format type and greater than 0 if @var{type} is a string format
 type.
-
-Ordinarily @var{legacy_encoding} should be @code{LEGACY_NATIVE},
-indicating that @var{input} is encoded in the character set
-conventionally used on the host machine. It may be set to
-@code{LEGACY_EBCDIC} to cause @var{input} to be re-encoded from EBCDIC
-during data parsing.
+@var{encoding} should be set to indicate the character
+encoding of @var{input}.
+@var{dict} must be a pointer to the dictionary with which @var{output}
+is associated.
 
 If @var{input} is the empty string (with length 0), @var{output} is
 set to the value set on SET BLANKS (@pxref{SET BLANKS,,,pspp, PSPP
@@ -701,21 +699,15 @@ not propagated to the caller as errors.
 This function is declared in @file{data/data-in.h}.
 @end deftypefun
 
-@deftypefun void data_out (const union value *@var{input}, const struct fmt_spec *@var{format}, char *@var{output})
-@deftypefunx void data_out_legacy (const union value *@var{input}, enum legacy_encoding @var{legacy_encoding}, const struct fmt_spec *@var{format}, char *@var{output})
-Converts the data pointed to by @var{input} into a data field in
-@var{output} according to output format specifier @var{format}, which
-must be a valid output format. Exactly @code{@var{format}->w} bytes
-are written to @var{output}. The width of @var{input} is also
+@deftypefun char * data_out (const union value *@var{input}, const struct fmt_spec *@var{format})
+@deftypefunx char * data_out_legacy (const union value *@var{input}, const char *@var{encoding}, const struct fmt_spec *@var{format})
+Converts the data pointed to by @var{input} into a string value, which
+will be encoded in UTF-8, according to output format specifier @var{format}.
+Format
+must be a valid output format. The width of @var{input} is
 inferred from @var{format} using an algorithm equivalent to
 @func{fmt_var_width}.
 
-If @func{data_out} is called, or @func{data_out_legacy} is called with
-@var{legacy_encoding} set to @code{LEGACY_NATIVE}, @var{output} will
-be encoded in the character set conventionally used on the host
-machine. If @var{legacy_encoding} is set to @code{LEGACY_EBCDIC},
-@var{output} will be re-encoded from EBCDIC during data output.
-
 When @var{input} contains data that cannot be represented in the given
 @var{format}, @func{data_out} may output a message using @func{msg},
 @c (@pxref{msg}),
diff --git a/doc/dev/i18n.texi b/doc/dev/i18n.texi
index 97077d34..3ab86c3d 100644
--- a/doc/dev/i18n.texi
+++ b/doc/dev/i18n.texi
@@ -53,7 +53,6 @@ Any string data stored in a @union{value} will be encoded in the
 dictionary's character set.
 
 
-
 @section System files
 @file{*.sav} files contain a field which is supposed to identify the encoding
 of the data they contain (@pxref{Machine Integer Info Record}).
@@ -103,25 +102,20 @@ It is the caller's responsibility to free the returned string when no
 longer required.
 @end deftypefun
 
+In order to minimise the number of conversions required, and to simplify
+design, PSPP attempts to store all internal strings in UTF8 encoding.
+Thus, when reading system and portable files (or any other data source),
+the following items are immediately converted to UTF8 encoding:
+@itemize
+@item Variable names
+@item Variable labels
+@item Value labels
+@end itemize
+Conversely, when writing system files, these are converted back to the
+encoding of that system file.
 
-For example, in order to display a string variable's value in a label widget in the psppire gui one would use code similar to
-@example
-
-struct variable *var = /* assigned from somewhere */
-struct case c = /* from somewhere else */
-
-const union value *val = case_data (&c, var);
-
-char *utf8string = recode_string (UTF8, dict_get_encoding (dict), val->s,
-                                  var_get_width (var));
-
-GtkWidget *entry = gtk_entry_new();
-gtk_entry_set_text (entry, utf8string);
-gtk_widget_show (entry);
-
-free (utf8string);
-
-@end example
+String data stored in union values are left in their original encoding.
+These will be converted by the data_in/data_out functions.
 
 
 
-- 
2.30.2