From: John Darrington
Date: Thu, 2 Apr 2009 02:53:44 +0000 (+0800)
Subject: Update pspp developer's guide with new i18n changes.
X-Git-Tag: v0.7.3~181
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?p=pspp-builds.git;a=commitdiff_plain;h=d12513bc5a48e3b80476a8d23a6936665f17fa3d

Update pspp developer's guide with new i18n changes.
---

diff --git a/doc/dev/i18n.texi b/doc/dev/i18n.texi
index 836ff810..039e32b2 100644
--- a/doc/dev/i18n.texi
+++ b/doc/dev/i18n.texi
@@ -3,7 +3,6 @@
 
 Internationalisation in pspp is complicated.
 The most annoying aspect is that of character-encoding.
-Currently, pspp does not fully deal with the issues.
 This chapter attempts to describe the problems and current ways
 in which they are addressed.
 
@@ -14,7 +13,7 @@ Pspp has three ``working'' locales:
 @itemize
 @item The local of the user interface.
 @item The local of the output.
-@item The local of the data.
+@item The local of the data. Only the character encoding is relevant.
 @end itemize
 
 Each of these locales may, at different times take
@@ -49,35 +48,46 @@ report generated by pspp. Non-data related strings (Eg: ``Page number'',
 This locale is the one associated with the data being analysed with pspp.
 The only important aspect of this locale is the character encoding.
 @footnote {It might also be desirable for the LC_COLLATE category to be used for the purposes of sorting data.}
-Any string data stored in a @union{value} will be encoded in the character set
-of the data locale.
+The dictionary pertaining to the data contains a field denoting the encoding.
+Any string data stored in a @union{value} will be encoded in the
+dictionary's character set.
 
-The data locale defaults to the locale of the user who starts pspp@{ire@}.
-Spss has a @cmd{SET LOCALE} command (not currently supported in pspp) which
-can be used to specify the character encoding of the data locale.
 
 
 @section System files
 @file{*.sav} files contain a field which is supposed to identify the encoding
 of the data they contain (@pxref{Machine Integer Info Record}).
-This field is currently unused by Pspp.
-Probably, would be appropriate to set the data locale from this field when
-reading a new data file, and set it back to the default value
-upon a @cmd{NEW FILE} command.
 However, many
 files produced by early versions of spss set this to ``2'' (ASCII) regardless
 of the encoding of the data.
+Later versions contain an additional
+record (@pxref{Character Encoding Record}) describing the encoding.
+When a system file is read, the dictionary's encoding is set using information
+gleaned from the system file.
+If the encoding cannot be determined or would be unreliable, then it
+remains unset.
 
 
 @section GUI
 The psppire graphic user interface is written using the Gtk+ api, for which
 all strings must be encoded in UTF8.
-All strings passed to the Gtk+/Glib library functions must be UTF-8 encoded
-otherwise errors will occur.
+All strings passed to the Gtk+/Glib library functions (except for filenames)
+must be UTF-8 encoded otherwise errors will occur.
 Thus, for the purposes of the programming psppire, the user interface locale
 should be assumed to be UTF8, even if setlocale and/or nl_langinfo
 indicates otherwise.
 
+@subsection Filenames
+The GLib API has some special functions for dealing with filenames.
+Strings returned from functions like gtk_file_chooser_dialog_get_name are not,
+in general, encoded in UTF8, but in ``filename'' encoding.
+If that filename is passed to another Glib function which expects a filename,
+no conversion is necessary.
+If it's passed to a function for the purposes of displaying it (eg. in a
+window's title-bar) it must be converted to UTF8 --- there is a special
+function for this: g_filename_display_name or g_filename_basename.
+If, however, a filename needs to be passed outside of Gtk/Glib (for example to fopen) it must be converted to the local system encoding.
+
 
 @section Existing locale handling functions
 The major aspect of locale handling which the programmer has to consider is
@@ -85,9 +95,9 @@ that of character encoding.
 
 The following function is used to recode strings:
 
-@deftypefun char * recode_string (enum conv_id @var{how}, const char *@var{text}, int @var{len});
-Converts the string @var{text} to a new encoding according to @var{how}.
-@var{How} can (currently) take the values @code{CONV_PSPP_TO_UTF8}, @code{CONV_SYSTEM_TO_PSPP} or @code{CONV_UTF8_TO_PSPP} @footnote{The label ``_PSPP'' ought to be changed to ``_DATA''}.
+@deftypefun char * recode_string (const char *@var{to}, const char *@var{from}, const char *@var{text}, int @var{len});
+
+Converts the string @var{text}, which is encoded in @var{from} to a new string encoded in @var{to} encoding.
 If @var{len} is not -1, then it must be the number of bytes in @var{text}.
 It is the caller's responsibility to free the returned string when no
 longer required.
@@ -102,7 +112,7 @@ struct case c = /* from somewhere else */
 
 const union value *val = case_data (&c, var);
 
-char *utf8string = recode_string (CONV_PSPP_TO_UTF8, val->s,
+char *utf8string = recode_string (UTF8, dict_get_encoding (dict), val->s,
 var_get_width (var));
 
 GtkWidget *entry = gtk_entry_new();
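
Note on the new recode_string signature: the contract described in the patch (convert @var{text} from encoding @var{from} to encoding @var{to}, treat a @var{len} of -1 as "NUL-terminated", caller frees the result) can be sketched on top of POSIX iconv. The sketch below is an illustration only and is not taken from the PSPP sources. The name recode_string_sketch, the buffer sizing rule, and the choice to return NULL on any conversion error are all assumptions made for this example.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for recode_string (to, from, text, len).
   Illustrative only; this is not the PSPP implementation. */
char *
recode_string_sketch (const char *to, const char *from,
                      const char *text, int len)
{
  assert (to != NULL && from != NULL && text != NULL);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;                /* unknown or unsupported encoding pair */

  /* A len of -1 means TEXT is NUL-terminated; otherwise it is a byte count. */
  size_t in_left = (len == -1) ? strlen (text) : (size_t) len;

  /* Assumption: four output bytes per input byte plus some slack is enough
     for common encodings.  If it is not, iconv fails with E2BIG and this
     sketch returns NULL rather than truncating. */
  size_t cap = in_left * 4 + 16;
  char *out = malloc (cap + 4); /* 4 spare bytes for the terminator */
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_p = (char *) text;   /* iconv wants char **; input is not modified */
  char *out_p = out;
  size_t out_left = cap;

  size_t rc = iconv (cd, &in_p, &in_left, &out_p, &out_left);
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &out_p, &out_left); /* flush shift state */
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;              /* invalid input or output buffer too small */
    }

  memset (out_p, 0, 4);         /* terminate, wide enough for UTF-32 too */
  return out;                   /* caller frees, as with recode_string */
}
```

Under these assumptions, recode_string_sketch ("UTF-8", "ISO-8859-1", "caf\xe9", -1) yields the UTF-8 bytes for "café", and the caller releases the result with free. Passing explicit encoding names, rather than an enum of fixed conversions, is what lets the call site in the last hunk take the source encoding from the dictionary.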