X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=blobdiff_plain;f=doc%2Fdev%2Fi18n.texi;h=3ab86c3d2ec349fb85267ac934bb7c721d9eb304;hb=dc4da1f8120bddad12c1326714438f05b594e6e1;hp=836ff810a963b162b5428fee2645ba0e6941090f;hpb=7fbfc32fc3c636959b0a25b3e76609f86519e84a;p=pspp-builds.git

diff --git a/doc/dev/i18n.texi b/doc/dev/i18n.texi
index 836ff810..3ab86c3d 100644
--- a/doc/dev/i18n.texi
+++ b/doc/dev/i18n.texi
@@ -3,7 +3,6 @@
 Internationalisation in pspp is complicated.
 The most annoying aspect is that of character-encoding.
-Currently, pspp does not fully deal with the issues.
 This chapter attempts to describe the problems and current ways
 in which they are addressed.
@@ -12,9 +11,9 @@ in which they are addressed.
 Pspp has three ``working'' locales:
 @itemize
-@item The local of the user interface.
-@item The local of the output.
-@item The local of the data.
+@item The locale of the user interface.
+@item The locale of the output.
+@item The locale of the data. Only the character encoding is relevant.
 @end itemize
 Each of these locales may, at different times take
@@ -49,35 +48,45 @@ report generated by pspp. Non-data related strings (Eg: ``Page number'',
 This locale is the one associated with the data being analysed with pspp.
 The only important aspect of this locale is the character encoding.
 @footnote {It might also be desirable for the LC_COLLATE category to be used for the purposes of sorting data.}
-Any string data stored in a @union{value} will be encoded in the character set
-of the data locale.
-
-The data locale defaults to the locale of the user who starts pspp@{ire@}.
-Spss has a @cmd{SET LOCALE} command (not currently supported in pspp) which
-can be used to specify the character encoding of the data locale.
+The dictionary pertaining to the data contains a field denoting the encoding.
+Any string data stored in a @union{value} will be encoded in the
+dictionary's character set.
 @section System files
 @file{*.sav} files contain a field which is supposed to identify the encoding
 of the data they contain (@pxref{Machine Integer Info Record}).
-This field is currently unused by Pspp.
-Probably, would be appropriate to set the data locale from this field when
-reading a new data file, and set it back to the default value
-upon a @cmd{NEW FILE} command.
 However, many files produced by early versions of spss set this to ``2'' (ASCII) regardless
 of the encoding of the data.
+Later versions contain an additional
+record (@pxref{Character Encoding Record}) describing the encoding.
+When a system file is read, the dictionary's encoding is set using information
+gleened from the system file.
+If the encoding cannot be determined or would be unreliable, then it
+remains unset.
 @section GUI
 The psppire graphic user interface is written using the Gtk+ api, for which
 all strings must be encoded in UTF8.
-All strings passed to the Gtk+/Glib library functions must be UTF-8 encoded
-otherwise errors will occur.
+All strings passed to the GTK+/GLib library functions (except for filenames)
+must be UTF-8 encoded otherwise errors will occur.
 Thus, for the purposes of the programming psppire, the user interface locale
 should be assumed to be UTF8, even if setlocale and/or nl_langinfo
 indicates otherwise.
+@subsection Filenames
+The GLib API has some special functions for dealing with filenames.
+Strings returned from functions like gtk_file_chooser_dialog_get_name are not,
+in general, encoded in UTF8, but in ``filename'' encoding.
+If that filename is passed to another GLib function which expects a filename,
+no conversion is necessary.
+If it's passed to a function for the purposes of displaying it (eg. in a
+window's title-bar) it must be converted to UTF8 --- there is a special
+function for this: g_filename_display_name or g_filename_basename.
+If however, a filename needs to be passed outside of GTK+/GLib (for example to fopen) it must be converted to the local system encoding.
+
 @section Existing locale handling functions
 The major aspect of locale handling which the programmer has to consider is
@@ -85,33 +94,28 @@ that of character encoding.
 The following function is used to recode strings:
-@deftypefun char * recode_string (enum conv_id @var{how}, const char *@var{text}, int @var{len});
-Converts the string @var{text} to a new encoding according to @var{how}.
-@var{How} can (currently) take the values @code{CONV_PSPP_TO_UTF8}, @code{CONV_SYSTEM_TO_PSPP} or @code{CONV_UTF8_TO_PSPP} @footnote{The label ``_PSPP'' ought to be changed to ``_DATA''}.
+@deftypefun char * recode_string (const char *@var{to}, const char *@var{from}, const char *@var{text}, int @var{len});
+
+Converts the string @var{text}, which is encoded in @var{from} to a new string encoded in @var{to} encoding.
 If @var{len} is not -1, then it must be the number of bytes in @var{text}.
 It is the caller's responsibility to free the returned string when no
 longer required.
 @end deftypefun
+In order to minimise the number of conversions required, and to simplify
+design, PSPP attempts to store all internal strings in UTF8 encoding.
+Thus, when reading system and portable files (or any other data source),
+the following items are immediately converted to UTF8 encoding:
+@itemize
+@item Variable names
+@item Variable labels
+@item Value labels
+@end itemize
+Conversely, when writing system files, these are converted back to the
+encoding of that system file.
-For example, in order to display a string variable's value in a label widget in the psppire gui one would use code similar to
-@example
-
-struct variable *var = /* assigned from somewhere */
-struct case c = /* from somewhere else */
-
-const union value *val = case_data (&c, var);
-
-char *utf8string = recode_string (CONV_PSPP_TO_UTF8, val->s,
-                                  var_get_width (var));
-
-GtkWidget *entry = gtk_entry_new();
-gtk_entry_set_text (entry, utf8string);
-gtk_widget_show (entry);
-
-free (utf8string);
-
-@end example
+String data stored in union values are left in their original encoding.
+These will be converted by the data_in/data_out functions.