From 7bb5c3c53e486798b1cb51fed0bcbb91e958b3d3 Mon Sep 17 00:00:00 2001
From: John Darrington
Date: Tue, 9 Dec 2008 14:30:55 +0900
Subject: [PATCH] Added a chapter to the developers' manual about i18n.

---
 doc/automake.mk      |   1 +
 doc/dev/i18n.texi    | 134 +++++++++++++++++++++++++++++++++++++++++++
 doc/pspp-dev.texinfo |   2 +
 3 files changed, 137 insertions(+)
 create mode 100644 doc/dev/i18n.texi

diff --git a/doc/automake.mk b/doc/automake.mk
index 14359e6d..c25e0cb3 100644
--- a/doc/automake.mk
+++ b/doc/automake.mk
@@ -33,6 +33,7 @@ doc_pspp_dev_TEXINFOS = doc/version-dev.texi \
 	doc/dev/concepts.texi \
 	doc/dev/syntax.texi \
 	doc/dev/data.texi \
+	doc/dev/i18n.texi \
 	doc/dev/output.texi \
 	doc/dev/system-file-format.texi \
 	doc/dev/portable-file-format.texi \
diff --git a/doc/dev/i18n.texi b/doc/dev/i18n.texi
new file mode 100644
index 00000000..836ff810
--- /dev/null
+++ b/doc/dev/i18n.texi
@@ -0,0 +1,134 @@
+@node Internationalisation
+@chapter Internationalisation
+
+Internationalisation in pspp is complicated.
+The most annoying aspect is that of character-encoding.
+Currently, pspp does not fully deal with the issues.
+This chapter attempts to describe the problems and current ways
+in which they are addressed.
+
+
+@section The working locales
+Pspp has three ``working'' locales:
+
+@itemize
+@item The locale of the user interface.
+@item The locale of the output.
+@item The locale of the data.
+@end itemize
+
+Each of these locales may, at different times, take
+separate (or identical) values.
+So for example, a French statistician can use pspp to prepare a report
+in the English language, using
+a datafile which has been created by a Japanese researcher and hence
+uses a Japanese character set.
+
+It's rarely, if ever, necessary to interrogate the system to find out
+the values of the three locales.
+However, it's important to be aware of the source (destination) locale
+when reading (writing) string data.
+When transferring data between a source and a destination, the appropriate
+recoding must be performed.
+
+
+@subsection The user interface locale
+This is the locale which is visible to the person using pspp.
+Error messages and confidence indications are written in this locale.
+For example ``Cannot open file'' will be written in the user interface locale.
+
+This locale is set from the environment of the user who starts pspp@{ire@} or
+from the system locale if the environment does not specify one.
+
+@subsection The output locale
+This locale is the one that should be visible to the person reading a
+report generated by pspp. Non-data related strings (Eg: ``Page number'',
+``Standard Deviation'' etc.) will appear in this locale.
+
+@subsection The data locale
+This locale is the one associated with the data being analysed with pspp.
+The only important aspect of this locale is the character encoding.
+@footnote {It might also be desirable for the LC_COLLATE category to be used for the purposes of sorting data.}
+Any string data stored in a @union{value} will be encoded in the character set
+of the data locale.
+
+The data locale defaults to the locale of the user who starts pspp@{ire@}.
+Spss has a @cmd{SET LOCALE} command (not currently supported in pspp) which
+can be used to specify the character encoding of the data locale.
+
+
+@section System files
+@file{*.sav} files contain a field which is supposed to identify the encoding
+of the data they contain (@pxref{Machine Integer Info Record}).
+This field is currently unused by Pspp.
+Probably, it would be appropriate to set the data locale from this field when
+reading a new data file, and set it back to the default value
+upon a @cmd{NEW FILE} command.
+However, many
+files produced by early versions of spss set this to ``2'' (ASCII) regardless
+of the encoding of the data.
+
+
+@section GUI
+The psppire graphical user interface is written using the Gtk+ api, for which
+all strings must be encoded in UTF-8.
+All strings passed to the Gtk+/Glib library functions must be UTF-8 encoded,
+otherwise errors will occur.
+Thus, for the purposes of programming psppire, the user interface locale
+should be assumed to be UTF-8, even if setlocale and/or nl_langinfo
+indicate otherwise.
+
+
+@section Existing locale handling functions
+The major aspect of locale handling which the programmer has to consider is
+that of character encoding.
+
+The following function is used to recode strings:
+
+@deftypefun {char *} recode_string (enum conv_id @var{how}, const char *@var{text}, int @var{len});
+Converts the string @var{text} to a new encoding according to @var{how}.
+@var{how} can (currently) take the values @code{CONV_PSPP_TO_UTF8}, @code{CONV_SYSTEM_TO_PSPP} or @code{CONV_UTF8_TO_PSPP} @footnote{The label ``_PSPP'' ought to be changed to ``_DATA''}.
+If @var{len} is not -1, then it must be the number of bytes in @var{text}.
+It is the caller's responsibility to free the returned string when no
+longer required.
+@end deftypefun
+
+
+For example, in order to display a string variable's value in a text entry widget in the psppire gui one would use code similar to
+@example
+
+struct variable *var = /* assigned from somewhere */;
+struct ccase c = /* from somewhere else */;
+
+const union value *val = case_data (&c, var);
+
+char *utf8string = recode_string (CONV_PSPP_TO_UTF8, val->s,
+                                  var_get_width (var));
+
+GtkWidget *entry = gtk_entry_new ();
+gtk_entry_set_text (GTK_ENTRY (entry), utf8string);
+gtk_widget_show (entry);
+
+free (utf8string);
+
+@end example
+
+
+
+@section Quirks
+For historical reasons, not all locale handling follows POSIX conventions.
+This makes it difficult (impossible?) to elegantly handle the issues.
+For example, it would make sense for the gui's datasheet to display
+numbers formatted according to the LC_NUMERIC category of the data locale.
+Instead, however, there is the @func{data_out} function
+(@pxref{Obtaining Properties of Format Types}) which uses the
+@func{settings_get_decimal_char} function instead of the decimal separator
+of the locale. Similarly, monetary values are formatted
+in a pspp/spss specific fashion instead of using the LC_MONETARY category.
+
+
+
+@c LocalWords: pspp itemize Eg LC Spss cmd sav pxref spss GUI psppire Gtk api
+@c LocalWords: UTF gtk setlocale nl langinfo deftypefun enum conv var const
+@c LocalWords: int len gui struct val utf GtkWidget posix gui's datasheet
+@c LocalWords: func
diff --git a/doc/pspp-dev.texinfo b/doc/pspp-dev.texinfo
index c2507412..4b92654d 100644
--- a/doc/pspp-dev.texinfo
+++ b/doc/pspp-dev.texinfo
@@ -79,6 +79,7 @@ modify this GNU manual.''
 * Parsing Command Syntax:: How to parse command syntax.
 * Processing Data:: Data input, output, and processing.
 * Presenting Output:: Producing machine- and human-readable output.
+* Internationalisation:: Dealing with locale issues.
 
 * Function Index:: Index of PSPP functions.
 * Concept Index:: Index of concepts.
@@ -95,6 +96,7 @@ modify this GNU manual.''
 @include dev/syntax.texi
 @include dev/data.texi
 @include dev/output.texi
+@include dev/i18n.texi
 
 @include function-index.texi
 @include concept-index.texi
-- 
2.30.2