From: John Darrington Date: Wed, 8 Apr 2009 01:14:57 +0000 (+0800) Subject: Merge commit 'origin/master' into charset X-Git-Tag: v0.7.3~176 X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=537fdeb3702c011e05d7826a8d556a7beeba2605;hp=7fbfc32fc3c636959b0a25b3e76609f86519e84a;p=pspp-builds.git Merge commit 'origin/master' into charset Conflicts: src/ui/gui/psppire-data-editor.c --- diff --git a/doc/data-io.texi b/doc/data-io.texi index 0b9583b0..b7bfda9c 100644 --- a/doc/data-io.texi +++ b/doc/data-io.texi @@ -178,7 +178,7 @@ situations. @display DATA LIST [FIXED] @{TABLE,NOTABLE@} - [FILE='file-name'] + [FILE='file-name' [ENCODING='encoding']] [RECORDS=record_count] [END=end_var] [SKIP=record_count] @@ -198,6 +198,8 @@ external file. It may be used to specify a file name as a string or a file handle (@pxref{File Handles}). If the FILE subcommand is not used, then input is assumed to be specified within the command file using @cmd{BEGIN DATA}@dots{}@cmd{END DATA} (@pxref{BEGIN DATA}). +The ENCODING subcommand may only be used if the FILE subcommand is also used. +It specifies the character encoding of the file. The optional RECORDS subcommand, which takes a single integer as an argument, is used to specify the number of lines per record. If RECORDS @@ -391,7 +393,7 @@ This example shows keywords abbreviated to their first 3 letters. DATA LIST FREE [(@{TAB,'c'@}, @dots{})] [@{NOTABLE,TABLE@}] - [FILE='file-name'] + [FILE='file-name' [ENCODING='encoding']] [SKIP=record_cnt] /var_spec@dots{} @@ -443,7 +445,7 @@ on field width apply, but they are honored on output. DATA LIST LIST [(@{TAB,'c'@}, @dots{})] [@{NOTABLE,TABLE@}] - [FILE='file-name'] + [FILE='file-name' [ENCODING='encoding']] [SKIP=record_count] /var_spec@dots{} diff --git a/doc/dev/i18n.texi b/doc/dev/i18n.texi index 836ff810..039e32b2 100644 --- a/doc/dev/i18n.texi +++ b/doc/dev/i18n.texi @@ -3,7 +3,6 @@ Internationalisation in pspp is complicated. 
The most annoying aspect is that of character-encoding. -Currently, pspp does not fully deal with the issues. This chapter attempts to describe the problems and current ways in which they are addressed. @@ -14,7 +13,7 @@ Pspp has three ``working'' locales: @itemize @item The local of the user interface. @item The local of the output. -@item The local of the data. +@item The local of the data. Only the character encoding is relevant. @end itemize Each of these locales may, at different times take @@ -49,35 +48,46 @@ report generated by pspp. Non-data related strings (Eg: ``Page number'', This locale is the one associated with the data being analysed with pspp. The only important aspect of this locale is the character encoding. @footnote {It might also be desirable for the LC_COLLATE category to be used for the purposes of sorting data.} -Any string data stored in a @union{value} will be encoded in the character set -of the data locale. +The dictionary pertaining to the data contains a field denoting the encoding. +Any string data stored in a @union{value} will be encoded in the +dictionary's character set. -The data locale defaults to the locale of the user who starts pspp@{ire@}. -Spss has a @cmd{SET LOCALE} command (not currently supported in pspp) which -can be used to specify the character encoding of the data locale. @section System files @file{*.sav} files contain a field which is supposed to identify the encoding of the data they contain (@pxref{Machine Integer Info Record}). -This field is currently unused by Pspp. -Probably, would be appropriate to set the data locale from this field when -reading a new data file, and set it back to the default value -upon a @cmd{NEW FILE} command. However, many files produced by early versions of spss set this to ``2'' (ASCII) regardless of the encoding of the data. +Later versions contain an additional +record (@pxref{Character Encoding Record}) describing the encoding. 
+When a system file is read, the dictionary's encoding is set using information +gleaned from the system file. +If the encoding cannot be determined or would be unreliable, then it +remains unset. @section GUI The psppire graphic user interface is written using the Gtk+ api, for which all strings must be encoded in UTF8. -All strings passed to the Gtk+/Glib library functions must be UTF-8 encoded -otherwise errors will occur. +All strings passed to the Gtk+/Glib library functions (except for filenames) +must be UTF-8 encoded otherwise errors will occur. Thus, for the purposes of the programming psppire, the user interface locale should be assumed to be UTF8, even if setlocale and/or nl_langinfo indicates otherwise. +@subsection Filenames +The GLib API has some special functions for dealing with filenames. +Strings returned from functions like gtk_file_chooser_dialog_get_name are not, +in general, encoded in UTF8, but in ``filename'' encoding. +If that filename is passed to another Glib function which expects a filename, +no conversion is necessary. +If it's passed to a function for the purposes of displaying it (eg. in a +window's title-bar) it must be converted to UTF8 --- there is a special +function for this: g_filename_display_name or g_filename_basename. +If however, a filename needs to be passed outside of Gtk/Glib (for example to fopen) it must be converted to the local system encoding. + @section Existing locale handling functions The major aspect of locale handling which the programmer has to consider is @@ -85,9 +95,9 @@ that of character encoding. The following function is used to recode strings: -@deftypefun char * recode_string (enum conv_id @var{how}, const char *@var{text}, int @var{len}); -Converts the string @var{text} to a new encoding according to @var{how}. -@var{How} can (currently) take the values @code{CONV_PSPP_TO_UTF8}, @code{CONV_SYSTEM_TO_PSPP} or @code{CONV_UTF8_TO_PSPP} @footnote{The label ``_PSPP'' ought to be changed to ``_DATA''}. 
+@deftypefun char * recode_string (const char *@var{to}, const char *@var{from}, const char *@var{text}, int @var{len}); + +Converts the string @var{text}, which is encoded in @var{from} to a new string encoded in @var{to} encoding. If @var{len} is not -1, then it must be the number of bytes in @var{text}. It is the caller's responsibility to free the returned string when no longer required. @@ -102,7 +112,7 @@ struct case c = /* from somewhere else */ const union value *val = case_data (&c, var); -char *utf8string = recode_string (CONV_PSPP_TO_UTF8, val->s, +char *utf8string = recode_string (UTF8, dict_get_encoding (dict), val->s, var_get_width (var)); GtkWidget *entry = gtk_entry_new(); diff --git a/doc/dev/system-file-format.texi b/doc/dev/system-file-format.texi index 3e764c8c..164807b8 100644 --- a/doc/dev/system-file-format.texi +++ b/doc/dev/system-file-format.texi @@ -96,6 +96,7 @@ Each type of record is described separately below. * Variable Display Parameter Record:: * Long Variable Names Record:: * Very Long String Record:: +* Character Encoding Record:: * Data File and Variable Attributes Records:: * Miscellaneous Informational Records:: * Dictionary Termination Record:: @@ -546,9 +547,14 @@ Compression code. Always set to 1. Machine endianness. 1 indicates big-endian, 2 indicates little-endian. @item int32 character_code; +@anchor{character-code} Character code. 1 indicates EBCDIC, 2 indicates 7-bit ASCII, 3 indicates 8-bit ASCII, 4 indicates DEC Kanji. Windows code page numbers are also valid. + +Experience has shown that in many files, this field is ignored or incorrect. +For a more reliable indication of the file's character encoding +see @ref{Character Encoding Record}. @end table @node Machine Floating-Point Info Record @@ -792,6 +798,46 @@ After the last tuple, there may be a single byte 00, or @{00, 09@}. The total length is @code{count} bytes. 
@end table +@node Character Encoding Record +@section Character Encoding Record + +This record, if present, indicates the character encoding for string data, +long variable names, variable labels, value labels and other strings in the +file. + +@example +/* @r{Header.} */ +int32 rec_type; +int32 subtype; +int32 size; +int32 count; + +/* @r{Exactly @code{count} bytes of data.} */ +char encoding[]; +@end example + +@table @code +@item int32 rec_type; +Record type. Always set to 7. + +@item int32 subtype; +Record subtype. Always set to 20. + +@item int32 size; +The size of each element in the @code{encoding} member. Always set to 1. + +@item int32 count; +The total number of bytes in @code{encoding}. + +@item char encoding[]; +The name of the character encoding. Normally this will be an official IANA characterset name or alias. +See @url{http://www.iana.org/assignments/character-sets}. +@end table + +This record is not present in files generated by older software. +See also @ref{character-code}. + + @node Data File and Variable Attributes Records @section Data File and Variable Attributes Records diff --git a/doc/utilities.texi b/doc/utilities.texi index 95cfa25d..6882aa21 100644 --- a/doc/utilities.texi +++ b/doc/utilities.texi @@ -374,8 +374,10 @@ SET /COMPRESSION=@{ON,OFF@} /SCOMPRESSION=@{ON,OFF@} -(security) +(miscellaneous) /SAFER=ON + /LOCALE='string' + (obsolete settings accepted for compatibility, but ignored) /BOXSTRING=@{'xxx','xxxxxxxxxxx'@} @@ -701,6 +703,37 @@ Be aware that this setting does not guarantee safety (commands can still overwrite files, for instance) but it is an improvement. When set, this setting cannot be reset during the same session, for obvious security reasons. + +@item LOCALE +@cindex locale +@cindex encoding, characters +This item is used to set the default character encoding. +The encoding may be specified either as an encoding name or alias +(see @url{http://www.iana.org/assignments/character-sets}), or +as a locale name. 
+If given as a locale name, only the character encoding of the +locale is relevant. + +System files written by PSPP will use this encoding. +System files read by PSPP, for which the encoding is unknown, will be +interpreted using this encoding. + +The full list of valid encodings and locale names/alias are operating system +dependent. +The following are all examples of acceptable syntax on common GNU/Linux +systems. +@example + +SET LOCALE='iso-8859-1'. + +SET LOCALE='ru_RU.cp1251'. + +SET LOCALE='japanese'. + +@end example + +Contrary to the intuition, this command does not affect any aspect +of the system's locale. @end table @node SHOW diff --git a/src/data/dictionary.c b/src/data/dictionary.c index ba840898..b5b6cca5 100644 --- a/src/data/dictionary.c +++ b/src/data/dictionary.c @@ -63,6 +63,9 @@ struct dictionary struct vector **vector; /* Vectors of variables. */ size_t vector_cnt; /* Number of vectors. */ struct attrset attributes; /* Custom attributes. */ + + char *encoding; /* Character encoding of string data */ + const struct dict_callbacks *callbacks; /* Callbacks on dictionary modification */ void *cb_data ; /* Data passed to callbacks */ @@ -71,6 +74,21 @@ struct dictionary void *changed_data; }; + +void +dict_set_encoding (struct dictionary *d, const char *enc) +{ + if (enc) + d->encoding = strdup (enc); +} + +const char * +dict_get_encoding (const struct dictionary *d) +{ + return d->encoding ; +} + + void dict_set_change_callback (struct dictionary *d, void (*changed) (struct dictionary *, void*), @@ -194,6 +212,9 @@ dict_clone (const struct dictionary *s) for (i = 0; i < s->vector_cnt; i++) d->vector[i] = vector_clone (s->vector[i], s, d); + if ( s->encoding) + d->encoding = strdup (s->encoding); + dict_set_attributes (d, dict_get_attributes (s)); return d; diff --git a/src/data/dictionary.h b/src/data/dictionary.h index 18bf3f78..4efb953c 100644 --- a/src/data/dictionary.h +++ b/src/data/dictionary.h @@ -147,6 +147,11 @@ struct attrset 
*dict_get_attributes (const struct dictionary *); void dict_set_attributes (struct dictionary *, const struct attrset *); bool dict_has_attributes (const struct dictionary *); + +void dict_set_encoding (struct dictionary *d, const char *enc); +const char *dict_get_encoding (const struct dictionary *d); + + /* Functions to be called upon dictionary changes. */ struct dict_callbacks { diff --git a/src/data/gnumeric-reader.c b/src/data/gnumeric-reader.c index 5a0c75ed..f2f4e52f 100644 --- a/src/data/gnumeric-reader.c +++ b/src/data/gnumeric-reader.c @@ -314,11 +314,10 @@ static void convert_xml_string_to_value (struct ccase *c, const struct variable *var, const xmlChar *xv) { - char *text; int n_bytes = 0; union value *v = case_data_rw (c, var); - text = recode_string (CONV_UTF8_TO_PSPP, (const char *) xv, -1); + const char *text = (const char *) xv; if ( text) n_bytes = MIN (var_get_width (var), strlen (text)); @@ -335,8 +334,6 @@ convert_xml_string_to_value (struct ccase *c, const struct variable *var, if ( errno != 0 || endptr == text) v->f = SYSMIS; } - - free (text); } struct var_spec @@ -459,10 +456,8 @@ gnumeric_open_reader (struct gnumeric_read_info *gri, struct dictionary **dict) if ( r->node_type == XML_READER_TYPE_TEXT ) { - char *text ; xmlChar *value = xmlTextReaderValue (r->xtr); - - text = recode_string (CONV_UTF8_TO_PSPP, (const char *) value, -1); + const char *text = (const char *) value; if ( r->row < r->start_row) { @@ -481,7 +476,6 @@ gnumeric_open_reader (struct gnumeric_read_info *gri, struct dictionary **dict) } free (value); - free (text); } else if ( r->node_type == XML_READER_TYPE_ELEMENT && r->state == STATE_CELL) @@ -503,6 +497,8 @@ gnumeric_open_reader (struct gnumeric_read_info *gri, struct dictionary **dict) /* Create the dictionary and populate it */ *dict = r->dict = dict_create (); + dict_set_encoding (r->dict, (const char *) xmlTextReaderConstEncoding (r->xtr)); + r->value_cnt = 0; for (i = 0 ; i < n_var_specs ; ++i ) diff --git 
a/src/data/identifier.c b/src/data/identifier.c index a52944e2..37384b6d 100644 --- a/src/data/identifier.c +++ b/src/data/identifier.c @@ -35,7 +35,7 @@ bool lex_is_id1 (char c_) { unsigned char c = c_; - return isalpha (c) || c == '@' || c == '#' || c == '$'; + return isalpha (c) || c == '@' || c == '#' || c == '$' || c >= 128; } @@ -45,7 +45,7 @@ bool lex_is_idn (char c_) { unsigned char c = c_; - return lex_is_id1 (c) || isdigit (c) || c == '.' || c == '_'; + return lex_is_id1 (c) || isdigit (c) || c == '.' || c == '_' || c >= 128; } /* Returns the length of the longest prefix of STRING that forms diff --git a/src/data/procedure.c b/src/data/procedure.c index 32839e64..fbb9a757 100644 --- a/src/data/procedure.c +++ b/src/data/procedure.c @@ -36,6 +36,7 @@ #include #include #include +#include #include "xalloc.h" @@ -548,6 +549,8 @@ create_dataset (void) dict_set_change_callback (ds->dict, dict_callback, ds); + dict_set_encoding (ds->dict, get_default_encoding ()); + ds->caseinit = caseinit_create (); proc_cancel_all_transformations (ds); return ds; diff --git a/src/data/psql-reader.c b/src/data/psql-reader.c index be6d0a60..85e777a9 100644 --- a/src/data/psql-reader.c +++ b/src/data/psql-reader.c @@ -288,10 +288,22 @@ psql_open_reader (struct psql_read_info *info, struct dictionary **dict) /* Create the dictionary and populate it */ *dict = r->dict = dict_create (); + { + const int enc = PQclientEncoding (r->conn); + + /* According to section 22.2 of the Postgresql manual + a value of zero (SQL_ASCII) indicates + "a declaration of ignorance about the encoding". + Accordingly, we don't set the dictionary's encoding + if we find this value. 
+ */ + if ( enc != 0 ) + dict_set_encoding (r->dict, pg_encoding_to_char (enc)); + } + /* select count (*) from (select * from medium) stupid_sql_standard; */ - ds_init_cstr (&query, "BEGIN READ ONLY ISOLATION LEVEL SERIALIZABLE; " "DECLARE pspp BINARY CURSOR FOR "); diff --git a/src/data/settings.c b/src/data/settings.c index ad509fb0..f9c65fc8 100644 --- a/src/data/settings.c +++ b/src/data/settings.c @@ -22,9 +22,9 @@ #include "format.h" #include "value.h" #include "xalloc.h" -#include #include #include +#include #include "error.h" @@ -147,7 +147,6 @@ settings_init (int *width, int *length) { init_viewport (width, length); settings_set_epoch (-1); - i18n_init (); the_settings.styles = fmt_create (); settings_set_decimal_char (get_system_decimal ()); @@ -157,7 +156,6 @@ void settings_done (void) { fmt_done (the_settings.styles); - i18n_done (); } /* Returns the floating-point format used for RB and RBHEX diff --git a/src/data/sys-file-reader.c b/src/data/sys-file-reader.c index 84d7f83c..86607bcc 100644 --- a/src/data/sys-file-reader.c +++ b/src/data/sys-file-reader.c @@ -162,7 +162,9 @@ static void read_extension_record (struct sfm_reader *, struct dictionary *, struct sfm_read_info *); static void read_machine_integer_info (struct sfm_reader *, size_t size, size_t count, - struct sfm_read_info *); + struct sfm_read_info *, + struct dictionary * + ); static void read_machine_float_info (struct sfm_reader *, size_t size, size_t count); static void read_display_parameters (struct sfm_reader *, @@ -725,7 +727,7 @@ read_extension_record (struct sfm_reader *r, struct dictionary *dict, switch (subtype) { case 3: - read_machine_integer_info (r, size, count, info); + read_machine_integer_info (r, size, count, info, dict); return; case 4: @@ -778,7 +780,12 @@ read_extension_record (struct sfm_reader *r, struct dictionary *dict, case 20: /* New in SPSS 16. Contains a single string that describes the character encoding, e.g. "windows-1252". 
*/ - break; + { + char *encoding = calloc (size, count + 1); + read_string (r, encoding, count + 1); + dict_set_encoding (dict, encoding); + return; + } case 21: /* New in SPSS 16. Encodes value labels for long string @@ -799,7 +806,8 @@ read_extension_record (struct sfm_reader *r, struct dictionary *dict, /* Read record type 7, subtype 3. */ static void read_machine_integer_info (struct sfm_reader *r, size_t size, size_t count, - struct sfm_read_info *info) + struct sfm_read_info *info, + struct dictionary *dict) { int version_major = read_int (r); int version_minor = read_int (r); @@ -808,7 +816,7 @@ read_machine_integer_info (struct sfm_reader *r, size_t size, size_t count, int float_representation = read_int (r); int compression_code UNUSED = read_int (r); int integer_representation = read_int (r); - int character_code UNUSED = read_int (r); + int character_code = read_int (r); int expected_float_format; int expected_integer_format; @@ -853,6 +861,47 @@ read_machine_integer_info (struct sfm_reader *r, size_t size, size_t count, gettext (endian[integer_representation == 1]), gettext (endian[expected_integer_format == 1])); } + + + /* + Record 7 (20) provides a much more reliable way of + setting the encoding. + The character_code is used as a fallback only. + */ + if ( NULL == dict_get_encoding (dict)) + { + switch (character_code) + { + case 1: + dict_set_encoding (dict, "EBCDIC-US"); + break; + case 2: + case 3: + /* These ostensibly mean "7-bit ASCII" and "8-bit ASCII"[sic] + respectively. However, there are known to be many files + in the wild with character code 2, yet have data which are + clearly not ascii. + Therefore we ignore these values. 
+ */ + return; + case 4: + dict_set_encoding (dict, "MS_KANJI"); + break; + case 65000: + dict_set_encoding (dict, "UTF-7"); + break; + case 65001: + dict_set_encoding (dict, "UTF-8"); + break; + default: + { + char enc[100]; + snprintf (enc, 100, "CP%d", character_code); + dict_set_encoding (dict, enc); + } + break; + }; + } } /* Read record type 7, subtype 4. */ diff --git a/src/data/sys-file-writer.c b/src/data/sys-file-writer.c index 393be4e1..292ec9c5 100644 --- a/src/data/sys-file-writer.c +++ b/src/data/sys-file-writer.c @@ -103,6 +103,9 @@ static void write_float_info_record (struct sfm_writer *); static void write_longvar_table (struct sfm_writer *w, const struct dictionary *dict); +static void write_encoding_record (struct sfm_writer *w, + const struct dictionary *); + static void write_vls_length_table (struct sfm_writer *w, const struct dictionary *dict); @@ -246,6 +249,8 @@ sfm_open_writer (struct file_handle *fh, struct dictionary *d, write_data_file_attributes (w, d); write_variable_attributes (w, d); + write_encoding_record (w, d); + /* Write end-of-headers record. */ write_int (w, 999); write_int (w, 0); @@ -660,6 +665,24 @@ write_vls_length_table (struct sfm_writer *w, ds_destroy (&map); } + +static void +write_encoding_record (struct sfm_writer *w, + const struct dictionary *d) +{ + const char *enc = dict_get_encoding (d); + + if ( NULL == enc) + return; + + write_int (w, 7); /* Record type. */ + write_int (w, 20); /* Record subtype. */ + write_int (w, 1); /* Data item (char) size. */ + write_int (w, strlen (enc)); /* Number of data items. */ + write_string (w, enc, strlen (enc)); +} + + /* Writes the long variable name table. */ static void write_longvar_table (struct sfm_writer *w, const struct dictionary *dict) @@ -1019,7 +1042,7 @@ write_value (struct sfm_writer *w, const union value *value, int width) } /* Writes null-terminated STRING in a field of the given WIDTH to - W. If WIDTH is longer than WIDTH, it is truncated; if WIDTH + W. 
If STRING is longer than WIDTH, it is truncated; if WIDTH is narrowed, it is padded on the right with spaces. */ static void write_string (struct sfm_writer *w, const char *string, size_t width) diff --git a/src/language/data-io/combine-files.c b/src/language/data-io/combine-files.c index ccbe7679..1a82ef3f 100644 --- a/src/language/data-io/combine-files.c +++ b/src/language/data-io/combine-files.c @@ -488,12 +488,29 @@ merge_dictionary (struct dictionary *const m, struct comb_file *f) struct dictionary *d = f->dict; const char *d_docs, *m_docs; int i; + const char *file_encoding; if (dict_get_label (m) == NULL) dict_set_label (m, dict_get_label (d)); d_docs = dict_get_documents (d); m_docs = dict_get_documents (m); + + + /* If the input files have different encodings, then + */ + file_encoding = dict_get_encoding (f->dict); + if ( file_encoding != NULL) + { + if ( dict_get_encoding (m) == NULL) + dict_set_encoding (m, file_encoding); + else if ( 0 != strcmp (file_encoding, dict_get_encoding (m))) + { + msg (MW, + _("Combining files with incompatible encodings. String data may not be represented correctly.")); + } + } + if (d_docs != NULL) { if (m_docs == NULL) diff --git a/src/language/data-io/data-list.c b/src/language/data-io/data-list.c index d07eae5c..3b091404 100644 --- a/src/language/data-io/data-list.c +++ b/src/language/data-io/data-list.c @@ -75,8 +75,9 @@ cmd_data_list (struct lexer *lexer, struct dataset *ds) struct dictionary *dict; struct data_parser *parser; struct dfm_reader *reader; - struct variable *end; - struct file_handle *fh; + struct variable *end = NULL; + struct file_handle *fh = NULL; + struct string encoding = DS_EMPTY_INITIALIZER; int table; enum data_parser_type type; @@ -87,8 +88,6 @@ cmd_data_list (struct lexer *lexer, struct dataset *ds) dict = in_input_program () ? dataset_dict (ds) : dict_create (); parser = data_parser_create (); reader = NULL; - end = NULL; - fh = NULL; table = -1; /* Print table if nonzero, -1=undecided. 
*/ has_type = false; @@ -103,6 +102,16 @@ cmd_data_list (struct lexer *lexer, struct dataset *ds) if (fh == NULL) goto error; } + else if (lex_match_id (lexer, "ENCODING")) + { + lex_match (lexer, '='); + if (!lex_force_string (lexer)) + goto error; + + ds_init_string (&encoding, lex_tokstr (lexer)); + + lex_get (lexer); + } else if (lex_match_id (lexer, "RECORDS")) { lex_match (lexer, '='); @@ -228,6 +237,14 @@ cmd_data_list (struct lexer *lexer, struct dataset *ds) } type = data_parser_get_type (parser); + if (! ds_is_empty (&encoding)) + { + if ( NULL == fh) + msg (MW, _("Encoding should not be specified for inline data. It will be ignored.")); + else + dict_set_encoding (dict, ds_cstr (&encoding)); + } + if (fh == NULL) fh = fh_inline_file (); fh_set_default_handle (fh); diff --git a/src/language/dictionary/sys-file-info.c b/src/language/dictionary/sys-file-info.c index d6279159..6d033419 100644 --- a/src/language/dictionary/sys-file-info.c +++ b/src/language/dictionary/sys-file-info.c @@ -108,7 +108,7 @@ cmd_sysfile_info (struct lexer *lexer, struct dataset *ds UNUSED) } casereader_destroy (reader); - t = tab_create (2, 10, 0); + t = tab_create (2, 11, 0); tab_vline (t, TAL_GAP, 1, 0, 8); tab_text (t, 0, 0, TAB_LEFT, _("File:")); tab_text (t, 1, 0, TAB_LEFT, fh_get_file_name (h)); @@ -153,6 +153,13 @@ cmd_sysfile_info (struct lexer *lexer, struct dataset *ds UNUSED) tab_text (t, 0, 9, TAB_LEFT, _("Mode:")); tab_text (t, 1, 9, TAB_LEFT | TAT_PRINTF, _("Compression %s."), info.compressed ? _("on") : _("off")); + + + tab_text (t, 0, 10, TAB_LEFT, _("Charset:")); + tab_text (t, 1, 10, TAB_LEFT | TAT_PRINTF, + dict_get_encoding(d) ? 
dict_get_encoding(d) : _("Unknown")); + + tab_dim (t, tab_natural_dimensions); tab_submit (t); diff --git a/src/language/utilities/set.q b/src/language/utilities/set.q index 37388f92..e8cdf1aa 100644 --- a/src/language/utilities/set.q +++ b/src/language/utilities/set.q @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -86,6 +87,7 @@ int tgetnum (const char *); journal=custom; log=custom; length=custom; + locale=custom; listing=custom; lowres=lores:auto/on/off; lpi=integer "x>0" "%s must be greater than 0"; @@ -361,6 +363,41 @@ stc_custom_length (struct lexer *lexer, struct dataset *ds UNUSED, struct cmd_se return 1; } +static int +stc_custom_locale (struct lexer *lexer, struct dataset *ds UNUSED, + struct cmd_set *cmd UNUSED, void *aux UNUSED) +{ + const struct string *s; + + lex_match (lexer, '='); + + if ( !lex_force_string (lexer)) + return 0; + + s = lex_tokstr (lexer); + + lex_get (lexer); + + /* First try this string as an encoding name */ + if ( valid_encoding (ds_cstr (s))) + set_default_encoding (ds_cstr (s)); + + /* Now try as a locale name (or alias) */ + else if (set_encoding_from_locale (ds_cstr (s))) + { + } + else + { + msg (ME, _("%s is not a recognised encoding or locale name"), + ds_cstr (s)); + return 0; + } + + return 1; +} + + + static int stc_custom_seed (struct lexer *lexer, struct dataset *ds UNUSED, struct cmd_set *cmd UNUSED, void *aux UNUSED) { @@ -589,6 +626,12 @@ show_length (const struct dataset *ds UNUSED) msg (SN, _("LENGTH is %d."), settings_get_viewlength ()); } +static void +show_locale (const struct dataset *ds UNUSED) +{ + msg (SN, _("LOCALE is %s"), get_default_encoding ()); +} + static void show_mxerrs (const struct dataset *ds UNUSED) { @@ -744,6 +787,7 @@ const struct show_sbc show_table[] = {"ERRORS", show_errors}, {"FORMAT", show_format}, {"LENGTH", show_length}, + {"LOCALE", show_locale}, {"MXERRS", show_mxerrs}, {"MXLOOPS", show_mxloops}, {"MXWARNS", show_mxwarns}, diff --git 
a/src/libpspp/i18n.c b/src/libpspp/i18n.c index f8e5e396..c0459712 100644 --- a/src/libpspp/i18n.c +++ b/src/libpspp/i18n.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2006 Free Software Foundation, Inc. + Copyright (C) 2006, 2009 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -21,12 +21,17 @@ #include #include #include +#include #include #include #include "assertion.h" +#include "hmapx.h" +#include "hash-functions.h" #include "i18n.h" +#include "version.h" + #include #include "xstrndup.h" @@ -34,28 +39,46 @@ #include #endif - -static char *locale; -static char *charset; - - -static iconv_t convertor[n_CONV]; - +static char *default_encoding; +static struct hmapx map; /* A wrapper around iconv_open */ static iconv_t create_iconv (const char* tocode, const char* fromcode) { - iconv_t conv = iconv_open (tocode, fromcode); + iconv_t conv; + struct hmapx_node *node; + size_t hash ; + char *key = alloca (strlen (tocode) + strlen (fromcode) + 2); + + strcpy (key, tocode); + strcat (key, "\n"); /* hopefully no encoding names contain '\n' */ + strcat (key, fromcode); - /* I don't think it's safe to translate this string or to use messaging - as the convertors have not yet been set up */ - if ( (iconv_t) -1 == conv && 0 != strcmp (tocode, fromcode)) + hash = hsh_hash_string (key); + + node = hmapx_first_with_hash (&map, hash); + + if (!node) + { + conv = iconv_open (tocode, fromcode); + + /* I don't think it's safe to translate this string or to use messaging + as the convertors have not yet been set up */ + if ( (iconv_t) -1 == conv && 0 != strcmp (tocode, fromcode)) + { + const int err = errno; + fprintf (stderr, + "Warning: " + "cannot create a convertor for \"%s\" to \"%s\": %s\n", + fromcode, tocode, strerror (err)); + } + + hmapx_insert (&map, conv, hash); + } + else { - const int err = errno; - fprintf (stderr, 
- "Warning: cannot create a convertor for \"%s\" to \"%s\": %s\n", - fromcode, tocode, strerror (err)); + conv = hmapx_node_data (node); } return conv; @@ -66,7 +89,8 @@ create_iconv (const char* tocode, const char* fromcode) The returned string must be freed when no longer required. */ char * -recode_string (enum conv_id how, const char *text, int length) +recode_string (const char *to, const char *from, + const char *text, int length) { char *outbuf = 0; size_t outbufferlength; @@ -74,6 +98,7 @@ recode_string (enum conv_id how, const char *text, int length) char *op ; size_t inbytes = 0; size_t outbytes ; + iconv_t conv ; /* FIXME: Need to ensure that this char is valid in the target encoding */ const char fallbackchar = '?'; @@ -84,10 +109,12 @@ recode_string (enum conv_id how, const char *text, int length) if ( length == -1 ) length = strlen(text); - assert (how < n_CONV); - if (convertor[how] == (iconv_t) -1) - return xstrndup (text, length); + if (to == NULL) + to = default_encoding; + + if (from == NULL) + from = default_encoding; for ( outbufferlength = 1 ; outbufferlength != 0; outbufferlength <<= 1 ) if ( outbufferlength > length) @@ -99,9 +126,12 @@ recode_string (enum conv_id how, const char *text, int length) outbytes = outbufferlength; inbytes = length; + + conv = create_iconv (to, from); + do { const char *ip = text; - result = iconv (convertor[how], (ICONV_CONST char **) &text, &inbytes, + result = iconv (conv, (ICONV_CONST char **) &text, &inbytes, &op, &outbytes); if ( -1 == result ) @@ -152,79 +182,113 @@ recode_string (enum conv_id how, const char *text, int length) } -/* Returns the current PSPP locale */ +void +i18n_init (void) +{ +#if ENABLE_NLS + setlocale (LC_CTYPE, ""); +#if HAVE_LC_MESSAGES + setlocale (LC_MESSAGES, ""); +#endif +#if HAVE_LC_PAPER + setlocale (LC_PAPER, ""); +#endif + bindtextdomain (PACKAGE, locale_dir); + textdomain (PACKAGE); +#endif /* ENABLE_NLS */ + + assert (default_encoding == NULL); + default_encoding = strdup 
(locale_charset ()); + + hmapx_init (&map); +} + + const char * -get_pspp_locale (void) +get_default_encoding (void) { - assert (locale); - return locale; + return default_encoding; } -/* Set the PSPP locale */ void -set_pspp_locale (const char *l) +set_default_encoding (const char *enc) { - char *current_locale; - char *current_charset; + free (default_encoding); + default_encoding = strdup (enc); +} - free(locale); - locale = strdup(l); - current_locale = strdup (setlocale (LC_CTYPE, 0)); - current_charset = strdup (locale_charset ()); - setlocale (LC_CTYPE, locale); +/* Attempts to set the encoding from a locale name + returns true if successful. + This function does not (should not!) alter the current locale. +*/ +bool +set_encoding_from_locale (const char *loc) +{ + bool ok = true; + char *c_encoding; + char *loc_encoding; + char *tmp = strdup (setlocale (LC_CTYPE, NULL)); - free (charset); - charset = strdup (locale_charset ()); - setlocale (LC_CTYPE, current_locale); + setlocale (LC_CTYPE, "C"); + c_encoding = strdup (locale_charset ()); - iconv_close (convertor[CONV_PSPP_TO_UTF8]); - convertor[CONV_PSPP_TO_UTF8] = create_iconv ("UTF-8", charset); + setlocale (LC_CTYPE, loc); + loc_encoding = strdup (locale_charset ()); - iconv_close (convertor[CONV_SYSTEM_TO_PSPP]); - convertor[CONV_SYSTEM_TO_PSPP] = create_iconv (charset, current_charset); - iconv_close (convertor[CONV_UTF8_TO_PSPP]); - convertor[CONV_UTF8_TO_PSPP] = create_iconv (charset, "UTF-8"); + if ( 0 == strcmp (loc_encoding, c_encoding)) + { + ok = false; + } - free (current_locale); - free (current_charset); -} -void -i18n_init (void) -{ - assert (!locale) ; - locale = strdup (setlocale (LC_CTYPE, NULL)); + setlocale (LC_CTYPE, tmp); - setlocale (LC_CTYPE, locale); + free (tmp); + + if (ok) + { + free (default_encoding); + default_encoding = loc_encoding; + } + else + free (loc_encoding); - free (charset); - charset = strdup (locale_charset ()); + free (c_encoding); - convertor[CONV_PSPP_TO_UTF8] 
= create_iconv ("UTF-8", charset); - convertor[CONV_SYSTEM_TO_PSPP] = create_iconv (charset, charset); - convertor[CONV_UTF8_TO_PSPP] = create_iconv (charset, "UTF-8"); + return ok; } - void i18n_done (void) { - int i; - free (locale); - locale = 0; + struct hmapx_node *node; + iconv_t conv; + HMAPX_FOR_EACH (conv, node, &map) + iconv_close (conv); - for(i = 0 ; i < n_CONV; ++i ) - { - if ( (iconv_t) -1 == convertor[i] ) - continue; - iconv_close (convertor[i]); - } + hmapx_destroy (&map); + + free (default_encoding); + default_encoding = NULL; } +bool +valid_encoding (const char *enc) +{ + iconv_t conv = iconv_open ("UTF8", enc); + + if ( conv == (iconv_t) -1) + return false; + + iconv_close (conv); + + return true; +} + /* Return the system local's idea of the decimal seperator character */ diff --git a/src/libpspp/i18n.h b/src/libpspp/i18n.h index db15bad8..2c30a700 100644 --- a/src/libpspp/i18n.h +++ b/src/libpspp/i18n.h @@ -17,27 +17,27 @@ #ifndef I18N_H #define I18N_H -const char * get_pspp_locale (void); -void set_pspp_locale (const char *locale); -const char * get_pspp_charset (void); +#include void i18n_done (void); void i18n_init (void); -enum conv_id - { - CONV_PSPP_TO_UTF8, - CONV_SYSTEM_TO_PSPP, - CONV_UTF8_TO_PSPP, - n_CONV - }; +#define UTF8 "UTF-8" +char * recode_string (const char *to, const char *from, + const char *text, int len); -char * recode_string (enum conv_id how, const char *text, int len); +bool valid_encoding (const char *enc); /* Return the decimal separator according to the system locale */ char get_system_decimal (void); +const char * get_default_encoding (void); +void set_default_encoding (const char *enc); + +bool set_encoding_from_locale (const char *loc); + + #endif /* i18n.h */ diff --git a/src/ui/gui/compute-dialog.c b/src/ui/gui/compute-dialog.c index 7703640c..a42ce4fe 100644 --- a/src/ui/gui/compute-dialog.c +++ b/src/ui/gui/compute-dialog.c @@ -24,6 +24,7 @@ #include "psppire-var-store.h" #include "psppire-selector.h" 
#include "dialog-common.h" +#include #include #include @@ -631,7 +632,9 @@ insert_source_row_into_text_view (GtkTreeIter iter, gtk_tree_path_free (path); - name = pspp_locale_to_utf8 (var_get_name (var), -1, NULL); + name = recode_string (UTF8, psppire_dict_encoding (dict), + var_get_name (var), + -1); buffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (dest)); diff --git a/src/ui/gui/dialog-common.c b/src/ui/gui/dialog-common.c index 0aab294e..8d03bed1 100644 --- a/src/ui/gui/dialog-common.c +++ b/src/ui/gui/dialog-common.c @@ -16,6 +16,7 @@ #include +#include #include "dialog-common.h" #include "psppire-var-ptr.h" @@ -118,7 +119,8 @@ cell_var_name (GtkTreeViewColumn *tree_column, var = get_selected_variable (tree_model, iter, dict); - name = pspp_locale_to_utf8 (var_get_name (var), -1, NULL); + name = recode_string (UTF8, psppire_dict_encoding (dict), + var_get_name (var), -1); g_object_set (cell, "text", name, NULL); g_free (name); } diff --git a/src/ui/gui/dict-display.c b/src/ui/gui/dict-display.c index 2123c3c5..d6b1bcd5 100644 --- a/src/ui/gui/dict-display.c +++ b/src/ui/gui/dict-display.c @@ -23,6 +23,7 @@ #include "dict-display.h" #include "psppire-dict.h" +#include #include "helper.h" #include #include @@ -80,7 +81,8 @@ insert_source_row_into_entry (GtkTreeIter iter, gtk_tree_path_free (path); - name = pspp_locale_to_utf8 (var_get_name (var), -1, NULL); + name = recode_string (UTF8, psppire_dict_encoding (PSPPIRE_DICT (dict)), + var_get_name (var), -1); gtk_entry_set_text (GTK_ENTRY (dest), name); g_free (name); } @@ -142,7 +144,8 @@ is_currently_in_entry (GtkTreeModel *model, GtkTreeIter *iter, gtk_tree_path_free (path); - name = pspp_locale_to_utf8 (var_get_name (var), -1, NULL); + name = recode_string (UTF8, psppire_dict_encoding (PSPPIRE_DICT (dict)), + var_get_name (var), -1); result = ( 0 == strcmp (text, name)); g_free (name); diff --git a/src/ui/gui/helper.c b/src/ui/gui/helper.c index 9ac9fde6..6d90cfc4 100644 --- a/src/ui/gui/helper.c +++ 
b/src/ui/gui/helper.c @@ -152,20 +152,6 @@ get_widget_assert (GtkBuilder *builder, const gchar *name) return GTK_WIDGET (get_object_assert (builder, name, GTK_TYPE_WIDGET)); } -/* Converts a string in the pspp locale to utf-8. - The return value must be freed when no longer required*/ -gchar * -pspp_locale_to_utf8 (const gchar *text, gssize len, GError **err) -{ - return recode_string (CONV_PSPP_TO_UTF8, text, len); -} - -gchar * -utf8_to_pspp_locale (const gchar *text, gssize len, GError **err) -{ - return recode_string (CONV_UTF8_TO_PSPP, text, len); -} - /* This function must be used whenever a filename generated by glib, (eg, from gtk_file_chooser_get_filename) and passed to the C library, (eg through a pspp syntax string). diff --git a/src/ui/gui/helper.h b/src/ui/gui/helper.h index 30792faf..6bd610e7 100644 --- a/src/ui/gui/helper.h +++ b/src/ui/gui/helper.h @@ -43,14 +43,9 @@ GObject *get_object_assert (GtkBuilder *builder, const gchar *name, GType type); GtkAction * get_action_assert (GtkBuilder *builder, const gchar *name); GtkWidget * get_widget_assert (GtkBuilder *builder, const gchar *name); -/* Converts a string in the pspp locale to utf-8 */ -gchar * pspp_locale_to_utf8 (const gchar *text, gssize len, GError **err); -gchar * utf8_to_pspp_locale (const gchar *text, gssize len, GError **err); - gchar * convert_glib_filename_to_system_filename (const gchar *fname, GError **err); - void connect_help (GtkBuilder *); void reference_manual (GtkMenuItem *, gpointer); diff --git a/src/ui/gui/main.c b/src/ui/gui/main.c index a36e9410..71b9fbdc 100644 --- a/src/ui/gui/main.c +++ b/src/ui/gui/main.c @@ -142,6 +142,8 @@ main (int argc, char *argv[]) set_program_name (argv[0]); + gtk_disable_setlocale (); + if ( ! 
gtk_parse_args (&argc, &argv) ) { perror ("Error parsing arguments"); diff --git a/src/ui/gui/psppire-data-editor.c b/src/ui/gui/psppire-data-editor.c index be911ba6..9d15ceb9 100644 --- a/src/ui/gui/psppire-data-editor.c +++ b/src/ui/gui/psppire-data-editor.c @@ -23,6 +23,7 @@ #include #include "psppire-data-store.h" +#include #include #include "helper.h" @@ -743,7 +744,9 @@ update_data_ref_entry (const PsppireSheet *sheet, gchar *text = g_strdup_printf ("%d: %s", row + FIRST_CASE_NUMBER, var_get_name (var)); - gchar *s = pspp_locale_to_utf8 (text, -1, 0); + gchar *s = recode_string (UTF8, + psppire_dict_encoding (data_store->dict), + text, -1); g_free (text); diff --git a/src/ui/gui/psppire-data-store.c b/src/ui/gui/psppire-data-store.c index 45104f5c..45fa8248 100644 --- a/src/ui/gui/psppire-data-store.c +++ b/src/ui/gui/psppire-data-store.c @@ -31,6 +31,7 @@ #include #include "psppire-data-store.h" +#include #include "helper.h" #include @@ -598,7 +599,8 @@ psppire_data_store_get_string (PsppireDataStore *store, glong row, glong column) if (label) { free (v); - return pspp_locale_to_utf8 (label, -1, 0); + return recode_string (UTF8, psppire_dict_encoding (store->dict), + label, -1); } } @@ -616,7 +618,8 @@ psppire_data_store_get_string (PsppireDataStore *store, glong row, glong column) FP. No null terminator is appended to the buffer. 
*/ data_out (v, fp, s->str); - text = pspp_locale_to_utf8 (s->str, fp->w, 0); + text = recode_string (UTF8, psppire_dict_encoding (store->dict), + s->str, fp->w); g_string_free (s, TRUE); g_strchomp (text); @@ -673,7 +676,7 @@ psppire_data_store_set_string (PsppireDataStore *store, if (row == n_cases) psppire_data_store_insert_new_case (store, row); - s = utf8_to_pspp_locale (text, -1, NULL); + s = recode_string (psppire_dict_encoding (store->dict), UTF8, text, -1); psppire_data_store_data_in (store, row, var_get_case_index (pv), ss_cstr (s), @@ -749,9 +752,11 @@ static const gchar null_var_name[]=N_("var"); static gchar * get_row_button_label (const PsppireSheetModel *model, gint unit) { + PsppireDataStore *ds = PSPPIRE_DATA_STORE (model); gchar *s = g_strdup_printf (_("%d"), unit + FIRST_CASE_NUMBER); - gchar *text = pspp_locale_to_utf8 (s, -1, 0); + gchar *text = recode_string (UTF8, psppire_dict_encoding (ds->dict), + s, -1); g_free (s); @@ -787,7 +792,8 @@ get_column_subtitle (const PsppireSheetModel *model, gint col) if ( ! 
var_has_label (v)) return NULL; - text = pspp_locale_to_utf8 (var_get_label (v), -1, 0); + text = recode_string (UTF8, psppire_dict_encoding (ds->dict), + var_get_label (v), -1); return text; } @@ -804,7 +810,8 @@ get_column_button_label (const PsppireSheetModel *model, gint col) pv = psppire_dict_get_variable (ds->dict, col); - text = pspp_locale_to_utf8 (var_get_name (pv), -1, 0); + text = recode_string (UTF8, psppire_dict_encoding (ds->dict), + var_get_name (pv), -1); return text; } diff --git a/src/ui/gui/psppire-data-window.c b/src/ui/gui/psppire-data-window.c index 03ae6e4d..52ec0d49 100644 --- a/src/ui/gui/psppire-data-window.c +++ b/src/ui/gui/psppire-data-window.c @@ -460,15 +460,16 @@ name_has_suffix (const gchar *name) static void save_file (PsppireWindow *w) { - gchar *fn = NULL; + gchar *native_file_name = NULL; + gchar *file_name = NULL; GString *fnx; struct getl_interface *sss; - struct string file_name ; + struct string filename ; PsppireDataWindow *de = PSPPIRE_DATA_WINDOW (w); - g_object_get (w, "filename", &fn, NULL); + g_object_get (w, "filename", &file_name, NULL); - fnx = g_string_new (fn); + fnx = g_string_new (file_name); if ( ! 
name_has_suffix (fnx->str)) { @@ -478,22 +479,28 @@ save_file (PsppireWindow *w) g_string_append (fnx, ".sav"); } - ds_init_empty (&file_name); - syntax_gen_string (&file_name, ss_cstr (fnx->str)); - g_string_free (fnx, FALSE); + ds_init_empty (&filename); + + native_file_name = + convert_glib_filename_to_system_filename (fnx->str, NULL); + + g_string_free (fnx, TRUE); + + syntax_gen_string (&filename, ss_cstr (native_file_name)); + g_free (native_file_name); if ( de->save_as_portable ) { sss = create_syntax_string_source ("EXPORT OUTFILE=%s.", - ds_cstr (&file_name)); + ds_cstr (&filename)); } else { sss = create_syntax_string_source ("SAVE OUTFILE=%s.", - ds_cstr (&file_name)); + ds_cstr (&filename)); } - ds_destroy (&file_name); + ds_destroy (&filename); execute_syntax (sss); } diff --git a/src/ui/gui/psppire-dict.c b/src/ui/gui/psppire-dict.c index 564257f4..afc7d570 100644 --- a/src/ui/gui/psppire-dict.c +++ b/src/ui/gui/psppire-dict.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "helper.h" #include "message-dialog.h" @@ -752,10 +753,11 @@ tree_model_get_value (GtkTreeModel *model, GtkTreeIter *iter, { case DICT_TVM_COL_NAME: { - gchar *name = pspp_locale_to_utf8(var_get_name (var), -1, NULL); - g_value_init (value, G_TYPE_STRING); - g_value_set_string (value, name); - g_free (name); + gchar *name = recode_string (UTF8, psppire_dict_encoding (dict), + var_get_name (var), -1); + g_value_init (value, G_TYPE_STRING); + g_value_set_string (value, name); + g_free (name); } break; case DICT_TVM_COL_VAR: @@ -859,3 +861,12 @@ psppire_dict_dump (const PsppireDict *dict) } } #endif + + + + +const gchar * +psppire_dict_encoding (const PsppireDict *dict) +{ + return dict_get_encoding (dict->dict); +} diff --git a/src/ui/gui/psppire-dict.h b/src/ui/gui/psppire-dict.h index 3fd73f9a..54f3e39a 100644 --- a/src/ui/gui/psppire-dict.h +++ b/src/ui/gui/psppire-dict.h @@ -109,6 +109,8 @@ struct variable * psppire_dict_get_weight_variable (const PsppireDict *); 
void psppire_dict_dump (const PsppireDict *); #endif +const gchar *psppire_dict_encoding (const PsppireDict *); + G_END_DECLS #endif /* __PSPPIRE_DICT_H__ */ diff --git a/src/ui/gui/psppire-dictview.c b/src/ui/gui/psppire-dictview.c index a2cc7df8..f63ea0ba 100644 --- a/src/ui/gui/psppire-dictview.c +++ b/src/ui/gui/psppire-dictview.c @@ -21,6 +21,7 @@ #include "psppire-dict.h" #include "psppire-conf.h" #include +#include #include "helper.h" #include @@ -274,11 +275,15 @@ dv_get_base_model (GtkTreeModel *top_model, GtkTreeIter *top_iter, ) { *model = top_model; - *iter = *top_iter; + + if ( iter) + *iter = *top_iter; while ( ! PSPPIRE_IS_DICT (*model)) { - GtkTreeIter parent_iter = *iter; + GtkTreeIter parent_iter; + if (iter) + parent_iter = *iter; if ( GTK_IS_TREE_MODEL_FILTER (*model)) { @@ -286,9 +291,10 @@ dv_get_base_model (GtkTreeModel *top_model, GtkTreeIter *top_iter, *model = gtk_tree_model_filter_get_model (parent_model); - gtk_tree_model_filter_convert_iter_to_child_iter (parent_model, - iter, - &parent_iter); + if (iter) + gtk_tree_model_filter_convert_iter_to_child_iter (parent_model, + iter, + &parent_iter); } else if (GTK_IS_TREE_MODEL_SORT (*model)) { @@ -296,9 +302,10 @@ dv_get_base_model (GtkTreeModel *top_model, GtkTreeIter *top_iter, *model = gtk_tree_model_sort_get_model (parent_model); - gtk_tree_model_sort_convert_iter_to_child_iter (parent_model, - iter, - &parent_iter); + if (iter) + gtk_tree_model_sort_convert_iter_to_child_iter (parent_model, + iter, + &parent_iter); } } } @@ -318,11 +325,11 @@ var_description_cell_data_func (GtkTreeViewColumn *col, struct variable *var; GtkTreeIter iter; GtkTreeModel *model; - + PsppireDict *dict; dv_get_base_model (top_model, top_iter, &model, &iter); - g_assert (PSPPIRE_IS_DICT (model)); + dict = PSPPIRE_DICT (model); gtk_tree_model_get (model, &iter, DICT_TVM_COL_VAR, &var, -1); @@ -333,7 +340,8 @@ var_description_cell_data_func (GtkTreeViewColumn *col, "%s", var_get_label (var)); - char *utf8 = 
pspp_locale_to_utf8 (text, -1, NULL); + char *utf8 = recode_string (UTF8, psppire_dict_encoding (dict), + text, -1); g_free (text); g_object_set (cell, "markup", utf8, NULL); @@ -341,7 +349,8 @@ var_description_cell_data_func (GtkTreeViewColumn *col, } else { - char *name = pspp_locale_to_utf8 (var_get_name (var), -1, NULL); + char *name = recode_string (UTF8, psppire_dict_encoding (dict), + var_get_name (var), -1); g_object_set (cell, "text", name, NULL); g_free (name); } @@ -406,7 +415,6 @@ set_tooltip_for_variable (GtkTreeView *treeview, struct variable *var = NULL; gboolean ok; - gtk_tree_view_convert_widget_to_bin_window_coords (treeview, x, y, &bx, &by); @@ -416,7 +424,6 @@ set_tooltip_for_variable (GtkTreeView *treeview, tree_model = gtk_tree_view_get_model (treeview); - gtk_tree_view_set_tooltip_row (treeview, tooltip, path); ok = gtk_tree_model_get_iter (tree_model, &iter, path); @@ -433,11 +440,18 @@ set_tooltip_for_variable (GtkTreeView *treeview, { gchar *tip ; + GtkTreeModel *m; + PsppireDict *dict; + + dv_get_base_model (tree_model, NULL, &m, NULL); + dict = PSPPIRE_DICT (m); if ( PSPPIRE_DICT_VIEW (treeview)->prefer_labels ) - tip = pspp_locale_to_utf8 (var_get_name (var), -1, NULL); + tip = recode_string (UTF8, psppire_dict_encoding (dict), + var_get_name (var), -1); else - tip = pspp_locale_to_utf8 (var_get_label (var), -1, NULL); + tip = recode_string (UTF8, psppire_dict_encoding (dict), + var_get_label (var), -1); gtk_tooltip_set_text (tooltip, tip); diff --git a/src/ui/gui/psppire-syntax-window.c b/src/ui/gui/psppire-syntax-window.c index 053dc227..e3aaa756 100644 --- a/src/ui/gui/psppire-syntax-window.c +++ b/src/ui/gui/psppire-syntax-window.c @@ -1,5 +1,5 @@ /* PSPPIRE - a graphical user interface for PSPP. 
- Copyright (C) 2008 Free Software Foundation + Copyright (C) 2008, 2009 Free Software Foundation This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -229,7 +229,8 @@ append_suffix (const gchar *filename) /* Save BUFFER to the file called FILENAME. - If successful, clears the buffer's modified flag + FILENAME must be encoded in Glib filename encoding. + If successful, clears the buffer's modified flag. */ static gboolean save_editor_to_file (PsppireSyntaxWindow *se, @@ -242,28 +243,24 @@ save_editor_to_file (PsppireSyntaxWindow *se, gchar *text; gchar *suffixedname; - gchar *glibfilename; g_assert (filename); suffixedname = append_suffix (filename); - glibfilename = g_filename_from_utf8 (suffixedname, -1, 0, 0, err); - - g_free ( suffixedname); - - if ( ! glibfilename ) - return FALSE; - gtk_text_buffer_get_iter_at_line (buffer, &start, 0); gtk_text_buffer_get_iter_at_offset (buffer, &stop, -1); text = gtk_text_buffer_get_text (buffer, &start, &stop, FALSE); - result = g_file_set_contents (glibfilename, text, -1, err); + result = g_file_set_contents (suffixedname, text, -1, err); + + g_free (suffixedname); if ( result ) { - gchar *msg = g_strdup_printf (_("Saved file \"%s\""), filename); + char *fn = g_filename_display_name (filename); + gchar *msg = g_strdup_printf (_("Saved file \"%s\""), fn); + g_free (fn); gtk_statusbar_push (GTK_STATUSBAR (se->sb), se->text_context, msg); gtk_text_buffer_set_modified (buffer, FALSE); g_free (msg); diff --git a/src/ui/gui/psppire-var-sheet.c b/src/ui/gui/psppire-var-sheet.c index 696bab21..94b88050 100644 --- a/src/ui/gui/psppire-var-sheet.c +++ b/src/ui/gui/psppire-var-sheet.c @@ -489,7 +489,8 @@ psppire_var_sheet_realize (GtkWidget *w) GtkWidget *toplevel = gtk_widget_get_toplevel (GTK_WIDGET (vs)); - vs->val_labs_dialog = val_labs_dialog_create (GTK_WINDOW (toplevel)); + vs->val_labs_dialog = val_labs_dialog_create (GTK_WINDOW 
(toplevel), + PSPPIRE_SHEET (vs)); vs->missing_val_dialog = missing_val_dialog_create (GTK_WINDOW (toplevel)); vs->var_type_dialog = var_type_dialog_create (GTK_WINDOW (toplevel)); diff --git a/src/ui/gui/psppire-var-store.c b/src/ui/gui/psppire-var-store.c index 1d6f465d..092de58c 100644 --- a/src/ui/gui/psppire-var-store.c +++ b/src/ui/gui/psppire-var-store.c @@ -21,7 +21,7 @@ #define _(msgid) gettext (msgid) #define N_(msgid) msgid - +#include #include @@ -53,9 +53,6 @@ static void psppire_var_store_sheet_model_init (PsppireSheetModelIface * static void psppire_var_store_finalize (GObject *object); -gchar * missing_values_to_string (const struct variable *pv, GError **err); - - static gchar *psppire_var_store_get_string (const PsppireSheetModel *sheet_model, glong row, glong column); static gboolean psppire_var_store_clear (PsppireSheetModel *model, glong row, glong col); @@ -67,7 +64,8 @@ static gboolean psppire_var_store_set_string (PsppireSheetModel *model, static glong psppire_var_store_get_row_count (const PsppireSheetModel * model); static glong psppire_var_store_get_column_count (const PsppireSheetModel * model); -static gchar *text_for_column (const struct variable *pv, gint c, GError **err); +static gchar *text_for_column (PsppireVarStore *vs, const struct variable *pv, + gint c, GError **err); static GObjectClass *parent_class = NULL; @@ -399,7 +397,8 @@ psppire_var_store_finalize (GObject *object) } static gchar * -psppire_var_store_get_string (const PsppireSheetModel *model, glong row, glong column) +psppire_var_store_get_string (const PsppireSheetModel *model, + glong row, glong column) { PsppireVarStore *store = PSPPIRE_VAR_STORE (model); @@ -410,7 +409,7 @@ psppire_var_store_get_string (const PsppireSheetModel *model, glong row, glong c pv = psppire_dict_get_variable (store->dict, row); - return text_for_column (pv, column, 0); + return text_for_column (store, pv, column, 0); } @@ -468,14 +467,15 @@ psppire_var_store_set_string (PsppireSheetModel 
*model, { case PSPPIRE_VAR_STORE_COL_NAME: { - int i; - /* Until non-ascii in variable names is better managed, - simply refuse to allow them to be entered. */ - for (i = 0 ; i < strlen (text) ; ++i ) - if (!g_ascii_isprint (text[i])) - return FALSE; - return psppire_dict_rename_var (var_store->dict, pv, text); - break; + gboolean ok; + char *s = recode_string (psppire_dict_encoding (var_store->dict), + UTF8, + text, -1); + + ok = psppire_dict_rename_var (var_store->dict, pv, s); + + free (s); + return ok; } case PSPPIRE_VAR_STORE_COL_COLUMNS: if ( ! text) return FALSE; @@ -531,7 +531,9 @@ psppire_var_store_set_string (PsppireSheetModel *model, break; case PSPPIRE_VAR_STORE_COL_LABEL: { - gchar *s = utf8_to_pspp_locale (text, -1, NULL); + gchar *s = recode_string (psppire_dict_encoding (var_store->dict), + UTF8, + text, -1); var_set_label (pv, s); free (s); return TRUE; @@ -557,8 +559,10 @@ psppire_var_store_set_string (PsppireSheetModel *model, static const gchar none[] = N_("None"); static gchar * -text_for_column (const struct variable *pv, gint c, GError **err) +text_for_column (PsppireVarStore *vs, + const struct variable *pv, gint c, GError **err) { + PsppireDict *dict = vs->dict; static const gchar *const type_label[] = { N_("Numeric"), @@ -578,7 +582,8 @@ text_for_column (const struct variable *pv, gint c, GError **err) switch (c) { case PSPPIRE_VAR_STORE_COL_NAME: - return pspp_locale_to_utf8 ( var_get_name (pv), -1, err); + return recode_string (UTF8, psppire_dict_encoding (dict), + var_get_name (pv), -1); break; case PSPPIRE_VAR_STORE_COL_TYPE: { @@ -665,12 +670,13 @@ text_for_column (const struct variable *pv, gint c, GError **err) } break; case PSPPIRE_VAR_STORE_COL_LABEL: - return pspp_locale_to_utf8 (var_get_label (pv), -1, err); + return recode_string (UTF8, psppire_dict_encoding (dict), + var_get_label (pv), -1); break; case PSPPIRE_VAR_STORE_COL_MISSING: { - return missing_values_to_string (pv, err); + return missing_values_to_string (dict, pv, 
err); } break; case PSPPIRE_VAR_STORE_COL_VALUES: @@ -696,7 +702,8 @@ text_for_column (const struct variable *pv, gint c, GError **err) val_labs_done (&ip); - ss = pspp_locale_to_utf8 (gstr->str, gstr->len, err); + ss = recode_string (UTF8, psppire_dict_encoding (dict), + gstr->str, gstr->len); g_string_free (gstr, TRUE); return ss; } diff --git a/src/ui/gui/psppire.c b/src/ui/gui/psppire.c index 599d8108..a1f48e7b 100644 --- a/src/ui/gui/psppire.c +++ b/src/ui/gui/psppire.c @@ -16,7 +16,7 @@ #include -#include +#include #include #include #include @@ -89,12 +89,8 @@ initialize (struct command_line_processor *clp, int argc, char **argv) { PsppireDict *dictionary = 0; - /* gtk_init messes with the locale. - So unset the bits we want to control ourselves */ - setlocale (LC_NUMERIC, "C"); - - bindtextdomain (PACKAGE, locale_dir); + i18n_init (); preregister_widgets (); @@ -171,6 +167,7 @@ de_initialize (void) message_dialog_done (); settings_done (); outp_done (); + i18n_done (); } diff --git a/src/ui/gui/val-labs-dialog.c b/src/ui/gui/val-labs-dialog.c index f6d0ab73..42dd0b34 100644 --- a/src/ui/gui/val-labs-dialog.c +++ b/src/ui/gui/val-labs-dialog.c @@ -1,5 +1,5 @@ /* PSPPIRE - a graphical user interface for PSPP. 
- Copyright (C) 2005 Free Software Foundation + Copyright (C) 2005, 2009 Free Software Foundation This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -26,12 +26,16 @@ #include "val-labs-dialog.h" #include #include - +#include "psppire-var-sheet.h" +#include "psppire-var-store.h" +#include struct val_labs_dialog { GtkWidget *window; + PsppireSheet *vs; + /* The variable to be updated */ struct variable *pv; @@ -325,8 +329,7 @@ on_remove (GtkWidget *w, gpointer data) /* Callback which occurs when a line item is selected in the list of value--label pairs.*/ static void -on_select_row (GtkTreeView *treeview, - gpointer data) +on_select_row (GtkTreeView *treeview, gpointer data) { gchar *labeltext; struct val_labs_dialog *dialog = data; @@ -336,6 +339,9 @@ on_select_row (GtkTreeView *treeview, gchar *const text = value_to_text (vl->value, *var_get_write_format (dialog->pv)); + PsppireVarStore *var_store = + PSPPIRE_VAR_STORE (psppire_sheet_get_model (dialog->vs)); + g_signal_handler_block (GTK_ENTRY (dialog->value_entry), dialog->value_handler_id); @@ -348,7 +354,10 @@ on_select_row (GtkTreeView *treeview, g_signal_handler_block (GTK_ENTRY (dialog->label_entry), dialog->change_handler_id); - labeltext = pspp_locale_to_utf8 (vl->label, -1, 0); + + labeltext = recode_string (UTF8, psppire_dict_encoding (var_store->dict), + vl->label, -1); + gtk_entry_set_text (GTK_ENTRY (dialog->label_entry), labeltext); g_free (labeltext); @@ -364,7 +373,7 @@ on_select_row (GtkTreeView *treeview, /* Create a new dialog box (there should normally be only one)*/ struct val_labs_dialog * -val_labs_dialog_create (GtkWindow *toplevel) +val_labs_dialog_create (GtkWindow *toplevel, PsppireSheet *sheet) { GtkTreeViewColumn *column; @@ -377,6 +386,7 @@ val_labs_dialog_create (GtkWindow *toplevel) dialog->window = get_widget_assert (xml,"val_labs_dialog"); dialog->value_entry = get_widget_assert 
(xml,"value_entry"); dialog->label_entry = get_widget_assert (xml,"label_entry"); + dialog->vs = sheet; gtk_window_set_transient_for (GTK_WINDOW (dialog->window), toplevel); @@ -461,6 +471,9 @@ repopulate_dialog (struct val_labs_dialog *dialog) GtkTreeIter iter; + PsppireVarStore *var_store = + PSPPIRE_VAR_STORE (psppire_sheet_get_model (dialog->vs)); + GtkListStore *list_store = gtk_list_store_new (2, G_TYPE_STRING, G_TYPE_DOUBLE); @@ -489,11 +502,12 @@ repopulate_dialog (struct val_labs_dialog *dialog) *var_get_write_format (dialog->pv)); gchar *labeltext = - pspp_locale_to_utf8 (vl->label, -1, 0); + recode_string (UTF8, + psppire_dict_encoding (var_store->dict), + vl->label, -1); gchar *const text = g_strdup_printf ("%s = \"%s\"", - vstr, labeltext); - + vstr, labeltext); gtk_list_store_append (list_store, &iter); gtk_list_store_set (list_store, &iter, diff --git a/src/ui/gui/val-labs-dialog.h b/src/ui/gui/val-labs-dialog.h index 6b1d0e37..3a09f1ca 100644 --- a/src/ui/gui/val-labs-dialog.h +++ b/src/ui/gui/val-labs-dialog.h @@ -14,8 +14,6 @@ You should have received a copy of the GNU General Public License along with this program. If not, see . 
*/ - - #ifndef __PSPPIRE_VAL_LABS_DIALOG_H #define __PSPPIRE_VAL_LABS_DIALOG_H @@ -26,12 +24,13 @@ #include #include +#include struct val_labs; -struct val_labs_dialog * val_labs_dialog_create (GtkWindow *); +struct val_labs_dialog * val_labs_dialog_create (GtkWindow *, PsppireSheet *); void val_labs_dialog_show (struct val_labs_dialog *); diff --git a/src/ui/gui/var-display.c b/src/ui/gui/var-display.c index 07c7a473..50e0df4b 100644 --- a/src/ui/gui/var-display.c +++ b/src/ui/gui/var-display.c @@ -4,34 +4,17 @@ #include #include #include +#include "psppire-dict.h" #include #define _(msgid) gettext (msgid) #define N_(msgid) msgid #include "helper.h" +#include static const gchar none[] = N_("None"); -gchar * -name_to_string (const struct variable *var, GError **err) -{ - const char *name = var_get_name (var); - g_assert (name); - - return pspp_locale_to_utf8 (name, -1, err); -} - - -gchar * -label_to_string (const struct variable *var, GError **err) -{ - const char *label = var_get_label (var); - - if ( ! 
label ) return g_strdup (none); - - return pspp_locale_to_utf8 (label, -1, err); -} gchar * measure_to_string (const struct variable *var, GError **err) @@ -45,7 +28,7 @@ measure_to_string (const struct variable *var, GError **err) gchar * -missing_values_to_string (const struct variable *pv, GError **err) +missing_values_to_string (const PsppireDict *dict, const struct variable *pv, GError **err) { const struct fmt_spec *fmt = var_get_print_format (pv); gchar *s; @@ -70,7 +53,8 @@ missing_values_to_string (const struct variable *pv, GError **err) g_string_append (gstr, mv[i]); g_free (mv[i]); } - s = pspp_locale_to_utf8 (gstr->str, gstr->len, err); + s = recode_string (UTF8, psppire_dict_encoding (dict), + gstr->str, gstr->len); g_string_free (gstr, TRUE); } else @@ -99,7 +83,8 @@ missing_values_to_string (const struct variable *pv, GError **err) g_string_append (gstr, ss); free (ss); } - s = pspp_locale_to_utf8 (gstr->str, gstr->len, err); + s = recode_string (UTF8, psppire_dict_encoding (dict), + gstr->str, gstr->len); g_string_free (gstr, TRUE); } diff --git a/src/ui/gui/var-display.h b/src/ui/gui/var-display.h index 40404b89..927e235c 100644 --- a/src/ui/gui/var-display.h +++ b/src/ui/gui/var-display.h @@ -20,24 +20,16 @@ #include #include +#include "psppire-dict.h" struct variable; #define n_ALIGNMENTS 3 extern const gchar *const alignments[n_ALIGNMENTS + 1]; - extern const gchar *const measures[n_MEASURES + 1]; - -gchar * name_to_string (const struct variable *var, GError **err); - - -gchar * missing_values_to_string (const struct variable *pv, GError **err); - -gchar * measure_to_string (const struct variable *var, GError **err); - -gchar * label_to_string (const struct variable *var, GError **err); - +gchar *missing_values_to_string (const PsppireDict *dict, const struct variable *pv, GError **err); +gchar *measure_to_string (const struct variable *var, GError **err); #endif diff --git a/src/ui/gui/variable-info-dialog.c b/src/ui/gui/variable-info-dialog.c 
index 612643b4..3b3367f7 100644 --- a/src/ui/gui/variable-info-dialog.c +++ b/src/ui/gui/variable-info-dialog.c @@ -28,6 +28,7 @@ #include "helper.h" #include +#include #include "helper.h" @@ -36,28 +37,58 @@ #define N_(msgid) msgid +static const gchar none[] = N_("None"); + + +static gchar * +name_to_string (const struct variable *var, PsppireDict *dict) +{ + const char *name = var_get_name (var); + g_assert (name); + + return recode_string (UTF8, psppire_dict_encoding (dict), + name, -1); +} + + +static gchar * +label_to_string (const struct variable *var, PsppireDict *dict) +{ + const char *label = var_get_label (var); + + if (! label) return g_strdup (none); + + return recode_string (UTF8, psppire_dict_encoding (dict), + label, -1); +} + static void populate_text (PsppireDictView *treeview, gpointer data) { gchar *text = 0; GString *gstring; + PsppireDict *dict; - GtkTextBuffer *textbuffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW(data)); + GtkTextBuffer *textbuffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (data)); const struct variable *var = psppire_dict_view_get_selected_variable (treeview); if ( var == NULL) return; + g_object_get (treeview, + "dictionary", &dict, + NULL); + gstring = g_string_sized_new (200); - text = name_to_string (var, NULL); + text = name_to_string (var, dict); g_string_assign (gstring, text); g_free (text); g_string_append (gstring, "\n"); - text = label_to_string (var, NULL); + text = label_to_string (var, dict); g_string_append_printf (gstring, _("Label: %s\n"), text); g_free (text); @@ -70,7 +101,7 @@ populate_text (PsppireDictView *treeview, gpointer data) g_string_append_printf (gstring, _("Type: %s\n"), buffer); } - text = missing_values_to_string (var, NULL); + text = missing_values_to_string (dict, var, NULL); g_string_append_printf (gstring, _("Missing Values: %s\n"), text); g_free (text); @@ -92,7 +123,6 @@ populate_text (PsppireDictView *treeview, gpointer data) g_string_append (gstring, "\n"); g_string_append (gstring, 
_("Value Labels:\n")); -#if 1 for (vl = val_labs_first_sorted (labs, &vli); vl; vl = val_labs_next (labs, &vli)) @@ -100,14 +130,15 @@ populate_text (PsppireDictView *treeview, gpointer data) gchar *const vstr = value_to_text (vl->value, *var_get_print_format (var)); - text = pspp_locale_to_utf8 (vl->label, -1, NULL); + + text = recode_string (UTF8, psppire_dict_encoding (dict), + vl->label, -1); g_string_append_printf (gstring, _("%s %s\n"), vstr, text); g_free (text); g_free (vstr); } -#endif } gtk_text_buffer_set_text (textbuffer, gstring->str, gstring->len); diff --git a/src/ui/terminal/main.c b/src/ui/terminal/main.c index e37cace3..a1f20169 100644 --- a/src/ui/terminal/main.c +++ b/src/ui/terminal/main.c @@ -16,7 +16,6 @@ #include -#include #include #include #include @@ -30,6 +29,8 @@ #include #endif + +#include #include #include #include @@ -61,7 +62,6 @@ #define _(msgid) gettext (msgid) -static void i18n_init (void); static void fpu_init (void); static void clean_up (void); @@ -166,21 +166,6 @@ main (int argc, char **argv) return any_errors (); } -static void -i18n_init (void) -{ -#if ENABLE_NLS -#if HAVE_LC_MESSAGES - setlocale (LC_MESSAGES, ""); -#endif -#if HAVE_LC_PAPER - setlocale (LC_PAPER, ""); -#endif - bindtextdomain (PACKAGE, locale_dir); - textdomain (PACKAGE); -#endif /* ENABLE_NLS */ -} - static void fpu_init (void) { @@ -234,5 +219,6 @@ clean_up (void) readln_uninitialize (); outp_done (); msg_ui_done (); + i18n_done (); } } diff --git a/tests/command/sysfile-info.sh b/tests/command/sysfile-info.sh index 3456fba2..bc11fa3a 100755 --- a/tests/command/sysfile-info.sh +++ b/tests/command/sysfile-info.sh @@ -99,6 +99,7 @@ Cases: 3 Type: System File. Weight: Not weighted. Mode: Compression on. 
+Charset: Unknown +--------+-------------+---+ |Variable|Description |Pos| | | |iti| diff --git a/tests/dissect-sysfile.c b/tests/dissect-sysfile.c index 25d01158..1c81c72c 100644 --- a/tests/dissect-sysfile.c +++ b/tests/dissect-sysfile.c @@ -66,6 +66,9 @@ static void read_datafile_attributes (struct sfm_reader *r, size_t size, size_t count); static void read_variable_attributes (struct sfm_reader *r, size_t size, size_t count); +static void read_character_encoding (struct sfm_reader *r, + size_t size, size_t count); + static struct text_record *open_text_record ( struct sfm_reader *, size_t size); @@ -510,6 +513,10 @@ read_extension_record (struct sfm_reader *r) read_variable_attributes (r, size, count); return; + case 20: + read_character_encoding (r, size, count); + return; + default: sys_warn (r, _("Unrecognized record type 7, subtype %d."), subtype); break; @@ -712,6 +719,17 @@ read_datafile_attributes (struct sfm_reader *r, size_t size, size_t count) close_text_record (text); } +static void +read_character_encoding (struct sfm_reader *r, size_t size, size_t count) +{ + const unsigned long int posn = ftell (r->file); + char *encoding = calloc (size, count + 1); + read_string (r, encoding, count + 1); + + printf ("%08lx: Character Encoding: %s\n", posn, encoding); +} + + static void read_variable_attributes (struct sfm_reader *r, size_t size, size_t count) {