encoding-guesser: Fall back to windows-1252 when UTF-8 can't be right.
author Ben Pfaff <blp@cs.stanford.edu>
Thu, 1 Mar 2012 06:43:22 +0000 (22:43 -0800)
committer Ben Pfaff <blp@cs.stanford.edu>
Fri, 2 Mar 2012 05:26:05 +0000 (21:26 -0800)
commit d6c75296e5573a997c79a7af1195b6a619c0190c
tree 65f31d3e99f5d069e85e9ebbc6a6d1426f64db2a
parent d12d1f1b4a8a43d611e1c30b57f5bd66178a103a

Until now the encoding-guesser code has used UTF-8 as its fallback even
in situations where we can tell that the file is not valid UTF-8.  In
that kind of situation a single-byte character set makes more sense as
the fallback.  This commit hard-codes windows-1252 as that fallback,
since it is a widely encountered encoding (and it is identical to
ISO-8859-1 outside the 0x80-0x9F range, where ISO-8859-1 has only
control characters).
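
The fallback rule described above can be sketched roughly as follows.
This is a minimal illustration only, not the code in
src/libpspp/encoding-guesser.c: the names is_valid_utf8 and
guess_fallback_encoding are hypothetical, and the sketch assumes the
whole input is available in one buffer, so a multibyte sequence cut
off at the end of the buffer is treated as invalid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of PSPP: returns true if the N bytes
   at DATA are well-formed UTF-8 (no overlong forms, no surrogates,
   nothing above U+10FFFF, no truncated sequences). */
static bool
is_valid_utf8 (const unsigned char *data, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = data[i];
      unsigned int cp, min;
      size_t len, j;

      if (c < 0x80)
        {
          /* Plain ASCII byte. */
          i++;
          continue;
        }
      else if ((c & 0xe0) == 0xc0)
        len = 2, cp = c & 0x1f, min = 0x80;
      else if ((c & 0xf0) == 0xe0)
        len = 3, cp = c & 0x0f, min = 0x800;
      else if ((c & 0xf8) == 0xf0)
        len = 4, cp = c & 0x07, min = 0x10000;
      else
        return false;           /* Stray continuation or invalid lead byte. */

      if (n - i < len)
        return false;           /* Sequence runs past the end of the buffer. */

      for (j = 1; j < len; j++)
        {
          if ((data[i + j] & 0xc0) != 0x80)
            return false;       /* Not a continuation byte. */
          cp = (cp << 6) | (data[i + j] & 0x3f);
        }

      if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return false;           /* Overlong, out of range, or surrogate. */

      i += len;
    }
  return true;
}

/* Hypothetical entry point: keeps UTF-8 when the data could be UTF-8,
   and otherwise falls back to a single-byte encoding. */
const char *
guess_fallback_encoding (const void *data, size_t n)
{
  return is_valid_utf8 (data, n) ? "UTF-8" : "windows-1252";
}
```

With this sketch, the bytes "caf\xc3\xa9" are still reported as UTF-8,
while "caf\xe9" (the same word in windows-1252) selects the
windows-1252 fallback because a lone 0xE9 cannot be valid UTF-8.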

John Darrington originally suggested this, if I recall correctly.

The bug report that spurred this work was from Harry Thijssen.  With
this commit, PSPP properly reads his windows-1252 file when the
system locale uses UTF-8 encoding.
doc/utilities.texi
src/libpspp/encoding-guesser.c
src/libpspp/encoding-guesser.h
src/libpspp/i18n.c
src/libpspp/i18n.h
src/libpspp/u8-istream.c
tests/libpspp/encoding-guesser.at