Ben Pfaff [Sat, 2 Apr 2011 04:14:49 +0000 (21:14 -0700)]
gui: Use dispose instead of finalize method in PsppireDataWindow.
According to the GObject reference manual, "When dispose ends, the
object should not hold any reference to any other member object."
That is, references should be dropped in dispose, not in finalize.
Ben Pfaff [Wed, 30 Mar 2011 04:51:42 +0000 (21:51 -0700)]
gui: Drop null base_finalize function from PsppireDataWindow.
According to the GObject manual, base_finalize is "Never used in
practice. Unlikely you will need it." so I don't see a reason to keep
a stub here.
Ben Pfaff [Sat, 2 Apr 2011 04:12:54 +0000 (21:12 -0700)]
gui: Fix const-ness warning in create_lines_tree_view().
This fixes this warning from GCC:
src/ui/gui/text-data-import-dialog.c: In function ‘create_lines_tree_view’:
src/ui/gui/text-data-import-dialog.c:875: warning: initialization
discards qualifiers from pointer target type
Ben Pfaff [Sat, 2 Apr 2011 04:11:47 +0000 (21:11 -0700)]
gui: Fix const-ness warning for measure_to_string() return type.
This fixes the following GCC warning:
src/ui/gui/var-display.c: In function ‘measure_to_string’:
src/ui/gui/var-display.c:25: warning: return discards qualifiers from
pointer target type
Ben Pfaff [Wed, 6 Apr 2011 04:30:34 +0000 (21:30 -0700)]
gui: Put a separator line before list of windows in Windows menus.
I found it a bit confusing before, that the list of windows in the
Windows menu was not separated from the list of actions of windows.
This fixes the problem.
Simply adding a separator item to the .ui files doesn't work, because
GtkUIManager removes it. The same thing happens if we add a separator
without adding a real menu item at the same time.
Thanks to John Darrington for suggesting the root of the problem.
Ben Pfaff [Thu, 31 Mar 2011 04:39:01 +0000 (21:39 -0700)]
gui: Always convert file names to UTF-8 for use in syntax.
Syntax as understood by the lexer is always in UTF-8, so file names
have to be in UTF-8 too. (The PSPP code that opens files based on
strings from syntax is already using utf8_to_filename() to convert
them properly before opening.)
Before commit 9ade26c8349 "lexer: Reimplement for better testability
and internationalization", the encoding of syntax files was not
well-defined. It was reasonable, then, to put file names in generated
syntax in the file name encoding.
Commit 9ade26c8349 changed the encoding of syntax so that it was
always in UTF-8. This meant that file names in syntax had to be
converted back into the file name encoding before trying to open
the files, and I made that change (you can see, for example, the
call to utf8_to_filename in do_insert() in
src/language/utilities/include.c). But I forgot that the GUI
needs to convert its file names into UTF-8 when it is generating
syntax, so this commit fixes that up.
Ben Pfaff [Tue, 3 May 2011 04:01:35 +0000 (21:01 -0700)]
po: Update Spanish translation from translation-project.org.
Ben Pfaff [Tue, 3 May 2011 04:01:23 +0000 (21:01 -0700)]
po: Update Catalan translation from translation-project.org.
Ben Pfaff [Wed, 27 Apr 2011 04:53:26 +0000 (21:53 -0700)]
FACTOR: Use %s for literal string.
tab_title() treats its parameter as a printf format string, so it's
necessary to use %s.
Found on Mac OS X with help from Jeremy Lavergne.
Ben Pfaff [Wed, 27 Apr 2011 04:52:27 +0000 (21:52 -0700)]
Use %zu, not %d, to format a size_t.
Found on Mac OS X with help from Jeremy Lavergne.
Ben Pfaff [Wed, 27 Apr 2011 04:36:24 +0000 (21:36 -0700)]
encoding-guesser: Don't guess UTF-8 for ASCII if it is the fallback.
When the text presented to the encoding guesser is all ASCII, normally
the encoding guesser will report ASCII as its guess. But if the
fallback encoding is UTF-8, then it reports UTF-8 instead.
Unfortunately, this makes the encoding guesser a bit harder to test,
because guesses depend on the system's locale. It's easier to test
if all-ASCII always yields ASCII as the guess, so this changes the
encoding guesser to do that.
This fixes a test failure on Mac OS X. Thanks to Jeremy Lavergne for
reporting the problem.
Ben Pfaff [Wed, 27 Apr 2011 04:22:41 +0000 (21:22 -0700)]
i18n: Fix type of objects passed as u8_mbtouc()'s character parameter.
Thanks to Jeremy Lavergne for making a Mac OS build system available.
Ben Pfaff [Tue, 26 Apr 2011 03:06:19 +0000 (20:06 -0700)]
QUICK CLUSTER: Adjust comment style.
PSPP uses primarily /**/ style comments so the use of // comments in
this file sticks out.
Also we generally wrap comments (and code) at 79 columns and try to
write comments as whole sentences, at least where there is room.
Also, usually PSPP avoids multiple blank lines in a row.
Ben Pfaff [Tue, 26 Apr 2011 02:59:01 +0000 (19:59 -0700)]
QUICK CLUSTER: Update #include directives to match current style.
These days, usually system header files are #included first, using
<> notation, and then PSPP's own header files, using "" notation.
Each group is alphabetized.
Ben Pfaff [Tue, 26 Apr 2011 02:56:42 +0000 (19:56 -0700)]
QUICK CLUSTER: Fold quick-cluster.h into quick-cluster.c.
It's unusual to put 'static' function prototypes into a header file:
header files are usually used to export declarations for use by other
source files, but 'static' functions cannot be called outside their
own source files.
Mehmet Hakan Satman [Tue, 26 Apr 2011 02:53:15 +0000 (19:53 -0700)]
QUICK CLUSTER: New command.
Ben Pfaff [Tue, 26 Apr 2011 04:37:18 +0000 (21:37 -0700)]
sys-file: Add test to write non-ASCII to most .sav string fields.
Bug #33036.
Ben Pfaff [Mon, 25 Apr 2011 04:42:54 +0000 (21:42 -0700)]
sys-file-reader: Add tests for non-ASCII characters and encodings.
Ben Pfaff [Mon, 25 Apr 2011 04:41:30 +0000 (21:41 -0700)]
DISPLAY FILE LABEL: Display in a more natural way.
Ben Pfaff [Mon, 25 Apr 2011 04:41:02 +0000 (21:41 -0700)]
MRSETS: Recode counted value to UTF-8 before displaying.
Otherwise they display incorrectly if a counted value contains
non-ASCII characters.
Ben Pfaff [Sun, 24 Apr 2011 20:19:15 +0000 (13:19 -0700)]
sys-file-reader, sys-file-writer: Fix encoding problems for mrsets.
Ben Pfaff [Sun, 24 Apr 2011 03:15:18 +0000 (20:15 -0700)]
sys-file-writer: Fix encoding of several string fields.
PSPP keeps most string data in the dictionary in UTF-8. The system
file writer needs to recode this data into the correct encoding, but
in several cases it failed to do so. This fixes the problem.
Thanks to Mindaugas for reporting the problem and to John Darrington
for help diagnosing it.
Bug #33036.
Ben Pfaff [Sun, 24 Apr 2011 05:10:20 +0000 (22:10 -0700)]
sys-file-reader: Take string encoding into account for text records.
The 'encoding' member of struct sfm_reader was not filled in anywhere,
so it was always NULL, which meant that the recode operation in
open_text_record() was a no-op.
Ben Pfaff [Sun, 24 Apr 2011 04:45:19 +0000 (21:45 -0700)]
variable: Make var_set_label() use the variable's own encoding.
I don't see any reason to make the caller supply this. It just makes
code harder to read and write.
Ben Pfaff [Sun, 24 Apr 2011 04:40:48 +0000 (21:40 -0700)]
dictionary: Make dict_create() take the new dictionary's encoding.
There are several places in the PSPP tree that create dictionaries,
but few of them actually set an encoding. This causes most
dictionaries to be in the default encoding, which is often not
correct.
By making dict_create() take the encoding as a parameter we force
the caller to think about the encoding issue up-front.
Ben Pfaff [Sat, 23 Apr 2011 19:01:35 +0000 (12:01 -0700)]
dictionary: Set encoding early when cloning a dictionary.
Many operations that involve the names of variables and other entities
in a dictionary depend on the dictionary's encoding, so it is
potentially important to have the encoding set properly when adding
other entities to the dictionary.
I did not check that this fixes an actual bug.
Ben Pfaff [Sat, 23 Apr 2011 18:59:49 +0000 (11:59 -0700)]
short-names: Consider character encoding when making short names.
Variable names and short names are always in UTF-8, but the length of
short names needs to be limited to 8 bytes in the dictionary encoding,
not in UTF-8. This commit fixes that problem.
Ben Pfaff [Sat, 23 Apr 2011 18:40:42 +0000 (11:40 -0700)]
short-names: Drop redundant call to var_set_short_name().
This function always calls var_set_short_name() twice, so we can drop
the first call.
Ben Pfaff [Sun, 24 Apr 2011 04:59:38 +0000 (21:59 -0700)]
dissect-sysfile: Don't omit the last in a series of text tokens.
Otherwise dissect-sysfile would not print the last long string
variable name written by sys-file-writer, because it did not include
a separator byte after the last record. (This was obvious running
dissect-sysfile on a system file with only one variable.)
Ben Pfaff [Sat, 23 Apr 2011 19:02:04 +0000 (12:02 -0700)]
str: Always null-terminate string in str_format_26adic().
It seems like a good idea to always supply a null terminator, even on
error.
Ben Pfaff [Tue, 26 Apr 2011 01:28:20 +0000 (18:28 -0700)]
Reformat src/language/stats/automake.mk.
It seems more consistent with most of our Makefiles to just write one
file per line.
Ben Pfaff [Sat, 23 Apr 2011 18:00:49 +0000 (11:00 -0700)]
Updated the Lithuanian translation from translationproject.org.
Ben Pfaff [Sat, 23 Apr 2011 14:54:56 +0000 (07:54 -0700)]
Smake: Add unilbrk/u8-possible-linebreaks Gnulib module.
The ASCII output driver now requires this module, as of commit
14b3603043 "ascii: Add support for multibyte characters."
Reported by John Darrington.
Ben Pfaff [Sat, 23 Apr 2011 14:53:00 +0000 (07:53 -0700)]
Add new output/ascii.h header to the distribution, fixing "make dist".
Ben Pfaff [Sat, 23 Apr 2011 05:29:19 +0000 (22:29 -0700)]
ascii: Print syntax in output single-spaced.
When SET PRINTBACK=ON was in effect, the ASCII output driver would
put a blank line between successive lines of syntax, because each line
was output separately. This commit fixes that, by causing the output
core to combine successive syntax output items into a single item that
contains multiple lines of text. This was essentially what the HTML
output driver was doing anyhow, so putting this into the core also
allows removing the corresponding logic from the HTML driver.
Ben Pfaff [Sat, 23 Apr 2011 05:10:04 +0000 (22:10 -0700)]
ascii: Don't print command names in output.
This seems to be closer to what users expect.
I think that having the command names in output was being confused
by users with SET PRINTBACK=ON (see e.g. bug #31561) even though I
really just added that so that there are clear titles for the output
that goes with each procedure. This change should help, I think.
It might make sense to make drivers only print the titles of
procedures that have other output, but this commit doesn't do that.
Ben Pfaff [Sat, 23 Apr 2011 05:07:52 +0000 (22:07 -0700)]
odt: Write command name in output only before command, not after.
Ben Pfaff [Sat, 23 Apr 2011 03:21:57 +0000 (20:21 -0700)]
ascii: Add support for multibyte characters.
This commit modifies render.at, changing hyphens to non-breaking
hyphens. This change is only to ensure that the output for the tests
in render.at is the same afterward. Without these changes, these
tests wrap these tables differently, because they break after the
hyphens; before, only spaces were considered valid breakpoints.
Bug #31478.
Ben Pfaff [Sun, 17 Apr 2011 01:43:56 +0000 (18:43 -0700)]
ascii: Drop useless 'wrap_mode' parameter from ascii_layout_cell().
This parameter is always supplied as WRAP_WORD, so there's no point
in having it at all.
Ben Pfaff [Sun, 17 Apr 2011 23:55:09 +0000 (16:55 -0700)]
Updated the Lithuanian translation from translationproject.org.
Mindaugas [Sun, 17 Apr 2011 23:52:41 +0000 (16:52 -0700)]
gui: Add Lithuanian translation and MIME type to pspp.desktop
Ben Pfaff [Sat, 16 Apr 2011 20:40:43 +0000 (13:40 -0700)]
Updated the Lithuanian translation from translationproject.org.
Ben Pfaff [Sat, 16 Apr 2011 18:20:59 +0000 (11:20 -0700)]
gui: Fix crash in executor when inline data is missing.
Without this commit, typing "DATA LIST /x 1." into an otherwise empty
syntax window and executing it caused a crash because the lexer was
being accessed after it was destroyed. This commit averts the crash.
Ben Pfaff [Sat, 16 Apr 2011 16:52:34 +0000 (09:52 -0700)]
CROSSTABS: Fix output of multiway statistics tables.
Bug #27452.
Ben Pfaff [Sat, 16 Apr 2011 16:30:10 +0000 (09:30 -0700)]
Updated the Lithuanian translation from translationproject.org.
Ben Pfaff [Sat, 16 Apr 2011 05:49:00 +0000 (22:49 -0700)]
table-casereader: Put space between columns.
When table_casereaders are pasted together next to each other, there
should normally be a little bit of space between neighboring columns,
instead of having them directly abutting. This makes the output of
LIST, for example, much more readable.
Without this commit, LIST output for three variables named x, y, and
z, all with F1.0 format, looks something like this:
xyz
---
111
222
311
412
521
612
711
811
912
With this commit, it looks like this:
x y z
-----
1 1 1
2 2 2
3 1 1
4 1 2
5 2 1
6 1 2
7 1 1
8 1 1
9 1 2
Ben Pfaff [Sat, 16 Apr 2011 05:26:43 +0000 (22:26 -0700)]
render: Fix rendering of TAL_GAP rules.
A rule that is set to TAL_GAP is supposed to have the same width or
height as a rule of type TAL_1, but without drawing the line. That
is, it is supposed to be a small blank space between rows or columns.
Unfortunately, TAL_GAP was not implemented properly in the rendering
code. It was treated just like TAL_0, which meant that it was ignored
and no gap appeared.
This commit implements TAL_GAP, fixing the problem.
Ben Pfaff [Sat, 16 Apr 2011 03:16:02 +0000 (20:16 -0700)]
configure: Invoke AM_GNU_GETTEXT by hand.
The Gnulib "gettext" module does not invoke AM_GNU_GETTEXT, because
gnulib-tool uses "sed" to filter it out. So we must invoke it
ourselves.
John Darrington narrowed the problem down to commit b54a5702b6 "Fix up
build following dropping ulc-width-linebreaks module", which started
using the gettext Gnulib module.
Bug #33083.
Reported by Mindaugas.
Ben Pfaff [Fri, 15 Apr 2011 04:09:00 +0000 (21:09 -0700)]
gui: Add missing scroll bar in K related samples variables list.
Reported by and fix from Mindaugas Baranauskas <embar@super.lt>.
Reviewed by John Darrington.
Ben Pfaff [Thu, 14 Apr 2011 04:24:30 +0000 (21:24 -0700)]
value-labels: Interpret \n as new-line in value labels.
Bug #18497.
Ben Pfaff [Wed, 13 Apr 2011 05:19:45 +0000 (22:19 -0700)]
intern: New function intern_strlen().
Ben Pfaff [Wed, 13 Apr 2011 03:42:47 +0000 (20:42 -0700)]
intern: Use UP_CAST macro instead of open-coding it.
Ben Pfaff [Wed, 13 Apr 2011 03:33:13 +0000 (20:33 -0700)]
po: Update Dutch translation.
Thanks to the Dutch translation team and the translationproject.org
coordinators.
Ben Pfaff [Tue, 12 Apr 2011 13:48:08 +0000 (06:48 -0700)]
value-parser: Make parse_value() accept variable's print format also.
Previously commands such as VALUE LABELS required numeric values to
be given as plain numbers, but this makes it difficult to add
meaningful value labels for variables with date and time formats.
This commit allows values for this command and a few others to be
given in a variable's print format instead.
Bug #18497.
Ben Pfaff [Tue, 12 Apr 2011 13:45:53 +0000 (06:45 -0700)]
value-labels: Fix comment.
Ben Pfaff [Tue, 12 Apr 2011 13:45:33 +0000 (06:45 -0700)]
DISPLAY: Display values for value labels using the variable's format.
Until now, the values in value labels have been displayed as plain
numbers, but this makes the values for variables with date and time
formats unreadable. Changing them to use the variable's own format
makes them easier to read.
Ben Pfaff [Tue, 12 Apr 2011 13:43:04 +0000 (06:43 -0700)]
tab: Make tab_value() take a variable instead of a dictionary.
It seems far more likely that callers will have the variable handy
than the dictionary. Also, when the variable is used the format can
be optional since tab_value() can get it from the variable's print
format.
Ben Pfaff [Tue, 12 Apr 2011 05:49:01 +0000 (22:49 -0700)]
FORMATS: Allow an optional slash before each set of variable names.
This increases compatibility.
Ben Pfaff [Tue, 12 Apr 2011 05:48:08 +0000 (22:48 -0700)]
FORMATS: Allow setting formats of string variables.
Thanks to John Darrington for reporting this bug.
Bug #22012.
Ben Pfaff [Sun, 10 Apr 2011 16:52:31 +0000 (09:52 -0700)]
gui: psppire-var-view: Change "<unset>" to null string.
Commit 0cb2b9c42 caused NULL variables to be displayed as "<unset>".
This commit changes that to the empty string, which doesn't require
translation and is equally clear.
Suggested by John Darrington <john@darrington.wattle.id.au>.
Ben Pfaff [Sun, 10 Apr 2011 02:11:44 +0000 (19:11 -0700)]
LIST: Fix crash when SPLIT FILE was used.
Thanks to John Darrington for reporting the problem and to Michel
Boaventura for reducing the problem to a simple test case.
Ben Pfaff [Sat, 9 Apr 2011 23:53:26 +0000 (16:53 -0700)]
gui: Fix crash in Paired T-Test dialog on selecting first variable.
When a variable is moved into the list of selected variables in the
Paired T-Test dialog, that row of the treeview has one nonnull
variable and one null variable. Calling var_get_name(NULL) causes a
segfault.
I'm not certain that this is the correct fix, but it fixes the
segfault. The missing variable is now shown as <unset> until a second
variable is moved into the treeview.
Bug #32958.
Ben Pfaff [Sat, 9 Apr 2011 23:50:44 +0000 (16:50 -0700)]
gui: Fix Glib warnings for dialogs in realize and configure callbacks.
The GtkBuilder documentation says:
Prior to 2.20, GtkBuilder was setting the "name" property of
constructed widgets to the "id" attribute. In GTK+ 2.20 or newer,
you have to use gtk_buildable_get_name() instead of
gtk_widget_get_name() to obtain the "id", or set the "name"
property in your UI definition.
This commit fixes the problem by switching from using the "name"
property to calling gtk_buildable_get_name().
Ben Pfaff [Sat, 9 Apr 2011 17:57:55 +0000 (10:57 -0700)]
gui: Link against $(LIBICONV) too.
The GUI now uses iconv, so we need to link libiconv too.
Problem reported by Harry Thijssen <harry.thijssen@gmail.com>.
Fix suggested by John Darrington <john@darrington.wattle.id.au>.
Ben Pfaff [Sat, 9 Apr 2011 17:55:53 +0000 (10:55 -0700)]
u8-istream: Add cast to iconv() to suppress warnings on some systems.
Problem reported by Harry Thijssen <harry.thijssen@gmail.com>.
Fix suggested by John Darrington <john@darrington.wattle.id.au>.
Ben Pfaff [Sat, 9 Apr 2011 16:25:59 +0000 (09:25 -0700)]
u8-istream: Include <limits.h> for definition of MB_LEN_MAX.
Problem reported by Harry Thijssen <pspp@sjpaes.nl>.
Fix suggested by John Darrington <john@darrington.wattle.id.au>.
Ben Pfaff [Sat, 9 Apr 2011 04:14:13 +0000 (21:14 -0700)]
segment: Fix uninitialized variable in segmenter_parse_comment_2__().
This caused a couple of test failures on Mac OS X.
Thanks to Jeremy Lavergne <jeremy@lavergne.gotdns.org> for reporting
the problem.
Ben Pfaff [Fri, 8 Apr 2011 04:49:13 +0000 (21:49 -0700)]
tests: Only check MODE=360 when EBCDIC is supported.
Jeremy Lavergne <jeremy@lavergne.gotdns.org> reported that these tests
fail on Mac OS X. testsuite.log showed that Mac OS X did not support
the EBCDIC-US encoding, so this OS cannot support these tests.
Ben Pfaff [Fri, 8 Apr 2011 04:37:37 +0000 (21:37 -0700)]
tests: Fix quoting in data-in tests.
Without [[ ]] around the test commands, m4 swallows the inner [] in
the sed argument, causing the substitution to be ineffective.
Reported by Jeremy Lavergne <jeremy@lavergne.gotdns.org>.
Ben Pfaff [Fri, 8 Apr 2011 04:05:31 +0000 (21:05 -0700)]
gui: widget-io: Fix cleanup code in widget_printf(), widget_scanf().
The 'arg' member of arguments and the 'dir' member of char_directives
are only allocated from malloc() if there are more than fit in the
arrays that are included inside their respective structures, so they
must only be freed when that internal structure is not used.
Also, these arrays are allocated with malloc() and so must be freed
with free(), not g_free().
Thanks to Benoit Flippen <anagogue@gmail.com> for reporting the
problem.
Ben Pfaff [Fri, 8 Apr 2011 03:53:18 +0000 (20:53 -0700)]
FREQUENCIES: Fix crash when median and histogram both requested.
Thanks to Benoit Flippen <anagogue@gmail.com> for reporting this bug.
Ben Pfaff [Fri, 25 Mar 2011 03:50:27 +0000 (20:50 -0700)]
Update version number to 0.7.7 to
Ben Pfaff [Wed, 23 Mar 2011 05:10:49 +0000 (22:10 -0700)]
cairo: Correctly render table during scrolling.
Commit 845d4b4f3f (cairo: Draw table titles in xr_rendering_draw()
too) started rendering table titles in the GUI but forgot to
compensate for this in the call to render_page_draw_region(), so
scrolling caused visible damage.
Bug #31569.
Reported-by: John Darrington <john@darrington.wattle.id.au>
Ben Pfaff [Wed, 23 Mar 2011 04:20:04 +0000 (21:20 -0700)]
i18n: Only close valid iconv converters in i18n_done().
iconv_open() returns (iconv_t) -1 to indicate an error. We shouldn't
pass this to iconv_close().
Reported-by: Jeremy Lavergne <jeremy@lavergne.gotdns.org>.
John Darrington suggested that this was probably the problem, and
Jeremy confirmed it with valgrind.
Ben Pfaff [Wed, 23 Mar 2011 04:14:15 +0000 (21:14 -0700)]
i18n: Test converting between unknown encodings too.
This would have found the bug fixed by the previous commit.
Ben Pfaff [Wed, 23 Mar 2011 03:55:55 +0000 (20:55 -0700)]
str: Make ss_alloc_substring_pool() null-terminate its output.
It's inconsistent that ss_alloc_substring() null-terminates its output
but ss_alloc_substring_pool() does not. This caught us out in
recode_substring_pool(), which used ss_alloc_substring_pool() in a
fallback case where create_iconv() failed and expected the result to
be null-terminated.
Reported-by: Jeremy Lavergne <jeremy@lavergne.gotdns.org>
Ben Pfaff [Wed, 23 Mar 2011 04:12:11 +0000 (21:12 -0700)]
tests: Wrap more binaries for "check-valgrind" target.
Ben Pfaff [Tue, 22 Mar 2011 14:58:37 +0000 (07:58 -0700)]
Smake: Add 'memrchr' Gnulib module.
Thanks to John Darrington for reporting that this was needed.
Ben Pfaff [Tue, 22 Mar 2011 04:22:54 +0000 (21:22 -0700)]
Add valgrind support to testsuite.
Ben Pfaff [Mon, 21 Mar 2011 04:03:15 +0000 (21:03 -0700)]
Fix up build following dropping ulc-width-linebreaks module.
Commit b5cebf00d5e "Smake: Remove module"
unexpectedly broke the build, because ulc-width-linebreaks had several
indirect dependencies that PSPP actually used but were not in the
list of Gnulib modules in Smake. This fixes the problem.
The oddest consequence fixed by this commit has to do with
build-aux/config.rpath. AM_GNU_GETTEXT requires this file, which is
supplied by only a few Gnulib modules: havelib, gettext, and
threadlib. ulc-width-linebreaks depended indirectly on havelib, but
PSPP did not otherwise depend on any of these modules. It seemed that
the best fix was simply to use the Gnulib gettext module, which is
what this commit does.
Ben Pfaff [Sun, 20 Mar 2011 18:39:55 +0000 (11:39 -0700)]
Change terminology from "active file" to "active dataset".
I plan to introduce the concept of multiple datasets. One of these is
active at any given time, and the others are inactive. Each one is
similar to what has been called the "active file" until now. Thus, it
is natural to rename the "active file" to the "active dataset". I
guess that this greater uniformity of terminology will cause less
user confusion.
Ben Pfaff [Sun, 20 Mar 2011 18:35:11 +0000 (11:35 -0700)]
dataset: Use similar form to dictionary code for callbacks, and document.
Ben Pfaff [Sun, 20 Mar 2011 18:25:53 +0000 (11:25 -0700)]
dataset: Rename functions with "dataset_" prefix.
Ben Pfaff [Sun, 20 Mar 2011 18:19:39 +0000 (11:19 -0700)]
Rename procedure.[ch] to dataset.[ch].
These functions deal with datasets, so it is good to name the file after
the data structure.
Ben Pfaff [Tue, 31 Aug 2010 04:45:24 +0000 (21:45 -0700)]
dataset: Remove unused types and useless struct forward declaration.
These typedefs were not used anywhere in the tree. The struct forward
declaration duplicates an identical one at the top of the header.
Ben Pfaff [Sun, 20 Mar 2011 18:09:16 +0000 (11:09 -0700)]
message: Remove reference to deleted type "struct source_stream".
Ben Pfaff [Sun, 20 Mar 2011 18:07:30 +0000 (11:07 -0700)]
NEWS: Remove Time-stamp line.
This isn't useful in conjunction with a version control system. It
just causes artificial merge conflicts.
Ben Pfaff [Sun, 20 Mar 2011 00:05:47 +0000 (17:05 -0700)]
lexer: Reimplement for better testability and internationalization.
This commit reimplements PSPP lexical analysis from the ground up.
From a PSPP user's perspective, this should make PSPP more reliable
and make it easier to work with syntax files in non-ASCII encodings.
See the changes to NEWS for more details.
From a developer's perspective, the most visible change may be that
strings within tokens are now always encoded in UTF-8, regardless of
the syntax file's encoding. Many of the changes in this commit are
due to this, especially those to functions that check for valid
identifiers: an identifier in UTF-8 is not necessarily the same length
when encoded in the dictionary's encoding, but limits on identifier
length must be enforced in the dictionary's encoding (otherwise it
might not be possible to write out a valid system file, since the
identifier might not fit in the fixed length fields in such files).
Another important change is that, whereas before some special syntax
had to be handled by the parser providing feedback to the lexer, now
increasing the sophistication of the lexer has enabled all PSPP syntax
to be analyzed into tokens. This permitted some other improvements:
- An arbitrary number of tokens of lookahead, up to the end of the
current command, is now supported using lex_next_token() and
related functions.
- Before, some command implementations had a special attribute that
meant that the top-level PSPP command parser would not consume the
final token of the command name (because that token was not
followed by tokenizable syntax). This is no longer necessary and
has been removed.
- Before, each command implementation was responsible for ensuring
that valid command syntax was not followed by trailing garbage,
often by calling lex_end_of_command() as the last step of parsing.
This is no longer necessary; the main command parser will ensure
this for itself.
Ben Pfaff [Sat, 19 Mar 2011 23:32:16 +0000 (16:32 -0700)]
scan: New library for high-level PSPP syntax lexical analysis.
This library converts a stream of segments output by the "segment"
library into PSPP tokens.
Ben Pfaff [Sat, 19 Mar 2011 23:30:55 +0000 (16:30 -0700)]
segment: New library for low-level phase of lexical syntax analysis.
This library provides for a low-level part of lexical analysis for
PSPP syntax, which I call "segmentation". Segmentation accepts a
stream of UTF-8 bytes as input. It outputs a label (a segment type)
for each byte or contiguous sequence of bytes in the input.
The following commit will implement the high-level phase of lexical
analysis, called "scanning", that converts a sequence of segments into
PSPP tokens.
Ben Pfaff [Sat, 19 Mar 2011 23:34:53 +0000 (16:34 -0700)]
u8-istream: New library for reading a text file and recoding to UTF-8.
This new library will be used in an upcoming commit.
Ben Pfaff [Sun, 20 Mar 2011 16:43:42 +0000 (09:43 -0700)]
encoding-guesser: New library to guess the encoding of a text file.
This will be used by other new libraries in upcoming commits.
Ben Pfaff [Sat, 19 Mar 2011 23:20:44 +0000 (16:20 -0700)]
i18n: New functions and data structure for obtaining encoding info.
For now these functions don't do any caching, but it might make sense to
add caching later if they are called frequently.
Ben Pfaff [Sat, 19 Mar 2011 21:40:11 +0000 (14:40 -0700)]
identifier: Rename token_type_to_string() and make a new version.
Ben Pfaff [Sun, 13 Feb 2011 18:43:57 +0000 (10:43 -0800)]
i18n: New functions for truncating strings in an arbitrary encoding.
Ben Pfaff [Sun, 13 Feb 2011 00:37:10 +0000 (16:37 -0800)]
i18n: New function recode_string_len().
Ben Pfaff [Sun, 12 Dec 2010 04:58:32 +0000 (20:58 -0800)]
i18n: New function uc_name().
Ben Pfaff [Tue, 7 Dec 2010 04:50:04 +0000 (20:50 -0800)]
hash-functions: New function hash_case_bytes().
This is useful for hashing an arbitrary byte sequence case-insensitively.
Obviously most uses would be better off working with Unicode but we aren't
there yet.
Ben Pfaff [Thu, 10 Mar 2011 06:21:11 +0000 (22:21 -0800)]
str: New functions for checking for and removing string suffixes.
Ben Pfaff [Thu, 10 Mar 2011 06:10:48 +0000 (22:10 -0800)]
str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to ds_chomp_byte().
This paves the way for new functions that chomp an entire substring.