pintos-os.org Git - pspp-builds.git/log


commit | commitdiff | tree

Ben Pfaff [Wed, 13 Apr 2011 03:33:13 +0000 (20:33 -0700)]

po: Update Dutch translation.

Thanks to the Dutch translation team and the translationproject.org
coordinators.

commit | commitdiff | tree

Ben Pfaff [Tue, 12 Apr 2011 13:48:08 +0000 (06:48 -0700)]

value-parser: Make parse_value() accept variable's print format also.

Previously commands such as VALUE LABELS required numeric values to
be given as plain numbers, but this makes it difficult to add
meaningful value labels for variables with date and time formats.
This commit allows values for this command and a few others to be
given in a variable's print format instead.

Bug #18497.

commit | commitdiff | tree

Ben Pfaff [Tue, 12 Apr 2011 13:45:53 +0000 (06:45 -0700)]

value-labels: Fix comment.

commit | commitdiff | tree

Ben Pfaff [Tue, 12 Apr 2011 13:45:33 +0000 (06:45 -0700)]

DISPLAY: Display values for value labels using the variable's format.

Until now, the values in value labels have been displayed as plain
numbers, but this makes the values for variables with date and time
formats unreadable. Changing them to use the variable's own format
makes them easier to read.

commit | commitdiff | tree

Ben Pfaff [Tue, 12 Apr 2011 13:43:04 +0000 (06:43 -0700)]

tab: Make tab_value() take a variable instead of a dictionary.

It seems far more likely that callers will have the variable handy
than the dictionary. Also, when the variable is used the format can
be optional since tab_value() can get it from the variable's print
format.

commit | commitdiff | tree

Ben Pfaff [Tue, 12 Apr 2011 05:49:01 +0000 (22:49 -0700)]

FORMATS: Allow an optional slash before each set of variable names.

This increases compatibility.

commit | commitdiff | tree

Ben Pfaff [Tue, 12 Apr 2011 05:48:08 +0000 (22:48 -0700)]

FORMATS: Allow setting formats of string variables.

Thanks to John Darrington for reporting this bug.
Bug #22012.

commit | commitdiff | tree

Ben Pfaff [Sun, 10 Apr 2011 16:52:31 +0000 (09:52 -0700)]

gui: psppire-var-view: Change "<unset>" to null string.

Commit 0cb2b9c42 caused NULL variables to be displayed as "<unset>".
This commit changes that to the empty string, which doesn't require
translation and is equally clear.

Suggested by John Darrington <john@darrington.wattle.id.au>.

commit | commitdiff | tree

Ben Pfaff [Sun, 10 Apr 2011 02:11:44 +0000 (19:11 -0700)]

LIST: Fix crash when SPLIT FILE was used.

Thanks to John Darrington for reporting the problem and to Michel
Boaventura for reducing the problem to a simple test case.

commit | commitdiff | tree

Ben Pfaff [Sat, 9 Apr 2011 23:53:26 +0000 (16:53 -0700)]

gui: Fix crash in Paired T-Test dialog on selecting first variable.

When a variable is moved into the list of selected variables in the
Paired T-Test dialog, that row of the treeview has one nonnull
variable and one null variable. Calling var_get_name(NULL) causes a
segfault.

I'm not certain that this is the correct fix, but it fixes the
segfault. The missing variable is now shown as <unset> until a second
variable is moved into the treeview.

Bug #32958.

commit | commitdiff | tree

Ben Pfaff [Sat, 9 Apr 2011 23:50:44 +0000 (16:50 -0700)]

gui: Fix Glib warnings for dialogs in realize and configure callbacks.

The GtkBuilder documentation says:

    Prior to 2.20, GtkBuilder was setting the "name" property of
    constructed widgets to the "id" attribute. In GTK+ 2.20 or newer,
    you have to use gtk_buildable_get_name() instead of
    gtk_widget_get_name() to obtain the "id", or set the "name"
    property in your UI definition.

This commit fixes the problem by switching from using the "name"
property to calling gtk_buildable_get_name().

commit | commitdiff | tree

Ben Pfaff [Sat, 9 Apr 2011 17:57:55 +0000 (10:57 -0700)]

gui: Link against $(LIBICONV) too.

The GUI now uses iconv, so we need to link libiconv too.

Problem reported by Harry Thijssen <harry.thijssen@gmail.com>.
Fix suggested by John Darrington <john@darrington.wattle.id.au>.

commit | commitdiff | tree

Ben Pfaff [Sat, 9 Apr 2011 17:55:53 +0000 (10:55 -0700)]

u8-istream: Add cast to iconv() to suppress warnings on some systems.

Problem reported by Harry Thijssen <harry.thijssen@gmail.com>.
Fix suggested by John Darrington <john@darrington.wattle.id.au>.
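
The kind of cast involved can be sketched as follows. This is an illustration, not the code from the commit: recode_to_buffer() is a hypothetical helper, and the fallback definition of ICONV_CONST is an assumption so that the sketch compiles on its own (in an Autoconf build the AM_ICONV macro normally defines it to "const" or to nothing, matching the system's iconv() prototype).

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* AM_ICONV normally defines ICONV_CONST in config.h; this fallback is an
   assumption so that the sketch compiles standalone. */
#ifndef ICONV_CONST
#define ICONV_CONST
#endif

/* Hypothetical helper: recodes the null-terminated string IN from encoding
   FROM to encoding TO, storing at most OUT_SIZE - 1 bytes plus a null
   terminator in OUT.  Returns 0 on success, -1 on failure. */
int
recode_to_buffer (const char *from, const char *to,
                  const char *in, char *out, size_t out_size)
{
  if (out_size == 0)
    return -1;

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  const char *inp = in;
  size_t in_left = strlen (in);
  char *outp = out;
  size_t out_left = out_size - 1;

  /* Some systems declare iconv()'s second parameter as `char **', others
     as `const char **'.  The cast reconciles our pointer with either. */
  size_t rc = iconv (cd, (ICONV_CONST char **) &inp, &in_left,
                     &outp, &out_left);
  iconv_close (cd);
  if (rc == (size_t) -1)
    return -1;

  *outp = '\0';
  return 0;
}
```

A caller would pass a buffer large enough for the recoded text, for example a 16-byte buffer for a three-byte ASCII string.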

commit | commitdiff | tree

Ben Pfaff [Sat, 9 Apr 2011 16:25:59 +0000 (09:25 -0700)]

u8-istream: Include <limits.h> for definition of MB_LEN_MAX.

Problem reported by Harry Thijssen <pspp@sjpaes.nl>.
Fix suggested by John Darrington <john@darrington.wattle.id.au>.

commit | commitdiff | tree

Ben Pfaff [Sat, 9 Apr 2011 04:14:13 +0000 (21:14 -0700)]

segment: Fix uninitialized variable in segmenter_parse_comment_2__().

This caused a couple of test failures on Mac OS X.

Thanks to Jeremy Lavergne <jeremy@lavergne.gotdns.org> for reporting
the problem.

commit | commitdiff | tree

Ben Pfaff [Fri, 8 Apr 2011 04:49:13 +0000 (21:49 -0700)]

tests: Only check MODE=360 when EBCDIC is supported.

Jeremy Lavergne <jeremy@lavergne.gotdns.org> reported that these tests
fail on Mac OS X. testsuite.log showed that Mac OS X did not support
the EBCDIC-US encoding, so this OS cannot support these tests.

commit | commitdiff | tree

Ben Pfaff [Fri, 8 Apr 2011 04:37:37 +0000 (21:37 -0700)]

tests: Fix quoting in data-in tests.

Without [[ ]] around the test commands, m4 swallows the inner [] in
the sed argument, causing the substitution to be ineffective.

Reported by Jeremy Lavergne <jeremy@lavergne.gotdns.org>.

commit | commitdiff | tree

Ben Pfaff [Fri, 8 Apr 2011 04:05:31 +0000 (21:05 -0700)]

gui: widget-io: Fix cleanup code in widget_printf(), widget_scanf().

The 'arg' member of arguments and the 'dir' member of char_directives
are only allocated from malloc() if there are more than fit in the
arrays that are included inside their respective structures, so they
must only be freed when that internal structure is not used.

Also, these arrays are allocated with malloc() and so must be freed
with free(), not g_free().

Thanks to Benoit Flippen <anagogue@gmail.com> for reporting the
problem.
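
The ownership rule described above can be sketched with a simplified structure. The names here (struct arg_list, N_DIRECT, arg_list_init(), arg_list_destroy()) are hypothetical stand-ins, not the actual widget-io or printf-parse types; the return value of arg_list_destroy() exists only to make the behavior observable.

```c
#include <stddef.h>
#include <stdlib.h>

#define N_DIRECT 4

/* Hypothetical structure: short argument lists live in DIRECT, longer ones
   in a block obtained from malloc(). */
struct arg_list
  {
    size_t count;
    int *arg;                   /* Points to DIRECT or to malloc()'d memory. */
    int direct[N_DIRECT];
  };

/* Initializes LIST with room for COUNT arguments.
   Returns 0 on success, -1 if allocation fails. */
int
arg_list_init (struct arg_list *list, size_t count)
{
  list->count = count;
  if (count <= N_DIRECT)
    list->arg = list->direct;
  else
    {
      list->arg = malloc (count * sizeof *list->arg);
      if (list->arg == NULL)
        return -1;
    }
  return 0;
}

/* Releases LIST's storage.  Returns 1 if heap memory was freed, 0 if the
   internal array was in use and nothing needed freeing. */
int
arg_list_destroy (struct arg_list *list)
{
  int on_heap = list->arg != list->direct;
  if (on_heap)
    free (list->arg);           /* malloc() pairs with free(), not g_free(). */
  list->arg = NULL;
  return on_heap;
}
```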

commit | commitdiff | tree

Ben Pfaff [Fri, 8 Apr 2011 03:53:18 +0000 (20:53 -0700)]

FREQUENCIES: Fix crash when median and histogram both requested.

Thanks to Benoit Flippen <anagogue@gmail.com> for reporting this bug.

commit | commitdiff | tree

Ben Pfaff [Fri, 25 Mar 2011 03:50:27 +0000 (20:50 -0700)]

Update version number to 0.7.7 to

commit | commitdiff | tree

Ben Pfaff [Wed, 23 Mar 2011 05:10:49 +0000 (22:10 -0700)]

cairo: Correctly render table during scrolling.

Commit 845d4b4f3f (cairo: Draw table titles in xr_rendering_draw()
too) started rendering table titles in the GUI but forgot to
compensate for this in the call to render_page_draw_region(), so
scrolling caused visible damage.

Bug #31569.
Reported-by: John Darrington <john@darrington.wattle.id.au>

commit | commitdiff | tree

Ben Pfaff [Wed, 23 Mar 2011 04:20:04 +0000 (21:20 -0700)]

i18n: Only close valid iconv converters in i18n_done().

iconv_open() returns (iconv_t) -1 to indicate an error. We shouldn't
pass this to iconv_close().

Reported-by: Jeremy Lavergne <jeremy@lavergne.gotdns.org>.
John Darrington suggested that this was probably the problem, and
Jeremy confirmed it with valgrind.
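
A minimal sketch of the guard, assuming a hypothetical helper named close_converter_if_valid() (the actual i18n_done() code may be organized differently):

```c
#include <iconv.h>
#include <stdbool.h>

/* Closes CD only if iconv_open() actually succeeded.  Returns true if a
   converter was closed, false if CD was the (iconv_t) -1 error value. */
bool
close_converter_if_valid (iconv_t cd)
{
  if (cd == (iconv_t) -1)
    return false;
  iconv_close (cd);
  return true;
}
```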

commit | commitdiff | tree

Ben Pfaff [Wed, 23 Mar 2011 04:14:15 +0000 (21:14 -0700)]

i18n: Test converting between unknown encodings too.

This would have found the bug fixed by the previous commit.

commit | commitdiff | tree

Ben Pfaff [Wed, 23 Mar 2011 03:55:55 +0000 (20:55 -0700)]

str: Make ss_alloc_substring_pool() null-terminate its output.

It's inconsistent that ss_alloc_substring() null-terminates its output
but ss_alloc_substring_pool() does not. This caught us out in
recode_substring_pool(), which used ss_alloc_substring_pool() in a
fallback case where create_iconv() failed and expected the result to
be null-terminated.

Reported-by: Jeremy Lavergne <jeremy@lavergne.gotdns.org>
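
The idea of always reserving room for a terminator can be sketched as below. struct counted_string and counted_string_alloc() are hypothetical; PSPP's struct substring and its pool allocator are not reproduced here, and plain malloc() stands in for the pool.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type, loosely modeled on a substring that
   carries an explicit length. */
struct counted_string
  {
    char *string;
    size_t length;
  };

/* Copies the LENGTH bytes at DATA into newly allocated storage in DST.
   One extra byte is allocated and set to '\0', so callers that expect a
   C string are safe even though LENGTH does not count the terminator.
   Returns 0 on success, -1 if allocation fails. */
int
counted_string_alloc (struct counted_string *dst,
                      const char *data, size_t length)
{
  dst->string = malloc (length + 1);
  if (dst->string == NULL)
    {
      dst->length = 0;
      return -1;
    }
  memcpy (dst->string, data, length);
  dst->string[length] = '\0';
  dst->length = length;
  return 0;
}
```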

commit | commitdiff | tree

Ben Pfaff [Wed, 23 Mar 2011 04:12:11 +0000 (21:12 -0700)]

tests: Wrap more binaries for "check-valgrind" target.

commit | commitdiff | tree

Ben Pfaff [Tue, 22 Mar 2011 14:58:37 +0000 (07:58 -0700)]

Smake: Add 'memrchr' Gnulib module.

Thanks to John Darrington for reporting that this was needed.

commit | commitdiff | tree

Ben Pfaff [Tue, 22 Mar 2011 04:22:54 +0000 (21:22 -0700)]

Add valgrind support to testsuite.

commit | commitdiff | tree

Ben Pfaff [Mon, 21 Mar 2011 04:03:15 +0000 (21:03 -0700)]

Fix up build following dropping ulc-width-linebreaks module.

Commit b5cebf00d5e "Smake: Remove ulc-width-linebreaks module"
unexpectedly broke the build, because ulc-width-linebreaks had several
indirect dependencies that PSPP actually used but were not in the
list of Gnulib modules in Smake.  This fixes the problem.

The oddest consequence fixed by this commit has to do with
build-aux/config.rpath.  AM_GNU_GETTEXT requires this file, which is
supplied by only a few Gnulib modules: havelib, gettext, and
threadlib.  ulc-width-linebreaks depended indirectly on havelib, but
PSPP did not otherwise depend on any of these modules.  It seemed that
the best fix was simply to use the Gnulib gettext module, which is
what this commit does.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 18:39:55 +0000 (11:39 -0700)]

Change terminology from "active file" to "active dataset".

I plan to introduce the concept of multiple datasets.  One of these is
active at any given time, and the others are inactive.  Each one is
similar to what has been called the "active file" until now.  Thus, it
is natural to rename the "active file" to the "active dataset".  I
guess that this greater uniformity of terminology will cause less
user confusion.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 18:35:11 +0000 (11:35 -0700)]

dataset: Use similar form to dictionary code for callbacks, and document.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 18:25:53 +0000 (11:25 -0700)]

dataset: Rename functions with "dataset_" prefix.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 18:19:39 +0000 (11:19 -0700)]

Rename procedure.[ch] to dataset.[ch].

These functions deal with datasets, so it is good to name the file after
the data structure.

commit | commitdiff | tree

Ben Pfaff [Tue, 31 Aug 2010 04:45:24 +0000 (21:45 -0700)]

dataset: Remove unused types and useless struct forward declaration.

These typedefs were not used anywhere in the tree. The struct forward
declaration duplicates an identical one at the top of the header.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 18:09:16 +0000 (11:09 -0700)]

message: Remove reference to deleted type "struct source_stream".

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 18:07:30 +0000 (11:07 -0700)]

NEWS: Remove Time-stamp line.

This isn't useful in conjunction with a version control system. It
just causes artificial merge conflicts.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 00:05:47 +0000 (17:05 -0700)]

lexer: Reimplement for better testability and internationalization.

This commit reimplements PSPP lexical analysis from the ground up.
From a PSPP user's perspective, this should make PSPP more reliable
and make it easier to work with syntax files in non-ASCII encodings.
See the changes to NEWS for more details.

From a developer's perspective, the most visible change may be that
strings within tokens are now always encoded in UTF-8, regardless of
the syntax file's encoding.  Many of the changes in this commit are
due to this, especially those to functions that check for valid
identifiers: an identifier in UTF-8 is not necessarily the same length
when encoded in the dictionary's encoding, but limits on identifier
length must be enforced in the dictionary's encoding (otherwise it
might not be possible to write out a valid system file, since the
identifier might not fit in the fixed length fields in such files).

Another important change is that, whereas before some special syntax
had to be handled by the parser providing feedback to the lexer, now
increasing the sophistication of the lexer has enabled all PSPP syntax
to be analyzed into tokens.  This permitted some other improvements:

  - An arbitrary number of tokens of lookahead, up to the end of the
    current command, is now supported using lex_next_token() and
    related functions.

  - Before, some command implementations had a special attribute that
    meant that the top-level PSPP command parser would not consume the
    final token of the command name (because that token was not
    followed by tokenizable syntax).  This is no longer necessary and
    has been removed.

  - Before, each command implementation was responsible for ensuring
    that valid command syntax was not followed by trailing garbage,
    often by calling lex_end_of_command() as the last step of parsing.
    This is no longer necessary; the main command parser will ensure
    this for itself.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Mar 2011 23:32:16 +0000 (16:32 -0700)]

scan: New library for high-level PSPP syntax lexical analysis.

This library converts a stream of segments output by the "segment"
library into PSPP tokens.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Mar 2011 23:30:55 +0000 (16:30 -0700)]

segment: New library for low-level phase of lexical syntax analysis.

This library provides for a low-level part of lexical analysis for
PSPP syntax, which I call "segmentation". Segmentation accepts a
stream of UTF-8 bytes as input. It outputs a label (a segment type)
for each byte or contiguous sequence of bytes in the input.

The following commit will implement the high-level phase of lexical
analysis, called "scanning", that converts a sequence of segments into
PSPP tokens.
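
To make the notion of segmentation concrete, here is a toy segmenter. It is only an illustration of "label each contiguous run of bytes": the segment types and the toy_segment() function are hypothetical, and the real library distinguishes many more cases (commands, comments, strings, and so on).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical segment labels for the toy example. */
enum seg_type
  {
    SEG_SPACES,
    SEG_IDENTIFIER,
    SEG_NUMBER,
    SEG_PUNCT,
    SEG_END
  };

/* Returns nonzero if C may appear inside a toy identifier.  Bytes with the
   high bit set are treated as parts of UTF-8 encoded letters. */
static int
is_id_byte (unsigned char c)
{
  return c >= 0x80 || isalnum (c) || c == '_';
}

/* Examines the N bytes at INPUT, stores the label of the first segment in
   *TYPE, and returns that segment's length in bytes (0 at end of input). */
size_t
toy_segment (const char *input, size_t n, enum seg_type *type)
{
  if (n == 0)
    {
      *type = SEG_END;
      return 0;
    }

  unsigned char c = (unsigned char) input[0];
  size_t len = 1;
  if (c >= 0x80 || isalpha (c))
    {
      *type = SEG_IDENTIFIER;
      while (len < n && is_id_byte ((unsigned char) input[len]))
        len++;
    }
  else if (isdigit (c))
    {
      *type = SEG_NUMBER;
      while (len < n && isdigit ((unsigned char) input[len]))
        len++;
    }
  else if (isspace (c))
    {
      *type = SEG_SPACES;
      while (len < n && isspace ((unsigned char) input[len]))
        len++;
    }
  else
    *type = SEG_PUNCT;          /* Any other byte is a one-byte segment. */
  return len;
}
```

Calling toy_segment() repeatedly, advancing by the returned length each time, labels every byte of the input exactly once.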

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Mar 2011 23:34:53 +0000 (16:34 -0700)]

u8-istream: New library for reading a text file and recoding to UTF-8.

This new library will be used in an upcoming commit.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Mar 2011 16:43:42 +0000 (09:43 -0700)]

encoding-guesser: New library to guess the encoding of a text file.

This will be used by other new libraries in upcoming commits.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Mar 2011 23:20:44 +0000 (16:20 -0700)]

i18n: New functions and data structure for obtaining encoding info.

For now these functions don't do any caching, but it might make sense to
add caching later if they are called frequently.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Mar 2011 21:40:11 +0000 (14:40 -0700)]

identifier: Rename token_type_to_string() and make a new version.

commit | commitdiff | tree

Ben Pfaff [Sun, 13 Feb 2011 18:43:57 +0000 (10:43 -0800)]

i18n: New functions for truncating strings in an arbitrary encoding.

commit | commitdiff | tree

Ben Pfaff [Sun, 13 Feb 2011 00:37:10 +0000 (16:37 -0800)]

i18n: New function recode_string_len().

commit | commitdiff | tree

Ben Pfaff [Sun, 12 Dec 2010 04:58:32 +0000 (20:58 -0800)]

i18n: New function uc_name().

commit | commitdiff | tree

Ben Pfaff [Tue, 7 Dec 2010 04:50:04 +0000 (20:50 -0800)]

hash-functions: New function hash_case_bytes().

This is useful for hashing an arbitrary byte sequence case-insensitively.
Obviously most uses would be better off working with Unicode but we aren't
there yet.

commit | commitdiff | tree

Ben Pfaff [Thu, 10 Mar 2011 06:21:11 +0000 (22:21 -0800)]

str: New functions for checking for and removing string suffixes.

commit | commitdiff | tree

Ben Pfaff [Thu, 10 Mar 2011 06:10:48 +0000 (22:10 -0800)]

str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to ds_chomp_byte().

This paves the way for new functions that chomp an entire substring.
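
The distinction between chomping a single byte and chomping a whole suffix can be sketched on plain C strings. chomp_byte() and chomp_suffix() below are hypothetical illustrations; the PSPP functions operate on its own string types and are not reproduced here.

```c
#include <stdbool.h>
#include <string.h>

/* If S ends with byte C, removes it and returns true; otherwise leaves S
   unchanged and returns false. */
bool
chomp_byte (char *s, char c)
{
  size_t len = strlen (s);
  if (len > 0 && s[len - 1] == c)
    {
      s[len - 1] = '\0';
      return true;
    }
  return false;
}

/* If S ends with the string SUFFIX, removes it and returns true; otherwise
   leaves S unchanged and returns false. */
bool
chomp_suffix (char *s, const char *suffix)
{
  size_t len = strlen (s);
  size_t suffix_len = strlen (suffix);
  if (suffix_len <= len
      && memcmp (s + (len - suffix_len), suffix, suffix_len) == 0)
    {
      s[len - suffix_len] = '\0';
      return true;
    }
  return false;
}
```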

commit | commitdiff | tree

Ben Pfaff [Tue, 7 Dec 2010 04:46:56 +0000 (20:46 -0800)]

str: New function ss_realloc().

commit | commitdiff | tree

Ben Pfaff [Tue, 7 Dec 2010 04:54:40 +0000 (20:54 -0800)]

output: New function text_item_create_nocopy().

commit | commitdiff | tree

Ben Pfaff [Sun, 6 Feb 2011 05:10:10 +0000 (21:10 -0800)]

sys-file-reader: Refactor to clean up character encoding support.

The system file format is unusual in that it does not record the encoding
used by character strings at the beginning or at any fixed place in the
file.  Instead, it can be recorded practically anywhere in the file.  It
never precedes all of the actual character strings in the file, which makes
it impossible to interpret those strings completely and correctly until it
is encountered.

Until now, the system file reader has dealt with this situation by
stuffing uninterpreted character strings into data structures until the
encoding is known, then at that point fetching out the character strings,
reencoding them, and stuffing them back into the data structures.  This
does work, but it has the disadvantage that all of the PSPP data
structures have to tolerate character strings with unknown encoding.  In
some cases this seems like an ugly situation.  For example, arbitrary
variable names have to be supported, even though the syntax for variable
names is circumscribed by the language, because the syntax rules for
variable names cannot be completely and correctly applied to a string that
is in an unknown encoding.

This commit fixes that problem by adopting a new way to read system files.
Each record in the system file dictionary is essentially slurped into
memory as a chunk, then the character encoding is extracted from it, then
the rest of the dictionary is interpreted based on that encoding.  The
actual implementation is a little more intricate because the format of
system file records is somewhat non-uniform.
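
The two-pass idea can be sketched as follows. Everything here is hypothetical (the record types, struct raw_record, and find_encoding() are invented for illustration and do not reflect the real system file layout): records are first held in memory uninterpreted, the encoding is located wherever it happens to appear, and only then would the remaining records be interpreted.

```c
#include <stddef.h>

/* Hypothetical record kinds; the real format has many more. */
enum record_type
  {
    REC_VARIABLE,
    REC_ENCODING,
    REC_OTHER
  };

/* A record slurped into memory without interpreting its strings. */
struct raw_record
  {
    enum record_type type;
    const char *bytes;
  };

/* Pass 1: scans all N records for the one naming the character encoding.
   Returns its contents, or FALLBACK if the file does not record one.
   Pass 2 (not shown) would then decode the strings in the other records
   using the encoding returned here. */
const char *
find_encoding (const struct raw_record *records, size_t n,
               const char *fallback)
{
  for (size_t i = 0; i < n; i++)
    if (records[i].type == REC_ENCODING)
      return records[i].bytes;
  return fallback;
}
```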

commit | commitdiff | tree

Ben Pfaff [Thu, 17 Mar 2011 04:33:54 +0000 (21:33 -0700)]

file-name: Do not make output files line-buffered in fn_open().

I don't see any reason to do this. I can't see anything in the commit
log for this file or in OChangeLog that explains why it was done.

commit | commitdiff | tree

Ben Pfaff [Tue, 15 Mar 2011 01:19:23 +0000 (18:19 -0700)]

data-reader: Remove unreachable "return" statements.

commit | commitdiff | tree

Ben Pfaff [Sun, 5 Sep 2010 03:56:41 +0000 (20:56 -0700)]

file-handle-def: Use hmap instead of list for name table.

It makes much more sense to keep an index of names using a hash table
than using a linked list.
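
For comparison with a linked list, a name index built on a hash table might look like the sketch below. This is not PSPP's hmap (which is an intrusive hash table with its own API); struct name_table and its functions are hypothetical, and the table stores the caller's name pointer without copying it.

```c
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 64

/* Hypothetical chained hash table mapping names to opaque handles. */
struct name_node
  {
    struct name_node *next;
    const char *name;           /* Not copied; caller keeps it alive. */
    void *handle;
  };

struct name_table
  {
    struct name_node *buckets[N_BUCKETS];
  };

/* FNV-1a style hash over the bytes of NAME. */
static unsigned int
hash_name (const char *name)
{
  unsigned int h = 2166136261u;
  for (; *name != '\0'; name++)
    {
      h ^= (unsigned char) *name;
      h *= 16777619u;
    }
  return h;
}

/* Adds NAME -> HANDLE to TABLE.  Returns 0 on success, -1 on failure. */
int
name_table_insert (struct name_table *table, const char *name, void *handle)
{
  struct name_node *node = malloc (sizeof *node);
  if (node == NULL)
    return -1;
  struct name_node **bucket = &table->buckets[hash_name (name) % N_BUCKETS];
  node->name = name;
  node->handle = handle;
  node->next = *bucket;
  *bucket = node;
  return 0;
}

/* Returns the handle stored under NAME, or NULL if there is none.  Only
   one bucket is searched, instead of the whole list of names. */
void *
name_table_find (const struct name_table *table, const char *name)
{
  const struct name_node *node
    = table->buckets[hash_name (name) % N_BUCKETS];
  for (; node != NULL; node = node->next)
    if (strcmp (node->name, name) == 0)
      return node->handle;
  return NULL;
}

/* Frees every node in TABLE, leaving it empty. */
void
name_table_clear (struct name_table *table)
{
  for (size_t i = 0; i < N_BUCKETS; i++)
    {
      struct name_node *node = table->buckets[i];
      while (node != NULL)
        {
          struct name_node *next = node->next;
          free (node);
          node = next;
        }
      table->buckets[i] = NULL;
    }
}
```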

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Mar 2011 20:48:24 +0000 (13:48 -0700)]

Update all #include directives to the currently preferred style.

I left src/ui/gui alone for now.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Mar 2011 20:57:45 +0000 (13:57 -0700)]

Smake: Remove ulc-width-linebreaks module.

This function has not been used in PSPP for a long time now.

commit | commitdiff | tree

Ben Pfaff [Sat, 12 Mar 2011 06:10:54 +0000 (22:10 -0800)]

FREQUENCIES: Fix percentiles calculation.

The condition for using a variate directly instead of interpolating was
just wrong. It would interpolate in cases where it clearly should not,
which produced incorrect results in many cases.

Thanks to Fabio Bordignon <bordignon@demos.it> for reporting the problem
and supplying a simple test case.

commit | commitdiff | tree

Ben Pfaff [Fri, 11 Mar 2011 06:53:17 +0000 (22:53 -0800)]

T-TEST: Fix use-after-free with TEMPORARY and independent samples.

When TEMPORARY is in effect, proc_commit() destroys the temporary
dictionary.  This means that any procedure that does not somehow disable
temporary transformations and refers to a variable following proc_commit()
has a use-after-free error.

T-TEST has two different bugs of this type.  First, the loop that destroys
group statistics refers to destroyed variables.  This commit fixes this
problem by instead using variable aux data destructors to destroy group
statistics.

Second, when there is an independent variable, destroying its values
requires knowing the variable's width.  This commit fixes this problem by
destroying the values before calling proc_commit().

The AUTORECODE, DESCRIPTIVES, RANK, and REGRESSION procedures appear to
have similar issues (not fixed by this commit).

Reported by Jeremy Lavergne <jeremy@lavergne.gotdns.org>.

commit | commitdiff | tree

Ben Pfaff [Fri, 11 Mar 2011 06:46:04 +0000 (22:46 -0800)]

group: Delete unused functions.

commit | commitdiff | tree

Ben Pfaff [Fri, 11 Mar 2011 06:44:50 +0000 (22:44 -0800)]

DELETE VARIABLES: Style fix.

commit | commitdiff | tree

Ben Pfaff [Fri, 25 Feb 2011 05:31:44 +0000 (21:31 -0800)]

README.Git: Update to newer commit.

This is the Gnulib commit I'm testing against locally. It should fix the
problem that showed up in the nightly build, in which the dtoastr module
was missing.

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Feb 2011 01:30:39 +0000 (17:30 -0800)]

data-out: Add test for non-ASCII custom currency formats.

These now work as I would expect, so add a test to avoid future regression.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Feb 2011 20:55:54 +0000 (12:55 -0800)]

data-out: Optimize and fix some bad assumptions.

Until now, data_out_pool() and its wrapper function data_out() have always
done at least two memory allocations: one to fill in the initial version
of the result and another to recode it to UTF-8.  However, recoding to
UTF-8 is usually unnecessary, because most output formats always produce
output in UTF-8 anyway.  Only binary formats and the string A format ever
produce data in other encodings, so this commit drops recoding entirely
except for those cases.  Binary formats are a particularly special case:
usually it doesn't make any sense to use these formats for text output,
but this commit does its best to translate the binary output bytes into
valid UTF-8, at least up to the first null byte.

This commit also finishes fixing up display widths.

The closely related data_out_legacy() function, which only has one user
in the tree, also needed some work.  It was badly named, so I renamed it to
data_out_recode().  It made the bad assumption that the data passed in
was encoded in ASCII (written C_ENCODING).  It also made the bad
assumption that the number of bytes output would be exactly the format's
width.  This rewrite fixes these problems.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Feb 2011 05:58:08 +0000 (21:58 -0800)]

pool: Support NULL pool argument to pool_alloc_unaligned().

I don't see a reason that this should be unsupported.

commit | commitdiff | tree

Ben Pfaff [Thu, 17 Feb 2011 05:42:13 +0000 (21:42 -0800)]

data-out: Reorganize output_Z() to be more easily understood.

It took me a minute to figure out what was going on here, so this commit
slightly reorganizes it.

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Feb 2011 05:55:18 +0000 (21:55 -0800)]

format: Count prefix and suffix width in terms of display columns.

Until now, the prefixes and suffixes for custom currency formats
(CCA, etc.) have been considered to occupy one display column per
byte. This is fine for prefixes and suffixes like "$" or "%", but
falls down badly with U+00A5 (¥) or U+20AC (€), which occupy two
or three bytes, respectively, in UTF-8, while occupying only a
single display column.

This commit fixes the problem. It doesn't add a test yet because
there are still some higher-level issues, but that will come in
a later commit when those remaining issues are resolved.
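
The byte-versus-column mismatch is easy to demonstrate. The sketch below counts UTF-8 code points, which matches the display width for characters such as ¥ and €; this is a simplification, since wide and combining characters need a real width function, and utf8_count_code_points() is a hypothetical name rather than a PSPP function.

```c
#include <stddef.h>
#include <string.h>

/* Counts the code points in the UTF-8 string S by skipping continuation
   bytes, which always have the bit pattern 10xxxxxx. */
size_t
utf8_count_code_points (const char *s)
{
  size_t n = 0;
  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}
```

For "\xC2\xA5" (U+00A5) strlen() reports 2 bytes and for "\xE2\x82\xAC" (U+20AC) it reports 3, while the count of code points is 1 in both cases.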

commit | commitdiff | tree

Ben Pfaff [Wed, 16 Feb 2011 06:00:33 +0000 (22:00 -0800)]

format: Create a new "struct fmt_affix" for prefix and suffix strings.

This combines two changes: changing the string type for numeric
prefix and suffix strings from "struct substring" to plain "char *",
and putting the string inside a new structure. Both of these will
make more sense after the following commit, which adds another member
to the new structure and stops using the length of the string in so
many places (which is the reason that "struct substring" was a good
choice).

commit | commitdiff | tree

Ben Pfaff [Sat, 19 Feb 2011 06:30:00 +0000 (22:30 -0800)]

data-out: Make each converter responsible for storing null terminator.

Until now, every converter has produced output that is exactly as many
bytes long as the format's width. In upcoming patches this will change,
because in UTF-8 a character that occupies a single display column can
require multiple bytes. In preparation, this commit requires the
individual converters to write out their own null byte, giving a bit
more flexibility regarding length.

commit | commitdiff | tree

Ben Pfaff [Tue, 15 Feb 2011 07:17:11 +0000 (23:17 -0800)]

format: Increase abstraction of fmt_number_style.

Upcoming commits will make some changes to fmt_number_style, so it
seems best to avoid having clients actually construct and modify
instances of this structure. This commit makes that change.

We could take it one step further and add accessor functions even
for reading out the structure, but in my opinion that would be
overkill for this structure.

commit | commitdiff | tree

Ben Pfaff [Sun, 13 Feb 2011 19:23:06 +0000 (11:23 -0800)]

legacy-encoding: Remove.

The functions in this module are no longer used.

commit | commitdiff | tree

Ben Pfaff [Sun, 13 Feb 2011 19:49:30 +0000 (11:49 -0800)]

i18n: Introduce C_ENCODING as replacement for LEGACY_NATIVE.

The LEGACY_NATIVE name seems a bit awkward for something that is just the
name of the encoding for strings in C source code, that is, the C locale,
so this commit renames it to C_ENCODING and moves it to i18n.h with the
rest of the encoding-related functions.

In addition, PSPP assumes in various places that the local system has
ASCII-based locales. I don't think there's much point in pretending to
support EBCDIC, so this commit removes that little bit of support.

commit | commitdiff | tree

Ben Pfaff [Sun, 13 Feb 2011 19:36:27 +0000 (11:36 -0800)]

i18n: New function recode_byte().

commit | commitdiff | tree

Ben Pfaff [Sun, 20 Feb 2011 00:55:58 +0000 (16:55 -0800)]

PRINT: Use UTF-8 encoding for output to the output subsystem.

All string data coming into the output subsystem must be encoded in UTF-8,
but PRINT was recoding it into ASCII instead.

commit | commitdiff | tree

Ben Pfaff [Tue, 15 Feb 2011 06:04:51 +0000 (22:04 -0800)]

CROSSTABS: Eliminate redundant data copying.

There's no point in copying the output string twice.

commit | commitdiff | tree

Ben Pfaff [Mon, 14 Feb 2011 06:20:45 +0000 (22:20 -0800)]

Use new Gnulib function dtoastr() to format short, accurate real numbers.

%.*g with DBL_DIG + 1 as argument is simple but in rare cases it fails to
accurately format a real number. The recently added Gnulib routine
dtoastr() always formats a real number accurately, so switch to using it
for these cases.
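
The underlying idea, using the fewest digits that still read back as the same value, can be sketched without Gnulib. This is not dtoastr()'s implementation; format_round_trip() is a hypothetical helper that assumes double is IEEE 754 binary64, for which DBL_DIG + 2 (17) significant digits always suffice for finite values.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats X into BUF (SIZE bytes) using the fewest significant digits that
   strtod() converts back to exactly X.  Returns the length of the result,
   or -1 if BUF is too small or no precision round-trips (as for NaN). */
int
format_round_trip (double x, char *buf, size_t size)
{
  for (int prec = 1; prec <= DBL_DIG + 2; prec++)
    {
      int n = snprintf (buf, size, "%.*g", prec, x);
      if (n < 0 || (size_t) n >= size)
        return -1;
      if (strtod (buf, NULL) == x)
        return n;
    }
  return -1;
}
```

Short values such as 0.1 come out as "0.1" rather than a long string of digits, while values that need all 17 digits still get them.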

commit | commitdiff | tree

Ben Pfaff [Sun, 13 Feb 2011 19:20:35 +0000 (11:20 -0800)]

operations.def: Fix indentation.

commit | commitdiff | tree

Ben Pfaff [Sat, 12 Feb 2011 16:02:48 +0000 (08:02 -0800)]

PRINT SPACE: When an output file is specified, don't ignore expression.

When both OUTFILE= and an expression were specified on PRINT SPACE, the
expression was ignored (if it was only a single token) or an error would
occur (if it was more than one token).

commit | commitdiff | tree

Ben Pfaff [Sun, 6 Feb 2011 21:39:38 +0000 (13:39 -0800)]

GET DATA: Get rid of lex_put_back().

An upcoming commit will get rid of lex_put_back(), so don't use it here.

commit | commitdiff | tree

Ben Pfaff [Sun, 12 Dec 2010 22:00:28 +0000 (14:00 -0800)]

float-format: Eliminate tests' dependence on exact string encoding.

Until now, the float-format tests have depended on the PSPP syntax
accepting arbitrary byte values in strings, without treating them as part
of any particular encoding. The lexer is being rewritten so that this
assumption is no longer true, so this commit eliminates this assumption in
the float-format tests. After this commit, the tests only use ASCII
characters in strings.

commit | commitdiff | tree

Ben Pfaff [Sun, 2 Jan 2011 00:42:06 +0000 (16:42 -0800)]

por-file-reader: Remove dependency on VAR_NAME_LEN.

VAR_NAME_LEN wasn't really needed here because we knew that the name we
started from was no more than 8 bytes long. Also, we know that we can
come up with a unique name within ULONG_MAX tries since we'd run out of
memory before running out of values to try.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

commit | commitdiff | tree

Ben Pfaff [Sun, 2 Jan 2011 00:30:42 +0000 (16:30 -0800)]

text-data-import-dialog: Eliminate VAR_NAME_LEN restriction.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

commit | commitdiff | tree

Ben Pfaff [Sun, 2 Jan 2011 00:26:25 +0000 (16:26 -0800)]

REGRESSION: Eliminate restriction to VAR_NAME_LEN in reg_get_name().

There's still an obvious problem here that the prefix isn't being trimmed
down so that the suffix will fit. Since an upcoming series of changes
would have to completely redo how this would be done, I'm not fixing that
now, only marking it with XXX.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

commit | commitdiff | tree

Ben Pfaff [Sun, 2 Jan 2011 00:16:30 +0000 (16:16 -0800)]

FLIP: Eliminate false dependency on VAR_NAME_LEN.

The size of this buffer really shouldn't have anything to do with
VAR_NAME_LEN but with the string about to be put into it.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

commit | commitdiff | tree

Ben Pfaff [Sun, 2 Jan 2011 00:08:03 +0000 (16:08 -0800)]

DESCRIPTIVES: Eliminate main restriction on Z-score variable name length.

commit | commitdiff | tree

Ben Pfaff [Sat, 1 Jan 2011 23:54:54 +0000 (15:54 -0800)]

variable-parser: Drop VAR_NAME_LEN restriction from var_set_lookup_var_idx().

This restriction is purely artificial, as part of an assertion. Since
longer variable names are going to have to be supported, remove it.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

commit | commitdiff | tree

Ben Pfaff [Sat, 1 Jan 2011 21:00:49 +0000 (13:00 -0800)]

variable-parser: Rewrite parse_DATA_LIST_vars().

This rewrite was prompted by getting rid of the VAR_NAME_LEN limit inside
parse_DATA_LIST_vars(), but then I noticed that the variable naming and
coding style was dated, and that duplicate variable names were only
detected for variables named using TO, not for individual names, so I
rewrote much of the code instead.
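
The duplicate-name problem described above goes away if every name is checked the same way, whether it was typed individually or produced by expanding a TO range. The following is a rough sketch of that idea, not the code in parse_DATA_LIST_vars(). The helpers `names_equal()` and `first_duplicate()` are hypothetical, and the case-insensitive ASCII comparison is an assumption made for the example:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: compares two ASCII names, ignoring case.
   Returns nonzero if they match. */
static int
names_equal (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  while (*a != '\0' && *b != '\0')
    {
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;
}

/* Hypothetical helper: returns the index of the first name in NAMES[]
   that repeats an earlier one, or N if all N names are distinct.
   Quadratic time, which is enough for a sketch. */
static size_t
first_duplicate (const char *const names[], size_t n)
{
  for (size_t i = 1; i < n; i++)
    for (size_t j = 0; j < i; j++)
      if (names_equal (names[i], names[j]))
        return i;
  return n;
}
```

A caller would append each parsed or expanded name to one array and run `first_duplicate()` over it once, so that no path into the list skips the check.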

Ben Pfaff [Sat, 1 Jan 2011 19:15:33 +0000 (11:15 -0800)]

DEBUG EVALUATE: Eliminate VAR_NAME_LEN limit.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 19:11:07 +0000 (11:11 -0800)]

VECTOR: Eliminate VAR_NAME_LEN limit for variable names.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 19:09:01 +0000 (11:09 -0800)]

MODIFY VARIABLES: Eliminate VAR_NAME_LEN limit on variable names.

This is actually a general code improvement, since it eliminates memory
allocation and copying that was not actually necessary.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 19:05:14 +0000 (11:05 -0800)]

DATAFILE ATTRIBUTE, VARIABLE ATTRIBUTE: Eliminate VAR_NAME_LEN limit.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 18:59:03 +0000 (10:59 -0800)]

GET DATA /TYPE=TXT: Get rid of VAR_NAME_LEN limit on variable names.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 18:56:55 +0000 (10:56 -0800)]

FILE HANDLE: Get rid of VAR_NAME_LEN limit on handle name.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 18:52:14 +0000 (10:52 -0800)]

combine-files: Eliminate VAR_NAME_LEN restriction from combine_files().

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 18:47:56 +0000 (10:47 -0800)]

vector: Remove VAR_NAME_LEN limit for internal representation of name.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 18:45:11 +0000 (10:45 -0800)]

variable: Remove VAR_NAME_LEN limit for internal representation of name.

Most uses of VAR_NAME_LEN within PSPP are wrong due to encoding issues:
the limit applies to variable names in the encoding used by the data
set, but most uses of VAR_NAME_LEN actually limit the length of a name
in UTF-8. The UTF-8 representation of a name can be longer or shorter
than its representation in the data set encoding, so it seems best to
eliminate references to VAR_NAME_LEN entirely.

Ben Pfaff [Sat, 1 Jan 2011 18:39:07 +0000 (10:39 -0800)]

dict: Make dict_make_unique_var_name() return an allocated string.
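
Returning an allocated string removes the need for a fixed-size caller buffer and therefore for any length limit on the generated name. The sketch below shows the general pattern only. `make_unique_name()` is a hypothetical function written for this example and does not reproduce the signature or behavior of dict_make_unique_var_name():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()'d name of the form
   "<HINT><number>" that does not appear among the N_TAKEN strings in
   TAKEN[].  The caller must free() the result.  Returns NULL if
   formatting or allocation fails. */
static char *
make_unique_name (const char *hint, const char *const taken[],
                  size_t n_taken)
{
  assert (hint != NULL);
  for (unsigned long number = 1; ; number++)
    {
      /* Measure first, so the buffer is sized by the string itself
         rather than by a fixed constant. */
      int len = snprintf (NULL, 0, "%s%lu", hint, number);
      if (len < 0)
        return NULL;

      char *name = malloc ((size_t) len + 1);
      if (name == NULL)
        return NULL;
      snprintf (name, (size_t) len + 1, "%s%lu", hint, number);

      int in_use = 0;
      for (size_t i = 0; i < n_taken; i++)
        if (strcmp (name, taken[i]) == 0)
          {
            in_use = 1;
            break;
          }
      if (!in_use)
        return name;
      free (name);
    }
}
```

With `taken` holding "VAR1" and "VAR2", a call such as `make_unique_name ("VAR", taken, 2)` yields a newly allocated "VAR3", which the caller owns and later frees.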

Ben Pfaff [Sun, 12 Dec 2010 22:29:00 +0000 (14:29 -0800)]

tests: Add `check-programs' target.

Occasionally I want to build all the programs required to run "make check"
without actually running the tests. This target allows me to do that.

Ben Pfaff [Mon, 27 Dec 2010 05:12:06 +0000 (21:12 -0800)]

Smake: Avoid duplicating $(GNULIB_TOOL) invocation.

The "all" and "gnulib" rules both contained the same gnulib-tool
invocation. This commit changes the "all" rule to use the "gnulib" rule
as a subrule.

Ben Pfaff [Sat, 5 Feb 2011 05:33:47 +0000 (21:33 -0800)]

i18n: Always allocate from pool in recode_substring_pool().

Ben Pfaff [Wed, 5 Jan 2011 05:55:07 +0000 (21:55 -0800)]

sys-file-writer: Fix subtype used for v14+ multiple response set records.

PSPP build tags.