John Darrington [Sat, 18 Jul 2009 10:34:30 +0000 (12:34 +0200)]
The length of the string is now not always the
same as the format width.
Thanks to Ben Pfaff for pointing this out.
John Darrington [Sat, 18 Jul 2009 10:29:35 +0000 (12:29 +0200)]
Improve code to trim leading spaces from numeric output.
Ben Pfaff pointed out that the code to chomp the results
of formatted doubles was no longer correct. This change
fixes that.
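A minimal sketch of trimming leading spaces from an already formatted number follows. The helper name trim_leading_spaces is hypothetical and is not taken from the PSPP source. The point it illustrates is that the string's length is measured with strlen rather than assumed to equal the format width.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not PSPP code.  Removes leading spaces from the
   NUL-terminated string S in place and returns the new length. */
size_t
trim_leading_spaces (char *s)
{
  assert (s != NULL);
  size_t len = strlen (s);        /* Measured, not the format width. */
  size_t skip = strspn (s, " ");  /* Number of leading spaces. */
  memmove (s, s + skip, len - skip + 1);  /* + 1 moves the terminator too. */
  return len - skip;
}
```

Called on a buffer holding "    3.50", the helper leaves "3.50" in the buffer and returns 4.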
John Darrington [Sat, 18 Jul 2009 10:08:04 +0000 (12:08 +0200)]
Use data_out_pool in crosstabs.q
We were erroneously allocating a buffer before
the size of the contents was known. Using
data_out_pool avoids this problem. Thanks to
Ben Pfaff for pointing this out.
John Darrington [Sat, 18 Jul 2009 09:32:46 +0000 (11:32 +0200)]
Before recoding a variable's name, check that it
doesn't clash with an existing one.
Thanks to Ben Pfaff for pointing out this potential
problem.
John Darrington [Mon, 13 Jul 2009 09:29:39 +0000 (17:29 +0800)]
Fix crash in find dialog and make code less horrible.
John Darrington [Mon, 13 Jul 2009 06:21:02 +0000 (14:21 +0800)]
Fix bug encoding missing value keys in gui
John Darrington [Sun, 12 Jul 2009 20:46:20 +0000 (04:46 +0800)]
Fix crash on text import dialog
John Darrington [Sun, 12 Jul 2009 15:44:36 +0000 (23:44 +0800)]
Fix compiler warning in test program
John Darrington [Sun, 12 Jul 2009 14:50:11 +0000 (22:50 +0800)]
Updated the developers' manual to reflect the new situation
John Darrington [Sun, 12 Jul 2009 14:13:44 +0000 (22:13 +0800)]
Added a dict parameter to data_in and dealt with the consequences.
The data_in function now takes a pointer to a struct dictionary,
which must be the dictionary with which the output value is
associated. data_in now ensures that the data of string values
is converted to the dictionary's encoding if necessary.
John Darrington [Wed, 8 Jul 2009 19:05:24 +0000 (03:05 +0800)]
Remove recoding in data_store.
It's no longer appropriate to perform recoding in the gui.
Instead, this is expected to be done in the backend.
John Darrington [Tue, 7 Jul 2009 16:50:57 +0000 (00:50 +0800)]
Change union value type to contain uint8_t types instead of char.
Make the members of the union value type in src/data/value.h be
uint8_t instead of char. This is more logical since the contents
of values cannot be considered "strings" until they have been
formatted. The unformatted values are merely arrays of bytes.
This has the added advantage of provoking compiler warnings when
a char * type is being implicitly cast to a uint8_t * or vice versa.
When such a warning is encountered, it probably means that the
data needs to be re-encoded using recode_string.
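To illustrate the distinction between raw bytes and strings, here is a small sketch. The helper bytes_to_display_string is hypothetical and does no re-encoding at all; real code would presumably call recode_string at this boundary. Passing a char * where the const uint8_t * parameter is expected typically draws a pointer signedness diagnostic from GCC, which is the kind of warning the commit message describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not PSPP code.  Copies the WIDTH raw bytes at BYTES
   into a newly allocated NUL-terminated string, dropping trailing space
   padding.  Returns NULL if allocation fails.  The caller frees the
   result. */
char *
bytes_to_display_string (const uint8_t *bytes, size_t width)
{
  assert (bytes != NULL);
  while (width > 0 && bytes[width - 1] == ' ')
    width--;
  char *s = malloc (width + 1);
  if (s == NULL)
    return NULL;
  memcpy (s, bytes, width);
  s[width] = '\0';
  return s;
}
```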
John Darrington [Tue, 7 Jul 2009 16:34:16 +0000 (00:34 +0800)]
Remove erroneously committed diagnostic statement
John Darrington [Tue, 7 Jul 2009 12:33:03 +0000 (20:33 +0800)]
Use default encoding when reading system files if no encoding is given in file.
John Darrington [Tue, 7 Jul 2009 11:24:40 +0000 (19:24 +0800)]
Fix problem running the perl module
John Darrington [Tue, 7 Jul 2009 09:35:21 +0000 (17:35 +0800)]
Replace legacy_recode with recode_string.
Iconv seems to do a good job of converting between
ascii and ebcdic, so use the recode_string function
instead of our own conversion routines.
John Darrington [Tue, 7 Jul 2009 05:19:18 +0000 (13:19 +0800)]
Fix compile warnings
John Darrington [Tue, 7 Jul 2009 04:52:45 +0000 (12:52 +0800)]
Fix bug in value labels dialog box
John Darrington [Tue, 7 Jul 2009 04:19:17 +0000 (12:19 +0800)]
Add dictionary argument to tab_value.
In order to properly display values, tab_value needs
to know the dictionary from whence the value comes.
This is necessary so that string values can be properly
decoded.
This change adds this argument to tab_value and updates
all callers.
John Darrington [Mon, 6 Jul 2009 19:39:36 +0000 (03:39 +0800)]
Recode strings when writing system files.
The long variable names, variable labels and value labels are
now converted from utf8 to the dictionary encoding when
writing a system file.
John Darrington [Mon, 6 Jul 2009 17:38:21 +0000 (01:38 +0800)]
Fix crash when opening empty dataset
John Darrington [Mon, 6 Jul 2009 16:44:27 +0000 (00:44 +0800)]
Convert to utf8 in data_out function.
Previously, the output value of data_out was of arbitrary encoding.
This change attempts to ensure that it is always utf8.
John Darrington [Mon, 6 Jul 2009 11:51:34 +0000 (19:51 +0800)]
Make data_out dynamically allocate its return value.
Preparation for i18n of values. Instead of asking the
caller to prepare the buffer for output, data_out now
dynamically allocates the output value, and expects the
caller to free it. This is necessary since for utf8
strings, the caller cannot reasonably know the length of
the required output buffer. It also simplifies some uses
of data_out.
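The allocate-and-return pattern described above can be sketched as follows. The function format_number_alloc is hypothetical and its signature is not data_out's. It only shows the callee measuring the required length and the caller taking ownership of the result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function, not PSPP's data_out.  Formats X right-justified in
   at least WIDTH characters with DECIMALS decimal places and returns the
   result in newly allocated memory, or NULL on failure.  The caller must
   free the returned string. */
char *
format_number_alloc (double x, int width, int decimals)
{
  assert (width >= 0 && decimals >= 0);

  /* First pass: ask snprintf how many characters are needed. */
  int n = snprintf (NULL, 0, "%*.*f", width, decimals, x);
  if (n < 0)
    return NULL;

  /* Second pass: format into a buffer of exactly the right size. */
  char *s = malloc ((size_t) n + 1);
  if (s == NULL)
    return NULL;
  snprintf (s, (size_t) n + 1, "%*.*f", width, decimals, x);
  assert (strlen (s) == (size_t) n);
  return s;
}
```

Because the callee does the measuring, the result may be longer than WIDTH when the number does not fit, and the caller never has to guess a buffer size.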
John Darrington [Sun, 5 Jul 2009 12:45:12 +0000 (20:45 +0800)]
Change enum legacy_encoding to const char *.
Preparation for i18n of union values. Remove the
legacy_encoding enum and substitute it with a const
char *. This makes it easier to integrate recoding
of union values in the data parsing stage.
John Darrington [Sun, 5 Jul 2009 09:33:29 +0000 (17:33 +0800)]
Store variable names, labels and value labels as UTF8.
This change converts long variable names, variable labels
and value labels to utf8 encoding when system files are
loaded. It is therefore no longer necessary (nor correct)
to convert them when displaying.
Jason H Stover [Tue, 16 Jun 2009 16:20:57 +0000 (12:20 -0400)]
Renamed interaction_variable_get_var to interaction_get_variable.
Renamed interaction_variable_get_member to interaction_get_member.
Split update_hash_entry into update_hash_entry and
update_hash_entry_intr for interactions.
inner_intr_loop: New function.
covariance_accumulate_pairwise: Loop separately over variables, then interactions.
interaction_variable_create: Make interactions type alpha when
appropriate.
interaction_value_create: Use value_resize to avoid copying more data than
necessary into new interaction_value.
Ben Pfaff [Mon, 15 Jun 2009 20:52:17 +0000 (13:52 -0700)]
sparse-xarray: Add missing #include <limits.h>.
Thanks to michel <michel@cecaps.ufmg.br> for reporting the problem.
Ben Pfaff [Mon, 15 Jun 2009 03:09:42 +0000 (20:09 -0700)]
Allow variables created by var_create_internal to have any width.
Until now, var_create_internal has always created a numeric variable.
In the long run we wish to phase out the use of internal variables
entirely, but this change should help Jason get some work done in the
short term.
John Darrington [Sun, 14 Jun 2009 09:27:32 +0000 (17:27 +0800)]
Fix compile warning
Ben Pfaff [Fri, 12 Jun 2009 03:25:49 +0000 (20:25 -0700)]
Fix type mismatch between value_hash prototype and definition.
Thanks to michel <michel@cecaps.ufmg.br> for pointing out the problem.
Ben Pfaff [Fri, 12 Jun 2009 03:11:55 +0000 (20:11 -0700)]
Drop call to deleted function value_cnt_from_width (from debug-only code).
Thanks to Jason for pointing out the problem.
Jason H Stover [Thu, 11 Jun 2009 15:31:40 +0000 (11:31 -0400)]
Fixed crash caused by regressing with categorical variables
John Darrington [Tue, 9 Jun 2009 11:47:08 +0000 (19:47 +0800)]
Fixed bug inserting cases in data sheet.
Cases were not being inserted in the correct position.
Ben Pfaff [Mon, 8 Jun 2009 04:57:36 +0000 (21:57 -0700)]
Fix handling of #! at beginning of PSPP syntax file; add regression test.
Fixes bug #26518.
Thanks to John Darrington for testing.
Ben Pfaff [Sun, 7 Jun 2009 20:14:23 +0000 (13:14 -0700)]
Remove spurious Makefile from src/output.
Ben Pfaff [Sun, 7 Jun 2009 04:04:21 +0000 (21:04 -0700)]
crosstabs: Fix chi-square display and add regression test.
Bug #26739.
Ben Pfaff [Sun, 7 Jun 2009 03:53:10 +0000 (20:53 -0700)]
crosstabs: Remove struct that was defined but never used.
Ben Pfaff [Sun, 7 Jun 2009 03:44:49 +0000 (20:44 -0700)]
crosstabs: Remove write-only variable.
Ben Pfaff [Sun, 7 Jun 2009 03:30:14 +0000 (20:30 -0700)]
crosstabs: Fix segfault when chi-square was requested.
Bug #26739.
Ben Pfaff [Wed, 3 Jun 2009 05:21:01 +0000 (22:21 -0700)]
datasheet-test: Add support for testing string backing store columns.
Ben Pfaff [Wed, 3 Jun 2009 04:55:50 +0000 (21:55 -0700)]
crosstabs: Trim unsightly spaces from titles in output.
Unfortunately, none of the tests exercise this code, so it's hard to say
whether it is correct.
Ben Pfaff [Wed, 3 Jun 2009 02:52:18 +0000 (19:52 -0700)]
crosstabs: Fix memory leaks.
Ben Pfaff [Sat, 30 May 2009 04:51:45 +0000 (21:51 -0700)]
argv-parser: Add assertion to find likely bugs in client code.
Ben Pfaff [Sat, 30 May 2009 04:51:19 +0000 (21:51 -0700)]
datasheet: Fix bugs in datasheet_resize_column() found with new test.
Ben Pfaff [Sat, 30 May 2009 04:46:24 +0000 (21:46 -0700)]
datasheet-test: Add test for datasheet_resize_column().
Ben Pfaff [Sat, 30 May 2009 04:43:33 +0000 (21:43 -0700)]
datasheet-test: Fix printing of string values in error messages.
Ben Pfaff [Sat, 30 May 2009 04:26:13 +0000 (21:26 -0700)]
datasheet-test: Check duplicate states before discarding them.
By failing to check states whose hashes already appeared in the model
checker table, the datasheet test was missing some bugs. This commit
changes the datasheet test code to check the state before it checks for
the hash.
Ben Pfaff [Thu, 28 May 2009 05:22:48 +0000 (22:22 -0700)]
datasheet-test: Make column widths to test configurable on command line.
Ben Pfaff [Sat, 30 May 2009 04:45:28 +0000 (21:45 -0700)]
datasheet-test: Don't test null operations.
By not testing null operations (such as inserting or deleting 0 rows or
columns) the duration of the test is cut roughly in half, with little if
any reduction in test coverage.
Ben Pfaff [Sat, 30 May 2009 04:50:12 +0000 (21:50 -0700)]
sparse-xarray-test: Style and comment fixes.
Ben Pfaff [Wed, 27 May 2009 06:04:32 +0000 (23:04 -0700)]
value: New function value_swap.
Ben Pfaff [Wed, 27 May 2009 05:02:48 +0000 (22:02 -0700)]
Move datasheet test out of PSPP into a separate binary.
When it's not difficult to do so, it is better to put tests in separate
binaries instead of in the PSPP binaries, so that the binaries are not
burdened with code that is not of real interest to users and to make the
main PSPP binaries build faster.
Ben Pfaff [Tue, 26 May 2009 03:24:07 +0000 (20:24 -0700)]
Make MAX_SHORT_STRING an implementation detail of the "value" code.
MAX_SHORT_STRING used to be important. It was referenced all over the
source tree. Now, there is little reason for code outside the "value"
code itself to use it.
Ben Pfaff [Tue, 26 May 2009 03:22:01 +0000 (20:22 -0700)]
Use MAX_SHORT_STRING in place of MIN_LONG_STRING.
There is no good reason to have both of these constants, so replace all
uses of MIN_LONG_STRING by MAX_SHORT_STRING.
Ben Pfaff [Tue, 26 May 2009 03:21:08 +0000 (20:21 -0700)]
Fix portable file reader use of long strings.
This code hadn't been converted to the new "union value" representation,
where a single "union value" always represents a whole data item. This
commit fixes that.
Ben Pfaff [Tue, 26 May 2009 03:20:07 +0000 (20:20 -0700)]
Get rid of uses of MAX_SHORT_STRING in Gnumeric and PostgreSQL readers.
MAX_SHORT_STRING is now intended to be an implementation detail of the
value code. There is no real reason that the Gnumeric or PostgreSQL
readers need to use it, so make them use their own constants instead.
Ben Pfaff [Tue, 26 May 2009 03:07:19 +0000 (20:07 -0700)]
Implement missing values for long string variables.
Ben Pfaff [Mon, 25 May 2009 19:36:21 +0000 (12:36 -0700)]
Fix test failure introduced along with parse_value().
Ben Pfaff [Mon, 25 May 2009 02:36:01 +0000 (19:36 -0700)]
sys-file-reader: Fix memory leak.
Ben Pfaff [Mon, 25 May 2009 02:35:40 +0000 (19:35 -0700)]
Add support for value labels on long string variables.
Ben Pfaff [Sun, 24 May 2009 18:26:41 +0000 (11:26 -0700)]
New function parse_value() for parsing a value of specified width.
Occasionally a value of a given width needs to be parsed from syntax.
This commit adds a helper function for doing so and modifies a few pieces
of code to use it. Probably there are other places where it would be
useful that could not easily be found with "grep".
This commit also renames the range-parser code to value-parser and puts
the new function in there, as a natural generalization.
Suggested by John Darrington.
Ben Pfaff [Tue, 12 May 2009 03:08:19 +0000 (20:08 -0700)]
Make value_set_missing(), etc. tolerate values of width -1.
In some circumstances a value of width -1 crops up, e.g. when a case is
made from a dictionary that has had a variable deleted in the middle.
Such a value has no content at all. In the long run it should be possible
to get rid of these values entirely--their presence is a wart--but for now
the case and value code needs to tolerate them.
This fixes a segfault in the GUI when inserting a new case when the
datasheet case has a column with width -1 (due to deletion of a variable),
which was caused by case_set_missing() calling value_set_missing() for
the -1 width variable, which in turn was writing through an invalid
pointer.
John Darrington [Mon, 11 May 2009 23:18:31 +0000 (07:18 +0800)]
Prevent invalid variable widths in variable sheet.
Ben Pfaff [Mon, 11 May 2009 13:33:35 +0000 (06:33 -0700)]
Remove debug printfs that escaped from my local tree.
Ben Pfaff [Mon, 11 May 2009 13:33:15 +0000 (06:33 -0700)]
gui: Fix segfault when pushing Del on a long string variable cell.
Thanks to John Darrington for reporting the problem.
Ben Pfaff [Mon, 11 May 2009 05:23:00 +0000 (22:23 -0700)]
Change "union value" to dynamically allocate long strings.
Until now, a single "union value" could hold a numeric value or a short
string value. A long string value (one longer than MAX_SHORT_STRING)
required a number of contiguous "union value"s. This situation was
inconvenient sometimes, because any occasion where a long string value
might be required (even if it was unlikely) required using dynamic
memory allocation.
With this change, a value of any type, regardless of whether it is numeric
or short or long string, occupies a single "union value". The internal
representation of short and long strings is now different, however: long
strings are now internally represented by a pointer to dynamically
allocated memory. This means that "union value"s must now be initialized
and uninitialized properly, to ensure that memory is properly allocated
and freed behind the scenes.
This change thus has a ripple effect on PSPP code that works with values.
In particular, code that deals with cases is greatly changed, because a
case now needs to know the type of each value that it contains. Thus, a
new concept called a "case prototype", which represents the type and
width of each value within a case, is introduced, and every place in PSPP
that creates a case must now create a corresponding prototype to go with
it. This is why this commit is so big.
As part of writing up this commit, it became clear that some code was poor
enough that it needed to be rewritten entirely. Therefore, CROSSTABS and
T-TEST are almost completely modified by this commit.
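The representation described in this commit message can be sketched roughly as below. Every name here (sketch_value, sketch_case, SKETCH_MAX_SHORT, and so on) is hypothetical, the threshold of 8 is an assumption, and the "prototype" is reduced to a bare array of widths. It is an illustration of the idea, not PSPP's actual data structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_SHORT 8      /* Assumed threshold, not PSPP's constant. */

/* Width 0 means numeric; width > 0 means a string of that many bytes. */
union sketch_value
  {
    double f;
    uint8_t short_string[SKETCH_MAX_SHORT];
    uint8_t *long_string;       /* Used only when width > SKETCH_MAX_SHORT. */
  };

/* Allocates SIZE bytes, aborting on failure. */
static void *
sketch_alloc (size_t size)
{
  void *p = malloc (size > 0 ? size : 1);
  if (p == NULL)
    abort ();
  return p;
}

/* Only long strings need storage beyond the union itself. */
void
sketch_value_init (union sketch_value *v, int width)
{
  if (width > SKETCH_MAX_SHORT)
    v->long_string = sketch_alloc (width);
}

void
sketch_value_destroy (union sketch_value *v, int width)
{
  if (width > SKETCH_MAX_SHORT)
    free (v->long_string);
}

/* Returns the bytes of string value V, wherever they are stored. */
uint8_t *
sketch_value_bytes (union sketch_value *v, int width)
{
  assert (width > 0);
  return width > SKETCH_MAX_SHORT ? v->long_string : v->short_string;
}

/* A case must remember each value's width so that it can initialize and
   destroy its values correctly. */
struct sketch_case
  {
    const int *widths;          /* The "prototype": one width per value. */
    size_t n_values;
    union sketch_value *values;
  };

struct sketch_case *
sketch_case_create (const int *widths, size_t n_values)
{
  struct sketch_case *c = sketch_alloc (sizeof *c);
  c->widths = widths;
  c->n_values = n_values;
  c->values = sketch_alloc (n_values * sizeof *c->values);
  for (size_t i = 0; i < n_values; i++)
    sketch_value_init (&c->values[i], widths[i]);
  return c;
}

void
sketch_case_destroy (struct sketch_case *c)
{
  for (size_t i = 0; i < c->n_values; i++)
    sketch_value_destroy (&c->values[i], c->widths[i]);
  free (c->values);
  free (c);
}
```

In this sketch the widths array must outlive the case, because destroying the case consults it to decide which values own heap memory.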
Ben Pfaff [Thu, 7 May 2009 05:58:01 +0000 (22:58 -0700)]
output: Add auxiliary data parameter to tab_dim.
Until now, the tab_dim function has not provided any way to pass auxiliary
data to the table dimensioning function. This commit adds this ability
and updates all the callers of tab_dim to do so.
Ben Pfaff [Thu, 7 May 2009 05:47:51 +0000 (22:47 -0700)]
New data structure sparse_xarray.
Ben Pfaff [Thu, 7 May 2009 03:34:14 +0000 (20:34 -0700)]
New wrapper for access to temporary files.
Ben Pfaff [Tue, 5 May 2009 12:42:23 +0000 (05:42 -0700)]
model-checker: Add command-line parser for model checking options.
This adds a parser for command-line options to configure a set of
mc_options for running the model checker. It is used by an upcoming test
for the sparse_xarray. It might also make sense to break the datasheet
tests out of PSPP into a separate program using this parser.
Ben Pfaff [Tue, 5 May 2009 12:39:03 +0000 (05:39 -0700)]
Implement new command-line argument parser.
glibc has two option parsers, but neither one of them feels quite
right:
- getopt_long is simple, but not modular, in that there is no
easy way to make it accept multiple collections of options
supported by different modules.
- argp is more sophisticated and more complete, and hence more
complex. It still lacks one important feature for
modularity: there is no straightforward way for option groups
that are implemented independently to have separate auxiliary
data.
The parser implemented in this commit is meant to be simple and
modular. It is based internally on getopt_long.
The initial use for this option parser is for an upcoming commit of a test
program that has some of its own options and some from the model checker,
but it should also be appropriate for PSPP and PSPPIRE if anyone wants to
adapt them to use it.
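One way such a modular layer over getopt_long might look is sketched below. This is not the interface of the parser added by this commit. All sketch_ names, the fixed table size, and the two example callbacks are assumptions made for illustration. Each module registers its own options with its own callback and auxiliary data, and the wrapper dispatches by the index that getopt_long reports.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_MAX_OPTIONS 16

typedef void sketch_option_cb (int id, const char *arg, void *aux);

struct sketch_parser
  {
    struct option longopts[SKETCH_MAX_OPTIONS + 1];  /* + 1: terminator. */
    sketch_option_cb *cb[SKETCH_MAX_OPTIONS];
    void *aux[SKETCH_MAX_OPTIONS];
    int id[SKETCH_MAX_OPTIONS];
    int n;
  };

void
sketch_parser_init (struct sketch_parser *p)
{
  p->n = 0;
}

/* Registers one module's N options, all sharing callback CB and data AUX.
   Each option's `val' is passed back to CB as its ID. */
void
sketch_parser_add (struct sketch_parser *p, const struct option *opts, int n,
                   sketch_option_cb *cb, void *aux)
{
  for (int i = 0; i < n; i++)
    {
      assert (p->n < SKETCH_MAX_OPTIONS);
      p->longopts[p->n] = opts[i];
      p->longopts[p->n].flag = NULL;
      p->longopts[p->n].val = 1;    /* Dispatch by index, not by val. */
      p->cb[p->n] = cb;
      p->aux[p->n] = aux;
      p->id[p->n] = opts[i].val;
      p->n++;
    }
}

/* Parses ARGV, invoking the owning module's callback for each option.
   Returns false if an unrecognized or malformed option is found. */
bool
sketch_parser_run (struct sketch_parser *p, int argc, char **argv)
{
  static const struct option terminator = { NULL, 0, NULL, 0 };
  p->longopts[p->n] = terminator;
  optind = 1;                       /* Start at the first argument. */
  opterr = 0;                       /* Let the caller report errors. */
  for (;;)
    {
      int index = -1;
      int c = getopt_long (argc, argv, "", p->longopts, &index);
      if (c == -1)
        return true;
      if (c != 1 || index < 0)
        return false;
      p->cb[index] (p->id[index], optarg, p->aux[index]);
    }
}

/* Two example callbacks standing in for independent modules. */
void
sketch_count_cb (int id, const char *arg, void *aux)
{
  (void) id;
  (void) arg;
  ++*(int *) aux;                   /* Count occurrences of the option. */
}

void
sketch_int_cb (int id, const char *arg, void *aux)
{
  (void) id;
  *(int *) aux = atoi (arg);        /* Store the option's integer argument. */
}
```

A test program could register a "verbose" option with sketch_count_cb and a "seed" option with sketch_int_cb, each pointing at its own variable, so that neither module needs to know about the other's state.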
Ben Pfaff [Tue, 5 May 2009 05:30:02 +0000 (22:30 -0700)]
model-checker: Don't discard error states.
Even if a state with an error is a duplicate, we don't want to discard it,
because then we lose information about bugs.
Ben Pfaff [Tue, 5 May 2009 05:27:05 +0000 (22:27 -0700)]
model-checker: Revise advice on checking for duplicates.
Until now the documentation on the model checker has advised checking for
a duplicate state before checking for consistency, but in fact this can
cause bugs to be missed if only some paths to a given state yield
incorrect results. So revise the advice to check for consistency before
checking for a duplicate state.
Ben Pfaff [Tue, 5 May 2009 05:20:42 +0000 (22:20 -0700)]
model-checker: Add more progress functions.
The model checker supports "progress functions" that report the current
status of the model checking run. Until now the implementation only
exported a single progress function that printed a line of dots across
stderr. This commit moves the "fancy" progress function that was
previously part of the PSPP language code into the model checker itself
and adds an even more verbose progress function as well.
Ben Pfaff [Tue, 5 May 2009 05:33:48 +0000 (22:33 -0700)]
model-checker: Move summary printing function into model checker.
There is no reason that the model checker itself should not be able to
print a summary of its results. Until now, this code was buried in the
PSPP language code, but the model checker itself is a better place for it.
Ben Pfaff [Fri, 24 Apr 2009 04:09:12 +0000 (21:09 -0700)]
model-checker: Kill dependencies and move back to libpspp.
Commit 95b074ff3 "Moved the datasheet testing code out of
src/{libspp,data}" moved the model-checker implementation from libpspp
into language/tests because it depended on math/moments.h and
data/val-type.h, which violates the dependency structure of the PSPP
libraries.
However, now I want to use the model checker in a test that should not
need to use anything from language/tests, so this commit eliminates these
dependencies and moves the model checker back to src/libpspp.
Ben Pfaff [Tue, 5 May 2009 04:53:07 +0000 (21:53 -0700)]
sparse-array: Simplify code slightly.
Instead of checking whether the key is in range in each caller of
find_leaf_node, do it in find_leaf_node itself. This also allows checking
the cache before checking whether the key is in range, which might be an
optimization.
Ben Pfaff [Thu, 7 May 2009 03:22:09 +0000 (20:22 -0700)]
sparse-array: Improve iteration interface.
The sparse_array_scan function only supports iteration in the forward
direction and its interface is somewhat awkward. This commit replaces it
by four new functions that allow iteration in both forward and reverse
directions and have a more conventional interface.
Ben Pfaff [Tue, 5 May 2009 12:51:54 +0000 (05:51 -0700)]
sparse-array: Use __builtin_ctzl on GCC 4.0 or later, as an optimization.
This should be a worthwhile optimization in many cases, because
__builtin_ctzl compiles to a single machine instruction on x86, whereas
the generic implementation compiles to several.
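A sketch of this kind of conditional optimization is shown below. The function names are hypothetical and the loop is only one possible generic fallback, not necessarily the one used in sparse-array.c.

```c
#include <assert.h>

/* Portable fallback: shift right until the lowest set bit reaches bit 0.
   X must be nonzero. */
int
ctz_generic (unsigned long x)
{
  int n = 0;
  assert (x != 0);
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Returns the number of trailing zero bits in X, which must be nonzero,
   using the compiler built-in where it is available. */
int
count_trailing_zeros (unsigned long x)
{
  assert (x != 0);
#if defined __GNUC__ && __GNUC__ >= 4
  return __builtin_ctzl (x);
#else
  return ctz_generic (x);
#endif
}
```

Both paths require a nonzero argument, since the result of __builtin_ctzl is undefined for 0.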
Ben Pfaff [Tue, 5 May 2009 02:31:16 +0000 (19:31 -0700)]
range-set: New functions range_set_last and range_set_prev.
These are useful for iterating through a range set in reverse order.
Ben Pfaff [Fri, 24 Apr 2009 03:27:54 +0000 (20:27 -0700)]
range-set: Add new function range_set_scan().
Ben Pfaff [Fri, 24 Apr 2009 03:27:10 +0000 (20:27 -0700)]
range-set: Inline some simple functions.
Some of the range-set functions are very simple and worth inlining, so
move the definitions of those functions into range-set.h. This required
moving the definition of struct range_set and struct range_set_node into
the header. Some of the functions used internally by those functions had
to be moved too, and renamed as well since for internal use in range-set.c
their names did not respect the namespace.
Ben Pfaff [Fri, 24 Apr 2009 03:14:52 +0000 (20:14 -0700)]
range-set: Add test coverage for range_set_destroy(NULL).
"gcov -b" showed that range_set_destroy() was never called with a NULL
argument. There's no reason not to test that too (although of course it
is unlikely to be broken).
Ben Pfaff [Tue, 5 May 2009 14:03:24 +0000 (07:03 -0700)]
range-set: New function range_set_allocate_fully.
Ben Pfaff [Wed, 1 Apr 2009 05:00:44 +0000 (22:00 -0700)]
Delete CORRELATIONS skeletal parser.
This code didn't do anything useful, it just parsed syntax. We can
resurrect it when someone wants to implement CORRELATIONS later.
Ben Pfaff [Tue, 5 May 2009 14:03:02 +0000 (07:03 -0700)]
pool: New function pool_strdup0.
This function is the pool analogue of xmemdup0, except that it is only
appropriate for use with strings.
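The behaviour being described, copying a given number of bytes and appending a null terminator, can be sketched with plain malloc. The name sketch_strdup0 is hypothetical. The real pool_strdup0 would presumably draw its memory from a pool rather than from malloc, so that the copy is released along with the pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical malloc-based stand-in, not PSPP's pool_strdup0.  Copies the
   N bytes at S into newly allocated memory and appends a null terminator.
   Returns NULL if allocation fails.  The caller frees the result. */
char *
sketch_strdup0 (const char *s, size_t n)
{
  assert (s != NULL);
  char *copy = malloc (n + 1);
  if (copy == NULL)
    return NULL;
  memcpy (copy, s, n);
  copy[n] = '\0';
  return copy;
}
```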
Ben Pfaff [Tue, 5 May 2009 12:46:01 +0000 (05:46 -0700)]
datasheet: Drop false dependency on md4.h.
datasheet-check.c doesn't use anything from md4.h, so there's no point in
including it.
Ben Pfaff [Thu, 16 Apr 2009 05:31:56 +0000 (22:31 -0700)]
perl-module: Better document "make test" requirements.
Ben Pfaff [Fri, 16 Jan 2009 04:39:18 +0000 (20:39 -0800)]
t-test: Move 'mode' variable from file scope into cmd_t_test().
Ben Pfaff [Fri, 16 Jan 2009 04:37:32 +0000 (20:37 -0800)]
t-test: Move 'cmd' variable from file scope into cmd_t_test().
This variable was only used inside cmd_t_test() anyhow.
Ben Pfaff [Fri, 16 Jan 2009 04:34:43 +0000 (20:34 -0800)]
t-test: Remove write-only variable.
Jason H Stover [Wed, 3 Jun 2009 19:54:19 +0000 (15:54 -0400)]
Moved static is_origin from design_matrix.c to category.c: cat_is_origin.
John Darrington [Sun, 17 May 2009 23:43:14 +0000 (07:43 +0800)]
Fix incorrect word order
John Darrington [Sun, 17 May 2009 23:42:31 +0000 (07:42 +0800)]
Remove whitespace before footnotes
John Darrington [Sun, 17 May 2009 23:39:06 +0000 (07:39 +0800)]
Correct grammar
John Darrington [Sun, 17 May 2009 07:29:54 +0000 (15:29 +0800)]
Change examples using heights to be all in millimetres
John Darrington [Sun, 17 May 2009 07:20:23 +0000 (15:20 +0800)]
Delete '*' from DATA LIST examples
John Darrington [Sun, 17 May 2009 07:16:23 +0000 (15:16 +0800)]
Mention that two juxtaposed LIST keywords are intentional
John Darrington [Sun, 17 May 2009 05:15:41 +0000 (13:15 +0800)]
Fix misaligned menu items
John Darrington [Sun, 17 May 2009 05:06:59 +0000 (13:06 +0800)]
Added a tutorial chapter to the manual.