John Darrington [Thu, 23 Jul 2009 05:05:41 +0000 (07:05 +0200)]
Merge commit 'origin/roc'
John Darrington [Tue, 21 Jul 2009 13:43:08 +0000 (15:43 +0200)]
Add perl functions to get the format of a variable
John Darrington [Mon, 20 Jul 2009 04:38:23 +0000 (06:38 +0200)]
Remove double semicolons.
John Darrington [Mon, 20 Jul 2009 04:34:38 +0000 (06:34 +0200)]
Replace caseproto_clone with caseproto_ref
Also unref the proto on destruction of the translator.
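The difference between cloning an object and taking a reference to it can be sketched as follows. This is a minimal illustration of reference counting in general, not PSPP's caseproto code: `struct proto`, `proto_create`, `proto_ref`, and `proto_unref` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not PSPP's struct caseproto. */
struct proto
  {
    size_t ref_cnt;     /* Number of owners. */
    size_t n_widths;    /* Payload, for illustration only. */
  };

/* Creates a new proto with a single owner.
   Returns NULL if memory allocation fails. */
struct proto *
proto_create (size_t n_widths)
{
  struct proto *p = malloc (sizeof *p);
  if (p != NULL)
    {
      p->ref_cnt = 1;
      p->n_widths = n_widths;
    }
  return p;
}

/* Takes an additional reference to P instead of copying it. */
struct proto *
proto_ref (struct proto *p)
{
  assert (p->ref_cnt > 0);
  p->ref_cnt++;
  return p;
}

/* Drops one reference to P.  Frees P and returns 1 when the last
   reference goes away; otherwise returns 0. */
int
proto_unref (struct proto *p)
{
  assert (p->ref_cnt > 0);
  if (--p->ref_cnt > 0)
    return 0;
  free (p);
  return 1;
}
```

In this scheme an owner such as a translator would call `proto_ref` when it is created and `proto_unref` when it is destroyed, so the object lives exactly as long as its last owner.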
John Darrington [Sun, 19 Jul 2009 18:20:36 +0000 (20:20 +0200)]
Added more (hopefully useful) comments
John Darrington [Sun, 19 Jul 2009 17:27:51 +0000 (19:27 +0200)]
Add some comments and macros to make the code more readable
John Darrington [Sun, 19 Jul 2009 12:20:07 +0000 (14:20 +0200)]
Add assertion to check code consistency
John Darrington [Sun, 19 Jul 2009 11:35:35 +0000 (13:35 +0200)]
Fix cleanup of ROC command.
Properly deallocate variables, and use correct
symbols for parser return values. Also, delete
roc.h which is unnecessary. Thanks to Ben Pfaff
for pointing out these problems.
John Darrington [Sun, 19 Jul 2009 11:04:39 +0000 (13:04 +0200)]
Respect the constness of caseproto.
New function caseproto_clone. This means we
can clone a proto, then mutate it as we want.
John Darrington [Sun, 19 Jul 2009 09:41:49 +0000 (11:41 +0200)]
Corrected spelling of "consolidate".
Thanks to Ben Pfaff for pointing out my mistake.
John Darrington [Sun, 19 Jul 2009 09:35:38 +0000 (11:35 +0200)]
Added the '=' to the plot subcommand's documentation.
Thanks to Ben Pfaff for pointing this out.
John Darrington [Sat, 18 Jul 2009 11:38:49 +0000 (13:38 +0200)]
Avoid compiler warning
John Darrington [Sat, 18 Jul 2009 11:20:38 +0000 (13:20 +0200)]
Merge commit 'origin/data-encoding'
Conflicts:
src/language/dictionary/split-file.c
John Darrington [Sat, 18 Jul 2009 10:38:02 +0000 (12:38 +0200)]
Add comment explaining the meaning of encoding to data_out
John Darrington [Sat, 18 Jul 2009 10:34:30 +0000 (12:34 +0200)]
The length of the string is now not always the
same as the format width.
Thanks to Ben Pfaff for pointing this out.
John Darrington [Sat, 18 Jul 2009 10:29:35 +0000 (12:29 +0200)]
Improve code to trim leading spaces from numeric output.
Ben Pfaff pointed out that the code to chomp the results
of formatted doubles was no longer correct. This change
fixes that.
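Trimming leading spaces from a formatted number might look roughly like the sketch below. It is only an illustration of the idea; `trim_leading_spaces` is a hypothetical helper, not the function changed by this commit.

```c
#include <assert.h>
#include <string.h>

/* Removes leading spaces from S in place and returns S.
   Hypothetical helper for illustration only. */
char *
trim_leading_spaces (char *s)
{
  size_t n_spaces;

  assert (s != NULL);

  /* Count the run of spaces at the start of S. */
  n_spaces = strspn (s, " ");

  /* Shift the rest of the string, including its null terminator,
     to the front.  memmove is safe for overlapping regions. */
  memmove (s, s + n_spaces, strlen (s + n_spaces) + 1);
  return s;
}
```

Working in place keeps the original pointer valid, which matters when the string was dynamically allocated and must later be freed through that same pointer.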
John Darrington [Sat, 18 Jul 2009 10:08:04 +0000 (12:08 +0200)]
Use data_out_pool in crosstabs.q
We were erroneously allocating a buffer before
the size of the contents was known. Using
data_out_pool avoids this problem. Thanks to
Ben Pfaff for pointing this out.
John Darrington [Sat, 18 Jul 2009 09:32:46 +0000 (11:32 +0200)]
Before recoding a variable's name, check that it
doesn't clash with an existing one.
Thanks to Ben Pfaff for pointing out this potential
problem.
Jason Stover [Fri, 17 Jul 2009 20:02:02 +0000 (16:02 -0400)]
pspp_linreg(): Use cache->n_coeffs to set the dimensions of sw, instead of
cache->n_indeps, which may give incorrect dimension in the case
of categorical variables. Fixes bug referenced in bug report
26861.
John Darrington [Fri, 17 Jul 2009 15:11:44 +0000 (23:11 +0800)]
Ensure correct behaviour when the state var is missing.
When the state variable is missing, then the entire
case is skipped.
John Darrington [Fri, 17 Jul 2009 15:04:22 +0000 (23:04 +0800)]
Update documentation regarding missing values.
Explicitly mention that cases are excluded on a
listwise basis.
John Darrington [Fri, 17 Jul 2009 14:48:29 +0000 (22:48 +0800)]
Fix ROC behaviour in the presence of missing values.
Make sure that the ROC command's behaviour is correct,
when missing values appear in the result variable.
John Darrington [Fri, 17 Jul 2009 07:30:40 +0000 (15:30 +0800)]
Corrected typos in the perl documentation
Ben Pfaff [Thu, 16 Jul 2009 05:19:33 +0000 (22:19 -0700)]
i18n: Merge translatable strings.
Ben Pfaff [Thu, 16 Jul 2009 05:15:03 +0000 (22:15 -0700)]
"Sig." is an abbreviation, "Exact" is not.
Ben Pfaff [Thu, 16 Jul 2009 05:12:41 +0000 (22:12 -0700)]
i18n: Eliminate some translatable strings.
Ben Pfaff [Thu, 16 Jul 2009 05:12:07 +0000 (22:12 -0700)]
i18n: Reduce translatable strings in SHOW command.
All the output from the SHOW command is of the form "%s is %s." but the
translators were being asked to translate similar strings over and over
again. Reduce their load by getting rid of many translatable strings.
Ben Pfaff [Thu, 16 Jul 2009 05:10:17 +0000 (22:10 -0700)]
i18n: Change some strings to reduce work of translation.
PSPP has a number of strings that happen to be phrased differently for no
particular reason. This commit changes some of those strings to be
exactly the same as other ones, to make the work of translators easier.
Ben Pfaff [Thu, 16 Jul 2009 04:15:39 +0000 (21:15 -0700)]
Separate table functions that format their arguments from those that don't.
The tab_text, tab_joint_text, and tab_output_text functions, until now,
had an option bit TAT_PRINTF that specified whether they passed their text
argument through sprintf. This interface was bad because it made it
impossible for GCC to tell whether it needed to verify a printf format
string or not.
This commit solves the problem by breaking each of these functions into one
that does format its argument and one that doesn't.
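The benefit of the split can be sketched with a pair of functions, one taking plain text and one taking a printf-style format. The names `struct cell`, `cell_text`, and `cell_text_format` are hypothetical and are not the tab_* functions themselves; the `format` attribute is the GCC extension that enables compile-time checking of the format string.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer standing in for a table cell. */
struct cell
  {
    char text[64];
  };

/* Declares the formatting variant.  The attribute tells GCC (and
   compatible compilers) that argument 2 is a printf format string
   whose arguments start at argument 3, so mismatches are diagnosed. */
void cell_text_format (struct cell *, const char *format, ...)
#if defined __GNUC__
  __attribute__ ((format (printf, 2, 3)))
#endif
  ;

/* Stores TEXT in CELL verbatim, truncating if necessary.
   '%' has no special meaning here. */
void
cell_text (struct cell *cell, const char *text)
{
  size_t len;

  assert (text != NULL);
  len = strlen (text);
  if (len >= sizeof cell->text)
    len = sizeof cell->text - 1;
  memcpy (cell->text, text, len);
  cell->text[len] = '\0';
}

/* Formats FORMAT and its arguments into CELL, truncating if necessary. */
void
cell_text_format (struct cell *cell, const char *format, ...)
{
  va_list args;

  va_start (args, format);
  vsnprintf (cell->text, sizeof cell->text, format, args);
  va_end (args);
}
```

With a single function and a run-time option bit, the compiler cannot know whether the text argument is a format string, so it can neither check it nor safely skip the check.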
Ben Pfaff [Thu, 16 Jul 2009 02:58:01 +0000 (19:58 -0700)]
Consistently capitalize the name "Gnumeric".
Thanks to Harry Thijssen for pointing out the inconsistency.
John Darrington [Wed, 15 Jul 2009 12:19:49 +0000 (20:19 +0800)]
New function prepare_cutpoints
Move the code which creates the cutpoints into its own
function. This makes for easier reading IMO.
John Darrington [Wed, 15 Jul 2009 12:05:58 +0000 (20:05 +0800)]
Updated the example with an easier to visualise one
John Darrington [Tue, 14 Jul 2009 10:15:51 +0000 (18:15 +0800)]
Replaced the glade definition of about dialog with a C one.
The about dialog box is simple enough to maintain in C
rather than using a glade definition.
John Darrington [Tue, 14 Jul 2009 10:01:37 +0000 (18:01 +0800)]
Removed unused function prototypes
John Darrington [Mon, 13 Jul 2009 09:29:39 +0000 (17:29 +0800)]
Fix crash in find dialog and make code less horrible.
John Darrington [Mon, 13 Jul 2009 06:21:02 +0000 (14:21 +0800)]
Fix bug encoding missing value keys in gui
John Darrington [Sun, 12 Jul 2009 20:46:20 +0000 (04:46 +0800)]
Fix crash on text import dialog
Jason Stover [Sun, 12 Jul 2009 19:44:56 +0000 (15:44 -0400)]
Remove write-only variable from interaction_case_data.
John Darrington [Sun, 12 Jul 2009 15:44:36 +0000 (23:44 +0800)]
Fix compiler warning in test program
John Darrington [Sun, 12 Jul 2009 14:50:11 +0000 (22:50 +0800)]
Updated the developers' manual to reflect the new situation
John Darrington [Sun, 12 Jul 2009 14:13:44 +0000 (22:13 +0800)]
Added a dict parameter to data_in and dealt with the consequences.
The data_in function now takes a pointer to a struct dictionary,
which must be the dictionary with which the output value is
associated. Data_in now ensures that the data of string values
is converted to the dictionary's encoding if necessary.
Jason Stover [Sat, 11 Jul 2009 19:45:49 +0000 (15:45 -0400)]
Return 0.0 for mean of a categorical variable. Fixes bug mentioned in bug report 26861.
John Darrington [Wed, 8 Jul 2009 19:05:24 +0000 (03:05 +0800)]
Remove recoding in data_store.
It's no longer appropriate to perform recoding in the gui.
Instead, this is expected to be done in the backend.
John Darrington [Tue, 7 Jul 2009 16:50:57 +0000 (00:50 +0800)]
Change union value type to contain uint8_t types instead of char.
Make the members of the union value type in src/data/value.h be
uint8_t instead of char. This is more logical since the contents
of values cannot be considered "strings" until they have been
formatted. The unformatted values are merely arrays of bytes.
This has the added advantage of provoking compiler warnings when
a char * type is being implicitly cast to a uint8_t * or vice versa.
When such a warning is encountered, it probably means that the
data needs to be re-encoded using recode_string.
John Darrington [Tue, 7 Jul 2009 16:34:16 +0000 (00:34 +0800)]
Remove erroneously committed diagnostic statement
John Darrington [Tue, 7 Jul 2009 12:33:03 +0000 (20:33 +0800)]
Use default encoding when reading system files if no encoding is given in file.
John Darrington [Tue, 7 Jul 2009 11:24:40 +0000 (19:24 +0800)]
Fix problem running the perl module
John Darrington [Tue, 7 Jul 2009 09:35:21 +0000 (17:35 +0800)]
Replace legacy_recode with recode_string.
Iconv seems to do a good job of converting between
ascii and ebcdic, so use the recode_string function
instead of our own conversion routines.
John Darrington [Tue, 7 Jul 2009 05:19:18 +0000 (13:19 +0800)]
Fix compile warnings
John Darrington [Tue, 7 Jul 2009 04:52:45 +0000 (12:52 +0800)]
Fix bug in value labels dialog box
John Darrington [Tue, 7 Jul 2009 04:19:17 +0000 (12:19 +0800)]
Add dictionary argument to tab_value.
In order to properly display values, tab_value needs
to know the dictionary from whence the value comes.
This is necessary so that string values can be properly
decoded.
This change adds this argument to tab_value and updates
all callers.
John Darrington [Mon, 6 Jul 2009 19:39:36 +0000 (03:39 +0800)]
Recode strings when writing system files.
The long variable names, variable labels and value labels are
now converted from utf8 to the dictionary encoding when
writing a system file.
John Darrington [Mon, 6 Jul 2009 17:38:21 +0000 (01:38 +0800)]
Fix crash when opening empty dataset
John Darrington [Mon, 6 Jul 2009 16:44:27 +0000 (00:44 +0800)]
Convert to utf8 in data_out function.
Previously, the output value of data_out was of arbitrary encoding.
This change attempts to ensure that it is always utf8.
John Darrington [Mon, 6 Jul 2009 11:51:34 +0000 (19:51 +0800)]
data_out function to dynamically allocate return value.
Preparation for i18n of values. Instead of asking the
caller to prepare the buffer for output, data_out now
dynamically allocates the output value, and expects the
caller to free it. This is necessary since for utf8
strings, the caller cannot reasonably know the length of
the required output buffer. It also simplifies some uses
of data_out.
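Why the callee is better placed to allocate the buffer can be seen in a small sketch that converts ISO-8859-1 text to UTF-8. This is not data_out; `latin1_to_utf8_alloc` is a hypothetical function chosen because the output length depends on the input's contents, which the caller cannot cheaply know in advance.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated UTF-8 copy of the ISO-8859-1 string S,
   or NULL if memory allocation fails.  The caller must free the
   result.  Hypothetical function for illustration only. */
char *
latin1_to_utf8_alloc (const char *s)
{
  size_t len, i;
  char *out, *p;

  assert (s != NULL);
  len = strlen (s);

  /* Each ISO-8859-1 byte needs at most two bytes in UTF-8. */
  out = malloc (2 * len + 1);
  if (out == NULL)
    return NULL;

  p = out;
  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) s[i];
      if (c < 0x80)
        *p++ = (char) c;                    /* ASCII: one byte. */
      else
        {
          *p++ = (char) (0xc0 | (c >> 6));  /* Lead byte. */
          *p++ = (char) (0x80 | (c & 0x3f)); /* Continuation byte. */
        }
    }
  *p = '\0';
  return out;
}
```

A caller simply uses the returned string and then passes it to `free`, without ever having to guess a buffer size.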
John Darrington [Sun, 5 Jul 2009 12:45:12 +0000 (20:45 +0800)]
Change enum legacy_encoding to const char *.
Preparation for i18n of union values. Remove the
legacy_encoding enum and substitute it with a const
char *. This makes it easier to integrate recoding
of union values in the data parsing stage.
John Darrington [Sun, 5 Jul 2009 09:33:29 +0000 (17:33 +0800)]
Store variable names, labels and value labels as UTF8.
This change converts long variable names, variable labels
and value labels to utf8 encoding when system files are
loaded. It is therefore no longer necessary (nor correct)
to convert them when displaying.
John Darrington [Thu, 25 Jun 2009 03:08:09 +0000 (11:08 +0800)]
Fix bugs when input data is repeated
John Darrington [Wed, 24 Jun 2009 08:47:38 +0000 (16:47 +0800)]
Added second ROC test
Jason H Stover [Tue, 16 Jun 2009 16:20:57 +0000 (12:20 -0400)]
Renamed interaction_variable_get_var to interaction_get_variable.
Renamed interaction_variable_get_member to interaction_get_member.
Split update_hash_entry into update_hash_entry and
update_hash_entry_intr for interactions.
inner_intr_loop: New function.
covariance_accumulate_pairwise: Loop separately over variables, then interactions.
interaction_variable_create: Make interactions type alpha when
appropriate.
interaction_value_create: Use value_resize to avoid copying more data than
necessary into new interaction_value.
John Darrington [Mon, 15 Jun 2009 23:27:31 +0000 (07:27 +0800)]
Add new functions to define subcase orderings.
Allow subcases to be defined from an index and width,
rather than from a variable. This avoids much of the
need for var_create_internal.
Ben Pfaff [Mon, 15 Jun 2009 20:52:17 +0000 (13:52 -0700)]
sparse-xarray: Add missing #include <limits.h>.
Thanks to michel <michel@cecaps.ufmg.br> for reporting the problem.
Ben Pfaff [Mon, 15 Jun 2009 03:09:42 +0000 (20:09 -0700)]
Allow variables created by var_create_internal to have any width.
Until now, var_create_internal has always created a numeric variable.
In the long run we wish to phase out the use of internal variables
entirely, but this change should help Jason get some work done in the
short term.
John Darrington [Sun, 14 Jun 2009 09:27:32 +0000 (17:27 +0800)]
Fix compile warning
John Darrington [Sat, 13 Jun 2009 05:29:25 +0000 (13:29 +0800)]
Added code to plot the ROC curve
Ben Pfaff [Fri, 12 Jun 2009 03:25:49 +0000 (20:25 -0700)]
Fix type mismatch between value_hash prototype and definition.
Thanks to michel <michel@cecaps.ufmg.br> for pointing out the problem.
Ben Pfaff [Fri, 12 Jun 2009 03:11:55 +0000 (20:11 -0700)]
Drop call to deleted function value_cnt_from_width (from debug-only code).
Thanks to Jason for pointing out the problem.
Jason H Stover [Thu, 11 Jun 2009 15:31:40 +0000 (11:31 -0400)]
Fixed crash caused by regressing with categorical variables
John Darrington [Thu, 11 Jun 2009 06:22:22 +0000 (14:22 +0800)]
Added code to generate the ROC cutpoint tables.
John Darrington [Thu, 11 Jun 2009 04:57:17 +0000 (12:57 +0800)]
Add check that input to casereader_create_distinct are sorted
John Darrington [Wed, 10 Jun 2009 13:50:48 +0000 (21:50 +0800)]
Fix bug when positive and negative groups are of different lengths
John Darrington [Wed, 10 Jun 2009 13:49:39 +0000 (21:49 +0800)]
Add framework for ROC summary table
John Darrington [Wed, 10 Jun 2009 13:25:50 +0000 (21:25 +0800)]
Use the requested method for calculating the ROC AUC standard error
John Darrington [Wed, 10 Jun 2009 13:14:01 +0000 (21:14 +0800)]
Added basic calculation and display of area under the curve
John Darrington [Wed, 10 Jun 2009 13:11:32 +0000 (21:11 +0800)]
Added test for the ROC command
John Darrington [Wed, 10 Jun 2009 03:36:05 +0000 (11:36 +0800)]
Added a new casereader translator to consolidate cases.
This new translator creates a reader which provides
a list of distinct cases in the input, with the weights
consolidated, where applicable.
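The idea of collapsing duplicate cases while summing their weights can be sketched on a plain array. This is an assumption-laden simplification, not the casereader translator: `struct weighted_case` and `consolidate_cases` are hypothetical, each case is reduced to a single numeric value, and the input is assumed to be sorted already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a case: one value plus a weight. */
struct weighted_case
  {
    double value;
    double weight;
  };

/* Collapses runs of equal values in CASES[0...N-1], which must be
   sorted by value, summing the weights of each run.  Works in place
   and returns the number of distinct cases that remain. */
size_t
consolidate_cases (struct weighted_case *cases, size_t n)
{
  size_t n_out = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* Adjacent comparison only finds every duplicate if the
         input is sorted, so check that as we go. */
      assert (i == 0 || cases[i - 1].value <= cases[i].value);

      if (n_out > 0 && cases[n_out - 1].value == cases[i].value)
        cases[n_out - 1].weight += cases[i].weight;
      else
        cases[n_out++] = cases[i];
    }
  return n_out;
}
```

For example, the sorted values 1, 1, 2 with weights 1, 2, 1 would be reduced to the two cases (1, weight 3) and (2, weight 1).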
John Darrington [Wed, 10 Jun 2009 01:44:01 +0000 (09:44 +0800)]
Added stub for ROC computation
John Darrington [Tue, 9 Jun 2009 11:47:08 +0000 (19:47 +0800)]
Fixed bug inserting cases in data sheet.
Cases were not being inserted in the correct position.
John Darrington [Tue, 9 Jun 2009 11:16:24 +0000 (19:16 +0800)]
Added documentation for the ROC command
John Darrington [Tue, 9 Jun 2009 11:15:08 +0000 (19:15 +0800)]
Added parser for the ROC command.
John Darrington [Tue, 9 Jun 2009 11:04:25 +0000 (19:04 +0800)]
Support multi-data charts and legend.
Add support for charts to have datasets with separate
colours, and a legend to indicate them.
Ben Pfaff [Mon, 8 Jun 2009 04:57:36 +0000 (21:57 -0700)]
Fix handling of #! at beginning of PSPP syntax file; add regression test.
Fixes bug #26518.
Thanks to John Darrington for testing.
Ben Pfaff [Sun, 7 Jun 2009 20:14:23 +0000 (13:14 -0700)]
Remove spurious Makefile from src/output.
Ben Pfaff [Sun, 7 Jun 2009 04:04:21 +0000 (21:04 -0700)]
crosstabs: Fix chi-square display and add regression test.
Bug #26739.
Ben Pfaff [Sun, 7 Jun 2009 03:53:10 +0000 (20:53 -0700)]
crosstabs: Remove struct that was defined but never used.
Ben Pfaff [Sun, 7 Jun 2009 03:44:49 +0000 (20:44 -0700)]
crosstabs: Remove write-only variable.
Ben Pfaff [Sun, 7 Jun 2009 03:30:14 +0000 (20:30 -0700)]
crosstabs: Fix segfault when chi-square was requested.
Bug #26739.
Ben Pfaff [Wed, 3 Jun 2009 05:21:01 +0000 (22:21 -0700)]
datasheet-test: Add support for testing string backing store columns.
Ben Pfaff [Wed, 3 Jun 2009 04:55:50 +0000 (21:55 -0700)]
crosstabs: Trim unsightly spaces from titles in output.
Unfortunately, none of the tests exercise this code, so it's hard to say
whether it is correct.
Ben Pfaff [Wed, 3 Jun 2009 02:52:18 +0000 (19:52 -0700)]
crosstabs: Fix memory leaks.
Ben Pfaff [Sat, 30 May 2009 04:51:45 +0000 (21:51 -0700)]
argv-parser: Add assertion to find likely bugs in client code.
Ben Pfaff [Sat, 30 May 2009 04:51:19 +0000 (21:51 -0700)]
datasheet: Fix bugs in datasheet_resize_column() found with new test.
Ben Pfaff [Sat, 30 May 2009 04:46:24 +0000 (21:46 -0700)]
datasheet-test: Add test for datasheet_resize_column().
Ben Pfaff [Sat, 30 May 2009 04:43:33 +0000 (21:43 -0700)]
datasheet-test: Fix printing of string values in error messages.
Ben Pfaff [Sat, 30 May 2009 04:26:13 +0000 (21:26 -0700)]
datasheet-test: Check duplicate states before discarding them.
By failing to check states whose hashes already appeared in the model
checker table, the datasheet test was missing some bugs. This commit
changes the datasheet test code to check the state before it checks for
the hash.
Ben Pfaff [Thu, 28 May 2009 05:22:48 +0000 (22:22 -0700)]
datasheet-test: Make column widths to test configurable on command line.
Ben Pfaff [Sat, 30 May 2009 04:45:28 +0000 (21:45 -0700)]
datasheet-test: Don't test null operations.
By not testing null operations (such as inserting or deleting 0 rows or
columns) the duration of the test is cut roughly in half, with little if
any reduction in test coverage.
Ben Pfaff [Sat, 30 May 2009 04:50:12 +0000 (21:50 -0700)]
sparse-xarray-test: Style and comment fixes.
Ben Pfaff [Wed, 27 May 2009 06:04:32 +0000 (23:04 -0700)]
value: New function value_swap.
Ben Pfaff [Wed, 27 May 2009 05:02:48 +0000 (22:02 -0700)]
Move datasheet test out of PSPP into a separate binary.
When it's not difficult to do so, it is better to put tests in separate
binaries instead of in the PSPP binaries, so that the binaries are not
burdened with code that is not of real interest to users and to make the
main PSPP binaries build faster.