From: Ben Pfaff
Date: Mon, 4 Jan 2016 04:12:29 +0000 (-0800)
Subject: REGRESSION: Correctly handle data in temporary file for SAVE subcommand.
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?p=pspp;a=commitdiff_plain;h=5e180775fa5a79e6f14b2798bd4a3d4ea9f51939

REGRESSION: Correctly handle data in temporary file for SAVE subcommand.

The format of the data in the temporary file used by the SAVE subcommand
is basically unrelated to the format of the data in the active file, but
the code that created cases and wrote them to that file created them as
clones of cases in the active file.  This is simply incorrect behavior,
and it causes nasty failures when the active file has an inconvenient
number of variables or contains long string variables.

This commit fixes the problem by creating the cases for the temporary
file from that file's own caseproto.

In addition, the assignment of caseproto indexes to variables in the
temporary file was broken for /SAVE=PRED without RESID: PRED was always
assigned index 1, so index 0 was skipped.  This commit also fixes that
problem.

Bug #44877.
Reported by Scott.
---
diff --git a/src/language/stats/regression.c b/src/language/stats/regression.c
index 9379ce70ef..a6c6cbacc6 100644
--- a/src/language/stats/regression.c
+++ b/src/language/stats/regression.c
@@ -348,8 +348,7 @@ cmd_regression (struct lexer *lexer, struct dataset *ds)
 
   if (regression.resid)
     {
-      workspace.extras ++;
-      workspace.res_idx = 0;
+      workspace.res_idx = workspace.extras ++;
       workspace.residvars = xcalloc (regression.n_dep_vars,
                                      sizeof (*workspace.residvars));
       for (i = 0; i < regression.n_dep_vars; ++i)
@@ -361,8 +360,7 @@ cmd_regression (struct lexer *lexer, struct dataset *ds)
 
   if (regression.pred)
     {
-      workspace.extras ++;
-      workspace.pred_idx = 1;
+      workspace.pred_idx = workspace.extras ++;
       workspace.predvars = xcalloc (regression.n_dep_vars,
                                     sizeof (*workspace.predvars));
       for (i = 0; i < regression.n_dep_vars; ++i)
@@ -709,7 +707,7 @@ run_regression (const struct regression *cmd,
 
       for (; (c = casereader_read (r)) != NULL; case_unref (c))
         {
-          struct ccase *outc = case_clone (c);
+          struct ccase *outc = case_create (casewriter_get_proto (ws->writer));
           for (k = 0; k < cmd->n_dep_vars; k++)
             {
               const struct variable **vars = xnmalloc (cmd->n_vars, sizeof (*vars));
diff --git a/tests/language/stats/regression.at b/tests/language/stats/regression.at
index c48dfaf784..188d6a8de6 100644
--- a/tests/language/stats/regression.at
+++ b/tests/language/stats/regression.at
@@ -2130,3 +2130,53 @@ Table: Coefficients (science)
 AT_CLEANUP
 
 
+
+
+dnl Checks for regression against bug #44877.
+AT_SETUP([LINEAR REGRESSION crash with long string variables])
+AT_DATA([regression.sps], [dnl
+SET DECIMAL=DOT.
+
+DATA LIST notable LIST /text (A24) Y * X1 *
+BEGIN DATA.
+V00276601 0.00 90.00
+V00292909 10.00 30.00
+V00291204 20.00 20.00
+V00300070 0.00 90.00
+END DATA.
+
+REGRESSION
+/VARIABLES= Y
+/DEPENDENT= X1
+/METHOD=ENTER
+/STATISTICS=COEFF R ANOVA
+/SAVE= RESID.
+
+LIST.
+])
+AT_CHECK([pspp -o pspp.csv regression.sps])
+AT_CHECK([cat pspp.csv], [0], [dnl
+Table: Model Summary (X1)
+,R,R Square,Adjusted R Square,Std. Error of the Estimate
+,.95,.89,.84,15.08
+
+Table: ANOVA (X1)
+,,Sum of Squares,df,Mean Square,F,Sig.
+,Regression,3820.45,1,3820.45,16.81,.055
+,Residual,454.55,2,227.27,,
+,Total,4275.00,3,,,
+
+Table: Coefficients (X1)
+,,Unstandardized Coefficients,,Standardized Coefficients,,
+,,B,Std. Error,Beta,t,Sig.
+,(Constant),85.45,10.16,.00,8.41,.004
+,Y,-3.73,.91,-.95,-4.10,.055
+
+Table: Data List
+text,Y,X1,RES1
+V00276601               ,.00,90.00,4.55
+V00292909               ,10.00,30.00,-18.18
+V00291204               ,20.00,20.00,9.09
+V00300070               ,.00,90.00,4.55
+])
+AT_CLEANUP