From: Ben Pfaff Date: Fri, 26 Mar 2004 09:03:02 +0000 (+0000) Subject: Make the expression code a little nicer and fix bugs found X-Git-Tag: v0.4.0~304 X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=97d6c6f6b1922621ca013668eba9a9a9f71d60fe;p=pspp-builds.git Make the expression code a little nicer and fix bugs found in testing. --- diff --git a/TODO b/TODO index c5a488fe..1048b6c0 100644 --- a/TODO +++ b/TODO @@ -1,16 +1,8 @@ -Time-stamp: <2004-03-24 19:52:48 blp> +Time-stamp: <2004-03-26 00:07:35 blp> What Ben's working on now. -------------------------- -Expression parser/optimizer/evaluator revisions: - - * Testing. - - * Add random distributions. - - * Get rid of Boolean/integer type mismatch errors. - Procedures need to be able to make multiple passes. Write a better descriptive stats evaluator based on NR two-pass technique, @@ -25,6 +17,11 @@ CROSSTABS needs to be re-examined. TODO ---- +The expression tests need tests for XDATE and a few others, see +tests/xforms/expressions.sh comments for details. + +Expressions need random distribution functions. + There needs to be another layer onto the lexer, which should probably be entirely rewritten anyway. The lexer needs to read entire *commands* at a time, not just a *line* at a time. It also needs to support arbitrary putback, diff --git a/doc/ChangeLog b/doc/ChangeLog index fe449865..a7ecb845 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -1,3 +1,7 @@ +Fri Mar 26 00:07:46 2004 Ben Pfaff + + * pspp.texi: Update chapter on expressions. + Sat Mar 20 00:53:10 WST 2004 John Darrington * pspp.texi: Added a brief mention of the SHOW command. diff --git a/doc/pspp.texi b/doc/pspp.texi index 31ad84b7..30a514cb 100644 --- a/doc/pspp.texi +++ b/doc/pspp.texi @@ -3573,7 +3573,7 @@ as arguments. With few exceptions, operator arguments may be full-fledged expressions in themselves. @menu -* Booleans:: Boolean values. +* Boolean Values:: Boolean values. * Missing Values in Expressions:: Using missing values in expressions. * Grouping Operators:: ( ) * Arithmetic Operators:: + - * / ** @@ -3583,36 +3583,25 @@ full-fledged expressions in themselves. * Order of Operations:: Operator precedence. @end menu -@node Booleans, Missing Values in Expressions, Expressions, Expressions -@section Boolean values +@node Boolean Values, Missing Values in Expressions, Expressions, Expressions +@section Boolean Values @cindex Boolean @cindex values, Boolean -There is a third type for arguments and results, the @dfn{Boolean} type, -which is used to represent true/false conditions. Booleans have only -three possible values: 0 (false), 1 (true), and system-missing. -System-missing is neither true nor false. +Some PSPP operators and expressions work with Boolean values, which +represent true/false conditions. Booleans have only three possible +values: 0 (false), 1 (true), and system-missing (unknown). +System-missing is neither true nor false and indicates that the true +value is unknown. -@itemize @bullet -@item -A numeric expression that has value 0, 1, or system-missing may be used -in place of a Boolean. Thus, the expression @code{0 AND 1} is valid -(although it is always false). - -@item -A numeric expression with any other value will cause an error if it is -used as a Boolean. So, @code{2 OR 3} is invalid. - -@item -A Boolean expression may not be used in place of a numeric expression. -Thus, @code{(1>2) + (3<4)} is invalid. +Boolean-typed operands or function arguments must take on one of these +three values. Other values are considered false, but cause an error +when the expression is evaluated. -@item Strings and Booleans are not compatible, and neither may be used in place of the other. -@end itemize -@node Missing Values in Expressions, Grouping Operators, Booleans, Expressions +@node Missing Values in Expressions, Grouping Operators, Boolean Values, Expressions @section Missing Values in Expressions String missing values are not treated specially in expressions. Most @@ -3621,8 +3610,8 @@ arguments. Exceptions are listed under particular operator descriptions. User-missing values for numeric variables are always transformed into -the system-missing value, except inside the arguments to the -@code{VALUE}, @code{SYSMIS}, and @code{MISSING} functions. +the system-missing value, except inside the arguments to the +@code{VALUE} and @code{SYSMIS} functions. The missing-value functions can be used to precisely control how missing values are treated in expressions. @xref{Missing Value Functions}, for @@ -3706,8 +3695,8 @@ system-missing value. @cindex logical intersection @item @var{a} AND @var{b} @itemx @var{a} & @var{b} -True if both @var{a} and @var{b} are true. However, if one argument is -false and the other is missing, the result is false, not missing. If +True if both @var{a} and @var{b} are true, false otherwise. If one +argument is false, the result is false even if the other is missing. If both arguments are missing, the result is missing. @cindex @code{OR} @@ -3717,7 +3706,7 @@ both arguments are missing, the result is missing. @item @var{a} OR @var{b} @itemx @var{a} | @var{b} True if at least one of @var{a} and @var{b} is true. If one argument is -true and the other is missing, the result is true, not missing. If both +true, the result is true even if the other argument is missing. If both arguments are missing, the result is missing. @cindex @code{NOT} @@ -3726,7 +3715,8 @@ arguments are missing, the result is missing. @cindex logical inversion @item NOT @var{a} @itemx ~ @var{a} -True if @var{a} is false. +True if @var{a} is false. If the argument is missing, then the result +is missing. @end table @node Relational Operators, Functions, Logical Operators, Expressions @@ -3735,20 +3725,6 @@ True if @var{a} is false. The relational operators take numeric or string arguments and produce Boolean results. -Note that, with numeric arguments, PSPP does not make exact -relational tests. Instead, two numbers are considered to be equal even -if they differ by a small amount. This amount, @dfn{epsilon}, is -dependent on the PSPP configuration and determined at compile -time. (The default value is 0.000000001, or -@ifinfo -@code{10**(-9)}.) -@end ifinfo -@tex -$10 ^{-9}$.) -@end tex -Use of epsilon allows for round-off errors. Use of epsilon is also -idiotic, but the author is not a numeric analyst. - Strings cannot be compared to numbers. When strings of different lengths are compared, the shorter string is right-padded with spaces to match the length of the longer string. @@ -3916,11 +3892,9 @@ results. @cindex arccosine @cindex inverse cosine -@deftypefn {Function} {} ACOS (@var{number}) -@deftypefnx {Function} {} ARCOS (@var{number}) +@deftypefn {Function} {} ARCOS (@var{number}) Takes the arccosine, in radians, of @var{number}. Results in -system-missing if @var{number} is not between -1 and 1. Portability: -none. +system-missing if @var{number} is not between -1 and 1. @end deftypefn @cindex arcsine @@ -3936,26 +3910,6 @@ system-missing if @var{number} is not between -1 and 1 inclusive. Takes the arctangent, in radians, of @var{number}. @end deftypefn -@cindex arcsine -@cindex inverse sine -@deftypefn {Function} {} ASIN (@var{number}) -Takes the arcsine, in radians, of @var{number}. Results in -system-missing if @var{number} is not between -1 and 1 inclusive. -Portability: none. -@end deftypefn - -@cindex arctangent -@cindex inverse tangent -@deftypefn {Function} {} ATAN (@var{number}) -Takes the arctangent, in radians, of @var{number}. -@end deftypefn - -@quotation -@strong{Please note:} Use of the AR* group of inverse trigonometric -functions is recommended over the A* group because they are more -portable. -@end quotation - @cindex cosine @deftypefn {Function} {} COS (@var{angle}) Takes the cosine of @var{angle} which should be in radians. @@ -3980,57 +3934,42 @@ Portability: none. @cindex values, missing @cindex functions, missing-value -Missing-value functions take various types as arguments, returning -various types of results. - -@deftypefn {Function} {} MISSING (@var{variable or expression}) -@var{num} may be a single variable name or an expression. If it is a -variable name, results in 1 if the variable has a user-missing or -system-missing value for the current case, 0 otherwise. If it is an -expression, results in 1 if the expression has the system-missing value, -0 otherwise. +Missing-value functions take various numeric arguments and yield +various types of results. Note that the normal rules of evaluation +apply within expression arguments to these functions. In particular, +user-missing values for numeric variables are converted to +system-missing values. -@quotation -@strong{Please note:} If the argument is a string expression other than -a variable name, MISSING is guaranteed to return 0, because strings do -not have a system-missing value. Also, when using a numeric expression -argument, remember that user-missing values are converted to the -system-missing value in most contexts. Thus, the expressions -@code{MISSING(VAR1 @var{op} VAR2)} and @code{MISSING(VAR1) OR -MISSING(VAR2)} are often equivalent, depending on the specific operator -@var{op} used. -@end quotation +@deftypefn {Function} {} MISSING (@var{expr}) +Returns 1 if @var{expr} has the system-missing value, 0 otherwise. @end deftypefn @deftypefn {Function} {} NMISS (@var{expr} [, @var{expr}]@dots{}) Each argument must be a numeric expression. Returns the number of -user- or system-missing values in the list. As a special extension, +system-missing values in the list. As a special extension, the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a range of variables; see @ref{Sets of Variables}, for more details. @end deftypefn @deftypefn {Function} {} NVALID (@var{expr} [, @var{expr}]@dots{}) Each argument must be a numeric expression. Returns the number of -values in the list that are not user- or system-missing. As a special extension, +values in the list that are not system-missing. As a special extension, the syntax @code{@var{var1} TO @var{var2}} may be used to refer to a range of variables; see @ref{Sets of Variables}, for more details. @end deftypefn -@deftypefn {Function} {} SYSMIS (@var{variable or expression}) -When given the name of a numeric variable, returns 1 if the value of -that variable is system-missing. Otherwise, if the value is not -missing or if it is user-missing, returns 0. If given the name of a -string variable, always returns 1. If given an expression other than -a single variable name, results in 1 if the value is system- or -user-missing, 0 otherwise. +@deftypefn {Function} {} SYSMIS (@var{expr}) +When @var{expr} is simply the name of a numeric variable, returns 1 if +the variable has the system-missing value, 0 if it is user-missing or +not missing. If given @var{expr} takes another form, results in 1 if +the value is system-missing, 0 otherwise. @end deftypefn @deftypefn {Function} {} VALUE (@var{variable}) Prevents the user-missing values of @var{variable} from being -transformed into system-missing values: If @var{variable} is not -system- or user-missing, results in the value of @var{variable}. If -@var{variable} is user-missing, results in the value of @var{variable} -anyway. If @var{variable} is system-missing, results in system-missing. +transformed into system-missing values, and always results in the +actual value of @var{variable}, whether it is user-missing, +system-missing or not missing at all. @end deftypefn @node Pseudo-Random Numbers, Set Membership, Missing Value Functions, Functions @@ -4171,15 +4110,9 @@ non-missing result. @end deftypefn @cindex variance -@deftypefn {Function} {} VAR (@var{number}, @var{number}[, @dots{}]) -Results in the variance of the values of @var{number}. This function -requires at least two valid arguments to give a non-missing result. -@end deftypefn - @deftypefn {Function} {} VARIANCE (@var{number}, @var{number}[, @dots{}]) Results in the variance of the values of @var{number}. This function requires at least two valid arguments to give a non-missing result. -(Use VAR in preference to VARIANCE for reasons of portability.) @end deftypefn @node String Functions, Time & Date, Statistical Functions, Functions @@ -4258,20 +4191,15 @@ empty string. @cindex numbers, converting from strings @cindex strings, converting to numbers -@deftypefn {Function} {} NUMBER (@var{string}) -Returns the number produced when @var{string} is interpreted according -to format F@var{x}.0, where @var{x} is the number of characters in -@var{string}. If @var{string} does not form a proper number, -system-missing is returned without an error message. Portability: none. -@end deftypefn - @deftypefn {Function} {} NUMBER (@var{string}, @var{format}) Returns the number produced when @var{string} is interpreted according -to format specifier @var{format}. Only the number of characters in -@var{string} specified by @var{format} are examined. For example, -@code{NUMBER("123", F3.0)} and @code{NUMBER("1234", F3.0)} both have -value 123. If @var{string} does not form a proper number, -system-missing is returned without an error message. +to format specifier @var{format}. If the format width @var{w} is less +than the length of @var{string}, then only the first @var{w} +characters in @var{string} are used, e.g.@: @code{NUMBER("123", F3.0)} +and @code{NUMBER("1234", F3.0)} both have value 123. If @var{w} is +greater than @var{string}'s length, then it is treated as if it were +right-padded with spaces. If @var{string} is not in the correct +format for @var{format}, system-missing is returned. @end deftypefn @cindex strings, searching backwards diff --git a/src/ChangeLog b/src/ChangeLog index a027c8d4..aeb5cac1 100644 --- a/src/ChangeLog +++ b/src/ChangeLog @@ -1,3 +1,182 @@ +Fri Mar 26 00:54:57 2004 Ben Pfaff + + * var-labs.c: (cmd_variable_labels) For compatibility, don't allow + `/' at start. Check return value of parse_variables() for error + return. + +Fri Mar 26 00:19:27 2004 Ben Pfaff + + Revamp expressions: make the code a little nicer, and fix bugs + found in testing. + + * expr-evl.c: (expr_evaluate) Make expression argument const. + Support OP_ADD, OP_SUB, OP_MUL, OP_DIV instead of OP_PLUS, OP_MUL. + OP_POW is missing for arg 2 <= 0. OP_LOG is natural log, not + base-10 log. Fix OP_ANY, OP_ANY_STRING, OP_RANGE, OP_RANGE_STRING + off-by-ones. Add OP_MAX_STRING, OP_MIN_STRING. Fix OP_TIME_HMS, + OP_DATE_WKYR boundary conditions. Add OP_CTIME_DAYS, + OP_CTIME_HOURS, OP_CTIME_MINUTES, OP_CTIME_DAYS, OP_CTIME_SECONDS. + Support OP_INDEX_2, OP_INDEX_3, OP_RINDEX_2, OP_RINDEX_3 instead + of OP_INDEX, OP_INDEX_OPT, OP_RINDEX, OP_RINDEX_OPT. Merge + OP_LPAD_OPT into OP_LPAD, OP_RPAD_OPT into OP_RPAD, OP_LTRIM_OPT + into OP_LTRIM, OP_RTRIM_OPT into OP_RTRIM, OP_NUMBER_OPT into + OP_NUMBER. Fix OP_RTRIM fragility. Support OP_SUBSTR_2, + OP_SUBSTR_3 instead of OP_SUBSTR, OP_SUBSTR_OPT. Remove OP_INV. + Simplify OP_SYSMIS. Remove OP_STR_MIS. + + * expr-opt.c: (optimize_expression) Rewrite. + (macro n0) Removed. + (macro n1) Removed. + (macro n2) Removed. + (macro s0) Removed. + (macro s0l) Removed. + (macro s1) Removed. + (macro s1l) Removed. + (macro s2) Removed. + (macro s2l) Removed. + (macro s) Removed. + (macro sl) Removed. + (eq_num_con) New function. + (optimize_tree) New function. + (macro rnc) Removed. + (macro frnc) Removed. + (str_search) Add const to string params. + (str_rsearch) Ditto. + (evaluate_tree_no_missing) Renamed from evaluate_tree. Add num[], + str[], str_len[] locals to substitute for most of removed macros. + Support OP_ADD, OP_SUB, OP_MUL, OP_DIV instead of OP_PLUS, OP_MUL. + Removed support for missing values because we're never called with + missing values. Use set_number() or set_number_errno() instead of + rnc or frnc. Removed any stuff that caused trouble in testing. + We can re-add it later if it really slows anything. Fix some + random problems. + (evaluate_tree_with_missing) Not yet supported. To be added later + if it's important. + (repl_num_con) Removed. + (collapse_node) New function. + (force_repl_num_con) Removed. + (set_number) New function. + (set_number_errno) New function. + (repl_str_con) Removed. + (set_string) New function. + (yrmoda) Tighten boundary conditions. Adopt 2030 cut-off for + 2-digit years. + (dump_node) No special case for OP_AND, OP_OR. + + * expr-prs.c: (expr_prs) Honor EXPR_NO_OPTIMIZE bit. Rewrite. + (expr_get_type) New function. + (type_check) Rewrite. + (type_coercion) New function. + (struct operator) New structure. + (match_operator New function. + (parse_binary_operators) New function. + (parse_inverting_unary_operator) New function. + (parse_or) Rewritten. + (parse_and) Rewritten. + (parse_not) Rewritten. + (parse_rel) Rewritten. + (parse_add) Rewritten. + (parse_mul) Rewritten. + (parse_neg) Rewritten. + (parse_exp) Rewritten. + (parse_sysvar) Add $TRUE, $FALSE system variables. + Get $LENGTH, $WIDTH from get_viewlength(), get_viewwidth(). + (parse_primary) Use allocate_var_node(), allocate_num_con(), + allocate_str_con(). + (struct function) Remove desc, change `func' prototype. + (unary_func) Remove special cases. + (MISSING_func) Reduce to unary_func() that just returns a boolean. + (SYSMIS_func) Handle SYSMIS((x)) like SYSMIS(x). + (VALUE_func) Use allocate_var_node(). + (nary_num_func) Allow MIN and MAX for strings. + Use allocate_var_node(). Properly clean up. + Fix return type. + (generic_str_func) Use local table instead of removed `desc' + member. Mostly rewrite. + (get_num_args) Revise error message. + (parse_function) Return EXPR_ERROR, not 0 on error. + (macro op) Removed. + (macro varies) Removed. + (ops[]) Use expr.def. + (free_node) Do nothing if node is null. + (allocate_num_con) New function. + (allocate_str_con) New function. + (allocate_var_node) New function. + (allocate_binary_nonterminal) New function. + (append_nonterminal_arg) Removed. + (static var func_tab[]) Revised. + (expr_debug_print_postfix) Make parameter const. + Use printf() instead of debug_printf(). + + * expr.def: New file. + + * expr.h: Change PXP_* to EXPR_*, all references updated. Also + use named enum instead of unnamed, all references updated. Add + EXPR_ANY, EXPR_NO_OPTIMIZE. + + * exprP.h: Remove EX_*. Add DEFINE_OPERATOR. Use expr.def + instead of defining OP_* directly. + (macro IS_TERMINAL) New macro. + (macro IS_NONTERMINAL) New macro. + (enum OP_NO_FLAGS) New. + +Fri Mar 26 00:18:01 2004 Ben Pfaff + + * error.c: (err_assert_fail) msg variable needs to be non-const. + +Fri Mar 26 00:17:24 2004 Ben Pfaff + + * debug.c: (cmd_debug_evaluate) Rewrite. + +Fri Mar 26 00:15:13 2004 Ben Pfaff + + Fix some CROSSTABS bit rot stupidity. + + * crosstabs.q: Reorder the CELLS subcommands for compatibility. + (internal_cmd_crosstabs) Initializes cells[] correctly. + (float_M_suffix) Rename format_cell_entry(), change prototype, + rewrite. + (display_crosstabulation) Fix cell formatting. + +Fri Mar 26 00:14:09 2004 Ben Pfaff + + Make lex_rest_of_line(), lex_entire_end() not discard lines. Have + to call lex_discard_line() to do that. + + * command.c: (run_command) Call lex_discard_line() after + lex_rest_of_line(). + + * lexer.c: (lex_entire_end) Change behavior. + (lex_rest_of_line) Change behavior. Return const char *. + (lex_discard_line) Don't clear getl_buf, don't emit message. + + * main.c: (handle_error) Emit message here. + + * repeat.c: (internal_cmd_do_repeat) Use lex_discard_line() + instead of lex_entire_line(). + + * str.c: (mm_find_reverse) Make length params size_t. Rewrite. + + * title.c: (get_title) Call lex_discard_line() after + lex_rest_of_line(). + (cmd_file_label) Ditto. + (cmd_document) Deal with const char * return value. + +Fri Mar 26 00:10:16 2004 Ben Pfaff + + Removed REMARK command. + + * command.c: (extract_prefix) Removed. + (output_line) Removed. + (cmd_remark) Removed. + + * command.def: Remove REMARK. + +Fri Mar 26 00:08:38 2004 Ben Pfaff + + Added abort() after lots of assert(0) invocations to avoid some + compiler warnings. We really need a NOT_REACHED macro. + Tue Mar 23 08:00:42 WAST 2004 John Darrington * sort.c: Added missing call to temp_file_close. Changed error diff --git a/src/ascii.c b/src/ascii.c index 72e536b1..9c0103e0 100644 --- a/src/ascii.c +++ b/src/ascii.c @@ -595,6 +595,7 @@ ascii_option (struct outp_driver *this, const char *key, break; default: assert (0); + abort (); } ls_create (ascii_pool, s, value); } @@ -1216,6 +1217,7 @@ output_shorts (struct outp_driver *this, break; default: assert (0); + abort (); } if (!on) { @@ -1244,6 +1246,7 @@ output_shorts (struct outp_driver *this, break; default: assert (0); + abort (); } else { @@ -1330,6 +1333,7 @@ return_carriage (struct outp_driver *this, int n_chars) break; default: assert (0); + abort (); } } @@ -1403,6 +1407,7 @@ output_lines (struct outp_driver *this, int first, int count) break; default: assert (0); + abort (); } if (off) output_string (this, ls_value (off), ls_end (off)); @@ -1427,6 +1432,7 @@ output_lines (struct outp_driver *this, int first, int count) break; default: assert (0); + abort (); } if (on) output_string (this, ls_value (on), ls_end (on)); @@ -1463,6 +1469,7 @@ output_lines (struct outp_driver *this, int first, int count) break; default: assert (0); + abort (); } output_char (this, 1, ch); n_chars += ep - bp + 1; diff --git a/src/command.c b/src/command.c index 37608055..071c7e4e 100644 --- a/src/command.c +++ b/src/command.c @@ -585,108 +585,6 @@ cmd_finish (void) return CMD_SUCCESS; } -/* Extracts a null-terminated 8-or-fewer-character PREFIX from STRING. - PREFIX is converted to lowercase. Removes trailing spaces from - STRING as a side effect. */ -static void -extract_prefix (char *string, char *prefix) -{ - /* Length of STRING. */ - int len; - - /* Points to the null terminator in STRING (`end pointer'). */ - char *ep; - - /* Strip spaces from end of STRING. */ - len = strlen (string); - while (len && isspace ((unsigned char) string[len - 1])) - string[--len] = 0; - - /* Find null terminator. */ - ep = memchr (string, '\0', 8); - if (!ep) - ep = &string[8]; - - /* Copy prefix, converting to lowercase. */ - while (string < ep) - *prefix++ = tolower ((unsigned char) (*string++)); - *prefix = 0; -} - -/* Prints STRING on the console and to the listing file, replacing \n - by newline. */ -static void -output_line (char *string) -{ - /* Location of \n in line read in. */ - char *cp; - - cp = strstr (string, "\\n"); - while (cp) - { - *cp = 0; - tab_output_text (TAB_LEFT | TAT_NOWRAP, string); - string = &cp[2]; - cp = strstr (string, "\\n"); - } - tab_output_text (TAB_LEFT | TAT_NOWRAP, string); -} - -/* Parse and execute REMARK command. */ -int -cmd_remark () -{ - /* Points to the line read in. */ - char *s; - - /* Index into s. */ - char *cp; - - /* 8-character sentinel used to terminate remark. */ - char sentinel[9]; - - /* Beginning of line used to compare with SENTINEL. */ - char prefix[9]; - - som_blank_line (); - - s = lex_rest_of_line (NULL); - if (*s == '-') - { - output_line (&s[1]); - return CMD_SUCCESS; - } - - /* Read in SENTINEL from end of current line. */ - cp = s; - while (isspace ((unsigned char) *cp)) - cp++; - extract_prefix (cp, sentinel); - if (sentinel[0] == 0) - { - msg (SE, _("The sentinel may not be the empty string.")); - return CMD_FAILURE; - } - - /* Read in other lines until we encounter the sentinel. */ - while (getl_read_line ()) - { - extract_prefix (ds_value (&getl_buf), prefix); - if (!strcmp (sentinel, prefix)) - break; - - /* Output the line. */ - output_line (ds_value (&getl_buf)); - } - - /* Calling lex_entire_line() forces the sentinel line to be - discarded. */ - getl_prompt = GETL_PRPT_STANDARD; - lex_entire_line (); - - return CMD_SUCCESS; -} - /* Parses the N command. */ int cmd_n_of_cases (void) @@ -797,7 +695,7 @@ shell (void) static int run_command (void) { - char *cmd; + const char *cmd; int string; /* Handle either a string argument or a full-line argument. */ @@ -815,6 +713,7 @@ run_command (void) else { cmd = lex_rest_of_line (NULL); + lex_discard_line (); string = 0; } } diff --git a/src/command.def b/src/command.def index beb18c4e..09f809cd 100644 --- a/src/command.def +++ b/src/command.def @@ -105,7 +105,6 @@ DEFCMD ("QUIT", INIT, INPU, TRAN, PROC, cmd_exit) DEFCMD ("RECODE", ERRO, INPU, TRAN, TRAN, cmd_recode) DEFCMD ("RECORD TYPE", ERRO, INPU, ERRO, ERRO, cmd_record_type) UNIMPL ("REFORMAT", ERRO, ERRO, TRAN, TRAN) -SPCCMD ("REMARK", INIT, INPU, TRAN, PROC, cmd_remark) DEFCMD ("RENAME VARIABLES", ERRO, INPU, TRAN, PROC, cmd_rename_variables) DEFCMD ("REPEATING DATA", ERRO, INPU, ERRO, ERRO, cmd_repeating_data) DEFCMD ("REREAD", ERRO, INPU, ERRO, ERRO, cmd_reread) diff --git a/src/compute.c b/src/compute.c index 7ef93858..4cc9a187 100644 --- a/src/compute.c +++ b/src/compute.c @@ -233,7 +233,7 @@ cmd_if (void) compute = compute_trns_create (); /* Test expression. */ - compute->test = expr_parse (PXP_BOOLEAN); + compute->test = expr_parse (EXPR_BOOLEAN); if (compute->test == NULL) goto fail; @@ -276,7 +276,7 @@ parse_rvalue_expression (struct compute_trns *compute, assert (type == NUMERIC || type == ALPHA); - compute->rvalue = expr_parse (type == ALPHA ? PXP_STRING : PXP_NUMERIC); + compute->rvalue = expr_parse (type == ALPHA ? EXPR_STRING : EXPR_NUMERIC); if (compute->rvalue == NULL) return 0; @@ -357,7 +357,7 @@ lvalue_parse (void) lex_get (); if (!lex_force_match ('(')) goto lossage; - lvalue->element = expr_parse (PXP_NUMERIC); + lvalue->element = expr_parse (EXPR_NUMERIC); if (lvalue->element == NULL) goto lossage; if (!lex_force_match (')')) diff --git a/src/crosstabs.q b/src/crosstabs.q index 09a7f761..c3a17aaf 100644 --- a/src/crosstabs.q +++ b/src/crosstabs.q @@ -64,7 +64,7 @@ tabl:!tables/notables, box:!box/nobox, pivot:!pivot/nopivot; - +cells[cl_]=count,none,row,column,total,expected,residual,sresidual, + +cells[cl_]=count,none,expected,row,column,total,residual,sresidual, asresidual,all; +statistics[st_]=chisq,phi,cc,lambda,uc,none,btau,ctau,risk,gamma,d, kappa,eta,corr,all. @@ -178,6 +178,8 @@ cmd_crosstabs (void) static int internal_cmd_crosstabs (void) { + int i; + variables = NULL; variables_cnt = 0; xtab = NULL; @@ -199,11 +201,9 @@ internal_cmd_crosstabs (void) if (!cmd.sbc_cells) { cmd.a_cells[CRS_CL_COUNT] = 1; - num_cells = 1; } else { - int i; int count = 0; for (i = 0; i < CRS_CL_count; i++) @@ -223,10 +223,10 @@ internal_cmd_crosstabs (void) cmd.a_cells[CRS_CL_ALL] = 0; } cmd.a_cells[CRS_CL_NONE] = 0; - for (num_cells = i = 0; i < CRS_CL_count; i++) - if (cmd.a_cells[i]) - cmd.a_cells[num_cells++] = i; } + for (num_cells = i = 0; i < CRS_CL_count; i++) + if (cmd.a_cells[i]) + cells[num_cells++] = i; /* STATISTICS. */ if (cmd.sbc_statistics) @@ -1766,22 +1766,31 @@ display_dimensions (struct tab_table *table, int first_difference, struct table_ x->vars[first_difference]); } -/* Put value V into cell (C,R) of TABLE, suffixed with letter M. */ +/* Put VALUE into cell (C,R) of TABLE, suffixed with character + SUFFIX if nonzero. If MARK_MISSING is nonzero the entry is + additionally suffixed with a letter `M'. */ static void -float_M_suffix (struct tab_table *table, int c, int r, double v) +format_cell_entry (struct tab_table *table, int c, int r, double value, + char suffix, int mark_missing) { - static const struct fmt_spec f = {FMT_F, 8, 0}; + const struct fmt_spec f = {FMT_F, 10, 1}; + union value v; struct len_string s; - - s.length = 9; - s.string = tab_alloc (table, 9); - s.string[8] = 'M'; - format_short (s.string, &f, (union value *) &v); + + s.length = 10; + s.string = tab_alloc (table, 16); + v.f = value; + data_out (s.string, &f, &v); while (*s.string == ' ') { s.length--; s.string++; } + if (suffix != 0) + s.string[s.length++] = suffix; + if (mark_missing) + s.string[s.length++] = 'M'; + tab_raw (table, c, r, TAB_RIGHT, &s); } @@ -1811,10 +1820,16 @@ display_crosstabulation (void) tab_hline (table, TAL_1, -1, n_cols, 0); for (c = 0; c < n_cols; c++) { - double expected_value = row_tot[r] * col_tot[c] / W; + int mark_missing = 0; + double expected_value = row_tot[r] * col_tot[c] / W; + if (cmd.miss == CRS_REPORT + && (is_num_user_missing (cols[c].f, x->vars[COL_VAR]) + || is_num_user_missing (rows[r].f, x->vars[ROW_VAR]))) + mark_missing = 1; for (i = 0; i < num_cells; i++) { double v; + int suffix = 0; switch (cells[i]) { @@ -1823,12 +1838,15 @@ display_crosstabulation (void) break; case CRS_CL_ROW: v = *mp / row_tot[r] * 100.; + suffix = '%'; break; case CRS_CL_COLUMN: v = *mp / col_tot[c] * 100.; + suffix = '%'; break; case CRS_CL_TOTAL: v = *mp / W * 100.; + suffix = '%'; break; case CRS_CL_EXPECTED: v = expected_value; @@ -1847,14 +1865,10 @@ display_crosstabulation (void) break; default: assert (0); + abort (); } - if (cmd.miss == CRS_REPORT - && (is_num_user_missing (cols[c].f, x->vars[COL_VAR]) - || is_num_user_missing (rows[r].f, x->vars[ROW_VAR]))) - float_M_suffix (table, c, i, v); - else if (v != 0.) - tab_float (table, c, i, TAB_RIGHT, v, 8, 0); + format_cell_entry (table, c, i, v, suffix, mark_missing); } mp++; @@ -1869,48 +1883,56 @@ display_crosstabulation (void) int r, i; tab_offset (table, -1, tab_row (table) - num_cells * n_rows); - for (r = 0; r < n_rows; r++) - for (i = 0; i < num_cells; i++) - { - double v; - - switch (cells[i]) - { - case CRS_CL_COUNT: - v = row_tot[r]; - break; - case CRS_CL_ROW: - v = 100.; - break; - case CRS_CL_COLUMN: - v = row_tot[r] / W * 100.; - break; - case CRS_CL_TOTAL: - v = row_tot[r] / W * 100.; - break; - case CRS_CL_EXPECTED: - case CRS_CL_RESIDUAL: - case CRS_CL_SRESIDUAL: - case CRS_CL_ASRESIDUAL: - v = 0.; - break; - default: - assert (0); - } - - if (cmd.miss == CRS_REPORT - && is_num_user_missing (rows[r].f, x->vars[ROW_VAR])) - float_M_suffix (table, n_cols, 0, v); - else if (v != 0.) - tab_float (table, n_cols, 0, TAB_RIGHT, v, 8, 0); - - tab_next_row (table); - } + for (r = 0; r < n_rows; r++) + { + char suffix = 0; + int mark_missing = 0; + + if (cmd.miss == CRS_REPORT + && is_num_user_missing (rows[r].f, x->vars[ROW_VAR])) + mark_missing = 1; + + for (i = 0; i < num_cells; i++) + { + double v; + + switch (cells[i]) + { + case CRS_CL_COUNT: + v = row_tot[r]; + break; + case CRS_CL_ROW: + v = 100.; + suffix = '%'; + break; + case CRS_CL_COLUMN: + v = row_tot[r] / W * 100.; + suffix = '%'; + break; + case CRS_CL_TOTAL: + v = row_tot[r] / W * 100.; + suffix = '%'; + break; + case CRS_CL_EXPECTED: + case CRS_CL_RESIDUAL: + case CRS_CL_SRESIDUAL: + case CRS_CL_ASRESIDUAL: + v = 0.; + break; + default: + assert (0); + abort (); + } + + format_cell_entry (table, n_cols, 0, v, suffix, mark_missing); + tab_next_row (table); + } + } } /* Column totals, grand total. */ { - int c, j; + int c; int last_row = 0; if (num_cells > 1) @@ -1918,9 +1940,15 @@ display_crosstabulation (void) for (c = 0; c <= n_cols; c++) { double ct = c < n_cols ? col_tot[c] : W; - int i; + int mark_missing = 0; + char suffix = 0; + int i; - for (i = j = 0; i < num_cells; i++) + if (cmd.miss == CRS_REPORT && c < n_cols + && is_num_user_missing (cols[c].f, x->vars[COL_VAR])) + mark_missing = 1; + + for (i = 0; i < num_cells; i++) { double v; @@ -1928,15 +1956,19 @@ display_crosstabulation (void) { case CRS_CL_COUNT: v = ct; + suffix = '%'; break; case CRS_CL_ROW: v = ct / W * 100.; + suffix = '%'; break; case CRS_CL_COLUMN: v = 100.; + suffix = '%'; break; case CRS_CL_TOTAL: v = ct / W * 100.; + suffix = '%'; break; case CRS_CL_EXPECTED: case CRS_CL_RESIDUAL: @@ -1945,17 +1977,12 @@ display_crosstabulation (void) continue; default: assert (0); + abort (); } - if (cmd.miss == CRS_REPORT && c < n_cols - && is_num_user_missing (cols[c].f, x->vars[COL_VAR])) - float_M_suffix (table, c, j, v); - else if (v != 0.) - tab_float (table, c, j, TAB_RIGHT, v, 8, 0); - - j++; + format_cell_entry (table, c, i, v, suffix, mark_missing); } - last_row = j; + last_row = i; } tab_offset (table, -1, tab_row (table) + last_row); diff --git a/src/data-list.c b/src/data-list.c index 044754d2..92be7cb8 100644 --- a/src/data-list.c +++ b/src/data-list.c @@ -1011,6 +1011,7 @@ get_data_list_read_func (const struct data_list_pgm *dls) default: assert (0); + abort (); } } diff --git a/src/data-out.c b/src/data-out.c index de9bfad3..e8d0fa3b 100644 --- a/src/data-out.c +++ b/src/data-out.c @@ -103,9 +103,11 @@ data_out (char *s, const struct fmt_spec *fp, const union value *v) case FMT_A: assert (0); + abort (); case FMT_AHEX: assert (0); + abort (); case FMT_IB: ok = convert_IB (s, fp, number); @@ -159,6 +161,7 @@ data_out (char *s, const struct fmt_spec *fp, const union value *v) default: assert (0); + abort (); } } else @@ -178,6 +181,7 @@ data_out (char *s, const struct fmt_spec *fp, const union value *v) default: assert (0); + abort (); } } diff --git a/src/debug.c b/src/debug.c index 441504f2..b1603bf3 100644 --- a/src/debug.c +++ b/src/debug.c @@ -18,6 +18,8 @@ 02111-1307, USA. */ #include +#include +#include #include "command.h" #include "error.h" #include "expr.h" @@ -28,18 +30,66 @@ int cmd_debug_evaluate (void) { struct expression *expr; + union value value; + enum expr_type expr_flags; + int dump_postfix = 0; discard_variables (); - expr = expr_parse (PXP_NONE); - if (!expr) - return CMD_FAILURE; - expr_free (expr); - if (token != '.') + expr_flags = 0; + if (lex_match_id ("NOOPTIMIZE")) + expr_flags |= EXPR_NO_OPTIMIZE; + if (lex_match_id ("POSTFIX")) + dump_postfix = 1; + if (token != '/') { - msg (SE, _("Extra characters after expression.")); + lex_force_match ('/'); return CMD_FAILURE; } + fprintf (stderr, "%s => ", lex_rest_of_line (NULL)); + lex_get (); + + expr = expr_parse (EXPR_ANY | expr_flags); + if (!expr || token != '.') + { + fprintf (stderr, "error\n"); + return CMD_FAILURE; + } + + if (dump_postfix) + expr_debug_print_postfix (expr); + else + { + expr_evaluate (expr, NULL, 0, &value); + switch (expr_get_type (expr)) + { + case EXPR_NUMERIC: + if (value.f == SYSMIS) + fprintf (stderr, "sysmis\n"); + else + fprintf (stderr, "%.2f\n", value.f); + break; + + case EXPR_BOOLEAN: + if (value.f == SYSMIS) + fprintf (stderr, "sysmis\n"); + else if (value.f == 0.0) + fprintf (stderr, "false\n"); + else + fprintf (stderr, "true\n"); + break; + + case EXPR_STRING: + fputc ('"', stderr); + fwrite (value.c + 1, value.c[0], 1, stderr); + fputs ("\"\n", stderr); + break; + + default: + assert (0); + } + } + expr_free (expr); return CMD_SUCCESS; } diff --git a/src/do-if.c b/src/do-if.c index 5dfc0626..6f9aa507 100644 --- a/src/do-if.c +++ b/src/do-if.c @@ -248,7 +248,7 @@ parse_do_if (void) struct do_if_trns *t; struct expression *e; - e = expr_parse (PXP_BOOLEAN); + e = expr_parse (EXPR_BOOLEAN); if (!e) return NULL; if (token != '.') diff --git a/src/error.c b/src/error.c index 1065fbd7..7a7e997d 100644 --- a/src/error.c +++ b/src/error.c @@ -559,7 +559,7 @@ request_bug_report_and_abort(const char *msg ) void err_assert_fail(const char *expr, const char *file, int line) { - const char msg[256]; + char msg[256]; snprintf(msg,256,"Assertion failed: %s:%d; (%s)",file,line,expr); request_bug_report_and_abort( msg ); } diff --git a/src/expr-evl.c b/src/expr-evl.c index d18971bd..4849d479 100644 --- a/src/expr-evl.c +++ b/src/expr-evl.c @@ -51,7 +51,7 @@ #include "vfmP.h" double -expr_evaluate (struct expression *e, const struct ccase *c, int case_num, +expr_evaluate (const struct expression *e, const struct ccase *c, int case_num, union value *v) { unsigned char *op = e->op; @@ -69,35 +69,33 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, { switch (*op++) { - case OP_PLUS: - sp -= *op - 1; - if (sp->f != SYSMIS) - for (i = 1; i < *op; i++) - { - if (sp[i].f == SYSMIS) - { - sp->f = SYSMIS; - break; - } - else - sp->f += sp[i].f; - } - op++; + case OP_ADD: + sp--; + if (sp[1].f == SYSMIS) + sp[0].f = SYSMIS; + else if (sp[0].f != SYSMIS) + sp[0].f += sp[1].f; + break; + case OP_SUB: + sp--; + if (sp[1].f == SYSMIS) + sp[0].f = SYSMIS; + else if (sp[0].f != SYSMIS) + sp[0].f -= sp[1].f; break; case OP_MUL: - sp -= *op - 1; - if (sp->f != SYSMIS) - for (i = 1; i < *op; i++) - { - if (sp[i].f == SYSMIS) - { - sp->f = SYSMIS; - break; - } - else - sp->f *= sp[i].f; - } - op++; + sp--; + if (sp[1].f == SYSMIS) + sp[0].f = SYSMIS; + else if (sp[0].f != SYSMIS) + sp[0].f *= sp[1].f; + break; + case OP_DIV: + sp--; + if (sp[1].f == SYSMIS || sp[1].f == 0.) + sp[0].f = SYSMIS; + else if (sp[0].f != SYSMIS) + sp[0].f /= sp[1].f; break; case OP_POW: sp--; @@ -109,12 +107,11 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, else if (sp[1].f == SYSMIS) { if (sp[0].f == 0.0) - /* SYSMIS**0 */ sp->f = 0.0; else sp->f = SYSMIS; } - else if (sp[0].f == 0.0 && sp[1].f == 0.0) + else if (sp[0].f == 0.0 && sp[1].f <= 0.0) sp->f = SYSMIS; else sp->f = pow (sp[0].f, sp[1].f); @@ -247,32 +244,32 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, break; /* String operators. */ - case OP_STRING_EQ: + case OP_EQ_STRING: sp--; sp[0].f = st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[1].c[1], sp[1].c[0]) == 0; break; - case OP_STRING_GE: + case OP_GE_STRING: sp--; sp[0].f = st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[1].c[1], sp[1].c[0]) >= 0; break; - case OP_STRING_GT: + case OP_GT_STRING: sp--; sp[0].f = st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[1].c[1], sp[1].c[0]) > 0; break; - case OP_STRING_LE: + case OP_LE_STRING: sp--; sp[0].f = st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[1].c[1], sp[1].c[0]) <= 0; break; - case OP_STRING_LT: + case OP_LT_STRING: sp--; sp[0].f = st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[1].c[1], sp[1].c[0]) < 0; break; - case OP_STRING_NE: + case OP_NE_STRING: sp--; sp[0].f = st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[1].c[1], sp[1].c[0]) != 0; @@ -335,7 +332,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, if (sp->f != SYSMIS) { errno = 0; - sp->f = log10 (sp->f); + sp->f = log (sp->f); if (errno) sp->f = SYSMIS; } @@ -394,7 +391,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp -= n_args - 1; if (sp->f == SYSMIS) break; - for (i = 1; i <= n_args; i++) + for (i = 1; i < n_args; i++) if (sp[0].f == sp[i].f) { sp->f = 1.0; @@ -408,16 +405,17 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, case OP_ANY_STRING: { int n_args = *op++; + int result = 0.0; sp -= n_args - 1; - for (i = 1; i <= n_args; i++) + for (i = 1; i < n_args; i++) if (!st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[i].c[1], sp[i].c[0])) { - sp->f = 1.0; - goto main_loop; + result = 1.0; + break; } - sp->f = 0.0; + sp->f = result; } break; case OP_CFVAR: @@ -461,6 +459,19 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp->f = max; } break; + case OP_MAX_STRING: + { + int n_args = *op++; + int max_idx = 0; + + sp -= n_args - 1; + for (i = 1; i < n_args; i++) + if (st_compare_pad (&sp[i].c[1], sp[i].c[0], + &sp[max_idx].c[1], sp[max_idx].c[0]) > 0) + max_idx = i; + sp[0].c = sp[max_idx].c; + } + break; case OP_MEAN: { int n_args = *op++; @@ -501,6 +512,19 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp->f = min; } break; + case OP_MIN_STRING: + { + int n_args = *op++; + int min_idx = 0; + + sp -= n_args - 1; + for (i = 1; i < n_args; i++) + if (st_compare_pad (&sp[i].c[1], sp[i].c[0], + &sp[min_idx].c[1], sp[min_idx].c[0]) < 0) + min_idx = i; + sp[0].c = sp[min_idx].c; + } + break; case OP_NMISS: { int n_args = *op++; @@ -533,7 +557,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp -= n_args - 1; if (sp->f == SYSMIS) break; - for (i = 1; i <= n_args; i += 2) + for (i = 1; i < n_args; i += 2) if (sp[i].f == SYSMIS || sp[i + 1].f == SYSMIS) continue; else if (sp[0].f >= sp[i].f && sp[0].f <= sp[i + 1].f) @@ -551,7 +575,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, int n_args = *op++; sp -= n_args - 1; - for (i = 1; i <= n_args; i += 2) + for (i = 1; i < n_args; i += 2) if (st_compare_pad (&sp[0].c[1], sp[0].c[0], &sp[i].c[1], sp[i].c[0]) >= 0 && st_compare_pad (&sp[0].c[1], sp[0].c[0], @@ -628,13 +652,44 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, break; /* Time construction function. */ - case OP_TIME_HMS: - sp -= 2; - if (sp[0].f == SYSMIS || sp[1].f == SYSMIS || sp[2].f == SYSMIS) - sp->f = SYSMIS; - else - sp->f = 60. * (60. * sp[0].f + sp[1].f) + sp[2].f; + case OP_TIME_HMS: + sp -= 2; + if (sp[0].f == SYSMIS || sp[1].f == SYSMIS || sp[2].f == SYSMIS) + sp->f = SYSMIS; + else + { + double min, max; + min = min (sp[0].f, min (sp[1].f, sp[2].f)); + max = max (sp[0].f, max (sp[1].f, sp[2].f)); + if (min < 0. && max > 0.) + { + msg (SW, _("TIME.HMS cannot mix positive and negative " + "in its arguments.")); + sp->f = SYSMIS; + } + else + sp->f = 60. * (60. * sp[0].f + sp[1].f) + sp[2].f; + } + break; + case OP_CTIME_DAYS: + if (sp->f != SYSMIS) + sp->f /= 60. * 60. * 24.; break; + case OP_CTIME_HOURS: + if (sp->f != SYSMIS) + sp->f /= 60. * 60; + break; + case OP_CTIME_MINUTES: + if (sp->f != SYSMIS) + sp->f /= 60.; + break; + case OP_TIME_DAYS: + if (sp->f != SYSMIS) + sp->f *= 60. * 60. * 24.; + break; + case OP_CTIME_SECONDS: + /* No-op. */ + break; /* Date construction functions. */ case OP_DATE_DMY: @@ -668,14 +723,19 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, break; case OP_DATE_WKYR: sp--; - if (sp[0].f == SYSMIS) + if (sp[0].f == SYSMIS || sp[1].f == SYSMIS) sp->f = SYSMIS; - else + else if (sp[0].f < 0. || sp[0].f > 53.) + { + msg (SW, _("Week argument to WKYR must be in range 0 to 53.")); + sp->f = SYSMIS; + } + else { - sp[1].f = yrmoda (sp[1].f, 1, 1); - if (sp->f != SYSMIS) - sp[1].f = 60. * 60. * 24. * (sp[1].f + 7. * (floor (sp[0].f) - 1.)); - sp->f = sp[1].f; + double result = yrmoda (sp[1].f, 1, 1); + if (result != SYSMIS) + result = 60. * 60. * 24. * (result + 7. * (sp->f - 1.)); + sp->f = result; } break; case OP_DATE_YRDAY: @@ -792,83 +852,116 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp[0].c = dest; } break; - case OP_INDEX: + case OP_INDEX_2: sp--; if (sp[1].c[0] == 0) sp->f = SYSMIS; else { int last = sp[0].c[0] - sp[1].c[0]; + int result = 0; for (i = 0; i <= last; i++) - if (!memcmp (&sp[0].c[i + 1], &sp[0].c[1], sp[0].c[0])) + if (!memcmp (&sp[0].c[i + 1], &sp[1].c[1], sp[1].c[0])) { - sp->f = i + 1; - goto main_loop; + result = i + 1; + break; } - sp->f = 0.0; + sp->f = result; } break; - case OP_INDEX_OPT: - { - /* Length of each search string. */ - int part_len = sp[2].f; - - sp -= 2; - if (sp[1].c[0] == 0 || part_len <= 0 || sp[2].f == SYSMIS - || sp[1].c[0] % part_len != 0) - sp->f = SYSMIS; - else - { - /* Last possible index. */ - int last = sp[0].c[0] - part_len; - - for (i = 0; i <= last; i++) - for (j = 0; j < sp[1].c[0]; j += part_len) - if (!memcmp (&sp[0].c[i], &sp[1].c[j], part_len)) - { - sp->f = i + 1; - goto main_loop; - } - sp->f = 0.0; - } - } - break; - case OP_RINDEX: + case OP_INDEX_3: + sp -= 2; + if (sp[1].c[0] == 0) + { + sp->f = SYSMIS; + break; + } + else if (sp[2].f == SYSMIS) + { + msg (SW, _("Argument 3 of RINDEX may not be system-missing.")); + sp->f = SYSMIS; + } + else + { + int part_len = sp[2].f; + int result = 0; + if (part_len < 0 || part_len > sp[1].c[0] + || sp[1].c[0] % part_len != 0) + { + msg (SW, _("Argument 3 of RINDEX must be between 1 and " + "the length of argument 2, and it must " + "evenly divide the length of argument 2.")); + sp->f = SYSMIS; + break; + } + else + { + int last = sp[0].c[0] - part_len; + for (i = 0; i <= last; i++) + for (j = 0; j < sp[1].c[0]; j += part_len) + if (!memcmp (&sp[0].c[i + 1], &sp[1].c[j + 1], part_len)) + { + result = i + 1; + goto index_3_out; + } + index_3_out: + sp->f = result; + } + } + break; + case OP_RINDEX_2: sp--; if (sp[1].c[0] == 0) sp->f = SYSMIS; else { + int result = 0; for (i = sp[0].c[0] - sp[1].c[0]; i >= 0; i--) - if (!memcmp (&sp[0].c[i + 1], &sp[0].c[1], sp[0].c[0])) + if (!memcmp (&sp[0].c[i + 1], &sp[1].c[1], sp[1].c[0])) { - sp->f = i + 1; - goto main_loop; + result = i + 1; + break; } - sp->f = 0.0; + sp->f = result; } break; - case OP_RINDEX_OPT: - { - /* Length of each search string. */ - int part_len = sp[2].f; - - sp -= 2; - if (sp[1].c[0] == 0 || part_len <= 0 || sp[2].f == SYSMIS - || sp[1].c[0] % part_len != 0) - sp->f = SYSMIS; - else - { - for (i = sp[0].c[0] - part_len; i >= 0; i--) - for (j = 0; j < sp[1].c[0]; j += part_len) - if (!memcmp (&sp[0].c[i], &sp[1].c[j], part_len)) - { - sp->f = i + 1; - goto main_loop; - } - sp->f = 0.0; - } - } + case OP_RINDEX_3: + sp -= 2; + if (sp[1].c[0] == 0) + { + sp->f = SYSMIS; + break; + } + else if (sp[2].f == SYSMIS) + { + msg (SW, _("Argument 3 of RINDEX may not be system-missing.")); + sp->f = SYSMIS; + } + else + { + int part_len = sp[2].f; + int result = 0; + if (part_len < 0 || part_len > sp[1].c[0] + || sp[1].c[0] % part_len != 0) + { + msg (SW, _("Argument 3 of RINDEX must be between 1 and " + "the length of argument 2, and it must " + "evenly divide the length of argument 2.")); + sp->f = SYSMIS; + } + else + { + for (i = sp[0].c[0] - part_len; i >= 0; i--) + for (j = 0; j < sp[1].c[0]; j += part_len) + if (!memcmp (&sp[0].c[i + 1], &sp[1].c[j + 1], part_len)) + { + result = i + 1; + goto rindex_3_out; + } + rindex_3_out: + sp->f = result; + } + } break; case OP_LENGTH: sp->f = sp[0].c[0]; @@ -882,25 +975,6 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp[0].c[i] = toupper ((unsigned char) (sp[0].c[i])); break; case OP_LPAD: - { - int len; - sp--; - len = sp[1].f; - if (sp[1].f == SYSMIS || len < 0 || len > 255) - sp->c[0] = 0; - else if (len > sp[0].c[0]) - { - unsigned char *dest; - - dest = pool_alloc (e->pool, len + 1); - dest[0] = len; - memset (&dest[1], ' ', len - sp->c[0]); - memcpy (&dest[len - sp->c[0] + 1], &sp->c[1], sp->c[0]); - sp->c = dest; - } - } - break; - case OP_LPAD_OPT: { int len; sp -= 2; @@ -920,25 +994,6 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, } break; case OP_RPAD: - { - int len; - sp--; - len = sp[1].f; - if (sp[1].f == SYSMIS || len < 0 || len > 255) - sp->c[0] = 0; - else if (len > sp[0].c[0]) - { - unsigned char *dest; - - dest = pool_alloc (e->pool, len + 1); - dest[0] = len; - memcpy (&dest[1], &sp->c[1], sp->c[0]); - memset (&dest[sp->c[0] + 1], ' ', len - sp->c[0]); - sp->c = dest; - } - } - break; - case OP_RPAD_OPT: { int len; sp -= 2; @@ -958,20 +1013,6 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, } break; case OP_LTRIM: - { - int len = sp[0].c[0]; - - i = 1; - while (i <= len && sp[0].c[i] == ' ') - i++; - if (--i) - { - sp[0].c[i] = sp[0].c[0] - i; - sp->c = &sp[0].c[i]; - } - } - break; - case OP_LTRIM_OPT: { sp--; if (sp[1].c[0] != 1) @@ -993,50 +1034,30 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, } break; case OP_RTRIM: - assert (' ' != 0); - while (sp[0].c[sp[0].c[0]] == ' ') - sp[0].c[0]--; - break; - case OP_RTRIM_OPT: sp--; if (sp[1].c[0] != 1) sp[0].c[0] = 0; else { - /* Note that NULs are not allowed in strings. This code - needs to change if this decision is changed. */ int cmp = sp[1].c[1]; - while (sp[0].c[sp[0].c[0]] == cmp) + while (sp[0].c[0] > 0 && sp[0].c[sp[0].c[0]] == cmp) sp[0].c[0]--; } break; case OP_NUMBER: { struct data_in di; - - di.s = &sp->c[1]; - di.e = &sp->c[1] + sp->c[0]; - di.v = sp; - di.flags = DI_IGNORE_ERROR; - di.f1 = 1; - di.format.type = FMT_F; - di.format.w = sp->c[0]; - di.format.d = 0; - data_in (&di); - } - break; - case OP_NUMBER_OPT: - { - struct data_in di; + union value out; di.s = &sp->c[1]; - di.e = &sp->c[1] + sp->c[0]; - di.v = sp; - di.flags = DI_IGNORE_ERROR; + di.v = &out; + di.flags = 0; di.f1 = 1; di.format.type = *op++; di.format.w = *op++; di.format.d = *op++; + di.e = &sp->c[1] + min (sp->c[0], di.format.w); data_in (&di); + sp->f = out.f; } break; case OP_STRING: @@ -1056,7 +1077,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp->c = dest; } break; - case OP_SUBSTR: + case OP_SUBSTR_2: { int index; @@ -1072,7 +1093,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, } } break; - case OP_SUBSTR_OPT: + case OP_SUBSTR_3: { int index; int n; @@ -1098,10 +1119,6 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, break; /* Artificial. */ - case OP_INV: - if (sp->f != SYSMIS) - sp->f = 1. / sp->f; - break; case OP_SQUARE: if (sp->f != SYSMIS) sp->f *= sp->f; @@ -1144,10 +1161,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, sp->f *= rng_get_double (pspp_rng ()); break; case OP_SYSMIS: - if (sp[0].f == SYSMIS || !finite (sp[0].f)) - sp->f = 1.0; - else - sp->f = 0.0; + sp->f = sp->f == SYSMIS || !finite (sp->f); break; case OP_VEC_ELEM_NUM: { @@ -1270,12 +1284,6 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, assert (c != NULL); sp->f = c->data[*op++].f == SYSMIS; break; - case OP_STR_MIS: - sp++; - assert (c != NULL); - sp->f = is_str_user_missing (c->data[(*vars)->fv].s, *vars); - vars++; - break; case OP_NUM_VAL: sp++; assert (c != NULL); @@ -1296,7 +1304,7 @@ expr_evaluate (struct expression *e, const struct ccase *c, int case_num, main_loop: ; } finished: - if (e->type != EX_STRING) + if (e->type != EXPR_STRING) { double value = sp->f; if (!finite (value)) diff --git a/src/expr-opt.c b/src/expr-opt.c index 2e65b3e3..5f574f99 100644 --- a/src/expr-opt.c +++ b/src/expr-opt.c @@ -35,223 +35,109 @@ #include "str.h" #include "var.h" -/* - Expression "optimizer" +static void evaluate_tree_no_missing (union any_node **); +static void evaluate_tree_with_missing (union any_node **, size_t count); +static void optimize_tree (union any_node **); - Operates on the tree representation of expressions. - optimize_expression() performs the optimizations listed below: +static void collapse_node (union any_node **node, size_t child_idx); +static void set_number (union any_node **node, double); +static void set_number_errno (union any_node **node, double); +static void set_string (union any_node **node, const char *, size_t); - 1. Constant folding - Any operation with constant operands is replaced by its value. - (Exception: random-number-generator functions.) - - 2. Strength reduction (x is any expression; a is a numeric constant) - x/0 => SYSMIS - x*0 => 0 - x**0 => 1 - x**1, x+0, x-0, x*1 => x - x**2 => sqr(x) - x/a => x*(1/a) (where 1/a is evaluated at optimization time) - - I thought about adding additional optimizations but decided that what - is here could already be considered overkill. - */ - -static struct nonterm_node *evaluate_tree (struct nonterm_node * n); -static struct nonterm_node *optimize_tree (struct nonterm_node * n); - -struct nonterm_node * -optimize_expression (struct nonterm_node * n) +void +optimize_expression (union any_node **node) { + int nonconst = 0; /* Number of nonconstant children. */ + int sysmis = 0; /* Number of system-missing children. */ + struct nonterm_node *nonterm; int i; - /* Set to 1 if a child is nonconstant. */ - int nonconst = 0; - - /* Number of system-missing children. */ - int sysmis = 0; - /* We can't optimize a terminal node. */ - if (n->type > OP_TERMINAL) - return n; + if (IS_TERMINAL ((*node)->type)) + return; + nonterm = &(*node)->nonterm; /* Start by optimizing all the children. */ - for (i = 0; i < n->n; i++) + for (i = 0; i < nonterm->n; i++) { - n->arg[i] = ((union any_node *) - optimize_expression ((struct nonterm_node *) n->arg[i])); - if (n->arg[i]->type == OP_NUM_CON) + optimize_expression (&nonterm->arg[i]); + if (nonterm->arg[i]->type == OP_NUM_CON) { - if (n->arg[i]->num_con.value == SYSMIS) + if (nonterm->arg[i]->num_con.value == SYSMIS) sysmis++; } - else if (n->arg[i]->type != OP_STR_CON) - nonconst = 1; + else if (nonterm->arg[i]->type != OP_STR_CON) + nonconst++; } - if (sysmis && !(ops[n->type].flags & OP_ABSORB_MISS)) - /* Just about any operation produces SYSMIS when given any SYSMIS - arguments. */ + if (sysmis && !(ops[nonterm->type].flags & OP_ABSORB_MISS)) + { + /* Most operation produce SYSMIS given any SYSMIS + argument. */ + set_number (node, SYSMIS); + } + else if (!nonconst) + { + /* Evaluate constant expressions. */ + if (!sysmis) + evaluate_tree_no_missing (node); + else + evaluate_tree_with_missing (node, sysmis); + } + else { - struct num_con_node *num = xmalloc (sizeof *num); - free_node ((union any_node *) n); - num->type = OP_NUM_CON; - num->value = SYSMIS; - n = (struct nonterm_node *) num; + /* A few optimization possibilities are still left. */ + optimize_tree (node); } - else if (!nonconst) - /* If all the children of this node are constants, then there are - obvious optimizations. */ - n = evaluate_tree (n); - else - /* Otherwise, we may be able to make certain optimizations - anyway. */ - n = optimize_tree (n); - return n; } -static struct nonterm_node *repl_num_con (struct nonterm_node *, double); -static struct nonterm_node *force_repl_num_con (struct nonterm_node *, double); -static struct nonterm_node *repl_str_con (struct nonterm_node *, char *, int); - -#define n0 n->arg[0]->num_con.value -#define n1 n->arg[1]->num_con.value -#define n2 n->arg[2]->num_con.value - -#define s0 n->arg[0]->str_con.s -#define s0l n->arg[0]->str_con.len -#define s1 n->arg[1]->str_con.s -#define s1l n->arg[1]->str_con.len -#define s2 n->arg[2]->str_con.s -#define s2l n->arg[2]->str_con.len -#define s(X) n->arg[X]->str_con.s -#define sl(X) n->arg[X]->str_con.len - -static struct nonterm_node * -optimize_tree (struct nonterm_node * n) +static int +eq_num_con (union any_node *node, double number) { - int i; - - errno = 0; - if (n->type == OP_PLUS || n->type == OP_MUL) - { - /* Default constant value. */ - double def = n->type == OP_MUL ? 1.0 : 0.0; - - /* Total value of all the constants. */ - double cval = def; - - /* Number of nonconst arguments. */ - int nvar = 0; - - /* New node. */ - struct nonterm_node *m = NULL; - - /* Argument copying counter. */ - int c; - - /* 1=SYSMIS encountered */ - int sysmis = 0; - - for (i = 0; i < n->n; i++) - if (n->arg[i]->type == OP_NUM_CON) - { - if (n->arg[i]->num_con.value != SYSMIS) - { - if (n->type == OP_MUL) - cval *= n->arg[i]->num_con.value; - else - cval += n->arg[i]->num_con.value; - } - else - sysmis++; - } - else - nvar++; - - /* 0*SYSMIS=0, 0/SYSMIS=0; otherwise, SYSMIS and infinities - produce SYSMIS. */ - if (cval == 0.0 && n->type == OP_MUL) - nvar = 0; - else if (sysmis || !finite (cval)) - { - nvar = 0; - cval = SYSMIS; - } - - /* If no nonconstant terms, replace with a constant node. */ - if (nvar == 0) - return force_repl_num_con (n, cval); + return node->type == OP_NUM_CON && node->num_con.value == number; +} - if (nvar == 1 && cval == def) - { - /* If there is exactly one nonconstant term and no constant - terms, replace with the nonconstant term. */ - for (i = 0; i < n->n; i++) - if (n->arg[i]->type != OP_NUM_CON) - m = (struct nonterm_node *) n->arg[i]; - else - free_node (n->arg[i]); - } - else - { - /* Otherwise consolidate all the nonconstant terms. */ - m = xmalloc (sizeof (struct nonterm_node) - + ((nvar + (cval != def) - 1) - * sizeof (union any_node *))); - for (i = c = 0; i < n->n; i++) - if (n->arg[i]->type != OP_NUM_CON) - m->arg[c++] = n->arg[i]; - else - free_node (n->arg[i]); - - if (cval != def) - { - m->arg[c] = xmalloc (sizeof (struct num_con_node)); - m->arg[c]->num_con.type = OP_NUM_CON; - m->arg[c]->num_con.value = cval; - c++; - } - - m->type = n->type; - m->n = c; - } - free (n); - n = m; - } - else if (n->type == OP_POW) +static void +optimize_tree (union any_node **node) +{ + struct nonterm_node *n = &(*node)->nonterm; + + /* x+0, x-0, 0+x => x. */ + if ((n->type == OP_ADD || n->type == OP_SUB) && eq_num_con (n->arg[1], 0.)) + collapse_node (node, 1); + else if (n->type == OP_ADD && eq_num_con (n->arg[0], 0.)) + collapse_node (node, 0); + + /* x*1, x/1, 1*x => x. */ + else if ((n->type == OP_MUL || n->type == OP_DIV) + && eq_num_con (n->arg[1], 1.)) + collapse_node (node, 0); + else if (n->type == OP_MUL && eq_num_con (n->arg[0], 1.)) + collapse_node (node, 1); + + /* 0*x, 0/x, x*0, MOD(0,x) => x. */ + else if (((n->type == OP_MUL || n->type == OP_DIV || n->type == OP_MOD) + && eq_num_con (n->arg[0], 0.)) + || (n->type == OP_MUL && eq_num_con (n->arg[1], 0.))) + set_number (node, 0.); + + /* x**1 => x. */ + else if (n->type == OP_POW && eq_num_con (n->arg[1], 1)) + collapse_node (node, 0); + + /* x**2 => SQUARE(x). */ + else if (n->type == OP_POW && eq_num_con (n->arg[2], 2)) { - if (n->arg[1]->type == OP_NUM_CON) - { - if (n1 == 1.0) - { - struct nonterm_node *m = (struct nonterm_node *) n->arg[0]; - - free_node (n->arg[1]); - free (n); - return m; - } - else if (n1 == 2.0) - { - n = xrealloc (n, sizeof (struct nonterm_node)); - n->type = OP_SQUARE; - n->n = 1; - } - } + n->type = OP_SQUARE; + n->n = 1; } - return n; } -#define rnc(D) \ - (n = repl_num_con (n, D)) - -#define frnc(D) \ - (n = force_repl_num_con (n, D)) - /* Finds the first NEEDLE of length NEEDLE_LEN in a HAYSTACK of length HAYSTACK_LEN. Returns a 1-based index, 0 on failure. */ -static inline int -str_search (char *haystack, int haystack_len, char *needle, int needle_len) +static int +str_search (const char *haystack, int haystack_len, + const char *needle, int needle_len) { char *p = memmem (haystack, haystack_len, needle, needle_len); return p ? p - haystack + 1 : 0; @@ -259,182 +145,199 @@ str_search (char *haystack, int haystack_len, char *needle, int needle_len) /* Finds the last NEEDLE of length NEEDLE_LEN in a HAYSTACK of length HAYSTACK_LEN. Returns a 1-based index, 0 on failure. */ -static inline int -str_rsearch (char *haystack, int haystack_len, char *needle, int needle_len) +static int +str_rsearch (const char *haystack, int haystack_len, + const char *needle, int needle_len) { char *p = mm_find_reverse (haystack, haystack_len, needle, needle_len); return p ? p - haystack + 1 : 0; } -static struct nonterm_node * -evaluate_tree (struct nonterm_node * n) +static void +evaluate_tree_no_missing (union any_node **node) { - static char *strbuf; - int add; - int len; + struct nonterm_node *n = &(*node)->nonterm; + double num[3]; + char *str[3]; + size_t str_len[3]; int i; - if (!strbuf) - strbuf = xmalloc (256); errno = 0; + for (i = 0; i < n->n && i < 3; i++) + { + union any_node *arg = n->arg[i]; + + if (arg->type == OP_NUM_CON) + num[i] = arg->num_con.value; + else if (arg->type == OP_STR_CON) + { + str[i] = arg->str_con.s; + str_len[i] = arg->str_con.len; + } + } + switch (n->type) { - case OP_PLUS: + case OP_ADD: + set_number (node, num[0] + num[1]); + break; + + case OP_SUB: + set_number (node, num[0] - num[1]); + break; + case OP_MUL: - return optimize_tree (n); + set_number (node, num[0] * num[1]); + break; + + case OP_DIV: + if (num[1] != 0.) + set_number (node, num[0] / num[1]); + break; case OP_POW: - if (n0 == 0.0 && n1 == 0.0) - frnc (SYSMIS); - else if (n0 == SYSMIS && n1 == 0.0) - frnc (1.0); - else if (n0 == 0.0 && n1 == SYSMIS) - frnc (0.0); + if (num[0] == 0. && num[1] == 0.) + set_number (node, SYSMIS); else - rnc (pow (n0, n1)); + set_number_errno (node, pow (num[0], num[1])); break; case OP_AND: - if (n0 == 0.0 || n1 == 0.0) - frnc (0.0); - else if (n0 == SYSMIS || n1 == SYSMIS) - frnc (SYSMIS); - else - frnc (1.0); + set_number (node, num[0] && num[1]); break; + case OP_OR: - if (n0 == 1.0 || n1 == 1.0) - frnc (1.0); - else if (n0 == SYSMIS || n1 == SYSMIS) - frnc (SYSMIS); - else - frnc (0.0); + set_number (node, num[0] || num[1]); break; + case OP_NOT: - rnc (n0 == 0.0 ? 1.0 : 0.0); + set_number (node, !num[0]); break; case OP_EQ: - rnc (n0 == n1); + set_number (node, num[0] == num[1]); break; case OP_GE: - rnc (n0 >= n1); + set_number (node, num[0] >= num[1]); break; case OP_GT: - rnc (n0 > n1); + set_number (node, num[0] > num[1]); break; case OP_LE: - rnc (n0 <= n1); + set_number (node, num[0] <= num[1]); break; case OP_LT: - rnc (n0 < n1); + set_number (node, num[0] < num[1]); break; case OP_NE: - rnc (n0 != n1); + set_number (node, num[0] != num[1]); break; /* String operators. */ - case OP_STRING_EQ: - rnc (st_compare_pad (s0, s0l, s1, s1l) == 0); + case OP_EQ_STRING: + set_number (node, st_compare_pad (str[0], str_len[0], + str[1], str_len[1]) == 0); break; - case OP_STRING_GE: - rnc (st_compare_pad (s0, s0l, s1, s1l) >= 0); + case OP_GE_STRING: + set_number (node, st_compare_pad (str[0], str_len[0], + str[1], str_len[1]) >= 0); break; - case OP_STRING_GT: - rnc (st_compare_pad (s0, s0l, s1, s1l) > 0); + case OP_GT_STRING: + set_number (node, st_compare_pad (str[0], str_len[0], + str[1], str_len[1]) > 0); break; - case OP_STRING_LE: - rnc (st_compare_pad (s0, s0l, s1, s1l) <= 0); + case OP_LE_STRING: + set_number (node, st_compare_pad (str[0], str_len[0], + str[1], str_len[1]) <= 0); break; - case OP_STRING_LT: - rnc (st_compare_pad (s0, s0l, s1, s1l) < 0); + case OP_LT_STRING: + set_number (node, st_compare_pad (str[0], str_len[0], + str[1], str_len[1]) < 0); break; - case OP_STRING_NE: - rnc (st_compare_pad (s0, s0l, s1, s1l) != 0); + case OP_NE_STRING: + set_number (node, st_compare_pad (str[0], str_len[0], + str[1], str_len[1]) != 0); break; /* Unary functions. */ case OP_NEG: - rnc (-n0); + set_number (node, -num[0]); break; case OP_ABS: - rnc (fabs (n0)); + set_number (node, fabs (num[0])); break; case OP_ARCOS: - rnc (acos (n0)); + set_number_errno (node, acos (num[0])); break; case OP_ARSIN: - rnc (asin (n0)); + set_number_errno (node, asin (num[0])); break; case OP_ARTAN: - rnc (atan (n0)); + set_number_errno (node, atan (num[0])); break; case OP_COS: - rnc (cos (n0)); + set_number_errno (node, cos (num[0])); break; case OP_EXP: - rnc (exp (n0)); + set_number_errno (node, exp (num[0])); break; case OP_LG10: - rnc (log10 (n0)); + set_number_errno (node, log10 (num[0])); break; case OP_LN: - rnc (log (n0)); + set_number_errno (node, log (num[0])); break; case OP_MOD10: - rnc (fmod (n0, 10)); + set_number_errno (node, fmod (num[0], 10)); break; case OP_RND: - rnc (n0 >= 0.0 ? floor (n0 + 0.5) : -floor (-n0 + 0.5)); + if (num[0] >= 0.0) + set_number_errno (node, floor (num[0] + 0.5)); + else + set_number_errno (node, -floor (-num[0] + 0.5)); break; case OP_SIN: - rnc (sin (n0)); + set_number_errno (node, sin (num[0])); break; case OP_SQRT: - rnc (sqrt (n0)); + set_number_errno (node, sqrt (num[0])); break; case OP_TAN: - rnc (tan (n0)); + set_number_errno (node, tan (num[0])); break; case OP_TRUNC: - rnc (n0 >= 0.0 ? floor (n0) : -floor (-n0)); + if (num[0] >= 0.0) + set_number_errno (node, floor (num[0])); + else + set_number_errno (node, -floor (-num[0])); break; /* N-ary numeric functions. */ case OP_ANY: - if (n0 == SYSMIS) - frnc (SYSMIS); - else - { - int sysmis = 1; - double ni; - - for (i = 1; i < n->n; i++) - { - ni = n->arg[i]->num_con.value; - if (n0 == ni) - { - frnc (1.0); - goto any_done; - } - if (ni != SYSMIS) - sysmis = 0; - } - frnc (sysmis ? SYSMIS : 0.0); - } - any_done: - break; - case OP_ANY_STRING: - for (i = 1; i < n->n; i++) - if (!st_compare_pad (n->arg[0]->str_con.s, n->arg[0]->str_con.len, - n->arg[i]->str_con.s, n->arg[i]->str_con.len)) - { - frnc (1.0); - goto any_string_done; - } - frnc (0.0); - any_string_done: + { + double result = 0.0; + for (i = 1; i < n->n; i++) + if (num[0] == n->arg[i]->num_con.value) + { + result = 1.0; + break; + } + set_number (node, result); + } + break; + case OP_ANY_STRING: + { + double result = 0.0; + for (i = 1; i < n->n; i++) + if (!st_compare_pad (n->arg[0]->str_con.s, n->arg[0]->str_con.len, + n->arg[i]->str_con.s, n->arg[i]->str_con.len)) + { + result = 1.0; + break; + } + set_number (node, result); + } break; case OP_CFVAR: @@ -446,507 +349,397 @@ evaluate_tree (struct nonterm_node * n) case OP_SD: case OP_SUM: case OP_VARIANCE: + /* FIXME */ + break; + + case OP_RANGE: { - double d[2] = - {0.0, 0.0}; /* sum, sum of squares */ - double min = DBL_MAX; /* minimum value */ - double max = -DBL_MAX; /* maximum value */ - double ni; /* value of i'th argument */ - int nv = 0; /* number of valid arguments */ - - for (i = 0; i < n->n; i++) - { - ni = n->arg[i]->num_con.value; - if (ni != SYSMIS) - { - nv++; - d[0] += ni; - d[1] += ni * ni; - if (ni < min) - min = ni; - if (ni > max) - max = ni; - } - } - if (n->type == OP_NMISS) - frnc (i - nv); - else if (n->type == OP_NVALID) - frnc (nv); - else if (nv >= (int) n->arg[i]) - { - switch (n->type) - { - case OP_CFVAR: - frnc (calc_cfvar (d, nv)); - break; - case OP_MAX: - frnc (max); - break; - case OP_MEAN: - frnc (calc_mean (d, nv)); - break; - case OP_MIN: - frnc (min); - break; - case OP_SD: - frnc (calc_stddev (calc_variance (d, nv))); - break; - case OP_SUM: - frnc (d[0]); - break; - case OP_VARIANCE: - frnc (calc_variance (d, nv)); - break; - } - } - else - frnc (SYSMIS); + double result = 0.0; + + for (i = 1; i < n->n; i += 2) + { + double min = n->arg[i]->num_con.value; + double max = n->arg[i + 1]->num_con.value; + if (num[0] >= min && num[0] <= max) + { + result = 1.0; + break; + } + } + set_number (node, result); } break; - case OP_RANGE: - if (n0 == SYSMIS) - frnc (SYSMIS); - else - { - double min, max; - int sysmis = 1; - - for (i = 1; i < n->n; i += 2) - { - min = n->arg[i]->num_con.value; - max = n->arg[i + 1]->num_con.value; - if (min == SYSMIS || max == SYSMIS) - continue; - sysmis = 0; - if (n0 >= min && n0 <= max) - { - frnc (1.0); - goto range_done; - } - } - frnc (sysmis ? SYSMIS : 0.0); - } - range_done: - break; - case OP_RANGE_STRING: - for (i = 1; i < n->n; i += 2) - if (st_compare_pad (n->arg[0]->str_con.s, n->arg[0]->str_con.len, - n->arg[i]->str_con.s, n->arg[i]->str_con.len) >= 0 - && st_compare_pad (n->arg[0]->str_con.s, n->arg[0]->str_con.len, - n->arg[i + 1]->str_con.s, - n->arg[i + 1]->str_con.len) <= 0) - { - frnc (1.0); - goto range_str_done; - } - frnc (0.0); - range_str_done: - break; - /* Time function. */ + case OP_RANGE_STRING: + { + double result = 0.0; + + for (i = 1; i < n->n; i += 2) + { + const char *min = n->arg[i]->str_con.s; + size_t min_len = n->arg[i]->str_con.len; + const char *max = n->arg[i + 1]->str_con.s; + size_t max_len = n->arg[i + 1]->str_con.len; + + if (st_compare_pad (str[0], str_len[0], min, min_len) >= 0 + && st_compare_pad (str[0], str_len[0], max, max_len) <= 0) + { + result = 1.0; + break; + } + } + set_number (node, result); + break; + } + + /* Time functions. */ case OP_TIME_HMS: - rnc (60. * (60. * n0 + n1) + n2); + { + double min, max; + min = min (num[0], min (num[1], num[2])); + max = max (num[0], max (num[1], num[2])); + if (min < 0. && max > 0.) + break; + set_number (node, 60. * (60. * num[0] + num[1]) + num[2]); + } + break; + case OP_CTIME_DAYS: + set_number (node, num[0] / (60. * 60. * 24.)); + break; + case OP_CTIME_HOURS: + set_number (node, num[0] / (60. * 60.)); + break; + case OP_CTIME_MINUTES: + set_number (node, num[0] / 60.); + break; + case OP_TIME_DAYS: + set_number (node, num[0] * (60. * 60. * 24.)); + break; + case OP_CTIME_SECONDS: + set_number (node, num[0]); break; /* Date construction functions. */ case OP_DATE_DMY: - rnc (60. * 60. * 24. * yrmoda (n2, n1, n0)); + set_number (node, 60. * 60. * 24. * yrmoda (num[2], num[1], num[0])); break; case OP_DATE_MDY: - rnc (60. * 60. * 24. * yrmoda (n2, n0, n1)); + set_number (node, 60. * 60. * 24. * yrmoda (num[2], num[0], num[1])); break; case OP_DATE_MOYR: - rnc (60. * 60. * 24. * yrmoda (n1, n0, 1)); + set_number (node, 60. * 60. * 24. * yrmoda (num[1], num[0], 1)); break; case OP_DATE_QYR: - rnc (60. * 60. * 24. * yrmoda (n1, 3 * (int) n0 - 2, 1)); + set_number (node, + 60. * 60. * 24. * yrmoda (num[1], 3 * (int) num[0] - 2, 1)); break; case OP_DATE_WKYR: { - double t = yrmoda (n1, 1, 1); + double t = yrmoda (num[1], 1, 1); + if (num[0] < 0. || num[0] > 53.) + break; if (t != SYSMIS) - t = 60. * 60. * 24. * (t + 7. * (n0 - 1)); - rnc (t); + t = 60. * 60. * 24. * (t + 7. * (num[0] - 1)); + set_number (node, t); } break; case OP_DATE_YRDAY: { - double t = yrmoda (n0, 1, 1); + double t = yrmoda (num[0], 1, 1); if (t != SYSMIS) - t = 60. * 60. * 24. * (t + n0 - 1); - rnc (t); + t = 60. * 60. * 24. * (t + num[1] - 1); + set_number (node, t); } break; case OP_YRMODA: - rnc (yrmoda (n0, n1, n2)); + set_number (node, yrmoda (num[0], num[1], num[2])); break; + /* Date extraction functions. */ case OP_XDATE_DATE: - rnc (floor (n0 / 60. / 60. / 24.) * 60. * 60. * 24.); + set_number_errno (node, + floor (num[0] / 60. / 60. / 24.) * 60. * 60. * 24.); break; case OP_XDATE_HOUR: - rnc (fmod (floor (n0 / 60. / 60.), 24.)); + set_number_errno (node, fmod (floor (num[0] / 60. / 60.), 24.)); break; case OP_XDATE_JDAY: - rnc (julian_to_jday (n0 / 86400.)); + set_number (node, julian_to_jday (num[0] / 86400.)); break; case OP_XDATE_MDAY: { int day; - julian_to_calendar (n0 / 86400., NULL, NULL, &day); - rnc (day); + julian_to_calendar (num[0] / 86400., NULL, NULL, &day); + set_number (node, day); } break; case OP_XDATE_MINUTE: - rnc (fmod (floor (n0 / 60.), 60.)); + set_number_errno (node, fmod (floor (num[0] / 60.), 60.)); break; case OP_XDATE_MONTH: { int month; - julian_to_calendar (n0 / 86400., NULL, &month, NULL); - rnc (month); + julian_to_calendar (num[0] / 86400., NULL, &month, NULL); + set_number (node, month); } break; case OP_XDATE_QUARTER: { int month; - julian_to_calendar (n0 / 86400., NULL, &month, NULL); - rnc ((month - 1) / 3 + 1); + julian_to_calendar (num[0] / 86400., NULL, &month, NULL); + set_number (node, (month - 1) / 3 + 1); } break; case OP_XDATE_SECOND: - rnc (fmod (n0, 60.)); + set_number_errno (node, fmod (num[0], 60.)); break; case OP_XDATE_TDAY: - rnc (floor (n0 / 60. / 60. / 24.)); + set_number_errno (node, floor (num[0] / 60. / 60. / 24.)); break; case OP_XDATE_TIME: - rnc (n0 - floor (n0 / 60. / 60. / 24.) * 60. * 60. * 24.); + set_number_errno (node, num[0] - (floor (num[0] / 60. / 60. / 24.) + * 60. * 60. * 24.)); break; case OP_XDATE_WEEK: - rnc ((julian_to_jday (n0) - 1) / 7 + 1); + set_number (node, (julian_to_jday (num[0]) - 1) / 7 + 1); break; case OP_XDATE_WKDAY: - rnc (julian_to_wday (n0)); + set_number (node, julian_to_wday (num[0])); break; case OP_XDATE_YEAR: { int year; - julian_to_calendar (n0 / 86400., &year, NULL, NULL); - rnc (year); + julian_to_calendar (num[0] / 86400., &year, NULL, NULL); + set_number (node, year); } break; /* String functions. */ case OP_CONCAT: { - len = s0l; - memcpy (strbuf, s0, len); + char string[256]; + int length = str_len[0]; + memcpy (string, str[0], length); for (i = 1; i < n->n; i++) { - add = sl (i); - if (add + len > 255) - add = 255 - len; - memcpy (&strbuf[len], s (i), add); - len += add; + int add = n->arg[i]->str_con.len; + if (add + length > 255) + add = 255 - length; + memcpy (&string[length], n->arg[i]->str_con.s, add); + length += add; } - n = repl_str_con (n, strbuf, len); + set_string (node, string, length); } break; - case OP_INDEX: - rnc (s1l ? str_search (s0, s0l, s1, s1l) : SYSMIS); - break; - case OP_INDEX_OPT: - if (n2 == SYSMIS || (int) n2 <= 0 || s1l % (int) n2) - { - msg (SW, _("While optimizing a constant expression, there was " - "a bad value for the third argument to INDEX.")); - frnc (SYSMIS); - } - else - { - int pos = 0; - int c = s1l / (int) n2; - int r; - - for (i = 0; i < c; i++) - { - r = str_search (s0, s0l, s (i), sl (i)); - if (r < pos || pos == 0) - pos = r; - } - frnc (pos); - } - break; - case OP_RINDEX: - rnc (str_rsearch (s0, s0l, s1, s1l)); - break; - case OP_RINDEX_OPT: - if (n2 == SYSMIS || (int) n2 <= 0 || s1l % (int) n2) - { - msg (SE, _("While optimizing a constant expression, there was " - "a bad value for the third argument to RINDEX.")); - frnc (SYSMIS); - } - else - { - int pos = 0; - int c = s1l / n2; - int r; - - for (i = 0; i < c; i++) - { - r = str_rsearch (s0, s0l, s (i), sl (i)); - if (r > pos) - pos = r; - } - frnc (pos); - } + case OP_INDEX_2: + case OP_INDEX_3: + case OP_RINDEX_2: + case OP_RINDEX_3: + { + int result, chunk_width, chunk_cnt; + + if (n->type == OP_INDEX_2 || n->type == OP_RINDEX_2) + chunk_width = str_len[1]; + else + chunk_width = num[2]; + if (chunk_width <= 0 || chunk_width > str_len[1] + || str_len[1] % chunk_width != 0) + break; + chunk_cnt = str_len[1] / chunk_width; + + result = 0; + for (i = 0; i < chunk_cnt; i++) + { + const char *chunk = str[1] + chunk_width * i; + int ofs; + if (n->type == OP_INDEX_2 || n->type == OP_INDEX_3) + { + ofs = str_search (str[0], str_len[0], chunk, chunk_width); + if (ofs < result || result == 0) + result = ofs; + } + else + { + ofs = str_rsearch (str[0], str_len[0], chunk, chunk_width); + if (ofs > result) + result = ofs; + } + } + set_number (node, result); + } break; case OP_LENGTH: - frnc (s0l); + set_number (node, str_len[0]); break; case OP_LOWER: { char *cp; - for (cp = &s0[s0l]; cp >= s0; cp--) - *cp = tolower ((unsigned char) (*cp)); - n = repl_str_con (n, s0, s0l); + for (cp = str[0]; cp < str[0] + str_len[0]; cp++) + *cp = tolower ((unsigned char) *cp); } break; case OP_UPPER: { char *cp; - for (cp = &s0[s0[0] + 1]; cp > s0; cp--) - *cp = toupper ((unsigned char) (*cp)); - n = repl_str_con (n, s0, s0l); + for (cp = str[0]; cp < str[0] + str_len[0]; cp++) + *cp = toupper ((unsigned char) *cp); } break; case OP_LPAD: - case OP_LPAD_OPT: case OP_RPAD: - case OP_RPAD_OPT: { - int c; - - if (n1 == SYSMIS) - { - n = repl_str_con (n, NULL, 0); - break; - } - len = n1; - len = range (len, 1, 255); - add = max (n1 - s0l, 0); - - if (n->type == OP_LPAD_OPT || n->type == OP_RPAD_OPT) - { - if (s2l < 1) - { - c = n->type == OP_LPAD_OPT ? 'L' : 'R'; - msg (SE, _("Third argument to %cPAD() must be at least one " - "character in length."), c); - c = ' '; - } - else - c = s2[0]; - } - else - c = ' '; - - if (n->type == OP_LPAD || n->type == OP_LPAD_OPT) - memmove (&s0[add], s0, len); - if (n->type == OP_LPAD || n->type == OP_LPAD_OPT) - memset (s0, c, add); - else - memset (&s0[s0l], c, add); - - n = repl_str_con (n, s0, len); + char string[256]; + int len, pad_len; + char pad_char; + + /* Target length. */ + len = num[1]; + if (len < 1 || len > 255) + break; + + /* Pad character. */ + if (str_len[2] != 1) + break; + pad_char = str[2][0]; + + if (str_len[0] >= len) + len = str_len[0]; + pad_len = len - str_len[0]; + if (n->type == OP_LPAD) + { + memset (string, pad_char, pad_len); + memcpy (string + pad_len, str[0], str_len[0]); + } + else + { + memcpy (string, str[0], str_len[0]); + memset (string + str_len[0], pad_char, pad_len); + } + + set_string (node, string, len); } break; case OP_LTRIM: - case OP_LTRIM_OPT: case OP_RTRIM: - case OP_RTRIM_OPT: { - int c = ' '; - char *cp = s0; - - if (n->type == OP_LTRIM_OPT || n->type == OP_RTRIM_OPT) - { - if (s1l < 1) - { - c = n->type == OP_LTRIM_OPT ? 'L' : 'R'; - msg (SE, _("Second argument to %cTRIM() must be at least one " - "character in length."), c); - } - else - c = s1[0]; - } - len = s0l; - if (n->type == OP_LTRIM || n->type == OP_LTRIM_OPT) - { - while (*cp == c && cp < &s0[len]) - cp++; - len -= cp - s0; - } + char pad_char; + const char *cp = str[0]; + int len = str_len[0]; + + /* Pad character. */ + if (str_len[1] != 1) + break; + pad_char = str[1][0]; + + if (n->type == OP_LTRIM) + while (len > 0 && *cp == pad_char) + cp++, len--; else - while (len > 0 && s0[len - 1] == c) + while (len > 0 && str[0][len - 1] == pad_char) len--; - n = repl_str_con (n, cp, len); - } - break; - case OP_NUMBER: - case OP_NUMBER_OPT: - { - union value v; - struct data_in di; - - di.s = s0; - di.e = s0 + s0l; - di.v = &v; - di.flags = DI_IGNORE_ERROR; - di.f1 = 1; - - if (n->type == OP_NUMBER_OPT) - { - di.format.type = (int) n->arg[1]; - di.format.w = (int) n->arg[2]; - di.format.d = (int) n->arg[3]; - } - else - { - di.format.type = FMT_F; - di.format.w = s0l; - di.format.d = 0; - } - - data_in (&di); - frnc (v.f); - } - break; - case OP_STRING: - { - union value v; - struct fmt_spec f; - f.type = (int) n->arg[1]; - f.w = (int) n->arg[2]; - f.d = (int) n->arg[3]; - v.f = n0; - - assert ((formats[f.type].cat & FCAT_STRING) == 0); - data_out (strbuf, &f, &v); - n = repl_str_con (n, strbuf, f.w); + set_string (node, cp, len); } break; - case OP_SUBSTR: - case OP_SUBSTR_OPT: + case OP_SUBSTR_2: + case OP_SUBSTR_3: { - int pos = (int) n1; - if (pos > s0l || pos <= 0 || n1 == SYSMIS - || (n->type == OP_SUBSTR_OPT && n2 == SYSMIS)) - n = repl_str_con (n, NULL, 0); + int pos = (int) num[1]; + if (pos > str_len[0] || pos <= 0 || num[1] == SYSMIS + || (n->type == OP_SUBSTR_3 && num[2] == SYSMIS)) + set_string (node, NULL, 0); else { - if (n->type == OP_SUBSTR_OPT) + int len; + if (n->type == OP_SUBSTR_3) { - len = (int) n2; - if (len + pos - 1 > s0l) - len = s0l - pos + 1; + len = (int) num[2]; + if (len + pos - 1 > str_len[0]) + len = str_len[0] - pos + 1; } else - len = s0l - pos + 1; - n = repl_str_con (n, &s0[pos - 1], len); + len = str_len[0] - pos + 1; + set_string (node, &str[0][pos - 1], len); } } break; /* Weirdness. */ - case OP_INV: - rnc (1.0 / n0); - break; case OP_MOD: - if (n0 == 0.0 && n1 == SYSMIS) - frnc (0.0); + if (num[0] == 0.0 && num[1] == SYSMIS) + set_number (node, 0.0); else - rnc (fmod (n0, n1)); + set_number (node, fmod (num[0], num[1])); break; case OP_NUM_TO_BOOL: - if (n0 == 0.0) - n0 = 0.0; - else if (n0 == 1.0) - n0 = 1.0; - else if (n0 != SYSMIS) + if (num[0] == 0.0) + num[0] = 0.0; + else if (num[0] == 1.0) + num[0] = 1.0; + else if (num[0] != SYSMIS) { msg (SE, _("When optimizing a constant expression, an integer " "that was being used as an Boolean value was found " "to have a constant value other than 0, 1, or SYSMIS.")); - n0 = 0.0; + num[0] = 0.0; } - rnc (n0); + set_number (node, num[0]); break; } - return n; } -#undef n0 -#undef n1 -#undef n2 - -#undef s0 -#undef s0l -#undef s1 -#undef s1l -#undef s2 -#undef s2l -#undef s -#undef sl - -#undef rnc -#undef frnc +static void +evaluate_tree_with_missing (union any_node **node UNUSED, size_t count UNUSED) +{ + /* FIXME */ +} -static struct nonterm_node * -repl_num_con (struct nonterm_node * n, double d) +static void +collapse_node (union any_node **node, size_t child_idx) { - int i; - if (!finite (d) || errno) - d = SYSMIS; - else - for (i = 0; i < n->n; i++) - if (n->arg[i]->type == OP_NUM_CON && n->arg[i]->num_con.value == SYSMIS) - { - d = SYSMIS; - break; - } - return force_repl_num_con (n, d); + struct nonterm_node *nonterm = &(*node)->nonterm; + union any_node *child; + + child = nonterm->arg[child_idx]; + nonterm->arg[child_idx] = NULL; + free_node (*node); + *node = child; } -static struct nonterm_node * -force_repl_num_con (struct nonterm_node * n, double d) + +static void +set_number (union any_node **node, double value) { struct num_con_node *num; + + free_node (*node); - if (!finite (d) || errno) - d = SYSMIS; - free_node ((union any_node *) n); - num = xmalloc (sizeof *num); + *node = xmalloc (sizeof *num); + num = &(*node)->num_con; num->type = OP_NUM_CON; - num->value = d; - return (struct nonterm_node *) num; + num->value = finite (value) ? value : SYSMIS; } -static struct nonterm_node * -repl_str_con (struct nonterm_node * n, char *s, int len) +static void +set_number_errno (union any_node **node, double value) +{ + if (errno == EDOM || errno == ERANGE) + value = SYSMIS; + set_number (node, value); +} + +static void +set_string (union any_node **node, const char *string, size_t length) { struct str_con_node *str; /* The ordering here is important since the source string may be part of a subnode of n. */ - str = xmalloc (sizeof *str + len - 1); + str = xmalloc (sizeof *str + length - 1); str->type = OP_STR_CON; - str->len = len; - memcpy (str->s, s, len); - free_node ((union any_node *) n); - return (struct nonterm_node *) str; + str->len = length; + memcpy (str->s, string, length); + free_node (*node); + *node = (union any_node *) str; } /* Returns the number of days since 10 Oct 1582 for the date @@ -964,12 +757,14 @@ yrmoda (double year, double month, double day) month = floor (month + EPSILON); day = floor (day + EPSILON); - if (year >= 0. && year <= 199.) + if (year >= 0. && year <= 29.) + year += 2000.; + else if (year >= 30. && year <= 99.) year += 1900.; if ((year < 1582. || year > 19999.) || (year == 1582. && (month < 10. || (month == 10. && day < 15.))) - || (month < -1 || month > 13) - || (day < -1 || day > 32)) + || (month < 0 || month > 13) + || (day < 0 || day > 31)) return SYSMIS; return calendar_to_julian (year, month, day); } @@ -1035,19 +830,7 @@ dump_expression (union any_node * n, struct expression * expr) static void dump_node (struct expr_dump_state *eds, union any_node * n) { - if (n->type == OP_AND || n->type == OP_OR) - { - int i; - - dump_node (eds, n->nonterm.arg[0]); - for (i = 1; i < n->nonterm.n; i++) - { - dump_node (eds, n->nonterm.arg[i]); - emit (eds, n->type); - } - return; - } - else if (n->type < OP_TERMINAL) + if (IS_NONTERMINAL (n->type)) { int i; for (i = 0; i < n->nonterm.n; i++) @@ -1063,26 +846,26 @@ dump_node (struct expr_dump_state *eds, union any_node * n) emit (eds, (int) n->nonterm.arg[n->nonterm.n + 1]); emit (eds, (int) n->nonterm.arg[n->nonterm.n + 2]); } - return; } - - emit (eds, n->type); - if (n->type == OP_NUM_CON) - emit_num_con (eds, n->num_con.value); - else if (n->type == OP_STR_CON) - emit_str_con (eds, n->str_con.s, n->str_con.len); - else if (n->type == OP_NUM_VAR || n->type == OP_STR_VAR - || n->type == OP_STR_MIS) - emit_var (eds, n->var.v); - else if (n->type == OP_NUM_LAG || n->type == OP_STR_LAG) + else { - emit_var (eds, n->lag.v); - emit (eds, n->lag.lag); + emit (eds, n->type); + if (n->type == OP_NUM_CON) + emit_num_con (eds, n->num_con.value); + else if (n->type == OP_STR_CON) + emit_str_con (eds, n->str_con.s, n->str_con.len); + else if (n->type == OP_NUM_VAR || n->type == OP_STR_VAR) + emit_var (eds, n->var.v); + else if (n->type == OP_NUM_LAG || n->type == OP_STR_LAG) + { + emit_var (eds, n->lag.v); + emit (eds, n->lag.lag); + } + else if (n->type == OP_NUM_SYS || n->type == OP_NUM_VAL) + emit (eds, n->var.v->fv); + else + assert (n->type == OP_CASENUM); } - else if (n->type == OP_NUM_SYS || n->type == OP_NUM_VAL) - emit (eds, n->var.v->fv); - else - assert (n->type == OP_CASENUM); } static void diff --git a/src/expr-prs.c b/src/expr-prs.c index 08bf2d4f..acf8c3c6 100644 --- a/src/expr-prs.c +++ b/src/expr-prs.c @@ -29,6 +29,7 @@ #include "error.h" #include "lexer.h" #include "misc.h" +#include "settings.h" #include "str.h" #include "var.h" #include "vfm.h" @@ -36,27 +37,26 @@ /* Declarations. */ -/* Lowest precedence. */ -static int parse_or (union any_node **n); -static int parse_and (union any_node **n); -static int parse_not (union any_node **n); -static int parse_rel (union any_node **n); -static int parse_add (union any_node **n); -static int parse_mul (union any_node **n); -static int parse_neg (union any_node **n); -static int parse_exp (union any_node **n); -static int parse_primary (union any_node **n); -static int parse_function (union any_node **n); -/* Highest precedence. */ +/* Recursive descent parser in order of increasing precedence. */ +typedef enum expr_type parse_recursively_func (union any_node **); +static parse_recursively_func parse_or, parse_and, parse_not; +static parse_recursively_func parse_rel, parse_add, parse_mul; +static parse_recursively_func parse_neg, parse_exp; +static parse_recursively_func parse_primary, parse_function; /* Utility functions. */ -static const char *expr_type_name (int type); -static const char *type_name (int type); +static const char *expr_type_name (enum expr_type type); +static const char *var_type_name (int var_type); static void make_bool (union any_node **n); static union any_node *allocate_nonterminal (int op, union any_node *n); -static union any_node *append_nonterminal_arg (union any_node *, - union any_node *); -static int type_check (union any_node **n, int type, int flags); +static union any_node *allocate_binary_nonterminal (int op, union any_node *, + union any_node *); +static union any_node *allocate_num_con (double value); +static union any_node *allocate_str_con (const char *string, size_t length); +static union any_node *allocate_var_node (int type, struct variable *); +static int type_check (union any_node **n, + enum expr_type actual_type, + enum expr_type expected_type); static algo_compare_func compare_functions; static void init_func_tab (void); @@ -64,10 +64,6 @@ static void init_func_tab (void); #if DEBUGGING static void debug_print_tree (union any_node *, int); #endif - -#if GLOBAL_DEBUGGING -static void debug_print_postfix (struct expression *); -#endif /* Public functions. */ @@ -87,483 +83,460 @@ expr_free (struct expression *e) } struct expression * -expr_parse (int flags) +expr_parse (enum expr_type expected_type) { struct expression *e; union any_node *n; - int type; + enum expr_type actual_type; + int optimize = (expected_type & EXPR_NO_OPTIMIZE) == 0; + + expected_type &= ~EXPR_NO_OPTIMIZE; /* Make sure the table of functions is initialized. */ init_func_tab (); /* Parse the expression. */ - type = parse_or (&n); - if (type == EX_ERROR) + actual_type = parse_or (&n); + if (actual_type == EXPR_ERROR) return NULL; /* Enforce type rules. */ - if (!type_check (&n, type, flags)) + if (!type_check (&n, actual_type, expected_type)) { free_node (n); return NULL; } /* Optimize the expression as best we can. */ - n = (union any_node *) optimize_expression ((struct nonterm_node *) n); + if (optimize) + optimize_expression (&n); /* Dump the tree-based expression to a postfix representation for best evaluation speed, and destroy the tree. */ e = xmalloc (sizeof *e); - e->type = type; + e->type = actual_type; dump_expression (n, e); free_node (n); - /* If we're debugging or the user requested it, print the postfix - representation. */ -#if DEBUGGING - debug_print_postfix (e); -#endif - return e; } +/* Returns the type of EXPR. */ +enum expr_type +expr_get_type (const struct expression *expr) +{ + assert (expr != NULL); + return expr->type; +} + static int -type_check (union any_node **n, int type, int flags) +type_check (union any_node **n, enum expr_type actual_type, enum expr_type expected_type) { - /* Enforce PXP_BOOLEAN flag. */ - if (flags & PXP_BOOLEAN) + switch (expected_type) { - if (type == EX_STRING) + case EXPR_BOOLEAN: + case EXPR_NUMERIC: + if (actual_type == EXPR_STRING) { - msg (SE, _("A string expression was supplied in a place " - "where a Boolean expression was expected.")); + msg (SE, _("Type mismatch: expression has string type, " + "but a numeric value is required here.")); return 0; } - else if (type == EX_NUMERIC) + if (actual_type == EXPR_NUMERIC && expected_type == EXPR_BOOLEAN) *n = allocate_nonterminal (OP_NUM_TO_BOOL, *n); - } - - /* Enforce PXP_NUMERIC flag. */ - if ((flags & PXP_NUMERIC) && (type != EX_NUMERIC)) - { - msg (SE, _("A numeric expression was expected in a place " - "where one was not supplied.")); - return 0; - } + break; + + case EXPR_STRING: + if (actual_type != EXPR_STRING) + { + msg (SE, _("Type mismatch: expression has numeric type, " + "but a string value is required here.")); + return 0; + } + break; + + case EXPR_ANY: + break; - /* Enforce PXP_STRING flag. */ - if ((flags & PXP_STRING) && (type != EX_STRING)) - { - msg (SE, _("A string expression was expected in a place " - "where one was not supplied.")); - return 0; + default: + assert (0); } - + return 1; } /* Recursive-descent expression parser. */ -/* Parses the OR level. */ +/* Coerces *NODE, of type ACTUAL_TYPE, to type REQUIRED_TYPE, and + returns success. If ACTUAL_TYPE cannot be coerced to the + desired type then we issue an error message about operator + OPERATOR_NAME and free *NODE. */ static int -parse_or (union any_node **n) +type_coercion (enum expr_type actual_type, enum expr_type required_type, + union any_node **node, + const char *operator_name) { - char typ[] = N_("The OR operator cannot take string operands."); - union any_node *c; - int type; + assert (required_type == EXPR_NUMERIC + || required_type == EXPR_BOOLEAN + || required_type == EXPR_STRING); - type = parse_and (n); - if (type == EX_ERROR || token != T_OR) - return type; - if (type == EX_STRING) + if (actual_type == required_type) { - free_node (*n); - msg (SE, gettext (typ)); + /* Type match. */ + return 1; + } + else if (actual_type == EXPR_ERROR) + { + /* Error already reported. */ + *node = NULL; return 0; } - else if (type == EX_NUMERIC) - make_bool (n); - - c = allocate_nonterminal (OP_OR, *n); - for (;;) + else if (actual_type == EXPR_BOOLEAN && required_type == EXPR_NUMERIC) { - lex_get (); - type = parse_and (n); - if (type == EX_ERROR) - goto fail; - else if (type == EX_STRING) - { - msg (SE, gettext (typ)); - goto fail; - } - else if (type == EX_NUMERIC) - make_bool (n); - c = append_nonterminal_arg (c, *n); - - if (token != T_OR) - break; + /* Boolean -> numeric: nothing to do. */ + return 1; } - *n = c; - return EX_BOOLEAN; + else if (actual_type == EXPR_NUMERIC && required_type == EXPR_BOOLEAN) + { + /* Numeric -> Boolean: insert conversion. */ + make_bool (node); + return 1; + } + else + { + /* We want a string and got a number/Boolean, or vice versa. */ + assert ((actual_type == EXPR_STRING) != (required_type == EXPR_STRING)); -fail: - free_node (c); - return EX_ERROR; + if (required_type == EXPR_STRING) + msg (SE, _("Type mismatch: operands of %s operator must be strings."), + operator_name); + else + msg (SE, _("Type mismatch: operands of %s operator must be numeric."), + operator_name); + free_node (*node); + *node = NULL; + return 0; + } } -/* Parses the AND level. */ +/* An operator. */ +struct operator + { + int token; /* Operator token. */ + int type; /* Operator node type. */ + const char *name; /* Operator name. */ + }; + +/* Attempts to match the current token against the tokens for the + OP_CNT operators in OPS[]. If successful, returns nonzero + and, if OPERATOR is non-null, sets *OPERATOR to the operator. + On failure, returns zero and, if OPERATOR is non-null, sets + *OPERATOR to a null pointer. */ static int -parse_and (union any_node ** n) +match_operator (const struct operator ops[], size_t op_cnt, + const struct operator **operator) { - static const char typ[] - = N_("The AND operator cannot take string operands."); - union any_node *c; - int type = parse_not (n); + const struct operator *op; - if (type == EX_ERROR) - return EX_ERROR; - if (token != T_AND) - return type; - if (type == EX_STRING) + for (op = ops; op < ops + op_cnt; op++) { - free_node (*n); - msg (SE, gettext (typ)); - return 0; + if (op->token == '-') + lex_negative_to_dash (); + if (lex_match (op->token)) + { + if (operator != NULL) + *operator = op; + return 1; + } } - else if (type == EX_NUMERIC) - make_bool (n); - - c = allocate_nonterminal (OP_AND, *n); - for (;;) - { - lex_get (); - type = parse_not (n); - if (type == EX_ERROR) - goto fail; - else if (type == EX_STRING) - { - msg (SE, gettext (typ)); - goto fail; - } - else if (type == EX_NUMERIC) - make_bool (n); - c = append_nonterminal_arg (c, *n); - - if (token != T_AND) - break; - } - *n = c; - return EX_BOOLEAN; - -fail: - free_node (c); - return EX_ERROR; + if (operator != NULL) + *operator = NULL; + return 0; } -/* Parses the NOT level. */ -static int -parse_not (union any_node ** n) +/* Parses a chain of left-associative operator/operand pairs. + The operators' operands uniformly must be type REQUIRED_TYPE. + There are OP_CNT operators, specified in OPS[]. The next + higher level is parsed by PARSE_NEXT_LEVEL. If CHAIN_WARNING + is non-null, then it will be issued as a warning if more than + one operator/operand pair is parsed. */ +static enum expr_type +parse_binary_operators (union any_node **node, + enum expr_type actual_type, + enum expr_type required_type, + enum expr_type result_type, + const struct operator ops[], size_t op_cnt, + parse_recursively_func *parse_next_level, + const char *chain_warning) { - static const char typ[] - = N_("The NOT operator cannot take a string operand."); - int not = 0; - int type; + int op_count; + const struct operator *operator; - while (lex_match (T_NOT)) - not ^= 1; - type = parse_rel (n); - if (!not || type == EX_ERROR) - return type; + if (actual_type == EXPR_ERROR) + return EXPR_ERROR; - if (type == EX_STRING) + for (op_count = 0; match_operator (ops, op_cnt, &operator); op_count++) { - free_node (*n); - msg (SE, gettext (typ)); - return 0; + union any_node *rhs; + + /* Convert the left-hand side to type REQUIRED_TYPE. */ + if (!type_coercion (actual_type, required_type, node, operator->name)) + return EXPR_ERROR; + + /* Parse the right-hand side and coerce to type + REQUIRED_TYPE. */ + if (!type_coercion (parse_next_level (&rhs), required_type, + &rhs, operator->name)) + { + free_node (*node); + *node = NULL; + return EXPR_ERROR; + } + *node = allocate_binary_nonterminal (operator->type, *node, rhs); + + /* The result is of type RESULT_TYPE. */ + actual_type = result_type; } - else if (type == EX_NUMERIC) - make_bool (n); - *n = allocate_nonterminal (OP_NOT, *n); - return EX_BOOLEAN; + if (op_count > 1 && chain_warning != NULL) + msg (SW, chain_warning); + + return actual_type; } -static int -parse_rel (union any_node ** n) +static enum expr_type +parse_inverting_unary_operator (union any_node **node, + enum expr_type required_type, + const struct operator *operator, + parse_recursively_func *parse_next_level) { - static const char typ[] - = N_("Strings cannot be compared with numeric or Boolean " - "values with the relational operators " - "= >= > <= < <>."); - union any_node *c; - int type = parse_add (n); + unsigned op_count; + + op_count = 0; + while (match_operator (operator, 1, NULL)) + op_count++; + if (op_count == 0) + return parse_next_level (node); + + if (!type_coercion (parse_next_level (node), required_type, + node, operator->name)) + return EXPR_ERROR; + if (op_count % 2 != 0) + *node = allocate_nonterminal (operator->type, *node); + return required_type; +} - if (type == EX_ERROR) - return EX_ERROR; - if (token == '=') - token = T_EQ; - if (token < T_EQ || token > T_NE) - return type; +/* Parses the OR level. */ +static enum expr_type +parse_or (union any_node **n) +{ + static const struct operator ops[] = + { + { T_OR, OP_OR, "logical disjunction (\"OR\")" }, + }; + + return parse_binary_operators (n, parse_and (n), EXPR_BOOLEAN, EXPR_BOOLEAN, + ops, sizeof ops / sizeof *ops, + parse_and, NULL); +} - for (;;) +/* Parses the AND level. */ +static enum expr_type +parse_and (union any_node ** n) +{ + static const struct operator ops[] = { - int t; + { T_AND, OP_AND, "logical conjunction (\"AND\")" }, + }; + + return parse_binary_operators (n, parse_not (n), EXPR_BOOLEAN, EXPR_BOOLEAN, + ops, sizeof ops / sizeof *ops, + parse_not, NULL); +} - c = allocate_nonterminal (token - T_EQ - + (type == EX_NUMERIC ? OP_EQ : OP_STRING_EQ), - *n); - lex_get (); +/* Parses the NOT level. */ +static enum expr_type +parse_not (union any_node ** n) +{ + static const struct operator op + = { T_NOT, OP_NOT, "logical negation (\"NOT-\")" }; + return parse_inverting_unary_operator (n, EXPR_BOOLEAN, &op, parse_rel); +} - t = parse_add (n); - if (t == EX_ERROR) - goto fail; - if (t == EX_BOOLEAN && type == EX_NUMERIC) - make_bool (&c->nonterm.arg[0]); - else if (t == EX_NUMERIC && type == EX_BOOLEAN) - make_bool (n); - else if (t != type) - { - msg (SE, gettext (typ)); - goto fail; - } +/* Parse relational operators. */ +static enum expr_type +parse_rel (union any_node **n) +{ + static const struct operator numeric_ops[] = + { + { '=', OP_EQ, "numeric equality (\"=\")" }, + { T_EQ, OP_EQ, "numeric equality (\"EQ\")" }, + { T_GE, OP_GE, "numeric greater-than-or-equal-to (\">=\")" }, + { T_GT, OP_GT, "numeric greater than (\">\")" }, + { T_LE, OP_LE, "numeric less-than-or-equal-to (\"<=\")" }, + { T_LT, OP_LT, "numeric less than (\"<\")" }, + { T_NE, OP_NE, "numeric inequality (\"<>\")" }, + }; + + static const struct operator string_ops[] = + { + { '=', OP_EQ_STRING, "string equality (\"=\")" }, + { T_EQ, OP_EQ_STRING, "string equality (\"EQ\")" }, + { T_GE, OP_GE_STRING, "string greater-than-or-equal-to (\">=\")" }, + { T_GT, OP_GT_STRING, "string greater than (\">\")" }, + { T_LE, OP_LE_STRING, "string less-than-or-equal-to (\"<=\")" }, + { T_LT, OP_LT_STRING, "string less than (\"<\")" }, + { T_NE, OP_NE_STRING, "string inequality (\"<>\")" }, + }; - c = append_nonterminal_arg (c, *n); - *n = c; + int type = parse_add (n); - if (token == '=') - token = T_EQ; - if (token < T_EQ || token > T_NE) - break; + const char *chain_warning = + _("Chaining relational operators (e.g. \"a < b < c\") will " + "not produce the mathematically expected result. " + "Use the AND logical operator to fix the problem " + "(e.g. \"a < b AND b < c\"). " + "If chaining is really intended, parentheses will disable " + "this warning (e.g. \"(a < b) < c\".)"); - type = EX_BOOLEAN; - } - return EX_BOOLEAN; + switch (type) + { + case EXPR_ERROR: + return EXPR_ERROR; + + case EXPR_NUMERIC: + case EXPR_BOOLEAN: + return parse_binary_operators (n, + type, EXPR_NUMERIC, EXPR_BOOLEAN, + numeric_ops, + sizeof numeric_ops / sizeof *numeric_ops, + parse_add, chain_warning); + + case EXPR_STRING: + return parse_binary_operators (n, + type, EXPR_STRING, EXPR_BOOLEAN, + string_ops, + sizeof string_ops / sizeof *string_ops, + parse_add, chain_warning); -fail: - free_node (c); - return EX_ERROR; + default: + assert (0); + abort (); + } } /* Parses the addition and subtraction level. */ -static int +static enum expr_type parse_add (union any_node **n) { - static const char typ[] - = N_("The `+' and `-' operators may only be used with " - "numeric operands."); - union any_node *c; - int type; - int op; - - type = parse_mul (n); - lex_negative_to_dash (); - if (type == EX_ERROR || (token != '+' && token != '-')) - return type; - if (type != EX_NUMERIC) + static const struct operator ops[] = { - free_node (*n); - msg (SE, gettext (typ)); - return 0; - } - - c = allocate_nonterminal (OP_PLUS, *n); - for (;;) - { - op = token; - lex_get (); - - type = parse_mul (n); - if (type == EX_ERROR) - goto fail; - else if (type != EX_NUMERIC) - { - msg (SE, gettext (typ)); - goto fail; - } - if (op == '-') - *n = allocate_nonterminal (OP_NEG, *n); - c = append_nonterminal_arg (c, *n); - - lex_negative_to_dash (); - if (token != '+' && token != '-') - break; - } - *n = c; - return EX_NUMERIC; - -fail: - free_node (c); - return EX_ERROR; + { '+', OP_ADD, "addition (\"+\")" }, + { '-', OP_SUB, "subtraction (\"-\")-" }, + }; + + return parse_binary_operators (n, parse_mul (n), EXPR_NUMERIC, EXPR_NUMERIC, + ops, sizeof ops / sizeof *ops, + parse_mul, NULL); } /* Parses the multiplication and division level. */ -static int +static enum expr_type parse_mul (union any_node ** n) { - static const char typ[] - = N_("The `*' and `/' operators may only be used with " - "numeric operands."); - - union any_node *c; - int type; - int op; - - type = parse_neg (n); - if (type == EX_ERROR || (token != '*' && token != '/')) - return type; - if (type != EX_NUMERIC) - { - free_node (*n); - msg (SE, gettext (typ)); - return 0; - } - - c = allocate_nonterminal (OP_MUL, *n); - for (;;) + static const struct operator ops[] = { - op = token; - lex_get (); - - type = parse_neg (n); - if (type == EX_ERROR) - goto fail; - else if (type != EX_NUMERIC) - { - msg (SE, gettext (typ)); - goto fail; - } - if (op == '/') - *n = allocate_nonterminal (OP_INV, *n); - c = append_nonterminal_arg (c, *n); - - if (token != '*' && token != '/') - break; - } - *n = c; - return EX_NUMERIC; - -fail: - free_node (c); - return EX_ERROR; + { '*', OP_MUL, "multiplication (\"*\")" }, + { '/', OP_DIV, "division (\"/\")" }, + }; + + return parse_binary_operators (n, parse_neg (n), EXPR_NUMERIC, EXPR_NUMERIC, + ops, sizeof ops / sizeof *ops, + parse_neg, NULL); } /* Parses the unary minus level. */ -static int +static enum expr_type parse_neg (union any_node **n) { - static const char typ[] - = N_("The unary minus (-) operator can only take a numeric operand."); - - int neg = 0; - int type; - - for (;;) - { - lex_negative_to_dash (); - if (!lex_match ('-')) - break; - neg ^= 1; - } - type = parse_exp (n); - if (!neg || type == EX_ERROR) - return type; - if (type != EX_NUMERIC) - { - free_node (*n); - msg (SE, gettext (typ)); - return 0; - } - - *n = allocate_nonterminal (OP_NEG, *n); - return EX_NUMERIC; + static const struct operator op = { '-', OP_NEG, "negation (\"-\")" }; + return parse_inverting_unary_operator (n, EXPR_NUMERIC, &op, parse_exp); } -static int +static enum expr_type parse_exp (union any_node **n) { - static const char typ[] - = N_("Both operands to the ** operator must be numeric."); - - union any_node *c; - int type; - - type = parse_primary (n); - if (type == EX_ERROR || token != T_EXP) - return type; - if (type != EX_NUMERIC) - { - free_node (*n); - msg (SE, gettext (typ)); - return 0; - } - - for (;;) + static const struct operator ops[] = { - c = allocate_nonterminal (OP_POW, *n); - lex_get (); - - type = parse_primary (n); - if (type == EX_ERROR) - goto fail; - else if (type != EX_NUMERIC) - { - msg (SE, gettext (typ)); - goto fail; - } - *n = append_nonterminal_arg (c, *n); - - if (token != T_EXP) - break; - } - return EX_NUMERIC; - -fail: - free_node (c); - return EX_ERROR; + { T_EXP, OP_POW, "exponentiation (\"**\")" }, + }; + + const char *chain_warning = + _("The exponentiation operator (\"**\") is left-associative, " + "even though right-associative semantics are more useful. " + "That is, \"a**b**c\" equals \"(a**b)**c\", not as \"a**(b**c)\". " + "To disable this warning, insert parentheses."); + + return parse_binary_operators (n, + parse_primary (n), EXPR_NUMERIC, EXPR_NUMERIC, + ops, sizeof ops / sizeof *ops, + parse_primary, chain_warning); } /* Parses system variables. */ -static int +static enum expr_type parse_sysvar (union any_node **n) { if (!strcmp (tokid, "$CASENUM")) { *n = xmalloc (sizeof (struct casenum_node)); (*n)->casenum.type = OP_CASENUM; - return EX_NUMERIC; + return EXPR_NUMERIC; + } + else if (!strcmp (tokid, "$DATE")) + { + static const char *months[12] = + { + "JAN", "FEB", "MAR", "APR", "MAY", "JUN", + "JUL", "AUG", "SEP", "OCT", "NOV", "DEC", + }; + + struct tm *time; + char temp_buf[10]; + + time = localtime (&last_vfm_invocation); + sprintf (temp_buf, "%02d %s %02d", abs (time->tm_mday) % 100, + months[abs (time->tm_mon) % 12], abs (time->tm_year) % 100); + + *n = xmalloc (sizeof (struct str_con_node) + 8); + (*n)->str_con.type = OP_STR_CON; + (*n)->str_con.len = 9; + memcpy ((*n)->str_con.s, temp_buf, 9); + return EXPR_STRING; } else { + enum expr_type type; double d; - if (!strcmp (tokid, "$SYSMIS")) + type = EXPR_NUMERIC; + if (!strcmp (tokid, "$TRUE")) + { + d = 1.0; + type = EXPR_BOOLEAN; + } + else if (!strcmp (tokid, "$FALSE")) + { + d = 0.0; + type = EXPR_BOOLEAN; + } + else if (!strcmp (tokid, "$SYSMIS")) d = SYSMIS; else if (!strcmp (tokid, "$JDATE")) { struct tm *time = localtime (&last_vfm_invocation); d = yrmoda (time->tm_year + 1900, time->tm_mon + 1, time->tm_mday); } - else if (!strcmp (tokid, "$DATE")) - { - static const char *months[12] = - { - "JAN", "FEB", "MAR", "APR", "MAY", "JUN", - "JUL", "AUG", "SEP", "OCT", "NOV", "DEC", - }; - - struct tm *time; - char temp_buf[10]; - - time = localtime (&last_vfm_invocation); - sprintf (temp_buf, "%02d %s %02d", abs (time->tm_mday) % 100, - months[abs (time->tm_mon) % 12], abs (time->tm_year) % 100); - - *n = xmalloc (sizeof (struct str_con_node) + 8); - (*n)->str_con.type = OP_STR_CON; - (*n)->str_con.len = 9; - memcpy ((*n)->str_con.s, temp_buf, 9); - return EX_STRING; - } else if (!strcmp (tokid, "$TIME")) { struct tm *time; @@ -575,30 +548,24 @@ parse_sysvar (union any_node **n) + time->tm_sec); } else if (!strcmp (tokid, "$LENGTH")) - { - msg (SW, _("Use of $LENGTH is obsolete, returning default of 66.")); - d = 66.0; - } + d = get_viewlength (); else if (!strcmp (tokid, "$WIDTH")) - { - msg (SW, _("Use of $WIDTH is obsolete, returning default of 131.")); - d = 131.0; - } + d = get_viewwidth (); else { msg (SE, _("Unknown system variable %s."), tokid); - return EX_ERROR; + return EXPR_ERROR; } *n = xmalloc (sizeof (struct num_con_node)); (*n)->num_con.type = OP_NUM_CON; (*n)->num_con.value = d; - return EX_NUMERIC; + return type; } } /* Parses numbers, varnames, etc. */ -static int +static enum expr_type parse_primary (union any_node **n) { switch (token) @@ -615,7 +582,7 @@ parse_primary (union any_node **n) /* $ at the beginning indicates a system variable. */ if (tokid[0] == '$') { - int type = parse_sysvar (n); + enum expr_type type = parse_sysvar (n); lex_get (); return type; } @@ -626,30 +593,31 @@ parse_primary (union any_node **n) if (v == NULL) { lex_error (_("expecting variable name")); - return EX_ERROR; + return EXPR_ERROR; } - *n = xmalloc (sizeof (struct var_node)); - (*n)->var.type = v->type == NUMERIC ? OP_NUM_VAR : OP_STR_VAR; - (*n)->var.v = v; - return v->type == NUMERIC ? EX_NUMERIC : EX_STRING; + if (v->type == NUMERIC) + { + *n = allocate_var_node (OP_NUM_VAR, v); + return EXPR_NUMERIC; + } + else + { + *n = allocate_var_node (OP_STR_VAR, v); + return EXPR_STRING; + } } case T_NUM: - *n = xmalloc (sizeof (struct num_con_node)); - (*n)->num_con.type = OP_NUM_CON; - (*n)->num_con.value = tokval; + *n = allocate_num_con (tokval); lex_get (); - return EX_NUMERIC; + return EXPR_NUMERIC; case T_STRING: { - *n = xmalloc (sizeof (struct str_con_node) + ds_length (&tokstr) - 1); - (*n)->str_con.type = OP_STR_CON; - (*n)->str_con.len = ds_length (&tokstr); - memcpy ((*n)->str_con.s, ds_value (&tokstr), ds_length (&tokstr)); + *n = allocate_str_con (ds_value (&tokstr), ds_length (&tokstr)); lex_get (); - return EX_STRING; + return EXPR_STRING; } case '(': @@ -661,14 +629,14 @@ parse_primary (union any_node **n) { lex_error (_("expecting `)'")); free_node (*n); - return EX_ERROR; + return EXPR_ERROR; } return t; } default: lex_error (_("in expression")); - return EX_ERROR; + return EXPR_ERROR; } } @@ -678,179 +646,94 @@ struct function { const char *s; int t; - int (*func) (struct function *, int, union any_node **); - const char *desc; + enum expr_type (*func) (const struct function *, int, union any_node **); }; static struct function func_tab[]; static int func_count; -static int get_num_args (struct function *, int, union any_node **); +static int get_num_args (const struct function *, int, union any_node **); -static int -unary_func (struct function * f, int x UNUSED, union any_node ** n) +static enum expr_type +unary_func (const struct function *f, int x UNUSED, union any_node ** n) { - double divisor; - struct nonterm_node *c; - if (!get_num_args (f, 1, n)) - return EX_ERROR; - - switch (f->t) - { - case OP_CTIME_DAYS: - divisor = 1 / 60. / 60. / 24.; - goto multiply; - case OP_CTIME_HOURS: - divisor = 1 / 60. / 60.; - goto multiply; - case OP_CTIME_MINUTES: - divisor = 1 / 60.; - goto multiply; - case OP_TIME_DAYS: - divisor = 60. * 60. * 24.; - goto multiply; - - case OP_CTIME_SECONDS: - c = &(*n)->nonterm; - *n = (*n)->nonterm.arg[0]; - free (c); - return EX_NUMERIC; - } - return EX_NUMERIC; - -multiply: - /* Arrive here when we encounter an operation that is just a - glorified version of a multiplication or division. Converts the - operation directly into that multiplication. */ - c = xmalloc (sizeof (struct nonterm_node) + sizeof (union any_node *)); - c->type = OP_MUL; - c->n = 2; - c->arg[0] = (*n)->nonterm.arg[0]; - c->arg[1] = xmalloc (sizeof (struct num_con_node)); - c->arg[1]->num_con.type = OP_NUM_CON; - c->arg[1]->num_con.value = divisor; - free (*n); - *n = (union any_node *) c; - return EX_NUMERIC; + return EXPR_ERROR; + return EXPR_NUMERIC; } -static int -binary_func (struct function * f, int x UNUSED, union any_node ** n) +static enum expr_type +binary_func (const struct function *f, int x UNUSED, union any_node ** n) { if (!get_num_args (f, 2, n)) - return EX_ERROR; - return EX_NUMERIC; + return EXPR_ERROR; + return EXPR_NUMERIC; } -static int -ternary_func (struct function * f, int x UNUSED, union any_node ** n) +static enum expr_type +ternary_func (const struct function *f, int x UNUSED, union any_node **n) { if (!get_num_args (f, 3, n)) - return EX_ERROR; - return EX_NUMERIC; + return EXPR_ERROR; + return EXPR_NUMERIC; } -static int -MISSING_func (struct function * f, int x UNUSED, union any_node ** n) +static enum expr_type +MISSING_func (const struct function *f, int x UNUSED, union any_node **n) { - if (token == T_ID - && dict_lookup_var (default_dict, tokid) != NULL - && lex_look_ahead () == ')') - { - struct var_node *c = xmalloc (sizeof *c); - c->v = parse_variable (); - c->type = c->v->type == ALPHA ? OP_STR_MIS : OP_NUM_SYS; - *n = (union any_node *) c; - return EX_BOOLEAN; - } if (!get_num_args (f, 1, n)) - return EX_ERROR; - return EX_BOOLEAN; + return EXPR_ERROR; + return EXPR_BOOLEAN; } -static int -SYSMIS_func (struct function * f UNUSED, int x UNUSED, union any_node ** n) +static enum expr_type +SYSMIS_func (const struct function *f, int x UNUSED, union any_node **n) { - int t; - - if (token == T_ID - && dict_lookup_var (default_dict, tokid) - && lex_look_ahead () == ')') - { - struct variable *v; - v = parse_variable (); - if (v->type == ALPHA) - { - struct num_con_node *c = xmalloc (sizeof *c); - c->type = OP_NUM_CON; - c->value = 0; - return EX_BOOLEAN; - } - else - { - struct var_node *c = xmalloc (sizeof *c); - c->type = OP_NUM_SYS; - c->v = v; - return EX_BOOLEAN; - } - } - - t = parse_or (n); - if (t == EX_ERROR) - return t; - else if (t == EX_NUMERIC) - { - *n = allocate_nonterminal (OP_SYSMIS, *n); - return EX_BOOLEAN; - } - else /* EX_STRING or EX_BOOLEAN */ + if (!get_num_args (f, 1, n)) + return EXPR_ERROR; + if ((*n)->nonterm.arg[0]->type == OP_NUM_VAR) { - /* Return constant `true' value. */ + struct variable *v = (*n)->nonterm.arg[0]->var.v; free_node (*n); - *n = xmalloc (sizeof (struct num_con_node)); - (*n)->num_con.type = OP_NUM_CON; - (*n)->num_con.value = 1.0; - return EX_BOOLEAN; + *n = allocate_var_node (OP_NUM_SYS, v); } + return EXPR_BOOLEAN; } -static int -VALUE_func (struct function *f UNUSED, int x UNUSED, union any_node **n) +static enum expr_type +VALUE_func (const struct function *f UNUSED, int x UNUSED, union any_node **n) { struct variable *v = parse_variable (); if (!v) - return EX_ERROR; - *n = xmalloc (sizeof (struct var_node)); - (*n)->var.v = v; + return EXPR_ERROR; if (v->type == NUMERIC) { - (*n)->var.type = OP_NUM_VAL; - return EX_NUMERIC; + *n = allocate_var_node (OP_NUM_VAL, v); + return EXPR_NUMERIC; } else { - (*n)->var.type = OP_STR_VAR; - return EX_STRING; + *n = allocate_var_node (OP_STR_VAR, v); + return EXPR_STRING; } } -static int -LAG_func (struct function *f UNUSED, int x UNUSED, union any_node **n) +static enum expr_type +LAG_func (const struct function *f UNUSED, int x UNUSED, union any_node **n) { struct variable *v = parse_variable (); int nlag = 1; if (!v) - return EX_ERROR; + return EXPR_ERROR; if (lex_match (',')) { if (!lex_integer_p () || lex_integer () <= 0 || lex_integer () > 1000) { msg (SE, _("Argument 2 to LAG must be a small positive " "integer constant.")); - return 0; + return EXPR_ERROR; } nlag = lex_integer (); @@ -861,21 +744,28 @@ LAG_func (struct function *f UNUSED, int x UNUSED, union any_node **n) (*n)->lag.type = (v->type == NUMERIC ? OP_NUM_LAG : OP_STR_LAG); (*n)->lag.v = v; (*n)->lag.lag = nlag; - return (v->type == NUMERIC ? EX_NUMERIC : EX_STRING); + return (v->type == NUMERIC ? EXPR_NUMERIC : EXPR_STRING); } /* This screwball function parses n-ary operators: - 1. NMISS, NVALID, SUM, MEAN, MIN, MAX: any number of (numeric) arguments. - 2. SD, VARIANCE, CFVAR: at least two (numeric) arguments. - 3. RANGE: An odd number of arguments, but at least three. - All arguments must be the same type. - 4. ANY: At least two arguments. All arguments must be the same type. + + 1. NMISS, NVALID, SUM, MEAN: any number of numeric + arguments. + + 2. SD, VARIANCE, CFVAR: at least two numeric arguments. + + 3. RANGE: An odd number of arguments, but at least three, and + all of the same type. + + 4. ANY: At least two arguments, all of the same type. + + 5. MIN, MAX: Any number of arguments, all of the same type. */ -static int -nary_num_func (struct function *f, int min_args, union any_node **n) +static enum expr_type +nary_num_func (const struct function *f, int min_args, union any_node **n) { /* Argument number of current argument (used for error messages). */ - int argn = 1; + int arg_idx = 1; /* Number of arguments. */ int nargs; @@ -884,7 +774,8 @@ nary_num_func (struct function *f, int min_args, union any_node **n) int m = 16; /* Type of arguments. */ - int type = (f->t == OP_ANY || f->t == OP_RANGE) ? -1 : NUMERIC; + int type = (f->t == OP_ANY || f->t == OP_RANGE + || f->t == OP_MIN || f->t == OP_MAX) ? -1 : NUMERIC; *n = xmalloc (sizeof (struct nonterm_node) + sizeof (union any_node *[15])); (*n)->nonterm.type = f->t; @@ -924,7 +815,7 @@ nary_num_func (struct function *f, int min_args, union any_node **n) msg (SE, _("Type mismatch in argument %d of %s, which was " "expected to be of %s type. It was actually " "of %s type. "), - argn, f->s, type_name (type), type_name (v[j]->type)); + arg_idx, f->s, var_type_name (type), var_type_name (v[j]->type)); free (v); goto fail; } @@ -932,9 +823,9 @@ nary_num_func (struct function *f, int min_args, union any_node **n) for (j = 0; j < nv; j++) { union any_node **c = &(*n)->nonterm.arg[(*n)->nonterm.n++]; - *c = xmalloc (sizeof (struct var_node)); - (*c)->var.type = (type == NUMERIC ? OP_NUM_VAR : OP_STR_VAR); - (*c)->var.v = v[j]; + *c = allocate_var_node ((type == NUMERIC + ? OP_NUM_VAR : OP_STR_VAR), + v[j]); } } else @@ -942,9 +833,9 @@ nary_num_func (struct function *f, int min_args, union any_node **n) union any_node *c; int t = parse_or (&c); - if (t == EX_ERROR) + if (t == EXPR_ERROR) goto fail; - if (t == EX_BOOLEAN) + if (t == EXPR_BOOLEAN) { free_node (c); msg (SE, _("%s cannot take Boolean operands."), f->s); @@ -952,18 +843,18 @@ nary_num_func (struct function *f, int min_args, union any_node **n) } if (type == -1) { - if (t == EX_NUMERIC) + if (t == EXPR_NUMERIC) type = NUMERIC; - else if (t == EX_STRING) + else if (t == EXPR_STRING) type = ALPHA; } - else if ((t == EX_NUMERIC) ^ (type == NUMERIC)) + else if ((t == EXPR_NUMERIC) ^ (type == NUMERIC)) { free_node (c); msg (SE, _("Type mismatch in argument %d of %s, which was " "expected to be of %s type. It was actually " "of %s type. "), - argn, f->s, type_name (type), expr_type_name (t)); + arg_idx, f->s, var_type_name (type), expr_type_name (t)); goto fail; } if ((*n)->nonterm.n + 1 >= m) @@ -983,7 +874,7 @@ nary_num_func (struct function *f, int min_args, union any_node **n) goto fail; } - argn++; + arg_idx++; } *n = xrealloc (*n, (sizeof (struct nonterm_node) + ((*n)->nonterm.n) * sizeof (union any_node *))); @@ -995,7 +886,7 @@ nary_num_func (struct function *f, int min_args, union any_node **n) { msg (SE, _("RANGE requires an odd number of arguments, but " "at least three.")); - return 0; + goto fail; } } else if (f->t == OP_SD || f->t == OP_VARIANCE @@ -1004,7 +895,7 @@ nary_num_func (struct function *f, int min_args, union any_node **n) if (nargs < 2) { msg (SE, _("%s requires at least two arguments."), f->s); - return 0; + goto fail; } } @@ -1020,25 +911,47 @@ nary_num_func (struct function *f, int min_args, union any_node **n) { msg (SE, _("%s.%d requires at least %d arguments."), f->s, min_args, min_args); - return 0; + goto fail; } - if (f->t == OP_ANY || f->t == OP_RANGE) + if (f->t == OP_MIN || f->t == OP_MAX) + { + if (type == ALPHA) + { + if (f->t == OP_MIN) + (*n)->type = OP_MIN_STRING; + else if (f->t == OP_MAX) + (*n)->type = OP_MAX_STRING; + else + assert (0); + return EXPR_STRING; + } + else + return EXPR_NUMERIC; + } + else if (f->t == OP_ANY || f->t == OP_RANGE) { - if (type == T_STRING) - f->t++; - return EX_BOOLEAN; + if (type == ALPHA) + { + if (f->t == OP_ANY) + (*n)->type = OP_ANY_STRING; + else if (f->t == OP_RANGE) + (*n)->type = OP_RANGE_STRING; + else + assert (0); + } + return EXPR_BOOLEAN; } else - return EX_NUMERIC; + return EXPR_NUMERIC; fail: free_node (*n); - return EX_ERROR; + return EXPR_ERROR; } -static int -CONCAT_func (struct function * f UNUSED, int x UNUSED, union any_node ** n) +static enum expr_type +CONCAT_func (const struct function *f UNUSED, int x UNUSED, union any_node **n) { int m = 0; @@ -1056,9 +969,9 @@ CONCAT_func (struct function * f UNUSED, int x UNUSED, union any_node ** n) + (m - 1) * sizeof (union any_node *))); } type = parse_or (&(*n)->nonterm.arg[(*n)->nonterm.n]); - if (type == EX_ERROR) + if (type == EXPR_ERROR) goto fail; - if (type != EX_STRING) + if (type != EXPR_STRING) { msg (SE, _("Argument %d to CONCAT is type %s. All arguments " "to CONCAT must be strings."), @@ -1072,11 +985,11 @@ CONCAT_func (struct function * f UNUSED, int x UNUSED, union any_node ** n) } *n = xrealloc (*n, (sizeof (struct nonterm_node) + ((*n)->nonterm.n - 1) * sizeof (union any_node *))); - return EX_STRING; + return EXPR_STRING; fail: free_node (*n); - return EX_ERROR; + return EXPR_ERROR; } /* Parses a string function according to f->desc. f->desc[0] is the @@ -1085,48 +998,85 @@ fail: args by a slash (`/'). Codes are `n', numeric arg; `s', string arg; and `f', format spec (this must be the last arg). If the optional args are included, the type becomes f->t+1. */ -static int -generic_str_func (struct function *f, int x UNUSED, union any_node ** n) +static enum expr_type +generic_str_func (const struct function *f, int x UNUSED, union any_node **n) { - int max_args = 0; - int type; + struct string_function + { + int t1, t2; + enum expr_type return_type; + const char *arg_types; + }; + + static const struct string_function string_func_tab[] = + { + {OP_INDEX_2, OP_INDEX_3, EXPR_NUMERIC, "ssN"}, + {OP_RINDEX_2, OP_RINDEX_3, EXPR_NUMERIC, "ssN"}, + {OP_LENGTH, 0, EXPR_NUMERIC, "s"}, + {OP_LOWER, 0, EXPR_STRING, "s"}, + {OP_UPPER, 0, EXPR_STRING, "s"}, + {OP_LPAD, 0, EXPR_STRING, "snS"}, + {OP_RPAD, 0, EXPR_STRING, "snS"}, + {OP_LTRIM, 0, EXPR_STRING, "sS"}, + {OP_RTRIM, 0, EXPR_STRING, "sS"}, + {OP_NUMBER, 0, EXPR_NUMERIC, "sf"}, + {OP_STRING, 0, EXPR_STRING, "nf"}, + {OP_SUBSTR_2, OP_SUBSTR_3, EXPR_STRING, "snN"}, + }; + + const int string_func_cnt = sizeof string_func_tab / sizeof *string_func_tab; + + const struct string_function *sf; + int arg_cnt; const char *cp; + struct nonterm_node *nonterm; + + /* Find string_function that corresponds to f. */ + for (sf = string_func_tab; sf < string_func_tab + string_func_cnt; sf++) + if (f->t == sf->t1) + break; + assert (sf < string_func_tab + string_func_cnt); /* Count max number of arguments. */ - cp = &f->desc[1]; - while (*cp) + arg_cnt = 0; + for (cp = sf->arg_types; *cp != '\0'; cp++) { - if (*cp == 'n' || *cp == 's') - max_args++; - else if (*cp == 'f') - max_args += 3; - cp++; + if (*cp != 'f') + arg_cnt++; + else + arg_cnt += 3; } - cp = &f->desc[1]; + /* Allocate node. */ *n = xmalloc (sizeof (struct nonterm_node) - + (max_args - 1) * sizeof (union any_node *)); - (*n)->nonterm.type = f->t; - (*n)->nonterm.n = 0; + + (arg_cnt - 1) * sizeof (union any_node *)); + nonterm = &(*n)->nonterm; + nonterm->type = sf->t1; + nonterm->n = 0; + + /* Parse arguments. */ + cp = sf->arg_types; for (;;) { - if (*cp == 'n' || *cp == 's') + if (*cp == 'n' || *cp == 's' || *cp == 'N' || *cp == 'S') { - int t = *cp == 'n' ? EX_NUMERIC : EX_STRING; - type = parse_or (&(*n)->nonterm.arg[(*n)->nonterm.n]); + enum expr_type wanted_type + = *cp == 'n' || *cp == 'N' ? EXPR_NUMERIC : EXPR_STRING; + enum expr_type actual_type = parse_or (&nonterm->arg[nonterm->n]); - if (type == EX_ERROR) + if (actual_type == EXPR_ERROR) goto fail; - if (type != t) + else if (actual_type == EXPR_BOOLEAN) + actual_type = EXPR_NUMERIC; + if (actual_type != wanted_type) { msg (SE, _("Argument %d to %s was expected to be of %s type. " "It was actually of type %s."), - (*n)->nonterm.n + 1, f->s, - *cp == 'n' ? _("numeric") : _("string"), - expr_type_name (type)); + nonterm->n + 1, f->s, + expr_type_name (actual_type), expr_type_name (wanted_type)); goto fail; } - (*n)->nonterm.n++; + nonterm->n++; } else if (*cp == 'f') { @@ -1141,26 +1091,38 @@ generic_str_func (struct function *f, int x UNUSED, union any_node ** n) msg (SE, _("%s is not a numeric format."), fmt_to_string (&fmt)); goto fail; } - (*n)->nonterm.arg[(*n)->nonterm.n + 0] = (union any_node *) fmt.type; - (*n)->nonterm.arg[(*n)->nonterm.n + 1] = (union any_node *) fmt.w; - (*n)->nonterm.arg[(*n)->nonterm.n + 2] = (union any_node *) fmt.d; + nonterm->arg[nonterm->n + 0] = (union any_node *) fmt.type; + nonterm->arg[nonterm->n + 1] = (union any_node *) fmt.w; + nonterm->arg[nonterm->n + 2] = (union any_node *) fmt.d; break; } else assert (0); - if (*++cp == 0) + /* We're done if no args are left. */ + cp++; + if (*cp == 0) break; - if (*cp == '/') + + /* Optional arguments are named with capital letters. */ + if (isupper ((unsigned char) *cp)) { - cp++; - if (lex_match (',')) - { - (*n)->nonterm.type++; - continue; - } - else - break; + if (!lex_match (',')) + { + if (sf->t2 == 0) + { + if (*cp == 'N') + nonterm->arg[nonterm->n++] = allocate_num_con (SYSMIS); + else if (*cp == 'S') + nonterm->arg[nonterm->n++] = allocate_str_con (" ", 1); + else + assert (0); + } + break; + } + + if (sf->t2 != 0) + nonterm->type = sf->t2; } else if (!lex_match (',')) { @@ -1169,17 +1131,17 @@ generic_str_func (struct function *f, int x UNUSED, union any_node ** n) } } - return f->desc[0] == 'n' ? EX_NUMERIC : EX_STRING; + return sf->return_type; fail: free_node (*n); - return EX_ERROR; + return EXPR_ERROR; } /* General function parsing. */ static int -get_num_args (struct function *f, int num_args, union any_node **n) +get_num_args (const struct function *f, int num_args, union any_node **n) { int t; int i; @@ -1191,14 +1153,16 @@ get_num_args (struct function *f, int num_args, union any_node **n) for (i = 0;;) { t = parse_or (&(*n)->nonterm.arg[i]); - if (t == EX_ERROR) + if (t == EXPR_ERROR) goto fail; (*n)->nonterm.n++; - if (t != EX_NUMERIC) + + if (t == EXPR_STRING) { - msg (SE, _("Type mismatch in argument %d of %s, which was expected " - "to be numeric. It was actually type %s."), - i + 1, f->s, expr_type_name (t)); + msg (SE, _("Type mismatch in argument %d of %s. A string " + "expression was supplied where only a numeric expression " + "is allowed."), + i + 1, f->s); goto fail; } if (++i >= num_args) @@ -1215,10 +1179,10 @@ fail: return 0; } -static int +static enum expr_type parse_function (union any_node ** n) { - struct function *fp; + const struct function *fp; char fname[32], *cp; int t; int min_args; @@ -1239,9 +1203,9 @@ parse_function (union any_node ** n) (*n)->nonterm.n = 0; t = parse_or (&(*n)->nonterm.arg[0]); - if (t == EX_ERROR) + if (t == EXPR_ERROR) goto fail; - if (t != EX_NUMERIC) + if (t != EXPR_NUMERIC) { msg (SE, _("The index value after a vector name must be numeric.")); goto fail; @@ -1255,7 +1219,7 @@ parse_function (union any_node ** n) } ((*n)->nonterm.arg[1]) = (union any_node *) v->idx; - return v->var[0]->type == NUMERIC ? EX_NUMERIC : EX_STRING; + return v->var[0]->type == NUMERIC ? EXPR_NUMERIC : EXPR_STRING; } ds_truncate (&tokstr, 31); @@ -1271,7 +1235,7 @@ parse_function (union any_node ** n) lex_get (); if (!lex_force_match ('(')) - return 0; + return EXPR_ERROR; { struct function f; @@ -1284,17 +1248,17 @@ parse_function (union any_node ** n) if (!fp) { msg (SE, _("There is no function named %s."), fname); - return 0; + return EXPR_ERROR; } if (min_args && fp->func != nary_num_func) { msg (SE, _("Function %s may not be given a minimum number of " "arguments."), fname); - return 0; + return EXPR_ERROR; } t = fp->func (fp, min_args, n); - if (t == EX_ERROR) - return EX_ERROR; + if (t == EXPR_ERROR) + return EXPR_ERROR; if (!lex_match (')')) { lex_error (_("expecting `)' after %s function"), fname); @@ -1305,168 +1269,26 @@ parse_function (union any_node ** n) fail: free_node (*n); - return EX_ERROR; + return EXPR_ERROR; } - -#if GLOBAL_DEBUGGING -#define op(a,b,c,d) {a,b,c,d} -#else -#define op(a,b,c,d) {b,c,d} -#endif - -#define varies 0 - -struct op_desc ops[OP_SENTINEL + 1] = -{ - op ("!?ERROR?!", 000, 0, 0), - - op ("plus", 001, varies, 1), - op ("mul", 011, varies, 1), - op ("pow", 010, -1, 0), - op ("and", 010, -1, 0), - op ("or", 010, -1, 0), - op ("not", 000, 0, 0), - op ("eq", 000, -1, 0), - op ("ge", 000, -1, 0), - op ("gt", 000, -1, 0), - op ("le", 000, -1, 0), - op ("lt", 000, -1, 0), - op ("ne", 000, -1, 0), - - op ("string-eq", 000, -1, 0), - op ("string-ge", 000, -1, 0), - op ("string-gt", 000, -1, 0), - op ("string-le", 000, -1, 0), - op ("string-lt", 000, -1, 0), - op ("string-ne", 000, -1, 0), - - op ("neg", 000, 0, 0), - op ("abs", 000, 0, 0), - op ("arcos", 000, 0, 0), - op ("arsin", 000, 0, 0), - op ("artan", 000, 0, 0), - op ("cos", 000, 0, 0), - op ("exp", 000, 0, 0), - op ("lg10", 000, 0, 0), - op ("ln", 000, 0, 0), - op ("mod10", 000, 0, 0), - op ("rnd", 000, 0, 0), - op ("sin", 000, 0, 0), - op ("sqrt", 000, 0, 0), - op ("tan", 000, 0, 0), - op ("trunc", 000, 0, 0), - - op ("any", 011, varies, 1), - op ("any-string", 001, varies, 1), - op ("cfvar", 013, varies, 2), - op ("max", 013, varies, 2), - op ("mean", 013, varies, 2), - op ("min", 013, varies, 2), - op ("nmiss", 011, varies, 1), - op ("nvalid", 011, varies, 1), - op ("range", 011, varies, 1), - op ("range-string", 001, varies, 1), - op ("sd", 013, varies, 2), - op ("sum", 013, varies, 2), - op ("variance", 013, varies, 2), - - op ("time_hms", 000, -2, 0), - op ("ctime_days?!", 000, 0, 0), - op ("ctime_hours?!", 000, 0, 0), - op ("ctime_minutes?!", 000, 0, 0), - op ("ctime_seconds?!", 000, 0, 0), - op ("time_days?!", 000, 0, 0), - - op ("date_dmy", 000, -2, 0), - op ("date_mdy", 000, -2, 0), - op ("date_moyr", 000, -1, 0), - op ("date_qyr", 000, -1, 0), - op ("date_wkyr", 000, -1, 0), - op ("date_yrday", 000, -1, 0), - op ("yrmoda", 000, -2, 0), - - op ("xdate_date", 000, 0, 0), - op ("xdate_hour", 000, 0, 0), - op ("xdate_jday", 000, 0, 0), - op ("xdate_mday", 000, 0, 0), - op ("xdate_minute", 000, 0, 0), - op ("xdate_month", 000, 0, 0), - op ("xdate_quarter", 000, 0, 0), - op ("xdate_second", 000, 0, 0), - op ("xdate_tday", 000, 0, 0), - op ("xdate_time", 000, 0, 0), - op ("xdate_week", 000, 0, 0), - op ("xdate_wkday", 000, 0, 0), - op ("xdate_year", 000, 0, 0), - - op ("concat", 001, varies, 1), - op ("index-2", 000, -1, 0), - op ("index-3", 000, -2, 0), - op ("rindex-2", 000, -1, 0), - op ("rindex-3", 000, -2, 0), - op ("length", 000, 0, 0), - op ("lower", 000, 0, 0), - op ("upcas", 000, 0, 0), - op ("lpad-2", 010, -1, 0), - op ("lpad-3", 010, -2, 0), - op ("rpad-2", 010, -1, 0), - op ("rpad-3", 010, -2, 0), - op ("ltrim-1", 000, 0, 0), - op ("ltrim-2", 000, -1, 0), - op ("rtrim-1", 000, 0, 0), - op ("rtrim-2", 000, -1, 0), - op ("number-1", 010, 0, 0), - op ("number-2", 014, 0, 3), - op ("string", 004, 0, 3), - op ("substr-2", 010, -1, 0), - op ("substr-3", 010, -2, 0), - - op ("inv", 000, 0, 0), - op ("square", 000, 0, 0), - op ("num-to-Bool", 000, 0, 0), - - op ("mod", 010, -1, 0), - op ("normal", 000, 0, 0), - op ("uniform", 000, 0, 0), - op ("sysmis", 010, 0, 0), - op ("vec-elem-num", 002, 0, 1), - op ("vec-elem-str", 002, 0, 1), - - op ("!?TERMINAL?!", 000, 0, 0), - op ("num-con", 000, +1, 0), - op ("str-con", 000, +1, 0), - op ("num-var", 000, +1, 0), - op ("str-var", 000, +1, 0), - op ("num-lag", 000, +1, 1), - op ("str-lag", 000, +1, 1), - op ("num-sys", 000, +1, 1), - op ("num-val", 000, +1, 1), - op ("str-mis", 000, +1, 1), - op ("$casenum", 000, +1, 0), - op ("!?SENTINEL?!", 000, 0, 0), -}; - -#undef op -#undef varies - /* Utility functions. */ static const char * -expr_type_name (int type) +expr_type_name (enum expr_type type) { switch (type) { - case EX_ERROR: + case EXPR_ERROR: return _("error"); - case EX_BOOLEAN: + case EXPR_BOOLEAN: return _("Boolean"); - case EX_NUMERIC: + case EXPR_NUMERIC: return _("numeric"); - case EX_STRING: + case EXPR_STRING: return _("string"); default: @@ -1476,7 +1298,7 @@ expr_type_name (int type) } static const char * -type_name (int type) +var_type_name (int type) { switch (type) { @@ -1505,14 +1327,54 @@ make_bool (union any_node **n) void free_node (union any_node *n) { - if (n->type < OP_TERMINAL) + if (n != NULL) { - int i; - - for (i = 0; i < n->nonterm.n; i++) - free_node (n->nonterm.arg[i]); + if (IS_NONTERMINAL (n->type)) + { + int i; + + for (i = 0; i < n->nonterm.n; i++) + free_node (n->nonterm.arg[i]); + } + free (n); } - free (n); +} + +static union any_node * +allocate_num_con (double value) +{ + union any_node *c; + + c = xmalloc (sizeof (struct num_con_node)); + c->num_con.type = OP_NUM_CON; + c->num_con.value = value; + + return c; +} + +static union any_node * +allocate_str_con (const char *string, size_t length) +{ + union any_node *c; + + c = xmalloc (sizeof (struct str_con_node) + length - 1); + c->str_con.type = OP_STR_CON; + c->str_con.len = length; + memcpy (c->str_con.s, string, length); + + return c; +} + +static union any_node * +allocate_var_node (int type, struct variable *variable) +{ + union any_node *c; + + c = xmalloc (sizeof (struct var_node)); + c->var.type = type; + c->var.v = variable; + + return c; } union any_node * @@ -1528,97 +1390,105 @@ allocate_nonterminal (int op, union any_node *n) return c; } -union any_node * -append_nonterminal_arg (union any_node *a, union any_node *b) +static union any_node * +allocate_binary_nonterminal (int op, union any_node *lhs, union any_node *rhs) { - a = xrealloc (a, sizeof *a + sizeof *a->nonterm.arg * a->nonterm.n); - a->nonterm.arg[a->nonterm.n++] = b; - return a; + union any_node *node; + + node = xmalloc (sizeof node->nonterm + sizeof *node->nonterm.arg); + node->nonterm.type = op; + node->nonterm.n = 2; + node->nonterm.arg[0] = lhs; + node->nonterm.arg[1] = rhs; + + return node; } static struct function func_tab[] = { - {"ABS", OP_ABS, unary_func, NULL}, - {"ACOS", OP_ARCOS, unary_func, NULL}, - {"ARCOS", OP_ARCOS, unary_func, NULL}, - {"ARSIN", OP_ARSIN, unary_func, NULL}, - {"ARTAN", OP_ARTAN, unary_func, NULL}, - {"ASIN", OP_ARSIN, unary_func, NULL}, - {"ATAN", OP_ARTAN, unary_func, NULL}, - {"COS", OP_COS, unary_func, NULL}, - {"EXP", OP_EXP, unary_func, NULL}, - {"LG10", OP_LG10, unary_func, NULL}, - {"LN", OP_LN, unary_func, NULL}, - {"MOD10", OP_MOD10, unary_func, NULL}, - {"NORMAL", OP_NORMAL, unary_func, NULL}, - {"RND", OP_RND, unary_func, NULL}, - {"SIN", OP_SIN, unary_func, NULL}, - {"SQRT", OP_SQRT, unary_func, NULL}, - {"TAN", OP_TAN, unary_func, NULL}, - {"TRUNC", OP_TRUNC, unary_func, NULL}, - {"UNIFORM", OP_UNIFORM, unary_func, NULL}, - - {"TIME.DAYS", OP_TIME_DAYS, unary_func, NULL}, - {"TIME.HMS", OP_TIME_HMS, ternary_func, NULL}, - - {"CTIME.DAYS", OP_CTIME_DAYS, unary_func, NULL}, - {"CTIME.HOURS", OP_CTIME_HOURS, unary_func, NULL}, - {"CTIME.MINUTES", OP_CTIME_MINUTES, unary_func, NULL}, - {"CTIME.SECONDS", OP_CTIME_SECONDS, unary_func, NULL}, - - {"DATE.DMY", OP_DATE_DMY, ternary_func, NULL}, - {"DATE.MDY", OP_DATE_MDY, ternary_func, NULL}, - {"DATE.MOYR", OP_DATE_MOYR, binary_func, NULL}, - {"DATE.QYR", OP_DATE_QYR, binary_func, NULL}, - {"DATE.WKYR", OP_DATE_WKYR, binary_func, NULL}, - {"DATE.YRDAY", OP_DATE_YRDAY, binary_func, NULL}, - - {"XDATE.DATE", OP_XDATE_DATE, unary_func, NULL}, - {"XDATE.HOUR", OP_XDATE_HOUR, unary_func, NULL}, - {"XDATE.JDAY", OP_XDATE_JDAY, unary_func, NULL}, - {"XDATE.MDAY", OP_XDATE_MDAY, unary_func, NULL}, - {"XDATE.MINUTE", OP_XDATE_MINUTE, unary_func, NULL}, - {"XDATE.MONTH", OP_XDATE_MONTH, unary_func, NULL}, - {"XDATE.QUARTER", OP_XDATE_QUARTER, unary_func, NULL}, - {"XDATE.SECOND", OP_XDATE_SECOND, unary_func, NULL}, - {"XDATE.TDAY", OP_XDATE_TDAY, unary_func, NULL}, - {"XDATE.TIME", OP_XDATE_TIME, unary_func, NULL}, - {"XDATE.WEEK", OP_XDATE_WEEK, unary_func, NULL}, - {"XDATE.WKDAY", OP_XDATE_WKDAY, unary_func, NULL}, - {"XDATE.YEAR", OP_XDATE_YEAR, unary_func, NULL}, - - {"MISSING", OP_SYSMIS, MISSING_func, NULL}, - {"MOD", OP_MOD, binary_func, NULL}, - {"SYSMIS", OP_SYSMIS, SYSMIS_func, NULL}, - {"VALUE", OP_NUM_VAL, VALUE_func, NULL}, - {"LAG", OP_NUM_LAG, LAG_func, NULL}, - {"YRMODA", OP_YRMODA, ternary_func, NULL}, - - {"ANY", OP_ANY, nary_num_func, NULL}, - {"CFVAR", OP_CFVAR, nary_num_func, NULL}, - {"MAX", OP_MAX, nary_num_func, NULL}, - {"MEAN", OP_MEAN, nary_num_func, NULL}, - {"MIN", OP_MIN, nary_num_func, NULL}, - {"NMISS", OP_NMISS, nary_num_func, NULL}, - {"NVALID", OP_NVALID, nary_num_func, NULL}, - {"RANGE", OP_RANGE, nary_num_func, NULL}, - {"SD", OP_SD, nary_num_func, NULL}, - {"SUM", OP_SUM, nary_num_func, NULL}, - {"VARIANCE", OP_VARIANCE, nary_num_func, NULL}, - - {"CONCAT", OP_CONCAT, CONCAT_func, NULL}, - {"INDEX", OP_INDEX, generic_str_func, "nss/n"}, - {"RINDEX", OP_RINDEX, generic_str_func, "nss/n"}, - {"LENGTH", OP_LENGTH, generic_str_func, "ns"}, - {"LOWER", OP_LOWER, generic_str_func, "ss"}, - {"UPCAS", OP_UPPER, generic_str_func, "ss"}, - {"LPAD", OP_LPAD, generic_str_func, "ssn/s"}, - {"RPAD", OP_RPAD, generic_str_func, "ssn/s"}, - {"LTRIM", OP_LTRIM, generic_str_func, "ss/s"}, - {"RTRIM", OP_RTRIM, generic_str_func, "ss/s"}, - {"NUMBER", OP_NUMBER, generic_str_func, "ns/f"}, - {"STRING", OP_STRING, generic_str_func, "snf"}, - {"SUBSTR", OP_SUBSTR, generic_str_func, "ssn/n"}, + {"ABS", OP_ABS, unary_func}, + {"ACOS", OP_ARCOS, unary_func}, + {"ARCOS", OP_ARCOS, unary_func}, + {"ARSIN", OP_ARSIN, unary_func}, + {"ARTAN", OP_ARTAN, unary_func}, + {"ASIN", OP_ARSIN, unary_func}, + {"ATAN", OP_ARTAN, unary_func}, + {"COS", OP_COS, unary_func}, + {"EXP", OP_EXP, unary_func}, + {"LG10", OP_LG10, unary_func}, + {"LN", OP_LN, unary_func}, + {"MOD10", OP_MOD10, unary_func}, + {"NORMAL", OP_NORMAL, unary_func}, + {"RND", OP_RND, unary_func}, + {"SIN", OP_SIN, unary_func}, + {"SQRT", OP_SQRT, unary_func}, + {"TAN", OP_TAN, unary_func}, + {"TRUNC", OP_TRUNC, unary_func}, + {"UNIFORM", OP_UNIFORM, unary_func}, + + {"TIME.DAYS", OP_TIME_DAYS, unary_func}, + {"TIME.HMS", OP_TIME_HMS, ternary_func}, + + {"CTIME.DAYS", OP_CTIME_DAYS, unary_func}, + {"CTIME.HOURS", OP_CTIME_HOURS, unary_func}, + {"CTIME.MINUTES", OP_CTIME_MINUTES, unary_func}, + {"CTIME.SECONDS", OP_CTIME_SECONDS, unary_func}, + + {"DATE.DMY", OP_DATE_DMY, ternary_func}, + {"DATE.MDY", OP_DATE_MDY, ternary_func}, + {"DATE.MOYR", OP_DATE_MOYR, binary_func}, + {"DATE.QYR", OP_DATE_QYR, binary_func}, + {"DATE.WKYR", OP_DATE_WKYR, binary_func}, + {"DATE.YRDAY", OP_DATE_YRDAY, binary_func}, + + {"XDATE.DATE", OP_XDATE_DATE, unary_func}, + {"XDATE.HOUR", OP_XDATE_HOUR, unary_func}, + {"XDATE.JDAY", OP_XDATE_JDAY, unary_func}, + {"XDATE.MDAY", OP_XDATE_MDAY, unary_func}, + {"XDATE.MINUTE", OP_XDATE_MINUTE, unary_func}, + {"XDATE.MONTH", OP_XDATE_MONTH, unary_func}, + {"XDATE.QUARTER", OP_XDATE_QUARTER, unary_func}, + {"XDATE.SECOND", OP_XDATE_SECOND, unary_func}, + {"XDATE.TDAY", OP_XDATE_TDAY, unary_func}, + {"XDATE.TIME", OP_XDATE_TIME, unary_func}, + {"XDATE.WEEK", OP_XDATE_WEEK, unary_func}, + {"XDATE.WKDAY", OP_XDATE_WKDAY, unary_func}, + {"XDATE.YEAR", OP_XDATE_YEAR, unary_func}, + + {"MISSING", OP_SYSMIS, MISSING_func}, + {"MOD", OP_MOD, binary_func}, + {"SYSMIS", OP_SYSMIS, SYSMIS_func}, + {"VALUE", OP_NUM_VAL, VALUE_func}, + {"LAG", OP_NUM_LAG, LAG_func}, + {"YRMODA", OP_YRMODA, ternary_func}, + + {"ANY", OP_ANY, nary_num_func}, + {"CFVAR", OP_CFVAR, nary_num_func}, + {"MAX", OP_MAX, nary_num_func}, + {"MEAN", OP_MEAN, nary_num_func}, + {"MIN", OP_MIN, nary_num_func}, + {"NMISS", OP_NMISS, nary_num_func}, + {"NVALID", OP_NVALID, nary_num_func}, + {"RANGE", OP_RANGE, nary_num_func}, + {"SD", OP_SD, nary_num_func}, + {"SUM", OP_SUM, nary_num_func}, + {"VAR", OP_VARIANCE, nary_num_func}, + {"VARIANCE", OP_VARIANCE, nary_num_func}, + + {"CONCAT", OP_CONCAT, CONCAT_func}, + + {"INDEX", OP_INDEX_2, generic_str_func}, + {"RINDEX", OP_RINDEX_2, generic_str_func}, + {"LENGTH", OP_LENGTH, generic_str_func}, + {"LOWER", OP_LOWER, generic_str_func}, + {"UPCASE", OP_UPPER, generic_str_func}, + {"LPAD", OP_LPAD, generic_str_func}, + {"RPAD", OP_RPAD, generic_str_func}, + {"LTRIM", OP_LTRIM, generic_str_func}, + {"RTRIM", OP_RTRIM, generic_str_func}, + {"NUMBER", OP_NUMBER, generic_str_func}, + {"STRING", OP_STRING, generic_str_func}, + {"SUBSTR", OP_SUBSTR_2, generic_str_func}, }; /* An algo_compare_func that compares functions A and B based on @@ -1727,32 +1597,31 @@ debug_print_tree (union any_node * n, int level) } #endif /* DEBUGGING */ -#if GLOBAL_DEBUGGING -static void -debug_print_postfix (struct expression * e) +void +expr_debug_print_postfix (const struct expression *e) { - unsigned char *o; - double *num = e->num; - unsigned char *str = e->str; - struct variable **v = e->var; + const unsigned char *o; + const double *num = e->num; + const unsigned char *str = e->str; + struct variable *const *v = e->var; int t; - debug_printf ((_("postfix:"))); + printf ("postfix:"); for (o = e->op; *o != OP_SENTINEL;) { t = *o++; - if (t < OP_TERMINAL) + if (IS_NONTERMINAL (t)) { - debug_printf ((" %s", ops[t].name)); + printf (" %s", ops[t].name); if (ops[t].flags & OP_VAR_ARGS) { - debug_printf (("(%d)", *o)); + printf ("(%d)", *o); o++; } if (ops[t].flags & OP_MIN_ARGS) { - debug_printf ((".%d", *o)); + printf (".%d", *o); o++; } if (ops[t].flags & OP_FMT_SPEC) @@ -1761,49 +1630,55 @@ debug_print_postfix (struct expression * e) f.type = (int) *o++; f.w = (int) *o++; f.d = (int) *o++; - debug_printf (("(%s)", fmt_to_string (&f))); + printf ("(%s)", fmt_to_string (&f)); } } else if (t == OP_NUM_CON) { if (*num == SYSMIS) - debug_printf ((" SYSMIS")); + printf (" SYSMIS"); else - debug_printf ((" %f", *num)); + printf (" %f", *num); num++; } else if (t == OP_STR_CON) { - debug_printf ((" \"%.*s\"", *str, &str[1])); + printf (" \"%.*s\"", *str, &str[1]); str += str[0] + 1; } else if (t == OP_NUM_VAR || t == OP_STR_VAR) { - debug_printf ((" %s", (*v)->name)); + printf (" %s", (*v)->name); v++; } else if (t == OP_NUM_SYS) { - debug_printf ((" SYSMIS(#%d)", *o)); + printf (" SYSMIS(#%d)", *o); o++; } else if (t == OP_NUM_VAL) { - debug_printf ((" VALUE(#%d)", *o)); + printf (" VALUE(#%d)", *o); o++; } else if (t == OP_NUM_LAG || t == OP_STR_LAG) { - debug_printf ((" LAG(%s,%d)", (*v)->name, *o)); + printf (" LAG(%s,%d)", (*v)->name, *o); o++; v++; } else { - printf ("debug_print_postfix(): %d\n", t); + printf ("%d unknown\n", t); assert (0); } } - debug_putc ('\n', stdout); + putchar ('\n'); } -#endif /* GLOBAL_DEBUGGING */ + +#define DEFINE_OPERATOR(NAME, STACK_DELTA, FLAGS, ARGS) \ + {#NAME, STACK_DELTA, FLAGS, ARGS}, +struct op_desc ops[OP_SENTINEL] = + { +#include "expr.def" + }; diff --git a/src/expr.def b/src/expr.def new file mode 100644 index 00000000..298e00e2 --- /dev/null +++ b/src/expr.def @@ -0,0 +1,151 @@ +/* One operand, one result. */ +#define UNARY_OPERATOR(NAME) DEFINE_OPERATOR (NAME, 0, OP_NO_FLAGS, 0) + +/* Two operands, one result. */ +#define BINARY_OPERATOR(NAME) DEFINE_OPERATOR (NAME, -1, OP_NO_FLAGS, 0) + +/* Three operands, one result. */ +#define TERNARY_OPERATOR(NAME) DEFINE_OPERATOR (NAME, -2, OP_NO_FLAGS, 0) + +/* Variable operands, one result. */ +#define NARY_OPERATOR(NAME, FLAGS, ARGS) \ + DEFINE_OPERATOR (NAME, 0, OP_VAR_ARGS | OP_ABSORB_MISS | (FLAGS), ARGS) + +/* No operands, one result. */ +#define TERMINAL(NAME, ARGS) DEFINE_OPERATOR (NAME, +1, OP_NO_FLAGS, ARGS) + +/* Basic operators. */ +BINARY_OPERATOR (ADD) +BINARY_OPERATOR (SUB) +DEFINE_OPERATOR (MUL, -1, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (DIV, -1, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (MOD, -1, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (POW, -1, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (AND, -1, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (OR, -1, OP_ABSORB_MISS, 0) +UNARY_OPERATOR (NOT) + +/* Numeric relational operators. */ +BINARY_OPERATOR (EQ) +BINARY_OPERATOR (GE) +BINARY_OPERATOR (GT) +BINARY_OPERATOR (LE) +BINARY_OPERATOR (LT) +BINARY_OPERATOR (NE) + +/* String relational operators. */ +BINARY_OPERATOR (EQ_STRING) +BINARY_OPERATOR (GE_STRING) +BINARY_OPERATOR (GT_STRING) +BINARY_OPERATOR (LE_STRING) +BINARY_OPERATOR (LT_STRING) +BINARY_OPERATOR (NE_STRING) + +/* Unary functions. */ +UNARY_OPERATOR (NEG) +UNARY_OPERATOR (ABS) +UNARY_OPERATOR (ARCOS) +UNARY_OPERATOR (ARSIN) +UNARY_OPERATOR (ARTAN) +UNARY_OPERATOR (COS) +UNARY_OPERATOR (EXP) +UNARY_OPERATOR (LG10) +UNARY_OPERATOR (LN) +UNARY_OPERATOR (MOD10) +UNARY_OPERATOR (RND) +UNARY_OPERATOR (SIN) +UNARY_OPERATOR (SQRT) +UNARY_OPERATOR (TAN) +UNARY_OPERATOR (TRUNC) + +/* N-ary numeric functions. */ +NARY_OPERATOR (ANY, 0, 1) +NARY_OPERATOR (ANY_STRING, 0, 1) +NARY_OPERATOR (CFVAR, OP_MIN_ARGS, 2) +NARY_OPERATOR (MAX, OP_MIN_ARGS, 2) +NARY_OPERATOR (MAX_STRING, 0, 1) +NARY_OPERATOR (MEAN, OP_MIN_ARGS, 2) +NARY_OPERATOR (MIN, OP_MIN_ARGS, 2) +NARY_OPERATOR (MIN_STRING, 0, 1) +NARY_OPERATOR (NMISS, 0, 1) +NARY_OPERATOR (NVALID, 0, 1) +NARY_OPERATOR (RANGE, 0, 1) +NARY_OPERATOR (RANGE_STRING, 0, 1) +NARY_OPERATOR (SD, OP_MIN_ARGS, 2) +NARY_OPERATOR (SUM, OP_MIN_ARGS, 2) +NARY_OPERATOR (VARIANCE, OP_MIN_ARGS, 2) + +/* Time construction & extraction functions. */ +TERNARY_OPERATOR (TIME_HMS) +UNARY_OPERATOR (CTIME_DAYS) +UNARY_OPERATOR (CTIME_HOURS) +UNARY_OPERATOR (CTIME_MINUTES) +UNARY_OPERATOR (CTIME_SECONDS) +UNARY_OPERATOR (TIME_DAYS) + +/* Date construction functions. */ +TERNARY_OPERATOR (DATE_DMY) +TERNARY_OPERATOR (DATE_MDY) +BINARY_OPERATOR (DATE_MOYR) +BINARY_OPERATOR (DATE_QYR) +BINARY_OPERATOR (DATE_WKYR) +BINARY_OPERATOR (DATE_YRDAY) +TERNARY_OPERATOR (YRMODA) + +/* Date extraction functions. */ +UNARY_OPERATOR (XDATE_DATE) +UNARY_OPERATOR (XDATE_HOUR) +UNARY_OPERATOR (XDATE_JDAY) +UNARY_OPERATOR (XDATE_MDAY) +UNARY_OPERATOR (XDATE_MINUTE) +UNARY_OPERATOR (XDATE_MONTH) +UNARY_OPERATOR (XDATE_QUARTER) +UNARY_OPERATOR (XDATE_SECOND) +UNARY_OPERATOR (XDATE_TDAY) +UNARY_OPERATOR (XDATE_TIME) +UNARY_OPERATOR (XDATE_WEEK) +UNARY_OPERATOR (XDATE_WKDAY) +UNARY_OPERATOR (XDATE_YEAR) + +/* String functions. */ +NARY_OPERATOR (CONCAT, 0, 0) +BINARY_OPERATOR (INDEX_2) +TERNARY_OPERATOR (INDEX_3) +BINARY_OPERATOR (RINDEX_2) +TERNARY_OPERATOR (RINDEX_3) +UNARY_OPERATOR (LENGTH) +UNARY_OPERATOR (LOWER) +UNARY_OPERATOR (UPPER) +DEFINE_OPERATOR (LPAD, -2, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (RPAD, -2, OP_ABSORB_MISS, 0) +UNARY_OPERATOR (LTRIM) +UNARY_OPERATOR (RTRIM) +DEFINE_OPERATOR (NUMBER, -1, OP_FMT_SPEC, 3) +DEFINE_OPERATOR (STRING, 0, OP_FMT_SPEC | OP_ABSORB_MISS, 3) +DEFINE_OPERATOR (SUBSTR_2, -1, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (SUBSTR_3, -2, OP_ABSORB_MISS, 0) + +/* Artificial. */ +UNARY_OPERATOR (SQUARE) +UNARY_OPERATOR (NUM_TO_BOOL) + +/* Weirdness. */ +UNARY_OPERATOR (NORMAL) +UNARY_OPERATOR (UNIFORM) +DEFINE_OPERATOR (SYSMIS, 0, OP_ABSORB_MISS, 0) +DEFINE_OPERATOR (VEC_ELEM_NUM, 0, OP_MIN_ARGS, 1) +DEFINE_OPERATOR (VEC_ELEM_STR, 0, OP_MIN_ARGS, 1) + +/* Terminals. */ +TERMINAL (NUM_CON, 0) +TERMINAL (STR_CON, 0) +TERMINAL (NUM_VAR, 0) +TERMINAL (STR_VAR, 0) +TERMINAL (NUM_LAG, 1) +TERMINAL (STR_LAG, 1) +TERMINAL (NUM_SYS, 1) +TERMINAL (NUM_VAL, 1) +TERMINAL (STR_MIS, 1) +TERMINAL (CASENUM, 0) + +#undef DEFINE_OPERATOR diff --git a/src/expr.h b/src/expr.h index e2cc9e36..12912565 100644 --- a/src/expr.h +++ b/src/expr.h @@ -21,23 +21,26 @@ #define expr_h 1 /* Expression parsing flags. */ -enum +enum expr_type { - PXP_NONE = 000, /* No flags. */ - - /* Specify expression type. */ - PXP_BOOLEAN = 002, /* Coerce return value to Boolean. */ - PXP_NUMERIC = 004, /* Must be numeric result type. */ - PXP_STRING = 010 /* Must be string result type. */ + EXPR_ANY = 0, /* Any type. */ + EXPR_BOOLEAN = 1, /* Must be numeric; coerce to Boolean. */ + EXPR_NUMERIC = 2, /* Must be numeric result type. */ + EXPR_STRING = 3, /* Must be string result type. */ + EXPR_ERROR = 4, /* Indicates an error. */ + EXPR_NO_OPTIMIZE = 0x1000 /* May be set in expr_parse() + argument to disable optimization. */ }; struct expression; struct ccase; union value; -struct expression *expr_parse (int flags); -double expr_evaluate (struct expression *, const struct ccase *, int case_num, - union value *); +struct expression *expr_parse (enum expr_type); +enum expr_type expr_get_type (const struct expression *); +double expr_evaluate (const struct expression *, const struct ccase *, + int case_num, union value *); void expr_free (struct expression *); +void expr_debug_print_postfix (const struct expression *); #endif /* expr.h */ diff --git a/src/exprP.h b/src/exprP.h index 59663800..40033a95 100644 --- a/src/exprP.h +++ b/src/exprP.h @@ -24,174 +24,25 @@ /*#define DEBUGGING 1*/ #include "debug-print.h" -#if GLOBAL_DEBUGGING -void debug_print_expr (struct expression *); void debug_print_op (short int *); -#endif -/* Expression types. */ -enum - { - EX_ERROR, /* Error value for propagation. */ - EX_BOOLEAN, /* Numeric value that's 0, 1, or SYSMIS. */ - EX_NUMERIC, /* Numeric value. */ - EX_STRING /* String value. */ - }; -/* Expression operators. - The ordering below is important. Do not change it. */ +/* Expression operators. */ +#define DEFINE_OPERATOR(NAME, STACK_DELTA, FLAGS, ARGS) \ + OP_##NAME, enum { - OP_ERROR, - - /* Basic operators. */ - OP_PLUS, - OP_MUL, - OP_POW, - OP_AND, - OP_OR, - OP_NOT, - - /* Numeric relational operators. */ - OP_EQ, - OP_GE, - OP_GT, - OP_LE, - OP_LT, - OP_NE, - - /* String relational operators. */ - OP_STRING_EQ, - OP_STRING_GE, - OP_STRING_GT, - OP_STRING_LE, - OP_STRING_LT, - OP_STRING_NE, - - /* Unary functions. */ - OP_NEG, - OP_ABS, - OP_ARCOS, - OP_ARSIN, - OP_ARTAN, - OP_COS, - OP_EXP, - OP_LG10, - OP_LN, - OP_MOD10, - OP_RND, - OP_SIN, - OP_SQRT, - OP_TAN, - OP_TRUNC, - - /* N-ary numeric functions. */ - OP_ANY, - OP_ANY_STRING, - OP_CFVAR, - OP_MAX, - OP_MEAN, - OP_MIN, - OP_NMISS, - OP_NVALID, - OP_RANGE, - OP_RANGE_STRING, - OP_SD, - OP_SUM, - OP_VARIANCE, - - /* Time construction & extraction functions. */ - OP_TIME_HMS, - - /* These never appear in a tree or an expression. - They disappear in parse.c:unary_func(). */ - OP_CTIME_DAYS, - OP_CTIME_HOURS, - OP_CTIME_MINUTES, - OP_CTIME_SECONDS, - OP_TIME_DAYS, - - /* Date construction functions. */ - OP_DATE_DMY, - OP_DATE_MDY, - OP_DATE_MOYR, - OP_DATE_QYR, - OP_DATE_WKYR, - OP_DATE_YRDAY, - OP_YRMODA, - - /* Date extraction functions. */ - OP_XDATE_DATE, - OP_XDATE_HOUR, - OP_XDATE_JDAY, - OP_XDATE_MDAY, - OP_XDATE_MINUTE, - OP_XDATE_MONTH, - OP_XDATE_QUARTER, - OP_XDATE_SECOND, - OP_XDATE_TDAY, - OP_XDATE_TIME, - OP_XDATE_WEEK, - OP_XDATE_WKDAY, - OP_XDATE_YEAR, - - /* String functions. */ - OP_CONCAT, - OP_INDEX, - OP_INDEX_OPT, - OP_RINDEX, - OP_RINDEX_OPT, - OP_LENGTH, - OP_LOWER, - OP_UPPER, - OP_LPAD, - OP_LPAD_OPT, - OP_RPAD, - OP_RPAD_OPT, - OP_LTRIM, - OP_LTRIM_OPT, - OP_RTRIM, - OP_RTRIM_OPT, - OP_NUMBER, - OP_NUMBER_OPT, - OP_STRING, - OP_SUBSTR, - OP_SUBSTR_OPT, - - /* Artificial. */ - OP_INV, /* Reciprocal. */ - OP_SQUARE, /* Squares the argument. */ - OP_NUM_TO_BOOL, /* Converts ~0=>0, ~1=>1, SYSMIS=>SYSMIS, - others=>0 with a warning. */ - - /* Weirdness. */ - OP_MOD, /* Modulo function. */ - OP_NORMAL, /* Normally distributed PRNG. */ - OP_UNIFORM, /* Uniformly distributed PRNG. */ - OP_SYSMIS, /* Tests whether for SYSMIS argument. */ - OP_VEC_ELEM_NUM, /* Element of a numeric vector. */ - OP_VEC_ELEM_STR, /* Element of a string vector. */ - - /* Terminals. */ - OP_TERMINAL, /* Not a valid type. Boundary - between terminals and nonterminals. */ - - OP_NUM_CON, /* Numeric constant. */ - OP_STR_CON, /* String literal. */ - OP_NUM_VAR, /* Numeric variable reference. */ - OP_STR_VAR, /* String variable reference. */ - OP_NUM_LAG, /* Numeric variable from an earlier case. */ - OP_STR_LAG, /* String variable from an earlier case. */ - OP_NUM_SYS, /* SYSMIS(numvar). */ - OP_NUM_VAL, /* VALUE(numvar). */ - OP_STR_MIS, /* MISSING(strvar). */ - OP_CASENUM, /* $CASENUM. */ - OP_SENTINEL /* Sentinel. */ +#include "expr.def" + OP_SENTINEL }; +#define IS_TERMINAL(OPERATOR) (ops[OPERATOR].height > 0) +#define IS_NONTERMINAL(OPERATOR) !IS_TERMINAL (OPERATOR) + /* Flags that describe operators. */ enum { + OP_NO_FLAGS = 0, /* No flags. */ OP_VAR_ARGS = 001, /* 1=Variable number of args. */ OP_MIN_ARGS = 002, /* 1=Can specific min args with .X. */ OP_FMT_SPEC = 004, /* 1=Includes a format specifier. */ @@ -202,11 +53,9 @@ enum /* Describes an operator. */ struct op_desc { -#if GLOBAL_DEBUGGING const char *name; /* Operator name. */ -#endif - unsigned char flags; /* Flags. */ signed char height; /* Effect on stack height. */ + unsigned char flags; /* Flags. */ unsigned char skip; /* Number of operator item arguments. */ }; @@ -234,7 +83,7 @@ struct str_con_node struct var_node { int type; /* OP_NUM_VAR, OP_NUM_SYS, OP_NUM_VAL, - OP_STR_MIS, or OP_STR_VAR. */ + or OP_STR_VAR. */ struct variable *v; /* Variable. */ }; @@ -275,7 +124,7 @@ union any_node /* An expression. */ struct expression { - int type; /* Type of expression result. */ + enum expr_type type; /* Type of expression result. */ unsigned char *op; /* Operators. */ struct variable **var; /* Variables. */ double *num; /* Numeric operands. */ @@ -284,7 +133,7 @@ struct expression struct pool *pool; /* Pool for evaluation temporaries. */ }; -struct nonterm_node *optimize_expression (struct nonterm_node *); +void optimize_expression (union any_node **); void dump_expression (union any_node *, struct expression *); void free_node (union any_node *); diff --git a/src/get.c b/src/get.c index e2b4977e..aa3a4e49 100644 --- a/src/get.c +++ b/src/get.c @@ -738,8 +738,11 @@ cmd_match_files (void) name = mtf.tail->last; sbc = "LAST"; } - else - assert (0); + else + { + assert (0); + abort (); + } lex_match ('='); if (token != T_ID) diff --git a/src/getline.c b/src/getline.c index b9f0a1d5..f55cd9f9 100644 --- a/src/getline.c +++ b/src/getline.c @@ -451,6 +451,7 @@ read_console (void) default: assert (0); + abort (); } line = readline (prompt); diff --git a/src/html.c b/src/html.c index 49261473..f0b4f5fd 100644 --- a/src/html.c +++ b/src/html.c @@ -168,6 +168,7 @@ html_option (struct outp_driver *this, const char *key, const struct string *val break; default: assert (0); + abort (); } if (*dest) free (*dest); diff --git a/src/inpt-pgm.c b/src/inpt-pgm.c index 3ed20e19..69f5b048 100644 --- a/src/inpt-pgm.c +++ b/src/inpt-pgm.c @@ -303,6 +303,7 @@ end_case_trns_proc (struct trns_header *t UNUSED, struct ccase * c UNUSED, int case_num UNUSED) { assert (0); + abort (); } /* REREAD transformation. */ @@ -342,7 +343,7 @@ cmd_reread (void) return CMD_FAILURE; } - e = expr_parse (PXP_NUMERIC); + e = expr_parse (EXPR_NUMERIC); if (!e) return CMD_FAILURE; } diff --git a/src/lexer.c b/src/lexer.c index 9ad48dfb..dd20c8b9 100644 --- a/src/lexer.c +++ b/src/lexer.c @@ -696,32 +696,23 @@ lex_put_back_id (const char *id) /* Weird line processing functions. */ -/* Discards the rest of the current input line for tokenization - purposes, but returns the entire contents of the line for use by - the caller. */ -char * +/* Returns the entire contents of the current line. */ +const char * lex_entire_line (void) { - prog = ds_end (&getl_buf); - dot = 0; return ds_value (&getl_buf); } /* As lex_entire_line(), but only returns the part of the current line that hasn't already been tokenized. - If HAD_DOT is non-null, stores nonzero into *HAD_DOT if the line + If END_DOT is non-null, stores nonzero into *END_DOT if the line ends with a terminal dot, or zero if it doesn't. */ -char * -lex_rest_of_line (int *had_dot) +const char * +lex_rest_of_line (int *end_dot) { - char *s = prog; - prog = ds_end (&getl_buf); - - if (had_dot) - *had_dot = dot; - dot = 0; - - return s; + if (end_dot) + *end_dot = dot; + return prog; } /* Causes the rest of the current input line to be ignored for @@ -729,10 +720,7 @@ lex_rest_of_line (int *had_dot) void lex_discard_line (void) { - msg (SW, _("The rest of this command has been discarded.")); - - ds_clear (&getl_buf); - prog = ds_value (&getl_buf); + prog = ds_end (&getl_buf); dot = put_token = 0; } diff --git a/src/lexer.h b/src/lexer.h index 2dd7da06..23ce64bf 100644 --- a/src/lexer.h +++ b/src/lexer.h @@ -114,8 +114,8 @@ void lex_put_back (int); void lex_put_back_id (const char *tokid); /* Weird line processing functions. */ -char *lex_entire_line (void); -char *lex_rest_of_line (int *had_dot); +const char *lex_entire_line (void); +const char *lex_rest_of_line (int *end_dot); void lex_discard_line (void); void lex_set_prog (char *p); diff --git a/src/loop.c b/src/loop.c index bfb18e6e..b2685cfa 100644 --- a/src/loop.c +++ b/src/loop.c @@ -198,7 +198,7 @@ internal_cmd_loop (void) assert (token == '='); lex_get (); - one->init = expr_parse (PXP_NUMERIC); + one->init = expr_parse (EXPR_NUMERIC); if (!one->init) return 0; @@ -207,7 +207,7 @@ internal_cmd_loop (void) expr_free (one->init); return 0; } - one->term = expr_parse (PXP_NUMERIC); + one->term = expr_parse (EXPR_NUMERIC); if (!one->term) { expr_free (one->init); @@ -216,7 +216,7 @@ internal_cmd_loop (void) if (lex_match (T_BY)) { - one->incr = expr_parse (PXP_NUMERIC); + one->incr = expr_parse (EXPR_NUMERIC); if (!one->incr) return 0; } @@ -229,7 +229,7 @@ internal_cmd_loop (void) { two->flags |= LPC_COND; - two->cond = expr_parse (PXP_BOOLEAN); + two->cond = expr_parse (EXPR_BOOLEAN); if (!two->cond) return 0; } @@ -313,7 +313,7 @@ internal_cmd_end_loop (void) /* Parse the expression if any. */ if (lex_match_id ("IF")) { - thr->cond = expr_parse (PXP_BOOLEAN); + thr->cond = expr_parse (EXPR_BOOLEAN); if (!thr->cond) return 0; } diff --git a/src/main.c b/src/main.c index 3b7687b2..987daa8f 100644 --- a/src/main.c +++ b/src/main.c @@ -166,8 +166,11 @@ handle_error (int code) while (token != T_STOP && token != '.') lex_get (); } - else - lex_discard_line (); + else + { + msg (SW, _("The rest of this command has been discarded.")); + lex_discard_line (); + } } diff --git a/src/matrix-data.c b/src/matrix-data.c index 7d8b55f3..6f3d58ad 100644 --- a/src/matrix-data.c +++ b/src/matrix-data.c @@ -1152,6 +1152,7 @@ nr_read_data_lines (struct nr_aux_data *nr, break; default: assert (0); + abort (); } break; case 2: @@ -1159,6 +1160,7 @@ nr_read_data_lines (struct nr_aux_data *nr, break; default: assert (0); + abort (); } { @@ -2006,10 +2008,12 @@ wr_read_indeps (struct wr_aux_data *wr) break; default: assert (0); + abort (); } break; default: assert (0); + abort (); } c->n_rows[wr->content]++; diff --git a/src/numeric.c b/src/numeric.c index 2b42d2cf..26961289 100644 --- a/src/numeric.c +++ b/src/numeric.c @@ -146,6 +146,7 @@ cmd_string (void) break; default: assert (0); + abort (); } /* Create each variable. */ diff --git a/src/postscript.c b/src/postscript.c index 2bfc3fd7..8a4c9acb 100644 --- a/src/postscript.c +++ b/src/postscript.c @@ -661,6 +661,7 @@ ps_option (struct outp_driver *this, const char *key, const struct string *val) break; default: assert (0); + abort (); } if (setting) x->output_options |= mask; @@ -762,6 +763,7 @@ ps_option (struct outp_driver *this, const char *key, const struct string *val) break; default: assert (0); + abort (); } if (*dest) free (*dest); @@ -2410,6 +2412,7 @@ write_text (struct outp_driver *this, break; default: assert (0); + abort (); } lp = line; diff --git a/src/print.c b/src/print.c index 43cff102..273ab291 100644 --- a/src/print.c +++ b/src/print.c @@ -878,6 +878,7 @@ alloc_line (void) case PRT_ERROR: default: assert (0); + abort (); } if (pot_w > w) w = pot_w; @@ -1030,7 +1031,7 @@ cmd_print_space (void) if (token != '.') { - e = expr_parse (PXP_NUMERIC); + e = expr_parse (EXPR_NUMERIC); if (token != '.') { expr_free (e); diff --git a/src/recode.c b/src/recode.c index 12f682c8..e415dcf8 100644 --- a/src/recode.c +++ b/src/recode.c @@ -788,6 +788,7 @@ recode_trns_proc (struct trns_header * t, struct ccase * c, break; default: assert (0); + abort (); } if (!cp) continue; diff --git a/src/repeat.c b/src/repeat.c index 401a50ec..dad14549 100644 --- a/src/repeat.c +++ b/src/repeat.c @@ -299,7 +299,7 @@ internal_cmd_do_repeat (void) REPEAT line. We should actually check for the PRINT specifier. This can be done easier when we buffer entire commands instead of doing it token by token; see TODO. */ - lex_entire_line (); + lex_discard_line (); /* Tie up the loose end of the chain. */ if (line_buf_head == NULL) diff --git a/src/sel-if.c b/src/sel-if.c index 330ae437..6f6f2d02 100644 --- a/src/sel-if.c +++ b/src/sel-if.c @@ -43,7 +43,7 @@ cmd_select_if (void) struct expression *e; struct select_if_trns *t; - e = expr_parse (PXP_BOOLEAN); + e = expr_parse (EXPR_BOOLEAN); if (!e) return CMD_FAILURE; @@ -120,7 +120,7 @@ cmd_process_if (void) { struct expression *e; - e = expr_parse (PXP_BOOLEAN); + e = expr_parse (EXPR_BOOLEAN); if (!e) return CMD_FAILURE; diff --git a/src/sfm-write.c b/src/sfm-write.c index 3cfb1f5c..cb6bfa54 100644 --- a/src/sfm-write.c +++ b/src/sfm-write.c @@ -365,6 +365,7 @@ write_variable (struct sfm_write_info *inf, struct variable *v) break; default: assert (0); + abort (); } sv.n_missing_values = nm; diff --git a/src/sort.c b/src/sort.c index ec90b558..c17ca2a8 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1114,6 +1114,7 @@ mod (int x, int y) return y - (-x) % y; assert (0); + abort (); } /* A run of data for use in merging. */ diff --git a/src/str.c b/src/str.c index e807f36d..056ef6fc 100644 --- a/src/str.c +++ b/src/str.c @@ -104,51 +104,36 @@ mm_find_reverse (const char *haystack, size_t haystack_len, return 0; } -/* Compares S0 of length S0L to S1 of length S1L. The shorter string - is considered to be padded with spaces to the length of the - longer. */ +/* Compares A of length A_LEN to B of length B_LEN. The shorter + string is considered to be padded with spaces to the length of + the longer. */ int -st_compare_pad (const char *s0, int s0l, const char *s1, int s1l) +st_compare_pad (const char *a, size_t a_len, const char *b, size_t b_len) { - /* 254 spaces. */ - static char blanks[254] = - " " - " " - " " - " " - " "; + size_t min_len; + int result; - int diff = s0l - s1l; - int r; - - if (diff == 0) - { - if (s0l == 0) - return 0; - return memcmp (s0, s1, s0l); - } - else if (diff > 0) + min_len = a_len < b_len ? a_len : b_len; + result = memcmp (a, b, min_len); + if (result != 0) + return result; + else { - /* s0l > s1l */ - if (s1l) - { - r = memcmp (s0, s1, s1l); - if (r) - return r; - } - return memcmp (&s0[s1l], blanks, diff); - } - else - /* diff<0 */ - { - /* s0l < s1l */ - if (s0l) - { - r = memcmp (s0, s1, s0l); - if (r) - return r; - } - return memcmp (blanks, &s1[s0l], -diff); + size_t idx; + + if (a_len < b_len) + { + for (idx = min_len; idx < b_len; idx++) + if (' ' != b[idx]) + return ' ' > b[idx] ? 1 : -1; + } + else + { + for (idx = min_len; idx < a_len; idx++) + if (a[idx] != ' ') + return a[idx] > ' ' ? 1 : -1; + } + return 0; } } diff --git a/src/str.h b/src/str.h index 1951507c..3520e280 100644 --- a/src/str.h +++ b/src/str.h @@ -121,7 +121,7 @@ long getdelim (char **lineptr, size_t * n, int delimiter, FILE * stream); void mm_reverse (void *, size_t); char *mm_find_reverse (const char *, size_t, const char *, size_t); -int st_compare_pad (const char *, int, const char *, int); +int st_compare_pad (const char *, size_t, const char *, size_t); char *st_spaces (int); void st_bare_pad_copy (char *dest, const char *src, size_t n); void st_bare_pad_len_copy (char *dest, const char *src, size_t n, size_t len); diff --git a/src/title.c b/src/title.c index 50a065b9..cb56f84d 100644 --- a/src/title.c +++ b/src/title.c @@ -75,6 +75,7 @@ get_title (const char *cmd, char **title) if (*title) free (*title); *title = xstrdup (lex_rest_of_line (NULL)); + lex_discard_line (); for (cp = *title; *cp; cp++) *cp = toupper ((unsigned char) (*cp)); token = '.'; @@ -87,9 +88,10 @@ get_title (const char *cmd, char **title) int cmd_file_label (void) { - char *label; + const char *label; label = lex_rest_of_line (NULL); + lex_discard_line (); while (isspace ((unsigned char) *label)) label++; @@ -143,20 +145,21 @@ cmd_document (void) for (;;) { int had_dot; - char *line; + const char *orig_line; + char *copy_line; - line = lex_rest_of_line (&had_dot); - while (isspace ((unsigned char) *line)) - line++; + orig_line = lex_rest_of_line (&had_dot); + lex_discard_line (); + while (isspace ((unsigned char) *orig_line)) + orig_line++; + copy_line = xmalloc (strlen (orig_line) + 2); + strcpy (copy_line, orig_line); if (had_dot) - { - char *cp = strchr (line, 0); - *cp++ = '.'; - *cp = 0; - } + strcat (copy_line, "."); - add_document_line (line, 3); + add_document_line (copy_line, 3); + free (copy_line); lex_get_line (); if (had_dot) diff --git a/src/var-labs.c b/src/var-labs.c index ee9f4b78..a098e62a 100644 --- a/src/var-labs.c +++ b/src/var-labs.c @@ -41,10 +41,10 @@ cmd_variable_labels (void) int i; - lex_match ('/'); do { - parse_variables (default_dict, &v, &nv, PV_NONE); + if (!parse_variables (default_dict, &v, &nv, PV_NONE)) + return CMD_PART_SUCCESS_MAYBE; if (token != T_STRING) { diff --git a/src/vars-prs.c b/src/vars-prs.c index 143ad171..21340ef0 100644 --- a/src/vars-prs.c +++ b/src/vars-prs.c @@ -104,12 +104,14 @@ dict_class_to_name (enum dict_class dict_class) return _("scratch"); default: assert (0); + abort (); } } /* Parses a set of variables from dictionary D given options OPTS. Resulting list of variables stored in *VAR and the - number of variables into *CNT. */ + number of variables into *CNT. Returns nonzero only if + successful. */ int parse_variables (struct dictionary *d, struct variable ***var, int *cnt, int opts) diff --git a/tests/ChangeLog b/tests/ChangeLog index 559bda08..2b27655a 100644 --- a/tests/ChangeLog +++ b/tests/ChangeLog @@ -1,3 +1,16 @@ +Fri Mar 26 00:55:48 2004 Ben Pfaff + + * Makefile.am: (TESTS) Add xforms/expressions.sh, remove + command/compute.sh. + + * command/beg-data.sh: Remove REMARK usage. + + * command/bignum.sh: Ditto. + + * command/list.sh: Ditto. + + * command/print.sh: Ditto. + Sat Mar 20 18:11:15 2004 Ben Pfaff * command/list.sh: Update output. diff --git a/tests/Makefile.am b/tests/Makefile.am index 9b2feee2..9e9bd47b 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -6,7 +6,6 @@ TESTS = command/aggregate.sh \ command/beg-data.sh \ command/bignum.sh \ command/count.sh \ - command/compute.sh \ command/descriptives.sh \ command/erase.sh \ command/file-label.sh \ @@ -42,7 +41,8 @@ TESTS = command/aggregate.sh \ bugs/random.sh \ bugs/t-test.sh \ bugs/temporary.sh \ - bugs/val-labs.sh + bugs/val-labs.sh \ + xforms/expressions.sh noinst_PROGRAMS = gengarbage diff --git a/tests/command/beg-data.sh b/tests/command/beg-data.sh index 382e2a03..d7131f09 100755 --- a/tests/command/beg-data.sh +++ b/tests/command/beg-data.sh @@ -49,12 +49,6 @@ activity="create program" cat > $TEMPDIR/prog.sps << EOF_foobar title 'Test BEGIN DATA ... END DATA'. -remark EOF ----------------------------------------------------------------------- -First we show that we can input data with BEGIN DATA/END DATA after -a procedure. ----------------------------------------------------------------------- -EOF data list /A B 1-2. list. begin data. @@ -65,11 +59,6 @@ begin data. 90 end data. -remark EOF ----------------------------------------------------------------------- -Next we show that BEGIN DATA/END DATA work fine on their own as well. ----------------------------------------------------------------------- -EOF data list /A B 1-2. begin data. 09 @@ -89,11 +78,6 @@ if [ $? -ne 0 ] ; then no_result ; fi activity="compare data" diff -b -B $TEMPDIR/pspp.list - << foobar ----------------------------------------------------------------------- -First we show that we can input data with BEGIN DATA/END DATA after -a procedure. ----------------------------------------------------------------------- - 1.1 DATA LIST. Reading 1 record from the command file. +--------+------+-------+------+ |Variable|Record|Columns|Format| @@ -110,10 +94,6 @@ A B 7 8 9 0 ----------------------------------------------------------------------- -Next we show that BEGIN DATA/END DATA work fine on their own as well. ----------------------------------------------------------------------- - 2.1 DATA LIST. Reading 1 record from the command file. +--------+------+-------+------+ |Variable|Record|Columns|Format| diff --git a/tests/command/bignum.sh b/tests/command/bignum.sh index 1ccc09ba..a6465426 100755 --- a/tests/command/bignum.sh +++ b/tests/command/bignum.sh @@ -118,31 +118,11 @@ cat > $TEMPDIR/prog.stat < $TEMPDIR/compute.stat < $TEMPDIR/list.stat << foobar *** Single lines. -remark EOF ----------------------------------------------------------------------- -Testing use of LIST in single-line cases. ----------------------------------------------------------------------- -EOF data list file='$top_srcdir/tests/weighting.data'/AVAR 1-5 BVAR 6-10. weight by BVAR. list. @@ -61,11 +56,6 @@ list. list /format numbered weight. *** Multiple lines. -remark EOF ----------------------------------------------------------------------- -Testing use of LIST in multi-line cases. ----------------------------------------------------------------------- -EOF data list file='$top_srcdir/tests/list.data' notable /X000 to X126 1-127. *list /cases=from 1 to 25 by 5 /format numbered. list x000 to x030. @@ -81,10 +71,6 @@ if [ $? -ne 0 ] ; then no_result ; fi activity="compare results" diff -b -B $TEMPDIR/pspp.list - < $TEMPDIR/print.stat << foobar title 'Test PRINT transformation'. -remark EOF ----------------------------------------------------------------------- -There is no test for DATA LIST FIXED since it is imagined that the -rest of the tests give it a pretty good workout. ----------------------------------------------------------------------- -EOF -remark EOF ----------------------------------------------------------------------- -Testing use of DATA LIST FREE. ----------------------------------------------------------------------- -EOF data list free table file='$TEMPDIR/data-list.data'/A B C D. print outfile="foo" table/A(f8.2) '/' B(e8.2) '/' C(n10) '/' D(rbhex16) '/'. print space a. print outfile="foo" /a b c d. list. -remark EOF ----------------------------------------------------------------------- -Testing use of DATA LIST LIST. ----------------------------------------------------------------------- -EOF data list list table file='$TEMPDIR/data-list.data'/A B C D. print table/A B C D. list. @@ -112,15 +96,6 @@ if [ $? -ne 0 ] ; then fail ; fi activity="compare output" diff -b -B $TEMPDIR/pspp.list - << EOF ----------------------------------------------------------------------- -There is no test for DATA LIST FIXED since it is imagined that the -rest of the tests give it a pretty good workout. ----------------------------------------------------------------------- - ----------------------------------------------------------------------- -Testing use of DATA LIST FREE. ----------------------------------------------------------------------- - 1.1 DATA LIST. Reading free-form data from file $TEMPDIR/data-list.data. +--------+------+ |Variable|Format| @@ -154,10 +129,6 @@ Testing use of DATA LIST FREE. . 10.00 11.00 12.00 ----------------------------------------------------------------------- -Testing use of DATA LIST LIST. ----------------------------------------------------------------------- - 3.1 DATA LIST. Reading free-form data from file $TEMPDIR/data-list.data. +--------+------+ |Variable|Format| diff --git a/tests/xforms/expressions.sh b/tests/xforms/expressions.sh new file mode 100755 index 00000000..a6e8fd3e --- /dev/null +++ b/tests/xforms/expressions.sh @@ -0,0 +1,944 @@ +#! /bin/sh + +# Tests the expression optimizer and evaluator. + +TEMPDIR=/tmp/pspp-tst-$$ + +here=`pwd`; + +# ensure that top_srcdir is absolute +cd $top_srcdir; top_srcdir=`pwd` + +export STAT_CONFIG_PATH=$top_srcdir/config + + +cleanup() +{ + #rm -rf $TEMPDIR + : +} + + +fail() +{ + echo $activity + echo FAILED + cleanup; + exit 1; +} + + +no_result() +{ + echo $activity + echo NO RESULT; + cleanup; + exit 2; +} + +pass() +{ + cleanup; + exit 0; +} + +mkdir -p $TEMPDIR + +cd $TEMPDIR +activity="create expressions list" +sed -ne 's/#.*//;/^[ ]*$/!p' > $TEMPDIR/expr-list <<'EOF' + +# Number syntax. +1e2 => 100.00 +1e+2 => 100.00 +1e-2 => 0.01 +1e-99 => 0.00 + +# Test using numeric/string values as Booleans and vice-versa +0 AND 1 => false +$true AND 1 => true +1 OR $false => true +1 OR $sysmis => true +2 OR $sysmis => sysmis +2 AND $sysmis => false +'string' AND $sysmis => error +0 AND $sysmis => false +(1>2) + 1 => 1.00 +$true + $false => 1.00 + +# Addition and subtraction. +1 + 2 => 3.00 +1 + $true => 2.00 +$sysmis + 1 => sysmis +7676 + $sysmis => sysmis +('foo') + 5 => error +('foo') + ('bar') => error # Arithmetic concatenation requires CONCAT. +'foo' + 'bar' => "foobar" # Lexical concatentation succeeds. +1 +3 - 2 +4 -5 => 1.00 +1 - $true => 0.00 +$true - 4/3 => -0.33 +'string' - 1e10 => error +9.5 - '' => error +1 - 2 => -1.00 +52 -23 => 29.00 + +# Multiplication and division +5 * 10 => 50.00 +10 * $true => 10.00 +$true * 5 => 5.00 +1.5 * $true => 1.50 +5 * $sysmis => sysmis +$sysmis * 15 => sysmis +2 * 5 / 10 => 1.00 +1 / 2 => 0.50 +2 / 5 => 0.40 +12 / 3 / 2 => 2.00 + +# Exponentiation. +2**8 => 256.00 +(2**3)**4 => 4096.00 # Irritating, but compatible. +2**3**4 => 4096.00 + +# Unary minus. +2+-3 => -1.00 +2*-3 => -6.00 +-3**2 => -9.00 +(-3)**2 => 9.00 +2**-1 => 0.50 +0**0 => sysmis +0**-1 => sysmis +(-3)**1.5 => sysmis + +# AND truth table. +$false AND $false => false +$false AND $true => false +$false AND $sysmis => false +$true AND $false => false +$true AND $true => true +$true AND $sysmis => sysmis +$sysmis AND $false => false +$sysmis AND $true => sysmis +$sysmis AND $sysmis => sysmis +$false & $false => false +$false & $true => false +$false & $sysmis => false +$true & $false => false +$true & $true => true +$true & $sysmis => sysmis +$sysmis & $false => false +$sysmis & $true => sysmis +$sysmis & $sysmis => sysmis + +# OR truth table. +$false OR $false => false +$false OR $true => true +$false OR $sysmis => sysmis +$true OR $false => true +$true OR $true => true +$true OR $sysmis => true +$sysmis OR $false => sysmis +$sysmis OR $true => true +$sysmis OR $sysmis => sysmis +$false | $false => false +$false | $true => true +$false | $sysmis => sysmis +$true | $false => true +$true | $true => true +$true | $sysmis => true +$sysmis | $false => sysmis +$sysmis | $true => true +$sysmis | $sysmis => sysmis + +# NOT truth table. +not $false => true +not 0 => true +not 2.5 => true +not $true => false +not 1 => false +not $sysmis => sysmis +~ $false => true +~ 0 => true +~ 2.5 => true +~ $true => false +~ 1 => false +~ $sysmis => sysmis + +# Relational operators. +1 eq 1 => true +1 = 1 => true +1 eq 2 => false +2 = 3 => false +1 eq 'foobar' => error +5 eq 'foobar' => error +'baz' = 10 => error +'quux' = 5.55 => error +'foobar' = 'foobar' => true +'quux' = 'bar' => false +'bar ' = 'bar' => true +'asdf ' = 'asdf ' => true +'asdfj ' = 'asdf' => false +1 + 2 = 3 => true # Check precedence. +1 >= 2 = 2 ge 3 => false # Check precedence. +3 ne 2 ~= 1 => false # Mathematically true. +3 > 2 > 1 => false # Mathematically true. + +1 <= 2 => true +2.5 <= 1.5 => false +1 le 2 => true +2 <= 2 => true +2 le 2 => true +2 < = 2 => error # Make sure <= token can't be split. +1 <= 'foobar' => error +5 <= 'foobar' => error +'baz' <= 10 => error +'quux' <= 5.55 => error +'0123' <= '0123' => true +'0123' <= '0124' => true +'0124' le '0123' => false +'0123 ' <= '0123' => true +'0123' le '0123 ' => true + +1 < 2 => true +2.5 < 1.5 => false +3.5 lt 4 => true +4 lt 3.5 => false +1 lt 'foobar' => error +5 lt 'foobar' => error +'baz' < 10 => error +'quux' < 5.55 => error +'0123' lt '0123' => false +'0123' < '0124' => true +'0124' lt '0123' => false +'0123 ' < '0123' => false +'0123' lt '0123 ' => false + +1 >= 2 => false +2.5 >= 1.5 => true +1 ge 2 => false +2 >= 2 => true +2 ge 2 => true +2 > = 2 => error # Make sure >= token can't be split. +1 >= 'foobar' => error +5 ge 'foobar' => error +'baz' ge 10 => error +'quux' >= 5.55 => error +'0123' ge '0123' => true +'0123' >= '0124' => false +'0124' >= '0123' => true +'0123 ' ge '0123' => true +'0123' >= '0123 ' => true + +1 > 2 => false +2.5 > 1.5 => true +3.5 gt 4 => false +4 gt 3.5 => true +1 gt 'foobar' => error +5 gt 'foobar' => error +'baz' > 10 => error +'quux' > 5.55 => error +'0123' gt '0123' => false +'0123' > '0124' => false +'0124' gt '0123' => true +'0123 ' > '0123' => false +'0123' gt '0123 ' => false + +1 ne 1 => false +1 ~= 1 => false +1 <> 2 => true +2 ne 3 => true +1 ~= 'foobar' => error +5 <> 'foobar' => error +'baz' ne 10 => error +'quux' ~= 5.55 => error +'foobar' <> 'foobar' => false +'quux' ne 'bar' => true +'bar ' <> 'bar' => false +'asdf ' ~= 'asdf ' => false +'asdfj ' ne 'asdf' => true +1 < > 1 => error # <> token can't be split +1 ~ = 1 => error # ~= token can't be split + +exp(10) => 22026.47 +exp('x') => error + +lg10(500) => 2.70 +lg10('x') => error + +ln(10) => 2.30 +ln('x') => error + +sqrt(500) => 22.36 +sqrt('x') => error + +abs(-10.5) => 10.50 +abs(-55.79) => 55.79 +abs(22) => 22.00 +abs(0) => 0.00 + +mod(55.5, 2) => 1.50 +mod(-55.5, 2) => -1.50 +mod(55.5, -2) => 1.50 +mod(-55.5, -2) => -1.50 +mod('a', 2) => error +mod(2, 'a') => error +mod('a', 'b') => error + +mod10(55.5) => 5.50 +mod10(-55.5) => -5.50 +mod10('x') => error + +rnd(5.4) => 5.00 +rnd(5.6) => 6.00 +rnd(-5.4) => -5.00 +rnd(-5.6) => -6.00 +rnd('x') => error + +trunc(1.2) => 1.00 +trunc(1.9) => 1.00 +trunc(-1.2) => -1.00 +trunc(-1.9) => -1.00 +trunc('x') => error + +acos(.5) / 3.14159 * 180 => 60.00 +arcos(.75) / 3.14159 * 180 => 41.41 +arcos(-.5) / 3.14159 * 180 => 120.00 +acos(-.75) / 3.14159 * 180 => 138.59 +acos(-1) / 3.14159 * 180 => 180.00 +arcos(1) / 3.14159 * 180 => 0.00 +acos(-1.01) => sysmis +arcos(1.01) => sysmis +acos('x') => error + +arsin(.5) / 3.14159 * 180 => 30.00 +asin(.25) / 3.14159 * 180 => 14.48 +arsin(-.5) / 3.14159 * 180 => -30.00 +asin(-.25) / 3.14159 * 180 => -14.48 +arsin(-1.01) => sysmis +asin(1.01) => sysmis +arsin('x') => error + +artan(1) / 3.14159 * 180 => 45.00 +atan(10) / 3.14159 * 180 => 84.29 +artan(-1) / 3.14159 * 180 => -45.00 +atan(-10) / 3.14159 * 180 => -84.29 +artan('x') => error + +cos(60 / 180 * 3.14159) => 0.50 +cos(45 / 180 * 3.14159) => 0.71 +cos(30 / 180 * 3.14159) => 0.87 +cos(15 / 180 * 3.14159) => 0.97 +cos(-60 / 180 * 3.14159) => 0.50 +cos(-45 / 180 * 3.14159) => 0.71 +cos(-30 / 180 * 3.14159) => 0.87 +cos(-15 / 180 * 3.14159) => 0.97 +cos(123 / 180 * 3.14159) => -0.54 +cos(321 / 180 * 3.14159) => 0.78 +cos('x') => error + +sin(60 / 180 * 3.14159) => 0.87 +sin(45 / 180 * 3.14159) => 0.71 +sin(30 / 180 * 3.14159) => 0.50 +sin(15 / 180 * 3.14159) => 0.26 +sin(-60 / 180 * 3.14159) => -0.87 +sin(-45 / 180 * 3.14159) => -0.71 +sin(-30 / 180 * 3.14159) => -0.50 +sin(-15 / 180 * 3.14159) => -0.26 +sin(123 / 180 * 3.14159) => 0.84 +sin(321 / 180 * 3.14159) => -0.63 +sin('x') => error + +tan(60 / 180 * 3.14159) => 1.73 +tan(45 / 180 * 3.14159) => 1.00 +tan(30 / 180 * 3.14159) => 0.58 +tan(15 / 180 * 3.14159) => 0.27 +tan(-60 / 180 * 3.14159) => -1.73 +tan(-45 / 180 * 3.14159) => -1.00 +tan(-30 / 180 * 3.14159) => -0.58 +tan(-15 / 180 * 3.14159) => -0.27 +tan(123 / 180 * 3.14159) => -1.54 +tan(321 / 180 * 3.14159) => -0.81 +tan('x') => error + +# FIXME: a variable name as the argument to SYSMIS is a special case +# that we don't yet test. We also can't test VALUE this way. +missing(10) => false +missing($sysmis) => true +missing(asin(1.01)) => true +missing(asin(.5)) => false +missing(' ') => error +nmiss($sysmis) => 1.00 +nmiss(0) => 0.00 +nmiss($sysmis, $sysmis, $sysmis) => 3.00 +nmiss(1, 2, 3, 4) => 0.00 +nmiss(1, $sysmis, $sysmis, 2, 2, $sysmis, $sysmis, 3, 4) => 4.00 +nvalid($sysmis) => 0.00 +nvalid(0) => 1.00 +nvalid($sysmis, $sysmis, $sysmis) => 0.00 +nvalid(1, 2, 3, 4) => 4.00 +nvalid(1, $sysmis, $sysmis, 2, 2, $sysmis, $sysmis, 3, 4) => 5.00 +sysmis(10) => false +sysmis($sysmis) => true +sysmis(asin(1.01)) => true +sysmis(asin(.5)) => false +sysmis(' ') => error + +any($sysmis, 1, $sysmis, 3) => sysmis +any(1, 1, 2, 3) => true +any(2, 1, 2, 3) => true +any(3, 1, 2, 3) => true +any(5, 1, 2, 3) => false +any(1, 1, 1, 1) => true +any($sysmis, 1, 1, 1) => sysmis +any(1, $sysmis, $sysmis, $sysmis) => sysmis +any($sysmis, $sysmis, $sysmis, $sysmis) => sysmis +any(1) => error +any('1', 2, 3, 4) => error +any(1, '2', 3, 4) => error +any(1, 2, '3', 4) => error +any(1, 2, 3, '4') => error + +any('', 'a', '', 'c') => true +any('a', 'a', 'b', 'c') => true +any('b', 'a', 'b', 'c') => true +any('c', 'a', 'b', 'c') => true +any('e', 'a', 'b', 'c') => false +any('a', 'a', 'a', 'a') => true +any('', 'a', 'a', 'a') => false +any('a', '', '', '') => false +any('a') => error +any('a', 'a ', 'b', 'c') => true +any('b ', 'a', 'b', 'c') => true +any('c ', 'a', 'b', 'c ') => true +any(a, 'b', 'c', 'd') => error +any('a', b, 'c', 'd') => error +any('a', 'b', c, 'd') => error +any('a', 'b', 'c', d) => error + +range(5, 1, 10) => true +range(1, 1, 10) => true +range(10, 1, 10) => true +range(-1, 1, 10) => false +range(12, 1, 10) => false +range($sysmis, 1, 10) => sysmis +range(5, 1, $sysmis) => sysmis +range(5, $sysmis, 10) => sysmis +range($sysmis, $sysmis, 10) => sysmis +range($sysmis, 1, $sysmis) => sysmis +range($sysmis, $sysmis, $sysmis) => sysmis +range(0, 1, 8, 10, 18) => false +range(1, 1, 8, 10, 18) => true +range(6, 1, 8, 10, 18) => true +range(8, 1, 8, 10, 18) => true +range(9, 1, 8, 10, 18) => false +range(10, 1, 8, 10, 18) => true +range(13, 1, 8, 10, 18) => true +range(16, 1, 8, 10, 18) => true +range(18, 1, 8, 10, 18) => true +range(20, 1, 8, 10, 18) => false +range(1) => error +range(1, 2) => error +range(1, 2, 3, 4) => error +range(1, 2, 3, 4, 5, 6) => error +range('1', 2, 3) => error +range(1, '2', 3) => error +range(1, 2, '3') => error + +range('123', '111', '888') => true +range('111', '111', '888') => true +range('888', '111', '888') => true +range('110', '111', '888') => false +range('889', '111', '888') => false +range('000', '111', '888') => false +range('999', '111', '888') => false +range('123 ', '111', '888') => true +range('123', '111 ', '888') => true +range('123', '111', '888 ') => true +range('123', '111 ', '888 ') => true +range('00', '01', '08', '10', '18') => false +range('01', '01', '08', '10', '18') => true +range('06', '01', '08', '10', '18') => true +range('08', '01', '08', '10', '18') => true +range('09', '01', '08', '10', '18') => false +range('10', '01', '08', '10', '18') => true +range('15', '01', '08', '10', '18') => true +range('18', '01', '08', '10', '18') => true +range('19', '01', '08', '10', '18') => false +range('1') => error +range('1', '2') => error +range('1', '2', '3', '4') => error +range('1', '2', '3', '4', '5', '6') => error +range(1, '2', '3') => error +range('1', 2, '3') => error +range('1', '2', 3) => error + +cfvar(1, 2, 3, 4, 5) => 0.53 +cfvar(1, $sysmis, 2, 3, $sysmis, 4, 5) => 0.53 +cfvar(1, 2) => 0.47 +cfvar(1) => error +cfvar(1, $sysmis) => sysmis +cfvar(1, 2, 3, $sysmis) => 0.50 +cfvar.4(1, 2, 3, $sysmis) => sysmis +cfvar.4(1, 2, 3) => error +cfvar('x') => error +cfvar('x', 1, 2, 3) => error + +max(1, 2, 3, 4, 5) => 5.00 +max(1, $sysmis, 2, 3, $sysmis, 4, 5) => 5.00 +max(1, 2) => 2.00 +max() => error +max(1) => 1.00 +max(1, $sysmis) => 1.00 +max(1, 2, 3, $sysmis) => 3.00 +max.4(1, 2, 3, $sysmis) => sysmis +max.4(1, 2, 3) => error + +max("2", "3", "5", "1", "4") => "5" +max("1", "2") => "2" +max("1") => "1" + +mean(1, 2, 3, 4, 5) => 3.00 +mean(1, $sysmis, 2, 3, $sysmis, 4, 5) => 3.00 +mean(1, 2) => 1.50 +mean() => error +mean(1) => 1.00 +mean(1, $sysmis) => 1.00 +mean(1, 2, 3, $sysmis) => 2.00 +mean.4(1, 2, 3, $sysmis) => sysmis +mean.4(1, 2, 3) => error + +min(1, 2, 3, 4, 5) => 1.00 +min(1, $sysmis, 2, 3, $sysmis, 4, 5) => 1.00 +min(1, 2) => 1.00 +min() => error +min(1) => 1.00 +min(1, $sysmis) => 1.00 +min(1, 2, 3, $sysmis) => 1.00 +min.4(1, 2, 3, $sysmis) => sysmis +min.4(1, 2, 3) => error + +min("2", "3", "5", "1", "4") => "1" +min("1", "2") => "1" +min("1") => "1" + +sd(1, 2, 3, 4, 5) => 1.58 +sd(1, $sysmis, 2, 3, $sysmis, 4, 5) => 1.58 +sd(1, 2) => 0.71 +sd(1) => error +sd(1, $sysmis) => sysmis +sd(1, 2, 3, $sysmis) => 1.00 +sd.4(1, 2, 3, $sysmis) => sysmis +sd.4(1, 2, 3) => error +sd('x') => error +sd('x', 1, 2, 3) => error + +sum(1, 2, 3, 4, 5) => 15.00 +sum(1, $sysmis, 2, 3, $sysmis, 4, 5) => 15.00 +sum(1, 2) => 3.00 +sum() => error +sum(1) => 1.00 +sum(1, $sysmis) => 1.00 +sum(1, 2, 3, $sysmis) => 6.00 +sum.4(1, 2, 3, $sysmis) => sysmis +sum.4(1, 2, 3) => error + +variance(1, 2, 3, 4, 5) => 2.50 +variance(1, $sysmis, 2, 3, $sysmis, 4, 5) => 2.50 +variance(1, 2) => 0.50 +variance(1) => error +variance(1, $sysmis) => sysmis +variance(1, 2, 3, $sysmis) => 1.00 +variance.4(1, 2, 3, $sysmis) => sysmis +variance.4(1, 2, 3) => error +variance('x') => error +variance('x', 1, 2, 3) => error + +concat('') => "" +concat('a', 'b') => "ab" +concat('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h') => "abcdefgh" +concat('abcdefgh', 'ijklmnopq') => "abcdefghijklmnopq" +concat('a', 1) => error +concat(1, 2) => error + +index('abcbcde', 'bc') => 2.00 +index('abcbcde', 'bcd') => 4.00 +index('abcbcde', 'bcbc') => 2.00 +index('abcdefgh', 'abc') => 1.00 +index('abcdefgh', 'bcd') => 2.00 +index('abcdefgh', 'cde') => 3.00 +index('abcdefgh', 'def') => 4.00 +index('abcdefgh', 'efg') => 5.00 +index('abcdefgh', 'fgh') => 6.00 +index('abcdefgh', 'fghi') => 0.00 +index('abcdefgh', 'x') => 0.00 +index('abcdefgh', 'abch') => 0.00 +index('banana', 'na') => 3.00 +index('banana', 'ana') => 2.00 +index('', 'x') => 0.00 +index('', '') => sysmis +index('abcdefgh', '') => sysmis +index('abcdefgh', 'alkjsfdjlskalkjfa') => 0.00 + +index('abcbcde', 'bc', 1) => 2.00 +index('abcbcde', 'dc', 1) => 3.00 +index('abcbcde', 'abc', 1) => 1.00 +index('abcbcde', 'bc', 2) => 2.00 +index('abcbcde', 'dc', 2) => 0.00 +index('abcbcde', 'abc', 1) => 1.00 +index('abcbcde', 'bccb', 2) => 2.00 +index('abcbcde', 'bcbc', 2) => 2.00 +index('abcbcde', 'bcbc', $sysmis) => sysmis + +rindex('abcbcde', 'bc') => 4.00 +rindex('abcbcde', 'bcd') => 4.00 +rindex('abcbcde', 'bcbc') => 2.00 +rindex('abcdefgh', 'abc') => 1.00 +rindex('abcdefgh', 'bcd') => 2.00 +rindex('abcdefgh', 'cde') => 3.00 +rindex('abcdefgh', 'def') => 4.00 +rindex('abcdefgh', 'efg') => 5.00 +rindex('abcdefgh', 'fgh') => 6.00 +rindex('abcdefgh', 'fghi') => 0.00 +rindex('abcdefgh', 'x') => 0.00 +rindex('abcdefgh', 'abch') => 0.00 +rindex('banana', 'na') => 5.00 +rindex('banana', 'ana') => 4.00 +rindex('', 'x') => 0.00 +rindex('', '') => sysmis +rindex('abcdefgh', '') => sysmis +rindex('abcdefgh', 'alkjsfdjlskalkjfa') => 0.00 + +rindex('abcbcde', 'bc', 1) => 5.00 +rindex('abcbcde', 'dc', 1) => 6.00 +rindex('abcbcde', 'abc', 1) => 5.00 +rindex('abcbcde', 'bc', 2) => 4.00 +rindex('abcbcde', 'dc', 2) => 0.00 +rindex('abcbcde', 'abc', 1) => 5.00 +rindex('abcbcde', 'bccb', 2) => 4.00 +rindex('abcbcde', 'bcbc', 2) => 4.00 +rindex('abcbcde', 'bcbc', $sysmis) => sysmis +rindex('abcbcde', 'bcbcg', 2) => sysmis +rindex('abcbcde', 'bcbcg', $sysmis) => sysmis +rindex('abcbcde', 'bcbcg', 'x') => error +rindex(1, 'bcdfkjl', 2) => error +rindex('aksj', 2, 2) => error +rindex(1, 2, 3) => error +rindex(1, 2, '3') => error + +length('') => 0.00 +length('a') => 1.00 +length('xy') => 2.00 +length('adsf ') => 8.00 +length('abcdefghijkl') => 12.00 +length(0) => error +length($sysmis) => error + +lower('ABCDEFGHIJKLMNOPQRSTUVWXYZ!@%&*(089') => "abcdefghijklmnopqrstuvwxyz!@%&*(089" +lower('') => "" +lower(1) => error + +lpad('abc', -1) => "" +lpad('abc', 0) => "abc" +lpad('abc', 2) => "abc" +lpad('abc', 3) => "abc" +lpad('abc', 10) => " abc" +lpad('abc', 256) => "" +lpad('abc', $sysmis) => "" +lpad('abc', -1, '*') => "" +lpad('abc', 0, '*') => "abc" +lpad('abc', 2, '*') => "abc" +lpad('abc', 3, '*') => "abc" +lpad('abc', 10, '*') => "*******abc" +lpad('abc', 256, '*') => "" +lpad('abc', $sysmis, '*') => "" +lpad('abc', $sysmis, '') => "" +lpad('abc', $sysmis, 'xy') => "" +lpad(0, 10) => error +lpad('abc', 'def') => error +lpad(0, 10, ' ') => error +lpad('abc', 'def', ' ') => error +lpad('x', 5, 0) => error +lpad('x', 5, 2) => error + +number("123", f3.0) => 123.00 +number(" 123", f3.0) => 12.00 +number("123", f3.1) => 12.30 +number(" ", f3.1) => sysmis + +ltrim(' abc') => "abc" +rtrim(' abc ') => " abc" +ltrim('abc') => "abc" +ltrim(' abc') => " abc" +ltrim(' ') => "" +ltrim('') => "" +ltrim(8) => error +ltrim('***abc', '*') => "abc" +ltrim('abc', '*') => "abc" +ltrim('*abc', '*') => "abc" +ltrim('', '*') => "" +ltrim(8, '*') => error +ltrim(' x', 8) => error +ltrim(8, 9) => error + +rpad('abc', -1) => "" +rpad('abc', 0) => "abc" +rpad('abc', 2) => "abc" +rpad('abc', 3) => "abc" +rpad('abc', 10) => "abc " +rpad('abc', 256) => "" +rpad('abc', $sysmis) => "" +rpad('abc', -1, '*') => "" +rpad('abc', 0, '*') => "abc" +rpad('abc', 2, '*') => "abc" +rpad('abc', 3, '*') => "abc" +rpad('abc', 10, '*') => "abc*******" +rpad('abc', 256, '*') => "" +rpad('abc', $sysmis, '*') => "" +rpad('abc', $sysmis, '') => "" +rpad('abc', $sysmis, 'xy') => "" +rpad(0, 10) => error +rpad('abc', 'def') => error +rpad(0, 10, ' ') => error +rpad('abc', 'def', ' ') => error +rpad('x', 5, 0) => error +rpad('x', 5, 2) => error + +rtrim('abc ') => "abc" +rtrim(' abc ') => " abc" +rtrim('abc') => "abc" +rtrim('abc ') => "abc " +rtrim(' ') => "" +rtrim('') => "" +rtrim(8) => error +rtrim('abc***', '*') => "abc" +rtrim('abc', '*') => "abc" +rtrim('abc*', '*') => "abc" +rtrim('', '*') => "" +rtrim(8, '*') => error +rtrim(' x', 8) => error +rtrim(8, 9) => error + +string(123.56, f5.1) => "123.6" +string($sysmis, f5.1) => " . " +string("abc", A5) => error + +substr('abcdefgh', -5) => "" +substr('abcdefgh', 0) => "" +substr('abcdefgh', 1) => "abcdefgh" +substr('abcdefgh', 3) => "cdefgh" +substr('abcdefgh', 5) => "efgh" +substr('abcdefgh', 6) => "fgh" +substr('abcdefgh', 7) => "gh" +substr('abcdefgh', 8) => "h" +substr('abcdefgh', 9) => "" +substr('abcdefgh', 10) => "" +substr('abcdefgh', 20) => "" +substr('abcdefgh', $sysmis) => "" +substr(0, 10) => error +substr('abcd', 'abc') => error +substr(0, 'abc') => error + +substr('abcdefgh', 0, 0) => "" +substr('abcdefgh', 3, 0) => "" +substr('abcdefgh', 5, 0) => "" +substr('abcdefgh', 9, 0) => "" +substr('abcdefgh', 0, 1) => "" +substr('abcdefgh', 0, 5) => "" +substr('abcdefgh', 1, 8) => "abcdefgh" +substr('abcdefgh', 1, 10) => "abcdefgh" +substr('abcdefgh', 1, 20) => "abcdefgh" +substr('abcdefgh', 3, 4) => "cdef" +substr('abcdefgh', 5, 2) => "ef" +substr('abcdefgh', 6, 1) => "f" +substr('abcdefgh', 7, 10) => "gh" +substr('abcdefgh', 8, 1) => "h" +substr('abcdefgh', 8, 2) => "h" +substr('abcdefgh', 9, 11) => "" +substr('abcdefgh', 10, 52) => "" +substr('abcdefgh', 20, 1) => "" +substr('abcdefgh', $sysmis, 2) => "" +substr('abcdefgh', 9, $sysmis) => "" +substr('abcdefgh', $sysmis, $sysmis) => "" +substr('abc', 1, 'x') => error +substr(0, 10, 1) => error +substr(0, 10, 'x') => error +substr('abcd', 'abc', 0) => error +substr('abcd', 'abc', 'j') => error +substr(0, 'abc', 4) => error +substr(0, 'abc', 'k') => error + +upcase('abcdefghijklmnopqrstuvwxyz!@%&*(089') => "ABCDEFGHIJKLMNOPQRSTUVWXYZ!@%&*(089" +upcase('') => "" +upcase(1) => error + +time.days(1) => 86400.00 +time.days(-1) => -86400.00 +time.days(0.5) => 43200.00 +time.days('x') => error +time.days($sysmis) => sysmis + +time.hms(5, 6, 7) => 18367.00 +time.hms(5, 6, 0) => 18360.00 +time.hms(5, 0, 7) => 18007.00 +time.hms(0, 6, 7) => 367.00 +time.hms(-5, 6, -7) => sysmis +time.hms(-5, 5, -7) => sysmis +time.hms($sysmis, 6, 7) => sysmis +time.hms(5, $sysmis, 7) => sysmis +time.hms(5, $sysmis, 7) => sysmis +time.hms($sysmis, $sysmis, 7) => sysmis +time.hms(5, $sysmis, $sysmis) => sysmis +time.hms($sysmis, $sysmis, 7) => sysmis +time.hms($sysmis, $sysmis, $sysmis) => sysmis + +ctime.days(106272) => 1.23 +ctime.hours(106272) => 29.52 +ctime.minutes(106272) => 1771.20 +ctime.seconds(106272) => 106272.00 +ctime.days(-106272) => -1.23 +ctime.hours(-106272) => -29.52 +ctime.minutes(-106272) => -1771.20 +ctime.seconds(-106272) => -106272.00 +ctime.days($sysmis) => sysmis +ctime.hours($sysmis) => sysmis +ctime.minutes($sysmis) => sysmis +ctime.seconds($sysmis) => sysmis +ctime.days('a') => error +ctime.hours('b') => error +ctime.minutes('c') => error +ctime.seconds('d') => error + +ctime.days(date.dmy(15,10,1582)) => 1.00 +ctime.days(date.dmy(6,9,1719)) => 50000.00 +ctime.days(date.dmy(24,1,1583)) => 102.00 +ctime.days(date.dmy(14,12,1585)) => 1157.00 +ctime.days(date.dmy(26,11,1621)) => 14288.00 +ctime.days(date.dmy(25,12,1821)) => 87365.00 +ctime.days(date.dmy(3,12,1882)) => 109623.00 +ctime.days(date.dmy(6,4,2002)) => 153211.00 +ctime.days(date.dmy(19,12,1999)) => 152372.00 +ctime.days(date.dmy(1,10,1978)) => 144623.00 +ctime.days(date.dmy(0,10,1978)) => 144622.00 +ctime.days(date.dmy(32,10,1978)) => sysmis +ctime.days(date.dmy(31,0,1978)) => 144349.00 +ctime.days(date.dmy(31,13,1978)) => 144745.00 +ctime.days(date.dmy($sysmis,10,1978)) => sysmis +ctime.days(date.dmy(31,$sysmis,1978)) => sysmis +ctime.days(date.dmy(31,10,$sysmis)) => sysmis +ctime.days(date.dmy($sysmis,$sysmis,1978)) => sysmis +ctime.days(date.dmy(31,$sysmis,$sysmis)) => sysmis +ctime.days(date.dmy($sysmis,10,$sysmis)) => sysmis +ctime.days(date.dmy($sysmis,$sysmis,$sysmis)) => sysmis +date.dmy('a',1,2) => error +date.dmy(1,'a',2) => error +date.dmy(1,2,'a') => error + +ctime.days(date.mdy(10,15,1582)) => 1.00 +ctime.days(date.mdy(9,6,1719)) => 50000.00 +ctime.days(date.mdy(1,24,1583)) => 102.00 +ctime.days(date.mdy(12,14,1585)) => 1157.00 +ctime.days(date.mdy(11,26,1621)) => 14288.00 +ctime.days(date.mdy(12,25,1821)) => 87365.00 +ctime.days(date.mdy(12,3,1882)) => 109623.00 +ctime.days(date.mdy(4,6,2002)) => 153211.00 +ctime.days(date.mdy(12,19,1999)) => 152372.00 +ctime.days(date.mdy(10,1,1978)) => 144623.00 +ctime.days(date.mdy(10,0,1978)) => 144622.00 +ctime.days(date.mdy(10,32,1978)) => sysmis +ctime.days(date.mdy(0,31,1978)) => 144349.00 +ctime.days(date.mdy(13,31,1978)) => 144745.00 +ctime.days(date.mdy(10,$sysmis,1978)) => sysmis +ctime.days(date.mdy($sysmis,31,1978)) => sysmis +ctime.days(date.mdy(10,31,$sysmis)) => sysmis +ctime.days(date.mdy($sysmis,$sysmis,1978)) => sysmis +ctime.days(date.mdy($sysmis,31,$sysmis)) => sysmis +ctime.days(date.mdy(10,$sysmis,$sysmis)) => sysmis +ctime.days(date.mdy($sysmis,$sysmis,$sysmis)) => sysmis +date.mdy('a',1,2) => error +date.mdy(1,'a',2) => error +date.mdy(1,2,'a') => error + +ctime.days(date.moyr(1,2000)) => 152385.00 +ctime.days(date.moyr(2,2000)) => 152416.00 +ctime.days(date.moyr(3,2000)) => 152445.00 +ctime.days(date.moyr(4,2000)) => 152476.00 +ctime.days(date.moyr(5,2000)) => 152506.00 +ctime.days(date.moyr(13,2000)) => 152751.00 +ctime.days(date.moyr(14,2000)) => sysmis +ctime.days(date.moyr($sysmis,2000)) => sysmis +ctime.days(date.moyr(1,$sysmis)) => sysmis +ctime.days(date.moyr($sysmis,$sysmis)) => sysmis +date.moyr('a',2000) => error +date.moyr(5,'a') => error +date.moyr('a','b') => error + +ctime.days(date.qyr(1,2000)) => 152385.00 +ctime.days(date.qyr(2,2000)) => 152476.00 +ctime.days(date.qyr(5,2000)) => 152751.00 +ctime.days(date.qyr(6,2000)) => sysmis +ctime.days(date.qyr($sysmis,2000)) => sysmis +ctime.days(date.qyr(1,$sysmis)) => sysmis +ctime.days(date.qyr($sysmis,$sysmis)) => sysmis +date.qyr('a',2000) => error +date.qyr(5,'a') => error +date.qyr('a','b') => error + +ctime.days(date.wkyr(1,2000)) => 152385.00 +ctime.days(date.wkyr(15,1999)) => 152118.00 +ctime.days(date.wkyr(36,1999)) => 152265.00 +ctime.days(date.wkyr(54,1999)) => sysmis +ctime.days(date.wkyr($sysmis,1999)) => sysmis +ctime.days(date.wkyr(1,$sysmis)) => sysmis +ctime.days(date.wkyr($sysmis,$sysmis)) => sysmis +date.wkyr('a',1999) => error +date.wkyr(5,'a') => error +date.wkyr('a','b') => error + +ctime.days(date.yrday(2000,1)) => 152385.00 +ctime.days(date.yrday(2000,100)) => 152484.00 +ctime.days(date.yrday(2000,253)) => 152637.00 +ctime.days(date.yrday(2000,500)) => 152884.00 +ctime.days(date.yrday(2000,-100)) => 152284.00 +ctime.days(date.yrday(1999,$sysmis)) => sysmis +ctime.days(date.yrday($sysmis,1)) => sysmis +ctime.days(date.yrday($sysmis,$sysmis)) => sysmis +date.yrday(1999,'a') => error +date.yrday('a',5) => error +date.yrday('a','b') => error + +# FIXME: XDATE.* functions +# FIXME: LAG +# FIXME: YRMODA +EOF +if [ $? -ne 0 ] ; then no_result ; fi + +activity="create optimizing input" +echo 'set mxwarn 1000. +set mxerr 1000.' > $TEMPDIR/expr-opt.stat +sed < $TEMPDIR/expr-list >> $TEMPDIR/expr-opt.stat \ + -e 's#^\(.*\) => \(.*\)$#DEBUG EVALUATE/\1.#' +if [ $? -ne 0 ] ; then no_result ; fi + +activity="run optimizing program" +$SUPERVISOR $here/../src/pspp --testing-mode -o raw-ascii \ + $TEMPDIR/expr-opt.stat >$TEMPDIR/expr-opt.err 2> $TEMPDIR/expr-opt.out + +activity="compare optimizing output" +diff -B -b $TEMPDIR/expr-list $TEMPDIR/expr-opt.out +if [ $? -ne 0 ] ; then fail ; fi + +activity="create non-optimizing input" +echo 'set mxwarn 1000. +set mxerr 1000.' > $TEMPDIR/expr-noopt.stat +sed < $TEMPDIR/expr-list >> $TEMPDIR/expr-noopt.stat \ + -e 's#^\(.*\) => \(.*\)$#DEBUG EVALUATE NOOPTIMIZE/\1.#' +if [ $? -ne 0 ] ; then no_result ; fi + +activity="run non-optimizing program" +$SUPERVISOR $here/../src/pspp --testing-mode -o raw-ascii \ + $TEMPDIR/expr-noopt.stat >$TEMPDIR/expr-noopt.err 2> $TEMPDIR/expr-noopt.out + +activity="compare non-optimizing output" +diff -B -b $TEMPDIR/expr-list $TEMPDIR/expr-noopt.out +if [ $? -ne 0 ] ; then fail ; fi + +pass