From: Ben Pfaff
Date: Sun, 20 Mar 2011 00:05:47 +0000 (-0700)
Subject: lexer: Reimplement for better testability and internationalization.
X-Git-Tag: v0.7.7~16
X-Git-Url: https://pintos-os.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=9ade26c8349;p=pspp-builds.git

lexer: Reimplement for better testability and internationalization.

This commit reimplements PSPP lexical analysis from the ground up.
From a PSPP user's perspective, this should make PSPP more reliable
and make it easier to work with syntax files in non-ASCII encodings.
See the changes to NEWS for more details.

From a developer's perspective, the most visible change may be that
strings within tokens are now always encoded in UTF-8, regardless of
the syntax file's encoding. Many of the changes in this commit are
due to this, especially those to functions that check for valid
identifiers: an identifier in UTF-8 is not necessarily the same length
when encoded in the dictionary's encoding, but limits on identifier
length must be enforced in the dictionary's encoding (otherwise it
might not be possible to write out a valid system file, since the
identifier might not fit in the fixed length fields in such files).

Another important change is that, whereas before some special syntax
had to be handled by the parser providing feedback to the lexer, now
increasing the sophistication of the lexer has enabled all PSPP syntax
to be analyzed into tokens. This permitted some other improvements:

- An arbitrary number of tokens of lookahead, up to the end of the
  current command, is now supported using lex_next_token() and
  related functions.

- Before, some command implementations had a special attribute that
  meant that the top-level PSPP command parser would not consume the
  final token of the command name (because that token was not
  followed by tokenizable syntax). This is no longer necessary and
  has been removed.

- Before, each command implementation was responsible for ensuring that valid command syntax was not followed by trailing garbage, often by calling lex_end_of_command() as the last step of parsing. This is no longer necessary; the main command parser will ensure this for itself. --- diff --git a/NEWS b/NEWS index a9606848..b4bb63f7 100644 --- a/NEWS +++ b/NEWS @@ -1,12 +1,54 @@ PSPP NEWS -- history of user-visible changes. -Time-stamp: <2010-11-21 11:58:30 blp> -Copyright (C) 1996-9, 2000, 2008, 2009, 2010 Free Software Foundation, Inc. +Time-stamp: <2011-03-19 16:39:28 blp> +Copyright (C) 1996-9, 2000, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. See the end for copying conditions. Please send PSPP bug reports to bug-gnu-pspp@gnu.org. Changes from 0.7.3 to 0.7.6: + * The "pspp" program has a new option --batch (or -b) that selects + "batch" syntax mode. In previous versions of PSPP this syntax mode + was the default. Now a new "auto" syntax mode is the default. In + "auto" mode, PSPP interprets most syntax files correctly regardless + of their intended syntax mode. + + See the "Syntax Variants" section in the PSPP manual for more + information. + + * The "pspp" program has a new option --syntax-encoding that + specifies the encoding for syntax files listed on the command line, + as well as the default encoding for syntax files included with + INCLUDE or INSERT. The default is to accept the system locale + encoding, UTF-8, UTF-16, or UTF-32, automatically detecting which + one the system file uses. + + See the documentation for the INSERT command in the PSPP manual for + more information. + + * The INCLUDE and INSERT commands now support the ENCODING subcommand + to specify the encoding for the included syntax file. + + * Strings may now include arbitrary Unicode code points specified in + hexadecimal, using the syntax U'hhhh'. For example, Unicode code + point U+1D11E, the musical G clef character, may be expressed as + U'1D11E'. 
+ + See the "Tokens" section in the PSPP manual for more information. + + * In previous versions of PSPP, in a string expressed in hexadecimal + with X'hh' syntax, the hexadecimal digits expressed bytes in the + locale encoding. In this version of PSPP, X'hh' syntax always + expresses bytes in UTF-8 encoding. + + See the "Tokens" section in the PSPP manual for more information. + + * The DO REPEAT command has been reimplemented. The most prominent + change is that when a DO REPEAT block contains an INCLUDE or INSERT + command, substitutions are not applied to the included file. + + See the "DO REPEAT" section in the PSPP manual for more information. + * NPAR TESTS now supports the /KRUSKAL-WALLIS and /RUNS subcommands. * AUTORECODE now supports the /GROUP subcommand. diff --git a/Smake b/Smake index 683a8e38..a1a81651 100644 --- a/Smake +++ b/Smake @@ -49,6 +49,7 @@ GNULIB_MODULES = \ printf-posix \ printf-safe \ progname \ + rawmemchr \ read-file \ regex \ relocatable-prog \ @@ -80,6 +81,7 @@ GNULIB_MODULES = \ unistr/u8-cpy \ unistr/u8-mbtouc \ unistr/u8-strlen \ + unistr/u8-strmbtouc \ unistr/u8-strncat \ uniwidth/u8-strwidth \ unitypes \ diff --git a/doc/dev/concepts.texi b/doc/dev/concepts.texi index 24c16541..053d2521 100644 --- a/doc/dev/concepts.texi +++ b/doc/dev/concepts.texi @@ -1220,12 +1220,12 @@ The following sections describe variable-related functions and macros. @node Variable Name @subsection Variable Name -A variable name is a string between 1 and @code{VAR_NAME_LEN} bytes +A variable name is a string between 1 and @code{ID_MAX_LEN} bytes long that satisfies the rules for PSPP identifiers (@pxref{Tokens,,,pspp, PSPP Users Guide}). Variable names are mixed-case and treated case-insensitively. -@deftypefn Macro int VAR_NAME_LEN +@deftypefn Macro int ID_MAX_LEN Maximum length of a variable name, in bytes, currently 64. @end deftypefn @@ -1248,23 +1248,6 @@ dictionary. Use @func{dict_rename_var} instead (@pxref{Dictionary Renaming Variables}). 
@end deftypefun -@anchor{var_is_plausible_name} -@deftypefun {bool} var_is_valid_name (const char *@var{name}, bool @var{issue_error}) -@deftypefunx {bool} var_is_plausible_name (const char *@var{name}, bool @var{issue_error}) -Tests @var{name} for validity or ``plausibility.'' Returns true if -the name is acceptable, false otherwise. If the name is not -acceptable and @var{issue_error} is true, also issues an error message -explaining the violation. - -A valid name is one that fully satisfies all of the requirements for -variable names (@pxref{Tokens,,,pspp, PSPP Users Guide}). A -``plausible'' name is simply a string whose length is in the valid -range and that is not a reserved word. PSPP accepts plausible but -invalid names as variable names in some contexts where the character -encoding scheme is ambiguous, as when reading variable names from -system files. -@end deftypefun - @deftypefun {enum dict_class} var_get_dict_class (const struct variable *@var{var}) Returns the dictionary class of @var{var}'s name (@pxref{Dictionary Class}). @@ -1764,7 +1747,7 @@ To delete a variable from a dictionary and destroy it, use @node Variable Short Names @subsection Variable Short Names -PSPP variable names may be up to 64 (@code{VAR_NAME_LEN}) bytes long. +PSPP variable names may be up to 64 (@code{ID_MAX_LEN}) bytes long. The system and portable file formats, however, were designed when variable names were limited to 8 bytes in length. Since then, the system file format has been augmented with an extension record that @@ -1829,7 +1812,7 @@ been assigned a short name. Sets @var{var}'s short name to @var{short_name}, or removes @var{var}'s short name if @var{short_name} is a null pointer. If it is non-null, then @var{short_name} must be a plausible name for a -variable (@pxref{var_is_plausible_name}). The name will be truncated +variable. The name will be truncated to 8 bytes in length and converted to all-uppercase. 
@end deftypefun diff --git a/doc/flow-control.texi b/doc/flow-control.texi index 868143b3..892887e2 100644 --- a/doc/flow-control.texi +++ b/doc/flow-control.texi @@ -72,6 +72,7 @@ expansion takes one of the following forms: var_list num_or_range@dots{} 'string'@dots{} + ALL num_or_range takes one of the following forms: number @@ -82,13 +83,11 @@ num_or_range takes one of the following forms: different variables, numbers, or strings into the block with each repetition. -Specify a dummy variable name followed by an equals sign (@samp{=}) and -the list of replacements. Replacements can be a list of variables -(which may be existing variables or new variables or some combination), -numbers, or strings. When new variable names are -specified, @cmd{DO REPEAT} creates them as numeric variables. When numbers -are specified, runs of increasing integers may be indicated as -@code{@var{num1} TO @var{num2}}, so that +Specify a dummy variable name followed by an equals sign (@samp{=}) +and the list of replacements. Replacements can be a list of existing +or new variables, numbers, strings, or @code{ALL} to specify all +existing variables. When numbers are specified, runs of increasing +integers may be indicated as @code{@var{num1} TO @var{num2}}, so that @samp{1 TO 5} is short for @samp{1 2 3 4 5}. Multiple dummy variables can be specified. Each @@ -100,10 +99,22 @@ each dummy variable is substituted; the second time, the second value for each dummy variable is substituted; and so on. Dummy variable substitutions work like macros. They take place -anywhere in a line that the dummy variable name occurs as a token, -including command and subcommand names. For this reason, -words commonly used in command and subcommand names should not be used -as dummy variable identifiers. +anywhere in a line that the dummy variable name occurs. 
This includes +command and subcommand names, so command and subcommand names that +appear in the code block should not be used as dummy variable +identifiers. Dummy variable substitutions do not occur inside quoted +strings, comments, unquoted strings (such as the text on the +@cmd{TITLE} or @cmd{DOCUMENT} command), or inside @cmd{BEGIN +DATA}@dots{}@cmd{END DATA}. + +New variable names used as replacements are not automatically created +as variables, but only if used in the code block in a context that +would create them, e.g.@: on a @cmd{NUMERIC} or @cmd{STRING} command +or on the left side of a @cmd{COMPUTE} assignment. + +Any command may appear within DO REPEAT, including nested DO REPEAT +commands. If @cmd{INCLUDE} or @cmd{INSERT} appears within DO REPEAT, +the substitutions do not apply to the included file. If PRINT is specified on @cmd{END REPEAT}, the commands after substitutions are made are printed to the listing file, prefixed by a plus sign diff --git a/doc/invoking.texi b/doc/invoking.texi index 826498a2..4c4f74aa 100644 --- a/doc/invoking.texi +++ b/doc/invoking.texi @@ -49,10 +49,12 @@ corresponding short options. @example -I, --include=@var{dir} -I-, --no-include +-b, --batch -i, --interactive -r, --no-statrc -a, --algorithm=@{compatible|enhanced@} -x, --syntax=@{compatible|enhanced@} +--syntax-encoding=@var{encoding} @end example @item Informational options @@ -135,11 +137,13 @@ inserted in the include path by default. The default include path is user's home directory, followed by PSPP's system configuration directory (usually @file{/etc/pspp} or @file{/usr/local/etc/pspp}). +@item -b +@item --batch @item -i @itemx --interactive -This option forces syntax files to be interpreted in interactive -mode, rather than the default batch mode. @xref{Syntax Variants}, for -a description of the differences. +These options forces syntax files to be interpreted in batch mode or +interactive mode, respectively, rather than the default ``auto'' mode. 
+@xref{Syntax Variants}, for a description of the differences. @item -r @itemx --no-statrc @@ -161,8 +165,14 @@ With @code{enhanced}, the default, PSPP accepts its own extensions beyond those compatible with the proprietary program SPSS. With @code{compatible}, PSPP rejects syntax that uses these extensions. -@item -? -@itemx --help +@item --syntax-encoding=@var{encoding} +Specifies @var{encoding} as the encoding for syntax files named on the +command line. The @var{encoding} also becomes the default encoding +for other syntax files read during the PSPP session by the +@cmd{INCLUDE} and @cmd{INSERT} commands. @xref{INSERT}, for the +accepted forms of @var{encoding}. + +@item --help Prints a message describing PSPP command-line syntax and the available device formats, then exits. diff --git a/doc/language.texi b/doc/language.texi index 78d38acd..5381668c 100644 --- a/doc/language.texi +++ b/doc/language.texi @@ -111,26 +111,29 @@ character used for quoting in the string, double it, e.g.@: significant inside strings. Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' + -'c'} is equivalent to @samp{'abc'}. Concatenation is useful for -splitting a single string across multiple source lines. - -Strings may also be expressed as hexadecimal, octal, or binary -character values by prefixing the initial quote character by @samp{X}, -@samp{O}, or @samp{B} or their lowercase equivalents. Each pair, -triplet, or octet of characters, according to the radix, is -transformed into a single character with the given value. If there is -an incomplete group of characters, the missing final digits are -assumed to be @samp{0}. These forms of strings are nonportable -because numeric values are associated with different characters by -different operating systems. Therefore, their use should be confined -to syntax files that will not be widely distributed. 
- -@cindex characters, reserved -@cindex 0 -@cindex white space -The character with value 00 is reserved for -internal use by PSPP. Its use in strings causes an error and -replacement by a space character. +'c'} is equivalent to @samp{'abc'}. So that a long string may be +broken across lines, a line break may precede or follow, or both +precede and follow, the @samp{+}. (However, an entirely blank line +preceding or following the @samp{+} is interpreted as ending the +current command.) + +Strings may also be expressed as hexadecimal character values by +prefixing the initial quote character by @samp{x} or @samp{X}. +Regardless of the syntax file or active dataset's encoding, the +hexadecimal digits in the string are interpreted as Unicode characters +in UTF-8 encoding. + +Individual Unicode code points may also be expressed by specifying the +hexadecimal code point number in single or double quotes preceded by +@samp{u} or @samp{U}. For example, Unicode code point U+1D11E, the +musical G clef character, could be expressed as @code{U'1D11E'}. +Invalid Unicode code points (above U+10FFFF or in between U+D800 and +U+DFFF) are not allowed. + +When strings are concatenated with @samp{+}, each segment's prefix is +considered individually. For example, @code{'The G clef symbol is:' + +u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise +plain text string. @item Punctuators and Operators @cindex punctuators @@ -177,33 +180,40 @@ described in the previous section (@pxref{Tokens}). A blank line, or one that consists only of white space or comments, also ends a command. @node Syntax Variants -@section Variants of syntax. +@section Syntax Variants @cindex Batch syntax @cindex Interactive syntax -There are two variants of command syntax, @i{viz}: @dfn{batch} mode and -@dfn{interactive} mode. -Batch mode is the default when reading commands from a file. -Interactive mode is the default when commands are typed at a prompt -by a user. 
-Certain commands, such as @cmd{INSERT} (@pxref{INSERT}), may explicitly -change the syntax mode. - -In batch mode, any line that contains a non-space character -in the leftmost column begins a new command. -Thus, each command consists of a flush-left line followed by any -number of lines indented from the left margin. -In this mode, a plus or minus sign (@samp{+}, @samp{@minus{}}) as the -first character in a line is ignored and causes that line to begin a -new command, which allows for visual indentation of a command without -that command being considered part of the previous command. -The period terminating the end of a command is optional but recommended. - -In interactive mode, each command must be terminated with a period -or by a blank line. -The use of @samp{+} and @samp{@minus{}} as continuation characters is not -permitted. +There are three variants of command syntax, which vary only in how +they detect the end of one command and the start of the next. + +In @dfn{interactive mode}, which is the default for syntax typed at a +command prompt, a period as the last non-blank character on a line +ends a command. A blank line also ends a command. + +In @dfn{batch mode}, an end-of-line period or a blank line also ends a +command. Additionally, it treats any line that has a non-blank +character in the leftmost column as beginning a new command. Thus, in +batch mode the second and subsequent lines in a command must be +indented. + +Regardless of the syntax mode, a plus sign, minus sign, or period in +the leftmost column of a line is ignored and causes that line to begin +a new command. This is most useful in batch mode, in which the first +line of a new command could not otherwise be indented, but it is +accepted regardless of syntax mode. + +The default mode for reading commands from a file is @dfn{auto mode}. 
+It is the same as batch mode, except that a line with a non-blank in +the leftmost column only starts a new command if that line begins with +the name of a PSPP command. This correctly interprets most valid PSPP +syntax files regardless of the syntax mode for which they are +intended. + +The @option{--interactive} (or @option{-i}) or @option{--batch} (or +@option{-b}) options set the syntax mode for files listed on the PSPP +command line. @xref{Main Options}, for more details. @node Types of Commands @section Types of Commands diff --git a/doc/utilities.texi b/doc/utilities.texi index b729b88b..2cf95a31 100644 --- a/doc/utilities.texi +++ b/doc/utilities.texi @@ -242,7 +242,7 @@ subshell. @vindex INCLUDE @display - INCLUDE [FILE=]'file-name'. + INCLUDE [FILE=]'file-name' [ENCODING='encoding']. @end display @cmd{INCLUDE} causes the PSPP command processor to read an @@ -253,19 +253,11 @@ stop and no more commands will be processed. Include files may be nested to any depth, up to the limit of available memory. +The @cmd{INSERT} command (@pxref{INSERT}) is a more flexible +alternative to @cmd{INCLUDE}. An INCLUDE command acts the same as +INSERT with ERROR=STOP CD=NO SYNTAX=BATCH specified. -The @cmd{INSERT} command (@pxref{INSERT}) may be used instead of -@cmd{INCLUDE} if you require more flexible options. -The syntax -@example -INCLUDE FILE=@var{file-name}. -@end example -@noindent -functions identically to -@example -INSERT FILE=@var{file-name} ERROR=STOP CD=NO SYNTAX=BATCH. -@end example - +The optional ENCODING subcommand has the same meaning as on INSERT. @node INSERT @section INSERT @@ -275,7 +267,8 @@ INSERT FILE=@var{file-name} ERROR=STOP CD=NO SYNTAX=BATCH. INSERT [FILE=]'file-name' [CD=@{NO,YES@}] [ERROR=@{CONTINUE,STOP@}] - [SYNTAX=@{BATCH,INTERACTIVE@}]. + [SYNTAX=@{BATCH,INTERACTIVE@}] + [ENCODING='encoding']. 
@end display @cmd{INSERT} is similar to @cmd{INCLUDE} (@pxref{INCLUDE}) @@ -303,6 +296,37 @@ the included file must conform to interactive syntax conventions. @xref{Syntax Variants}. The default setting is @samp{SYNTAX=BATCH}. +ENCODING optionally specifies the character set used by the included +file. Its argument, which is not case-sensitive, must be in one of +the following forms: + +@table @asis +@item @code{Locale} +The encoding used by the system locale, or as overridden by the SET +LOCALE command (@pxref{SET}). On Unix systems, environment variables, +e.g.@: @env{LANG} or @env{LC_ALL}, determine the system locale. + +@item IANA character set name +One of the character set names listed by IANA at +@uref{http://www.iana.org/assignments/character-sets}. Some examples +are @code{ASCII} (United States), @code{ISO-8859-1} (western Europe), +@code{EUC-JP} (Japan), and @code{windows-1252} (Windows). Not all +systems support all character sets. + +@item @code{Auto} +@item @code{Auto,@var{encoding}} +Automatically detects whether a syntax file is encoded in +@var{encoding} or in a Unicode encoding such as UTF-8, UTF-16, or +UTF-32. The @var{encoding} may be an IANA character set name or +@code{Locale} (the default). Only ASCII compatible encodings can +automatically be distinguished from UTF-8 (the most common locale +encodings are all ASCII-compatible). +@end table + +When ENCODING is not specified, the default is taken from the +@option{--syntax-encoding} command option, if it was specified, and +otherwise it is @code{Auto}. + @node PERMISSIONS @section PERMISSIONS @vindex PERMISSIONS @@ -363,7 +387,8 @@ SET /MXWARNS=max_warnings /WORKSPACE=workspace_size -(program execution) +(syntax execution) + /LOCALE='locale' /MEXPAND=@{ON,OFF@} /MITERATE=max_iterations /MNEST=max_nest @@ -540,10 +565,20 @@ that warnings will not be given. The default value is 100. @end table -Program execution subcommands control the way that PSPP commands -execute. 
The program execution subcommands are +Syntax execution subcommands control the way that PSPP commands +execute. The syntax execution subcommands are @table @asis +@item LOCALE +Overrides the system locale for the purpose of reading and writing +syntax and data files. The argument should be a locale name in the +general form @code{language_country.encoding}, where @code{language} +and @code{country} are 2-character language and country abbreviations, +respectively, and @code{encoding} is an IANA character set name. +Example locales are @code{en_US.UTF-8} (UTF-8 encoded English as +spoken in the United States) and @code{ja_JP.EUC-JP} (EUC-JP encoded +Japanese as spoken in Japan). + @item MEXPAND @itemx MITERATE @itemx MNEST diff --git a/perl-module/PSPP.xs b/perl-module/PSPP.xs index 25effb92..58eac5b7 100644 --- a/perl-module/PSPP.xs +++ b/perl-module/PSPP.xs @@ -1,5 +1,5 @@ /* PSPP - computes sample statistics. - Copyright (C) 2007, 2008, 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. 
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as @@ -88,7 +88,7 @@ struct sysreader_info /* A message handler which writes messages to PSPP::errstr */ static void -message_handler (const struct msg *m) +message_handler (const struct msg *m, void *aux) { SV *errstr = get_sv("PSPP::errstr", TRUE); sv_setpv (errstr, m->text); @@ -179,7 +179,7 @@ CODE: assert (0 == strncmp (ver, bare_version, strlen (ver))); i18n_init (); - msg_init (NULL, message_handler); + msg_set_handler (message_handler, NULL); settings_init (0, 0); fh_init (); @@ -255,7 +255,7 @@ set_documents (dict, docs) struct dictionary *dict char *docs CODE: - dict_set_documents (dict, docs); + dict_set_documents_string (dict, docs); void @@ -263,7 +263,7 @@ add_document (dict, doc) struct dictionary *dict char *doc CODE: - dict_add_document_line (dict, doc); + dict_add_document_line (dict, doc, false); void @@ -326,7 +326,7 @@ pxs_dict_create_var (dict, name, ip_fmt) INIT: SV *errstr = get_sv("PSPP::errstr", TRUE); sv_setpv (errstr, ""); - if ( ! var_is_plausible_name (name, false)) + if ( ! 
id_is_plausible (name, false)) { sv_setpv (errstr, "The variable name is not valid."); XSRETURN_UNDEF; @@ -376,7 +376,7 @@ set_label (var, label) struct variable *var; char *label CODE: - var_set_label (var, label); + var_set_label (var, label, NULL, false); void diff --git a/perl-module/t/Pspp.t b/perl-module/t/Pspp.t index fce5b74d..a1ff5051 100644 --- a/perl-module/t/Pspp.t +++ b/perl-module/t/Pspp.t @@ -72,7 +72,7 @@ sub run_pspp_syntax_cmp ok ($d->get_var_cnt () == 0); $d->set_label ("My Dictionary"); - $d->set_documents ("These Documents"); + $d->add_document ("These Documents"); # Tests for variable creation @@ -130,7 +130,7 @@ sub run_pspp_syntax_cmp ) ); - $d->set_documents ("This should not appear"); + $d->add_document ("This should not appear"); $d->clear_documents (); $d->add_document ("This is a document line"); diff --git a/src/data/automake.mk b/src/data/automake.mk index 609c59b8..ae0000d2 100644 --- a/src/data/automake.mk +++ b/src/data/automake.mk @@ -66,6 +66,7 @@ src_data_libdata_la_SOURCES = \ src/data/gnumeric-reader.c \ src/data/gnumeric-reader.h \ src/data/identifier.c \ + src/data/identifier2.c \ src/data/identifier.h \ src/data/lazy-casereader.c \ src/data/lazy-casereader.h \ diff --git a/src/data/dictionary.c b/src/data/dictionary.c index 467f347e..79d36374 100644 --- a/src/data/dictionary.c +++ b/src/data/dictionary.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "data/attributes.h" #include "data/case.h" @@ -36,14 +37,17 @@ #include "libpspp/compiler.h" #include "libpspp/hash-functions.h" #include "libpspp/hmap.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/misc.h" #include "libpspp/pool.h" #include "libpspp/str.h" +#include "libpspp/string-array.h" #include "gl/intprops.h" #include "gl/minmax.h" #include "gl/xalloc.h" +#include "gl/xmemdup0.h" #include "gettext.h" #define _(msgid) gettext (msgid) @@ -63,7 +67,7 @@ struct dictionary struct variable *filter; /* FILTER variable. 
*/ casenumber case_limit; /* Current case limit (N command). */ char *label; /* File label. */ - struct string documents; /* Documents, as a string. */ + struct string_array documents; /* Documents. */ struct vector **vector; /* Vectors of variables. */ size_t vector_cnt; /* Number of vectors. */ struct attrset attributes; /* Custom attributes. */ @@ -99,6 +103,15 @@ dict_get_encoding (const struct dictionary *d) return d->encoding ; } +/* Returns true if UTF-8 string ID is an acceptable identifier in DICT's + encoding, false otherwise. If ISSUE_ERROR is true, issues an explanatory + error message on failure. */ +bool +dict_id_is_valid (const struct dictionary *dict, const char *id, + bool issue_error) +{ + return id_is_valid (id, dict->encoding, issue_error); +} void dict_set_change_callback (struct dictionary *d, @@ -268,7 +281,7 @@ dict_clear (struct dictionary *d) d->case_limit = 0; free (d->label); d->label = NULL; - ds_destroy (&d->documents); + string_array_clear (&d->documents); dict_clear_vectors (d); attrset_clear (&d->attributes); } @@ -845,54 +858,67 @@ var_name_is_insertable (const struct dictionary *dict, const char *name) static char * make_hinted_name (const struct dictionary *dict, const char *hint) { - char name[VAR_NAME_LEN + 1]; + size_t hint_len = strlen (hint); bool dropped = false; - char *cp; - - for (cp = name; *hint && cp < name + VAR_NAME_LEN; hint++) + char *root, *rp; + size_t ofs; + int mblen; + + /* The allocation size here is OK: characters that are copied directly fit + OK, and characters that are not copied directly are replaced by a single + '_' byte. If u8_mbtouc() replaces bad input by 0xfffd, then that will get + replaced by '_' too. */ + root = rp = xmalloc (hint_len + 1); + for (ofs = 0; ofs < hint_len; ofs += mblen) { - if (cp == name - ? lex_is_id1 (*hint) && *hint != '$' - : lex_is_idn (*hint)) + ucs4_t uc; + + mblen = u8_mbtouc (&uc, CHAR_CAST (const uint8_t *, hint + ofs), + hint_len - ofs); + if (rp == root + ? 
lex_uc_is_id1 (uc) && uc != '$' + : lex_uc_is_idn (uc)) { if (dropped) { - *cp++ = '_'; + *rp++ = '_'; dropped = false; } - if (cp < name + VAR_NAME_LEN) - *cp++ = *hint; + rp += u8_uctomb (CHAR_CAST (uint8_t *, rp), uc, 6); } - else if (cp > name) + else if (rp != root) dropped = true; } - *cp = '\0'; + *rp = '\0'; - if (name[0] != '\0') + if (root[0] != '\0') { - size_t len = strlen (name); unsigned long int i; - if (var_name_is_insertable (dict, name)) - return xstrdup (name); + if (var_name_is_insertable (dict, root)) + return root; for (i = 0; i < ULONG_MAX; i++) { char suffix[INT_BUFSIZE_BOUND (i) + 1]; - int ofs; + char *name; suffix[0] = '_'; if (!str_format_26adic (i + 1, &suffix[1], sizeof suffix - 1)) NOT_REACHED (); - ofs = MIN (VAR_NAME_LEN - strlen (suffix), len); - strcpy (&name[ofs], suffix); - + name = utf8_encoding_concat (root, suffix, dict->encoding, 64); if (var_name_is_insertable (dict, name)) - return xstrdup (name); + { + free (root); + return name; + } + free (name); } } + free (root); + return NULL; } @@ -1238,74 +1264,94 @@ dict_set_label (struct dictionary *d, const char *label) d->label = label != NULL && label[0] != '\0' ? xstrndup (label, 60) : NULL; } -/* Returns the documents for D, or a null pointer if D has no - documents. If the return value is nonnull, then the string - will be an exact multiple of DOC_LINE_LENGTH bytes in length, - with each segment corresponding to one line. */ -const char * +/* Returns the documents for D, as an UTF-8 encoded string_array. The + return value is always nonnull; if there are no documents then the + string_arary is empty.*/ +const struct string_array * dict_get_documents (const struct dictionary *d) { - return ds_is_empty (&d->documents) ? NULL : ds_cstr (&d->documents); + return &d->documents; } -/* Sets the documents for D to DOCUMENTS, or removes D's - documents if DOCUMENT is a null pointer. 
If DOCUMENTS is - nonnull, then it should be an exact multiple of - DOC_LINE_LENGTH bytes in length, with each segment - corresponding to one line. */ +/* Replaces the documents for D by NEW_DOCS, a UTF-8 encoded string_array. */ void -dict_set_documents (struct dictionary *d, const char *documents) +dict_set_documents (struct dictionary *d, const struct string_array *new_docs) { - size_t remainder; + size_t i; - ds_assign_cstr (&d->documents, documents != NULL ? documents : ""); + dict_clear_documents (d); - /* In case the caller didn't get it quite right, pad out the - final line with spaces. */ - remainder = ds_length (&d->documents) % DOC_LINE_LENGTH; - if (remainder != 0) - ds_put_byte_multiple (&d->documents, ' ', DOC_LINE_LENGTH - remainder); + for (i = 0; i < new_docs->n; i++) + dict_add_document_line (d, new_docs->strings[i], false); +} + +/* Replaces the documents for D by UTF-8 encoded string NEW_DOCS, dividing it + into individual lines at new-line characters. Each line is truncated to at + most DOC_LINE_LENGTH bytes in D's encoding. */ +void +dict_set_documents_string (struct dictionary *d, const char *new_docs) +{ + const char *s; + + dict_clear_documents (d); + for (s = new_docs; *s != '\0'; ) + { + size_t len = strcspn (s, "\n"); + char *line = xmemdup0 (s, len); + dict_add_document_line (d, line, false); + free (line); + + s += len; + if (*s == '\n') + s++; + } } /* Drops the documents from dictionary D. */ void dict_clear_documents (struct dictionary *d) { - ds_clear (&d->documents); + string_array_clear (&d->documents); } -/* Appends LINE to the documents in D. LINE will be truncated or - padded on the right with spaces to make it exactly - DOC_LINE_LENGTH bytes long. */ -void -dict_add_document_line (struct dictionary *d, const char *line) +/* Appends the UTF-8 encoded LINE to the documents in D. LINE will be + truncated so that it is no more than 80 bytes in the dictionary's + encoding. 
If this causes some text to be lost, and ISSUE_WARNING is true, + then a warning will be issued. */ +bool +dict_add_document_line (struct dictionary *d, const char *line, + bool issue_warning) { - if (strlen (line) > DOC_LINE_LENGTH) + size_t trunc_len; + bool truncated; + + trunc_len = utf8_encoding_trunc_len (line, d->encoding, DOC_LINE_LENGTH); + truncated = line[trunc_len] != '\0'; + if (truncated && issue_warning) { /* Note to translators: "bytes" is correct, not characters */ msg (SW, _("Truncating document line to %d bytes."), DOC_LINE_LENGTH); } - buf_copy_str_rpad (ds_put_uninit (&d->documents, DOC_LINE_LENGTH), - DOC_LINE_LENGTH, line, ' '); + + string_array_append_nocopy (&d->documents, xmemdup0 (line, trunc_len)); + + return !truncated; } /* Returns the number of document lines in dictionary D. */ size_t dict_get_document_line_cnt (const struct dictionary *d) { - return ds_length (&d->documents) / DOC_LINE_LENGTH; + return d->documents.n; } -/* Copies document line number IDX from dictionary D into - LINE, trimming off any trailing white space. */ -void -dict_get_document_line (const struct dictionary *d, - size_t idx, struct string *line) +/* Returns document line number IDX in dictionary D. The caller must not + modify or free the returned string. */ +const char * +dict_get_document_line (const struct dictionary *d, size_t idx) { - assert (idx < dict_get_document_line_cnt (d)); - ds_assign_substring (line, ds_substr (&d->documents, idx * DOC_LINE_LENGTH, - DOC_LINE_LENGTH)); - ds_rtrim (line, ss_cstr (CC_SPACES)); + assert (idx < d->documents.n); + return d->documents.strings[idx]; } /* Creates in D a vector named NAME that contains the CNT diff --git a/src/data/dictionary.h b/src/data/dictionary.h index dd220e4d..fa5d0dde 100644 --- a/src/data/dictionary.h +++ b/src/data/dictionary.h @@ -127,14 +127,15 @@ void dict_set_label (struct dictionary *, const char *); /* Documents. */ #define DOC_LINE_LENGTH 80 /* Fixed length of document lines. 
*/ -const char *dict_get_documents (const struct dictionary *); -void dict_set_documents (struct dictionary *, const char *); +const struct string_array *dict_get_documents (const struct dictionary *); +void dict_set_documents (struct dictionary *, const struct string_array *); +void dict_set_documents_string (struct dictionary *, const char *); void dict_clear_documents (struct dictionary *); -void dict_add_document_line (struct dictionary *, const char *); +bool dict_add_document_line (struct dictionary *, const char *, + bool issue_warning); size_t dict_get_document_line_cnt (const struct dictionary *); -void dict_get_document_line (const struct dictionary *, - size_t, struct string *); +const char *dict_get_document_line (const struct dictionary *, size_t); /* Vectors. */ bool dict_create_vector (struct dictionary *, const char *name, @@ -166,6 +167,9 @@ bool dict_has_attributes (const struct dictionary *); void dict_set_encoding (struct dictionary *d, const char *enc); const char *dict_get_encoding (const struct dictionary *d); +bool dict_id_is_valid (const struct dictionary *, const char *id, + bool issue_error); + /* Internal variables. */ struct variable *dict_create_internal_var (int case_idx, int width); void dict_destroy_internal_var (struct variable *); diff --git a/src/data/file-handle-def.c b/src/data/file-handle-def.c index 95a92f57..2b8e40ce 100644 --- a/src/data/file-handle-def.c +++ b/src/data/file-handle-def.c @@ -220,10 +220,10 @@ fh_inline_file (void) return inline_file; } -/* Creates and returns a new file handle with the given ID, which - may be null. If it is non-null, it must be unique among - existing file identifiers. The new handle is associated with - file FILE_NAME and the given PROPERTIES. */ +/* Creates and returns a new file handle with the given ID, which may be null. + If it is non-null, it must be a UTF-8 encoded string that is unique among + existing file identifiers. 
The new handle is associated with file FILE_NAME + and the given PROPERTIES. */ struct file_handle * fh_create_file (const char *id, const char *file_name, const struct fh_properties *properties) diff --git a/src/data/gnumeric-reader.h b/src/data/gnumeric-reader.h index 6bb5a6b7..b313fc78 100644 --- a/src/data/gnumeric-reader.h +++ b/src/data/gnumeric-reader.h @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2007 Free Software Foundation, Inc. + Copyright (C) 2007, 2010 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -24,9 +24,9 @@ struct casereader; struct gnumeric_read_info { - char *sheet_name ; - char *file_name ; - char *cell_range ; + char *sheet_name ; /* In UTF-8. */ + char *file_name ; /* In filename encoding. */ + char *cell_range ; /* In UTF-8. */ int sheet_index ; bool read_names ; int asw ; diff --git a/src/data/identifier.c b/src/data/identifier.c index 4b613bb4..f1c22ef1 100644 --- a/src/data/identifier.c +++ b/src/data/identifier.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2005, 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2005, 2009, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -23,15 +23,10 @@ #include "data/identifier.h" -#include #include #include -#include #include "libpspp/assertion.h" -#include "libpspp/cast.h" -#include "libpspp/i18n.h" -#include "libpspp/message.h" #include "gl/c-ctype.h" @@ -319,20 +314,3 @@ lex_id_to_token (struct substring id) return T_ID; } - -/* Returns the name for the given keyword token type. 
*/ -const char * -lex_id_name (enum token_type token) -{ - const struct keyword *kw; - - for (kw = keywords; kw < &keywords[keyword_cnt]; kw++) - if (kw->token == token) - { - /* A "struct substring" is not guaranteed to be - null-terminated, as our caller expects, but in this - case it always will be. */ - return ss_data (kw->identifier); - } - NOT_REACHED (); -} diff --git a/src/data/identifier.h b/src/data/identifier.h index bf20f9cc..7f2f9042 100644 --- a/src/data/identifier.h +++ b/src/data/identifier.h @@ -74,6 +74,12 @@ const char *token_type_to_string (enum token_type); /* Tokens. */ bool lex_is_keyword (enum token_type); +/* Validating identifiers. */ +#define ID_MAX_LEN 64 /* Maximum length of identifier, in bytes. */ + +bool id_is_valid (const char *id, const char *dict_encoding, bool issue_error); +bool id_is_plausible (const char *id, bool issue_error); + /* Recognizing identifiers. */ bool lex_is_id1 (char); bool lex_is_idn (char); @@ -88,7 +94,4 @@ bool lex_id_match_n (struct substring keyword, struct substring token, size_t n); int lex_id_to_token (struct substring); -/* Identifier names. */ -const char *lex_id_name (enum token_type); - #endif /* !data/identifier.h */ diff --git a/src/data/identifier2.c b/src/data/identifier2.c new file mode 100644 index 00000000..3b6458f0 --- /dev/null +++ b/src/data/identifier2.c @@ -0,0 +1,133 @@ +/* PSPP - a program for statistical analysis. + Copyright (C) 1997-9, 2000, 2005, 2009, 2010, 2011 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +/* This file implements parts of identifier.h that call the msg() function. + This allows test programs that do not use those functions to avoid linking + additional object files. */ + +#include + +#include "data/identifier.h" + +#include +#include + +#include "libpspp/cast.h" +#include "libpspp/i18n.h" +#include "libpspp/message.h" + +#include "gl/c-ctype.h" + +#include "gettext.h" +#define _(msgid) gettext (msgid) + +/* Returns true if UTF-8 string ID is an acceptable identifier in encoding + DICT_ENCODING (UTF-8 if null), false otherwise. If ISSUE_ERROR is true, + issues an explanatory error message on failure. */ +bool +id_is_valid (const char *id, const char *dict_encoding, bool issue_error) +{ + size_t dict_len; + + if (!id_is_plausible (id, issue_error)) + return false; + + if (dict_encoding != NULL) + { + /* XXX need to reject recoded strings that contain the fallback + character. */ + dict_len = recode_string_len (dict_encoding, "UTF-8", id, -1); + } + else + dict_len = strlen (id); + + if (dict_len > ID_MAX_LEN) + { + if (issue_error) + msg (SE, _("Identifier `%s' exceeds %d-byte limit."), + id, ID_MAX_LEN); + return false; + } + + return true; +} + +/* Returns true if UTF-8 string ID is an plausible identifier, false + otherwise. If ISSUE_ERROR is true, issues an explanatory error message on + failure. */ +bool +id_is_plausible (const char *id, bool issue_error) +{ + const uint8_t *bad_unit; + const uint8_t *s; + char ucname[16]; + int mblen; + ucs4_t uc; + + /* ID cannot be the empty string. */ + if (id[0] == '\0') + { + if (issue_error) + msg (SE, _("Identifier cannot be empty string.")); + return false; + } + + /* ID cannot be a reserved word. 
*/ + if (lex_id_to_token (ss_cstr (id)) != T_ID) + { + if (issue_error) + msg (SE, _("`%s' may not be used as an identifier because it " + "is a reserved word."), id); + return false; + } + + bad_unit = u8_check (CHAR_CAST (const uint8_t *, id), strlen (id)); + if (bad_unit != NULL) + { + /* If this message ever appears, it probably indicates a PSPP bug since + it shouldn't be possible to get invalid UTF-8 this far. */ + if (issue_error) + msg (SE, _("`%s' may not be used as an identifier because it " + "contains ill-formed UTF-8 at byte offset %tu."), + id, CHAR_CAST (const char *, bad_unit) - id); + return false; + } + + /* Check that it is a valid identifier. */ + mblen = u8_strmbtouc (&uc, CHAR_CAST (uint8_t *, id)); + if (!lex_uc_is_id1 (uc)) + { + if (issue_error) + msg (SE, _("Character %s (in `%s') may not appear " + "as the first character in a identifier."), + uc_name (uc, ucname), id); + return false; + } + + for (s = CHAR_CAST (uint8_t *, id + mblen); + (mblen = u8_strmbtouc (&uc, s)) != 0; + s += mblen) + if (!lex_uc_is_idn (uc)) + { + if (issue_error) + msg (SE, _("Character %s (in `%s') may not appear in an " + "identifier."), + uc_name (uc, ucname), id); + return false; + } + + return true; +} diff --git a/src/data/mrset.c b/src/data/mrset.c index d1807b96..38b0ab2d 100644 --- a/src/data/mrset.c +++ b/src/data/mrset.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2010 Free Software Foundation, Inc. + Copyright (C) 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -21,11 +21,16 @@ #include #include "data/dictionary.h" +#include "data/identifier.h" #include "data/val-type.h" #include "data/variable.h" +#include "libpspp/message.h" #include "gl/xalloc.h" +#include "gettext.h" +#define _(msgid) gettext (msgid) + /* Creates and returns a clone of OLD. 
The caller is responsible for freeing the new multiple response set (using mrset_destroy()). */ struct mrset * @@ -62,9 +67,31 @@ mrset_destroy (struct mrset *mrset) } } +/* Returns true if the UTF-8 encoded NAME is a valid name for a multiple + response set in a dictionary encoded in DICT_ENCODING, false otherwise. If + ISSUE_ERROR is true, issues an explanatory error message on failure. */ +bool +mrset_is_valid_name (const char *name, const char *dict_encoding, + bool issue_error) +{ + if (!id_is_valid (name, dict_encoding, issue_error)) + return false; + + if (name[0] != '$') + { + if (issue_error) + msg (SE, _("%s is not a valid name for a multiple response " + "set. Multiple response set names must begin with " + "`$'."), name); + return false; + } + + return true; +} + /* Checks various constraints on MRSET: - - MRSET has a valid name for a multiple response set (beginning with '$'). + - MRSET's name begins with '$' and is valid as an identifier in DICT. - MRSET has a valid type. @@ -85,7 +112,7 @@ mrset_ok (const struct mrset *mrset, const struct dictionary *dict) size_t i; if (mrset->name == NULL - || mrset->name[0] != '$' + || !mrset_is_valid_name (mrset->name, dict_get_encoding (dict), false) || (mrset->type != MRSET_MD && mrset->type != MRSET_MC) || mrset->vars == NULL || mrset->n_vars < 2) diff --git a/src/data/mrset.h b/src/data/mrset.h index c531db7a..9971924e 100644 --- a/src/data/mrset.h +++ b/src/data/mrset.h @@ -61,8 +61,8 @@ enum mrset_md_cat_source /* A multiple response set. */ struct mrset { - char *name; /* Name for syntax. Always begins with "$". */ - char *label; /* Human-readable label for group. */ + char *name; /* UTF-8 encoded name beginning with "$". */ + char *label; /* Human-readable UTF-8 label for group. */ enum mrset_type type; /* Group type. */ struct variable **vars; /* Constituent variables. */ size_t n_vars; /* Number of constituent variables. 
*/ @@ -77,6 +77,9 @@ struct mrset struct mrset *mrset_clone (const struct mrset *); void mrset_destroy (struct mrset *); +bool mrset_is_valid_name (const char *name, const char *dict_encoding, + bool issue_error); + bool mrset_ok (const struct mrset *, const struct dictionary *); #endif /* data/mrset.h */ diff --git a/src/data/por-file-reader.c b/src/data/por-file-reader.c index 3f8ee3c9..372d7682 100644 --- a/src/data/por-file-reader.c +++ b/src/data/por-file-reader.c @@ -105,10 +105,11 @@ error (struct pfm_reader *r, const char *msg, ...) m.category = MSG_C_GENERAL; m.severity = MSG_S_ERROR; - m.where.file_name = NULL; - m.where.line_number = 0; - m.where.first_column = 0; - m.where.last_column = 0; + m.file_name = NULL; + m.first_line = 0; + m.last_line = 0; + m.first_column = 0; + m.last_column = 0; m.text = ds_cstr (&text); msg_emit (&m); @@ -136,10 +137,11 @@ warning (struct pfm_reader *r, const char *msg, ...) m.category = MSG_C_GENERAL; m.severity = MSG_S_WARNING; - m.where.file_name = NULL; - m.where.line_number = 0; - m.where.first_column = 0; - m.where.last_column = 0; + m.file_name = NULL; + m.first_line = 0; + m.last_line = 0; + m.first_column = 0; + m.last_column = 0; m.text = ds_cstr (&text); msg_emit (&m); @@ -682,7 +684,8 @@ read_variables (struct pfm_reader *r, struct dictionary *dict) for (j = 0; j < 6; j++) fmt[j] = read_int (r); - if (!var_is_valid_name (name, false) || *name == '#' || *name == '$') + if (!dict_id_is_valid (dict, name, false) + || *name == '#' || *name == '$') error (r, _("Invalid variable name `%s' in position %d."), name, i); str_uppercase (name); @@ -742,7 +745,7 @@ read_variables (struct pfm_reader *r, struct dictionary *dict) { char label[256]; read_string (r, label); - var_set_label (v, label); + var_set_label (v, label, NULL, false); /* XXX */ } } @@ -832,7 +835,7 @@ read_documents (struct pfm_reader *r, struct dictionary *dict) { char line[256]; read_string (r, line); - dict_add_document_line (dict, line); + 
dict_add_document_line (dict, line, false); } } diff --git a/src/data/por-file-writer.c b/src/data/por-file-writer.c index ee84e7c9..ea0f9dc8 100644 --- a/src/data/por-file-writer.c +++ b/src/data/por-file-writer.c @@ -436,10 +436,7 @@ write_documents (struct pfm_writer *w, const struct dictionary *dict) buf_write (w, "E", 1); write_int (w, line_cnt); for (i = 0; i < line_cnt; i++) - { - dict_get_document_line (dict, i, &line); - write_string (w, ds_cstr (&line)); - } + write_string (w, dict_get_document_line (dict, i)); ds_destroy (&line); } diff --git a/src/data/procedure.c b/src/data/procedure.c index fa52806f..a45f497a 100644 --- a/src/data/procedure.c +++ b/src/data/procedure.c @@ -101,6 +101,8 @@ struct dataset { void (*callback) (void *); /* Callback for when the dataset changes */ void *cb_data; + /* Default encoding for reading syntax files. */ + char *syntax_encoding; }; /* struct dataset */ @@ -125,6 +127,18 @@ dataset_set_callback (struct dataset *ds, void (*cb) (void *), void *cb_data) ds->cb_data = cb_data; } +void +dataset_set_default_syntax_encoding (struct dataset *ds, const char *encoding) +{ + free (ds->syntax_encoding); + ds->syntax_encoding = xstrdup (encoding); +} + +const char * +dataset_get_default_syntax_encoding (const struct dataset *ds) +{ + return ds->syntax_encoding; +} /* Returns the last time the data was read. 
*/ time_t @@ -597,6 +611,9 @@ create_dataset (void) ds->caseinit = caseinit_create (); proc_cancel_all_transformations (ds); + + ds->syntax_encoding = xstrdup ("Auto"); + return ds; } @@ -621,6 +638,8 @@ destroy_dataset (struct dataset *ds) if ( ds->xform_callback) ds->xform_callback (false, ds->xform_callback_aux); + + free (ds->syntax_encoding); free (ds); } diff --git a/src/data/procedure.h b/src/data/procedure.h index fd3af604..ad45a537 100644 --- a/src/data/procedure.h +++ b/src/data/procedure.h @@ -82,10 +82,12 @@ bool dataset_end_of_command (struct dataset *); struct dictionary *dataset_dict (const struct dataset *ds); const struct casereader *dataset_source (const struct dataset *ds); - const struct ccase *lagged_case (const struct dataset *ds, int n_before); void dataset_need_lag (struct dataset *ds, int n_before); void dataset_set_callback (struct dataset *ds, void (*cb) (void *), void *); +void dataset_set_default_syntax_encoding (struct dataset *, const char *); +const char *dataset_get_default_syntax_encoding (const struct dataset *); + #endif /* procedure.h */ diff --git a/src/data/sys-file-reader.c b/src/data/sys-file-reader.c index ceb4e04f..6643b85d 100644 --- a/src/data/sys-file-reader.c +++ b/src/data/sys-file-reader.c @@ -33,6 +33,7 @@ #include "data/file-handle-def.h" #include "data/file-name.h" #include "data/format.h" +#include "data/identifier.h" #include "data/missing-values.h" #include "data/mrset.h" #include "data/short-names.h" @@ -953,7 +954,8 @@ parse_variable_records (struct sfm_reader *r, struct dictionary *dict, rec->name, 8, r->pool); name[strcspn (name, " ")] = '\0'; - if (!var_is_valid_name (name, false) || name[0] == '$' || name[0] == '#') + if (!dict_id_is_valid (dict, name, false) + || name[0] == '$' || name[0] == '#') sys_error (r, rec->pos, _("Invalid variable name `%s'."), name); if (rec->width < 0 || rec->width > 255) @@ -974,7 +976,7 @@ parse_variable_records (struct sfm_reader *r, struct dictionary *dict, utf8_label = 
recode_string_pool ("UTF-8", dict_encoding, rec->label, -1, r->pool); - var_set_label (var, utf8_label); + var_set_label (var, utf8_label, NULL, false); } /* Set missing values. */ @@ -1099,7 +1101,7 @@ parse_document (struct dictionary *dict, struct sfm_document_record *record) ss_rtrim (&line, ss_cstr (" ")); line.string[line.length] = '\0'; - dict_add_document_line (dict, line.string); + dict_add_document_line (dict, line.string, false); ss_dealloc (&line); } @@ -1539,7 +1541,8 @@ parse_long_var_name_map (struct sfm_reader *r, while (read_variable_to_value_pair (r, dict, text, &var, &long_name)) { /* Validate long name. */ - if (!var_is_valid_name (long_name, false)) + /* XXX need to reencode name to UTF-8 */ + if (!dict_id_is_valid (dict, long_name, false)) { sys_warn (r, record->pos, _("Long variable mapping from %s to invalid " @@ -2467,10 +2470,11 @@ sys_msg (struct sfm_reader *r, off_t offset, m.category = msg_class_to_category (class); m.severity = msg_class_to_severity (class); - m.where.file_name = NULL; - m.where.line_number = 0; - m.where.first_column = 0; - m.where.last_column = 0; + m.file_name = NULL; + m.first_line = 0; + m.last_line = 0; + m.first_column = 0; + m.last_column = 0; m.text = ds_cstr (&text); msg_emit (&m); diff --git a/src/data/sys-file-writer.c b/src/data/sys-file-writer.c index 1a65e3c0..b1cb7c22 100644 --- a/src/data/sys-file-writer.c +++ b/src/data/sys-file-writer.c @@ -47,6 +47,7 @@ #include "libpspp/message.h" #include "libpspp/misc.h" #include "libpspp/str.h" +#include "libpspp/string-array.h" #include "libpspp/version.h" #include "gl/xmemdup0.h" @@ -238,7 +239,7 @@ sfm_open_writer (struct file_handle *fh, struct dictionary *d, idx += sfm_width_to_octs (var_get_width (v)); } - if (dict_get_documents (d) != NULL) + if (dict_get_document_line_cnt (d) > 0) write_documents (w, d); write_integer_info_record (w); @@ -552,11 +553,22 @@ write_value_labels (struct sfm_writer *w, struct variable *v, int idx) static void write_documents 
(struct sfm_writer *w, const struct dictionary *d) { - size_t line_cnt = dict_get_document_line_cnt (d); + const struct string_array *docs = dict_get_documents (d); + const char *enc = dict_get_encoding (d); + size_t i; write_int (w, 6); /* Record type. */ - write_int (w, line_cnt); - write_bytes (w, dict_get_documents (d), line_cnt * DOC_LINE_LENGTH); + write_int (w, docs->n); + for (i = 0; i < docs->n; i++) + { + char *s = recode_string (enc, "UTF-8", docs->strings[i], -1); + size_t s_len = strlen (s); + size_t write_len = MIN (s_len, DOC_LINE_LENGTH); + + write_bytes (w, s, write_len); + write_spaces (w, DOC_LINE_LENGTH - write_len); + free (s); + } } static void diff --git a/src/data/variable.c b/src/data/variable.c index 2ceeb0d0..029e3f49 100644 --- a/src/data/variable.c +++ b/src/data/variable.c @@ -31,6 +31,7 @@ #include "libpspp/assertion.h" #include "libpspp/compiler.h" #include "libpspp/hash-functions.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/misc.h" #include "libpspp/str.h" @@ -132,7 +133,7 @@ var_clone (const struct variable *old_var) var_set_print_format (new_var, var_get_print_format (old_var)); var_set_write_format (new_var, var_get_write_format (old_var)); var_set_value_labels (new_var, var_get_value_labels (old_var)); - var_set_label (new_var, var_get_label (old_var)); + var_set_label (new_var, var_get_label (old_var), NULL, false); var_set_measure (new_var, var_get_measure (old_var)); var_set_display_width (new_var, var_get_display_width (old_var)); var_set_alignment (new_var, var_get_alignment (old_var)); @@ -163,109 +164,27 @@ var_destroy (struct variable *v) /* Variable names. */ -/* Return variable V's name. */ +/* Return variable V's name, as a UTF-8 encoded string. */ const char * var_get_name (const struct variable *v) { return v->name; } -/* Sets V's name to NAME. +/* Sets V's name to NAME, a UTF-8 encoded string. Do not use this function for a variable in a dictionary. Use dict_rename_var instead. 
*/ void var_set_name (struct variable *v, const char *name) { assert (!var_has_vardict (v)); - assert (var_is_plausible_name (name, false)); + assert (id_is_plausible (name, false)); free (v->name); v->name = xstrdup (name); dict_var_changed (v); } -/* Returns true if NAME is an acceptable name for a variable, - false otherwise. If ISSUE_ERROR is true, issues an - explanatory error message on failure. */ -bool -var_is_valid_name (const char *name, bool issue_error) -{ - bool plausible; - size_t length, i; - - /* Note that strlen returns number of BYTES, not the number of - CHARACTERS */ - length = strlen (name); - - plausible = var_is_plausible_name(name, issue_error); - - if ( ! plausible ) - return false; - - - if (!lex_is_id1 (name[0])) - { - if (issue_error) - msg (SE, _("Character `%c' (in %s) may not appear " - "as the first character in a variable name."), - name[0], name); - return false; - } - - - for (i = 0; i < length; i++) - { - if (!lex_is_idn (name[i])) - { - if (issue_error) - msg (SE, _("Character `%c' (in %s) may not appear in " - "a variable name."), - name[i], name); - return false; - } - } - - return true; -} - -/* Returns true if NAME is an plausible name for a variable, - false otherwise. If ISSUE_ERROR is true, issues an - explanatory error message on failure. - This function makes no use of LC_CTYPE. 
-*/ -bool -var_is_plausible_name (const char *name, bool issue_error) -{ - size_t length; - - /* Note that strlen returns number of BYTES, not the number of - CHARACTERS */ - length = strlen (name); - if (length < 1) - { - if (issue_error) - msg (SE, _("Variable name cannot be empty string.")); - return false; - } - else if (length > VAR_NAME_LEN) - { - if (issue_error) - msg (SE, _("Variable name %s exceeds %d-character limit."), - name, (int) VAR_NAME_LEN); - return false; - } - - if (lex_id_to_token (ss_cstr (name)) != T_ID) - { - if (issue_error) - msg (SE, _("`%s' may not be used as a variable name because it " - "is a reserved word."), name); - return false; - } - - return true; -} - /* Returns VAR's dictionary class. */ enum dict_class var_get_dict_class (const struct variable *var) @@ -644,33 +563,61 @@ var_get_label (const struct variable *v) return v->label; } -/* Sets V's variable label to LABEL, stripping off leading and - trailing white space and truncating to 255 characters. - If LABEL is a null pointer or if LABEL is an empty string - (after stripping white space), then V's variable label (if - any) is removed. */ -void -var_set_label (struct variable *v, const char *label) +/* Sets V's variable label to UTF-8 encoded string LABEL, stripping off leading + and trailing white space. If LABEL is a null pointer or if LABEL is an + empty string (after stripping white space), then V's variable label (if any) + is removed. + + Variable labels are limited to 255 bytes in the dictionary encoding, which + should be specified as DICT_ENCODING. If LABEL fits within this limit, this + function returns true. Otherwise, the variable label is set to a truncated + value, this function returns false and, if ISSUE_WARNING is true, issues a + warning. 
*/ +bool +var_set_label (struct variable *v, const char *label, + const char *dict_encoding, bool issue_warning) { + bool truncated = false; + free (v->label); v->label = NULL; if (label != NULL) { struct substring s = ss_cstr (label); + size_t trunc_len; + + if (dict_encoding != NULL) + { + enum { MAX_LABEL_LEN = 255 }; + + trunc_len = utf8_encoding_trunc_len (label, dict_encoding, + MAX_LABEL_LEN); + if (ss_length (s) > trunc_len) + { + if (issue_warning) + msg (SW, _("Truncating variable label for variable `%s' to %d " + "bytes."), var_get_name (v), MAX_LABEL_LEN); + ss_truncate (&s, trunc_len); + truncated = true; + } + } + ss_trim (&s, ss_cstr (CC_SPACES)); - ss_truncate (&s, 255); if (!ss_is_empty (s)) v->label = ss_xstrdup (s); } + dict_var_changed (v); + + return truncated; } /* Removes any variable label from V. */ void var_clear_label (struct variable *v) { - var_set_label (v, NULL); + var_set_label (v, NULL, NULL, false); } /* Returns true if V has a variable V, @@ -839,7 +786,7 @@ var_get_short_name (const struct variable *var, size_t idx) void var_set_short_name (struct variable *var, size_t idx, const char *short_name) { - assert (short_name == NULL || var_is_plausible_name (short_name, false)); + assert (short_name == NULL || id_is_plausible (short_name, false)); /* Clear old short name numbered IDX, if any. */ if (idx < var->short_name_cnt) diff --git a/src/data/variable.h b/src/data/variable.h index 03257502..9cffafc9 100644 --- a/src/data/variable.h +++ b/src/data/variable.h @@ -34,12 +34,8 @@ struct variable *var_clone (const struct variable *); void var_destroy (struct variable *); /* Variable names. */ -#define VAR_NAME_LEN 64 /* Maximum length of variable name, in bytes. 
*/ - const char *var_get_name (const struct variable *); void var_set_name (struct variable *, const char *); -bool var_is_valid_name (const char *, bool issue_error); -bool var_is_plausible_name (const char *name, bool issue_error); enum dict_class var_get_dict_class (const struct variable *); int compare_vars_by_name (const void *, const void *, const void *); @@ -102,7 +98,8 @@ struct fmt_spec var_default_formats (int width); /* Variable labels. */ const char *var_to_string (const struct variable *); const char *var_get_label (const struct variable *); -void var_set_label (struct variable *, const char *); +bool var_set_label (struct variable *, const char *label, + const char *dict_encoding, bool issue_warning); void var_clear_label (struct variable *); bool var_has_label (const struct variable *); diff --git a/src/data/vector.c b/src/data/vector.c index 5da797c0..87046ad4 100644 --- a/src/data/vector.c +++ b/src/data/vector.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2006, 2011 Free Software Foundation, Inc. + Copyright (C) 2006, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -21,6 +21,7 @@ #include #include "data/dictionary.h" +#include "data/identifier.h" #include "libpspp/assertion.h" #include "libpspp/str.h" @@ -46,19 +47,18 @@ check_widths (const struct vector *vector) assert (width == var_get_width (vector->vars[i])); } -/* Creates and returns a new vector with the given NAME +/* Creates and returns a new vector with the given UTF-8 encoded NAME that contains the VAR_CNT variables in VARS. All variables in VARS must have the same type and width. 
*/ struct vector * -vector_create (const char *name, - struct variable **vars, size_t var_cnt) +vector_create (const char *name, struct variable **vars, size_t var_cnt) { struct vector *vector = xmalloc (sizeof *vector); assert (var_cnt > 0); - assert (var_is_plausible_name (name, false)); - vector->name = xstrdup (name); + assert (id_is_plausible (name, false)); + vector->name = xstrdup (name); vector->vars = xmemdup (vars, var_cnt * sizeof *vector->vars); vector->var_cnt = var_cnt; check_widths (vector); @@ -80,7 +80,6 @@ vector_clone (const struct vector *old, size_t i; new->name = xstrdup (old->name); - new->vars = xnmalloc (old->var_cnt, sizeof *new->vars); new->var_cnt = old->var_cnt; for (i = 0; i < new->var_cnt; i++) @@ -103,7 +102,7 @@ vector_destroy (struct vector *vector) free (vector); } -/* Returns VECTOR's name. */ +/* Returns VECTOR's name, as a UTF-8 encoded string. */ const char * vector_get_name (const struct vector *vector) { diff --git a/src/data/vector.h b/src/data/vector.h index fc2bf951..f8fe0888 100644 --- a/src/data/vector.h +++ b/src/data/vector.h @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2006, 2011 Free Software Foundation, Inc. + Copyright (C) 2006, 2010, 2011 Free Software Foundation, Inc. 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -34,6 +34,8 @@ enum val_type vector_get_type (const struct vector *); struct variable *vector_get_var (const struct vector *, size_t idx); size_t vector_get_var_cnt (const struct vector *); +bool vector_is_valid_name (const char *name, bool issue_error); + int compare_vector_ptrs_by_name (const void *a_, const void *b_); #endif /* data/vector.h */ diff --git a/src/language/automake.mk b/src/language/automake.mk index 3052b523..6c0eaa90 100644 --- a/src/language/automake.mk +++ b/src/language/automake.mk @@ -14,12 +14,6 @@ noinst_LTLIBRARIES += src/language/liblanguage.la src_language_liblanguage_la_SOURCES = \ - src/language/syntax-file.c \ - src/language/syntax-file.h \ - src/language/syntax-string-source.c \ - src/language/syntax-string-source.h \ - src/language/prompt.c \ - src/language/prompt.h \ src/language/command.c \ src/language/command.h \ src/language/command.def \ diff --git a/src/language/command.c b/src/language/command.c index 3bf9571f..8ba8a08e 100644 --- a/src/language/command.c +++ b/src/language/command.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -30,12 +30,11 @@ #include "data/variable.h" #include "language/lexer/command-name.h" #include "language/lexer/lexer.h" -#include "language/prompt.h" #include "libpspp/assertion.h" #include "libpspp/compiler.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/str.h" -#include "libpspp/getl.h" #include "output/text-item.h" #include "xalloc.h" @@ -89,7 +88,6 @@ enum flags { F_ENHANCED = 0x10, /* Allowed only in enhanced syntax mode. 
*/ F_TESTING = 0x20, /* Allowed only in testing mode. */ - F_KEEP_FINAL_TOKEN = 0x40,/* Don't skip final token in command name. */ F_ABBREV = 0x80 /* Not a candidate for name completion. */ }; @@ -120,7 +118,8 @@ static void set_completion_state (enum cmd_state); /* Command parser. */ -static const struct command *parse_command_name (struct lexer *lexer); +static const struct command *parse_command_name (struct lexer *, + int *n_tokens); static enum cmd_result do_parse_command (struct lexer *, struct dataset *, enum cmd_state); /* Parses an entire command, from command name to terminating @@ -163,11 +162,10 @@ do_parse_command (struct lexer *lexer, const struct command *command = NULL; enum cmd_result result; bool opened = false; + int n_tokens; /* Read the command's first token. */ - prompt_set_style (PROMPT_FIRST); set_completion_state (state); - lex_get (lexer); if (lex_token (lexer) == T_STOP) { result = CMD_EOF; @@ -180,10 +178,8 @@ do_parse_command (struct lexer *lexer, goto finish; } - prompt_set_style (PROMPT_LATER); - /* Parse the command name. */ - command = parse_command_name (lexer); + command = parse_command_name (lexer, &n_tokens); if (command == NULL) { result = CMD_FAILURE; @@ -216,22 +212,24 @@ do_parse_command (struct lexer *lexer, else { /* Execute command. 
*/ + int i; + + for (i = 0; i < n_tokens; i++) + lex_get (lexer); result = command->function (lexer, ds); } assert (cmd_result_is_valid (result)); - finish: +finish: if (cmd_result_is_failure (result)) - { - lex_discard_rest_of_command (lexer); - if (source_stream_current_error_mode ( - lex_get_source_stream (lexer)) == ERRMODE_STOP ) - { - msg (MW, _("Error encountered while ERROR=STOP is effective.")); - result = CMD_CASCADING_FAILURE; - } - } + lex_interactive_reset (lexer); + else if (result == CMD_SUCCESS) + result = lex_end_of_command (lexer); + + lex_discard_rest_of_command (lexer); + while (lex_token (lexer) == T_ENDCMD) + lex_get (lexer); if (opened) text_item_submit (text_item_create (TEXT_ITEM_COMMAND_CLOSE, @@ -259,51 +257,65 @@ find_best_match (struct substring s, const struct command **matchp) return missing_words; } -/* Parse the command name and return a pointer to the corresponding - struct command if successful. - If not successful, return a null pointer. */ +static bool +parse_command_word (struct lexer *lexer, struct string *s, int n) +{ + bool need_space = ds_last (s) != EOF && ds_last (s) != '-'; + + switch (lex_next_token (lexer, n)) + { + case T_DASH: + ds_put_byte (s, '-'); + return true; + + case T_ID: + if (need_space) + ds_put_byte (s, ' '); + ds_put_cstr (s, lex_next_tokcstr (lexer, n)); + return true; + + case T_POS_NUM: + if (lex_next_is_integer (lexer, n)) + { + int integer = lex_next_integer (lexer, n); + if (integer >= 0) + { + if (need_space) + ds_put_byte (s, ' '); + ds_put_format (s, "%ld", lex_next_integer (lexer, n)); + return true; + } + } + return false; + + default: + return false; + } +} + +/* Parses the command name. On success returns a pointer to the corresponding + struct command and stores the number of tokens in the command name into + *N_TOKENS. On failure, returns a null pointer and stores the number of + tokens required to determine that no command name was present into + *N_TOKENS. 
*/ static const struct command * -parse_command_name (struct lexer *lexer) +parse_command_name (struct lexer *lexer, int *n_tokens) { const struct command *command; int missing_words; struct string s; - - if (lex_token (lexer) == T_EXP - || lex_token (lexer) == T_ASTERISK - || lex_token (lexer) == T_LBRACK) - { - static const struct command c = { S_ANY, 0, "COMMENT", cmd_comment }; - return &c; - } + int word; command = NULL; missing_words = 0; ds_init_empty (&s); - for (;;) + word = 0; + while (parse_command_word (lexer, &s, word)) { - if (lex_token (lexer) == T_DASH) - ds_put_byte (&s, '-'); - else if (lex_token (lexer) == T_ID) - { - if (!ds_is_empty (&s) && ds_last (&s) != '-') - ds_put_byte (&s, ' '); - ds_put_cstr (&s, lex_tokcstr (lexer)); - } - else if (lex_is_integer (lexer) && lex_integer (lexer) >= 0) - { - if (!ds_is_empty (&s) && ds_last (&s) != '-') - ds_put_byte (&s, ' '); - ds_put_format (&s, "%ld", lex_integer (lexer)); - } - else - break; - missing_words = find_best_match (ds_ss (&s), &command); if (missing_words <= 0) break; - - lex_get (lexer); + word++; } if (command == NULL && missing_words > 0) @@ -320,18 +332,10 @@ parse_command_name (struct lexer *lexer) else msg (SE, _("Unknown command `%s'."), ds_cstr (&s)); } - else if (missing_words == 0) - { - if (!(command->flags & F_KEEP_FINAL_TOKEN)) - lex_get (lexer); - } - else if (missing_words < 0) - { - assert (missing_words == -1); - assert (!(command->flags & F_KEEP_FINAL_TOKEN)); - } ds_destroy (&s); + + *n_tokens = (word + 1) + missing_words; return command; } @@ -423,7 +427,8 @@ report_state_mismatch (const struct command *command, enum cmd_state state) } } else if (state == CMD_STATE_INPUT_PROGRAM) - msg (SE, _("%s is not allowed inside %s."), command->name, "INPUT PROGRAM" ); + msg (SE, _("%s is not allowed inside %s."), + command->name, "INPUT PROGRAM" ); else if (state == CMD_STATE_FILE_TYPE) msg (SE, _("%s is not allowed inside %s."), command->name, "FILE TYPE"); @@ -485,23 +490,26 @@ 
cmd_n_of_cases (struct lexer *lexer, struct dataset *ds) if (!lex_match_id (lexer, "ESTIMATED")) dict_set_case_limit (dataset_dict (ds), x); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Parses, performs the EXECUTE procedure. */ int -cmd_execute (struct lexer *lexer, struct dataset *ds) +cmd_execute (struct lexer *lexer UNUSED, struct dataset *ds) { bool ok = casereader_destroy (proc_open (ds)); if (!proc_commit (ds) || !ok) return CMD_CASCADING_FAILURE; - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Parses, performs the ERASE command. */ int cmd_erase (struct lexer *lexer, struct dataset *ds UNUSED) { + char *filename; + int retval; + if (settings_get_safer_mode ()) { msg (SE, _("This command not allowed when the SAFER option is set.")); @@ -514,29 +522,25 @@ cmd_erase (struct lexer *lexer, struct dataset *ds UNUSED) if (!lex_force_string (lexer)) return CMD_FAILURE; - if (remove (lex_tokcstr (lexer)) == -1) + filename = utf8_to_filename (lex_tokcstr (lexer)); + retval = remove (filename); + free (filename); + + if (retval == -1) { msg (SW, _("Error removing `%s': %s."), lex_tokcstr (lexer), strerror (errno)); return CMD_FAILURE; } + lex_get (lexer); return CMD_SUCCESS; } /* Parses, performs the NEW FILE command. */ int -cmd_new_file (struct lexer *lexer, struct dataset *ds) +cmd_new_file (struct lexer *lexer UNUSED, struct dataset *ds) { proc_discard_active_file (ds); - - return lex_end_of_command (lexer); -} - -/* Parses a comment. */ -int -cmd_comment (struct lexer *lexer, struct dataset *ds UNUSED) -{ - lex_skip_comment (lexer); return CMD_SUCCESS; } diff --git a/src/language/command.def b/src/language/command.def index ece18a8c..3610b3d3 100644 --- a/src/language/command.def +++ b/src/language/command.def @@ -16,7 +16,6 @@ /* Utility commands acceptable anywhere. 
*/ DEF_CMD (S_ANY, F_ENHANCED, "CLOSE FILE HANDLE", cmd_close_file_handle) -DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "COMMENT", cmd_comment) DEF_CMD (S_ANY, 0, "CACHE", cmd_cache) DEF_CMD (S_ANY, 0, "CD", cmd_cd) DEF_CMD (S_ANY, 0, "DO REPEAT", cmd_do_repeat) @@ -25,7 +24,7 @@ DEF_CMD (S_ANY, 0, "ECHO", cmd_echo) DEF_CMD (S_ANY, 0, "ERASE", cmd_erase) DEF_CMD (S_ANY, 0, "EXIT", cmd_finish) DEF_CMD (S_ANY, 0, "FILE HANDLE", cmd_file_handle) -DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "FILE LABEL", cmd_file_label) +DEF_CMD (S_ANY, 0, "FILE LABEL", cmd_file_label) DEF_CMD (S_ANY, 0, "FINISH", cmd_finish) DEF_CMD (S_ANY, 0, "HOST", cmd_host) DEF_CMD (S_ANY, 0, "INCLUDE", cmd_include) @@ -40,9 +39,9 @@ DEF_CMD (S_ANY, 0, "QUIT", cmd_finish) DEF_CMD (S_ANY, 0, "RESTORE", cmd_restore) DEF_CMD (S_ANY, 0, "SET", cmd_set) DEF_CMD (S_ANY, 0, "SHOW", cmd_show) -DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "SUBTITLE", cmd_subtitle) +DEF_CMD (S_ANY, 0, "SUBTITLE", cmd_subtitle) DEF_CMD (S_ANY, 0, "SYSFILE INFO", cmd_sysfile_info) -DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "TITLE", cmd_title) +DEF_CMD (S_ANY, 0, "TITLE", cmd_title) /* Commands that define (or replace) the active file. */ DEF_CMD (S_INITIAL | S_DATA, 0, "ADD FILES", cmd_add_files) @@ -63,7 +62,7 @@ DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "BREAK", cmd_break) DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "COMPUTE", cmd_compute) DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DATAFILE ATTRIBUTE", cmd_datafile_attribute) DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DISPLAY", cmd_display) -DEF_CMD (S_DATA | S_INPUT_PROGRAM, F_KEEP_FINAL_TOKEN, "DOCUMENT", cmd_document) +DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DOCUMENT", cmd_document) DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DO IF", cmd_do_if) DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DROP DOCUMENTS", cmd_drop_documents) DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "ELSE IF", cmd_else_if) @@ -101,7 +100,7 @@ DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "XSAVE", cmd_xsave) /* Commands that may appear after active file definition. 
*/ DEF_CMD (S_DATA, 0, "AGGREGATE", cmd_aggregate) DEF_CMD (S_DATA, 0, "AUTORECODE", cmd_autorecode) -DEF_CMD (S_DATA, F_KEEP_FINAL_TOKEN, "BEGIN DATA", cmd_begin_data) +DEF_CMD (S_DATA, 0, "BEGIN DATA", cmd_begin_data) DEF_CMD (S_DATA, 0, "COUNT", cmd_count) DEF_CMD (S_DATA, 0, "CROSSTABS", cmd_crosstabs) DEF_CMD (S_DATA, 0, "CORRELATIONS", cmd_correlation) diff --git a/src/language/control/automake.mk b/src/language/control/automake.mk index e12813ab..3e87e5a0 100644 --- a/src/language/control/automake.mk +++ b/src/language/control/automake.mk @@ -6,8 +6,7 @@ language_control_sources = \ src/language/control/control-stack.h \ src/language/control/do-if.c \ src/language/control/loop.c \ - src/language/control/temporary.c \ src/language/control/repeat.c \ - src/language/control/repeat.h + src/language/control/temporary.c EXTRA_DIST += src/language/control/OChangeLog diff --git a/src/language/control/do-if.c b/src/language/control/do-if.c index a80dae31..612b0856 100644 --- a/src/language/control/do-if.c +++ b/src/language/control/do-if.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2009, 2011 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2009-2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -121,19 +121,19 @@ cmd_else_if (struct lexer *lexer, struct dataset *ds) /* Parse ELSE. */ int -cmd_else (struct lexer *lexer, struct dataset *ds) +cmd_else (struct lexer *lexer UNUSED, struct dataset *ds) { struct do_if_trns *do_if = ctl_stack_top (&do_if_class); assert (ds == do_if->ds); if (do_if == NULL || !must_not_have_else (do_if)) return CMD_CASCADING_FAILURE; add_else (do_if); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Parse END IF. 
*/ int -cmd_end_if (struct lexer *lexer, struct dataset *ds) +cmd_end_if (struct lexer *lexer UNUSED, struct dataset *ds) { struct do_if_trns *do_if = ctl_stack_top (&do_if_class); assert (ds == do_if->ds); @@ -143,7 +143,7 @@ cmd_end_if (struct lexer *lexer, struct dataset *ds) ctl_stack_pop (do_if); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Closes out DO_IF, by adding a sentinel ELSE clause if @@ -204,7 +204,7 @@ parse_clause (struct lexer *lexer, struct do_if_trns *do_if, struct dataset *ds) add_clause (do_if, condition); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Adds a clause to DO_IF that tests for the given CONDITION and, diff --git a/src/language/control/loop.c b/src/language/control/loop.c index 362f1993..8f4ff825 100644 --- a/src/language/control/loop.c +++ b/src/language/control/loop.c @@ -154,7 +154,7 @@ cmd_end_loop (struct lexer *lexer, struct dataset *ds) /* Parses BREAK. */ int -cmd_break (struct lexer *lexer, struct dataset *ds) +cmd_break (struct lexer *lexer UNUSED, struct dataset *ds) { struct ctl_stmt *loop = ctl_stack_search (&loop_class); if (loop == NULL) @@ -162,7 +162,7 @@ cmd_break (struct lexer *lexer, struct dataset *ds) add_transformation (ds, break_trns_proc, NULL, loop); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Closes a LOOP construct by emitting the END LOOP diff --git a/src/language/control/repeat.c b/src/language/control/repeat.c index ecff0577..c0fa8fe0 100644 --- a/src/language/control/repeat.c +++ b/src/language/control/repeat.c @@ -16,483 +16,412 @@ #include -#include "language/control/repeat.h" - -#include -#include #include #include "data/dictionary.h" #include "data/procedure.h" -#include "data/settings.h" -#include "libpspp/getl.h" #include "language/command.h" #include "language/lexer/lexer.h" +#include "language/lexer/segment.h" +#include "language/lexer/token.h" #include "language/lexer/variable-parser.h" #include "libpspp/cast.h" -#include "libpspp/ll.h" 
+#include "libpspp/hash-functions.h" +#include "libpspp/hmap.h" #include "libpspp/message.h" -#include "libpspp/misc.h" -#include "libpspp/pool.h" #include "libpspp/str.h" -#include "data/variable.h" -#include "gl/intprops.h" +#include "gl/ftoastr.h" +#include "gl/minmax.h" #include "gl/xalloc.h" #include "gettext.h" #define _(msgid) gettext (msgid) -/* A line repeated by DO REPEAT. */ -struct repeat_line - { - struct ll ll; /* In struct repeat_block line_list. */ - const char *file_name; /* File name. */ - int line_number; /* Line number. */ - struct substring text; /* Contents. */ - }; - -/* The type of substitution made for a DO REPEAT macro. */ -enum repeat_macro_type - { - VAR_NAMES, - OTHER - }; - -/* Describes one DO REPEAT macro. */ -struct repeat_macro +struct dummy_var { - struct ll ll; /* In struct repeat_block macros. */ - enum repeat_macro_type type; /* Types of replacements. */ - struct substring name; /* Macro name. */ - struct substring *replacements; /* Macro replacement. */ + struct hmap_node hmap_node; + char *name; + char **values; + size_t n_values; }; -/* A DO REPEAT...END REPEAT block. */ -struct repeat_block - { - struct getl_interface parent; - - struct pool *pool; /* Pool used for storage. */ - struct dataset *ds; /* The dataset for this block */ - - struct ll_list lines; /* Lines in buffer. */ - struct ll *cur_line; /* Last line output. */ - int loop_cnt; /* Number of loops. */ - int loop_idx; /* Number of loops so far. */ +static bool parse_specification (struct lexer *, struct dictionary *, + struct hmap *dummies); +static bool parse_commands (struct lexer *, struct hmap *dummies); +static void destroy_dummies (struct hmap *dummies); - struct ll_list macros; /* Table of macros. 
*/ +static bool parse_ids (struct lexer *, const struct dictionary *, + struct dummy_var *); +static bool parse_numbers (struct lexer *, struct dummy_var *); +static bool parse_strings (struct lexer *, struct dummy_var *); - bool print; /* Print lines as executed? */ - }; - -static bool parse_specification (struct lexer *, struct repeat_block *); -static bool parse_lines (struct lexer *, struct repeat_block *); -static void create_vars (struct repeat_block *); +int +cmd_do_repeat (struct lexer *lexer, struct dataset *ds) +{ + struct hmap dummies; + bool ok; -static struct repeat_macro *find_macro (struct repeat_block *, - struct substring name); + if (!parse_specification (lexer, dataset_dict (ds), &dummies)) + return CMD_CASCADING_FAILURE; -static int parse_ids (struct lexer *, const struct dictionary *dict, - struct repeat_macro *, struct pool *); + ok = parse_commands (lexer, &dummies); -static int parse_numbers (struct lexer *, struct repeat_macro *, - struct pool *); + destroy_dummies (&dummies); -static int parse_strings (struct lexer *, struct repeat_macro *, - struct pool *); + return ok ? 
CMD_SUCCESS : CMD_CASCADING_FAILURE; +} -static void do_repeat_filter (struct getl_interface *, - struct string *); -static bool do_repeat_read (struct getl_interface *, - struct string *); -static void do_repeat_close (struct getl_interface *); -static bool always_false (const struct getl_interface *); -static const char *do_repeat_name (const struct getl_interface *); -static int do_repeat_location (const struct getl_interface *); +static unsigned int +hash_dummy (const char *name, size_t name_len) +{ + return hash_case_bytes (name, name_len, 0); +} -int -cmd_do_repeat (struct lexer *lexer, struct dataset *ds) +static const struct dummy_var * +find_dummy_var (struct hmap *hmap, const char *name, size_t name_len) { - struct repeat_block *block; - - block = pool_create_container (struct repeat_block, pool); - block->ds = ds; - ll_init (&block->lines); - block->cur_line = ll_null (&block->lines); - block->loop_idx = 0; - ll_init (&block->macros); - - if (!parse_specification (lexer, block) || !parse_lines (lexer, block)) - goto error; - - create_vars (block); - - block->parent.read = do_repeat_read; - block->parent.close = do_repeat_close; - block->parent.filter = do_repeat_filter; - block->parent.interactive = always_false; - block->parent.name = do_repeat_name; - block->parent.location = do_repeat_location; - - if (!ll_is_empty (&block->lines)) - getl_include_source (lex_get_source_stream (lexer), - &block->parent, - lex_current_syntax_mode (lexer), - lex_current_error_mode (lexer) - ); - else - pool_destroy (block->pool); + const struct dummy_var *dv; - return CMD_SUCCESS; + HMAP_FOR_EACH_WITH_HASH (dv, struct dummy_var, hmap_node, + hash_dummy (name, name_len), hmap) + if (strcasecmp (dv->name, name)) + return dv; - error: - pool_destroy (block->pool); - return CMD_CASCADING_FAILURE; + return NULL; } /* Parses the whole DO REPEAT command specification. Returns success. 
*/ static bool -parse_specification (struct lexer *lexer, struct repeat_block *block) +parse_specification (struct lexer *lexer, struct dictionary *dict, + struct hmap *dummies) { - struct substring first_name; + struct dummy_var *first_dv = NULL; - block->loop_cnt = 0; + hmap_init (dummies); do { - struct repeat_macro *macro; - struct dictionary *dict = dataset_dict (block->ds); - int count; + struct dummy_var *dv; + const char *name; + bool ok; /* Get a stand-in variable name and make sure it's unique. */ if (!lex_force_id (lexer)) - return false; - if (dict_lookup_var (dict, lex_tokcstr (lexer))) + goto error; + name = lex_tokcstr (lexer); + if (dict_lookup_var (dict, name)) msg (SW, _("Dummy variable name `%s' hides dictionary variable `%s'."), - lex_tokcstr (lexer), lex_tokcstr (lexer)); - if (find_macro (block, lex_tokss (lexer))) - { - msg (SE, _("Dummy variable name `%s' is given twice."), - lex_tokcstr (lexer)); - return false; - } + name, name); + if (find_dummy_var (dummies, name, strlen (name))) + { + msg (SE, _("Dummy variable name `%s' is given twice."), name); + goto error; + } /* Make a new macro. */ - macro = pool_alloc (block->pool, sizeof *macro); - ss_alloc_substring_pool (¯o->name, lex_tokss (lexer), block->pool); - ll_push_tail (&block->macros, ¯o->ll); + dv = xmalloc (sizeof *dv); + dv->name = xstrdup (name); + dv->values = NULL; + dv->n_values = 0; + hmap_insert (dummies, &dv->hmap_node, hash_dummy (name, strlen (name))); /* Skip equals sign. */ lex_get (lexer); if (!lex_force_match (lexer, T_EQUALS)) - return false; + goto error; /* Get the details of the variable's possible values. 
*/ - if (lex_token (lexer) == T_ID) - count = parse_ids (lexer, dict, macro, block->pool); + if (lex_token (lexer) == T_ID || lex_token (lexer) == T_ALL) + ok = parse_ids (lexer, dict, dv); else if (lex_is_number (lexer)) - count = parse_numbers (lexer, macro, block->pool); + ok = parse_numbers (lexer, dv); else if (lex_is_string (lexer)) - count = parse_strings (lexer, macro, block->pool); + ok = parse_strings (lexer, dv); else { lex_error (lexer, NULL); - return false; + goto error; } - if (count == 0) - return false; + if (!ok) + goto error; + assert (dv->n_values > 0); if (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD) { lex_error (lexer, NULL); - return false; + goto error; } - /* If this is the first variable then it defines how many - replacements there must be; otherwise enforce this number of - replacements. */ - if (block->loop_cnt == 0) + /* If this is the first variable then it defines how many replacements + there must be; otherwise enforce this number of replacements. 
*/ + if (first_dv == NULL) + first_dv = dv; + else if (first_dv->n_values != dv->n_values) { - block->loop_cnt = count; - first_name = macro->name; - } - else if (block->loop_cnt != count) - { - msg (SE, _("Dummy variable `%.*s' had %d " - "substitutions, so `%.*s' must also, but %d " - "were specified."), - (int) ss_length (first_name), ss_data (first_name), - block->loop_cnt, - (int) ss_length (macro->name), ss_data (macro->name), - count); - return false; + msg (SE, _("Dummy variable `%s' had %d substitutions, so `%s' must " + "also, but %d were specified."), + first_dv->name, first_dv->n_values, + dv->name, dv->n_values); + goto error; } lex_match (lexer, T_SLASH); } - while (lex_token (lexer) != T_ENDCMD); + while (!lex_match (lexer, T_ENDCMD)); - return true; -} + while (lex_match (lexer, T_ENDCMD)) + continue; -/* Finds and returns a DO REPEAT macro with the given NAME, or - NULL if there is none */ -static struct repeat_macro * -find_macro (struct repeat_block *block, struct substring name) -{ - struct repeat_macro *macro; - - ll_for_each (macro, struct repeat_macro, ll, &block->macros) - if (ss_equals (macro->name, name)) - return macro; + return true; - return NULL; +error: + destroy_dummies (dummies); + return false; } -/* Advances LINE past white space and an identifier, if present. - Returns true if KEYWORD matches the identifer, false - otherwise. */ -static bool -recognize_keyword (struct substring *line, const char *keyword) +static size_t +count_values (struct hmap *dummies) { - struct substring id; - ss_ltrim (line, ss_cstr (CC_SPACES)); - ss_get_bytes (line, lex_id_get_length (*line), &id); - return lex_id_match (ss_cstr (keyword), id); + const struct dummy_var *dv; + dv = HMAP_FIRST (struct dummy_var, hmap_node, dummies); + return dv->n_values; } -/* Returns true if LINE contains a DO REPEAT command, false - otherwise. 
*/ -static bool -recognize_do_repeat (struct substring line) +static void +do_parse_commands (struct substring s, enum lex_syntax_mode syntax_mode, + struct hmap *dummies, + struct string *outputs, size_t n_outputs) { - return (recognize_keyword (&line, "do") - && recognize_keyword (&line, "repeat")); -} + struct segmenter segmenter; -/* Returns true if LINE contains an END REPEAT command, false - otherwise. Sets *PRINT to true for END REPEAT PRINT, false - otherwise. */ -static bool -recognize_end_repeat (struct substring line, bool *print) -{ - if (!recognize_keyword (&line, "end") - || !recognize_keyword (&line, "repeat")) - return false; + segmenter_init (&segmenter, syntax_mode); - *print = recognize_keyword (&line, "print"); - return true; -} + while (!ss_is_empty (s)) + { + enum segment_type type; + int n; -/* Read all the lines we are going to substitute, inside the DO - REPEAT...END REPEAT block. */ -static bool -parse_lines (struct lexer *lexer, struct repeat_block *block) -{ - char *previous_file_name; - int nesting_level; + n = segmenter_push (&segmenter, s.string, s.length, &type); + assert (n >= 0); - previous_file_name = NULL; - nesting_level = 0; + if (type == SEG_DO_REPEAT_COMMAND) + { + for (;;) + { + int k; - for (;;) - { - const char *cur_file_name; - struct repeat_line *line; - struct string text; - bool command_ends_before_line, command_ends_after_line; + k = segmenter_push (&segmenter, s.string + n, s.length - n, + &type); + if (type != SEG_NEWLINE && type != SEG_DO_REPEAT_COMMAND) + break; - /* Retrieve an input line and make a copy of it. */ - if (!lex_get_line_raw (lexer)) - { - msg (SE, _("DO REPEAT without END REPEAT.")); - return false; - } - ds_init_string (&text, lex_entire_line_ds (lexer)); - - /* Record file name. 
*/ - cur_file_name = getl_source_name (lex_get_source_stream (lexer)); - if (cur_file_name != NULL && - (previous_file_name == NULL - || !strcmp (cur_file_name, previous_file_name))) - previous_file_name = pool_strdup (block->pool, cur_file_name); - - /* Create a line structure. */ - line = pool_alloc (block->pool, sizeof *line); - line->file_name = previous_file_name; - line->line_number = getl_source_location (lex_get_source_stream (lexer)); - ss_alloc_substring_pool (&line->text, ds_ss (&text), block->pool); - - - /* Check whether the line contains a DO REPEAT or END - REPEAT command. */ - lex_preprocess_line (&text, - lex_current_syntax_mode (lexer), - &command_ends_before_line, - &command_ends_after_line); - if (recognize_do_repeat (ds_ss (&text))) - { - if (settings_get_syntax () == COMPATIBLE) - msg (SE, _("DO REPEAT may not nest in compatibility mode.")); - else - nesting_level++; + n += k; + } + + do_parse_commands (ss_head (s, n), syntax_mode, dummies, + outputs, n_outputs); } - else if (recognize_end_repeat (ds_ss (&text), &block->print) - && nesting_level-- == 0) + else if (type != SEG_END) { - lex_discard_line (lexer); - ds_destroy (&text); - return true; + const struct dummy_var *dv; + size_t i; + + dv = (type == SEG_IDENTIFIER + ? find_dummy_var (dummies, s.string, n) + : NULL); + for (i = 0; i < n_outputs; i++) + if (dv != NULL) + ds_put_cstr (&outputs[i], dv->values[i]); + else + ds_put_substring (&outputs[i], ss_head (s, n)); } - ds_destroy (&text); - /* Add the line to the list. */ - ll_push_tail (&block->lines, &line->ll); + ss_advance (&s, n); } } -/* Creates variables for the given DO REPEAT. 
*/ +static bool +parse_commands (struct lexer *lexer, struct hmap *dummies) +{ + struct string *outputs; + struct string input; + size_t input_len; + size_t n_values; + char *file_name; + int line_number; + bool ok; + size_t i; + + if (lex_get_file_name (lexer) != NULL) + file_name = xstrdup (lex_get_file_name (lexer)); + else + file_name = NULL; + line_number = lex_get_first_line_number (lexer, 0); + + ds_init_empty (&input); + while (lex_is_string (lexer)) + { + ds_put_substring (&input, lex_tokss (lexer)); + ds_put_byte (&input, '\n'); + lex_get (lexer); + } + if (ds_is_empty (&input)) + ds_put_byte (&input, '\n'); + ds_put_byte (&input, '\0'); + input_len = ds_length (&input); + + n_values = count_values (dummies); + outputs = xmalloc (n_values * sizeof *outputs); + for (i = 0; i < n_values; i++) + ds_init_empty (&outputs[i]); + + do_parse_commands (ds_ss (&input), lex_get_syntax_mode (lexer), + dummies, outputs, n_values); + + ds_destroy (&input); + + while (lex_match (lexer, T_ENDCMD)) + continue; + + ok = (lex_force_match_id (lexer, "END") + && lex_force_match_id (lexer, "REPEAT")); + if (ok) + lex_match_id (lexer, "PRINT"); /* XXX */ + + lex_discard_rest_of_command (lexer); + + for (i = 0; i < n_values; i++) + { + struct string *output = &outputs[n_values - i - 1]; + struct lex_reader *reader; + + reader = lex_reader_for_substring_nocopy (ds_ss (output)); + lex_reader_set_file_name (reader, file_name); + reader->line_number = line_number; + lex_include (lexer, reader); + } + free (file_name); + + return ok; +} + static void -create_vars (struct repeat_block *block) +destroy_dummies (struct hmap *dummies) { - struct repeat_macro *macro; - - ll_for_each (macro, struct repeat_macro, ll, &block->macros) - if (macro->type == VAR_NAMES) - { - int i; - - for (i = 0; i < block->loop_cnt; i++) - { - /* Ignore return value: if the variable already - exists there is no harm done. 
*/ - char *var_name = ss_xstrdup (macro->replacements[i]); - dict_create_var (dataset_dict (block->ds), var_name, 0); - free (var_name); - } - } + struct dummy_var *dv, *next; + + HMAP_FOR_EACH_SAFE (dv, next, struct dummy_var, hmap_node, dummies) + { + size_t i; + + hmap_delete (dummies, &dv->hmap_node); + + free (dv->name); + for (i = 0; i < dv->n_values; i++) + free (dv->values[i]); + free (dv->values); + free (dv); + } + hmap_destroy (dummies); } /* Parses a set of ids for DO REPEAT. */ -static int +static bool parse_ids (struct lexer *lexer, const struct dictionary *dict, - struct repeat_macro *macro, struct pool *pool) + struct dummy_var *dv) { - char **replacements; - size_t n, i; - - macro->type = VAR_NAMES; - if (!parse_mixed_vars_pool (lexer, dict, pool, &replacements, &n, PV_NONE)) - return 0; - - macro->replacements = pool_nalloc (pool, n, sizeof *macro->replacements); - for (i = 0; i < n; i++) - macro->replacements[i] = ss_cstr (replacements[i]); - return n; + return parse_mixed_vars (lexer, dict, &dv->values, &dv->n_values, PV_NONE); } /* Adds REPLACEMENT to MACRO's list of replacements, which has *USED elements and has room for *ALLOCATED. Allocates memory from POOL. */ static void -add_replacement (struct substring replacement, - struct repeat_macro *macro, struct pool *pool, - size_t *used, size_t *allocated) +add_replacement (struct dummy_var *dv, char *value, size_t *allocated) { - if (*used == *allocated) - macro->replacements = pool_2nrealloc (pool, macro->replacements, allocated, - sizeof *macro->replacements); - macro->replacements[(*used)++] = replacement; + if (dv->n_values == *allocated) + dv->values = x2nrealloc (dv->values, allocated, sizeof *dv->values); + dv->values[dv->n_values++] = value; } /* Parses a list or range of numbers for DO REPEAT. 
*/ -static int -parse_numbers (struct lexer *lexer, struct repeat_macro *macro, - struct pool *pool) +static bool +parse_numbers (struct lexer *lexer, struct dummy_var *dv) { - size_t used = 0; size_t allocated = 0; - macro->type = OTHER; - macro->replacements = NULL; - do { - bool integer_value_seen; - double a, b, i; - - /* Parse A TO B into a, b. */ if (!lex_force_num (lexer)) - return 0; + return false; - if ( (integer_value_seen = lex_is_integer (lexer) ) ) - a = lex_integer (lexer); - else - a = lex_number (lexer); + if (lex_next_token (lexer, 1) == T_TO) + { + long int a, b; + long int i; - lex_get (lexer); - if (lex_token (lexer) == T_TO) - { - if ( !integer_value_seen ) + if (!lex_is_integer (lexer)) { - msg (SE, _("Ranges may only have integer bounds")); - return 0; + msg (SE, _("Ranges may only have integer bounds.")); + return false; } - lex_get (lexer); - if (!lex_force_int (lexer)) - return 0; + + a = lex_integer (lexer); + lex_get (lexer); + lex_get (lexer); + + if (!lex_force_int (lexer)) + return false; + b = lex_integer (lexer); if (b < a) { - msg (SE, _("%g TO %g is an invalid range."), a, b); - return 0; + msg (SE, _("%ld TO %ld is an invalid range."), a, b); + return false; } lex_get (lexer); - } + + for (i = a; i <= b; i++) + add_replacement (dv, xasprintf ("%ld", i), &allocated); + } else - b = a; + { + char s[DBL_BUFSIZE_BOUND]; - for (i = a; i <= b; i++) - add_replacement (ss_cstr (pool_asprintf (pool, "%g", i)), - macro, pool, &used, &allocated); + dtoastr (s, sizeof s, 0, 0, lex_number (lexer)); + add_replacement (dv, xstrdup (s), &allocated); + lex_get (lexer); + } lex_match (lexer, T_COMMA); } while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD); - return used; + return true; } /* Parses a list of strings for DO REPEAT. 
*/ -int -parse_strings (struct lexer *lexer, struct repeat_macro *macro, struct pool *pool) +static bool +parse_strings (struct lexer *lexer, struct dummy_var *dv) { - size_t used = 0; size_t allocated = 0; - macro->type = OTHER; - macro->replacements = NULL; - do { - char *string; - if (!lex_force_string (lexer)) { msg (SE, _("String expected.")); - return 0; + return false; } - string = lex_token_representation (lexer); - pool_register (pool, free, string); - add_replacement (ss_cstr (string), macro, pool, &used, &allocated); + add_replacement (dv, token_to_string (lex_next (lexer, 0)), &allocated); lex_get (lexer); lex_match (lexer, T_COMMA); } while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD); - return used; + return true; } int @@ -501,128 +430,3 @@ cmd_end_repeat (struct lexer *lexer UNUSED, struct dataset *ds UNUSED) msg (SE, _("No matching DO REPEAT.")); return CMD_CASCADING_FAILURE; } - -/* Finds a DO REPEAT macro with the given NAME and returns the - appropriate substitution if found, or NAME otherwise. */ -static struct substring -find_substitution (struct repeat_block *block, struct substring name) -{ - struct repeat_macro *macro = find_macro (block, name); - return macro ? macro->replacements[block->loop_idx] : name; -} - -/* Makes appropriate DO REPEAT macro substitutions within the - repeated lines. */ -static void -do_repeat_filter (struct getl_interface *interface, struct string *line) -{ - struct repeat_block *block - = UP_CAST (interface, struct repeat_block, parent); - bool in_apos, in_quote, dot; - struct substring input; - struct string output; - int c; - - ds_init_empty (&output); - - /* Strip trailing whitespace, check for & remove terminal dot. 
*/ - ds_rtrim (line, ss_cstr (CC_SPACES)); - dot = ds_chomp_byte (line, '.'); - input = ds_ss (line); - in_apos = in_quote = false; - while ((c = ss_first (input)) != EOF) - { - if (c == '\'' && !in_quote) - in_apos = !in_apos; - else if (c == '"' && !in_apos) - in_quote = !in_quote; - - if (in_quote || in_apos || !lex_is_id1 (c)) - { - ds_put_byte (&output, c); - ss_advance (&input, 1); - } - else - { - struct substring id; - ss_get_bytes (&input, lex_id_get_length (input), &id); - ds_put_substring (&output, find_substitution (block, id)); - } - } - if (dot) - ds_put_byte (&output, '.'); - - ds_swap (line, &output); - ds_destroy (&output); -} - -static struct repeat_line * -current_line (const struct getl_interface *interface) -{ - struct repeat_block *block - = UP_CAST (interface, struct repeat_block, parent); - return (block->cur_line != ll_null (&block->lines) - ? ll_data (block->cur_line, struct repeat_line, ll) - : NULL); -} - -/* Function called by getl to read a line. Puts the line in - OUTPUT and its syntax mode in *SYNTAX. Returns true if a line - was obtained, false if the source is exhausted. */ -static bool -do_repeat_read (struct getl_interface *interface, - struct string *output) -{ - struct repeat_block *block - = UP_CAST (interface, struct repeat_block, parent); - struct repeat_line *line; - - block->cur_line = ll_next (block->cur_line); - if (block->cur_line == ll_null (&block->lines)) - { - block->loop_idx++; - if (block->loop_idx >= block->loop_cnt) - return false; - - block->cur_line = ll_head (&block->lines); - } - - line = current_line (interface); - ds_assign_substring (output, line->text); - return true; -} - -/* Frees a DO REPEAT block. - Called by getl to close out the DO REPEAT block. 
*/ -static void -do_repeat_close (struct getl_interface *interface) -{ - struct repeat_block *block - = UP_CAST (interface, struct repeat_block, parent); - pool_destroy (block->pool); -} - - -static bool -always_false (const struct getl_interface *i UNUSED) -{ - return false; -} - -/* Returns the name of the source file from which the previous - line was originally obtained, or a null pointer if none. */ -static const char * -do_repeat_name (const struct getl_interface *interface) -{ - struct repeat_line *line = current_line (interface); - return line ? line->file_name : NULL; -} - -/* Returns the line number in the source file from which the - previous line was originally obtained, or 0 if none. */ -static int -do_repeat_location (const struct getl_interface *interface) -{ - struct repeat_line *line = current_line (interface); - return line ? line->line_number : 0; -} diff --git a/src/language/control/repeat.h b/src/language/control/repeat.h deleted file mode 100644 index 700bf64a..00000000 --- a/src/language/control/repeat.h +++ /dev/null @@ -1,22 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . 
*/ - -#if !INCLUDED_REPEAT_H -#define INCLUDED_REPEAT_H 1 - -void perform_DO_REPEAT_substitutions (void); - -#endif /* repeat.h */ diff --git a/src/language/control/temporary.c b/src/language/control/temporary.c index eda939bb..b6a5cc98 100644 --- a/src/language/control/temporary.c +++ b/src/language/control/temporary.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2011 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -37,12 +37,12 @@ /* Parses the TEMPORARY command. */ int -cmd_temporary (struct lexer *lexer, struct dataset *ds) +cmd_temporary (struct lexer *lexer UNUSED, struct dataset *ds) { if (!proc_in_temporary_transformations (ds)) proc_start_temporary_transformations (ds); else msg (SE, _("This command may only appear once between " "procedures and procedure-like commands.")); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/data-io/combine-files.c b/src/language/data-io/combine-files.c index 82c36945..58693c62 100644 --- a/src/language/data-io/combine-files.c +++ b/src/language/data-io/combine-files.c @@ -36,6 +36,7 @@ #include "language/stats/sort-criteria.h" #include "libpspp/assertion.h" #include "libpspp/message.h" +#include "libpspp/string-array.h" #include "libpspp/taint.h" #include "math/sort.h" @@ -491,7 +492,7 @@ static bool merge_dictionary (struct dictionary *const m, struct comb_file *f) { struct dictionary *d = f->dict; - const char *d_docs, *m_docs; + const struct string_array *d_docs, *m_docs; int i; const char *file_encoding; @@ -525,9 +526,19 @@ merge_dictionary (struct dictionary *const m, struct comb_file *f) dict_set_documents (m, d_docs); else { - char *new_docs = xasprintf ("%s%s", m_docs, d_docs); - dict_set_documents (m, new_docs); - free (new_docs); + struct 
string_array new_docs; + size_t i; + + new_docs.n = m_docs->n + d_docs->n; + new_docs.strings = xmalloc (new_docs.n * sizeof *new_docs.strings); + for (i = 0; i < m_docs->n; i++) + new_docs.strings[i] = m_docs->strings[i]; + for (i = 0; i < d_docs->n; i++) + new_docs.strings[m_docs->n + i] = d_docs->strings[i]; + + dict_set_documents (m, &new_docs); + + free (new_docs.strings); } } @@ -577,7 +588,7 @@ merge_dictionary (struct dictionary *const m, struct comb_file *f) if (var_has_missing_values (dv) && !var_has_missing_values (mv)) var_set_missing_values (mv, var_get_missing_values (dv)); if (var_get_label (dv) && !var_get_label (mv)) - var_set_label (mv, var_get_label (dv)); + var_set_label (mv, var_get_label (dv), file_encoding, false); } else mv = dict_clone_var_assert (m, dv); diff --git a/src/language/data-io/data-list.c b/src/language/data-io/data-list.c index 043b424d..6d21c843 100644 --- a/src/language/data-io/data-list.c +++ b/src/language/data-io/data-list.c @@ -204,6 +204,7 @@ cmd_data_list (struct lexer *lexer, struct dataset *ds) } else { + /* XXX should support multibyte UTF-8 characters */ lex_error (lexer, NULL); ds_destroy (&delims); goto error; @@ -330,7 +331,7 @@ parse_fixed (struct lexer *lexer, struct dictionary *dict, /* Parse everything. 
*/ if (!parse_record_placement (lexer, &record, &column) - || !parse_DATA_LIST_vars_pool (lexer, tmp_pool, + || !parse_DATA_LIST_vars_pool (lexer, dict, tmp_pool, &names, &name_cnt, PV_NONE) || !parse_var_placements (lexer, tmp_pool, name_cnt, true, &formats, &format_cnt)) @@ -422,7 +423,7 @@ parse_free (struct lexer *lexer, struct dictionary *dict, size_t name_cnt; size_t i; - if (!parse_DATA_LIST_vars_pool (lexer, tmp_pool, + if (!parse_DATA_LIST_vars_pool (lexer, dict, tmp_pool, &name, &name_cnt, PV_NONE)) return false; diff --git a/src/language/data-io/data-parser.c b/src/language/data-io/data-parser.c index 630363af..0c52e07b 100644 --- a/src/language/data-io/data-parser.c +++ b/src/language/data-io/data-parser.c @@ -508,10 +508,11 @@ parse_error (const struct dfm_reader *reader, const struct field *field, m.category = MSG_C_DATA; m.severity = MSG_S_WARNING; - m.where.file_name = CONST_CAST (char *, dfm_get_file_name (reader)); - m.where.line_number = dfm_get_line_number (reader); - m.where.first_column = first_column; - m.where.last_column = last_column; + m.file_name = CONST_CAST (char *, dfm_get_file_name (reader)); + m.first_line = dfm_get_line_number (reader); + m.last_line = m.first_line + 1; + m.first_column = first_column; + m.last_column = last_column; m.text = xasprintf (_("Data for variable %s is not valid as format %s: %s"), field->name, fmt_name (field->format.type), error); msg_emit (&m); diff --git a/src/language/data-io/data-reader.c b/src/language/data-io/data-reader.c index 87fa8c92..e701a936 100644 --- a/src/language/data-io/data-reader.c +++ b/src/language/data-io/data-reader.c @@ -32,7 +32,6 @@ #include "language/command.h" #include "language/data-io/file-handle.h" #include "language/lexer/lexer.h" -#include "language/prompt.h" #include "libpspp/assertion.h" #include "libpspp/cast.h" #include "libpspp/integer-format.h" @@ -53,6 +52,7 @@ enum dfm_reader_flags DFM_SAW_BEGIN_DATA = 004, /* For inline_file only, whether we've already read a 
BEGIN DATA line. */ DFM_TABS_EXPANDED = 010, /* Tabs have been expanded. */ + DFM_CONSUME = 020 /* read_inline_record() should get a token? */ }; /* Data file reader. */ @@ -60,7 +60,7 @@ struct dfm_reader { struct file_handle *fh; /* File handle. */ struct fh_lock *lock; /* Mutual exclusion lock for file. */ - struct msg_locator where; /* Current location in data file. */ + int line_number; /* Current line or record number. */ struct string line; /* Current line. */ struct string scratch; /* Extra line buffer. */ enum dfm_reader_flags flags; /* Zero or more of DFM_*. */ @@ -141,8 +141,7 @@ dfm_open_reader (struct file_handle *fh, struct lexer *lexer) if (fh_get_referent (fh) != FH_REF_INLINE) { struct stat s; - r->where.file_name = CONST_CAST (char *, fh_get_file_name (fh)); - r->where.line_number = 0; + r->line_number = 0; r->file = fn_open (fh_get_file_name (fh), "rb"); if (r->file == NULL) { @@ -177,33 +176,37 @@ read_inline_record (struct dfm_reader *r) if ((r->flags & DFM_SAW_BEGIN_DATA) == 0) { r->flags |= DFM_SAW_BEGIN_DATA; + r->flags &= ~DFM_CONSUME; while (lex_token (r->lexer) == T_ENDCMD) lex_get (r->lexer); - if (!lex_force_match_id (r->lexer, "BEGIN") || !lex_force_match_id (r->lexer, "DATA")) + + if (!lex_force_match_id (r->lexer, "BEGIN") + || !lex_force_match_id (r->lexer, "DATA")) return false; - prompt_set_style (PROMPT_DATA); - } - if (!lex_get_line_raw (r->lexer)) - { - lex_discard_line (r->lexer); - msg (SE, _("Unexpected end-of-file while reading data in BEGIN " - "DATA. This probably indicates " - "a missing or incorrectly formatted END DATA command. 
" - "END DATA must appear by itself on a single line " - "with exactly one space between words.")); - return false; + lex_match (r->lexer, T_ENDCMD); } - if (ds_length (lex_entire_line_ds (r->lexer) ) >= 8 - && !strncasecmp (lex_entire_line (r->lexer), "end data", 8)) + if (r->flags & DFM_CONSUME) + lex_get (r->lexer); + + if (!lex_is_string (r->lexer)) { - lex_discard_line (r->lexer); + if (!lex_match_id (r->lexer, "END") || !lex_match_id (r->lexer, "DATA")) + { + msg (SE, _("Missing END DATA while reading inline data. " + "This probably indicates a missing or incorrectly " + "formatted END DATA command. END DATA must appear " + "by itself on a single line with exactly one space " + "between words.")); + lex_discard_rest_of_command (r->lexer); + } return false; } - ds_assign_string (&r->line, lex_entire_line_ds (r->lexer) ); + ds_assign_substring (&r->line, lex_tokss (r->lexer)); + r->flags |= DFM_CONSUME; return true; } @@ -480,7 +483,7 @@ read_record (struct dfm_reader *r) { bool ok = read_file_record (r); if (ok) - r->where.line_number++; + r->line_number++; return ok; } else @@ -678,13 +681,15 @@ dfm_get_column (const struct dfm_reader *r, const char *p) const char * dfm_get_file_name (const struct dfm_reader *r) { - return fh_get_referent (r->fh) == FH_REF_FILE ? r->where.file_name : NULL; + return (fh_get_referent (r->fh) == FH_REF_FILE + ? fh_get_file_name (r->fh) + : NULL); } int dfm_get_line_number (const struct dfm_reader *r) { - return fh_get_referent (r->fh) == FH_REF_FILE ? r->where.line_number : -1; + return fh_get_referent (r->fh) == FH_REF_FILE ? r->line_number : -1; } /* BEGIN DATA...END DATA procedure. */ @@ -702,13 +707,14 @@ cmd_begin_data (struct lexer *lexer, struct dataset *ds) "input program does not access the inline file.")); return CMD_CASCADING_FAILURE; } + lex_match (lexer, T_ENDCMD); /* Open inline file. 
*/ r = dfm_open_reader (fh_inline_file (), lexer); r->flags |= DFM_SAW_BEGIN_DATA; + r->flags &= ~DFM_CONSUME; /* Input procedure reads from inline file. */ - prompt_set_style (PROMPT_DATA); casereader_destroy (proc_open (ds)); ok = proc_commit (ds); dfm_close_reader (r); diff --git a/src/language/data-io/file-handle.q b/src/language/data-io/file-handle.q index e847e00f..7e7cdcdf 100644 --- a/src/language/data-io/file-handle.q +++ b/src/language/data-io/file-handle.q @@ -54,30 +54,32 @@ cmd_file_handle (struct lexer *lexer, struct dataset *ds) { struct cmd_file_handle cmd; struct file_handle *handle; + enum cmd_result result; char *handle_name; + result = CMD_CASCADING_FAILURE; if (!lex_force_id (lexer)) - goto error; - handle_name = xstrdup (lex_tokcstr (lexer)); + goto exit; + handle_name = xstrdup (lex_tokcstr (lexer)); handle = fh_from_id (handle_name); if (handle != NULL) { msg (SE, _("File handle %s is already defined. " "Use CLOSE FILE HANDLE before redefining a file handle."), handle_name); - goto error; + goto exit_free_handle_name; } lex_get (lexer); if (!lex_force_match (lexer, T_SLASH)) - goto error_free_handle_name; + goto exit_free_handle_name; if (!parse_file_handle (lexer, ds, &cmd, NULL)) - goto error_free_handle_name; + goto exit_free_handle_name; if (lex_end_of_command (lexer) != CMD_SUCCESS) - goto error_free_cmd; + goto exit_free_cmd; if (cmd.mode != FH_SCRATCH) { @@ -86,7 +88,7 @@ cmd_file_handle (struct lexer *lexer, struct dataset *ds) if (cmd.s_name == NULL) { lex_sbc_missing (lexer, "NAME"); - goto error_free_cmd; + goto exit_free_cmd; } switch (cmd.mode) @@ -119,7 +121,7 @@ cmd_file_handle (struct lexer *lexer, struct dataset *ds) else { msg (SE, _("RECFORM must be specified with MODE=360.")); - goto error_free_cmd; + goto exit_free_cmd; } break; default: @@ -145,15 +147,14 @@ cmd_file_handle (struct lexer *lexer, struct dataset *ds) else fh_create_scratch (handle_name); - free_file_handle (&cmd); - return CMD_SUCCESS; + result = 
CMD_SUCCESS; -error_free_cmd: +exit_free_cmd: free_file_handle (&cmd); -error_free_handle_name: +exit_free_handle_name: free (handle_name); -error: - return CMD_CASCADING_FAILURE; +exit: + return result; } int diff --git a/src/language/data-io/get-data.c b/src/language/data-io/get-data.c index 7e75b413..f65e8ac7 100644 --- a/src/language/data-io/get-data.c +++ b/src/language/data-io/get-data.c @@ -31,6 +31,7 @@ #include "language/data-io/placement-parser.h" #include "language/lexer/format-parser.h" #include "language/lexer/lexer.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "gl/xalloc.h" @@ -152,7 +153,7 @@ parse_get_gnm (struct lexer *lexer, struct dataset *ds) if (!lex_force_string (lexer)) goto error; - gri.file_name = ss_xstrdup (lex_tokss (lexer)); + gri.file_name = utf8_to_filename (lex_tokcstr (lexer)); lex_get (lexer); @@ -418,6 +419,7 @@ parse_get_txt (struct lexer *lexer, struct dataset *ds) if (!lex_force_string (lexer)) goto error; + /* XXX should support multibyte UTF-8 characters */ s = lex_tokss (lexer); if (ss_match_string (&s, ss_cstr ("\\t"))) ds_put_cstr (&hard_seps, "\t"); @@ -443,6 +445,7 @@ parse_get_txt (struct lexer *lexer, struct dataset *ds) if (!lex_force_string (lexer)) goto error; + /* XXX should support multibyte UTF-8 characters */ if (settings_get_syntax () == COMPATIBLE && ss_length (lex_tokss (lexer)) != 1) { @@ -500,7 +503,8 @@ parse_get_txt (struct lexer *lexer, struct dataset *ds) lex_get (lexer); } - if (!lex_force_id (lexer)) + if (!lex_force_id (lexer) + || !dict_id_is_valid (dict, lex_tokcstr (lexer), true)) goto error; name = xstrdup (lex_tokcstr (lexer)); lex_get (lexer); diff --git a/src/language/data-io/inpt-pgm.c b/src/language/data-io/inpt-pgm.c index 415c48c7..72d5a2e7 100644 --- a/src/language/data-io/inpt-pgm.c +++ b/src/language/data-io/inpt-pgm.c @@ -16,7 +16,6 @@ #include - #include #include @@ -47,8 +46,7 @@ /* Private result codes for use within INPUT PROGRAM. 
*/ enum cmd_result_extensions { - CMD_END_INPUT_PROGRAM = CMD_PRIVATE_FIRST, - CMD_END_CASE + CMD_END_CASE = CMD_PRIVATE_FIRST }; /* Indicates how a `union value' should be initialized. */ @@ -95,7 +93,7 @@ cmd_input_program (struct lexer *lexer, struct dataset *ds) bool saw_END_CASE = false; proc_discard_active_file (ds); - if (lex_token (lexer) != T_ENDCMD) + if (!lex_match (lexer, T_ENDCMD)) return lex_end_of_command (lexer); inp = xmalloc (sizeof *inp); @@ -104,12 +102,12 @@ cmd_input_program (struct lexer *lexer, struct dataset *ds) inp->proto = NULL; inside_input_program = true; - for (;;) + while (!lex_match_phrase (lexer, "END INPUT PROGRAM")) { - enum cmd_result result = cmd_parse_in_state (lexer, ds, CMD_STATE_INPUT_PROGRAM); - if (result == CMD_END_INPUT_PROGRAM) - break; - else if (result == CMD_END_CASE) + enum cmd_result result; + + result = cmd_parse_in_state (lexer, ds, CMD_STATE_INPUT_PROGRAM); + if (result == CMD_END_CASE) { emit_END_CASE (ds, inp); saw_END_CASE = true; @@ -156,8 +154,12 @@ cmd_input_program (struct lexer *lexer, struct dataset *ds) int cmd_end_input_program (struct lexer *lexer UNUSED, struct dataset *ds UNUSED) { - assert (in_input_program ()); - return CMD_END_INPUT_PROGRAM; + /* Inside INPUT PROGRAM, this should get caught at the top of the loop in + cmd_input_program(). + + Outside of INPUT PROGRAM, the command parser should reject this + command. */ + NOT_REACHED (); } /* Returns true if STATE is valid given the transformations that @@ -237,7 +239,7 @@ cmd_end_case (struct lexer *lexer, struct dataset *ds UNUSED) assert (in_input_program ()); if (lex_token (lexer) == T_ENDCMD) return CMD_END_CASE; - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Outputs the current case */ @@ -348,13 +350,13 @@ reread_trns_free (void *t_) /* Parses END FILE command. 
*/ int -cmd_end_file (struct lexer *lexer, struct dataset *ds) +cmd_end_file (struct lexer *lexer UNUSED, struct dataset *ds) { assert (in_input_program ()); add_transformation (ds, end_file_trns_proc, NULL, NULL); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Executes an END FILE transformation. */ diff --git a/src/language/data-io/save-translate.c b/src/language/data-io/save-translate.c index 213f0bf7..d4c67b02 100644 --- a/src/language/data-io/save-translate.c +++ b/src/language/data-io/save-translate.c @@ -159,6 +159,7 @@ cmd_save_translate (struct lexer *lexer, struct dataset *ds) lex_match (lexer, T_EQUALS); if (!lex_force_string (lexer)) goto error; + /* XXX should support multibyte UTF-8 delimiters */ if (ss_length (lex_tokss (lexer)) != 1) { msg (SE, _("The %s string must contain exactly one " @@ -173,6 +174,7 @@ cmd_save_translate (struct lexer *lexer, struct dataset *ds) lex_match (lexer, T_EQUALS); if (!lex_force_string (lexer)) goto error; + /* XXX should support multibyte UTF-8 qualifiers */ if (ss_length (lex_tokss (lexer)) != 1) { msg (SE, _("The %s string must contain exactly one " diff --git a/src/language/data-io/trim.c b/src/language/data-io/trim.c index ac8bf272..63041f25 100644 --- a/src/language/data-io/trim.c +++ b/src/language/data-io/trim.c @@ -81,7 +81,8 @@ parse_dict_rename (struct lexer *lexer, struct dictionary *dict) if (v == NULL) return 0; if (!lex_force_match (lexer, T_EQUALS) - || !lex_force_id (lexer)) + || !lex_force_id (lexer) + || !dict_id_is_valid (dict, lex_tokcstr (lexer), true)) return 0; if (dict_lookup_var (dict, lex_tokcstr (lexer)) != NULL) { @@ -114,7 +115,7 @@ parse_dict_rename (struct lexer *lexer, struct dictionary *dict) msg (SE, _("`=' expected after variable list.")); goto done; } - if (!parse_DATA_LIST_vars (lexer, &new_names, &nn, + if (!parse_DATA_LIST_vars (lexer, dict, &new_names, &nn, PV_APPEND | PV_NO_SCRATCH | PV_NO_DUPLICATE)) goto done; if (nn != nv) diff --git 
a/src/language/dictionary/apply-dictionary.c b/src/language/dictionary/apply-dictionary.c index 7fdbdd47..36877302 100644 --- a/src/language/dictionary/apply-dictionary.c +++ b/src/language/dictionary/apply-dictionary.c @@ -79,12 +79,9 @@ cmd_apply_dictionary (struct lexer *lexer, struct dataset *ds) continue; } - if (var_get_label (s)) - { - const char *label = var_get_label (s); - if (strcspn (label, " ") != strlen (label)) - var_set_label (t, label); - } + if (var_has_label (s)) + var_set_label (t, var_get_label (s), + dict_get_encoding (dataset_dict (ds)), false); if (var_has_value_labels (s)) { @@ -129,5 +126,5 @@ cmd_apply_dictionary (struct lexer *lexer, struct dataset *ds) dict_set_weight (dataset_dict (ds), new_weight); } - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/dictionary/attributes.c b/src/language/dictionary/attributes.c index b0c9ddfd..13520730 100644 --- a/src/language/dictionary/attributes.c +++ b/src/language/dictionary/attributes.c @@ -32,21 +32,26 @@ #include "gettext.h" #define _(msgid) gettext (msgid) -static enum cmd_result parse_attributes (struct lexer *, struct attrset **, - size_t n); +static enum cmd_result parse_attributes (struct lexer *, + const char *dict_encoding, + struct attrset **, size_t n); /* Parses the DATAFILE ATTRIBUTE command. */ int cmd_datafile_attribute (struct lexer *lexer, struct dataset *ds) { - struct attrset *set = dict_get_attributes (dataset_dict (ds)); - return parse_attributes (lexer, &set, 1); + struct dictionary *dict = dataset_dict (ds); + struct attrset *set = dict_get_attributes (dict); + return parse_attributes (lexer, dict_get_encoding (dict), &set, 1); } /* Parses the VARIABLE ATTRIBUTE command. 
*/ int cmd_variable_attribute (struct lexer *lexer, struct dataset *ds) { + struct dictionary *dict = dataset_dict (ds); + const char *dict_encoding = dict_get_encoding (dict); + do { struct variable **vars; @@ -56,15 +61,14 @@ cmd_variable_attribute (struct lexer *lexer, struct dataset *ds) if (!lex_force_match_id (lexer, "VARIABLES") || !lex_force_match (lexer, T_EQUALS) - || !parse_variables (lexer, dataset_dict (ds), &vars, &n_vars, - PV_NONE)) + || !parse_variables (lexer, dict, &vars, &n_vars, PV_NONE)) return CMD_FAILURE; sets = xmalloc (n_vars * sizeof *sets); for (i = 0; i < n_vars; i++) sets[i] = var_get_attributes (vars[i]); - ok = parse_attributes (lexer, sets, n_vars); + ok = parse_attributes (lexer, dict_encoding, sets, n_vars); free (vars); free (sets); if (!ok) @@ -72,33 +76,21 @@ cmd_variable_attribute (struct lexer *lexer, struct dataset *ds) } while (lex_match (lexer, T_SLASH)); - return lex_end_of_command (lexer); -} - -static bool -match_subcommand (struct lexer *lexer, const char *keyword) -{ - if (lex_token (lexer) == T_ID - && lex_id_match (lex_tokss (lexer), ss_cstr (keyword)) - && lex_look_ahead (lexer) == T_EQUALS) - { - lex_get (lexer); /* Skip keyword. */ - lex_get (lexer); /* Skip '='. */ - return true; - } - else - return false; + return CMD_SUCCESS; } -/* Parses an attribute name optionally followed by an index inside square - brackets. Returns the attribute name or NULL if there was a parse error. - Stores the index into *INDEX. */ +/* Parses an attribute name and verifies that it is valid in DICT_ENCODING, + optionally followed by an index inside square brackets. Returns the + attribute name or NULL if there was a parse error. Stores the index into + *INDEX. 
*/ static char * -parse_attribute_name (struct lexer *lexer, size_t *index) +parse_attribute_name (struct lexer *lexer, const char *dict_encoding, + size_t *index) { char *name; - if (!lex_force_id (lexer)) + if (!lex_force_id (lexer) + || !id_is_valid (lex_tokcstr (lexer), dict_encoding, true)) return NULL; name = xstrdup (lex_tokcstr (lexer)); lex_get (lexer); @@ -127,13 +119,14 @@ error: } static bool -add_attribute (struct lexer *lexer, struct attrset **sets, size_t n) +add_attribute (struct lexer *lexer, const char *dict_encoding, + struct attrset **sets, size_t n) { const char *value; size_t index, i; char *name; - name = parse_attribute_name (lexer, &index); + name = parse_attribute_name (lexer, dict_encoding, &index); if (name == NULL) return false; if (!lex_force_match (lexer, T_LPAREN) || !lex_force_string (lexer)) @@ -160,12 +153,13 @@ add_attribute (struct lexer *lexer, struct attrset **sets, size_t n) } static bool -delete_attribute (struct lexer *lexer, struct attrset **sets, size_t n) +delete_attribute (struct lexer *lexer, const char *dict_encoding, + struct attrset **sets, size_t n) { size_t index, i; char *name; - name = parse_attribute_name (lexer, &index); + name = parse_attribute_name (lexer, dict_encoding, &index); if (name == NULL) return false; @@ -191,14 +185,15 @@ delete_attribute (struct lexer *lexer, struct attrset **sets, size_t n) } static enum cmd_result -parse_attributes (struct lexer *lexer, struct attrset **sets, size_t n) +parse_attributes (struct lexer *lexer, const char *dict_encoding, + struct attrset **sets, size_t n) { enum { UNKNOWN, ADD, DELETE } command = UNKNOWN; do { - if (match_subcommand (lexer, "ATTRIBUTE")) + if (lex_match_phrase (lexer, "ATTRIBUTE=")) command = ADD; - else if (match_subcommand (lexer, "DELETE")) + else if (lex_match_phrase (lexer, "DELETE=")) command = DELETE; else if (command == UNKNOWN) { @@ -207,8 +202,8 @@ parse_attributes (struct lexer *lexer, struct attrset **sets, size_t n) } if (!(command == 
ADD - ? add_attribute (lexer, sets, n) - : delete_attribute (lexer, sets, n))) + ? add_attribute (lexer, dict_encoding, sets, n) + : delete_attribute (lexer, dict_encoding, sets, n))) return CMD_FAILURE; } while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD); diff --git a/src/language/dictionary/missing-values.c b/src/language/dictionary/missing-values.c index ace99484..3ff4c426 100644 --- a/src/language/dictionary/missing-values.c +++ b/src/language/dictionary/missing-values.c @@ -19,6 +19,7 @@ #include #include "data/data-in.h" +#include "data/dictionary.h" #include "data/format.h" #include "data/missing-values.h" #include "data/procedure.h" @@ -28,6 +29,7 @@ #include "language/lexer/lexer.h" #include "language/lexer/value-parser.h" #include "language/lexer/variable-parser.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/str.h" @@ -37,21 +39,21 @@ int cmd_missing_values (struct lexer *lexer, struct dataset *ds) { + struct dictionary *dict = dataset_dict (ds); struct variable **v = NULL; size_t nv; - int retval = CMD_FAILURE; - bool deferred_errors = false; + bool ok = true; while (lex_token (lexer) != T_ENDCMD) { size_t i; - if (!parse_variables (lexer, dataset_dict (ds), &v, &nv, PV_NONE)) - goto done; + if (!parse_variables (lexer, dict, &v, &nv, PV_NONE)) + goto error; if (!lex_force_match (lexer, T_LPAREN)) - goto done; + goto error; for (i = 0; i < nv; i++) var_clear_missing_values (v[i]); @@ -68,7 +70,7 @@ cmd_missing_values (struct lexer *lexer, struct dataset *ds) msg (SE, _("Cannot mix numeric variables (e.g. %s) and " "string variables (e.g. %s) within a single list."), var_get_name (n), var_get_name (s)); - goto done; + goto error; } if (var_is_numeric (v[0])) @@ -81,13 +83,13 @@ cmd_missing_values (struct lexer *lexer, struct dataset *ds) bool ok; if (!parse_num_range (lexer, &x, &y, &type)) - goto done; + goto error; ok = (x == y ? 
mv_add_num (&mv, x) : mv_add_range (&mv, x, y)); if (!ok) - deferred_errors = true; + ok = false; lex_match (lexer, T_COMMA); } @@ -98,27 +100,33 @@ cmd_missing_values (struct lexer *lexer, struct dataset *ds) while (!lex_match (lexer, T_RPAREN)) { uint8_t value[MV_MAX_STRING]; + char *dict_mv; size_t length; if (!lex_force_string (lexer)) { - deferred_errors = true; + ok = false; break; } - length = ss_length (lex_tokss (lexer)); + dict_mv = recode_string (dict_get_encoding (dict), "UTF-8", + lex_tokcstr (lexer), + ss_length (lex_tokss (lexer))); + length = strlen (dict_mv); if (length > MV_MAX_STRING) { + /* XXX truncate graphemes not bytes */ msg (SE, _("Truncating missing value to maximum " "acceptable length (%d bytes)."), MV_MAX_STRING); length = MV_MAX_STRING; } memset (value, ' ', MV_MAX_STRING); - memcpy (value, ss_data (lex_tokss (lexer)), length); + memcpy (value, dict_mv, length); + free (dict_mv); if (!mv_add_str (&mv, value)) - deferred_errors = true; + ok = false; lex_get (lexer); lex_match (lexer, T_COMMA); @@ -134,7 +142,7 @@ cmd_missing_values (struct lexer *lexer, struct dataset *ds) msg (SE, _("Missing values provided are too long to assign " "to variable of width %d."), var_get_width (v[i])); - deferred_errors = true; + ok = false; } } @@ -145,12 +153,12 @@ cmd_missing_values (struct lexer *lexer, struct dataset *ds) free (v); v = NULL; } - retval = lex_end_of_command (lexer); - done: free (v); - if (deferred_errors) - retval = CMD_FAILURE; - return retval; + return ok ? 
CMD_SUCCESS : CMD_FAILURE; + +error: + free (v); + return CMD_FAILURE; } diff --git a/src/language/dictionary/modify-variables.c b/src/language/dictionary/modify-variables.c index afb90899..f3c31900 100644 --- a/src/language/dictionary/modify-variables.c +++ b/src/language/dictionary/modify-variables.c @@ -200,8 +200,8 @@ cmd_modify_vars (struct lexer *lexer, struct dataset *ds) "names on RENAME subcommand.")); goto done; } - if (!parse_DATA_LIST_vars (lexer, &vm.new_names, - &prev_nv_1, PV_APPEND)) + if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds), + &vm.new_names, &prev_nv_1, PV_APPEND)) goto done; if (prev_nv_1 != vm.rename_cnt) { diff --git a/src/language/dictionary/mrsets.c b/src/language/dictionary/mrsets.c index c775f497..3af5d033 100644 --- a/src/language/dictionary/mrsets.c +++ b/src/language/dictionary/mrsets.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2010 Free Software Foundation, Inc. + Copyright (C) 2010, 2011 Free Software Foundation, Inc. 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -27,6 +27,7 @@ #include "language/lexer/variable-parser.h" #include "libpspp/assertion.h" #include "libpspp/hmap.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/str.h" #include "libpspp/stringi-map.h" @@ -69,7 +70,7 @@ cmd_mrsets (struct lexer *lexer, struct dataset *ds) return CMD_FAILURE; } - return lex_end_of_command (lexer); + return CMD_SUCCESS; } static bool @@ -91,15 +92,10 @@ parse_group (struct lexer *lexer, struct dictionary *dict, { if (lex_match_id (lexer, "NAME")) { - if (!lex_force_match (lexer, T_EQUALS) || !lex_force_id (lexer)) + if (!lex_force_match (lexer, T_EQUALS) || !lex_force_id (lexer) + || !mrset_is_valid_name (lex_tokcstr (lexer), + dict_get_encoding (dict), true)) goto error; - if (lex_tokcstr (lexer)[0] != '$') - { - msg (SE, _("%s is not a valid name for a multiple response " - "set. Multiple response set names must begin with " - "`$'."), lex_tokcstr (lexer)); - goto error; - } free (mrset->name); mrset->name = xstrdup (lex_tokcstr (lexer)); @@ -159,12 +155,15 @@ parse_group (struct lexer *lexer, struct dictionary *dict, } else if (lex_is_string (lexer)) { - const char *s = lex_tokcstr (lexer); - int width; + size_t width; + char *s; + + s = recode_string (dict_get_encoding (dict), "UTF-8", + lex_tokcstr (lexer), -1); + width = strlen (s); /* Trim off trailing spaces, but don't trim the string until it's empty because a width of 0 is a numeric type. 
*/ - width = strlen (s); while (width > 1 && s[width - 1] == ' ') width--; @@ -172,6 +171,8 @@ parse_group (struct lexer *lexer, struct dictionary *dict, value_init (&mrset->counted, width); memcpy (value_str_rw (&mrset->counted, width), s, width); mrset->width = width; + + free (s); } else { diff --git a/src/language/dictionary/numeric.c b/src/language/dictionary/numeric.c index 25b2ecf8..c88d5143 100644 --- a/src/language/dictionary/numeric.c +++ b/src/language/dictionary/numeric.c @@ -49,7 +49,8 @@ cmd_numeric (struct lexer *lexer, struct dataset *ds) be used. */ struct fmt_spec f; - if (!parse_DATA_LIST_vars (lexer, &v, &nv, PV_NO_DUPLICATE)) + if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds), + &v, &nv, PV_NO_DUPLICATE)) return CMD_FAILURE; /* Get the optional format specification. */ @@ -98,7 +99,7 @@ cmd_numeric (struct lexer *lexer, struct dataset *ds) } while (lex_match (lexer, T_SLASH)); - return lex_end_of_command (lexer); + return CMD_SUCCESS; /* If we have an error at a point where cleanup is required, flow-of-control comes here. */ @@ -127,7 +128,8 @@ cmd_string (struct lexer *lexer, struct dataset *ds) do { - if (!parse_DATA_LIST_vars (lexer, &v, &nv, PV_NO_DUPLICATE)) + if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds), + &v, &nv, PV_NO_DUPLICATE)) return CMD_FAILURE; if (!lex_force_match (lexer, T_LPAREN) @@ -164,7 +166,7 @@ cmd_string (struct lexer *lexer, struct dataset *ds) } while (lex_match (lexer, T_SLASH)); - return lex_end_of_command (lexer); + return CMD_SUCCESS; /* If we have an error at a point where cleanup is required, flow-of-control comes here. 
*/ @@ -190,5 +192,5 @@ cmd_leave (struct lexer *lexer, struct dataset *ds) var_set_leave (v[i], true); free (v); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/dictionary/rename-variables.c b/src/language/dictionary/rename-variables.c index 43745045..c1d31996 100644 --- a/src/language/dictionary/rename-variables.c +++ b/src/language/dictionary/rename-variables.c @@ -66,7 +66,8 @@ cmd_rename_variables (struct lexer *lexer, struct dataset *ds) msg (SE, _("`=' expected between lists of new and old variable names.")); goto lossage; } - if (!parse_DATA_LIST_vars (lexer, &rename_new_names, &prev_nv_1, + if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds), + &rename_new_names, &prev_nv_1, PV_APPEND | PV_NO_DUPLICATE)) goto lossage; if (prev_nv_1 != rename_cnt) diff --git a/src/language/dictionary/split-file.c b/src/language/dictionary/split-file.c index 02e7696b..56e3eeef 100644 --- a/src/language/dictionary/split-file.c +++ b/src/language/dictionary/split-file.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2009, 2011 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -59,7 +59,7 @@ cmd_split_file (struct lexer *lexer, struct dataset *ds) free (v); } - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Dumps out the values of all the split variables for the case C. 
*/ diff --git a/src/language/dictionary/sys-file-info.c b/src/language/dictionary/sys-file-info.c index 952930b5..51b3cae4 100644 --- a/src/language/dictionary/sys-file-info.c +++ b/src/language/dictionary/sys-file-info.c @@ -37,6 +37,7 @@ #include "libpspp/array.h" #include "libpspp/message.h" #include "libpspp/misc.h" +#include "libpspp/string-array.h" #include "output/tab.h" #include "gl/minmax.h" @@ -163,7 +164,7 @@ cmd_sysfile_info (struct lexer *lexer, struct dataset *ds UNUSED) dict_destroy (d); fh_unref (h); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* DISPLAY utility. */ @@ -210,7 +211,7 @@ cmd_display (struct lexer *lexer, struct dataset *ds) if (lex_match_id (lexer, "VECTORS")) { display_vectors (dataset_dict(ds), sorted); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } else if (lex_match_id (lexer, "SCRATCH")) { @@ -280,7 +281,7 @@ cmd_display (struct lexer *lexer, struct dataset *ds) flags); } - return lex_end_of_command (lexer); + return CMD_SUCCESS; } static void @@ -292,24 +293,19 @@ display_macros (void) static void display_documents (const struct dictionary *dict) { - const char *documents = dict_get_documents (dict); + const struct string_array *documents = dict_get_documents (dict); - if (documents == NULL) + if (string_array_is_empty (documents)) tab_output_text (TAB_LEFT, _("The active file dictionary does not " "contain any documents.")); else { - struct string line = DS_EMPTY_INITIALIZER; size_t i; tab_output_text (TAB_LEFT | TAT_TITLE, _("Documents in the active file:")); for (i = 0; i < dict_get_document_line_cnt (dict); i++) - { - dict_get_document_line (dict, i, &line); - tab_output_text (TAB_LEFT | TAB_FIX, ds_cstr (&line)); - } - ds_destroy (&line); + tab_output_text (TAB_LEFT | TAB_FIX, dict_get_document_line (dict, i)); } } diff --git a/src/language/dictionary/value-labels.c b/src/language/dictionary/value-labels.c index 9068290b..18e890dd 100644 --- a/src/language/dictionary/value-labels.c +++ 
b/src/language/dictionary/value-labels.c @@ -19,6 +19,7 @@ #include #include +#include "data/dictionary.h" #include "data/procedure.h" #include "data/value-labels.h" #include "data/variable.h" @@ -26,6 +27,7 @@ #include "language/lexer/lexer.h" #include "language/lexer/value-parser.h" #include "language/lexer/variable-parser.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/str.h" @@ -39,7 +41,8 @@ static int do_value_labels (struct lexer *, const struct dictionary *dict, bool); static void erase_labels (struct variable **vars, size_t var_cnt); -static int get_label (struct lexer *, struct variable **vars, size_t var_cnt); +static int get_label (struct lexer *, struct variable **vars, size_t var_cnt, + const char *dict_encoding); /* Stubs. */ @@ -78,7 +81,7 @@ do_value_labels (struct lexer *lexer, const struct dictionary *dict, bool erase) if (erase) erase_labels (vars, var_cnt); while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD) - if (!get_label (lexer, vars, var_cnt)) + if (!get_label (lexer, vars, var_cnt, dict_get_encoding (dict))) goto lossage; if (lex_token (lexer) != T_SLASH) @@ -92,10 +95,7 @@ do_value_labels (struct lexer *lexer, const struct dictionary *dict, bool erase) free (vars); } - if (parse_err) - return CMD_FAILURE; - - return lex_end_of_command (lexer); + return parse_err ? CMD_FAILURE : CMD_SUCCESS; lossage: free (vars); @@ -116,7 +116,8 @@ erase_labels (struct variable **vars, size_t var_cnt) /* Parse all the labels for the VAR_CNT variables in VARS and add the specified labels to those variables. */ static int -get_label (struct lexer *lexer, struct variable **vars, size_t var_cnt) +get_label (struct lexer *lexer, struct variable **vars, size_t var_cnt, + const char *dict_encoding) { /* Parse all the labels and add them to the variables. 
*/ do @@ -125,6 +126,7 @@ get_label (struct lexer *lexer, struct variable **vars, size_t var_cnt) int width = var_get_width (vars[0]); union value value; struct string label; + size_t trunc_len; size_t i; /* Set value. */ @@ -145,10 +147,12 @@ get_label (struct lexer *lexer, struct variable **vars, size_t var_cnt) ds_init_substring (&label, lex_tokss (lexer)); - if (ds_length (&label) > MAX_LABEL_LEN) + trunc_len = utf8_encoding_trunc_len (ds_cstr (&label), dict_encoding, + MAX_LABEL_LEN); + if (ds_length (&label) > trunc_len) { msg (SW, _("Truncating value label to %d bytes."), MAX_LABEL_LEN); - ds_truncate (&label, MAX_LABEL_LEN); + ds_truncate (&label, trunc_len); } for (i = 0; i < var_cnt; i++) diff --git a/src/language/dictionary/variable-label.c b/src/language/dictionary/variable-label.c index 4735047c..c0f80fbd 100644 --- a/src/language/dictionary/variable-label.c +++ b/src/language/dictionary/variable-label.c @@ -19,13 +19,13 @@ #include #include +#include "data/dictionary.h" #include "data/procedure.h" #include "data/variable.h" #include "language/command.h" #include "language/lexer/lexer.h" #include "language/lexer/variable-parser.h" #include "libpspp/message.h" -#include "libpspp/str.h" #include "gl/xalloc.h" @@ -35,15 +35,17 @@ int cmd_variable_labels (struct lexer *lexer, struct dataset *ds) { + struct dictionary *dict = dataset_dict (ds); + const char *dict_encoding = dict_get_encoding (dict); + do { struct variable **v; - struct string label; size_t nv; size_t i; - if (!parse_variables (lexer, dataset_dict (ds), &v, &nv, PV_NONE)) + if (!parse_variables (lexer, dict, &v, &nv, PV_NONE)) return CMD_FAILURE; if (!lex_force_string (lexer)) @@ -52,15 +54,8 @@ cmd_variable_labels (struct lexer *lexer, struct dataset *ds) return CMD_FAILURE; } - ds_init_substring (&label, lex_tokss (lexer)); - if (ds_length (&label) > 255) - { - msg (SW, _("Truncating variable label to 255 characters.")); - ds_truncate (&label, 255); - } for (i = 0; i < nv; i++) - 
var_set_label (v[i], ds_cstr (&label)); - ds_destroy (&label); + var_set_label (v[i], lex_tokcstr (lexer), dict_encoding, i == 0); lex_get (lexer); while (lex_token (lexer) == T_SLASH) diff --git a/src/language/dictionary/vector.c b/src/language/dictionary/vector.c index bf257194..a5e66df8 100644 --- a/src/language/dictionary/vector.c +++ b/src/language/dictionary/vector.c @@ -50,7 +50,8 @@ cmd_vector (struct lexer *lexer, struct dataset *ds) size_t vector_cnt, vector_cap; /* Get the name(s) of the new vector(s). */ - if (!lex_force_id (lexer)) + if (!lex_force_id (lexer) + || !dict_id_is_valid (dict, lex_tokcstr (lexer), true)) return CMD_CASCADING_FAILURE; vectors = NULL; @@ -151,19 +152,17 @@ cmd_vector (struct lexer *lexer, struct dataset *ds) goto fail; } - /* Check that none of the variables exist and that - their names are no more than VAR_NAME_LEN bytes - long. */ + /* Check that none of the variables exist and that their names are + not excessively long. */ for (i = 0; i < vector_cnt; i++) { int j; for (j = 0; j < var_cnt; j++) { char *name = xasprintf ("%s%d", vectors[i], j + 1); - if (strlen (name) > VAR_NAME_LEN) + if (!dict_id_is_valid (dict, name, true)) { free (name); - msg (SE, _("%s is too long for a variable name."), name); goto fail; } if (dict_lookup_var (dict, name)) @@ -200,7 +199,7 @@ cmd_vector (struct lexer *lexer, struct dataset *ds) while (lex_match (lexer, T_SLASH)); pool_destroy (pool); - return lex_end_of_command (lexer); + return CMD_SUCCESS; fail: pool_destroy (pool); diff --git a/src/language/dictionary/weight.c b/src/language/dictionary/weight.c index cca9aebe..ad06ad10 100644 --- a/src/language/dictionary/weight.c +++ b/src/language/dictionary/weight.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2011 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2010, 2011 Free Software Foundation, Inc. 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -58,5 +58,5 @@ cmd_weight (struct lexer *lexer, struct dataset *ds) dict_set_weight (dict, v); } - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/expressions/parse.c b/src/language/expressions/parse.c index a9c324bc..ed5a0709 100644 --- a/src/language/expressions/parse.c +++ b/src/language/expressions/parse.c @@ -33,6 +33,7 @@ #include "language/lexer/variable-parser.h" #include "libpspp/array.h" #include "libpspp/assertion.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/misc.h" #include "libpspp/pool.h" @@ -785,12 +786,14 @@ parse_sysvar (struct lexer *lexer, struct expression *e) time_t last_proc_time = time_of_last_procedure (e->ds); struct tm *time; char temp_buf[10]; + struct substring s; time = localtime (&last_proc_time); sprintf (temp_buf, "%02d %s %02d", abs (time->tm_mday) % 100, months[abs (time->tm_mon) % 12], abs (time->tm_year) % 100); - return expr_allocate_string_buffer (e, temp_buf, strlen (temp_buf)); + ss_alloc_substring (&s, ss_cstr (temp_buf)); + return expr_allocate_string (e, s); } else if (lex_match_id (lexer, "$TRUE")) return expr_allocate_boolean (e, 1.0); @@ -836,7 +839,7 @@ parse_primary (struct lexer *lexer, struct expression *e) switch (lex_token (lexer)) { case T_ID: - if (lex_look_ahead (lexer) == T_LPAREN) + if (lex_next_token (lexer, 1) == T_LPAREN) { /* An identifier followed by a left parenthesis may be a vector element reference. If not, it's a function @@ -887,8 +890,17 @@ parse_primary (struct lexer *lexer, struct expression *e) case T_STRING: { - union any_node *node = expr_allocate_string_buffer ( - e, lex_tokcstr (lexer), ss_length (lex_tokss (lexer))); + const char *dict_encoding; + union any_node *node; + char *s; + + dict_encoding = (e->ds != NULL + ? 
dict_get_encoding (dataset_dict (e->ds)) + : "UTF-8"); + s = recode_string (dict_encoding, "UTF-8", lex_tokcstr (lexer), + ss_length (lex_tokss (lexer))); + node = expr_allocate_string (e, ss_cstr (s)); + lex_get (lexer); return node; } @@ -1231,7 +1243,7 @@ parse_function (struct lexer *lexer, struct expression *e) for (;;) { if (lex_token (lexer) == T_ID - && toupper (lex_look_ahead (lexer)) == T_ID) + && lex_next_token (lexer, 1) == T_TO) { const struct variable **vars; size_t var_cnt; @@ -1473,18 +1485,6 @@ expr_allocate_vector (struct expression *e, const struct vector *vector) return n; } -union any_node * -expr_allocate_string_buffer (struct expression *e, - const char *string, size_t length) -{ - union any_node *n = pool_alloc (e->expr_pool, sizeof n->string); - n->type = OP_string; - if (length > MAX_STRING) - length = MAX_STRING; - n->string.s = copy_string (e, string, length); - return n; -} - union any_node * expr_allocate_string (struct expression *e, struct substring s) { diff --git a/src/language/expressions/private.h b/src/language/expressions/private.h index 1a485bb4..062d6f76 100644 --- a/src/language/expressions/private.h +++ b/src/language/expressions/private.h @@ -187,10 +187,7 @@ union any_node *expr_allocate_number (struct expression *e, double); union any_node *expr_allocate_boolean (struct expression *e, double); union any_node *expr_allocate_integer (struct expression *e, int); union any_node *expr_allocate_pos_int (struct expression *e, int); -union any_node *expr_allocate_string_buffer (struct expression *e, - const char *string, size_t length); -union any_node *expr_allocate_string (struct expression *e, - struct substring); +union any_node *expr_allocate_string (struct expression *e, struct substring); union any_node *expr_allocate_variable (struct expression *e, const struct variable *); union any_node *expr_allocate_format (struct expression *e, diff --git a/src/language/lexer/automake.mk b/src/language/lexer/automake.mk index 
be48873e..11771a24 100644 --- a/src/language/lexer/automake.mk +++ b/src/language/lexer/automake.mk @@ -4,6 +4,8 @@ language_lexer_sources = \ src/language/lexer/command-name.c \ src/language/lexer/command-name.h \ + src/language/lexer/include-path.c \ + src/language/lexer/include-path.h \ src/language/lexer/lexer.c \ src/language/lexer/lexer.h \ src/language/lexer/subcommand-list.c \ diff --git a/src/language/lexer/include-path.c b/src/language/lexer/include-path.c new file mode 100644 index 00000000..bc20122d --- /dev/null +++ b/src/language/lexer/include-path.c @@ -0,0 +1,89 @@ +/* PSPP - a program for statistical analysis. + Copyright (C) 2010 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . 
*/ + +#include + +#include "src/language/lexer/include-path.h" + +#include + +#include "data/file-name.h" +#include "libpspp/string-array.h" + +#include "gl/configmake.h" +#include "gl/relocatable.h" +#include "gl/xvasprintf.h" + +static struct string_array the_include_path; +static struct string_array default_include_path; + +static void include_path_init__ (void); + +void +include_path_clear (void) +{ + include_path_init__ (); + string_array_clear (&the_include_path); +} + +void +include_path_add (const char *dir) +{ + include_path_init__ (); + string_array_append (&the_include_path, dir); +} + +char * +include_path_search (const char *base_name) +{ + return fn_search_path (base_name, include_path ()); +} + +const struct string_array * +include_path_default (void) +{ + include_path_init__ (); + return &default_include_path; +} + +char ** +include_path (void) +{ + include_path_init__ (); + string_array_terminate_null (&the_include_path); + return the_include_path.strings; +} + +static void +include_path_init__ (void) +{ + static bool inited; + char *home; + + if (inited) + return; + inited = true; + + string_array_init (&the_include_path); + string_array_append (&the_include_path, "."); + home = getenv ("HOME"); + if (home != NULL) + string_array_append_nocopy (&the_include_path, + xasprintf ("%s/.pspp", home)); + string_array_append (&the_include_path, relocate (PKGDATADIR)); + + string_array_clone (&default_include_path, &the_include_path); +} diff --git a/src/language/lexer/include-path.h b/src/language/lexer/include-path.h new file mode 100644 index 00000000..447b9a6c --- /dev/null +++ b/src/language/lexer/include-path.h @@ -0,0 +1,29 @@ +/* PSPP - a program for statistical analysis. + Copyright (C) 2010 Free Software Foundation, Inc. 
+ + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#ifndef INCLUDE_PATH_H +#define INCLUDE_PATH_H 1 + +struct string_array; + +void include_path_clear (void); +void include_path_add (const char *dir); +char *include_path_search (const char *base_name); + +const struct string_array *include_path_default (void); +char **include_path (void); + +#endif /* include-path.h */ diff --git a/src/language/lexer/lexer.c b/src/language/lexer/lexer.c index 938d2667..9a27d867 100644 --- a/src/language/lexer/lexer.c +++ b/src/language/lexer/lexer.c @@ -18,405 +18,254 @@ #include "language/lexer/lexer.h" -#include -#include #include +#include #include #include #include -#include #include +#include +#include +#include +#include +#include -#include "data/settings.h" +#include "data/file-name.h" #include "language/command.h" +#include "language/lexer/scan.h" +#include "language/lexer/segment.h" +#include "language/lexer/token.h" #include "libpspp/assertion.h" -#include "libpspp/getl.h" +#include "libpspp/cast.h" +#include "libpspp/deque.h" +#include "libpspp/i18n.h" +#include "libpspp/ll.h" #include "libpspp/message.h" +#include "libpspp/misc.h" #include "libpspp/str.h" +#include "libpspp/u8-istream.h" #include "output/journal.h" #include "output/text-item.h" +#include "gl/c-ctype.h" +#include "gl/minmax.h" #include "gl/xalloc.h" +#include "gl/xmemdup0.h" #include "gettext.h" #define _(msgid) gettext (msgid) #define N_(msgid) msgid 
-struct lexer -{ - struct string line_buffer; - - struct source_stream *ss; - - int token; /* Current token. */ - double tokval; /* T_POS_NUM, T_NEG_NUM: the token's value. */ - - struct string tokstr; /* T_ID, T_STRING: token string value. */ - - char *prog; /* Pointer to next token in line_buffer. */ - bool dot; /* True only if this line ends with a terminal dot. */ - - int put_token ; /* If nonzero, next token returned by lex_get(). - Used only in exceptional circumstances. */ +/* A token within a lex_source. */ +struct lex_token + { + /* The regular token information. */ + struct token token; + + /* Location of token in terms of the lex_source's buffer. + src->tail <= line_pos <= token_pos <= src->head. */ + size_t token_pos; /* Start of token. */ + size_t token_len; /* Length of source for token in bytes. */ + size_t line_pos; /* Start of line containing token_pos. */ + int first_line; /* Line number at token_pos. */ + }; - struct string put_tokstr; - double put_tokval; -}; +/* A source of tokens, corresponding to a syntax file. + This is conceptually a lex_reader wrapped with everything needed to convert + its UTF-8 bytes into tokens. */ +struct lex_source + { + struct ll ll; /* In lexer's list of sources. */ + struct lex_reader *reader; + struct segmenter segmenter; + bool eof; /* True if T_STOP was read from 'reader'. */ + + /* Buffer of UTF-8 bytes. */ + char *buffer; + size_t allocated; /* Number of bytes allocated. */ + size_t tail; /* &buffer[0] offset into UTF-8 source. */ + size_t head; /* &buffer[head - tail] offset into source. */ + + /* Positions in source file, tail <= pos <= head for each member here. */ + size_t journal_pos; /* First byte not yet output to journal. */ + size_t seg_pos; /* First byte not yet scanned as token. */ + size_t line_pos; /* First byte of line containing seg_pos. */ + + int n_newlines; /* Number of new-lines up to seg_pos. */ + bool suppress_next_newline; + + /* Tokens. */ + struct deque deque; /* Indexes into 'tokens'. 
*/ + struct lex_token *tokens; /* Lookahead tokens for parser. */ + }; -static int parse_id (struct lexer *); +static struct lex_source *lex_source_create (struct lex_reader *); +static void lex_source_destroy (struct lex_source *); -/* How a string represents its contents. */ -enum string_type +/* Lexer. */ +struct lexer { - CHARACTER_STRING, /* Characters. */ - BINARY_STRING, /* Binary digits. */ - OCTAL_STRING, /* Octal digits. */ - HEX_STRING /* Hexadecimal digits. */ + struct ll_list sources; /* Contains "struct lex_source"s. */ }; -static int parse_string (struct lexer *, enum string_type); +static struct lex_source *lex_source__ (const struct lexer *); +static const struct lex_token *lex_next__ (const struct lexer *, int n); +static void lex_source_push_endcmd__ (struct lex_source *); + +static void lex_source_pop__ (struct lex_source *); +static bool lex_source_get__ (const struct lex_source *); +static void lex_source_error_valist (struct lex_source *, int n0, int n1, + const char *format, va_list) + PRINTF_FORMAT (4, 0); +static const struct lex_token *lex_source_next__ (const struct lex_source *, + int n); -/* Initialization. */ - -/* Initializes the lexer. */ -struct lexer * -lex_create (struct source_stream *ss) -{ - struct lexer *lexer = xzalloc (sizeof (*lexer)); - - ds_init_empty (&lexer->tokstr); - ds_init_empty (&lexer->put_tokstr); - ds_init_empty (&lexer->line_buffer); - lexer->ss = ss; - - return lexer; -} - -struct source_stream * -lex_get_source_stream (const struct lexer *lex) +/* Initializes READER with the specified CLASS and otherwise some reasonable + defaults. The caller should fill in the others members as desired. 
*/ +void +lex_reader_init (struct lex_reader *reader, + const struct lex_reader_class *class) { - return lex->ss; + reader->class = class; + reader->syntax = LEX_SYNTAX_AUTO; + reader->error = LEX_ERROR_INTERACTIVE; + reader->file_name = NULL; + reader->line_number = 0; } -enum syntax_mode -lex_current_syntax_mode (const struct lexer *lex) +/* Frees any file name already in READER and replaces it by a copy of + FILE_NAME, or if FILE_NAME is null then clears any existing name. */ +void +lex_reader_set_file_name (struct lex_reader *reader, const char *file_name) { - return source_stream_current_syntax_mode (lex->ss); + free (reader->file_name); + reader->file_name = file_name != NULL ? xstrdup (file_name) : NULL; } - -enum error_mode -lex_current_error_mode (const struct lexer *lex) + +/* Creates and returns a new lexer. */ +struct lexer * +lex_create (void) { - return source_stream_current_error_mode (lex->ss); + struct lexer *lexer = xzalloc (sizeof *lexer); + ll_init (&lexer->sources); + return lexer; } - +/* Destroys LEXER. */ void lex_destroy (struct lexer *lexer) { - if ( NULL != lexer ) + if (lexer != NULL) { - ds_destroy (&lexer->put_tokstr); - ds_destroy (&lexer->tokstr); - ds_destroy (&lexer->line_buffer); + struct lex_source *source, *next; + ll_for_each_safe (source, next, struct lex_source, ll, &lexer->sources) + lex_source_destroy (source); free (lexer); } } +/* Inserts READER into LEXER so that the next token read by LEXER comes from + READER. Before the caller, LEXER must either be empty or at a T_ENDCMD + token. */ +void +lex_include (struct lexer *lexer, struct lex_reader *reader) +{ + assert (ll_is_empty (&lexer->sources) || lex_token (lexer) == T_ENDCMD); + ll_push_head (&lexer->sources, &lex_source_create (reader)->ll); +} + +/* Appends READER to LEXER, so that it will be read after all other current + readers have already been read. 
*/ +void +lex_append (struct lexer *lexer, struct lex_reader *reader) +{ + ll_push_tail (&lexer->sources, &lex_source_create (reader)->ll); +} -/* Common functions. */ +/* Advancing. */ + +static struct lex_token * +lex_push_token__ (struct lex_source *src) +{ + struct lex_token *token; + + if (deque_is_full (&src->deque)) + src->tokens = deque_expand (&src->deque, src->tokens, sizeof *src->tokens); + + token = &src->tokens[deque_push_front (&src->deque)]; + token_init (&token->token); + return token; +} -/* Copies put_token, lexer->put_tokstr, put_tokval into token, tokstr, - tokval, respectively, and sets tokid appropriately. */ static void -restore_token (struct lexer *lexer) +lex_source_pop__ (struct lex_source *src) { - assert (lexer->put_token != 0); - lexer->token = lexer->put_token; - ds_assign_string (&lexer->tokstr, &lexer->put_tokstr); - lexer->tokval = lexer->put_tokval; - lexer->put_token = 0; + token_destroy (&src->tokens[deque_pop_back (&src->deque)].token); } -/* Copies token, tokstr, lexer->tokval into lexer->put_token, put_tokstr, - put_lexer->tokval respectively. */ static void -save_token (struct lexer *lexer) +lex_source_pop_front (struct lex_source *src) { - lexer->put_token = lexer->token; - ds_assign_string (&lexer->put_tokstr, &lexer->tokstr); - lexer->put_tokval = lexer->tokval; + token_destroy (&src->tokens[deque_pop_front (&src->deque)].token); } -/* Parses a single token, setting appropriate global variables to - indicate the token's attributes. */ +/* Advances LEXER to the next token, consuming the current token. */ void lex_get (struct lexer *lexer) { - /* Find a token. */ - for (;;) - { - if (NULL == lexer->prog && ! lex_get_line (lexer) ) - { - lexer->token = T_STOP; - return; - } - - /* If a token was pushed ahead, return it. */ - if (lexer->put_token) - { - restore_token (lexer); - return; - } + struct lex_source *src; - for (;;) - { - /* Skip whitespace. 
*/ - while (c_isspace ((unsigned char) *lexer->prog)) - lexer->prog++; - - if (*lexer->prog) - break; - - if (lexer->dot) - { - lexer->dot = 0; - lexer->token = T_ENDCMD; - return; - } - else if (!lex_get_line (lexer)) - { - lexer->prog = NULL; - lexer->token = T_STOP; - return; - } - - if (lexer->put_token) - { - restore_token (lexer); - return; - } - } - - - /* Actually parse the token. */ - ds_clear (&lexer->tokstr); - - switch (*lexer->prog) - { - case '-': case '.': - case '0': case '1': case '2': case '3': case '4': - case '5': case '6': case '7': case '8': case '9': - { - char *tail; - - /* `-' can introduce a negative number, or it can be a token by - itself. */ - if (*lexer->prog == '-') - { - ds_put_byte (&lexer->tokstr, *lexer->prog++); - while (c_isspace ((unsigned char) *lexer->prog)) - lexer->prog++; - - if (!c_isdigit ((unsigned char) *lexer->prog) && *lexer->prog != '.') - { - lexer->token = T_DASH; - break; - } - lexer->token = T_NEG_NUM; - } - else - lexer->token = T_POS_NUM; - - /* Parse the number, copying it into tokstr. */ - while (c_isdigit ((unsigned char) *lexer->prog)) - ds_put_byte (&lexer->tokstr, *lexer->prog++); - if (*lexer->prog == '.') - { - ds_put_byte (&lexer->tokstr, *lexer->prog++); - while (c_isdigit ((unsigned char) *lexer->prog)) - ds_put_byte (&lexer->tokstr, *lexer->prog++); - } - if (*lexer->prog == 'e' || *lexer->prog == 'E') - { - ds_put_byte (&lexer->tokstr, *lexer->prog++); - if (*lexer->prog == '+' || *lexer->prog == '-') - ds_put_byte (&lexer->tokstr, *lexer->prog++); - while (c_isdigit ((unsigned char) *lexer->prog)) - ds_put_byte (&lexer->tokstr, *lexer->prog++); - } - - /* Parse as floating point. 
*/ - lexer->tokval = c_strtod (ds_cstr (&lexer->tokstr), &tail); - if (*tail) - { - msg (SE, _("%s does not form a valid number."), - ds_cstr (&lexer->tokstr)); - lexer->tokval = 0.0; - - ds_clear (&lexer->tokstr); - ds_put_byte (&lexer->tokstr, '0'); - } - - break; - } - - case '\'': case '"': - lexer->token = parse_string (lexer, CHARACTER_STRING); - break; - - case '+': - lexer->token = T_PLUS; - lexer->prog++; - break; - - case '/': - lexer->token = T_SLASH; - lexer->prog++; - break; - - case '=': - lexer->token = T_EQUALS; - lexer->prog++; - break; + src = lex_source__ (lexer); + if (src == NULL) + return; - case '(': - lexer->token = T_LPAREN; - lexer->prog++; - break; - - case ')': - lexer->token = T_RPAREN; - lexer->prog++; - break; - - case '[': - lexer->token = T_LBRACK; - lexer->prog++; - break; - - case ']': - lexer->token = T_RBRACK; - lexer->prog++; - break; + if (!deque_is_empty (&src->deque)) + lex_source_pop__ (src); - case ',': - lexer->token = T_COMMA; - lexer->prog++; - break; - - case '*': - if (*++lexer->prog == '*') - { - lexer->prog++; - lexer->token = T_EXP; - } - else - lexer->token = T_ASTERISK; - break; - - case '<': - if (*++lexer->prog == '=') - { - lexer->prog++; - lexer->token = T_LE; - } - else if (*lexer->prog == '>') - { - lexer->prog++; - lexer->token = T_NE; - } - else - lexer->token = T_LT; - break; - - case '>': - if (*++lexer->prog == '=') - { - lexer->prog++; - lexer->token = T_GE; - } - else - lexer->token = T_GT; - break; - - case '~': - if (*++lexer->prog == '=') - { - lexer->prog++; - lexer->token = T_NE; - } - else - lexer->token = T_NOT; - break; - - case '&': - lexer->prog++; - lexer->token = T_AND; - break; - - case '|': - lexer->prog++; - lexer->token = T_OR; - break; - - case 'b': case 'B': - if (lexer->prog[1] == '\'' || lexer->prog[1] == '"') - lexer->token = parse_string (lexer, BINARY_STRING); - else - lexer->token = parse_id (lexer); - break; + while (deque_is_empty (&src->deque)) + if (!lex_source_get__ 
(src)) + { + lex_source_destroy (src); + src = lex_source__ (lexer); + if (src == NULL) + return; + } +} + +/* Issuing errors. */ - case 'o': case 'O': - if (lexer->prog[1] == '\'' || lexer->prog[1] == '"') - lexer->token = parse_string (lexer, OCTAL_STRING); - else - lexer->token = parse_id (lexer); - break; +/* Prints a syntax error message containing the current token and + given message MESSAGE (if non-null). */ +void +lex_error (struct lexer *lexer, const char *format, ...) +{ + va_list args; - case 'x': case 'X': - if (lexer->prog[1] == '\'' || lexer->prog[1] == '"') - lexer->token = parse_string (lexer, HEX_STRING); - else - lexer->token = parse_id (lexer); - break; + va_start (args, format); + lex_next_error_valist (lexer, 0, 0, format, args); + va_end (args); +} - default: - if (lex_is_id1 (*lexer->prog)) - { - lexer->token = parse_id (lexer); - break; - } - else - { - unsigned char c = *lexer->prog++; - char *c_name = xasprintf (c_isgraph (c) ? "%c" : "\\%o", c); - msg (SE, _("Bad character in input: `%s'."), c_name); - free (c_name); - continue; - } - } - break; - } +/* Prints a syntax error message containing the current token and + given message MESSAGE (if non-null). */ +void +lex_error_valist (struct lexer *lexer, const char *format, va_list args) +{ + lex_next_error_valist (lexer, 0, 0, format, args); } -/* Parses an identifier at the current position into tokstr. - Returns the correct token type. */ -static int -parse_id (struct lexer *lexer) +/* Prints a syntax error message containing the current token and + given message MESSAGE (if non-null). */ +void +lex_next_error (struct lexer *lexer, int n0, int n1, const char *format, ...) 
{ - struct substring rest_of_line - = ss_substr (ds_ss (&lexer->line_buffer), - ds_pointer_to_position (&lexer->line_buffer, lexer->prog), - SIZE_MAX); - struct substring id = ss_head (rest_of_line, - lex_id_get_length (rest_of_line)); - lexer->prog += ss_length (id); + va_list args; - ds_assign_substring (&lexer->tokstr, id); - return lex_id_to_token (id); + va_start (args, format); + lex_next_error_valist (lexer, n0, n1, format, args); + va_end (args); } /* Reports an error to the effect that subcommand SBC may only be @@ -438,36 +287,28 @@ lex_sbc_missing (struct lexer *lexer, const char *sbc) /* Prints a syntax error message containing the current token and given message MESSAGE (if non-null). */ void -lex_error (struct lexer *lexer, const char *message, ...) +lex_next_error_valist (struct lexer *lexer, int n0, int n1, + const char *format, va_list args) { - struct string s; - - ds_init_empty (&s); + struct lex_source *src = lex_source__ (lexer); - if (lexer->token == T_STOP) - ds_put_cstr (&s, _("Syntax error at end of file")); - else if (lexer->token == T_ENDCMD) - ds_put_cstr (&s, _("Syntax error at end of command")); + if (src != NULL) + lex_source_error_valist (src, n0, n1, format, args); else { - char *token_rep = lex_token_representation (lexer); - ds_put_format (&s, _("Syntax error at `%s'"), token_rep); - free (token_rep); - } - - if (message) - { - va_list args; + struct string s; - ds_put_cstr (&s, ": "); - - va_start (args, message); - ds_put_vformat (&s, message, args); - va_end (args); + ds_init_empty (&s); + ds_put_format (&s, _("Syntax error at end of input")); + if (format != NULL) + { + ds_put_cstr (&s, ": "); + ds_put_vformat (&s, format, args); + } + ds_put_byte (&s, '.'); + msg (SE, "%s", ds_cstr (&s)); + ds_destroy (&s); } - - msg (SE, "%s.", ds_cstr (&s)); - ds_destroy (&s); } /* Checks that we're at end of command. @@ -477,7 +318,7 @@ lex_error (struct lexer *lexer, const char *message, ...) 
int lex_end_of_command (struct lexer *lexer) { - if (lexer->token != T_ENDCMD) + if (lex_token (lexer) != T_ENDCMD && lex_token (lexer) != T_STOP) { lex_error (lexer, _("expecting end of command")); return CMD_FAILURE; @@ -492,35 +333,29 @@ lex_end_of_command (struct lexer *lexer) bool lex_is_number (struct lexer *lexer) { - return lexer->token == T_POS_NUM || lexer->token == T_NEG_NUM; + return lex_next_is_number (lexer, 0); } - /* Returns true if the current token is a string. */ bool lex_is_string (struct lexer *lexer) { - return lexer->token == T_STRING; + return lex_next_is_string (lexer, 0); } - /* Returns the value of the current token, which must be a floating point number. */ double lex_number (struct lexer *lexer) { - assert (lex_is_number (lexer)); - return lexer->tokval; + return lex_next_number (lexer, 0); } /* Returns true iff the current token is an integer. */ bool lex_is_integer (struct lexer *lexer) { - return (lex_is_number (lexer) - && lexer->tokval > LONG_MIN - && lexer->tokval <= LONG_MAX - && floor (lexer->tokval) == lexer->tokval); + return lex_next_is_integer (lexer, 0); } /* Returns the value of the current token, which must be an @@ -528,18 +363,70 @@ lex_is_integer (struct lexer *lexer) long lex_integer (struct lexer *lexer) { - assert (lex_is_integer (lexer)); - return lexer->tokval; + return lex_next_integer (lexer, 0); +} + +/* Token testing functions with lookahead. + + A value of 0 for N as an argument to any of these functions refers to the + current token. Lookahead is limited to the current command. Any N greater + than the number of tokens remaining in the current command will be treated + as referring to a T_ENDCMD token. */ + +/* Returns true if the token N ahead of the current token is a number. 
*/ +bool +lex_next_is_number (struct lexer *lexer, int n) +{ + enum token_type next_token = lex_next_token (lexer, n); + return next_token == T_POS_NUM || next_token == T_NEG_NUM; +} + +/* Returns true if the token N ahead of the current token is a string. */ +bool +lex_next_is_string (struct lexer *lexer, int n) +{ + return lex_next_token (lexer, n) == T_STRING; +} + +/* Returns the value of the token N ahead of the current token, which must be a + floating point number. */ +double +lex_next_number (struct lexer *lexer, int n) +{ + assert (lex_next_is_number (lexer, n)); + return lex_next_tokval (lexer, n); +} + +/* Returns true if the token N ahead of the current token is an integer. */ +bool +lex_next_is_integer (struct lexer *lexer, int n) +{ + double value; + + if (!lex_next_is_number (lexer, n)) + return false; + + value = lex_next_tokval (lexer, n); + return value > LONG_MIN && value <= LONG_MAX && floor (value) == value; +} + +/* Returns the value of the token N ahead of the current token, which must be + an integer. */ +long +lex_next_integer (struct lexer *lexer, int n) +{ + assert (lex_next_is_integer (lexer, n)); + return lex_next_tokval (lexer, n); } /* Token matching functions. */ -/* If TOK is the current token, skips it and returns true +/* If the current token has the specified TYPE, skips it and returns true. Otherwise, returns false. */ bool -lex_match (struct lexer *lexer, enum token_type t) +lex_match (struct lexer *lexer, enum token_type type) { - if (lexer->token == t) + if (lex_token (lexer) == type) { lex_get (lexer); return true; @@ -548,25 +435,26 @@ lex_match (struct lexer *lexer, enum token_type t) return false; } -/* If the current token is the identifier S, skips it and returns - true. The identifier may be abbreviated to its first three - letters. - Otherwise, returns false. */ +/* If the current token matches IDENTIFIER, skips it and returns true. + IDENTIFIER may be abbreviated to its first three letters. 
Otherwise, + returns false. + + IDENTIFIER must be an ASCII string. */ bool -lex_match_id (struct lexer *lexer, const char *s) +lex_match_id (struct lexer *lexer, const char *identifier) { - return lex_match_id_n (lexer, s, 3); + return lex_match_id_n (lexer, identifier, 3); } -/* If the current token is the identifier S, skips it and returns - true. The identifier may be abbreviated to its first N - letters. - Otherwise, returns false. */ +/* If the current token is IDENTIFIER, skips it and returns true. IDENTIFIER + may be abbreviated to its first N letters. Otherwise, returns false. + + IDENTIFIER must be an ASCII string. */ bool -lex_match_id_n (struct lexer *lexer, const char *s, size_t n) +lex_match_id_n (struct lexer *lexer, const char *identifier, size_t n) { - if (lexer->token == T_ID - && lex_id_match_n (ss_cstr (s), lex_tokss (lexer), n)) + if (lex_token (lexer) == T_ID + && lex_id_match_n (ss_cstr (identifier), lex_tokss (lexer), n)) { lex_get (lexer); return true; @@ -575,8 +463,8 @@ lex_match_id_n (struct lexer *lexer, const char *s, size_t n) return false; } -/* If the current token is integer N, skips it and returns true. - Otherwise, returns false. */ +/* If the current token is integer X, skips it and returns true. Otherwise, + returns false. */ bool lex_match_int (struct lexer *lexer, int x) { @@ -591,39 +479,41 @@ lex_match_int (struct lexer *lexer, int x) /* Forced matches. */ -/* If this token is identifier S, fetches the next token and returns - nonzero. - Otherwise, reports an error and returns zero. */ +/* If this token is IDENTIFIER, skips it and returns true. IDENTIFIER may be + abbreviated to its first 3 letters. Otherwise, reports an error and returns + false. + + IDENTIFIER must be an ASCII string. 
*/ bool -lex_force_match_id (struct lexer *lexer, const char *s) +lex_force_match_id (struct lexer *lexer, const char *identifier) { - if (lex_match_id (lexer, s)) + if (lex_match_id (lexer, identifier)) return true; else { - lex_error (lexer, _("expecting `%s'"), s); + lex_error (lexer, _("expecting `%s'"), identifier); return false; } } -/* If the current token is T, skips the token. Otherwise, reports an - error and returns from the current function with return value false. */ +/* If the current token has the specified TYPE, skips it and returns true. + Otherwise, reports an error and returns false. */ bool -lex_force_match (struct lexer *lexer, enum token_type t) +lex_force_match (struct lexer *lexer, enum token_type type) { - if (lexer->token == t) + if (lex_token (lexer) == type) { lex_get (lexer); return true; } else { - lex_error (lexer, _("expecting `%s'"), lex_token_name (t)); + lex_error (lexer, _("expecting `%s'"), token_type_to_string (type)); return false; } } -/* If this token is a string, does nothing and returns true. +/* If the current token is a string, does nothing and returns true. Otherwise, reports an error and returns false. */ bool lex_force_string (struct lexer *lexer) @@ -637,7 +527,7 @@ lex_force_string (struct lexer *lexer) } } -/* If this token is an integer, does nothing and returns true. +/* If the current token is an integer, does nothing and returns true. Otherwise, reports an error and returns false. */ bool lex_force_int (struct lexer *lexer) @@ -651,7 +541,7 @@ lex_force_int (struct lexer *lexer) } } -/* If this token is a number, does nothing and returns true. +/* If the current token is a number, does nothing and returns true. Otherwise, reports an error and returns false. */ bool lex_force_num (struct lexer *lexer) @@ -663,710 +553,1081 @@ lex_force_num (struct lexer *lexer) return false; } -/* If this token is an identifier, does nothing and returns true. 
+/* If the current token is an identifier, does nothing and returns true. Otherwise, reports an error and returns false. */ bool lex_force_id (struct lexer *lexer) { - if (lexer->token == T_ID) + if (lex_token (lexer) == T_ID) return true; lex_error (lexer, _("expecting identifier")); return false; } + +/* Token accessors. */ -/* Weird token functions. */ - -/* Returns the likely type of the next token, or 0 if it's hard to tell. */ +/* Returns the type of LEXER's current token. */ enum token_type -lex_look_ahead (struct lexer *lexer) +lex_token (const struct lexer *lexer) { - if (lexer->put_token) - return lexer->put_token; + return lex_next_token (lexer, 0); +} - for (;;) +/* Returns the number in LEXER's current token. + + Only T_NEG_NUM and T_POS_NUM tokens have meaningful values. For other + tokens this function will always return zero. */ +double +lex_tokval (const struct lexer *lexer) +{ + return lex_next_tokval (lexer, 0); +} + +/* Returns the null-terminated string in LEXER's current token, UTF-8 encoded. + + Only T_ID and T_STRING tokens have meaningful strings. For other tokens + this function will always return NULL. + + The UTF-8 encoding of the returned string is correct for variable names and + other identifiers. Use filename_to_utf8() to use it as a filename. Use + data_in() to use it in a "union value". */ +const char * +lex_tokcstr (const struct lexer *lexer) +{ + return lex_next_tokcstr (lexer, 0); +} + +/* Returns the string in LEXER's current token, UTF-8 encoded. The string is + null-terminated (but the null terminator is not included in the returned + substring's 'length'). + + Only T_ID and T_STRING tokens have meaningful strings. For other tokens + this function will always return NULL. + + The UTF-8 encoding of the returned string is correct for variable names and + other identifiers. Use filename_to_utf8() to use it as a filename. Use + data_in() to use it in a "union value".
*/ +struct substring +lex_tokss (const struct lexer *lexer) +{ + return lex_next_tokss (lexer, 0); +} + +/* Looking ahead. + + A value of 0 for N as an argument to any of these functions refers to the + current token. Lookahead is limited to the current command. Any N greater + than the number of tokens remaining in the current command will be treated + as referring to a T_ENDCMD token. */ + +static const struct lex_token * +lex_next__ (const struct lexer *lexer_, int n) +{ + struct lexer *lexer = CONST_CAST (struct lexer *, lexer_); + struct lex_source *src = lex_source__ (lexer); + + if (src != NULL) + return lex_source_next__ (src, n); + else + { + static const struct lex_token stop_token = + { TOKEN_INITIALIZER (T_STOP, 0.0, ""), 0, 0, 0, 0 }; + + return &stop_token; + } +} + +static const struct lex_token * +lex_source_next__ (const struct lex_source *src, int n) +{ + while (deque_count (&src->deque) <= n) { - if (NULL == lexer->prog && ! lex_get_line (lexer) ) - return 0; - - for (;;) - { - while (c_isspace ((unsigned char) *lexer->prog)) - lexer->prog++; - if (*lexer->prog) - break; - - if (lexer->dot) - return T_ENDCMD; - else if (!lex_get_line (lexer)) - return 0; - - if (lexer->put_token) - return lexer->put_token; - } - - switch (toupper ((unsigned char) *lexer->prog)) + if (!deque_is_empty (&src->deque)) { - case 'X': case 'B': case 'O': - if (lexer->prog[1] == '\'' || lexer->prog[1] == '"') - return T_STRING; - /* Fall through */ + struct lex_token *front; - case '-': - return T_DASH; + front = &src->tokens[deque_front (&src->deque, 0)]; + if (front->token.type == T_STOP || front->token.type == T_ENDCMD) + return front; + } + + lex_source_get__ (src); + } + + return &src->tokens[deque_back (&src->deque, n)]; +} + +/* Returns the "struct token" of the token N after the current one in LEXER. 
+ The returned pointer can be invalidated by pretty much any succeeding call + into the lexer, although the string pointer within the returned token is + only invalidated by consuming the token (e.g. with lex_get()). */ +const struct token * +lex_next (const struct lexer *lexer, int n) +{ + return &lex_next__ (lexer, n)->token; +} + +/* Returns the type of the token N after the current one in LEXER. */ +enum token_type +lex_next_token (const struct lexer *lexer, int n) +{ + return lex_next (lexer, n)->type; +} + +/* Returns the number in the token N after the current one in LEXER. + + Only T_NEG_NUM and T_POS_NUM tokens have meaningful values. For other + tokens this function will always return zero. */ +double +lex_next_tokval (const struct lexer *lexer, int n) +{ + const struct token *token = lex_next (lexer, n); + return token->number; +} + +/* Returns the null-terminated string in the token N after the current one, in + UTF-8 encoding. + + Only T_ID and T_STRING tokens have meaningful strings. For other tokens + this function will always return NULL. + + The UTF-8 encoding of the returned string is correct for variable names and + other identifiers. Use filename_to_utf8() to use it as a filename. Use + data_in() to use it in a "union value". */ +const char * +lex_next_tokcstr (const struct lexer *lexer, int n) +{ + return lex_next_tokss (lexer, n).string; +} + +/* Returns the string in the token N after the current one, in UTF-8 encoding. + The string is null-terminated (but the null terminator is not included in + the returned substring's 'length'). + + Only T_ID and T_STRING tokens have meaningful strings. For other tokens + this function will always return NULL. + + The UTF-8 encoding of the returned string is correct for variable names and + other identifiers. Use filename_to_utf8() to use it as a filename. Use + data_in() to use it in a "union value".
*/ +struct substring +lex_next_tokss (const struct lexer *lexer, int n) +{ + return lex_next (lexer, n)->string; +} - case '.': - case '0': case '1': case '2': case '3': case '4': - case '5': case '6': case '7': case '8': case '9': - return T_POS_NUM; +/* If LEXER is positioned at the (pseudo)identifier S, skips it and returns + true. Otherwise, returns false. - case '\'': case '"': - return T_STRING; + S may consist of an arbitrary number of identifiers, integers, and + punctuation e.g. "KRUSKAL-WALLIS", "2SLS", or "END INPUT PROGRAM". + Identifiers may be abbreviated to their first three letters. Currently only + hyphens, slashes, and equals signs are supported as punctuation (but it + would be easy to add more). - case '+': - return T_PLUS; + S must be an ASCII string. */ +bool +lex_match_phrase (struct lexer *lexer, const char *s) +{ + int tok_idx; + + for (tok_idx = 0; ; tok_idx++) + { + enum token_type token; + unsigned char c; + + while (c_isspace (*s)) + s++; + + c = *s; + if (c == '\0') + { + int i; + + for (i = 0; i < tok_idx; i++) + lex_get (lexer); + return true; + } + + token = lex_next_token (lexer, tok_idx); + switch (c) + { + case '-': + if (token != T_DASH) + return false; + s++; + break; case '/': - return T_SLASH; + if (token != T_SLASH) + return false; + s++; + break; case '=': - return T_EQUALS; + if (token != T_EQUALS) + return false; + s++; + break; - case '(': - return T_LPAREN; + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + { + unsigned int value; - case ')': - return T_RPAREN; + if (token != T_POS_NUM) + return false; - case '[': - return T_LBRACK; + value = 0; + do + { + value = value * 10 + (*s++ - '0'); + } + while (c_isdigit (*s)); - case ']': - return T_RBRACK; + if (lex_next_tokval (lexer, tok_idx) != value) + return false; + } + break; - case ',': - return T_COMMA; + default: + if (lex_is_id1 (c)) + { + int len; - case '*': - return lexer->prog[1] == '*' ? 
T_EXP : T_ASTERISK; + if (token != T_ID) + return false; - case '<': - return (lexer->prog[1] == '=' ? T_LE - : lexer->prog[1] == '>' ? T_NE - : T_LT); + len = lex_id_get_length (ss_cstr (s)); + if (!lex_id_match (ss_buffer (s, len), + lex_next_tokss (lexer, tok_idx))) + return false; - case '>': - return lexer->prog[1] == '=' ? T_GE : T_GT; + s += len; + } + else + NOT_REACHED (); + } + } +} - case '~': - return lexer->prog[1] == '=' ? T_NE : T_NOT; +static int +lex_source_get_first_line_number (const struct lex_source *src, int n) +{ + return lex_source_next__ (src, n)->first_line; +} - case '&': - return T_AND; +static int +count_newlines (char *s, size_t length) +{ + int n_newlines = 0; + char *newline; - case '|': - return T_OR; + while ((newline = memchr (s, '\n', length)) != NULL) + { + n_newlines++; + length -= (newline + 1) - s; + s = newline + 1; + } - default: - if (lex_is_id1 (*lexer->prog)) - return T_ID; - return 0; + return n_newlines; +} + +static int +lex_source_get_last_line_number (const struct lex_source *src, int n) +{ + const struct lex_token *token = lex_source_next__ (src, n); + + if (token->first_line == 0) + return 0; + else + { + char *token_str = &src->buffer[token->token_pos - src->tail]; + return token->first_line + count_newlines (token_str, token->token_len) + 1; + } +} + +static int +count_columns (const char *s_, size_t length) +{ + const uint8_t *s = CHAR_CAST (const uint8_t *, s_); + int columns; + size_t ofs; + int mblen; + + columns = 0; + for (ofs = 0; ofs < length; ofs += mblen) + { + ucs4_t uc; + + mblen = u8_mbtouc (&uc, s + ofs, length - ofs); + if (uc != '\t') + { + int width = uc_width (uc, "UTF-8"); + if (width > 0) + columns += width; } + else + columns = ROUND_UP (columns + 1, 8); } + + return columns + 1; } -/* Makes the current token become the next token to be read; the - current token is set to T. 
*/ -void -lex_put_back (struct lexer *lexer, enum token_type t) +static int +lex_source_get_first_column (const struct lex_source *src, int n) { - save_token (lexer); - lexer->token = t; + const struct lex_token *token = lex_source_next__ (src, n); + return count_columns (&src->buffer[token->line_pos - src->tail], + token->token_pos - token->line_pos); } - -/* Weird line processing functions. */ -/* Returns the entire contents of the current line. */ -const char * -lex_entire_line (const struct lexer *lexer) +static int +lex_source_get_last_column (const struct lex_source *src, int n) { - return ds_cstr (&lexer->line_buffer); + const struct lex_token *token = lex_source_next__ (src, n); + char *start, *end, *newline; + + start = &src->buffer[token->line_pos - src->tail]; + end = &src->buffer[(token->token_pos + token->token_len) - src->tail]; + newline = memrchr (start, '\n', end - start); + if (newline != NULL) + start = newline + 1; + return count_columns (start, end - start); } -const struct string * -lex_entire_line_ds (const struct lexer *lexer) +/* Returns the 1-based line number of the start of the syntax that represents + the token N after the current one in LEXER. Returns 0 for a T_STOP token or + if the token is drawn from a source that does not have line numbers. */ +int +lex_get_first_line_number (const struct lexer *lexer, int n) { - return &lexer->line_buffer; + const struct lex_source *src = lex_source__ (lexer); + return src != NULL ? lex_source_get_first_line_number (src, n) : 0; } -/* As lex_entire_line(), but only returns the part of the current line - that hasn't already been tokenized. */ +/* Returns the 1-based line number of the end of the syntax that represents the + token N after the current one in LEXER, plus 1. Returns 0 for a T_STOP + token or if the token is drawn from a source that does not have line + numbers. 
+ + Most of the time, a single token is wholly within a single line of syntax, + but there are two exceptions: a T_STRING token can be made up of multiple + segments on adjacent lines connected with "+" punctuators, and a T_NEG_NUM + token can consist of a "-" on one line followed by the number on the next. + */ +int +lex_get_last_line_number (const struct lexer *lexer, int n) +{ + const struct lex_source *src = lex_source__ (lexer); + return src != NULL ? lex_source_get_last_line_number (src, n) : 0; +} + +/* Returns the 1-based column number of the start of the syntax that represents + the token N after the current one in LEXER. Returns 0 for a T_STOP + token. + + Column numbers are measured according to the width of characters as shown in + a typical fixed-width font, in which CJK characters have width 2 and + combining characters have width 0. */ +int +lex_get_first_column (const struct lexer *lexer, int n) +{ + const struct lex_source *src = lex_source__ (lexer); + return src != NULL ? lex_source_get_first_column (src, n) : 0; +} + +/* Returns the 1-based column number of the end of the syntax that represents + the token N after the current one in LEXER, plus 1. Returns 0 for a T_STOP + token. + + Column numbers are measured according to the width of characters as shown in + a typical fixed-width font, in which CJK characters have width 2 and + combining characters have width 0. */ +int +lex_get_last_column (const struct lexer *lexer, int n) +{ + const struct lex_source *src = lex_source__ (lexer); + return src != NULL ? lex_source_get_last_column (src, n) : 0; +} + +/* Returns the name of the syntax file from which the current command is drawn. + Returns NULL for a T_STOP token or if the command's source does not have + line numbers. + + There is no version of this function that takes an N argument because + lookahead only works to the end of a command and any given command is always + within a single syntax file. 
*/ const char * -lex_rest_of_line (const struct lexer *lexer) +lex_get_file_name (const struct lexer *lexer) { - return lexer->prog; + struct lex_source *src = lex_source__ (lexer); + return src == NULL ? NULL : src->reader->file_name; } -/* Returns true if the current line ends in a terminal dot, - false otherwise. */ -bool -lex_end_dot (const struct lexer *lexer) +/* Returns the syntax mode for the syntax file from which the current command is + drawn. Returns LEX_SYNTAX_AUTO for a T_STOP token or if the command's + source does not have line numbers. + + There is no version of this function that takes an N argument because + lookahead only works to the end of a command and any given command is always + within a single syntax file. */ +enum lex_syntax_mode +lex_get_syntax_mode (const struct lexer *lexer) { - return lexer->dot; + struct lex_source *src = lex_source__ (lexer); + return src == NULL ? LEX_SYNTAX_AUTO : src->reader->syntax; } -/* Causes the rest of the current input line to be ignored for - tokenization purposes. */ -void -lex_discard_line (struct lexer *lexer) +/* Returns the error mode for the syntax file from which the current command is + drawn. Returns LEX_ERROR_INTERACTIVE for a T_STOP token or if the command's + source does not have line numbers. + + There is no version of this function that takes an N argument because + lookahead only works to the end of a command and any given command is always + within a single syntax file. */ +enum lex_error_mode +lex_get_error_mode (const struct lexer *lexer) { - ds_cstr (&lexer->line_buffer); /* Ensures ds_end points to something valid */ - lexer->prog = ds_end (&lexer->line_buffer); - lexer->dot = false; - lexer->put_token = 0; + struct lex_source *src = lex_source__ (lexer); + return src == NULL ?
LEX_ERROR_INTERACTIVE : src->reader->error; } +/* If the source that LEXER is currently reading has error mode + LEX_ERROR_INTERACTIVE, discards all buffered input and tokens, so that the + next token to be read comes directly from whatever is next read from the + stream. -/* Discards the rest of the current command. - When we're reading commands from a file, we skip tokens until - a terminal dot or EOF. - When we're reading commands interactively from the user, - that's just discarding the current line, because presumably - the user doesn't want to finish typing a command that will be - ignored anyway. */ + It makes sense to call this function after encountering an error in a + command entered on the console, because usually the user would prefer not to + have cascading errors. */ +void +lex_interactive_reset (struct lexer *lexer) +{ + struct lex_source *src = lex_source__ (lexer); + if (src != NULL && src->reader->error == LEX_ERROR_INTERACTIVE) + { + src->head = src->tail = 0; + src->journal_pos = src->seg_pos = src->line_pos = 0; + src->n_newlines = 0; + src->suppress_next_newline = false; + segmenter_init (&src->segmenter, segmenter_get_mode (&src->segmenter)); + while (!deque_is_empty (&src->deque)) + lex_source_pop__ (src); + lex_source_push_endcmd__ (src); + } +} + +/* Advances past any tokens in LEXER up to a T_ENDCMD or T_STOP. */ void lex_discard_rest_of_command (struct lexer *lexer) { - if (!getl_is_interactive (lexer->ss)) + while (lex_token (lexer) != T_STOP && lex_token (lexer) != T_ENDCMD) + lex_get (lexer); +} + +/* Discards all lookahead tokens in LEXER, then discards all input sources + until it encounters one with error mode LEX_ERROR_INTERACTIVE or until it + runs out of input sources. 
*/ +void +lex_discard_noninteractive (struct lexer *lexer) +{ + struct lex_source *src = lex_source__ (lexer); + + if (src != NULL) { - while (lexer->token != T_STOP && lexer->token != T_ENDCMD) - lex_get (lexer); + while (!deque_is_empty (&src->deque)) + lex_source_pop__ (src); + + for (; src != NULL && src->reader->error != LEX_ERROR_INTERACTIVE; + src = lex_source__ (lexer)) + lex_source_destroy (src); } - else - lex_discard_line (lexer); } -/* Weird line reading functions. */ +static size_t +lex_source_max_tail__ (const struct lex_source *src) +{ + const struct lex_token *token; + size_t max_tail; + + assert (src->seg_pos >= src->line_pos); + max_tail = MIN (src->journal_pos, src->line_pos); + + /* Use the oldest token also. (We know that src->deque cannot be empty + because we are in the process of adding a new token, which is already + initialized enough to use here.) */ + token = &src->tokens[deque_back (&src->deque, 0)]; + assert (token->token_pos >= token->line_pos); + max_tail = MIN (max_tail, token->line_pos); + + return max_tail; +} -/* Remove C-style comments in STRING, begun by slash-star and - terminated by star-slash or newline. */ static void -strip_comments (struct string *string) +lex_source_expand__ (struct lex_source *src) { - char *cp; - int quote; - bool in_comment; - - in_comment = false; - quote = EOF; - for (cp = ds_cstr (string); *cp; ) + if (src->head - src->tail >= src->allocated) { - /* If we're not in a comment, check for quote marks. */ - if (!in_comment) + size_t max_tail = lex_source_max_tail__ (src); + if (max_tail > src->tail) { - if (*cp == quote) - quote = EOF; - else if (*cp == '\'' || *cp == '"') - quote = *cp; + /* Advance the tail, freeing up room at the head. */ + memmove (src->buffer, src->buffer + (max_tail - src->tail), + src->head - max_tail); + src->tail = max_tail; } - - /* If we're not inside a quotation, check for comment. 
*/ - if (quote == EOF) + else { - if (cp[0] == '/' && cp[1] == '*') - { - in_comment = true; - *cp++ = ' '; - *cp++ = ' '; - continue; - } - else if (in_comment && cp[0] == '*' && cp[1] == '/') - { - in_comment = false; - *cp++ = ' '; - *cp++ = ' '; - continue; - } + /* Buffer is completely full. Expand it. */ + src->buffer = x2realloc (src->buffer, &src->allocated); } - - /* Check commenting. */ - if (in_comment) - *cp = ' '; - cp++; } -} - -/* Prepares LINE, which is subject to the given SYNTAX rules, for - tokenization by stripping comments and determining whether it - is the beginning or end of a command and storing into - *LINE_STARTS_COMMAND and *LINE_ENDS_COMMAND appropriately. */ -void -lex_preprocess_line (struct string *line, - enum syntax_mode syntax, - bool *line_starts_command, - bool *line_ends_command) -{ - strip_comments (line); - ds_rtrim (line, ss_cstr (CC_SPACES)); - *line_ends_command = ds_chomp_byte (line, '.') || ds_is_empty (line); - *line_starts_command = false; - if (syntax == GETL_BATCH) + else { - int first = ds_first (line); - *line_starts_command = !c_isspace (first); - if (first == '+' || first == '-') - *ds_data (line) = ' '; + /* There's space available at the head of the buffer. Nothing to do. */ } } -/* Reads a line, without performing any preprocessing. */ -bool -lex_get_line_raw (struct lexer *lexer) +static void +lex_source_read__ (struct lex_source *src) { - bool ok = getl_read_line (lexer->ss, &lexer->line_buffer); - if (ok) + do { - const char *line = ds_cstr (&lexer->line_buffer); - text_item_submit (text_item_create (TEXT_ITEM_SYNTAX, line)); + size_t head_ofs; + size_t n; + + lex_source_expand__ (src); + + head_ofs = src->head - src->tail; + n = src->reader->class->read (src->reader, &src->buffer[head_ofs], + src->allocated - head_ofs, + segmenter_get_prompt (&src->segmenter)); + if (n == 0) + { + /* End of input. 
+ + Ensure that the input always ends in a new-line followed by a null + byte, as required by the segmenter library. */ + + if (src->head == src->tail + || src->buffer[src->head - src->tail - 1] != '\n') + src->buffer[src->head++ - src->tail] = '\n'; + + lex_source_expand__ (src); + src->buffer[src->head++ - src->tail] = '\0'; + + return; + } + + src->head += n; } - else - lexer->prog = NULL; - return ok; + while (!memchr (&src->buffer[src->seg_pos - src->tail], '\n', + src->head - src->seg_pos)); } -/* Reads a line for use by the tokenizer, and preprocesses it by - removing comments, stripping trailing whitespace and the - terminal dot, and removing leading indentors. */ -bool -lex_get_line (struct lexer *lexer) +static struct lex_source * +lex_source__ (const struct lexer *lexer) +{ + return (ll_is_empty (&lexer->sources) ? NULL + : ll_data (ll_head (&lexer->sources), struct lex_source, ll)); +} + +static struct substring +lex_source_get_syntax__ (const struct lex_source *src, int n0, int n1) { - bool line_starts_command; + const struct lex_token *token0 = lex_source_next__ (src, n0); + const struct lex_token *token1 = lex_source_next__ (src, MAX (n0, n1)); + size_t start = token0->token_pos; + size_t end = token1->token_pos + token1->token_len; - if (!lex_get_line_raw (lexer)) - return false; + return ss_buffer (&src->buffer[start - src->tail], end - start); +} - lex_preprocess_line (&lexer->line_buffer, - lex_current_syntax_mode (lexer), - &line_starts_command, &lexer->dot); +static void +lex_ellipsize__ (struct substring in, char *out, size_t out_size) +{ + size_t out_maxlen; + size_t out_len; + int mblen; - if (line_starts_command) - lexer->put_token = T_ENDCMD; + assert (out_size >= 16); + out_maxlen = out_size - (in.length >= out_size ? 
3 : 0) - 1; + for (out_len = 0; out_len < in.length; out_len += mblen) + { + if (in.string[out_len] == '\n' + || (in.string[out_len] == '\r' + && out_len + 1 < in.length + && in.string[out_len + 1] == '\n')) + break; + + mblen = u8_mblen (CHAR_CAST (const uint8_t *, in.string + out_len), + in.length - out_len); + if (out_len + mblen > out_maxlen) + break; + } - lexer->prog = ds_cstr (&lexer->line_buffer); - return true; + memcpy (out, in.string, out_len); + strcpy (&out[out_len], out_len < in.length ? "..." : ""); } - -/* Token names. */ -/* Returns the name of a token. */ -const char * -lex_token_name (enum token_type token) +static void +lex_source_error_valist (struct lex_source *src, int n0, int n1, + const char *format, va_list args) { - switch (token) - { - case T_ID: - case T_POS_NUM: - case T_NEG_NUM: - case T_STRING: - case TOKEN_N_TYPES: - NOT_REACHED (); + const struct lex_token *token; + struct string s; + struct msg m; - case T_STOP: - return ""; + ds_init_empty (&s); - case T_ENDCMD: - return "."; + token = lex_source_next__ (src, n0); + if (token->token.type == T_ENDCMD) + ds_put_cstr (&s, _("Syntax error at end of command")); + else + { + struct substring syntax = lex_source_get_syntax__ (src, n0, n1); + if (!ss_is_empty (syntax)) + { + char syntax_cstr[64]; - case T_PLUS: - return "+"; + lex_ellipsize__ (syntax, syntax_cstr, sizeof syntax_cstr); + ds_put_format (&s, _("Syntax error at `%s'"), syntax_cstr); + } + else + ds_put_cstr (&s, _("Syntax error")); + } - case T_DASH: - return "-"; + if (format) + { + ds_put_cstr (&s, ": "); + ds_put_vformat (&s, format, args); + } + ds_put_byte (&s, '.'); + + m.category = MSG_C_SYNTAX; + m.severity = MSG_S_ERROR; + m.file_name = src->reader->file_name; + m.first_line = lex_source_get_first_line_number (src, n0); + m.last_line = lex_source_get_last_line_number (src, n1); + m.first_column = lex_source_get_first_column (src, n0); + m.last_column = lex_source_get_last_column (src, n1); + m.text = ds_steal_cstr 
(&s); + msg_emit (&m); +} - case T_ASTERISK: - return "*"; +static void PRINTF_FORMAT (2, 3) +lex_get_error (struct lex_source *src, const char *format, ...) +{ + va_list args; + int n; - case T_SLASH: - return "/"; + va_start (args, format); - case T_EQUALS: - return "="; + n = deque_count (&src->deque) - 1; + lex_source_error_valist (src, n, n, format, args); + lex_source_pop_front (src); - case T_LPAREN: - return "("; + va_end (args); +} - case T_RPAREN: - return ")"; +static bool +lex_source_get__ (const struct lex_source *src_) +{ + struct lex_source *src = CONST_CAST (struct lex_source *, src_); - case T_LBRACK: - return "["; + struct state + { + struct segmenter segmenter; + enum segment_type last_segment; + int newlines; + size_t line_pos; + size_t seg_pos; + }; + + struct state state, saved; + enum scan_result result; + struct scanner scanner; + struct lex_token *token; + int n_lines; + int i; + + if (src->eof) + return false; - case T_RBRACK: - return "]"; + state.segmenter = src->segmenter; + state.newlines = 0; + state.seg_pos = src->seg_pos; + state.line_pos = src->line_pos; + saved = state; + + token = lex_push_token__ (src); + scanner_init (&scanner, &token->token); + token->line_pos = src->line_pos; + token->token_pos = src->seg_pos; + if (src->reader->line_number > 0) + token->first_line = src->reader->line_number + src->n_newlines; + else + token->first_line = 0; - case T_COMMA: - return ","; + for (;;) + { + enum segment_type type; + const char *segment; + size_t seg_maxlen; + int seg_len; + + segment = &src->buffer[state.seg_pos - src->tail]; + seg_maxlen = src->head - state.seg_pos; + seg_len = segmenter_push (&state.segmenter, segment, seg_maxlen, &type); + if (seg_len < 0) + { + lex_source_read__ (src); + continue; + } - case T_AND: - return "AND"; + state.last_segment = type; + state.seg_pos += seg_len; + if (type == SEG_NEWLINE) + { + state.newlines++; + state.line_pos = state.seg_pos; + } - case T_OR: - return "OR"; + result = scanner_push 
(&scanner, type, ss_buffer (segment, seg_len), + &token->token); + if (result == SCAN_SAVE) + saved = state; + else if (result == SCAN_BACK) + { + state = saved; + break; + } + else if (result == SCAN_DONE) + break; + } - case T_NOT: - return "NOT"; + n_lines = state.newlines; + if (state.last_segment == SEG_END_COMMAND && !src->suppress_next_newline) + { + n_lines++; + src->suppress_next_newline = true; + } + else if (n_lines > 0 && src->suppress_next_newline) + { + n_lines--; + src->suppress_next_newline = false; + } + for (i = 0; i < n_lines; i++) + { + const char *newline; + const char *line; + size_t line_len; - case T_EQ: - return "EQ"; + line = &src->buffer[src->journal_pos - src->tail]; + newline = rawmemchr (line, '\n'); + line_len = newline - line; + if (line_len > 0 && line[line_len - 1] == '\r') + line_len--; - case T_GE: - return ">="; + text_item_submit (text_item_create_nocopy (TEXT_ITEM_SYNTAX, + xmemdup0 (line, line_len))); - case T_GT: - return ">"; + src->journal_pos += newline - line + 1; + } - case T_LE: - return "<="; + token->token_len = state.seg_pos - src->seg_pos; - case T_LT: - return "<"; + src->segmenter = state.segmenter; + src->seg_pos = state.seg_pos; + src->line_pos = state.line_pos; + src->n_newlines += state.newlines; - case T_NE: - return "~="; + switch (token->token.type) + { + default: + break; - case T_ALL: - return "ALL"; + case T_STOP: + token->token.type = T_ENDCMD; + src->eof = true; + break; - case T_BY: - return "BY"; + case SCAN_BAD_HEX_LENGTH: + lex_get_error (src, _("String of hex digits has %d characters, which " + "is not a multiple of 2"), + (int) token->token.number); + break; - case T_TO: - return "TO"; + case SCAN_BAD_HEX_DIGIT: + case SCAN_BAD_UNICODE_DIGIT: + lex_get_error (src, _("`%c' is not a valid hex digit"), + (int) token->token.number); + break; - case T_WITH: - return "WITH"; + case SCAN_BAD_UNICODE_LENGTH: + lex_get_error (src, _("Unicode string contains %d bytes, which is " + "not in the valid range 
of 1 to 8 bytes"), + (int) token->token.number); + break; - case T_EXP: - return "**"; - } + case SCAN_BAD_UNICODE_CODE_POINT: + lex_get_error (src, _("U+%04X is not a valid Unicode code point"), + (int) token->token.number); + break; - NOT_REACHED (); -} + case SCAN_EXPECTED_QUOTE: + lex_get_error (src, _("Unterminated string constant")); + break; -/* Returns an ASCII representation of the current token as a - malloc()'d string. */ -char * -lex_token_representation (struct lexer *lexer) -{ - char *token_rep; + case SCAN_EXPECTED_EXPONENT: + lex_get_error (src, _("Missing exponent following `%s'"), + token->token.string.string); + break; - switch (lexer->token) - { - case T_ID: - case T_POS_NUM: - case T_NEG_NUM: - return ss_xstrdup (lex_tokss (lexer)); + case SCAN_UNEXPECTED_DOT: + lex_get_error (src, _("Unexpected `.' in middle of command")); + break; - case T_STRING: + case SCAN_UNEXPECTED_CHAR: { - struct substring ss; - int hexstring = 0; - char *sp, *dp; - - ss = lex_tokss (lexer); - for (sp = ss_data (ss); sp < ss_end (ss); sp++) - if (!c_isprint ((unsigned char) *sp)) - { - hexstring = 1; - break; - } - - token_rep = xmalloc (2 + ss_length (ss) * 2 + 1 + 1); - - dp = token_rep; - if (hexstring) - *dp++ = 'X'; - *dp++ = '\''; - - for (sp = ss_data (ss); sp < ss_end (ss); sp++) - if (!hexstring) - { - if (*sp == '\'') - *dp++ = '\''; - *dp++ = (unsigned char) *sp; - } - else - { - *dp++ = (((unsigned char) *sp) >> 4)["0123456789ABCDEF"]; - *dp++ = (((unsigned char) *sp) & 15)["0123456789ABCDEF"]; - } - *dp++ = '\''; - *dp = '\0'; - - return token_rep; + char c_name[16]; + lex_get_error (src, _("Bad character %s in input"), + uc_name (token->token.number, c_name)); } + break; - default: - return xstrdup (lex_token_name (lexer->token)); + case SCAN_SKIP: + lex_source_pop_front (src); + break; } + + return true; } -/* Really weird functions. 
*/ +static void +lex_source_push_endcmd__ (struct lex_source *src) +{ + struct lex_token *token = lex_push_token__ (src); + token->token.type = T_ENDCMD; + token->token_pos = 0; + token->token_len = 0; + token->line_pos = 0; + token->first_line = 0; +} -/* Skip a COMMENT command. */ -void -lex_skip_comment (struct lexer *lexer) +static struct lex_source * +lex_source_create (struct lex_reader *reader) { - for (;;) - { - if (!lex_get_line (lexer)) - { - lexer->put_token = T_STOP; - lexer->prog = NULL; - return; - } + struct lex_source *src; + enum segmenter_mode mode; + + src = xzalloc (sizeof *src); + src->reader = reader; + + if (reader->syntax == LEX_SYNTAX_AUTO) + mode = SEG_MODE_AUTO; + else if (reader->syntax == LEX_SYNTAX_INTERACTIVE) + mode = SEG_MODE_INTERACTIVE; + else if (reader->syntax == LEX_SYNTAX_BATCH) + mode = SEG_MODE_BATCH; + else + NOT_REACHED (); + segmenter_init (&src->segmenter, mode); - if (lexer->put_token == T_ENDCMD) - break; + src->tokens = deque_init (&src->deque, 4, sizeof *src->tokens); - ds_cstr (&lexer->line_buffer); /* Ensures ds_end will point to a valid char */ - lexer->prog = ds_end (&lexer->line_buffer); - if (lexer->dot) - break; - } + lex_source_push_endcmd__ (src); + + return src; } - -/* Private functions. */ -/* When invoked, tokstr contains a string of binary, octal, or - hex digits, according to TYPE. The string is converted to - characters having the specified values. 
*/ static void -convert_numeric_string_to_char_string (struct lexer *lexer, - enum string_type type) +lex_source_destroy (struct lex_source *src) +{ + char *file_name = src->reader->file_name; + if (src->reader->class->close != NULL) + src->reader->class->close (src->reader); + free (file_name); + free (src->buffer); + while (!deque_is_empty (&src->deque)) + lex_source_pop__ (src); + free (src->tokens); + ll_remove (&src->ll); + free (src); +} + +struct lex_file_reader + { + struct lex_reader reader; + struct u8_istream *istream; + char *file_name; + }; + +static struct lex_reader_class lex_file_reader_class; + +/* Creates and returns a new lex_reader that will read from file FILE_NAME (or + from stdin if FILE_NAME is "-"). The file is expected to be encoded with + ENCODING, which should take one of the forms accepted by + u8_istream_for_file(). SYNTAX and ERROR become the syntax mode and error + mode of the new reader, respectively. + + Returns a null pointer if FILE_NAME cannot be opened. */ +struct lex_reader * +lex_reader_for_file (const char *file_name, const char *encoding, + enum lex_syntax_mode syntax, + enum lex_error_mode error) { - const char *base_name; - int base; - int chars_per_byte; - size_t byte_cnt; - size_t i; - char *p; + struct lex_file_reader *r; + struct u8_istream *istream; - switch (type) + istream = (!strcmp(file_name, "-") + ? 
u8_istream_for_fd (encoding, STDIN_FILENO) + : u8_istream_for_file (encoding, file_name, O_RDONLY)); + if (istream == NULL) { - case BINARY_STRING: - base_name = _("binary"); - base = 2; - chars_per_byte = 8; - break; - case OCTAL_STRING: - base_name = _("octal"); - base = 8; - chars_per_byte = 3; - break; - case HEX_STRING: - base_name = _("hex"); - base = 16; - chars_per_byte = 2; - break; - default: - NOT_REACHED (); + msg (ME, _("Opening `%s': %s."), file_name, strerror (errno)); + return NULL; } - byte_cnt = ds_length (&lexer->tokstr) / chars_per_byte; - if (ds_length (&lexer->tokstr) % chars_per_byte) - msg (SE, _("String of %s digits has %zu characters, which is not a " - "multiple of %d."), - base_name, ds_length (&lexer->tokstr), chars_per_byte); + r = xmalloc (sizeof *r); + lex_reader_init (&r->reader, &lex_file_reader_class); + r->reader.syntax = syntax; + r->reader.error = error; + r->reader.file_name = xstrdup (file_name); + r->reader.line_number = 1; + r->istream = istream; + r->file_name = xstrdup (file_name); - p = ds_cstr (&lexer->tokstr); - for (i = 0; i < byte_cnt; i++) + return &r->reader; +} + +static struct lex_file_reader * +lex_file_reader_cast (struct lex_reader *r) +{ + return UP_CAST (r, struct lex_file_reader, reader); +} + +static size_t +lex_file_read (struct lex_reader *r_, char *buf, size_t n, + enum prompt_style prompt_style UNUSED) +{ + struct lex_file_reader *r = lex_file_reader_cast (r_); + ssize_t n_read = u8_istream_read (r->istream, buf, n); + if (n_read < 0) { - int value; - int j; - - value = 0; - for (j = 0; j < chars_per_byte; j++, p++) - { - int v; - - if (*p >= '0' && *p <= '9') - v = *p - '0'; - else - { - static const char alpha[] = "abcdef"; - const char *q = strchr (alpha, tolower ((unsigned char) *p)); - - if (q) - v = q - alpha + 10; - else - v = base; - } - - if (v >= base) - msg (SE, _("`%c' is not a valid %s digit."), *p, base_name); - - value = value * base + v; - } - - ds_cstr (&lexer->tokstr)[i] = (unsigned 
char) value; + msg (ME, _("Error reading `%s': %s."), r->file_name, strerror (errno)); + return 0; } - - ds_truncate (&lexer->tokstr, byte_cnt); + return n_read; } -/* Parses a string from the input buffer into tokstr. The input - buffer pointer lexer->prog must point to the initial single or double - quote. TYPE indicates the type of string to be parsed. - Returns token type. */ -static int -parse_string (struct lexer *lexer, enum string_type type) +static void +lex_file_close (struct lex_reader *r_) { - if (type != CHARACTER_STRING) - lexer->prog++; + struct lex_file_reader *r = lex_file_reader_cast (r_); - /* Accumulate the entire string, joining sections indicated by + - signs. */ - for (;;) + if (u8_istream_fileno (r->istream) != STDIN_FILENO) { - /* Single or double quote. */ - int c = *lexer->prog++; - - /* Accumulate section. */ - for (;;) - { - /* Check end of line. */ - if (*lexer->prog == '\0') - { - msg (SE, _("Unterminated string constant.")); - goto finish; - } - - /* Double quote characters to embed them in strings. */ - if (*lexer->prog == c) - { - if (lexer->prog[1] == c) - lexer->prog++; - else - break; - } - - ds_put_byte (&lexer->tokstr, *lexer->prog++); - } - lexer->prog++; - - /* Skip whitespace after final quote mark. */ - if (lexer->prog == NULL) - break; - for (;;) - { - while (c_isspace ((unsigned char) *lexer->prog)) - lexer->prog++; - if (*lexer->prog) - break; - - if (lexer->dot) - goto finish; - - if (!lex_get_line (lexer)) - goto finish; - } - - /* Skip plus sign. */ - if (*lexer->prog != '+') - break; - lexer->prog++; - - /* Skip whitespace after plus sign. */ - if (lexer->prog == NULL) - break; - for (;;) - { - while (c_isspace ((unsigned char) *lexer->prog)) - lexer->prog++; - if (*lexer->prog) - break; - - if (lexer->dot) - goto finish; - - if (!lex_get_line (lexer)) - { - msg (SE, _("Unexpected end of file in string concatenation.")); - goto finish; - } - } - - /* Ensure that a valid string follows. 
*/ - if (*lexer->prog != '\'' && *lexer->prog != '"') - { - msg (SE, _("String expected following `+'.")); - goto finish; - } + if (u8_istream_close (r->istream) != 0) + msg (ME, _("Error closing `%s': %s."), r->file_name, strerror (errno)); } + else + u8_istream_free (r->istream); - /* We come here when we've finished concatenating all the string sections - into one large string. */ -finish: - if (type != CHARACTER_STRING) - convert_numeric_string_to_char_string (lexer, type); - - return T_STRING; + free (r->file_name); + free (r); } + +static struct lex_reader_class lex_file_reader_class = + { + lex_file_read, + lex_file_close + }; -/* Token Accessor Functions */ +struct lex_string_reader + { + struct lex_reader reader; + struct substring s; + size_t offset; + }; -enum token_type -lex_token (const struct lexer *lexer) +static struct lex_reader_class lex_string_reader_class; + +/* Creates and returns a new lex_reader for the contents of S, which must be + encoded in UTF-8. The new reader takes ownership of S and will free it + with ss_dealloc() when it is closed. */ +struct lex_reader * +lex_reader_for_substring_nocopy (struct substring s) { - return lexer->token; + struct lex_string_reader *r; + + r = xmalloc (sizeof *r); + lex_reader_init (&r->reader, &lex_string_reader_class); + r->reader.syntax = LEX_SYNTAX_INTERACTIVE; + r->s = s; + r->offset = 0; + + return &r->reader; } -double -lex_tokval (const struct lexer *lexer) +/* Creates and returns a new lex_reader for a copy of null-terminated string S, + which must be encoded in UTF-8. The caller retains ownership of S. */ +struct lex_reader * +lex_reader_for_string (const char *s) { - return lexer->tokval; + struct substring ss; + ss_alloc_substring (&ss, ss_cstr (s)); + return lex_reader_for_substring_nocopy (ss); } -/* Returns the null-terminated string value associated with LEXER's current - token. For a T_ID token, this is the identifier, and for a T_STRING token, - this is the string. 
For other tokens the value is undefined. */ -const char * -lex_tokcstr (const struct lexer *lexer) +/* Formats FORMAT as a printf()-like format string and creates and returns a + new lex_reader for the formatted result. */ +struct lex_reader * +lex_reader_for_format (const char *format, ...) { - return ds_cstr (&lexer->tokstr); + struct lex_reader *r; + va_list args; + + va_start (args, format); + r = lex_reader_for_substring_nocopy (ss_cstr (xvasprintf (format, args))); + va_end (args); + + return r; } -/* Returns the string value associated with LEXER's current token. For a T_ID - token, this is the identifier, and for a T_STRING token, this is the string. - For other tokens the value is undefined. */ -struct substring -lex_tokss (const struct lexer *lexer) +static struct lex_string_reader * +lex_string_reader_cast (struct lex_reader *r) { - return ds_ss (&lexer->tokstr); + return UP_CAST (r, struct lex_string_reader, reader); } -/* If the lexer is positioned at the (pseudo)identifier S, which - may contain a hyphen ('-'), skips it and returns true. Each - half of the identifier may be abbreviated to its first three - letters. - Otherwise, returns false. 
*/ -bool -lex_match_hyphenated_word (struct lexer *lexer, const char *s) -{ - const char *hyphen = strchr (s, '-'); - if (hyphen == NULL) - return lex_match_id (lexer, s); - else if (lexer->token != T_ID - || !lex_id_match (ss_buffer (s, hyphen - s), lex_tokss (lexer)) - || lex_look_ahead (lexer) != T_DASH) - return false; - else - { - lex_get (lexer); - lex_force_match (lexer, T_DASH); - lex_force_match_id (lexer, hyphen + 1); - return true; - } +static size_t +lex_string_read (struct lex_reader *r_, char *buf, size_t n, + enum prompt_style prompt_style UNUSED) +{ + struct lex_string_reader *r = lex_string_reader_cast (r_); + size_t chunk; + + chunk = MIN (n, r->s.length - r->offset); + memcpy (buf, r->s.string + r->offset, chunk); + r->offset += chunk; + + return chunk; } +static void +lex_string_close (struct lex_reader *r_) +{ + struct lex_string_reader *r = lex_string_reader_cast (r_); + + ss_dealloc (&r->s); + free (r); +} + +static struct lex_reader_class lex_string_reader_class = + { + lex_string_read, + lex_string_close + }; diff --git a/src/language/lexer/lexer.h b/src/language/lexer/lexer.h index f00ca77d..b9e936bf 100644 --- a/src/language/lexer/lexer.h +++ b/src/language/lexer/lexer.h @@ -14,34 +14,91 @@ You should have received a copy of the GNU General Public License along with this program. If not, see . */ -#if !lexer_h -#define lexer_h 1 +#ifndef LEXER_H +#define LEXER_H 1 -#include #include #include #include "data/identifier.h" #include "data/variable.h" -#include "libpspp/getl.h" +#include "libpspp/compiler.h" +#include "libpspp/prompt.h" struct lexer; +/* The syntax mode for which a syntax file is intended. */ +enum lex_syntax_mode + { + LEX_SYNTAX_AUTO, /* Try to guess intent. */ + LEX_SYNTAX_INTERACTIVE, /* Interactive mode. */ + LEX_SYNTAX_BATCH /* Batch mode. */ + }; + +/* Handling of errors. */ +enum lex_error_mode + { + LEX_ERROR_INTERACTIVE, /* Always continue to next command. 
*/ + LEX_ERROR_CONTINUE, /* Continue to next command, except for + cascading failures. */ + LEX_ERROR_STOP /* Stop processing. */ + }; + +/* Reads a single syntax file as a stream of bytes encoded in UTF-8. + + Not opaque. */ +struct lex_reader + { + const struct lex_reader_class *class; + enum lex_syntax_mode syntax; + enum lex_error_mode error; + char *file_name; /* NULL if not associated with a file. */ + int line_number; /* 1-based initial line number, 0 if none. */ + }; + +/* An implementation of a lex_reader. */ +struct lex_reader_class + { + /* Reads up to N bytes of data from READER into BUF. Returns the positive + number of bytes read if successful, or zero at end of input or on + error. + + STYLE provides a hint to interactive readers as to what kind of syntax + is being read right now. */ + size_t (*read) (struct lex_reader *reader, char *buf, size_t n, + enum prompt_style style); + + /* Closes and destroys READER, releasing any allocated storage. + + The caller will free the 'file_name' member of READER, so the + implementation should not do so. */ + void (*close) (struct lex_reader *reader); + }; + +/* Helper functions for lex_reader. */ +void lex_reader_init (struct lex_reader *, const struct lex_reader_class *); +void lex_reader_set_file_name (struct lex_reader *, const char *file_name); + +/* Creating various kinds of lex_readers. */ +struct lex_reader *lex_reader_for_file (const char *file_name, + const char *encoding, + enum lex_syntax_mode syntax, + enum lex_error_mode error); +struct lex_reader *lex_reader_for_string (const char *); +struct lex_reader *lex_reader_for_format (const char *, ...) + PRINTF_FORMAT (1, 2); +struct lex_reader *lex_reader_for_substring_nocopy (struct substring); + /* Initialization.
*/ -struct lexer * lex_create (struct source_stream *); +struct lexer *lex_create (void); void lex_destroy (struct lexer *); -/* State accessors */ -struct source_stream * lex_get_source_stream (const struct lexer *); -enum syntax_mode lex_current_syntax_mode (const struct lexer *); -enum error_mode lex_current_error_mode (const struct lexer *); +/* Files. */ +void lex_include (struct lexer *, struct lex_reader *); +void lex_append (struct lexer *, struct lex_reader *); -/* Common functions. */ +/* Advancing. */ void lex_get (struct lexer *); -void lex_error (struct lexer *, const char *, ...); -void lex_sbc_only_once (const char *); -void lex_sbc_missing (struct lexer *, const char *); -int lex_end_of_command (struct lexer *); /* Token testing functions. */ bool lex_is_number (struct lexer *); @@ -50,14 +107,19 @@ bool lex_is_integer (struct lexer *); long lex_integer (struct lexer *); bool lex_is_string (struct lexer *); +/* Token testing functions with lookahead. */ +bool lex_next_is_number (struct lexer *, int n); +double lex_next_number (struct lexer *, int n); +bool lex_next_is_integer (struct lexer *, int n); +long lex_next_integer (struct lexer *, int n); +bool lex_next_is_string (struct lexer *, int n); /* Token matching functions. */ bool lex_match (struct lexer *, enum token_type); bool lex_match_id (struct lexer *, const char *); bool lex_match_id_n (struct lexer *, const char *, size_t n); bool lex_match_int (struct lexer *, int); -bool lex_match_hyphenated_word (struct lexer *lexer, const char *s); - +bool lex_match_phrase (struct lexer *, const char *s); /* Forcible matching functions. */ bool lex_force_match (struct lexer *, enum token_type); @@ -67,36 +129,46 @@ bool lex_force_num (struct lexer *); bool lex_force_id (struct lexer *); bool lex_force_string (struct lexer *); -/* Weird token functions. */ -enum token_type lex_look_ahead (struct lexer *); -void lex_put_back (struct lexer *, enum token_type); - -/* Weird line processing functions. 
*/ -const char *lex_entire_line (const struct lexer *); -const struct string *lex_entire_line_ds (const struct lexer *); -const char *lex_rest_of_line (const struct lexer *); -bool lex_end_dot (const struct lexer *); -void lex_preprocess_line (struct string *, enum syntax_mode, - bool *line_starts_command, - bool *line_ends_command); -void lex_discard_line (struct lexer *); -void lex_discard_rest_of_command (struct lexer *); - -/* Weird line reading functions. */ -bool lex_get_line (struct lexer *); -bool lex_get_line_raw (struct lexer *); - -/* Token names. */ -const char *lex_token_name (enum token_type); -char *lex_token_representation (struct lexer *); - -/* Token accessors */ +/* Token accessors. */ enum token_type lex_token (const struct lexer *); double lex_tokval (const struct lexer *); const char *lex_tokcstr (const struct lexer *); struct substring lex_tokss (const struct lexer *); -/* Really weird functions. */ -void lex_skip_comment (struct lexer *); +/* Looking ahead. */ +const struct token *lex_next (const struct lexer *, int n); +enum token_type lex_next_token (const struct lexer *, int n); +const char *lex_next_tokcstr (const struct lexer *, int n); +double lex_next_tokval (const struct lexer *, int n); +struct substring lex_next_tokss (const struct lexer *, int n); + +/* Current position. */ +int lex_get_first_line_number (const struct lexer *, int n); +int lex_get_last_line_number (const struct lexer *, int n); +int lex_get_first_column (const struct lexer *, int n); +int lex_get_last_column (const struct lexer *, int n); +const char *lex_get_file_name (const struct lexer *); + +/* Issuing errors. */ +void lex_error (struct lexer *, const char *, ...) PRINTF_FORMAT (2, 3); +void lex_next_error (struct lexer *, int n0, int n1, const char *, ...) 
+ PRINTF_FORMAT (4, 5); +int lex_end_of_command (struct lexer *); + +void lex_sbc_only_once (const char *); +void lex_sbc_missing (struct lexer *, const char *); + +void lex_error_valist (struct lexer *, const char *, va_list) + PRINTF_FORMAT (2, 0); +void lex_next_error_valist (struct lexer *lexer, int n0, int n1, + const char *format, va_list) + PRINTF_FORMAT (4, 0); + +/* Error handling. */ +enum lex_syntax_mode lex_get_syntax_mode (const struct lexer *); +enum lex_error_mode lex_get_error_mode (const struct lexer *); +void lex_discard_rest_of_command (struct lexer *); +void lex_interactive_reset (struct lexer *); +void lex_discard_noninteractive (struct lexer *); -#endif /* !lexer_h */ +#endif /* lexer.h */ diff --git a/src/language/lexer/q2c.c b/src/language/lexer/q2c.c index d578dbc2..f53ccfc3 100644 --- a/src/language/lexer/q2c.c +++ b/src/language/lexer/q2c.c @@ -1425,7 +1425,7 @@ make_match (const char *t) else if (strchr (t, hyphen_proxy)) { char *c = unmunge (t); - sprintf (s, "lex_match_hyphenated_word (lexer, \"%s\")", c); + sprintf (s, "lex_match_phrase (lexer, \"%s\")", c); free (c); } else @@ -1836,12 +1836,12 @@ dump_parser (int persistent) if (def->type == SBC_VARLIST) dump (1, "if (lex_token (lexer) == T_ID " "&& dict_lookup_var (dataset_dict (ds), lex_tokcstr (lexer)) != NULL " - "&& lex_look_ahead (lexer) != '=')"); + "&& lex_next_token (lexer, 1) != T_EQUALS)"); else { dump (0, "if ((lex_token (lexer) == T_ID " "&& dict_lookup_var (dataset_dict (ds), lex_tokcstr (lexer)) " - "&& lex_look_ahead () != '=')"); + "&& lex_next_token (lexer, 1) != T_EQUALS)"); dump (1, " || token == T_ALL)"); } dump (1, "{"); diff --git a/src/language/lexer/value-parser.c b/src/language/lexer/value-parser.c index 649bbf24..ff2701e8 100644 --- a/src/language/lexer/value-parser.c +++ b/src/language/lexer/value-parser.c @@ -107,7 +107,7 @@ parse_number (struct lexer *lexer, double *x, const enum fmt_type *format) assert (fmt_get_category (*format) != FMT_CAT_STRING); - 
if (!data_in_msg (lex_tokss (lexer), C_ENCODING, *format, &v, 0, NULL)) + if (!data_in_msg (lex_tokss (lexer), "UTF-8", *format, &v, 0, NULL)) return false; lex_get (lexer); diff --git a/src/language/lexer/variable-parser.c b/src/language/lexer/variable-parser.c index 84cf9725..fbcc170d 100644 --- a/src/language/lexer/variable-parser.c +++ b/src/language/lexer/variable-parser.c @@ -415,8 +415,8 @@ add_var_name (char *name, /* Parses a list of variable names according to the DATA LIST version of the TO convention. */ bool -parse_DATA_LIST_vars (struct lexer *lexer, char ***namesp, - size_t *n_varsp, int pv_opts) +parse_DATA_LIST_vars (struct lexer *lexer, const struct dictionary *dict, + char ***namesp, size_t *n_varsp, int pv_opts) { char **names; size_t n_vars; @@ -453,7 +453,8 @@ parse_DATA_LIST_vars (struct lexer *lexer, char ***namesp, do { - if (lex_token (lexer) != T_ID) + if (lex_token (lexer) != T_ID + || !dict_id_is_valid (dict, lex_tokcstr (lexer), true)) { lex_error (lexer, "expecting variable name"); goto exit; @@ -474,7 +475,8 @@ parse_DATA_LIST_vars (struct lexer *lexer, char ***namesp, unsigned long int number; lex_get (lexer); - if (lex_token (lexer) != T_ID) + if (lex_token (lexer) != T_ID + || !dict_id_is_valid (dict, lex_tokcstr (lexer), true)) { lex_error (lexer, "expecting variable name"); goto exit; @@ -574,7 +576,8 @@ register_vars_pool (struct pool *pool, char **names, size_t nnames) parse_DATA_LIST_vars(), except that all allocations are taken from the given POOL. */ bool -parse_DATA_LIST_vars_pool (struct lexer *lexer, struct pool *pool, +parse_DATA_LIST_vars_pool (struct lexer *lexer, const struct dictionary *dict, + struct pool *pool, char ***names, size_t *nnames, int pv_opts) { int retval; @@ -585,7 +588,7 @@ parse_DATA_LIST_vars_pool (struct lexer *lexer, struct pool *pool, re-free it later. 
*/ assert (!(pv_opts & PV_APPEND)); - retval = parse_DATA_LIST_vars (lexer, names, nnames, pv_opts); + retval = parse_DATA_LIST_vars (lexer, dict, names, nnames, pv_opts); if (retval) register_vars_pool (pool, *names, *nnames); return retval; @@ -624,7 +627,7 @@ parse_mixed_vars (struct lexer *lexer, const struct dictionary *dict, free (v); *nnames += nv; } - else if (!parse_DATA_LIST_vars (lexer, names, nnames, PV_APPEND)) + else if (!parse_DATA_LIST_vars (lexer, dict, names, nnames, PV_APPEND)) goto fail; } return 1; diff --git a/src/language/lexer/variable-parser.h b/src/language/lexer/variable-parser.h index b0ab8c51..7abea0a1 100644 --- a/src/language/lexer/variable-parser.h +++ b/src/language/lexer/variable-parser.h @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006, 2007 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2006, 2007, 2010 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -59,9 +59,11 @@ bool parse_variables_pool (struct lexer *, struct pool *, const struct dictionar struct variable ***, size_t *, int opts); bool parse_var_set_vars (struct lexer *, const struct var_set *, struct variable ***, size_t *, int opts); -bool parse_DATA_LIST_vars (struct lexer *, char ***names, size_t *cnt, int opts); -bool parse_DATA_LIST_vars_pool (struct lexer *, struct pool *, - char ***names, size_t *cnt, int opts); +bool parse_DATA_LIST_vars (struct lexer *, const struct dictionary *, + char ***names, size_t *cnt, int opts); +bool parse_DATA_LIST_vars_pool (struct lexer *, const struct dictionary *, + struct pool *, + char ***names, size_t *cnt, int opts); bool parse_mixed_vars (struct lexer *, const struct dictionary *dict, char ***names, size_t *cnt, int opts); bool parse_mixed_vars_pool (struct lexer *, const struct dictionary *dict, diff --git a/src/language/prompt.c 
b/src/language/prompt.c deleted file mode 100644 index 614796e1..00000000 --- a/src/language/prompt.c +++ /dev/null @@ -1,75 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2010, 2011 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -#include - -#include -#include -#include - -#include "language/prompt.h" - -#include "data/file-name.h" -#include "data/settings.h" -#include "data/variable.h" -#include "language/command.h" -#include "language/lexer/lexer.h" -#include "libpspp/assertion.h" -#include "libpspp/message.h" -#include "libpspp/str.h" -#include "libpspp/version.h" -#include "output/tab.h" - -#include "gl/xalloc.h" - -/* Current prompting style. */ -static enum prompt_style current_style; - -/* Gets the command prompt for the given STYLE. */ -const char * -prompt_get (enum prompt_style style) -{ - switch (style) - { - case PROMPT_FIRST: - return "PSPP> "; - - case PROMPT_LATER: - return " > "; - - case PROMPT_DATA: - return "data> "; - - case PROMPT_CNT: - NOT_REACHED (); - } - NOT_REACHED (); -} - -/* Sets STYLE as the current prompt style. */ -void -prompt_set_style (enum prompt_style style) -{ - assert (style < PROMPT_CNT); - current_style = style; -} - -/* Returns the current prompt. 
*/ -enum prompt_style -prompt_get_style (void) -{ - return current_style; -} diff --git a/src/language/prompt.h b/src/language/prompt.h deleted file mode 100644 index aa5733aa..00000000 --- a/src/language/prompt.h +++ /dev/null @@ -1,35 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2010 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -#ifndef PROMPT_H -#define PROMPT_H 1 - -#include - -enum prompt_style - { - PROMPT_FIRST, /* First line of command. */ - PROMPT_LATER, /* Second or later line of command. */ - PROMPT_DATA, /* Between BEGIN DATA and END DATA. 
*/ - PROMPT_CNT - }; - -enum prompt_style prompt_get_style (void); -void prompt_set_style (enum prompt_style); - -const char *prompt_get (enum prompt_style); - -#endif /* PROMPT_H */ diff --git a/src/language/stats/aggregate.c b/src/language/stats/aggregate.c index 241ad7c1..2a4eda87 100644 --- a/src/language/stats/aggregate.c +++ b/src/language/stats/aggregate.c @@ -39,14 +39,15 @@ #include "language/lexer/variable-parser.h" #include "language/stats/sort-criteria.h" #include "libpspp/assertion.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/misc.h" #include "libpspp/pool.h" #include "libpspp/str.h" #include "math/moments.h" +#include "math/percentiles.h" #include "math/sort.h" #include "math/statistic.h" -#include "math/percentiles.h" #include "gl/minmax.h" #include "gl/xalloc.h" @@ -416,7 +417,7 @@ parse_aggregate_functions (struct lexer *lexer, const struct dictionary *dict, { size_t n_dest_prev = n_dest; - if (!parse_DATA_LIST_vars (lexer, &dest, &n_dest, + if (!parse_DATA_LIST_vars (lexer, dict, &dest, &n_dest, (PV_APPEND | PV_SINGLE | PV_NO_SCRATCH | PV_NO_DUPLICATE))) goto error; @@ -434,14 +435,8 @@ parse_aggregate_functions (struct lexer *lexer, const struct dictionary *dict, if (lex_is_string (lexer)) { - /* XXX check re-encoded length */ - struct string label; - ds_init_substring (&label, lex_tokss (lexer)); - - ds_truncate (&label, 255); - dest_label[n_dest - 1] = ds_xstrdup (&label); + dest_label[n_dest - 1] = xstrdup (lex_tokcstr (lexer)); lex_get (lexer); - ds_destroy (&label); } } @@ -502,7 +497,9 @@ parse_aggregate_functions (struct lexer *lexer, const struct dictionary *dict, lex_match (lexer, T_COMMA); if (lex_is_string (lexer)) { - arg[i].c = ss_xstrdup (lex_tokss (lexer)); + arg[i].c = recode_string (dict_get_encoding (agr->dict), + "UTF-8", lex_tokcstr (lexer), + -1); type = VAL_STRING; } else if (lex_is_number (lexer)) @@ -640,7 +637,8 @@ parse_aggregate_functions (struct lexer *lexer, const struct dictionary 
*dict, free (dest[i]); if (dest_label[i]) - var_set_label (destvar, dest_label[i]); + var_set_label (destvar, dest_label[i], + dict_get_encoding (agr->dict), true); v->dest = destvar; } @@ -811,6 +809,7 @@ accumulate_aggregate_info (struct agr_proc *agr, const struct ccase *input) iter->int1 = 1; break; case MAX | FSTRING: + /* Need to do some kind of Unicode collation thingy here */ if (memcmp (iter->string, value_str (v, src_width), src_width) < 0) memcpy (iter->string, value_str (v, src_width), src_width); iter->int1 = 1; diff --git a/src/language/stats/autorecode.c b/src/language/stats/autorecode.c index 77385783..fdf3dddc 100644 --- a/src/language/stats/autorecode.c +++ b/src/language/stats/autorecode.c @@ -120,7 +120,8 @@ cmd_autorecode (struct lexer *lexer, struct dataset *ds) if (!lex_force_match_id (lexer, "INTO")) goto error; lex_match (lexer, T_EQUALS); - if (!parse_DATA_LIST_vars (lexer, &dst_names, &n_dsts, PV_NO_DUPLICATE)) + if (!parse_DATA_LIST_vars (lexer, dict, &dst_names, &n_dsts, + PV_NO_DUPLICATE)) goto error; if (n_dsts != n_srcs) { diff --git a/src/language/stats/descriptives.c b/src/language/stats/descriptives.c index 50d52d3c..adedd5e6 100644 --- a/src/language/stats/descriptives.c +++ b/src/language/stats/descriptives.c @@ -31,9 +31,10 @@ #include "language/lexer/lexer.h" #include "language/lexer/variable-parser.h" #include "libpspp/array.h" +#include "libpspp/assertion.h" #include "libpspp/compiler.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" -#include "libpspp/assertion.h" #include "math/moments.h" #include "output/tab.h" @@ -303,7 +304,7 @@ cmd_descriptives (struct lexer *lexer, struct dataset *ds) } else if (var_cnt == 0) { - if (lex_look_ahead (lexer) == T_EQUALS) + if (lex_next_token (lexer, 1) == T_EQUALS) { lex_match_id (lexer, "VARIABLES"); lex_match (lexer, T_EQUALS); @@ -507,17 +508,22 @@ static char * generate_z_varname (const struct dictionary *dict, struct dsc_proc *dsc, const char *var_name, int *z_cnt) { - 
char name[VAR_NAME_LEN + 1]; + char *z_name, *trunc_name; /* Try a name based on the original variable name. */ - name[0] = 'Z'; - str_copy_trunc (name + 1, sizeof name - 1, var_name); - if (try_name (dict, dsc, name)) - return xstrdup (name); + z_name = xasprintf ("Z%s", var_name); + trunc_name = utf8_encoding_trunc (z_name, dict_get_encoding (dict), + ID_MAX_LEN); + free (z_name); + if (try_name (dict, dsc, trunc_name)) + return trunc_name; + free (trunc_name); /* Generate a synthetic name. */ for (;;) { + char name[8]; + (*z_cnt)++; if (*z_cnt <= 99) @@ -675,7 +681,8 @@ setup_z_trns (struct dsc_proc *dsc, struct dataset *ds) dst_var = dict_create_var_assert (dataset_dict (ds), dv->z_name, 0); var_set_label (dst_var, xasprintf (_("Z-score of %s"), - var_to_string (dv->v))); + var_to_string (dv->v)), + dict_get_encoding (dataset_dict (ds)), false); z = &t->z_scores[cnt++]; z->src_var = dv->v; diff --git a/src/language/stats/flip.c b/src/language/stats/flip.c index 534efb49..23544c8b 100644 --- a/src/language/stats/flip.c +++ b/src/language/stats/flip.c @@ -220,7 +220,7 @@ cmd_flip (struct lexer *lexer, struct dataset *ds) flip->n_vars, &flip_casereader_class, flip); proc_set_active_file_data (ds, reader); - return lex_end_of_command (lexer); + return CMD_SUCCESS; error: destroy_flip_pgm (flip); @@ -249,7 +249,7 @@ make_new_var (struct dictionary *dict, const char *name_) *--cp = '\0'; /* Fix invalid characters. 
*/ - for (cp = name; *cp && cp < name + VAR_NAME_LEN; cp++) + for (cp = name; *cp && cp < name + ID_MAX_LEN; cp++) if (cp == name) { if (!lex_is_id1 (*cp) || *cp == '$') @@ -270,8 +270,8 @@ make_new_var (struct dictionary *dict, const char *name_) int i; for (i = 1; ; i++) { - char n[VAR_NAME_LEN + 1]; - int ofs = MIN (VAR_NAME_LEN - 1 - intlog10 (i), len); + char n[ID_MAX_LEN + 1]; + int ofs = MIN (ID_MAX_LEN - 1 - intlog10 (i), len); strncpy (n, name, ofs); sprintf (&n[ofs], "%d", i); diff --git a/src/language/stats/frequencies.q b/src/language/stats/frequencies.q index adc4f16b..ef4b7f95 100644 --- a/src/language/stats/frequencies.q +++ b/src/language/stats/frequencies.q @@ -738,14 +738,23 @@ frq_custom_grouped (struct lexer *lexer, struct dataset *ds, struct cmd_frequenc } free (v); - if (!lex_match (lexer, T_SLASH)) - break; - if ((lex_token (lexer) != T_ID || dict_lookup_var (dataset_dict (ds), lex_tokcstr (lexer)) != NULL) - && lex_token (lexer) != T_ALL) - { - lex_put_back (lexer, T_SLASH); - break; - } + if (lex_token (lexer) != T_SLASH) + break; + + if ((lex_next_token (lexer, 1) == T_ID + && dict_lookup_var (dataset_dict (ds), + lex_next_tokcstr (lexer, 1))) + || lex_next_token (lexer, 1) == T_ALL) + { + /* The token after the slash is a variable name. Keep parsing. */ + lex_get (lexer); + } + else + { + /* The token after the slash must be the start of a new + subcommand. Let the caller see the slash. 
*/ + break; + } } return 1; diff --git a/src/language/stats/npar.c b/src/language/stats/npar.c index 3a178b23..a572e09f 100644 --- a/src/language/stats/npar.c +++ b/src/language/stats/npar.c @@ -258,8 +258,8 @@ parse_npar_tests (struct lexer *lexer, struct dataset *ds, struct cmd_npar_tests NOT_REACHED (); } } - else if (lex_match_hyphenated_word (lexer, "K-W") || - lex_match_hyphenated_word (lexer, "KRUSKAL-WALLIS")) + else if (lex_match_phrase (lexer, "K-W") || + lex_match_phrase (lexer, "KRUSKAL-WALLIS")) { lex_match (lexer, T_EQUALS); npt->kruskal_wallis++; @@ -276,8 +276,8 @@ parse_npar_tests (struct lexer *lexer, struct dataset *ds, struct cmd_npar_tests NOT_REACHED (); } } - else if (lex_match_hyphenated_word (lexer, "M-W") || - lex_match_hyphenated_word (lexer, "MANN-WHITNEY")) + else if (lex_match_phrase (lexer, "M-W") || + lex_match_phrase (lexer, "MANN-WHITNEY")) { lex_match (lexer, T_EQUALS); npt->mann_whitney++; @@ -759,44 +759,39 @@ npar_chisquare (struct lexer *lexer, struct dataset *ds, cstp->n_expected = 0; cstp->expected = NULL; - if ( lex_match (lexer, T_SLASH) ) + if (lex_match_phrase (lexer, "/EXPECTED")) { - if ( lex_match_id (lexer, "EXPECTED") ) - { - lex_force_match (lexer, T_EQUALS); - if ( ! lex_match_id (lexer, "EQUAL") ) - { - double f; - int n; - while ( lex_is_number (lexer) ) - { - int i; - n = 1; - f = lex_number (lexer); - lex_get (lexer); - if ( lex_match (lexer, T_ASTERISK)) - { - n = f; - f = lex_number (lexer); - lex_get (lexer); - } - lex_match (lexer, T_COMMA); - - cstp->n_expected += n; - cstp->expected = pool_realloc (specs->pool, - cstp->expected, - sizeof (double) * - cstp->n_expected); - for ( i = cstp->n_expected - n ; - i < cstp->n_expected; - ++i ) - cstp->expected[i] = f; + lex_force_match (lexer, T_EQUALS); + if ( ! 
lex_match_id (lexer, "EQUAL") ) + { + double f; + int n; + while ( lex_is_number (lexer) ) + { + int i; + n = 1; + f = lex_number (lexer); + lex_get (lexer); + if ( lex_match (lexer, T_ASTERISK)) + { + n = f; + f = lex_number (lexer); + lex_get (lexer); + } + lex_match (lexer, T_COMMA); - } - } - } - else - retval = 3; + cstp->n_expected += n; + cstp->expected = pool_realloc (specs->pool, + cstp->expected, + sizeof (double) * + cstp->n_expected); + for ( i = cstp->n_expected - n ; + i < cstp->n_expected; + ++i ) + cstp->expected[i] = f; + + } + } } if ( cstp->ranged && cstp->n_expected > 0 && @@ -828,7 +823,7 @@ npar_binomial (struct lexer *lexer, struct dataset *ds, struct binomial_test *btp = pool_alloc (specs->pool, sizeof (*btp)); struct one_sample_test *tp = &btp->parent; struct npar_test *nt = &tp->parent; - bool equals; + bool equals = false; nt->execute = binomial_execute; nt->insert_variables = one_sample_insert_variables; diff --git a/src/language/stats/rank.q b/src/language/stats/rank.q index 49a040e3..97c98c32 100644 --- a/src/language/stats/rank.q +++ b/src/language/stats/rank.q @@ -198,7 +198,8 @@ fraction_name(void) /* Create a label on DEST_VAR, describing its derivation from SRC_VAR and F */ static void create_var_label (struct variable *dest_var, - const struct variable *src_var, enum RANK_FUNC f) + const struct variable *src_var, enum RANK_FUNC f, + const char *dict_encoding) { struct string label; ds_init_empty (&label); @@ -224,7 +225,7 @@ create_var_label (struct variable *dest_var, ds_put_format (&label, _("%s of %s"), function_name[f], var_get_name (src_var)); - var_set_label (dest_var, ds_cstr (&label)); + var_set_label (dest_var, ds_cstr (&label), dict_encoding, false); ds_destroy (&label); } @@ -673,15 +674,18 @@ cmd_rank (struct lexer *lexer, struct dataset *ds) int v; for ( v = 0 ; v < n_src_vars ; v ++ ) { + struct dictionary *dict = dataset_dict (ds); + if ( rank_specs[i].destvars[v] == NULL ) { rank_specs[i].destvars[v] = - 
create_rank_variable (dataset_dict(ds), rank_specs[i].rfunc, src_vars[v], NULL); + create_rank_variable (dict, rank_specs[i].rfunc, src_vars[v], NULL); } create_var_label ( rank_specs[i].destvars[v], src_vars[v], - rank_specs[i].rfunc); + rank_specs[i].rfunc, + dict_get_encoding (dict)); } } diff --git a/src/language/stats/sort-cases.c b/src/language/stats/sort-cases.c index 1134874d..ff0f305e 100644 --- a/src/language/stats/sort-cases.c +++ b/src/language/stats/sort-cases.c @@ -78,6 +78,6 @@ cmd_sort_cases (struct lexer *lexer, struct dataset *ds) max_buffers = INT_MAX; subcase_destroy (&ordering); - return ok ? lex_end_of_command (lexer) : CMD_CASCADING_FAILURE; + return ok ? CMD_SUCCESS : CMD_CASCADING_FAILURE; } diff --git a/src/language/syntax-file.c b/src/language/syntax-file.c deleted file mode 100644 index 286ce1e3..00000000 --- a/src/language/syntax-file.c +++ /dev/null @@ -1,144 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . 
*/ - -#include - -#include "language/syntax-file.h" - -#include -#include -#include -#include - -#include "data/file-name.h" -#include "data/settings.h" -#include "data/variable.h" -#include "language/command.h" -#include "language/lexer/lexer.h" -#include "language/prompt.h" -#include "libpspp/assertion.h" -#include "libpspp/cast.h" -#include "libpspp/getl.h" -#include "libpspp/ll.h" -#include "libpspp/message.h" -#include "libpspp/str.h" -#include "libpspp/version.h" -#include "output/tab.h" - -#include "gl/xalloc.h" - -#include "gettext.h" -#define _(msgid) gettext (msgid) - -struct syntax_file_source - { - struct getl_interface parent ; - - FILE *syntax_file; - - /* Current location. */ - char *fn; /* File name. */ - int ln; /* Line number. */ - }; - -static const char * -name (const struct getl_interface *s) -{ - const struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source, - parent); - return sfs->fn; -} - -static int -line_number (const struct getl_interface *s) -{ - const struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source, - parent); - return sfs->ln; -} - - -/* Reads a line from syntax file source S into LINE. - Returns true if successful, false at end of file. */ -static bool -read_syntax_file (struct getl_interface *s, - struct string *line) -{ - struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source, - parent); - - if (sfs->syntax_file == NULL) - return false; - - /* Read line from file and remove new-line. - Skip initial "#! /usr/bin/pspp" line. 
*/ - do - { - sfs->ln++; - ds_clear (line); - if (!ds_read_line (line, sfs->syntax_file, SIZE_MAX)) - { - if (ferror (sfs->syntax_file)) - msg (ME, _("Reading `%s': %s."), sfs->fn, strerror (errno)); - return false; - } - ds_chomp_byte (line, '\n'); - } - while (sfs->ln == 1 && !memcmp (ds_cstr (line), "#!", 2)); - - return true; -} - -static void -syntax_close (struct getl_interface *s) -{ - struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source, - parent); - - if (sfs->syntax_file && EOF == fn_close (sfs->fn, sfs->syntax_file)) - msg (MW, _("Closing `%s': %s."), sfs->fn, strerror (errno)); - free (sfs->fn); - free (sfs); -} - -static bool -always_false (const struct getl_interface *s UNUSED) -{ - return false; -} - - -/* Creates a syntax file source with file name FN. */ -struct getl_interface * -create_syntax_file_source (const char *fn) -{ - struct syntax_file_source *ss = xzalloc (sizeof (*ss)); - - ss->fn = xstrdup (fn); - ss->syntax_file = fn_open (ss->fn, "r"); - if (ss->syntax_file == NULL) - msg (ME, _("Opening `%s': %s."), ss->fn, strerror (errno)); - - ss->parent.interactive = always_false; - ss->parent.read = read_syntax_file ; - ss->parent.filter = NULL; - ss->parent.close = syntax_close ; - ss->parent.name = name ; - ss->parent.location = line_number; - - return &ss->parent; -} - diff --git a/src/language/syntax-file.h b/src/language/syntax-file.h deleted file mode 100644 index 8044f3c0..00000000 --- a/src/language/syntax-file.h +++ /dev/null @@ -1,25 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. 
- - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -#if !SYNTAX_FILE -#define SYNTAX_FILE 1 - -struct getl_interface; - -/* Creates a syntax file source with file name FN. */ -struct getl_interface * create_syntax_file_source (const char *) ; - -#endif diff --git a/src/language/syntax-string-source.c b/src/language/syntax-string-source.c deleted file mode 100644 index 1d3d4d6b..00000000 --- a/src/language/syntax-string-source.c +++ /dev/null @@ -1,151 +0,0 @@ -/* PSPPIRE - a graphical interface for PSPP. - Copyright (C) 2007, 2009, 2010, 2011 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . 
*/ - - -#include - -#include "language/syntax-string-source.h" - -#include - -#include "libpspp/cast.h" -#include "libpspp/getl.h" -#include "libpspp/compiler.h" -#include "libpspp/str.h" - -#include "gl/xalloc.h" - -struct syntax_string_source - { - struct getl_interface parent; - struct string buffer; - size_t posn; - }; - - -static bool -always_false (const struct getl_interface *i UNUSED) -{ - return false; -} - -/* Returns the name of the source */ -static const char * -name (const struct getl_interface *i UNUSED) -{ - return NULL; -} - - -/* Returns the location within the source */ -static int -location (const struct getl_interface *i UNUSED) -{ - return 0; -} - - -static void -do_close (struct getl_interface *i ) -{ - struct syntax_string_source *sss = UP_CAST (i, struct syntax_string_source, - parent); - - ds_destroy (&sss->buffer); - - free (sss); -} - - - -static bool -read_single_line (struct getl_interface *i, - struct string *line) -{ - struct syntax_string_source *sss = UP_CAST (i, struct syntax_string_source, - parent); - - size_t next; - - if ( sss->posn == -1) - return false; - - next = ss_find_byte (ds_substr (&sss->buffer, - sss->posn, -1), '\n'); - - ds_assign_substring (line, - ds_substr (&sss->buffer, - sss->posn, - next) - ); - - if ( next != -1 ) - sss->posn += next + 1; /* + 1 to skip newline */ - else - sss->posn = -1; /* End of file encountered */ - - return true; -} - -static struct syntax_string_source * -create_syntax_string_source__ (void) -{ - struct syntax_string_source *sss = xzalloc (sizeof *sss); - - sss->posn = 0; - - sss->parent.interactive = always_false; - sss->parent.close = do_close; - sss->parent.read = read_single_line; - - sss->parent.name = name; - sss->parent.location = location; - - return sss; -} - -struct getl_interface * -create_syntax_string_source (const char *s) -{ - struct syntax_string_source *sss = create_syntax_string_source__ (); - ds_init_cstr (&sss->buffer, s); - return &sss->parent; -} - -struct 
getl_interface * -create_syntax_format_source (const char *format, ...) -{ - struct syntax_string_source *sss; - va_list args; - - sss = create_syntax_string_source__ (); - - ds_init_empty (&sss->buffer); - - va_start (args, format); - ds_put_vformat (&sss->buffer, format, args); - va_end (args); - - return &sss->parent; -} - -/* Return the syntax currently contained in S. - Primarily usefull for debugging */ -const char * -syntax_string_source_get_syntax (const struct syntax_string_source *s) -{ - return ds_cstr (&s->buffer); -} diff --git a/src/language/syntax-string-source.h b/src/language/syntax-string-source.h deleted file mode 100644 index d2e1a9ba..00000000 --- a/src/language/syntax-string-source.h +++ /dev/null @@ -1,33 +0,0 @@ -/* PSPPIRE - a graphical interface for PSPP. - Copyright (C) 2007, 2010 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -#ifndef SYNTAX_STRING_SOURCE_H -#define SYNTAX_STRING_SOURCE_H - -#include "libpspp/compiler.h" - -struct getl_interface; - -struct syntax_string_source; - -struct getl_interface *create_syntax_string_source (const char *); -struct getl_interface *create_syntax_format_source (const char *, ...) 
- PRINTF_FORMAT (1, 2); - -const char * syntax_string_source_get_syntax (const struct syntax_string_source *s); - - -#endif diff --git a/src/language/tests/format-guesser-test.c b/src/language/tests/format-guesser-test.c index b5dbe6d6..cd7ca524 100644 --- a/src/language/tests/format-guesser-test.c +++ b/src/language/tests/format-guesser-test.c @@ -53,5 +53,5 @@ cmd_debug_format_guesser (struct lexer *lexer, struct dataset *ds UNUSED) msg_enable (); putc ('\n', stderr); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/tests/moments-test.c b/src/language/tests/moments-test.c index f6499cb8..af328928 100644 --- a/src/language/tests/moments-test.c +++ b/src/language/tests/moments-test.c @@ -135,7 +135,7 @@ cmd_debug_moments (struct lexer *lexer, struct dataset *ds UNUSED) } fprintf (stderr, "\n"); - retval = lex_end_of_command (lexer); + retval = CMD_SUCCESS; done: free (values); diff --git a/src/language/tests/paper-size.c b/src/language/tests/paper-size.c index 660fe9c7..0322f572 100644 --- a/src/language/tests/paper-size.c +++ b/src/language/tests/paper-size.c @@ -44,5 +44,5 @@ cmd_debug_paper_size (struct lexer *lexer, struct dataset *ds UNUSED) printf ("error\n"); lex_get (lexer); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/utilities/cache.c b/src/language/utilities/cache.c index 2d818af8..dda055db 100644 --- a/src/language/utilities/cache.c +++ b/src/language/utilities/cache.c @@ -27,8 +27,8 @@ /* Parses the CACHE command. 
*/ int -cmd_cache (struct lexer *lexer, struct dataset *ds UNUSED) +cmd_cache (struct lexer *lexer UNUSED, struct dataset *ds UNUSED) { - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/utilities/cd.c b/src/language/utilities/cd.c index 90a1bcf8..cae84eb3 100644 --- a/src/language/utilities/cd.c +++ b/src/language/utilities/cd.c @@ -16,12 +16,14 @@ #include +#include "language/command.h" + #include #include -#include "language/command.h" -#include "libpspp/message.h" #include "language/lexer/lexer.h" +#include "libpspp/i18n.h" +#include "libpspp/message.h" #include "gettext.h" #define _(msgid) gettext (msgid) @@ -35,7 +37,7 @@ cmd_cd (struct lexer *lexer, struct dataset *ds UNUSED) if ( ! lex_force_string (lexer)) goto error; - path = ss_xstrdup (lex_tokss (lexer)); + path = utf8_to_filename (lex_tokcstr (lexer)); if ( -1 == chdir (path) ) { diff --git a/src/language/utilities/date.c b/src/language/utilities/date.c index 80a09c21..3b754b49 100644 --- a/src/language/utilities/date.c +++ b/src/language/utilities/date.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2004, 2011 Free Software Foundation, Inc. + Copyright (C) 2004, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -28,7 +28,7 @@ int cmd_use (struct lexer *lexer, struct dataset *ds UNUSED) { if (lex_match (lexer, T_ALL)) - return lex_end_of_command (lexer); + return CMD_SUCCESS; msg (SW, _("Only USE ALL is currently implemented.")); return CMD_FAILURE; diff --git a/src/language/utilities/host.c b/src/language/utilities/host.c index f9b10480..fbc9d208 100644 --- a/src/language/utilities/host.c +++ b/src/language/utilities/host.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2009, 2010 Free Software Foundation, Inc. 
+ Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -30,9 +30,11 @@ #include "language/lexer/lexer.h" #include "libpspp/assertion.h" #include "libpspp/compiler.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/str.h" +#include "gl/localcharset.h" #include "gl/xalloc.h" #include "gl/xmalloca.h" @@ -134,6 +136,7 @@ cmd_host (struct lexer *lexer, struct dataset *ds UNUSED) else if (lex_match_id (lexer, "COMMAND")) { struct string command; + char *locale_command; bool ok; lex_match (lexer, T_EQUALS); @@ -154,10 +157,15 @@ cmd_host (struct lexer *lexer, struct dataset *ds UNUSED) return CMD_FAILURE; } - ok = run_command (ds_cstr (&command)); + locale_command = recode_string (locale_charset (), "UTF-8", + ds_cstr (&command), + ds_length (&command)); ds_destroy (&command); - return ok ? lex_end_of_command (lexer) : CMD_FAILURE; + ok = run_command (locale_command); + free (locale_command); + + return ok ? 
CMD_SUCCESS : CMD_FAILURE; } else { diff --git a/src/language/utilities/include.c b/src/language/utilities/include.c index e326134d..e802abec 100644 --- a/src/language/utilities/include.c +++ b/src/language/utilities/include.c @@ -23,10 +23,11 @@ #include #include "data/file-name.h" +#include "data/procedure.h" #include "language/command.h" +#include "language/lexer/include-path.h" #include "language/lexer/lexer.h" -#include "language/syntax-file.h" -#include "libpspp/getl.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/str.h" @@ -36,67 +37,79 @@ #include "gettext.h" #define _(msgid) gettext (msgid) -static int parse_insert (struct lexer *lexer, char **filename); +enum variant + { + INSERT, + INCLUDE + }; - -int -cmd_include (struct lexer *lexer, struct dataset *ds UNUSED) +static int +do_insert (struct lexer *lexer, struct dataset *ds, enum variant variant) { - char *filename = NULL; - int status = parse_insert (lexer, &filename); - - if ( CMD_SUCCESS != status) - return status; + enum lex_syntax_mode syntax_mode; + enum lex_error_mode error_mode; + char *relative_name; + char *filename; + char *encoding; + int status; + bool cd; - lex_get (lexer); - - status = lex_end_of_command (lexer); + /* Skip optional FILE=. */ + if (lex_match_id (lexer, "FILE")) + lex_match (lexer, T_EQUALS); - if ( status == CMD_SUCCESS) + /* File name can be identifier or string. 
*/ + if (lex_token (lexer) != T_ID && !lex_is_string (lexer)) { - struct source_stream *ss = lex_get_source_stream (lexer); - - assert (filename); - getl_include_source (ss, create_syntax_file_source (filename), - GETL_BATCH, ERRMODE_STOP); - free (filename); + lex_error (lexer, _("expecting file name")); + return CMD_FAILURE; } - return status; -} - - -int -cmd_insert (struct lexer *lexer, struct dataset *ds UNUSED) -{ - enum syntax_mode syntax_mode = GETL_INTERACTIVE; - enum error_mode error_mode = ERRMODE_CONTINUE; - char *filename = NULL; - int status = parse_insert (lexer, &filename); - bool cd = false; - - if ( CMD_SUCCESS != status) - return status; + relative_name = utf8_to_filename (lex_tokcstr (lexer)); + filename = include_path_search (relative_name); + free (relative_name); + if ( ! filename) + { + msg (SE, _("Can't find `%s' in include file search path."), + lex_tokcstr (lexer)); + return CMD_FAILURE; + } lex_get (lexer); + syntax_mode = LEX_SYNTAX_INTERACTIVE; + error_mode = LEX_ERROR_CONTINUE; + cd = false; + status = CMD_FAILURE; + encoding = xstrdup (dataset_get_default_syntax_encoding (ds)); while ( T_ENDCMD != lex_token (lexer)) { - if (lex_match_id (lexer, "SYNTAX")) + if (lex_match_id (lexer, "ENCODING")) + { + lex_match (lexer, T_EQUALS); + if (!lex_force_string (lexer)) + goto exit; + + free (encoding); + encoding = xstrdup (lex_tokcstr (lexer)); + } + else if (variant == INSERT && lex_match_id (lexer, "SYNTAX")) { lex_match (lexer, T_EQUALS); if ( lex_match_id (lexer, "INTERACTIVE") ) - syntax_mode = GETL_INTERACTIVE; + syntax_mode = LEX_SYNTAX_INTERACTIVE; else if ( lex_match_id (lexer, "BATCH")) - syntax_mode = GETL_BATCH; + syntax_mode = LEX_SYNTAX_BATCH; + else if ( lex_match_id (lexer, "AUTO")) + syntax_mode = LEX_SYNTAX_AUTO; else { - lex_error (lexer, _("expecting %s or %s after %s"), - "BATCH", "INTERACTIVE", "SYNTAX"); - return CMD_FAILURE; + lex_error (lexer, _("expecting %s, %s, or %s after %s"), + "BATCH", "INTERACTIVE", "AUTO", 
"SYNTAX"); + goto exit; } } - else if (lex_match_id (lexer, "CD")) + else if (variant == INSERT && lex_match_id (lexer, "CD")) { lex_match (lexer, T_EQUALS); if ( lex_match_id (lexer, "YES") ) @@ -111,100 +124,71 @@ cmd_insert (struct lexer *lexer, struct dataset *ds UNUSED) { lex_error (lexer, _("expecting %s or %s after %s"), "YES", "NO", "CD"); - return CMD_FAILURE; + goto exit; } } - else if (lex_match_id (lexer, "ERROR")) + else if (variant == INSERT && lex_match_id (lexer, "ERROR")) { lex_match (lexer, T_EQUALS); if ( lex_match_id (lexer, "CONTINUE") ) { - error_mode = ERRMODE_CONTINUE; + error_mode = LEX_ERROR_CONTINUE; } else if ( lex_match_id (lexer, "STOP")) { - error_mode = ERRMODE_STOP; + error_mode = LEX_ERROR_STOP; } else { lex_error (lexer, _("expecting %s or %s after %s"), "CONTINUE", "STOP", "ERROR"); - return CMD_FAILURE; + goto exit; } } else { - lex_error (lexer, _("Unexpected token: `%s'."), - lex_token_representation (lexer)); - - return CMD_FAILURE; + lex_error (lexer, NULL); + goto exit; } } - status = lex_end_of_command (lexer); if ( status == CMD_SUCCESS) { - struct source_stream *ss = lex_get_source_stream (lexer); - - assert (filename); - getl_include_source (ss, create_syntax_file_source (filename), - syntax_mode, - error_mode); - - if ( cd ) - { - char *directory = dir_name (filename); - chdir (directory); - free (directory); - } - - free (filename); + struct lex_reader *reader; + + reader = lex_reader_for_file (filename, encoding, + syntax_mode, error_mode); + if (reader != NULL) + { + lex_discard_rest_of_command (lexer); + lex_include (lexer, reader); + + if ( cd ) + { + char *directory = dir_name (filename); + chdir (directory); + free (directory); + } + } } +exit: + free (encoding); + free (filename); return status; } - -static int -parse_insert (struct lexer *lexer, char **filename) +int +cmd_include (struct lexer *lexer, struct dataset *ds) { - const char *target_fn; - char *relative_filename; - - /* Skip optional FILE=. 
*/ - if (lex_match_id (lexer, "FILE")) - lex_match (lexer, T_EQUALS); - - /* File name can be identifier or string. */ - if (lex_token (lexer) != T_ID && !lex_is_string (lexer)) - { - lex_error (lexer, _("expecting file name")); - return CMD_FAILURE; - } - - target_fn = lex_tokcstr (lexer); - - relative_filename = - fn_search_path (target_fn, - getl_include_path (lex_get_source_stream (lexer))); - - if ( ! relative_filename) - { - msg (SE, _("Can't find `%s' in include file search path."), - target_fn); - return CMD_FAILURE; - } - - *filename = relative_filename; - if (*filename == NULL) - { - msg (SE, _("Unable to open `%s': %s."), - relative_filename, strerror (errno)); - free (relative_filename); - return CMD_FAILURE; - } + return do_insert (lexer, ds, INCLUDE); +} - return CMD_SUCCESS; +int +cmd_insert (struct lexer *lexer, struct dataset *ds) +{ + return do_insert (lexer, ds, INSERT); } + diff --git a/src/language/utilities/permissions.c b/src/language/utilities/permissions.c index 83fa820c..8b0e3f0c 100644 --- a/src/language/utilities/permissions.c +++ b/src/language/utilities/permissions.c @@ -25,6 +25,7 @@ #include "data/settings.h" #include "language/command.h" #include "language/lexer/lexer.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/misc.h" #include "libpspp/str.h" @@ -94,20 +95,23 @@ cmd_permissions (struct lexer *lexer, struct dataset *ds UNUSED) int change_permissions (const char *file_name, enum PER per) { + char *locale_file_name; struct stat buf; mode_t mode; if (settings_get_safer_mode ()) { msg (SE, _("This command not allowed when the SAFER option is set.")); - return CMD_FAILURE; + return 0; } - if ( -1 == stat(file_name, &buf) ) + locale_file_name = utf8_to_filename (file_name); + if ( -1 == stat(locale_file_name, &buf) ) { const int errnum = errno; msg (SE, _("Cannot stat %s: %s"), file_name, strerror(errnum)); + free (locale_file_name); return 0; } @@ -116,13 +120,16 @@ change_permissions (const char 
*file_name, enum PER per) else mode = buf.st_mode & ~0222; - if ( -1 == chmod(file_name, mode)) + if ( -1 == chmod(locale_file_name, mode)) { const int errnum = errno; msg (SE, _("Cannot change mode of %s: %s"), file_name, strerror(errnum)); + free (locale_file_name); return 0; } + free (locale_file_name); + return 1; } diff --git a/src/language/utilities/set.q b/src/language/utilities/set.q index 9837d2cc..3da12a61 100644 --- a/src/language/utilities/set.q +++ b/src/language/utilities/set.q @@ -493,7 +493,10 @@ stc_custom_journal (struct lexer *lexer, struct dataset *ds UNUSED, struct cmd_s journal_disable (); else if (lex_is_string (lexer) || lex_token (lexer) == T_ID) { - journal_set_file_name (lex_tokcstr (lexer)); + char *filename = utf8_to_filename (lex_tokcstr (lexer)); + journal_set_file_name (filename); + free (filename); + lex_get (lexer); } else @@ -905,12 +908,12 @@ static struct settings *saved_settings[MAX_SAVED_SETTINGS]; static int n_saved_settings; int -cmd_preserve (struct lexer *lexer, struct dataset *ds UNUSED) +cmd_preserve (struct lexer *lexer UNUSED, struct dataset *ds UNUSED) { if (n_saved_settings < MAX_SAVED_SETTINGS) { saved_settings[n_saved_settings++] = settings_get (); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } else { @@ -922,14 +925,14 @@ cmd_preserve (struct lexer *lexer, struct dataset *ds UNUSED) } int -cmd_restore (struct lexer *lexer, struct dataset *ds UNUSED) +cmd_restore (struct lexer *lexer UNUSED, struct dataset *ds UNUSED) { if (n_saved_settings > 0) { struct settings *s = saved_settings[--n_saved_settings]; settings_set (s); settings_destroy (s); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } else { diff --git a/src/language/utilities/title.c b/src/language/utilities/title.c index 9d5b8261..398288b8 100644 --- a/src/language/utilities/title.c +++ b/src/language/utilities/title.c @@ -52,20 +52,10 @@ cmd_subtitle (struct lexer *lexer, struct dataset *ds UNUSED) static int parse_title (struct 
lexer *lexer, enum text_item_type type) { - if (lex_look_ahead (lexer) == T_STRING) - { - lex_get (lexer); - if (!lex_force_string (lexer)) - return CMD_FAILURE; - set_title (lex_tokcstr (lexer), type); - lex_get (lexer); - return lex_end_of_command (lexer); - } - else - { - set_title (lex_rest_of_line (lexer), type); - lex_discard_line (lexer); - } + if (!lex_force_string (lexer)) + return CMD_FAILURE; + set_title (lex_tokcstr (lexer), type); + lex_get (lexer); return CMD_SUCCESS; } @@ -79,81 +69,49 @@ set_title (const char *title, enum text_item_type type) int cmd_file_label (struct lexer *lexer, struct dataset *ds) { - const char *label; - - label = lex_rest_of_line (lexer); - lex_discard_line (lexer); - while (isspace ((unsigned char) *label)) - label++; + if (!lex_force_string (lexer)) + return CMD_FAILURE; - dict_set_label (dataset_dict (ds), label); + dict_set_label (dataset_dict (ds), lex_tokcstr (lexer)); + lex_get (lexer); return CMD_SUCCESS; } -/* Add entry date line to DICT's documents. */ -static void -add_document_trailer (struct dictionary *dict) -{ - char buf[64]; - - sprintf (buf, _(" (Entered %s)"), get_start_date ()); - dict_add_document_line (dict, buf); -} - /* Performs the DOCUMENT command. 
*/ int cmd_document (struct lexer *lexer, struct dataset *ds) { struct dictionary *dict = dataset_dict (ds); - struct string line = DS_EMPTY_INITIALIZER; - bool end_dot; + char *trailer; - do + if (!lex_force_string (lexer)) + return CMD_FAILURE; + + while (lex_is_string (lexer)) { - end_dot = lex_end_dot (lexer); - ds_assign_string (&line, lex_entire_line_ds (lexer)); - if (end_dot) - ds_put_byte (&line, '.'); - dict_add_document_line (dict, ds_cstr (&line)); - - lex_discard_line (lexer); - lex_get_line (lexer); + dict_add_document_line (dict, lex_tokcstr (lexer), true); + lex_get (lexer); } - while (!end_dot); - add_document_trailer (dict); - ds_destroy (&line); + trailer = xasprintf (_(" (Entered %s)"), get_start_date ()); + dict_add_document_line (dict, trailer, true); + free (trailer); return CMD_SUCCESS; } -/* Performs the DROP DOCUMENTS command. */ +/* Performs the ADD DOCUMENTS command. */ int -cmd_drop_documents (struct lexer *lexer, struct dataset *ds) +cmd_add_documents (struct lexer *lexer, struct dataset *ds) { - dict_clear_documents (dataset_dict (ds)); - - return lex_end_of_command (lexer); + return cmd_document (lexer, ds); } - -/* Performs the ADD DOCUMENTS command. */ +/* Performs the DROP DOCUMENTS command. */ int -cmd_add_documents (struct lexer *lexer, struct dataset *ds) +cmd_drop_documents (struct lexer *lexer UNUSED, struct dataset *ds) { - struct dictionary *dict = dataset_dict (ds); - - if ( ! 
lex_force_string (lexer) ) - return CMD_FAILURE; - - while ( lex_is_string (lexer)) - { - dict_add_document_line (dict, lex_tokcstr (lexer)); - lex_get (lexer); - } - - add_document_trailer (dict); - - return lex_end_of_command (lexer) ; + dict_clear_documents (dataset_dict (ds)); + return CMD_SUCCESS; } diff --git a/src/language/xforms/compute.c b/src/language/xforms/compute.c index 5089d80d..82a1121f 100644 --- a/src/language/xforms/compute.c +++ b/src/language/xforms/compute.c @@ -100,7 +100,7 @@ cmd_compute (struct lexer *lexer, struct dataset *ds) lvalue_finalize (lvalue, compute, dict); - return lex_end_of_command (lexer); + return CMD_SUCCESS; fail: lvalue_destroy (lvalue, dict); @@ -256,7 +256,7 @@ cmd_if (struct lexer *lexer, struct dataset *ds) lvalue_finalize (lvalue, compute, dict); - return lex_end_of_command (lexer); + return CMD_SUCCESS; fail: lvalue_destroy (lvalue, dict); @@ -346,7 +346,7 @@ lvalue_parse (struct lexer *lexer, struct dataset *ds) if (!lex_force_id (lexer)) goto lossage; - if (lex_look_ahead (lexer) == T_LPAREN) + if (lex_next_token (lexer, 1) == T_LPAREN) { /* Vector. 
*/ lvalue->vector = dict_lookup_vector (dict, lex_tokcstr (lexer)); diff --git a/src/language/xforms/count.c b/src/language/xforms/count.c index 172a5e2c..d0045fee 100644 --- a/src/language/xforms/count.c +++ b/src/language/xforms/count.c @@ -28,6 +28,7 @@ #include "language/lexer/value-parser.h" #include "language/lexer/variable-parser.h" #include "libpspp/compiler.h" +#include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/pool.h" #include "libpspp/str.h" @@ -91,7 +92,9 @@ static trns_proc_func count_trns_proc; static trns_free_func count_trns_free; static bool parse_numeric_criteria (struct lexer *, struct pool *, struct criteria *); -static bool parse_string_criteria (struct lexer *, struct pool *, struct criteria *); +static bool parse_string_criteria (struct lexer *, struct pool *, + struct criteria *, + const char *dict_encoding); int cmd_count (struct lexer *lexer, struct dataset *ds) @@ -133,13 +136,14 @@ cmd_count (struct lexer *lexer, struct dataset *ds) crit = dv->crit = pool_alloc (trns->pool, sizeof *crit); for (;;) { + struct dictionary *dict = dataset_dict (ds); bool ok; crit->next = NULL; crit->vars = NULL; - if (!parse_variables_const (lexer, dataset_dict (ds), &crit->vars, + if (!parse_variables_const (lexer, dict, &crit->vars, &crit->var_cnt, - PV_DUPLICATE | PV_SAME_TYPE)) + PV_DUPLICATE | PV_SAME_TYPE)) goto fail; pool_register (trns->pool, free, crit->vars); @@ -150,7 +154,8 @@ cmd_count (struct lexer *lexer, struct dataset *ds) if (var_is_numeric (crit->vars[0])) ok = parse_numeric_criteria (lexer, trns->pool, crit); else - ok = parse_string_criteria (lexer, trns->pool, crit); + ok = parse_string_criteria (lexer, trns->pool, crit, + dict_get_encoding (dict)); if (!ok) goto fail; @@ -230,7 +235,8 @@ parse_numeric_criteria (struct lexer *lexer, struct pool *pool, struct criteria /* Parses a set of string criteria values. Returns success. 
*/ static bool -parse_string_criteria (struct lexer *lexer, struct pool *pool, struct criteria *crit) +parse_string_criteria (struct lexer *lexer, struct pool *pool, + struct criteria *crit, const char *dict_encoding) { int len = 0; size_t allocated = 0; @@ -244,6 +250,8 @@ parse_string_criteria (struct lexer *lexer, struct pool *pool, struct criteria * for (;;) { char **cur; + char *s; + if (crit->value_cnt >= allocated) crit->values.str = pool_2nrealloc (pool, crit->values.str, &allocated, @@ -251,11 +259,17 @@ parse_string_criteria (struct lexer *lexer, struct pool *pool, struct criteria * if (!lex_force_string (lexer)) return false; + + s = recode_string (dict_encoding, "UTF-8", lex_tokcstr (lexer), + ss_length (lex_tokss (lexer))); + cur = &crit->values.str[crit->value_cnt++]; *cur = pool_alloc (pool, len + 1); - str_copy_rpad (*cur, len + 1, lex_tokcstr (lexer)); + str_copy_rpad (*cur, len + 1, s); lex_get (lexer); + free (s); + lex_match (lexer, T_COMMA); if (lex_match (lexer, T_RPAREN)) break; diff --git a/src/language/xforms/fail.c b/src/language/xforms/fail.c index 3ca94524..feedb780 100644 --- a/src/language/xforms/fail.c +++ b/src/language/xforms/fail.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 2007, 2009 Free Software Foundation, Inc. + Copyright (C) 2007, 2009, 2010 Free Software Foundation, Inc. 
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -38,10 +38,8 @@ trns_fail (void *x UNUSED, struct ccase **c UNUSED, } int -cmd_debug_xform_fail (struct lexer *lexer, struct dataset *ds) +cmd_debug_xform_fail (struct lexer *lexer UNUSED, struct dataset *ds) { - add_transformation (ds, trns_fail, NULL, NULL); - - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/language/xforms/recode.c b/src/language/xforms/recode.c index 62cf387e..77543ca7 100644 --- a/src/language/xforms/recode.c +++ b/src/language/xforms/recode.c @@ -85,8 +85,6 @@ struct recode_trns { struct pool *pool; - - /* Variable types, for convenience. */ enum val_type src_type; /* src_vars[*] type. */ enum val_type dst_type; /* dst_vars[*] type. */ @@ -106,18 +104,21 @@ struct recode_trns }; static bool parse_src_vars (struct lexer *, struct recode_trns *, const struct dictionary *dict); -static bool parse_mappings (struct lexer *, struct recode_trns *); +static bool parse_mappings (struct lexer *, struct recode_trns *, + const char *dict_encoding); static bool parse_dst_vars (struct lexer *, struct recode_trns *, const struct dictionary *dict); static void add_mapping (struct recode_trns *, size_t *map_allocated, const struct map_in *); static bool parse_map_in (struct lexer *lexer, struct map_in *, struct pool *, - enum val_type src_type, size_t max_src_width); + enum val_type src_type, size_t max_src_width, + const char *dict_encoding); static void set_map_in_generic (struct map_in *, enum map_in_type); static void set_map_in_num (struct map_in *, enum map_in_type, double, double); static void set_map_in_str (struct map_in *, struct pool *, - struct substring, size_t width); + struct substring, size_t width, + const char *dict_encoding); static bool parse_map_out (struct lexer *lexer, struct pool *, struct map_out *); static void set_map_out_num (struct map_out *, double); @@ -138,15 
+139,16 @@ cmd_recode (struct lexer *lexer, struct dataset *ds) { do { + struct dictionary *dict = dataset_dict (ds); struct recode_trns *trns = pool_create_container (struct recode_trns, pool); /* Parse source variable names, then input to output mappings, then destintation variable names. */ - if (!parse_src_vars (lexer, trns, dataset_dict (ds) ) - || !parse_mappings (lexer, trns) - || !parse_dst_vars (lexer, trns, dataset_dict (ds))) + if (!parse_src_vars (lexer, trns, dict) + || !parse_mappings (lexer, trns, dict_get_encoding (dict)) + || !parse_dst_vars (lexer, trns, dict)) { recode_trns_free (trns); return CMD_FAILURE; @@ -160,9 +162,9 @@ cmd_recode (struct lexer *lexer, struct dataset *ds) /* Create destination variables, if needed. This must be the final step; otherwise we'd have to delete destination variables on failure. */ - trns->dst_dict = dataset_dict (ds); + trns->dst_dict = dict; if (trns->src_vars != trns->dst_vars) - create_dst_vars (trns, dataset_dict (ds)); + create_dst_vars (trns, dict); /* Done. */ add_transformation (ds, @@ -170,7 +172,7 @@ cmd_recode (struct lexer *lexer, struct dataset *ds) } while (lex_match (lexer, T_SLASH)); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Parses a set of variables to recode into TRNS->src_vars and @@ -192,7 +194,8 @@ parse_src_vars (struct lexer *lexer, into TRNS->mappings and TRNS->map_cnt. Sets TRNS->dst_type. Returns true if successful, false on parse error. 
*/ static bool -parse_mappings (struct lexer *lexer, struct recode_trns *trns) +parse_mappings (struct lexer *lexer, struct recode_trns *trns, + const char *dict_encoding) { size_t map_allocated; bool have_dst_type; @@ -232,7 +235,8 @@ parse_mappings (struct lexer *lexer, struct recode_trns *trns) struct map_in in; if (!parse_map_in (lexer, &in, trns->pool, - trns->src_type, trns->max_src_width)) + trns->src_type, trns->max_src_width, + dict_encoding)) return false; add_mapping (trns, &map_allocated, &in); lex_match (lexer, T_COMMA); @@ -292,7 +296,8 @@ parse_mappings (struct lexer *lexer, struct recode_trns *trns) false on parse error. */ static bool parse_map_in (struct lexer *lexer, struct map_in *in, struct pool *pool, - enum val_type src_type, size_t max_src_width) + enum val_type src_type, size_t max_src_width, + const char *dict_encoding) { if (lex_match_id (lexer, "ELSE")) @@ -319,7 +324,8 @@ parse_map_in (struct lexer *lexer, struct map_in *in, struct pool *pool, return false; else { - set_map_in_str (in, pool, lex_tokss (lexer), max_src_width); + set_map_in_str (in, pool, lex_tokss (lexer), max_src_width, + dict_encoding); lex_get (lexer); if (lex_token (lexer) == T_ID && lex_id_match (ss_cstr ("THRU"), lex_tokss (lexer))) @@ -371,13 +377,16 @@ set_map_in_num (struct map_in *in, enum map_in_type type, double x, double y) right to WIDTH characters long. 
*/ static void set_map_in_str (struct map_in *in, struct pool *pool, - struct substring string, size_t width) + struct substring string, size_t width, + const char *dict_encoding) { + char *s = recode_string (dict_encoding, "UTF-8", + ss_data (string), ss_length (string)); in->type = MAP_SINGLE; value_init_pool (pool, &in->x, width); value_copy_buf_rpad (&in->x, width, - CHAR_CAST_BUG (uint8_t *, ss_data (string)), - ss_length (string), ' '); + CHAR_CAST (uint8_t *, s), strlen (s), ' '); + free (s); } /* Parses a mapping output value into OUT, allocating memory from diff --git a/src/language/xforms/sample.c b/src/language/xforms/sample.c index 693f8447..f2a30a2a 100644 --- a/src/language/xforms/sample.c +++ b/src/language/xforms/sample.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2009, 2011 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2009-2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -111,7 +111,7 @@ cmd_sample (struct lexer *lexer, struct dataset *ds) trns->frac = frac; add_transformation (ds, sample_trns_proc, sample_trns_free, trns); - return lex_end_of_command (lexer); + return CMD_SUCCESS; } /* Executes a SAMPLE transformation. 
*/ diff --git a/src/language/xforms/select-if.c b/src/language/xforms/select-if.c index 4240f63b..9df4eba5 100644 --- a/src/language/xforms/select-if.c +++ b/src/language/xforms/select-if.c @@ -125,5 +125,5 @@ cmd_filter (struct lexer *lexer, struct dataset *ds) dict_set_filter (dict, v); } - return lex_end_of_command (lexer); + return CMD_SUCCESS; } diff --git a/src/libpspp/automake.mk b/src/libpspp/automake.mk index fcb28140..e4948406 100644 --- a/src/libpspp/automake.mk +++ b/src/libpspp/automake.mk @@ -28,8 +28,6 @@ src_libpspp_libpspp_la_SOURCES = \ src/libpspp/float-format.h \ src/libpspp/freaderror.c \ src/libpspp/freaderror.h \ - src/libpspp/getl.c \ - src/libpspp/getl.h \ src/libpspp/hash-functions.c \ src/libpspp/hash-functions.h \ src/libpspp/hash.c \ @@ -56,8 +54,6 @@ src_libpspp_libpspp_la_SOURCES = \ src/libpspp/misc.h \ src/libpspp/model-checker.c \ src/libpspp/model-checker.h \ - src/libpspp/msg-locator.c \ - src/libpspp/msg-locator.h \ src/libpspp/pool.c \ src/libpspp/pool.h \ src/libpspp/prompt.c \ diff --git a/src/libpspp/getl.c b/src/libpspp/getl.c deleted file mode 100644 index 9db6c3ae..00000000 --- a/src/libpspp/getl.c +++ /dev/null @@ -1,271 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006, 2009, 2010 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see <http://www.gnu.org/licenses/>.
*/ - -#include - -#include "libpspp/getl.h" - -#include - -#include "libpspp/ll.h" -#include "libpspp/str.h" -#include "libpspp/string-array.h" - -#include "gl/configmake.h" -#include "gl/relocatable.h" -#include "gl/xalloc.h" - -struct getl_source - { - struct getl_source *included_from; /* File that this is nested inside. */ - struct getl_source *includes; /* File nested inside this file. */ - - struct ll ll; /* Element in the sources list */ - - struct getl_interface *interface; - enum syntax_mode syntax_mode; - enum error_mode error_mode; - }; - -struct source_stream - { - struct ll_list sources ; /* List of source files. */ - struct string_array include_path; - }; - -char ** -getl_include_path (const struct source_stream *ss_) -{ - struct source_stream *ss = CONST_CAST (struct source_stream *, ss_); - string_array_terminate_null (&ss->include_path); - return ss->include_path.strings; -} - -static struct getl_source * -current_source (const struct source_stream *ss) -{ - const struct ll *ll = ll_head (&ss->sources); - return ll_data (ll, struct getl_source, ll ); -} - -enum syntax_mode -source_stream_current_syntax_mode (const struct source_stream *ss) -{ - struct getl_source *cs = current_source (ss); - - return cs->syntax_mode; -} - - - -enum error_mode -source_stream_current_error_mode (const struct source_stream *ss) -{ - struct getl_source *cs = current_source (ss); - - return cs->error_mode; -} - - - -/* Initialize getl. */ -struct source_stream * -create_source_stream (void) -{ - struct source_stream *ss; - - ss = xzalloc (sizeof (*ss)); - ll_init (&ss->sources); - - string_array_init (&ss->include_path); - string_array_append (&ss->include_path, "."); - if (getenv ("HOME") != NULL) - string_array_append_nocopy (&ss->include_path, - xasprintf ("%s/.pspp", getenv ("HOME"))); - string_array_append (&ss->include_path, relocate (PKGDATADIR)); - - return ss; -} - -/* Delete everything from the include path. 
*/ -void -getl_clear_include_path (struct source_stream *ss) -{ - string_array_clear (&ss->include_path); -} - -/* Add to the include path. */ -void -getl_add_include_dir (struct source_stream *ss, const char *path) -{ - string_array_append (&ss->include_path, path); -} - -/* Appends source S to the list of source files. */ -void -getl_append_source (struct source_stream *ss, - struct getl_interface *i, - enum syntax_mode syntax_mode, - enum error_mode err_mode) -{ - struct getl_source *s = xzalloc (sizeof ( struct getl_source )); - - s->interface = i ; - s->syntax_mode = syntax_mode; - s->error_mode = err_mode; - - ll_push_tail (&ss->sources, &s->ll); -} - -/* Nests source S within the current source file. */ -void -getl_include_source (struct source_stream *ss, - struct getl_interface *i, - enum syntax_mode syntax_mode, - enum error_mode err_mode) -{ - struct getl_source *current = current_source (ss); - struct getl_source *s = xzalloc (sizeof ( struct getl_source )); - - s->interface = i; - - s->included_from = current ; - s->includes = NULL; - s->syntax_mode = syntax_mode; - s->error_mode = err_mode; - current->includes = s; - - ll_push_head (&ss->sources, &s->ll); -} - -/* Closes the current source, and move the current source to the - next file in the chain. */ -static void -close_source (struct source_stream *ss) -{ - struct getl_source *s = current_source (ss); - - if ( s->interface->close ) - s->interface->close (s->interface); - - ll_pop_head (&ss->sources); - - if (s->included_from != NULL) - current_source (ss)->includes = NULL; - - free (s); -} - -/* Closes all sources until an interactive source is - encountered. */ -void -getl_abort_noninteractive (struct source_stream *ss) -{ - while ( ! ll_is_empty (&ss->sources)) - { - const struct getl_source *s = current_source (ss); - - if ( !s->interface->interactive (s->interface) ) - close_source (ss); - } -} - -/* Returns true if the current source is interactive, - false otherwise. 
*/ -bool -getl_is_interactive (const struct source_stream *ss) -{ - const struct getl_source *s = current_source (ss); - - if (ll_is_empty (&ss->sources) ) - return false; - - return s->interface->interactive (s->interface); -} - -/* Returns the name of the current source, or NULL if there is no - current source */ -const char * -getl_source_name (const struct source_stream *ss) -{ - const struct getl_source *s = current_source (ss); - - if ( ll_is_empty (&ss->sources) ) - return NULL; - - if ( ! s->interface->name ) - return NULL; - - return s->interface->name (s->interface); -} - -/* Returns the line number within the current source, or 0 if there is no - current source. */ -int -getl_source_location (const struct source_stream *ss) -{ - const struct getl_source *s = current_source (ss); - - if ( ll_is_empty (&ss->sources) ) - return 0; - - if ( !s->interface->location ) - return 0; - - return s->interface->location (s->interface); -} - - -/* Close getl. */ -void -destroy_source_stream (struct source_stream *ss) -{ - while ( !ll_is_empty (&ss->sources)) - close_source (ss); - string_array_destroy (&ss->include_path); - - free (ss); -} - - -/* Reads a single line into LINE. - Returns true when a line has been read, false at end of input. -*/ -bool -getl_read_line (struct source_stream *ss, struct string *line) -{ - assert (ss != NULL); - while (!ll_is_empty (&ss->sources)) - { - struct getl_source *s = current_source (ss); - - ds_clear (line); - if (s->interface->read (s->interface, line)) - { - while (s) - { - if (s->interface->filter) - s->interface->filter (s->interface, line); - s = s->included_from; - } - - return true; - } - close_source (ss); - } - - return false; -} diff --git a/src/libpspp/getl.h b/src/libpspp/getl.h deleted file mode 100644 index c7d0967f..00000000 --- a/src/libpspp/getl.h +++ /dev/null @@ -1,113 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006, 2010, 2011 Free Software Foundation, Inc. 
- - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see <http://www.gnu.org/licenses/>. */ - -#ifndef GETL_H -#define GETL_H 1 - -#include -#include "libpspp/ll.h" - -struct string; - -struct getl_source; - -/* Syntax rules that apply to a given source line. */ -enum syntax_mode - { - /* Each line that begins in column 1 starts a new command. A - `+' or `-' in column 1 is ignored to allow visual - indentation of new commands. Continuation lines must be - indented from the left margin. A period at the end of a - line does end a command, but it is optional. */ - GETL_BATCH, - - /* Each command must end in a period or in a blank line. */ - GETL_INTERACTIVE - }; - -enum error_mode - { - /* When errors are encountered, report the error and continue to - the next command. */ - ERRMODE_CONTINUE, - - /* When errors are encountered, abort the current stream. */ - ERRMODE_STOP - }; - -/* An abstract base class for objects which act as line buffers for the - PSPP. Ie anything which might contain content for the lexer */ -struct getl_interface - { - /* Returns true if the interface is interactive, that is, if - it prompts a human user. This property is independent of - the syntax mode returned by the read member function. */ - bool (*interactive) (const struct getl_interface *); - - /* Read a line the intended syntax mode from the interface. - Returns true if succesful, false on failure or at end of - input.
*/ - bool (*read) (struct getl_interface *, - struct string *); - - /* Close and destroy the interface */ - void (*close) (struct getl_interface *); - - /* Filter for current and all included sources, which may - modify the line. Usually null. */ - void (*filter) (struct getl_interface *, - struct string *line); - - /* Returns the name of the source */ - const char * (*name) (const struct getl_interface *); - - /* Returns the current location within the source */ - int (*location) (const struct getl_interface *); - }; - -struct source_stream; - -struct source_stream *create_source_stream (void); - -enum syntax_mode source_stream_current_syntax_mode - (const struct source_stream *); - - -enum error_mode source_stream_current_error_mode - (const struct source_stream *); - - -void destroy_source_stream (struct source_stream *); - -void getl_clear_include_path (struct source_stream *); -void getl_add_include_dir (struct source_stream *, const char *); -char **getl_include_path (const struct source_stream *); - -void getl_abort_noninteractive (struct source_stream *); -bool getl_is_interactive (const struct source_stream *); - -bool getl_read_line (struct source_stream *, struct string *); - -void getl_append_source (struct source_stream *, struct getl_interface *s, - enum syntax_mode, enum error_mode) ; - -void getl_include_source (struct source_stream *, struct getl_interface *s, - enum syntax_mode, enum error_mode) ; - -const char * getl_source_name (const struct source_stream *); -int getl_source_location (const struct source_stream *); - -#endif /* line-buffer.h */ diff --git a/src/libpspp/message.c b/src/libpspp/message.c index c7b56e0a..80475b35 100644 --- a/src/libpspp/message.c +++ b/src/libpspp/message.c @@ -17,7 +17,6 @@ #include #include "libpspp/message.h" -#include "libpspp/msg-locator.h" #include #include @@ -26,10 +25,12 @@ #include #include -#include "data/settings.h" +#include "libpspp/cast.h" #include "libpspp/str.h" #include "libpspp/version.h" 
+#include "data/settings.h" +#include "gl/minmax.h" #include "gl/progname.h" #include "gl/xalloc.h" #include "gl/xvasprintf.h" @@ -37,8 +38,9 @@ #include "gettext.h" #define _(msgid) gettext (msgid) -/* Message handler as set by msg_init(). */ -static void (*msg_handler) (const struct msg *); +/* Message handler as set by msg_set_handler(). */ +static void (*msg_handler) (const struct msg *, void *aux); +static void *msg_aux; /* Disables emitting messages if positive. */ static int messages_disabled; @@ -57,27 +59,19 @@ msg (enum msg_class class, const char *format, ...) m.severity = msg_class_to_severity (class); va_start (args, format); m.text = xvasprintf (format, args); - m.where.file_name = NULL; - m.where.line_number = 0; - m.where.first_column = 0; - m.where.last_column = 0; + m.file_name = NULL; + m.first_line = m.last_line = 0; + m.first_column = m.last_column = 0; va_end (args); msg_emit (&m); } -static struct source_stream *s_stream; - void -msg_init (struct source_stream *ss, void (*handler) (const struct msg *) ) +msg_set_handler (void (*handler) (const struct msg *, void *aux), void *aux) { - s_stream = ss; msg_handler = handler; -} - -void -msg_done (void) -{ + msg_aux = aux; } /* Working with messages. */ @@ -89,8 +83,8 @@ msg_dup (const struct msg *m) struct msg *new_msg; new_msg = xmemdup (m, sizeof *m); - if (m->where.file_name != NULL) - new_msg->where.file_name = xstrdup (m->where.file_name); + if (m->file_name != NULL) + new_msg->file_name = xstrdup (m->file_name); new_msg->text = xstrdup (m->text); return new_msg; @@ -98,13 +92,13 @@ msg_dup (const struct msg *m) /* Frees a message created by msg_dup(). - (Messages not created by msg_dup(), as well as their where.file_name + (Messages not created by msg_dup(), as well as their file_name members, are typically not dynamically allocated, so this function should not be used to destroy them.) 
*/ void msg_destroy (struct msg *m) { - free (m->where.file_name); + free (m->file_name); free (m->text); free (m); } @@ -118,23 +112,56 @@ msg_to_string (const struct msg *m, const char *command_name) ds_init_empty (&s); if (m->category != MSG_C_GENERAL - && (m->where.file_name - || m->where.line_number > 0 - || m->where.first_column > 0)) + && (m->file_name || m->first_line > 0 || m->first_column > 0)) { - if (m->where.file_name) - ds_put_format (&s, "%s", m->where.file_name); - if (m->where.line_number > 0) + int l1 = m->first_line; + int l2 = MAX (m->first_line, m->last_line - 1); + int c1 = m->first_column; + int c2 = MAX (m->first_column, m->last_column - 1); + + if (m->file_name) + ds_put_format (&s, "%s", m->file_name); + + if (l1 > 0) { if (!ds_is_empty (&s)) ds_put_byte (&s, ':'); - ds_put_format (&s, "%d", m->where.line_number); + + if (l2 > l1) + { + if (c1 > 0) + ds_put_format (&s, "%d.%d-%d.%d", l1, c1, l2, c2); + else + ds_put_format (&s, "%d-%d", l1, l2); + } + else + { + if (c1 > 0) + { + if (c2 > c1) + { + /* The GNU coding standards say to use + LINENO-1.COLUMN-1-COLUMN-2 for this case, but GNU + Emacs interprets COLUMN-2 as LINENO-2 if I do that. + I've submitted an Emacs bug report: + http://debbugs.gnu.org/cgi/bugreport.cgi?bug=7725. + + For now, let's be compatible. 
*/ + ds_put_format (&s, "%d.%d-%d.%d", l1, c1, l1, c2); + } + else + ds_put_format (&s, "%d.%d", l1, c1); + } + else + ds_put_format (&s, "%d", l1); + } } - if (m->where.first_column > 0) + else if (c1 > 0) { - ds_put_format (&s, ".%d", m->where.first_column); - if (m->where.last_column > m->where.first_column + 1) - ds_put_format (&s, "-%d", m->where.last_column - 1); + if (c2 > c1) + ds_put_format (&s, ".%d-%d", c1, c2); + else + ds_put_format (&s, ".%d", c1); } ds_put_cstr (&s, ": "); } @@ -214,12 +241,13 @@ submit_note (char *s) m.category = MSG_C_GENERAL; m.severity = MSG_S_NOTE; - m.where.file_name = NULL; - m.where.line_number = 0; - m.where.first_column = 0; - m.where.last_column = 0; + m.file_name = NULL; + m.first_line = 0; + m.last_line = 0; + m.first_column = 0; + m.last_column = 0; m.text = s; - msg_handler (&m); + msg_handler (&m, msg_aux); free (s); } @@ -236,7 +264,7 @@ process_msg (const struct msg *m) || (warnings_off && m->severity == MSG_S_WARNING) ) return; - msg_handler (m); + msg_handler (m, msg_aux); counts[m->severity]++; max_msgs = settings_get_max_messages (m->severity); @@ -271,20 +299,6 @@ process_msg (const struct msg *m) void msg_emit (struct msg *m) { - if ( s_stream && m->where.file_name == NULL ) - { - struct msg_locator loc; - - get_msg_location (s_stream, &loc); - m->where.file_name = loc.file_name; - m->where.line_number = loc.line_number; - } - else - { - m->where.file_name = NULL; - m->where.line_number = 0; - } - if (!messages_disabled) process_msg (m); diff --git a/src/libpspp/message.h b/src/libpspp/message.h index 7c59847f..5ced994a 100644 --- a/src/libpspp/message.h +++ b/src/libpspp/message.h @@ -67,30 +67,24 @@ msg_class_from_category_and_severity (enum msg_category category, return category * 3 + severity; } -/* A file location. */ -struct msg_locator - { - char *file_name; /* File name (NULL if none). */ - int line_number; /* Line number (0 if none). */ - int first_column; /* 1-based column number (0 if none). 
*/ - int last_column; /* 1-based exclusive last column (0 if none). */ - }; - /* A message. */ struct msg { enum msg_category category; /* Message category. */ enum msg_severity severity; /* Message severity. */ - struct msg_locator where; /* File location, or (NULL, -1). */ + char *file_name; /* Name of file containing error, or NULL. */ + int first_line; /* 1-based line number, or 0 if none. */ + int last_line; /* 1-based exclusive last line (0=none). */ + int first_column; /* 1-based first column, or 0 if none. */ + int last_column; /* 1-based exclusive last column (0=none). */ char *text; /* Error text. */ }; struct source_stream ; /* Initialization. */ -void msg_init (struct source_stream *, void (*handler) (const struct msg *) ); - -void msg_done (void); +void msg_set_handler (void (*handler) (const struct msg *, void *lexer), + void *aux); /* Working with messages. */ struct msg *msg_dup (const struct msg *); @@ -107,9 +101,6 @@ void msg_enable (void); void msg_disable (void); /* Error context. */ -void msg_push_msg_locator (const struct msg_locator *); -void msg_pop_msg_locator (const struct msg_locator *); - bool msg_ui_too_many_errors (void); void msg_ui_reset_counts (void); bool msg_ui_any_errors (void); diff --git a/src/libpspp/msg-locator.c b/src/libpspp/msg-locator.c deleted file mode 100644 index d29f35db..00000000 --- a/src/libpspp/msg-locator.c +++ /dev/null @@ -1,87 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006, 2011 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see <http://www.gnu.org/licenses/>. */ - -#include - -#include "libpspp/msg-locator.h" - -#include - -#include "libpspp/assertion.h" -#include "libpspp/cast.h" -#include "libpspp/message.h" -#include "libpspp/getl.h" - -#include "gl/xalloc.h" - -/* File locator stack. */ -static const struct msg_locator **file_loc; - -static int nfile_loc, mfile_loc; - -void -msg_locator_done (void) -{ - free(file_loc); - file_loc = NULL; - nfile_loc = mfile_loc = 0; -} - - -/* File locator stack functions. */ - -/* Pushes F onto the stack of file locations. */ -void -msg_push_msg_locator (const struct msg_locator *loc) -{ - if (nfile_loc >= mfile_loc) - { - if (mfile_loc == 0) - mfile_loc = 8; - else - mfile_loc *= 2; - - file_loc = xnrealloc (file_loc, mfile_loc, sizeof *file_loc); - } - - file_loc[nfile_loc++] = loc; -} - -/* Pops F off the stack of file locations. - Argument F is only used for verification that that is actually the - item on top of the stack. */ -void -msg_pop_msg_locator (const struct msg_locator *loc) -{ - assert (nfile_loc >= 0 && file_loc[nfile_loc - 1] == loc); - nfile_loc--; -} - -/* Puts the current file and line number into LOC, or NULL and -1 if - none. */ -void -get_msg_location (const struct source_stream *ss, struct msg_locator *loc) -{ - if (nfile_loc) - { - *loc = *file_loc[nfile_loc - 1]; - } - else - { - loc->file_name = CONST_CAST (char *, getl_source_name (ss)); - loc->line_number = getl_source_location (ss); - } -} diff --git a/src/libpspp/msg-locator.h b/src/libpspp/msg-locator.h deleted file mode 100644 index 1dfc8831..00000000 --- a/src/libpspp/msg-locator.h +++ /dev/null @@ -1,34 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
- - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -struct msg_locator ; - -void msg_locator_done (void); - -/* File locator stack functions. */ - -/* Pushes F onto the stack of file locations. */ -void msg_push_msg_locator (const struct msg_locator *loc); - -/* Pops F off the stack of file locations. - Argument F is only used for verification that that is actually the - item on top of the stack. */ -void msg_pop_msg_locator (const struct msg_locator *loc); - -struct source_stream ; -/* Puts the current file and line number into LOC, or NULL and -1 if - none. */ -void get_msg_location (const struct source_stream *ss, struct msg_locator *loc); diff --git a/src/output/driver.c b/src/output/driver.c index 136b4c4f..0da3e6a0 100644 --- a/src/output/driver.c +++ b/src/output/driver.c @@ -50,9 +50,6 @@ static const struct output_driver_factory *factories[]; /* Drivers currently registered with output_driver_register(). */ static struct llx_list drivers = LLX_INITIALIZER (drivers); -static struct output_item *deferred_syntax; -static bool in_command; - void output_close (void) { @@ -72,8 +69,10 @@ output_get_supported_formats (struct string_set *formats) string_set_insert (formats, (*fp)->extension); } -static void -output_submit__ (struct output_item *item) +/* Submits ITEM to the configured output drivers, and transfers ownership to + the output subsystem. 
*/ +void +output_submit (struct output_item *item) { struct llx *llx, *next; @@ -105,53 +104,6 @@ output_submit__ (struct output_item *item) output_item_unref (item); } -static void -flush_deferred_syntax (void) -{ - if (deferred_syntax != NULL) - { - output_submit__ (deferred_syntax); - deferred_syntax = NULL; - } -} - -/* Submits ITEM to the configured output drivers, and transfers ownership to - the output subsystem. */ -void -output_submit (struct output_item *item) -{ - if (is_text_item (item)) - { - struct text_item *text = to_text_item (item); - switch (text_item_get_type (text)) - { - case TEXT_ITEM_SYNTAX: - if (!in_command) - { - flush_deferred_syntax (); - deferred_syntax = item; - return; - } - break; - - case TEXT_ITEM_COMMAND_OPEN: - output_submit__ (item); - flush_deferred_syntax (); - in_command = true; - return; - - case TEXT_ITEM_COMMAND_CLOSE: - in_command = false; - break; - - default: - break; - } - } - - output_submit__ (item); -} - /* Flushes output to screen devices, so that the user can see output that doesn't fill up an entire page. */ void diff --git a/src/ui/gui/automake.mk b/src/ui/gui/automake.mk index 1595fdd0..eb37c16c 100644 --- a/src/ui/gui/automake.mk +++ b/src/ui/gui/automake.mk @@ -221,8 +221,6 @@ src_ui_gui_psppire_SOURCES = \ src/ui/gui/sort-cases-dialog.h \ src/ui/gui/split-file-dialog.c \ src/ui/gui/split-file-dialog.h \ - src/ui/gui/syntax-editor-source.c \ - src/ui/gui/syntax-editor-source.h \ src/ui/gui/text-data-import-dialog.c \ src/ui/gui/text-data-import-dialog.h \ src/ui/gui/transpose-dialog.c \ diff --git a/src/ui/gui/comments-dialog.c b/src/ui/gui/comments-dialog.c index 78cc74de..35e7c368 100644 --- a/src/ui/gui/comments-dialog.c +++ b/src/ui/gui/comments-dialog.c @@ -1,5 +1,5 @@ /* PSPPIRE - a graphical user interface for PSPP. 
- Copyright (C) 2007, 2010 Free Software Foundation + Copyright (C) 2007, 2010, 2011 Free Software Foundation This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -195,13 +195,7 @@ refresh (PsppireDialog *dialog, const struct comment_dialog *cd) gtk_text_buffer_set_text (buffer, "", 0); for ( i = 0 ; i < dict_get_document_line_cnt (cd->dict->dict); ++i ) - { - struct string str; - ds_init_empty (&str); - dict_get_document_line (cd->dict->dict, i, &str); - add_line_to_buffer (buffer, ds_cstr (&str)); - ds_destroy (&str); - } + add_line_to_buffer (buffer, dict_get_document_line (cd->dict->dict, i)); } @@ -216,11 +210,10 @@ generate_syntax (const struct comment_dialog *cd) GtkWidget *tv = get_widget_assert (cd->xml, "comments-textview1"); GtkWidget *check = get_widget_assert (cd->xml, "comments-checkbutton1"); GtkTextBuffer *buffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (tv)); - const char *existing_docs = dict_get_documents (cd->dict->dict); str = g_string_new ("\n* Data File Comments.\n\n"); - if ( NULL != existing_docs) + if (dict_get_documents (cd->dict->dict) != NULL) g_string_append (str, "DROP DOCUMENTS.\n"); g_string_append (str, "ADD DOCUMENT\n"); diff --git a/src/ui/gui/executor.c b/src/ui/gui/executor.c index 24b80db2..754d09ae 100644 --- a/src/ui/gui/executor.c +++ b/src/ui/gui/executor.c @@ -22,15 +22,12 @@ #include "data/procedure.h" #include "language/command.h" #include "language/lexer/lexer.h" -#include "language/syntax-string-source.h" #include "libpspp/cast.h" -#include "libpspp/getl.h" #include "output/driver.h" #include "ui/gui/psppire-data-store.h" #include "ui/gui/psppire-output-window.h" extern struct dataset *the_dataset; -extern struct source_stream *the_source_stream; extern PsppireDataStore *the_data_store; /* Lazy casereader callback function used by execute_syntax. 
*/ @@ -42,7 +39,7 @@ create_casereader_from_data_store (void *data_store_) } gboolean -execute_syntax (struct getl_interface *sss) +execute_syntax (struct lex_reader *lex_reader) { struct lexer *lexer; gboolean retval = TRUE; @@ -74,9 +71,9 @@ execute_syntax (struct getl_interface *sss) g_return_val_if_fail (proc_has_active_file (the_dataset), FALSE); - lexer = lex_create (the_source_stream); - - getl_append_source (the_source_stream, sss, GETL_BATCH, ERRMODE_CONTINUE); + lexer = lex_create (); + psppire_set_lexer (lexer); + lex_append (lexer, lex_reader); for (;;) { @@ -85,8 +82,7 @@ execute_syntax (struct getl_interface *sss) if ( cmd_result_is_failure (result)) { retval = FALSE; - if ( source_stream_current_error_mode (the_source_stream) - == ERRMODE_STOP ) + if ( lex_get_error_mode (lexer) == LEX_ERROR_STOP ) break; } @@ -94,9 +90,8 @@ execute_syntax (struct getl_interface *sss) break; } - getl_abort_noninteractive (the_source_stream); - lex_destroy (lexer); + psppire_set_lexer (NULL); proc_execute (the_dataset); @@ -125,5 +120,5 @@ execute_syntax_string (gchar *syntax) void execute_const_syntax_string (const gchar *syntax) { - execute_syntax (create_syntax_string_source (syntax)); + execute_syntax (lex_reader_for_string (syntax)); } diff --git a/src/ui/gui/executor.h b/src/ui/gui/executor.h index 81ece2b8..ae363e96 100644 --- a/src/ui/gui/executor.h +++ b/src/ui/gui/executor.h @@ -20,9 +20,9 @@ #include -struct getl_interface; +struct lex_reader; -gboolean execute_syntax (struct getl_interface *sss); +gboolean execute_syntax (struct lex_reader *); gchar *execute_syntax_string (gchar *syntax); void execute_const_syntax_string (const gchar *syntax); diff --git a/src/ui/gui/main.c b/src/ui/gui/main.c index 0c88204a..7e9d4ee5 100644 --- a/src/ui/gui/main.c +++ b/src/ui/gui/main.c @@ -21,13 +21,14 @@ #include #include +#include "language/lexer/include-path.h" #include "libpspp/argv-parser.h" #include "libpspp/assertion.h" #include "libpspp/cast.h" -#include 
"libpspp/getl.h" -#include "libpspp/version.h" #include "libpspp/copyleft.h" #include "libpspp/str.h" +#include "libpspp/string-array.h" +#include "libpspp/version.h" #include "ui/source-init-opts.h" #include "gl/configmake.h" @@ -58,28 +59,10 @@ static const struct argv_option startup_options[N_STARTUP_OPTIONS] = {"no-splash", 'q', no_argument, OPT_NO_SPLASH} }; -static char * -get_default_include_path (void) -{ - struct source_stream *ss; - struct string dst; - char **path; - size_t i; - - ss = create_source_stream (); - path = getl_include_path (ss); - ds_init_empty (&dst); - for (i = 0; path[i] != NULL; i++) - ds_put_format (&dst, " %s", path[i]); - destroy_source_stream (ss); - - return ds_steal_cstr (&dst); -} - static void usage (void) { - char *default_include_path = get_default_include_path (); + char *inc_path = string_array_join (include_path_default (), " "); GOptionGroup *gtk_options; GOptionContext *ctx; gchar *gtk_help_base, *gtk_help; @@ -116,16 +99,16 @@ Language options:\n\ set to `compatible' to disable PSPP extensions\n\ -i, --interactive interpret syntax in interactive mode\n\ -s, --safer don't allow some unsafe operations\n\ -Default search path:%s\n\ +Default search path: %s\n\ \n\ Informative output:\n\ -h, --help display this help and exit\n\ -V, --version output version information and exit\n\ \n\ A non-option argument is interpreted as a .sav or .por file to load.\n"), - program_name, gtk_help, default_include_path); + program_name, gtk_help, inc_path); - free (default_include_path); + free (inc_path); g_free (gtk_help_base); emit_bug_reporting_address (); @@ -202,7 +185,6 @@ quit_one_loop (gpointer data) struct initialisation_parameters { - struct source_stream *ss; const char *data_file; GtkWidget *splash_window; }; @@ -212,7 +194,7 @@ static gboolean run_inner_loop (gpointer data) { struct initialisation_parameters *ip = data; - initialize (ip->ss, ip->data_file); + initialize (ip->data_file); g_timeout_add (500, hide_splash_window, 
ip->splash_window); @@ -240,7 +222,6 @@ main (int argc, char *argv[]) struct initialisation_parameters init_p; gboolean show_splash = TRUE; struct argv_parser *parser; - struct source_stream *ss; const gchar *vers; set_program_name (argv[0]); @@ -264,7 +245,6 @@ main (int argc, char *argv[]) } - ss = create_source_stream (); /* Parse our own options. This must come BEFORE gdk_init otherwise options such as --help --version which ought to work without an X server, won't. @@ -272,7 +252,7 @@ main (int argc, char *argv[]) parser = argv_parser_create (); argv_parser_add_options (parser, startup_options, N_STARTUP_OPTIONS, startup_option_callback, &show_splash); - source_init_register_argv_parser (parser, ss); + source_init_register_argv_parser (parser); if (!argv_parser_run (parser, argc, argv)) exit (EXIT_FAILURE); argv_parser_destroy (parser); @@ -283,7 +263,6 @@ main (int argc, char *argv[]) gdk_init (&argc, &argv); init_p.splash_window = create_splash_window (); - init_p.ss = ss; init_p.data_file = optind < argc ? 
argv[optind] : NULL; if ( show_splash ) diff --git a/src/ui/gui/psppire-data-window.c b/src/ui/gui/psppire-data-window.c index b01582bb..6dcb8c69 100644 --- a/src/ui/gui/psppire-data-window.c +++ b/src/ui/gui/psppire-data-window.c @@ -22,7 +22,7 @@ #include "data/any-reader.h" #include "data/procedure.h" -#include "language/syntax-string-source.h" +#include "language/lexer/lexer.h" #include "libpspp/message.h" #include "ui/gui/help-menu.h" #include "ui/gui/binomial-dialog.h" @@ -352,8 +352,9 @@ static gboolean load_file (PsppireWindow *de, const gchar *file_name) { gchar *native_file_name; - struct getl_interface *sss; struct string filename; + gchar *syntax; + bool ok; ds_init_empty (&filename); @@ -364,15 +365,12 @@ load_file (PsppireWindow *de, const gchar *file_name) g_free (native_file_name); - sss = create_syntax_format_source ("GET FILE=%s.", - ds_cstr (&filename)); - + syntax = g_strdup_printf ("GET FILE=%s.", ds_cstr (&filename)); ds_destroy (&filename); - if (execute_syntax (sss) ) - return TRUE; - - return FALSE; + ok = execute_syntax (lex_reader_for_string (syntax)); + g_free (syntax); + return ok; } static GtkWidget * diff --git a/src/ui/gui/psppire-dict.c b/src/ui/gui/psppire-dict.c index d19fc809..91bfed25 100644 --- a/src/ui/gui/psppire-dict.c +++ b/src/ui/gui/psppire-dict.c @@ -1,5 +1,5 @@ /* PSPPIRE - a graphical user interface for PSPP. - Copyright (C) 2004, 2006, 2007, 2009 Free Software Foundation + Copyright (C) 2004, 2006, 2007, 2009, 2010, 2011 Free Software Foundation This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -23,6 +23,7 @@ #include #include "data/dictionary.h" +#include "data/identifier.h" #include "data/missing-values.h" #include "data/value-labels.h" #include "data/variable.h" @@ -425,7 +426,7 @@ psppire_dict_set_name (PsppireDict* d, gint idx, const gchar *name) g_assert (d); g_assert (PSPPIRE_IS_DICT (d)); - if ( ! 
var_is_valid_name (name, false)) + if ( ! dict_id_is_valid (d->dict, name, false)) return FALSE; if ( idx < dict_get_var_cnt (d->dict)) @@ -527,7 +528,7 @@ gboolean psppire_dict_check_name (const PsppireDict *dict, const gchar *name, gboolean report) { - if ( ! var_is_valid_name (name, report ) ) + if ( ! dict_id_is_valid (dict->dict, name, report ) ) return FALSE; if (psppire_dict_lookup_var (dict, name)) @@ -835,7 +836,7 @@ gboolean psppire_dict_rename_var (PsppireDict *dict, struct variable *v, const gchar *name) { - if ( ! var_is_valid_name (name, false)) + if ( ! dict_id_is_valid (dict->dict, name, false)) return FALSE; /* Make sure no other variable has this name */ diff --git a/src/ui/gui/psppire-syntax-window.c b/src/ui/gui/psppire-syntax-window.c index 1d788819..50e7f09d 100644 --- a/src/ui/gui/psppire-syntax-window.c +++ b/src/ui/gui/psppire-syntax-window.c @@ -30,7 +30,6 @@ #include "psppire-data-window.h" #include "psppire-window-register.h" #include "psppire-syntax-window.h" -#include "syntax-editor-source.h" #include "xalloc.h" @@ -156,8 +155,16 @@ editor_execute_syntax (const PsppireSyntaxWindow *sw, GtkTextIter start, GtkTextIter stop) { PsppireWindow *win = PSPPIRE_WINDOW (sw); - const gchar *name = psppire_window_get_filename (win); - execute_syntax (create_syntax_editor_source (sw->buffer, start, stop, name)); + struct lex_reader *reader; + gchar *text; + + text = gtk_text_buffer_get_text (sw->buffer, &start, &stop, FALSE); + reader = lex_reader_for_string (text); + g_free (text); + + lex_reader_set_file_name (reader, psppire_window_get_filename (win)); + + execute_syntax (reader); } @@ -590,8 +597,6 @@ on_modified_changed (GtkTextBuffer *buffer, PsppireWindow *window) psppire_window_set_unsaved (window); } -extern struct source_stream *the_source_stream ; - static void psppire_syntax_window_init (PsppireSyntaxWindow *window) { @@ -616,7 +621,6 @@ psppire_syntax_window_init (PsppireSyntaxWindow *window) window->edit_paste = get_action_assert 
(xml, "edit_paste"); window->buffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (text_view)); - window->lexer = lex_create (the_source_stream); window->sb = get_widget_assert (xml, "statusbar2"); window->text_context = gtk_statusbar_get_context_id (GTK_STATUSBAR (window->sb), "Text Context"); diff --git a/src/ui/gui/psppire-syntax-window.h b/src/ui/gui/psppire-syntax-window.h index 08f5a7c5..2b1fe03f 100644 --- a/src/ui/gui/psppire-syntax-window.h +++ b/src/ui/gui/psppire-syntax-window.h @@ -48,7 +48,6 @@ struct _PsppireSyntaxWindow /* */ GtkTextBuffer *buffer; /* The buffer which contains the text */ - struct lexer *lexer; /* Lexer to parse syntax */ GtkWidget *sb; guint text_context; diff --git a/src/ui/gui/psppire-var-store.c b/src/ui/gui/psppire-var-store.c index a2e66854..2a915afe 100644 --- a/src/ui/gui/psppire-var-store.c +++ b/src/ui/gui/psppire-var-store.c @@ -1,5 +1,5 @@ /* PSPPIRE - a graphical user interface for PSPP. - Copyright (C) 2006, 2009, 2010 Free Software Foundation + Copyright (C) 2006, 2009, 2010, 2011 Free Software Foundation This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -489,7 +489,7 @@ psppire_var_store_clear (PsppireSheetModel *model, glong row, glong col) switch (col) { case PSPPIRE_VAR_STORE_COL_LABEL: - var_set_label (pv, NULL); + var_clear_label (pv); return TRUE; break; } @@ -588,7 +588,8 @@ psppire_var_store_set_string (PsppireSheetModel *model, break; case PSPPIRE_VAR_STORE_COL_LABEL: { - var_set_label (pv, text); + var_set_label (pv, text, + psppire_dict_encoding (var_store->dictionary), true); return TRUE; } break; diff --git a/src/ui/gui/psppire.c b/src/ui/gui/psppire.c index bb6006d1..0d41932a 100644 --- a/src/ui/gui/psppire.c +++ b/src/ui/gui/psppire.c @@ -32,9 +32,6 @@ #include "data/sys-file-reader.h" #include "language/lexer/lexer.h" -#include "language/syntax-string-source.h" - -#include "libpspp/getl.h" #include "libpspp/i18n.h" 
#include "libpspp/message.h" #include "libpspp/version.h" @@ -68,12 +65,10 @@ PsppireVarStore *the_var_store = 0; static void create_icon_factory (void); -struct source_stream *the_source_stream ; struct dataset * the_dataset = NULL; static GtkWidget *the_data_window; -static void handle_msg (const struct msg *); static void load_data_file (const char *); static void @@ -89,7 +84,7 @@ replace_casereader (struct casereader *s) void -initialize (struct source_stream *ss, const char *data_file) +initialize (const char *data_file) { PsppireDict *dictionary = 0; @@ -102,9 +97,7 @@ initialize (struct source_stream *ss, const char *data_file) fh_init (); the_dataset = create_dataset (); - - the_source_stream = ss; - msg_init (ss, handle_msg); + psppire_set_lexer (NULL); dictionary = psppire_dict_new_from_dict (dataset_dict (the_dataset)); @@ -143,7 +136,6 @@ initialize (struct source_stream *ss, const char *data_file) void de_initialize (void) { - destroy_source_stream (the_source_stream); settings_done (); output_close (); i18n_done (); @@ -300,7 +292,25 @@ load_data_file (const char *arg) } static void -handle_msg (const struct msg *m) +handle_msg (const struct msg *m_, void *lexer_) +{ + struct lexer *lexer = lexer_; + struct msg m = *m_; + + if (lexer != NULL && m.file_name == NULL) + { + m.file_name = CONST_CAST (char *, lex_get_file_name (lexer)); + m.first_line = lex_get_first_line_number (lexer, 0); + m.last_line = lex_get_last_line_number (lexer, 0); + m.first_column = lex_get_first_column (lexer, 0); + m.last_column = lex_get_last_column (lexer, 0); + } + + message_item_submit (message_item_create (&m)); +} + +void +psppire_set_lexer (struct lexer *lexer) { - message_item_submit (message_item_create (m)); + msg_set_handler (handle_msg, lexer); } diff --git a/src/ui/gui/psppire.h b/src/ui/gui/psppire.h index ee747ef9..bb796608 100644 --- a/src/ui/gui/psppire.h +++ b/src/ui/gui/psppire.h @@ -17,13 +17,15 @@ #ifndef PSPPIRE_H #define PSPPIRE_H -struct 
source_stream; +struct lexer; -void initialize (struct source_stream *, const char *data_file); +void initialize (const char *data_file); void de_initialize (void); void psppire_quit (void); const char * output_file_name (void); +void psppire_set_lexer (struct lexer *); + #endif /* PSPPIRE_H */ diff --git a/src/ui/gui/syntax-editor-source.c b/src/ui/gui/syntax-editor-source.c deleted file mode 100644 index 6ec866c9..00000000 --- a/src/ui/gui/syntax-editor-source.c +++ /dev/null @@ -1,130 +0,0 @@ -/* PSPPIRE - a graphical user interface for PSPP. - Copyright (C) 2006, 2009 Free Software Foundation - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . 
*/ - - -#include - -#include -#include -#include -#include - -#include - -#include - -#include "syntax-editor-source.h" -#include "psppire-syntax-window.h" - -#include "xalloc.h" - -struct syntax_editor_source - { - struct getl_interface parent; - GtkTextBuffer *buffer; - GtkTextIter i; - GtkTextIter end; - const gchar *name; - }; - - -static bool -always_false (const struct getl_interface *i UNUSED) -{ - return false; -} - -/* Returns the name of the source */ -static const char * -name (const struct getl_interface *i) -{ - const struct syntax_editor_source *ses = (const struct syntax_editor_source *) i; - return ses->name; -} - - -/* Returns the location within the source */ -static int -location (const struct getl_interface *i) -{ - const struct syntax_editor_source *ses = (const struct syntax_editor_source *) i; - - return gtk_text_iter_get_line (&ses->i); -} - - -static bool -read_line_from_buffer (struct getl_interface *i, - struct string *line) -{ - gchar *text; - GtkTextIter next_line; - - struct syntax_editor_source *ses - = UP_CAST (i, struct syntax_editor_source, parent); - - if ( gtk_text_iter_compare (&ses->i, &ses->end) >= 0) - return false; - - next_line = ses->i; - gtk_text_iter_forward_line (&next_line); - - text = gtk_text_buffer_get_text (ses->buffer, - &ses->i, &next_line, - FALSE); - g_strchomp (text); - - ds_assign_cstr (line, text); - - g_free (text); - - gtk_text_iter_forward_line (&ses->i); - - return true; -} - - -static void -do_close (struct getl_interface *i ) -{ - free (i); -} - -struct getl_interface * -create_syntax_editor_source (GtkTextBuffer *buffer, - GtkTextIter start, - GtkTextIter stop, - const gchar *nm - ) -{ - struct syntax_editor_source *ses = xzalloc (sizeof *ses); - - ses->buffer = buffer; - ses->i = start; - ses->end = stop; - ses->name = nm; - - - ses->parent.interactive = always_false; - ses->parent.read = read_line_from_buffer; - ses->parent.close = do_close; - - ses->parent.name = name; - ses->parent.location = 
location; - - - return &ses->parent; -} diff --git a/src/ui/gui/syntax-editor-source.h b/src/ui/gui/syntax-editor-source.h deleted file mode 100644 index f8d08eaa..00000000 --- a/src/ui/gui/syntax-editor-source.h +++ /dev/null @@ -1,34 +0,0 @@ -/* PSPPIRE - a graphical user interface for PSPP. - Copyright (C) 2006 Free Software Foundation - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -#ifndef SYNTAX_EDITOR_SOURCE_H -#define SYNTAX_EDITOR_SOURCE_H - -#include -struct getl_interface; - -struct syntax_editor; - -struct getl_interface * -create_syntax_editor_source (GtkTextBuffer *buffer, - GtkTextIter start, - GtkTextIter stop, - const gchar *name - ); - - - -#endif diff --git a/src/ui/source-init-opts.c b/src/ui/source-init-opts.c index a984b3d0..9627fdf5 100644 --- a/src/ui/source-init-opts.c +++ b/src/ui/source-init-opts.c @@ -26,11 +26,10 @@ #include "data/por-file-reader.h" #include "data/settings.h" #include "data/sys-file-reader.h" -#include "language/syntax-file.h" -#include "language/syntax-string-source.h" +#include "language/lexer/include-path.h" +#include "language/lexer/lexer.h" #include "libpspp/assertion.h" #include "libpspp/argv-parser.h" -#include "libpspp/getl.h" #include "libpspp/llx.h" #include "libpspp/message.h" #include "ui/syntax-gen.h" @@ -62,10 +61,8 @@ static const struct argv_option source_init_options[N_SOURCE_INIT_OPTIONS] = }; static void -source_init_option_callback (int id, void 
*ss_) +source_init_option_callback (int id, void *aux UNUSED) { - struct source_stream *ss = ss_; - switch (id) { case OPT_ALGORITHM: @@ -82,13 +79,13 @@ source_init_option_callback (int id, void *ss_) case OPT_INCLUDE: if (!strcmp (optarg, "-")) - getl_clear_include_path (ss); + include_path_clear (); else - getl_add_include_dir (ss, optarg); + include_path_add (optarg); break; case OPT_NO_INCLUDE: - getl_clear_include_path (ss); + include_path_clear (); break; case OPT_SAFER: @@ -113,9 +110,8 @@ source_init_option_callback (int id, void *ss_) } void -source_init_register_argv_parser (struct argv_parser *ap, - struct source_stream *ss) +source_init_register_argv_parser (struct argv_parser *ap) { argv_parser_add_options (ap, source_init_options, N_SOURCE_INIT_OPTIONS, - source_init_option_callback, ss); + source_init_option_callback, NULL); } diff --git a/src/ui/source-init-opts.h b/src/ui/source-init-opts.h index cfd87f3f..ee1de031 100644 --- a/src/ui/source-init-opts.h +++ b/src/ui/source-init-opts.h @@ -19,9 +19,7 @@ #define UI_SOURCE_INIT_OPTS struct argv_parser; -struct source_stream; -void source_init_register_argv_parser (struct argv_parser *, - struct source_stream *); +void source_init_register_argv_parser (struct argv_parser *); #endif /* ui/source/source-init-opts.h */ diff --git a/src/ui/terminal/automake.mk b/src/ui/terminal/automake.mk index 81b896dc..3fa72c41 100644 --- a/src/ui/terminal/automake.mk +++ b/src/ui/terminal/automake.mk @@ -3,16 +3,13 @@ noinst_LTLIBRARIES += src/ui/terminal/libui.la src_ui_terminal_libui_la_SOURCES = \ - src/ui/terminal/read-line.c \ - src/ui/terminal/read-line.h \ src/ui/terminal/main.c \ - src/ui/terminal/msg-ui.c \ - src/ui/terminal/msg-ui.h \ - src/ui/terminal/terminal.c \ - src/ui/terminal/terminal.h \ src/ui/terminal/terminal-opts.c \ - src/ui/terminal/terminal-opts.h - + src/ui/terminal/terminal-opts.h \ + src/ui/terminal/terminal-reader.c \ + src/ui/terminal/terminal-reader.h \ + src/ui/terminal/terminal.c \ + 
src/ui/terminal/terminal.h src_ui_terminal_libui_la_CFLAGS = $(NCURSES_CFLAGS) diff --git a/src/ui/terminal/main.c b/src/ui/terminal/main.c index 5fa16041..a9db6feb 100644 --- a/src/ui/terminal/main.c +++ b/src/ui/terminal/main.c @@ -1,5 +1,5 @@ /* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006, 2007, 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 1997-9, 2000, 2006, 2007, 2009, 2010, 2011 Free Software Foundation, Inc. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -39,20 +39,19 @@ #include "gsl/gsl_errno.h" #include "language/command.h" #include "language/lexer/lexer.h" -#include "language/syntax-file.h" +#include "language/lexer/include-path.h" #include "libpspp/argv-parser.h" #include "libpspp/compiler.h" -#include "libpspp/getl.h" #include "libpspp/i18n.h" #include "libpspp/message.h" #include "libpspp/version.h" #include "math/random.h" #include "output/driver.h" +#include "output/message-item.h" #include "ui/debugger.h" #include "ui/source-init-opts.h" -#include "ui/terminal/msg-ui.h" -#include "ui/terminal/read-line.h" #include "ui/terminal/terminal-opts.h" +#include "ui/terminal/terminal-reader.h" #include "ui/terminal/terminal.h" #include "gl/fatal-signal.h" @@ -62,15 +61,13 @@ #include "gettext.h" #define _(msgid) gettext (msgid) -static struct dataset * the_dataset = NULL; +static struct dataset *the_dataset; -static struct lexer *the_lexer; -static struct source_stream *the_source_stream ; - -static void add_syntax_file (struct source_stream *, enum syntax_mode, - const char *file_name); +static void add_syntax_reader (struct lexer *, const char *file_name, + const char *encoding, enum lex_syntax_mode); static void bug_handler(int sig); static void fpu_init (void); +static void output_msg (const struct msg *, void *); /* Program entry point. 
*/ int @@ -78,8 +75,10 @@ main (int argc, char **argv) { struct terminal_opts *terminal_opts; struct argv_parser *parser; - enum syntax_mode syntax_mode; + enum lex_syntax_mode syntax_mode; + char *syntax_encoding; bool process_statrc; + struct lexer *lexer; set_program_name (argv[0]); @@ -92,31 +91,32 @@ main (int argc, char **argv) gsl_set_error_handler_off (); fh_init (); - the_source_stream = create_source_stream (); - readln_initialize (); settings_init (); terminal_check_size (); random_init (); + lexer = lex_create (); the_dataset = create_dataset (); parser = argv_parser_create (); - terminal_opts = terminal_opts_init (parser, &syntax_mode, &process_statrc); - source_init_register_argv_parser (parser, the_source_stream); + terminal_opts = terminal_opts_init (parser, &syntax_mode, &process_statrc, + &syntax_encoding); + source_init_register_argv_parser (parser); if (!argv_parser_run (parser, argc, argv)) exit (EXIT_FAILURE); terminal_opts_done (terminal_opts, argc, argv); argv_parser_destroy (parser); - msg_ui_init (the_source_stream); + msg_set_handler (output_msg, lexer); + dataset_set_default_syntax_encoding (the_dataset, syntax_encoding); /* Add syntax files to source stream. */ if (process_statrc) { - char *rc = fn_search_path ("rc", getl_include_path (the_source_stream)); + char *rc = include_path_search ("rc"); if (rc != NULL) { - add_syntax_file (the_source_stream, GETL_BATCH, rc); + add_syntax_reader (lexer, rc, "Auto", LEX_SYNTAX_AUTO); free (rc); } } @@ -125,28 +125,37 @@ main (int argc, char **argv) int i; for (i = optind; i < argc; i++) - add_syntax_file (the_source_stream, syntax_mode, argv[i]); + add_syntax_reader (lexer, argv[i], syntax_encoding, syntax_mode); } else - add_syntax_file (the_source_stream, syntax_mode, "-"); + add_syntax_reader (lexer, "-", syntax_encoding, syntax_mode); /* Parse and execute syntax. 
*/ - the_lexer = lex_create (the_source_stream); + lex_get (lexer); for (;;) { - int result = cmd_parse (the_lexer, the_dataset); + int result = cmd_parse (lexer, the_dataset); if (result == CMD_EOF || result == CMD_FINISH) break; - if (result == CMD_CASCADING_FAILURE && - !getl_is_interactive (the_source_stream)) - { - msg (SE, _("Stopping syntax file processing here to avoid " - "a cascade of dependent command failures.")); - getl_abort_noninteractive (the_source_stream); - } - else if (msg_ui_too_many_errors ()) - getl_abort_noninteractive (the_source_stream); + else if (cmd_result_is_failure (result) && lex_token (lexer) != T_STOP) + { + if (lex_get_error_mode (lexer) == LEX_ERROR_STOP) + { + msg (MW, _("Error encountered while ERROR=STOP is effective.")); + lex_discard_noninteractive (lexer); + } + else if (result == CMD_CASCADING_FAILURE + && lex_get_error_mode (lexer) != LEX_ERROR_INTERACTIVE) + { + msg (SE, _("Stopping syntax file processing here to avoid " + "a cascade of dependent command failures.")); + lex_discard_noninteractive (lexer); + } + } + + if (msg_ui_too_many_errors ()) + lex_discard_noninteractive (lexer); } @@ -155,16 +164,13 @@ main (int argc, char **argv) random_done (); settings_done (); fh_done (); - lex_destroy (the_lexer); - destroy_source_stream (the_source_stream); - readln_uninitialize (); + lex_destroy (lexer); output_close (); - msg_ui_done (); i18n_done (); return msg_ui_any_errors (); } - + static void fpu_init (void) { @@ -210,13 +216,32 @@ bug_handler(int sig) } static void -add_syntax_file (struct source_stream *ss, enum syntax_mode syntax_mode, - const char *file_name) +output_msg (const struct msg *m_, void *lexer_) +{ + struct lexer *lexer = lexer_; + struct msg m = *m_; + + if (m.file_name == NULL) + { + m.file_name = CONST_CAST (char *, lex_get_file_name (lexer)); + m.first_line = lex_get_first_line_number (lexer, 0); + m.last_line = lex_get_last_line_number (lexer, 0); + } + + message_item_submit (message_item_create 
(&m)); +} + +static void +add_syntax_reader (struct lexer *lexer, const char *file_name, + const char *encoding, enum lex_syntax_mode syntax_mode) { - struct getl_interface *source; + struct lex_reader *reader; + + reader = (!strcmp (file_name, "-") && isatty (STDIN_FILENO) + ? terminal_reader_create () + : lex_reader_for_file (file_name, encoding, syntax_mode, + LEX_ERROR_CONTINUE)); - source = (!strcmp (file_name, "-") && isatty (STDIN_FILENO) - ? create_readln_source () - : create_syntax_file_source (file_name)); - getl_append_source (ss, source, syntax_mode, ERRMODE_CONTINUE); + if (reader) + lex_append (lexer, reader); } diff --git a/src/ui/terminal/msg-ui.c b/src/ui/terminal/msg-ui.c deleted file mode 100644 index 63682d18..00000000 --- a/src/ui/terminal/msg-ui.c +++ /dev/null @@ -1,41 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006, 2010 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . 
*/ - -#include - -#include "msg-ui.h" -#include "libpspp/message.h" -#include "libpspp/msg-locator.h" -#include "output/message-item.h" - -static void -handle_msg (const struct msg *m) -{ - message_item_submit (message_item_create (m)); -} - -void -msg_ui_init (struct source_stream *ss) -{ - msg_init (ss, handle_msg); -} - -void -msg_ui_done (void) -{ - msg_done (); - msg_locator_done (); -} diff --git a/src/ui/terminal/msg-ui.h b/src/ui/terminal/msg-ui.h deleted file mode 100644 index 197d7c02..00000000 --- a/src/ui/terminal/msg-ui.h +++ /dev/null @@ -1,29 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -#ifndef MSG_UI_H -#define MSG_UI_H 1 - -#include -#include - -struct source_stream; - -void msg_ui_set_error_file (FILE *); -void msg_ui_init (struct source_stream *); -void msg_ui_done (void); - -#endif /* msg-ui.h */ diff --git a/src/ui/terminal/read-line.c b/src/ui/terminal/read-line.c deleted file mode 100644 index 544e18d5..00000000 --- a/src/ui/terminal/read-line.c +++ /dev/null @@ -1,264 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2007, 2009, 2011 Free Software Foundation, Inc. 
- - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . */ - -#include - -#include "ui/terminal/read-line.h" - -#include -#include -#include -#include -#if ! HAVE_READLINE -#include -#endif - -#include "data/file-name.h" -#include "data/settings.h" -#include "language/command.h" -#include "language/prompt.h" -#include "libpspp/cast.h" -#include "libpspp/message.h" -#include "libpspp/str.h" -#include "libpspp/version.h" -#include "output/driver.h" -#include "output/journal.h" -#include "ui/terminal/msg-ui.h" -#include "ui/terminal/terminal.h" - -#include "gl/xalloc.h" - -#include "gettext.h" -#define _(msgid) gettext (msgid) - -#if HAVE_READLINE -#include -#include - -static char *history_file; - -static char **complete_command_name (const char *, int, int); -static char **dont_complete (const char *, int, int); -#endif /* HAVE_READLINE */ - - -struct readln_source -{ - struct getl_interface parent ; - - bool (*interactive_func) (struct string *line, - enum prompt_style) ; -}; - - -static bool initialised = false; - -/* Initialize getl. */ -void -readln_initialize (void) -{ - initialised = true; - -#if HAVE_READLINE - rl_basic_word_break_characters = "\n"; - using_history (); - stifle_history (500); - if (history_file == NULL) - { - const char *home_dir = getenv ("HOME"); - if (home_dir != NULL) - { - history_file = xasprintf ("%s/.pspp_history", home_dir); - read_history (history_file); - } - } -#endif -} - -/* Close getl. 
*/ -void -readln_uninitialize (void) -{ - initialised = false; - -#if HAVE_READLINE - if (history_file != NULL && false == settings_get_testing_mode () ) - write_history (history_file); - clear_history (); - free (history_file); -#endif -} - - -static bool -read_interactive (struct getl_interface *s, - struct string *line) -{ - struct readln_source *is = UP_CAST (s, struct readln_source, parent); - - return is->interactive_func (line, prompt_get_style ()); -} - -static bool -always_true (const struct getl_interface *s UNUSED) -{ - return true; -} - -/* Display a welcoming message. */ -static void -welcome (void) -{ - static bool welcomed = false; - if (welcomed) - return; - welcomed = true; - fputs ("PSPP is free software and you are welcome to distribute copies of " - "it\nunder certain conditions; type \"show copying.\" to see the " - "conditions.\nThere is ABSOLUTELY NO WARRANTY for PSPP; type \"show " - "warranty.\" for details.\n", stdout); - puts (stat_version); - readln_initialize (); - journal_enable (); -} - -/* Gets a line from the user and stores it into LINE. - Prompts the user with PROMPT. - Returns true if successful, false at end of file. - */ -static bool -readln_read (struct string *line, enum prompt_style style) -{ - const char *prompt = prompt_get (style); -#if HAVE_READLINE - char *string; -#endif - bool eof; - - assert (initialised); - - msg_ui_reset_counts (); - - welcome (); - - output_flush (); - -#if HAVE_READLINE - rl_attempted_completion_function = (style == PROMPT_FIRST - ? 
complete_command_name - : dont_complete); - string = readline (prompt); - if (string == NULL) - eof = true; - else - { - if (string[0]) - add_history (string); - ds_assign_cstr (line, string); - free (string); - eof = false; - } -#else - fputs (prompt, stdout); - fflush (stdout); - if (ds_read_line (line, stdin, SIZE_MAX)) - { - ds_chomp (line, '\n'); - eof = false; - } - else - eof = true; -#endif - - /* Check whether the size of the window has changed, so that - the output drivers can adjust their settings as needed. We - only do this for the first line of a command, as it's - possible that the output drivers are actually in use - afterward, and we don't want to confuse them in the middle - of output. */ - if (style == PROMPT_FIRST) - terminal_check_size (); - - return !eof; -} - -static void -readln_close (struct getl_interface *i) -{ - free (i); -} - -/* Creates a source which uses readln to get its line */ -struct getl_interface * -create_readln_source (void) -{ - struct readln_source *rlns = xzalloc (sizeof (*rlns)); - - rlns->interactive_func = readln_read; - - rlns->parent.interactive = always_true; - rlns->parent.read = read_interactive; - rlns->parent.close = readln_close; - - return &rlns->parent; -} - - -#if HAVE_READLINE -static char *command_generator (const char *text, int state); - -/* Returns a set of command name completions for TEXT. - This is of the proper form for assigning to - rl_attempted_completion_function. */ -static char ** -complete_command_name (const char *text, int start, int end UNUSED) -{ - if (start == 0) - { - /* Complete command name at start of line. */ - return rl_completion_matches (text, command_generator); - } - else - { - /* Otherwise don't do any completion. */ - rl_attempted_completion_over = 1; - return NULL; - } -} - -/* Do not do any completion for TEXT. 
*/ -static char ** -dont_complete (const char *text UNUSED, int start UNUSED, int end UNUSED) -{ - rl_attempted_completion_over = 1; - return NULL; -} - -/* If STATE is 0, returns the first command name matching TEXT. - Otherwise, returns the next command name matching TEXT. - Returns a null pointer when no matches are left. */ -static char * -command_generator (const char *text, int state) -{ - static const struct command *cmd; - const char *name; - - if (state == 0) - cmd = NULL; - name = cmd_complete (text, &cmd); - return name ? xstrdup (name) : NULL; -} -#endif /* HAVE_READLINE */ diff --git a/src/ui/terminal/read-line.h b/src/ui/terminal/read-line.h deleted file mode 100644 index 0eb67097..00000000 --- a/src/ui/terminal/read-line.h +++ /dev/null @@ -1,31 +0,0 @@ -/* PSPP - a program for statistical analysis. - Copyright (C) 1997-9, 2000, 2011 Free Software Foundation, Inc. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . 
*/
-
-#ifndef READLN_H
-#define READLN_H
-
-#include "libpspp/str.h"
-#include "libpspp/getl.h"
-
-void readln_initialize (void);
-void readln_uninitialize (void);
-
-struct getl_interface *create_readln_source (void);
-
-
-
-#endif /* READLN_H */
-
diff --git a/src/ui/terminal/terminal-opts.c b/src/ui/terminal/terminal-opts.c
index d55a8a2e..121d89d5 100644
--- a/src/ui/terminal/terminal-opts.c
+++ b/src/ui/terminal/terminal-opts.c
@@ -23,12 +23,11 @@
 
 #include "data/settings.h"
 #include "data/file-name.h"
-#include "language/syntax-file.h"
+#include "language/lexer/include-path.h"
 #include "libpspp/argv-parser.h"
 #include "libpspp/assertion.h"
 #include "libpspp/cast.h"
 #include "libpspp/compiler.h"
-#include "libpspp/getl.h"
 #include "libpspp/llx.h"
 #include "libpspp/str.h"
 #include "libpspp/string-array.h"
@@ -38,8 +37,6 @@
 #include "output/driver.h"
 #include "output/driver-provider.h"
 #include "output/msglog.h"
-#include "ui/terminal/msg-ui.h"
-#include "ui/terminal/read-line.h"
 
 #include "gl/error.h"
 #include "gl/progname.h"
@@ -53,12 +50,13 @@
 
 struct terminal_opts
   {
-    enum syntax_mode *syntax_mode;
     struct string_map options; /* Output driver options.
*/ bool has_output_driver; bool has_terminal_driver; bool has_error_file; + enum lex_syntax_mode *syntax_mode; bool *process_statrc; + char **syntax_encoding; }; enum @@ -68,7 +66,9 @@ enum OPT_OUTPUT, OPT_OUTPUT_OPTION, OPT_NO_OUTPUT, + OPT_BATCH, OPT_INTERACTIVE, + OPT_SYNTAX_ENCODING, OPT_NO_STATRC, OPT_HELP, OPT_VERSION, @@ -82,7 +82,9 @@ static struct argv_option terminal_argv_options[N_TERMINAL_OPTIONS] = {"output", 'o', required_argument, OPT_OUTPUT}, {NULL, 'O', required_argument, OPT_OUTPUT_OPTION}, {"no-output", 0, no_argument, OPT_NO_OUTPUT}, + {"batch", 'b', no_argument, OPT_BATCH}, {"interactive", 'i', no_argument, OPT_INTERACTIVE}, + {"syntax-encoding", 0, required_argument, OPT_SYNTAX_ENCODING}, {"no-statrc", 'r', no_argument, OPT_NO_STATRC}, {"help", 'h', no_argument, OPT_HELP}, {"version", 'V', no_argument, OPT_VERSION}, @@ -160,29 +162,11 @@ get_supported_formats (void) return format_string; } -static char * -get_default_include_path (void) -{ - struct source_stream *ss; - struct string dst; - char **path; - size_t i; - - ss = create_source_stream (); - path = getl_include_path (ss); - ds_init_empty (&dst); - for (i = 0; path[i] != NULL; i++) - ds_put_format (&dst, " %s", path[i]); - destroy_source_stream (ss); - - return ds_steal_cstr (&dst); -} - static void usage (void) { char *supported_formats = get_supported_formats (); - char *default_include_path = get_default_include_path (); + char *inc_path = string_array_join (include_path_default (), " "); printf (_("\ PSPP, a program for statistical analysis of sample data.\n\ @@ -208,19 +192,21 @@ Language options:\n\ calculated from broken algorithms\n\ -x, --syntax={compatible|enhanced}\n\ set to `compatible' to disable PSPP extensions\n\ + -b, --batch interpret syntax in batch mode\n\ -i, --interactive interpret syntax in interactive mode\n\ + --syntax-encoding=ENCODING specify encoding for syntax files\n\ -s, --safer don't allow some unsafe operations\n\ -Default search path:%s\n\ +Default 
search path: %s\n\ \n\ Informative output:\n\ -h, --help display this help and exit\n\ -V, --version output version information and exit\n\ \n\ Non-option arguments are interpreted as syntax files to execute.\n"), - program_name, supported_formats, default_include_path); + program_name, supported_formats, inc_path); free (supported_formats); - free (default_include_path); + free (inc_path); emit_bug_reporting_address (); exit (EXIT_SUCCESS); @@ -257,8 +243,16 @@ terminal_option_callback (int id, void *to_) to->has_output_driver = true; break; + case OPT_BATCH: + *to->syntax_mode = LEX_SYNTAX_BATCH; + break; + case OPT_INTERACTIVE: - *to->syntax_mode = GETL_INTERACTIVE; + *to->syntax_mode = LEX_SYNTAX_INTERACTIVE; + break; + + case OPT_SYNTAX_ENCODING: + *to->syntax_encoding = optarg; break; case OPT_NO_STATRC: @@ -282,19 +276,23 @@ terminal_option_callback (int id, void *to_) struct terminal_opts * terminal_opts_init (struct argv_parser *ap, - enum syntax_mode *syntax_mode, bool *process_statrc) + enum lex_syntax_mode *syntax_mode, bool *process_statrc, + char **syntax_encoding) { struct terminal_opts *to; - *syntax_mode = GETL_BATCH; + *syntax_mode = LEX_SYNTAX_AUTO; *process_statrc = true; + *syntax_encoding = "Auto"; to = xzalloc (sizeof *to); to->syntax_mode = syntax_mode; string_map_init (&to->options); to->has_output_driver = false; to->has_error_file = false; + to->syntax_mode = syntax_mode; to->process_statrc = process_statrc; + to->syntax_encoding = syntax_encoding; argv_parser_add_options (ap, terminal_argv_options, N_TERMINAL_OPTIONS, terminal_option_callback, to); diff --git a/src/ui/terminal/terminal-opts.h b/src/ui/terminal/terminal-opts.h index 50f23196..64581bae 100644 --- a/src/ui/terminal/terminal-opts.h +++ b/src/ui/terminal/terminal-opts.h @@ -19,14 +19,16 @@ #define UI_TERMINAL_TERMINAL_OPTS_H 1 #include -#include "libpspp/getl.h" +#include "language/lexer/lexer.h" struct argv_parser; +struct lexer; struct terminal_opts; struct terminal_opts 
*terminal_opts_init (struct argv_parser *,
-                                          enum syntax_mode *,
-                                          bool *process_statrc);
+                                          enum lex_syntax_mode *,
+                                          bool *process_statrc,
+                                          char **syntax_encoding);
 void terminal_opts_done (struct terminal_opts *, int argc, char *argv[]);
 
 #endif /* ui/terminal/terminal-opts.h */
diff --git a/src/ui/terminal/terminal-reader.c b/src/ui/terminal/terminal-reader.c
new file mode 100644
index 00000000..7c80d27b
--- /dev/null
+++ b/src/ui/terminal/terminal-reader.c
@@ -0,0 +1,308 @@
+/* PSPP - a program for statistical analysis.
+   Copyright (C) 1997-9, 2000, 2007, 2009, 2010, 2011 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program. If not, see .
*/ + +#include + +#include "ui/terminal/terminal-reader.h" + +#include +#include +#include +#include +#include + +#include "data/file-name.h" +#include "data/settings.h" +#include "language/command.h" +#include "language/lexer/lexer.h" +#include "libpspp/assertion.h" +#include "libpspp/cast.h" +#include "libpspp/message.h" +#include "libpspp/prompt.h" +#include "libpspp/str.h" +#include "libpspp/version.h" +#include "output/driver.h" +#include "output/journal.h" +#include "ui/terminal/terminal.h" + +#include "gl/minmax.h" +#include "gl/xalloc.h" + +#include "gettext.h" +#define _(msgid) gettext (msgid) + +struct terminal_reader + { + struct lex_reader reader; + struct substring s; + size_t offset; + bool eof; + }; + +static int n_terminal_readers; + +static void readline_init (void); +static void readline_done (void); +static struct substring readline_read (enum prompt_style); + +/* Display a welcoming message. */ +static void +welcome (void) +{ + static bool welcomed = false; + if (welcomed) + return; + welcomed = true; + fputs ("PSPP is free software and you are welcome to distribute copies of " + "it\nunder certain conditions; type \"show copying.\" to see the " + "conditions.\nThere is ABSOLUTELY NO WARRANTY for PSPP; type \"show " + "warranty.\" for details.\n", stdout); + puts (stat_version); + journal_enable (); +} + +static struct terminal_reader * +terminal_reader_cast (struct lex_reader *r) +{ + return UP_CAST (r, struct terminal_reader, reader); +} + +static size_t +terminal_reader_read (struct lex_reader *r_, char *buf, size_t n, + enum prompt_style prompt_style) +{ + struct terminal_reader *r = terminal_reader_cast (r_); + size_t chunk; + + if (r->offset >= r->s.length && !r->eof) + { + welcome (); + msg_ui_reset_counts (); + output_flush (); + + ss_dealloc (&r->s); + r->s = readline_read (prompt_style); + r->offset = 0; + r->eof = ss_is_empty (r->s); + + /* Check whether the size of the window has changed, so that + the output drivers can adjust their 
settings as needed. We + only do this for the first line of a command, as it's + possible that the output drivers are actually in use + afterward, and we don't want to confuse them in the middle + of output. */ + if (prompt_style == PROMPT_FIRST) + terminal_check_size (); + } + + chunk = MIN (n, r->s.length - r->offset); + memcpy (buf, r->s.string + r->offset, chunk); + r->offset += chunk; + return chunk; +} + +static void +terminal_reader_close (struct lex_reader *r_) +{ + struct terminal_reader *r = terminal_reader_cast (r_); + + ss_dealloc (&r->s); + free (r->reader.file_name); + free (r); + + if (!--n_terminal_readers) + readline_done (); +} + +static struct lex_reader_class terminal_reader_class = + { + terminal_reader_read, + terminal_reader_close + }; + +/* Creates a source which uses readln to get its line */ +struct lex_reader * +terminal_reader_create (void) +{ + struct terminal_reader *r; + + if (!n_terminal_readers++) + readline_init (); + + r = xzalloc (sizeof *r); + r->reader.class = &terminal_reader_class; + r->reader.syntax = LEX_SYNTAX_INTERACTIVE; + r->reader.error = LEX_ERROR_INTERACTIVE; + r->reader.file_name = NULL; + r->s = ss_empty (); + r->offset = 0; + r->eof = false; + return &r->reader; +} + +#if HAVE_READLINE +#include +#include + +static char *history_file; + +static char **complete_command_name (const char *, int, int); +static char **dont_complete (const char *, int, int); +static char *command_generator (const char *text, int state); + +static void +readline_init (void) +{ + rl_basic_word_break_characters = "\n"; + using_history (); + stifle_history (500); + if (history_file == NULL) + { + const char *home_dir = getenv ("HOME"); + if (home_dir != NULL) + { + history_file = xasprintf ("%s/.pspp_history", home_dir); + read_history (history_file); + } + } +} + +static void +readline_done (void) +{ + if (history_file != NULL && false == settings_get_testing_mode () ) + write_history (history_file); + clear_history (); + free 
(history_file); +} + +static const char * +readline_prompt (enum prompt_style style) +{ + switch (style) + { + case PROMPT_FIRST: + return "PSPP> "; + + case PROMPT_LATER: + return " > "; + + case PROMPT_DATA: + return "data> "; + + case PROMPT_COMMENT: + return "comment> "; + + case PROMPT_DOCUMENT: + return "document> "; + + case PROMPT_DO_REPEAT: + return "DO REPEAT> "; + } + + NOT_REACHED (); +} + +static struct substring +readline_read (enum prompt_style style) +{ + char *string; + + rl_attempted_completion_function = (style == PROMPT_FIRST + ? complete_command_name + : dont_complete); + string = readline (readline_prompt (style)); + if (string != NULL) + { + char *end; + + if (string[0]) + add_history (string); + + end = strchr (string, '\0'); + *end = '\n'; + return ss_buffer (string, end - string + 1); + } + else + return ss_empty (); +} + +/* Returns a set of command name completions for TEXT. + This is of the proper form for assigning to + rl_attempted_completion_function. */ +static char ** +complete_command_name (const char *text, int start, int end UNUSED) +{ + if (start == 0) + { + /* Complete command name at start of line. */ + return rl_completion_matches (text, command_generator); + } + else + { + /* Otherwise don't do any completion. */ + rl_attempted_completion_over = 1; + return NULL; + } +} + +/* Do not do any completion for TEXT. */ +static char ** +dont_complete (const char *text UNUSED, int start UNUSED, int end UNUSED) +{ + rl_attempted_completion_over = 1; + return NULL; +} + +/* If STATE is 0, returns the first command name matching TEXT. + Otherwise, returns the next command name matching TEXT. + Returns a null pointer when no matches are left. */ +static char * +command_generator (const char *text, int state) +{ + static const struct command *cmd; + const char *name; + + if (state == 0) + cmd = NULL; + name = cmd_complete (text, &cmd); + return name ? 
xstrdup (name) : NULL; +} +#else /* !HAVE_READLINE */ +static void +readline_init (void) +{ +} + +static void +readline_done (void) +{ +} + +static struct substring +readline_read (enum prompt_style style) +{ + const char *prompt = prompt_get (style); + struct string line; + + fputs (prompt, stdout); + fflush (stdout); + ds_init_empty (&line); + ds_read_line (&line, stdin, SIZE_MAX); + + return line.ss; +} +#endif /* !HAVE_READLINE */ diff --git a/src/ui/terminal/terminal-reader.h b/src/ui/terminal/terminal-reader.h new file mode 100644 index 00000000..2d51c9bd --- /dev/null +++ b/src/ui/terminal/terminal-reader.h @@ -0,0 +1,23 @@ +/* PSPP - a program for statistical analysis. + Copyright (C) 1997-9, 2000, 2010 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#ifndef TERMINAL_READER_H +#define TERMINAL_READER_H + +struct lex_reader *terminal_reader_create (void); + +#endif /* terminal-reader.h */ + diff --git a/tests/data/data-in.at b/tests/data/data-in.at index 05dc10ff..d5595975 100644 --- a/tests/data/data-in.at +++ b/tests/data/data-in.at @@ -3500,23 +3500,23 @@ PRINT OUTFILE='wkday.out'/ALL. EXECUTE. ]) AT_CHECK([pspp -O format=csv wkday.sps], [0], [dnl -wkday.sps:20.1-2: warning: Data for variable wkday2 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. 
+wkday.sps:20.1-20.2: warning: Data for variable wkday2 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-3: warning: Data for variable wkday3 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.3: warning: Data for variable wkday3 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-4: warning: Data for variable wkday4 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.4: warning: Data for variable wkday4 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-5: warning: Data for variable wkday5 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.5: warning: Data for variable wkday5 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-6: warning: Data for variable wkday6 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.6: warning: Data for variable wkday6 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-7: warning: Data for variable wkday7 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.7: warning: Data for variable wkday7 is not valid as format WKDAY: Unrecognized weekday name. 
At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-8: warning: Data for variable wkday8 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.8: warning: Data for variable wkday8 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-9: warning: Data for variable wkday9 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.9: warning: Data for variable wkday9 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. -wkday.sps:20.1-10: warning: Data for variable wkday10 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. +wkday.sps:20.1-20.10: warning: Data for variable wkday10 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified. ]) AT_CHECK([cat wkday.out], [0], [dnl . . . . . . . . . @&t@ @@ -3595,51 +3595,51 @@ PRINT OUTFILE='month.out'/ALL. EXECUTE. ]) AT_CHECK([pspp -O format=csv month.sps], [0], [dnl -month.sps:15.1-4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:15.1-15.4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:15.1-5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. 
Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:15.1-15.5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:15.1-6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:15.1-15.6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:15.1-7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:15.1-15.7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:15.1-8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:15.1-15.8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:15.1-9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:15.1-15.9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. 
Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:15.1-10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:15.1-15.10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. 
Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:26.1-10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:26.1-26.10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. 
Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. 
Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. -month.sps:39.1-10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. +month.sps:39.1-39.10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names. ]) AT_CHECK([cat month.out], [0], [dnl . . . . . . . . @&t@ diff --git a/tests/data/sys-file-reader.at b/tests/data/sys-file-reader.at index 3a8c6892..4a06ebdf 100644 --- a/tests/data/sys-file-reader.at +++ b/tests/data/sys-file-reader.at @@ -1268,8 +1268,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xd4: Misplaced type 4 record. 
- -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1298,8 +1296,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xd4: Unrecognized record type 8. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1356,8 +1352,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xb4: Invalid variable name `$UM1'. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1386,8 +1380,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xb4: Invalid variable name `TO'. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1416,8 +1408,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xb4: Bad width 256 for variable VAR1. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1447,8 +1437,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xd4: Duplicate variable name `VAR1'. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1477,8 +1465,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xb4: Variable label indicator field is not 0 or 1. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. 
]) done AT_CLEANUP @@ -1507,8 +1493,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], ["error: `sys-file.sav' near offset 0xb4: Numeric missing value indicator field is not -3, -2, 0, 1, 2, or 3." - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1537,8 +1521,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], ["error: `sys-file.sav' near offset 0xb4: String missing value indicator field is not 0, 1, 2, or 3." - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1568,8 +1550,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xb4: Missing string continuation record. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1598,8 +1578,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0xc0: Unknown variable format 255. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1675,8 +1653,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav': Weighting variable must be numeric (not string variable `STR1'). - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1708,8 +1684,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0x4c: Variable index 3 not in valid range 1...2. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1742,8 +1716,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [error: `sys-file.sav' near offset 0x4c: Variable index 3 refers to long string continuation. 
- -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1779,8 +1751,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0x12c: Duplicate type 6 (document) record. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1817,8 +1787,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0xd4: Number of document lines (0) must be greater than 0 and less than 26843545. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1848,8 +1816,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0xd8: Record type 7 subtype 3 too large. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -1942,8 +1908,6 @@ do ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0xd8: Floating-point representation indicated by system file (2) differs from expected (1). - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -2718,8 +2682,6 @@ warning: `sys-file.sav' near offset 0xd8: NUM1 listed as string of invalid lengt "warning: `sys-file.sav' near offset 0xd8: NUM1 listed in very long string record with width 00255, which requires only one segment." error: `sys-file.sav' near offset 0xd8: Very long string NUM1 overflows dictionary. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -2757,8 +2719,6 @@ GET FILE='sys-file.sav'. 
]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0x4f8: Very long string with width 256 has segment 1 of width 9 (expected 4). - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -2789,8 +2749,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0xd4: Invalid number of labels 2147483647. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -2823,8 +2781,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0xe8: Variable index record (type 4) does not immediately follow value label record (type 3) as it should. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -2854,8 +2810,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0xec: Number of variables associated with a value label (0) is not between 1 and the number of variables (1). - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -2888,8 +2842,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl error: `sys-file.sav' near offset 0xf4: Value labels may not be added to long string variables (e.g. STR1) using records types 3 and 4. - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -2922,8 +2874,6 @@ GET FILE='sys-file.sav'. ]) AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl "error: `sys-file.sav' near offset 0xf4: Variables associated with value label are not all of identical type. 
Variable STR1 is string, but variable NUM1 is numeric." - -sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -3148,8 +3098,6 @@ num1,num2 3,4 5,6 7,8 - -sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -3186,8 +3134,6 @@ LIST. Table: Data List num1,num2 1,2 - -sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -3224,8 +3170,6 @@ LIST. Table: Data List str14 one data item @&t@ - -sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP @@ -3276,8 +3220,6 @@ LIST. Table: Data List num1,num2,str4,str8,str15 -99,0,,abcdefgh,0123 @&t@ - -sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) done AT_CLEANUP diff --git a/tests/dissect-sysfile.c b/tests/dissect-sysfile.c index fda3b385..444c15cc 100644 --- a/tests/dissect-sysfile.c +++ b/tests/dissect-sysfile.c @@ -36,7 +36,7 @@ #include "gettext.h" #define _(msgid) gettext (msgid) -#define VAR_NAME_LEN 64 +#define ID_MAX_LEN 64 struct sfm_reader { @@ -925,7 +925,7 @@ read_long_string_value_labels (struct sfm_reader *r, size_t size, size_t count) while (ftello (r->file) - start < size * count) { long long posn = ftello (r->file); - char var_name[VAR_NAME_LEN + 1]; + char var_name[ID_MAX_LEN + 1]; int var_name_len; int n_values; int width; @@ -933,10 +933,10 @@ read_long_string_value_labels (struct sfm_reader *r, size_t size, size_t count) /* Read variable name. 
*/ var_name_len = read_int (r); - if (var_name_len > VAR_NAME_LEN) + if (var_name_len > ID_MAX_LEN) sys_error (r, _("Variable name length in long string value label " "record (%d) exceeds %d-byte limit."), - var_name_len, VAR_NAME_LEN); + var_name_len, ID_MAX_LEN); read_string (r, var_name, var_name_len + 1); /* Read width, number of values. */ diff --git a/tests/language/control/do-repeat.at b/tests/language/control/do-repeat.at index a0b29d93..4421ba6b 100644 --- a/tests/language/control/do-repeat.at +++ b/tests/language/control/do-repeat.at @@ -1,6 +1,95 @@ AT_BANNER([DO REPEAT]) -AT_SETUP([DO REPEAT -- ordinary]) +AT_SETUP([DO REPEAT -- simple]) +AT_DATA([do-repeat.sps], [dnl +INPUT PROGRAM. +STRING y(A1). +DO REPEAT xval = 1 2 3 / yval = 'a' 'b' 'c' / var = a b c. +COMPUTE x=xval. +COMPUTE y=yval. +COMPUTE var=xval. +END CASE. +END REPEAT. +END FILE. +END INPUT PROGRAM. +LIST. +]) +AT_CHECK([pspp -o pspp.csv do-repeat.sps]) +AT_CHECK([cat pspp.csv], [0], [dnl +Table: Data List +y,x,a,b,c +a,1.00,1.00,. ,. @&t@ +b,2.00,. ,2.00,. @&t@ +c,3.00,. ,. ,3.00 +]) +AT_CLEANUP + +AT_SETUP([DO REPEAT -- containing BEGIN DATA]) +AT_DATA([do-repeat.sps], [dnl +DO REPEAT offset = 1 2 3. +DATA LIST NOTABLE /x 1-2. +BEGIN DATA. +10 +20 +30 +END DATA. +COMPUTE x = x + offset. +LIST. +END REPEAT. +]) +AT_CHECK([pspp -o pspp.csv do-repeat.sps]) +AT_CHECK([cat pspp.csv], [0], [dnl +Table: Data List +x +11 +21 +31 + +Table: Data List +x +12 +22 +32 + +Table: Data List +x +13 +23 +33 +]) +AT_CLEANUP + +AT_SETUP([DO REPEAT -- dummy vars not expanded in include files]) +AT_DATA([include.sps], [dnl +COMPUTE y = y + x + 10. +]) +AT_DATA([do-repeat.sps], [dnl +INPUT PROGRAM. +COMPUTE x = 0. +COMPUTE y = 0. +END CASE. +END FILE. +END INPUT PROGRAM. + +DO REPEAT x = 1 2 3. +INCLUDE 'include.sps'. +END REPEAT. + +LIST. +]) +AT_CHECK([pspp -o pspp.csv do-repeat.sps], [0], [dnl +do-repeat.sps:8: warning: DO REPEAT: Dummy variable name `x' hides dictionary variable `x'. 
+]) +AT_CHECK([cat pspp.csv], [0], [dnl +do-repeat.sps:8: warning: DO REPEAT: Dummy variable name `x' hides dictionary variable `x'. + +Table: Data List +x,y +.00,30.00 +]) +AT_CLEANUP + +AT_SETUP([DO REPEAT -- nested]) AT_DATA([do-repeat.sps], [dnl DATA LIST NOTABLE /a 1. BEGIN DATA. @@ -55,13 +144,7 @@ AT_DATA([do-repeat.sps], [dnl DATA LIST NOTABLE /x 1. DO REPEAT y = 1 TO 10. ]) -AT_CHECK([pspp -o pspp.csv do-repeat.sps], [1], [dnl -error: DO REPEAT: DO REPEAT without END REPEAT. -error: Stopping syntax file processing here to avoid a cascade of dependent command failures. -]) -AT_CHECK([cat pspp.csv], [0], [dnl -error: DO REPEAT: DO REPEAT without END REPEAT. - -error: Stopping syntax file processing here to avoid a cascade of dependent command failures. +AT_CHECK([pspp -O format=csv do-repeat.sps], [1], [dnl +error: DO REPEAT: Syntax error at end of input: expecting `END'. ]) AT_CLEANUP diff --git a/tests/language/data-io/data-list.at b/tests/language/data-io/data-list.at index dba2a4a1..b0582c6c 100644 --- a/tests/language/data-io/data-list.at +++ b/tests/language/data-io/data-list.at @@ -49,7 +49,7 @@ B,F8.0 C,F8.0 D,F8.0 -data-list.pspp:3.9-13: warning: Data for variable D is not valid as format F: Number followed by garbage. +data-list.pspp:3.9-3.13: warning: Data for variable D is not valid as format F: Number followed by garbage. Table: Data List A,B,C,D @@ -160,9 +160,9 @@ end data. list. ]) AT_CHECK([pspp -O format=csv data-list.pspp], [0], [dnl -data-list.pspp:8.1-3: warning: Data for variable count is not valid as format F: Field contents are not numeric. +data-list.pspp:8.1-8.3: warning: Data for variable count is not valid as format F: Field contents are not numeric. -data-list.pspp:11.1-3: warning: Data for variable count is not valid as format F: Field contents are not numeric. +data-list.pspp:11.1-11.3: warning: Data for variable count is not valid as format F: Field contents are not numeric. 
Table: Data List start,end,count diff --git a/tests/language/data-io/get.at b/tests/language/data-io/get.at index a519b67f..b9a187f2 100644 --- a/tests/language/data-io/get.at +++ b/tests/language/data-io/get.at @@ -58,8 +58,6 @@ dnl We use stdin here, because the bug seems to manifest itself only in dnl interactive mode. AT_CHECK([echo "GET /FILE='nonexistent.sav'." | pspp -O format=csv], [1], [dnl error: An error occurred while opening `nonexistent.sav': No such file or directory. - --:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) AT_CLEANUP diff --git a/tests/language/data-io/inpt-pgm.at b/tests/language/data-io/inpt-pgm.at index f0ce9395..f048d374 100644 --- a/tests/language/data-io/inpt-pgm.at +++ b/tests/language/data-io/inpt-pgm.at @@ -13,10 +13,6 @@ END INPUT PROGRAM. ]) AT_CHECK([pspp -O format=csv input-program.sps], [1], [dnl input-program.sps:3: error: BEGIN DATA: BEGIN DATA is not allowed inside INPUT PROGRAM. - -input-program.sps:4: error: Unknown command `123456789'. - -input-program.sps:5: error: Unknown command `END DATA'. ]) AT_CLEANUP @@ -32,6 +28,6 @@ END INPUT PROGRAM. DESCRIPTIVES x. ]) AT_CHECK([pspp -O format=csv input-program.sps], [1], [dnl -error: DESCRIPTIVES: Syntax error at end of file: expecting `BEGIN'. +error: DESCRIPTIVES: Syntax error at end of input: expecting `BEGIN'. ]) AT_CLEANUP diff --git a/tests/language/data-io/print.at b/tests/language/data-io/print.at index 9381d7da..71259e0c 100644 --- a/tests/language/data-io/print.at +++ b/tests/language/data-io/print.at @@ -172,7 +172,7 @@ PRINT F8.2 LIST. ]) AT_CHECK([pspp -O format=csv print.sps], [1], [dnl -print.sps:7: error: PRINT: Syntax error at `F8.2': expecting a valid subcommand. +print.sps:7.7-7.10: error: PRINT: Syntax error at `F8.2': expecting a valid subcommand. 
Table: Data List a,b diff --git a/tests/language/dictionary/missing-values.at b/tests/language/dictionary/missing-values.at index df2aeebe..75254462 100644 --- a/tests/language/dictionary/missing-values.at +++ b/tests/language/dictionary/missing-values.at @@ -63,7 +63,7 @@ missing-values.sps:5: error: MISSING VALUES: Missing values provided are too lon missing-values.sps:8: error: MISSING VALUES: Truncating missing value to maximum acceptable length (8 bytes). -missing-values.sps:11: error: MISSING VALUES: Syntax error at `THRU': expecting string. +missing-values.sps:11.26-11.29: error: MISSING VALUES: Syntax error at `THRU': expecting string. missing-values.sps:11: error: MISSING VALUES: THRU is not a variable name. diff --git a/tests/language/expressions/evaluate.at b/tests/language/expressions/evaluate.at index efa71529..e56a3a45 100644 --- a/tests/language/expressions/evaluate.at +++ b/tests/language/expressions/evaluate.at @@ -10,7 +10,12 @@ DEBUG EVALUATE m4_argn(4, check)/[]m4_car(check). AT_CAPTURE_FILE([evaluate.sps]) m4_pushdef([i], [2]) AT_CHECK([pspp --testing-mode --error-file=- --no-output evaluate.sps], - [m4_if(m4_bregexp([m4_foreach([check], [m4_shift($@)], [m4_argn(3, check)])], [error:]), [-1], [0], [1])], + [m4_if(m4_bregexp([m4_foreach([check], [m4_shift($@)], [m4_argn(3, check)])], [error:]), [-1], [0], [1])], + [stdout]) + # Use sed to transform "file:line.column:" into plain "file:line:", + # because column numbers change between opt and noopt versions. 
+ AT_CHECK([[sed 's/\(evaluate.sps:[0-9]\{1,\}\)\.[0-9]\{1,\}:/\1:/' stdout]], + [0], [m4_foreach([check], [m4_shift($@)], [m4_define([i], m4_incr(i))dnl m4_if(m4_argn(3, check), [], [], [evaluate.sps:[]i[]: m4_argn(3, check) @@ -284,7 +289,7 @@ dnl <> token can't be split: [error: DEBUG EVALUATE: Syntax error at `>'.]], dnl # ~= token can't be split: [[1 ~ = 1], [error], - [error: DEBUG EVALUATE: Syntax error at `NOT': expecting end of command.]]) + [error: DEBUG EVALUATE: Syntax error at `~': expecting end of command.]]) CHECK_EXPR_EVAL([exp lg10 ln sqrt abs mod mod10 rnd trunc], [[exp(10)], [22026.47]], diff --git a/tests/language/expressions/parse.at b/tests/language/expressions/parse.at index 7ebc5dc1..df8192b2 100644 --- a/tests/language/expressions/parse.at +++ b/tests/language/expressions/parse.at @@ -18,6 +18,6 @@ END IF. AT_CHECK([pspp -O format=csv parse.sps], [1], [dnl parse.sps:10: error: IF: Unknown identifier y. -parse.sps:10: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. +parse.sps:11: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) AT_CLEANUP diff --git a/tests/language/lexer/lexer.at b/tests/language/lexer/lexer.at index 2c1dfc93..08c14644 100644 --- a/tests/language/lexer/lexer.at +++ b/tests/language/lexer/lexer.at @@ -18,3 +18,46 @@ a 2.00 ]) AT_CLEANUP + +AT_SETUP([lexer properly reports scan errors]) +AT_DATA([lexer.sps], [dnl +x'123' +x'1x' +u'' +u'012345678' +u'd800' +u'110000' +'foo +'very long unterminated string that be ellipsized in its error message +1e .x +` +� +]) +AT_CHECK([pspp -O format=csv lexer.sps], [1], [dnl +"lexer.sps:1.1-1.6: error: Syntax error at `x'123'': String of hex digits has 3 characters, which is not a multiple of 2." + +lexer.sps:2.1-2.5: error: Syntax error at `x'1x'': `x' is not a valid hex digit. 
+ +"lexer.sps:3.1-3.3: error: Syntax error at `u''': Unicode string contains 0 bytes, which is not in the valid range of 1 to 8 bytes." + +"lexer.sps:4.1-4.12: error: Syntax error at `u'012345678'': Unicode string contains 9 bytes, which is not in the valid range of 1 to 8 bytes." + +lexer.sps:5.1-5.7: error: Syntax error at `u'd800'': U+D800 is not a valid Unicode code point. + +lexer.sps:6.1-6.9: error: Syntax error at `u'110000'': U+110000 is not a valid Unicode code point. + +lexer.sps:7.1-7.4: error: Syntax error at `'foo': Unterminated string constant. + +lexer.sps:8.1-8.70: error: Syntax error at `'very long unterminated string that be ellipsized in its err...': Unterminated string constant. + +lexer.sps:9.1-9.2: error: Syntax error at `1e': Missing exponent following `1e'. + +lexer.sps:9.4: error: Syntax error at `.': Unexpected `.' in middle of command. + +lexer.sps:9: error: Unknown command `x'. + +lexer.sps:10.1: error: Syntax error at ``': Bad character ``' in input. + +lexer.sps:11.1: error: Syntax error at `�': Bad character U+FFFD in input. +]) +AT_CLEANUP diff --git a/tests/language/lexer/q2c.at b/tests/language/lexer/q2c.at index eeeed8d7..6ba3f7ab 100644 --- a/tests/language/lexer/q2c.at +++ b/tests/language/lexer/q2c.at @@ -16,7 +16,7 @@ CROSSTABS. AT_CHECK([pspp -O format=csv q2c.sps], [1], [dnl q2c.sps:8: error: EXAMINE: VARIABLES subcommand must be given. -q2c.sps:9: error: ONEWAY: Syntax error at end of command: expecting variable name. +q2c.sps:9.7: error: ONEWAY: Syntax error at end of command: expecting variable name. q2c.sps:10: error: CROSSTABS: TABLES subcommand must be given. 
]) diff --git a/tests/language/stats/aggregate.at b/tests/language/stats/aggregate.at index da078f84..1c3a7e17 100644 --- a/tests/language/stats/aggregate.at +++ b/tests/language/stats/aggregate.at @@ -283,9 +283,7 @@ AGGREGATE OUTFILE=* MODE=ADDVARIABLES AT_CHECK([pspp -O format=csv dup-variables.sps], [1], ["dup-variables.sps:24: error: AGGREGATE: Variable name N_BREAK is not unique within the aggregate file dictionary, which contains the aggregate variables and the break variables." - -dup-variables.sps:24: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) -AT_CLEANUP \ No newline at end of file +AT_CLEANUP diff --git a/tests/language/stats/rank.at b/tests/language/stats/rank.at index 99a0459f..6cb36684 100644 --- a/tests/language/stats/rank.at +++ b/tests/language/stats/rank.at @@ -539,8 +539,6 @@ Variables Created By RANK x into Rx(RANK of x) rank.sps:14: error: RANK: DEBUG XFORM FAIL transformation executed - -rank.sps:14: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) AT_CLEANUP @@ -578,9 +576,9 @@ RANK x /RANK INTO foo bar wiz. ]) AT_CHECK([pspp -O format=csv rank.sps], [1], [dnl -rank.sps:15: error: RANK: Syntax error at end of command: expecting `@{:@'. +rank.sps:15.1: error: RANK: Syntax error at end of command: expecting `@{:@'. -rank.sps:19: error: RANK: Syntax error at `d': expecting integer. +rank.sps:19.11: error: RANK: Syntax error at `d': expecting integer. rank.sps:25: error: RANK: Variable x already exists. diff --git a/tests/language/utilities/insert.at b/tests/language/utilities/insert.at index e119f103..34376b15 100644 --- a/tests/language/utilities/insert.at +++ b/tests/language/utilities/insert.at @@ -3,13 +3,13 @@ AT_BANNER([INSERT]) dnl Create a file "batch.sps" that is valid syntax only in batch mode. m4_define([CREATE_BATCH_SPS], [AT_DATA([batch.sps], [dnl -input program. -+ loop #i = 1 to 5. -+ compute z = #i -+ end case. 
-+ end loop -end file. -end input program. +input program +loop #i = 1 to 5 ++ compute z = #i ++ end case +end loop +end file +end input program ])]) AT_SETUP([INSERT SYNTAX=INTERACTIVE]) @@ -17,14 +17,13 @@ CREATE_BATCH_SPS AT_DATA([insert.sps], [dnl INSERT FILE='batch.sps' - SYNTAX=INTERACTIVE. + SYNTAX=interactive. LIST. ]) AT_CHECK([pspp -o pspp.csv insert.sps], [1], [dnl -batch.sps:2: error: INPUT PROGRAM: Syntax error at `+': expecting command name. -batch.sps:3: error: INPUT PROGRAM: Syntax error at `+': expecting command name. -batch.sps:5: error: INPUT PROGRAM: Syntax error at `+': expecting command name. -batch.sps:7: error: Input program did not create any variables. +batch.sps:2.1-2.4: error: INPUT PROGRAM: Syntax error at `loop': expecting end of command. +batch.sps:3: error: COMPUTE: COMPUTE is allowed only after the active file has been defined or inside INPUT PROGRAM. +batch.sps:4: error: END CASE: END CASE is allowed only inside INPUT PROGRAM. insert.sps:4: error: LIST: LIST is allowed only after the active file has been defined. ]) AT_CLEANUP @@ -111,24 +110,22 @@ END DATA. * The following line is erroneous DISPLAY AKSDJ. +LIST. ])]) AT_SETUP([INSERT ERROR=STOP]) CREATE_ERROR_SPS AT_DATA([insert.sps], [INSERT FILE='error.sps' ERROR=STOP. -LIST. ]) AT_CHECK([pspp -o pspp.csv insert.sps], [1], [dnl error.sps:10: error: DISPLAY: AKSDJ is not a variable name. warning: Error encountered while ERROR=STOP is effective. -error.sps:10: error: Stopping syntax file processing here to avoid a cascade of dependent command failures. ]) AT_CLEANUP AT_SETUP([INSERT ERROR=CONTINUE]) CREATE_ERROR_SPS AT_DATA([insert.sps], [INSERT FILE='error.sps' ERROR=CONTINUE. -LIST. ]) AT_CHECK([pspp -o pspp.csv insert.sps], [1], [dnl error.sps:10: error: DISPLAY: AKSDJ is not a variable name. @@ -156,7 +153,7 @@ INSERT LIST. ]) AT_CHECK([pspp -O format=csv insert.sps], [1], [dnl -insert.sps:3: error: INSERT: Can't find `nonexistent' in include file search path. 
+insert.sps:2: error: INSERT: Can't find `nonexistent' in include file search path. insert.sps:6: error: LIST: LIST is allowed only after the active file has been defined. ])