This commit reimplements PSPP lexical analysis from the ground up.
From a PSPP user's perspective, this should make PSPP more reliable
and make it easier to work with syntax files in non-ASCII encodings.
See the changes to NEWS for more details.
From a developer's perspective, the most visible change may be that
strings within tokens are now always encoded in UTF-8, regardless of
the syntax file's encoding. Many of the changes in this commit are
due to this, especially those to functions that check for valid
identifiers: an identifier in UTF-8 is not necessarily the same length
when encoded in the dictionary's encoding, but limits on identifier
length must be enforced in the dictionary's encoding (otherwise it
might not be possible to write out a valid system file, since the
identifier might not fit in the fixed length fields in such files).
Another important change is that the lexer is now sophisticated enough
to analyze all PSPP syntax into tokens, whereas before some special
syntax had to be handled by the parser providing feedback to the
lexer. This permitted some other improvements:
- An arbitrary number of tokens of lookahead, up to the end of the
current command, is now supported using lex_next_token() and
related functions.
- Before, some command implementations had a special attribute that
meant that the top-level PSPP command parser would not consume the
final token of the command name (because that token was not
followed by tokenizable syntax). This is no longer necessary and
has been removed.
- Before, each command implementation was responsible for ensuring
that valid command syntax was not followed by trailing garbage,
often by calling lex_end_of_command() as the last step of parsing.
This is no longer necessary; the main command parser will ensure
this for itself.
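The point above about identifier lengths can be illustrated with a
minimal sketch. It assumes, purely for illustration, a dictionary
encoded in ISO-8859-1; latin1_length_of_utf8() is a hypothetical helper
and not part of PSPP, whose real check is id_is_valid(). The sketch
only shows that the byte count that must be compared against the
limit is the one in the dictionary's encoding, not the UTF-8 one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PSPP.  Returns the number of bytes
   that the UTF-8 string S would occupy when encoded in ISO-8859-1, or
   (size_t) -1 if S contains anything that ISO-8859-1 cannot represent
   (a code point above U+00FF or a malformed sequence). */
size_t
latin1_length_of_utf8 (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0;

  assert (s != NULL);
  while (*p != '\0')
    {
      if (*p < 0x80)
        p += 1;                 /* ASCII: one byte in both encodings. */
      else if ((*p == 0xc2 || *p == 0xc3) && (p[1] & 0xc0) == 0x80)
        p += 2;                 /* U+0080...U+00FF: two bytes in UTF-8. */
      else
        return (size_t) -1;     /* Not representable in ISO-8859-1. */
      n++;                      /* One byte per character in ISO-8859-1. */
    }
  return n;
}
```

For example, the UTF-8 string "caf\xc3\xa9" is 5 bytes long but would
occupy only 4 bytes in an ISO-8859-1 dictionary, so a length limit
applied to the UTF-8 form would be too strict here, and for other
encodings it could be too lenient.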
PSPP NEWS -- history of user-visible changes.
-Time-stamp: <2010-11-21 11:58:30 blp>
-Copyright (C) 1996-9, 2000, 2008, 2009, 2010 Free Software Foundation, Inc.
+Time-stamp: <2011-03-19 16:39:28 blp>
+Copyright (C) 1996-9, 2000, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
See the end for copying conditions.
Please send PSPP bug reports to bug-gnu-pspp@gnu.org.
Changes from 0.7.3 to 0.7.6:
+ * The "pspp" program has a new option --batch (or -b) that selects
+ "batch" syntax mode. In previous versions of PSPP this syntax mode
+ was the default. Now a new "auto" syntax mode is the default. In
+ "auto" mode, PSPP interprets most syntax files correctly regardless
+ of their intended syntax mode.
+
+ See the "Syntax Variants" section in the PSPP manual for more
+ information.
+
+ * The "pspp" program has a new option --syntax-encoding that
+ specifies the encoding for syntax files listed on the command line,
+ as well as the default encoding for syntax files included with
+ INCLUDE or INSERT. The default is to accept the system locale
+ encoding, UTF-8, UTF-16, or UTF-32, automatically detecting which
+ one the syntax file uses.
+
+ See the documentation for the INSERT command in the PSPP manual for
+ more information.
+
+ * The INCLUDE and INSERT commands now support the ENCODING subcommand
+ to specify the encoding for the included syntax file.
+
+ * Strings may now include arbitrary Unicode code points specified in
+ hexadecimal, using the syntax U'hhhh'. For example, Unicode code
+ point U+1D11E, the musical G clef character, may be expressed as
+ U'1D11E'.
+
+ See the "Tokens" section in the PSPP manual for more information.
+
+ * In previous versions of PSPP, in a string expressed in hexadecimal
+ with X'hh' syntax, the hexadecimal digits expressed bytes in the
+ locale encoding. In this version of PSPP, X'hh' syntax always
+ expresses bytes in UTF-8 encoding.
+
+ See the "Tokens" section in the PSPP manual for more information.
+
+ * The DO REPEAT command has been reimplemented. The most prominent
+ change is that when a DO REPEAT block contains an INCLUDE or INSERT
+ command, substitutions are not applied to the included file.
+
+ See the "DO REPEAT" section in the PSPP manual for more information.
+
* NPAR TESTS now supports the /KRUSKAL-WALLIS and /RUNS subcommands.
* AUTORECODE now supports the /GROUP subcommand.
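The U'hhhh' string syntax described in the NEWS entries above can be
sketched as follows. This is only an illustration under stated
assumptions: parse_hex_code_point() and encode_utf8() are hypothetical
names, not functions from PSPP or gnulib, and the real lexer may
validate and encode differently. The sketch parses the hexadecimal
digits, rejects code points above U+10FFFF or in the surrogate range
U+D800...U+DFFF, and encodes the result as UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper.  Parses the hexadecimal digits in HEX as a
   Unicode code point.  Stores it in *UC and returns true if it is
   valid, otherwise returns false. */
bool
parse_hex_code_point (const char *hex, uint32_t *uc)
{
  uint32_t value = 0;

  assert (hex != NULL && uc != NULL);
  if (*hex == '\0')
    return false;
  for (; *hex != '\0'; hex++)
    {
      int digit;

      if (*hex >= '0' && *hex <= '9')
        digit = *hex - '0';
      else if (*hex >= 'a' && *hex <= 'f')
        digit = *hex - 'a' + 10;
      else if (*hex >= 'A' && *hex <= 'F')
        digit = *hex - 'A' + 10;
      else
        return false;

      value = value * 16 + digit;
      if (value > 0x10ffff)
        return false;           /* Beyond the Unicode range. */
    }
  if (value >= 0xd800 && value <= 0xdfff)
    return false;               /* Surrogates are not valid code points. */

  *uc = value;
  return true;
}

/* Hypothetical helper.  Encodes the valid code point UC as UTF-8 in
   OUT and returns the number of bytes written (1 to 4). */
size_t
encode_utf8 (uint32_t uc, unsigned char out[4])
{
  if (uc < 0x80)
    {
      out[0] = uc;
      return 1;
    }
  else if (uc < 0x800)
    {
      out[0] = 0xc0 | (uc >> 6);
      out[1] = 0x80 | (uc & 0x3f);
      return 2;
    }
  else if (uc < 0x10000)
    {
      out[0] = 0xe0 | (uc >> 12);
      out[1] = 0x80 | ((uc >> 6) & 0x3f);
      out[2] = 0x80 | (uc & 0x3f);
      return 3;
    }
  else
    {
      out[0] = 0xf0 | (uc >> 18);
      out[1] = 0x80 | ((uc >> 12) & 0x3f);
      out[2] = 0x80 | ((uc >> 6) & 0x3f);
      out[3] = 0x80 | (uc & 0x3f);
      return 4;
    }
}
```

With these helpers, the digits in U'1D11E' parse to code point U+1D11E,
which encodes as the four UTF-8 bytes F0 9D 84 9E.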
printf-posix \
printf-safe \
progname \
+ rawmemchr \
read-file \
regex \
relocatable-prog \
unistr/u8-cpy \
unistr/u8-mbtouc \
unistr/u8-strlen \
+ unistr/u8-strmbtouc \
unistr/u8-strncat \
uniwidth/u8-strwidth \
unitypes \
@node Variable Name
@subsection Variable Name
-A variable name is a string between 1 and @code{VAR_NAME_LEN} bytes
+A variable name is a string between 1 and @code{ID_MAX_LEN} bytes
long that satisfies the rules for PSPP identifiers
(@pxref{Tokens,,,pspp, PSPP Users Guide}). Variable names are
mixed-case and treated case-insensitively.
-@deftypefn Macro int VAR_NAME_LEN
+@deftypefn Macro int ID_MAX_LEN
Maximum length of a variable name, in bytes, currently 64.
@end deftypefn
Renaming Variables}).
@end deftypefun
-@anchor{var_is_plausible_name}
-@deftypefun {bool} var_is_valid_name (const char *@var{name}, bool @var{issue_error})
-@deftypefunx {bool} var_is_plausible_name (const char *@var{name}, bool @var{issue_error})
-Tests @var{name} for validity or ``plausibility.'' Returns true if
-the name is acceptable, false otherwise. If the name is not
-acceptable and @var{issue_error} is true, also issues an error message
-explaining the violation.
-
-A valid name is one that fully satisfies all of the requirements for
-variable names (@pxref{Tokens,,,pspp, PSPP Users Guide}). A
-``plausible'' name is simply a string whose length is in the valid
-range and that is not a reserved word. PSPP accepts plausible but
-invalid names as variable names in some contexts where the character
-encoding scheme is ambiguous, as when reading variable names from
-system files.
-@end deftypefun
-
@deftypefun {enum dict_class} var_get_dict_class (const struct variable *@var{var})
Returns the dictionary class of @var{var}'s name (@pxref{Dictionary
Class}).
@node Variable Short Names
@subsection Variable Short Names
-PSPP variable names may be up to 64 (@code{VAR_NAME_LEN}) bytes long.
+PSPP variable names may be up to 64 (@code{ID_MAX_LEN}) bytes long.
The system and portable file formats, however, were designed when
variable names were limited to 8 bytes in length. Since then, the
system file format has been augmented with an extension record that
Sets @var{var}'s short name to @var{short_name}, or removes
@var{var}'s short name if @var{short_name} is a null pointer. If it
is non-null, then @var{short_name} must be a plausible name for a
-variable (@pxref{var_is_plausible_name}). The name will be truncated
+variable. The name will be truncated
to 8 bytes in length and converted to all-uppercase.
@end deftypefun
var_list
num_or_range@dots{}
'string'@dots{}
+ ALL
num_or_range takes one of the following forms:
number
different variables, numbers, or strings into the block with each
repetition.
-Specify a dummy variable name followed by an equals sign (@samp{=}) and
-the list of replacements. Replacements can be a list of variables
-(which may be existing variables or new variables or some combination),
-numbers, or strings. When new variable names are
-specified, @cmd{DO REPEAT} creates them as numeric variables. When numbers
-are specified, runs of increasing integers may be indicated as
-@code{@var{num1} TO @var{num2}}, so that
+Specify a dummy variable name followed by an equals sign (@samp{=})
+and the list of replacements. Replacements can be a list of existing
+or new variables, numbers, strings, or @code{ALL} to specify all
+existing variables. When numbers are specified, runs of increasing
+integers may be indicated as @code{@var{num1} TO @var{num2}}, so that
@samp{1 TO 5} is short for @samp{1 2 3 4 5}.
Multiple dummy variables can be specified. Each
for each dummy variable is substituted; and so on.
Dummy variable substitutions work like macros. They take place
-anywhere in a line that the dummy variable name occurs as a token,
-including command and subcommand names. For this reason,
-words commonly used in command and subcommand names should not be used
-as dummy variable identifiers.
+anywhere in a line that the dummy variable name occurs. This includes
+command and subcommand names, so command and subcommand names that
+appear in the code block should not be used as dummy variable
+identifiers. Dummy variable substitutions do not occur inside quoted
+strings, comments, unquoted strings (such as the text on the
+@cmd{TITLE} or @cmd{DOCUMENT} command), or inside @cmd{BEGIN
+DATA}@dots{}@cmd{END DATA}.
+
+New variable names used as replacements are not automatically created
+as variables. They are created only if they are used in the code
+block in a context that would create them, e.g.@: on a @cmd{NUMERIC}
+or @cmd{STRING} command or on the left side of a @cmd{COMPUTE}
+assignment.
+
+Any command may appear within DO REPEAT, including nested DO REPEAT
+commands. If @cmd{INCLUDE} or @cmd{INSERT} appears within DO REPEAT,
+the substitutions do not apply to the included file.
If PRINT is specified on @cmd{END REPEAT}, the commands after substitutions
are made are printed to the listing file, prefixed by a plus sign
@example
-I, --include=@var{dir}
-I-, --no-include
+-b, --batch
-i, --interactive
-r, --no-statrc
-a, --algorithm=@{compatible|enhanced@}
-x, --syntax=@{compatible|enhanced@}
+--syntax-encoding=@var{encoding}
@end example
@item Informational options
user's home directory, followed by PSPP's system configuration
directory (usually @file{/etc/pspp} or @file{/usr/local/etc/pspp}).
+@item -b
+@item --batch
@item -i
@itemx --interactive
-This option forces syntax files to be interpreted in interactive
-mode, rather than the default batch mode. @xref{Syntax Variants}, for
-a description of the differences.
+These options force syntax files to be interpreted in batch mode or
+interactive mode, respectively, rather than the default ``auto'' mode.
+@xref{Syntax Variants}, for a description of the differences.
@item -r
@itemx --no-statrc
beyond those compatible with the proprietary program SPSS. With
@code{compatible}, PSPP rejects syntax that uses these extensions.
-@item -?
-@itemx --help
+@item --syntax-encoding=@var{encoding}
+Specifies @var{encoding} as the encoding for syntax files named on the
+command line. The @var{encoding} also becomes the default encoding
+for other syntax files read during the PSPP session by the
+@cmd{INCLUDE} and @cmd{INSERT} commands. @xref{INSERT}, for the
+accepted forms of @var{encoding}.
+
+@item --help
Prints a message describing PSPP command-line syntax and the available
device formats, then exits.
significant inside strings.
Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
-'c'} is equivalent to @samp{'abc'}. Concatenation is useful for
-splitting a single string across multiple source lines.
-
-Strings may also be expressed as hexadecimal, octal, or binary
-character values by prefixing the initial quote character by @samp{X},
-@samp{O}, or @samp{B} or their lowercase equivalents. Each pair,
-triplet, or octet of characters, according to the radix, is
-transformed into a single character with the given value. If there is
-an incomplete group of characters, the missing final digits are
-assumed to be @samp{0}. These forms of strings are nonportable
-because numeric values are associated with different characters by
-different operating systems. Therefore, their use should be confined
-to syntax files that will not be widely distributed.
-
-@cindex characters, reserved
-@cindex 0
-@cindex white space
-The character with value 00 is reserved for
-internal use by PSPP. Its use in strings causes an error and
-replacement by a space character.
+'c'} is equivalent to @samp{'abc'}. So that a long string may be
+broken across lines, a line break may precede or follow, or both
+precede and follow, the @samp{+}. (However, an entirely blank line
+preceding or following the @samp{+} is interpreted as ending the
+current command.)
+
+Strings may also be expressed as hexadecimal character values by
+prefixing the initial quote character by @samp{x} or @samp{X}.
+Regardless of the syntax file or active dataset's encoding, the
+hexadecimal digits in the string are interpreted as Unicode characters
+in UTF-8 encoding.
+
+Individual Unicode code points may also be expressed by specifying the
+hexadecimal code point number in single or double quotes preceded by
+@samp{u} or @samp{U}. For example, Unicode code point U+1D11E, the
+musical G clef character, could be expressed as @code{U'1D11E'}.
+Invalid Unicode code points (above U+10FFFF or in between U+D800 and
+U+DFFF) are not allowed.
+
+When strings are concatenated with @samp{+}, each segment's prefix is
+considered individually. For example, @code{'The G clef symbol is:' +
+u"1d11e" + "."} inserts a G clef symbol in the middle of an otherwise
+plain text string.
@item Punctuators and Operators
@cindex punctuators
one that consists only of white space or comments, also ends a command.
@node Syntax Variants
-@section Variants of syntax.
+@section Syntax Variants
@cindex Batch syntax
@cindex Interactive syntax
-There are two variants of command syntax, @i{viz}: @dfn{batch} mode and
-@dfn{interactive} mode.
-Batch mode is the default when reading commands from a file.
-Interactive mode is the default when commands are typed at a prompt
-by a user.
-Certain commands, such as @cmd{INSERT} (@pxref{INSERT}), may explicitly
-change the syntax mode.
-
-In batch mode, any line that contains a non-space character
-in the leftmost column begins a new command.
-Thus, each command consists of a flush-left line followed by any
-number of lines indented from the left margin.
-In this mode, a plus or minus sign (@samp{+}, @samp{@minus{}}) as the
-first character in a line is ignored and causes that line to begin a
-new command, which allows for visual indentation of a command without
-that command being considered part of the previous command.
-The period terminating the end of a command is optional but recommended.
-
-In interactive mode, each command must be terminated with a period
-or by a blank line.
-The use of @samp{+} and @samp{@minus{}} as continuation characters is not
-permitted.
+There are three variants of command syntax, which vary only in how
+they detect the end of one command and the start of the next.
+
+In @dfn{interactive mode}, which is the default for syntax typed at a
+command prompt, a period as the last non-blank character on a line
+ends a command. A blank line also ends a command.
+
+In @dfn{batch mode}, an end-of-line period or a blank line also ends a
+command. Additionally, batch mode treats any line that has a non-blank
+character in the leftmost column as beginning a new command. Thus, in
+batch mode the second and subsequent lines in a command must be
+indented.
+
+Regardless of the syntax mode, a plus sign, minus sign, or period in
+the leftmost column of a line is ignored and causes that line to begin
+a new command. This is most useful in batch mode, in which the first
+line of a new command could not otherwise be indented, but it is
+accepted regardless of syntax mode.
+
+The default mode for reading commands from a file is @dfn{auto mode}.
+It is the same as batch mode, except that a line with a non-blank
+character in the leftmost column only starts a new command if that
+line begins with the name of a PSPP command. This correctly
+interprets most valid PSPP syntax files regardless of the syntax mode
+for which they are intended.
+
+The @option{--interactive} (or @option{-i}) or @option{--batch} (or
+@option{-b}) options set the syntax mode for files listed on the PSPP
+command line. @xref{Main Options}, for more details.
@node Types of Commands
@section Types of Commands
@vindex INCLUDE
@display
- INCLUDE [FILE=]'file-name'.
+ INCLUDE [FILE=]'file-name' [ENCODING='encoding'].
@end display
@cmd{INCLUDE} causes the PSPP command processor to read an
Include files may be nested to any depth, up to the limit of available
memory.
+The @cmd{INSERT} command (@pxref{INSERT}) is a more flexible
+alternative to @cmd{INCLUDE}. An INCLUDE command acts the same as
+INSERT with ERROR=STOP CD=NO SYNTAX=BATCH specified.
-The @cmd{INSERT} command (@pxref{INSERT}) may be used instead of
-@cmd{INCLUDE} if you require more flexible options.
-The syntax
-@example
-INCLUDE FILE=@var{file-name}.
-@end example
-@noindent
-functions identically to
-@example
-INSERT FILE=@var{file-name} ERROR=STOP CD=NO SYNTAX=BATCH.
-@end example
-
+The optional ENCODING subcommand has the same meaning as on INSERT.
@node INSERT
@section INSERT
INSERT [FILE=]'file-name'
[CD=@{NO,YES@}]
[ERROR=@{CONTINUE,STOP@}]
- [SYNTAX=@{BATCH,INTERACTIVE@}].
+ [SYNTAX=@{BATCH,INTERACTIVE@}]
+ [ENCODING='encoding'].
@end display
@cmd{INSERT} is similar to @cmd{INCLUDE} (@pxref{INCLUDE})
conventions. @xref{Syntax Variants}.
The default setting is @samp{SYNTAX=BATCH}.
+ENCODING optionally specifies the character set used by the included
+file. Its argument, which is not case-sensitive, must be in one of
+the following forms:
+
+@table @asis
+@item @code{Locale}
+The encoding used by the system locale, or as overridden by the SET
+LOCALE command (@pxref{SET}). On Unix systems, environment variables,
+e.g.@: @env{LANG} or @env{LC_ALL}, determine the system locale.
+
+@item IANA character set name
+One of the character set names listed by IANA at
+@uref{http://www.iana.org/assignments/character-sets}. Some examples
+are @code{ASCII} (United States), @code{ISO-8859-1} (western Europe),
+@code{EUC-JP} (Japan), and @code{windows-1252} (Windows). Not all
+systems support all character sets.
+
+@item @code{Auto}
+@item @code{Auto,@var{encoding}}
+Automatically detects whether a syntax file is encoded in
+@var{encoding} or in a Unicode encoding such as UTF-8, UTF-16, or
+UTF-32. The @var{encoding} may be an IANA character set name or
+@code{Locale} (the default). Only ASCII-compatible encodings can
+automatically be distinguished from UTF-8 (the most common locale
+encodings are all ASCII-compatible).
+@end table
+
+When ENCODING is not specified, the default is taken from the
+@option{--syntax-encoding} command option, if it was specified, and
+otherwise it is @code{Auto}.
+
@node PERMISSIONS
@section PERMISSIONS
@vindex PERMISSIONS
/MXWARNS=max_warnings
/WORKSPACE=workspace_size
-(program execution)
+(syntax execution)
+ /LOCALE='locale'
/MEXPAND=@{ON,OFF@}
/MITERATE=max_iterations
/MNEST=max_nest
The default value is 100.
@end table
-Program execution subcommands control the way that PSPP commands
-execute. The program execution subcommands are
+Syntax execution subcommands control the way that PSPP commands
+execute. The syntax execution subcommands are
@table @asis
+@item LOCALE
+Overrides the system locale for the purpose of reading and writing
+syntax and data files. The argument should be a locale name in the
+general form @code{language_country.encoding}, where @code{language}
+and @code{country} are 2-character language and country abbreviations,
+respectively, and @code{encoding} is an IANA character set name.
+Example locales are @code{en_US.UTF-8} (UTF-8 encoded English as
+spoken in the United States) and @code{ja_JP.EUC-JP} (EUC-JP encoded
+Japanese as spoken in Japan).
+
@item MEXPAND
@itemx MITERATE
@itemx MNEST
/* PSPP - computes sample statistics.
- Copyright (C) 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+ Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
/* A message handler which writes messages to PSPP::errstr */
static void
-message_handler (const struct msg *m)
+message_handler (const struct msg *m, void *aux)
{
SV *errstr = get_sv("PSPP::errstr", TRUE);
sv_setpv (errstr, m->text);
assert (0 == strncmp (ver, bare_version, strlen (ver)));
i18n_init ();
- msg_init (NULL, message_handler);
+ msg_set_handler (message_handler, NULL);
settings_init (0, 0);
fh_init ();
struct dictionary *dict
char *docs
CODE:
- dict_set_documents (dict, docs);
+ dict_set_documents_string (dict, docs);
void
struct dictionary *dict
char *doc
CODE:
- dict_add_document_line (dict, doc);
+ dict_add_document_line (dict, doc, false);
void
INIT:
SV *errstr = get_sv("PSPP::errstr", TRUE);
sv_setpv (errstr, "");
- if ( ! var_is_plausible_name (name, false))
+ if ( ! id_is_plausible (name, false))
{
sv_setpv (errstr, "The variable name is not valid.");
XSRETURN_UNDEF;
struct variable *var;
char *label
CODE:
- var_set_label (var, label);
+ var_set_label (var, label, NULL, false);
void
ok ($d->get_var_cnt () == 0);
$d->set_label ("My Dictionary");
- $d->set_documents ("These Documents");
+ $d->add_document ("These Documents");
# Tests for variable creation
)
);
- $d->set_documents ("This should not appear");
+ $d->add_document ("This should not appear");
$d->clear_documents ();
$d->add_document ("This is a document line");
src/data/gnumeric-reader.c \
src/data/gnumeric-reader.h \
src/data/identifier.c \
+ src/data/identifier2.c \
src/data/identifier.h \
src/data/lazy-casereader.c \
src/data/lazy-casereader.h \
#include <stdint.h>
#include <stdlib.h>
#include <ctype.h>
+#include <unistr.h>
#include "data/attributes.h"
#include "data/case.h"
#include "libpspp/compiler.h"
#include "libpspp/hash-functions.h"
#include "libpspp/hmap.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/misc.h"
#include "libpspp/pool.h"
#include "libpspp/str.h"
+#include "libpspp/string-array.h"
#include "gl/intprops.h"
#include "gl/minmax.h"
#include "gl/xalloc.h"
+#include "gl/xmemdup0.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
struct variable *filter; /* FILTER variable. */
casenumber case_limit; /* Current case limit (N command). */
char *label; /* File label. */
- struct string documents; /* Documents, as a string. */
+ struct string_array documents; /* Documents. */
struct vector **vector; /* Vectors of variables. */
size_t vector_cnt; /* Number of vectors. */
struct attrset attributes; /* Custom attributes. */
return d->encoding ;
}
+/* Returns true if UTF-8 string ID is an acceptable identifier in DICT's
+ encoding, false otherwise. If ISSUE_ERROR is true, issues an explanatory
+ error message on failure. */
+bool
+dict_id_is_valid (const struct dictionary *dict, const char *id,
+ bool issue_error)
+{
+ return id_is_valid (id, dict->encoding, issue_error);
+}
void
dict_set_change_callback (struct dictionary *d,
d->case_limit = 0;
free (d->label);
d->label = NULL;
- ds_destroy (&d->documents);
+ string_array_clear (&d->documents);
dict_clear_vectors (d);
attrset_clear (&d->attributes);
}
static char *
make_hinted_name (const struct dictionary *dict, const char *hint)
{
- char name[VAR_NAME_LEN + 1];
+ size_t hint_len = strlen (hint);
bool dropped = false;
- char *cp;
-
- for (cp = name; *hint && cp < name + VAR_NAME_LEN; hint++)
+ char *root, *rp;
+ size_t ofs;
+ int mblen;
+
+ /* The allocation size here is OK: characters that are copied directly fit
+ OK, and characters that are not copied directly are replaced by a single
+ '_' byte. If u8_mbtouc() replaces bad input by 0xfffd, then that will get
+ replaced by '_' too. */
+ root = rp = xmalloc (hint_len + 1);
+ for (ofs = 0; ofs < hint_len; ofs += mblen)
{
- if (cp == name
- ? lex_is_id1 (*hint) && *hint != '$'
- : lex_is_idn (*hint))
+ ucs4_t uc;
+
+ mblen = u8_mbtouc (&uc, CHAR_CAST (const uint8_t *, hint + ofs),
+ hint_len - ofs);
+ if (rp == root
+ ? lex_uc_is_id1 (uc) && uc != '$'
+ : lex_uc_is_idn (uc))
{
if (dropped)
{
- *cp++ = '_';
+ *rp++ = '_';
dropped = false;
}
- if (cp < name + VAR_NAME_LEN)
- *cp++ = *hint;
+ rp += u8_uctomb (CHAR_CAST (uint8_t *, rp), uc, 6);
}
- else if (cp > name)
+ else if (rp != root)
dropped = true;
}
- *cp = '\0';
+ *rp = '\0';
- if (name[0] != '\0')
+ if (root[0] != '\0')
{
- size_t len = strlen (name);
unsigned long int i;
- if (var_name_is_insertable (dict, name))
- return xstrdup (name);
+ if (var_name_is_insertable (dict, root))
+ return root;
for (i = 0; i < ULONG_MAX; i++)
{
char suffix[INT_BUFSIZE_BOUND (i) + 1];
- int ofs;
+ char *name;
suffix[0] = '_';
if (!str_format_26adic (i + 1, &suffix[1], sizeof suffix - 1))
NOT_REACHED ();
- ofs = MIN (VAR_NAME_LEN - strlen (suffix), len);
- strcpy (&name[ofs], suffix);
-
+ name = utf8_encoding_concat (root, suffix, dict->encoding, 64);
if (var_name_is_insertable (dict, name))
- return xstrdup (name);
+ {
+ free (root);
+ return name;
+ }
+ free (name);
}
}
+ free (root);
+
return NULL;
}
d->label = label != NULL && label[0] != '\0' ? xstrndup (label, 60) : NULL;
}
-/* Returns the documents for D, or a null pointer if D has no
- documents. If the return value is nonnull, then the string
- will be an exact multiple of DOC_LINE_LENGTH bytes in length,
- with each segment corresponding to one line. */
-const char *
+/* Returns the documents for D, as a UTF-8 encoded string_array. The
+ return value is always nonnull; if there are no documents then the
+ string_array is empty. */
+const struct string_array *
dict_get_documents (const struct dictionary *d)
{
- return ds_is_empty (&d->documents) ? NULL : ds_cstr (&d->documents);
+ return &d->documents;
}
-/* Sets the documents for D to DOCUMENTS, or removes D's
- documents if DOCUMENT is a null pointer. If DOCUMENTS is
- nonnull, then it should be an exact multiple of
- DOC_LINE_LENGTH bytes in length, with each segment
- corresponding to one line. */
+/* Replaces the documents for D by NEW_DOCS, a UTF-8 encoded string_array. */
void
-dict_set_documents (struct dictionary *d, const char *documents)
+dict_set_documents (struct dictionary *d, const struct string_array *new_docs)
{
- size_t remainder;
+ size_t i;
- ds_assign_cstr (&d->documents, documents != NULL ? documents : "");
+ dict_clear_documents (d);
- /* In case the caller didn't get it quite right, pad out the
- final line with spaces. */
- remainder = ds_length (&d->documents) % DOC_LINE_LENGTH;
- if (remainder != 0)
- ds_put_byte_multiple (&d->documents, ' ', DOC_LINE_LENGTH - remainder);
+ for (i = 0; i < new_docs->n; i++)
+ dict_add_document_line (d, new_docs->strings[i], false);
+}
+
+/* Replaces the documents for D by UTF-8 encoded string NEW_DOCS, dividing it
+ into individual lines at new-line characters. Each line is truncated to at
+ most DOC_LINE_LENGTH bytes in D's encoding. */
+void
+dict_set_documents_string (struct dictionary *d, const char *new_docs)
+{
+ const char *s;
+
+ dict_clear_documents (d);
+ for (s = new_docs; *s != '\0'; )
+ {
+ size_t len = strcspn (s, "\n");
+ char *line = xmemdup0 (s, len);
+ dict_add_document_line (d, line, false);
+ free (line);
+
+ s += len;
+ if (*s == '\n')
+ s++;
+ }
}
/* Drops the documents from dictionary D. */
void
dict_clear_documents (struct dictionary *d)
{
- ds_clear (&d->documents);
+ string_array_clear (&d->documents);
}
-/* Appends LINE to the documents in D. LINE will be truncated or
- padded on the right with spaces to make it exactly
- DOC_LINE_LENGTH bytes long. */
-void
-dict_add_document_line (struct dictionary *d, const char *line)
+/* Appends the UTF-8 encoded LINE to the documents in D. LINE will be
+ truncated so that it is no more than 80 bytes in the dictionary's
+ encoding. If this causes some text to be lost, and ISSUE_WARNING is true,
+ then a warning will be issued. */
+bool
+dict_add_document_line (struct dictionary *d, const char *line,
+ bool issue_warning)
{
- if (strlen (line) > DOC_LINE_LENGTH)
+ size_t trunc_len;
+ bool truncated;
+
+ trunc_len = utf8_encoding_trunc_len (line, d->encoding, DOC_LINE_LENGTH);
+ truncated = line[trunc_len] != '\0';
+ if (truncated && issue_warning)
{
/* Note to translators: "bytes" is correct, not characters */
msg (SW, _("Truncating document line to %d bytes."), DOC_LINE_LENGTH);
}
- buf_copy_str_rpad (ds_put_uninit (&d->documents, DOC_LINE_LENGTH),
- DOC_LINE_LENGTH, line, ' ');
+
+ string_array_append_nocopy (&d->documents, xmemdup0 (line, trunc_len));
+
+ return !truncated;
}
/* Returns the number of document lines in dictionary D. */
size_t
dict_get_document_line_cnt (const struct dictionary *d)
{
- return ds_length (&d->documents) / DOC_LINE_LENGTH;
+ return d->documents.n;
}
-/* Copies document line number IDX from dictionary D into
- LINE, trimming off any trailing white space. */
-void
-dict_get_document_line (const struct dictionary *d,
- size_t idx, struct string *line)
+/* Returns document line number IDX in dictionary D. The caller must not
+ modify or free the returned string. */
+const char *
+dict_get_document_line (const struct dictionary *d, size_t idx)
{
- assert (idx < dict_get_document_line_cnt (d));
- ds_assign_substring (line, ds_substr (&d->documents, idx * DOC_LINE_LENGTH,
- DOC_LINE_LENGTH));
- ds_rtrim (line, ss_cstr (CC_SPACES));
+ assert (idx < d->documents.n);
+ return d->documents.strings[idx];
}
/* Creates in D a vector named NAME that contains the CNT
/* Documents. */
#define DOC_LINE_LENGTH 80 /* Fixed length of document lines. */
-const char *dict_get_documents (const struct dictionary *);
-void dict_set_documents (struct dictionary *, const char *);
+const struct string_array *dict_get_documents (const struct dictionary *);
+void dict_set_documents (struct dictionary *, const struct string_array *);
+void dict_set_documents_string (struct dictionary *, const char *);
void dict_clear_documents (struct dictionary *);
-void dict_add_document_line (struct dictionary *, const char *);
+bool dict_add_document_line (struct dictionary *, const char *,
+ bool issue_warning);
size_t dict_get_document_line_cnt (const struct dictionary *);
-void dict_get_document_line (const struct dictionary *,
- size_t, struct string *);
+const char *dict_get_document_line (const struct dictionary *, size_t);
/* Vectors. */
bool dict_create_vector (struct dictionary *, const char *name,
void dict_set_encoding (struct dictionary *d, const char *enc);
const char *dict_get_encoding (const struct dictionary *d);
+bool dict_id_is_valid (const struct dictionary *, const char *id,
+ bool issue_error);
+
/* Internal variables. */
struct variable *dict_create_internal_var (int case_idx, int width);
void dict_destroy_internal_var (struct variable *);
return inline_file;
}
-/* Creates and returns a new file handle with the given ID, which
- may be null. If it is non-null, it must be unique among
- existing file identifiers. The new handle is associated with
- file FILE_NAME and the given PROPERTIES. */
+/* Creates and returns a new file handle with the given ID, which may be null.
+ If it is non-null, it must be a UTF-8 encoded string that is unique among
+ existing file identifiers. The new handle is associated with file FILE_NAME
+ and the given PROPERTIES. */
struct file_handle *
fh_create_file (const char *id, const char *file_name,
const struct fh_properties *properties)
/* PSPP - a program for statistical analysis.
- Copyright (C) 2007 Free Software Foundation, Inc.
+ Copyright (C) 2007, 2010 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
struct gnumeric_read_info
{
- char *sheet_name ;
- char *file_name ;
- char *cell_range ;
+ char *sheet_name ; /* In UTF-8. */
+ char *file_name ; /* In filename encoding. */
+ char *cell_range ; /* In UTF-8. */
int sheet_index ;
bool read_names ;
int asw ;
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2005, 2009, 2010 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2005, 2009, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include "data/identifier.h"
-#include <assert.h>
#include <string.h>
#include <unictype.h>
-#include <unistr.h>
#include "libpspp/assertion.h"
-#include "libpspp/cast.h"
-#include "libpspp/i18n.h"
-#include "libpspp/message.h"
#include "gl/c-ctype.h"
return T_ID;
}
-
-/* Returns the name for the given keyword token type. */
-const char *
-lex_id_name (enum token_type token)
-{
- const struct keyword *kw;
-
- for (kw = keywords; kw < &keywords[keyword_cnt]; kw++)
- if (kw->token == token)
- {
- /* A "struct substring" is not guaranteed to be
- null-terminated, as our caller expects, but in this
- case it always will be. */
- return ss_data (kw->identifier);
- }
- NOT_REACHED ();
-}
/* Tokens. */
bool lex_is_keyword (enum token_type);
+/* Validating identifiers. */
+#define ID_MAX_LEN 64 /* Maximum length of identifier, in bytes. */
+
+bool id_is_valid (const char *id, const char *dict_encoding, bool issue_error);
+bool id_is_plausible (const char *id, bool issue_error);
+
/* Recognizing identifiers. */
bool lex_is_id1 (char);
bool lex_is_idn (char);
size_t n);
int lex_id_to_token (struct substring);
-/* Identifier names. */
-const char *lex_id_name (enum token_type);
-
#endif /* !data/identifier.h */
--- /dev/null
+/* PSPP - a program for statistical analysis.
+ Copyright (C) 1997-9, 2000, 2005, 2009, 2010, 2011 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+/* This file implements parts of identifier.h that call the msg() function.
+ This allows test programs that do not use those functions to avoid linking
+ additional object files. */
+
+#include <config.h>
+
+#include "data/identifier.h"
+
+#include <string.h>
+#include <unistr.h>
+
+#include "libpspp/cast.h"
+#include "libpspp/i18n.h"
+#include "libpspp/message.h"
+
+#include "gl/c-ctype.h"
+
+#include "gettext.h"
+#define _(msgid) gettext (msgid)
+
+/* Returns true if UTF-8 string ID is an acceptable identifier in encoding
+ DICT_ENCODING (UTF-8 if null), false otherwise. If ISSUE_ERROR is true,
+ issues an explanatory error message on failure. */
+bool
+id_is_valid (const char *id, const char *dict_encoding, bool issue_error)
+{
+ size_t dict_len;
+
+ if (!id_is_plausible (id, issue_error))
+ return false;
+
+ if (dict_encoding != NULL)
+ {
+ /* XXX need to reject recoded strings that contain the fallback
+ character. */
+ dict_len = recode_string_len (dict_encoding, "UTF-8", id, -1);
+ }
+ else
+ dict_len = strlen (id);
+
+ if (dict_len > ID_MAX_LEN)
+ {
+ if (issue_error)
+ msg (SE, _("Identifier `%s' exceeds %d-byte limit."),
+ id, ID_MAX_LEN);
+ return false;
+ }
+
+ return true;
+}
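
Reviewer note: the length check above counts bytes in the dictionary's encoding, not in UTF-8, so the same identifier can pass for one dictionary and fail for another. The following is a minimal sketch of that effect, not PSPP code. It assumes the only non-UTF-8 dictionary encoding is ISO-8859-1 (one byte per character), and `latin1_len`, `fits_in_dict`, and `SKETCH_ID_MAX_LEN` are hypothetical names that stand in for `recode_string_len`, `id_is_valid`, and `ID_MAX_LEN`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_ID_MAX_LEN = 64 };   /* Mirrors ID_MAX_LEN in the diff. */

/* Returns the number of bytes that valid UTF-8 string S would occupy in
   ISO-8859-1, where each character takes one byte.  It counts code points
   by skipping UTF-8 continuation bytes (10xxxxxx).  It does not check that
   every character is representable in ISO-8859-1. */
static size_t
latin1_len (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xc0) != 0x80)
      n++;
  return n;
}

/* Returns true if ID fits within the identifier limit when measured in the
   dictionary's encoding: ISO-8859-1 if DICT_IS_LATIN1, otherwise UTF-8. */
static bool
fits_in_dict (const char *id, bool dict_is_latin1)
{
  size_t dict_len;

  assert (id != NULL);
  dict_len = dict_is_latin1 ? latin1_len (id) : strlen (id);
  return dict_len <= SKETCH_ID_MAX_LEN;
}
```

An identifier made of 40 copies of U+00E9 occupies 80 bytes in UTF-8 but 40 bytes in ISO-8859-1, so it fits only in the second case.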
+
+/* Returns true if UTF-8 string ID is a plausible identifier, false
+ otherwise. If ISSUE_ERROR is true, issues an explanatory error message on
+ failure. */
+bool
+id_is_plausible (const char *id, bool issue_error)
+{
+ const uint8_t *bad_unit;
+ const uint8_t *s;
+ char ucname[16];
+ int mblen;
+ ucs4_t uc;
+
+ /* ID cannot be the empty string. */
+ if (id[0] == '\0')
+ {
+ if (issue_error)
+ msg (SE, _("Identifier cannot be empty string."));
+ return false;
+ }
+
+ /* ID cannot be a reserved word. */
+ if (lex_id_to_token (ss_cstr (id)) != T_ID)
+ {
+ if (issue_error)
+ msg (SE, _("`%s' may not be used as an identifier because it "
+ "is a reserved word."), id);
+ return false;
+ }
+
+ bad_unit = u8_check (CHAR_CAST (const uint8_t *, id), strlen (id));
+ if (bad_unit != NULL)
+ {
+ /* If this message ever appears, it probably indicates a PSPP bug since
+ it shouldn't be possible to get invalid UTF-8 this far. */
+ if (issue_error)
+ msg (SE, _("`%s' may not be used as an identifier because it "
+ "contains ill-formed UTF-8 at byte offset %tu."),
+ id, CHAR_CAST (const char *, bad_unit) - id);
+ return false;
+ }
+
+ /* Check that it is a valid identifier. */
+ mblen = u8_strmbtouc (&uc, CHAR_CAST (uint8_t *, id));
+ if (!lex_uc_is_id1 (uc))
+ {
+ if (issue_error)
+ msg (SE, _("Character %s (in `%s') may not appear "
+ "as the first character in an identifier."),
+ uc_name (uc, ucname), id);
+ return false;
+ }
+
+ for (s = CHAR_CAST (uint8_t *, id + mblen);
+ (mblen = u8_strmbtouc (&uc, s)) != 0;
+ s += mblen)
+ if (!lex_uc_is_idn (uc))
+ {
+ if (issue_error)
+ msg (SE, _("Character %s (in `%s') may not appear in an "
+ "identifier."),
+ uc_name (uc, ucname), id);
+ return false;
+ }
+
+ return true;
+}
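
Reviewer note: the function above walks the identifier one code point at a time, applying one rule to the first character and another to the rest. The sketch below shows the same loop shape without libunistring. Everything in it is an assumption for illustration: `decode_utf8` is a hypothetical stand-in for `u8_strmbtouc` that expects already-validated UTF-8, and `sketch_is_id1`/`sketch_is_idn` are simplified stand-ins for `lex_uc_is_id1`/`lex_uc_is_idn` that accept every non-ASCII code point, which the real classifiers do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point from valid, null-terminated UTF-8 at S into *UC.
   Returns the number of bytes consumed, or 0 at the terminator. */
static int
decode_utf8 (const unsigned char *s, uint32_t *uc)
{
  if (s[0] == 0)
    {
      *uc = 0;
      return 0;
    }
  if (s[0] < 0x80)
    {
      *uc = s[0];
      return 1;
    }
  if ((s[0] & 0xe0) == 0xc0)
    {
      *uc = ((uint32_t) (s[0] & 0x1f) << 6) | (s[1] & 0x3f);
      return 2;
    }
  if ((s[0] & 0xf0) == 0xe0)
    {
      *uc = ((uint32_t) (s[0] & 0x0f) << 12)
            | ((uint32_t) (s[1] & 0x3f) << 6) | (s[2] & 0x3f);
      return 3;
    }
  *uc = ((uint32_t) (s[0] & 0x07) << 18) | ((uint32_t) (s[1] & 0x3f) << 12)
        | ((uint32_t) (s[2] & 0x3f) << 6) | (s[3] & 0x3f);
  return 4;
}

/* Simplified rule for the first character (assumption, see lead-in). */
static bool
sketch_is_id1 (uint32_t uc)
{
  return ((uc >= 'a' && uc <= 'z') || (uc >= 'A' && uc <= 'Z')
          || uc == '@' || uc == '#' || uc == '$' || uc >= 0x80);
}

/* Simplified rule for later characters (assumption, see lead-in). */
static bool
sketch_is_idn (uint32_t uc)
{
  return sketch_is_id1 (uc) || (uc >= '0' && uc <= '9')
         || uc == '.' || uc == '_';
}

/* Returns true if valid UTF-8 string ID is nonempty, starts with an id1
   character, and continues with idn characters only. */
static bool
sketch_id_is_plausible (const char *id)
{
  const unsigned char *s = (const unsigned char *) id;
  uint32_t uc;
  int mblen;

  assert (id != NULL);
  mblen = decode_utf8 (s, &uc);
  if (mblen == 0 || !sketch_is_id1 (uc))
    return false;

  for (s += mblen; (mblen = decode_utf8 (s, &uc)) != 0; s += mblen)
    if (!sketch_is_idn (uc))
      return false;
  return true;
}
```

The reserved-word check and the ill-formed UTF-8 check from the real function are deliberately left out of the sketch.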
/* PSPP - a program for statistical analysis.
- Copyright (C) 2010 Free Software Foundation, Inc.
+ Copyright (C) 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include <stdlib.h>
#include "data/dictionary.h"
+#include "data/identifier.h"
#include "data/val-type.h"
#include "data/variable.h"
+#include "libpspp/message.h"
#include "gl/xalloc.h"
+#include "gettext.h"
+#define _(msgid) gettext (msgid)
+
/* Creates and returns a clone of OLD. The caller is responsible for freeing
the new multiple response set (using mrset_destroy()). */
struct mrset *
}
}
+/* Returns true if the UTF-8 encoded NAME is a valid name for a multiple
+ response set in a dictionary encoded in DICT_ENCODING, false otherwise. If
+ ISSUE_ERROR is true, issues an explanatory error message on failure. */
+bool
+mrset_is_valid_name (const char *name, const char *dict_encoding,
+ bool issue_error)
+{
+ if (!id_is_valid (name, dict_encoding, issue_error))
+ return false;
+
+ if (name[0] != '$')
+ {
+ if (issue_error)
+ msg (SE, _("%s is not a valid name for a multiple response "
+ "set. Multiple response set names must begin with "
+ "`$'."), name);
+ return false;
+ }
+
+ return true;
+}
+
/* Checks various constraints on MRSET:
- - MRSET has a valid name for a multiple response set (beginning with '$').
+ - MRSET's name begins with '$' and is valid as an identifier in DICT.
- MRSET has a valid type.
size_t i;
if (mrset->name == NULL
- || mrset->name[0] != '$'
+ || !mrset_is_valid_name (mrset->name, dict_get_encoding (dict), false)
|| (mrset->type != MRSET_MD && mrset->type != MRSET_MC)
|| mrset->vars == NULL
|| mrset->n_vars < 2)
/* A multiple response set. */
struct mrset
{
- char *name; /* Name for syntax. Always begins with "$". */
- char *label; /* Human-readable label for group. */
+ char *name; /* UTF-8 encoded name beginning with "$". */
+ char *label; /* Human-readable UTF-8 label for group. */
enum mrset_type type; /* Group type. */
struct variable **vars; /* Constituent variables. */
size_t n_vars; /* Number of constituent variables. */
struct mrset *mrset_clone (const struct mrset *);
void mrset_destroy (struct mrset *);
+bool mrset_is_valid_name (const char *name, const char *dict_encoding,
+ bool issue_error);
+
bool mrset_ok (const struct mrset *, const struct dictionary *);
#endif /* data/mrset.h */
m.category = MSG_C_GENERAL;
m.severity = MSG_S_ERROR;
- m.where.file_name = NULL;
- m.where.line_number = 0;
- m.where.first_column = 0;
- m.where.last_column = 0;
+ m.file_name = NULL;
+ m.first_line = 0;
+ m.last_line = 0;
+ m.first_column = 0;
+ m.last_column = 0;
m.text = ds_cstr (&text);
msg_emit (&m);
m.category = MSG_C_GENERAL;
m.severity = MSG_S_WARNING;
- m.where.file_name = NULL;
- m.where.line_number = 0;
- m.where.first_column = 0;
- m.where.last_column = 0;
+ m.file_name = NULL;
+ m.first_line = 0;
+ m.last_line = 0;
+ m.first_column = 0;
+ m.last_column = 0;
m.text = ds_cstr (&text);
msg_emit (&m);
for (j = 0; j < 6; j++)
fmt[j] = read_int (r);
- if (!var_is_valid_name (name, false) || *name == '#' || *name == '$')
+ if (!dict_id_is_valid (dict, name, false)
+ || *name == '#' || *name == '$')
error (r, _("Invalid variable name `%s' in position %d."), name, i);
str_uppercase (name);
{
char label[256];
read_string (r, label);
- var_set_label (v, label);
+ var_set_label (v, label, NULL, false); /* XXX */
}
}
{
char line[256];
read_string (r, line);
- dict_add_document_line (dict, line);
+ dict_add_document_line (dict, line, false);
}
}
buf_write (w, "E", 1);
write_int (w, line_cnt);
for (i = 0; i < line_cnt; i++)
- {
- dict_get_document_line (dict, i, &line);
- write_string (w, ds_cstr (&line));
- }
+ write_string (w, dict_get_document_line (dict, i));
ds_destroy (&line);
}
void (*callback) (void *); /* Callback for when the dataset changes */
void *cb_data;
+ /* Default encoding for reading syntax files. */
+ char *syntax_encoding;
}; /* struct dataset */
ds->cb_data = cb_data;
}
+void
+dataset_set_default_syntax_encoding (struct dataset *ds, const char *encoding)
+{
+ free (ds->syntax_encoding);
+ ds->syntax_encoding = xstrdup (encoding);
+}
+
+const char *
+dataset_get_default_syntax_encoding (const struct dataset *ds)
+{
+ return ds->syntax_encoding;
+}
/* Returns the last time the data was read. */
time_t
ds->caseinit = caseinit_create ();
proc_cancel_all_transformations (ds);
+
+ ds->syntax_encoding = xstrdup ("Auto");
+
return ds;
}
if ( ds->xform_callback)
ds->xform_callback (false, ds->xform_callback_aux);
+
+ free (ds->syntax_encoding);
free (ds);
}
struct dictionary *dataset_dict (const struct dataset *ds);
const struct casereader *dataset_source (const struct dataset *ds);
-
const struct ccase *lagged_case (const struct dataset *ds, int n_before);
void dataset_need_lag (struct dataset *ds, int n_before);
void dataset_set_callback (struct dataset *ds, void (*cb) (void *), void *);
+void dataset_set_default_syntax_encoding (struct dataset *, const char *);
+const char *dataset_get_default_syntax_encoding (const struct dataset *);
+
#endif /* procedure.h */
#include "data/file-handle-def.h"
#include "data/file-name.h"
#include "data/format.h"
+#include "data/identifier.h"
#include "data/missing-values.h"
#include "data/mrset.h"
#include "data/short-names.h"
rec->name, 8, r->pool);
name[strcspn (name, " ")] = '\0';
- if (!var_is_valid_name (name, false) || name[0] == '$' || name[0] == '#')
+ if (!dict_id_is_valid (dict, name, false)
+ || name[0] == '$' || name[0] == '#')
sys_error (r, rec->pos, _("Invalid variable name `%s'."), name);
if (rec->width < 0 || rec->width > 255)
utf8_label = recode_string_pool ("UTF-8", dict_encoding,
rec->label, -1, r->pool);
- var_set_label (var, utf8_label);
+ var_set_label (var, utf8_label, NULL, false);
}
/* Set missing values. */
ss_rtrim (&line, ss_cstr (" "));
line.string[line.length] = '\0';
- dict_add_document_line (dict, line.string);
+ dict_add_document_line (dict, line.string, false);
ss_dealloc (&line);
}
while (read_variable_to_value_pair (r, dict, text, &var, &long_name))
{
/* Validate long name. */
- if (!var_is_valid_name (long_name, false))
+ /* XXX need to reencode name to UTF-8 */
+ if (!dict_id_is_valid (dict, long_name, false))
{
sys_warn (r, record->pos,
_("Long variable mapping from %s to invalid "
m.category = msg_class_to_category (class);
m.severity = msg_class_to_severity (class);
- m.where.file_name = NULL;
- m.where.line_number = 0;
- m.where.first_column = 0;
- m.where.last_column = 0;
+ m.file_name = NULL;
+ m.first_line = 0;
+ m.last_line = 0;
+ m.first_column = 0;
+ m.last_column = 0;
m.text = ds_cstr (&text);
msg_emit (&m);
#include "libpspp/message.h"
#include "libpspp/misc.h"
#include "libpspp/str.h"
+#include "libpspp/string-array.h"
#include "libpspp/version.h"
#include "gl/xmemdup0.h"
idx += sfm_width_to_octs (var_get_width (v));
}
- if (dict_get_documents (d) != NULL)
+ if (dict_get_document_line_cnt (d) > 0)
write_documents (w, d);
write_integer_info_record (w);
static void
write_documents (struct sfm_writer *w, const struct dictionary *d)
{
- size_t line_cnt = dict_get_document_line_cnt (d);
+ const struct string_array *docs = dict_get_documents (d);
+ const char *enc = dict_get_encoding (d);
+ size_t i;
write_int (w, 6); /* Record type. */
- write_int (w, line_cnt);
- write_bytes (w, dict_get_documents (d), line_cnt * DOC_LINE_LENGTH);
+ write_int (w, docs->n);
+ for (i = 0; i < docs->n; i++)
+ {
+ char *s = recode_string (enc, "UTF-8", docs->strings[i], -1);
+ size_t s_len = strlen (s);
+ size_t write_len = MIN (s_len, DOC_LINE_LENGTH);
+
+ write_bytes (w, s, write_len);
+ write_spaces (w, DOC_LINE_LENGTH - write_len);
+ free (s);
+ }
}
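
Reviewer note: the new loop writes each document line as a fixed-width record, truncating long lines and space-padding short ones. A minimal sketch of that record layout follows. `fill_fixed_record` and `SKETCH_DOC_LINE_LENGTH` are hypothetical names, and the width of 80 is an assumption about `DOC_LINE_LENGTH`. Like the hunk above, the sketch truncates by bytes, so it could in principle cut a multibyte character in half.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_DOC_LINE_LENGTH = 80 };   /* Assumed value of DOC_LINE_LENGTH. */

/* Fills the WIDTH-byte record OUT from null-terminated string S: copies at
   most WIDTH bytes of S, then pads any remainder with spaces.  OUT is a raw
   record and is not null-terminated. */
static void
fill_fixed_record (char *out, size_t width, const char *s)
{
  size_t s_len, write_len;

  assert (out != NULL && s != NULL);
  s_len = strlen (s);
  write_len = s_len < width ? s_len : width;

  memcpy (out, s, write_len);
  memset (out + write_len, ' ', width - write_len);
}
```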
static void
#include "libpspp/assertion.h"
#include "libpspp/compiler.h"
#include "libpspp/hash-functions.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/misc.h"
#include "libpspp/str.h"
var_set_print_format (new_var, var_get_print_format (old_var));
var_set_write_format (new_var, var_get_write_format (old_var));
var_set_value_labels (new_var, var_get_value_labels (old_var));
- var_set_label (new_var, var_get_label (old_var));
+ var_set_label (new_var, var_get_label (old_var), NULL, false);
var_set_measure (new_var, var_get_measure (old_var));
var_set_display_width (new_var, var_get_display_width (old_var));
var_set_alignment (new_var, var_get_alignment (old_var));
\f
/* Variable names. */
-/* Return variable V's name. */
+/* Return variable V's name, as a UTF-8 encoded string. */
const char *
var_get_name (const struct variable *v)
{
return v->name;
}
-/* Sets V's name to NAME.
+/* Sets V's name to NAME, a UTF-8 encoded string.
Do not use this function for a variable in a dictionary. Use
dict_rename_var instead. */
void
var_set_name (struct variable *v, const char *name)
{
assert (!var_has_vardict (v));
- assert (var_is_plausible_name (name, false));
+ assert (id_is_plausible (name, false));
free (v->name);
v->name = xstrdup (name);
dict_var_changed (v);
}
-/* Returns true if NAME is an acceptable name for a variable,
- false otherwise. If ISSUE_ERROR is true, issues an
- explanatory error message on failure. */
-bool
-var_is_valid_name (const char *name, bool issue_error)
-{
- bool plausible;
- size_t length, i;
-
- /* Note that strlen returns number of BYTES, not the number of
- CHARACTERS */
- length = strlen (name);
-
- plausible = var_is_plausible_name(name, issue_error);
-
- if ( ! plausible )
- return false;
-
-
- if (!lex_is_id1 (name[0]))
- {
- if (issue_error)
- msg (SE, _("Character `%c' (in %s) may not appear "
- "as the first character in a variable name."),
- name[0], name);
- return false;
- }
-
-
- for (i = 0; i < length; i++)
- {
- if (!lex_is_idn (name[i]))
- {
- if (issue_error)
- msg (SE, _("Character `%c' (in %s) may not appear in "
- "a variable name."),
- name[i], name);
- return false;
- }
- }
-
- return true;
-}
-
-/* Returns true if NAME is an plausible name for a variable,
- false otherwise. If ISSUE_ERROR is true, issues an
- explanatory error message on failure.
- This function makes no use of LC_CTYPE.
-*/
-bool
-var_is_plausible_name (const char *name, bool issue_error)
-{
- size_t length;
-
- /* Note that strlen returns number of BYTES, not the number of
- CHARACTERS */
- length = strlen (name);
- if (length < 1)
- {
- if (issue_error)
- msg (SE, _("Variable name cannot be empty string."));
- return false;
- }
- else if (length > VAR_NAME_LEN)
- {
- if (issue_error)
- msg (SE, _("Variable name %s exceeds %d-character limit."),
- name, (int) VAR_NAME_LEN);
- return false;
- }
-
- if (lex_id_to_token (ss_cstr (name)) != T_ID)
- {
- if (issue_error)
- msg (SE, _("`%s' may not be used as a variable name because it "
- "is a reserved word."), name);
- return false;
- }
-
- return true;
-}
-
/* Returns VAR's dictionary class. */
enum dict_class
var_get_dict_class (const struct variable *var)
return v->label;
}
-/* Sets V's variable label to LABEL, stripping off leading and
- trailing white space and truncating to 255 characters.
- If LABEL is a null pointer or if LABEL is an empty string
- (after stripping white space), then V's variable label (if
- any) is removed. */
-void
-var_set_label (struct variable *v, const char *label)
+/* Sets V's variable label to UTF-8 encoded string LABEL, stripping off leading
+ and trailing white space. If LABEL is a null pointer or if LABEL is an
+ empty string (after stripping white space), then V's variable label (if any)
+ is removed.
+
+ Variable labels are limited to 255 bytes in the dictionary encoding, which
+ should be specified as DICT_ENCODING. If LABEL fits within this limit, this
+ function returns false. Otherwise, the variable label is set to a truncated
+ value, this function returns true and, if ISSUE_WARNING is true, issues a
+ warning. */
+bool
+var_set_label (struct variable *v, const char *label,
+ const char *dict_encoding, bool issue_warning)
{
+ bool truncated = false;
+
free (v->label);
v->label = NULL;
if (label != NULL)
{
struct substring s = ss_cstr (label);
+ size_t trunc_len;
+
+ if (dict_encoding != NULL)
+ {
+ enum { MAX_LABEL_LEN = 255 };
+
+ trunc_len = utf8_encoding_trunc_len (label, dict_encoding,
+ MAX_LABEL_LEN);
+ if (ss_length (s) > trunc_len)
+ {
+ if (issue_warning)
+ msg (SW, _("Truncating variable label for variable `%s' to %d "
+ "bytes."), var_get_name (v), MAX_LABEL_LEN);
+ ss_truncate (&s, trunc_len);
+ truncated = true;
+ }
+ }
+
ss_trim (&s, ss_cstr (CC_SPACES));
- ss_truncate (&s, 255);
if (!ss_is_empty (s))
v->label = ss_xstrdup (s);
}
+
dict_var_changed (v);
+
+ return truncated;
}
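
Reviewer note: the truncation above relies on `utf8_encoding_trunc_len` to find a cut point that respects the byte limit in the dictionary encoding. The sketch below covers only the simplest case, where the dictionary encoding is itself UTF-8, and shows how to back up to a character boundary so that a multibyte sequence is never split. `sketch_utf8_trunc_len` is a hypothetical name and is not the PSPP implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length in bytes of the longest prefix of valid UTF-8 string S
   that is at most MAX_LEN bytes long and ends on a character boundary. */
static size_t
sketch_utf8_trunc_len (const char *s, size_t max_len)
{
  size_t len, n;

  assert (s != NULL);
  len = strlen (s);
  if (len <= max_len)
    return len;

  /* s[n] is the first byte that would be dropped.  If it is a continuation
     byte (10xxxxxx), cutting here would split a character, so back up. */
  n = max_len;
  while (n > 0 && ((unsigned char) s[n] & 0xc0) == 0x80)
    n--;
  return n;
}
```

For the three-byte string "a" followed by U+00E9, a limit of 2 yields a prefix of 1 byte, because the second and third bytes belong to one character.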
/* Removes any variable label from V. */
void
var_clear_label (struct variable *v)
{
- var_set_label (v, NULL);
+ var_set_label (v, NULL, NULL, false);
}
/* Returns true if V has a variable V,
void
var_set_short_name (struct variable *var, size_t idx, const char *short_name)
{
- assert (short_name == NULL || var_is_plausible_name (short_name, false));
+ assert (short_name == NULL || id_is_plausible (short_name, false));
/* Clear old short name numbered IDX, if any. */
if (idx < var->short_name_cnt)
void var_destroy (struct variable *);
/* Variable names. */
-#define VAR_NAME_LEN 64 /* Maximum length of variable name, in bytes. */
-
const char *var_get_name (const struct variable *);
void var_set_name (struct variable *, const char *);
-bool var_is_valid_name (const char *, bool issue_error);
-bool var_is_plausible_name (const char *name, bool issue_error);
enum dict_class var_get_dict_class (const struct variable *);
int compare_vars_by_name (const void *, const void *, const void *);
/* Variable labels. */
const char *var_to_string (const struct variable *);
const char *var_get_label (const struct variable *);
-void var_set_label (struct variable *, const char *);
+bool var_set_label (struct variable *, const char *label,
+ const char *dict_encoding, bool issue_warning);
void var_clear_label (struct variable *);
bool var_has_label (const struct variable *);
/* PSPP - a program for statistical analysis.
- Copyright (C) 2006, 2011 Free Software Foundation, Inc.
+ Copyright (C) 2006, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include <stdlib.h>
#include "data/dictionary.h"
+#include "data/identifier.h"
#include "libpspp/assertion.h"
#include "libpspp/str.h"
assert (width == var_get_width (vector->vars[i]));
}
-/* Creates and returns a new vector with the given NAME
+/* Creates and returns a new vector with the given UTF-8 encoded NAME
that contains the VAR_CNT variables in VARS.
All variables in VARS must have the same type and width. */
struct vector *
-vector_create (const char *name,
- struct variable **vars, size_t var_cnt)
+vector_create (const char *name, struct variable **vars, size_t var_cnt)
{
struct vector *vector = xmalloc (sizeof *vector);
assert (var_cnt > 0);
- assert (var_is_plausible_name (name, false));
- vector->name = xstrdup (name);
+ assert (id_is_plausible (name, false));
+ vector->name = xstrdup (name);
vector->vars = xmemdup (vars, var_cnt * sizeof *vector->vars);
vector->var_cnt = var_cnt;
check_widths (vector);
size_t i;
new->name = xstrdup (old->name);
-
new->vars = xnmalloc (old->var_cnt, sizeof *new->vars);
new->var_cnt = old->var_cnt;
for (i = 0; i < new->var_cnt; i++)
free (vector);
}
-/* Returns VECTOR's name. */
+/* Returns VECTOR's name, as a UTF-8 encoded string. */
const char *
vector_get_name (const struct vector *vector)
{
/* PSPP - a program for statistical analysis.
- Copyright (C) 2006, 2011 Free Software Foundation, Inc.
+ Copyright (C) 2006, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
struct variable *vector_get_var (const struct vector *, size_t idx);
size_t vector_get_var_cnt (const struct vector *);
+bool vector_is_valid_name (const char *name, bool issue_error);
+
int compare_vector_ptrs_by_name (const void *a_, const void *b_);
#endif /* data/vector.h */
src_language_liblanguage_la_SOURCES = \
- src/language/syntax-file.c \
- src/language/syntax-file.h \
- src/language/syntax-string-source.c \
- src/language/syntax-string-source.h \
- src/language/prompt.c \
- src/language/prompt.h \
src/language/command.c \
src/language/command.h \
src/language/command.def \
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2009, 2010 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include "data/variable.h"
#include "language/lexer/command-name.h"
#include "language/lexer/lexer.h"
-#include "language/prompt.h"
#include "libpspp/assertion.h"
#include "libpspp/compiler.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/str.h"
-#include "libpspp/getl.h"
#include "output/text-item.h"
#include "xalloc.h"
{
F_ENHANCED = 0x10, /* Allowed only in enhanced syntax mode. */
F_TESTING = 0x20, /* Allowed only in testing mode. */
- F_KEEP_FINAL_TOKEN = 0x40,/* Don't skip final token in command name. */
F_ABBREV = 0x80 /* Not a candidate for name completion. */
};
\f
/* Command parser. */
-static const struct command *parse_command_name (struct lexer *lexer);
+static const struct command *parse_command_name (struct lexer *,
+ int *n_tokens);
static enum cmd_result do_parse_command (struct lexer *, struct dataset *, enum cmd_state);
/* Parses an entire command, from command name to terminating
const struct command *command = NULL;
enum cmd_result result;
bool opened = false;
+ int n_tokens;
/* Read the command's first token. */
- prompt_set_style (PROMPT_FIRST);
set_completion_state (state);
- lex_get (lexer);
if (lex_token (lexer) == T_STOP)
{
result = CMD_EOF;
goto finish;
}
- prompt_set_style (PROMPT_LATER);
-
/* Parse the command name. */
- command = parse_command_name (lexer);
+ command = parse_command_name (lexer, &n_tokens);
if (command == NULL)
{
result = CMD_FAILURE;
else
{
/* Execute command. */
+ int i;
+
+ for (i = 0; i < n_tokens; i++)
+ lex_get (lexer);
result = command->function (lexer, ds);
}
assert (cmd_result_is_valid (result));
- finish:
+finish:
if (cmd_result_is_failure (result))
- {
- lex_discard_rest_of_command (lexer);
- if (source_stream_current_error_mode (
- lex_get_source_stream (lexer)) == ERRMODE_STOP )
- {
- msg (MW, _("Error encountered while ERROR=STOP is effective."));
- result = CMD_CASCADING_FAILURE;
- }
- }
+ lex_interactive_reset (lexer);
+ else if (result == CMD_SUCCESS)
+ result = lex_end_of_command (lexer);
+
+ lex_discard_rest_of_command (lexer);
+ while (lex_token (lexer) == T_ENDCMD)
+ lex_get (lexer);
if (opened)
text_item_submit (text_item_create (TEXT_ITEM_COMMAND_CLOSE,
return missing_words;
}
-/* Parse the command name and return a pointer to the corresponding
- struct command if successful.
- If not successful, return a null pointer. */
+static bool
+parse_command_word (struct lexer *lexer, struct string *s, int n)
+{
+ bool need_space = ds_last (s) != EOF && ds_last (s) != '-';
+
+ switch (lex_next_token (lexer, n))
+ {
+ case T_DASH:
+ ds_put_byte (s, '-');
+ return true;
+
+ case T_ID:
+ if (need_space)
+ ds_put_byte (s, ' ');
+ ds_put_cstr (s, lex_next_tokcstr (lexer, n));
+ return true;
+
+ case T_POS_NUM:
+ if (lex_next_is_integer (lexer, n))
+ {
+ long int integer = lex_next_integer (lexer, n);
+ if (integer >= 0)
+ {
+ if (need_space)
+ ds_put_byte (s, ' ');
+ ds_put_format (s, "%ld", integer);
+ return true;
+ }
+ }
+ return false;
+
+ default:
+ return false;
+ }
+}
+
+/* Parses the command name. On success returns a pointer to the corresponding
+ struct command and stores the number of tokens in the command name into
+ *N_TOKENS. On failure, returns a null pointer and stores the number of
+ tokens required to determine that no command name was present into
+ *N_TOKENS. */
static const struct command *
-parse_command_name (struct lexer *lexer)
+parse_command_name (struct lexer *lexer, int *n_tokens)
{
const struct command *command;
int missing_words;
struct string s;
-
- if (lex_token (lexer) == T_EXP
- || lex_token (lexer) == T_ASTERISK
- || lex_token (lexer) == T_LBRACK)
- {
- static const struct command c = { S_ANY, 0, "COMMENT", cmd_comment };
- return &c;
- }
+ int word;
command = NULL;
missing_words = 0;
ds_init_empty (&s);
- for (;;)
+ word = 0;
+ while (parse_command_word (lexer, &s, word))
{
- if (lex_token (lexer) == T_DASH)
- ds_put_byte (&s, '-');
- else if (lex_token (lexer) == T_ID)
- {
- if (!ds_is_empty (&s) && ds_last (&s) != '-')
- ds_put_byte (&s, ' ');
- ds_put_cstr (&s, lex_tokcstr (lexer));
- }
- else if (lex_is_integer (lexer) && lex_integer (lexer) >= 0)
- {
- if (!ds_is_empty (&s) && ds_last (&s) != '-')
- ds_put_byte (&s, ' ');
- ds_put_format (&s, "%ld", lex_integer (lexer));
- }
- else
- break;
-
missing_words = find_best_match (ds_ss (&s), &command);
if (missing_words <= 0)
break;
-
- lex_get (lexer);
+ word++;
}
if (command == NULL && missing_words > 0)
else
msg (SE, _("Unknown command `%s'."), ds_cstr (&s));
}
- else if (missing_words == 0)
- {
- if (!(command->flags & F_KEEP_FINAL_TOKEN))
- lex_get (lexer);
- }
- else if (missing_words < 0)
- {
- assert (missing_words == -1);
- assert (!(command->flags & F_KEEP_FINAL_TOKEN));
- }
ds_destroy (&s);
+
+ *n_tokens = (word + 1) + missing_words;
return command;
}
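
Reviewer note: the rewritten `parse_command_name` only peeks at upcoming tokens and reports how many belong to the command name; the caller then consumes that many with `lex_get`. The sketch below illustrates the peek-then-count idea on a plain array of words. It is a simplification under stated assumptions: `match_command` and the `commands` table are hypothetical, matching is exact (no abbreviations, no `find_best_match` scoring, no dash handling), and no command in the table is a prefix of another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical command table for the sketch. */
static const char *const commands[] =
  { "DO IF", "DO REPEAT", "EXECUTE", "NEW FILE" };

/* Looks ahead through the N_AVAIL words in TOKENS without consuming them.
   Returns the matching command name and stores the number of words it spans
   in *N_TOKENS.  On failure, returns NULL and stores the number of words
   examined in *N_TOKENS. */
static const char *
match_command (const char *const tokens[], size_t n_avail, size_t *n_tokens)
{
  char name[64] = "";
  size_t i, j;

  assert (tokens != NULL && n_tokens != NULL);
  for (i = 0; i < n_avail; i++)
    {
      size_t used = strlen (name);

      /* Stop if the next word, a separating space, and the terminator
         would not fit in NAME. */
      if (used + strlen (tokens[i]) + 2 > sizeof name)
        break;
      snprintf (name + used, sizeof name - used, "%s%s",
                i > 0 ? " " : "", tokens[i]);

      for (j = 0; j < sizeof commands / sizeof *commands; j++)
        if (!strcmp (name, commands[j]))
          {
            *n_tokens = i + 1;
            return commands[j];
          }
    }

  *n_tokens = i;
  return NULL;
}
```

A caller would then advance its token stream by `*n_tokens` before handing control to the command's own parser, which is the role of the `lex_get` loop added to the command executor in this diff.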
}
}
else if (state == CMD_STATE_INPUT_PROGRAM)
- msg (SE, _("%s is not allowed inside %s."), command->name, "INPUT PROGRAM" );
+ msg (SE, _("%s is not allowed inside %s."),
+ command->name, "INPUT PROGRAM" );
else if (state == CMD_STATE_FILE_TYPE)
msg (SE, _("%s is not allowed inside %s."), command->name, "FILE TYPE");
if (!lex_match_id (lexer, "ESTIMATED"))
dict_set_case_limit (dataset_dict (ds), x);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Parses, performs the EXECUTE procedure. */
int
-cmd_execute (struct lexer *lexer, struct dataset *ds)
+cmd_execute (struct lexer *lexer UNUSED, struct dataset *ds)
{
bool ok = casereader_destroy (proc_open (ds));
if (!proc_commit (ds) || !ok)
return CMD_CASCADING_FAILURE;
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Parses, performs the ERASE command. */
int
cmd_erase (struct lexer *lexer, struct dataset *ds UNUSED)
{
+ char *filename;
+ int retval;
+
if (settings_get_safer_mode ())
{
msg (SE, _("This command not allowed when the SAFER option is set."));
if (!lex_force_string (lexer))
return CMD_FAILURE;
- if (remove (lex_tokcstr (lexer)) == -1)
+ filename = utf8_to_filename (lex_tokcstr (lexer));
+ retval = remove (filename);
+ free (filename);
+
+ if (retval == -1)
{
msg (SW, _("Error removing `%s': %s."),
lex_tokcstr (lexer), strerror (errno));
return CMD_FAILURE;
}
+ lex_get (lexer);
return CMD_SUCCESS;
}
/* Parses, performs the NEW FILE command. */
int
-cmd_new_file (struct lexer *lexer, struct dataset *ds)
+cmd_new_file (struct lexer *lexer UNUSED, struct dataset *ds)
{
proc_discard_active_file (ds);
-
- return lex_end_of_command (lexer);
-}
-
-/* Parses a comment. */
-int
-cmd_comment (struct lexer *lexer, struct dataset *ds UNUSED)
-{
- lex_skip_comment (lexer);
return CMD_SUCCESS;
}
/* Utility commands acceptable anywhere. */
DEF_CMD (S_ANY, F_ENHANCED, "CLOSE FILE HANDLE", cmd_close_file_handle)
-DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "COMMENT", cmd_comment)
DEF_CMD (S_ANY, 0, "CACHE", cmd_cache)
DEF_CMD (S_ANY, 0, "CD", cmd_cd)
DEF_CMD (S_ANY, 0, "DO REPEAT", cmd_do_repeat)
DEF_CMD (S_ANY, 0, "ERASE", cmd_erase)
DEF_CMD (S_ANY, 0, "EXIT", cmd_finish)
DEF_CMD (S_ANY, 0, "FILE HANDLE", cmd_file_handle)
-DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "FILE LABEL", cmd_file_label)
+DEF_CMD (S_ANY, 0, "FILE LABEL", cmd_file_label)
DEF_CMD (S_ANY, 0, "FINISH", cmd_finish)
DEF_CMD (S_ANY, 0, "HOST", cmd_host)
DEF_CMD (S_ANY, 0, "INCLUDE", cmd_include)
DEF_CMD (S_ANY, 0, "RESTORE", cmd_restore)
DEF_CMD (S_ANY, 0, "SET", cmd_set)
DEF_CMD (S_ANY, 0, "SHOW", cmd_show)
-DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "SUBTITLE", cmd_subtitle)
+DEF_CMD (S_ANY, 0, "SUBTITLE", cmd_subtitle)
DEF_CMD (S_ANY, 0, "SYSFILE INFO", cmd_sysfile_info)
-DEF_CMD (S_ANY, F_KEEP_FINAL_TOKEN, "TITLE", cmd_title)
+DEF_CMD (S_ANY, 0, "TITLE", cmd_title)
/* Commands that define (or replace) the active file. */
DEF_CMD (S_INITIAL | S_DATA, 0, "ADD FILES", cmd_add_files)
DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "COMPUTE", cmd_compute)
DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DATAFILE ATTRIBUTE", cmd_datafile_attribute)
DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DISPLAY", cmd_display)
-DEF_CMD (S_DATA | S_INPUT_PROGRAM, F_KEEP_FINAL_TOKEN, "DOCUMENT", cmd_document)
+DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DOCUMENT", cmd_document)
DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DO IF", cmd_do_if)
DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "DROP DOCUMENTS", cmd_drop_documents)
DEF_CMD (S_DATA | S_INPUT_PROGRAM, 0, "ELSE IF", cmd_else_if)
/* Commands that may appear after active file definition. */
DEF_CMD (S_DATA, 0, "AGGREGATE", cmd_aggregate)
DEF_CMD (S_DATA, 0, "AUTORECODE", cmd_autorecode)
-DEF_CMD (S_DATA, F_KEEP_FINAL_TOKEN, "BEGIN DATA", cmd_begin_data)
+DEF_CMD (S_DATA, 0, "BEGIN DATA", cmd_begin_data)
DEF_CMD (S_DATA, 0, "COUNT", cmd_count)
DEF_CMD (S_DATA, 0, "CROSSTABS", cmd_crosstabs)
DEF_CMD (S_DATA, 0, "CORRELATIONS", cmd_correlation)
src/language/control/control-stack.h \
src/language/control/do-if.c \
src/language/control/loop.c \
- src/language/control/temporary.c \
src/language/control/repeat.c \
- src/language/control/repeat.h
+ src/language/control/temporary.c
EXTRA_DIST += src/language/control/OChangeLog
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2009, 2011 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2009-2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
/* Parse ELSE. */
int
-cmd_else (struct lexer *lexer, struct dataset *ds)
+cmd_else (struct lexer *lexer UNUSED, struct dataset *ds)
{
struct do_if_trns *do_if = ctl_stack_top (&do_if_class);
assert (ds == do_if->ds);
if (do_if == NULL || !must_not_have_else (do_if))
return CMD_CASCADING_FAILURE;
add_else (do_if);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Parse END IF. */
int
-cmd_end_if (struct lexer *lexer, struct dataset *ds)
+cmd_end_if (struct lexer *lexer UNUSED, struct dataset *ds)
{
struct do_if_trns *do_if = ctl_stack_top (&do_if_class);
assert (ds == do_if->ds);
ctl_stack_pop (do_if);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Closes out DO_IF, by adding a sentinel ELSE clause if
add_clause (do_if, condition);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Adds a clause to DO_IF that tests for the given CONDITION and,
/* Parses BREAK. */
int
-cmd_break (struct lexer *lexer, struct dataset *ds)
+cmd_break (struct lexer *lexer UNUSED, struct dataset *ds)
{
struct ctl_stmt *loop = ctl_stack_search (&loop_class);
if (loop == NULL)
add_transformation (ds, break_trns_proc, NULL, loop);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Closes a LOOP construct by emitting the END LOOP
#include <config.h>
-#include "language/control/repeat.h"
-
-#include <ctype.h>
-#include <math.h>
#include <stdlib.h>
#include "data/dictionary.h"
#include "data/procedure.h"
-#include "data/settings.h"
-#include "libpspp/getl.h"
#include "language/command.h"
#include "language/lexer/lexer.h"
+#include "language/lexer/segment.h"
+#include "language/lexer/token.h"
#include "language/lexer/variable-parser.h"
#include "libpspp/cast.h"
-#include "libpspp/ll.h"
+#include "libpspp/hash-functions.h"
+#include "libpspp/hmap.h"
#include "libpspp/message.h"
-#include "libpspp/misc.h"
-#include "libpspp/pool.h"
#include "libpspp/str.h"
-#include "data/variable.h"
-#include "gl/intprops.h"
+#include "gl/ftoastr.h"
+#include "gl/minmax.h"
#include "gl/xalloc.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
-/* A line repeated by DO REPEAT. */
-struct repeat_line
- {
- struct ll ll; /* In struct repeat_block line_list. */
- const char *file_name; /* File name. */
- int line_number; /* Line number. */
- struct substring text; /* Contents. */
- };
-
-/* The type of substitution made for a DO REPEAT macro. */
-enum repeat_macro_type
- {
- VAR_NAMES,
- OTHER
- };
-
-/* Describes one DO REPEAT macro. */
-struct repeat_macro
+struct dummy_var
{
- struct ll ll; /* In struct repeat_block macros. */
- enum repeat_macro_type type; /* Types of replacements. */
- struct substring name; /* Macro name. */
- struct substring *replacements; /* Macro replacement. */
+ struct hmap_node hmap_node;
+ char *name;
+ char **values;
+ size_t n_values;
};
-/* A DO REPEAT...END REPEAT block. */
-struct repeat_block
- {
- struct getl_interface parent;
-
- struct pool *pool; /* Pool used for storage. */
- struct dataset *ds; /* The dataset for this block */
-
- struct ll_list lines; /* Lines in buffer. */
- struct ll *cur_line; /* Last line output. */
- int loop_cnt; /* Number of loops. */
- int loop_idx; /* Number of loops so far. */
+static bool parse_specification (struct lexer *, struct dictionary *,
+ struct hmap *dummies);
+static bool parse_commands (struct lexer *, struct hmap *dummies);
+static void destroy_dummies (struct hmap *dummies);
- struct ll_list macros; /* Table of macros. */
+static bool parse_ids (struct lexer *, const struct dictionary *,
+ struct dummy_var *);
+static bool parse_numbers (struct lexer *, struct dummy_var *);
+static bool parse_strings (struct lexer *, struct dummy_var *);
- bool print; /* Print lines as executed? */
- };
-
-static bool parse_specification (struct lexer *, struct repeat_block *);
-static bool parse_lines (struct lexer *, struct repeat_block *);
-static void create_vars (struct repeat_block *);
+int
+cmd_do_repeat (struct lexer *lexer, struct dataset *ds)
+{
+ struct hmap dummies;
+ bool ok;
-static struct repeat_macro *find_macro (struct repeat_block *,
- struct substring name);
+ if (!parse_specification (lexer, dataset_dict (ds), &dummies))
+ return CMD_CASCADING_FAILURE;
-static int parse_ids (struct lexer *, const struct dictionary *dict,
- struct repeat_macro *, struct pool *);
+ ok = parse_commands (lexer, &dummies);
-static int parse_numbers (struct lexer *, struct repeat_macro *,
- struct pool *);
+ destroy_dummies (&dummies);
-static int parse_strings (struct lexer *, struct repeat_macro *,
- struct pool *);
+ return ok ? CMD_SUCCESS : CMD_CASCADING_FAILURE;
+}
-static void do_repeat_filter (struct getl_interface *,
- struct string *);
-static bool do_repeat_read (struct getl_interface *,
- struct string *);
-static void do_repeat_close (struct getl_interface *);
-static bool always_false (const struct getl_interface *);
-static const char *do_repeat_name (const struct getl_interface *);
-static int do_repeat_location (const struct getl_interface *);
+static unsigned int
+hash_dummy (const char *name, size_t name_len)
+{
+ return hash_case_bytes (name, name_len, 0);
+}
-int
-cmd_do_repeat (struct lexer *lexer, struct dataset *ds)
+static const struct dummy_var *
+find_dummy_var (struct hmap *hmap, const char *name, size_t name_len)
{
- struct repeat_block *block;
-
- block = pool_create_container (struct repeat_block, pool);
- block->ds = ds;
- ll_init (&block->lines);
- block->cur_line = ll_null (&block->lines);
- block->loop_idx = 0;
- ll_init (&block->macros);
-
- if (!parse_specification (lexer, block) || !parse_lines (lexer, block))
- goto error;
-
- create_vars (block);
-
- block->parent.read = do_repeat_read;
- block->parent.close = do_repeat_close;
- block->parent.filter = do_repeat_filter;
- block->parent.interactive = always_false;
- block->parent.name = do_repeat_name;
- block->parent.location = do_repeat_location;
-
- if (!ll_is_empty (&block->lines))
- getl_include_source (lex_get_source_stream (lexer),
- &block->parent,
- lex_current_syntax_mode (lexer),
- lex_current_error_mode (lexer)
- );
- else
- pool_destroy (block->pool);
+ const struct dummy_var *dv;
- return CMD_SUCCESS;
+ HMAP_FOR_EACH_WITH_HASH (dv, struct dummy_var, hmap_node,
+ hash_dummy (name, name_len), hmap)
+ if (strlen (dv->name) == name_len
+ && !strncasecmp (dv->name, name, name_len))
+ return dv;
- error:
- pool_destroy (block->pool);
- return CMD_CASCADING_FAILURE;
+ return NULL;
}
/* Parses the whole DO REPEAT command specification.
Returns success. */
static bool
-parse_specification (struct lexer *lexer, struct repeat_block *block)
+parse_specification (struct lexer *lexer, struct dictionary *dict,
+ struct hmap *dummies)
{
- struct substring first_name;
+ struct dummy_var *first_dv = NULL;
- block->loop_cnt = 0;
+ hmap_init (dummies);
do
{
- struct repeat_macro *macro;
- struct dictionary *dict = dataset_dict (block->ds);
- int count;
+ struct dummy_var *dv;
+ const char *name;
+ bool ok;
/* Get a stand-in variable name and make sure it's unique. */
if (!lex_force_id (lexer))
- return false;
- if (dict_lookup_var (dict, lex_tokcstr (lexer)))
+ goto error;
+ name = lex_tokcstr (lexer);
+ if (dict_lookup_var (dict, name))
msg (SW, _("Dummy variable name `%s' hides dictionary variable `%s'."),
- lex_tokcstr (lexer), lex_tokcstr (lexer));
- if (find_macro (block, lex_tokss (lexer)))
- {
- msg (SE, _("Dummy variable name `%s' is given twice."),
- lex_tokcstr (lexer));
- return false;
- }
+ name, name);
+ if (find_dummy_var (dummies, name, strlen (name)))
+ {
+ msg (SE, _("Dummy variable name `%s' is given twice."), name);
+ goto error;
+ }
/* Make a new macro. */
- macro = pool_alloc (block->pool, sizeof *macro);
- ss_alloc_substring_pool (&macro->name, lex_tokss (lexer), block->pool);
- ll_push_tail (&block->macros, &macro->ll);
+ dv = xmalloc (sizeof *dv);
+ dv->name = xstrdup (name);
+ dv->values = NULL;
+ dv->n_values = 0;
+ hmap_insert (dummies, &dv->hmap_node, hash_dummy (name, strlen (name)));
/* Skip equals sign. */
lex_get (lexer);
if (!lex_force_match (lexer, T_EQUALS))
- return false;
+ goto error;
/* Get the details of the variable's possible values. */
- if (lex_token (lexer) == T_ID)
- count = parse_ids (lexer, dict, macro, block->pool);
+ if (lex_token (lexer) == T_ID || lex_token (lexer) == T_ALL)
+ ok = parse_ids (lexer, dict, dv);
else if (lex_is_number (lexer))
- count = parse_numbers (lexer, macro, block->pool);
+ ok = parse_numbers (lexer, dv);
else if (lex_is_string (lexer))
- count = parse_strings (lexer, macro, block->pool);
+ ok = parse_strings (lexer, dv);
else
{
lex_error (lexer, NULL);
- return false;
+ goto error;
}
- if (count == 0)
- return false;
+ if (!ok)
+ goto error;
+ assert (dv->n_values > 0);
if (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD)
{
lex_error (lexer, NULL);
- return false;
+ goto error;
}
- /* If this is the first variable then it defines how many
- replacements there must be; otherwise enforce this number of
- replacements. */
- if (block->loop_cnt == 0)
+ /* If this is the first variable then it defines how many replacements
+ there must be; otherwise enforce this number of replacements. */
+ if (first_dv == NULL)
+ first_dv = dv;
+ else if (first_dv->n_values != dv->n_values)
{
- block->loop_cnt = count;
- first_name = macro->name;
- }
- else if (block->loop_cnt != count)
- {
- msg (SE, _("Dummy variable `%.*s' had %d "
- "substitutions, so `%.*s' must also, but %d "
- "were specified."),
- (int) ss_length (first_name), ss_data (first_name),
- block->loop_cnt,
- (int) ss_length (macro->name), ss_data (macro->name),
- count);
- return false;
+ msg (SE, _("Dummy variable `%s' had %zu substitutions, so `%s' must "
+ "also, but %zu were specified."),
+ first_dv->name, first_dv->n_values,
+ dv->name, dv->n_values);
+ goto error;
}
lex_match (lexer, T_SLASH);
}
- while (lex_token (lexer) != T_ENDCMD);
+ while (!lex_match (lexer, T_ENDCMD));
- return true;
-}
+ while (lex_match (lexer, T_ENDCMD))
+ continue;
-/* Finds and returns a DO REPEAT macro with the given NAME, or
- NULL if there is none */
-static struct repeat_macro *
-find_macro (struct repeat_block *block, struct substring name)
-{
- struct repeat_macro *macro;
-
- ll_for_each (macro, struct repeat_macro, ll, &block->macros)
- if (ss_equals (macro->name, name))
- return macro;
+ return true;
- return NULL;
+error:
+ destroy_dummies (dummies);
+ return false;
}
-/* Advances LINE past white space and an identifier, if present.
- Returns true if KEYWORD matches the identifer, false
- otherwise. */
-static bool
-recognize_keyword (struct substring *line, const char *keyword)
+static size_t
+count_values (struct hmap *dummies)
{
- struct substring id;
- ss_ltrim (line, ss_cstr (CC_SPACES));
- ss_get_bytes (line, lex_id_get_length (*line), &id);
- return lex_id_match (ss_cstr (keyword), id);
+ const struct dummy_var *dv;
+ dv = HMAP_FIRST (struct dummy_var, hmap_node, dummies);
+ return dv->n_values;
}
-/* Returns true if LINE contains a DO REPEAT command, false
- otherwise. */
-static bool
-recognize_do_repeat (struct substring line)
+static void
+do_parse_commands (struct substring s, enum lex_syntax_mode syntax_mode,
+ struct hmap *dummies,
+ struct string *outputs, size_t n_outputs)
{
- return (recognize_keyword (&line, "do")
- && recognize_keyword (&line, "repeat"));
-}
+ struct segmenter segmenter;
-/* Returns true if LINE contains an END REPEAT command, false
- otherwise. Sets *PRINT to true for END REPEAT PRINT, false
- otherwise. */
-static bool
-recognize_end_repeat (struct substring line, bool *print)
-{
- if (!recognize_keyword (&line, "end")
- || !recognize_keyword (&line, "repeat"))
- return false;
+ segmenter_init (&segmenter, syntax_mode);
- *print = recognize_keyword (&line, "print");
- return true;
-}
+ while (!ss_is_empty (s))
+ {
+ enum segment_type type;
+ int n;
-/* Read all the lines we are going to substitute, inside the DO
- REPEAT...END REPEAT block. */
-static bool
-parse_lines (struct lexer *lexer, struct repeat_block *block)
-{
- char *previous_file_name;
- int nesting_level;
+ n = segmenter_push (&segmenter, s.string, s.length, &type);
+ assert (n >= 0);
- previous_file_name = NULL;
- nesting_level = 0;
+ if (type == SEG_DO_REPEAT_COMMAND)
+ {
+ for (;;)
+ {
+ int k;
- for (;;)
- {
- const char *cur_file_name;
- struct repeat_line *line;
- struct string text;
- bool command_ends_before_line, command_ends_after_line;
+ k = segmenter_push (&segmenter, s.string + n, s.length - n,
+ &type);
+ if (type != SEG_NEWLINE && type != SEG_DO_REPEAT_COMMAND)
+ break;
- /* Retrieve an input line and make a copy of it. */
- if (!lex_get_line_raw (lexer))
- {
- msg (SE, _("DO REPEAT without END REPEAT."));
- return false;
- }
- ds_init_string (&text, lex_entire_line_ds (lexer));
-
- /* Record file name. */
- cur_file_name = getl_source_name (lex_get_source_stream (lexer));
- if (cur_file_name != NULL &&
- (previous_file_name == NULL
- || !strcmp (cur_file_name, previous_file_name)))
- previous_file_name = pool_strdup (block->pool, cur_file_name);
-
- /* Create a line structure. */
- line = pool_alloc (block->pool, sizeof *line);
- line->file_name = previous_file_name;
- line->line_number = getl_source_location (lex_get_source_stream (lexer));
- ss_alloc_substring_pool (&line->text, ds_ss (&text), block->pool);
-
-
- /* Check whether the line contains a DO REPEAT or END
- REPEAT command. */
- lex_preprocess_line (&text,
- lex_current_syntax_mode (lexer),
- &command_ends_before_line,
- &command_ends_after_line);
- if (recognize_do_repeat (ds_ss (&text)))
- {
- if (settings_get_syntax () == COMPATIBLE)
- msg (SE, _("DO REPEAT may not nest in compatibility mode."));
- else
- nesting_level++;
+ n += k;
+ }
+
+ do_parse_commands (ss_head (s, n), syntax_mode, dummies,
+ outputs, n_outputs);
}
- else if (recognize_end_repeat (ds_ss (&text), &block->print)
- && nesting_level-- == 0)
+ else if (type != SEG_END)
{
- lex_discard_line (lexer);
- ds_destroy (&text);
- return true;
+ const struct dummy_var *dv;
+ size_t i;
+
+ dv = (type == SEG_IDENTIFIER
+ ? find_dummy_var (dummies, s.string, n)
+ : NULL);
+ for (i = 0; i < n_outputs; i++)
+ if (dv != NULL)
+ ds_put_cstr (&outputs[i], dv->values[i]);
+ else
+ ds_put_substring (&outputs[i], ss_head (s, n));
}
- ds_destroy (&text);
- /* Add the line to the list. */
- ll_push_tail (&block->lines, &line->ll);
+ ss_advance (&s, n);
}
}
-/* Creates variables for the given DO REPEAT. */
+static bool
+parse_commands (struct lexer *lexer, struct hmap *dummies)
+{
+ struct string *outputs;
+ struct string input;
+ size_t input_len;
+ size_t n_values;
+ char *file_name;
+ int line_number;
+ bool ok;
+ size_t i;
+
+ if (lex_get_file_name (lexer) != NULL)
+ file_name = xstrdup (lex_get_file_name (lexer));
+ else
+ file_name = NULL;
+ line_number = lex_get_first_line_number (lexer, 0);
+
+ ds_init_empty (&input);
+ while (lex_is_string (lexer))
+ {
+ ds_put_substring (&input, lex_tokss (lexer));
+ ds_put_byte (&input, '\n');
+ lex_get (lexer);
+ }
+ if (ds_is_empty (&input))
+ ds_put_byte (&input, '\n');
+ ds_put_byte (&input, '\0');
+ input_len = ds_length (&input);
+
+ n_values = count_values (dummies);
+ outputs = xmalloc (n_values * sizeof *outputs);
+ for (i = 0; i < n_values; i++)
+ ds_init_empty (&outputs[i]);
+
+ do_parse_commands (ds_ss (&input), lex_get_syntax_mode (lexer),
+ dummies, outputs, n_values);
+
+ ds_destroy (&input);
+
+ while (lex_match (lexer, T_ENDCMD))
+ continue;
+
+ ok = (lex_force_match_id (lexer, "END")
+ && lex_force_match_id (lexer, "REPEAT"));
+ if (ok)
+ lex_match_id (lexer, "PRINT"); /* XXX */
+
+ lex_discard_rest_of_command (lexer);
+
+ for (i = 0; i < n_values; i++)
+ {
+ struct string *output = &outputs[n_values - i - 1];
+ struct lex_reader *reader;
+
+ reader = lex_reader_for_substring_nocopy (ds_ss (output));
+ lex_reader_set_file_name (reader, file_name);
+ reader->line_number = line_number;
+ lex_include (lexer, reader);
+ }
+ free (file_name);
+ free (outputs);
+
+ return ok;
+}
+
static void
-create_vars (struct repeat_block *block)
+destroy_dummies (struct hmap *dummies)
{
- struct repeat_macro *macro;
-
- ll_for_each (macro, struct repeat_macro, ll, &block->macros)
- if (macro->type == VAR_NAMES)
- {
- int i;
-
- for (i = 0; i < block->loop_cnt; i++)
- {
- /* Ignore return value: if the variable already
- exists there is no harm done. */
- char *var_name = ss_xstrdup (macro->replacements[i]);
- dict_create_var (dataset_dict (block->ds), var_name, 0);
- free (var_name);
- }
- }
+ struct dummy_var *dv, *next;
+
+ HMAP_FOR_EACH_SAFE (dv, next, struct dummy_var, hmap_node, dummies)
+ {
+ size_t i;
+
+ hmap_delete (dummies, &dv->hmap_node);
+
+ free (dv->name);
+ for (i = 0; i < dv->n_values; i++)
+ free (dv->values[i]);
+ free (dv->values);
+ free (dv);
+ }
+ hmap_destroy (dummies);
}
/* Parses a set of ids for DO REPEAT. */
-static int
+static bool
parse_ids (struct lexer *lexer, const struct dictionary *dict,
- struct repeat_macro *macro, struct pool *pool)
+ struct dummy_var *dv)
{
- char **replacements;
- size_t n, i;
-
- macro->type = VAR_NAMES;
- if (!parse_mixed_vars_pool (lexer, dict, pool, &replacements, &n, PV_NONE))
- return 0;
-
- macro->replacements = pool_nalloc (pool, n, sizeof *macro->replacements);
- for (i = 0; i < n; i++)
- macro->replacements[i] = ss_cstr (replacements[i]);
- return n;
+ return parse_mixed_vars (lexer, dict, &dv->values, &dv->n_values, PV_NONE);
}
/* Adds VALUE to DV's list of values, which has DV->n_values
elements and has room for *ALLOCATED. Takes ownership of
VALUE. */
static void
-add_replacement (struct substring replacement,
- struct repeat_macro *macro, struct pool *pool,
- size_t *used, size_t *allocated)
+add_replacement (struct dummy_var *dv, char *value, size_t *allocated)
{
- if (*used == *allocated)
- macro->replacements = pool_2nrealloc (pool, macro->replacements, allocated,
- sizeof *macro->replacements);
- macro->replacements[(*used)++] = replacement;
+ if (dv->n_values == *allocated)
+ dv->values = x2nrealloc (dv->values, allocated, sizeof *dv->values);
+ dv->values[dv->n_values++] = value;
}
/* Parses a list or range of numbers for DO REPEAT. */
-static int
-parse_numbers (struct lexer *lexer, struct repeat_macro *macro,
- struct pool *pool)
+static bool
+parse_numbers (struct lexer *lexer, struct dummy_var *dv)
{
- size_t used = 0;
size_t allocated = 0;
- macro->type = OTHER;
- macro->replacements = NULL;
-
do
{
- bool integer_value_seen;
- double a, b, i;
-
- /* Parse A TO B into a, b. */
if (!lex_force_num (lexer))
- return 0;
+ return false;
- if ( (integer_value_seen = lex_is_integer (lexer) ) )
- a = lex_integer (lexer);
- else
- a = lex_number (lexer);
+ if (lex_next_token (lexer, 1) == T_TO)
+ {
+ long int a, b;
+ long int i;
- lex_get (lexer);
- if (lex_token (lexer) == T_TO)
- {
- if ( !integer_value_seen )
+ if (!lex_is_integer (lexer))
{
- msg (SE, _("Ranges may only have integer bounds"));
- return 0;
+ msg (SE, _("Ranges may only have integer bounds."));
+ return false;
}
- lex_get (lexer);
- if (!lex_force_int (lexer))
- return 0;
+
+ a = lex_integer (lexer);
+ lex_get (lexer);
+ lex_get (lexer);
+
+ if (!lex_force_int (lexer))
+ return false;
+
b = lex_integer (lexer);
if (b < a)
{
- msg (SE, _("%g TO %g is an invalid range."), a, b);
- return 0;
+ msg (SE, _("%ld TO %ld is an invalid range."), a, b);
+ return false;
}
lex_get (lexer);
- }
+
+ for (i = a; i <= b; i++)
+ add_replacement (dv, xasprintf ("%ld", i), &allocated);
+ }
else
- b = a;
+ {
+ char s[DBL_BUFSIZE_BOUND];
- for (i = a; i <= b; i++)
- add_replacement (ss_cstr (pool_asprintf (pool, "%g", i)),
- macro, pool, &used, &allocated);
+ dtoastr (s, sizeof s, 0, 0, lex_number (lexer));
+ add_replacement (dv, xstrdup (s), &allocated);
+ lex_get (lexer);
+ }
lex_match (lexer, T_COMMA);
}
while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD);
- return used;
+ return true;
}
/* Parses a list of strings for DO REPEAT. */
-int
-parse_strings (struct lexer *lexer, struct repeat_macro *macro, struct pool *pool)
+static bool
+parse_strings (struct lexer *lexer, struct dummy_var *dv)
{
- size_t used = 0;
size_t allocated = 0;
- macro->type = OTHER;
- macro->replacements = NULL;
-
do
{
- char *string;
-
if (!lex_force_string (lexer))
{
msg (SE, _("String expected."));
- return 0;
+ return false;
}
- string = lex_token_representation (lexer);
- pool_register (pool, free, string);
- add_replacement (ss_cstr (string), macro, pool, &used, &allocated);
+ add_replacement (dv, token_to_string (lex_next (lexer, 0)), &allocated);
lex_get (lexer);
lex_match (lexer, T_COMMA);
}
while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD);
- return used;
+ return true;
}
\f
int
msg (SE, _("No matching DO REPEAT."));
return CMD_CASCADING_FAILURE;
}
-\f
-/* Finds a DO REPEAT macro with the given NAME and returns the
- appropriate substitution if found, or NAME otherwise. */
-static struct substring
-find_substitution (struct repeat_block *block, struct substring name)
-{
- struct repeat_macro *macro = find_macro (block, name);
- return macro ? macro->replacements[block->loop_idx] : name;
-}
-
-/* Makes appropriate DO REPEAT macro substitutions within the
- repeated lines. */
-static void
-do_repeat_filter (struct getl_interface *interface, struct string *line)
-{
- struct repeat_block *block
- = UP_CAST (interface, struct repeat_block, parent);
- bool in_apos, in_quote, dot;
- struct substring input;
- struct string output;
- int c;
-
- ds_init_empty (&output);
-
- /* Strip trailing whitespace, check for & remove terminal dot. */
- ds_rtrim (line, ss_cstr (CC_SPACES));
- dot = ds_chomp_byte (line, '.');
- input = ds_ss (line);
- in_apos = in_quote = false;
- while ((c = ss_first (input)) != EOF)
- {
- if (c == '\'' && !in_quote)
- in_apos = !in_apos;
- else if (c == '"' && !in_apos)
- in_quote = !in_quote;
-
- if (in_quote || in_apos || !lex_is_id1 (c))
- {
- ds_put_byte (&output, c);
- ss_advance (&input, 1);
- }
- else
- {
- struct substring id;
- ss_get_bytes (&input, lex_id_get_length (input), &id);
- ds_put_substring (&output, find_substitution (block, id));
- }
- }
- if (dot)
- ds_put_byte (&output, '.');
-
- ds_swap (line, &output);
- ds_destroy (&output);
-}
-
-static struct repeat_line *
-current_line (const struct getl_interface *interface)
-{
- struct repeat_block *block
- = UP_CAST (interface, struct repeat_block, parent);
- return (block->cur_line != ll_null (&block->lines)
- ? ll_data (block->cur_line, struct repeat_line, ll)
- : NULL);
-}
-
-/* Function called by getl to read a line. Puts the line in
- OUTPUT and its syntax mode in *SYNTAX. Returns true if a line
- was obtained, false if the source is exhausted. */
-static bool
-do_repeat_read (struct getl_interface *interface,
- struct string *output)
-{
- struct repeat_block *block
- = UP_CAST (interface, struct repeat_block, parent);
- struct repeat_line *line;
-
- block->cur_line = ll_next (block->cur_line);
- if (block->cur_line == ll_null (&block->lines))
- {
- block->loop_idx++;
- if (block->loop_idx >= block->loop_cnt)
- return false;
-
- block->cur_line = ll_head (&block->lines);
- }
-
- line = current_line (interface);
- ds_assign_substring (output, line->text);
- return true;
-}
-
-/* Frees a DO REPEAT block.
- Called by getl to close out the DO REPEAT block. */
-static void
-do_repeat_close (struct getl_interface *interface)
-{
- struct repeat_block *block
- = UP_CAST (interface, struct repeat_block, parent);
- pool_destroy (block->pool);
-}
-
-
-static bool
-always_false (const struct getl_interface *i UNUSED)
-{
- return false;
-}
-
-/* Returns the name of the source file from which the previous
- line was originally obtained, or a null pointer if none. */
-static const char *
-do_repeat_name (const struct getl_interface *interface)
-{
- struct repeat_line *line = current_line (interface);
- return line ? line->file_name : NULL;
-}
-
-/* Returns the line number in the source file from which the
- previous line was originally obtained, or 0 if none. */
-static int
-do_repeat_location (const struct getl_interface *interface)
-{
- struct repeat_line *line = current_line (interface);
- return line ? line->line_number : 0;
-}
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#if !INCLUDED_REPEAT_H
-#define INCLUDED_REPEAT_H 1
-
-void perform_DO_REPEAT_substitutions (void);
-
-#endif /* repeat.h */
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2011 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
/* Parses the TEMPORARY command. */
int
-cmd_temporary (struct lexer *lexer, struct dataset *ds)
+cmd_temporary (struct lexer *lexer UNUSED, struct dataset *ds)
{
if (!proc_in_temporary_transformations (ds))
proc_start_temporary_transformations (ds);
else
msg (SE, _("This command may only appear once between "
"procedures and procedure-like commands."));
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
#include "language/stats/sort-criteria.h"
#include "libpspp/assertion.h"
#include "libpspp/message.h"
+#include "libpspp/string-array.h"
#include "libpspp/taint.h"
#include "math/sort.h"
merge_dictionary (struct dictionary *const m, struct comb_file *f)
{
struct dictionary *d = f->dict;
- const char *d_docs, *m_docs;
+ const struct string_array *d_docs, *m_docs;
int i;
const char *file_encoding;
dict_set_documents (m, d_docs);
else
{
- char *new_docs = xasprintf ("%s%s", m_docs, d_docs);
- dict_set_documents (m, new_docs);
- free (new_docs);
+ struct string_array new_docs;
+ size_t i;
+
+ new_docs.n = m_docs->n + d_docs->n;
+ new_docs.strings = xmalloc (new_docs.n * sizeof *new_docs.strings);
+ for (i = 0; i < m_docs->n; i++)
+ new_docs.strings[i] = m_docs->strings[i];
+ for (i = 0; i < d_docs->n; i++)
+ new_docs.strings[m_docs->n + i] = d_docs->strings[i];
+
+ dict_set_documents (m, &new_docs);
+
+ free (new_docs.strings);
}
}
if (var_has_missing_values (dv) && !var_has_missing_values (mv))
var_set_missing_values (mv, var_get_missing_values (dv));
if (var_get_label (dv) && !var_get_label (mv))
- var_set_label (mv, var_get_label (dv));
+ var_set_label (mv, var_get_label (dv), file_encoding, false);
}
else
mv = dict_clone_var_assert (m, dv);
}
else
{
+ /* XXX should support multibyte UTF-8 characters */
lex_error (lexer, NULL);
ds_destroy (&delims);
goto error;
/* Parse everything. */
if (!parse_record_placement (lexer, &record, &column)
- || !parse_DATA_LIST_vars_pool (lexer, tmp_pool,
+ || !parse_DATA_LIST_vars_pool (lexer, dict, tmp_pool,
&names, &name_cnt, PV_NONE)
|| !parse_var_placements (lexer, tmp_pool, name_cnt, true,
&formats, &format_cnt))
size_t name_cnt;
size_t i;
- if (!parse_DATA_LIST_vars_pool (lexer, tmp_pool,
+ if (!parse_DATA_LIST_vars_pool (lexer, dict, tmp_pool,
&name, &name_cnt, PV_NONE))
return false;
m.category = MSG_C_DATA;
m.severity = MSG_S_WARNING;
- m.where.file_name = CONST_CAST (char *, dfm_get_file_name (reader));
- m.where.line_number = dfm_get_line_number (reader);
- m.where.first_column = first_column;
- m.where.last_column = last_column;
+ m.file_name = CONST_CAST (char *, dfm_get_file_name (reader));
+ m.first_line = dfm_get_line_number (reader);
+ m.last_line = m.first_line + 1;
+ m.first_column = first_column;
+ m.last_column = last_column;
m.text = xasprintf (_("Data for variable %s is not valid as format %s: %s"),
field->name, fmt_name (field->format.type), error);
msg_emit (&m);
#include "language/command.h"
#include "language/data-io/file-handle.h"
#include "language/lexer/lexer.h"
-#include "language/prompt.h"
#include "libpspp/assertion.h"
#include "libpspp/cast.h"
#include "libpspp/integer-format.h"
DFM_SAW_BEGIN_DATA = 004, /* For inline_file only, whether we've
already read a BEGIN DATA line. */
DFM_TABS_EXPANDED = 010, /* Tabs have been expanded. */
+ DFM_CONSUME = 020 /* read_inline_record() should get a token? */
};
/* Data file reader. */
{
struct file_handle *fh; /* File handle. */
struct fh_lock *lock; /* Mutual exclusion lock for file. */
- struct msg_locator where; /* Current location in data file. */
+ int line_number; /* Current line or record number. */
struct string line; /* Current line. */
struct string scratch; /* Extra line buffer. */
enum dfm_reader_flags flags; /* Zero or more of DFM_*. */
if (fh_get_referent (fh) != FH_REF_INLINE)
{
struct stat s;
- r->where.file_name = CONST_CAST (char *, fh_get_file_name (fh));
- r->where.line_number = 0;
+ r->line_number = 0;
r->file = fn_open (fh_get_file_name (fh), "rb");
if (r->file == NULL)
{
if ((r->flags & DFM_SAW_BEGIN_DATA) == 0)
{
r->flags |= DFM_SAW_BEGIN_DATA;
+ r->flags &= ~DFM_CONSUME;
while (lex_token (r->lexer) == T_ENDCMD)
lex_get (r->lexer);
- if (!lex_force_match_id (r->lexer, "BEGIN") || !lex_force_match_id (r->lexer, "DATA"))
+
+ if (!lex_force_match_id (r->lexer, "BEGIN")
+ || !lex_force_match_id (r->lexer, "DATA"))
return false;
- prompt_set_style (PROMPT_DATA);
- }
- if (!lex_get_line_raw (r->lexer))
- {
- lex_discard_line (r->lexer);
- msg (SE, _("Unexpected end-of-file while reading data in BEGIN "
- "DATA. This probably indicates "
- "a missing or incorrectly formatted END DATA command. "
- "END DATA must appear by itself on a single line "
- "with exactly one space between words."));
- return false;
+ lex_match (r->lexer, T_ENDCMD);
}
- if (ds_length (lex_entire_line_ds (r->lexer) ) >= 8
- && !strncasecmp (lex_entire_line (r->lexer), "end data", 8))
+ if (r->flags & DFM_CONSUME)
+ lex_get (r->lexer);
+
+ if (!lex_is_string (r->lexer))
{
- lex_discard_line (r->lexer);
+ if (!lex_match_id (r->lexer, "END") || !lex_match_id (r->lexer, "DATA"))
+ {
+ msg (SE, _("Missing END DATA while reading inline data. "
+ "This probably indicates a missing or incorrectly "
+ "formatted END DATA command. END DATA must appear "
+ "by itself on a single line with exactly one space "
+ "between words."));
+ lex_discard_rest_of_command (r->lexer);
+ }
return false;
}
- ds_assign_string (&r->line, lex_entire_line_ds (r->lexer) );
+ ds_assign_substring (&r->line, lex_tokss (r->lexer));
+ r->flags |= DFM_CONSUME;
return true;
}
{
bool ok = read_file_record (r);
if (ok)
- r->where.line_number++;
+ r->line_number++;
return ok;
}
else
const char *
dfm_get_file_name (const struct dfm_reader *r)
{
- return fh_get_referent (r->fh) == FH_REF_FILE ? r->where.file_name : NULL;
+ return (fh_get_referent (r->fh) == FH_REF_FILE
+ ? fh_get_file_name (r->fh)
+ : NULL);
}
int
dfm_get_line_number (const struct dfm_reader *r)
{
- return fh_get_referent (r->fh) == FH_REF_FILE ? r->where.line_number : -1;
+ return fh_get_referent (r->fh) == FH_REF_FILE ? r->line_number : -1;
}
\f
/* BEGIN DATA...END DATA procedure. */
"input program does not access the inline file."));
return CMD_CASCADING_FAILURE;
}
+ lex_match (lexer, T_ENDCMD);
/* Open inline file. */
r = dfm_open_reader (fh_inline_file (), lexer);
r->flags |= DFM_SAW_BEGIN_DATA;
+ r->flags &= ~DFM_CONSUME;
/* Input procedure reads from inline file. */
- prompt_set_style (PROMPT_DATA);
casereader_destroy (proc_open (ds));
ok = proc_commit (ds);
dfm_close_reader (r);
{
struct cmd_file_handle cmd;
struct file_handle *handle;
+ enum cmd_result result;
char *handle_name;
+ result = CMD_CASCADING_FAILURE;
if (!lex_force_id (lexer))
- goto error;
- handle_name = xstrdup (lex_tokcstr (lexer));
+ goto exit;
+ handle_name = xstrdup (lex_tokcstr (lexer));
handle = fh_from_id (handle_name);
if (handle != NULL)
{
msg (SE, _("File handle %s is already defined. "
"Use CLOSE FILE HANDLE before redefining a file handle."),
handle_name);
- goto error;
+ goto exit_free_handle_name;
}
lex_get (lexer);
if (!lex_force_match (lexer, T_SLASH))
- goto error_free_handle_name;
+ goto exit_free_handle_name;
if (!parse_file_handle (lexer, ds, &cmd, NULL))
- goto error_free_handle_name;
+ goto exit_free_handle_name;
if (lex_end_of_command (lexer) != CMD_SUCCESS)
- goto error_free_cmd;
+ goto exit_free_cmd;
if (cmd.mode != FH_SCRATCH)
{
if (cmd.s_name == NULL)
{
lex_sbc_missing (lexer, "NAME");
- goto error_free_cmd;
+ goto exit_free_cmd;
}
switch (cmd.mode)
else
{
msg (SE, _("RECFORM must be specified with MODE=360."));
- goto error_free_cmd;
+ goto exit_free_cmd;
}
break;
default:
else
fh_create_scratch (handle_name);
- free_file_handle (&cmd);
- return CMD_SUCCESS;
+ result = CMD_SUCCESS;
-error_free_cmd:
+exit_free_cmd:
free_file_handle (&cmd);
-error_free_handle_name:
+exit_free_handle_name:
free (handle_name);
-error:
- return CMD_CASCADING_FAILURE;
+exit:
+ return result;
}
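
The single-exit cleanup style that cmd_file_handle switches to above (one result variable, labels that fall through in reverse order of acquisition) can be sketched in isolation. This is a minimal standalone sketch, not PSPP code: demo_define_handle, demo_dup, demo_free, and demo_live_allocations are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum demo_result { DEMO_FAILURE, DEMO_SUCCESS };

/* Hypothetical counter so that leaks are observable in a test. */
static int demo_live_allocations;

/* Returns a malloc'd copy of S and counts it as live. */
static char *
demo_dup (const char *s)
{
  char *copy = malloc (strlen (s) + 1);
  if (copy == NULL)
    abort ();
  strcpy (copy, s);
  demo_live_allocations++;
  return copy;
}

/* Frees a copy obtained from demo_dup(). */
static void
demo_free (char *s)
{
  assert (demo_live_allocations > 0);
  demo_live_allocations--;
  free (s);
}

/* Starts out assuming failure and sets RESULT to success only on the
   path that reaches the end.  Every path leaves through the labels at
   the bottom, so each resource is released in exactly one place. */
static enum demo_result
demo_define_handle (const char *name, bool already_defined)
{
  enum demo_result result = DEMO_FAILURE;
  char *copy;

  if (name == NULL)
    goto exit;                  /* Nothing acquired yet. */

  copy = demo_dup (name);
  if (already_defined)
    goto exit_free_copy;        /* Release the copy, keep the failure. */

  result = DEMO_SUCCESS;

exit_free_copy:
  demo_free (copy);
exit:
  return result;
}
```

Because the success path also falls through the labels, the cleanup code is no longer duplicated before an early "return CMD_SUCCESS".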
int
#include "language/data-io/placement-parser.h"
#include "language/lexer/format-parser.h"
#include "language/lexer/lexer.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "gl/xalloc.h"
if (!lex_force_string (lexer))
goto error;
- gri.file_name = ss_xstrdup (lex_tokss (lexer));
+ gri.file_name = utf8_to_filename (lex_tokcstr (lexer));
lex_get (lexer);
if (!lex_force_string (lexer))
goto error;
+ /* XXX should support multibyte UTF-8 characters */
s = lex_tokss (lexer);
if (ss_match_string (&s, ss_cstr ("\\t")))
ds_put_cstr (&hard_seps, "\t");
if (!lex_force_string (lexer))
goto error;
+ /* XXX should support multibyte UTF-8 characters */
if (settings_get_syntax () == COMPATIBLE
&& ss_length (lex_tokss (lexer)) != 1)
{
lex_get (lexer);
}
- if (!lex_force_id (lexer))
+ if (!lex_force_id (lexer)
+ || !dict_id_is_valid (dict, lex_tokcstr (lexer), true))
goto error;
name = xstrdup (lex_tokcstr (lexer));
lex_get (lexer);
#include <config.h>
-
#include <float.h>
#include <stdlib.h>
/* Private result codes for use within INPUT PROGRAM. */
enum cmd_result_extensions
{
- CMD_END_INPUT_PROGRAM = CMD_PRIVATE_FIRST,
- CMD_END_CASE
+ CMD_END_CASE = CMD_PRIVATE_FIRST
};
/* Indicates how a `union value' should be initialized. */
bool saw_END_CASE = false;
proc_discard_active_file (ds);
- if (lex_token (lexer) != T_ENDCMD)
+ if (!lex_match (lexer, T_ENDCMD))
return lex_end_of_command (lexer);
inp = xmalloc (sizeof *inp);
inp->proto = NULL;
inside_input_program = true;
- for (;;)
+ while (!lex_match_phrase (lexer, "END INPUT PROGRAM"))
{
- enum cmd_result result = cmd_parse_in_state (lexer, ds, CMD_STATE_INPUT_PROGRAM);
- if (result == CMD_END_INPUT_PROGRAM)
- break;
- else if (result == CMD_END_CASE)
+ enum cmd_result result;
+
+ result = cmd_parse_in_state (lexer, ds, CMD_STATE_INPUT_PROGRAM);
+ if (result == CMD_END_CASE)
{
emit_END_CASE (ds, inp);
saw_END_CASE = true;
int
cmd_end_input_program (struct lexer *lexer UNUSED, struct dataset *ds UNUSED)
{
- assert (in_input_program ());
- return CMD_END_INPUT_PROGRAM;
+ /* Inside INPUT PROGRAM, this should get caught at the top of the loop in
+ cmd_input_program().
+
+ Outside of INPUT PROGRAM, the command parser should reject this
+ command. */
+ NOT_REACHED ();
}
/* Returns true if STATE is valid given the transformations that
assert (in_input_program ());
if (lex_token (lexer) == T_ENDCMD)
return CMD_END_CASE;
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Outputs the current case */
/* Parses END FILE command. */
int
-cmd_end_file (struct lexer *lexer, struct dataset *ds)
+cmd_end_file (struct lexer *lexer UNUSED, struct dataset *ds)
{
assert (in_input_program ());
add_transformation (ds, end_file_trns_proc, NULL, NULL);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Executes an END FILE transformation. */
lex_match (lexer, T_EQUALS);
if (!lex_force_string (lexer))
goto error;
+ /* XXX should support multibyte UTF-8 delimiters */
if (ss_length (lex_tokss (lexer)) != 1)
{
msg (SE, _("The %s string must contain exactly one "
lex_match (lexer, T_EQUALS);
if (!lex_force_string (lexer))
goto error;
+ /* XXX should support multibyte UTF-8 qualifiers */
if (ss_length (lex_tokss (lexer)) != 1)
{
msg (SE, _("The %s string must contain exactly one "
if (v == NULL)
return 0;
if (!lex_force_match (lexer, T_EQUALS)
- || !lex_force_id (lexer))
+ || !lex_force_id (lexer)
+ || !dict_id_is_valid (dict, lex_tokcstr (lexer), true))
return 0;
if (dict_lookup_var (dict, lex_tokcstr (lexer)) != NULL)
{
msg (SE, _("`=' expected after variable list."));
goto done;
}
- if (!parse_DATA_LIST_vars (lexer, &new_names, &nn,
+ if (!parse_DATA_LIST_vars (lexer, dict, &new_names, &nn,
PV_APPEND | PV_NO_SCRATCH | PV_NO_DUPLICATE))
goto done;
if (nn != nv)
continue;
}
- if (var_get_label (s))
- {
- const char *label = var_get_label (s);
- if (strcspn (label, " ") != strlen (label))
- var_set_label (t, label);
- }
+ if (var_has_label (s))
+ var_set_label (t, var_get_label (s),
+ dict_get_encoding (dataset_dict (ds)), false);
if (var_has_value_labels (s))
{
dict_set_weight (dataset_dict (ds), new_weight);
}
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
#include "gettext.h"
#define _(msgid) gettext (msgid)
-static enum cmd_result parse_attributes (struct lexer *, struct attrset **,
- size_t n);
+static enum cmd_result parse_attributes (struct lexer *,
+ const char *dict_encoding,
+ struct attrset **, size_t n);
/* Parses the DATAFILE ATTRIBUTE command. */
int
cmd_datafile_attribute (struct lexer *lexer, struct dataset *ds)
{
- struct attrset *set = dict_get_attributes (dataset_dict (ds));
- return parse_attributes (lexer, &set, 1);
+ struct dictionary *dict = dataset_dict (ds);
+ struct attrset *set = dict_get_attributes (dict);
+ return parse_attributes (lexer, dict_get_encoding (dict), &set, 1);
}
/* Parses the VARIABLE ATTRIBUTE command. */
int
cmd_variable_attribute (struct lexer *lexer, struct dataset *ds)
{
+ struct dictionary *dict = dataset_dict (ds);
+ const char *dict_encoding = dict_get_encoding (dict);
+
do
{
struct variable **vars;
if (!lex_force_match_id (lexer, "VARIABLES")
|| !lex_force_match (lexer, T_EQUALS)
- || !parse_variables (lexer, dataset_dict (ds), &vars, &n_vars,
- PV_NONE))
+ || !parse_variables (lexer, dict, &vars, &n_vars, PV_NONE))
return CMD_FAILURE;
sets = xmalloc (n_vars * sizeof *sets);
for (i = 0; i < n_vars; i++)
sets[i] = var_get_attributes (vars[i]);
- ok = parse_attributes (lexer, sets, n_vars);
+ ok = parse_attributes (lexer, dict_encoding, sets, n_vars);
free (vars);
free (sets);
if (!ok)
}
while (lex_match (lexer, T_SLASH));
- return lex_end_of_command (lexer);
-}
-
-static bool
-match_subcommand (struct lexer *lexer, const char *keyword)
-{
- if (lex_token (lexer) == T_ID
- && lex_id_match (lex_tokss (lexer), ss_cstr (keyword))
- && lex_look_ahead (lexer) == T_EQUALS)
- {
- lex_get (lexer); /* Skip keyword. */
- lex_get (lexer); /* Skip '='. */
- return true;
- }
- else
- return false;
+ return CMD_SUCCESS;
}
-/* Parses an attribute name optionally followed by an index inside square
- brackets. Returns the attribute name or NULL if there was a parse error.
- Stores the index into *INDEX. */
+/* Parses an attribute name and verifies that it is valid in DICT_ENCODING,
+ optionally followed by an index inside square brackets. Returns the
+ attribute name or NULL if there was a parse error. Stores the index into
+ *INDEX. */
static char *
-parse_attribute_name (struct lexer *lexer, size_t *index)
+parse_attribute_name (struct lexer *lexer, const char *dict_encoding,
+ size_t *index)
{
char *name;
- if (!lex_force_id (lexer))
+ if (!lex_force_id (lexer)
+ || !id_is_valid (lex_tokcstr (lexer), dict_encoding, true))
return NULL;
name = xstrdup (lex_tokcstr (lexer));
lex_get (lexer);
}
static bool
-add_attribute (struct lexer *lexer, struct attrset **sets, size_t n)
+add_attribute (struct lexer *lexer, const char *dict_encoding,
+ struct attrset **sets, size_t n)
{
const char *value;
size_t index, i;
char *name;
- name = parse_attribute_name (lexer, &index);
+ name = parse_attribute_name (lexer, dict_encoding, &index);
if (name == NULL)
return false;
if (!lex_force_match (lexer, T_LPAREN) || !lex_force_string (lexer))
}
static bool
-delete_attribute (struct lexer *lexer, struct attrset **sets, size_t n)
+delete_attribute (struct lexer *lexer, const char *dict_encoding,
+ struct attrset **sets, size_t n)
{
size_t index, i;
char *name;
- name = parse_attribute_name (lexer, &index);
+ name = parse_attribute_name (lexer, dict_encoding, &index);
if (name == NULL)
return false;
}
static enum cmd_result
-parse_attributes (struct lexer *lexer, struct attrset **sets, size_t n)
+parse_attributes (struct lexer *lexer, const char *dict_encoding,
+ struct attrset **sets, size_t n)
{
enum { UNKNOWN, ADD, DELETE } command = UNKNOWN;
do
{
- if (match_subcommand (lexer, "ATTRIBUTE"))
+ if (lex_match_phrase (lexer, "ATTRIBUTE="))
command = ADD;
- else if (match_subcommand (lexer, "DELETE"))
+ else if (lex_match_phrase (lexer, "DELETE="))
command = DELETE;
else if (command == UNKNOWN)
{
}
if (!(command == ADD
- ? add_attribute (lexer, sets, n)
- : delete_attribute (lexer, sets, n)))
+ ? add_attribute (lexer, dict_encoding, sets, n)
+ : delete_attribute (lexer, dict_encoding, sets, n)))
return CMD_FAILURE;
}
while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD);
#include <stdlib.h>
#include "data/data-in.h"
+#include "data/dictionary.h"
#include "data/format.h"
#include "data/missing-values.h"
#include "data/procedure.h"
#include "language/lexer/lexer.h"
#include "language/lexer/value-parser.h"
#include "language/lexer/variable-parser.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/str.h"
int
cmd_missing_values (struct lexer *lexer, struct dataset *ds)
{
+ struct dictionary *dict = dataset_dict (ds);
struct variable **v = NULL;
size_t nv;
- int retval = CMD_FAILURE;
- bool deferred_errors = false;
+ bool ok = true;
while (lex_token (lexer) != T_ENDCMD)
{
size_t i;
- if (!parse_variables (lexer, dataset_dict (ds), &v, &nv, PV_NONE))
- goto done;
+ if (!parse_variables (lexer, dict, &v, &nv, PV_NONE))
+ goto error;
if (!lex_force_match (lexer, T_LPAREN))
- goto done;
+ goto error;
for (i = 0; i < nv; i++)
var_clear_missing_values (v[i]);
msg (SE, _("Cannot mix numeric variables (e.g. %s) and "
"string variables (e.g. %s) within a single list."),
var_get_name (n), var_get_name (s));
- goto done;
+ goto error;
}
if (var_is_numeric (v[0]))
bool ok;
if (!parse_num_range (lexer, &x, &y, &type))
- goto done;
+ goto error;
ok = (x == y
? mv_add_num (&mv, x)
: mv_add_range (&mv, x, y));
if (!ok)
- deferred_errors = true;
+ ok = false;
lex_match (lexer, T_COMMA);
}
while (!lex_match (lexer, T_RPAREN))
{
uint8_t value[MV_MAX_STRING];
+ char *dict_mv;
size_t length;
if (!lex_force_string (lexer))
{
- deferred_errors = true;
+ ok = false;
break;
}
- length = ss_length (lex_tokss (lexer));
+ dict_mv = recode_string (dict_get_encoding (dict), "UTF-8",
+ lex_tokcstr (lexer),
+ ss_length (lex_tokss (lexer)));
+ length = strlen (dict_mv);
if (length > MV_MAX_STRING)
{
+ /* XXX truncate graphemes not bytes */
msg (SE, _("Truncating missing value to maximum "
"acceptable length (%d bytes)."),
MV_MAX_STRING);
length = MV_MAX_STRING;
}
memset (value, ' ', MV_MAX_STRING);
- memcpy (value, ss_data (lex_tokss (lexer)), length);
+ memcpy (value, dict_mv, length);
+ free (dict_mv);
if (!mv_add_str (&mv, value))
- deferred_errors = true;
+ ok = false;
lex_get (lexer);
lex_match (lexer, T_COMMA);
msg (SE, _("Missing values provided are too long to assign "
"to variable of width %d."),
var_get_width (v[i]));
- deferred_errors = true;
+ ok = false;
}
}
free (v);
v = NULL;
}
- retval = lex_end_of_command (lexer);
- done:
free (v);
- if (deferred_errors)
- retval = CMD_FAILURE;
- return retval;
+ return ok ? CMD_SUCCESS : CMD_FAILURE;
+
+error:
+ free (v);
+ return CMD_FAILURE;
}
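
The string branch of cmd_missing_values above pads a recoded value with spaces and truncates it by bytes when it is too long. A minimal standalone sketch of that padding step follows, assuming a hypothetical DEMO_MAX_WIDTH in place of MV_MAX_STRING and a hypothetical helper demo_fill_fixed; it does not perform any recoding and, like the code marked XXX above, it may cut a multibyte character in half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_WIDTH 8        /* Hypothetical stand-in for MV_MAX_STRING. */

/* Copies at most DEMO_MAX_WIDTH bytes of S into VALUE and pads the
   remainder with spaces.  VALUE is not null-terminated.  Returns true
   if S was longer than DEMO_MAX_WIDTH bytes and had to be truncated. */
static bool
demo_fill_fixed (unsigned char value[DEMO_MAX_WIDTH], const char *s)
{
  size_t length;
  bool truncated;

  assert (s != NULL);

  length = strlen (s);
  truncated = length > DEMO_MAX_WIDTH;
  if (truncated)
    length = DEMO_MAX_WIDTH;    /* Truncates bytes, not graphemes. */

  memset (value, ' ', DEMO_MAX_WIDTH);
  memcpy (value, s, length);
  return truncated;
}
```

The length has to be measured after recoding into the dictionary's encoding, since the same string may occupy a different number of bytes in UTF-8.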
"names on RENAME subcommand."));
goto done;
}
- if (!parse_DATA_LIST_vars (lexer, &vm.new_names,
- &prev_nv_1, PV_APPEND))
+ if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds),
+ &vm.new_names, &prev_nv_1, PV_APPEND))
goto done;
if (prev_nv_1 != vm.rename_cnt)
{
/* PSPP - a program for statistical analysis.
- Copyright (C) 2010 Free Software Foundation, Inc.
+ Copyright (C) 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include "language/lexer/variable-parser.h"
#include "libpspp/assertion.h"
#include "libpspp/hmap.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/str.h"
#include "libpspp/stringi-map.h"
return CMD_FAILURE;
}
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
static bool
{
if (lex_match_id (lexer, "NAME"))
{
- if (!lex_force_match (lexer, T_EQUALS) || !lex_force_id (lexer))
+ if (!lex_force_match (lexer, T_EQUALS) || !lex_force_id (lexer)
+ || !mrset_is_valid_name (lex_tokcstr (lexer),
+ dict_get_encoding (dict), true))
goto error;
- if (lex_tokcstr (lexer)[0] != '$')
- {
- msg (SE, _("%s is not a valid name for a multiple response "
- "set. Multiple response set names must begin with "
- "`$'."), lex_tokcstr (lexer));
- goto error;
- }
free (mrset->name);
mrset->name = xstrdup (lex_tokcstr (lexer));
}
else if (lex_is_string (lexer))
{
- const char *s = lex_tokcstr (lexer);
- int width;
+ size_t width;
+ char *s;
+
+ s = recode_string (dict_get_encoding (dict), "UTF-8",
+ lex_tokcstr (lexer), -1);
+ width = strlen (s);
/* Trim off trailing spaces, but don't trim the string until
it's empty because a width of 0 is a numeric type. */
- width = strlen (s);
while (width > 1 && s[width - 1] == ' ')
width--;
value_init (&mrset->counted, width);
memcpy (value_str_rw (&mrset->counted, width), s, width);
mrset->width = width;
+
+ free (s);
}
else
{
be used. */
struct fmt_spec f;
- if (!parse_DATA_LIST_vars (lexer, &v, &nv, PV_NO_DUPLICATE))
+ if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds),
+ &v, &nv, PV_NO_DUPLICATE))
return CMD_FAILURE;
/* Get the optional format specification. */
}
while (lex_match (lexer, T_SLASH));
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
/* If we have an error at a point where cleanup is required,
flow-of-control comes here. */
do
{
- if (!parse_DATA_LIST_vars (lexer, &v, &nv, PV_NO_DUPLICATE))
+ if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds),
+ &v, &nv, PV_NO_DUPLICATE))
return CMD_FAILURE;
if (!lex_force_match (lexer, T_LPAREN)
}
while (lex_match (lexer, T_SLASH));
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
/* If we have an error at a point where cleanup is required,
flow-of-control comes here. */
var_set_leave (v[i], true);
free (v);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
msg (SE, _("`=' expected between lists of new and old variable names."));
goto lossage;
}
- if (!parse_DATA_LIST_vars (lexer, &rename_new_names, &prev_nv_1,
+ if (!parse_DATA_LIST_vars (lexer, dataset_dict (ds),
+ &rename_new_names, &prev_nv_1,
PV_APPEND | PV_NO_DUPLICATE))
goto lossage;
if (prev_nv_1 != rename_cnt)
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2009, 2011 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
free (v);
}
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Dumps out the values of all the split variables for the case C. */
#include "libpspp/array.h"
#include "libpspp/message.h"
#include "libpspp/misc.h"
+#include "libpspp/string-array.h"
#include "output/tab.h"
#include "gl/minmax.h"
dict_destroy (d);
fh_unref (h);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
\f
/* DISPLAY utility. */
if (lex_match_id (lexer, "VECTORS"))
{
display_vectors (dataset_dict(ds), sorted);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
else if (lex_match_id (lexer, "SCRATCH"))
{
flags);
}
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
static void
static void
display_documents (const struct dictionary *dict)
{
- const char *documents = dict_get_documents (dict);
+ const struct string_array *documents = dict_get_documents (dict);
- if (documents == NULL)
+ if (string_array_is_empty (documents))
tab_output_text (TAB_LEFT, _("The active file dictionary does not "
"contain any documents."));
else
{
- struct string line = DS_EMPTY_INITIALIZER;
size_t i;
tab_output_text (TAB_LEFT | TAT_TITLE,
_("Documents in the active file:"));
for (i = 0; i < dict_get_document_line_cnt (dict); i++)
- {
- dict_get_document_line (dict, i, &line);
- tab_output_text (TAB_LEFT | TAB_FIX, ds_cstr (&line));
- }
- ds_destroy (&line);
+ tab_output_text (TAB_LEFT | TAB_FIX, dict_get_document_line (dict, i));
}
}
#include <stdio.h>
#include <stdlib.h>
+#include "data/dictionary.h"
#include "data/procedure.h"
#include "data/value-labels.h"
#include "data/variable.h"
#include "language/lexer/lexer.h"
#include "language/lexer/value-parser.h"
#include "language/lexer/variable-parser.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/str.h"
static int do_value_labels (struct lexer *,
const struct dictionary *dict, bool);
static void erase_labels (struct variable **vars, size_t var_cnt);
-static int get_label (struct lexer *, struct variable **vars, size_t var_cnt);
+static int get_label (struct lexer *, struct variable **vars, size_t var_cnt,
+ const char *dict_encoding);
\f
/* Stubs. */
if (erase)
erase_labels (vars, var_cnt);
while (lex_token (lexer) != T_SLASH && lex_token (lexer) != T_ENDCMD)
- if (!get_label (lexer, vars, var_cnt))
+ if (!get_label (lexer, vars, var_cnt, dict_get_encoding (dict)))
goto lossage;
if (lex_token (lexer) != T_SLASH)
free (vars);
}
- if (parse_err)
- return CMD_FAILURE;
-
- return lex_end_of_command (lexer);
+ return parse_err ? CMD_FAILURE : CMD_SUCCESS;
lossage:
free (vars);
/* Parse all the labels for the VAR_CNT variables in VARS and add
the specified labels to those variables. */
static int
-get_label (struct lexer *lexer, struct variable **vars, size_t var_cnt)
+get_label (struct lexer *lexer, struct variable **vars, size_t var_cnt,
+ const char *dict_encoding)
{
/* Parse all the labels and add them to the variables. */
do
int width = var_get_width (vars[0]);
union value value;
struct string label;
+ size_t trunc_len;
size_t i;
/* Set value. */
ds_init_substring (&label, lex_tokss (lexer));
- if (ds_length (&label) > MAX_LABEL_LEN)
+ trunc_len = utf8_encoding_trunc_len (ds_cstr (&label), dict_encoding,
+ MAX_LABEL_LEN);
+ if (ds_length (&label) > trunc_len)
{
msg (SW, _("Truncating value label to %d bytes."), MAX_LABEL_LEN);
- ds_truncate (&label, MAX_LABEL_LEN);
+ ds_truncate (&label, trunc_len);
}
for (i = 0; i < var_cnt; i++)
#include <stdio.h>
#include <stdlib.h>
+#include "data/dictionary.h"
#include "data/procedure.h"
#include "data/variable.h"
#include "language/command.h"
#include "language/lexer/lexer.h"
#include "language/lexer/variable-parser.h"
#include "libpspp/message.h"
-#include "libpspp/str.h"
#include "gl/xalloc.h"
int
cmd_variable_labels (struct lexer *lexer, struct dataset *ds)
{
+ struct dictionary *dict = dataset_dict (ds);
+ const char *dict_encoding = dict_get_encoding (dict);
+
do
{
struct variable **v;
- struct string label;
size_t nv;
size_t i;
- if (!parse_variables (lexer, dataset_dict (ds), &v, &nv, PV_NONE))
+ if (!parse_variables (lexer, dict, &v, &nv, PV_NONE))
return CMD_FAILURE;
if (!lex_force_string (lexer))
return CMD_FAILURE;
}
- ds_init_substring (&label, lex_tokss (lexer));
- if (ds_length (&label) > 255)
- {
- msg (SW, _("Truncating variable label to 255 characters."));
- ds_truncate (&label, 255);
- }
for (i = 0; i < nv; i++)
- var_set_label (v[i], ds_cstr (&label));
- ds_destroy (&label);
+ var_set_label (v[i], lex_tokcstr (lexer), dict_encoding, i == 0);
lex_get (lexer);
while (lex_token (lexer) == T_SLASH)
size_t vector_cnt, vector_cap;
/* Get the name(s) of the new vector(s). */
- if (!lex_force_id (lexer))
+ if (!lex_force_id (lexer)
+ || !dict_id_is_valid (dict, lex_tokcstr (lexer), true))
return CMD_CASCADING_FAILURE;
vectors = NULL;
goto fail;
}
- /* Check that none of the variables exist and that
- their names are no more than VAR_NAME_LEN bytes
- long. */
+ /* Check that none of the variables exist and that their names are
+ not excessively long. */
for (i = 0; i < vector_cnt; i++)
{
int j;
for (j = 0; j < var_cnt; j++)
{
char *name = xasprintf ("%s%d", vectors[i], j + 1);
- if (strlen (name) > VAR_NAME_LEN)
+ if (!dict_id_is_valid (dict, name, true))
{
free (name);
- msg (SE, _("%s is too long for a variable name."), name);
goto fail;
}
if (dict_lookup_var (dict, name))
while (lex_match (lexer, T_SLASH));
pool_destroy (pool);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
fail:
pool_destroy (pool);
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2011 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
dict_set_weight (dict, v);
}
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
#include "language/lexer/variable-parser.h"
#include "libpspp/array.h"
#include "libpspp/assertion.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/misc.h"
#include "libpspp/pool.h"
time_t last_proc_time = time_of_last_procedure (e->ds);
struct tm *time;
char temp_buf[10];
+ struct substring s;
time = localtime (&last_proc_time);
sprintf (temp_buf, "%02d %s %02d", abs (time->tm_mday) % 100,
months[abs (time->tm_mon) % 12], abs (time->tm_year) % 100);
- return expr_allocate_string_buffer (e, temp_buf, strlen (temp_buf));
+ ss_alloc_substring (&s, ss_cstr (temp_buf));
+ return expr_allocate_string (e, s);
}
else if (lex_match_id (lexer, "$TRUE"))
return expr_allocate_boolean (e, 1.0);
switch (lex_token (lexer))
{
case T_ID:
- if (lex_look_ahead (lexer) == T_LPAREN)
+ if (lex_next_token (lexer, 1) == T_LPAREN)
{
/* An identifier followed by a left parenthesis may be
a vector element reference. If not, it's a function
case T_STRING:
{
- union any_node *node = expr_allocate_string_buffer (
- e, lex_tokcstr (lexer), ss_length (lex_tokss (lexer)));
+ const char *dict_encoding;
+ union any_node *node;
+ char *s;
+
+ dict_encoding = (e->ds != NULL
+ ? dict_get_encoding (dataset_dict (e->ds))
+ : "UTF-8");
+ s = recode_string (dict_encoding, "UTF-8", lex_tokcstr (lexer),
+ ss_length (lex_tokss (lexer)));
+ node = expr_allocate_string (e, ss_cstr (s));
+
lex_get (lexer);
return node;
}
for (;;)
{
if (lex_token (lexer) == T_ID
- && toupper (lex_look_ahead (lexer)) == 'T')
+ && lex_next_token (lexer, 1) == T_TO)
{
const struct variable **vars;
size_t var_cnt;
return n;
}
-union any_node *
-expr_allocate_string_buffer (struct expression *e,
- const char *string, size_t length)
-{
- union any_node *n = pool_alloc (e->expr_pool, sizeof n->string);
- n->type = OP_string;
- if (length > MAX_STRING)
- length = MAX_STRING;
- n->string.s = copy_string (e, string, length);
- return n;
-}
-
union any_node *
expr_allocate_string (struct expression *e, struct substring s)
{
union any_node *expr_allocate_boolean (struct expression *e, double);
union any_node *expr_allocate_integer (struct expression *e, int);
union any_node *expr_allocate_pos_int (struct expression *e, int);
-union any_node *expr_allocate_string_buffer (struct expression *e,
- const char *string, size_t length);
-union any_node *expr_allocate_string (struct expression *e,
- struct substring);
+union any_node *expr_allocate_string (struct expression *e, struct substring);
union any_node *expr_allocate_variable (struct expression *e,
const struct variable *);
union any_node *expr_allocate_format (struct expression *e,
language_lexer_sources = \
src/language/lexer/command-name.c \
src/language/lexer/command-name.h \
+ src/language/lexer/include-path.c \
+ src/language/lexer/include-path.h \
src/language/lexer/lexer.c \
src/language/lexer/lexer.h \
src/language/lexer/subcommand-list.c \
--- /dev/null
+/* PSPP - a program for statistical analysis.
+ Copyright (C) 2010 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "src/language/lexer/include-path.h"
+
+#include <stdlib.h>
+
+#include "data/file-name.h"
+#include "libpspp/string-array.h"
+
+#include "gl/configmake.h"
+#include "gl/relocatable.h"
+#include "gl/xvasprintf.h"
+
+static struct string_array the_include_path;
+static struct string_array default_include_path;
+
+static void include_path_init__ (void);
+
+void
+include_path_clear (void)
+{
+ include_path_init__ ();
+ string_array_clear (&the_include_path);
+}
+
+void
+include_path_add (const char *dir)
+{
+ include_path_init__ ();
+ string_array_append (&the_include_path, dir);
+}
+
+char *
+include_path_search (const char *base_name)
+{
+ return fn_search_path (base_name, include_path ());
+}
+
+const struct string_array *
+include_path_default (void)
+{
+ include_path_init__ ();
+ return &default_include_path;
+}
+
+char **
+include_path (void)
+{
+ include_path_init__ ();
+ string_array_terminate_null (&the_include_path);
+ return the_include_path.strings;
+}
+
+static void
+include_path_init__ (void)
+{
+ static bool inited;
+ char *home;
+
+ if (inited)
+ return;
+ inited = true;
+
+ string_array_init (&the_include_path);
+ string_array_append (&the_include_path, ".");
+ home = getenv ("HOME");
+ if (home != NULL)
+ string_array_append_nocopy (&the_include_path,
+ xasprintf ("%s/.pspp", home));
+ string_array_append (&the_include_path, relocate (PKGDATADIR));
+
+ string_array_clone (&default_include_path, &the_include_path);
+}
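
The accessors in include-path.c all call include_path_init__() first, so the default path is built lazily on first use. A guard of this kind only works if the flag is set to true once initialization has started. The sketch below shows the pattern in isolation; it is not PSPP code, and demo_path_init, demo_path_add, demo_path_count, and the fixed-size array are hypothetical stand-ins for the string_array helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the include path: a small fixed array. */
#define DEMO_MAX_DIRS 8
static const char *demo_dirs[DEMO_MAX_DIRS];
static size_t demo_n_dirs;
static int demo_init_runs;      /* Counts how often defaults were set up. */

/* Sets up the defaults the first time it is called.  Later calls
   return immediately because the guard flag is already true. */
static void
demo_path_init (void)
{
  static bool inited;

  if (inited)
    return;
  inited = true;

  demo_dirs[demo_n_dirs++] = ".";
  demo_init_runs++;
}

/* Appends DIR, initializing the defaults first if necessary. */
static void
demo_path_add (const char *dir)
{
  demo_path_init ();
  assert (demo_n_dirs < DEMO_MAX_DIRS);
  demo_dirs[demo_n_dirs++] = dir;
}

/* Returns the number of directories currently on the path. */
static size_t
demo_path_count (void)
{
  demo_path_init ();
  return demo_n_dirs;
}
```

If the flag were left false, every accessor call would rebuild the defaults and discard or duplicate earlier additions.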
--- /dev/null
+/* PSPP - a program for statistical analysis.
+ Copyright (C) 2010 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#ifndef INCLUDE_PATH_H
+#define INCLUDE_PATH_H 1
+
+struct string_array;
+
+void include_path_clear (void);
+void include_path_add (const char *dir);
+char *include_path_search (const char *base_name);
+
+const struct string_array *include_path_default (void);
+char **include_path (void);
+
+#endif /* include-path.h */
#include "language/lexer/lexer.h"
-#include <c-ctype.h>
-#include <c-strtod.h>
#include <errno.h>
+#include <fcntl.h>
#include <limits.h>
#include <math.h>
#include <stdarg.h>
-#include <stdint.h>
#include <stdlib.h>
+#include <string.h>
+#include <unictype.h>
+#include <unistd.h>
+#include <unistr.h>
+#include <uniwidth.h>
-#include "data/settings.h"
+#include "data/file-name.h"
#include "language/command.h"
+#include "language/lexer/scan.h"
+#include "language/lexer/segment.h"
+#include "language/lexer/token.h"
#include "libpspp/assertion.h"
-#include "libpspp/getl.h"
+#include "libpspp/cast.h"
+#include "libpspp/deque.h"
+#include "libpspp/i18n.h"
+#include "libpspp/ll.h"
#include "libpspp/message.h"
+#include "libpspp/misc.h"
#include "libpspp/str.h"
+#include "libpspp/u8-istream.h"
#include "output/journal.h"
#include "output/text-item.h"
+#include "gl/c-ctype.h"
+#include "gl/minmax.h"
#include "gl/xalloc.h"
+#include "gl/xmemdup0.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
#define N_(msgid) msgid
-struct lexer
-{
- struct string line_buffer;
-
- struct source_stream *ss;
-
- int token; /* Current token. */
- double tokval; /* T_POS_NUM, T_NEG_NUM: the token's value. */
-
- struct string tokstr; /* T_ID, T_STRING: token string value. */
-
- char *prog; /* Pointer to next token in line_buffer. */
- bool dot; /* True only if this line ends with a terminal dot. */
-
- int put_token ; /* If nonzero, next token returned by lex_get().
- Used only in exceptional circumstances. */
+/* A token within a lex_source. */
+struct lex_token
+ {
+ /* The regular token information. */
+ struct token token;
+
+ /* Location of token in terms of the lex_source's buffer.
+ src->tail <= line_pos <= token_pos <= src->head. */
+ size_t token_pos; /* Start of token. */
+ size_t token_len; /* Length of source for token in bytes. */
+ size_t line_pos; /* Start of line containing token_pos. */
+ int first_line; /* Line number at token_pos. */
+ };
- struct string put_tokstr;
- double put_tokval;
-};
+/* A source of tokens, corresponding to a syntax file.
+ This is conceptually a lex_reader wrapped with everything needed to convert
+ its UTF-8 bytes into tokens. */
+struct lex_source
+ {
+ struct ll ll; /* In lexer's list of sources. */
+ struct lex_reader *reader;
+ struct segmenter segmenter;
+ bool eof; /* True if T_STOP was read from 'reader'. */
+
+ /* Buffer of UTF-8 bytes. */
+ char *buffer;
+ size_t allocated; /* Number of bytes allocated. */
+ size_t tail; /* &buffer[0] offset into UTF-8 source. */
+ size_t head; /* &buffer[head - tail] offset into source. */
+
+ /* Positions in source file, tail <= pos <= head for each member here. */
+ size_t journal_pos; /* First byte not yet output to journal. */
+ size_t seg_pos; /* First byte not yet scanned as token. */
+ size_t line_pos; /* First byte of line containing seg_pos. */
+
+ int n_newlines; /* Number of new-lines up to seg_pos. */
+ bool suppress_next_newline;
+
+ /* Tokens. */
+ struct deque deque; /* Indexes into 'tokens'. */
+ struct lex_token *tokens; /* Lookahead tokens for parser. */
+ };
-static int parse_id (struct lexer *);
+static struct lex_source *lex_source_create (struct lex_reader *);
+static void lex_source_destroy (struct lex_source *);
-/* How a string represents its contents. */
-enum string_type
+/* Lexer. */
+struct lexer
{
- CHARACTER_STRING, /* Characters. */
- BINARY_STRING, /* Binary digits. */
- OCTAL_STRING, /* Octal digits. */
- HEX_STRING /* Hexadecimal digits. */
+ struct ll_list sources; /* Contains "struct lex_source"s. */
};
-static int parse_string (struct lexer *, enum string_type);
+static struct lex_source *lex_source__ (const struct lexer *);
+static const struct lex_token *lex_next__ (const struct lexer *, int n);
+static void lex_source_push_endcmd__ (struct lex_source *);
+
+static void lex_source_pop__ (struct lex_source *);
+static bool lex_source_get__ (const struct lex_source *);
+static void lex_source_error_valist (struct lex_source *, int n0, int n1,
+ const char *format, va_list)
+ PRINTF_FORMAT (4, 0);
+static const struct lex_token *lex_source_next__ (const struct lex_source *,
+ int n);
\f
-/* Initialization. */
-
-/* Initializes the lexer. */
-struct lexer *
-lex_create (struct source_stream *ss)
-{
- struct lexer *lexer = xzalloc (sizeof (*lexer));
-
- ds_init_empty (&lexer->tokstr);
- ds_init_empty (&lexer->put_tokstr);
- ds_init_empty (&lexer->line_buffer);
- lexer->ss = ss;
-
- return lexer;
-}
-
-struct source_stream *
-lex_get_source_stream (const struct lexer *lex)
+/* Initializes READER with the specified CLASS and otherwise some reasonable
+ defaults. The caller should fill in the other members as desired. */
+void
+lex_reader_init (struct lex_reader *reader,
+ const struct lex_reader_class *class)
{
- return lex->ss;
+ reader->class = class;
+ reader->syntax = LEX_SYNTAX_AUTO;
+ reader->error = LEX_ERROR_INTERACTIVE;
+ reader->file_name = NULL;
+ reader->line_number = 0;
}
-enum syntax_mode
-lex_current_syntax_mode (const struct lexer *lex)
+/* Frees any file name already in READER and replaces it by a copy of
+ FILE_NAME, or if FILE_NAME is null then clears any existing name. */
+void
+lex_reader_set_file_name (struct lex_reader *reader, const char *file_name)
{
- return source_stream_current_syntax_mode (lex->ss);
+ free (reader->file_name);
+ reader->file_name = file_name != NULL ? xstrdup (file_name) : NULL;
}
-
-enum error_mode
-lex_current_error_mode (const struct lexer *lex)
+\f
+/* Creates and returns a new lexer. */
+struct lexer *
+lex_create (void)
{
- return source_stream_current_error_mode (lex->ss);
+ struct lexer *lexer = xzalloc (sizeof *lexer);
+ ll_init (&lexer->sources);
+ return lexer;
}
-
+/* Destroys LEXER. */
void
lex_destroy (struct lexer *lexer)
{
- if ( NULL != lexer )
+ if (lexer != NULL)
{
- ds_destroy (&lexer->put_tokstr);
- ds_destroy (&lexer->tokstr);
- ds_destroy (&lexer->line_buffer);
+ struct lex_source *source, *next;
+ ll_for_each_safe (source, next, struct lex_source, ll, &lexer->sources)
+ lex_source_destroy (source);
free (lexer);
}
}
+/* Inserts READER into LEXER so that the next token read by LEXER comes from
+ READER. Before the call, LEXER must either be empty or at a T_ENDCMD
+ token. */
+void
+lex_include (struct lexer *lexer, struct lex_reader *reader)
+{
+ assert (ll_is_empty (&lexer->sources) || lex_token (lexer) == T_ENDCMD);
+ ll_push_head (&lexer->sources, &lex_source_create (reader)->ll);
+}
+
+/* Appends READER to LEXER, so that it will be read after all other current
+ readers have already been read. */
+void
+lex_append (struct lexer *lexer, struct lex_reader *reader)
+{
+ ll_push_tail (&lexer->sources, &lex_source_create (reader)->ll);
+}
\f
-/* Common functions. */
+/* Advancing. */
+
+static struct lex_token *
+lex_push_token__ (struct lex_source *src)
+{
+ struct lex_token *token;
+
+ if (deque_is_full (&src->deque))
+ src->tokens = deque_expand (&src->deque, src->tokens, sizeof *src->tokens);
+
+ token = &src->tokens[deque_push_front (&src->deque)];
+ token_init (&token->token);
+ return token;
+}
-/* Copies put_token, lexer->put_tokstr, put_tokval into token, tokstr,
- tokval, respectively, and sets tokid appropriately. */
static void
-restore_token (struct lexer *lexer)
+lex_source_pop__ (struct lex_source *src)
{
- assert (lexer->put_token != 0);
- lexer->token = lexer->put_token;
- ds_assign_string (&lexer->tokstr, &lexer->put_tokstr);
- lexer->tokval = lexer->put_tokval;
- lexer->put_token = 0;
+ token_destroy (&src->tokens[deque_pop_back (&src->deque)].token);
}
-/* Copies token, tokstr, lexer->tokval into lexer->put_token, put_tokstr,
- put_lexer->tokval respectively. */
static void
-save_token (struct lexer *lexer)
+lex_source_pop_front (struct lex_source *src)
{
- lexer->put_token = lexer->token;
- ds_assign_string (&lexer->put_tokstr, &lexer->tokstr);
- lexer->put_tokval = lexer->tokval;
+ token_destroy (&src->tokens[deque_pop_front (&src->deque)].token);
}
-/* Parses a single token, setting appropriate global variables to
- indicate the token's attributes. */
+/* Advances LEXER to the next token, consuming the current token. */
void
lex_get (struct lexer *lexer)
{
- /* Find a token. */
- for (;;)
- {
- if (NULL == lexer->prog && ! lex_get_line (lexer) )
- {
- lexer->token = T_STOP;
- return;
- }
-
- /* If a token was pushed ahead, return it. */
- if (lexer->put_token)
- {
- restore_token (lexer);
- return;
- }
+ struct lex_source *src;
- for (;;)
- {
- /* Skip whitespace. */
- while (c_isspace ((unsigned char) *lexer->prog))
- lexer->prog++;
-
- if (*lexer->prog)
- break;
-
- if (lexer->dot)
- {
- lexer->dot = 0;
- lexer->token = T_ENDCMD;
- return;
- }
- else if (!lex_get_line (lexer))
- {
- lexer->prog = NULL;
- lexer->token = T_STOP;
- return;
- }
-
- if (lexer->put_token)
- {
- restore_token (lexer);
- return;
- }
- }
-
-
- /* Actually parse the token. */
- ds_clear (&lexer->tokstr);
-
- switch (*lexer->prog)
- {
- case '-': case '.':
- case '0': case '1': case '2': case '3': case '4':
- case '5': case '6': case '7': case '8': case '9':
- {
- char *tail;
-
- /* `-' can introduce a negative number, or it can be a token by
- itself. */
- if (*lexer->prog == '-')
- {
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- while (c_isspace ((unsigned char) *lexer->prog))
- lexer->prog++;
-
- if (!c_isdigit ((unsigned char) *lexer->prog) && *lexer->prog != '.')
- {
- lexer->token = T_DASH;
- break;
- }
- lexer->token = T_NEG_NUM;
- }
- else
- lexer->token = T_POS_NUM;
-
- /* Parse the number, copying it into tokstr. */
- while (c_isdigit ((unsigned char) *lexer->prog))
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- if (*lexer->prog == '.')
- {
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- while (c_isdigit ((unsigned char) *lexer->prog))
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- }
- if (*lexer->prog == 'e' || *lexer->prog == 'E')
- {
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- if (*lexer->prog == '+' || *lexer->prog == '-')
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- while (c_isdigit ((unsigned char) *lexer->prog))
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- }
-
- /* Parse as floating point. */
- lexer->tokval = c_strtod (ds_cstr (&lexer->tokstr), &tail);
- if (*tail)
- {
- msg (SE, _("%s does not form a valid number."),
- ds_cstr (&lexer->tokstr));
- lexer->tokval = 0.0;
-
- ds_clear (&lexer->tokstr);
- ds_put_byte (&lexer->tokstr, '0');
- }
-
- break;
- }
-
- case '\'': case '"':
- lexer->token = parse_string (lexer, CHARACTER_STRING);
- break;
-
- case '+':
- lexer->token = T_PLUS;
- lexer->prog++;
- break;
-
- case '/':
- lexer->token = T_SLASH;
- lexer->prog++;
- break;
-
- case '=':
- lexer->token = T_EQUALS;
- lexer->prog++;
- break;
+ src = lex_source__ (lexer);
+ if (src == NULL)
+ return;
- case '(':
- lexer->token = T_LPAREN;
- lexer->prog++;
- break;
-
- case ')':
- lexer->token = T_RPAREN;
- lexer->prog++;
- break;
-
- case '[':
- lexer->token = T_LBRACK;
- lexer->prog++;
- break;
-
- case ']':
- lexer->token = T_RBRACK;
- lexer->prog++;
- break;
+ if (!deque_is_empty (&src->deque))
+ lex_source_pop__ (src);
- case ',':
- lexer->token = T_COMMA;
- lexer->prog++;
- break;
-
- case '*':
- if (*++lexer->prog == '*')
- {
- lexer->prog++;
- lexer->token = T_EXP;
- }
- else
- lexer->token = T_ASTERISK;
- break;
-
- case '<':
- if (*++lexer->prog == '=')
- {
- lexer->prog++;
- lexer->token = T_LE;
- }
- else if (*lexer->prog == '>')
- {
- lexer->prog++;
- lexer->token = T_NE;
- }
- else
- lexer->token = T_LT;
- break;
-
- case '>':
- if (*++lexer->prog == '=')
- {
- lexer->prog++;
- lexer->token = T_GE;
- }
- else
- lexer->token = T_GT;
- break;
-
- case '~':
- if (*++lexer->prog == '=')
- {
- lexer->prog++;
- lexer->token = T_NE;
- }
- else
- lexer->token = T_NOT;
- break;
-
- case '&':
- lexer->prog++;
- lexer->token = T_AND;
- break;
-
- case '|':
- lexer->prog++;
- lexer->token = T_OR;
- break;
-
- case 'b': case 'B':
- if (lexer->prog[1] == '\'' || lexer->prog[1] == '"')
- lexer->token = parse_string (lexer, BINARY_STRING);
- else
- lexer->token = parse_id (lexer);
- break;
+ while (deque_is_empty (&src->deque))
+ if (!lex_source_get__ (src))
+ {
+ lex_source_destroy (src);
+ src = lex_source__ (lexer);
+ if (src == NULL)
+ return;
+ }
+}
+\f
+/* Issuing errors. */
- case 'o': case 'O':
- if (lexer->prog[1] == '\'' || lexer->prog[1] == '"')
- lexer->token = parse_string (lexer, OCTAL_STRING);
- else
- lexer->token = parse_id (lexer);
- break;
+/* Prints a syntax error message containing the current token and
+ the message formatted from FORMAT (if non-null). */
+void
+lex_error (struct lexer *lexer, const char *format, ...)
+{
+ va_list args;
- case 'x': case 'X':
- if (lexer->prog[1] == '\'' || lexer->prog[1] == '"')
- lexer->token = parse_string (lexer, HEX_STRING);
- else
- lexer->token = parse_id (lexer);
- break;
+ va_start (args, format);
+ lex_next_error_valist (lexer, 0, 0, format, args);
+ va_end (args);
+}
- default:
- if (lex_is_id1 (*lexer->prog))
- {
- lexer->token = parse_id (lexer);
- break;
- }
- else
- {
- unsigned char c = *lexer->prog++;
- char *c_name = xasprintf (c_isgraph (c) ? "%c" : "\\%o", c);
- msg (SE, _("Bad character in input: `%s'."), c_name);
- free (c_name);
- continue;
- }
- }
- break;
- }
+/* Prints a syntax error message containing the current token and
+ the message formatted from FORMAT and ARGS (if FORMAT is non-null). */
+void
+lex_error_valist (struct lexer *lexer, const char *format, va_list args)
+{
+ lex_next_error_valist (lexer, 0, 0, format, args);
}
-/* Parses an identifier at the current position into tokstr.
- Returns the correct token type. */
-static int
-parse_id (struct lexer *lexer)
+/* Prints a syntax error message containing the current token and
+ the message formatted from FORMAT (if non-null). */
+void
+lex_next_error (struct lexer *lexer, int n0, int n1, const char *format, ...)
{
- struct substring rest_of_line
- = ss_substr (ds_ss (&lexer->line_buffer),
- ds_pointer_to_position (&lexer->line_buffer, lexer->prog),
- SIZE_MAX);
- struct substring id = ss_head (rest_of_line,
- lex_id_get_length (rest_of_line));
- lexer->prog += ss_length (id);
+ va_list args;
- ds_assign_substring (&lexer->tokstr, id);
- return lex_id_to_token (id);
+ va_start (args, format);
+ lex_next_error_valist (lexer, n0, n1, format, args);
+ va_end (args);
}
/* Reports an error to the effect that subcommand SBC may only be
/* Prints a syntax error message containing the current token and
given message MESSAGE (if non-null). */
void
-lex_error (struct lexer *lexer, const char *message, ...)
+lex_next_error_valist (struct lexer *lexer, int n0, int n1,
+ const char *format, va_list args)
{
- struct string s;
-
- ds_init_empty (&s);
+ struct lex_source *src = lex_source__ (lexer);
- if (lexer->token == T_STOP)
- ds_put_cstr (&s, _("Syntax error at end of file"));
- else if (lexer->token == T_ENDCMD)
- ds_put_cstr (&s, _("Syntax error at end of command"));
+ if (src != NULL)
+ lex_source_error_valist (src, n0, n1, format, args);
else
{
- char *token_rep = lex_token_representation (lexer);
- ds_put_format (&s, _("Syntax error at `%s'"), token_rep);
- free (token_rep);
- }
-
- if (message)
- {
- va_list args;
+ struct string s;
- ds_put_cstr (&s, ": ");
-
- va_start (args, message);
- ds_put_vformat (&s, message, args);
- va_end (args);
+ ds_init_empty (&s);
+ ds_put_format (&s, _("Syntax error at end of input"));
+ if (format != NULL)
+ {
+ ds_put_cstr (&s, ": ");
+ ds_put_vformat (&s, format, args);
+ }
+ ds_put_byte (&s, '.');
+ msg (SE, "%s", ds_cstr (&s));
+ ds_destroy (&s);
}
-
- msg (SE, "%s.", ds_cstr (&s));
- ds_destroy (&s);
}
/* Checks that we're at end of command.
int
lex_end_of_command (struct lexer *lexer)
{
- if (lexer->token != T_ENDCMD)
+ if (lex_token (lexer) != T_ENDCMD && lex_token (lexer) != T_STOP)
{
lex_error (lexer, _("expecting end of command"));
return CMD_FAILURE;
bool
lex_is_number (struct lexer *lexer)
{
- return lexer->token == T_POS_NUM || lexer->token == T_NEG_NUM;
+ return lex_next_is_number (lexer, 0);
}
-
/* Returns true if the current token is a string. */
bool
lex_is_string (struct lexer *lexer)
{
- return lexer->token == T_STRING;
+ return lex_next_is_string (lexer, 0);
}
-
/* Returns the value of the current token, which must be a
floating point number. */
double
lex_number (struct lexer *lexer)
{
- assert (lex_is_number (lexer));
- return lexer->tokval;
+ return lex_next_number (lexer, 0);
}
/* Returns true iff the current token is an integer. */
bool
lex_is_integer (struct lexer *lexer)
{
- return (lex_is_number (lexer)
- && lexer->tokval > LONG_MIN
- && lexer->tokval <= LONG_MAX
- && floor (lexer->tokval) == lexer->tokval);
+ return lex_next_is_integer (lexer, 0);
}
/* Returns the value of the current token, which must be an
long
lex_integer (struct lexer *lexer)
{
- assert (lex_is_integer (lexer));
- return lexer->tokval;
+ return lex_next_integer (lexer, 0);
+}
+\f
+/* Token testing functions with lookahead.
+
+ A value of 0 for N as an argument to any of these functions refers to the
+ current token. Lookahead is limited to the current command. Any N greater
+ than the number of tokens remaining in the current command will be treated
+ as referring to a T_ENDCMD token. */
+
+/* Returns true if the token N ahead of the current token is a number. */
+bool
+lex_next_is_number (struct lexer *lexer, int n)
+{
+ enum token_type next_token = lex_next_token (lexer, n);
+ return next_token == T_POS_NUM || next_token == T_NEG_NUM;
+}
+
+/* Returns true if the token N ahead of the current token is a string. */
+bool
+lex_next_is_string (struct lexer *lexer, int n)
+{
+ return lex_next_token (lexer, n) == T_STRING;
+}
+
+/* Returns the value of the token N ahead of the current token, which must be a
+ floating point number. */
+double
+lex_next_number (struct lexer *lexer, int n)
+{
+ assert (lex_next_is_number (lexer, n));
+ return lex_next_tokval (lexer, n);
+}
+
+/* Returns true if the token N ahead of the current token is an integer. */
+bool
+lex_next_is_integer (struct lexer *lexer, int n)
+{
+ double value;
+
+ if (!lex_next_is_number (lexer, n))
+ return false;
+
+ value = lex_next_tokval (lexer, n);
+ return value > LONG_MIN && value <= LONG_MAX && floor (value) == value;
+}
+
+/* Returns the value of the token N ahead of the current token, which must be
+ an integer. */
+long
+lex_next_integer (struct lexer *lexer, int n)
+{
+ assert (lex_next_is_integer (lexer, n));
+ return lex_next_tokval (lexer, n);
}
\f
/* Token matching functions. */
-/* If TOK is the current token, skips it and returns true
+/* If the current token has the specified TYPE, skips it and returns true.
Otherwise, returns false. */
bool
-lex_match (struct lexer *lexer, enum token_type t)
+lex_match (struct lexer *lexer, enum token_type type)
{
- if (lexer->token == t)
+ if (lex_token (lexer) == type)
{
lex_get (lexer);
return true;
return false;
}
-/* If the current token is the identifier S, skips it and returns
- true. The identifier may be abbreviated to its first three
- letters.
- Otherwise, returns false. */
+/* If the current token matches IDENTIFIER, skips it and returns true.
+ IDENTIFIER may be abbreviated to its first three letters. Otherwise,
+ returns false.
+
+ IDENTIFIER must be an ASCII string. */
bool
-lex_match_id (struct lexer *lexer, const char *s)
+lex_match_id (struct lexer *lexer, const char *identifier)
{
- return lex_match_id_n (lexer, s, 3);
+ return lex_match_id_n (lexer, identifier, 3);
}
-/* If the current token is the identifier S, skips it and returns
- true. The identifier may be abbreviated to its first N
- letters.
- Otherwise, returns false. */
+/* If the current token is IDENTIFIER, skips it and returns true. IDENTIFIER
+ may be abbreviated to its first N letters. Otherwise, returns false.
+
+ IDENTIFIER must be an ASCII string. */
bool
-lex_match_id_n (struct lexer *lexer, const char *s, size_t n)
+lex_match_id_n (struct lexer *lexer, const char *identifier, size_t n)
{
- if (lexer->token == T_ID
- && lex_id_match_n (ss_cstr (s), lex_tokss (lexer), n))
+ if (lex_token (lexer) == T_ID
+ && lex_id_match_n (ss_cstr (identifier), lex_tokss (lexer), n))
{
lex_get (lexer);
return true;
return false;
}
-/* If the current token is integer N, skips it and returns true.
- Otherwise, returns false. */
+/* If the current token is integer X, skips it and returns true. Otherwise,
+ returns false. */
bool
lex_match_int (struct lexer *lexer, int x)
{
\f
/* Forced matches. */
-/* If this token is identifier S, fetches the next token and returns
- nonzero.
- Otherwise, reports an error and returns zero. */
+/* If this token is IDENTIFIER, skips it and returns true. IDENTIFIER may be
+ abbreviated to its first 3 letters. Otherwise, reports an error and returns
+ false.
+
+ IDENTIFIER must be an ASCII string. */
bool
-lex_force_match_id (struct lexer *lexer, const char *s)
+lex_force_match_id (struct lexer *lexer, const char *identifier)
{
- if (lex_match_id (lexer, s))
+ if (lex_match_id (lexer, identifier))
return true;
else
{
- lex_error (lexer, _("expecting `%s'"), s);
+ lex_error (lexer, _("expecting `%s'"), identifier);
return false;
}
}
-/* If the current token is T, skips the token. Otherwise, reports an
- error and returns from the current function with return value false. */
+/* If the current token has the specified TYPE, skips it and returns true.
+ Otherwise, reports an error and returns false. */
bool
-lex_force_match (struct lexer *lexer, enum token_type t)
+lex_force_match (struct lexer *lexer, enum token_type type)
{
- if (lexer->token == t)
+ if (lex_token (lexer) == type)
{
lex_get (lexer);
return true;
}
else
{
- lex_error (lexer, _("expecting `%s'"), lex_token_name (t));
+ lex_error (lexer, _("expecting `%s'"), token_type_to_string (type));
return false;
}
}
-/* If this token is a string, does nothing and returns true.
+/* If the current token is a string, does nothing and returns true.
Otherwise, reports an error and returns false. */
bool
lex_force_string (struct lexer *lexer)
}
}
-/* If this token is an integer, does nothing and returns true.
+/* If the current token is an integer, does nothing and returns true.
Otherwise, reports an error and returns false. */
bool
lex_force_int (struct lexer *lexer)
}
}
-/* If this token is a number, does nothing and returns true.
+/* If the current token is a number, does nothing and returns true.
Otherwise, reports an error and returns false. */
bool
lex_force_num (struct lexer *lexer)
return false;
}
-/* If this token is an identifier, does nothing and returns true.
+/* If the current token is an identifier, does nothing and returns true.
Otherwise, reports an error and returns false. */
bool
lex_force_id (struct lexer *lexer)
{
- if (lexer->token == T_ID)
+ if (lex_token (lexer) == T_ID)
return true;
lex_error (lexer, _("expecting identifier"));
return false;
}
+\f
+/* Token accessors. */
-/* Weird token functions. */
-
-/* Returns the likely type of the next token, or 0 if it's hard to tell. */
+/* Returns the type of LEXER's current token. */
enum token_type
-lex_look_ahead (struct lexer *lexer)
+lex_token (const struct lexer *lexer)
{
- if (lexer->put_token)
- return lexer->put_token;
+ return lex_next_token (lexer, 0);
+}
- for (;;)
+/* Returns the number in LEXER's current token.
+
+ Only T_NEG_NUM and T_POS_NUM tokens have meaningful values. For other
+ tokens this function will always return zero. */
+double
+lex_tokval (const struct lexer *lexer)
+{
+ return lex_next_tokval (lexer, 0);
+}
+
+/* Returns the null-terminated string in LEXER's current token, UTF-8 encoded.
+
+ Only T_ID and T_STRING tokens have meaningful strings. For other tokens
+ this function will always return NULL.
+
+ The UTF-8 encoding of the returned string is correct for variable names and
+ other identifiers. Use filename_to_utf8() to use it as a filename. Use
+ data_in() to use it in a "union value". */
+const char *
+lex_tokcstr (const struct lexer *lexer)
+{
+ return lex_next_tokcstr (lexer, 0);
+}
+
+/* Returns the string in LEXER's current token, UTF-8 encoded. The string is
+ null-terminated (but the null terminator is not included in the returned
+ substring's 'length').
+
+ Only T_ID and T_STRING tokens have meaningful strings. For other tokens
+ this function will always return NULL.
+
+ The UTF-8 encoding of the returned string is correct for variable names and
+ other identifiers. Use filename_to_utf8() to use it as a filename. Use
+ data_in() to use it in a "union value". */
+struct substring
+lex_tokss (const struct lexer *lexer)
+{
+ return lex_next_tokss (lexer, 0);
+}
+\f
+/* Looking ahead.
+
+ A value of 0 for N as an argument to any of these functions refers to the
+ current token. Lookahead is limited to the current command. Any N greater
+ than the number of tokens remaining in the current command will be treated
+ as referring to a T_ENDCMD token. */
+
+static const struct lex_token *
+lex_next__ (const struct lexer *lexer_, int n)
+{
+ struct lexer *lexer = CONST_CAST (struct lexer *, lexer_);
+ struct lex_source *src = lex_source__ (lexer);
+
+ if (src != NULL)
+ return lex_source_next__ (src, n);
+ else
+ {
+ static const struct lex_token stop_token =
+ { TOKEN_INITIALIZER (T_STOP, 0.0, ""), 0, 0, 0, 0 };
+
+ return &stop_token;
+ }
+}
+
+static const struct lex_token *
+lex_source_next__ (const struct lex_source *src, int n)
+{
+ while (deque_count (&src->deque) <= n)
{
- if (NULL == lexer->prog && ! lex_get_line (lexer) )
- return 0;
-
- for (;;)
- {
- while (c_isspace ((unsigned char) *lexer->prog))
- lexer->prog++;
- if (*lexer->prog)
- break;
-
- if (lexer->dot)
- return T_ENDCMD;
- else if (!lex_get_line (lexer))
- return 0;
-
- if (lexer->put_token)
- return lexer->put_token;
- }
-
- switch (toupper ((unsigned char) *lexer->prog))
+ if (!deque_is_empty (&src->deque))
{
- case 'X': case 'B': case 'O':
- if (lexer->prog[1] == '\'' || lexer->prog[1] == '"')
- return T_STRING;
- /* Fall through */
+ struct lex_token *front;
- case '-':
- return T_DASH;
+ front = &src->tokens[deque_front (&src->deque, 0)];
+ if (front->token.type == T_STOP || front->token.type == T_ENDCMD)
+ return front;
+ }
+
+ lex_source_get__ (src);
+ }
+
+ return &src->tokens[deque_back (&src->deque, n)];
+}
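The lookahead rule described above (N counts from the current token, and anything past the end of the current command is reported as the end-of-command token) can be illustrated with a minimal standalone sketch. It is not the code from this commit: it replaces the deque with a plain array, and `sketch_type`, `sketch_sample`, and `sketch_peek` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types, standing in for PSPP's enum token_type. */
enum sketch_type { SK_ID, SK_NUM, SK_ENDCMD, SK_STOP };

/* Sample token stream: one command (ID NUM .) followed by another ID. */
static const enum sketch_type sketch_sample[] =
  { SK_ID, SK_NUM, SK_ENDCMD, SK_ID };

/* Returns the type of the token N positions after index CUR in TOKENS.
   Scanning never goes past the first end-of-command or stop token, so any
   N beyond the end of the current command yields that terminating token. */
static enum sketch_type
sketch_peek (const enum sketch_type *tokens, size_t n_tokens,
             size_t cur, size_t n)
{
  size_t i;

  assert (tokens != NULL && cur < n_tokens);
  for (i = cur; i < n_tokens; i++)
    if (i == cur + n || tokens[i] == SK_ENDCMD || tokens[i] == SK_STOP)
      return tokens[i];

  /* Ran off the end of the array without a terminator (an assumption of
     this sketch; the real lexer reads more input instead). */
  return SK_STOP;
}
```

With the sample stream, peeking 1 ahead of index 0 gives the number token, while peeking 2 or more ahead gives the end-of-command token and never reaches the identifier that begins the next command.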
+
+/* Returns the "struct token" of the token N after the current one in LEXER.
+ The returned pointer can be invalidated by pretty much any succeeding call
+ into the lexer, although the string pointer within the returned token is
+ only invalidated by consuming the token (e.g. with lex_get()). */
+const struct token *
+lex_next (const struct lexer *lexer, int n)
+{
+ return &lex_next__ (lexer, n)->token;
+}
+
+/* Returns the type of the token N after the current one in LEXER. */
+enum token_type
+lex_next_token (const struct lexer *lexer, int n)
+{
+ return lex_next (lexer, n)->type;
+}
+
+/* Returns the number in the token N after the current one in LEXER.
+
+ Only T_NEG_NUM and T_POS_NUM tokens have meaningful values. For other
+ tokens this function will always return zero. */
+double
+lex_next_tokval (const struct lexer *lexer, int n)
+{
+ const struct token *token = lex_next (lexer, n);
+ return token->number;
+}
+
+/* Returns the null-terminated string in the token N after the current one, in
+ UTF-8 encoding.
+
+ Only T_ID and T_STRING tokens have meaningful strings. For other tokens
+ this function will always return NULL.
+
+ The UTF-8 encoding of the returned string is correct for variable names and
+ other identifiers. Use filename_to_utf8() to use it as a filename. Use
+ data_in() to use it in a "union value". */
+const char *
+lex_next_tokcstr (const struct lexer *lexer, int n)
+{
+ return lex_next_tokss (lexer, n).string;
+}
+
+/* Returns the string in the token N after the current one, in UTF-8 encoding.
+ The string is null-terminated (but the null terminator is not included in
+ the returned substring's 'length').
+
+ Only T_ID and T_STRING tokens have meaningful strings. For other tokens
+ this function will always return NULL.
+
+ The UTF-8 encoding of the returned string is correct for variable names and
+ other identifiers. Use filename_to_utf8() to use it as a filename. Use
+ data_in() to use it in a "union value". */
+struct substring
+lex_next_tokss (const struct lexer *lexer, int n)
+{
+ return lex_next (lexer, n)->string;
+}
- case '.':
- case '0': case '1': case '2': case '3': case '4':
- case '5': case '6': case '7': case '8': case '9':
- return T_POS_NUM;
+/* If LEXER is positioned at the (pseudo)identifier S, skips it and returns
+ true. Otherwise, returns false.
- case '\'': case '"':
- return T_STRING;
+ S may consist of an arbitrary number of identifiers, integers, and
+ punctuation e.g. "KRUSKAL-WALLIS", "2SLS", or "END INPUT PROGRAM".
+ Identifiers may be abbreviated to their first three letters. Currently only
+ hyphens, slashes, and equals signs are supported as punctuation (but it
+ would be easy to add more).
- case '+':
- return T_PLUS;
+ S must be an ASCII string. */
+bool
+lex_match_phrase (struct lexer *lexer, const char *s)
+{
+ int tok_idx;
+
+ for (tok_idx = 0; ; tok_idx++)
+ {
+ enum token_type token;
+ unsigned char c;
+
+ while (c_isspace (*s))
+ s++;
+
+ c = *s;
+ if (c == '\0')
+ {
+ int i;
+
+ for (i = 0; i < tok_idx; i++)
+ lex_get (lexer);
+ return true;
+ }
+
+ token = lex_next_token (lexer, tok_idx);
+ switch (c)
+ {
+ case '-':
+ if (token != T_DASH)
+ return false;
+ s++;
+ break;
case '/':
- return T_SLASH;
+ if (token != T_SLASH)
+ return false;
+ s++;
+ break;
case '=':
- return T_EQUALS;
+ if (token != T_EQUALS)
+ return false;
+ s++;
+ break;
- case '(':
- return T_LPAREN;
+ case '0': case '1': case '2': case '3': case '4':
+ case '5': case '6': case '7': case '8': case '9':
+ {
+ unsigned int value;
- case ')':
- return T_RPAREN;
+ if (token != T_POS_NUM)
+ return false;
- case '[':
- return T_LBRACK;
+ value = 0;
+ do
+ {
+ value = value * 10 + (*s++ - '0');
+ }
+ while (c_isdigit (*s));
- case ']':
- return T_RBRACK;
+ if (lex_next_tokval (lexer, tok_idx) != value)
+ return false;
+ }
+ break;
- case ',':
- return T_COMMA;
+ default:
+ if (lex_is_id1 (c))
+ {
+ int len;
- case '*':
- return lexer->prog[1] == '*' ? T_EXP : T_ASTERISK;
+ if (token != T_ID)
+ return false;
- case '<':
- return (lexer->prog[1] == '=' ? T_LE
- : lexer->prog[1] == '>' ? T_NE
- : T_LT);
+ len = lex_id_get_length (ss_cstr (s));
+ if (!lex_id_match (ss_buffer (s, len),
+ lex_next_tokss (lexer, tok_idx)))
+ return false;
- case '>':
- return lexer->prog[1] == '=' ? T_GE : T_GT;
+ s += len;
+ }
+ else
+ NOT_REACHED ();
+ }
+ }
+}
- case '~':
- return lexer->prog[1] == '=' ? T_NE : T_NOT;
+static int
+lex_source_get_first_line_number (const struct lex_source *src, int n)
+{
+ return lex_source_next__ (src, n)->first_line;
+}
- case '&':
- return T_AND;
+static int
+count_newlines (char *s, size_t length)
+{
+ int n_newlines = 0;
+ char *newline;
- case '|':
- return T_OR;
+ while ((newline = memchr (s, '\n', length)) != NULL)
+ {
+ n_newlines++;
+ length -= (newline + 1) - s;
+ s = newline + 1;
+ }
- default:
- if (lex_is_id1 (*lexer->prog))
- return T_ID;
- return 0;
+ return n_newlines;
+}
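For readers who want to try the memchr-based scan in isolation, here is a minimal standalone sketch of the same technique. `sketch_count_newlines` is a hypothetical name; the only intended differences from the function above are the const-qualified pointer and the added precondition check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the '\n' bytes among the LENGTH bytes starting at S.  Each
   iteration jumps to the next newline with memchr() and then shrinks the
   remaining window to the bytes that follow it. */
static int
sketch_count_newlines (const char *s, size_t length)
{
  int n_newlines = 0;
  const char *newline;

  assert (s != NULL);
  while ((newline = memchr (s, '\n', length)) != NULL)
    {
      n_newlines++;
      length -= (size_t) (newline + 1 - s);
      s = newline + 1;
    }

  return n_newlines;
}
```

Because the window is bounded by LENGTH rather than by a null terminator, the scan works on a slice of a larger buffer, which is how the token text is addressed in the surrounding code.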
+
+static int
+lex_source_get_last_line_number (const struct lex_source *src, int n)
+{
+ const struct lex_token *token = lex_source_next__ (src, n);
+
+ if (token->first_line == 0)
+ return 0;
+ else
+ {
+ char *token_str = &src->buffer[token->token_pos - src->tail];
+ return token->first_line + count_newlines (token_str, token->token_len) + 1;
+ }
+}
+
+static int
+count_columns (const char *s_, size_t length)
+{
+ const uint8_t *s = CHAR_CAST (const uint8_t *, s_);
+ int columns;
+ size_t ofs;
+ int mblen;
+
+ columns = 0;
+ for (ofs = 0; ofs < length; ofs += mblen)
+ {
+ ucs4_t uc;
+
+ mblen = u8_mbtouc (&uc, s + ofs, length - ofs);
+ if (uc != '\t')
+ {
+ int width = uc_width (uc, "UTF-8");
+ if (width > 0)
+ columns += width;
}
+ else
+ columns = ROUND_UP (columns + 1, 8);
}
+
+ return columns + 1;
}
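The tab handling in count_columns() can be seen more easily in a simplified sketch. This version assumes ASCII-only input, so every non-tab byte has width 1; the real function decodes UTF-8 with u8_mbtouc() and asks uc_width() for each character's width. `sketch_count_columns` is a hypothetical name, and the rounding expression spells out what ROUND_UP (columns + 1, 8) is assumed to compute.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the 1-based column just past the LENGTH bytes starting at S,
   treating every non-tab byte as one column wide (ASCII-only assumption)
   and advancing each tab to the next multiple of 8 columns. */
static int
sketch_count_columns (const char *s, size_t length)
{
  int columns = 0;
  size_t i;

  assert (s != NULL);
  for (i = 0; i < length; i++)
    if (s[i] != '\t')
      columns++;
    else
      columns = (columns + 1 + 7) / 8 * 8;   /* Round COLUMNS + 1 up to 8. */

  return columns + 1;
}
```

Adding 1 before rounding ensures that a tab always advances at least one column, even when the running count already sits on a multiple of 8.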
-/* Makes the current token become the next token to be read; the
- current token is set to T. */
-void
-lex_put_back (struct lexer *lexer, enum token_type t)
+static int
+lex_source_get_first_column (const struct lex_source *src, int n)
{
- save_token (lexer);
- lexer->token = t;
+ const struct lex_token *token = lex_source_next__ (src, n);
+ return count_columns (&src->buffer[token->line_pos - src->tail],
+ token->token_pos - token->line_pos);
}
-\f
-/* Weird line processing functions. */
-/* Returns the entire contents of the current line. */
-const char *
-lex_entire_line (const struct lexer *lexer)
+static int
+lex_source_get_last_column (const struct lex_source *src, int n)
{
- return ds_cstr (&lexer->line_buffer);
+ const struct lex_token *token = lex_source_next__ (src, n);
+ char *start, *end, *newline;
+
+ start = &src->buffer[token->line_pos - src->tail];
+ end = &src->buffer[(token->token_pos + token->token_len) - src->tail];
+ newline = memrchr (start, '\n', end - start);
+ if (newline != NULL)
+ start = newline + 1;
+ return count_columns (start, end - start);
}
-const struct string *
-lex_entire_line_ds (const struct lexer *lexer)
+/* Returns the 1-based line number of the start of the syntax that represents
+ the token N after the current one in LEXER. Returns 0 for a T_STOP token or
+ if the token is drawn from a source that does not have line numbers. */
+int
+lex_get_first_line_number (const struct lexer *lexer, int n)
{
- return &lexer->line_buffer;
+ const struct lex_source *src = lex_source__ (lexer);
+ return src != NULL ? lex_source_get_first_line_number (src, n) : 0;
}
-/* As lex_entire_line(), but only returns the part of the current line
- that hasn't already been tokenized. */
+/* Returns the 1-based line number of the end of the syntax that represents the
+ token N after the current one in LEXER, plus 1. Returns 0 for a T_STOP
+ token or if the token is drawn from a source that does not have line
+ numbers.
+
+ Most of the time, a single token is wholly within a single line of syntax,
+ but there are two exceptions: a T_STRING token can be made up of multiple
+ segments on adjacent lines connected with "+" punctuators, and a T_NEG_NUM
+ token can consist of a "-" on one line followed by the number on the next.
+ */
+int
+lex_get_last_line_number (const struct lexer *lexer, int n)
+{
+ const struct lex_source *src = lex_source__ (lexer);
+ return src != NULL ? lex_source_get_last_line_number (src, n) : 0;
+}
+
+/* Returns the 1-based column number of the start of the syntax that represents
+ the token N after the current one in LEXER. Returns 0 for a T_STOP
+ token.
+
+ Column numbers are measured according to the width of characters as shown in
+ a typical fixed-width font, in which CJK characters have width 2 and
+ combining characters have width 0. */
+int
+lex_get_first_column (const struct lexer *lexer, int n)
+{
+ const struct lex_source *src = lex_source__ (lexer);
+ return src != NULL ? lex_source_get_first_column (src, n) : 0;
+}
+
+/* Returns the 1-based column number of the end of the syntax that represents
+ the token N after the current one in LEXER, plus 1. Returns 0 for a T_STOP
+ token.
+
+ Column numbers are measured according to the width of characters as shown in
+ a typical fixed-width font, in which CJK characters have width 2 and
+ combining characters have width 0. */
+int
+lex_get_last_column (const struct lexer *lexer, int n)
+{
+ const struct lex_source *src = lex_source__ (lexer);
+ return src != NULL ? lex_source_get_last_column (src, n) : 0;
+}
+
+/* Returns the name of the syntax file from which the current command is drawn.
+ Returns NULL for a T_STOP token or if the command's source does not have
+ line numbers.
+
+ There is no version of this function that takes an N argument because
+ lookahead only works to the end of a command and any given command is always
+ within a single syntax file. */
const char *
-lex_rest_of_line (const struct lexer *lexer)
+lex_get_file_name (const struct lexer *lexer)
{
- return lexer->prog;
+ struct lex_source *src = lex_source__ (lexer);
+ return src == NULL ? NULL : src->reader->file_name;
}
-/* Returns true if the current line ends in a terminal dot,
- false otherwise. */
-bool
-lex_end_dot (const struct lexer *lexer)
+/* Returns the syntax mode for the syntax file from which the current command is
+ drawn. Returns LEX_SYNTAX_AUTO for a T_STOP token or if the command's
+ source does not have line numbers.
+
+ There is no version of this function that takes an N argument because
+ lookahead only works to the end of a command and any given command is always
+ within a single syntax file. */
+enum lex_syntax_mode
+lex_get_syntax_mode (const struct lexer *lexer)
{
- return lexer->dot;
+ struct lex_source *src = lex_source__ (lexer);
+ return src == NULL ? LEX_SYNTAX_AUTO : src->reader->syntax;
}
-/* Causes the rest of the current input line to be ignored for
- tokenization purposes. */
-void
-lex_discard_line (struct lexer *lexer)
+/* Returns the error mode for the syntax file from which the current command
+ is drawn. Returns LEX_ERROR_INTERACTIVE for a T_STOP token or if the
+ command's source does not have line numbers.
+
+ There is no version of this function that takes an N argument because
+ lookahead only works to the end of a command and any given command is always
+ within a single syntax file. */
+enum lex_error_mode
+lex_get_error_mode (const struct lexer *lexer)
{
- ds_cstr (&lexer->line_buffer); /* Ensures ds_end points to something valid */
- lexer->prog = ds_end (&lexer->line_buffer);
- lexer->dot = false;
- lexer->put_token = 0;
+ struct lex_source *src = lex_source__ (lexer);
+ return src == NULL ? LEX_ERROR_INTERACTIVE : src->reader->error;
}
+/* If the source that LEXER is currently reading has error mode
+ LEX_ERROR_INTERACTIVE, discards all buffered input and tokens, so that the
+ next token to be read comes directly from whatever is next read from the
+ stream.
-/* Discards the rest of the current command.
- When we're reading commands from a file, we skip tokens until
- a terminal dot or EOF.
- When we're reading commands interactively from the user,
- that's just discarding the current line, because presumably
- the user doesn't want to finish typing a command that will be
- ignored anyway. */
+ It makes sense to call this function after encountering an error in a
+ command entered on the console, because usually the user would prefer not to
+ have cascading errors. */
+void
+lex_interactive_reset (struct lexer *lexer)
+{
+ struct lex_source *src = lex_source__ (lexer);
+ if (src != NULL && src->reader->error == LEX_ERROR_INTERACTIVE)
+ {
+ src->head = src->tail = 0;
+ src->journal_pos = src->seg_pos = src->line_pos = 0;
+ src->n_newlines = 0;
+ src->suppress_next_newline = false;
+ segmenter_init (&src->segmenter, segmenter_get_mode (&src->segmenter));
+ while (!deque_is_empty (&src->deque))
+ lex_source_pop__ (src);
+ lex_source_push_endcmd__ (src);
+ }
+}
+
+/* Advances past any tokens in LEXER up to a T_ENDCMD or T_STOP. */
void
lex_discard_rest_of_command (struct lexer *lexer)
{
- if (!getl_is_interactive (lexer->ss))
+ while (lex_token (lexer) != T_STOP && lex_token (lexer) != T_ENDCMD)
+ lex_get (lexer);
+}
+
+/* Discards all lookahead tokens in LEXER, then discards all input sources
+ until it encounters one with error mode LEX_ERROR_INTERACTIVE or until it
+ runs out of input sources. */
+void
+lex_discard_noninteractive (struct lexer *lexer)
+{
+ struct lex_source *src = lex_source__ (lexer);
+
+ if (src != NULL)
{
- while (lexer->token != T_STOP && lexer->token != T_ENDCMD)
- lex_get (lexer);
+ while (!deque_is_empty (&src->deque))
+ lex_source_pop__ (src);
+
+ for (; src != NULL && src->reader->error != LEX_ERROR_INTERACTIVE;
+ src = lex_source__ (lexer))
+ lex_source_destroy (src);
}
- else
- lex_discard_line (lexer);
}
\f
-/* Weird line reading functions. */
+static size_t
+lex_source_max_tail__ (const struct lex_source *src)
+{
+ const struct lex_token *token;
+ size_t max_tail;
+
+ assert (src->seg_pos >= src->line_pos);
+ max_tail = MIN (src->journal_pos, src->line_pos);
+
+ /* Use the oldest token also. (We know that src->deque cannot be empty
+ because we are in the process of adding a new token, which is already
+ initialized enough to use here.) */
+ token = &src->tokens[deque_back (&src->deque, 0)];
+ assert (token->token_pos >= token->line_pos);
+ max_tail = MIN (max_tail, token->line_pos);
+
+ return max_tail;
+}
-/* Remove C-style comments in STRING, begun by slash-star and
- terminated by star-slash or newline. */
static void
-strip_comments (struct string *string)
+lex_source_expand__ (struct lex_source *src)
{
- char *cp;
- int quote;
- bool in_comment;
-
- in_comment = false;
- quote = EOF;
- for (cp = ds_cstr (string); *cp; )
+ if (src->head - src->tail >= src->allocated)
{
- /* If we're not in a comment, check for quote marks. */
- if (!in_comment)
+ size_t max_tail = lex_source_max_tail__ (src);
+ if (max_tail > src->tail)
{
- if (*cp == quote)
- quote = EOF;
- else if (*cp == '\'' || *cp == '"')
- quote = *cp;
+ /* Advance the tail, freeing up room at the head. */
+ memmove (src->buffer, src->buffer + (max_tail - src->tail),
+ src->head - max_tail);
+ src->tail = max_tail;
}
-
- /* If we're not inside a quotation, check for comment. */
- if (quote == EOF)
+ else
{
- if (cp[0] == '/' && cp[1] == '*')
- {
- in_comment = true;
- *cp++ = ' ';
- *cp++ = ' ';
- continue;
- }
- else if (in_comment && cp[0] == '*' && cp[1] == '/')
- {
- in_comment = false;
- *cp++ = ' ';
- *cp++ = ' ';
- continue;
- }
+ /* Buffer is completely full. Expand it. */
+ src->buffer = x2realloc (src->buffer, &src->allocated);
}
-
- /* Check commenting. */
- if (in_comment)
- *cp = ' ';
- cp++;
}
-}
-
-/* Prepares LINE, which is subject to the given SYNTAX rules, for
- tokenization by stripping comments and determining whether it
- is the beginning or end of a command and storing into
- *LINE_STARTS_COMMAND and *LINE_ENDS_COMMAND appropriately. */
-void
-lex_preprocess_line (struct string *line,
- enum syntax_mode syntax,
- bool *line_starts_command,
- bool *line_ends_command)
-{
- strip_comments (line);
- ds_rtrim (line, ss_cstr (CC_SPACES));
- *line_ends_command = ds_chomp_byte (line, '.') || ds_is_empty (line);
- *line_starts_command = false;
- if (syntax == GETL_BATCH)
+ else
{
- int first = ds_first (line);
- *line_starts_command = !c_isspace (first);
- if (first == '+' || first == '-')
- *ds_data (line) = ' ';
+ /* There's space available at the head of the buffer. Nothing to do. */
}
}
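
The buffer management in lex_source_expand__() above can be read as a sliding window over the input stream: `tail` and `head` are stream offsets, and the buffer holds the bytes between them. The following is a minimal standalone sketch of that idea, not the PSPP code itself. The names `struct window`, `window_expand` and `window_append` are hypothetical, plain realloc() stands in for x2realloc(), and the caller passes `max_tail` directly instead of deriving it from the journal, line and token positions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the buffer fields of
   struct lex_source. */
struct window
  {
    char *buffer;       /* Holds stream bytes [tail, head). */
    size_t allocated;   /* Number of bytes allocated for 'buffer'. */
    size_t tail;        /* Stream offset of buffer[0]. */
    size_t head;        /* Stream offset just past the last byte held. */
  };

/* Ensures that there is room for at least one more byte at the head.
   MAX_TAIL is the oldest stream offset that is still needed; bytes
   before it may be discarded. */
static void
window_expand (struct window *w, size_t max_tail)
{
  if (w->head - w->tail < w->allocated)
    return;                     /* Space is already available. */

  if (max_tail > w->tail)
    {
      /* Advance the tail, freeing up room at the head. */
      memmove (w->buffer, w->buffer + (max_tail - w->tail),
               w->head - max_tail);
      w->tail = max_tail;
    }
  else
    {
      /* Nothing can be discarded, so grow the buffer.  (Assumption:
         doubling with a floor of 16 bytes, aborting on failure.) */
      size_t new_size = w->allocated ? 2 * w->allocated : 16;
      char *p = realloc (w->buffer, new_size);
      if (p == NULL)
        abort ();
      w->buffer = p;
      w->allocated = new_size;
    }
}

/* Appends the N bytes in DATA, one at a time, making room as needed. */
static void
window_append (struct window *w, const char *data, size_t n,
               size_t max_tail)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      window_expand (w, max_tail);
      w->buffer[w->head++ - w->tail] = data[i];
    }
}
```

Because every index into the buffer is computed as a stream offset minus `tail`, moving the tail never invalidates offsets saved elsewhere, which is presumably why the tokens above record positions such as `token_pos` rather than pointers.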
-/* Reads a line, without performing any preprocessing. */
-bool
-lex_get_line_raw (struct lexer *lexer)
+static void
+lex_source_read__ (struct lex_source *src)
{
- bool ok = getl_read_line (lexer->ss, &lexer->line_buffer);
- if (ok)
+ do
{
- const char *line = ds_cstr (&lexer->line_buffer);
- text_item_submit (text_item_create (TEXT_ITEM_SYNTAX, line));
+ size_t head_ofs;
+ size_t n;
+
+ lex_source_expand__ (src);
+
+ head_ofs = src->head - src->tail;
+ n = src->reader->class->read (src->reader, &src->buffer[head_ofs],
+ src->allocated - head_ofs,
+ segmenter_get_prompt (&src->segmenter));
+ if (n == 0)
+ {
+ /* End of input.
+
+ Ensure that the input always ends in a new-line followed by a null
+ byte, as required by the segmenter library. */
+
+ if (src->head == src->tail
+ || src->buffer[src->head - src->tail - 1] != '\n')
+ src->buffer[src->head++ - src->tail] = '\n';
+
+ lex_source_expand__ (src);
+ src->buffer[src->head++ - src->tail] = '\0';
+
+ return;
+ }
+
+ src->head += n;
}
- else
- lexer->prog = NULL;
- return ok;
+ while (!memchr (&src->buffer[src->seg_pos - src->tail], '\n',
+ src->head - src->seg_pos));
}
-/* Reads a line for use by the tokenizer, and preprocesses it by
- removing comments, stripping trailing whitespace and the
- terminal dot, and removing leading indentors. */
-bool
-lex_get_line (struct lexer *lexer)
+static struct lex_source *
+lex_source__ (const struct lexer *lexer)
+{
+ return (ll_is_empty (&lexer->sources) ? NULL
+ : ll_data (ll_head (&lexer->sources), struct lex_source, ll));
+}
+
+static struct substring
+lex_source_get_syntax__ (const struct lex_source *src, int n0, int n1)
{
- bool line_starts_command;
+ const struct lex_token *token0 = lex_source_next__ (src, n0);
+ const struct lex_token *token1 = lex_source_next__ (src, MAX (n0, n1));
+ size_t start = token0->token_pos;
+ size_t end = token1->token_pos + token1->token_len;
- if (!lex_get_line_raw (lexer))
- return false;
+ return ss_buffer (&src->buffer[start - src->tail], end - start);
+}
- lex_preprocess_line (&lexer->line_buffer,
- lex_current_syntax_mode (lexer),
- &line_starts_command, &lexer->dot);
+static void
+lex_ellipsize__ (struct substring in, char *out, size_t out_size)
+{
+ size_t out_maxlen;
+ size_t out_len;
+ int mblen;
- if (line_starts_command)
- lexer->put_token = T_ENDCMD;
+ assert (out_size >= 16);
+ out_maxlen = out_size - (in.length >= out_size ? 3 : 0) - 1;
+ for (out_len = 0; out_len < in.length; out_len += mblen)
+ {
+ if (in.string[out_len] == '\n'
+ || (in.string[out_len] == '\r'
+ && out_len + 1 < in.length
+ && in.string[out_len + 1] == '\n'))
+ break;
+
+ mblen = u8_mblen (CHAR_CAST (const uint8_t *, in.string + out_len),
+ in.length - out_len);
+ if (out_len + mblen > out_maxlen)
+ break;
+ }
- lexer->prog = ds_cstr (&lexer->line_buffer);
- return true;
+ memcpy (out, in.string, out_len);
+ strcpy (&out[out_len], out_len < in.length ? "..." : "");
}
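
As an illustration of what lex_ellipsize__() above does, here is a standalone sketch that follows the same steps: stop at the first line break, never split a multibyte UTF-8 character, and append "..." when the input was cut short. It is an approximation under stated assumptions: `ellipsize` and `utf8_seq_len` are hypothetical names, and `utf8_seq_len` is a simplified replacement for u8_mblen() that only inspects the lead byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length in bytes of the UTF-8 sequence introduced by lead
   byte C.  Simplification: an invalid lead byte counts as 1 byte. */
static size_t
utf8_seq_len (unsigned char c)
{
  if (c < 0x80)
    return 1;
  if ((c & 0xe0) == 0xc0)
    return 2;
  if ((c & 0xf0) == 0xe0)
    return 3;
  if ((c & 0xf8) == 0xf0)
    return 4;
  return 1;
}

/* Copies the first line of the IN_LEN bytes in IN into OUT, which has
   room for OUT_SIZE bytes, truncating at a character boundary and
   appending "..." if not all of IN was copied. */
static void
ellipsize (const char *in, size_t in_len, char *out, size_t out_size)
{
  size_t out_maxlen;
  size_t out_len;

  assert (out_size >= 16);

  /* Reserve 1 byte for the terminator and, if IN cannot fit whole,
     3 more for the ellipsis. */
  out_maxlen = out_size - (in_len >= out_size ? 3 : 0) - 1;
  for (out_len = 0; out_len < in_len; )
    {
      size_t n;

      if (in[out_len] == '\n'
          || (in[out_len] == '\r'
              && out_len + 1 < in_len
              && in[out_len + 1] == '\n'))
        break;

      n = utf8_seq_len ((unsigned char) in[out_len]);
      if (n > in_len - out_len)
        n = in_len - out_len;   /* Truncated sequence at end of input. */
      if (out_len + n > out_maxlen)
        break;
      out_len += n;
    }

  memcpy (out, in, out_len);
  strcpy (&out[out_len], out_len < in_len ? "..." : "");
}
```

With a 16-byte output buffer, a short single-line input is copied unchanged, while a two-line input such as "line one\nline two" yields "line one...".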
-\f
-/* Token names. */
-/* Returns the name of a token. */
-const char *
-lex_token_name (enum token_type token)
+static void
+lex_source_error_valist (struct lex_source *src, int n0, int n1,
+ const char *format, va_list args)
{
- switch (token)
- {
- case T_ID:
- case T_POS_NUM:
- case T_NEG_NUM:
- case T_STRING:
- case TOKEN_N_TYPES:
- NOT_REACHED ();
+ const struct lex_token *token;
+ struct string s;
+ struct msg m;
- case T_STOP:
- return "";
+ ds_init_empty (&s);
- case T_ENDCMD:
- return ".";
+ token = lex_source_next__ (src, n0);
+ if (token->token.type == T_ENDCMD)
+ ds_put_cstr (&s, _("Syntax error at end of command"));
+ else
+ {
+ struct substring syntax = lex_source_get_syntax__ (src, n0, n1);
+ if (!ss_is_empty (syntax))
+ {
+ char syntax_cstr[64];
- case T_PLUS:
- return "+";
+ lex_ellipsize__ (syntax, syntax_cstr, sizeof syntax_cstr);
+ ds_put_format (&s, _("Syntax error at `%s'"), syntax_cstr);
+ }
+ else
+ ds_put_cstr (&s, _("Syntax error"));
+ }
- case T_DASH:
- return "-";
+ if (format)
+ {
+ ds_put_cstr (&s, ": ");
+ ds_put_vformat (&s, format, args);
+ }
+ ds_put_byte (&s, '.');
+
+ m.category = MSG_C_SYNTAX;
+ m.severity = MSG_S_ERROR;
+ m.file_name = src->reader->file_name;
+ m.first_line = lex_source_get_first_line_number (src, n0);
+ m.last_line = lex_source_get_last_line_number (src, n1);
+ m.first_column = lex_source_get_first_column (src, n0);
+ m.last_column = lex_source_get_last_column (src, n1);
+ m.text = ds_steal_cstr (&s);
+ msg_emit (&m);
+}
- case T_ASTERISK:
- return "*";
+static void PRINTF_FORMAT (2, 3)
+lex_get_error (struct lex_source *src, const char *format, ...)
+{
+ va_list args;
+ int n;
- case T_SLASH:
- return "/";
+ va_start (args, format);
- case T_EQUALS:
- return "=";
+ n = deque_count (&src->deque) - 1;
+ lex_source_error_valist (src, n, n, format, args);
+ lex_source_pop_front (src);
- case T_LPAREN:
- return "(";
+ va_end (args);
+}
- case T_RPAREN:
- return ")";
+static bool
+lex_source_get__ (const struct lex_source *src_)
+{
+ struct lex_source *src = CONST_CAST (struct lex_source *, src_);
- case T_LBRACK:
- return "[";
+ struct state
+ {
+ struct segmenter segmenter;
+ enum segment_type last_segment;
+ int newlines;
+ size_t line_pos;
+ size_t seg_pos;
+ };
+
+ struct state state, saved;
+ enum scan_result result;
+ struct scanner scanner;
+ struct lex_token *token;
+ int n_lines;
+ int i;
+
+ if (src->eof)
+ return false;
- case T_RBRACK:
- return "]";
+ state.segmenter = src->segmenter;
+ state.newlines = 0;
+ state.seg_pos = src->seg_pos;
+ state.line_pos = src->line_pos;
+ saved = state;
+
+ token = lex_push_token__ (src);
+ scanner_init (&scanner, &token->token);
+ token->line_pos = src->line_pos;
+ token->token_pos = src->seg_pos;
+ if (src->reader->line_number > 0)
+ token->first_line = src->reader->line_number + src->n_newlines;
+ else
+ token->first_line = 0;
- case T_COMMA:
- return ",";
+ for (;;)
+ {
+ enum segment_type type;
+ const char *segment;
+ size_t seg_maxlen;
+ int seg_len;
+
+ segment = &src->buffer[state.seg_pos - src->tail];
+ seg_maxlen = src->head - state.seg_pos;
+ seg_len = segmenter_push (&state.segmenter, segment, seg_maxlen, &type);
+ if (seg_len < 0)
+ {
+ lex_source_read__ (src);
+ continue;
+ }
- case T_AND:
- return "AND";
+ state.last_segment = type;
+ state.seg_pos += seg_len;
+ if (type == SEG_NEWLINE)
+ {
+ state.newlines++;
+ state.line_pos = state.seg_pos;
+ }
- case T_OR:
- return "OR";
+ result = scanner_push (&scanner, type, ss_buffer (segment, seg_len),
+ &token->token);
+ if (result == SCAN_SAVE)
+ saved = state;
+ else if (result == SCAN_BACK)
+ {
+ state = saved;
+ break;
+ }
+ else if (result == SCAN_DONE)
+ break;
+ }
- case T_NOT:
- return "NOT";
+ n_lines = state.newlines;
+ if (state.last_segment == SEG_END_COMMAND && !src->suppress_next_newline)
+ {
+ n_lines++;
+ src->suppress_next_newline = true;
+ }
+ else if (n_lines > 0 && src->suppress_next_newline)
+ {
+ n_lines--;
+ src->suppress_next_newline = false;
+ }
+ for (i = 0; i < n_lines; i++)
+ {
+ const char *newline;
+ const char *line;
+ size_t line_len;
- case T_EQ:
- return "EQ";
+ line = &src->buffer[src->journal_pos - src->tail];
+ newline = rawmemchr (line, '\n');
+ line_len = newline - line;
+ if (line_len > 0 && line[line_len - 1] == '\r')
+ line_len--;
- case T_GE:
- return ">=";
+ text_item_submit (text_item_create_nocopy (TEXT_ITEM_SYNTAX,
+ xmemdup0 (line, line_len)));
- case T_GT:
- return ">";
+ src->journal_pos += newline - line + 1;
+ }
- case T_LE:
- return "<=";
+ token->token_len = state.seg_pos - src->seg_pos;
- case T_LT:
- return "<";
+ src->segmenter = state.segmenter;
+ src->seg_pos = state.seg_pos;
+ src->line_pos = state.line_pos;
+ src->n_newlines += state.newlines;
- case T_NE:
- return "~=";
+ switch (token->token.type)
+ {
+ default:
+ break;
- case T_ALL:
- return "ALL";
+ case T_STOP:
+ token->token.type = T_ENDCMD;
+ src->eof = true;
+ break;
- case T_BY:
- return "BY";
+ case SCAN_BAD_HEX_LENGTH:
+ lex_get_error (src, _("String of hex digits has %d characters, which "
+ "is not a multiple of 2"),
+ (int) token->token.number);
+ break;
- case T_TO:
- return "TO";
+ case SCAN_BAD_HEX_DIGIT:
+ case SCAN_BAD_UNICODE_DIGIT:
+ lex_get_error (src, _("`%c' is not a valid hex digit"),
+ (int) token->token.number);
+ break;
- case T_WITH:
- return "WITH";
+ case SCAN_BAD_UNICODE_LENGTH:
+ lex_get_error (src, _("Unicode string contains %d bytes, which is "
+ "not in the valid range of 1 to 8 bytes"),
+ (int) token->token.number);
+ break;
- case T_EXP:
- return "**";
- }
+ case SCAN_BAD_UNICODE_CODE_POINT:
+ lex_get_error (src, _("U+%04X is not a valid Unicode code point"),
+ (int) token->token.number);
+ break;
- NOT_REACHED ();
-}
+ case SCAN_EXPECTED_QUOTE:
+ lex_get_error (src, _("Unterminated string constant"));
+ break;
-/* Returns an ASCII representation of the current token as a
- malloc()'d string. */
-char *
-lex_token_representation (struct lexer *lexer)
-{
- char *token_rep;
+ case SCAN_EXPECTED_EXPONENT:
+ lex_get_error (src, _("Missing exponent following `%s'"),
+ token->token.string.string);
+ break;
- switch (lexer->token)
- {
- case T_ID:
- case T_POS_NUM:
- case T_NEG_NUM:
- return ss_xstrdup (lex_tokss (lexer));
+ case SCAN_UNEXPECTED_DOT:
+ lex_get_error (src, _("Unexpected `.' in middle of command"));
+ break;
- case T_STRING:
+ case SCAN_UNEXPECTED_CHAR:
{
- struct substring ss;
- int hexstring = 0;
- char *sp, *dp;
-
- ss = lex_tokss (lexer);
- for (sp = ss_data (ss); sp < ss_end (ss); sp++)
- if (!c_isprint ((unsigned char) *sp))
- {
- hexstring = 1;
- break;
- }
-
- token_rep = xmalloc (2 + ss_length (ss) * 2 + 1 + 1);
-
- dp = token_rep;
- if (hexstring)
- *dp++ = 'X';
- *dp++ = '\'';
-
- for (sp = ss_data (ss); sp < ss_end (ss); sp++)
- if (!hexstring)
- {
- if (*sp == '\'')
- *dp++ = '\'';
- *dp++ = (unsigned char) *sp;
- }
- else
- {
- *dp++ = (((unsigned char) *sp) >> 4)["0123456789ABCDEF"];
- *dp++ = (((unsigned char) *sp) & 15)["0123456789ABCDEF"];
- }
- *dp++ = '\'';
- *dp = '\0';
-
- return token_rep;
+ char c_name[16];
+ lex_get_error (src, _("Bad character %s in input"),
+ uc_name (token->token.number, c_name));
}
+ break;
- default:
- return xstrdup (lex_token_name (lexer->token));
+ case SCAN_SKIP:
+ lex_source_pop_front (src);
+ break;
}
+
+ return true;
}
\f
-/* Really weird functions. */
+static void
+lex_source_push_endcmd__ (struct lex_source *src)
+{
+ struct lex_token *token = lex_push_token__ (src);
+ token->token.type = T_ENDCMD;
+ token->token_pos = 0;
+ token->token_len = 0;
+ token->line_pos = 0;
+ token->first_line = 0;
+}
-/* Skip a COMMENT command. */
-void
-lex_skip_comment (struct lexer *lexer)
+static struct lex_source *
+lex_source_create (struct lex_reader *reader)
{
- for (;;)
- {
- if (!lex_get_line (lexer))
- {
- lexer->put_token = T_STOP;
- lexer->prog = NULL;
- return;
- }
+ struct lex_source *src;
+ enum segmenter_mode mode;
+
+ src = xzalloc (sizeof *src);
+ src->reader = reader;
+
+ if (reader->syntax == LEX_SYNTAX_AUTO)
+ mode = SEG_MODE_AUTO;
+ else if (reader->syntax == LEX_SYNTAX_INTERACTIVE)
+ mode = SEG_MODE_INTERACTIVE;
+ else if (reader->syntax == LEX_SYNTAX_BATCH)
+ mode = SEG_MODE_BATCH;
+ else
+ NOT_REACHED ();
+ segmenter_init (&src->segmenter, mode);
- if (lexer->put_token == T_ENDCMD)
- break;
+ src->tokens = deque_init (&src->deque, 4, sizeof *src->tokens);
- ds_cstr (&lexer->line_buffer); /* Ensures ds_end will point to a valid char */
- lexer->prog = ds_end (&lexer->line_buffer);
- if (lexer->dot)
- break;
- }
+ lex_source_push_endcmd__ (src);
+
+ return src;
}
-\f
-/* Private functions. */
-/* When invoked, tokstr contains a string of binary, octal, or
- hex digits, according to TYPE. The string is converted to
- characters having the specified values. */
static void
-convert_numeric_string_to_char_string (struct lexer *lexer,
- enum string_type type)
+lex_source_destroy (struct lex_source *src)
+{
+ char *file_name = src->reader->file_name;
+ if (src->reader->class->close != NULL)
+ src->reader->class->close (src->reader);
+ free (file_name);
+ free (src->buffer);
+ while (!deque_is_empty (&src->deque))
+ lex_source_pop__ (src);
+ free (src->tokens);
+ ll_remove (&src->ll);
+ free (src);
+}
+\f
+struct lex_file_reader
+ {
+ struct lex_reader reader;
+ struct u8_istream *istream;
+ char *file_name;
+ };
+
+static struct lex_reader_class lex_file_reader_class;
+
+/* Creates and returns a new lex_reader that will read from file FILE_NAME (or
+ from stdin if FILE_NAME is "-"). The file is expected to be encoded with
+ ENCODING, which should take one of the forms accepted by
+ u8_istream_for_file(). SYNTAX and ERROR become the syntax mode and error
+ mode of the new reader, respectively.
+
+ Returns a null pointer if FILE_NAME cannot be opened. */
+struct lex_reader *
+lex_reader_for_file (const char *file_name, const char *encoding,
+ enum lex_syntax_mode syntax,
+ enum lex_error_mode error)
{
- const char *base_name;
- int base;
- int chars_per_byte;
- size_t byte_cnt;
- size_t i;
- char *p;
+ struct lex_file_reader *r;
+ struct u8_istream *istream;
- switch (type)
+ istream = (!strcmp(file_name, "-")
+ ? u8_istream_for_fd (encoding, STDIN_FILENO)
+ : u8_istream_for_file (encoding, file_name, O_RDONLY));
+ if (istream == NULL)
{
- case BINARY_STRING:
- base_name = _("binary");
- base = 2;
- chars_per_byte = 8;
- break;
- case OCTAL_STRING:
- base_name = _("octal");
- base = 8;
- chars_per_byte = 3;
- break;
- case HEX_STRING:
- base_name = _("hex");
- base = 16;
- chars_per_byte = 2;
- break;
- default:
- NOT_REACHED ();
+ msg (ME, _("Opening `%s': %s."), file_name, strerror (errno));
+ return NULL;
}
- byte_cnt = ds_length (&lexer->tokstr) / chars_per_byte;
- if (ds_length (&lexer->tokstr) % chars_per_byte)
- msg (SE, _("String of %s digits has %zu characters, which is not a "
- "multiple of %d."),
- base_name, ds_length (&lexer->tokstr), chars_per_byte);
+ r = xmalloc (sizeof *r);
+ lex_reader_init (&r->reader, &lex_file_reader_class);
+ r->reader.syntax = syntax;
+ r->reader.error = error;
+ r->reader.file_name = xstrdup (file_name);
+ r->reader.line_number = 1;
+ r->istream = istream;
+ r->file_name = xstrdup (file_name);
- p = ds_cstr (&lexer->tokstr);
- for (i = 0; i < byte_cnt; i++)
+ return &r->reader;
+}
+
+static struct lex_file_reader *
+lex_file_reader_cast (struct lex_reader *r)
+{
+ return UP_CAST (r, struct lex_file_reader, reader);
+}
+
+static size_t
+lex_file_read (struct lex_reader *r_, char *buf, size_t n,
+ enum prompt_style prompt_style UNUSED)
+{
+ struct lex_file_reader *r = lex_file_reader_cast (r_);
+ ssize_t n_read = u8_istream_read (r->istream, buf, n);
+ if (n_read < 0)
{
- int value;
- int j;
-
- value = 0;
- for (j = 0; j < chars_per_byte; j++, p++)
- {
- int v;
-
- if (*p >= '0' && *p <= '9')
- v = *p - '0';
- else
- {
- static const char alpha[] = "abcdef";
- const char *q = strchr (alpha, tolower ((unsigned char) *p));
-
- if (q)
- v = q - alpha + 10;
- else
- v = base;
- }
-
- if (v >= base)
- msg (SE, _("`%c' is not a valid %s digit."), *p, base_name);
-
- value = value * base + v;
- }
-
- ds_cstr (&lexer->tokstr)[i] = (unsigned char) value;
+ msg (ME, _("Error reading `%s': %s."), r->file_name, strerror (errno));
+ return 0;
}
-
- ds_truncate (&lexer->tokstr, byte_cnt);
+ return n_read;
}
-/* Parses a string from the input buffer into tokstr. The input
- buffer pointer lexer->prog must point to the initial single or double
- quote. TYPE indicates the type of string to be parsed.
- Returns token type. */
-static int
-parse_string (struct lexer *lexer, enum string_type type)
+static void
+lex_file_close (struct lex_reader *r_)
{
- if (type != CHARACTER_STRING)
- lexer->prog++;
+ struct lex_file_reader *r = lex_file_reader_cast (r_);
- /* Accumulate the entire string, joining sections indicated by +
- signs. */
- for (;;)
+ if (u8_istream_fileno (r->istream) != STDIN_FILENO)
{
- /* Single or double quote. */
- int c = *lexer->prog++;
-
- /* Accumulate section. */
- for (;;)
- {
- /* Check end of line. */
- if (*lexer->prog == '\0')
- {
- msg (SE, _("Unterminated string constant."));
- goto finish;
- }
-
- /* Double quote characters to embed them in strings. */
- if (*lexer->prog == c)
- {
- if (lexer->prog[1] == c)
- lexer->prog++;
- else
- break;
- }
-
- ds_put_byte (&lexer->tokstr, *lexer->prog++);
- }
- lexer->prog++;
-
- /* Skip whitespace after final quote mark. */
- if (lexer->prog == NULL)
- break;
- for (;;)
- {
- while (c_isspace ((unsigned char) *lexer->prog))
- lexer->prog++;
- if (*lexer->prog)
- break;
-
- if (lexer->dot)
- goto finish;
-
- if (!lex_get_line (lexer))
- goto finish;
- }
-
- /* Skip plus sign. */
- if (*lexer->prog != '+')
- break;
- lexer->prog++;
-
- /* Skip whitespace after plus sign. */
- if (lexer->prog == NULL)
- break;
- for (;;)
- {
- while (c_isspace ((unsigned char) *lexer->prog))
- lexer->prog++;
- if (*lexer->prog)
- break;
-
- if (lexer->dot)
- goto finish;
-
- if (!lex_get_line (lexer))
- {
- msg (SE, _("Unexpected end of file in string concatenation."));
- goto finish;
- }
- }
-
- /* Ensure that a valid string follows. */
- if (*lexer->prog != '\'' && *lexer->prog != '"')
- {
- msg (SE, _("String expected following `+'."));
- goto finish;
- }
+ if (u8_istream_close (r->istream) != 0)
+ msg (ME, _("Error closing `%s': %s."), r->file_name, strerror (errno));
}
+ else
+ u8_istream_free (r->istream);
- /* We come here when we've finished concatenating all the string sections
- into one large string. */
-finish:
- if (type != CHARACTER_STRING)
- convert_numeric_string_to_char_string (lexer, type);
-
- return T_STRING;
+ free (r->file_name);
+ free (r);
}
+
+static struct lex_reader_class lex_file_reader_class =
+ {
+ lex_file_read,
+ lex_file_close
+ };
\f
-/* Token Accessor Functions */
+struct lex_string_reader
+ {
+ struct lex_reader reader;
+ struct substring s;
+ size_t offset;
+ };
-enum token_type
-lex_token (const struct lexer *lexer)
+static struct lex_reader_class lex_string_reader_class;
+
+/* Creates and returns a new lex_reader for the contents of S, which must be
+ encoded in UTF-8. The new reader takes ownership of S and will free it
+ with ss_dealloc() when it is closed. */
+struct lex_reader *
+lex_reader_for_substring_nocopy (struct substring s)
{
- return lexer->token;
+ struct lex_string_reader *r;
+
+ r = xmalloc (sizeof *r);
+ lex_reader_init (&r->reader, &lex_string_reader_class);
+ r->reader.syntax = LEX_SYNTAX_INTERACTIVE;
+ r->s = s;
+ r->offset = 0;
+
+ return &r->reader;
}
-double
-lex_tokval (const struct lexer *lexer)
+/* Creates and returns a new lex_reader for a copy of null-terminated string S,
+ which must be encoded in UTF-8. The caller retains ownership of S. */
+struct lex_reader *
+lex_reader_for_string (const char *s)
{
- return lexer->tokval;
+ struct substring ss;
+ ss_alloc_substring (&ss, ss_cstr (s));
+ return lex_reader_for_substring_nocopy (ss);
}
-/* Returns the null-terminated string value associated with LEXER's current
- token. For a T_ID token, this is the identifier, and for a T_STRING token,
- this is the string. For other tokens the value is undefined. */
-const char *
-lex_tokcstr (const struct lexer *lexer)
+/* Formats FORMAT as a printf()-like format string and creates and returns a
+ new lex_reader for the formatted result. */
+struct lex_reader *
+lex_reader_for_format (const char *format, ...)
{
- return ds_cstr (&lexer->tokstr);
+ struct lex_reader *r;
+ va_list args;
+
+ va_start (args, format);
+ r = lex_reader_for_substring_nocopy (ss_cstr (xvasprintf (format, args)));
+ va_end (args);
+
+ return r;
}
-/* Returns the string value associated with LEXER's current token. For a T_ID
- token, this is the identifier, and for a T_STRING token, this is the string.
- For other tokens the value is undefined. */
-struct substring
-lex_tokss (const struct lexer *lexer)
+static struct lex_string_reader *
+lex_string_reader_cast (struct lex_reader *r)
{
- return ds_ss (&lexer->tokstr);
+ return UP_CAST (r, struct lex_string_reader, reader);
}
-/* If the lexer is positioned at the (pseudo)identifier S, which
- may contain a hyphen ('-'), skips it and returns true. Each
- half of the identifier may be abbreviated to its first three
- letters.
- Otherwise, returns false. */
-bool
-lex_match_hyphenated_word (struct lexer *lexer, const char *s)
-{
- const char *hyphen = strchr (s, '-');
- if (hyphen == NULL)
- return lex_match_id (lexer, s);
- else if (lexer->token != T_ID
- || !lex_id_match (ss_buffer (s, hyphen - s), lex_tokss (lexer))
- || lex_look_ahead (lexer) != T_DASH)
- return false;
- else
- {
- lex_get (lexer);
- lex_force_match (lexer, T_DASH);
- lex_force_match_id (lexer, hyphen + 1);
- return true;
- }
+static size_t
+lex_string_read (struct lex_reader *r_, char *buf, size_t n,
+ enum prompt_style prompt_style UNUSED)
+{
+ struct lex_string_reader *r = lex_string_reader_cast (r_);
+ size_t chunk;
+
+ chunk = MIN (n, r->s.length - r->offset);
+ memcpy (buf, r->s.string + r->offset, chunk);
+ r->offset += chunk;
+
+ return chunk;
}
+static void
+lex_string_close (struct lex_reader *r_)
+{
+ struct lex_string_reader *r = lex_string_reader_cast (r_);
+
+ ss_dealloc (&r->s);
+ free (r);
+}
+
+static struct lex_reader_class lex_string_reader_class =
+ {
+ lex_string_read,
+ lex_string_close
+ };
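
The file and string readers above share one pattern: a struct of function pointers (the reader class) plus a base struct embedded as the first member of each concrete reader. A minimal standalone sketch of that pattern follows. All names in it (`struct reader`, `reader_for_string`, the `cls` member, and so on) are hypothetical, the prompt-style argument is omitted, and a plain pointer cast replaces UP_CAST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct reader;

/* Operations that every kind of reader provides. */
struct reader_class
  {
    /* Reads up to N bytes into BUF; returns 0 at end of input. */
    size_t (*read) (struct reader *, char *buf, size_t n);

    /* Destroys the reader. */
    void (*close) (struct reader *);
  };

/* Base "class", embedded in each concrete reader. */
struct reader
  {
    const struct reader_class *cls;
  };

/* A reader that returns the bytes of an in-memory string. */
struct string_reader
  {
    struct reader reader;       /* Must be first for the casts below. */
    char *s;                    /* Owned copy of the string. */
    size_t length;              /* Length of 's' in bytes. */
    size_t offset;              /* Number of bytes already returned. */
  };

static size_t
string_read (struct reader *r_, char *buf, size_t n)
{
  struct string_reader *r = (struct string_reader *) r_;
  size_t chunk = r->length - r->offset;

  if (chunk > n)
    chunk = n;
  memcpy (buf, r->s + r->offset, chunk);
  r->offset += chunk;
  return chunk;
}

static void
string_close (struct reader *r_)
{
  struct string_reader *r = (struct string_reader *) r_;

  free (r->s);
  free (r);
}

static const struct reader_class string_reader_class =
  {
    string_read,
    string_close
  };

/* Creates and returns a reader for a copy of null-terminated string S. */
static struct reader *
reader_for_string (const char *s)
{
  struct string_reader *r = malloc (sizeof *r);

  if (r == NULL)
    abort ();
  r->reader.cls = &string_reader_class;
  r->length = strlen (s);
  r->s = malloc (r->length + 1);
  if (r->s == NULL)
    abort ();
  memcpy (r->s, s, r->length + 1);
  r->offset = 0;
  return &r->reader;
}
```

A caller only ever sees `struct reader *` and calls through `cls`, so it can drain any reader with a loop such as `while ((n = r->cls->read (r, buf, sizeof buf)) > 0)` without knowing which kind of reader it holds.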
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>. */
-#if !lexer_h
-#define lexer_h 1
+#ifndef LEXER_H
+#define LEXER_H 1
-#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include "data/identifier.h"
#include "data/variable.h"
-#include "libpspp/getl.h"
+#include "libpspp/compiler.h"
+#include "libpspp/prompt.h"
struct lexer;
+/* The syntax mode for which a syntax file is intended. */
+enum lex_syntax_mode
+ {
+ LEX_SYNTAX_AUTO, /* Try to guess intent. */
+ LEX_SYNTAX_INTERACTIVE, /* Interactive mode. */
+ LEX_SYNTAX_BATCH /* Batch mode. */
+ };
+
+/* Handling of errors. */
+enum lex_error_mode
+ {
+ LEX_ERROR_INTERACTIVE, /* Always continue to next command. */
+ LEX_ERROR_CONTINUE, /* Continue to next command, except for
+ cascading failures. */
+ LEX_ERROR_STOP /* Stop processing. */
+ };
+
+/* Reads a single syntax file as a stream of bytes encoded in UTF-8.
+
+ Not opaque. */
+struct lex_reader
+ {
+ const struct lex_reader_class *class;
+ enum lex_syntax_mode syntax;
+ enum lex_error_mode error;
+ char *file_name; /* NULL if not associated with a file. */
+ int line_number; /* 1-based initial line number, 0 if none. */
+ };
+
+/* An implementation of a lex_reader. */
+struct lex_reader_class
+ {
+ /* Reads up to N bytes of data from READER into BUF. Returns the positive
+ number of bytes read if successful, or zero at end of input or on
+ error.
+
+ STYLE provides a hint to interactive readers as to what kind of syntax
+ is being read right now. */
+ size_t (*read) (struct lex_reader *reader, char *buf, size_t n,
+ enum prompt_style style);
+
+ /* Closes and destroys READER, releasing any allocated storage.
+
+ The caller will free the 'file_name' member of READER, so the
+ implementation should not do so. */
+ void (*close) (struct lex_reader *reader);
+ };
+
+/* Helper functions for lex_reader. */
+void lex_reader_init (struct lex_reader *, const struct lex_reader_class *);
+void lex_reader_set_file_name (struct lex_reader *, const char *file_name);
+
+/* Creating various kinds of lex_readers. */
+struct lex_reader *lex_reader_for_file (const char *file_name,
+ const char *encoding,
+ enum lex_syntax_mode syntax,
+ enum lex_error_mode error);
+struct lex_reader *lex_reader_for_string (const char *);
+struct lex_reader *lex_reader_for_format (const char *, ...)
+ PRINTF_FORMAT (1, 2);
+struct lex_reader *lex_reader_for_substring_nocopy (struct substring);
+
/* Initialization. */
-struct lexer * lex_create (struct source_stream *);
+struct lexer *lex_create (void);
void lex_destroy (struct lexer *);
-/* State accessors */
-struct source_stream * lex_get_source_stream (const struct lexer *);
-enum syntax_mode lex_current_syntax_mode (const struct lexer *);
-enum error_mode lex_current_error_mode (const struct lexer *);
+/* Files. */
+void lex_include (struct lexer *, struct lex_reader *);
+void lex_append (struct lexer *, struct lex_reader *);
-/* Common functions. */
+/* Advancing. */
void lex_get (struct lexer *);
-void lex_error (struct lexer *, const char *, ...);
-void lex_sbc_only_once (const char *);
-void lex_sbc_missing (struct lexer *, const char *);
-int lex_end_of_command (struct lexer *);
/* Token testing functions. */
bool lex_is_number (struct lexer *);
long lex_integer (struct lexer *);
bool lex_is_string (struct lexer *);
+/* Token testing functions with lookahead. */
+bool lex_next_is_number (struct lexer *, int n);
+double lex_next_number (struct lexer *, int n);
+bool lex_next_is_integer (struct lexer *, int n);
+long lex_next_integer (struct lexer *, int n);
+bool lex_next_is_string (struct lexer *, int n);
/* Token matching functions. */
bool lex_match (struct lexer *, enum token_type);
bool lex_match_id (struct lexer *, const char *);
bool lex_match_id_n (struct lexer *, const char *, size_t n);
bool lex_match_int (struct lexer *, int);
-bool lex_match_hyphenated_word (struct lexer *lexer, const char *s);
-
+bool lex_match_phrase (struct lexer *, const char *s);
/* Forcible matching functions. */
bool lex_force_match (struct lexer *, enum token_type);
bool lex_force_id (struct lexer *);
bool lex_force_string (struct lexer *);
-/* Weird token functions. */
-enum token_type lex_look_ahead (struct lexer *);
-void lex_put_back (struct lexer *, enum token_type);
-
-/* Weird line processing functions. */
-const char *lex_entire_line (const struct lexer *);
-const struct string *lex_entire_line_ds (const struct lexer *);
-const char *lex_rest_of_line (const struct lexer *);
-bool lex_end_dot (const struct lexer *);
-void lex_preprocess_line (struct string *, enum syntax_mode,
- bool *line_starts_command,
- bool *line_ends_command);
-void lex_discard_line (struct lexer *);
-void lex_discard_rest_of_command (struct lexer *);
-
-/* Weird line reading functions. */
-bool lex_get_line (struct lexer *);
-bool lex_get_line_raw (struct lexer *);
-
-/* Token names. */
-const char *lex_token_name (enum token_type);
-char *lex_token_representation (struct lexer *);
-
-/* Token accessors */
+/* Token accessors. */
enum token_type lex_token (const struct lexer *);
double lex_tokval (const struct lexer *);
const char *lex_tokcstr (const struct lexer *);
struct substring lex_tokss (const struct lexer *);
-/* Really weird functions. */
-void lex_skip_comment (struct lexer *);
+/* Looking ahead. */
+const struct token *lex_next (const struct lexer *, int n);
+enum token_type lex_next_token (const struct lexer *, int n);
+const char *lex_next_tokcstr (const struct lexer *, int n);
+double lex_next_tokval (const struct lexer *, int n);
+struct substring lex_next_tokss (const struct lexer *, int n);
+
+/* Current position. */
+int lex_get_first_line_number (const struct lexer *, int n);
+int lex_get_last_line_number (const struct lexer *, int n);
+int lex_get_first_column (const struct lexer *, int n);
+int lex_get_last_column (const struct lexer *, int n);
+const char *lex_get_file_name (const struct lexer *);
+
+/* Issuing errors. */
+void lex_error (struct lexer *, const char *, ...) PRINTF_FORMAT (2, 3);
+void lex_next_error (struct lexer *, int n0, int n1, const char *, ...)
+ PRINTF_FORMAT (4, 5);
+int lex_end_of_command (struct lexer *);
+
+void lex_sbc_only_once (const char *);
+void lex_sbc_missing (struct lexer *, const char *);
+
+void lex_error_valist (struct lexer *, const char *, va_list)
+ PRINTF_FORMAT (2, 0);
+void lex_next_error_valist (struct lexer *lexer, int n0, int n1,
+ const char *format, va_list)
+ PRINTF_FORMAT (4, 0);
+
+/* Error handling. */
+enum lex_syntax_mode lex_get_syntax_mode (const struct lexer *);
+enum lex_error_mode lex_get_error_mode (const struct lexer *);
+void lex_discard_rest_of_command (struct lexer *);
+void lex_interactive_reset (struct lexer *);
+void lex_discard_noninteractive (struct lexer *);
-#endif /* !lexer_h */
+#endif /* lexer.h */
else if (strchr (t, hyphen_proxy))
{
char *c = unmunge (t);
- sprintf (s, "lex_match_hyphenated_word (lexer, \"%s\")", c);
+ sprintf (s, "lex_match_phrase (lexer, \"%s\")", c);
free (c);
}
else
if (def->type == SBC_VARLIST)
dump (1, "if (lex_token (lexer) == T_ID "
"&& dict_lookup_var (dataset_dict (ds), lex_tokcstr (lexer)) != NULL "
- "&& lex_look_ahead (lexer) != '=')");
+ "&& lex_next_token (lexer, 1) != T_EQUALS)");
else
{
dump (0, "if ((lex_token (lexer) == T_ID "
"&& dict_lookup_var (dataset_dict (ds), lex_tokcstr (lexer)) "
- "&& lex_look_ahead () != '=')");
+ "&& lex_next_token (lexer, 1) != T_EQUALS)");
dump (1, " || token == T_ALL)");
}
dump (1, "{");
assert (fmt_get_category (*format) != FMT_CAT_STRING);
- if (!data_in_msg (lex_tokss (lexer), C_ENCODING, *format, &v, 0, NULL))
+ if (!data_in_msg (lex_tokss (lexer), "UTF-8", *format, &v, 0, NULL))
return false;
lex_get (lexer);
/* Parses a list of variable names according to the DATA LIST version
of the TO convention. */
bool
-parse_DATA_LIST_vars (struct lexer *lexer, char ***namesp,
- size_t *n_varsp, int pv_opts)
+parse_DATA_LIST_vars (struct lexer *lexer, const struct dictionary *dict,
+ char ***namesp, size_t *n_varsp, int pv_opts)
{
char **names;
size_t n_vars;
do
{
- if (lex_token (lexer) != T_ID)
+ if (lex_token (lexer) != T_ID
+ || !dict_id_is_valid (dict, lex_tokcstr (lexer), true))
{
lex_error (lexer, "expecting variable name");
goto exit;
unsigned long int number;
lex_get (lexer);
- if (lex_token (lexer) != T_ID)
+ if (lex_token (lexer) != T_ID
+ || !dict_id_is_valid (dict, lex_tokcstr (lexer), true))
{
lex_error (lexer, "expecting variable name");
goto exit;
parse_DATA_LIST_vars(), except that all allocations are taken
from the given POOL. */
bool
-parse_DATA_LIST_vars_pool (struct lexer *lexer, struct pool *pool,
+parse_DATA_LIST_vars_pool (struct lexer *lexer, const struct dictionary *dict,
+ struct pool *pool,
char ***names, size_t *nnames, int pv_opts)
{
int retval;
re-free it later. */
assert (!(pv_opts & PV_APPEND));
- retval = parse_DATA_LIST_vars (lexer, names, nnames, pv_opts);
+ retval = parse_DATA_LIST_vars (lexer, dict, names, nnames, pv_opts);
if (retval)
register_vars_pool (pool, *names, *nnames);
return retval;
free (v);
*nnames += nv;
}
- else if (!parse_DATA_LIST_vars (lexer, names, nnames, PV_APPEND))
+ else if (!parse_DATA_LIST_vars (lexer, dict, names, nnames, PV_APPEND))
goto fail;
}
return 1;
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006, 2007 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2006, 2007, 2010 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
struct variable ***, size_t *, int opts);
bool parse_var_set_vars (struct lexer *, const struct var_set *, struct variable ***, size_t *,
int opts);
-bool parse_DATA_LIST_vars (struct lexer *, char ***names, size_t *cnt, int opts);
-bool parse_DATA_LIST_vars_pool (struct lexer *, struct pool *,
- char ***names, size_t *cnt, int opts);
+bool parse_DATA_LIST_vars (struct lexer *, const struct dictionary *,
+ char ***names, size_t *cnt, int opts);
+bool parse_DATA_LIST_vars_pool (struct lexer *, const struct dictionary *,
+ struct pool *,
+ char ***names, size_t *cnt, int opts);
bool parse_mixed_vars (struct lexer *, const struct dictionary *dict,
char ***names, size_t *cnt, int opts);
bool parse_mixed_vars_pool (struct lexer *, const struct dictionary *dict,
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2010, 2011 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#include <config.h>
-
-#include <stdio.h>
-#include <errno.h>
-#include <stdlib.h>
-
-#include "language/prompt.h"
-
-#include "data/file-name.h"
-#include "data/settings.h"
-#include "data/variable.h"
-#include "language/command.h"
-#include "language/lexer/lexer.h"
-#include "libpspp/assertion.h"
-#include "libpspp/message.h"
-#include "libpspp/str.h"
-#include "libpspp/version.h"
-#include "output/tab.h"
-
-#include "gl/xalloc.h"
-
-/* Current prompting style. */
-static enum prompt_style current_style;
-
-/* Gets the command prompt for the given STYLE. */
-const char *
-prompt_get (enum prompt_style style)
-{
- switch (style)
- {
- case PROMPT_FIRST:
- return "PSPP> ";
-
- case PROMPT_LATER:
- return " > ";
-
- case PROMPT_DATA:
- return "data> ";
-
- case PROMPT_CNT:
- NOT_REACHED ();
- }
- NOT_REACHED ();
-}
-
-/* Sets STYLE as the current prompt style. */
-void
-prompt_set_style (enum prompt_style style)
-{
- assert (style < PROMPT_CNT);
- current_style = style;
-}
-
-/* Returns the current prompt. */
-enum prompt_style
-prompt_get_style (void)
-{
- return current_style;
-}
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2010 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#ifndef PROMPT_H
-#define PROMPT_H 1
-
-#include <stdbool.h>
-
-enum prompt_style
- {
- PROMPT_FIRST, /* First line of command. */
- PROMPT_LATER, /* Second or later line of command. */
- PROMPT_DATA, /* Between BEGIN DATA and END DATA. */
- PROMPT_CNT
- };
-
-enum prompt_style prompt_get_style (void);
-void prompt_set_style (enum prompt_style);
-
-const char *prompt_get (enum prompt_style);
-
-#endif /* PROMPT_H */
#include "language/lexer/variable-parser.h"
#include "language/stats/sort-criteria.h"
#include "libpspp/assertion.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/misc.h"
#include "libpspp/pool.h"
#include "libpspp/str.h"
#include "math/moments.h"
+#include "math/percentiles.h"
#include "math/sort.h"
#include "math/statistic.h"
-#include "math/percentiles.h"
#include "gl/minmax.h"
#include "gl/xalloc.h"
{
size_t n_dest_prev = n_dest;
- if (!parse_DATA_LIST_vars (lexer, &dest, &n_dest,
+ if (!parse_DATA_LIST_vars (lexer, dict, &dest, &n_dest,
(PV_APPEND | PV_SINGLE | PV_NO_SCRATCH
| PV_NO_DUPLICATE)))
goto error;
if (lex_is_string (lexer))
{
- /* XXX check re-encoded length */
- struct string label;
- ds_init_substring (&label, lex_tokss (lexer));
-
- ds_truncate (&label, 255);
- dest_label[n_dest - 1] = ds_xstrdup (&label);
+ dest_label[n_dest - 1] = xstrdup (lex_tokcstr (lexer));
lex_get (lexer);
- ds_destroy (&label);
}
}
lex_match (lexer, T_COMMA);
if (lex_is_string (lexer))
{
- arg[i].c = ss_xstrdup (lex_tokss (lexer));
+ arg[i].c = recode_string (dict_get_encoding (agr->dict),
+ "UTF-8", lex_tokcstr (lexer),
+ -1);
type = VAL_STRING;
}
else if (lex_is_number (lexer))
free (dest[i]);
if (dest_label[i])
- var_set_label (destvar, dest_label[i]);
+ var_set_label (destvar, dest_label[i],
+ dict_get_encoding (agr->dict), true);
v->dest = destvar;
}
iter->int1 = 1;
break;
case MAX | FSTRING:
+ /* Need to do some kind of Unicode collation here. */
if (memcmp (iter->string, value_str (v, src_width), src_width) < 0)
memcpy (iter->string, value_str (v, src_width), src_width);
iter->int1 = 1;
if (!lex_force_match_id (lexer, "INTO"))
goto error;
lex_match (lexer, T_EQUALS);
- if (!parse_DATA_LIST_vars (lexer, &dst_names, &n_dsts, PV_NO_DUPLICATE))
+ if (!parse_DATA_LIST_vars (lexer, dict, &dst_names, &n_dsts,
+ PV_NO_DUPLICATE))
goto error;
if (n_dsts != n_srcs)
{
#include "language/lexer/lexer.h"
#include "language/lexer/variable-parser.h"
#include "libpspp/array.h"
+#include "libpspp/assertion.h"
#include "libpspp/compiler.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
-#include "libpspp/assertion.h"
#include "math/moments.h"
#include "output/tab.h"
}
else if (var_cnt == 0)
{
- if (lex_look_ahead (lexer) == T_EQUALS)
+ if (lex_next_token (lexer, 1) == T_EQUALS)
{
lex_match_id (lexer, "VARIABLES");
lex_match (lexer, T_EQUALS);
 generate_z_varname (const struct dictionary *dict, struct dsc_proc *dsc,
                     const char *var_name, int *z_cnt)
 {
-  char name[VAR_NAME_LEN + 1];
+  char *z_name, *trunc_name;
   /* Try a name based on the original variable name. */
-  name[0] = 'Z';
-  str_copy_trunc (name + 1, sizeof name - 1, var_name);
-  if (try_name (dict, dsc, name))
-    return xstrdup (name);
+  z_name = xasprintf ("Z%s", var_name);
+  trunc_name = utf8_encoding_trunc (z_name, dict_get_encoding (dict),
+                                    ID_MAX_LEN);
+  free (z_name);
+  if (try_name (dict, dsc, trunc_name))
+    return trunc_name;
+  free (trunc_name);
   /* Generate a synthetic name. */
   for (;;)
     {
+      char name[8];
+
       (*z_cnt)++;
       if (*z_cnt <= 99)
dst_var = dict_create_var_assert (dataset_dict (ds), dv->z_name, 0);
var_set_label (dst_var, xasprintf (_("Z-score of %s"),
- var_to_string (dv->v)));
+ var_to_string (dv->v)),
+ dict_get_encoding (dataset_dict (ds)), false);
z = &t->z_scores[cnt++];
z->src_var = dv->v;
flip->n_vars,
&flip_casereader_class, flip);
proc_set_active_file_data (ds, reader);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
error:
destroy_flip_pgm (flip);
*--cp = '\0';
/* Fix invalid characters. */
- for (cp = name; *cp && cp < name + VAR_NAME_LEN; cp++)
+ for (cp = name; *cp && cp < name + ID_MAX_LEN; cp++)
if (cp == name)
{
if (!lex_is_id1 (*cp) || *cp == '$')
int i;
for (i = 1; ; i++)
{
- char n[VAR_NAME_LEN + 1];
- int ofs = MIN (VAR_NAME_LEN - 1 - intlog10 (i), len);
+ char n[ID_MAX_LEN + 1];
+ int ofs = MIN (ID_MAX_LEN - 1 - intlog10 (i), len);
strncpy (n, name, ofs);
sprintf (&n[ofs], "%d", i);
}
       free (v);
-      if (!lex_match (lexer, T_SLASH))
-        break;
-      if ((lex_token (lexer) != T_ID || dict_lookup_var (dataset_dict (ds), lex_tokcstr (lexer)) != NULL)
-          && lex_token (lexer) != T_ALL)
-        {
-          lex_put_back (lexer, T_SLASH);
-          break;
-        }
+      if (lex_token (lexer) != T_SLASH)
+        break;
+
+      if ((lex_next_token (lexer, 1) == T_ID
+           && dict_lookup_var (dataset_dict (ds),
+                               lex_next_tokcstr (lexer, 1)))
+          || lex_next_token (lexer, 1) == T_ALL)
+        {
+          /* The token after the slash is a variable name. Keep parsing. */
+          lex_get (lexer);
+        }
+      else
+        {
+          /* The token after the slash must be the start of a new
+             subcommand. Let the caller see the slash. */
+          break;
+        }
     }
   return 1;
NOT_REACHED ();
}
}
- else if (lex_match_hyphenated_word (lexer, "K-W") ||
- lex_match_hyphenated_word (lexer, "KRUSKAL-WALLIS"))
+ else if (lex_match_phrase (lexer, "K-W") ||
+ lex_match_phrase (lexer, "KRUSKAL-WALLIS"))
{
lex_match (lexer, T_EQUALS);
npt->kruskal_wallis++;
NOT_REACHED ();
}
}
- else if (lex_match_hyphenated_word (lexer, "M-W") ||
- lex_match_hyphenated_word (lexer, "MANN-WHITNEY"))
+ else if (lex_match_phrase (lexer, "M-W") ||
+ lex_match_phrase (lexer, "MANN-WHITNEY"))
{
lex_match (lexer, T_EQUALS);
npt->mann_whitney++;
       cstp->n_expected = 0;
       cstp->expected = NULL;
-      if ( lex_match (lexer, T_SLASH) )
+      if (lex_match_phrase (lexer, "/EXPECTED"))
         {
-          if ( lex_match_id (lexer, "EXPECTED") )
-            {
-              lex_force_match (lexer, T_EQUALS);
-              if ( ! lex_match_id (lexer, "EQUAL") )
-                {
-                  double f;
-                  int n;
-                  while ( lex_is_number (lexer) )
-                    {
-                      int i;
-                      n = 1;
-                      f = lex_number (lexer);
-                      lex_get (lexer);
-                      if ( lex_match (lexer, T_ASTERISK))
-                        {
-                          n = f;
-                          f = lex_number (lexer);
-                          lex_get (lexer);
-                        }
-                      lex_match (lexer, T_COMMA);
-
-                      cstp->n_expected += n;
-                      cstp->expected = pool_realloc (specs->pool,
-                                                     cstp->expected,
-                                                     sizeof (double) *
-                                                     cstp->n_expected);
-                      for ( i = cstp->n_expected - n ;
-                            i < cstp->n_expected;
-                            ++i )
-                        cstp->expected[i] = f;
+          lex_force_match (lexer, T_EQUALS);
+          if ( ! lex_match_id (lexer, "EQUAL") )
+            {
+              double f;
+              int n;
+              while ( lex_is_number (lexer) )
+                {
+                  int i;
+                  n = 1;
+                  f = lex_number (lexer);
+                  lex_get (lexer);
+                  if ( lex_match (lexer, T_ASTERISK))
+                    {
+                      n = f;
+                      f = lex_number (lexer);
+                      lex_get (lexer);
+                    }
+                  lex_match (lexer, T_COMMA);
-                    }
-                }
-            }
-          else
-            retval = 3;
+                  cstp->n_expected += n;
+                  cstp->expected = pool_realloc (specs->pool,
+                                                 cstp->expected,
+                                                 sizeof (double) *
+                                                 cstp->n_expected);
+                  for ( i = cstp->n_expected - n ;
+                        i < cstp->n_expected;
+                        ++i )
+                    cstp->expected[i] = f;
+
+                }
+            }
         }
if ( cstp->ranged && cstp->n_expected > 0 &&
struct binomial_test *btp = pool_alloc (specs->pool, sizeof (*btp));
struct one_sample_test *tp = &btp->parent;
struct npar_test *nt = &tp->parent;
- bool equals;
+ bool equals = false;
nt->execute = binomial_execute;
nt->insert_variables = one_sample_insert_variables;
/* Create a label on DEST_VAR, describing its derivation from SRC_VAR and F */
static void
create_var_label (struct variable *dest_var,
- const struct variable *src_var, enum RANK_FUNC f)
+ const struct variable *src_var, enum RANK_FUNC f,
+ const char *dict_encoding)
{
struct string label;
ds_init_empty (&label);
ds_put_format (&label, _("%s of %s"),
function_name[f], var_get_name (src_var));
- var_set_label (dest_var, ds_cstr (&label));
+ var_set_label (dest_var, ds_cstr (&label), dict_encoding, false);
ds_destroy (&label);
}
int v;
for ( v = 0 ; v < n_src_vars ; v ++ )
{
+ struct dictionary *dict = dataset_dict (ds);
+
if ( rank_specs[i].destvars[v] == NULL )
{
rank_specs[i].destvars[v] =
- create_rank_variable (dataset_dict(ds), rank_specs[i].rfunc, src_vars[v], NULL);
+ create_rank_variable (dict, rank_specs[i].rfunc, src_vars[v], NULL);
}
create_var_label ( rank_specs[i].destvars[v],
src_vars[v],
- rank_specs[i].rfunc);
+ rank_specs[i].rfunc,
+ dict_get_encoding (dict));
}
}
max_buffers = INT_MAX;
subcase_destroy (&ordering);
- return ok ? lex_end_of_command (lexer) : CMD_CASCADING_FAILURE;
+ return ok ? CMD_SUCCESS : CMD_CASCADING_FAILURE;
}
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#include <config.h>
-
-#include "language/syntax-file.h"
-
-#include <stdio.h>
-#include <errno.h>
-#include <stdint.h>
-#include <stdlib.h>
-
-#include "data/file-name.h"
-#include "data/settings.h"
-#include "data/variable.h"
-#include "language/command.h"
-#include "language/lexer/lexer.h"
-#include "language/prompt.h"
-#include "libpspp/assertion.h"
-#include "libpspp/cast.h"
-#include "libpspp/getl.h"
-#include "libpspp/ll.h"
-#include "libpspp/message.h"
-#include "libpspp/str.h"
-#include "libpspp/version.h"
-#include "output/tab.h"
-
-#include "gl/xalloc.h"
-
-#include "gettext.h"
-#define _(msgid) gettext (msgid)
-
-struct syntax_file_source
- {
- struct getl_interface parent ;
-
- FILE *syntax_file;
-
- /* Current location. */
- char *fn; /* File name. */
- int ln; /* Line number. */
- };
-
-static const char *
-name (const struct getl_interface *s)
-{
- const struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source,
- parent);
- return sfs->fn;
-}
-
-static int
-line_number (const struct getl_interface *s)
-{
- const struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source,
- parent);
- return sfs->ln;
-}
-
-
-/* Reads a line from syntax file source S into LINE.
- Returns true if successful, false at end of file. */
-static bool
-read_syntax_file (struct getl_interface *s,
- struct string *line)
-{
- struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source,
- parent);
-
- if (sfs->syntax_file == NULL)
- return false;
-
- /* Read line from file and remove new-line.
- Skip initial "#! /usr/bin/pspp" line. */
- do
- {
- sfs->ln++;
- ds_clear (line);
- if (!ds_read_line (line, sfs->syntax_file, SIZE_MAX))
- {
- if (ferror (sfs->syntax_file))
- msg (ME, _("Reading `%s': %s."), sfs->fn, strerror (errno));
- return false;
- }
- ds_chomp_byte (line, '\n');
- }
- while (sfs->ln == 1 && !memcmp (ds_cstr (line), "#!", 2));
-
- return true;
-}
-
-static void
-syntax_close (struct getl_interface *s)
-{
- struct syntax_file_source *sfs = UP_CAST (s, struct syntax_file_source,
- parent);
-
- if (sfs->syntax_file && EOF == fn_close (sfs->fn, sfs->syntax_file))
- msg (MW, _("Closing `%s': %s."), sfs->fn, strerror (errno));
- free (sfs->fn);
- free (sfs);
-}
-
-static bool
-always_false (const struct getl_interface *s UNUSED)
-{
- return false;
-}
-
-
-/* Creates a syntax file source with file name FN. */
-struct getl_interface *
-create_syntax_file_source (const char *fn)
-{
- struct syntax_file_source *ss = xzalloc (sizeof (*ss));
-
- ss->fn = xstrdup (fn);
- ss->syntax_file = fn_open (ss->fn, "r");
- if (ss->syntax_file == NULL)
- msg (ME, _("Opening `%s': %s."), ss->fn, strerror (errno));
-
- ss->parent.interactive = always_false;
- ss->parent.read = read_syntax_file ;
- ss->parent.filter = NULL;
- ss->parent.close = syntax_close ;
- ss->parent.name = name ;
- ss->parent.location = line_number;
-
- return &ss->parent;
-}
-
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#if !SYNTAX_FILE
-#define SYNTAX_FILE 1
-
-struct getl_interface;
-
-/* Creates a syntax file source with file name FN. */
-struct getl_interface * create_syntax_file_source (const char *) ;
-
-#endif
+++ /dev/null
-/* PSPPIRE - a graphical interface for PSPP.
- Copyright (C) 2007, 2009, 2010, 2011 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-
-#include <config.h>
-
-#include "language/syntax-string-source.h"
-
-#include <stdlib.h>
-
-#include "libpspp/cast.h"
-#include "libpspp/getl.h"
-#include "libpspp/compiler.h"
-#include "libpspp/str.h"
-
-#include "gl/xalloc.h"
-
-struct syntax_string_source
- {
- struct getl_interface parent;
- struct string buffer;
- size_t posn;
- };
-
-
-static bool
-always_false (const struct getl_interface *i UNUSED)
-{
- return false;
-}
-
-/* Returns the name of the source */
-static const char *
-name (const struct getl_interface *i UNUSED)
-{
- return NULL;
-}
-
-
-/* Returns the location within the source */
-static int
-location (const struct getl_interface *i UNUSED)
-{
- return 0;
-}
-
-
-static void
-do_close (struct getl_interface *i )
-{
- struct syntax_string_source *sss = UP_CAST (i, struct syntax_string_source,
- parent);
-
- ds_destroy (&sss->buffer);
-
- free (sss);
-}
-
-
-
-static bool
-read_single_line (struct getl_interface *i,
- struct string *line)
-{
- struct syntax_string_source *sss = UP_CAST (i, struct syntax_string_source,
- parent);
-
- size_t next;
-
- if ( sss->posn == -1)
- return false;
-
- next = ss_find_byte (ds_substr (&sss->buffer,
- sss->posn, -1), '\n');
-
- ds_assign_substring (line,
- ds_substr (&sss->buffer,
- sss->posn,
- next)
- );
-
- if ( next != -1 )
- sss->posn += next + 1; /* + 1 to skip newline */
- else
- sss->posn = -1; /* End of file encountered */
-
- return true;
-}
-
-static struct syntax_string_source *
-create_syntax_string_source__ (void)
-{
- struct syntax_string_source *sss = xzalloc (sizeof *sss);
-
- sss->posn = 0;
-
- sss->parent.interactive = always_false;
- sss->parent.close = do_close;
- sss->parent.read = read_single_line;
-
- sss->parent.name = name;
- sss->parent.location = location;
-
- return sss;
-}
-
-struct getl_interface *
-create_syntax_string_source (const char *s)
-{
- struct syntax_string_source *sss = create_syntax_string_source__ ();
- ds_init_cstr (&sss->buffer, s);
- return &sss->parent;
-}
-
-struct getl_interface *
-create_syntax_format_source (const char *format, ...)
-{
- struct syntax_string_source *sss;
- va_list args;
-
- sss = create_syntax_string_source__ ();
-
- ds_init_empty (&sss->buffer);
-
- va_start (args, format);
- ds_put_vformat (&sss->buffer, format, args);
- va_end (args);
-
- return &sss->parent;
-}
-
-/* Return the syntax currently contained in S.
- Primarily usefull for debugging */
-const char *
-syntax_string_source_get_syntax (const struct syntax_string_source *s)
-{
- return ds_cstr (&s->buffer);
-}
+++ /dev/null
-/* PSPPIRE - a graphical interface for PSPP.
- Copyright (C) 2007, 2010 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#ifndef SYNTAX_STRING_SOURCE_H
-#define SYNTAX_STRING_SOURCE_H
-
-#include "libpspp/compiler.h"
-
-struct getl_interface;
-
-struct syntax_string_source;
-
-struct getl_interface *create_syntax_string_source (const char *);
-struct getl_interface *create_syntax_format_source (const char *, ...)
- PRINTF_FORMAT (1, 2);
-
-const char * syntax_string_source_get_syntax (const struct syntax_string_source *s);
-
-
-#endif
msg_enable ();
putc ('\n', stderr);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
}
fprintf (stderr, "\n");
- retval = lex_end_of_command (lexer);
+ retval = CMD_SUCCESS;
done:
free (values);
printf ("error\n");
lex_get (lexer);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Parses the CACHE command. */
int
-cmd_cache (struct lexer *lexer, struct dataset *ds UNUSED)
+cmd_cache (struct lexer *lexer UNUSED, struct dataset *ds UNUSED)
{
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
#include <config.h>
+#include "language/command.h"
+
#include <errno.h>
#include <unistd.h>
-#include "language/command.h"
-#include "libpspp/message.h"
#include "language/lexer/lexer.h"
+#include "libpspp/i18n.h"
+#include "libpspp/message.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
if ( ! lex_force_string (lexer))
goto error;
- path = ss_xstrdup (lex_tokss (lexer));
+ path = utf8_to_filename (lex_tokcstr (lexer));
if ( -1 == chdir (path) )
{
/* PSPP - a program for statistical analysis.
- Copyright (C) 2004, 2011 Free Software Foundation, Inc.
+ Copyright (C) 2004, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
cmd_use (struct lexer *lexer, struct dataset *ds UNUSED)
{
if (lex_match (lexer, T_ALL))
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
msg (SW, _("Only USE ALL is currently implemented."));
return CMD_FAILURE;
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2009, 2010 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2009, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include "language/lexer/lexer.h"
#include "libpspp/assertion.h"
#include "libpspp/compiler.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/str.h"
+#include "gl/localcharset.h"
#include "gl/xalloc.h"
#include "gl/xmalloca.h"
else if (lex_match_id (lexer, "COMMAND"))
{
struct string command;
+ char *locale_command;
bool ok;
lex_match (lexer, T_EQUALS);
return CMD_FAILURE;
}
- ok = run_command (ds_cstr (&command));
+ locale_command = recode_string (locale_charset (), "UTF-8",
+ ds_cstr (&command),
+ ds_length (&command));
ds_destroy (&command);
- return ok ? lex_end_of_command (lexer) : CMD_FAILURE;
+ ok = run_command (locale_command);
+ free (locale_command);
+
+ return ok ? CMD_SUCCESS : CMD_FAILURE;
}
else
{
#include <unistd.h>
#include "data/file-name.h"
+#include "data/procedure.h"
#include "language/command.h"
+#include "language/lexer/include-path.h"
#include "language/lexer/lexer.h"
-#include "language/syntax-file.h"
-#include "libpspp/getl.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/str.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
-static int parse_insert (struct lexer *lexer, char **filename);
+enum variant
+  {
+    INSERT,
+    INCLUDE
+  };
-
-int
-cmd_include (struct lexer *lexer, struct dataset *ds UNUSED)
+static int
+do_insert (struct lexer *lexer, struct dataset *ds, enum variant variant)
 {
-  char *filename = NULL;
-  int status = parse_insert (lexer, &filename);
-
-  if ( CMD_SUCCESS != status)
-    return status;
+  enum lex_syntax_mode syntax_mode;
+  enum lex_error_mode error_mode;
+  char *relative_name;
+  char *filename;
+  char *encoding;
+  int status;
+  bool cd;
-  lex_get (lexer);
-
-  status = lex_end_of_command (lexer);
+  /* Skip optional FILE=. */
+  if (lex_match_id (lexer, "FILE"))
+    lex_match (lexer, T_EQUALS);
-  if ( status == CMD_SUCCESS)
+  /* File name can be identifier or string. */
+  if (lex_token (lexer) != T_ID && !lex_is_string (lexer))
     {
-      struct source_stream *ss = lex_get_source_stream (lexer);
-
-      assert (filename);
-      getl_include_source (ss, create_syntax_file_source (filename),
-                           GETL_BATCH, ERRMODE_STOP);
-      free (filename);
+      lex_error (lexer, _("expecting file name"));
+      return CMD_FAILURE;
     }
-  return status;
-}
-
-
-int
-cmd_insert (struct lexer *lexer, struct dataset *ds UNUSED)
-{
-  enum syntax_mode syntax_mode = GETL_INTERACTIVE;
-  enum error_mode error_mode = ERRMODE_CONTINUE;
-  char *filename = NULL;
-  int status = parse_insert (lexer, &filename);
-  bool cd = false;
-
-  if ( CMD_SUCCESS != status)
-    return status;
+  relative_name = utf8_to_filename (lex_tokcstr (lexer));
+  filename = include_path_search (relative_name);
+  free (relative_name);
+  if ( ! filename)
+    {
+      msg (SE, _("Can't find `%s' in include file search path."),
+           lex_tokcstr (lexer));
+      return CMD_FAILURE;
+    }
   lex_get (lexer);
+  syntax_mode = LEX_SYNTAX_INTERACTIVE;
+  error_mode = LEX_ERROR_CONTINUE;
+  cd = false;
+  status = CMD_FAILURE;
+  encoding = xstrdup (dataset_get_default_syntax_encoding (ds));
   while ( T_ENDCMD != lex_token (lexer))
     {
-      if (lex_match_id (lexer, "SYNTAX"))
+      if (lex_match_id (lexer, "ENCODING"))
+        {
+          lex_match (lexer, T_EQUALS);
+          if (!lex_force_string (lexer))
+            goto exit;
+
+          free (encoding);
+          encoding = xstrdup (lex_tokcstr (lexer));
+        }
+      else if (variant == INSERT && lex_match_id (lexer, "SYNTAX"))
         {
           lex_match (lexer, T_EQUALS);
           if ( lex_match_id (lexer, "INTERACTIVE") )
-            syntax_mode = GETL_INTERACTIVE;
+            syntax_mode = LEX_SYNTAX_INTERACTIVE;
           else if ( lex_match_id (lexer, "BATCH"))
-            syntax_mode = GETL_BATCH;
+            syntax_mode = LEX_SYNTAX_BATCH;
+          else if ( lex_match_id (lexer, "AUTO"))
+            syntax_mode = LEX_SYNTAX_AUTO;
           else
             {
-              lex_error (lexer, _("expecting %s or %s after %s"),
-                         "BATCH", "INTERACTIVE", "SYNTAX");
-              return CMD_FAILURE;
+              lex_error (lexer, _("expecting %s, %s, or %s after %s"),
+                         "BATCH", "INTERACTIVE", "AUTO", "SYNTAX");
+              goto exit;
             }
         }
-      else if (lex_match_id (lexer, "CD"))
+      else if (variant == INSERT && lex_match_id (lexer, "CD"))
         {
           lex_match (lexer, T_EQUALS);
           if ( lex_match_id (lexer, "YES") )
             {
               lex_error (lexer, _("expecting %s or %s after %s"),
                          "YES", "NO", "CD");
-              return CMD_FAILURE;
+              goto exit;
             }
         }
-      else if (lex_match_id (lexer, "ERROR"))
+      else if (variant == INSERT && lex_match_id (lexer, "ERROR"))
         {
           lex_match (lexer, T_EQUALS);
           if ( lex_match_id (lexer, "CONTINUE") )
             {
-              error_mode = ERRMODE_CONTINUE;
+              error_mode = LEX_ERROR_CONTINUE;
             }
           else if ( lex_match_id (lexer, "STOP"))
             {
-              error_mode = ERRMODE_STOP;
+              error_mode = LEX_ERROR_STOP;
             }
           else
             {
               lex_error (lexer, _("expecting %s or %s after %s"),
                          "CONTINUE", "STOP", "ERROR");
-              return CMD_FAILURE;
+              goto exit;
             }
         }
       else
         {
-          lex_error (lexer, _("Unexpected token: `%s'."),
-                     lex_token_representation (lexer));
-
-          return CMD_FAILURE;
+          lex_error (lexer, NULL);
+          goto exit;
         }
     }
-
   status = lex_end_of_command (lexer);
   if ( status == CMD_SUCCESS)
     {
-      struct source_stream *ss = lex_get_source_stream (lexer);
-
-      assert (filename);
-      getl_include_source (ss, create_syntax_file_source (filename),
-                           syntax_mode,
-                           error_mode);
-
-      if ( cd )
-        {
-          char *directory = dir_name (filename);
-          chdir (directory);
-          free (directory);
-        }
-
-      free (filename);
+      struct lex_reader *reader;
+
+      reader = lex_reader_for_file (filename, encoding,
+                                    syntax_mode, error_mode);
+      if (reader != NULL)
+        {
+          lex_discard_rest_of_command (lexer);
+          lex_include (lexer, reader);
+
+          if ( cd )
+            {
+              char *directory = dir_name (filename);
+              chdir (directory);
+              free (directory);
+            }
+        }
     }
+exit:
+  free (encoding);
+  free (filename);
   return status;
 }
-
-static int
-parse_insert (struct lexer *lexer, char **filename)
+int
+cmd_include (struct lexer *lexer, struct dataset *ds)
{
- const char *target_fn;
- char *relative_filename;
-
- /* Skip optional FILE=. */
- if (lex_match_id (lexer, "FILE"))
- lex_match (lexer, T_EQUALS);
-
- /* File name can be identifier or string. */
- if (lex_token (lexer) != T_ID && !lex_is_string (lexer))
- {
- lex_error (lexer, _("expecting file name"));
- return CMD_FAILURE;
- }
-
- target_fn = lex_tokcstr (lexer);
-
- relative_filename =
- fn_search_path (target_fn,
- getl_include_path (lex_get_source_stream (lexer)));
-
- if ( ! relative_filename)
- {
- msg (SE, _("Can't find `%s' in include file search path."),
- target_fn);
- return CMD_FAILURE;
- }
-
- *filename = relative_filename;
- if (*filename == NULL)
- {
- msg (SE, _("Unable to open `%s': %s."),
- relative_filename, strerror (errno));
- free (relative_filename);
- return CMD_FAILURE;
- }
+ return do_insert (lexer, ds, INCLUDE);
+}
- return CMD_SUCCESS;
+int
+cmd_insert (struct lexer *lexer, struct dataset *ds)
+{
+ return do_insert (lexer, ds, INSERT);
}
+
#include "data/settings.h"
#include "language/command.h"
#include "language/lexer/lexer.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/misc.h"
#include "libpspp/str.h"
int
change_permissions (const char *file_name, enum PER per)
{
+ char *locale_file_name;
struct stat buf;
mode_t mode;
if (settings_get_safer_mode ())
{
msg (SE, _("This command not allowed when the SAFER option is set."));
- return CMD_FAILURE;
+ return 0;
}
- if ( -1 == stat(file_name, &buf) )
+ locale_file_name = utf8_to_filename (file_name);
+ if ( -1 == stat(locale_file_name, &buf) )
{
const int errnum = errno;
msg (SE, _("Cannot stat %s: %s"), file_name, strerror(errnum));
+ free (locale_file_name);
return 0;
}
else
mode = buf.st_mode & ~0222;
- if ( -1 == chmod(file_name, mode))
+ if ( -1 == chmod(locale_file_name, mode))
{
const int errnum = errno;
msg (SE, _("Cannot change mode of %s: %s"), file_name, strerror(errnum));
+ free (locale_file_name);
return 0;
}
+ free (locale_file_name);
+
return 1;
}
journal_disable ();
else if (lex_is_string (lexer) || lex_token (lexer) == T_ID)
{
- journal_set_file_name (lex_tokcstr (lexer));
+ char *filename = utf8_to_filename (lex_tokcstr (lexer));
+ journal_set_file_name (filename);
+ free (filename);
+
lex_get (lexer);
}
else
static int n_saved_settings;
int
-cmd_preserve (struct lexer *lexer, struct dataset *ds UNUSED)
+cmd_preserve (struct lexer *lexer UNUSED, struct dataset *ds UNUSED)
{
if (n_saved_settings < MAX_SAVED_SETTINGS)
{
saved_settings[n_saved_settings++] = settings_get ();
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
else
{
}
int
-cmd_restore (struct lexer *lexer, struct dataset *ds UNUSED)
+cmd_restore (struct lexer *lexer UNUSED, struct dataset *ds UNUSED)
{
if (n_saved_settings > 0)
{
struct settings *s = saved_settings[--n_saved_settings];
settings_set (s);
settings_destroy (s);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
else
{
static int
parse_title (struct lexer *lexer, enum text_item_type type)
{
- if (lex_look_ahead (lexer) == T_STRING)
- {
- lex_get (lexer);
- if (!lex_force_string (lexer))
- return CMD_FAILURE;
- set_title (lex_tokcstr (lexer), type);
- lex_get (lexer);
- return lex_end_of_command (lexer);
- }
- else
- {
- set_title (lex_rest_of_line (lexer), type);
- lex_discard_line (lexer);
- }
+ if (!lex_force_string (lexer))
+ return CMD_FAILURE;
+ set_title (lex_tokcstr (lexer), type);
+ lex_get (lexer);
return CMD_SUCCESS;
}
int
cmd_file_label (struct lexer *lexer, struct dataset *ds)
{
- const char *label;
-
- label = lex_rest_of_line (lexer);
- lex_discard_line (lexer);
- while (isspace ((unsigned char) *label))
- label++;
+ if (!lex_force_string (lexer))
+ return CMD_FAILURE;
- dict_set_label (dataset_dict (ds), label);
+ dict_set_label (dataset_dict (ds), lex_tokcstr (lexer));
+ lex_get (lexer);
return CMD_SUCCESS;
}
-/* Add entry date line to DICT's documents. */
-static void
-add_document_trailer (struct dictionary *dict)
-{
- char buf[64];
-
- sprintf (buf, _(" (Entered %s)"), get_start_date ());
- dict_add_document_line (dict, buf);
-}
-
/* Performs the DOCUMENT command. */
int
cmd_document (struct lexer *lexer, struct dataset *ds)
{
struct dictionary *dict = dataset_dict (ds);
- struct string line = DS_EMPTY_INITIALIZER;
- bool end_dot;
+ char *trailer;
- do
+ if (!lex_force_string (lexer))
+ return CMD_FAILURE;
+
+ while (lex_is_string (lexer))
{
- end_dot = lex_end_dot (lexer);
- ds_assign_string (&line, lex_entire_line_ds (lexer));
- if (end_dot)
- ds_put_byte (&line, '.');
- dict_add_document_line (dict, ds_cstr (&line));
-
- lex_discard_line (lexer);
- lex_get_line (lexer);
+ dict_add_document_line (dict, lex_tokcstr (lexer), true);
+ lex_get (lexer);
}
- while (!end_dot);
- add_document_trailer (dict);
- ds_destroy (&line);
+ trailer = xasprintf (_(" (Entered %s)"), get_start_date ());
+ dict_add_document_line (dict, trailer, true);
+ free (trailer);
return CMD_SUCCESS;
}
-/* Performs the DROP DOCUMENTS command. */
+/* Performs the ADD DOCUMENTS command. */
int
-cmd_drop_documents (struct lexer *lexer, struct dataset *ds)
+cmd_add_documents (struct lexer *lexer, struct dataset *ds)
{
- dict_clear_documents (dataset_dict (ds));
-
- return lex_end_of_command (lexer);
+ return cmd_document (lexer, ds);
}
-
-/* Performs the ADD DOCUMENTS command. */
+/* Performs the DROP DOCUMENTS command. */
int
-cmd_add_documents (struct lexer *lexer, struct dataset *ds)
+cmd_drop_documents (struct lexer *lexer UNUSED, struct dataset *ds)
{
- struct dictionary *dict = dataset_dict (ds);
-
- if ( ! lex_force_string (lexer) )
- return CMD_FAILURE;
-
- while ( lex_is_string (lexer))
- {
- dict_add_document_line (dict, lex_tokcstr (lexer));
- lex_get (lexer);
- }
-
- add_document_trailer (dict);
-
- return lex_end_of_command (lexer) ;
+ dict_clear_documents (dataset_dict (ds));
+ return CMD_SUCCESS;
}
lvalue_finalize (lvalue, compute, dict);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
fail:
lvalue_destroy (lvalue, dict);
lvalue_finalize (lvalue, compute, dict);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
fail:
lvalue_destroy (lvalue, dict);
if (!lex_force_id (lexer))
goto lossage;
- if (lex_look_ahead (lexer) == T_LPAREN)
+ if (lex_next_token (lexer, 1) == T_LPAREN)
{
/* Vector. */
lvalue->vector = dict_lookup_vector (dict, lex_tokcstr (lexer));
#include "language/lexer/value-parser.h"
#include "language/lexer/variable-parser.h"
#include "libpspp/compiler.h"
+#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/pool.h"
#include "libpspp/str.h"
static trns_free_func count_trns_free;
static bool parse_numeric_criteria (struct lexer *, struct pool *, struct criteria *);
-static bool parse_string_criteria (struct lexer *, struct pool *, struct criteria *);
+static bool parse_string_criteria (struct lexer *, struct pool *,
+ struct criteria *,
+ const char *dict_encoding);
\f
int
cmd_count (struct lexer *lexer, struct dataset *ds)
crit = dv->crit = pool_alloc (trns->pool, sizeof *crit);
for (;;)
{
+ struct dictionary *dict = dataset_dict (ds);
bool ok;
crit->next = NULL;
crit->vars = NULL;
- if (!parse_variables_const (lexer, dataset_dict (ds), &crit->vars,
+ if (!parse_variables_const (lexer, dict, &crit->vars,
&crit->var_cnt,
- PV_DUPLICATE | PV_SAME_TYPE))
+ PV_DUPLICATE | PV_SAME_TYPE))
goto fail;
pool_register (trns->pool, free, crit->vars);
if (var_is_numeric (crit->vars[0]))
ok = parse_numeric_criteria (lexer, trns->pool, crit);
else
- ok = parse_string_criteria (lexer, trns->pool, crit);
+ ok = parse_string_criteria (lexer, trns->pool, crit,
+ dict_get_encoding (dict));
if (!ok)
goto fail;
/* Parses a set of string criteria values. Returns success. */
static bool
-parse_string_criteria (struct lexer *lexer, struct pool *pool, struct criteria *crit)
+parse_string_criteria (struct lexer *lexer, struct pool *pool,
+ struct criteria *crit, const char *dict_encoding)
{
int len = 0;
size_t allocated = 0;
for (;;)
{
char **cur;
+ char *s;
+
if (crit->value_cnt >= allocated)
crit->values.str = pool_2nrealloc (pool, crit->values.str,
&allocated,
if (!lex_force_string (lexer))
return false;
+
+ s = recode_string (dict_encoding, "UTF-8", lex_tokcstr (lexer),
+ ss_length (lex_tokss (lexer)));
+
cur = &crit->values.str[crit->value_cnt++];
*cur = pool_alloc (pool, len + 1);
- str_copy_rpad (*cur, len + 1, lex_tokcstr (lexer));
+ str_copy_rpad (*cur, len + 1, s);
lex_get (lexer);
+ free (s);
+
lex_match (lexer, T_COMMA);
if (lex_match (lexer, T_RPAREN))
break;
/* PSPP - a program for statistical analysis.
- Copyright (C) 2007, 2009 Free Software Foundation, Inc.
+ Copyright (C) 2007, 2009, 2010 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
}
int
-cmd_debug_xform_fail (struct lexer *lexer, struct dataset *ds)
+cmd_debug_xform_fail (struct lexer *lexer UNUSED, struct dataset *ds)
{
-
add_transformation (ds, trns_fail, NULL, NULL);
-
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
{
struct pool *pool;
-
-
/* Variable types, for convenience. */
enum val_type src_type; /* src_vars[*] type. */
enum val_type dst_type; /* dst_vars[*] type. */
};
static bool parse_src_vars (struct lexer *, struct recode_trns *, const struct dictionary *dict);
-static bool parse_mappings (struct lexer *, struct recode_trns *);
+static bool parse_mappings (struct lexer *, struct recode_trns *,
+ const char *dict_encoding);
static bool parse_dst_vars (struct lexer *, struct recode_trns *, const struct dictionary *dict);
static void add_mapping (struct recode_trns *,
size_t *map_allocated, const struct map_in *);
static bool parse_map_in (struct lexer *lexer, struct map_in *, struct pool *,
- enum val_type src_type, size_t max_src_width);
+ enum val_type src_type, size_t max_src_width,
+ const char *dict_encoding);
static void set_map_in_generic (struct map_in *, enum map_in_type);
static void set_map_in_num (struct map_in *, enum map_in_type, double, double);
static void set_map_in_str (struct map_in *, struct pool *,
- struct substring, size_t width);
+ struct substring, size_t width,
+ const char *dict_encoding);
static bool parse_map_out (struct lexer *lexer, struct pool *, struct map_out *);
static void set_map_out_num (struct map_out *, double);
{
do
{
+ struct dictionary *dict = dataset_dict (ds);
struct recode_trns *trns
= pool_create_container (struct recode_trns, pool);
/* Parse source variable names,
then input to output mappings,
then destination variable names. */
- if (!parse_src_vars (lexer, trns, dataset_dict (ds) )
- || !parse_mappings (lexer, trns)
- || !parse_dst_vars (lexer, trns, dataset_dict (ds)))
+ if (!parse_src_vars (lexer, trns, dict)
+ || !parse_mappings (lexer, trns, dict_get_encoding (dict))
+ || !parse_dst_vars (lexer, trns, dict))
{
recode_trns_free (trns);
return CMD_FAILURE;
/* Create destination variables, if needed.
This must be the final step; otherwise we'd have to
delete destination variables on failure. */
- trns->dst_dict = dataset_dict (ds);
+ trns->dst_dict = dict;
if (trns->src_vars != trns->dst_vars)
- create_dst_vars (trns, dataset_dict (ds));
+ create_dst_vars (trns, dict);
/* Done. */
add_transformation (ds,
}
while (lex_match (lexer, T_SLASH));
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Parses a set of variables to recode into TRNS->src_vars and
into TRNS->mappings and TRNS->map_cnt. Sets TRNS->dst_type.
Returns true if successful, false on parse error. */
static bool
-parse_mappings (struct lexer *lexer, struct recode_trns *trns)
+parse_mappings (struct lexer *lexer, struct recode_trns *trns,
+ const char *dict_encoding)
{
size_t map_allocated;
bool have_dst_type;
struct map_in in;
if (!parse_map_in (lexer, &in, trns->pool,
- trns->src_type, trns->max_src_width))
+ trns->src_type, trns->max_src_width,
+ dict_encoding))
return false;
add_mapping (trns, &map_allocated, &in);
lex_match (lexer, T_COMMA);
false on parse error. */
static bool
parse_map_in (struct lexer *lexer, struct map_in *in, struct pool *pool,
- enum val_type src_type, size_t max_src_width)
+ enum val_type src_type, size_t max_src_width,
+ const char *dict_encoding)
{
if (lex_match_id (lexer, "ELSE"))
return false;
else
{
- set_map_in_str (in, pool, lex_tokss (lexer), max_src_width);
+ set_map_in_str (in, pool, lex_tokss (lexer), max_src_width,
+ dict_encoding);
lex_get (lexer);
if (lex_token (lexer) == T_ID
&& lex_id_match (ss_cstr ("THRU"), lex_tokss (lexer)))
right to WIDTH characters long. */
static void
set_map_in_str (struct map_in *in, struct pool *pool,
- struct substring string, size_t width)
+ struct substring string, size_t width,
+ const char *dict_encoding)
{
+ char *s = recode_string (dict_encoding, "UTF-8",
+ ss_data (string), ss_length (string));
in->type = MAP_SINGLE;
value_init_pool (pool, &in->x, width);
value_copy_buf_rpad (&in->x, width,
- CHAR_CAST_BUG (uint8_t *, ss_data (string)),
- ss_length (string), ' ');
+ CHAR_CAST (uint8_t *, s), strlen (s), ' ');
+ free (s);
}
/* Parses a mapping output value into OUT, allocating memory from
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2009, 2011 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2009-2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
trns->frac = frac;
add_transformation (ds, sample_trns_proc, sample_trns_free, trns);
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
/* Executes a SAMPLE transformation. */
dict_set_filter (dict, v);
}
- return lex_end_of_command (lexer);
+ return CMD_SUCCESS;
}
src/libpspp/float-format.h \
src/libpspp/freaderror.c \
src/libpspp/freaderror.h \
- src/libpspp/getl.c \
- src/libpspp/getl.h \
src/libpspp/hash-functions.c \
src/libpspp/hash-functions.h \
src/libpspp/hash.c \
src/libpspp/misc.h \
src/libpspp/model-checker.c \
src/libpspp/model-checker.h \
- src/libpspp/msg-locator.c \
- src/libpspp/msg-locator.h \
src/libpspp/pool.c \
src/libpspp/pool.h \
src/libpspp/prompt.c \
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006, 2009, 2010 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#include <config.h>
-
-#include "libpspp/getl.h"
-
-#include <stdlib.h>
-
-#include "libpspp/ll.h"
-#include "libpspp/str.h"
-#include "libpspp/string-array.h"
-
-#include "gl/configmake.h"
-#include "gl/relocatable.h"
-#include "gl/xalloc.h"
-
-struct getl_source
- {
- struct getl_source *included_from; /* File that this is nested inside. */
- struct getl_source *includes; /* File nested inside this file. */
-
- struct ll ll; /* Element in the sources list */
-
- struct getl_interface *interface;
- enum syntax_mode syntax_mode;
- enum error_mode error_mode;
- };
-
-struct source_stream
- {
- struct ll_list sources ; /* List of source files. */
- struct string_array include_path;
- };
-
-char **
-getl_include_path (const struct source_stream *ss_)
-{
- struct source_stream *ss = CONST_CAST (struct source_stream *, ss_);
- string_array_terminate_null (&ss->include_path);
- return ss->include_path.strings;
-}
-
-static struct getl_source *
-current_source (const struct source_stream *ss)
-{
- const struct ll *ll = ll_head (&ss->sources);
- return ll_data (ll, struct getl_source, ll );
-}
-
-enum syntax_mode
-source_stream_current_syntax_mode (const struct source_stream *ss)
-{
- struct getl_source *cs = current_source (ss);
-
- return cs->syntax_mode;
-}
-
-
-
-enum error_mode
-source_stream_current_error_mode (const struct source_stream *ss)
-{
- struct getl_source *cs = current_source (ss);
-
- return cs->error_mode;
-}
-
-
-
-/* Initialize getl. */
-struct source_stream *
-create_source_stream (void)
-{
- struct source_stream *ss;
-
- ss = xzalloc (sizeof (*ss));
- ll_init (&ss->sources);
-
- string_array_init (&ss->include_path);
- string_array_append (&ss->include_path, ".");
- if (getenv ("HOME") != NULL)
- string_array_append_nocopy (&ss->include_path,
- xasprintf ("%s/.pspp", getenv ("HOME")));
- string_array_append (&ss->include_path, relocate (PKGDATADIR));
-
- return ss;
-}
-
-/* Delete everything from the include path. */
-void
-getl_clear_include_path (struct source_stream *ss)
-{
- string_array_clear (&ss->include_path);
-}
-
-/* Add to the include path. */
-void
-getl_add_include_dir (struct source_stream *ss, const char *path)
-{
- string_array_append (&ss->include_path, path);
-}
-
-/* Appends source S to the list of source files. */
-void
-getl_append_source (struct source_stream *ss,
- struct getl_interface *i,
- enum syntax_mode syntax_mode,
- enum error_mode err_mode)
-{
- struct getl_source *s = xzalloc (sizeof ( struct getl_source ));
-
- s->interface = i ;
- s->syntax_mode = syntax_mode;
- s->error_mode = err_mode;
-
- ll_push_tail (&ss->sources, &s->ll);
-}
-
-/* Nests source S within the current source file. */
-void
-getl_include_source (struct source_stream *ss,
- struct getl_interface *i,
- enum syntax_mode syntax_mode,
- enum error_mode err_mode)
-{
- struct getl_source *current = current_source (ss);
- struct getl_source *s = xzalloc (sizeof ( struct getl_source ));
-
- s->interface = i;
-
- s->included_from = current ;
- s->includes = NULL;
- s->syntax_mode = syntax_mode;
- s->error_mode = err_mode;
- current->includes = s;
-
- ll_push_head (&ss->sources, &s->ll);
-}
-
-/* Closes the current source, and move the current source to the
- next file in the chain. */
-static void
-close_source (struct source_stream *ss)
-{
- struct getl_source *s = current_source (ss);
-
- if ( s->interface->close )
- s->interface->close (s->interface);
-
- ll_pop_head (&ss->sources);
-
- if (s->included_from != NULL)
- current_source (ss)->includes = NULL;
-
- free (s);
-}
-
-/* Closes all sources until an interactive source is
- encountered. */
-void
-getl_abort_noninteractive (struct source_stream *ss)
-{
- while ( ! ll_is_empty (&ss->sources))
- {
- const struct getl_source *s = current_source (ss);
-
- if ( !s->interface->interactive (s->interface) )
- close_source (ss);
- }
-}
-
-/* Returns true if the current source is interactive,
- false otherwise. */
-bool
-getl_is_interactive (const struct source_stream *ss)
-{
- const struct getl_source *s = current_source (ss);
-
- if (ll_is_empty (&ss->sources) )
- return false;
-
- return s->interface->interactive (s->interface);
-}
-
-/* Returns the name of the current source, or NULL if there is no
- current source */
-const char *
-getl_source_name (const struct source_stream *ss)
-{
- const struct getl_source *s = current_source (ss);
-
- if ( ll_is_empty (&ss->sources) )
- return NULL;
-
- if ( ! s->interface->name )
- return NULL;
-
- return s->interface->name (s->interface);
-}
-
-/* Returns the line number within the current source, or 0 if there is no
- current source. */
-int
-getl_source_location (const struct source_stream *ss)
-{
- const struct getl_source *s = current_source (ss);
-
- if ( ll_is_empty (&ss->sources) )
- return 0;
-
- if ( !s->interface->location )
- return 0;
-
- return s->interface->location (s->interface);
-}
-
-
-/* Close getl. */
-void
-destroy_source_stream (struct source_stream *ss)
-{
- while ( !ll_is_empty (&ss->sources))
- close_source (ss);
- string_array_destroy (&ss->include_path);
-
- free (ss);
-}
-
-
-/* Reads a single line into LINE.
- Returns true when a line has been read, false at end of input.
-*/
-bool
-getl_read_line (struct source_stream *ss, struct string *line)
-{
- assert (ss != NULL);
- while (!ll_is_empty (&ss->sources))
- {
- struct getl_source *s = current_source (ss);
-
- ds_clear (line);
- if (s->interface->read (s->interface, line))
- {
- while (s)
- {
- if (s->interface->filter)
- s->interface->filter (s->interface, line);
- s = s->included_from;
- }
-
- return true;
- }
- close_source (ss);
- }
-
- return false;
-}
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006, 2010, 2011 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#ifndef GETL_H
-#define GETL_H 1
-
-#include <stdbool.h>
-#include "libpspp/ll.h"
-
-struct string;
-
-struct getl_source;
-
-/* Syntax rules that apply to a given source line. */
-enum syntax_mode
- {
- /* Each line that begins in column 1 starts a new command. A
- `+' or `-' in column 1 is ignored to allow visual
- indentation of new commands. Continuation lines must be
- indented from the left margin. A period at the end of a
- line does end a command, but it is optional. */
- GETL_BATCH,
-
- /* Each command must end in a period or in a blank line. */
- GETL_INTERACTIVE
- };
-
-enum error_mode
- {
- /* When errors are encountered, report the error and continue to
- the next command. */
- ERRMODE_CONTINUE,
-
- /* When errors are encountered, abort the current stream. */
- ERRMODE_STOP
- };
-
-/* An abstract base class for objects which act as line buffers for the
- PSPP. Ie anything which might contain content for the lexer */
-struct getl_interface
- {
- /* Returns true if the interface is interactive, that is, if
- it prompts a human user. This property is independent of
- the syntax mode returned by the read member function. */
- bool (*interactive) (const struct getl_interface *);
-
- /* Read a line the intended syntax mode from the interface.
- Returns true if succesful, false on failure or at end of
- input. */
- bool (*read) (struct getl_interface *,
- struct string *);
-
- /* Close and destroy the interface */
- void (*close) (struct getl_interface *);
-
- /* Filter for current and all included sources, which may
- modify the line. Usually null. */
- void (*filter) (struct getl_interface *,
- struct string *line);
-
- /* Returns the name of the source */
- const char * (*name) (const struct getl_interface *);
-
- /* Returns the current location within the source */
- int (*location) (const struct getl_interface *);
- };
-
-struct source_stream;
-
-struct source_stream *create_source_stream (void);
-
-enum syntax_mode source_stream_current_syntax_mode
- (const struct source_stream *);
-
-
-enum error_mode source_stream_current_error_mode
- (const struct source_stream *);
-
-
-void destroy_source_stream (struct source_stream *);
-
-void getl_clear_include_path (struct source_stream *);
-void getl_add_include_dir (struct source_stream *, const char *);
-char **getl_include_path (const struct source_stream *);
-
-void getl_abort_noninteractive (struct source_stream *);
-bool getl_is_interactive (const struct source_stream *);
-
-bool getl_read_line (struct source_stream *, struct string *);
-
-void getl_append_source (struct source_stream *, struct getl_interface *s,
- enum syntax_mode, enum error_mode) ;
-
-void getl_include_source (struct source_stream *, struct getl_interface *s,
- enum syntax_mode, enum error_mode) ;
-
-const char * getl_source_name (const struct source_stream *);
-int getl_source_location (const struct source_stream *);
-
-#endif /* line-buffer.h */
#include <config.h>
#include "libpspp/message.h"
-#include "libpspp/msg-locator.h"
#include <assert.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>
-#include "data/settings.h"
+#include "libpspp/cast.h"
#include "libpspp/str.h"
#include "libpspp/version.h"
+#include "data/settings.h"
+#include "gl/minmax.h"
#include "gl/progname.h"
#include "gl/xalloc.h"
#include "gl/xvasprintf.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
-/* Message handler as set by msg_init(). */
-static void (*msg_handler) (const struct msg *);
+/* Message handler as set by msg_set_handler(). */
+static void (*msg_handler) (const struct msg *, void *aux);
+static void *msg_aux;
/* Disables emitting messages if positive. */
static int messages_disabled;
m.severity = msg_class_to_severity (class);
va_start (args, format);
m.text = xvasprintf (format, args);
- m.where.file_name = NULL;
- m.where.line_number = 0;
- m.where.first_column = 0;
- m.where.last_column = 0;
+ m.file_name = NULL;
+ m.first_line = m.last_line = 0;
+ m.first_column = m.last_column = 0;
va_end (args);
msg_emit (&m);
}
-static struct source_stream *s_stream;
-
void
-msg_init (struct source_stream *ss, void (*handler) (const struct msg *) )
+msg_set_handler (void (*handler) (const struct msg *, void *aux), void *aux)
{
- s_stream = ss;
msg_handler = handler;
-}
-
-void
-msg_done (void)
-{
+ msg_aux = aux;
}
\f
/* Working with messages. */
struct msg *new_msg;
new_msg = xmemdup (m, sizeof *m);
- if (m->where.file_name != NULL)
- new_msg->where.file_name = xstrdup (m->where.file_name);
+ if (m->file_name != NULL)
+ new_msg->file_name = xstrdup (m->file_name);
new_msg->text = xstrdup (m->text);
return new_msg;
/* Frees a message created by msg_dup().
- (Messages not created by msg_dup(), as well as their where.file_name
+ (Messages not created by msg_dup(), as well as their file_name
members, are typically not dynamically allocated, so this function should
not be used to destroy them.) */
void
msg_destroy (struct msg *m)
{
- free (m->where.file_name);
+ free (m->file_name);
free (m->text);
free (m);
}
ds_init_empty (&s);
if (m->category != MSG_C_GENERAL
- && (m->where.file_name
- || m->where.line_number > 0
- || m->where.first_column > 0))
+ && (m->file_name || m->first_line > 0 || m->first_column > 0))
{
- if (m->where.file_name)
- ds_put_format (&s, "%s", m->where.file_name);
- if (m->where.line_number > 0)
+ int l1 = m->first_line;
+ int l2 = MAX (m->first_line, m->last_line - 1);
+ int c1 = m->first_column;
+ int c2 = MAX (m->first_column, m->last_column - 1);
+
+ if (m->file_name)
+ ds_put_format (&s, "%s", m->file_name);
+
+ if (l1 > 0)
{
if (!ds_is_empty (&s))
ds_put_byte (&s, ':');
- ds_put_format (&s, "%d", m->where.line_number);
+
+ if (l2 > l1)
+ {
+ if (c1 > 0)
+ ds_put_format (&s, "%d.%d-%d.%d", l1, c1, l2, c2);
+ else
+ ds_put_format (&s, "%d-%d", l1, l2);
+ }
+ else
+ {
+ if (c1 > 0)
+ {
+ if (c2 > c1)
+ {
+ /* The GNU coding standards say to use
+ LINENO-1.COLUMN-1-COLUMN-2 for this case, but GNU
+ Emacs interprets COLUMN-2 as LINENO-2 if I do that.
+ I've submitted an Emacs bug report:
+ http://debbugs.gnu.org/cgi/bugreport.cgi?bug=7725.
+
+ For now, let's be compatible. */
+ ds_put_format (&s, "%d.%d-%d.%d", l1, c1, l1, c2);
+ }
+ else
+ ds_put_format (&s, "%d.%d", l1, c1);
+ }
+ else
+ ds_put_format (&s, "%d", l1);
+ }
}
- if (m->where.first_column > 0)
+ else if (c1 > 0)
{
- ds_put_format (&s, ".%d", m->where.first_column);
- if (m->where.last_column > m->where.first_column + 1)
- ds_put_format (&s, "-%d", m->where.last_column - 1);
+ if (c2 > c1)
+ ds_put_format (&s, ".%d-%d", c1, c2);
+ else
+ ds_put_format (&s, ".%d", c1);
}
ds_put_cstr (&s, ": ");
}
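Note on the location format produced by the hunk above: the new code treats last_line and last_column as exclusive, subtracts one before printing, and picks among several LINE.COLUMN layouts. The following is a minimal standalone sketch of that branching, not the PSPP implementation. `struct location`, `append` and `format_location` are hypothetical names introduced only for illustration, and a fixed-size buffer stands in for PSPP's `struct string`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the location fields of struct msg.
   last_line and last_column are exclusive, as in the new struct msg. */
struct location
  {
    const char *file_name;          /* NULL if none. */
    int first_line, last_line;      /* 1-based, 0 if none. */
    int first_column, last_column;  /* 1-based, 0 if none. */
  };

static int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* Appends formatted text to the NUL-terminated string in BUF, which has
   room for SIZE bytes in total.  Output that does not fit is truncated. */
static void
append (char *buf, size_t size, const char *format, ...)
{
  size_t used = strlen (buf);
  va_list args;

  va_start (args, format);
  vsnprintf (buf + used, size - used, format, args);
  va_end (args);
}

/* Formats LOC into BUF without the trailing ": " and returns BUF. */
static char *
format_location (const struct location *loc, char *buf, size_t size)
{
  /* Convert the exclusive ends into inclusive ones for display. */
  int l1 = loc->first_line;
  int l2 = max_int (loc->first_line, loc->last_line - 1);
  int c1 = loc->first_column;
  int c2 = max_int (loc->first_column, loc->last_column - 1);

  assert (size > 0);
  buf[0] = '\0';
  if (loc->file_name != NULL)
    append (buf, size, "%s", loc->file_name);

  if (l1 > 0)
    {
      if (buf[0] != '\0')
        append (buf, size, ":");

      if (l2 > l1)
        {
          /* The location spans more than one line. */
          if (c1 > 0)
            append (buf, size, "%d.%d-%d.%d", l1, c1, l2, c2);
          else
            append (buf, size, "%d-%d", l1, l2);
        }
      else if (c1 > 0)
        {
          /* Single line: the line number is repeated in a column range,
             matching the Emacs-compatible form chosen in the diff. */
          if (c2 > c1)
            append (buf, size, "%d.%d-%d.%d", l1, c1, l1, c2);
          else
            append (buf, size, "%d.%d", l1, c1);
        }
      else
        append (buf, size, "%d", l1);
    }
  else if (c1 > 0)
    {
      /* Columns only, no line number. */
      if (c2 > c1)
        append (buf, size, ".%d-%d", c1, c2);
      else
        append (buf, size, ".%d", c1);
    }

  return buf;
}
```

With this sketch, a location in `a.sps` with first_line 3, last_line 4, first_column 5 and last_column 9 formats as `a.sps:3.5-3.8`, because both exclusive ends are reduced by one before printing.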
m.category = MSG_C_GENERAL;
m.severity = MSG_S_NOTE;
- m.where.file_name = NULL;
- m.where.line_number = 0;
- m.where.first_column = 0;
- m.where.last_column = 0;
+ m.file_name = NULL;
+ m.first_line = 0;
+ m.last_line = 0;
+ m.first_column = 0;
+ m.last_column = 0;
m.text = s;
- msg_handler (&m);
+ msg_handler (&m, msg_aux);
free (s);
}
|| (warnings_off && m->severity == MSG_S_WARNING) )
return;
- msg_handler (m);
+ msg_handler (m, msg_aux);
counts[m->severity]++;
max_msgs = settings_get_max_messages (m->severity);
void
msg_emit (struct msg *m)
{
- if ( s_stream && m->where.file_name == NULL )
- {
- struct msg_locator loc;
-
- get_msg_location (s_stream, &loc);
- m->where.file_name = loc.file_name;
- m->where.line_number = loc.line_number;
- }
- else
- {
- m->where.file_name = NULL;
- m->where.line_number = 0;
- }
-
if (!messages_disabled)
process_msg (m);
return category * 3 + severity;
}
-/* A file location. */
-struct msg_locator
- {
- char *file_name; /* File name (NULL if none). */
- int line_number; /* Line number (0 if none). */
- int first_column; /* 1-based column number (0 if none). */
- int last_column; /* 1-based exclusive last column (0 if none). */
- };
-
/* A message. */
struct msg
{
enum msg_category category; /* Message category. */
enum msg_severity severity; /* Message severity. */
- struct msg_locator where; /* File location, or (NULL, -1). */
+ char *file_name; /* Name of file containing error, or NULL. */
+ int first_line; /* 1-based line number, or 0 if none. */
+ int last_line; /* 1-based exclusive last line (0=none). */
+ int first_column; /* 1-based first column, or 0 if none. */
+ int last_column; /* 1-based exclusive last column (0=none). */
char *text; /* Error text. */
};
struct source_stream ;
/* Initialization. */
-void msg_init (struct source_stream *, void (*handler) (const struct msg *) );
-
-void msg_done (void);
+void msg_set_handler (void (*handler) (const struct msg *, void *lexer),
+ void *aux);
/* Working with messages. */
struct msg *msg_dup (const struct msg *);
void msg_disable (void);
/* Error context. */
-void msg_push_msg_locator (const struct msg_locator *);
-void msg_pop_msg_locator (const struct msg_locator *);
-
bool msg_ui_too_many_errors (void);
void msg_ui_reset_counts (void);
bool msg_ui_any_errors (void);
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006, 2011 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#include <config.h>
-
-#include "libpspp/msg-locator.h"
-
-#include <stdlib.h>
-
-#include "libpspp/assertion.h"
-#include "libpspp/cast.h"
-#include "libpspp/message.h"
-#include "libpspp/getl.h"
-
-#include "gl/xalloc.h"
-
-/* File locator stack. */
-static const struct msg_locator **file_loc;
-
-static int nfile_loc, mfile_loc;
-
-void
-msg_locator_done (void)
-{
- free(file_loc);
- file_loc = NULL;
- nfile_loc = mfile_loc = 0;
-}
-
-
-/* File locator stack functions. */
-
-/* Pushes F onto the stack of file locations. */
-void
-msg_push_msg_locator (const struct msg_locator *loc)
-{
- if (nfile_loc >= mfile_loc)
- {
- if (mfile_loc == 0)
- mfile_loc = 8;
- else
- mfile_loc *= 2;
-
- file_loc = xnrealloc (file_loc, mfile_loc, sizeof *file_loc);
- }
-
- file_loc[nfile_loc++] = loc;
-}
-
-/* Pops F off the stack of file locations.
- Argument F is only used for verification that that is actually the
- item on top of the stack. */
-void
-msg_pop_msg_locator (const struct msg_locator *loc)
-{
- assert (nfile_loc >= 0 && file_loc[nfile_loc - 1] == loc);
- nfile_loc--;
-}
-
-/* Puts the current file and line number into LOC, or NULL and -1 if
- none. */
-void
-get_msg_location (const struct source_stream *ss, struct msg_locator *loc)
-{
- if (nfile_loc)
- {
- *loc = *file_loc[nfile_loc - 1];
- }
- else
- {
- loc->file_name = CONST_CAST (char *, getl_source_name (ss));
- loc->line_number = getl_source_location (ss);
- }
-}
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-struct msg_locator ;
-
-void msg_locator_done (void);
-
-/* File locator stack functions. */
-
-/* Pushes F onto the stack of file locations. */
-void msg_push_msg_locator (const struct msg_locator *loc);
-
-/* Pops F off the stack of file locations.
- Argument F is only used for verification that that is actually the
- item on top of the stack. */
-void msg_pop_msg_locator (const struct msg_locator *loc);
-
-struct source_stream ;
-/* Puts the current file and line number into LOC, or NULL and -1 if
- none. */
-void get_msg_location (const struct source_stream *ss, struct msg_locator *loc);
/* Drivers currently registered with output_driver_register(). */
static struct llx_list drivers = LLX_INITIALIZER (drivers);
-static struct output_item *deferred_syntax;
-static bool in_command;
-
void
output_close (void)
{
string_set_insert (formats, (*fp)->extension);
}
-static void
-output_submit__ (struct output_item *item)
+/* Submits ITEM to the configured output drivers, and transfers ownership to
+ the output subsystem. */
+void
+output_submit (struct output_item *item)
{
struct llx *llx, *next;
output_item_unref (item);
}
-static void
-flush_deferred_syntax (void)
-{
- if (deferred_syntax != NULL)
- {
- output_submit__ (deferred_syntax);
- deferred_syntax = NULL;
- }
-}
-
-/* Submits ITEM to the configured output drivers, and transfers ownership to
- the output subsystem. */
-void
-output_submit (struct output_item *item)
-{
- if (is_text_item (item))
- {
- struct text_item *text = to_text_item (item);
- switch (text_item_get_type (text))
- {
- case TEXT_ITEM_SYNTAX:
- if (!in_command)
- {
- flush_deferred_syntax ();
- deferred_syntax = item;
- return;
- }
- break;
-
- case TEXT_ITEM_COMMAND_OPEN:
- output_submit__ (item);
- flush_deferred_syntax ();
- in_command = true;
- return;
-
- case TEXT_ITEM_COMMAND_CLOSE:
- in_command = false;
- break;
-
- default:
- break;
- }
- }
-
- output_submit__ (item);
-}
-
/* Flushes output to screen devices, so that the user can see
output that doesn't fill up an entire page. */
void
src/ui/gui/sort-cases-dialog.h \
src/ui/gui/split-file-dialog.c \
src/ui/gui/split-file-dialog.h \
- src/ui/gui/syntax-editor-source.c \
- src/ui/gui/syntax-editor-source.h \
src/ui/gui/text-data-import-dialog.c \
src/ui/gui/text-data-import-dialog.h \
src/ui/gui/transpose-dialog.c \
/* PSPPIRE - a graphical user interface for PSPP.
- Copyright (C) 2007, 2010 Free Software Foundation
+ Copyright (C) 2007, 2010, 2011 Free Software Foundation
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
gtk_text_buffer_set_text (buffer, "", 0);
for ( i = 0 ; i < dict_get_document_line_cnt (cd->dict->dict); ++i )
- {
- struct string str;
- ds_init_empty (&str);
- dict_get_document_line (cd->dict->dict, i, &str);
- add_line_to_buffer (buffer, ds_cstr (&str));
- ds_destroy (&str);
- }
+ add_line_to_buffer (buffer, dict_get_document_line (cd->dict->dict, i));
}
GtkWidget *tv = get_widget_assert (cd->xml, "comments-textview1");
GtkWidget *check = get_widget_assert (cd->xml, "comments-checkbutton1");
GtkTextBuffer *buffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (tv));
- const char *existing_docs = dict_get_documents (cd->dict->dict);
str = g_string_new ("\n* Data File Comments.\n\n");
- if ( NULL != existing_docs)
+ if (dict_get_documents (cd->dict->dict) != NULL)
g_string_append (str, "DROP DOCUMENTS.\n");
g_string_append (str, "ADD DOCUMENT\n");
#include "data/procedure.h"
#include "language/command.h"
#include "language/lexer/lexer.h"
-#include "language/syntax-string-source.h"
#include "libpspp/cast.h"
-#include "libpspp/getl.h"
#include "output/driver.h"
#include "ui/gui/psppire-data-store.h"
#include "ui/gui/psppire-output-window.h"
extern struct dataset *the_dataset;
-extern struct source_stream *the_source_stream;
extern PsppireDataStore *the_data_store;
/* Lazy casereader callback function used by execute_syntax. */
}
gboolean
-execute_syntax (struct getl_interface *sss)
+execute_syntax (struct lex_reader *lex_reader)
{
struct lexer *lexer;
gboolean retval = TRUE;
g_return_val_if_fail (proc_has_active_file (the_dataset), FALSE);
- lexer = lex_create (the_source_stream);
-
- getl_append_source (the_source_stream, sss, GETL_BATCH, ERRMODE_CONTINUE);
+ lexer = lex_create ();
+ psppire_set_lexer (lexer);
+ lex_append (lexer, lex_reader);
for (;;)
{
if ( cmd_result_is_failure (result))
{
retval = FALSE;
- if ( source_stream_current_error_mode (the_source_stream)
- == ERRMODE_STOP )
+ if ( lex_get_error_mode (lexer) == LEX_ERROR_STOP )
break;
}
break;
}
- getl_abort_noninteractive (the_source_stream);
-
lex_destroy (lexer);
+ psppire_set_lexer (NULL);
proc_execute (the_dataset);
void
execute_const_syntax_string (const gchar *syntax)
{
- execute_syntax (create_syntax_string_source (syntax));
+ execute_syntax (lex_reader_for_string (syntax));
}
#include <glib.h>
-struct getl_interface;
+struct lex_reader;
-gboolean execute_syntax (struct getl_interface *sss);
+gboolean execute_syntax (struct lex_reader *);
gchar *execute_syntax_string (gchar *syntax);
void execute_const_syntax_string (const gchar *syntax);
#include <gtk/gtk.h>
#include <stdlib.h>
+#include "language/lexer/include-path.h"
#include "libpspp/argv-parser.h"
#include "libpspp/assertion.h"
#include "libpspp/cast.h"
-#include "libpspp/getl.h"
-#include "libpspp/version.h"
#include "libpspp/copyleft.h"
#include "libpspp/str.h"
+#include "libpspp/string-array.h"
+#include "libpspp/version.h"
#include "ui/source-init-opts.h"
#include "gl/configmake.h"
{"no-splash", 'q', no_argument, OPT_NO_SPLASH}
};
-static char *
-get_default_include_path (void)
-{
- struct source_stream *ss;
- struct string dst;
- char **path;
- size_t i;
-
- ss = create_source_stream ();
- path = getl_include_path (ss);
- ds_init_empty (&dst);
- for (i = 0; path[i] != NULL; i++)
- ds_put_format (&dst, " %s", path[i]);
- destroy_source_stream (ss);
-
- return ds_steal_cstr (&dst);
-}
-
static void
usage (void)
{
- char *default_include_path = get_default_include_path ();
+ char *inc_path = string_array_join (include_path_default (), " ");
GOptionGroup *gtk_options;
GOptionContext *ctx;
gchar *gtk_help_base, *gtk_help;
set to `compatible' to disable PSPP extensions\n\
-i, --interactive interpret syntax in interactive mode\n\
-s, --safer don't allow some unsafe operations\n\
-Default search path:%s\n\
+Default search path: %s\n\
\n\
Informative output:\n\
-h, --help display this help and exit\n\
-V, --version output version information and exit\n\
\n\
A non-option argument is interpreted as a .sav or .por file to load.\n"),
- program_name, gtk_help, default_include_path);
+ program_name, gtk_help, inc_path);
- free (default_include_path);
+ free (inc_path);
g_free (gtk_help_base);
emit_bug_reporting_address ();
struct initialisation_parameters
{
- struct source_stream *ss;
const char *data_file;
GtkWidget *splash_window;
};
run_inner_loop (gpointer data)
{
struct initialisation_parameters *ip = data;
- initialize (ip->ss, ip->data_file);
+ initialize (ip->data_file);
g_timeout_add (500, hide_splash_window, ip->splash_window);
struct initialisation_parameters init_p;
gboolean show_splash = TRUE;
struct argv_parser *parser;
- struct source_stream *ss;
const gchar *vers;
set_program_name (argv[0]);
}
- ss = create_source_stream ();
/* Parse our own options.
This must come BEFORE gdk_init otherwise options such as
--help --version which ought to work without an X server, won't.
parser = argv_parser_create ();
argv_parser_add_options (parser, startup_options, N_STARTUP_OPTIONS,
startup_option_callback, &show_splash);
- source_init_register_argv_parser (parser, ss);
+ source_init_register_argv_parser (parser);
if (!argv_parser_run (parser, argc, argv))
exit (EXIT_FAILURE);
argv_parser_destroy (parser);
gdk_init (&argc, &argv);
init_p.splash_window = create_splash_window ();
- init_p.ss = ss;
init_p.data_file = optind < argc ? argv[optind] : NULL;
if ( show_splash )
#include "data/any-reader.h"
#include "data/procedure.h"
-#include "language/syntax-string-source.h"
+#include "language/lexer/lexer.h"
#include "libpspp/message.h"
#include "ui/gui/help-menu.h"
#include "ui/gui/binomial-dialog.h"
load_file (PsppireWindow *de, const gchar *file_name)
{
gchar *native_file_name;
- struct getl_interface *sss;
struct string filename;
+ gchar *syntax;
+ bool ok;
ds_init_empty (&filename);
g_free (native_file_name);
- sss = create_syntax_format_source ("GET FILE=%s.",
- ds_cstr (&filename));
-
+ syntax = g_strdup_printf ("GET FILE=%s.", ds_cstr (&filename));
ds_destroy (&filename);
- if (execute_syntax (sss) )
- return TRUE;
-
- return FALSE;
+ ok = execute_syntax (lex_reader_for_string (syntax));
+ g_free (syntax);
+ return ok;
}
static GtkWidget *
/* PSPPIRE - a graphical user interface for PSPP.
- Copyright (C) 2004, 2006, 2007, 2009 Free Software Foundation
+ Copyright (C) 2004, 2006, 2007, 2009, 2010, 2011 Free Software Foundation
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include <gtk/gtk.h>
#include "data/dictionary.h"
+#include "data/identifier.h"
#include "data/missing-values.h"
#include "data/value-labels.h"
#include "data/variable.h"
g_assert (d);
g_assert (PSPPIRE_IS_DICT (d));
- if ( ! var_is_valid_name (name, false))
+ if ( ! dict_id_is_valid (d->dict, name, false))
return FALSE;
if ( idx < dict_get_var_cnt (d->dict))
psppire_dict_check_name (const PsppireDict *dict,
const gchar *name, gboolean report)
{
- if ( ! var_is_valid_name (name, report ) )
+ if ( ! dict_id_is_valid (dict->dict, name, report ) )
return FALSE;
if (psppire_dict_lookup_var (dict, name))
psppire_dict_rename_var (PsppireDict *dict, struct variable *v,
const gchar *name)
{
- if ( ! var_is_valid_name (name, false))
+ if ( ! dict_id_is_valid (dict->dict, name, false))
return FALSE;
/* Make sure no other variable has this name */
#include "psppire-data-window.h"
#include "psppire-window-register.h"
#include "psppire-syntax-window.h"
-#include "syntax-editor-source.h"
#include "xalloc.h"
GtkTextIter stop)
{
PsppireWindow *win = PSPPIRE_WINDOW (sw);
- const gchar *name = psppire_window_get_filename (win);
- execute_syntax (create_syntax_editor_source (sw->buffer, start, stop, name));
+ struct lex_reader *reader;
+ gchar *text;
+
+ text = gtk_text_buffer_get_text (sw->buffer, &start, &stop, FALSE);
+ reader = lex_reader_for_string (text);
+ g_free (text);
+
+ lex_reader_set_file_name (reader, psppire_window_get_filename (win));
+
+ execute_syntax (reader);
}
psppire_window_set_unsaved (window);
}
-extern struct source_stream *the_source_stream ;
-
static void
psppire_syntax_window_init (PsppireSyntaxWindow *window)
{
window->edit_paste = get_action_assert (xml, "edit_paste");
window->buffer = gtk_text_view_get_buffer (GTK_TEXT_VIEW (text_view));
- window->lexer = lex_create (the_source_stream);
window->sb = get_widget_assert (xml, "statusbar2");
window->text_context = gtk_statusbar_get_context_id (GTK_STATUSBAR (window->sb), "Text Context");
/* <private> */
GtkTextBuffer *buffer; /* The buffer which contains the text */
- struct lexer *lexer; /* Lexer to parse syntax */
GtkWidget *sb;
guint text_context;
/* PSPPIRE - a graphical user interface for PSPP.
- Copyright (C) 2006, 2009, 2010 Free Software Foundation
+ Copyright (C) 2006, 2009, 2010, 2011 Free Software Foundation
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
switch (col)
{
case PSPPIRE_VAR_STORE_COL_LABEL:
- var_set_label (pv, NULL);
+ var_clear_label (pv);
return TRUE;
break;
}
break;
case PSPPIRE_VAR_STORE_COL_LABEL:
{
- var_set_label (pv, text);
+ var_set_label (pv, text,
+ psppire_dict_encoding (var_store->dictionary), true);
return TRUE;
}
break;
#include "data/sys-file-reader.h"
#include "language/lexer/lexer.h"
-#include "language/syntax-string-source.h"
-
-#include "libpspp/getl.h"
#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/version.h"
static void create_icon_factory (void);
-struct source_stream *the_source_stream ;
struct dataset * the_dataset = NULL;
static GtkWidget *the_data_window;
-static void handle_msg (const struct msg *);
static void load_data_file (const char *);
static void
void
-initialize (struct source_stream *ss, const char *data_file)
+initialize (const char *data_file)
{
PsppireDict *dictionary = 0;
fh_init ();
the_dataset = create_dataset ();
-
- the_source_stream = ss;
- msg_init (ss, handle_msg);
+ /* Install the message handler; no lexer is active yet. */
+ psppire_set_lexer (NULL);
dictionary = psppire_dict_new_from_dict (dataset_dict (the_dataset));
void
de_initialize (void)
{
- destroy_source_stream (the_source_stream);
settings_done ();
output_close ();
i18n_done ();
}
static void
-handle_msg (const struct msg *m)
+handle_msg (const struct msg *m_, void *lexer_)
+{
+ struct lexer *lexer = lexer_;
+ struct msg m = *m_;
+
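+ /* A message that carries no location of its own takes the location of
+ the lexer's current token, when a lexer is installed. */
+ if (lexer != NULL && m.file_name == NULL)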
+ {
+ m.file_name = CONST_CAST (char *, lex_get_file_name (lexer));
+ m.first_line = lex_get_first_line_number (lexer, 0);
+ m.last_line = lex_get_last_line_number (lexer, 0);
+ m.first_column = lex_get_first_column (lexer, 0);
+ m.last_column = lex_get_last_column (lexer, 0);
+ }
+
+ message_item_submit (message_item_create (&m));
+}
+
+void
+psppire_set_lexer (struct lexer *lexer)
{
- message_item_submit (message_item_create (m));
+ msg_set_handler (handle_msg, lexer);
}
#ifndef PSPPIRE_H
#define PSPPIRE_H
-struct source_stream;
+struct lexer;
-void initialize (struct source_stream *, const char *data_file);
+void initialize (const char *data_file);
void de_initialize (void);
void psppire_quit (void);
const char * output_file_name (void);
+void psppire_set_lexer (struct lexer *);
+
#endif /* PSPPIRE_H */
+++ /dev/null
-/* PSPPIRE - a graphical user interface for PSPP.
- Copyright (C) 2006, 2009 Free Software Foundation
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-
-#include <config.h>
-
-#include <libpspp/getl.h>
-#include <libpspp/compiler.h>
-#include <libpspp/cast.h>
-#include <libpspp/str.h>
-
-#include <stdlib.h>
-
-#include <gtk/gtk.h>
-
-#include "syntax-editor-source.h"
-#include "psppire-syntax-window.h"
-
-#include "xalloc.h"
-
-struct syntax_editor_source
- {
- struct getl_interface parent;
- GtkTextBuffer *buffer;
- GtkTextIter i;
- GtkTextIter end;
- const gchar *name;
- };
-
-
-static bool
-always_false (const struct getl_interface *i UNUSED)
-{
- return false;
-}
-
-/* Returns the name of the source */
-static const char *
-name (const struct getl_interface *i)
-{
- const struct syntax_editor_source *ses = (const struct syntax_editor_source *) i;
- return ses->name;
-}
-
-
-/* Returns the location within the source */
-static int
-location (const struct getl_interface *i)
-{
- const struct syntax_editor_source *ses = (const struct syntax_editor_source *) i;
-
- return gtk_text_iter_get_line (&ses->i);
-}
-
-
-static bool
-read_line_from_buffer (struct getl_interface *i,
- struct string *line)
-{
- gchar *text;
- GtkTextIter next_line;
-
- struct syntax_editor_source *ses
- = UP_CAST (i, struct syntax_editor_source, parent);
-
- if ( gtk_text_iter_compare (&ses->i, &ses->end) >= 0)
- return false;
-
- next_line = ses->i;
- gtk_text_iter_forward_line (&next_line);
-
- text = gtk_text_buffer_get_text (ses->buffer,
- &ses->i, &next_line,
- FALSE);
- g_strchomp (text);
-
- ds_assign_cstr (line, text);
-
- g_free (text);
-
- gtk_text_iter_forward_line (&ses->i);
-
- return true;
-}
-
-
-static void
-do_close (struct getl_interface *i )
-{
- free (i);
-}
-
-struct getl_interface *
-create_syntax_editor_source (GtkTextBuffer *buffer,
- GtkTextIter start,
- GtkTextIter stop,
- const gchar *nm
- )
-{
- struct syntax_editor_source *ses = xzalloc (sizeof *ses);
-
- ses->buffer = buffer;
- ses->i = start;
- ses->end = stop;
- ses->name = nm;
-
-
- ses->parent.interactive = always_false;
- ses->parent.read = read_line_from_buffer;
- ses->parent.close = do_close;
-
- ses->parent.name = name;
- ses->parent.location = location;
-
-
- return &ses->parent;
-}
+++ /dev/null
-/* PSPPIRE - a graphical user interface for PSPP.
- Copyright (C) 2006 Free Software Foundation
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#ifndef SYNTAX_EDITOR_SOURCE_H
-#define SYNTAX_EDITOR_SOURCE_H
-
-#include <gtk/gtk.h>
-struct getl_interface;
-
-struct syntax_editor;
-
-struct getl_interface *
-create_syntax_editor_source (GtkTextBuffer *buffer,
- GtkTextIter start,
- GtkTextIter stop,
- const gchar *name
- );
-
-
-
-#endif
#include "data/por-file-reader.h"
#include "data/settings.h"
#include "data/sys-file-reader.h"
-#include "language/syntax-file.h"
-#include "language/syntax-string-source.h"
+#include "language/lexer/include-path.h"
+#include "language/lexer/lexer.h"
#include "libpspp/assertion.h"
#include "libpspp/argv-parser.h"
-#include "libpspp/getl.h"
#include "libpspp/llx.h"
#include "libpspp/message.h"
#include "ui/syntax-gen.h"
};
static void
-source_init_option_callback (int id, void *ss_)
+source_init_option_callback (int id, void *aux UNUSED)
{
- struct source_stream *ss = ss_;
-
switch (id)
{
case OPT_ALGORITHM:
case OPT_INCLUDE:
if (!strcmp (optarg, "-"))
- getl_clear_include_path (ss);
+ include_path_clear ();
else
- getl_add_include_dir (ss, optarg);
+ include_path_add (optarg);
break;
case OPT_NO_INCLUDE:
- getl_clear_include_path (ss);
+ include_path_clear ();
break;
case OPT_SAFER:
}
void
-source_init_register_argv_parser (struct argv_parser *ap,
- struct source_stream *ss)
+source_init_register_argv_parser (struct argv_parser *ap)
{
argv_parser_add_options (ap, source_init_options, N_SOURCE_INIT_OPTIONS,
- source_init_option_callback, ss);
+ source_init_option_callback, NULL);
}
#define UI_SOURCE_INIT_OPTS
struct argv_parser;
-struct source_stream;
-void source_init_register_argv_parser (struct argv_parser *,
- struct source_stream *);
+void source_init_register_argv_parser (struct argv_parser *);
#endif /* ui/source/source-init-opts.h */
noinst_LTLIBRARIES += src/ui/terminal/libui.la
src_ui_terminal_libui_la_SOURCES = \
- src/ui/terminal/read-line.c \
- src/ui/terminal/read-line.h \
src/ui/terminal/main.c \
- src/ui/terminal/msg-ui.c \
- src/ui/terminal/msg-ui.h \
- src/ui/terminal/terminal.c \
- src/ui/terminal/terminal.h \
src/ui/terminal/terminal-opts.c \
- src/ui/terminal/terminal-opts.h
-
+ src/ui/terminal/terminal-opts.h \
+ src/ui/terminal/terminal-reader.c \
+ src/ui/terminal/terminal-reader.h \
+ src/ui/terminal/terminal.c \
+ src/ui/terminal/terminal.h
src_ui_terminal_libui_la_CFLAGS = $(NCURSES_CFLAGS)
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006, 2007, 2009, 2010 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2006, 2007, 2009, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
#include "gsl/gsl_errno.h"
#include "language/command.h"
#include "language/lexer/lexer.h"
-#include "language/syntax-file.h"
+#include "language/lexer/include-path.h"
#include "libpspp/argv-parser.h"
#include "libpspp/compiler.h"
-#include "libpspp/getl.h"
#include "libpspp/i18n.h"
#include "libpspp/message.h"
#include "libpspp/version.h"
#include "math/random.h"
#include "output/driver.h"
+#include "output/message-item.h"
#include "ui/debugger.h"
#include "ui/source-init-opts.h"
-#include "ui/terminal/msg-ui.h"
-#include "ui/terminal/read-line.h"
#include "ui/terminal/terminal-opts.h"
+#include "ui/terminal/terminal-reader.h"
#include "ui/terminal/terminal.h"
#include "gl/fatal-signal.h"
#include "gettext.h"
#define _(msgid) gettext (msgid)
-static struct dataset * the_dataset = NULL;
+static struct dataset *the_dataset;
-static struct lexer *the_lexer;
-static struct source_stream *the_source_stream ;
-
-static void add_syntax_file (struct source_stream *, enum syntax_mode,
- const char *file_name);
+static void add_syntax_reader (struct lexer *, const char *file_name,
+ const char *encoding, enum lex_syntax_mode);
static void bug_handler(int sig);
static void fpu_init (void);
+static void output_msg (const struct msg *, void *);
/* Program entry point. */
int
{
struct terminal_opts *terminal_opts;
struct argv_parser *parser;
- enum syntax_mode syntax_mode;
+ enum lex_syntax_mode syntax_mode;
+ char *syntax_encoding;
bool process_statrc;
+ struct lexer *lexer;
set_program_name (argv[0]);
gsl_set_error_handler_off ();
fh_init ();
- the_source_stream = create_source_stream ();
- readln_initialize ();
settings_init ();
terminal_check_size ();
random_init ();
+ lexer = lex_create ();
the_dataset = create_dataset ();
parser = argv_parser_create ();
- terminal_opts = terminal_opts_init (parser, &syntax_mode, &process_statrc);
- source_init_register_argv_parser (parser, the_source_stream);
+ terminal_opts = terminal_opts_init (parser, &syntax_mode, &process_statrc,
+ &syntax_encoding);
+ source_init_register_argv_parser (parser);
if (!argv_parser_run (parser, argc, argv))
exit (EXIT_FAILURE);
terminal_opts_done (terminal_opts, argc, argv);
argv_parser_destroy (parser);
- msg_ui_init (the_source_stream);
+ msg_set_handler (output_msg, lexer);
+ dataset_set_default_syntax_encoding (the_dataset, syntax_encoding);
/* Add syntax files to source stream. */
if (process_statrc)
{
- char *rc = fn_search_path ("rc", getl_include_path (the_source_stream));
+ char *rc = include_path_search ("rc");
if (rc != NULL)
{
- add_syntax_file (the_source_stream, GETL_BATCH, rc);
+ add_syntax_reader (lexer, rc, "Auto", LEX_SYNTAX_AUTO);
free (rc);
}
}
int i;
for (i = optind; i < argc; i++)
- add_syntax_file (the_source_stream, syntax_mode, argv[i]);
+ add_syntax_reader (lexer, argv[i], syntax_encoding, syntax_mode);
}
else
- add_syntax_file (the_source_stream, syntax_mode, "-");
+ add_syntax_reader (lexer, "-", syntax_encoding, syntax_mode);
/* Parse and execute syntax. */
- the_lexer = lex_create (the_source_stream);
+ lex_get (lexer);
for (;;)
{
- int result = cmd_parse (the_lexer, the_dataset);
+ int result = cmd_parse (lexer, the_dataset);
if (result == CMD_EOF || result == CMD_FINISH)
break;
- if (result == CMD_CASCADING_FAILURE &&
- !getl_is_interactive (the_source_stream))
- {
- msg (SE, _("Stopping syntax file processing here to avoid "
- "a cascade of dependent command failures."));
- getl_abort_noninteractive (the_source_stream);
- }
- else if (msg_ui_too_many_errors ())
- getl_abort_noninteractive (the_source_stream);
+ else if (cmd_result_is_failure (result) && lex_token (lexer) != T_STOP)
+ {
+ if (lex_get_error_mode (lexer) == LEX_ERROR_STOP)
+ {
+ msg (MW, _("Error encountered while ERROR=STOP is effective."));
+ lex_discard_noninteractive (lexer);
+ }
+ else if (result == CMD_CASCADING_FAILURE
+ && lex_get_error_mode (lexer) != LEX_ERROR_INTERACTIVE)
+ {
+ msg (SE, _("Stopping syntax file processing here to avoid "
+ "a cascade of dependent command failures."));
+ lex_discard_noninteractive (lexer);
+ }
+ }
+
+ if (msg_ui_too_many_errors ())
+ lex_discard_noninteractive (lexer);
}
random_done ();
settings_done ();
fh_done ();
- lex_destroy (the_lexer);
- destroy_source_stream (the_source_stream);
- readln_uninitialize ();
+ lex_destroy (lexer);
output_close ();
- msg_ui_done ();
i18n_done ();
return msg_ui_any_errors ();
}
-
+\f
static void
fpu_init (void)
{
}
static void
-add_syntax_file (struct source_stream *ss, enum syntax_mode syntax_mode,
- const char *file_name)
+output_msg (const struct msg *m_, void *lexer_)
+{
+ struct lexer *lexer = lexer_;
+ struct msg m = *m_;
+
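+ /* A message that carries no location of its own takes the file name and
+ line range of the lexer's current token. */
+ if (m.file_name == NULL)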
+ {
+ m.file_name = CONST_CAST (char *, lex_get_file_name (lexer));
+ m.first_line = lex_get_first_line_number (lexer, 0);
+ m.last_line = lex_get_last_line_number (lexer, 0);
+ }
+
+ message_item_submit (message_item_create (&m));
+}
+
+static void
+add_syntax_reader (struct lexer *lexer, const char *file_name,
+ const char *encoding, enum lex_syntax_mode syntax_mode)
{
- struct getl_interface *source;
+ struct lex_reader *reader;
+
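+ /* Read interactively from the terminal when FILE_NAME is "-" and stdin
+ is a tty; otherwise read FILE_NAME as a syntax file. */
+ reader = (!strcmp (file_name, "-") && isatty (STDIN_FILENO)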
+ ? terminal_reader_create ()
+ : lex_reader_for_file (file_name, encoding, syntax_mode,
+ LEX_ERROR_CONTINUE));
- source = (!strcmp (file_name, "-") && isatty (STDIN_FILENO)
- ? create_readln_source ()
- : create_syntax_file_source (file_name));
- getl_append_source (ss, source, syntax_mode, ERRMODE_CONTINUE);
+ if (reader)
+ lex_append (lexer, reader);
}
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006, 2010 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#include <config.h>
-
-#include "msg-ui.h"
-#include "libpspp/message.h"
-#include "libpspp/msg-locator.h"
-#include "output/message-item.h"
-
-static void
-handle_msg (const struct msg *m)
-{
- message_item_submit (message_item_create (m));
-}
-
-void
-msg_ui_init (struct source_stream *ss)
-{
- msg_init (ss, handle_msg);
-}
-
-void
-msg_ui_done (void)
-{
- msg_done ();
- msg_locator_done ();
-}
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2006 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#ifndef MSG_UI_H
-#define MSG_UI_H 1
-
-#include <stdbool.h>
-#include <stdio.h>
-
-struct source_stream;
-
-void msg_ui_set_error_file (FILE *);
-void msg_ui_init (struct source_stream *);
-void msg_ui_done (void);
-
-#endif /* msg-ui.h */
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2007, 2009, 2011 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#include <config.h>
-
-#include "ui/terminal/read-line.h"
-
-#include <stdlib.h>
-#include <stdbool.h>
-#include <assert.h>
-#include <errno.h>
-#if ! HAVE_READLINE
-#include <stdint.h>
-#endif
-
-#include "data/file-name.h"
-#include "data/settings.h"
-#include "language/command.h"
-#include "language/prompt.h"
-#include "libpspp/cast.h"
-#include "libpspp/message.h"
-#include "libpspp/str.h"
-#include "libpspp/version.h"
-#include "output/driver.h"
-#include "output/journal.h"
-#include "ui/terminal/msg-ui.h"
-#include "ui/terminal/terminal.h"
-
-#include "gl/xalloc.h"
-
-#include "gettext.h"
-#define _(msgid) gettext (msgid)
-
-#if HAVE_READLINE
-#include <readline/readline.h>
-#include <readline/history.h>
-
-static char *history_file;
-
-static char **complete_command_name (const char *, int, int);
-static char **dont_complete (const char *, int, int);
-#endif /* HAVE_READLINE */
-
-
-struct readln_source
-{
- struct getl_interface parent ;
-
- bool (*interactive_func) (struct string *line,
- enum prompt_style) ;
-};
-
-
-static bool initialised = false;
-
-/* Initialize getl. */
-void
-readln_initialize (void)
-{
- initialised = true;
-
-#if HAVE_READLINE
- rl_basic_word_break_characters = "\n";
- using_history ();
- stifle_history (500);
- if (history_file == NULL)
- {
- const char *home_dir = getenv ("HOME");
- if (home_dir != NULL)
- {
- history_file = xasprintf ("%s/.pspp_history", home_dir);
- read_history (history_file);
- }
- }
-#endif
-}
-
-/* Close getl. */
-void
-readln_uninitialize (void)
-{
- initialised = false;
-
-#if HAVE_READLINE
- if (history_file != NULL && false == settings_get_testing_mode () )
- write_history (history_file);
- clear_history ();
- free (history_file);
-#endif
-}
-
-
-static bool
-read_interactive (struct getl_interface *s,
- struct string *line)
-{
- struct readln_source *is = UP_CAST (s, struct readln_source, parent);
-
- return is->interactive_func (line, prompt_get_style ());
-}
-
-static bool
-always_true (const struct getl_interface *s UNUSED)
-{
- return true;
-}
-
-/* Display a welcoming message. */
-static void
-welcome (void)
-{
- static bool welcomed = false;
- if (welcomed)
- return;
- welcomed = true;
- fputs ("PSPP is free software and you are welcome to distribute copies of "
- "it\nunder certain conditions; type \"show copying.\" to see the "
- "conditions.\nThere is ABSOLUTELY NO WARRANTY for PSPP; type \"show "
- "warranty.\" for details.\n", stdout);
- puts (stat_version);
- readln_initialize ();
- journal_enable ();
-}
-
-/* Gets a line from the user and stores it into LINE.
- Prompts the user with PROMPT.
- Returns true if successful, false at end of file.
- */
-static bool
-readln_read (struct string *line, enum prompt_style style)
-{
- const char *prompt = prompt_get (style);
-#if HAVE_READLINE
- char *string;
-#endif
- bool eof;
-
- assert (initialised);
-
- msg_ui_reset_counts ();
-
- welcome ();
-
- output_flush ();
-
-#if HAVE_READLINE
- rl_attempted_completion_function = (style == PROMPT_FIRST
- ? complete_command_name
- : dont_complete);
- string = readline (prompt);
- if (string == NULL)
- eof = true;
- else
- {
- if (string[0])
- add_history (string);
- ds_assign_cstr (line, string);
- free (string);
- eof = false;
- }
-#else
- fputs (prompt, stdout);
- fflush (stdout);
- if (ds_read_line (line, stdin, SIZE_MAX))
- {
- ds_chomp (line, '\n');
- eof = false;
- }
- else
- eof = true;
-#endif
-
- /* Check whether the size of the window has changed, so that
- the output drivers can adjust their settings as needed. We
- only do this for the first line of a command, as it's
- possible that the output drivers are actually in use
- afterward, and we don't want to confuse them in the middle
- of output. */
- if (style == PROMPT_FIRST)
- terminal_check_size ();
-
- return !eof;
-}
-
-static void
-readln_close (struct getl_interface *i)
-{
- free (i);
-}
-
-/* Creates a source which uses readln to get its line */
-struct getl_interface *
-create_readln_source (void)
-{
- struct readln_source *rlns = xzalloc (sizeof (*rlns));
-
- rlns->interactive_func = readln_read;
-
- rlns->parent.interactive = always_true;
- rlns->parent.read = read_interactive;
- rlns->parent.close = readln_close;
-
- return &rlns->parent;
-}
-
-
-#if HAVE_READLINE
-static char *command_generator (const char *text, int state);
-
-/* Returns a set of command name completions for TEXT.
- This is of the proper form for assigning to
- rl_attempted_completion_function. */
-static char **
-complete_command_name (const char *text, int start, int end UNUSED)
-{
- if (start == 0)
- {
- /* Complete command name at start of line. */
- return rl_completion_matches (text, command_generator);
- }
- else
- {
- /* Otherwise don't do any completion. */
- rl_attempted_completion_over = 1;
- return NULL;
- }
-}
-
-/* Do not do any completion for TEXT. */
-static char **
-dont_complete (const char *text UNUSED, int start UNUSED, int end UNUSED)
-{
- rl_attempted_completion_over = 1;
- return NULL;
-}
-
-/* If STATE is 0, returns the first command name matching TEXT.
- Otherwise, returns the next command name matching TEXT.
- Returns a null pointer when no matches are left. */
-static char *
-command_generator (const char *text, int state)
-{
- static const struct command *cmd;
- const char *name;
-
- if (state == 0)
- cmd = NULL;
- name = cmd_complete (text, &cmd);
- return name ? xstrdup (name) : NULL;
-}
-#endif /* HAVE_READLINE */
+++ /dev/null
-/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2011 Free Software Foundation, Inc.
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>. */
-
-#ifndef READLN_H
-#define READLN_H
-
-#include "libpspp/str.h"
-#include "libpspp/getl.h"
-
-void readln_initialize (void);
-void readln_uninitialize (void);
-
-struct getl_interface *create_readln_source (void);
-
-
-
-#endif /* READLN_H */
-
#include "data/settings.h"
#include "data/file-name.h"
-#include "language/syntax-file.h"
+#include "language/lexer/include-path.h"
#include "libpspp/argv-parser.h"
#include "libpspp/assertion.h"
#include "libpspp/cast.h"
#include "libpspp/compiler.h"
-#include "libpspp/getl.h"
#include "libpspp/llx.h"
#include "libpspp/str.h"
#include "libpspp/string-array.h"
#include "output/driver.h"
#include "output/driver-provider.h"
#include "output/msglog.h"
-#include "ui/terminal/msg-ui.h"
-#include "ui/terminal/read-line.h"
#include "gl/error.h"
#include "gl/progname.h"
struct terminal_opts
{
- enum syntax_mode *syntax_mode;
struct string_map options; /* Output driver options. */
bool has_output_driver;
bool has_terminal_driver;
bool has_error_file;
+ enum lex_syntax_mode *syntax_mode;
bool *process_statrc;
+ char **syntax_encoding;
};
enum
OPT_OUTPUT,
OPT_OUTPUT_OPTION,
OPT_NO_OUTPUT,
+ OPT_BATCH,
OPT_INTERACTIVE,
+ OPT_SYNTAX_ENCODING,
OPT_NO_STATRC,
OPT_HELP,
OPT_VERSION,
{"output", 'o', required_argument, OPT_OUTPUT},
{NULL, 'O', required_argument, OPT_OUTPUT_OPTION},
{"no-output", 0, no_argument, OPT_NO_OUTPUT},
+ {"batch", 'b', no_argument, OPT_BATCH},
{"interactive", 'i', no_argument, OPT_INTERACTIVE},
+ {"syntax-encoding", 0, required_argument, OPT_SYNTAX_ENCODING},
{"no-statrc", 'r', no_argument, OPT_NO_STATRC},
{"help", 'h', no_argument, OPT_HELP},
{"version", 'V', no_argument, OPT_VERSION},
return format_string;
}
-static char *
-get_default_include_path (void)
-{
- struct source_stream *ss;
- struct string dst;
- char **path;
- size_t i;
-
- ss = create_source_stream ();
- path = getl_include_path (ss);
- ds_init_empty (&dst);
- for (i = 0; path[i] != NULL; i++)
- ds_put_format (&dst, " %s", path[i]);
- destroy_source_stream (ss);
-
- return ds_steal_cstr (&dst);
-}
-
static void
usage (void)
{
char *supported_formats = get_supported_formats ();
- char *default_include_path = get_default_include_path ();
+ char *inc_path = string_array_join (include_path_default (), " ");
printf (_("\
PSPP, a program for statistical analysis of sample data.\n\
calculated from broken algorithms\n\
-x, --syntax={compatible|enhanced}\n\
set to `compatible' to disable PSPP extensions\n\
+ -b, --batch interpret syntax in batch mode\n\
-i, --interactive interpret syntax in interactive mode\n\
+ --syntax-encoding=ENCODING specify encoding for syntax files\n\
-s, --safer don't allow some unsafe operations\n\
-Default search path:%s\n\
+Default search path: %s\n\
\n\
Informative output:\n\
-h, --help display this help and exit\n\
-V, --version output version information and exit\n\
\n\
Non-option arguments are interpreted as syntax files to execute.\n"),
- program_name, supported_formats, default_include_path);
+ program_name, supported_formats, inc_path);
free (supported_formats);
- free (default_include_path);
+ free (inc_path);
emit_bug_reporting_address ();
exit (EXIT_SUCCESS);
to->has_output_driver = true;
break;
+ case OPT_BATCH:
+ *to->syntax_mode = LEX_SYNTAX_BATCH;
+ break;
+
case OPT_INTERACTIVE:
- *to->syntax_mode = GETL_INTERACTIVE;
+ *to->syntax_mode = LEX_SYNTAX_INTERACTIVE;
+ break;
+
+ case OPT_SYNTAX_ENCODING:
+ *to->syntax_encoding = optarg;
break;
case OPT_NO_STATRC:
struct terminal_opts *
terminal_opts_init (struct argv_parser *ap,
- enum syntax_mode *syntax_mode, bool *process_statrc)
+ enum lex_syntax_mode *syntax_mode, bool *process_statrc,
+ char **syntax_encoding)
{
struct terminal_opts *to;
- *syntax_mode = GETL_BATCH;
+ *syntax_mode = LEX_SYNTAX_AUTO;
*process_statrc = true;
+ *syntax_encoding = "Auto";
to = xzalloc (sizeof *to);
to->syntax_mode = syntax_mode;
string_map_init (&to->options);
to->has_output_driver = false;
to->has_error_file = false;
+ to->syntax_mode = syntax_mode;
to->process_statrc = process_statrc;
+ to->syntax_encoding = syntax_encoding;
argv_parser_add_options (ap, terminal_argv_options, N_TERMINAL_OPTIONS,
terminal_option_callback, to);
#define UI_TERMINAL_TERMINAL_OPTS_H 1
#include <stdbool.h>
-#include "libpspp/getl.h"
+#include "language/lexer/lexer.h"
struct argv_parser;
+struct lexer;
struct terminal_opts;
struct terminal_opts *terminal_opts_init (struct argv_parser *,
- enum syntax_mode *,
- bool *process_statrc);
+ enum lex_syntax_mode *,
+ bool *process_statrc,
+ char **syntax_encoding);
void terminal_opts_done (struct terminal_opts *, int argc, char *argv[]);
#endif /* ui/terminal/terminal-opts.h */
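
Note on the terminal-opts change above: terminal_opts_init() now writes its
defaults through caller-supplied pointers (LEX_SYNTAX_AUTO and "Auto"), and
the option callback later overwrites them when -b, -i, or --syntax-encoding
is seen. The following is a minimal, self-contained sketch of that
out-parameter pattern only. Every name beginning with sketch_ or SKETCH_ is
hypothetical and stands in for the real PSPP types; the real code also
registers its options with argv_parser, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for enum lex_syntax_mode and the OPT_* values. */
enum sketch_syntax_mode { SKETCH_AUTO, SKETCH_INTERACTIVE, SKETCH_BATCH };
enum sketch_option
  {
    SKETCH_OPT_BATCH,
    SKETCH_OPT_INTERACTIVE,
    SKETCH_OPT_SYNTAX_ENCODING
  };

/* Hypothetical stand-in for struct terminal_opts: it only holds pointers
   to variables owned by the caller. */
struct sketch_opts
  {
    enum sketch_syntax_mode *syntax_mode;
    const char **syntax_encoding;
  };

/* Stores the defaults into the caller's variables and remembers where
   they live, so that later options can overwrite them. */
void
sketch_opts_init (struct sketch_opts *to,
                  enum sketch_syntax_mode *syntax_mode,
                  const char **syntax_encoding)
{
  assert (to != NULL && syntax_mode != NULL && syntax_encoding != NULL);

  *syntax_mode = SKETCH_AUTO;
  *syntax_encoding = "Auto";

  to->syntax_mode = syntax_mode;
  to->syntax_encoding = syntax_encoding;
}

/* Applies one parsed option.  ARG is used only by options that take an
   argument. */
void
sketch_option_callback (struct sketch_opts *to, enum sketch_option id,
                        const char *arg)
{
  switch (id)
    {
    case SKETCH_OPT_BATCH:
      *to->syntax_mode = SKETCH_BATCH;
      break;

    case SKETCH_OPT_INTERACTIVE:
      *to->syntax_mode = SKETCH_INTERACTIVE;
      break;

    case SKETCH_OPT_SYNTAX_ENCODING:
      *to->syntax_encoding = arg;
      break;
    }
}
```

With this shape the caller keeps ownership of the mode and encoding
variables, so they remain valid after the options structure is destroyed.
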
--- /dev/null
+/* PSPP - a program for statistical analysis.
+ Copyright (C) 1997-9, 2000, 2007, 2009, 2010, 2011 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "ui/terminal/terminal-reader.h"
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "data/file-name.h"
+#include "data/settings.h"
+#include "language/command.h"
+#include "language/lexer/lexer.h"
+#include "libpspp/assertion.h"
+#include "libpspp/cast.h"
+#include "libpspp/message.h"
+#include "libpspp/prompt.h"
+#include "libpspp/str.h"
+#include "libpspp/version.h"
+#include "output/driver.h"
+#include "output/journal.h"
+#include "ui/terminal/terminal.h"
+
+#include "gl/minmax.h"
+#include "gl/xalloc.h"
+
+#include "gettext.h"
+#define _(msgid) gettext (msgid)
+
+struct terminal_reader
+ {
+ struct lex_reader reader;
+ struct substring s;
+ size_t offset;
+ bool eof;
+ };
+
+static int n_terminal_readers;
+
+static void readline_init (void);
+static void readline_done (void);
+static struct substring readline_read (enum prompt_style);
+
+/* Display a welcoming message. */
+static void
+welcome (void)
+{
+ static bool welcomed = false;
+ if (welcomed)
+ return;
+ welcomed = true;
+ fputs ("PSPP is free software and you are welcome to distribute copies of "
+ "it\nunder certain conditions; type \"show copying.\" to see the "
+ "conditions.\nThere is ABSOLUTELY NO WARRANTY for PSPP; type \"show "
+ "warranty.\" for details.\n", stdout);
+ puts (stat_version);
+ journal_enable ();
+}
+
+static struct terminal_reader *
+terminal_reader_cast (struct lex_reader *r)
+{
+ return UP_CAST (r, struct terminal_reader, reader);
+}
+
+static size_t
+terminal_reader_read (struct lex_reader *r_, char *buf, size_t n,
+ enum prompt_style prompt_style)
+{
+ struct terminal_reader *r = terminal_reader_cast (r_);
+ size_t chunk;
+
+ if (r->offset >= r->s.length && !r->eof)
+ {
+ welcome ();
+ msg_ui_reset_counts ();
+ output_flush ();
+
+ ss_dealloc (&r->s);
+ r->s = readline_read (prompt_style);
+ r->offset = 0;
+ r->eof = ss_is_empty (r->s);
+
+ /* Check whether the size of the window has changed, so that
+ the output drivers can adjust their settings as needed. We
+ only do this for the first line of a command, as it's
+ possible that the output drivers are actually in use
+ afterward, and we don't want to confuse them in the middle
+ of output. */
+ if (prompt_style == PROMPT_FIRST)
+ terminal_check_size ();
+ }
+
+ chunk = MIN (n, r->s.length - r->offset);
+ memcpy (buf, r->s.string + r->offset, chunk);
+ r->offset += chunk;
+ return chunk;
+}
+
+static void
+terminal_reader_close (struct lex_reader *r_)
+{
+ struct terminal_reader *r = terminal_reader_cast (r_);
+
+ ss_dealloc (&r->s);
+ free (r->reader.file_name);
+ free (r);
+
+ if (!--n_terminal_readers)
+ readline_done ();
+}
+
+static struct lex_reader_class terminal_reader_class =
+ {
+ terminal_reader_read,
+ terminal_reader_close
+ };
+
+/* Creates and returns a lex_reader that reads syntax interactively
+   from the terminal. */
+struct lex_reader *
+terminal_reader_create (void)
+{
+ struct terminal_reader *r;
+
+ if (!n_terminal_readers++)
+ readline_init ();
+
+ r = xzalloc (sizeof *r);
+ r->reader.class = &terminal_reader_class;
+ r->reader.syntax = LEX_SYNTAX_INTERACTIVE;
+ r->reader.error = LEX_ERROR_INTERACTIVE;
+ r->reader.file_name = NULL;
+ r->s = ss_empty ();
+ r->offset = 0;
+ r->eof = false;
+ return &r->reader;
+}
+\f
+#if HAVE_READLINE
+#include <readline/readline.h>
+#include <readline/history.h>
+
+static char *history_file;
+
+static char **complete_command_name (const char *, int, int);
+static char **dont_complete (const char *, int, int);
+static char *command_generator (const char *text, int state);
+
+static void
+readline_init (void)
+{
+ rl_basic_word_break_characters = "\n";
+ using_history ();
+ stifle_history (500);
+ if (history_file == NULL)
+ {
+ const char *home_dir = getenv ("HOME");
+ if (home_dir != NULL)
+ {
+ history_file = xasprintf ("%s/.pspp_history", home_dir);
+ read_history (history_file);
+ }
+ }
+}
+
+static void
+readline_done (void)
+{
+ if (history_file != NULL && false == settings_get_testing_mode () )
+ write_history (history_file);
+ clear_history ();
+ free (history_file);
+}
+
+static const char *
+readline_prompt (enum prompt_style style)
+{
+ switch (style)
+ {
+ case PROMPT_FIRST:
+ return "PSPP> ";
+
+ case PROMPT_LATER:
+ return " > ";
+
+ case PROMPT_DATA:
+ return "data> ";
+
+ case PROMPT_COMMENT:
+ return "comment> ";
+
+ case PROMPT_DOCUMENT:
+ return "document> ";
+
+ case PROMPT_DO_REPEAT:
+ return "DO REPEAT> ";
+ }
+
+ NOT_REACHED ();
+}
+
+static struct substring
+readline_read (enum prompt_style style)
+{
+ char *string;
+
+ rl_attempted_completion_function = (style == PROMPT_FIRST
+ ? complete_command_name
+ : dont_complete);
+ string = readline (readline_prompt (style));
+ if (string != NULL)
+ {
+ char *end;
+
+ if (string[0])
+ add_history (string);
+
+ end = strchr (string, '\0');
+ *end = '\n';
+ return ss_buffer (string, end - string + 1);
+ }
+ else
+ return ss_empty ();
+}
+
+/* Returns a set of command name completions for TEXT.
+ This is of the proper form for assigning to
+ rl_attempted_completion_function. */
+static char **
+complete_command_name (const char *text, int start, int end UNUSED)
+{
+ if (start == 0)
+ {
+ /* Complete command name at start of line. */
+ return rl_completion_matches (text, command_generator);
+ }
+ else
+ {
+ /* Otherwise don't do any completion. */
+ rl_attempted_completion_over = 1;
+ return NULL;
+ }
+}
+
+/* Do not do any completion for TEXT. */
+static char **
+dont_complete (const char *text UNUSED, int start UNUSED, int end UNUSED)
+{
+ rl_attempted_completion_over = 1;
+ return NULL;
+}
+
+/* If STATE is 0, returns the first command name matching TEXT.
+ Otherwise, returns the next command name matching TEXT.
+ Returns a null pointer when no matches are left. */
+static char *
+command_generator (const char *text, int state)
+{
+ static const struct command *cmd;
+ const char *name;
+
+ if (state == 0)
+ cmd = NULL;
+ name = cmd_complete (text, &cmd);
+ return name ? xstrdup (name) : NULL;
+}
+#else /* !HAVE_READLINE */
+static void
+readline_init (void)
+{
+}
+
+static void
+readline_done (void)
+{
+}
+
+static struct substring
+readline_read (enum prompt_style style)
+{
+ const char *prompt = prompt_get (style);
+ struct string line;
+
+ fputs (prompt, stdout);
+ fflush (stdout);
+ ds_init_empty (&line);
+ ds_read_line (&line, stdin, SIZE_MAX);
+
+ return line.ss;
+}
+#endif /* !HAVE_READLINE */
--- /dev/null
+/* PSPP - a program for statistical analysis.
+ Copyright (C) 1997-9, 2000, 2010 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#ifndef TERMINAL_READER_H
+#define TERMINAL_READER_H
+
+struct lex_reader *terminal_reader_create (void);
+
+#endif /* terminal-reader.h */
+
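
Note on terminal_reader_read() above: the reader buffers one whole line from
the terminal and then hands it to the lexer in pieces of at most N bytes,
advancing an offset until the line is used up. The following is a minimal
sketch of just that chunking step, under the assumption that the line is
already in memory. struct line_buffer and line_buffer_read() are
hypothetical names, not part of PSPP; prompting, end-of-file tracking, and
freeing the line are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the buffering fields of struct
   terminal_reader. */
struct line_buffer
  {
    const char *line;           /* Current line, including its new-line. */
    size_t length;              /* Number of bytes in LINE. */
    size_t offset;              /* Bytes already handed to the caller. */
  };

/* Copies up to N bytes from the unread part of B into BUF and returns
   the number of bytes copied.  Returns 0 once the line is exhausted. */
size_t
line_buffer_read (struct line_buffer *b, char *buf, size_t n)
{
  size_t remaining, chunk;

  assert (b != NULL && buf != NULL);
  assert (b->offset <= b->length);

  remaining = b->length - b->offset;
  chunk = n < remaining ? n : remaining;
  memcpy (buf, b->line + b->offset, chunk);
  b->offset += chunk;
  return chunk;
}
```

In the real reader a return value of 0 is only produced at end of input,
because an exhausted buffer triggers a fresh call to readline_read() first.
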
EXECUTE.
])
AT_CHECK([pspp -O format=csv wkday.sps], [0], [dnl
-wkday.sps:20.1-2: warning: Data for variable wkday2 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.2: warning: Data for variable wkday2 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-3: warning: Data for variable wkday3 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.3: warning: Data for variable wkday3 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-4: warning: Data for variable wkday4 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.4: warning: Data for variable wkday4 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-5: warning: Data for variable wkday5 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.5: warning: Data for variable wkday5 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-6: warning: Data for variable wkday6 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.6: warning: Data for variable wkday6 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-7: warning: Data for variable wkday7 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.7: warning: Data for variable wkday7 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-8: warning: Data for variable wkday8 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.8: warning: Data for variable wkday8 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-9: warning: Data for variable wkday9 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.9: warning: Data for variable wkday9 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
-wkday.sps:20.1-10: warning: Data for variable wkday10 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
+wkday.sps:20.1-20.10: warning: Data for variable wkday10 is not valid as format WKDAY: Unrecognized weekday name. At least the first two letters of an English weekday name must be specified.
])
AT_CHECK([cat wkday.out], [0], [dnl
. . . . . . . . . @&t@
EXECUTE.
])
AT_CHECK([pspp -O format=csv month.sps], [0], [dnl
-month.sps:15.1-4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:15.1-15.4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:15.1-5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:15.1-15.5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:15.1-6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:15.1-15.6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:15.1-7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:15.1-15.7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:15.1-8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:15.1-15.8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:15.1-9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:15.1-15.9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:15.1-10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:15.1-15.10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:26.1-10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:26.1-26.10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.3: warning: Data for variable month3 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.4: warning: Data for variable month4 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.5: warning: Data for variable month5 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.6: warning: Data for variable month6 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.7: warning: Data for variable month7 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.8: warning: Data for variable month8 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.9: warning: Data for variable month9 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
-month.sps:39.1-10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
+month.sps:39.1-39.10: warning: Data for variable month10 is not valid as format MONTH: Unrecognized month format. Months may be specified as Arabic or Roman numerals or as at least 3 letters of their English names.
])
AT_CHECK([cat month.out], [0], [dnl
. . . . . . . . @&t@
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xd4: Misplaced type 4 record.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xd4: Unrecognized record type 8.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xb4: Invalid variable name `$UM1'.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xb4: Invalid variable name `TO'.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xb4: Bad width 256 for variable VAR1.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xd4: Duplicate variable name `VAR1'.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xb4: Variable label indicator field is not 0 or 1.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
["error: `sys-file.sav' near offset 0xb4: Numeric missing value indicator field is not -3, -2, 0, 1, 2, or 3."
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
["error: `sys-file.sav' near offset 0xb4: String missing value indicator field is not 0, 1, 2, or 3."
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xb4: Missing string continuation record.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0xc0: Unknown variable format 255.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav': Weighting variable must be numeric (not string variable `STR1').
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0x4c: Variable index 3 not in valid range 1...2.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1],
[error: `sys-file.sav' near offset 0x4c: Variable index 3 refers to long string continuation.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0x12c: Duplicate type 6 (document) record.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0xd4: Number of document lines (0) must be greater than 0 and less than 26843545.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0xd8: Record type 7 subtype 3 too large.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0xd8: Floating-point representation indicated by system file (2) differs from expected (1).
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
"warning: `sys-file.sav' near offset 0xd8: NUM1 listed in very long string record with width 00255, which requires only one segment."
error: `sys-file.sav' near offset 0xd8: Very long string NUM1 overflows dictionary.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0x4f8: Very long string with width 256 has segment 1 of width 9 (expected 4).
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0xd4: Invalid number of labels 2147483647.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0xe8: Variable index record (type 4) does not immediately follow value label record (type 3) as it should.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0xec: Number of variables associated with a value label (0) is not between 1 and the number of variables (1).
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
error: `sys-file.sav' near offset 0xf4: Value labels may not be added to long string variables (e.g. STR1) using records types 3 and 4.
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv sys-file.sps], [1], [dnl
"error: `sys-file.sav' near offset 0xf4: Variables associated with value label are not all of identical type. Variable STR1 is string, but variable NUM1 is numeric."
-
-sys-file.sps:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
3,4
5,6
7,8
-
-sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
Table: Data List
num1,num2
1,2
-
-sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
Table: Data List
str14
one data item @&t@
-
-sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
Table: Data List
num1,num2,str4,str8,str15
-99,0,,abcdefgh,0123 @&t@
-
-sys-file.sps:2: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
done
AT_CLEANUP
#include "gettext.h"
#define _(msgid) gettext (msgid)
-#define VAR_NAME_LEN 64
+#define ID_MAX_LEN 64
struct sfm_reader
{
while (ftello (r->file) - start < size * count)
{
long long posn = ftello (r->file);
- char var_name[VAR_NAME_LEN + 1];
+ char var_name[ID_MAX_LEN + 1];
int var_name_len;
int n_values;
int width;
/* Read variable name. */
var_name_len = read_int (r);
- if (var_name_len > VAR_NAME_LEN)
+ if (var_name_len > ID_MAX_LEN)
sys_error (r, _("Variable name length in long string value label "
"record (%d) exceeds %d-byte limit."),
- var_name_len, VAR_NAME_LEN);
+ var_name_len, ID_MAX_LEN);
read_string (r, var_name, var_name_len + 1);
/* Read width, number of values. */
AT_BANNER([DO REPEAT])
-AT_SETUP([DO REPEAT -- ordinary])
+AT_SETUP([DO REPEAT -- simple])
+AT_DATA([do-repeat.sps], [dnl
+INPUT PROGRAM.
+STRING y(A1).
+DO REPEAT xval = 1 2 3 / yval = 'a' 'b' 'c' / var = a b c.
+COMPUTE x=xval.
+COMPUTE y=yval.
+COMPUTE var=xval.
+END CASE.
+END REPEAT.
+END FILE.
+END INPUT PROGRAM.
+LIST.
+])
+AT_CHECK([pspp -o pspp.csv do-repeat.sps])
+AT_CHECK([cat pspp.csv], [0], [dnl
+Table: Data List
+y,x,a,b,c
+a,1.00,1.00,. ,. @&t@
+b,2.00,. ,2.00,. @&t@
+c,3.00,. ,. ,3.00
+])
+AT_CLEANUP
+
+AT_SETUP([DO REPEAT -- containing BEGIN DATA])
+AT_DATA([do-repeat.sps], [dnl
+DO REPEAT offset = 1 2 3.
+DATA LIST NOTABLE /x 1-2.
+BEGIN DATA.
+10
+20
+30
+END DATA.
+COMPUTE x = x + offset.
+LIST.
+END REPEAT.
+])
+AT_CHECK([pspp -o pspp.csv do-repeat.sps])
+AT_CHECK([cat pspp.csv], [0], [dnl
+Table: Data List
+x
+11
+21
+31
+
+Table: Data List
+x
+12
+22
+32
+
+Table: Data List
+x
+13
+23
+33
+])
+AT_CLEANUP
+
+AT_SETUP([DO REPEAT -- dummy vars not expanded in include files])
+AT_DATA([include.sps], [dnl
+COMPUTE y = y + x + 10.
+])
+AT_DATA([do-repeat.sps], [dnl
+INPUT PROGRAM.
+COMPUTE x = 0.
+COMPUTE y = 0.
+END CASE.
+END FILE.
+END INPUT PROGRAM.
+
+DO REPEAT x = 1 2 3.
+INCLUDE 'include.sps'.
+END REPEAT.
+
+LIST.
+])
+AT_CHECK([pspp -o pspp.csv do-repeat.sps], [0], [dnl
+do-repeat.sps:8: warning: DO REPEAT: Dummy variable name `x' hides dictionary variable `x'.
+])
+AT_CHECK([cat pspp.csv], [0], [dnl
+do-repeat.sps:8: warning: DO REPEAT: Dummy variable name `x' hides dictionary variable `x'.
+
+Table: Data List
+x,y
+.00,30.00
+])
+AT_CLEANUP
+
+AT_SETUP([DO REPEAT -- nested])
AT_DATA([do-repeat.sps], [dnl
DATA LIST NOTABLE /a 1.
BEGIN DATA.
DATA LIST NOTABLE /x 1.
DO REPEAT y = 1 TO 10.
])
-AT_CHECK([pspp -o pspp.csv do-repeat.sps], [1], [dnl
-error: DO REPEAT: DO REPEAT without END REPEAT.
-error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
-])
-AT_CHECK([cat pspp.csv], [0], [dnl
-error: DO REPEAT: DO REPEAT without END REPEAT.
-
-error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
+AT_CHECK([pspp -O format=csv do-repeat.sps], [1], [dnl
+error: DO REPEAT: Syntax error at end of input: expecting `END'.
])
AT_CLEANUP
C,F8.0
D,F8.0
-data-list.pspp:3.9-13: warning: Data for variable D is not valid as format F: Number followed by garbage.
+data-list.pspp:3.9-3.13: warning: Data for variable D is not valid as format F: Number followed by garbage.
Table: Data List
A,B,C,D
list.
])
AT_CHECK([pspp -O format=csv data-list.pspp], [0], [dnl
-data-list.pspp:8.1-3: warning: Data for variable count is not valid as format F: Field contents are not numeric.
+data-list.pspp:8.1-8.3: warning: Data for variable count is not valid as format F: Field contents are not numeric.
-data-list.pspp:11.1-3: warning: Data for variable count is not valid as format F: Field contents are not numeric.
+data-list.pspp:11.1-11.3: warning: Data for variable count is not valid as format F: Field contents are not numeric.
Table: Data List
start,end,count
dnl interactive mode.
AT_CHECK([echo "GET /FILE='nonexistent.sav'." | pspp -O format=csv], [1], [dnl
error: An error occurred while opening `nonexistent.sav': No such file or directory.
-
--:1: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
AT_CLEANUP
])
AT_CHECK([pspp -O format=csv input-program.sps], [1], [dnl
input-program.sps:3: error: BEGIN DATA: BEGIN DATA is not allowed inside INPUT PROGRAM.
-
-input-program.sps:4: error: Unknown command `123456789'.
-
-input-program.sps:5: error: Unknown command `END DATA'.
])
AT_CLEANUP
DESCRIPTIVES x.
])
AT_CHECK([pspp -O format=csv input-program.sps], [1], [dnl
-error: DESCRIPTIVES: Syntax error at end of file: expecting `BEGIN'.
+error: DESCRIPTIVES: Syntax error at end of input: expecting `BEGIN'.
])
AT_CLEANUP
LIST.
])
AT_CHECK([pspp -O format=csv print.sps], [1], [dnl
-print.sps:7: error: PRINT: Syntax error at `F8.2': expecting a valid subcommand.
+print.sps:7.7-7.10: error: PRINT: Syntax error at `F8.2': expecting a valid subcommand.
Table: Data List
a,b
missing-values.sps:8: error: MISSING VALUES: Truncating missing value to maximum acceptable length (8 bytes).
-missing-values.sps:11: error: MISSING VALUES: Syntax error at `THRU': expecting string.
+missing-values.sps:11.26-11.29: error: MISSING VALUES: Syntax error at `THRU': expecting string.
missing-values.sps:11: error: MISSING VALUES: THRU is not a variable name.
AT_CAPTURE_FILE([evaluate.sps])
m4_pushdef([i], [2])
AT_CHECK([pspp --testing-mode --error-file=- --no-output evaluate.sps],
- [m4_if(m4_bregexp([m4_foreach([check], [m4_shift($@)], [m4_argn(3, check)])], [error:]), [-1], [0], [1])],
+ [m4_if(m4_bregexp([m4_foreach([check], [m4_shift($@)], [m4_argn(3, check)])], [error:]), [-1], [0], [1])],
+ [stdout])
+ # Use sed to transform "file:line.column:" into plain "file:line:",
+ # because column numbers change between opt and noopt versions.
+ AT_CHECK([[sed 's/\(evaluate.sps:[0-9]\{1,\}\)\.[0-9]\{1,\}:/\1:/' stdout]],
+ [0],
[m4_foreach([check], [m4_shift($@)],
[m4_define([i], m4_incr(i))dnl
m4_if(m4_argn(3, check), [], [], [evaluate.sps:[]i[]: m4_argn(3, check)
[error: DEBUG EVALUATE: Syntax error at `>'.]],
dnl # ~= token can't be split:
[[1 ~ = 1], [error],
- [error: DEBUG EVALUATE: Syntax error at `NOT': expecting end of command.]])
+ [error: DEBUG EVALUATE: Syntax error at `~': expecting end of command.]])
CHECK_EXPR_EVAL([exp lg10 ln sqrt abs mod mod10 rnd trunc],
[[exp(10)], [22026.47]],
AT_CHECK([pspp -O format=csv parse.sps], [1], [dnl
parse.sps:10: error: IF: Unknown identifier y.
-parse.sps:10: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
+parse.sps:11: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
AT_CLEANUP
2.00
])
AT_CLEANUP
+
+AT_SETUP([lexer properly reports scan errors])
+AT_DATA([lexer.sps], [dnl
+x'123'
+x'1x'
+u''
+u'012345678'
+u'd800'
+u'110000'
+'foo
+'very long unterminated string that be ellipsized in its error message
+1e .x
+`
+�
+])
+AT_CHECK([pspp -O format=csv lexer.sps], [1], [dnl
+"lexer.sps:1.1-1.6: error: Syntax error at `x'123'': String of hex digits has 3 characters, which is not a multiple of 2."
+
+lexer.sps:2.1-2.5: error: Syntax error at `x'1x'': `x' is not a valid hex digit.
+
+"lexer.sps:3.1-3.3: error: Syntax error at `u''': Unicode string contains 0 bytes, which is not in the valid range of 1 to 8 bytes."
+
+"lexer.sps:4.1-4.12: error: Syntax error at `u'012345678'': Unicode string contains 9 bytes, which is not in the valid range of 1 to 8 bytes."
+
+lexer.sps:5.1-5.7: error: Syntax error at `u'd800'': U+D800 is not a valid Unicode code point.
+
+lexer.sps:6.1-6.9: error: Syntax error at `u'110000'': U+110000 is not a valid Unicode code point.
+
+lexer.sps:7.1-7.4: error: Syntax error at `'foo': Unterminated string constant.
+
+lexer.sps:8.1-8.70: error: Syntax error at `'very long unterminated string that be ellipsized in its err...': Unterminated string constant.
+
+lexer.sps:9.1-9.2: error: Syntax error at `1e': Missing exponent following `1e'.
+
+lexer.sps:9.4: error: Syntax error at `.': Unexpected `.' in middle of command.
+
+lexer.sps:9: error: Unknown command `x'.
+
+lexer.sps:10.1: error: Syntax error at ``': Bad character ``' in input.
+
+lexer.sps:11.1: error: Syntax error at `�': Bad character U+FFFD in input.
+])
+AT_CLEANUP
AT_CHECK([pspp -O format=csv q2c.sps], [1], [dnl
q2c.sps:8: error: EXAMINE: VARIABLES subcommand must be given.
-q2c.sps:9: error: ONEWAY: Syntax error at end of command: expecting variable name.
+q2c.sps:9.7: error: ONEWAY: Syntax error at end of command: expecting variable name.
q2c.sps:10: error: CROSSTABS: TABLES subcommand must be given.
])
AT_CHECK([pspp -O format=csv dup-variables.sps], [1],
["dup-variables.sps:24: error: AGGREGATE: Variable name N_BREAK is not unique within the aggregate file dictionary, which contains the aggregate variables and the break variables."
-
-dup-variables.sps:24: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
-AT_CLEANUP
\ No newline at end of file
+AT_CLEANUP
x into Rx(RANK of x)
rank.sps:14: error: RANK: DEBUG XFORM FAIL transformation executed
-
-rank.sps:14: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
AT_CLEANUP
/RANK INTO foo bar wiz.
])
AT_CHECK([pspp -O format=csv rank.sps], [1], [dnl
-rank.sps:15: error: RANK: Syntax error at end of command: expecting `@{:@'.
+rank.sps:15.1: error: RANK: Syntax error at end of command: expecting `@{:@'.
-rank.sps:19: error: RANK: Syntax error at `d': expecting integer.
+rank.sps:19.11: error: RANK: Syntax error at `d': expecting integer.
rank.sps:25: error: RANK: Variable x already exists.
dnl Create a file "batch.sps" that is valid syntax only in batch mode.
m4_define([CREATE_BATCH_SPS],
[AT_DATA([batch.sps], [dnl
-input program.
-+ loop #i = 1 to 5.
-+ compute z = #i
-+ end case.
-+ end loop
-end file.
-end input program.
+input program
+loop #i = 1 to 5
++ compute z = #i
++ end case
+end loop
+end file
+end input program
])])
AT_SETUP([INSERT SYNTAX=INTERACTIVE])
AT_DATA([insert.sps], [dnl
INSERT
FILE='batch.sps'
- SYNTAX=INTERACTIVE.
+ SYNTAX=interactive.
LIST.
])
AT_CHECK([pspp -o pspp.csv insert.sps], [1], [dnl
-batch.sps:2: error: INPUT PROGRAM: Syntax error at `+': expecting command name.
-batch.sps:3: error: INPUT PROGRAM: Syntax error at `+': expecting command name.
-batch.sps:5: error: INPUT PROGRAM: Syntax error at `+': expecting command name.
-batch.sps:7: error: Input program did not create any variables.
+batch.sps:2.1-2.4: error: INPUT PROGRAM: Syntax error at `loop': expecting end of command.
+batch.sps:3: error: COMPUTE: COMPUTE is allowed only after the active file has been defined or inside INPUT PROGRAM.
+batch.sps:4: error: END CASE: END CASE is allowed only inside INPUT PROGRAM.
insert.sps:4: error: LIST: LIST is allowed only after the active file has been defined.
])
AT_CLEANUP
* The following line is erroneous
DISPLAY AKSDJ.
+LIST.
])])
AT_SETUP([INSERT ERROR=STOP])
CREATE_ERROR_SPS
AT_DATA([insert.sps], [INSERT FILE='error.sps' ERROR=STOP.
-LIST.
])
AT_CHECK([pspp -o pspp.csv insert.sps], [1], [dnl
error.sps:10: error: DISPLAY: AKSDJ is not a variable name.
warning: Error encountered while ERROR=STOP is effective.
-error.sps:10: error: Stopping syntax file processing here to avoid a cascade of dependent command failures.
])
AT_CLEANUP
AT_SETUP([INSERT ERROR=CONTINUE])
CREATE_ERROR_SPS
AT_DATA([insert.sps], [INSERT FILE='error.sps' ERROR=CONTINUE.
-LIST.
])
AT_CHECK([pspp -o pspp.csv insert.sps], [1], [dnl
error.sps:10: error: DISPLAY: AKSDJ is not a variable name.
LIST.
])
AT_CHECK([pspp -O format=csv insert.sps], [1], [dnl
-insert.sps:3: error: INSERT: Can't find `nonexistent' in include file search path.
+insert.sps:2: error: INSERT: Can't find `nonexistent' in include file search path.
insert.sps:6: error: LIST: LIST is allowed only after the active file has been defined.
])