pintos-os.org Git - pspp-builds.git/commit

lexer: Reimplement for better testability and internationalization.

This commit reimplements PSPP lexical analysis from the ground up.
From a PSPP user's perspective, this should make PSPP more reliable
and make it easier to work with syntax files in non-ASCII encodings.
See the changes to NEWS for more details.

From a developer's perspective, the most visible change may be that
strings within tokens are now always encoded in UTF-8, regardless of
the syntax file's encoding.  Many of the changes in this commit are
due to this, especially those to functions that check for valid
identifiers: an identifier in UTF-8 is not necessarily the same length
when encoded in the dictionary's encoding, but limits on identifier
length must be enforced in the dictionary's encoding (otherwise it
might not be possible to write out a valid system file, since the
identifier might not fit in the fixed length fields in such files).
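
As a rough illustration of the length issue described above (a sketch
only, not code from this commit), the following measures a UTF-8
identifier in a second encoding before applying the limit.  ISO-8859-1
stands in for the dictionary's encoding, and the 64-byte limit and all
sketch_* names are assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for illustration: maximum identifier length, in bytes,
   measured in the dictionary's encoding. */
#define SKETCH_ID_MAX_LEN 64

/* Returns the number of bytes that the UTF-8 string S would occupy in
   ISO-8859-1, or (size_t) -1 if S is malformed or contains a code point
   above U+00FF that ISO-8859-1 cannot represent. */
size_t
sketch_latin1_length (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  while (*s != '\0')
    {
      unsigned char c = (unsigned char) *s;

      if (c < 0x80)
        s += 1;         /* ASCII: one byte in both encodings. */
      else if ((c == 0xc2 || c == 0xc3)
               && ((unsigned char) s[1] & 0xc0) == 0x80)
        s += 2;         /* U+0080...U+00FF: two bytes in UTF-8, one here. */
      else
        return (size_t) -1;
      n++;
    }
  return n;
}

/* Returns true if UTF8_ID is nonempty, representable in the stand-in
   dictionary encoding, and within the length limit in that encoding. */
bool
sketch_id_fits (const char *utf8_id)
{
  size_t len = sketch_latin1_length (utf8_id);

  return len != (size_t) -1 && len >= 1 && len <= SKETCH_ID_MAX_LEN;
}
```

With these definitions, sketch_latin1_length ("caf\xc3\xa9") counts 4
bytes even though the UTF-8 string is 5 bytes long, which is why the
two lengths cannot be used interchangeably.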

Another important change concerns special syntax.  Before, the parser
had to handle some of it by providing feedback to the lexer.  Now the
lexer is sophisticated enough to analyze all PSPP syntax into tokens by
itself.  This permitted some other improvements:

  - An arbitrary number of tokens of lookahead, up to the end of the
    current command, is now supported using lex_next_token() and
    related functions.

  - Before, some command implementations had a special attribute that
    meant that the top-level PSPP command parser would not consume the
    final token of the command name (because that token was not
    followed by tokenizable syntax).  This is no longer necessary and
    has been removed.

  - Before, each command implementation was responsible for ensuring
    that valid command syntax was not followed by trailing garbage,
    often by calling lex_end_of_command() as the last step of parsing.
    This is no longer necessary; the main command parser now checks
    this itself.
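
The lookahead and trailing-garbage points above can be pictured with a
toy model (a sketch only; it is not the lexer from this commit).  It
assumes a command is tokenized as a whole, so that looking ahead is
just indexing and the top-level parser can check for leftover tokens
once a command implementation returns.  All sketch_* names are
hypothetical, and sketch_next_token() is only a stand-in for the idea
behind lex_next_token():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sketch_type { SKETCH_WORD, SKETCH_ENDCMD };

struct sketch_token
  {
    enum sketch_type type;
    char text[32];
  };

#define SKETCH_MAX_TOKENS 32

struct sketch_lexer
  {
    struct sketch_token tokens[SKETCH_MAX_TOKENS];
    size_t n;           /* Number of tokens, including SKETCH_ENDCMD. */
    size_t pos;         /* Index of the current token. */
  };

/* Splits the space-separated COMMAND into tokens, stopping at '.' or the
   end of the string, and appends a SKETCH_ENDCMD token.  Words beyond the
   fixed capacity are dropped, and overlong words are truncated. */
void
sketch_lex_init (struct sketch_lexer *lexer, const char *command)
{
  const char *p = command;
  struct sketch_token *end;

  assert (lexer != NULL && command != NULL);
  lexer->n = lexer->pos = 0;
  while (lexer->n < SKETCH_MAX_TOKENS - 1)
    {
      struct sketch_token *t;
      size_t len, copy;

      p += strspn (p, " ");
      if (*p == '\0' || *p == '.')
        break;

      len = strcspn (p, " .");
      copy = len < sizeof t->text ? len : sizeof t->text - 1;
      t = &lexer->tokens[lexer->n++];
      t->type = SKETCH_WORD;
      memcpy (t->text, p, copy);
      t->text[copy] = '\0';
      p += len;
    }

  end = &lexer->tokens[lexer->n++];
  end->type = SKETCH_ENDCMD;
  end->text[0] = '\0';
}

/* Returns the token N positions past the current one.  Lookahead never
   runs past the end of the command: it stops at SKETCH_ENDCMD. */
const struct sketch_token *
sketch_next_token (const struct sketch_lexer *lexer, size_t n)
{
  size_t last = lexer->n - 1;
  size_t idx = n < last - lexer->pos ? lexer->pos + n : last;

  return &lexer->tokens[idx];
}

/* Advances to the next token, staying on SKETCH_ENDCMD once reached. */
void
sketch_get (struct sketch_lexer *lexer)
{
  if (lexer->pos < lexer->n - 1)
    lexer->pos++;
}

/* The check that the top-level parser would make after a command
   implementation returns: anything left over is trailing garbage. */
bool
sketch_at_end_of_command (const struct sketch_lexer *lexer)
{
  return sketch_next_token (lexer, 0)->type == SKETCH_ENDCMD;
}
```

For example, after sketch_lex_init (&lexer, "SORT CASES BY x."), a call
to sketch_next_token (&lexer, 2) yields the BY token without consuming
anything, and sketch_at_end_of_command() becomes true only after four
calls to sketch_get().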

author	Ben Pfaff <blp@cs.stanford.edu>
	Sun, 20 Mar 2011 00:05:47 +0000 (17:05 -0700)
committer	Ben Pfaff <blp@cs.stanford.edu>
	Sun, 20 Mar 2011 16:43:45 +0000 (09:43 -0700)
commit	9ade26c8349b4434008c46cf09bc7473ec743972
tree	8b088c842b96696ed3fbbe17d2c92e3bdfe65274
parent	afdf3096926b561f4e6511c10fcf73fc6796b9d2

NEWS
Smake
doc/dev/concepts.texi
doc/flow-control.texi
doc/invoking.texi
doc/language.texi
doc/utilities.texi
perl-module/PSPP.xs
perl-module/t/Pspp.t
src/data/automake.mk
src/data/dictionary.c
src/data/dictionary.h
src/data/file-handle-def.c
src/data/gnumeric-reader.h
src/data/identifier.c
src/data/identifier.h
src/data/identifier2.c	[new file with mode: 0644]
src/data/mrset.c
src/data/mrset.h
src/data/por-file-reader.c
src/data/por-file-writer.c
src/data/procedure.c
src/data/procedure.h
src/data/sys-file-reader.c
src/data/sys-file-writer.c
src/data/variable.c
src/data/variable.h
src/data/vector.c
src/data/vector.h
src/language/automake.mk
src/language/command.c
src/language/command.def
src/language/control/automake.mk
src/language/control/do-if.c
src/language/control/loop.c
src/language/control/repeat.c
src/language/control/repeat.h	[deleted file]
src/language/control/temporary.c
src/language/data-io/combine-files.c
src/language/data-io/data-list.c
src/language/data-io/data-parser.c
src/language/data-io/data-reader.c
src/language/data-io/file-handle.q
src/language/data-io/get-data.c
src/language/data-io/inpt-pgm.c
src/language/data-io/save-translate.c
src/language/data-io/trim.c
src/language/dictionary/apply-dictionary.c
src/language/dictionary/attributes.c
src/language/dictionary/missing-values.c
src/language/dictionary/modify-variables.c
src/language/dictionary/mrsets.c
src/language/dictionary/numeric.c
src/language/dictionary/rename-variables.c
src/language/dictionary/split-file.c
src/language/dictionary/sys-file-info.c
src/language/dictionary/value-labels.c
src/language/dictionary/variable-label.c
src/language/dictionary/vector.c
src/language/dictionary/weight.c
src/language/expressions/parse.c
src/language/expressions/private.h
src/language/lexer/automake.mk
src/language/lexer/include-path.c	[new file with mode: 0644]
src/language/lexer/include-path.h	[new file with mode: 0644]
src/language/lexer/lexer.c
src/language/lexer/lexer.h
src/language/lexer/q2c.c
src/language/lexer/value-parser.c
src/language/lexer/variable-parser.c
src/language/lexer/variable-parser.h
src/language/prompt.c	[deleted file]
src/language/prompt.h	[deleted file]
src/language/stats/aggregate.c
src/language/stats/autorecode.c
src/language/stats/descriptives.c
src/language/stats/flip.c
src/language/stats/frequencies.q
src/language/stats/npar.c
src/language/stats/rank.q
src/language/stats/sort-cases.c
src/language/syntax-file.c	[deleted file]
src/language/syntax-file.h	[deleted file]
src/language/syntax-string-source.c	[deleted file]
src/language/syntax-string-source.h	[deleted file]
src/language/tests/format-guesser-test.c
src/language/tests/moments-test.c
src/language/tests/paper-size.c
src/language/utilities/cache.c
src/language/utilities/cd.c
src/language/utilities/date.c
src/language/utilities/host.c
src/language/utilities/include.c
src/language/utilities/permissions.c
src/language/utilities/set.q
src/language/utilities/title.c
src/language/xforms/compute.c
src/language/xforms/count.c
src/language/xforms/fail.c
src/language/xforms/recode.c
src/language/xforms/sample.c
src/language/xforms/select-if.c
src/libpspp/automake.mk
src/libpspp/getl.c	[deleted file]
src/libpspp/getl.h	[deleted file]
src/libpspp/message.c
src/libpspp/message.h
src/libpspp/msg-locator.c	[deleted file]
src/libpspp/msg-locator.h	[deleted file]
src/output/driver.c
src/ui/gui/automake.mk
src/ui/gui/comments-dialog.c
src/ui/gui/executor.c
src/ui/gui/executor.h
src/ui/gui/main.c
src/ui/gui/psppire-data-window.c
src/ui/gui/psppire-dict.c
src/ui/gui/psppire-syntax-window.c
src/ui/gui/psppire-syntax-window.h
src/ui/gui/psppire-var-store.c
src/ui/gui/psppire.c
src/ui/gui/psppire.h
src/ui/gui/syntax-editor-source.c	[deleted file]
src/ui/gui/syntax-editor-source.h	[deleted file]
src/ui/source-init-opts.c
src/ui/source-init-opts.h
src/ui/terminal/automake.mk
src/ui/terminal/main.c
src/ui/terminal/msg-ui.c	[deleted file]
src/ui/terminal/msg-ui.h	[deleted file]
src/ui/terminal/read-line.c	[deleted file]
src/ui/terminal/read-line.h	[deleted file]
src/ui/terminal/terminal-opts.c
src/ui/terminal/terminal-opts.h
src/ui/terminal/terminal-reader.c	[new file with mode: 0644]
src/ui/terminal/terminal-reader.h	[new file with mode: 0644]
tests/data/data-in.at
tests/data/sys-file-reader.at
tests/dissect-sysfile.c
tests/language/control/do-repeat.at
tests/language/data-io/data-list.at
tests/language/data-io/get.at
tests/language/data-io/inpt-pgm.at
tests/language/data-io/print.at
tests/language/dictionary/missing-values.at
tests/language/expressions/evaluate.at
tests/language/expressions/parse.at
tests/language/lexer/lexer.at
tests/language/lexer/q2c.at
tests/language/stats/aggregate.at
tests/language/stats/rank.at
tests/language/utilities/insert.at