lexer: Add support for embedded \0 bytes and missing trailing new-line.
author    Ben Pfaff <blp@cs.stanford.edu>
          Mon, 24 Sep 2018 03:42:07 +0000 (20:42 -0700)
committer Ben Pfaff <blp@cs.stanford.edu>
          Mon, 24 Sep 2018 05:51:31 +0000 (22:51 -0700)
commit    e0f9210e814d03bc43b6a9b30a402e403d5666b9
tree      f610857aa2b672f57250938a2e3e7a961c547601
parent    89198b80bfd7e3893ed7499ba25b9bf94faaffb9

Until now, the low-level lexer has not supported \0 bytes in the input
stream, because it used such a byte as its end-of-input indicator.  This
forced the higher-level lexer to remove and flag \0 bytes itself as it
read them.  That workaround caused a bug: the higher-level lexer raised an
error for each \0 byte it removed, but it did so while the lexer was in an
intermediate state, so reporting the error could read uninitialized data.
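
A minimal sketch, not PSPP code, of why a \0 end-of-input sentinel
cannot coexist with embedded \0 bytes.  Both function names are
hypothetical.  The sentinel-driven scan stops at the first \0, so
anything after it is never seen.  The length-driven scan treats \0 as
an ordinary byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts '\n' bytes, treating '\0' as the end of
   input.  Any lines after an embedded '\0' are silently lost. */
size_t
count_newlines_sentinel (const char *input)
{
  size_t count = 0;
  for (const char *p = input; *p != '\0'; p++)
    if (*p == '\n')
      count++;
  return count;
}

/* Hypothetical helper: the same count driven by an explicit length,
   so '\0' is just another byte and every byte is examined. */
size_t
count_newlines_counted (const char *input, size_t length)
{
  assert (input != NULL || length == 0);

  size_t count = 0;
  for (size_t i = 0; i < length; i++)
    if (input[i] == '\n')
      count++;
  return count;
}
```

For the 6-byte input "a\nb\0c\n", the sentinel scan reports one line
and the counted scan reports two.  Under the sentinel design, the only
way to avoid losing the second line is for the caller to strip \0
bytes before the scanner sees them, which is the workaround described
above.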

This commit fixes the problem by adding support for \0 bytes to the
low-level lexer (segmenter).  At the same time, it adds support for input
that doesn't end in a new-line character.
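
One way to picture both changes together is the following sketch.  It
is illustrative only and is not the segmenter's actual interface:
next_line(), enum line_status, and its values are all hypothetical.
Input arrives as a pointer plus a length, so \0 is ordinary data.  An
explicit eof flag takes over the sentinel's job of marking the end of
input, which is what lets a final line without a new-line be
distinguished from a line that is still incomplete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum line_status
  {
    LINE_COMPLETE,   /* Found a line terminated by '\n'. */
    LINE_FINAL,      /* Input ended without a trailing '\n'. */
    LINE_NEED_MORE,  /* No '\n' yet, and more input may follow. */
    LINE_EMPTY       /* Nothing left at the end of input. */
  };

/* Hypothetical helper: finds the next line in INPUT[0..LENGTH).
   EOF is true when no more input will follow.  On LINE_COMPLETE or
   LINE_FINAL, stores the line's length, excluding any '\n', into
   *LINE_LEN. */
enum line_status
next_line (const char *input, size_t length, bool eof, size_t *line_len)
{
  assert (input != NULL || length == 0);

  /* memchr() honors LENGTH and does not stop at '\0'. */
  const char *nl = length > 0 ? memchr (input, '\n', length) : NULL;
  if (nl != NULL)
    {
      *line_len = (size_t) (nl - input);
      return LINE_COMPLETE;
    }
  if (!eof)
    return LINE_NEED_MORE;
  if (length == 0)
    return LINE_EMPTY;

  /* Missing trailing new-line: the remaining bytes form a line. */
  *line_len = length;
  return LINE_FINAL;
}
```

Given the 5 bytes "x\0y\nz", the first call returns LINE_COMPLETE with
a 3-byte line that includes the \0.  For the trailing "z", the result
is LINE_NEED_MORE while more input may follow, and LINE_FINAL once eof
is set.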

Bug #54664.
Thanks to Tianxiao Gu for reporting this bug.
12 files changed:
src/language/control/repeat.c
src/language/lexer/lexer.c
src/language/lexer/lexer.h
src/language/lexer/scan.c
src/language/lexer/scan.h
src/language/lexer/segment.c
src/language/lexer/segment.h
tests/language/lexer/lexer.at
tests/language/lexer/scan-test.c
tests/language/lexer/scan.at
tests/language/lexer/segment-test.c
tests/language/lexer/segment.at